├── README.md └── floydhub_sdk_demo.ipynb /README.md: -------------------------------------------------------------------------------- 1 | ## Automating FloydHub workflows 2 | 3 | FloydHub has public APIs that can be used to automate ML pipelines. 4 | See the Jupyter notebook to learn how to do this. 5 | 6 | -------------------------------------------------------------------------------- /floydhub_sdk_demo.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# FloydHub SDK Demo\n", 8 | "\n", 9 | "This notebook shows how to use the Floyd SDK to automate your FloydHub workflow. All the operations you perform with the CLI can be done programmatically using the Python SDK. In fact, the CLI itself uses the SDK to communicate with the FloydHub server. Use pip to install the SDK.\n", 10 | "\n", 11 | "The best way to execute this notebook is to create a new directory and copy this notebook into that directory. Then populate the current directory with some files." 12 | ] 13 | }, 14 | { 15 | "cell_type": "code", 16 | "execution_count": null, 17 | "metadata": {}, 18 | "outputs": [], 19 | "source": [ 20 | "# Install the SDK\n", 21 | "!pip install -q floyd-cli\n", 22 | "\n", 23 | "# Create some files for testing purposes\n", 24 | "!echo \"hello\" > ./hello.txt\n", 25 | "!echo \"print (\\\"Hello world\\\")\" > ./hello_world.py" 26 | ] 27 | }, 28 | { 29 | "cell_type": "markdown", 30 | "metadata": {}, 31 | "source": [ 32 | "# Authentication with username / password\n", 33 | "\n", 34 | "The first step is to authenticate yourself with the FloydHub server. You can use your username / password combination to get an access token from the server.\n", 35 | "\n", 36 | "The token is saved by the AuthConfigManager and automatically accessed in subsequent SDK calls. It is stored at `~/.floydconfig`." 37 | ] 38 | }, 39 | { 40 | "cell_type": "code", 41 | "execution_count": null, 42 | "metadata": {}, 43 | "outputs": [], 44 | "source": [ 45 | "from floyd.client.auth import AuthClient\n", 46 | "from floyd.log import configure_logger\n", 47 | "from floyd.model.access_token import AccessToken\n", 48 | "from floyd.model.credentials import Credentials\n", 49 | "from floyd.manager.auth_config import AuthConfigManager\n", 50 | "\n", 51 | "# Initialize logger\n", 52 | "configure_logger(verbose=False)\n", 53 | "\n", 54 | "# Login using credentials (replace with your credentials)\n", 55 | "login_credentials = Credentials(username=\"your_username\", password=\"your_password\")\n", 56 | "access_code = AuthClient().login(login_credentials)\n", 57 | "user = AuthClient().get_user(access_code)\n", 58 | "access_token = AccessToken(username=user.username,\n", 59 | " token=access_code)\n", 60 | "\n", 61 | "# Auth token is stored and automatically used in subsequent SDK calls\n", 62 | "AuthConfigManager.set_access_token(access_token)" 63 | ] 64 | }, 65 | { 66 | "cell_type": "markdown", 67 | "metadata": {}, 68 | "source": [ 69 | "# Authentication with API Key\n", 70 | "\n", 71 | "Alternatively, you can get an API key for your account at https://www.floydhub.com/settings/apikey. You can set the expiration of the key and use it for authentication.\n",
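"\n", "For example, instead of hard-coding the key in a notebook cell, you might read it from an environment variable before passing it to `set_apikey`. This is only a sketch: the `FLOYDHUB_APIKEY` variable name is an assumption, not something the SDK defines.\n", "\n", "```python\n", "import os\n", "\n", "from floyd.manager.auth_config import AuthConfigManager\n", "\n", "# FLOYDHUB_APIKEY is a name we chose; export it in your shell first\n", "AuthConfigManager.set_apikey(username=\"your_username\", apikey=os.environ[\"FLOYDHUB_APIKEY\"])\n", "```"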
72 | ] 73 | }, 74 | { 75 | "cell_type": "code", 76 | "execution_count": null, 77 | "metadata": {}, 78 | "outputs": [], 79 | "source": [ 80 | "AuthConfigManager.set_apikey(username=\"your_username\", apikey=\"apikey_from_floydhub\")" 81 | ] 82 | }, 83 | { 84 | "cell_type": "markdown", 85 | "metadata": {}, 86 | "source": [ 87 | "# Data\n", 88 | "\n", 89 | "FloydHub manages data separately from code. You need to create a dataset directly from the [website](https://www.floydhub.com/datasets/create). Then use the dataset name in the section below to upload the contents of the current directory to FloydHub as a dataset. You will later mount this data into a job." 90 | ] 91 | }, 92 | { 93 | "cell_type": "code", 94 | "execution_count": 1, 95 | "metadata": {}, 96 | "outputs": [ 97 | { 98 | "name": "stderr", 99 | "output_type": "stream", 100 | "text": [ 101 | "Waiting for unpack....\n" 102 | ] 103 | } 104 | ], 105 | "source": [ 106 | "from floyd.client.data import DataClient\n", 107 | "from floyd.client.dataset import DatasetClient\n", 108 | "from floyd.manager.auth_config import AuthConfigManager\n", 109 | "from floyd.manager.data_config import DataConfig\n", 110 | "from floyd.cli.data_upload_utils import initialize_new_upload, complete_upload\n", 111 | "from floyd.cli.utils import get_namespace_from_name\n", 112 | "\n", 113 | "# Get access token from the stored config file\n", 114 | "# Or re-authenticate from the previous step\n", 115 | "access_token = AuthConfigManager.get_access_token()\n", 116 | "\n", 117 | "# Replace with your dataset name\n", 118 | "dataset_name = \"floydlabs/test11\"\n", 119 | "dataset = DatasetClient().get_by_name(dataset_name)\n", 120 | "\n", 121 | "namespace, name = get_namespace_from_name(dataset_name)\n", 122 | "data_config = DataConfig(name=name,\n", 123 | " namespace=namespace,\n", 124 | " family_id=dataset.id)\n", 125 | "\n", 126 | "# This is the actual upload step\n", 127 | "initialize_new_upload(data_config, access_token, \"new upload\")\n", 128 | "complete_upload(data_config)" 129 | ] 130 | }, 131 | { 132 | "cell_type": "code", 133 | "execution_count": 2, 134 | "metadata": {}, 135 | "outputs": [], 136 | "source": [ 137 | "from floyd.manager.data_config import DataConfigManager\n", 138 | "from floyd.cli.utils import normalize_data_name\n", 139 | "\n", 140 | "# Get the uploaded data name\n", 141 | "data_config = DataConfigManager.get_config()\n", 142 | "data_name = normalize_data_name(data_config.data_name)" 143 | ] 144 | }, 145 | { 146 | "cell_type": "markdown", 147 | "metadata": {}, 148 | "source": [ 149 | "## Dataset info & status\n", 150 | "\n", 151 | "We can retrieve the name and info of the datasets we have uploaded. 
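\n", "\n", "If you are automating a pipeline, you may also want to wait until a freshly uploaded version reports a `valid` status before mounting it in a job. Here is a minimal sketch, assuming `data_name` from the cell above and that the `valid` state means processing has finished:\n", "\n", "```python\n", "import time\n", "\n", "from floyd.cli.data import get_data_object\n", "\n", "# Poll until the uploaded version becomes usable\n", "data_source = get_data_object(data_name, use_data_config=False)\n", "while data_source and data_source.state != 'valid':\n", "    time.sleep(10)\n", "    data_source = get_data_object(data_name, use_data_config=False)\n", "```"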
" 152 | ] 153 | }, 154 | { 155 | "cell_type": "code", 156 | "execution_count": 3, 157 | "metadata": {}, 158 | "outputs": [ 159 | { 160 | "name": "stdout", 161 | "output_type": "stream", 162 | "text": [ 163 | "DATA NAME CREATED STATUS DISK USAGE\n", 164 | "---------------------------- -------------- -------- ------------\n", 165 | "floydlabs/datasets/test11/22 13 seconds ago valid 341.0 KB\n", 166 | "floydlabs/datasets/test11/18 1 months ago valid 53.0 KB\n", 167 | "floydlabs/datasets/test11/17 4 months ago valid 11.18 MB\n", 168 | "floydlabs/datasets/test11/15 7 months ago valid 795.42 MB\n", 169 | "floydlabs/datasets/test11/14 7 months ago valid 795.47 MB\n", 170 | "floydlabs/datasets/test11/13 7 months ago valid 795.46 MB\n", 171 | "floydlabs/datasets/test11/12 7 months ago valid 795.4 MB\n", 172 | "floydlabs/datasets/test11/11 8 months ago valid 769.06 MB\n", 173 | "floydlabs/datasets/test11/8 8 months ago valid 278.07 MB\n", 174 | "floydlabs/datasets/test11/9 8 months ago valid 278.07 MB\n", 175 | "floydlabs/datasets/test11/10 8 months ago valid 62.77 MB\n", 176 | "floydlabs/datasets/test11/7 8 months ago valid 20.0 KB\n", 177 | "floydlabs/datasets/test11/6 10 months ago valid 40.0 KB\n", 178 | "floydlabs/datasets/test11/5 10 months ago valid 40.0 KB\n", 179 | "floydlabs/datasets/test11/4 10 months ago valid 10.0 KB\n", 180 | "floydlabs/datasets/test11/3 10 months ago valid 10.0 KB\n", 181 | "floydlabs/datasets/test11/2 1 years ago valid 10.0 KB\n", 182 | "floydlabs/datasets/test11/1 1 years ago valid 10.0 KB\n" 183 | ] 184 | } 185 | ], 186 | "source": [ 187 | "from floyd.cli.data import get_data_object\n", 188 | "from floyd.client.data import DataClient\n", 189 | "from tabulate import tabulate\n", 190 | "\n", 191 | "def print_data(data_sources):\n", 192 | " \"\"\"\n", 193 | " Print dataset information in tabular form\n", 194 | " \"\"\"\n", 195 | " if not data_sources:\n", 196 | " return\n", 197 | "\n", 198 | " headers = [\"DATA NAME\", \"CREATED\", \"STATUS\", \"DISK USAGE\"]\n", 199 | " data_list = []\n", 200 | " for data_source in data_sources:\n", 201 | " data_list.append([data_source.name,\n", 202 | " data_source.created_pretty,\n", 203 | " data_source.state, data_source.size])\n", 204 | " print(tabulate(data_list, headers=headers))\n", 205 | "\n", 206 | "# This will retrieve the info for all the datasets under floydlabs/test11\n", 207 | "data_sources = DataClient().get_all()\n", 208 | "print_data(data_sources)" 209 | ] 210 | }, 211 | { 212 | "cell_type": "code", 213 | "execution_count": 4, 214 | "metadata": {}, 215 | "outputs": [ 216 | { 217 | "name": "stdout", 218 | "output_type": "stream", 219 | "text": [ 220 | "DATA NAME CREATED STATUS DISK USAGE\n", 221 | "---------------------------- -------------- -------- ------------\n", 222 | "floydlabs/datasets/test11/22 17 seconds ago valid 341.0 KB\n" 223 | ] 224 | } 225 | ], 226 | "source": [ 227 | "# or we can get the status of a single entry\n", 228 | "dataset_name = \"floydlabs/test11\"\n", 229 | "\n", 230 | "data_source = get_data_object(dataset_name, use_data_config=False)\n", 231 | "print_data([data_source] if data_source else [])" 232 | ] 233 | }, 234 | { 235 | "cell_type": "markdown", 236 | "metadata": {}, 237 | "source": [ 238 | "## Delete a dataset version\n", 239 | "\n", 240 | "You can easily delete the dataset version[s]. Please, be careful with this! Expecially if you are automataizing this process." 
241 | ] 242 | }, 243 | { 244 | "cell_type": "code", 245 | "execution_count": 5, 246 | "metadata": {}, 247 | "outputs": [ 248 | { 249 | "name": "stdout", 250 | "output_type": "stream", 251 | "text": [ 252 | "Data Deleted: floydlabs/datasets/test11/22\n" 253 | ] 254 | } 255 | ], 256 | "source": [ 257 | "# We will remove the last dataset version we have just created\n", 258 | "dataset_to_remove = 'floydlabs/datasets/test11/22'\n", 259 | "\n", 260 | "data_source = get_data_object(dataset_to_remove, use_data_config=True)\n", 261 | "if not DataClient().delete(data_source.id):\n", 262 | " print(\"Error!\")\n", 263 | "else:\n", 264 | " print(\"Data Deleted: \", dataset_to_remove)" 265 | ] 266 | }, 267 | { 268 | "cell_type": "markdown", 269 | "metadata": {}, 270 | "source": [ 271 | "# Job\n", 272 | "\n", 273 | "You can kick off a training job, monitor it, and download its output, all using the SDK. The next section shows how to run a job under a specific project. Create the project on the FloydHub [website](https://www.floydhub.com/projects/create) and use the project name in the next section." 274 | ] 275 | }, 276 | { 277 | "cell_type": "code", 278 | "execution_count": null, 279 | "metadata": {}, 280 | "outputs": [], 281 | "source": [ 282 | "from floyd.client.project import ProjectClient\n", 283 | "from floyd.manager.experiment_config import ExperimentConfigManager\n", 284 | "from floyd.manager.floyd_ignore import FloydIgnoreManager\n", 285 | "from floyd.model.experiment_config import ExperimentConfig\n", 286 | "from floyd.cli.utils import get_namespace_from_name\n", 287 | "\n", 288 | "# Replace with your project name\n", 289 | "project_name = \"floydlabs/private-proj\"\n", 290 | "project = ProjectClient().get_by_name(project_name)\n", 291 | "\n", 292 | "namespace, name = get_namespace_from_name(project_name)\n", 293 | "experiment_config = ExperimentConfig(name=name,\n", 294 | " namespace=namespace,\n", 295 | " family_id=project.id)\n", 296 | "ExperimentConfigManager.set_config(experiment_config)\n", 297 | "FloydIgnoreManager.init()" 298 | ] 299 | }, 300 | { 301 | "cell_type": "markdown", 302 | "metadata": {}, 303 | "source": [ 304 | "# Mounting Data\n", 305 | "\n", 306 | "You can mount any data on FloydHub (that you have access to) into your job at the path you specify. In this case we are mounting the dataset we created above at the `/training` path. You also need to specify the FloydHub instance type and the [environment](https://docs.floydhub.com/guides/environments/) you want to use.\n", 307 | "\n", 308 | "Running a job is currently a two-step process: you first upload the code and then run the experiment (or job)." 309 | ] 310 | }, 311 | { 312 | "cell_type": "code", 313 | "execution_count": 9, 314 | "metadata": {}, 315 | "outputs": [ 316 | { 317 | "name": "stdout", 318 | "output_type": "stream", 319 | "text": [ 320 | "Creating project run. 
Total upload size: 29.3KiB\n", 321 | "Syncing code ...\n" 322 | ] 323 | } 324 | ], 325 | "source": [ 326 | "from floyd.client.experiment import ExperimentClient\n", 327 | "from floyd.client.module import ModuleClient\n", 328 | "from floyd.constants import INSTANCE_ARCH_MAP\n", 329 | "from floyd.model.experiment import ExperimentRequest\n", 330 | "from floyd.model.module import Module\n", 331 | "\n", 332 | "# Run a job\n", 333 | "# Get the data mount id (data_name comes from the previous step)\n", 334 | "data_obj = DataClient().get(normalize_data_name(data_name))\n", 335 | "data_ids = [\"{}:{}\".format(data_obj.id, \"/training\")]\n", 336 | "\n", 337 | "# Define the mount point for the input data\n", 338 | "module_inputs = {\n", 339 | " \"name\": \"/training\",\n", 340 | " \"type\": \"dir\" # Always use dir here\n", 341 | "}\n", 342 | " \n", 343 | "# First create a module and then use it in the experiment create step\n", 344 | "\n", 345 | "experiment_name = project_name\n", 346 | "instance_type = \"c1\" # You can use c1 for cpu, c2 for cpu2, g1 for gpu and g2 for gpu2\n", 347 | "project_id = project.id\n", 348 | "\n", 349 | "# Get env value\n", 350 | "arch = INSTANCE_ARCH_MAP[instance_type]\n", 351 | "env = \"tensorflow-1.5\" # Choose the environment you need\n", 352 | "\n", 353 | "module = Module(name=experiment_name,\n", 354 | " description='foo',\n", 355 | " command=\"ls /training\",\n", 356 | " mode='command',\n", 357 | " family_id=project_id,\n", 358 | " inputs=module_inputs,\n", 359 | " env=env,\n", 360 | " arch=arch)\n", 361 | "\n", 362 | "module_id = ModuleClient().create(module)\n", 363 | " \n", 364 | "experiment_request = ExperimentRequest(name=experiment_name,\n", 365 | " description='foo',\n", 366 | " full_command='ls /training',\n", 367 | " module_id=module_id,\n", 368 | " env=env,\n", 369 | " data_ids=data_ids,\n", 370 | " family_id=project_id,\n", 371 | " instance_type=instance_type)\n", 372 | "expt_info = ExperimentClient().create(experiment_request)" 373 | ] 374 | }, 375 | { 376 | "cell_type": "markdown", 377 | "metadata": {}, 378 | "source": [ 379 | "# Tracking an experiment\n", 380 | "\n", 381 | "You can track an experiment periodically and wait for it to finish. You can also set up a [notification webhook](https://docs.floydhub.com/guides/notifications/) to get notified when jobs finish, and programmatically download the output of your training job." 382 | ] 383 | }, 384 | { 385 | "cell_type": "code", 386 | "execution_count": 12, 387 | "metadata": {}, 388 | "outputs": [ 389 | { 390 | "name": "stdout", 391 | "output_type": "stream", 392 | "text": [ 393 | "success\n" 394 | ] 395 | } 396 | ], 397 | "source": [ 398 | "from floyd.client.experiment import ExperimentClient\n", 399 | "from floyd.client.resource import ResourceClient\n", 400 | "\n", 401 | "# Track experiment\n", 402 | "job_id = expt_info['id']\n", 403 | "experiment = ExperimentClient().get(job_id)\n", 404 | "print(experiment.state)\n", 405 | "\n", 406 | "# Stop a running job (works only if the job is queued or running)\n", 407 | "# ExperimentClient().stop(job_id)" 408 | ] 409 | }, 410 | { 411 | "cell_type": "code", 412 | "execution_count": 14, 413 | "metadata": {}, 414 | "outputs": [ 415 | { 416 | "name": "stdout", 417 | "output_type": "stream", 418 | "text": [ 419 | "2019-01-15 10:59:43,547 INFO - Preparing to run TaskInstance