├── README.md
├── turicreate_activity_classification.ipynb
├── turicreate_classify_data.ipynb
├── turicreate_image_classification.ipynb
├── turicreate_image_similarity.ipynb
├── turicreate_object_detection.ipynb
├── turicreate_recommender.ipynb
├── turicreate_sframes_intro.ipynb
├── turicreate_style_transfer.ipynb
└── turicreate_text_classification.ipynb


/README.md:
--------------------------------------------------------------------------------
 1 | # turicreate-colab
 2 | A collection of Google Colaboratory notebooks for Turi Create, assembled from various publicly available resources.
 3 | 
 4 | Click a link below to open the corresponding notebook in Google Colab. To learn more about Google Colab, start at https://colab.research.google.com/. After Colab opens the notebook, open the `File` menu and select `Save a copy in Drive...` to get an editable copy in your own Google Drive.
 5 | 
 6 | * SFrames Introduction: https://colab.research.google.com/github/jagatfx/turicreate-colab/blob/master/turicreate_sframes_intro.ipynb
 7 | * Style Transfer: https://colab.research.google.com/github/jagatfx/turicreate-colab/blob/master/turicreate_style_transfer.ipynb
 8 | * Image Classification: https://colab.research.google.com/github/jagatfx/turicreate-colab/blob/master/turicreate_image_classification.ipynb
 9 | * Image Similarity: https://colab.research.google.com/github/jagatfx/turicreate-colab/blob/master/turicreate_image_similarity.ipynb
10 | * Activity Classification: https://colab.research.google.com/github/jagatfx/turicreate-colab/blob/master/turicreate_activity_classification.ipynb
11 | * Data Classification: https://colab.research.google.com/github/jagatfx/turicreate-colab/blob/master/turicreate_classify_data.ipynb
12 | * Object Detection: https://colab.research.google.com/github/jagatfx/turicreate-colab/blob/master/turicreate_object_detection.ipynb
13 | * Recommender: https://colab.research.google.com/github/jagatfx/turicreate-colab/blob/master/turicreate_recommender.ipynb
14 | * Text Classification: https://colab.research.google.com/github/jagatfx/turicreate-colab/blob/master/turicreate_text_classification.ipynb
15 | 
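The links above all follow one pattern: the Colab host, then `github`, the repository owner and name, `blob`, the branch, and the notebook filename. As a minimal sketch (the helper name `colab_url` and its defaults are illustrative and not part of this repository), the same links can be generated like this:

```python
# Hypothetical helper, not part of this repository: builds the Colab link
# for a notebook hosted on GitHub, following the pattern of the links above.
COLAB_BASE = "https://colab.research.google.com/github"


def colab_url(notebook, user="jagatfx", repo="turicreate-colab", branch="master"):
    """Return the Colab URL that opens `notebook` from the given GitHub repo."""
    return f"{COLAB_BASE}/{user}/{repo}/blob/{branch}/{notebook}"


if __name__ == "__main__":
    print(colab_url("turicreate_sframes_intro.ipynb"))
```

If you fork the repository, pass your own `user` (and `branch`, if it differs) to get links that open your copies of the notebooks.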


--------------------------------------------------------------------------------
/turicreate_activity_classification.ipynb:
--------------------------------------------------------------------------------
   1 | {
   2 |   "nbformat": 4,
   3 |   "nbformat_minor": 0,
   4 |   "metadata": {
   5 |     "colab": {
   6 |       "name": "turicreate-activity-classification.ipynb",
   7 |       "version": "0.3.2",
   8 |       "provenance": [],
   9 |       "collapsed_sections": [],
  10 |       "include_colab_link": true
  11 |     },
  12 |     "kernelspec": {
  13 |       "name": "python3",
  14 |       "display_name": "Python 3"
  15 |     },
  16 |     "accelerator": "GPU"
  17 |   },
  18 |   "cells": [
  19 |     {
  20 |       "cell_type": "markdown",
  21 |       "metadata": {
  22 |         "id": "view-in-github",
  23 |         "colab_type": "text"
  24 |       },
  25 |       "source": [
  26 |         "[View in Colaboratory](https://colab.research.google.com/github/jagatfx/turicreate-colab/blob/master/turicreate_activity_classification.ipynb)"
  27 |       ]
  28 |     },
  29 |     {
  30 |       "metadata": {
  31 |         "id": "3zKSmHFi38IA",
  32 |         "colab_type": "text"
  33 |       },
  34 |       "cell_type": "markdown",
  35 |       "source": [
  36 |         "# Activity Classification\n",
  37 |         "https://apple.github.io/turicreate/docs/userguide/activity_classifier/\n",
  38 |         "\n",
  39 |         "Activity classification is the task of identifying a pre-defined set of physical actions using motion-sensor inputs. Such sensors include accelerometers, gyroscopes, and others found in most handheld devices today.\n",
  40 |         "\n",
  41 |         "Possible applications include counting swimming laps using a watch's accelerometer data, turning on Bluetooth controlled lights when recognizing a certain gesture using gyroscope data from a handheld phone, or creating shortcuts to your favorite phone applications using hand gestures.\n",
  42 |         "\n",
  43 |         "The activity classifier in Turi Create creates a deep learning model capable of detecting temporal features in sensor data, lending itself well to the task of activity classification. Before we dive into the model architecture, let's see a working example."
  44 |       ]
  45 |     },
  46 |     {
  47 |       "metadata": {
  48 |         "id": "-ywt2dW81uvE",
  49 |         "colab_type": "text"
  50 |       },
  51 |       "cell_type": "markdown",
  52 |       "source": [
  53 |         "## Turi Create and GPU Setup"
  54 |       ]
  55 |     },
  56 |     {
  57 |       "metadata": {
  58 |         "id": "nZBUZmlD1vWh",
  59 |         "colab_type": "code",
  60 |         "colab": {}
  61 |       },
  62 |       "cell_type": "code",
  63 |       "source": [
  64 |         "!apt install -y libnvrtc8.0\n",
  65 |         "!pip uninstall -y mxnet-cu80 && pip install mxnet-cu80==1.1.0\n",
  66 |         "!pip install turicreate"
  67 |       ],
  68 |       "execution_count": 0,
  69 |       "outputs": []
  70 |     },
  71 |     {
  72 |       "metadata": {
  73 |         "id": "Wtd2fPfSbPe9",
  74 |         "colab_type": "text"
  75 |       },
  76 |       "cell_type": "markdown",
  77 |       "source": [
  78 |         "## Google Drive Access\n",
  79 |         "\n",
  80 |         "You will be asked to click a link to generate a secret key to access your Google Drive. \n",
  81 |         "\n",
  82 |         "Copy the secret key and paste it into the space provided in the notebook."
  83 |       ]
  84 |     },
  85 |     {
  86 |       "metadata": {
  87 |         "id": "BvGd7tK8bQmM",
  88 |         "colab_type": "code",
  89 |         "colab": {
  90 |           "base_uri": "https://localhost:8080/",
  91 |           "height": 35
  92 |         },
  93 |         "outputId": "73652884-cee8-44a9-fee7-3341cbfde329"
  94 |       },
  95 |       "cell_type": "code",
  96 |       "source": [
  97 |         "import os.path\n",
  98 |         "from google.colab import drive\n",
  99 |         "\n",
 100 |         "# mount Google Drive to /content/drive/My Drive/\n",
 101 |         "if os.path.isdir(\"/content/drive/My Drive\"):\n",
 102 |         "  print(\"Google Drive already mounted\")\n",
 103 |         "else:\n",
 104 |         "  drive.mount('/content/drive')"
 105 |       ],
 106 |       "execution_count": 1,
 107 |       "outputs": [
 108 |         {
 109 |           "output_type": "stream",
 110 |           "text": [
 111 |             "Google Drive already mounted\n"
 112 |           ],
 113 |           "name": "stdout"
 114 |         }
 115 |       ]
 116 |     },
 117 |     {
 118 |       "metadata": {
 119 |         "id": "Cv8D8h_8bfQ2",
 120 |         "colab_type": "text"
 121 |       },
 122 |       "cell_type": "markdown",
 123 |       "source": [
 124 |         "## Fetch Data"
 125 |       ]
 126 |     },
 127 |     {
 128 |       "metadata": {
 129 |         "id": "gb2Y5xDHJQoh",
 130 |         "colab_type": "code",
 131 |         "colab": {}
 132 |       },
 133 |       "cell_type": "code",
 134 |       "source": [
 135 |         "import os.path\n",
 136 |         "import urllib.request\n",
 137 |         "import tarfile\n",
 138 |         "import zipfile\n",
 139 |         "import gzip\n",
 140 |         "from shutil import copy\n",
 141 |         "\n",
 142 |         "def fetch_remote_datafile(filename, remote_url):\n",
 143 |         "  if os.path.isfile(\"./\" + filename):\n",
 144 |         "    print(\"already have \" + filename + \" in workspace\")\n",
 145 |         "    return\n",
 146 |         "  print(\"fetching \" + filename + \" from \" + remote_url + \"...\")\n",
 147 |         "  urllib.request.urlretrieve(remote_url, \"./\" + filename)\n",
 148 |         "\n",
 149 |         "def cache_datafile_in_drive(filename):\n",
  150 |         "  if not os.path.isfile(\"./\" + filename):\n",
 151 |         "    print(\"cannot cache \" + filename + \", it is not in workspace\")\n",
 152 |         "    return\n",
 153 |         "  \n",
 154 |         "  data_drive_path = \"/content/drive/My Drive/Colab Notebooks/data/\"\n",
 155 |         "  if os.path.isfile(data_drive_path + filename):\n",
 156 |         "    print(\"\" + filename + \" has already been stored in Google Drive\")\n",
 157 |         "  else:\n",
 158 |         "    print(\"copying \" + filename + \" to \" + data_drive_path)\n",
 159 |         "    copy(\"./\" + filename, data_drive_path)\n",
 160 |         "  \n",
 161 |         "\n",
 162 |         "def load_datafile_from_drive(filename, remote_url=None):\n",
 163 |         "  data_drive_path = \"/content/drive/My Drive/Colab Notebooks/data/\"\n",
 164 |         "  if os.path.isfile(\"./\" + filename):\n",
 165 |         "    print(\"already have \" + filename + \" in workspace\")\n",
 166 |         "  elif os.path.isfile(data_drive_path + filename):\n",
 167 |         "    print(\"have \" + filename + \" in Google Drive, copying to workspace...\")\n",
 168 |         "    copy(data_drive_path + filename, \".\")\n",
  169 |         "  elif remote_url is not None:\n",
 170 |         "    fetch_remote_datafile(filename, remote_url)\n",
 171 |         "  else:\n",
 172 |         "    print(\"error: you need to manually download \" + filename + \" and put in drive\")\n",
 173 |         "    \n",
 174 |         "def extract_datafile(filename, expected_extract_artifact=None):\n",
  175 |         "  if expected_extract_artifact is not None and (os.path.isfile(expected_extract_artifact) or os.path.isdir(expected_extract_artifact)):\n",
  176 |         "    print(\"files in \" + filename + \" have already been extracted\")\n",
  177 |         "  elif not os.path.isfile(\"./\" + filename):\n",
 178 |         "    print(\"error: cannot extract \" + filename + \", it is not in the workspace\")\n",
 179 |         "  else:\n",
 180 |         "    extension = filename.split('.')[-1]\n",
 181 |         "    if extension == \"zip\":\n",
 182 |         "      print(\"extracting \" + filename + \"...\")\n",
 183 |         "      data_file = open(filename, \"rb\")\n",
 184 |         "      z = zipfile.ZipFile(data_file)\n",
 185 |         "      for name in z.namelist():\n",
 186 |         "          print(\"    extracting file\", name)\n",
 187 |         "          z.extract(name, \"./\")\n",
 188 |         "      data_file.close()\n",
 189 |         "    elif extension == \"gz\":\n",
 190 |         "      print(\"extracting \" + filename + \"...\")\n",
 191 |         "      if filename.split('.')[-2] == \"tar\":\n",
 192 |         "        tar = tarfile.open(filename)\n",
 193 |         "        tar.extractall()\n",
 194 |         "        tar.close()\n",
 195 |         "      else:\n",
 196 |         "        data_zip_file = gzip.GzipFile(filename, 'rb')\n",
 197 |         "        data = data_zip_file.read()\n",
 198 |         "        data_zip_file.close()\n",
 199 |         "        extracted_file = open('.'.join(filename.split('.')[0:-1]), 'wb')\n",
 200 |         "        extracted_file.write(data)\n",
 201 |         "        extracted_file.close()\n",
 202 |         "    elif extension == \"tar\":\n",
 203 |         "      print(\"extracting \" + filename + \"...\")\n",
 204 |         "      tar = tarfile.open(filename)\n",
 205 |         "      tar.extractall()\n",
 206 |         "      tar.close()\n",
 207 |         "    elif extension == \"csv\":\n",
 208 |         "      print(\"do not need to extract csv\")\n",
 209 |         "    else:\n",
 210 |         "      print(\"cannot extract \" + filename)\n",
 211 |         "      \n",
 212 |         "def load_cache_extract_datafile(filename, expected_extract_artifact=None, remote_url=None):\n",
 213 |         "  load_datafile_from_drive(filename, remote_url)\n",
 214 |         "  extract_datafile(filename, expected_extract_artifact)\n",
 215 |         "  cache_datafile_in_drive(filename)\n",
 216 |         "  "
 217 |       ],
 218 |       "execution_count": 0,
 219 |       "outputs": []
 220 |     },
 221 |     {
 222 |       "metadata": {
 223 |         "id": "XjzgLYjNJRGM",
 224 |         "colab_type": "code",
 225 |         "colab": {
 226 |           "base_uri": "https://localhost:8080/",
 227 |           "height": 71
 228 |         },
 229 |         "outputId": "5f8a1560-c7e6-477c-d2cb-4aed5bc0cdea"
 230 |       },
 231 |       "cell_type": "code",
 232 |       "source": [
 233 |         "load_cache_extract_datafile(\"HAPT Data Set.zip\", \"RawData\", \"http://archive.ics.uci.edu/ml/machine-learning-databases/00341/HAPT%20Data%20Set.zip\")"
 234 |       ],
 235 |       "execution_count": 5,
 236 |       "outputs": [
 237 |         {
 238 |           "output_type": "stream",
 239 |           "text": [
 240 |             "already have HAPT Data Set.zip in workspace\n",
 241 |             "files in HAPT Data Set.zip have already been extracted\n",
 242 |             "HAPT Data Set.zip has already been stored in Google Drive\n"
 243 |           ],
 244 |           "name": "stdout"
 245 |         }
 246 |       ]
 247 |     },
 248 |     {
 249 |       "metadata": {
 250 |         "id": "AdTJumQccZO8",
 251 |         "colab_type": "text"
 252 |       },
 253 |       "cell_type": "markdown",
 254 |       "source": [
 255 |         "## Setup Turi Create"
 256 |       ]
 257 |     },
 258 |     {
 259 |       "metadata": {
 260 |         "id": "CZH8VDCOcebW",
 261 |         "colab_type": "code",
 262 |         "colab": {}
 263 |       },
 264 |       "cell_type": "code",
 265 |       "source": [
 266 |         "import mxnet as mx\n",
 267 |         "import turicreate as tc"
 268 |       ],
 269 |       "execution_count": 0,
 270 |       "outputs": []
 271 |     },
 272 |     {
 273 |       "metadata": {
 274 |         "id": "TfwLewM6ce6-",
 275 |         "colab_type": "code",
 276 |         "colab": {}
 277 |       },
 278 |       "cell_type": "code",
 279 |       "source": [
 280 |         "# Use all GPUs (default)\n",
 281 |         "tc.config.set_num_gpus(-1)\n",
 282 |         "\n",
 283 |         "# Use only 1 GPU\n",
 284 |         "#tc.config.set_num_gpus(1)\n",
 285 |         "\n",
 286 |         "# Use CPU\n",
 287 |         "#tc.config.set_num_gpus(0)"
 288 |       ],
 289 |       "execution_count": 0,
 290 |       "outputs": []
 291 |     },
 292 |     {
 293 |       "metadata": {
 294 |         "id": "ei1gcD_bQNVY",
 295 |         "colab_type": "text"
 296 |       },
 297 |       "cell_type": "markdown",
 298 |       "source": [
 299 |         "## Data Preparation\n",
 300 |         "\n",
 301 |         "https://apple.github.io/turicreate/docs/userguide/activity_classifier/data-preparation.html"
 302 |       ]
 303 |     },
 304 |     {
 305 |       "metadata": {
 306 |         "id": "JiQ0CGKMbEsI",
 307 |         "colab_type": "code",
 308 |         "colab": {
 309 |           "base_uri": "https://localhost:8080/",
 310 |           "height": 238
 311 |         },
 312 |         "outputId": "12f15316-89f0-4181-e986-a38d59eb3307"
 313 |       },
 314 |       "cell_type": "code",
 315 |       "source": [
 316 |         "data_dir = './RawData/'\n",
 317 |         "\n",
 318 |         "def find_label_for_containing_interval(intervals, index):\n",
 319 |         "    containing_interval = intervals[:, 0][(intervals[:, 1] <= index) & (index <= intervals[:, 2])]\n",
 320 |         "    if len(containing_interval) == 1:\n",
 321 |         "        return containing_interval[0]\n",
 322 |         "\n",
 323 |         "# Load labels\n",
 324 |         "labels = tc.SFrame.read_csv(data_dir + 'labels.txt', delimiter=' ', header=False, verbose=False)\n",
 325 |         "labels = labels.rename({'X1': 'exp_id', 'X2': 'user_id', 'X3': 'activity_id', 'X4': 'start', 'X5': 'end'})\n",
 326 |         "labels.head()"
 327 |       ],
 328 |       "execution_count": 11,
 329 |       "outputs": [
 330 |         {
 331 |           "output_type": "execute_result",
 332 |           "data": {
 333 |             "text/html": [
 334 |               "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\"><table frame=\"box\" rules=\"cols\">\n",
 335 |               "    <tr>\n",
 336 |               "        <th style=\"padding-left: 1em; padding-right: 1em; text-align: center\">exp_id</th>\n",
 337 |               "        <th style=\"padding-left: 1em; padding-right: 1em; text-align: center\">user_id</th>\n",
 338 |               "        <th style=\"padding-left: 1em; padding-right: 1em; text-align: center\">activity_id</th>\n",
 339 |               "        <th style=\"padding-left: 1em; padding-right: 1em; text-align: center\">start</th>\n",
 340 |               "        <th style=\"padding-left: 1em; padding-right: 1em; text-align: center\">end</th>\n",
 341 |               "    </tr>\n",
 342 |               "    <tr>\n",
 343 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">1</td>\n",
 344 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">1</td>\n",
 345 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">5</td>\n",
 346 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">250</td>\n",
 347 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">1232</td>\n",
 348 |               "    </tr>\n",
 349 |               "    <tr>\n",
 350 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">1</td>\n",
 351 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">1</td>\n",
 352 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">7</td>\n",
 353 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">1233</td>\n",
 354 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">1392</td>\n",
 355 |               "    </tr>\n",
 356 |               "    <tr>\n",
 357 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">1</td>\n",
 358 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">1</td>\n",
 359 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">4</td>\n",
 360 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">1393</td>\n",
 361 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">2194</td>\n",
 362 |               "    </tr>\n",
 363 |               "    <tr>\n",
 364 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">1</td>\n",
 365 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">1</td>\n",
 366 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">8</td>\n",
 367 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">2195</td>\n",
 368 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">2359</td>\n",
 369 |               "    </tr>\n",
 370 |               "    <tr>\n",
 371 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">1</td>\n",
 372 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">1</td>\n",
 373 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">5</td>\n",
 374 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">2360</td>\n",
 375 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">3374</td>\n",
 376 |               "    </tr>\n",
 377 |               "    <tr>\n",
 378 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">1</td>\n",
 379 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">1</td>\n",
 380 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">11</td>\n",
 381 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">3375</td>\n",
 382 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">3662</td>\n",
 383 |               "    </tr>\n",
 384 |               "    <tr>\n",
 385 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">1</td>\n",
 386 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">1</td>\n",
 387 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">6</td>\n",
 388 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">3663</td>\n",
 389 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">4538</td>\n",
 390 |               "    </tr>\n",
 391 |               "    <tr>\n",
 392 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">1</td>\n",
 393 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">1</td>\n",
 394 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">10</td>\n",
 395 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">4539</td>\n",
 396 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">4735</td>\n",
 397 |               "    </tr>\n",
 398 |               "    <tr>\n",
 399 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">1</td>\n",
 400 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">1</td>\n",
 401 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">4</td>\n",
 402 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">4736</td>\n",
 403 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">5667</td>\n",
 404 |               "    </tr>\n",
 405 |               "    <tr>\n",
 406 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">1</td>\n",
 407 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">1</td>\n",
 408 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">9</td>\n",
 409 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">5668</td>\n",
 410 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">5859</td>\n",
 411 |               "    </tr>\n",
 412 |               "</table>\n",
 413 |               "[10 rows x 5 columns]<br/>\n",
 414 |               "</div>"
 415 |             ],
 416 |             "text/plain": [
 417 |               "Columns:\n",
 418 |               "\texp_id\tint\n",
 419 |               "\tuser_id\tint\n",
 420 |               "\tactivity_id\tint\n",
 421 |               "\tstart\tint\n",
 422 |               "\tend\tint\n",
 423 |               "\n",
 424 |               "Rows: 10\n",
 425 |               "\n",
 426 |               "Data:\n",
 427 |               "+--------+---------+-------------+-------+------+\n",
 428 |               "| exp_id | user_id | activity_id | start | end  |\n",
 429 |               "+--------+---------+-------------+-------+------+\n",
 430 |               "|   1    |    1    |      5      |  250  | 1232 |\n",
 431 |               "|   1    |    1    |      7      |  1233 | 1392 |\n",
 432 |               "|   1    |    1    |      4      |  1393 | 2194 |\n",
 433 |               "|   1    |    1    |      8      |  2195 | 2359 |\n",
 434 |               "|   1    |    1    |      5      |  2360 | 3374 |\n",
 435 |               "|   1    |    1    |      11     |  3375 | 3662 |\n",
 436 |               "|   1    |    1    |      6      |  3663 | 4538 |\n",
 437 |               "|   1    |    1    |      10     |  4539 | 4735 |\n",
 438 |               "|   1    |    1    |      4      |  4736 | 5667 |\n",
 439 |               "|   1    |    1    |      9      |  5668 | 5859 |\n",
 440 |               "+--------+---------+-------------+-------+------+\n",
 441 |               "[10 rows x 5 columns]"
 442 |             ]
 443 |           },
 444 |           "metadata": {
 445 |             "tags": []
 446 |           },
 447 |           "execution_count": 11
 448 |         }
 449 |       ]
 450 |     },
 451 |     {
 452 |       "metadata": {
 453 |         "id": "a_oKPOf6cvAI",
 454 |         "colab_type": "text"
 455 |       },
 456 |       "cell_type": "markdown",
 457 |       "source": [
  458 |         "Next, we need to get the accelerometer and gyroscope data for each experiment. For each experiment, every sensor's data is in a separate file. In the code below we load the accelerometer and gyroscope data from all experiments into a single SFrame. While loading the collected samples, we also calculate the label for each sample using our previously defined function. The final SFrame contains a column named exp_id that identifies each unique session."
 459 |       ]
 460 |     },
 461 |     {
 462 |       "metadata": {
 463 |         "id": "MVO9uGTocvqe",
 464 |         "colab_type": "code",
 465 |         "colab": {}
 466 |       },
 467 |       "cell_type": "code",
 468 |       "source": [
 469 |         "from glob import glob\n",
 470 |         "\n",
 471 |         "acc_files = glob(data_dir + 'acc_*.txt')\n",
 472 |         "gyro_files = glob(data_dir + 'gyro_*.txt')\n",
 473 |         "\n",
 474 |         "# Load data\n",
 475 |         "data = tc.SFrame()\n",
 476 |         "files = zip(sorted(acc_files), sorted(gyro_files))\n",
 477 |         "for acc_file, gyro_file in files:\n",
 478 |         "    exp_id = int(acc_file.split('_')[1][-2:])\n",
 479 |         "\n",
 480 |         "    # Load accel data\n",
 481 |         "    sf = tc.SFrame.read_csv(acc_file, delimiter=' ', header=False, verbose=False)\n",
 482 |         "    sf = sf.rename({'X1': 'acc_x', 'X2': 'acc_y', 'X3': 'acc_z'})\n",
 483 |         "    sf['exp_id'] = exp_id\n",
 484 |         "\n",
 485 |         "    # Load gyro data\n",
 486 |         "    gyro_sf = tc.SFrame.read_csv(gyro_file, delimiter=' ', header=False, verbose=False)\n",
 487 |         "    gyro_sf = gyro_sf.rename({'X1': 'gyro_x', 'X2': 'gyro_y', 'X3': 'gyro_z'})\n",
 488 |         "    sf = sf.add_columns(gyro_sf)\n",
 489 |         "\n",
 490 |         "    # Calc labels\n",
 491 |         "    exp_labels = labels[labels['exp_id'] == exp_id][['activity_id', 'start', 'end']].to_numpy()\n",
 492 |         "    sf = sf.add_row_number()\n",
 493 |         "    sf['activity_id'] = sf['id'].apply(lambda x: find_label_for_containing_interval(exp_labels, x))\n",
 494 |         "    sf = sf.remove_columns(['id'])\n",
 495 |         "\n",
 496 |         "    data = data.append(sf)"
 497 |       ],
 498 |       "execution_count": 0,
 499 |       "outputs": []
 500 |     },
 501 |     {
 502 |       "metadata": {
 503 |         "id": "UVT3HRrQc4zw",
 504 |         "colab_type": "text"
 505 |       },
 506 |       "cell_type": "markdown",
 507 |       "source": [
  508 |         "Finally, we keep only the six basic activities, map their numeric IDs to readable string labels, and save the resulting SFrame."
 509 |       ]
 510 |     },
 511 |     {
 512 |       "metadata": {
 513 |         "id": "rWRH6Mtnc5sU",
 514 |         "colab_type": "code",
 515 |         "colab": {}
 516 |       },
 517 |       "cell_type": "code",
 518 |       "source": [
 519 |         "target_map = {\n",
 520 |         "    1.: 'walking',          \n",
 521 |         "    2.: 'climbing_upstairs',\n",
 522 |         "    3.: 'climbing_downstairs',\n",
 523 |         "    4.: 'sitting',\n",
 524 |         "    5.: 'standing',\n",
 525 |         "    6.: 'laying'\n",
 526 |         "}\n",
 527 |         "\n",
 528 |         "# Use the same labels used in the experiment\n",
 529 |         "data = data.filter_by(list(target_map.keys()), 'activity_id')\n",
 530 |         "data['activity'] = data['activity_id'].apply(lambda x: target_map[x])\n",
 531 |         "data = data.remove_column('activity_id')\n",
 532 |         "\n",
 533 |         "data.save('hapt_data.sframe')"
 534 |       ],
 535 |       "execution_count": 0,
 536 |       "outputs": []
 537 |     },
 538 |     {
 539 |       "metadata": {
 540 |         "id": "fprN0DI7eC0a",
 541 |         "colab_type": "code",
 542 |         "colab": {
 543 |           "base_uri": "https://localhost:8080/",
 544 |           "height": 442
 545 |         },
 546 |         "outputId": "f2b416da-05b8-4f5b-8bc7-41940d1655a3"
 547 |       },
 548 |       "cell_type": "code",
 549 |       "source": [
 550 |         "data.head()"
 551 |       ],
 552 |       "execution_count": 15,
 553 |       "outputs": [
 554 |         {
 555 |           "output_type": "execute_result",
 556 |           "data": {
 557 |             "text/html": [
 558 |               "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\"><table frame=\"box\" rules=\"cols\">\n",
 559 |               "    <tr>\n",
 560 |               "        <th style=\"padding-left: 1em; padding-right: 1em; text-align: center\">acc_x</th>\n",
 561 |               "        <th style=\"padding-left: 1em; padding-right: 1em; text-align: center\">acc_y</th>\n",
 562 |               "        <th style=\"padding-left: 1em; padding-right: 1em; text-align: center\">acc_z</th>\n",
 563 |               "        <th style=\"padding-left: 1em; padding-right: 1em; text-align: center\">exp_id</th>\n",
 564 |               "        <th style=\"padding-left: 1em; padding-right: 1em; text-align: center\">gyro_x</th>\n",
 565 |               "    </tr>\n",
 566 |               "    <tr>\n",
 567 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">1.020833394742025</td>\n",
 568 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">-0.1250000020616516</td>\n",
 569 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">0.105555564319952</td>\n",
 570 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">1</td>\n",
 571 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">-0.002748893573880196</td>\n",
 572 |               "    </tr>\n",
 573 |               "    <tr>\n",
 574 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">1.025000070391787</td>\n",
 575 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">-0.1250000020616516</td>\n",
 576 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">0.1013888947481719</td>\n",
 577 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">1</td>\n",
 578 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">-0.0003054326225537807</td>\n",
 579 |               "    </tr>\n",
 580 |               "    <tr>\n",
 581 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">1.020833394742025</td>\n",
 582 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">-0.1250000020616516</td>\n",
 583 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">0.1041666724366978</td>\n",
 584 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">1</td>\n",
 585 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">0.01221730466932058</td>\n",
 586 |               "    </tr>\n",
 587 |               "    <tr>\n",
 588 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">1.016666719092262</td>\n",
 589 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">-0.1250000020616516</td>\n",
 590 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">0.1083333359304957</td>\n",
 591 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">1</td>\n",
 592 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">0.01130100712180138</td>\n",
 593 |               "    </tr>\n",
 594 |               "    <tr>\n",
 595 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">1.018055610975516</td>\n",
 596 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">-0.1277777858281599</td>\n",
 597 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">0.1083333359304957</td>\n",
 598 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">1</td>\n",
 599 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">0.01099557429552078</td>\n",
 600 |               "    </tr>\n",
 601 |               "    <tr>\n",
 602 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">1.018055610975516</td>\n",
 603 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">-0.1291666655554495</td>\n",
 604 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">0.1041666724366978</td>\n",
 605 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">1</td>\n",
 606 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">0.009162978269159794</td>\n",
 607 |               "    </tr>\n",
 608 |               "    <tr>\n",
 609 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">1.01944450285877</td>\n",
 610 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">-0.1250000020616516</td>\n",
 611 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">0.1013888947481719</td>\n",
 612 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">1</td>\n",
 613 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">0.01007927674800158</td>\n",
 614 |               "    </tr>\n",
 615 |               "    <tr>\n",
 616 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">1.016666719092262</td>\n",
 617 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">-0.1236111101783975</td>\n",
 618 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">0.09722222517639174</td>\n",
 619 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">1</td>\n",
 620 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">0.01374446786940098</td>\n",
 621 |               "    </tr>\n",
 622 |               "    <tr>\n",
 623 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">1.020833394742025</td>\n",
 624 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">-0.1277777858281599</td>\n",
 625 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">0.09861111705964588</td>\n",
 626 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">1</td>\n",
 627 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">0.009773843921720982</td>\n",
 628 |               "    </tr>\n",
 629 |               "    <tr>\n",
 630 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">1.01944450285877</td>\n",
 631 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">-0.1152777831908018</td>\n",
 632 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">0.09444444748786576</td>\n",
 633 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">1</td>\n",
 634 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">0.01649336144328117</td>\n",
 635 |               "    </tr>\n",
 636 |               "</table>\n",
 637 |               "<table frame=\"box\" rules=\"cols\">\n",
 638 |               "    <tr>\n",
 639 |               "        <th style=\"padding-left: 1em; padding-right: 1em; text-align: center\">gyro_y</th>\n",
 640 |               "        <th style=\"padding-left: 1em; padding-right: 1em; text-align: center\">gyro_z</th>\n",
 641 |               "        <th style=\"padding-left: 1em; padding-right: 1em; text-align: center\">activity</th>\n",
 642 |               "    </tr>\n",
 643 |               "    <tr>\n",
 644 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">-0.00427605677396059</td>\n",
 645 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">0.002748893573880196</td>\n",
 646 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">standing</td>\n",
 647 |               "    </tr>\n",
 648 |               "    <tr>\n",
 649 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">-0.002138028386980295</td>\n",
 650 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">0.006108652334660292</td>\n",
 651 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">standing</td>\n",
 652 |               "    </tr>\n",
 653 |               "    <tr>\n",
 654 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">0.0009162978967651724</td>\n",
 655 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">-0.00733038317412138</td>\n",
 656 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">standing</td>\n",
 657 |               "    </tr>\n",
 658 |               "    <tr>\n",
 659 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">-0.001832595793530345</td>\n",
 660 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">-0.006414085160940886</td>\n",
 661 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">standing</td>\n",
 662 |               "    </tr>\n",
 663 |               "    <tr>\n",
 664 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">-0.001527163083665073</td>\n",
 665 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">-0.004886921960860491</td>\n",
 666 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">standing</td>\n",
 667 |               "    </tr>\n",
 668 |               "    <tr>\n",
 669 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">-0.003054326167330146</td>\n",
 670 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">0.01007927674800158</td>\n",
 671 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">standing</td>\n",
 672 |               "    </tr>\n",
 673 |               "    <tr>\n",
 674 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">-0.00366519158706069</td>\n",
 675 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">0.0003054326225537807</td>\n",
 676 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">standing</td>\n",
 677 |               "    </tr>\n",
 678 |               "    <tr>\n",
 679 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">-0.01496619824320078</td>\n",
 680 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">0.00427605677396059</td>\n",
 681 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">standing</td>\n",
 682 |               "    </tr>\n",
 683 |               "    <tr>\n",
 684 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">-0.006414085160940886</td>\n",
 685 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">0.0003054326225537807</td>\n",
 686 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">standing</td>\n",
 687 |               "    </tr>\n",
 688 |               "    <tr>\n",
 689 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">0.00366519158706069</td>\n",
 690 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">0.003359758760780096</td>\n",
 691 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">standing</td>\n",
 692 |               "    </tr>\n",
 693 |               "</table>\n",
 694 |               "[10 rows x 8 columns]<br/>\n",
 695 |               "</div>"
 696 |             ],
 697 |             "text/plain": [
 698 |               "Columns:\n",
 699 |               "\tacc_x\tfloat\n",
 700 |               "\tacc_y\tfloat\n",
 701 |               "\tacc_z\tfloat\n",
 702 |               "\texp_id\tint\n",
 703 |               "\tgyro_x\tfloat\n",
 704 |               "\tgyro_y\tfloat\n",
 705 |               "\tgyro_z\tfloat\n",
 706 |               "\tactivity\tstr\n",
 707 |               "\n",
 708 |               "Rows: 10\n",
 709 |               "\n",
 710 |               "Data:\n",
 711 |               "+-------------------+---------------------+---------------------+--------+\n",
 712 |               "|       acc_x       |        acc_y        |        acc_z        | exp_id |\n",
 713 |               "+-------------------+---------------------+---------------------+--------+\n",
 714 |               "| 1.020833394742025 | -0.1250000020616516 |  0.105555564319952  |   1    |\n",
 715 |               "| 1.025000070391787 | -0.1250000020616516 |  0.1013888947481719 |   1    |\n",
 716 |               "| 1.020833394742025 | -0.1250000020616516 |  0.1041666724366978 |   1    |\n",
 717 |               "| 1.016666719092262 | -0.1250000020616516 |  0.1083333359304957 |   1    |\n",
 718 |               "| 1.018055610975516 | -0.1277777858281599 |  0.1083333359304957 |   1    |\n",
 719 |               "| 1.018055610975516 | -0.1291666655554495 |  0.1041666724366978 |   1    |\n",
 720 |               "|  1.01944450285877 | -0.1250000020616516 |  0.1013888947481719 |   1    |\n",
 721 |               "| 1.016666719092262 | -0.1236111101783975 | 0.09722222517639174 |   1    |\n",
 722 |               "| 1.020833394742025 | -0.1277777858281599 | 0.09861111705964588 |   1    |\n",
 723 |               "|  1.01944450285877 | -0.1152777831908018 | 0.09444444748786576 |   1    |\n",
 724 |               "+-------------------+---------------------+---------------------+--------+\n",
 725 |               "+------------------------+-----------------------+-----------------------+\n",
 726 |               "|         gyro_x         |         gyro_y        |         gyro_z        |\n",
 727 |               "+------------------------+-----------------------+-----------------------+\n",
 728 |               "| -0.002748893573880196  |  -0.00427605677396059 |  0.002748893573880196 |\n",
 729 |               "| -0.0003054326225537807 | -0.002138028386980295 |  0.006108652334660292 |\n",
 730 |               "|  0.01221730466932058   | 0.0009162978967651724 |  -0.00733038317412138 |\n",
 731 |               "|  0.01130100712180138   | -0.001832595793530345 | -0.006414085160940886 |\n",
 732 |               "|  0.01099557429552078   | -0.001527163083665073 | -0.004886921960860491 |\n",
 733 |               "|  0.009162978269159794  | -0.003054326167330146 |  0.01007927674800158  |\n",
 734 |               "|  0.01007927674800158   |  -0.00366519158706069 | 0.0003054326225537807 |\n",
 735 |               "|  0.01374446786940098   |  -0.01496619824320078 |  0.00427605677396059  |\n",
 736 |               "|  0.009773843921720982  | -0.006414085160940886 | 0.0003054326225537807 |\n",
 737 |               "|  0.01649336144328117   |  0.00366519158706069  |  0.003359758760780096 |\n",
 738 |               "+------------------------+-----------------------+-----------------------+\n",
 739 |               "+----------+\n",
 740 |               "| activity |\n",
 741 |               "+----------+\n",
 742 |               "| standing |\n",
 743 |               "| standing |\n",
 744 |               "| standing |\n",
 745 |               "| standing |\n",
 746 |               "| standing |\n",
 747 |               "| standing |\n",
 748 |               "| standing |\n",
 749 |               "| standing |\n",
 750 |               "| standing |\n",
 751 |               "| standing |\n",
 752 |               "+----------+\n",
 753 |               "[10 rows x 8 columns]"
 754 |             ]
 755 |           },
 756 |           "metadata": {
 757 |             "tags": []
 758 |           },
 759 |           "execution_count": 15
 760 |         }
 761 |       ]
 762 |     },
 763 |     {
 764 |       "metadata": {
 765 |         "id": "mI82V7dvho-6",
 766 |         "colab_type": "code",
 767 |         "colab": {
 768 |           "base_uri": "https://localhost:8080/",
 769 |           "height": 164
 770 |         },
 771 |         "outputId": "2c8c5bc4-3871-40ed-feb6-8e30f1e334f8"
 772 |       },
 773 |       "cell_type": "code",
 774 |       "source": [
 775 |         "data.groupby('activity', [tc.aggregate.COUNT]).sort('Count', ascending=False)"
 776 |       ],
 777 |       "execution_count": 16,
 778 |       "outputs": [
 779 |         {
 780 |           "output_type": "execute_result",
 781 |           "data": {
 782 |             "text/html": [
 783 |               "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\"><table frame=\"box\" rules=\"cols\">\n",
 784 |               "    <tr>\n",
 785 |               "        <th style=\"padding-left: 1em; padding-right: 1em; text-align: center\">activity</th>\n",
 786 |               "        <th style=\"padding-left: 1em; padding-right: 1em; text-align: center\">Count</th>\n",
 787 |               "    </tr>\n",
 788 |               "    <tr>\n",
 789 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">standing</td>\n",
 790 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">138105</td>\n",
 791 |               "    </tr>\n",
 792 |               "    <tr>\n",
 793 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">laying</td>\n",
 794 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">136865</td>\n",
 795 |               "    </tr>\n",
 796 |               "    <tr>\n",
 797 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">sitting</td>\n",
 798 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">126677</td>\n",
 799 |               "    </tr>\n",
 800 |               "    <tr>\n",
 801 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">walking</td>\n",
 802 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">122091</td>\n",
 803 |               "    </tr>\n",
 804 |               "    <tr>\n",
 805 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">climbing_upstairs</td>\n",
 806 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">116707</td>\n",
 807 |               "    </tr>\n",
 808 |               "    <tr>\n",
 809 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">climbing_downstairs</td>\n",
 810 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">107961</td>\n",
 811 |               "    </tr>\n",
 812 |               "</table>\n",
 813 |               "[6 rows x 2 columns]<br/>\n",
 814 |               "</div>"
 815 |             ],
 816 |             "text/plain": [
 817 |               "Columns:\n",
 818 |               "\tactivity\tstr\n",
 819 |               "\tCount\tint\n",
 820 |               "\n",
 821 |               "Rows: 6\n",
 822 |               "\n",
 823 |               "Data:\n",
 824 |               "+---------------------+--------+\n",
 825 |               "|       activity      | Count  |\n",
 826 |               "+---------------------+--------+\n",
 827 |               "|       standing      | 138105 |\n",
 828 |               "|        laying       | 136865 |\n",
 829 |               "|       sitting       | 126677 |\n",
 830 |               "|       walking       | 122091 |\n",
 831 |               "|  climbing_upstairs  | 116707 |\n",
 832 |               "| climbing_downstairs | 107961 |\n",
 833 |               "+---------------------+--------+\n",
 834 |               "[6 rows x 2 columns]"
 835 |             ]
 836 |           },
 837 |           "metadata": {
 838 |             "tags": []
 839 |           },
 840 |           "execution_count": 16
 841 |         }
 842 |       ]
 843 |     },
 844 |     {
 845 |       "metadata": {
 846 |         "id": "ikL5gDh-QQW5",
 847 |         "colab_type": "text"
 848 |       },
 849 |       "cell_type": "markdown",
 850 |       "source": [
 851 |         "## Example Activity Classification - HAPT Data"
 852 |       ]
 853 |     },
 854 |     {
 855 |       "metadata": {
 856 |         "id": "Am3pqXk2e4Yh",
 857 |         "colab_type": "code",
 858 |         "colab": {}
 859 |       },
 860 |       "cell_type": "code",
 861 |       "source": [
 862 |         "# Load sessions from preprocessed data\n",
 863 |         "data = tc.SFrame('hapt_data.sframe')"
 864 |       ],
 865 |       "execution_count": 0,
 866 |       "outputs": []
 867 |     },
 868 |     {
 869 |       "metadata": {
 870 |         "id": "yy_6w2sjtB7C",
 871 |         "colab_type": "code",
 872 |         "colab": {}
 873 |       },
 874 |       "cell_type": "code",
 875 |       "source": [
 876 |         "# Train/test split by recording sessions\n",
 877 |         "train, test = tc.activity_classifier.util.random_split_by_session(data, session_id='exp_id', fraction=0.8)"
 878 |       ],
 879 |       "execution_count": 0,
 880 |       "outputs": []
 881 |     },
 882 |     {
 883 |       "metadata": {
 884 |         "id": "U9KR53TtePFI",
 885 |         "colab_type": "code",
 886 |         "colab": {
 887 |           "base_uri": "https://localhost:8080/",
 888 |           "height": 370
 889 |         },
 890 |         "outputId": "35ed0af1-d681-468d-d837-0f8c363f2d08"
 891 |       },
 892 |       "cell_type": "code",
 893 |       "source": [
 894 |         "# Create an activity classifier\n",
 895 |         "model = tc.activity_classifier.create(train, session_id='exp_id', target='activity', prediction_window=50)"
 896 |       ],
 897 |       "execution_count": 7,
 898 |       "outputs": [
 899 |         {
 900 |           "output_type": "stream",
 901 |           "text": [
 902 |             "The dataset has less than the minimum of 100 sessions required for train-validation split. Continuing without validation set\n"
 903 |           ],
 904 |           "name": "stdout"
 905 |         },
 906 |         {
 907 |           "output_type": "display_data",
 908 |           "data": {
 909 |             "text/html": [
 910 |               "<pre>Pre-processing 585143 samples...</pre>"
 911 |             ],
 912 |             "text/plain": [
 913 |               "Pre-processing 585143 samples..."
 914 |             ]
 915 |           },
 916 |           "metadata": {
 917 |             "tags": []
 918 |           }
 919 |         },
 920 |         {
 921 |           "output_type": "display_data",
 922 |           "data": {
 923 |             "text/html": [
 924 |               "<pre>Using sequences of size 1000 for model creation.</pre>"
 925 |             ],
 926 |             "text/plain": [
 927 |               "Using sequences of size 1000 for model creation."
 928 |             ]
 929 |           },
 930 |           "metadata": {
 931 |             "tags": []
 932 |           }
 933 |         },
 934 |         {
 935 |           "output_type": "display_data",
 936 |           "data": {
 937 |             "text/html": [
 938 |               "<pre>Processed a total of 48 sessions.</pre>"
 939 |             ],
 940 |             "text/plain": [
 941 |               "Processed a total of 48 sessions."
 942 |             ]
 943 |           },
 944 |           "metadata": {
 945 |             "tags": []
 946 |           }
 947 |         },
 948 |         {
 949 |           "output_type": "stream",
 950 |           "text": [
 951 |             "Using GPU to create model (CUDA)\n",
 952 |             "+----------------+----------------+----------------+----------------+\n",
 953 |             "| Iteration      | Train Accuracy | Train Loss     | Elapsed Time   |\n",
 954 |             "+----------------+----------------+----------------+----------------+\n",
 955 |             "| 1              | 0.623          | 0.977          | 0.6            |\n",
 956 |             "| 2              | 0.810          | 0.541          | 1.2            |\n",
 957 |             "| 3              | 0.846          | 0.412          | 1.8            |\n",
 958 |             "| 4              | 0.863          | 0.359          | 2.4            |\n",
 959 |             "| 5              | 0.873          | 0.322          | 3.0            |\n",
 960 |             "| 6              | 0.889          | 0.293          | 3.6            |\n",
 961 |             "| 7              | 0.895          | 0.264          | 4.2            |\n",
 962 |             "| 8              | 0.902          | 0.242          | 4.8            |\n",
 963 |             "| 9              | 0.911          | 0.224          | 5.4            |\n",
 964 |             "| 10             | 0.916          | 0.208          | 6.0            |\n",
 965 |             "+----------------+----------------+----------------+----------------+\n",
 966 |             "Training complete\n",
 967 |             "Total Time Spent: 5.95675s\n"
 968 |           ],
 969 |           "name": "stdout"
 970 |         }
 971 |       ]
 972 |     },
 973 |     {
 974 |       "metadata": {
 975 |         "id": "wEDXZ0VaeQf6",
 976 |         "colab_type": "code",
 977 |         "colab": {
 978 |           "base_uri": "https://localhost:8080/",
 979 |           "height": 910
 980 |         },
 981 |         "outputId": "5a9ea21d-57b5-4413-a05f-397db276f7a9"
 982 |       },
 983 |       "cell_type": "code",
 984 |       "source": [
 985 |         "# Evaluate the model and save the results into a dictionary\n",
 986 |         "metrics = model.evaluate(test)\n",
 987 |         "print(metrics)"
 988 |       ],
 989 |       "execution_count": 17,
 990 |       "outputs": [
 991 |         {
 992 |           "output_type": "stream",
 993 |           "text": [
 994 |             "{'accuracy': 0.9324280455461433, 'auc': 0.994000145914642, 'precision': 0.9344077064028339, 'recall': 0.9313875060212965, 'f1_score': 0.9325884730108659, 'log_loss': 0.2183073822297761, 'confusion_matrix': Columns:\n",
 995 |             "\ttarget_label\tstr\n",
 996 |             "\tpredicted_label\tstr\n",
 997 |             "\tcount\tint\n",
 998 |             "\n",
 999 |             "Rows: 31\n",
1000 |             "\n",
1001 |             "Data:\n",
1002 |             "+---------------------+---------------------+-------+\n",
1003 |             "|     target_label    |   predicted_label   | count |\n",
1004 |             "+---------------------+---------------------+-------+\n",
1005 |             "| climbing_downstairs |  climbing_upstairs  |  640  |\n",
1006 |             "|  climbing_upstairs  |  climbing_upstairs  | 23073 |\n",
1007 |             "|  climbing_upstairs  | climbing_downstairs |  1048 |\n",
1008 |             "|        laying       |       walking       |   31  |\n",
1009 |             "| climbing_downstairs | climbing_downstairs | 21240 |\n",
1010 |             "|       sitting       |       standing      |  3346 |\n",
1011 |             "|        laying       |        laying       | 29554 |\n",
1012 |             "|       standing      |       standing      | 29827 |\n",
1013 |             "|       walking       |       sitting       |  1351 |\n",
1014 |             "|       sitting       |       sitting       | 25134 |\n",
1015 |             "+---------------------+---------------------+-------+\n",
1016 |             "[31 rows x 3 columns]\n",
1017 |             "Note: Only the head of the SFrame is printed.\n",
1018 |             "You can use print_rows(num_rows=m, num_columns=n) to print more rows and columns., 'roc_curve': Columns:\n",
1019 |             "\tthreshold\tfloat\n",
1020 |             "\tfpr\tfloat\n",
1021 |             "\ttpr\tfloat\n",
1022 |             "\tp\tint\n",
1023 |             "\tn\tint\n",
1024 |             "\tclass\tstr\n",
1025 |             "\n",
1026 |             "Rows: 600006\n",
1027 |             "\n",
1028 |             "Data:\n",
1029 |             "+-----------+--------------------+-----+-------+--------+---------------------+\n",
1030 |             "| threshold |        fpr         | tpr |   p   |   n    |        class        |\n",
1031 |             "+-----------+--------------------+-----+-------+--------+---------------------+\n",
1032 |             "|    0.0    |        1.0         | 1.0 | 23039 | 140224 | climbing_downstairs |\n",
1033 |             "|   1e-05   | 0.9458010041077134 | 1.0 | 23039 | 140224 | climbing_downstairs |\n",
1034 |             "|   2e-05   | 0.9026557507987221 | 1.0 | 23039 | 140224 | climbing_downstairs |\n",
1035 |             "|   3e-05   | 0.8837574167047011 | 1.0 | 23039 | 140224 | climbing_downstairs |\n",
1036 |             "|   4e-05   | 0.8687813783660429 | 1.0 | 23039 | 140224 | climbing_downstairs |\n",
1037 |             "|   5e-05   | 0.8470304655408489 | 1.0 | 23039 | 140224 | climbing_downstairs |\n",
1038 |             "|   6e-05   | 0.8288452761296212 | 1.0 | 23039 | 140224 | climbing_downstairs |\n",
1039 |             "|   7e-05   | 0.813869237790963  | 1.0 | 23039 | 140224 | climbing_downstairs |\n",
1040 |             "|   8e-05   | 0.8056680739388408 | 1.0 | 23039 | 140224 | climbing_downstairs |\n",
1041 |             "|   9e-05   | 0.797823482428115  | 1.0 | 23039 | 140224 | climbing_downstairs |\n",
1042 |             "+-----------+--------------------+-----+-------+--------+---------------------+\n",
1043 |             "[600006 rows x 6 columns]\n",
1044 |             "Note: Only the head of the SFrame is printed.\n",
1045 |             "You can use print_rows(num_rows=m, num_columns=n) to print more rows and columns.}\n"
1046 |           ],
1047 |           "name": "stdout"
1048 |         }
1049 |       ]
1050 |     },
1051 |     {
1052 |       "metadata": {
1053 |         "id": "0211dGgJeRsh",
1054 |         "colab_type": "code",
1055 |         "colab": {
1056 |           "base_uri": "https://localhost:8080/",
1057 |           "height": 34
1058 |         },
1059 |         "outputId": "8ec5c45c-5bd3-4c8c-f691-81fd333168dc"
1060 |       },
1061 |       "cell_type": "code",
1062 |       "source": [
1063 |         "print(metrics['accuracy'])"
1064 |       ],
1065 |       "execution_count": 9,
1066 |       "outputs": [
1067 |         {
1068 |           "output_type": "stream",
1069 |           "text": [
1070 |             "0.9324280455461433\n"
1071 |           ],
1072 |           "name": "stdout"
1073 |         }
1074 |       ]
1075 |     },
1076 |     {
1077 |       "metadata": {
1078 |         "id": "4ErNlq_vfhby",
1079 |         "colab_type": "text"
1080 |       },
1081 |       "cell_type": "markdown",
1082 |       "source": [
1083 |         "Since we created the model with samples taken at 50Hz and set the prediction_window to 50, we get one prediction per second. Invoking our newly created model on the above 3-second walking example produces the following per-second predictions:"
1084 |       ]
1085 |     },
1086 |     {
1087 |       "metadata": {
1088 |         "id": "0wldXO3kfjdv",
1089 |         "colab_type": "code",
1090 |         "colab": {
1091 |           "base_uri": "https://localhost:8080/",
1092 |           "height": 109
1093 |         },
1094 |         "outputId": "772997a2-0d54-41da-c1b7-20428cf7cabd"
1095 |       },
1096 |       "cell_type": "code",
1097 |       "source": [
1098 |         "walking_3_sec = data[(data['activity'] == 'walking') & (data['exp_id'] == 1)][1000:1150]\n",
1099 |         "model.predict(walking_3_sec, output_frequency='per_window')"
1100 |       ],
1101 |       "execution_count": 12,
1102 |       "outputs": [
1103 |         {
1104 |           "output_type": "execute_result",
1105 |           "data": {
1106 |             "text/html": [
1107 |               "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\"><table frame=\"box\" rules=\"cols\">\n",
1108 |               "    <tr>\n",
1109 |               "        <th style=\"padding-left: 1em; padding-right: 1em; text-align: center\">prediction_id</th>\n",
1110 |               "        <th style=\"padding-left: 1em; padding-right: 1em; text-align: center\">exp_id</th>\n",
1111 |               "        <th style=\"padding-left: 1em; padding-right: 1em; text-align: center\">class</th>\n",
1112 |               "    </tr>\n",
1113 |               "    <tr>\n",
1114 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">0</td>\n",
1115 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">1</td>\n",
1116 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">walking</td>\n",
1117 |               "    </tr>\n",
1118 |               "    <tr>\n",
1119 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">1</td>\n",
1120 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">1</td>\n",
1121 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">walking</td>\n",
1122 |               "    </tr>\n",
1123 |               "    <tr>\n",
1124 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">2</td>\n",
1125 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">1</td>\n",
1126 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">walking</td>\n",
1127 |               "    </tr>\n",
1128 |               "</table>\n",
1129 |               "[3 rows x 3 columns]<br/>\n",
1130 |               "</div>"
1131 |             ],
1132 |             "text/plain": [
1133 |               "Columns:\n",
1134 |               "\tprediction_id\tint\n",
1135 |               "\texp_id\tint\n",
1136 |               "\tclass\tstr\n",
1137 |               "\n",
1138 |               "Rows: 3\n",
1139 |               "\n",
1140 |               "Data:\n",
1141 |               "+---------------+--------+---------+\n",
1142 |               "| prediction_id | exp_id |  class  |\n",
1143 |               "+---------------+--------+---------+\n",
1144 |               "|       0       |   1    | walking |\n",
1145 |               "|       1       |   1    | walking |\n",
1146 |               "|       2       |   1    | walking |\n",
1147 |               "+---------------+--------+---------+\n",
1148 |               "[3 rows x 3 columns]"
1149 |             ]
1150 |           },
1151 |           "metadata": {
1152 |             "tags": []
1153 |           },
1154 |           "execution_count": 12
1155 |         }
1156 |       ]
1157 |     },
1158 |     {
1159 |       "metadata": {
1160 |         "id": "8osqXiCQP2Vk",
1161 |         "colab_type": "text"
1162 |       },
1163 |       "cell_type": "markdown",
1164 |       "source": [
1165 |         "## Save / Export Model"
1166 |       ]
1167 |     },
1168 |     {
1169 |       "metadata": {
1170 |         "id": "skcTzbLiP0NU",
1171 |         "colab_type": "code",
1172 |         "colab": {}
1173 |       },
1174 |       "cell_type": "code",
1175 |       "source": [
1176 |         "# Save the model for later use in Turi Create\n",
1177 |         "model.save('ActivityClassifier.model')"
1178 |       ],
1179 |       "execution_count": 0,
1180 |       "outputs": []
1181 |     },
1182 |     {
1183 |       "metadata": {
1184 |         "id": "gsLbJ0M5P1iR",
1185 |         "colab_type": "code",
1186 |         "colab": {
1187 |           "base_uri": "https://localhost:8080/",
1188 |           "height": 67
1189 |         },
1190 |         "outputId": "3cec968c-7f20-4ba8-fb8e-bb806ae84979"
1191 |       },
1192 |       "cell_type": "code",
1193 |       "source": [
1194 |         "# Export for use in Core ML\n",
1195 |         "model.export_coreml('ActivityClassifier.mlmodel')"
1196 |       ],
1197 |       "execution_count": 11,
1198 |       "outputs": [
1199 |         {
1200 |           "output_type": "stream",
1201 |           "text": [
1202 |             "/usr/local/lib/python3.6/dist-packages/coremltools/_deps/__init__.py:118: DeprecationWarning: The 'warn' function is deprecated, use 'warning' instead\n",
1203 |             "  % (tensorflow.__version__, TF_MAX_VERSION))\n",
1204 |             "WARNING:root:TensorFlow version 1.10.1 detected. Last version known to be fully compatible is 1.5.0 .\n"
1205 |           ],
1206 |           "name": "stderr"
1207 |         }
1208 |       ]
1209 |     },
1210 |     {
1211 |       "metadata": {
1212 |         "id": "y9zcbtW7fuPS",
1213 |         "colab_type": "code",
1214 |         "colab": {}
1215 |       },
1216 |       "cell_type": "code",
1217 |       "source": [
1218 |         "# download mlmodel locally\n",
1219 |         "from google.colab import files\n",
1220 |         "files.download(\"ActivityClassifier.mlmodel\")"
1221 |       ],
1222 |       "execution_count": 0,
1223 |       "outputs": []
1224 |     },
1225 |     {
1226 |       "metadata": {
1227 |         "id": "BZiVwtK1fy9F",
1228 |         "colab_type": "code",
1229 |         "colab": {
1230 |           "base_uri": "https://localhost:8080/",
1231 |           "height": 34
1232 |         },
1233 |         "outputId": "531297da-e360-4bfb-8b09-6d424bb3c584"
1234 |       },
1235 |       "cell_type": "code",
1236 |       "source": [
1237 |         "# copy the exported Core ML model file to Google Drive\n",
1238 |         "from shutil import copy\n",
1239 |         "copy(\"/content/ActivityClassifier.mlmodel\", \"/content/drive/My Drive/Colab Notebooks/data/models/ActivityClassifier.mlmodel\")"
1240 |       ],
1241 |       "execution_count": 13,
1242 |       "outputs": [
1243 |         {
1244 |           "output_type": "execute_result",
1245 |           "data": {
1246 |             "text/plain": [
1247 |               "'/content/drive/Colab Notebooks/data/models/ActivityClassifier.mlmodel'"
1248 |             ]
1249 |           },
1250 |           "metadata": {
1251 |             "tags": []
1252 |           },
1253 |           "execution_count": 13
1254 |         }
1255 |       ]
1256 |     },
1257 |     {
1258 |       "metadata": {
1259 |         "id": "kJeYR7haf2b6",
1260 |         "colab_type": "code",
1261 |         "colab": {
1262 |           "base_uri": "https://localhost:8080/",
1263 |           "height": 34
1264 |         },
1265 |         "outputId": "4688a2cf-9b58-4f12-c2d5-ea8a50a795df"
1266 |       },
1267 |       "cell_type": "code",
1268 |       "source": [
1269 |         "# copy the saved Turi Create model directory to Google Drive\n",
1270 |         "from shutil import copytree\n",
1271 |         "copytree(\"/content/ActivityClassifier.model\", \"/content/drive/My Drive/Colab Notebooks/data/models/ActivityClassifier.model\")"
1272 |       ],
1273 |       "execution_count": 15,
1274 |       "outputs": [
1275 |         {
1276 |           "output_type": "execute_result",
1277 |           "data": {
1278 |             "text/plain": [
1279 |               "'/content/drive/Colab Notebooks/data/models/ActivityClassifier.model'"
1280 |             ]
1281 |           },
1282 |           "metadata": {
1283 |             "tags": []
1284 |           },
1285 |           "execution_count": 15
1286 |         }
1287 |       ]
1288 |     },
1289 |     {
1290 |       "metadata": {
1291 |         "id": "Q7RgoTcpgaKq",
1292 |         "colab_type": "text"
1293 |       },
1294 |       "cell_type": "markdown",
1295 |       "source": [
1296 |         "## How does this work?\n",
1297 |         "\n",
1298 |         "The deep learning model relies on convolutional layers to extract temporal features from a single prediction window; for example, an arching movement could be a strong indicator of swimming. It also relies on recurrent layers to extract temporal features over time; for example, if a subject was swimming at the previous timestamp, they are most likely not skydiving at the next. Below is a sketch of the neural network used for the activity classifier in Turi Create.\n",
1299 |         "\n",
1300 |         "![deep learning model](https://apple.github.io/turicreate/docs/userguide/activity_classifier/images/activity_classifier_network.png)\n",
1301 |         "\n",
1302 |         "A single input to the neural network is a session as defined in the previous section. The convolutional layer operates on each prediction window, finding spatial features that may be relevant to the labeled activities. \n",
1303 |         "\n",
1304 |         "![prediction window](https://apple.github.io/turicreate/docs/userguide/activity_classifier/images/convolutional_filter.png)\n",
1305 |         "\n",
1306 |         "The output of the convolutional layer is a vector representation for each prediction window, encoding these learnt features. The recurrent layer takes as input a sequence of these vectors.\n",
1307 |         "\n",
1308 |         "The recurrent layer is specialized for learning temporal features across sequences. For example, it may learn that spatial features associated with walking are more likely to occur after detecting spatial features associated with running. These features are further encoded into the output of the recurrent layer.\n",
1309 |         "\n",
1310 |         "To detect these features along sessions, the recurrent layer takes into account its own state, that is, the output of the recurrent layer for the previous prediction window. The output of the recurrent layer for the current prediction window is turned into a probability vector across all desired activities to produce the final classification."
1311 |       ]
1312 |     },
1313 |     {
1314 |       "metadata": {
1315 |         "id": "licB01Ek33VN",
1316 |         "colab_type": "text"
1317 |       },
1318 |       "cell_type": "markdown",
1319 |       "source": [
1320 |         "Blogs:\n",
1321 |         "*   https://medium.com/@howal/activity-monitoring-with-apples-turi-create-machine-learning-1043ce5b9203"
1322 |       ]
1323 |     }
1324 |   ]
1325 | }


--------------------------------------------------------------------------------
/turicreate_sframes_intro.ipynb:
--------------------------------------------------------------------------------
   1 | {
   2 |   "nbformat": 4,
   3 |   "nbformat_minor": 0,
   4 |   "metadata": {
   5 |     "colab": {
   6 |       "name": "turicreate-sframes-intro.ipynb",
   7 |       "version": "0.3.2",
   8 |       "provenance": [],
   9 |       "collapsed_sections": [],
  10 |       "include_colab_link": true
  11 |     },
  12 |     "kernelspec": {
  13 |       "name": "python3",
  14 |       "display_name": "Python 3"
  15 |     },
  16 |     "accelerator": "GPU"
  17 |   },
  18 |   "cells": [
  19 |     {
  20 |       "cell_type": "markdown",
  21 |       "metadata": {
  22 |         "id": "view-in-github",
  23 |         "colab_type": "text"
  24 |       },
  25 |       "source": [
  26 |         "[View in Colaboratory](https://colab.research.google.com/github/jagatfx/turicreate-colab/blob/master/turicreate_sframes_intro.ipynb)"
  27 |       ]
  28 |     },
  29 |     {
  30 |       "metadata": {
  31 |         "id": "MRYC2Vn02lMd",
  32 |         "colab_type": "text"
  33 |       },
  34 |       "cell_type": "markdown",
  35 |       "source": [
  36 |         "# Introduction to Turi Create SFrames\n",
  37 |         "\n",
  38 |         "https://github.com/apple/turicreate/blob/master/userguide/sframe/sframe-intro.md"
  39 |       ]
  40 |     },
  41 |     {
  42 |       "metadata": {
  43 |         "id": "qo2KqxVs2q8B",
  44 |         "colab_type": "text"
  45 |       },
  46 |       "cell_type": "markdown",
  47 |       "source": [
  48 |         "SFrames are the primary data structure for extracting data from other sources for use in Turi Create.\n",
  49 |         "\n",
  50 |         "They are similar to pandas DataFrames but do not need to be loaded as a whole into RAM, so they are not constrained by the RAM of the machine running the code. This makes the SFrame a scalable data structure. It is column-immutable and supports out-of-core processing.\n",
  51 |         "\n",
  52 |         "SFrames can extract data from the following sources:\n",
  53 |         "\n",
  54 |         "*   CSV\n",
  55 |         "*   JSON\n",
  56 |         "*   SQL databases"
  57 |       ]
  58 |     },
  59 |     {
  60 |       "metadata": {
  61 |         "id": "Ew1_T9B94_ip",
  62 |         "colab_type": "text"
  63 |       },
  64 |       "cell_type": "markdown",
  65 |       "source": [
  66 |         "## Turi Create and GPU Setup"
  67 |       ]
  68 |     },
  69 |     {
  70 |       "metadata": {
  71 |         "id": "DCSsL_vs5BHF",
  72 |         "colab_type": "code",
  73 |         "colab": {}
  74 |       },
  75 |       "cell_type": "code",
  76 |       "source": [
  77 |         "!apt install libnvrtc8.0\n",
  78 |         "!pip uninstall -y mxnet-cu80 && pip install mxnet-cu80==1.1.0\n",
  79 |         "!pip install turicreate"
  80 |       ],
  81 |       "execution_count": 0,
  82 |       "outputs": []
  83 |     },
  84 |     {
  85 |       "metadata": {
  86 |         "id": "-nVQfLiK6U_7",
  87 |         "colab_type": "text"
  88 |       },
  89 |       "cell_type": "markdown",
  90 |       "source": [
  91 |         "## Google Drive Access\n",
  92 |         "\n",
  93 |         "You will be asked to click a link to generate a secret key to access your Google Drive. \n",
  94 |         "\n",
  95 |         "Copy and paste the secret key into the space provided in the notebook."
  96 |       ]
  97 |     },
  98 |     {
  99 |       "metadata": {
 100 |         "id": "y5a7kHU26Vm9",
 101 |         "colab_type": "code",
 102 |         "colab": {}
 103 |       },
 104 |       "cell_type": "code",
 105 |       "source": [
 106 |         "import os.path\n",
 107 |         "from google.colab import drive\n",
 108 |         "\n",
 109 |         "# mount Google Drive to /content/drive/My Drive/\n",
 110 |         "if os.path.isdir(\"/content/drive/My Drive\"):\n",
 111 |         "  print(\"Google Drive already mounted\")\n",
 112 |         "else:\n",
 113 |         "  drive.mount('/content/drive')"
 114 |       ],
 115 |       "execution_count": 0,
 116 |       "outputs": []
 117 |     },
 118 |     {
 119 |       "metadata": {
 120 |         "id": "dTWuiXji9cEK",
 121 |         "colab_type": "text"
 122 |       },
 123 |       "cell_type": "markdown",
 124 |       "source": [
 125 |         "## Fetch Data"
 126 |       ]
 127 |     },
 128 |     {
 129 |       "metadata": {
 130 |         "id": "_q-u2IbDPA2Q",
 131 |         "colab_type": "code",
 132 |         "colab": {}
 133 |       },
 134 |       "cell_type": "code",
 135 |       "source": [
 136 |         "import os.path\n",
 137 |         "import urllib.request\n",
 138 |         "import tarfile\n",
 139 |         "import zipfile\n",
 140 |         "import gzip\n",
 141 |         "from shutil import copy\n",
 142 |         "\n",
 143 |         "def fetch_remote_datafile(filename, remote_url):\n",
 144 |         "  if os.path.isfile(\"./\" + filename):\n",
 145 |         "    print(\"already have \" + filename + \" in workspace\")\n",
 146 |         "    return\n",
 147 |         "  print(\"fetching \" + filename + \" from \" + remote_url + \"...\")\n",
 148 |         "  urllib.request.urlretrieve(remote_url, \"./\" + filename)\n",
 149 |         "\n",
 150 |         "def cache_datafile_in_drive(filename):\n",
 151 |         "  if not os.path.isfile(\"./\" + filename):\n",
 152 |         "    print(\"cannot cache \" + filename + \", it is not in workspace\")\n",
 153 |         "    return\n",
 154 |         "  \n",
 155 |         "  data_drive_path = \"/content/drive/My Drive/Colab Notebooks/data/\"\n",
 156 |         "  if os.path.isfile(data_drive_path + filename):\n",
 157 |         "    print(\"\" + filename + \" has already been stored in Google Drive\")\n",
 158 |         "  else:\n",
 159 |         "    print(\"copying \" + filename + \" to \" + data_drive_path)\n",
 160 |         "    copy(\"./\" + filename, data_drive_path)\n",
 161 |         "  \n",
 162 |         "\n",
 163 |         "def load_datafile_from_drive(filename, remote_url=None):\n",
 164 |         "  data_drive_path = \"/content/drive/My Drive/Colab Notebooks/data/\"\n",
 165 |         "  if os.path.isfile(\"./\" + filename):\n",
 166 |         "    print(\"already have \" + filename + \" in workspace\")\n",
 167 |         "  elif os.path.isfile(data_drive_path + filename):\n",
 168 |         "    print(\"have \" + filename + \" in Google Drive, copying to workspace...\")\n",
 169 |         "    copy(data_drive_path + filename, \".\")\n",
 170 |         "  elif remote_url is not None:\n",
 171 |         "    fetch_remote_datafile(filename, remote_url)\n",
 172 |         "  else:\n",
 173 |         "    print(\"error: you need to manually download \" + filename + \" and put in drive\")\n",
 174 |         "    \n",
 175 |         "def extract_datafile(filename, expected_extract_artifact=None):\n",
 176 |         "  if expected_extract_artifact is not None and (os.path.isfile(expected_extract_artifact) or os.path.isdir(expected_extract_artifact)):\n",
 177 |         "    print(\"files in \" + filename + \" have already been extracted\")\n",
 178 |         "  elif not os.path.isfile(\"./\" + filename):\n",
 179 |         "    print(\"error: cannot extract \" + filename + \", it is not in the workspace\")\n",
 180 |         "  else:\n",
 181 |         "    extension = filename.split('.')[-1]\n",
 182 |         "    if extension == \"zip\":\n",
 183 |         "      print(\"extracting \" + filename + \"...\")\n",
 184 |         "      data_file = open(filename, \"rb\")\n",
 185 |         "      z = zipfile.ZipFile(data_file)\n",
 186 |         "      for name in z.namelist():\n",
 187 |         "          print(\"    extracting file\", name)\n",
 188 |         "          z.extract(name, \"./\")\n",
 189 |         "      data_file.close()\n",
 190 |         "    elif extension == \"gz\":\n",
 191 |         "      print(\"extracting \" + filename + \"...\")\n",
 192 |         "      if filename.split('.')[-2] == \"tar\":\n",
 193 |         "        tar = tarfile.open(filename)\n",
 194 |         "        tar.extractall()\n",
 195 |         "        tar.close()\n",
 196 |         "      else:\n",
 197 |         "        data_zip_file = gzip.GzipFile(filename, 'rb')\n",
 198 |         "        data = data_zip_file.read()\n",
 199 |         "        data_zip_file.close()\n",
 200 |         "        extracted_file = open('.'.join(filename.split('.')[0:-1]), 'wb')\n",
 201 |         "        extracted_file.write(data)\n",
 202 |         "        extracted_file.close()\n",
 203 |         "    elif extension == \"tar\":\n",
 204 |         "      print(\"extracting \" + filename + \"...\")\n",
 205 |         "      tar = tarfile.open(filename)\n",
 206 |         "      tar.extractall()\n",
 207 |         "      tar.close()\n",
 208 |         "    elif extension == \"csv\":\n",
 209 |         "      print(\"do not need to extract csv\")\n",
 210 |         "    else:\n",
 211 |         "      print(\"cannot extract \" + filename)\n",
 212 |         "      \n",
 213 |         "def load_cache_extract_datafile(filename, expected_extract_artifact=None, remote_url=None):\n",
 214 |         "  load_datafile_from_drive(filename, remote_url)\n",
 215 |         "  extract_datafile(filename, expected_extract_artifact)\n",
 216 |         "  cache_datafile_in_drive(filename)\n",
 217 |         "  "
 218 |       ],
 219 |       "execution_count": 0,
 220 |       "outputs": []
 221 |     },
 222 |     {
 223 |       "metadata": {
 224 |         "id": "ja-WPIYCPBo2",
 225 |         "colab_type": "code",
 226 |         "colab": {
 227 |           "base_uri": "https://localhost:8080/",
 228 |           "height": 71
 229 |         },
 230 |         "outputId": "e3a55a49-3595-4033-8fc4-c29f4e636047"
 231 |       },
 232 |       "cell_type": "code",
 233 |       "source": [
 234 |         "load_cache_extract_datafile(\"song_data.csv.zip\", \"song_data.csv\", \"https://static.turi.com/datasets/millionsong/song_data.csv\")"
 235 |       ],
 236 |       "execution_count": 3,
 237 |       "outputs": [
 238 |         {
 239 |           "output_type": "stream",
 240 |           "text": [
 241 |             "already have song_data.csv.zip in workspace\n",
 242 |             "files in song_data.csv.zip have already been extracted\n",
 243 |             "song_data.csv.zip has already been stored in Google Drive\n"
 244 |           ],
 245 |           "name": "stdout"
 246 |         }
 247 |       ]
 248 |     },
 249 |     {
 250 |       "metadata": {
 251 |         "id": "0cOYIVEQPRJM",
 252 |         "colab_type": "code",
 253 |         "colab": {
 254 |           "base_uri": "https://localhost:8080/",
 255 |           "height": 71
 256 |         },
 257 |         "outputId": "718819df-2d6f-487a-ec33-5f613a84af32"
 258 |       },
 259 |       "cell_type": "code",
 260 |       "source": [
 261 |         "load_cache_extract_datafile(\"10000.txt.zip\", \"10000.txt\", \"https://static.turi.com/datasets/millionsong/10000.txt\")"
 262 |       ],
 263 |       "execution_count": 5,
 264 |       "outputs": [
 265 |         {
 266 |           "output_type": "stream",
 267 |           "text": [
 268 |             "already have 10000.txt.zip in workspace\n",
 269 |             "files in 10000.txt.zip have already been extracted\n",
 270 |             "10000.txt.zip has already been stored in Google Drive\n"
 271 |           ],
 272 |           "name": "stdout"
 273 |         }
 274 |       ]
 275 |     },
 276 |     {
 277 |       "metadata": {
 278 |         "id": "lBLwgPKlPdGs",
 279 |         "colab_type": "code",
 280 |         "colab": {
 281 |           "base_uri": "https://localhost:8080/",
 282 |           "height": 71
 283 |         },
 284 |         "outputId": "a89500cb-6460-4b86-d8a3-df697dcb3582"
 285 |       },
 286 |       "cell_type": "code",
 287 |       "source": [
 288 |         "load_cache_extract_datafile(\"loc-gowalla_totalCheckins.txt.gz\", \"loc-gowalla_totalCheckins.txt\", \"https://snap.stanford.edu/data/loc-gowalla_totalCheckins.txt.gz\")"
 289 |       ],
 290 |       "execution_count": 11,
 291 |       "outputs": [
 292 |         {
 293 |           "output_type": "stream",
 294 |           "text": [
 295 |             "already have loc-gowalla_totalCheckins.txt.gz in workspace\n",
 296 |             "files in loc-gowalla_totalCheckins.txt.gz have already been extracted\n",
 297 |             "loc-gowalla_totalCheckins.txt.gz has already been stored in Google Drive\n"
 298 |           ],
 299 |           "name": "stdout"
 300 |         }
 301 |       ]
 302 |     },
 303 |     {
 304 |       "metadata": {
 305 |         "id": "_-x_YK5h9DfS",
 306 |         "colab_type": "text"
 307 |       },
 308 |       "cell_type": "markdown",
 309 |       "source": [
 310 |         "## Setup Turi Create"
 311 |       ]
 312 |     },
 313 |     {
 314 |       "metadata": {
 315 |         "id": "R80hcNX19F7X",
 316 |         "colab_type": "code",
 317 |         "colab": {}
 318 |       },
 319 |       "cell_type": "code",
 320 |       "source": [
 321 |         "import mxnet as mx\n",
 322 |         "import turicreate as tc"
 323 |       ],
 324 |       "execution_count": 0,
 325 |       "outputs": []
 326 |     },
 327 |     {
 328 |       "metadata": {
 329 |         "id": "JMpUas6Y9IHc",
 330 |         "colab_type": "code",
 331 |         "colab": {}
 332 |       },
 333 |       "cell_type": "code",
 334 |       "source": [
 335 |         "# Use all GPUs (default)\n",
 336 |         "tc.config.set_num_gpus(-1)\n",
 337 |         "\n",
 338 |         "# Use only 1 GPU\n",
 339 |         "#tc.config.set_num_gpus(1)\n",
 340 |         "\n",
 341 |         "# Use CPU\n",
 342 |         "#tc.config.set_num_gpus(0)"
 343 |       ],
 344 |       "execution_count": 0,
 345 |       "outputs": []
 346 |     },
 347 |     {
 348 |       "metadata": {
 349 |         "id": "zi7jvSvE27eP",
 350 |         "colab_type": "text"
 351 |       },
 352 |       "cell_type": "markdown",
 353 |       "source": [
 354 |         "## Sample Data\n",
 355 |         "\n",
 356 |         "The Million Song Dataset is a freely-available collection of audio features and metadata for a million contemporary popular music tracks.\n",
 357 |         "\n",
 358 |         "https://labrosa.ee.columbia.edu/millionsong/\n",
 359 |         "\n",
 360 |         "The first table contains metadata about each song in the database. Here's how we load it into an SFrame:"
 361 |       ]
 362 |     },
 363 |     {
 364 |       "metadata": {
 365 |         "id": "Z1c84n0o2Znq",
 366 |         "colab_type": "code",
 367 |         "colab": {
 368 |           "base_uri": "https://localhost:8080/",
 369 |           "height": 233
 370 |         },
 371 |         "outputId": "87a21db1-2574-4da4-c40f-05b1cc002af8"
 372 |       },
 373 |       "cell_type": "code",
 374 |       "source": [
 375 |         "songs = tc.SFrame.read_csv(\"./song_data.csv\")"
 376 |       ],
 377 |       "execution_count": 7,
 378 |       "outputs": [
 379 |         {
 380 |           "output_type": "display_data",
 381 |           "data": {
 382 |             "text/html": [
 383 |               "<pre>Finished parsing file /content/song_data.csv</pre>"
 384 |             ],
 385 |             "text/plain": [
 386 |               "Finished parsing file /content/song_data.csv"
 387 |             ]
 388 |           },
 389 |           "metadata": {
 390 |             "tags": []
 391 |           }
 392 |         },
 393 |         {
 394 |           "output_type": "display_data",
 395 |           "data": {
 396 |             "text/html": [
 397 |               "<pre>Parsing completed. Parsed 100 lines in 1.86884 secs.</pre>"
 398 |             ],
 399 |             "text/plain": [
 400 |               "Parsing completed. Parsed 100 lines in 1.86884 secs."
 401 |             ]
 402 |           },
 403 |           "metadata": {
 404 |             "tags": []
 405 |           }
 406 |         },
 407 |         {
 408 |           "output_type": "stream",
 409 |           "text": [
 410 |             "------------------------------------------------------\n",
 411 |             "Inferred types from first 100 line(s) of file as \n",
 412 |             "column_type_hints=[str,str,str,str,int]\n",
 413 |             "If parsing fails due to incorrect types, you can correct\n",
 414 |             "the inferred type list above and pass it to read_csv in\n",
 415 |             "the column_type_hints argument\n",
 416 |             "------------------------------------------------------\n"
 417 |           ],
 418 |           "name": "stdout"
 419 |         },
 420 |         {
 421 |           "output_type": "display_data",
 422 |           "data": {
 423 |             "text/html": [
 424 |               "<pre>Read 637410 lines. Lines per second: 490463</pre>"
 425 |             ],
 426 |             "text/plain": [
 427 |               "Read 637410 lines. Lines per second: 490463"
 428 |             ]
 429 |           },
 430 |           "metadata": {
 431 |             "tags": []
 432 |           }
 433 |         },
 434 |         {
 435 |           "output_type": "display_data",
 436 |           "data": {
 437 |             "text/html": [
 438 |               "<pre>Finished parsing file /content/song_data.csv</pre>"
 439 |             ],
 440 |             "text/plain": [
 441 |               "Finished parsing file /content/song_data.csv"
 442 |             ]
 443 |           },
 444 |           "metadata": {
 445 |             "tags": []
 446 |           }
 447 |         },
 448 |         {
 449 |           "output_type": "display_data",
 450 |           "data": {
 451 |             "text/html": [
 452 |               "<pre>Parsing completed. Parsed 1000000 lines in 1.49576 secs.</pre>"
 453 |             ],
 454 |             "text/plain": [
 455 |               "Parsing completed. Parsed 1000000 lines in 1.49576 secs."
 456 |             ]
 457 |           },
 458 |           "metadata": {
 459 |             "tags": []
 460 |           }
 461 |         }
 462 |       ]
 463 |     },
 464 |     {
 465 |       "metadata": {
 466 |         "id": "lgWs2Lbo3fF_",
 467 |         "colab_type": "code",
 468 |         "colab": {
 469 |           "base_uri": "https://localhost:8080/",
 470 |           "height": 296
 471 |         },
 472 |         "outputId": "2b0ecc17-dd31-406a-9e89-c35212171886"
 473 |       },
 474 |       "cell_type": "code",
 475 |       "source": [
 476 |         "songs.head()"
 477 |       ],
 478 |       "execution_count": 21,
 479 |       "outputs": [
 480 |         {
 481 |           "output_type": "execute_result",
 482 |           "data": {
 483 |             "text/html": [
 484 |               "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\"><table frame=\"box\" rules=\"cols\">\n",
 485 |               "    <tr>\n",
 486 |               "        <th style=\"padding-left: 1em; padding-right: 1em; text-align: center\">song_id</th>\n",
 487 |               "        <th style=\"padding-left: 1em; padding-right: 1em; text-align: center\">title</th>\n",
 488 |               "        <th style=\"padding-left: 1em; padding-right: 1em; text-align: center\">release</th>\n",
 489 |               "        <th style=\"padding-left: 1em; padding-right: 1em; text-align: center\">artist_name</th>\n",
 490 |               "        <th style=\"padding-left: 1em; padding-right: 1em; text-align: center\">year</th>\n",
 491 |               "    </tr>\n",
 492 |               "    <tr>\n",
 493 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">SOQMMHC12AB0180CB8</td>\n",
 494 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">Silent Night</td>\n",
 495 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">Monster Ballads X-Mas</td>\n",
 496 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">Faster Pussy cat</td>\n",
 497 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">2003</td>\n",
 498 |               "    </tr>\n",
 499 |               "    <tr>\n",
 500 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">SOVFVAK12A8C1350D9</td>\n",
 501 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">Tanssi vaan</td>\n",
 502 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">Karkuteillä</td>\n",
 503 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">Karkkiautomaatti</td>\n",
 504 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">1995</td>\n",
 505 |               "    </tr>\n",
 506 |               "    <tr>\n",
 507 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">SOGTUKN12AB017F4F1</td>\n",
 508 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">No One Could Ever</td>\n",
 509 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">Butter</td>\n",
 510 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">Hudson Mohawke</td>\n",
 511 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">2006</td>\n",
 512 |               "    </tr>\n",
 513 |               "    <tr>\n",
 514 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">SOBNYVR12A8C13558C</td>\n",
 515 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">Si Vos Querés</td>\n",
 516 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">De Culo</td>\n",
 517 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">Yerba Brava</td>\n",
 518 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">2003</td>\n",
 519 |               "    </tr>\n",
 520 |               "    <tr>\n",
 521 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">SOHSBXH12A8C13B0DF</td>\n",
 522 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">Tangle Of Aspens</td>\n",
 523 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">Rene Ablaze Presents<br>Winter Sessions ...</td>\n",
 524 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">Der Mystic</td>\n",
 525 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">0</td>\n",
 526 |               "    </tr>\n",
 527 |               "    <tr>\n",
 528 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">SOZVAPQ12A8C13B63C</td>\n",
 529 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">Symphony No. 1 G minor<br>\"Sinfonie ...</td>\n",
 530 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">Berwald: Symphonies Nos.<br>1/2/3/4 ...</td>\n",
 531 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">David Montgomery</td>\n",
 532 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">0</td>\n",
 533 |               "    </tr>\n",
 534 |               "    <tr>\n",
 535 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">SOQVRHI12A6D4FB2D7</td>\n",
 536 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">We Have Got Love</td>\n",
 537 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">Strictly The Best Vol. 34</td>\n",
 538 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">Sasha / Turbulence</td>\n",
 539 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">0</td>\n",
 540 |               "    </tr>\n",
 541 |               "    <tr>\n",
 542 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">SOEYRFT12AB018936C</td>\n",
 543 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">2 Da Beat Ch'yall</td>\n",
 544 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">Da Bomb</td>\n",
 545 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">Kris Kross</td>\n",
 546 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">1993</td>\n",
 547 |               "    </tr>\n",
 548 |               "    <tr>\n",
 549 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">SOPMIYT12A6D4F851E</td>\n",
 550 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">Goodbye</td>\n",
 551 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">Danny Boy</td>\n",
 552 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">Joseph Locke</td>\n",
 553 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">0</td>\n",
 554 |               "    </tr>\n",
 555 |               "    <tr>\n",
 556 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">SOJCFMH12A8C13B0C2</td>\n",
 557 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">Mama_ mama can't you see<br>? ...</td>\n",
 558 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">March to cadence with the<br>US marines ...</td>\n",
 559 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">The Sun Harbor's Chorus-<br>Documentary Recordings ...</td>\n",
 560 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">0</td>\n",
 561 |               "    </tr>\n",
 562 |               "</table>\n",
 563 |               "[10 rows x 5 columns]<br/>\n",
 564 |               "</div>"
 565 |             ],
 566 |             "text/plain": [
 567 |               "Columns:\n",
 568 |               "\tsong_id\tstr\n",
 569 |               "\ttitle\tstr\n",
 570 |               "\trelease\tstr\n",
 571 |               "\tartist_name\tstr\n",
 572 |               "\tyear\tint\n",
 573 |               "\n",
 574 |               "Rows: 10\n",
 575 |               "\n",
 576 |               "Data:\n",
 577 |               "+--------------------+-------------------------------+\n",
 578 |               "|      song_id       |             title             |\n",
 579 |               "+--------------------+-------------------------------+\n",
 580 |               "| SOQMMHC12AB0180CB8 |          Silent Night         |\n",
 581 |               "| SOVFVAK12A8C1350D9 |          Tanssi vaan          |\n",
 582 |               "| SOGTUKN12AB017F4F1 |       No One Could Ever       |\n",
 583 |               "| SOBNYVR12A8C13558C |         Si Vos Querés         |\n",
 584 |               "| SOHSBXH12A8C13B0DF |        Tangle Of Aspens       |\n",
 585 |               "| SOZVAPQ12A8C13B63C | Symphony No. 1 G minor \"Si... |\n",
 586 |               "| SOQVRHI12A6D4FB2D7 |        We Have Got Love       |\n",
 587 |               "| SOEYRFT12AB018936C |       2 Da Beat Ch'yall       |\n",
 588 |               "| SOPMIYT12A6D4F851E |            Goodbye            |\n",
 589 |               "| SOJCFMH12A8C13B0C2 |   Mama_ mama can't you see ?  |\n",
 590 |               "+--------------------+-------------------------------+\n",
 591 |               "+-------------------------------+-------------------------------+------+\n",
 592 |               "|            release            |          artist_name          | year |\n",
 593 |               "+-------------------------------+-------------------------------+------+\n",
 594 |               "|     Monster Ballads X-Mas     |        Faster Pussy cat       | 2003 |\n",
 595 |               "|          Karkuteillä          |        Karkkiautomaatti       | 1995 |\n",
 596 |               "|             Butter            |         Hudson Mohawke        | 2006 |\n",
 597 |               "|            De Culo            |          Yerba Brava          | 2003 |\n",
 598 |               "| Rene Ablaze Presents Winte... |           Der Mystic          |  0   |\n",
 599 |               "| Berwald: Symphonies Nos. 1... |        David Montgomery       |  0   |\n",
 600 |               "|   Strictly The Best Vol. 34   |       Sasha / Turbulence      |  0   |\n",
 601 |               "|            Da Bomb            |           Kris Kross          | 1993 |\n",
 602 |               "|           Danny Boy           |          Joseph Locke         |  0   |\n",
 603 |               "| March to cadence with the ... | The Sun Harbor's Chorus-Do... |  0   |\n",
 604 |               "+-------------------------------+-------------------------------+------+\n",
 605 |               "[10 rows x 5 columns]"
 606 |             ]
 607 |           },
 608 |           "metadata": {
 609 |             "tags": []
 610 |           },
 611 |           "execution_count": 21
 612 |         }
 613 |       ]
 614 |     },
 615 |     {
 616 |       "metadata": {
 617 |         "id": "SlHixuEd3MI6",
 618 |         "colab_type": "text"
 619 |       },
 620 |       "cell_type": "markdown",
 621 |       "source": [
 622 |         "No options are needed for the simplest case, as the SFrame parser infers column types. Of course, there are many options you may need to specify when importing a CSV file. Some of the more common options come into play when we load the usage data of users listening to these songs online:"
 623 |       ]
 624 |     },
 625 |     {
 626 |       "metadata": {
 627 |         "id": "hzAkDYbR3Mk1",
 628 |         "colab_type": "code",
 629 |         "colab": {
 630 |           "base_uri": "https://localhost:8080/",
 631 |           "height": 107
 632 |         },
 633 |         "outputId": "d682e0eb-56fd-4e3b-eaa4-cdbbee857031"
 634 |       },
 635 |       "cell_type": "code",
 636 |       "source": [
 637 |         "usage_data = tc.SFrame.read_csv(\"./10000.txt\",\n",
 638 |         "                                header=False,\n",
 639 |         "                                delimiter='\\t',\n",
 640 |         "                                column_type_hints={'X3':int})"
 641 |       ],
 642 |       "execution_count": 8,
 643 |       "outputs": [
 644 |         {
 645 |           "output_type": "display_data",
 646 |           "data": {
 647 |             "text/html": [
 648 |               "<pre>Finished parsing file /content/10000.txt</pre>"
 649 |             ],
 650 |             "text/plain": [
 651 |               "Finished parsing file /content/10000.txt"
 652 |             ]
 653 |           },
 654 |           "metadata": {
 655 |             "tags": []
 656 |           }
 657 |         },
 658 |         {
 659 |           "output_type": "display_data",
 660 |           "data": {
 661 |             "text/html": [
 662 |               "<pre>Parsing completed. Parsed 100 lines in 1.70624 secs.</pre>"
 663 |             ],
 664 |             "text/plain": [
 665 |               "Parsing completed. Parsed 100 lines in 1.70624 secs."
 666 |             ]
 667 |           },
 668 |           "metadata": {
 669 |             "tags": []
 670 |           }
 671 |         },
 672 |         {
 673 |           "output_type": "display_data",
 674 |           "data": {
 675 |             "text/html": [
 676 |               "<pre>Read 844838 lines. Lines per second: 741447</pre>"
 677 |             ],
 678 |             "text/plain": [
 679 |               "Read 844838 lines. Lines per second: 741447"
 680 |             ]
 681 |           },
 682 |           "metadata": {
 683 |             "tags": []
 684 |           }
 685 |         },
 686 |         {
 687 |           "output_type": "display_data",
 688 |           "data": {
 689 |             "text/html": [
 690 |               "<pre>Finished parsing file /content/10000.txt</pre>"
 691 |             ],
 692 |             "text/plain": [
 693 |               "Finished parsing file /content/10000.txt"
 694 |             ]
 695 |           },
 696 |           "metadata": {
 697 |             "tags": []
 698 |           }
 699 |         },
 700 |         {
 701 |           "output_type": "display_data",
 702 |           "data": {
 703 |             "text/html": [
 704 |               "<pre>Parsing completed. Parsed 2000000 lines in 1.49596 secs.</pre>"
 705 |             ],
 706 |             "text/plain": [
 707 |               "Parsing completed. Parsed 2000000 lines in 1.49596 secs."
 708 |             ]
 709 |           },
 710 |           "metadata": {
 711 |             "tags": []
 712 |           }
 713 |         }
 714 |       ]
 715 |     },
 716 |     {
 717 |       "metadata": {
 718 |         "id": "1hgMj9Mq3gCC",
 719 |         "colab_type": "code",
 720 |         "colab": {
 721 |           "base_uri": "https://localhost:8080/",
 722 |           "height": 415
 723 |         },
 724 |         "outputId": "7523fd11-2573-418f-bc5f-8f627583f254"
 725 |       },
 726 |       "cell_type": "code",
 727 |       "source": [
 728 |         "usage_data.head()"
 729 |       ],
 730 |       "execution_count": 9,
 731 |       "outputs": [
 732 |         {
 733 |           "output_type": "execute_result",
 734 |           "data": {
 735 |             "text/html": [
 736 |               "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\"><table frame=\"box\" rules=\"cols\">\n",
 737 |               "    <tr>\n",
 738 |               "        <th style=\"padding-left: 1em; padding-right: 1em; text-align: center\">X1</th>\n",
 739 |               "        <th style=\"padding-left: 1em; padding-right: 1em; text-align: center\">X2</th>\n",
 740 |               "        <th style=\"padding-left: 1em; padding-right: 1em; text-align: center\">X3</th>\n",
 741 |               "    </tr>\n",
 742 |               "    <tr>\n",
 743 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">b80344d063b5ccb3212f76538<br>f3d9e43d87dca9e ...</td>\n",
 744 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">SOAKIMP12A8C130995</td>\n",
 745 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">1</td>\n",
 746 |               "    </tr>\n",
 747 |               "    <tr>\n",
 748 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">b80344d063b5ccb3212f76538<br>f3d9e43d87dca9e ...</td>\n",
 749 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">SOBBMDR12A8C13253B</td>\n",
 750 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">2</td>\n",
 751 |               "    </tr>\n",
 752 |               "    <tr>\n",
 753 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">b80344d063b5ccb3212f76538<br>f3d9e43d87dca9e ...</td>\n",
 754 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">SOBXHDL12A81C204C0</td>\n",
 755 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">1</td>\n",
 756 |               "    </tr>\n",
 757 |               "    <tr>\n",
 758 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">b80344d063b5ccb3212f76538<br>f3d9e43d87dca9e ...</td>\n",
 759 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">SOBYHAJ12A6701BF1D</td>\n",
 760 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">1</td>\n",
 761 |               "    </tr>\n",
 762 |               "    <tr>\n",
 763 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">b80344d063b5ccb3212f76538<br>f3d9e43d87dca9e ...</td>\n",
 764 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">SODACBL12A8C13C273</td>\n",
 765 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">1</td>\n",
 766 |               "    </tr>\n",
 767 |               "    <tr>\n",
 768 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">b80344d063b5ccb3212f76538<br>f3d9e43d87dca9e ...</td>\n",
 769 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">SODDNQT12A6D4F5F7E</td>\n",
 770 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">5</td>\n",
 771 |               "    </tr>\n",
 772 |               "    <tr>\n",
 773 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">b80344d063b5ccb3212f76538<br>f3d9e43d87dca9e ...</td>\n",
 774 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">SODXRTY12AB0180F3B</td>\n",
 775 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">1</td>\n",
 776 |               "    </tr>\n",
 777 |               "    <tr>\n",
 778 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">b80344d063b5ccb3212f76538<br>f3d9e43d87dca9e ...</td>\n",
 779 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">SOFGUAY12AB017B0A8</td>\n",
 780 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">1</td>\n",
 781 |               "    </tr>\n",
 782 |               "    <tr>\n",
 783 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">b80344d063b5ccb3212f76538<br>f3d9e43d87dca9e ...</td>\n",
 784 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">SOFRQTD12A81C233C0</td>\n",
 785 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">1</td>\n",
 786 |               "    </tr>\n",
 787 |               "    <tr>\n",
 788 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">b80344d063b5ccb3212f76538<br>f3d9e43d87dca9e ...</td>\n",
 789 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">SOHQWYZ12A6D4FA701</td>\n",
 790 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">1</td>\n",
 791 |               "    </tr>\n",
 792 |               "</table>\n",
 793 |               "[10 rows x 3 columns]<br/>\n",
 794 |               "</div>"
 795 |             ],
 796 |             "text/plain": [
 797 |               "Columns:\n",
 798 |               "\tX1\tstr\n",
 799 |               "\tX2\tstr\n",
 800 |               "\tX3\tint\n",
 801 |               "\n",
 802 |               "Rows: 10\n",
 803 |               "\n",
 804 |               "Data:\n",
 805 |               "+-------------------------------+--------------------+----+\n",
 806 |               "|               X1              |         X2         | X3 |\n",
 807 |               "+-------------------------------+--------------------+----+\n",
 808 |               "| b80344d063b5ccb3212f76538f... | SOAKIMP12A8C130995 | 1  |\n",
 809 |               "| b80344d063b5ccb3212f76538f... | SOBBMDR12A8C13253B | 2  |\n",
 810 |               "| b80344d063b5ccb3212f76538f... | SOBXHDL12A81C204C0 | 1  |\n",
 811 |               "| b80344d063b5ccb3212f76538f... | SOBYHAJ12A6701BF1D | 1  |\n",
 812 |               "| b80344d063b5ccb3212f76538f... | SODACBL12A8C13C273 | 1  |\n",
 813 |               "| b80344d063b5ccb3212f76538f... | SODDNQT12A6D4F5F7E | 5  |\n",
 814 |               "| b80344d063b5ccb3212f76538f... | SODXRTY12AB0180F3B | 1  |\n",
 815 |               "| b80344d063b5ccb3212f76538f... | SOFGUAY12AB017B0A8 | 1  |\n",
 816 |               "| b80344d063b5ccb3212f76538f... | SOFRQTD12A81C233C0 | 1  |\n",
 817 |               "| b80344d063b5ccb3212f76538f... | SOHQWYZ12A6D4FA701 | 1  |\n",
 818 |               "+-------------------------------+--------------------+----+\n",
 819 |               "[10 rows x 3 columns]"
 820 |             ]
 821 |           },
 822 |           "metadata": {
 823 |             "tags": []
 824 |           },
 825 |           "execution_count": 9
 826 |         }
 827 |       ]
 828 |     },
 829 |     {
 830 |       "metadata": {
 831 |         "id": "NJzJVDKF3SGO",
 832 |         "colab_type": "text"
 833 |       },
 834 |       "cell_type": "markdown",
 835 |       "source": [
 836 |         "The header and delimiter options are needed because this particular CSV file does not provide column names in its first line, and the values are separated by tabs, not commas. The column_type_hints option tells the SFrame CSV parser to read column X3 as an integer instead of inferring its type; without a hint, the parser infers the datatype of each column by default. For a full list of options when parsing CSV files, check our [API Reference](https://apple.github.io/turicreate/docs/api/generated/turicreate.SFrame.read_csv.html#turicreate.SFrame.read_csv)."
 837 |       ]
 838 |     },
 839 |     {
 840 |       "metadata": {
 841 |         "id": "-8jl3ffU3k6v",
 842 |         "colab_type": "text"
 843 |       },
 844 |       "cell_type": "markdown",
 845 |       "source": [
 846 |         "Because the file has no header, the columns received the default names X1, X2, and X3. Here we might want to rename them to something more descriptive:"
 847 |       ]
 848 |     },
 849 |     {
 850 |       "metadata": {
 851 |         "id": "Tpx2vEVP3mvw",
 852 |         "colab_type": "code",
 853 |         "colab": {
 854 |           "base_uri": "https://localhost:8080/",
 855 |           "height": 449
 856 |         },
 857 |         "outputId": "1b4ff3dd-a88c-43db-a809-b6c539b00c4c"
 858 |       },
 859 |       "cell_type": "code",
 860 |       "source": [
 861 |         "usage_data.rename({'X1':'user_id', 'X2':'song_id', 'X3':'listen_count'})"
 862 |       ],
 863 |       "execution_count": 10,
 864 |       "outputs": [
 865 |         {
 866 |           "output_type": "execute_result",
 867 |           "data": {
 868 |             "text/html": [
 869 |               "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\"><table frame=\"box\" rules=\"cols\">\n",
 870 |               "    <tr>\n",
 871 |               "        <th style=\"padding-left: 1em; padding-right: 1em; text-align: center\">user_id</th>\n",
 872 |               "        <th style=\"padding-left: 1em; padding-right: 1em; text-align: center\">song_id</th>\n",
 873 |               "        <th style=\"padding-left: 1em; padding-right: 1em; text-align: center\">listen_count</th>\n",
 874 |               "    </tr>\n",
 875 |               "    <tr>\n",
 876 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">b80344d063b5ccb3212f76538<br>f3d9e43d87dca9e ...</td>\n",
 877 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">SOAKIMP12A8C130995</td>\n",
 878 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">1</td>\n",
 879 |               "    </tr>\n",
 880 |               "    <tr>\n",
 881 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">b80344d063b5ccb3212f76538<br>f3d9e43d87dca9e ...</td>\n",
 882 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">SOBBMDR12A8C13253B</td>\n",
 883 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">2</td>\n",
 884 |               "    </tr>\n",
 885 |               "    <tr>\n",
 886 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">b80344d063b5ccb3212f76538<br>f3d9e43d87dca9e ...</td>\n",
 887 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">SOBXHDL12A81C204C0</td>\n",
 888 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">1</td>\n",
 889 |               "    </tr>\n",
 890 |               "    <tr>\n",
 891 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">b80344d063b5ccb3212f76538<br>f3d9e43d87dca9e ...</td>\n",
 892 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">SOBYHAJ12A6701BF1D</td>\n",
 893 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">1</td>\n",
 894 |               "    </tr>\n",
 895 |               "    <tr>\n",
 896 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">b80344d063b5ccb3212f76538<br>f3d9e43d87dca9e ...</td>\n",
 897 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">SODACBL12A8C13C273</td>\n",
 898 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">1</td>\n",
 899 |               "    </tr>\n",
 900 |               "    <tr>\n",
 901 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">b80344d063b5ccb3212f76538<br>f3d9e43d87dca9e ...</td>\n",
 902 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">SODDNQT12A6D4F5F7E</td>\n",
 903 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">5</td>\n",
 904 |               "    </tr>\n",
 905 |               "    <tr>\n",
 906 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">b80344d063b5ccb3212f76538<br>f3d9e43d87dca9e ...</td>\n",
 907 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">SODXRTY12AB0180F3B</td>\n",
 908 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">1</td>\n",
 909 |               "    </tr>\n",
 910 |               "    <tr>\n",
 911 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">b80344d063b5ccb3212f76538<br>f3d9e43d87dca9e ...</td>\n",
 912 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">SOFGUAY12AB017B0A8</td>\n",
 913 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">1</td>\n",
 914 |               "    </tr>\n",
 915 |               "    <tr>\n",
 916 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">b80344d063b5ccb3212f76538<br>f3d9e43d87dca9e ...</td>\n",
 917 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">SOFRQTD12A81C233C0</td>\n",
 918 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">1</td>\n",
 919 |               "    </tr>\n",
 920 |               "    <tr>\n",
 921 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">b80344d063b5ccb3212f76538<br>f3d9e43d87dca9e ...</td>\n",
 922 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">SOHQWYZ12A6D4FA701</td>\n",
 923 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">1</td>\n",
 924 |               "    </tr>\n",
 925 |               "</table>\n",
 926 |               "[2000000 rows x 3 columns]<br/>Note: Only the head of the SFrame is printed.<br/>You can use print_rows(num_rows=m, num_columns=n) to print more rows and columns.\n",
 927 |               "</div>"
 928 |             ],
 929 |             "text/plain": [
 930 |               "Columns:\n",
 931 |               "\tuser_id\tstr\n",
 932 |               "\tsong_id\tstr\n",
 933 |               "\tlisten_count\tint\n",
 934 |               "\n",
 935 |               "Rows: 2000000\n",
 936 |               "\n",
 937 |               "Data:\n",
 938 |               "+-------------------------------+--------------------+--------------+\n",
 939 |               "|            user_id            |      song_id       | listen_count |\n",
 940 |               "+-------------------------------+--------------------+--------------+\n",
 941 |               "| b80344d063b5ccb3212f76538f... | SOAKIMP12A8C130995 |      1       |\n",
 942 |               "| b80344d063b5ccb3212f76538f... | SOBBMDR12A8C13253B |      2       |\n",
 943 |               "| b80344d063b5ccb3212f76538f... | SOBXHDL12A81C204C0 |      1       |\n",
 944 |               "| b80344d063b5ccb3212f76538f... | SOBYHAJ12A6701BF1D |      1       |\n",
 945 |               "| b80344d063b5ccb3212f76538f... | SODACBL12A8C13C273 |      1       |\n",
 946 |               "| b80344d063b5ccb3212f76538f... | SODDNQT12A6D4F5F7E |      5       |\n",
 947 |               "| b80344d063b5ccb3212f76538f... | SODXRTY12AB0180F3B |      1       |\n",
 948 |               "| b80344d063b5ccb3212f76538f... | SOFGUAY12AB017B0A8 |      1       |\n",
 949 |               "| b80344d063b5ccb3212f76538f... | SOFRQTD12A81C233C0 |      1       |\n",
 950 |               "| b80344d063b5ccb3212f76538f... | SOHQWYZ12A6D4FA701 |      1       |\n",
 951 |               "+-------------------------------+--------------------+--------------+\n",
 952 |               "[2000000 rows x 3 columns]\n",
 953 |               "Note: Only the head of the SFrame is printed.\n",
 954 |               "You can use print_rows(num_rows=m, num_columns=n) to print more rows and columns."
 955 |             ]
 956 |           },
 957 |           "metadata": {
 958 |             "tags": []
 959 |           },
 960 |           "execution_count": 10
 961 |         }
 962 |       ]
 963 |     },
 964 |     {
 965 |       "metadata": {
 966 |         "id": "ymSWtQwz3pxd",
 967 |         "colab_type": "text"
 968 |       },
 969 |       "cell_type": "markdown",
 970 |       "source": [
 971 |         "SFrames can be saved as a CSV file or in the SFrame binary format. Loading an SFrame saved in binary format is very fast because the data never has to be parsed again. The `save` method defaults to the binary format, so here we only supply the name of a directory, which is created to hold the binary files:"
 972 |       ]
 973 |     },
 974 |     {
 975 |       "metadata": {
 976 |         "id": "rxJsNce-3ZRL",
 977 |         "colab_type": "code",
 978 |         "colab": {}
 979 |       },
 980 |       "cell_type": "code",
 981 |       "source": [
 982 |         "usage_data.save('./music_usage_data.sframe')"
 983 |       ],
 984 |       "execution_count": 0,
 985 |       "outputs": []
 986 |     },
 987 |     {
 988 |       "metadata": {
 989 |         "id": "gLcdT8Cv3tfb",
 990 |         "colab_type": "text"
 991 |       },
 992 |       "cell_type": "markdown",
 993 |       "source": [
 994 |         "Loading is then very fast:"
 995 |       ]
 996 |     },
 997 |     {
 998 |       "metadata": {
 999 |         "id": "_FUN-pBF3vTy",
1000 |         "colab_type": "code",
1001 |         "colab": {}
1002 |       },
1003 |       "cell_type": "code",
1004 |       "source": [
1005 |         "same_usage_data = tc.load_sframe('./music_usage_data.sframe')"
1006 |       ],
1007 |       "execution_count": 0,
1008 |       "outputs": []
1009 |     },
1010 |     {
1011 |       "metadata": {
1012 |         "id": "ze0lxR9k3zRA",
1013 |         "colab_type": "text"
1014 |       },
1015 |       "cell_type": "markdown",
1016 |       "source": [
1017 |         "## Data Types\n",
1018 |         "\n",
1019 |         "An SFrame is made up of columns, each holding values of a single type. The following data types are supported:\n",
1020 |         "\n",
1021 |         "*   int (signed 64-bit integer)\n",
1022 |         "*   float (double-precision floating point)\n",
1023 |         "*   str (string)\n",
1024 |         "*   array.array (1-D array of doubles)\n",
1025 |         "*   list (arbitrary list of elements)\n",
1026 |         "*   dict (arbitrary dictionary of elements)\n",
1027 |         "*   datetime.datetime (datetime with microsecond precision)\n",
1028 |         "*   image (image)"
1029 |       ]
1030 |     },
1031 |     {
1032 |       "metadata": {
1033 |         "id": "HUYoOHQjCxaZ",
1034 |         "colab_type": "text"
1035 |       },
1036 |       "cell_type": "markdown",
1037 |       "source": [
1038 |         "## Memory Intensive Example\n",
1039 |         "\n",
1040 |         "https://blog.usejournal.com/python-for-big-data-computation-on-a-single-computer-c232046df3c3\n",
1041 |         "\n",
1042 |         "The data for our experiment comes from the now-defunct Gowalla social networking site. Two data sets from this site are available from the SNAP page linked below. We will look at the larger one, which contains the event log of “check-ins” by Gowalla’s users at a set of locations. This data set contains 6.44 million records, each describing a single check-in in just a few columns, of which we will use only three: user_id, location_id, and checkin_ts (the second-resolution timestamp of the check-in event).\n",
1043 |         "\n",
1044 |         "https://snap.stanford.edu/data/loc-gowalla.html"
1045 |       ]
1046 |     },
1047 |     {
1048 |       "metadata": {
1049 |         "id": "YfHt3wIDEAYS",
1050 |         "colab_type": "text"
1051 |       },
1052 |       "cell_type": "markdown",
1053 |       "source": [
1054 |         "### The problem and its (theoretical) solution\n\n",
1055 |         "We will use Turi Create to attack what could be termed the “stalker-stalkee detection problem” on this data set. In this problem, we are asked to identify the pairs of users (E, R) that maximize the “stalking measure” between E and R. The stalking measure between E and R is defined as the number of distinct locations where a check-in by user E (the stalkEE) was ever followed by a check-in by user R (the stalkER).\n",
1056 |         "\n",
1057 |         "The first step is to index the check-ins by location_id (remember that in pandas a single key value can refer to more than one row). This makes the computation that follows easier.\n",
1058 |         "\n",
1059 |         "Then comes the tricky part: for each location, we want to consider all pairs of check-ins where the check-in timestamp of the first user in the pair strictly precedes that of the second user. So we generate chin_ps, a data frame containing all pairs of check-ins at the same location, and then filter it to enforce the condition just described, producing pairs_filtered.\n",
1060 |         "\n",
1061 |         "However, running a naïve pandas solution on a laptop or PC with a typical amount of RAM (say, 16 GB) results in a MemoryError exception. With Turi Create and SFrames we do not have this problem."
1062 |       ]
1063 |     },
1064 |     {
1065 |       "metadata": {
1066 |         "id": "tj8qH8pxEqqg",
1067 |         "colab_type": "code",
1068 |         "colab": {
1069 |           "base_uri": "https://localhost:8080/",
1070 |           "height": 269
1071 |         },
1072 |         "outputId": "49646044-ff47-41fa-af77-ac1b4c27a0df"
1073 |       },
1074 |       "cell_type": "code",
1075 |       "source": [
1076 |         "checkins = (tc.SFrame.read_csv('loc-gowalla_totalCheckins.txt',\n",
1077 |         "                               delimiter='\\t', header=False)\n",
1078 |         "              .rename({'X1': 'user_id', 'X2': 'checkin_ts',\n",
1079 |         "                       'X3': 'lat', 'X4': 'lon',\n",
1080 |         "                       'X5': 'location_id'})\n",
1081 |         "              [['user_id', 'location_id', 'checkin_ts']])"
1082 |       ],
1083 |       "execution_count": 6,
1084 |       "outputs": [
1085 |         {
1086 |           "output_type": "display_data",
1087 |           "data": {
1088 |             "text/plain": [
1089 |               "Finished parsing file /content/loc-gowalla_totalCheckins.txt"
1090 |             ],
1091 |             "text/html": [
1092 |               "<pre>Finished parsing file /content/loc-gowalla_totalCheckins.txt</pre>"
1093 |             ]
1094 |           },
1095 |           "metadata": {
1096 |             "tags": []
1097 |           }
1098 |         },
1099 |         {
1100 |           "output_type": "display_data",
1101 |           "data": {
1102 |             "text/plain": [
1103 |               "Parsing completed. Parsed 100 lines in 1.49398 secs."
1104 |             ],
1105 |             "text/html": [
1106 |               "<pre>Parsing completed. Parsed 100 lines in 1.49398 secs.</pre>"
1107 |             ]
1108 |           },
1109 |           "metadata": {
1110 |             "tags": []
1111 |           }
1112 |         },
1113 |         {
1114 |           "output_type": "stream",
1115 |           "text": [
1116 |             "------------------------------------------------------\n",
1117 |             "Inferred types from first 100 line(s) of file as \n",
1118 |             "column_type_hints=[int,str,float,float,int]\n",
1119 |             "If parsing fails due to incorrect types, you can correct\n",
1120 |             "the inferred type list above and pass it to read_csv in\n",
1121 |             "the column_type_hints argument\n",
1122 |             "------------------------------------------------------\n"
1123 |           ],
1124 |           "name": "stdout"
1125 |         },
1126 |         {
1127 |           "output_type": "display_data",
1128 |           "data": {
1129 |             "text/plain": [
1130 |               "Read 870755 lines. Lines per second: 228086"
1131 |             ],
1132 |             "text/html": [
1133 |               "<pre>Read 870755 lines. Lines per second: 228086</pre>"
1134 |             ]
1135 |           },
1136 |           "metadata": {
1137 |             "tags": []
1138 |           }
1139 |         },
1140 |         {
1141 |           "output_type": "display_data",
1142 |           "data": {
1143 |             "text/plain": [
1144 |               "Read 2588975 lines. Lines per second: 271390"
1145 |             ],
1146 |             "text/html": [
1147 |               "<pre>Read 2588975 lines. Lines per second: 271390</pre>"
1148 |             ]
1149 |           },
1150 |           "metadata": {
1151 |             "tags": []
1152 |           }
1153 |         },
1154 |         {
1155 |           "output_type": "display_data",
1156 |           "data": {
1157 |             "text/plain": [
1158 |               "Read 4301430 lines. Lines per second: 281586"
1159 |             ],
1160 |             "text/html": [
1161 |               "<pre>Read 4301430 lines. Lines per second: 281586</pre>"
1162 |             ]
1163 |           },
1164 |           "metadata": {
1165 |             "tags": []
1166 |           }
1167 |         },
1168 |         {
1169 |           "output_type": "display_data",
1170 |           "data": {
1171 |             "text/plain": [
1172 |               "Finished parsing file /content/loc-gowalla_totalCheckins.txt"
1173 |             ],
1174 |             "text/html": [
1175 |               "<pre>Finished parsing file /content/loc-gowalla_totalCheckins.txt</pre>"
1176 |             ]
1177 |           },
1178 |           "metadata": {
1179 |             "tags": []
1180 |           }
1181 |         },
1182 |         {
1183 |           "output_type": "display_data",
1184 |           "data": {
1185 |             "text/plain": [
1186 |               "Parsing completed. Parsed 6442892 lines in 19.7667 secs."
1187 |             ],
1188 |             "text/html": [
1189 |               "<pre>Parsing completed. Parsed 6442892 lines in 19.7667 secs.</pre>"
1190 |             ]
1191 |           },
1192 |           "metadata": {
1193 |             "tags": []
1194 |           }
1195 |         }
1196 |       ]
1197 |     },
1198 |     {
1199 |       "metadata": {
1200 |         "id": "owEASksxFHiK",
1201 |         "colab_type": "code",
1202 |         "colab": {
1203 |           "base_uri": "https://localhost:8080/",
1204 |           "height": 245
1205 |         },
1206 |         "outputId": "1370e5e6-5833-4ced-a991-4bac8453dc5e"
1207 |       },
1208 |       "cell_type": "code",
1209 |       "source": [
1210 |         "checkins.head()"
1211 |       ],
1212 |       "execution_count": 7,
1213 |       "outputs": [
1214 |         {
1215 |           "output_type": "execute_result",
1216 |           "data": {
1217 |             "text/html": [
1218 |               "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\"><table frame=\"box\" rules=\"cols\">\n",
1219 |               "    <tr>\n",
1220 |               "        <th style=\"padding-left: 1em; padding-right: 1em; text-align: center\">user_id</th>\n",
1221 |               "        <th style=\"padding-left: 1em; padding-right: 1em; text-align: center\">location_id</th>\n",
1222 |               "        <th style=\"padding-left: 1em; padding-right: 1em; text-align: center\">checkin_ts</th>\n",
1223 |               "    </tr>\n",
1224 |               "    <tr>\n",
1225 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">0</td>\n",
1226 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">22847</td>\n",
1227 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">2010-10-19T23:55:27Z</td>\n",
1228 |               "    </tr>\n",
1229 |               "    <tr>\n",
1230 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">0</td>\n",
1231 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">420315</td>\n",
1232 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">2010-10-18T22:17:43Z</td>\n",
1233 |               "    </tr>\n",
1234 |               "    <tr>\n",
1235 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">0</td>\n",
1236 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">316637</td>\n",
1237 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">2010-10-17T23:42:03Z</td>\n",
1238 |               "    </tr>\n",
1239 |               "    <tr>\n",
1240 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">0</td>\n",
1241 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">16516</td>\n",
1242 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">2010-10-17T19:26:05Z</td>\n",
1243 |               "    </tr>\n",
1244 |               "    <tr>\n",
1245 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">0</td>\n",
1246 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">5535878</td>\n",
1247 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">2010-10-16T18:50:42Z</td>\n",
1248 |               "    </tr>\n",
1249 |               "    <tr>\n",
1250 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">0</td>\n",
1251 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">15372</td>\n",
1252 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">2010-10-12T23:58:03Z</td>\n",
1253 |               "    </tr>\n",
1254 |               "    <tr>\n",
1255 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">0</td>\n",
1256 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">21714</td>\n",
1257 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">2010-10-12T22:02:11Z</td>\n",
1258 |               "    </tr>\n",
1259 |               "    <tr>\n",
1260 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">0</td>\n",
1261 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">420315</td>\n",
1262 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">2010-10-12T19:44:40Z</td>\n",
1263 |               "    </tr>\n",
1264 |               "    <tr>\n",
1265 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">0</td>\n",
1266 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">153505</td>\n",
1267 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">2010-10-12T15:57:20Z</td>\n",
1268 |               "    </tr>\n",
1269 |               "    <tr>\n",
1270 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">0</td>\n",
1271 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">420315</td>\n",
1272 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">2010-10-12T15:19:03Z</td>\n",
1273 |               "    </tr>\n",
1274 |               "</table>\n",
1275 |               "[10 rows x 3 columns]<br/>\n",
1276 |               "</div>"
1277 |             ],
1278 |             "text/plain": [
1279 |               "Columns:\n",
1280 |               "\tuser_id\tint\n",
1281 |               "\tlocation_id\tint\n",
1282 |               "\tcheckin_ts\tstr\n",
1283 |               "\n",
1284 |               "Rows: 10\n",
1285 |               "\n",
1286 |               "Data:\n",
1287 |               "+---------+-------------+----------------------+\n",
1288 |               "| user_id | location_id |      checkin_ts      |\n",
1289 |               "+---------+-------------+----------------------+\n",
1290 |               "|    0    |    22847    | 2010-10-19T23:55:27Z |\n",
1291 |               "|    0    |    420315   | 2010-10-18T22:17:43Z |\n",
1292 |               "|    0    |    316637   | 2010-10-17T23:42:03Z |\n",
1293 |               "|    0    |    16516    | 2010-10-17T19:26:05Z |\n",
1294 |               "|    0    |   5535878   | 2010-10-16T18:50:42Z |\n",
1295 |               "|    0    |    15372    | 2010-10-12T23:58:03Z |\n",
1296 |               "|    0    |    21714    | 2010-10-12T22:02:11Z |\n",
1297 |               "|    0    |    420315   | 2010-10-12T19:44:40Z |\n",
1298 |               "|    0    |    153505   | 2010-10-12T15:57:20Z |\n",
1299 |               "|    0    |    420315   | 2010-10-12T15:19:03Z |\n",
1300 |               "+---------+-------------+----------------------+\n",
1301 |               "[10 rows x 3 columns]"
1302 |             ]
1303 |           },
1304 |           "metadata": {
1305 |             "tags": []
1306 |           },
1307 |           "execution_count": 7
1308 |         }
1309 |       ]
1310 |     },
1311 |     {
1312 |       "metadata": {
1313 |         "id": "0vdQZ__rFQ8I",
1314 |         "colab_type": "text"
1315 |       },
1316 |       "cell_type": "markdown",
1317 |       "source": [
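1318 |         "Next, generate the pairs of check-ins that satisfy the conditions of our detection algorithm: both check-ins are at the same location, the two users differ, and the stalker's check-in follows the stalkee's by less than one day."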
1319 |       ]
1320 |     },
1321 |     {
1322 |       "metadata": {
1323 |         "id": "nFoEzuYlMQgv",
1324 |         "colab_type": "code",
1325 |         "colab": {}
1326 |       },
1327 |       "cell_type": "code",
1328 |       "source": [
1329 |         "import datetime\n",
1330 |         "import dateutil.parser"
1331 |       ],
1332 |       "execution_count": 0,
1333 |       "outputs": []
1334 |     },
1335 |     {
1336 |       "metadata": {
1337 |         "id": "1wEhpuFVFUJC",
1338 |         "colab_type": "code",
1339 |         "colab": {}
1340 |       },
1341 |       "cell_type": "code",
1342 |       "source": [
1343 |         "# Self-join on location_id: one row per ordered pair of check-ins at the same location.\nchin_ps = checkins.join(checkins, on='location_id').rename({'checkin_ts': 'checkin_ts_ee', 'checkin_ts.1': 'checkin_ts_er', 'user_id': 'stalkee', 'user_id.1': 'stalker'})"
1344 |       ],
1345 |       "execution_count": 0,
1346 |       "outputs": []
1347 |     },
1348 |     {
1349 |       "metadata": {
1350 |         "id": "phw55yidIWg_",
1351 |         "colab_type": "code",
1352 |         "colab": {}
1353 |       },
1354 |       "cell_type": "code",
1355 |       "source": [
1356 |         "# Time from the stalkee's check-in to the stalker's check-in, converted from seconds to days.\nchin_ps['time_diff'] = (chin_ps['checkin_ts_er'].apply(dateutil.parser.parse) - chin_ps['checkin_ts_ee'].apply(dateutil.parser.parse)) / 86400"
1357 |       ],
1358 |       "execution_count": 0,
1359 |       "outputs": []
1360 |     },
1361 |     {
1362 |       "metadata": {
1363 |         "id": "t9OQPWvlFuIE",
1364 |         "colab_type": "code",
1365 |         "colab": {
1366 |           "base_uri": "https://localhost:8080/",
1367 |           "height": 245
1368 |         },
1369 |         "outputId": "dbb4b625-e100-487c-d104-66b0f819062c"
1370 |       },
1371 |       "cell_type": "code",
1372 |       "source": [
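1373 |         "# Variant without the one-day window (strict precedence only), kept for reference:\n# pairs_filtered = chin_ps[ (chin_ps['checkin_ts_ee'] < chin_ps['checkin_ts_er']) & (chin_ps['stalkee'] != chin_ps['stalker']) ]\n# Active filter: different users, stalker checks in after the stalkee and less than one day later.\n",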
1374 |         "pairs_filtered = chin_ps[ (chin_ps['time_diff'] > 0.0) & (chin_ps['time_diff'] < 1.0) & (chin_ps['stalkee'] != chin_ps['stalker']) ]\n",
1375 |         "pairs_filtered.head()"
1376 |       ],
1377 |       "execution_count": 11,
1378 |       "outputs": [
1379 |         {
1380 |           "output_type": "execute_result",
1381 |           "data": {
1382 |             "text/html": [
1383 |               "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\"><table frame=\"box\" rules=\"cols\">\n",
1384 |               "    <tr>\n",
1385 |               "        <th style=\"padding-left: 1em; padding-right: 1em; text-align: center\">stalkee</th>\n",
1386 |               "        <th style=\"padding-left: 1em; padding-right: 1em; text-align: center\">location_id</th>\n",
1387 |               "        <th style=\"padding-left: 1em; padding-right: 1em; text-align: center\">checkin_ts_ee</th>\n",
1388 |               "        <th style=\"padding-left: 1em; padding-right: 1em; text-align: center\">stalker</th>\n",
1389 |               "        <th style=\"padding-left: 1em; padding-right: 1em; text-align: center\">checkin_ts_er</th>\n",
1390 |               "        <th style=\"padding-left: 1em; padding-right: 1em; text-align: center\">time_diff</th>\n",
1391 |               "    </tr>\n",
1392 |               "    <tr>\n",
1393 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">7</td>\n",
1394 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">420315</td>\n",
1395 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">2010-10-18T20:24:42Z</td>\n",
1396 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">0</td>\n",
1397 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">2010-10-18T22:17:43Z</td>\n",
1398 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">0.0784837962963</td>\n",
1399 |               "    </tr>\n",
1400 |               "    <tr>\n",
1401 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">7</td>\n",
1402 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">420315</td>\n",
1403 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">2010-10-18T15:08:58Z</td>\n",
1404 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">0</td>\n",
1405 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">2010-10-18T22:17:43Z</td>\n",
1406 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">0.297743055556</td>\n",
1407 |               "    </tr>\n",
1408 |               "    <tr>\n",
1409 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">31</td>\n",
1410 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">420315</td>\n",
1411 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">2010-10-18T14:00:53Z</td>\n",
1412 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">0</td>\n",
1413 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">2010-10-18T22:17:43Z</td>\n",
1414 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">0.345023148148</td>\n",
1415 |               "    </tr>\n",
1416 |               "    <tr>\n",
1417 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">66</td>\n",
1418 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">420315</td>\n",
1419 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">2010-10-18T18:59:11Z</td>\n",
1420 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">0</td>\n",
1421 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">2010-10-18T22:17:43Z</td>\n",
1422 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">0.13787037037</td>\n",
1423 |               "    </tr>\n",
1424 |               "    <tr>\n",
1425 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">327</td>\n",
1426 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">420315</td>\n",
1427 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">2010-10-18T21:21:12Z</td>\n",
1428 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">0</td>\n",
1429 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">2010-10-18T22:17:43Z</td>\n",
1430 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">0.0392476851852</td>\n",
1431 |               "    </tr>\n",
1432 |               "    <tr>\n",
1433 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">327</td>\n",
1434 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">420315</td>\n",
1435 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">2010-10-18T14:05:59Z</td>\n",
1436 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">0</td>\n",
1437 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">2010-10-18T22:17:43Z</td>\n",
1438 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">0.341481481481</td>\n",
1439 |               "    </tr>\n",
1440 |               "    <tr>\n",
1441 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">342</td>\n",
1442 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">420315</td>\n",
1443 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">2010-10-18T14:10:40Z</td>\n",
1444 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">0</td>\n",
1445 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">2010-10-18T22:17:43Z</td>\n",
1446 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">0.338229166667</td>\n",
1447 |               "    </tr>\n",
1448 |               "    <tr>\n",
1449 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">350</td>\n",
1450 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">420315</td>\n",
1451 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">2010-10-18T19:28:34Z</td>\n",
1452 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">0</td>\n",
1453 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">2010-10-18T22:17:43Z</td>\n",
1454 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">0.117465277778</td>\n",
1455 |               "    </tr>\n",
1456 |               "    <tr>\n",
1457 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">456</td>\n",
1458 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">420315</td>\n",
1459 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">2010-10-18T16:00:08Z</td>\n",
1460 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">0</td>\n",
1461 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">2010-10-18T22:17:43Z</td>\n",
1462 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">0.262210648148</td>\n",
1463 |               "    </tr>\n",
1464 |               "    <tr>\n",
1465 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">515</td>\n",
1466 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">420315</td>\n",
1467 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">2010-10-18T11:42:06Z</td>\n",
1468 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">0</td>\n",
1469 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">2010-10-18T22:17:43Z</td>\n",
1470 |               "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">0.441400462963</td>\n",
1471 |               "    </tr>\n",
1472 |               "</table>\n",
1473 |               "[10 rows x 6 columns]<br/>\n",
1474 |               "</div>"
1475 |             ],
1476 |             "text/plain": [
1477 |               "Columns:\n",
1478 |               "\tstalkee\tint\n",
1479 |               "\tlocation_id\tint\n",
1480 |               "\tcheckin_ts_ee\tstr\n",
1481 |               "\tstalker\tint\n",
1482 |               "\tcheckin_ts_er\tstr\n",
1483 |               "\ttime_diff\tfloat\n",
1484 |               "\n",
1485 |               "Rows: 10\n",
1486 |               "\n",
1487 |               "Data:\n",
1488 |               "+---------+-------------+----------------------+---------+----------------------+\n",
1489 |               "| stalkee | location_id |    checkin_ts_ee     | stalker |    checkin_ts_er     |\n",
1490 |               "+---------+-------------+----------------------+---------+----------------------+\n",
1491 |               "|    7    |    420315   | 2010-10-18T20:24:42Z |    0    | 2010-10-18T22:17:43Z |\n",
1492 |               "|    7    |    420315   | 2010-10-18T15:08:58Z |    0    | 2010-10-18T22:17:43Z |\n",
1493 |               "|    31   |    420315   | 2010-10-18T14:00:53Z |    0    | 2010-10-18T22:17:43Z |\n",
1494 |               "|    66   |    420315   | 2010-10-18T18:59:11Z |    0    | 2010-10-18T22:17:43Z |\n",
1495 |               "|   327   |    420315   | 2010-10-18T21:21:12Z |    0    | 2010-10-18T22:17:43Z |\n",
1496 |               "|   327   |    420315   | 2010-10-18T14:05:59Z |    0    | 2010-10-18T22:17:43Z |\n",
1497 |               "|   342   |    420315   | 2010-10-18T14:10:40Z |    0    | 2010-10-18T22:17:43Z |\n",
1498 |               "|   350   |    420315   | 2010-10-18T19:28:34Z |    0    | 2010-10-18T22:17:43Z |\n",
1499 |               "|   456   |    420315   | 2010-10-18T16:00:08Z |    0    | 2010-10-18T22:17:43Z |\n",
1500 |               "|   515   |    420315   | 2010-10-18T11:42:06Z |    0    | 2010-10-18T22:17:43Z |\n",
1501 |               "+---------+-------------+----------------------+---------+----------------------+\n",
1502 |               "+-----------------+\n",
1503 |               "|    time_diff    |\n",
1504 |               "+-----------------+\n",
1505 |               "| 0.0784837962963 |\n",
1506 |               "|  0.297743055556 |\n",
1507 |               "|  0.345023148148 |\n",
1508 |               "|  0.13787037037  |\n",
1509 |               "| 0.0392476851852 |\n",
1510 |               "|  0.341481481481 |\n",
1511 |               "|  0.338229166667 |\n",
1512 |               "|  0.117465277778 |\n",
1513 |               "|  0.262210648148 |\n",
1514 |               "|  0.441400462963 |\n",
1515 |               "+-----------------+\n",
1516 |               "[10 rows x 6 columns]"
1517 |             ]
1518 |           },
1519 |           "metadata": {
1520 |             "tags": []
1521 |           },
1522 |           "execution_count": 11
1523 |         }
1524 |       ]
1525 |     },
1526 |     {
1527 |       "metadata": {
1528 |         "id": "lFli73MVF6S3",
1529 |         "colab_type": "code",
1530 |         "colab": {}
1531 |       },
1532 |       "cell_type": "code",
1533 |       "source": [
1534 |         "final_result = ( pairs_filtered[['stalkee', 'stalker', 'location_id']]\n",
1535 |         "                    .unique()\n",
1536 |         "                    .groupby( ['stalkee', 'stalker'], {\"location_count\": agg.COUNT() })\n",
1537 |         "                    .topk( 'location_count', k=5 ) )\n",
1538 |         "final_result.materialize()  # force evaluation; called separately so final_result stays bound to the SFrame"
1539 |       ],
1540 |       "execution_count": 0,
1541 |       "outputs": []
1542 |     },
1543 |     {
1544 |       "metadata": {
1545 |         "id": "m3fxFqCbF8zt",
1546 |         "colab_type": "code",
1547 |         "colab": {}
1548 |       },
1549 |       "cell_type": "code",
1550 |       "source": [
1551 |         "print( final_result )"
1552 |       ],
1553 |       "execution_count": 0,
1554 |       "outputs": []
1555 |     },
1556 |     {
1557 |       "metadata": {
1558 |         "id": "zg8dAd69A0eU",
1559 |         "colab_type": "text"
1560 |       },
1561 |       "cell_type": "markdown",
1562 |       "source": [
1563 |         "## Articles, Repositories, etc.\n",
1564 |         "\n",
1565 |         "*   https://medium.com/@nilotic2/a-guide-to-turi-create-a72f53f26721\n",
1566 |         "*   https://blog.usejournal.com/python-for-big-data-computation-on-a-single-computer-c232046df3c3\n",
1567 |         "*   https://github.com/onmyway133/Avengers"
1568 |       ]
1569 |     }
1570 |   ]
1571 | }


--------------------------------------------------------------------------------