├── README.md
├── turicreate_activity_classification.ipynb
├── turicreate_classify_data.ipynb
├── turicreate_image_classification.ipynb
├── turicreate_image_similarity.ipynb
├── turicreate_object_detection.ipynb
├── turicreate_recommender.ipynb
├── turicreate_sframes_intro.ipynb
├── turicreate_style_transfer.ipynb
└── turicreate_text_classification.ipynb
/README.md:
--------------------------------------------------------------------------------
1 | # turicreate-colab
2 | A collection of Google Colaboratory notebooks for Turi Create created from various available resources.
3 |
4 | Click a link below to open the corresponding notebook in Google Colab. To learn more about Google Colab, start at https://colab.research.google.com/. After Colab opens the notebook, open the `File` menu and select `Save a copy in Drive...` to get an editable copy in your own Google Drive.
5 |
6 | * SFrames Introduction: https://colab.research.google.com/github/jagatfx/turicreate-colab/blob/master/turicreate_sframes_intro.ipynb
7 | * Style Transfer: https://colab.research.google.com/github/jagatfx/turicreate-colab/blob/master/turicreate_style_transfer.ipynb
8 | * Image Classification: https://colab.research.google.com/github/jagatfx/turicreate-colab/blob/master/turicreate_image_classification.ipynb
9 | * Image Similarity: https://colab.research.google.com/github/jagatfx/turicreate-colab/blob/master/turicreate_image_similarity.ipynb
10 | * Activity Classification: https://colab.research.google.com/github/jagatfx/turicreate-colab/blob/master/turicreate_activity_classification.ipynb
11 | * Data Classification: https://colab.research.google.com/github/jagatfx/turicreate-colab/blob/master/turicreate_classify_data.ipynb
12 | * Object Detection: https://colab.research.google.com/github/jagatfx/turicreate-colab/blob/master/turicreate_object_detection.ipynb
13 | * Recommender: https://colab.research.google.com/github/jagatfx/turicreate-colab/blob/master/turicreate_recommender.ipynb
14 | * Text Classification: https://colab.research.google.com/github/jagatfx/turicreate-colab/blob/master/turicreate_text_classification.ipynb
15 |
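The links above all follow the same layout: the Colab host, then `github`, the repository owner and name, `blob`, the branch, and the notebook file. As a minimal sketch of that pattern (the helper name `colab_url` is hypothetical and not part of this repository), a link for any notebook in the list could be built like this:

```python
# Build "open in Colab" links for notebooks hosted on GitHub.
# The URL layout mirrors the links listed in this README;
# colab_url is an illustrative helper, not part of the repository.
COLAB_BASE = "https://colab.research.google.com/github"


def colab_url(user, repo, notebook, branch="master"):
    """Return the Colab URL for a notebook file in a GitHub repository."""
    return f"{COLAB_BASE}/{user}/{repo}/blob/{branch}/{notebook}"


notebooks = [
    "turicreate_sframes_intro.ipynb",
    "turicreate_activity_classification.ipynb",
]
for name in notebooks:
    print(colab_url("jagatfx", "turicreate-colab", name))
```

The branch defaults to `master` because that is the branch used in every link listed here.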
--------------------------------------------------------------------------------
/turicreate_activity_classification.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "nbformat": 4,
3 | "nbformat_minor": 0,
4 | "metadata": {
5 | "colab": {
6 | "name": "turicreate-activity-classification.ipynb",
7 | "version": "0.3.2",
8 | "provenance": [],
9 | "collapsed_sections": [],
10 | "include_colab_link": true
11 | },
12 | "kernelspec": {
13 | "name": "python3",
14 | "display_name": "Python 3"
15 | },
16 | "accelerator": "GPU"
17 | },
18 | "cells": [
19 | {
20 | "cell_type": "markdown",
21 | "metadata": {
22 | "id": "view-in-github",
23 | "colab_type": "text"
24 | },
25 | "source": [
26 | "[View in Colaboratory](https://colab.research.google.com/github/jagatfx/turicreate-colab/blob/master/turicreate_activity_classification.ipynb)"
27 | ]
28 | },
29 | {
30 | "metadata": {
31 | "id": "3zKSmHFi38IA",
32 | "colab_type": "text"
33 | },
34 | "cell_type": "markdown",
35 | "source": [
36 | "# Activity Classification\n",
37 | "https://apple.github.io/turicreate/docs/userguide/activity_classifier/\n",
38 | "\n",
39 | "Activity classification is the task of identifying a pre-defined set of physical actions using motion-sensory inputs. Such sensors include accelerometers, gyroscopes, thermostats, and more found in most handheld devices today.\n",
40 | "\n",
41 | "Possible applications include counting swimming laps using a watch's accelerometer data, turning on Bluetooth controlled lights when recognizing a certain gesture using gyroscope data from a handheld phone, or creating shortcuts to your favorite phone applications using hand gestures.\n",
42 | "\n",
43 | "The activity classifier in Turi Create creates a deep learning model capable of detecting temporal features in sensor data, lending itself well to the task of activity classification. Before we dive into the model architecture, let's see a working example."
44 | ]
45 | },
46 | {
47 | "metadata": {
48 | "id": "-ywt2dW81uvE",
49 | "colab_type": "text"
50 | },
51 | "cell_type": "markdown",
52 | "source": [
53 | "## Turi Create and GPU Setup"
54 | ]
55 | },
56 | {
57 | "metadata": {
58 | "id": "nZBUZmlD1vWh",
59 | "colab_type": "code",
60 | "colab": {}
61 | },
62 | "cell_type": "code",
63 | "source": [
64 | "!apt install libnvrtc8.0\n",
65 | "!pip uninstall -y mxnet-cu80 && pip install mxnet-cu80==1.1.0\n",
66 | "!pip install turicreate"
67 | ],
68 | "execution_count": 0,
69 | "outputs": []
70 | },
71 | {
72 | "metadata": {
73 | "id": "Wtd2fPfSbPe9",
74 | "colab_type": "text"
75 | },
76 | "cell_type": "markdown",
77 | "source": [
78 | "## Google Drive Access\n",
79 | "\n",
80 | "You will be asked to click a link to generate a secret key to access your Google Drive. \n",
81 | "\n",
82 | "Copy the secret key and paste it into the space provided in the notebook."
83 | ]
84 | },
85 | {
86 | "metadata": {
87 | "id": "BvGd7tK8bQmM",
88 | "colab_type": "code",
89 | "colab": {
90 | "base_uri": "https://localhost:8080/",
91 | "height": 35
92 | },
93 | "outputId": "73652884-cee8-44a9-fee7-3341cbfde329"
94 | },
95 | "cell_type": "code",
96 | "source": [
97 | "import os.path\n",
98 | "from google.colab import drive\n",
99 | "\n",
100 | "# mount Google Drive to /content/drive/My Drive/\n",
101 | "if os.path.isdir(\"/content/drive/My Drive\"):\n",
102 | "    print(\"Google Drive already mounted\")\n",
103 | "else:\n",
104 | "    drive.mount('/content/drive')"
105 | ],
106 | "execution_count": 1,
107 | "outputs": [
108 | {
109 | "output_type": "stream",
110 | "text": [
111 | "Google Drive already mounted\n"
112 | ],
113 | "name": "stdout"
114 | }
115 | ]
116 | },
117 | {
118 | "metadata": {
119 | "id": "Cv8D8h_8bfQ2",
120 | "colab_type": "text"
121 | },
122 | "cell_type": "markdown",
123 | "source": [
124 | "## Fetch Data"
125 | ]
126 | },
127 | {
128 | "metadata": {
129 | "id": "gb2Y5xDHJQoh",
130 | "colab_type": "code",
131 | "colab": {}
132 | },
133 | "cell_type": "code",
134 | "source": [
135 | "import os.path\n",
136 | "import urllib.request\n",
137 | "import tarfile\n",
138 | "import zipfile\n",
139 | "import gzip\n",
140 | "from shutil import copy\n",
141 | "\n",
142 | "def fetch_remote_datafile(filename, remote_url):\n",
143 | "    if os.path.isfile(\"./\" + filename):\n",
144 | "        print(\"already have \" + filename + \" in workspace\")\n",
145 | "        return\n",
146 | "    print(\"fetching \" + filename + \" from \" + remote_url + \"...\")\n",
147 | "    urllib.request.urlretrieve(remote_url, \"./\" + filename)\n",
148 | "\n",
149 | "def cache_datafile_in_drive(filename):\n",
150 | "    if not os.path.isfile(\"./\" + filename):\n",
151 | "        print(\"cannot cache \" + filename + \", it is not in workspace\")\n",
152 | "        return\n",
153 | "\n",
154 | "    data_drive_path = \"/content/drive/My Drive/Colab Notebooks/data/\"\n",
155 | "    if os.path.isfile(data_drive_path + filename):\n",
156 | "        print(filename + \" has already been stored in Google Drive\")\n",
157 | "    else:\n",
158 | "        print(\"copying \" + filename + \" to \" + data_drive_path)\n",
159 | "        os.makedirs(data_drive_path, exist_ok=True)  # make sure the cache folder exists\n",
160 | "        copy(\"./\" + filename, data_drive_path)\n",
161 | "\n",
162 | "def load_datafile_from_drive(filename, remote_url=None):\n",
163 | "    data_drive_path = \"/content/drive/My Drive/Colab Notebooks/data/\"\n",
164 | "    if os.path.isfile(\"./\" + filename):\n",
165 | "        print(\"already have \" + filename + \" in workspace\")\n",
166 | "    elif os.path.isfile(data_drive_path + filename):\n",
167 | "        print(\"have \" + filename + \" in Google Drive, copying to workspace...\")\n",
168 | "        copy(data_drive_path + filename, \".\")\n",
169 | "    elif remote_url is not None:\n",
170 | "        fetch_remote_datafile(filename, remote_url)\n",
171 | "    else:\n",
172 | "        print(\"error: you need to manually download \" + filename + \" and put in drive\")\n",
173 | "\n",
174 | "def extract_datafile(filename, expected_extract_artifact=None):\n",
175 | "    if expected_extract_artifact is not None and (os.path.isfile(expected_extract_artifact) or os.path.isdir(expected_extract_artifact)):\n",
176 | "        print(\"files in \" + filename + \" have already been extracted\")\n",
177 | "    elif not os.path.isfile(\"./\" + filename):\n",
178 | "        print(\"error: cannot extract \" + filename + \", it is not in the workspace\")\n",
179 | "    else:\n",
180 | "        extension = filename.split('.')[-1]\n",
181 | "        if extension == \"zip\":\n",
182 | "            print(\"extracting \" + filename + \"...\")\n",
183 | "            data_file = open(filename, \"rb\")\n",
184 | "            z = zipfile.ZipFile(data_file)\n",
185 | "            for name in z.namelist():\n",
186 | "                print(\" extracting file\", name)\n",
187 | "                z.extract(name, \"./\")\n",
188 | "            data_file.close()\n",
189 | "        elif extension == \"gz\":\n",
190 | "            print(\"extracting \" + filename + \"...\")\n",
191 | "            if filename.split('.')[-2] == \"tar\":\n",
192 | "                tar = tarfile.open(filename)\n",
193 | "                tar.extractall()\n",
194 | "                tar.close()\n",
195 | "            else:\n",
196 | "                data_zip_file = gzip.GzipFile(filename, 'rb')\n",
197 | "                data = data_zip_file.read()\n",
198 | "                data_zip_file.close()\n",
199 | "                extracted_file = open('.'.join(filename.split('.')[0:-1]), 'wb')\n",
200 | "                extracted_file.write(data)\n",
201 | "                extracted_file.close()\n",
202 | "        elif extension == \"tar\":\n",
203 | "            print(\"extracting \" + filename + \"...\")\n",
204 | "            tar = tarfile.open(filename)\n",
205 | "            tar.extractall()\n",
206 | "            tar.close()\n",
207 | "        elif extension == \"csv\":\n",
208 | "            print(\"do not need to extract csv\")\n",
209 | "        else:\n",
210 | "            print(\"cannot extract \" + filename)\n",
211 | "\n",
212 | "def load_cache_extract_datafile(filename, expected_extract_artifact=None, remote_url=None):\n",
213 | "    load_datafile_from_drive(filename, remote_url)\n",
214 | "    extract_datafile(filename, expected_extract_artifact)\n",
215 | "    cache_datafile_in_drive(filename)\n",
216 | " "
217 | ],
218 | "execution_count": 0,
219 | "outputs": []
220 | },
221 | {
222 | "metadata": {
223 | "id": "XjzgLYjNJRGM",
224 | "colab_type": "code",
225 | "colab": {
226 | "base_uri": "https://localhost:8080/",
227 | "height": 71
228 | },
229 | "outputId": "5f8a1560-c7e6-477c-d2cb-4aed5bc0cdea"
230 | },
231 | "cell_type": "code",
232 | "source": [
233 | "load_cache_extract_datafile(\"HAPT Data Set.zip\", \"RawData\", \"http://archive.ics.uci.edu/ml/machine-learning-databases/00341/HAPT%20Data%20Set.zip\")"
234 | ],
235 | "execution_count": 5,
236 | "outputs": [
237 | {
238 | "output_type": "stream",
239 | "text": [
240 | "already have HAPT Data Set.zip in workspace\n",
241 | "files in HAPT Data Set.zip have already been extracted\n",
242 | "HAPT Data Set.zip has already been stored in Google Drive\n"
243 | ],
244 | "name": "stdout"
245 | }
246 | ]
247 | },
248 | {
249 | "metadata": {
250 | "id": "AdTJumQccZO8",
251 | "colab_type": "text"
252 | },
253 | "cell_type": "markdown",
254 | "source": [
255 | "## Setup Turi Create"
256 | ]
257 | },
258 | {
259 | "metadata": {
260 | "id": "CZH8VDCOcebW",
261 | "colab_type": "code",
262 | "colab": {}
263 | },
264 | "cell_type": "code",
265 | "source": [
266 | "import mxnet as mx\n",
267 | "import turicreate as tc"
268 | ],
269 | "execution_count": 0,
270 | "outputs": []
271 | },
272 | {
273 | "metadata": {
274 | "id": "TfwLewM6ce6-",
275 | "colab_type": "code",
276 | "colab": {}
277 | },
278 | "cell_type": "code",
279 | "source": [
280 | "# Use all GPUs (default)\n",
281 | "tc.config.set_num_gpus(-1)\n",
282 | "\n",
283 | "# Use only 1 GPU\n",
284 | "#tc.config.set_num_gpus(1)\n",
285 | "\n",
286 | "# Use CPU\n",
287 | "#tc.config.set_num_gpus(0)"
288 | ],
289 | "execution_count": 0,
290 | "outputs": []
291 | },
292 | {
293 | "metadata": {
294 | "id": "ei1gcD_bQNVY",
295 | "colab_type": "text"
296 | },
297 | "cell_type": "markdown",
298 | "source": [
299 | "## Data Preparation\n",
300 | "\n",
301 | "https://apple.github.io/turicreate/docs/userguide/activity_classifier/data-preparation.html"
302 | ]
303 | },
304 | {
305 | "metadata": {
306 | "id": "JiQ0CGKMbEsI",
307 | "colab_type": "code",
308 | "colab": {
309 | "base_uri": "https://localhost:8080/",
310 | "height": 238
311 | },
312 | "outputId": "12f15316-89f0-4181-e986-a38d59eb3307"
313 | },
314 | "cell_type": "code",
315 | "source": [
316 | "data_dir = './RawData/'\n",
317 | "\n",
318 | "def find_label_for_containing_interval(intervals, index):\n",
319 | "    containing_interval = intervals[:, 0][(intervals[:, 1] <= index) & (index <= intervals[:, 2])]\n",
320 | "    if len(containing_interval) == 1:\n",
321 | "        return containing_interval[0]\n",
322 | "\n",
323 | "# Load labels\n",
324 | "labels = tc.SFrame.read_csv(data_dir + 'labels.txt', delimiter=' ', header=False, verbose=False)\n",
325 | "labels = labels.rename({'X1': 'exp_id', 'X2': 'user_id', 'X3': 'activity_id', 'X4': 'start', 'X5': 'end'})\n",
326 | "labels.head()"
327 | ],
328 | "execution_count": 11,
329 | "outputs": [
330 | {
331 | "output_type": "execute_result",
332 | "data": {
416 | "text/plain": [
417 | "Columns:\n",
418 | "\texp_id\tint\n",
419 | "\tuser_id\tint\n",
420 | "\tactivity_id\tint\n",
421 | "\tstart\tint\n",
422 | "\tend\tint\n",
423 | "\n",
424 | "Rows: 10\n",
425 | "\n",
426 | "Data:\n",
427 | "+--------+---------+-------------+-------+------+\n",
428 | "| exp_id | user_id | activity_id | start | end |\n",
429 | "+--------+---------+-------------+-------+------+\n",
430 | "| 1 | 1 | 5 | 250 | 1232 |\n",
431 | "| 1 | 1 | 7 | 1233 | 1392 |\n",
432 | "| 1 | 1 | 4 | 1393 | 2194 |\n",
433 | "| 1 | 1 | 8 | 2195 | 2359 |\n",
434 | "| 1 | 1 | 5 | 2360 | 3374 |\n",
435 | "| 1 | 1 | 11 | 3375 | 3662 |\n",
436 | "| 1 | 1 | 6 | 3663 | 4538 |\n",
437 | "| 1 | 1 | 10 | 4539 | 4735 |\n",
438 | "| 1 | 1 | 4 | 4736 | 5667 |\n",
439 | "| 1 | 1 | 9 | 5668 | 5859 |\n",
440 | "+--------+---------+-------------+-------+------+\n",
441 | "[10 rows x 5 columns]"
442 | ]
443 | },
444 | "metadata": {
445 | "tags": []
446 | },
447 | "execution_count": 11
448 | }
449 | ]
450 | },
451 | {
452 | "metadata": {
453 | "id": "a_oKPOf6cvAI",
454 | "colab_type": "text"
455 | },
456 | "cell_type": "markdown",
457 | "source": [
458 | "Next, we need to get the accelerometer and gyroscope data for each experiment. For each experiment, every sensor's data is in a separate file. In the code below we load the accelerometer and gyroscope data from all experiments into a single SFrame. While loading the collected samples, we also calculate the label for each sample using our previously defined function. The final SFrame contains a column named exp_id that identifies each unique session."
459 | ]
460 | },
461 | {
462 | "metadata": {
463 | "id": "MVO9uGTocvqe",
464 | "colab_type": "code",
465 | "colab": {}
466 | },
467 | "cell_type": "code",
468 | "source": [
469 | "from glob import glob\n",
470 | "\n",
471 | "acc_files = glob(data_dir + 'acc_*.txt')\n",
472 | "gyro_files = glob(data_dir + 'gyro_*.txt')\n",
473 | "\n",
474 | "# Load data\n",
475 | "data = tc.SFrame()\n",
476 | "files = zip(sorted(acc_files), sorted(gyro_files))\n",
477 | "for acc_file, gyro_file in files:\n",
478 | "    exp_id = int(acc_file.split('_')[1][-2:])\n",
479 | "\n",
480 | "    # Load accelerometer data\n",
481 | "    sf = tc.SFrame.read_csv(acc_file, delimiter=' ', header=False, verbose=False)\n",
482 | "    sf = sf.rename({'X1': 'acc_x', 'X2': 'acc_y', 'X3': 'acc_z'})\n",
483 | "    sf['exp_id'] = exp_id\n",
484 | "\n",
485 | "    # Load gyroscope data\n",
486 | "    gyro_sf = tc.SFrame.read_csv(gyro_file, delimiter=' ', header=False, verbose=False)\n",
487 | "    gyro_sf = gyro_sf.rename({'X1': 'gyro_x', 'X2': 'gyro_y', 'X3': 'gyro_z'})\n",
488 | "    sf = sf.add_columns(gyro_sf)\n",
489 | "\n",
490 | "    # Compute the label for each sample\n",
491 | "    exp_labels = labels[labels['exp_id'] == exp_id][['activity_id', 'start', 'end']].to_numpy()\n",
492 | "    sf = sf.add_row_number()\n",
493 | "    sf['activity_id'] = sf['id'].apply(lambda x: find_label_for_containing_interval(exp_labels, x))\n",
494 | "    sf = sf.remove_columns(['id'])\n",
495 | "\n",
496 | "    data = data.append(sf)"
497 | ],
498 | "execution_count": 0,
499 | "outputs": []
500 | },
501 | {
502 | "metadata": {
503 | "id": "UVT3HRrQc4zw",
504 | "colab_type": "text"
505 | },
506 | "cell_type": "markdown",
507 | "source": [
508 | "Finally, we encode the labels back into a readable string format, and save the resulting SFrame."
509 | ]
510 | },
511 | {
512 | "metadata": {
513 | "id": "rWRH6Mtnc5sU",
514 | "colab_type": "code",
515 | "colab": {}
516 | },
517 | "cell_type": "code",
518 | "source": [
519 | "target_map = {\n",
520 | "    1.: 'walking',\n",
521 | "    2.: 'climbing_upstairs',\n",
522 | "    3.: 'climbing_downstairs',\n",
523 | "    4.: 'sitting',\n",
524 | "    5.: 'standing',\n",
525 | "    6.: 'laying'\n",
526 | "}\n",
527 | "\n",
528 | "# Use the same labels used in the experiment\n",
529 | "data = data.filter_by(list(target_map.keys()), 'activity_id')\n",
530 | "data['activity'] = data['activity_id'].apply(lambda x: target_map[x])\n",
531 | "data = data.remove_column('activity_id')\n",
532 | "\n",
533 | "data.save('hapt_data.sframe')"
534 | ],
535 | "execution_count": 0,
536 | "outputs": []
537 | },
538 | {
539 | "metadata": {
540 | "id": "fprN0DI7eC0a",
541 | "colab_type": "code",
542 | "colab": {
543 | "base_uri": "https://localhost:8080/",
544 | "height": 442
545 | },
546 | "outputId": "f2b416da-05b8-4f5b-8bc7-41940d1655a3"
547 | },
548 | "cell_type": "code",
549 | "source": [
550 | "data.head()"
551 | ],
552 | "execution_count": 15,
553 | "outputs": [
554 | {
555 | "output_type": "execute_result",
556 | "data": {
557 | "text/html": [
558 | "\n",
559 | " \n",
560 | " acc_x | \n",
561 | " acc_y | \n",
562 | " acc_z | \n",
563 | " exp_id | \n",
564 | " gyro_x | \n",
565 | "
\n",
566 | " \n",
567 | " 1.020833394742025 | \n",
568 | " -0.1250000020616516 | \n",
569 | " 0.105555564319952 | \n",
570 | " 1 | \n",
571 | " -0.002748893573880196 | \n",
572 | "
\n",
573 | " \n",
574 | " 1.025000070391787 | \n",
575 | " -0.1250000020616516 | \n",
576 | " 0.1013888947481719 | \n",
577 | " 1 | \n",
578 | " -0.0003054326225537807 | \n",
579 | "
\n",
580 | " \n",
581 | " 1.020833394742025 | \n",
582 | " -0.1250000020616516 | \n",
583 | " 0.1041666724366978 | \n",
584 | " 1 | \n",
585 | " 0.01221730466932058 | \n",
586 | "
\n",
587 | " \n",
588 | " 1.016666719092262 | \n",
589 | " -0.1250000020616516 | \n",
590 | " 0.1083333359304957 | \n",
591 | " 1 | \n",
592 | " 0.01130100712180138 | \n",
593 | "
\n",
594 | " \n",
595 | " 1.018055610975516 | \n",
596 | " -0.1277777858281599 | \n",
597 | " 0.1083333359304957 | \n",
598 | " 1 | \n",
599 | " 0.01099557429552078 | \n",
600 | "
\n",
601 | " \n",
602 | " 1.018055610975516 | \n",
603 | " -0.1291666655554495 | \n",
604 | " 0.1041666724366978 | \n",
605 | " 1 | \n",
606 | " 0.009162978269159794 | \n",
607 | "
\n",
608 | " \n",
609 | " 1.01944450285877 | \n",
610 | " -0.1250000020616516 | \n",
611 | " 0.1013888947481719 | \n",
612 | " 1 | \n",
613 | " 0.01007927674800158 | \n",
614 | "
\n",
615 | " \n",
616 | " 1.016666719092262 | \n",
617 | " -0.1236111101783975 | \n",
618 | " 0.09722222517639174 | \n",
619 | " 1 | \n",
620 | " 0.01374446786940098 | \n",
621 | "
\n",
622 | " \n",
623 | " 1.020833394742025 | \n",
624 | " -0.1277777858281599 | \n",
625 | " 0.09861111705964588 | \n",
626 | " 1 | \n",
627 | " 0.009773843921720982 | \n",
628 | "
\n",
629 | " \n",
630 | " 1.01944450285877 | \n",
631 | " -0.1152777831908018 | \n",
632 | " 0.09444444748786576 | \n",
633 | " 1 | \n",
634 | " 0.01649336144328117 | \n",
635 | "
\n",
636 | "
\n",
637 | "
\n",
638 | " \n",
639 | " gyro_y | \n",
640 | " gyro_z | \n",
641 | " activity | \n",
642 | "
\n",
643 | " \n",
644 | " -0.00427605677396059 | \n",
645 | " 0.002748893573880196 | \n",
646 | " standing | \n",
647 | "
\n",
648 | " \n",
649 | " -0.002138028386980295 | \n",
650 | " 0.006108652334660292 | \n",
651 | " standing | \n",
652 | "
\n",
653 | " \n",
654 | " 0.0009162978967651724 | \n",
655 | " -0.00733038317412138 | \n",
656 | " standing | \n",
657 | "
\n",
658 | " \n",
659 | " -0.001832595793530345 | \n",
660 | " -0.006414085160940886 | \n",
661 | " standing | \n",
662 | "
\n",
663 | " \n",
664 | " -0.001527163083665073 | \n",
665 | " -0.004886921960860491 | \n",
666 | " standing | \n",
667 | "
\n",
668 | " \n",
669 | " -0.003054326167330146 | \n",
670 | " 0.01007927674800158 | \n",
671 | " standing | \n",
672 | "
\n",
673 | " \n",
674 | " -0.00366519158706069 | \n",
675 | " 0.0003054326225537807 | \n",
676 | " standing | \n",
677 | "
\n",
678 | " \n",
679 | " -0.01496619824320078 | \n",
680 | " 0.00427605677396059 | \n",
681 | " standing | \n",
682 | "
\n",
683 | " \n",
684 | " -0.006414085160940886 | \n",
685 | " 0.0003054326225537807 | \n",
686 | " standing | \n",
687 | "
\n",
688 | " \n",
689 | " 0.00366519158706069 | \n",
690 | " 0.003359758760780096 | \n",
691 | " standing | \n",
692 | "
\n",
693 | "
\n",
694 | "[10 rows x 8 columns]
\n",
695 | "
"
696 | ],
697 | "text/plain": [
698 | "Columns:\n",
699 | "\tacc_x\tfloat\n",
700 | "\tacc_y\tfloat\n",
701 | "\tacc_z\tfloat\n",
702 | "\texp_id\tint\n",
703 | "\tgyro_x\tfloat\n",
704 | "\tgyro_y\tfloat\n",
705 | "\tgyro_z\tfloat\n",
706 | "\tactivity\tstr\n",
707 | "\n",
708 | "Rows: 10\n",
709 | "\n",
710 | "Data:\n",
711 | "+-------------------+---------------------+---------------------+--------+\n",
712 | "| acc_x | acc_y | acc_z | exp_id |\n",
713 | "+-------------------+---------------------+---------------------+--------+\n",
714 | "| 1.020833394742025 | -0.1250000020616516 | 0.105555564319952 | 1 |\n",
715 | "| 1.025000070391787 | -0.1250000020616516 | 0.1013888947481719 | 1 |\n",
716 | "| 1.020833394742025 | -0.1250000020616516 | 0.1041666724366978 | 1 |\n",
717 | "| 1.016666719092262 | -0.1250000020616516 | 0.1083333359304957 | 1 |\n",
718 | "| 1.018055610975516 | -0.1277777858281599 | 0.1083333359304957 | 1 |\n",
719 | "| 1.018055610975516 | -0.1291666655554495 | 0.1041666724366978 | 1 |\n",
720 | "| 1.01944450285877 | -0.1250000020616516 | 0.1013888947481719 | 1 |\n",
721 | "| 1.016666719092262 | -0.1236111101783975 | 0.09722222517639174 | 1 |\n",
722 | "| 1.020833394742025 | -0.1277777858281599 | 0.09861111705964588 | 1 |\n",
723 | "| 1.01944450285877 | -0.1152777831908018 | 0.09444444748786576 | 1 |\n",
724 | "+-------------------+---------------------+---------------------+--------+\n",
725 | "+------------------------+-----------------------+-----------------------+\n",
726 | "| gyro_x | gyro_y | gyro_z |\n",
727 | "+------------------------+-----------------------+-----------------------+\n",
728 | "| -0.002748893573880196 | -0.00427605677396059 | 0.002748893573880196 |\n",
729 | "| -0.0003054326225537807 | -0.002138028386980295 | 0.006108652334660292 |\n",
730 | "| 0.01221730466932058 | 0.0009162978967651724 | -0.00733038317412138 |\n",
731 | "| 0.01130100712180138 | -0.001832595793530345 | -0.006414085160940886 |\n",
732 | "| 0.01099557429552078 | -0.001527163083665073 | -0.004886921960860491 |\n",
733 | "| 0.009162978269159794 | -0.003054326167330146 | 0.01007927674800158 |\n",
734 | "| 0.01007927674800158 | -0.00366519158706069 | 0.0003054326225537807 |\n",
735 | "| 0.01374446786940098 | -0.01496619824320078 | 0.00427605677396059 |\n",
736 | "| 0.009773843921720982 | -0.006414085160940886 | 0.0003054326225537807 |\n",
737 | "| 0.01649336144328117 | 0.00366519158706069 | 0.003359758760780096 |\n",
738 | "+------------------------+-----------------------+-----------------------+\n",
739 | "+----------+\n",
740 | "| activity |\n",
741 | "+----------+\n",
742 | "| standing |\n",
743 | "| standing |\n",
744 | "| standing |\n",
745 | "| standing |\n",
746 | "| standing |\n",
747 | "| standing |\n",
748 | "| standing |\n",
749 | "| standing |\n",
750 | "| standing |\n",
751 | "| standing |\n",
752 | "+----------+\n",
753 | "[10 rows x 8 columns]"
754 | ]
755 | },
756 | "metadata": {
757 | "tags": []
758 | },
759 | "execution_count": 15
760 | }
761 | ]
762 | },
763 | {
764 | "metadata": {
765 | "id": "mI82V7dvho-6",
766 | "colab_type": "code",
767 | "colab": {
768 | "base_uri": "https://localhost:8080/",
769 | "height": 164
770 | },
771 | "outputId": "2c8c5bc4-3871-40ed-feb6-8e30f1e334f8"
772 | },
773 | "cell_type": "code",
774 | "source": [
775 | "data.groupby('activity', [tc.aggregate.COUNT]).sort(\"Count\", ascending = False)"
776 | ],
777 | "execution_count": 16,
778 | "outputs": [
779 | {
780 | "output_type": "execute_result",
781 | "data": {
816 | "text/plain": [
817 | "Columns:\n",
818 | "\tactivity\tstr\n",
819 | "\tCount\tint\n",
820 | "\n",
821 | "Rows: 6\n",
822 | "\n",
823 | "Data:\n",
824 | "+---------------------+--------+\n",
825 | "| activity | Count |\n",
826 | "+---------------------+--------+\n",
827 | "| standing | 138105 |\n",
828 | "| laying | 136865 |\n",
829 | "| sitting | 126677 |\n",
830 | "| walking | 122091 |\n",
831 | "| climbing_upstairs | 116707 |\n",
832 | "| climbing_downstairs | 107961 |\n",
833 | "+---------------------+--------+\n",
834 | "[6 rows x 2 columns]"
835 | ]
836 | },
837 | "metadata": {
838 | "tags": []
839 | },
840 | "execution_count": 16
841 | }
842 | ]
843 | },
844 | {
845 | "metadata": {
846 | "id": "ikL5gDh-QQW5",
847 | "colab_type": "text"
848 | },
849 | "cell_type": "markdown",
850 | "source": [
851 | "## Example Activity Classification - HAPT Data"
852 | ]
853 | },
854 | {
855 | "metadata": {
856 | "id": "Am3pqXk2e4Yh",
857 | "colab_type": "code",
858 | "colab": {}
859 | },
860 | "cell_type": "code",
861 | "source": [
862 | "# Load sessions from preprocessed data\n",
863 | "data = tc.SFrame('hapt_data.sframe')"
864 | ],
865 | "execution_count": 0,
866 | "outputs": []
867 | },
868 | {
869 | "metadata": {
870 | "id": "yy_6w2sjtB7C",
871 | "colab_type": "code",
872 | "colab": {}
873 | },
874 | "cell_type": "code",
875 | "source": [
876 | "# Train/test split by recording sessions\n",
877 | "train, test = tc.activity_classifier.util.random_split_by_session(data, session_id='exp_id', fraction=0.8)"
878 | ],
879 | "execution_count": 0,
880 | "outputs": []
881 | },
882 | {
883 | "metadata": {
884 | "id": "U9KR53TtePFI",
885 | "colab_type": "code",
886 | "colab": {
887 | "base_uri": "https://localhost:8080/",
888 | "height": 370
889 | },
890 | "outputId": "35ed0af1-d681-468d-d837-0f8c363f2d08"
891 | },
892 | "cell_type": "code",
893 | "source": [
894 | "# Create an activity classifier\n",
895 | "model = tc.activity_classifier.create(train, session_id='exp_id', target='activity', prediction_window=50)"
896 | ],
897 | "execution_count": 7,
898 | "outputs": [
899 | {
900 | "output_type": "stream",
901 | "text": [
902 | "The dataset has less than the minimum of 100 sessions required for train-validation split. Continuing without validation set\n"
903 | ],
904 | "name": "stdout"
905 | },
906 | {
907 | "output_type": "display_data",
908 | "data": {
912 | "text/plain": [
913 | "Pre-processing 585143 samples..."
914 | ]
915 | },
916 | "metadata": {
917 | "tags": []
918 | }
919 | },
920 | {
921 | "output_type": "display_data",
922 | "data": {
926 | "text/plain": [
927 | "Using sequences of size 1000 for model creation."
928 | ]
929 | },
930 | "metadata": {
931 | "tags": []
932 | }
933 | },
934 | {
935 | "output_type": "display_data",
936 | "data": {
940 | "text/plain": [
941 | "Processed a total of 48 sessions."
942 | ]
943 | },
944 | "metadata": {
945 | "tags": []
946 | }
947 | },
948 | {
949 | "output_type": "stream",
950 | "text": [
951 | "Using GPU to create model (CUDA)\n",
952 | "+----------------+----------------+----------------+----------------+\n",
953 | "| Iteration | Train Accuracy | Train Loss | Elapsed Time |\n",
954 | "+----------------+----------------+----------------+----------------+\n",
955 | "| 1 | 0.623 | 0.977 | 0.6 |\n",
956 | "| 2 | 0.810 | 0.541 | 1.2 |\n",
957 | "| 3 | 0.846 | 0.412 | 1.8 |\n",
958 | "| 4 | 0.863 | 0.359 | 2.4 |\n",
959 | "| 5 | 0.873 | 0.322 | 3.0 |\n",
960 | "| 6 | 0.889 | 0.293 | 3.6 |\n",
961 | "| 7 | 0.895 | 0.264 | 4.2 |\n",
962 | "| 8 | 0.902 | 0.242 | 4.8 |\n",
963 | "| 9 | 0.911 | 0.224 | 5.4 |\n",
964 | "| 10 | 0.916 | 0.208 | 6.0 |\n",
965 | "+----------------+----------------+----------------+----------------+\n",
966 | "Training complete\n",
967 | "Total Time Spent: 5.95675s\n"
968 | ],
969 | "name": "stdout"
970 | }
971 | ]
972 | },
973 | {
974 | "metadata": {
975 | "id": "wEDXZ0VaeQf6",
976 | "colab_type": "code",
977 | "colab": {
978 | "base_uri": "https://localhost:8080/",
979 | "height": 910
980 | },
981 | "outputId": "5a9ea21d-57b5-4413-a05f-397db276f7a9"
982 | },
983 | "cell_type": "code",
984 | "source": [
985 | "# Evaluate the model and save the results into a dictionary\n",
986 | "metrics = model.evaluate(test)\n",
987 | "print(metrics)"
988 | ],
989 | "execution_count": 17,
990 | "outputs": [
991 | {
992 | "output_type": "stream",
993 | "text": [
994 | "{'accuracy': 0.9324280455461433, 'auc': 0.994000145914642, 'precision': 0.9344077064028339, 'recall': 0.9313875060212965, 'f1_score': 0.9325884730108659, 'log_loss': 0.2183073822297761, 'confusion_matrix': Columns:\n",
995 | "\ttarget_label\tstr\n",
996 | "\tpredicted_label\tstr\n",
997 | "\tcount\tint\n",
998 | "\n",
999 | "Rows: 31\n",
1000 | "\n",
1001 | "Data:\n",
1002 | "+---------------------+---------------------+-------+\n",
1003 | "| target_label | predicted_label | count |\n",
1004 | "+---------------------+---------------------+-------+\n",
1005 | "| climbing_downstairs | climbing_upstairs | 640 |\n",
1006 | "| climbing_upstairs | climbing_upstairs | 23073 |\n",
1007 | "| climbing_upstairs | climbing_downstairs | 1048 |\n",
1008 | "| laying | walking | 31 |\n",
1009 | "| climbing_downstairs | climbing_downstairs | 21240 |\n",
1010 | "| sitting | standing | 3346 |\n",
1011 | "| laying | laying | 29554 |\n",
1012 | "| standing | standing | 29827 |\n",
1013 | "| walking | sitting | 1351 |\n",
1014 | "| sitting | sitting | 25134 |\n",
1015 | "+---------------------+---------------------+-------+\n",
1016 | "[31 rows x 3 columns]\n",
1017 | "Note: Only the head of the SFrame is printed.\n",
1018 | "You can use print_rows(num_rows=m, num_columns=n) to print more rows and columns., 'roc_curve': Columns:\n",
1019 | "\tthreshold\tfloat\n",
1020 | "\tfpr\tfloat\n",
1021 | "\ttpr\tfloat\n",
1022 | "\tp\tint\n",
1023 | "\tn\tint\n",
1024 | "\tclass\tstr\n",
1025 | "\n",
1026 | "Rows: 600006\n",
1027 | "\n",
1028 | "Data:\n",
1029 | "+-----------+--------------------+-----+-------+--------+---------------------+\n",
1030 | "| threshold | fpr | tpr | p | n | class |\n",
1031 | "+-----------+--------------------+-----+-------+--------+---------------------+\n",
1032 | "| 0.0 | 1.0 | 1.0 | 23039 | 140224 | climbing_downstairs |\n",
1033 | "| 1e-05 | 0.9458010041077134 | 1.0 | 23039 | 140224 | climbing_downstairs |\n",
1034 | "| 2e-05 | 0.9026557507987221 | 1.0 | 23039 | 140224 | climbing_downstairs |\n",
1035 | "| 3e-05 | 0.8837574167047011 | 1.0 | 23039 | 140224 | climbing_downstairs |\n",
1036 | "| 4e-05 | 0.8687813783660429 | 1.0 | 23039 | 140224 | climbing_downstairs |\n",
1037 | "| 5e-05 | 0.8470304655408489 | 1.0 | 23039 | 140224 | climbing_downstairs |\n",
1038 | "| 6e-05 | 0.8288452761296212 | 1.0 | 23039 | 140224 | climbing_downstairs |\n",
1039 | "| 7e-05 | 0.813869237790963 | 1.0 | 23039 | 140224 | climbing_downstairs |\n",
1040 | "| 8e-05 | 0.8056680739388408 | 1.0 | 23039 | 140224 | climbing_downstairs |\n",
1041 | "| 9e-05 | 0.797823482428115 | 1.0 | 23039 | 140224 | climbing_downstairs |\n",
1042 | "+-----------+--------------------+-----+-------+--------+---------------------+\n",
1043 | "[600006 rows x 6 columns]\n",
1044 | "Note: Only the head of the SFrame is printed.\n",
1045 | "You can use print_rows(num_rows=m, num_columns=n) to print more rows and columns.}\n"
1046 | ],
1047 | "name": "stdout"
1048 | }
1049 | ]
1050 | },
1051 | {
1052 | "metadata": {
1053 | "id": "0211dGgJeRsh",
1054 | "colab_type": "code",
1055 | "colab": {
1056 | "base_uri": "https://localhost:8080/",
1057 | "height": 34
1058 | },
1059 | "outputId": "8ec5c45c-5bd3-4c8c-f691-81fd333168dc"
1060 | },
1061 | "cell_type": "code",
1062 | "source": [
1063 | "print(metrics['accuracy'])"
1064 | ],
1065 | "execution_count": 9,
1066 | "outputs": [
1067 | {
1068 | "output_type": "stream",
1069 | "text": [
1070 | "0.9324280455461433\n"
1071 | ],
1072 | "name": "stdout"
1073 | }
1074 | ]
1075 | },
1076 | {
1077 | "metadata": {
1078 | "id": "4ErNlq_vfhby",
1079 | "colab_type": "text"
1080 | },
1081 | "cell_type": "markdown",
1082 | "source": [
1083 | "Because the model was created from samples taken at 50 Hz with `prediction_window` set to 50, it produces one prediction per second. Invoking the newly created model on the above 3-second walking example (150 samples at 50 Hz) produces the following per-second predictions:"
1084 | ]
1085 | },
1086 | {
1087 | "metadata": {
1088 | "id": "0wldXO3kfjdv",
1089 | "colab_type": "code",
1090 | "colab": {
1091 | "base_uri": "https://localhost:8080/",
1092 | "height": 109
1093 | },
1094 | "outputId": "772997a2-0d54-41da-c1b7-20428cf7cabd"
1095 | },
1096 | "cell_type": "code",
1097 | "source": [
1098 | "walking_3_sec = data[(data['activity'] == 'walking') & (data['exp_id'] == 1)][1000:1150]\n",
1099 | "model.predict(walking_3_sec, output_frequency='per_window')"
1100 | ],
1101 | "execution_count": 12,
1102 | "outputs": [
1103 | {
1104 | "output_type": "execute_result",
1105 | "data": {
1132 | "text/plain": [
1133 | "Columns:\n",
1134 | "\tprediction_id\tint\n",
1135 | "\texp_id\tint\n",
1136 | "\tclass\tstr\n",
1137 | "\n",
1138 | "Rows: 3\n",
1139 | "\n",
1140 | "Data:\n",
1141 | "+---------------+--------+---------+\n",
1142 | "| prediction_id | exp_id | class |\n",
1143 | "+---------------+--------+---------+\n",
1144 | "| 0 | 1 | walking |\n",
1145 | "| 1 | 1 | walking |\n",
1146 | "| 2 | 1 | walking |\n",
1147 | "+---------------+--------+---------+\n",
1148 | "[3 rows x 3 columns]"
1149 | ]
1150 | },
1151 | "metadata": {
1152 | "tags": []
1153 | },
1154 | "execution_count": 12
1155 | }
1156 | ]
1157 | },
1158 | {
1159 | "metadata": {
1160 | "id": "8osqXiCQP2Vk",
1161 | "colab_type": "text"
1162 | },
1163 | "cell_type": "markdown",
1164 | "source": [
1165 | "## Save / Export Model"
1166 | ]
1167 | },
1168 | {
1169 | "metadata": {
1170 | "id": "skcTzbLiP0NU",
1171 | "colab_type": "code",
1172 | "colab": {}
1173 | },
1174 | "cell_type": "code",
1175 | "source": [
1176 | "# Save the model for later use in Turi Create\n",
1177 | "model.save('ActivityClassifier.model')"
1178 | ],
1179 | "execution_count": 0,
1180 | "outputs": []
1181 | },
1182 | {
1183 | "metadata": {
1184 | "id": "gsLbJ0M5P1iR",
1185 | "colab_type": "code",
1186 | "colab": {
1187 | "base_uri": "https://localhost:8080/",
1188 | "height": 67
1189 | },
1190 | "outputId": "3cec968c-7f20-4ba8-fb8e-bb806ae84979"
1191 | },
1192 | "cell_type": "code",
1193 | "source": [
1194 | "# Export for use in Core ML\n",
1195 | "model.export_coreml('ActivityClassifier.mlmodel')"
1196 | ],
1197 | "execution_count": 11,
1198 | "outputs": [
1199 | {
1200 | "output_type": "stream",
1201 | "text": [
1202 | "/usr/local/lib/python3.6/dist-packages/coremltools/_deps/__init__.py:118: DeprecationWarning: The 'warn' function is deprecated, use 'warning' instead\n",
1203 | " % (tensorflow.__version__, TF_MAX_VERSION))\n",
1204 | "WARNING:root:TensorFlow version 1.10.1 detected. Last version known to be fully compatible is 1.5.0 .\n"
1205 | ],
1206 | "name": "stderr"
1207 | }
1208 | ]
1209 | },
1210 | {
1211 | "metadata": {
1212 | "id": "y9zcbtW7fuPS",
1213 | "colab_type": "code",
1214 | "colab": {}
1215 | },
1216 | "cell_type": "code",
1217 | "source": [
1218 | "# download mlmodel locally\n",
1219 | "from google.colab import files\n",
1220 | "files.download(\"ActivityClassifier.mlmodel\")"
1221 | ],
1222 | "execution_count": 0,
1223 | "outputs": []
1224 | },
1225 | {
1226 | "metadata": {
1227 | "id": "BZiVwtK1fy9F",
1228 | "colab_type": "code",
1229 | "colab": {
1230 | "base_uri": "https://localhost:8080/",
1231 | "height": 34
1232 | },
1233 | "outputId": "531297da-e360-4bfb-8b09-6d424bb3c584"
1234 | },
1235 | "cell_type": "code",
1236 | "source": [
1237 | "# copy the exported Core ML model file to Google Drive\n",
1238 | "from shutil import copy\n",
1239 | "copy(\"/content/ActivityClassifier.mlmodel\", \"/content/drive/My Drive/Colab Notebooks/data/models/ActivityClassifier.mlmodel\")"
1240 | ],
1241 | "execution_count": 13,
1242 | "outputs": [
1243 | {
1244 | "output_type": "execute_result",
1245 | "data": {
1246 | "text/plain": [
1247 | "'/content/drive/Colab Notebooks/data/models/ActivityClassifier.mlmodel'"
1248 | ]
1249 | },
1250 | "metadata": {
1251 | "tags": []
1252 | },
1253 | "execution_count": 13
1254 | }
1255 | ]
1256 | },
1257 | {
1258 | "metadata": {
1259 | "id": "kJeYR7haf2b6",
1260 | "colab_type": "code",
1261 | "colab": {
1262 | "base_uri": "https://localhost:8080/",
1263 | "height": 34
1264 | },
1265 | "outputId": "4688a2cf-9b58-4f12-c2d5-ea8a50a795df"
1266 | },
1267 | "cell_type": "code",
1268 | "source": [
1269 | "# copy the saved Turi Create model directory to Google Drive\n",
1270 | "from shutil import copytree\n",
1271 | "copytree(\"/content/ActivityClassifier.model\", \"/content/drive/My Drive/Colab Notebooks/data/models/ActivityClassifier.model\")"
1272 | ],
1273 | "execution_count": 15,
1274 | "outputs": [
1275 | {
1276 | "output_type": "execute_result",
1277 | "data": {
1278 | "text/plain": [
1279 | "'/content/drive/Colab Notebooks/data/models/ActivityClassifier.model'"
1280 | ]
1281 | },
1282 | "metadata": {
1283 | "tags": []
1284 | },
1285 | "execution_count": 15
1286 | }
1287 | ]
1288 | },
1289 | {
1290 | "metadata": {
1291 | "id": "Q7RgoTcpgaKq",
1292 | "colab_type": "text"
1293 | },
1294 | "cell_type": "markdown",
1295 | "source": [
1296 | "## How does this work?\n",
1297 | "\n",
1298 | "The deep learning model relies on convolutional layers to extract temporal features from a single prediction window; for example, an arching movement could be a strong indicator of swimming. It also relies on recurrent layers to extract temporal features over time; for example, if a subject was swimming in the previous prediction window, it is most likely not skydiving in the next. Below is a sketch of the neural network used for the activity classifier in Turi Create.\n",
1299 | "\n",
1300 | "\n",
1301 | "\n",
1302 | "A single input to the neural network is a session as defined in the previous section. The convolutional layer operates on each prediction window, finding spatial features that may be relevant to the labeled activities. \n",
1303 | "\n",
1304 | "\n",
1305 | "\n",
1306 | "The output of the convolutional layer is a vector representation for each prediction window, encoding these learnt features. The recurrent layer takes as input a sequence of these vectors.\n",
1307 | "\n",
1308 | "The recurrent layer is specialized for learning temporal features across sequences. For example, it may learn that spatial features associated with walking are more likely to occur after detecting spatial features associated with running. These features are further encoded into the output of the recurrent layer.\n",
1309 | "\n",
1310 | "To detect these features along a session, the recurrent layer takes into account its own state, that is, the output of the recurrent layer for the previous prediction window. The output of the recurrent layer for the current prediction window is turned into a probability vector across all desired activities to produce the final classification."
1311 | ]
1312 | },
1313 | {
1314 | "metadata": {
1315 | "id": "licB01Ek33VN",
1316 | "colab_type": "text"
1317 | },
1318 | "cell_type": "markdown",
1319 | "source": [
1320 | "Blogs:\n",
1321 | "* https://medium.com/@howal/activity-monitoring-with-apples-turi-create-machine-learning-1043ce5b9203"
1322 | ]
1323 | }
1324 | ]
1325 | }
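
The windowing arithmetic described in the activity classifier notebook above (samples at 50 Hz, a `prediction_window` of 50, one prediction per second) can be sketched in plain Python. This is only an illustration, not Turi Create's implementation: `seconds_per_prediction` and `window_bounds` are hypothetical helper names, and keeping a trailing partial window as a shorter final window is an assumption made for the sketch.

```python
def seconds_per_prediction(sample_rate_hz, prediction_window):
    """Duration in seconds covered by one prediction window."""
    return prediction_window / sample_rate_hz


def window_bounds(num_samples, prediction_window):
    """Return (start, end) sample indices for each consecutive window.

    Assumption (not confirmed by the notebook): a trailing partial
    window is kept as a shorter final window.
    """
    return [
        (start, min(start + prediction_window, num_samples))
        for start in range(0, num_samples, prediction_window)
    ]


# The notebook's example slice [1000:1150] holds 150 samples recorded at 50 Hz.
print(seconds_per_prediction(50, 50))  # 1.0
print(window_bounds(num_samples=150, prediction_window=50))
# [(0, 50), (50, 100), (100, 150)]
```

The three windows correspond to the three rows (`prediction_id` 0, 1, 2) returned by `model.predict(walking_3_sec, output_frequency='per_window')` in the notebook.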
--------------------------------------------------------------------------------
/turicreate_sframes_intro.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "nbformat": 4,
3 | "nbformat_minor": 0,
4 | "metadata": {
5 | "colab": {
6 | "name": "turicreate-sframes-intro.ipynb",
7 | "version": "0.3.2",
8 | "provenance": [],
9 | "collapsed_sections": [],
10 | "include_colab_link": true
11 | },
12 | "kernelspec": {
13 | "name": "python3",
14 | "display_name": "Python 3"
15 | },
16 | "accelerator": "GPU"
17 | },
18 | "cells": [
19 | {
20 | "cell_type": "markdown",
21 | "metadata": {
22 | "id": "view-in-github",
23 | "colab_type": "text"
24 | },
25 | "source": [
26 | "[View in Colaboratory](https://colab.research.google.com/github/jagatfx/turicreate-colab/blob/master/turicreate_sframes_intro.ipynb)"
27 | ]
28 | },
29 | {
30 | "metadata": {
31 | "id": "MRYC2Vn02lMd",
32 | "colab_type": "text"
33 | },
34 | "cell_type": "markdown",
35 | "source": [
36 | "# Introduction to Turi Create SFrames\n",
37 | "\n",
38 | "https://github.com/apple/turicreate/blob/master/userguide/sframe/sframe-intro.md"
39 | ]
40 | },
41 | {
42 | "metadata": {
43 | "id": "qo2KqxVs2q8B",
44 | "colab_type": "text"
45 | },
46 | "cell_type": "markdown",
47 | "source": [
48 | "SFrames are the primary data structure for extracting data from other sources for use in Turi Create.\n",
49 | "\n",
50 | "They are similar to pandas DataFrames but do not need to be loaded as a whole into RAM, so they are not constrained by the memory of the machine running the code. This makes the SFrame a scalable data structure. It is column-immutable and supports out-of-core processing.\n",
51 | "\n",
52 | "SFrames can extract data from sources such as:\n",
53 | "\n",
54 | "* CSV\n",
55 | "* JSON\n",
56 | "* SQL databases"
57 | ]
58 | },
59 | {
60 | "metadata": {
61 | "id": "Ew1_T9B94_ip",
62 | "colab_type": "text"
63 | },
64 | "cell_type": "markdown",
65 | "source": [
66 | "## Turi Create and GPU Setup"
67 | ]
68 | },
69 | {
70 | "metadata": {
71 | "id": "DCSsL_vs5BHF",
72 | "colab_type": "code",
73 | "colab": {}
74 | },
75 | "cell_type": "code",
76 | "source": [
77 | "!apt install libnvrtc8.0\n",
78 | "!pip uninstall -y mxnet-cu80 && pip install mxnet-cu80==1.1.0\n",
79 | "!pip install turicreate"
80 | ],
81 | "execution_count": 0,
82 | "outputs": []
83 | },
84 | {
85 | "metadata": {
86 | "id": "-nVQfLiK6U_7",
87 | "colab_type": "text"
88 | },
89 | "cell_type": "markdown",
90 | "source": [
91 | "## Google Drive Access\n",
92 | "\n",
93 | "You will be asked to click a link to generate a secret key to access your Google Drive. \n",
94 | "\n",
95 | "Copy and paste the secret key into the space provided in the notebook."
96 | ]
97 | },
98 | {
99 | "metadata": {
100 | "id": "y5a7kHU26Vm9",
101 | "colab_type": "code",
102 | "colab": {}
103 | },
104 | "cell_type": "code",
105 | "source": [
106 | "import os.path\n",
107 | "from google.colab import drive\n",
108 | "\n",
109 | "# mount Google Drive to /content/drive/My Drive/\n",
110 | "if os.path.isdir(\"/content/drive/My Drive\"):\n",
111 | " print(\"Google Drive already mounted\")\n",
112 | "else:\n",
113 | " drive.mount('/content/drive')"
114 | ],
115 | "execution_count": 0,
116 | "outputs": []
117 | },
118 | {
119 | "metadata": {
120 | "id": "dTWuiXji9cEK",
121 | "colab_type": "text"
122 | },
123 | "cell_type": "markdown",
124 | "source": [
125 | "## Fetch Data"
126 | ]
127 | },
128 | {
129 | "metadata": {
130 | "id": "_q-u2IbDPA2Q",
131 | "colab_type": "code",
132 | "colab": {}
133 | },
134 | "cell_type": "code",
135 | "source": [
136 | "import os.path\n",
137 | "import urllib.request\n",
138 | "import tarfile\n",
139 | "import zipfile\n",
140 | "import gzip\n",
141 | "from shutil import copy\n",
142 | "\n",
143 | "def fetch_remote_datafile(filename, remote_url):\n",
144 | "    if os.path.isfile(\"./\" + filename):\n",
145 | "        print(\"already have \" + filename + \" in workspace\")\n",
146 | "        return\n",
147 | "    print(\"fetching \" + filename + \" from \" + remote_url + \"...\")\n",
148 | "    urllib.request.urlretrieve(remote_url, \"./\" + filename)\n",
149 | "\n",
150 | "def cache_datafile_in_drive(filename):\n",
151 | "    if not os.path.isfile(\"./\" + filename):\n",
152 | "        print(\"cannot cache \" + filename + \", it is not in workspace\")\n",
153 | "        return\n",
154 | "\n",
155 | "    data_drive_path = \"/content/drive/My Drive/Colab Notebooks/data/\"\n",
156 | "    if os.path.isfile(data_drive_path + filename):\n",
157 | "        print(filename + \" has already been stored in Google Drive\")\n",
158 | "    else:\n",
159 | "        print(\"copying \" + filename + \" to \" + data_drive_path)\n",
160 | "        copy(\"./\" + filename, data_drive_path)\n",
161 | "\n",
162 | "\n",
163 | "def load_datafile_from_drive(filename, remote_url=None):\n",
164 | "    data_drive_path = \"/content/drive/My Drive/Colab Notebooks/data/\"\n",
165 | "    if os.path.isfile(\"./\" + filename):\n",
166 | "        print(\"already have \" + filename + \" in workspace\")\n",
167 | "    elif os.path.isfile(data_drive_path + filename):\n",
168 | "        print(\"have \" + filename + \" in Google Drive, copying to workspace...\")\n",
169 | "        copy(data_drive_path + filename, \".\")\n",
170 | "    elif remote_url is not None:\n",
171 | "        fetch_remote_datafile(filename, remote_url)\n",
172 | "    else:\n",
173 | "        print(\"error: you need to manually download \" + filename + \" and put in drive\")\n",
174 | "\n",
175 | "def extract_datafile(filename, expected_extract_artifact=None):\n",
176 | "    if expected_extract_artifact is not None and (os.path.isfile(expected_extract_artifact) or os.path.isdir(expected_extract_artifact)):\n",
177 | "        print(\"files in \" + filename + \" have already been extracted\")\n",
178 | "    elif not os.path.isfile(\"./\" + filename):\n",
179 | "        print(\"error: cannot extract \" + filename + \", it is not in the workspace\")\n",
180 | "    else:\n",
181 | "        extension = filename.split('.')[-1]\n",
182 | "        if extension == \"zip\":\n",
183 | "            print(\"extracting \" + filename + \"...\")\n",
184 | "            data_file = open(filename, \"rb\")\n",
185 | "            z = zipfile.ZipFile(data_file)\n",
186 | "            for name in z.namelist():\n",
187 | "                print(\" extracting file\", name)\n",
188 | "                z.extract(name, \"./\")\n",
189 | "            data_file.close()\n",
190 | "        elif extension == \"gz\":\n",
191 | "            print(\"extracting \" + filename + \"...\")\n",
192 | "            if filename.split('.')[-2] == \"tar\":\n",
193 | "                tar = tarfile.open(filename)\n",
194 | "                tar.extractall()\n",
195 | "                tar.close()\n",
196 | "            else:\n",
197 | "                data_zip_file = gzip.GzipFile(filename, 'rb')\n",
198 | "                data = data_zip_file.read()\n",
199 | "                data_zip_file.close()\n",
200 | "                extracted_file = open('.'.join(filename.split('.')[0:-1]), 'wb')\n",
201 | "                extracted_file.write(data)\n",
202 | "                extracted_file.close()\n",
203 | "        elif extension == \"tar\":\n",
204 | "            print(\"extracting \" + filename + \"...\")\n",
205 | "            tar = tarfile.open(filename)\n",
206 | "            tar.extractall()\n",
207 | "            tar.close()\n",
208 | "        elif extension == \"csv\":\n",
209 | "            print(\"do not need to extract csv\")\n",
210 | "        else:\n",
211 | "            print(\"cannot extract \" + filename)\n",
212 | "\n",
213 | "def load_cache_extract_datafile(filename, expected_extract_artifact=None, remote_url=None):\n",
214 | "    load_datafile_from_drive(filename, remote_url)\n",
215 | "    extract_datafile(filename, expected_extract_artifact)\n",
216 | "    cache_datafile_in_drive(filename)\n",
217 | ""
218 | ],
219 | "execution_count": 0,
220 | "outputs": []
221 | },
222 | {
223 | "metadata": {
224 | "id": "ja-WPIYCPBo2",
225 | "colab_type": "code",
226 | "colab": {
227 | "base_uri": "https://localhost:8080/",
228 | "height": 71
229 | },
230 | "outputId": "e3a55a49-3595-4033-8fc4-c29f4e636047"
231 | },
232 | "cell_type": "code",
233 | "source": [
234 | "load_cache_extract_datafile(\"song_data.csv.zip\", \"song_data.csv\", \"https://static.turi.com/datasets/millionsong/song_data.csv\")"
235 | ],
236 | "execution_count": 3,
237 | "outputs": [
238 | {
239 | "output_type": "stream",
240 | "text": [
241 | "already have song_data.csv.zip in workspace\n",
242 | "files in song_data.csv.zip have already been extracted\n",
243 | "song_data.csv.zip has already been stored in Google Drive\n"
244 | ],
245 | "name": "stdout"
246 | }
247 | ]
248 | },
249 | {
250 | "metadata": {
251 | "id": "0cOYIVEQPRJM",
252 | "colab_type": "code",
253 | "colab": {
254 | "base_uri": "https://localhost:8080/",
255 | "height": 71
256 | },
257 | "outputId": "718819df-2d6f-487a-ec33-5f613a84af32"
258 | },
259 | "cell_type": "code",
260 | "source": [
261 | "load_cache_extract_datafile(\"10000.txt.zip\", \"10000.txt\", \"https://static.turi.com/datasets/millionsong/10000.txt\")"
262 | ],
263 | "execution_count": 5,
264 | "outputs": [
265 | {
266 | "output_type": "stream",
267 | "text": [
268 | "already have 10000.txt.zip in workspace\n",
269 | "files in 10000.txt.zip have already been extracted\n",
270 | "10000.txt.zip has already been stored in Google Drive\n"
271 | ],
272 | "name": "stdout"
273 | }
274 | ]
275 | },
276 | {
277 | "metadata": {
278 | "id": "lBLwgPKlPdGs",
279 | "colab_type": "code",
280 | "colab": {
281 | "base_uri": "https://localhost:8080/",
282 | "height": 71
283 | },
284 | "outputId": "a89500cb-6460-4b86-d8a3-df697dcb3582"
285 | },
286 | "cell_type": "code",
287 | "source": [
288 | "load_cache_extract_datafile(\"loc-gowalla_totalCheckins.txt.gz\", \"loc-gowalla_totalCheckins.txt\", \"https://snap.stanford.edu/data/loc-gowalla_totalCheckins.txt.gz\")"
289 | ],
290 | "execution_count": 11,
291 | "outputs": [
292 | {
293 | "output_type": "stream",
294 | "text": [
295 | "already have loc-gowalla_totalCheckins.txt.gz in workspace\n",
296 | "files in loc-gowalla_totalCheckins.txt.gz have already been extracted\n",
297 | "loc-gowalla_totalCheckins.txt.gz has already been stored in Google Drive\n"
298 | ],
299 | "name": "stdout"
300 | }
301 | ]
302 | },
303 | {
304 | "metadata": {
305 | "id": "_-x_YK5h9DfS",
306 | "colab_type": "text"
307 | },
308 | "cell_type": "markdown",
309 | "source": [
310 | "## Setup Turi Create"
311 | ]
312 | },
313 | {
314 | "metadata": {
315 | "id": "R80hcNX19F7X",
316 | "colab_type": "code",
317 | "colab": {}
318 | },
319 | "cell_type": "code",
320 | "source": [
321 | "import mxnet as mx\n",
322 | "import turicreate as tc"
323 | ],
324 | "execution_count": 0,
325 | "outputs": []
326 | },
327 | {
328 | "metadata": {
329 | "id": "JMpUas6Y9IHc",
330 | "colab_type": "code",
331 | "colab": {}
332 | },
333 | "cell_type": "code",
334 | "source": [
335 | "# Use all GPUs (default)\n",
336 | "tc.config.set_num_gpus(-1)\n",
337 | "\n",
338 | "# Use only 1 GPU\n",
339 | "#tc.config.set_num_gpus(1)\n",
340 | "\n",
341 | "# Use CPU\n",
342 | "#tc.config.set_num_gpus(0)"
343 | ],
344 | "execution_count": 0,
345 | "outputs": []
346 | },
347 | {
348 | "metadata": {
349 | "id": "zi7jvSvE27eP",
350 | "colab_type": "text"
351 | },
352 | "cell_type": "markdown",
353 | "source": [
354 | "## Sample Data\n",
355 | "\n",
356 | "The Million Song Dataset is a freely available collection of audio features and metadata for a million contemporary popular music tracks.\n",
357 | "\n",
358 | "https://labrosa.ee.columbia.edu/millionsong/\n",
359 | "\n",
360 | "The first table contains metadata about each song in the database. Here's how we load it into an SFrame:"
361 | ]
362 | },
363 | {
364 | "metadata": {
365 | "id": "Z1c84n0o2Znq",
366 | "colab_type": "code",
367 | "colab": {
368 | "base_uri": "https://localhost:8080/",
369 | "height": 233
370 | },
371 | "outputId": "87a21db1-2574-4da4-c40f-05b1cc002af8"
372 | },
373 | "cell_type": "code",
374 | "source": [
375 | "songs = tc.SFrame.read_csv(\"./song_data.csv\")"
376 | ],
377 | "execution_count": 7,
378 | "outputs": [
379 | {
380 | "output_type": "display_data",
381 | "data": {
385 | "text/plain": [
386 | "Finished parsing file /content/song_data.csv"
387 | ]
388 | },
389 | "metadata": {
390 | "tags": []
391 | }
392 | },
393 | {
394 | "output_type": "display_data",
395 | "data": {
399 | "text/plain": [
400 | "Parsing completed. Parsed 100 lines in 1.86884 secs."
401 | ]
402 | },
403 | "metadata": {
404 | "tags": []
405 | }
406 | },
407 | {
408 | "output_type": "stream",
409 | "text": [
410 | "------------------------------------------------------\n",
411 | "Inferred types from first 100 line(s) of file as \n",
412 | "column_type_hints=[str,str,str,str,int]\n",
413 | "If parsing fails due to incorrect types, you can correct\n",
414 | "the inferred type list above and pass it to read_csv in\n",
415 | "the column_type_hints argument\n",
416 | "------------------------------------------------------\n"
417 | ],
418 | "name": "stdout"
419 | },
420 | {
421 | "output_type": "display_data",
422 | "data": {
426 | "text/plain": [
427 | "Read 637410 lines. Lines per second: 490463"
428 | ]
429 | },
430 | "metadata": {
431 | "tags": []
432 | }
433 | },
434 | {
435 | "output_type": "display_data",
436 | "data": {
440 | "text/plain": [
441 | "Finished parsing file /content/song_data.csv"
442 | ]
443 | },
444 | "metadata": {
445 | "tags": []
446 | }
447 | },
448 | {
449 | "output_type": "display_data",
450 | "data": {
454 | "text/plain": [
455 | "Parsing completed. Parsed 1000000 lines in 1.49576 secs."
456 | ]
457 | },
458 | "metadata": {
459 | "tags": []
460 | }
461 | }
462 | ]
463 | },
464 | {
465 | "metadata": {
466 | "id": "lgWs2Lbo3fF_",
467 | "colab_type": "code",
468 | "colab": {
469 | "base_uri": "https://localhost:8080/",
470 | "height": 296
471 | },
472 | "outputId": "2b0ecc17-dd31-406a-9e89-c35212171886"
473 | },
474 | "cell_type": "code",
475 | "source": [
476 | "songs.head()"
477 | ],
478 | "execution_count": 21,
479 | "outputs": [
480 | {
481 | "output_type": "execute_result",
482 | "data": {
566 | "text/plain": [
567 | "Columns:\n",
568 | "\tsong_id\tstr\n",
569 | "\ttitle\tstr\n",
570 | "\trelease\tstr\n",
571 | "\tartist_name\tstr\n",
572 | "\tyear\tint\n",
573 | "\n",
574 | "Rows: 10\n",
575 | "\n",
576 | "Data:\n",
577 | "+--------------------+-------------------------------+\n",
578 | "| song_id | title |\n",
579 | "+--------------------+-------------------------------+\n",
580 | "| SOQMMHC12AB0180CB8 | Silent Night |\n",
581 | "| SOVFVAK12A8C1350D9 | Tanssi vaan |\n",
582 | "| SOGTUKN12AB017F4F1 | No One Could Ever |\n",
583 | "| SOBNYVR12A8C13558C | Si Vos Querés |\n",
584 | "| SOHSBXH12A8C13B0DF | Tangle Of Aspens |\n",
585 | "| SOZVAPQ12A8C13B63C | Symphony No. 1 G minor \"Si... |\n",
586 | "| SOQVRHI12A6D4FB2D7 | We Have Got Love |\n",
587 | "| SOEYRFT12AB018936C | 2 Da Beat Ch'yall |\n",
588 | "| SOPMIYT12A6D4F851E | Goodbye |\n",
589 | "| SOJCFMH12A8C13B0C2 | Mama_ mama can't you see ? |\n",
590 | "+--------------------+-------------------------------+\n",
591 | "+-------------------------------+-------------------------------+------+\n",
592 | "| release | artist_name | year |\n",
593 | "+-------------------------------+-------------------------------+------+\n",
594 | "| Monster Ballads X-Mas | Faster Pussy cat | 2003 |\n",
595 | "| Karkuteillä | Karkkiautomaatti | 1995 |\n",
596 | "| Butter | Hudson Mohawke | 2006 |\n",
597 | "| De Culo | Yerba Brava | 2003 |\n",
598 | "| Rene Ablaze Presents Winte... | Der Mystic | 0 |\n",
599 | "| Berwald: Symphonies Nos. 1... | David Montgomery | 0 |\n",
600 | "| Strictly The Best Vol. 34 | Sasha / Turbulence | 0 |\n",
601 | "| Da Bomb | Kris Kross | 1993 |\n",
602 | "| Danny Boy | Joseph Locke | 0 |\n",
603 | "| March to cadence with the ... | The Sun Harbor's Chorus-Do... | 0 |\n",
604 | "+-------------------------------+-------------------------------+------+\n",
605 | "[10 rows x 5 columns]"
606 | ]
607 | },
608 | "metadata": {
609 | "tags": []
610 | },
611 | "execution_count": 21
612 | }
613 | ]
614 | },
615 | {
616 | "metadata": {
617 | "id": "SlHixuEd3MI6",
618 | "colab_type": "text"
619 | },
620 | "cell_type": "markdown",
621 | "source": [
622 | "No options are needed for the simplest case, as the SFrame parser infers column types. There are, however, many options you may need to specify when importing a CSV file. Some of the more common options come into play when we load the usage data of users listening to these songs online:"
623 | ]
624 | },
625 | {
626 | "metadata": {
627 | "id": "hzAkDYbR3Mk1",
628 | "colab_type": "code",
629 | "colab": {
630 | "base_uri": "https://localhost:8080/",
631 | "height": 107
632 | },
633 | "outputId": "d682e0eb-56fd-4e3b-eaa4-cdbbee857031"
634 | },
635 | "cell_type": "code",
636 | "source": [
637 | "usage_data = tc.SFrame.read_csv(\"./10000.txt\",\n",
638 | " header=False,\n",
639 | " delimiter='\\t',\n",
640 | " column_type_hints={'X3':int})"
641 | ],
642 | "execution_count": 8,
643 | "outputs": [
644 | {
645 | "output_type": "display_data",
646 | "data": {
650 | "text/plain": [
651 | "Finished parsing file /content/10000.txt"
652 | ]
653 | },
654 | "metadata": {
655 | "tags": []
656 | }
657 | },
658 | {
659 | "output_type": "display_data",
660 | "data": {
664 | "text/plain": [
665 | "Parsing completed. Parsed 100 lines in 1.70624 secs."
666 | ]
667 | },
668 | "metadata": {
669 | "tags": []
670 | }
671 | },
672 | {
673 | "output_type": "display_data",
674 | "data": {
678 | "text/plain": [
679 | "Read 844838 lines. Lines per second: 741447"
680 | ]
681 | },
682 | "metadata": {
683 | "tags": []
684 | }
685 | },
686 | {
687 | "output_type": "display_data",
688 | "data": {
692 | "text/plain": [
693 | "Finished parsing file /content/10000.txt"
694 | ]
695 | },
696 | "metadata": {
697 | "tags": []
698 | }
699 | },
700 | {
701 | "output_type": "display_data",
702 | "data": {
706 | "text/plain": [
707 | "Parsing completed. Parsed 2000000 lines in 1.49596 secs."
708 | ]
709 | },
710 | "metadata": {
711 | "tags": []
712 | }
713 | }
714 | ]
715 | },
716 | {
717 | "metadata": {
718 | "id": "1hgMj9Mq3gCC",
719 | "colab_type": "code",
720 | "colab": {
721 | "base_uri": "https://localhost:8080/",
722 | "height": 415
723 | },
724 | "outputId": "7523fd11-2573-418f-bc5f-8f627583f254"
725 | },
726 | "cell_type": "code",
727 | "source": [
728 | "usage_data.head()"
729 | ],
730 | "execution_count": 9,
731 | "outputs": [
732 | {
733 | "output_type": "execute_result",
734 | "data": {
796 | "text/plain": [
797 | "Columns:\n",
798 | "\tX1\tstr\n",
799 | "\tX2\tstr\n",
800 | "\tX3\tint\n",
801 | "\n",
802 | "Rows: 10\n",
803 | "\n",
804 | "Data:\n",
805 | "+-------------------------------+--------------------+----+\n",
806 | "| X1 | X2 | X3 |\n",
807 | "+-------------------------------+--------------------+----+\n",
808 | "| b80344d063b5ccb3212f76538f... | SOAKIMP12A8C130995 | 1 |\n",
809 | "| b80344d063b5ccb3212f76538f... | SOBBMDR12A8C13253B | 2 |\n",
810 | "| b80344d063b5ccb3212f76538f... | SOBXHDL12A81C204C0 | 1 |\n",
811 | "| b80344d063b5ccb3212f76538f... | SOBYHAJ12A6701BF1D | 1 |\n",
812 | "| b80344d063b5ccb3212f76538f... | SODACBL12A8C13C273 | 1 |\n",
813 | "| b80344d063b5ccb3212f76538f... | SODDNQT12A6D4F5F7E | 5 |\n",
814 | "| b80344d063b5ccb3212f76538f... | SODXRTY12AB0180F3B | 1 |\n",
815 | "| b80344d063b5ccb3212f76538f... | SOFGUAY12AB017B0A8 | 1 |\n",
816 | "| b80344d063b5ccb3212f76538f... | SOFRQTD12A81C233C0 | 1 |\n",
817 | "| b80344d063b5ccb3212f76538f... | SOHQWYZ12A6D4FA701 | 1 |\n",
818 | "+-------------------------------+--------------------+----+\n",
819 | "[10 rows x 3 columns]"
820 | ]
821 | },
822 | "metadata": {
823 | "tags": []
824 | },
825 | "execution_count": 9
826 | }
827 | ]
828 | },
829 | {
830 | "metadata": {
831 | "id": "NJzJVDKF3SGO",
832 | "colab_type": "text"
833 | },
834 | "cell_type": "markdown",
835 | "source": [
836 | "The `header` and `delimiter` options are needed because this particular CSV file does not provide column names in its first line, and its values are separated by tabs, not commas. The `column_type_hints` option keeps the SFrame CSV parser from attempting to infer the datatype of each column, which it does by default. For a full list of options when parsing CSV files, see the [API Reference](https://apple.github.io/turicreate/docs/api/generated/turicreate.SFrame.read_csv.html#turicreate.SFrame.read_csv)."
837 | ]
838 | },
839 | {
840 | "metadata": {
841 | "id": "-8jl3ffU3k6v",
842 | "colab_type": "text"
843 | },
844 | "cell_type": "markdown",
845 | "source": [
846 | "The parser assigned the default column names `X1`, `X2` and `X3`, so we rename them to something more descriptive:"
847 | ]
848 | },
849 | {
850 | "metadata": {
851 | "id": "Tpx2vEVP3mvw",
852 | "colab_type": "code",
853 | "colab": {
854 | "base_uri": "https://localhost:8080/",
855 | "height": 449
856 | },
857 | "outputId": "1b4ff3dd-a88c-43db-a809-b6c539b00c4c"
858 | },
859 | "cell_type": "code",
860 | "source": [
861 | "usage_data = usage_data.rename({'X1': 'user_id', 'X2': 'song_id', 'X3': 'listen_count'})\nusage_data"
862 | ],
863 | "execution_count": 10,
864 | "outputs": [
865 | {
866 | "output_type": "execute_result",
867 | "data": {
929 | "text/plain": [
930 | "Columns:\n",
931 | "\tuser_id\tstr\n",
932 | "\tsong_id\tstr\n",
933 | "\tlisten_count\tint\n",
934 | "\n",
935 | "Rows: 2000000\n",
936 | "\n",
937 | "Data:\n",
938 | "+-------------------------------+--------------------+--------------+\n",
939 | "| user_id | song_id | listen_count |\n",
940 | "+-------------------------------+--------------------+--------------+\n",
941 | "| b80344d063b5ccb3212f76538f... | SOAKIMP12A8C130995 | 1 |\n",
942 | "| b80344d063b5ccb3212f76538f... | SOBBMDR12A8C13253B | 2 |\n",
943 | "| b80344d063b5ccb3212f76538f... | SOBXHDL12A81C204C0 | 1 |\n",
944 | "| b80344d063b5ccb3212f76538f... | SOBYHAJ12A6701BF1D | 1 |\n",
945 | "| b80344d063b5ccb3212f76538f... | SODACBL12A8C13C273 | 1 |\n",
946 | "| b80344d063b5ccb3212f76538f... | SODDNQT12A6D4F5F7E | 5 |\n",
947 | "| b80344d063b5ccb3212f76538f... | SODXRTY12AB0180F3B | 1 |\n",
948 | "| b80344d063b5ccb3212f76538f... | SOFGUAY12AB017B0A8 | 1 |\n",
949 | "| b80344d063b5ccb3212f76538f... | SOFRQTD12A81C233C0 | 1 |\n",
950 | "| b80344d063b5ccb3212f76538f... | SOHQWYZ12A6D4FA701 | 1 |\n",
951 | "+-------------------------------+--------------------+--------------+\n",
952 | "[2000000 rows x 3 columns]\n",
953 | "Note: Only the head of the SFrame is printed.\n",
954 | "You can use print_rows(num_rows=m, num_columns=n) to print more rows and columns."
955 | ]
956 | },
957 | "metadata": {
958 | "tags": []
959 | },
960 | "execution_count": 10
961 | }
962 | ]
963 | },
964 | {
965 | "metadata": {
966 | "id": "ymSWtQwz3pxd",
967 | "colab_type": "text"
968 | },
969 | "cell_type": "markdown",
970 | "source": [
971 | "SFrames can be saved as a CSV file or in the SFrame binary format. An SFrame saved in binary format loads very quickly because the file never has to be parsed again. Binary is the default format, so here we only supply the name of a directory, which will be created to hold the binary files:"
972 | ]
973 | },
974 | {
975 | "metadata": {
976 | "id": "rxJsNce-3ZRL",
977 | "colab_type": "code",
978 | "colab": {}
979 | },
980 | "cell_type": "code",
981 | "source": [
982 | "usage_data.save('./music_usage_data.sframe')"
983 | ],
984 | "execution_count": 0,
985 | "outputs": []
986 | },
987 | {
988 | "metadata": {
989 | "id": "gLcdT8Cv3tfb",
990 | "colab_type": "text"
991 | },
992 | "cell_type": "markdown",
993 | "source": [
994 | "Loading is then very fast:"
995 | ]
996 | },
997 | {
998 | "metadata": {
999 | "id": "_FUN-pBF3vTy",
1000 | "colab_type": "code",
1001 | "colab": {}
1002 | },
1003 | "cell_type": "code",
1004 | "source": [
1005 | "same_usage_data = tc.load_sframe('./music_usage_data.sframe')"
1006 | ],
1007 | "execution_count": 0,
1008 | "outputs": []
1009 | },
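{
"metadata": {
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"As mentioned above, an SFrame can also be written as CSV. The cell below is a minimal sketch, assuming the `usage_data` SFrame from the earlier cells; the output path `./music_usage_data.csv` and the name `usage_from_csv` are hypothetical, chosen only for illustration. Unlike the binary format, a CSV file has to be parsed again when it is read back."
]
},
{
"metadata": {
"colab_type": "code",
"colab": {}
},
"cell_type": "code",
"source": [
"import turicreate as tc\n",
"\n",
"# Write the SFrame as CSV instead of the default binary format.\n",
"# The file name is illustrative only.\n",
"usage_data.save('./music_usage_data.csv', format='csv')\n",
"\n",
"# Reading it back goes through the CSV parser again.\n",
"usage_from_csv = tc.SFrame.read_csv('./music_usage_data.csv')"
],
"execution_count": 0,
"outputs": []
},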
1010 | {
1011 | "metadata": {
1012 | "id": "ze0lxR9k3zRA",
1013 | "colab_type": "text"
1014 | },
1015 | "cell_type": "markdown",
1016 | "source": [
1017 | "## Data Types\n",
1018 | "\n",
1019 | "An SFrame is made up of columns, each holding values of a single type. The following datatypes are supported:\n",
1020 | "\n",
1021 | "* int (signed 64-bit integer)\n",
1022 | "* float (double-precision floating point)\n",
1023 | "* str (string)\n",
1024 | "* array.array (1-D array of doubles)\n",
1025 | "* list (arbitrary list of elements)\n",
1026 | "* dict (arbitrary dictionary of elements)\n",
1027 | "* datetime.datetime (datetime with microsecond precision)\n",
1028 | "* image (image)"
1029 | ]
1030 | },
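{
"metadata": {
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"To see several of these types side by side, the cell below builds a small SFrame with one column per type and prints the type inferred for each column. It is only a sketch: the column names and values are made up for illustration, and the image type is left out because it needs image data."
]
},
{
"metadata": {
"colab_type": "code",
"colab": {}
},
"cell_type": "code",
"source": [
"import array\n",
"import datetime\n",
"import turicreate as tc\n",
"\n",
"# One column per supported type; all names and values are illustrative.\n",
"typed = tc.SFrame({\n",
"    'an_int': [1, 2],\n",
"    'a_float': [1.5, 2.5],\n",
"    'a_str': ['a', 'b'],\n",
"    'an_array': [array.array('d', [1.0, 2.0]), array.array('d', [3.0, 4.0])],\n",
"    'a_list': [['x', 'y'], ['z']],\n",
"    'a_dict': [{'k': 1}, {'k': 2}],\n",
"    'a_datetime': [datetime.datetime(2010, 10, 18), datetime.datetime(2010, 10, 19)],\n",
"})\n",
"\n",
"# column_names() and column_types() describe the schema of the SFrame.\n",
"for name, dtype in zip(typed.column_names(), typed.column_types()):\n",
"    print(name, dtype)"
],
"execution_count": 0,
"outputs": []
},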
1031 | {
1032 | "metadata": {
1033 | "id": "HUYoOHQjCxaZ",
1034 | "colab_type": "text"
1035 | },
1036 | "cell_type": "markdown",
1037 | "source": [
1038 | "## Memory Intensive Example\n",
1039 | "\n",
1040 | "https://blog.usejournal.com/python-for-big-data-computation-on-a-single-computer-c232046df3c3\n",
1041 | "\n",
1042 | "The data we will use for our experiment comes from the (now defunct) Gowalla social networking site. Two nice data sets from this site are available at the SNAP link below. We will look at the bigger one, which contains the event log of “check-ins” by Gowalla’s users at a set of locations. This data set contains 6.44 million records, each describing a single check-in in just a few columns, of which we will pick only three: user_id, location_id and checkin_ts (the second-resolution timestamp of the check-in event).\n",
1043 | "\n",
1044 | "https://snap.stanford.edu/data/loc-gowalla.html"
1045 | ]
1046 | },
1047 | {
1048 | "metadata": {
1049 | "id": "YfHt3wIDEAYS",
1050 | "colab_type": "text"
1051 | },
1052 | "cell_type": "markdown",
1053 | "source": [
1054 | "### The problem and its (theoretical) solution\n",
1055 | "We will use Turi Create to attack what could be termed the “stalker-stalkee detection problem” on this data set. In this problem, we are asked to identify the pairs of users (E, R) that maximize the “stalking measure between E and R”. This measure is defined as the number of distinct locations where there was ever a check-in by user E (the stalkEE) followed by a check-in by user R (the stalkER).\n",
1056 | "\n",
1057 | "The first step is to index the check-ins by location_id (in pandas, a single index value can refer to more than one row). This makes the computation that follows easier.\n",
1058 | "\n",
1059 | "Then comes the tricky part: for each location, we want to consider all pairs of check-ins where the timestamp of the first user's check-in strictly precedes that of the second user's. To do so, we generate chin_ps, a data frame containing all pairs of check-ins at the same location, and then filter it to enforce the conditions just described, which gives pairs_filtered.\n",
1060 | "\n",
1061 | "However, running a naive pandas solution on a laptop or PC with a typical amount of RAM (say 16 GB) results in a MemoryError exception, because the table of all check-in pairs is far larger than the original data. SFrames are backed by disk rather than held entirely in memory, so with Turi Create we do not run into this problem."
1062 | ]
1063 | },
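{
"metadata": {
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"To make the definition concrete, the cell below computes the stalking measure on six made-up check-ins using only the Python standard library. It is a sketch for intuition, not the approach used on the full data set: the names `toy_checkins`, `parse_ts` and `stalking_measure` are hypothetical, and the nested loops would not scale to 6.44 million records. In this toy data, user 1 precedes user 2 at two locations and user 2 precedes user 1 at one."
]
},
{
"metadata": {
"colab_type": "code",
"colab": {}
},
"cell_type": "code",
"source": [
"from collections import defaultdict\n",
"from datetime import datetime\n",
"\n",
"# Toy check-ins: (user_id, location_id, timestamp). All values are made up.\n",
"toy_checkins = [\n",
"    (1, 'cafe', '2010-10-18T10:00:00Z'),\n",
"    (2, 'cafe', '2010-10-18T11:00:00Z'),\n",
"    (1, 'park', '2010-10-18T12:00:00Z'),\n",
"    (2, 'park', '2010-10-18T13:00:00Z'),\n",
"    (2, 'gym', '2010-10-18T14:00:00Z'),\n",
"    (1, 'gym', '2010-10-18T15:00:00Z'),\n",
"]\n",
"\n",
"def parse_ts(ts):\n",
"    # Timestamps use the same layout as the checkin_ts column.\n",
"    return datetime.strptime(ts, '%Y-%m-%dT%H:%M:%SZ')\n",
"\n",
"# Step 1: group the check-ins by location.\n",
"by_location = defaultdict(list)\n",
"for user, location, ts in toy_checkins:\n",
"    by_location[location].append((user, parse_ts(ts)))\n",
"\n",
"# Step 2: for every ordered pair (stalkee, stalker), collect the distinct\n",
"# locations where the stalkee checked in strictly before the stalker.\n",
"locations_per_pair = defaultdict(set)\n",
"for location, events in by_location.items():\n",
"    for stalkee, t_ee in events:\n",
"        for stalker, t_er in events:\n",
"            if stalkee != stalker and t_ee < t_er:\n",
"                locations_per_pair[(stalkee, stalker)].add(location)\n",
"\n",
"# Step 3: the stalking measure is the number of such locations per pair.\n",
"stalking_measure = {pair: len(locs) for pair, locs in locations_per_pair.items()}\n",
"print(stalking_measure)  # {(1, 2): 2, (2, 1): 1}"
],
"execution_count": 0,
"outputs": []
},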
1064 | {
1065 | "metadata": {
1066 | "id": "tj8qH8pxEqqg",
1067 | "colab_type": "code",
1068 | "colab": {
1069 | "base_uri": "https://localhost:8080/",
1070 | "height": 269
1071 | },
1072 | "outputId": "49646044-ff47-41fa-af77-ac1b4c27a0df"
1073 | },
1074 | "cell_type": "code",
1075 | "source": [
1076 | "checkins = ( tc.SFrame.read_csv( 'loc-gowalla_totalCheckins.txt', \n",
1077 | " delimiter='\\t', header=False )\n",
1078 | " .rename( {'X1': 'user_id', 'X2' : 'checkin_ts',\n",
1079 | " 'X3': 'lat', 'X4' : 'lon',\n",
1080 | " 'X5': 'location_id'} )\n",
1081 | " [[\"user_id\", \"location_id\", \"checkin_ts\"]] )"
1082 | ],
1083 | "execution_count": 6,
1084 | "outputs": [
1085 | {
1086 | "output_type": "display_data",
1087 | "data": {
1088 | "text/plain": [
1089 | "Finished parsing file /content/loc-gowalla_totalCheckins.txt"
1090 | ]
1094 | },
1095 | "metadata": {
1096 | "tags": []
1097 | }
1098 | },
1099 | {
1100 | "output_type": "display_data",
1101 | "data": {
1102 | "text/plain": [
1103 | "Parsing completed. Parsed 100 lines in 1.49398 secs."
1104 | ]
1108 | },
1109 | "metadata": {
1110 | "tags": []
1111 | }
1112 | },
1113 | {
1114 | "output_type": "stream",
1115 | "text": [
1116 | "------------------------------------------------------\n",
1117 | "Inferred types from first 100 line(s) of file as \n",
1118 | "column_type_hints=[int,str,float,float,int]\n",
1119 | "If parsing fails due to incorrect types, you can correct\n",
1120 | "the inferred type list above and pass it to read_csv in\n",
1121 | "the column_type_hints argument\n",
1122 | "------------------------------------------------------\n"
1123 | ],
1124 | "name": "stdout"
1125 | },
1126 | {
1127 | "output_type": "display_data",
1128 | "data": {
1129 | "text/plain": [
1130 | "Read 870755 lines. Lines per second: 228086"
1131 | ]
1135 | },
1136 | "metadata": {
1137 | "tags": []
1138 | }
1139 | },
1140 | {
1141 | "output_type": "display_data",
1142 | "data": {
1143 | "text/plain": [
1144 | "Read 2588975 lines. Lines per second: 271390"
1145 | ]
1149 | },
1150 | "metadata": {
1151 | "tags": []
1152 | }
1153 | },
1154 | {
1155 | "output_type": "display_data",
1156 | "data": {
1157 | "text/plain": [
1158 | "Read 4301430 lines. Lines per second: 281586"
1159 | ]
1163 | },
1164 | "metadata": {
1165 | "tags": []
1166 | }
1167 | },
1168 | {
1169 | "output_type": "display_data",
1170 | "data": {
1171 | "text/plain": [
1172 | "Finished parsing file /content/loc-gowalla_totalCheckins.txt"
1173 | ]
1177 | },
1178 | "metadata": {
1179 | "tags": []
1180 | }
1181 | },
1182 | {
1183 | "output_type": "display_data",
1184 | "data": {
1185 | "text/plain": [
1186 | "Parsing completed. Parsed 6442892 lines in 19.7667 secs."
1187 | ]
1191 | },
1192 | "metadata": {
1193 | "tags": []
1194 | }
1195 | }
1196 | ]
1197 | },
1198 | {
1199 | "metadata": {
1200 | "id": "owEASksxFHiK",
1201 | "colab_type": "code",
1202 | "colab": {
1203 | "base_uri": "https://localhost:8080/",
1204 | "height": 245
1205 | },
1206 | "outputId": "1370e5e6-5833-4ced-a991-4bac8453dc5e"
1207 | },
1208 | "cell_type": "code",
1209 | "source": [
1210 | "checkins.head()"
1211 | ],
1212 | "execution_count": 7,
1213 | "outputs": [
1214 | {
1215 | "output_type": "execute_result",
1216 | "data": {
1278 | "text/plain": [
1279 | "Columns:\n",
1280 | "\tuser_id\tint\n",
1281 | "\tlocation_id\tint\n",
1282 | "\tcheckin_ts\tstr\n",
1283 | "\n",
1284 | "Rows: 10\n",
1285 | "\n",
1286 | "Data:\n",
1287 | "+---------+-------------+----------------------+\n",
1288 | "| user_id | location_id | checkin_ts |\n",
1289 | "+---------+-------------+----------------------+\n",
1290 | "| 0 | 22847 | 2010-10-19T23:55:27Z |\n",
1291 | "| 0 | 420315 | 2010-10-18T22:17:43Z |\n",
1292 | "| 0 | 316637 | 2010-10-17T23:42:03Z |\n",
1293 | "| 0 | 16516 | 2010-10-17T19:26:05Z |\n",
1294 | "| 0 | 5535878 | 2010-10-16T18:50:42Z |\n",
1295 | "| 0 | 15372 | 2010-10-12T23:58:03Z |\n",
1296 | "| 0 | 21714 | 2010-10-12T22:02:11Z |\n",
1297 | "| 0 | 420315 | 2010-10-12T19:44:40Z |\n",
1298 | "| 0 | 153505 | 2010-10-12T15:57:20Z |\n",
1299 | "| 0 | 420315 | 2010-10-12T15:19:03Z |\n",
1300 | "+---------+-------------+----------------------+\n",
1301 | "[10 rows x 3 columns]"
1302 | ]
1303 | },
1304 | "metadata": {
1305 | "tags": []
1306 | },
1307 | "execution_count": 7
1308 | }
1309 | ]
1310 | },
1311 | {
1312 | "metadata": {
1313 | "id": "0vdQZ__rFQ8I",
1314 | "colab_type": "text"
1315 | },
1316 | "cell_type": "markdown",
1317 | "source": [
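1318 | "Next, generate the pairs of check-ins that satisfy the conditions of our detection algorithm. We join the check-ins with themselves on location_id, compute the time difference in days between the two check-ins of each pair, and keep the pairs of distinct users in which the stalker checked in less than one day after the stalkee."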
1319 | ]
1320 | },
1321 | {
1322 | "metadata": {
1323 | "id": "nFoEzuYlMQgv",
1324 | "colab_type": "code",
1325 | "colab": {}
1326 | },
1327 | "cell_type": "code",
1328 | "source": [
1329 | "import datetime\n",
1330 | "import dateutil.parser"
1331 | ],
1332 | "execution_count": 0,
1333 | "outputs": []
1334 | },
1335 | {
1336 | "metadata": {
1337 | "id": "1wEhpuFVFUJC",
1338 | "colab_type": "code",
1339 | "colab": {}
1340 | },
1341 | "cell_type": "code",
1342 | "source": [
1343 | "chin_ps = ( checkins.join(checkins, on='location_id').rename( {'checkin_ts': 'checkin_ts_ee', 'checkin_ts.1': 'checkin_ts_er', 'user_id': 'stalkee' , 'user_id.1': 'stalker' } ) )"
1344 | ],
1345 | "execution_count": 0,
1346 | "outputs": []
1347 | },
1348 | {
1349 | "metadata": {
1350 | "id": "phw55yidIWg_",
1351 | "colab_type": "code",
1352 | "colab": {}
1353 | },
1354 | "cell_type": "code",
1355 | "source": [
1356 | "chin_ps['time_diff'] = (chin_ps['checkin_ts_er'].apply(dateutil.parser.parse) - chin_ps['checkin_ts_ee'].apply(dateutil.parser.parse)) / 86400"
1357 | ],
1358 | "execution_count": 0,
1359 | "outputs": []
1360 | },
1361 | {
1362 | "metadata": {
1363 | "id": "t9OQPWvlFuIE",
1364 | "colab_type": "code",
1365 | "colab": {
1366 | "base_uri": "https://localhost:8080/",
1367 | "height": 245
1368 | },
1369 | "outputId": "dbb4b625-e100-487c-d104-66b0f819062c"
1370 | },
1371 | "cell_type": "code",
1372 | "source": [
1373 | "# pairs_filtered = chin_ps[ (chin_ps['checkin_ts_ee'] < chin_ps['checkin_ts_er']) & (chin_ps['stalkee'] != chin_ps['stalker']) ]\n",
1374 | "pairs_filtered = chin_ps[ (chin_ps['time_diff'] > 0.0) & (chin_ps['time_diff'] < 1.0) & (chin_ps['stalkee'] != chin_ps['stalker']) ]\n",
1375 | "pairs_filtered.head()"
1376 | ],
1377 | "execution_count": 11,
1378 | "outputs": [
1379 | {
1380 | "output_type": "execute_result",
1381 | "data": {
1476 | "text/plain": [
1477 | "Columns:\n",
1478 | "\tstalkee\tint\n",
1479 | "\tlocation_id\tint\n",
1480 | "\tcheckin_ts_ee\tstr\n",
1481 | "\tstalker\tint\n",
1482 | "\tcheckin_ts_er\tstr\n",
1483 | "\ttime_diff\tfloat\n",
1484 | "\n",
1485 | "Rows: 10\n",
1486 | "\n",
1487 | "Data:\n",
1488 | "+---------+-------------+----------------------+---------+----------------------+\n",
1489 | "| stalkee | location_id | checkin_ts_ee | stalker | checkin_ts_er |\n",
1490 | "+---------+-------------+----------------------+---------+----------------------+\n",
1491 | "| 7 | 420315 | 2010-10-18T20:24:42Z | 0 | 2010-10-18T22:17:43Z |\n",
1492 | "| 7 | 420315 | 2010-10-18T15:08:58Z | 0 | 2010-10-18T22:17:43Z |\n",
1493 | "| 31 | 420315 | 2010-10-18T14:00:53Z | 0 | 2010-10-18T22:17:43Z |\n",
1494 | "| 66 | 420315 | 2010-10-18T18:59:11Z | 0 | 2010-10-18T22:17:43Z |\n",
1495 | "| 327 | 420315 | 2010-10-18T21:21:12Z | 0 | 2010-10-18T22:17:43Z |\n",
1496 | "| 327 | 420315 | 2010-10-18T14:05:59Z | 0 | 2010-10-18T22:17:43Z |\n",
1497 | "| 342 | 420315 | 2010-10-18T14:10:40Z | 0 | 2010-10-18T22:17:43Z |\n",
1498 | "| 350 | 420315 | 2010-10-18T19:28:34Z | 0 | 2010-10-18T22:17:43Z |\n",
1499 | "| 456 | 420315 | 2010-10-18T16:00:08Z | 0 | 2010-10-18T22:17:43Z |\n",
1500 | "| 515 | 420315 | 2010-10-18T11:42:06Z | 0 | 2010-10-18T22:17:43Z |\n",
1501 | "+---------+-------------+----------------------+---------+----------------------+\n",
1502 | "+-----------------+\n",
1503 | "| time_diff |\n",
1504 | "+-----------------+\n",
1505 | "| 0.0784837962963 |\n",
1506 | "| 0.297743055556 |\n",
1507 | "| 0.345023148148 |\n",
1508 | "| 0.13787037037 |\n",
1509 | "| 0.0392476851852 |\n",
1510 | "| 0.341481481481 |\n",
1511 | "| 0.338229166667 |\n",
1512 | "| 0.117465277778 |\n",
1513 | "| 0.262210648148 |\n",
1514 | "| 0.441400462963 |\n",
1515 | "+-----------------+\n",
1516 | "[10 rows x 6 columns]"
1517 | ]
1518 | },
1519 | "metadata": {
1520 | "tags": []
1521 | },
1522 | "execution_count": 11
1523 | }
1524 | ]
1525 | },
1526 | {
1527 | "metadata": {
1528 | "id": "lFli73MVF6S3",
1529 | "colab_type": "code",
1530 | "colab": {}
1531 | },
1532 | "cell_type": "code",
1533 | "source": [
1534 | "final_result = ( pairs_filtered[['stalkee', 'stalker', 'location_id']]\n",
1535 | "                   .unique()\n",
1536 | "                   .groupby( ['stalkee', 'stalker'], {\"location_count\": tc.aggregate.COUNT() })\n",
1537 | "                   .topk( 'location_count', k=5 ) )\n",
1538 | "final_result.materialize()  # run the lazy query now; called on its own line so final_result still refers to the SFrame"
1539 | ],
1540 | "execution_count": 0,
1541 | "outputs": []
1542 | },
1543 | {
1544 | "metadata": {
1545 | "id": "m3fxFqCbF8zt",
1546 | "colab_type": "code",
1547 | "colab": {}
1548 | },
1549 | "cell_type": "code",
1550 | "source": [
1551 | "print( final_result )"
1552 | ],
1553 | "execution_count": 0,
1554 | "outputs": []
1555 | },
1556 | {
1557 | "metadata": {
1558 | "id": "zg8dAd69A0eU",
1559 | "colab_type": "text"
1560 | },
1561 | "cell_type": "markdown",
1562 | "source": [
1563 | "## Articles, Repositories, etc.\n",
1564 | "\n",
1565 | "* https://medium.com/@nilotic2/a-guide-to-turi-create-a72f53f26721\n",
1566 | "* https://blog.usejournal.com/python-for-big-data-computation-on-a-single-computer-c232046df3c3\n",
1567 | "* https://github.com/onmyway133/Avengers"
1568 | ]
1569 | }
1570 | ]
1571 | }
--------------------------------------------------------------------------------