├── .gitignore ├── LICENSE ├── README.md ├── Reference Material └── visualapi.pdf ├── Week 1 - Data Science Background and Course Software Setup ├── Week1Lec1.pdf ├── Week1Lec2.pdf └── lab0_student.ipynb ├── Week 2 - Introduction to Apache Spark ├── Week2Lec3.pdf ├── Week2Lec4.pdf └── lab1_word_count_student.ipynb ├── Week 3 - Data Management ├── Week3Lec5.pdf ├── Week3Lec6.pdf └── lab2_apache_log_student.ipynb ├── Week 4 - Data Quality, Exploratory Data Analysis, and Machine Learning ├── Lab 3 Quiz Questions.pdf ├── Week4Lec7.pdf ├── Week4Lec8.pdf └── lab3_text_analysis_and_entity_resolution_student.ipynb └── Week 5 - Introduction to Machine Learning with Apache Spark ├── Lab 4 Quiz Questions.pdf └── lab4_machine_learning_student.ipynb /.gitignore: -------------------------------------------------------------------------------- 1 | # Byte-compiled / optimized / DLL files 2 | __pycache__/ 3 | *.py[cod] 4 | 5 | # C extensions 6 | *.so 7 | 8 | # Distribution / packaging 9 | .Python 10 | env/ 11 | build/ 12 | develop-eggs/ 13 | dist/ 14 | downloads/ 15 | eggs/ 16 | .eggs/ 17 | lib/ 18 | lib64/ 19 | parts/ 20 | sdist/ 21 | var/ 22 | *.egg-info/ 23 | .installed.cfg 24 | *.egg 25 | 26 | # PyInstaller 27 | # Usually these files are written by a python script from a template 28 | # before PyInstaller builds the exe, so as to inject date/other infos into it. 29 | *.manifest 30 | *.spec 31 | 32 | # Installer logs 33 | pip-log.txt 34 | pip-delete-this-directory.txt 35 | 36 | # Unit test / coverage reports 37 | htmlcov/ 38 | .tox/ 39 | .coverage 40 | .coverage.* 41 | .cache 42 | nosetests.xml 43 | coverage.xml 44 | *,cover 45 | 46 | # Translations 47 | *.mo 48 | *.pot 49 | 50 | # Django stuff: 51 | *.log 52 | 53 | # Sphinx documentation 54 | docs/_build/ 55 | 56 | # PyBuilder 57 | target/ 58 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | The MIT License (MIT) 2 | 3 | Copyright (c) 2015 Dipanjan Sarkar 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 
22 | 23 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # BerkeleyX-CS100.1x-Big-Data-with-Apache-Spark 2 | This repository contains code files specifically IPython notebooks for the assignments in the course "Introduction to Big Data with Apache Spark" by UC Berkeley and Databricks on edX 3 | -------------------------------------------------------------------------------- /Reference Material/visualapi.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/dipanjanS/BerkeleyX-CS100.1x-Big-Data-with-Apache-Spark/d0dbc626fc55c2717c54e38353848668bb25baad/Reference Material/visualapi.pdf -------------------------------------------------------------------------------- /Week 1 - Data Science Background and Course Software Setup/Week1Lec1.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/dipanjanS/BerkeleyX-CS100.1x-Big-Data-with-Apache-Spark/d0dbc626fc55c2717c54e38353848668bb25baad/Week 1 - Data Science Background and Course Software Setup/Week1Lec1.pdf -------------------------------------------------------------------------------- /Week 1 - Data Science Background and Course Software Setup/Week1Lec2.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/dipanjanS/BerkeleyX-CS100.1x-Big-Data-with-Apache-Spark/d0dbc626fc55c2717c54e38353848668bb25baad/Week 1 - Data Science Background and Course Software Setup/Week1Lec2.pdf -------------------------------------------------------------------------------- /Week 1 - Data Science Background and Course Software Setup/lab0_student.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "#![Spark Logo](http://spark-mooc.github.io/web-assets/images/ta_Spark-logo-small.png) + ![Python Logo](http://spark-mooc.github.io/web-assets/images/python-logo-master-v3-TM-flattened_small.png)\n", 8 | "# **First Notebook: Virtual machine test and assignment submission**\n", 9 | "#### This notebook will test that the virtual machine (VM) is functioning properly and will show you how to submit an assignment to the autograder. To move through the notebook just run each of the cells. You will not need to solve any problems to complete this lab. You can run a cell by pressing \"shift-enter\", which will compute the current cell and advance to the next cell, or by clicking in a cell and pressing \"control-enter\", which will compute the current cell and remain in that cell. 
At the end of the notebook you will export / download the notebook and submit it to the autograder.\n", 10 | "#### ** This notebook covers: **\n", 11 | "#### *Part 1:* Test Spark functionality\n", 12 | "#### *Part 2:* Check class testing library\n", 13 | "#### *Part 3:* Check plotting\n", 14 | "#### *Part 4:* Check MathJax formulas\n", 15 | "#### *Part 5:* Export / download and submit" 16 | ] 17 | }, 18 | { 19 | "cell_type": "markdown", 20 | "metadata": {}, 21 | "source": [ 22 | "### ** Part 1: Test Spark functionality **" 23 | ] 24 | }, 25 | { 26 | "cell_type": "markdown", 27 | "metadata": {}, 28 | "source": [ 29 | "#### ** (1a) Parallelize, filter, and reduce **" 30 | ] 31 | }, 32 | { 33 | "cell_type": "code", 34 | "execution_count": 1, 35 | "metadata": { 36 | "collapsed": false 37 | }, 38 | "outputs": [ 39 | { 40 | "name": "stdout", 41 | "output_type": "stream", 42 | "text": [ 43 | "4999950000\n", 44 | "714264285\n" 45 | ] 46 | } 47 | ], 48 | "source": [ 49 | "# Check that Spark is working\n", 50 | "largeRange = sc.parallelize(xrange(100000))\n", 51 | "reduceTest = largeRange.reduce(lambda a, b: a + b)\n", 52 | "filterReduceTest = largeRange.filter(lambda x: x % 7 == 0).sum()\n", 53 | "\n", 54 | "print reduceTest\n", 55 | "print filterReduceTest\n", 56 | "\n", 57 | "# If the Spark jobs don't work properly these will raise an AssertionError\n", 58 | "assert reduceTest == 4999950000\n", 59 | "assert filterReduceTest == 714264285" 60 | ] 61 | }, 62 | { 63 | "cell_type": "markdown", 64 | "metadata": {}, 65 | "source": [ 66 | "#### ** (1b) Loading a text file **" 67 | ] 68 | }, 69 | { 70 | "cell_type": "code", 71 | "execution_count": 2, 72 | "metadata": { 73 | "collapsed": false 74 | }, 75 | "outputs": [ 76 | { 77 | "name": "stdout", 78 | "output_type": "stream", 79 | "text": [ 80 | "122395\n" 81 | ] 82 | } 83 | ], 84 | "source": [ 85 | "# Check loading data with sc.textFile\n", 86 | "import os.path\n", 87 | "baseDir = os.path.join('data')\n", 88 | "inputPath = os.path.join('cs100', 'lab1', 'shakespeare.txt')\n", 89 | "fileName = os.path.join(baseDir, inputPath)\n", 90 | "\n", 91 | "rawData = sc.textFile(fileName)\n", 92 | "shakespeareCount = rawData.count()\n", 93 | "\n", 94 | "print shakespeareCount\n", 95 | "\n", 96 | "# If the text file didn't load properly an AssertionError will be raised\n", 97 | "assert shakespeareCount == 122395" 98 | ] 99 | }, 100 | { 101 | "cell_type": "markdown", 102 | "metadata": {}, 103 | "source": [ 104 | "### ** Part 2: Check class testing library **" 105 | ] 106 | }, 107 | { 108 | "cell_type": "markdown", 109 | "metadata": {}, 110 | "source": [ 111 | "#### ** (2a) Compare with hash **" 112 | ] 113 | }, 114 | { 115 | "cell_type": "code", 116 | "execution_count": 3, 117 | "metadata": { 118 | "collapsed": false 119 | }, 120 | "outputs": [ 121 | { 122 | "name": "stdout", 123 | "output_type": "stream", 124 | "text": [ 125 | "1 test passed.\n", 126 | "1 test passed.\n" 127 | ] 128 | } 129 | ], 130 | "source": [ 131 | "# TEST Compare with hash (2a)\n", 132 | "# Check our testing library/package\n", 133 | "# This should print '1 test passed.' 
on two lines\n", 134 | "from test_helper import Test\n", 135 | "\n", 136 | "twelve = 12\n", 137 | "Test.assertEquals(twelve, 12, 'twelve should equal 12')\n", 138 | "Test.assertEqualsHashed(twelve, '7b52009b64fd0a2a49e6d8a939753077792b0554',\n", 139 | " 'twelve, once hashed, should equal the hashed value of 12')" 140 | ] 141 | }, 142 | { 143 | "cell_type": "markdown", 144 | "metadata": {}, 145 | "source": [ 146 | "#### ** (2b) Compare lists **" 147 | ] 148 | }, 149 | { 150 | "cell_type": "code", 151 | "execution_count": 4, 152 | "metadata": { 153 | "collapsed": false 154 | }, 155 | "outputs": [ 156 | { 157 | "name": "stdout", 158 | "output_type": "stream", 159 | "text": [ 160 | "1 test passed.\n" 161 | ] 162 | } 163 | ], 164 | "source": [ 165 | "# TEST Compare lists (2b)\n", 166 | "# This should print '1 test passed.'\n", 167 | "unsortedList = [(5, 'b'), (5, 'a'), (4, 'c'), (3, 'a')]\n", 168 | "Test.assertEquals(sorted(unsortedList), [(3, 'a'), (4, 'c'), (5, 'a'), (5, 'b')],\n", 169 | " 'unsortedList does not sort properly')" 170 | ] 171 | }, 172 | { 173 | "cell_type": "markdown", 174 | "metadata": {}, 175 | "source": [ 176 | "### ** Part 3: Check plotting **" 177 | ] 178 | }, 179 | { 180 | "cell_type": "markdown", 181 | "metadata": {}, 182 | "source": [ 183 | "#### ** (3a) Our first plot **\n", 184 | "#### After executing the code cell below, you should see a plot with 50 blue circles. The circles should start at the bottom left and end at the top right." 185 | ] 186 | }, 187 | { 188 | "cell_type": "code", 189 | "execution_count": 5, 190 | "metadata": { 191 | "collapsed": false 192 | }, 193 | "outputs": [ 194 | { 195 | "data": { 196 | "image/png": "iVBORw0KGgoAAAANSUhEUgAAA20AAAIkCAYAAACN0sPaAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAAPYQAAD2EBqD+naQAAIABJREFUeJzs3deaHOd1//vvW7G7qrqr02RgkMEoSqKtaFth29sXoHuw\nLkoX4Mc+2wfbZ9t/WVa0AikxAxhkTOjpnLsr7oOaGRICQFTTBDEg1+d5JIrkdPfbw35G8+Na71oq\nTdMUIYQQQgghhBCnkva8DyCEEEIIIYQQ4skktAkhhBBCCCHEKSahTQghhBBCCCFOMQltQgghhBBC\nCHGKSWgTQgghhBBCiFNMQpsQQgghhBBCnGIS2oQQQgghhBDiFJPQJoQQQgghhBCnmIQ2IYQQQggh\nhDjFJLSJZ6rdbvOzn/2Mdrv9vI8ivsTkcyaeNfmMiS+CfM7EF0E+Zy8mCW1CCCGEEEIIcYpJaBNC\nCCGEEEKIU0xCmxBCCCGEEEKcYhLahBBCCCGEEOIUM573AfJIkoQ//vGP3Lx5k+l0iuM4vPTSS3zz\nm99EKfW8jyeEEEIIIYQQz8wLEdrefvttPvroI370ox9Rq9U4PDzkF7/4BZZl8frrrz/v4wkhhBBC\nCCHEM/NChLZWq8X58+fZ3t4GwPM8dnZ2aLVaz/lkQgghhBBCCPFsvRB32ra3t9nd3WUwGADQ6XRo\nNpsnIU4IIYQQQgghvqxeiErbq6++yng85t///d/RNI00TfnWt77FpUuXnvgYWRh4OvR6vYf+KMSz\nIJ8z8azJZ0x8EeRzJr4I8jk7XRqNRq6vU2maps/4LP9r7733Hm+//Tbf//73qVardDodfvOb3/C9\n732Pq1evPvYxP/vZz77gUwohhBBCCCFEfj/96U9zfd0LUWl7++23efPNN08qa7VajdFoxJ///Ocn\nhraf/OQnX+QRxRP0ej1+/vOf8+Mf/5hqtfq8jyO+pORzJp41+YyJL4J8zsQX4av2OZuFEfMoJooT\ndKUwdR3PNl64CfQvRGhL0/SRb6xSik8rEuYtNYovRrValX8m4pmTz5l41uQzJr4I8jkTX4Qv8+cs\nTVP2h1N22gN2B1MSUjiKDUqBX7C43PC5UCthGfrzPWxOL0RoO3/+PG+//Tae51GtVmm327z77ru8\n/PLLz/toQgghhBBCiFNidzDhrQcthvMQxza50PApFyx0TZGkKfMopjmc8scHLf6y1+Hqis8bm3W0\nU155eyFC2/e+9z0sy+LXv/410+kU13V59dVXefPNN5/30YQQQgghhBDPwHAecLM95MFgzCKKiZMU\nU9co2RaXGmW2Kx6G/vEw/J32gD/cb1EuWLxxZoVywXrkOR3LpOYUCKKYvcGE95s9hvOA759ff+i5\nTpsXIrSZpsl3v/tdvvvd7z7vowghhBBCCCGeoYPhlA8Pe+wPp+iaRsMrUnN1NKWIkoThPOC3d5q8\nbbS5WC/zylqF5mjG7+8dslZ2udTwn3pnzTJ0ztfLlIsWH+13+d3dJn93Yf3U3nV7IUKbEEIIIYQQ\n4sUTJQkP+hP6swVBnKDIAtOKW2Cj7DwUktI05Vqrz1sP2jiWyeXVKite8bGti/MwYm8w4Vp7wO3u\nkHkUU/eKuQLbJ9WcAi+t1/jwoEP9sM8ra6dzOIuENiGEEEIIIcTnarQIudkecKszZBbGFEwDQ1ek\nKYRxwnv7XcoF82ggSJmCqfPRYZ+3d9tsVjzO18qfGr4KpsHFhs+m7/Lb2/skacqleuUzVcrqboFV\nz+F6a8BLq5VTeb9NQpsQQgghhBDiU83DmHkUEcUJhq5RMHQK5uOjxPXDPm/ttkEpVktFXi17FK2H\nv3Y4D
9jrj3l7t8P7B12urPi8f9Bjq+Jxvu7nPpepaxiaRrloMVwE1HT7MwW3zYrLn++32BtMOFPx\nln78syahTQghhBBCCPGINE3ZOxqdvzeYkJJNzldH/9koO1w+qnYdV6fe2evw3kGXdd/lfK2Mrj1+\nuEe5YFFerxHGMdcOuvz2TpO1ssu5WnmpMx6OZsRJwtlKiTBJmEcxxSeEyU/j2RaebXKjPZDQJoQQ\nQgghhDj97vZG/GW3w2gR4tomFxoVXNtEV4o4TZkGIQeDCb+4uY9nm7yxUSNKUt476LJdK3O2Wsr1\nOqauc6ZaYncwxS/YhHGy1O60wWyBa1s4tslkETILIoqGkaXKJa2UHO53h4/dEf28SWgTQgghhBDi\nSyyMY+50xxyOZwx6XQDeut9iPVBcrJfxbPPka9M05YNmj7/sdag4Bd5YqTx2dH65YLFedhkvAu73\nxvz6zgFRnHCmmj+wHdsfTCgXLcpFi9EipK7ruUNXGMeYR6P6LUNnGoSESXLy15Zh6hpJmhIlCaZ+\nupZuS2gTQgghhBDiS2gwC7hxNF0xiBNKtgVJCsA4ivig2cvukfkOV1Z81kvZMI6/7HU4Uy2xXS09\nteLk2RavrNf48/2I5njGildc6oxpmtKazNiqlCiaBpMgJIjj3NW2FDg+oqlrKKWYh/FnCm3H7/To\nW3SqSGgTQgghhBDiBbCIYiZBRBjH6EphGTol23xssLrdGfI/9w7RlGKt7LLhu9iGzrCrcQBcaVRw\nK1UORzP2BxMe7OyxXipyMJqxWfGWuluWpinzKGat5LCIY4Iof+iKkpQkgYKhY+gamlLMwij34w1d\nYxZEJ3+uKUWafrbUFSYJoLBO4ZJtCW1CCCGEEEKcUmma0p7M2WkPuNcbE6cp6dE0EEXWpnil4XO+\nVsI+Cjo3Wn3+cL/FSsnh8sqTR9jrmsaG77JedmiOprx9/5By0eZcdblhIL3pgnkYcWmlgkIxDfKH\nriRJANC07IyWobMIY+IkRdee3iPpWSat0exkqqUCUj5baOtPF1SK1qm7zwYS2oQQQgghhDiV2uMZ\nf7jfojdbYBnZwA6/aGNo2d2rRRTTHE3504MWf9nrcKleZsUr8sf7bdZ9l4v1fIumlVJUijaWoVN1\nbCZBSOkx99ieZBZGKKXwbJMwTpiFEVGSYDxhcuQnHU+XjI96Ek1NY05EnKboOS62rZddbneGHI6n\nbPreUbvk8qFrEcV0p3O+fXZ16cd+ESS0CSGEEEII8Yxld7fmPOiPWUQJ8dGwDMcyOF8tPRKS7vfH\n/PZOE9vQeXWjTqX46P4x1zapuQWCKGZ/OOGjw2yAyGrJyR3Yju0PJ5i6znrZZRpGFC0jV+gCsiqX\nlt0ns3SdeRgzC2NKdp7QpjB0xTQIgeLJmfO2OFqGzopXpDmcslF2SZIUzVg+tO0PJli6xvna6Rv3\nDxLahBBCCCGEeGbCOOFOd8SN9oD+bIGp69iGjqYpkiRlFka8t99lo+xyZcVns+zQHM349e0DfMfm\n5bXaE9sbj1mGfnIH7YP9LlWnwLIz73uTOVXHxrFMhrOAWRBTKuQLbUopkuOQpbJ7ZkEUwyemUn7a\nY9dK2Xs+84mpk8ucfqvicXi/xYPemHLRomAuN/lxGoTsD8ZcafinbmrkMQltQgghhBBCLCFNU6Kj\ndj5DU0+saA1mAf99a4/hIqTqFB5bMYuThNY4GwbyXzt7rJWK9KYLSgWLV9ZqS1XLhrMFNdfG0BWT\nIHxolP/ThEmCq2fVPsvQmEcRXmrken1T14iTrHqoa9kwkejorloem77Hbn9CdzKnUrSB5Voc/aLN\n2arHTnvA5ZUKayUn92PnYcT7ex3KtsXrG7Xcj/uiSWgTQgghhBDiKeIk4X5/wk57QGcyP6ksaUrR\n8Apcbvic8b2T4Rm96YL/c2MXpSn+ZnuNovn4X7t1TWO97LJedulO57x17xCAb5xdXSq4BFFMZ7rg\nQt3HNg1mYYRr5QtdAGn68eh8y9BZRDGLKMlVtao6WRBtj2esld3s+XKfPGvzrDgWe4MxRdNAVyp3\na+axmlMgTfrc7w4xNcVWxTu5L/c4aZrSny243uxRNHV+eGnjZJDLaSShTQghhBBCiCeIkoQPDnrs\ndIbMgohy0eJsrYypa6Rk97na4xm/unWAYxlcaficq5b4xc09NF3xxuYKRs4R8tWiTcHUKZgG0yDC\n1HXy5rZ5FEOa4tkmtqETRDGLKKbwhLD41wxNO6keakqhUMRJAjw9yBRMg7pboDmcslpySNP0qS2d\nf+1C3efPD1rcag94Za2a+31D1t54rdnjUqPEilfkg4M+u/0xKyWHTd/FsT6uOEZxwuFoyv5wwjyM\nWHGL/MPF9dzfp+fldJ9OCCGEEEKIz1GapvRmC6ZBNuFQVxq2qdNwCidj548tophf3drncDxnrezw\nynrtoQBwbKviMQlC9gcT3tnv8u5+B03TePPsau7ABtCdzonihDMrHov4OHTlq/7ER+2IuqbQjipV\n0yDKHUYcy2A4W5CmKUoplFquWrbpe7yz22K0CFGw9K4zv2iz4hW50xlSNHVeWa99aqXs2HAe8MF+\nh5Jt8g8XN7ENnUt1n1udITudIQeDCbqm0I8mbmb/zBVnKy6XGz6rXvFUjvj/axLahBBCCCHEl14Y\nx9ztjbnRGtCbLUjTLJQosv/yLIPLDZ8LtTKOZRDGCf+1s0d/HvD6VoPyU0bgu5bJ5ZUK1aLN7+4c\ncLHhYy4ZXHrTBbZhUHEKjBchszDKHdqOK1vHbZumrjELo5MQ9jSbvsfh6JDhPMAv2h9/b3KqOjae\nbXGj2eNcvYy/xMoAgNE8oD9dcLlRpj8L+MPdJmtlh42y+0jwPA7e+/0xvdmCda/I31/8uL3Rs03e\n2Kzz2nqNveGEySIkPAprlqGzWXZxrBcrBr1YpxVCCCGEEGIJaZpyozXgL/sdgjihdjQQxLPNrPpy\nNMFxfzjhnb0u7+53udLwmYUR3dmCN7YaePZyO8scy6BUsBnNQ8rF/I+N4gTTyIKebehMg/BkafTT\nWLoOKOZhjGOZJ0EtSUHPkb78ooVrmxwMJ5QL9tItjkopXt+s88udXe50Bqy4xdyPHc0D3t/vUHdt\nfnx5i2kYsdMecKszZLc/xi9kO+SOB5yMFwFBFFMr2nxne5ULtdJjq3K6pjhbOZ0j/JcloU0IIYQQ\nQrww0jSlM5mzP5oSxglxkmLqGuWCxdmK91B1K01T3t5t8+Fhn7WSy3bVw/6rqo2mK0q6RalgcbHu\nsz+c8H6zyyJKeGNzucAG2b6vhlekVDCZhTFukp4MJ3nqe+Pj6papaygUszCipD/9DAVTx7NNDkdT\nam7h5HnSnDUzpRRbvsf1Vp/OZIal61hLDuYI4gTL0FAp/PnBIWeqJdZKzhND5yKK2R9M2OuPWfEK\n/P2FDQxdo6xbvHlmha9t1LnbG7E3mLCIEsI4xjI0zvouF+tlGm
7hhWht/DxIaBNCCCGEEKdeGCfc\n62X7zrrTBYamZcFGZQMz5lHMWw9aXKyXuVT38YsW7x/0+LDZ5+KKz6b/9IqLoWucrZboTee0xjNs\nU2eZPsE0zap2jZKDpesswphZGOUevW9o6mQYCGSVojjnkmmlFJsVl+uHfRZhhHZUeVqmWrbuuxyO\np1w76PLqRi132ASYBCEf7HfYKrt8+9wq7+x1uNcdcrczZKVUpOoUsuEtadaq2hrP6E7nWLrGS6s+\nb2zUHwl3pq5xueFzueHnPseXlYQ2IYQQQghxqnWnC/775h6TMKJatHllo071r/adzY9aHG+0h1w7\n7LNd9bjbG7NdK+UKbMfiJGE4D9j0PYI4YXrU7phHmmbtiNnuto/vleUdve/aJruDCYsoxjZ0lFLk\nzGwArJUcbrYHHAynrJQcdKXQlriZpinFmYrH3mDCTmuAoetPrWalaUp3Oudas0elYJ3cLfu7CxtM\ngygbCNIecDianjxGkbVjfvvsKudr3qldaH2aSGgTQgghhBBfqOE84FZnyGAeEEQJmga2rrNWch75\nJf5wPOMXN/cwdZ2/3V574jTEgmlwoe5zrlbmQW/Mu/ttygWbs5XSUmfrTRdEccKG75KkR3fUTCNX\ntU0pQH08DMQydIJFTBgnuVoNVz2Hm60s4Jytlpa+V6ZrGmcrJW53Bkftju5S00RmYcROa8Dlehnb\n0Lne7HLH0Nkou6yV3YdaT6M4oTmasj+YsIgiNssO3z+//tD7dCyD1zdqvLZeZR7FBFGCUtlkyeNQ\nKvKR0CaEEEIIIZ65NE3ZHUy40R6wP5yiK41S0TrZDzYNA+72x/x5r82FWpkrRy1xv7y5T8EweG2z\nnmsEvKYUa2WH6y2dimMzWaI9EbI2zJRsEEicpEyCkCCOc4UupRSmprGI4pOzQFZ9y8PQNdbLDs3h\nlK2Kly28XqJFEeBcrcTBcMLtdp9ywcTN+d4ni5D39zu4lnE0Ol+jPZmz0x5wrzfibneIpetZy2aS\nEsQJmiLX6HylFEXToJj/H4P4KxLahBBCCCHEMxUnCf9z95DbvRGeZXJ5pcpKqfhIFel4MMWtzpCb\n7UFWVVOKVzfyBbZj+4MJlq6zXnaZBCEFQ8+9Ly052VOmMPRs59ksjHIP5ai7BdrjOWerpU+EmPw9\njpu+x25/woPeCL9o4xnLJZ0gToCUStFi57BPdzJns+LhF6zHhqpZGLE/mHAwnFAtWPzg0ubJmoEV\nr8iKV+SbWxH3+2NmYUyUJBiaRsHUOeN7L9zo/BeVfJeFEEIIIcRSjoeC3O2NmYUR0dEER8c0uFAr\ncabinQyxiJOUX97aZ2845aW1Givek0fB24bO+XqZ7VqJ9/c63OuNeGOrsdSCaoDhfEG5kI2wH86C\n3BMYIRv+QZoSH019NHWdII5zv/ZWxeNgOKE3XVB1CgBLtQG6tsn5epnrhz22ayVWPSf3Y8M45v29\nDgXD4J+ubLE/mnLtsM/7e20KpkHDKxKN5wA0R1PuzlIGswUFU+eV1QqvrVcfe7+sYBpcWankPof4\n/EloE0IIIYQQuYwXIddbfW51hizihErRpmiZaEctc8NFyK9uH1C0DC7Xy1xZ8Xlnr8PecMprGw0q\njp3rdTSlKJg6rmVg6PrJYI68wjg5uftmGRrzKMbLuWS6ePS4ySKgXLRRWYbLPUWyVMjWBxwMJydt\nmcvcSwPY8l1uHPbY7Y+xDZ3t6uP3kH3SaB7wUbOLphQ/vLSJa5tctn0u1cscjmfcaA1oDqfMhiMA\nWuMpjbrL98+vcbbqYSxRyRRfPAltQgghhBDiqQ5HM355a58oTVkrO2yU3ccOBZkGIXuDCR80e1xr\n9ZmHMVdWq7kDG2Qtis3RlM2Kh6VrjBchtq4vNVTjmGVkoW8exSeB7NOUCxaOZXIwnFIu5j/zJ23X\nyry/1+ZBb8R62cVcIhAlacpHzR5+weJyo8y11oD9wYTVksOm7+JYH7dLJklKa5wNAxkvQqrFrL3x\nk3f4lFKslRzWSlnF7rDV4v/58I/831fP0mg0PtP7E188CW1CCCGEEF9B/dmCm+0hh+PZyT0oS9ep\nOTaXGz415+OR+s3RlP+6uYdjmnzjMfu0PsmxTC6vVDhbLfHrm3soBQ23sNTZgigmilPKBRvb0Jca\nBgLZqP0oSYCsyqWprBKYh1KKTd9lpz0giOJsGMjRVMi8VrwiWxWPndYA29CpOjYqxxPEScJHBz3G\ni4AfX95kreRwdbVyNDZ/yMFggqFr6Jo62XcGsFl2+JszDTZ996lVvWWrfuJ0kNAmhBBCCPEVkaYp\nDwYTrrf6NEczdE2j5hZwbBNQ2V21/oSbnSF1p8DVFZ+6Y/OrWwc4lsnrm43cv/RrSqFrirpbYDAP\nqTkaWs5JiFngyu6UGbq29DCQUsHiQW9MnCTomoZCnYzhz2Ot7HKrM2S3P2bFcx4adZ9HenQnzjY0\n9gcTZmHEVsWj8le75Y4lScrheMpuPzvzDy5unFTGXMvkaxt1Xl2rsTeYMA5CgihG0xS2rrNedigX\n8t3XEy8uCW1CCCGEEF8BaZry1oM211p9PNviymqVhvfoBMdsWfKC/cGY39xtYmpZmenV9fpSVZrW\neEoKnK2VmYdxtmQ65/j546rUccyyDJ1FGJPk3Fu2UXa51x3RHs9YK7ukpEt1Vpq6xsV6meuHfZI0\n5fKSQzge9Md0JjN+cHETQ1e8f9Djg/0OtmGwVnYoGDqaphEnCaNFQGs0I0kSNsoub2zWqT2mlVTX\nFGer+ZeEiy8XCW1CCCGEEC+gNE1pjmbc7g4ZLyKCOMbQFLahc6bica5aOqkQpWnK7+8dcrMz5ELD\nZ9N/8i//SmXVsbpb4GAw4U/3D7nU8JcauQ/Z+H5L17ENnSRJsyXVlkme3JedWxEc7TvTlSIlJUlS\nNP3pT1AwDepugeZwymrJOWpxXHIYSMXjXm/Eg94I1zI5Vys99TnSNOVOZ8juYMwbGzUuNcoAbFc8\nWpM5O60B93sjkjQ9mWtiGTovrWQDQ0pSMRNPIKFNCCGEEOIFEsUJNztDdtoDBvOAgmng2RZFyyRJ\nU8ZBzP/cO+Tt3TaX6mUuN3zu98fstIdcWauetN3lESYJjmXgFSxGi2CpNrwoTk/G/luGzngRE0Qx\ntvn0FkdT1/Bsk/Z4RuMTS5vzNzjCZsXjnQctdvtjSgVrqemTAINZQJIkXKyX2euPOBxNWS87rJfd\nR9o0gyhmf5jtOovjhL89s8LVFf/k7yulWPWKrHpF0jRbTB0frUkwNLV0oBRfPRLahBBCCCFeENMg\n4pe39ulM59TcIq9vNig/ZmnyIozYH0650R5yoz0gihM2K6WlAhtAczil7hYpFaysvdEyT4LY0+ja\nx/fIdE2hK415FOUKbUopNisu15s95mGEcbQ7bJloU3MKnKmWuNUecLHhs7bEvrPhPOCDgw5bvssP\nLm7Sny+40Rpwtzfmf
m+EZ1snw1iiOGtxNDWNC7USlxs+1U+ZlKmUWjpACiGhTQghhBDiOUnTlIPR\nlL3BlEGvA8C7+x1WQo0L9RLuJ8a7z8KI/7zxgFkU8/UzK3j2k6tetmlwvl7mbNXj93eaTILoU5da\nP0kQx1QcG0vP7pTNwuihcfKfxtQ1wjg+GQaiaYp4mWEgJYdb7QHN0ZSNctbOuXSLo++y0+pzvzdC\n1xRnKt5jl0cfS5KUg9GE2+0Bq16Rv7uwjqYpak6B75wr8I2tBne6I7rTOYsoRqGwCiYvr/qcr5Zy\nD0oRYlkS2oQQQgghvmCLKOZ2d8hOa8BgEVIwdNQ8BKAzXXBw0OW9gy5nfJfLDZ+Ga/PLW/vMwiyw\nPW4/2uNkExyh7hWZhTEFM1lqEmKUZC2OSmUhLKu2GbnCU8MrcrM9oDOZs1pyUEDOqfsA6JrGlu9x\ntzfC0nUqRTt3lQ+Oxuc3e6y6BS42fD5q9tjrj2l4Dhu+i2ebaEqRptl9u4PhlMPRNGuJrJX42+3V\nRxZO24bOS6vLDSUR4vMgoU0IIYQQ4gvUmcz571v7zMKIulvkjbpPqWAx6nVpAy+tVnErVZqjKQeD\nCff6Y/yCRW+24BtnVnMHNoDxImQaRLyyXkcpmCzCpZZc69rH+81sQydYxARxkqu9r2ga1B2bg+GE\nFa+YDd5Y8urWuXqZ3nTOjcMeb2yt5H5cFCd8cNAhjGP+8coZao7NSys+t7sjbrQGvLPbIk1BU1mQ\nVEDhaCDIxXpZRuiLU0dCmxBCCCHE/1J3umAwWxDEydFEQI2GW3yklfBwPOO/dvawDZ1vnVt/Yjud\nrmls+h4bZZf2ZM4f7x7Q8B59vqcJjpYvF63sV75ZGBEnae6Kla3rzMII4GTH2jL7zjYrHu/uthnN\nAzQtG7qxDE0pXNukM5mz0+oxDUI2ffeJwTVJU1rjGfe7I0hTfnhx82R8fsE0eGWtysurFQ7HM6ZB\nRJgkGJqGbeisecVPXRouxPMkoU0IIYQQ4jOIkoT7vTE32gPakzlJmlWmSCFOU3Sl2PIdLjd8NsoO\no0XIL2/uUzQNXtus5xqhfzy0wjKy9sDhPMQvWLknchxXyXRNoSntZF9a3vC3WnK43RkQxllbpSJr\nJ8yr5hQoFSyuHfY4XyuzssQwEIDmcEJrNOPvLqwRxik3WgP2+mMqjs1qycE2dJRSREnCYLqgOZoS\nJwkbJYc3z6zgFx+tmCmllh7IIsTzJqFNCCGEEGJJ+8MJv717yCyI8Is2L63VqLmFk8XPcZJwOJqx\nP5xwf2ePStHC1nVSBa9u5Atsxw5HUwqmwWrJZR5FFGM998CL44panKToxsf30vKGtvWyw53OgNZo\nymbFy5ZUL9HjqJTia5sNfnlzl1vtATWnAOQ7+/5gwq12nyuNMi+vVlFK8epalXv9MTdaA3YOew+t\nALB0nasNn0sNaW8UXz4S2oQQQggh4GR/VhQnGLqGqWsnIeyT7nRH/O5uk1LB4tWNOsXHtOrpmsaG\n77LhuwznAR8edLjfH/P6ZmPpFrxFFFM0DWxTJ4izSlne0GYdTUo8foyuKYI4PVo2nePxhs5KqcjB\ncHoyffJx35M859AV/GW3xXo5+7487vuWpim92YL9/pjebMFLKz5vnlk5CYqGrnGxXuZCrcQiyu7X\nJWm276xgGEsNKhHiRSKhTQghhBBfad3pgp12n7vdMVGSZAMzyILXuarH5YZPzbFRSrE/nPC7u01q\nbpGrq5VcVadywaLuFBkvQixdJzq6R5VXFCcPLalehHHue2mebeJYJs3hFL9of2JJdYrK2WO5XS3T\nHh/yUbPHuVoJa4nQGScJ7+938CyD/+vKJrc6I262h+z1x/hFm4pjY2gaKSlBlNAaTwmimGrR5vvn\n1jhfKz32e6yUomAaFJa74ifEC0tCmxBCCCG+kpqjKe/sdWlNZhi6xnrZxbHNk4mJsyDkXn/Czc6Q\nulPg1fUKv7/XolSwcgc2yKpHh+Mpm76HpimG8/BkOEYeuqYIkwTIKlbzMGYeRrg5WhyVUmxVPG60\n+gRR/PFfz/3q4NomL69X+cOdJppS1N0Ceo73HkQx7+9/PMGxXLD5xpbN6+s17vfH7LSHHAzGhHGC\nUgpL19gqO0crDgpL72QT4stMQpsQQgghvnJutof84f4hRcvkpbUa9ceGhCJnqyW60wUPeiP+v2u7\nKE3x+mZjqUARp1kVqVSwKJgG0yA8GeyRh2XojCYhaZrdJ9M/45Lq/cGEtbKDptTSgSiIEhxLJ4gi\n/nivmbXHo1mfAAAgAElEQVQ4lt3HtmnOw4j9wYTmcIptaPzjla2HQqqha1yol7lQLwPZxEfF8ouz\nhfgqkdAmhBBCiBdenKTsDSY0xzOCOCZNwdI1/ILFuVrpob1id7pD/udek9WSw+WVT6+YqaPKUrVo\n8YudXRzTZIn90NnZ4qxKpmsqm8CoFLMwwtTzDctY8Yrs9ccM50HW4ghLTXA0dI3tWolb7QEAm767\n1Pl70zm3OwNeXavx0mqFjw573OmOud8bUXUKOJaBrhRRkjINQvrTBbahc3XF56VVH8f69IrgZ7kj\nJ8RXzQsT2v71X/+V8Xj8yF9/9dVX+fu///vncCIhhBBCPG/TIOJWZ8hOe8AkiChaBqamnYyBv94e\n8Oe9DuerHpdXfEDxu7uHNLynB7ZPGi1CFNk0xdE8xNC03JWyv95vdtzi6NlprsBSKdq4lsnBcIJf\ntI9C43JBZ7taoj2ecbszwLUMSjmnK7bHM641u2yVHf7mzAqapvj29hpf32xwpzviXm/McLYgOqoc\nOpbB986vsV3xZOeZEJ+jFya0/eQnP3no3yp1u13+4z/+g0uXLj3HUwkhhBDiedkdTPjN7QOiJGWl\nVOTqWu2Re15BFHMwnHC3P2GnM8QxDQxN48oSd9IAxosQlGK17DBehEyDEL+Y716aoWkoxcmdMlPX\nWETZkmtNf/oZlFJsHt1Lm4cRaZryWYYkagr8gsnd7pDOZM6G79Lwio8ExzRN6U7n7A8m9GcLzlc9\nvnNu7SR8AtiGzkurFV5arSx/ECHE0l6Y0FYoFB7687t37+L7PhsbG8/pREIIIYT4vB2P3Q/jBO1o\nOMXjKjZ3uiN+e7dJpWhzdbX6xKqOZehs18qcrZa40xnywUGXK6sVtCUrVVGcHIUvtfQER6UUNbdA\nazRjveyehMVkmXtpZYcH/REf7Hc4Vy9nC7ZzStOUm+0BszDmRxc3idOU660BNw572e40t3BSNYzi\nhN50QRjH1J3Cp05wFEJ8cV6Y0PZJcRxz48YNvv71rz/vowghhBDiczBehFmbY2fIIoxO7o0pYL3k\ncHnFZ6vsommK5mjK7+42qbsFrh4tXX6a4wEeRdPAtU0mQZhr+uLHT/Dx/1x2giPApu/x7m6L8eLj\n110mBhmaxtc2G/zq5h63WgPqxcLTH0QW2HZafQ5HU769vcrG0X22MxWPwSzgZm
fA4XjGbBGSkt0D\n3K64D605EEI8fy9kaLtz5w5BEHD16tXnfRQhhBBC/C8M5wF/3m2zO5iCgtWSg18rY2iKJIVFFHE4\nmvLfN/fxbIOXVirc7AxxLJMrOQPbsfZkTsMr4loW4yCiYOZfxmxqGlGSLXLWlDpqcYxzh7aaY1M0\nDfYHEy6tZC2FywYiXdOwDA1Nwdu7LdZKzhOXVMdJwuFoxt5gTBDFfGd7lUsN/6Gv8YsWb55ZWeoM\nQojn44UMbdeuXWN7exvHcZ74Ne12+ws8kXiSXq/30B+FeBbkcyaeNfmMPRu96Zy3dtukwJrnUnNt\ndBVDMAVAB0zAc3SmZkprNOM3H3ZI0pRX1uuMe92lXm827FOyLYJRtoOtNdVxHhN4HkcLI5LJiL19\n8Is2QZS1RxqL/G2KDT3mbquHtphQsk2mof1QtW08HDz0x0+K05Qbh32cNOVvt1fZHYzZbTa5v5dQ\nKpg4ZrZfLjlqLx1MF6QprHoFLtRL+ITyu5EA5OfZadNoNHJ9nUqXmRl7CoxGI/7t3/6Nf/7nf+bc\nuXNP/Lqf/exnX+CphBBCCCGEEGI5P/3pT3N93QtXabt27RrFYpHt7e1P/bqf/OQnX9CJxKfp9Xr8\n/Oc/58c//jHVavV5H0d8ScnnTDxr8hl7usEs4H5/xN5wSpRky5J1LdvdBVA0dM5WPLYqLoam+NWt\nA1CKK6sV9JxtgmGc8M5em9WSg2MZuJZBwcj/q8wHBx2KpslmxSNNUyZBSMmysIx8o+k70zm324OT\ntswkTaksMRAEshbNG4c9bEOnUrRZLTm4tokiq7C9+z+/4Wvf+T5e2SdOU7qTOYejKUrBm1sNqk6+\nu2xCPIn8PHsxvVChLU1Trl+/ztWrV5/aB5631Ci+GNVqVf6ZiGdOPmfiWZPP2KPCOOY3d5rsDiaY\nusnZzQ3Wyy7W0TLr43C0P5hwezTjznxMwy0Q2A5/u71GIWd7ImT331RxztraCppSJElCyS2S92qY\nN0+ZBhFe5egu3CzAsY2nLn8+eXw15SBQNCON7WoZy9Ao5xz7D7AII1rDgCtnNzlXK/NBs8etaYAT\nKlZLDrGZBbKFYTOJdFrjKQqNM+trvLFZp7xkQBTi08jPsxfLCxXadnd3mUwmvPTSS8/7KEIIIcRX\n3jyM+PnOHv15wNW1Gg238Mi/VFVK4dkWV1YtLtR97vdGfHjQZd13l6qSQTZcA8DQFKauM17EBFGM\nbeq5Hr/xiQmOpYKFUpAscUlEU4rXNhu8da/JrXafN7by/8K7iGLe2+9Q0HW+c24dxzK4UCvRHM3Y\naQ940BsRjocAPOiO8CoGr69XuVgv4+YMlUKIL68XKrSdOXOGf/mXf3nexxBCCCG+tMI44V5vRHe6\nIIizkGTpGnW3wHbFO9mHFsUJv7y1z3AR8vUzK7mChaFrVJ0ClqHjFy3GQYi3xNh97RP7zXRNoWsa\n0zDKHdqOJzg2h1NKBYuU5cbuA5QLFhXH5mAw4Vqzx4W6j1+0ntgBlKQpncmcW+0+BV3nR5c3cazs\n1y+lFOtlh/WyQ5qm7DcP+X8//CP/dPUMa6srMm5fCHHihQptQgghhHg2hvOAnfaAW50hizjBtUwM\nLQtoYZJwvTXgbaPNxUaZS/Uyd7oj2tMFX9tsLFUJ6kxmOJZB3S0yCUIsXTtppXyabAG0IohiHMvE\n0jVmYUSaprl3tW34Hrc7A9bnLunR+P5lTIOQaRDxja0Gncmc9/fbFEyD9bKbLanWNFKyCY6t0Yzm\ncEKUJKyXHL57bu0ksD3ubMffB0PXJLAJIR4ioU0IIYT4irve6vOn+200TbFWdtgou4/cNZuFEfuD\nCdcPB3zU7JGksO67lJa8ZxUmCZauUzANwjhhFka5Q1vRNHAsg9Z4RsX5uBUzSUHPmXG2Ki6H4ykf\nHHS4UPex3HyvDUctjnsdKgWLN880MDSNw3HW3nivO+R2++FR/ZahcbFe5nKjTGWJu29CCPHXJLQJ\nIYQQX2Hv7Xd5Z7/Duu9yoe4/sfJUNA0uNnzO18r8+cEhzdGMV5zlg0iafrxU2jJ0FmG27yzPkmul\nFJu+y057wLk4RjtqbkyXaHTUNY3XN+r84sYDbrX7+EUr14CP0Tzgg4PsTtoPLm1g6lnYWys5rJUc\npkHEcB4QxglKZVXBmmOffJ0QQvxvSGgTQgghvkTSoztUd3pjJkFIGCcYmqJgGpyteGyWnZPQdLM9\n5J39DmerZbZrpVzPrx2Fq7pXIEpS5mFMIeedMsjWABwPFLF0nXkYMw8j3Jx329bKLrc6Qw6HU9bK\nbnamJW+mLaKYomlQtk3eedCiVLDY8F0aXvGh0JqmKe3JnP3+mOE8oOEW+IeLG49tcXQs44mtj0II\n8b8lP12EEEKIL4EoSbjbHbPTHtCZzjF1HccyMTRFECV0Z1NutoeUbJMrKz5nKx5v7bZoeA5nq95S\nrzUOIs5UPUxdZ7QIsI1Hp0Y+iWMa7IcTwjjB1DUMTREdhbg8TF1js+zwoD/GMnSKprHU/a8girnW\n7LHiFfinK2fYG07ZaQ+4cdjjVntAwdTRlEaSJizCmChJWPOKvLGxzpmKl6siKIQQnzcJbUIIIcQL\nbhZG/PfNfdrTOdWizSvrdaqO/UiYGc0D9gYT3t5t89aDFkkKXz9TXir0JGlKkqSYmkbB1BnNYxZR\nnHvf2mrJ4VZ7QGs0ZbPiHS2pXurtcnGlwngRcu2gy6sb9dx72uZhxHt7HUxN8Q8XNzB0je2qx3bV\nYzBbcLc3ZhZGREmCoWkUj6qT1c/QBiqEEJ8nCW1CCCHEC2waRPznjQfMophvnFnBs598P6tUsHip\nYHGuVuIXNx5QdQpLT088lpKN4DeOxu7nDW2WobPiFWmOpmz4Likpasn2Rk0pVkoOB8MJt9p9ZmHE\npu8+cUl2GMccDKfs9sc4ps4PL20+MvHSL9q8IcNChBCnlIQ2IYQQ4pRJkpS94YT94ZQgToiTBMvQ\nKdsW52ulk7tTx7vSZlHM17dWcgenRRQfhSeH/mxBzSnkbvvTlELXFOHxDjdDZxqERHFyssPtaTYr\nHs37LQ5HM1zbPLknl1cUJ+wPJlxd8VnxHG60BxwMJpQLFo1SEUvXUSpru+xN5nQmc3QNzldLvLFZ\np5jz+ySEEKeF/NQSQgghTolZGHGrM2SnPWC8iChaxlEAgXgecrszygaHVFwuN3wG84D2dM43zuQP\nbJC1CSpgpeQwDUImQZhrguIxv2jRnczYqngnu9ziNM39S0W5YLHhZ22SZ6sem37+O3VxkvDBQQfS\nlDfPrOIXLV5Zq/JgMGanNeBOe8Bxt6UCPNvkza0652vlpQamCCHEaSKhTQghhDgFDoZTfnV7nzBO\naZSKXF6tPtLqGMUJzdGU/cGEO90RaZpSc51PbYl8nCjJllEberbQeR7GeHb+RdNbFY93d9uM5wHe\nUdhL0/wX05RSXFmt0h7PuNka4FgGR
dN96t26RRTz4X6HRRTzw0sb+MXstXVNca5a4ly1RJykBHFM\nmoKla+iakkXVQogXnoQ2IYQQ4jl70B/zq9sHeLbFN8/WnthmaOgaWxWPTd/lZnvA9cMe5xrW0e6z\n/K+nKUWSpqRpim3oLKKIeRjnHllfcwoUTIOD4ZRLJ4FxuWCkyMJWw7W53R6w2xuz4buslZ2Hdpul\nacpgHrA3GNOdzHFMg3+8skXdLTz2eXVNUdTk1xshxJeL/FQTQgghnoE0TTkczxjOQ8I4RimFpeus\nlYp4n9hJ1pnM+c2dJn7R5pX1Wq6qkFKKKE6oFG0sXWO8CCgt0d5o6tlmsyCKsU0DQ9OYh1Hu0KaU\nYqvicbM9oOrYGLrGspPwb7UHJAn86MomSZqy0x5wtzfibneEYxnoSpGQEkQJYRzjFyy+dXaF89US\nliFtjkKIrxYJbUIIIcTnKIhibndH7LQHDOYBKWBoGmmaEiUpulJs+Q6XGz4bZYe3HrQxDZ2X1/IF\ntmPzKKJUsCha5sn0RjPnIJBK0UbXNA7HM85WSxiaRhDHS73PM5VsTP61Zo/z9TKrXjHX49I05V53\nxMFwwrfOrlA7qpg1vCLf2Iq40xsxXkQEcYyusvbNzbLDqleUNkchxFeWhDYhhBDic3KnO+QP91oE\nSULdLfLaZgO/YJ2EjThJOBzN2B9OuL+zh2MZTIKQV9brn2GCYoplqKP2xphZGGHq+apthq6xVnY4\nHE7ZqnigyHalpeTuclRKcXW1woP+mDudIYam2PDdh1ob/9o0CLnbHdGdzPjGZp0rK5WH/n7BNHh5\ntZrvAEII8RUioU0IIYT4HHzY7PH2bpuGV+RC3X9sC5+uaWz4Lutlh9Ei5A93DtA0tdTkxo+fS5Fk\nU/ex9OWHiWz6Lnv9Cd3JnJJtZXfilixkdSYLSpbBhVqZe4Mx93tj6m6Bdd+laOhomkacJIwWIfv9\nMcN5QNEy+P75dc7XSsu9mBBCfIVJaBNCCCE+RZqmhFHWOhjGMWmaPtKmd7sz5O3dNlsVj3O18lPb\n+JRSuJaBoWvU3SLDeUB1iV1pkN1Lm0cRwMkwkUUU595B5tkWNdfmTmfApUYFc8l7YuNFwO3OgHO1\nEt+7sM43w+ikLfT9vTafHCapFKx5Rd7YWOdMxVvqfQohhJDQJoQQQjwiTVO608XRcIwxwWgAwH9e\n36VwMOZivcylehm/aDMLI35/75AVz8kV2I4FcUKaZne5EmA0D6g4du4zrnhFPjzoMguyfW6aUsRJ\n/rH7AK+s13nrfpMPm11e36jnftxoHvD+foda0ebb22tA1tr4ylqVl1crdKcLgjgmSlJMTcOxjM9U\nTRRCCJGR0CaEEEJ8wt5gwrv7XTrTOaaus+67pGbCNeBsrURadLjRHvLRYZ/1koNrGiQpXGz4Sw3K\niOIESLNdabrOLIyIkwRdyzdMZMVz2NEHNEcTztd9FNkY/2WYusam7/Lebocbhz2mQchG2cV+QrVu\nHkbsDybsDyesuAX+/sL6I8NPlFJPHMcvhBDis5HQJoQQQhy5dtjnTw9alGyLl9fr1BwbpRRDQiDb\nT1au+ZyrlWmPZzzojbje6rNdKz9xt9qTHLcIJkmKZWV30mZhjGfnex7taPDHbn/MmUqJlHTZK2mk\naUprPOPqSoWKY3GzM+RBf0zVKVB3C5i6RppCGCd0JjP6swW2rvPKaoXX15+8T04IIcTnS0KbEEII\nAVxvZYFt0/c4X//0NkdNKVZLDgDtyRzXslhEMfYS98IMTQMUizgGBaahMQsjXMvIXbHb9F32BmOu\nH/bY9L3cQ0ggC2w7rT6LMOb18zVWvSJf26hztzfiRmvArVaf47qdIgus391e41zVk7AmhBBfMAlt\nQgghvrTSNKUzmdOazAnjbNSiqWtUizZrpY/3fjVHU/50v82G73Gh4ed+/mkQ4VompYLJYBZQc+2j\nMPZ0lqFTLlq0RjNWPAdT0wiimDhNMXKGr4Jp8Ppmgz/da7KIYr55ZiXX49I05WZ7wOFoyne21052\nrJm6xuWGz+WGT5wkLKIEpcDStdxtm0IIIT5/EtqEEEJ86YRxwr3eiBvtAd3pAqXUyd2rKE6I0xS/\nYHGl4XO+VuLDZp+CpXOhXl7qdaIkwdA1XMtkOA+YBdnC67y2fPdkmMjxioAlr6VRKdqU7Cw0vrvb\nZrPisVZyHlsNS47aIff6Y2ZhxHe2V7nUePx71jUNx5KgJoQQp4GENiGEEF8qzdGUX98+YBbFVIs2\nr2zUqRbtk6pamqYM5wF7gwl/etDird0WQZTw8nptqUEix9I0BZVVzmZRjPeYlQBPcjxM5GA4Ybu2\nXGA8NlmEBFHC986vMZgF3OsOudsZUveKeLaJoSniNGUWRrRGM+IkYaPk8L1zq6wdtXgKIYQ43SS0\nCSGE+NJ40B/z69sHOLbJa5sNCo+ZgqiUwi/a+EWbIIr5/d0DoiTFs5cfSW9qGlGSkKYp1tGutHkY\nU7Ty/d+rpinOVj1utQc4lkHBNJa6lxbGCR8edPGLFi+vVtA1jW8EEbe7Q251hvQmM5KjEFkwdK40\nylxu+DJ+XwghXjAS2oQQQnwptMczfnOnSblo8/J6LVf4sQwdU9co2RaTIMLUtaWGifhFm9udAcN5\ngF/M7rMtovyhDeBstcR4EXLjsM+5Wpk1L1/1K4hi3tvvoEj5wcWNkztnjmXw2nqN19ZrpGlKnKbo\nSn2mKqIQQojTQUKbEEKIU+t4yfWtzpBxEBLGCbqWVY22fI+zFRdd00jTlN/dO8Q2jdyB7VgUp3ie\nhaEphvOAhlvIHXD8ooVrmxwMJ/hHLZjL7kpTSnFppcL93oj7vSFxkrBV8Z54Ny6ME5rDCbuDMbau\n8cMrW0+snCmlcg81EUIIcXpJaBNCCHHqxEnCne6YnfbgZMl1dj9LI0xSxtMFd7pjipbB5XoZzzYZ\nzAPe2GwsFdgA4jRFU1C0DEbzgHkUU3zCcum/ppRi0/e40eoTRDEKSD7D+22NppRtk9c36tzqDPnL\ngxaubbJacrB0DaUUUZIwmAW0x1N0pdiuenx9s45jmZ/hFYUQQrxIJLQJIYQ4VeZhzK9u79Mczag4\nNq+s16k69iPVr2kQsj+Y8EGzx3gRUnHspSY3HjM0RZSkaEphaBqzIKJoGOTdVL1WcrjdGbBztGR7\n2dA4WYTc6424WC/z+kaN19ar7A2n7LQH3OsOSdOUlOw4nm3yza0GF2qlx97XE0II8eUkP/GFEEKc\nGoso5v/ceMAoiHjjzMqnDsxwLJNLKxXWyw6/3NnDsy3CODkZnZ9X0cwqbJDdcZsGIWGSnKwIeBpD\n13hto847u21utwdcWankfu1JEPLefptqweLNMw0gq95t+S5bvkuapgRxQpKmsitNCCG+wuSnvxBC\niFMhTlJ+dXs/C2xbjdwTDhdxgm3q+EWb/jwgipdrUNzwXYazBfMwOglqcbLcc1SdAmerJQ5HU2
62\nB/Sni2wVwBPEScL+YMJfHrQoWSY/uLSJqT8aNpVS2IZO0TQksAkhxFeYVNqEEEI8U2Ecc7c3Zncw\nYR7FJElWNfJsk4v18sngjwf9MQfDGV/baix1TyuKExRQKljMw4hJEOIX7dyPX/GK7OgazeGUc/Uy\nsPwwEYBZGLFeLmLrivf32xRMg/WyS9XJpkqmacoiTjgcTWmNpqQpbFddvr29+tjAJoQQQhyT0CaE\nEOKZGM4DbrQG3OoOCeIEv2BjmzqGrojihPuDCTc7Q2pFm8srPrc7Q8pFa6nABTx0180ydBZhTJyk\n6Fq+u2W6prFRdtkbTFgrO3Bygyy/0TygN53z7bMrXG74HI5nJ3fSbrc/DoBKQcE0eHWtyqV6GdeW\nISJCCCGeTkKbEEKIz92D/pjf3GmSAutllw3ffWT/WZqm9GcL9voTfn37gCCK+cbZ1aVfyzgKZ2Ec\nU/z/2buzHzmvO73j33etvapr6Z3N5i5LMmVJHsmbLI/smQQYjzGJgVwEGSAXAfxH+S5IkNwkmTiD\nDCabZI+XsWyJomSRkrgvvXfte9W75aIljmiRYld1sdndfD43Eqvrfc8p6BVZD885v5/j0PcCep5P\neoRAdLyQodzp8dFGleP5zEjFRHqez6X1CjPpBCeLWQzDYDaTZDaTpOf5tAY7rQpMA1zLYirhaquj\niIiMRKFNREQm6k6txa9vbjKVjPHMbP6hAcUwDPLJOPlknA/XIrZaPcxP+5yNEpqycRfLMim3exwv\nOLiWSX/E0OZYFucXSrx9a4Or23WycZe48+gti43egI82qqRdm++enMP+o8+acOxdtw8QERF5GP1J\nIiIiE1Nu9/jH25sUUnGemc3vukm1YRhMJWKEETR6Q6YSMXab2yzTZC6TZLPV41g+g2UaDIOIKGLX\n94CdgJV0bXpDn/dXtplKxpjPpSkk79+uGUYR5XaP9UaHVn/IbCbBd0/Nf2ElUUREZFIU2kRE5JH8\nMGSl3qbWGzL0A0zDwLVMptMJ5rPJe+Hsg/UqrmVxboTABjvFRGzLJOnadIYewyAYKQQt5NKs1jtU\nO/17Z+IiIowRzqY1P608+WdnFxkEIVe3G3y8UcGxTMx+F4ArW3X8xpAgDJnLJHlxocixXApzl+fn\nRERExqHQJiIiD9UaeFwvN7headL3AuKOjW0ZRBF4QciHGzWycYczpRzFZJyNVpezM/mRG0xbpoEX\nRtif9iLrDv2RQlsq5lBIxbhVafLMbB4YrZSIFwRc2aqRT8RYyKUwDIOThQzV7oA7tTa1qk8FyCdd\nSsUplvPpkQumiIiIjEuhTUREviCKIq5sN7iwWr5XWGM+l7rvfFYURbQGHmv1Nu+tVuh7HnHHppRO\njDyeY1m0B32iKMK1LHqehx+GXzgj9mW+MlvgvZUtPt6scrKY2/VK39APuLRewTIMvntq/t51hmFQ\nTMUppuKU43ANeGG+SKlUHPnziYiI7IXKV4mIyBf8Yb3KuyvbzGVTvLo8y6lS7gsFNQzDIBt3+cpc\ngVeWZ/DDiHQ8xtAfrTE17PRK639aadG1TQwM+l4w0j1ce6eYSLs/5MZ2nbVG+0ubZIdRxGary8WV\nbYgi/vT0wkjFS0RERPaLVtpEROQ+V7frfLhRZbmQ5Vg+s6trTMMkZlukYw7N/hDTcHFH2N6YT8ZI\nuDabzQ7ZuItpjtfgujvcqRo5l0lyu9LkVqXJTCZJKZXAsU0MdrZ1Vjt9NltdgjBkIZvklaUZ9UwT\nEZEDS6FNROQp4gUhAz+4t/UwZls41j9tuuh7PhdWy8xmU7sObLBTqAQMkq6NaRq0Bh5Fy9r1wTLD\nMFjIpblRbjDwAwx2tl+OIooiVuttZtNJ/uzcMTpDjxuVJtfKTTabHT67nfFpv7RzpRynS1mycXek\ncURERPabQpuIyBEXRRFb7R7Xyg3u1jv3rWCZhsGxXIozpRyzmQQ3qy2CMGK5sPvABjuFRGBny2Hc\ntu5VgBxltW0um2Kl3uaTjSrLxSyONdofUbeqTdqDIX9ybAGAlOtwfr7Ic7MF2oOd+USAa5mkXQfb\n0gkBERE5HBTaRESOsJV6mw/WKtT7Q2K2zfFClqRrY5kmQRjSHfpsNDvcqbfJxhw6Q59SOoFjjdZz\nzDZNTBP6XkAxZWIaBj3PHym0OZbJ+YUS793d4upWnefnC7u6LooiblaarDfavHysxEIudd/PLdMg\nl9BqmoiIHF4KbSIiR9THWzUurJTJxmM8P18il3C/UFExn4SFXIpmf8iVrRq13oCTpSmIGKlmvmEY\nTKcSbLW6LORSuLbFwAsIw2ikHmbpmMOJYpY/rJb5aKPK4lSa+WzqgeHvsybXa/U2naHH149Nc246\nt/tJi4iIHBIKbSIiR9DV7QYXVsosTKU5Uch+afl7wzDIJWLMZVM0+x4Y0PV8ku5of0QsTKXZvLtF\nozckE3fp4xNEEeZIHdOg0RtwfCrNdDrOrVqbu7UWhWScbNzFMk3CKKLv+2y3eveaXH/j+Axz2eRI\n44iIiBwWhya0dTod3n77be7evUsQBORyOb73ve9RKpWe9NRERA6UcrvHO5+W6z9Z3P3KUxBGxB0L\n17ZoDYbYpjHS9sZs3CUdc1lrtHkmvrO1cdQKkJ2hR7074FsnZjlVzPLiYolb1RbXK03WGm38IMQ0\nDeK2zdlSltPFnLY+iojIkXcoQttgMOBnP/sZi4uL/MVf/AWJRIJms4nr6g9qEXm6dIYe2+0+XhAQ\nRjtFNXIJl3widm817ZPtOq5lcao02lZB09g5H5ZwbPwgojvimTTDMFguZrm0VmG13iKXiI20xjb0\nA8i4k+YAACAASURBVC6vVcgnXI5PpYGd3mvnZqY4NzMF/FNFyd02zhYRETkKDkVou3jxIplMhu99\n73v3Xkun009wRiIi+yeKIjZbO9UfVxod/DDCNMDA2Nl+aEApFedsKUcpFeduvcPxR2yJfBDbMvHD\niCAMidkmPS8gCKN7lSF3Yzqd4FQpy9XtOj3PJ5+I7eq6vufz4VoFyzR4/fTCQys7KqyJiMjT6FCE\nttu3b7O0tMT/+T//h42NDZLJJM8//zxf+cpXnvTUREQeq9bA41c31qn2BiQcm+Vilpl08l6oCaOI\nWrfPeqPDb25t4ochlmkwmxn9fFchGQeg3O4zk0nS9wJ63k6z6lEs5TNsNLus1duEYcRCLsV0Jon5\ngMDV83zWGx02mx3Srs3rpxdGHk9EROSoOxShrdVqcfnyZV544QVefvlltra2+PWvf41pmpw7d+5J\nT09E5LGodgf8/PoaUQTnF0pk41+s/mgaBsVUgmIqQc/z+c2NNVKOixeGI/chizs2xWSczWaHmUwC\n2zIZ+gGMGKKCMMIPQl6YLzIIAq5t17lZblLKJIjZFqZh4Ichrf6QendA3LZ4ZmaKZ2fyxJ3RWg2I\niIg8DQ5FaIuiiOnpaV555RUAisUitVqNjz766KGhr
Vwu7+cU5SFqtdp9/xR5HI7ic9Ydevz29haG\nYXBmOofRa9PqPfq6uNfDMnw2NgMyMQd3xOCWMzy26zU2nRDHNAnCCHsw2vnhrXaPsNthaT5Nwolx\nMmGyUm+zVSkzDELCKMI2TZKOzVenUsxmktgWtBs12iONtH+O4jMmB4+eM9kPes4Olt0WVTSiaMTS\nXk/Af/pP/4ljx47x+uuv33vt8uXLvPfee/ybf/NvHnjNT3/60/2anoiIiIiIyMh+8pOf7Op9h2Kl\nbW5ujnq9ft9r9Xr9S4uR/PjHP37c05JdqNVqvPXWW7zxxhvk8/knPR05og7TcxZFEc2+R9fzdsrX\nGwaOZZFPujjWztbAVn/Ir25ucLyQuXfObLcub1SIOw4LuTSd4ZCU6xAfoQIkwMAP+HizRgQsTqWZ\nTu1uDu2hz81ynWzc5U+OTY+8PfMgO0zPmBxees5kP+g5O5wORWg7f/48P/vZz3jvvfc4deoU29vb\nfPzxx/etvP0x9W87WPL5vP6byGN3kJ8zPwi5XWtzrdKg0u0TRYABUQQGHm5zyMlCljOlLLV+EzeT\n4/ji3AOLd3yZzADaA49MPo819AHIJuOM2N+aZC7Pr66vstqPSGRizGSSDw1hAz9gvdFhrTNgZnqa\n756aJzZiUDwsDvIzJkeHnjPZD3rODpdDEdqmp6f5Z//sn/G73/2OCxcukM1m+fa3v82ZM2ee9NRE\nRB7pVrXFu3e36fsBU8kYX5krMpVwMQ2DiJ3+ZJvNLjcqTT7ZqtHzAk5PT40c2ADmcyk+WNmmNfBI\nOjadoccwCEbqtwYQRBEJ22Y6Fed2tcmtSpPpTIJ8Mo5jmUQRDIOA7XaPWrePa5k8M5PjhYUitnl0\nVthEREQOgkMR2gCOHz/O8ePHn/Q0RERG8tFmjfdWyxRSCb5azBJ37v9t12CnauNyMctSIcOdapOP\nNmrYnxYBGaVHGkA+ESPpOmw2O5ydyQM7lRpdRgtt6402uYTLP//KEj0v4EalybVyg61W97655xIx\nXl2a4UQhfW97p4iIiEzWoQltIiKHzY1Kk/dWyyxOpVneRbNr0zDIJWLEbAvDNGj0BuSTsZEaShuG\nwUIuxfVyg1Z/iGFAOGK5qfZgSLnd56XFIoZhkHRtvjpf4Pm5PH0/YOiHGAa4lkXMNtXwWkRE5DHT\nHhYRkcegPfD43Z0tZjLJXQW2zzMNSDo2fhjRGngjjz2fS5GJu3yyWaPv+SNd2/N8Lq1VKKVinC3l\n7vuZYRgkHJtcwiUbd4k7lgKbiIjIPlBoExEZUxRFBA9ZxrpeaRIBp0q5kYKN82mxjzCKcG2LvhcQ\njtiZxTJNzi8UcS2TK5s1mv0hu+nuUu8OeH9lm5Rr891T80eq+qOIiMhhpu2RIiIjaPQGXK80uVNr\nM/B3ApVlmqRdm5PFLCcLWRzL4HqlyWwmiTViUY6k4+BYFpVOn+VCloHv0/cCku5ov107lsVyMcuF\nO5tc26qx0egwn0sxk0ncN6cgDNlq9VhvdugOPOayCV47eXSrP4qIiBxGCm0iIruw0exyebPGRquL\nZZrMZJLEbQvTNAjCiFZ/yHurFT5Yr5CLuXQGHs/OFUYexzQN5nJJ1uodlvIZHMui5/kkHXvksv3b\nrS4n8hleWChyvdzgZrnBzUqDmG1hmSZBGDLwA4hgMZfkG0vTzGeT2vIoIiJywCi0iYg8wpXtOu/e\nLZN0bc7O5CmlEw8sx+8FARvNLlc2azi2NXY1xYVcmrvVNpVOj6lEnJ7nERJhjpDaep5PvTvgWydm\nWcilWMil6Aw87tTb9DwfLwhxLJOEY7M0lSYdc8aaq4iIiDx+Cm0iIl/iyladd1a2mc+lOVn88oIi\njmWxlM/Q6A1pDYbUegPyidi9c2q7lXBsiqk4d6otUnM7YeqzZty7EUYRVzdrJFyb41Ppe6+nYg7P\nzuZHmouIiIg8eTplLiLyEBvNLu+ulJnPpUcqKGKZBq5lYmJQ7w0IR625Dzwzm8cxDT7ZrDL0g11f\nF0YRH29U6Xoer52cUzERERGRI0B/mouIPMSlzSrJ2E6BkVHYpokfRqRiDmEU0Rux7D6Aa1ucXyzh\nBSFXt+qU290vrSIZRRGN3oAPVrdp9Yd858QcM+nEyOOKiIjIwaPtkSLyVPLDkM7AYxCEGOw0ik7H\nHCxzZzWt3huw2epxbiY/cmGObNxltd6i7/mfFhIJSLoOo9b3SLoOhWSM7VaPa9t1blWazGZTTGcS\nuNZOU2s/CKl2+6w3OvQ8n1zc5ftnFigpsImIiBwZCm0i8lRp9ofcqDS5Xm4yCIJ7Z8UMIO7YnClm\nOVXMcr3cxDZNimOEn+l0gmuWxWarw1I+S3swZBgEI5fRH/oBjf6Q75zaWTW7Xm5yo9pkpda6732m\nYXBsKsWZUo65TELVH0VERI4YhTYReSp0hz6/v7vFWqOD+WnJ/mIqjmOZRIAXhFTaPS5v1ri0UaPn\n+ZwoZh9YJfJRTNNgPpditd5mKZ/BNAwG/uihbb3RwTFNTuQzuLbF15emeWGhQLU7YBiERFGEa1lk\n4w5JV9UfRUREjiqFNhE58uq9Ab+4vs4wCDk1PcVMOolpfjGMTSVinChm2Wx2eW9lGy+M8MMQe8QG\n2QDzuRR3a62dLY2Z5MjFSDoDj9VGm7PFLO7nwp5jWcxmkiPPR0RERA4vFSIRkSOtM/T4+bU1giji\na8emmcumHhjYPmOZJrPZJAnXxjSg3h2OVf0x4dicm8mz3eqx0fzyIiJ/rDv0+HC9TCER42uLxZHH\nFhERkaNFK20icmRFUcSvb27ghREvHpu+b8Xqy5iGgWUYOJZJSESjPySfjI08/nwuhR+GXFqvMAwC\nEo5Nwn34b7thFLHd7nFju04u7vLdU/NjN+gWERGRo0OhTUSOrO1On3Knz3PzxV0HNgDDMIg7Ft2h\nTymdpDv08IJw5CbZAMem0twsN2h0B7x7Z5NcIsZ8LkUuEcM2DcIoYuAHbLa6bDa7BGHIsVyKby7P\njjRnEREROboU2kTkUIuiCD8IH/iza9sNYrbNVGL0VbK5TIrbtSbLhSyGYdDzfBzLHfk+1W4fyzT4\n588cozP0uVpu8MlmlSiCzzZMGkDMNjlbynK6mCOXGH0cEREROboU2kTk0OkOfW5UmtysNukOfbx2\nE4A3r65yph9xupgjZpvcrbc5VsiMVQJ/LpfiVrXJdrtHPhmn7weko2jkapJr9Q6lVJyZT4uHnCxm\nqfcGtAcewyDENAxitkkplRhrJU9ERESOPoU2ETk0Gr0BH25UuVvvEEVQyiQoZZIM3IgrQCbucrXc\n5OOtOinXZuAHzKTHq7QYsy2m0wk2Gh2K6TiRv7OiN8qWxWZ/SKM34NsnZu97fSoRG2v1T0RERJ5O\nCm0icihsNLv86uY6GAbLhSwzmST2pytTzWAA7JwfS+cLlNs9rmzW6PshfhjiMt7ZsBPFLBfubnF1\ns87CVIpR
ikj2PJ/L6xVmMwmO59NjjS8iIiICKvkvIofAVqvHL66vEXccXl6aYWEqfS+w/THTMJjJ\nJDlRzGKZBs3+kJ7njzVu0nX46nyJzmDIje0GXhDs6rpWf8j7K9ukXZvXTs5hjdHnTUREROQzWmkT\nkQOtM/T45Y11kjGH5xeKuz5T5lgWlmlgmxatvodlmrhjnBmbSsZ4br7I27c2uLiyxVw2xXwuTTrm\n3Pe+KIqodgesN9rUuwNmMwleOzlH3NFvsyIiIrI3+jYhIgfatXITLwx5cW5mpCIgqZiNgcEwCHAs\nk+7Awx2j1xpAEIZkYw7PzuS5W29zsblFKuYQs3eCoR9GdAYeXhBQTMb51olZlvNprbCJiIjIRCi0\niciB5Ych18uN+86v7VY65pJNuGw0O5yZnqLn+QRhhGWOXklyrdHh2FSKry9N89JiidVGh5VGm74f\n4AcRKdtiJhXnZCFDIRkbq1qliIiIyMMotInIExOEIdXugGEQEoYRjmWSiTuk3J2thyv1Dj0v4Nlc\naqz7L+RSfLxRxQ8iDHZ6rf3xtsZHafWHdAYef3KsBIBpGizl0yypuIiIiIjsE4U2Edl37YHH9UqT\n6+UGPT/4py7T7BQSWcwlOVPKcbfWIhN3SbqjBa3PzKSTXLca3Kk1OZ7P7PRaGyG0hVHEzXKDdMxh\nYczgKCIiIrJXCm0ism+8IOB3d7a4U+tgGDCTSfJMNolrWZiGgR+G1HsD1hod7l5bo+/7zGfHX9Ey\nTYOvzOX5cK3CimEwm919z7YoiriyWaM79PjTMwsjN9UWERERmRSFNhHZF92hzy+ur1HvDzlZyjGT\nSXyhUIdtmcw5NnPZFM3+kLdvrjMIAvpeQNwZr9daMZXg3EyeD9fL9D2ffDyG+4h7DfyAq1s1Wv0h\n3zoxy2xmvAbdIiIiIpOg0CYij50XBPzyxjqtocfXjk3fO7P2ZbJxl1I6QQQ0+0NMw8W1xwtu87kU\ntW6fW9Umv7uzQSEVZyGXJhd37xUNiaKIem/AeqNDtdsnblv86ekF5kZYnRMRERF5HBTaROSxe2+1\nTLU34GuLuwtsn0k4NvXeANOARn9IMRXf0zbF5akMZ0pZrpYbXForYxrGvaqUfhASRhFTiRjfOD7D\ncj6NY40XEkVEREQmSaFNRB6rgR9ws9piKZ8hNWLlxrlsio1mBz+MMAzoe/5YRUkGfkC12+fVpWnO\nTk9xdjrHVrt3r3KlATiWSTEVZzoVV8l+EREROVAU2kTksbpZaRKEjLXNMJdwSbkOm60OS/ksPS8g\n6TgwYqbaaHRwLZPlfAYAwzCYzSR1Vk1EREQOhdG61YqIPEAURXhBSBCGX3j9arlBMRUfa6uhYRgs\nTKWpdQf4QYAfhgyDYKR7DP2AjWaHE/nM2GfiRERERJ4krbSJyFi8IOBWtc21coNmf0gY7TRbs02T\n+exOn7Vs3KE18HhmKjP2OHPZJOvNDle36ywXsgyDcNfhKwhDLq1XcC2T5+byY89BRERE5ElSaBOR\nkfS9gA83qtysNhkGIflknOOFLJZlEkURQz9kq93ldr1NwrF2QpY1/qK+ZZqcXyhx8e4WV7dqPDtb\n2FWD7IEfcGm9gh8E/ODs4kgFUEREREQOEoU2Edm1Vn/IL26s0xn6zGVTzOdSxB6w6rWUT9PsD7lV\nadLzerSHPpl4jHHre8Rsi5eWZvj51RUubVQod3rM59IUkrEvFA1p9oes1dtUOn1Srs2fnzvGVCI2\n3sAiIiIiB4BCm4jsSnfo8ea1VfwQvnZsmoTz8N8+DMMgl4jxzGye7XaP7tCj1R+SjbsjFxH5jGtb\nTMVdCskYfhjx8UYFx7JIujZBpwXARxtVPLdPNu7w0mKRU8XsA0OliIiIyGGi0CYijxRGEf9wY51h\nEPG1Y9O7DkIxeydU9byAnu9je8ZYJfsBOgOPYRBybnqKY1Mpqt0BN6stukOPdm/nPdPpOM8cX2A+\nm1TZfhERETkyFNpE5JHWGh0qnQEvjBDYYGfFbT6X4m6tzXw2SWfok3DssQLVWqNNyrVZyKUwDINi\nKk4xFQegXHZZeQfOzxcp5VIj31tERETkIFPJfxF5pGvlBqmYs7O9cUTz2RRRFNH6tMLkwA8ffdEf\n8cOQ7XaP06UslqkVNBEREXm6KLSJyJdq9oesNbvMj7mCFXdsiqk4a40OYRjR9fyRro+iiJvlBiZw\nupgdaw4iIiIih5lCm4gQRREDP6Az9Bj6AdGnPddgZ2ukgcFMOjn2/c/O5DFNg1uVBt2hRxBGj77o\n03ndrrbYanV55fjM2OfhRERERA4znWkTeUpFUUSl0+daucGdepsgjIjYKe7oWCYnC1lOl7IM/ADX\nNjH3sC0xZlu8sFDi3U97rcVti2Iq/qVn2wZ+wK1Kk3K7y0uLJU5plU1ERESeUgptIk+h1UaHD9er\nVLp9XMtiPpcm4TpYhoEfhnSGHtcrTT7ZrmMaYJt7L5ufijl8db7Ib2+tc3m9QtJ1mM+lmM0ksT9t\nvh1FEY3ekLVGm2q3j2uZfGt5lpMKbCIiIvIUU2gTecp8vFnjwmqZTNzl2bki+Qc0qAZYLmSptHtc\n3qjS9z36XkDc2Vt4s0yDbMzhm8tzbLQ63Kk2uVFuYBkGhgFBGGEYMBV3eXVpmuV8Bld91kREROQp\ndyhC2zvvvMOFCxfuey2ZTPLXf/3XT2hGIofTla06F1bLLEylOVHIfun2RNMwmM4kOROEXN6oUO70\nmE4n9tSsutUfYpkmS1MpTpeydIc+W+0eQz8gjCIcyyITd5h+xNZJERERkafJoQhtAIVCgR/+8If3\nfq0vdCKj2Wx1eXelzHwuzclibtfXzWaS3CjXafQGuJZFIRm7t51xFFEUsd7scHwqfW/1LOnanChk\nRr6XiIiIyNPk0IQ2wzBIJBJPehoih9ZHm3XirjXy+TDbMpnJJCl3+pRSCXqeT8YavV9bvTdg6Aec\nmd59YBQRERGRQxTaGo0G//E//kcsy2JmZoZXXnmFbFbFCUR2o9Ufst7scGp6aqxV6sWpNOuNLs3+\nEMM0SEfRSPeJoog71Rb5RIzpVHzk8UVERESeZoeiT9vs7CxvvPEGf/EXf8Hrr79Ot9vlZz/7Gf1+\n/0lPTeRQuF5pYpjm2L3W0jGX+VySlXqbZm9A3wt2fW0URVzZqtMderx8rKStzSIiIiIjOhQrbUtL\nS/f9enZ2lv/8n/8zV65c4YUXXnjgNeVyeT+mJo9Qq9Xu+6c8GTdW18nYFu16dex7zDrQjgbcuNNg\nkE9zbCrNo+JXEEXcrjZp9TzOz+exBl3Kg+7Yc3gYPWfyuOkZk/2g50z2g56zg6VUKu3qfUYURdFj\nnstj8T//5/8kl8vx2muvPfDnP/3pT/d5RiIiIiIiIrv3k5/8ZFfvOxQrbX8sCALq9Trz8/MPfc+P\nf/zjfZyRPEytVuOtt97ijTfeIJ/PP+npHClRFFHt9lmpd9ho9Qj+6
O9fTAxm0nGWptK8t1ZmNpti\nJr33Yj53ai3a/QH5ZJytVh/DgIRrYxkGEeAHIV3PJ2abHMulWJpKk3SdPY/7ZfScyeOmZ0z2g54z\n2Q96zg6nQxHafvvb37K8vEwqlaLf73PhwgU8z+PcuXMPvWa3S42yP/L5vP6bTFCzP+Q3tzaodgfE\n7BgnlwqU0gkcy8QAvCCk0umz3uxwodqna8Ywkhmyhb1XbrQ9g2Ia/uzcMToDjxvVJs2+x9APMA2D\nmG0xm0mwlE9jm/t7bFbPmTxuesZkP+g5k/2g5+xwORShrdPp8P/+3/+j3++TSCSYmZnhX/yLf0E6\nnX7SUxPZd5VOn19cXyPC4Ln5IlOJ2BeKe7i2xXwuxVw2SWvg8fatdbZbXZY+1yNtHFEU0e57LGZ3\nCpqkYg7n54t7+jwiIiIi8uUORWj7wQ9+8KSnIHIgNPtDfnFjHcs0eX6hhPOIJteGYZCNuzw/V+QP\n6xU2Wz3msslHXvcw9d6Age9zqqR2GyIiIiL75VCU/BeRnVWu397eJIrYVWD7vNlskpRjU+30afSG\njFt+aL3RYUq91kRERET2lUKbyCFR6Q4od/qcKuVGXimzTJP5XIp6r0/P8xgGu++z9plWf0it2+ds\nKadeayIiIiL7SKFN5JC4tt3AtSzyydhY1x8vZIg7FjcrDRq94UjX9oY+l9YrzKQTnCxmxhpfRERE\nRMaj0CZyCAz8gNv1FnO51NirXI5lcX6hRBhGXF4v0+gPdnVdozfg/ZVtMq7Nayfn970ipIiIiMjT\n7lAUIhF52tW6A/wgopTaW5+1pOvwyvIsP7+6ynt3tiilE8znUhSS8fvCYBhFbLd7rDfatPsec9kE\nr52cJ7aHypMiIiIiMh6FNpFDwAtCgLGrPn5eKuaSizscn8rQ9Xw+3qhiWyZx28Y0DYIwpO8FBGHI\nfCbJy4slFrMpTFPn2ERERESeBIU2kQOg3htws9KiM/QYBiG2udOkejGXYiGXmvh4pmEym0lwdjpH\ntTvgbr1N3w/wgxDHMkk4Nsv5DLmEO/GxRURERGQ0Cm0iT0gYRtxttLlWbrDZ6mGbJsmYg22a9PyQ\ncnfAtXKTdMxhOhUnIsILQ+w9rrYFYUgQRbi2iWEYFFNxiirhLyIiInJgKbSJPAFDP+DXNzdYa3bJ\nxl3OzRYopuKYf1RkpNUfst7ocLPaoj3w2Gx2OVHcW2PrSqePaUA+oaAmIiIichgotInss6Ef8Oa1\nVeq9IV9dKDH1JSX8M3GXTNzlRDHLr66vcqvaZD6bIuaMXxBkrdFhLpPU1kcRERGRQ0K1u0X2URhF\n/PrWBvXekBcWp780sH2ea1u8dGyGvuez2mjjh+FY47cHHu3+kLOl3FjXi4iIiMj+U2gT2Ucr9Q5r\njS5fmSuQijkjXTuVjFFMxlmpt2n0dtdj7fPCMOLado10zGHxMRQ3EREREZHHQ6FNZB9dKzfIxF3y\nydHPkxmGwfMLRbwg4OPNGgMv2PW1YRjx0UaVgRfwnZNzKt8vIiIicogotInsk0ZvwEary/weVrmS\nrsNLx2aodwdcWNmi3h0QRdGXXtPqD/lgdZv2YMhrJ+coqVKkiIiIyKGiQiQi++RmtYVlmpTSiT3d\np5CKs1zIsFbvcGm9TNyxmc+mKKUT95pv+2FIpdNnvdGhM/DIxh1+cHZRpf1FREREDiGFNpF90hn6\npFznC2X9x1FMJWh0+3z/zALXK01uV5vcKDf4bM3NAEwD5rMpXlmaZj6bnMi4IiIiIrL/FNpE9skw\nCLAmdJbMtnYaYxeSceayKbpDn2q3zzAIiSJwbZN8IkZ6xGInIiIiInLwKLSJ7BPbMAgfcf5st4JP\nS/5b5s52yKRrk3TTE7m3iIiIiBwsCm0iE9DsD7lebrLSaDPwA4IowjVN0jGX06Usx6fSxBybQXf0\nUv0PMvACbMuc2MqdiIiIiBxcCm0ie7DR7PLRVo31ZvdekZF8ysI0DPwwpNUf8o+3NnnPLlNMxegO\nPZr9Idm4O/aYURSx2epyfEorayIiIiJPA4U2kTFEUcQnW3UurJZJug5nZvJMpxMPLPbR83zWGx3W\nG216Q5+Veovn5opjj13tDvCCgDOl3F4+goiIiIgcEurTJjKGjz8NbAtTaV48Ns1s5uHVGROOzalS\njpePz+LaNqu1nfA2jiiKWKu3KSbjFJKxvXwEERERETkkFNpERrRSb/PeapnFqQwnizmMXZbSjzs2\nryzP4och761s4QfByGPfqbVo9gc8P5ff9bgiIiIicrgptImMIIoiPlyvko3HWC5kRr4+FXN4eWmW\nWrfP+6tlvCDc9bi3q03u1lp8baHIMZ1nExEREXlqKLSJjKDc6VPtDVjMp8de6ZrNJpnLpqh0+rx3\nd4uVegv/IeEtiiIqnR5/WCuzUmvx4kKR52bze/kIIiIiInLIqBCJyAiulRu4tkU+sbfzZCeKWTqD\nIYWEy91qi9uVJqV0kkzcwTZNgjBi4Adstbp4QUApFeelU/NaYRMRERF5Cim0iexSGEXcqbdZyGX2\nfJ6skIwTc2yKqTjfPjnHzUqT65UmlU4PPm3A7VoWx6dSnCnlKKbik/gIIiIiInIIKbSJ7JIXhIRh\nRMLd+/82hmEQt236fkDCsXlursBzcwWiKMIPQyzDxFTjbBERERFhAqGt2Wxy584dyuUyjUaDwWCw\n84U0HieRSFAqlVhaWqJYHL8vlchB4IchEWBNqGqjaRhfOMtmGAaOZU3k/iIiIiJyNIwd2tbX17l4\n8SJhGDI7O8vJkyeJxWLEYjEMw2AwGDAYDKhUKvz+979nMBhw/vx5Tp06Ncn5i+wbxzQxgCDcXcXH\nRwmiEMdSLSARERER+XIjh7Yoinj77bexLIvvfe97JJPJL33/8ePHARgMBnz00Uf8/Oc/5zvf+Q6O\n44w3Y5EnxLFMbMukPfSY3uO9wiii7wUkJ7DVUkRERESOtpG/Mb7//vucO3eOQqEw0nWxWIwXX3yR\ndrvNhQsX+MY3vjHq0CKPVbM/5HqlSa07YBgEGIZBzDKZSSc5VcwQd2xOFbJcqzRZLmQx97BNcrvV\nIwhDlvOj93oTERERkafLyKHtxRdf3NOA6XRagU0OjCiKWG10uFpusN7sYpkm2biLY1lEQMcLuLhW\n5g/rFY7n08xlkgTbIeV2j5nMl68yf5n1Rpv5bJJs3J3chxERERGRI0l7s+SpFYQR79zd4nqlSSrm\ncGYmz3Q68YUVNC8I2Wx2WG12uFVtEbNNVuptSg94727UuwPaA4+vHytN6qOIiIiIyBGm0CZPpTCK\n+M3Nde42OpyZyTP7JatmjmVyLJ9hcSrNrWqTW5UmRD5XNms8M5sfqWdbd+jx8UaV+WyShVxq0HoO\ncwAAIABJREFUEh9FRERERI44hTZ5Kr23UuZOo8Nzc0UKu2xcbRgGJ4s5LMPk2naNjWaHMIr4ymxh\nVz3Vmr0Bl9ar5OIOr52c
29OZOBERERF5eii0yVOn2R9yZbvOyWJu14Ht85byaTpDj2qnR7M/4He3\nN5jLppjPJok59/8vFUURtd6A9XqbWm/AXDrBa6fmcW31YhMRERGR3VFok6fOjUoT0zSZz463PdEw\nDI4XMlTaPV6YL9LzfK5XmqzUW+TiMWK2hWnuNM5uDYYM/YBCMsY3js9wspDBMtWbTURERER2b+TQ\n1u/36XQ65PN5zE+/fNZqNXK53L1fixxUfhByvdxkJpPc1ZbGh0m5Dpm4y1qzww/OHuP8fJHbtRar\njQ4DP2DoRbi2yVIuxelilmIqPtLZNxERERGRz4wU2m7evMlbb72FZVm4rssPfvADZmZmCIKA//Af\n/gP/9t/+28c1T5GJWG926fsB8xMoAjKfS3F1q0Z74JGOOZwp5ThTyk1gliIiIiIi/2Tk0Pav//W/\nJpFIUK1Weeedd3jppZeYnp7WKoIcCh3PwzINEs7edwanYw5RBF3PJx1zJjA7EREREZEvGmk/4+Li\nIolEAoBCocCf//mfc/fuXarV6mOZnMik+UE0sTNl9qf38YJwIvcTEREREXmQkb69GobBysoKf//3\nf0+/38cwDF5++WUqlQpBEDyuOYpMjG0aBOFkQlYQRvfuKSIiIiLyuIy0R+zcuXNsbm5y6tQpYrHY\nvdfPnj1LMvnw5sQiB0XMtgjCiKEf7Lnsfs/zMQyI2yrCKiIiIiKPz8j7xGZnZzl37twXzrAtLi5O\nbFIij8tCLoVjmWw0O3u+10azQz4RIxvXeTYREREReXwmVqP/2rVr/N//+3/p9/v3XqvX65O6/X0u\nXrzIT3/6U37zm988lvvL0RWzLZbzaTaaXaIoGvs+A8+n2u1zdjqnIjwiIiIi8lhNbF/X9vY2nU6H\nTqdDPB4HoNVq8cknn/DCCy/cK2CyV1tbW3z00UcUi0V9WZb7+EHI7Vqb27UWXc/HC0Ic0yThWizn\nMyznMziWyZlSjuuVJtvtHjOZ8bb1rjY6uJbJcj494U8hIiIiInK/iYU2x3H44Q9/iP258z1LS0ss\nLCzw7rvv8uqrr+55DM/zeOutt3j99de5cOHCnu8nR0Nn6PHJVp0blRaDIGAqESMVc7EMgyCK6A49\n3r6zxcXVMqeKWc5NT3Esl+Ladp2U65AasVz/drvHWr3N1xaKONbezsWJiIiIiDzKxELbuXPn+C//\n5b+wvLzM3Nwcc3NzJBIJLMua2IrYr371K44fP87i4qJCmwBQ7vT5hxvreH7IbDbJfC5F/AE92Pqe\nz3qzw9Vyk1vVFt9cnqXnBfxhrcxz80WycXdX4202O1zbrnOqmOH5ufykP46IiIiIyBdMLLT9/ve/\n58SJE/T7fX7/+99Tr9fJZDIkEglKpdKe73/t2jUqlQr/8l/+y129v1wu73lM2btarXbfPyep3hvw\n+ztbOLbFuVIO2/QZthoMH/L+ogG5jMONcoOff3iFFxeKBJ0+F6/cJJ+MMZ1JkHxA4IuAZn9Iud2l\n2fc4PpXidNKiUqlM/DPJeB7ncyYCesZkf+g5k/2g5+xg2W1OMqK9VGP4nPfff5+vfe1r937d6/VY\nX1/nzp07nD9/nmKxOPa92+02f/M3f8MPf/hDCoUCAH/7t39LsVjk29/+9gOv+elPfzr2eCIiIiIi\nIo/bT37yk129b2Kh7eLFi5w+fZpMJnPf677v89577/HKK6+Mfe9bt27xv//3/75vm2UURRiGgWEY\n/Lt/9+++sAVTK20HQ61W46233uKNN94gn5/cdsKPt2rcqbZ5dr6AbY5eBNUPIz7aqLA0lebZ2Txh\nFLHd7nG33qbZ9/CDEAxwLZNCMsbSVJqpREzFbw6ox/WciXxGz5jsBz1nsh/0nB0su11pm9j2yBde\neIFLly5hWRbPPfccAKurq/z93/89y8vLe7r34uIi/+pf/at7v46iiF/84hdMTU3x4osvPvCL9CS2\nZMrk5PP5if038YOQzdUm8/NzFEq5se+zaDhstLp8M1/AsUxmpuH5icxQnpRJPmciD6JnTPaDnjPZ\nD3rODpeJhTbTNDl//vx9r83Pz/Otb32Lubm5Pd3bcZwv/E2AbdvEYjH9DcFT6E69Td8PmM+l9nSf\n+WyKlfpOi4Azewh/IiIiIiKP08j7yjY2NnZ/c9Pkueeeu3cO7TOrq6ujDvtA2qr2dLpdbZFLxEg8\noGjIKGKOTT4R43a1NaGZiYiIiIhM3sihLYoi3n77bYbDh9Xoezjf93n77bfp9XojX/vHfvSjH/Gt\nb31rz/eRw6fr+STdySwSJ1yHrudP5F4iIiIiIo/DyN985+fnSaVS/PKXvyQWi3Hu3DlKpRLmQ4pB\nRFFEpVLhxo0bVCoVvv71rzMzM7PnicvTyw8jrDGKjzyIZRp4QTiRe4mIiIiIPA5jLVdks1l+8IMf\nsLW1xaVLl9jY2CAejxOPx3HdnSbFw+GQfr9Pr9djenqaZ555hldffXWik5enk20aBOFkglYQRjjW\nZAKgiIiIiMjjsKc9ZjMzM/dWzRqNBp1Oh36/TxRFxONxkskkU1NTOnsmE5V0bDrDyWxp7A29BzbU\nFhERERE5KCb2bTWXy5HLqQKfPH7LhQz/eGuTnufvqRjJwPOp9QZ847i264qIiIjIwaV9YXLoHJ9K\nE7ct1hudPd1nvdnBtUyW85lHv1lERERE5AlRaJNDx7ZMTpWybDW7eEEw1j28IGSj2eV0MaszbSIi\nIiJyoOnbqhxKz0zncG2TS+uVkYuShFHE5fUKjmlwbnrqMc1QRERERGQy9qUCw7Vr1/A8j9OnTzMc\nDul0OszOzu7H0HLIBGHEWqPDVrvHMAgII3Atk1zC5UQ+g2tbACRdh9dPzfPm1VX+sFbm+fkijmU9\n8v5eEHJ5vULf83njzALpmPO4P5KIiIiIyJ7sS2izbZtTp05x48YNzpw5w/b29n4MK4dId+hzo9Lk\nWrlBZ+iTcG0c08QwDPww5Eq5wcXVCicKac6UpigkYxRTcb5/dpFf3Fjn3dtbzGSTLORSxB9QnKTv\n+aw3O2w2uzimwffPLFBKJ57AJxURERERGc2+hLaNjQ0WFhaIx+MAWLtYEZGnx2qjw29ubuCHEdOZ\nBOdmC6T+aAVs6AdsNDvcqrW5Vm5yfr7AV+cKFFNx/vkzx7iy1eB6pclavc1UMkbCtbENEz8K6Q19\n6r0BMcvkbCnLuekprbCJiIiIyKGxL6HtzJkz/Nf/+l/JZrNUKhX6/T7Hjx/fj6HlgLtVbfGPtzeZ\nSsQ4N5PHfkhRENe2OF7IspTPsFJv88FalZ7n88rSDCnX4aVjJc7PF7hTb3Or2qI38BgGIY5lknRs\nvnJ8huP5jIqOiIiIiMihsy+hrVQq8Vd/9VfcuHED27Z5/vnn92NYOeA2W11+e3uTYirBuZndNWE3\nDIOlT8PX1a06Kdfh+bkC8GlVyWKWU8Xs4566iIiIiMi+mVhoC8MQwzAe+sU7mUzy1
a9+dVLDySEX\nRRHv3N0m6Tq7DmyfN5dN0fcCPlircqKQIeVqu6OIiIiIHE0T2yv2s5/9jNu3bz/wZ/V6nb/927/l\n7/7u7xgMBpMaUg6xrXaPRn/IcjE7cmD7zFI+DQbcqDQnPDsRERERkYNjYqFteXmZbDbLBx98QLVa\nve9nv/zlLzl58iTf/OY3ef/99yc1pBxiV7cbxB2bXNwd+x6WaTKTSXKt3CQIownOTkRERETk4JhY\naDMMg//xP/4HH3zwAf/9v/93yuXyvZ9tbm5y+vRpCoXC2KsqcnT0PZ+VRof5bGrPz8N8LkV3uFPO\nX0RERETkKJrYmbZOp8Nf//VfY9s2rVaLS5cuUSqVGAwGhGF4r9y/be9L7RM5wDpDnzCKyCbGX2X7\nTMp1sEyD1sCbwMxERERERA6eia20JZPJe4Esk8kQi8UACIIA4N6KShRpG9vTzgtCoghsczKPn22a\neEE4kXuJiIiIiBw0E1v2ajQavPvuuySTSba2tiiVSsDO1kiAbrdLMpmk3W5Pakg5pExzJ8CHEwrw\nIRGWqW23IiIiInI0TSy0vfrqq7z55ptUKhXOnj2LaZr8wz/8A2EY8pd/+Ze8+eabxONxlpeXJzWk\nHFIxy8IwoO8HJPdYqj8IQ/wgxLWsCc1ORERERORgmVhoS6VS/OhHP7rvtWefffbev5umSbPZ5OzZ\ns5MaUg6pbNwhG3fZbHYpJON7utdWq4eBwXw2OaHZiYiIiIgcLBOtCtLv97l48SJra2tEUcTs7Cwv\nvfQSqVSKubk55ubmJjmcHFKGYXC2lOPdlW2GfoBrj7dKFkUR640Oi7kk6Ziaa4uIiIjI0TSxQiTN\nZpO/+Zu/odFoMD09TbFYpFar8d/+23+j0WhMahg5Ik4UMtimuadS/c3+kK7ncXY6N8GZiYiIiIgc\nLBNbabt48SI/+tGPSKfT973eaDR4//33ef311yc1lBwBMdviTCnLx1t1cvEYU8nYSNcP/YBPNmuU\nknHmMtoaKSIiIiJH18RW2tLp9BcCG0Aul8NxtHVNvuhrCyXms0kub1Sodfu7vq7v+XywWsa1DL57\nak4N20VERETkSJtYaPN9/6E/U282eRDLNHjt5DwLmSSX1itc267THT68SbYXBNyttbi4so1rGbxx\nZnHP1SdFRERERA66iW2PNE2T999/n2effRbXdYmiiG63y9WrV3Fdd1LDyBHjWCbfPTXP5c0aV8oN\nNhodsgmX6XQSxzIxDQMvDKl1+lQ6fSwTTuQzvLBQJOFMtI6OiIiIiMiBNLFvvS+//DK//vWv+ff/\n/t9jWRZBEBBFEadOneL73//+pIaRQ6I79GgPfLZbXQDqvQGFMLrXWPvzTNPgq/MFnp3Nc7fe5lq5\nwc1ync/WZw0gHXN4abHIyUKWuKOebCIiIiLy9JjoStt3v/tdnn/+eVZXV4miiLm5OWZmZqhUKhSL\nxUkNJQdUFEWsN7tcKzdYbXQJo4ig2wLgH29t8mF9yJlSjlPFLEn3i4+eZRqcKGQ4UcgQhCHDICSK\nIhzLwjYNnV0TERERkafSxPeXFQoFCoXCfa/99re/5Yc//OGkh5IDZL3Z5Z27WzT7HsmYw8lSjlzC\npddweecjODM9Rdty+cN6lQ83qpwsZPj60jS2+eBjlZZpknjIz0REREREniZjhbaVlRX+7u/+btfv\n1wrJ0Xaz0uTtO1ukYy4vHJsmG/+nM4y+tRO80jGHhUKeU8Ucm60uNypNWgOP756aJzZmc20RERER\nkafBWKEtHo+ztLTEd77znUcGsiiKePPNN8eanBx8K/U2v729RSmT4Oz01COfB9syWZxKk4m7XF6r\n8Kub6/zp6QUsraqJiIiIiDzQWKEtm83y9a9/nWw2u6v3/8mf/Mk4w8gB1x36/ObWJlPJ2K4C2+dl\n4y7PzRf4w1qFP6xXeXGx9BhnKiIiIiJyeI21vOG6LjMzM7t+/7Fjx8YZRg64G5UmQRhxbiY/1hbY\nbCLGwlSKa+UmXhA+hhmKiIiIiBx+2pMmYwnDiGvlBqVMAtsa/zGaz6YYBAF3au0Jzk5ERERE5OhQ\naJOxrDY7dIY+C7nUnu4Td2zyiRjXyo0JzUxERERE5GhRaJOxbLd7xB2bdMx99JsfoZROUO328YJg\nAjMTERERETlaFNpkLMMgxNnDtsjPc2yLCBj6OtcmIiIiIvLHFNpkPNHkbvVZCZMJ3lJERERE5MhQ\naJOxuLaJH05mZcwLQgzAndDKnYiIiIjIUaJvyTKWXDxGz/Ppe/6e71Xt9knHnIlttxQREREROUr0\nLVnGcjyfJmZZrDc7e7qPFwRU2j3OlHJj9XoTERERETnqFNpkLI5lcqqYYbPZJYzGP4222eximQYn\nC5kJzk5ERERE5OhQaJOxnSnlCMOQ29XmWNf3PZ+VepvlqTRxx57w7EREREREjgaFNhlbNu7y0mKJ\n1Xqb1Xp7pGsHfsCHaxVSjsWLi6XHNEMRERERkcPvUCxvXL58mcuXL9NqtQAoFAq8/PLLLC0tPeGZ\nyTMzU/T9gMubNXqez3Ihg2NZD31/FEXUewOubNWIWRavn17QKpuIiIiIyJc4FN+WU6kU3/jGN8jl\nckRRxJUrV/hf/+t/8eMf/5hCofCkp/dUMwyDF/9/e3f23NZ533/8c7CvxEKABEmRojZ6kVfZkm1Z\nsSXHTmq3SlJ30vaiveqM/478HbroRZretTPpOEmbdPqTnJ9Z2ZJFUbJExTJFWxsJiuACYt/O6YUq\n1LQkmyIB4oB6v240OHjw4AviOzY/fM55zlBCAbdL529nNL9aUG/Ir8FISGGvuzmublq6vZLXXLag\nSr2uRNCnI7sGFPB0RQsCAAAAHdMVvzHv3LlzzeODBw9qampKCwsLhDabGOuLaiQW0ldLOU1nsrp4\na0GGIal097TJz2cz8oR6NBwNam8ior6Qn90iAQAAgHXoitD2TaZpamZmRo1GQ6lUqtPl4Bt8bpee\n6o/pyb6o5nMl5So1LS06dEHS8wNx7RvZwcoaAAAA8Ii65jfopaUl/frXv1aj0ZDL5dLbb7+tSCTy\n0PGZTGYLq8O3uSTFDElOU5IUsGoqrq6o2NGqsF0tLy+v+RdoNXoMW4E+w1agz+wlkVjfhnyGZW3i\nJltbyDRN5fN5VatVzczM6PLlyzp+/PhDP+iJEye2uEIAAAAAWL8PPvhgXeO6JrR9229/+1uFw2G9\n8cYbD3yelTZ7WF5e1smTJ3Xs2DHFYrFOl4Ntij5Du9Fj2Ar0GbYCfWYv611p65rTI7/Nsix9V95c\n7w8AD9YwLWUKJZXrDTVMS26nQ0GPSzG/d0MbiMRiMb4TtB19hnajx7AV6DNsBfqsu3RFaDtz5oyG\nh4cVCoVUq9U0PT2tubk5vfjii50ubdvJV2qaWVzV9OKqStW6JMmSZEgyDCnm92pfMqKdsdB33o8N\nAAAAQGt0RWgrlUo6deqUisWiPB6P4vG4
3nvvPQ0NDXW6tG2jbpo6d3NBM4s5GYaUDAf0RH9cfrdT\nDsNQw7SUq1Q1ly3o0xt3dP52Ri8MJrQv+fDNYAAAAABsXleEtjfffLPTJWxrtUZDf7w2p/l8Sbt6\nI+rvCcjpcKwZ43IaigV8igV8qtTqur6c09mbd1Ss1vTcYC/3XAMAAADapCtCG9qnYVr6+Ku07hTK\nenYoqR6f53tf43W7NNYXU8Dj0qX0sjwup57q50JWAAAAoB0IbY+5P91Z1txqUfsHEusKbN+0IxpW\nvWFp8nZGfSG/eoO+NlUJAAAAPL4c3z8E25VpWvpyIau+cEDRgHdDc+yMh+VyOjWdyba4OgAAAAAS\noe2xdjtbUKFa10AktOE5DMPQQCSo68t5VeqNFlYHAAAAQCK0PdamM1mFfR6FvO5NzZPqCahuWvpq\ncbVFlQEAAAC4h9D2GMsUyi25Ds3tdCrsc2uxWGlBVQAAAAC+idD2mGqYluqmKZezNS3gcjpU5fRI\nAAAAoOUIbWgJQ4asThcBAAAAbEOEtseU02HI6XCo3jBbMl+9YcrrcrZkLgAAAAD/h9D2GEsEfVos\nlDc9T63R0Gq5ovgGbxsAAAAA4OEIbY+xvYke5cpVFSq1Tc2TXi3K6TC0Kx5uUWUAAAAA7iG0PcaG\nIiEFPC7NZgsbnsOyLKWzBe2MhuRzu1pYHQAAAACJ0PZYczoM7UtEdCdXVLa0se36by7nVG00tDcZ\nbXF1AAAAACRC22Pvqf6YUmG/rswtKVeuPtJrZ7N53VjK6fnBXiVacL83AAAAAPcjtD3mnA5Dr+9K\nKR7w6PPbGc1lCzLN7968v1pvaHphRTOZrJ7qj+rp/tgWVQsAAAA8frgICfK6nDq2d0hnbtzRTGZF\n1xdX1dcTUKonIJ/bJYdhqGGaylVqms3mtVQoy+1w6OUdSY0lIzIMo9MfAQAAANi2CG2QJLmcDh3e\nldIzA3FdW1zVTGZVsyt5WZIMQ7IsyWFIEZ9HB3ckNRoPy8N92QAAAIC2I7RhjR6fRy8OJfRsKq47\n+ZIq9YbqpiW306Ggx6VE0MfKGgAAALCFCG14IJfTocFIsNNlAAAAAI89NiIBAAAAABsjtAEAAACA\njRHaAAAAAMDGCG0AAAAAYGOENgAAAACwMUIbAAAAANgYoQ0AAAAAbIzQBgAAAAA2RmgDAAAAABtz\ndboAtE/DNJWv1FVtNCRJHqdTIa9bTofR4coAAAAArBehbRvKlau6triqa4urqtQbsixJhmRI8rld\n2tvboz29PQp63Z0uFQAAAMD3ILRtI8VqXWdv3tFstiDD4VB/OKDeoE8up0OypFrDVKZQ0tT8si6n\nlzUcDerl4aR8btoAAAAAsCt+W98msqWKTl2bU7VhancyqmTIL6fj/ksWowGvdvX26E6upOtLq1q5\nektv7hlU2OfpQNUAAAAAvg8bkWwDhWpNJ6dn1bAsPb8jqVRP8IGB7R6nw6GBSFAv7Eiq0rB06tqs\nSrX6FlYMAAAAYL0IbV3OsiyNf5VWzbT07GBCXpdz3a/1uV16ZrBXpVpDp7+eb2OVAAAAADaK0Nbl\nFgplZQpl7euLyvMIge0en9ul3cmo0rmilouVNlQIAAAAYDMIbV1ueiErr8ulqN+74TkS/7tZybXF\nbAsrAwAAANAKhLYuVqrVdWMlr1QkIMPY+L3XDMNQKhzUV0s5VeuNFlYIAAAAYLMIbV0sky+rYVrq\nCwU2PVdfOKBq3dQip0gCAAAAtkJo62LVxt1VMbdz81+jx+VYMycAAAAAeyC0dTHTkrTxsyLXMAxD\nlu7uRgkAAADAPghtXcztdMiyJLMFQaveMGVIcjsffQdKAAAAAO1DaOtiEZ9HDkMt2ap/qViWw7g7\nJwAAAAD7ILR1sVjAq2TQr7lsYdNzzWULGugJKOR1t6AyAAAAAK1CaOty+5IRZUsVFau1Dc+RK1dV\nqNS0NxFpYWUAAAAAWoHQ1uWGo0H5PS59lcluaBMR07L0VSarkNetwUiwDRUCAAAA2AxXpwtYr/Pn\nz+vrr7/WysqKXC6X+vv7dejQIUWj0U6X1lFOh0OvjPTpjzNzml5Y0d5kdN032rYsS1fnl1Ws1nR0\n76Acm7hBNwAAAID26JqVtnQ6rf379+tnP/uZ3nvvPZmmqd/97neq1+udLq3jhiJBvTLSp4VcSVfS\nS6rWv/9ea5V6Q5fnFrVcLOvV0X71hzd/g24AAAAArdc1K23vvvvumsdHjx7VL3/5S2UyGaVSqQ5V\nZR+7e3vkcTp0+vq8zlxPqzfo12AkqB6fp7nyZlmWVkoVzWULWiqW5XM5dXTPoFI9BDYAAADArrom\ntH1bpXJ3m3uv19vhSuxjRzSkn4T8+noppy8zWV2azchhGHI57y6o1humTMtS1O/VoeE+jcZD3JcN\nAAAAsLmuDG2WZen06dMaGBhQLBZ74JhMJrPFVdlHr0OKJwNaLDiUq9RUazQkw5Db61DE71HM75Vh\n1JRdXm57Lcv/+x7LW/BeeHzRZ2g3egxbgT7DVqDP7CWRSKxrnGFtZMvBDvv444918+ZN/eQnP1Ew\n+OAdD0+cOLHFVQEAAADA+n3wwQfrGtd1oW18fFzXr1/X8ePHFQ6HHzrucV5ps5Pl5WWdPHlSx44d\ne+iqKLBZ9BnajR7DVqDPsBXoM3tZ70pb15weaVnWugObtP4fALZGLBbjO0Hb0WdoN3oMW4E+w1ag\nz7pL12z5Pz4+runpab311ltyuVwqFosqFots+Q8AAABgW+ualbapqSkZhqEPP/xwzfGjR49qbGys\nQ1UBAAAAQHt1TWhb70V6AAAAALCddM3pkQAAAADwOCK0AQAAAICNEdoAAAAAwMYIbQAAAABgY4Q2\nAAAAALAxQhsAAAAA2BihDQAAAABsjNAGAAAAADZGaAMAAAAAGyO0AQAAAICNEdoAAAAAwMYIbQAA\nAABgY4Q2AAAAALAxQhsAAAAA2BihDQAAAABsjNAGAAAAADZGaAMAAAAAGyO0AQAAAICNuTpdAB6s\n1mjo+nJeXy3mVKjWVDNNuRwO+VxOjcRC2t3bI7+brw8AAADY7vit32YK1Zr+NL+imaVVVRumYgGf\nYkG/XA5DDctSqVrXxdklfT63pJFoSE/1xxQLeDtdNgAAAIA2IbTZyGKhrD/OzKnaMJXqCWqgJyDv\nA1bT6g1T87miZrN53coW9Npov4ajoQ5UDAAAAKDduKbNJpaKFf2/6dsyDEMHhvs02tvzwMAmSS6n\nQ0PRkF4a6VfY79HHM2ndXMlvccUAAAAAtgKhzQbKtbr+eG1WbqdTzw0l5HE51/U6h2Hoqf64YkGf\nTn89r6Vipc2VAgAAANhqhDYbmM6sqlRraP9Ar5yOR/tKDMPQE/0xOR0OXZlfalOFAAAAADqF0NZh\nDdPSdCarZNi/7hW2b3MYhgYjQd1cKahYrbe4QgAAAACdRGjrsNlsQYVqXYORzW0k0tcTkGVJM4ur\
[... base64-encoded PNG image data omitted; it renders the scatter plot produced by the matplotlib test cell below ...]\n", 197 | "text/plain": [ 198 | "" 199 | ] 200 | }, 201 | "metadata": {}, 202 | "output_type": "display_data" 203 | } 204 | ], 205 | "source": [ 206 | "# Check matplotlib plotting\n", 207 | "import matplotlib.pyplot as plt\n", 208 | "import matplotlib.cm as cm\n", 209 | "from math import log\n", 210 | "\n", 211 | "# function for generating plot layout\n", 212 | "def preparePlot(xticks, yticks, figsize=(10.5, 6), hideLabels=False, gridColor='#999999', gridWidth=1.0):\n", 213 | " plt.close()\n", 214 | " fig, ax = plt.subplots(figsize=figsize, facecolor='white', edgecolor='white')\n", 215 | " ax.axes.tick_params(labelcolor='#999999', labelsize='10')\n", 216 | " for axis, ticks in [(ax.get_xaxis(), xticks), (ax.get_yaxis(), yticks)]:\n", 217 | " axis.set_ticks_position('none')\n", 218 | " axis.set_ticks(ticks)\n", 219 | " axis.label.set_color('#999999')\n", 220 | " if hideLabels: axis.set_ticklabels([])\n", 221 | " plt.grid(color=gridColor, linewidth=gridWidth, linestyle='-')\n", 222 | " map(lambda position: ax.spines[position].set_visible(False), ['bottom', 'top', 'left', 'right'])\n", 223 | " return fig, ax\n", 224 | "\n", 225 | "# generate layout and plot data\n", 226 | "x = range(1, 50)\n", 227 | "y = [log(x1 ** 2) for x1 in x]\n", 228 | "fig, ax = preparePlot(range(5, 60, 10), range(0, 12, 1))\n", 229 | "plt.scatter(x, y, s=14**2, c='#d6ebf2', edgecolors='#8cbfd0', alpha=0.75)\n", 230 | "ax.set_xlabel(r'$range(1, 50)$'), ax.set_ylabel(r'$\log_e(x^2)$')\n", 231 | "pass" 232 | ] 233 | }, 234 | { 235 | "cell_type": "markdown", 236 | "metadata": {}, 237 | "source": [ 238 | "### ** Part 4: Check MathJax Formulas **" 239 | ] 240 | }, 241 | { 242 | "cell_type": "markdown", 243 | "metadata": {}, 244 | "source": [ 245 | "#### ** (4a) Gradient descent formula **\n", 246 | "#### You should see a formula on the line below this one: $$ \scriptsize \mathbf{w}_{i+1} = \mathbf{w}_i - \alpha_i \sum_j (\mathbf{w}_i^\top\mathbf{x}_j - y_j) \mathbf{x}_j \,.$$\n", 247 | " \n", 248 | "#### This formula is included inline with the text and is $ \scriptsize (\mathbf{w}^\top \mathbf{x} - y) \mathbf{x} $." 249 | ] 250 | }, 251 | { 252 | "cell_type": "markdown", 253 | "metadata": {}, 254 | "source": [ 255 | "#### ** (4b) Log loss formula **\n", 256 | "#### This formula shows log loss for a single point. 
t165dKpVKmp6elnT3GrJvrrwtLCwoGo3K4/E0rx375ntJd0PTvaAWCARULpc3/Xnv\nrYp9M3zd23jk3mpjT0+PIpGIVlZWmmNWVlbUaDQ0NDS0Zr5KpSK/37/pugAA9kNoAwDYwr1TFL+t\nr69PhmHINE1J0qVLl5TP5xWJRDQ7O9u8ri2dTq9ZjSuXy8rlchoaGtLMzEwz5AwMDDQDWKPR0Nmz\nZ5vvG4vFFI1Gtby83Jwnl8tpcnJSTqdTkhSNRlUqlR56mqRlWQ987osvvtA///M/q1QqSbp7KuQT\nTzyxpubp6WmNjY0pHo83j+3bt09ffvnlmnlGR0fvWwlcXFx86G0IAADdjdMjAQC2UCgU1ly3dk8o\nFNKrr76q8fHx5oYhL774oi5evKhkMtkMU99+vc/n065duzQ1NaVEItFchXruuef00Ucf6fz58zIM\nQ7VarRmcDMPQO++8o4mJCcViMVmWJa/Xq1dffbV5yqJhGEomk/eFpJs3b+pPf/qT0ul0cwfKaDSq\nN998sznGNM01190999xzmpiYUL1eV61WUzQaXbMxiSS98MILOnPmjMbHx+X1elUqlXT06NH7fk7z\n8/M6ePDgBn7yAAC7M6xWXlENAEAXqVar+tWvfqW//du/ve+UyO9y48YNpdNpHTp06JHer1wua2Fh\nQcPDw49a6neqVCr6j//4D/30pz9t6bwAAHvg9EgAwGPj0qVL+uijj5qPL168qLGxsUcKbJI0MjKi\nxcVF1ev1R3rdrVu3HngK6GZduXJFzz77bMvnBQDYA6ENAPDYuHca4oULF3T69Gm5XC4dOXJkQ3O9\n9NJLOnfu3LrH53I5OZ3O5rb+rZLP57WystK8rxwAYPvh9EgAADZoenpafr//vp0ct9Knn36qAwcO\ntDwMAgDsg9AGAAAAADbG6ZEAAAAAYGOENgAAAACwMUIbAAAAANgYoQ0AAAAAbIzQBgAAAAA2RmgD\nAAAAABsjtAEAAACAjRHaAAAAAMDGCG0AAAAAYGOENgAAAACwMUIbAAAAANjY/wButjzeDQGnxwAA\nAABJRU5ErkJggg==\n", 197 | "text/plain": [ 198 | "" 199 | ] 200 | }, 201 | "metadata": {}, 202 | "output_type": "display_data" 203 | } 204 | ], 205 | "source": [ 206 | "# Check matplotlib plotting\n", 207 | "import matplotlib.pyplot as plt\n", 208 | "import matplotlib.cm as cm\n", 209 | "from math import log\n", 210 | "\n", 211 | "# function for generating plot layout\n", 212 | "def preparePlot(xticks, yticks, figsize=(10.5, 6), hideLabels=False, gridColor='#999999', gridWidth=1.0):\n", 213 | " plt.close()\n", 214 | " fig, ax = plt.subplots(figsize=figsize, facecolor='white', edgecolor='white')\n", 215 | " ax.axes.tick_params(labelcolor='#999999', labelsize='10')\n", 216 | " for axis, ticks in [(ax.get_xaxis(), xticks), (ax.get_yaxis(), yticks)]:\n", 217 | " axis.set_ticks_position('none')\n", 218 | " axis.set_ticks(ticks)\n", 219 | " axis.label.set_color('#999999')\n", 220 | " if hideLabels: axis.set_ticklabels([])\n", 221 | " plt.grid(color=gridColor, linewidth=gridWidth, linestyle='-')\n", 222 | " map(lambda position: ax.spines[position].set_visible(False), ['bottom', 'top', 'left', 'right'])\n", 223 | " return fig, ax\n", 224 | "\n", 225 | "# generate layout and plot data\n", 226 | "x = range(1, 50)\n", 227 | "y = [log(x1 ** 2) for x1 in x]\n", 228 | "fig, ax = preparePlot(range(5, 60, 10), range(0, 12, 1))\n", 229 | "plt.scatter(x, y, s=14**2, c='#d6ebf2', edgecolors='#8cbfd0', alpha=0.75)\n", 230 | "ax.set_xlabel(r'$range(1, 50)$'), ax.set_ylabel(r'$\\log_e(x^2)$')\n", 231 | "pass" 232 | ] 233 | }, 234 | { 235 | "cell_type": "markdown", 236 | "metadata": {}, 237 | "source": [ 238 | "### ** Part 4: Check MathJax Formulas **" 239 | ] 240 | }, 241 | { 242 | "cell_type": "markdown", 243 | "metadata": {}, 244 | "source": [ 245 | "#### ** (4a) Gradient descent formula **\n", 246 | "#### You should see a formula on the line below this one: $$ \\scriptsize \\mathbf{w}_{i+1} = \\mathbf{w}_i - \\alpha_i \\sum_j (\\mathbf{w}_i^\\top\\mathbf{x}_j - y_j) \\mathbf{x}_j \\,.$$\n", 247 | " \n", 248 | "#### This formula is included inline with the text and is $ \\scriptsize (\\mathbf{w}^\\top \\mathbf{x} - y) \\mathbf{x} $." 249 | ] 250 | }, 251 | { 252 | "cell_type": "markdown", 253 | "metadata": {}, 254 | "source": [ 255 | "#### ** (4b) Log loss formula **\n", 256 | "#### This formula shows log loss for single point. 
Log loss is defined as: $$ \\begin{align} \\scriptsize \\ell_{log}(p, y) = \\begin{cases} -\\log (p) & \\text{if } y = 1 \\\\\\ -\\log(1-p) & \\text{if } y = 0 \\end{cases} \\end{align} $$" 257 | ] 258 | }, 259 | { 260 | "cell_type": "markdown", 261 | "metadata": {}, 262 | "source": [ 263 | "### ** Part 5: Export / download and submit **" 264 | ] 265 | }, 266 | { 267 | "cell_type": "markdown", 268 | "metadata": {}, 269 | "source": [ 270 | "#### ** (5a) Time to submit **" 271 | ] 272 | }, 273 | { 274 | "cell_type": "markdown", 275 | "metadata": {}, 276 | "source": [ 277 | "#### You have completed the lab. To submit the lab for grading you will need to download it from your IPython Notebook environment. You can do this by clicking on \"File\", then hovering your mouse over \"Download as\", and then clicking on \"Python (.py)\". This will export your IPython Notebook as a .py file to your computer.\n", 278 | "#### To upload this file to the course autograder, go to the edX website and find the page for submitting this assignment. Click \"Choose file\", then navigate to and click on the downloaded .py file. Now click the \"Open\" button and then the \"Check\" button. Your submission will be graded shortly and will be available on the page where you submitted. Note that when submission volumes are high, it may take as long as an hour to receive results." 279 | ] 280 | } 281 | ], 282 | "metadata": { 283 | "kernelspec": { 284 | "display_name": "Python 2", 285 | "language": "python", 286 | "name": "python2" 287 | }, 288 | "language_info": { 289 | "codemirror_mode": { 290 | "name": "ipython", 291 | "version": 2 292 | }, 293 | "file_extension": ".py", 294 | "mimetype": "text/x-python", 295 | "name": "python", 296 | "nbconvert_exporter": "python", 297 | "pygments_lexer": "ipython2", 298 | "version": "2.7.6" 299 | } 300 | }, 301 | "nbformat": 4, 302 | "nbformat_minor": 0 303 | } 304 | -------------------------------------------------------------------------------- /Week 2 - Introduction to Apache Spark/Week2Lec3.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/dipanjanS/BerkeleyX-CS100.1x-Big-Data-with-Apache-Spark/d0dbc626fc55c2717c54e38353848668bb25baad/Week 2 - Introduction to Apache Spark/Week2Lec3.pdf -------------------------------------------------------------------------------- /Week 2 - Introduction to Apache Spark/Week2Lec4.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/dipanjanS/BerkeleyX-CS100.1x-Big-Data-with-Apache-Spark/d0dbc626fc55c2717c54e38353848668bb25baad/Week 2 - Introduction to Apache Spark/Week2Lec4.pdf -------------------------------------------------------------------------------- /Week 2 - Introduction to Apache Spark/lab1_word_count_student.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "#![Spark Logo](http://spark-mooc.github.io/web-assets/images/ta_Spark-logo-small.png) + ![Python Logo](http://spark-mooc.github.io/web-assets/images/python-logo-master-v3-TM-flattened_small.png)\n", 8 | "# **Word Count Lab: Building a word count application**\n", 9 | "#### This lab will build on the techniques covered in the Spark tutorial to develop a simple word count application. The volume of unstructured text in existence is growing dramatically, and Spark is an excellent tool for analyzing this type of data. 
In this lab, we will write code that calculates the most common words in the [Complete Works of William Shakespeare](http://www.gutenberg.org/ebooks/100) retrieved from [Project Gutenberg](http://www.gutenberg.org/wiki/Main_Page). This could also be scaled to find the most common words on the Internet.\n", 10 | "#### ** During this lab we will cover: **\n", 11 | "#### *Part 1:* Creating a base RDD and pair RDDs\n", 12 | "#### *Part 2:* Counting with pair RDDs\n", 13 | "#### *Part 3:* Finding unique words and a mean value\n", 14 | "#### *Part 4:* Apply word count to a file\n", 15 | "#### Note that, for reference, you can look up the details of the relevant methods in [Spark's Python API](https://spark.apache.org/docs/latest/api/python/pyspark.html#pyspark.RDD)" 16 | ] 17 | }, 18 | { 19 | "cell_type": "markdown", 20 | "metadata": {}, 21 | "source": [ 22 | "### ** Part 1: Creating a base RDD and pair RDDs **" 23 | ] 24 | }, 25 | { 26 | "cell_type": "markdown", 27 | "metadata": {}, 28 | "source": [ 29 | "#### In this part of the lab, we will explore creating a base RDD with `parallelize` and using pair RDDs to count words." 30 | ] 31 | }, 32 | { 33 | "cell_type": "markdown", 34 | "metadata": {}, 35 | "source": [ 36 | "#### ** (1a) Create a base RDD **\n", 37 | "#### We'll start by generating a base RDD by using a Python list and the `sc.parallelize` method. Then we'll print out the type of the base RDD." 38 | ] 39 | }, 40 | { 41 | "cell_type": "code", 42 | "execution_count": 1, 43 | "metadata": { 44 | "collapsed": false 45 | }, 46 | "outputs": [ 47 | { 48 | "name": "stdout", 49 | "output_type": "stream", 50 | "text": [ 51 | "<class 'pyspark.rdd.RDD'>\n" 52 | ] 53 | } 54 | ], 55 | "source": [ 56 | "wordsList = ['cat', 'elephant', 'rat', 'rat', 'cat']\n", 57 | "wordsRDD = sc.parallelize(wordsList, 4)\n", 58 | "# Print out the type of wordsRDD\n", 59 | "print type(wordsRDD)" 60 | ] 61 | }, 62 | { 63 | "cell_type": "markdown", 64 | "metadata": {}, 65 | "source": [ 66 | "#### ** (1b) Pluralize and test **\n", 67 | "#### Let's use a `map()` transformation to add the letter 's' to each string in the base RDD we just created. We'll define a Python function that returns the word with an 's' at the end of the word. Please replace `<FILL IN>` with your solution. If you have trouble, the next cell has the solution. After you have defined `makePlural` you can run the third cell which contains a test. If your implementation is correct it will print `1 test passed`.\n", 68 | "#### This is the general form that exercises will take, except that no example solution will be provided. Exercises will include an explanation of what is expected, followed by code cells where one cell will have one or more `<FILL IN>` sections. The cell that needs to be modified will have `# TODO: Replace with appropriate code` on its first line. Once the `<FILL IN>` sections are updated and the code is run, the test cell can then be run to verify the correctness of your solution. The last code cell before the next markdown section will contain the tests." 69 | ] 70 | }, 71 | { 72 | "cell_type": "code", 73 | "execution_count": 2, 74 | "metadata": { 75 | "collapsed": false 76 | }, 77 | "outputs": [ 78 | { 79 | "name": "stdout", 80 | "output_type": "stream", 81 | "text": [ 82 | "cats\n" 83 | ] 84 | } 85 | ], 86 | "source": [ 87 | "# TODO: Replace with appropriate code\n", 88 | "def makePlural(word):\n", 89 | " \"\"\"Adds an 's' to `word`.\n", 90 | "\n", 91 | " Note:\n", 92 | " This is a simple function that only adds an 's'. 
No attempt is made to follow proper\n", 93 | " pluralization rules.\n", 94 | "\n", 95 | " Args:\n", 96 | " word (str): A string.\n", 97 | "\n", 98 | " Returns:\n", 99 | " str: A string with 's' added to it.\n", 100 | " \"\"\"\n", 101 | " return word + 's'\n", 102 | "\n", 103 | "print makePlural('cat')" 104 | ] 105 | }, 106 | { 107 | "cell_type": "code", 108 | "execution_count": 3, 109 | "metadata": { 110 | "collapsed": false 111 | }, 112 | "outputs": [ 113 | { 114 | "name": "stdout", 115 | "output_type": "stream", 116 | "text": [ 117 | "cats\n" 118 | ] 119 | } 120 | ], 121 | "source": [ 122 | "# One way of completing the function\n", 123 | "def makePlural(word):\n", 124 | " return word + 's'\n", 125 | "\n", 126 | "print makePlural('cat')" 127 | ] 128 | }, 129 | { 130 | "cell_type": "code", 131 | "execution_count": 4, 132 | "metadata": { 133 | "collapsed": false 134 | }, 135 | "outputs": [ 136 | { 137 | "name": "stdout", 138 | "output_type": "stream", 139 | "text": [ 140 | "1 test passed.\n" 141 | ] 142 | } 143 | ], 144 | "source": [ 145 | "# Load in the testing code and check to see if your answer is correct\n", 146 | "# If incorrect it will report back '1 test failed' for each failed test\n", 147 | "# Make sure to rerun any cell you change before trying the test again\n", 148 | "from test_helper import Test\n", 149 | "# TEST Pluralize and test (1b)\n", 150 | "Test.assertEquals(makePlural('rat'), 'rats', 'incorrect result: makePlural does not add an s')" 151 | ] 152 | }, 153 | { 154 | "cell_type": "markdown", 155 | "metadata": {}, 156 | "source": [ 157 | "#### ** (1c) Apply `makePlural` to the base RDD **\n", 158 | "#### Now pass each item in the base RDD into a [map()](http://spark.apache.org/docs/latest/api/python/pyspark.html#pyspark.RDD.map) transformation that applies the `makePlural()` function to each element. And then call the [collect()](http://spark.apache.org/docs/latest/api/python/pyspark.html#pyspark.RDD.collect) action to see the transformed RDD." 159 | ] 160 | }, 161 | { 162 | "cell_type": "code", 163 | "execution_count": 5, 164 | "metadata": { 165 | "collapsed": false 166 | }, 167 | "outputs": [ 168 | { 169 | "name": "stdout", 170 | "output_type": "stream", 171 | "text": [ 172 | "['cats', 'elephants', 'rats', 'rats', 'cats']\n" 173 | ] 174 | } 175 | ], 176 | "source": [ 177 | "# TODO: Replace with appropriate code\n", 178 | "pluralRDD = wordsRDD.map(makePlural)\n", 179 | "print pluralRDD.collect()" 180 | ] 181 | }, 182 | { 183 | "cell_type": "code", 184 | "execution_count": 6, 185 | "metadata": { 186 | "collapsed": false 187 | }, 188 | "outputs": [ 189 | { 190 | "name": "stdout", 191 | "output_type": "stream", 192 | "text": [ 193 | "1 test passed.\n" 194 | ] 195 | } 196 | ], 197 | "source": [ 198 | "# TEST Apply makePlural to the base RDD(1c)\n", 199 | "Test.assertEquals(pluralRDD.collect(), ['cats', 'elephants', 'rats', 'rats', 'cats'],\n", 200 | " 'incorrect values for pluralRDD')" 201 | ] 202 | }, 203 | { 204 | "cell_type": "markdown", 205 | "metadata": {}, 206 | "source": [ 207 | "#### ** (1d) Pass a `lambda` function to `map` **\n", 208 | "#### Let's create the same RDD using a `lambda` function." 
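As a quick plain-Python sketch (not part of the graded lab cells, and assuming the Python 2 interpreter used by the course VM, where `map()` eagerly returns a list), the same mapping looks like this on an ordinary list:

```python
# Plain-Python sketch of mapping a lambda over the words (no SparkContext required)
wordsList = ['cat', 'elephant', 'rat', 'rat', 'cat']
pluralsLocal = map(lambda word: word + 's', wordsList)  # a plain list in Python 2
print pluralsLocal  # ['cats', 'elephants', 'rats', 'rats', 'cats']
```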
209 | ] 210 | }, 211 | { 212 | "cell_type": "code", 213 | "execution_count": 7, 214 | "metadata": { 215 | "collapsed": false 216 | }, 217 | "outputs": [ 218 | { 219 | "name": "stdout", 220 | "output_type": "stream", 221 | "text": [ 222 | "['cats', 'elephants', 'rats', 'rats', 'cats']\n" 223 | ] 224 | } 225 | ], 226 | "source": [ 227 | "# TODO: Replace with appropriate code\n", 228 | "pluralLambdaRDD = wordsRDD.map(lambda word: word + 's')\n", 229 | "print pluralLambdaRDD.collect()" 230 | ] 231 | }, 232 | { 233 | "cell_type": "code", 234 | "execution_count": 8, 235 | "metadata": { 236 | "collapsed": false 237 | }, 238 | "outputs": [ 239 | { 240 | "name": "stdout", 241 | "output_type": "stream", 242 | "text": [ 243 | "1 test passed.\n" 244 | ] 245 | } 246 | ], 247 | "source": [ 248 | "# TEST Pass a lambda function to map (1d)\n", 249 | "Test.assertEquals(pluralLambdaRDD.collect(), ['cats', 'elephants', 'rats', 'rats', 'cats'],\n", 250 | " 'incorrect values for pluralLambdaRDD (1d)')" 251 | ] 252 | }, 253 | { 254 | "cell_type": "markdown", 255 | "metadata": {}, 256 | "source": [ 257 | "#### ** (1e) Length of each word **\n", 258 | "#### Now use `map()` and a `lambda` function to return the number of characters in each word. We'll `collect` this result directly into a variable." 259 | ] 260 | }, 261 | { 262 | "cell_type": "code", 263 | "execution_count": 10, 264 | "metadata": { 265 | "collapsed": false 266 | }, 267 | "outputs": [ 268 | { 269 | "name": "stdout", 270 | "output_type": "stream", 271 | "text": [ 272 | "[4, 9, 4, 4, 4]\n" 273 | ] 274 | } 275 | ], 276 | "source": [ 277 | "# TODO: Replace with appropriate code\n", 278 | "pluralLengths = (pluralRDD\n", 279 | " .map(lambda word: len(word))\n", 280 | " .collect())\n", 281 | "print pluralLengths" 282 | ] 283 | }, 284 | { 285 | "cell_type": "code", 286 | "execution_count": 11, 287 | "metadata": { 288 | "collapsed": false 289 | }, 290 | "outputs": [ 291 | { 292 | "name": "stdout", 293 | "output_type": "stream", 294 | "text": [ 295 | "1 test passed.\n" 296 | ] 297 | } 298 | ], 299 | "source": [ 300 | "# TEST Length of each word (1e)\n", 301 | "Test.assertEquals(pluralLengths, [4, 9, 4, 4, 4],\n", 302 | " 'incorrect values for pluralLengths')" 303 | ] 304 | }, 305 | { 306 | "cell_type": "markdown", 307 | "metadata": {}, 308 | "source": [ 309 | "#### ** (1f) Pair RDDs **\n", 310 | "#### The next step in writing our word counting program is to create a new type of RDD, called a pair RDD. A pair RDD is an RDD where each element is a pair tuple `(k, v)` where `k` is the key and `v` is the value. In this example, we will create a pair consisting of `('<word>', 1)` for each word element in the RDD.\n", 311 | "#### We can create the pair RDD using the `map()` transformation with a `lambda()` function to create a new RDD."
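As a plain-Python sketch (again assuming Python 2, and not part of the graded cells), the pairing step just wraps each word in a `(word, 1)` tuple:

```python
# Build (word, 1) pairs locally to preview what the pair RDD will contain
wordsList = ['cat', 'elephant', 'rat', 'rat', 'cat']
localPairs = [(word, 1) for word in wordsList]
print localPairs  # [('cat', 1), ('elephant', 1), ('rat', 1), ('rat', 1), ('cat', 1)]
```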
312 | ] 313 | }, 314 | { 315 | "cell_type": "code", 316 | "execution_count": 12, 317 | "metadata": { 318 | "collapsed": false 319 | }, 320 | "outputs": [ 321 | { 322 | "name": "stdout", 323 | "output_type": "stream", 324 | "text": [ 325 | "[('cat', 1), ('elephant', 1), ('rat', 1), ('rat', 1), ('cat', 1)]\n" 326 | ] 327 | } 328 | ], 329 | "source": [ 330 | "# TODO: Replace with appropriate code\n", 331 | "wordPairs = wordsRDD.map(lambda word: (word, 1))\n", 332 | "print wordPairs.collect()" 333 | ] 334 | }, 335 | { 336 | "cell_type": "code", 337 | "execution_count": 13, 338 | "metadata": { 339 | "collapsed": false 340 | }, 341 | "outputs": [ 342 | { 343 | "name": "stdout", 344 | "output_type": "stream", 345 | "text": [ 346 | "1 test passed.\n" 347 | ] 348 | } 349 | ], 350 | "source": [ 351 | "# TEST Pair RDDs (1f)\n", 352 | "Test.assertEquals(wordPairs.collect(),\n", 353 | " [('cat', 1), ('elephant', 1), ('rat', 1), ('rat', 1), ('cat', 1)],\n", 354 | " 'incorrect value for wordPairs')" 355 | ] 356 | }, 357 | { 358 | "cell_type": "markdown", 359 | "metadata": {}, 360 | "source": [ 361 | "### ** Part 2: Counting with pair RDDs **" 362 | ] 363 | }, 364 | { 365 | "cell_type": "markdown", 366 | "metadata": {}, 367 | "source": [ 368 | "#### Now, let's count the number of times a particular word appears in the RDD. There are multiple ways to perform the counting, but some are much less efficient than others.\n", 369 | "#### A naive approach would be to `collect()` all of the elements and count them in the driver program. While this approach could work for small datasets, we want an approach that will work for any size dataset including terabyte- or petabyte-sized datasets. In addition, performing all of the work in the driver program is slower than performing it in parallel in the workers. For these reasons, we will use data parallel operations." 370 | ] 371 | }, 372 | { 373 | "cell_type": "markdown", 374 | "metadata": {}, 375 | "source": [ 376 | "#### ** (2a) `groupByKey()` approach **\n", 377 | "#### An approach you might first consider (we'll see shortly that there are better ways) is based on using the [groupByKey()](http://spark.apache.org/docs/latest/api/python/pyspark.html#pyspark.RDD.groupByKey) transformation. As the name implies, the `groupByKey()` transformation groups all the elements of the RDD with the same key into a single list in one of the partitions. There are two problems with using `groupByKey()`:\n", 378 | " + #### The operation requires a lot of data movement to move all the values into the appropriate partitions.\n", 379 | " + #### The lists can be very large. Consider a word count of English Wikipedia: the lists for common words (e.g., the, a, etc.) would be huge and could exhaust the available memory in a worker.\n", 380 | " \n", 381 | "#### Use `groupByKey()` to generate a pair RDD of type `('word', iterator)`." 
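Conceptually, `groupByKey()` behaves like the following local sketch; this is a simplification that ignores partitioning and the shuffle, and the names (`localPairs`, `groupedLocal`) are illustrative rather than taken from the lab:

```python
# Local analogue of groupByKey(): gather every value under its key
from collections import defaultdict

localPairs = [('cat', 1), ('elephant', 1), ('rat', 1), ('rat', 1), ('cat', 1)]
groupedLocal = defaultdict(list)
for key, value in localPairs:
    groupedLocal[key].append(value)
print dict(groupedLocal)  # {'cat': [1, 1], 'elephant': [1], 'rat': [1, 1]} (key order may vary)
```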
382 | ] 383 | }, 384 | { 385 | "cell_type": "code", 386 | "execution_count": 15, 387 | "metadata": { 388 | "collapsed": false 389 | }, 390 | "outputs": [ 391 | { 392 | "name": "stdout", 393 | "output_type": "stream", 394 | "text": [ 395 | "rat: [1, 1]\n", 396 | "elephant: [1]\n", 397 | "cat: [1, 1]\n" 398 | ] 399 | } 400 | ], 401 | "source": [ 402 | "# TODO: Replace with appropriate code\n", 403 | "# Note that groupByKey requires no parameters\n", 404 | "wordsGrouped = wordPairs.groupByKey()\n", 405 | "for key, value in wordsGrouped.collect():\n", 406 | " print '{0}: {1}'.format(key, list(value))" 407 | ] 408 | }, 409 | { 410 | "cell_type": "code", 411 | "execution_count": 16, 412 | "metadata": { 413 | "collapsed": false 414 | }, 415 | "outputs": [ 416 | { 417 | "name": "stdout", 418 | "output_type": "stream", 419 | "text": [ 420 | "1 test passed.\n" 421 | ] 422 | } 423 | ], 424 | "source": [ 425 | "# TEST groupByKey() approach (2a)\n", 426 | "Test.assertEquals(sorted(wordsGrouped.mapValues(lambda x: list(x)).collect()),\n", 427 | " [('cat', [1, 1]), ('elephant', [1]), ('rat', [1, 1])],\n", 428 | " 'incorrect value for wordsGrouped')" 429 | ] 430 | }, 431 | { 432 | "cell_type": "markdown", 433 | "metadata": {}, 434 | "source": [ 435 | "#### ** (2b) Use `groupByKey()` to obtain the counts **\n", 436 | "#### Using the `groupByKey()` transformation creates an RDD containing 3 elements, each of which is a pair of a word and a Python iterator.\n", 437 | "#### Now sum the iterator using a `map()` transformation. The result should be a pair RDD consisting of (word, count) pairs." 438 | ] 439 | }, 440 | { 441 | "cell_type": "code", 442 | "execution_count": 22, 443 | "metadata": { 444 | "collapsed": false 445 | }, 446 | "outputs": [ 447 | { 448 | "name": "stdout", 449 | "output_type": "stream", 450 | "text": [ 451 | "[('rat', 2), ('elephant', 1), ('cat', 2)]\n" 452 | ] 453 | } 454 | ], 455 | "source": [ 456 | "# TODO: Replace with appropriate code\n", 457 | "wordCountsGrouped = wordsGrouped.map(lambda (k,v): (k, sum(v)))\n", 458 | "print wordCountsGrouped.collect()" 459 | ] 460 | }, 461 | { 462 | "cell_type": "code", 463 | "execution_count": 23, 464 | "metadata": { 465 | "collapsed": false 466 | }, 467 | "outputs": [ 468 | { 469 | "name": "stdout", 470 | "output_type": "stream", 471 | "text": [ 472 | "1 test passed.\n" 473 | ] 474 | } 475 | ], 476 | "source": [ 477 | "# TEST Use groupByKey() to obtain the counts (2b)\n", 478 | "Test.assertEquals(sorted(wordCountsGrouped.collect()),\n", 479 | " [('cat', 2), ('elephant', 1), ('rat', 2)],\n", 480 | " 'incorrect value for wordCountsGrouped')" 481 | ] 482 | }, 483 | { 484 | "cell_type": "markdown", 485 | "metadata": {}, 486 | "source": [ 487 | "#### ** (2c) Counting using `reduceByKey` **\n", 488 | "#### A better approach is to start from the pair RDD and then use the [reduceByKey()](http://spark.apache.org/docs/latest/api/python/pyspark.html#pyspark.RDD.reduceByKey) transformation to create a new pair RDD. The `reduceByKey()` transformation gathers together pairs that have the same key and applies the function provided to two values at a time, iteratively reducing all of the values to a single value. `reduceByKey()` operates by applying the function first within each partition on a per-key basis and then across the partitions, allowing it to scale efficiently to large datasets." 
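The scalability argument can be made concrete with a toy model (illustrative only; the real `reduceByKey()` performs the per-partition combining inside Spark's executors): reducing within each partition first means only one partial `(key, count)` pair per key has to cross the network.

```python
# Toy model of reduceByKey(lambda a, b: a + b): reduce per partition, then merge the partials
def localCounts(pairs):
    counts = {}
    for key, value in pairs:
        counts[key] = counts.get(key, 0) + value
    return counts

partition1 = [('cat', 1), ('rat', 1)]                    # pretend partition contents
partition2 = [('cat', 1), ('rat', 1), ('elephant', 1)]
partials = localCounts(partition1).items() + localCounts(partition2).items()
print localCounts(partials)  # {'cat': 2, 'rat': 2, 'elephant': 1} (key order may vary)
```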
489 | ] 490 | }, 491 | { 492 | "cell_type": "code", 493 | "execution_count": 31, 494 | "metadata": { 495 | "collapsed": false 496 | }, 497 | "outputs": [ 498 | { 499 | "name": "stdout", 500 | "output_type": "stream", 501 | "text": [ 502 | "[('rat', 2), ('elephant', 1), ('cat', 2)]\n" 503 | ] 504 | } 505 | ], 506 | "source": [ 507 | "# TODO: Replace with appropriate code\n", 508 | "# Note that reduceByKey takes in a function that accepts two values and returns a single value\n", 509 | "\n", 510 | "wordCounts = wordPairs.reduceByKey(lambda a,b: a+b)\n", 511 | "print wordCounts.collect()" 512 | ] 513 | }, 514 | { 515 | "cell_type": "code", 516 | "execution_count": 32, 517 | "metadata": { 518 | "collapsed": false 519 | }, 520 | "outputs": [ 521 | { 522 | "name": "stdout", 523 | "output_type": "stream", 524 | "text": [ 525 | "1 test passed.\n" 526 | ] 527 | } 528 | ], 529 | "source": [ 530 | "# TEST Counting using reduceByKey (2c)\n", 531 | "Test.assertEquals(sorted(wordCounts.collect()), [('cat', 2), ('elephant', 1), ('rat', 2)],\n", 532 | " 'incorrect value for wordCounts')" 533 | ] 534 | }, 535 | { 536 | "cell_type": "markdown", 537 | "metadata": {}, 538 | "source": [ 539 | "#### ** (2d) All together **\n", 540 | "#### The expert version of the code performs the `map()` to pair RDD, `reduceByKey()` transformation, and `collect` in one statement." 541 | ] 542 | }, 543 | { 544 | "cell_type": "code", 545 | "execution_count": 33, 546 | "metadata": { 547 | "collapsed": false 548 | }, 549 | "outputs": [ 550 | { 551 | "name": "stdout", 552 | "output_type": "stream", 553 | "text": [ 554 | "[('rat', 2), ('elephant', 1), ('cat', 2)]\n" 555 | ] 556 | } 557 | ], 558 | "source": [ 559 | "# TODO: Replace with appropriate code\n", 560 | "wordCountsCollected = (wordsRDD\n", 561 | " .map(lambda word: (word, 1))\n", 562 | " .reduceByKey(lambda a,b: a+b)\n", 563 | " .collect())\n", 564 | "print wordCountsCollected" 565 | ] 566 | }, 567 | { 568 | "cell_type": "code", 569 | "execution_count": 34, 570 | "metadata": { 571 | "collapsed": false 572 | }, 573 | "outputs": [ 574 | { 575 | "name": "stdout", 576 | "output_type": "stream", 577 | "text": [ 578 | "1 test passed.\n" 579 | ] 580 | } 581 | ], 582 | "source": [ 583 | "# TEST All together (2d)\n", 584 | "Test.assertEquals(sorted(wordCountsCollected), [('cat', 2), ('elephant', 1), ('rat', 2)],\n", 585 | " 'incorrect value for wordCountsCollected')" 586 | ] 587 | }, 588 | { 589 | "cell_type": "markdown", 590 | "metadata": {}, 591 | "source": [ 592 | "### ** Part 3: Finding unique words and a mean value **" 593 | ] 594 | }, 595 | { 596 | "cell_type": "markdown", 597 | "metadata": {}, 598 | "source": [ 599 | "#### ** (3a) Unique words **\n", 600 | "#### Calculate the number of unique words in `wordsRDD`. You can use other RDDs that you have already created to make this easier." 
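As a quick local sanity check (plain Python, not a substitute for the Spark solution below): a `set` keeps exactly one copy of each word, which is the count that `distinct()` should reproduce.

```python
# The words list holds five entries but only three distinct words
wordsList = ['cat', 'elephant', 'rat', 'rat', 'cat']
print len(set(wordsList))  # 3
```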
601 | ] 602 | }, 603 | { 604 | "cell_type": "code", 605 | "execution_count": 41, 606 | "metadata": { 607 | "collapsed": false 608 | }, 609 | "outputs": [ 610 | { 611 | "name": "stdout", 612 | "output_type": "stream", 613 | "text": [ 614 | "3\n" 615 | ] 616 | } 617 | ], 618 | "source": [ 619 | "# TODO: Replace with appropriate code\n", 620 | "uniqueWords = wordsRDD.map(lambda word: (word, 1)).distinct().count()\n", 621 | "print uniqueWords" 622 | ] 623 | }, 624 | { 625 | "cell_type": "code", 626 | "execution_count": 42, 627 | "metadata": { 628 | "collapsed": false 629 | }, 630 | "outputs": [ 631 | { 632 | "name": "stdout", 633 | "output_type": "stream", 634 | "text": [ 635 | "1 test passed.\n" 636 | ] 637 | } 638 | ], 639 | "source": [ 640 | "# TEST Unique words (3a)\n", 641 | "Test.assertEquals(uniqueWords, 3, 'incorrect count of uniqueWords')" 642 | ] 643 | }, 644 | { 645 | "cell_type": "markdown", 646 | "metadata": {}, 647 | "source": [ 648 | "#### ** (3b) Mean using `reduce` **\n", 649 | "#### Find the mean number of words per unique word in `wordCounts`.\n", 650 | "#### Use a `reduce()` action to sum the counts in `wordCounts` and then divide by the number of unique words. First `map()` the pair RDD `wordCounts`, which consists of (key, value) pairs, to an RDD of values." 651 | ] 652 | }, 653 | { 654 | "cell_type": "code", 655 | "execution_count": 50, 656 | "metadata": { 657 | "collapsed": false 658 | }, 659 | "outputs": [ 660 | { 661 | "name": "stdout", 662 | "output_type": "stream", 663 | "text": [ 664 | "5\n", 665 | "1.67\n" 666 | ] 667 | } 668 | ], 669 | "source": [ 670 | "# TODO: Replace with appropriate code\n", 671 | "from operator import add\n", 672 | "\n", 673 | "totalCount = (wordCounts\n", 674 | " .map(lambda (a,b): b)\n", 675 | " .reduce(add))\n", 676 | "average = totalCount / float(wordCounts.distinct().count())\n", 677 | "print totalCount\n", 678 | "print round(average, 2)" 679 | ] 680 | }, 681 | { 682 | "cell_type": "code", 683 | "execution_count": 51, 684 | "metadata": { 685 | "collapsed": false 686 | }, 687 | "outputs": [ 688 | { 689 | "name": "stdout", 690 | "output_type": "stream", 691 | "text": [ 692 | "1 test passed.\n" 693 | ] 694 | } 695 | ], 696 | "source": [ 697 | "# TEST Mean using reduce (3b)\n", 698 | "Test.assertEquals(round(average, 2), 1.67, 'incorrect value of average')" 699 | ] 700 | }, 701 | { 702 | "cell_type": "markdown", 703 | "metadata": {}, 704 | "source": [ 705 | "### ** Part 4: Apply word count to a file **" 706 | ] 707 | }, 708 | { 709 | "cell_type": "markdown", 710 | "metadata": {}, 711 | "source": [ 712 | "#### In this section we will finish developing our word count application. We'll have to build the `wordCount` function, deal with real world problems like capitalization and punctuation, load in our data source, and compute the word count on the new data." 713 | ] 714 | }, 715 | { 716 | "cell_type": "markdown", 717 | "metadata": {}, 718 | "source": [ 719 | "#### ** (4a) `wordCount` function **\n", 720 | "#### First, define a function for word counting. You should reuse the techniques that have been covered in earlier parts of this lab. This function should take in an RDD that is a list of words like `wordsRDD` and return a pair RDD that has all of the words and their associated counts." 
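It can help to keep a plain-dictionary reference implementation in mind (the name `wordCountLocal` is illustrative and not part of the lab); on a small input, the Spark `wordCount()` defined below should agree with it:

```python
# Local reference implementation to sanity-check wordCount() results
def wordCountLocal(words):
    counts = {}
    for word in words:
        counts[word] = counts.get(word, 0) + 1
    return counts

print wordCountLocal(['cat', 'elephant', 'rat', 'rat', 'cat'])
# {'cat': 2, 'elephant': 1, 'rat': 2} (key order may vary)
```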
721 | ] 722 | }, 723 | { 724 | "cell_type": "code", 725 | "execution_count": 52, 726 | "metadata": { 727 | "collapsed": false 728 | }, 729 | "outputs": [ 730 | { 731 | "name": "stdout", 732 | "output_type": "stream", 733 | "text": [ 734 | "[('rat', 2), ('elephant', 1), ('cat', 2)]\n" 735 | ] 736 | } 737 | ], 738 | "source": [ 739 | "# TODO: Replace with appropriate code\n", 740 | "def wordCount(wordListRDD):\n", 741 | " \"\"\"Creates a pair RDD with word counts from an RDD of words.\n", 742 | "\n", 743 | " Args:\n", 744 | " wordListRDD (RDD of str): An RDD consisting of words.\n", 745 | "\n", 746 | " Returns:\n", 747 | " RDD of (str, int): An RDD consisting of (word, count) tuples.\n", 748 | " \"\"\"\n", 749 | " return (wordListRDD\n", 750 | " .map(lambda a : (a,1))\n", 751 | " .reduceByKey(lambda a,b: a+b))\n", 752 | "print wordCount(wordsRDD).collect()" 753 | ] 754 | }, 755 | { 756 | "cell_type": "code", 757 | "execution_count": 53, 758 | "metadata": { 759 | "collapsed": false 760 | }, 761 | "outputs": [ 762 | { 763 | "name": "stdout", 764 | "output_type": "stream", 765 | "text": [ 766 | "1 test passed.\n" 767 | ] 768 | } 769 | ], 770 | "source": [ 771 | "# TEST wordCount function (4a)\n", 772 | "Test.assertEquals(sorted(wordCount(wordsRDD).collect()),\n", 773 | " [('cat', 2), ('elephant', 1), ('rat', 2)],\n", 774 | " 'incorrect definition for wordCount function')" 775 | ] 776 | }, 777 | { 778 | "cell_type": "markdown", 779 | "metadata": {}, 780 | "source": [ 781 | "#### ** (4b) Capitalization and punctuation **\n", 782 | "#### Real world files are more complicated than the data we have been using in this lab. Some of the issues we have to address are:\n", 783 | " + #### Words should be counted independent of their capitalization (e.g., Spark and spark should be counted as the same word).\n", 784 | " + #### All punctuation should be removed.\n", 785 | " + #### Any leading or trailing spaces on a line should be removed.\n", 786 | " \n", 787 | "#### Define the function `removePunctuation` that converts all text to lower case, removes any punctuation, and removes leading and trailing spaces. Use the Python [re](https://docs.python.org/2/library/re.html) module to remove any text that is not a letter, number, or space. Reading `help(re.sub)` might be useful." 788 | ] 789 | }, 790 | { 791 | "cell_type": "code", 792 | "execution_count": 54, 793 | "metadata": { 794 | "collapsed": false 795 | }, 796 | "outputs": [ 797 | { 798 | "name": "stdout", 799 | "output_type": "stream", 800 | "text": [ 801 | "hi you\n", 802 | "no underscore\n" 803 | ] 804 | } 805 | ], 806 | "source": [ 807 | "# TODO: Replace with appropriate code\n", 808 | "import re\n", 809 | "def removePunctuation(text):\n", 810 | " \"\"\"Removes punctuation, changes to lower case, and strips leading and trailing spaces.\n", 811 | "\n", 812 | " Note:\n", 813 | " Only spaces, letters, and numbers should be retained. Other characters should be\n", 814 | " eliminated (e.g. it's becomes its). 
Leading and trailing spaces should be removed after\n", 815 | " punctuation is removed.\n", 816 | "\n", 817 | " Args:\n", 818 | " text (str): A string.\n", 819 | "\n", 820 | " Returns:\n", 821 | " str: The cleaned up string.\n", 822 | " \"\"\"\n", 823 | " return re.sub(\"[^a-zA-Z0-9 ]\", \"\", text.strip(\" \").lower())\n", 824 | "print removePunctuation('Hi, you!')\n", 825 | "print removePunctuation(' No under_score!')" 826 | ] 827 | }, 828 | { 829 | "cell_type": "code", 830 | "execution_count": 55, 831 | "metadata": { 832 | "collapsed": false 833 | }, 834 | "outputs": [ 835 | { 836 | "name": "stdout", 837 | "output_type": "stream", 838 | "text": [ 839 | "1 test passed.\n" 840 | ] 841 | } 842 | ], 843 | "source": [ 844 | "# TEST Capitalization and punctuation (4b)\n", 845 | "Test.assertEquals(removePunctuation(\" The Elephant's 4 cats. \"),\n", 846 | " 'the elephants 4 cats',\n", 847 | " 'incorrect definition for removePunctuation function')" 848 | ] 849 | }, 850 | { 851 | "cell_type": "markdown", 852 | "metadata": {}, 853 | "source": [ 854 | "#### ** (4c) Load a text file **\n", 855 | "#### For the next part of this lab, we will use the [Complete Works of William Shakespeare](http://www.gutenberg.org/ebooks/100) from [Project Gutenberg](http://www.gutenberg.org/wiki/Main_Page). To convert a text file into an RDD, we use the `SparkContext.textFile()` method. We also apply the recently defined `removePunctuation()` function using a `map()` transformation to strip out the punctuation and change all text to lowercase. Since the file is large, we use `take(15)` so that we only print 15 lines." 856 | ] 857 | }, 858 | { 859 | "cell_type": "code", 860 | "execution_count": 56, 861 | "metadata": { 862 | "collapsed": false 863 | }, 864 | "outputs": [ 865 | { 866 | "name": "stdout", 867 | "output_type": "stream", 868 | "text": [ 869 | "0: 1609\n", 870 | "1: \n", 871 | "2: the sonnets\n", 872 | "3: \n", 873 | "4: by william shakespeare\n", 874 | "5: \n", 875 | "6: \n", 876 | "7: \n", 877 | "8: 1\n", 878 | "9: from fairest creatures we desire increase\n", 879 | "10: that thereby beautys rose might never die\n", 880 | "11: but as the riper should by time decease\n", 881 | "12: his tender heir might bear his memory\n", 882 | "13: but thou contracted to thine own bright eyes\n", 883 | "14: feedst thy lights flame with selfsubstantial fuel\n" 884 | ] 885 | } 886 | ], 887 | "source": [ 888 | "# Just run this code\n", 889 | "import os.path\n", 890 | "baseDir = os.path.join('data')\n", 891 | "inputPath = os.path.join('cs100', 'lab1', 'shakespeare.txt')\n", 892 | "fileName = os.path.join(baseDir, inputPath)\n", 893 | "\n", 894 | "shakespeareRDD = (sc\n", 895 | " .textFile(fileName, 8)\n", 896 | " .map(removePunctuation))\n", 897 | "print '\n'.join(shakespeareRDD\n", 898 | " .zipWithIndex() # to (line, lineNum)\n", 899 | " .map(lambda (l, num): '{0}: {1}'.format(num, l)) # to 'lineNum: line'\n", 900 | " .take(15))" 901 | ] 902 | }, 903 | { 904 | "cell_type": "markdown", 905 | "metadata": {}, 906 | "source": [ 907 | "#### ** (4d) Words from lines **\n", 908 | "#### Before we can use the `wordCount()` function, we have to address two issues with the format of the RDD:\n", 909 | " + #### The first issue is that we need to split each line by its spaces.\n", 910 | " + #### The second issue is that we need to filter out empty lines.\n", 911 | " \n", 912 | "#### Apply a transformation that will split each element of the RDD by its spaces. 
For each element of the RDD, you should apply Python's string [split()](https://docs.python.org/2/library/string.html#string.split) function. You might think that a `map()` transformation is the way to do this, but think about what the result of the `split()` function will be." 913 | ] 914 | }, 915 | { 916 | "cell_type": "code", 917 | "execution_count": 60, 918 | "metadata": { 919 | "collapsed": false 920 | }, 921 | "outputs": [ 922 | { 923 | "name": "stdout", 924 | "output_type": "stream", 925 | "text": [ 926 | "[u'zwaggerd', u'zounds', u'zounds', u'zounds', u'zounds']\n", 927 | "928908\n" 928 | ] 929 | } 930 | ], 931 | "source": [ 932 | "# TODO: Replace with appropriate code\n", 933 | "shakespeareWordsRDD = shakespeareRDD.flatMap(lambda a: a.split(\" \"))\n", 934 | "shakespeareWordCount = shakespeareWordsRDD.count()\n", 935 | "print shakespeareWordsRDD.top(5)\n", 936 | "print shakespeareWordCount" 937 | ] 938 | }, 939 | { 940 | "cell_type": "code", 941 | "execution_count": 61, 942 | "metadata": { 943 | "collapsed": false 944 | }, 945 | "outputs": [ 946 | { 947 | "name": "stdout", 948 | "output_type": "stream", 949 | "text": [ 950 | "1 test passed.\n", 951 | "1 test passed.\n" 952 | ] 953 | } 954 | ], 955 | "source": [ 956 | "# TEST Words from lines (4d)\n", 957 | "# This test allows for leading spaces to be removed either before or after\n", 958 | "# punctuation is removed.\n", 959 | "Test.assertTrue(shakespeareWordCount == 927631 or shakespeareWordCount == 928908,\n", 960 | " 'incorrect value for shakespeareWordCount')\n", 961 | "Test.assertEquals(shakespeareWordsRDD.top(5),\n", 962 | " [u'zwaggerd', u'zounds', u'zounds', u'zounds', u'zounds'],\n", 963 | " 'incorrect value for shakespeareWordsRDD')" 964 | ] 965 | }, 966 | { 967 | "cell_type": "markdown", 968 | "metadata": {}, 969 | "source": [ 970 | "#### ** (4e) Remove empty elements **\n", 971 | "#### The next step is to filter out the empty elements. Remove all entries where the word is `''`." 972 | ] 973 | }, 974 | { 975 | "cell_type": "code", 976 | "execution_count": 62, 977 | "metadata": { 978 | "collapsed": false 979 | }, 980 | "outputs": [ 981 | { 982 | "name": "stdout", 983 | "output_type": "stream", 984 | "text": [ 985 | "882996\n" 986 | ] 987 | } 988 | ], 989 | "source": [ 990 | "# TODO: Replace with appropriate code\n", 991 | "shakeWordsRDD = shakespeareWordsRDD.filter(lambda word: len(word) > 0)\n", 992 | "shakeWordCount = shakeWordsRDD.count()\n", 993 | "print shakeWordCount" 994 | ] 995 | }, 996 | { 997 | "cell_type": "code", 998 | "execution_count": 63, 999 | "metadata": { 1000 | "collapsed": false 1001 | }, 1002 | "outputs": [ 1003 | { 1004 | "name": "stdout", 1005 | "output_type": "stream", 1006 | "text": [ 1007 | "1 test passed.\n" 1008 | ] 1009 | } 1010 | ], 1011 | "source": [ 1012 | "# TEST Remove empty elements (4e)\n", 1013 | "Test.assertEquals(shakeWordCount, 882996, 'incorrect value for shakeWordCount')" 1014 | ] 1015 | }, 1016 | { 1017 | "cell_type": "markdown", 1018 | "metadata": {}, 1019 | "source": [ 1020 | "#### ** (4f) Count the words **\n", 1021 | "#### We now have an RDD that is only words. Next, let's apply the `wordCount()` function to produce a list of word counts. We can view the top 15 words by using the `takeOrdered()` action; however, since the elements of the RDD are pairs, we need a custom sort function that sorts using the value part of the pair.\n", 1022 | "#### You'll notice that many of the words are common English words. These are called stopwords. 
In a later lab, we will see how to eliminate them from the results.\n", 1023 | "#### Use the `wordCount()` function and `takeOrdered()` to obtain the fifteen most common words and their counts." 1024 | ] 1025 | }, 1026 | { 1027 | "cell_type": "code", 1028 | "execution_count": 66, 1029 | "metadata": { 1030 | "collapsed": false 1031 | }, 1032 | "outputs": [ 1033 | { 1034 | "name": "stdout", 1035 | "output_type": "stream", 1036 | "text": [ 1037 | "the: 27361\n", 1038 | "and: 26028\n", 1039 | "i: 20681\n", 1040 | "to: 19150\n", 1041 | "of: 17463\n", 1042 | "a: 14593\n", 1043 | "you: 13615\n", 1044 | "my: 12481\n", 1045 | "in: 10956\n", 1046 | "that: 10890\n", 1047 | "is: 9134\n", 1048 | "not: 8497\n", 1049 | "with: 7771\n", 1050 | "me: 7769\n", 1051 | "it: 7678\n" 1052 | ] 1053 | } 1054 | ], 1055 | "source": [ 1056 | "# TODO: Replace with appropriate code\n", 1057 | "top15WordsAndCounts = wordCount(shakeWordsRDD).takeOrdered(15, lambda (a,b): -b)\n", 1058 | "print '\\n'.join(map(lambda (w, c): '{0}: {1}'.format(w, c), top15WordsAndCounts))" 1059 | ] 1060 | }, 1061 | { 1062 | "cell_type": "code", 1063 | "execution_count": 67, 1064 | "metadata": { 1065 | "collapsed": false 1066 | }, 1067 | "outputs": [ 1068 | { 1069 | "name": "stdout", 1070 | "output_type": "stream", 1071 | "text": [ 1072 | "1 test passed.\n" 1073 | ] 1074 | } 1075 | ], 1076 | "source": [ 1077 | "# TEST Count the words (4f)\n", 1078 | "Test.assertEquals(top15WordsAndCounts,\n", 1079 | " [(u'the', 27361), (u'and', 26028), (u'i', 20681), (u'to', 19150), (u'of', 17463),\n", 1080 | " (u'a', 14593), (u'you', 13615), (u'my', 12481), (u'in', 10956), (u'that', 10890),\n", 1081 | " (u'is', 9134), (u'not', 8497), (u'with', 7771), (u'me', 7769), (u'it', 7678)],\n", 1082 | " 'incorrect value for top15WordsAndCounts')" 1083 | ] 1084 | } 1085 | ], 1086 | "metadata": { 1087 | "kernelspec": { 1088 | "display_name": "Python 2", 1089 | "language": "python", 1090 | "name": "python2" 1091 | }, 1092 | "language_info": { 1093 | "codemirror_mode": { 1094 | "name": "ipython", 1095 | "version": 2 1096 | }, 1097 | "file_extension": ".py", 1098 | "mimetype": "text/x-python", 1099 | "name": "python", 1100 | "nbconvert_exporter": "python", 1101 | "pygments_lexer": "ipython2", 1102 | "version": "2.7.6" 1103 | } 1104 | }, 1105 | "nbformat": 4, 1106 | "nbformat_minor": 0 1107 | } 1108 | -------------------------------------------------------------------------------- /Week 3 - Data Management/Week3Lec5.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/dipanjanS/BerkeleyX-CS100.1x-Big-Data-with-Apache-Spark/d0dbc626fc55c2717c54e38353848668bb25baad/Week 3 - Data Management/Week3Lec5.pdf -------------------------------------------------------------------------------- /Week 3 - Data Management/Week3Lec6.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/dipanjanS/BerkeleyX-CS100.1x-Big-Data-with-Apache-Spark/d0dbc626fc55c2717c54e38353848668bb25baad/Week 3 - Data Management/Week3Lec6.pdf -------------------------------------------------------------------------------- /Week 4 - Data Quality, Exploratory Data Analysis, and Machine Learning/Lab 3 Quiz Questions.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/dipanjanS/BerkeleyX-CS100.1x-Big-Data-with-Apache-Spark/d0dbc626fc55c2717c54e38353848668bb25baad/Week 4 - Data Quality, Exploratory Data Analysis, and 
Machine Learning/Lab 3 Quiz Questions.pdf -------------------------------------------------------------------------------- /Week 4 - Data Quality, Exploratory Data Analysis, and Machine Learning/Week4Lec7.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/dipanjanS/BerkeleyX-CS100.1x-Big-Data-with-Apache-Spark/d0dbc626fc55c2717c54e38353848668bb25baad/Week 4 - Data Quality, Exploratory Data Analysis, and Machine Learning/Week4Lec7.pdf -------------------------------------------------------------------------------- /Week 4 - Data Quality, Exploratory Data Analysis, and Machine Learning/Week4Lec8.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/dipanjanS/BerkeleyX-CS100.1x-Big-Data-with-Apache-Spark/d0dbc626fc55c2717c54e38353848668bb25baad/Week 4 - Data Quality, Exploratory Data Analysis, and Machine Learning/Week4Lec8.pdf -------------------------------------------------------------------------------- /Week 5 - Introduction to Machine Learning with Apache Spark/Lab 4 Quiz Questions.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/dipanjanS/BerkeleyX-CS100.1x-Big-Data-with-Apache-Spark/d0dbc626fc55c2717c54e38353848668bb25baad/Week 5 - Introduction to Machine Learning with Apache Spark/Lab 4 Quiz Questions.pdf -------------------------------------------------------------------------------- /Week 5 - Introduction to Machine Learning with Apache Spark/lab4_machine_learning_student.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "version 1.0.2\n", 8 | "#![Spark Logo](http://spark-mooc.github.io/web-assets/images/ta_Spark-logo-small.png) + ![Python Logo](http://spark-mooc.github.io/web-assets/images/python-logo-master-v3-TM-flattened_small.png)\n", 9 | "# **Introduction to Machine Learning with Apache Spark**\n", 10 | "## **Predicting Movie Ratings**\n", 11 | "#### One of the most common uses of big data is to predict what users want. This allows Google to show you relevant ads, Amazon to recommend relevant products, and Netflix to recommend movies that you might like. This lab will demonstrate how we can use Apache Spark to recommend movies to a user. We will start with some basic techniques, and then use the [Spark MLlib][mllib] library's Alternating Least Squares method to make more sophisticated predictions.\n", 12 | "#### For this lab, we will use a subset of 500,000 ratings that we have included for you in your VM (and on Databricks) from the [movielens 10M stable benchmark rating dataset](http://grouplens.org/datasets/movielens/). However, the same code you write will work for the full dataset, or their latest dataset of 21 million ratings.\n", 13 | "#### In this lab:\n", 14 | "#### *Part 0*: Preliminaries\n", 15 | "#### *Part 1*: Basic Recommendations\n", 16 | "#### *Part 2*: Collaborative Filtering\n", 17 | "#### *Part 3*: Predictions for Yourself\n", 18 | "#### As mentioned during the first Learning Spark lab, think carefully before calling `collect()` on any datasets. When you are using a small dataset, calling `collect()` and then using Python to get a sense for the data locally (in the driver program) will work fine, but this will not work when you are using a large dataset that doesn't fit in memory on one machine. 
Solutions that call `collect()` and do local analysis that could have been done with Spark will likely fail in the autograder and not receive full credit.\n", 19 | "[mllib]: https://spark.apache.org/mllib/" 20 | ] 21 | }, 22 | { 23 | "cell_type": "markdown", 24 | "metadata": {}, 25 | "source": [ 26 | "### Code\n", 27 | "#### This assignment can be completed using basic Python and pySpark Transformations and Actions. Libraries other than math are not necessary. With the exception of the ML functions that we introduce in this assignment, you should be able to complete all parts of this homework using only the Spark functions you have used in prior lab exercises (although you are welcome to use more features of Spark if you like!)." 28 | ] 29 | }, 30 | { 31 | "cell_type": "code", 32 | "execution_count": 12, 33 | "metadata": { 34 | "collapsed": false 35 | }, 36 | "outputs": [], 37 | "source": [ 38 | "import sys\n", 39 | "import os\n", 40 | "from test_helper import Test\n", 41 | "\n", 42 | "baseDir = os.path.join('data')\n", 43 | "inputPath = os.path.join('cs100', 'lab4', 'small')\n", 44 | "\n", 45 | "ratingsFilename = os.path.join(baseDir, inputPath, 'ratings.dat.gz')\n", 46 | "moviesFilename = os.path.join(baseDir, inputPath, 'movies.dat')" 47 | ] 48 | }, 49 | { 50 | "cell_type": "markdown", 51 | "metadata": {}, 52 | "source": [ 53 | "### **Part 0: Preliminaries**\n", 54 | "#### We read in each of the files and create an RDD consisting of parsed lines.\n", 55 | "#### Each line in the ratings dataset (`ratings.dat.gz`) is formatted as:\n", 56 | "#### `UserID::MovieID::Rating::Timestamp`\n", 57 | "#### Each line in the movies (`movies.dat`) dataset is formatted as:\n", 58 | "#### `MovieID::Title::Genres`\n", 59 | "#### The `Genres` field has the format:\n", 60 | "#### `Genres1|Genres2|Genres3|...`\n", 61 | "#### The format of these files is uniform and simple, so we can use Python [`split()`](https://docs.python.org/2/library/stdtypes.html#str.split) to parse their lines.\n", 62 | "#### Parsing the two files yields two RDDs:\n", 63 | "* #### For each line in the ratings dataset, we create a tuple of (UserID, MovieID, Rating). We drop the timestamp because we do not need it for this exercise.\n", 64 | "* #### For each line in the movies dataset, we create a tuple of (MovieID, Title). We drop the Genres because we do not need them for this exercise."
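As a quick illustration of the `'::'`-delimited parsing just described (the sample lines below are made up to match the documented formats and are not guaranteed to appear verbatim in the dataset):

```python
# Split a hypothetical ratings line and a hypothetical movies line on '::'
sampleRating = '1::1193::5.0::978300760'  # UserID::MovieID::Rating::Timestamp
fields = sampleRating.split('::')
print (int(fields[0]), int(fields[1]), float(fields[2]))  # (1, 1193, 5.0), timestamp dropped

sampleMovie = "1::Toy Story (1995)::Animation|Children's|Comedy"  # MovieID::Title::Genres
fields = sampleMovie.split('::')
print (int(fields[0]), fields[1])  # (1, 'Toy Story (1995)'), genres dropped
```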
65 | ] 66 | }, 67 | { 68 | "cell_type": "code", 69 | "execution_count": 13, 70 | "metadata": { 71 | "collapsed": false 72 | }, 73 | "outputs": [ 74 | { 75 | "name": "stdout", 76 | "output_type": "stream", 77 | "text": [ 78 | "There are 487650 ratings and 3883 movies in the datasets\n", 79 | "Ratings: [(1, 1193, 5.0), (1, 914, 3.0), (1, 2355, 5.0)]\n", 80 | "Movies: [(1, u'Toy Story (1995)'), (2, u'Jumanji (1995)'), (3, u'Grumpier Old Men (1995)')]\n" 81 | ] 82 | } 83 | ], 84 | "source": [ 85 | "numPartitions = 2\n", 86 | "rawRatings = sc.textFile(ratingsFilename).repartition(numPartitions)\n", 87 | "rawMovies = sc.textFile(moviesFilename)\n", 88 | "\n", 89 | "def get_ratings_tuple(entry):\n", 90 | " \"\"\" Parse a line in the ratings dataset\n", 91 | " Args:\n", 92 | " entry (str): a line in the ratings dataset in the form of UserID::MovieID::Rating::Timestamp\n", 93 | " Returns:\n", 94 | " tuple: (UserID, MovieID, Rating)\n", 95 | " \"\"\"\n", 96 | " items = entry.split('::')\n", 97 | " return int(items[0]), int(items[1]), float(items[2])\n", 98 | "\n", 99 | "\n", 100 | "def get_movie_tuple(entry):\n", 101 | " \"\"\" Parse a line in the movies dataset\n", 102 | " Args:\n", 103 | " entry (str): a line in the movies dataset in the form of MovieID::Title::Genres\n", 104 | " Returns:\n", 105 | " tuple: (MovieID, Title)\n", 106 | " \"\"\"\n", 107 | " items = entry.split('::')\n", 108 | " return int(items[0]), items[1]\n", 109 | "\n", 110 | "\n", 111 | "ratingsRDD = rawRatings.map(get_ratings_tuple).cache()\n", 112 | "moviesRDD = rawMovies.map(get_movie_tuple).cache()\n", 113 | "\n", 114 | "ratingsCount = ratingsRDD.count()\n", 115 | "moviesCount = moviesRDD.count()\n", 116 | "\n", 117 | "print 'There are %s ratings and %s movies in the datasets' % (ratingsCount, moviesCount)\n", 118 | "print 'Ratings: %s' % ratingsRDD.take(3)\n", 119 | "print 'Movies: %s' % moviesRDD.take(3)\n", 120 | "\n", 121 | "assert ratingsCount == 487650\n", 122 | "assert moviesCount == 3883\n", 123 | "assert moviesRDD.filter(lambda (id, title): title == 'Toy Story (1995)').count() == 1\n", 124 | "assert (ratingsRDD.takeOrdered(1, key=lambda (user, movie, rating): movie)\n", 125 | " == [(1, 1, 5.0)])" 126 | ] 127 | }, 128 | { 129 | "cell_type": "markdown", 130 | "metadata": {}, 131 | "source": [ 132 | "#### In this lab we will be examining subsets of the tuples we create (e.g., the top rated movies by users). Whenever we examine only a subset of a large dataset, there is the potential that the result will depend on the order we perform operations, such as joins, or how the data is partitioned across the workers. What we want to guarantee is that we always see the same results for a subset, independent of how we manipulate or store the data.\n", 133 | "#### We can do that by sorting before we examine a subset. You might think that the most obvious choice when dealing with an RDD of tuples would be to use the [`sortByKey()` method][sortbykey]. However this choice is problematic, as we can still end up with different results if the key is not unique.\n", 134 | "#### Note: It is important to use the [`unicode` type](https://docs.python.org/2/howto/unicode.html#the-unicode-type) instead of the `string` type as the titles are in unicode characters.\n", 135 | "#### Consider the following example, and note that while the sets are equal, the printed lists are usually in different order by value, *although they may randomly match up from time to time.*\n", 136 | "#### You can try running this multiple times. 
If the last assertion fails, don't worry about it: that was just the luck of the draw. And note that in some environments the results may be more deterministic.\n", 137 | "[sortbykey]: https://spark.apache.org/docs/latest/api/python/pyspark.html#pyspark.RDD.sortByKey" 138 | ] 139 | }, 140 | { 141 | "cell_type": "code", 142 | "execution_count": 14, 143 | "metadata": { 144 | "collapsed": false 145 | }, 146 | "outputs": [ 147 | { 148 | "name": "stdout", 149 | "output_type": "stream", 150 | "text": [ 151 | "[(1, u'alpha'), (1, u'epsilon'), (1, u'delta'), (2, u'alpha'), (2, u'beta'), (3, u'alpha')]\n", 152 | "[(1, u'delta'), (1, u'epsilon'), (1, u'alpha'), (2, u'alpha'), (2, u'beta'), (3, u'alpha')]\n" 153 | ] 154 | } 155 | ], 156 | "source": [ 157 | "tmp1 = [(1, u'alpha'), (2, u'alpha'), (2, u'beta'), (3, u'alpha'), (1, u'epsilon'), (1, u'delta')]\n", 158 | "tmp2 = [(1, u'delta'), (2, u'alpha'), (2, u'beta'), (3, u'alpha'), (1, u'epsilon'), (1, u'alpha')]\n", 159 | "\n", 160 | "oneRDD = sc.parallelize(tmp1)\n", 161 | "twoRDD = sc.parallelize(tmp2)\n", 162 | "oneSorted = oneRDD.sortByKey(True).collect()\n", 163 | "twoSorted = twoRDD.sortByKey(True).collect()\n", 164 | "print oneSorted\n", 165 | "print twoSorted\n", 166 | "assert set(oneSorted) == set(twoSorted) # Note that both lists have the same elements\n", 167 | "assert twoSorted[0][0] < twoSorted.pop()[0] # Check that it is sorted by the keys\n", 168 | "assert oneSorted[0:2] != twoSorted[0:2] # Note that the subset consisting of the first two elements does not match" 169 | ] 170 | }, 171 | { 172 | "cell_type": "markdown", 173 | "metadata": {}, 174 | "source": [ 175 | "#### Even though the two lists contain identical tuples, the difference in ordering *sometimes* yields a different ordering for the sorted RDD (try running the cell repeatedly and see if the results change or the assertion fails). If we only examined the first two elements of the RDD (e.g., using `take(2)`), then we would observe different answers - **that is a really bad outcome as we want identical input data to always yield identical output**. A better technique is to sort the RDD by *both the key and value*, which we can do by combining the key and value into a single string and then sorting on that string. 
Since the key is an integer and the value is a unicode string, we can use a function to combine them into a single unicode string (e.g., `unicode('%.3f' % key) + ' ' + value`) before sorting the RDD using [sortBy()][sortby].\n", 176 | "[sortby]: https://spark.apache.org/docs/latest/api/python/pyspark.html#pyspark.RDD.sortBy" 177 | ] 178 | }, 179 | { 180 | "cell_type": "code", 181 | "execution_count": 15, 182 | "metadata": { 183 | "collapsed": false 184 | }, 185 | "outputs": [ 186 | { 187 | "name": "stdout", 188 | "output_type": "stream", 189 | "text": [ 190 | "[(1, u'alpha'), (1, u'delta'), (1, u'epsilon'), (2, u'alpha'), (2, u'beta'), (3, u'alpha')]\n", 191 | "[(1, u'alpha'), (1, u'delta'), (1, u'epsilon'), (2, u'alpha'), (2, u'beta'), (3, u'alpha')]\n" 192 | ] 193 | } 194 | ], 195 | "source": [ 196 | "def sortFunction(tuple):\n", 197 | " \"\"\" Construct the sort string (does not perform actual sorting)\n", 198 | " Args:\n", 199 | " tuple: (rating, MovieName)\n", 200 | " Returns:\n", 201 | " sortString: the value to sort with, 'rating MovieName'\n", 202 | " \"\"\"\n", 203 | " key = unicode('%.3f' % tuple[0])\n", 204 | " value = tuple[1]\n", 205 | " return (key + ' ' + value)\n", 206 | "\n", 207 | "\n", 208 | "print oneRDD.sortBy(sortFunction, True).collect()\n", 209 | "print twoRDD.sortBy(sortFunction, True).collect()" 210 | ] 211 | }, 212 | { 213 | "cell_type": "markdown", 214 | "metadata": {}, 215 | "source": [ 216 | "#### If we just want to look at the first few elements of the RDD in sorted order, we can use the [takeOrdered][takeordered] method with the `sortFunction` we defined.\n", 217 | "[takeordered]: https://spark.apache.org/docs/latest/api/python/pyspark.html#pyspark.RDD.takeOrdered" 218 | ] 219 | }, 220 | { 221 | "cell_type": "code", 222 | "execution_count": 16, 223 | "metadata": { 224 | "collapsed": false 225 | }, 226 | "outputs": [ 227 | { 228 | "name": "stdout", 229 | "output_type": "stream", 230 | "text": [ 231 | "one is [(1, u'alpha'), (1, u'delta'), (1, u'epsilon'), (2, u'alpha'), (2, u'beta'), (3, u'alpha')]\n", 232 | "two is [(1, u'alpha'), (1, u'delta'), (1, u'epsilon'), (2, u'alpha'), (2, u'beta'), (3, u'alpha')]\n" 233 | ] 234 | } 235 | ], 236 | "source": [ 237 | "oneSorted1 = oneRDD.takeOrdered(oneRDD.count(),key=sortFunction)\n", 238 | "twoSorted1 = twoRDD.takeOrdered(twoRDD.count(),key=sortFunction)\n", 239 | "print 'one is %s' % oneSorted1\n", 240 | "print 'two is %s' % twoSorted1\n", 241 | "assert oneSorted1 == twoSorted1" 242 | ] 243 | }, 244 | { 245 | "cell_type": "markdown", 246 | "metadata": {}, 247 | "source": [ 248 | "### **Part 1: Basic Recommendations**\n", 249 | "#### One way to recommend movies is to always recommend the movies with the highest average rating. In this part, we will use Spark to find the name, number of ratings, and the average rating of the 20 movies with the highest average rating and more than 500 reviews. We want to filter out movies with high ratings but fewer than or equal to 500 reviews, because movies with few reviews may not have broad appeal to everyone." 250 | ] 251 | }, 252 | { 253 | "cell_type": "markdown", 254 | "metadata": {}, 255 | "source": [ 256 | "#### **(1a) Number of Ratings and Average Ratings for a Movie**\n", 257 | "#### Using only Python, implement a helper function `getCountsAndAverages()` that takes a single tuple of (MovieID, (Rating1, Rating2, Rating3, ...)) and returns a tuple of (MovieID, (number of ratings, averageRating)). 
For example, given the tuple `(100, (10.0, 20.0, 30.0))`, your function should return `(100, (3, 20.0))`" 258 | ] 259 | }, 260 | { 261 | "cell_type": "code", 262 | "execution_count": 17, 263 | "metadata": { 264 | "collapsed": false 265 | }, 266 | "outputs": [], 267 | "source": [ 268 | "# TODO: Replace with appropriate code\n", 269 | "\n", 270 | "# First, implement a helper function `getCountsAndAverages` using only Python\n", 271 | "def getCountsAndAverages(IDandRatingsTuple):\n", 272 | " \"\"\" Calculate average rating\n", 273 | " Args:\n", 274 | " IDandRatingsTuple: a single tuple of (MovieID, (Rating1, Rating2, Rating3, ...))\n", 275 | " Returns:\n", 276 | " tuple: a tuple of (MovieID, (number of ratings, averageRating))\n", 277 | " \"\"\"\n", 278 | " aggr_result = (IDandRatingsTuple[0], (len(IDandRatingsTuple[1]), float(sum(IDandRatingsTuple[1])) / len(IDandRatingsTuple[1])))\n", 279 | " return aggr_result" 280 | ] 281 | }, 282 | { 283 | "cell_type": "code", 284 | "execution_count": 18, 285 | "metadata": { 286 | "collapsed": false 287 | }, 288 | "outputs": [ 289 | { 290 | "name": "stdout", 291 | "output_type": "stream", 292 | "text": [ 293 | "1 test passed.\n", 294 | "1 test passed.\n", 295 | "1 test passed.\n" 296 | ] 297 | } 298 | ], 299 | "source": [ 300 | "# TEST Number of Ratings and Average Ratings for a Movie (1a)\n", 301 | "\n", 302 | "Test.assertEquals(getCountsAndAverages((1, (1, 2, 3, 4))), (1, (4, 2.5)),\n", 303 | " 'incorrect getCountsAndAverages() with integer list')\n", 304 | "Test.assertEquals(getCountsAndAverages((100, (10.0, 20.0, 30.0))), (100, (3, 20.0)),\n", 305 | " 'incorrect getCountsAndAverages() with float list')\n", 306 | "Test.assertEquals(getCountsAndAverages((110, xrange(20))), (110, (20, 9.5)),\n", 307 | " 'incorrect getCountsAndAverages() with xrange')" 308 | ] 309 | }, 310 | { 311 | "cell_type": "markdown", 312 | "metadata": {}, 313 | "source": [ 314 | "#### **(1b) Movies with Highest Average Ratings**\n", 315 | "#### Now that we have a way to calculate the average ratings, we will use the `getCountsAndAverages()` helper function with Spark to determine movies with highest average ratings.\n", 316 | "#### The steps you should perform are:\n", 317 | "* #### Recall that the `ratingsRDD` contains tuples of the form (UserID, MovieID, Rating). From `ratingsRDD` create an RDD with tuples of the form (MovieID, Python iterable of Ratings for that MovieID). This transformation will yield an RDD of the form: `[(1, <pyspark.resultiterable.ResultIterable object at ...>), (2, <pyspark.resultiterable.ResultIterable object at ...>), (3, <pyspark.resultiterable.ResultIterable object at ...>)]`. Note that you will only need to perform two Spark transformations to do this step.\n", 318 | "* #### Using `movieIDsWithRatingsRDD` and your `getCountsAndAverages()` helper function, compute the number of ratings and average rating for each movie to yield tuples of the form (MovieID, (number of ratings, average rating)). This transformation will yield an RDD of the form: `[(1, (993, 4.145015105740181)), (2, (332, 3.174698795180723)), (3, (299, 3.0468227424749164))]`. You can do this step with one Spark transformation.\n", 319 | "* #### We want to see movie names, instead of movie IDs. To `moviesRDD`, apply RDD transformations that use `movieIDsWithAvgRatingsRDD` to get the movie names for `movieIDsWithAvgRatingsRDD`, yielding tuples of the form (average rating, movie name, number of ratings). This set of transformations will yield an RDD of the form: `[(1.0, u'Autopsy (Macchie Solari) (1975)', 1), (1.0, u'Better Living (1998)', 1), (1.0, u'Big Squeeze, The (1996)', 3)]`. 
You will need to do two Spark transformations to complete this step: first use the `moviesRDD` with `movieIDsWithAvgRatingsRDD` to create a new RDD with Movie names matched to Movie IDs, then convert that RDD into the form of (average rating, movie name, number of ratings). These transformations will yield an RDD that looks like: `[(3.6818181818181817, u'Happiest Millionaire, The (1967)', 22), (3.0468227424749164, u'Grumpier Old Men (1995)', 299), (2.882978723404255, u'Hocus Pocus (1993)', 94)]`" 320 | ] 321 | }, 322 | { 323 | "cell_type": "code", 324 | "execution_count": 31, 325 | "metadata": { 326 | "collapsed": false 327 | }, 328 | "outputs": [ 329 | { 330 | "name": "stdout", 331 | "output_type": "stream", 332 | "text": [ 333 | "movieIDsWithRatingsRDD: [(2, <pyspark.resultiterable.ResultIterable object at ...>), (4, <pyspark.resultiterable.ResultIterable object at ...>), (6, <pyspark.resultiterable.ResultIterable object at ...>)]\n", 334 | "\n", 335 | "movieIDsWithAvgRatingsRDD: [(2, (332, 3.174698795180723)), (4, (71, 2.676056338028169)), (6, (442, 3.7918552036199094))]\n", 336 | "\n", 337 | "movieNameWithAvgRatingsRDD: [(3.6818181818181817, u'Happiest Millionaire, The (1967)', 22), (3.0468227424749164, u'Grumpier Old Men (1995)', 299), (2.882978723404255, u'Hocus Pocus (1993)', 94)]\n", 338 | "\n" 339 | ] 340 | } 341 | ], 342 | "source": [ 343 | "# TODO: Replace with appropriate code\n", 344 | "\n", 345 | "# From ratingsRDD with tuples of (UserID, MovieID, Rating) create an RDD with tuples of\n", 346 | "# the (MovieID, iterable of Ratings for that MovieID)\n", 347 | "movieIDsWithRatingsRDD = (ratingsRDD\n", 348 | " .map(lambda x:(x[1], x[2]))\n", 349 | " .groupByKey())\n", 350 | "print 'movieIDsWithRatingsRDD: %s\\n' % movieIDsWithRatingsRDD.take(3)\n", 351 | "\n", 352 | "# Using `movieIDsWithRatingsRDD`, compute the number of ratings and average rating for each movie to\n", 353 | "# yield tuples of the form (MovieID, (number of ratings, average rating))\n", 354 | "movieIDsWithAvgRatingsRDD = movieIDsWithRatingsRDD.map(getCountsAndAverages)\n", 355 | "print 'movieIDsWithAvgRatingsRDD: %s\\n' % movieIDsWithAvgRatingsRDD.take(3)\n", 356 | "\n", 357 | "# To `movieIDsWithAvgRatingsRDD`, apply RDD transformations that use `moviesRDD` to get the movie\n", 358 | "# names for `movieIDsWithAvgRatingsRDD`, yielding tuples of the form\n", 359 | "# (average rating, movie name, number of ratings)\n", 360 | "movieNameWithAvgRatingsRDD = (moviesRDD\n", 361 | " .join(movieIDsWithAvgRatingsRDD).map(lambda x:(x[1][1][1], x[1][0], x[1][1][0])))\n", 362 | "print 'movieNameWithAvgRatingsRDD: %s\\n' % movieNameWithAvgRatingsRDD.take(3)" 363 | ] 364 | }, 365 | { 366 | "cell_type": "code", 367 | "execution_count": 32, 368 | "metadata": { 369 | "collapsed": false 370 | }, 371 | "outputs": [ 372 | { 373 | "name": "stdout", 374 | "output_type": "stream", 375 | "text": [ 376 | "1 test passed.\n", 377 | "1 test passed.\n", 378 | "1 test passed.\n", 379 | "1 test passed.\n", 380 | "1 test passed.\n", 381 | "1 test passed.\n", 382 | "1 test passed.\n", 383 | "1 test passed.\n" 384 | ] 385 | } 386 | ], 387 | "source": [ 388 | "# TEST Movies with Highest Average Ratings (1b)\n", 389 | "\n", 390 | "Test.assertEquals(movieIDsWithRatingsRDD.count(), 3615,\n", 391 | " 'incorrect movieIDsWithRatingsRDD.count() (expected 3615)')\n", 392 | "movieIDsWithRatingsTakeOrdered = movieIDsWithRatingsRDD.takeOrdered(3)\n", 393 | "Test.assertTrue(movieIDsWithRatingsTakeOrdered[0][0] == 1 and\n", 394 | " len(list(movieIDsWithRatingsTakeOrdered[0][1])) == 993,\n", 395 | " 'incorrect count of ratings for movieIDsWithRatingsTakeOrdered[0] (expected 993)')\n", 396 | 
"Test.assertTrue(movieIDsWithRatingsTakeOrdered[1][0] == 2 and\n", 397 | " len(list(movieIDsWithRatingsTakeOrdered[1][1])) == 332,\n", 398 | " 'incorrect count of ratings for movieIDsWithRatingsTakeOrdered[1] (expected 332)')\n", 399 | "Test.assertTrue(movieIDsWithRatingsTakeOrdered[2][0] == 3 and\n", 400 | " len(list(movieIDsWithRatingsTakeOrdered[2][1])) == 299,\n", 401 | " 'incorrect count of ratings for movieIDsWithRatingsTakeOrdered[2] (expected 299)')\n", 402 | "\n", 403 | "Test.assertEquals(movieIDsWithAvgRatingsRDD.count(), 3615,\n", 404 | " 'incorrect movieIDsWithAvgRatingsRDD.count() (expected 3615)')\n", 405 | "Test.assertEquals(movieIDsWithAvgRatingsRDD.takeOrdered(3),\n", 406 | " [(1, (993, 4.145015105740181)), (2, (332, 3.174698795180723)),\n", 407 | " (3, (299, 3.0468227424749164))],\n", 408 | " 'incorrect movieIDsWithAvgRatingsRDD.takeOrdered(3)')\n", 409 | "\n", 410 | "Test.assertEquals(movieNameWithAvgRatingsRDD.count(), 3615,\n", 411 | " 'incorrect movieNameWithAvgRatingsRDD.count() (expected 3615)')\n", 412 | "Test.assertEquals(movieNameWithAvgRatingsRDD.takeOrdered(3),\n", 413 | " [(1.0, u'Autopsy (Macchie Solari) (1975)', 1), (1.0, u'Better Living (1998)', 1),\n", 414 | " (1.0, u'Big Squeeze, The (1996)', 3)],\n", 415 | " 'incorrect movieNameWithAvgRatingsRDD.takeOrdered(3)')" 416 | ] 417 | }, 418 | { 419 | "cell_type": "markdown", 420 | "metadata": {}, 421 | "source": [ 422 | "#### **(1c) Movies with Highest Average Ratings and more than 500 reviews**\n", 423 | "#### Now that we have an RDD of the movies with highest averge ratings, we can use Spark to determine the 20 movies with highest average ratings and more than 500 reviews.\n", 424 | "#### Apply a single RDD transformation to `movieNameWithAvgRatingsRDD` to limit the results to movies with ratings from more than 500 people. We then use the `sortFunction()` helper function to sort by the average rating to get the movies in order of their rating (highest rating first). You will end up with an RDD of the form: `[(4.5349264705882355, u'Shawshank Redemption, The (1994)', 1088), (4.515798462852263, u\"Schindler's List (1993)\", 1171), (4.512893982808023, u'Godfather, The (1972)', 1047)]`" 425 | ] 426 | }, 427 | { 428 | "cell_type": "code", 429 | "execution_count": 35, 430 | "metadata": { 431 | "collapsed": false 432 | }, 433 | "outputs": [ 434 | { 435 | "name": "stdout", 436 | "output_type": "stream", 437 | "text": [ 438 | "Movies with highest ratings: [(4.5349264705882355, u'Shawshank Redemption, The (1994)', 1088), (4.515798462852263, u\"Schindler's List (1993)\", 1171), (4.512893982808023, u'Godfather, The (1972)', 1047), (4.510460251046025, u'Raiders of the Lost Ark (1981)', 1195), (4.505415162454874, u'Usual Suspects, The (1995)', 831), (4.457256461232604, u'Rear Window (1954)', 503), (4.45468509984639, u'Dr. 
Strangelove or: How I Learned to Stop Worrying and Love the Bomb (1963)', 651), (4.43953006219765, u'Star Wars: Episode IV - A New Hope (1977)', 1447), (4.4, u'Sixth Sense, The (1999)', 1110), (4.394285714285714, u'North by Northwest (1959)', 700), (4.379506641366224, u'Citizen Kane (1941)', 527), (4.375, u'Casablanca (1942)', 776), (4.363975155279503, u'Godfather: Part II, The (1974)', 805), (4.358816276202219, u\"One Flew Over the Cuckoo's Nest (1975)\", 811), (4.358173076923077, u'Silence of the Lambs, The (1991)', 1248), (4.335826477187734, u'Saving Private Ryan (1998)', 1337), (4.326241134751773, u'Chinatown (1974)', 564), (4.325383304940375, u'Life Is Beautiful (La Vita \\ufffd bella) (1997)', 587), (4.324110671936759, u'Monty Python and the Holy Grail (1974)', 759), (4.3096, u'Matrix, The (1999)', 1250)]\n" 439 | ] 440 | } 441 | ], 442 | "source": [ 443 | "# TODO: Replace with appropriate code\n", 444 | "\n", 445 | "# Apply an RDD transformation to `movieNameWithAvgRatingsRDD` to limit the results to movies with\n", 446 | "# ratings from more than 500 people. We then use the `sortFunction()` helper function to sort by the\n", 447 | "# average rating to get the movies in order of their rating (highest rating first)\n", 448 | "movieLimitedAndSortedByRatingRDD = (movieNameWithAvgRatingsRDD\n", 449 | " .filter(lambda x: (x[2] > 500))\n", 450 | " .sortBy(sortFunction, False))\n", 451 | "print 'Movies with highest ratings: %s' % movieLimitedAndSortedByRatingRDD.take(20)" 452 | ] 453 | }, 454 | { 455 | "cell_type": "code", 456 | "execution_count": 36, 457 | "metadata": { 458 | "collapsed": false 459 | }, 460 | "outputs": [ 461 | { 462 | "name": "stdout", 463 | "output_type": "stream", 464 | "text": [ 465 | "1 test passed.\n", 466 | "1 test passed.\n" 467 | ] 468 | } 469 | ], 470 | "source": [ 471 | "# TEST Movies with Highest Average Ratings and more than 500 Reviews (1c)\n", 472 | "\n", 473 | "Test.assertEquals(movieLimitedAndSortedByRatingRDD.count(), 194,\n", 474 | " 'incorrect movieLimitedAndSortedByRatingRDD.count()')\n", 475 | "Test.assertEquals(movieLimitedAndSortedByRatingRDD.take(20),\n", 476 | " [(4.5349264705882355, u'Shawshank Redemption, The (1994)', 1088),\n", 477 | " (4.515798462852263, u\"Schindler's List (1993)\", 1171),\n", 478 | " (4.512893982808023, u'Godfather, The (1972)', 1047),\n", 479 | " (4.510460251046025, u'Raiders of the Lost Ark (1981)', 1195),\n", 480 | " (4.505415162454874, u'Usual Suspects, The (1995)', 831),\n", 481 | " (4.457256461232604, u'Rear Window (1954)', 503),\n", 482 | " (4.45468509984639, u'Dr. 
Strangelove or: How I Learned to Stop Worrying and Love the Bomb (1963)', 651),\n", 483 | " (4.43953006219765, u'Star Wars: Episode IV - A New Hope (1977)', 1447),\n", 484 | " (4.4, u'Sixth Sense, The (1999)', 1110), (4.394285714285714, u'North by Northwest (1959)', 700),\n", 485 | " (4.379506641366224, u'Citizen Kane (1941)', 527), (4.375, u'Casablanca (1942)', 776),\n", 486 | " (4.363975155279503, u'Godfather: Part II, The (1974)', 805),\n", 487 | " (4.358816276202219, u\"One Flew Over the Cuckoo's Nest (1975)\", 811),\n", 488 | " (4.358173076923077, u'Silence of the Lambs, The (1991)', 1248),\n", 489 | " (4.335826477187734, u'Saving Private Ryan (1998)', 1337),\n", 490 | " (4.326241134751773, u'Chinatown (1974)', 564),\n", 491 | " (4.325383304940375, u'Life Is Beautiful (La Vita \\ufffd bella) (1997)', 587),\n", 492 | " (4.324110671936759, u'Monty Python and the Holy Grail (1974)', 759),\n", 493 | " (4.3096, u'Matrix, The (1999)', 1250)], 'incorrect sortedByRatingRDD.take(20)')" 494 | ] 495 | }, 496 | { 497 | "cell_type": "markdown", 498 | "metadata": {}, 499 | "source": [ 500 | "#### Using a threshold on the number of reviews is one way to improve the recommendations, but there are many other good ways to improve quality. For example, you could weight ratings by the number of ratings." 501 | ] 502 | }, 503 | { 504 | "cell_type": "markdown", 505 | "metadata": {}, 506 | "source": [ 507 | "## **Part 2: Collaborative Filtering**\n", 508 | "#### In this course, you have learned about many of the basic transformations and actions that Spark allows us to apply to distributed datasets. Spark also exposes some higher level functionality; in particular, Machine Learning using a component of Spark called [MLlib][mllib]. In this part, you will learn how to use MLlib to make personalized movie recommendations using the movie data we have been analyzing.\n", 509 | "#### We are going to use a technique called [collaborative filtering][collab]. Collaborative filtering is a method of making automatic predictions (filtering) about the interests of a user by collecting preferences or taste information from many users (collaborating). The underlying assumption of the collaborative filtering approach is that if a person A has the same opinion as a person B on an issue, A is more likely to have B's opinion on a different issue x than to have the opinion on x of a person chosen randomly. You can read more about collaborative filtering [here][collab2].\n", 510 | "#### The image below (from [Wikipedia][collab]) shows an example of predicting of the user's rating using collaborative filtering. At first, people rate different items (like videos, images, games). After that, the system is making predictions about a user's rating for an item, which the user has not rated yet. These predictions are built upon the existing ratings of other users, who have similar ratings with the active user. 
For instance, in the image below the system has made a prediction that the active user will not like the video.\n", 511 | "![collaborative filtering](https://courses.edx.org/c4x/BerkeleyX/CS100.1x/asset/Collaborative_filtering.gif)\n", 512 | "[mllib]: https://spark.apache.org/mllib/\n", 513 | "[collab]: https://en.wikipedia.org/?title=Collaborative_filtering\n", 514 | "[collab2]: http://recommender-systems.org/collaborative-filtering/" 515 | ] 516 | }, 517 | { 518 | "cell_type": "markdown", 519 | "metadata": {}, 520 | "source": [ 521 | "#### For movie recommendations, we start with a matrix whose entries are movie ratings by users (shown in red in the diagram below). Each column represents a user (shown in green) and each row represents a particular movie (shown in blue).\n", 522 | "#### Since not all users have rated all movies, we do not know all of the entries in this matrix, which is precisely why we need collaborative filtering. For each user, we have ratings for only a subset of the movies. With collaborative filtering, the idea is to approximate the ratings matrix by factorizing it as the product of two matrices: one that describes properties of each user (shown in green), and one that describes properties of each movie (shown in blue).\n", 523 | "![factorization](http://spark-mooc.github.io/web-assets/images/matrix_factorization.png)\n", 524 | "#### We want to select these two matrices such that the error for the user/movie pairs where we know the correct ratings is minimized. The [Alternating Least Squares][als] algorithm does this by first randomly filling the users matrix with values and then optimizing the values of the movies matrix such that the error is minimized. Then, it holds the movies matrix constant and optimizes the values of the users matrix. This alternation between which matrix to optimize is the reason for the \"alternating\" in the name.\n", 525 | "#### This optimization is what's being shown on the right in the image above. Given a fixed set of user factors (i.e., values in the users matrix), we use the known ratings to find the best values for the movie factors using the optimization written at the bottom of the figure. Then we \"alternate\" and pick the best user factors given fixed movie factors.\n", 526 | "#### For a simple example of what the users and movies matrices might look like, check out the [videos from Lecture 8][videos] or the [slides from Lecture 8][slides].\n", 527 | "[videos]: https://courses.edx.org/courses/BerkeleyX/CS100.1x/1T2015/courseware/00eb8b17939b4889a41a6d8d2f35db83/3bd3bba368be4102b40780550d3d8da6/\n", 528 | "[slides]: https://courses.edx.org/c4x/BerkeleyX/CS100.1x/asset/Week4Lec8.pdf\n", 529 | "[als]: https://en.wikiversity.org/wiki/Least-Squares_Method" 530 | ] 531 | }, 532 | { 533 | "cell_type": "markdown", 534 | "metadata": {}, 535 | "source": [ 536 | "#### **(2a) Creating a Training Set**\n", 537 | "#### Before we jump into using machine learning, we need to break up the `ratingsRDD` dataset into three pieces:\n", 538 | "* #### A training set (RDD), which we will use to train models\n", 539 | "* #### A validation set (RDD), which we will use to choose the best model\n", 540 | "* #### A test set (RDD), which we will use for our experiments\n", 541 | "#### To randomly split the dataset into multiple groups, we can use the pySpark [randomSplit()](https://spark.apache.org/docs/latest/api/python/pyspark.html#pyspark.RDD.randomSplit) transformation. `randomSplit()` takes a list of split weights and a seed and returns multiple RDDs.\n",
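"#### As a quick illustration (a minimal sketch on a toy RDD, not part of the lab): weights that do not sum to 1 are normalized by `randomSplit()`, so `[6, 2, 2]` behaves like `[0.6, 0.2, 0.2]`, and the resulting split sizes are only approximately proportional.\n", "```python\n", "# Hypothetical toy example: split 100 integers roughly 60/20/20\n", "parts = sc.parallelize(range(100)).randomSplit([6, 2, 2], seed=0L)\n", "print [p.count() for p in parts]  # approximate sizes, e.g. [63, 18, 19]\n", "```"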
542 | ] 543 | }, 544 | { 545 | "cell_type": "code", 546 | "execution_count": 37, 547 | "metadata": { 548 | "collapsed": false 549 | }, 550 | "outputs": [ 551 | { 552 | "name": "stdout", 553 | "output_type": "stream", 554 | "text": [ 555 | "Training: 292716, validation: 96902, test: 98032\n", 556 | "\n", 557 | "[(1, 914, 3.0), (1, 2355, 5.0), (1, 595, 5.0)]\n", 558 | "[(1, 1287, 5.0), (1, 594, 4.0), (1, 1270, 5.0)]\n", 559 | "[(1, 1193, 5.0), (1, 2398, 4.0), (1, 1035, 5.0)]\n" 560 | ] 561 | } 562 | ], 563 | "source": [ 564 | "trainingRDD, validationRDD, testRDD = ratingsRDD.randomSplit([6, 2, 2], seed=0L)\n", 565 | "\n", 566 | "print 'Training: %s, validation: %s, test: %s\\n' % (trainingRDD.count(),\n", 567 | " validationRDD.count(),\n", 568 | " testRDD.count())\n", 569 | "print trainingRDD.take(3)\n", 570 | "print validationRDD.take(3)\n", 571 | "print testRDD.take(3)\n", 572 | "\n", 573 | "assert trainingRDD.count() == 292716\n", 574 | "assert validationRDD.count() == 96902\n", 575 | "assert testRDD.count() == 98032\n", 576 | "\n", 577 | "assert trainingRDD.filter(lambda t: t == (1, 914, 3.0)).count() == 1\n", 578 | "assert trainingRDD.filter(lambda t: t == (1, 2355, 5.0)).count() == 1\n", 579 | "assert trainingRDD.filter(lambda t: t == (1, 595, 5.0)).count() == 1\n", 580 | "\n", 581 | "assert validationRDD.filter(lambda t: t == (1, 1287, 5.0)).count() == 1\n", 582 | "assert validationRDD.filter(lambda t: t == (1, 594, 4.0)).count() == 1\n", 583 | "assert validationRDD.filter(lambda t: t == (1, 1270, 5.0)).count() == 1\n", 584 | "\n", 585 | "assert testRDD.filter(lambda t: t == (1, 1193, 5.0)).count() == 1\n", 586 | "assert testRDD.filter(lambda t: t == (1, 2398, 4.0)).count() == 1\n", 587 | "assert testRDD.filter(lambda t: t == (1, 1035, 5.0)).count() == 1" 588 | ] 589 | }, 590 | { 591 | "cell_type": "markdown", 592 | "metadata": {}, 593 | "source": [ 594 | "#### After splitting the dataset, your training set has about 293,000 entries and the validation and test sets each have about 97,000 entries (the exact number of entries in each dataset varies slightly due to the random nature of the `randomSplit()` transformation)." 595 | ] 596 | }, 597 | { 598 | "cell_type": "markdown", 599 | "metadata": {}, 600 | "source": [ 601 | "#### **(2b) Root Mean Square Error (RMSE)**\n", 602 | "#### In the next part, you will generate a few different models, and will need a way to decide which model is best. We will use the [Root Mean Square Error](https://en.wikipedia.org/wiki/Root-mean-square_deviation) (RMSE) or Root Mean Square Deviation (RMSD) to compute the error of each model. RMSE is a frequently used measure of the differences between values (sample and population values) predicted by a model or an estimator and the values actually observed. The RMSD represents the sample standard deviation of the differences between predicted values and observed values. These individual differences are called residuals when the calculations are performed over the data sample that was used for estimation, and are called prediction errors when computed out-of-sample. The RMSE serves to aggregate the magnitudes of the errors in predictions for various times into a single measure of predictive power. 
RMSE is a good measure of accuracy, but only to compare forecasting errors of different models for a particular variable and not between variables, as it is scale-dependent.\n", 603 | "#### The RMSE is the square root of the average value of the square of `(actual rating - predicted rating)` for all users and movies for which we have the actual rating. Versions of Spark MLlib beginning with Spark 1.4 include a [RegressionMetrics](https://spark.apache.org/docs/latest/api/python/pyspark.mllib.html#pyspark.mllib.evaluation.RegressionMetrics) module that can be used to compute the RMSE. However, since we are using Spark 1.3.1, we will write our own function.\n", 604 | "#### Write a function to compute the sum of squared error given `predictedRDD` and `actualRDD` RDDs. Both RDDs consist of tuples of the form (UserID, MovieID, Rating).\n", 605 | "#### Given two ratings RDDs, *x* and *y* of size *n*, we define RMSE as follows: $ RMSE = \sqrt{\frac{\sum_{i = 1}^{n} (x_i - y_i)^2}{n}}$\n", 606 | "#### To calculate RMSE, the steps you should perform are:\n", 607 | "* #### Transform `predictedRDD` into the tuples of the form ((UserID, MovieID), Rating). For example, tuples like `[((1, 1), 5), ((1, 2), 3), ((1, 3), 4), ((2, 1), 3), ((2, 2), 2), ((2, 3), 4)]`. You can perform this step with a single Spark transformation.\n", 608 | "* #### Transform `actualRDD` into the tuples of the form ((UserID, MovieID), Rating). For example, tuples like `[((1, 2), 3), ((1, 3), 5), ((2, 1), 5), ((2, 2), 1)]`. You can perform this step with a single Spark transformation.\n", 609 | "* #### Using only RDD transformations (you only need to perform two transformations), compute the squared error for each *matching* entry (i.e., the same (UserID, MovieID) in each RDD) in the reformatted RDDs - do *not* use `collect()` to perform this step. Note that not every (UserID, MovieID) pair will appear in both RDDs - if a pair does not appear in both RDDs, then it does not contribute to the RMSE. You will end up with an RDD with entries of the form $ (x_i - y_i)^2$. You might want to check out Python's [math](https://docs.python.org/2/library/math.html) module to see how to compute these values.\n", 610 | "* #### Using an RDD action (but **not** `collect()`), compute the total squared error: $ SE = \sum_{i = 1}^{n} (x_i - y_i)^2 $\n", 611 | "* #### Compute *n* by using an RDD action (but **not** `collect()`), to count the number of pairs for which you computed the total squared error\n", 612 | "* #### Using the total squared error and the number of pairs, compute the RMSE. Make sure you compute this value as a [float](https://docs.python.org/2/library/stdtypes.html#numeric-types-int-float-long-complex).\n", 613 | "#### Note: Your solution must only use transformations and actions on RDDs. Do _not_ call `collect()` on either RDD.\n",
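"#### For intuition, here is the arithmetic worked out on the small test fixtures defined in the next cell (a worked example, not an extra requirement): `testPredicted` and `testActual` share the four keys (1, 2), (1, 3), (2, 1), and (2, 2), giving squared errors $(3 - 3)^2 = 0$, $(4 - 5)^2 = 1$, $(3 - 5)^2 = 4$, and $(2 - 1)^2 = 1$, so $RMSE = \sqrt{(0 + 1 + 4 + 1) / 4} = \sqrt{1.5} \approx 1.22474$, which matches the expected output below."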
614 | ] 615 | }, 616 | { 617 | "cell_type": "code", 618 | "execution_count": 38, 619 | "metadata": { 620 | "collapsed": false 621 | }, 622 | "outputs": [ 623 | { 624 | "name": "stdout", 625 | "output_type": "stream", 626 | "text": [ 627 | "Error for test dataset (should be 1.22474487139): 1.22474487139\n", 628 | "Error for test dataset2 (should be 3.16227766017): 3.16227766017\n", 629 | "Error for testActual dataset (should be 0.0): 0.0\n" 630 | ] 631 | } 632 | ], 633 | "source": [ 634 | "# TODO: Replace with appropriate code\n", 635 | "import math\n", 636 | "\n", 637 | "def computeError(predictedRDD, actualRDD):\n", 638 | " \"\"\" Compute the root mean squared error between predicted and actual\n", 639 | " Args:\n", 640 | " predictedRDD: predicted ratings for each movie and each user where each entry is in the form\n", 641 | " (UserID, MovieID, Rating)\n", 642 | " actualRDD: actual ratings where each entry is in the form (UserID, MovieID, Rating)\n", 643 | " Returns:\n", 644 | " RMSE (float): computed RMSE value\n", 645 | " \"\"\"\n", 646 | " # Transform predictedRDD into the tuples of the form ((UserID, MovieID), Rating)\n", 647 | " predictedReformattedRDD = predictedRDD.map(lambda x: ((x[0], x[1]), x[2]))\n", 648 | "\n", 649 | " # Transform actualRDD into the tuples of the form ((UserID, MovieID), Rating)\n", 650 | " actualReformattedRDD = actualRDD.map(lambda x: ((x[0], x[1]), x[2]))\n", 651 | "\n", 652 | " # Compute the squared error for each matching entry (i.e., the same (User ID, Movie ID) in each\n", 653 | " # RDD) in the reformatted RDDs using RDD transformations - do not use collect()\n", 654 | " squaredErrorsRDD = (predictedReformattedRDD\n", 655 | " .join(actualReformattedRDD)\n", 656 | " .map(lambda x: (x, (x[1][0] - x[1][1])**2)))\n", 657 | "\n", 658 | " # Compute the total squared error - do not use collect()\n", 659 | " totalError = squaredErrorsRDD.values().sum()\n", 660 | "\n", 661 | " # Count the number of entries for which you computed the total squared error\n", 662 | " numRatings = squaredErrorsRDD.count()\n", 663 | "\n", 664 | " # Using the total squared error and the number of entries, compute the RMSE\n", 665 | " return math.sqrt(float(totalError)/numRatings)\n", 666 | "\n", 667 | "\n", 668 | "# sc.parallelize turns a Python list into a Spark RDD.\n", 669 | "testPredicted = sc.parallelize([\n", 670 | " (1, 1, 5),\n", 671 | " (1, 2, 3),\n", 672 | " (1, 3, 4),\n", 673 | " (2, 1, 3),\n", 674 | " (2, 2, 2),\n", 675 | " (2, 3, 4)])\n", 676 | "testActual = sc.parallelize([\n", 677 | " (1, 2, 3),\n", 678 | " (1, 3, 5),\n", 679 | " (2, 1, 5),\n", 680 | " (2, 2, 1)])\n", 681 | "testPredicted2 = sc.parallelize([\n", 682 | " (2, 2, 5),\n", 683 | " (1, 2, 5)])\n", 684 | "testError = computeError(testPredicted, testActual)\n", 685 | "print 'Error for test dataset (should be 1.22474487139): %s' % testError\n", 686 | "\n", 687 | "testError2 = computeError(testPredicted2, testActual)\n", 688 | "print 'Error for test dataset2 (should be 3.16227766017): %s' % testError2\n", 689 | "\n", 690 | "testError3 = computeError(testActual, testActual)\n", 691 | "print 'Error for testActual dataset (should be 0.0): %s' % testError3" 692 | ] 693 | }, 694 | { 695 | "cell_type": "code", 696 | "execution_count": 39, 697 | "metadata": { 698 | "collapsed": false 699 | }, 700 | "outputs": [ 701 | { 702 | "name": "stdout", 703 | "output_type": "stream", 704 | "text": [ 705 | "1 test passed.\n", 706 | "1 test passed.\n", 707 | "1 test passed.\n" 708 | ] 709 | } 710 | ], 711 | "source": [ 712 | "# TEST 
Root Mean Square Error (2b)\n", 713 | "Test.assertTrue(abs(testError - 1.22474487139) < 0.00000001,\n", 714 | " 'incorrect testError (expected 1.22474487139)')\n", 715 | "Test.assertTrue(abs(testError2 - 3.16227766017) < 0.00000001,\n", 716 | " 'incorrect testError2 result (expected 3.16227766017)')\n", 717 | "Test.assertTrue(abs(testError3 - 0.0) < 0.00000001,\n", 718 | " 'incorrect testActual result (expected 0.0)')" 719 | ] 720 | }, 721 | { 722 | "cell_type": "markdown", 723 | "metadata": {}, 724 | "source": [ 725 | "#### **(2c) Using ALS.train()**\n", 726 | "#### In this part, we will use the MLlib implementation of Alternating Least Squares, [ALS.train()](https://spark.apache.org/docs/latest/api/python/pyspark.mllib.html#pyspark.mllib.recommendation.ALS). ALS takes a training dataset (RDD) and several parameters that control the model creation process. To determine the best values for the parameters, we will use ALS to train several models, and then we will select the best model and use the parameters from that model in the rest of this lab exercise.\n", 727 | "#### The process we will use for determining the best model is as follows:\n", 728 | "* #### Pick a set of model parameters. The most important parameter to `ALS.train()` is the *rank*, which is the number of rows in the Users matrix (green in the diagram above) or the number of columns in the Movies matrix (blue in the diagram above). (In general, a lower rank will mean higher error on the training dataset, but a high rank may lead to [overfitting](https://en.wikipedia.org/wiki/Overfitting).) We will train models with ranks of 4, 8, and 12 using the `trainingRDD` dataset.\n", 729 | "* #### Create a model using `ALS.train(trainingRDD, rank, seed=seed, iterations=iterations, lambda_=regularizationParameter)` with four parameters: an RDD consisting of tuples of the form (UserID, MovieID, rating) used to train the model, an integer rank (4, 8, or 12), a number of iterations to execute (we will use 5 for the `iterations` parameter), and a regularization coefficient (we will use 0.1 for the `regularizationParameter`).\n", 730 | "* #### For the prediction step, create an input RDD, `validationForPredictRDD`, consisting of (UserID, MovieID) pairs that you extract from `validationRDD`. You will end up with an RDD of the form: `[(1, 1287), (1, 594), (1, 1270)]`\n", 731 | "* #### Using the model and `validationForPredictRDD`, we can predict rating values by calling [model.predictAll()](https://spark.apache.org/docs/latest/api/python/pyspark.mllib.html#pyspark.mllib.recommendation.MatrixFactorizationModel.predictAll) with the `validationForPredictRDD` dataset, where `model` is the model we generated with ALS.train(). `predictAll` accepts an RDD with each entry in the format (userID, movieID) and outputs an RDD with each entry in the format (userID, movieID, rating).\n", 732 | "* #### Evaluate the quality of the model by using the `computeError()` function you wrote in part (2b) to compute the error between the predicted ratings and the actual ratings in `validationRDD`.\n", 733 | "#### Which rank produces the best model, based on the RMSE with the `validationRDD` dataset?\n", 734 | "#### Note: It is likely that this operation will take a noticeable amount of time (around a minute in our VM); you can observe its progress on the [Spark Web UI](http://localhost:4040). 
Probably most of the time will be spent running your `computeError()` function, since, unlike the Spark ALS implementation (and the Spark 1.4 [RegressionMetrics](https://spark.apache.org/docs/latest/api/python/pyspark.mllib.html#pyspark.mllib.evaluation.RegressionMetrics) module), this does not use a fast linear algebra library and needs to run some Python code for all 100k entries." 735 | ] 736 | }, 737 | { 738 | "cell_type": "code", 739 | "execution_count": 40, 740 | "metadata": { 741 | "collapsed": false 742 | }, 743 | "outputs": [ 744 | { 745 | "name": "stdout", 746 | "output_type": "stream", 747 | "text": [ 748 | "For rank 4 the RMSE is 0.892734779484\n", 749 | "For rank 8 the RMSE is 0.890121292255\n", 750 | "For rank 12 the RMSE is 0.890216118367\n", 751 | "The best model was trained with rank 8\n" 752 | ] 753 | } 754 | ], 755 | "source": [ 756 | "# TODO: Replace with appropriate code\n", 757 | "from pyspark.mllib.recommendation import ALS\n", 758 | "\n", 759 | "validationForPredictRDD = validationRDD.map(lambda x: (x[0], x[1]))\n", 760 | "\n", 761 | "seed = 5L\n", 762 | "iterations = 5\n", 763 | "regularizationParameter = 0.1\n", 764 | "ranks = [4, 8, 12]\n", 765 | "errors = [0, 0, 0]\n", 766 | "err = 0\n", 767 | "tolerance = 0.03\n", 768 | "\n", 769 | "minError = float('inf')\n", 770 | "bestRank = -1\n", 771 | "bestIteration = -1\n", 772 | "for rank in ranks:\n", 773 | " model = ALS.train(trainingRDD, rank, seed=seed, iterations=iterations,\n", 774 | " lambda_=regularizationParameter)\n", 775 | " predictedRatingsRDD = model.predictAll(validationForPredictRDD)\n", 776 | " error = computeError(predictedRatingsRDD, validationRDD)\n", 777 | " errors[err] = error\n", 778 | " err += 1\n", 779 | " print 'For rank %s the RMSE is %s' % (rank, error)\n", 780 | " if error < minError:\n", 781 | " minError = error\n", 782 | " bestRank = rank\n", 783 | "\n", 784 | "print 'The best model was trained with rank %s' % bestRank" 785 | ] 786 | }, 787 | { 788 | "cell_type": "code", 789 | "execution_count": 41, 790 | "metadata": { 791 | "collapsed": false 792 | }, 793 | "outputs": [ 794 | { 795 | "name": "stdout", 796 | "output_type": "stream", 797 | "text": [ 798 | "1 test passed.\n", 799 | "1 test passed.\n", 800 | "1 test passed.\n", 801 | "1 test passed.\n", 802 | "1 test passed.\n", 803 | "1 test passed.\n" 804 | ] 805 | } 806 | ], 807 | "source": [ 808 | "# TEST Using ALS.train (2c)\n", 809 | "Test.assertEquals(trainingRDD.getNumPartitions(), 2,\n", 810 | " 'incorrect number of partitions for trainingRDD (expected 2)')\n", 811 | "Test.assertEquals(validationForPredictRDD.count(), 96902,\n", 812 | " 'incorrect size for validationForPredictRDD (expected 96902)')\n", 813 | "Test.assertEquals(validationForPredictRDD.filter(lambda t: t == (1, 1907)).count(), 1,\n", 814 | " 'incorrect content for validationForPredictRDD')\n", 815 | "Test.assertTrue(abs(errors[0] - 0.883710109497) < tolerance, 'incorrect errors[0]')\n", 816 | "Test.assertTrue(abs(errors[1] - 0.878486305621) < tolerance, 'incorrect errors[1]')\n", 817 | "Test.assertTrue(abs(errors[2] - 0.876832795659) < tolerance, 'incorrect errors[2]')" 818 | ] 819 | }, 820 | { 821 | "cell_type": "markdown", 822 | "metadata": {}, 823 | "source": [ 824 | "#### **(2d) Testing Your Model**\n", 825 | "#### So far, we used the `trainingRDD` and `validationRDD` datasets to select the best model. 
Since we used these two datasets to determine what model is best, we cannot use them to test how good the model is - otherwise we would be very vulnerable to [overfitting](https://en.wikipedia.org/wiki/Overfitting). To decide how good our model is, we need to use the `testRDD` dataset. We will use the `bestRank` you determined in part (2c) to create a model for predicting the ratings for the test dataset and then we will compute the RMSE.\n", 826 | "#### The steps you should perform are:\n", 827 | "* #### Train a model, using the `trainingRDD`, `bestRank` from part (2c), and the parameters you used in part (2c): `seed=seed`, `iterations=iterations`, and `lambda_=regularizationParameter` - make sure you include **all** of the parameters.\n", 828 | "* #### For the prediction step, create an input RDD, `testForPredictingRDD`, consisting of (UserID, MovieID) pairs that you extract from `testRDD`. You will end up with an RDD of the form: `[(1, 1287), (1, 594), (1, 1270)]`\n", 829 | "* #### Use [myModel.predictAll()](https://spark.apache.org/docs/latest/api/python/pyspark.mllib.html#pyspark.mllib.recommendation.MatrixFactorizationModel.predictAll) to predict rating values for the test dataset.\n", 830 | "* #### For validation, use the `testRDD` and your `computeError` function to compute the RMSE between `testRDD` and the `predictedTestRDD` from the model.\n", 831 | "* #### Evaluate the quality of the model by using the `computeError()` function you wrote in part (2b) to compute the error between the predicted ratings and the actual ratings in `testRDD`." 832 | ] 833 | }, 834 | { 835 | "cell_type": "code", 836 | "execution_count": 42, 837 | "metadata": { 838 | "collapsed": false 839 | }, 840 | "outputs": [ 841 | { 842 | "name": "stdout", 843 | "output_type": "stream", 844 | "text": [ 845 | "The model had a RMSE on the test set of 0.891048561304\n" 846 | ] 847 | } 848 | ], 849 | "source": [ 850 | "# TODO: Replace with appropriate code\n", 851 | "myModel = ALS.train(trainingRDD, bestRank, seed=seed, iterations=iterations, lambda_=regularizationParameter)\n", 852 | "testForPredictingRDD = testRDD.map(lambda x: (x[0], x[1]))\n", 853 | "predictedTestRDD = myModel.predictAll(testForPredictingRDD)\n", 854 | "\n", 855 | "testRMSE = computeError(testRDD, predictedTestRDD)\n", 856 | "\n", 857 | "print 'The model had a RMSE on the test set of %s' % testRMSE" 858 | ] 859 | }, 860 | { 861 | "cell_type": "code", 862 | "execution_count": 43, 863 | "metadata": { 864 | "collapsed": false 865 | }, 866 | "outputs": [ 867 | { 868 | "name": "stdout", 869 | "output_type": "stream", 870 | "text": [ 871 | "1 test passed.\n" 872 | ] 873 | } 874 | ], 875 | "source": [ 876 | "# TEST Testing Your Model (2d)\n", 877 | "Test.assertTrue(abs(testRMSE - 0.87809838344) < tolerance, 'incorrect testRMSE')" 878 | ] 879 | }, 880 | { 881 | "cell_type": "markdown", 882 | "metadata": {}, 883 | "source": [ 884 | "#### **(2e) Comparing Your Model**\n", 885 | "#### Looking at the RMSE for the results predicted by the model versus the values in the test set is one way to evaluate the quality of our model. 
Another way to evaluate the model is to evaluate the error from a test set where every rating is the average rating for the training set.\n", 886 | "#### The steps you should perform are:\n", 887 | "* #### Use the `trainingRDD` to compute the average rating across all movies in that training dataset.\n", 888 | "* #### Use the average rating that you just determined and the `testRDD` to create an RDD with entries of the form (userID, movieID, average rating).\n", 889 | "* #### Use your `computeError` function to compute the RMSE between the `testRDD` and the `testForAvgRDD` that you just created." 890 | ] 891 | }, 892 | { 893 | "cell_type": "code", 894 | "execution_count": 44, 895 | "metadata": { 896 | "collapsed": false 897 | }, 898 | "outputs": [ 899 | { 900 | "name": "stdout", 901 | "output_type": "stream", 902 | "text": [ 903 | "The average rating for movies in the training set is 3.57409571052\n", 904 | "The RMSE on the average set is 1.12036693569\n" 905 | ] 906 | } 907 | ], 908 | "source": [ 909 | "# TODO: Replace with appropriate code\n", 910 | "\n", 911 | "trainingAvgRating = float(trainingRDD.map(lambda x: x[2]).sum()) / trainingRDD.count()\n", 912 | "print 'The average rating for movies in the training set is %s' % trainingAvgRating\n", 913 | "\n", 914 | "testForAvgRDD = testRDD.map(lambda x: (x[0], x[1], trainingAvgRating))\n", 915 | "testAvgRMSE = computeError(testRDD, testForAvgRDD)\n", 916 | "print 'The RMSE on the average set is %s' % testAvgRMSE" 917 | ] 918 | }, 919 | { 920 | "cell_type": "code", 921 | "execution_count": 45, 922 | "metadata": { 923 | "collapsed": false 924 | }, 925 | "outputs": [ 926 | { 927 | "name": "stdout", 928 | "output_type": "stream", 929 | "text": [ 930 | "1 test passed.\n", 931 | "1 test passed.\n" 932 | ] 933 | } 934 | ], 935 | "source": [ 936 | "# TEST Comparing Your Model (2e)\n", 937 | "Test.assertTrue(abs(trainingAvgRating - 3.57409571052) < 0.000001,\n", 938 | " 'incorrect trainingAvgRating (expected 3.57409571052)')\n", 939 | "Test.assertTrue(abs(testAvgRMSE - 1.12036693569) < 0.000001,\n", 940 | " 'incorrect testAvgRMSE (expected 1.12036693569)')" 941 | ] 942 | }, 943 | { 944 | "cell_type": "markdown", 945 | "metadata": {}, 946 | "source": [ 947 | "#### You now have code to predict how users will rate movies!" 948 | ] 949 | }, 950 | { 951 | "cell_type": "markdown", 952 | "metadata": {}, 953 | "source": [ 954 | "## **Part 3: Predictions for Yourself**\n", 955 | "#### The ultimate goal of this lab exercise is to predict what movies to recommend to yourself. In order to do that, you will first need to add ratings for yourself to the `ratingsRDD` dataset." 956 | ] 957 | }, 958 | { 959 | "cell_type": "markdown", 960 | "metadata": {}, 961 | "source": [ 962 | "#### **(3a) Your Movie Ratings**\n", 963 | "#### To help you provide ratings for yourself, we have included the following code to list the names and movie IDs of the 50 highest-rated movies from `movieLimitedAndSortedByRatingRDD` which we created in part 1 of the lab.\n",
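"#### Tip (a hypothetical snippet, not required by the lab): you can sanity-check a movie ID before rating it with the RDD `lookup()` action; for example, `moviesRDD.lookup(260)` returns `[u'Star Wars: Episode IV - A New Hope (1977)']`."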
964 | ] 965 | }, 966 | { 967 | "cell_type": "code", 968 | "execution_count": 70, 969 | "metadata": { 970 | "collapsed": false 971 | }, 972 | "outputs": [ 973 | { 974 | "name": "stdout", 975 | "output_type": "stream", 976 | "text": [ 977 | "Most rated movies:\n", 978 | "(average rating, movie name, number of reviews)\n", 979 | "(4.5349264705882355, u'Shawshank Redemption, The (1994)', 1088)\n", 980 | "(4.515798462852263, u\"Schindler's List (1993)\", 1171)\n", 981 | "(4.512893982808023, u'Godfather, The (1972)', 1047)\n", 982 | "(4.510460251046025, u'Raiders of the Lost Ark (1981)', 1195)\n", 983 | "(4.505415162454874, u'Usual Suspects, The (1995)', 831)\n", 984 | "(4.457256461232604, u'Rear Window (1954)', 503)\n", 985 | "(4.45468509984639, u'Dr. Strangelove or: How I Learned to Stop Worrying and Love the Bomb (1963)', 651)\n", 986 | "(4.43953006219765, u'Star Wars: Episode IV - A New Hope (1977)', 1447)\n", 987 | "(4.4, u'Sixth Sense, The (1999)', 1110)\n", 988 | "(4.394285714285714, u'North by Northwest (1959)', 700)\n", 989 | "(4.379506641366224, u'Citizen Kane (1941)', 527)\n", 990 | "(4.375, u'Casablanca (1942)', 776)\n", 991 | "(4.363975155279503, u'Godfather: Part II, The (1974)', 805)\n", 992 | "(4.358816276202219, u\"One Flew Over the Cuckoo's Nest (1975)\", 811)\n", 993 | "(4.358173076923077, u'Silence of the Lambs, The (1991)', 1248)\n", 994 | "(4.335826477187734, u'Saving Private Ryan (1998)', 1337)\n", 995 | "(4.326241134751773, u'Chinatown (1974)', 564)\n", 996 | "(4.325383304940375, u'Life Is Beautiful (La Vita \\ufffd bella) (1997)', 587)\n", 997 | "(4.324110671936759, u'Monty Python and the Holy Grail (1974)', 759)\n", 998 | "(4.3096, u'Matrix, The (1999)', 1250)\n", 999 | "(4.309457579972183, u'Star Wars: Episode V - The Empire Strikes Back (1980)', 1438)\n", 1000 | "(4.30379746835443, u'Young Frankenstein (1974)', 553)\n", 1001 | "(4.301346801346801, u'Psycho (1960)', 594)\n", 1002 | "(4.296438883541867, u'Pulp Fiction (1994)', 1039)\n", 1003 | "(4.286535303776683, u'Fargo (1996)', 1218)\n", 1004 | "(4.282367447595561, u'GoodFellas (1990)', 811)\n", 1005 | "(4.27943661971831, u'American Beauty (1999)', 1775)\n", 1006 | "(4.268053855569155, u'Wizard of Oz, The (1939)', 817)\n", 1007 | "(4.267774699907664, u'Princess Bride, The (1987)', 1083)\n", 1008 | "(4.253333333333333, u'Graduate, The (1967)', 600)\n", 1009 | "(4.236263736263736, u'Run Lola Run (Lola rennt) (1998)', 546)\n", 1010 | "(4.233807266982622, u'Amadeus (1984)', 633)\n", 1011 | "(4.232558139534884, u'Toy Story 2 (1999)', 860)\n", 1012 | "(4.232558139534884, u'This Is Spinal Tap (1984)', 516)\n", 1013 | "(4.228494623655914, u'Almost Famous (2000)', 744)\n", 1014 | "(4.2250755287009065, u'Christmas Story, A (1983)', 662)\n", 1015 | "(4.216757741347905, u'Glory (1989)', 549)\n", 1016 | "(4.213358070500927, u'Apocalypse Now (1979)', 539)\n", 1017 | "(4.20992028343667, u'L.A. 
Confidential (1997)', 1129)\n", 1018 | "(4.204733727810651, u'Blade Runner (1982)', 845)\n", 1019 | "(4.1886120996441285, u'Sling Blade (1996)', 562)\n", 1020 | "(4.184615384615385, u'Braveheart (1995)', 1300)\n", 1021 | "(4.184168012924071, u'Butch Cassidy and the Sundance Kid (1969)', 619)\n", 1022 | "(4.182509505703422, u'Good Will Hunting (1997)', 789)\n", 1023 | "(4.166969147005445, u'Taxi Driver (1976)', 551)\n", 1024 | "(4.162767039674466, u'Terminator, The (1984)', 983)\n", 1025 | "(4.157545605306799, u'Reservoir Dogs (1992)', 603)\n", 1026 | "(4.153333333333333, u'Jaws (1975)', 750)\n", 1027 | "(4.149840595111583, u'Alien (1979)', 941)\n", 1028 | "(4.145015105740181, u'Toy Story (1995)', 993)\n" 1029 | ] 1030 | } 1031 | ], 1032 | "source": [ 1033 | "print 'Most rated movies:'\n", 1034 | "print '(average rating, movie name, number of reviews)'\n", 1035 | "for ratingsTuple in movieLimitedAndSortedByRatingRDD.take(50):\n", 1036 | " print ratingsTuple\n", 1037 | " \n", 1038 | "#a = moviesRDD.join(movieIDsWithAvgRatingsRDD).map(lambda x: (x[0], x[1][0], x[1][1][1], x[1][1][0])).filter(lambda x: (x[3] > 1000 )).filter(lambda x: (x[2] < 4 )).take(50)\n", 1039 | "#for i in a:\n", 1040 | "# print i" 1041 | ] 1042 | }, 1043 | { 1044 | "cell_type": "markdown", 1045 | "metadata": {}, 1046 | "source": [ 1047 | "#### The user ID 0 is unassigned, so we will use it for your ratings. We set the variable `myUserID` to 0 for you. Next, create a new RDD `myRatingsRDD` with ratings for at least 10 movies. Each entry should be formatted as `(myUserID, movieID, rating)` (i.e., each entry should be formatted in the same way as `trainingRDD`). As in the original dataset, ratings should be between 1 and 5 (inclusive). If you have not seen at least 10 of these movies, you can increase the parameter passed to `take()` in the above cell until there are 10 movies that you have seen (or you can also guess what your rating would be for movies you have not seen)." 1048 | ] 1049 | }, 1050 | { 1051 | "cell_type": "code", 1052 | "execution_count": 71, 1053 | "metadata": { 1054 | "collapsed": false 1055 | }, 1056 | "outputs": [ 1057 | { 1058 | "name": "stdout", 1059 | "output_type": "stream", 1060 | "text": [ 1061 | "My movie ratings: [(0, 2115, 4.5), (0, 480, 4), (0, 1377, 3.8), (0, 648, 4), (0, 2571, 4.8), (0, 1198, 5), (0, 1580, 3.6), (0, 1219, 4.5), (0, 589, 3.2), (0, 1097, 4)]\n" 1062 | ] 1063 | } 1064 | ], 1065 | "source": [ 1066 | "# TODO: Replace with appropriate code\n", 1067 | "myUserID = 0\n", 1068 | "\n", 1069 | "# Note that the movie IDs are the *last* number on each line. 
A common error was to use the number of ratings as the movie ID.\n", 1070 | "myRatedMovies = [\n", 1071 | " # The format of each line is (myUserID, movie ID, your rating)\n", 1072 | " # For example, to give the movie \"Star Wars: Episode IV - A New Hope (1977)\" a five rating, you would add the following line:\n", 1073 | " # (myUserID, 260, 5),\n", 1074 | " (myUserID, 2115, 4.5), # Indiana Jones and the Temple of Doom\n", 1075 | " (myUserID, 480, 4), # Jurassic Park\n", 1076 | " (myUserID, 1377, 3.8), # Batman Returns\n", 1077 | " (myUserID, 648, 4), # Mission Impossible\n", 1078 | " (myUserID, 2571, 4.8), # Matrix\n", 1079 | " (myUserID, 1198, 5), # Raiders of the Lost Ark\n", 1080 | " (myUserID, 1580, 3.6), # Men In Black\n", 1081 | " (myUserID, 1219, 4.5), # Psycho\n", 1082 | " (myUserID, 589, 3.2), # Terminator 2\n", 1083 | " (myUserID, 1097, 4) # ET\n", 1084 | " ]\n", 1085 | "myRatingsRDD = sc.parallelize(myRatedMovies)\n", 1086 | "print 'My movie ratings: %s' % myRatingsRDD.take(10)" 1087 | ] 1088 | }, 1089 | { 1090 | "cell_type": "markdown", 1091 | "metadata": {}, 1092 | "source": [ 1093 | "#### **(3b) Add Your Movies to Training Dataset**\n", 1094 | "#### Now that you have ratings for yourself, you need to add your ratings to the `training` dataset so that the model you train will incorporate your preferences. Spark's [union()](http://spark.apache.org/docs/latest/api/python/pyspark.rdd.RDD-class.html#union) transformation combines two RDDs; use `union()` to create a new training dataset that includes your ratings and the data in the original training dataset." 1095 | ] 1096 | }, 1097 | { 1098 | "cell_type": "code", 1099 | "execution_count": 72, 1100 | "metadata": { 1101 | "collapsed": false 1102 | }, 1103 | "outputs": [ 1104 | { 1105 | "name": "stdout", 1106 | "output_type": "stream", 1107 | "text": [ 1108 | "The training dataset now has 10 more entries than the original training dataset\n" 1109 | ] 1110 | } 1111 | ], 1112 | "source": [ 1113 | "# TODO: Replace with appropriate code\n", 1114 | "trainingWithMyRatingsRDD = trainingRDD.union(myRatingsRDD)\n", 1115 | "\n", 1116 | "print ('The training dataset now has %s more entries than the original training dataset' %\n", 1117 | " (trainingWithMyRatingsRDD.count() - trainingRDD.count()))\n", 1118 | "assert (trainingWithMyRatingsRDD.count() - trainingRDD.count()) == myRatingsRDD.count()" 1119 | ] 1120 | }, 1121 | { 1122 | "cell_type": "markdown", 1123 | "metadata": {}, 1124 | "source": [ 1125 | "#### **(3c) Train a Model with Your Ratings**\n", 1126 | "#### Now, train a model with your ratings added and the parameters you used in part (2c): `bestRank`, `seed=seed`, `iterations=iterations`, and `lambda_=regularizationParameter` - make sure you include **all** of the parameters." 1127 | ] 1128 | }, 1129 | { 1130 | "cell_type": "code", 1131 | "execution_count": 73, 1132 | "metadata": { 1133 | "collapsed": false 1134 | }, 1135 | "outputs": [], 1136 | "source": [ 1137 | "# TODO: Replace with appropriate code\n", 1138 | "myRatingsModel = ALS.train(trainingWithMyRatingsRDD, bestRank, seed=seed, iterations=iterations, lambda_=regularizationParameter)" 1139 | ] 1140 | }, 1141 | { 1142 | "cell_type": "markdown", 1143 | "metadata": {}, 1144 | "source": [ 1145 | "#### **(3d) Check RMSE for the New Model with Your Ratings**\n", 1146 | "#### Compute the RMSE for this new model on the test set.\n", 1147 | "* #### For the prediction step, we reuse `testForPredictingRDD`, consisting of (UserID, MovieID) pairs that you extracted from `testRDD`. 
The RDD has the form: `[(1, 1287), (1, 594), (1, 1270)]`\n", 1148 | "* #### Use `myRatingsModel.predictAll()` to predict rating values for the `testForPredictingRDD` test dataset, set this as `predictedTestMyRatingsRDD`.\n", 1149 | "* #### For validation, use the `testRDD` and your `computeError` function to compute the RMSE between `testRDD` and the `predictedTestMyRatingsRDD` from the model." 1150 | ] 1151 | }, 1152 | { 1153 | "cell_type": "code", 1154 | "execution_count": 74, 1155 | "metadata": { 1156 | "collapsed": false 1157 | }, 1158 | "outputs": [ 1159 | { 1160 | "name": "stdout", 1161 | "output_type": "stream", 1162 | "text": [ 1163 | "The model had a RMSE on the test set of 0.891957706731\n" 1164 | ] 1165 | } 1166 | ], 1167 | "source": [ 1168 | "# TODO: Replace with appropriate code\n", 1169 | "predictedTestMyRatingsRDD = myRatingsModel.predictAll(testForPredictingRDD)\n", 1170 | "testRMSEMyRatings = computeError(testRDD, predictedTestMyRatingsRDD)\n", 1171 | "print 'The model had a RMSE on the test set of %s' % testRMSEMyRatings" 1172 | ] 1173 | }, 1174 | { 1175 | "cell_type": "markdown", 1176 | "metadata": {}, 1177 | "source": [ 1178 | "#### **(3e) Predict Your Ratings**\n", 1179 | "#### So far, we have only used the `predictAll` method to compute the error of the model. Here, use `predictAll` to predict what ratings you would give to the movies that you did not already provide ratings for.\n", 1180 | "#### The steps you should perform are:\n", 1181 | "* #### Use the Python list `myRatedMovies` to transform the `moviesRDD` into an RDD with entries that are pairs of the form (myUserID, Movie ID) and that does not contain any movies that you have rated. This transformation will yield an RDD of the form: `[(0, 1), (0, 2), (0, 3), (0, 4)]`. Note that you can do this step with one RDD transformation.\n", 1182 | "* #### For the prediction step, use the input RDD, `myUnratedMoviesRDD`, with myRatingsModel.predictAll() to predict your ratings for the movies." 1183 | ] 1184 | }, 1185 | { 1186 | "cell_type": "code", 1187 | "execution_count": 76, 1188 | "metadata": { 1189 | "collapsed": false 1190 | }, 1191 | "outputs": [], 1192 | "source": [ 1193 | "# TODO: Replace with appropriate code\n", 1194 | "\n", 1195 | "# Use the Python list myRatedMovies to transform the moviesRDD into an RDD with entries that are pairs of the form (myUserID, Movie ID) and that does not contain any movies that you have rated.\n", 1196 | "myUnratedMoviesRDD = (moviesRDD\n", 1197 | " .filter(lambda x: x[0] not in [m[1] for m in myRatedMovies])\n", 1198 | " .map(lambda x: (myUserID, x[0])))\n", 1199 | "\n", 1200 | "# Use the input RDD, myUnratedMoviesRDD, with myRatingsModel.predictAll() to predict your ratings for the movies\n", 1201 | "predictedRatingsRDD = myRatingsModel.predictAll(myUnratedMoviesRDD)" 1202 | ] 1203 | }, 1204 | { 1205 | "cell_type": "markdown", 1206 | "metadata": {}, 1207 | "source": [ 1208 | "#### **(3f) Predict Your Ratings**\n", 1209 | "#### We have our predicted ratings. Now we can print out the 20 movies with the highest predicted ratings.\n", 1210 | "#### The steps you should perform are:\n", 1211 | "* #### From Parts (1b) and (1c), we know that we should look at movies with a reasonable number of reviews (e.g., more than 75 reviews). You can experiment with a lower threshold, but fewer ratings for a movie may yield higher prediction errors. 
1204 | {
1205 | "cell_type": "markdown",
1206 | "metadata": {},
1207 | "source": [
1208 | "#### **(3f) Print the Movies with the Highest Predicted Ratings**\n",
1209 | "#### We have our predicted ratings. Now we can print out the 20 movies with the highest predicted ratings.\n",
1210 | "#### The steps you should perform are:\n",
1211 | "* #### From Parts (1b) and (1c), we know that we should look at movies with a reasonable number of reviews (e.g., more than 75 reviews). You can experiment with a lower threshold, but fewer ratings for a movie may yield higher prediction errors. Transform `movieIDsWithAvgRatingsRDD` from Part (1b), which has the form (MovieID, (number of ratings, average rating)), into an RDD of the form (MovieID, number of ratings): `[(2, 332), (4, 71), (6, 442)]`\n",
1212 | "* #### We want to see movie names, instead of movie IDs. Transform `predictedRatingsRDD` into an RDD with entries that are pairs of the form (Movie ID, Predicted Rating): `[(3456, -0.5501005376936687), (1080, 1.5885892024487962), (320, -3.7952255522487865)]`\n",
1213 | "* #### Use RDD transformations with `predictedRDD` and `movieCountsRDD` to yield an RDD with tuples of the form (Movie ID, (Predicted Rating, number of ratings)): `[(2050, (0.6694097486155939, 44)), (10, (5.29762541533513, 418)), (2060, (0.5055259373841172, 97))]`\n",
1214 | "* #### Use RDD transformations with `predictedWithCountsRDD` and `moviesRDD` to yield an RDD with tuples of the form (Predicted Rating, Movie Name, number of ratings), _for movies with more than 75 ratings._ For example: `[(7.983121900375243, u'Under Siege (1992)'), (7.9769201864261285, u'Fifth Element, The (1997)')]`"
1215 | ]
1216 | },
1217 | {
1218 | "cell_type": "code",
1219 | "execution_count": 81,
1220 | "metadata": {
1221 | "collapsed": false
1222 | },
1223 | "outputs": [
1224 | {
1225 | "name": "stdout",
1226 | "output_type": "stream",
1227 | "text": [
1228 | "My highest rated movies as predicted (for movies with more than 75 reviews):\n",
1229 | "(4.617515565226586, u'Paths of Glory (1957)', 105)\n",
1230 | "(4.580442980252668, u'Shawshank Redemption, The (1994)', 1088)\n",
1231 | "(4.558944254927245, u'Usual Suspects, The (1995)', 831)\n",
1232 | "(4.5361511233660465, u'Central Station (Central do Brasil) (1998)', 103)\n",
1233 | "(4.532177731227485, u\"Schindler's List (1993)\", 1171)\n",
1234 | "(4.529464949063941, u'Wrong Trousers, The (1993)', 425)\n",
1235 | "(4.528013235422944, u'Star Wars: Episode IV - A New Hope (1977)', 1447)\n",
1236 | "(4.520826234767605, u'Double Indemnity (1944)', 274)\n",
1237 | "(4.486983239408896, u'Seven Samurai (The Magnificent Seven) (Shichinin no samurai) (1954)', 278)\n",
1238 | "(4.47968409678268, u'Star Wars: Episode V - The Empire Strikes Back (1980)', 1438)\n",
1239 | "(4.458990863192621, u'Close Shave, A (1995)', 318)\n",
1240 | "(4.449010736622215, u'Gaslight (1944)', 101)\n",
1241 | "(4.441374873132692, u'Bridge on the River Kwai, The (1957)', 481)\n",
1242 | "(4.435936992100364, u'Godfather, The (1972)', 1047)\n",
1243 | "(4.428243993345163, u'Sixth Sense, The (1999)', 1110)\n",
1244 | "(4.4238592489528035, u'Pulp Fiction (1994)', 1039)\n",
1245 | "(4.418094843257308, u'Killer, The (Die xue shuang xiong) (1989)', 117)\n",
1246 | "(4.417747761006179, u'Monty Python and the Holy Grail (1974)', 759)\n",
1247 | "(4.416446738056047, u'Silence of the Lambs, The (1991)', 1248)\n",
1248 | "(4.400408520322576, u'Maltese Falcon, The (1941)', 476)\n"
1249 | ]
1250 | }
1251 | ],
1252 | "source": [
1253 | "# TODO: Replace with appropriate code\n",
1254 | "\n",
1255 | "# Transform movieIDsWithAvgRatingsRDD from part (1b), which has the form (MovieID, (number of ratings, average rating)), into an RDD of the form (MovieID, number of ratings)\n",
1256 | "movieCountsRDD = movieIDsWithAvgRatingsRDD.map(lambda x: (x[0], x[1][0]))\n",
1257 | "\n",
1258 | "# Transform predictedRatingsRDD into an RDD with entries that are pairs of the form (Movie ID, Predicted Rating)\n",
1259 | "predictedRDD = predictedRatingsRDD.map(lambda x: (x[1], x[2]))\n",
1260 | "\n",
1261 | "# Use RDD transformations with predictedRDD and movieCountsRDD to yield an RDD with tuples of the form (Movie ID, (Predicted Rating, number of ratings))\n",
1262 | "predictedWithCountsRDD = (predictedRDD\n",
1263 | " .join(movieCountsRDD))\n",
1264 | "\n",
1265 | "# Use RDD transformations with predictedWithCountsRDD and moviesRDD to yield an RDD with tuples of the form (Predicted Rating, Movie Name, number of ratings), for movies with more than 75 ratings\n",
1266 | "ratingsWithNamesRDD = (predictedWithCountsRDD\n",
1267 | " .filter(lambda x: x[1][1] > 75)\n",
1268 | " .join(moviesRDD)\n",
1269 | " .map(lambda x: (x[1][0][0], x[1][1], x[1][0][1])))\n",
1270 | "\n",
1271 | "predictedHighestRatedMovies = ratingsWithNamesRDD.takeOrdered(20, key=lambda x: -x[0])\n",
1272 | "print ('My highest rated movies as predicted (for movies with more than 75 reviews):\\n%s' %\n",
1273 | " '\\n'.join(map(str, predictedHighestRatedMovies)))"
1274 | ]
1275 | }
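The nested indexing in the last cell (`x[1][0][0]` and friends) falls out of how `join` shapes its output: each `join` produces (key, (leftValue, rightValue)) pairs, so values nest one level deeper per join. A toy illustration with made-up RDDs, again assuming the `sc` SparkContext from the lab environment:

```python
# Each join() yields (key, (leftValue, rightValue)) pairs, so two joins in a
# row nest the values two levels deep. All RDDs here are made-up toy data.
predicted = sc.parallelize([(10, 4.2), (30, 3.1)])   # (movieID, predicted rating)
counts = sc.parallelize([(10, 120), (30, 80)])       # (movieID, number of ratings)
names = sc.parallelize([(10, u'Movie A'), (30, u'Movie B')])

withCounts = predicted.join(counts)  # (movieID, (predicted rating, count))
withNames = withCounts.join(names)   # (movieID, ((predicted rating, count), name))

# x[1][0][0] is the predicted rating, x[1][1] the name, x[1][0][1] the count
print(withNames.map(lambda x: (x[1][0][0], x[1][1], x[1][0][1])).collect())
```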
1276 | ],
1277 | "metadata": {
1278 | "kernelspec": {
1279 | "display_name": "Python 2",
1280 | "language": "python",
1281 | "name": "python2"
1282 | },
1283 | "language_info": {
1284 | "codemirror_mode": {
1285 | "name": "ipython",
1286 | "version": 2
1287 | },
1288 | "file_extension": ".py",
1289 | "mimetype": "text/x-python",
1290 | "name": "python",
1291 | "nbconvert_exporter": "python",
1292 | "pygments_lexer": "ipython2",
1293 | "version": "2.7.6"
1294 | }
1295 | },
1296 | "nbformat": 4,
1297 | "nbformat_minor": 0
1298 | }
1299 | 
--------------------------------------------------------------------------------