├── Analyzing US Economic Data and Building Dashboard.ipynb
├── Final Project - Data Visualization.ipynb
├── First Notebook.ipynb
├── House Sales in King County, USA Project.ipynb
├── Machine Learning Final Project.ipynb
├── Neighborhoods in Mumbai to Open a Restaurant.ipynb
├── README.md
├── SQL Assignment - Chicago.ipynb
└── Segmenting and Clustering Neighborhoods in Toronto.ipynb
/Analyzing US Economic Data and Building Dashboard.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": ""
7 | },
8 | {
9 | "cell_type": "markdown",
10 | "metadata": {},
11 | "source": "Analyzing US Economic Data and Building a Dashboard\n\nDescription\n"
12 | },
13 | {
14 | "cell_type": "markdown",
15 | "metadata": {},
16 | "source": "Extracting essential data from a dataset and displaying it is a necessary part of data science, so that individuals can make informed decisions based on the data. In this assignment, you will extract some essential economic indicators from a dataset and display them in a dashboard. You can then share the dashboard via a URL.\n\nGross domestic product (GDP) is a measure of the market value of all the final goods and services produced in a period. GDP is an indicator of how well the economy is doing: a drop in GDP indicates the economy is producing less, while an increase in GDP suggests the economy is performing better. In this lab, you will examine how changes in GDP impact the unemployment rate. You will take screenshots of every step, and you will share the notebook and the URL pointing to the dashboard.\n"
41 | },
42 | "metadata": {},
43 | "output_type": "display_data"
44 | },
45 | {
46 | "data": {
47 | "application/javascript": "/* BokehJS 1.0.4 autoload boilerplate omitted: CDN loader script that injects bokeh, bokeh-widgets, bokeh-tables and bokeh-gl 1.0.4 JS/CSS and registers the notebook output renderer */",
48 | "application/vnd.bokehjs_load.v0+json": "/* BokehJS load boilerplate omitted (duplicate of the autoload script above) */"
49 | },
50 | "metadata": {},
51 | "output_type": "display_data"
52 | }
53 | ],
54 | "source": "import pandas as pd\nfrom bokeh.plotting import figure, output_file, show, output_notebook\noutput_notebook()"
55 | },
56 | {
57 | "cell_type": "markdown",
58 | "metadata": {},
59 | "source": "In this section, we define the function make_dashboard. You don't have to know how the function works; you only need to care about its inputs. The function will produce a dashboard as well as an HTML file that you can use to share your dashboard. If you do not know what an HTML file is, don't worry: everything you need to know will be provided in the lab."
60 | },
61 | {
62 | "cell_type": "code",
63 | "execution_count": 84,
64 | "metadata": {},
65 | "outputs": [],
66 | "source": "def make_dashboard(x, gdp_change, unemployment, title, file_name):\n output_file(file_name)\n p = figure(title=title, x_axis_label='year', y_axis_label='%')\n p.line(x.squeeze(), gdp_change.squeeze(), color=\"firebrick\", line_width=4, legend=\"% GDP change\")\n p.line(x.squeeze(), unemployment.squeeze(), line_width=4, legend=\"% unemployed\")\n show(p)"
67 | },
68 | {
69 | "cell_type": "markdown",
70 | "metadata": {},
71 | "source": "The dictionary links contains the URLs of the CSV files with all the data. The value for the key GDP is the file that contains the GDP data, and the value for the key unemployment is the file that contains the unemployment data."
72 | },
73 | {
74 | "cell_type": "code",
75 | "execution_count": 85,
76 | "metadata": {},
77 | "outputs": [],
78 | "source": "links={'GDP':'https://s3-api.us-geo.objectstorage.softlayer.net/cf-courses-data/CognitiveClass/PY0101EN/projects/coursera_project/clean_gdp.csv',\\\n 'unemployment':'https://s3-api.us-geo.objectstorage.softlayer.net/cf-courses-data/CognitiveClass/PY0101EN/projects/coursera_project/clean_unemployment.csv'}"
79 | },
80 | {
81 | "cell_type": "markdown",
82 | "metadata": {},
83 | "source": "Question 1: Create a dataframe that contains the GDP data and display the first five rows of the dataframe."
84 | },
85 | {
86 | "cell_type": "markdown",
87 | "metadata": {},
88 | "source": "Use the dictionary links and the function pd.read_csv to create a Pandas dataframe that contains the GDP data."
89 | },
90 | {
91 | "cell_type": "markdown",
92 | "metadata": {},
93 | "source": "Hint: links[\"GDP\"] contains the path or name of the file."
94 | },
95 | {
96 | "cell_type": "code",
97 | "execution_count": 86,
98 | "metadata": {},
99 | "outputs": [],
100 | "source": "# Type your code here\ndf_gdp = pd.read_csv(links['GDP'])"
101 | },
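As an aside on the cell above: pd.read_csv accepts a URL directly, so links['GDP'] can be passed as-is. A minimal offline sketch of the same step, using a hypothetical two-row stand-in for clean_gdp.csv (the column names here are assumptions, not the real file's guaranteed schema):

```python
import io
import pandas as pd

# Hypothetical stand-in for the CSV hosted at links['GDP']
csv_text = "date,change-current\n1948,-0.7\n1949,10.0\n"

# Same call shape as df_gdp = pd.read_csv(links['GDP'])
df_gdp = pd.read_csv(io.StringIO(csv_text))
print(list(df_gdp.columns))
print(df_gdp.head())
```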
102 | {
103 | "cell_type": "markdown",
104 | "metadata": {},
105 | "source": "Use the method head() to display the first five rows of the GDP data, then take a screenshot."
106 | },
107 | {
108 | "cell_type": "code",
109 | "execution_count": 87,
110 | "metadata": {},
111 | "outputs": [
112 | {
113 | "data": {
114 | "text/html": "
Question 4: Use the function make_dashboard to make a dashboard
"
189 | },
190 | {
191 | "cell_type": "markdown",
192 | "metadata": {},
193 | "source": "In this section, you will call the function make_dashboard to produce a dashboard. We will use the convention of giving each variable the same name as the function parameter."
194 | },
195 | {
196 | "cell_type": "markdown",
197 | "metadata": {},
198 | "source": "Create a new dataframe called x containing the column 'date' from the dataframe that contains the GDP data."
199 | },
200 | {
201 | "cell_type": "code",
202 | "execution_count": 91,
203 | "metadata": {},
204 | "outputs": [],
205 | "source": "x = df_gdp['date'] # Create your dataframe with column date"
206 | },
207 | {
208 | "cell_type": "markdown",
209 | "metadata": {},
210 | "source": "Create a new dataframe called gdp_change containing the column 'change-current' from the dataframe that contains the GDP data."
211 | },
212 | {
213 | "cell_type": "code",
214 | "execution_count": 92,
215 | "metadata": {},
216 | "outputs": [],
217 | "source": "gdp_change = df_gdp['change-current'] # Create your dataframe with column change-current"
218 | },
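A note on the selections in this section: indexing with single brackets, as in the cell above, yields a pandas Series rather than a DataFrame, while double brackets yield a one-column DataFrame. make_dashboard tolerates either form because it calls .squeeze() on its inputs, which collapses a one-column DataFrame to a Series and leaves a Series unchanged. A small sketch (the mini-frame is a hypothetical stand-in for df_gdp):

```python
import pandas as pd

# Hypothetical stand-in for the GDP dataframe
df_gdp = pd.DataFrame({"date": [1948, 1949, 1950],
                       "change-current": [-0.7, 10.0, 15.7]})

as_series = df_gdp["change-current"]    # single brackets -> Series
as_frame = df_gdp[["change-current"]]   # double brackets -> one-column DataFrame

# .squeeze() makes the two interchangeable as plotting inputs
print(type(as_series.squeeze()).__name__)
print(type(as_frame.squeeze()).__name__)
```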
219 | {
220 | "cell_type": "markdown",
221 | "metadata": {},
222 | "source": "Create a new dataframe called unemployment containing the column 'unemployment' from the dataframe that contains the unemployment data."
223 | },
224 | {
225 | "cell_type": "code",
226 | "execution_count": 93,
227 | "metadata": {},
228 | "outputs": [],
229 | "source": "unemployment = df_unemployment['unemployment'] # Create your dataframe with column unemployment"
230 | },
231 | {
232 | "cell_type": "markdown",
233 | "metadata": {},
234 | "source": "Give your dashboard a string title, and assign it to the variable title."
235 | },
236 | {
237 | "cell_type": "code",
238 | "execution_count": 94,
239 | "metadata": {},
240 | "outputs": [],
241 | "source": "title = 'Dashboard for Unemployment and GDP Change' # Give your dashboard a string title"
242 | },
243 | {
244 | "cell_type": "markdown",
245 | "metadata": {},
246 | "source": "Finally, the function make_dashboard will output an .html file in your directory, just like a CSV file. The name of the file is \"index.html\", and it is stored in the variable file_name."
247 | },
248 | {
249 | "cell_type": "code",
250 | "execution_count": 95,
251 | "metadata": {},
252 | "outputs": [],
253 | "source": "file_name = \"index.html\""
254 | },
255 | {
256 | "cell_type": "markdown",
257 | "metadata": {},
258 | "source": "Call the function make_dashboard to produce a dashboard. Assign the parameter values accordingly, take a screenshot of the dashboard, and submit it."
259 | },
260 | {
261 | "cell_type": "code",
262 | "execution_count": 96,
263 | "metadata": {},
264 | "outputs": [
265 | {
266 | "data": {
267 | "text/html": "\n\n\n\n\n\n \n"
268 | },
269 | "metadata": {},
270 | "output_type": "display_data"
271 | },
272 | {
273 | "data": {
274 | "application/javascript": "/* BokehJS embed_document payload omitted: serialized Bokeh 1.0.4 document for the dashboard titled 'Dashboard for Unemployment and GDP Change', with x axis 'year', y axis '%', and two line glyphs ('% GDP change' in firebrick, '% unemployed') over 1948-2016 */",
275 | "application/vnd.bokehjs_exec.v0+json": ""
276 | },
277 | "metadata": {
278 | "application/vnd.bokehjs_exec.v0+json": {
279 | "id": "1942"
280 | }
281 | },
282 | "output_type": "display_data"
283 | }
284 | ],
285 | "source": "# Fill up the parameters in the following function:\nmake_dashboard(x=x, gdp_change=gdp_change, unemployment=unemployment, title=title, file_name=file_name)"
286 | },
287 | {
288 | "cell_type": "markdown",
289 | "metadata": {},
290 | "source": "(Optional, not marked) Save the dashboard on IBM Cloud and display it"
291 | },
292 | {
293 | "cell_type": "markdown",
294 | "metadata": {},
295 | "source": "From the tutorial PROVISIONING AN OBJECT STORAGE INSTANCE ON IBM CLOUD, copy the JSON object containing the credentials you created. You\u2019ll want to store everything you see in a credentials variable like the one below (obviously, replace the placeholder values with your own). Take special note of your access_key_id and secret_access_key. Do not delete # @hidden_cell, as it prevents people from seeing your credentials when you share your notebook."
296 | },
297 | {
298 | "cell_type": "markdown",
299 | "metadata": {},
300 | "source": "\ncredentials = { \n \"apikey\": \"your-api-key\", \n \"cos_hmac_keys\": { \n \"access_key_id\": \"your-access-key-here\", \n \"secret_access_key\": \"your-secret-access-key-here\" \n }, \n\n\n \"endpoints\": \"your-endpoints\", \n \"iam_apikey_description\": \"your-iam_apikey_description\", \n \"iam_apikey_name\": \"your-iam_apikey_name\", \n \"iam_role_crn\": \"your-iam_apikey_name\", \n \"iam_serviceid_crn\": \"your-iam_serviceid_crn\", \n \"resource_instance_id\": \"your-resource_instance_id\" \n}\n"
301 | },
302 | {
303 | "cell_type": "code",
304 | "execution_count": 97,
305 | "metadata": {},
306 | "outputs": [],
307 | "source": "# The code was removed by Watson Studio for sharing."
308 | },
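The cells above read nested HMAC fields out of the credentials dictionary. A small helper that fails early with a clear error when a field is missing can save debugging time; this is a sketch using placeholder values, not real keys, and `hmac_keys` is a hypothetical helper, not part of the course material.

```python
# Hypothetical credentials dictionary matching the shape used in this notebook;
# all values are placeholders, not real keys.
credentials = {
    "apikey": "your-api-key",
    "cos_hmac_keys": {
        "access_key_id": "your-access-key-here",
        "secret_access_key": "your-secret-access-key-here",
    },
}

def hmac_keys(creds):
    """Return (access_key_id, secret_access_key), failing early if absent."""
    hmac = creds.get("cos_hmac_keys", {})
    missing = {"access_key_id", "secret_access_key"} - hmac.keys()
    if missing:
        raise KeyError(f"credentials missing HMAC fields: {sorted(missing)}")
    return hmac["access_key_id"], hmac["secret_access_key"]

access_key, secret_key = hmac_keys(credentials)
```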
309 | {
310 | "cell_type": "markdown",
311 | "metadata": {},
312 | "source": "You will need the endpoint. Make sure the settings are the same as in PROVISIONING AN OBJECT STORAGE INSTANCE ON IBM CLOUD, and assign the endpoint URL to the variable endpoint. "
313 | },
314 | {
315 | "cell_type": "code",
316 | "execution_count": 98,
317 | "metadata": {},
318 | "outputs": [],
319 | "source": "endpoint = 'https://s3-api.us-geo.objectstorage.softlayer.net'"
320 | },
321 | {
322 | "cell_type": "markdown",
323 | "metadata": {},
324 | "source": "From the tutorial PROVISIONING AN OBJECT STORAGE INSTANCE ON IBM CLOUD assign the name of your bucket to the variable bucket_name "
325 | },
326 | {
327 | "cell_type": "code",
328 | "execution_count": 99,
329 | "metadata": {},
330 | "outputs": [],
331 | "source": "bucket_name = 'python-for-ds-and-ai-bucket' # Type your bucket name on IBM Cloud"
332 | },
333 | {
334 | "cell_type": "markdown",
335 | "metadata": {},
336 | "source": "We can access IBM Cloud Object Storage with Python using the boto3 library, which we\u2019ll import below:"
337 | },
338 | {
339 | "cell_type": "code",
340 | "execution_count": 100,
341 | "metadata": {},
342 | "outputs": [],
343 | "source": "import boto3"
344 | },
345 | {
346 | "cell_type": "markdown",
347 | "metadata": {},
348 | "source": "We can interact with IBM Cloud Object Storage through a boto3 resource object."
349 | },
350 | {
351 | "cell_type": "code",
352 | "execution_count": 101,
353 | "metadata": {},
354 | "outputs": [],
355 | "source": "resource = boto3.resource(\n 's3',\n aws_access_key_id = credentials[\"cos_hmac_keys\"]['access_key_id'],\n aws_secret_access_key = credentials[\"cos_hmac_keys\"][\"secret_access_key\"],\n endpoint_url = endpoint,\n)"
356 | },
357 | {
358 | "cell_type": "markdown",
359 | "metadata": {},
360 | "source": "We are going to use open to create a file object. To get the path of the file, concatenate the directory stored in the variable directory with the name of the file stored in the variable file_name using the + operator, and assign the result to the variable html_path. We will use the function getcwd() to find the current working directory."
361 | },
362 | {
363 | "cell_type": "code",
364 | "execution_count": 102,
365 | "metadata": {},
366 | "outputs": [],
367 | "source": "import os\n\ndirectory = os.getcwd()\nhtml_path = directory + \"/\" + file_name"
368 | },
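The cell above builds the path with string concatenation. An alternative sketch using `os.path.join`, which normalizes separators and avoids a doubled slash if the directory already ends with one (the file name `index.html` is assumed from the earlier dashboard cells):

```python
import os

file_name = "index.html"   # assumed dashboard file name from earlier cells
directory = os.getcwd()

# os.path.join inserts the separator itself, so this works even if
# `directory` already ends with a slash.
html_path = os.path.join(directory, file_name)
```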
369 | {
370 | "cell_type": "markdown",
371 | "metadata": {},
372 | "source": "Now read the HTML file: use the function f = open(html_path, mode) to create a file object and assign it to the variable f. The parameter file should be the variable html_path, and the mode should be \"r\" for read. "
373 | },
374 | {
375 | "cell_type": "code",
376 | "execution_count": 103,
377 | "metadata": {},
378 | "outputs": [],
379 | "source": "# Type your code here\nf = open(file=html_path, mode='r')"
380 | },
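A `with` statement is a slightly safer variant of the `open` call above, since it closes the file handle even if a later upload step raises. This sketch stands in a temporary file for the real dashboard HTML, which is not available here:

```python
import os
import tempfile

# Stand-in for the dashboard HTML produced earlier; the real notebook
# would open `html_path` instead.
tmp = tempfile.NamedTemporaryFile("w", suffix=".html", delete=False)
tmp.write("<html><body>dashboard</body></html>")
tmp.close()

# `with` closes the file automatically, even if put_object() were to raise.
with open(tmp.name, mode="r") as f:
    html_body = f.read()

os.unlink(tmp.name)
```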
381 | {
382 | "cell_type": "markdown",
383 | "metadata": {},
384 | "source": "To load your dashboard into the bucket we will use the method put_object. Set the parameter name to the name of the bucket, the parameter Key to the name of the HTML file, and the parameter Body to f.read()."
385 | },
386 | {
387 | "cell_type": "code",
388 | "execution_count": 104,
389 | "metadata": {},
390 | "outputs": [
391 | {
392 | "data": {
393 | "text/plain": "s3.Object(bucket_name='python-for-ds-and-ai-bucket', key='index.html')"
394 | },
395 | "execution_count": 104,
396 | "metadata": {},
397 | "output_type": "execute_result"
398 | }
399 | ],
400 | "source": "# Fill up the parameters in the following function:\nresource.Bucket(name=bucket_name).put_object(Key=file_name, Body=f.read())"
401 | },
402 | {
403 | "cell_type": "markdown",
404 | "metadata": {},
405 | "source": "In the dictionary Params, provide the bucket name as the value for the key 'Bucket'. For the value of the key 'Key', add the name of the HTML file; both values should be strings."
406 | },
407 | {
408 | "cell_type": "code",
409 | "execution_count": 105,
410 | "metadata": {},
411 | "outputs": [],
412 | "source": "# Fill in the value for each key\nParams = {'Bucket': bucket_name,'Key': file_name}"
413 | },
414 | {
415 | "cell_type": "markdown",
416 | "metadata": {},
417 | "source": "The following lines of code will generate a URL to share your dashboard. The URL only lasts seven days, but don't worry: you will get full marks as long as the URL is visible in your notebook. "
418 | },
419 | {
420 | "cell_type": "code",
421 | "execution_count": 106,
422 | "metadata": {},
423 | "outputs": [
424 | {
425 | "name": "stdout",
426 | "output_type": "stream",
427 | "text": "https://s3-api.us-geo.objectstorage.softlayer.net/python-for-ds-and-ai-bucket/index.html?AWSAccessKeyId=58a4eaefd7364ccbba025153bff5738b&Signature=ZnxTgAFOI3kUNeJhfHTRejpMFy8%3D&Expires=1596791728\n"
428 | }
429 | ],
430 | "source": "import sys\ntime = 7*24*60**2\nclient = boto3.client(\n 's3',\n aws_access_key_id = credentials[\"cos_hmac_keys\"]['access_key_id'],\n aws_secret_access_key = credentials[\"cos_hmac_keys\"][\"secret_access_key\"],\n endpoint_url=endpoint,\n\n)\nurl = client.generate_presigned_url('get_object',Params=Params,ExpiresIn=time)\nprint(url)"
431 | },
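The expression `7*24*60**2` in the cell above is seven days expressed in seconds, which is what `generate_presigned_url` expects for `ExpiresIn`. A `timedelta` makes that intent explicit; this is a small stdlib sketch, equivalent to the arithmetic used in the notebook:

```python
from datetime import timedelta

# ExpiresIn is given in seconds; timedelta spells out the "seven days" intent.
expires_in = int(timedelta(days=7).total_seconds())
```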
432 | {
433 | "cell_type": "markdown",
434 | "metadata": {},
435 | "source": "Once you complete your notebook you will have to share it to be marked. Select the icon on the top right, marked in red in the image below; a dialogue box should open. Select the option \"all content excluding sensitive code cells\".\n\nYou can then share the notebook via a URL by scrolling down, as shown in the following image:\n\nJoseph Santarcangelo has a PhD in Electrical Engineering. His research focused on using machine learning, signal processing, and computer vision to determine how videos impact human cognition. Joseph has been working for IBM since he completed his PhD."
7 | },
8 | {
9 | "cell_type": "markdown",
10 | "metadata": {},
11 | "source": "# House Sales in King County, USA"
12 | },
13 | {
14 | "cell_type": "markdown",
15 | "metadata": {},
16 | "source": "This dataset contains house sale prices for King County, which includes Seattle. It includes homes sold between May 2014 and May 2015."
17 | },
18 | {
19 | "cell_type": "markdown",
20 | "metadata": {},
21 | "source": "id: A notation for a house\n\ndate: Date the house was sold\n\nprice: Price (the prediction target)\n\nbedrooms: Number of bedrooms\n\nbathrooms: Number of bathrooms\n\nsqft_living: Square footage of the home\n\nsqft_lot: Square footage of the lot\n\nfloors: Total floors (levels) in the house\n\nwaterfront: Whether the house has a view of a waterfront\n\nview: Has been viewed\n\ncondition: How good the overall condition is\n\ngrade: Overall grade given to the housing unit, based on the King County grading system\n\nsqft_above: Square footage of the house apart from the basement\n\nsqft_basement: Square footage of the basement\n\nyr_built: Year the house was built\n\nyr_renovated: Year the house was renovated\n\nzipcode: Zip code\n\nlat: Latitude coordinate\n\nlong: Longitude coordinate\n\nsqft_living15: Living room area in 2015 (implies some renovations); this might or might not have affected the lot size area\n\nsqft_lot15: Lot size area in 2015 (implies some renovations)"
22 | },
23 | {
24 | "cell_type": "markdown",
25 | "metadata": {},
26 | "source": "You will require the following libraries: "
27 | },
28 | {
29 | "cell_type": "code",
30 | "execution_count": 7,
31 | "metadata": {},
32 | "outputs": [],
33 | "source": "import pandas as pd\nimport matplotlib.pyplot as plt\nimport numpy as np\nimport seaborn as sns\nfrom sklearn.pipeline import Pipeline\nfrom sklearn.preprocessing import StandardScaler,PolynomialFeatures\nfrom sklearn.linear_model import LinearRegression\n%matplotlib inline"
34 | },
35 | {
36 | "cell_type": "markdown",
37 | "metadata": {},
38 | "source": "# Module 1: Importing Data Sets "
39 | },
40 | {
41 | "cell_type": "markdown",
42 | "metadata": {},
43 | "source": " Load the csv: "
44 | },
45 | {
46 | "cell_type": "code",
47 | "execution_count": 8,
48 | "metadata": {
49 | "jupyter": {
50 | "outputs_hidden": false
51 | }
52 | },
53 | "outputs": [],
54 | "source": "file_name='https://s3-api.us-geo.objectstorage.softlayer.net/cf-courses-data/CognitiveClass/DA0101EN/coursera/project/kc_house_data_NaN.csv'\ndf=pd.read_csv(file_name)"
55 | },
56 | {
57 | "cell_type": "markdown",
58 | "metadata": {},
59 | "source": "\nWe use the method head to display the first 5 rows of the dataframe."
60 | },
61 | {
62 | "cell_type": "code",
63 | "execution_count": 9,
64 | "metadata": {},
65 | "outputs": [
66 | {
67 | "data": {
68 | "text/html": "
"
315 | },
316 | "metadata": {
317 | "needs_background": "light"
318 | },
319 | "output_type": "display_data"
320 | }
321 | ],
322 | "source": "sns.regplot(x='sqft_above', y='price', data=df)"
323 | },
324 | {
325 | "cell_type": "markdown",
326 | "metadata": {},
327 | "source": "\nWe can use the Pandas method corr() to find the feature other than price that is most correlated with price."
328 | },
329 | {
330 | "cell_type": "code",
331 | "execution_count": 32,
332 | "metadata": {
333 | "jupyter": {
334 | "outputs_hidden": false
335 | }
336 | },
337 | "outputs": [
338 | {
339 | "data": {
340 | "text/plain": "zipcode -0.053203\nlong 0.021626\ncondition 0.036362\nyr_built 0.054012\nsqft_lot15 0.082447\nsqft_lot 0.089661\nyr_renovated 0.126434\nfloors 0.256794\nwaterfront 0.266369\nlat 0.307003\nbedrooms 0.308797\nsqft_basement 0.323816\nview 0.397293\nbathrooms 0.525738\nsqft_living15 0.585379\nsqft_above 0.605567\ngrade 0.667434\nsqft_living 0.702035\nprice 1.000000\nName: price, dtype: float64"
341 | },
342 | "execution_count": 32,
343 | "metadata": {},
344 | "output_type": "execute_result"
345 | }
346 | ],
347 | "source": "df.corr()['price'].sort_values()"
348 | },
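The `corr()['price'].sort_values()` pattern above surfaces the most price-correlated feature as the last non-price entry. A minimal sketch of the same idea on a toy frame (the numbers below are made up purely for illustration, not taken from the King County data):

```python
import pandas as pd

# Toy data (made-up numbers) mimicking the pattern used on the housing frame.
toy = pd.DataFrame({
    "price":       [200, 300, 400, 500, 600],
    "sqft_living": [900, 1400, 1900, 2500, 3000],
    "zipcode":     [98001, 98105, 98033, 98002, 98199],
})

corr = toy.corr()["price"].sort_values()
# Drop price itself (self-correlation is exactly 1.0) to find the top predictor.
best_feature = corr.drop("price").idxmax()
```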
349 | {
350 | "cell_type": "markdown",
351 | "metadata": {},
352 | "source": "# Module 4: Model Development"
353 | },
354 | {
355 | "cell_type": "markdown",
356 | "metadata": {},
357 | "source": "\nWe can fit a linear regression model using the longitude feature 'long' and calculate the R^2."
358 | },
359 | {
360 | "cell_type": "code",
361 | "execution_count": 33,
362 | "metadata": {
363 | "jupyter": {
364 | "outputs_hidden": false
365 | }
366 | },
367 | "outputs": [
368 | {
369 | "data": {
370 | "text/plain": "0.00046769430149007363"
371 | },
372 | "execution_count": 33,
373 | "metadata": {},
374 | "output_type": "execute_result"
375 | }
376 | ],
377 | "source": "X = df[['long']]\nY = df['price']\nlm = LinearRegression()\nlm.fit(X,Y)\nlm.score(X, Y)"
378 | },
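`lm.score(X, Y)` above returns the coefficient of determination, R² = 1 − SS_res/SS_tot, which is why the near-useless `long` feature scores close to zero. A small NumPy sketch of that formula (a perfect fit scores exactly 1.0; always predicting the mean scores 0.0):

```python
import numpy as np

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)          # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
    return 1.0 - ss_res / ss_tot

y = np.array([1.0, 2.0, 3.0, 4.0])
```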
379 | {
380 | "cell_type": "markdown",
381 | "metadata": {},
382 | "source": "### Question 6\nFit a linear regression model to predict the 'price' using the feature 'sqft_living' then calculate the R^2. Take a screenshot of your code and the value of the R^2."
383 | },
384 | {
385 | "cell_type": "code",
386 | "execution_count": 35,
387 | "metadata": {
388 | "jupyter": {
389 | "outputs_hidden": false
390 | }
391 | },
392 | "outputs": [
393 | {
394 | "data": {
395 | "text/plain": "0.49285321790379316"
396 | },
397 | "execution_count": 35,
398 | "metadata": {},
399 | "output_type": "execute_result"
400 | }
401 | ],
402 | "source": "X1 = df[['sqft_living']]\nY1 = df[['price']]\nlm1 = LinearRegression().fit(X1, Y1)\nlm1.score(X1, Y1)"
403 | },
404 | {
405 | "cell_type": "markdown",
406 | "metadata": {},
407 | "source": "### Question 7\nFit a linear regression model to predict the 'price' using the list of features:"
408 | },
409 | {
410 | "cell_type": "code",
411 | "execution_count": 37,
412 | "metadata": {},
413 | "outputs": [
414 | {
415 | "data": {
416 | "text/plain": "0.657679183672129"
417 | },
418 | "execution_count": 37,
419 | "metadata": {},
420 | "output_type": "execute_result"
421 | }
422 | ],
423 | "source": "features =[\"floors\", \"waterfront\",\"lat\" ,\"bedrooms\" ,\"sqft_basement\" ,\"view\" ,\"bathrooms\",\"sqft_living15\",\"sqft_above\",\"grade\",\"sqft_living\"]\nlm2 = LinearRegression().fit(df[features], df[['price']])\nlm2.score(df[features], df[['price']])"
424 | },
425 | {
426 | "cell_type": "markdown",
427 | "metadata": {},
428 | "source": "Then calculate the R^2. Take a screenshot of your code."
429 | },
430 | {
431 | "cell_type": "markdown",
432 | "metadata": {
433 | "jupyter": {
434 | "outputs_hidden": false
435 | }
436 | },
437 | "source": ""
438 | },
439 | {
440 | "cell_type": "markdown",
441 | "metadata": {},
442 | "source": "### This will help with Question 8\n\nCreate a list of tuples, the first element in the tuple contains the name of the estimator:\n\n'scale'\n\n'polynomial'\n\n'model'\n\nThe second element in the tuple contains the model constructor \n\nStandardScaler()\n\nPolynomialFeatures(include_bias=False)\n\nLinearRegression()\n"
443 | },
444 | {
445 | "cell_type": "code",
446 | "execution_count": 38,
447 | "metadata": {},
448 | "outputs": [],
449 | "source": "Input=[('scale',StandardScaler()),('polynomial', PolynomialFeatures(include_bias=False)),('model',LinearRegression())]"
450 | },
451 | {
452 | "cell_type": "markdown",
453 | "metadata": {},
454 | "source": "### Question 8\nUse the list to create a pipeline object to predict the 'price', fit the object using the features in the list features, and calculate the R^2."
455 | },
456 | {
457 | "cell_type": "code",
458 | "execution_count": 40,
459 | "metadata": {
460 | "jupyter": {
461 | "outputs_hidden": false
462 | }
463 | },
464 | "outputs": [
465 | {
466 | "name": "stderr",
467 | "output_type": "stream",
468 | "text": "/opt/conda/envs/Python36/lib/python3.6/site-packages/sklearn/preprocessing/data.py:645: DataConversionWarning: Data with input dtype int64, float64 were all converted to float64 by StandardScaler.\n return self.partial_fit(X, y)\n/opt/conda/envs/Python36/lib/python3.6/site-packages/sklearn/base.py:467: DataConversionWarning: Data with input dtype int64, float64 were all converted to float64 by StandardScaler.\n return self.fit(X, y, **fit_params).transform(X)\n/opt/conda/envs/Python36/lib/python3.6/site-packages/sklearn/pipeline.py:511: DataConversionWarning: Data with input dtype int64, float64 were all converted to float64 by StandardScaler.\n Xt = transform.transform(Xt)\n"
469 | },
470 | {
471 | "data": {
472 | "text/plain": "0.7513408553309376"
473 | },
474 | "execution_count": 40,
475 | "metadata": {},
476 | "output_type": "execute_result"
477 | }
478 | ],
479 | "source": "pipe = Pipeline(Input)\npipe.fit(df[features], df[['price']])\npipe.score(df[features], df[['price']])"
480 | },
481 | {
482 | "cell_type": "markdown",
483 | "metadata": {},
484 | "source": "# Module 5: Model Evaluation and Refinement"
485 | },
486 | {
487 | "cell_type": "markdown",
488 | "metadata": {},
489 | "source": "Import the necessary modules:"
490 | },
491 | {
492 | "cell_type": "code",
493 | "execution_count": 41,
494 | "metadata": {
495 | "jupyter": {
496 | "outputs_hidden": false
497 | }
498 | },
499 | "outputs": [
500 | {
501 | "name": "stdout",
502 | "output_type": "stream",
503 | "text": "done\n"
504 | }
505 | ],
506 | "source": "from sklearn.model_selection import cross_val_score\nfrom sklearn.model_selection import train_test_split\nprint(\"done\")"
507 | },
508 | {
509 | "cell_type": "markdown",
510 | "metadata": {},
511 | "source": "We will split the data into training and testing sets:"
512 | },
513 | {
514 | "cell_type": "code",
515 | "execution_count": 42,
516 | "metadata": {
517 | "jupyter": {
518 | "outputs_hidden": false
519 | }
520 | },
521 | "outputs": [
522 | {
523 | "name": "stdout",
524 | "output_type": "stream",
525 | "text": "number of test samples: 3242\nnumber of training samples: 18371\n"
526 | }
527 | ],
528 | "source": "features =[\"floors\", \"waterfront\",\"lat\" ,\"bedrooms\" ,\"sqft_basement\" ,\"view\" ,\"bathrooms\",\"sqft_living15\",\"sqft_above\",\"grade\",\"sqft_living\"] \nX = df[features]\nY = df['price']\n\nx_train, x_test, y_train, y_test = train_test_split(X, Y, test_size=0.15, random_state=1)\n\n\nprint(\"number of test samples:\", x_test.shape[0])\nprint(\"number of training samples:\",x_train.shape[0])"
529 | },
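The printed split sizes (3242 test, 18371 training) follow from the dataset's 21,613 rows: scikit-learn rounds the fractional test split up, and the remainder goes to training. A quick arithmetic check, assuming that ceiling behavior:

```python
import math

n_rows = 21613     # total rows in the King County dataset (3242 + 18371)
test_size = 0.15

# scikit-learn takes ceil(test_size * n) rows for the test set.
n_test = math.ceil(test_size * n_rows)
n_train = n_rows - n_test
```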
530 | {
531 | "cell_type": "markdown",
532 | "metadata": {},
533 | "source": "### Question 9\nCreate and fit a Ridge regression object using the training data, set the regularization parameter to 0.1, and calculate the R^2 using the test data. \n"
534 | },
535 | {
536 | "cell_type": "code",
537 | "execution_count": 43,
538 | "metadata": {},
539 | "outputs": [],
540 | "source": "from sklearn.linear_model import Ridge"
541 | },
542 | {
543 | "cell_type": "code",
544 | "execution_count": 44,
545 | "metadata": {
546 | "jupyter": {
547 | "outputs_hidden": false
548 | }
549 | },
550 | "outputs": [
551 | {
552 | "data": {
553 | "text/plain": "0.6478759163939121"
554 | },
555 | "execution_count": 44,
556 | "metadata": {},
557 | "output_type": "execute_result"
558 | }
559 | ],
560 | "source": "RR = Ridge(alpha=0.1).fit(x_train, y_train)\nRR.score(x_test, y_test)"
561 | },
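Ridge regression minimizes ||y − Xw||² + α||w||², which (ignoring the intercept for simplicity) has the closed form w = (XᵀX + αI)⁻¹Xᵀy. A NumPy sketch of that solution, checked against the α = 0 (ordinary least squares) case on synthetic data; this is an illustration of what `Ridge(alpha=0.1)` does internally, not sklearn's actual implementation:

```python
import numpy as np

def ridge_weights(X, y, alpha):
    """Closed-form ridge solution (no intercept): w = (X^T X + a*I)^-1 X^T y."""
    n_features = X.shape[1]
    A = X.T @ X + alpha * np.eye(n_features)
    return np.linalg.solve(A, X.T @ y)

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
true_w = np.array([1.0, -2.0, 0.5])
y = X @ true_w   # noiseless targets, so OLS recovers true_w exactly

# alpha=0 reduces to ordinary least squares; alpha>0 shrinks weights toward 0.
w_ols = ridge_weights(X, y, alpha=0.0)
w_ridge = ridge_weights(X, y, alpha=10.0)
```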
562 | {
563 | "cell_type": "markdown",
564 | "metadata": {},
565 | "source": "### Question 10\nPerform a second order polynomial transform on both the training data and testing data. Create and fit a Ridge regression object using the training data, set the regularization parameter to 0.1, and calculate the R^2 using the test data provided. Take a screenshot of your code and the R^2."
566 | },
567 | {
568 | "cell_type": "code",
569 | "execution_count": 51,
570 | "metadata": {
571 | "jupyter": {
572 | "outputs_hidden": false
573 | }
574 | },
575 | "outputs": [
576 | {
577 | "data": {
578 | "text/plain": "0.7002744279699229"
579 | },
580 | "execution_count": 51,
581 | "metadata": {},
582 | "output_type": "execute_result"
583 | }
584 | ],
585 | "source": "from sklearn.preprocessing import PolynomialFeatures\npoly = PolynomialFeatures(degree=2)\nx_train_poly = poly.fit_transform(x_train)\nx_test_poly = poly.transform(x_test)\nRR1 = Ridge(alpha=0.1).fit(x_train_poly, y_train)\nRR1.score(x_test_poly, y_test)"
586 | },
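One reason the polynomial model above is slower and more regularization-hungry: a degree-2 expansion of n features produces C(n+2, 2) columns (bias, linear terms, and all degree-2 monomials). For the 11 features used here that is 78 columns. A quick check, assuming `PolynomialFeatures` keeps its default bias column:

```python
from math import comb

n_features = 11   # length of the `features` list used above

# Monomials of degree <= 2 in n variables: C(n + 2, 2) columns,
# counting the bias, the n linear terms, n squares, and C(n, 2) cross terms.
n_poly_columns = comb(n_features + 2, 2)
```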
587 | {
588 | "cell_type": "markdown",
589 | "metadata": {},
590 | "source": "Once you complete your notebook you will have to share it. Select the icon on the top right, marked in red in the image below; a dialogue box should open. Select the option \"all content excluding sensitive code cells\".\n\nYou can then share the notebook via a URL by scrolling down, as shown in the following image:\n\nJoseph Santarcangelo has a PhD in Electrical Engineering. His research focused on using machine learning, signal processing, and computer vision to determine how videos impact human cognition. Joseph has been working for IBM since he completed his PhD."
596 | },
597 | {
598 | "cell_type": "markdown",
599 | "metadata": {},
600 | "source": "Other contributors: Michelle Carey, Mavis Zhou "
601 | },
602 | {
603 | "cell_type": "code",
604 | "execution_count": null,
605 | "metadata": {},
606 | "outputs": [],
607 | "source": ""
608 | }
609 | ],
610 | "metadata": {
611 | "kernelspec": {
612 | "display_name": "Python 3.6",
613 | "language": "python",
614 | "name": "python3"
615 | },
616 | "language_info": {
617 | "codemirror_mode": {
618 | "name": "ipython",
619 | "version": 3
620 | },
621 | "file_extension": ".py",
622 | "mimetype": "text/x-python",
623 | "name": "python",
624 | "nbconvert_exporter": "python",
625 | "pygments_lexer": "ipython3",
626 | "version": "3.6.9"
627 | },
628 | "widgets": {
629 | "state": {},
630 | "version": "1.1.2"
631 | }
632 | },
633 | "nbformat": 4,
634 | "nbformat_minor": 4
635 | }
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # IBM Data Science Professional Certificate Projects
2 |
3 | This repository contains the projects/assignments for courses in the *IBM Data Science Professional Certificate* on Coursera. The professional certificate contains 9 courses. These are as follows:
4 | 1. What is Data Science?
5 | 2. Tools for Data Science
6 | 3. Data Science Methodology
7 | 4. Python for Data Science and AI
8 | 5. Databases and SQL for Data Science
9 | 6. Data Analysis with Python
10 | 7. Data Visualization with Python
11 | 8. Machine Learning with Python
12 | 9. Applied Data Science Capstone
13 |
14 | Project/assignment notebooks for courses 2, 4, 5, 6, 7, 8 and 9 are included in this repository. Courses 1 and 3 only have quizzes as part of their assignments. Hence, there are no notebooks for them.
15 |
16 | The last course, *Applied Data Science Capstone*, has multiple submissions. These include:
17 | 1. A basic introduction to the capstone project notebook for week 1 of the course
18 | 2. An assignment notebook for week 3 of the course
19 | 3. A PDF document highlighting the introduction and data collection for the final project for week 4 of the course
20 | 4. The final project notebook, project report, and project presentation for week 5 of the course
21 |
22 | All of the files mentioned above can be found at https://github.com/raunakbhutoria/Coursera_Capstone. The Coursera_Capstone repository was the one used for making all submissions for the capstone course. Thus, in this repository I will only be including the Week 3 assignment notebook and the final project code notebook. In order to view the report, presentation, and other materials of the capstone course, please visit my Coursera_Capstone repository.
23 |
24 | # Thank You!
25 |
--------------------------------------------------------------------------------
/SQL Assignment - Chicago.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": "Assignment: Notebook for Peer Assignment"
7 | },
8 | {
9 | "cell_type": "markdown",
10 | "metadata": {},
11 | "source": "# Introduction\n\nUsing this Python notebook you will:\n1. Understand 3 Chicago datasets \n1. Load the 3 datasets into 3 tables in a Db2 database\n1. Execute SQL queries to answer assignment questions "
12 | },
13 | {
14 | "cell_type": "markdown",
15 | "metadata": {},
16 | "source": "## Understand the datasets \nTo complete the assignment problems in this notebook you will be using three datasets that are available on the city of Chicago's Data Portal:\n1. Socioeconomic Indicators in Chicago\n1. Chicago Public Schools\n1. Chicago Crime Data\n\n### 1. Socioeconomic Indicators in Chicago\nThis dataset contains a selection of six socioeconomic indicators of public health significance and a \u201chardship index,\u201d for each Chicago community area, for the years 2008 \u2013 2012.\n\nFor this assignment you will use a snapshot of this dataset which can be downloaded from:\nhttps://ibm.box.com/shared/static/05c3415cbfbtfnr2fx4atenb2sd361ze.csv\n\nA detailed description of this dataset and the original dataset can be obtained from the Chicago Data Portal at:\nhttps://data.cityofchicago.org/Health-Human-Services/Census-Data-Selected-socioeconomic-indicators-in-C/kn9c-c2s2\n\n\n\n### 2. Chicago Public Schools\n\nThis dataset shows all school level performance data used to create CPS School Report Cards for the 2011-2012 school year. This dataset is provided by the city of Chicago's Data Portal.\n\nFor this assignment you will use a snapshot of this dataset which can be downloaded from:\nhttps://ibm.box.com/shared/static/f9gjvj1gjmxxzycdhplzt01qtz0s7ew7.csv\n\nA detailed description of this dataset and the original dataset can be obtained from the Chicago Data Portal at:\nhttps://data.cityofchicago.org/Education/Chicago-Public-Schools-Progress-Report-Cards-2011-/9xs2-f89t\n\n\n\n\n### 3. Chicago Crime Data \n\nThis dataset reflects reported incidents of crime (with the exception of murders, where data exists for each victim) that occurred in the City of Chicago from 2001 to present, minus the most recent seven days. \n\nThis dataset is quite large - over 1.5GB in size with over 6.5 million rows. For the purposes of this assignment we will use a much smaller sample of this dataset which can be downloaded from:\nhttps://ibm.box.com/shared/static/svflyugsr9zbqy5bmowgswqemfpm1x7f.csv\n\nA detailed description of this dataset and the original dataset can be obtained from the Chicago Data Portal at:\nhttps://data.cityofchicago.org/Public-Safety/Crimes-2001-to-present/ijzp-q8t2\n"
17 | },
18 | {
19 | "cell_type": "markdown",
20 | "metadata": {},
21 | "source": "### Download the datasets\nIn many cases the dataset to be analyzed is available as a .CSV (comma separated values) file, perhaps on the internet. Click on the links below to download and save the datasets (.CSV files):\n1. __CENSUS_DATA:__ https://ibm.box.com/shared/static/05c3415cbfbtfnr2fx4atenb2sd361ze.csv\n1. __CHICAGO_PUBLIC_SCHOOLS__ https://ibm.box.com/shared/static/f9gjvj1gjmxxzycdhplzt01qtz0s7ew7.csv\n1. __CHICAGO_CRIME_DATA:__ https://ibm.box.com/shared/static/svflyugsr9zbqy5bmowgswqemfpm1x7f.csv\n\n__NOTE:__ Ensure you have downloaded the datasets using the links above instead of directly from the Chicago Data Portal. The versions linked here are subsets of the original datasets and have some of the column names modified to be more database friendly which will make it easier to complete this assignment."
22 | },
23 | {
24 | "cell_type": "markdown",
25 | "metadata": {},
26 | "source": "### Store the datasets in database tables\nTo analyze the data using SQL, it first needs to be stored in the database.\n\nWhile it is easier to read the dataset into a Pandas dataframe and then PERSIST it into the database as we saw in Week 3 Lab 3, it results in mapping to default datatypes which may not be optimal for SQL querying. For example a long textual field may map to a CLOB instead of a VARCHAR. \n\nTherefore, __it is highly recommended to manually load the table using the database console LOAD tool, as indicated in Week 2 Lab 1 Part II__. The only difference with that lab is that in Step 5 of the instructions you will need to click on create \"(+) New Table\" and specify the name of the table you want to create and then click \"Next\". \n\n\n\n##### Now open the Db2 console, open the LOAD tool, Select / Drag the .CSV file for the first dataset, then create a New Table, and follow the on-screen instructions to load the data. Name the new tables as follows:\n1. __CENSUS_DATA__\n1. __CHICAGO_PUBLIC_SCHOOLS__\n1. __CHICAGO_CRIME_DATA__"
27 | },
28 | {
29 | "cell_type": "markdown",
30 | "metadata": {},
31 | "source": "### Connect to the database \nLet us first load the SQL extension and establish a connection with the database"
32 | },
33 | {
34 | "cell_type": "code",
35 | "execution_count": 1,
36 | "metadata": {},
37 | "outputs": [],
38 | "source": "%load_ext sql"
39 | },
40 | {
41 | "cell_type": "markdown",
42 | "metadata": {},
43 | "source": "In the next cell enter your db2 connection string. Recall you created Service Credentials for your Db2 instance in the first lab in Week 3. From the __uri__ field of your Db2 service credentials copy everything after db2:// (except the double quote at the end) and paste it in the cell below after ibm_db_sa://\n\n"
44 | },
45 | {
46 | "cell_type": "code",
47 | "execution_count": 2,
48 | "metadata": {},
49 | "outputs": [
50 | {
51 | "data": {
52 | "text/plain": "'Connected: pgq43854@BLUDB'"
53 | },
54 | "execution_count": 2,
55 | "metadata": {},
56 | "output_type": "execute_result"
57 | }
58 | ],
59 | "source": "# Remember the connection string is of the format:\n# %sql ibm_db_sa://my-username:my-password@my-hostname:my-port/my-db-name\n# Enter the connection string for your Db2 on Cloud database instance below\n%sql ibm_db_sa://pgq43854:3js4p2w4dzrc85%5Ev@dashdb-txn-sbox-yp-lon02-02.services.eu-gb.bluemix.net:50000/BLUDB"
60 | },
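The connection string above contains `%5E`, which is a percent-encoded `^` in the password: special characters must be URL-encoded before they are placed in the URI, or the string will not parse. A stdlib sketch of that encoding step, using a made-up password rather than a real credential:

```python
from urllib.parse import quote

password = "p@ss^word"   # made-up example, not a real credential

# safe="" percent-encodes every reserved character, so '@' and '^'
# cannot be confused with URI delimiters.
encoded = quote(password, safe="")
conn = f"ibm_db_sa://my-username:{encoded}@my-hostname:50000/BLUDB"
```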
61 | {
62 | "cell_type": "markdown",
63 | "metadata": {},
64 | "source": "## Problems\nNow write and execute SQL queries to solve assignment problems\n\n### Problem 1\n\n##### Find the total number of crimes recorded in the CRIME table"
65 | },
66 | {
67 | "cell_type": "code",
68 | "execution_count": 10,
69 | "metadata": {},
70 | "outputs": [
71 | {
72 | "name": "stdout",
73 | "output_type": "stream",
74 | "text": " * ibm_db_sa://pgq43854:***@dashdb-txn-sbox-yp-lon02-02.services.eu-gb.bluemix.net:50000/BLUDB\nDone.\n"
75 | },
76 | {
77 | "data": {
78 | "text/html": "