├── 00_Requirements_and_Installs.ipynb
├── 01_What_is_Machine_Learning.ipynb
├── 02-intro-to-regression-used-car_final.ipynb
├── 03_Classification_heart.ipynb
├── 04-intro-to-NLP-and-topic-modeling.ipynb
├── README.md
├── data
│   ├── data_small_final.csv
│   └── heart_2020_cleaned.csv
├── imgs
│   ├── CRT.png
│   ├── adjusted_r.png
│   ├── emotions.png
│   ├── flying_blind.png
│   ├── logistic_1.png
│   ├── nlp_topics.png
│   ├── r_2.png
│   ├── regression_ml.png
│   └── saab.png
└── solutions
    ├── 02-intro-to-regression-used-car_final_solutions.ipynb
    └── 03_Classification_heart_solutions.ipynb

/00_Requirements_and_Installs.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "## Intro to Machine Learning!\n"
8 | ]
9 | },
10 | {
11 | "cell_type": "markdown",
12 | "metadata": {},
13 | "source": [
14 | "## Set-up \n",
15 | "\n",
16 | "\n",
17 | "### 0) Installing Anaconda\n",
18 | "\n",
19 | "If you haven't already, follow the instructions [here](https://github.com/julialintern/Intro_to_Deep_Learning/anaconda_install/) to **install an updated version of Anaconda** (with Python 3). \n",
20 | "\n",
21 | "Next, check that `conda` is installed by running `conda -V` from your terminal. You should\n",
22 | "receive a response indicating your current `conda` version."
23 | ]
24 | },
25 | {
26 | "cell_type": "markdown",
27 | "metadata": {},
28 | "source": [
29 | "### 1) Install Environment: \n",
30 | "\n",
31 | "\n",
32 | "```bash\n",
33 | "conda create -n ml python=3 \n",
34 | "conda activate ml\n",
35 | "conda install anaconda\n",
36 | "```\n",
37 | "\n",
38 | "\n",
39 | "#### Add the 'ml' kernel to Jupyter\n",
40 | "```bash\n",
41 | "conda install ipykernel\n",
42 | "python -m ipykernel install --user --name ml\n",
43 | "```\n",
44 | "Make sure the `ml` environment is active:\n",
45 | "```bash\n",
46 | "$ conda activate ml\n",
47 | "```\n",
48 | "You can then start Jupyter by running\n",
49 | "\n",
50 | "```bash\n",
51 | "$ jupyter notebook\n",
52 | "```\n",
53 | "\n",
54 | "When starting a new notebook in Jupyter, select \"Kernel ->\n",
55 | "Change Kernel -> ml\" before running any cells."
56 | ]
57 | },
58 | {
59 | "cell_type": "markdown",
60 | "metadata": {},
61 | "source": [
62 | "\n",
63 | "\n"
64 | ]
65 | },
66 | {
67 | "cell_type": "markdown",
68 | "metadata": {
69 | "collapsed": true
70 | },
71 | "source": [
72 | "### 2) Git Clone:\n",
73 | "- If you haven't already, please `git clone` the workshop repo: https://github.com/julialintern/intro_to_machine_learning"
74 | ]
75 | },
76 | {
77 | "cell_type": "markdown",
78 | "metadata": {
79 | "collapsed": true
80 | },
81 | "source": [
82 | "### 3) Testing:\n",
83 | "#### Launch Jupyter Notebook\n"
84 | ]
85 | },
86 | {
87 | "cell_type": "code",
88 | "execution_count": 1,
89 | "metadata": {},
90 | "outputs": [],
91 | "source": [
92 | "# once in your notebook, test that scikit-learn imports without error:\n",
93 | "from sklearn.linear_model import LinearRegression"
94 | ]
95 | },
96 | {
97 | "cell_type": "code",
98 | "execution_count": null,
99 | "metadata": {},
100 | "outputs": [],
101 | "source": []
102 | }
103 | ],
104 | "metadata": {
105 | "kernelspec": {
106 | "display_name": "ml",
107 | "language": "python",
108 | "name": "ml"
109 | },
110 | "language_info": {
111 | "codemirror_mode": {
112 | "name": "ipython",
113 | "version": 3
114 | },
115 | "file_extension": ".py",
116 | "mimetype": "text/x-python",
117 | "name": "python",
118 | "nbconvert_exporter": "python",
119 | "pygments_lexer": "ipython3",
120 | "version": "3.10.11"
121 | }
122 | },
123 | "nbformat": 4,
124 | "nbformat_minor": 2
125 | }
126 |
--------------------------------------------------------------------------------
/01_What_is_Machine_Learning.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "id": "f09755c1",
6 | "metadata": {},
7 | "source": [
8 | "# What is Machine Learning?"
9 | ]
10 | },
11 | {
12 | "cell_type": "markdown",
13 | "id": "0de8d9da",
14 | "metadata": {},
15 | "source": [
16 | "*Machine Learning is a subfield of AI that involves the development of algorithms and statistical models that allow computers to automatically learn and improve from experience without being explicitly programmed. In other words, it is the science of getting computers to learn from data and make predictions or decisions based on that learning.* \n",
17 | "\n",
18 | "*Machine Learning algorithms can be broadly categorized into three types: supervised learning, unsupervised learning and reinforcement learning.*\n",
19 | "\n",
20 | " --ChatGPT"
21 | ]
22 | },
23 | {
24 | "cell_type": "markdown",
25 | "id": "c3cd729b",
26 | "metadata": {},
27 | "source": [
28 | " "
29 | ]
30 | },
31 | {
32 | "cell_type": "markdown",
33 | "id": "ea67f4ba",
34 | "metadata": {},
35 | "source": [
36 | " "
37 | ]
38 | },
39 | {
40 | "cell_type": "markdown",
41 | "id": "ac487c5d",
42 | "metadata": {},
43 | "source": [
44 | " # ALL Things NLP"
45 | ]
46 | },
47 | {
48 | "cell_type": "code",
49 | "execution_count": 3,
50 | "id": "bd4e499e",
51 | "metadata": {},
52 | "outputs": [],
53 | "source": [
54 | "#"
55 | ]
56 | },
57 | {
58 | "cell_type": "markdown",
59 | "id": "fc1c0134",
60 | "metadata": {},
61 | "source": [
62 | ""
63 | ]
64 | },
65 | {
66 | "cell_type": "markdown",
67 | "id": "9aa8f593",
68 | "metadata": {},
69 | "source": [
70 | " "
71 | ]
72 | },
73 | {
74 | "cell_type": "markdown",
75 | "id": "681cb1bd",
76 | "metadata": {},
77 | "source": [
78 | "[source: Finding-good-read-among-billions-of-choices-1220](https://news.mit.edu/2019/finding-good-read-among-billions-of-choices-1220)"
79 | ]
80 | },
81 | {
82 | "cell_type": "markdown",
83 | "id": "552e3e0e",
84 | "metadata": {},
85 | "source": [
86 | "[Machine Learning as per Wikipedia](https://en.wikipedia.org/wiki/Machine_learning)"
87 | ]
88 | },
89 | {
90 | "cell_type": "markdown",
91 | "id": "87acdd98",
92 | "metadata": {},
93 | "source": [
94 | "[it's all around us](https://www.nytimes.com/search?query=ai)"
95 | ]
96 | },
97 | {
98 | "cell_type": "markdown",
99 | "id": "d346a235",
100 | "metadata": {},
101 | "source": [
102 | "A bit about optimization and discovering the ultimate trade-off between: \n",
103 | " \n",
104 | " * Exhibit Exemplary Grit \n",
105 | " * “Insanity is doing the same thing over and over and expecting different results.” \n",
106 | "\n",
107 | " * Be endlessly curious. Try all the things \n",
108 | " * MVP approach (just make sure you deliver by the deadline!) "
109 | ]
110 | },
111 | {
112 | "cell_type": "markdown",
113 | "id": "4631772f",
114 | "metadata": {},
115 | "source": [
116 | "### Quick Overview \n",
117 | "\n",
118 | "Regression modeling - 60 min \n",
119 | "EventX Q&A - 10 min\n",
120 | "\n",
121 | "\n",
122 | "Classification modeling - 60 min \n",
123 | "EventX Q&A - 10 min\n",
124 | "\n",
125 | "\n",
126 | "NLP & Topic Modeling - 20 min \n",
127 | "EventX Q&A - 5 min\n",
128 | "\n",
129 | "Total \t\t - 2 hrs 45 mins"
130 | ]
131 | },
132 | {
133 | "cell_type": "code",
134 | "execution_count": null,
135 | "id": "e37f945c",
136 | "metadata": {},
137 | "outputs": [],
138 | "source": []
139 | }
140 | ],
141 | "metadata": {
142 | "kernelspec": {
143 | "display_name": "ml",
144 | "language": "python",
145 | "name": "ml"
146 | },
147 | "language_info": {
148 | "codemirror_mode": {
149 | "name": "ipython",
150 | "version": 3
151 | },
152 | "file_extension": ".py",
153 | "mimetype": "text/x-python",
154 | "name": "python",
155 | "nbconvert_exporter": "python",
156 | "pygments_lexer": "ipython3",
157 | "version": "3.10.11"
158 | }
159 | },
160 | "nbformat": 4,
161 | "nbformat_minor": 5
162 | }
163 |
--------------------------------------------------------------------------------
/04-intro-to-NLP-and-topic-modeling.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | ""
8 | ]
9 | },
10 | {
11 | "cell_type": "markdown",
12 | "metadata": {},
13 | "source": [
14 | "# Objectives\n",
15 | "By the end of this notebook, students should be able to: \n",
16 | "\n",
17 | "* Develop a basic understanding of how to get started with text data\n",
18 | "* Perform basic preprocessing & vectorization of text data \n",
19 | "* Build and interpret an NMF topic model \n",
20 | "\n",
21 | "Data: \n",
22 | "We'll take a look at: [one million ABC News 
headlines](https://www.kaggle.com/code/thebrownviking20/k-means-clustering-of-1-million-headlines/data)"
23 | ]
24 | },
25 | {
26 | "cell_type": "markdown",
27 | "metadata": {},
28 | "source": [
29 | "# Building an NLP Pipeline\n",
30 | "\n",
31 | "For the pair problem today, we'll build a pipeline that handles the *basic* requirements for an NLP project. The goal is to build a toolbox for converting one or more strings of text into a matrix (retaining textual information along the way)."
32 | ]
33 | },
34 | {
35 | "cell_type": "markdown",
36 | "metadata": {},
37 | "source": [
38 | "## Step 1: Read in Data"
39 | ]
40 | },
41 | {
42 | "cell_type": "code",
43 | "execution_count": 10,
44 | "metadata": {},
45 | "outputs": [],
46 | "source": [
47 | "import pandas as pd\n",
48 | "import numpy as np"
49 | ]
50 | },
51 | {
52 | "cell_type": "code",
53 | "execution_count": 2,
54 | "metadata": {},
55 | "outputs": [
56 | {
57 | "data": {
58 | "text/html": [
59 | "<div>\n",
73 | "<table>\n",
74 | "<thead>\n",
75 | "<tr>\n",
76 | "<th></th>\n",
77 | "<th>publish_date</th>\n",
78 | "<th>headline_text</th>\n",
79 | "</tr>\n",
80 | "</thead>\n",
81 | "<tbody>\n",
82 | "<tr>\n",
83 | "<th>0</th>\n",
84 | "<td>20030219</td>\n",
85 | "<td>aba decides against community broadcasting lic...</td>\n",
86 | "</tr>\n",
87 | "<tr>\n",
88 | "<th>1</th>\n",
89 | "<td>20030219</td>\n",
90 | "<td>act fire witnesses must be aware of defamation</td>\n",
91 | "</tr>\n",
92 | "<tr>\n",
93 | "<th>2</th>\n",
94 | "<td>20030219</td>\n",
95 | "<td>a g calls for infrastructure protection summit</td>\n",
96 | "</tr>\n",
97 | "<tr>\n",
98 | "<th>3</th>\n",
99 | "<td>20030219</td>\n",
100 | "<td>air nz staff in aust strike for pay rise</td>\n",
101 | "</tr>\n",
102 | "<tr>\n",
103 | "<th>4</th>\n",
104 | "<td>20030219</td>\n",
105 | "<td>air nz strike to affect australian travellers</td>\n",
106 | "</tr>\n",
107 | "</tbody>\n",
108 | "</table>\n",
109 | "</div>"
110 | ],
111 | "text/plain": [
112 | "   publish_date                                      headline_text\n",
113 | "0      20030219  aba decides against community broadcasting lic...\n",
114 | "1      20030219     act fire witnesses must be aware of defamation\n",
115 | "2      20030219     a g calls for infrastructure protection summit\n",
116 | "3      20030219           air nz staff in aust strike for pay rise\n",
117 | "4      20030219      air nz strike to affect australian travellers"
118 | ]
119 | },
120 | "execution_count": 2,
121 | "metadata": {},
122 | "output_type": "execute_result"
123 | }
124 | ],
125 | "source": [
126 | "df = pd.read_csv('~/Downloads/abcnews-date-text.csv')  # adjust the path to your copy of the CSV\n",
127 | "df.head()"
128 | ]
129 | },
130 | {
131 | "cell_type": "markdown",
132 | "metadata": {},
133 | "source": [
134 | "## Step 2: Vectorize (part 1)\n",
135 | "\n",
136 | "Using one of the vectorizers imported below from scikit-learn, **convert the `headline_text` pandas Series to a matrix**, where each row represents a document and each column represents a term (a word that appears in some document). The number of rows should match the number of rows in `df`; this collection of documents is called the \"corpus\". The number of columns should be the total number of *distinct* terms (i.e., words) in the corpus; this set of terms is called the \"vocabulary\".\n",
137 | "\n",
138 | "**Build the matrix such that the value at `(i, j)` is the *count* of term (column) `j` in document (row) `i`.**\n",
139 | "\n",
140 | "**What are the terms in this corpus?** *Hint: When using one of these vectorizers, what is the difference between `.vocabulary_` and `.get_feature_names_out()` (called `.get_feature_names()` in scikit-learn releases before 1.0)?*\n",
141 | "\n",
142 | "*Note: By default, these vectorizers return a SciPy sparse matrix.*"
143 | ]
144 | },
145 | {
146 | "cell_type": "code",
147 | "execution_count": 3,
148 | "metadata": {},
149 | "outputs": [],
150 | "source": [
151 | "from sklearn.feature_extraction.text import TfidfVectorizer, CountVectorizer"
152 | ]
153 | },
154 | {
155 | "cell_type": "code",
156 | "execution_count": 4,
157 | "metadata": {},
158 | "outputs": [],
159 | "source": [
160 | "docs = df.headline_text  # one document per headline"
161 | ]
162 | },
163 | {
164 | "cell_type": "code",
165 | "execution_count": 5,
166 | "metadata": {},
167 | "outputs": [],
168 | "source": [
169 | "vec = CountVectorizer()  # raw term counts"
170 | ]
171 | },
172 | {
173 | "cell_type": "code",
174 | "execution_count": 6,
175 | "metadata": {},
176 | "outputs": [],
177 | "source": [
178 | "doc_term = vec.fit_transform(docs)  # sparse document-term count matrix"
179 | ]
180 | },
181 | {
182 | "cell_type": "code",
183 | "execution_count": 11,
184 | "metadata": {
185 | "scrolled": true,
186 | "tags": []
187 | },
188 | "outputs": [
189 | {
190 | "data": {
191 | "text/plain": [
192 | "{'aba': 4665,\n",
193 | " 'decides': 25198,\n",
194 | " 'against': 5913,\n",
195 | " 'community': 21254,\n",
196 | " 'broadcasting': 15400,\n",
197 | " 'licence': 51363,\n",
198 | " 'act': 5288,\n",
199 | " 'fire': 33684,\n",
200 | " 'witnesses': 94686,\n",
201 | " 'must': 59152,\n",
202 | " 'be': 11516,\n",
203 | " 'aware': 9888,\n",
204 | " 'of': 62095,\n",
205 | " 'defamation': 25374,\n",
206 | " 'calls': 16958,\n",
207 | " 'for': 34625,\n",
208 | " "
'infrastructure': 44473,\n", 209 | " 'protection': 68858,\n", 210 | " 'summit': 83464,\n", 211 | " 'air': 6232,\n", 212 | " 'nz': 61725,\n", 213 | " 'staff': 81689,\n", 214 | " 'in': 43928,\n", 215 | " 'aust': 9583,\n", 216 | " 'strike': 82846,\n", 217 | " 'pay': 64904,\n", 218 | " 'rise': 73710,\n", 219 | " 'to': 86856,\n", 220 | " 'affect': 5775,\n", 221 | " 'australian': 9638,\n", 222 | " 'travellers': 87872,\n", 223 | " 'ambitious': 7142,\n", 224 | " 'olsson': 62368,\n", 225 | " 'wins': 94525,\n", 226 | " 'triple': 88206,\n", 227 | " 'jump': 46985,\n", 228 | " 'antic': 7854,\n", 229 | " 'delighted': 25653,\n", 230 | " 'with': 94643,\n", 231 | " 'record': 71323,\n", 232 | " 'breaking': 15014,\n", 233 | " 'barca': 10886,\n", 234 | " 'aussie': 9577,\n", 235 | " 'qualifier': 69690,\n", 236 | " 'stosur': 82616,\n", 237 | " 'wastes': 92973,\n", 238 | " 'four': 34953,\n", 239 | " 'memphis': 55926,\n", 240 | " 'match': 54705,\n", 241 | " 'addresses': 5422,\n", 242 | " 'un': 89301,\n", 243 | " 'security': 77124,\n", 244 | " 'council': 22877,\n", 245 | " 'over': 63341,\n", 246 | " 'iraq': 45396,\n", 247 | " 'australia': 9635,\n", 248 | " 'is': 45527,\n", 249 | " 'locked': 52080,\n", 250 | " 'into': 45131,\n", 251 | " 'war': 92721,\n", 252 | " 'timetable': 86655,\n", 253 | " 'opp': 62643,\n", 254 | " 'contribute': 22177,\n", 255 | " '10': 407,\n", 256 | " 'million': 56790,\n", 257 | " 'aid': 6175,\n", 258 | " 'take': 84661,\n", 259 | " 'as': 8836,\n", 260 | " 'robson': 74004,\n", 261 | " 'celebrates': 18392,\n", 262 | " 'birthday': 13111,\n", 263 | " 'bathhouse': 11314,\n", 264 | " 'plans': 66655,\n", 265 | " 'move': 58496,\n", 266 | " 'ahead': 6132,\n", 267 | " 'big': 12724,\n", 268 | " 'hopes': 42312,\n", 269 | " 'launceston': 50385,\n", 270 | " 'cycling': 24275,\n", 271 | " 'championship': 18759,\n", 272 | " 'plan': 66622,\n", 273 | " 'boost': 14283,\n", 274 | " 'paroo': 64506,\n", 275 | " 'water': 93009,\n", 276 | " 'supplies': 83779,\n", 277 | " 'blizzard': 
13518,\n", 278 | " 'buries': 16261,\n", 279 | " 'united': 89944,\n", 280 | " 'states': 81976,\n", 281 | " 'bills': 12850,\n", 282 | " 'brigadier': 15246,\n", 283 | " 'dismisses': 27304,\n", 284 | " 'reports': 72508,\n", 285 | " 'troops': 88293,\n", 286 | " 'harassed': 40083,\n", 287 | " 'british': 15357,\n", 288 | " 'combat': 21013,\n", 289 | " 'arriving': 8712,\n", 290 | " 'daily': 24425,\n", 291 | " 'kuwait': 49610,\n", 292 | " 'bryant': 15709,\n", 293 | " 'leads': 50598,\n", 294 | " 'lakers': 49874,\n", 295 | " 'double': 28169,\n", 296 | " 'overtime': 63558,\n", 297 | " 'win': 94378,\n", 298 | " 'bushfire': 16423,\n", 299 | " 'victims': 91637,\n", 300 | " 'urged': 90496,\n", 301 | " 'see': 77157,\n", 302 | " 'centrelink': 18505,\n", 303 | " 'businesses': 16464,\n", 304 | " 'should': 78631,\n", 305 | " 'prepare': 68105,\n", 306 | " 'terrorist': 85822,\n", 307 | " 'attacks': 9352,\n", 308 | " 'calleri': 16940,\n", 309 | " 'avenges': 9794,\n", 310 | " 'final': 33539,\n", 311 | " 'defeat': 25389,\n", 312 | " 'eliminate': 30005,\n", 313 | " 'massu': 54641,\n", 314 | " 'call': 16926,\n", 315 | " 'ethanol': 31338,\n", 316 | " 'blend': 13442,\n", 317 | " 'fuel': 35579,\n", 318 | " 'go': 37632,\n", 319 | " 'carews': 17607,\n", 320 | " 'freak': 35149,\n", 321 | " 'goal': 37635,\n", 322 | " 'leaves': 50701,\n", 323 | " 'roma': 74249,\n", 324 | " 'ruins': 74836,\n", 325 | " 'cemeteries': 18442,\n", 326 | " 'miss': 57268,\n", 327 | " 'out': 63134,\n", 328 | " 'on': 62447,\n", 329 | " 'funds': 35701,\n", 330 | " 'code': 20647,\n", 331 | " 'conduct': 21608,\n", 332 | " 'toughens': 87372,\n", 333 | " 'organ': 62837,\n", 334 | " 'donation': 27965,\n", 335 | " 'regulations': 71844,\n", 336 | " 'commonwealth': 21215,\n", 337 | " 'bank': 10742,\n", 338 | " 'cuts': 24189,\n", 339 | " 'fixed': 33916,\n", 340 | " 'home': 42085,\n", 341 | " 'loan': 52003,\n", 342 | " 'rates': 70681,\n", 343 | " 'help': 41071,\n", 344 | " 'homeless': 42111,\n", 345 | " 'youth': 96099,\n", 346 | " 
'chief': 19276,\n", 347 | " 'executive': 31779,\n", 348 | " 'fails': 32361,\n", 349 | " 'secure': 77115,\n", 350 | " 'position': 67583,\n", 351 | " 'councillor': 22881,\n", 352 | " 'contest': 22098,\n", 353 | " 'wollongong': 94807,\n", 354 | " 'independent': 44156,\n", 355 | " 'moves': 58506,\n", 356 | " 'protect': 68852,\n", 357 | " 'tas': 85112,\n", 358 | " 'heritage': 41262,\n", 359 | " 'garden': 36214,\n", 360 | " 'welcomes': 93475,\n", 361 | " 'ambulance': 7156,\n", 362 | " 'levy': 51207,\n", 363 | " 'decision': 25214,\n", 364 | " 'insurance': 44867,\n", 365 | " 'breakthrough': 15022,\n", 366 | " 'crean': 23282,\n", 367 | " 'tells': 85578,\n", 368 | " 'alp': 6934,\n", 369 | " 'leadership': 50590,\n", 370 | " 'critics': 23514,\n", 371 | " 'shut': 78779,\n", 372 | " 'up': 90347,\n", 373 | " 'dargo': 24729,\n", 374 | " 'threat': 86274,\n", 375 | " 'expected': 31916,\n", 376 | " 'death': 25061,\n", 377 | " 'toll': 86998,\n", 378 | " 'continues': 22124,\n", 379 | " 'climb': 20233,\n", 380 | " 'korean': 49142,\n", 381 | " 'subway': 83245,\n", 382 | " 'dems': 25839,\n", 383 | " 'hold': 41969,\n", 384 | " 'plebiscite': 66791,\n", 385 | " 'iraqi': 45397,\n", 386 | " 'conflict': 21685,\n", 387 | " 'dent': 25907,\n", 388 | " 'downs': 28277,\n", 389 | " 'philippoussis': 65916,\n", 390 | " 'tie': 86503,\n", 391 | " 'break': 14998,\n", 392 | " 'thriller': 86306,\n", 393 | " 'de': 24986,\n", 394 | " 'villiers': 91798,\n", 395 | " 'learn': 50660,\n", 396 | " 'fate': 32794,\n", 397 | " 'march': 54142,\n", 398 | " 'digital': 26772,\n", 399 | " 'tv': 88863,\n", 400 | " 'will': 94248,\n", 401 | " 'become': 11702,\n", 402 | " 'commonplace': 21209,\n", 403 | " 'direct': 26952,\n", 404 | " 'anger': 7562,\n", 405 | " 'at': 9234,\n", 406 | " 'govt': 38147,\n", 407 | " 'not': 61253,\n", 408 | " 'soldiers': 80452,\n", 409 | " 'urges': 90501,\n", 410 | " 'dispute': 27396,\n", 411 | " 'smithton': 79980,\n", 412 | " 'vegetable': 91267,\n", 413 | " 'processing': 68512,\n", 414 | " 'plant': 
66656,\n", 415 | " 'dog': 27798,\n", 416 | " 'mauls': 54861,\n", 417 | " '18': 1398,\n", 418 | " 'month': 57926,\n", 419 | " 'old': 62303,\n", 420 | " 'toddler': 86907,\n", 421 | " 'nsw': 61416,\n", 422 | " 'dying': 29141,\n", 423 | " 'passengers': 64641,\n", 424 | " 'phoned': 65972,\n", 425 | " 'england': 30620,\n", 426 | " 'change': 18795,\n", 427 | " 'three': 86288,\n", 428 | " 'wales': 92486,\n", 429 | " 'epa': 30924,\n", 430 | " 'still': 82344,\n", 431 | " 'trying': 88469,\n", 432 | " 'recover': 71346,\n", 433 | " 'chemical': 19133,\n", 434 | " 'clean': 20096,\n", 435 | " 'costs': 22801,\n", 436 | " 'expressions': 32041,\n", 437 | " 'interest': 44962,\n", 438 | " 'sought': 80721,\n", 439 | " 'build': 15928,\n", 440 | " 'livestock': 51908,\n", 441 | " 'fed': 32968,\n", 442 | " 're': 70883,\n", 443 | " 'introduce': 45155,\n", 444 | " 'national': 59796,\n", 445 | " 'firefighters': 33725,\n", 446 | " 'contain': 22039,\n", 447 | " 'acid': 5207,\n", 448 | " 'spill': 81173,\n", 449 | " 'injured': 44603,\n", 450 | " 'head': 40658,\n", 451 | " 'highway': 41513,\n", 452 | " 'crash': 23211,\n", 453 | " 'freedom': 35195,\n", 454 | " 'records': 71331,\n", 455 | " 'net': 60280,\n", 456 | " 'profit': 68589,\n", 457 | " 'third': 86149,\n", 458 | " 'successive': 83264,\n", 459 | " 'allocated': 6832,\n", 460 | " 'domestic': 27900,\n", 461 | " 'violence': 91872,\n", 462 | " 'risk': 73724,\n", 463 | " 'announced': 7729,\n", 464 | " 'bridge': 15202,\n", 465 | " 'work': 95097,\n", 466 | " 'cadell': 16751,\n", 467 | " 'upgrade': 90371,\n", 468 | " 'restore': 72886,\n", 469 | " 'cossack': 22764,\n", 470 | " 'german': 36867,\n", 471 | " 'court': 22967,\n", 472 | " 'give': 37304,\n", 473 | " 'verdict': 91423,\n", 474 | " 'sept': 77482,\n", 475 | " '11': 578,\n", 476 | " 'accused': 5154,\n", 477 | " 'gilchrist': 37105,\n", 478 | " 'backs': 10207,\n", 479 | " 'rest': 72849,\n", 480 | " 'policy': 67156,\n", 481 | " 'girl': 37247,\n", 482 | " 'gold': 37746,\n", 483 | " 'coast': 20509,\n", 
484 | " 'hear': 40763,\n", 485 | " 'about': 4891,\n", 486 | " 'bilby': 12799,\n", 487 | " 'project': 68645,\n", 488 | " 'golf': 37793,\n", 489 | " 'club': 20378,\n", 490 | " 'feeling': 33026,\n", 491 | " 'smoking': 80006,\n", 492 | " 'ban': 10642,\n", 493 | " 'impact': 43735,\n", 494 | " 'blame': 13344,\n", 495 | " 'ethanols': 31340,\n", 496 | " 'unpopularity': 90087,\n", 497 | " 'greens': 38568,\n", 498 | " 'offer': 62127,\n", 499 | " 'police': 67137,\n", 500 | " 'station': 81989,\n", 501 | " 'alternative': 6998,\n", 502 | " 'griffiths': 38693,\n", 503 | " 'under': 89503,\n", 504 | " 'knock': 48873,\n", 505 | " 'back': 10139,\n", 506 | " 'group': 38880,\n", 507 | " 'meet': 55704,\n", 508 | " 'north': 61169,\n", 509 | " 'west': 93598,\n", 510 | " 'wa': 92293,\n", 511 | " 'rock': 74032,\n", 512 | " 'art': 8749,\n", 513 | " 'hacker': 39502,\n", 514 | " 'gains': 35933,\n", 515 | " 'access': 5058,\n", 516 | " 'eight': 29762,\n", 517 | " 'credit': 23317,\n", 518 | " 'cards': 17575,\n", 519 | " 'hanson': 40021,\n", 520 | " 'grossly': 38848,\n", 521 | " 'naive': 59478,\n", 522 | " 'issues': 45648,\n", 523 | " 'costa': 22772,\n", 524 | " 'where': 93823,\n", 525 | " 'she': 78103,\n", 526 | " 'came': 17039,\n", 527 | " 'from': 35441,\n", 528 | " 'mp': 58554,\n", 529 | " 'harrington': 40268,\n", 530 | " 'raring': 70627,\n", 531 | " 'after': 5882,\n", 532 | " 'health': 40738,\n", 533 | " 'minister': 57006,\n", 534 | " 'and': 7458,\n", 535 | " 'tissue': 86793,\n", 536 | " 'storage': 82568,\n", 537 | " 'heavy': 40860,\n", 538 | " 'metal': 56226,\n", 539 | " 'deposits': 26014,\n", 540 | " 'survey': 83928,\n", 541 | " 'nearing': 60009,\n", 542 | " 'end': 30492,\n", 543 | " 'rios': 73660,\n", 544 | " 'pulls': 69170,\n", 545 | " 'buenos': 15867,\n", 546 | " 'aires': 6266,\n", 547 | " 'open': 62580,\n", 548 | " 'inquest': 44685,\n", 549 | " 'finds': 33583,\n", 550 | " 'mans': 53996,\n", 551 | " 'accidental': 5075,\n", 552 | " 'investigations': 45256,\n", 553 | " 'underway': 
89649,\n", 554 | " 'investigation': 45255,\n", 555 | " 'elster': 30125,\n", 556 | " 'creek': 23336,\n", 557 | " 'iraqs': 45400,\n", 558 | " 'neighbours': 60148,\n", 559 | " 'plead': 66771,\n", 560 | " 'continued': 22123,\n", 561 | " 'inspections': 44768,\n", 562 | " 'own': 63616,\n", 563 | " 'rebuilding': 71104,\n", 564 | " 'white': 93920,\n", 565 | " 'house': 42605,\n", 566 | " 'irish': 45424,\n", 567 | " 'man': 53773,\n", 568 | " 'arrested': 8691,\n", 569 | " 'omagh': 62390,\n", 570 | " 'bombing': 14076,\n", 571 | " 'irrigators': 45499,\n", 572 | " 'vote': 92185,\n", 573 | " 'river': 73776,\n", 574 | " 'management': 53781,\n", 575 | " 'israeli': 45634,\n", 576 | " 'forces': 34654,\n", 577 | " 'push': 69358,\n", 578 | " 'gaza': 36484,\n", 579 | " 'strip': 82868,\n", 580 | " 'jury': 47067,\n", 581 | " 'consider': 21893,\n", 582 | " 'murder': 58997,\n", 583 | " 'case': 17910,\n", 584 | " 'juvenile': 47098,\n", 585 | " 'sex': 77723,\n", 586 | " 'offenders': 62118,\n", 587 | " 'unlikely': 89997,\n", 588 | " 'reoffend': 72386,\n", 589 | " 'kelly': 47887,\n", 590 | " 'disgusted': 27220,\n", 591 | " 'alleged': 6772,\n", 592 | " 'bp': 14743,\n", 593 | " 'scare': 76249,\n", 594 | " 'surprised': 83900,\n", 595 | " 'confidence': 21659,\n", 596 | " 'low': 52529,\n", 597 | " '314': 2945,\n", 598 | " 'missing': 57277,\n", 599 | " 'last': 50289,\n", 600 | " 'minute': 57079,\n", 601 | " 'hands': 39924,\n", 602 | " 'alinghi': 6697,\n", 603 | " 'lead': 50579,\n", 604 | " 'demand': 25731,\n", 605 | " 'service': 77600,\n", 606 | " 'central': 18485,\n", 607 | " 'qld': 69583,\n", 608 | " 'hijack': 41532,\n", 609 | " 'attempt': 9366,\n", 610 | " 'charged': 18888,\n", 611 | " 'cooma': 22341,\n", 612 | " 'fined': 33585,\n", 613 | " 'aboriginal': 4858,\n", 614 | " 'tent': 85708,\n", 615 | " 'embassy': 30207,\n", 616 | " 'raid': 70250,\n", 617 | " 'jailed': 45909,\n", 618 | " 'keno': 47973,\n", 619 | " 'fraud': 35121,\n", 620 | " 'knife': 48842,\n", 621 | " 'hijacks': 41538,\n", 622 | " 
'light': 51483,\n", 623 | " 'plane': 66623,\n", 624 | " 'martin': 54485,\n", 625 | " 'lobby': 52025,\n", 626 | " 'losing': 52427,\n", 627 | " 'nt': 61438,\n", 628 | " 'seat': 77029,\n", 629 | " 'massive': 54637,\n", 630 | " 'drug': 28630,\n", 631 | " 'crop': 23599,\n", 632 | " 'discovered': 27130,\n", 633 | " 'western': 93624,\n", 634 | " 'mayor': 54964,\n", 635 | " 'warns': 92850,\n", 636 | " 'landfill': 50013,\n", 637 | " 'protesters': 68881,\n", 638 | " 'meeting': 55707,\n", 639 | " 'tick': 86467,\n", 640 | " 'clearance': 20115,\n", 641 | " 'focus': 34435,\n", 642 | " 'broken': 15466,\n", 643 | " 'hill': 41573,\n", 644 | " 'woes': 94760,\n", 645 | " 'moderate': 57518,\n", 646 | " 'lift': 51467,\n", 647 | " 'wages': 92363,\n", 648 | " 'growth': 38916,\n", 649 | " 'more': 58099,\n", 650 | " 'than': 85951,\n", 651 | " '40': 3236,\n", 652 | " 'pc': 64958,\n", 653 | " 'young': 96071,\n", 654 | " 'men': 55929,\n", 655 | " 'drink': 28514,\n", 656 | " 'alcohol': 6540,\n", 657 | " 'restrictions': 72905,\n", 658 | " 'predicted': 67988,\n", 659 | " 'northern': 61194,\n", 660 | " 'women': 94844,\n", 661 | " 'councillors': 22882,\n", 662 | " 'most': 58296,\n", 663 | " 'highly': 41502,\n", 664 | " 'educated': 29614,\n", 665 | " 'live': 51879,\n", 666 | " 'raises': 70323,\n", 667 | " 'hospital': 42497,\n", 668 | " 'concerns': 21497,\n", 669 | " 'parliament': 64462,\n", 670 | " 'rejects': 72025,\n", 671 | " 'claims': 19947,\n", 672 | " 'mugabe': 58700,\n", 673 | " 'touch': 87358,\n", 674 | " 'down': 28241,\n", 675 | " 'paris': 64397,\n", 676 | " 'gallery': 36017,\n", 677 | " 'gets': 36924,\n", 678 | " 'all': 6731,\n", 679 | " 'clear': 20114,\n", 680 | " 'nato': 59824,\n", 681 | " 'gives': 37308,\n", 682 | " 'green': 38509,\n", 683 | " 'defend': 25414,\n", 684 | " 'turkey': 88755,\n", 685 | " 'nca': 59958,\n", 686 | " 'defends': 25424,\n", 687 | " 'new': 60375,\n", 688 | " 'zealand': 96377,\n", 689 | " 'imposes': 43841,\n", 690 | " 'visa': 91924,\n", 691 | " 'entry': 30838,\n", 
692 | " 'zimbabwe': 96512,\n", 693 | " 'no': 60941,\n", 694 | " 'side': 78862,\n", 695 | " 'effects': 29655,\n", 696 | " 'whooping': 94028,\n", 697 | " 'cough': 22853,\n", 698 | " 'vaccine': 90870,\n", 699 | " 'holding': 41979,\n", 700 | " 'vegetation': 91274,\n", 701 | " 'running': 74898,\n", 702 | " 'race': 70057,\n", 703 | " 'campaign': 17094,\n", 704 | " 'pledges': 66800,\n", 705 | " '50m': 3598,\n", 706 | " 'drought': 28592,\n", 707 | " 'relief': 72114,\n", 708 | " 'boosts': 14287,\n", 709 | " 'nurse': 61601,\n", 710 | " 'number': 61554,\n", 711 | " 'overseas': 63505,\n", 712 | " 'intake': 44884,\n", 713 | " 'nth': 61465,\n", 714 | " 'koreans': 49143,\n", 715 | " 'seek': 77182,\n", 716 | " 'asylum': 9229,\n", 717 | " 'japanese': 46065,\n", 718 | " 'nursing': 61609,\n", 719 | " 'student': 82992,\n", 720 | " 'oh': 62220,\n", 721 | " 'brother': 15565,\n", 722 | " 'your': 96085,\n", 723 | " 'times': 86647,\n", 724 | " 'says': 76141,\n", 725 | " 'ganguly': 36166,\n", 726 | " 'senior': 77394,\n", 727 | " 'omodei': 62440,\n", 728 | " 'stay': 82028,\n", 729 | " 'politics': 67196,\n", 730 | " 'onesteel': 62482,\n", 731 | " 'invest': 45236,\n", 732 | " '80m': 4284,\n", 733 | " 'whyalla': 94047,\n", 734 | " 'steelworks': 82112,\n", 735 | " 'opposition': 62673,\n", 736 | " 'recherche': 71195,\n", 737 | " 'bay': 11437,\n", 738 | " 'orientation': 62879,\n", 739 | " 'begins': 11856,\n", 740 | " 'uni': 89853,\n", 741 | " 'students': 82993,\n", 742 | " 'osullivan': 63050,\n", 743 | " 'world': 95154,\n", 744 | " 'cross': 23614,\n", 745 | " 'country': 22933,\n", 746 | " 'doubt': 28179,\n", 747 | " 'pagan': 63840,\n", 748 | " 'rule': 74843,\n", 749 | " 'changes': 18804,\n", 750 | " 'necessary': 60027,\n", 751 | " 'pair': 63893,\n", 752 | " 'face': 32254,\n", 753 | " 'ayr': 9996,\n", 754 | " 'patterson': 64808,\n", 755 | " 'attend': 9372,\n", 756 | " 'show': 78650,\n", 757 | " 'displays': 27375,\n", 758 | " 'govts': 38149,\n", 759 | " 'arrogance': 8713,\n", 760 | " 'snubs': 
80252,\n", 761 | " 'avoid': 9844,\n", 762 | " 'lions': 51739,\n", 763 | " 'den': 25844,\n", 764 | " 'peace': 64985,\n", 765 | " 'agreement': 6050,\n", 766 | " 'may': 54922,\n", 767 | " 'bring': 15302,\n", 768 | " 'respite': 72819,\n", 769 | " 'venezuela': 91354,\n", 770 | " 'pienaar': 66163,\n", 771 | " 'shines': 78377,\n", 772 | " 'ajax': 6355,\n", 773 | " 'frustrate': 35522,\n", 774 | " 'arsenal': 8733,\n", 775 | " 'second': 77074,\n", 776 | " 'skatepark': 79376,\n", 777 | " 'encourage': 30473,\n", 778 | " 'farmers': 32645,\n", 779 | " 'plantation': 66659,\n", 780 | " 'timber': 86612,\n", 781 | " 'png': 66961,\n", 782 | " 'nurses': 61606,\n", 783 | " 'colleague': 20838,\n", 784 | " 'raped': 70578,\n", 785 | " 'way': 93155,\n", 786 | " 'cracking': 23142,\n", 787 | " 'driver': 28538,\n", 788 | " 'safety': 75294,\n", 789 | " 'policewomen': 67148,\n", 790 | " 'accusations': 5151,\n", 791 | " 'feature': 32956,\n", 792 | " 'federal': 32975,\n", 793 | " 'crime': 23426,\n", 794 | " 'probe': 68476,\n", 795 | " 'launched': 50388,\n", 796 | " 'program': 68618,\n", 797 | " 'monitor': 57819,\n", 798 | " 'forest': 34723,\n", 799 | " 'harvested': 40339,\n", 800 | " 'areas': 8459,\n", 801 | " 'public': 69072,\n", 802 | " 'check': 19059,\n", 803 | " 'gas': 36313,\n", 804 | " 'cylinders': 24298,\n", 805 | " 'warned': 92831,\n", 806 | " 'phone': 65969,\n", 807 | " 'scam': 76198,\n", 808 | " 'qantas': 69487,\n", 809 | " 'international': 45018,\n", 810 | " 'crews': 23398,\n", 811 | " 'cut': 24173,\n", 812 | " '2500': 2359,\n", 813 | " 'jobs': 46582,\n", 814 | " 'outrages': 63261,\n", 815 | " 'unions': 89927,\n", 816 | " 'qr': 69611,\n", 817 | " 'planning': 66653,\n", 818 | " 'route': 74587,\n", 819 | " 'sackings': 75209,\n", 820 | " 'questions': 69837,\n", 821 | " 'grows': 38915,\n", 822 | " 'rabbit': 70034,\n", 823 | " 'control': 22190,\n", 824 | " 'trial': 88081,\n", 825 | " 'radioactive': 70147,\n", 826 | " 'wmcs': 94726,\n", 827 | " 'olympic': 62381,\n", 828 | " 'dam': 24521,\n", 
829 | " 'mine': 56911,\n", 830 | " 'rain': 70286,\n", 831 | " 'eases': 29296,\n", 832 | " 'wheatbelt': 93787,\n", 833 | " 'reading': 70923,\n", 834 | " 'first': 33787,\n", 835 | " 'division': 27604,\n", 836 | " 'amount': 7290,\n", 837 | " 'gladstone': 37338,\n", 838 | " 'ventures': 91391,\n", 839 | " 'refshauge': 71694,\n", 840 | " 'regulator': 71845,\n", 841 | " 'inspect': 44764,\n", 842 | " 'gm': 37596,\n", 843 | " 'canola': 17311,\n", 844 | " 'trials': 88086,\n", 845 | " 'report': 72501,\n", 846 | " 'highlights': 41500,\n", 847 | " 'container': 22041,\n", 848 | " 'terminal': 85755,\n", 849 | " 'potential': 67682,\n", 850 | " 'resource': 72797,\n", 851 | " 'stocks': 82456,\n", 852 | " 'ords': 62819,\n", 853 | " 'restraint': 72897,\n", 854 | " 'order': 62798,\n", 855 | " 'issued': 45647,\n", 856 | " 'anti': 7846,\n", 857 | " 'discrimination': 27158,\n", 858 | " 'rfs': 73305,\n", 859 | " 'claim': 19940,\n", 860 | " 'that': 85972,\n", 861 | " 'authorities': 9698,\n", 862 | " 'spurned': 81495,\n", 863 | " 'ricciuto': 73403,\n", 864 | " 'undergoes': 89546,\n", 865 | " 'surgery': 83867,\n", 866 | " 'ankle': 7652,\n", 867 | " 'rice': 73407,\n", 868 | " 'mill': 56747,\n", 869 | " 'closures': 20336,\n", 870 | " 'put': 69373,\n", 871 | " '300': 2816,\n", 872 | " 'rsl': 74695,\n", 873 | " 'angry': 7599,\n", 874 | " 'troop': 88288,\n", 875 | " 'harassment': 40088,\n", 876 | " 'review': 73186,\n", 877 | " 'bushwalker': 16447,\n", 878 | " 'sa': 75132,\n", 879 | " 'premier': 68073,\n", 880 | " 'action': 5298,\n", 881 | " 'murray': 59045,\n", 882 | " 'saudi': 76013,\n", 883 | " 'arabians': 8317,\n", 884 | " 'stand': 81808,\n", 885 | " 'al': 6415,\n", 886 | " 'qaeda': 69473,\n", 887 | " 'arabia': 8315,\n", 888 | " 'arabs': 8323,\n", 889 | " 'inevitable': 44336,\n", 890 | " 'search': 76986,\n", 891 | " 'resolution': 72775,\n", 892 | " 'shortly': 78616,\n", 893 | " 'shire': 78441,\n", 894 | " 'offers': 62131,\n", 895 | " 'assurances': 9150,\n", 896 | " 'finances': 33557,\n", 897 | 
" 'six': 79325,\n", 898 | " 'palestinians': 63978,\n", 899 | " 'killed': 48390,\n", 900 | " 'incursion': 44116,\n", 901 | " 'slow': 79816,\n", 902 | " 'recovery': 71351,\n", 903 | " 'economy': 29483,\n", 904 | " 'bans': 10796,\n", 905 | " 'hit': 41746,\n", 906 | " 'tabcorp': 84467,\n", 907 | " 'bottom': 14536,\n", 908 | " 'line': 51658,\n", 909 | " 'snowtown': 80242,\n", 910 | " 'delayed': 25599,\n", 911 | " 'forced': 34648,\n", 912 | " 'label': 49706,\n", 913 | " 'stations': 81995,\n", 914 | " 'get': 36915,\n", 915 | " 'sterrey': 82267,\n", 916 | " 'steer': 82129,\n", 917 | " 'sharks': 78021,\n", 918 | " 'sign': 78948,\n", 919 | " 'fisherman': 33815,\n", 920 | " 'stop': 82549,\n", 921 | " 'changing': 18808,\n", 922 | " 'the': 85989,\n", 923 | " 'rules': 74848,\n", 924 | " 'fans': 32563,\n", 925 | " 'tell': 85574,\n", 926 | " 'afl': 5843,\n", 927 | " 'sugar': 83337,\n", 928 | " 'industry': 44298,\n", 929 | " 'revealed': 73133,\n", 930 | " 'surge': 83862,\n", 931 | " 'car': 17486,\n", 932 | " 'sales': 75436,\n", 933 | " 'abs': 4922,\n", 934 | " 'swiss': 84250,\n", 935 | " 'challengers': 18717,\n", 936 | " 'looking': 52330,\n", 937 | " 'future': 35794,\n", 938 | " 'taipans': 84637,\n", 939 | " 'placing': 66601,\n", 940 | " 'publics': 69085,\n", 941 | " 'talk': 84732,\n", 942 | " 'asian': 8952,\n", 943 | " 'nuclear': 61491,\n", 944 | " 'arms': 8622,\n", 945 | " 'unhelpful': 89842,\n", 946 | " 'downer': 28246,\n", 947 | " 'tasmanian': 85153,\n", 948 | " 'scientists': 76573,\n", 949 | " 'east': 29306,\n", 950 | " 'taylor': 85313,\n", 951 | " 'denies': 25866,\n", 952 | " 'calling': 16948,\n", 953 | " 'waugh': 93126,\n", 954 | " 'quit': 69940,\n", 955 | " 'teen': 85464,\n", 956 | " 'charges': 18891,\n", 957 | " 'testing': 85864,\n", 958 | " 'shows': 78688,\n", 959 | " 'dioxin': 26919,\n", 960 | " 'above': 4896,\n", 961 | " 'drinking': 28519,\n", 962 | " 'standards': 81817,\n", 963 | " 'thousands': 86260,\n", 964 | " 'remember': 72204,\n", 965 | " '61st': 3855,\n", 966 | " 
'anniversary': 7709,\n", 967 | " 'darwin': 24797,\n", 968 | " 'ask': 8968,\n", 969 | " 'members': 55895,\n", 970 | " 'support': 83787,\n", 971 | " 'protests': 68885,\n", 972 | " 'continue': 22122,\n", 973 | " 'tree': 87957,\n", 974 | " 'disease': 27180,\n", 975 | " 'study': 83003,\n", 976 | " 'us': 90540,\n", 977 | " 'aircraft': 6253,\n", 978 | " 'attack': 9346,\n", 979 | " 'sth': 82297,\n", 980 | " 'target': 85032,\n", 981 | " 'vff': 91564,\n", 982 | " 'buy': 16575,\n", 983 | " 'stock': 82422,\n", 984 | " 'feed': 33007,\n", 985 | " 'pellets': 65196,\n", 986 | " 'affected': 5776,\n", 987 | " 'vic': 91594,\n", 988 | " 'local': 52042,\n", 989 | " 'councils': 22884,\n", 990 | " 'welcome': 93470,\n", 991 | " 'single': 79192,\n", 992 | " 'polling': 67227,\n", 993 | " 'day': 24925,\n", 994 | " 'victorian': 91650,\n", 995 | " 'honoured': 42230,\n", 996 | " 'awards': 9887,\n", 997 | " 'vowles': 92206,\n", 998 | " 'retire': 73013,\n", 999 | " 'season': 77013,\n", 1000 | " 'coach': 20469,\n", 1001 | " 'accuses': 5158,\n", 1002 | " 'players': 66732,\n", 1003 | " 'belittling': 11984,\n", 1004 | " 'red': 71402,\n", 1005 | " 'warne': 92830,\n", 1006 | " 'hearing': 40769,\n", 1007 | " 'set': 77631,\n", 1008 | " 'friday': 35347,\n", 1009 | " 'webb': 93281,\n", 1010 | " 'favourite': 32867,\n", 1011 | " 'ladies': 49787,\n", 1012 | " 'masters': 54671,\n", 1013 | " 'widnes': 94102,\n", 1014 | " 'abandon': 4675,\n", 1015 | " 'paul': 64825,\n", 1016 | " 'bid': 12680,\n", 1017 | " 'wildlife': 94211,\n", 1018 | " 'sanctuaries': 75656,\n", 1019 | " 'williams': 94282,\n", 1020 | " 'tight': 86539,\n", 1021 | " 'bowling': 14683,\n", 1022 | " 'key': 48133,\n", 1023 | " 'warriors': 92899,\n", 1024 | " 'wine': 94449,\n", 1025 | " 'bounces': 14595,\n", 1026 | " 'sacking': 75208,\n", 1027 | " 'worksafe': 95142,\n", 1028 | " 'probes': 68479,\n", 1029 | " 'potato': 67673,\n", 1030 | " 'harvester': 40340,\n", 1031 | " 'injuries': 44605,\n", 1032 | " '15': 1055,\n", 1033 | " 'dead': 24992,\n", 1034 | 
" 'rebel': 71072,\n", 1035 | " 'philippines': 65913,\n", 1036 | " 'army': 8627,\n", 1037 | " 'abattoir': 4692,\n", 1038 | " 'sale': 75429,\n", 1039 | " 'again': 5910,\n", 1040 | " 'academic': 5009,\n", 1041 | " 'upbeat': 90349,\n", 1042 | " 'higher': 41485,\n", 1043 | " 'education': 29619,\n", 1044 | " 'administrator': 5531,\n", 1045 | " 'appointed': 8171,\n", 1046 | " 'land': 49990,\n", 1047 | " 'aec': 5706,\n", 1048 | " 'declare': 25236,\n", 1049 | " 'if': 43424,\n", 1050 | " 'lose': 52421,\n", 1051 | " 'parliamentary': 64465,\n", 1052 | " 'amcor': 7167,\n", 1053 | " 'solid': 80472,\n", 1054 | " 'result': 72930,\n", 1055 | " 'americas': 7208,\n", 1056 | " 'cup': 24016,\n", 1057 | " 'fourth': 34968,\n", 1058 | " 'cancelled': 17192,\n", 1059 | " 'poised': 67071,\n", 1060 | " 'swoop': 84274,\n", 1061 | " 'beckham': 11688,\n", 1062 | " 'less': 51083,\n", 1063 | " 'austeel': 9594,\n", 1064 | " 'eis': 29783,\n", 1065 | " 'release': 72085,\n", 1066 | " 'due': 28765,\n", 1067 | " 'soon': 80622,\n", 1068 | " 'flag': 33945,\n", 1069 | " '100th': 468,\n", 1070 | " 'awu': 9938,\n", 1071 | " 'port': 67501,\n", 1072 | " 'kembla': 47911,\n", 1073 | " 'baby': 10105,\n", 1074 | " 'badly': 10285,\n", 1075 | " 'burnt': 16323,\n", 1076 | " 'brisbane': 15322,\n", 1077 | " 'bad': 10242,\n", 1078 | " 'weather': 93261,\n", 1079 | " 'might': 56612,\n", 1080 | " 'have': 40502,\n", 1081 | " 'caused': 18209,\n", 1082 | " 'iranian': 45393,\n", 1083 | " 'depleted': 25976,\n", 1084 | " 'juve': 47096,\n", 1085 | " 'beware': 12557,\n", 1086 | " 'standard': 81811,\n", 1087 | " 'alcoholic': 6543,\n", 1088 | " 'dream': 28430,\n", 1089 | " 'reality': 70968,\n", 1090 | " 'sparkies': 80937,\n", 1091 | " 'britain': 15348,\n", 1092 | " 'nationals': 59809,\n", 1093 | " 'leave': 50697,\n", 1094 | " 'high': 41481,\n", 1095 | " 'overturns': 63571,\n", 1096 | " 'blair': 13331,\n", 1097 | " 'magician': 53278,\n", 1098 | " 'entomb': 30808,\n", 1099 | " 'himself': 41626,\n", 1100 | " 'cheese': 19094,\n", 1101 | 
" 'bungle': 16156,\n", 1102 | " 'doctor': 27742,\n", 1103 | " 'waiting': 92428,\n", 1104 | " 'practise': 67849,\n", 1105 | " 'coronial': 22628,\n", 1106 | " 'inquiry': 44696,\n", 1107 | " 'winds': 94434,\n", 1108 | " 'bush': 16411,\n", 1109 | " 'thanks': 85960,\n", 1110 | " 'ambos': 7149,\n", 1111 | " 'wake': 92446,\n", 1112 | " 'funding': 35693,\n", 1113 | " 'canegrowers': 17242,\n", 1114 | " 'hope': 42298,\n", 1115 | " 'late': 50302,\n", 1116 | " 'summer': 83448,\n", 1117 | " 'capriati': 17437,\n", 1118 | " 'hungry': 42930,\n", 1119 | " 'dubai': 28706,\n", 1120 | " 'celts': 18436,\n", 1121 | " 'underdogs': 89527,\n", 1122 | " 'uefa': 89131,\n", 1123 | " 'clash': 20022,\n", 1124 | " 'oneill': 62472,\n", 1125 | " 'chambers': 18735,\n", 1126 | " 'vows': 92207,\n", 1127 | " 'smash': 79922,\n", 1128 | " 'mark': 54284,\n", 1129 | " 'charvis': 18964,\n", 1130 | " 'pays': 64940,\n", 1131 | " 'penalty': 65234,\n", 1132 | " 'humphreys': 42891,\n", 1133 | " 'earns': 29258,\n", 1134 | " 'shock': 78494,\n", 1135 | " 'christmas': 19620,\n", 1136 | " 'detention': 26320,\n", 1137 | " 'centre': 18497,\n", 1138 | " 'quashed': 69762,\n", 1139 | " 'defence': 25410,\n", 1140 | " 'spending': 81114,\n", 1141 | " 'priority': 68396,\n", 1142 | " 'causing': 18215,\n", 1143 | " 'indigenous': 44206,\n", 1144 | " 'autopsy': 9741,\n", 1145 | " 'consent': 21862,\n", 1146 | " 'college': 20860,\n", 1147 | " 'experience': 31943,\n", 1148 | " 'concern': 21494,\n", 1149 | " 'covered': 23016,\n", 1150 | " 'by': 16613,\n", 1151 | " 'legal': 50796,\n", 1152 | " 'concorde': 21537,\n", 1153 | " 'makes': 53528,\n", 1154 | " 'emergency': 30269,\n", 1155 | " 'landing': 50021,\n", 1156 | " 'canada': 17157,\n", 1157 | " 'laments': 49934,\n", 1158 | " 'job': 46571,\n", 1159 | " 'advertising': 5670,\n", 1160 | " 'general': 36671,\n", 1161 | " 'manager': 53782,\n", 1162 | " 'step': 82205,\n", 1163 | " 'vandalism': 91024,\n", 1164 | " 'reporting': 72507,\n", 1165 | " 'reward': 73250,\n", 1166 | " 'racing': 
70086,\n", 1167 | " 'clubs': 20389,\n", 1168 | " 'tab': 84457,\n", 1169 | " 'fears': 32937,\n", 1170 | " 'longford': 52275,\n", 1171 | " 'compo': 21395,\n", 1172 | " 'today': 86902,\n", 1173 | " 'cristal': 23478,\n", 1174 | " 'libertadores': 51332,\n", 1175 | " 'streak': 82755,\n", 1176 | " 'cuper': 24021,\n", 1177 | " 'slams': 79615,\n", 1178 | " 'inter': 44931,\n", 1179 | " 'boy': 14712,\n", 1180 | " 'recoba': 71242,\n", 1181 | " 'deportivo': 26003,\n", 1182 | " 'slip': 79770,\n", 1183 | " 'buoyant': 16197,\n", 1184 | " 'minnows': 57041,\n", 1185 | " 'basel': 11191,\n", 1186 | " 'distance': 27468,\n", 1187 | " 'swimmer': 84212,\n", 1188 | " 'maroney': 54377,\n", 1189 | " 'it': 45661,\n", 1190 | " 'quits': 69946,\n", 1191 | " 'dixon': 27625,\n", 1192 | " ...}" 1193 | ] 1194 | }, 1195 | "execution_count": 11, 1196 | "metadata": {}, 1197 | "output_type": "execute_result" 1198 | } 1199 | ], 1200 | "source": [ 1201 | "vec.vocabulary_" 1202 | ] 1203 | }, 1204 | { 1205 | "cell_type": "code", 1206 | "execution_count": 12, 1207 | "metadata": {}, 1208 | "outputs": [ 1209 | { 1210 | "data": { 1211 | "text/plain": [ 1212 | "(1103665, 96687)" 1213 | ] 1214 | }, 1215 | "execution_count": 12, 1216 | "metadata": {}, 1217 | "output_type": "execute_result" 1218 | } 1219 | ], 1220 | "source": [ 1221 | "doc_term.shape" 1222 | ] 1223 | }, 1224 | { 1225 | "cell_type": "code", 1226 | "execution_count": 13, 1227 | "metadata": {}, 1228 | "outputs": [ 1229 | { 1230 | "data": { 1231 | "text/plain": [ 1232 | "(1103665, 2)" 1233 | ] 1234 | }, 1235 | "execution_count": 13, 1236 | "metadata": {}, 1237 | "output_type": "execute_result" 1238 | } 1239 | ], 1240 | "source": [ 1241 | "df.shape" 1242 | ] 1243 | }, 1244 | { 1245 | "cell_type": "markdown", 1246 | "metadata": {}, 1247 | "source": [ 1248 | "## Vectorize (part 2)\n", 1249 | "\n", 1250 | "**Build the matrix such that the value at `(i,j)` represents a sort of *normalized frequency*,** which takes into account (a) the density of term `j` in 
document `i`, as well as (b) the number of documents in which that term occurs.\n", 1251 | "\n", 1252 | "*Hint: Try `TfidfVectorizer`. What is this?*" 1253 | ] 1254 | }, 1255 | { 1256 | "cell_type": "code", 1257 | "execution_count": 14, 1258 | "metadata": {}, 1259 | "outputs": [], 1260 | "source": [ 1261 | "vec = TfidfVectorizer()" 1262 | ] 1263 | }, 1264 | { 1265 | "cell_type": "code", 1266 | "execution_count": 15, 1267 | "metadata": {}, 1268 | "outputs": [], 1269 | "source": [ 1270 | "doc_term = vec.fit_transform(docs.values)" 1271 | ] 1272 | }, 1273 | { 1274 | "cell_type": "code", 1275 | "execution_count": 16, 1276 | "metadata": {}, 1277 | "outputs": [ 1286 | { 1287 | "data": { 1288 | "text/html": [ 1289 | "
\n", 1290 | "\n", 1303 | "\n", 1304 | " \n", 1305 | " \n", 1306 | " \n", 1307 | " \n", 1308 | " \n", 1309 | " \n", 1310 | " \n", 1311 | " \n", 1312 | " \n", 1313 | " \n", 1314 | " \n", 1315 | " \n", 1316 | " \n", 1317 | " \n", 1318 | " \n", 1319 | " \n", 1320 | " \n", 1321 | " \n", 1322 | " \n", 1323 | " \n", 1324 | " \n", 1325 | " \n", 1326 | " \n", 1327 | " \n", 1328 | " \n", 1329 | " \n", 1330 | " \n", 1331 | " \n", 1332 | " \n", 1333 | " \n", 1334 | " \n", 1335 | " \n", 1336 | " \n", 1337 | " \n", 1338 | " \n", 1339 | " \n", 1340 | " \n", 1341 | " \n", 1342 | " \n", 1343 | " \n", 1344 | " \n", 1345 | " \n", 1346 | " \n", 1347 | " \n", 1348 | " \n", 1349 | " \n", 1350 | " \n", 1351 | " \n", 1352 | " \n", 1353 | " \n", 1354 | " \n", 1355 | " \n", 1356 | "
000 002 005 006 007 01 0101 010115 010213 010215 ... zydelig zygar zygiefs zygier zyl zylvester zynga zyngier zz zzz
0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
\n", 1357 | "

1 rows × 96687 columns

\n", 1358 | "
" 1359 | ], 1360 | "text/plain": [ 1361 | " 000 002 005 006 007 01 0101 010115 010213 010215 ... zydelig \\\n", 1362 | "0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 \n", 1363 | "\n", 1364 | " zygar zygiefs zygier zyl zylvester zynga zyngier zz zzz \n", 1365 | "0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 \n", 1366 | "\n", 1367 | "[1 rows x 96687 columns]" 1368 | ] 1369 | }, 1370 | "execution_count": 16, 1371 | "metadata": {}, 1372 | "output_type": "execute_result" 1373 | } 1374 | ], 1375 | "source": [ 1376 | "# Look at the first row (document) to inspect its TF-IDF weights\n", 1377 | "tfidf_df = pd.DataFrame(doc_term[0].toarray(), columns=vec.get_feature_names_out())\n", 1378 | "tfidf_df.head()" 1379 | ] 1380 | }, 1381 | { 1382 | "cell_type": "code", 1383 | "execution_count": 7, 1384 | "metadata": {}, 1385 | "outputs": [ 1394 | { 1395 | "data": { 1396 | "text/html": [ 1397 | "
\n", 1398 | "\n", 1411 | "\n", 1412 | " \n", 1413 | " \n", 1414 | " \n", 1415 | " \n", 1416 | " \n", 1417 | " \n", 1418 | " \n", 1419 | " \n", 1420 | " \n", 1421 | " \n", 1422 | " \n", 1423 | " \n", 1424 | " \n", 1425 | " \n", 1426 | " \n", 1427 | " \n", 1428 | " \n", 1429 | " \n", 1430 | " \n", 1431 | " \n", 1432 | " \n", 1433 | " \n", 1434 | " \n", 1435 | " \n", 1436 | " \n", 1437 | " \n", 1438 | " \n", 1439 | " \n", 1440 | " \n", 1441 | " \n", 1442 | " \n", 1443 | " \n", 1444 | " \n", 1445 | " \n", 1446 | " \n", 1447 | " \n", 1448 | " \n", 1449 | " \n", 1450 | " \n", 1451 | " \n", 1452 | " \n", 1453 | " \n", 1454 | " \n", 1455 | " \n", 1456 | " \n", 1457 | " \n", 1458 | " \n", 1459 | " \n", 1460 | " \n", 1461 | " \n", 1462 | " \n", 1463 | " \n", 1464 | "
aa aaa aaco aacta aamer aami aant aapt aaron ab ... zsa zuckerberg zuckerbergs zullo zuma zurich zusak zverev zvonareva zygier
0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
\n", 1465 | "

1 rows × 35226 columns

\n", 1466 | "
" 1467 | ], 1468 | "text/plain": [ 1469 | " aa aaa aaco aacta aamer aami aant aapt aaron ab ... zsa \\\n", 1470 | "0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 \n", 1471 | "\n", 1472 | " zuckerberg zuckerbergs zullo zuma zurich zusak zverev zvonareva \\\n", 1473 | "0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 \n", 1474 | "\n", 1475 | " zygier \n", 1476 | "0 0.0 \n", 1477 | "\n", 1478 | "[1 rows x 35226 columns]" 1479 | ] 1480 | }, 1481 | "execution_count": 7, 1482 | "metadata": {}, 1483 | "output_type": "execute_result" 1484 | } 1485 | ], 1486 | "source": [ 1487 | "# Let's do a bit better: drop English stop words, terms in more than 75% of documents or fewer than 6 documents, and non-alphabetic tokens\n", 1488 | "vec = TfidfVectorizer(stop_words='english', max_df=.75, min_df=6, token_pattern=r'(?u)\\b[A-Za-z]+\\b')\n", 1489 | "doc_term = vec.fit_transform(docs.values)\n", 1490 | "\n", 1491 | "tfidf_df = pd.DataFrame(doc_term[0].toarray(), columns=vec.get_feature_names_out())\n", 1492 | "tfidf_df.head()" 1493 | ] 1494 | }, 1495 | { 1496 | "cell_type": "markdown", 1497 | "metadata": {}, 1498 | "source": [ 1499 | "## What can you do with this?\n", 1500 | "\n", 1501 | "Try a few different operations, and try to **interpret their meaning/use case**:\n", 1502 | "\n", 1503 | "* Calculate the correlation between documents, or between terms\n", 1504 | "* Consider bigrams or n-grams in your vectorizer\n", 1505 | "* Determine if there is multicollinearity between documents, or between terms\n", 1506 | "* Try to incorporate the `user_id` into your analysis\n", 1507 | "* Build a Python class to make your work repeatable\n" 1508 | ] 1509 | }, 1510 | { 1511 | "cell_type": "markdown", 1512 | "metadata": {}, 1513 | "source": [ 1514 | "### NMF \n", 1515 | "\n", 1516 | "(document-topic matrix x topic-term matrix)" 1517 | ] 1518 | }, 1519 | { 1520 | "cell_type": "markdown", 1521 | "metadata": {}, 1522 | "source": [ 1523 | "" 1524 | ] 1525 | }, 1526 | { 1527 | "cell_type": "code", 1528 | "execution_count": 18, 1529 | "metadata": {}, 1530 | "outputs": [ 1531 | { 1532 | "data": { 1533 | "text/plain": [ 1534 | "(1103665,
32362)" 1535 | ] 1536 | }, 1537 | "execution_count": 18, 1538 | "metadata": {}, 1539 | "output_type": "execute_result" 1540 | } 1541 | ], 1542 | "source": [ 1543 | "doc_term.shape" 1544 | ] 1545 | }, 1546 | { 1547 | "cell_type": "code", 1548 | "execution_count": 8, 1549 | "metadata": {}, 1550 | "outputs": [ 1551 | { 1552 | "data": { 1553 | "text/html": [ 1554 | "
NMF(init='random', n_components=10)
" 1555 | ], 1556 | "text/plain": [ 1557 | "NMF(init='random', n_components=10)" 1558 | ] 1559 | }, 1560 | "execution_count": 8, 1561 | "metadata": {}, 1562 | "output_type": "execute_result" 1563 | } 1564 | ], 1565 | "source": [ 1566 | "from sklearn.decomposition import NMF\n", 1567 | "nmf = NMF(n_components=10, init='random')\n", 1568 | "W = nmf.fit_transform(doc_term)  # document-topic weights\n", 1569 | "H = nmf.components_  # topic-term weights\n", 1570 | "\n", 1571 | "# display the fitted model\n", 1572 | "nmf" 1573 | ] 1574 | }, 1575 | { 1576 | "cell_type": "code", 1577 | "execution_count": 11, 1578 | "metadata": {}, 1579 | "outputs": [], 1589 | "source": [ 1590 | "vocab = vec.get_feature_names_out()\n", 1591 | "id_topic = nmf.fit_transform(doc_term)  # refits the model; with init='random' and no random_state, topics can differ from W/H above\n", 1592 | "n_top_words = 10\n", 1593 | "topic_words = {}\n", 1594 | "\n", 1595 | "for topic, comp in enumerate(nmf.components_):\n", 1596 | "    word_idx = np.argsort(comp)[::-1][:n_top_words]\n", 1597 | "    # store the words most relevant to the topic\n", 1598 | "    topic_words[topic] = [vocab[i] for i in word_idx]" 1599 | ] 1600 | }, 1601 | { 1602 | "cell_type": "code", 1603 | "execution_count": 12, 1604 | "metadata": {}, 1605 | "outputs": [ 1606 | { 1607 | "name": "stdout", 1608 | "output_type": "stream", 1609 | "text": [ 1610 | "0 ['new', 'zealand', 'cases', 'laws', 'year', 'coronavirus', 'york', 'records', 'covid', 'home']\n", 1611 | "\n", 1612 | "\n", 1613 | "1 ['govt', 'council', 'says', 'plan', 'water', 'health', 'urged', 'qld', 'funding', 'government']\n", 1614 | "\n", 1615 | "\n", 1616 | "2 ['interview', 'extended', 'michael', 'david', 'john',
'nrl', 'smith', 'james', 'ben', 'scott']\n", 1617 | "\n", 1618 | "\n", 1619 | "3 ['news', 'abc', 'rural', 'national', 'business', 'weather', 'market', 'sport', 'analysis', 'entertainment']\n", 1620 | "\n", 1621 | "\n", 1622 | "4 ['australia', 'day', 'south', 'world', 'cup', 'test', 'coronavirus', 'live', 'vs', 'china']\n", 1623 | "\n", 1624 | "\n", 1625 | "5 ['country', 'hour', 'nsw', 'tas', 'wa', 'vic', 'august', 'drum', 'october', 'sa']\n", 1626 | "\n", 1627 | "\n", 1628 | "6 ['crash', 'car', 'killed', 'dies', 'fatal', 'woman', 'road', 'driver', 'plane', 'dead']\n", 1629 | "\n", 1630 | "\n", 1631 | "7 ['man', 'charged', 'murder', 'missing', 'jailed', 'stabbing', 'arrested', 'guilty', 'death', 'sydney']\n", 1632 | "\n", 1633 | "\n", 1634 | "8 ['court', 'accused', 'face', 'murder', 'charges', 'faces', 'told', 'case', 'high', 'sex']\n", 1635 | "\n", 1636 | "\n", 1637 | "9 ['police', 'investigate', 'probe', 'missing', 'search', 'death', 'hunt', 'officer', 'shooting', 'seek']\n", 1638 | "\n", 1639 | "\n" 1640 | ] 1641 | } 1642 | ], 1643 | "source": [ 1644 | "for k,v in topic_words.items():\n", 1645 | " print(k,v)\n", 1646 | " print('\\n')" 1647 | ] 1648 | }, 1649 | { 1650 | "cell_type": "markdown", 1651 | "metadata": {}, 1652 | "source": [ 1653 | "### Using Glove Embeddings to plot Emotions " 1654 | ] 1655 | }, 1656 | { 1657 | "cell_type": "markdown", 1658 | "metadata": {}, 1659 | "source": [ 1660 | "" 1661 | ] 1662 | }, 1663 | { 1664 | "cell_type": "code", 1665 | "execution_count": null, 1666 | "metadata": {}, 1667 | "outputs": [], 1668 | "source": [] 1669 | } 1670 | ], 1671 | "metadata": { 1672 | "kernelspec": { 1673 | "display_name": "ml", 1674 | "language": "python", 1675 | "name": "ml" 1676 | }, 1677 | "language_info": { 1678 | "codemirror_mode": { 1679 | "name": "ipython", 1680 | "version": 3 1681 | }, 1682 | "file_extension": ".py", 1683 | "mimetype": "text/x-python", 1684 | "name": "python", 1685 | "nbconvert_exporter": "python", 1686 | "pygments_lexer": 
"ipython3", 1687 | "version": "3.9.12" 1688 | }, 1689 | "toc": { 1690 | "base_numbering": 1, 1691 | "nav_menu": {}, 1692 | "number_sections": true, 1693 | "sideBar": true, 1694 | "skip_h1_title": false, 1695 | "title_cell": "Table of Contents", 1696 | "title_sidebar": "Contents", 1697 | "toc_cell": false, 1698 | "toc_position": {}, 1699 | "toc_section_display": "block", 1700 | "toc_window_display": false 1701 | } 1702 | }, 1703 | "nbformat": 4, 1704 | "nbformat_minor": 4 1705 | } 1706 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # intro_to_machine_learning 2 | 3 | Machine Learning tutorial and walk thru for ODSC EAST Conference 2023 4 | -------------------------------------------------------------------------------- /imgs/CRT.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/julialintern/intro_to_machine_learning/6329e6b928e91004714407d171fc9414abf7368d/imgs/CRT.png -------------------------------------------------------------------------------- /imgs/adjusted_r.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/julialintern/intro_to_machine_learning/6329e6b928e91004714407d171fc9414abf7368d/imgs/adjusted_r.png -------------------------------------------------------------------------------- /imgs/emotions.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/julialintern/intro_to_machine_learning/6329e6b928e91004714407d171fc9414abf7368d/imgs/emotions.png -------------------------------------------------------------------------------- /imgs/flying_blind.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/julialintern/intro_to_machine_learning/6329e6b928e91004714407d171fc9414abf7368d/imgs/flying_blind.png -------------------------------------------------------------------------------- /imgs/logistic_1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/julialintern/intro_to_machine_learning/6329e6b928e91004714407d171fc9414abf7368d/imgs/logistic_1.png -------------------------------------------------------------------------------- /imgs/nlp_topics.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/julialintern/intro_to_machine_learning/6329e6b928e91004714407d171fc9414abf7368d/imgs/nlp_topics.png -------------------------------------------------------------------------------- /imgs/r_2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/julialintern/intro_to_machine_learning/6329e6b928e91004714407d171fc9414abf7368d/imgs/r_2.png -------------------------------------------------------------------------------- /imgs/regression_ml.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/julialintern/intro_to_machine_learning/6329e6b928e91004714407d171fc9414abf7368d/imgs/regression_ml.png -------------------------------------------------------------------------------- /imgs/saab.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/julialintern/intro_to_machine_learning/6329e6b928e91004714407d171fc9414abf7368d/imgs/saab.png --------------------------------------------------------------------------------