├── 00_Requirements_and_Installs.ipynb
├── 01_What_is_Machine_Learning.ipynb
├── 02-intro-to-regression-used-car_final.ipynb
├── 03_Classification_heart.ipynb
├── 04-intro-to-NLP-and-topic-modeling.ipynb
├── README.md
├── data
│   ├── data_small_final.csv
│   └── heart_2020_cleaned.csv
├── imgs
│   ├── CRT.png
│   ├── adjusted_r.png
│   ├── emotions.png
│   ├── flying_blind.png
│   ├── logistic_1.png
│   ├── nlp_topics.png
│   ├── r_2.png
│   ├── regression_ml.png
│   └── saab.png
└── solutions
    ├── 02-intro-to-regression-used-car_final_solutions.ipynb
    └── 03_Classification_heart_solutions.ipynb
/00_Requirements_and_Installs.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "## Intro to Machine Learning!\n"
8 | ]
9 | },
10 | {
11 | "cell_type": "markdown",
12 | "metadata": {},
13 | "source": [
14 | "## Set-up \n",
15 | "\n",
16 | "\n",
17 | "### 0) Installing Anaconda\n",
18 | "\n",
19 | "If you haven't already, follow the instructions [here](https://github.com/julialintern/Intro_to_Deep_Learning/anaconda_install/) to **install an updated version of Anaconda** (with Python 3). \n",
20 | "\n",
21 | "Next, check that `conda` is installed by running `conda -V` from your terminal. You should\n",
22 | "receive a response indicating your current `conda` version."
23 | ]
24 | },
25 | {
26 | "cell_type": "markdown",
27 | "metadata": {},
28 | "source": [
29 | "### 1) Install Environment: \n",
30 | "\n",
31 | "\n",
32 | "```bash\n",
33 | "conda create -n ml python=3 \n",
34 | "conda activate ml\n",
35 | "conda install anaconda\n",
36 | "```\n",
37 | "\n",
38 | "\n",
39 | "#### Add the 'ml' kernel to Jupyter\n",
40 | "```bash\n",
41 | "conda install ipykernel\n",
42 | "python -m ipykernel install --user --name ml\n",
43 | "```\n",
44 | "\nBefore each session, activate the environment:\n",
45 | "```bash\n",
46 | "$ conda activate ml\n",
47 | "```\n",
48 | "You can then start Jupyter by running:\n",
49 | "\n",
50 | "```bash\n",
51 | "$ jupyter notebook\n",
52 | "```\n",
53 | "\n",
54 | "When starting a new notebook in Jupyter, select \"Kernel ->\n",
55 | "Change Kernel -> ml\" before running."
56 | ]
57 | },
58 | {
59 | "cell_type": "markdown",
60 | "metadata": {},
61 | "source": [
62 | "\n",
63 | "\n"
64 | ]
65 | },
66 | {
67 | "cell_type": "markdown",
68 | "metadata": {
69 | "collapsed": true
70 | },
71 | "source": [
72 | "### 2) Git Clone:\n",
73 | "- If you haven't already, clone the workshop repo: https://github.com/julialintern/intro_to_machine_learning"
74 | ]
75 | },
76 | {
77 | "cell_type": "markdown",
78 | "metadata": {
79 | "collapsed": true
80 | },
81 | "source": [
82 | "### 3) Testing:\n",
83 | "#### Launch Jupyter Notebook\n"
84 | ]
85 | },
86 | {
87 | "cell_type": "code",
88 | "execution_count": 1,
89 | "metadata": {},
90 | "outputs": [],
91 | "source": [
92 | "# Once in your notebook, check that scikit-learn imports without errors:\n",
93 | "from sklearn.linear_model import LinearRegression"
94 | ]
95 | },
96 | {
97 | "cell_type": "code",
98 | "execution_count": null,
99 | "metadata": {},
100 | "outputs": [],
101 | "source": []
102 | }
103 | ],
104 | "metadata": {
105 | "kernelspec": {
106 | "display_name": "ml",
107 | "language": "python",
108 | "name": "ml"
109 | },
110 | "language_info": {
111 | "codemirror_mode": {
112 | "name": "ipython",
113 | "version": 3
114 | },
115 | "file_extension": ".py",
116 | "mimetype": "text/x-python",
117 | "name": "python",
118 | "nbconvert_exporter": "python",
119 | "pygments_lexer": "ipython3",
120 | "version": "3.10.11"
121 | }
122 | },
123 | "nbformat": 4,
124 | "nbformat_minor": 2
125 | }
126 |
--------------------------------------------------------------------------------
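`00_Requirements_and_Installs.ipynb` tests the setup by importing `LinearRegression` only. A slightly broader check, sketched below, reports which of the packages these notebooks import (pandas, NumPy, and scikit-learn) are installed in the active kernel, and at which version. It assumes that list covers what the workshop needs; the helper `check_packages` and the `REQUIRED` list are hypothetical names, and the code relies only on the standard library's `importlib.metadata`, so it still runs when a package is missing.

```python
from importlib import metadata

# Distribution names as used by pip/conda, not import names
# (scikit-learn is imported as `sklearn`). Assumed list of workshop packages.
REQUIRED = ["pandas", "numpy", "scikit-learn"]


def check_packages(names):
    """Return a dict mapping each distribution name to its installed
    version string, or None if it is not installed in this environment."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, version in check_packages(REQUIRED).items():
        status = version if version is not None else "MISSING (install it into the 'ml' env)"
        print(f"{name:15s} {status}")
```

Run it in a notebook using the `ml` kernel; a `None` (printed as MISSING) means the package is not visible to that kernel.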
/01_What_is_Machine_Learning.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "id": "f09755c1",
6 | "metadata": {},
7 | "source": [
8 | "# What is Machine Learning?"
9 | ]
10 | },
11 | {
12 | "cell_type": "markdown",
13 | "id": "0de8d9da",
14 | "metadata": {},
15 | "source": [
16 | "*Machine Learning is a subfield of AI that involves the development of algorithms and statistical models that allow computers to automatically learn and improve from experience without being explicitly programmed. In other words, it is the science of getting computers to learn from data and make predictions or decisions based on that learning.* \n",
17 | "\n",
18 | "*Machine Learning algorithms can be broadly categorized into three types: supervised learning, unsupervised learning, and reinforcement learning.*\n",
19 | "\n",
20 | " -- ChatGPT"
21 | ]
22 | },
23 | {
24 | "cell_type": "markdown",
25 | "id": "c3cd729b",
26 | "metadata": {},
27 | "source": [
28 | ""
29 | ]
30 | },
31 | {
32 | "cell_type": "markdown",
33 | "id": "ea67f4ba",
34 | "metadata": {},
35 | "source": [
36 | ""
37 | ]
38 | },
39 | {
40 | "cell_type": "markdown",
41 | "id": "ac487c5d",
42 | "metadata": {},
43 | "source": [
44 | " # ALL Things NLP"
45 | ]
46 | },
47 | {
48 | "cell_type": "code",
49 | "execution_count": 3,
50 | "id": "bd4e499e",
51 | "metadata": {},
52 | "outputs": [],
53 | "source": [
54 | "#"
55 | ]
56 | },
57 | {
58 | "cell_type": "markdown",
59 | "id": "fc1c0134",
60 | "metadata": {},
61 | "source": [
62 | ""
63 | ]
64 | },
65 | {
66 | "cell_type": "markdown",
67 | "id": "9aa8f593",
68 | "metadata": {},
69 | "source": [
70 | ""
71 | ]
72 | },
73 | {
74 | "cell_type": "markdown",
75 | "id": "681cb1bd",
76 | "metadata": {},
77 | "source": [
78 | "[Source: Finding a good read among billions of choices (MIT News)](https://news.mit.edu/2019/finding-good-read-among-billions-of-choices-1220)"
79 | ]
80 | },
81 | {
82 | "cell_type": "markdown",
83 | "id": "552e3e0e",
84 | "metadata": {},
85 | "source": [
86 | "[Machine Learning as per Wikipedia](https://en.wikipedia.org/wiki/Machine_learning)"
87 | ]
88 | },
89 | {
90 | "cell_type": "markdown",
91 | "id": "87acdd98",
92 | "metadata": {},
93 | "source": [
94 | "[It's all around us](https://www.nytimes.com/search?query=ai)"
95 | ]
96 | },
97 | {
98 | "cell_type": "markdown",
99 | "id": "d346a235",
100 | "metadata": {},
101 | "source": [
102 | "A bit about optimization, and discovering the ultimate trade-off between: \n",
103 | " \n",
104 | " * Exhibit exemplary grit \n",
105 | " * “Insanity is doing the same thing over and over and expecting different results.” \n",
106 | "\n",
107 | " * Be endlessly curious: try all the things \n",
108 | " * Take an MVP approach (just make sure you deliver by the deadline!) "
109 | ]
110 | },
111 | {
112 | "cell_type": "markdown",
113 | "id": "4631772f",
114 | "metadata": {},
115 | "source": [
116 | "### Quick Overview \n",
117 | "\n",
118 | "- Regression modeling: 60 min\n",
119 | "- EventX Q&A: 10 min\n",
120 | "\n",
121 | "\n",
122 | "- Classification modeling: 60 min\n",
123 | "- EventX Q&A: 10 min\n",
124 | "\n",
125 | "\n",
126 | "- NLP & topic modeling: 20 min\n",
127 | "- EventX Q&A: 5 min\n",
128 | "\n",
129 | "**Total: 2 hrs 45 min**"
130 | ]
131 | },
132 | {
133 | "cell_type": "code",
134 | "execution_count": null,
135 | "id": "e37f945c",
136 | "metadata": {},
137 | "outputs": [],
138 | "source": []
139 | }
140 | ],
141 | "metadata": {
142 | "kernelspec": {
143 | "display_name": "ml",
144 | "language": "python",
145 | "name": "ml"
146 | },
147 | "language_info": {
148 | "codemirror_mode": {
149 | "name": "ipython",
150 | "version": 3
151 | },
152 | "file_extension": ".py",
153 | "mimetype": "text/x-python",
154 | "name": "python",
155 | "nbconvert_exporter": "python",
156 | "pygments_lexer": "ipython3",
157 | "version": "3.10.11"
158 | }
159 | },
160 | "nbformat": 4,
161 | "nbformat_minor": 5
162 | }
163 |
--------------------------------------------------------------------------------
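Step 2 of the NLP notebook below asks for a document-term matrix whose `(i, j)` entry counts term `j` in document `i`, and asks how `.vocabulary_` differs from the list of feature names. As a minimal sketch of what `CountVectorizer` computes, here is a plain-Python version run on two of the headlines. It assumes scikit-learn's documented defaults: text is lowercased, tokens match `token_pattern=r"(?u)\b\w\w+\b"` (so one-character tokens such as the "a" in "a g calls for infrastructure protection summit" are dropped), and columns are numbered in alphabetical order of the terms. The helpers `tokenize`, `build_vocabulary`, and `count_matrix` are hypothetical, not scikit-learn functions, and the result is a dense list of lists rather than the sparse matrix scikit-learn returns.

```python
import re

# Mirrors CountVectorizer's default token_pattern: runs of 2+ word characters.
TOKEN_RE = re.compile(r"(?u)\b\w\w+\b")


def tokenize(doc):
    """Lowercase the document and split it into tokens of 2+ word characters."""
    return TOKEN_RE.findall(doc.lower())


def build_vocabulary(docs):
    """Map each distinct term to a column index, assigned in sorted order
    (the same kind of mapping as CountVectorizer's `vocabulary_`)."""
    terms = sorted({tok for doc in docs for tok in tokenize(doc)})
    return {term: j for j, term in enumerate(terms)}


def count_matrix(docs, vocabulary):
    """Dense document-term matrix: row i is a document, column j is a term,
    and entry (i, j) is how often term j occurs in document i."""
    matrix = [[0] * len(vocabulary) for _ in docs]
    for i, doc in enumerate(docs):
        for tok in tokenize(doc):
            if tok in vocabulary:
                matrix[i][vocabulary[tok]] += 1
    return matrix


docs = [
    "air nz staff in aust strike for pay rise",
    "air nz strike to affect australian travellers",
]
vocab = build_vocabulary(docs)                  # term -> column index
feature_names = sorted(vocab, key=vocab.get)    # column index -> term
doc_term = count_matrix(docs, vocab)

print(feature_names)
print(doc_term)
```

In scikit-learn terms, `vocab` plays the role of `vec.vocabulary_` (a dict from term to column index) and `feature_names` plays the role of `vec.get_feature_names_out()` (the terms listed in column order). On the full headline data the vocabulary has tens of thousands of columns (the indices in the `vocabulary_` output reach past 96,000), and most entries in each row are zero, which is why the library stores the matrix sparsely.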
/04-intro-to-NLP-and-topic-modeling.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | ""
8 | ]
9 | },
10 | {
11 | "cell_type": "markdown",
12 | "metadata": {},
13 | "source": [
14 | "# Objectives\n",
15 | "By the end of this notebook, students should: \n",
16 | "\n",
17 | "* Develop a basic understanding of how to get started with text data\n",
18 | "* Perform basic preprocessing & vectorization of text data \n",
19 | "* Build and interpret an NMF (non-negative matrix factorization) topic model \n",
20 | "\n",
21 | "Data: \n",
22 | "We'll work with [one million ABC News headlines](https://www.kaggle.com/code/thebrownviking20/k-means-clustering-of-1-million-headlines/data)"
23 | ]
24 | },
25 | {
26 | "cell_type": "markdown",
27 | "metadata": {},
28 | "source": [
29 | "# Building an NLP Pipeline\n",
30 | "\n",
31 | "For the pair problem today, we'll build a pipeline which manages the *basic* requirements for an NLP project. The goal is to build a toolbox for converting one or more strings of text into a matrix (retaining textual information along the way)."
32 | ]
33 | },
34 | {
35 | "cell_type": "markdown",
36 | "metadata": {},
37 | "source": [
38 | "## Step 1: Read in Data"
39 | ]
40 | },
41 | {
42 | "cell_type": "code",
43 | "execution_count": 10,
44 | "metadata": {},
45 | "outputs": [],
46 | "source": [
47 | "import pandas as pd\n",
48 | "import numpy as np"
49 | ]
50 | },
51 | {
52 | "cell_type": "code",
53 | "execution_count": 2,
54 | "metadata": {},
55 | "outputs": [
56 | {
57 | "data": {
111 | "text/plain": [
112 | " publish_date headline_text\n",
113 | "0 20030219 aba decides against community broadcasting lic...\n",
114 | "1 20030219 act fire witnesses must be aware of defamation\n",
115 | "2 20030219 a g calls for infrastructure protection summit\n",
116 | "3 20030219 air nz staff in aust strike for pay rise\n",
117 | "4 20030219 air nz strike to affect australian travellers"
118 | ]
119 | },
120 | "execution_count": 2,
121 | "metadata": {},
122 | "output_type": "execute_result"
123 | }
124 | ],
125 | "source": [
126 | "df = pd.read_csv('~/Downloads/abcnews-date-text.csv')  # update this path to wherever you saved the Kaggle CSV\n",
127 | "df.head()"
128 | ]
129 | },
130 | {
131 | "cell_type": "markdown",
132 | "metadata": {},
133 | "source": [
134 | "## Step 2: Vectorize (part 1)\n",
135 | "\n",
136 | "Using one of the vectorizers imported below from scikit-learn, **convert the `headline_text` pandas Series (`docs`) to a matrix**, where each row represents a document and each column represents a term (a word). The number of rows should match the number of rows in `df`; this collection of documents is called the \"corpus\". The number of columns should be the total number of *distinct* terms (i.e., words) in the corpus; this set of terms is called the \"vocabulary\".\n",
137 | "\n",
138 | "**Build the matrix such that the value at `(i,j)` is the *count* of term (column) `j` in document (row) `i`.**\n",
139 | "\n",
140 | "**What are the terms in this corpus?** *Hint: When using one of these vectorizers, what is the difference between `.vocabulary_` and `.get_feature_names_out()`? (The older `.get_feature_names()` was removed in scikit-learn 1.2.)*\n",
141 | "\n",
142 | "*Note: By default, the vectorizers return a SciPy sparse matrix.*"
143 | ]
144 | },
145 | {
146 | "cell_type": "code",
147 | "execution_count": 3,
148 | "metadata": {},
149 | "outputs": [],
150 | "source": [
151 | "from sklearn.feature_extraction.text import TfidfVectorizer, CountVectorizer"
152 | ]
153 | },
154 | {
155 | "cell_type": "code",
156 | "execution_count": 4,
157 | "metadata": {},
158 | "outputs": [],
159 | "source": [
160 | "docs = df.headline_text"
161 | ]
162 | },
163 | {
164 | "cell_type": "code",
165 | "execution_count": 5,
166 | "metadata": {},
167 | "outputs": [],
168 | "source": [
169 | "vec = CountVectorizer()"
170 | ]
171 | },
172 | {
173 | "cell_type": "code",
174 | "execution_count": 6,
175 | "metadata": {},
176 | "outputs": [],
177 | "source": [
178 | "doc_term = vec.fit_transform(docs)"
179 | ]
180 | },
181 | {
182 | "cell_type": "code",
183 | "execution_count": 11,
184 | "metadata": {
185 | "scrolled": true,
186 | "tags": []
187 | },
188 | "outputs": [
189 | {
190 | "data": {
191 | "text/plain": [
192 | "{'aba': 4665,\n",
193 | " 'decides': 25198,\n",
194 | " 'against': 5913,\n",
195 | " 'community': 21254,\n",
196 | " 'broadcasting': 15400,\n",
197 | " 'licence': 51363,\n",
198 | " 'act': 5288,\n",
199 | " 'fire': 33684,\n",
200 | " 'witnesses': 94686,\n",
201 | " 'must': 59152,\n",
202 | " 'be': 11516,\n",
203 | " 'aware': 9888,\n",
204 | " 'of': 62095,\n",
205 | " 'defamation': 25374,\n",
206 | " 'calls': 16958,\n",
207 | " 'for': 34625,\n",
208 | " 'infrastructure': 44473,\n",
209 | " 'protection': 68858,\n",
210 | " 'summit': 83464,\n",
211 | " 'air': 6232,\n",
212 | " 'nz': 61725,\n",
213 | " 'staff': 81689,\n",
214 | " 'in': 43928,\n",
215 | " 'aust': 9583,\n",
216 | " 'strike': 82846,\n",
217 | " 'pay': 64904,\n",
218 | " 'rise': 73710,\n",
219 | " 'to': 86856,\n",
220 | " 'affect': 5775,\n",
221 | " 'australian': 9638,\n",
222 | " 'travellers': 87872,\n",
223 | " 'ambitious': 7142,\n",
224 | " 'olsson': 62368,\n",
225 | " 'wins': 94525,\n",
226 | " 'triple': 88206,\n",
227 | " 'jump': 46985,\n",
228 | " 'antic': 7854,\n",
229 | " 'delighted': 25653,\n",
230 | " 'with': 94643,\n",
231 | " 'record': 71323,\n",
232 | " 'breaking': 15014,\n",
233 | " 'barca': 10886,\n",
234 | " 'aussie': 9577,\n",
235 | " 'qualifier': 69690,\n",
236 | " 'stosur': 82616,\n",
237 | " 'wastes': 92973,\n",
238 | " 'four': 34953,\n",
239 | " 'memphis': 55926,\n",
240 | " 'match': 54705,\n",
241 | " 'addresses': 5422,\n",
242 | " 'un': 89301,\n",
243 | " 'security': 77124,\n",
244 | " 'council': 22877,\n",
245 | " 'over': 63341,\n",
246 | " 'iraq': 45396,\n",
247 | " 'australia': 9635,\n",
248 | " 'is': 45527,\n",
249 | " 'locked': 52080,\n",
250 | " 'into': 45131,\n",
251 | " 'war': 92721,\n",
252 | " 'timetable': 86655,\n",
253 | " 'opp': 62643,\n",
254 | " 'contribute': 22177,\n",
255 | " '10': 407,\n",
256 | " 'million': 56790,\n",
257 | " 'aid': 6175,\n",
258 | " 'take': 84661,\n",
259 | " 'as': 8836,\n",
260 | " 'robson': 74004,\n",
261 | " 'celebrates': 18392,\n",
262 | " 'birthday': 13111,\n",
263 | " 'bathhouse': 11314,\n",
264 | " 'plans': 66655,\n",
265 | " 'move': 58496,\n",
266 | " 'ahead': 6132,\n",
267 | " 'big': 12724,\n",
268 | " 'hopes': 42312,\n",
269 | " 'launceston': 50385,\n",
270 | " 'cycling': 24275,\n",
271 | " 'championship': 18759,\n",
272 | " 'plan': 66622,\n",
273 | " 'boost': 14283,\n",
274 | " 'paroo': 64506,\n",
275 | " 'water': 93009,\n",
276 | " 'supplies': 83779,\n",
277 | " 'blizzard': 13518,\n",
278 | " 'buries': 16261,\n",
279 | " 'united': 89944,\n",
280 | " 'states': 81976,\n",
281 | " 'bills': 12850,\n",
282 | " 'brigadier': 15246,\n",
283 | " 'dismisses': 27304,\n",
284 | " 'reports': 72508,\n",
285 | " 'troops': 88293,\n",
286 | " 'harassed': 40083,\n",
287 | " 'british': 15357,\n",
288 | " 'combat': 21013,\n",
289 | " 'arriving': 8712,\n",
290 | " 'daily': 24425,\n",
291 | " 'kuwait': 49610,\n",
292 | " 'bryant': 15709,\n",
293 | " 'leads': 50598,\n",
294 | " 'lakers': 49874,\n",
295 | " 'double': 28169,\n",
296 | " 'overtime': 63558,\n",
297 | " 'win': 94378,\n",
298 | " 'bushfire': 16423,\n",
299 | " 'victims': 91637,\n",
300 | " 'urged': 90496,\n",
301 | " 'see': 77157,\n",
302 | " 'centrelink': 18505,\n",
303 | " 'businesses': 16464,\n",
304 | " 'should': 78631,\n",
305 | " 'prepare': 68105,\n",
306 | " 'terrorist': 85822,\n",
307 | " 'attacks': 9352,\n",
308 | " 'calleri': 16940,\n",
309 | " 'avenges': 9794,\n",
310 | " 'final': 33539,\n",
311 | " 'defeat': 25389,\n",
312 | " 'eliminate': 30005,\n",
313 | " 'massu': 54641,\n",
314 | " 'call': 16926,\n",
315 | " 'ethanol': 31338,\n",
316 | " 'blend': 13442,\n",
317 | " 'fuel': 35579,\n",
318 | " 'go': 37632,\n",
319 | " 'carews': 17607,\n",
320 | " 'freak': 35149,\n",
321 | " 'goal': 37635,\n",
322 | " 'leaves': 50701,\n",
323 | " 'roma': 74249,\n",
324 | " 'ruins': 74836,\n",
325 | " 'cemeteries': 18442,\n",
326 | " 'miss': 57268,\n",
327 | " 'out': 63134,\n",
328 | " 'on': 62447,\n",
329 | " 'funds': 35701,\n",
330 | " 'code': 20647,\n",
331 | " 'conduct': 21608,\n",
332 | " 'toughens': 87372,\n",
333 | " 'organ': 62837,\n",
334 | " 'donation': 27965,\n",
335 | " 'regulations': 71844,\n",
336 | " 'commonwealth': 21215,\n",
337 | " 'bank': 10742,\n",
338 | " 'cuts': 24189,\n",
339 | " 'fixed': 33916,\n",
340 | " 'home': 42085,\n",
341 | " 'loan': 52003,\n",
342 | " 'rates': 70681,\n",
343 | " 'help': 41071,\n",
344 | " 'homeless': 42111,\n",
345 | " 'youth': 96099,\n",
346 | " 'chief': 19276,\n",
347 | " 'executive': 31779,\n",
348 | " 'fails': 32361,\n",
349 | " 'secure': 77115,\n",
350 | " 'position': 67583,\n",
351 | " 'councillor': 22881,\n",
352 | " 'contest': 22098,\n",
353 | " 'wollongong': 94807,\n",
354 | " 'independent': 44156,\n",
355 | " 'moves': 58506,\n",
356 | " 'protect': 68852,\n",
357 | " 'tas': 85112,\n",
358 | " 'heritage': 41262,\n",
359 | " 'garden': 36214,\n",
360 | " 'welcomes': 93475,\n",
361 | " 'ambulance': 7156,\n",
362 | " 'levy': 51207,\n",
363 | " 'decision': 25214,\n",
364 | " 'insurance': 44867,\n",
365 | " 'breakthrough': 15022,\n",
366 | " 'crean': 23282,\n",
367 | " 'tells': 85578,\n",
368 | " 'alp': 6934,\n",
369 | " 'leadership': 50590,\n",
370 | " 'critics': 23514,\n",
371 | " 'shut': 78779,\n",
372 | " 'up': 90347,\n",
373 | " 'dargo': 24729,\n",
374 | " 'threat': 86274,\n",
375 | " 'expected': 31916,\n",
376 | " 'death': 25061,\n",
377 | " 'toll': 86998,\n",
378 | " 'continues': 22124,\n",
379 | " 'climb': 20233,\n",
380 | " 'korean': 49142,\n",
381 | " 'subway': 83245,\n",
382 | " 'dems': 25839,\n",
383 | " 'hold': 41969,\n",
384 | " 'plebiscite': 66791,\n",
385 | " 'iraqi': 45397,\n",
386 | " 'conflict': 21685,\n",
387 | " 'dent': 25907,\n",
388 | " 'downs': 28277,\n",
389 | " 'philippoussis': 65916,\n",
390 | " 'tie': 86503,\n",
391 | " 'break': 14998,\n",
392 | " 'thriller': 86306,\n",
393 | " 'de': 24986,\n",
394 | " 'villiers': 91798,\n",
395 | " 'learn': 50660,\n",
396 | " 'fate': 32794,\n",
397 | " 'march': 54142,\n",
398 | " 'digital': 26772,\n",
399 | " 'tv': 88863,\n",
400 | " 'will': 94248,\n",
401 | " 'become': 11702,\n",
402 | " 'commonplace': 21209,\n",
403 | " 'direct': 26952,\n",
404 | " 'anger': 7562,\n",
405 | " 'at': 9234,\n",
406 | " 'govt': 38147,\n",
407 | " 'not': 61253,\n",
408 | " 'soldiers': 80452,\n",
409 | " 'urges': 90501,\n",
410 | " 'dispute': 27396,\n",
411 | " 'smithton': 79980,\n",
412 | " 'vegetable': 91267,\n",
413 | " 'processing': 68512,\n",
414 | " 'plant': 66656,\n",
415 | " 'dog': 27798,\n",
416 | " 'mauls': 54861,\n",
417 | " '18': 1398,\n",
418 | " 'month': 57926,\n",
419 | " 'old': 62303,\n",
420 | " 'toddler': 86907,\n",
421 | " 'nsw': 61416,\n",
422 | " 'dying': 29141,\n",
423 | " 'passengers': 64641,\n",
424 | " 'phoned': 65972,\n",
425 | " 'england': 30620,\n",
426 | " 'change': 18795,\n",
427 | " 'three': 86288,\n",
428 | " 'wales': 92486,\n",
429 | " 'epa': 30924,\n",
430 | " 'still': 82344,\n",
431 | " 'trying': 88469,\n",
432 | " 'recover': 71346,\n",
433 | " 'chemical': 19133,\n",
434 | " 'clean': 20096,\n",
435 | " 'costs': 22801,\n",
436 | " 'expressions': 32041,\n",
437 | " 'interest': 44962,\n",
438 | " 'sought': 80721,\n",
439 | " 'build': 15928,\n",
440 | " 'livestock': 51908,\n",
441 | " 'fed': 32968,\n",
442 | " 're': 70883,\n",
443 | " 'introduce': 45155,\n",
444 | " 'national': 59796,\n",
445 | " 'firefighters': 33725,\n",
446 | " 'contain': 22039,\n",
447 | " 'acid': 5207,\n",
448 | " 'spill': 81173,\n",
449 | " 'injured': 44603,\n",
450 | " 'head': 40658,\n",
451 | " 'highway': 41513,\n",
452 | " 'crash': 23211,\n",
453 | " 'freedom': 35195,\n",
454 | " 'records': 71331,\n",
455 | " 'net': 60280,\n",
456 | " 'profit': 68589,\n",
457 | " 'third': 86149,\n",
458 | " 'successive': 83264,\n",
459 | " 'allocated': 6832,\n",
460 | " 'domestic': 27900,\n",
461 | " 'violence': 91872,\n",
462 | " 'risk': 73724,\n",
463 | " 'announced': 7729,\n",
464 | " 'bridge': 15202,\n",
465 | " 'work': 95097,\n",
466 | " 'cadell': 16751,\n",
467 | " 'upgrade': 90371,\n",
468 | " 'restore': 72886,\n",
469 | " 'cossack': 22764,\n",
470 | " 'german': 36867,\n",
471 | " 'court': 22967,\n",
472 | " 'give': 37304,\n",
473 | " 'verdict': 91423,\n",
474 | " 'sept': 77482,\n",
475 | " '11': 578,\n",
476 | " 'accused': 5154,\n",
477 | " 'gilchrist': 37105,\n",
478 | " 'backs': 10207,\n",
479 | " 'rest': 72849,\n",
480 | " 'policy': 67156,\n",
481 | " 'girl': 37247,\n",
482 | " 'gold': 37746,\n",
483 | " 'coast': 20509,\n",
484 | " 'hear': 40763,\n",
485 | " 'about': 4891,\n",
486 | " 'bilby': 12799,\n",
487 | " 'project': 68645,\n",
488 | " 'golf': 37793,\n",
489 | " 'club': 20378,\n",
490 | " 'feeling': 33026,\n",
491 | " 'smoking': 80006,\n",
492 | " 'ban': 10642,\n",
493 | " 'impact': 43735,\n",
494 | " 'blame': 13344,\n",
495 | " 'ethanols': 31340,\n",
496 | " 'unpopularity': 90087,\n",
497 | " 'greens': 38568,\n",
498 | " 'offer': 62127,\n",
499 | " 'police': 67137,\n",
500 | " 'station': 81989,\n",
501 | " 'alternative': 6998,\n",
502 | " 'griffiths': 38693,\n",
503 | " 'under': 89503,\n",
504 | " 'knock': 48873,\n",
505 | " 'back': 10139,\n",
506 | " 'group': 38880,\n",
507 | " 'meet': 55704,\n",
508 | " 'north': 61169,\n",
509 | " 'west': 93598,\n",
510 | " 'wa': 92293,\n",
511 | " 'rock': 74032,\n",
512 | " 'art': 8749,\n",
513 | " 'hacker': 39502,\n",
514 | " 'gains': 35933,\n",
515 | " 'access': 5058,\n",
516 | " 'eight': 29762,\n",
517 | " 'credit': 23317,\n",
518 | " 'cards': 17575,\n",
519 | " 'hanson': 40021,\n",
520 | " 'grossly': 38848,\n",
521 | " 'naive': 59478,\n",
522 | " 'issues': 45648,\n",
523 | " 'costa': 22772,\n",
524 | " 'where': 93823,\n",
525 | " 'she': 78103,\n",
526 | " 'came': 17039,\n",
527 | " 'from': 35441,\n",
528 | " 'mp': 58554,\n",
529 | " 'harrington': 40268,\n",
530 | " 'raring': 70627,\n",
531 | " 'after': 5882,\n",
532 | " 'health': 40738,\n",
533 | " 'minister': 57006,\n",
534 | " 'and': 7458,\n",
535 | " 'tissue': 86793,\n",
536 | " 'storage': 82568,\n",
537 | " 'heavy': 40860,\n",
538 | " 'metal': 56226,\n",
539 | " 'deposits': 26014,\n",
540 | " 'survey': 83928,\n",
541 | " 'nearing': 60009,\n",
542 | " 'end': 30492,\n",
543 | " 'rios': 73660,\n",
544 | " 'pulls': 69170,\n",
545 | " 'buenos': 15867,\n",
546 | " 'aires': 6266,\n",
547 | " 'open': 62580,\n",
548 | " 'inquest': 44685,\n",
549 | " 'finds': 33583,\n",
550 | " 'mans': 53996,\n",
551 | " 'accidental': 5075,\n",
552 | " 'investigations': 45256,\n",
553 | " 'underway': 89649,\n",
554 | " 'investigation': 45255,\n",
555 | " 'elster': 30125,\n",
556 | " 'creek': 23336,\n",
557 | " 'iraqs': 45400,\n",
558 | " 'neighbours': 60148,\n",
559 | " 'plead': 66771,\n",
560 | " 'continued': 22123,\n",
561 | " 'inspections': 44768,\n",
562 | " 'own': 63616,\n",
563 | " 'rebuilding': 71104,\n",
564 | " 'white': 93920,\n",
565 | " 'house': 42605,\n",
566 | " 'irish': 45424,\n",
567 | " 'man': 53773,\n",
568 | " 'arrested': 8691,\n",
569 | " 'omagh': 62390,\n",
570 | " 'bombing': 14076,\n",
571 | " 'irrigators': 45499,\n",
572 | " 'vote': 92185,\n",
573 | " 'river': 73776,\n",
574 | " 'management': 53781,\n",
575 | " 'israeli': 45634,\n",
576 | " 'forces': 34654,\n",
577 | " 'push': 69358,\n",
578 | " 'gaza': 36484,\n",
579 | " 'strip': 82868,\n",
580 | " 'jury': 47067,\n",
581 | " 'consider': 21893,\n",
582 | " 'murder': 58997,\n",
583 | " 'case': 17910,\n",
584 | " 'juvenile': 47098,\n",
585 | " 'sex': 77723,\n",
586 | " 'offenders': 62118,\n",
587 | " 'unlikely': 89997,\n",
588 | " 'reoffend': 72386,\n",
589 | " 'kelly': 47887,\n",
590 | " 'disgusted': 27220,\n",
591 | " 'alleged': 6772,\n",
592 | " 'bp': 14743,\n",
593 | " 'scare': 76249,\n",
594 | " 'surprised': 83900,\n",
595 | " 'confidence': 21659,\n",
596 | " 'low': 52529,\n",
597 | " '314': 2945,\n",
598 | " 'missing': 57277,\n",
599 | " 'last': 50289,\n",
600 | " 'minute': 57079,\n",
601 | " 'hands': 39924,\n",
602 | " 'alinghi': 6697,\n",
603 | " 'lead': 50579,\n",
604 | " 'demand': 25731,\n",
605 | " 'service': 77600,\n",
606 | " 'central': 18485,\n",
607 | " 'qld': 69583,\n",
608 | " 'hijack': 41532,\n",
609 | " 'attempt': 9366,\n",
610 | " 'charged': 18888,\n",
611 | " 'cooma': 22341,\n",
612 | " 'fined': 33585,\n",
613 | " 'aboriginal': 4858,\n",
614 | " 'tent': 85708,\n",
615 | " 'embassy': 30207,\n",
616 | " 'raid': 70250,\n",
617 | " 'jailed': 45909,\n",
618 | " 'keno': 47973,\n",
619 | " 'fraud': 35121,\n",
620 | " 'knife': 48842,\n",
621 | " 'hijacks': 41538,\n",
622 | " 'light': 51483,\n",
623 | " 'plane': 66623,\n",
624 | " 'martin': 54485,\n",
625 | " 'lobby': 52025,\n",
626 | " 'losing': 52427,\n",
627 | " 'nt': 61438,\n",
628 | " 'seat': 77029,\n",
629 | " 'massive': 54637,\n",
630 | " 'drug': 28630,\n",
631 | " 'crop': 23599,\n",
632 | " 'discovered': 27130,\n",
633 | " 'western': 93624,\n",
634 | " 'mayor': 54964,\n",
635 | " 'warns': 92850,\n",
636 | " 'landfill': 50013,\n",
637 | " 'protesters': 68881,\n",
638 | " 'meeting': 55707,\n",
639 | " 'tick': 86467,\n",
640 | " 'clearance': 20115,\n",
641 | " 'focus': 34435,\n",
642 | " 'broken': 15466,\n",
643 | " 'hill': 41573,\n",
644 | " 'woes': 94760,\n",
645 | " 'moderate': 57518,\n",
646 | " 'lift': 51467,\n",
647 | " 'wages': 92363,\n",
648 | " 'growth': 38916,\n",
649 | " 'more': 58099,\n",
650 | " 'than': 85951,\n",
651 | " '40': 3236,\n",
652 | " 'pc': 64958,\n",
653 | " 'young': 96071,\n",
654 | " 'men': 55929,\n",
655 | " 'drink': 28514,\n",
656 | " 'alcohol': 6540,\n",
657 | " 'restrictions': 72905,\n",
658 | " 'predicted': 67988,\n",
659 | " 'northern': 61194,\n",
660 | " 'women': 94844,\n",
661 | " 'councillors': 22882,\n",
662 | " 'most': 58296,\n",
663 | " 'highly': 41502,\n",
664 | " 'educated': 29614,\n",
665 | " 'live': 51879,\n",
666 | " 'raises': 70323,\n",
667 | " 'hospital': 42497,\n",
668 | " 'concerns': 21497,\n",
669 | " 'parliament': 64462,\n",
670 | " 'rejects': 72025,\n",
671 | " 'claims': 19947,\n",
672 | " 'mugabe': 58700,\n",
673 | " 'touch': 87358,\n",
674 | " 'down': 28241,\n",
675 | " 'paris': 64397,\n",
676 | " 'gallery': 36017,\n",
677 | " 'gets': 36924,\n",
678 | " 'all': 6731,\n",
679 | " 'clear': 20114,\n",
680 | " 'nato': 59824,\n",
681 | " 'gives': 37308,\n",
682 | " 'green': 38509,\n",
683 | " 'defend': 25414,\n",
684 | " 'turkey': 88755,\n",
685 | " 'nca': 59958,\n",
686 | " 'defends': 25424,\n",
687 | " 'new': 60375,\n",
688 | " 'zealand': 96377,\n",
689 | " 'imposes': 43841,\n",
690 | " 'visa': 91924,\n",
691 | " 'entry': 30838,\n",
692 | " 'zimbabwe': 96512,\n",
693 | " 'no': 60941,\n",
694 | " 'side': 78862,\n",
695 | " 'effects': 29655,\n",
696 | " 'whooping': 94028,\n",
697 | " 'cough': 22853,\n",
698 | " 'vaccine': 90870,\n",
699 | " 'holding': 41979,\n",
700 | " 'vegetation': 91274,\n",
701 | " 'running': 74898,\n",
702 | " 'race': 70057,\n",
703 | " 'campaign': 17094,\n",
704 | " 'pledges': 66800,\n",
705 | " '50m': 3598,\n",
706 | " 'drought': 28592,\n",
707 | " 'relief': 72114,\n",
708 | " 'boosts': 14287,\n",
709 | " 'nurse': 61601,\n",
710 | " 'number': 61554,\n",
711 | " 'overseas': 63505,\n",
712 | " 'intake': 44884,\n",
713 | " 'nth': 61465,\n",
714 | " 'koreans': 49143,\n",
715 | " 'seek': 77182,\n",
716 | " 'asylum': 9229,\n",
717 | " 'japanese': 46065,\n",
718 | " 'nursing': 61609,\n",
719 | " 'student': 82992,\n",
720 | " 'oh': 62220,\n",
721 | " 'brother': 15565,\n",
722 | " 'your': 96085,\n",
723 | " 'times': 86647,\n",
724 | " 'says': 76141,\n",
725 | " 'ganguly': 36166,\n",
726 | " 'senior': 77394,\n",
727 | " 'omodei': 62440,\n",
728 | " 'stay': 82028,\n",
729 | " 'politics': 67196,\n",
730 | " 'onesteel': 62482,\n",
731 | " 'invest': 45236,\n",
732 | " '80m': 4284,\n",
733 | " 'whyalla': 94047,\n",
734 | " 'steelworks': 82112,\n",
735 | " 'opposition': 62673,\n",
736 | " 'recherche': 71195,\n",
737 | " 'bay': 11437,\n",
738 | " 'orientation': 62879,\n",
739 | " 'begins': 11856,\n",
740 | " 'uni': 89853,\n",
741 | " 'students': 82993,\n",
742 | " 'osullivan': 63050,\n",
743 | " 'world': 95154,\n",
744 | " 'cross': 23614,\n",
745 | " 'country': 22933,\n",
746 | " 'doubt': 28179,\n",
747 | " 'pagan': 63840,\n",
748 | " 'rule': 74843,\n",
749 | " 'changes': 18804,\n",
750 | " 'necessary': 60027,\n",
751 | " 'pair': 63893,\n",
752 | " 'face': 32254,\n",
753 | " 'ayr': 9996,\n",
754 | " 'patterson': 64808,\n",
755 | " 'attend': 9372,\n",
756 | " 'show': 78650,\n",
757 | " 'displays': 27375,\n",
758 | " 'govts': 38149,\n",
759 | " 'arrogance': 8713,\n",
760 | " 'snubs': 80252,\n",
761 | " 'avoid': 9844,\n",
762 | " 'lions': 51739,\n",
763 | " 'den': 25844,\n",
764 | " 'peace': 64985,\n",
765 | " 'agreement': 6050,\n",
766 | " 'may': 54922,\n",
767 | " 'bring': 15302,\n",
768 | " 'respite': 72819,\n",
769 | " 'venezuela': 91354,\n",
770 | " 'pienaar': 66163,\n",
771 | " 'shines': 78377,\n",
772 | " 'ajax': 6355,\n",
773 | " 'frustrate': 35522,\n",
774 | " 'arsenal': 8733,\n",
775 | " 'second': 77074,\n",
776 | " 'skatepark': 79376,\n",
777 | " 'encourage': 30473,\n",
778 | " 'farmers': 32645,\n",
779 | " 'plantation': 66659,\n",
780 | " 'timber': 86612,\n",
781 | " 'png': 66961,\n",
782 | " 'nurses': 61606,\n",
783 | " 'colleague': 20838,\n",
784 | " 'raped': 70578,\n",
785 | " 'way': 93155,\n",
786 | " 'cracking': 23142,\n",
787 | " 'driver': 28538,\n",
788 | " 'safety': 75294,\n",
789 | " 'policewomen': 67148,\n",
790 | " 'accusations': 5151,\n",
791 | " 'feature': 32956,\n",
792 | " 'federal': 32975,\n",
793 | " 'crime': 23426,\n",
794 | " 'probe': 68476,\n",
795 | " 'launched': 50388,\n",
796 | " 'program': 68618,\n",
797 | " 'monitor': 57819,\n",
798 | " 'forest': 34723,\n",
799 | " 'harvested': 40339,\n",
800 | " 'areas': 8459,\n",
801 | " 'public': 69072,\n",
802 | " 'check': 19059,\n",
803 | " 'gas': 36313,\n",
804 | " 'cylinders': 24298,\n",
805 | " 'warned': 92831,\n",
806 | " 'phone': 65969,\n",
807 | " 'scam': 76198,\n",
808 | " 'qantas': 69487,\n",
809 | " 'international': 45018,\n",
810 | " 'crews': 23398,\n",
811 | " 'cut': 24173,\n",
812 | " '2500': 2359,\n",
813 | " 'jobs': 46582,\n",
814 | " 'outrages': 63261,\n",
815 | " 'unions': 89927,\n",
816 | " 'qr': 69611,\n",
817 | " 'planning': 66653,\n",
818 | " 'route': 74587,\n",
819 | " 'sackings': 75209,\n",
820 | " 'questions': 69837,\n",
821 | " 'grows': 38915,\n",
822 | " 'rabbit': 70034,\n",
823 | " 'control': 22190,\n",
824 | " 'trial': 88081,\n",
825 | " 'radioactive': 70147,\n",
826 | " 'wmcs': 94726,\n",
827 | " 'olympic': 62381,\n",
828 | " 'dam': 24521,\n",
829 | " 'mine': 56911,\n",
830 | " 'rain': 70286,\n",
831 | " 'eases': 29296,\n",
832 | " 'wheatbelt': 93787,\n",
833 | " 'reading': 70923,\n",
834 | " 'first': 33787,\n",
835 | " 'division': 27604,\n",
836 | " 'amount': 7290,\n",
837 | " 'gladstone': 37338,\n",
838 | " 'ventures': 91391,\n",
839 | " 'refshauge': 71694,\n",
840 | " 'regulator': 71845,\n",
841 | " 'inspect': 44764,\n",
842 | " 'gm': 37596,\n",
843 | " 'canola': 17311,\n",
844 | " 'trials': 88086,\n",
845 | " 'report': 72501,\n",
846 | " 'highlights': 41500,\n",
847 | " 'container': 22041,\n",
848 | " 'terminal': 85755,\n",
849 | " 'potential': 67682,\n",
850 | " 'resource': 72797,\n",
851 | " 'stocks': 82456,\n",
852 | " 'ords': 62819,\n",
853 | " 'restraint': 72897,\n",
854 | " 'order': 62798,\n",
855 | " 'issued': 45647,\n",
856 | " 'anti': 7846,\n",
857 | " 'discrimination': 27158,\n",
858 | " 'rfs': 73305,\n",
859 | " 'claim': 19940,\n",
860 | " 'that': 85972,\n",
861 | " 'authorities': 9698,\n",
862 | " 'spurned': 81495,\n",
863 | " 'ricciuto': 73403,\n",
864 | " 'undergoes': 89546,\n",
865 | " 'surgery': 83867,\n",
866 | " 'ankle': 7652,\n",
867 | " 'rice': 73407,\n",
868 | " 'mill': 56747,\n",
869 | " 'closures': 20336,\n",
870 | " 'put': 69373,\n",
871 | " '300': 2816,\n",
872 | " 'rsl': 74695,\n",
873 | " 'angry': 7599,\n",
874 | " 'troop': 88288,\n",
875 | " 'harassment': 40088,\n",
876 | " 'review': 73186,\n",
877 | " 'bushwalker': 16447,\n",
878 | " 'sa': 75132,\n",
879 | " 'premier': 68073,\n",
880 | " 'action': 5298,\n",
881 | " 'murray': 59045,\n",
882 | " 'saudi': 76013,\n",
883 | " 'arabians': 8317,\n",
884 | " 'stand': 81808,\n",
885 | " 'al': 6415,\n",
886 | " 'qaeda': 69473,\n",
887 | " 'arabia': 8315,\n",
888 | " 'arabs': 8323,\n",
889 | " 'inevitable': 44336,\n",
890 | " 'search': 76986,\n",
891 | " 'resolution': 72775,\n",
892 | " 'shortly': 78616,\n",
893 | " 'shire': 78441,\n",
894 | " 'offers': 62131,\n",
895 | " 'assurances': 9150,\n",
896 | " 'finances': 33557,\n",
897 | " 'six': 79325,\n",
898 | " 'palestinians': 63978,\n",
899 | " 'killed': 48390,\n",
900 | " 'incursion': 44116,\n",
901 | " 'slow': 79816,\n",
902 | " 'recovery': 71351,\n",
903 | " 'economy': 29483,\n",
904 | " 'bans': 10796,\n",
905 | " 'hit': 41746,\n",
906 | " 'tabcorp': 84467,\n",
907 | " 'bottom': 14536,\n",
908 | " 'line': 51658,\n",
909 | " 'snowtown': 80242,\n",
910 | " 'delayed': 25599,\n",
911 | " 'forced': 34648,\n",
912 | " 'label': 49706,\n",
913 | " 'stations': 81995,\n",
914 | " 'get': 36915,\n",
915 | " 'sterrey': 82267,\n",
916 | " 'steer': 82129,\n",
917 | " 'sharks': 78021,\n",
918 | " 'sign': 78948,\n",
919 | " 'fisherman': 33815,\n",
920 | " 'stop': 82549,\n",
921 | " 'changing': 18808,\n",
922 | " 'the': 85989,\n",
923 | " 'rules': 74848,\n",
924 | " 'fans': 32563,\n",
925 | " 'tell': 85574,\n",
926 | " 'afl': 5843,\n",
927 | " 'sugar': 83337,\n",
928 | " 'industry': 44298,\n",
929 | " 'revealed': 73133,\n",
930 | " 'surge': 83862,\n",
931 | " 'car': 17486,\n",
932 | " 'sales': 75436,\n",
933 | " 'abs': 4922,\n",
934 | " 'swiss': 84250,\n",
935 | " 'challengers': 18717,\n",
936 | " 'looking': 52330,\n",
937 | " 'future': 35794,\n",
938 | " 'taipans': 84637,\n",
939 | " 'placing': 66601,\n",
940 | " 'publics': 69085,\n",
941 | " 'talk': 84732,\n",
942 | " 'asian': 8952,\n",
943 | " 'nuclear': 61491,\n",
944 | " 'arms': 8622,\n",
945 | " 'unhelpful': 89842,\n",
946 | " 'downer': 28246,\n",
947 | " 'tasmanian': 85153,\n",
948 | " 'scientists': 76573,\n",
949 | " 'east': 29306,\n",
950 | " 'taylor': 85313,\n",
951 | " 'denies': 25866,\n",
952 | " 'calling': 16948,\n",
953 | " 'waugh': 93126,\n",
954 | " 'quit': 69940,\n",
955 | " 'teen': 85464,\n",
956 | " 'charges': 18891,\n",
957 | " 'testing': 85864,\n",
958 | " 'shows': 78688,\n",
959 | " 'dioxin': 26919,\n",
960 | " 'above': 4896,\n",
961 | " 'drinking': 28519,\n",
962 | " 'standards': 81817,\n",
963 | " 'thousands': 86260,\n",
964 | " 'remember': 72204,\n",
965 | " '61st': 3855,\n",
966 | " 'anniversary': 7709,\n",
967 | " 'darwin': 24797,\n",
968 | " 'ask': 8968,\n",
969 | " 'members': 55895,\n",
970 | " 'support': 83787,\n",
971 | " 'protests': 68885,\n",
972 | " 'continue': 22122,\n",
973 | " 'tree': 87957,\n",
974 | " 'disease': 27180,\n",
975 | " 'study': 83003,\n",
976 | " 'us': 90540,\n",
977 | " 'aircraft': 6253,\n",
978 | " 'attack': 9346,\n",
979 | " 'sth': 82297,\n",
980 | " 'target': 85032,\n",
981 | " 'vff': 91564,\n",
982 | " 'buy': 16575,\n",
983 | " 'stock': 82422,\n",
984 | " 'feed': 33007,\n",
985 | " 'pellets': 65196,\n",
986 | " 'affected': 5776,\n",
987 | " 'vic': 91594,\n",
988 | " 'local': 52042,\n",
989 | " 'councils': 22884,\n",
990 | " 'welcome': 93470,\n",
991 | " 'single': 79192,\n",
992 | " 'polling': 67227,\n",
993 | " 'day': 24925,\n",
994 | " 'victorian': 91650,\n",
995 | " 'honoured': 42230,\n",
996 | " 'awards': 9887,\n",
997 | " 'vowles': 92206,\n",
998 | " 'retire': 73013,\n",
999 | " 'season': 77013,\n",
1000 | " 'coach': 20469,\n",
1001 | " 'accuses': 5158,\n",
1002 | " 'players': 66732,\n",
1003 | " 'belittling': 11984,\n",
1004 | " 'red': 71402,\n",
1005 | " 'warne': 92830,\n",
1006 | " 'hearing': 40769,\n",
1007 | " 'set': 77631,\n",
1008 | " 'friday': 35347,\n",
1009 | " 'webb': 93281,\n",
1010 | " 'favourite': 32867,\n",
1011 | " 'ladies': 49787,\n",
1012 | " 'masters': 54671,\n",
1013 | " 'widnes': 94102,\n",
1014 | " 'abandon': 4675,\n",
1015 | " 'paul': 64825,\n",
1016 | " 'bid': 12680,\n",
1017 | " 'wildlife': 94211,\n",
1018 | " 'sanctuaries': 75656,\n",
1019 | " 'williams': 94282,\n",
1020 | " 'tight': 86539,\n",
1021 | " 'bowling': 14683,\n",
1022 | " 'key': 48133,\n",
1023 | " 'warriors': 92899,\n",
1024 | " 'wine': 94449,\n",
1025 | " 'bounces': 14595,\n",
1026 | " 'sacking': 75208,\n",
1027 | " 'worksafe': 95142,\n",
1028 | " 'probes': 68479,\n",
1029 | " 'potato': 67673,\n",
1030 | " 'harvester': 40340,\n",
1031 | " 'injuries': 44605,\n",
1032 | " '15': 1055,\n",
1033 | " 'dead': 24992,\n",
1034 | " 'rebel': 71072,\n",
1035 | " 'philippines': 65913,\n",
1036 | " 'army': 8627,\n",
1037 | " 'abattoir': 4692,\n",
1038 | " 'sale': 75429,\n",
1039 | " 'again': 5910,\n",
1040 | " 'academic': 5009,\n",
1041 | " 'upbeat': 90349,\n",
1042 | " 'higher': 41485,\n",
1043 | " 'education': 29619,\n",
1044 | " 'administrator': 5531,\n",
1045 | " 'appointed': 8171,\n",
1046 | " 'land': 49990,\n",
1047 | " 'aec': 5706,\n",
1048 | " 'declare': 25236,\n",
1049 | " 'if': 43424,\n",
1050 | " 'lose': 52421,\n",
1051 | " 'parliamentary': 64465,\n",
1052 | " 'amcor': 7167,\n",
1053 | " 'solid': 80472,\n",
1054 | " 'result': 72930,\n",
1055 | " 'americas': 7208,\n",
1056 | " 'cup': 24016,\n",
1057 | " 'fourth': 34968,\n",
1058 | " 'cancelled': 17192,\n",
1059 | " 'poised': 67071,\n",
1060 | " 'swoop': 84274,\n",
1061 | " 'beckham': 11688,\n",
1062 | " 'less': 51083,\n",
1063 | " 'austeel': 9594,\n",
1064 | " 'eis': 29783,\n",
1065 | " 'release': 72085,\n",
1066 | " 'due': 28765,\n",
1067 | " 'soon': 80622,\n",
1068 | " 'flag': 33945,\n",
1069 | " '100th': 468,\n",
1070 | " 'awu': 9938,\n",
1071 | " 'port': 67501,\n",
1072 | " 'kembla': 47911,\n",
1073 | " 'baby': 10105,\n",
1074 | " 'badly': 10285,\n",
1075 | " 'burnt': 16323,\n",
1076 | " 'brisbane': 15322,\n",
1077 | " 'bad': 10242,\n",
1078 | " 'weather': 93261,\n",
1079 | " 'might': 56612,\n",
1080 | " 'have': 40502,\n",
1081 | " 'caused': 18209,\n",
1082 | " 'iranian': 45393,\n",
1083 | " 'depleted': 25976,\n",
1084 | " 'juve': 47096,\n",
1085 | " 'beware': 12557,\n",
1086 | " 'standard': 81811,\n",
1087 | " 'alcoholic': 6543,\n",
1088 | " 'dream': 28430,\n",
1089 | " 'reality': 70968,\n",
1090 | " 'sparkies': 80937,\n",
1091 | " 'britain': 15348,\n",
1092 | " 'nationals': 59809,\n",
1093 | " 'leave': 50697,\n",
1094 | " 'high': 41481,\n",
1095 | " 'overturns': 63571,\n",
1096 | " 'blair': 13331,\n",
1097 | " 'magician': 53278,\n",
1098 | " 'entomb': 30808,\n",
1099 | " 'himself': 41626,\n",
1100 | " 'cheese': 19094,\n",
1101 | " 'bungle': 16156,\n",
1102 | " 'doctor': 27742,\n",
1103 | " 'waiting': 92428,\n",
1104 | " 'practise': 67849,\n",
1105 | " 'coronial': 22628,\n",
1106 | " 'inquiry': 44696,\n",
1107 | " 'winds': 94434,\n",
1108 | " 'bush': 16411,\n",
1109 | " 'thanks': 85960,\n",
1110 | " 'ambos': 7149,\n",
1111 | " 'wake': 92446,\n",
1112 | " 'funding': 35693,\n",
1113 | " 'canegrowers': 17242,\n",
1114 | " 'hope': 42298,\n",
1115 | " 'late': 50302,\n",
1116 | " 'summer': 83448,\n",
1117 | " 'capriati': 17437,\n",
1118 | " 'hungry': 42930,\n",
1119 | " 'dubai': 28706,\n",
1120 | " 'celts': 18436,\n",
1121 | " 'underdogs': 89527,\n",
1122 | " 'uefa': 89131,\n",
1123 | " 'clash': 20022,\n",
1124 | " 'oneill': 62472,\n",
1125 | " 'chambers': 18735,\n",
1126 | " 'vows': 92207,\n",
1127 | " 'smash': 79922,\n",
1128 | " 'mark': 54284,\n",
1129 | " 'charvis': 18964,\n",
1130 | " 'pays': 64940,\n",
1131 | " 'penalty': 65234,\n",
1132 | " 'humphreys': 42891,\n",
1133 | " 'earns': 29258,\n",
1134 | " 'shock': 78494,\n",
1135 | " 'christmas': 19620,\n",
1136 | " 'detention': 26320,\n",
1137 | " 'centre': 18497,\n",
1138 | " 'quashed': 69762,\n",
1139 | " 'defence': 25410,\n",
1140 | " 'spending': 81114,\n",
1141 | " 'priority': 68396,\n",
1142 | " 'causing': 18215,\n",
1143 | " 'indigenous': 44206,\n",
1144 | " 'autopsy': 9741,\n",
1145 | " 'consent': 21862,\n",
1146 | " 'college': 20860,\n",
1147 | " 'experience': 31943,\n",
1148 | " 'concern': 21494,\n",
1149 | " 'covered': 23016,\n",
1150 | " 'by': 16613,\n",
1151 | " 'legal': 50796,\n",
1152 | " 'concorde': 21537,\n",
1153 | " 'makes': 53528,\n",
1154 | " 'emergency': 30269,\n",
1155 | " 'landing': 50021,\n",
1156 | " 'canada': 17157,\n",
1157 | " 'laments': 49934,\n",
1158 | " 'job': 46571,\n",
1159 | " 'advertising': 5670,\n",
1160 | " 'general': 36671,\n",
1161 | " 'manager': 53782,\n",
1162 | " 'step': 82205,\n",
1163 | " 'vandalism': 91024,\n",
1164 | " 'reporting': 72507,\n",
1165 | " 'reward': 73250,\n",
1166 | " 'racing': 70086,\n",
1167 | " 'clubs': 20389,\n",
1168 | " 'tab': 84457,\n",
1169 | " 'fears': 32937,\n",
1170 | " 'longford': 52275,\n",
1171 | " 'compo': 21395,\n",
1172 | " 'today': 86902,\n",
1173 | " 'cristal': 23478,\n",
1174 | " 'libertadores': 51332,\n",
1175 | " 'streak': 82755,\n",
1176 | " 'cuper': 24021,\n",
1177 | " 'slams': 79615,\n",
1178 | " 'inter': 44931,\n",
1179 | " 'boy': 14712,\n",
1180 | " 'recoba': 71242,\n",
1181 | " 'deportivo': 26003,\n",
1182 | " 'slip': 79770,\n",
1183 | " 'buoyant': 16197,\n",
1184 | " 'minnows': 57041,\n",
1185 | " 'basel': 11191,\n",
1186 | " 'distance': 27468,\n",
1187 | " 'swimmer': 84212,\n",
1188 | " 'maroney': 54377,\n",
1189 | " 'it': 45661,\n",
1190 | " 'quits': 69946,\n",
1191 | " 'dixon': 27625,\n",
1192 | " ...}"
1193 | ]
1194 | },
1195 | "execution_count": 11,
1196 | "metadata": {},
1197 | "output_type": "execute_result"
1198 | }
1199 | ],
1200 | "source": [
1201 | "vec.vocabulary_"
1202 | ]
1203 | },
1204 | {
1205 | "cell_type": "code",
1206 | "execution_count": 12,
1207 | "metadata": {},
1208 | "outputs": [
1209 | {
1210 | "data": {
1211 | "text/plain": [
1212 | "(1103665, 96687)"
1213 | ]
1214 | },
1215 | "execution_count": 12,
1216 | "metadata": {},
1217 | "output_type": "execute_result"
1218 | }
1219 | ],
1220 | "source": [
1221 | "doc_term.shape"
1222 | ]
1223 | },
1224 | {
1225 | "cell_type": "code",
1226 | "execution_count": 13,
1227 | "metadata": {},
1228 | "outputs": [
1229 | {
1230 | "data": {
1231 | "text/plain": [
1232 | "(1103665, 2)"
1233 | ]
1234 | },
1235 | "execution_count": 13,
1236 | "metadata": {},
1237 | "output_type": "execute_result"
1238 | }
1239 | ],
1240 | "source": [
1241 | "df.shape"
1242 | ]
1243 | },
1244 | {
1245 | "cell_type": "markdown",
1246 | "metadata": {},
1247 | "source": [
1248 | "## Vectorize (part 2)\n",
1249 | "\n",
1250 | "**Build the matrix such that the value at `(i,j)` represents a sort of *normalized frequency*,** which takes into account (a) the density of term `j` in document `i`, as well as (b) the number of documents in which that term occurs.\n",
1251 | "\n",
1252 | "*Hint: Try `TfidfVectorizer`. What is this?*"
1253 | ]
1254 | },
1255 | {
1256 | "cell_type": "code",
1257 | "execution_count": 14,
1258 | "metadata": {},
1259 | "outputs": [],
1260 | "source": [
1261 | "vec = TfidfVectorizer()"
1262 | ]
1263 | },
1264 | {
1265 | "cell_type": "code",
1266 | "execution_count": 15,
1267 | "metadata": {},
1268 | "outputs": [],
1269 | "source": [
1270 | "doc_term = vec.fit_transform(docs.values)"
1271 | ]
1272 | },
1273 | {
1274 | "cell_type": "code",
1275 | "execution_count": 16,
1276 | "metadata": {},
1277 | "outputs": [
1286 | {
1287 | "data": {
1360 | "text/plain": [
1361 | " 000 002 005 006 007 01 0101 010115 010213 010215 ... zydelig \\\n",
1362 | "0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 \n",
1363 | "\n",
1364 | " zygar zygiefs zygier zyl zylvester zynga zyngier zz zzz \n",
1365 | "0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 \n",
1366 | "\n",
1367 | "[1 rows x 96687 columns]"
1368 | ]
1369 | },
1370 | "execution_count": 16,
1371 | "metadata": {},
1372 | "output_type": "execute_result"
1373 | }
1374 | ],
1375 | "source": [
1376 | "# Look at the first document (row 0) to see its TF-IDF weights\n",
1377 | "tfidf_df = pd.DataFrame(doc_term[0].toarray(), columns=vec.get_feature_names_out())\n",
1378 | "tfidf_df.head()"
1379 | ]
1380 | },
1381 | {
1382 | "cell_type": "code",
1383 | "execution_count": 7,
1384 | "metadata": {},
1385 | "outputs": [
1394 | {
1395 | "data": {
1468 | "text/plain": [
1469 | " aa aaa aaco aacta aamer aami aant aapt aaron ab ... zsa \\\n",
1470 | "0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 \n",
1471 | "\n",
1472 | " zuckerberg zuckerbergs zullo zuma zurich zusak zverev zvonareva \\\n",
1473 | "0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 \n",
1474 | "\n",
1475 | " zygier \n",
1476 | "0 0.0 \n",
1477 | "\n",
1478 | "[1 rows x 35226 columns]"
1479 | ]
1480 | },
1481 | "execution_count": 7,
1482 | "metadata": {},
1483 | "output_type": "execute_result"
1484 | }
1485 | ],
1486 | "source": [
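1487 | "# Do a bit better: drop English stop words, terms in more than 75% of documents (max_df), terms in fewer than 6 documents (min_df), and keep only purely alphabetic tokens\n",
1488 | "vec = TfidfVectorizer(stop_words='english', max_df=0.75, min_df=6, token_pattern=r'(?u)\\b[A-Za-z]+\\b')\n",
1489 | "doc_term = vec.fit_transform(docs.values)\n",
1490 | "\n",
1491 | "tfidf_df = pd.DataFrame(doc_term[0].toarray(), columns=vec.get_feature_names_out())\n",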
1492 | "tfidf_df.head()"
1493 | ]
1494 | },
1495 | {
1496 | "cell_type": "markdown",
1497 | "metadata": {},
1498 | "source": [
1499 | "## What can you do with this?\n",
1500 | "\n",
1501 | "Try a few different operations, and try to **interpret their meaning and use case**:\n",
1502 | "\n",
1503 | "* Calculate the correlation between documents, or between terms\n",
1504 | "* Consider bigrams or n-grams in your vectorizer\n",
1505 | "* Determine if there is multicollinearity between documents, or between terms\n",
1506 | "* Try to incorporate the `user_id` into your analysis\n",
1507 | "* Build a Python Class to make your work repeatable\n"
1508 | ]
1509 | },
1510 | {
1511 | "cell_type": "markdown",
1512 | "metadata": {},
1513 | "source": [
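1514 | "### NMF (Non-negative Matrix Factorization)\n",
1515 | "\n",
1516 | "Factor the document-term matrix into a (document x topic) matrix `W` and a (topic x term) matrix `H`, so that `doc_term ≈ W @ H`."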
1517 | ]
1518 | },
1519 | {
1520 | "cell_type": "markdown",
1521 | "metadata": {},
1522 | "source": [
1523 | "
"
1524 | ]
1525 | },
1526 | {
1527 | "cell_type": "code",
1528 | "execution_count": 18,
1529 | "metadata": {},
1530 | "outputs": [
1531 | {
1532 | "data": {
1533 | "text/plain": [
1534 | "(1103665, 32362)"
1535 | ]
1536 | },
1537 | "execution_count": 18,
1538 | "metadata": {},
1539 | "output_type": "execute_result"
1540 | }
1541 | ],
1542 | "source": [
1543 | "doc_term.shape"
1544 | ]
1545 | },
1546 | {
1547 | "cell_type": "code",
1548 | "execution_count": 8,
1549 | "metadata": {},
1550 | "outputs": [
1551 | {
1552 | "data": {
1556 | "text/plain": [
1557 | "NMF(init='random', n_components=10)"
1558 | ]
1559 | },
1560 | "execution_count": 8,
1561 | "metadata": {},
1562 | "output_type": "execute_result"
1563 | }
1564 | ],
1565 | "source": [
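1566 | "from sklearn.decomposition import NMF\n",
1567 | "nmf = NMF(n_components=10, init='random')  # 10 topics; random init without random_state, so results vary between runs\n",
1568 | "W = nmf.fit_transform(doc_term)  # document-topic weights, shape (n_docs, 10)\n",
1569 | "H = nmf.components_  # topic-term weights, shape (10, n_terms)\n",
1570 | "\n",
1571 | "# Display the model (its repr shows the non-default parameters)\n",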
1572 | "nmf"
1573 | ]
1574 | },
1575 | {
1576 | "cell_type": "code",
1577 | "execution_count": 11,
1578 | "metadata": {},
1579 | "outputs": [
1580 | {
1581 | "name": "stderr",
1582 | "output_type": "stream",
1583 | "text": [
1584 | "/Users/julialintern/.local/lib/python3.9/site-packages/sklearn/utils/deprecation.py:87: FutureWarning: Function get_feature_names is deprecated; get_feature_names is deprecated in 1.0 and will be removed in 1.2. Please use get_feature_names_out instead.\n",
1585 | " warnings.warn(msg, category=FutureWarning)\n"
1586 | ]
1587 | }
1588 | ],
1589 | "source": [
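1590 | "vocab = vec.get_feature_names_out()\n",
1591 | "id_topic = W  # reuse the document-topic matrix from the fit above; calling fit_transform again would refit NMF from a new random start\n",
1592 | "n_top_words = 10\n",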
1593 | "topic_words = {}\n",
1594 | "\n",
1595 | "for topic, comp in enumerate(nmf.components_):\n",
1596 | " word_idx = np.argsort(comp)[::-1][:n_top_words]\n",
1597 | " # store the words most relevant to the topic\n",
1598 | " topic_words[topic] = [vocab[i] for i in word_idx]"
1599 | ]
1600 | },
1601 | {
1602 | "cell_type": "code",
1603 | "execution_count": 12,
1604 | "metadata": {},
1605 | "outputs": [
1606 | {
1607 | "name": "stdout",
1608 | "output_type": "stream",
1609 | "text": [
1610 | "0 ['new', 'zealand', 'cases', 'laws', 'year', 'coronavirus', 'york', 'records', 'covid', 'home']\n",
1611 | "\n",
1612 | "\n",
1613 | "1 ['govt', 'council', 'says', 'plan', 'water', 'health', 'urged', 'qld', 'funding', 'government']\n",
1614 | "\n",
1615 | "\n",
1616 | "2 ['interview', 'extended', 'michael', 'david', 'john', 'nrl', 'smith', 'james', 'ben', 'scott']\n",
1617 | "\n",
1618 | "\n",
1619 | "3 ['news', 'abc', 'rural', 'national', 'business', 'weather', 'market', 'sport', 'analysis', 'entertainment']\n",
1620 | "\n",
1621 | "\n",
1622 | "4 ['australia', 'day', 'south', 'world', 'cup', 'test', 'coronavirus', 'live', 'vs', 'china']\n",
1623 | "\n",
1624 | "\n",
1625 | "5 ['country', 'hour', 'nsw', 'tas', 'wa', 'vic', 'august', 'drum', 'october', 'sa']\n",
1626 | "\n",
1627 | "\n",
1628 | "6 ['crash', 'car', 'killed', 'dies', 'fatal', 'woman', 'road', 'driver', 'plane', 'dead']\n",
1629 | "\n",
1630 | "\n",
1631 | "7 ['man', 'charged', 'murder', 'missing', 'jailed', 'stabbing', 'arrested', 'guilty', 'death', 'sydney']\n",
1632 | "\n",
1633 | "\n",
1634 | "8 ['court', 'accused', 'face', 'murder', 'charges', 'faces', 'told', 'case', 'high', 'sex']\n",
1635 | "\n",
1636 | "\n",
1637 | "9 ['police', 'investigate', 'probe', 'missing', 'search', 'death', 'hunt', 'officer', 'shooting', 'seek']\n",
1638 | "\n",
1639 | "\n"
1640 | ]
1641 | }
1642 | ],
1643 | "source": [
1644 | "for k,v in topic_words.items():\n",
1645 | " print(k,v)\n",
1646 | " print('\\n')"
1647 | ]
1648 | },
1649 | {
1650 | "cell_type": "markdown",
1651 | "metadata": {},
1652 | "source": [
1653 | "### Using GloVe Embeddings to Plot Emotions"
1654 | ]
1655 | },
1656 | {
1657 | "cell_type": "markdown",
1658 | "metadata": {},
1659 | "source": [
1660 | "
"
1661 | ]
1662 | },
1663 | {
1664 | "cell_type": "code",
1665 | "execution_count": null,
1666 | "metadata": {},
1667 | "outputs": [],
1668 | "source": []
1669 | }
1670 | ],
1671 | "metadata": {
1672 | "kernelspec": {
1673 | "display_name": "ml",
1674 | "language": "python",
1675 | "name": "ml"
1676 | },
1677 | "language_info": {
1678 | "codemirror_mode": {
1679 | "name": "ipython",
1680 | "version": 3
1681 | },
1682 | "file_extension": ".py",
1683 | "mimetype": "text/x-python",
1684 | "name": "python",
1685 | "nbconvert_exporter": "python",
1686 | "pygments_lexer": "ipython3",
1687 | "version": "3.9.12"
1688 | },
1689 | "toc": {
1690 | "base_numbering": 1,
1691 | "nav_menu": {},
1692 | "number_sections": true,
1693 | "sideBar": true,
1694 | "skip_h1_title": false,
1695 | "title_cell": "Table of Contents",
1696 | "title_sidebar": "Contents",
1697 | "toc_cell": false,
1698 | "toc_position": {},
1699 | "toc_section_display": "block",
1700 | "toc_window_display": false
1701 | }
1702 | },
1703 | "nbformat": 4,
1704 | "nbformat_minor": 4
1705 | }
1706 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # intro_to_machine_learning
2 |
3 | Machine learning tutorial and walkthrough for the ODSC EAST 2023 Conference
4 |
--------------------------------------------------------------------------------
/imgs/CRT.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/julialintern/intro_to_machine_learning/6329e6b928e91004714407d171fc9414abf7368d/imgs/CRT.png
--------------------------------------------------------------------------------
/imgs/adjusted_r.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/julialintern/intro_to_machine_learning/6329e6b928e91004714407d171fc9414abf7368d/imgs/adjusted_r.png
--------------------------------------------------------------------------------
/imgs/emotions.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/julialintern/intro_to_machine_learning/6329e6b928e91004714407d171fc9414abf7368d/imgs/emotions.png
--------------------------------------------------------------------------------
/imgs/flying_blind.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/julialintern/intro_to_machine_learning/6329e6b928e91004714407d171fc9414abf7368d/imgs/flying_blind.png
--------------------------------------------------------------------------------
/imgs/logistic_1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/julialintern/intro_to_machine_learning/6329e6b928e91004714407d171fc9414abf7368d/imgs/logistic_1.png
--------------------------------------------------------------------------------
/imgs/nlp_topics.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/julialintern/intro_to_machine_learning/6329e6b928e91004714407d171fc9414abf7368d/imgs/nlp_topics.png
--------------------------------------------------------------------------------
/imgs/r_2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/julialintern/intro_to_machine_learning/6329e6b928e91004714407d171fc9414abf7368d/imgs/r_2.png
--------------------------------------------------------------------------------
/imgs/regression_ml.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/julialintern/intro_to_machine_learning/6329e6b928e91004714407d171fc9414abf7368d/imgs/regression_ml.png
--------------------------------------------------------------------------------
/imgs/saab.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/julialintern/intro_to_machine_learning/6329e6b928e91004714407d171fc9414abf7368d/imgs/saab.png
--------------------------------------------------------------------------------