├── Assignment+1.ipynb ├── Assignment+2.ipynb ├── Assignment+3.ipynb ├── Assignment+4.ipynb ├── Case+Study+-+Sentiment+Analysis.ipynb ├── README.md ├── Regex+with+Pandas+and+Named+Groups.ipynb ├── dates.txt ├── moby.txt ├── newsgroups ├── paraphrases.csv └── spam.csv /Assignment+1.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "---\n", 8 | "\n", 9 | "_You are currently looking at **version 1.0** of this notebook. To download notebooks and datafiles, as well as get help on Jupyter notebooks in the Coursera platform, visit the [Jupyter Notebook FAQ](https://www.coursera.org/learn/python-text-mining/resources/d9pwm) course resource._\n", 10 | "\n", 11 | "---" 12 | ] 13 | }, 14 | { 15 | "cell_type": "markdown", 16 | "metadata": {}, 17 | "source": [ 18 | "# Assignment 1\n", 19 | "\n", 20 | "In this assignment, you'll be working with messy medical data and using regex to extract relevant information from the data. \n", 21 | "\n", 22 | "Each line of the `dates.txt` file corresponds to a medical note. Each note has a date that needs to be extracted, but each date is encoded in one of many formats.\n", 23 | "\n", 24 | "The goal of this assignment is to correctly identify all of the different date variants encoded in this dataset and to properly normalize and sort the dates. \n", 25 | "\n", 26 | "Here is a list of some of the variants you might encounter in this dataset:\n", 27 | "* 04/20/2009; 04/20/09; 4/20/09; 4/3/09\n", 28 | "* Mar-20-2009; Mar 20, 2009; March 20, 2009; Mar. 20, 2009; Mar 20 2009;\n", 29 | "* 20 Mar 2009; 20 March 2009; 20 Mar. 
2009; 20 March, 2009\n", 30 | "* Mar 20th, 2009; Mar 21st, 2009; Mar 22nd, 2009\n", 31 | "* Feb 2009; Sep 2009; Oct 2010\n", 32 | "* 6/2008; 12/2009\n", 33 | "* 2009; 2010\n", 34 | "\n", 35 | "Once you have extracted these date patterns from the text, the next step is to sort them in ascending chronological order according to the following rules:\n", 36 | "* Assume all dates in xx/xx/xx format are mm/dd/yy\n", 37 | "* Assume all dates where year is encoded in only two digits are years from the 1900's (e.g. 1/5/89 is January 5th, 1989)\n", 38 | "* If the day is missing (e.g. 9/2009), assume it is the first day of the month (e.g. September 1, 2009).\n", 39 | "* If the month is missing (e.g. 2010), assume it is the first of January of that year (e.g. January 1, 2010).\n", 40 | "\n", 41 | "With these rules in mind, find the correct date in each note and return a pandas Series in chronological order of the original Series' indices.\n", 42 | "\n", 43 | "For example, if the original series was this:\n", 44 | "\n", 45 | " 0 1999\n", 46 | " 1 2010\n", 47 | " 2 1978\n", 48 | " 3 2015\n", 49 | " 4 1985\n", 50 | "\n", 51 | "Your function should return this:\n", 52 | "\n", 53 | " 0 2\n", 54 | " 1 4\n", 55 | " 2 0\n", 56 | " 3 1\n", 57 | " 4 3\n", 58 | "\n", 59 | "Your score will be calculated using [Kendall's tau](https://en.wikipedia.org/wiki/Kendall_rank_correlation_coefficient), a correlation measure for ordinal data.\n", 60 | "\n", 61 | "*This function should return a Series of length 500 and dtype int.*" 62 | ] 63 | }, 64 | { 65 | "cell_type": "code", 66 | "execution_count": 1, 67 | "metadata": {}, 68 | "outputs": [ 69 | { 70 | "data": { 71 | "text/plain": [ 72 | "0 03/25/93 Total time of visit (in minutes):\\n\n", 73 | "1 6/18/85 Primary Care Doctor:\\n\n", 74 | "2 sshe plans to move as of 7/8/71 In-Home Servic...\n", 75 | "3 7 on 9/27/75 Audit C Score Current:\\n\n", 76 | "4 2/6/96 sleep studyPain Treatment Pain Level (N...\n", 77 | "5 .Per 7/06/79 Movement D/O 
note:\\n\n", 78 | "6 4, 5/18/78 Patient's thoughts about current su...\n", 79 | "7 10/24/89 CPT Code: 90801 - Psychiatric Diagnos...\n", 80 | "8 3/7/86 SOS-10 Total Score:\\n\n", 81 | "9 (4/10/71)Score-1Audit C Score Current:\\n\n", 82 | "dtype: object" 83 | ] 84 | }, 85 | "execution_count": 1, 86 | "metadata": {}, 87 | "output_type": "execute_result" 88 | } 89 | ], 90 | "source": [ 91 | "import pandas as pd\n", 92 | "\n", 93 | "doc = []\n", 94 | "with open('dates.txt') as file:\n", 95 | " for line in file:\n", 96 | " doc.append(line)\n", 97 | "\n", 98 | "df = pd.Series(doc)\n", 99 | "df.head(10)" 100 | ] 101 | }, 102 | { 103 | "cell_type": "code", 104 | "execution_count": 2, 105 | "metadata": {}, 106 | "outputs": [], 107 | "source": [ 108 | "def date_sorter():\n", 109 | " \n", 110 | " # Your code here\n", 111 | " # Full date\n", 112 | " global df\n", 113 | " dates_extracted = df.str.extractall(r'(?P<origin>(?P<month>\d?\d)[/|-](?P<day>\d?\d)[/|-](?P<year>\d{4}))')\n", 114 | " index_left = ~df.index.isin([x[0] for x in dates_extracted.index])\n", 115 | " dates_extracted = dates_extracted.append(df[index_left].str.extractall(r'(?P<origin>(?P<month>\d?\d)[/|-](?P<day>([0-2]?[0-9])|([3][01]))[/|-](?P<year>\d{2}))'))\n", 116 | " index_left = ~df.index.isin([x[0] for x in dates_extracted.index])\n", 117 | " del dates_extracted[3]\n", 118 | " del dates_extracted[4]\n", 119 | " dates_extracted = dates_extracted.append(df[index_left].str.extractall(r'(?P<origin>(?P<day>\d?\d) ?(?P<month>[a-zA-Z]{3,})\.?,? (?P<year>\d{4}))'))\n", 120 | " index_left = ~df.index.isin([x[0] for x in dates_extracted.index])\n", 121 | " dates_extracted = dates_extracted.append(df[index_left].str.extractall(r'(?P<origin>(?P<month>[a-zA-Z]{3,})\.?-? ?(?P<day>\d\d?)(th|nd|st)?,?-? ?(?P<year>\d{4}))'))\n", 122 | " del dates_extracted[3]\n", 123 | " index_left = ~df.index.isin([x[0] for x in dates_extracted.index])\n", 124 | "\n", 125 | " # Without day\n", 126 | " dates_without_day = df[index_left].str.extractall(r'(?P<origin>(?P<month>[A-Z][a-z]{2,}),?\.? (?P<year>\d{4}))')\n", 127 | " dates_without_day = dates_without_day.append(df[index_left].str.extractall(r'(?P<origin>(?P<month>\d\d?)/(?P<year>\d{4}))'))\n", 128 | " dates_without_day['day'] = 1\n", 129 | " dates_extracted = dates_extracted.append(dates_without_day)\n", 130 | " index_left = ~df.index.isin([x[0] for x in dates_extracted.index])\n", 131 | "\n", 132 | " # Only year\n", 133 | " dates_only_year = df[index_left].str.extractall(r'(?P<origin>(?P<year>\d{4}))')\n", 134 | " dates_only_year['day'] = 1\n", 135 | " dates_only_year['month'] = 1\n", 136 | " dates_extracted = dates_extracted.append(dates_only_year)\n", 137 | " index_left = ~df.index.isin([x[0] for x in dates_extracted.index])\n", 138 | "\n", 139 | " # Year\n", 140 | " dates_extracted['year'] = dates_extracted['year'].apply(lambda x: '19' + x if len(x) == 2 else x)\n", 141 | " dates_extracted['year'] = dates_extracted['year'].apply(lambda x: str(x))\n", 142 | "\n", 143 | " # Month\n", 144 | " dates_extracted['month'] = dates_extracted['month'].apply(lambda x: x[1:] if type(x) is str and x.startswith('0') else x)\n", 145 | " month_dict = dict({'September': 9, 'Mar': 3, 'November': 11, 'Jul': 7, 'January': 1, 'December': 12,\n", 146 | " 'Feb': 2, 'May': 5, 'Aug': 8, 'Jun': 6, 'Sep': 9, 'Oct': 10, 'June': 6, 'March': 3,\n", 147 | " 'February': 2, 'Dec': 12, 'Apr': 4, 'Jan': 1, 'Janaury': 1,'August': 8, 'October': 10,\n", 148 | " 'July': 7, 'Since': 1, 'Nov': 11, 'April': 4, 'Decemeber': 12, 'Age': 8})\n", 149 | " dates_extracted.replace({\"month\": month_dict}, inplace=True)\n", 150 | " dates_extracted['month'] = dates_extracted['month'].apply(lambda x: str(x))\n", 151 | "\n", 152 | " # Day\n", 153 | " dates_extracted['day'] = dates_extracted['day'].apply(lambda x: str(x))\n", 154 | "\n", 155 | " # Cleaned date\n", 156 | " dates_extracted['date'] = dates_extracted['month'] + '/' + dates_extracted['day'] + '/' + dates_extracted['year']\n", 157 | " dates_extracted['date'] = pd.to_datetime(dates_extracted['date'])\n", 158 | "\n", 159 | " 
dates_extracted.sort_values(by='date', inplace=True)\n", 160 | " df1 = pd.Series(list(dates_extracted.index.labels[0]))\n", 161 | " \n", 162 | " return df1 # Your answer here" 163 | ] 164 | }, 165 | { 166 | "cell_type": "code", 167 | "execution_count": 3, 168 | "metadata": {}, 169 | "outputs": [ 170 | { 171 | "name": "stdout", 172 | "output_type": "stream", 173 | "text": [ 174 | "0 9\n", 175 | "1 84\n", 176 | "2 2\n", 177 | "3 53\n", 178 | "4 28\n", 179 | "5 474\n", 180 | "6 153\n", 181 | "7 13\n", 182 | "8 129\n", 183 | "9 98\n", 184 | "10 111\n", 185 | "11 225\n", 186 | "12 31\n", 187 | "13 171\n", 188 | "14 191\n", 189 | "15 486\n", 190 | "16 335\n", 191 | "17 415\n", 192 | "18 36\n", 193 | "19 405\n", 194 | "20 323\n", 195 | "21 422\n", 196 | "22 375\n", 197 | "23 380\n", 198 | "24 345\n", 199 | "25 57\n", 200 | "26 481\n", 201 | "27 436\n", 202 | "28 104\n", 203 | "29 299\n", 204 | " ... \n", 205 | "470 220\n", 206 | "471 243\n", 207 | "472 208\n", 208 | "473 139\n", 209 | "474 320\n", 210 | "475 383\n", 211 | "476 286\n", 212 | "477 244\n", 213 | "478 480\n", 214 | "479 431\n", 215 | "480 279\n", 216 | "481 198\n", 217 | "482 381\n", 218 | "483 463\n", 219 | "484 366\n", 220 | "485 439\n", 221 | "486 255\n", 222 | "487 401\n", 223 | "488 475\n", 224 | "489 257\n", 225 | "490 152\n", 226 | "491 235\n", 227 | "492 464\n", 228 | "493 253\n", 229 | "494 231\n", 230 | "495 427\n", 231 | "496 141\n", 232 | "497 186\n", 233 | "498 161\n", 234 | "499 413\n", 235 | "Length: 500, dtype: int64\n" 236 | ] 237 | } 238 | ], 239 | "source": [ 240 | "print(date_sorter())" 241 | ] 242 | }, 243 | { 244 | "cell_type": "code", 245 | "execution_count": null, 246 | "metadata": { 247 | "collapsed": true 248 | }, 249 | "outputs": [], 250 | "source": [] 251 | } 252 | ], 253 | "metadata": { 254 | "coursera": { 255 | "course_slug": "python-text-mining", 256 | "graded_item_id": "LvcWI", 257 | "launcher_item_id": "krne9", 258 | "part_id": "Mkp1I" 259 | }, 260 | "kernelspec": { 261 | 
"display_name": "Python 3", 262 | "language": "python", 263 | "name": "python3" 264 | }, 265 | "language_info": { 266 | "codemirror_mode": { 267 | "name": "ipython", 268 | "version": 3 269 | }, 270 | "file_extension": ".py", 271 | "mimetype": "text/x-python", 272 | "name": "python", 273 | "nbconvert_exporter": "python", 274 | "pygments_lexer": "ipython3", 275 | "version": "3.6.2" 276 | } 277 | }, 278 | "nbformat": 4, 279 | "nbformat_minor": 2 280 | } 281 | -------------------------------------------------------------------------------- /Assignment+2.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "---\n", 8 | "\n", 9 | "_You are currently looking at **version 1.0** of this notebook. To download notebooks and datafiles, as well as get help on Jupyter notebooks in the Coursera platform, visit the [Jupyter Notebook FAQ](https://www.coursera.org/learn/python-text-mining/resources/d9pwm) course resource._\n", 10 | "\n", 11 | "---" 12 | ] 13 | }, 14 | { 15 | "cell_type": "markdown", 16 | "metadata": {}, 17 | "source": [ 18 | "# Assignment 2 - Introduction to NLTK\n", 19 | "\n", 20 | "In part 1 of this assignment you will use nltk to explore the Herman Melville novel Moby Dick. Then in part 2 you will create a spelling recommender function that uses nltk to find words similar to the misspelling. 
" 21 | ] 22 | }, 23 | { 24 | "cell_type": "markdown", 25 | "metadata": {}, 26 | "source": [ 27 | "## Part 1 - Analyzing Moby Dick" 28 | ] 29 | }, 30 | { 31 | "cell_type": "code", 32 | "execution_count": 1, 33 | "metadata": { 34 | "collapsed": true 35 | }, 36 | "outputs": [], 37 | "source": [ 38 | "import nltk\n", 39 | "import pandas as pd\n", 40 | "import numpy as np\n", 41 | "\n", 42 | "# If you would like to work with the raw text you can use 'moby_raw'\n", 43 | "with open('moby.txt', 'r') as f:\n", 44 | " moby_raw = f.read()\n", 45 | " \n", 46 | "# If you would like to work with the novel in nltk.Text format you can use 'text1'\n", 47 | "moby_tokens = nltk.word_tokenize(moby_raw)\n", 48 | "text1 = nltk.Text(moby_tokens)" 49 | ] 50 | }, 51 | { 52 | "cell_type": "markdown", 53 | "metadata": {}, 54 | "source": [ 55 | "### Example 1\n", 56 | "\n", 57 | "How many tokens (words and punctuation symbols) are in text1?\n", 58 | "\n", 59 | "*This function should return an integer.*" 60 | ] 61 | }, 62 | { 63 | "cell_type": "code", 64 | "execution_count": 2, 65 | "metadata": {}, 66 | "outputs": [ 67 | { 68 | "data": { 69 | "text/plain": [ 70 | "254989" 71 | ] 72 | }, 73 | "execution_count": 2, 74 | "metadata": {}, 75 | "output_type": "execute_result" 76 | } 77 | ], 78 | "source": [ 79 | "def example_one():\n", 80 | " \n", 81 | " return len(nltk.word_tokenize(moby_raw)) # or alternatively len(text1)\n", 82 | "\n", 83 | "example_one()" 84 | ] 85 | }, 86 | { 87 | "cell_type": "markdown", 88 | "metadata": {}, 89 | "source": [ 90 | "### Example 2\n", 91 | "\n", 92 | "How many unique tokens (unique words and punctuation) does text1 have?\n", 93 | "\n", 94 | "*This function should return an integer.*" 95 | ] 96 | }, 97 | { 98 | "cell_type": "code", 99 | "execution_count": 3, 100 | "metadata": {}, 101 | "outputs": [ 102 | { 103 | "data": { 104 | "text/plain": [ 105 | "20755" 106 | ] 107 | }, 108 | "execution_count": 3, 109 | "metadata": {}, 110 | "output_type": "execute_result" 111 
| } 112 | ], 113 | "source": [ 114 | "def example_two():\n", 115 | " \n", 116 | " return len(set(nltk.word_tokenize(moby_raw))) # or alternatively len(set(text1))\n", 117 | "\n", 118 | "example_two()" 119 | ] 120 | }, 121 | { 122 | "cell_type": "markdown", 123 | "metadata": {}, 124 | "source": [ 125 | "### Example 3\n", 126 | "\n", 127 | "After lemmatizing the verbs, how many unique tokens does text1 have?\n", 128 | "\n", 129 | "*This function should return an integer.*" 130 | ] 131 | }, 132 | { 133 | "cell_type": "code", 134 | "execution_count": 4, 135 | "metadata": {}, 136 | "outputs": [ 137 | { 138 | "data": { 139 | "text/plain": [ 140 | "16900" 141 | ] 142 | }, 143 | "execution_count": 4, 144 | "metadata": {}, 145 | "output_type": "execute_result" 146 | } 147 | ], 148 | "source": [ 149 | "from nltk.stem import WordNetLemmatizer\n", 150 | "\n", 151 | "def example_three():\n", 152 | "\n", 153 | " lemmatizer = WordNetLemmatizer()\n", 154 | " lemmatized = [lemmatizer.lemmatize(w,'v') for w in text1]\n", 155 | "\n", 156 | " return len(set(lemmatized))\n", 157 | "\n", 158 | "example_three()" 159 | ] 160 | }, 161 | { 162 | "cell_type": "markdown", 163 | "metadata": {}, 164 | "source": [ 165 | "### Question 1\n", 166 | "\n", 167 | "What is the lexical diversity of the given text input? (i.e. 
ratio of unique tokens to the total number of tokens)\n", 168 | "\n", 169 | "*This function should return a float.*" 170 | ] 171 | }, 172 | { 173 | "cell_type": "code", 174 | "execution_count": 5, 175 | "metadata": {}, 176 | "outputs": [ 177 | { 178 | "data": { 179 | "text/plain": [ 180 | "0.08139566804842562" 181 | ] 182 | }, 183 | "execution_count": 5, 184 | "metadata": {}, 185 | "output_type": "execute_result" 186 | } 187 | ], 188 | "source": [ 189 | "def answer_one():\n", 190 | " \n", 191 | " \n", 192 | " return example_two()/example_one()\n", 193 | "\n", 194 | "answer_one()" 195 | ] 196 | }, 197 | { 198 | "cell_type": "markdown", 199 | "metadata": {}, 200 | "source": [ 201 | "### Question 2\n", 202 | "\n", 203 | "What percentage of tokens is 'whale' or 'Whale'?\n", 204 | "\n", 205 | "*This function should return a float.*" 206 | ] 207 | }, 208 | { 209 | "cell_type": "code", 210 | "execution_count": 6, 211 | "metadata": {}, 212 | "outputs": [ 213 | { 214 | "data": { 215 | "text/plain": [ 216 | "0.4125668166077752" 217 | ] 218 | }, 219 | "execution_count": 6, 220 | "metadata": {}, 221 | "output_type": "execute_result" 222 | } 223 | ], 224 | "source": [ 225 | "def answer_two():\n", 226 | " \n", 227 | " \n", 228 | " return (text1.vocab()['whale'] + text1.vocab()['Whale']) / len(nltk.word_tokenize(moby_raw)) * 100 # Your answer here\n", 229 | "\n", 230 | "answer_two()" 231 | ] 232 | }, 233 | { 234 | "cell_type": "markdown", 235 | "metadata": {}, 236 | "source": [ 237 | "### Question 3\n", 238 | "\n", 239 | "What are the 20 most frequently occurring (unique) tokens in the text? What is their frequency?\n", 240 | "\n", 241 | "*This function should return a list of 20 tuples where each tuple is of the form `(token, frequency)`. 
The list should be sorted in descending order of frequency.*" 242 | ] 243 | }, 244 | { 245 | "cell_type": "code", 246 | "execution_count": 7, 247 | "metadata": {}, 248 | "outputs": [ 249 | { 250 | "data": { 251 | "text/plain": [ 252 | "[(',', 19204),\n", 253 | " ('the', 13715),\n", 254 | " ('.', 7308),\n", 255 | " ('of', 6513),\n", 256 | " ('and', 6010),\n", 257 | " ('a', 4545),\n", 258 | " ('to', 4515),\n", 259 | " (';', 4173),\n", 260 | " ('in', 3908),\n", 261 | " ('that', 2978),\n", 262 | " ('his', 2459),\n", 263 | " ('it', 2196),\n", 264 | " ('I', 2097),\n", 265 | " ('!', 1767),\n", 266 | " ('is', 1722),\n", 267 | " ('--', 1713),\n", 268 | " ('with', 1659),\n", 269 | " ('he', 1658),\n", 270 | " ('was', 1639),\n", 271 | " ('as', 1620)]" 272 | ] 273 | }, 274 | "execution_count": 7, 275 | "metadata": {}, 276 | "output_type": "execute_result" 277 | } 278 | ], 279 | "source": [ 280 | "def answer_three():\n", 281 | " import operator\n", 282 | "\n", 283 | " return sorted(text1.vocab().items(), key=operator.itemgetter(1), reverse=True)[:20] # Your answer here\n", 284 | "\n", 285 | "answer_three()" 286 | ] 287 | }, 288 | { 289 | "cell_type": "markdown", 290 | "metadata": {}, 291 | "source": [ 292 | "### Question 4\n", 293 | "\n", 294 | "What tokens have a length of greater than 5 and frequency of more than 150?\n", 295 | "\n", 296 | "*This function should return a sorted list of the tokens that match the above constraints. 
To sort your list, use `sorted()`*" 297 | ] 298 | }, 299 | { 300 | "cell_type": "code", 301 | "execution_count": 8, 302 | "metadata": {}, 303 | "outputs": [ 304 | { 305 | "data": { 306 | "text/plain": [ 307 | "['Captain',\n", 308 | " 'Pequod',\n", 309 | " 'Queequeg',\n", 310 | " 'Starbuck',\n", 311 | " 'almost',\n", 312 | " 'before',\n", 313 | " 'himself',\n", 314 | " 'little',\n", 315 | " 'seemed',\n", 316 | " 'should',\n", 317 | " 'though',\n", 318 | " 'through',\n", 319 | " 'whales',\n", 320 | " 'without']" 321 | ] 322 | }, 323 | "execution_count": 8, 324 | "metadata": {}, 325 | "output_type": "execute_result" 326 | } 327 | ], 328 | "source": [ 329 | "def answer_four():\n", 330 | " \n", 331 | " \n", 332 | " return sorted([token for token, freq in text1.vocab().items() if len(token) > 5 and freq > 150]) # Your answer here\n", 333 | "\n", 334 | "answer_four()" 335 | ] 336 | }, 337 | { 338 | "cell_type": "markdown", 339 | "metadata": {}, 340 | "source": [ 341 | "### Question 5\n", 342 | "\n", 343 | "Find the longest word in text1 and that word's length.\n", 344 | "\n", 345 | "*This function should return a tuple `(longest_word, length)`.*" 346 | ] 347 | }, 348 | { 349 | "cell_type": "code", 350 | "execution_count": 9, 351 | "metadata": {}, 352 | "outputs": [ 353 | { 354 | "data": { 355 | "text/plain": [ 356 | "(\"twelve-o'clock-at-night\", 23)" 357 | ] 358 | }, 359 | "execution_count": 9, 360 | "metadata": {}, 361 | "output_type": "execute_result" 362 | } 363 | ], 364 | "source": [ 365 | "def answer_five():\n", 366 | " import operator\n", 367 | " \n", 368 | " return sorted([(token, len(token))for token, freq in text1.vocab().items()], key=operator.itemgetter(1), reverse=True)[0] # Your answer here\n", 369 | "\n", 370 | "answer_five()" 371 | ] 372 | }, 373 | { 374 | "cell_type": "markdown", 375 | "metadata": {}, 376 | "source": [ 377 | "### Question 6\n", 378 | "\n", 379 | "What unique words have a frequency of more than 2000? 
What is their frequency?\n", 380 | "\n", 381 | "\"Hint: you may want to use `isalpha()` to check if the token is a word and not punctuation.\"\n", 382 | "\n", 383 | "*This function should return a list of tuples of the form `(frequency, word)` sorted in descending order of frequency.*" 384 | ] 385 | }, 386 | { 387 | "cell_type": "code", 388 | "execution_count": 10, 389 | "metadata": {}, 390 | "outputs": [ 391 | { 392 | "data": { 393 | "text/plain": [ 394 | "[(13715, 'the'),\n", 395 | " (6513, 'of'),\n", 396 | " (6010, 'and'),\n", 397 | " (4545, 'a'),\n", 398 | " (4515, 'to'),\n", 399 | " (3908, 'in'),\n", 400 | " (2978, 'that'),\n", 401 | " (2459, 'his'),\n", 402 | " (2196, 'it'),\n", 403 | " (2097, 'I')]" 404 | ] 405 | }, 406 | "execution_count": 10, 407 | "metadata": {}, 408 | "output_type": "execute_result" 409 | } 410 | ], 411 | "source": [ 412 | "def answer_six():\n", 413 | " import operator\n", 414 | " \n", 415 | " return sorted([(freq, token) for token, freq in text1.vocab().items() if freq > 2000 and token.isalpha()], key=operator.itemgetter(0), reverse=True) # Your answer here\n", 416 | "\n", 417 | "answer_six()" 418 | ] 419 | }, 420 | { 421 | "cell_type": "markdown", 422 | "metadata": {}, 423 | "source": [ 424 | "### Question 7\n", 425 | "\n", 426 | "What is the average number of tokens per sentence?\n", 427 | "\n", 428 | "*This function should return a float.*" 429 | ] 430 | }, 431 | { 432 | "cell_type": "code", 433 | "execution_count": 11, 434 | "metadata": {}, 435 | "outputs": [ 436 | { 437 | "data": { 438 | "text/plain": [ 439 | "25.881952902963864" 440 | ] 441 | }, 442 | "execution_count": 11, 443 | "metadata": {}, 444 | "output_type": "execute_result" 445 | } 446 | ], 447 | "source": [ 448 | "def answer_seven():\n", 449 | " \n", 450 | " \n", 451 | " return np.mean([len(nltk.word_tokenize(sent)) for sent in nltk.sent_tokenize(moby_raw)]) # Your answer here\n", 452 | "\n", 453 | "answer_seven()" 454 | ] 455 | }, 456 | { 457 | "cell_type": "markdown", 
458 | "metadata": {}, 459 | "source": [ 460 | "### Question 8\n", 461 | "\n", 462 | "What are the 5 most frequent parts of speech in this text? What is their frequency?\n", 463 | "\n", 464 | "*This function should return a list of tuples of the form `(part_of_speech, frequency)` sorted in descending order of frequency.*" 465 | ] 466 | }, 467 | { 468 | "cell_type": "code", 469 | "execution_count": 12, 470 | "metadata": {}, 471 | "outputs": [ 472 | { 473 | "data": { 474 | "text/plain": [ 475 | "[('NN', 32730), ('IN', 28657), ('DT', 25867), (',', 19204), ('JJ', 17620)]" 476 | ] 477 | }, 478 | "execution_count": 12, 479 | "metadata": {}, 480 | "output_type": "execute_result" 481 | } 482 | ], 483 | "source": [ 484 | "def answer_eight():\n", 485 | " from collections import Counter\n", 486 | " import operator\n", 487 | " \n", 488 | " return sorted(Counter([tag for token, tag in nltk.pos_tag(text1)]).items(), key=operator.itemgetter(1), reverse=True)[:5] # Your answer here\n", 489 | "\n", 490 | "answer_eight()" 491 | ] 492 | }, 493 | { 494 | "cell_type": "markdown", 495 | "metadata": {}, 496 | "source": [ 497 | "## Part 2 - Spelling Recommender\n", 498 | "\n", 499 | "For this part of the assignment you will create three different spelling recommenders that each take a list of misspelled words and recommend a correctly spelled word for every word in the list.\n", 500 | "\n", 501 | "For every misspelled word, the recommender should find the word in `correct_spellings` that has the shortest distance*, and starts with the same letter as the misspelled word, and return that word as a recommendation.\n", 502 | "\n", 503 | "*Each of the three different recommenders will use a different distance measure (outlined below).\n", 504 | "\n", 505 | "Each of the recommenders should provide recommendations for the three default words provided: `['cormulent', 'incendenece', 'validrate']`." 
506 | ] 507 | }, 508 | { 509 | "cell_type": "code", 510 | "execution_count": 13, 511 | "metadata": { 512 | "collapsed": true 513 | }, 514 | "outputs": [], 515 | "source": [ 516 | "from nltk.corpus import words\n", 517 | "\n", 518 | "correct_spellings = words.words()" 519 | ] 520 | }, 521 | { 522 | "cell_type": "markdown", 523 | "metadata": {}, 524 | "source": [ 525 | "### Question 9\n", 526 | "\n", 527 | "For this recommender, your function should provide recommendations for the three default words provided above using the following distance metric:\n", 528 | "\n", 529 | "**[Jaccard distance](https://en.wikipedia.org/wiki/Jaccard_index) on the trigrams of the two words.**\n", 530 | "\n", 531 | "*This function should return a list of length three:\n", 532 | "`['cormulent_reccomendation', 'incendenece_reccomendation', 'validrate_reccomendation']`.*" 533 | ] 534 | }, 535 | { 536 | "cell_type": "code", 537 | "execution_count": 14, 538 | "metadata": {}, 539 | "outputs": [ 540 | { 541 | "data": { 542 | "text/plain": [ 543 | "['corpulent', 'indecence', 'validate']" 544 | ] 545 | }, 546 | "execution_count": 14, 547 | "metadata": {}, 548 | "output_type": "execute_result" 549 | } 550 | ], 551 | "source": [ 552 | "def answer_nine(entries=['cormulent', 'incendenece', 'validrate']):\n", 553 | " result = []\n", 554 | " import operator\n", 555 | " for entry in entries:\n", 556 | " spell_list = [spell for spell in correct_spellings if spell.startswith(entry[0]) and len(spell) > 2]\n", 557 | " distance_list = [(spell, nltk.jaccard_distance(set(nltk.ngrams(entry, n=3)), set(nltk.ngrams(spell, n=3)))) for spell in spell_list]\n", 558 | "\n", 559 | " result.append(sorted(distance_list, key=operator.itemgetter(1))[0][0])\n", 560 | " \n", 561 | " return result # Your answer here\n", 562 | " \n", 563 | "answer_nine()" 564 | ] 565 | }, 566 | { 567 | "cell_type": "markdown", 568 | "metadata": {}, 569 | "source": [ 570 | "### Question 10\n", 571 | "\n", 572 | "For this recommender, your 
function should provide recommendations for the three default words provided above using the following distance metric:\n", 573 | "\n", 574 | "**[Jaccard distance](https://en.wikipedia.org/wiki/Jaccard_index) on the 4-grams of the two words.**\n", 575 | "\n", 576 | "*This function should return a list of length three:\n", 577 | "`['cormulent_reccomendation', 'incendenece_reccomendation', 'validrate_reccomendation']`.*" 578 | ] 579 | }, 580 | { 581 | "cell_type": "code", 582 | "execution_count": 15, 583 | "metadata": {}, 584 | "outputs": [ 585 | { 586 | "data": { 587 | "text/plain": [ 588 | "['cormus', 'incendiary', 'valid']" 589 | ] 590 | }, 591 | "execution_count": 15, 592 | "metadata": {}, 593 | "output_type": "execute_result" 594 | } 595 | ], 596 | "source": [ 597 | "def answer_ten(entries=['cormulent', 'incendenece', 'validrate']):\n", 598 | " result = []\n", 599 | " import operator\n", 600 | " for entry in entries:\n", 601 | " spell_list = [spell for spell in correct_spellings if spell.startswith(entry[0]) and len(spell) > 2]\n", 602 | " distance_list = [(spell, nltk.jaccard_distance(set(nltk.ngrams(entry, n=4)), set(nltk.ngrams(spell, n=4)))) for spell in spell_list]\n", 603 | "\n", 604 | " result.append(sorted(distance_list, key=operator.itemgetter(1))[0][0])\n", 605 | " \n", 606 | " return result # Your answer here\n", 607 | " \n", 608 | "answer_ten()" 609 | ] 610 | }, 611 | { 612 | "cell_type": "markdown", 613 | "metadata": {}, 614 | "source": [ 615 | "### Question 11\n", 616 | "\n", 617 | "For this recommender, your function should provide recommendations for the three default words provided above using the following distance metric:\n", 618 | "\n", 619 | "**[Edit distance on the two words with transpositions.](https://en.wikipedia.org/wiki/Damerau%E2%80%93Levenshtein_distance)**\n", 620 | "\n", 621 | "*This function should return a list of length three:\n", 622 | "`['cormulent_reccomendation', 'incendenece_reccomendation', 'validrate_reccomendation']`.*" 
623 | ] 624 | }, 625 | { 626 | "cell_type": "code", 627 | "execution_count": 16, 628 | "metadata": {}, 629 | "outputs": [ 630 | { 631 | "data": { 632 | "text/plain": [ 633 | "['corpulent', 'intendence', 'validate']" 634 | ] 635 | }, 636 | "execution_count": 16, 637 | "metadata": {}, 638 | "output_type": "execute_result" 639 | } 640 | ], 641 | "source": [ 642 | "def answer_eleven(entries=['cormulent', 'incendenece', 'validrate']):\n", 643 | " result = []\n", 644 | " import operator\n", 645 | " for entry in entries:\n", 646 | " spell_list = [spell for spell in correct_spellings if spell.startswith(entry[0]) and len(spell) > 2]\n", 647 | " distance_list = [(spell, nltk.edit_distance(entry, spell, transpositions=True)) for spell in spell_list]\n", 648 | "\n", 649 | " result.append(sorted(distance_list, key=operator.itemgetter(1))[0][0])\n", 650 | " \n", 651 | " return result# Your answer here \n", 652 | " \n", 653 | "answer_eleven()" 654 | ] 655 | }, 656 | { 657 | "cell_type": "code", 658 | "execution_count": null, 659 | "metadata": { 660 | "collapsed": true 661 | }, 662 | "outputs": [], 663 | "source": [] 664 | } 665 | ], 666 | "metadata": { 667 | "coursera": { 668 | "course_slug": "python-text-mining", 669 | "graded_item_id": "r35En", 670 | "launcher_item_id": "tCVfW", 671 | "part_id": "NTVgL" 672 | }, 673 | "kernelspec": { 674 | "display_name": "Python 3", 675 | "language": "python", 676 | "name": "python3" 677 | }, 678 | "language_info": { 679 | "codemirror_mode": { 680 | "name": "ipython", 681 | "version": 3 682 | }, 683 | "file_extension": ".py", 684 | "mimetype": "text/x-python", 685 | "name": "python", 686 | "nbconvert_exporter": "python", 687 | "pygments_lexer": "ipython3", 688 | "version": "3.6.2" 689 | } 690 | }, 691 | "nbformat": 4, 692 | "nbformat_minor": 2 693 | } 694 | -------------------------------------------------------------------------------- /Assignment+3.ipynb: -------------------------------------------------------------------------------- 1 | { 
2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "---\n", 8 | "\n", 9 | "_You are currently looking at **version 1.0** of this notebook. To download notebooks and datafiles, as well as get help on Jupyter notebooks in the Coursera platform, visit the [Jupyter Notebook FAQ](https://www.coursera.org/learn/python-text-mining/resources/d9pwm) course resource._\n", 10 | "\n", 11 | "---" 12 | ] 13 | }, 14 | { 15 | "cell_type": "markdown", 16 | "metadata": {}, 17 | "source": [ 18 | "# Assignment 3\n", 19 | "\n", 20 | "In this assignment you will explore text message data and create models to predict if a message is spam or not. " 21 | ] 22 | }, 23 | { 24 | "cell_type": "code", 25 | "execution_count": 1, 26 | "metadata": {}, 27 | "outputs": [ 28 | { 29 | "data": { 30 | "text/html": [ 31 | "
<div>\n", 32 | "<style>\n", 33 | " .dataframe thead tr:only-child th {\n", 34 | " text-align: right;\n", 35 | " }\n", 36 | "\n", 37 | " .dataframe thead th {\n", 38 | " text-align: left;\n", 39 | " }\n", 40 | "\n", 41 | " .dataframe tbody tr th {\n", 42 | " vertical-align: top;\n", 43 | " }\n", 44 | "</style>\n", 45 | "<table border=\"1\" class=\"dataframe\">\n", 46 | " <thead>\n", 47 | " <tr style=\"text-align: right;\">\n", 48 | " <th></th>\n", 49 | " <th>text</th>\n", 50 | " <th>target</th>\n", 51 | " </tr>\n", 52 | " </thead>\n", 53 | " <tbody>\n", 54 | " <tr>\n", 55 | " <th>0</th>\n", 56 | " <td>Go until jurong point, crazy.. Available only ...</td>\n", 57 | " <td>0</td>\n", 58 | " </tr>\n", 59 | " <tr>\n", 60 | " <th>1</th>\n", 61 | " <td>Ok lar... Joking wif u oni...</td>\n", 62 | " <td>0</td>\n", 63 | " </tr>\n", 64 | " <tr>\n", 65 | " <th>2</th>\n", 66 | " <td>Free entry in 2 a wkly comp to win FA Cup fina...</td>\n", 67 | " <td>1</td>\n", 68 | " </tr>\n", 69 | " <tr>\n", 70 | " <th>3</th>\n", 71 | " <td>U dun say so early hor... U c already then say...</td>\n", 72 | " <td>0</td>\n", 73 | " </tr>\n", 74 | " <tr>\n", 75 | " <th>4</th>\n", 76 | " <td>Nah I don't think he goes to usf, he lives aro...</td>\n", 77 | " <td>0</td>\n", 78 | " </tr>\n", 79 | " <tr>\n", 80 | " <th>5</th>\n", 81 | " <td>FreeMsg Hey there darling it's been 3 week's n...</td>\n", 82 | " <td>1</td>\n", 83 | " </tr>\n", 84 | " <tr>\n", 85 | " <th>6</th>\n", 86 | " <td>Even my brother is not like to speak with me. ...</td>\n", 87 | " <td>0</td>\n", 88 | " </tr>\n", 89 | " <tr>\n", 90 | " <th>7</th>\n", 91 | " <td>As per your request 'Melle Melle (Oru Minnamin...</td>\n", 92 | " <td>0</td>\n", 93 | " </tr>\n", 94 | " <tr>\n", 95 | " <th>8</th>\n", 96 | " <td>WINNER!! As a valued network customer you have...</td>\n", 97 | " <td>1</td>\n", 98 | " </tr>\n", 99 | " <tr>\n", 100 | " <th>9</th>\n", 101 | " <td>Had your mobile 11 months or more? U R entitle...</td>\n", 102 | " <td>1</td>\n", 103 | " </tr>\n", 104 | " </tbody>\n", 105 | "</table>\n", 106 | "</div>
" 107 | ], 108 | "text/plain": [ 109 | " text target\n", 110 | "0 Go until jurong point, crazy.. Available only ... 0\n", 111 | "1 Ok lar... Joking wif u oni... 0\n", 112 | "2 Free entry in 2 a wkly comp to win FA Cup fina... 1\n", 113 | "3 U dun say so early hor... U c already then say... 0\n", 114 | "4 Nah I don't think he goes to usf, he lives aro... 0\n", 115 | "5 FreeMsg Hey there darling it's been 3 week's n... 1\n", 116 | "6 Even my brother is not like to speak with me. ... 0\n", 117 | "7 As per your request 'Melle Melle (Oru Minnamin... 0\n", 118 | "8 WINNER!! As a valued network customer you have... 1\n", 119 | "9 Had your mobile 11 months or more? U R entitle... 1" 120 | ] 121 | }, 122 | "execution_count": 1, 123 | "metadata": {}, 124 | "output_type": "execute_result" 125 | } 126 | ], 127 | "source": [ 128 | "import pandas as pd\n", 129 | "import numpy as np\n", 130 | "\n", 131 | "spam_data = pd.read_csv('spam.csv')\n", 132 | "\n", 133 | "spam_data['target'] = np.where(spam_data['target']=='spam',1,0)\n", 134 | "spam_data.head(10)" 135 | ] 136 | }, 137 | { 138 | "cell_type": "code", 139 | "execution_count": 2, 140 | "metadata": { 141 | "collapsed": true 142 | }, 143 | "outputs": [], 144 | "source": [ 145 | "from sklearn.model_selection import train_test_split\n", 146 | "\n", 147 | "\n", 148 | "X_train, X_test, y_train, y_test = train_test_split(spam_data['text'], \n", 149 | " spam_data['target'], \n", 150 | " random_state=0)" 151 | ] 152 | }, 153 | { 154 | "cell_type": "markdown", 155 | "metadata": {}, 156 | "source": [ 157 | "### Question 1\n", 158 | "What percentage of the documents in `spam_data` are spam?" 
159 | ] 160 | }, 161 | { 162 | "cell_type": "code", 163 | "execution_count": 3, 164 | "metadata": { 165 | "collapsed": true 166 | }, 167 | "outputs": [], 168 | "source": [ 169 | "def answer_one():\n", 170 | " \n", 171 | " return len(spam_data[spam_data['target'] == 1]) / len(spam_data) * 100" 172 | ] 173 | }, 174 | { 175 | "cell_type": "code", 176 | "execution_count": 4, 177 | "metadata": {}, 178 | "outputs": [ 179 | { 180 | "data": { 181 | "text/plain": [ 182 | "13.406317300789663" 183 | ] 184 | }, 185 | "execution_count": 4, 186 | "metadata": {}, 187 | "output_type": "execute_result" 188 | } 189 | ], 190 | "source": [ 191 | "answer_one()" 192 | ] 193 | }, 194 | { 195 | "cell_type": "markdown", 196 | "metadata": {}, 197 | "source": [ 198 | "### Question 2\n", 199 | "\n", 200 | "Fit the training data `X_train` using a Count Vectorizer with default parameters.\n", 201 | "\n", 202 | "What is the longest token in the vocabulary?\n", 203 | "\n", 204 | "*This function should return a string.*" 205 | ] 206 | }, 207 | { 208 | "cell_type": "code", 209 | "execution_count": 5, 210 | "metadata": { 211 | "collapsed": true 212 | }, 213 | "outputs": [], 214 | "source": [ 215 | "from sklearn.feature_extraction.text import CountVectorizer\n", 216 | "\n", 217 | "def answer_two():\n", 218 | " import operator\n", 219 | "\n", 220 | " vectorizer = CountVectorizer()\n", 221 | " vectorizer.fit(X_train)\n", 222 | " \n", 223 | " return sorted([(token, len(token)) for token in vectorizer.vocabulary_.keys()], key=operator.itemgetter(1), reverse=True)[0][0]" 224 | ] 225 | }, 226 | { 227 | "cell_type": "code", 228 | "execution_count": 6, 229 | "metadata": {}, 230 | "outputs": [ 231 | { 232 | "data": { 233 | "text/plain": [ 234 | "'com1win150ppmx3age16subscription'" 235 | ] 236 | }, 237 | "execution_count": 6, 238 | "metadata": {}, 239 | "output_type": "execute_result" 240 | } 241 | ], 242 | "source": [ 243 | "answer_two()" 244 | ] 245 | }, 246 | { 247 | "cell_type": "markdown", 248 | 
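The sort in `answer_two` builds and orders the whole `(token, length)` list; `max` with `key=len` finds the same token in one pass. A sketch on a toy corpus (not the SMS data):

```python
from sklearn.feature_extraction.text import CountVectorizer

docs = ['the quick brown fox', 'a supercalifragilistic message', 'short one']

vectorizer = CountVectorizer()
vectorizer.fit(docs)

# The longest token is just the max of the vocabulary keys by length
longest = max(vectorizer.vocabulary_.keys(), key=len)
print(longest)  # supercalifragilistic
```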
"metadata": {}, 249 | "source": [ 250 | "### Question 3\n", 251 | "\n", 252 | "Fit and transform the training data `X_train` using a Count Vectorizer with default parameters.\n", 253 | "\n", 254 | "Next, fit a fit a multinomial Naive Bayes classifier model with smoothing `alpha=0.1`. Find the area under the curve (AUC) score using the transformed test data.\n", 255 | "\n", 256 | "*This function should return the AUC score as a float.*" 257 | ] 258 | }, 259 | { 260 | "cell_type": "code", 261 | "execution_count": 7, 262 | "metadata": { 263 | "collapsed": true 264 | }, 265 | "outputs": [], 266 | "source": [ 267 | "from sklearn.naive_bayes import MultinomialNB\n", 268 | "from sklearn.metrics import roc_auc_score\n", 269 | "\n", 270 | "def answer_three():\n", 271 | " vectorizer = CountVectorizer()\n", 272 | " X_train_transformed = vectorizer.fit_transform(X_train)\n", 273 | " X_test_transformed = vectorizer.transform(X_test)\n", 274 | "\n", 275 | " clf = MultinomialNB(alpha=0.1)\n", 276 | " clf.fit(X_train_transformed, y_train)\n", 277 | "\n", 278 | " y_predicted = clf.predict(X_test_transformed)\n", 279 | " \n", 280 | " return roc_auc_score(y_test, y_predicted)" 281 | ] 282 | }, 283 | { 284 | "cell_type": "code", 285 | "execution_count": 8, 286 | "metadata": {}, 287 | "outputs": [ 288 | { 289 | "data": { 290 | "text/plain": [ 291 | "0.97208121827411165" 292 | ] 293 | }, 294 | "execution_count": 8, 295 | "metadata": {}, 296 | "output_type": "execute_result" 297 | } 298 | ], 299 | "source": [ 300 | "answer_three()" 301 | ] 302 | }, 303 | { 304 | "cell_type": "markdown", 305 | "metadata": {}, 306 | "source": [ 307 | "### Question 4\n", 308 | "\n", 309 | "Fit and transform the training data `X_train` using a Tfidf Vectorizer with default parameters.\n", 310 | "\n", 311 | "What 20 features have the smallest tf-idf and what 20 have the largest tf-idf?\n", 312 | "\n", 313 | "Put these features in a two series where each series is sorted by tf-idf value and then alphabetically 
by feature name. The index of the series should be the feature name, and the data should be the tf-idf.\n", 314 | "\n", 315 | "The series of 20 features with smallest tf-idfs should be sorted smallest tfidf first, the list of 20 features with largest tf-idfs should be sorted largest first. \n", 316 | "\n", 317 | "*This function should return a tuple of two series\n", 318 | "`(smallest tf-idfs series, largest tf-idfs series)`.*" 319 | ] 320 | }, 321 | { 322 | "cell_type": "code", 323 | "execution_count": 9, 324 | "metadata": { 325 | "collapsed": true 326 | }, 327 | "outputs": [], 328 | "source": [ 329 | "from sklearn.feature_extraction.text import TfidfVectorizer\n", 330 | "\n", 331 | "def answer_four():\n", 332 | " import operator\n", 333 | "\n", 334 | " vectorizer = TfidfVectorizer()\n", 335 | " X_train_transformed = vectorizer.fit_transform(X_train)\n", 336 | "\n", 337 | " feature_names = vectorizer.get_feature_names()\n", 338 | " idfs = vectorizer.idf_\n", 339 | " names_idfs = list(zip(feature_names, idfs))\n", 340 | "\n", 341 | " smallest = sorted(names_idfs, key=operator.itemgetter(1))[:20]\n", 342 | " smallest = pd.Series([features[1] for features in smallest], index=[features[0] for features in smallest])\n", 343 | "\n", 344 | " largest = sorted(names_idfs, key=operator.itemgetter(1), reverse=True)[:20]\n", 345 | " # largest = sorted(names_idfs, key=operator.itemgetter(1,0), reverse=True)[:20]\n", 346 | " largest = sorted(largest, key=operator.itemgetter(0))\n", 347 | " largest = pd.Series([features[1] for features in largest], index=[features[0] for features in largest])\n", 348 | " \n", 349 | " return (smallest, largest)" 350 | ] 351 | }, 352 | { 353 | "cell_type": "code", 354 | "execution_count": 10, 355 | "metadata": {}, 356 | "outputs": [ 357 | { 358 | "data": { 359 | "text/plain": [ 360 | "(to 2.198406\n", 361 | " you 2.265645\n", 362 | " the 2.707383\n", 363 | " in 2.890761\n", 364 | " and 2.976764\n", 365 | " is 3.003012\n", 366 | " me 3.111530\n", 
367 | " for 3.206840\n", 368 | " it 3.222174\n", 369 | " my 3.231044\n", 370 | " call 3.297812\n", 371 | " your 3.300196\n", 372 | " of 3.319473\n", 373 | " have 3.354130\n", 374 | " that 3.408477\n", 375 | " on 3.463136\n", 376 | " now 3.465949\n", 377 | " can 3.545053\n", 378 | " are 3.560414\n", 379 | " so 3.566625\n", 380 | " dtype: float64, 000pes 8.644919\n", 381 | " 0089 8.644919\n", 382 | " 0121 8.644919\n", 383 | " 01223585236 8.644919\n", 384 | " 0125698789 8.644919\n", 385 | " 02072069400 8.644919\n", 386 | " 02073162414 8.644919\n", 387 | " 02085076972 8.644919\n", 388 | " 021 8.644919\n", 389 | " 0430 8.644919\n", 390 | " 07008009200 8.644919\n", 391 | " 07099833605 8.644919\n", 392 | " 07123456789 8.644919\n", 393 | " 0721072 8.644919\n", 394 | " 07753741225 8.644919\n", 395 | " 077xxx 8.644919\n", 396 | " 078 8.644919\n", 397 | " 07808247860 8.644919\n", 398 | " 07808726822 8.644919\n", 399 | " 078498 8.644919\n", 400 | " dtype: float64)" 401 | ] 402 | }, 403 | "execution_count": 10, 404 | "metadata": {}, 405 | "output_type": "execute_result" 406 | } 407 | ], 408 | "source": [ 409 | "answer_four()" 410 | ] 411 | }, 412 | { 413 | "cell_type": "markdown", 414 | "metadata": {}, 415 | "source": [ 416 | "### Question 5\n", 417 | "\n", 418 | "Fit and transform the training data `X_train` using a Tfidf Vectorizer ignoring terms that have a document frequency strictly lower than **3**.\n", 419 | "\n", 420 | "Then fit a multinomial Naive Bayes classifier model with smoothing `alpha=0.1` and compute the area under the curve (AUC) score using the transformed test data.\n", 421 | "\n", 422 | "*This function should return the AUC score as a float.*" 423 | ] 424 | }, 425 | { 426 | "cell_type": "code", 427 | "execution_count": 11, 428 | "metadata": { 429 | "collapsed": true 430 | }, 431 | "outputs": [], 432 | "source": [ 433 | "def answer_five():\n", 434 | " vectorizer = TfidfVectorizer(min_df=3)\n", 435 | " X_train_transformed = 
vectorizer.fit_transform(X_train)\n", 436 | " X_test_transformed = vectorizer.transform(X_test)\n", 437 | "\n", 438 | " clf = MultinomialNB(alpha=0.1)\n", 439 | " clf.fit(X_train_transformed, y_train)\n", 440 | "\n", 441 | " # y_predicted_prob = clf.predict_proba(X_test_transformed)[:, 1]\n", 442 | " y_predicted = clf.predict(X_test_transformed)\n", 443 | "\n", 444 | " # return roc_auc_score(y_test, y_predicted_prob) #Your answer here\n", 445 | " return roc_auc_score(y_test, y_predicted)" 446 | ] 447 | }, 448 | { 449 | "cell_type": "code", 450 | "execution_count": 12, 451 | "metadata": {}, 452 | "outputs": [ 453 | { 454 | "data": { 455 | "text/plain": [ 456 | "0.94162436548223349" 457 | ] 458 | }, 459 | "execution_count": 12, 460 | "metadata": {}, 461 | "output_type": "execute_result" 462 | } 463 | ], 464 | "source": [ 465 | "answer_five()" 466 | ] 467 | }, 468 | { 469 | "cell_type": "markdown", 470 | "metadata": {}, 471 | "source": [ 472 | "### Question 6\n", 473 | "\n", 474 | "What is the average length of documents (number of characters) for not spam and spam documents?\n", 475 | "\n", 476 | "*This function should return a tuple (average length not spam, average length spam).*" 477 | ] 478 | }, 479 | { 480 | "cell_type": "code", 481 | "execution_count": 13, 482 | "metadata": { 483 | "collapsed": true 484 | }, 485 | "outputs": [], 486 | "source": [ 487 | "def answer_six():\n", 488 | " spam_data['length'] = spam_data['text'].apply(lambda x:len(x))\n", 489 | " \n", 490 | " return (np.mean(spam_data['length'][spam_data['target'] == 0]), np.mean(spam_data['length'][spam_data['target'] == 1]))#Your answer here" 491 | ] 492 | }, 493 | { 494 | "cell_type": "code", 495 | "execution_count": 14, 496 | "metadata": {}, 497 | "outputs": [ 498 | { 499 | "data": { 500 | "text/plain": [ 501 | "(71.02362694300518, 138.8661311914324)" 502 | ] 503 | }, 504 | "execution_count": 14, 505 | "metadata": {}, 506 | "output_type": "execute_result" 507 | } 508 | ], 509 | "source": [ 510 | 
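The effect of `min_df` is easy to see on a toy corpus: any term that appears in fewer documents than the threshold is dropped from the vocabulary entirely.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

docs = ['apple banana', 'apple cherry', 'apple banana cherry', 'apple durian']

# min_df=3 keeps only terms that occur in at least 3 of the 4 documents
vect = TfidfVectorizer(min_df=3)
vect.fit(docs)
print(sorted(vect.vocabulary_))  # ['apple']
```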
"answer_six()" 511 | ] 512 | }, 513 | { 514 | "cell_type": "markdown", 515 | "metadata": {}, 516 | "source": [ 517 | "
\n", 518 | "
\n", 519 | "The following function has been provided to help you combine new features into the training data:" 520 | ] 521 | }, 522 | { 523 | "cell_type": "code", 524 | "execution_count": 15, 525 | "metadata": { 526 | "collapsed": true 527 | }, 528 | "outputs": [], 529 | "source": [ 530 | "def add_feature(X, feature_to_add):\n", 531 | " \"\"\"\n", 532 | " Returns sparse feature matrix with added feature.\n", 533 | " feature_to_add can also be a list of features.\n", 534 | " \"\"\"\n", 535 | " from scipy.sparse import csr_matrix, hstack\n", 536 | " return hstack([X, csr_matrix(feature_to_add).T], 'csr')\n" 537 | ] 538 | }, 539 | { 540 | "cell_type": "markdown", 541 | "metadata": {}, 542 | "source": [ 543 | "### Question 7\n", 544 | "\n", 545 | "Fit and transform the training data X_train using a Tfidf Vectorizer ignoring terms that have a document frequency strictly lower than **5**.\n", 546 | "\n", 547 | "Using this document-term matrix and an additional feature, **the length of document (number of characters)**, fit a Support Vector Classification model with regularization `C=10000`. 
Then compute the area under the curve (AUC) score using the transformed test data.\n", 548 | "\n", 549 | "*This function should return the AUC score as a float.*" 550 | ] 551 | }, 552 | { 553 | "cell_type": "code", 554 | "execution_count": 16, 555 | "metadata": { 556 | "collapsed": true 557 | }, 558 | "outputs": [], 559 | "source": [ 560 | "from sklearn.svm import SVC\n", 561 | "\n", 562 | "def answer_seven():\n", 563 | " vectorizer = TfidfVectorizer(min_df=5)\n", 564 | "\n", 565 | " X_train_transformed = vectorizer.fit_transform(X_train)\n", 566 | " X_train_transformed_with_length = add_feature(X_train_transformed, X_train.str.len())\n", 567 | "\n", 568 | " X_test_transformed = vectorizer.transform(X_test)\n", 569 | " X_test_transformed_with_length = add_feature(X_test_transformed, X_test.str.len())\n", 570 | "\n", 571 | " clf = SVC(C=10000)\n", 572 | "\n", 573 | " clf.fit(X_train_transformed_with_length, y_train)\n", 574 | "\n", 575 | " y_predicted = clf.predict(X_test_transformed_with_length)\n", 576 | " \n", 577 | " return roc_auc_score(y_test, y_predicted)" 578 | ] 579 | }, 580 | { 581 | "cell_type": "code", 582 | "execution_count": 17, 583 | "metadata": {}, 584 | "outputs": [ 585 | { 586 | "data": { 587 | "text/plain": [ 588 | "0.95813668234215565" 589 | ] 590 | }, 591 | "execution_count": 17, 592 | "metadata": {}, 593 | "output_type": "execute_result" 594 | } 595 | ], 596 | "source": [ 597 | "answer_seven()" 598 | ] 599 | }, 600 | { 601 | "cell_type": "markdown", 602 | "metadata": {}, 603 | "source": [ 604 | "### Question 8\n", 605 | "\n", 606 | "What is the average number of digits per document for not spam and spam documents?\n", 607 | "\n", 608 | "*This function should return a tuple (average # digits not spam, average # digits spam).*" 609 | ] 610 | }, 611 | { 612 | "cell_type": "code", 613 | "execution_count": 18, 614 | "metadata": { 615 | "collapsed": true 616 | }, 617 | "outputs": [], 618 | "source": [ 619 | "def answer_eight():\n", 620 | " 
spam_data['length'] = spam_data['text'].apply(lambda x: len(''.join([a for a in x if a.isdigit()])))\n", 621 | " \n", 622 | " return (np.mean(spam_data['length'][spam_data['target'] == 0]), np.mean(spam_data['length'][spam_data['target'] == 1]))" 623 | ] 624 | }, 625 | { 626 | "cell_type": "code", 627 | "execution_count": 19, 628 | "metadata": {}, 629 | "outputs": [ 630 | { 631 | "data": { 632 | "text/plain": [ 633 | "(0.2992746113989637, 15.759036144578314)" 634 | ] 635 | }, 636 | "execution_count": 19, 637 | "metadata": {}, 638 | "output_type": "execute_result" 639 | } 640 | ], 641 | "source": [ 642 | "answer_eight()" 643 | ] 644 | }, 645 | { 646 | "cell_type": "markdown", 647 | "metadata": {}, 648 | "source": [ 649 | "### Question 9\n", 650 | "\n", 651 | "Fit and transform the training data `X_train` using a Tfidf Vectorizer ignoring terms that have a document frequency strictly lower than **5** and using **word n-grams from n=1 to n=3** (unigrams, bigrams, and trigrams).\n", 652 | "\n", 653 | "Using this document-term matrix and the following additional features:\n", 654 | "* the length of document (number of characters)\n", 655 | "* **number of digits per document**\n", 656 | "\n", 657 | "fit a Logistic Regression model with regularization `C=100`. 
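The digit count in `answer_eight` joins the digit characters and measures the result; `Series.str.count` with a `\d` pattern is an equivalent vectorized form. A sketch on toy data:

```python
import pandas as pd

texts = pd.Series(['call 07123456789 now', 'see you at 5', 'hello there'])

# Number of digit characters per document, without building intermediate strings
digit_counts = texts.str.count(r'\d')
print(list(digit_counts))  # [11, 1, 0]
```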
Then compute the area under the curve (AUC) score using the transformed test data.\n", 658 | "\n", 659 | "*This function should return the AUC score as a float.*" 660 | ] 661 | }, 662 | { 663 | "cell_type": "code", 664 | "execution_count": 20, 665 | "metadata": { 666 | "collapsed": true 667 | }, 668 | "outputs": [], 669 | "source": [ 670 | "from sklearn.linear_model import LogisticRegression\n", 671 | "\n", 672 | "def answer_nine():\n", 673 | " vectorizer = TfidfVectorizer(min_df=5, ngram_range=[1,3])\n", 674 | "\n", 675 | " X_train_transformed = vectorizer.fit_transform(X_train)\n", 676 | " X_train_transformed_with_length = add_feature(X_train_transformed, [X_train.str.len(),\n", 677 | " X_train.apply(lambda x: len(''.join([a for a in x if a.isdigit()])))])\n", 678 | "\n", 679 | " X_test_transformed = vectorizer.transform(X_test)\n", 680 | " X_test_transformed_with_length = add_feature(X_test_transformed, [X_test.str.len(),\n", 681 | " X_test.apply(lambda x: len(''.join([a for a in x if a.isdigit()])))])\n", 682 | "\n", 683 | " clf = LogisticRegression(C=100)\n", 684 | "\n", 685 | " clf.fit(X_train_transformed_with_length, y_train)\n", 686 | "\n", 687 | " y_predicted = clf.predict(X_test_transformed_with_length)\n", 688 | "\n", 689 | " return roc_auc_score(y_test, y_predicted)" 690 | ] 691 | }, 692 | { 693 | "cell_type": "code", 694 | "execution_count": 21, 695 | "metadata": {}, 696 | "outputs": [ 697 | { 698 | "data": { 699 | "text/plain": [ 700 | "0.96787090640544626" 701 | ] 702 | }, 703 | "execution_count": 21, 704 | "metadata": {}, 705 | "output_type": "execute_result" 706 | } 707 | ], 708 | "source": [ 709 | "answer_nine()" 710 | ] 711 | }, 712 | { 713 | "cell_type": "markdown", 714 | "metadata": {}, 715 | "source": [ 716 | "### Question 10\n", 717 | "\n", 718 | "What is the average number of non-word characters (anything other than a letter, digit or underscore) per document for not spam and spam documents?\n", 719 | "\n", 720 | "*This function should 
return a tuple (average # non-word characters not spam, average # non-word characters spam).*" 721 | ] 722 | }, 723 | { 724 | "cell_type": "code", 725 | "execution_count": 22, 726 | "metadata": { 727 | "collapsed": true 728 | }, 729 | "outputs": [], 730 | "source": [ 731 | "def answer_ten():\n", 732 | " spam_data['length'] = spam_data['text'].str.findall(r'(\\W)').str.len()\n", 733 | " \n", 734 | " return (np.mean(spam_data['length'][spam_data['target'] == 0]), np.mean(spam_data['length'][spam_data['target'] == 1]))" 735 | ] 736 | }, 737 | { 738 | "cell_type": "code", 739 | "execution_count": 23, 740 | "metadata": {}, 741 | "outputs": [ 742 | { 743 | "data": { 744 | "text/plain": [ 745 | "(17.29181347150259, 29.041499330655956)" 746 | ] 747 | }, 748 | "execution_count": 23, 749 | "metadata": {}, 750 | "output_type": "execute_result" 751 | } 752 | ], 753 | "source": [ 754 | "answer_ten()" 755 | ] 756 | }, 757 | { 758 | "cell_type": "markdown", 759 | "metadata": {}, 760 | "source": [ 761 | "### Question 11\n", 762 | "\n", 763 | "Fit and transform the training data X_train using a Count Vectorizer ignoring terms that have a document frequency strictly lower than **5** and using **character n-grams from n=2 to n=5.**\n", 764 | "\n", 765 | "To tell Count Vectorizer to use character n-grams pass in `analyzer='char_wb'` which creates character n-grams only from text inside word boundaries. This should make the model more robust to spelling mistakes.\n", 766 | "\n", 767 | "Using this document-term matrix and the following additional features:\n", 768 | "* the length of document (number of characters)\n", 769 | "* number of digits per document\n", 770 | "* **number of non-word characters (anything other than a letter, digit or underscore.)**\n", 771 | "\n", 772 | "fit a Logistic Regression model with regularization C=100. 
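Question 10's `\W` class (anything other than a letter, digit or underscore) can be checked on toy data with the same counting idiom `answer_ten` uses:

```python
import pandas as pd

texts = pd.Series(['Hi there!', 'win $$$ now!!', 'ok'])

# \W matches anything other than a letter, digit or underscore
non_word = texts.str.count(r'\W')
print(list(non_word))  # [2, 7, 0]
```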
Then compute the area under the curve (AUC) score using the transformed test data.\n", 773 | "\n", 774 | "Also **find the 10 smallest and 10 largest coefficients from the model** and return them along with the AUC score in a tuple.\n", 775 | "\n", 776 | "The list of 10 smallest coefficients should be sorted smallest first, the list of 10 largest coefficients should be sorted largest first.\n", 777 | "\n", 778 | "The three features that were added to the document term matrix should have the following names should they appear in the list of coefficients:\n", 779 | "['length_of_doc', 'digit_count', 'non_word_char_count']\n", 780 | "\n", 781 | "*This function should return a tuple `(AUC score as a float, smallest coefs list, largest coefs list)`.*" 782 | ] 783 | }, 784 | { 785 | "cell_type": "code", 786 | "execution_count": 24, 787 | "metadata": { 788 | "collapsed": true 789 | }, 790 | "outputs": [], 791 | "source": [ 792 | "def answer_eleven():\n", 793 | " vectorizer = CountVectorizer(min_df=5, analyzer='char_wb', ngram_range=[2,5])\n", 794 | "\n", 795 | " X_train_transformed = vectorizer.fit_transform(X_train)\n", 796 | " X_train_transformed_with_length = add_feature(X_train_transformed, [X_train.str.len(),\n", 797 | " X_train.apply(lambda x: len(''.join([a for a in x if a.isdigit()]))),\n", 798 | " X_train.str.findall(r'(\\W)').str.len()])\n", 799 | "\n", 800 | " X_test_transformed = vectorizer.transform(X_test)\n", 801 | " X_test_transformed_with_length = add_feature(X_test_transformed, [X_test.str.len(),\n", 802 | " X_test.apply(lambda x: len(''.join([a for a in x if a.isdigit()]))),\n", 803 | " X_test.str.findall(r'(\\W)').str.len()])\n", 804 | "\n", 805 | " clf = LogisticRegression(C=100)\n", 806 | "\n", 807 | " clf.fit(X_train_transformed_with_length, y_train)\n", 808 | "\n", 809 | " y_predicted = clf.predict(X_test_transformed_with_length)\n", 810 | "\n", 811 | " auc = roc_auc_score(y_test, y_predicted)\n", 812 | "\n", 813 | " feature_names = 
np.array(vectorizer.get_feature_names() + ['length_of_doc', 'digit_count', 'non_word_char_count'])\n", 814 | " sorted_coef_index = clf.coef_[0].argsort()\n", 815 | " smallest = feature_names[sorted_coef_index[:10]]\n", 816 | " largest = feature_names[sorted_coef_index[:-11:-1]]\n", 817 | "\n", 818 | " return (auc, list(smallest), list(largest))" 819 | ] 820 | }, 821 | { 822 | "cell_type": "code", 823 | "execution_count": 25, 824 | "metadata": {}, 825 | "outputs": [ 826 | { 827 | "data": { 828 | "text/plain": [ 829 | "(0.97885931107074342,\n", 830 | " ['. ', '..', '? ', ' i', ' y', ' go', ':)', ' h', 'go', ' m'],\n", 831 | " ['digit_count', 'ne', 'ia', 'co', 'xt', ' ch', 'mob', ' x', 'ww', 'ar'])" 832 | ] 833 | }, 834 | "execution_count": 25, 835 | "metadata": {}, 836 | "output_type": "execute_result" 837 | } 838 | ], 839 | "source": [ 840 | "answer_eleven()" 841 | ] 842 | }, 843 | { 844 | "cell_type": "code", 845 | "execution_count": null, 846 | "metadata": { 847 | "collapsed": true 848 | }, 849 | "outputs": [], 850 | "source": [] 851 | } 852 | ], 853 | "metadata": { 854 | "coursera": { 855 | "course_slug": "python-text-mining", 856 | "graded_item_id": "Pn19K", 857 | "launcher_item_id": "y1juS", 858 | "part_id": "ctlgo" 859 | }, 860 | "kernelspec": { 861 | "display_name": "Python 3", 862 | "language": "python", 863 | "name": "python3" 864 | }, 865 | "language_info": { 866 | "codemirror_mode": { 867 | "name": "ipython", 868 | "version": 3 869 | }, 870 | "file_extension": ".py", 871 | "mimetype": "text/x-python", 872 | "name": "python", 873 | "nbconvert_exporter": "python", 874 | "pygments_lexer": "ipython3", 875 | "version": "3.6.2" 876 | } 877 | }, 878 | "nbformat": 4, 879 | "nbformat_minor": 2 880 | } 881 | -------------------------------------------------------------------------------- /Assignment+4.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 
| "source": [ 7 | "---\n", 8 | "\n", 9 | "_You are currently looking at **version 1.0** of this notebook. To download notebooks and datafiles, as well as get help on Jupyter notebooks in the Coursera platform, visit the [Jupyter Notebook FAQ](https://www.coursera.org/learn/python-text-mining/resources/d9pwm) course resource._\n", 10 | "\n", 11 | "---" 12 | ] 13 | }, 14 | { 15 | "cell_type": "markdown", 16 | "metadata": {}, 17 | "source": [ 18 | "# Assignment 4 - Document Similarity & Topic Modelling" 19 | ] 20 | }, 21 | { 22 | "cell_type": "markdown", 23 | "metadata": {}, 24 | "source": [ 25 | "## Part 1 - Document Similarity\n", 26 | "\n", 27 | "For the first part of this assignment, you will complete the functions `doc_to_synsets` and `similarity_score` which will be used by `document_path_similarity` to find the path similarity between two documents.\n", 28 | "\n", 29 | "The following functions are provided:\n", 30 | "* **`convert_tag:`** converts the tag given by `nltk.pos_tag` to a tag used by `wordnet.synsets`. You will need to use this function in `doc_to_synsets`.\n", 31 | "* **`document_path_similarity:`** computes the symmetrical path similarity between two documents by finding the synsets in each document using `doc_to_synsets`, then computing similarities using `similarity_score`.\n", 32 | "\n", 33 | "You will need to finish writing the following functions:\n", 34 | "* **`doc_to_synsets:`** returns a list of synsets in document. This function should first tokenize and part of speech tag the document using `nltk.word_tokenize` and `nltk.pos_tag`. Then it should find each tokens corresponding synset using `wn.synsets(token, wordnet_tag)`. The first synset match should be used. If there is no match, that token is skipped.\n", 35 | "* **`similarity_score:`** returns the normalized similarity score of a list of synsets (s1) onto a second list of synsets (s2). For each synset in s1, find the synset in s2 with the largest similarity value. 
Sum all of the largest similarity values together and normalize this value by dividing it by the number of largest similarity values found. Be careful with data types, which should be floats. Missing values should be ignored.\n", 36 | "\n", 37 | "Once `doc_to_synsets` and `similarity_score` have been completed, submit to the autograder which will run `test_document_path_similarity` to test that these functions are running correctly. \n", 38 | "\n", 39 | "*Do not modify the functions `convert_tag`, `document_path_similarity`, and `test_document_path_similarity`.*" 40 | ] 41 | }, 42 | { 43 | "cell_type": "code", 44 | "execution_count": null, 45 | "metadata": { 46 | "collapsed": true 47 | }, 48 | "outputs": [], 49 | "source": [ 50 | "import numpy as np\n", 51 | "import nltk\n", 52 | "from nltk.corpus import wordnet as wn\n", 53 | "import pandas as pd\n", 54 | "\n", 55 | "\n", 56 | "def convert_tag(tag):\n", 57 | " \"\"\"Convert the tag given by nltk.pos_tag to the tag used by wordnet.synsets\"\"\"\n", 58 | " \n", 59 | " tag_dict = {'N': 'n', 'J': 'a', 'R': 'r', 'V': 'v'}\n", 60 | " try:\n", 61 | " return tag_dict[tag[0]]\n", 62 | " except KeyError:\n", 63 | " return None\n", 64 | "\n", 65 | "\n", 66 | "def doc_to_synsets(doc):\n", 67 | " \"\"\"\n", 68 | " Returns a list of synsets in document.\n", 69 | "\n", 70 | " Tokenizes and tags the words in the document doc.\n", 71 | " Then finds the first synset for each word/tag combination.\n", 72 | " If a synset is not found for that combination it is skipped.\n", 73 | "\n", 74 | " Args:\n", 75 | " doc: string to be converted\n", 76 | "\n", 77 | " Returns:\n", 78 | " list of synsets\n", 79 | "\n", 80 | " Example:\n", 81 | " doc_to_synsets('Fish are nvqjp friends.')\n", 82 | " Out: [Synset('fish.n.01'), Synset('be.v.01'), Synset('friend.n.01')]\n", 83 | " \"\"\"\n", 84 | " \n", 85 | "\n", 86 | " # Your Code Here\n", 87 | " \n", 88 | " return # Your Answer Here\n", 89 | "\n", 90 | "\n", 91 | "def similarity_score(s1, s2):\n", 
92 | " \"\"\"\n", 93 | " Calculate the normalized similarity score of s1 onto s2\n", 94 | "\n", 95 | " For each synset in s1, finds the synset in s2 with the largest similarity value.\n", 96 | " Sum of all of the largest similarity values and normalize this value by dividing it by the\n", 97 | " number of largest similarity values found.\n", 98 | "\n", 99 | " Args:\n", 100 | " s1, s2: list of synsets from doc_to_synsets\n", 101 | "\n", 102 | " Returns:\n", 103 | " normalized similarity score of s1 onto s2\n", 104 | "\n", 105 | " Example:\n", 106 | " synsets1 = doc_to_synsets('I like cats')\n", 107 | " synsets2 = doc_to_synsets('I like dogs')\n", 108 | " similarity_score(synsets1, synsets2)\n", 109 | " Out: 0.73333333333333339\n", 110 | " \"\"\"\n", 111 | " \n", 112 | " \n", 113 | " # Your Code Here\n", 114 | " \n", 115 | " return # Your Answer Here\n", 116 | "\n", 117 | "\n", 118 | "def document_path_similarity(doc1, doc2):\n", 119 | " \"\"\"Finds the symmetrical similarity between doc1 and doc2\"\"\"\n", 120 | "\n", 121 | " synsets1 = doc_to_synsets(doc1)\n", 122 | " synsets2 = doc_to_synsets(doc2)\n", 123 | "\n", 124 | " return (similarity_score(synsets1, synsets2) + similarity_score(synsets2, synsets1)) / 2" 125 | ] 126 | }, 127 | { 128 | "cell_type": "markdown", 129 | "metadata": {}, 130 | "source": [ 131 | "### test_document_path_similarity\n", 132 | "\n", 133 | "Use this function to check if doc_to_synsets and similarity_score are correct.\n", 134 | "\n", 135 | "*This function should return the similarity score as a float.*" 136 | ] 137 | }, 138 | { 139 | "cell_type": "code", 140 | "execution_count": null, 141 | "metadata": { 142 | "collapsed": true 143 | }, 144 | "outputs": [], 145 | "source": [ 146 | "def test_document_path_similarity():\n", 147 | " doc1 = 'This is a function to test document_path_similarity.'\n", 148 | " doc2 = 'Use this function to see if your code in doc_to_synsets \\\n", 149 | " and similarity_score is correct!'\n", 150 | " return 
document_path_similarity(doc1, doc2)" 151 | ] 152 | }, 153 | { 154 | "cell_type": "markdown", 155 | "metadata": {}, 156 | "source": [ 157 | "
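The normalization `similarity_score` asks for can be sketched independently of WordNet: take each item of s1's best match in s2, ignore undefined similarities, and average what remains. The `toy_sim` function below is a made-up stand-in for `path_similarity` (which likewise returns `None` when no path exists):

```python
def similarity_score_sketch(s1, s2, sim):
    """Normalized similarity of s1 onto s2; sim(a, b) -> float or None."""
    best = []
    for a in s1:
        scores = [s for s in (sim(a, b) for b in s2) if s is not None]
        if scores:                 # items with no defined similarity are ignored
            best.append(max(scores))
    return sum(best) / len(best)

def toy_sim(a, b):
    """Made-up similarity on numbers: closer is more similar, None if too far."""
    d = abs(a - b)
    return None if d > 2 else 1.0 / (1.0 + d)

# 10 has no defined similarity to anything in s2, so it is skipped;
# best matches are 1.0 (for 1) and 0.5 (for 2), giving (1.0 + 0.5) / 2
print(similarity_score_sketch([1, 2, 10], [1, 3], toy_sim))  # 0.75
```

The real implementation applies the same loop to synset lists, with `synset.path_similarity` as `sim`.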
\n", 158 | "___\n", 159 | "`paraphrases` is a DataFrame which contains the following columns: `Quality`, `D1`, and `D2`.\n", 160 | "\n", 161 | "`Quality` is an indicator variable which indicates if the two documents `D1` and `D2` are paraphrases of one another (1 for paraphrase, 0 for not paraphrase)." 162 | ] 163 | }, 164 | { 165 | "cell_type": "code", 166 | "execution_count": null, 167 | "metadata": { 168 | "collapsed": true 169 | }, 170 | "outputs": [], 171 | "source": [ 172 | "# Use this dataframe for questions most_similar_docs and label_accuracy\n", 173 | "paraphrases = pd.read_csv('paraphrases.csv')\n", 174 | "paraphrases.head()" 175 | ] 176 | }, 177 | { 178 | "cell_type": "markdown", 179 | "metadata": {}, 180 | "source": [ 181 | "___\n", 182 | "\n", 183 | "### most_similar_docs\n", 184 | "\n", 185 | "Using `document_path_similarity`, find the pair of documents in paraphrases which has the maximum similarity score.\n", 186 | "\n", 187 | "*This function should return a tuple `(D1, D2, similarity_score)`*" 188 | ] 189 | }, 190 | { 191 | "cell_type": "code", 192 | "execution_count": null, 193 | "metadata": { 194 | "collapsed": true 195 | }, 196 | "outputs": [], 197 | "source": [ 198 | "def most_similar_docs():\n", 199 | " \n", 200 | " # Your Code Here\n", 201 | " paraphrases['similarity'] = paraphrases.apply(lambda x:document_path_similarity(x['D1'], x['D2']), axis=1)\n", 202 | " pair = paraphrases.iloc[paraphrases['similarity'].argmax()]\n", 203 | " return pair['D1'], pair['D2'], pair['similarity']" 204 | ] 205 | }, 206 | { 207 | "cell_type": "markdown", 208 | "metadata": {}, 209 | "source": [ 210 | "### label_accuracy\n", 211 | "\n", 212 | "Provide labels for the twenty pairs of documents by computing the similarity for each pair using `document_path_similarity`. Let the classifier rule be that if the score is greater than 0.75, label is paraphrase (1), else label is not paraphrase (0). 
Report accuracy of the classifier using scikit-learn's accuracy_score.\n", 213 | "\n", 214 | "*This function should return a float.*" 215 | ] 216 | }, 217 | { 218 | "cell_type": "code", 219 | "execution_count": null, 220 | "metadata": { 221 | "collapsed": true 222 | }, 223 | "outputs": [], 224 | "source": [ 225 | "def label_accuracy():\n", 226 | " from sklearn.metrics import accuracy_score\n", 227 | "\n", 228 | " # Your Code Here\n", 229 | " paraphrases['predicted'] = np.where(paraphrases['similarity'] > 0.75, 1, 0)\n", 230 | " \n", 231 | " return accuracy_score(paraphrases['Quality'], paraphrases['predicted'])" 232 | ] 233 | }, 234 | { 235 | "cell_type": "markdown", 236 | "metadata": {}, 237 | "source": [ 238 | "## Part 2 - Topic Modelling\n", 239 | "\n", 240 | "For the second part of this assignment, you will use Gensim's LDA (Latent Dirichlet Allocation) model to model topics in `newsgroup_data`. You will first need to finish the code in the cell below by using gensim.models.ldamodel.LdaModel constructor to estimate LDA model parameters on the corpus, and save to the variable `ldamodel`. Extract 10 topics using `corpus` and `id_map`, and with `passes=25` and `random_state=34`." 
241 | ] 242 | }, 243 | { 244 | "cell_type": "code", 245 | "execution_count": null, 246 | "metadata": { 247 | "collapsed": true 248 | }, 249 | "outputs": [], 250 | "source": [ 251 | "import pickle\n", 252 | "import gensim\n", 253 | "from sklearn.feature_extraction.text import CountVectorizer\n", 254 | "\n", 255 | "# Load the list of documents\n", 256 | "with open('newsgroups', 'rb') as f:\n", 257 | " newsgroup_data = pickle.load(f)\n", 258 | "\n", 259 | "# Use CountVectorizer to find tokens of three or more characters, remove stop_words, \n", 260 | "# remove tokens that don't appear in at least 20 documents,\n", 261 | "# remove tokens that appear in more than 20% of the documents\n", 262 | "vect = CountVectorizer(min_df=20, max_df=0.2, stop_words='english', \n", 263 | " token_pattern='(?u)\\\\b\\\\w\\\\w\\\\w+\\\\b')\n", 264 | "# Fit and transform\n", 265 | "X = vect.fit_transform(newsgroup_data)\n", 266 | "\n", 267 | "# Convert sparse matrix to gensim corpus.\n", 268 | "corpus = gensim.matutils.Sparse2Corpus(X, documents_columns=False)\n", 269 | "\n", 270 | "# Mapping from word IDs to words (To be used in LdaModel's id2word parameter)\n", 271 | "id_map = dict((v, k) for k, v in vect.vocabulary_.items())\n" 272 | ] 273 | }, 274 | { 275 | "cell_type": "code", 276 | "execution_count": null, 277 | "metadata": { 278 | "collapsed": true 279 | }, 280 | "outputs": [], 281 | "source": [ 282 | "# Use the gensim.models.ldamodel.LdaModel constructor to estimate \n", 283 | "# LDA model parameters on the corpus, and save to the variable `ldamodel`\n", 284 | "\n", 285 | "# Your code here:\n", 286 | "ldamodel = gensim.models.ldamodel.LdaModel(corpus=corpus, id2word=id_map, passes=25, random_state=34, num_topics=10)" 287 | ] 288 | }, 289 | { 290 | "cell_type": "markdown", 291 | "metadata": {}, 292 | "source": [ 293 | "### lda_topics\n", 294 | "\n", 295 | "Using `ldamodel`, find a list of the 10 topics and the most significant 10 words in each topic. 
This should be structured as a list of 10 tuples where each tuple takes on the form:\n", 296 | "\n", 297 | "`(9, '0.068*\"space\" + 0.036*\"nasa\" + 0.021*\"science\" + 0.020*\"edu\" + 0.019*\"data\" + 0.017*\"shuttle\" + 0.015*\"launch\" + 0.015*\"available\" + 0.014*\"center\" + 0.014*\"sci\"')`\n", 298 | "\n", 299 | "for example.\n", 300 | "\n", 301 | "*This function should return a list of tuples.*" 302 | ] 303 | }, 304 | { 305 | "cell_type": "code", 306 | "execution_count": null, 307 | "metadata": { 308 | "collapsed": true 309 | }, 310 | "outputs": [], 311 | "source": [ 312 | "def lda_topics():\n", 313 | "    \n", 314 | "    # Your Code Here\n", 315 | "    \n", 316 | "    return ldamodel.print_topics(num_words=10)" 317 | ] 318 | }, 319 | { 320 | "cell_type": "markdown", 321 | "metadata": {}, 322 | "source": [ 323 | "### topic_distribution\n", 324 | "\n", 325 | "For the new document `new_doc`, find the topic distribution. Remember to use vect.transform on the new doc, and Sparse2Corpus to convert the sparse matrix to a gensim corpus.\n", 326 | "\n", 327 | "*This function should return a list of tuples, where each tuple is `(#topic, probability)`*" 328 | ] 329 | }, 330 | { 331 | "cell_type": "code", 332 | "execution_count": null, 333 | "metadata": { 334 | "collapsed": true 335 | }, 336 | "outputs": [], 337 | "source": [ 338 | "new_doc = [\"\\n\\nIt's my understanding that the freezing will start to occur because \\\n", 339 | "of the\\ngrowing distance of Pluto and Charon from the Sun, due to it's\\nelliptical orbit. \\\n", 340 | "It is not due to shadowing effects. 
\\n\\n\\nPluto can shadow Charon, and vice-versa.\\n\\nGeorge \\\n", 341 | "Krumins\\n-- \"]" 342 | ] 343 | }, 344 | { 345 | "cell_type": "code", 346 | "execution_count": null, 347 | "metadata": { 348 | "collapsed": true 349 | }, 350 | "outputs": [], 351 | "source": [ 352 | "def topic_distribution():\n", 353 | " \n", 354 | " # Your Code Here\n", 355 | " # transform\n", 356 | " X = vect.transform(new_doc)\n", 357 | "\n", 358 | " # Convert sparse matrix to gensim corpus.\n", 359 | " corpus = gensim.matutils.Sparse2Corpus(X, documents_columns=False)\n", 360 | " \n", 361 | " return list(ldamodel[corpus])[0] # Your Answer Here" 362 | ] 363 | }, 364 | { 365 | "cell_type": "markdown", 366 | "metadata": {}, 367 | "source": [ 368 | "### topic_names\n", 369 | "\n", 370 | "From the list of the following given topics, assign topic names to the topics you found. If none of these names best matches the topics you found, create a new 1-3 word \"title\" for the topic.\n", 371 | "\n", 372 | "Topics: Health, Science, Automobiles, Politics, Government, Travel, Computers & IT, Sports, Business, Society & Lifestyle, Religion, Education.\n", 373 | "\n", 374 | "*This function should return a list of 10 strings.*" 375 | ] 376 | }, 377 | { 378 | "cell_type": "code", 379 | "execution_count": null, 380 | "metadata": { 381 | "collapsed": true 382 | }, 383 | "outputs": [], 384 | "source": [ 385 | "def topic_names():\n", 386 | " \n", 387 | " # Your Code Here\n", 388 | " \n", 389 | " return ['Automobiles', 'Health', 'Science',\n", 390 | " 'Politics',\n", 391 | " 'Sports',\n", 392 | " 'Business', 'Society & Lifestyle',\n", 393 | " 'Religion', 'Education', 'Computers & IT']# Your Answer Here" 394 | ] 395 | } 396 | ], 397 | "metadata": { 398 | "coursera": { 399 | "course_slug": "python-text-mining", 400 | "graded_item_id": "2qbcK", 401 | "launcher_item_id": "pi9Sh", 402 | "part_id": "kQiwX" 403 | }, 404 | "kernelspec": { 405 | "display_name": "Python 3", 406 | "language": "python", 407 | "name": 
"python3" 408 | }, 409 | "language_info": { 410 | "codemirror_mode": { 411 | "name": "ipython", 412 | "version": 3 413 | }, 414 | "file_extension": ".py", 415 | "mimetype": "text/x-python", 416 | "name": "python", 417 | "nbconvert_exporter": "python", 418 | "pygments_lexer": "ipython3", 419 | "version": "3.6.2" 420 | } 421 | }, 422 | "nbformat": 4, 423 | "nbformat_minor": 2 424 | } 425 | -------------------------------------------------------------------------------- /Case+Study+-+Sentiment+Analysis.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "---\n", 8 | "\n", 9 | "_You are currently looking at **version 1.0** of this notebook. To download notebooks and datafiles, as well as get help on Jupyter notebooks in the Coursera platform, visit the [Jupyter Notebook FAQ](https://www.coursera.org/learn/python-text-mining/resources/d9pwm) course resource._\n", 10 | "\n", 11 | "---" 12 | ] 13 | }, 14 | { 15 | "cell_type": "markdown", 16 | "metadata": {}, 17 | "source": [ 18 | "*Note: Some of the cells in this notebook are computationally expensive. 
To reduce runtime, this notebook is using a subset of the data.*" 19 | ] 20 | }, 21 | { 22 | "cell_type": "markdown", 23 | "metadata": {}, 24 | "source": [ 25 | "# Case Study: Sentiment Analysis" 26 | ] 27 | }, 28 | { 29 | "cell_type": "markdown", 30 | "metadata": {}, 31 | "source": [ 32 | "### Data Prep" 33 | ] 34 | }, 35 | { 36 | "cell_type": "code", 37 | "execution_count": null, 38 | "metadata": {}, 39 | "outputs": [], 40 | "source": [ 41 | "import pandas as pd\n", 42 | "import numpy as np\n", 43 | "\n", 44 | "# Read in the data\n", 45 | "df = pd.read_csv('Amazon_Unlocked_Mobile.csv')\n", 46 | "\n", 47 | "# Sample the data to speed up computation\n", 48 | "# Comment out this line to match with lecture\n", 49 | "df = df.sample(frac=0.1, random_state=10)\n", 50 | "\n", 51 | "df.head()" 52 | ] 53 | }, 54 | { 55 | "cell_type": "code", 56 | "execution_count": null, 57 | "metadata": {}, 58 | "outputs": [], 59 | "source": [ 60 | "# Drop missing values\n", 61 | "df.dropna(inplace=True)\n", 62 | "\n", 63 | "# Remove any 'neutral' ratings equal to 3\n", 64 | "df = df[df['Rating'] != 3]\n", 65 | "\n", 66 | "# Encode 4s and 5s as 1 (rated positively)\n", 67 | "# Encode 1s and 2s as 0 (rated poorly)\n", 68 | "df['Positively Rated'] = np.where(df['Rating'] > 3, 1, 0)\n", 69 | "df.head(10)" 70 | ] 71 | }, 72 | { 73 | "cell_type": "code", 74 | "execution_count": null, 75 | "metadata": {}, 76 | "outputs": [], 77 | "source": [ 78 | "# Most ratings are positive\n", 79 | "df['Positively Rated'].mean()" 80 | ] 81 | }, 82 | { 83 | "cell_type": "code", 84 | "execution_count": null, 85 | "metadata": { 86 | "collapsed": true 87 | }, 88 | "outputs": [], 89 | "source": [ 90 | "from sklearn.model_selection import train_test_split\n", 91 | "\n", 92 | "# Split data into training and test sets\n", 93 | "X_train, X_test, y_train, y_test = train_test_split(df['Reviews'], \n", 94 | " df['Positively Rated'], \n", 95 | " random_state=0)" 96 | ] 97 | }, 98 | { 99 | "cell_type": "code", 100 | 
"execution_count": null, 101 | "metadata": {}, 102 | "outputs": [], 103 | "source": [ 104 | "print('X_train first entry:\\n\\n', X_train.iloc[0])\n", 105 | "print('\\n\\nX_train shape: ', X_train.shape)" 106 | ] 107 | }, 108 | { 109 | "cell_type": "markdown", 110 | "metadata": {}, 111 | "source": [ 112 | "# CountVectorizer" 113 | ] 114 | }, 115 | { 116 | "cell_type": "code", 117 | "execution_count": null, 118 | "metadata": { 119 | "collapsed": true 120 | }, 121 | "outputs": [], 122 | "source": [ 123 | "from sklearn.feature_extraction.text import CountVectorizer\n", 124 | "\n", 125 | "# Fit the CountVectorizer to the training data\n", 126 | "vect = CountVectorizer().fit(X_train)" 127 | ] 128 | }, 129 | { 130 | "cell_type": "code", 131 | "execution_count": null, 132 | "metadata": { 133 | "scrolled": false 134 | }, 135 | "outputs": [], 136 | "source": [ 137 | "vect.get_feature_names()[::2000]" 138 | ] 139 | }, 140 | { 141 | "cell_type": "code", 142 | "execution_count": null, 143 | "metadata": {}, 144 | "outputs": [], 145 | "source": [ 146 | "len(vect.get_feature_names())" 147 | ] 148 | }, 149 | { 150 | "cell_type": "code", 151 | "execution_count": null, 152 | "metadata": {}, 153 | "outputs": [], 154 | "source": [ 155 | "# transform the documents in the training data to a document-term matrix\n", 156 | "X_train_vectorized = vect.transform(X_train)\n", 157 | "\n", 158 | "X_train_vectorized" 159 | ] 160 | }, 161 | { 162 | "cell_type": "code", 163 | "execution_count": null, 164 | "metadata": {}, 165 | "outputs": [], 166 | "source": [ 167 | "from sklearn.linear_model import LogisticRegression\n", 168 | "\n", 169 | "# Train the model\n", 170 | "model = LogisticRegression()\n", 171 | "model.fit(X_train_vectorized, y_train)" 172 | ] 173 | }, 174 | { 175 | "cell_type": "code", 176 | "execution_count": null, 177 | "metadata": {}, 178 | "outputs": [], 179 | "source": [ 180 | "from sklearn.metrics import roc_auc_score\n", 181 | "\n", 182 | "# Predict the transformed test 
documents\n", 183 | "predictions = model.predict(vect.transform(X_test))\n", 184 | "\n", 185 | "print('AUC: ', roc_auc_score(y_test, predictions))" 186 | ] 187 | }, 188 | { 189 | "cell_type": "code", 190 | "execution_count": null, 191 | "metadata": { 192 | "scrolled": true 193 | }, 194 | "outputs": [], 195 | "source": [ 196 | "# get the feature names as numpy array\n", 197 | "feature_names = np.array(vect.get_feature_names())\n", 198 | "\n", 199 | "# Sort the coefficients from the model\n", 200 | "sorted_coef_index = model.coef_[0].argsort()\n", 201 | "\n", 202 | "# Find the 10 smallest and 10 largest coefficients\n", 203 | "# The 10 largest coefficients are being indexed using [:-11:-1] \n", 204 | "# so the list returned is in order of largest to smallest\n", 205 | "print('Smallest Coefs:\\n{}\\n'.format(feature_names[sorted_coef_index[:10]]))\n", 206 | "print('Largest Coefs: \\n{}'.format(feature_names[sorted_coef_index[:-11:-1]]))" 207 | ] 208 | }, 209 | { 210 | "cell_type": "markdown", 211 | "metadata": {}, 212 | "source": [ 213 | "# Tfidf" 214 | ] 215 | }, 216 | { 217 | "cell_type": "code", 218 | "execution_count": null, 219 | "metadata": {}, 220 | "outputs": [], 221 | "source": [ 222 | "from sklearn.feature_extraction.text import TfidfVectorizer\n", 223 | "\n", 224 | "# Fit the TfidfVectorizer to the training data specifying a minimum document frequency of 5\n", 225 | "vect = TfidfVectorizer(min_df=5).fit(X_train)\n", 226 | "len(vect.get_feature_names())" 227 | ] 228 | }, 229 | { 230 | "cell_type": "code", 231 | "execution_count": null, 232 | "metadata": {}, 233 | "outputs": [], 234 | "source": [ 235 | "X_train_vectorized = vect.transform(X_train)\n", 236 | "\n", 237 | "model = LogisticRegression()\n", 238 | "model.fit(X_train_vectorized, y_train)\n", 239 | "\n", 240 | "predictions = model.predict(vect.transform(X_test))\n", 241 | "\n", 242 | "print('AUC: ', roc_auc_score(y_test, predictions))" 243 | ] 244 | }, 245 | { 246 | "cell_type": "code", 247 | 
"execution_count": null, 248 | "metadata": {}, 249 | "outputs": [], 250 | "source": [ 251 | "feature_names = np.array(vect.get_feature_names())\n", 252 | "\n", 253 | "sorted_tfidf_index = X_train_vectorized.max(0).toarray()[0].argsort()\n", 254 | "\n", 255 | "print('Smallest tfidf:\\n{}\\n'.format(feature_names[sorted_tfidf_index[:10]]))\n", 256 | "print('Largest tfidf: \\n{}'.format(feature_names[sorted_tfidf_index[:-11:-1]]))" 257 | ] 258 | }, 259 | { 260 | "cell_type": "code", 261 | "execution_count": null, 262 | "metadata": {}, 263 | "outputs": [], 264 | "source": [ 265 | "sorted_coef_index = model.coef_[0].argsort()\n", 266 | "\n", 267 | "print('Smallest Coefs:\\n{}\\n'.format(feature_names[sorted_coef_index[:10]]))\n", 268 | "print('Largest Coefs: \\n{}'.format(feature_names[sorted_coef_index[:-11:-1]]))" 269 | ] 270 | }, 271 | { 272 | "cell_type": "code", 273 | "execution_count": null, 274 | "metadata": {}, 275 | "outputs": [], 276 | "source": [ 277 | "# These reviews are treated the same by our current model\n", 278 | "print(model.predict(vect.transform(['not an issue, phone is working',\n", 279 | "                                    'an issue, phone is not working'])))" 280 | ] 281 | }, 282 | { 283 | "cell_type": "markdown", 284 | "metadata": {}, 285 | "source": [ 286 | "# n-grams" 287 | ] 288 | }, 289 | { 290 | "cell_type": "code", 291 | "execution_count": null, 292 | "metadata": {}, 293 | "outputs": [], 294 | "source": [ 295 | "# Fit the CountVectorizer to the training data specifying a minimum \n", 296 | "# document frequency of 5 and extracting 1-grams and 2-grams\n", 297 | "vect = CountVectorizer(min_df=5, ngram_range=(1,2)).fit(X_train)\n", 298 | "\n", 299 | "X_train_vectorized = vect.transform(X_train)\n", 300 | "\n", 301 | "len(vect.get_feature_names())" 302 | ] 303 | }, 304 | { 305 | "cell_type": "code", 306 | "execution_count": null, 307 | "metadata": {}, 308 | "outputs": [], 309 | "source": [ 310 | "model = LogisticRegression()\n", 311 | "model.fit(X_train_vectorized, 
y_train)\n", 312 | "\n", 313 | "predictions = model.predict(vect.transform(X_test))\n", 314 | "\n", 315 | "print('AUC: ', roc_auc_score(y_test, predictions))" 316 | ] 317 | }, 318 | { 319 | "cell_type": "code", 320 | "execution_count": null, 321 | "metadata": {}, 322 | "outputs": [], 323 | "source": [ 324 | "feature_names = np.array(vect.get_feature_names())\n", 325 | "\n", 326 | "sorted_coef_index = model.coef_[0].argsort()\n", 327 | "\n", 328 | "print('Smallest Coefs:\\n{}\\n'.format(feature_names[sorted_coef_index[:10]]))\n", 329 | "print('Largest Coefs: \\n{}'.format(feature_names[sorted_coef_index[:-11:-1]]))" 330 | ] 331 | }, 332 | { 333 | "cell_type": "code", 334 | "execution_count": null, 335 | "metadata": {}, 336 | "outputs": [], 337 | "source": [ 338 | "# These reviews are now correctly identified\n", 339 | "print(model.predict(vect.transform(['not an issue, phone is working',\n", 340 | " 'an issue, phone is not working'])))" 341 | ] 342 | } 343 | ], 344 | "metadata": { 345 | "kernelspec": { 346 | "display_name": "Python 3", 347 | "language": "python", 348 | "name": "python3" 349 | }, 350 | "language_info": { 351 | "codemirror_mode": { 352 | "name": "ipython", 353 | "version": 3 354 | }, 355 | "file_extension": ".py", 356 | "mimetype": "text/x-python", 357 | "name": "python", 358 | "nbconvert_exporter": "python", 359 | "pygments_lexer": "ipython3", 360 | "version": "3.6.0" 361 | } 362 | }, 363 | "nbformat": 4, 364 | "nbformat_minor": 2 365 | } 366 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Applied-text-mining-in-Python 2 | -------------------------------------------------------------------------------- /Regex+with+Pandas+and+Named+Groups.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | 
"---\n", 8 | "\n", 9 | "_You are currently looking at **version 1.0** of this notebook. To download notebooks and datafiles, as well as get help on Jupyter notebooks in the Coursera platform, visit the [Jupyter Notebook FAQ](https://www.coursera.org/learn/python-text-mining/resources/d9pwm) course resource._\n", 10 | "\n", 11 | "---" 12 | ] 13 | }, 14 | { 15 | "cell_type": "markdown", 16 | "metadata": {}, 17 | "source": [ 18 | "# Working with Text Data in pandas" 19 | ] 20 | }, 21 | { 22 | "cell_type": "code", 23 | "execution_count": 1, 24 | "metadata": { 25 | "collapsed": false, 26 | "scrolled": true 27 | }, 28 | "outputs": [ 29 | { 30 | "data": { 31 | "text/html": [ 32 | "
\n", 33 | "\n", 34 | " \n", 35 | " \n", 36 | " \n", 37 | " \n", 38 | " \n", 39 | " \n", 40 | " \n", 41 | " \n", 42 | " \n", 43 | " \n", 44 | " \n", 45 | " \n", 46 | " \n", 47 | " \n", 48 | " \n", 49 | " \n", 50 | " \n", 51 | " \n", 52 | " \n", 53 | " \n", 54 | " \n", 55 | " \n", 56 | " \n", 57 | " \n", 58 | " \n", 59 | " \n", 60 | " \n", 61 | " \n", 62 | "
text
0Monday: The doctor's appointment is at 2:45pm.
1Tuesday: The dentist's appointment is at 11:30...
2Wednesday: At 7:00pm, there is a basketball game!
3Thursday: Be back home by 11:15 pm at the latest.
4Friday: Take the train at 08:10 am, arrive at ...
\n", 63 | "
" 64 | ], 65 | "text/plain": [ 66 | " text\n", 67 | "0 Monday: The doctor's appointment is at 2:45pm.\n", 68 | "1 Tuesday: The dentist's appointment is at 11:30...\n", 69 | "2 Wednesday: At 7:00pm, there is a basketball game!\n", 70 | "3 Thursday: Be back home by 11:15 pm at the latest.\n", 71 | "4 Friday: Take the train at 08:10 am, arrive at ..." 72 | ] 73 | }, 74 | "execution_count": 1, 75 | "metadata": {}, 76 | "output_type": "execute_result" 77 | } 78 | ], 79 | "source": [ 80 | "import pandas as pd\n", 81 | "\n", 82 | "time_sentences = [\"Monday: The doctor's appointment is at 2:45pm.\", \n", 83 | " \"Tuesday: The dentist's appointment is at 11:30 am.\",\n", 84 | " \"Wednesday: At 7:00pm, there is a basketball game!\",\n", 85 | " \"Thursday: Be back home by 11:15 pm at the latest.\",\n", 86 | " \"Friday: Take the train at 08:10 am, arrive at 09:00am.\"]\n", 87 | "\n", 88 | "df = pd.DataFrame(time_sentences, columns=['text'])\n", 89 | "df" 90 | ] 91 | }, 92 | { 93 | "cell_type": "code", 94 | "execution_count": 2, 95 | "metadata": { 96 | "collapsed": false 97 | }, 98 | "outputs": [ 99 | { 100 | "data": { 101 | "text/plain": [ 102 | "0 46\n", 103 | "1 50\n", 104 | "2 49\n", 105 | "3 49\n", 106 | "4 54\n", 107 | "Name: text, dtype: int64" 108 | ] 109 | }, 110 | "execution_count": 2, 111 | "metadata": {}, 112 | "output_type": "execute_result" 113 | } 114 | ], 115 | "source": [ 116 | "# find the number of characters for each string in df['text']\n", 117 | "df['text'].str.len()" 118 | ] 119 | }, 120 | { 121 | "cell_type": "code", 122 | "execution_count": 3, 123 | "metadata": { 124 | "collapsed": false 125 | }, 126 | "outputs": [ 127 | { 128 | "data": { 129 | "text/plain": [ 130 | "0 7\n", 131 | "1 8\n", 132 | "2 8\n", 133 | "3 10\n", 134 | "4 10\n", 135 | "Name: text, dtype: int64" 136 | ] 137 | }, 138 | "execution_count": 3, 139 | "metadata": {}, 140 | "output_type": "execute_result" 141 | } 142 | ], 143 | "source": [ 144 | "# find the number of tokens for each 
string in df['text']\n", 145 | "df['text'].str.split().str.len()" 146 | ] 147 | }, 148 | { 149 | "cell_type": "code", 150 | "execution_count": 4, 151 | "metadata": { 152 | "collapsed": false 153 | }, 154 | "outputs": [ 155 | { 156 | "data": { 157 | "text/plain": [ 158 | "0     True\n", 159 | "1     True\n", 160 | "2    False\n", 161 | "3    False\n", 162 | "4    False\n", 163 | "Name: text, dtype: bool" 164 | ] 165 | }, 166 | "execution_count": 4, 167 | "metadata": {}, 168 | "output_type": "execute_result" 169 | } 170 | ], 171 | "source": [ 172 | "# find which entries contain the word 'appointment'\n", 173 | "df['text'].str.contains('appointment')" 174 | ] 175 | }, 176 | { 177 | "cell_type": "code", 178 | "execution_count": 5, 179 | "metadata": { 180 | "collapsed": false 181 | }, 182 | "outputs": [ 183 | { 184 | "data": { 185 | "text/plain": [ 186 | "0    3\n", 187 | "1    4\n", 188 | "2    3\n", 189 | "3    4\n", 190 | "4    8\n", 191 | "Name: text, dtype: int64" 192 | ] 193 | }, 194 | "execution_count": 5, 195 | "metadata": {}, 196 | "output_type": "execute_result" 197 | } 198 | ], 199 | "source": [ 200 | "# find how many times a digit occurs in each string\n", 201 | "df['text'].str.count(r'\\d')" 202 | ] 203 | }, 204 | { 205 | "cell_type": "code", 206 | "execution_count": 6, 207 | "metadata": { 208 | "collapsed": false 209 | }, 210 | "outputs": [ 211 | { 212 | "data": { 213 | "text/plain": [ 214 | "0                   [2, 4, 5]\n", 215 | "1                [1, 1, 3, 0]\n", 216 | "2                   [7, 0, 0]\n", 217 | "3                [1, 1, 1, 5]\n", 218 | "4    [0, 8, 1, 0, 0, 9, 0, 0]\n", 219 | "Name: text, dtype: object" 220 | ] 221 | }, 222 | "execution_count": 6, 223 | "metadata": {}, 224 | "output_type": "execute_result" 225 | } 226 | ], 227 | "source": [ 228 | "# find all occurrences of the digits\n", 229 | "df['text'].str.findall(r'\\d')" 230 | ] 231 | }, 232 | { 233 | "cell_type": "code", 234 | "execution_count": 7, 235 | "metadata": { 236 | "collapsed": false 237 | }, 238 | "outputs": [ 239 | { 240 | "data": { 241 | "text/plain": [ 242 | "0    [(2, 
45)]\n", 243 | "1 [(11, 30)]\n", 244 | "2 [(7, 00)]\n", 245 | "3 [(11, 15)]\n", 246 | "4 [(08, 10), (09, 00)]\n", 247 | "Name: text, dtype: object" 248 | ] 249 | }, 250 | "execution_count": 7, 251 | "metadata": {}, 252 | "output_type": "execute_result" 253 | } 254 | ], 255 | "source": [ 256 | "# group and find the hours and minutes\n", 257 | "df['text'].str.findall(r'(\\d?\\d):(\\d\\d)')" 258 | ] 259 | }, 260 | { 261 | "cell_type": "code", 262 | "execution_count": 8, 263 | "metadata": { 264 | "collapsed": false 265 | }, 266 | "outputs": [ 267 | { 268 | "data": { 269 | "text/plain": [ 270 | "0 ???: The doctor's appointment is at 2:45pm.\n", 271 | "1 ???: The dentist's appointment is at 11:30 am.\n", 272 | "2 ???: At 7:00pm, there is a basketball game!\n", 273 | "3 ???: Be back home by 11:15 pm at the latest.\n", 274 | "4 ???: Take the train at 08:10 am, arrive at 09:...\n", 275 | "Name: text, dtype: object" 276 | ] 277 | }, 278 | "execution_count": 8, 279 | "metadata": {}, 280 | "output_type": "execute_result" 281 | } 282 | ], 283 | "source": [ 284 | "# replace weekdays with '???'\n", 285 | "df['text'].str.replace(r'\\w+day\\b', '???')" 286 | ] 287 | }, 288 | { 289 | "cell_type": "code", 290 | "execution_count": 9, 291 | "metadata": { 292 | "collapsed": false 293 | }, 294 | "outputs": [ 295 | { 296 | "ename": "TypeError", 297 | "evalue": "repl must be a string", 298 | "output_type": "error", 299 | "traceback": [ 300 | "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", 301 | "\u001b[0;31mTypeError\u001b[0m Traceback (most recent call last)", 302 | "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[1;32m 1\u001b[0m \u001b[0;31m# replace weekdays with 3 letter abbrevations\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 2\u001b[0;31m 
\u001b[0mdf\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;34m'text'\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mstr\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mreplace\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34mr'(\\w+day\\b)'\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;32mlambda\u001b[0m \u001b[0mx\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0mx\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mgroups\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;36m0\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;36m3\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", 303 | "\u001b[0;32m/home/sid/anaconda2/lib/python2.7/site-packages/pandas/core/strings.pyc\u001b[0m in \u001b[0;36mreplace\u001b[0;34m(self, pat, repl, n, case, flags)\u001b[0m\n\u001b[1;32m 1504\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0mreplace\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mpat\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mrepl\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mn\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;34m-\u001b[0m\u001b[0;36m1\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mcase\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mTrue\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mflags\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;36m0\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1505\u001b[0m result = str_replace(self._data, pat, repl, n=n, case=case,\n\u001b[0;32m-> 1506\u001b[0;31m flags=flags)\n\u001b[0m\u001b[1;32m 1507\u001b[0m \u001b[0;32mreturn\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_wrap_result\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mresult\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1508\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n", 304 | "\u001b[0;32m/home/sid/anaconda2/lib/python2.7/site-packages/pandas/core/strings.pyc\u001b[0m in \u001b[0;36mstr_replace\u001b[0;34m(arr, pat, 
repl, n, case, flags)\u001b[0m\n\u001b[1;32m 320\u001b[0m \u001b[0;31m# Check whether repl is valid (GH 13438)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 321\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0;32mnot\u001b[0m \u001b[0mis_string_like\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mrepl\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 322\u001b[0;31m \u001b[0;32mraise\u001b[0m \u001b[0mTypeError\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\"repl must be a string\"\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 323\u001b[0m \u001b[0muse_re\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;32mnot\u001b[0m \u001b[0mcase\u001b[0m \u001b[0;32mor\u001b[0m \u001b[0mlen\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mpat\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;34m>\u001b[0m \u001b[0;36m1\u001b[0m \u001b[0;32mor\u001b[0m \u001b[0mflags\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 324\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n", 305 | "\u001b[0;31mTypeError\u001b[0m: repl must be a string" 306 | ] 307 | } 308 | ], 309 | "source": [ 310 | "# replace weekdays with 3 letter abbrevations\n", 311 | "df['text'].str.replace(r'(\\w+day\\b)', lambda x: x.groups()[0][:3])" 312 | ] 313 | }, 314 | { 315 | "cell_type": "code", 316 | "execution_count": 10, 317 | "metadata": { 318 | "collapsed": false 319 | }, 320 | "outputs": [ 321 | { 322 | "name": "stderr", 323 | "output_type": "stream", 324 | "text": [ 325 | "/home/sid/anaconda2/lib/python2.7/site-packages/ipykernel/__main__.py:2: FutureWarning: currently extract(expand=None) means expand=False (return Index/Series/DataFrame) but in a future version of pandas this will be changed to expand=True (return DataFrame)\n", 326 | " from ipykernel import kernelapp as app\n" 327 | ] 328 | }, 329 | { 330 | "data": { 331 | "text/html": [ 332 | "
\n", 333 | "\n", 334 | " \n", 335 | " \n", 336 | " \n", 337 | " \n", 338 | " \n", 339 | " \n", 340 | " \n", 341 | " \n", 342 | " \n", 343 | " \n", 344 | " \n", 345 | " \n", 346 | " \n", 347 | " \n", 348 | " \n", 349 | " \n", 350 | " \n", 351 | " \n", 352 | " \n", 353 | " \n", 354 | " \n", 355 | " \n", 356 | " \n", 357 | " \n", 358 | " \n", 359 | " \n", 360 | " \n", 361 | " \n", 362 | " \n", 363 | " \n", 364 | " \n", 365 | " \n", 366 | " \n", 367 | " \n", 368 | "
01
0245
11130
2700
31115
40810
\n", 369 | "
" 370 | ], 371 | "text/plain": [ 372 | " 0 1\n", 373 | "0 2 45\n", 374 | "1 11 30\n", 375 | "2 7 00\n", 376 | "3 11 15\n", 377 | "4 08 10" 378 | ] 379 | }, 380 | "execution_count": 10, 381 | "metadata": {}, 382 | "output_type": "execute_result" 383 | } 384 | ], 385 | "source": [ 386 | "# create new columns from first match of extracted groups\n", 387 | "df['text'].str.extract(r'(\\d?\\d):(\\d\\d)')" 388 | ] 389 | }, 390 | { 391 | "cell_type": "code", 392 | "execution_count": 11, 393 | "metadata": { 394 | "collapsed": false 395 | }, 396 | "outputs": [ 397 | { 398 | "data": { 399 | "text/html": [ 400 | "
\n", 401 | "\n", 402 | " \n", 403 | " \n", 404 | " \n", 405 | " \n", 406 | " \n", 407 | " \n", 408 | " \n", 409 | " \n", 410 | " \n", 411 | " \n", 412 | " \n", 413 | " \n", 414 | " \n", 415 | " \n", 416 | " \n", 417 | " \n", 418 | " \n", 419 | " \n", 420 | " \n", 421 | " \n", 422 | " \n", 423 | " \n", 424 | " \n", 425 | " \n", 426 | " \n", 427 | " \n", 428 | " \n", 429 | " \n", 430 | " \n", 431 | " \n", 432 | " \n", 433 | " \n", 434 | " \n", 435 | " \n", 436 | " \n", 437 | " \n", 438 | " \n", 439 | " \n", 440 | " \n", 441 | " \n", 442 | " \n", 443 | " \n", 444 | " \n", 445 | " \n", 446 | " \n", 447 | " \n", 448 | " \n", 449 | " \n", 450 | " \n", 451 | " \n", 452 | " \n", 453 | " \n", 454 | " \n", 455 | " \n", 456 | " \n", 457 | " \n", 458 | " \n", 459 | " \n", 460 | " \n", 461 | " \n", 462 | " \n", 463 | " \n", 464 | " \n", 465 | " \n", 466 | " \n", 467 | " \n", 468 | " \n", 469 | "
0123
match
002:45pm245pm
1011:30 am1130am
207:00pm700pm
3011:15 pm1115pm
4008:10 am0810am
109:00am0900am
\n", 470 | "
" 471 | ], 472 | "text/plain": [ 473 | " 0 1 2 3\n", 474 | " match \n", 475 | "0 0 2:45pm 2 45 pm\n", 476 | "1 0 11:30 am 11 30 am\n", 477 | "2 0 7:00pm 7 00 pm\n", 478 | "3 0 11:15 pm 11 15 pm\n", 479 | "4 0 08:10 am 08 10 am\n", 480 | " 1 09:00am 09 00 am" 481 | ] 482 | }, 483 | "execution_count": 11, 484 | "metadata": {}, 485 | "output_type": "execute_result" 486 | } 487 | ], 488 | "source": [ 489 | "# extract the entire time, the hours, the minutes, and the period\n", 490 | "df['text'].str.extractall(r'((\\d?\\d):(\\d\\d) ?([ap]m))')" 491 | ] 492 | }, 493 | { 494 | "cell_type": "code", 495 | "execution_count": 12, 496 | "metadata": { 497 | "collapsed": false 498 | }, 499 | "outputs": [ 500 | { 501 | "data": { 502 | "text/html": [ 503 | "
\n", 504 | "\n", 505 | " \n", 506 | " \n", 507 | " \n", 508 | " \n", 509 | " \n", 510 | " \n", 511 | " \n", 512 | " \n", 513 | " \n", 514 | " \n", 515 | " \n", 516 | " \n", 517 | " \n", 518 | " \n", 519 | " \n", 520 | " \n", 521 | " \n", 522 | " \n", 523 | " \n", 524 | " \n", 525 | " \n", 526 | " \n", 527 | " \n", 528 | " \n", 529 | " \n", 530 | " \n", 531 | " \n", 532 | " \n", 533 | " \n", 534 | " \n", 535 | " \n", 536 | " \n", 537 | " \n", 538 | " \n", 539 | " \n", 540 | " \n", 541 | " \n", 542 | " \n", 543 | " \n", 544 | " \n", 545 | " \n", 546 | " \n", 547 | " \n", 548 | " \n", 549 | " \n", 550 | " \n", 551 | " \n", 552 | " \n", 553 | " \n", 554 | " \n", 555 | " \n", 556 | " \n", 557 | " \n", 558 | " \n", 559 | " \n", 560 | " \n", 561 | " \n", 562 | " \n", 563 | " \n", 564 | " \n", 565 | " \n", 566 | " \n", 567 | " \n", 568 | " \n", 569 | " \n", 570 | " \n", 571 | " \n", 572 | "
timehourminuteperiod
match
002:45pm245pm
1011:30 am1130am
207:00pm700pm
3011:15 pm1115pm
4008:10 am0810am
109:00am0900am
\n", 573 | "
" 574 | ], 575 | "text/plain": [ 576 | " time hour minute period\n", 577 | " match \n", 578 | "0 0 2:45pm 2 45 pm\n", 579 | "1 0 11:30 am 11 30 am\n", 580 | "2 0 7:00pm 7 00 pm\n", 581 | "3 0 11:15 pm 11 15 pm\n", 582 | "4 0 08:10 am 08 10 am\n", 583 | " 1 09:00am 09 00 am" 584 | ] 585 | }, 586 | "execution_count": 12, 587 | "metadata": {}, 588 | "output_type": "execute_result" 589 | } 590 | ], 591 | "source": [ 592 | "# extract the entire time, the hours, the minutes, and the period with group names\n", 593 | "df['text'].str.extractall(r'(?P