├── README.md
├── ProjectProposal_group042.ipynb
└── DataCheckpoint_group042.ipynb

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
This is your group repo for your final project for COGS108.

This repository is private and is visible only to the course instructors and your group mates; it is not visible to anyone else.

Template notebooks for each component are provided. Work on each notebook only until its due date; after each submission is due, move on to the next notebook (for example, after the proposal is due, start working in the Data Checkpoint notebook).

This repository will be frozen on the final project due date. No further changes can be made after that time.

Your project proposal and final project will be graded based solely on the corresponding project notebooks in this repository.

Template Jupyter notebooks have been included; be sure to change XXX in the file names to your group number. For each due date, make sure a notebook with the following name is present in this repository (where XXX is replaced by your group number):

- `ProjectProposal_groupXXX.ipynb`
- `DataCheckpoint_groupXXX.ipynb`
- `EDACheckpoint_groupXXX.ipynb`
- `FinalProject_groupXXX.ipynb`

This is *your* repo. You are free to manage it as you see fit: edit this README, add data files, add scripts, etc. As long as the four files above are present with the required information on their due dates, the rest is up to you all.

Also, you are free and encouraged to share this project after the course and to add it to your portfolio. Just be sure to fork it to your GitHub at the end of the quarter!
--------------------------------------------------------------------------------
/ProjectProposal_group042.ipynb:
--------------------------------------------------------------------------------
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# COGS 108 - Final Project Proposal"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Names\n",
    "\n",
    "- Anna Wang\n",
    "- Kristy Liou\n",
    "- Chloe Salem\n",
    "- Zeven Vidmar Barker\n",
    "- Maxtierney Arias"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Research Question"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Can we predict the happiness of the citizens of a region based on the attitude of the broadcast news reports regarding that region?"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Background and Prior Work"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We chose news as our research topic because of the large impact it has on its audience and on society. From 2016 to 2018, more than 5.5 million Americans reportedly tuned in to evening news broadcasts on networks such as ABC, CBS, and NBC. Many people look to the news for reliable and relevant information about current events, and its tone and rhetoric can change individuals' minds and behaviors. It is one of the main sources people turn to because of its authority and assumed credibility on current topics. In recent years, news has spread across many platforms on the internet as online media use has grown, giving people faster access to what is currently happening, although the rise of fake articles has also hurt how news spreads. Social media also makes it easy for citizens to express their opinions and feelings about current events. We want to investigate how the attitude of televised news affects, or even reflects, how the citizens of a particular region feel.\n",
    "Given the current political climate, there has been growing conversation about Americans' happiness, or unhappiness, within their society. As a measure of a country's happiness we plan to use the World Happiness Report, an annual report that surveys people around the globe and ranks 156 countries by how happy their citizens are perceived to be. The World Happiness Report takes into consideration multiple aspects of a country, including GDP per capita, social support, life expectancy, freedom to make life choices, generosity, perceptions of corruption, and dystopia. Since the news has a reputation for covering more negative stories than positive ones, we are interested in examining the proportion of negative to positive news and how that reflects back on a country's happiness level.\n",
    "Prior to this research, the BBC has reported on how the news can mentally and physically shape the people watching it. According to the BBC, the way certain news outlets cover a particular topic has the power to change people's perspectives on that topic and on groups of people, and the disproportionate coverage of negative over positive news can increase individuals' stress levels, which can lead to physical health problems down the line. https://www.bbc.com/future/article/20200512-how-the-news-changes-the-way-we-think-and-behave\n",
    "There have also been many studies indicating that excessive news intake has a negative impact on individuals' mental health. Natascha de Hoog published a study that focuses primarily on people who watch the news daily, rather than during a specific time of terror. Its results showed that perceiving the news as negative was associated with more negative and less positive affect.\n",
    "https://bpspsychub.onlinelibrary.wiley.com/doi/full/10.1111/bjop.12389\n",
    "In contrast to de Hoog's study, Karen McIntyre conducted an experiment focused on the effects of reading more positive news. McIntyre's study reported that readers of positive stories showed higher levels of positive affect than readers of hard news stories, although this did not translate into a greater sense of personal well-being.\n",
53 | "https://www.tandfonline.com/doi/full/10.1080/1041794X.2016.1171892?casa_token=1Oj1Y5YFQsAAAAAA%3A0doLKEbOhGHx32vNTnjV_4URWb60DKYdTtUgQ1kVXhwLjQ1lsB5x3Pf6vWmoTgZyn7Yq_k7UwNk 54 | 55 | "\n", 56 | "References (include links):\n", 57 | "- 1)Reference: (1) https://www.journalism.org/fact-sheet/network-news/\n", 58 | "- 2)" 59 | ] 60 | }, 61 | { 62 | "cell_type": "markdown", 63 | "metadata": {}, 64 | "source": [ 65 | "# Hypothesis\n" 66 | ] 67 | }, 68 | { 69 | "cell_type": "markdown", 70 | "metadata": {}, 71 | "source": [ 72 | "We hypothesize firstly that the attributes of broadcast news language concerning a region will reveal a pattern of differing attitudes towards different regions. Secondly, we believe that there will be a positive correlation between the attitude with which the news media discusses a region and the happiness of the citizens in it. Lastly, we hypothesize that this attitude difference can be used to predict the happiness of citizens who reside in a region." 73 | ] 74 | }, 75 | { 76 | "cell_type": "markdown", 77 | "metadata": {}, 78 | "source": [ 79 | "# Data" 80 | ] 81 | }, 82 | { 83 | "cell_type": "markdown", 84 | "metadata": {}, 85 | "source": [ 86 | "The ideal dataset would consist of a country’s happiness factor combined with its various demographic variables likely to affect this factor (such as population density, total population, etc.) Additionally, we would want some type of data that can adequately describe the content and tone of the news, such as closed captioning samples or headlines. Ideally, we would like to collect this data from the most popular news stations in each region for several regions to best reflect the largest population of broadcast viewers for each. To more accurately determine broadcast media’s effects on a region, we may also seek data on news consumption for the regions we target in our research. 87 | "We have already done research on publicly available datasets that could potentially provide this information, including World Happiness Report Data and broadcast news closed-captioning samples. 88 | " 89 | ] 90 | }, 91 | { 92 | "cell_type": "markdown", 93 | "metadata": {}, 94 | "source": [ 95 | "# Ethics & Privacy" 96 | ] 97 | }, 98 | { 99 | "cell_type": "markdown", 100 | "metadata": {}, 101 | "source": [ 102 | "Since we will likely be looking at linguistic attributes of news reports that are in English, our findings will only be able to be generalized to English-speaking countries and regions. 103 | "There will be biases inherent in data that measures and reports happiness because it is an extremely subjective quality to measure. People may not always report their happiness accurately. As such, our analysis may not be able to use entire accounts of people's emotions and feelings during our time of research. 104 | "In order to detect these specific issues regarding personal emotion, it is important to consider how to approach each individual response. Before communicating our analysis we plan to first find consensual subjects and data, allowing us to properly analyze the correlation between people's feelings and the news. During our time of investigation and planning, we will also continue to abide by our data and subjects' consent, maintaining a full agreement to keep privacy and choice up to the individual limiting any concerns through to the end of our analysis. 105 | "Besides personal emotion and feelings, our topic doesn't necessarily offer any theatenting problems towards people's data privacy and equitable impact. 
106 | "In order to properly address these issues, we plan to handle recorded information and data with care, continuing to follow and resolve any individual concerns as well as keeping potential threats to personal privacy and equitable data to a minimum. Specifically, we plan to be transparent in our analyses and how we utilize our recordings, stopping any feeling of deception when dealing with news reports and their influence on people." 107 | ] 108 | }, 109 | { 110 | "cell_type": "markdown", 111 | "metadata": {}, 112 | "source": [ 113 | "# Team Expectations " 114 | ] 115 | }, 116 | { 117 | "cell_type": "markdown", 118 | "metadata": {}, 119 | "source": [ 120 | "* *Showing up to team meetings on time\n", 121 | "* *Completing all work that is assigned to you and reaching out to team members if anything is unclear\n", 122 | "* *Communicating any issues or uncertainties in the group chat so we can all collaborate and help\n", 123 | ] 124 | }, 125 | { 126 | "cell_type": "markdown", 127 | "metadata": {}, 128 | "source": [ 129 | "# Project Timeline Proposal" 130 | ] 131 | }, 132 | { 133 | "cell_type": "markdown", 134 | "metadata": {}, 135 | "source": [ 136 | "| Meeting Date | Meeting Time| Completed Before Meeting | Discuss at Meeting |\n", 137 | "|---|---|---|---|\n", 138 | "| 1/20 | 7 PM | Read & Think about COGS 108 expectations; brainstorm topics/questions | Determine best form of communication; Discuss and decide on final project topic; discuss hypothesis; begin background research | \n", 139 | "| 1/26 | 7 PM | Do background research on topic | Discuss ideal dataset(s) and ethics; draft project proposal | \n", 140 | "| 2/1 | 7 PM | Edit, finalize, and submit proposal; Search for datasets | Discuss Wrangling and possible analytical approaches; Assign group members to lead each specific part |\n", 141 | "| 2/14 | 7 PM | Import & Wrangle Data; EDA | Review/Edit wrangling/EDA; Discuss Analysis Plan |\n", 142 | "| 2/23 | 7 PM | Finalize wrangling/EDA; Begin Analysis| Discuss/edit Analysis; Complete project check-in |\n", 143 | "| 3/13 | 7 PM | Complete analysis; Draft results/conclusion/discussion| Discuss/edit full project |\n", 144 | "| 3/19 | Before 11:59 PM | NA | Turn in Final Project & Group Project Surveys |" 145 | ] 146 | } 147 | ], 148 | "metadata": { 149 | "kernelspec": { 150 | "display_name": "Python 3", 151 | "language": "python", 152 | "name": "python3" 153 | }, 154 | "language_info": { 155 | "codemirror_mode": { 156 | "name": "ipython", 157 | "version": 3 158 | }, 159 | "file_extension": ".py", 160 | "mimetype": "text/x-python", 161 | "name": "python", 162 | "nbconvert_exporter": "python", 163 | "pygments_lexer": "ipython3", 164 | "version": "3.7.9" 165 | } 166 | }, 167 | "nbformat": 4, 168 | "nbformat_minor": 2 169 | } 170 | -------------------------------------------------------------------------------- /DataCheckpoint_group042.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# COGS 108 - Data Checkpoint" 8 | ] 9 | }, 10 | { 11 | "cell_type": "markdown", 12 | "metadata": {}, 13 | "source": [ 14 | "# Names\n", 15 | "\n", 16 | "- Anna Wang\n", 17 | "- Chloe Salem\n", 18 | "- Kristy Liou\n", 19 | "- Maxtierney Arias\n", 20 | "- Zeven Vidmar Barker" 21 | ] 22 | }, 23 | { 24 | "cell_type": "markdown", 25 | "metadata": {}, 26 | "source": [ 27 | "\n", 28 | "# Research Question" 29 | ] 30 | }, 31 | { 32 | "cell_type": "markdown", 33 | "metadata": {}, 34 
| "source": [ 35 | "Can we predict which state in the USA will be covid-free first based on current hospital records, state regulations, and population?" 36 | ] 37 | }, 38 | { 39 | "cell_type": "markdown", 40 | "metadata": {}, 41 | "source": [ 42 | "# Dataset(s)" 43 | ] 44 | }, 45 | { 46 | "cell_type": "markdown", 47 | "metadata": {}, 48 | "source": [ 49 | "*Fill in your dataset information here*\n", 50 | "\n", 51 | "(Copy this information for each dataset)\n", 52 | "- Dataset Name: **Population, Population Changes, and Estimates**\n", 53 | "- Link to the dataset: https://www2.census.gov/programs-surveys/popest/datasets/2010-2020/national/totals/nst-est2020.csv\n", 54 | "- Number of observations: The US Census dataset from 2010 includes populations of the country, states, and regions as well as estimates of each for every year leading up to 2020. This dataset is the best available given that the 2020 census is still being processed.\n", 55 | "\n", 56 | "\n", 57 | "- Dataset Name: **US State Vaccinations**\n", 58 | "- Link to the dataset: https://github.com/owid/covid-19-data/blob/master/public/data/vaccinations/us_state_vaccinations.csv\n", 59 | "- It has 2104 observations, which include data on how many people from each state has been vaccinated starting from January 12, 2021. These observations are broken down by day, which will allow us to analyze the rate at which vaccines are being received and distributed in each state. The dataset features the total number of vaccinations a state has each day with the total number of vaccinations distributed per day.\n", 60 | "\n", 61 | "\n", 62 | "- Dataset Name: **COVID Tracking**\n", 63 | "- Link to the dataset: https://covidtracking.com/data\n", 64 | "- It has 2,006 observations, that shows us the number of COVID cases for each state in the US since January 12, 2020, along with patient hospitalization data by state, data on deaths, and COVID testing information. We will be utilizing hospitalization data and testing information.\n", 65 | "\n", 66 | "\n", 67 | "\n" 68 | ] 69 | }, 70 | { 71 | "cell_type": "markdown", 72 | "metadata": {}, 73 | "source": [ 74 | "# Setup" 75 | ] 76 | }, 77 | { 78 | "cell_type": "code", 79 | "execution_count": 60, 80 | "metadata": {}, 81 | "outputs": [], 82 | "source": [ 83 | "#importing needed libraries\n", 84 | "import pandas as pd\n", 85 | "import seaborn as sns\n", 86 | "import numpy as np\n", 87 | "\n", 88 | "# reading data sets\n", 89 | "population = pd.read_csv(\"https://www2.census.gov/programs-surveys/popest/datasets/2010-2020/national/totals/nst-est2020.csv\")\n", 90 | "vaccinations = pd.read_csv('https://raw.githubusercontent.com/owid/covid-19-data/master/public/data/vaccinations/us_state_vaccinations.csv')\n", 91 | "# includes hospitalization and covid cases over time\n", 92 | "case_tracking = pd.read_csv('https://covidtracking.com/data/download/all-states-history.csv')" 93 | ] 94 | }, 95 | { 96 | "cell_type": "markdown", 97 | "metadata": {}, 98 | "source": [ 99 | "# Data Cleaning" 100 | ] 101 | }, 102 | { 103 | "cell_type": "markdown", 104 | "metadata": {}, 105 | "source": [ 106 | "The three datasets we used were already in a tidy format, with variables representing every distinct measurement made by the sources in the columns, and separate observations in the rows. Since we did remove observations unrelated to states in the Census, and we just wanted the states and their 2020 population estimates, we reset the index for `population`. 
    "\n",
    "To answer our question, we will utilize data on hospital capacity by state, vaccinations in each state, and COVID cases by state, combined with population data sourced from the US Census.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 61,
   "metadata": {},
   "outputs": [
    {
     "data": {
\n", 120 | "\n", 133 | "\n", 134 | " \n", 135 | " \n", 136 | " \n", 137 | " \n", 138 | " \n", 139 | " \n", 140 | " \n", 141 | " \n", 142 | " \n", 143 | " \n", 144 | " \n", 145 | " \n", 146 | " \n", 147 | " \n", 148 | " \n", 149 | " \n", 150 | " \n", 151 | " \n", 152 | " \n", 153 | " \n", 154 | " \n", 155 | " \n", 156 | " \n", 157 | " \n", 158 | " \n", 159 | " \n", 160 | " \n", 161 | " \n", 162 | " \n", 163 | " \n", 164 | " \n", 165 | " \n", 166 | "
POPESTIMATE2020
NAME
Alabama4921532
Alaska731158
Arizona7421401
Arkansas3030522
California39368078
\n", 167 | "
" 168 | ], 169 | "text/plain": [ 170 | " POPESTIMATE2020\n", 171 | "NAME \n", 172 | "Alabama 4921532\n", 173 | "Alaska 731158\n", 174 | "Arizona 7421401\n", 175 | "Arkansas 3030522\n", 176 | "California 39368078" 177 | ] 178 | }, 179 | "execution_count": 61, 180 | "metadata": {}, 181 | "output_type": "execute_result" 182 | } 183 | ], 184 | "source": [ 185 | "# Removing the regions population\n", 186 | "population = population[5::]\n", 187 | "# Remove columns with years outside 2020\n", 188 | "population = population.drop(population.columns[7:-1], 1)\n", 189 | "# Remove unnecessary variables related to region and identifiers\n", 190 | "population = population.drop(['STATE','SUMLEV', 'DIVISION', 'REGION','CENSUS2010POP','ESTIMATESBASE2010'], 1)\n", 191 | "# Set index to be the name of the state, rather than an arbitrary number\n", 192 | "population.set_index(['NAME'], inplace=True)\n", 193 | "population.head()" 194 | ] 195 | }, 196 | { 197 | "cell_type": "code", 198 | "execution_count": 62, 199 | "metadata": {}, 200 | "outputs": [ 201 | { 202 | "data": { 203 | "text/html": [ 204 | "
\n", 205 | "\n", 218 | "\n", 219 | " \n", 220 | " \n", 221 | " \n", 222 | " \n", 223 | " \n", 224 | " \n", 225 | " \n", 226 | " \n", 227 | " \n", 228 | " \n", 229 | " \n", 230 | " \n", 231 | " \n", 232 | " \n", 233 | " \n", 234 | " \n", 235 | " \n", 236 | " \n", 237 | " \n", 238 | " \n", 239 | " \n", 240 | " \n", 241 | " \n", 242 | " \n", 243 | " \n", 244 | " \n", 245 | " \n", 246 | " \n", 247 | " \n", 248 | " \n", 249 | " \n", 250 | " \n", 251 | " \n", 252 | " \n", 253 | " \n", 254 | " \n", 255 | " \n", 256 | " \n", 257 | " \n", 258 | " \n", 259 | " \n", 260 | " \n", 261 | " \n", 262 | " \n", 263 | " \n", 264 | " \n", 265 | " \n", 266 | " \n", 267 | " \n", 268 | " \n", 269 | " \n", 270 | " \n", 271 | " \n", 272 | " \n", 273 | " \n", 274 | " \n", 275 | " \n", 276 | " \n", 277 | " \n", 278 | " \n", 279 | " \n", 280 | " \n", 281 | " \n", 282 | " \n", 283 | " \n", 284 | " \n", 285 | " \n", 286 | " \n", 287 | " \n", 288 | " \n", 289 | " \n", 290 | " \n", 291 | " \n", 292 | " \n", 293 | " \n", 294 | " \n", 295 | " \n", 296 | " \n", 297 | " \n", 298 | " \n", 299 | " \n", 300 | " \n", 301 | " \n", 302 | " \n", 303 | " \n", 304 | " \n", 305 | " \n", 306 | " \n", 307 | " \n", 308 | " \n", 309 | " \n", 310 | " \n", 311 | " \n", 312 | " \n", 313 | "
datelocationtotal_vaccinationstotal_distributedpeople_vaccinatedpeople_fully_vaccinated_per_hundredtotal_vaccinations_per_hundredpeople_fully_vaccinatedpeople_vaccinated_per_hundreddistributed_per_hundreddaily_vaccinations_rawdaily_vaccinations
02021-01-12Alabama78134.0377025.070861.00.151.597270.01.447.69NaNNaN
12021-01-13Alabama84040.0378975.074792.00.191.719245.01.527.735906.05906.0
22021-01-14Alabama92300.0435350.080480.0NaN1.88NaN1.648.888260.07083.0
32021-01-15Alabama100567.0444650.086956.00.272.0513488.01.779.078267.07478.0
42021-01-16AlabamaNaNNaNNaNNaNNaNNaNNaNNaN7557.07498.0
\n", 314 | "
" 315 | ], 316 | "text/plain": [ 317 | " date location total_vaccinations total_distributed \\\n", 318 | "0 2021-01-12 Alabama 78134.0 377025.0 \n", 319 | "1 2021-01-13 Alabama 84040.0 378975.0 \n", 320 | "2 2021-01-14 Alabama 92300.0 435350.0 \n", 321 | "3 2021-01-15 Alabama 100567.0 444650.0 \n", 322 | "4 2021-01-16 Alabama NaN NaN \n", 323 | "\n", 324 | " people_vaccinated people_fully_vaccinated_per_hundred \\\n", 325 | "0 70861.0 0.15 \n", 326 | "1 74792.0 0.19 \n", 327 | "2 80480.0 NaN \n", 328 | "3 86956.0 0.27 \n", 329 | "4 NaN NaN \n", 330 | "\n", 331 | " total_vaccinations_per_hundred people_fully_vaccinated \\\n", 332 | "0 1.59 7270.0 \n", 333 | "1 1.71 9245.0 \n", 334 | "2 1.88 NaN \n", 335 | "3 2.05 13488.0 \n", 336 | "4 NaN NaN \n", 337 | "\n", 338 | " people_vaccinated_per_hundred distributed_per_hundred \\\n", 339 | "0 1.44 7.69 \n", 340 | "1 1.52 7.73 \n", 341 | "2 1.64 8.88 \n", 342 | "3 1.77 9.07 \n", 343 | "4 NaN NaN \n", 344 | "\n", 345 | " daily_vaccinations_raw daily_vaccinations \n", 346 | "0 NaN NaN \n", 347 | "1 5906.0 5906.0 \n", 348 | "2 8260.0 7083.0 \n", 349 | "3 8267.0 7478.0 \n", 350 | "4 7557.0 7498.0 " 351 | ] 352 | }, 353 | "execution_count": 62, 354 | "metadata": {}, 355 | "output_type": "execute_result" 356 | } 357 | ], 358 | "source": [ 359 | "#drop columns unrelated to research question\n", 360 | "vaccinations = vaccinations.drop(['daily_vaccinations_per_million', 'share_doses_used', 'daily_vaccinations_per_million', 'share_doses_used'], 1)\n", 361 | "vaccinations.head()" 362 | ] 363 | }, 364 | { 365 | "cell_type": "code", 366 | "execution_count": 63, 367 | "metadata": {}, 368 | "outputs": [ 369 | { 370 | "data": { 371 | "text/html": [ 372 | "
\n", 373 | "\n", 386 | "\n", 387 | " \n", 388 | " \n", 389 | " \n", 390 | " \n", 391 | " \n", 392 | " \n", 393 | " \n", 394 | " \n", 395 | " \n", 396 | " \n", 397 | " \n", 398 | " \n", 399 | " \n", 400 | " \n", 401 | " \n", 402 | " \n", 403 | " \n", 404 | " \n", 405 | " \n", 406 | " \n", 407 | " \n", 408 | " \n", 409 | " \n", 410 | " \n", 411 | " \n", 412 | " \n", 413 | " \n", 414 | " \n", 415 | " \n", 416 | " \n", 417 | " \n", 418 | " \n", 419 | " \n", 420 | " \n", 421 | " \n", 422 | " \n", 423 | " \n", 424 | " \n", 425 | " \n", 426 | " \n", 427 | " \n", 428 | " \n", 429 | " \n", 430 | " \n", 431 | " \n", 432 | " \n", 433 | " \n", 434 | " \n", 435 | " \n", 436 | " \n", 437 | " \n", 438 | " \n", 439 | " \n", 440 | " \n", 441 | " \n", 442 | " \n", 443 | " \n", 444 | " \n", 445 | " \n", 446 | " \n", 447 | " \n", 448 | " \n", 449 | " \n", 450 | " \n", 451 | " \n", 452 | " \n", 453 | " \n", 454 | " \n", 455 | " \n", 456 | " \n", 457 | " \n", 458 | " \n", 459 | " \n", 460 | " \n", 461 | " \n", 462 | " \n", 463 | " \n", 464 | " \n", 465 | " \n", 466 | " \n", 467 | " \n", 468 | " \n", 469 | " \n", 470 | " \n", 471 | " \n", 472 | " \n", 473 | " \n", 474 | " \n", 475 | " \n", 476 | " \n", 477 | " \n", 478 | " \n", 479 | " \n", 480 | " \n", 481 | "
datestatehospitalizedhospitalizedCumulativehospitalizedCurrentlyhospitalizedIncreasepositiverecoveredtotalTestEncountersViraltotalTestEncountersViralIncreasetotalTestResultstotalTestResultsIncrease
02021-02-12AK1230.01230.035.0354282.0NaNNaN01584548.07192
12021-02-12AL44148.044148.01267.0242478667.0264621.0NaN02218143.06772
22021-02-12AR14278.014278.0712.023311608.0293796.0NaN02569944.06669
32021-02-12ASNaNNaNNaN00.0NaNNaN02140.00
42021-02-12AZ55413.055413.02396.0141793532.0110642.0NaN07140917.035221
\n", 482 | "
" 483 | ], 484 | "text/plain": [ 485 | " date state hospitalized hospitalizedCumulative \\\n", 486 | "0 2021-02-12 AK 1230.0 1230.0 \n", 487 | "1 2021-02-12 AL 44148.0 44148.0 \n", 488 | "2 2021-02-12 AR 14278.0 14278.0 \n", 489 | "3 2021-02-12 AS NaN NaN \n", 490 | "4 2021-02-12 AZ 55413.0 55413.0 \n", 491 | "\n", 492 | " hospitalizedCurrently hospitalizedIncrease positive recovered \\\n", 493 | "0 35.0 3 54282.0 NaN \n", 494 | "1 1267.0 242 478667.0 264621.0 \n", 495 | "2 712.0 23 311608.0 293796.0 \n", 496 | "3 NaN 0 0.0 NaN \n", 497 | "4 2396.0 141 793532.0 110642.0 \n", 498 | "\n", 499 | " totalTestEncountersViral totalTestEncountersViralIncrease \\\n", 500 | "0 NaN 0 \n", 501 | "1 NaN 0 \n", 502 | "2 NaN 0 \n", 503 | "3 NaN 0 \n", 504 | "4 NaN 0 \n", 505 | "\n", 506 | " totalTestResults totalTestResultsIncrease \n", 507 | "0 1584548.0 7192 \n", 508 | "1 2218143.0 6772 \n", 509 | "2 2569944.0 6669 \n", 510 | "3 2140.0 0 \n", 511 | "4 7140917.0 35221 " 512 | ] 513 | }, 514 | "execution_count": 63, 515 | "metadata": {}, 516 | "output_type": "execute_result" 517 | } 518 | ], 519 | "source": [ 520 | "#dropping more unrelated columns to research question - including deaths, antibody tests, negative results, and positive results related to type of test\n", 521 | "case_tracking = case_tracking.drop(['dataQualityGrade','deathProbable', 'death', 'deathConfirmed', 'deathIncrease', 'negative',\n", 522 | " 'negativeIncrease', 'negativeTestsAntibody',\n", 523 | " 'negativeTestsPeopleAntibody', 'negativeTestsViral', 'totalTestsAntibody', 'totalTestsAntigen',\n", 524 | " 'totalTestsPeopleAntibody', 'totalTestsPeopleAntigen',\n", 525 | " 'totalTestsPeopleViral', 'totalTestsPeopleViralIncrease',\n", 526 | " 'totalTestsViral', 'totalTestsViralIncrease','positiveCasesViral', 'positiveIncrease', 'positiveScore',\n", 527 | " 'positiveTestsAntibody', 'positiveTestsAntigen',\n", 528 | " 'positiveTestsPeopleAntibody', 'positiveTestsPeopleAntigen',\n", 529 | " 'positiveTestsViral', 'inIcuCumulative','inIcuCurrently','onVentilatorCumulative', 'onVentilatorCurrently'], 1)\n", 530 | "case_tracking.head()" 531 | ] 532 | }, 533 | { 534 | "cell_type": "markdown", 535 | "metadata": {}, 536 | "source": [ 537 | "# Project Proposal (updated)" 538 | ] 539 | }, 540 | { 541 | "cell_type": "markdown", 542 | "metadata": {}, 543 | "source": [ 544 | "| Meeting Date | Meeting Time| Completed Before Meeting | Discuss at Meeting |\n", 545 | "|---|---|---|---|\n", 546 | "| 1/20 | 1 PM | Read & Think about COGS 108 expectations | Determine best form of communication; Discuss and decide on final project topic; discuss hypothesis; begin background research | \n", 547 | "| 1/26 | 10 AM | Do background research on topic | Discuss ideal dataset(s) and ethics; draft project proposal | \n", 548 | "| 2/1 | 10 AM | Edit, finalize, and submit proposal; Search for datasets | Discuss Wrangling and possible analytical approaches; Assign group members to lead each specific part |\n", 549 | "| 2/13 | 4 PM | Import & Wrangle Data; EDA | Review/Edit wrangling/EDA; Discuss Analysis Plan |\n", 550 | "| 2/23 | 12 PM | Finalize wrangling/EDA; Begin Analysis| Discuss/edit Analysis; Complete project check-in |\n", 551 | "| 3/13 | 12 PM | Complete analysis; Draft results/conclusion/discussion| Discuss/edit full project |\n", 552 | "| 3/19 | Before 11:59 PM | NA | Turn in Final Project & Group Project Surveys |" 553 | ] 554 | }, 555 | { 556 | "cell_type": "code", 557 | "execution_count": null, 558 | "metadata": {}, 559 | "outputs": [], 560 | "source": [] 
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Project Proposal (updated)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "| Meeting Date | Meeting Time | Completed Before Meeting | Discuss at Meeting |\n",
    "|---|---|---|---|\n",
    "| 1/20 | 1 PM | Read & think about COGS 108 expectations | Determine best form of communication; discuss and decide on final project topic; discuss hypothesis; begin background research |\n",
    "| 1/26 | 10 AM | Do background research on topic | Discuss ideal dataset(s) and ethics; draft project proposal |\n",
    "| 2/1 | 10 AM | Edit, finalize, and submit proposal; search for datasets | Discuss wrangling and possible analytical approaches; assign group members to lead each specific part |\n",
    "| 2/13 | 4 PM | Import & wrangle data; EDA | Review/edit wrangling/EDA; discuss analysis plan |\n",
    "| 2/23 | 12 PM | Finalize wrangling/EDA; begin analysis | Discuss/edit analysis; complete project check-in |\n",
    "| 3/13 | 12 PM | Complete analysis; draft results/conclusion/discussion | Discuss/edit full project |\n",
    "| 3/19 | Before 11:59 PM | NA | Turn in final project & group project surveys |"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.4"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
--------------------------------------------------------------------------------
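The checkpoint above cleans three separate tables; below is a minimal sketch of how they might eventually be combined to rank states, assuming the `population` and `vaccinations` frames defined in the notebook. The 70% target and the one-dose-moves-one-person simplification are illustrative assumptions only, not part of the group's stated plan.

```python
import pandas as pd

vax = vaccinations.copy()
vax["date"] = pd.to_datetime(vax["date"])

# Most recent non-missing cumulative figures per state (rows are sorted by date
# first, and GroupBy.last() takes each column's last non-null value per group)
latest_vax = (
    vax.sort_values("date")
       .groupby("location")[["people_fully_vaccinated", "daily_vaccinations"]]
       .last()
)

# Join with the census estimates; names that appear in only one source
# (territories, federal agencies, or differently spelled states) are dropped
# by the inner join and may need manual fixes.
merged = latest_vax.join(population["POPESTIMATE2020"], how="inner")

merged["pct_fully_vaccinated"] = (
    100 * merged["people_fully_vaccinated"] / merged["POPESTIMATE2020"]
)

# Very rough projection: days to reach 70% of residents fully vaccinated at the
# current daily pace, treating each reported dose as moving one person to
# "fully vaccinated" (an oversimplification for two-dose vaccines)
target = 0.70 * merged["POPESTIMATE2020"]
remaining = (target - merged["people_fully_vaccinated"]).clip(lower=0)
merged["days_to_70pct"] = remaining / merged["daily_vaccinations"]

merged.sort_values("days_to_70pct").head()
```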