├── README.md
└── COVID-19-CoronaVirus-Chart-Graph-Map-Data-Analysis.ipynb
/README.md:
--------------------------------------------------------------------------------
1 | # covid19-data-analysis
2 | COVID-19 (Coronavirus) data analysis, visualization, and case study
3 |
--------------------------------------------------------------------------------
/COVID-19-CoronaVirus-Chart-Graph-Map-Data-Analysis.ipynb:
--------------------------------------------------------------------------------
1 | {"cells":[{"metadata":{},"cell_type":"markdown","source":"# COVID-19 - DATA ANALYSIS\n\n## Data Analysis, Data Visualization & Comparison\n\nThis notebook contains data analysis and visualization of COVID-19 (Corona Virus) cases around the world.\n\n## About COVID-19\n\n*Image by iXimus from Pixabay*\n\n[Coronavirus disease 2019 (COVID-19)](https://en.wikipedia.org/wiki/Coronavirus_disease_2019) is an infectious disease caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2).\n\n* **First Identified:** December 2019 in Wuhan, the capital of Hubei province, China\n* **Common Symptoms:** Fever, Cough, Fatigue, Shortness of Breath and Loss of Smell\n* **Concerning Symptoms:** Difficulty breathing, Persistent Chest Pain, Confusion, Difficulty Waking, and Bluish Skin\n* **Complications:**\tPneumonia, Viral Sepsis, Acute Respiratory Distress Syndrome, Kidney Failure\n* **Usual Onset:**\t2–14 days (typically 5) from infection (time from exposure to onset of symptoms)\n* **Risk factors:**\tTravel, Viral Exposure\n* **Prevention:** \tHand Washing, Face Coverings, Quarantine, Social Distancing\n\n### Useful Information on COVID-19\n* [WHO](https://www.who.int/emergencies/diseases/novel-coronavirus-2019) - World Health Organization \n* [CDC](https://www.cdc.gov/coronavirus/2019-ncov) - Centers for Disease Control and Prevention"},{"metadata":{},"cell_type":"markdown","source":"# Dataset\n\n1. Git repository of the **Johns Hopkins University Center for Systems Science and Engineering (JHU CSSE)**.\n\n * Master branch: https://github.com/CSSEGISandData/COVID-19\n * Web-data branch: https://github.com/CSSEGISandData/COVID-19/tree/web-data\n\n\n2. 
Kaggle dataset: https://www.kaggle.com/sudalairajkumar/novel-corona-virus-2019-dataset"},{"metadata":{},"cell_type":"markdown","source":"# Install Packages\n\n**pycountry-convert**: Using country data derived from wikipedia, this package provides conversion functions between ISO country names, country-codes, and continent names. (https://pypi.org/project/pycountry-convert/)"},{"metadata":{"trusted":true,"_kg_hide-output":true},"cell_type":"code","source":"!pip install pycountry_convert","execution_count":null,"outputs":[]},{"metadata":{},"cell_type":"markdown","source":"# Import Packages"},{"metadata":{"_uuid":"8f2839f25d086af736a60e9eeb907d3b93b6e0e5","_cell_guid":"b1076dfc-b9ad-4769-8c92-a6c4dae69d19","trusted":true},"cell_type":"code","source":"import numpy as np # linear algebra\nimport pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)\nimport matplotlib.pyplot as plt\nimport matplotlib.dates as mdates\nfrom matplotlib.dates import DateFormatter\n#%matplotlib inline\nimport seaborn as sns\nfrom datetime import datetime\nfrom pandas.plotting import register_matplotlib_converters\nregister_matplotlib_converters()\nimport pycountry_convert as pc\nimport plotly.express as px\nimport plotly.graph_objects as go\nfrom plotly.subplots import make_subplots\nimport folium\nimport json","execution_count":null,"outputs":[]},{"metadata":{},"cell_type":"markdown","source":"# Get Data from Dataset\n\n1. Overall Cases\n1. Cases by Country\n1. Cases by State\n1. Cases timeline\n\n1. Confirmed Cases Timeline Global\n1. Confirmed Cases Timeline US\n1. Deaths Cases Timeline Global\n1. Deaths Cases Timeline US\n1. 
Recovered Cases Timeline Global"},{"metadata":{"_uuid":"d629ff2d2480ee46fbb7e2d37f6b5fab8052498a","_cell_guid":"79c7e3d0-c299-4dcb-8224-4455121ee9b0","trusted":true},"cell_type":"code","source":"df_cases = pd.read_csv(\"https://raw.githubusercontent.com/CSSEGISandData/COVID-19/web-data/data/cases.csv\")\ndf_cases_country = pd.read_csv(\"https://raw.githubusercontent.com/CSSEGISandData/COVID-19/web-data/data/cases_country.csv\")\ndf_cases_state = pd.read_csv(\"https://raw.githubusercontent.com/CSSEGISandData/COVID-19/web-data/data/cases_state.csv\")\ndf_cases_time = pd.read_csv(\"https://raw.githubusercontent.com/CSSEGISandData/COVID-19/web-data/data/cases_time.csv\", parse_dates = ['Last_Update','Report_Date_String'])\n\ndf_confirmed_global = pd.read_csv('https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv')\ndf_confirmed_us = pd.read_csv('https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_US.csv')\ndf_deaths_global = pd.read_csv('https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_deaths_global.csv')\ndf_deaths_us = pd.read_csv('https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_deaths_US.csv')\ndf_recovered_global = pd.read_csv('https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_recovered_global.csv')","execution_count":null,"outputs":[]},{"metadata":{"trusted":true},"cell_type":"code","source":"print (df_cases.shape)\nprint ('Last Update: ' + str(df_cases.Last_Update.max()))\ndf_cases.head(1)","execution_count":null,"outputs":[]},{"metadata":{"trusted":true},"cell_type":"code","source":"print (df_cases_country.shape)\nprint ('Last Update: ' + 
str(df_cases_country.Last_Update.max()))\ndf_cases_country.head(1)","execution_count":null,"outputs":[]},{"metadata":{"trusted":true},"cell_type":"code","source":"print (df_cases_state.shape)\ndf_cases_state.head(1)","execution_count":null,"outputs":[]},{"metadata":{"trusted":true},"cell_type":"code","source":"print (df_cases_time.shape)\ndf_cases_time.head(1)","execution_count":null,"outputs":[]},{"metadata":{"trusted":true},"cell_type":"code","source":"print (df_confirmed_global.shape)\ndf_confirmed_global.head(1)","execution_count":null,"outputs":[]},{"metadata":{"trusted":true},"cell_type":"code","source":"print (df_confirmed_us.shape)\ndf_confirmed_us.head(1)","execution_count":null,"outputs":[]},{"metadata":{"trusted":true},"cell_type":"code","source":"print (df_deaths_global.shape)\ndf_deaths_global.head(1)","execution_count":null,"outputs":[]},{"metadata":{"trusted":true},"cell_type":"code","source":"print (df_deaths_us.shape)\ndf_deaths_us.head(1)","execution_count":null,"outputs":[]},{"metadata":{"trusted":true},"cell_type":"code","source":"print (df_recovered_global.shape)\ndf_recovered_global.head(1)","execution_count":null,"outputs":[]},{"metadata":{"trusted":true},"cell_type":"code","source":"df_data = pd.read_csv('/kaggle/input/novel-corona-virus-2019-dataset/covid_19_data.csv', parse_dates = ['ObservationDate','Last Update'])\nprint (df_data.shape)\nprint ('Last update: ' + str(df_data.ObservationDate.max()))\ndf_data.head(2)","execution_count":null,"outputs":[]},{"metadata":{"trusted":true},"cell_type":"code","source":"# Clean data\ndf_data = df_data.drop(['SNo', 'Last Update'], axis=1)\ndf_data = df_data.rename(columns={\n 'ObservationDate': 'Date', \n 'Country/Region': 'Country_Region', \n 'Province/State': 'Province_State'\n})\ndf_data.head(2)","execution_count":null,"outputs":[]},{"metadata":{"trusted":true},"cell_type":"code","source":"# Sort data\ndf_data = df_data.sort_values(['Date','Country_Region','Province_State'])\n# Get first reported 
case date\ndf_data['first_date'] = df_data.groupby('Country_Region')['Date'].transform('min')\n# Get days since first reported case date\ndf_data['days'] = (df_data['Date'] - df_data['first_date']).dt.days\nprint(df_data.shape)\ndf_data.head(2)","execution_count":null,"outputs":[]},{"metadata":{},"cell_type":"markdown","source":"# Total Cases Global"},{"metadata":{"trusted":true},"cell_type":"code","source":"total_confirmed = np.sum(df_cases_country['Confirmed'])\ntotal_deaths = np.sum(df_cases_country['Deaths'])\ntotal_recovered = np.sum(df_cases_country['Recovered'])\ntotal_active = np.sum(df_cases_country['Active'])\ntotal_mortality_rate = np.round((np.sum(df_cases_country['Deaths']) / np.sum(df_cases_country['Confirmed']) * 100), 2)\ntotal_recover_rate = np.round((np.sum(df_cases_country['Recovered']) / np.sum(df_cases_country['Confirmed']) * 100), 2)\n\nprint (\"Confirmed: %s\" %format(total_confirmed, \",\"))\nprint (\"Deaths: %s\" %format(total_deaths, \",\"))\nprint (\"Recovered: %s\" %format(total_recovered, \",\"))\nprint (\"Active: %s\" %format(total_active, \",\"))\nprint (\"Death rate %%: %.2f\" %((total_deaths / total_confirmed) * 100))\nprint (\"Recover rate %%: %.2f\" %((total_recovered / total_confirmed) * 100))\n\ndata = {\n 'Confirmed': [total_confirmed],\n 'Deaths': [total_deaths],\n 'Recovered': [total_recovered],\n 'Active': [total_active],\n 'Mortality Rate %': [total_mortality_rate],\n 'Recover Rate %': [total_recover_rate]\n}\ndf_total = pd.DataFrame(data)\n# colormaps: https://matplotlib.org/3.1.0/tutorials/colors/colormaps.html\ndf_total.style.hide_index().background_gradient(cmap='Wistia', axis=1)","execution_count":null,"outputs":[]},{"metadata":{},"cell_type":"markdown","source":"# Cases by Country"},{"metadata":{"trusted":true},"cell_type":"code","source":"df_total_counts = df_cases_country.sort_values(by=['Confirmed'],ascending=[False])\ndf_total_counts['Death Rate'] = df_total_counts['Deaths'] / df_total_counts['Confirmed'] * 
100\ndf_total_counts['Recovery Rate'] = df_total_counts['Recovered'] / df_total_counts['Confirmed'] * 100\n\ndf_total_counts['Incident_Rate'].fillna(0, inplace=True)\n\n# remove unnecessary columns\n# add different gradient color to each column\n# https://matplotlib.org/3.1.0/tutorials/colors/colormaps.html\ndf_total_counts.drop(['Last_Update', 'Lat', 'Long_', 'People_Tested', 'People_Hospitalized', 'UID', 'ISO3', 'Mortality_Rate'], axis=1)\\\n.style.hide_index()\\\n.background_gradient(cmap='Blues',subset=[\"Confirmed\"])\\\n.background_gradient(cmap='Reds',subset=[\"Deaths\"])\\\n.background_gradient(cmap='Greens',subset=[\"Recovered\"])\\\n.background_gradient(cmap='Purples',subset=[\"Active\"])\\\n.background_gradient(cmap='GnBu',subset=[\"Incident_Rate\"])\\\n.background_gradient(cmap='OrRd',subset=[\"Death Rate\"])\\\n.background_gradient(cmap='PuBu',subset=[\"Recovery Rate\"])","execution_count":null,"outputs":[]},{"metadata":{"trusted":true},"cell_type":"code","source":"fig = go.Figure(data=[\n go.Pie(labels=df_total_counts['Country_Region'], \n values=df_total_counts['Confirmed'], \n hole=.35,\n textinfo='label+percent'\n )\n])\n\nfig.update_layout(\n title_text=\"Confirmed Cases Percentage by Countries\",\n # Add annotations in the center of the donut pies.\n annotations=[\n dict(text='Confirmed<br>Cases', showarrow=False),\n ]\n)\nfig.update_traces(textposition='inside')\nfig.show()","execution_count":null,"outputs":[]},{"metadata":{"trusted":true},"cell_type":"code","source":"fig = go.Figure(data=[\n go.Pie(labels=df_total_counts['Country_Region'], \n values=df_total_counts['Deaths'], \n hole=.35,\n textinfo='label+percent'\n )\n])\n\nfig.update_layout(\n title_text=\"Deaths Cases Percentage by Countries\",\n # Add annotations in the center of the donut pies.\n annotations=[\n dict(text='Deaths<br>Cases', showarrow=False),\n ]\n)\nfig.update_traces(textposition='inside')\nfig.show()","execution_count":null,"outputs":[]},{"metadata":{},"cell_type":"markdown","source":"# Heat Map of World Countries"},{"metadata":{"trusted":true},"cell_type":"code","source":"# https://plotly.com/python/choropleth-maps/\ndata = df_cases_country.copy()\ndata['Confirmed_Log'] = np.log10(df_cases_country['Confirmed'])\ndata['Mortality_Rate'] = np.round(data['Mortality_Rate'], 2)\nfig = px.choropleth(data, \n locations='ISO3',\n color='Confirmed_Log', # a column in the dataset\n hover_name='Country_Region', # column to add to hover information\n hover_data=['Confirmed', 'Deaths', 'Recovered', 'Mortality_Rate'],\n color_continuous_scale=px.colors.sequential.Plasma)\nfig.update_layout(title_text=\"Heat Map - Confirmed Cases\")\nfig.update_coloraxes(colorbar_title=\"Color (Confirmed Cases Log Scale)\")\nfig.update(layout_coloraxis_showscale=False)\nfig.show()","execution_count":null,"outputs":[]},{"metadata":{"trusted":true},"cell_type":"code","source":"# world_geo = 'https://raw.githubusercontent.com/python-visualization/folium/master/examples/data/world-countries.json'\n# world_geo = 'https://github.com/johan/world.geo.json/blob/master/countries.geo.json'\n\nworld_geo_json = '/kaggle/input/country-state-geo-location/countries.geo.json'\nwith open(world_geo_json) as f:\n world_geo = json.load(f)\n\ndata = df_cases_country.copy()\n#print (data[data['ISO3'] == 'AFG']['Confirmed'].iloc[0])\n\nfor index, item in enumerate(world_geo['features']):\n row = data[data['ISO3'] == item['id']]\n if row.empty: continue # skip for countries that are not present in the cases dataset\n world_geo['features'][index]['properties']['Confirmed'] = str(row.iloc[0]['Confirmed'])\n world_geo['features'][index]['properties']['Deaths'] = str(row.iloc[0]['Deaths'])\n world_geo['features'][index]['properties']['Recovered'] = str(row.iloc[0]['Recovered'])\n world_geo['features'][index]['properties']['Mortality 
Rate'] = str(np.round(row.iloc[0]['Mortality_Rate'],2)) + '%'\n world_geo['features'][index]['properties']['Recovery Rate'] = str(np.round(row.iloc[0]['Recovered'] / row.iloc[0]['Confirmed'] * 100, 2)) + '%'\n\nprint (world_geo['features'][0]['properties'])","execution_count":null,"outputs":[]},{"metadata":{"trusted":true},"cell_type":"code","source":"data = df_cases_country.copy()\n\n# for Kosovo, ISO3 in geoJson = CS-KM & in cases CSV = XKS\ndata['ISO3'].replace('XKS', 'CS-KM', inplace=True) \n#print (data[data['Country_Region'] == 'Kosovo'])\n\n# logarithmic value is taken to avoid skewness\n# as US cases count is very much higher than the rest of the world\ndata['Confirmed_Log'] = np.log2(data['Confirmed'])\n\n# create a plain world map\nworld_map = folium.Map(location=[10,0], tiles=\"cartodbpositron\", zoom_start=2, max_zoom=6, min_zoom=2)\n\n# add tile layers to the map\ntiles = ['stamenwatercolor', 'cartodbpositron', 'openstreetmap', 'stamenterrain']\nfor tile in tiles:\n folium.TileLayer(tile).add_to(world_map)\n\nchoropleth = folium.Choropleth(\n geo_data=world_geo,\n name='choropleth',\n data=data,\n columns=['ISO3', 'Confirmed_Log'],\n key_on='feature.id',\n fill_color='OrRd',\n fill_opacity=0.7,\n line_opacity=0.2,\n nan_fill_color='#fef0d9',\n nan_fill_opacity=0.2,\n legend_name='Confirmed Cases (Log Scale)',\n highlight=True,\n line_color='black'\n).add_to(world_map)\n\nstyle_function = \"font-size: 15px; font-weight: bold\"\nchoropleth.geojson.add_child(\n folium.features.GeoJsonTooltip(\n fields=['name', 'Confirmed', 'Deaths', 'Recovered', 'Mortality Rate', 'Recovery Rate'],\n aliases=['Country','Confirmed', 'Deaths', 'Recovered', 'Mortality Rate', 'Recovery Rate'], \n labels=True\n )\n)\n\nfolium.LayerControl(collapsed=True).add_to(world_map)\nworld_map\n\n","execution_count":null,"outputs":[]},{"metadata":{},"cell_type":"markdown","source":"# Cases by Continent"},{"metadata":{"trusted":true},"cell_type":"code","source":"def 
country_to_continent(country_name):\n    country_alpha2 = pc.country_name_to_country_alpha2(country_name)\n    country_continent_code = pc.country_alpha2_to_continent_code(country_alpha2)\n    country_continent_name = pc.convert_continent_code_to_continent_name(country_continent_code)\n    return country_continent_name\n\n# Example\n#country_name = 'Germany'\n#print(country_to_continent(country_name))","execution_count":null,"outputs":[]},{"metadata":{"trusted":true},"cell_type":"code","source":"def get_continent(iso3):\n    # Map an ISO3 country code to a continent name; codes that cannot be resolved fall back to 'others'\n    try:\n        continent = pc.convert_continent_code_to_continent_name(\n            pc.country_alpha2_to_continent_code(\n                pc.country_alpha3_to_country_alpha2(iso3)))\n        if continent == 'Oceania':\n            continent = 'Australia'\n        return continent\n    except Exception:\n        return 'others'\n\n# cases_country data with continent\n#cols = ['Country_Region', 'Confirmed', 'Deaths', 'Recovered', 'Active', 'ISO3']\n#df_continent = df_cases_country[cols].copy()\ndf_continent = df_cases_country.copy()\ndata_continent = []\nfor index, row in df_continent.iterrows():\n    data_continent.append(get_continent(row.ISO3))\ndf_continent['Continent'] = data_continent\n\n# cases_time data with continent\ndf_continent_time = df_cases_time.copy()\ndata_continent = []\nfor index, row in df_continent_time.iterrows():\n    data_continent.append(get_continent(row.iso3))\ndf_continent_time['Continent'] = data_continent\ndf_continent_time['Confirmed'].fillna(0, inplace=True)\ndf_continent_time['Deaths'].fillna(0, inplace=True)\nstart_date = df_continent_time['Report_Date_String'].min()\ndf_continent_time['Days'] = (df_continent_time['Report_Date_String'] - start_date).dt.days + 1\n\n# Get continent total (sum only the numeric count columns;\n# a list of columns must be passed in double brackets)\ndf_continent_total = df_continent.groupby([\"Continent\"])[['Confirmed', 'Deaths', 'Recovered', 'Active']].sum()\ndf_continent_total['Mortality Rate (%)'] = df_continent_total['Deaths'] / df_continent_total['Confirmed'] * 100\ndf_continent_total['Recovery Rate (%)'] = 
df_continent_total['Recovered'] / df_continent_total['Confirmed'] * 100\n\ndf_continent_total.style \\\n.background_gradient(cmap='Blues',subset=[\"Confirmed\"])\\\n.background_gradient(cmap='Reds',subset=[\"Deaths\"])\\\n.background_gradient(cmap='Greens',subset=[\"Recovered\"])\\\n.background_gradient(cmap='Purples',subset=[\"Active\"])\\\n.background_gradient(cmap='OrRd',subset=[\"Mortality Rate (%)\"])\\\n.background_gradient(cmap='PuBuGn',subset=[\"Recovery Rate (%)\"])","execution_count":null,"outputs":[]},{"metadata":{"trusted":true},"cell_type":"code","source":"# Bar labels use the continent totals, the same values that set the bar heights\nfig = go.Figure(data=[\n go.Bar(name='Confirmed', x=df_continent_total.index, y=df_continent_total['Confirmed'],\n text=df_continent_total['Confirmed'], texttemplate='%{text:.2s}', textposition='outside'),\n go.Bar(name='Deaths', x=df_continent_total.index, y=df_continent_total['Deaths'],\n text=df_continent_total['Deaths'], texttemplate='%{text:.2s}', textposition='outside'),\n go.Bar(name='Recovered', x=df_continent_total.index, y=df_continent_total['Recovered'],\n text=df_continent_total['Recovered'], texttemplate='%{text:.2s}', textposition='outside'),\n])\n# Change the bar mode\nfig.update_layout(barmode='group')\nfig.show()","execution_count":null,"outputs":[]},{"metadata":{"trusted":true},"cell_type":"code","source":"fig = make_subplots(rows=1, cols=2, specs=[[{'type':'domain'}, {'type':'domain'}]])\nfig.add_trace(go.Pie(labels=df_continent_total.index, \n values=df_continent_total['Confirmed'], \n hole=.35,\n textinfo='label+percent',\n name='Confirmed'\n ),\n 1, 1)\nfig.add_trace(go.Pie(labels=df_continent_total.index, \n values=df_continent_total['Deaths'], \n hole=.35,\n textinfo='label+percent',\n name='Deaths'\n ),\n 1, 2)\n\nfig.update_layout(\n title_text=\"Confirmed & Deaths Cases Percentage by Continent\",\n # Add annotations in the center of the donut pies.\n annotations=[\n dict(text='Confirmed<br>Cases', x=0.18, y=0.5, showarrow=False),\n dict(text='Deaths<br>Cases', x=0.80, y=0.5, showarrow=False),\n ]\n)\n#fig.update_traces(textposition='inside')\nfig.show()","execution_count":null,"outputs":[]},{"metadata":{},"cell_type":"markdown","source":"# Chart of Top Countries\n\nTop countries by:\n\n* Confirmed Cases\n* Active Cases\n* Recovered\n* Recovery Rate\n* Deaths\n* Death Rate"},{"metadata":{"trusted":true},"cell_type":"code","source":"df_top_confirmed = df_cases_country.sort_values(by=['Confirmed'],ascending=[False]).head(10)\ndf_top_confirmed.head()","execution_count":null,"outputs":[]},{"metadata":{"trusted":true},"cell_type":"code","source":"df_top_confirmed = df_cases_country.sort_values(by=['Confirmed'],ascending=[False]).head(10)\n\nfig = go.Figure(data=[\n go.Bar(name='Confirmed', x=df_top_confirmed['Country_Region'], y=df_top_confirmed['Confirmed'], \n text=df_top_confirmed['Confirmed'], texttemplate='%{text:.2s}', textposition='outside'),\n go.Bar(name='Deaths', x=df_top_confirmed['Country_Region'], y=df_top_confirmed['Deaths'], \n text=df_top_confirmed['Deaths'], texttemplate='%{text:.2s}', textposition='outside'),\n go.Bar(name='Recovered', x=df_top_confirmed['Country_Region'], y=df_top_confirmed['Recovered'], \n text=df_top_confirmed['Recovered'], texttemplate='%{text:.2s}', textposition='outside'),\n])\n# Change the bar mode\nfig.update_layout(barmode='group')\nfig.show()","execution_count":null,"outputs":[]},{"metadata":{"trusted":true},"cell_type":"code","source":"df_top_total_counts = df_total_counts.copy()\n# remove 'Diamond Princess' cruise ship row from the list\nindex_name = df_top_total_counts[ df_top_total_counts['Country_Region'] == 'Diamond Princess' ].index\ndf_top_total_counts.drop(index_name , inplace=True)\n\ndef plot_top_cases(column, title='', count=10, data=df_top_total_counts):\n title = column if title == '' else title\n plot_data = data\n \n # for death rate plot, taking data with deaths >= 10\n if column == 'Death Rate': plot_data = plot_data[plot_data.Deaths>=10]\n # for recovery 
rate plot, taking data with recovered >= 100\n if column == 'Recovery Rate': plot_data = plot_data[plot_data.Recovered>=100]\n \n plot_data = plot_data.sort_values(by=[column],ascending=[False]).head(count)\n\n fig = px.bar(plot_data, y=column, x='Country_Region', \n text=column, orientation='v', \n title=title+': Top '+str(count)+' Countries')\n pc_str = ''\n if column in ['Death Rate', 'Recovery Rate']: pc_str = \"%\"\n fig.update_traces(texttemplate='%{text:.2s}'+pc_str, textposition='outside')\n \n fig.update_layout(\n uniformtext_minsize=8, \n uniformtext_mode='hide',\n xaxis_title=\"\",\n yaxis_title=\"\"\n )\n fig.show()\n\n\nplot_top_cases('Confirmed')\nplot_top_cases('Deaths')\nplot_top_cases('Recovered')\nplot_top_cases('Death Rate', 'Death Rate (10+ deaths)')\nplot_top_cases('Recovery Rate', 'Recovery Rate (100+ recovery)')","execution_count":null,"outputs":[]},{"metadata":{},"cell_type":"markdown","source":"# Time Series of Cases\n\nTrajectory of cases from the first day of the outbreak to the latest date in the dataset."},{"metadata":{"trusted":true},"cell_type":"code","source":"# Get sum of confirmed, deaths and recovered cases\ndef get_timeline(country = None, group_by = ['Date']):\n df_t = df_data if country is None else df_data[df_data['Country_Region'] == country]\n df_t = df_t.groupby(group_by)[['Confirmed','Deaths','Recovered']].sum().reset_index()\n df_t['Active'] = df_t['Confirmed'] - df_t['Deaths'] - df_t['Recovered']\n df_t['Death Rate'] = np.round(df_t['Deaths'] / df_t['Confirmed'] * 100, 2)\n df_t['Recovery Rate'] = np.round(df_t['Recovered'] / df_t['Confirmed'] * 100, 2)\n return df_t\n\n# Get confirmed, deaths and recovered cases for each date\n# Only the cases registered for that particular day\n# and not the cumulative sum from previous days\ndef get_timeline_daily():\n df_t = df_data\n # a list of columns must be passed in double brackets\n df_t = df_t.groupby(['Date'])[['Confirmed','Deaths','Recovered']].sum()\n '''\n # The cases count is not of the current day only\n # It is the sum of previous days total + current day total\n # So, the sum we do above will result in cumulative sum for each Date\n # Hence, we need to compute difference between \n # current date and previous date total values of Confirmed, Deaths & Recovered Cases\n '''\n df_t = df_t.diff().fillna(df_t).reset_index()\n return df_t","execution_count":null,"outputs":[]},{"metadata":{"trusted":true},"cell_type":"code","source":"df_t = get_timeline()\ndf_t.head()","execution_count":null,"outputs":[]},{"metadata":{"trusted":true},"cell_type":"code","source":"#fig = px.line(df, x=\"Last_Update\", y=\"Confirmed\", title='Confirmed & Death Cases Trajectory')\nfig = go.Figure()\nfig.add_trace(go.Scatter(x=df_t['Date'], y=df_t['Confirmed'], \n mode='lines+markers', name='Confirmed'))\nfig.add_trace(go.Scatter(x=df_t['Date'], y=df_t['Deaths'], \n mode='lines+markers', name='Deaths'))\nfig.add_trace(go.Scatter(x=df_t['Date'], y=df_t['Recovered'], \n mode='lines+markers', name='Recovered'))\nfig.update_layout(\n xaxis_title=\"\",\n yaxis_title=\"\",\n title = 'Time Series - Confirmed, Deaths & Recovered Cases'\n )\nfig.show()","execution_count":null,"outputs":[]},{"metadata":{"trusted":true},"cell_type":"code","source":"fig = go.Figure()\nfig.add_trace(go.Scatter(x=df_t['Date'], y=df_t['Confirmed'], \n mode='lines+markers', name='Confirmed'))\nfig.add_trace(go.Scatter(x=df_t['Date'], y=df_t['Deaths'], \n mode='lines+markers', name='Deaths'))\nfig.add_trace(go.Scatter(x=df_t['Date'], y=df_t['Recovered'], \n mode='lines+markers', name='Recovered'))\nfig.update_layout(\n xaxis_title=\"\",\n yaxis_title=\"\",\n yaxis_type=\"log\",\n title = 'Time Series - Confirmed, Deaths & Recovered Cases - Log Scale'\n )\nfig.show()","execution_count":null,"outputs":[]},{"metadata":{"trusted":true},"cell_type":"code","source":"df_t_daily = get_timeline_daily()\ndf_t_daily.head()","execution_count":null,"outputs":[]},{"metadata":{"trusted":true},"cell_type":"code","source":"fig = go.Figure()\nfig.add_trace(go.Scatter(x=df_t_daily['Date'], 
y=df_t_daily['Confirmed'], \n mode='lines+markers', name='Confirmed'))\nfig.add_trace(go.Scatter(x=df_t_daily['Date'], y=df_t_daily['Deaths'], \n mode='lines+markers', name='Deaths'))\nfig.add_trace(go.Scatter(x=df_t_daily['Date'], y=df_t_daily['Recovered'], \n mode='lines+markers', name='Recovered'))\nfig.update_layout(\n xaxis_title=\"\",\n yaxis_title=\"\",\n title = 'Time Series - Confirmed, Deaths & Recovered [Daily Cases]'\n )\nfig.show()","execution_count":null,"outputs":[]},{"metadata":{"trusted":true},"cell_type":"code","source":"fig = go.Figure()\nfig.add_trace(go.Scatter(x=df_t['Date'], y=df_t['Recovery Rate'], \n mode='lines+markers', name='Recovery Rate'))\nfig.add_trace(go.Scatter(x=df_t['Date'], y=df_t['Death Rate'], \n mode='lines+markers', name='Death Rate'))\nfig.update_layout(\n xaxis_title=\"\",\n yaxis_title=\"\",\n title = 'Time Series - Death Rate & Recovery Rate [Daily Cases]'\n )\nfig.show()","execution_count":null,"outputs":[]},{"metadata":{"trusted":true},"cell_type":"code","source":"# Get only the data of the latest date\ndf_data_latest = df_data[df_data.Date == df_data.Date.max()]\ndf_data_latest.head()","execution_count":null,"outputs":[]},{"metadata":{"trusted":true},"cell_type":"code","source":"df_t = df_data.groupby(['Country_Region', 'Date'])[['Confirmed','Deaths','Recovered']].sum().reset_index()\n\n# Rank countries on per-country totals for the latest date.\n# df_data_latest has one row per province/state, so sorting it directly would rank provinces.\ndf_latest_country = df_t[df_t.Date == df_t.Date.max()]\ntop10_confirmed_country_list = df_latest_country.sort_values('Confirmed', ascending=False).head(10)['Country_Region'].to_list()\ntop10_deaths_country_list = df_latest_country.sort_values('Deaths', ascending=False).head(10)['Country_Region'].to_list()\ntop10_recovered_country_list = df_latest_country.sort_values('Recovered', ascending=False).head(10)['Country_Region'].to_list()\nprint (top10_confirmed_country_list)\nprint (top10_deaths_country_list)\nprint (top10_recovered_country_list)\n\ndf_top10_confirmed = df_t[df_t.Country_Region.isin(top10_confirmed_country_list)]\ndf_top10_deaths = 
df_t[df_t.Country_Region.isin(top10_deaths_country_list)]\ndf_top10_recovered = df_t[df_t.Country_Region.isin(top10_recovered_country_list)]\n\ndef get_top10(country_list):\n    df_top = df_t[df_t.Country_Region.isin(country_list)]\n    df_top = df_top.groupby(['Country_Region', 'Date']).sum()\n    daily_frames = []\n    for country, df_new in df_top.groupby(level=0):\n        # 1. The case counts are cumulative in the dataset, so diff() subtracts\n        #    the previous row from the current row to get the daily change.\n        # 2. Some rows are negative after diff() because a later entry\n        #    is lower than the previous one; abs() makes them positive\n        #    until the dataset is corrected.\n        daily_frames.append(df_new.diff().fillna(df_new).abs())\n    # pd.concat replaces DataFrame.append, which was removed in pandas 2.0\n    return pd.concat(daily_frames).reset_index()\n\ndf_top10_confirmed_daily = get_top10(top10_confirmed_country_list)\ndf_top10_deaths_daily = get_top10(top10_deaths_country_list)\ndf_top10_recovered_daily = get_top10(top10_recovered_country_list)\ndf_top10_confirmed_daily.head(2)","execution_count":null,"outputs":[]},{"metadata":{"trusted":true},"cell_type":"code","source":"fig = px.line(df_top10_confirmed, x=\"Date\", y=\"Confirmed\", color=\"Country_Region\")\nfig.update_layout(\n title='Time Series - Confirmed Cases: Top 10 Countries',\n xaxis_title='',\n yaxis_title='',\n)\nfig.show()","execution_count":null,"outputs":[]},{"metadata":{"trusted":true},"cell_type":"code","source":"fig = px.line(df_top10_confirmed, x=\"Date\", y=\"Confirmed\", color=\"Country_Region\")\nfig.update_layout(\n title='Time Series - Confirmed Cases: Top 10 Countries - Log Scale',\n xaxis_title='',\n yaxis_title='',\n yaxis_type=\"log\",\n)\nfig.show()","execution_count":null,"outputs":[]},{"metadata":{"trusted":true},"cell_type":"code","source":"fig = px.line(df_top10_confirmed_daily, x=\"Date\", y=\"Confirmed\", 
color=\"Country_Region\")\nfig.update_layout(\n title='Time Series - Confirmed Cases [Daily]: Top 10 Countries',\n xaxis_title='',\n yaxis_title=''\n)\nfig.show()","execution_count":null,"outputs":[]},{"metadata":{"trusted":true},"cell_type":"code","source":"fig = px.line(df_top10_deaths, x=\"Date\", y=\"Deaths\", color=\"Country_Region\")\nfig.update_layout(\n title='Time Series - Deaths Cases: Top 10 Countries',\n xaxis_title='',\n yaxis_title=''\n)\nfig.show()","execution_count":null,"outputs":[]},{"metadata":{"trusted":true},"cell_type":"code","source":"fig = px.line(df_top10_deaths_daily, x=\"Date\", y=\"Deaths\", color=\"Country_Region\")\nfig.update_layout(\n title='Time Series - Deaths Cases [Daily]: Top 10 Countries',\n xaxis_title='',\n yaxis_title=''\n)\nfig.show()","execution_count":null,"outputs":[]},{"metadata":{"trusted":true},"cell_type":"code","source":"fig = px.line(df_top10_recovered, x=\"Date\", y=\"Recovered\", color=\"Country_Region\")\nfig.update_layout(\n title='Time Series - Recovered Cases: Top 10 Countries',\n xaxis_title='',\n yaxis_title='',\n)\nfig.show()","execution_count":null,"outputs":[]},{"metadata":{"trusted":true},"cell_type":"code","source":"fig = px.line(df_top10_recovered_daily, x=\"Date\", y=\"Recovered\", color=\"Country_Region\")\nfig.update_layout(\n title='Time Series - Recovered Cases [Daily]: Top 10 Countries',\n xaxis_title='',\n yaxis_title=''\n)\nfig.show()","execution_count":null,"outputs":[]},{"metadata":{},"cell_type":"markdown","source":"# Time Series - Cases by Continent"},{"metadata":{"trusted":true},"cell_type":"code","source":"df_continent_t = df_continent_time.copy()\ndf_continent_t = df_continent_t.groupby(['Continent', 'Country_Region', 'Last_Update']).max().reset_index()","execution_count":null,"outputs":[]},{"metadata":{"trusted":true},"cell_type":"code","source":"# Get total count of each day for each continent\ndf_continent_t = df_continent_t.groupby(['Continent', 
'Last_Update']).sum().reset_index()\n\n# Calculate Death/Mortality Rate\ndf_continent_t['Death Rate'] = df_continent_t['Deaths'] / df_continent_t['Confirmed'] * 100\ndf_continent_t['Death Rate'].fillna(0, inplace=True)\ndf_continent_t.head()","execution_count":null,"outputs":[]},{"metadata":{"trusted":true},"cell_type":"code","source":"fig = px.line(df_continent_t, x=\"Last_Update\", y=\"Confirmed\", color=\"Continent\")\nfig.update_layout(\n title='Time Series - Confirmed Cases by Continent',\n xaxis_title='',\n yaxis_title='',\n)\nfig.show()","execution_count":null,"outputs":[]},{"metadata":{"trusted":true},"cell_type":"code","source":"fig = px.line(df_continent_t, x=\"Last_Update\", y=\"Confirmed\", color=\"Continent\")\nfig.update_layout(\n title='Time Series - Confirmed Cases by Continent [Log Scale]',\n xaxis_title='',\n yaxis_title='',\n yaxis_type=\"log\",\n)\nfig.show()","execution_count":null,"outputs":[]},{"metadata":{"trusted":true},"cell_type":"code","source":"fig = px.line(df_continent_t, x=\"Last_Update\", y=\"Deaths\", color=\"Continent\")\nfig.update_layout(\n title='Time Series - Deaths Cases by Continent',\n xaxis_title='',\n yaxis_title='',\n)\nfig.show()","execution_count":null,"outputs":[]},{"metadata":{"trusted":true},"cell_type":"code","source":"fig = px.line(df_continent_t, x=\"Last_Update\", y=\"Deaths\", color=\"Continent\")\nfig.update_layout(\n title='Time Series - Deaths Cases by Continent [Log Scale]',\n xaxis_title='',\n yaxis_title='',\n yaxis_type='log',\n)\nfig.show()","execution_count":null,"outputs":[]},{"metadata":{"trusted":true},"cell_type":"code","source":"fig = px.line(df_continent_t, x=\"Last_Update\", y=\"Death Rate\", color=\"Continent\")\nfig.update_layout(\n title='Time Series - Death Rate (%) by Continent',\n xaxis_title='',\n yaxis_title='',\n)\nfig.show()","execution_count":null,"outputs":[]},{"metadata":{},"cell_type":"markdown","source":"# Progression over Time"},{"metadata":{},"cell_type":"markdown","source":"## 
Confirmed Cases - Animation over Time"},{"metadata":{"trusted":true},"cell_type":"code","source":"# Select the value columns with a list; tuple-style selection on a groupby is no longer supported by pandas\ndf_temp = df_cases_time.groupby(['Last_Update', 'Country_Region'])[['Confirmed', 'Deaths']].max().reset_index()\ndf_temp[\"Last_Update\"] = pd.to_datetime(df_temp[\"Last_Update\"]).dt.strftime('%m/%d/%Y')\ndf_temp['Confirmed'] = df_temp['Confirmed'].fillna(0)\ndf_temp.sort_values('Confirmed', ascending=False).head()","execution_count":null,"outputs":[]},{"metadata":{"trusted":true},"cell_type":"code","source":"fig = px.scatter_geo(df_temp, locations=\"Country_Region\", locationmode='country names', \n hover_name=\"Country_Region\", hover_data=[\"Confirmed\", \"Deaths\"], animation_frame=\"Last_Update\",\n color=np.log10(df_temp[\"Confirmed\"]+1)-1, size=np.power(df_temp[\"Confirmed\"]+1, 0.3)-1,\n range_color= [0, max(np.log10(df_temp[\"Confirmed\"]+1))],\n title=\"COVID-19 Progression Animation Over Time\",\n color_continuous_scale=px.colors.sequential.Plasma,\n projection=\"natural earth\")\nfig.update_coloraxes(colorscale=\"hot\")\nfig.update(layout_coloraxis_showscale=False)\n#fig.update_coloraxes(colorbar_title=\"Color (Confirmed Cases Log Scale)\")\nfig.show()","execution_count":null,"outputs":[]},{"metadata":{},"cell_type":"markdown","source":"## Confirmed Cases vs. 
Mortality Rate - Animation over Time"},{"metadata":{"trusted":true},"cell_type":"code","source":"df_continent_t = df_continent_time.copy()\ndf_continent_t[\"Last_Update\"] = pd.to_datetime(df_continent_t[\"Last_Update\"]).dt.strftime('%m/%d/%Y')\n# while calculating mortality rate, adding 1 to confirmed to avoid divide by zero\ndf_continent_t['Mortality Rate'] = df_continent_t['Deaths'] / (df_continent_t['Confirmed']+1) * 100\ndf_continent_t.sort_values('Confirmed', ascending=False).head(2)","execution_count":null,"outputs":[]},{"metadata":{"trusted":true},"cell_type":"code","source":"fig = px.scatter(df_continent_t, y=df_continent_t['Mortality Rate'],\n x=df_continent_t['Confirmed']+1,\n color=\"Continent\", \n hover_name=\"Country_Region\",\n hover_data=[\"Confirmed\", \"Deaths\"],\n color_continuous_scale=px.colors.sequential.Plasma,\n size=np.power(df_continent_t[\"Confirmed\"]+1, 0.3)-0.5,\n size_max=30,\n log_x=True,\n height=600,\n #title='COVID-19',\n range_y=[-1, 20],\n range_x=[1, df_continent_t[\"Confirmed\"].max()],\n animation_frame=\"Last_Update\", \n animation_group=\"Country_Region\",\n )\nfig.update_layout(\n title='Time Series - Confirmed Cases vs Mortality Rate by Continent',\n xaxis_title='Confirmed Cases',\n yaxis_title='Mortality Rate (%)',\n #xaxis_type='log'\n)\nfig.show()","execution_count":null,"outputs":[]},{"metadata":{},"cell_type":"markdown","source":"## Useful Notebooks\n\n* [Coronavirus 2019-20 Visualization](https://www.kaggle.com/holfyuen/coronavirus-2019-20-visualization)\n* [COVID-19 Case Study - Analysis, Viz & Comparisons](https://www.kaggle.com/tarunkr/covid-19-case-study-analysis-viz-comparisons)\n"}],"metadata":{"kernelspec":{"language":"python","display_name":"Python 
3","name":"python3"},"language_info":{"pygments_lexer":"ipython3","nbconvert_exporter":"python","version":"3.6.4","file_extension":".py","codemirror_mode":{"name":"ipython","version":3},"name":"python","mimetype":"text/x-python"}},"nbformat":4,"nbformat_minor":4}
--------------------------------------------------------------------------------