├── Avenir.ttc
├── README.md
├── TidyTuesday01022022.ipynb
├── TidyTuesday01032022.ipynb
├── TidyTuesday02082022.ipynb
├── TidyTuesday03052022.ipynb
├── TidyTuesday05072022.ipynb
├── TidyTuesday07062022.ipynb
├── TidyTuesday08022022.ipynb
├── TidyTuesday12072022.ipynb
├── TidyTuesday140622.ipynb
├── TidyTuesday17052022.ipynb
├── TidyTuesday19072022.ipynb
├── TidyTuesday21062022.ipynb
├── TidyTuesday22022022.ipynb
├── TidyTuesday22032022.ipynb
├── TidyTuesday24052022.ipynb
├── TidyTuesday28062022.ipynb
└── TidyTuesday29032022.ipynb

/Avenir.ttc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/xh313/TidyTuesdayWithPython/07be77cb1bc7d8b367265e929c3cb662074d3eff/Avenir.ttc
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Tidy Tuesday with Python
2 | My weekly (or monthly) data visualisation practice with TidyTuesday data, using Matplotlib and Python instead of R!
3 |
4 | # Projects
5 |
6 | ## 2 Aug 2022
7 | Frogs spotted in Oregon. I experimented with circle packing but it was so painful that I will probably never do it again.
8 | Anyway here's the code and the pic.
9 |
10 | ![frogs](https://user-images.githubusercontent.com/77285010/182714887-54e0da04-e166-4494-9ba3-f829796f83d2.png)
11 |
12 |
13 | Code: [Here](https://github.com/xh313/TidyTuesdayWithPython/blob/main/TidyTuesday02082022.ipynb)
14 |
15 | ## 19 Jul 2022
16 | This data set is so interesting that I got obsessed with wrangling it and forgot about visualisation...
17 |
18 | Here is a meaningless graph just for fun XD
19 |
20 | ![image](https://user-images.githubusercontent.com/77285010/180100441-960b7acb-c61a-4dc2-b882-1a4fecb0b45e.png)
21 |
22 | Code: [Here](https://github.com/xh313/TidyTuesdayWithPython/blob/main/TidyTuesday19072022.ipynb)
23 |
24 | ## 12 Jul 2022
25 | European flights.
I don't know what I'm doing. Ideally this highlights the hit of COVID on air traffic. 26 | 27 | ![image](https://user-images.githubusercontent.com/77285010/178685955-5d32be39-3327-4355-8e84-682df76e9e95.png) 28 | 29 | Code: [Here](https://github.com/xh313/TidyTuesdayWithPython/blob/main/TidyTuesday12072022.ipynb) 30 | 31 | 32 | ## 5 Jul 2022 33 | SF rent and lease distribution. 34 | I tried to do an animation and failed miserably. 35 | I still have to get up to work on Wednesday so I'd try again next week lol. 36 | 37 | 38 | ![image](https://user-images.githubusercontent.com/77285010/177603951-a97a16ef-d9f9-4bf4-8b54-83d9f5414523.png) 39 | 40 | Code: [Here](https://github.com/xh313/TidyTuesdayWithPython/blob/main/TidyTuesday05072022.ipynb) 41 | 42 | ## 28 June 2022 43 | UK pay gap. 44 | ![image](https://user-images.githubusercontent.com/77285010/176596978-99360163-e0c4-4cb1-8d6c-5cb0905eb0c2.png) 45 | 46 | Code: [Here](https://github.com/xh313/TidyTuesdayWithPython/blob/main/TidyTuesday28062022.ipynb) 47 | 48 | 49 | ## 21 June 2022 50 | In honour of Juneteenth :) did a lot of text processing stuff to brush up my regex. 51 | ![image](https://user-images.githubusercontent.com/77285010/174964887-70fd1b09-d77a-407f-90d9-d83b8b5ee55e.png) 52 | 53 | Code: [Here](https://github.com/xh313/TidyTuesdayWithPython/blob/main/TidyTuesday21062022.ipynb) 54 | 55 | ## 14 June 2022 56 | The data set is on droughts in the US but I focused on California in this part. It isn't going well... 57 | ![IMG_6613](https://user-images.githubusercontent.com/77285010/173525177-0d7c189f-62a7-4e32-b12a-fa870e78a982.JPEG) 58 | 59 | Then I also looked into the general trend for every state in the US but this is kind of unclear at first glance... 
60 | ![us_droughts](https://user-images.githubusercontent.com/77285010/173766435-c2842e5a-4815-4901-bbb4-e971f3f9c9e6.jpg) 61 | 62 | 63 | Code: [Here](https://github.com/xh313/TidyTuesdayWithPython/blob/main/TidyTuesday140622.ipynb) 64 | 65 | ## 7 June 2022 66 | Holding companies donating to anti-LGBTQ politicians accountable. 67 | ![image](https://user-images.githubusercontent.com/77285010/172539932-98749a80-a3f1-42e1-8570-26daa07c6f28.png) 68 | Condensed: 69 | ![image](https://user-images.githubusercontent.com/77285010/172540033-f1fd89da-2932-4224-a0bb-62c658a997ef.png) 70 | 71 | 72 | 73 | Code: [Here](https://github.com/xh313/TidyTuesdayWithPython/blob/main/TidyTuesday07062022.ipynb) 74 | 75 | ## 24 May 2022 76 | Women's rugby. 77 | ![image](https://user-images.githubusercontent.com/77285010/172507547-2bd106f1-24d0-430c-83f6-383b9fc0f744.png) 78 | 79 | Code: [Here](https://github.com/xh313/TidyTuesdayWithPython/blob/main/TidyTuesday24052022.ipynb) 80 | 81 | ## 17 May 2022 82 | Eurovision! And the drastic contrast between 2022 and 2021. 83 | 84 | 2021: 85 | ![image](https://user-images.githubusercontent.com/77285010/168933051-b9dd7e9a-8796-4dc8-879d-8d03ee2457d5.png) 86 | 87 | Whereas 2022: 88 | ![image](https://user-images.githubusercontent.com/77285010/168933069-18ee2f83-b542-4e1b-99e2-e5cf932eaf88.png) 89 | 90 | All of our best wishes go to Ukraine <3 91 | 92 | Plotted on Python using Basemap in Matplotlib and Geopy. 93 | 94 | Code: [Click here](https://github.com/xh313/TidyTuesdayWithPython/blob/main/TidyTuesday17052022.ipynb) 95 | 96 | ## 3 May 2022 97 | After a month of random COVID disruptions and UCLA DataFest I am finally back to TidyTuesday! 98 | 99 | Today's raw data: https://github.com/rfordatascience/tidytuesday/tree/master/data/2022/2022-05-03 100 | 101 | Graphic: 102 | ![image](https://user-images.githubusercontent.com/77285010/166563044-82c86dd3-f435-4d74-87f8-190df2046339.png) 103 | 104 | 105 | ## 29 Mar 2022 106 | 107 | Plotly is so cool! 
108 | 109 | ![ncaafunds](https://user-images.githubusercontent.com/77285010/160757774-009472e4-ab94-4876-93d4-a698e16894c4.png) 110 | 111 | 112 | ## 22 Mar 2022 113 | 114 | Cheesiest plot I've made so far... 115 | - Raw data: [https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2022/2022-03-22/babynames.csv] 116 | 117 | Graphic: 118 | ![A graphic showing the trendiness of a selection of feminine baby names across the time span from 1960 to 2017.](https://user-images.githubusercontent.com/77285010/159595530-db8cbe5c-5565-4507-8b18-5a7ff6cd0c0e.png) 119 | 120 | XH 121 | 22 Mar 2022 122 | 123 | 124 | ## 08 Mar 2022 125 | 126 | **PENDING** 127 | 128 | 129 | ## 01 Mar 2022 130 | 131 | Tried Geopandas and Geoplots the first time! I'd say I would probably rather use seaborn the next time though... 132 | 133 | Data processing logic: 134 | - Raw data: [https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2022/2022-03-01/stations.csv] 135 | - Map the points on the country map using the LONGITUDE and LATITUDE columns. 136 | - Colour code the points using the column FUEL_TYPE_CODE distinguishing the fuel types. 137 | 138 | Visual features: 139 | - Designs: 140 | - Map of the US (excluding Alaska and islands) as background with faint county borders 141 | - Translucent data points showing the density of the distribution clearly 142 | - Legend showing fuel types 143 | - Avenir typesetting! 
Avenir is the best 144 | - Also added Alt text 145 | 146 | Issues: 147 | - Projection: whenever I employ projection methods the session crashes, so now the map looks kind of squished 148 | - The spots on the legend are so faint that it's hard to tell apart the difference in colours 149 | - The legend handles are currently acronym and might work better if I type in the full name 150 | 151 | Plans: 152 | - Maybe fix the projection issue 153 | - Differentiate the colours more 154 | 155 | Graphic: 156 | ![A graphic showing the alternative fuel station distribution in the US. The shape of the country excluding Alaska and islands are shown on the background with bright points indicating the occurrences of the stations in different regions. The colours mark the type of alt fuel the stations supply. ](https://user-images.githubusercontent.com/77285010/156272212-6f779af9-70fd-4352-81c1-7ccfa4a250e0.jpg) 157 | 158 | XH 159 | 01 Mar 2022 160 | 161 | ## 22 Feb 2022 162 | 163 | Happy 22022022 palindrome day! 164 | 165 | *Content note: The raw data given by TidyTuesday this week involves comparison between countries, 166 | which might involve some political disputes and/or underlying assumptions. The raw data does not 167 | come from me and does not represent my political opinions. Please assess the credibility of the 168 | original data under your own judgements.* 169 | 170 | Data processing logic: 171 | - Raw data: [https://github.com/rfordatascience/tidytuesday/blob/master/data/2022/2022-02-22/freedom.csv] 172 | - Selecting the column of 'Status' (Free, partially free and not free) 173 | - Extract the status of each country for every year (1995-2020), count the data and funnel into 3 dictionaries. 174 | - Show the trend of the number of countries that are in each status over the 26 years 175 | 176 | Visual features: 177 | - Designs: 178 | - Used mock-ggplot style with some modifications (facecolor etc.) 
179 | - Translucent on-graph legend with sharp corners 180 | - All-filling solid colours with different shades 181 | - Avenir typesetting! Avenir is the best 182 | 183 | Issues: 184 | - Sort of boring (I didn't have much time to make it fancier :( 185 | - Would probably work better if the graph is more horizontal (aka the height could be decreased) 186 | - The grids in the background are useless since the area fills don't have an alpha (quick fix) 187 | 188 | Plans: 189 | - Add alphas to the filled area under curves 190 | - Change graph dimensions 191 | - Improving the documentation and styling 192 | - Alt text 193 | 194 | Graphic: 195 | ![image](https://user-images.githubusercontent.com/77285010/155228929-2977f09b-6437-45b1-88c1-fc0185085030.png) 196 | 197 | XH 198 | 22 Feb 2022 199 | 200 | ## 8 Feb 2022 (actually using the data set from 25 Jan 2022) 201 | I am not interested in random american airforce people so I pulled out an old boardgame data set instead! 202 | 203 | ** LOGGING IN PROCESS NOT FINISHED ** 204 | 205 | Data processing logic: 206 | - Data on dog breeds and their different traits 207 | - Quantifying all qualitative descriptions into scores using text processing 208 | - Weighting and categorising each trait into two new parameters 'friendliness' and 'fluffiness' 209 | - Plot scatter plot with each point corresponding to a breed on the quadrant of fluffiness-friendliness 210 | 211 | Visual features: 212 | - Detecting overlapping points or close-by points automatically and wrap/dodge off the labelling (still bugged :( ) 213 | - Generating a new colour for each data point on the tab 20b palette (or any other palettes, might change it if in the mood) 214 | - Avenir typesetting! 
Avenir is the best
215 | - Annotation of the breed name beside each data point
216 | - Legend indexing all 190+ breed names
217 |
218 | Issues:
219 | - The overlap detector does not work for a few points for some reason
220 | - Graph too huge with too many data points -- hard to read! Don't know if there's a better way to present the data!
221 | - Might need to adjust some weightings a bit (as I don't own a dog myself, I am biased!)
222 |
223 | Plans:
224 | - Indexing the position on the diagram for each breed and incorporating into the legend
225 | - Improving the documentation and styling (it is currently unfortunately a mess!)
226 |
227 | Graphic:
228 | ![image](https://user-images.githubusercontent.com/77285010/153077694-223bb4b3-3a34-4551-8f44-0073bddb0e35.png)
229 |
230 | XH
231 | 8 Feb 2022
232 |
233 | ## 1 Feb 2022
234 |
235 | Data processing logic:
236 | - Data on dog breeds and their different traits
237 | - Quantifying all qualitative descriptions into scores using text processing
238 | - Weighting and categorising each trait into two new parameters 'friendliness' and 'fluffiness'
239 | - Plot a scatter plot with each point corresponding to a breed on the quadrant of fluffiness-friendliness
240 |
241 | Visual features:
242 | - Detecting overlapping or close-by points automatically and wrapping/dodging the labels (still bugged :( )
243 | - Generating a new colour for each data point on the tab20b palette (or any other palette, might change it if in the mood)
244 | - Avenir typesetting! Avenir is the best
245 | - Annotation of the breed name beside each data point
246 | - Legend indexing all 190+ breed names
247 |
248 | Issues:
249 | - The overlap detector does not work for a few points for some reason
250 | - Graph too huge with too many data points -- hard to read! Don't know if there's a better way to present the data!
251 | - Might need to adjust some weightings a bit (as I don't own a dog myself, I am biased!)
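
For the overlap issue above, here is a minimal sketch of one way to flag close-by points, assuming a plain Euclidean distance threshold in data coordinates. The function name `crowded_points` and the `radius` parameter are made up for illustration; the notebook's actual detector may work quite differently.

```python
import math


def crowded_points(xs, ys, radius):
    """Flag every point that has at least one other point closer than `radius`.

    Hypothetical helper: compares all pairs, which is fine for ~200 breeds.
    """
    n = len(xs)
    flags = [False] * n
    for i in range(n):
        for j in range(i + 1, n):
            # Euclidean distance between point i and point j
            if math.hypot(xs[i] - xs[j], ys[i] - ys[j]) < radius:
                flags[i] = flags[j] = True
    return flags


# The first two points are 0.1 apart, the third is far away
print(crowded_points([0, 0.1, 5], [0, 0, 5], radius=0.5))  # [True, True, False]
```

Only the flagged points would then need their labels dodged. Checking every pair is quadratic, which should still be cheap at this scale.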
252 | 253 | Plans: 254 | - Indexing the position on the diagram for each breed and incorporating into the legend 255 | - Improving the documentation and styling (it is currently unfortunately a mess!) 256 | 257 | Graphic: 258 | ![graph01022022](https://user-images.githubusercontent.com/77285010/152634365-5ebdee2d-113b-448e-b65c-a557762e87a7.png) 259 | 260 | XH 261 | 5 Feb 2022 262 | -------------------------------------------------------------------------------- /TidyTuesday22022022.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "nbformat": 4, 3 | "nbformat_minor": 0, 4 | "metadata": { 5 | "colab": { 6 | "name": "TidyTuesday22022022.ipynb", 7 | "provenance": [], 8 | "collapsed_sections": [], 9 | "authorship_tag": "ABX9TyOBcLe9cvCuKCaFR2LFrPnV", 10 | "include_colab_link": true 11 | }, 12 | "kernelspec": { 13 | "name": "python3", 14 | "display_name": "Python 3" 15 | }, 16 | "language_info": { 17 | "name": "python" 18 | } 19 | }, 20 | "cells": [ 21 | { 22 | "cell_type": "markdown", 23 | "metadata": { 24 | "id": "view-in-github", 25 | "colab_type": "text" 26 | }, 27 | "source": [ 28 | "\"Open" 29 | ] 30 | }, 31 | { 32 | "cell_type": "code", 33 | "source": [ 34 | "from google.colab import drive\n", 35 | "drive.mount('/content/drive')" 36 | ], 37 | "metadata": { 38 | "colab": { 39 | "base_uri": "https://localhost:8080/" 40 | }, 41 | "id": "S5FH6THp1EFv", 42 | "outputId": "c3d3a209-8270-49d9-fd7e-c49bdf6254e5" 43 | }, 44 | "execution_count": 1, 45 | "outputs": [ 46 | { 47 | "output_type": "stream", 48 | "name": "stdout", 49 | "text": [ 50 | "Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount(\"/content/drive\", force_remount=True).\n" 51 | ] 52 | } 53 | ] 54 | }, 55 | { 56 | "cell_type": "code", 57 | "execution_count": 2, 58 | "metadata": { 59 | "id": "TEspAZlQ25X7" 60 | }, 61 | "outputs": [], 62 | "source": [ 63 | "import pandas as pd\n", 64 | "import numpy as np\n", 65 | "import 
matplotlib.pyplot as plt\n",
66 | "import matplotlib.axes as ax\n",
67 | "import matplotlib.pylab as pl\n",
68 | "import math\n",
69 | "from collections import Counter\n",
70 | "import csv\n",
71 | "import matplotlib as mpl\n",
72 | "import matplotlib.style as style"
73 | ]
74 | },
75 | {
76 | "cell_type": "code",
77 | "source": [
78 | "freedom = pd.read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2022/2022-02-22/freedom.csv', skiprows=0)\n",
79 | "df = freedom\n",
80 | "\n",
81 | "timespan = 26 # years"
82 | ],
83 | "metadata": {
84 | "id": "g62IfFSR61Ou"
85 | },
86 | "execution_count": 25,
87 | "outputs": []
88 | },
89 | {
90 | "cell_type": "code",
91 | "source": [
92 | "status = list(df['Status'])\n",
93 | "year = list(df['year'])\n",
94 | "\n",
95 | "NFcount = {}\n",
96 | "PFcount = {}\n",
97 | "Fcount = {}\n",
98 | "\n",
99 | "# Initialize the counts\n",
100 | "for i in range(1995, 1995+timespan):\n",
101 | "    NFcount[i] = 0\n",
102 | "    PFcount[i] = 0\n",
103 | "    Fcount[i] = 0\n",
104 | "\n",
105 | "\n",
106 | "# Count the numbers\n",
107 | "for i in range(len(year)):\n",
108 | "    if status[i] == 'NF': # NF\n",
109 | "        NFcount[year[i]] += 1\n",
110 | "    elif status[i] == 'F': # F (elif, so NF rows are not also counted as PF below)\n",
111 | "        Fcount[year[i]] += 1\n",
112 | "    else: # PF\n",
113 | "        PFcount[year[i]] += 1\n",
114 | "\n",
115 | "print(Fcount)"
116 | ],
117 | "metadata": {
118 | "colab": {
119 | "base_uri": "https://localhost:8080/"
120 | },
121 | "id": "X3RStKDh85E3",
122 | "outputId": "4c3033e5-8e8b-41b1-e908-66012978be68"
123 | },
124 | "execution_count": 4,
125 | "outputs": [
126 | {
127 | "output_type": "stream",
128 | "name": "stdout",
129 | "text": [
130 | "{1995: 76, 1996: 78, 1997: 80, 1998: 86, 1999: 84, 2000: 85, 2001: 84, 2002: 87, 2003: 87, 2004: 88, 2005: 88, 2006: 89, 2007: 89, 2008: 88, 2009: 88, 2010: 86, 2011: 86, 2012: 89, 2013: 87, 2014: 88, 2015: 85, 2016: 86, 2017: 87, 2018: 85, 2019: 82, 2020: 81}\n"
131 | ]
132 | }
133 | ]
134 | },
135 | {
136 | "cell_type": "code",
137 | "source": [
138 | "# Styling\n",
139 | "\n",
140 | "style.use('seaborn-talk')\n",
141 | "style.use('ggplot')\n",
142 | "\n",
143 | "mpl.font_manager.fontManager.addfont('/content/drive/MyDrive/Avenir.ttc')\n",
144 | "mpl.rc('font', family='Avenir') # Changing all runtime fonts into Avenir"
145 | ],
146 | "metadata": {
147 | "id": "GgDpku4mKtXl"
148 | },
149 | "execution_count": 22,
150 | "outputs": []
151 | },
152 | {
153 | "cell_type": "code",
154 | "source": [
155 | "# Variables\n",
156 | "nf = list(NFcount.values())\n",
157 | "pf = list(PFcount.values())\n",
158 | "f = list(Fcount.values())\n",
159 | "\n",
160 | "time = np.arange(1995, 1995 + timespan) # one x value per year, 1995-2020, matching the dict keys\n",
161 | "\n",
162 | "plt.figure(dpi = 80, facecolor='#e5e5e5')\n",
163 | "#plt.plot(time, f, '-')\n",
164 | "plt.fill_between(time, pf, label='Partially Free', color='#8bb7d2')\n",
165 | "plt.fill_between(time, f, label='Free', color='#448ab5')\n",
166 | "plt.fill_between(time, nf, label='Not Free', color='#396581')\n",
167 | "\n",
168 | "plt.ylabel('Number of Countries')#, fontdict={'size':10})\n",
169 | "plt.xlabel('Years')#, fontdict={'size':10})\n",
170 | "plt.xlim((1995,2020))\n",
171 | "plt.ylim(bottom=0)\n",
172 | "plt.title('Freedom in the World Over the Years')\n",
173 | "\n",
174 | "plt.legend(bbox_to_anchor=(0.1, 0.05, 0.8, .102), loc='center',\n",
175 | "           ncol=3, mode=\"expand\", borderaxespad=0., fancybox=False)\n",
176 | "plt.show()"
177 | ],
178 | "metadata": {
179 | "colab": {
180 | "base_uri": "https://localhost:8080/",
181 | "height": 529
182 | },
183 | "id": "inBMAElhGy2E",
184 | "outputId": "c9539e1a-896a-4bac-bb64-12c31609ea48"
185 | },
186 | "execution_count": 24,
187 | "outputs": [
188 | {
189 | "output_type": "display_data",
190 | "data": {
191 | "image/png": "\n",
192 | "text/plain": [
193 | "
" 194 | ] 195 | }, 196 | "metadata": {} 197 | } 198 | ] 199 | } 200 | ] 201 | } -------------------------------------------------------------------------------- /TidyTuesday29032022.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "nbformat": 4, 3 | "nbformat_minor": 0, 4 | "metadata": { 5 | "colab": { 6 | "name": "TidyTuesday29032022.ipynb", 7 | "provenance": [], 8 | "authorship_tag": "ABX9TyNS4By/R1Cey+pp+6U7OtS8", 9 | "include_colab_link": true 10 | }, 11 | "kernelspec": { 12 | "name": "python3", 13 | "display_name": "Python 3" 14 | }, 15 | "language_info": { 16 | "name": "python" 17 | } 18 | }, 19 | "cells": [ 20 | { 21 | "cell_type": "markdown", 22 | "metadata": { 23 | "id": "view-in-github", 24 | "colab_type": "text" 25 | }, 26 | "source": [ 27 | "\"Open" 28 | ] 29 | }, 30 | { 31 | "cell_type": "code", 32 | "execution_count": null, 33 | "metadata": { 34 | "id": "B1SMxs7F8Wum" 35 | }, 36 | "outputs": [], 37 | "source": [ 38 | "" 39 | ] 40 | }, 41 | { 42 | "cell_type": "code", 43 | "execution_count": 1, 44 | "metadata": { 45 | "id": "uqmmCeFGFGP-" 46 | }, 47 | "outputs": [], 48 | "source": [ 49 | "# Run from here after first time\n", 50 | "import pandas as pd\n", 51 | "import numpy as np\n", 52 | "import matplotlib.pyplot as plt\n", 53 | "import matplotlib.axes as ax\n", 54 | "import matplotlib.pylab as pl\n", 55 | "import math\n", 56 | "from collections import Counter\n", 57 | "import csv\n", 58 | "import matplotlib as mpl\n", 59 | "import seaborn as sns" 60 | ] 61 | }, 62 | { 63 | "cell_type": "code", 64 | "source": [ 65 | "all = pd.read_csv('https://raw.githubusercontent.com/rfordatascience\\\n", 66 | "/tidytuesday/master/data/2022/2022-03-29/sports.csv', low_memory=False)\n", 67 | "\n", 68 | "all['gender'] = all['sum_partic_men'] > all['sum_partic_women']\n", 69 | "\n", 70 | "list_sports = ['Basketball', 'Football', 'Golf', 'Baseball', \n", 71 | " 'Volleyball', 'Swimming', 'Wrestling', 'Softball']\n", 72 | 
"\n", 73 | "df = all[all.sports.isin(list_sports) & (all['total_exp_menwomen'] < 1000000)\\\n", 74 | " & (all['total_exp_menwomen'] > 100000)] # Filter out abnormal data\n", 75 | "df.describe()" 76 | ], 77 | "metadata": { 78 | "colab": { 79 | "base_uri": "https://localhost:8080/", 80 | "height": 394 81 | }, 82 | "id": "Clbbk33J9dX_", 83 | "outputId": "cb702404-e6c1-4b66-c330-0afd536a43e5" 84 | }, 85 | "execution_count": 89, 86 | "outputs": [ 87 | { 88 | "output_type": "execute_result", 89 | "data": { 90 | "text/plain": [ 91 | " year unitid zip_text classification_code \\\n", 92 | "count 30535.000000 30535.000000 3.050700e+04 30535.000000 \n", 93 | "mean 2017.009170 183153.928476 6.151950e+07 7.337416 \n", 94 | "std 1.408475 56986.722425 1.873143e+08 3.987694 \n", 95 | "min 2015.000000 100654.000000 6.810000e+02 1.000000 \n", 96 | "25% 2016.000000 148496.000000 2.963900e+04 4.000000 \n", 97 | "50% 2017.000000 179043.000000 5.320200e+04 6.000000 \n", 98 | "75% 2018.000000 213598.000000 8.105000e+04 10.000000 \n", 99 | "max 2019.000000 800001.000000 9.977575e+08 20.000000 \n", 100 | "\n", 101 | " ef_male_count ef_female_count ef_total_count sector_cd \\\n", 102 | "count 30535.000000 30535.000000 30535.000000 30535.000000 \n", 103 | "mean 1616.054560 1972.100835 3588.155395 2.179532 \n", 104 | "std 1923.261546 2288.410532 4153.982485 3.103907 \n", 105 | "min 0.000000 0.000000 0.000000 1.000000 \n", 106 | "25% 514.000000 629.000000 1173.000000 1.000000 \n", 107 | "50% 894.000000 1139.000000 2062.000000 2.000000 \n", 108 | "75% 1951.500000 2443.000000 4389.500000 2.000000 \n", 109 | "max 17376.000000 25361.000000 36401.000000 99.000000 \n", 110 | "\n", 111 | " sportscode partic_men ... partic_coed_men partic_coed_women \\\n", 112 | "count 30535.000000 19844.000000 ... 0.0 0.0 \n", 113 | "mean 11.194334 30.791373 ... NaN NaN \n", 114 | "std 9.757280 26.348312 ... NaN NaN \n", 115 | "min 1.000000 1.000000 ... NaN NaN \n", 116 | "25% 2.000000 14.000000 ... 
NaN NaN \n", 117 | "50% 8.000000 22.000000 ... NaN NaN \n", 118 | "75% 16.000000 36.000000 ... NaN NaN \n", 119 | "max 28.000000 251.000000 ... NaN NaN \n", 120 | "\n", 121 | " sum_partic_men sum_partic_women rev_men rev_women \\\n", 122 | "count 30535.000000 30535.000000 1.984400e+04 2.090400e+04 \n", 123 | "mean 20.010611 11.060062 2.835943e+05 2.551100e+05 \n", 124 | "std 25.824316 9.135903 2.097301e+05 1.889236e+05 \n", 125 | "min 0.000000 0.000000 1.253000e+03 1.177000e+03 \n", 126 | "25% 0.000000 0.000000 1.320480e+05 1.226078e+05 \n", 127 | "50% 14.000000 13.000000 2.167260e+05 1.957325e+05 \n", 128 | "75% 30.000000 18.000000 3.862922e+05 3.349522e+05 \n", 129 | "max 251.000000 121.000000 3.460784e+06 4.064427e+06 \n", 130 | "\n", 131 | " total_rev_menwomen exp_men exp_women total_exp_menwomen \n", 132 | "count 3.053500e+04 19844.000000 20904.000000 30535.000000 \n", 133 | "mean 3.589476e+05 278618.552056 253752.817403 354784.851547 \n", 134 | "std 2.335386e+05 200018.096784 187098.457932 226116.131079 \n", 135 | "min 1.177000e+03 12950.000000 13623.000000 100001.000000 \n", 136 | "25% 1.735285e+05 129725.500000 120893.500000 171083.000000 \n", 137 | "50% 2.893730e+05 213711.000000 193643.000000 286411.000000 \n", 138 | "75% 4.827165e+05 382003.250000 332744.750000 479724.500000 \n", 139 | "max 4.064427e+06 999899.000000 999948.000000 999948.000000 \n", 140 | "\n", 141 | "[8 rows x 21 columns]" 142 | ], 143 | "text/html": [ 144 | "\n", 145 | "
\n",
460 | " "
461 | ]
462 | },
463 | "metadata": {},
464 | "execution_count": 89
465 | }
466 | ]
467 | },
468 | {
469 | "cell_type": "code",
470 | "source": [
471 | "# Order the sports by decreasing median expenditure (groupby alone sorts alphabetically)\n",
472 | "my_order = df.groupby(by=[\"sports\"])[\"total_exp_menwomen\"].median().sort_values(ascending=False).index"
473 | ],
474 | "metadata": {
475 | "id": "4r2SetFkgpPz"
476 | },
477 | "execution_count": 85,
478 | "outputs": []
479 | },
480 | {
481 | "cell_type": "code",
482 | "source": [
483 | "# Version using plotly\n",
484 | "\n",
485 | "import plotly.graph_objects as go\n",
486 | "\n",
487 | "fig = go.Figure()\n",
488 | "\n",
489 | "fig.add_trace(go.Violin(x=df['sports'][df['sum_partic_men'] < df['sum_partic_women']],\n",
490 | "                        y=df['total_exp_menwomen'][df['sum_partic_men'] < df['sum_partic_women']],\n",
491 | "                        legendgroup='Women Dominated', scalegroup='Women Dominated', name='Women Dominated',\n",
492 | "                        side='negative', marker=None, points=None,\n",
493 | "                        line_color='royalblue')\n",
494 | "              )\n",
495 | "fig.add_trace(go.Violin(x=df['sports'][df['sum_partic_men'] >= df['sum_partic_women']],\n",
496 | "                        y=df['total_exp_menwomen'][df['sum_partic_men'] >= df['sum_partic_women']],\n",
497 | "                        legendgroup='Men Dominated', scalegroup='Men Dominated', name='Men Dominated',\n",
498 | "                        side='positive', marker=None, points=None,\n",
499 | "                        line_color='lightseagreen')\n",
500 | "              )\n",
501 | "fig.update_traces(meanline_visible=True,)\n",
502 | "fig.update_layout(violingap=0, violinmode='overlay',\n",
503 | "                  title_text='Annual expenditure in Men- vs Women-Dominated Collegiate Sports', \n",
504 | "                  #showlegend=False\n",
505 | "                  )\n",
506 | "fig.show()"
507 | ],
508 | "metadata": {
509 | "id": "Z7C-7WJv2Ni6"
510 | },
511 | "execution_count": null,
512 | "outputs": []
513 | },
514 | {
515 | "cell_type": "markdown",
516 | "source": [
517 |
"![ncaafunds](https://user-images.githubusercontent.com/77285010/160757774-009472e4-ab94-4876-93d4-a698e16894c4.png)" 518 | ], 519 | "metadata": { 520 | "id": "xDt2Z8_MTePZ" 521 | } 522 | }, 523 | { 524 | "cell_type": "code", 525 | "source": [ 526 | "# Normal seaborn plot\n", 527 | "sns.set_theme(style=\"whitegrid\")\n", 528 | "\n", 529 | "ax = sns.violinplot(data=df, x=\"sports\", y=\"total_exp_menwomen\", hue=\"gender\",\n", 530 | " split=True, inner=\"quart\", linewidth=1, order=my_order,\n", 531 | " palette='Set2')\n", 532 | "\n", 533 | "sns.despine(left=True)\n", 534 | "\n", 535 | "plt.show()" 536 | ], 537 | "metadata": { 538 | "colab": { 539 | "base_uri": "https://localhost:8080/", 540 | "height": 296 541 | }, 542 | "id": "ufIH09dDgGOd", 543 | "outputId": "a8e800e9-fbe0-4eb4-8c20-f71e9861c692" 544 | }, 545 | "execution_count": 90, 546 | "outputs": [ 547 | { 548 | "output_type": "display_data", 549 | "data": { 550 | "text/plain": [ 551 | "
" 552 | ], 553 | "image/png": "\n" 554 | }, 555 | "metadata": {} 556 | } 557 | ] 558 | }, 559 | { 560 | "cell_type": "code", 561 | "source": [ 562 | "fig.write_html(\"ncaafunds.html\")" 563 | ], 564 | "metadata": { 565 | "id": "UP2Yq23JDMbC" 566 | }, 567 | "execution_count": 103, 568 | "outputs": [] 569 | } 570 | ] 571 | } --------------------------------------------------------------------------------