├── R_code ├── .Rprofile ├── R_code.Rproj ├── Line_graph.Rmd ├── Line_graph.ipynb ├── Leaflet_map.Rmd ├── Leaflet_map.ipynb └── Guide.Rmd ├── Data ├── GI_map.xlsx ├── GI_det_EW.csv ├── GI_age.csv ├── sexuality_country_gender.csv ├── my_plotly_graph.html └── cleaned_sexuality_df.csv ├── Images ├── ukds.png ├── GH_pages.png └── GH_pages_resized_30.png ├── .gitignore ├── Python_code ├── Folium_map.ipynb ├── Data_cleaning_sexuality.ipynb ├── Line_graph.ipynb └── HTML_files │ ├── gi_per.html │ ├── gi_age.html │ ├── gi_age2.html │ ├── line.html │ └── scatter.html ├── .gitattributes ├── README.md └── Dockerfile /R_code/.Rprofile: -------------------------------------------------------------------------------- 1 | 2 | -------------------------------------------------------------------------------- /Data/GI_map.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/UKDataServiceOpen/Interactive_visualisations/main/Data/GI_map.xlsx -------------------------------------------------------------------------------- /Images/ukds.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/UKDataServiceOpen/Interactive_visualisations/main/Images/ukds.png -------------------------------------------------------------------------------- /Images/GH_pages.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/UKDataServiceOpen/Interactive_visualisations/main/Images/GH_pages.png -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | # Jupyter Notebook checkpoints 2 | .ipynb_checkpoints/ 3 | 4 | # R-related files 5 | .RData 6 | .Rhistory 7 | .Rproj.user/ 8 | -------------------------------------------------------------------------------- /Images/GH_pages_resized_30.png: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/UKDataServiceOpen/Interactive_visualisations/main/Images/GH_pages_resized_30.png -------------------------------------------------------------------------------- /Python_code/Folium_map.ipynb: -------------------------------------------------------------------------------- 1 | version https://git-lfs.github.com/spec/v1 2 | oid sha256:a210c5218e1600323d6b7bf5afbb3c43512398eef67c91a42bf547e4c5301e09 3 | size 29415 4 | -------------------------------------------------------------------------------- /R_code/R_code.Rproj: -------------------------------------------------------------------------------- 1 | Version: 1.0 2 | 3 | RestoreWorkspace: Default 4 | SaveWorkspace: Default 5 | AlwaysSaveHistory: Default 6 | 7 | EnableCodeIndexing: Yes 8 | UseSpacesForTab: Yes 9 | NumSpacesForTab: 2 10 | Encoding: UTF-8 11 | 12 | RnwWeave: Sweave 13 | LaTeX: pdfLaTeX 14 | -------------------------------------------------------------------------------- /.gitattributes: -------------------------------------------------------------------------------- 1 | Shapefiles/LADs/LAD_MAY_2022_UK_BFE_V3.shp filter=lfs diff=lfs merge=lfs -text 2 | Folium_map.ipynb filter=lfs diff=lfs merge=lfs -text 3 | *.shp filter=lfs diff=lfs merge=lfs -text 4 | Data/map.html filter=lfs diff=lfs merge=lfs -text 5 | Data/map2.html filter=lfs diff=lfs merge=lfs -text 6 | -------------------------------------------------------------------------------- /Data/GI_det_EW.csv: -------------------------------------------------------------------------------- 1 | England and Wales Code,England and Wales,Gender identity (8 categories) Code,Gender identity (8 categories),Observation 2 | K04000001,England and Wales,-8,Does not apply,0 3 | K04000001,England and Wales,1,Gender identity the same as sex registered at birth,45389635 4 | K04000001,England and Wales,2,Gender identity different from sex registered at 
birth but no specific identity given,117775 5 | K04000001,England and Wales,3,Trans woman,47572 6 | K04000001,England and Wales,4,Trans man,48435 7 | K04000001,England and Wales,5,Non-binary,30257 8 | K04000001,England and Wales,6,All other gender identities,18074 9 | K04000001,England and Wales,7,Not answered,2914625 10 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Interactive visualisations 2 | 3 | 4 | This repository holds materials for the Interactive Visualisation workshop. 5 | It contains both **Python code** (written in Jupyter notebooks) and **R code** (written in RStudio and converted to .ipynb so it can run in the interactive Binder environment). If you'd rather clone the repo and execute the code in your own computational environment, please do so; the R code notebooks are also available as R Markdown files. 6 | 7 | **Datasets used** in this workshop are from the **2021 UK census**, and involve the new voluntary question on **gender identity**. In particular, we explore the relationship between age and gender identity, as well as ethnicity and gender identity. 8 | 9 | Each respective code folder contains a **general guide** to creating interactive visualisations (from simple bar charts to scatter plots), and an additional notebook focused solely on **interactive mapping**. 
10 | 11 | To access and run the code files interactively, click the button below: 12 | 13 | [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/UKDataServiceOpen/Interactive_visualisations/HEAD) 14 | 15 | 16 | Preview the visualisations we'll be coding by visiting our GitHub page: 17 | 18 | [![GitHub Pages](Images/GH_pages.png)](https://ukdataserviceopen.github.io/blog/2024/05/10/interactive-visualisations-workshop.html) 19 | 20 | -------------------------------------------------------------------------------- /Dockerfile: -------------------------------------------------------------------------------- 1 | # Use the official R-base image as the base image 2 | FROM r-base:4.4.0 3 | 4 | # Add the R Project repository key and repository 5 | RUN apt-get update && \ 6 | apt-get install -y gnupg2 software-properties-common && \ 7 | gpg --keyserver hkp://keyserver.ubuntu.com:80 --recv-keys B8F25A8A73EACF41 && \ 8 | gpg --export --armor B8F25A8A73EACF41 | tee /etc/apt/trusted.gpg.d/cran_debian_key.asc && \ 9 | add-apt-repository 'deb http://cloud.r-project.org/bin/linux/debian buster-cran40/' && \ 10 | apt-get update 11 | 12 | # Install necessary packages without libglib2.0-0 13 | RUN apt-get install -y \ 14 | libglib2.0-bin \ 15 | gir1.2-girepository-2.0 16 | 17 | # Install Python 3 and necessary libraries 18 | RUN apt-get install -y \ 19 | python3 python3-pip python3-venv python3-dev \ 20 | libudunits2-dev libgdal-dev libgeos-dev libproj-dev \ 21 | libsqlite3-dev build-essential librsvg2-dev libcairo2-dev sudo 22 | 23 | # Create jovyan user and home directory 24 | RUN useradd -m -s /bin/bash jovyan 25 | 26 | # Create and activate a virtual environment, then install additional Python packages 27 | RUN python3 -m venv /opt/venv && \ 28 | . 
/opt/venv/bin/activate && \ 29 | /opt/venv/bin/pip install --upgrade pip && \ 30 | /opt/venv/bin/pip install jupyter ipykernel pyarrow pandas geopandas folium plotly statsmodels && \ 31 | /opt/venv/bin/python -m ipykernel install --name=venv --user 32 | 33 | # Ensure the virtual environment is used for subsequent commands 34 | ENV PATH="/opt/venv/bin:$PATH" 35 | 36 | # Install R packages including IRkernel which allows R to run on Jupyter Notebook 37 | RUN R -e "install.packages(c('leaflet', 'readr', 'dplyr', 'ggplot2', 'plotly', 'sf', 'IRkernel', 'Cairo', 'rsvg'), dependencies=TRUE, repos='https://cloud.r-project.org/')" && \ 38 | R -e "IRkernel::installspec(user = FALSE)" 39 | 40 | # Clean up package lists to reduce image size 41 | RUN apt-get clean && \ 42 | rm -rf /var/lib/apt/lists/* 43 | 44 | # Set the working directory to /home/jovyan (default for Binder) 45 | WORKDIR /home/jovyan 46 | 47 | # Copy all contents of the repository into the working directory 48 | COPY . /home/jovyan 49 | 50 | # Debug step: List contents of /home/jovyan 51 | RUN ls -la /home/jovyan 52 | 53 | # Change ownership and permissions of the /home/jovyan directory 54 | RUN chown -R jovyan:jovyan /home/jovyan && chmod -R 775 /home/jovyan 55 | 56 | # Expose the port Jupyter will run on 57 | EXPOSE 8888 58 | 59 | # Switch to jovyan user 60 | USER jovyan 61 | 62 | # Set a default command to run JupyterLab with the virtual environment activated 63 | CMD ["/bin/bash", "-c", ". 
/opt/venv/bin/activate && exec jupyter lab --ip=0.0.0.0 --port=8888 --notebook-dir=/home/jovyan --no-browser --allow-root"] 64 | 65 | 66 | 67 | 68 | -------------------------------------------------------------------------------- /Data/GI_age.csv: -------------------------------------------------------------------------------- 1 | England and Wales Code,England and Wales,Gender identity (7 categories) Code,Gender identity (7 categories),Age (6 categories) Code,Age (6 categories),Observation 2 | K04000001,England and Wales,-8,Does not apply,1,Aged 15 years and under,0 3 | K04000001,England and Wales,-8,Does not apply,2,Aged 16 to 24 years,0 4 | K04000001,England and Wales,-8,Does not apply,3,Aged 25 to 34 years,0 5 | K04000001,England and Wales,-8,Does not apply,4,Aged 35 to 49 years,0 6 | K04000001,England and Wales,-8,Does not apply,5,Aged 50 to 64 years,0 7 | K04000001,England and Wales,-8,Does not apply,6,Aged 65 years and over,0 8 | K04000001,England and Wales,1,Gender identity the same as sex registered at birth,1,Aged 15 years and under,0 9 | K04000001,England and Wales,1,Gender identity the same as sex registered at birth,2,Aged 16 to 24 years,5809658 10 | K04000001,England and Wales,1,Gender identity the same as sex registered at birth,3,Aged 25 to 34 years,7518377 11 | K04000001,England and Wales,1,Gender identity the same as sex registered at birth,4,Aged 35 to 49 years,10829667 12 | K04000001,England and Wales,1,Gender identity the same as sex registered at birth,5,Aged 50 to 64 years,10966023 13 | K04000001,England and Wales,1,Gender identity the same as sex registered at birth,6,Aged 65 years and over,10265910 14 | K04000001,England and Wales,2,Gender identity different from sex registered at birth but no specific identity given,1,Aged 15 years and under,0 15 | K04000001,England and Wales,2,Gender identity different from sex registered at birth but no specific identity given,2,Aged 16 to 24 years,16590 16 | K04000001,England and Wales,2,Gender identity 
different from sex registered at birth but no specific identity given,3,Aged 25 to 34 years,28375 17 | K04000001,England and Wales,2,Gender identity different from sex registered at birth but no specific identity given,4,Aged 35 to 49 years,38280 18 | K04000001,England and Wales,2,Gender identity different from sex registered at birth but no specific identity given,5,Aged 50 to 64 years,21678 19 | K04000001,England and Wales,2,Gender identity different from sex registered at birth but no specific identity given,6,Aged 65 years and over,12852 20 | K04000001,England and Wales,3,Trans woman,1,Aged 15 years and under,0 21 | K04000001,England and Wales,3,Trans woman,2,Aged 16 to 24 years,9186 22 | K04000001,England and Wales,3,Trans woman,3,Aged 25 to 34 years,9835 23 | K04000001,England and Wales,3,Trans woman,4,Aged 35 to 49 years,12607 24 | K04000001,England and Wales,3,Trans woman,5,Aged 50 to 64 years,9449 25 | K04000001,England and Wales,3,Trans woman,6,Aged 65 years and over,6495 26 | K04000001,England and Wales,4,Trans man,1,Aged 15 years and under,0 27 | K04000001,England and Wales,4,Trans man,2,Aged 16 to 24 years,13819 28 | K04000001,England and Wales,4,Trans man,3,Aged 25 to 34 years,8910 29 | K04000001,England and Wales,4,Trans man,4,Aged 35 to 49 years,11700 30 | K04000001,England and Wales,4,Trans man,5,Aged 50 to 64 years,8264 31 | K04000001,England and Wales,4,Trans man,6,Aged 65 years and over,5742 32 | K04000001,England and Wales,5,All other gender identities,1,Aged 15 years and under,0 33 | K04000001,England and Wales,5,All other gender identities,2,Aged 16 to 24 years,23597 34 | K04000001,England and Wales,5,All other gender identities,3,Aged 25 to 34 years,14550 35 | K04000001,England and Wales,5,All other gender identities,4,Aged 35 to 49 years,6628 36 | K04000001,England and Wales,5,All other gender identities,5,Aged 50 to 64 years,2881 37 | K04000001,England and Wales,5,All other gender identities,6,Aged 65 years and over,675 38 | 
K04000001,England and Wales,6,Not answered,1,Aged 15 years and under,0 39 | K04000001,England and Wales,6,Not answered,2,Aged 16 to 24 years,445463 40 | K04000001,England and Wales,6,Not answered,3,Aged 25 to 34 years,470493 41 | K04000001,England and Wales,6,Not answered,4,Aged 35 to 49 years,627215 42 | K04000001,England and Wales,6,Not answered,5,Aged 50 to 64 years,599783 43 | K04000001,England and Wales,6,Not answered,6,Aged 65 years and over,771669 44 | -------------------------------------------------------------------------------- /Python_code/Data_cleaning_sexuality.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "code", 5 | "execution_count": null, 6 | "id": "69011e1e", 7 | "metadata": {}, 8 | "outputs": [], 9 | "source": [ 10 | "import pandas as pd" 11 | ] 12 | }, 13 | { 14 | "cell_type": "code", 15 | "execution_count": null, 16 | "id": "6546572f", 17 | "metadata": {}, 18 | "outputs": [], 19 | "source": [ 20 | "import plotly.io as pio\n", 21 | "pio.renderers.default = \"notebook_connected\"" 22 | ] 23 | }, 24 | { 25 | "cell_type": "code", 26 | "execution_count": null, 27 | "id": "ff67ce73", 28 | "metadata": {}, 29 | "outputs": [], 30 | "source": [ 31 | "# Read the CSV file into a pandas DataFrame\n", 32 | "\n", 33 | "df = pd.read_csv('../Data/sexuality_country_gender.csv')" 34 | ] 35 | }, 36 | { 37 | "cell_type": "code", 38 | "execution_count": null, 39 | "id": "05cc6624", 40 | "metadata": {}, 41 | "outputs": [], 42 | "source": [ 43 | "df.head()" 44 | ] 45 | }, 46 | { 47 | "cell_type": "code", 48 | "execution_count": null, 49 | "id": "c09c11d4", 50 | "metadata": {}, 51 | "outputs": [], 52 | "source": [ 53 | "# Fill down 'Country' and 'Sex' values\n", 54 | "df['Country'].fillna(method='ffill', inplace=True)\n", 55 | "df['Gender'].fillna(method='ffill', inplace=True)\n", 56 | "\n", 57 | "# Filter out rows related to \"Weighted base (000s)\" and \"Unweighted sample\" for 
separate handling\n", 58 | "main_df = df[~df['Gender'].str.contains(\"Weighted base|Unweighted sample\")]\n", 59 | "\n", 60 | "# Drop unnecessary NaN columns\n", 61 | "main_df = main_df.dropna(axis=1, how='all')\n", 62 | "main_df = main_df.dropna(axis=0, how='any')\n", 63 | "\n", 64 | "# Display the cleaned main data to ensure it's structured correctly\n", 65 | "main_df.head()" 66 | ] 67 | }, 68 | { 69 | "cell_type": "code", 70 | "execution_count": null, 71 | "id": "85f1f7b9", 72 | "metadata": {}, 73 | "outputs": [], 74 | "source": [ 75 | "year_columns = ['2010', '2011', '2012', '2013', '2014'] # Update this list based on your dataset\n", 76 | "long_format_df = main_df.melt(id_vars=['Country', 'Gender', 'Sexuality'], value_vars=year_columns, var_name='Year', value_name='Percentage')\n", 77 | "\n", 78 | "# Convert 'Percentage' to numeric, as it may be read as string due to the initial NaN values\n", 79 | "long_format_df['Percentage'] = pd.to_numeric(long_format_df['Percentage'], errors='coerce')\n", 80 | "\n", 81 | "# Display the transformed dataset ready for plotting\n", 82 | "long_format_df.head(20)" 83 | ] 84 | }, 85 | { 86 | "cell_type": "code", 87 | "execution_count": null, 88 | "id": "c0c098c6", 89 | "metadata": {}, 90 | "outputs": [], 91 | "source": [ 92 | "# Sort dataframe into the right order\n", 93 | "\n", 94 | "# Sorting the DataFrame by 'Country', 'Gender', and then 'Year'\n", 95 | "sorted_df = long_format_df.sort_values(by=['Country', 'Gender', 'Year']).reset_index(drop = True)\n", 96 | "\n", 97 | "# Display the sorted DataFrame to check if it flows as expected\n", 98 | "sorted_df.head(20)" 99 | ] 100 | }, 101 | { 102 | "cell_type": "code", 103 | "execution_count": null, 104 | "id": "d5165ea5", 105 | "metadata": {}, 106 | "outputs": [], 107 | "source": [ 108 | "# Round values in Percentage column to 2 decimal places\n", 109 | "\n", 110 | "sorted_df['Percentage'] = sorted_df['Percentage'].round(2)" 111 | ] 112 | }, 113 | { 114 | "cell_type": "code", 115 | 
"execution_count": null, 116 | "id": "c172705d", 117 | "metadata": {}, 118 | "outputs": [], 119 | "source": [ 120 | "# Save df\n", 121 | "\n", 122 | "sorted_df.to_csv('../Data/cleaned_sexuality_df.csv', index = False)" 123 | ] 124 | } 125 | ], 126 | "metadata": { 127 | "kernelspec": { 128 | "display_name": "venv", 129 | "language": "python", 130 | "name": "venv" 131 | }, 132 | "language_info": { 133 | "codemirror_mode": { 134 | "name": "ipython", 135 | "version": 3 136 | }, 137 | "file_extension": ".py", 138 | "mimetype": "text/x-python", 139 | "name": "python", 140 | "nbconvert_exporter": "python", 141 | "pygments_lexer": "ipython3", 142 | "version": "3.11.9" 143 | }, 144 | "toc": { 145 | "base_numbering": 1, 146 | "nav_menu": {}, 147 | "number_sections": true, 148 | "sideBar": true, 149 | "skip_h1_title": false, 150 | "title_cell": "Table of Contents", 151 | "title_sidebar": "Contents", 152 | "toc_cell": false, 153 | "toc_position": {}, 154 | "toc_section_display": true, 155 | "toc_window_display": true 156 | }, 157 | "varInspector": { 158 | "cols": { 159 | "lenName": 16, 160 | "lenType": 16, 161 | "lenVar": 40 162 | }, 163 | "kernels_config": { 164 | "python": { 165 | "delete_cmd_postfix": "", 166 | "delete_cmd_prefix": "del ", 167 | "library": "var_list.py", 168 | "varRefreshCmd": "print(var_dic_list())" 169 | }, 170 | "r": { 171 | "delete_cmd_postfix": ") ", 172 | "delete_cmd_prefix": "rm(", 173 | "library": "var_list.r", 174 | "varRefreshCmd": "cat(var_dic_list()) " 175 | } 176 | }, 177 | "types_to_exclude": [ 178 | "module", 179 | "function", 180 | "builtin_function_or_method", 181 | "instance", 182 | "_Feature" 183 | ], 184 | "window_display": false 185 | } 186 | }, 187 | "nbformat": 4, 188 | "nbformat_minor": 5 189 | } 190 | -------------------------------------------------------------------------------- /R_code/Line_graph.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "Line_graph" 3 | output: 
html_document 4 | --- 5 | 6 | ```{r setup, include=FALSE} 7 | knitr::opts_chunk$set(echo = FALSE) 8 | ``` 9 | 10 | ```{r} 11 | # Allows us to read-in csv files 12 | library(readr) 13 | # For data manipulation 14 | library(dplyr) 15 | # For regular expression operations 16 | library(stringr) 17 | # Used to create interactive visualisations 18 | library(plotly) 19 | ``` 20 | 21 | # Dataset 3 22 | 23 | This dataset includes sexual identity estimates by gender from 2010 to 2014. This is presented at a UK level, and broken down by England, Wales, Scotland and Northern Ireland. I wanted this guide to include a demo of how to make interactive line graphs with gender identity data, but unfortunately, as this is the first year the ONS has collected that data, it wasn't possible. Instead, I found a dataset from 2015 that contains experimental statistics used in the Integrated Household Survey. For more info, you can check out this [ONS link](https://www.ons.gov.uk/peoplepopulationandcommunity/culturalidentity/sexuality/datasets/sexualidentitybyagegroupbycountry). 24 | 25 | ```{r} 26 | # Load in dataset 27 | 28 | df3 <- read_csv('../Data/cleaned_sexuality_df.csv') 29 | ``` 30 | 31 | ```{r} 32 | # Brief glimpse at underlying data structure 33 | 34 | head(df3, 10) 35 | ``` 36 | 37 | ## Data cleaning 38 | 39 | When I first found this dataset it was very messy and formatted terribly, so I performed some cleaning on it in a separate Jupyter notebook, to save cluttering this one and distracting from the main tutorial. If you'd like to see how I cleaned it up, please see the ['Data_cleaning_sexuality.ipynb'](Data_cleaning_sexuality.ipynb) notebook. 40 | 41 | ## Data pre-processing 42 | 43 | The only pre-processing we're going to do is subset our data by country, and also create two separate datasets for Gender = Men and Gender = Women. I'll explain why this step is needed soon. 
44 | 45 | ```{r} 46 | # Filter dataset to focus on England 47 | england_df <- df3 %>% 48 | filter(Country == 'England') 49 | ``` 50 | 51 | ```{r} 52 | # Let's check it worked... 53 | 54 | unique(england_df$Country) 55 | ``` 56 | 57 | ```{r} 58 | # Further filter data for each gender 59 | 60 | men <- england_df %>% filter(Gender == "Men") 61 | women <- england_df %>% filter(Gender == "Women") 62 | 63 | # Let's check it worked 64 | 65 | unique(men$Gender) 66 | unique(women$Gender) 67 | ``` 68 | 69 | 70 | ## Interactive linegraph 71 | 72 | Creating a simple line graph in plotly is pretty easy, but where plotly struggles (in R) is in handling facet plots. A facet plot is a type of visualisation that divides data into subplots based on categorical variables. What I'd like to do is create a facet plot of sexuality percentages in England (2010-2014) with individual subplots for our two genders. This is achieved easily in Python due to the plotly.express module, which provides a simple way to create facet plots. Unfortunately, we'll have to go through a bit more of a long-winded route, where we'll manually create our individual plots for each gender, then combine them using the subplot function. Also, plotly.express automatically manages legends to ensure they're unified across facets, but R's plotly requires that we manually sync up these legends. Womp womp. Let's get to it. 73 | 74 | 75 | 76 | ```{r} 77 | # Create individual plots for each gender 78 | 79 | 80 | men_plot <- plot_ly(men, 81 | x = ~Year, 82 | y = ~Percentage, 83 | color = ~Sexuality, 84 | type = 'scatter', 85 | # mode used to make sure our data points are connected by lines across the years 86 | mode = 'lines+markers', 87 | hoverinfo = 'text', 88 | text = ~paste("Year:", Year, "
<br>Percentage:", Percentage, "
<br>Sexuality:", Sexuality), 89 | # legendgroup parameter ensures that data points relating to the same category are synced across plots 90 | legendgroup = ~Sexuality, 91 | # showlegend parameter set to TRUE only for this plot to avoid duplicate legends 92 | showlegend = TRUE) %>% 93 | layout(xaxis = list(title = 'Year', tickvals = 2010:2014, ticktext = 2010:2014), 94 | yaxis = list(title = 'Percentage'), 95 | # Here we add an annotation to the graph to label the first subplot "Men" 96 | # Setting xref and yref to 'paper' simply means the annotation won't move if we zoom in or out 97 | annotations = list( 98 | list(x = 0.5, y = 1.05, text = "Men", showarrow = FALSE, xref='paper', yref='paper'))) 99 | 100 | 101 | women_plot <- plot_ly(women, 102 | x = ~Year, 103 | y = ~Percentage, 104 | color = ~Sexuality, 105 | type = 'scatter', 106 | mode = 'lines+markers', 107 | hoverinfo = 'text', 108 | text = ~paste("Year:", Year, "
<br>Percentage:", Percentage, "
Sexuality:", Sexuality), 109 | legendgroup = ~Sexuality, 110 | showlegend = FALSE) %>% 111 | layout(xaxis = list(title = 'Year', tickvals = 2010:2014, ticktext = 2010:2014), 112 | yaxis = list(title = 'Percentage'), 113 | annotations = list( 114 | list(x = 0.5, y = 1.05, text = "Women", showarrow = FALSE, xref='paper', yref='paper'))) 115 | 116 | # Let's take a look at one of these graphs 117 | 118 | women_plot 119 | ``` 120 | 121 | ```{r} 122 | # Combine individual plots using subplot 123 | # Within subplot, define number of rows, make sure share same x axes and both axes titles 124 | fig5 <- subplot(men_plot, women_plot, nrows = 2, shareX = TRUE, titleX = TRUE, titleY = TRUE) %>% 125 | layout( 126 | title = list( 127 | text = 'Sexuality Percentages by Gender in England (2010-2014)', 128 | y = 0.98, # Move the title higher up 129 | x = 0.5, # Center the title 130 | xanchor = "center", 131 | yanchor = "top" 132 | ), 133 | margin = list(t = 100), # Add space at the top for the title 134 | height = 800, 135 | width = 1000 136 | ) 137 | 138 | fig5 139 | ``` 140 | -------------------------------------------------------------------------------- /Data/sexuality_country_gender.csv: -------------------------------------------------------------------------------- 1 | Country,Gender,Sexuality,,2010,2011,2012,2013,2014 2 | UK,Men,,,,,,, 3 | ,,Heterosexual / Straight,,93.62811882,93.83312,93.23683139,92.26242206,92.46354755 4 | ,,Gay / Lesbian,,1.372509211,1.391262926,1.459347924,1.551245295,1.490815218 5 | ,,Bisexual,,0.36131733,0.386498562,0.325135912,0.360631366,0.321500701 6 | ,,Other,,0.418582671,0.367061623,0.330933952,0.260649684,0.316080858 7 | ,,Don't know/refuse,,3.519557136,3.419206992,3.476622124,3.944665469,3.814776265 8 | ,,Non-response,,0.699914828,0.602849894,1.171128699,1.620386122,1.593279404 9 | ,Women,,,,,,, 10 | ,,Heterosexual / Straight,,94.35353472,94.35996194,93.73789331,93.12693735,93.11510822 11 | ,,Gay / 
Lesbian,,0.636032812,0.668994001,0.716200431,0.81811695,0.65110956 12 | ,,Bisexual,,0.557008164,0.586374269,0.538799017,0.553086384,0.680992584 13 | ,,Other,,0.352114571,0.29377308,0.25004256,0.266139096,0.331951343 14 | ,,Don't know/refuse,,3.500682946,3.607818251,3.764299753,3.856405125,3.965208136 15 | ,,Non-response,,0.600626788,0.483078464,0.99276493,1.3793151,1.255630157 16 | ,,,,,,,, 17 | ,Weighted base (000s) ,,,,,,, 18 | ,,Males,,24308717.74,24505659.12,24706280.39,24907902.78,25181910.9 19 | ,,Females,,25508484.31,25661932.66,25826203.38,25992983.04,26453346.81 20 | ,Unweighted sample,,,,,,, 21 | ,,Males,,103294,87019,78313,78711,74333 22 | ,,Females,,128111,108426,99884,100109,93888 23 | England,Men,,,,,,, 24 | ,,Heterosexual / Straight,,93.46095169,93.6102045,93.00390992,91.96060373,92.24357479 25 | ,,Gay / Lesbian,,1.409938893,1.428043932,1.51711417,1.648751959,1.540857592 26 | ,,Bisexual,,0.383175378,0.388705396,0.315574803,0.364382383,0.323187693 27 | ,,Other,,0.439337792,0.38275392,0.319925832,0.26104683,0.311220581 28 | ,,Don't know/refuse,,3.675059364,3.614741574,3.662760907,4.151810228,3.947951862 29 | ,,Non-response,,0.631536879,0.575550678,1.180714364,1.613404874,1.633207478 30 | ,Women,,,,,,, 31 | ,,Heterosexual / Straight,,94.30702106,94.21343431,93.51278045,92.93290161,92.82094392 32 | ,,Gay / Lesbian,,0.639942374,0.660549286,0.738128415,0.822970429,0.679285111 33 | ,,Bisexual,,0.609534911,0.61474986,0.547201704,0.553738898,0.689049289 34 | ,,Other,,0.341734012,0.290688591,0.247195243,0.263122197,0.340662553 35 | ,,Don't know/refuse,,3.60653303,3.779135708,3.981289916,4.057598482,4.19149524 36 | ,,Non-response,,0.495234614,0.441442247,0.973404276,1.369668381,1.27856389 37 | ,Weighted base (000s) ,,,,,,, 38 | ,,Males,,20423676.62,20595901.39,20770032.75,20945743.72,21174113.8 39 | ,,Females,,21328711.08,21470378.22,21615047.84,21762276.48,22150559.09 40 | ,Unweighted sample,,,,,,, 41 | ,,Males,,79763,65577,57514,57775,54628 42 | 
,,Females,,97868,80892,72938,73024,68306 43 | Wales,Men,,,,,,, 44 | ,,,,,,,, 45 | ,,Heterosexual / Straight,,93.99947253,95.27644151,93.82542704,93.13178673,93.60356113 46 | ,,Gay / Lesbian,,1.209079668,1.018596429,1.153972993,1.2238716,1.476273829 47 | ,,Bisexual,,0.328014664,0.309472418,0.496198245,0.377706214,0.239305016 48 | ,,Other,,0.30393243,0.238524827,0.459964613,0.315820887,0.490810953 49 | ,,Don't know/refuse,,2.955478239,2.326213467,2.782720125,3.046023168,2.948026226 50 | ,,Non-response,,1.204022467,0.830751345,1.28171698,1.9047914,1.24202285 51 | ,Women,,,,,,, 52 | ,,Heterosexual / Straight,,94.80461267,95.0339601,94.66231674,93.92877999,94.24760043 53 | ,,Gay / Lesbian,,0.543387787,0.770042321,0.558070399,0.669843723,0.623326069 54 | ,,Bisexual,,0.196116198,0.327624929,0.356052208,0.527131569,0.703674453 55 | ,,Other,,0.386962695,0.320760393,0.338465432,0.386150572,0.401681864 56 | ,,Don't know/refuse,,2.730869369,2.623739352,2.806198312,2.7790508,3.03566191 57 | ,,Non-response,,1.338051276,0.923872901,1.278896905,1.709043347,0.988055272 58 | ,,,,,,,, 59 | ,Weighted base (000s) ,,,,,,, 60 | ,,Males,,1177133.35,1185029.68,1194028.81,1203112.32,1224169.91 61 | ,,Females,,1250355.67,1255875.13,1262312.07,1269093.03,1282153.98 62 | ,Unweighted sample,,,,,,, 63 | ,,Males,,9582,8979,8914,9093,8645 64 | ,,Females,,11933,11210,11389,11560,11037 65 | Scotland,Men,,,,,,, 66 | ,,Heterosexual / Straight,,94.88507567,94.9442994,94.77618445,94.43224829,94.36574403 67 | ,,Gay / Lesbian,,1.142923503,1.365051204,1.292271099,0.974427117,1.094706743 68 | ,,Bisexual,,0.281250249,0.460850059,0.260115294,0.290933166,0.28803851 69 | ,,Other,,0.323164461,0.313607741,0.283961872,0.141615719,0.247092986 70 | ,,Don't know/refuse,,2.163401415,1.99231078,2.258355705,2.453858406,2.547685028 71 | ,,Non-response,,1.204184705,0.923880814,1.129111585,1.706917301,1.4567327 72 | ,Women,,,,,,, 73 | ,,Heterosexual / Straight,,94.81142823,94.87226614,95.02178483,94.2214845,94.90160028 74 
| ,,Gay / Lesbian,,0.684099771,0.847356557,0.762843059,1.06255585,0.563237521 75 | ,,Bisexual,,0.356577513,0.48196109,0.466165064,0.346488158,0.363738828 76 | ,,Other,,0.36605498,0.290627862,0.269216405,0.273659686,0.262959014 77 | ,,Don't know/refuse,,2.631876647,2.713222538,2.426957992,2.59807418,2.625206158 78 | ,,Non-response,,1.149962862,0.794565816,1.053032654,1.497737624,1.283258203 79 | ,,,,,,,, 80 | ,Weighted base (000s) ,,,,,,, 81 | ,,Males,,2030910.2,2041091.2,2052286.09,2062913.65,2084745.54 82 | ,,Females,,2212405.36,2213191.11,2221502.81,2229100.71,2281488.63 83 | ,Unweighted sample,,,,,,, 84 | ,,Males,,12720,11227,10579,10531,9873 85 | ,,Females,,16457,14548,13783,13822,12919 86 | Nireland,Men,,,,,,, 87 | ,,Heterosexual / Straight,,94.25480951,94.72942689,94.65116266,93.41119957,91.45702853 88 | ,,Gay / Lesbian,,1.216222977,1.007409124,0.74581328,0.892526169,1.181726548 89 | ,,Bisexual,,-,0.231545447,0.510330906,0.424800953,0.514181478 90 | ,,Other,,0.278033494,0.276702463,0.578744821,0.506092879,0.363062902 91 | ,,Don't know/refuse,,3.877460003,3.683154295,2.69779486,3.68289489,5.077846871 92 | ,,Non-response1,,*,*,0.81615347,1.082485534,1.406153674 93 | ,Women,,,,,,, 94 | ,,Heterosexual / Straight,,93.53768039,95.97344012,94.90206006,94.17154774,94.45180528 95 | ,,Gay / Lesbian,,0.532979774,0.197922956,0.196524143,*,* 96 | ,,Bisexual,,0.242298527,0.512751627,0.8280934,1.207364535,1.379459847 97 | ,,Other,,0.557117438,0.348159319,0.12263717,0.12496027,0.162893589 98 | ,,Don't know/refuse,,4.375210073,2.967725978,3.063223497,3.574886239,2.932440424 99 | ,,Non-response,,0.754713797,-,0.887461729,0.734279299,0.94722537 100 | ,,,,,,,, 101 | ,Weighted base (000s) ,,,,,,, 102 | ,,Males,,676997.57,683636.85,689932.74,696133.09,698881.65 103 | ,,Females,,717012.2,722488.2,727340.66,732512.82,739145.11 104 | ,Unweighted sample,,,,,,, 105 | ,,Males,,1229,1236,1306,1312,1187 106 | ,,Females,,1853,1776,1774,1703,1626 
-------------------------------------------------------------------------------- /Python_code/Line_graph.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "id": "7835cbc9", 6 | "metadata": {}, 7 | "source": [ 8 | "# Import packages" 9 | ] 10 | }, 11 | { 12 | "cell_type": "code", 13 | "execution_count": null, 14 | "id": "177942ff", 15 | "metadata": {}, 16 | "outputs": [], 17 | "source": [ 18 | "# Allows us to read-in csv files, and used for data manipulation\n", 19 | "import pandas as pd\n", 20 | "\n", 21 | "# Used to create regular expressions to match strings\n", 22 | "import re\n", 23 | "\n", 24 | "# Modules used to create interactive visualisations \n", 25 | "import plotly.express as px\n", 26 | "import plotly.graph_objects as go" 27 | ] 28 | }, 29 | { 30 | "cell_type": "markdown", 31 | "id": "fdb2e3f8", 32 | "metadata": {}, 33 | "source": [ 34 | "# Dataset 4\n", 35 | "\n", 36 | "This dataset includes sexual identity estimates by gender from 2010 to 2014. This is presented at a UK level, and broken down by England, Wales, Scotland and Northern Ireland. I wanted this guide to include a demo of how to make interactive line graphs with gender identity data, but unfortunately given this is only the first year that the ONS has collected this data that was not possible. So I found a dataset from 2015 which involves experimental statistics that have been used in the Integrated Household Survey. For more info, you can check out this [ONS link](https://www.ons.gov.uk/peoplepopulationandcommunity/culturalidentity/sexuality/datasets/sexualidentitybyagegroupbycountry). 
" 37 | ] 38 | }, 39 | { 40 | "cell_type": "code", 41 | "execution_count": null, 42 | "id": "0741066d", 43 | "metadata": {}, 44 | "outputs": [], 45 | "source": [ 46 | "df4 = pd.read_csv('../Data/cleaned_sexuality_df.csv')" 47 | ] 48 | }, 49 | { 50 | "cell_type": "code", 51 | "execution_count": null, 52 | "id": "ad837265", 53 | "metadata": {}, 54 | "outputs": [], 55 | "source": [ 56 | "# Brief glimpse at underlying data structure\n", 57 | "\n", 58 | "df4.head(50)" 59 | ] 60 | }, 61 | { 62 | "cell_type": "markdown", 63 | "id": "d0a19aef", 64 | "metadata": {}, 65 | "source": [ 66 | "## Data cleaning\n", 67 | "\n", 68 | "When I first found this dataset it was very messy and formatted terribly, so I performed some cleaning on it in a separate jupyter notebook, to save cluttering this one and distracting from the main tutorial. If you'd like to see how I cleaned it up, please see the ['Data_cleaning_sexuality.ipynb'](Data_cleaning_sexuality.ipynb) notebook." 69 | ] 70 | }, 71 | { 72 | "cell_type": "markdown", 73 | "id": "85a5341a", 74 | "metadata": {}, 75 | "source": [ 76 | "## Data pre-processing\n", 77 | "\n", 78 | "The only pre-processing we're going to do is subset our data so that we have it ready to analyse in the following step." 79 | ] 80 | }, 81 | { 82 | "cell_type": "code", 83 | "execution_count": null, 84 | "id": "e5d4e369", 85 | "metadata": {}, 86 | "outputs": [], 87 | "source": [ 88 | "# Filtering the dataset for England only\n", 89 | "\n", 90 | "england_df = df4[df4['Country'] == 'England']" 91 | ] 92 | }, 93 | { 94 | "cell_type": "markdown", 95 | "id": "6ffda0fe", 96 | "metadata": {}, 97 | "source": [ 98 | "## Interactive linegraph\n", 99 | "\n", 100 | "By now you probably know the drill. Just like we had our px.bar and px.scatter methods, we have a corresponding one for linegraphs, appropriately named px.line. 
The parameters used are the same, with the only difference being that we're using:\n", 101 | "\n", 102 | "* facet_row - when we specify a categorical variable here (Gender), this instructs Plotly to create a separate subplot (a row) for each unique value. \n", 103 | "\n", 104 | "* facet_col - when we specify a categorical variable here (Country), this instructs Plotly to create a separate subplot (a column) for each unique value.\n", 105 | "\n", 106 | "Thus, we get our 2x1 grid of linegraphs. If we added another country, e.g. Scotland, and used these same parameters we'd get a 2x2 grid, and so on. " 107 | ] 108 | }, 109 | { 110 | "cell_type": "markdown", 111 | "id": "3a66cbaa", 112 | "metadata": {}, 113 | "source": [ 114 | "## Interactive legends\n", 115 | "\n", 116 | "Again, the cool thing about Plotly's legends is that they are interactive by default. Thus, this allows us to omit values which dominate the graph and obscure our ability to get to the nitty gritty of the data.\n" 117 | ] 118 | }, 119 | { 120 | "cell_type": "code", 121 | "execution_count": null, 122 | "id": "6c0a5aea", 123 | "metadata": {}, 124 | "outputs": [], 125 | "source": [ 126 | "# Specify hover_data\n", 127 | "\n", 128 | "hover_data = {'Sexuality': True,\n", 129 | " 'Percentage': ':.2f%',\n", 130 | " 'Country': False,\n", 131 | " 'Year': False,\n", 132 | " 'Gender': True}" 133 | ] 134 | }, 135 | { 136 | "cell_type": "code", 137 | "execution_count": null, 138 | "id": "a4bceeb7", 139 | "metadata": {}, 140 | "outputs": [], 141 | "source": [ 142 | "\n", 143 | "fig6 = px.line(england_df,\n", 144 | " x='Year',\n", 145 | " y='Percentage',\n", 146 | " color='Sexuality',\n", 147 | " facet_row='Gender',\n", 148 | " facet_col='Country',\n", 149 | " hover_data = hover_data,\n", 150 | " title='Sexuality Percentages by Gender in England (2010-2014)',\n", 151 | " markers=True,\n", 152 | " height = 800,\n", 153 | " width = 1000)\n", 154 | "\n", 155 | "# Enhance the layout for readability\n", 156 | 
"fig6.update_layout(title_x = 0.15,\n", 157 | " legend_title_text='Sexuality')\n", 158 | "\n", 159 | "fig6.show()" 160 | ] 161 | }, 162 | { 163 | "cell_type": "code", 164 | "execution_count": null, 165 | "id": "0a510c8f", 166 | "metadata": {}, 167 | "outputs": [], 168 | "source": [ 169 | "# Finally, let's update our x-axis so that it only shows whole years\n", 170 | "\n", 171 | "# dtick \"M12\" - tells plotly to place a tick every 12 months \n", 172 | "fig6.update_xaxes(dtick=\"M12\", tickformat=\"%Y\")" 173 | ] 174 | }, 175 | { 176 | "cell_type": "code", 177 | "execution_count": null, 178 | "id": "f3ee60a7", 179 | "metadata": {}, 180 | "outputs": [], 181 | "source": [] 182 | } 183 | ], 184 | "metadata": { 185 | "kernelspec": { 186 | "display_name": "Python 3 (ipykernel)", 187 | "language": "python", 188 | "name": "python3" 189 | }, 190 | "language_info": { 191 | "codemirror_mode": { 192 | "name": "ipython", 193 | "version": 3 194 | }, 195 | "file_extension": ".py", 196 | "mimetype": "text/x-python", 197 | "name": "python", 198 | "nbconvert_exporter": "python", 199 | "pygments_lexer": "ipython3", 200 | "version": "3.10.13" 201 | }, 202 | "toc": { 203 | "base_numbering": 1, 204 | "nav_menu": {}, 205 | "number_sections": true, 206 | "sideBar": true, 207 | "skip_h1_title": false, 208 | "title_cell": "Table of Contents", 209 | "title_sidebar": "Contents", 210 | "toc_cell": false, 211 | "toc_position": {}, 212 | "toc_section_display": true, 213 | "toc_window_display": true 214 | }, 215 | "varInspector": { 216 | "cols": { 217 | "lenName": 16, 218 | "lenType": 16, 219 | "lenVar": 40 220 | }, 221 | "kernels_config": { 222 | "python": { 223 | "delete_cmd_postfix": "", 224 | "delete_cmd_prefix": "del ", 225 | "library": "var_list.py", 226 | "varRefreshCmd": "print(var_dic_list())" 227 | }, 228 | "r": { 229 | "delete_cmd_postfix": ") ", 230 | "delete_cmd_prefix": "rm(", 231 | "library": "var_list.r", 232 | "varRefreshCmd": "cat(var_dic_list()) " 233 | } 234 | }, 235 | 
"types_to_exclude": [ 236 | "module", 237 | "function", 238 | "builtin_function_or_method", 239 | "instance", 240 | "_Feature" 241 | ], 242 | "window_display": false 243 | } 244 | }, 245 | "nbformat": 4, 246 | "nbformat_minor": 5 247 | } 248 | -------------------------------------------------------------------------------- /Python_code/HTML_files/gi_per.html: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 |
5 |
6 | 7 | -------------------------------------------------------------------------------- /Data/my_plotly_graph.html: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 |
5 |
6 | 7 | -------------------------------------------------------------------------------- /Python_code/HTML_files/gi_age.html: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 |
5 |
6 | 7 | -------------------------------------------------------------------------------- /Python_code/HTML_files/gi_age2.html: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 |
5 |
6 | 7 | -------------------------------------------------------------------------------- /R_code/Line_graph.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": "\n" 7 | }, 8 | { 9 | "cell_type": "code", 10 | "execution_count": null, 11 | "metadata": {}, 12 | "outputs": [], 13 | "source": [ 14 | "knitr::opts_chunk$set(echo = FALSE)\n", 15 | "\n" 16 | ] 17 | }, 18 | { 19 | "cell_type": "markdown", 20 | "metadata": {}, 21 | "source": "\n" 22 | }, 23 | { 24 | "cell_type": "code", 25 | "execution_count": null, 26 | "metadata": {}, 27 | "outputs": [], 28 | "source": [ 29 | "# Allows us to read-in csv files\n", 30 | "library(readr) \n", 31 | "# For data manipulation\n", 32 | "library(dplyr) \n", 33 | "# For regular expression operations \n", 34 | "library(stringr) \n", 35 | "# Used to create interactive visualisations\n", 36 | "library(plotly)\n" 37 | ] 38 | }, 39 | { 40 | "cell_type": "markdown", 41 | "metadata": {}, 42 | "source": [ 43 | "# Dataset 3\n", 44 | "\n", 45 | "This dataset includes sexual identity estimates by gender from 2010 to 2014. This is presented at a UK level, and broken down by England, Wales, Scotland and Northern Ireland. I wanted this guide to include a demo of how to make interactive line graphs with gender identity data, but unfortunately, because this is only the first year the ONS has collected that data, it was not possible. So instead I found a 2015 dataset of experimental statistics from the Integrated Household Survey. For more info, you can check out this [ONS link](https://www.ons.gov.uk/peoplepopulationandcommunity/culturalidentity/sexuality/datasets/sexualidentitybyagegroupbycountry). 
\n" 46 | ] 47 | }, 48 | { 49 | "cell_type": "code", 50 | "execution_count": null, 51 | "metadata": {}, 52 | "outputs": [], 53 | "source": [ 54 | "# Load in dataset\n", 55 | "\n", 56 | "df3 <- read_csv('../Data/cleaned_sexuality_df.csv')\n" 57 | ] 58 | }, 59 | { 60 | "cell_type": "markdown", 61 | "metadata": {}, 62 | "source": "\n" 63 | }, 64 | { 65 | "cell_type": "code", 66 | "execution_count": null, 67 | "metadata": {}, 68 | "outputs": [], 69 | "source": [ 70 | "# Brief glimpse at underlying data structure\n", 71 | "\n", 72 | "head(df3, 10)\n" 73 | ] 74 | }, 75 | { 76 | "cell_type": "markdown", 77 | "metadata": {}, 78 | "source": [ 79 | "## Data cleaning\n", 80 | "\n", 81 | "When I first found this dataset it was very messy and formatted terribly, so I performed some cleaning on it in a separate jupyter notebook, to save cluttering this one and distracting from the main tutorial. If you'd like to see how I cleaned it up, please see the ['Data_cleaning_sexuality.ipynb'](Data_cleaning_sexuality.ipynb) notebook. \n", 82 | "\n", 83 | "## Data pre-processing\n", 84 | "\n", 85 | "The only pre-processing we're going to do is subset our data by country, and also create 2 separate datasets for Gender = Men and Gender = Women. I'll explain why this step is needed soon. \n" 86 | ] 87 | }, 88 | { 89 | "cell_type": "code", 90 | "execution_count": null, 91 | "metadata": {}, 92 | "outputs": [], 93 | "source": [ 94 | "# Filter dataset to focus on England\n", 95 | "england_df <- df3 %>%\n", 96 | " filter(Country == 'England')\n" 97 | ] 98 | }, 99 | { 100 | "cell_type": "markdown", 101 | "metadata": {}, 102 | "source": "\n" 103 | }, 104 | { 105 | "cell_type": "code", 106 | "execution_count": null, 107 | "metadata": {}, 108 | "outputs": [], 109 | "source": [ 110 | "# Let's check it worked.. 
\n", 111 | "\n", 112 | "unique(england_df$Country)\n" 113 | ] 114 | }, 115 | { 116 | "cell_type": "markdown", 117 | "metadata": {}, 118 | "source": "\n" 119 | }, 120 | { 121 | "cell_type": "code", 122 | "execution_count": null, 123 | "metadata": {}, 124 | "outputs": [], 125 | "source": [ 126 | "# Further filter data for each gender\n", 127 | "\n", 128 | "men <- england_df %>% filter(Gender == \"Men\")\n", 129 | "women <- england_df %>% filter(Gender == \"Women\")\n", 130 | "\n", 131 | "# Let's check it worked\n", 132 | "\n", 133 | "unique(men$Gender)\n", 134 | "unique(women$Gender)\n" 135 | ] 136 | }, 137 | { 138 | "cell_type": "markdown", 139 | "metadata": {}, 140 | "source": [ 141 | "## Interactive linegraph\n", 142 | "\n", 143 | "Creating a simple line graph in plotly is pretty easy, but where plotly struggles (in R) is in handling facet plots. A facet plot is a type of visualisation that divides data into subplots based on categorical variables. What I'd like to do is create a facet plot of sexuality percentages in England (2010-2014) with individual subplots for our two genders. This is achieved easily in Python due to the plotly.express module, which provides a simple way to create facet plots. Unfortunately, we'll have to go through a bit more of a longwinded route, where we'll manually create our individual plots for each gender, then combine them using the subplot function. Also, plotly.express automatically manages legends to ensure they're unified across facets, but R's plotly requires that we manually sync up these legends. Womp womp. Let's get to it. 
\n" 144 | ] 145 | }, 146 | { 147 | "cell_type": "code", 148 | "execution_count": null, 149 | "metadata": {}, 150 | "outputs": [], 151 | "source": [ 152 | "# Create individual plot for each gender\n", 153 | "\n", 154 | "# Create plots for each gender\n", 155 | "men_plot <- plot_ly(men, \n", 156 | " x = ~Year, \n", 157 | " y = ~Percentage, \n", 158 | " color = ~Sexuality, \n", 159 | " type = 'scatter', \n", 160 | " # mode used to make sure our data points are connected by lines across the years\n", 161 | " mode = 'lines+markers', \n", 162 | " hoverinfo = 'text',\n", 163 | " text = ~paste(\"Year:\", Year, \"
<br>Percentage:\", Percentage, \"<br>
Sexuality:\", Sexuality),\n", 164 | " # legendgroup parameter ensures that data points relating to the same category are synced across plots\n", 165 | " legendgroup = ~Sexuality,\n", 166 | " # showlegend parameter set to TRUE only for this plot to avoid duplicate legends\n", 167 | " showlegend = TRUE) %>%\n", 168 | " layout(xaxis = list(title = 'Year', tickvals = 2010:2014, ticktext = 2010:2014),\n", 169 | " yaxis = list(title = 'Percentage'),\n", 170 | " # Here we add an annotation to the graph to label the first subplot \"Men\"\n", 171 | " # Setting xref and yref to 'paper' simply means the annotation won't move if we zoom in or out\n", 172 | " annotations = list(\n", 173 | " list(x = 0.5, y = 1.05, text = \"Men\", showarrow = FALSE, xref='paper', yref='paper')))\n", 174 | "\n", 175 | "\n", 176 | "women_plot <- plot_ly(women, \n", 177 | " x = ~Year, \n", 178 | " y = ~Percentage, \n", 179 | " color = ~Sexuality, \n", 180 | " type = 'scatter', \n", 181 | " mode = 'lines+markers', \n", 182 | " hoverinfo = 'text',\n", 183 | " text = ~paste(\"Year:\", Year, \"
<br>Percentage:\", Percentage, \"<br>
Sexuality:\", Sexuality),\n", 184 | " legendgroup = ~Sexuality,\n", 185 | " showlegend = FALSE) %>%\n", 186 | " layout(xaxis = list(title = 'Year', tickvals = 2010:2014, ticktext = 2010:2014),\n", 187 | " yaxis = list(title = 'Percentage'),\n", 188 | " annotations = list(\n", 189 | " list(x = 0.5, y = 1.05, text = \"Women\", showarrow = FALSE, xref='paper', yref='paper')))\n", 190 | "\n", 191 | "# Let's take a look at one of these graphs\n", 192 | "\n", 193 | "women_plot\n" 194 | ] 195 | }, 196 | { 197 | "cell_type": "markdown", 198 | "metadata": {}, 199 | "source": "\n" 200 | }, 201 | { 202 | "cell_type": "code", 203 | "execution_count": null, 204 | "metadata": {}, 205 | "outputs": [], 206 | "source": [ 207 | "# Combine individual plots using subplot\n", 208 | "# Within subplot, define number of rows, make sure share same x axes and both axes titles\n", 209 | "fig5 <- subplot(men_plot, women_plot, nrows = 2, shareX = TRUE, titleX = TRUE, titleY = TRUE) %>%\n", 210 | " layout(\n", 211 | " title = list(\n", 212 | " text = 'Sexuality Percentages by Gender in England (2010-2014)', \n", 213 | " y = 0.98, # Move the title higher up\n", 214 | " x = 0.5, # Center the title\n", 215 | " xanchor = \"center\",\n", 216 | " yanchor = \"top\"\n", 217 | " ),\n", 218 | " margin = list(t = 100), # Add space at the top for the title\n", 219 | " height = 800,\n", 220 | " width = 1000\n", 221 | " )\n", 222 | "\n", 223 | "fig5\n" 224 | ] 225 | } 226 | ], 227 | "metadata": { 228 | "anaconda-cloud": "", 229 | "kernelspec": { 230 | "display_name": "R", 231 | "langauge": "R", 232 | "name": "ir" 233 | }, 234 | "language_info": { 235 | "codemirror_mode": "r", 236 | "file_extension": ".r", 237 | "mimetype": "text/x-r-source", 238 | "name": "R", 239 | "pygments_lexer": "r", 240 | "version": "3.4.1" 241 | } 242 | }, 243 | "nbformat": 4, 244 | "nbformat_minor": 1 245 | } 246 | -------------------------------------------------------------------------------- /Data/cleaned_sexuality_df.csv: 
-------------------------------------------------------------------------------- 1 | Country,Gender,Sexuality,Year,Percentage 2 | England,Men,Heterosexual / Straight,2010,93.46 3 | England,Men,Gay / Lesbian,2010,1.41 4 | England,Men,Bisexual,2010,0.38 5 | England,Men,Other,2010,0.44 6 | England,Men,Don't know/refuse,2010,3.68 7 | England,Men,Non-response,2010,0.63 8 | England,Men,Heterosexual / Straight,2011,93.61 9 | England,Men,Gay / Lesbian,2011,1.43 10 | England,Men,Bisexual,2011,0.39 11 | England,Men,Other,2011,0.38 12 | England,Men,Don't know/refuse,2011,3.61 13 | England,Men,Non-response,2011,0.58 14 | England,Men,Heterosexual / Straight,2012,93.0 15 | England,Men,Gay / Lesbian,2012,1.52 16 | England,Men,Bisexual,2012,0.32 17 | England,Men,Other,2012,0.32 18 | England,Men,Don't know/refuse,2012,3.66 19 | England,Men,Non-response,2012,1.18 20 | England,Men,Heterosexual / Straight,2013,91.96 21 | England,Men,Gay / Lesbian,2013,1.65 22 | England,Men,Bisexual,2013,0.36 23 | England,Men,Other,2013,0.26 24 | England,Men,Don't know/refuse,2013,4.15 25 | England,Men,Non-response,2013,1.61 26 | England,Men,Heterosexual / Straight,2014,92.24 27 | England,Men,Gay / Lesbian,2014,1.54 28 | England,Men,Bisexual,2014,0.32 29 | England,Men,Other,2014,0.31 30 | England,Men,Don't know/refuse,2014,3.95 31 | England,Men,Non-response,2014,1.63 32 | England,Women,Heterosexual / Straight,2010,94.31 33 | England,Women,Gay / Lesbian,2010,0.64 34 | England,Women,Bisexual,2010,0.61 35 | England,Women,Other,2010,0.34 36 | England,Women,Don't know/refuse,2010,3.61 37 | England,Women,Non-response,2010,0.5 38 | England,Women,Heterosexual / Straight,2011,94.21 39 | England,Women,Gay / Lesbian,2011,0.66 40 | England,Women,Bisexual,2011,0.61 41 | England,Women,Other,2011,0.29 42 | England,Women,Don't know/refuse,2011,3.78 43 | England,Women,Non-response,2011,0.44 44 | England,Women,Heterosexual / Straight,2012,93.51 45 | England,Women,Gay / Lesbian,2012,0.74 46 | 
England,Women,Bisexual,2012,0.55 47 | England,Women,Other,2012,0.25 48 | England,Women,Don't know/refuse,2012,3.98 49 | England,Women,Non-response,2012,0.97 50 | England,Women,Heterosexual / Straight,2013,92.93 51 | England,Women,Gay / Lesbian,2013,0.82 52 | England,Women,Bisexual,2013,0.55 53 | England,Women,Other,2013,0.26 54 | England,Women,Don't know/refuse,2013,4.06 55 | England,Women,Non-response,2013,1.37 56 | England,Women,Heterosexual / Straight,2014,92.82 57 | England,Women,Gay / Lesbian,2014,0.68 58 | England,Women,Bisexual,2014,0.69 59 | England,Women,Other,2014,0.34 60 | England,Women,Don't know/refuse,2014,4.19 61 | England,Women,Non-response,2014,1.28 62 | Nireland,Men,Heterosexual / Straight,2010,94.25 63 | Nireland,Men,Gay / Lesbian,2010,1.22 64 | Nireland,Men,Bisexual,2010, 65 | Nireland,Men,Other,2010,0.28 66 | Nireland,Men,Don't know/refuse,2010,3.88 67 | Nireland,Men,Non-response1,2010, 68 | Nireland,Men,Heterosexual / Straight,2011,94.73 69 | Nireland,Men,Gay / Lesbian,2011,1.01 70 | Nireland,Men,Bisexual,2011,0.23 71 | Nireland,Men,Other,2011,0.28 72 | Nireland,Men,Don't know/refuse,2011,3.68 73 | Nireland,Men,Non-response1,2011, 74 | Nireland,Men,Heterosexual / Straight,2012,94.65 75 | Nireland,Men,Gay / Lesbian,2012,0.75 76 | Nireland,Men,Bisexual,2012,0.51 77 | Nireland,Men,Other,2012,0.58 78 | Nireland,Men,Don't know/refuse,2012,2.7 79 | Nireland,Men,Non-response1,2012,0.82 80 | Nireland,Men,Heterosexual / Straight,2013,93.41 81 | Nireland,Men,Gay / Lesbian,2013,0.89 82 | Nireland,Men,Bisexual,2013,0.42 83 | Nireland,Men,Other,2013,0.51 84 | Nireland,Men,Don't know/refuse,2013,3.68 85 | Nireland,Men,Non-response1,2013,1.08 86 | Nireland,Men,Heterosexual / Straight,2014,91.46 87 | Nireland,Men,Gay / Lesbian,2014,1.18 88 | Nireland,Men,Bisexual,2014,0.51 89 | Nireland,Men,Other,2014,0.36 90 | Nireland,Men,Don't know/refuse,2014,5.08 91 | Nireland,Men,Non-response1,2014,1.41 92 | Nireland,Women,Heterosexual / Straight,2010,93.54 93 | 
Nireland,Women,Gay / Lesbian,2010,0.53 94 | Nireland,Women,Bisexual,2010,0.24 95 | Nireland,Women,Other,2010,0.56 96 | Nireland,Women,Don't know/refuse,2010,4.38 97 | Nireland,Women,Non-response,2010,0.75 98 | Nireland,Women,Heterosexual / Straight,2011,95.97 99 | Nireland,Women,Gay / Lesbian,2011,0.2 100 | Nireland,Women,Bisexual,2011,0.51 101 | Nireland,Women,Other,2011,0.35 102 | Nireland,Women,Don't know/refuse,2011,2.97 103 | Nireland,Women,Non-response,2011, 104 | Nireland,Women,Heterosexual / Straight,2012,94.9 105 | Nireland,Women,Gay / Lesbian,2012,0.2 106 | Nireland,Women,Bisexual,2012,0.83 107 | Nireland,Women,Other,2012,0.12 108 | Nireland,Women,Don't know/refuse,2012,3.06 109 | Nireland,Women,Non-response,2012,0.89 110 | Nireland,Women,Heterosexual / Straight,2013,94.17 111 | Nireland,Women,Gay / Lesbian,2013, 112 | Nireland,Women,Bisexual,2013,1.21 113 | Nireland,Women,Other,2013,0.12 114 | Nireland,Women,Don't know/refuse,2013,3.57 115 | Nireland,Women,Non-response,2013,0.73 116 | Nireland,Women,Heterosexual / Straight,2014,94.45 117 | Nireland,Women,Gay / Lesbian,2014, 118 | Nireland,Women,Bisexual,2014,1.38 119 | Nireland,Women,Other,2014,0.16 120 | Nireland,Women,Don't know/refuse,2014,2.93 121 | Nireland,Women,Non-response,2014,0.95 122 | Scotland,Men,Heterosexual / Straight,2010,94.89 123 | Scotland,Men,Gay / Lesbian,2010,1.14 124 | Scotland,Men,Bisexual,2010,0.28 125 | Scotland,Men,Other,2010,0.32 126 | Scotland,Men,Don't know/refuse,2010,2.16 127 | Scotland,Men,Non-response,2010,1.2 128 | Scotland,Men,Heterosexual / Straight,2011,94.94 129 | Scotland,Men,Gay / Lesbian,2011,1.37 130 | Scotland,Men,Bisexual,2011,0.46 131 | Scotland,Men,Other,2011,0.31 132 | Scotland,Men,Don't know/refuse,2011,1.99 133 | Scotland,Men,Non-response,2011,0.92 134 | Scotland,Men,Heterosexual / Straight,2012,94.78 135 | Scotland,Men,Gay / Lesbian,2012,1.29 136 | Scotland,Men,Bisexual,2012,0.26 137 | Scotland,Men,Other,2012,0.28 138 | Scotland,Men,Don't 
know/refuse,2012,2.26 139 | Scotland,Men,Non-response,2012,1.13 140 | Scotland,Men,Heterosexual / Straight,2013,94.43 141 | Scotland,Men,Gay / Lesbian,2013,0.97 142 | Scotland,Men,Bisexual,2013,0.29 143 | Scotland,Men,Other,2013,0.14 144 | Scotland,Men,Don't know/refuse,2013,2.45 145 | Scotland,Men,Non-response,2013,1.71 146 | Scotland,Men,Heterosexual / Straight,2014,94.37 147 | Scotland,Men,Gay / Lesbian,2014,1.09 148 | Scotland,Men,Bisexual,2014,0.29 149 | Scotland,Men,Other,2014,0.25 150 | Scotland,Men,Don't know/refuse,2014,2.55 151 | Scotland,Men,Non-response,2014,1.46 152 | Scotland,Women,Heterosexual / Straight,2010,94.81 153 | Scotland,Women,Gay / Lesbian,2010,0.68 154 | Scotland,Women,Bisexual,2010,0.36 155 | Scotland,Women,Other,2010,0.37 156 | Scotland,Women,Don't know/refuse,2010,2.63 157 | Scotland,Women,Non-response,2010,1.15 158 | Scotland,Women,Heterosexual / Straight,2011,94.87 159 | Scotland,Women,Gay / Lesbian,2011,0.85 160 | Scotland,Women,Bisexual,2011,0.48 161 | Scotland,Women,Other,2011,0.29 162 | Scotland,Women,Don't know/refuse,2011,2.71 163 | Scotland,Women,Non-response,2011,0.79 164 | Scotland,Women,Heterosexual / Straight,2012,95.02 165 | Scotland,Women,Gay / Lesbian,2012,0.76 166 | Scotland,Women,Bisexual,2012,0.47 167 | Scotland,Women,Other,2012,0.27 168 | Scotland,Women,Don't know/refuse,2012,2.43 169 | Scotland,Women,Non-response,2012,1.05 170 | Scotland,Women,Heterosexual / Straight,2013,94.22 171 | Scotland,Women,Gay / Lesbian,2013,1.06 172 | Scotland,Women,Bisexual,2013,0.35 173 | Scotland,Women,Other,2013,0.27 174 | Scotland,Women,Don't know/refuse,2013,2.6 175 | Scotland,Women,Non-response,2013,1.5 176 | Scotland,Women,Heterosexual / Straight,2014,94.9 177 | Scotland,Women,Gay / Lesbian,2014,0.56 178 | Scotland,Women,Bisexual,2014,0.36 179 | Scotland,Women,Other,2014,0.26 180 | Scotland,Women,Don't know/refuse,2014,2.63 181 | Scotland,Women,Non-response,2014,1.28 182 | UK,Men,Heterosexual / Straight,2010,93.63 183 | UK,Men,Gay 
/ Lesbian,2010,1.37 184 | UK,Men,Bisexual,2010,0.36 185 | UK,Men,Other,2010,0.42 186 | UK,Men,Don't know/refuse,2010,3.52 187 | UK,Men,Non-response,2010,0.7 188 | UK,Men,Heterosexual / Straight,2011,93.83 189 | UK,Men,Gay / Lesbian,2011,1.39 190 | UK,Men,Bisexual,2011,0.39 191 | UK,Men,Other,2011,0.37 192 | UK,Men,Don't know/refuse,2011,3.42 193 | UK,Men,Non-response,2011,0.6 194 | UK,Men,Heterosexual / Straight,2012,93.24 195 | UK,Men,Gay / Lesbian,2012,1.46 196 | UK,Men,Bisexual,2012,0.33 197 | UK,Men,Other,2012,0.33 198 | UK,Men,Don't know/refuse,2012,3.48 199 | UK,Men,Non-response,2012,1.17 200 | UK,Men,Heterosexual / Straight,2013,92.26 201 | UK,Men,Gay / Lesbian,2013,1.55 202 | UK,Men,Bisexual,2013,0.36 203 | UK,Men,Other,2013,0.26 204 | UK,Men,Don't know/refuse,2013,3.94 205 | UK,Men,Non-response,2013,1.62 206 | UK,Men,Heterosexual / Straight,2014,92.46 207 | UK,Men,Gay / Lesbian,2014,1.49 208 | UK,Men,Bisexual,2014,0.32 209 | UK,Men,Other,2014,0.32 210 | UK,Men,Don't know/refuse,2014,3.81 211 | UK,Men,Non-response,2014,1.59 212 | UK,Women,Heterosexual / Straight,2010,94.35 213 | UK,Women,Gay / Lesbian,2010,0.64 214 | UK,Women,Bisexual,2010,0.56 215 | UK,Women,Other,2010,0.35 216 | UK,Women,Don't know/refuse,2010,3.5 217 | UK,Women,Non-response,2010,0.6 218 | UK,Women,Heterosexual / Straight,2011,94.36 219 | UK,Women,Gay / Lesbian,2011,0.67 220 | UK,Women,Bisexual,2011,0.59 221 | UK,Women,Other,2011,0.29 222 | UK,Women,Don't know/refuse,2011,3.61 223 | UK,Women,Non-response,2011,0.48 224 | UK,Women,Heterosexual / Straight,2012,93.74 225 | UK,Women,Gay / Lesbian,2012,0.72 226 | UK,Women,Bisexual,2012,0.54 227 | UK,Women,Other,2012,0.25 228 | UK,Women,Don't know/refuse,2012,3.76 229 | UK,Women,Non-response,2012,0.99 230 | UK,Women,Heterosexual / Straight,2013,93.13 231 | UK,Women,Gay / Lesbian,2013,0.82 232 | UK,Women,Bisexual,2013,0.55 233 | UK,Women,Other,2013,0.27 234 | UK,Women,Don't know/refuse,2013,3.86 235 | UK,Women,Non-response,2013,1.38 236 | 
UK,Women,Heterosexual / Straight,2014,93.12 237 | UK,Women,Gay / Lesbian,2014,0.65 238 | UK,Women,Bisexual,2014,0.68 239 | UK,Women,Other,2014,0.33 240 | UK,Women,Don't know/refuse,2014,3.97 241 | UK,Women,Non-response,2014,1.26 242 | Wales,Men,Heterosexual / Straight,2010,94.0 243 | Wales,Men,Gay / Lesbian,2010,1.21 244 | Wales,Men,Bisexual,2010,0.33 245 | Wales,Men,Other,2010,0.3 246 | Wales,Men,Don't know/refuse,2010,2.96 247 | Wales,Men,Non-response,2010,1.2 248 | Wales,Men,Heterosexual / Straight,2011,95.28 249 | Wales,Men,Gay / Lesbian,2011,1.02 250 | Wales,Men,Bisexual,2011,0.31 251 | Wales,Men,Other,2011,0.24 252 | Wales,Men,Don't know/refuse,2011,2.33 253 | Wales,Men,Non-response,2011,0.83 254 | Wales,Men,Heterosexual / Straight,2012,93.83 255 | Wales,Men,Gay / Lesbian,2012,1.15 256 | Wales,Men,Bisexual,2012,0.5 257 | Wales,Men,Other,2012,0.46 258 | Wales,Men,Don't know/refuse,2012,2.78 259 | Wales,Men,Non-response,2012,1.28 260 | Wales,Men,Heterosexual / Straight,2013,93.13 261 | Wales,Men,Gay / Lesbian,2013,1.22 262 | Wales,Men,Bisexual,2013,0.38 263 | Wales,Men,Other,2013,0.32 264 | Wales,Men,Don't know/refuse,2013,3.05 265 | Wales,Men,Non-response,2013,1.9 266 | Wales,Men,Heterosexual / Straight,2014,93.6 267 | Wales,Men,Gay / Lesbian,2014,1.48 268 | Wales,Men,Bisexual,2014,0.24 269 | Wales,Men,Other,2014,0.49 270 | Wales,Men,Don't know/refuse,2014,2.95 271 | Wales,Men,Non-response,2014,1.24 272 | Wales,Women,Heterosexual / Straight,2010,94.8 273 | Wales,Women,Gay / Lesbian,2010,0.54 274 | Wales,Women,Bisexual,2010,0.2 275 | Wales,Women,Other,2010,0.39 276 | Wales,Women,Don't know/refuse,2010,2.73 277 | Wales,Women,Non-response,2010,1.34 278 | Wales,Women,Heterosexual / Straight,2011,95.03 279 | Wales,Women,Gay / Lesbian,2011,0.77 280 | Wales,Women,Bisexual,2011,0.33 281 | Wales,Women,Other,2011,0.32 282 | Wales,Women,Don't know/refuse,2011,2.62 283 | Wales,Women,Non-response,2011,0.92 284 | Wales,Women,Heterosexual / Straight,2012,94.66 285 | 
Wales,Women,Gay / Lesbian,2012,0.56 286 | Wales,Women,Bisexual,2012,0.36 287 | Wales,Women,Other,2012,0.34 288 | Wales,Women,Don't know/refuse,2012,2.81 289 | Wales,Women,Non-response,2012,1.28 290 | Wales,Women,Heterosexual / Straight,2013,93.93 291 | Wales,Women,Gay / Lesbian,2013,0.67 292 | Wales,Women,Bisexual,2013,0.53 293 | Wales,Women,Other,2013,0.39 294 | Wales,Women,Don't know/refuse,2013,2.78 295 | Wales,Women,Non-response,2013,1.71 296 | Wales,Women,Heterosexual / Straight,2014,94.25 297 | Wales,Women,Gay / Lesbian,2014,0.62 298 | Wales,Women,Bisexual,2014,0.7 299 | Wales,Women,Other,2014,0.4 300 | Wales,Women,Don't know/refuse,2014,3.04 301 | Wales,Women,Non-response,2014,0.99 302 | -------------------------------------------------------------------------------- /Python_code/HTML_files/line.html: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 |
5 |
6 | 7 | -------------------------------------------------------------------------------- /R_code/Leaflet_map.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "Interactive Mapping" 3 | output: html_document 4 | editor_options: 5 | markdown: 6 | wrap: sentence 7 | --- 8 | 9 | ```{r setup, include=FALSE} 10 | knitr::opts_chunk$set(echo=TRUE) 11 | ``` 12 | 13 | ## Guide 14 | 15 | In this notebook we will create an interactive map of the UK which displays the % of Trans men in each local authority. 16 | There are 331 local authorities in England and Wales, and we are using data collected in the 2021 UK Census which included 2 new questions on sexuality and gender identity. 17 | The datasets used are: 18 | 19 | - [Gender identity (detailed)](https://www.ons.gov.uk/datasets/TS070/editions/2021/versions/3) - this dataset classifies usual residents aged 16 years and over in England and Wales by gender identity. 20 | - [Local Authority District Boundaries](https://geoportal.statistics.gov.uk/datasets/bb53f91cce9e4fd6b661dc0a6c734a3f_0/about) - this file contains the digital vector boundaries for Local Authority Districts in the UK as of May 2022. 21 | 22 | ## Install packages 23 | 24 | If you're running this code on your own PC (and not through the Binder link) then you're going to want to uncomment the lines below so you can install the requisite packages. Another thing to remember is to set your working directory to the correct folder. Otherwise reading in data will be difficult. 
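This repo pairs each R notebook with a Python counterpart (see Folium_map.ipynb), so for readers on the Python side it may help to see the core calculation this notebook builds up to — each gender identity category's share of its local authority's total — sketched in pandas. This is an illustrative sketch with made-up toy counts, not the repo's actual code:

```python
import pandas as pd

# Toy counts per gender identity category in two local authorities (made-up numbers)
df = pd.DataFrame({
    "LA_name": ["Hartlepool", "Hartlepool", "Darlington", "Darlington"],
    "GI_cat": ["Gender identity the same as sex registered at birth", "Trans man"] * 2,
    "Observation": [900, 100, 950, 50],
})

# Mirror of the R group_by() + mutate() step: each category's share of its LA total
la_totals = df.groupby("LA_name")["Observation"].transform("sum")
df["Percentage"] = (df["Observation"] / la_totals * 100).round(2)

# Mirror of the later filter() step: keep only the Trans man rows
trans_men = df[df["GI_cat"] == "Trans man"].drop(columns="Observation")
print(trans_men)
```

With a real TS070 extract in place of the toy frame, the same `groupby().transform()` idiom should reproduce the Percentage column this notebook creates with group_by() and mutate().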
25 | 26 | ```{r} 27 | # install.packages("leaflet") 28 | # install.packages("sf") 29 | # install.packages("dplyr") 30 | # install.packages("readr") 31 | ``` 32 | 33 | ## Import libraries 34 | 35 | ```{r} 36 | 37 | # used to read-in datasets 38 | library(readr) 39 | # used to manipulate datasets 40 | library(dplyr) 41 | # used to read-in spatial data, shapefiles 42 | library(sf) 43 | # used to create interactive maps 44 | library(leaflet) 45 | # used to scrape data from websites 46 | library(httr) 47 | 48 | ``` 49 | 50 | ## Read-in dataset 51 | 52 | ```{r} 53 | # First, let's read in our gender identity dataset 54 | 55 | df <- read_csv('../Data/GI_det.csv') 56 | ``` 57 | 58 | 59 | ```{r} 60 | # Use head function to check out the first few rows - but can also access df via environment pane 61 | 62 | head(df, 10) 63 | ``` 64 | 65 | ## Data Cleaning 66 | 67 | Before we can calculate the %'s of trans men in each local authority, it's good to do some housekeeping and get our dataframe in order. 68 | 69 | There are a few things that need sorting, including: 70 | 71 | 1. renaming columns so they are easier to reference 72 | 2. removing 'Does not apply' from gender identity category 73 | 74 | 75 | ### Pipe operator - %\>% 76 | 77 | The pipe operator is used to pass the result of one function directly into the next one. 78 | E.g. let's say we had some code: 79 | 80 | ```{} 81 | sorted_data <- my_data %>% filter(condition) %>% arrange(sorting_variable) 82 | ``` 83 | 84 | What we're doing is using the pipe operator to pass my_data to the filter() function, and the result of this is then passed to the arrange() function. 85 | 86 | Basically, pipes allow us to chain together a sequence of functions in a way that's easy to read and understand. 87 | 88 | In the code below we use the pipe operator to pass our dataframe to the rename function. 89 | 90 | This supplies the rename function with its first argument, which is the dataframe to operate on. 
91 | 92 | ### 1 93 | 94 | ```{r} 95 | # Rename columns using the rename function from dplyr 96 | # Specify what you want to rename the column to, and supply the original column string 97 | 98 | df <- df %>% 99 | rename(LA_code = `Lower tier local authorities Code`, 100 | # backticks ` necessary when names are syntactically invalid, e.g. spaces, special characters etc. 101 | LA_name = `Lower tier local authorities`, 102 | GI_code = `Gender identity (8 categories) Code`, 103 | GI_cat = `Gender identity (8 categories)`) 104 | ``` 105 | 106 | 107 | ```{r} 108 | # Let's use the colnames function to see if it worked 109 | 110 | colnames(df) 111 | ``` 112 | 113 | ### 2 114 | 115 | ### Logical operators - ==, !=, \<, \>, \<=, \>=, &, \|, ! 116 | 117 | Logical operators are used to perform comparisons between values or expressions, which result in a logical (Boolean) value of 'TRUE' or 'FALSE'. 118 | 119 | In the code below we use the '!=' 'Does not equal' operator, which tests if the GI_cat value in each row of the df does not equal the string 'Does not apply'. 120 | 121 | For each row where GI_cat is not equal to 'Does not apply', the expression evaluates to TRUE. 122 | 123 | We filter so we only keep rows where this expression evaluates to TRUE. 124 | 125 | ```{r} 126 | 127 | # Use dplyr's filter function to get rid of 'Does not apply' 128 | # Use '!=' to keep everything except 'Does not apply' category 129 | 130 | df <- df %>% filter(GI_cat != 'Does not apply') 131 | 132 | ``` 133 | 134 | ### Dollar sign operator - $ 135 | 136 | This operator is used to access elements, such as columns of a dataframe, by name. Below, we use it to access the gender identity category column, where we want to view the unique values.
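First, a quick standalone illustration using the built-in mtcars data - $ pulls out a single column, and unique() then lists its distinct values:

```{r}
# Access the 'cyl' column by name with $, then list its distinct values
unique(mtcars$cyl)
```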
137 | 138 | ```{r} 139 | # Unique function can be applied to a column in a df to see which values are in that column 140 | # Let's see if 'Does not apply' has been successfully dropped 141 | 142 | unique(df$GI_cat) 143 | 144 | ``` 145 | 146 | 147 | ## Data Pre-processing 148 | 149 | Now onto the more interesting stuff. 150 | The data pre-processing stage involves preparing and transforming data into a suitable format for further analysis. It can involve selecting features, transforming variables, and creating new variables. For our purposes, we need to create a new column 'Percentage' which contains the % of Trans men in each local authority. 151 | 152 | So, we'll need to first calculate the % of each gender identity category for each local authority. Then, we'll want to filter our dataset so that we only keep the responses related to Trans men. 153 | 154 | ```{r} 155 | # Use group_by to group the dataframe by the LA_name column 156 | # Use mutate to perform calculation within each LA_name group, convert result to a % by multiplying by 100 157 | # round() is used to round %'s to 2 decimal places 158 | 159 | df <- df %>% 160 | group_by(LA_name) %>% 161 | mutate(Percentage = round(Observation / sum(Observation) * 100, 2)) 162 | ``` 163 | 164 | 165 | ```{r} 166 | # Let's check out the results 167 | 168 | head(df, 10) 169 | ``` 170 | 171 | ```{r} 172 | # Use filter() to only keep rows where GI_cat equals 'Trans man' 173 | df <- df %>% 174 | filter(GI_cat == 'Trans man') %>% 175 | # Use select() with '-' to remove 'Observation' column 176 | select(-Observation) %>% 177 | # Use distinct() to remove duplicate rows, as a precaution 178 | distinct() %>% 179 | # Use ungroup() to remove grouping - resetting the dataframe's state after performing group operations is good practice 180 | ungroup() 181 | ``` 182 | 183 | 184 | ```{r} 185 | # Let's take a look at the results 186 | head(df) 187 | ``` 188 | 189 | ## Read-in shapefile 190 | 191 | Now that we have our gender identity
dataset sorted, we can start on the mapping process. And that starts with reading in our shapefile, which we should have downloaded from the geoportal. If (like me) you don't work with spatial data much, you might assume that you only need the shapefile, and you might delete the others that come with the folder. However, a shapefile is not just a single .shp file, but a collection of files that work together, and each of these files plays a crucial role in defining the shapefile's data and behaviour. When you try to read a shapefile into R, the software expects all components to be present, and missing them can lead to errors or incorrect spatial references. E.g. without the .dbf file, you'd lose all attribute data associated with the geographic features, and without the .shx file you might not be able to read the .shp file at all. 192 | 193 | **TLDR: Make sure when you download the shapefile folder you keep all the files!** 194 | 195 | Anyway, let's get started. 196 | 197 | ```{r} 198 | # Download shapefiles from geoportal 199 | 200 | # URL for the direct download of the shapefile 201 | url <- "https://services1.arcgis.com/ESMARspQHYMw9BZ9/arcgis/rest/services/Local_Authority_Districts_May_2022_UK_BFE_V3_2022/FeatureServer/replicafilescache/Local_Authority_Districts_May_2022_UK_BFE_V3_2022_3331011932393166417.zip" 202 | 203 | # Create a temporary directory 204 | tmp_dir <- tempdir() 205 | print(paste("Created temporary directory:", tmp_dir)) 206 | 207 | # Set destination file path 208 | dest_file <- file.path(tmp_dir, "shapefile.zip") 209 | 210 | # Download the shapefile 211 | response <- GET(url, write_disk(dest_file, overwrite = TRUE)) 212 | 213 | # Check if the download was successful 214 | if (response$status_code == 200) { 215 | print("Download successful") 216 | } 217 | # Unzip the file within the temporary directory 218 | unzip(dest_file, exdir = tmp_dir) 219 | print(paste("Files extracted to:", tmp_dir)) 220 | 221 | # List all files in the temporary
directory to verify extraction 222 | extracted_files <- list.files(tmp_dir) 223 | print("Extracted files:") 224 | print(extracted_files) 225 | 226 | # Define the path to the actual shapefile (.shp) 227 | shapefile_path <- file.path(tmp_dir, 'LAD_MAY_2022_UK_BFE_V3.shp') 228 | 229 | # Read in shapefile to a simple features object 230 | # st_read() reads in spatial data to a 'simple features' object 231 | sf <- st_read(shapefile_path) 232 | print("Shapefile loaded successfully.") 233 | 234 | 235 | ``` 236 | 237 | 238 | 239 | ```{r} 240 | # Let's check it out 241 | head(sf) 242 | # Better to just view via environment pane 243 | ``` 244 | 245 | 246 | ```{r} 247 | # Inspect dimensions 248 | dim(sf) 249 | ``` 250 | 251 | ```{r} 252 | # length() with the unique() function gives us the number of unique values in a column 253 | 254 | length(unique(sf$LAD22NM)) 255 | ``` 256 | 257 | ## Cleaning shapefile 258 | 259 | Hmm. We have 331 local authorities in our dataset that we want to plot, but there are 374 listed here. 260 | We'll need to remove the local authorities that don't match the ones in our df. 261 | 262 | 1. rename columns to match 'df' 263 | 2. get rid of redundant Local Authorities 264 | 265 | ### 1 266 | 267 | ```{r} 268 | # Use rename function so sf columns match those in original df 269 | 270 | sf <- sf %>% 271 | rename(LA_code = LAD22CD, 272 | LA_name = LAD22NM) 273 | 274 | # Let's see if it worked 275 | colnames(sf) 276 | ``` 277 | 278 | ```{r} 279 | # Replace specific values in the LA_name column using recode() 280 | 281 | sf$LA_name <- sf$LA_name %>% 282 | recode(`Bristol, City of` = "Bristol", 283 | `Kingston upon Hull, City of` = "Kingston upon Hull", 284 | `Herefordshire, County of` = "Herefordshire") 285 | ``` 286 | 287 | ### 2 288 | 289 | ### %in% operator 290 | 291 | This is used to check if elements of one list are in another list. 292 | Much like the logical operators, it returns a Boolean value, TRUE or FALSE.
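Here is a minimal standalone example of %in% (the place names are made up for illustration):

```{r}
# Returns one TRUE/FALSE per element of the left-hand vector
c("Leeds", "Paris", "Cardiff") %in% c("Leeds", "Cardiff", "Manchester")
```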
293 | Below, we keep only the rows of the 'sf' dataset whose LA_code values are present in the LA_code column of 'df'. 294 | 295 | ```{r} 296 | # Use filter() with %in% and unique() to only keep LA's that match 297 | 298 | sf <- sf %>% 299 | filter(LA_code %in% unique(df$LA_code)) 300 | ``` 301 | 302 | 303 | ```{r} 304 | # Let's see how it looks... 305 | # We should have 331 unique LA_codes 306 | length(unique(sf$LA_code)) 307 | ``` 308 | 309 | ## Pre-processing shapefile 310 | 311 | When it comes to mapping our data, it is important that we know which Coordinate Reference System (CRS) we are working with. Simply put, the CRS is a way to describe how the spatial data in the 'sf' object maps to locations on earth. The CRS is just a way of translating 3D reality into 2D maps. And when it comes to using mapping libraries like 'leaflet', knowing the CRS is important because leaflet expects coordinates in a specific format (latitude and longitude), namely EPSG:4326. If our CRS isn't in this format then we might need to transform it so that it matches what leaflet expects. Let's go ahead and check our CRS. 312 | 313 | ```{r} 314 | # st_crs() shows our CRS info 315 | st_crs(sf) 316 | ``` 317 | 318 | 319 | ```{r} 320 | # To transform our CRS to EPSG:4326, simply use st_transform() and specify the crs 321 | # Note: you don't have to use the %>% pipe operator all the time 322 | sf <- st_transform(sf, crs = 4326) 323 | ``` 324 | 325 | ### Merge datasets 326 | 327 | What we want to do now is merge our 'df' dataframe with our 'sf' spatial object, so that we can directly access the data and map it! 328 | 329 | When you use the merge function in R, the order in which you place the data matters in terms of the result's class type and spatial attributes. 330 | So, in terms of class type, we have a dataframe and a spatial object.
By placing 'sf' first, the result will be a spatial object, which is important because this retains the spatial characteristics and geometry columns of the 'sf' object. We merge on the LA_code and LA_name columns, which are present in both datasets. 331 | 332 | ### 'c' function 333 | 334 | Don't overthink it. It's just a way to combine items into a vector in R, whether for defining a set of values to work with, specifying parameters for a function, or any number of other uses where a list of items is needed. 335 | 336 | ```{r} 337 | # Merge the dataframes 338 | merged <- merge(sf, df, by = c('LA_code', 'LA_name')) 339 | ``` 340 | 341 | 342 | ```{r} 343 | # Let's check it out 344 | head(merged) 345 | ``` 346 | 347 | ## Data Analysis 348 | 349 | ## Building our interactive map 350 | 351 | Finally, we can now build our interactive map using leaflet. You can see from the 'geometry' column that we're working with MULTIPOLYGONs and POLYGONs. Multipolygons are a collection of polygons grouped together as a single geometric entity. Basically, multipolygons are good at representing complex shapes. We also have some standard polygons too. In total we have 331 shapes to plot, each representing a local authority. You can take a look at these separate shapes by using the plot function and indexing the row and column (see below). 352 | 353 | ```{r} 354 | plot(sf[1, 'geometry']) 355 | ``` 356 | 357 | The code below has helpful comments that should explain what each bit of the code is doing. But, to provide the overall picture, what we have below is some code for our colour palette which will create a colour scale for the range of values in our 'Percentage' column. Then, we create our interactive map which we've named 'uk_map'. We center our map, add some default map tiles, add our polygons, colour them, then add in the interactive elements such as highlight options (how the background changes when the cursor hovers over a shape) and label (which specifies tooltips).
Then, we add a legend. Finally, we can display this interactive map. 358 | 359 | 360 | ```{r} 361 | # Define the color palette for filling in our multipolygon shapes 362 | # domain sets the range of data values that the colour scale should cover 363 | color_palette <- colorNumeric(palette = "YlGnBu", domain = merged$Percentage) 364 | ``` 365 | 366 | 367 | ```{r} 368 | # Use leaflet function with 'merged' dataset 369 | uk_map <- leaflet(merged) %>% 370 | # Centers the map on long and lat for UK 371 | setView(lng = -3.0, lat = 53, zoom = 6) %>% 372 | # Adds default map tiles (the visual image of the map) 373 | addTiles() %>% 374 | # Adds multipolygons to the map, and colours them based on the 'Percentage' column 375 | # We use the palette we created above 376 | addPolygons( 377 | fillColor = ~color_palette(Percentage), 378 | weight = 1, # Set the border weight to 1 for thinner borders 379 | color = "#000000", 380 | fillOpacity = 0.7, 381 | highlightOptions = highlightOptions(color = "white", weight = 2, bringToFront = TRUE), 382 | label = ~paste(LA_name, ":", Percentage, "%"), # This will create tooltips showing the info 383 | labelOptions = labelOptions( 384 | style = list("font-weight" = "normal", padding = "3px 8px"), 385 | textsize = "12px", direction = "auto") # Adjust text size as needed 386 | ) %>% 387 | addLegend(pal = color_palette, values = ~Percentage, opacity = 0.7, title = "Percentage", position = "topright") 388 | 389 | # Render the map 390 | uk_map 391 | ``` 392 | 393 | -------------------------------------------------------------------------------- /R_code/Leaflet_map.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "\n" 8 | ] 9 | }, 10 | { 11 | "cell_type": "code", 12 | "execution_count": null, 13 | "metadata": { 14 | "vscode": { 15 | "languageId": "r" 16 | } 17 | }, 18 | "outputs": [], 19 | "source": [ 20 | 
"knitr::opts_chunk$set(echo=TRUE)\n" 21 | ] 22 | }, 23 | { 24 | "cell_type": "markdown", 25 | "metadata": {}, 26 | "source": [ 27 | "# Guide\n", 28 | "\n", 29 | "In this notebook we will create an interactive map of the UK which displays the % of Trans men in each local authority.\n", 30 | "There are 331 local authorities in the UK, and we are using data collected in the 2021 UK Census which included 2 new questions on sexuality and gender identity.\n", 31 | "The following data used are:\n", 32 | "\n", 33 | "- [Gender identity (detailed)](https://www.ons.gov.uk/datasets/TS070/editions/2021/versions/3) - this dataset classifies usual residents aged 16 years and over in England and Wales by gender identity.\n", 34 | "- [Local Authority District Boundaries](https://geoportal.statistics.gov.uk/datasets/bb53f91cce9e4fd6b661dc0a6c734a3f_0/about) - this file contains the digital vector boundaries for Local Authority Districts in the UK as of May 2022.\n", 35 | "\n", 36 | "# Install packages\n", 37 | "\n", 38 | "If you're running this code on your own PC (and not through the Binder link) then you're going to want to uncomment the lines below so you can install the requisite packages. Another thing to remember is to set your working directory to the correct folder. Otherwise reading in data will be difficult. 
\n" 39 | ] 40 | }, 41 | { 42 | "cell_type": "code", 43 | "execution_count": null, 44 | "metadata": { 45 | "vscode": { 46 | "languageId": "r" 47 | } 48 | }, 49 | "outputs": [], 50 | "source": [ 51 | " # install.packages(\"leaflet\")\n", 52 | " # install.packages(\"sf\")\n", 53 | " # install.packages(\"dplyr\")\n", 54 | " # install.packages(\"readr\")\n" 55 | ] 56 | }, 57 | { 58 | "cell_type": "markdown", 59 | "metadata": {}, 60 | "source": [ 61 | "# Import libraries\n", 62 | "\n" 63 | ] 64 | }, 65 | { 66 | "cell_type": "code", 67 | "execution_count": null, 68 | "metadata": { 69 | "vscode": { 70 | "languageId": "r" 71 | } 72 | }, 73 | "outputs": [], 74 | "source": [ 75 | "# used to read-in datasets\n", 76 | "library(readr)\n", 77 | "# used to manipulate datasets\n", 78 | "library(dplyr)\n", 79 | "# used to read-in spatial data, shapefiles\n", 80 | "library(sf)\n", 81 | "# used to create interactive maps\n", 82 | "library(leaflet)\n", 83 | "# used to scrape data from websites\n", 84 | "library(httr)\n" 85 | ] 86 | }, 87 | { 88 | "cell_type": "markdown", 89 | "metadata": {}, 90 | "source": [ 91 | "# Read-in dataset\n", 92 | "\n" 93 | ] 94 | }, 95 | { 96 | "cell_type": "code", 97 | "execution_count": null, 98 | "metadata": { 99 | "vscode": { 100 | "languageId": "r" 101 | } 102 | }, 103 | "outputs": [], 104 | "source": [ 105 | "# First, let's read in our gender identity dataset\n", 106 | "\n", 107 | "df <- read_csv('../Data/GI_det.csv')\n" 108 | ] 109 | }, 110 | { 111 | "cell_type": "markdown", 112 | "metadata": {}, 113 | "source": [ 114 | "\n", 115 | "\n" 116 | ] 117 | }, 118 | { 119 | "cell_type": "code", 120 | "execution_count": null, 121 | "metadata": { 122 | "vscode": { 123 | "languageId": "r" 124 | } 125 | }, 126 | "outputs": [], 127 | "source": [ 128 | "# Use head function to check out the first few rows - but can also access df via environment pane\n", 129 | "\n", 130 | "head(df, 10)\n" 131 | ] 132 | }, 133 | { 134 | "cell_type": "markdown", 135 | "metadata": {}, 
136 | "source": [ 137 | "# Data Cleaning\n", 138 | "\n", 139 | "Before we can calculate the %'s of trans men in each local authority, it's good to do some housekeeping and get our dataframe in order.\n", 140 | "\n", 141 | "There's a few things that need sorting including:\n", 142 | "\n", 143 | "1. renaming columns so they are easier to reference\n", 144 | "2. removing 'Does not apply' from gender identity category\n", 145 | "\n", 146 | "\n", 147 | "## Pipe operator - %\\>%\n", 148 | "\n", 149 | "The pipe operator is used to pass the result of one function directly into the next one.\n", 150 | "E.g. let's say we had some code:\n" 151 | ] 152 | }, 153 | { 154 | "cell_type": "code", 155 | "execution_count": null, 156 | "metadata": { 157 | "vscode": { 158 | "languageId": "r" 159 | } 160 | }, 161 | "outputs": [], 162 | "source": [ 163 | "sorted_data <- my_data %\\>% filter(condition) %\\>% arrange(sorting_variable)\n", 164 | "\n" 165 | ] 166 | }, 167 | { 168 | "cell_type": "markdown", 169 | "metadata": {}, 170 | "source": [ 171 | "What we're doing is using the pipe operator to pass my_data to the filter() function, and the result of this is then passed to the arrange() function.\n", 172 | "\n", 173 | "Basically, pipes allow us to chain together a sequence of functions in a way that's easy to read and understand.\n", 174 | "\n", 175 | "In the code below we use the pipe operator to pass our dataframe to the rename function.\n", 176 | "\n", 177 | "This basically supplies the rename function with its first argument, which is the dataframe to filter on.\n", 178 | "\n", 179 | "## 1\n" 180 | ] 181 | }, 182 | { 183 | "cell_type": "code", 184 | "execution_count": null, 185 | "metadata": { 186 | "vscode": { 187 | "languageId": "r" 188 | } 189 | }, 190 | "outputs": [], 191 | "source": [ 192 | "# Rename columns using the rename function from dplyr\n", 193 | "# Specify what you want to rename the column to, and supply the original column string\n", 194 | "\n", 195 | "df <- df %>% 
\n", 196 | " rename(LA_code = `Lower tier local authorities Code`,\n", 197 | " # backticks ` necessary when names are syntactically invalid, e.g. spaces, special characters etc.\n", 198 | " LA_name = `Lower tier local authorities`,\n", 199 | " GI_code = `Gender identity (8 categories) Code`,\n", 200 | " GI_cat = `Gender identity (8 categories)`)\n" 201 | ] 202 | }, 203 | { 204 | "cell_type": "markdown", 205 | "metadata": {}, 206 | "source": [ 207 | "\n", 208 | "\n" 209 | ] 210 | }, 211 | { 212 | "cell_type": "code", 213 | "execution_count": null, 214 | "metadata": { 215 | "vscode": { 216 | "languageId": "r" 217 | } 218 | }, 219 | "outputs": [], 220 | "source": [ 221 | "# Let's use the colnames function to see if it worked\n", 222 | "\n", 223 | "colnames(df)\n" 224 | ] 225 | }, 226 | { 227 | "cell_type": "markdown", 228 | "metadata": {}, 229 | "source": [ 230 | "## 2\n", 231 | "\n", 232 | "### Logical operators - ==, !=, \\<, \\>, \\<=, \\>=, &, \\|, !\n", 233 | "\n", 234 | "Logical operators are used to perform comparisons between values or expressions, which result in a logical (Boolean) value of 'TRUE' or 'FALSE'.\n", 235 | "\n", 236 | "In the code below we use the '!=' 'Does not equal' operator which tests if the GI_cat value in each row of the df does not equal the string 'Does not apply'.\n", 237 | "\n", 238 | "For each row where GI_cat is not equal to 'Does not apply', the expression valuates to TRUE.\n", 239 | "\n", 240 | "We filter so we only keep rows where this expression evaluates to TRUE.\n" 241 | ] 242 | }, 243 | { 244 | "cell_type": "code", 245 | "execution_count": null, 246 | "metadata": { 247 | "vscode": { 248 | "languageId": "r" 249 | } 250 | }, 251 | "outputs": [], 252 | "source": [ 253 | "# Use dplyr's filter function to get rid of 'Does not apply'\n", 254 | "# Use '!=' to keep everything except 'Does not apply' category\n", 255 | "\n", 256 | "df <- df %>% filter(GI_cat != 'Does not apply')\n" 257 | ] 258 | }, 259 | { 260 | "cell_type": 
"markdown", 261 | "metadata": {}, 262 | "source": [ 263 | "### Dollar sign operator - $\n", 264 | "\n", 265 | "This operator is used to access elements, such as columns of a dataframe, by name.Below, we use it to access the gender identity category column, where we want to view the unique values.\n" 266 | ] 267 | }, 268 | { 269 | "cell_type": "code", 270 | "execution_count": null, 271 | "metadata": { 272 | "vscode": { 273 | "languageId": "r" 274 | } 275 | }, 276 | "outputs": [], 277 | "source": [ 278 | "# Unique function can be applied to a column in a df to see which values are in that column\n", 279 | "# Let's see if 'Does not apply' has been successfully dropped\n", 280 | "\n", 281 | "unique(df$GI_cat)\n" 282 | ] 283 | }, 284 | { 285 | "cell_type": "markdown", 286 | "metadata": {}, 287 | "source": [ 288 | "# Data Pre-processing\n", 289 | "\n", 290 | "Now onto the more interesting stuff.\n", 291 | "The data pre-processing stage involves preparing and transforming data into a suitable format for further analysis.It can involve selecting features, transforming variables, and creating new variables.For our purposes, we need to create a new column 'Percentages' which contains the % of Trans men in each local authority. \n", 292 | "\n", 293 | "So, we'll need to first calculate the % of each gender identity category for each local authority. 
Then, we'll want to filter our dataset so that we only keep the responses related to Trans men.\n" 294 | ] 295 | }, 296 | { 297 | "cell_type": "code", 298 | "execution_count": null, 299 | "metadata": { 300 | "vscode": { 301 | "languageId": "r" 302 | } 303 | }, 304 | "outputs": [], 305 | "source": [ 306 | "# Use group_by to group the dataframe by the LA_name column\n", 307 | "# Use mutate to perform calculation within each LA_name group, convert result to a % by multiplying by 100\n", 308 | "# round() is used to round %'s to 2 decimal places\n", 309 | "\n", 310 | "df <- df %>%\n", 311 | " group_by(LA_name) %>%\n", 312 | " mutate(Percentage = round(Observation / sum(Observation) * 100, 2))\n" 313 | ] 314 | }, 315 | { 316 | "cell_type": "markdown", 317 | "metadata": {}, 318 | "source": [ 319 | "\n", 320 | "\n" 321 | ] 322 | }, 323 | { 324 | "cell_type": "code", 325 | "execution_count": null, 326 | "metadata": { 327 | "vscode": { 328 | "languageId": "r" 329 | } 330 | }, 331 | "outputs": [], 332 | "source": [ 333 | "# Let's check out the results\n", 334 | "\n", 335 | "head(df, 10)\n" 336 | ] 337 | }, 338 | { 339 | "cell_type": "markdown", 340 | "metadata": {}, 341 | "source": [ 342 | "\n" 343 | ] 344 | }, 345 | { 346 | "cell_type": "code", 347 | "execution_count": null, 348 | "metadata": { 349 | "vscode": { 350 | "languageId": "r" 351 | } 352 | }, 353 | "outputs": [], 354 | "source": [ 355 | "# Use filter() to only keep rows where GI_cat equals 'Trans man'\n", 356 | "df <- df %>% \n", 357 | " filter(GI_cat == 'Trans man') %>%\n", 358 | " # Use select() with '-' to remove 'Observation' column\n", 359 | " select(-Observation) %>% \n", 360 | " # Use distinct() to remove duplicate rows, as a precaution\n", 361 | " distinct() %>% \n", 362 | " # Use ungroup() to remove grouping - resetting the dataframe's state after performing group operations is good practice\n", 363 | " ungroup()\n" 364 | ] 365 | }, 366 | { 367 | "cell_type": "markdown", 368 | "metadata": {}, 369 |
"source": [ 370 | "\n", 371 | "\n" 372 | ] 373 | }, 374 | { 375 | "cell_type": "code", 376 | "execution_count": null, 377 | "metadata": { 378 | "vscode": { 379 | "languageId": "r" 380 | } 381 | }, 382 | "outputs": [], 383 | "source": [ 384 | "# Let's take a look at the results\n", 385 | "head(df)\n" 386 | ] 387 | }, 388 | { 389 | "cell_type": "markdown", 390 | "metadata": {}, 391 | "source": [ 392 | "# Read-in shapefile\n", 393 | "\n", 394 | "Now that we have our gender identity dataset sorted, we can start on the mapping process. And that starts with reading in our shapefile, which we should have downloaded from the geoportal. If (like me) you don't work with spatial data much, you might assume that you only need the shapefile, and you might delete the others that come with the folder. However, a shapefile is not just a single .shp file, but a collection of files that work together, and each of these files plays a crucial role in defining the shapefile's data and behaviour. When you try and read a shapefile into R, the software expects all components to be present, and missing them can lead to errors or incorrect spatial references. E.g. without the .dbf file, you'd lose all attribute data associated with the geographic features, and without the .shx file you might not be able to read the .shp file altogether. 
\n", 395 | "\n", 396 | "**TLDR: Make sure when you download the shapefile folder you keep all the files!**\n", 397 | "\n", 398 | "Anyway, let's get started.\n" 399 | ] 400 | }, 401 | { 402 | "cell_type": "code", 403 | "execution_count": null, 404 | "metadata": { 405 | "vscode": { 406 | "languageId": "r" 407 | } 408 | }, 409 | "outputs": [], 410 | "source": [ 411 | "# Download shapefiles from geoportal \n", 412 | "\n", 413 | "# URL for the direct download of the shapefile\n", 414 | "url <- \"https://services1.arcgis.com/ESMARspQHYMw9BZ9/arcgis/rest/services/Local_Authority_Districts_May_2022_UK_BFE_V3_2022/FeatureServer/replicafilescache/Local_Authority_Districts_May_2022_UK_BFE_V3_2022_3331011932393166417.zip\"\n", 415 | "\n", 416 | "# Create a temporary directory\n", 417 | "tmp_dir <- tempdir()\n", 418 | "print(paste(\"Created temporary directory:\", tmp_dir))\n", 419 | "\n", 420 | "# Set destination file path\n", 421 | "dest_file <- file.path(tmp_dir, \"shapefile.zip\")\n", 422 | "\n", 423 | "# Download the shapefile\n", 424 | "response <- GET(url, write_disk(dest_file, overwrite = TRUE))\n", 425 | "\n", 426 | "# Check if the download was successful\n", 427 | "if (response$status_code == 200)\n", 428 | " print(\"Download successful\")\n", 429 | " \n", 430 | " # Unzip the file within the temporary directory\n", 431 | " unzip(dest_file, exdir = tmp_dir)\n", 432 | " print(paste(\"Files extracted to:\", tmp_dir))\n", 433 | " \n", 434 | " # List all files in the temporary directory to verify extraction\n", 435 | " extracted_files <- list.files(tmp_dir)\n", 436 | " print(\"Extracted files:\")\n", 437 | " print(extracted_files)\n", 438 | " \n", 439 | " # Define the path to the actual shapefile (.shp)\n", 440 | " shapefile_path <- file.path(tmp_dir, 'LAD_MAY_2022_UK_BFE_V3.shp')\n", 441 | " \n", 442 | " # Read in shapefile to a simple features object\n", 443 | " # st_read() reads in spatial data to a 'simple features' object\n", 444 | " sf <- st_read(shapefile_path)\n", 
445 | " print(\"Shapefile loaded successfully.\")\n", 446 | " " 447 | ] 448 | }, 449 | { 450 | "cell_type": "markdown", 451 | "metadata": {}, 452 | "source": [ 453 | "\n", 454 | "\n" 455 | ] 456 | }, 457 | { 458 | "cell_type": "code", 459 | "execution_count": null, 460 | "metadata": { 461 | "vscode": { 462 | "languageId": "r" 463 | } 464 | }, 465 | "outputs": [], 466 | "source": [ 467 | "# Let's check it out \n", 468 | "head(sf)\n", 469 | "# Better to just view via environment pane\n" 470 | ] 471 | }, 472 | { 473 | "cell_type": "markdown", 474 | "metadata": {}, 475 | "source": [ 476 | "\n", 477 | "\n" 478 | ] 479 | }, 480 | { 481 | "cell_type": "code", 482 | "execution_count": null, 483 | "metadata": { 484 | "vscode": { 485 | "languageId": "r" 486 | } 487 | }, 488 | "outputs": [], 489 | "source": [ 490 | "# Inspect dimensions\n", 491 | "dim(sf)\n" 492 | ] 493 | }, 494 | { 495 | "cell_type": "markdown", 496 | "metadata": {}, 497 | "source": [ 498 | "\n" 499 | ] 500 | }, 501 | { 502 | "cell_type": "code", 503 | "execution_count": null, 504 | "metadata": { 505 | "vscode": { 506 | "languageId": "r" 507 | } 508 | }, 509 | "outputs": [], 510 | "source": [ 511 | "# length() with the unique() function gives us the number of unique values in a column\n", 512 | "\n", 513 | "length(unique(sf$LAD22NM))\n" 514 | ] 515 | }, 516 | { 517 | "cell_type": "markdown", 518 | "metadata": {}, 519 | "source": [ 520 | "# Cleaning shapefile\n", 521 | "\n", 522 | "Hmm.We have 331 local authorities in our dataset that we want to plot, but there are 374 listed here.\n", 523 | "We'll need to remove the local authorities that don't match the ones in our df.\n", 524 | "\n", 525 | "1. rename columns to match 'df'\n", 526 | "2. 
get rid of redundant Local Authorities\n", 527 | "\n", 528 | "## 1\n" 529 | ] 530 | }, 531 | { 532 | "cell_type": "code", 533 | "execution_count": null, 534 | "metadata": { 535 | "vscode": { 536 | "languageId": "r" 537 | } 538 | }, 539 | "outputs": [], 540 | "source": [ 541 | "# Use rename function so sf columns match those in original df\n", 542 | "\n", 543 | "sf <- sf %>% \n", 544 | " rename(LA_code = LAD22CD, \n", 545 | " LA_name = LAD22NM)\n", 546 | "\n", 547 | "# Let's see if it worked\n", 548 | "colnames(sf)\n" 549 | ] 550 | }, 551 | { 552 | "cell_type": "markdown", 553 | "metadata": {}, 554 | "source": [ 555 | "\n" 556 | ] 557 | }, 558 | { 559 | "cell_type": "code", 560 | "execution_count": null, 561 | "metadata": { 562 | "vscode": { 563 | "languageId": "r" 564 | } 565 | }, 566 | "outputs": [], 567 | "source": [ 568 | "# Replace specific values in the LA_name column using recode()\n", 569 | "\n", 570 | "sf$LA_name <- sf$LA_name %>% \n", 571 | " recode(`Bristol, City of` = \"Bristol\", \n", 572 | " `Kingston upon Hull, City of` = \"Kingston upon Hull\", \n", 573 | " `Herefordshire, County of` = \"Herefordshire\")\n" 574 | ] 575 | }, 576 | { 577 | "cell_type": "markdown", 578 | "metadata": {}, 579 | "source": [ 580 | "## 2\n", 581 | "\n", 582 | "### %in% operator\n", 583 | "\n", 584 | "This is used to check if elements of one list are in another list.\n", 585 | "Much like the logical operators, it returns a Boolean value, TRUE or FALSE.\n", 586 | "Below, we keep only the rows of the 'sf' dataset whose LA_code values are present in the LA_code column of 'df'.\n" 587 | ] 588 | }, 589 | { 590 | "cell_type": "code", 591 | "execution_count": null, 592 | "metadata": { 593 | "vscode": { 594 | "languageId": "r" 595 | } 596 | }, 597 | "outputs": [], 598 | "source": [ 599 | "# Use filter() with %in% and unique() to only keep LA's that match \n", 600 | "\n", 601 | "sf <- sf %>% \n", 602 | " filter(LA_code %in% unique(df$LA_code))\n" 603 | ] 604 | }, 605 | { 606 | 
"cell_type": "markdown", 607 | "metadata": {}, 608 | "source": [ 609 | "\n", 610 | "\n" 611 | ] 612 | }, 613 | { 614 | "cell_type": "code", 615 | "execution_count": null, 616 | "metadata": { 617 | "vscode": { 618 | "languageId": "r" 619 | } 620 | }, 621 | "outputs": [], 622 | "source": [ 623 | "# Let's see how it looks.. \n", 624 | "# We should have 331 unique LA_codes\n", 625 | "length(unique(sf$LA_code))\n" 626 | ] 627 | }, 628 | { 629 | "cell_type": "markdown", 630 | "metadata": {}, 631 | "source": [ 632 | "# Pre-processing shapefile\n", 633 | "\n", 634 | "When it comes to mapping our data, it is important that we know which Coordinate Reference System (CRS) we are working with. Simply put, the CRS is a way to describe how the spatial data in the 'sf' object maps to locations on earth. The CRS is just a way of translating 3D reality into 2D maps. And when it comes to using mapping libraries like 'leaflet', knowing the CRS is important because leaflet expects coordinates in a specific format (usually latitude and longitude), which is EPSG:4326. If our CRS isn't in this format then we might need to transform it so that it matches what leaflet expects. Let's go ahead and see what our CRS is saying. 
\n" 635 | ] 636 | }, 637 | { 638 | "cell_type": "code", 639 | "execution_count": null, 640 | "metadata": { 641 | "vscode": { 642 | "languageId": "r" 643 | } 644 | }, 645 | "outputs": [], 646 | "source": [ 647 | "# st_crs() shows our CRS info\n", 648 | "st_crs(sf)\n" 649 | ] 650 | }, 651 | { 652 | "cell_type": "markdown", 653 | "metadata": {}, 654 | "source": [ 655 | "\n", 656 | "\n" 657 | ] 658 | }, 659 | { 660 | "cell_type": "code", 661 | "execution_count": null, 662 | "metadata": { 663 | "vscode": { 664 | "languageId": "r" 665 | } 666 | }, 667 | "outputs": [], 668 | "source": [ 669 | "# To transform our crs to EPSG: 4326, simply use st_transform() and specify the crs\n", 670 | "# Note: you don't have to use the %>% pipe operator all the time\n", 671 | "sf <- st_transform(sf, crs = 4326)\n" 672 | ] 673 | }, 674 | { 675 | "cell_type": "markdown", 676 | "metadata": {}, 677 | "source": [ 678 | "## Merge datasets\n", 679 | "\n", 680 | "What we want to do now is merge our 'df' dataframe with our 'sf' spatial object, so that we can directly access the data and map it!\n", 681 | "\n", 682 | "When you use the merge function in R, the order in which you place the data matters in terms of the result's class type and spatial attributes. \n", 683 | "So, in terms of class type, we have a dataframe and a spatial object. By placing 'sf' first, the result will be a spatial object, which is important because this retains the spatial characteristics and geometry columns of the 'sf' object. We merge the columns on the LA_code and LA_name columns which are present in both datasets. \n", 684 | "\n", 685 | "### 'c' function\n", 686 | "\n", 687 | "Don't overthink it. It's just a way to group items together in R, whether for defining a set of values to work with, specifying parameters for a function, or any number of other uses where a list of items is needed. 
\n" 688 | ] 689 | }, 690 | { 691 | "cell_type": "code", 692 | "execution_count": null, 693 | "metadata": { 694 | "vscode": { 695 | "languageId": "r" 696 | } 697 | }, 698 | "outputs": [], 699 | "source": [ 700 | "# Merge the dataframes\n", 701 | "merged <- merge(sf, df, by = c('LA_code', 'LA_name'))\n" 702 | ] 703 | }, 704 | { 705 | "cell_type": "markdown", 706 | "metadata": {}, 707 | "source": [ 708 | "\n", 709 | "\n" 710 | ] 711 | }, 712 | { 713 | "cell_type": "code", 714 | "execution_count": null, 715 | "metadata": { 716 | "vscode": { 717 | "languageId": "r" 718 | } 719 | }, 720 | "outputs": [], 721 | "source": [ 722 | "# Let's check it out\n", 723 | "head(merged)\n" 724 | ] 725 | }, 726 | { 727 | "cell_type": "markdown", 728 | "metadata": {}, 729 | "source": [ 730 | "# Data Analysis\n", 731 | "\n", 732 | "## Building our interactive map\n", 733 | "\n", 734 | "Finally, we can now build our interactive map using leaflet. You can see from the 'geometry' column that we're working with 'MULTIPOLYGON' and 'POLYGON' geometries. Multipolygons are a collection of polygons grouped together as a single geometric entity. Basically, multipolygons are good at representing complex shapes. We have some standard polygons too. In total we have 331 shapes to plot, each representing a local authority. You can take a look at these separate shapes by using the plot function and indexing the row and column (see below). \n" 735 | ] 736 | }, 737 | { 738 | "cell_type": "code", 739 | "execution_count": null, 740 | "metadata": { 741 | "vscode": { 742 | "languageId": "r" 743 | } 744 | }, 745 | "outputs": [], 746 | "source": [ 747 | "plot(sf[1, 'geometry'])\n", 748 | "\n" 749 | ] 750 | }, 751 | { 752 | "cell_type": "markdown", 753 | "metadata": {}, 754 | "source": [ 755 | "The code below has helpful code comments that should help you grasp what each bit of the code is doing.
But, to provide the overall picture, what we have below is some code for our colour palette which will create a colour scale for the range of values in our 'Percentage' column. Then, we create our interactive map which we've named 'uk_map'. We center our map, add some default map tiles, add our polygons, colour them, then add in the interactive elements such as highlight options (how background changes when cursor hovers over a shape) and label (which specifies tooltips). Then, we add a legend. Finally, we can display this interactive map. \n", 756 | "\n" 757 | ] 758 | }, 759 | { 760 | "cell_type": "code", 761 | "execution_count": null, 762 | "metadata": { 763 | "vscode": { 764 | "languageId": "r" 765 | } 766 | }, 767 | "outputs": [], 768 | "source": [ 769 | "# Define the color palette for filling in our multipolygon shapes\n", 770 | "# domain sets the range of data values that the colour scale should cover\n", 771 | "color_palette <- colorNumeric(palette = \"YlGnBu\", domain = merged$Percentage)\n" 772 | ] 773 | }, 774 | { 775 | "cell_type": "markdown", 776 | "metadata": {}, 777 | "source": [ 778 | "\n", 779 | "\n" 780 | ] 781 | }, 782 | { 783 | "cell_type": "code", 784 | "execution_count": null, 785 | "metadata": { 786 | "vscode": { 787 | "languageId": "r" 788 | } 789 | }, 790 | "outputs": [], 791 | "source": [ 792 | "# Use leaflet function with 'merged' dataset\n", 793 | "uk_map <- leaflet(merged) %>%\n", 794 | " # Centers the map on long and lat for UK\n", 795 | " setView(lng = -3.0, lat = 53, zoom = 6) %>%\n", 796 | " # Adds default map tiles (the visual image of the map)\n", 797 | " addTiles() %>%\n", 798 | " # Adds multipolygons to the map, and colours them based on the 'Percentage' column\n", 799 | " # We use the palette we created above\n", 800 | " addPolygons(\n", 801 | " fillColor = ~color_palette(Percentage),\n", 802 | " weight = 1, # Set the border weight to 1 for thinner borders\n", 803 | " color = \"#000000\",\n", 804 | " fillOpacity = 0.7,\n", 805 | 
" highlightOptions = highlightOptions(color = \"white\", weight = 2, bringToFront = TRUE),\n", 806 | " label = ~paste(LA_name, \":\", Percentage, \"%\"), # This will create tooltips showing the info\n", 807 | " labelOptions = labelOptions(\n", 808 | " style = list(\"font-weight\" = \"normal\", padding = \"3px 8px\"),\n", 809 | " textsize = \"12px\", direction = \"auto\") # Adjust text size as needed\n", 810 | " ) %>%\n", 811 | " addLegend(pal = color_palette, values = ~Percentage, opacity = 0.7, title = \"Percentage\", position = \"topright\")\n", 812 | "\n", 813 | "# Render the map\n", 814 | "uk_map\n" 815 | ] 816 | } 817 | ], 818 | "metadata": { 819 | "anaconda-cloud": "", 820 | "kernelspec": { 821 | "display_name": "R", 822 | "language": "R", 823 | "name": "ir" 824 | }, 825 | "language_info": { 826 | "codemirror_mode": "r", 827 | "file_extension": ".r", 828 | "mimetype": "text/x-r-source", 829 | "name": "R", 830 | "pygments_lexer": "r", 831 | "version": "4.4.0" 832 | }, 833 | "toc": { 834 | "base_numbering": 1, 835 | "nav_menu": {}, 836 | "number_sections": true, 837 | "sideBar": true, 838 | "skip_h1_title": false, 839 | "title_cell": "Table of Contents", 840 | "title_sidebar": "Contents", 841 | "toc_cell": false, 842 | "toc_position": {}, 843 | "toc_section_display": true, 844 | "toc_window_display": true 845 | } 846 | }, 847 | "nbformat": 4, 848 | "nbformat_minor": 4 849 | } 850 | -------------------------------------------------------------------------------- /R_code/Guide.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "R Notebook" 3 | output: html_notebook 4 | --- 5 | 6 | 7 | ```{r setup, include=FALSE} 8 | knitr::opts_chunk$set(eval = FALSE) 9 | 10 | ``` 11 | 12 | ![UKDS logo](/Users/loucap/Documents/GitWork/fresh_repo/Images/ukds.png) 13 | 14 | # Guide to Interactive Visualisations 15 | 16 | In this guide, you'll be shown how to make 2 key types of interactive visualisations, which include: 17 | 18 
| * Basic bar chart + group and stacked 19 | * Scatterplot + dropdown menu 20 | 21 | To create these visualisations, we'll be using the **'plotly'** package. 22 | 23 | # Census Data 24 | 25 | Datasets used in this workshop are from the 2021 census, and involve the new voluntary question which focuses on gender identity. In particular, we explore the relationship between age and gender identity, as well as ethnicity and gender identity. 26 | 27 | **However, please note:** 28 | 29 | The Office for Statistics Regulation confirmed on 12/09/2024 that the gender identity estimates from Census 2021 in England and Wales are no longer 'accredited official statistics' and are **now classified as 'official statistics in development'**. For further information, please see: [Sexual orientation and gender identity quality information for Census 2021.](https://www.ons.gov.uk/peoplepopulationandcommunity/culturalidentity/sexuality/methodologies/sexualorientationandgenderidentityqualityinformationforcensus2021) 30 | 31 | # Let's begin... 32 | 33 | Let's get started by importing the necessary packages. 34 | 35 | **NOTE:** If you're not following along with Binder, and you have your own computational environment, make sure you install the necessary packages through the command line before proceeding to import. 36 | 37 | ## Install packages 38 | 39 | Uncomment the lines below to install the packages if you're not working in Binder. 
40 | 41 | ```{r} 42 | # install.packages("readr") 43 | # install.packages("dplyr") 44 | # install.packages("stringr") 45 | # install.packages("shiny") 46 | # install.packages("plotly") 47 | ``` 48 | 49 | ## Load in packages 50 | 51 | ```{r} 52 | # Allows us to read in csv files 53 | library(readr) 54 | # For data manipulation 55 | library(dplyr) 56 | # For regular expression operations 57 | library(stringr) 58 | # Used to create interactive visualisations 59 | library(plotly) 60 | ``` 61 | 62 | # Dataset 1 63 | 64 | The first dataset that we'll be focusing on is a really simple dataset which shows the total counts for 8 gender identity categories across England and Wales. We'll do a bit of data cleaning, remove unnecessary categories (such as 'Does not apply'), and then calculate the % of each gender identity category. Then, we'll create a simple interactive bar chart which displays the percentage by gender identity category, whilst enabling some interactivity when we hover over each bar. 65 | 66 | ```{r} 67 | # Load in dataset 68 | 69 | df <- read_csv('../Data/GI_det_EW.csv') 70 | ``` 71 | * chr - stands for "character" and represents text data. Columns with 'chr' contain strings, meaning any kind of text or combination of letters, numbers, or symbols that are treated as text e.g. "hello" or "123abc" would be of type 'chr' 72 | * dbl - stands for "double" and refers to double-precision floating-point numbers. I.e., it represents numerical data with decimal points e.g. 3.14, 0.001, 29393.
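The difference is easy to check for yourself. A quick sketch (independent of the census data; `class()` and `typeof()` are base R):

```{r}
# Character data - anything treated as text
class("123abc")   # "character"

# Numbers with decimal points are numeric, stored as doubles
class(3.14)       # "numeric"
typeof(3.14)      # "double"
```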
73 | 74 | ```{r} 75 | # Brief glimpse of data structure 76 | # But can also click on the dataset in the Environment pane 77 | head(df, 10) 78 | ``` 79 | 80 | 81 | ## Data cleaning 82 | 83 | * Clean column names 84 | * Filter out unnecessary categories 85 | * Clean gender identity category values - too wordy 86 | * Ensure gender_identity column is a factor with levels in desired order 87 | 88 | 89 | ```{r} 90 | # str_replace_all() method finds all substrings which match the regex and replaces them with an empty string 91 | # First, let's remove any bracketed text (and the space before it) from the column names 92 | colnames(df) <- str_replace_all(colnames(df), "\\s*\\([^)]*\\)", "") 93 | 94 | # Lowercase column text and replace empty spaces with "_" 95 | colnames(df) <- tolower(colnames(df)) 96 | colnames(df) <- str_replace_all(colnames(df), " ", "_") 97 | 98 | # Let's see if it worked.. 99 | colnames(df) 100 | 101 | ``` 102 | 103 | 104 | ### Pipes and other operators.. 105 | 106 | So, we've already come across the assignment operator '<-' which is used to assign a value. E.g. df <- read_csv('Data/GI_age.csv'), here we assign our csv file to a dataframe variable called 'df'. 107 | 108 | But, we're now going to encounter the pipe operator '%>%' which can seem intimidating at first but is actually pretty simple. It's used to pass the result of one function directly into the next function. E.g. df <- df %>% filter(gender_identity_code != -8), here we start with our df and pass it to the filter function using the pipe operator. This basically supplies the filter() function with its first argument, which is the dataframe to filter on. And here we encounter a logical operator '!=' within the filter() function, which specifies that we should only keep rows where gender_identity_code is not equal to -8. 109 | 110 | ### Dollar sign operator - $ 111 | 112 | This operator is used to access elements, such as columns of a dataframe, by name.
113 | Below, we use it to access the gender identity code column, where we want to view the unique values. 114 | 115 | ```{r} 116 | # Get rid of redundant categories 117 | df <- df %>% 118 | filter(gender_identity_code != -8) 119 | 120 | # Use unique and access column to output its unique values 121 | 122 | unique(df$gender_identity_code) 123 | ``` 124 | 125 | ```{r} 126 | # Let's take a look at our unique values in our gender_identity category column 127 | 128 | unique(df$gender_identity) 129 | ``` 130 | ```{r} 131 | # Use combo of mutate and recode to replace multiple values in column 132 | # .default ensures that any value not matching those specified are left unchanged 133 | 134 | df <- df %>% 135 | mutate(gender_identity = recode(gender_identity, 136 | "Gender identity the same as sex registered at birth" = "Cisgender", 137 | "Gender identity different from sex registered at birth but no specific identity given" = "Gender identity different from sex", 138 | .default = gender_identity)) 139 | ``` 140 | 141 | ```{r} 142 | # Let's see if it worked... 143 | unique(df$gender_identity) 144 | ``` 145 | 146 | ```{r} 147 | # We use factor to convert gender_identity column to a factor with specified levels 148 | # This tells Plotly the exact order in which to display categories 149 | # Otherwise, R plots categorical data in alphabetical order.. 150 | 151 | df$gender_identity <- factor(df$gender_identity, levels = c( 152 | "Cisgender", 153 | "Gender identity different from sex", 154 | "Trans woman", 155 | "Trans man", 156 | "All other gender identities", 157 | "Not answered" 158 | )) 159 | ``` 160 | 161 | ```{r} 162 | class(df$gender_identity) 163 | 164 | ``` 165 | 166 | ## Data pre-processing 167 | 168 | Before we can plot our data, we need to calculate the percentage of each gender identity category. 169 | 170 | The mutate() function adds a new column 'percentage' to df, and applies the following calculation to each row. 
171 | 172 | ```{r} 173 | # mutate() is used to add new variables to a df or modify existing ones 174 | df <- df %>% mutate(percentage = round(observation / sum(observation) * 100, 2)) 175 | ``` 176 | 177 | ```{r} 178 | # Let's take a look.. 179 | head(df$percentage) 180 | ``` 181 | 182 | 183 | 184 | ## Basic interactive bar chart 185 | 186 | Now we can create our first simple interactive visualisation. To do so we use Plotly's plot_ly function, and supply the parameters with the necessary arguments. You'll notice that we use the tilde operator (~) quite a bit when building our graph. By preceding relevant variables with ~ it tells R to look for that variable within the dataframe. 187 | 188 | 189 | ```{r} 190 | # Create the bar chart visualization with percentages on the y-axis 191 | fig <- plot_ly(data = df, x = ~gender_identity, y = ~percentage, type = 'bar', 192 | # defines how the bars should be styled 193 | marker = list(color = 'rgb(158,202,225)', line = list(color = 'rgb(8,48,107)', width = 1.5)), 194 | width = 800, height = 600) 195 | ``` 196 | 197 | 198 | ```{r} 199 | # Let's check it out 200 | fig 201 | 202 | ``` 203 | 204 | ## Using layout() method 205 | 206 | Once a graph has been created, we can use the layout method to customise the appearance and layout. This allows you to modify things such as titles, legend details, axis properties, etc, without needing to recreate the figure from scratch. 
207 | 208 | ```{r} 209 | # Let's apply a log scale to our y-axis so this graph is easier to interpret 210 | 211 | fig <- layout(fig, 212 | title = 'Percentage of Each Gender Identity in England and Wales', 213 | # set showline to true, otherwise it disappears when we apply log scale 214 | xaxis = list(title = 'Gender Identity', showline = TRUE), 215 | yaxis = list(type = 'log', title = 'Percentage (Log Scale)')) 216 | ``` 217 | 218 | ```{r} 219 | fig 220 | ``` 221 | 222 | ## Tooltips 223 | 224 | When using different R libraries that are geared towards interactive visualisations, you'll often come across 'tooltips'. These are small boxes that provide information when a user hovers over a part of a data visualisation such as: a point on a graph, a bar in a bar chart, or a segment in a pie chart. They are used to display additional information about the data point or object, providing more context without cluttering up the chart. In R's plotly, this hover information is controlled through parameters such as 'hovertext' and 'hoverinfo'. 225 | 226 | All interactive plotly graphs come with default hover data, so when you hover over a bar or a scatterplot data point it will display the specific x-axis value and y-axis value. But, variety is the spice of life and there's going to be times when you want to leverage this feature to include interesting info that isn't included by default. For instance, for our bar chart, I'd like to add in data from the 'Observation' column, which shows the raw count for each gender identity category. 227 | 228 | To do this it's quite easy. We use the hovertext and hoverinfo parameters in the plot_ly function, with hovertext defining the variables we'd like to include and how they should appear, and hoverinfo ensuring that this text is displayed in the tooltips. So, let's create the graph again, but this time let's specify our tooltips.
229 | 230 | ```{r} 231 | new_fig <- plot_ly(data = df, x = ~gender_identity, y = ~percentage, type = 'bar', 232 | # ~paste combines multiple pieces of text and data into one string 233 | hovertext = ~paste(#
<br> is HTML code for a line break 234 | # sprintf - used to format strings 235 | "
<br> Percentage: ", sprintf("%.2f%%", percentage), 236 | "
<br> Observations: ", observation), 237 | # tells plotly to only display the text provided in hovertext 238 | hoverinfo = 'text', 239 | marker = list(color = 'rgb(158,202,225)', line = list(color = 'rgb(8,48,107)', width = 1.5)), 240 | width = 800, height = 600) 241 | 242 | 243 | 244 | # Apply a log scale to the y-axis 245 | new_fig <- layout(new_fig, 246 | title = 'Percentage of Each Gender Identity in England and Wales', 247 | xaxis = list(title = 'Gender Identity', showline = TRUE), 248 | yaxis = list(type = 'log', title = 'Percentage (Log Scale)')) 249 | 250 | ``` 251 | 252 | ```{r} 253 | new_fig 254 | ``` 255 | 256 | 257 | 258 | 259 | # Dataset 2 260 | 261 | This dataset classifies residents by gender identity and age, with the unit of analysis being England and Wales. 262 | 263 | ```{r} 264 | # Load in dataset 265 | 266 | df2 <- read_csv('../Data/GI_age.csv') 267 | ``` 268 | 269 | 270 | ```{r} 271 | # Brief glimpse of data structure 272 | head(df2, 10) 273 | ``` 274 | 275 | ```{r} 276 | # Let's check out the dimensions 277 | 278 | dim(df2) 279 | ``` 280 | 281 | ## Data Cleaning 282 | 283 | * Clean column names 284 | * Filter out unnecessary categories 285 | * Clean gender identity category values - too wordy 286 | * Ensure gender_identity column is a factor with levels in desired order 287 | * Clean age category values - too wordy 288 | 289 | We'll whiz through this, because it's the same stuff we did for the last dataset. 290 | 291 | ```{r} 292 | # str_replace_all() method finds all substrings which match the regex and replaces them with empty string 293 | # First, let's replace any brackets with empty strings 294 | colnames(df2) <- str_replace_all(colnames(df2), "\\s*\\([^)]*\\)", "") 295 | 296 | # Lowercase column text and replace empty spaces with "_" 297 | colnames(df2) <- tolower(colnames(df2)) 298 | colnames(df2) <- str_replace_all(colnames(df2), " ", "_") 299 | 300 | # Let's see if it worked..
301 | colnames(df2) 302 | 303 | ``` 304 | 305 | ```{r} 306 | # Get rid of values that do not apply 307 | df2 <- df2 %>% 308 | filter(gender_identity_code != -8) 309 | 310 | # Use unique and access column to output its unique values 311 | 312 | unique(df2$age_code) 313 | ``` 314 | 315 | ```{r} 316 | # Get rid of redundant age category 317 | # Further filter data 318 | df2 <- df2 %>% 319 | filter(age_code != 1) 320 | 321 | ``` 322 | 323 | ```{r} 324 | # Clean up the values in the 'age' column. Let's shorten them. 325 | 326 | # Chain str_replace() calls together to apply multiple string replacements in succession 327 | # Each str_replace() call is applied to the result of the previous one 328 | df2$age <- df2$age %>% 329 | str_replace('Aged ', '') %>% 330 | str_replace('to', '-') %>% 331 | str_replace('years', '') %>% 332 | str_replace('and over', '+') %>% 333 | str_replace(' - ', '-') 334 | 335 | # We can pass our df to the select function, where we specify the column we're interested in. 336 | # Then, we pipe the output to the head function. 
337 | df2 %>% 338 | select(age) %>% 339 | head(5) 340 | ``` 341 | 342 | ```{r} 343 | # Use combo of mutate and recode to replace multiple values in column 344 | # .default ensures that any value not matching those specified are left unchanged 345 | 346 | df2 <- df2 %>% 347 | mutate(gender_identity = recode(gender_identity, 348 | "Gender identity the same as sex registered at birth" = "Cisgender", 349 | "Gender identity different from sex registered at birth but no specific identity given" = "Gender identity different from sex", 350 | .default = gender_identity)) 351 | 352 | 353 | ``` 354 | 355 | 356 | ```{r} 357 | 358 | unique(df2$gender_identity) 359 | ``` 360 | 361 | ```{r} 362 | # We use factor to convert gender_identity column to a factor with specified levels 363 | # This tells Plotly the exact order in which to display categories 364 | 365 | df2$gender_identity <- factor(df2$gender_identity, levels = c( 366 | "Cisgender", 367 | "Gender identity different from sex", 368 | "Trans woman", 369 | "Trans man", 370 | "All other gender identities", 371 | "Not answered" 372 | )) 373 | ``` 374 | 375 | 376 | ## Question 377 | 378 | How is gender identity distributed among different age groups? 379 | 380 | Some subquestions that this can help us answer: 381 | 382 | * What % of trans women are aged 16-24 years? 383 | * Are older age groups overrepresented in the 'non-response' category? 384 | 385 | ## Data pre-processing 386 | 387 | ### Calculate percentages 388 | 389 | Below, we use the group_by function to group the data by 'gender_identity' and calculate the percentage within each group. Then the mutate() function adds a new column 'percentage' to df2, which (for each group) divides the observation by the sum of observations, multiplies it by 100, and rounds it to 2 decimal places. We then use the ungroup function when we're done with the grouping operation.
390 | 391 | ```{r} 392 | df2 <- df2 %>% 393 | group_by(gender_identity) %>% 394 | mutate(percentage = round((observation / sum(observation) * 100), 2)) %>% 395 | ungroup() 396 | 397 | head(df2) 398 | ``` 399 | 400 | 401 | ## Interactive grouped bar chart 402 | 403 | When creating grouped bar charts, there's a few subtle differences that you'll need to account for in the code. 404 | We'll need a way to colour each bar in each group, according to age categories, which we can do with the 'color' and 'colors' parameters. 405 | 406 | ```{r} 407 | # Create a grouped bar chart with hover information 408 | fig2 <- plot_ly(data = df2, x = ~gender_identity, y = ~percentage, type = 'bar', 409 | # color specifies which variable to colour by 410 | # colors specifies the colour palette to use, and how many colours are required 411 | color = ~age, colors = RColorBrewer::brewer.pal(length(unique(df2$age)), "Set2"), 412 | hoverinfo = 'text', 413 | hovertext = ~paste("Observation: ", observation, 414 | "
<br> Percentage: ", sprintf("%.2f%%", percentage), 415 | "
<br> Age group: ", age), 416 | marker = list(line = list(color = 'rgba(255,255,255, 0.5)', width = 0.5)), 417 | width = 800, height = 600) 418 | 419 | ``` 420 | 421 | ```{r} 422 | fig2 423 | ``` 424 | 425 | ```{r} 426 | fig2 <- layout(fig2, title = 'Distribution of Gender Identity Categories Among Age Groups', 427 | xaxis = list(title = 'Gender Identity'), 428 | yaxis = list(title = 'Percentage'), 429 | legend = list(title = list(text = 'Age Group'))) 430 | 431 | ``` 432 | 433 | ```{r} 434 | fig2 435 | ``` 436 | 437 | ## Stacked bar chart 438 | 439 | The method I show below simply converts the previously made grouped bar chart 'fig2' to a stacked bar chart. Stacked bar charts are created by using the layout() function to change the barmode, as the default barmode is grouped. 440 | 441 | 442 | 443 | ```{r} 444 | # Convert to stacked bar chart 445 | 446 | st_fig <- layout(fig2, 447 | barmode = 'stack') 448 | 449 | st_fig 450 | ``` 451 | 452 | 453 | 454 | ## Dataset 3 455 | 456 | This dataset classifies residents by gender identity and ethnic group, with the unit of analysis being the 331 local authorities across England and Wales. 457 | 458 | ```{r} 459 | # Load in dataset 460 | 461 | df3 <- read_csv('../Data/GI_ethnic.csv') 462 | ``` 463 | 464 | 465 | ```{r} 466 | # Brief glimpse at underlying data structure 467 | head(df3, 10) 468 | ``` 469 | 470 | ## Data Cleaning 471 | 472 | * Clean column names 473 | * Filter out unnecessary categories 474 | 475 | Below, I provide another function, 'gsub()', which can be used instead of the str_replace_all() method demonstrated in the previous cleaning sections. Basically, it looks for a pattern and applies the replacement to any column names which match the pattern.
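Before applying gsub() to the real column names, it may help to see the pattern on a single string. A standalone sketch (the example string is hypothetical, but mirrors the census column naming):

```{r}
# "\\s*\\([^)]*\\)" matches any bracketed text plus the whitespace before it
gsub("\\s*\\([^)]*\\)", "", "Gender identity (8 categories) Code")
# [1] "Gender identity Code"
```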
476 | 477 | ```{r} 478 | # Remove all text within parentheses from column names and replace it with an empty string 479 | 480 | # tilde operator (~) used to apply function 'gsub' to each colname 481 | # .x represents each colname that gsub will be applied to 482 | df3 <- df3 %>% 483 | rename_with(~ gsub("\\s*\\([^)]*\\)", "", .x)) 484 | ``` 485 | 486 | ```{r} 487 | # Lowercase all text in column names and replace spaces with underscores 488 | df3 <- df3 %>% 489 | rename_with(~ tolower(gsub(" ", "_", .x))) 490 | ``` 491 | 492 | ```{r} 493 | # Shorten the local authority column names as they are way too long 494 | df3 <- df3 %>% 495 | rename(LA_code = lower_tier_local_authorities_code, 496 | LA_name = lower_tier_local_authorities) 497 | 498 | ``` 499 | 500 | ```{r} 501 | # Let's see if it worked 502 | colnames(df3) 503 | ``` 504 | 505 | 506 | ```{r} 507 | # Remove 'Does not apply' categories for the gender identity and ethnic group columns 508 | df3 <- df3 %>% 509 | filter(gender_identity_code != -8, ethnic_group_code != -8) 510 | ``` 511 | 512 | ```{r} 513 | # Let's see if it worked.. 514 | unique(df3$gender_identity_code) 515 | ``` 516 | 517 | ```{r} 518 | # Let's see if it worked.. 519 | unique(df3$ethnic_group_code) 520 | ``` 521 | 522 | ## Question 523 | 524 | How does the rate of 'non-response' on gender identity vary among different ethnic groups across local authorities in England and Wales? 525 | 526 | A subquestion this could help us answer: 527 | 528 | Does the relationship between non-response and ethnic group % for local authorities differ between the 'White' categories and other ethnic groups? 529 | 530 | ## Data pre-processing 531 | 532 | Given that I want to explore the question above, I'd like to create a scatterplot which explores the relationship between the % of certain ethnic groups within local authorities and their non-response rates. 
Therefore, I'll need to prep my x and y variables: the percentage of each ethnic group in each LA, and each ethnic group's non-response rate within each LA. 533 | 534 | ### Calculate % of each ethnic group in each LA 535 | 536 | ```{r} 537 | # First, we're going to group our data by LA_name, ethnic group, and sum our observations 538 | # This leaves us with the total of each ethnic group in each local authority 539 | ethnic_totals <- df3 %>% 540 | group_by(LA_name, ethnic_group) %>% 541 | summarise(Ethnic_sum = sum(observation, na.rm = TRUE)) %>% 542 | ungroup() 543 | 544 | # Print the first few rows to check 545 | head(ethnic_totals) 546 | ``` 547 | 548 | 549 | ```{r} 550 | # Calculate total observations for each local authority by grouping df3 by local authority and summing up obs 551 | la_totals <- df3 %>% 552 | group_by(LA_name) %>% 553 | summarise(LA_sum = sum(observation, na.rm = TRUE)) %>% 554 | ungroup() 555 | 556 | # Print the first few rows to check 557 | head(la_totals) 558 | ``` 559 | 560 | ```{r} 561 | # Merge the ethnic_totals and la_totals dataframes together 562 | # by parameter specifies which column to perform merge on 563 | 564 | grp_pct <- merge(ethnic_totals, la_totals, by = "LA_name") 565 | ``` 566 | 567 | ```{r} 568 | # Calculate the percentage of each ethnic group within each local authority 569 | # Store results in new column 570 | 571 | grp_pct <- grp_pct %>% 572 | mutate(Percentage = round((Ethnic_sum / LA_sum * 100), 2)) 573 | ``` 574 | 575 | 576 | ```{r} 577 | # Print the first few rows to check 578 | head(grp_pct, 10) 579 | ``` 580 | 581 | ### Calculate Ethnic Group Non-Response Rates (%'s) Within LAs 582 | 583 | ```{r} 584 | # We already have our ethnic group totals which we can re-use...
585 | 586 | ethnic_totals 587 | ``` 588 | 589 | ```{r} 590 | # Calculate sum of non-responses for each ethnic group within each LA 591 | # Filter df3 so that we only have non-response rows 592 | # Group by LA and ethnic group then sum non-response obs and store the results in new column 593 | 594 | non_response_totals <- df3 %>% 595 | filter(gender_identity == 'Not answered') %>% 596 | group_by(LA_name, ethnic_group) %>% 597 | summarise(NR_total = sum(observation, na.rm = TRUE)) %>% 598 | ungroup() 599 | ``` 600 | 601 | ```{r} 602 | 603 | # Let's check it out.. 604 | head(non_response_totals) 605 | ``` 606 | 607 | 608 | ```{r} 609 | # Merge the ethnic group totals with the ethnic group non-response totals 610 | # c - used when we're referencing more than one column 611 | # all.x - performs a left join 612 | grp_nr <- merge(ethnic_totals, non_response_totals, by = c("LA_name", "ethnic_group"), all.x = TRUE) 613 | 614 | ``` 615 | 616 | ```{r} 617 | # Let's check it out.. 618 | 619 | head(grp_nr) 620 | ``` 621 | 622 | 623 | ```{r} 624 | # Calculate the non-response percentage for each ethnic group within each LA 625 | # Store results in new column 626 | 627 | grp_nr <- grp_nr %>% 628 | mutate(Eth_NR_Perc = round((NR_total / Ethnic_sum * 100), 2)) 629 | ``` 630 | 631 | 632 | ```{r} 633 | # Quick glance.. 634 | head(grp_nr) 635 | ``` 636 | ### Merge both datasets 637 | 638 | Now that we've completed the necessary calculations, we are left with two datasets: 639 | 640 | * grp_pct - details the % of each ethnic_group in each LA 641 | * grp_nr - details the ethnic group non-response % in each LA 642 | 643 | All we need to do now then, is merge these datasets together so that we can access the new columns and plot them: 644 | 645 | * Percentage 646 | * Eth_NR_Perc 647 | 648 | ```{r} 649 | # Merge the non-response data with the percentage of each ethnic group within each LA 650 | # Use select to isolate columns I want to preserve in the merge, LA_sum is redundant... 

nr <- merge(grp_nr, select(grp_pct, LA_name, ethnic_group, Percentage), by = c("LA_name", "ethnic_group"))
```

```{r}
# Quick glance

head(nr)
```

## Interactive scatterplot

In this section we're going to:

1. Create a simple scatterplot exploring the relationship between the percentage of Asian residents within each local authority and the group's non-response rate

2. Implement a dropdown widget to update our scatterplot

```{r}
# Subset the dataframe so we only have responses from the Asian ethnic group

asian <- nr %>%
  filter(ethnic_group == 'Asian, Asian British or Asian Welsh')
```

```{r}
# Check it out..

head(asian)
```

```{r}
# Initialize figure
fig3 <- plot_ly(data = asian,
                x = ~Percentage,
                y = ~Eth_NR_Perc,
                text = ~paste('LA Name:', LA_name,
                              '<br>Non-response Total:', NR_total,
                              '<br>Ethnic Group Total:', Ethnic_sum),
                hoverinfo = "text",
                mode = 'markers', # Specify marker points
                type = 'scatter', # Graph type - scatterplot
                name = 'Asian')   # Default visible graph


# Customize layout
fig3 <- fig3 %>%
  layout(title = 'Non-Response Rates of the Asian Ethnic Group Across Local Authorities',
         xaxis = list(title = 'Percentage of Ethnic Group'),
         yaxis = list(title = 'Non-response Rate'),
         width = 700,
         height = 700)

# Show the plot
fig3
```

## Dropdown selection

What we're going to do now is use Plotly's 'updatemenus' in conjunction with the 'update' method to create a dropdown that lets us switch between the Asian and White ethnic groups to make some comparisons.

### Step 1: Initialise figure and add traces

We'll start by creating a plot_ly figure with no data or variables specified. This is because we're going to use add_trace to add our two sets of data points to the plot. A 'trace' refers to a single set of data, so in our example we want one trace with the data points for the Asian ethnic group, and another one for the White ethnic group. This will start to make sense when we look at the code below.

```{r}
# Initialize a Plotly figure
fig4 <- plot_ly()

# Let's take a look..
# This is our building block
fig4
```

```{r}
# Subset the dataframe so we only have responses from the White ethnic group
white <- nr %>%
  filter(ethnic_group == 'White: English, Welsh, Scottish, Northern Irish or British')
```

```{r}
# Quick check...
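# (Aside) A note on the hover text we build with paste() in the traces below:
# plotly hover labels understand a small subset of HTML, so '<br>' acts as a
# line-break tag -- paste() itself just glues the pieces into one string.
# Hypothetical values, purely for illustration:
paste('LA Name:', 'Adur', '<br>Non-response Total:', 12)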
head(white)
```

```{r}
# Add a trace for the Asian ethnic group

fig4 <- fig4 %>% add_trace(
  data = asian,
  x = ~Percentage,
  y = ~Eth_NR_Perc,
  text = ~paste('LA Name:', LA_name,
                '<br>Non-response Total:', NR_total,
                '<br>Ethnic Group Total:', Ethnic_sum),
  type = 'scatter',
  mode = 'markers',
  name = 'Asian',
  hoverinfo = 'text',
  # The visible parameter sets the initial visibility of each trace when the plot is first rendered
  visible = TRUE
)


# Add a trace for the White ethnic group
fig4 <- fig4 %>% add_trace(
  data = white,
  x = ~Percentage,
  y = ~Eth_NR_Perc,
  text = ~paste('LA Name:', LA_name,
                '<br>Non-response Total:', NR_total,
                '<br>Ethnic Group Total:', Ethnic_sum),
  type = 'scatter',
  mode = 'markers',
  name = 'White',
  hoverinfo = 'text',
  visible = FALSE
)

fig4
```

### Step 2: Configure dropdown buttons and implement the update method

```{r}

# Define dropdown buttons for interactivity
fig4 <- fig4 %>% layout(
  title = "Non-Response Rates Across Local Authorities",
  xaxis = list(title = "Percentage of Ethnic Group"),
  yaxis = list(title = "Non-response Rate"),
  # Hide the legend, as the interactive dropdown will handle trace visibility
  showlegend = FALSE,
  # Add a dropdown menu for interactive plot updates
  updatemenus = list(
    list(
      type = "dropdown",
      buttons = list(
        list(
          # The update method changes plot attributes when a button is clicked
          method = "update",
          # The first button makes the Asian data visible and hides the White data
          # The visible values map, in order, onto the traces we added above
          args = list(list("visible" = list(TRUE, FALSE)),
                      # Update the title specific to the Asian data
                      list("title" = "Non-Response Rates of the Asian Ethnic Group Across Local Authorities")),
          # Specify the button label
          label = "Asian"
        ),
        list(
          method = "update",
          args = list(list("visible" = list(FALSE, TRUE)),
                      list("title" = "Non-Response Rates of the White Ethnic Group Across Local Authorities")),
          label = "White"
        )
      )
    )
  )
)

# Display the figure
fig4
```

# Sharing your interactive graphs online

First, I'll show you a really simple way to host Plotly graphs specifically; then we'll look at other, more involved options that work with many visualisation packages.

1. Use Plotly's ['Chart Studio'](https://chart-studio.plotly.com/).
You can upload your visualisations directly from your coding environment and then get a link to share them online. You'll need to sign up for an account, but it's free; if you want to share the link privately, though, you'll need to upgrade your account. Otherwise, for data that's fine being out in the open, this is a good option.

2. Embed your graphs in GitHub Pages. I'm not going to go into this fully, but if you're interested in doing something like this I recommend GitHub's tutorial: https://pages.github.com/. This is what I used to create a GitHub Pages site for the UKDS, which now acts as a little website where we can show off cool projects like this one! It might be something to consider if you're a researcher looking to share your work.


-------------------------------------------------------------------------------- /Python_code/HTML_files/scatter.html: --------------------------------------------------------------------------------
--------------------------------------------------------------------------------