├── .dockerignore ├── .gitignore ├── Dockerfile ├── README.md ├── assets └── dashboard.png ├── data_prep_net_migration ├── README.md ├── assign_race.ipynb ├── compute_race.ipynb ├── gen_points_in_rectangle_fast_script.ipynb ├── gen_race_mig_points.ipynb └── gen_table_with_race_migration.ipynb ├── data_prep_total_population ├── .ipynb_checkpoints │ └── SeparateTotalDatasetsByState-checkpoint.ipynb ├── README.md ├── SeparateTotalDatasetsByState.ipynb ├── SeparateTotalDatasetsByState.py ├── add_race_net_county_to_population.ipynb ├── gen_table_with_migration.ipynb ├── gen_total_population_points_script.ipynb └── map_blocks_and_calc_population.ipynb ├── entrypoint.sh ├── environment.yml ├── environment_for_docker.yml ├── holoviews_demo ├── README.md ├── census_net_migration_demo.ipynb └── environment.yml ├── id2county.pkl └── plotly_demo ├── README.md ├── app.py ├── assets ├── dash-logo.png ├── rapids-logo.png └── s1.css ├── colab_plotly_rapids_app.ipynb ├── dask_app.py └── utils ├── __init__.py └── utils.py /.dockerignore: -------------------------------------------------------------------------------- 1 | ./data/* 2 | dask-worker-space 3 | .vscode -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | data 2 | **checkpoint** 3 | *pyc 4 | *log -------------------------------------------------------------------------------- /Dockerfile: -------------------------------------------------------------------------------- 1 | ARG RAPIDS_VERSION=22.12 2 | ARG CUDA_VERSION=11.5 3 | ARG LINUX_VERSION=ubuntu20.04 4 | ARG PYTHON_VERSION=3.9 5 | FROM nvcr.io/nvidia/rapidsai/rapidsai-core:${RAPIDS_VERSION}-cuda${CUDA_VERSION}-base-${LINUX_VERSION}-py${PYTHON_VERSION} 6 | 7 | WORKDIR /rapids/ 8 | RUN mkdir plotly_census_demo 9 | 10 | WORKDIR /rapids/plotly_census_demo 11 | RUN mkdir data 12 | WORKDIR /rapids/plotly_census_demo/data 13 | RUN curl 
https://data.rapids.ai/viz-data/total_population_dataset.parquet -o total_population_dataset.parquet 14 | 15 | WORKDIR /rapids/plotly_census_demo 16 | 17 | COPY . . 18 | 19 | RUN source activate rapids && conda remove --force cuxfilter && mamba env update --file environment_for_docker.yml 20 | 21 | ENTRYPOINT ["bash","./entrypoint.sh"] -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Plotly-Dash + RAPIDS | Census 2020 Visualization 2 | 3 | 4 | 5 | ![dashboard_png_url](./assets/dashboard.png) 6 | 7 | ## Charts 8 | 9 | 1. Map chart shows the total population points for the chosen view and selected area 10 | 2. Top counties bar shows the top 15 counties for the chosen view and selected area 11 | 3. Bottom counties bar shows the bottom 15 counties for the chosen view and selected area 12 | 4. Race Distribution shows the distribution of individual races across blocks for the chosen view and selected area 13 | 14 | Cross-filtering links all four charts using the box-select tool 15 | 16 | ## Data-Selection Views 17 | 18 | The demo consists of six views, all calculated at the block level: 19 | 20 | - Total Population view shows total Census 2020 population. 21 | - Migrating In view shows net inward decennial migration. 22 | - Stationary view shows the population that was stationary. 23 | - Migrating Out view shows net outward decennial migration. 24 | - Net Migration view shows total decennial migration. Points are colored into three categories: migrating in, stationary, migrating out. 25 | - Population with Race shows total Census 2020 population colored into seven race categories: White alone, African American alone, American Indian alone, Asian alone, Native Hawaiian alone, Other Race alone, Two or More races. 
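The race categories above are stored in the dataset as integer codes (see the mapping notes in `data_prep_net_migration/README.md`). A minimal lookup table can sketch that encoding — the `race_label` helper and the "Unknown" fallback are ours, not part of the app:

```python
# Integer race codes as documented in data_prep_net_migration/README.md;
# label strings follow the view description above.
RACE_LABELS = {
    0: "All",
    1: "White alone",
    2: "African American alone",
    3: "American Indian alone",
    4: "Asian alone",
    5: "Native Hawaiian alone",
    6: "Other Race alone",
    7: "Two or More races",
}

def race_label(code: int) -> str:
    # Fall back to "Unknown" for codes outside the documented mapping
    return RACE_LABELS.get(code, "Unknown")
```

A lookup like this is handy when inspecting the parquet files directly, since the `race` column only contains the integer codes.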
26 | 27 | ## Installation and Run Steps 28 | 29 | ## Base Layer Setup 30 | 31 | The visualization uses a Mapbox base layer that requires an access token. Create one for free [here on mapbox](https://www.mapbox.com/help/define-access-token/). Go to the `plotly_demo` folder in the demo root directory and create a token file named `.mapbox_token`. Copy your token contents into the file. 32 | 33 | **NOTE:** The app may fail to load without the token. 34 | 35 | ## Data 36 | 37 | There are 2 main datasets: 38 | 39 | - [Total Population Dataset](https://data.rapids.ai/viz-data/total_population_dataset.parquet): Census 2020 total population with decennial migration from Census 2010 at the block level. 40 | - [Net Migration Dataset](https://data.rapids.ai/viz-data/net_migration_dataset.parquet): net migration from Census 2010 at the block level. 41 | 42 | For more information on how the Census 2020 and 2010 migration data was prepared to show individual points, refer to the `/data_prep_total_population` folder. 43 | 44 | ### Conda Env 45 | 46 | Verify that the following arguments in `environment.yml` match your system (an easy way to check is `nvidia-smi`): 47 | 48 | cudatoolkit: Version used is `11.5` 49 | 50 | ```bash 51 | # set up conda environment 52 | conda env create --name plotly_env --file environment.yml 53 | source activate plotly_env 54 | 55 | # run and access the single-GPU version 56 | cd plotly_demo 57 | python app.py 58 | 59 | # run and access the multi-GPU version; run `python dask_app.py --help` for args info 60 | # if the --cuda_visible_devices argument is not passed, all available GPUs are used 61 | cd plotly_demo 62 | python dask_app.py --cuda_visible_devices=0,1 63 | ``` 64 | 65 | ### Docker 66 | 67 | Verify that the following arguments in the Dockerfile match your system: 68 | 69 | 1. CUDA_VERSION: Supported versions are `11.0+` 70 | 2. 
LINUX_VERSION: Supported OS values are `ubuntu18.04, ubuntu20.04, centos7` 71 | 72 | The most up-to-date supported OS and CUDA versions can be found here: [RAPIDS requirements](https://rapids.ai/start.html#req) 73 | 74 | ```bash 75 | # build 76 | docker build -t plotly_demo . 77 | 78 | # run and access the single-GPU version via: http://localhost:8050 / http://ip_address:8050 / http://0.0.0.0:8050 79 | docker run --gpus all --name single_gpu -p 8050:8050 plotly_demo 80 | 81 | # run and access the multi-GPU version via: http://localhost:8050 / http://ip_address:8050 / http://0.0.0.0:8050 82 | # use `--gpus all` to use all the available GPUs 83 | docker run --gpus '"device=0,1"' --name multi_gpu -p 8050:8050 plotly_demo dask_app 84 | ``` 85 | 86 | ## Requirements 87 | 88 | ### CUDA/GPU requirements 89 | 90 | - CUDA 11.0+ 91 | - NVIDIA driver 450.80.02+ 92 | - Pascal architecture or better (Compute Capability >=6.0) 93 | 94 | > Recommended memory: an NVIDIA GPU with at least 32GB of memory (or 2 GPUs with equivalent combined GPU memory when running the dask version), and at least 32GB of system memory. 95 | 96 | ### OS requirements 97 | 98 | See the [RAPIDS System Requirements section](https://rapids.ai/start.html#requirements) for information on compatible OSes. 99 | 100 | ## Dependencies 101 | 102 | - python=3.9 103 | - cudatoolkit=11.5 104 | - rapids=22.08 105 | - dash=2.5.1 106 | - jupyterlab=3.4.3 107 | - dash-html-components=2.0.0 108 | - dash-core-components=2.0.0 109 | - dash-daq=0.5.0 110 | - dash_bootstrap_components=1.2.0 111 | 112 | ## FAQ and Known Issues 113 | 114 | **What hardware do I need to run this locally?** You need an NVIDIA GPU with at least 32GB of memory (or 2 GPUs with equivalent combined GPU memory when running the dask version) and at least 32GB of system memory. 
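A quick way to check whether a machine meets the 32GB GPU-memory requirement is to sum the per-GPU totals reported by `nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits` (one MiB value per line). The parsing helper below is a sketch — the function name and sample output are ours:

```python
def total_gpu_memory_mib(smi_output: str) -> int:
    """Sum per-GPU memory totals (in MiB) from nvidia-smi query output,
    where each non-empty line holds one GPU's total memory."""
    return sum(int(line.strip()) for line in smi_output.splitlines() if line.strip())

# Example output from a hypothetical 2-GPU machine (16GB each):
sample = "16384\n16384\n"
meets_requirement = total_gpu_memory_mib(sample) >= 32 * 1024  # 32GB in MiB
```

Two 16GB GPUs together satisfy the requirement when running the dask version, matching the recommendation above.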
115 | 116 | **How did you compute migration?** Migration was computed by comparing the block-level population between Census 2010 and Census 2020. 117 | 118 | **How did you compare populations across block-level boundary changes?** The Census Bureau's [Relationship Files](https://www.census.gov/geographies/reference-files/time-series/geo/relationship-files.html#t10t20) map 2010 Census Tabulation Blocks to 2020 Census Tabulation Blocks. Block relationships may be one-to-one, many-to-one, one-to-many, or many-to-many. Population counts were apportioned proportionally to account for the splitting and merging of blocks between 2010 and 2020. 119 | 120 | **How did you determine race?** Race for stationary and inward-migration individuals was randomly assigned within a block, but the counts add up accurately at the block level. However, due to how the data is anonymized, race for the outward-migration population could not be calculated. 121 | 122 | **How did you get individual point locations?** The population density points are randomly placed within a census block and assigned to match the distribution counts at the census block level. 123 | 124 | **How are the population and distributions filtered?** Use the box-select tool icon for the map, or click and drag for the bar charts. 125 | 126 | **Why is the population data from 2010 and 2020?** Only census data is recorded at the block level, which provides the highest-resolution population distributions available. For more details on census boundaries, refer to the [TIGERweb app](https://tigerweb.geo.census.gov/tigerwebmain/TIGERweb_apps.html). 127 | 128 | **The dashboard stopped responding or the chart data disappeared!** This is likely caused by an out-of-memory error, and the application must be restarted. 129 | 130 | **How do I request a feature or report a bug?** Create an [Issue](https://github.com/rapidsai/plotly-dash-rapids-census-demo/issues) and we will get to it ASAP. 
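The proportional apportionment used to reconcile 2010 and 2020 block boundaries can be sketched in plain Python. This is a simplified illustration, not the project's actual pipeline; the function name and the overlap-weight inputs are hypothetical stand-ins for the relationship-file fields:

```python
def apportion_2010_population(p10: int, overlaps: dict) -> dict:
    """Split a 2010 block's population across the 2020 blocks it overlaps.

    p10      -- total 2010 population of the source block
    overlaps -- mapping of 2020 block ID -> overlap weight (e.g. area share
                taken from the block relationship file)
    """
    total = sum(overlaps.values())
    if total == 0:
        # Degenerate relationship record: nothing to apportion
        return {block: 0.0 for block in overlaps}
    return {block: p10 * w / total for block, w in overlaps.items()}

# One-to-many case: a 2010 block of 120 people split between two 2020 blocks,
# where block "20A" covers three times the overlap area of block "20B".
shares = apportion_2010_population(120, {"20A": 3.0, "20B": 1.0})
```

The shares always sum back to the 2010 population, which is what makes block-level comparisons consistent even after blocks are divided or merged.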
131 | 132 | ## Acknowledgments and Data Sources 133 | 134 | - 2020 and 2010 Population Census data used to compute the Migration Dataset, used with permission from IPUMS NHGIS, University of Minnesota, [www.nhgis.org](https://www.nhgis.org/) (not for redistribution). 135 | - Base map layer provided by [Mapbox](https://www.mapbox.com/). 136 | - Dashboard developed with [Plotly Dash](https://plotly.com/dash/). 137 | - Geospatial point rendering developed with [Datashader](https://datashader.org/). 138 | - GPU toggle accelerated with [RAPIDS cudf](https://rapids.ai/) and [cupy](https://cupy.chainer.org/), CPU toggle with [pandas](https://pandas.pydata.org/). 139 | - For source code and the data workflow, visit our [GitHub](https://github.com/rapidsai/plotly-dash-rapids-census-demo/tree/census-2020). 140 | -------------------------------------------------------------------------------- /assets/dashboard.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/rapidsai/plotly-dash-rapids-census-demo/e4af2bd3de86263b7f1f947ba9b302002e047a55/assets/dashboard.png -------------------------------------------------------------------------------- /data_prep_net_migration/README.md: -------------------------------------------------------------------------------- 1 | # Net Migration dataset generation 2 | 3 | ## Order of execution 4 | 5 | 1. gen_table_with_race_migration 6 | 2. gen_race_mig_points 7 | 3. compute_race 8 | 4. assign_race 9 | 10 | ## Mappings: 11 | 12 | ### Block Net 13 | 14 | 1: Inward Migration
15 | 0: Stationary
16 | -1: Outward Migration
17 | 18 | ### Block diff 19 | 20 | Integer 21 | 22 | ### Race 23 | 24 | 0: All
25 | 1: White
26 | 2: African American
27 | 3: American Indian
28 | 4: Asian alone
29 | 5: Native Hawaiian
30 | 6: Other Race alone
31 | 7: Two or More
32 | 33 | ### County 34 | 35 | Mappings for counties can be found in `id2county.pkl` file from root directory. 36 | 37 | ### Final Dataset 38 | 39 | You can download the final net miragtion dataset [here](https://data.rapids.ai/viz-data/net_migration_dataset.parquet) 40 | -------------------------------------------------------------------------------- /data_prep_net_migration/gen_points_in_rectangle_fast_script.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "id": "6f5f29ab-3b00-4377-a320-421b6e33386f", 6 | "metadata": {}, 7 | "source": [ 8 | "#### Objective:- Alternative script that generates points within rectangular bounds in 40min for sanity checks ( Needs to be integrated with cuSpatial for checking points within polygon) " 9 | ] 10 | }, 11 | { 12 | "cell_type": "code", 13 | "execution_count": null, 14 | "id": "1c9f4d54-8bb4-42ce-aa05-a7cb5465472a", 15 | "metadata": {}, 16 | "outputs": [], 17 | "source": [ 18 | "import cudf, cupy\n", 19 | "import pandas as pd, numpy as np\n", 20 | "import geopandas as gpd\n", 21 | "# from shapely.geometry import Point, Polygon\n", 22 | "import os\n", 23 | "import datetime\n", 24 | "import pickle" 25 | ] 26 | }, 27 | { 28 | "cell_type": "markdown", 29 | "id": "ad825f8c-be08-40c6-8b95-fc75f357be7d", 30 | "metadata": { 31 | "tags": [] 32 | }, 33 | "source": [ 34 | "### ETL" 35 | ] 36 | }, 37 | { 38 | "cell_type": "code", 39 | "execution_count": null, 40 | "id": "1e9d91de-5c2b-4d5d-8810-f5d74598ca7f", 41 | "metadata": {}, 42 | "outputs": [], 43 | "source": [ 44 | "df = pd.read_csv('data/mapped_data_full.csv',encoding='unicode_escape',dtype={'GISJOIN':'int64','ID20':'int64','STATE':'int32','COUNTY':'str','P20':'int32','P10_new':'int32'}).drop('Unnamed: 0',axis=1)\n", 45 | "df['P_delta']=df['P20'] - df['eq_P10']\n", 46 | "df['P_net']= df['P_delta'].apply(lambda x : 1 if x>0 else 0)\n", 47 | "df['number'] = 
df.P_delta.round().abs().astype('int32')\n", 48 | "df.head()" 49 | ] 50 | }, 51 | { 52 | "cell_type": "code", 53 | "execution_count": null, 54 | "id": "91a55f8f-e5bd-4fd5-9d6e-82cab1a7c1cc", 55 | "metadata": {}, 56 | "outputs": [], 57 | "source": [ 58 | "# df =df.to_pandas()" 59 | ] 60 | }, 61 | { 62 | "cell_type": "markdown", 63 | "id": "0b6a458e-61dd-4645-a122-366a40fa9f95", 64 | "metadata": { 65 | "tags": [] 66 | }, 67 | "source": [ 68 | "#### MAKE function" 69 | ] 70 | }, 71 | { 72 | "cell_type": "code", 73 | "execution_count": null, 74 | "id": "186e35dc-e5e8-4820-b02b-6baea50ca749", 75 | "metadata": {}, 76 | "outputs": [], 77 | "source": [ 78 | "def Random_Points_in_Bounds(row): \n", 79 | " polygon = row.iloc[0]\n", 80 | " number = row.iloc[1]\n", 81 | " minx, miny, maxx, maxy = polygon.bounds\n", 82 | " x = np.random.uniform( minx, maxx, number )\n", 83 | " y = np.random.uniform( miny, maxy, number )\n", 84 | " return [x, y]\n", 85 | "\n", 86 | "def makeXYpair(row):\n", 87 | " l1 = row[0]\n", 88 | " l2 = row[1]\n", 89 | " return list(map(lambda x, y:[x,y], l1, l2))\n", 90 | "\n", 91 | "\n", 92 | "def exec_data(state_key_list):\n", 93 | " c=0\n", 94 | " for i in state_key_list:\n", 95 | " c+=1\n", 96 | " if i< 10:\n", 97 | " i_str = '0'+str(i)\n", 98 | " else:\n", 99 | " i_str = str(i)\n", 100 | " path ='data/tl_shapefiles/tl_2021_%s_tabblock20.shp'%(i_str)\n", 101 | " print(\"started reading shape file for state \", states[i])\n", 102 | " if os.path.isfile(path): \n", 103 | " gpdf = gpd.read_file(path)[['GEOID20', 'geometry']].sort_values('GEOID20').reset_index(drop=True)\n", 104 | " gpdf.GEOID20 = gpdf.GEOID20.astype('int64')\n", 105 | " print(\"completed reading shape file for state \", states[i])\n", 106 | " df_temp = df.query('STATE == @i')[['ID20', 'number','COUNTY','P_delta','P_net']]\n", 107 | " merged_df =pd.merge(gpdf,df_temp[['ID20','number']],left_on='GEOID20',right_on='ID20',how='inner')\n", 108 | " merged_df = 
merged_df[merged_df.number!=0].reset_index(drop=True)\n", 109 | " merged_df =merged_df.reset_index(drop=True).drop(columns=['GEOID20'])\n", 110 | "\n", 111 | " print(\"starting to generate data for \"+str(states[i])+\"... \")\n", 112 | " t1 = datetime.datetime.now()\n", 113 | " population_df = pd.DataFrame(merged_df[['geometry','number']].apply(Random_Points_in_Bounds,axis=1),columns=['population'])\n", 114 | " points_df = population_df['population'].apply(makeXYpair)\n", 115 | " points_df = pd.DataFrame(points_df.explode()).reset_index()\n", 116 | " \n", 117 | " pop_list =points_df['population'].to_list()\n", 118 | " final_df =pd.DataFrame(pop_list,columns=['x','y']).reset_index(drop=True)\n", 119 | " \n", 120 | " ids = merged_df.ID20.to_list()\n", 121 | " number =merged_df.number.to_list()\n", 122 | " \n", 123 | " rows = []\n", 124 | " for id20, n in zip(ids,number):\n", 125 | " rows.extend([id20]*n)\n", 126 | " \n", 127 | " \n", 128 | " final_df['ID20'] = pd.Series(rows)\n", 129 | " final_df = final_df.sort_values('ID20').reset_index(drop=True)\n", 130 | " final_df = pd.merge(final_df,df_temp, on='ID20',how='left')\n", 131 | " \n", 132 | " \n", 133 | " final_df.to_csv('data/migration_files1/migration_%s'%str(states[i])+'.csv', index=False)\n", 134 | " print(\"Processing complete for\", states[i])\n", 135 | " print('Processing for '+str(states[i])+' complete \\n total time', datetime.datetime.now() - t1)\n", 136 | " \n", 137 | " del(df_temp)\n", 138 | " else:\n", 139 | " print(\"shape file does not exist\")\n", 140 | " continue" 141 | ] 142 | }, 143 | { 144 | "cell_type": "code", 145 | "execution_count": null, 146 | "id": "b5e2b4fb-c8e2-48fc-a496-fb0e0b5387a4", 147 | "metadata": {}, 148 | "outputs": [], 149 | "source": [ 150 | "# states = {1 :\"AL\",2 :\"AK\",4 :\"AZ\",5 :\"AR\",6 :\"CA\",8 :\"CO\",9 :\"CT\",10:\"DE\",11:\"DC\",12:\"FL\",13:\"GA\",15:\"HI\",\n", 151 | "# 
16:\"ID\",17:\"IL\",18:\"IN\",19:\"IA\",20:\"KS\",21:\"KY\",22:\"LA\",23:\"ME\",24:\"MD\",25:\"MA\",26:\"MI\",27:\"MN\",\n", 152 | "# 28:\"MS\",29:\"MO\",30:\"MT\",31:\"NE\",32:\"NV\",33:\"NH\",34:\"NJ\",35:\"NM\",36:\"NY\",37:\"NC\",38:\"ND\",39:\"OH\",\n", 153 | "# 40:\"OK\",41:\"OR\",42:\"PA\",44:\"RI\",45:\"SC\",46:\"SD\",47:\"TN\",48:\"TX\",49:\"UT\",50:\"VT\",51:\"VA\",53:\"WA\",\n", 154 | "# 54:\"WV\",55:\"WI\",56:\"WY\",72:\"PR\"}\n", 155 | "states= { 12:\"FL\",13:\"GA\",15:\"HI\",16:\"ID\",17:\"IL\",18:\"IN\",19:\"IA\",20:\"KS\"}" 156 | ] 157 | }, 158 | { 159 | "cell_type": "code", 160 | "execution_count": null, 161 | "id": "68872dfa-8eb7-44a3-9bdf-73376f8c28ec", 162 | "metadata": { 163 | "tags": [] 164 | }, 165 | "outputs": [], 166 | "source": [ 167 | "exec_data(states.keys())" 168 | ] 169 | }, 170 | { 171 | "cell_type": "markdown", 172 | "id": "42b474dd-8db8-48d1-8a66-62bcc8dbad27", 173 | "metadata": { 174 | "tags": [] 175 | }, 176 | "source": [ 177 | "### Concat States" 178 | ] 179 | }, 180 | { 181 | "cell_type": "code", 182 | "execution_count": null, 183 | "id": "a1f2fd86-6a7d-41db-96e9-2665c90bf4c4", 184 | "metadata": {}, 185 | "outputs": [], 186 | "source": [ 187 | "def merge_shape_and_states(state_key_list):\n", 188 | " concat_states = cudf.DataFrame()\n", 189 | " \n", 190 | " for i in state_key_list:\n", 191 | " if i< 10:\n", 192 | " i_str = '0'+str(i)\n", 193 | " else:\n", 194 | " i_str = str(i)\n", 195 | " path = 'data/migration_files1/migration_%s'%str(states[i])+'.csv'\n", 196 | " if os.path.isfile(path): \n", 197 | " temp = cudf.read_csv(path,dtype={'ID20':'int64','x':'float32','y':'float32'})# Load per-state migration CSV\n", 198 | " concat_states = cudf.concat([concat_states,temp])\n", 199 | " else:\n", 200 | " print(path)\n", 201 | " print(\"migration file does not exist\")\n", 202 | " continue\n", 203 | " return concat_states" 204 | ] 205 | }, 206 | { 207 | "cell_type": "code", 208 | "execution_count": null, 209 | "id": 
"e88d458c-2016-4076-b513-59e1277f751b", 210 | "metadata": { 211 | "tags": [] 212 | }, 213 | "outputs": [], 214 | "source": [ 215 | "indv_df = merge_shape_and_states(states.keys())\n", 216 | "indv_df.rename(columns={'GEOID20':'ID20'},inplace=True)\n", 217 | "indv_df.head()" 218 | ] 219 | }, 220 | { 221 | "cell_type": "markdown", 222 | "id": "fdc83845-75c0-4ad0-8538-1abeca2190cc", 223 | "metadata": {}, 224 | "source": [ 225 | "### Load saved files" 226 | ] 227 | }, 228 | { 229 | "cell_type": "code", 230 | "execution_count": null, 231 | "id": "7d37b71c-fad2-46b5-9324-f0e67bf85a09", 232 | "metadata": {}, 233 | "outputs": [], 234 | "source": [ 235 | "pickle.dump(indv_df,open('fulldata_gpu_2','wb'))\n", 236 | "# indv_df = pickle.load(open('fulldata_gpu','rb'))" 237 | ] 238 | }, 239 | { 240 | "cell_type": "code", 241 | "execution_count": null, 242 | "id": "d94ea6f0-0c6f-4932-87cb-53ad28b3b57a", 243 | "metadata": {}, 244 | "outputs": [], 245 | "source": [ 246 | "# indv_df = indv_df.to_pandas()" 247 | ] 248 | }, 249 | { 250 | "cell_type": "code", 251 | "execution_count": null, 252 | "id": "ebae939f-f03d-478d-ad45-c0fee7670f0a", 253 | "metadata": {}, 254 | "outputs": [], 255 | "source": [ 256 | "import dask_cudf  # import was missing; needed for dask_cudf.from_cudf\nindv_df = dask_cudf.from_cudf(indv_df, npartitions=2).persist()" 257 | ] 258 | }, 259 | { 260 | "cell_type": "code", 261 | "execution_count": null, 262 | "id": "efe847c5-892c-4bee-bacb-38e57ae56ebb", 263 | "metadata": { 264 | "tags": [] 265 | }, 266 | "outputs": [], 267 | "source": [ 268 | "# dataset = pd.merge(indv_df,df,on='ID20',how='left')\n", 269 | "dataset = indv_df.merge(df,on='ID20',how='left') # merge dask dfs" 270 | ] 271 | }, 272 | { 273 | "cell_type": "code", 274 | "execution_count": null, 275 | "id": "36e6fb1b-a420-4ba2-8b25-b14f5978ca9b", 276 | "metadata": {}, 277 | "outputs": [], 278 | "source": [ 279 | "len(dataset)" 280 | ] 281 | }, 282 | { 283 | "cell_type": "code", 284 | "execution_count": null, 285 | "id": "5b63b35b-3774-4900-a1c2-1dc6615784eb", 286 | "metadata": 
{}, 287 | "outputs": [], 288 | "source": [ 289 | "del(indv_df)\n", 290 | "del(df)" 291 | ] 292 | }, 293 | { 294 | "cell_type": "code", 295 | "execution_count": null, 296 | "id": "4f85b884-4e84-451c-a170-ce25631cc922", 297 | "metadata": { 298 | "tags": [] 299 | }, 300 | "outputs": [], 301 | "source": [ 302 | "dataset = dataset.sort_values('ID20')\n", 303 | "dataset = dataset.drop(columns=['GISJOIN'])\n", 304 | "dataset.head()" 305 | ] 306 | }, 307 | { 308 | "cell_type": "markdown", 309 | "id": "e0a789ed-4c60-43fd-ac76-1ac99a971110", 310 | "metadata": { 311 | "tags": [] 312 | }, 313 | "source": [ 314 | "### Viz check" 315 | ] 316 | }, 317 | { 318 | "cell_type": "code", 319 | "execution_count": null, 320 | "id": "f8643ed6-4b19-4733-a2a6-eba71884e700", 321 | "metadata": {}, 322 | "outputs": [], 323 | "source": [ 324 | "from holoviews.element.tiles import CartoDark\n", 325 | "import holoviews as hv\n", 326 | "from holoviews.operation.datashader import datashade,rasterize,shade\n", 327 | "from plotly.colors import sequential\n", 328 | "hv.extension('plotly')" 329 | ] 330 | }, 331 | { 332 | "cell_type": "code", 333 | "execution_count": null, 334 | "id": "85562104-aef7-48c0-85bd-347316a3f633", 335 | "metadata": {}, 336 | "outputs": [], 337 | "source": [ 338 | "dataset[\"easting\"], dataset[\"northing\"] = hv.Tiles.lon_lat_to_easting_northing(dataset[\"x\"], dataset[\"y\"])\n", 339 | "dataset.head()" 340 | ] 341 | }, 342 | { 343 | "cell_type": "code", 344 | "execution_count": null, 345 | "id": "d8653497-d17e-4c38-b1e8-732d424cae04", 346 | "metadata": {}, 347 | "outputs": [], 348 | "source": [ 349 | "dataset = hv.Dataset(dataset)" 350 | ] 351 | }, 352 | { 353 | "cell_type": "code", 354 | "execution_count": null, 355 | "id": "2a19ce9a-9d34-4507-8d60-93dab88dd289", 356 | "metadata": {}, 357 | "outputs": [], 358 | "source": [ 359 | "mapbox_token = 'pk.eyJ1IjoibmlzaGFudGoiLCJhIjoiY2w1aXpwMXlkMDEyaDNjczBkZDVjY2l6dyJ9.7oLijsue-xOICmTqNInrBQ'\n", 360 | "tiles= 
hv.Tiles().opts(mapboxstyle=\"dark\", accesstoken=mapbox_token)\n", 361 | "points = datashade(hv.Points(dataset, [\"easting\", \"northing\"]),cmap=sequential.Plasma)" 362 | ] 363 | }, 364 | { 365 | "cell_type": "code", 366 | "execution_count": null, 367 | "id": "5d1fc4fe-e329-4f8c-984b-752b8c87246c", 368 | "metadata": { 369 | "tags": [] 370 | }, 371 | "outputs": [], 372 | "source": [ 373 | "(tiles*points).opts(width=1800, height=500)" 374 | ] 375 | } 376 | ], 377 | "metadata": { 378 | "kernelspec": { 379 | "display_name": "Python 3 (ipykernel)", 380 | "language": "python", 381 | "name": "python3" 382 | }, 383 | "language_info": { 384 | "codemirror_mode": { 385 | "name": "ipython", 386 | "version": 3 387 | }, 388 | "file_extension": ".py", 389 | "mimetype": "text/x-python", 390 | "name": "python", 391 | "nbconvert_exporter": "python", 392 | "pygments_lexer": "ipython3", 393 | "version": "3.9.13" 394 | } 395 | }, 396 | "nbformat": 4, 397 | "nbformat_minor": 5 398 | } 399 | -------------------------------------------------------------------------------- /data_prep_total_population/.ipynb_checkpoints/SeparateTotalDatasetsByState-checkpoint.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "## Separate Total Population dataset by States" 8 | ] 9 | }, 10 | { 11 | "cell_type": "code", 12 | "execution_count": 1, 13 | "metadata": {}, 14 | "outputs": [], 15 | "source": [ 16 | "import cudf\n", 17 | "import pickle" 18 | ] 19 | }, 20 | { 21 | "cell_type": "code", 22 | "execution_count": 2, 23 | "metadata": {}, 24 | "outputs": [ 25 | { 26 | "data": { 27 | "text/html": [ 28 | "
\n", 29 | "\n", 42 | "\n", 43 | " \n", 44 | " \n", 45 | " \n", 46 | " \n", 47 | " \n", 48 | " \n", 49 | " \n", 50 | " \n", 51 | " \n", 52 | " \n", 53 | " \n", 54 | " \n", 55 | " \n", 56 | " \n", 57 | " \n", 58 | " \n", 59 | " \n", 60 | " \n", 61 | " \n", 62 | " \n", 63 | " \n", 64 | " \n", 65 | " \n", 66 | " \n", 67 | " \n", 68 | " \n", 69 | " \n", 70 | " \n", 71 | " \n", 72 | " \n", 73 | " \n", 74 | " \n", 75 | " \n", 76 | " \n", 77 | " \n", 78 | " \n", 79 | " \n", 80 | " \n", 81 | " \n", 82 | " \n", 83 | " \n", 84 | " \n", 85 | " \n", 86 | " \n", 87 | " \n", 88 | " \n", 89 | " \n", 90 | " \n", 91 | " \n", 92 | " \n", 93 | " \n", 94 | " \n", 95 | "
eastingnorthingracenetcounty
0-9626792.03825189.75100
1-9626832.03825073.75100
2-9627101.03825153.50100
3-9627149.03825322.75100
4-9627159.03825334.75100
\n", 96 | "
" 97 | ], 98 | "text/plain": [ 99 | " easting northing race net county\n", 100 | "0 -9626792.0 3825189.75 1 0 0\n", 101 | "1 -9626832.0 3825073.75 1 0 0\n", 102 | "2 -9627101.0 3825153.50 1 0 0\n", 103 | "3 -9627149.0 3825322.75 1 0 0\n", 104 | "4 -9627159.0 3825334.75 1 0 0" 105 | ] 106 | }, 107 | "execution_count": 2, 108 | "metadata": {}, 109 | "output_type": "execute_result" 110 | } 111 | ], 112 | "source": [ 113 | "# Load the dataset\n", 114 | "df = cudf.read_parquet('../data/total_population_dataset.parquet')\n", 115 | "df.head()" 116 | ] 117 | }, 118 | { 119 | "cell_type": "code", 120 | "execution_count": 3, 121 | "metadata": {}, 122 | "outputs": [ 123 | { 124 | "data": { 125 | "text/html": [ 126 | "
\n", 127 | "\n", 140 | "\n", 141 | " \n", 142 | " \n", 143 | " \n", 144 | " \n", 145 | " \n", 146 | " \n", 147 | " \n", 148 | " \n", 149 | " \n", 150 | " \n", 151 | " \n", 152 | " \n", 153 | " \n", 154 | " \n", 155 | " \n", 156 | " \n", 157 | " \n", 158 | " \n", 159 | " \n", 160 | " \n", 161 | " \n", 162 | " \n", 163 | " \n", 164 | " \n", 165 | " \n", 166 | " \n", 167 | " \n", 168 | " \n", 169 | " \n", 170 | " \n", 171 | " \n", 172 | " \n", 173 | " \n", 174 | " \n", 175 | " \n", 176 | " \n", 177 | " \n", 178 | " \n", 179 | " \n", 180 | " \n", 181 | "
idxcountycounty_lower
00Autauga Countyautauga county
11Baldwin Countybaldwin county
22Barbour Countybarbour county
33Bibb Countybibb county
44Blount Countyblount county
\n", 182 | "
" 183 | ], 184 | "text/plain": [ 185 | " idx county county_lower\n", 186 | "0 0 Autauga County autauga county\n", 187 | "1 1 Baldwin County baldwin county\n", 188 | "2 2 Barbour County barbour county\n", 189 | "3 3 Bibb County bibb county\n", 190 | "4 4 Blount County blount county" 191 | ] 192 | }, 193 | "execution_count": 3, 194 | "metadata": {}, 195 | "output_type": "execute_result" 196 | } 197 | ], 198 | "source": [ 199 | "# Load the state to county mapping\n", 200 | "id2county = pickle.load(open('../id2county.pkl','rb'))\n", 201 | "df_counties = cudf.DataFrame(dict(idx=list(id2county.keys()), county=list(id2county.values())))\n", 202 | "\n", 203 | "# Lowercase the county names for easier merging\n", 204 | "df_counties['county_lower'] = df_counties.county.str.lower()\n", 205 | "df_counties.head()" 206 | ] 207 | }, 208 | { 209 | "cell_type": "code", 210 | "execution_count": 4, 211 | "metadata": {}, 212 | "outputs": [ 213 | { 214 | "data": { 215 | "text/html": [ 216 | "
\n", 217 | "\n", 230 | "\n", 231 | " \n", 232 | " \n", 233 | " \n", 234 | " \n", 235 | " \n", 236 | " \n", 237 | " \n", 238 | " \n", 239 | " \n", 240 | " \n", 241 | " \n", 242 | " \n", 243 | " \n", 244 | " \n", 245 | " \n", 246 | " \n", 247 | " \n", 248 | " \n", 249 | " \n", 250 | " \n", 251 | " \n", 252 | " \n", 253 | " \n", 254 | " \n", 255 | " \n", 256 | " \n", 257 | " \n", 258 | " \n", 259 | " \n", 260 | " \n", 261 | " \n", 262 | " \n", 263 | " \n", 264 | " \n", 265 | " \n", 266 | " \n", 267 | " \n", 268 | " \n", 269 | " \n", 270 | " \n", 271 | "
countytypestate
0HarrisoncountyMissouri
1JeffersoncountyMissouri
2NewtoncountyMissouri
3WaynecountyMissouri
4LincolncountyMontana
\n", 272 | "
" 273 | ], 274 | "text/plain": [ 275 | " county type state\n", 276 | "0 Harrison county Missouri\n", 277 | "1 Jefferson county Missouri\n", 278 | "2 Newton county Missouri\n", 279 | "3 Wayne county Missouri\n", 280 | "4 Lincoln county Montana" 281 | ] 282 | }, 283 | "execution_count": 4, 284 | "metadata": {}, 285 | "output_type": "execute_result" 286 | } 287 | ], 288 | "source": [ 289 | "# Dataset downloaded from https://public.opendatasoft.com/explore/dataset/georef-united-states-of-america-county/export/?disjunctive.ste_code&disjunctive.ste_name&disjunctive.coty_code&disjunctive.coty_name\n", 290 | "county_state_df = cudf.read_csv('../data/us-counties1.csv', delimiter=\";\")[['Official Name County', 'Type', 'Official Name State']].dropna()\n", 291 | "county_state_df.columns = ['county', 'type', 'state']\n", 292 | "county_state_df.head()" 293 | ] 294 | }, 295 | { 296 | "cell_type": "code", 297 | "execution_count": 5, 298 | "metadata": {}, 299 | "outputs": [], 300 | "source": [ 301 | "# Add the type to the county name\n", 302 | "county_state_df['county'] = county_state_df.apply(lambda row: row['county'] + ' ' + row['type'], axis=1)\n", 303 | "\n", 304 | "# Remove non-ascii characters and abbreviations to match the other id2county mapping dataset\n", 305 | "county_state_df['county'] = county_state_df.county.to_pandas().replace({r'[^\\x00-\\x7F]+': '', r'([A-Z][a-z]+)([A-Z]+)': r'\\1'}, regex=True)\n", 306 | "\n", 307 | "# Lowercase the county names for easier merging\n", 308 | "county_state_df['county_lower'] = county_state_df['county'].str.lower()" 309 | ] 310 | }, 311 | { 312 | "cell_type": "code", 313 | "execution_count": 6, 314 | "metadata": {}, 315 | "outputs": [], 316 | "source": [ 317 | "# Merge the datasets and drop duplicates to get the state for each county in the total population dataset\n", 318 | "df_map_county_to_states = df_counties.merge(county_state_df, on='county_lower', how='left', suffixes=['', 
'_y']).drop_duplicates(subset=['county_lower'])[['idx', 'county', 'state' ]]" 319 | ] 320 | }, 321 | { 322 | "cell_type": "code", 323 | "execution_count": 9, 324 | "metadata": {}, 325 | "outputs": [], 326 | "source": [ 327 | "# Fill in the states for unavailable states manually by looking at the counties\n", 328 | "# Carson City, Nevada\n", 329 | "# District of Columbia, Washington DC\n", 330 | "# Remaining, Connecticut\n", 331 | "df_map_county_to_states.loc[df_map_county_to_states.county == 'Carson City', 'state'] = 'Nevada'\n", 332 | "df_map_county_to_states.loc[df_map_county_to_states.county == 'District of Columbia', 'state'] = 'District of Columbia'\n", 333 | "df_map_county_to_states.loc[df_map_county_to_states.isna().any(axis=1), 'state'] = 'Connecticut'" 334 | ] 335 | }, 336 | { 337 | "cell_type": "code", 338 | "execution_count": 10, 339 | "metadata": {}, 340 | "outputs": [], 341 | "source": [ 342 | "# Save the mapping\n", 343 | "df_map_county_to_states.to_parquet('../data/county_to_state_mapping.parquet')" 344 | ] 345 | }, 346 | { 347 | "cell_type": "code", 348 | "execution_count": 11, 349 | "metadata": {}, 350 | "outputs": [ 351 | { 352 | "data": { 353 | "text/html": [ 354 | "
\n", 355 | "\n", 368 | "\n", 369 | " \n", 370 | " \n", 371 | " \n", 372 | " \n", 373 | " \n", 374 | " \n", 375 | " \n", 376 | " \n", 377 | " \n", 378 | " \n", 379 | " \n", 380 | " \n", 381 | " \n", 382 | " \n", 383 | " \n", 384 | " \n", 385 | " \n", 386 | " \n", 387 | " \n", 388 | " \n", 389 | " \n", 390 | " \n", 391 | " \n", 392 | " \n", 393 | " \n", 394 | " \n", 395 | " \n", 396 | " \n", 397 | " \n", 398 | " \n", 399 | " \n", 400 | " \n", 401 | " \n", 402 | " \n", 403 | " \n", 404 | " \n", 405 | " \n", 406 | " \n", 407 | " \n", 408 | " \n", 409 | " \n", 410 | " \n", 411 | " \n", 412 | " \n", 413 | " \n", 414 | " \n", 415 | " \n", 416 | " \n", 417 | " \n", 418 | " \n", 419 | " \n", 420 | " \n", 421 | " \n", 422 | " \n", 423 | " \n", 424 | " \n", 425 | " \n", 426 | " \n", 427 | " \n", 428 | " \n", 429 | " \n", 430 | " \n", 431 | " \n", 432 | " \n", 433 | " \n", 434 | " \n", 435 | " \n", 436 | " \n", 437 | " \n", 438 | " \n", 439 | " \n", 440 | " \n", 441 | " \n", 442 | " \n", 443 | " \n", 444 | " \n", 445 | "
idxcountystate
0144Lonoke CountyArkansas
1145Miller CountyGeorgia
3146Mississippi CountyMissouri
5147Nevada CountyCalifornia
7148Newton CountyTexas
............
295476Fairbanks North Star BoroughAlaska
295578Hoonah-Angoon Census AreaAlaska
295679Juneau City and BoroughAlaska
295874Denali BoroughAlaska
295977Haines BoroughAlaska
\n", 446 | "

1955 rows × 3 columns

\n", 447 | "
" 448 | ], 449 | "text/plain": [ 450 | " idx county state\n", 451 | "0 144 Lonoke County Arkansas\n", 452 | "1 145 Miller County Georgia\n", 453 | "3 146 Mississippi County Missouri\n", 454 | "5 147 Nevada County California\n", 455 | "7 148 Newton County Texas\n", 456 | "... ... ... ...\n", 457 | "2954 76 Fairbanks North Star Borough Alaska\n", 458 | "2955 78 Hoonah-Angoon Census Area Alaska\n", 459 | "2956 79 Juneau City and Borough Alaska\n", 460 | "2958 74 Denali Borough Alaska\n", 461 | "2959 77 Haines Borough Alaska\n", 462 | "\n", 463 | "[1955 rows x 3 columns]" 464 | ] 465 | }, 466 | "execution_count": 11, 467 | "metadata": {}, 468 | "output_type": "execute_result" 469 | } 470 | ], 471 | "source": [ 472 | "df_map_county_to_states" 473 | ] 474 | } 475 | ], 476 | "metadata": { 477 | "kernelspec": { 478 | "display_name": "Python 3 (ipykernel)", 479 | "language": "python", 480 | "name": "python3" 481 | }, 482 | "language_info": { 483 | "codemirror_mode": { 484 | "name": "ipython", 485 | "version": 3 486 | }, 487 | "file_extension": ".py", 488 | "mimetype": "text/x-python", 489 | "name": "python", 490 | "nbconvert_exporter": "python", 491 | "pygments_lexer": "ipython3", 492 | "version": "3.10.11" 493 | } 494 | }, 495 | "nbformat": 4, 496 | "nbformat_minor": 4 497 | } 498 | -------------------------------------------------------------------------------- /data_prep_total_population/README.md: -------------------------------------------------------------------------------- 1 | # Total population dataset generation 2 | 3 | ## Order of execution 4 | 5 | 1. map_blocks_and_calc_population 6 | 2. gen_table_with_migration 7 | 3. gen_total_population_points_script 8 | 4. add_race_net_county_to_population 9 | 10 | ## Mappings: 11 | 12 | ### Net 13 | 14 | 1: Inward Migration
15 | 0: Stationary
16 | -1: Outward Migration
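These net codes are just the sign of a block's 2010 → 2020 population change. A minimal pandas sketch of decoding (and recomputing) them — the `net_code` helper and the sample values are illustrative, not part of the pipeline:

```python
import pandas as pd

# Labels mirror the Net mapping above.
NET_LABELS = {1: "Inward Migration", 0: "Stationary", -1: "Outward Migration"}

def net_code(block_diff: float) -> int:
    """Sign of a block's 2010 -> 2020 population change."""
    if block_diff > 0:
        return 1
    if block_diff < 0:
        return -1
    return 0

diffs = pd.Series([-10.0, 4.0, 0.0])
codes = diffs.apply(net_code)   # -1, 1, 0
labels = codes.map(NET_LABELS)
```

This is the same sign rule used to derive the `block_net` column in `gen_table_with_migration.ipynb`.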
17 | 18 | ### Race 19 | 20 | 0: All
21 | 1: White
22 | 2: African American
23 | 3: American Indian
24 | 4: Asian alone
25 | 5: Native Hawaiian
26 | 6: Other Race alone
27 | 7: Two or More
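When exploring the final parquet, the race codes can be turned into readable labels with `Series.map`. A sketch under the assumption that the code lives in a column named `race` (the actual column name may differ):

```python
import pandas as pd

# Labels mirror the Race mapping above.
RACE_LABELS = {
    0: "All",
    1: "White",
    2: "African American",
    3: "American Indian",
    4: "Asian alone",
    5: "Native Hawaiian",
    6: "Other Race alone",
    7: "Two or More",
}

# Toy frame standing in for total_population_dataset.parquet.
df = pd.DataFrame({"race": [1, 4, 7, 0]})
df["race_label"] = df["race"].map(RACE_LABELS)
```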
28 | 29 | ### County 30 | 31 | Mappings for counties can be found in `id2county.pkl` file from root directory. 32 | 33 | ### Final Dataset 34 | 35 | You can download the final total population dataset [here](https://data.rapids.ai/viz-data/total_population_dataset.parquet) 36 | -------------------------------------------------------------------------------- /data_prep_total_population/SeparateTotalDatasetsByState.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "## Separate Total Population dataset by States" 8 | ] 9 | } 10 | ], 11 | "metadata": { 12 | "kernelspec": { 13 | "display_name": "Python 3 (ipykernel)", 14 | "language": "python", 15 | "name": "python3" 16 | }, 17 | "language_info": { 18 | "codemirror_mode": { 19 | "name": "ipython", 20 | "version": 3 21 | }, 22 | "file_extension": ".py", 23 | "mimetype": "text/x-python", 24 | "name": "python", 25 | "nbconvert_exporter": "python", 26 | "pygments_lexer": "ipython3", 27 | "version": "3.10.11" 28 | } 29 | }, 30 | "nbformat": 4, 31 | "nbformat_minor": 4 32 | } 33 | -------------------------------------------------------------------------------- /data_prep_total_population/SeparateTotalDatasetsByState.py: -------------------------------------------------------------------------------- 1 | import cudf 2 | import cuspatial 3 | import geopandas as gpd 4 | import os 5 | from shapely.geometry import Polygon, MultiPolygon 6 | 7 | DATA_PATH = "../data" 8 | DATA_PATH_STATE = f"{DATA_PATH}/state-wise-population" 9 | 10 | # create DATA_PATH if it does not exist 11 | if not os.path.exists(DATA_PATH_STATE): 12 | os.makedirs(DATA_PATH_STATE) 13 | 14 | # Read the total population dataset as a cudf dataframe from the parquet file 15 | df = cudf.read_parquet(f"{DATA_PATH}/total_population_dataset.parquet") 16 | 17 | # Read the shapefile as a cuspatial dataframe and get the state names and 
geometries 18 | # downloaded from https://hub.arcgis.com/datasets/1b02c87f62d24508970dc1a6df80c98e/explore 19 | shapefile_path = f"{DATA_PATH}/States_shapefile/States_shapefile.shp" 20 | states_data = gpd.read_file(shapefile_path)[ 21 | ["State_Code", "State_Name", "geometry"] 22 | ].to_crs(3857) 23 | 24 | print("Number of states to process: ", len(states_data)) 25 | print("Number of points in total population dataset: ", len(df)) 26 | print("Processing states with Polygon geometries...") 27 | 28 | processed_states = 0 29 | # Loop through the states and get the points in each state and save as a separate dataframe 30 | # process all Polygon geometries in the shapefile 31 | for index, row in states_data.iterrows(): 32 | if isinstance(row["geometry"], MultiPolygon): 33 | # skip MultiPolygon geometries 34 | continue 35 | 36 | state_name = row["State_Name"] 37 | processed_states += 1 38 | print( 39 | "Processing state: ", 40 | state_name, 41 | " (", 42 | processed_states, 43 | "/", 44 | len(states_data), 45 | ")", 46 | ) 47 | 48 | if os.path.exists(f"{DATA_PATH_STATE}/{state_name}.parquet"): 49 | print("State already processed. 
Skipping...") 50 | continue 51 | 52 | # process all MultiPolygon geometries in the shapefile 53 | # Use cuspatial point_in_polygon to get the points in the state from the total population dataset 54 | state_geometry = cuspatial.GeoSeries( 55 | gpd.GeoSeries(row["geometry"]), index=["selection"] 56 | ) 57 | 58 | # Loop through the total population dataset in batches of 50 million points to avoid OOM issues 59 | batch_size = 50_000_000 60 | points_in_state = cudf.DataFrame({"selection": []}) 61 | for i in range(0, len(df), batch_size): 62 | # get the batch of points 63 | batch = df[i : i + batch_size][["easting", "northing"]] 64 | # convert to GeoSeries 65 | points = cuspatial.GeoSeries.from_points_xy( 66 | batch.interleave_columns().astype("float64") 67 | ) 68 | # get the points in the state from the batch 69 | points_in_state_current_batch = cuspatial.point_in_polygon( 70 | points, state_geometry 71 | ) 72 | # append the points in the state from the batch to the points_in_state dataframe 73 | points_in_state = cudf.concat([points_in_state, points_in_state_current_batch]) 74 | # free up memory 75 | del batch 76 | 77 | print( 78 | f"Number of points in {state_name}: ", 79 | df[points_in_state["selection"]].shape[0], 80 | ) 81 | 82 | # save the points in the state as a separate dataframe 83 | df[points_in_state["selection"]].to_parquet( 84 | f"{DATA_PATH_STATE}/{state_name}.parquet" 85 | ) 86 | 87 | print("Processing states with MultiPolygon geometries...") 88 | # process all MultiPolygon geometries in the shapefile 89 | for index, row in states_data.iterrows(): 90 | if isinstance(row["geometry"], Polygon): 91 | # skip Polygon geometries 92 | continue 93 | 94 | state_name = row["State_Name"] 95 | processed_states += 1 96 | print( 97 | "Processing state: ", 98 | state_name, 99 | " (", 100 | processed_states, 101 | "/", 102 | len(states_data), 103 | ")", 104 | ) 105 | if os.path.exists(f"{DATA_PATH_STATE}/{state_name}.parquet"): 106 | print("State already processed. 
Skipping...") 107 | continue 108 | 109 | # process all MultiPolygon geometries in the shapefile 110 | points_in_state = None 111 | for polygon in list(row["geometry"].geoms): 112 | # process each polygon in the MultiPolygon 113 | state_geometry = cuspatial.GeoSeries( 114 | gpd.GeoSeries(polygon), index=["selection"] 115 | ) 116 | 117 | # Loop through the total population dataset in batches of 50 million points to avoid OOM issues 118 | batch_size = 50_000_000 119 | points_in_state_current_polygon = cudf.DataFrame({"selection": []}) 120 | for i in range(0, len(df), batch_size): 121 | # get the batch of points 122 | batch = df[i : i + batch_size][["easting", "northing"]] 123 | # convert to GeoSeries 124 | points = cuspatial.GeoSeries.from_points_xy( 125 | batch.interleave_columns().astype("float64") 126 | ) 127 | # get the points in the state from the batch 128 | points_in_state_current_batch = cuspatial.point_in_polygon( 129 | points, state_geometry 130 | ) 131 | # append the points in the state from the batch to the points_in_state_current_polygon dataframe 132 | points_in_state_current_polygon = cudf.concat( 133 | [points_in_state_current_polygon, points_in_state_current_batch] 134 | ) 135 | # free up memory 136 | del batch 137 | 138 | points_in_state = ( 139 | points_in_state_current_polygon 140 | if points_in_state is None 141 | else points_in_state | points_in_state_current_polygon 142 | ) 143 | 144 | print( 145 | f"Number of points in {state_name}: ", 146 | df[points_in_state["selection"]].shape[0], 147 | ) 148 | 149 | # save the points in the state as a separate dataframe 150 | df[points_in_state["selection"]].to_parquet( 151 | f"{DATA_PATH_STATE}/{state_name}.parquet" 152 | ) 153 | -------------------------------------------------------------------------------- /data_prep_total_population/gen_table_with_migration.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | 
"id": "dd3c16ee-6929-4ecf-a442-9555d0b97c03", 6 | "metadata": {}, 7 | "source": [ 8 | "#### Objective:- Clean and save needed attributes and create table for generating migration points." 9 | ] 10 | }, 11 | { 12 | "cell_type": "code", 13 | "execution_count": 1, 14 | "id": "7f7bc3c9-c297-4e02-9d4f-3e27d2223492", 15 | "metadata": {}, 16 | "outputs": [], 17 | "source": [ 18 | "import pandas as pd\n", 19 | "import geopandas as gpd\n", 20 | "import ast,os,random\n", 21 | "pd.set_option('display.float_format','{:.1f}'.format)\n", 22 | "import warnings\n", 23 | "warnings.filterwarnings('ignore')\n", 24 | "import cudf, cupy as cp\n", 25 | "import numpy as np\n", 26 | "import time\n", 27 | "import math\n", 28 | "import pickle\n", 29 | "# pd.set_option('display.max_colwidth', -1)" 30 | ] 31 | }, 32 | { 33 | "cell_type": "markdown", 34 | "id": "a5b2e784-179d-404f-be48-c2d84bbcf9a7", 35 | "metadata": { 36 | "tags": [] 37 | }, 38 | "source": [ 39 | "#### Load data" 40 | ] 41 | }, 42 | { 43 | "cell_type": "code", 44 | "execution_count": 2, 45 | "id": "c9e38991-3723-4edb-a4b4-f890f24bd85f", 46 | "metadata": {}, 47 | "outputs": [], 48 | "source": [ 49 | "df = pd.read_csv('data/mapped_blocks_full.csv',encoding='unicode_escape',usecols=['ID20','STATE','COUNTY','P20','eq_P10'])" 50 | ] 51 | }, 52 | { 53 | "cell_type": "code", 54 | "execution_count": 3, 55 | "id": "b5dbead9-dd67-409a-b81e-7c8dd69c97cc", 56 | "metadata": {}, 57 | "outputs": [ 58 | { 59 | "data": { 60 | "text/plain": [ 61 | "334735155" 62 | ] 63 | }, 64 | "execution_count": 3, 65 | "metadata": {}, 66 | "output_type": "execute_result" 67 | } 68 | ], 69 | "source": [ 70 | "df.P20.sum()" 71 | ] 72 | }, 73 | { 74 | "cell_type": "code", 75 | "execution_count": 4, 76 | "id": "2f0bafbd-5a2e-4b86-8ac9-fd2a83a3889a", 77 | "metadata": {}, 78 | "outputs": [], 79 | "source": [ 80 | "df.COUNTY.replace({r'[^\\x00-\\x7F]+':''},regex=True,inplace=True)\n", 81 | 
"df.COUNTY.replace({r'([A-Z][a-z]+)([A-Z]+)':r'\\1'},regex=True,inplace=True)" 82 | ] 83 | }, 84 | { 85 | "cell_type": "code", 86 | "execution_count": 6, 87 | "id": "4d82f1b8-e111-463c-8148-82238a2098d5", 88 | "metadata": {}, 89 | "outputs": [ 90 | { 91 | "data": { 92 | "text/plain": [ 93 | "8174955" 94 | ] 95 | }, 96 | "execution_count": 6, 97 | "metadata": {}, 98 | "output_type": "execute_result" 99 | } 100 | ], 101 | "source": [ 102 | "len(df)" 103 | ] 104 | }, 105 | { 106 | "cell_type": "code", 107 | "execution_count": 7, 108 | "id": "b813f68f-448a-4a97-af82-cbca5f15266b", 109 | "metadata": {}, 110 | "outputs": [ 111 | { 112 | "data": { 113 | "text/html": [ 114 | "
\n", 115 | "\n", 128 | "\n", 129 | " \n", 130 | " \n", 131 | " \n", 132 | " \n", 133 | " \n", 134 | " \n", 135 | " \n", 136 | " \n", 137 | " \n", 138 | " \n", 139 | " \n", 140 | " \n", 141 | " \n", 142 | " \n", 143 | " \n", 144 | " \n", 145 | " \n", 146 | " \n", 147 | " \n", 148 | " \n", 149 | " \n", 150 | " \n", 151 | " \n", 152 | " \n", 153 | " \n", 154 | " \n", 155 | " \n", 156 | " \n", 157 | " \n", 158 | " \n", 159 | " \n", 160 | " \n", 161 | " \n", 162 | " \n", 163 | " \n", 164 | " \n", 165 | " \n", 166 | " \n", 167 | " \n", 168 | " \n", 169 | " \n", 170 | " \n", 171 | " \n", 172 | " \n", 173 | " \n", 174 | " \n", 175 | " \n", 176 | " \n", 177 | " \n", 178 | " \n", 179 | " \n", 180 | " \n", 181 | " \n", 182 | " \n", 183 | " \n", 184 | " \n", 185 | " \n", 186 | " \n", 187 | " \n", 188 | " \n", 189 | " \n", 190 | " \n", 191 | " \n", 192 | " \n", 193 | "
ID20STATECOUNTYP20eq_P10block_diffblock_net
0100102010010001Autauga County2130.5-10.0-1
1100102010010011Autauga County3430.54.01
2100102010010021Autauga County2951.8-23.0-1
3100102010010031Autauga County1713.34.01
4100102010010041Autauga County00.00.00
\n", 194 | "
" 195 | ], 196 | "text/plain": [ 197 | " ID20 STATE COUNTY P20 eq_P10 block_diff block_net\n", 198 | "0 10010201001000 1 Autauga County 21 30.5 -10.0 -1\n", 199 | "1 10010201001001 1 Autauga County 34 30.5 4.0 1\n", 200 | "2 10010201001002 1 Autauga County 29 51.8 -23.0 -1\n", 201 | "3 10010201001003 1 Autauga County 17 13.3 4.0 1\n", 202 | "4 10010201001004 1 Autauga County 0 0.0 0.0 0" 203 | ] 204 | }, 205 | "execution_count": 7, 206 | "metadata": {}, 207 | "output_type": "execute_result" 208 | } 209 | ], 210 | "source": [ 211 | "df['block_diff'] = df['P20'] - df['eq_P10']\n", 212 | "df['block_diff'] = df['block_diff'].round()\n", 213 | "df['block_net'] = df['block_diff'].apply(lambda x: 1 if x>0 else ( -1 if x<0 else 0))\n", 214 | "df.head()" 215 | ] 216 | }, 217 | { 218 | "cell_type": "code", 219 | "execution_count": 8, 220 | "id": "99f14f0c-35b0-4a6f-8d4f-b5286ef6c9e7", 221 | "metadata": {}, 222 | "outputs": [ 223 | { 224 | "data": { 225 | "text/html": [ 226 | "
\n", 227 | "\n", 240 | "\n", 241 | " \n", 242 | " \n", 243 | " \n", 244 | " \n", 245 | " \n", 246 | " \n", 247 | " \n", 248 | " \n", 249 | " \n", 250 | " \n", 251 | " \n", 252 | " \n", 253 | " \n", 254 | " \n", 255 | " \n", 256 | " \n", 257 | " \n", 258 | " \n", 259 | " \n", 260 | " \n", 261 | " \n", 262 | " \n", 263 | " \n", 264 | " \n", 265 | " \n", 266 | " \n", 267 | " \n", 268 | " \n", 269 | " \n", 270 | " \n", 271 | " \n", 272 | " \n", 273 | " \n", 274 | " \n", 275 | " \n", 276 | " \n", 277 | " \n", 278 | " \n", 279 | " \n", 280 | " \n", 281 | " \n", 282 | " \n", 283 | " \n", 284 | " \n", 285 | " \n", 286 | " \n", 287 | " \n", 288 | " \n", 289 | " \n", 290 | " \n", 291 | " \n", 292 | " \n", 293 | " \n", 294 | " \n", 295 | " \n", 296 | " \n", 297 | " \n", 298 | " \n", 299 | " \n", 300 | " \n", 301 | " \n", 302 | " \n", 303 | " \n", 304 | " \n", 305 | " \n", 306 | " \n", 307 | " \n", 308 | " \n", 309 | " \n", 310 | " \n", 311 | "
ID20STATECOUNTYP20eq_P10block_diffblock_neterror
0100102010010001Autauga County2130.0-10.0-11.0
1100102010010011Autauga County3430.04.010.0
2100102010010021Autauga County2952.0-23.0-10.0
3100102010010031Autauga County1713.04.010.0
4100102010010041Autauga County00.00.000.0
\n", 312 | "
" 313 | ], 314 | "text/plain": [ 315 | " ID20 STATE COUNTY P20 eq_P10 block_diff block_net \\\n", 316 | "0 10010201001000 1 Autauga County 21 30.0 -10.0 -1 \n", 317 | "1 10010201001001 1 Autauga County 34 30.0 4.0 1 \n", 318 | "2 10010201001002 1 Autauga County 29 52.0 -23.0 -1 \n", 319 | "3 10010201001003 1 Autauga County 17 13.0 4.0 1 \n", 320 | "4 10010201001004 1 Autauga County 0 0.0 0.0 0 \n", 321 | "\n", 322 | " error \n", 323 | "0 1.0 \n", 324 | "1 0.0 \n", 325 | "2 0.0 \n", 326 | "3 0.0 \n", 327 | "4 0.0 " 328 | ] 329 | }, 330 | "execution_count": 8, 331 | "metadata": {}, 332 | "output_type": "execute_result" 333 | } 334 | ], 335 | "source": [ 336 | "df['eq_P10'] = df['eq_P10'].round()\n", 337 | "df['error'] = (df['P20']-df['eq_P10']) - df['block_diff']\n", 338 | "df.head()" 339 | ] 340 | }, 341 | { 342 | "cell_type": "code", 343 | "execution_count": 9, 344 | "id": "85d04ec7-c40a-491f-b5ac-dbdddfa087c4", 345 | "metadata": {}, 346 | "outputs": [ 347 | { 348 | "data": { 349 | "text/html": [ 350 | "
\n", 351 | "\n", 364 | "\n", 365 | " \n", 366 | " \n", 367 | " \n", 368 | " \n", 369 | " \n", 370 | " \n", 371 | " \n", 372 | " \n", 373 | " \n", 374 | " \n", 375 | " \n", 376 | " \n", 377 | " \n", 378 | " \n", 379 | " \n", 380 | "
ID20STATECOUNTYP20eq_P10block_diffblock_neterror
\n", 381 | "
" 382 | ], 383 | "text/plain": [ 384 | "Empty DataFrame\n", 385 | "Columns: [ID20, STATE, COUNTY, P20, eq_P10, block_diff, block_net, error]\n", 386 | "Index: []" 387 | ] 388 | }, 389 | "execution_count": 9, 390 | "metadata": {}, 391 | "output_type": "execute_result" 392 | } 393 | ], 394 | "source": [ 395 | "df['eq_P10'] = df['eq_P10'] + df['error']\n", 396 | "df[(df['P20']-df['eq_P10'])!=(df['block_diff'])]" 397 | ] 398 | }, 399 | { 400 | "cell_type": "code", 401 | "execution_count": 14, 402 | "id": "eca29aec-38f6-45ec-a64c-a67582fd79ed", 403 | "metadata": {}, 404 | "outputs": [], 405 | "source": [ 406 | "df[['ID20','COUNTY','P20','eq_P10','block_diff','block_net']].to_parquet('data/total_attr_gen_df.parquet') #save attributes to be added later" 407 | ] 408 | }, 409 | { 410 | "cell_type": "markdown", 411 | "id": "e74d0125-ec88-43e9-96a0-9f68202fb0b5", 412 | "metadata": {}, 413 | "source": [ 414 | "#### Attach county" 415 | ] 416 | }, 417 | { 418 | "cell_type": "code", 419 | "execution_count": 2, 420 | "id": "6db27018-a14b-4a4c-8181-6386ab9a6430", 421 | "metadata": {}, 422 | "outputs": [ 423 | { 424 | "data": { 425 | "text/html": [ 426 | "
\n", 427 | "\n", 440 | "\n", 441 | " \n", 442 | " \n", 443 | " \n", 444 | " \n", 445 | " \n", 446 | " \n", 447 | " \n", 448 | " \n", 449 | " \n", 450 | " \n", 451 | " \n", 452 | " \n", 453 | " \n", 454 | " \n", 455 | " \n", 456 | " \n", 457 | " \n", 458 | " \n", 459 | " \n", 460 | " \n", 461 | " \n", 462 | " \n", 463 | " \n", 464 | " \n", 465 | " \n", 466 | " \n", 467 | " \n", 468 | " \n", 469 | " \n", 470 | " \n", 471 | " \n", 472 | " \n", 473 | " \n", 474 | " \n", 475 | " \n", 476 | " \n", 477 | " \n", 478 | " \n", 479 | " \n", 480 | " \n", 481 | " \n", 482 | " \n", 483 | " \n", 484 | " \n", 485 | " \n", 486 | " \n", 487 | " \n", 488 | " \n", 489 | " \n", 490 | " \n", 491 | " \n", 492 | " \n", 493 | " \n", 494 | " \n", 495 | " \n", 496 | " \n", 497 | " \n", 498 | " \n", 499 | "
ID20COUNTYP20eq_P10block_diffblock_net
010010201001000Autauga County2131.0-10.0-1
110010201001001Autauga County3430.04.01
210010201001002Autauga County2952.0-23.0-1
310010201001003Autauga County1713.04.01
410010201001004Autauga County00.00.00
\n", 500 | "
" 501 | ], 502 | "text/plain": [ 503 | " ID20 COUNTY P20 eq_P10 block_diff block_net\n", 504 | "0 10010201001000 Autauga County 21 31.0 -10.0 -1\n", 505 | "1 10010201001001 Autauga County 34 30.0 4.0 1\n", 506 | "2 10010201001002 Autauga County 29 52.0 -23.0 -1\n", 507 | "3 10010201001003 Autauga County 17 13.0 4.0 1\n", 508 | "4 10010201001004 Autauga County 0 0.0 0.0 0" 509 | ] 510 | }, 511 | "execution_count": 2, 512 | "metadata": {}, 513 | "output_type": "execute_result" 514 | } 515 | ], 516 | "source": [ 517 | "df = pd.read_parquet('data/total_attr_gen_df.parquet')\n", 518 | "df.head()" 519 | ] 520 | }, 521 | { 522 | "cell_type": "code", 523 | "execution_count": 3, 524 | "id": "ef7dcf13-ebe0-473a-89bd-87e97327de16", 525 | "metadata": {}, 526 | "outputs": [], 527 | "source": [ 528 | "def calculate_points(row):\n", 529 | " net = row[-1]\n", 530 | " p20 = row[0]\n", 531 | " p10 = row[1]\n", 532 | " if net < 0:\n", 533 | " return p20 + p10\n", 534 | " else: return p20" 535 | ] 536 | }, 537 | { 538 | "cell_type": "code", 539 | "execution_count": 4, 540 | "id": "34b846d8-d09a-4713-bfa2-987b2b189a8e", 541 | "metadata": {}, 542 | "outputs": [], 543 | "source": [ 544 | "df['points'] = df[['P20','eq_P10','block_net']].apply(calculate_points,axis=1)" 545 | ] 546 | }, 547 | { 548 | "cell_type": "code", 549 | "execution_count": 5, 550 | "id": "cc3891e4-3480-441f-8291-fad3f01320df", 551 | "metadata": {}, 552 | "outputs": [ 553 | { 554 | "data": { 555 | "text/html": [ 556 | "
\n", 557 | "\n", 570 | "\n", 571 | " \n", 572 | " \n", 573 | " \n", 574 | " \n", 575 | " \n", 576 | " \n", 577 | " \n", 578 | " \n", 579 | " \n", 580 | " \n", 581 | " \n", 582 | " \n", 583 | " \n", 584 | " \n", 585 | " \n", 586 | " \n", 587 | " \n", 588 | " \n", 589 | " \n", 590 | " \n", 591 | " \n", 592 | " \n", 593 | " \n", 594 | " \n", 595 | " \n", 596 | " \n", 597 | " \n", 598 | " \n", 599 | " \n", 600 | " \n", 601 | " \n", 602 | " \n", 603 | " \n", 604 | " \n", 605 | " \n", 606 | " \n", 607 | " \n", 608 | " \n", 609 | " \n", 610 | " \n", 611 | " \n", 612 | " \n", 613 | " \n", 614 | " \n", 615 | " \n", 616 | " \n", 617 | " \n", 618 | " \n", 619 | " \n", 620 | " \n", 621 | " \n", 622 | " \n", 623 | " \n", 624 | " \n", 625 | " \n", 626 | " \n", 627 | " \n", 628 | " \n", 629 | " \n", 630 | " \n", 631 | " \n", 632 | " \n", 633 | " \n", 634 | " \n", 635 | " \n", 636 | " \n", 637 | " \n", 638 | " \n", 639 | " \n", 640 | " \n", 641 | " \n", 642 | " \n", 643 | " \n", 644 | " \n", 645 | " \n", 646 | " \n", 647 | " \n", 648 | " \n", 649 | " \n", 650 | " \n", 651 | " \n", 652 | " \n", 653 | " \n", 654 | " \n", 655 | " \n", 656 | " \n", 657 | " \n", 658 | " \n", 659 | " \n", 660 | " \n", 661 | " \n", 662 | " \n", 663 | " \n", 664 | " \n", 665 | " \n", 666 | " \n", 667 | " \n", 668 | " \n", 669 | " \n", 670 | " \n", 671 | " \n", 672 | " \n", 673 | " \n", 674 | " \n", 675 | " \n", 676 | " \n", 677 | " \n", 678 | " \n", 679 | " \n", 680 | " \n", 681 | " \n", 682 | " \n", 683 | " \n", 684 | " \n", 685 | " \n", 686 | " \n", 687 | " \n", 688 | " \n", 689 | " \n", 690 | " \n", 691 | " \n", 692 | " \n", 693 | " \n", 694 | " \n", 695 | "
ID20COUNTYP20eq_P10block_diffblock_netpoints
010010201001000Autauga County2131.0-10.0-152.0
110010201001001Autauga County3430.04.0134.0
210010201001002Autauga County2952.0-23.0-181.0
310010201001003Autauga County1713.04.0117.0
410010201001004Autauga County00.00.000.0
........................
8174950721537506022011Yauco Municipio276.021.0127.0
8174951721537506022012Yauco Municipio4363.0-20.0-1106.0
8174952721537506022013Yauco Municipio195341.0-146.0-1536.0
8174953721537506022014Yauco Municipio00.00.000.0
8174954721537506022015Yauco Municipio00.00.000.0
\n", 696 | "

8174955 rows × 7 columns

\n", 697 | "
" 698 | ], 699 | "text/plain": [ 700 | " ID20 COUNTY P20 eq_P10 block_diff block_net \\\n", 701 | "0 10010201001000 Autauga County 21 31.0 -10.0 -1 \n", 702 | "1 10010201001001 Autauga County 34 30.0 4.0 1 \n", 703 | "2 10010201001002 Autauga County 29 52.0 -23.0 -1 \n", 704 | "3 10010201001003 Autauga County 17 13.0 4.0 1 \n", 705 | "4 10010201001004 Autauga County 0 0.0 0.0 0 \n", 706 | "... ... ... ... ... ... ... \n", 707 | "8174950 721537506022011 Yauco Municipio 27 6.0 21.0 1 \n", 708 | "8174951 721537506022012 Yauco Municipio 43 63.0 -20.0 -1 \n", 709 | "8174952 721537506022013 Yauco Municipio 195 341.0 -146.0 -1 \n", 710 | "8174953 721537506022014 Yauco Municipio 0 0.0 0.0 0 \n", 711 | "8174954 721537506022015 Yauco Municipio 0 0.0 0.0 0 \n", 712 | "\n", 713 | " points \n", 714 | "0 52.0 \n", 715 | "1 34.0 \n", 716 | "2 81.0 \n", 717 | "3 17.0 \n", 718 | "4 0.0 \n", 719 | "... ... \n", 720 | "8174950 27.0 \n", 721 | "8174951 106.0 \n", 722 | "8174952 536.0 \n", 723 | "8174953 0.0 \n", 724 | "8174954 0.0 \n", 725 | "\n", 726 | "[8174955 rows x 7 columns]" 727 | ] 728 | }, 729 | "execution_count": 5, 730 | "metadata": {}, 731 | "output_type": "execute_result" 732 | } 733 | ], 734 | "source": [ 735 | "df" 736 | ] 737 | }, 738 | { 739 | "cell_type": "code", 740 | "execution_count": 6, 741 | "id": "6265d762-7a4b-4e18-9383-7866d35b3246", 742 | "metadata": {}, 743 | "outputs": [], 744 | "source": [ 745 | "county2id = pickle.load(open('county2id.pkl','rb'))" 746 | ] 747 | }, 748 | { 749 | "cell_type": "code", 750 | "execution_count": 10, 751 | "id": "8171f642-0b80-4a8a-a7a2-d1462ce61644", 752 | "metadata": {}, 753 | "outputs": [ 754 | { 755 | "data": { 756 | "text/plain": [ 757 | "Jefferson County 96055\n", 758 | "Los Angeles County 91626\n", 759 | "Cook County 85108\n", 760 | "Washington County 75565\n", 761 | "Montgomery County 66524\n", 762 | "Maricopa County 61427\n", 763 | "Franklin County 60891\n", 764 | "Orange County 60830\n", 765 | "Jackson County 
60381\n", 766 | "Wayne County 59249\n", 767 | "Name: COUNTY, dtype: int64" 768 | ] 769 | }, 770 | "execution_count": 10, 771 | "metadata": {}, 772 | "output_type": "execute_result" 773 | } 774 | ], 775 | "source": [ 776 | "df.COUNTY.value_counts().head(10)" 777 | ] 778 | }, 779 | { 780 | "cell_type": "code", 781 | "execution_count": 22, 782 | "id": "a49f4f74-ca82-4ad8-a23c-f185f33fd84f", 783 | "metadata": {}, 784 | "outputs": [], 785 | "source": [ 786 | "df = df[df.points!=0].reset_index(drop=True)" 787 | ] 788 | }, 789 | { 790 | "cell_type": "code", 791 | "execution_count": 23, 792 | "id": "5f766288-a5d7-4f38-8c7c-5d09b04a7a75", 793 | "metadata": {}, 794 | "outputs": [ 795 | { 796 | "data": { 797 | "text/plain": [ 798 | "6200461" 799 | ] 800 | }, 801 | "execution_count": 23, 802 | "metadata": {}, 803 | "output_type": "execute_result" 804 | } 805 | ], 806 | "source": [ 807 | "df[df['COUNTY'] == 'Maricopa County'].points.sum()" 808 | ] 809 | }, 810 | { 811 | "cell_type": "code", 812 | "execution_count": 14, 813 | "id": "c064904a-7a6a-4d55-b64b-e2433d8b2ec7", 814 | "metadata": {}, 815 | "outputs": [], 816 | "source": [ 817 | "df['points'] = df['points'].astype('int32')" 818 | ] 819 | }, 820 | { 821 | "cell_type": "code", 822 | "execution_count": 20, 823 | "id": "e6ab7c4c-f044-49e1-9c28-c8734ab87160", 824 | "metadata": {}, 825 | "outputs": [], 826 | "source": [ 827 | "counties = df[['COUNTY','points']].apply(lambda row: [county2id[row[0]]]*row[1],axis=1)" 828 | ] 829 | }, 830 | { 831 | "cell_type": "code", 832 | "execution_count": 22, 833 | "id": "71b59e6e-5d8c-44c9-818f-3fce3cf12350", 834 | "metadata": {}, 835 | "outputs": [], 836 | "source": [ 837 | "gcounties = cudf.from_pandas(counties)" 838 | ] 839 | }, 840 | { 841 | "cell_type": "code", 842 | "execution_count": 25, 843 | "id": "ac88db07-26ee-4cc0-8102-90e6fa247fa6", 844 | "metadata": {}, 845 | "outputs": [], 846 | "source": [ 847 | "counties_list = gcounties.explode().reset_index(drop=True)" 848 | ] 849 | }, 850 
| { 851 | "cell_type": "code", 852 | "execution_count": 27, 853 | "id": "40e5fe64-2bb7-46bf-a741-470ad993f98e", 854 | "metadata": {}, 855 | "outputs": [], 856 | "source": [ 857 | "pickle.dump(counties_list,open('county_list.pkl','wb'))" 858 | ] 859 | }, 860 | { 861 | "cell_type": "code", 862 | "execution_count": 28, 863 | "id": "ccc64011-00ed-43b5-ac0f-332136f6e180", 864 | "metadata": {}, 865 | "outputs": [ 866 | { 867 | "data": { 868 | "text/plain": [ 869 | "504475979" 870 | ] 871 | }, 872 | "execution_count": 28, 873 | "metadata": {}, 874 | "output_type": "execute_result" 875 | } 876 | ], 877 | "source": [ 878 | "len(counties_list)" 879 | ] 880 | }, 881 | { 882 | "cell_type": "markdown", 883 | "id": "20524a87-e169-4306-acde-2f17edd3721f", 884 | "metadata": {}, 885 | "source": [ 886 | "#### Continue making dataset for population gen" 887 | ] 888 | }, 889 | { 890 | "cell_type": "code", 891 | "execution_count": 52, 892 | "id": "875e176a-633b-4bad-ba4c-09b2f97892b5", 893 | "metadata": {}, 894 | "outputs": [ 895 | { 896 | "name": "stdout", 897 | "output_type": "stream", 898 | "text": [ 899 | "8174955\n" 900 | ] 901 | } 902 | ], 903 | "source": [ 904 | "print(len(df))" 905 | ] 906 | }, 907 | { 908 | "cell_type": "code", 909 | "execution_count": 53, 910 | "id": "dd1e348d-3753-4ab5-9da7-e0d67e0a7946", 911 | "metadata": {}, 912 | "outputs": [], 913 | "source": [ 914 | "df =df[df.points!=0]" 915 | ] 916 | }, 917 | { 918 | "cell_type": "code", 919 | "execution_count": 54, 920 | "id": "330fe5a7-5e55-48e4-b88a-3b77a42c526a", 921 | "metadata": {}, 922 | "outputs": [ 923 | { 924 | "name": "stdout", 925 | "output_type": "stream", 926 | "text": [ 927 | "6265163\n" 928 | ] 929 | } 930 | ], 931 | "source": [ 932 | "print(len(df))" 933 | ] 934 | }, 935 | { 936 | "cell_type": "code", 937 | "execution_count": 55, 938 | "id": "9d2dab42-4e75-4618-8166-958653847976", 939 | "metadata": {}, 940 | "outputs": [], 941 | "source": [ 942 | "gen_df = df[['ID20','STATE','points']]" 943 | ] 944 | 
}, 945 | { 946 | "cell_type": "code", 947 | "execution_count": 56, 948 | "id": "3f0c26b3-e585-44b2-96a1-05261a4379db", 949 | "metadata": {}, 950 | "outputs": [], 951 | "source": [ 952 | "gen_df.to_csv('data/total_population_gen_df.csv')" 953 | ] 954 | }, 955 | { 956 | "cell_type": "code", 957 | "execution_count": 61, 958 | "id": "e321a5b6-dfc0-4617-a74d-df5368e79d63", 959 | "metadata": {}, 960 | "outputs": [ 961 | { 962 | "data": { 963 | "text/plain": [ 964 | "6265163" 965 | ] 966 | }, 967 | "execution_count": 61, 968 | "metadata": {}, 969 | "output_type": "execute_result" 970 | } 971 | ], 972 | "source": [ 973 | "len(gen_df)" 974 | ] 975 | } 976 | ], 977 | "metadata": { 978 | "kernelspec": { 979 | "display_name": "Python 3 (ipykernel)", 980 | "language": "python", 981 | "name": "python3" 982 | }, 983 | "language_info": { 984 | "codemirror_mode": { 985 | "name": "ipython", 986 | "version": 3 987 | }, 988 | "file_extension": ".py", 989 | "mimetype": "text/x-python", 990 | "name": "python", 991 | "nbconvert_exporter": "python", 992 | "pygments_lexer": "ipython3", 993 | "version": "3.9.13" 994 | } 995 | }, 996 | "nbformat": 4, 997 | "nbformat_minor": 5 998 | } 999 | -------------------------------------------------------------------------------- /data_prep_total_population/gen_total_population_points_script.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "id": "1cddfb2e-e508-4410-b32d-fd3452298004", 6 | "metadata": {}, 7 | "source": [ 8 | "#### Objective: Use the total population table to generate total population points" 9 | ] 10 | }, 11 | { 12 | "cell_type": "code", 13 | "execution_count": 1, 14 | "id": "5b77a6ef-ef4e-45fe-8b43-6545313d4556", 15 | "metadata": {}, 16 | "outputs": [], 17 | "source": [ 18 | "import pandas as pd\n", 19 | "import geopandas as gpd\n", 20 | "import ast,os,random\n", 21 | "pd.set_option('display.float_format','{:.1f}'.format)\n", 22 | "import 
warnings\n", 23 | "warnings.filterwarnings('ignore')\n", 24 | "import cudf, cupy as cp\n", 25 | "import numpy as np\n", 26 | "import time\n", 27 | "import math\n", 28 | "import sys,os,datetime,random\n", 29 | "from shapely.geometry import Point\n", 30 | "# pd.set_option('display.max_colwidth', -1)" 31 | ] 32 | }, 33 | { 34 | "cell_type": "markdown", 35 | "id": "e99e275d-c977-4392-b81a-bbb73bc5ee4f", 36 | "metadata": { 37 | "tags": [] 38 | }, 39 | "source": [ 40 | "#### Load data" 41 | ] 42 | }, 43 | { 44 | "cell_type": "code", 45 | "execution_count": 8, 46 | "id": "66319687-f2ba-4d55-85df-83692db81f96", 47 | "metadata": {}, 48 | "outputs": [ 49 | { 50 | "data": { 51 | "text/html": [ 52 | "
\n", 53 | "\n", 66 | "\n", 67 | " \n", 68 | " \n", 69 | " \n", 70 | " \n", 71 | " \n", 72 | " \n", 73 | " \n", 74 | " \n", 75 | " \n", 76 | " \n", 77 | " \n", 78 | " \n", 79 | " \n", 80 | " \n", 81 | " \n", 82 | " \n", 83 | " \n", 84 | " \n", 85 | " \n", 86 | " \n", 87 | " \n", 88 | " \n", 89 | " \n", 90 | " \n", 91 | " \n", 92 | " \n", 93 | " \n", 94 | " \n", 95 | " \n", 96 | " \n", 97 | " \n", 98 | " \n", 99 | " \n", 100 | " \n", 101 | " \n", 102 | " \n", 103 | " \n", 104 | " \n", 105 | " \n", 106 | " \n", 107 | "
ID20STATEpoints
010010201001000152.0
110010201001001134.0
210010201001002181.0
310010201001003117.0
41001020100100518.0
\n", 108 | "
" 109 | ], 110 | "text/plain": [ 111 | " ID20 STATE points\n", 112 | "0 10010201001000 1 52.0\n", 113 | "1 10010201001001 1 34.0\n", 114 | "2 10010201001002 1 81.0\n", 115 | "3 10010201001003 1 17.0\n", 116 | "4 10010201001005 1 8.0" 117 | ] 118 | }, 119 | "execution_count": 8, 120 | "metadata": {}, 121 | "output_type": "execute_result" 122 | } 123 | ], 124 | "source": [ 125 | "df = cudf.read_csv('data/total_population_gen_df.csv').drop('Unnamed: 0',axis=1)\n", 126 | "df.head()" 127 | ] 128 | }, 129 | { 130 | "cell_type": "code", 131 | "execution_count": 3, 132 | "id": "97e9b49c-c118-4ca3-b973-c5f36c2499fa", 133 | "metadata": {}, 134 | "outputs": [], 135 | "source": [ 136 | "# df = df[df.STATE==6]\n", 137 | "# len(df)//3\n", 138 | "# df= df.iloc[:len(df)//3]" 139 | ] 140 | }, 141 | { 142 | "cell_type": "code", 143 | "execution_count": 10, 144 | "id": "c44ff46f-c2ef-4ebe-acfe-7dcc7bde6db8", 145 | "metadata": {}, 146 | "outputs": [ 147 | { 148 | "name": "stdout", 149 | "output_type": "stream", 150 | "text": [ 151 | "161904\n" 152 | ] 153 | } 154 | ], 155 | "source": [ 156 | "print(len(df))" 157 | ] 158 | }, 159 | { 160 | "cell_type": "code", 161 | "execution_count": 2, 162 | "id": "7d6527f6-8214-4494-81d7-2c7c78b1f80f", 163 | "metadata": {}, 164 | "outputs": [], 165 | "source": [ 166 | "def random_points_in_polygon(number, polygon):\n", 167 | " # print(polygon)\n", 168 | " points_x = np.array([])\n", 169 | " points_y = np.array([])\n", 170 | " min_x, min_y, max_x, max_y = polygon.bounds\n", 171 | " i= 0\n", 172 | " while i < number:\n", 173 | " point_x = random.uniform(min_x, max_x)\n", 174 | " point_y = random.uniform(min_y, max_y)\n", 175 | " if polygon.contains(Point(point_x, point_y)):\n", 176 | " points_x = np.append(points_x, point_x)\n", 177 | " points_y = np.append(points_y, point_y)\n", 178 | " i += 1\n", 179 | " return points_x, points_y # returns list of points(lat), list of points(long)\n", 180 | "def generate_data(state, df_temp, gpdf):\n", 181 | " t1 = 
datetime.datetime.now()\n", 182 | " geoid_index_df = df_temp.index.to_numpy()\n", 183 | " final_points_x = np.array([])\n", 184 | " final_points_y = np.array([])\n", 185 | " geoid = np.array([])\n", 186 | " # Add additional features\n", 187 | " county = np.array([])\n", 188 | " p_delta = np.array([])\n", 189 | " p_net = np.array([])\n", 190 | " \n", 191 | " \n", 192 | " f=0\n", 193 | " for index, row in gpdf.iterrows():\n", 194 | " f+=1\n", 195 | " points_x = np.array([])\n", 196 | " points_y = np.array([])\n", 197 | " geoid_temp = np.array([])\n", 198 | " \n", 199 | " if row['GEOID20'] in geoid_index_df:\n", 200 | " num_points = df_temp.loc[row['GEOID20']] # store block population\n", 201 | " polygon = row['geometry']\n", 202 | " #print(row['GEOID20'])\n", 203 | " #print('SUCCESS')\n", 204 | " \n", 205 | " \n", 206 | " "\n", 207 | " \n", 208 | " if polygon is not None:\n", 209 | " points_x, points_y = random_points_in_polygon(num_points, polygon)\n", 210 | " # print(points_x,points_y)\n", 211 | " geoid_temp = np.array([row['GEOID20']]*len(points_x))\n", 212 | " geoid = np.append(geoid,geoid_temp)\n", 213 | " final_points_x = np.append(final_points_x, points_x)\n", 214 | " # print(final_points_x)\n", 215 | " final_points_y = np.append(final_points_y, points_y)\n", 216 | " print('Processing '+str(state)+' - Completed:', \"{0:0.2f}\".format((index/len(gpdf))*100), '%', end='')\n", 217 | " print('', end='\\r')\n", 218 | " \n", 219 | " # if f==11:\n", 220 | " # break\n", 221 | "\n", 222 | " print('Processing for '+str(state)+' complete \\n total time', datetime.datetime.now() - t1)\n", 223 | " df_fin = cudf.DataFrame({'GEOID20': geoid,'x': final_points_x, 'y':final_points_y}) #,'COUNTY':county,'p_delta':p_delta,'p_net':p_net})\n", 224 | " df_fin.GEOID20 = df_fin.GEOID20[1:].astype('int').astype('str')\n", 225 | " df_fin.GEOID20 = df_fin.GEOID20.fillna(method='bfill')\n", 226 | " \n", 227 | " 
df_fin.to_csv('data/total_population/population_%s_1'%str(state)+'.csv', index=False)\n", 228 | "def exec_data(state_key_list):\n", 229 | " c=0\n", 230 | " for i in state_key_list:\n", 231 | " print(i)\n", 232 | " c+=1\n", 233 | " if i< 10:\n", 234 | " i_str = '0'+str(i)\n", 235 | " else:\n", 236 | " i_str = str(i)\n", 237 | " # path = 'census_2020_data/nhgis0003_shape/nhgis0003_shapefile_tl2020_%s0_block_2020/%s_block_2020.shp'%(i_str,states[i])\n", 238 | " path ='data/tl_shapefiles/tl_2021_%s_tabblock20.shp'%(i_str)\n", 239 | " #print(path)\n", 240 | " print(\"started reading shape file for state \", states[i])\n", 241 | " if os.path.isfile(path): \n", 242 | " gpdf = gpd.read_file(path)[['GEOID20', 'geometry']].sort_values('GEOID20').reset_index(drop=True)\n", 243 | " gpdf.GEOID20 = gpdf.GEOID20.astype('int64')\n", 244 | " gpdf = gpdf[(gpdf.GEOID20>=480019501001000) & (gpdf.GEOID20<=481439502032029)].reset_index(drop=True)\n", 245 | " print(\"completed reading shape file for state \", states[i])\n", 246 | " df_temp = df.query('STATE == @i')[['ID20', 'points']]\n", 247 | " df_temp.index = df_temp.ID20\n", 248 | " df_temp = df_temp['points']\n", 249 | " # print(gpdf.head(3))\n", 250 | " # print(df_temp)\n", 251 | " print(\"starting to generate data for \"+str(states[i])+\"... 
\")\n", 252 | " generate_data(states[i], df_temp, gpdf)\n", 253 | " del(df_temp)\n", 254 | " else:\n", 255 | " print(\"shape file does not exist\")\n", 256 | " continue\n", 257 | " # if c==2:\n", 258 | " # break " 259 | ] 260 | }, 261 | { 262 | "cell_type": "code", 263 | "execution_count": 3, 264 | "id": "29a0e4e2-a41d-45a6-b6aa-aa8c87ddf5ef", 265 | "metadata": {}, 266 | "outputs": [], 267 | "source": [ 268 | "# states = {1 :\"AL\",2 :\"AK\",4 :\"AZ\",5 :\"AR\",6 :\"CA\",8 :\"CO\",9 :\"CT\",10:\"DE\",11:\"DC\",12:\"FL\",13:\"GA\",15:\"HI\",\n", 269 | "# 16:\"ID\",17:\"IL\",18:\"IN\",19:\"IA\",20:\"KS\",21:\"KY\",22:\"LA\",23:\"ME\",24:\"MD\",25:\"MA\",26:\"MI\",27:\"MN\",\n", 270 | "# 28:\"MS\",29:\"MO\",30:\"MT\",31:\"NE\",32:\"NV\",33:\"NH\",34:\"NJ\",35:\"NM\",36:\"NY\",37:\"NC\",38:\"ND\",39:\"OH\",\n", 271 | "# 40:\"OK\",41:\"OR\",42:\"PA\",44:\"RI\",45:\"SC\",46:\"SD\",47:\"TN\",48:\"TX\",49:\"UT\",50:\"VT\",51:\"VA\",53:\"WA\",\n", 272 | "# 54:\"WV\",55:\"WI\",56:\"WY\",72:\"PR\"}\n", 273 | "# states = {6:\"CA\"}" 274 | ] 275 | }, 276 | { 277 | "cell_type": "code", 278 | "execution_count": 13, 279 | "id": "4e6f6e62-cbeb-4a67-aee4-024ab9af2f07", 280 | "metadata": {}, 281 | "outputs": [ 282 | { 283 | "name": "stdout", 284 | "output_type": "stream", 285 | "text": [ 286 | "48\n", 287 | "started reading shape file for state TX\n", 288 | "completed reading shape file for state TX\n", 289 | "starting to generate data for TX... 
\n", 290 | "Processing for TX complete 100.00 %\n", 291 | " total time 3:08:48.306832\n" 292 | ] 293 | } 294 | ], 295 | "source": [ 296 | "exec_data(states.keys())" 297 | ] 298 | }, 299 | { 300 | "cell_type": "markdown", 301 | "id": "2f48af1e-14fa-467b-b7b4-f773427a45dc", 302 | "metadata": { 303 | "tags": [] 304 | }, 305 | "source": [ 306 | "### Concat Parts" 307 | ] 308 | }, 309 | { 310 | "cell_type": "code", 311 | "execution_count": 2, 312 | "id": "dd9ea4cd-2457-46cf-9824-6eac0d41975c", 313 | "metadata": {}, 314 | "outputs": [], 315 | "source": [ 316 | "def merge_parts(state_key_list):\n", 317 | " concat_states = cudf.DataFrame()\n", 318 | " c=0\n", 319 | " for i in state_key_list:\n", 320 | " for c in range(1,4):\n", 321 | " if i< 10:\n", 322 | " i_str = '0'+str(i)\n", 323 | " else:\n", 324 | " i_str = str(i)\n", 325 | " path = 'data/total_population/population_%s_%s'%(str(states[i]),c)+'.csv'\n", 326 | " # print(path)\n", 327 | " if os.path.isfile(path): \n", 328 | " temp = cudf.read_csv(path) # Load shape files\n", 329 | " concat_states = cudf.concat([concat_states,temp])\n", 330 | " else:\n", 331 | " print(\"population file does not exist\")\n", 332 | " continue\n", 333 | " return concat_states" 334 | ] 335 | }, 336 | { 337 | "cell_type": "code", 338 | "execution_count": 4, 339 | "id": "a78e1dfc-8105-4302-bac5-f8b7c2e159de", 340 | "metadata": {}, 341 | "outputs": [], 342 | "source": [ 343 | "concat_parts = merge_parts(states)" 344 | ] 345 | }, 346 | { 347 | "cell_type": "code", 348 | "execution_count": 5, 349 | "id": "ef51cef2-4bd2-451f-a55e-26a86499fc3f", 350 | "metadata": {}, 351 | "outputs": [ 352 | { 353 | "data": { 354 | "text/html": [ 355 | "
\n", 356 | "\n", 369 | "\n", 370 | " \n", 371 | " \n", 372 | " \n", 373 | " \n", 374 | " \n", 375 | " \n", 376 | " \n", 377 | " \n", 378 | " \n", 379 | " \n", 380 | " \n", 381 | " \n", 382 | " \n", 383 | " \n", 384 | " \n", 385 | " \n", 386 | " \n", 387 | " \n", 388 | " \n", 389 | " \n", 390 | " \n", 391 | " \n", 392 | " \n", 393 | " \n", 394 | " \n", 395 | " \n", 396 | " \n", 397 | " \n", 398 | " \n", 399 | " \n", 400 | " \n", 401 | " \n", 402 | " \n", 403 | " \n", 404 | " \n", 405 | " \n", 406 | " \n", 407 | " \n", 408 | " \n", 409 | " \n", 410 | " \n", 411 | " \n", 412 | " \n", 413 | " \n", 414 | " \n", 415 | " \n", 416 | " \n", 417 | " \n", 418 | " \n", 419 | " \n", 420 | " \n", 421 | " \n", 422 | " \n", 423 | " \n", 424 | " \n", 425 | " \n", 426 | " \n", 427 | " \n", 428 | " \n", 429 | " \n", 430 | " \n", 431 | " \n", 432 | " \n", 433 | " \n", 434 | " \n", 435 | " \n", 436 | " \n", 437 | " \n", 438 | " \n", 439 | " \n", 440 | " \n", 441 | " \n", 442 | " \n", 443 | " \n", 444 | " \n", 445 | " \n", 446 | "
GEOID20xy
060014001001001-122.237.9
160014001001001-122.237.9
260014001001001-122.237.9
360014001001001-122.237.9
460014001001001-122.237.9
............
5932552361150411021048-121.339.4
5932552461150411021048-121.339.4
5932552561150411021048-121.339.4
5932552661150411021048-121.339.4
5932552761150411021048-121.339.4
\n", 447 | "

59325528 rows × 3 columns

\n", 448 | "
" 449 | ], 450 | "text/plain": [ 451 | " GEOID20 x y\n", 452 | "0 60014001001001 -122.2 37.9\n", 453 | "1 60014001001001 -122.2 37.9\n", 454 | "2 60014001001001 -122.2 37.9\n", 455 | "3 60014001001001 -122.2 37.9\n", 456 | "4 60014001001001 -122.2 37.9\n", 457 | "... ... ... ...\n", 458 | "59325523 61150411021048 -121.3 39.4\n", 459 | "59325524 61150411021048 -121.3 39.4\n", 460 | "59325525 61150411021048 -121.3 39.4\n", 461 | "59325526 61150411021048 -121.3 39.4\n", 462 | "59325527 61150411021048 -121.3 39.4\n", 463 | "\n", 464 | "[59325528 rows x 3 columns]" 465 | ] 466 | }, 467 | "execution_count": 5, 468 | "metadata": {}, 469 | "output_type": "execute_result" 470 | } 471 | ], 472 | "source": [ 473 | "concat_parts =concat_parts.reset_index(drop=True)\n", 474 | "concat_parts" 475 | ] 476 | }, 477 | { 478 | "cell_type": "code", 479 | "execution_count": 9, 480 | "id": "d72bb78c-3a19-4a4f-8d91-6ec056169ff9", 481 | "metadata": {}, 482 | "outputs": [ 483 | { 484 | "data": { 485 | "text/plain": [ 486 | "42742567.0" 487 | ] 488 | }, 489 | "execution_count": 9, 490 | "metadata": {}, 491 | "output_type": "execute_result" 492 | } 493 | ], 494 | "source": [ 495 | "df[df.STATE==48].points.sum()" 496 | ] 497 | }, 498 | { 499 | "cell_type": "code", 500 | "execution_count": 21, 501 | "id": "80c9b8b2-4975-4508-b160-96415f5e72af", 502 | "metadata": {}, 503 | "outputs": [ 504 | { 505 | "data": { 506 | "text/plain": [ 507 | "59325528.0" 508 | ] 509 | }, 510 | "execution_count": 21, 511 | "metadata": {}, 512 | "output_type": "execute_result" 513 | } 514 | ], 515 | "source": [ 516 | "df.points.sum()" 517 | ] 518 | }, 519 | { 520 | "cell_type": "code", 521 | "execution_count": 10, 522 | "id": "496e325f-c830-4e5a-bfe1-a9f8016376b7", 523 | "metadata": {}, 524 | "outputs": [], 525 | "source": [ 526 | "concat_parts.to_pandas().to_csv('data/total_population/population_CA')" 527 | ] 528 | }, 529 | { 530 | "cell_type": "markdown", 531 | "id": "13fe141e-7175-4ee8-923d-80b991dd04f5", 532 | 
"metadata": { 533 | "tags": [] 534 | }, 535 | "source": [ 536 | "### Concat States" 537 | ] 538 | }, 539 | { 540 | "cell_type": "code", 541 | "execution_count": 2, 542 | "id": "848adcd4-502b-4296-a4cf-92e3ab9ff965", 543 | "metadata": {}, 544 | "outputs": [], 545 | "source": [ 546 | "def merge_shape_and_states(state_key_list):\n", 547 | " concat_states = cudf.DataFrame()\n", 548 | " \n", 549 | " for i in state_key_list:\n", 550 | " if i< 10:\n", 551 | " i_str = '0'+str(i)\n", 552 | " else:\n", 553 | " i_str = str(i)\n", 554 | " path = 'data/total_population/population_%s'%str(states[i])+'.csv'\n", 555 | " if os.path.isfile(path): \n", 556 | " temp = cudf.read_csv(path) # Load shape files\n", 557 | " concat_states = cudf.concat([concat_states,temp])\n", 558 | " else:\n", 559 | " print(i)\n", 560 | " print(\"population file does not exist\")\n", 561 | " continue\n", 562 | " print(i)\n", 563 | " return concat_states" 564 | ] 565 | }, 566 | { 567 | "cell_type": "code", 568 | "execution_count": 3, 569 | "id": "50dcda12-ae59-45d1-9dc5-50e93ee5692c", 570 | "metadata": {}, 571 | "outputs": [], 572 | "source": [ 573 | "# states = {1 :\"AL\",2 :\"AK\",4 :\"AZ\",5 :\"AR\",6 :\"CA\",8 :\"CO\",9 :\"CT\",10:\"DE\",11:\"DC\",12:\"FL\",13:\"GA\",15:\"HI\",\n", 574 | "# 16:\"ID\",17:\"IL\",18:\"IN\",19:\"IA\",20:\"KS\",21:\"KY\",22:\"LA\",23:\"ME\",24:\"MD\",25:\"MA\",26:\"MI\",27:\"MN\",\n", 575 | "# 28:\"MS\",29:\"MO\",30:\"MT\",31:\"NE\",32:\"NV\",33:\"NH\",34:\"NJ\",35:\"NM\",36:\"NY\",37:\"NC\",38:\"ND\",39:\"OH\",\n", 576 | "# 40:\"OK\",41:\"OR\",42:\"PA\",44:\"RI\",45:\"SC\",46:\"SD\",47:\"TN\",48:\"TX\",49:\"UT\",50:\"VT\",51:\"VA\",53:\"WA\",\n", 577 | "# 54:\"WV\",55:\"WI\",56:\"WY\",72:\"PR\"}\n", 578 | "states = {1 :\"AL\",2 :\"AK\",4 :\"AZ\",5 :\"AR\",6 :\"CA\",8 :\"CO\",9 :\"CT\",10:\"DE\",11:\"DC\",12:\"FL\",13:\"GA\",15:\"HI\",\n", 579 | " 16:\"ID\",17:\"IL\",18:\"IN\",19:\"IA\",20:\"KS\",21:\"KY\",22:\"LA\",23:\"ME\",24:\"MD\",25:\"MA\",26:\"MI\",27:\"MN\",\n", 580 
| " 28:\"MS\"} # part1\n", 581 | "states = {29:\"MO\",30:\"MT\",31:\"NE\",32:\"NV\",33:\"NH\",34:\"NJ\",35:\"NM\",36:\"NY\",37:\"NC\",38:\"ND\",39:\"OH\",\n", 582 | " 40:\"OK\",41:\"OR\",42:\"PA\",44:\"RI\",45:\"SC\",46:\"SD\",47:\"TN\",48:\"TX\",49:\"UT\",50:\"VT\",51:\"VA\",53:\"WA\",\n", 583 | " 54:\"WV\",55:\"WI\",56:\"WY\",72:\"PR\"} #part2" 584 | ] 585 | }, 586 | { 587 | "cell_type": "code", 588 | "execution_count": 4, 589 | "id": "438ad7af-c7a4-4d39-9741-a4cc0c87859c", 590 | "metadata": { 591 | "tags": [] 592 | }, 593 | "outputs": [ 594 | { 595 | "name": "stdout", 596 | "output_type": "stream", 597 | "text": [ 598 | "29\n", 599 | "30\n", 600 | "31\n", 601 | "32\n", 602 | "33\n", 603 | "34\n", 604 | "35\n", 605 | "36\n", 606 | "37\n", 607 | "38\n", 608 | "39\n", 609 | "40\n", 610 | "41\n", 611 | "42\n", 612 | "44\n", 613 | "45\n", 614 | "46\n", 615 | "47\n", 616 | "48\n", 617 | "49\n", 618 | "50\n", 619 | "51\n", 620 | "53\n", 621 | "54\n", 622 | "55\n", 623 | "56\n", 624 | "72\n" 625 | ] 626 | }, 627 | { 628 | "data": { 629 | "text/html": [ 630 | "
\n", 631 | "\n", 644 | "\n", 645 | " \n", 646 | " \n", 647 | " \n", 648 | " \n", 649 | " \n", 650 | " \n", 651 | " \n", 652 | " \n", 653 | " \n", 654 | " \n", 655 | " \n", 656 | " \n", 657 | " \n", 658 | " \n", 659 | " \n", 660 | " \n", 661 | " \n", 662 | " \n", 663 | " \n", 664 | " \n", 665 | " \n", 666 | " \n", 667 | " \n", 668 | " \n", 669 | " \n", 670 | " \n", 671 | " \n", 672 | " \n", 673 | " \n", 674 | " \n", 675 | " \n", 676 | " \n", 677 | " \n", 678 | " \n", 679 | " \n", 680 | " \n", 681 | " \n", 682 | " \n", 683 | " \n", 684 | " \n", 685 | "
ID20xy
0290019501001000-92.440.3
1290019501001000-92.440.3
2290019501001001-92.440.3
3290019501001001-92.440.3
4290019501001001-92.440.3
\n", 686 | "
" 687 | ], 688 | "text/plain": [ 689 | " ID20 x y\n", 690 | "0 290019501001000 -92.4 40.3\n", 691 | "1 290019501001000 -92.4 40.3\n", 692 | "2 290019501001001 -92.4 40.3\n", 693 | "3 290019501001001 -92.4 40.3\n", 694 | "4 290019501001001 -92.4 40.3" 695 | ] 696 | }, 697 | "execution_count": 4, 698 | "metadata": {}, 699 | "output_type": "execute_result" 700 | } 701 | ], 702 | "source": [ 703 | "indv_df = merge_shape_and_states(states.keys()).drop('Unnamed: 0',axis=1)\n", 704 | "indv_df.rename(columns={'GEOID20':'ID20'},inplace=True)\n", 705 | "indv_df.head()" 706 | ] 707 | }, 708 | { 709 | "cell_type": "code", 710 | "execution_count": 5, 711 | "id": "9fd6b2fe-5fbf-47b1-93d2-7bd18dfeeb7b", 712 | "metadata": {}, 713 | "outputs": [ 714 | { 715 | "data": { 716 | "text/plain": [ 717 | "248001113" 718 | ] 719 | }, 720 | "execution_count": 5, 721 | "metadata": {}, 722 | "output_type": "execute_result" 723 | } 724 | ], 725 | "source": [ 726 | "len(indv_df)" 727 | ] 728 | }, 729 | { 730 | "cell_type": "code", 731 | "execution_count": null, 732 | "id": "9bfb9e3e-bfeb-45d6-ac7a-9475e2575577", 733 | "metadata": {}, 734 | "outputs": [], 735 | "source": [ 736 | "# indv_df.to_pandas().to_parquet('data/total_part1.parquet')" 737 | ] 738 | }, 739 | { 740 | "cell_type": "code", 741 | "execution_count": 6, 742 | "id": "9c369907-9c5f-4874-9d2f-8c80a86b9c56", 743 | "metadata": {}, 744 | "outputs": [], 745 | "source": [ 746 | "# indv_df.to_pandas().to_parquet('data/total_part2.parquet')" 747 | ] 748 | }, 749 | { 750 | "cell_type": "markdown", 751 | "id": "9d138976-2169-4049-9c7c-815657a2b08c", 752 | "metadata": {}, 753 | "source": [ 754 | "### Use processed dfs" 755 | ] 756 | }, 757 | { 758 | "cell_type": "code", 759 | "execution_count": 2, 760 | "id": "add2e952-e678-434c-84bc-68c3d6527b5d", 761 | "metadata": {}, 762 | "outputs": [], 763 | "source": [ 764 | "# df1 = pd.read_parquet('data/total_part1.parquet')\n", 765 | "# df2 = pd.read_parquet('data/total_part2.parquet')" 766 | ] 767 | 
}, 768 | { 769 | "cell_type": "code", 770 | "execution_count": 3, 771 | "id": "b9c7d374-9466-4e9e-bd74-ed84d0b25790", 772 | "metadata": {}, 773 | "outputs": [], 774 | "source": [ 775 | "# merged = pd.concat([df1,df2])" 776 | ] 777 | }, 778 | { 779 | "cell_type": "code", 780 | "execution_count": 5, 781 | "id": "ea1d033d-dd0b-4827-b87b-623d11f02c6a", 782 | "metadata": {}, 783 | "outputs": [ 784 | { 785 | "data": { 786 | "text/plain": [ 787 | "504475979" 788 | ] 789 | }, 790 | "execution_count": 5, 791 | "metadata": {}, 792 | "output_type": "execute_result" 793 | } 794 | ], 795 | "source": [ 796 | "# len(merged)" 797 | ] 798 | }, 799 | { 800 | "cell_type": "code", 801 | "execution_count": 6, 802 | "id": "5f3ef3c3-0170-4d7a-be20-f85b7250f2a9", 803 | "metadata": {}, 804 | "outputs": [], 805 | "source": [ 806 | "# gpu = cudf.from_pandas(merged)" 807 | ] 808 | }, 809 | { 810 | "cell_type": "code", 811 | "execution_count": 9, 812 | "id": "68e69c2c-286f-4abd-a0fb-ef11e4c4b851", 813 | "metadata": {}, 814 | "outputs": [], 815 | "source": [ 816 | "# merged.to_parquet('data/total_parts_combined.parquet')" 817 | ] 818 | }, 819 | { 820 | "cell_type": "code", 821 | "execution_count": null, 822 | "id": "ab83595d-dd0f-4e05-a587-49dbcee0b31c", 823 | "metadata": {}, 824 | "outputs": [], 825 | "source": [ 826 | "# dataset = indv_df.merge(df,on='ID20',how='left').sort_values('ID20')\n", 827 | "# dataset.head()" 828 | ] 829 | } 830 | ], 831 | "metadata": { 832 | "kernelspec": { 833 | "display_name": "Python 3 (ipykernel)", 834 | "language": "python", 835 | "name": "python3" 836 | }, 837 | "language_info": { 838 | "codemirror_mode": { 839 | "name": "ipython", 840 | "version": 3 841 | }, 842 | "file_extension": ".py", 843 | "mimetype": "text/x-python", 844 | "name": "python", 845 | "nbconvert_exporter": "python", 846 | "pygments_lexer": "ipython3", 847 | "version": "3.9.13" 848 | } 849 | }, 850 | "nbformat": 4, 851 | "nbformat_minor": 5 852 | } 853 | 
-------------------------------------------------------------------------------- /entrypoint.sh: -------------------------------------------------------------------------------- 1 | # activate the conda environment 2 | source activate rapids 3 | 4 | cd /rapids/plotly_census_demo/plotly_demo 5 | 6 | if [ "$1" = "dask_app" ]; then 7 | python dask_app.py 8 | else 9 | python app.py 10 | fi 11 | 12 | -------------------------------------------------------------------------------- /environment.yml: -------------------------------------------------------------------------------- 1 | channels: 2 | - rapidsai 3 | - conda-forge 4 | - nvidia 5 | dependencies: 6 | - python=3.10 7 | - cudatoolkit=11.8 8 | - cudf=23.06 9 | - dask-cudf=23.06 10 | - dask-cuda=23.06 11 | - dash 12 | - jupyterlab 13 | - jupyter-dash 14 | - jupyterlab-dash 15 | - dash-html-components 16 | - dash-core-components 17 | - dash-daq 18 | - dash-bootstrap-components 19 | - datashader>=0.15 20 | - pyproj 21 | - bokeh 22 | -------------------------------------------------------------------------------- /environment_for_docker.yml: -------------------------------------------------------------------------------- 1 | channels: 2 | - conda-forge 3 | dependencies: 4 | - dash=2.5.1 5 | - dash-html-components=2.0.0 6 | - dash-core-components=2.0.0 7 | - dash-daq=0.5.0 8 | - dash-bootstrap-components=1.2.0 9 | - datashader=0.14 10 | -------------------------------------------------------------------------------- /holoviews_demo/README.md: -------------------------------------------------------------------------------- 1 | # Panel + Holoviews + RAPIDS | Census 2020 Race Migration Visualization 2 | 3 | ![hvr1](https://user-images.githubusercontent.com/35873124/189291984-d95ddf27-9ec8-452a-b596-05398ce47969.png) 4 | 5 | ## Charts 6 | 7 | 1. Map chart shows the total migration points for the chosen view and selected area 8 | 2. Top counties bar shows the counties with the most migration for the chosen view and selected area 9 | 3. 
Net Race migration bar shows total inward and outward migration for the chosen view and selected area 10 | 4. Population Distribution shows the distribution of migration across blocks for the chosen view and selected area 11 | 12 | Cross-filtering is enabled to link all four charts using the box-select tool 13 | 14 | ## Race Views 15 | 16 | The demo consists of eight views (seven race views + one all-race view) 17 | 18 | Options - All, White alone, African American alone, American Indian alone, Asian alone, Native Hawaiian alone, Other Race alone, Two or More races. 19 | 20 | #### Snapshot examples 21 | 22 | 1. White race 23 | 24 | ![white](https://user-images.githubusercontent.com/35873124/189290231-4f573dba-6357-4f0a-89cd-14260fa35d0b.png) 25 | 26 | 2. Asian race 27 | 28 | ![asian](https://user-images.githubusercontent.com/35873124/189290237-bdece601-4237-436a-a90f-039f42790b9c.png) 29 | 30 | 3. African American race 31 | 32 | ![africanamerican](https://user-images.githubusercontent.com/35873124/189290258-27aa8b71-cdfc-443b-99d9-260b2bbcd342.png) 33 | 34 | ## Colormaps 35 | 36 | Users can select from several colormaps 37 | 38 | Options - 'kbc', 'fire', 'bgy', 'bgyw', 'bmy', 'gray'. 39 | 40 | ## Limit 41 | 42 | Users can use the slider to select how many top counties to show, from 5 to 50 in intervals of 5 43 | 44 | # Installation and Run Steps 45 | 46 | ## Data 47 | 48 | There is one main dataset: 49 | 50 | - Net Migration Dataset: consists of race migration computed from Census 2020 and Census 2010 block data 51 | 52 | For more information on how the Net Migration Dataset was prepared to show individual points, refer to the `/data_prep_net_migration` folder. 
53 | 54 | You can download the final net migration dataset [here](https://data.rapids.ai/viz-data/net_migration_dataset.parquet) 55 | 56 | ### Conda Env 57 | 58 | Verify that the following arguments in the `environment.yml` match your system (an easy way to check is `nvidia-smi`): 59 | 60 | cudatoolkit: Version used is `11.5` 61 | 62 | ```bash 63 | # setup conda environment 64 | conda env create --name holoviews_env --file environment.yml 65 | source activate holoviews_env 66 | 67 | # run and access 68 | cd holoviews_demo 69 | jupyter lab 70 | # open the census_net_migration_demo.ipynb notebook 71 | ``` 72 | 73 | ## Dependencies 74 | 75 | - python=3.9 76 | - cudatoolkit=11.5 77 | - rapids=22.08 78 | - plotly=5.10.0 79 | - jupyterlab=3.4.3 80 | 81 | ## FAQ and Known Issues 82 | 83 | **What hardware do I need to run this locally?** To run you need an NVIDIA GPU with at least 24GB of memory, at least 32GB of system memory, and a Linux OS as defined in the [RAPIDS requirements](https://rapids.ai/start.html#req). 84 | 85 | **How did you compute migration?** Migration was computed by comparing the block-level population for Census 2010 and Census 2020. 86 | 87 | **How did you compare population having block level boundary changes?** [Relationship Files](https://www.census.gov/geographies/reference-files/time-series/geo/relationship-files.html#t10t20) provide the 2010 Census Tabulation Block to 2020 Census Tabulation Block mapping. Block relationships may be one-to-one, many-to-one, one-to-many, or many-to-many. Population counts were apportioned proportionally to account for the splitting and merging of blocks between 2010 and 2020. 88 | 89 | **How did you determine race migration?** We took the difference of race counts between Census 2020 and Census 2010. Individuals were randomly assigned a race within a block so that they accurately add up at the block level. 
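The proportional apportionment described in the FAQ above can be sketched with plain pandas on a toy relationship table. The column names and weights here are illustrative stand-ins, not the actual Census relationship-file schema:

```python
import pandas as pd

# Toy 2010 -> 2020 block relationship table: block "A" was split across two
# 2020 blocks, block "B" maps one-to-one. The weight is the share of the
# 2010 block assigned to each 2020 block (illustrative values).
rel = pd.DataFrame({
    "GEOID10": ["A", "A", "B"],
    "GEOID20": ["A1", "A2", "B1"],
    "weight":  [0.25, 0.75, 1.0],
})
pop_2010 = pd.DataFrame({"GEOID10": ["A", "B"], "pop10": [100, 40]})
pop_2020 = pd.DataFrame({"GEOID20": ["A1", "A2", "B1"], "pop20": [30, 90, 35]})

# Apportion the 2010 population onto 2020 block geography in proportion to the weights
df = rel.merge(pop_2010, on="GEOID10")
df["pop10_on_2020"] = df["pop10"] * df["weight"]

# Net decennial change per 2020 block: positive = inward, negative = outward migration
df = df.groupby("GEOID20", as_index=False)["pop10_on_2020"].sum().merge(pop_2020, on="GEOID20")
df["net"] = df["pop20"] - df["pop10_on_2020"]
```

With these toy numbers, block A1 gains 5 people, A2 gains 15, and B1 loses 5, which is the per-block signal behind the Migrating In / Migrating Out views.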
90 | 91 | **How did you get individual point locations?** The population density points are randomly placed within a census block so that they match the population counts at the census-block level. 92 | 93 | **How are the population and distributions filtered?** Use the box-select tool icon for the map, or click and drag for the bar charts. 94 | 95 | **Why is the population data from 2010 and 2020?** Only census data is recorded on a block level, which provides the highest-resolution population distributions available. For more details on census boundaries, refer to the [TIGERweb app](https://tigerweb.geo.census.gov/tigerwebmain/TIGERweb_apps.html). 96 | 97 | **The dashboard stopped responding or the chart data disappeared!** This is likely caused by an out-of-memory error, and the application must be restarted. 98 | 99 | **How do I request a feature or report a bug?** Create an [Issue](https://github.com/rapidsai/plotly-dash-rapids-census-demo/issues) and we will get to it as soon as possible. 100 | 101 | ## Acknowledgments and Data Sources 102 | 103 | - 2020 Population Census and 2010 Population Census to compute the Net Migration Dataset, used with permission from IPUMS NHGIS, University of Minnesota, [www.nhgis.org](https://www.nhgis.org/) (not for redistribution). 104 | - Dashboard developed with [Panel](https://panel.holoviz.org/) and [Holoviews](https://holoviews.org/index.html). 105 | - Geospatial point rendering developed with [Datashader](https://datashader.org/). 106 | - GPU acceleration with [RAPIDS cudf](https://rapids.ai/) and [cupy](https://cupy.chainer.org/), CPU code with [pandas](https://pandas.pydata.org/). 107 | - For source code and data workflow, visit our [GitHub](https://github.com/rapidsai/plotly-dash-rapids-census-demo/tree/census-2020). 
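The random point placement described in the FAQ above boils down to rejection sampling inside a polygon: draw points uniformly from the block's bounding box and keep only those that fall inside. A dependency-free sketch of the idea (the actual data-prep notebooks use shapely's `Polygon.contains` rather than this hand-rolled ray-casting test):

```python
import random

def point_in_polygon(x, y, poly):
    """Ray-casting point-in-polygon test; poly is a list of (x, y) vertices."""
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        if (y1 > y) != (y2 > y):
            # x-coordinate where this edge crosses the horizontal ray at height y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def random_points_in_polygon(number, poly):
    """Sample `number` points uniformly inside poly by rejection from its bounding box."""
    xs = [p[0] for p in poly]
    ys = [p[1] for p in poly]
    min_x, max_x, min_y, max_y = min(xs), max(xs), min(ys), max(ys)
    points = []
    while len(points) < number:
        px = random.uniform(min_x, max_x)
        py = random.uniform(min_y, max_y)
        if point_in_polygon(px, py, poly):
            points.append((px, py))
    return points

# Toy "block" shaped like a right triangle
triangle = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
pts = random_points_in_polygon(100, triangle)
```

Rejection sampling is slow for thin or irregular blocks (many draws are discarded), which is one reason the per-state point generation in the notebooks takes hours.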
108 | -------------------------------------------------------------------------------- /holoviews_demo/environment.yml: -------------------------------------------------------------------------------- 1 | channels: 2 | - rapidsai 3 | - conda-forge 4 | - nvidia 5 | dependencies: 6 | - python=3.9 7 | - cudatoolkit=11.5 8 | - rapids=22.08 9 | - plotly=5.10.0 10 | - jupyterlab=3.4.3 11 | -------------------------------------------------------------------------------- /id2county.pkl: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/rapidsai/plotly-dash-rapids-census-demo/e4af2bd3de86263b7f1f947ba9b302002e047a55/id2county.pkl -------------------------------------------------------------------------------- /plotly_demo/README.md: -------------------------------------------------------------------------------- 1 | # Plotly-Dash + RAPIDS | Census 2020 Visualization 2 | 3 | There are two versions of the same application, a single-GPU version and a multi-GPU version, each including all the views described below. 4 | 5 | Recommended GPU memory: 6 | 7 | 1. Single GPU version: 32GB+ 8 | 2. 
Multi-GPU version: 2+ GPUs of 16GB+ each 9 | 10 | ```bash 11 | # run and access single GPU version 12 | cd plotly_demo 13 | python app.py 14 | 15 | # run and access multi GPU version 16 | cd plotly_demo 17 | python dask_app.py 18 | ``` 19 | 20 | ## Snapshot Examples 21 | 22 | ### 1) Total Population View 23 | 24 | ![tp](https://user-images.githubusercontent.com/35873124/189298473-4a6895db-b5b3-49da-b47a-4a39233e7daf.png) 25 | 26 | ### 2) Migrating In View 27 | 28 | ![migin](https://user-images.githubusercontent.com/35873124/189298490-614a7efb-f172-4322-becc-eb79059bfbaa.png) 29 | 30 | ### 3) Stationary View 31 | 32 | ![stationary](https://user-images.githubusercontent.com/35873124/189298509-fb20b2af-3aee-4a12-9cba-885e3d2587f5.png) 33 | 34 | ### 4) Migrating Out View 35 | 36 | ![migout](https://user-images.githubusercontent.com/35873124/189298523-14983e47-38bd-4b73-97fa-6694b13f3362.png) 37 | 38 | ### 5) Net Migration View 39 | 40 | ![netmig](https://user-images.githubusercontent.com/35873124/189298570-64640492-4413-4d0e-a2be-aa2c91df6736.png) 41 | 42 | #### Migration population to color mapping - 43 | 44 | Inward Migration: Purple-Blue
45 | Stationary: Greens
46 | Outward Migration: Red Purples
47 | 48 | ### 6) Population with Race view 49 | 50 | ![race](https://user-images.githubusercontent.com/35873124/189298602-11873dc3-89f2-4934-8208-b68e28e59d57.png) 51 | 52 | #### Race to color mapping - 53 | 54 | White: aqua
55 | African American: lime
56 | American Indian: yellow
57 | Asian: orange
58 | Native Hawaiian: blue
59 | Other Race alone: fuchsia
60 | Two or More: saddlebrown
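The race-to-color mapping above can be captured as a datashader-style color key, i.e. a category-to-color dict that datashader accepts as the `color_key` argument when shading categorical aggregates. A sketch; the actual variable name and category labels used in `app.py` are assumptions:

```python
# Hypothetical race-category -> color mapping mirroring the table above.
race_color_key = {
    "White": "aqua",
    "African American": "lime",
    "American Indian": "yellow",
    "Asian": "orange",
    "Native Hawaiian": "blue",
    "Other Race alone": "fuchsia",
    "Two or More": "saddlebrown",
}
```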
61 | -------------------------------------------------------------------------------- /plotly_demo/app.py: -------------------------------------------------------------------------------- 1 | import os 2 | import time 3 | 4 | import cudf 5 | import dash_bootstrap_components as dbc 6 | import dash_daq as daq 7 | import numpy as np 8 | import pandas as pd 9 | from dash import Dash, ctx, dcc, html 10 | from dash.dependencies import Input, Output, State 11 | from dash.exceptions import PreventUpdate 12 | from distributed import Client 13 | from utils import * 14 | import tarfile 15 | 16 | # ### Dashboards start here 17 | text_color = "#cfd8dc" # Material blue-grey 100 18 | 19 | DATA_PATH = "../data" 20 | DATA_PATH_STATE = f"{DATA_PATH}/state-wise-population" 21 | DATA_PATH_TOTAL = f"{DATA_PATH}/total_population_dataset.parquet" 22 | 23 | # Download the required states data 24 | census_data_url = "https://data.rapids.ai/viz-data/total_population_dataset.parquet" 25 | check_dataset(census_data_url, DATA_PATH_TOTAL) 26 | 27 | census_state_data_url = "https://data.rapids.ai/viz-data/state-wise-population.tar.xz" 28 | if not os.path.exists(DATA_PATH_STATE): 29 | check_dataset(census_state_data_url, f"{DATA_PATH_STATE}.tar.xz") 30 | print("Extracting state-wise-population.tar.xz ...") 31 | with tarfile.open(f"{DATA_PATH_STATE}.tar.xz", "r:xz") as tar: 32 | tar.extractall(DATA_PATH) 33 | print("Done.") 34 | 35 | state_files = os.listdir(DATA_PATH_STATE) 36 | state_names = [os.path.splitext(f)[0] for f in state_files] 37 | # add USA(combined dataset) to the list of states 38 | state_names.append("USA") 39 | 40 | 41 | ( 42 | data_center_3857, 43 | data_3857, 44 | data_4326, 45 | data_center_4326, 46 | selected_map_backup, 47 | selected_race_backup, 48 | selected_county_top_backup, 49 | selected_county_bt_backup, 50 | view_name_backup, 51 | c_df, 52 | gpu_enabled_backup, 53 | dragmode_backup, 54 | currently_loaded_state, 55 | ) = ([], [], [], [], None, None, None, None, None, 
None, None, "pan", None) 56 | 57 | 58 | app = Dash(__name__) 59 | application = app.server 60 | 61 | app.layout = html.Div( 62 | children=[ 63 | html.Div( 64 | children=[ 65 | html.H1( 66 | children=[ 67 | "Census 2020 Net Migration Visualization", 68 | html.A( 69 | html.Img( 70 | src="assets/rapids-logo.png", 71 | style={ 72 | "float": "right", 73 | "height": "45px", 74 | "marginRight": "1%", 75 | "marginTop": "-7px", 76 | }, 77 | ), 78 | href="https://rapids.ai/", 79 | ), 80 | html.A( 81 | html.Img( 82 | src="assets/dash-logo.png", 83 | style={"float": "right", "height": "30px"}, 84 | ), 85 | href="https://dash.plot.ly/", 86 | ), 87 | ], 88 | style={"textAlign": "left"}, 89 | ), 90 | ] 91 | ), 92 | html.Div( 93 | children=[ 94 | html.Div( 95 | children=[ 96 | html.Div( 97 | children=[ 98 | html.H4( 99 | [ 100 | "Population Count and Query Time", 101 | ], 102 | className="container_title", 103 | ), 104 | dcc.Loading( 105 | dcc.Graph( 106 | id="indicator-graph", 107 | figure=blank_fig(row_heights[3]), 108 | config={"displayModeBar": False}, 109 | ), 110 | color="#b0bec5", 111 | style={"height": f"{row_heights[3]}px"}, 112 | ), 113 | ], 114 | style={"height": f"{row_heights[0]}px"}, 115 | className="five columns pretty_container", 116 | id="indicator-div", 117 | ), 118 | html.Div( 119 | children=[ 120 | html.Div( 121 | children=[ 122 | html.Button( 123 | "Clear All Selections", 124 | id="clear-all", 125 | className="reset-button", 126 | ), 127 | ] 128 | ), 129 | html.H4( 130 | [ 131 | "Options", 132 | ], 133 | className="container_title", 134 | ), 135 | html.Table( 136 | [ 137 | html.Tr( 138 | [ 139 | html.Td( 140 | html.Div("GPU Acceleration"), 141 | className="config-label", 142 | ), 143 | html.Td( 144 | html.Div( 145 | [ 146 | daq.DarkThemeProvider( 147 | daq.BooleanSwitch( 148 | on=True, 149 | color="#00cc96", 150 | id="gpu-toggle", 151 | ) 152 | ), 153 | dbc.Tooltip( 154 | "Caution: Using CPU compute for more than 50 million points is not recommended.", 155 | 
target="gpu-toggle", 156 | placement="bottom", 157 | autohide=True, 158 | style={ 159 | "textAlign": "left", 160 | "fontSize": "15px", 161 | "color": "white", 162 | "width": "350px", 163 | "padding": "15px", 164 | "borderRadius": "5px", 165 | "backgroundColor": "#2a2a2e", 166 | }, 167 | ), 168 | ] 169 | ) 170 | ), 171 | ####### State Selection Dropdown ###### 172 | html.Td( 173 | html.Div("Select State"), 174 | style={"fontSize": "20px"}, 175 | ), 176 | html.Td( 177 | dcc.Dropdown( 178 | id="state-dropdown", 179 | options=[ 180 | {"label": i, "value": i} 181 | for i in state_names 182 | ], 183 | value="USA", 184 | ), 185 | style={ 186 | "width": "25%", 187 | "height": "15px", 188 | }, 189 | ), 190 | ###### VIEWS ARE HERE ########### 191 | html.Td( 192 | html.Div("Data-Selection"), 193 | style={"fontSize": "20px"}, 194 | ), # className="config-label" 195 | html.Td( 196 | dcc.Dropdown( 197 | id="view-dropdown", 198 | options=[ 199 | { 200 | "label": "Total Population", 201 | "value": "total", 202 | }, 203 | { 204 | "label": "Migrating In", 205 | "value": "in", 206 | }, 207 | { 208 | "label": "Stationary", 209 | "value": "stationary", 210 | }, 211 | { 212 | "label": "Migrating Out", 213 | "value": "out", 214 | }, 215 | { 216 | "label": "Net Migration", 217 | "value": "net", 218 | }, 219 | { 220 | "label": "Population with Race", 221 | "value": "race", 222 | }, 223 | ], 224 | value="in", 225 | searchable=False, 226 | clearable=False, 227 | ), 228 | style={ 229 | "width": "25%", 230 | "height": "15px", 231 | }, 232 | ), 233 | ] 234 | ), 235 | ], 236 | style={"width": "100%", "marginTop": "30px"}, 237 | ), 238 | # Hidden div inside the app that stores the intermediate value 239 | html.Div( 240 | id="datapoints-state-value", 241 | style={"display": "none"}, 242 | ), 243 | ], 244 | style={"height": f"{row_heights[0]}px"}, 245 | className="seven columns pretty_container", 246 | id="config-div", 247 | ), 248 | ] 249 | ), 250 | ##################### Map starts 
################################### 251 | html.Div( 252 | children=[ 253 | html.Button( 254 | "Clear Selection", id="reset-map", className="reset-button" 255 | ), 256 | html.H4( 257 | [ 258 | "Population Distribution of Individuals", 259 | ], 260 | className="container_title", 261 | ), 262 | dcc.Graph( 263 | id="map-graph", 264 | config={"displayModeBar": False}, 265 | figure=blank_fig(row_heights[1]), 266 | ), 267 | # Hidden div inside the app that stores the intermediate value 268 | html.Div( 269 | id="intermediate-state-value", style={"display": "none"} 270 | ), 271 | ], 272 | className="twelve columns pretty_container", 273 | id="map-div", 274 | style={"height": "50%"}, 275 | ), 276 | ################# Bars start ######################### 277 | # Race start 278 | html.Div( 279 | children=[ 280 | html.Div( 281 | children=[ 282 | html.Button( 283 | "Clear Selection", 284 | id="clear-race", 285 | className="reset-button", 286 | ), 287 | html.H4( 288 | [ 289 | "Race Distribution", 290 | ], 291 | className="container_title", 292 | ), 293 | dcc.Graph( 294 | id="race-histogram", 295 | config={"displayModeBar": False}, 296 | figure=blank_fig(row_heights[2]), 297 | ), 298 | ], 299 | className="one-third column pretty_container", 300 | id="race-div", 301 | ), # County top starts 302 | html.Div( 303 | children=[ 304 | html.Button( 305 | "Clear Selection", 306 | id="clear-county-top", 307 | className="reset-button", 308 | ), 309 | html.H4( 310 | [ 311 | "County-wise Top 15", 312 | ], 313 | className="container_title", 314 | ), 315 | dcc.Graph( 316 | id="county-histogram-top", 317 | config={"displayModeBar": False}, 318 | figure=blank_fig(row_heights[2]), 319 | animate=False, 320 | ), 321 | ], 322 | className=" one-third column pretty_container", 323 | id="county-div-top", 324 | ), 325 | # County bottom starts 326 | html.Div( 327 | children=[ 328 | html.Button( 329 | "Clear Selection", 330 | id="clear-county-bottom", 331 | className="reset-button", 332 | ), 333 | html.H4( 
334 | [ 335 | "County-wise Bottom 15", 336 | ], 337 | className="container_title", 338 | ), 339 | dcc.Graph( 340 | id="county-histogram-bottom", 341 | config={"displayModeBar": False}, 342 | figure=blank_fig(row_heights[2]), 343 | animate=False, 344 | ), 345 | ], 346 | className="one-third column pretty_container", 347 | ), 348 | ], 349 | className="twelve columns", 350 | ) 351 | ############## End of Bars ##################### 352 | ] 353 | ), 354 | html.Div( 355 | [ 356 | html.H4("Acknowledgements and Data Sources", style={"marginTop": "0"}), 357 | dcc.Markdown( 358 | """\ 359 | - 2020 and 2010 Population Census data, used with permission from IPUMS NHGIS, University of Minnesota, [www.nhgis.org](https://www.nhgis.org/), to compute the migration dataset (not for redistribution). 360 | - Base map layer provided by [Mapbox](https://www.mapbox.com/). 361 | - Dashboard developed with [Plotly Dash](https://plotly.com/dash/). 362 | - Geospatial point rendering developed with [Datashader](https://datashader.org/). 363 | - GPU toggle accelerated with [RAPIDS cudf and dask_cudf](https://rapids.ai/) and [cupy](https://cupy.chainer.org/), CPU toggle with [pandas](https://pandas.pydata.org/). 364 | - For source code and data workflow, visit our [GitHub](https://github.com/rapidsai/plotly-dash-rapids-census-demo/tree/master).
365 | """ 366 | ), 367 | ], 368 | style={ 369 | "width": "98%", 370 | "marginRight": "0", 371 | "padding": "10px", 372 | }, 373 | className="twelve columns pretty_container", 374 | ), 375 | ], 376 | ) 377 | 378 | 379 | # Clear/reset button callbacks 380 | @app.callback( 381 | Output("map-graph", "selectedData"), 382 | [Input("reset-map", "n_clicks"), Input("clear-all", "n_clicks")], 383 | ) 384 | def clear_map(*args): 385 | return None 386 | 387 | 388 | @app.callback( 389 | Output("race-histogram", "selectedData"), 390 | [Input("clear-race", "n_clicks"), Input("clear-all", "n_clicks")], 391 | ) 392 | def clear_race_hist_selections(*args): 393 | return None 394 | 395 | 396 | @app.callback( 397 | Output("county-histogram-top", "selectedData"), 398 | [Input("clear-county-top", "n_clicks"), Input("clear-all", "n_clicks")], 399 | ) 400 | def clear_county_hist_top_selections(*args): 401 | return None 402 | 403 | 404 | @app.callback( 405 | Output("county-histogram-bottom", "selectedData"), 406 | [Input("clear-county-bottom", "n_clicks"), Input("clear-all", "n_clicks")], 407 | ) 408 | def clear_county_hist_bottom_selections(*args): 409 | return None 410 | 411 | 412 | @app.callback( 413 | [ 414 | Output("indicator-graph", "figure"), 415 | Output("map-graph", "figure"), 416 | Output("map-graph", "config"), 417 | Output("county-histogram-top", "figure"), 418 | Output("county-histogram-top", "config"), 419 | Output("county-histogram-bottom", "figure"), 420 | Output("county-histogram-bottom", "config"), 421 | Output("race-histogram", "figure"), 422 | Output("race-histogram", "config"), 423 | Output("intermediate-state-value", "children"), 424 | ], 425 | [ 426 | Input("map-graph", "relayoutData"), 427 | Input("map-graph", "selectedData"), 428 | Input("race-histogram", "selectedData"), 429 | Input("county-histogram-top", "selectedData"), 430 | Input("county-histogram-bottom", "selectedData"), 431 | Input("view-dropdown", "value"), 432 | Input("state-dropdown", "value"), 433 | 
Input("gpu-toggle", "on"), 434 | ], 435 | [ 436 | State("intermediate-state-value", "children"), 437 | ], 438 | ) 439 | def update_plots( 440 | relayout_data, 441 | selected_map, 442 | selected_race, 443 | selected_county_top, 444 | selected_county_bottom, 445 | view_name, 446 | state_name, 447 | gpu_enabled, 448 | coordinates_backup, 449 | ): 450 | global data_3857, data_center_3857, data_4326, data_center_4326, currently_loaded_state, selected_race_backup, selected_county_top_backup, selected_county_bt_backup 451 | 452 | # condition to avoid reloading on tool update 453 | if ( 454 | ctx.triggered_id == "map-graph" 455 | and relayout_data 456 | and list(relayout_data.keys()) == ["dragmode"] 457 | ): 458 | raise PreventUpdate 459 | 460 | # condition to avoid a bug in plotly where selectedData is reset following a box-select 461 | if not (selected_race is not None and len(selected_race["points"]) == 0): 462 | selected_race_backup = selected_race 463 | elif ctx.triggered_id == "race-histogram": 464 | raise PreventUpdate 465 | 466 | # condition to avoid a bug in plotly where selectedData is reset following a box-select 467 | if not ( 468 | selected_county_top is not None and len(selected_county_top["points"]) == 0 469 | ): 470 | selected_county_top_backup = selected_county_top 471 | elif ctx.triggered_id == "county-histogram-top": 472 | raise PreventUpdate 473 | 474 | # condition to avoid a bug in plotly where selectedData is reset following a box-select 475 | if not ( 476 | selected_county_bottom is not None 477 | and len(selected_county_bottom["points"]) == 0 478 | ): 479 | selected_county_bt_backup = selected_county_bottom 480 | elif ctx.triggered_id == "county-histogram-bottom": 481 | raise PreventUpdate 482 | 483 | df = read_dataset(state_name, gpu_enabled, currently_loaded_state) 484 | 485 | t0 = time.time() 486 | 487 | if coordinates_backup is not None: 488 | coordinates_4326_backup, position_backup = coordinates_backup 489 | else: 490 | 
coordinates_4326_backup, position_backup = None, None 491 | 492 | colorscale_name = "Viridis" 493 | 494 | if data_3857 == [] or state_name != currently_loaded_state: 495 | ( 496 | data_3857, 497 | data_center_3857, 498 | data_4326, 499 | data_center_4326, 500 | ) = set_projection_bounds(df) 501 | 502 | ( 503 | datashader_plot, 504 | race_histogram, 505 | county_top_histogram, 506 | county_bottom_histogram, 507 | n_selected_indicator, 508 | coordinates_4326_backup, 509 | position_backup, 510 | ) = build_updated_figures( 511 | df, 512 | relayout_data, 513 | selected_map, 514 | selected_race_backup, 515 | selected_county_top_backup, 516 | selected_county_bt_backup, 517 | colorscale_name, 518 | data_3857, 519 | data_center_3857, 520 | data_4326, 521 | data_center_4326, 522 | coordinates_4326_backup, 523 | position_backup, 524 | view_name, 525 | ) 526 | 527 | barchart_config = { 528 | "displayModeBar": True, 529 | "modeBarButtonsToRemove": [ 530 | "zoom2d", 531 | "pan2d", 532 | "select2d", 533 | "lasso2d", 534 | "zoomIn2d", 535 | "zoomOut2d", 536 | "resetScale2d", 537 | "hoverClosestCartesian", 538 | "hoverCompareCartesian", 539 | "toggleSpikelines", 540 | ], 541 | } 542 | compute_time = time.time() - t0 543 | print(f"Query time: {compute_time}") 544 | n_selected_indicator["data"].append( 545 | { 546 | "title": {"text": "Query Time"}, 547 | "type": "indicator", 548 | "value": round(compute_time, 4), 549 | "domain": {"x": [0.6, 0.85], "y": [0, 0.5]}, 550 | "number": { 551 | "font": { 552 | "color": text_color, 553 | "size": 50,  # plotly font sizes are numeric; "50px" is ignored 554 | }, 555 | "suffix": " seconds", 556 | }, 557 | } 558 | ) 559 | 560 | datashader_plot["layout"]["dragmode"] = ( 561 | relayout_data["dragmode"] 562 | if (relayout_data and "dragmode" in relayout_data) 563 | else dragmode_backup 564 | ) 565 | # update currently loaded state 566 | currently_loaded_state = state_name 567 | 568 | return ( 569 | n_selected_indicator, 570 | datashader_plot, 571 | { 572 | "displayModeBar": True, 573 |
"modeBarButtonsToRemove": [ 574 | "lasso2d", 575 | "zoomInMapbox", 576 | "zoomOutMapbox", 577 | "toggleHover", 578 | ], 579 | }, 580 | county_top_histogram, 581 | barchart_config, 582 | county_bottom_histogram, 583 | barchart_config, 584 | race_histogram, 585 | barchart_config, 586 | (coordinates_4326_backup, position_backup), 587 | ) 588 | 589 | 590 | def read_dataset(state_name, gpu_enabled, currently_loaded_state): 591 | global c_df 592 | if state_name != currently_loaded_state: 593 | if state_name == "USA": 594 | data_path = f"{DATA_PATH}/total_population_dataset.parquet" 595 | else: 596 | data_path = f"{DATA_PATH_STATE}/{state_name}.parquet" 597 | c_df = load_dataset(data_path, "cudf" if gpu_enabled else "pandas") 598 | return c_df 599 | 600 | 601 | if __name__ == "__main__": 602 | # Launch dashboard 603 | app.run_server( 604 | debug=True, 605 | dev_tools_hot_reload=True, 606 | host="0.0.0.0", 607 | ) 608 | -------------------------------------------------------------------------------- /plotly_demo/assets/dash-logo.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/rapidsai/plotly-dash-rapids-census-demo/e4af2bd3de86263b7f1f947ba9b302002e047a55/plotly_demo/assets/dash-logo.png -------------------------------------------------------------------------------- /plotly_demo/assets/rapids-logo.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/rapidsai/plotly-dash-rapids-census-demo/e4af2bd3de86263b7f1f947ba9b302002e047a55/plotly_demo/assets/rapids-logo.png -------------------------------------------------------------------------------- /plotly_demo/assets/s1.css: -------------------------------------------------------------------------------- 1 | /* Table of contents 2 | –––––––––––––––––––––––––––––––––––––––––––––––––– 3 | - Plotly.js 4 | - Grid 5 | - Base Styles 6 | - Typography 7 | - Links 8 | - Buttons 9 | - Forms 10 | - 
Lists 11 | - Code 12 | - Tables 13 | - Spacing 14 | - Utilities 15 | - Clearing 16 | - Media Queries 17 | */ 18 | 19 | /* Grid 20 | –––––––––––––––––––––––––––––––––––––––––––––––––– */ 21 | .container { 22 | position: relative; 23 | width: 100%; 24 | max-width: 960px; 25 | margin: 0 auto; 26 | padding: 0 20px; 27 | box-sizing: border-box; } 28 | .column, 29 | .columns { 30 | width: 100%; 31 | float: left; 32 | box-sizing: border-box; } 33 | 34 | /* For devices larger than 400px */ 35 | @media (min-width: 400px) { 36 | .container { 37 | width: 85%; 38 | padding: 0; } 39 | } 40 | 41 | /* For devices larger than 550px */ 42 | @media (min-width: 550px) { 43 | .container { 44 | width: 80%; 45 | } 46 | .column, 47 | .columns { 48 | margin-left: 4%; } 49 | .column:first-child, 50 | .columns:first-child { 51 | margin-left: 0; } 52 | 53 | 54 | .one.column, 55 | .one.columns { width: 4.66666666667%; } 56 | .two.columns { width: 13.3333333333%; } 57 | .three.columns { width: 22%; } 58 | .four.columns { width: 30.6666666667%; } 59 | .five.columns { width: 39.3333333333%; } 60 | .six.columns { width: 48%; } 61 | .seven.columns { width: 56.6666666667%; } 62 | .eight.columns { width: 65.3333333333%; } 63 | .nine.columns { width: 74.0%; } 64 | .ten.columns { width: 82.6666666667%; } 65 | .eleven.columns { width: 91.3333333333%; } 66 | .twelve.columns { width: 98%; margin-left: 0; margin-right: 0;} 67 | 68 | .one-third.column { width: 32%; margin-right: 0.5;} 69 | .one-third.column:last-child { margin-right: 0;} 70 | .two-thirds.column { width: 65.3333333333%; } 71 | 72 | .one-half.column { width: 48%; } 73 | 74 | /* Offsets */ 75 | .offset-by-one.column, 76 | .offset-by-one.columns { margin-left: 8.66666666667%; } 77 | .offset-by-two.column, 78 | .offset-by-two.columns { margin-left: 17.3333333333%; } 79 | .offset-by-three.column, 80 | .offset-by-three.columns { margin-left: 26%; } 81 | .offset-by-four.column, 82 | .offset-by-four.columns { margin-left: 34.6666666667%; } 83 | 
.offset-by-five.column, 84 | .offset-by-five.columns { margin-left: 43.3333333333%; } 85 | .offset-by-six.column, 86 | .offset-by-six.columns { margin-left: 52%; } 87 | .offset-by-seven.column, 88 | .offset-by-seven.columns { margin-left: 60.6666666667%; } 89 | .offset-by-eight.column, 90 | .offset-by-eight.columns { margin-left: 69.3333333333%; } 91 | .offset-by-nine.column, 92 | .offset-by-nine.columns { margin-left: 78.0%; } 93 | .offset-by-ten.column, 94 | .offset-by-ten.columns { margin-left: 86.6666666667%; } 95 | .offset-by-eleven.column, 96 | .offset-by-eleven.columns { margin-left: 95.3333333333%; } 97 | 98 | .offset-by-one-third.column, 99 | .offset-by-one-third.columns { margin-left: 34.6666666667%; } 100 | .offset-by-two-thirds.column, 101 | .offset-by-two-thirds.columns { margin-left: 69.3333333333%; } 102 | 103 | .offset-by-one-half.column, 104 | .offset-by-one-half.columns { margin-left: 52%; } 105 | 106 | } 107 | 108 | 109 | /* Base Styles 110 | –––––––––––––––––––––––––––––––––––––––––––––––––– */ 111 | /* NOTE 112 | html is set to 62.5% so that all the REM measurements throughout Skeleton 113 | are based on 10px sizing. 
So basically 1.5rem = 15px :) */ 114 | html { 115 | font-size: 62.5%; } 116 | body { 117 | font-size: 1.5em; /* currently ems cause chrome bug misinterpreting rems on body element */ 118 | line-height: 1.6; 119 | font-weight: 400; 120 | font-family: "Open Sans", "HelveticaNeue", "Helvetica Neue", Helvetica, Arial, sans-serif; 121 | color: #cfd8dc; /* Material blue-grey 100 */ 122 | background-color: #191a1a; /* Material blue-grey 900*/ 123 | margin: 2%; 124 | } 125 | 126 | 127 | /* Typography 128 | –––––––––––––––––––––––––––––––––––––––––––––––––– */ 129 | h1, h2, h3, h4, h5, h6 { 130 | margin-top: 0; 131 | margin-bottom: 0; 132 | font-weight: 300; } 133 | h1 { font-size: 3.2rem; line-height: 1.2; letter-spacing: -.1rem; margin-bottom: 2rem; } 134 | h2 { font-size: 3.0rem; line-height: 1.25; letter-spacing: -.1rem; margin-bottom: 1.8rem; margin-top: 1.8rem;} 135 | h3 { font-size: 2.7rem; line-height: 1.3; letter-spacing: -.1rem; margin-bottom: 1.5rem; margin-top:1.5rem;} 136 | h4 { font-size: 2.4rem; line-height: 1.35; letter-spacing: -.08rem; margin-bottom: 1.2rem; margin-top: 1.2rem;} 137 | h5 { font-size: 2.0rem; line-height: 1.5; letter-spacing: -.05rem; margin-bottom: 0.6rem; margin-top: 0.6rem;} 138 | h6 { font-size: 2.0rem; line-height: 1.6; letter-spacing: 0; margin-bottom: 0.75rem; margin-top: 0.75rem;} 139 | 140 | p { 141 | margin-top: 0; } 142 | 143 | 144 | /* Blockquotes 145 | –––––––––––––––––––––––––––––––––––––––––––––––––– */ 146 | blockquote { 147 | border-left: 4px lightgrey solid; 148 | padding-left: 1rem; 149 | margin-top: 2rem; 150 | margin-bottom: 2rem; 151 | margin-left: 0rem; 152 | } 153 | 154 | 155 | /* Links 156 | –––––––––––––––––––––––––––––––––––––––––––––––––– */ 157 | a { 158 | color: #1565c0; /* Material Blue 800 */ 159 | text-decoration: underline; 160 | cursor: pointer;} 161 | a:hover { 162 | color: #0d47a1; /* Material Blue 900 */ 163 | } 164 | 165 | 166 | /* Buttons 167 | –––––––––––––––––––––––––––––––––––––––––––––––––– */ 168 
| .button, 169 | button, 170 | input[type="submit"], 171 | input[type="reset"], 172 | input[type="button"] { 173 | display: inline-block; 174 | height: 38px; 175 | padding: 0 30px; 176 | color: #90a4ae; /* Material blue-gray 300*/ 177 | text-align: center; 178 | font-size: 11px; 179 | font-weight: 600; 180 | line-height: 38px; 181 | letter-spacing: .1rem; 182 | text-transform: uppercase; 183 | text-decoration: none; 184 | white-space: nowrap; 185 | background-color: transparent; 186 | border-radius: 4px; 187 | border: 1px solid #90a4ae; /* Material blue-gray 300*/ 188 | cursor: pointer; 189 | box-sizing: border-box; } 190 | .button:hover, 191 | button:hover, 192 | input[type="submit"]:hover, 193 | input[type="reset"]:hover, 194 | input[type="button"]:hover, 195 | .button:focus, 196 | button:focus, 197 | input[type="submit"]:focus, 198 | input[type="reset"]:focus, 199 | input[type="button"]:focus { 200 | color: #cfd8dc; 201 | border-color: #cfd8dc; 202 | outline: 0; } 203 | .button.button-primary, 204 | button.button-primary, 205 | input[type="submit"].button-primary, 206 | input[type="reset"].button-primary, 207 | input[type="button"].button-primary { 208 | color: #FFF; 209 | background-color: #33C3F0; 210 | border-color: #33C3F0; } 211 | .button.button-primary:hover, 212 | button.button-primary:hover, 213 | input[type="submit"].button-primary:hover, 214 | input[type="reset"].button-primary:hover, 215 | input[type="button"].button-primary:hover, 216 | .button.button-primary:focus, 217 | button.button-primary:focus, 218 | input[type="submit"].button-primary:focus, 219 | input[type="reset"].button-primary:focus, 220 | input[type="button"].button-primary:focus { 221 | color: #FFF; 222 | background-color: #1EAEDB; 223 | border-color: #1EAEDB; } 224 | 225 | 226 | /* Forms 227 | –––––––––––––––––––––––––––––––––––––––––––––––––– */ 228 | input[type="email"], 229 | input[type="number"], 230 | input[type="search"], 231 | input[type="text"], 232 | input[type="tel"], 233 | 
input[type="url"], 234 | input[type="password"], 235 | textarea, 236 | select { 237 | height: 38px; 238 | padding: 6px 10px; /* The 6px vertically centers text on FF, ignored by Webkit */ 239 | background-color: #fff; 240 | border: 1px solid #D1D1D1; 241 | border-radius: 4px; 242 | box-shadow: none; 243 | box-sizing: border-box; 244 | font-family: inherit; 245 | font-size: inherit; /*https://stackoverflow.com/questions/6080413/why-doesnt-input-inherit-the-font-from-body*/} 246 | /* Removes awkward default styles on some inputs for iOS */ 247 | input[type="email"], 248 | input[type="number"], 249 | input[type="search"], 250 | input[type="text"], 251 | input[type="tel"], 252 | input[type="url"], 253 | input[type="password"], 254 | textarea { 255 | -webkit-appearance: none; 256 | -moz-appearance: none; 257 | appearance: none; } 258 | textarea { 259 | min-height: 65px; 260 | padding-top: 6px; 261 | padding-bottom: 6px; } 262 | input[type="email"]:focus, 263 | input[type="number"]:focus, 264 | input[type="search"]:focus, 265 | input[type="text"]:focus, 266 | input[type="tel"]:focus, 267 | input[type="url"]:focus, 268 | input[type="password"]:focus, 269 | textarea:focus, 270 | select:focus { 271 | border: 1px solid #33C3F0; 272 | outline: 0; } 273 | label, 274 | legend { 275 | display: block; 276 | margin-bottom: 0px; } 277 | fieldset { 278 | padding: 0; 279 | border-width: 0; } 280 | input[type="checkbox"], 281 | input[type="radio"] { 282 | display: inline; } 283 | label > .label-body { 284 | display: inline-block; 285 | margin-left: .5rem; 286 | font-weight: normal; } 287 | 288 | 289 | /* Lists 290 | –––––––––––––––––––––––––––––––––––––––––––––––––– */ 291 | ul { 292 | list-style: circle inside; } 293 | ol { 294 | list-style: decimal inside; } 295 | ol, ul { 296 | padding-left: 0; 297 | margin-top: 0; } 298 | ul ul, 299 | ul ol, 300 | ol ol, 301 | ol ul { 302 | margin: 1.5rem 0 1.5rem 3rem; 303 | font-size: 90%; } 304 | li { 305 | margin-bottom: 0; 306 | } 307 | 308 | 
/* Tables 309 | –––––––––––––––––––––––––––––––––––––––––––––––––– */ 310 | table { 311 | border-collapse: collapse; 312 | } 313 | th:not(.CalendarDay), 314 | td:not(.CalendarDay) { 315 | padding: 4px 10px; 316 | text-align: left; 317 | /*border-bottom: 1px solid #E1E1E1;*/ 318 | } 319 | th:first-child:not(.CalendarDay), 320 | td:first-child:not(.CalendarDay) { 321 | padding-left: 0; } 322 | /*th:last-child:not(.CalendarDay),*/ 323 | /*td:last-child:not(.CalendarDay) {*/ 324 | /* padding-right: 0; }*/ 325 | 326 | 327 | /* Spacing 328 | –––––––––––––––––––––––––––––––––––––––––––––––––– */ 329 | button, 330 | .button { 331 | margin-bottom: 0rem; } 332 | input, 333 | textarea, 334 | select, 335 | fieldset { 336 | margin-bottom: 0rem; } 337 | pre, 338 | dl, 339 | figure, 340 | table, 341 | form { 342 | margin-bottom: 0rem; } 343 | p, 344 | ul, 345 | ol { 346 | margin-bottom: 0.75rem; } 347 | 348 | /* Utilities 349 | –––––––––––––––––––––––––––––––––––––––––––––––––– */ 350 | .u-full-width { 351 | width: 100%; 352 | box-sizing: border-box; } 353 | .u-max-full-width { 354 | max-width: 100%; 355 | box-sizing: border-box; } 356 | .u-pull-right { 357 | float: right; } 358 | .u-pull-left { 359 | float: left; } 360 | 361 | 362 | /* Misc 363 | –––––––––––––––––––––––––––––––––––––––––––––––––– */ 364 | hr { 365 | margin-top: 3rem; 366 | margin-bottom: 3.5rem; 367 | border-width: 0; 368 | border-top: 1px solid #E1E1E1; } 369 | 370 | 371 | /* Clearing 372 | –––––––––––––––––––––––––––––––––––––––––––––––––– */ 373 | 374 | /* Self Clearing Goodness */ 375 | .container:after, 376 | .row:after, 377 | .u-cf { 378 | content: ""; 379 | display: table; 380 | clear: both; } 381 | 382 | 383 | /* Media Queries 384 | –––––––––––––––––––––––––––––––––––––––––––––––––– */ 385 | /* 386 | Note: The best way to structure the use of media queries is to create the queries 387 | near the relevant code.
For example, if you wanted to change the styles for buttons 388 | on small devices, paste the mobile query code up in the buttons section and style it 389 | there. 390 | */ 391 | 392 | 393 | /* Larger than mobile */ 394 | @media (min-width: 400px) {} 395 | 396 | /* Larger than phablet (also point when grid becomes active) */ 397 | @media (min-width: 550px) {} 398 | 399 | /* Larger than tablet */ 400 | @media (min-width: 750px) {} 401 | 402 | /* Larger than desktop */ 403 | @media (min-width: 1000px) {} 404 | 405 | /* Larger than Desktop HD */ 406 | @media (min-width: 1200px) {} 407 | 408 | /* Pretty container 409 | –––––––––––––––––––––––––––––––––––––––––––––––––– */ 410 | .pretty_container { 411 | border-radius: 10px; 412 | background-color: #000000; /* Mapbox light map land color */ 413 | margin: 0.5%; 414 | margin-left: 1.5%; 415 | padding: 1%; 416 | position: relative; 417 | box-shadow: 1px 1px 1px black; 418 | } 419 | 420 | .container_title { 421 | margin-top: 0; 422 | margin-bottom: 0.2em; 423 | font-size: 2.6rem; 424 | line-height: 2.6rem; 425 | } 426 | 427 | /* Special purpose buttons 428 | –––––––––––––––––––––––––––––––––––––––––––––––––– */ 429 | .reset-button { 430 | /* width: 100%; */ 431 | /* margin-top: 10px; */ 432 | margin-top: -5px; 433 | height: 30px; 434 | line-height: 30px; 435 | float: right; 436 | } 437 | 438 | .info-icon { 439 | float: right; 440 | cursor: pointer; 441 | } 442 | 443 | 444 | /* Modal info layer 445 | –––––––––––––––––––––––––––––––––––––––––––––––––– */ 446 | .modal { 447 | position: fixed; 448 | z-index: 1002; /* Sit on top, including modebar which has z=1001 */ 449 | left: 0; 450 | top: 0; 451 | width: 100%; /* Full width */ 452 | height: 100%; /* Full height */ 453 | background-color: rgba(0, 0, 0, 0.6); /* Black w/ opacity */ 454 | } 455 | 456 | .modal-content { 457 | z-index: 1004; /* Sit on top, including modebar which has z=1001 */ 458 | position: fixed; 459 | left: 0; 460 | width: 60%; 461 | background-color: 
#3949ab; /* Material indigo 600 */ 462 | color: white; 463 | border-radius: 5px; 464 | margin-left: 20%; 465 | margin-bottom: 2%; 466 | margin-top: 2%; 467 | } 468 | 469 | .modal-content > div { 470 | text-align: left; 471 | margin: 15px; 472 | } 473 | 474 | .modal-content.bottom { 475 | bottom: 0; 476 | } 477 | 478 | .modal-content.top { 479 | top: 0; 480 | } 481 | 482 | /* Config pane 483 | –––––––––––––––––––––––––––––––––––––––––––––––––– */ 484 | .config-label { 485 | text-align: right !important; 486 | } 487 | 488 | .VirtualizedSelectOption { 489 | color: #191a1a; 490 | } -------------------------------------------------------------------------------- /plotly_demo/dask_app.py: -------------------------------------------------------------------------------- 1 | import pandas as pd 2 | import cudf 3 | from dash import dcc, html 4 | import numpy as np 5 | import pandas as pd 6 | from dash.dependencies import Input, Output, State 7 | from dash import dcc 8 | import dash_bootstrap_components as dbc 9 | import time 10 | import dash_daq as daq 11 | import dash 12 | from dask import delayed 13 | from distributed import Client 14 | from dask_cuda import LocalCUDACluster 15 | from utils.utils import * 16 | import argparse 17 | 18 | 19 | # ### Dashboards start here 20 | text_color = "#cfd8dc" # Material blue-grey 100 21 | 22 | ( 23 | data_center_3857, 24 | data_3857, 25 | data_4326, 26 | data_center_4326, 27 | selected_map_backup, 28 | selected_race_backup, 29 | selected_county_top_backup, 30 | selected_county_bt_backup, 31 | view_name_backup, 32 | gpu_enabled_backup, 33 | dragmode_backup, 34 | ) = ([], [], [], [], None, None, None, None, None, None, "pan") 35 | 36 | 37 | app = dash.Dash(__name__, external_stylesheets=[dbc.themes.BOOTSTRAP]) 38 | app.layout = html.Div( 39 | children=[ 40 | ################# Title Bar ############## 41 | html.Div( 42 | [ 43 | html.H1( 44 | children=[ 45 | "Census 2020 Net Migration Visualization", 46 | html.A( 47 | html.Img( 48 | 
src="assets/rapids-logo.png", 49 | style={ 50 | "float": "right", 51 | "height": "45px", 52 | "margin-right": "1%", 53 | "margin-top": "-7px", 54 | }, 55 | ), 56 | href="https://rapids.ai/", 57 | ), 58 | html.A( 59 | html.Img( 60 | src="assets/dash-logo.png", 61 | style={"float": "right", "height": "30px"}, 62 | ), 63 | href="https://dash.plot.ly/", 64 | ), 65 | ], 66 | style={ 67 | "text-align": "left", 68 | "heights": "30px", 69 | "margin-left": "20px", 70 | }, 71 | ), 72 | ] 73 | ), 74 | ###################### Options Bar ###################### 75 | html.Div( 76 | children=[ 77 | html.Div( 78 | children=[ 79 | html.Table( 80 | [ 81 | html.Tr( 82 | [ 83 | html.Td( 84 | html.Div("CPU"), 85 | style={ 86 | "font-size": "20px", 87 | "padding-left": "1.3rem", 88 | }, # className="config-label" 89 | ), 90 | html.Td( 91 | html.Div( 92 | [ 93 | daq.DarkThemeProvider( 94 | daq.BooleanSwitch( 95 | on=True, # Turn on CPU/GPU 96 | color="#00cc96", 97 | id="gpu-toggle", 98 | ) 99 | ), 100 | dbc.Tooltip( 101 | "Caution: Using CPU compute for more than 50 million points is not recommended.", 102 | target="gpu-toggle", 103 | placement="bottom", 104 | autohide=True, 105 | style={ 106 | "textAlign": "left", 107 | "font-size": "15px", 108 | "color": "white", 109 | "width": "350px", 110 | "padding": "15px", 111 | "border-radius": "5px", 112 | "background-color": "#2a2a2e", 113 | }, 114 | ), 115 | ] 116 | ) 117 | ), 118 | html.Td( 119 | html.Div("GPU + RAPIDS"), 120 | style={ 121 | "font-size": "20px" 122 | }, # , className="config-label" 123 | ), 124 | ####### Indicator graph ###### 125 | html.Td( 126 | [ 127 | dcc.Loading( 128 | dcc.Graph( 129 | id="indicator-graph", 130 | figure=blank_fig(50), 131 | config={ 132 | "displayModeBar": False 133 | }, 134 | style={"width": "95%"}, 135 | ), 136 | color="#b0bec5", 137 | # style={'height': f'{50}px', 'width':'10px'} 138 | ), # style={'width': '50%'}, 139 | ] 140 | ), 141 | ###### VIEWS ARE HERE ########### 142 | html.Td( 143 | 
html.Div("Data-Selection"), 144 | style={"font-size": "20px"}, 145 | ), # className="config-label" 146 | html.Td( 147 | dcc.Dropdown( 148 | id="view-dropdown", 149 | options=[ 150 | { 151 | "label": "Total Population", 152 | "value": "total", 153 | }, 154 | { 155 | "label": "Migrating In", 156 | "value": "in", 157 | }, 158 | { 159 | "label": "Stationary", 160 | "value": "stationary", 161 | }, 162 | { 163 | "label": "Migrating Out", 164 | "value": "out", 165 | }, 166 | { 167 | "label": "Net Migration", 168 | "value": "net", 169 | }, 170 | { 171 | "label": "Population with Race", 172 | "value": "race", 173 | }, 174 | ], 175 | value="in", 176 | searchable=False, 177 | clearable=False, 178 | ), 179 | style={ 180 | "width": "10%", 181 | "height": "15px", 182 | }, 183 | ), 184 | html.Td( 185 | html.Div( 186 | children=[ 187 | html.Button( 188 | "Clear All Selections", 189 | id="clear-all", 190 | className="reset-button", 191 | ), 192 | ] 193 | ), 194 | style={ 195 | "width": "10%", 196 | "height": "15px", 197 | }, 198 | ), 199 | ] 200 | ), 201 | ], 202 | style={"width": "100%", "margin-top": "0px"}, 203 | ), 204 | # Hidden div inside the app that stores the intermediate value 205 | html.Div( 206 | id="datapoints-state-value", 207 | style={"display": "none"}, 208 | ), 209 | ], 210 | className="columns pretty_container", 211 | ), # className='columns pretty_container', id="config-div"), 212 | ] 213 | ), 214 | ########## End of options bar ####################################### 215 | html.Hr(id="line1", style={"border": "1px solid grey", "margin": "0px"}), 216 | # html.Div( html.Hr(id='line',style={'border': '1px solid red'}) ), 217 | ##################### Map starts ################################### 218 | html.Div( 219 | children=[ 220 | html.Button( 221 | "Clear Selection", id="reset-map", className="reset-button" 222 | ), 223 | html.H4( 224 | [ 225 | "Individual Distribution", 226 | ], 227 | className="container_title", 228 | ), 229 | dcc.Graph( 230 | id="map-graph", 
231 | config={"displayModeBar": False}, 232 | figure=blank_fig(440), 233 | ), 234 | # Hidden div inside the app that stores the intermediate value 235 | html.Div(id="intermediate-state-value", style={"display": "none"}), 236 | ], 237 | className="columns pretty_container", 238 | style={"width": "100%", "margin-right": "0", "height": "66%"}, 239 | id="map-div", 240 | ), 241 | html.Hr(id="line2", style={"border": "1px solid grey", "margin": "0px"}), 242 | ################# Bars start ######################### 243 | # Race start 244 | html.Div( 245 | children=[ 246 | html.Button( 247 | "Clear Selection", 248 | id="clear-race", 249 | className="reset-button", 250 | ), 251 | html.H4( 252 | [ 253 | "Race Distribution", 254 | ], 255 | className="container_title", 256 | ), 257 | dcc.Graph( 258 | id="race-histogram", 259 | config={"displayModeBar": False}, 260 | figure=blank_fig(row_heights[2]), 261 | animate=False, 262 | ), 263 | ], 264 | className="columns pretty_container", 265 | id="race-div", 266 | style={"width": "33.33%", "height": "20%"}, 267 | ), 268 | # County top starts 269 | html.Div( 270 | children=[ 271 | html.Button( 272 | "Clear Selection", 273 | id="clear-county-top", 274 | className="reset-button", 275 | ), 276 | html.H4( 277 | [ 278 | "County-wise Top 15", 279 | ], 280 | className="container_title", 281 | ), 282 | dcc.Graph( 283 | id="county-histogram-top", 284 | config={"displayModeBar": False}, 285 | figure=blank_fig(row_heights[2]), 286 | animate=False, 287 | ), 288 | ], 289 | className="columns pretty_container", 290 | id="county-div-top", 291 | style={"width": "33.33%", "height": "20%"}, 292 | ), 293 | # County bottom starts 294 | html.Div( 295 | children=[ 296 | html.Button( 297 | "Clear Selection", 298 | id="clear-county-bottom", 299 | className="reset-button", 300 | ), 301 | html.H4( 302 | [ 303 | "County-wise Bottom 15", 304 | ], 305 | className="container_title", 306 | ), 307 | dcc.Graph( 308 | id="county-histogram-bottom", 309 | 
config={"displayModeBar": False}, 310 | figure=blank_fig(row_heights[2]), 311 | animate=False, 312 | ), 313 | ], 314 | className="columns pretty_container", 315 | id="county-div-bottom", 316 | style={"width": "33.33%", "height": "20%"}, 317 | ), 318 | ############## End of Bars ##################### 319 | html.Hr(id="line3", style={"border": "1px solid grey", "margin": "0px"}), 320 | html.Div( 321 | [ 322 | html.H4( 323 | "Acknowledgements and Data Sources", 324 | style={"margin-top": "0"}, 325 | ), 326 | dcc.Markdown( 327 | """\ 328 | - 2020 Population Census and 2010 Population Census to compute Migration Dataset, used with permission from IPUMS NHGIS, University of Minnesota, [www.nhgis.org](https://www.nhgis.org/) ( not for redistribution ). 329 | - Base map layer provided by [Mapbox](https://www.mapbox.com/). 330 | - Dashboard developed with [Plotly Dash](https://plotly.com/dash/). 331 | - Geospatial point rendering developed with [Datashader](https://datashader.org/). 332 | - GPU toggle accelerated with [RAPIDS cudf and dask_cudf](https://rapids.ai/) and [cupy](https://cupy.chainer.org/), CPU toggle with [pandas](https://pandas.pydata.org/). 333 | - For source code and data workflow, visit our [GitHub](https://github.com/rapidsai/plotly-dash-rapids-census-demo/tree/master). 
334 | """ 335 | ), 336 | ], 337 | style={"width": "100%"}, 338 | className="columns pretty_container", 339 | ), 340 | ] 341 | ) 342 | 343 | # Clear/reset button callbacks 344 | 345 | 346 | @app.callback( 347 | Output("map-graph", "selectedData"), 348 | [Input("reset-map", "n_clicks"), Input("clear-all", "n_clicks")], 349 | ) 350 | def clear_map(*args): 351 | return None 352 | 353 | 354 | @app.callback( 355 | Output("race-histogram", "selectedData"), 356 | [Input("clear-race", "n_clicks"), Input("clear-all", "n_clicks")], 357 | ) 358 | def clear_race_hist_selections(*args): 359 | return None 360 | 361 | 362 | @app.callback( 363 | Output("county-histogram-top", "selectedData"), 364 | [Input("clear-county-top", "n_clicks"), Input("clear-all", "n_clicks")], 365 | ) 366 | def clear_county_hist_top_selections(*args): 367 | return None 368 | 369 | 370 | @app.callback( 371 | Output("county-histogram-bottom", "selectedData"), 372 | [Input("clear-county-bottom", "n_clicks"), Input("clear-all", "n_clicks")], 373 | ) 374 | def clear_county_hist_bottom_selections(*args): 375 | return None 376 | 377 | 378 | # Plot-update callback registration 379 | 380 | 381 | def register_update_plots_callback(client): 382 | """ 383 | Register Dash callback that updates all plots in response to selection events 384 | Args: 385 | client: Dask distributed Client connected to the cluster holding the published "c_df_d" and "pd_df_d" datasets 386 | """ 387 | 388 | @app.callback( 389 | [ 390 | Output("indicator-graph", "figure"), 391 | Output("map-graph", "figure"), 392 | Output("map-graph", "config"), 393 | Output("map-graph", "relayoutData"), 394 | Output("county-histogram-top", "figure"), 395 | Output("county-histogram-top", "config"), 396 | Output("county-histogram-bottom", "figure"), 397 | Output("county-histogram-bottom", "config"), 398 | Output("race-histogram", "figure"), 399 | Output("race-histogram", "config"), 400 | Output("intermediate-state-value", "children"), 401 | ], 402 | [ 403 | Input("map-graph", "relayoutData"), 404 | Input("map-graph", "selectedData"), 405 
| Input("race-histogram", "selectedData"), 406 | Input("county-histogram-top", "selectedData"), 407 | Input("county-histogram-bottom", "selectedData"), 408 | Input("view-dropdown", "value"), 409 | Input("gpu-toggle", "on"), 410 | ], 411 | [ 412 | State("intermediate-state-value", "children"), 413 | State("indicator-graph", "figure"), 414 | State("map-graph", "figure"), 415 | State("map-graph", "config"), 416 | State("map-graph", "relayoutData"), 417 | State("county-histogram-top", "figure"), 418 | State("county-histogram-top", "config"), 419 | State("county-histogram-bottom", "figure"), 420 | State("county-histogram-bottom", "config"), 421 | State("race-histogram", "figure"), 422 | State("race-histogram", "config"), 423 | State("intermediate-state-value", "children"), 424 | ], 425 | ) 426 | def update_plots( 427 | relayout_data, 428 | selected_map, 429 | selected_race, 430 | selected_county_top, 431 | selected_county_bottom, 432 | view_name, 433 | gpu_enabled, 434 | coordinates_backup, 435 | *backup_args, 436 | ): 437 | global data_3857, data_center_3857, data_4326, data_center_4326, selected_map_backup, selected_race_backup, selected_county_top_backup, selected_county_bt_backup, view_name_backup, gpu_enabled_backup, dragmode_backup 438 | 439 | # condition to avoid reloading on tool update 440 | if ( 441 | type(relayout_data) == dict 442 | and list(relayout_data.keys()) == ["dragmode"] 443 | and selected_map == selected_map_backup 444 | and selected_race_backup == selected_race 445 | and selected_county_top_backup == selected_county_top 446 | and selected_county_bt_backup == selected_county_bottom 447 | and view_name_backup == view_name 448 | and gpu_enabled_backup == gpu_enabled 449 | ): 450 | backup_args[1]["layout"]["dragmode"] = relayout_data["dragmode"] 451 | dragmode_backup = relayout_data["dragmode"] 452 | return backup_args 453 | 454 | selected_map_backup = selected_map 455 | selected_race_backup = selected_race 456 | selected_county_top_backup = 
selected_county_top 457 | selected_county_bt_backup = selected_county_bottom 458 | view_name_backup = view_name 459 | gpu_enabled_backup = gpu_enabled 460 | 461 | t0 = time.time() 462 | 463 | if coordinates_backup is not None: 464 | coordinates_4326_backup, position_backup = coordinates_backup 465 | else: 466 | coordinates_4326_backup, position_backup = None, None 467 | 468 | # Get delayed dataset from client 469 | if gpu_enabled: 470 | df = client.get_dataset("c_df_d") 471 | else: 472 | df = client.get_dataset("pd_df_d") 473 | 474 | colorscale_name = "Viridis" 475 | 476 | if data_3857 == []: 477 | projections = delayed(set_projection_bounds)(df) 478 | ( 479 | data_3857, 480 | data_center_3857, 481 | data_4326, 482 | data_center_4326, 483 | ) = projections.compute() 484 | 485 | figures = build_updated_figures_dask( 486 | df, 487 | relayout_data, 488 | selected_map, 489 | selected_race, 490 | selected_county_top, 491 | selected_county_bottom, 492 | colorscale_name, 493 | data_3857, 494 | data_center_3857, 495 | data_4326, 496 | data_center_4326, 497 | coordinates_4326_backup, 498 | position_backup, 499 | view_name, 500 | ) 501 | 502 | ( 503 | datashader_plot, 504 | race_histogram, 505 | county_top_histogram, 506 | county_bottom_histogram, 507 | n_selected_indicator, 508 | coordinates_4326_backup, 509 | position_backup, 510 | ) = figures 511 | 512 | barchart_config = { 513 | "displayModeBar": True, 514 | "modeBarButtonsToRemove": [ 515 | "zoom2d", 516 | "pan2d", 517 | "select2d", 518 | "lasso2d", 519 | "zoomIn2d", 520 | "zoomOut2d", 521 | "resetScale2d", 522 | "hoverClosestCartesian", 523 | "hoverCompareCartesian", 524 | "toggleSpikelines", 525 | ], 526 | } 527 | compute_time = time.time() - t0 528 | print(f"Query time: {compute_time}") 529 | n_selected_indicator["data"].append( 530 | { 531 | "title": {"text": "Query Time"}, 532 | "type": "indicator", 533 | "value": round(compute_time, 4), 534 | "domain": {"x": [0.53, 0.61], "y": [0, 0.5]}, 535 | "number": { 536 | 
"font": { 537 | "color": text_color, 538 | "size": "50px", 539 | }, 540 | "suffix": " seconds", 541 | }, 542 | } 543 | ) 544 | datashader_plot["layout"]["dragmode"] = ( 545 | relayout_data["dragmode"] 546 | if (relayout_data and "dragmode" in relayout_data) 547 | else dragmode_backup 548 | ) 549 | 550 | return ( 551 | n_selected_indicator, 552 | datashader_plot, 553 | { 554 | "displayModeBar": True, 555 | "modeBarButtonsToRemove": [ 556 | "lasso2d", 557 | "zoomInMapbox", 558 | "zoomOutMapbox", 559 | "toggleHover", 560 | ], 561 | }, 562 | relayout_data, 563 | race_histogram, 564 | barchart_config, 565 | county_top_histogram, 566 | barchart_config, 567 | county_bottom_histogram, 568 | barchart_config, 569 | (coordinates_4326_backup, position_backup), 570 | ) 571 | 572 | 573 | def publish_dataset_to_cluster(cuda_visible_devices): 574 | 575 | census_data_url = "https://data.rapids.ai/viz-data/total_population_dataset.parquet" 576 | data_path = "../data/total_population_dataset.parquet" 577 | check_dataset(census_data_url, data_path) 578 | 579 | # Note: The creation of a Dask LocalCluster must happen inside the `__main__` block, 580 | cluster = ( 581 | LocalCUDACluster(CUDA_VISIBLE_DEVICES=cuda_visible_devices) 582 | if cuda_visible_devices 583 | else LocalCUDACluster() 584 | ) 585 | client = Client(cluster) 586 | print(f"Dask status: {cluster.dashboard_link}") 587 | 588 | # Load dataset and persist dataset on cluster 589 | def load_and_publish_dataset(): 590 | # dask_cudf DataFrame 591 | c_df_d = load_dataset(data_path, "dask_cudf").persist() 592 | # pandas DataFrame 593 | pd_df_d = load_dataset(data_path, "dask").persist() 594 | 595 | # Unpublish datasets if present 596 | for ds_name in ["pd_df_d", "c_df_d"]: 597 | if ds_name in client.datasets: 598 | client.unpublish_dataset(ds_name) 599 | 600 | # Publish datasets to the cluster 601 | client.publish_dataset(pd_df_d=pd_df_d) 602 | client.publish_dataset(c_df_d=c_df_d) 603 | 604 | load_and_publish_dataset() 605 | 606 | 
# Precompute field bounds 607 | c_df_d = client.get_dataset("c_df_d") 608 | 609 | # Register top-level callback that updates plots 610 | register_update_plots_callback(client) 611 | 612 | 613 | if __name__ == "__main__": 614 | parser = argparse.ArgumentParser() 615 | parser.add_argument( 616 | "--cuda_visible_devices", 617 | help="supply the value of CUDA_VISIBLE_DEVICES as a comma separated string (e.g: --cuda_visible_devices=0,1), if None, all the available GPUs are used", 618 | default=None, 619 | ) 620 | 621 | args = parser.parse_args() 622 | # development entry point 623 | publish_dataset_to_cluster(args.cuda_visible_devices) 624 | 625 | # Launch dashboard 626 | app.run_server(debug=False, dev_tools_silence_routes_logging=True, host="0.0.0.0") 627 | -------------------------------------------------------------------------------- /plotly_demo/utils/__init__.py: -------------------------------------------------------------------------------- 1 | from .utils import * 2 | -------------------------------------------------------------------------------- /plotly_demo/utils/utils.py: -------------------------------------------------------------------------------- 1 | from bokeh import palettes 2 | from pyproj import Transformer 3 | import cudf 4 | import cupy as cp 5 | import dask.dataframe as dd 6 | import datashader as ds 7 | import datashader.transfer_functions as tf 8 | import io 9 | import numpy as np 10 | import os 11 | import pandas as pd 12 | import pickle 13 | import requests 14 | 15 | try: 16 | import dask_cudf 17 | except ImportError: 18 | dask_cudf = None 19 | 20 | # Colors 21 | bgcolor = "#000000" # mapbox dark map land color 22 | text_color = "#cfd8dc" # Material blue-grey 100 23 | mapbox_land_color = "#000000" 24 | c = 9200 25 | # Colors for categories 26 | colors = {} 27 | colors["race"] = { 28 | 1: "aqua", 29 | 2: "lime", 30 | 3: "yellow", 31 | 4: "orange", 32 | 5: "blue", 33 | 6: "fuchsia", 34 | 7: "saddlebrown", 35 | } 36 | race2color = { 37 | 
"White": "aqua", 38 | "African American": "lime", 39 | "American Indian": "yellow", 40 | "Asian alone": "orange", 41 | "Native Hawaiian": "blue", 42 | "Other Race alone": "fuchsia", 43 | "Two or More": "saddlebrown", 44 | } 45 | colors["net"] = { 46 | -1: palettes.RdPu9[2], 47 | 0: palettes.Greens9[4], 48 | 1: palettes.PuBu9[2], 49 | } # '#32CD32' 50 | 51 | 52 | # Figure template 53 | row_heights = [150, 440, 300, 75] 54 | template = { 55 | "layout": { 56 | "paper_bgcolor": bgcolor, 57 | "plot_bgcolor": bgcolor, 58 | "font": {"color": text_color}, 59 | "margin": {"r": 0, "t": 0, "l": 0, "b": 0}, 60 | "bargap": 0.05, 61 | "xaxis": {"showgrid": False, "automargin": True}, 62 | "yaxis": {"showgrid": True, "automargin": True}, 63 | # 'gridwidth': 0.5, 'gridcolor': mapbox_land_color}, 64 | } 65 | } 66 | 67 | url = "https://raw.githubusercontent.com/rapidsai/plotly-dash-rapids-census-demo/main/id2county.pkl" 68 | id2county = pickle.load(io.BytesIO(requests.get(url).content)) 69 | county2id = {v: k for k, v in id2county.items()} 70 | id2race = { 71 | 0: "All", 72 | 1: "White", 73 | 2: "African American", 74 | 3: "American Indian", 75 | 4: "Asian alone", 76 | 5: "Native Hawaiian", 77 | 6: "Other Race alone", 78 | 7: "Two or More", 79 | } 80 | race2id = {v: k for k, v in id2race.items()} 81 | 82 | mappings = {} 83 | mappings_hover = {} 84 | mapbox_style = "carto-darkmatter" 85 | 86 | 87 | def set_projection_bounds(df_d): 88 | transformer_4326_to_3857 = Transformer.from_crs("epsg:4326", "epsg:3857") 89 | 90 | def epsg_4326_to_3857(coords): 91 | return [transformer_4326_to_3857.transform(*reversed(row)) for row in coords] 92 | 93 | transformer_3857_to_4326 = Transformer.from_crs("epsg:3857", "epsg:4326") 94 | 95 | def epsg_3857_to_4326(coords): 96 | return [ 97 | list(reversed(transformer_3857_to_4326.transform(*row))) for row in coords 98 | ] 99 | 100 | data_3857 = ( 101 | [df_d.easting.min(), df_d.northing.min()], 102 | [df_d.easting.max(), df_d.northing.max()], 103 | ) 104 
| data_center_3857 = [ 105 | [ 106 | (data_3857[0][0] + data_3857[1][0]) / 2.0, 107 | (data_3857[0][1] + data_3857[1][1]) / 2.0, 108 | ] 109 | ] 110 | 111 | data_4326 = epsg_3857_to_4326(data_3857) 112 | data_center_4326 = epsg_3857_to_4326(data_center_3857) 113 | 114 | return data_3857, data_center_3857, data_4326, data_center_4326 115 | 116 | 117 | # Build Dash app and initial layout 118 | def blank_fig(height): 119 | """ 120 | Build blank figure with the requested height 121 | Args: 122 | height: height of blank figure in pixels 123 | Returns: 124 | Figure dict 125 | """ 126 | return { 127 | "data": [], 128 | "layout": { 129 | "height": height, 130 | "template": template, 131 | "xaxis": {"visible": False}, 132 | "yaxis": {"visible": False}, 133 | }, 134 | } 135 | 136 | 137 | # Plot functions 138 | def build_colorscale(colorscale_name, transform): 139 | """ 140 | Build plotly colorscale 141 | Args: 142 | colorscale_name: Name of a palette from the bokeh.palettes module (e.g. "Viridis10", "PuBu9") 143 | transform: Transform to apply to the color scale. 
One of 'linear', 'sqrt', 'cbrt', 144 | or 'log' 145 | Returns: 146 | Plotly color scale list 147 | """ 148 | global colors, mappings 149 | 150 | colors_temp = getattr(palettes, colorscale_name) 151 | if transform == "linear": 152 | scale_values = np.linspace(0, 1, len(colors_temp)) 153 | elif transform == "sqrt": 154 | scale_values = np.linspace(0, 1, len(colors_temp)) ** 2 155 | elif transform == "cbrt": 156 | scale_values = np.linspace(0, 1, len(colors_temp)) ** 3 157 | elif transform == "log": 158 | scale_values = (10 ** np.linspace(0, 1, len(colors_temp)) - 1) / 9 159 | else: 160 | raise ValueError("Unexpected colorscale transform") 161 | return [(v, clr) for v, clr in zip(scale_values, colors_temp)] 162 | 163 | 164 | def get_min_max(df, col): 165 | if dask_cudf and isinstance(df, dask_cudf.core.DataFrame): 166 | return (df[col].min().compute(), df[col].max().compute()) 167 | return (df[col].min(), df[col].max()) 168 | 169 | 170 | def build_datashader_plot( 171 | df, 172 | colorscale_name, 173 | colorscale_transform, 174 | new_coordinates, 175 | position, 176 | x_range, 177 | y_range, 178 | view_name, 179 | ): 180 | # global data_3857, data_center_3857, data_4326, data_center_4326 181 | 182 | x0, x1 = x_range 183 | y0, y1 = y_range 184 | 185 | datashader_color_scale = {} 186 | 187 | cvs = ds.Canvas(plot_width=3840, plot_height=2160, x_range=x_range, y_range=y_range) 188 | 189 | colorscale_transform = "linear" 190 | 191 | if view_name == "race": 192 | aggregate_column = "race" 193 | aggregate = "mean" 194 | elif view_name == "total": 195 | aggregate_column = "net" 196 | aggregate = "count" 197 | colorscale_name = "Viridis10" 198 | elif view_name == "in": 199 | aggregate_column = "net" 200 | aggregate = "count" 201 | colorscale_name = "PuBu9" 202 | elif view_name == "stationary": 203 | aggregate_column = "net" 204 | aggregate = "count" 205 | colorscale_name = "Greens9" 206 | elif view_name == "out": 207 | aggregate_column = "net" 208 | aggregate = "count" 209 | 
colorscale_name = "RdPu9" 210 | else: # net 211 | aggregate_column = "net" 212 | aggregate = "mean" 213 | 214 | if aggregate == "mean": 215 | datashader_color_scale["color_key"] = colors[aggregate_column] 216 | datashader_color_scale["how"] = "log" 217 | else: 218 | datashader_color_scale["cmap"] = [ 219 | i[1] for i in build_colorscale(colorscale_name, colorscale_transform) 220 | ] 221 | datashader_color_scale["how"] = "log" 222 | 223 | agg = cvs.points( 224 | df, 225 | x="easting", 226 | y="northing", 227 | agg=getattr(ds, aggregate)(aggregate_column), 228 | ) 229 | 230 | cmin = cp.asnumpy(agg.min().data) 231 | cmax = cp.asnumpy(agg.max().data) 232 | 233 | # Count the number of selected towers 234 | temp = agg.sum() 235 | temp.data = cp.asnumpy(temp.data) 236 | n_selected = int(temp) 237 | 238 | if n_selected == 0: 239 | # Nothing to display 240 | lat = [None] 241 | lon = [None] 242 | customdata = [None] 243 | marker = {} 244 | layers = [] 245 | else: 246 | img = tf.shade( 247 | tf.dynspread(agg, threshold=0.7), 248 | **datashader_color_scale, 249 | ).to_pil() 250 | # img = tf.shade(agg,how='log',**datashader_color_scale).to_pil() 251 | 252 | # Add image as mapbox image layer. 
Note that as of version 4.4, plotly will 253 | # automatically convert the PIL image object into a base64 encoded png string 254 | layers = [ 255 | { 256 | "sourcetype": "image", 257 | "source": img, 258 | "coordinates": new_coordinates, 259 | } 260 | ] 261 | 262 | # Do not display any mapbox markers 263 | lat = [None] 264 | lon = [None] 265 | customdata = [None] 266 | marker = {} 267 | 268 | # Build map figure 269 | map_graph = { 270 | "data": [], 271 | "layout": { 272 | "template": template, 273 | "uirevision": True, 274 | "mapbox": { 275 | "style": mapbox_style, 276 | "layers": layers, 277 | "center": { 278 | "lon": -78.81063494489342, 279 | "lat": 37.471878534555074, 280 | }, 281 | "zoom": 3, 282 | }, 283 | "margin": {"r": 140, "t": 0, "l": 0, "b": 0}, 284 | "height": 500, 285 | "shapes": [ 286 | { 287 | "type": "rect", 288 | "xref": "paper", 289 | "yref": "paper", 290 | "x0": 0, 291 | "y0": 0, 292 | "x1": 1, 293 | "y1": 1, 294 | "line": { 295 | "width": 1, 296 | "color": "#191a1a", 297 | }, 298 | } 299 | ], 300 | }, 301 | } 302 | 303 | if aggregate == "mean": 304 | # for `Age By PurBlue` category 305 | if view_name == "race": 306 | colorscale = [0, 1] 307 | 308 | marker = dict( 309 | size=0, 310 | showscale=True, 311 | colorbar={ 312 | "title": { 313 | "text": "Race", 314 | "side": "right", 315 | "font": {"size": 14}, 316 | }, 317 | "tickvals": [ 318 | (0 + 0.5) / 7, 319 | (1 + 0.5) / 7, 320 | (2 + 0.5) / 7, 321 | (3 + 0.5) / 7, 322 | (4 + 0.5) / 7, 323 | (5 + 0.5) / 7, 324 | (6 + 0.5) / 7, 325 | ], 326 | "ticktext": [ 327 | "White", 328 | "African American", 329 | "American Indian", 330 | "Asian alone", 331 | "Native Hawaiian", 332 | "Other Race alone", 333 | "Two or More", 334 | ], 335 | "ypad": 30, 336 | }, 337 | colorscale=[ 338 | (0 / 7, colors["race"][1]), 339 | (1 / 7, colors["race"][1]), 340 | (1 / 7, colors["race"][2]), 341 | (2 / 7, colors["race"][2]), 342 | (2 / 7, colors["race"][3]), 343 | (3 / 7, colors["race"][3]), 344 | (3 / 7, 
colors["race"][4]), 345 | (4 / 7, colors["race"][4]), 346 | (4 / 7, colors["race"][5]), 347 | (5 / 7, colors["race"][5]), 348 | (5 / 7, colors["race"][6]), 349 | (6 / 7, colors["race"][6]), 350 | (6 / 7, colors["race"][7]), 351 | (7 / 7, colors["race"][7]), 352 | (7 / 7, colors["race"][7]), 353 | ], 354 | cmin=0, 355 | cmax=1, 356 | ) # end of marker 357 | else: 358 | colorscale = [0, 1] 359 | 360 | marker = dict( 361 | size=0, 362 | showscale=True, 363 | colorbar={ 364 | "title": { 365 | "text": "Migration", 366 | "side": "right", 367 | "font": {"size": 14}, 368 | }, 369 | "tickvals": [(0 + 0.5) / 3, (1 + 0.5) / 3, (2 + 0.5) / 3], 370 | "ticktext": ["Out", "Stationary", "In"], 371 | "ypad": 30, 372 | }, 373 | colorscale=[ 374 | (0 / 3, colors["net"][-1]), 375 | (1 / 3, colors["net"][-1]), 376 | (1 / 3, colors["net"][0]), 377 | (2 / 3, colors["net"][0]), 378 | (2 / 3, colors["net"][1]), 379 | (3 / 3, colors["net"][1]), 380 | ], 381 | cmin=0, 382 | cmax=1, 383 | ) # end of marker 384 | 385 | map_graph["data"].append( 386 | { 387 | "type": "scattermapbox", 388 | "lat": lat, 389 | "lon": lon, 390 | "customdata": customdata, 391 | "marker": marker, 392 | "hoverinfo": "none", 393 | } 394 | ) 395 | map_graph["layout"]["annotations"] = [] 396 | 397 | else: 398 | marker = dict( 399 | size=0, 400 | showscale=True, 401 | colorbar={ 402 | "title": { 403 | "text": "Population", 404 | "side": "right", 405 | "font": {"size": 14}, 406 | }, 407 | "ypad": 30, 408 | }, 409 | colorscale=build_colorscale(colorscale_name, colorscale_transform), 410 | cmin=cmin, 411 | cmax=cmax, 412 | ) # end of marker 413 | 414 | map_graph["data"].append( 415 | { 416 | "type": "scattermapbox", 417 | "lat": lat, 418 | "lon": lon, 419 | "customdata": customdata, 420 | "marker": marker, 421 | "hoverinfo": "none", 422 | } 423 | ) 424 | 425 | map_graph["layout"]["mapbox"].update(position) 426 | 427 | return map_graph 428 | 429 | 430 | def query_df_range_lat_lon(df, x0, x1, y0, y1, x, y): 431 | mask_ = 
(df[x] >= x0) & (df[x] <= x1) & (df[y] <= y0) & (df[y] >= y1) 432 | if mask_.sum() != len(df): 433 | df = df[mask_] 434 | if isinstance(df, cudf.DataFrame): 435 | df.index = cudf.RangeIndex(0, len(df)) 436 | else: 437 | df.index = pd.RangeIndex(0, len(df)) 438 | del mask_ 439 | return df 440 | 441 | 442 | def bar_selected_ids(selection, column): # select ids for each column 443 | if (column == "county_top") | (column == "county_bottom"): 444 | selected_ids = [county2id[p["label"]] for p in selection["points"]] 445 | else: 446 | selected_ids = [race2id[p["label"]] for p in selection["points"]] 447 | 448 | return selected_ids 449 | 450 | 451 | def query_df_selected_ids(df, col, selected_ids): 452 | if (col == "county_top") | (col == "county_bottom"): 453 | col = "county" 454 | return df[df[col].isin(selected_ids)] 455 | 456 | 457 | def no_data_figure(): 458 | return { 459 | "data": [ 460 | { 461 | "title": {"text": "Query Result"}, 462 | "text": "SOME RANDOM", 463 | "marker": {"text": "NO"}, 464 | } 465 | ], 466 | "layout": { 467 | "height": 250, 468 | "template": template, 469 | "xaxis": {"visible": False}, 470 | "yaxis": {"visible": False}, 471 | }, 472 | } 473 | 474 | 475 | def build_histogram_default_bins( 476 | df, 477 | column, 478 | selections, 479 | orientation, 480 | colorscale_name, 481 | colorscale_transform, 482 | view_name, 483 | flag, 484 | ): 485 | if (view_name == "out") & (column == "race"): 486 | return no_data_figure() 487 | 488 | global race2color 489 | 490 | if (column == "county_top") | (column == "county_bottom"): 491 | column = "county" 492 | 493 | if dask_cudf and isinstance(df, dask_cudf.core.DataFrame): 494 | df = df[[column, "net"]].groupby(column)["net"].count().compute().to_pandas() 495 | elif isinstance(df, cudf.DataFrame): 496 | df = df[[column, "net"]].groupby(column)["net"].count().to_pandas() 497 | elif isinstance(df, dd.core.DataFrame): 498 | df = df[[column, "net"]].groupby(column)["net"].count().compute() 499 | else: 500 | df = 
df[[column, "net"]].groupby(column)["net"].count() 501 | 502 | df = df.sort_values(ascending=False) # sorted grouped ids by counts 503 | 504 | if (flag == "top") | (flag == "bottom"): 505 | if flag == "top": 506 | view = df.iloc[:15] 507 | else: 508 | view = df.iloc[-15:] 509 | names = [id2county[cid] for cid in view.index.values] 510 | else: 511 | view = df 512 | names = [id2race[rid] for rid in view.index.values] 513 | 514 | bin_edges = names 515 | counts = view.values 516 | 517 | mapping_options = {} 518 | xaxis_labels = {} 519 | if column in mappings: 520 | if column in mappings_hover: 521 | mapping_options = { 522 | "text": list(mappings_hover[column].values()), 523 | "hovertemplate": "%{text}: %{y} ", 524 | } 525 | else: 526 | mapping_options = { 527 | "text": list(mappings[column].values()), 528 | "hovertemplate": "%{text} : %{y} ", 529 | } 530 | xaxis_labels = { 531 | "tickvals": list(mappings[column].keys()), 532 | "ticktext": list(mappings[column].values()), 533 | } 534 | 535 | if view_name == "total": 536 | bar_color = counts 537 | bar_scale = build_colorscale("Viridis10", colorscale_transform) 538 | elif view_name == "in": 539 | bar_color = counts 540 | bar_scale = build_colorscale("PuBu9", colorscale_transform) 541 | elif view_name == "stationary": 542 | bar_color = counts 543 | bar_scale = build_colorscale("Greens9", colorscale_transform) 544 | elif view_name == "out": 545 | bar_color = counts 546 | bar_scale = build_colorscale("RdPu9", colorscale_transform) 547 | elif view_name == "race": 548 | if column == "race": 549 | bar_color = [race2color[race] for race in names] 550 | else: 551 | bar_color = "#2C718E" 552 | bar_scale = None 553 | else: # net 554 | bar_color = "#2C718E" 555 | bar_scale = None 556 | 557 | fig = { 558 | "data": [ 559 | { 560 | "type": "bar", 561 | "x": bin_edges, 562 | "y": counts, 563 | "marker": {"color": bar_color, "colorscale": bar_scale}, 564 | **mapping_options, 565 | } 566 | ], 567 | "layout": { 568 | "yaxis": { 569 | 
"type": "linear", 570 | "title": {"text": "Count"}, 571 | }, 572 | "xaxis": {**xaxis_labels}, 573 | "selectdirection": "h", 574 | "dragmode": "select", 575 | "template": template, 576 | "uirevision": True, 577 | "hovermode": "closest", 578 | }, 579 | } 580 | if column not in selections: 581 | fig["data"][0]["selectedpoints"] = False 582 | 583 | return fig 584 | 585 | 586 | def cull_empty_partitions(df): 587 | ll = list(df.map_partitions(len).compute()) 588 | df_delayed = df.to_delayed() 589 | df_delayed_new = list() 590 | pempty = None 591 | for ix, n in enumerate(ll): 592 | if 0 == n: 593 | pempty = df.get_partition(ix) 594 | else: 595 | df_delayed_new.append(df_delayed[ix]) 596 | if pempty is not None: 597 | df = dd.from_delayed(df_delayed_new, meta=pempty) 598 | return df 599 | 600 | 601 | def build_updated_figures_dask( 602 | df, 603 | relayout_data, 604 | selected_map, 605 | selected_race, 606 | selected_county_top, 607 | selected_county_bottom, 608 | colorscale_name, 609 | data_3857, 610 | data_center_3857, 611 | data_4326, 612 | data_center_4326, 613 | coordinates_4326_backup, 614 | position_backup, 615 | view_name, 616 | ): 617 | colorscale_transform = "linear" 618 | selected = {} 619 | 620 | selected = { 621 | col: bar_selected_ids(sel, col) 622 | for col, sel in zip( 623 | ["race", "county_top", "county_bottom"], 624 | [selected_race, selected_county_top, selected_county_bottom], 625 | ) 626 | if sel and sel.get("points", []) 627 | } 628 | 629 | if relayout_data is not None: 630 | transformer_4326_to_3857 = Transformer.from_crs("epsg:4326", "epsg:3857") 631 | 632 | def epsg_4326_to_3857(coords): 633 | return [transformer_4326_to_3857.transform(*reversed(row)) for row in coords] 634 | 635 | coordinates_4326 = relayout_data and relayout_data.get("mapbox._derived", {}).get( 636 | "coordinates", None 637 | ) 638 | dragmode = ( 639 | relayout_data 640 | and "dragmode" in relayout_data 641 | and coordinates_4326_backup is not None 642 | ) 643 | 644 | if 
dragmode: 645 | coordinates_4326 = coordinates_4326_backup 646 | coordinates_3857 = epsg_4326_to_3857(coordinates_4326) 647 | position = position_backup 648 | elif coordinates_4326: 649 | lons, lats = zip(*coordinates_4326) 650 | lon0, lon1 = max(min(lons), data_4326[0][0]), min(max(lons), data_4326[1][0]) 651 | lat0, lat1 = max(min(lats), data_4326[0][1]), min(max(lats), data_4326[1][1]) 652 | coordinates_4326 = [ 653 | [lon0, lat0], 654 | [lon1, lat1], 655 | ] 656 | coordinates_3857 = epsg_4326_to_3857(coordinates_4326) 657 | coordinates_4326_backup = coordinates_4326 658 | 659 | position = { 660 | "zoom": relayout_data.get("mapbox.zoom", None), 661 | "center": relayout_data.get("mapbox.center", None), 662 | } 663 | position_backup = position 664 | 665 | else: 666 | position = { 667 | "zoom": 3.3350828189345934, 668 | "pitch": 0, 669 | "bearing": 0, 670 | "center": { 671 | "lon": -100.55828959790324, 672 | "lat": 38.68323453274175, 673 | }, # {'lon': data_center_4326[0][0]-100, 'lat': data_center_4326[0][1]-10} 674 | } 675 | coordinates_3857 = data_3857 676 | coordinates_4326 = data_4326 677 | 678 | new_coordinates = [ 679 | [coordinates_4326[0][0], coordinates_4326[1][1]], 680 | [coordinates_4326[1][0], coordinates_4326[1][1]], 681 | [coordinates_4326[1][0], coordinates_4326[0][1]], 682 | [coordinates_4326[0][0], coordinates_4326[0][1]], 683 | ] 684 | 685 | x_range, y_range = zip(*coordinates_3857) 686 | x0, x1 = x_range 687 | y0, y1 = y_range 688 | 689 | if selected_map is not None: 690 | coordinates_4326 = selected_map["range"]["mapbox"] 691 | coordinates_3857 = epsg_4326_to_3857(coordinates_4326) 692 | x_range_t, y_range_t = zip(*coordinates_3857) 693 | x0, x1 = x_range_t 694 | y0, y1 = y_range_t 695 | df = df.map_partitions( 696 | query_df_range_lat_lon, x0, x1, y0, y1, "easting", "northing" 697 | ).persist() 698 | 699 | # Select points as per view 700 | 701 | if (view_name == "total") | (view_name == "race"): 702 | df = df[(df["net"] == 0) | (df["net"] == 
1)] 703 | # df['race'] = df['race'].astype('category') 704 | elif view_name == "in": 705 | df = df[df["net"] == 1] 706 | df["net"] = df["net"].astype("int8") 707 | elif view_name == "stationary": 708 | df = df[df["net"] == 0] 709 | df["net"] = df["net"].astype("int8") 710 | elif view_name == "out": 711 | df = df[df["net"] == -1] 712 | df["net"] = df["net"].astype("int8") 713 | else: # net migration condition 714 | df = df 715 | # df["net"] = df["net"].astype("category") 716 | 717 | for col in selected: 718 | df = df.map_partitions(query_df_selected_ids, col, selected[col]) 719 | 720 | # cull empty partitions 721 | df = cull_empty_partitions(df).persist() 722 | 723 | datashader_plot = build_datashader_plot( 724 | df, 725 | colorscale_name, 726 | colorscale_transform, 727 | new_coordinates, 728 | position, 729 | x_range, 730 | y_range, 731 | view_name, 732 | ) 733 | 734 | # Build indicator figure 735 | n_selected_indicator = { 736 | "data": [ 737 | { 738 | "domain": {"x": [0.21, 0.41], "y": [0, 0.5]}, 739 | "title": {"text": "Data Size"}, 740 | "type": "indicator", 741 | "value": len(df), 742 | "number": { 743 | "font": {"color": text_color, "size": "50px"}, 744 | "valueformat": ",", 745 | "suffix": " rows", 746 | }, 747 | }, 748 | ], 749 | "layout": { 750 | "template": template, 751 | "height": row_heights[3], 752 | # 'margin': {'l': 0, 'r': 0,'t': 5, 'b': 5} 753 | }, 754 | } 755 | 756 | race_histogram = build_histogram_default_bins( 757 | df, 758 | "race", 759 | selected, 760 | "v", 761 | colorscale_name, 762 | colorscale_transform, 763 | view_name, 764 | flag="All", 765 | ) 766 | 767 | county_top_histogram = build_histogram_default_bins( 768 | df, 769 | "county", 770 | selected, 771 | "v", 772 | colorscale_name, 773 | colorscale_transform, 774 | view_name, 775 | flag="top", 776 | ) 777 | 778 | county_bottom_histogram = build_histogram_default_bins( 779 | df, 780 | "county", 781 | selected, 782 | "v", 783 | colorscale_name, 784 | colorscale_transform, 785 | 
view_name, 786 | flag="bottom", 787 | ) 788 | 789 | del df 790 | return ( 791 | datashader_plot, 792 | county_top_histogram, 793 | county_bottom_histogram, 794 | race_histogram, 795 | n_selected_indicator, 796 | coordinates_4326_backup, 797 | position_backup, 798 | ) 799 | 800 | 801 | def build_updated_figures( 802 | df, 803 | relayout_data, 804 | selected_map, 805 | selected_race, 806 | selected_county_top, 807 | selected_county_bottom, 808 | colorscale_name, 809 | data_3857, 810 | data_center_3857, 811 | data_4326, 812 | data_center_4326, 813 | coordinates_4326_backup, 814 | position_backup, 815 | view_name, 816 | ): 817 | colorscale_transform = "linear" 818 | selected = {} 819 | 820 | selected = { 821 | col: bar_selected_ids(sel, col) 822 | for col, sel in zip( 823 | ["race", "county_top", "county_bottom"], 824 | [selected_race, selected_county_top, selected_county_bottom], 825 | ) 826 | if sel and sel.get("points", []) 827 | } 828 | 829 | if relayout_data is not None: 830 | transformer_4326_to_3857 = Transformer.from_crs("epsg:4326", "epsg:3857") 831 | 832 | def epsg_4326_to_3857(coords): 833 | return [transformer_4326_to_3857.transform(*reversed(row)) for row in coords] 834 | 835 | coordinates_4326 = relayout_data and relayout_data.get("mapbox._derived", {}).get( 836 | "coordinates", None 837 | ) 838 | dragmode = ( 839 | relayout_data 840 | and "dragmode" in relayout_data 841 | and coordinates_4326_backup is not None 842 | ) 843 | 844 | if dragmode: 845 | coordinates_4326 = coordinates_4326_backup 846 | coordinates_3857 = epsg_4326_to_3857(coordinates_4326) 847 | position = position_backup 848 | elif coordinates_4326: 849 | lons, lats = zip(*coordinates_4326) 850 | lon0, lon1 = max(min(lons), data_4326[0][0]), min(max(lons), data_4326[1][0]) 851 | lat0, lat1 = max(min(lats), data_4326[0][1]), min(max(lats), data_4326[1][1]) 852 | coordinates_4326 = [ 853 | [lon0, lat0], 854 | [lon1, lat1], 855 | ] 856 | coordinates_3857 = epsg_4326_to_3857(coordinates_4326) 
857 | coordinates_4326_backup = coordinates_4326 858 | 859 | position = { 860 | "zoom": relayout_data.get("mapbox.zoom", None), 861 | "center": relayout_data.get("mapbox.center", None), 862 | } 863 | position_backup = position 864 | 865 | else: 866 | position = { 867 | "zoom": 3.3350828189345934, 868 | "pitch": 0, 869 | "bearing": 0, 870 | "center": { 871 | "lon": -100.55828959790324, 872 | "lat": 38.68323453274175, 873 | }, # {'lon': data_center_4326[0][0]-100, 'lat': data_center_4326[0][1]-10} 874 | } 875 | coordinates_3857 = data_3857 876 | coordinates_4326 = data_4326 877 | 878 | new_coordinates = [ 879 | [coordinates_4326[0][0], coordinates_4326[1][1]], 880 | [coordinates_4326[1][0], coordinates_4326[1][1]], 881 | [coordinates_4326[1][0], coordinates_4326[0][1]], 882 | [coordinates_4326[0][0], coordinates_4326[0][1]], 883 | ] 884 | 885 | x_range, y_range = zip(*coordinates_3857) 886 | x0, x1 = x_range 887 | y0, y1 = y_range 888 | 889 | if selected_map is not None: 890 | coordinates_4326 = selected_map["range"]["mapbox"] 891 | coordinates_3857 = epsg_4326_to_3857(coordinates_4326) 892 | x_range_t, y_range_t = zip(*coordinates_3857) 893 | x0, x1 = x_range_t 894 | y0, y1 = y_range_t 895 | df = query_df_range_lat_lon(df, x0, x1, y0, y1, "easting", "northing") 896 | 897 | # Select points as per view 898 | if (view_name == "total") | (view_name == "race"): 899 | df = df[(df["net"] == 0) | (df["net"] == 1)] 900 | df["net"] = df["net"].astype("int8") 901 | # df['race'] = df['race'].astype('category') 902 | elif view_name == "in": 903 | df = df[df["net"] == 1] 904 | df["net"] = df["net"].astype("int8") 905 | elif view_name == "stationary": 906 | df = df[df["net"] == 0] 907 | df["net"] = df["net"].astype("int8") 908 | elif view_name == "out": 909 | df = df[df["net"] == -1] 910 | df["net"] = df["net"].astype("int8") 911 | else: # net migration condition 912 | df = df 913 | # df["net"] = df["net"].astype("category") 914 | 915 | for col in selected: 916 | df = 
query_df_selected_ids(df, col, selected[col]) 917 | 918 | datashader_plot = build_datashader_plot( 919 | df, 920 | colorscale_name, 921 | colorscale_transform, 922 | new_coordinates, 923 | position, 924 | x_range, 925 | y_range, 926 | view_name, 927 | ) 928 | 929 | # Build indicator figure 930 | n_selected_indicator = { 931 | "data": [ 932 | { 933 | "domain": {"x": [0.2, 0.45], "y": [0, 0.5]}, 934 | "title": {"text": "Data Size"}, 935 | "type": "indicator", 936 | "value": len(df), 937 | "number": { 938 | "font": {"color": text_color, "size": "50px"}, 939 | "valueformat": ",", 940 | "suffix": " rows", 941 | }, 942 | }, 943 | ], 944 | "layout": { 945 | "template": template, 946 | "height": row_heights[3], 947 | # 'margin': {'l': 0, 'r': 0,'t': 5, 'b': 5} 948 | }, 949 | } 950 | 951 | race_histogram = build_histogram_default_bins( 952 | df, 953 | "race", 954 | selected, 955 | "v", 956 | colorscale_name, 957 | colorscale_transform, 958 | view_name, 959 | flag="All", 960 | ) 961 | 962 | county_top_histogram = build_histogram_default_bins( 963 | df, 964 | "county", 965 | selected, 966 | "v", 967 | colorscale_name, 968 | colorscale_transform, 969 | view_name, 970 | flag="top", 971 | ) 972 | 973 | county_bottom_histogram = build_histogram_default_bins( 974 | df, 975 | "county", 976 | selected, 977 | "v", 978 | colorscale_name, 979 | colorscale_transform, 980 | view_name, 981 | flag="bottom", 982 | ) 983 | 984 | del df 985 | return ( 986 | datashader_plot, 987 | race_histogram, 988 | county_top_histogram, 989 | county_bottom_histogram, 990 | n_selected_indicator, 991 | coordinates_4326_backup, 992 | position_backup, 993 | ) 994 | 995 | 996 | def check_dataset(dataset_url, data_path): 997 | if not os.path.exists(data_path): 998 | print( 999 | f"Dataset not found at " + data_path + ".\n" 1000 | f"Downloading from {dataset_url}" 1001 | ) 1002 | # Download dataset to data directory 1003 | os.makedirs("../data", exist_ok=True) 1004 | with requests.get(dataset_url, stream=True) as 
r: 1005 | r.raise_for_status() 1006 | with open(data_path, "wb") as f: 1007 | for chunk in r.iter_content(chunk_size=8192): 1008 | if chunk: 1009 | f.write(chunk) 1010 | print("Download completed!") 1011 | else: 1012 | print(f"Found dataset at {data_path}") 1013 | 1014 | 1015 | def load_dataset(path, dtype="dask_cudf"): 1016 | """ 1017 | Args: 1018 | path: Path to a parquet file (or directory of parquet files) holding the census dataset 1019 | Returns: 1020 | DataFrame backed by the library selected via dtype ("dask", "dask_cudf", "pandas", or cudf) 1021 | """ 1022 | if os.path.isdir(path): 1023 | path = path + "/*" 1024 | if dtype == "dask": 1025 | return dd.read_parquet(path, split_row_groups=True) 1026 | elif dask_cudf and dtype == "dask_cudf": 1027 | return dask_cudf.read_parquet(path, split_row_groups=True) 1028 | elif dtype == "pandas": 1029 | return cudf.read_parquet(path).to_pandas() 1030 | return cudf.read_parquet(path) 1031 | --------------------------------------------------------------------------------
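Both `build_updated_figures` variants above project map bounds from EPSG:4326 (lon/lat degrees) to EPSG:3857 (Web Mercator metres) via pyproj's `Transformer`; because pyproj follows the authority axis order of EPSG:4326 (lat, lon) unless `always_xy=True` is passed, the code calls `transform(*reversed(row))` on its `[lon, lat]` rows. The spherical Web Mercator math behind that transform can be sketched in pure Python; the function name and the sample corner coordinates below are illustrative, not part of the app:

```python
import math

# EPSG:3857 models the Earth as a sphere of radius 6378137 m.
R = 6378137.0

def epsg_4326_to_3857_point(lon, lat):
    """Project one (lon, lat) pair in degrees to Web Mercator metres.

    A hand-rolled sketch of what pyproj computes inside the app's
    `epsg_4326_to_3857` helper; hypothetical standalone version.
    """
    x = math.radians(lon) * R
    y = math.log(math.tan(math.pi / 4 + math.radians(lat) / 2)) * R
    return x, y

# The app stores bounding boxes as [[lon0, lat0], [lon1, lat1]];
# projecting both corners yields the x/y ranges handed to datashader.
corners = [[-100.56, 38.68], [-80.0, 40.0]]  # illustrative values
projected = [epsg_4326_to_3857_point(lon, lat) for lon, lat in corners]
```

Note the y-formula diverges as latitude approaches ±90°, which is why Web Mercator maps clip near the poles; for the continental-US extents used in this demo that is never an issue.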