├── .dockerignore
├── .gitignore
├── Dockerfile
├── README.md
├── assets
└── dashboard.png
├── data_prep_net_migration
├── README.md
├── assign_race.ipynb
├── compute_race.ipynb
├── gen_points_in_rectangle_fast_script.ipynb
├── gen_race_mig_points.ipynb
└── gen_table_with_race_migration.ipynb
├── data_prep_total_population
├── .ipynb_checkpoints
│ └── SeparateTotalDatasetsByState-checkpoint.ipynb
├── README.md
├── SeparateTotalDatasetsByState.ipynb
├── SeparateTotalDatasetsByState.py
├── add_race_net_county_to_population.ipynb
├── gen_table_with_migration.ipynb
├── gen_total_population_points_script.ipynb
└── map_blocks_and_calc_population.ipynb
├── entrypoint.sh
├── environment.yml
├── environment_for_docker.yml
├── holoviews_demo
├── README.md
├── census_net_migration_demo.ipynb
└── environment.yml
├── id2county.pkl
└── plotly_demo
├── README.md
├── app.py
├── assets
├── dash-logo.png
├── rapids-logo.png
└── s1.css
├── colab_plotly_rapids_app.ipynb
├── dask_app.py
└── utils
├── __init__.py
└── utils.py
/.dockerignore:
--------------------------------------------------------------------------------
1 | ./data/*
2 | dask-worker-space
3 | .vscode
--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
1 | data
2 | **checkpoint**
3 | *pyc
4 | *log
--------------------------------------------------------------------------------
/Dockerfile:
--------------------------------------------------------------------------------
1 | ARG RAPIDS_VERSION=22.12
2 | ARG CUDA_VERSION=11.5
3 | ARG LINUX_VERSION=ubuntu20.04
4 | ARG PYTHON_VERSION=3.9
5 | FROM nvcr.io/nvidia/rapidsai/rapidsai-core:${RAPIDS_VERSION}-cuda${CUDA_VERSION}-base-${LINUX_VERSION}-py${PYTHON_VERSION}
6 |
7 | WORKDIR /rapids/
8 | RUN mkdir plotly_census_demo
9 |
10 | WORKDIR /rapids/plotly_census_demo
11 | RUN mkdir data
12 | WORKDIR /rapids/plotly_census_demo/data
13 | RUN curl https://data.rapids.ai/viz-data/total_population_dataset.parquet -o total_population_dataset.parquet
14 |
15 | WORKDIR /rapids/plotly_census_demo
16 |
17 | COPY . .
18 |
19 | RUN source activate rapids && conda remove --force cuxfilter && mamba env update --file environment_for_docker.yml
20 |
21 | ENTRYPOINT ["bash","./entrypoint.sh"]
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Plotly-Dash + RAPIDS | Census 2020 Visualization
2 |
3 |
4 |
5 | 
6 |
7 | ## Charts
8 |
9 | 1. The map chart shows the total population points for the chosen view and selected area
10 | 2. The top counties bar chart shows the top 15 counties for the chosen view and selected area
11 | 3. The bottom counties bar chart shows the bottom 15 counties for the chosen view and selected area
12 | 4. The race distribution chart shows the distribution of individual races across blocks for the chosen view and selected area
13 |
14 | Cross-filtering links all four charts via the box-select tool.
15 |
16 | ## Data-Selection Views
17 |
18 | The demo consists of six views, all calculated at the census-block level:
19 |
20 | - Total Population view shows total Census 2020 population.
21 | - Migrating In view shows net inward decennial migration.
22 | - Stationary view shows the population that remained stationary.
23 | - Migrating Out view shows net outward decennial migration.
24 | - Net Migration view shows total decennial migration. Points are colored into three categories: migrating in, stationary, and migrating out.
25 | - Population with Race shows total Census 2020 population colored into seven race categories - White alone, African American alone, American Indian alone, Asian alone, Native Hawaiian alone, Other Race alone, Two or More races.
26 |
27 | ## Installation and Run Steps
28 |
29 | ## Base Layer Setup
30 |
31 | The visualization uses a Mapbox base layer, which requires an access token. Create one for free [here on mapbox](https://www.mapbox.com/help/define-access-token/). Then, in the `plotly_demo` folder of the demo root directory, create a token file named `.mapbox_token` and copy your token contents into it.
32 |
33 | **NOTE:** The demo may fail to run without the token.
34 |
35 | ## Data
36 |
37 | There are two main datasets:
38 |
39 | - [Total Population Dataset](https://data.rapids.ai/viz-data/total_population_dataset.parquet): Census 2020 total population with decennial migration from Census 2010 at a block level.
40 | - [Net Migration Dataset](https://data.rapids.ai/viz-data/net_migration_dataset.parquet): net migration from Census 2010 at a block level.
41 |
42 | For more information on how the Census 2020 and 2010 migration data was prepared to show individual points, refer to the `/data_prep_total_population` and `/data_prep_net_migration` folders.
43 |
44 | ### Conda Env
45 |
46 | Verify the following arguments in the `environment.yml` match your system (an easy way to check is `nvidia-smi`):
47 |
48 | cudatoolkit: Version used is `11.5`
49 |
50 | ```bash
51 | # setup conda environment
52 | conda env create --name plotly_env --file environment.yml
53 | source activate plotly_env
54 |
55 | # run and access single GPU version
56 | cd plotly_demo
57 | python app.py
58 |
59 | # run and access multi GPU version; run `python dask_app.py --help` for args info
60 | # if --cuda_visible_devices argument is not passed, all the available GPUs are used
61 | cd plotly_demo
62 | python dask_app.py --cuda_visible_devices=0,1
63 | ```
64 |
65 | ### Docker
66 |
67 | Verify the following arguments in the Dockerfile match your system:
68 |
69 | 1. CUDA_VERSION: Supported versions are `11.0+`
70 | 2. LINUX_VERSION: Supported OS values include `ubuntu18.04`, `ubuntu20.04`, `centos7`
71 |
72 | The most up to date OS and CUDA versions supported can be found here: [RAPIDS requirements](https://rapids.ai/start.html#req)
73 |
74 | ```bash
75 | # build
76 | docker build -t plotly_demo .
77 |
78 | # run and access single GPU version via: http://localhost:8050 / http://ip_address:8050 / http://0.0.0.0:8050
79 | docker run --gpus all --name single_gpu -p 8050:8050 plotly_demo
80 |
81 | # run and access multi GPU version via: http://localhost:8050 / http://ip_address:8050 / http://0.0.0.0:8050
82 | # Use `--gpus all` to use all the available GPUs
83 | docker run --gpus '"device=0,1"' --name multi_gpu -p 8050:8050 plotly_demo dask_app
84 | ```
85 |
86 | ## Requirements
87 |
88 | ### CUDA/GPU requirements
89 |
90 | - CUDA 11.0+
91 | - NVIDIA driver 450.80.02+
92 | - Pascal architecture or better (Compute Capability >=6.0)
93 |
94 | > Recommended memory: an NVIDIA GPU with at least 32GB of memory (or 2 GPUs with equivalent combined GPU memory when running the dask version), and at least 32GB of system memory.
95 |
96 | ### OS requirements
97 |
98 | See the [Rapids System Requirements section](https://rapids.ai/start.html#requirements) for information on compatible OS.
99 |
100 | ## Dependencies
101 |
102 | - python=3.9
103 | - cudatoolkit=11.5
104 | - rapids=22.08
105 | - dash=2.5.1
106 | - jupyterlab=3.4.3
107 | - dash-html-components=2.0.0
108 | - dash-core-components=2.0.0
109 | - dash-daq=0.5.0
110 | - dash_bootstrap_components=1.2.0
111 |
112 | ## FAQ and Known Issues
113 |
114 | **What hardware do I need to run this locally?** An NVIDIA GPU with at least 32GB of memory (or 2 GPUs with equivalent combined GPU memory when running the dask version), and at least 32GB of system memory.
115 |
116 | **How did you compute migration?** Migration was computed by comparing block-level populations between Census 2010 and Census 2020.
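In code terms, this comparison reduces to a per-block delta. Below is a minimal pandas sketch with made-up populations; the `P20`/`eq_P10` column names follow the data-prep notebooks, and the 1/0/-1 coding follows the mapping in `data_prep_net_migration/README.md`:

```python
import pandas as pd

# Made-up block populations; P20 = Census 2020 count, eq_P10 = equivalent
# 2010 count after block-boundary reconciliation.
df = pd.DataFrame({'P20': [120, 80, 50], 'eq_P10': [100, 80, 70]})

# Net change per block, and its 1/0/-1 coding
# (inward migration / stationary / outward migration).
df['P_delta'] = df['P20'] - df['eq_P10']
df['P_net'] = df['P_delta'].apply(lambda x: 1 if x > 0 else (-1 if x < 0 else 0))
# P_delta: [20, 0, -20] -> P_net: [1, 0, -1]
```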
117 |
118 | **How did you compare population having block level boundary changes?** The Census Bureau's [Relationship Files](https://www.census.gov/geographies/reference-files/time-series/geo/relationship-files.html#t10t20) map 2010 Census Tabulation Blocks to 2020 Census Tabulation Blocks. Block relationships may be one-to-one, many-to-one, one-to-many, or many-to-many, so population counts were apportioned proportionally to account for blocks that were split or merged between 2010 and 2020.
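A minimal sketch of this proportional apportionment, using hypothetical block ids and overlap weights (the function name and weights are made up for illustration):

```python
# Split a 2010 block's population across its related 2020 blocks in
# proportion to the given overlap weights (hypothetical values).
def apportion(pop_2010, weights):
    total = sum(weights.values())
    return {block: pop_2010 * w / total for block, w in weights.items()}

# A 2010 block of 120 people split across three 2020 blocks:
shares = apportion(120, {"b1": 0.5, "b2": 0.25, "b3": 0.25})
# shares == {"b1": 60.0, "b2": 30.0, "b3": 30.0}
```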
119 |
120 | **How did you determine race?** Race for stationary and inward-migration individuals was randomly assigned within a block, but counts add up accurately at the block level. However, due to how the data is anonymized, race for the outward-migration population could not be calculated.
121 |
122 | **How did you get individual point locations?** The population density points are randomly placed within a census block so that counts match the block-level distribution.
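The data-prep notebooks implement this with a bounding-box sampler (`Random_Points_in_Bounds`); a simplified, self-contained version with made-up bounds:

```python
import numpy as np

# Draw n uniform points inside a block's bounding box. The real pipeline
# takes polygon bounds from TIGER shapefiles; the bounds here are made up.
def random_points_in_bounds(bounds, n, seed=0):
    minx, miny, maxx, maxy = bounds
    rng = np.random.default_rng(seed)
    return rng.uniform(minx, maxx, n), rng.uniform(miny, maxy, n)

x, y = random_points_in_bounds((-100.0, 40.0, -99.9, 40.1), 1000)
```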
123 |
124 | **How are the population and distributions filtered?** Use the box select tool icon for the map or click and drag for the bar charts.
125 |
126 | **Why is the population data from 2010 and 2020?** Only census data is recorded on a block level, which provides the highest resolution population distributions available. For more details on census boundaries refer to the [TIGERweb app](https://tigerweb.geo.census.gov/tigerwebmain/TIGERweb_apps.html).
127 |
128 | **The dashboard stopped responding or the chart data disappeared!** This is likely caused by an out-of-memory error; restart the application.
129 |
130 | **How do I request a feature or report a bug?** Create an [Issue](https://github.com/rapidsai/plotly-dash-rapids-census-demo/issues) and we will get to it asap.
131 |
132 | ## Acknowledgments and Data Sources
133 |
134 | - 2020 Population Census and 2010 Population Census data used to compute the Migration Dataset, used with permission from IPUMS NHGIS, University of Minnesota, [www.nhgis.org](https://www.nhgis.org/) (not for redistribution).
135 | - Base map layer provided by [Mapbox](https://www.mapbox.com/).
136 | - Dashboard developed with [Plotly Dash](https://plotly.com/dash/).
137 | - Geospatial point rendering developed with [Datashader](https://datashader.org/).
138 | - GPU toggle accelerated with [RAPIDS cudf](https://rapids.ai/) and [cupy](https://cupy.chainer.org/), CPU toggle with [pandas](https://pandas.pydata.org/).
139 | - For source code and data workflow, visit our [GitHub](https://github.com/rapidsai/plotly-dash-rapids-census-demo/tree/census-2020).
140 |
--------------------------------------------------------------------------------
/assets/dashboard.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/rapidsai/plotly-dash-rapids-census-demo/e4af2bd3de86263b7f1f947ba9b302002e047a55/assets/dashboard.png
--------------------------------------------------------------------------------
/data_prep_net_migration/README.md:
--------------------------------------------------------------------------------
1 | # Net Migration dataset generation
2 |
3 | ## Order of execution
4 |
5 | 1. gen_table_with_race_migration
6 | 2. gen_race_mig_points
7 | 3. compute_race
8 | 4. assign_race
9 |
10 | ## Mappings:
11 |
12 | ### Block Net
13 |
14 | - 1: Inward Migration
15 | - 0: Stationary
16 | - -1: Outward Migration
17 |
18 | ### Block diff
19 |
20 | Integer
21 |
22 | ### Race
23 |
24 | - 0: All
25 | - 1: White alone
26 | - 2: African American alone
27 | - 3: American Indian alone
28 | - 4: Asian alone
29 | - 5: Native Hawaiian alone
30 | - 6: Other Race alone
31 | - 7: Two or More races
32 |
33 | ### County
34 |
35 | Mappings for counties can be found in the `id2county.pkl` file in the repository root.
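The mapping is a pickled dict of integer id to county name, loadable with `pickle`. To keep this sketch self-contained it builds a tiny stand-in file (`id2county_sample.pkl`, a hypothetical name) with the same shape instead of reading the real one:

```python
import pickle

# Stand-in for id2county.pkl: integer index -> county name.
sample = {0: 'Autauga County', 1: 'Baldwin County'}
with open('id2county_sample.pkl', 'wb') as f:
    pickle.dump(sample, f)

# Loading works the same way for the real file at the repo root.
with open('id2county_sample.pkl', 'rb') as f:
    id2county = pickle.load(f)
# id2county[0] == 'Autauga County'
```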
36 |
37 | ### Final Dataset
38 |
39 | You can download the final net migration dataset [here](https://data.rapids.ai/viz-data/net_migration_dataset.parquet).
40 |
--------------------------------------------------------------------------------
/data_prep_net_migration/gen_points_in_rectangle_fast_script.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "id": "6f5f29ab-3b00-4377-a320-421b6e33386f",
6 | "metadata": {},
7 | "source": [
8 | "#### Objective: alternative script that generates points within rectangular bounds in ~40 min, for sanity checks (needs to be integrated with cuSpatial to check points within polygons)"
9 | ]
10 | },
11 | {
12 | "cell_type": "code",
13 | "execution_count": null,
14 | "id": "1c9f4d54-8bb4-42ce-aa05-a7cb5465472a",
15 | "metadata": {},
16 | "outputs": [],
17 | "source": [
18 | "import cudf, cupy\n",
19 | "import pandas as pd, numpy as np\n",
20 | "import geopandas as gpd\n",
21 | "# from shapely.geometry import Point, Polygon\n",
22 | "import os\n",
23 | "import datetime\n",
24 | "import pickle"
25 | ]
26 | },
27 | {
28 | "cell_type": "markdown",
29 | "id": "ad825f8c-be08-40c6-8b95-fc75f357be7d",
30 | "metadata": {
31 | "tags": []
32 | },
33 | "source": [
34 | "### ETL"
35 | ]
36 | },
37 | {
38 | "cell_type": "code",
39 | "execution_count": null,
40 | "id": "1e9d91de-5c2b-4d5d-8810-f5d74598ca7f",
41 | "metadata": {},
42 | "outputs": [],
43 | "source": [
44 | "df = pd.read_csv('data/mapped_data_full.csv',encoding='unicode_escape',dtype={'GISJOIN':'int64','ID20':'int64','STATE':'int32','COUNTY':'str','P20':'int32','P10_new':'int32'}).drop('Unnamed: 0',axis=1)\n",
45 | "df['P_delta']=df['P20'] - df['eq_P10']\n",
46 | "df['P_net']= df['P_delta'].apply(lambda x : 1 if x>0 else 0)\n",
47 | "df['number'] = df.P_delta.round().abs().astype('int32')\n",
48 | "df.head()"
49 | ]
50 | },
51 | {
52 | "cell_type": "code",
53 | "execution_count": null,
54 | "id": "91a55f8f-e5bd-4fd5-9d6e-82cab1a7c1cc",
55 | "metadata": {},
56 | "outputs": [],
57 | "source": [
58 | "# df =df.to_pandas()"
59 | ]
60 | },
61 | {
62 | "cell_type": "markdown",
63 | "id": "0b6a458e-61dd-4645-a122-366a40fa9f95",
64 | "metadata": {
65 | "tags": []
66 | },
67 | "source": [
68 | "#### MAKE function"
69 | ]
70 | },
71 | {
72 | "cell_type": "code",
73 | "execution_count": null,
74 | "id": "186e35dc-e5e8-4820-b02b-6baea50ca749",
75 | "metadata": {},
76 | "outputs": [],
77 | "source": [
78 | "def Random_Points_in_Bounds(row): \n",
79 | " polygon = row.iloc[0]\n",
80 | " number = row.iloc[1]\n",
81 | " minx, miny, maxx, maxy = polygon.bounds\n",
82 | " x = np.random.uniform( minx, maxx, number )\n",
83 | " y = np.random.uniform( miny, maxy, number )\n",
84 | " return [x, y]\n",
85 | "\n",
86 | "def makeXYpair(row):\n",
87 | " l1 = row[0]\n",
88 | " l2 = row[1]\n",
89 | " return list(map(lambda x, y:[x,y], l1, l2))\n",
90 | "\n",
91 | "\n",
92 | "def exec_data(state_key_list):\n",
93 | " c=0\n",
94 | " for i in state_key_list:\n",
95 | " c+=1\n",
96 | " if i< 10:\n",
97 | " i_str = '0'+str(i)\n",
98 | " else:\n",
99 | " i_str = str(i)\n",
100 | " path ='data/tl_shapefiles/tl_2021_%s_tabblock20.shp'%(i_str)\n",
101 | " print(\"started reading shape file for state \", states[i])\n",
102 | " if os.path.isfile(path): \n",
103 | " gpdf = gpd.read_file(path)[['GEOID20', 'geometry']].sort_values('GEOID20').reset_index(drop=True)\n",
104 | " gpdf.GEOID20 = gpdf.GEOID20.astype('int64')\n",
105 | " print(\"completed reading shape file for state \", states[i])\n",
106 | " df_temp = df.query('STATE == @i')[['ID20', 'number','COUNTY','P_delta','P_net']]\n",
107 | " merged_df =pd.merge(gpdf,df_temp[['ID20','number']],left_on='GEOID20',right_on='ID20',how='inner')\n",
108 | " merged_df = merged_df[merged_df.number!=0].reset_index(drop=True)\n",
109 | " merged_df =merged_df.reset_index(drop=True).drop(columns=['GEOID20'])\n",
110 | "\n",
111 | " print(\"starting to generate data for \"+str(states[i])+\"... \")\n",
112 | " t1 = datetime.datetime.now()\n",
113 | " population_df = pd.DataFrame(merged_df[['geometry','number']].apply(Random_Points_in_Bounds,axis=1),columns=['population'])\n",
114 | " points_df = population_df['population'].apply(makeXYpair)\n",
115 | " points_df = pd.DataFrame(points_df.explode()).reset_index()\n",
116 | " \n",
117 | " pop_list =points_df['population'].to_list()\n",
118 | " final_df =pd.DataFrame(pop_list,columns=['x','y']).reset_index(drop=True)\n",
119 | " \n",
120 | " ids = merged_df.ID20.to_list()\n",
121 | " number =merged_df.number.to_list()\n",
122 | " \n",
123 | " rows = []\n",
124 | " for id20, n in zip(ids,number):\n",
125 | " rows.extend([id20]*n)\n",
126 | " \n",
127 | " \n",
128 | " final_df['ID20'] = pd.Series(rows)\n",
129 | " final_df = final_df.sort_values('ID20').reset_index(drop=True)\n",
130 | " final_df = pd.merge(final_df,df_temp, on='ID20',how='left')\n",
131 | " \n",
132 | " \n",
133 | " final_df.to_csv('data/migration_files1/migration_%s'%str(states[i])+'.csv', index=False)\n",
134 | " print(\"Processing complete for\", states[i])\n",
135 | " print('Processing for '+str(states[i])+' complete \\n total time', datetime.datetime.now() - t1)\n",
136 | " \n",
137 | " del(df_temp)\n",
138 | " else:\n",
139 | " print(\"shape file does not exist\")\n",
140 | " continue"
141 | ]
142 | },
143 | {
144 | "cell_type": "code",
145 | "execution_count": null,
146 | "id": "b5e2b4fb-c8e2-48fc-a496-fb0e0b5387a4",
147 | "metadata": {},
148 | "outputs": [],
149 | "source": [
150 | "# states = {1 :\"AL\",2 :\"AK\",4 :\"AZ\",5 :\"AR\",6 :\"CA\",8 :\"CO\",9 :\"CT\",10:\"DE\",11:\"DC\",12:\"FL\",13:\"GA\",15:\"HI\",\n",
151 | "# 16:\"ID\",17:\"IL\",18:\"IN\",19:\"IA\",20:\"KS\",21:\"KY\",22:\"LA\",23:\"ME\",24:\"MD\",25:\"MA\",26:\"MI\",27:\"MN\",\n",
152 | "# 28:\"MS\",29:\"MO\",30:\"MT\",31:\"NE\",32:\"NV\",33:\"NH\",34:\"NJ\",35:\"NM\",36:\"NY\",37:\"NC\",38:\"ND\",39:\"OH\",\n",
153 | "# 40:\"OK\",41:\"OR\",42:\"PA\",44:\"RI\",45:\"SC\",46:\"SD\",47:\"TN\",48:\"TX\",49:\"UT\",50:\"VT\",51:\"VA\",53:\"WA\",\n",
154 | "# 54:\"WV\",55:\"WI\",56:\"WY\",72:\"PR\"}\n",
155 | "states = { 12:\"FL\",13:\"GA\",15:\"HI\",16:\"ID\",17:\"IL\",18:\"IN\",19:\"IA\",20:\"KS\"}"
156 | ]
157 | },
158 | {
159 | "cell_type": "code",
160 | "execution_count": null,
161 | "id": "68872dfa-8eb7-44a3-9bdf-73376f8c28ec",
162 | "metadata": {
163 | "tags": []
164 | },
165 | "outputs": [],
166 | "source": [
167 | "exec_data(states.keys())"
168 | ]
169 | },
170 | {
171 | "cell_type": "markdown",
172 | "id": "42b474dd-8db8-48d1-8a66-62bcc8dbad27",
173 | "metadata": {
174 | "tags": []
175 | },
176 | "source": [
177 | "### Concat States"
178 | ]
179 | },
180 | {
181 | "cell_type": "code",
182 | "execution_count": null,
183 | "id": "a1f2fd86-6a7d-41db-96e9-2665c90bf4c4",
184 | "metadata": {},
185 | "outputs": [],
186 | "source": [
187 | "def merge_shape_and_states(state_key_list):\n",
188 | " concat_states = cudf.DataFrame()\n",
189 | " \n",
190 | " for i in state_key_list:\n",
191 | " if i< 10:\n",
192 | " i_str = '0'+str(i)\n",
193 | " else:\n",
194 | " i_str = str(i)\n",
195 | " path = 'data/migration_files1/migration_%s'%str(states[i])+'.csv'\n",
196 | " if os.path.isfile(path): \n",
197 | " temp = cudf.read_csv(path,dtype={'ID20':'int64','x':'float32','y':'float32'})# Load shape files\n",
198 | " concat_states = cudf.concat([concat_states,temp])\n",
199 | " else:\n",
200 | " print(path)\n",
201 | " print(\"shape file does not exist\")\n",
202 | " continue\n",
203 | " return concat_states"
204 | ]
205 | },
206 | {
207 | "cell_type": "code",
208 | "execution_count": null,
209 | "id": "e88d458c-2016-4076-b513-59e1277f751b",
210 | "metadata": {
211 | "tags": []
212 | },
213 | "outputs": [],
214 | "source": [
215 | "indv_df = merge_shape_and_states(states.keys())\n",
216 | "indv_df.rename(columns={'GEOID20':'ID20'},inplace=True)\n",
217 | "indv_df.head()"
218 | ]
219 | },
220 | {
221 | "cell_type": "markdown",
222 | "id": "fdc83845-75c0-4ad0-8538-1abeca2190cc",
223 | "metadata": {},
224 | "source": [
225 | "### Load saved files"
226 | ]
227 | },
228 | {
229 | "cell_type": "code",
230 | "execution_count": null,
231 | "id": "7d37b71c-fad2-46b5-9324-f0e67bf85a09",
232 | "metadata": {},
233 | "outputs": [],
234 | "source": [
235 | "pickle.dump(indv_df,open('fulldata_gpu_2','wb'))\n",
236 | "# indv_df = pickle.load(open('fulldata_gpu','rb'))"
237 | ]
238 | },
239 | {
240 | "cell_type": "code",
241 | "execution_count": null,
242 | "id": "d94ea6f0-0c6f-4932-87cb-53ad28b3b57a",
243 | "metadata": {},
244 | "outputs": [],
245 | "source": [
246 | "# indv_df = indv_df.to_pandas()"
247 | ]
248 | },
249 | {
250 | "cell_type": "code",
251 | "execution_count": null,
252 | "id": "ebae939f-f03d-478d-ad45-c0fee7670f0a",
253 | "metadata": {},
254 | "outputs": [],
255 | "source": [
256 | "import dask_cudf\nindv_df = dask_cudf.from_cudf(indv_df, npartitions=2).persist()"
257 | ]
258 | },
259 | {
260 | "cell_type": "code",
261 | "execution_count": null,
262 | "id": "efe847c5-892c-4bee-bacb-38e57ae56ebb",
263 | "metadata": {
264 | "tags": []
265 | },
266 | "outputs": [],
267 | "source": [
268 | "# dataset = pd.merge(indv_df,df,on='ID20',how='left')\n",
269 | "dataset = indv_df.merge(df,on='ID20',how='left') # merge dask dfs"
270 | ]
271 | },
272 | {
273 | "cell_type": "code",
274 | "execution_count": null,
275 | "id": "36e6fb1b-a420-4ba2-8b25-b14f5978ca9b",
276 | "metadata": {},
277 | "outputs": [],
278 | "source": [
279 | "len(dataset)"
280 | ]
281 | },
282 | {
283 | "cell_type": "code",
284 | "execution_count": null,
285 | "id": "5b63b35b-3774-4900-a1c2-1dc6615784eb",
286 | "metadata": {},
287 | "outputs": [],
288 | "source": [
289 | "del(indv_df)\n",
290 | "del(df)"
291 | ]
292 | },
293 | {
294 | "cell_type": "code",
295 | "execution_count": null,
296 | "id": "4f85b884-4e84-451c-a170-ce25631cc922",
297 | "metadata": {
298 | "tags": []
299 | },
300 | "outputs": [],
301 | "source": [
302 | "dataset = dataset.sort_values('ID20')\n",
303 | "dataset = dataset.drop(columns=['GISJOIN'])\n",
304 | "dataset.head()"
305 | ]
306 | },
307 | {
308 | "cell_type": "markdown",
309 | "id": "e0a789ed-4c60-43fd-ac76-1ac99a971110",
310 | "metadata": {
311 | "tags": []
312 | },
313 | "source": [
314 | "### Viz check"
315 | ]
316 | },
317 | {
318 | "cell_type": "code",
319 | "execution_count": null,
320 | "id": "f8643ed6-4b19-4733-a2a6-eba71884e700",
321 | "metadata": {},
322 | "outputs": [],
323 | "source": [
324 | "from holoviews.element.tiles import CartoDark\n",
325 | "import holoviews as hv\n",
326 | "from holoviews.operation.datashader import datashade,rasterize,shade\n",
327 | "from plotly.colors import sequential\n",
328 | "hv.extension('plotly')"
329 | ]
330 | },
331 | {
332 | "cell_type": "code",
333 | "execution_count": null,
334 | "id": "85562104-aef7-48c0-85bd-347316a3f633",
335 | "metadata": {},
336 | "outputs": [],
337 | "source": [
338 | "dataset[\"easting\"], dataset[\"northing\"] = hv.Tiles.lon_lat_to_easting_northing(dataset[\"x\"], dataset[\"y\"])\n",
339 | "dataset.head()"
340 | ]
341 | },
342 | {
343 | "cell_type": "code",
344 | "execution_count": null,
345 | "id": "d8653497-d17e-4c38-b1e8-732d424cae04",
346 | "metadata": {},
347 | "outputs": [],
348 | "source": [
349 | "dataset = hv.Dataset(dataset)"
350 | ]
351 | },
352 | {
353 | "cell_type": "code",
354 | "execution_count": null,
355 | "id": "2a19ce9a-9d34-4507-8d60-93dab88dd289",
356 | "metadata": {},
357 | "outputs": [],
358 | "source": [
359 | "mapbox_token = 'pk.eyJ1IjoibmlzaGFudGoiLCJhIjoiY2w1aXpwMXlkMDEyaDNjczBkZDVjY2l6dyJ9.7oLijsue-xOICmTqNInrBQ'\n",
360 | "tiles= hv.Tiles().opts(mapboxstyle=\"dark\", accesstoken=mapbox_token)\n",
361 | "points = datashade(hv.Points(dataset, [\"easting\", \"northing\"]),cmap=sequential.Plasma)"
362 | ]
363 | },
364 | {
365 | "cell_type": "code",
366 | "execution_count": null,
367 | "id": "5d1fc4fe-e329-4f8c-984b-752b8c87246c",
368 | "metadata": {
369 | "tags": []
370 | },
371 | "outputs": [],
372 | "source": [
373 | "(tiles*points).opts(width=1800, height=500)"
374 | ]
375 | }
376 | ],
377 | "metadata": {
378 | "kernelspec": {
379 | "display_name": "Python 3 (ipykernel)",
380 | "language": "python",
381 | "name": "python3"
382 | },
383 | "language_info": {
384 | "codemirror_mode": {
385 | "name": "ipython",
386 | "version": 3
387 | },
388 | "file_extension": ".py",
389 | "mimetype": "text/x-python",
390 | "name": "python",
391 | "nbconvert_exporter": "python",
392 | "pygments_lexer": "ipython3",
393 | "version": "3.9.13"
394 | }
395 | },
396 | "nbformat": 4,
397 | "nbformat_minor": 5
398 | }
399 |
--------------------------------------------------------------------------------
/data_prep_total_population/.ipynb_checkpoints/SeparateTotalDatasetsByState-checkpoint.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "## Separate Total Population dataset by States"
8 | ]
9 | },
10 | {
11 | "cell_type": "code",
12 | "execution_count": 1,
13 | "metadata": {},
14 | "outputs": [],
15 | "source": [
16 | "import cudf\n",
17 | "import pickle"
18 | ]
19 | },
20 | {
21 | "cell_type": "code",
22 | "execution_count": 2,
23 | "metadata": {},
24 | "outputs": [
25 | {
26 | "data": {
27 | "text/html": [
28 | "
\n",
29 | "\n",
42 | "
\n",
43 | " \n",
44 | " \n",
45 | " | \n",
46 | " easting | \n",
47 | " northing | \n",
48 | " race | \n",
49 | " net | \n",
50 | " county | \n",
51 | "
\n",
52 | " \n",
53 | " \n",
54 | " \n",
55 | " 0 | \n",
56 | " -9626792.0 | \n",
57 | " 3825189.75 | \n",
58 | " 1 | \n",
59 | " 0 | \n",
60 | " 0 | \n",
61 | "
\n",
62 | " \n",
63 | " 1 | \n",
64 | " -9626832.0 | \n",
65 | " 3825073.75 | \n",
66 | " 1 | \n",
67 | " 0 | \n",
68 | " 0 | \n",
69 | "
\n",
70 | " \n",
71 | " 2 | \n",
72 | " -9627101.0 | \n",
73 | " 3825153.50 | \n",
74 | " 1 | \n",
75 | " 0 | \n",
76 | " 0 | \n",
77 | "
\n",
78 | " \n",
79 | " 3 | \n",
80 | " -9627149.0 | \n",
81 | " 3825322.75 | \n",
82 | " 1 | \n",
83 | " 0 | \n",
84 | " 0 | \n",
85 | "
\n",
86 | " \n",
87 | " 4 | \n",
88 | " -9627159.0 | \n",
89 | " 3825334.75 | \n",
90 | " 1 | \n",
91 | " 0 | \n",
92 | " 0 | \n",
93 | "
\n",
94 | " \n",
95 | "
\n",
96 | "
"
97 | ],
98 | "text/plain": [
99 | " easting northing race net county\n",
100 | "0 -9626792.0 3825189.75 1 0 0\n",
101 | "1 -9626832.0 3825073.75 1 0 0\n",
102 | "2 -9627101.0 3825153.50 1 0 0\n",
103 | "3 -9627149.0 3825322.75 1 0 0\n",
104 | "4 -9627159.0 3825334.75 1 0 0"
105 | ]
106 | },
107 | "execution_count": 2,
108 | "metadata": {},
109 | "output_type": "execute_result"
110 | }
111 | ],
112 | "source": [
113 | "# Load the dataset\n",
114 | "df = cudf.read_parquet('../data/total_population_dataset.parquet')\n",
115 | "df.head()"
116 | ]
117 | },
118 | {
119 | "cell_type": "code",
120 | "execution_count": 3,
121 | "metadata": {},
122 | "outputs": [
123 | {
124 | "data": {
125 | "text/html": [
126 | "\n",
127 | "\n",
140 | "
\n",
141 | " \n",
142 | " \n",
143 | " | \n",
144 | " idx | \n",
145 | " county | \n",
146 | " county_lower | \n",
147 | "
\n",
148 | " \n",
149 | " \n",
150 | " \n",
151 | " 0 | \n",
152 | " 0 | \n",
153 | " Autauga County | \n",
154 | " autauga county | \n",
155 | "
\n",
156 | " \n",
157 | " 1 | \n",
158 | " 1 | \n",
159 | " Baldwin County | \n",
160 | " baldwin county | \n",
161 | "
\n",
162 | " \n",
163 | " 2 | \n",
164 | " 2 | \n",
165 | " Barbour County | \n",
166 | " barbour county | \n",
167 | "
\n",
168 | " \n",
169 | " 3 | \n",
170 | " 3 | \n",
171 | " Bibb County | \n",
172 | " bibb county | \n",
173 | "
\n",
174 | " \n",
175 | " 4 | \n",
176 | " 4 | \n",
177 | " Blount County | \n",
178 | " blount county | \n",
179 | "
\n",
180 | " \n",
181 | "
\n",
182 | "
"
183 | ],
184 | "text/plain": [
185 | " idx county county_lower\n",
186 | "0 0 Autauga County autauga county\n",
187 | "1 1 Baldwin County baldwin county\n",
188 | "2 2 Barbour County barbour county\n",
189 | "3 3 Bibb County bibb county\n",
190 | "4 4 Blount County blount county"
191 | ]
192 | },
193 | "execution_count": 3,
194 | "metadata": {},
195 | "output_type": "execute_result"
196 | }
197 | ],
198 | "source": [
199 | "# Load the state to county mapping\n",
200 | "id2county = pickle.load(open('../id2county.pkl','rb'))\n",
201 | "df_counties = cudf.DataFrame(dict(idx=list(id2county.keys()), county=list(id2county.values())))\n",
202 | "\n",
203 | "# Lowercase the county names for easier merging\n",
204 | "df_counties['county_lower'] = df_counties.county.str.lower()\n",
205 | "df_counties.head()"
206 | ]
207 | },
208 | {
209 | "cell_type": "code",
210 | "execution_count": 4,
211 | "metadata": {},
212 | "outputs": [
213 | {
214 | "data": {
215 | "text/html": [
216 | "\n",
217 | "\n",
230 | "
\n",
231 | " \n",
232 | " \n",
233 | " | \n",
234 | " county | \n",
235 | " type | \n",
236 | " state | \n",
237 | "
\n",
238 | " \n",
239 | " \n",
240 | " \n",
241 | " 0 | \n",
242 | " Harrison | \n",
243 | " county | \n",
244 | " Missouri | \n",
245 | "
\n",
246 | " \n",
247 | " 1 | \n",
248 | " Jefferson | \n",
249 | " county | \n",
250 | " Missouri | \n",
251 | "
\n",
252 | " \n",
253 | " 2 | \n",
254 | " Newton | \n",
255 | " county | \n",
256 | " Missouri | \n",
257 | "
\n",
258 | " \n",
259 | " 3 | \n",
260 | " Wayne | \n",
261 | " county | \n",
262 | " Missouri | \n",
263 | "
\n",
264 | " \n",
265 | " 4 | \n",
266 | " Lincoln | \n",
267 | " county | \n",
268 | " Montana | \n",
269 | "
\n",
270 | " \n",
271 | "
\n",
272 | "
"
273 | ],
274 | "text/plain": [
275 | " county type state\n",
276 | "0 Harrison county Missouri\n",
277 | "1 Jefferson county Missouri\n",
278 | "2 Newton county Missouri\n",
279 | "3 Wayne county Missouri\n",
280 | "4 Lincoln county Montana"
281 | ]
282 | },
283 | "execution_count": 4,
284 | "metadata": {},
285 | "output_type": "execute_result"
286 | }
287 | ],
288 | "source": [
289 | "# Dataset downloaded from https://public.opendatasoft.com/explore/dataset/georef-united-states-of-america-county/export/?disjunctive.ste_code&disjunctive.ste_name&disjunctive.coty_code&disjunctive.coty_name\n",
290 | "county_state_df = cudf.read_csv('../data/us-counties1.csv', delimiter=\";\")[['Official Name County', 'Type', 'Official Name State']].dropna()\n",
291 | "county_state_df.columns = ['county', 'type', 'state']\n",
292 | "county_state_df.head()"
293 | ]
294 | },
295 | {
296 | "cell_type": "code",
297 | "execution_count": 5,
298 | "metadata": {},
299 | "outputs": [],
300 | "source": [
301 | "# Add the type to the county name\n",
302 | "county_state_df['county'] = county_state_df.apply(lambda row: row['county'] + ' ' + row['type'], axis=1)\n",
303 | "\n",
304 | "# Remove non-ascii characters and abbreviations to match the other id2county mapping dataset\n",
305 | "county_state_df['county'] = county_state_df.county.to_pandas().replace({r'[^\\x00-\\x7F]+': '', r'([A-Z][a-z]+)([A-Z]+)': r'\\1'}, regex=True)\n",
306 | "\n",
307 | "# Lowercase the county names for easier merging\n",
308 | "county_state_df['county_lower'] = county_state_df['county'].str.lower()"
309 | ]
310 | },
311 | {
312 | "cell_type": "code",
313 | "execution_count": 6,
314 | "metadata": {},
315 | "outputs": [],
316 | "source": [
317 | "# Merge the datasets and drop duplicates to get the state for each county in the total population dataset\n",
318 | "df_map_county_to_states = df_counties.merge(county_state_df, on='county_lower', how='left', suffixes=['', '_y']).drop_duplicates(subset=['county_lower'])[['idx', 'county', 'state' ]]"
319 | ]
320 | },
321 | {
322 | "cell_type": "code",
323 | "execution_count": 9,
324 | "metadata": {},
325 | "outputs": [],
326 | "source": [
327 | "# Fill in the states for unavailable states manually by looking at the counties\n",
328 | "# Carson City, Nevada\n",
329 | "# District of Columbia, Washington DC\n",
330 | "# Remaining, Connecticut\n",
331 | "df_map_county_to_states.loc[df_map_county_to_states.county == 'Carson City', 'state'] = 'Nevada'\n",
332 | "df_map_county_to_states.loc[df_map_county_to_states.county == 'District of Columbia', 'state'] = 'Washington DC'\n",
333 | "df_map_county_to_states.loc[df_map_county_to_states.isna().any(axis=1), 'state'] = 'Connecticut'"
334 | ]
335 | },
336 | {
337 | "cell_type": "code",
338 | "execution_count": 10,
339 | "metadata": {},
340 | "outputs": [],
341 | "source": [
342 | "# Save the mapping\n",
343 | "df_map_county_to_states.to_parquet('../data/county_to_state_mapping.parquet')"
344 | ]
345 | },
346 | {
347 | "cell_type": "code",
348 | "execution_count": 11,
349 | "metadata": {},
350 | "outputs": [
351 | {
352 | "data": {
353 | "text/html": [
354 | "\n",
355 | "\n",
368 | "
\n",
369 | " \n",
370 | " \n",
371 | " | \n",
372 | " idx | \n",
373 | " county | \n",
374 | " state | \n",
375 | "
\n",
376 | " \n",
377 | " \n",
378 | " \n",
379 | " 0 | \n",
380 | " 144 | \n",
381 | " Lonoke County | \n",
382 | " Arkansas | \n",
383 | "
\n",
384 | " \n",
385 | " 1 | \n",
386 | " 145 | \n",
387 | " Miller County | \n",
388 | " Georgia | \n",
389 | "
\n",
390 | " \n",
391 | " 3 | \n",
392 | " 146 | \n",
393 | " Mississippi County | \n",
394 | " Missouri | \n",
395 | "
\n",
396 | " \n",
397 | " 5 | \n",
398 | " 147 | \n",
399 | " Nevada County | \n",
400 | " California | \n",
401 | "
\n",
402 | " \n",
403 | " 7 | \n",
404 | " 148 | \n",
405 | " Newton County | \n",
406 | " Texas | \n",
407 | "
\n",
408 | " \n",
409 | " ... | \n",
410 | " ... | \n",
411 | " ... | \n",
412 | " ... | \n",
413 | "
\n",
414 | " \n",
415 | " 2954 | \n",
416 | " 76 | \n",
417 | " Fairbanks North Star Borough | \n",
418 | " Alaska | \n",
419 | "
\n",
420 | " \n",
421 | " 2955 | \n",
422 | " 78 | \n",
423 | " Hoonah-Angoon Census Area | \n",
424 | " Alaska | \n",
425 | "
\n",
426 | " \n",
427 | " 2956 | \n",
428 | " 79 | \n",
429 | " Juneau City and Borough | \n",
430 | " Alaska | \n",
431 | "
\n",
432 | " \n",
433 | " 2958 | \n",
434 | " 74 | \n",
435 | " Denali Borough | \n",
436 | " Alaska | \n",
437 | "
\n",
438 | " \n",
439 | " 2959 | \n",
440 | " 77 | \n",
441 | " Haines Borough | \n",
442 | " Alaska | \n",
443 | "
\n",
444 | " \n",
445 | "
\n",
446 | "
1955 rows × 3 columns
\n",
447 | "
"
448 | ],
449 | "text/plain": [
450 | " idx county state\n",
451 | "0 144 Lonoke County Arkansas\n",
452 | "1 145 Miller County Georgia\n",
453 | "3 146 Mississippi County Missouri\n",
454 | "5 147 Nevada County California\n",
455 | "7 148 Newton County Texas\n",
456 | "... ... ... ...\n",
457 | "2954 76 Fairbanks North Star Borough Alaska\n",
458 | "2955 78 Hoonah-Angoon Census Area Alaska\n",
459 | "2956 79 Juneau City and Borough Alaska\n",
460 | "2958 74 Denali Borough Alaska\n",
461 | "2959 77 Haines Borough Alaska\n",
462 | "\n",
463 | "[1955 rows x 3 columns]"
464 | ]
465 | },
466 | "execution_count": 11,
467 | "metadata": {},
468 | "output_type": "execute_result"
469 | }
470 | ],
471 | "source": [
472 | "df_map_county_to_states"
473 | ]
474 | }
475 | ],
476 | "metadata": {
477 | "kernelspec": {
478 | "display_name": "Python 3 (ipykernel)",
479 | "language": "python",
480 | "name": "python3"
481 | },
482 | "language_info": {
483 | "codemirror_mode": {
484 | "name": "ipython",
485 | "version": 3
486 | },
487 | "file_extension": ".py",
488 | "mimetype": "text/x-python",
489 | "name": "python",
490 | "nbconvert_exporter": "python",
491 | "pygments_lexer": "ipython3",
492 | "version": "3.10.11"
493 | }
494 | },
495 | "nbformat": 4,
496 | "nbformat_minor": 4
497 | }
498 |
--------------------------------------------------------------------------------
/data_prep_total_population/README.md:
--------------------------------------------------------------------------------
1 | # Total population dataset generation
2 |
3 | ## Order of execution
4 |
5 | 1. map_blocks_and_calc_population
6 | 2. gen_table_with_migration
7 | 3. gen_total_population_points_script
8 | 4. add_race_net_county_to_population
9 |
10 | ## Mappings
11 |
12 | ### Net
13 |
14 | - 1: Inward Migration
15 | - 0: Stationary
16 | - -1: Outward Migration
17 |
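
The net codes mirror how `gen_table_with_migration` derives `block_net` from each block's 2020 population (`P20`) and 2010-equivalent population (`eq_P10`); a minimal sketch (the function name is illustrative, not part of the repo):

```python
# Net migration code for a census block:
#   1 = inward migration, 0 = stationary, -1 = outward migration
# It is the sign of the rounded difference between the 2020 population
# and the 2010-equivalent population.
def block_net(p20: float, eq_p10: float) -> int:
    diff = round(p20 - eq_p10)
    if diff > 0:
        return 1
    if diff < 0:
        return -1
    return 0
```

For example, a block that went from 30.5 equivalent residents in 2010 to 21 in 2020 codes as -1.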
18 | ### Race
19 |
20 | - 0: All
21 | - 1: White
22 | - 2: African American
23 | - 3: American Indian
24 | - 4: Asian alone
25 | - 5: Native Hawaiian
26 | - 6: Other Race alone
27 | - 7: Two or More
28 |
29 | ### County
30 |
31 | Mappings for counties can be found in the `id2county.pkl` file in the repository root.
32 |
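The race and county codes above can be decoded when reading the final dataset; a minimal sketch (`decode_race` is an illustrative helper, not part of the repo, and the dict restates the table above):

```python
# Race code -> label, restating the mapping documented above
RACE = {
    0: "All",
    1: "White",
    2: "African American",
    3: "American Indian",
    4: "Asian alone",
    5: "Native Hawaiian",
    6: "Other Race alone",
    7: "Two or More",
}

# The county id -> name mapping ships as id2county.pkl in the repository root:
#   import pickle
#   with open("id2county.pkl", "rb") as f:
#       id2county = pickle.load(f)

def decode_race(code: int) -> str:
    """Return the human-readable label for a race code."""
    return RACE[code]
```
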
33 | ### Final Dataset
34 |
35 | You can download the final total population dataset [here](https://data.rapids.ai/viz-data/total_population_dataset.parquet).
36 |
--------------------------------------------------------------------------------
/data_prep_total_population/SeparateTotalDatasetsByState.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "## Separate Total Population dataset by States"
8 | ]
9 | }
10 | ],
11 | "metadata": {
12 | "kernelspec": {
13 | "display_name": "Python 3 (ipykernel)",
14 | "language": "python",
15 | "name": "python3"
16 | },
17 | "language_info": {
18 | "codemirror_mode": {
19 | "name": "ipython",
20 | "version": 3
21 | },
22 | "file_extension": ".py",
23 | "mimetype": "text/x-python",
24 | "name": "python",
25 | "nbconvert_exporter": "python",
26 | "pygments_lexer": "ipython3",
27 | "version": "3.10.11"
28 | }
29 | },
30 | "nbformat": 4,
31 | "nbformat_minor": 4
32 | }
33 |
--------------------------------------------------------------------------------
/data_prep_total_population/SeparateTotalDatasetsByState.py:
--------------------------------------------------------------------------------
1 | import cudf
2 | import cuspatial
3 | import geopandas as gpd
4 | import os
5 | from shapely.geometry import Polygon, MultiPolygon
6 |
7 | DATA_PATH = "../data"
8 | DATA_PATH_STATE = f"{DATA_PATH}/state-wise-population"
9 |
10 | # Create the state-wise output directory if it does not exist
11 | if not os.path.exists(DATA_PATH_STATE):
12 | os.makedirs(DATA_PATH_STATE)
13 |
14 | # Read the total population dataset as a cudf dataframe from the parquet file
15 | df = cudf.read_parquet(f"{DATA_PATH}/total_population_dataset.parquet")
16 |
17 | # Read the shapefile with geopandas, keep the state names and geometries, and reproject to EPSG:3857
18 | # downloaded from https://hub.arcgis.com/datasets/1b02c87f62d24508970dc1a6df80c98e/explore
19 | shapefile_path = f"{DATA_PATH}/States_shapefile/States_shapefile.shp"
20 | states_data = gpd.read_file(shapefile_path)[
21 | ["State_Code", "State_Name", "geometry"]
22 | ].to_crs(3857)
23 |
24 | print("Number of states to process: ", len(states_data))
25 | print("Number of points in total population dataset: ", len(df))
26 | print("Processing states with Polygon geometries...")
27 |
28 | processed_states = 0
29 | # Loop through the states and get the points in each state and save as a separate dataframe
30 | # process all Polygon geometries in the shapefile
31 | for index, row in states_data.iterrows():
32 | if isinstance(row["geometry"], MultiPolygon):
33 | # skip MultiPolygon geometries
34 | continue
35 |
36 | state_name = row["State_Name"]
37 | processed_states += 1
38 | print(
39 | "Processing state: ",
40 | state_name,
41 | " (",
42 | processed_states,
43 | "/",
44 | len(states_data),
45 | ")",
46 | )
47 |
48 | if os.path.exists(f"{DATA_PATH_STATE}/{state_name}.parquet"):
49 | print("State already processed. Skipping...")
50 | continue
51 |
52 | # Convert the state's Polygon geometry to a cuspatial GeoSeries and use
53 | # cuspatial point_in_polygon to find the dataset points that fall inside it
54 | state_geometry = cuspatial.GeoSeries(
55 | gpd.GeoSeries(row["geometry"]), index=["selection"]
56 | )
57 |
58 | # Loop through the total population dataset in batches of 50 million points to avoid OOM issues
59 | batch_size = 50_000_000
60 | points_in_state = cudf.DataFrame({"selection": []})
61 | for i in range(0, len(df), batch_size):
62 | # get the batch of points
63 | batch = df[i : i + batch_size][["easting", "northing"]]
64 | # convert to GeoSeries
65 | points = cuspatial.GeoSeries.from_points_xy(
66 | batch.interleave_columns().astype("float64")
67 | )
68 | # get the points in the state from the batch
69 | points_in_state_current_batch = cuspatial.point_in_polygon(
70 | points, state_geometry
71 | )
72 | # append the points in the state from the batch to the points_in_state dataframe
73 | points_in_state = cudf.concat([points_in_state, points_in_state_current_batch])
74 | # free up memory
75 | del batch
76 |
77 | print(
78 | f"Number of points in {state_name}: ",
79 | df[points_in_state["selection"]].shape[0],
80 | )
81 |
82 | # save the points in the state as a separate dataframe
83 | df[points_in_state["selection"]].to_parquet(
84 | f"{DATA_PATH_STATE}/{state_name}.parquet"
85 | )
86 |
87 | print("Processing states with MultiPolygon geometries...")
88 | # process all MultiPolygon geometries in the shapefile
89 | for index, row in states_data.iterrows():
90 | if isinstance(row["geometry"], Polygon):
91 | # skip Polygon geometries
92 | continue
93 |
94 | state_name = row["State_Name"]
95 | processed_states += 1
96 | print(
97 | "Processing state: ",
98 | state_name,
99 | " (",
100 | processed_states,
101 | "/",
102 | len(states_data),
103 | ")",
104 | )
105 | if os.path.exists(f"{DATA_PATH_STATE}/{state_name}.parquet"):
106 | print("State already processed. Skipping...")
107 | continue
108 |
109 | # Run point_in_polygon for each polygon of the MultiPolygon and combine the results
110 | points_in_state = None
111 | for polygon in list(row["geometry"].geoms):
112 | # process each polygon in the MultiPolygon
113 | state_geometry = cuspatial.GeoSeries(
114 | gpd.GeoSeries(polygon), index=["selection"]
115 | )
116 |
117 | # Loop through the total population dataset in batches of 50 million points to avoid OOM issues
118 | batch_size = 50_000_000
119 | points_in_state_current_polygon = cudf.DataFrame({"selection": []})
120 | for i in range(0, len(df), batch_size):
121 | # get the batch of points
122 | batch = df[i : i + batch_size][["easting", "northing"]]
123 | # convert to GeoSeries
124 | points = cuspatial.GeoSeries.from_points_xy(
125 | batch.interleave_columns().astype("float64")
126 | )
127 | # get the points in the state from the batch
128 | points_in_state_current_batch = cuspatial.point_in_polygon(
129 | points, state_geometry
130 | )
131 | # append the points in the state from the batch to the points_in_state_current_polygon dataframe
132 | points_in_state_current_polygon = cudf.concat(
133 | [points_in_state_current_polygon, points_in_state_current_batch]
134 | )
135 | # free up memory
136 | del batch
137 |
138 | points_in_state = (
139 | points_in_state_current_polygon
140 | if points_in_state is None
141 |         else points_in_state | points_in_state_current_polygon  # keep points inside any polygon
142 | )
143 |
144 | print(
145 | f"Number of points in {state_name}: ",
146 | df[points_in_state["selection"]].shape[0],
147 | )
148 |
149 | # save the points in the state as a separate dataframe
150 | df[points_in_state["selection"]].to_parquet(
151 | f"{DATA_PATH_STATE}/{state_name}.parquet"
152 | )
153 |
--------------------------------------------------------------------------------
/data_prep_total_population/gen_table_with_migration.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "id": "dd3c16ee-6929-4ecf-a442-9555d0b97c03",
6 | "metadata": {},
7 | "source": [
8 | "#### Objective:- Clean and save needed attributes and create table for generating migration points."
9 | ]
10 | },
11 | {
12 | "cell_type": "code",
13 | "execution_count": 1,
14 | "id": "7f7bc3c9-c297-4e02-9d4f-3e27d2223492",
15 | "metadata": {},
16 | "outputs": [],
17 | "source": [
18 | "import pandas as pd\n",
19 | "import geopandas as gpd\n",
20 | "import ast,os,random\n",
21 | "pd.set_option('display.float_format','{:.1f}'.format)\n",
22 | "import warnings\n",
23 | "warnings.filterwarnings('ignore')\n",
24 | "import cudf, cupy as cp\n",
25 | "import numpy as np\n",
26 | "import time\n",
27 | "import math\n",
28 | "import pickle\n",
29 | "# pd.set_option('display.max_colwidth', -1)"
30 | ]
31 | },
32 | {
33 | "cell_type": "markdown",
34 | "id": "a5b2e784-179d-404f-be48-c2d84bbcf9a7",
35 | "metadata": {
36 | "tags": []
37 | },
38 | "source": [
39 | "#### Load data"
40 | ]
41 | },
42 | {
43 | "cell_type": "code",
44 | "execution_count": 2,
45 | "id": "c9e38991-3723-4edb-a4b4-f890f24bd85f",
46 | "metadata": {},
47 | "outputs": [],
48 | "source": [
49 | "df = pd.read_csv('data/mapped_blocks_full.csv',encoding='unicode_escape',usecols=['ID20','STATE','COUNTY','P20','eq_P10'])"
50 | ]
51 | },
52 | {
53 | "cell_type": "code",
54 | "execution_count": 3,
55 | "id": "b5dbead9-dd67-409a-b81e-7c8dd69c97cc",
56 | "metadata": {},
57 | "outputs": [
58 | {
59 | "data": {
60 | "text/plain": [
61 | "334735155"
62 | ]
63 | },
64 | "execution_count": 3,
65 | "metadata": {},
66 | "output_type": "execute_result"
67 | }
68 | ],
69 | "source": [
70 | "df.P20.sum()"
71 | ]
72 | },
73 | {
74 | "cell_type": "code",
75 | "execution_count": 4,
76 | "id": "2f0bafbd-5a2e-4b86-8ac9-fd2a83a3889a",
77 | "metadata": {},
78 | "outputs": [],
79 | "source": [
80 | "df.COUNTY.replace({r'[^\\x00-\\x7F]+':''},regex=True,inplace=True)\n",
81 | "df.COUNTY.replace({r'([A-Z][a-z]+)([A-Z]+)':r'\\1'},regex=True,inplace=True)"
82 | ]
83 | },
84 | {
85 | "cell_type": "code",
86 | "execution_count": 6,
87 | "id": "4d82f1b8-e111-463c-8148-82238a2098d5",
88 | "metadata": {},
89 | "outputs": [
90 | {
91 | "data": {
92 | "text/plain": [
93 | "8174955"
94 | ]
95 | },
96 | "execution_count": 6,
97 | "metadata": {},
98 | "output_type": "execute_result"
99 | }
100 | ],
101 | "source": [
102 | "len(df)"
103 | ]
104 | },
105 | {
106 | "cell_type": "code",
107 | "execution_count": 7,
108 | "id": "b813f68f-448a-4a97-af82-cbca5f15266b",
109 | "metadata": {},
110 | "outputs": [
111 | {
112 | "data": {
113 | "text/html": [
114 | "\n",
115 | "\n",
128 | "
\n",
129 | " \n",
130 | " \n",
131 | " | \n",
132 | " ID20 | \n",
133 | " STATE | \n",
134 | " COUNTY | \n",
135 | " P20 | \n",
136 | " eq_P10 | \n",
137 | " block_diff | \n",
138 | " block_net | \n",
139 | "
\n",
140 | " \n",
141 | " \n",
142 | " \n",
143 | " 0 | \n",
144 | " 10010201001000 | \n",
145 | " 1 | \n",
146 | " Autauga County | \n",
147 | " 21 | \n",
148 | " 30.5 | \n",
149 | " -10.0 | \n",
150 | " -1 | \n",
151 | "
\n",
152 | " \n",
153 | " 1 | \n",
154 | " 10010201001001 | \n",
155 | " 1 | \n",
156 | " Autauga County | \n",
157 | " 34 | \n",
158 | " 30.5 | \n",
159 | " 4.0 | \n",
160 | " 1 | \n",
161 | "
\n",
162 | " \n",
163 | " 2 | \n",
164 | " 10010201001002 | \n",
165 | " 1 | \n",
166 | " Autauga County | \n",
167 | " 29 | \n",
168 | " 51.8 | \n",
169 | " -23.0 | \n",
170 | " -1 | \n",
171 | "
\n",
172 | " \n",
173 | " 3 | \n",
174 | " 10010201001003 | \n",
175 | " 1 | \n",
176 | " Autauga County | \n",
177 | " 17 | \n",
178 | " 13.3 | \n",
179 | " 4.0 | \n",
180 | " 1 | \n",
181 | "
\n",
182 | " \n",
183 | " 4 | \n",
184 | " 10010201001004 | \n",
185 | " 1 | \n",
186 | " Autauga County | \n",
187 | " 0 | \n",
188 | " 0.0 | \n",
189 | " 0.0 | \n",
190 | " 0 | \n",
191 | "
\n",
192 | " \n",
193 | "
\n",
194 | "
"
195 | ],
196 | "text/plain": [
197 | " ID20 STATE COUNTY P20 eq_P10 block_diff block_net\n",
198 | "0 10010201001000 1 Autauga County 21 30.5 -10.0 -1\n",
199 | "1 10010201001001 1 Autauga County 34 30.5 4.0 1\n",
200 | "2 10010201001002 1 Autauga County 29 51.8 -23.0 -1\n",
201 | "3 10010201001003 1 Autauga County 17 13.3 4.0 1\n",
202 | "4 10010201001004 1 Autauga County 0 0.0 0.0 0"
203 | ]
204 | },
205 | "execution_count": 7,
206 | "metadata": {},
207 | "output_type": "execute_result"
208 | }
209 | ],
210 | "source": [
211 | "df['block_diff'] = df['P20'] - df['eq_P10']\n",
212 | "df['block_diff'] = df['block_diff'].round()\n",
213 |     "df['block_net'] = df['block_diff'].apply(lambda x: 1 if x > 0 else (-1 if x < 0 else 0))\n",
214 | "df.head()"
215 | ]
216 | },
217 | {
218 | "cell_type": "code",
219 | "execution_count": 8,
220 | "id": "99f14f0c-35b0-4a6f-8d4f-b5286ef6c9e7",
221 | "metadata": {},
222 | "outputs": [
223 | {
224 | "data": {
225 | "text/html": [
226 | "\n",
227 | "\n",
240 | "
\n",
241 | " \n",
242 | " \n",
243 | " | \n",
244 | " ID20 | \n",
245 | " STATE | \n",
246 | " COUNTY | \n",
247 | " P20 | \n",
248 | " eq_P10 | \n",
249 | " block_diff | \n",
250 | " block_net | \n",
251 | " error | \n",
252 | "
\n",
253 | " \n",
254 | " \n",
255 | " \n",
256 | " 0 | \n",
257 | " 10010201001000 | \n",
258 | " 1 | \n",
259 | " Autauga County | \n",
260 | " 21 | \n",
261 | " 30.0 | \n",
262 | " -10.0 | \n",
263 | " -1 | \n",
264 | " 1.0 | \n",
265 | "
\n",
266 | " \n",
267 | " 1 | \n",
268 | " 10010201001001 | \n",
269 | " 1 | \n",
270 | " Autauga County | \n",
271 | " 34 | \n",
272 | " 30.0 | \n",
273 | " 4.0 | \n",
274 | " 1 | \n",
275 | " 0.0 | \n",
276 | "
\n",
277 | " \n",
278 | " 2 | \n",
279 | " 10010201001002 | \n",
280 | " 1 | \n",
281 | " Autauga County | \n",
282 | " 29 | \n",
283 | " 52.0 | \n",
284 | " -23.0 | \n",
285 | " -1 | \n",
286 | " 0.0 | \n",
287 | "
\n",
288 | " \n",
289 | " 3 | \n",
290 | " 10010201001003 | \n",
291 | " 1 | \n",
292 | " Autauga County | \n",
293 | " 17 | \n",
294 | " 13.0 | \n",
295 | " 4.0 | \n",
296 | " 1 | \n",
297 | " 0.0 | \n",
298 | "
\n",
299 | " \n",
300 | " 4 | \n",
301 | " 10010201001004 | \n",
302 | " 1 | \n",
303 | " Autauga County | \n",
304 | " 0 | \n",
305 | " 0.0 | \n",
306 | " 0.0 | \n",
307 | " 0 | \n",
308 | " 0.0 | \n",
309 | "
\n",
310 | " \n",
311 | "
\n",
312 | "
"
313 | ],
314 | "text/plain": [
315 | " ID20 STATE COUNTY P20 eq_P10 block_diff block_net \\\n",
316 | "0 10010201001000 1 Autauga County 21 30.0 -10.0 -1 \n",
317 | "1 10010201001001 1 Autauga County 34 30.0 4.0 1 \n",
318 | "2 10010201001002 1 Autauga County 29 52.0 -23.0 -1 \n",
319 | "3 10010201001003 1 Autauga County 17 13.0 4.0 1 \n",
320 | "4 10010201001004 1 Autauga County 0 0.0 0.0 0 \n",
321 | "\n",
322 | " error \n",
323 | "0 1.0 \n",
324 | "1 0.0 \n",
325 | "2 0.0 \n",
326 | "3 0.0 \n",
327 | "4 0.0 "
328 | ]
329 | },
330 | "execution_count": 8,
331 | "metadata": {},
332 | "output_type": "execute_result"
333 | }
334 | ],
335 | "source": [
336 | "df['eq_P10'] = df['eq_P10'].round()\n",
337 | "df['error'] = (df['P20']-df['eq_P10']) - df['block_diff']\n",
338 | "df.head()"
339 | ]
340 | },
341 | {
342 | "cell_type": "code",
343 | "execution_count": 9,
344 | "id": "85d04ec7-c40a-491f-b5ac-dbdddfa087c4",
345 | "metadata": {},
346 | "outputs": [
347 | {
348 | "data": {
349 | "text/html": [
350 | "\n",
351 | "\n",
364 | "
\n",
365 | " \n",
366 | " \n",
367 | " | \n",
368 | " ID20 | \n",
369 | " STATE | \n",
370 | " COUNTY | \n",
371 | " P20 | \n",
372 | " eq_P10 | \n",
373 | " block_diff | \n",
374 | " block_net | \n",
375 | " error | \n",
376 | "
\n",
377 | " \n",
378 | " \n",
379 | " \n",
380 | "
\n",
381 | "
"
382 | ],
383 | "text/plain": [
384 | "Empty DataFrame\n",
385 | "Columns: [ID20, STATE, COUNTY, P20, eq_P10, block_diff, block_net, error]\n",
386 | "Index: []"
387 | ]
388 | },
389 | "execution_count": 9,
390 | "metadata": {},
391 | "output_type": "execute_result"
392 | }
393 | ],
394 | "source": [
395 | "df['eq_P10'] = df['eq_P10'] + df['error']\n",
396 | "df[(df['P20']-df['eq_P10'])!=(df['block_diff'])]"
397 | ]
398 | },
399 | {
400 | "cell_type": "code",
401 | "execution_count": 14,
402 | "id": "eca29aec-38f6-45ec-a64c-a67582fd79ed",
403 | "metadata": {},
404 | "outputs": [],
405 | "source": [
406 | "df[['ID20','COUNTY','P20','eq_P10','block_diff','block_net']].to_parquet('data/total_attr_gen_df.parquet') #save attributes to be added later"
407 | ]
408 | },
409 | {
410 | "cell_type": "markdown",
411 | "id": "e74d0125-ec88-43e9-96a0-9f68202fb0b5",
412 | "metadata": {},
413 | "source": [
414 | "#### Attach county"
415 | ]
416 | },
417 | {
418 | "cell_type": "code",
419 | "execution_count": 2,
420 | "id": "6db27018-a14b-4a4c-8181-6386ab9a6430",
421 | "metadata": {},
422 | "outputs": [
423 | {
424 | "data": {
425 | "text/html": [
426 | "\n",
427 | "\n",
440 | "
\n",
441 | " \n",
442 | " \n",
443 | " | \n",
444 | " ID20 | \n",
445 | " COUNTY | \n",
446 | " P20 | \n",
447 | " eq_P10 | \n",
448 | " block_diff | \n",
449 | " block_net | \n",
450 | "
\n",
451 | " \n",
452 | " \n",
453 | " \n",
454 | " 0 | \n",
455 | " 10010201001000 | \n",
456 | " Autauga County | \n",
457 | " 21 | \n",
458 | " 31.0 | \n",
459 | " -10.0 | \n",
460 | " -1 | \n",
461 | "
\n",
462 | " \n",
463 | " 1 | \n",
464 | " 10010201001001 | \n",
465 | " Autauga County | \n",
466 | " 34 | \n",
467 | " 30.0 | \n",
468 | " 4.0 | \n",
469 | " 1 | \n",
470 | "
\n",
471 | " \n",
472 | " 2 | \n",
473 | " 10010201001002 | \n",
474 | " Autauga County | \n",
475 | " 29 | \n",
476 | " 52.0 | \n",
477 | " -23.0 | \n",
478 | " -1 | \n",
479 | "
\n",
480 | " \n",
481 | " 3 | \n",
482 | " 10010201001003 | \n",
483 | " Autauga County | \n",
484 | " 17 | \n",
485 | " 13.0 | \n",
486 | " 4.0 | \n",
487 | " 1 | \n",
488 | "
\n",
489 | " \n",
490 | " 4 | \n",
491 | " 10010201001004 | \n",
492 | " Autauga County | \n",
493 | " 0 | \n",
494 | " 0.0 | \n",
495 | " 0.0 | \n",
496 | " 0 | \n",
497 | "
\n",
498 | " \n",
499 | "
\n",
500 | "
"
501 | ],
502 | "text/plain": [
503 | " ID20 COUNTY P20 eq_P10 block_diff block_net\n",
504 | "0 10010201001000 Autauga County 21 31.0 -10.0 -1\n",
505 | "1 10010201001001 Autauga County 34 30.0 4.0 1\n",
506 | "2 10010201001002 Autauga County 29 52.0 -23.0 -1\n",
507 | "3 10010201001003 Autauga County 17 13.0 4.0 1\n",
508 | "4 10010201001004 Autauga County 0 0.0 0.0 0"
509 | ]
510 | },
511 | "execution_count": 2,
512 | "metadata": {},
513 | "output_type": "execute_result"
514 | }
515 | ],
516 | "source": [
517 | "df = pd.read_parquet('data/total_attr_gen_df.parquet')\n",
518 | "df.head()"
519 | ]
520 | },
521 | {
522 | "cell_type": "code",
523 | "execution_count": 3,
524 | "id": "ef7dcf13-ebe0-473a-89bd-87e97327de16",
525 | "metadata": {},
526 | "outputs": [],
527 | "source": [
528 |     "# For blocks with net outward migration, generate points for the 2020 population\n",
529 |     "# plus the 2010-equivalent population; otherwise the 2020 population alone\n",
530 |     "def calculate_points(row):\n",
531 |     "    p20, p10, net = row[0], row[1], row[-1]\n",
532 |     "    if net < 0:\n",
533 |     "        return p20 + p10\n",
534 |     "    return p20"
535 | ]
536 | },
537 | {
538 | "cell_type": "code",
539 | "execution_count": 4,
540 | "id": "34b846d8-d09a-4713-bfa2-987b2b189a8e",
541 | "metadata": {},
542 | "outputs": [],
543 | "source": [
544 | "df['points'] = df[['P20','eq_P10','block_net']].apply(calculate_points,axis=1)"
545 | ]
546 | },
547 | {
548 | "cell_type": "code",
549 | "execution_count": 5,
550 | "id": "cc3891e4-3480-441f-8291-fad3f01320df",
551 | "metadata": {},
552 | "outputs": [
553 | {
554 | "data": {
555 | "text/html": [
556 | "\n",
557 | "\n",
570 | "
\n",
571 | " \n",
572 | " \n",
573 | " | \n",
574 | " ID20 | \n",
575 | " COUNTY | \n",
576 | " P20 | \n",
577 | " eq_P10 | \n",
578 | " block_diff | \n",
579 | " block_net | \n",
580 | " points | \n",
581 | "
\n",
582 | " \n",
583 | " \n",
584 | " \n",
585 | " 0 | \n",
586 | " 10010201001000 | \n",
587 | " Autauga County | \n",
588 | " 21 | \n",
589 | " 31.0 | \n",
590 | " -10.0 | \n",
591 | " -1 | \n",
592 | " 52.0 | \n",
593 | "
\n",
594 | " \n",
595 | " 1 | \n",
596 | " 10010201001001 | \n",
597 | " Autauga County | \n",
598 | " 34 | \n",
599 | " 30.0 | \n",
600 | " 4.0 | \n",
601 | " 1 | \n",
602 | " 34.0 | \n",
603 | "
\n",
604 | " \n",
605 | " 2 | \n",
606 | " 10010201001002 | \n",
607 | " Autauga County | \n",
608 | " 29 | \n",
609 | " 52.0 | \n",
610 | " -23.0 | \n",
611 | " -1 | \n",
612 | " 81.0 | \n",
613 | "
\n",
614 | " \n",
615 | " 3 | \n",
616 | " 10010201001003 | \n",
617 | " Autauga County | \n",
618 | " 17 | \n",
619 | " 13.0 | \n",
620 | " 4.0 | \n",
621 | " 1 | \n",
622 | " 17.0 | \n",
623 | "
\n",
624 | " \n",
625 | " 4 | \n",
626 | " 10010201001004 | \n",
627 | " Autauga County | \n",
628 | " 0 | \n",
629 | " 0.0 | \n",
630 | " 0.0 | \n",
631 | " 0 | \n",
632 | " 0.0 | \n",
633 | "
\n",
634 | " \n",
635 | " ... | \n",
636 | " ... | \n",
637 | " ... | \n",
638 | " ... | \n",
639 | " ... | \n",
640 | " ... | \n",
641 | " ... | \n",
642 | " ... | \n",
643 | "
\n",
644 | " \n",
645 | " 8174950 | \n",
646 | " 721537506022011 | \n",
647 | " Yauco Municipio | \n",
648 | " 27 | \n",
649 | " 6.0 | \n",
650 | " 21.0 | \n",
651 | " 1 | \n",
652 | " 27.0 | \n",
653 | "
\n",
654 | " \n",
655 | " 8174951 | \n",
656 | " 721537506022012 | \n",
657 | " Yauco Municipio | \n",
658 | " 43 | \n",
659 | " 63.0 | \n",
660 | " -20.0 | \n",
661 | " -1 | \n",
662 | " 106.0 | \n",
663 | "
\n",
664 | " \n",
665 | " 8174952 | \n",
666 | " 721537506022013 | \n",
667 | " Yauco Municipio | \n",
668 | " 195 | \n",
669 | " 341.0 | \n",
670 | " -146.0 | \n",
671 | " -1 | \n",
672 | " 536.0 | \n",
673 | "
\n",
674 | " \n",
675 | " 8174953 | \n",
676 | " 721537506022014 | \n",
677 | " Yauco Municipio | \n",
678 | " 0 | \n",
679 | " 0.0 | \n",
680 | " 0.0 | \n",
681 | " 0 | \n",
682 | " 0.0 | \n",
683 | "
\n",
684 | " \n",
685 | " 8174954 | \n",
686 | " 721537506022015 | \n",
687 | " Yauco Municipio | \n",
688 | " 0 | \n",
689 | " 0.0 | \n",
690 | " 0.0 | \n",
691 | " 0 | \n",
692 | " 0.0 | \n",
693 | "
\n",
694 | " \n",
695 | "
\n",
696 | "
8174955 rows × 7 columns
\n",
697 | "
"
698 | ],
699 | "text/plain": [
700 | " ID20 COUNTY P20 eq_P10 block_diff block_net \\\n",
701 | "0 10010201001000 Autauga County 21 31.0 -10.0 -1 \n",
702 | "1 10010201001001 Autauga County 34 30.0 4.0 1 \n",
703 | "2 10010201001002 Autauga County 29 52.0 -23.0 -1 \n",
704 | "3 10010201001003 Autauga County 17 13.0 4.0 1 \n",
705 | "4 10010201001004 Autauga County 0 0.0 0.0 0 \n",
706 | "... ... ... ... ... ... ... \n",
707 | "8174950 721537506022011 Yauco Municipio 27 6.0 21.0 1 \n",
708 | "8174951 721537506022012 Yauco Municipio 43 63.0 -20.0 -1 \n",
709 | "8174952 721537506022013 Yauco Municipio 195 341.0 -146.0 -1 \n",
710 | "8174953 721537506022014 Yauco Municipio 0 0.0 0.0 0 \n",
711 | "8174954 721537506022015 Yauco Municipio 0 0.0 0.0 0 \n",
712 | "\n",
713 | " points \n",
714 | "0 52.0 \n",
715 | "1 34.0 \n",
716 | "2 81.0 \n",
717 | "3 17.0 \n",
718 | "4 0.0 \n",
719 | "... ... \n",
720 | "8174950 27.0 \n",
721 | "8174951 106.0 \n",
722 | "8174952 536.0 \n",
723 | "8174953 0.0 \n",
724 | "8174954 0.0 \n",
725 | "\n",
726 | "[8174955 rows x 7 columns]"
727 | ]
728 | },
729 | "execution_count": 5,
730 | "metadata": {},
731 | "output_type": "execute_result"
732 | }
733 | ],
734 | "source": [
735 | "df"
736 | ]
737 | },
738 | {
739 | "cell_type": "code",
740 | "execution_count": 6,
741 | "id": "6265d762-7a4b-4e18-9383-7866d35b3246",
742 | "metadata": {},
743 | "outputs": [],
744 | "source": [
745 | "county2id = pickle.load(open('county2id.pkl','rb'))"
746 | ]
747 | },
748 | {
749 | "cell_type": "code",
750 | "execution_count": 10,
751 | "id": "8171f642-0b80-4a8a-a7a2-d1462ce61644",
752 | "metadata": {},
753 | "outputs": [
754 | {
755 | "data": {
756 | "text/plain": [
757 | "Jefferson County 96055\n",
758 | "Los Angeles County 91626\n",
759 | "Cook County 85108\n",
760 | "Washington County 75565\n",
761 | "Montgomery County 66524\n",
762 | "Maricopa County 61427\n",
763 | "Franklin County 60891\n",
764 | "Orange County 60830\n",
765 | "Jackson County 60381\n",
766 | "Wayne County 59249\n",
767 | "Name: COUNTY, dtype: int64"
768 | ]
769 | },
770 | "execution_count": 10,
771 | "metadata": {},
772 | "output_type": "execute_result"
773 | }
774 | ],
775 | "source": [
776 | "df.COUNTY.value_counts().head(10)"
777 | ]
778 | },
779 | {
780 | "cell_type": "code",
781 | "execution_count": 22,
782 | "id": "a49f4f74-ca82-4ad8-a23c-f185f33fd84f",
783 | "metadata": {},
784 | "outputs": [],
785 | "source": [
786 |     "df = df[df.points != 0].reset_index(drop=True)"
787 | ]
788 | },
789 | {
790 | "cell_type": "code",
791 | "execution_count": 23,
792 | "id": "5f766288-a5d7-4f38-8c7c-5d09b04a7a75",
793 | "metadata": {},
794 | "outputs": [
795 | {
796 | "data": {
797 | "text/plain": [
798 | "6200461"
799 | ]
800 | },
801 | "execution_count": 23,
802 | "metadata": {},
803 | "output_type": "execute_result"
804 | }
805 | ],
806 | "source": [
807 | "df[df['COUNTY'] == 'Maricopa County'].points.sum()"
808 | ]
809 | },
810 | {
811 | "cell_type": "code",
812 | "execution_count": 14,
813 | "id": "c064904a-7a6a-4d55-b64b-e2433d8b2ec7",
814 | "metadata": {},
815 | "outputs": [],
816 | "source": [
817 | "df['points'] = df['points'].astype('int32')"
818 | ]
819 | },
820 | {
821 | "cell_type": "code",
822 | "execution_count": 20,
823 | "id": "e6ab7c4c-f044-49e1-9c28-c8734ab87160",
824 | "metadata": {},
825 | "outputs": [],
826 | "source": [
827 | "counties = df[['COUNTY','points']].apply(lambda row: [county2id[row[0]]]*row[1],axis=1)"
828 | ]
829 | },
830 | {
831 | "cell_type": "code",
832 | "execution_count": 22,
833 | "id": "71b59e6e-5d8c-44c9-818f-3fce3cf12350",
834 | "metadata": {},
835 | "outputs": [],
836 | "source": [
837 | "gcounties = cudf.from_pandas(counties)"
838 | ]
839 | },
840 | {
841 | "cell_type": "code",
842 | "execution_count": 25,
843 | "id": "ac88db07-26ee-4cc0-8102-90e6fa247fa6",
844 | "metadata": {},
845 | "outputs": [],
846 | "source": [
847 | "counties_list = gcounties.explode().reset_index(drop=True)"
848 | ]
849 | },
850 | {
851 | "cell_type": "code",
852 | "execution_count": 27,
853 | "id": "40e5fe64-2bb7-46bf-a741-470ad993f98e",
854 | "metadata": {},
855 | "outputs": [],
856 | "source": [
857 | "pickle.dump(counties_list,open('county_list.pkl','wb'))"
858 | ]
859 | },
860 | {
861 | "cell_type": "code",
862 | "execution_count": 28,
863 | "id": "ccc64011-00ed-43b5-ac0f-332136f6e180",
864 | "metadata": {},
865 | "outputs": [
866 | {
867 | "data": {
868 | "text/plain": [
869 | "504475979"
870 | ]
871 | },
872 | "execution_count": 28,
873 | "metadata": {},
874 | "output_type": "execute_result"
875 | }
876 | ],
877 | "source": [
878 | "len(counties_list)"
879 | ]
880 | },
881 | {
882 | "cell_type": "markdown",
883 | "id": "20524a87-e169-4306-acde-2f17edd3721f",
884 | "metadata": {},
885 | "source": [
886 | "#### Continue making dataset for population gen"
887 | ]
888 | },
889 | {
890 | "cell_type": "code",
891 | "execution_count": 52,
892 | "id": "875e176a-633b-4bad-ba4c-09b2f97892b5",
893 | "metadata": {},
894 | "outputs": [
895 | {
896 | "name": "stdout",
897 | "output_type": "stream",
898 | "text": [
899 | "8174955\n"
900 | ]
901 | }
902 | ],
903 | "source": [
904 | "print(len(df))"
905 | ]
906 | },
907 | {
908 | "cell_type": "code",
909 | "execution_count": 53,
910 | "id": "dd1e348d-3753-4ab5-9da7-e0d67e0a7946",
911 | "metadata": {},
912 | "outputs": [],
913 | "source": [
914 |     "df = df[df.points != 0]"
915 | ]
916 | },
917 | {
918 | "cell_type": "code",
919 | "execution_count": 54,
920 | "id": "330fe5a7-5e55-48e4-b88a-3b77a42c526a",
921 | "metadata": {},
922 | "outputs": [
923 | {
924 | "name": "stdout",
925 | "output_type": "stream",
926 | "text": [
927 | "6265163\n"
928 | ]
929 | }
930 | ],
931 | "source": [
932 | "print(len(df))"
933 | ]
934 | },
935 | {
936 | "cell_type": "code",
937 | "execution_count": 55,
938 | "id": "9d2dab42-4e75-4618-8166-958653847976",
939 | "metadata": {},
940 | "outputs": [],
941 | "source": [
942 | "gen_df = df[['ID20','STATE','points']]"
943 | ]
944 | },
945 | {
946 | "cell_type": "code",
947 | "execution_count": 56,
948 | "id": "3f0c26b3-e585-44b2-96a1-05261a4379db",
949 | "metadata": {},
950 | "outputs": [],
951 | "source": [
952 | "gen_df.to_csv('data/total_population_gen_df.csv')"
953 | ]
954 | },
955 | {
956 | "cell_type": "code",
957 | "execution_count": 61,
958 | "id": "e321a5b6-dfc0-4617-a74d-df5368e79d63",
959 | "metadata": {},
960 | "outputs": [
961 | {
962 | "data": {
963 | "text/plain": [
964 | "6265163"
965 | ]
966 | },
967 | "execution_count": 61,
968 | "metadata": {},
969 | "output_type": "execute_result"
970 | }
971 | ],
972 | "source": [
973 | "len(gen_df)"
974 | ]
975 | }
976 | ],
977 | "metadata": {
978 | "kernelspec": {
979 | "display_name": "Python 3 (ipykernel)",
980 | "language": "python",
981 | "name": "python3"
982 | },
983 | "language_info": {
984 | "codemirror_mode": {
985 | "name": "ipython",
986 | "version": 3
987 | },
988 | "file_extension": ".py",
989 | "mimetype": "text/x-python",
990 | "name": "python",
991 | "nbconvert_exporter": "python",
992 | "pygments_lexer": "ipython3",
993 | "version": "3.9.13"
994 | }
995 | },
996 | "nbformat": 4,
997 | "nbformat_minor": 5
998 | }
999 |
--------------------------------------------------------------------------------
/data_prep_total_population/gen_total_population_points_script.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "id": "1cddfb2e-e508-4410-b32d-fd3452298004",
6 | "metadata": {},
7 | "source": [
8 | "#### Objective: Use race migration table to generate race migration points"
9 | ]
10 | },
11 | {
12 | "cell_type": "code",
13 | "execution_count": 1,
14 | "id": "5b77a6ef-ef4e-45fe-8b43-6545313d4556",
15 | "metadata": {},
16 | "outputs": [],
17 | "source": [
18 | "import pandas as pd\n",
19 | "import geopandas as gpd\n",
20 | "import ast,os,random\n",
21 | "pd.set_option('display.float_format','{:.1f}'.format)\n",
22 | "import warnings\n",
23 | "warnings.filterwarnings('ignore')\n",
24 | "import cudf, cupy as cp\n",
25 | "import numpy as np\n",
26 | "import time\n",
27 | "import math\n",
28 | "import sys,os,datetime,random\n",
29 | "from shapely.geometry import Point\n",
30 | "# pd.set_option('display.max_colwidth', -1)"
31 | ]
32 | },
33 | {
34 | "cell_type": "markdown",
35 | "id": "e99e275d-c977-4392-b81a-bbb73bc5ee4f",
36 | "metadata": {
37 | "tags": []
38 | },
39 | "source": [
40 | "#### Load data"
41 | ]
42 | },
43 | {
44 | "cell_type": "code",
45 | "execution_count": 8,
46 | "id": "66319687-f2ba-4d55-85df-83692db81f96",
47 | "metadata": {},
48 | "outputs": [
49 | {
50 | "data": {
51 | "text/html": [
52 | "\n",
53 | "\n",
66 | "
\n",
67 | " \n",
68 | " \n",
69 | " | \n",
70 | " ID20 | \n",
71 | " STATE | \n",
72 | " points | \n",
73 | "
\n",
74 | " \n",
75 | " \n",
76 | " \n",
77 | " 0 | \n",
78 | " 10010201001000 | \n",
79 | " 1 | \n",
80 | " 52.0 | \n",
81 | "
\n",
82 | " \n",
83 | " 1 | \n",
84 | " 10010201001001 | \n",
85 | " 1 | \n",
86 | " 34.0 | \n",
87 | "
\n",
88 | " \n",
89 | " 2 | \n",
90 | " 10010201001002 | \n",
91 | " 1 | \n",
92 | " 81.0 | \n",
93 | "
\n",
94 | " \n",
95 | " 3 | \n",
96 | " 10010201001003 | \n",
97 | " 1 | \n",
98 | " 17.0 | \n",
99 | "
\n",
100 | " \n",
101 | " 4 | \n",
102 | " 10010201001005 | \n",
103 | " 1 | \n",
104 | " 8.0 | \n",
105 | "
\n",
106 | " \n",
107 | "
\n",
108 | "
"
109 | ],
110 | "text/plain": [
111 | " ID20 STATE points\n",
112 | "0 10010201001000 1 52.0\n",
113 | "1 10010201001001 1 34.0\n",
114 | "2 10010201001002 1 81.0\n",
115 | "3 10010201001003 1 17.0\n",
116 | "4 10010201001005 1 8.0"
117 | ]
118 | },
119 | "execution_count": 8,
120 | "metadata": {},
121 | "output_type": "execute_result"
122 | }
123 | ],
124 | "source": [
125 | "df = cudf.read_csv('data/total_population_gen_df.csv').drop('Unnamed: 0',axis=1)\n",
126 | "df.head()"
127 | ]
128 | },
129 | {
130 | "cell_type": "code",
131 | "execution_count": 3,
132 | "id": "97e9b49c-c118-4ca3-b973-c5f36c2499fa",
133 | "metadata": {},
134 | "outputs": [],
135 | "source": [
136 | "# df = df[df.STATE==6]\n",
137 | "# len(df)//3\n",
138 | "# df= df.iloc[:len(df)//3]"
139 | ]
140 | },
141 | {
142 | "cell_type": "code",
143 | "execution_count": 10,
144 | "id": "c44ff46f-c2ef-4ebe-acfe-7dcc7bde6db8",
145 | "metadata": {},
146 | "outputs": [
147 | {
148 | "name": "stdout",
149 | "output_type": "stream",
150 | "text": [
151 | "161904\n"
152 | ]
153 | }
154 | ],
155 | "source": [
156 | "print(len(df))"
157 | ]
158 | },
159 | {
160 | "cell_type": "code",
161 | "execution_count": 2,
162 | "id": "7d6527f6-8214-4494-81d7-2c7c78b1f80f",
163 | "metadata": {},
164 | "outputs": [],
165 | "source": [
166 | "def random_points_in_polygon(number, polygon):\n",
167 | " # print(polygon)\n",
168 | " points_x = np.array([])\n",
169 | " points_y = np.array([])\n",
170 | " min_x, min_y, max_x, max_y = polygon.bounds\n",
171 | " i= 0\n",
172 | " while i < number:\n",
173 | " point_x = random.uniform(min_x, max_x)\n",
174 | " point_y = random.uniform(min_y, max_y)\n",
175 | " if polygon.contains(Point(point_x, point_y)):\n",
176 | " points_x = np.append(points_x, point_x)\n",
177 | " points_y = np.append(points_y, point_y)\n",
178 | " i += 1\n",
179 | " return points_x, points_y # returns list of points(lat), list of points(long)\n",
180 | "def generate_data(state, df_temp, gpdf):\n",
181 | " t1 = datetime.datetime.now()\n",
182 | " geoid_index_df = df_temp.index.to_numpy()\n",
183 | " final_points_x = np.array([])\n",
184 | " final_points_y = np.array([])\n",
185 | " geoid = np.array([])\n",
186 | " # Add additional features\n",
187 | " county = np.array([])\n",
188 | " p_delta = np.array([])\n",
189 | " p_net = np.array([])\n",
190 | " \n",
191 | " \n",
192 | " f=0\n",
193 | " for index, row in gpdf.iterrows():\n",
194 | " f+=1\n",
195 | " points_x = np.array([])\n",
196 | " points_y = np.array([])\n",
197 | " geoid_temp = np.array([])\n",
198 | " \n",
199 | " if row['GEOID20'] in geoid_index_df:\n",
200 | " num_points = df_temp.loc[row['GEOID20']]\n",
201 | " polygon = row['geometry']\n",
202 | " #print(row['GEOID10'])\n",
203 | " #print('SUCCESS')\n",
204 | " num_points = df_temp.loc[row['GEOID20']] # store population\n",
205 | " polygon = row['geometry']\n",
206 | "\n",
207 | " \n",
208 | " if polygon is not None:\n",
209 | " points_x, points_y = random_points_in_polygon(num_points, polygon)\n",
210 | " # print(points_x,points_y)\n",
211 | " geoid_temp = np.array([row['GEOID20']]*len(points_x))\n",
212 | " geoid = np.append(geoid,geoid_temp)\n",
213 | " final_points_x = np.append(final_points_x, points_x)\n",
214 | " # print(final_points_x)\n",
215 | " final_points_y = np.append(final_points_y, points_y)\n",
216 | " print('Processing '+str(state)+' - Completed:', \"{0:0.2f}\".format((index/len(gpdf))*100), '%', end='')\n",
217 | " print('', end='\\r')\n",
218 | " \n",
219 | " # if f==11:\n",
220 | " # break\n",
221 | "\n",
222 | " print('Processing for '+str(state)+' complete \\n total time', datetime.datetime.now() - t1)\n",
223 | " df_fin = cudf.DataFrame({'GEOID20': geoid,'x': final_points_x, 'y':final_points_y}) #,'COUNTY':county,'p_delta':p_delta,'p_net':p_net})\n",
224 | " df_fin.GEOID20 = df_fin.GEOID20[1:].astype('int').astype('str')\n",
225 | " df_fin.GEOID20 = df_fin.GEOID20.fillna(method='bfill')\n",
226 | " \n",
227 | " df_fin.to_csv('data/total_population/population_%s_1'%str(state)+'.csv', index=False)\n",
228 | "def exec_data(state_key_list):\n",
229 | " c=0\n",
230 | " for i in state_key_list:\n",
231 | " print(i)\n",
232 | " c+=1\n",
233 | " if i< 10:\n",
234 | " i_str = '0'+str(i)\n",
235 | " else:\n",
236 | " i_str = str(i)\n",
237 | " # path = 'census_2020_data/nhgis0003_shape/nhgis0003_shapefile_tl2020_%s0_block_2020/%s_block_2020.shp'%(i_str,states[i])\n",
238 | " path ='data/tl_shapefiles/tl_2021_%s_tabblock20.shp'%(i_str)\n",
239 | " #print(path)\n",
240 | " print(\"started reading shape file for state \", states[i])\n",
241 | " if os.path.isfile(path): \n",
242 | " gpdf = gpd.read_file(path)[['GEOID20', 'geometry']].sort_values('GEOID20').reset_index(drop=True)\n",
243 | " gpdf.GEOID20 = gpdf.GEOID20.astype('int64')\n",
244 | " gpdf = gpdf[(gpdf.GEOID20>=480019501001000) & (gpdf.GEOID20<=481439502032029)].reset_index(drop=True)\n",
245 | " print(\"completed reading shape file for state \", states[i])\n",
246 | " df_temp = df.query('STATE == @i')[['ID20', 'points']]\n",
247 | " df_temp.index = df_temp.ID20\n",
248 | " df_temp = df_temp['points']\n",
249 | " # print(gpdf.head(3))\n",
250 | " # print(df_temp)\n",
251 | " print(\"starting to generate data for \"+str(states[i])+\"... \")\n",
252 | " generate_data(states[i], df_temp, gpdf)\n",
253 | " del(df_temp)\n",
254 | " else:\n",
255 | " print(\"shape file does not exist\")\n",
256 | " continue\n",
257 | " # if c==2:\n",
258 | " # break "
259 | ]
260 | },
261 | {
262 | "cell_type": "code",
263 | "execution_count": 3,
264 | "id": "29a0e4e2-a41d-45a6-b6aa-aa8c87ddf5ef",
265 | "metadata": {},
266 | "outputs": [],
267 | "source": [
268 | "# states = {1 :\"AL\",2 :\"AK\",4 :\"AZ\",5 :\"AR\",6 :\"CA\",8 :\"CO\",9 :\"CT\",10:\"DE\",11:\"DC\",12:\"FL\",13:\"GA\",15:\"HI\",\n",
269 | "# 16:\"ID\",17:\"IL\",18:\"IN\",19:\"IA\",20:\"KS\",21:\"KY\",22:\"LA\",23:\"ME\",24:\"MD\",25:\"MA\",26:\"MI\",27:\"MN\",\n",
270 | "# 28:\"MS\",29:\"MO\",30:\"MT\",31:\"NE\",32:\"NV\",33:\"NH\",34:\"NJ\",35:\"NM\",36:\"NY\",37:\"NC\",38:\"ND\",39:\"OH\",\n",
271 | "# 40:\"OK\",41:\"OR\",42:\"PA\",44:\"RI\",45:\"SC\",46:\"SD\",47:\"TN\",48:\"TX\",49:\"UT\",50:\"VT\",51:\"VA\",53:\"WA\",\n",
272 | "# 54:\"WV\",55:\"WI\",56:\"WY\",72:\"PR\"}\n",
273 | "# states = {6:\"CA\"}"
274 | ]
275 | },
276 | {
277 | "cell_type": "code",
278 | "execution_count": 13,
279 | "id": "4e6f6e62-cbeb-4a67-aee4-024ab9af2f07",
280 | "metadata": {},
281 | "outputs": [
282 | {
283 | "name": "stdout",
284 | "output_type": "stream",
285 | "text": [
286 | "48\n",
287 | "started reading shape file for state TX\n",
288 | "completed reading shape file for state TX\n",
289 | "starting to generate data for TX... \n",
290 | "Processing for TX complete 100.00 %\n",
291 | " total time 3:08:48.306832\n"
292 | ]
293 | }
294 | ],
295 | "source": [
296 | "exec_data(states.keys())"
297 | ]
298 | },
299 | {
300 | "cell_type": "markdown",
301 | "id": "2f48af1e-14fa-467b-b7b4-f773427a45dc",
302 | "metadata": {
303 | "tags": []
304 | },
305 | "source": [
306 | "### Concat Parts"
307 | ]
308 | },
309 | {
310 | "cell_type": "code",
311 | "execution_count": 2,
312 | "id": "dd9ea4cd-2457-46cf-9824-6eac0d41975c",
313 | "metadata": {},
314 | "outputs": [],
315 | "source": [
316 | "def merge_parts(state_key_list):\n",
317 | " concat_states = cudf.DataFrame()\n",
318 | " c=0\n",
319 | " for i in state_key_list:\n",
320 | " for c in range(1,4):\n",
321 | " if i< 10:\n",
322 | " i_str = '0'+str(i)\n",
323 | " else:\n",
324 | " i_str = str(i)\n",
325 | " path = 'data/total_population/population_%s_%s'%(str(states[i]),c)+'.csv'\n",
326 | " # print(path)\n",
327 | " if os.path.isfile(path): \n",
328 | " temp = cudf.read_csv(path) # Load shape files\n",
329 | " concat_states = cudf.concat([concat_states,temp])\n",
330 | " else:\n",
331 | " print(\"population file does not exist\")\n",
332 | " continue\n",
333 | " return concat_states"
334 | ]
335 | },
336 | {
337 | "cell_type": "code",
338 | "execution_count": 4,
339 | "id": "a78e1dfc-8105-4302-bac5-f8b7c2e159de",
340 | "metadata": {},
341 | "outputs": [],
342 | "source": [
343 | "concat_parts = merge_parts(states)"
344 | ]
345 | },
346 | {
347 | "cell_type": "code",
348 | "execution_count": 5,
349 | "id": "ef51cef2-4bd2-451f-a55e-26a86499fc3f",
350 | "metadata": {},
351 | "outputs": [
352 | {
353 | "data": {
354 | "text/html": [
355 | "\n",
356 | "\n",
369 | "
\n",
370 | " \n",
371 | " \n",
372 | " | \n",
373 | " GEOID20 | \n",
374 | " x | \n",
375 | " y | \n",
376 | "
\n",
377 | " \n",
378 | " \n",
379 | " \n",
380 | " 0 | \n",
381 | " 60014001001001 | \n",
382 | " -122.2 | \n",
383 | " 37.9 | \n",
384 | "
\n",
385 | " \n",
386 | " 1 | \n",
387 | " 60014001001001 | \n",
388 | " -122.2 | \n",
389 | " 37.9 | \n",
390 | "
\n",
391 | " \n",
392 | " 2 | \n",
393 | " 60014001001001 | \n",
394 | " -122.2 | \n",
395 | " 37.9 | \n",
396 | "
\n",
397 | " \n",
398 | " 3 | \n",
399 | " 60014001001001 | \n",
400 | " -122.2 | \n",
401 | " 37.9 | \n",
402 | "
\n",
403 | " \n",
404 | " 4 | \n",
405 | " 60014001001001 | \n",
406 | " -122.2 | \n",
407 | " 37.9 | \n",
408 | "
\n",
409 | " \n",
410 | " ... | \n",
411 | " ... | \n",
412 | " ... | \n",
413 | " ... | \n",
414 | "
\n",
415 | " \n",
416 | " 59325523 | \n",
417 | " 61150411021048 | \n",
418 | " -121.3 | \n",
419 | " 39.4 | \n",
420 | "
\n",
421 | " \n",
422 | " 59325524 | \n",
423 | " 61150411021048 | \n",
424 | " -121.3 | \n",
425 | " 39.4 | \n",
426 | "
\n",
427 | " \n",
428 | " 59325525 | \n",
429 | " 61150411021048 | \n",
430 | " -121.3 | \n",
431 | " 39.4 | \n",
432 | "
\n",
433 | " \n",
434 | " 59325526 | \n",
435 | " 61150411021048 | \n",
436 | " -121.3 | \n",
437 | " 39.4 | \n",
438 | "
\n",
439 | " \n",
440 | " 59325527 | \n",
441 | " 61150411021048 | \n",
442 | " -121.3 | \n",
443 | " 39.4 | \n",
444 | "
\n",
445 | " \n",
446 | "
\n",
447 | "
59325528 rows × 3 columns
\n",
448 | "
"
449 | ],
450 | "text/plain": [
451 | " GEOID20 x y\n",
452 | "0 60014001001001 -122.2 37.9\n",
453 | "1 60014001001001 -122.2 37.9\n",
454 | "2 60014001001001 -122.2 37.9\n",
455 | "3 60014001001001 -122.2 37.9\n",
456 | "4 60014001001001 -122.2 37.9\n",
457 | "... ... ... ...\n",
458 | "59325523 61150411021048 -121.3 39.4\n",
459 | "59325524 61150411021048 -121.3 39.4\n",
460 | "59325525 61150411021048 -121.3 39.4\n",
461 | "59325526 61150411021048 -121.3 39.4\n",
462 | "59325527 61150411021048 -121.3 39.4\n",
463 | "\n",
464 | "[59325528 rows x 3 columns]"
465 | ]
466 | },
467 | "execution_count": 5,
468 | "metadata": {},
469 | "output_type": "execute_result"
470 | }
471 | ],
472 | "source": [
473 | "concat_parts =concat_parts.reset_index(drop=True)\n",
474 | "concat_parts"
475 | ]
476 | },
477 | {
478 | "cell_type": "code",
479 | "execution_count": 9,
480 | "id": "d72bb78c-3a19-4a4f-8d91-6ec056169ff9",
481 | "metadata": {},
482 | "outputs": [
483 | {
484 | "data": {
485 | "text/plain": [
486 | "42742567.0"
487 | ]
488 | },
489 | "execution_count": 9,
490 | "metadata": {},
491 | "output_type": "execute_result"
492 | }
493 | ],
494 | "source": [
495 | "df[df.STATE==48].points.sum()"
496 | ]
497 | },
498 | {
499 | "cell_type": "code",
500 | "execution_count": 21,
501 | "id": "80c9b8b2-4975-4508-b160-96415f5e72af",
502 | "metadata": {},
503 | "outputs": [
504 | {
505 | "data": {
506 | "text/plain": [
507 | "59325528.0"
508 | ]
509 | },
510 | "execution_count": 21,
511 | "metadata": {},
512 | "output_type": "execute_result"
513 | }
514 | ],
515 | "source": [
516 | "df.points.sum()"
517 | ]
518 | },
519 | {
520 | "cell_type": "code",
521 | "execution_count": 10,
522 | "id": "496e325f-c830-4e5a-bfe1-a9f8016376b7",
523 | "metadata": {},
524 | "outputs": [],
525 | "source": [
526 | "concat_parts.to_pandas().to_csv('data/total_population/population_CA')"
527 | ]
528 | },
529 | {
530 | "cell_type": "markdown",
531 | "id": "13fe141e-7175-4ee8-923d-80b991dd04f5",
532 | "metadata": {
533 | "tags": []
534 | },
535 | "source": [
536 | "### Concat States"
537 | ]
538 | },
539 | {
540 | "cell_type": "code",
541 | "execution_count": 2,
542 | "id": "848adcd4-502b-4296-a4cf-92e3ab9ff965",
543 | "metadata": {},
544 | "outputs": [],
545 | "source": [
546 | "def merge_shape_and_states(state_key_list):\n",
547 | " concat_states = cudf.DataFrame()\n",
548 | " \n",
549 | " for i in state_key_list:\n",
550 | " if i< 10:\n",
551 | " i_str = '0'+str(i)\n",
552 | " else:\n",
553 | " i_str = str(i)\n",
554 | " path = 'data/total_population/population_%s'%str(states[i])+'.csv'\n",
555 | " if os.path.isfile(path): \n",
556 | " temp = cudf.read_csv(path) # Load shape files\n",
557 | " concat_states = cudf.concat([concat_states,temp])\n",
558 | " else:\n",
559 | " print(i)\n",
560 | " print(\"population file does not exist\")\n",
561 | " continue\n",
562 | " print(i)\n",
563 | " return concat_states"
564 | ]
565 | },
566 | {
567 | "cell_type": "code",
568 | "execution_count": 3,
569 | "id": "50dcda12-ae59-45d1-9dc5-50e93ee5692c",
570 | "metadata": {},
571 | "outputs": [],
572 | "source": [
573 | "# states = {1 :\"AL\",2 :\"AK\",4 :\"AZ\",5 :\"AR\",6 :\"CA\",8 :\"CO\",9 :\"CT\",10:\"DE\",11:\"DC\",12:\"FL\",13:\"GA\",15:\"HI\",\n",
574 | "# 16:\"ID\",17:\"IL\",18:\"IN\",19:\"IA\",20:\"KS\",21:\"KY\",22:\"LA\",23:\"ME\",24:\"MD\",25:\"MA\",26:\"MI\",27:\"MN\",\n",
575 | "# 28:\"MS\",29:\"MO\",30:\"MT\",31:\"NE\",32:\"NV\",33:\"NH\",34:\"NJ\",35:\"NM\",36:\"NY\",37:\"NC\",38:\"ND\",39:\"OH\",\n",
576 | "# 40:\"OK\",41:\"OR\",42:\"PA\",44:\"RI\",45:\"SC\",46:\"SD\",47:\"TN\",48:\"TX\",49:\"UT\",50:\"VT\",51:\"VA\",53:\"WA\",\n",
577 | "# 54:\"WV\",55:\"WI\",56:\"WY\",72:\"PR\"}\n",
578 | "states = {1 :\"AL\",2 :\"AK\",4 :\"AZ\",5 :\"AR\",6 :\"CA\",8 :\"CO\",9 :\"CT\",10:\"DE\",11:\"DC\",12:\"FL\",13:\"GA\",15:\"HI\",\n",
579 | " 16:\"ID\",17:\"IL\",18:\"IN\",19:\"IA\",20:\"KS\",21:\"KY\",22:\"LA\",23:\"ME\",24:\"MD\",25:\"MA\",26:\"MI\",27:\"MN\",\n",
580 | " 28:\"MS\"} # part1\n",
581 | "states = {29:\"MO\",30:\"MT\",31:\"NE\",32:\"NV\",33:\"NH\",34:\"NJ\",35:\"NM\",36:\"NY\",37:\"NC\",38:\"ND\",39:\"OH\",\n",
582 | " 40:\"OK\",41:\"OR\",42:\"PA\",44:\"RI\",45:\"SC\",46:\"SD\",47:\"TN\",48:\"TX\",49:\"UT\",50:\"VT\",51:\"VA\",53:\"WA\",\n",
583 | " 54:\"WV\",55:\"WI\",56:\"WY\",72:\"PR\"} #part2"
584 | ]
585 | },
586 | {
587 | "cell_type": "code",
588 | "execution_count": 4,
589 | "id": "438ad7af-c7a4-4d39-9741-a4cc0c87859c",
590 | "metadata": {
591 | "tags": []
592 | },
593 | "outputs": [
594 | {
595 | "name": "stdout",
596 | "output_type": "stream",
597 | "text": [
598 | "29\n",
599 | "30\n",
600 | "31\n",
601 | "32\n",
602 | "33\n",
603 | "34\n",
604 | "35\n",
605 | "36\n",
606 | "37\n",
607 | "38\n",
608 | "39\n",
609 | "40\n",
610 | "41\n",
611 | "42\n",
612 | "44\n",
613 | "45\n",
614 | "46\n",
615 | "47\n",
616 | "48\n",
617 | "49\n",
618 | "50\n",
619 | "51\n",
620 | "53\n",
621 | "54\n",
622 | "55\n",
623 | "56\n",
624 | "72\n"
625 | ]
626 | },
627 | {
628 | "data": {
629 | "text/html": [
630 | "\n",
631 | "\n",
644 | "
\n",
645 | " \n",
646 | " \n",
647 | " | \n",
648 | " ID20 | \n",
649 | " x | \n",
650 | " y | \n",
651 | "
\n",
652 | " \n",
653 | " \n",
654 | " \n",
655 | " 0 | \n",
656 | " 290019501001000 | \n",
657 | " -92.4 | \n",
658 | " 40.3 | \n",
659 | "
\n",
660 | " \n",
661 | " 1 | \n",
662 | " 290019501001000 | \n",
663 | " -92.4 | \n",
664 | " 40.3 | \n",
665 | "
\n",
666 | " \n",
667 | " 2 | \n",
668 | " 290019501001001 | \n",
669 | " -92.4 | \n",
670 | " 40.3 | \n",
671 | "
\n",
672 | " \n",
673 | " 3 | \n",
674 | " 290019501001001 | \n",
675 | " -92.4 | \n",
676 | " 40.3 | \n",
677 | "
\n",
678 | " \n",
679 | " 4 | \n",
680 | " 290019501001001 | \n",
681 | " -92.4 | \n",
682 | " 40.3 | \n",
683 | "
\n",
684 | " \n",
685 | "
\n",
686 | "
"
687 | ],
688 | "text/plain": [
689 | " ID20 x y\n",
690 | "0 290019501001000 -92.4 40.3\n",
691 | "1 290019501001000 -92.4 40.3\n",
692 | "2 290019501001001 -92.4 40.3\n",
693 | "3 290019501001001 -92.4 40.3\n",
694 | "4 290019501001001 -92.4 40.3"
695 | ]
696 | },
697 | "execution_count": 4,
698 | "metadata": {},
699 | "output_type": "execute_result"
700 | }
701 | ],
702 | "source": [
703 | "indv_df = merge_shape_and_states(states.keys()).drop('Unnamed: 0',axis=1)\n",
704 | "indv_df.rename(columns={'GEOID20':'ID20'},inplace=True)\n",
705 | "indv_df.head()"
706 | ]
707 | },
708 | {
709 | "cell_type": "code",
710 | "execution_count": 5,
711 | "id": "9fd6b2fe-5fbf-47b1-93d2-7bd18dfeeb7b",
712 | "metadata": {},
713 | "outputs": [
714 | {
715 | "data": {
716 | "text/plain": [
717 | "248001113"
718 | ]
719 | },
720 | "execution_count": 5,
721 | "metadata": {},
722 | "output_type": "execute_result"
723 | }
724 | ],
725 | "source": [
726 | "len(indv_df)"
727 | ]
728 | },
729 | {
730 | "cell_type": "code",
731 | "execution_count": null,
732 | "id": "9bfb9e3e-bfeb-45d6-ac7a-9475e2575577",
733 | "metadata": {},
734 | "outputs": [],
735 | "source": [
736 | "# indv_df.to_pandas().to_parquet('data/total_part1.parquet')"
737 | ]
738 | },
739 | {
740 | "cell_type": "code",
741 | "execution_count": 6,
742 | "id": "9c369907-9c5f-4874-9d2f-8c80a86b9c56",
743 | "metadata": {},
744 | "outputs": [],
745 | "source": [
746 | "# indv_df.to_pandas().to_parquet('data/total_part2.parquet')"
747 | ]
748 | },
749 | {
750 | "cell_type": "markdown",
751 | "id": "9d138976-2169-4049-9c7c-815657a2b08c",
752 | "metadata": {},
753 | "source": [
754 | "### Use processed dfs"
755 | ]
756 | },
757 | {
758 | "cell_type": "code",
759 | "execution_count": 2,
760 | "id": "add2e952-e678-434c-84bc-68c3d6527b5d",
761 | "metadata": {},
762 | "outputs": [],
763 | "source": [
764 | "# df1 = pd.read_parquet('data/total_part1.parquet')\n",
765 | "# df2 = pd.read_parquet('data/total_part2.parquet')"
766 | ]
767 | },
768 | {
769 | "cell_type": "code",
770 | "execution_count": 3,
771 | "id": "b9c7d374-9466-4e9e-bd74-ed84d0b25790",
772 | "metadata": {},
773 | "outputs": [],
774 | "source": [
775 | "# merged = pd.concat([df1,df2])"
776 | ]
777 | },
778 | {
779 | "cell_type": "code",
780 | "execution_count": 5,
781 | "id": "ea1d033d-dd0b-4827-b87b-623d11f02c6a",
782 | "metadata": {},
783 | "outputs": [
784 | {
785 | "data": {
786 | "text/plain": [
787 | "504475979"
788 | ]
789 | },
790 | "execution_count": 5,
791 | "metadata": {},
792 | "output_type": "execute_result"
793 | }
794 | ],
795 | "source": [
796 | "# len(merged)"
797 | ]
798 | },
799 | {
800 | "cell_type": "code",
801 | "execution_count": 6,
802 | "id": "5f3ef3c3-0170-4d7a-be20-f85b7250f2a9",
803 | "metadata": {},
804 | "outputs": [],
805 | "source": [
806 | "# gpu = cudf.from_pandas(merged)"
807 | ]
808 | },
809 | {
810 | "cell_type": "code",
811 | "execution_count": 9,
812 | "id": "68e69c2c-286f-4abd-a0fb-ef11e4c4b851",
813 | "metadata": {},
814 | "outputs": [],
815 | "source": [
816 | "# merged.to_parquet('data/total_parts_combined.parquet')"
817 | ]
818 | },
819 | {
820 | "cell_type": "code",
821 | "execution_count": null,
822 | "id": "ab83595d-dd0f-4e05-a587-49dbcee0b31c",
823 | "metadata": {},
824 | "outputs": [],
825 | "source": [
826 | "# dataset = indv_df.merge(df,on='ID20',how='left').sort_values('ID20')\n",
827 | "# dataset.head()"
828 | ]
829 | }
830 | ],
831 | "metadata": {
832 | "kernelspec": {
833 | "display_name": "Python 3 (ipykernel)",
834 | "language": "python",
835 | "name": "python3"
836 | },
837 | "language_info": {
838 | "codemirror_mode": {
839 | "name": "ipython",
840 | "version": 3
841 | },
842 | "file_extension": ".py",
843 | "mimetype": "text/x-python",
844 | "name": "python",
845 | "nbconvert_exporter": "python",
846 | "pygments_lexer": "ipython3",
847 | "version": "3.9.13"
848 | }
849 | },
850 | "nbformat": 4,
851 | "nbformat_minor": 5
852 | }
853 |
--------------------------------------------------------------------------------
/entrypoint.sh:
--------------------------------------------------------------------------------
1 | # activate the conda environment
2 | source activate rapids
3 |
4 | cd /rapids/plotly_census_demo/plotly_demo
5 |
6 | if [ "$@" = "dask_app" ]; then
7 | python dask_app.py
8 | else
9 | python app.py
10 | fi
11 |
12 |
--------------------------------------------------------------------------------
/environment.yml:
--------------------------------------------------------------------------------
1 | channels:
2 | - rapidsai
3 | - conda-forge
4 | - nvidia
5 | dependencies:
6 | - python=3.10
7 | - cudatoolkit=11.8
8 | - cudf=23.06
9 | - dask-cudf=23.06
10 | - dask-cuda=23.06
11 | - dash
12 | - jupyterlab
13 | - jupyter-dash
14 | - jupyterlab-dash
15 | - dash-html-components
16 | - dash-core-components
17 | - dash-daq
18 | - dash-bootstrap-components
19 | - datashader>=0.15
20 | - pyproj
21 | - bokeh
22 |
--------------------------------------------------------------------------------
/environment_for_docker.yml:
--------------------------------------------------------------------------------
1 | channels:
2 | - conda-forge
3 | dependencies:
4 | - dash=2.5.1
5 | - dash-html-components=2.0.0
6 | - dash-core-components=2.0.0
7 | - dash-daq=0.5.0
8 | - dash-bootstrap-components=1.2.0
9 | - datashader=0.14
10 |
--------------------------------------------------------------------------------
/holoviews_demo/README.md:
--------------------------------------------------------------------------------
1 | # Panel + Holoviews + RAPIDS | Census 2020 Race Migration Visualization
2 |
3 | 
4 |
5 | ## Charts
6 |
7 | 1. The map chart shows the total migration points for the chosen view and selected area
8 | 2. The top counties bar chart shows the counties with the most migration for the chosen view and selected area
9 | 3. The net race migration bar chart shows the total inward and outward migration for the chosen view and selected area
10 | 4. The population distribution chart shows the distribution of migration across blocks for the chosen view and selected area
11 |
12 | Cross-filtering via the box-select tool links all four charts.
13 |
14 | ## Race Views
15 |
16 | The demo consists of eight views (seven race views + one all-race view).
17 |
18 | Options - All, White alone, African American alone, American Indian alone, Asian alone, Native Hawaiian alone, Other Race alone, Two or More races.
19 |
20 | #### Snapshot examples
21 |
22 | 1. White race
23 |
24 | 
25 |
26 | 2. Asian race
27 |
28 | 
29 |
30 | 3. African American race
31 |
32 | 
33 |
34 | ## Colormaps
35 |
36 | Users can select from the following colormaps.
37 |
38 | Options - 'kbc', 'fire', 'bgy', 'bgyw', 'bmy', 'gray'.
39 |
40 | ## Limit
41 |
42 | Use the slider to select how many top counties to show, from 5 to 50 in intervals of 5.
43 |
44 | # Installation and Run Steps
45 |
46 | ## Data
47 |
48 | There is one main dataset:
49 |
50 | - Net Migration Dataset: consists of race migration computed from Census 2020 and Census 2010 block data
51 |
52 | For more information on how the Net Migration Dataset was prepared to show individual points, refer to the `/data_prep_net_migration` folder.
53 |
54 | You can download the final net migration dataset [here](https://data.rapids.ai/viz-data/net_migration_dataset.parquet).
55 |
56 | ### Conda Env
57 |
58 | Verify that the following arguments in `environment.yml` match your system (an easy way to check is `nvidia-smi`):
59 |
60 | cudatoolkit: Version used is `11.5`
61 |
62 | ```bash
63 | # setup conda environment
64 | conda env create --name holoviews_env --file environment.yml
65 | source activate holoviews_env
66 |
67 | # run and access
68 | cd holoviews_demo
69 | jupyter lab
70 | # then open and run the census_net_migration_demo.ipynb notebook
71 | ```
72 |
73 | ## Dependencies
74 |
75 | - python=3.9
76 | - cudatoolkit=11.5
77 | - rapids=22.08
78 | - plotly=5.10.0
79 | - jupyterlab=3.4.3
80 |
81 | ## FAQ and Known Issues
82 |
83 | **What hardware do I need to run this locally?** To run you need an NVIDIA GPU with at least 24GB of memory, at least 32GB of system memory, and a Linux OS as defined in the [RAPIDS requirements](https://rapids.ai/start.html#req).
84 |
85 | **How did you compute migration?** Migration was computed by comparing block-level populations from the 2010 and 2020 censuses.
86 |
87 | **How did you compare populations across block-level boundary changes?** The Census Bureau's [Relationship Files](https://www.census.gov/geographies/reference-files/time-series/geo/relationship-files.html#t10t20) map 2010 Census tabulation blocks to 2020 Census tabulation blocks. Block relationships may be one-to-one, many-to-one, one-to-many, or many-to-many. Population counts were apportioned proportionally to account for blocks that were split or merged between 2010 and 2020.
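
The proportional apportionment described above can be sketched as follows. This is a minimal illustration with hypothetical column names and toy weights, not the exact relationship-file schema:

```python
import pandas as pd

# Hypothetical sketch of proportional allocation. The real workflow uses the
# Census 2010 -> 2020 block relationship files, whose columns and weights differ.
rel = pd.DataFrame({
    "block_2010": ["A", "A", "B"],   # 2010 block "A" is split across two 2020 blocks
    "block_2020": ["X", "Y", "X"],
    "weight": [0.25, 0.75, 1.0],     # share of each 2010 block falling in each 2020 block
})
pop_2010 = pd.DataFrame({"block_2010": ["A", "B"], "pop": [100, 40]})

# Apportion each 2010 block's population to its 2020 blocks by weight,
# then sum the contributions arriving at each 2020 block.
merged = rel.merge(pop_2010, on="block_2010")
merged["alloc"] = merged["pop"] * merged["weight"]
pop_2010_on_2020_blocks = merged.groupby("block_2020")["alloc"].sum()
print(pop_2010_on_2020_blocks.to_dict())  # {'X': 65.0, 'Y': 75.0}
```

With the 2010 counts re-expressed on 2020 block geography, subtracting them from the 2020 counts gives a per-block net change.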
88 |
89 | **How did you determine race migration?** We took the difference of race counts between the 2020 and 2010 censuses. Individuals were randomly assigned a race within each block so that race totals add up exactly at the block level.
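
A single-block sketch of that random assignment, using toy counts and placeholder coordinates rather than the actual data-prep code:

```python
import random
from collections import Counter

# Expand per-race counts into one label per person and shuffle, so each
# individual gets a random race while totals still match the block exactly.
block_race_counts = {"White": 3, "Asian": 2, "African American": 1}  # toy counts
labels = [race for race, n in block_race_counts.items() for _ in range(n)]
random.shuffle(labels)

# One label per generated point in this block (placeholder coordinates).
block_points = [(-122.2, 37.9)] * len(labels)
assigned = list(zip(block_points, labels))
assert Counter(label for _, label in assigned) == Counter(block_race_counts)
```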
90 |
91 | **How did you get individual point locations?** The population density points are randomly placed within their census block, so that point counts match the population distribution at the census-block level.
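
The data-prep notebooks do this with a rejection-sampling loop (drawing from the block polygon's bounding box and keeping points via shapely's `Polygon.contains`). A stdlib-only sketch of the same idea, with the containment test abstracted as a callable:

```python
import random

def random_points_in_polygon(number, bounds, contains):
    """Rejection sampling: draw uniform points from the polygon's bounding
    box and keep only those the polygon contains (in the notebooks the
    containment test is shapely's Polygon.contains)."""
    min_x, min_y, max_x, max_y = bounds
    points = []
    while len(points) < number:
        x = random.uniform(min_x, max_x)
        y = random.uniform(min_y, max_y)
        if contains(x, y):
            points.append((x, y))
    return points

# Toy triangular "block": the region x >= 0, y >= 0, x + y <= 1.
points = random_points_in_polygon(100, (0.0, 0.0, 1.0, 1.0),
                                  lambda x, y: x + y <= 1)
assert all(x + y <= 1 for x, y in points)
```

Rejection sampling is simple and exactly uniform within the polygon, though it slows down for polygons that cover a small fraction of their bounding box.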
92 |
93 | **How are the population and distributions filtered?** Use the box select tool icon for the map or click and drag for the bar charts.
94 |
95 | **Why is the population data from 2010 and 2020?** Only decennial census data is recorded at the block level, which provides the highest-resolution population distributions available. For more details on census boundaries, refer to the [TIGERweb app](https://tigerweb.geo.census.gov/tigerwebmain/TIGERweb_apps.html).
96 |
97 | **The dashboard stopped responding or the chart data disappeared!** This is likely caused by an out-of-memory error; the application must be restarted.
98 |
99 | **How do I request a feature or report a bug?** Create an [Issue](https://github.com/rapidsai/plotly-dash-rapids-census-demo/issues) and we will get to it asap.
100 |
101 | ## Acknowledgments and Data Sources
102 |
103 | - 2020 Population Census and 2010 Population Census to compute Net Migration Dataset, used with permission from IPUMS NHGIS, University of Minnesota, [www.nhgis.org](https://www.nhgis.org/) ( not for redistribution ).
104 | - Dashboard developed with [Panel](https://panel.holoviz.org/) and [Holoviews](https://holoviews.org/index.html)
105 | - Geospatial point rendering developed with [Datashader](https://datashader.org/).
106 | - GPU acceleration with [RAPIDS cudf](https://rapids.ai/) and [cupy](https://cupy.chainer.org/), CPU code with [pandas](https://pandas.pydata.org/).
107 | - For source code and data workflow, visit our [GitHub](https://github.com/rapidsai/plotly-dash-rapids-census-demo/tree/census-2020).
108 |
--------------------------------------------------------------------------------
/holoviews_demo/environment.yml:
--------------------------------------------------------------------------------
1 | channels:
2 | - rapidsai
3 | - conda-forge
4 | - nvidia
5 | dependencies:
6 | - python=3.9
7 | - cudatoolkit=11.5
8 | - rapids=22.08
9 | - plotly=5.10.0
10 | - jupyterlab=3.4.3
11 |
--------------------------------------------------------------------------------
/id2county.pkl:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/rapidsai/plotly-dash-rapids-census-demo/e4af2bd3de86263b7f1f947ba9b302002e047a55/id2county.pkl
--------------------------------------------------------------------------------
/plotly_demo/README.md:
--------------------------------------------------------------------------------
1 | # Plotly-Dash + RAPIDS | Census 2020 Visualization
2 |
3 | There are two versions of the same application, single-GPU and multi-GPU, each including all the views described below.
4 |
5 | Recommended GPU memory:
6 |
7 | 1. Single GPU version: 32GB+
8 | 2. Multi-GPU version: 2+ GPUs of 16GB+ each
9 |
10 | ```bash
11 | # run and access single GPU version
12 | cd plotly_demo
13 | python app.py
14 |
15 | # run and access multi GPU version
16 | cd plotly_demo
17 | python dask_app.py
18 | ```
19 |
20 | ## Snapshot Examples
21 |
22 | ### 1) Total Population View
23 |
24 | 
25 |
26 | ### 2) Migrating In View
27 |
28 | 
29 |
30 | ### 3) Stationary View
31 |
32 | 
33 |
34 | ### 4) Migrating Out View
35 |
36 | 
37 |
38 | ### 5) Net Migration View
39 |
40 | 
41 |
42 | #### Migration population to color mapping
43 |
44 | - Inward Migration: Purple-Blue
45 | - Stationary: Greens
46 | - Outward Migration: Red Purples
47 |
48 | ### 6) Population with Race view
49 |
50 | 
51 |
52 | #### Race to color mapping
53 |
54 | - White: aqua
55 | - African American: lime
56 | - American Indian: yellow
57 | - Asian: orange
58 | - Native Hawaiian: blue
59 | - Other Race alone: fuchsia
60 | - Two or More: saddlebrown
61 |
--------------------------------------------------------------------------------
/plotly_demo/app.py:
--------------------------------------------------------------------------------
1 | import os
2 | import time
3 |
4 | import cudf
5 | import dash_bootstrap_components as dbc
6 | import dash_daq as daq
7 | import numpy as np
8 | import pandas as pd
9 | from dash import Dash, ctx, dcc, html
10 | from dash.dependencies import Input, Output, State
11 | from dash.exceptions import PreventUpdate
12 | from distributed import Client
13 | from utils import *
14 | import tarfile
15 |
16 | # ### Dashboards start here
17 | text_color = "#cfd8dc" # Material blue-grey 100
18 |
19 | DATA_PATH = "../data"
20 | DATA_PATH_STATE = f"{DATA_PATH}/state-wise-population"
21 | DATA_PATH_TOTAL = f"{DATA_PATH}/total_population_dataset.parquet"
22 |
23 | # Download the required states data
24 | census_data_url = "https://data.rapids.ai/viz-data/total_population_dataset.parquet"
25 | check_dataset(census_data_url, DATA_PATH_TOTAL)
26 |
27 | census_state_data_url = "https://data.rapids.ai/viz-data/state-wise-population.tar.xz"
28 | if not os.path.exists(DATA_PATH_STATE):
29 | check_dataset(census_state_data_url, f"{DATA_PATH_STATE}.tar.xz")
30 | print("Extracting state-wise-population.tar.xz ...")
31 | with tarfile.open(f"{DATA_PATH_STATE}.tar.xz", "r:xz") as tar:
32 | tar.extractall(DATA_PATH)
33 | print("Done.")
34 |
35 | state_files = os.listdir(DATA_PATH_STATE)
36 | state_names = [os.path.splitext(f)[0] for f in state_files]
37 | # add USA (combined dataset) to the list of states
38 | state_names.append("USA")
39 |
40 |
41 | (
42 | data_center_3857,
43 | data_3857,
44 | data_4326,
45 | data_center_4326,
46 | selected_map_backup,
47 | selected_race_backup,
48 | selected_county_top_backup,
49 | selected_county_bt_backup,
50 | view_name_backup,
51 | c_df,
52 | gpu_enabled_backup,
53 | dragmode_backup,
54 | currently_loaded_state,
55 | ) = ([], [], [], [], None, None, None, None, None, None, None, "pan", None)
56 |
57 |
58 | app = Dash(__name__)
59 | application = app.server
60 |
61 | app.layout = html.Div(
62 | children=[
63 | html.Div(
64 | children=[
65 | html.H1(
66 | children=[
67 | "Census 2020 Net Migration Visualization",
68 | html.A(
69 | html.Img(
70 | src="assets/rapids-logo.png",
71 | style={
72 | "float": "right",
73 | "height": "45px",
74 | "marginRight": "1%",
75 | "marginTop": "-7px",
76 | },
77 | ),
78 | href="https://rapids.ai/",
79 | ),
80 | html.A(
81 | html.Img(
82 | src="assets/dash-logo.png",
83 | style={"float": "right", "height": "30px"},
84 | ),
85 | href="https://dash.plot.ly/",
86 | ),
87 | ],
88 | style={"textAlign": "left"},
89 | ),
90 | ]
91 | ),
92 | html.Div(
93 | children=[
94 | html.Div(
95 | children=[
96 | html.Div(
97 | children=[
98 | html.H4(
99 | [
100 | "Population Count and Query Time",
101 | ],
102 | className="container_title",
103 | ),
104 | dcc.Loading(
105 | dcc.Graph(
106 | id="indicator-graph",
107 | figure=blank_fig(row_heights[3]),
108 | config={"displayModeBar": False},
109 | ),
110 | color="#b0bec5",
111 | style={"height": f"{row_heights[3]}px"},
112 | ),
113 | ],
114 | style={"height": f"{row_heights[0]}px"},
115 | className="five columns pretty_container",
116 | id="indicator-div",
117 | ),
118 | html.Div(
119 | children=[
120 | html.Div(
121 | children=[
122 | html.Button(
123 | "Clear All Selections",
124 | id="clear-all",
125 | className="reset-button",
126 | ),
127 | ]
128 | ),
129 | html.H4(
130 | [
131 | "Options",
132 | ],
133 | className="container_title",
134 | ),
135 | html.Table(
136 | [
137 | html.Tr(
138 | [
139 | html.Td(
140 | html.Div("GPU Acceleration"),
141 | className="config-label",
142 | ),
143 | html.Td(
144 | html.Div(
145 | [
146 | daq.DarkThemeProvider(
147 | daq.BooleanSwitch(
148 | on=True,
149 | color="#00cc96",
150 | id="gpu-toggle",
151 | )
152 | ),
153 | dbc.Tooltip(
154 | "Caution: Using CPU compute for more than 50 million points is not recommended.",
155 | target="gpu-toggle",
156 | placement="bottom",
157 | autohide=True,
158 | style={
159 | "textAlign": "left",
160 | "fontSize": "15px",
161 | "color": "white",
162 | "width": "350px",
163 | "padding": "15px",
164 | "borderRadius": "5px",
165 | "backgroundColor": "#2a2a2e",
166 | },
167 | ),
168 | ]
169 | )
170 | ),
171 | ####### State Selection Dropdown ######
172 | html.Td(
173 | html.Div("Select State"),
174 | style={"fontSize": "20px"},
175 | ),
176 | html.Td(
177 | dcc.Dropdown(
178 | id="state-dropdown",
179 | options=[
180 | {"label": i, "value": i}
181 | for i in state_names
182 | ],
183 | value="USA",
184 | ),
185 | style={
186 | "width": "25%",
187 | "height": "15px",
188 | },
189 | ),
190 | ###### VIEWS ARE HERE ###########
191 | html.Td(
192 | html.Div("Data-Selection"),
193 | style={"fontSize": "20px"},
194 | ), # className="config-label"
195 | html.Td(
196 | dcc.Dropdown(
197 | id="view-dropdown",
198 | options=[
199 | {
200 | "label": "Total Population",
201 | "value": "total",
202 | },
203 | {
204 | "label": "Migrating In",
205 | "value": "in",
206 | },
207 | {
208 | "label": "Stationary",
209 | "value": "stationary",
210 | },
211 | {
212 | "label": "Migrating Out",
213 | "value": "out",
214 | },
215 | {
216 | "label": "Net Migration",
217 | "value": "net",
218 | },
219 | {
220 | "label": "Population with Race",
221 | "value": "race",
222 | },
223 | ],
224 | value="in",
225 | searchable=False,
226 | clearable=False,
227 | ),
228 | style={
229 | "width": "25%",
230 | "height": "15px",
231 | },
232 | ),
233 | ]
234 | ),
235 | ],
236 | style={"width": "100%", "marginTop": "30px"},
237 | ),
238 | # Hidden div inside the app that stores the intermediate value
239 | html.Div(
240 | id="datapoints-state-value",
241 | style={"display": "none"},
242 | ),
243 | ],
244 | style={"height": f"{row_heights[0]}px"},
245 | className="seven columns pretty_container",
246 | id="config-div",
247 | ),
248 | ]
249 | ),
250 | ##################### Map starts ###################################
251 | html.Div(
252 | children=[
253 | html.Button(
254 | "Clear Selection", id="reset-map", className="reset-button"
255 | ),
256 | html.H4(
257 | [
258 | "Population Distribution of Individuals",
259 | ],
260 | className="container_title",
261 | ),
262 | dcc.Graph(
263 | id="map-graph",
264 | config={"displayModeBar": False},
265 | figure=blank_fig(row_heights[1]),
266 | ),
267 | # Hidden div inside the app that stores the intermediate value
268 | html.Div(
269 | id="intermediate-state-value", style={"display": "none"}
270 | ),
271 | ],
272 | className="twelve columns pretty_container",
273 | id="map-div",
274 | style={"height": "50%"},
275 | ),
276 | ################# Bars start #########################
277 | # Race start
278 | html.Div(
279 | children=[
280 | html.Div(
281 | children=[
282 | html.Button(
283 | "Clear Selection",
284 | id="clear-race",
285 | className="reset-button",
286 | ),
287 | html.H4(
288 | [
289 | "Race Distribution",
290 | ],
291 | className="container_title",
292 | ),
293 | dcc.Graph(
294 | id="race-histogram",
295 | config={"displayModeBar": False},
296 | figure=blank_fig(row_heights[2]),
297 | ),
298 | ],
299 | className="one-third column pretty_container",
300 | id="race-div",
301 | ), # County top starts
302 | html.Div(
303 | children=[
304 | html.Button(
305 | "Clear Selection",
306 | id="clear-county-top",
307 | className="reset-button",
308 | ),
309 | html.H4(
310 | [
311 | "County-wise Top 15",
312 | ],
313 | className="container_title",
314 | ),
315 | dcc.Graph(
316 | id="county-histogram-top",
317 | config={"displayModeBar": False},
318 | figure=blank_fig(row_heights[2]),
319 | animate=False,
320 | ),
321 | ],
322 | className=" one-third column pretty_container",
323 | id="county-div-top",
324 | ),
325 | # County bottom starts
326 | html.Div(
327 | children=[
328 | html.Button(
329 | "Clear Selection",
330 | id="clear-county-bottom",
331 | className="reset-button",
332 | ),
333 | html.H4(
334 | [
335 | "County-wise Bottom 15",
336 | ],
337 | className="container_title",
338 | ),
339 | dcc.Graph(
340 | id="county-histogram-bottom",
341 | config={"displayModeBar": False},
342 | figure=blank_fig(row_heights[2]),
343 | animate=False,
344 | ),
345 | ],
346 | className="one-third column pretty_container",
347 | ),
348 | ],
349 | className="twelve columns",
350 | )
351 | ############## End of Bars #####################
352 | ]
353 | ),
354 | html.Div(
355 | [
356 | html.H4("Acknowledgements and Data Sources", style={"marginTop": "0"}),
357 | dcc.Markdown(
358 | """\
359 | - Migration dataset computed from the 2010 and 2020 Population Censuses; census data used with permission from IPUMS NHGIS, University of Minnesota, [www.nhgis.org](https://www.nhgis.org/) (not for redistribution).
360 | - Base map layer provided by [Mapbox](https://www.mapbox.com/).
361 | - Dashboard developed with [Plotly Dash](https://plotly.com/dash/).
362 | - Geospatial point rendering developed with [Datashader](https://datashader.org/).
363 | - GPU toggle accelerated with [RAPIDS cudf and dask_cudf](https://rapids.ai/) and [cupy](https://cupy.chainer.org/), CPU toggle with [pandas](https://pandas.pydata.org/).
364 | - For source code and data workflow, visit our [GitHub](https://github.com/rapidsai/plotly-dash-rapids-census-demo/tree/master).
365 | """
366 | ),
367 | ],
368 | style={
369 | "width": "98%",
370 | "marginRight": "0",
371 | "padding": "10px",
372 | },
373 | className="twelve columns pretty_container",
374 | ),
375 | ],
376 | )
377 |
378 |
379 | # Clear/reset button callbacks
380 | @app.callback(
381 | Output("map-graph", "selectedData"),
382 | [Input("reset-map", "n_clicks"), Input("clear-all", "n_clicks")],
383 | )
384 | def clear_map(*args):
385 | return None
386 |
387 |
388 | @app.callback(
389 | Output("race-histogram", "selectedData"),
390 | [Input("clear-race", "n_clicks"), Input("clear-all", "n_clicks")],
391 | )
392 | def clear_race_hist_selections(*args):
393 | return None
394 |
395 |
396 | @app.callback(
397 | Output("county-histogram-top", "selectedData"),
398 | [Input("clear-county-top", "n_clicks"), Input("clear-all", "n_clicks")],
399 | )
400 | def clear_county_hist_top_selections(*args):
401 | return None
402 |
403 |
404 | @app.callback(
405 | Output("county-histogram-bottom", "selectedData"),
406 | [Input("clear-county-bottom", "n_clicks"), Input("clear-all", "n_clicks")],
407 | )
408 | def clear_county_hist_bottom_selections(*args):
409 | return None
410 |
411 |
412 | @app.callback(
413 | [
414 | Output("indicator-graph", "figure"),
415 | Output("map-graph", "figure"),
416 | Output("map-graph", "config"),
417 | Output("county-histogram-top", "figure"),
418 | Output("county-histogram-top", "config"),
419 | Output("county-histogram-bottom", "figure"),
420 | Output("county-histogram-bottom", "config"),
421 | Output("race-histogram", "figure"),
422 | Output("race-histogram", "config"),
423 | Output("intermediate-state-value", "children"),
424 | ],
425 | [
426 | Input("map-graph", "relayoutData"),
427 | Input("map-graph", "selectedData"),
428 | Input("race-histogram", "selectedData"),
429 | Input("county-histogram-top", "selectedData"),
430 | Input("county-histogram-bottom", "selectedData"),
431 | Input("view-dropdown", "value"),
432 | Input("state-dropdown", "value"),
433 | Input("gpu-toggle", "on"),
434 | ],
435 | [
436 | State("intermediate-state-value", "children"),
437 | ],
438 | )
439 | def update_plots(
440 | relayout_data,
441 | selected_map,
442 | selected_race,
443 | selected_county_top,
444 | selected_county_bottom,
445 | view_name,
446 | state_name,
447 | gpu_enabled,
448 | coordinates_backup,
449 | ):
450 | global data_3857, data_center_3857, data_4326, data_center_4326, currently_loaded_state, selected_race_backup, selected_county_top_backup, selected_county_bt_backup
451 |
452 | # condition to avoid reloading on tool update
453 | if (
454 | ctx.triggered_id == "map-graph"
455 | and relayout_data
456 | and list(relayout_data.keys()) == ["dragmode"]
457 | ):
458 | raise PreventUpdate
459 |
460 | # condition to avoid a bug in plotly where selectedData is reset following a box-select
461 | if not (selected_race is not None and len(selected_race["points"]) == 0):
462 | selected_race_backup = selected_race
463 | elif ctx.triggered_id == "race-histogram":
464 | raise PreventUpdate
465 |
466 | # condition to avoid a bug in plotly where selectedData is reset following a box-select
467 | if not (
468 | selected_county_top is not None and len(selected_county_top["points"]) == 0
469 | ):
470 | selected_county_top_backup = selected_county_top
471 | elif ctx.triggered_id == "county-histogram-top":
472 | raise PreventUpdate
473 |
474 | # condition to avoid a bug in plotly where selectedData is reset following a box-select
475 | if not (
476 | selected_county_bottom is not None
477 | and len(selected_county_bottom["points"]) == 0
478 | ):
479 | selected_county_bt_backup = selected_county_bottom
480 | elif ctx.triggered_id == "county-histogram-bottom":
481 | raise PreventUpdate
482 |
483 | df = read_dataset(state_name, gpu_enabled, currently_loaded_state)
484 |
485 | t0 = time.time()
486 |
487 | if coordinates_backup is not None:
488 | coordinates_4326_backup, position_backup = coordinates_backup
489 | else:
490 | coordinates_4326_backup, position_backup = None, None
491 |
492 | colorscale_name = "Viridis"
493 |
494 | if data_3857 == [] or state_name != currently_loaded_state:
495 | (
496 | data_3857,
497 | data_center_3857,
498 | data_4326,
499 | data_center_4326,
500 | ) = set_projection_bounds(df)
501 |
502 | (
503 | datashader_plot,
504 | race_histogram,
505 | county_top_histogram,
506 | county_bottom_histogram,
507 | n_selected_indicator,
508 | coordinates_4326_backup,
509 | position_backup,
510 | ) = build_updated_figures(
511 | df,
512 | relayout_data,
513 | selected_map,
514 | selected_race_backup,
515 | selected_county_top_backup,
516 | selected_county_bt_backup,
517 | colorscale_name,
518 | data_3857,
519 | data_center_3857,
520 | data_4326,
521 | data_center_4326,
522 | coordinates_4326_backup,
523 | position_backup,
524 | view_name,
525 | )
526 |
527 | barchart_config = {
528 | "displayModeBar": True,
529 | "modeBarButtonsToRemove": [
530 | "zoom2d",
531 | "pan2d",
532 | "select2d",
533 | "lasso2d",
534 | "zoomIn2d",
535 | "zoomOut2d",
536 | "resetScale2d",
537 | "hoverClosestCartesian",
538 | "hoverCompareCartesian",
539 | "toggleSpikelines",
540 | ],
541 | }
542 | compute_time = time.time() - t0
543 |     print(f"Query time: {compute_time:.4f} s")
544 | n_selected_indicator["data"].append(
545 | {
546 | "title": {"text": "Query Time"},
547 | "type": "indicator",
548 | "value": round(compute_time, 4),
549 | "domain": {"x": [0.6, 0.85], "y": [0, 0.5]},
550 | "number": {
551 | "font": {
552 | "color": text_color,
553 |                     "size": 50,
554 | },
555 | "suffix": " seconds",
556 | },
557 | }
558 | )
559 |
560 | datashader_plot["layout"]["dragmode"] = (
561 | relayout_data["dragmode"]
562 | if (relayout_data and "dragmode" in relayout_data)
563 | else dragmode_backup
564 | )
565 | # update currently loaded state
566 | currently_loaded_state = state_name
567 |
568 | return (
569 | n_selected_indicator,
570 | datashader_plot,
571 | {
572 | "displayModeBar": True,
573 | "modeBarButtonsToRemove": [
574 | "lasso2d",
575 | "zoomInMapbox",
576 | "zoomOutMapbox",
577 | "toggleHover",
578 | ],
579 | },
580 | county_top_histogram,
581 | barchart_config,
582 | county_bottom_histogram,
583 | barchart_config,
584 | race_histogram,
585 | barchart_config,
586 | (coordinates_4326_backup, position_backup),
587 | )
588 |
589 |
590 | def read_dataset(state_name, gpu_enabled, currently_loaded_state):
591 |     global c_df
592 |     # Reload when the selected state changes, nothing is cached yet, or the
593 |     # GPU toggle has switched backends since the frame was cached
594 |     backend = cudf.DataFrame if gpu_enabled else pd.DataFrame
595 |     if state_name != currently_loaded_state or not isinstance(c_df, backend):
596 |         if state_name == "USA":
597 |             data_path = f"{DATA_PATH}/total_population_dataset.parquet"
598 |         else:
599 |             data_path = f"{DATA_PATH_STATE}/{state_name}.parquet"
600 |         c_df = load_dataset(data_path, "cudf" if gpu_enabled else "pandas")
601 |     return c_df
602 |
603 |
604 | if __name__ == "__main__":
605 |     # Launch dashboard
606 |     app.run_server(
607 |         debug=True,
608 |         dev_tools_hot_reload=True,
609 |         host="0.0.0.0",
610 |     )
611 |
--------------------------------------------------------------------------------
/plotly_demo/assets/dash-logo.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/rapidsai/plotly-dash-rapids-census-demo/e4af2bd3de86263b7f1f947ba9b302002e047a55/plotly_demo/assets/dash-logo.png
--------------------------------------------------------------------------------
/plotly_demo/assets/rapids-logo.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/rapidsai/plotly-dash-rapids-census-demo/e4af2bd3de86263b7f1f947ba9b302002e047a55/plotly_demo/assets/rapids-logo.png
--------------------------------------------------------------------------------
/plotly_demo/assets/s1.css:
--------------------------------------------------------------------------------
1 | /* Table of contents
2 | ––––––––––––––––––––––––––––––––––––––––––––––––––
3 | - Plotly.js
4 | - Grid
5 | - Base Styles
6 | - Typography
7 | - Links
8 | - Buttons
9 | - Forms
10 | - Lists
11 | - Code
12 | - Tables
13 | - Spacing
14 | - Utilities
15 | - Clearing
16 | - Media Queries
17 | */
18 |
19 | /* Grid
20 | –––––––––––––––––––––––––––––––––––––––––––––––––– */
21 | .container {
22 | position: relative;
23 | width: 100%;
24 | max-width: 960px;
25 | margin: 0 auto;
26 | padding: 0 20px;
27 | box-sizing: border-box; }
28 | .column,
29 | .columns {
30 | width: 100%;
31 | float: left;
32 | box-sizing: border-box; }
33 |
34 | /* For devices larger than 400px */
35 | @media (min-width: 400px) {
36 | .container {
37 | width: 85%;
38 | padding: 0; }
39 | }
40 |
41 | /* For devices larger than 550px */
42 | @media (min-width: 550px) {
43 | .container {
44 | width: 80%;
45 | }
46 | .column,
47 | .columns {
48 | margin-left: 4%; }
49 | .column:first-child,
50 | .columns:first-child {
51 | margin-left: 0; }
52 |
53 |
54 | .one.column,
55 | .one.columns { width: 4.66666666667%; }
56 | .two.columns { width: 13.3333333333%; }
57 | .three.columns { width: 22%; }
58 | .four.columns { width: 30.6666666667%; }
59 | .five.columns { width: 39.3333333333%; }
60 | .six.columns { width: 48%; }
61 | .seven.columns { width: 56.6666666667%; }
62 | .eight.columns { width: 65.3333333333%; }
63 | .nine.columns { width: 74.0%; }
64 | .ten.columns { width: 82.6666666667%; }
65 | .eleven.columns { width: 91.3333333333%; }
66 | .twelve.columns { width: 98%; margin-left: 0; margin-right: 0;}
67 |
68 |   .one-third.column { width: 32%; margin-right: 0.5%;}
69 | .one-third.column:last-child { margin-right: 0;}
70 | .two-thirds.column { width: 65.3333333333%; }
71 |
72 | .one-half.column { width: 48%; }
73 |
74 | /* Offsets */
75 | .offset-by-one.column,
76 | .offset-by-one.columns { margin-left: 8.66666666667%; }
77 | .offset-by-two.column,
78 | .offset-by-two.columns { margin-left: 17.3333333333%; }
79 | .offset-by-three.column,
80 | .offset-by-three.columns { margin-left: 26%; }
81 | .offset-by-four.column,
82 | .offset-by-four.columns { margin-left: 34.6666666667%; }
83 | .offset-by-five.column,
84 | .offset-by-five.columns { margin-left: 43.3333333333%; }
85 | .offset-by-six.column,
86 | .offset-by-six.columns { margin-left: 52%; }
87 | .offset-by-seven.column,
88 | .offset-by-seven.columns { margin-left: 60.6666666667%; }
89 | .offset-by-eight.column,
90 | .offset-by-eight.columns { margin-left: 69.3333333333%; }
91 | .offset-by-nine.column,
92 | .offset-by-nine.columns { margin-left: 78.0%; }
93 | .offset-by-ten.column,
94 | .offset-by-ten.columns { margin-left: 86.6666666667%; }
95 | .offset-by-eleven.column,
96 | .offset-by-eleven.columns { margin-left: 95.3333333333%; }
97 |
98 | .offset-by-one-third.column,
99 | .offset-by-one-third.columns { margin-left: 34.6666666667%; }
100 | .offset-by-two-thirds.column,
101 | .offset-by-two-thirds.columns { margin-left: 69.3333333333%; }
102 |
103 | .offset-by-one-half.column,
104 | .offset-by-one-half.columns { margin-left: 52%; }
105 |
106 | }
107 |
108 |
109 | /* Base Styles
110 | –––––––––––––––––––––––––––––––––––––––––––––––––– */
111 | /* NOTE
112 | html is set to 62.5% so that all the REM measurements throughout Skeleton
113 | are based on 10px sizing. So basically 1.5rem = 15px :) */
114 | html {
115 | font-size: 62.5%; }
116 | body {
117 | font-size: 1.5em; /* currently ems cause chrome bug misinterpreting rems on body element */
118 | line-height: 1.6;
119 | font-weight: 400;
120 | font-family: "Open Sans", "HelveticaNeue", "Helvetica Neue", Helvetica, Arial, sans-serif;
121 | color: #cfd8dc; /* Material blue-grey 100 */
122 | background-color: #191a1a; /* Material blue-grey 900*/
123 | margin: 2%;
124 | }
125 |
126 |
127 | /* Typography
128 | –––––––––––––––––––––––––––––––––––––––––––––––––– */
129 | h1, h2, h3, h4, h5, h6 {
130 | margin-top: 0;
131 | margin-bottom: 0;
132 | font-weight: 300; }
133 | h1 { font-size: 3.2rem; line-height: 1.2; letter-spacing: -.1rem; margin-bottom: 2rem; }
134 | h2 { font-size: 3.0rem; line-height: 1.25; letter-spacing: -.1rem; margin-bottom: 1.8rem; margin-top: 1.8rem;}
135 | h3 { font-size: 2.7rem; line-height: 1.3; letter-spacing: -.1rem; margin-bottom: 1.5rem; margin-top:1.5rem;}
136 | h4 { font-size: 2.4rem; line-height: 1.35; letter-spacing: -.08rem; margin-bottom: 1.2rem; margin-top: 1.2rem;}
137 | h5 { font-size: 2.0rem; line-height: 1.5; letter-spacing: -.05rem; margin-bottom: 0.6rem; margin-top: 0.6rem;}
138 | h6 { font-size: 2.0rem; line-height: 1.6; letter-spacing: 0; margin-bottom: 0.75rem; margin-top: 0.75rem;}
139 |
140 | p {
141 | margin-top: 0; }
142 |
143 |
144 | /* Blockquotes
145 | –––––––––––––––––––––––––––––––––––––––––––––––––– */
146 | blockquote {
147 | border-left: 4px lightgrey solid;
148 | padding-left: 1rem;
149 | margin-top: 2rem;
150 | margin-bottom: 2rem;
151 | margin-left: 0rem;
152 | }
153 |
154 |
155 | /* Links
156 | –––––––––––––––––––––––––––––––––––––––––––––––––– */
157 | a {
158 | color: #1565c0; /* Material Blue 800 */
159 | text-decoration: underline;
160 | cursor: pointer;}
161 | a:hover {
162 | color: #0d47a1; /* Material Blue 900 */
163 | }
164 |
165 |
166 | /* Buttons
167 | –––––––––––––––––––––––––––––––––––––––––––––––––– */
168 | .button,
169 | button,
170 | input[type="submit"],
171 | input[type="reset"],
172 | input[type="button"] {
173 | display: inline-block;
174 | height: 38px;
175 | padding: 0 30px;
176 | color: #90a4ae; /* Material blue-gray 300*/
177 | text-align: center;
178 | font-size: 11px;
179 | font-weight: 600;
180 | line-height: 38px;
181 | letter-spacing: .1rem;
182 | text-transform: uppercase;
183 | text-decoration: none;
184 | white-space: nowrap;
185 | background-color: transparent;
186 | border-radius: 4px;
187 | border: 1px solid #90a4ae; /* Material blue-gray 300*/
188 | cursor: pointer;
189 | box-sizing: border-box; }
190 | .button:hover,
191 | button:hover,
192 | input[type="submit"]:hover,
193 | input[type="reset"]:hover,
194 | input[type="button"]:hover,
195 | .button:focus,
196 | button:focus,
197 | input[type="submit"]:focus,
198 | input[type="reset"]:focus,
199 | input[type="button"]:focus {
200 | color: #cfd8dc;
201 | border-color: #cfd8dc;
202 | outline: 0; }
203 | .button.button-primary,
204 | button.button-primary,
205 | input[type="submit"].button-primary,
206 | input[type="reset"].button-primary,
207 | input[type="button"].button-primary {
208 | color: #FFF;
209 | background-color: #33C3F0;
210 | border-color: #33C3F0; }
211 | .button.button-primary:hover,
212 | button.button-primary:hover,
213 | input[type="submit"].button-primary:hover,
214 | input[type="reset"].button-primary:hover,
215 | input[type="button"].button-primary:hover,
216 | .button.button-primary:focus,
217 | button.button-primary:focus,
218 | input[type="submit"].button-primary:focus,
219 | input[type="reset"].button-primary:focus,
220 | input[type="button"].button-primary:focus {
221 | color: #FFF;
222 | background-color: #1EAEDB;
223 | border-color: #1EAEDB; }
224 |
225 |
226 | /* Forms
227 | –––––––––––––––––––––––––––––––––––––––––––––––––– */
228 | input[type="email"],
229 | input[type="number"],
230 | input[type="search"],
231 | input[type="text"],
232 | input[type="tel"],
233 | input[type="url"],
234 | input[type="password"],
235 | textarea,
236 | select {
237 | height: 38px;
238 | padding: 6px 10px; /* The 6px vertically centers text on FF, ignored by Webkit */
239 | background-color: #fff;
240 | border: 1px solid #D1D1D1;
241 | border-radius: 4px;
242 | box-shadow: none;
243 | box-sizing: border-box;
244 | font-family: inherit;
245 | font-size: inherit; /*https://stackoverflow.com/questions/6080413/why-doesnt-input-inherit-the-font-from-body*/}
246 | /* Removes awkward default styles on some inputs for iOS */
247 | input[type="email"],
248 | input[type="number"],
249 | input[type="search"],
250 | input[type="text"],
251 | input[type="tel"],
252 | input[type="url"],
253 | input[type="password"],
254 | textarea {
255 | -webkit-appearance: none;
256 | -moz-appearance: none;
257 | appearance: none; }
258 | textarea {
259 | min-height: 65px;
260 | padding-top: 6px;
261 | padding-bottom: 6px; }
262 | input[type="email"]:focus,
263 | input[type="number"]:focus,
264 | input[type="search"]:focus,
265 | input[type="text"]:focus,
266 | input[type="tel"]:focus,
267 | input[type="url"]:focus,
268 | input[type="password"]:focus,
269 | textarea:focus,
270 | select:focus {
271 | border: 1px solid #33C3F0;
272 | outline: 0; }
273 | label,
274 | legend {
275 | display: block;
276 | margin-bottom: 0px; }
277 | fieldset {
278 | padding: 0;
279 | border-width: 0; }
280 | input[type="checkbox"],
281 | input[type="radio"] {
282 | display: inline; }
283 | label > .label-body {
284 | display: inline-block;
285 | margin-left: .5rem;
286 | font-weight: normal; }
287 |
288 |
289 | /* Lists
290 | –––––––––––––––––––––––––––––––––––––––––––––––––– */
291 | ul {
292 | list-style: circle inside; }
293 | ol {
294 | list-style: decimal inside; }
295 | ol, ul {
296 | padding-left: 0;
297 | margin-top: 0; }
298 | ul ul,
299 | ul ol,
300 | ol ol,
301 | ol ul {
302 | margin: 1.5rem 0 1.5rem 3rem;
303 | font-size: 90%; }
304 | li {
305 | margin-bottom: 0;
306 | }
307 |
308 | /* Tables
309 | –––––––––––––––––––––––––––––––––––––––––––––––––– */
310 | table {
311 | border-collapse: collapse;
312 | }
313 | th:not(.CalendarDay),
314 | td:not(.CalendarDay) {
315 | padding: 4px 10px;
316 | text-align: left;
317 | /*border-bottom: 1px solid #E1E1E1;*/
318 | }
319 | th:first-child:not(.CalendarDay),
320 | td:first-child:not(.CalendarDay) {
321 | padding-left: 0; }
322 | /*th:last-child:not(.CalendarDay),*/
323 | /*td:last-child:not(.CalendarDay) {*/
324 | /*  padding-right: 0; }*/
325 |
326 |
327 | /* Spacing
328 | –––––––––––––––––––––––––––––––––––––––––––––––––– */
329 | button,
330 | .button {
331 | margin-bottom: 0rem; }
332 | input,
333 | textarea,
334 | select,
335 | fieldset {
336 | margin-bottom: 0rem; }
337 | pre,
338 | dl,
339 | figure,
340 | table,
341 | form {
342 | margin-bottom: 0rem; }
343 | p,
344 | ul,
345 | ol {
346 | margin-bottom: 0.75rem; }
347 |
348 | /* Utilities
349 | –––––––––––––––––––––––––––––––––––––––––––––––––– */
350 | .u-full-width {
351 | width: 100%;
352 | box-sizing: border-box; }
353 | .u-max-full-width {
354 | max-width: 100%;
355 | box-sizing: border-box; }
356 | .u-pull-right {
357 | float: right; }
358 | .u-pull-left {
359 | float: left; }
360 |
361 |
362 | /* Misc
363 | –––––––––––––––––––––––––––––––––––––––––––––––––– */
364 | hr {
365 | margin-top: 3rem;
366 | margin-bottom: 3.5rem;
367 | border-width: 0;
368 | border-top: 1px solid #E1E1E1; }
369 |
370 |
371 | /* Clearing
372 | –––––––––––––––––––––––––––––––––––––––––––––––––– */
373 |
374 | /* Self Clearing Goodness */
375 | .container:after,
376 | .row:after,
377 | .u-cf {
378 | content: "";
379 | display: table;
380 | clear: both; }
381 |
382 |
383 | /* Media Queries
384 | –––––––––––––––––––––––––––––––––––––––––––––––––– */
385 | /*
386 | Note: The best way to structure the use of media queries is to create the queries
387 | near the relevant code. For example, if you wanted to change the styles for buttons
388 | on small devices, paste the mobile query code up in the buttons section and style it
389 | there.
390 | */
391 |
392 |
393 | /* Larger than mobile */
394 | @media (min-width: 400px) {}
395 |
396 | /* Larger than phablet (also point when grid becomes active) */
397 | @media (min-width: 550px) {}
398 |
399 | /* Larger than tablet */
400 | @media (min-width: 750px) {}
401 |
402 | /* Larger than desktop */
403 | @media (min-width: 1000px) {}
404 |
405 | /* Larger than Desktop HD */
406 | @media (min-width: 1200px) {}
407 |
408 | /* Pretty container
409 | –––––––––––––––––––––––––––––––––––––––––––––––––– */
410 | .pretty_container {
411 | border-radius: 10px;
412 | background-color: #000000; /* Mapbox light map land color */
413 | margin: 0.5%;
414 | margin-left: 1.5%;
415 | padding: 1%;
416 | position: relative;
417 | box-shadow: 1px 1px 1px black;
418 | }
419 |
420 | .container_title {
421 | margin-top: 0;
422 | margin-bottom: 0.2em;
423 | font-size: 2.6rem;
424 | line-height: 2.6rem;
425 | }
426 |
427 | /* Special purpose buttons
428 | –––––––––––––––––––––––––––––––––––––––––––––––––– */
429 | .reset-button {
430 | /* width: 100%; */
431 | /* margin-top: 10px; */
432 | margin-top: -5px;
433 | height: 30px;
434 | line-height: 30px;
435 | float: right;
436 | }
437 |
438 | .info-icon {
439 | float: right;
440 | cursor: pointer;
441 | }
442 |
443 |
444 | /* Modal info layer
445 | –––––––––––––––––––––––––––––––––––––––––––––––––– */
446 | .modal {
447 | position: fixed;
448 | z-index: 1002; /* Sit on top, including modebar which has z=1001 */
449 | left: 0;
450 | top: 0;
451 | width: 100%; /* Full width */
452 | height: 100%; /* Full height */
453 | background-color: rgba(0, 0, 0, 0.6); /* Black w/ opacity */
454 | }
455 |
456 | .modal-content {
457 | z-index: 1004; /* Sit on top, including modebar which has z=1001 */
458 | position: fixed;
459 | left: 0;
460 | width: 60%;
461 | background-color: #3949ab; /* Material indigo 600 */
462 | color: white;
463 | border-radius: 5px;
464 | margin-left: 20%;
465 | margin-bottom: 2%;
466 | margin-top: 2%;
467 | }
468 |
469 | .modal-content > div {
470 | text-align: left;
471 | margin: 15px;
472 | }
473 |
474 | .modal-content.bottom {
475 | bottom: 0;
476 | }
477 |
478 | .modal-content.top {
479 | top: 0;
480 | }
481 |
482 | /* Config pane
483 | –––––––––––––––––––––––––––––––––––––––––––––––––– */
484 | .config-label {
485 | text-align: right !important;
486 | }
487 |
488 | .VirtualizedSelectOption {
489 | color: #191a1a;
490 | }
--------------------------------------------------------------------------------
/plotly_demo/dask_app.py:
--------------------------------------------------------------------------------
1 | import argparse
2 | import time
3 |
4 | import cudf
5 | import dash
6 | import dash_bootstrap_components as dbc
7 | import dash_daq as daq
8 | import numpy as np
9 | import pandas as pd
10 | from dash import dcc, html
11 | from dash.dependencies import Input, Output, State
12 | from dask import delayed
13 | from dask_cuda import LocalCUDACluster
14 | from distributed import Client
15 | from utils.utils import *
16 |
17 |
18 |
19 | # ### Dashboards start here
20 | text_color = "#cfd8dc" # Material blue-grey 100
21 |
22 | (
23 | data_center_3857,
24 | data_3857,
25 | data_4326,
26 | data_center_4326,
27 | selected_map_backup,
28 | selected_race_backup,
29 | selected_county_top_backup,
30 | selected_county_bt_backup,
31 | view_name_backup,
32 | gpu_enabled_backup,
33 | dragmode_backup,
34 | ) = ([], [], [], [], None, None, None, None, None, None, "pan")
35 |
36 |
37 | app = dash.Dash(__name__, external_stylesheets=[dbc.themes.BOOTSTRAP])
38 | app.layout = html.Div(
39 | children=[
40 | ################# Title Bar ##############
41 | html.Div(
42 | [
43 | html.H1(
44 | children=[
45 | "Census 2020 Net Migration Visualization",
46 | html.A(
47 | html.Img(
48 | src="assets/rapids-logo.png",
49 | style={
50 | "float": "right",
51 | "height": "45px",
52 | "margin-right": "1%",
53 | "margin-top": "-7px",
54 | },
55 | ),
56 | href="https://rapids.ai/",
57 | ),
58 | html.A(
59 | html.Img(
60 | src="assets/dash-logo.png",
61 | style={"float": "right", "height": "30px"},
62 | ),
63 | href="https://dash.plot.ly/",
64 | ),
65 | ],
66 | style={
67 | "text-align": "left",
68 |                     "height": "30px",
69 | "margin-left": "20px",
70 | },
71 | ),
72 | ]
73 | ),
74 | ###################### Options Bar ######################
75 | html.Div(
76 | children=[
77 | html.Div(
78 | children=[
79 | html.Table(
80 | [
81 | html.Tr(
82 | [
83 | html.Td(
84 | html.Div("CPU"),
85 | style={
86 | "font-size": "20px",
87 | "padding-left": "1.3rem",
88 | }, # className="config-label"
89 | ),
90 | html.Td(
91 | html.Div(
92 | [
93 | daq.DarkThemeProvider(
94 | daq.BooleanSwitch(
95 |                                         on=True,  # True = GPU, False = CPU
96 | color="#00cc96",
97 | id="gpu-toggle",
98 | )
99 | ),
100 | dbc.Tooltip(
101 | "Caution: Using CPU compute for more than 50 million points is not recommended.",
102 | target="gpu-toggle",
103 | placement="bottom",
104 | autohide=True,
105 | style={
106 | "textAlign": "left",
107 | "font-size": "15px",
108 | "color": "white",
109 | "width": "350px",
110 | "padding": "15px",
111 | "border-radius": "5px",
112 | "background-color": "#2a2a2e",
113 | },
114 | ),
115 | ]
116 | )
117 | ),
118 | html.Td(
119 | html.Div("GPU + RAPIDS"),
120 | style={
121 | "font-size": "20px"
122 | }, # , className="config-label"
123 | ),
124 | ####### Indicator graph ######
125 | html.Td(
126 | [
127 | dcc.Loading(
128 | dcc.Graph(
129 | id="indicator-graph",
130 | figure=blank_fig(50),
131 | config={
132 | "displayModeBar": False
133 | },
134 | style={"width": "95%"},
135 | ),
136 | color="#b0bec5",
137 | # style={'height': f'{50}px', 'width':'10px'}
138 | ), # style={'width': '50%'},
139 | ]
140 | ),
141 | ###### VIEWS ARE HERE ###########
142 | html.Td(
143 | html.Div("Data-Selection"),
144 | style={"font-size": "20px"},
145 | ), # className="config-label"
146 | html.Td(
147 | dcc.Dropdown(
148 | id="view-dropdown",
149 | options=[
150 | {
151 | "label": "Total Population",
152 | "value": "total",
153 | },
154 | {
155 | "label": "Migrating In",
156 | "value": "in",
157 | },
158 | {
159 | "label": "Stationary",
160 | "value": "stationary",
161 | },
162 | {
163 | "label": "Migrating Out",
164 | "value": "out",
165 | },
166 | {
167 | "label": "Net Migration",
168 | "value": "net",
169 | },
170 | {
171 | "label": "Population with Race",
172 | "value": "race",
173 | },
174 | ],
175 | value="in",
176 | searchable=False,
177 | clearable=False,
178 | ),
179 | style={
180 | "width": "10%",
181 | "height": "15px",
182 | },
183 | ),
184 | html.Td(
185 | html.Div(
186 | children=[
187 | html.Button(
188 | "Clear All Selections",
189 | id="clear-all",
190 | className="reset-button",
191 | ),
192 | ]
193 | ),
194 | style={
195 | "width": "10%",
196 | "height": "15px",
197 | },
198 | ),
199 | ]
200 | ),
201 | ],
202 | style={"width": "100%", "margin-top": "0px"},
203 | ),
204 | # Hidden div inside the app that stores the intermediate value
205 | html.Div(
206 | id="datapoints-state-value",
207 | style={"display": "none"},
208 | ),
209 | ],
210 | className="columns pretty_container",
211 | ), # className='columns pretty_container', id="config-div"),
212 | ]
213 | ),
214 | ########## End of options bar #######################################
215 | html.Hr(id="line1", style={"border": "1px solid grey", "margin": "0px"}),
216 | # html.Div( html.Hr(id='line',style={'border': '1px solid red'}) ),
217 | ##################### Map starts ###################################
218 | html.Div(
219 | children=[
220 | html.Button(
221 | "Clear Selection", id="reset-map", className="reset-button"
222 | ),
223 | html.H4(
224 | [
225 | "Individual Distribution",
226 | ],
227 | className="container_title",
228 | ),
229 | dcc.Graph(
230 | id="map-graph",
231 | config={"displayModeBar": False},
232 | figure=blank_fig(440),
233 | ),
234 | # Hidden div inside the app that stores the intermediate value
235 | html.Div(id="intermediate-state-value", style={"display": "none"}),
236 | ],
237 | className="columns pretty_container",
238 | style={"width": "100%", "margin-right": "0", "height": "66%"},
239 | id="map-div",
240 | ),
241 | html.Hr(id="line2", style={"border": "1px solid grey", "margin": "0px"}),
242 | ################# Bars start #########################
243 | # Race start
244 | html.Div(
245 | children=[
246 | html.Button(
247 | "Clear Selection",
248 | id="clear-race",
249 | className="reset-button",
250 | ),
251 | html.H4(
252 | [
253 | "Race Distribution",
254 | ],
255 | className="container_title",
256 | ),
257 | dcc.Graph(
258 | id="race-histogram",
259 | config={"displayModeBar": False},
260 | figure=blank_fig(row_heights[2]),
261 | animate=False,
262 | ),
263 | ],
264 | className="columns pretty_container",
265 | id="race-div",
266 | style={"width": "33.33%", "height": "20%"},
267 | ),
268 | # County top starts
269 | html.Div(
270 | children=[
271 | html.Button(
272 | "Clear Selection",
273 | id="clear-county-top",
274 | className="reset-button",
275 | ),
276 | html.H4(
277 | [
278 | "County-wise Top 15",
279 | ],
280 | className="container_title",
281 | ),
282 | dcc.Graph(
283 | id="county-histogram-top",
284 | config={"displayModeBar": False},
285 | figure=blank_fig(row_heights[2]),
286 | animate=False,
287 | ),
288 | ],
289 | className="columns pretty_container",
290 | id="county-div-top",
291 | style={"width": "33.33%", "height": "20%"},
292 | ),
293 | # County bottom starts
294 | html.Div(
295 | children=[
296 | html.Button(
297 | "Clear Selection",
298 | id="clear-county-bottom",
299 | className="reset-button",
300 | ),
301 | html.H4(
302 | [
303 | "County-wise Bottom 15",
304 | ],
305 | className="container_title",
306 | ),
307 | dcc.Graph(
308 | id="county-histogram-bottom",
309 | config={"displayModeBar": False},
310 | figure=blank_fig(row_heights[2]),
311 | animate=False,
312 | ),
313 | ],
314 | className="columns pretty_container",
315 | id="county-div-bottom",
316 | style={"width": "33.33%", "height": "20%"},
317 | ),
318 | ############## End of Bars #####################
319 | html.Hr(id="line3", style={"border": "1px solid grey", "margin": "0px"}),
320 | html.Div(
321 | [
322 | html.H4(
323 | "Acknowledgements and Data Sources",
324 | style={"margin-top": "0"},
325 | ),
326 | dcc.Markdown(
327 | """\
328 | - Migration dataset computed from the 2020 and 2010 Population Censuses, used with permission from IPUMS NHGIS, University of Minnesota, [www.nhgis.org](https://www.nhgis.org/) (not for redistribution).
329 | - Base map layer provided by [Mapbox](https://www.mapbox.com/).
330 | - Dashboard developed with [Plotly Dash](https://plotly.com/dash/).
331 | - Geospatial point rendering developed with [Datashader](https://datashader.org/).
332 | - GPU toggle accelerated with [RAPIDS cudf and dask_cudf](https://rapids.ai/) and [cupy](https://cupy.chainer.org/), CPU toggle with [pandas](https://pandas.pydata.org/).
333 | - For source code and data workflow, visit our [GitHub](https://github.com/rapidsai/plotly-dash-rapids-census-demo/tree/master).
334 | """
335 | ),
336 | ],
337 | style={"width": "100%"},
338 | className="columns pretty_container",
339 | ),
340 | ]
341 | )
342 |
343 | # Clear/reset button callbacks
344 |
345 |
346 | @app.callback(
347 | Output("map-graph", "selectedData"),
348 | [Input("reset-map", "n_clicks"), Input("clear-all", "n_clicks")],
349 | )
350 | def clear_map(*args):
351 | return None
352 |
353 |
354 | @app.callback(
355 | Output("race-histogram", "selectedData"),
356 | [Input("clear-race", "n_clicks"), Input("clear-all", "n_clicks")],
357 | )
358 | def clear_race_hist_selections(*args):
359 | return None
360 |
361 |
362 | @app.callback(
363 | Output("county-histogram-top", "selectedData"),
364 | [Input("clear-county-top", "n_clicks"), Input("clear-all", "n_clicks")],
365 | )
366 | def clear_county_hist_top_selections(*args):
367 | return None
368 |
369 |
370 | @app.callback(
371 | Output("county-histogram-bottom", "selectedData"),
372 | [Input("clear-county-bottom", "n_clicks"), Input("clear-all", "n_clicks")],
373 | )
374 | def clear_county_hist_bottom_selections(*args):
375 | return None
376 |
377 |
378 | # Plot-update callback registration
379 |
380 |
381 | def register_update_plots_callback(client):
382 | """
383 | Register Dash callback that updates all plots in response to selection events
384 | Args:
385 | client: Dask distributed Client holding the published cudf/pandas datasets
386 | """
387 |
388 | @app.callback(
389 | [
390 | Output("indicator-graph", "figure"),
391 | Output("map-graph", "figure"),
392 | Output("map-graph", "config"),
393 | Output("map-graph", "relayoutData"),
394 | Output("county-histogram-top", "figure"),
395 | Output("county-histogram-top", "config"),
396 | Output("county-histogram-bottom", "figure"),
397 | Output("county-histogram-bottom", "config"),
398 | Output("race-histogram", "figure"),
399 | Output("race-histogram", "config"),
400 | Output("intermediate-state-value", "children"),
401 | ],
402 | [
403 | Input("map-graph", "relayoutData"),
404 | Input("map-graph", "selectedData"),
405 | Input("race-histogram", "selectedData"),
406 | Input("county-histogram-top", "selectedData"),
407 | Input("county-histogram-bottom", "selectedData"),
408 | Input("view-dropdown", "value"),
409 | Input("gpu-toggle", "on"),
410 | ],
411 | [
412 | State("intermediate-state-value", "children"),
413 | State("indicator-graph", "figure"),
414 | State("map-graph", "figure"),
415 | State("map-graph", "config"),
416 | State("map-graph", "relayoutData"),
417 | State("county-histogram-top", "figure"),
418 | State("county-histogram-top", "config"),
419 | State("county-histogram-bottom", "figure"),
420 | State("county-histogram-bottom", "config"),
421 | State("race-histogram", "figure"),
422 | State("race-histogram", "config"),
423 | State("intermediate-state-value", "children"),
424 | ],
425 | )
426 | def update_plots(
427 | relayout_data,
428 | selected_map,
429 | selected_race,
430 | selected_county_top,
431 | selected_county_bottom,
432 | view_name,
433 | gpu_enabled,
434 | coordinates_backup,
435 | *backup_args,
436 | ):
437 | global data_3857, data_center_3857, data_4326, data_center_4326, selected_map_backup, selected_race_backup, selected_county_top_backup, selected_county_bt_backup, view_name_backup, gpu_enabled_backup, dragmode_backup
438 |
439 | # condition to avoid reloading on tool update
440 | if (
441 | isinstance(relayout_data, dict)
442 | and list(relayout_data.keys()) == ["dragmode"]
443 | and selected_map == selected_map_backup
444 | and selected_race_backup == selected_race
445 | and selected_county_top_backup == selected_county_top
446 | and selected_county_bt_backup == selected_county_bottom
447 | and view_name_backup == view_name
448 | and gpu_enabled_backup == gpu_enabled
449 | ):
450 | backup_args[1]["layout"]["dragmode"] = relayout_data["dragmode"]
451 | dragmode_backup = relayout_data["dragmode"]
452 | return backup_args
453 |
454 | selected_map_backup = selected_map
455 | selected_race_backup = selected_race
456 | selected_county_top_backup = selected_county_top
457 | selected_county_bt_backup = selected_county_bottom
458 | view_name_backup = view_name
459 | gpu_enabled_backup = gpu_enabled
460 |
461 | t0 = time.time()
462 |
463 | if coordinates_backup is not None:
464 | coordinates_4326_backup, position_backup = coordinates_backup
465 | else:
466 | coordinates_4326_backup, position_backup = None, None
467 |
468 | # Get delayed dataset from client
469 | if gpu_enabled:
470 | df = client.get_dataset("c_df_d")
471 | else:
472 | df = client.get_dataset("pd_df_d")
473 |
474 | colorscale_name = "Viridis"
475 |
476 | if data_3857 == []:
477 | projections = delayed(set_projection_bounds)(df)
478 | (
479 | data_3857,
480 | data_center_3857,
481 | data_4326,
482 | data_center_4326,
483 | ) = projections.compute()
484 |
485 | figures = build_updated_figures_dask(
486 | df,
487 | relayout_data,
488 | selected_map,
489 | selected_race,
490 | selected_county_top,
491 | selected_county_bottom,
492 | colorscale_name,
493 | data_3857,
494 | data_center_3857,
495 | data_4326,
496 | data_center_4326,
497 | coordinates_4326_backup,
498 | position_backup,
499 | view_name,
500 | )
501 |
502 | (
503 | datashader_plot,
504 | race_histogram,
505 | county_top_histogram,
506 | county_bottom_histogram,
507 | n_selected_indicator,
508 | coordinates_4326_backup,
509 | position_backup,
510 | ) = figures
511 |
512 | barchart_config = {
513 | "displayModeBar": True,
514 | "modeBarButtonsToRemove": [
515 | "zoom2d",
516 | "pan2d",
517 | "select2d",
518 | "lasso2d",
519 | "zoomIn2d",
520 | "zoomOut2d",
521 | "resetScale2d",
522 | "hoverClosestCartesian",
523 | "hoverCompareCartesian",
524 | "toggleSpikelines",
525 | ],
526 | }
527 | compute_time = time.time() - t0
528 | print(f"Query time: {compute_time:.4f} s")
529 | n_selected_indicator["data"].append(
530 | {
531 | "title": {"text": "Query Time"},
532 | "type": "indicator",
533 | "value": round(compute_time, 4),
534 | "domain": {"x": [0.53, 0.61], "y": [0, 0.5]},
535 | "number": {
536 | "font": {
537 | "color": text_color,
538 | "size": "50px",
539 | },
540 | "suffix": " seconds",
541 | },
542 | }
543 | )
544 | datashader_plot["layout"]["dragmode"] = (
545 | relayout_data["dragmode"]
546 | if (relayout_data and "dragmode" in relayout_data)
547 | else dragmode_backup
548 | )
549 |
550 | return (
551 | n_selected_indicator,
552 | datashader_plot,
553 | {
554 | "displayModeBar": True,
555 | "modeBarButtonsToRemove": [
556 | "lasso2d",
557 | "zoomInMapbox",
558 | "zoomOutMapbox",
559 | "toggleHover",
560 | ],
561 | },
562 | relayout_data,
563 | race_histogram,
564 | barchart_config,
565 | county_top_histogram,
566 | barchart_config,
567 | county_bottom_histogram,
568 | barchart_config,
569 | (coordinates_4326_backup, position_backup),
570 | )
571 |
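The early-return guard at the top of `update_plots` (skip the expensive rebuild when only the drag tool changed) can be distilled into a small predicate. A minimal sketch; `only_dragmode_changed` is a hypothetical helper name, not part of this app:

```python
def only_dragmode_changed(relayout_data, new_state, backup_state):
    # A relayout event that carries only {"dragmode": ...} means the user
    # switched the map tool (pan/zoom/select) without moving the viewport,
    # so the figure rebuild can be skipped if no other selection changed.
    return (
        isinstance(relayout_data, dict)
        and list(relayout_data.keys()) == ["dragmode"]
        and new_state == backup_state
    )

# Tool switch only, selections unchanged: skip the rebuild
print(only_dragmode_changed({"dragmode": "select"}, ("in", True), ("in", True)))
# Viewport moved as well: rebuild
print(only_dragmode_changed({"dragmode": "select", "mapbox.zoom": 4}, ("in", True), ("in", True)))
```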
572 |
573 | def publish_dataset_to_cluster(cuda_visible_devices):
574 |
575 | census_data_url = "https://data.rapids.ai/viz-data/total_population_dataset.parquet"
576 | data_path = "../data/total_population_dataset.parquet"
577 | check_dataset(census_data_url, data_path)
578 |
579 | # Note: the Dask cluster must be created from the __main__ entry point, not at import time
580 | cluster = (
581 | LocalCUDACluster(CUDA_VISIBLE_DEVICES=cuda_visible_devices)
582 | if cuda_visible_devices
583 | else LocalCUDACluster()
584 | )
585 | client = Client(cluster)
586 | print(f"Dask status: {cluster.dashboard_link}")
587 |
588 | # Load dataset and persist dataset on cluster
589 | def load_and_publish_dataset():
590 | # dask_cudf DataFrame
591 | c_df_d = load_dataset(data_path, "dask_cudf").persist()
592 | # pandas DataFrame
593 | pd_df_d = load_dataset(data_path, "dask").persist()
594 |
595 | # Unpublish datasets if present
596 | for ds_name in ["pd_df_d", "c_df_d"]:
597 | if ds_name in client.datasets:
598 | client.unpublish_dataset(ds_name)
599 |
600 | # Publish datasets to the cluster
601 | client.publish_dataset(pd_df_d=pd_df_d)
602 | client.publish_dataset(c_df_d=c_df_d)
603 |
604 | load_and_publish_dataset()
605 |
606 | # Precompute field bounds
607 | c_df_d = client.get_dataset("c_df_d")
608 |
609 | # Register top-level callback that updates plots
610 | register_update_plots_callback(client)
611 |
612 |
613 | if __name__ == "__main__":
614 | parser = argparse.ArgumentParser()
615 | parser.add_argument(
616 | "--cuda_visible_devices",
617 | help="supply the value of CUDA_VISIBLE_DEVICES as a comma separated string (e.g: --cuda_visible_devices=0,1), if None, all the available GPUs are used",
618 | default=None,
619 | )
620 |
621 | args = parser.parse_args()
622 | # development entry point
623 | publish_dataset_to_cluster(args.cuda_visible_devices)
624 |
625 | # Launch dashboard
626 | app.run_server(debug=False, dev_tools_silence_routes_logging=True, host="0.0.0.0")
627 |
--------------------------------------------------------------------------------
/plotly_demo/utils/__init__.py:
--------------------------------------------------------------------------------
1 | from .utils import *
2 |
--------------------------------------------------------------------------------
/plotly_demo/utils/utils.py:
--------------------------------------------------------------------------------
1 | from bokeh import palettes
2 | from pyproj import Transformer
3 | import cudf
4 | import cupy as cp
5 | import dask.dataframe as dd
6 | import datashader as ds
7 | import datashader.transfer_functions as tf
8 | import io
9 | import numpy as np
10 | import os
11 | import pandas as pd
12 | import pickle
13 | import requests
14 |
15 | try:
16 | import dask_cudf
17 | except ImportError:
18 | dask_cudf = None
19 |
20 | # Colors
21 | bgcolor = "#000000" # mapbox dark map land color
22 | text_color = "#cfd8dc" # Material blue-grey 100
23 | mapbox_land_color = "#000000"
24 | c = 9200
25 | # Colors for categories
26 | colors = {}
27 | colors["race"] = {
28 | 1: "aqua",
29 | 2: "lime",
30 | 3: "yellow",
31 | 4: "orange",
32 | 5: "blue",
33 | 6: "fuchsia",
34 | 7: "saddlebrown",
35 | }
36 | race2color = {
37 | "White": "aqua",
38 | "African American": "lime",
39 | "American Indian": "yellow",
40 | "Asian alone": "orange",
41 | "Native Hawaiian": "blue",
42 | "Other Race alone": "fuchsia",
43 | "Two or More": "saddlebrown",
44 | }
45 | colors["net"] = {
46 | -1: palettes.RdPu9[2],
47 | 0: palettes.Greens9[4],
48 | 1: palettes.PuBu9[2],
49 | } # '#32CD32'
50 |
51 |
52 | # Figure template
53 | row_heights = [150, 440, 300, 75]
54 | template = {
55 | "layout": {
56 | "paper_bgcolor": bgcolor,
57 | "plot_bgcolor": bgcolor,
58 | "font": {"color": text_color},
59 | "margin": {"r": 0, "t": 0, "l": 0, "b": 0},
60 | "bargap": 0.05,
61 | "xaxis": {"showgrid": False, "automargin": True},
62 | "yaxis": {"showgrid": True, "automargin": True},
63 | # 'gridwidth': 0.5, 'gridcolor': mapbox_land_color},
64 | }
65 | }
66 |
67 | url = "https://raw.githubusercontent.com/rapidsai/plotly-dash-rapids-census-demo/main/id2county.pkl"
68 | id2county = pickle.load(io.BytesIO(requests.get(url).content))
69 | county2id = {v: k for k, v in id2county.items()}
70 | id2race = {
71 | 0: "All",
72 | 1: "White",
73 | 2: "African American",
74 | 3: "American Indian",
75 | 4: "Asian alone",
76 | 5: "Native Hawaiian",
77 | 6: "Other Race alone",
78 | 7: "Two or More",
79 | }
80 | race2id = {v: k for k, v in id2race.items()}
81 |
82 | mappings = {}
83 | mappings_hover = {}
84 | mapbox_style = "carto-darkmatter"
85 |
86 |
87 | def set_projection_bounds(df_d):
88 | transformer_4326_to_3857 = Transformer.from_crs("epsg:4326", "epsg:3857")
89 |
90 | def epsg_4326_to_3857(coords):
91 | return [transformer_4326_to_3857.transform(*reversed(row)) for row in coords]
92 |
93 | transformer_3857_to_4326 = Transformer.from_crs("epsg:3857", "epsg:4326")
94 |
95 | def epsg_3857_to_4326(coords):
96 | return [
97 | list(reversed(transformer_3857_to_4326.transform(*row))) for row in coords
98 | ]
99 |
100 | data_3857 = (
101 | [df_d.easting.min(), df_d.northing.min()],
102 | [df_d.easting.max(), df_d.northing.max()],
103 | )
104 | data_center_3857 = [
105 | [
106 | (data_3857[0][0] + data_3857[1][0]) / 2.0,
107 | (data_3857[0][1] + data_3857[1][1]) / 2.0,
108 | ]
109 | ]
110 |
111 | data_4326 = epsg_3857_to_4326(data_3857)
112 | data_center_4326 = epsg_3857_to_4326(data_center_3857)
113 |
114 | return data_3857, data_center_3857, data_4326, data_center_4326
115 |
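pyproj handles the projection in `set_projection_bounds`; for intuition, the EPSG:4326 → EPSG:3857 mapping is the standard spherical web-mercator formula. A dependency-free sketch (`lonlat_to_3857` and `xy_to_lonlat` are hypothetical helpers, not part of utils.py):

```python
import math

R = 6378137.0  # spherical earth radius used by EPSG:3857, in meters

def lonlat_to_3857(lon, lat):
    # Spherical web-mercator forward projection (degrees -> meters)
    x = math.radians(lon) * R
    y = math.log(math.tan(math.pi / 4 + math.radians(lat) / 2)) * R
    return x, y

def xy_to_lonlat(x, y):
    # Inverse projection (meters -> degrees)
    lon = math.degrees(x / R)
    lat = math.degrees(2 * math.atan(math.exp(y / R)) - math.pi / 2)
    return lon, lat

x, y = lonlat_to_3857(-100.0, 38.0)
lon, lat = xy_to_lonlat(x, y)  # round-trips back to (-100.0, 38.0)
```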
116 |
117 | # Build Dash app and initial layout
118 | def blank_fig(height):
119 | """
120 | Build blank figure with the requested height
121 | Args:
122 | height: height of blank figure in pixels
123 | Returns:
124 | Figure dict
125 | """
126 | return {
127 | "data": [],
128 | "layout": {
129 | "height": height,
130 | "template": template,
131 | "xaxis": {"visible": False},
132 | "yaxis": {"visible": False},
133 | },
134 | }
135 |
136 |
137 | # Plot functions
138 | def build_colorscale(colorscale_name, transform):
139 | """
140 | Build plotly colorscale
141 | Args:
142 | colorscale_name: Name of a colorscale from the plotly.colors.sequential module
143 | transform: Transform to apply to colors scale. One of 'linear', 'sqrt', 'cbrt',
144 | or 'log'
145 | Returns:
146 | Plotly color scale list
147 | """
148 | global colors, mappings
149 |
150 | colors_temp = getattr(palettes, colorscale_name)
151 | if transform == "linear":
152 | scale_values = np.linspace(0, 1, len(colors_temp))
153 | elif transform == "sqrt":
154 | scale_values = np.linspace(0, 1, len(colors_temp)) ** 2
155 | elif transform == "cbrt":
156 | scale_values = np.linspace(0, 1, len(colors_temp)) ** 3
157 | elif transform == "log":
158 | scale_values = (10 ** np.linspace(0, 1, len(colors_temp)) - 1) / 9
159 | else:
160 | raise ValueError("Unexpected colorscale transform")
161 | return [(v, clr) for v, clr in zip(scale_values, colors_temp)]
162 |
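The four transforms in `build_colorscale` differ only in where the color stops land on [0, 1]; the `log` branch, for instance, spaces them as `(10**v - 1) / 9` so the first and last stops still sit exactly at 0 and 1. A quick check of that spacing with numpy:

```python
import numpy as np

n = 9  # e.g. len(palettes.PuBu9)
v = np.linspace(0, 1, n)

linear = v
log = (10 ** v - 1) / 9  # same expression as the "log" branch above

# Endpoints stay pinned at 0 and 1, so the full palette range is used.
print(log[0], log[-1])
# Log spacing crowds the stops toward 0 compared to linear spacing.
print(log[1] < linear[1])
```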
163 |
164 | def get_min_max(df, col):
165 | if dask_cudf and isinstance(df, dask_cudf.core.DataFrame):
166 | return (df[col].min().compute(), df[col].max().compute())
167 | return (df[col].min(), df[col].max())
168 |
169 |
170 | def build_datashader_plot(
171 | df,
172 | colorscale_name,
173 | colorscale_transform,
174 | new_coordinates,
175 | position,
176 | x_range,
177 | y_range,
178 | view_name,
179 | ):
180 | # global data_3857, data_center_3857, data_4326, data_center_4326
181 |
182 | x0, x1 = x_range
183 | y0, y1 = y_range
184 |
185 | datashader_color_scale = {}
186 |
187 | cvs = ds.Canvas(plot_width=3840, plot_height=2160, x_range=x_range, y_range=y_range)
188 |
189 | colorscale_transform = "linear"
190 |
191 | if view_name == "race":
192 | aggregate_column = "race"
193 | aggregate = "mean"
194 | elif view_name == "total":
195 | aggregate_column = "net"
196 | aggregate = "count"
197 | colorscale_name = "Viridis10"
198 | elif view_name == "in":
199 | aggregate_column = "net"
200 | aggregate = "count"
201 | colorscale_name = "PuBu9"
202 | elif view_name == "stationary":
203 | aggregate_column = "net"
204 | aggregate = "count"
205 | colorscale_name = "Greens9"
206 | elif view_name == "out":
207 | aggregate_column = "net"
208 | aggregate = "count"
209 | colorscale_name = "RdPu9"
210 | else: # net
211 | aggregate_column = "net"
212 | aggregate = "mean"
213 |
214 | if aggregate == "mean":
215 | datashader_color_scale["color_key"] = colors[aggregate_column]
216 | datashader_color_scale["how"] = "log"
217 | else:
218 | datashader_color_scale["cmap"] = [
219 | i[1] for i in build_colorscale(colorscale_name, colorscale_transform)
220 | ]
221 | datashader_color_scale["how"] = "log"
222 |
223 | agg = cvs.points(
224 | df,
225 | x="easting",
226 | y="northing",
227 | agg=getattr(ds, aggregate)(aggregate_column),
228 | )
229 |
230 | cmin = cp.asnumpy(agg.min().data)
231 | cmax = cp.asnumpy(agg.max().data)
232 |
233 | # Count the number of selected points
234 | temp = agg.sum()
235 | temp.data = cp.asnumpy(temp.data)
236 | n_selected = int(temp)
237 |
238 | if n_selected == 0:
239 | # Nothing to display
240 | lat = [None]
241 | lon = [None]
242 | customdata = [None]
243 | marker = {}
244 | layers = []
245 | else:
246 | img = tf.shade(
247 | tf.dynspread(agg, threshold=0.7),
248 | **datashader_color_scale,
249 | ).to_pil()
250 | # img = tf.shade(agg,how='log',**datashader_color_scale).to_pil()
251 |
252 | # Add image as mapbox image layer. Note that as of version 4.4, plotly will
253 | # automatically convert the PIL image object into a base64 encoded png string
254 | layers = [
255 | {
256 | "sourcetype": "image",
257 | "source": img,
258 | "coordinates": new_coordinates,
259 | }
260 | ]
261 |
262 | # Do not display any mapbox markers
263 | lat = [None]
264 | lon = [None]
265 | customdata = [None]
266 | marker = {}
267 |
268 | # Build map figure
269 | map_graph = {
270 | "data": [],
271 | "layout": {
272 | "template": template,
273 | "uirevision": True,
274 | "mapbox": {
275 | "style": mapbox_style,
276 | "layers": layers,
277 | "center": {
278 | "lon": -78.81063494489342,
279 | "lat": 37.471878534555074,
280 | },
281 | "zoom": 3,
282 | },
283 | "margin": {"r": 140, "t": 0, "l": 0, "b": 0},
284 | "height": 500,
285 | "shapes": [
286 | {
287 | "type": "rect",
288 | "xref": "paper",
289 | "yref": "paper",
290 | "x0": 0,
291 | "y0": 0,
292 | "x1": 1,
293 | "y1": 1,
294 | "line": {
295 | "width": 1,
296 | "color": "#191a1a",
297 | },
298 | }
299 | ],
300 | },
301 | }
302 |
303 | if aggregate == "mean":
304 | # categorical colorbar for the race view
305 | if view_name == "race":
306 | colorscale = [0, 1]
307 |
308 | marker = dict(
309 | size=0,
310 | showscale=True,
311 | colorbar={
312 | "title": {
313 | "text": "Race",
314 | "side": "right",
315 | "font": {"size": 14},
316 | },
317 | "tickvals": [
318 | (0 + 0.5) / 7,
319 | (1 + 0.5) / 7,
320 | (2 + 0.5) / 7,
321 | (3 + 0.5) / 7,
322 | (4 + 0.5) / 7,
323 | (5 + 0.5) / 7,
324 | (6 + 0.5) / 7,
325 | ],
326 | "ticktext": [
327 | "White",
328 | "African American",
329 | "American Indian",
330 | "Asian alone",
331 | "Native Hawaiian",
332 | "Other Race alone",
333 | "Two or More",
334 | ],
335 | "ypad": 30,
336 | },
337 | colorscale=[
338 | (0 / 7, colors["race"][1]),
339 | (1 / 7, colors["race"][1]),
340 | (1 / 7, colors["race"][2]),
341 | (2 / 7, colors["race"][2]),
342 | (2 / 7, colors["race"][3]),
343 | (3 / 7, colors["race"][3]),
344 | (3 / 7, colors["race"][4]),
345 | (4 / 7, colors["race"][4]),
346 | (4 / 7, colors["race"][5]),
347 | (5 / 7, colors["race"][5]),
348 | (5 / 7, colors["race"][6]),
349 | (6 / 7, colors["race"][6]),
350 | (6 / 7, colors["race"][7]),
351 | (7 / 7, colors["race"][7]),
352 | (7 / 7, colors["race"][7]),
353 | ],
354 | cmin=0,
355 | cmax=1,
356 | ) # end of marker
357 | else:
358 | colorscale = [0, 1]
359 |
360 | marker = dict(
361 | size=0,
362 | showscale=True,
363 | colorbar={
364 | "title": {
365 | "text": "Migration",
366 | "side": "right",
367 | "font": {"size": 14},
368 | },
369 | "tickvals": [(0 + 0.5) / 3, (1 + 0.5) / 3, (2 + 0.5) / 3],
370 | "ticktext": ["Out", "Stationary", "In"],
371 | "ypad": 30,
372 | },
373 | colorscale=[
374 | (0 / 3, colors["net"][-1]),
375 | (1 / 3, colors["net"][-1]),
376 | (1 / 3, colors["net"][0]),
377 | (2 / 3, colors["net"][0]),
378 | (2 / 3, colors["net"][1]),
379 | (3 / 3, colors["net"][1]),
380 | ],
381 | cmin=0,
382 | cmax=1,
383 | ) # end of marker
384 |
385 | map_graph["data"].append(
386 | {
387 | "type": "scattermapbox",
388 | "lat": lat,
389 | "lon": lon,
390 | "customdata": customdata,
391 | "marker": marker,
392 | "hoverinfo": "none",
393 | }
394 | )
395 | map_graph["layout"]["annotations"] = []
396 |
397 | else:
398 | marker = dict(
399 | size=0,
400 | showscale=True,
401 | colorbar={
402 | "title": {
403 | "text": "Population",
404 | "side": "right",
405 | "font": {"size": 14},
406 | },
407 | "ypad": 30,
408 | },
409 | colorscale=build_colorscale(colorscale_name, colorscale_transform),
410 | cmin=cmin,
411 | cmax=cmax,
412 | ) # end of marker
413 |
414 | map_graph["data"].append(
415 | {
416 | "type": "scattermapbox",
417 | "lat": lat,
418 | "lon": lon,
419 | "customdata": customdata,
420 | "marker": marker,
421 | "hoverinfo": "none",
422 | }
423 | )
424 |
425 | map_graph["layout"]["mapbox"].update(position)
426 |
427 | return map_graph
428 |
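The duplicated color stops in the `colorscale=` lists above are how Plotly encodes a discrete (piecewise-constant) colorbar: each category's color appears at both `i/n` and `(i+1)/n`, producing solid blocks instead of a gradient. A small generator of the same pattern (`categorical_colorscale` is a hypothetical helper, not part of utils.py):

```python
def categorical_colorscale(color_list):
    # Emit each color at the start and end of its 1/n-wide band so the
    # colorbar renders n solid blocks rather than interpolating between stops.
    n = len(color_list)
    scale = []
    for i, color in enumerate(color_list):
        scale.append((i / n, color))
        scale.append(((i + 1) / n, color))
    return scale

# Illustrative hex values for out / stationary / in
net_colors = ["#d4316c", "#74c476", "#74a9cf"]
scale = categorical_colorscale(net_colors)
# Six stops: each color repeated at the edges of its third of [0, 1]
```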
429 |
430 | def query_df_range_lat_lon(df, x0, x1, y0, y1, x, y):
431 | mask_ = (df[x] >= x0) & (df[x] <= x1) & (df[y] <= y0) & (df[y] >= y1)
432 | if mask_.sum() != len(df):
433 | df = df[mask_]
434 | if isinstance(df, cudf.DataFrame):
435 | df.index = cudf.RangeIndex(0, len(df))
436 | else:
437 | df.index = pd.RangeIndex(0, len(df))
438 | del mask_
439 | return df
440 |
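`query_df_range_lat_lon` keeps rows inside a bounding box where, by map convention, `y0` is the upper edge and `y1` the lower one (hence the `y <= y0` and `y >= y1` tests). The same filter on a plain pandas frame:

```python
import pandas as pd

df = pd.DataFrame({
    "easting":  [0.0, 5.0, 10.0, 20.0],
    "northing": [0.0, 5.0, 10.0, 20.0],
})

x0, x1 = 0.0, 10.0   # left / right edges
y0, y1 = 10.0, 0.0   # top / bottom edges (note: y0 > y1)

mask = (
    (df["easting"] >= x0) & (df["easting"] <= x1)
    & (df["northing"] <= y0) & (df["northing"] >= y1)
)
inside = df[mask].reset_index(drop=True)  # keeps 3 rows; (20, 20) falls outside
```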
441 |
442 | def bar_selected_ids(selection, column): # select ids for each column
443 | if (column == "county_top") | (column == "county_bottom"):
444 | selected_ids = [county2id[p["label"]] for p in selection["points"]]
445 | else:
446 | selected_ids = [race2id[p["label"]] for p in selection["points"]]
447 |
448 | return selected_ids
449 |
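`bar_selected_ids` maps the clicked bar labels in a Plotly `selectedData` payload back to numeric ids through the reverse lookup tables. A self-contained sketch of that lookup, using a trimmed-down race mapping:

```python
# Trimmed-down version of the id2race table in utils.py
id2race = {1: "White", 2: "African American", 3: "American Indian"}
race2id = {v: k for k, v in id2race.items()}  # reverse lookup, as in utils.py

# Simplified shape of a Plotly bar-chart selection event
selection = {"points": [{"label": "White"}, {"label": "American Indian"}]}

selected_ids = [race2id[p["label"]] for p in selection["points"]]
print(selected_ids)  # the numeric ids of the selected bars
```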
450 |
451 | def query_df_selected_ids(df, col, selected_ids):
452 | if (col == "county_top") | (col == "county_bottom"):
453 | col = "county"
454 | return df[df[col].isin(selected_ids)]
455 |
456 |
457 | def no_data_figure():
458 | return {
459 | "data": [
460 | {
461 | "title": {"text": "Query Result"},
462 | "text": "SOME RANDOM",
463 | "marker": {"text": "NO"},
464 | }
465 | ],
466 | "layout": {
467 | "height": 250,
468 | "template": template,
469 | "xaxis": {"visible": False},
470 | "yaxis": {"visible": False},
471 | },
472 | }
473 |
474 |
475 | def build_histogram_default_bins(
476 | df,
477 | column,
478 | selections,
479 | orientation,
480 | colorscale_name,
481 | colorscale_transform,
482 | view_name,
483 | flag,
484 | ):
485 | if (view_name == "out") & (column == "race"):
486 | return no_data_figure()
487 |
488 | global race2color
489 |
490 | if (column == "county_top") | (column == "county_bottom"):
491 | column = "county"
492 |
493 | if dask_cudf and isinstance(df, dask_cudf.core.DataFrame):
494 | df = df[[column, "net"]].groupby(column)["net"].count().compute().to_pandas()
495 | elif isinstance(df, cudf.DataFrame):
496 | df = df[[column, "net"]].groupby(column)["net"].count().to_pandas()
497 | elif isinstance(df, dd.core.DataFrame):
498 | df = df[[column, "net"]].groupby(column)["net"].count().compute()
499 | else:
500 | df = df[[column, "net"]].groupby(column)["net"].count()
501 |
502 | df = df.sort_values(ascending=False) # sorted grouped ids by counts
503 |
504 | if (flag == "top") | (flag == "bottom"):
505 | if flag == "top":
506 | view = df.iloc[:15]
507 | else:
508 | view = df.iloc[-15:]
509 | names = [id2county[cid] for cid in view.index.values]
510 | else:
511 | view = df
512 | names = [id2race[rid] for rid in view.index.values]
513 |
514 | bin_edges = names
515 | counts = view.values
516 |
517 | mapping_options = {}
518 | xaxis_labels = {}
519 | if column in mappings:
520 | if column in mappings_hover:
521 | mapping_options = {
522 | "text": list(mappings_hover[column].values()),
523 | "hovertemplate": "%{text}: %{y} ",
524 | }
525 | else:
526 | mapping_options = {
527 | "text": list(mappings[column].values()),
528 | "hovertemplate": "%{text} : %{y} ",
529 | }
530 | xaxis_labels = {
531 | "tickvals": list(mappings[column].keys()),
532 | "ticktext": list(mappings[column].values()),
533 | }
534 |
535 | if view_name == "total":
536 | bar_color = counts
537 | bar_scale = build_colorscale("Viridis10", colorscale_transform)
538 | elif view_name == "in":
539 | bar_color = counts
540 | bar_scale = build_colorscale("PuBu9", colorscale_transform)
541 | elif view_name == "stationary":
542 | bar_color = counts
543 | bar_scale = build_colorscale("Greens9", colorscale_transform)
544 | elif view_name == "out":
545 | bar_color = counts
546 | bar_scale = build_colorscale("RdPu9", colorscale_transform)
547 | elif view_name == "race":
548 | if column == "race":
549 | bar_color = [race2color[race] for race in names]
550 | else:
551 | bar_color = "#2C718E"
552 | bar_scale = None
553 | else: # net
554 | bar_color = "#2C718E"
555 | bar_scale = None
556 |
557 | fig = {
558 | "data": [
559 | {
560 | "type": "bar",
561 | "x": bin_edges,
562 | "y": counts,
563 | "marker": {"color": bar_color, "colorscale": bar_scale},
564 | **mapping_options,
565 | }
566 | ],
567 | "layout": {
568 | "yaxis": {
569 | "type": "linear",
570 | "title": {"text": "Count"},
571 | },
572 | "xaxis": {**xaxis_labels},
573 | "selectdirection": "h",
574 | "dragmode": "select",
575 | "template": template,
576 | "uirevision": True,
577 | "hovermode": "closest",
578 | },
579 | }
580 | if column not in selections:
581 | fig["data"][0]["selectedpoints"] = False
582 |
583 | return fig
584 |
585 |
586 | def cull_empty_partitions(df):
587 | ll = list(df.map_partitions(len).compute())
588 | df_delayed = df.to_delayed()
589 | df_delayed_new = list()
590 | pempty = None
591 | for ix, n in enumerate(ll):
592 | if 0 == n:
593 | pempty = df.get_partition(ix)
594 | else:
595 | df_delayed_new.append(df_delayed[ix])
596 | if pempty is not None:
597 | df = dd.from_delayed(df_delayed_new, meta=pempty)
598 | return df
599 |
600 |
601 | def build_updated_figures_dask(
602 | df,
603 | relayout_data,
604 | selected_map,
605 | selected_race,
606 | selected_county_top,
607 | selected_county_bottom,
608 | colorscale_name,
609 | data_3857,
610 | data_center_3857,
611 | data_4326,
612 | data_center_4326,
613 | coordinates_4326_backup,
614 | position_backup,
615 | view_name,
616 | ):
617 | colorscale_transform = "linear"
618 | selected = {}
619 |
620 | selected = {
621 | col: bar_selected_ids(sel, col)
622 | for col, sel in zip(
623 | ["race", "county_top", "county_bottom"],
624 | [selected_race, selected_county_top, selected_county_bottom],
625 | )
626 | if sel and sel.get("points", [])
627 | }
628 |
629 | if relayout_data is not None:
630 | transformer_4326_to_3857 = Transformer.from_crs("epsg:4326", "epsg:3857")
631 |
632 | def epsg_4326_to_3857(coords):
633 | return [transformer_4326_to_3857.transform(*reversed(row)) for row in coords]
634 |
635 | coordinates_4326 = relayout_data and relayout_data.get("mapbox._derived", {}).get(
636 | "coordinates", None
637 | )
638 | dragmode = (
639 | relayout_data
640 | and "dragmode" in relayout_data
641 | and coordinates_4326_backup is not None
642 | )
643 |
644 | if dragmode:
645 | coordinates_4326 = coordinates_4326_backup
646 | coordinates_3857 = epsg_4326_to_3857(coordinates_4326)
647 | position = position_backup
648 | elif coordinates_4326:
649 | lons, lats = zip(*coordinates_4326)
650 | lon0, lon1 = max(min(lons), data_4326[0][0]), min(max(lons), data_4326[1][0])
651 | lat0, lat1 = max(min(lats), data_4326[0][1]), min(max(lats), data_4326[1][1])
652 | coordinates_4326 = [
653 | [lon0, lat0],
654 | [lon1, lat1],
655 | ]
656 | coordinates_3857 = epsg_4326_to_3857(coordinates_4326)
657 | coordinates_4326_backup = coordinates_4326
658 |
659 | position = {
660 | "zoom": relayout_data.get("mapbox.zoom", None),
661 | "center": relayout_data.get("mapbox.center", None),
662 | }
663 | position_backup = position
664 |
665 | else:
666 | position = {
667 | "zoom": 3.3350828189345934,
668 | "pitch": 0,
669 | "bearing": 0,
670 | "center": {
671 | "lon": -100.55828959790324,
672 | "lat": 38.68323453274175,
673 | }, # {'lon': data_center_4326[0][0]-100, 'lat': data_center_4326[0][1]-10}
674 | }
675 | coordinates_3857 = data_3857
676 | coordinates_4326 = data_4326
677 |
678 | new_coordinates = [
679 | [coordinates_4326[0][0], coordinates_4326[1][1]],
680 | [coordinates_4326[1][0], coordinates_4326[1][1]],
681 | [coordinates_4326[1][0], coordinates_4326[0][1]],
682 | [coordinates_4326[0][0], coordinates_4326[0][1]],
683 | ]
684 |
685 | x_range, y_range = zip(*coordinates_3857)
686 | x0, x1 = x_range
687 | y0, y1 = y_range
688 |
689 | if selected_map is not None:
690 | coordinates_4326 = selected_map["range"]["mapbox"]
691 | coordinates_3857 = epsg_4326_to_3857(coordinates_4326)
692 | x_range_t, y_range_t = zip(*coordinates_3857)
693 | x0, x1 = x_range_t
694 | y0, y1 = y_range_t
695 | df = df.map_partitions(
696 | query_df_range_lat_lon, x0, x1, y0, y1, "easting", "northing"
697 | ).persist()
698 |
699 | # Select points as per view
700 |
701 |     if view_name in ("total", "race"):
702 | df = df[(df["net"] == 0) | (df["net"] == 1)]
703 | # df['race'] = df['race'].astype('category')
704 | elif view_name == "in":
705 | df = df[df["net"] == 1]
706 | df["net"] = df["net"].astype("int8")
707 | elif view_name == "stationary":
708 | df = df[df["net"] == 0]
709 | df["net"] = df["net"].astype("int8")
710 | elif view_name == "out":
711 | df = df[df["net"] == -1]
712 | df["net"] = df["net"].astype("int8")
713 |     else:  # net migration view: keep all rows (-1, 0, and 1)
714 |         pass
715 |         # df["net"] = df["net"].astype("category")
716 |
717 | for col in selected:
718 | df = df.map_partitions(query_df_selected_ids, col, selected[col])
719 |
720 | # cull empty partitions
721 | df = cull_empty_partitions(df).persist()
722 |
723 | datashader_plot = build_datashader_plot(
724 | df,
725 | colorscale_name,
726 | colorscale_transform,
727 | new_coordinates,
728 | position,
729 | x_range,
730 | y_range,
731 | view_name,
732 | )
733 |
734 | # Build indicator figure
735 | n_selected_indicator = {
736 | "data": [
737 | {
738 | "domain": {"x": [0.21, 0.41], "y": [0, 0.5]},
739 | "title": {"text": "Data Size"},
740 | "type": "indicator",
741 | "value": len(df),
742 | "number": {
743 |                 "font": {"color": text_color, "size": 50},
744 | "valueformat": ",",
745 | "suffix": " rows",
746 | },
747 | },
748 | ],
749 | "layout": {
750 | "template": template,
751 | "height": row_heights[3],
752 | # 'margin': {'l': 0, 'r': 0,'t': 5, 'b': 5}
753 | },
754 | }
755 |
756 | race_histogram = build_histogram_default_bins(
757 | df,
758 | "race",
759 | selected,
760 | "v",
761 | colorscale_name,
762 | colorscale_transform,
763 | view_name,
764 | flag="All",
765 | )
766 |
767 | county_top_histogram = build_histogram_default_bins(
768 | df,
769 | "county",
770 | selected,
771 | "v",
772 | colorscale_name,
773 | colorscale_transform,
774 | view_name,
775 | flag="top",
776 | )
777 |
778 | county_bottom_histogram = build_histogram_default_bins(
779 | df,
780 | "county",
781 | selected,
782 | "v",
783 | colorscale_name,
784 | colorscale_transform,
785 | view_name,
786 | flag="bottom",
787 | )
788 |
789 | del df
790 | return (
791 | datashader_plot,
792 | county_top_histogram,
793 | county_bottom_histogram,
794 | race_histogram,
795 | n_selected_indicator,
796 | coordinates_4326_backup,
797 | position_backup,
798 | )
799 |
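Both figure builders re-project EPSG:4326 lon/lat into EPSG:3857 Web Mercator metres through pyproj's `Transformer`. The spherical formula behind that projection can be sketched with the stdlib; this is an illustration of what the transform computes, not a replacement for pyproj.

```python
import math

R = 6378137.0  # Web Mercator sphere radius in metres

def lonlat_to_webmercator(lon, lat):
    """Project (lon, lat) in degrees to EPSG:3857 metres (spherical form)."""
    # x grows linearly with longitude; y stretches toward the poles
    x = R * math.radians(lon)
    y = R * math.log(math.tan(math.pi / 4 + math.radians(lat) / 2))
    return x, y

x, y = lonlat_to_webmercator(-100.0, 38.0)
print(round(x), round(y))  # easting/northing near the demo's default center
```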
800 |
801 | def build_updated_figures(
802 | df,
803 | relayout_data,
804 | selected_map,
805 | selected_race,
806 | selected_county_top,
807 | selected_county_bottom,
808 | colorscale_name,
809 | data_3857,
810 | data_center_3857,
811 | data_4326,
812 | data_center_4326,
813 | coordinates_4326_backup,
814 | position_backup,
815 | view_name,
816 | ):
817 | colorscale_transform = "linear"
818 |     # bar-chart selections keyed by dataframe column
819 |
820 | selected = {
821 | col: bar_selected_ids(sel, col)
822 | for col, sel in zip(
823 | ["race", "county_top", "county_bottom"],
824 | [selected_race, selected_county_top, selected_county_bottom],
825 | )
826 | if sel and sel.get("points", [])
827 | }
828 |
829 | if relayout_data is not None:
830 | transformer_4326_to_3857 = Transformer.from_crs("epsg:4326", "epsg:3857")
831 |
832 | def epsg_4326_to_3857(coords):
833 | return [transformer_4326_to_3857.transform(*reversed(row)) for row in coords]
834 |
835 | coordinates_4326 = relayout_data and relayout_data.get("mapbox._derived", {}).get(
836 | "coordinates", None
837 | )
838 | dragmode = (
839 | relayout_data
840 | and "dragmode" in relayout_data
841 | and coordinates_4326_backup is not None
842 | )
843 |
844 | if dragmode:
845 | coordinates_4326 = coordinates_4326_backup
846 | coordinates_3857 = epsg_4326_to_3857(coordinates_4326)
847 | position = position_backup
848 | elif coordinates_4326:
849 | lons, lats = zip(*coordinates_4326)
850 | lon0, lon1 = max(min(lons), data_4326[0][0]), min(max(lons), data_4326[1][0])
851 | lat0, lat1 = max(min(lats), data_4326[0][1]), min(max(lats), data_4326[1][1])
852 | coordinates_4326 = [
853 | [lon0, lat0],
854 | [lon1, lat1],
855 | ]
856 | coordinates_3857 = epsg_4326_to_3857(coordinates_4326)
857 | coordinates_4326_backup = coordinates_4326
858 |
859 | position = {
860 | "zoom": relayout_data.get("mapbox.zoom", None),
861 | "center": relayout_data.get("mapbox.center", None),
862 | }
863 | position_backup = position
864 |
865 | else:
866 | position = {
867 | "zoom": 3.3350828189345934,
868 | "pitch": 0,
869 | "bearing": 0,
870 | "center": {
871 | "lon": -100.55828959790324,
872 | "lat": 38.68323453274175,
873 | }, # {'lon': data_center_4326[0][0]-100, 'lat': data_center_4326[0][1]-10}
874 | }
875 | coordinates_3857 = data_3857
876 | coordinates_4326 = data_4326
877 |
878 | new_coordinates = [
879 | [coordinates_4326[0][0], coordinates_4326[1][1]],
880 | [coordinates_4326[1][0], coordinates_4326[1][1]],
881 | [coordinates_4326[1][0], coordinates_4326[0][1]],
882 | [coordinates_4326[0][0], coordinates_4326[0][1]],
883 | ]
884 |
885 | x_range, y_range = zip(*coordinates_3857)
886 | x0, x1 = x_range
887 | y0, y1 = y_range
888 |
889 | if selected_map is not None:
890 | coordinates_4326 = selected_map["range"]["mapbox"]
891 | coordinates_3857 = epsg_4326_to_3857(coordinates_4326)
892 | x_range_t, y_range_t = zip(*coordinates_3857)
893 | x0, x1 = x_range_t
894 | y0, y1 = y_range_t
895 | df = query_df_range_lat_lon(df, x0, x1, y0, y1, "easting", "northing")
896 |
897 | # Select points as per view
898 |     if view_name in ("total", "race"):
899 | df = df[(df["net"] == 0) | (df["net"] == 1)]
900 | df["net"] = df["net"].astype("int8")
901 | # df['race'] = df['race'].astype('category')
902 | elif view_name == "in":
903 | df = df[df["net"] == 1]
904 | df["net"] = df["net"].astype("int8")
905 | elif view_name == "stationary":
906 | df = df[df["net"] == 0]
907 | df["net"] = df["net"].astype("int8")
908 | elif view_name == "out":
909 | df = df[df["net"] == -1]
910 | df["net"] = df["net"].astype("int8")
911 |     else:  # net migration view: keep all rows (-1, 0, and 1)
912 |         pass
913 |         # df["net"] = df["net"].astype("category")
914 |
915 | for col in selected:
916 | df = query_df_selected_ids(df, col, selected[col])
917 |
918 | datashader_plot = build_datashader_plot(
919 | df,
920 | colorscale_name,
921 | colorscale_transform,
922 | new_coordinates,
923 | position,
924 | x_range,
925 | y_range,
926 | view_name,
927 | )
928 |
929 | # Build indicator figure
930 | n_selected_indicator = {
931 | "data": [
932 | {
933 | "domain": {"x": [0.2, 0.45], "y": [0, 0.5]},
934 | "title": {"text": "Data Size"},
935 | "type": "indicator",
936 | "value": len(df),
937 | "number": {
938 |                 "font": {"color": text_color, "size": 50},
939 | "valueformat": ",",
940 | "suffix": " rows",
941 | },
942 | },
943 | ],
944 | "layout": {
945 | "template": template,
946 | "height": row_heights[3],
947 | # 'margin': {'l': 0, 'r': 0,'t': 5, 'b': 5}
948 | },
949 | }
950 |
951 | race_histogram = build_histogram_default_bins(
952 | df,
953 | "race",
954 | selected,
955 | "v",
956 | colorscale_name,
957 | colorscale_transform,
958 | view_name,
959 | flag="All",
960 | )
961 |
962 | county_top_histogram = build_histogram_default_bins(
963 | df,
964 | "county",
965 | selected,
966 | "v",
967 | colorscale_name,
968 | colorscale_transform,
969 | view_name,
970 | flag="top",
971 | )
972 |
973 | county_bottom_histogram = build_histogram_default_bins(
974 | df,
975 | "county",
976 | selected,
977 | "v",
978 | colorscale_name,
979 | colorscale_transform,
980 | view_name,
981 | flag="bottom",
982 | )
983 |
984 | del df
985 | return (
986 | datashader_plot,
987 | race_histogram,
988 | county_top_histogram,
989 | county_bottom_histogram,
990 | n_selected_indicator,
991 | coordinates_4326_backup,
992 | position_backup,
993 | )
994 |
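The view filters in both `build_updated_figures*` variants rely on a shared encoding of the `net` column: 1 for inbound migration, 0 for stationary, -1 for outbound; the net view keeps all three. A minimal stdlib sketch of that dispatch (function and row shapes here are illustrative, not the app's API):

```python
def filter_by_view(rows, view_name):
    """Filter (id, net) rows the way the view selection above does.

    net codes: 1 = inbound, 0 = stationary, -1 = outbound.
    """
    keep = {
        "total": {0, 1},
        "race": {0, 1},
        "in": {1},
        "stationary": {0},
        "out": {-1},
    }.get(view_name)  # None -> net migration view: keep everything
    if keep is None:
        return rows
    return [r for r in rows if r[1] in keep]

rows = [("a", 1), ("b", 0), ("c", -1)]
print(filter_by_view(rows, "in"))   # [('a', 1)]
print(filter_by_view(rows, "net"))  # all three rows (net view keeps -1, 0, 1)
```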
995 |
996 | def check_dataset(dataset_url, data_path):
997 | if not os.path.exists(data_path):
998 | print(
 999 |             f"Dataset not found at {data_path}.\n"
1000 |             f"Downloading from {dataset_url}"
1001 |         )
1002 |         # Download dataset into the directory that will hold data_path
1003 |         os.makedirs(os.path.dirname(data_path) or "../data", exist_ok=True)
1004 | with requests.get(dataset_url, stream=True) as r:
1005 | r.raise_for_status()
1006 | with open(data_path, "wb") as f:
1007 | for chunk in r.iter_content(chunk_size=8192):
1008 | if chunk:
1009 | f.write(chunk)
1010 | print("Download completed!")
1011 | else:
1012 | print(f"Found dataset at {data_path}")
1013 |
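The download-if-missing pattern in `check_dataset` becomes unit-testable once the HTTP fetch is injected as a callable; a small stdlib sketch under that assumption (`ensure_dataset` and `download` are hypothetical names, not this module's API):

```python
import os
import tempfile

def ensure_dataset(data_path, download):
    """Call download(data_path) only when the file is absent."""
    if os.path.exists(data_path):
        return False  # already cached
    os.makedirs(os.path.dirname(data_path) or ".", exist_ok=True)
    download(data_path)
    return True

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data", "pop.parquet")
    fake = lambda p: open(p, "wb").close()  # stand-in for the HTTP fetch
    print(ensure_dataset(path, fake))  # True (downloaded)
    print(ensure_dataset(path, fake))  # False (already cached)
```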
1014 |
1015 | def load_dataset(path, dtype="dask_cudf"):
1016 |     """
1017 |     Args:
1018 |         path: Path to the parquet file or directory containing the census dataset
1019 |     Returns:
1020 |         dask, dask_cudf, pandas, or cudf DataFrame, depending on dtype
1021 |     """
1022 | if os.path.isdir(path):
1023 | path = path + "/*"
1024 | if dtype == "dask":
1025 | return dd.read_parquet(path, split_row_groups=True)
1026 | elif dask_cudf and dtype == "dask_cudf":
1027 | return dask_cudf.read_parquet(path, split_row_groups=True)
1028 | elif dtype == "pandas":
1029 | return cudf.read_parquet(path).to_pandas()
1030 | return cudf.read_parquet(path)
1031 |
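`load_dataset` is effectively a backend dispatch on the `dtype` string. The same pattern generalises to a registry of reader callables, sketched here with stubs standing in for the dask/dask_cudf/cudf readers (all names hypothetical):

```python
def make_loader(registry, default):
    """Build a load(path, dtype) function from a dict of reader callables."""
    def load(path, dtype=default):
        if dtype not in registry:
            raise ValueError(f"unknown dtype {dtype!r}")
        return registry[dtype](path)
    return load

# Stubs standing in for dd.read_parquet / dask_cudf.read_parquet / cudf
registry = {
    "dask": lambda p: ("dask", p),
    "pandas": lambda p: ("pandas", p),
}
load = make_loader(registry, default="dask")
print(load("data.parquet"))            # ('dask', 'data.parquet')
print(load("data.parquet", "pandas"))  # ('pandas', 'data.parquet')
```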
--------------------------------------------------------------------------------