├── 0_Prerequisites.jmd ├── BasicDataAndPlots.jmd ├── COVID-monitoring.jmd ├── DiscussionBasicDataAndPlots.jmd ├── GMTMaps.jmd ├── Manifest.toml ├── Project.toml ├── README.md ├── RegressionDiscontinuityQuestion.ipynb ├── RegressionDiscontinuityQuestion.jmd ├── build.jl ├── data │   └── longevity.csv └── requirements.txt /0_Prerequisites.jmd: -------------------------------------------------------------------------------- 1 | # Prerequisites 2 | 3 | This document simply lets you add packages that are appropriate for 4 | data analysis and visualizations, and then precompile all of them so 5 | that you don't have long load times later when you want to use the 6 | packages. This script could take **quite a long time** and requires 7 | you to have an internet connection to download the packages. 8 | 9 | ```julia;exec=false 10 | 11 | using Pkg; 12 | Pkg.add("IJulia") 13 | Pkg.add("Queryverse") 14 | Pkg.add("DataFrames") 15 | Pkg.add("HTTP") 16 | Pkg.add("Plots") 17 | Pkg.add(PackageSpec(name="Gadfly",rev="master")) # track the master branch 18 | Pkg.add("Distributions") 19 | Pkg.add("Random") 20 | Pkg.add("GLM") 21 | Pkg.add("Optim") 22 | Pkg.add("BlackBoxOptim") 23 | Pkg.add(PackageSpec(name="Turing",rev="master")) # currently has bug 24 | Pkg.add("Stan") 25 | Pkg.add("Statistics") 26 | Pkg.add("SQLite") 27 | Pkg.add("Weave") 28 | Pkg.update() 29 | Pkg.precompile() 30 | 31 | ``` 32 | -------------------------------------------------------------------------------- /BasicDataAndPlots.jmd: -------------------------------------------------------------------------------- 1 | # Acquiring some data and plotting it 2 | 3 | ## A simple data analysis tutorial using Julia 4 | ### By: Daniel Lakeland 5 | ### Lakeland Applied Sciences LLC 6 | 7 | 8 | So you want to answer some questions you have about the 9 | world... Suppose you'd like to see how the population of several 10 | states has changed over time. We'll rely on CSV files published by the 11 | Census.
The goal of this tutorial will be to simply walk you through 12 | acquiring the data, processing it into a form where it's easy to 13 | analyze, and plotting some aspects of the data to help you see what 14 | happened. 15 | 16 | In the companion Discussion we'll talk about why the code looks like 17 | it does and what other options there are. 18 | 19 | ## Getting the data 20 | 21 | We'll rely on the [Queryverse](https://www.queryverse.org/) 22 | meta-package which will provide us with ways to read CSV files and to 23 | pipe datasets through filtering and mutation operations. To find out 24 | more you can 25 | [watch a great introduction](https://youtu.be/OFPNph-WxLM?t=73) by the 26 | author. And we'll use 27 | [DataFrames](https://juliadata.github.io/DataFrames.jl/stable/), which 28 | will let us create in-memory tabular data objects. We will also use 29 | [Gadfly](http://gadflyjl.org/stable/) which is a plotting package 30 | particularly oriented towards statistical data analysis plots. We 31 | won't cover the Queryverse built-in plotting system VegaLite in this 32 | tutorial, but might provide some separate tutorials. 33 | 34 | Let's get started by loading up the required packages, and grabbing 35 | some data: 36 | 37 | 38 | ```julia 39 | using Queryverse,DataFrames; 40 | 41 | cenfile="https://www2.census.gov/programs-surveys/popest/datasets/2010-2019/counties/totals/co-est2019-alldata.csv" 42 | 43 | df = DataFrame(load(cenfile)); 44 | display(first(df,5)) 45 | 46 | ``` 47 | 48 | ## Reshaping the data to long form 49 | 50 | Of course, the shape of this data is all wrong. There are 164 columns, 51 | and each column has information on a different variable in a different 52 | year! This data format is convenient for people wishing to click some 53 | buttons by hand and graph some data in Excel, because it places 54 | related data items together in the dataset and they can select those 55 | related items with a mouse.
It is, however, horrible for someone 56 | wishing to **program** a computer to do analysis in bulk. 57 | 58 | This is a common data problem. To solve it we will want to reshape the 59 | data using some 60 | [DataFrame functions](https://juliadata.github.io/DataFrames.jl/stable/man/reshaping_and_pivoting/). In 61 | particular `stack` will take a dataset and "stack" it into a long 62 | form, with one row for each column mentioned. In this case we want to 63 | stack all the columns except the ones that identify the row, which we 64 | exclude by wrapping them in `Not`. 65 | 66 | ## Let's Stack It 67 | 68 | We'll stack the data frame, and then subset to the columns we might 69 | care about to avoid a lot of extra junk on our screen, and select the 70 | rows that represent statewide totals (where `STNAME == CTYNAME`), and 71 | we'll add a `year` column for later use. 72 | 73 | ```julia 74 | 75 | dflong = stack(df,Not([:SUMLEV,:REGION,:DIVISION,:STATE,:COUNTY,:STNAME,:CTYNAME ])) |> @filter(_.STNAME == _.CTYNAME) |> @mutate(year=-1) |> DataFrame; 76 | 77 | select!(dflong,[:STNAME,:year,:variable, :value,:DIVISION,:REGION]) 78 | 79 | 80 | display(first(dflong,5)) 81 | 82 | ``` 83 | 84 | Since the table is now "long form" we have one row per county per 85 | measurement. We can therefore select out just the ones that have 86 | population measurements. Let's take a look at which measurements those 87 | are. 88 | 89 | ```julia 90 | 91 | unique(dflong[:,:variable]) 92 | 93 | ``` 94 | 95 | ## Munging the data 96 | 97 | 98 | Clearly POPESTIMATE variables are the ones we want, but the year has 99 | been encoded into the symbol name because there was one column per 100 | year... What we want is to split this column into one containing 101 | POPESTIMATE and one containing the year. 102 | 103 | Since we're going to want to edit the variable names in the variable 104 | column, it's not convenient for them to be symbols.
Let's convert them 104 | to String and in the process, also strip off the year and put it into 105 | a numerical `year` column. 106 | 107 | We notice that the census always put the year pasted on to the end of 108 | the symbol name (when appropriate). But it's not for every 109 | symbol. We'll strip off the last 4 characters, and try to convert it 110 | to an integer using `tryparse`. 111 | 112 | ```julia 113 | 114 | display(first(dflong,5)) 115 | dflong2 = DataFrame(dflong |> @mutate(year = something(tryparse(Int,string(_.variable)[end-3:end]),-1), variable=String(_.variable)) |> @mutate(variable = _.year > 0 ? _.variable[1:end-4] : _.variable) ) 116 | display(first(dflong2,5)) 117 | 118 | ``` 119 | 120 | There are lots of variables, so let's just select out the ones that 121 | look like POPESTIMATE. 122 | 123 | ```julia 124 | display(unique(dflong2.variable) ) 125 | 126 | pop = dflong2 |> @filter(_.variable == "POPESTIMATE") |>DataFrame 127 | display(first(pop,5)) 128 | 129 | ``` 130 | 131 | # Visualizing Aspects of Population Data 132 | 133 | 134 | Now we have a DataFrame called `pop` that we can examine. Let's find 135 | out what is going on in the population of these states over 136 | time. We'll need to make some plots using Gadfly. 137 | 138 | ```julia 139 | 140 | using Gadfly 141 | 142 | set_default_plot_size(20cm,10cm); 143 | 144 | display(plot(pop,x=:year,y=:value,Geom.line,color=:STNAME)) 145 | display(plot(pop,x=:value,Geom.density(bandwidth=1e6))) 146 | 147 | ``` 148 | 149 | Normally Jupyter only displays the last thing in a cell, so we 150 | explicitly display each graph. 151 | 152 | These plots get us some basic information, but they have all sorts of 153 | problems. For example the color key for the 50 states takes up more of 154 | the plot than the plot does. And there are too many lines to really 155 | tell what's going on. 
And the density plot for population is pretty 156 | interesting, but it has no labels and the units are not very 157 | convenient. Let's fix the density plot first. 158 | 159 | ```julia 160 | 161 | display(plot(DataFrame(pop |> @mutate(stpop=_.value/1e6)), 162 | x=:stpop,Geom.density(bandwidth=1), 163 | Guide.title("Distribution of State Populations"), 164 | Guide.xlabel("Population (Millions of People)"))) 165 | 166 | 167 | ``` 168 | 169 | Of course, this is the density for all observations across all the 170 | years. Let's look at how the distribution of populations changed 171 | between say 2015 and 2019: 172 | 173 | ```julia 174 | set_default_plot_size(20cm,10cm) 175 | display(hstack(plot(DataFrame(pop |> @mutate(stpop=_.value/1e6) |> @filter(_.year == 2015)), 176 | x=:stpop,Geom.density(bandwidth=1), 177 | Guide.title("Distribution of State Populations 2015"), 178 | Guide.xlabel("Population (Millions of People)")), 179 | plot(DataFrame(pop |> @mutate(stpop=_.value/1e6) |> @filter(_.year == 2019)), 180 | x=:stpop,Geom.density(bandwidth=1), 181 | Guide.title("Distribution of State Populations 2019"), 182 | Guide.xlabel("Population (Millions of People)")))) 183 | 184 | ``` 185 | 186 | We can see that the distribution is relatively stable as might be 187 | expected since tens of millions of people don't tend to all move 188 | between states every few years. Let's select all the states with more 189 | than 10 Million people, and look at how they trended in time, and 190 | compare to states with less than 2 Million people. 
191 | 192 | ```julia 193 | bigstates = unique(pop[pop.value .> 10e6,:STNAME]) 194 | 195 | bigdata = DataFrame(pop |> @filter(_.STNAME in bigstates) |> @mutate(stpop=_.value/1e6)) 196 | bigplot = plot(bigdata, x=:year,y=:stpop, Geom.line,color=:STNAME,Geom.point,Guide.title("Population of Large States")) 197 | 198 | smallstates = unique(pop[pop.value .< 2e6,:STNAME]) 199 | 200 | smalldata = DataFrame(pop |> @filter(_.STNAME in smallstates) |> @mutate(stpop=_.value/1e6)) 201 | 202 | smallplot = plot(smalldata, x=:year,y=:stpop,Geom.line,color=:STNAME,Geom.point,Guide.title("Population of Small States")) 203 | 204 | set_default_plot_size(9inch,4inch) 205 | hstack(bigplot,smallplot) 206 | 207 | ``` 208 | 209 | So far so good. We notice that Idaho has been growing nonlinearly 210 | through time. Let's fit a quadratic to its growth curve and then we'll 211 | call it a day. 212 | 213 | ```julia 214 | using GLM 215 | iddata = smalldata |> @filter(_.STNAME == "Idaho") |> DataFrame 216 | idgrowth = lm(@formula(stpop ~ (year-2015)+(year-2015)^2),iddata) 217 | 218 | display(coef(idgrowth)) 219 | 220 | preds = DataFrame(predict(idgrowth,DataFrame(year=[2020,2021]),interval=:prediction)) 221 | preds.State = ["Idaho","Idaho"] 222 | preds.year = [2020,2021] 223 | preds 224 | 225 | 226 | ``` 227 | 228 | Who knows how many people will be in Idaho in 2021, perhaps COVID-19 229 | will cause an even bigger influx than the recent trend as people leave 230 | CA. But in any case, now we know what we'd expect if the trend 231 | continued as in the past. Extrapolating using a quadratic can be 232 | problematic, but this isn't too far outside the range of plausible. If 233 | we want to do more modeling, we should really learn some Bayesian 234 | methods 😉. 235 | 236 | 237 | That's it for now. If you have some interest in what we did, and why 238 | we did it, you can read the Discussion document. 
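As a postscript, the fit-and-extrapolate pattern above can be reproduced end to end on synthetic data (made-up numbers; the quadratic term is precomputed as a column here to keep the formula simple):

```julia
using GLM, DataFrames

d = DataFrame(year = 2010:2019)
d.c  = d.year .- 2015                 # centered year, as in the Idaho fit
d.c2 = d.c .^ 2
d.stpop = 1.6 .+ 0.02 .* d.c .+ 0.003 .* d.c2   # a noiseless quadratic "population"

fit = lm(@formula(stpop ~ c + c2), d)

# extrapolate to 2020 and 2021 (c = 5 and 6) with a prediction interval
predict(fit, DataFrame(c=[5,6], c2=[25,36]), interval=:prediction)
```

With noiseless data the fitted coefficients should recover 1.6, 0.02, and 0.003 almost exactly, which is a handy sanity check before trusting a fit on real data.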
239 | -------------------------------------------------------------------------------- /COVID-monitoring.jmd: -------------------------------------------------------------------------------- 1 | # Monitoring The Progress of the COVID-19 Pandemic 2 | 3 | ## A Julia Data Analysis Tutorial 4 | ## By Daniel Lakeland 5 | ## Lakeland Applied Sciences LLC 6 | 7 | 8 | As a mathematical modeler and data analyst many of my friends have 9 | asked me questions about what is going on with the COVID pandemic. On 10 | my blog I've posted some graphs as PDFs with updates every few weeks, 11 | but it's more convenient to give my friends and family an executable 12 | Jupyter notebook where they can update to the latest data any time 13 | they want. Let's get started by grabbing the daily data for all states 14 | from the [Covid Tracking Project](https://covidtracking.com/) 15 | 16 | We'll use the 17 | [Vegalite graphics library](https://vega.github.io/vega-lite/docs/) 18 | this time with the 19 | [@vlplot](https://www.queryverse.org/VegaLite.jl/stable/userguide/vlplotmacro/) 20 | macro. 
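A minimal `@vlplot` call, to show the shape of what's coming (toy data; VegaLite comes along with Queryverse):

```julia
using VegaLite, DataFrames

toy = DataFrame(day=1:10, cases=[1,2,4,8,12,20,25,31,40,52])

# data |> @vlplot(mark, encodings...) is the basic pattern
toy |> @vlplot(:line, x=:day, y=:cases, width=300, title="A toy epidemic curve")
```

Layered plots, which we'll use below, are built by adding several `@vlplot` specifications together.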
21 | 22 | ```julia 23 | using Queryverse, CSVFiles 24 | ``` 25 | 26 | 27 | ```julia 28 | usdata = "https://covidtracking.com/api/v1/states/daily.csv" 29 | 30 | 31 | uspopurl = "https://www2.census.gov/programs-surveys/popest/datasets/2010-2019/counties/totals/co-est2019-alldata.csv" 32 | 33 | statecodesurl ="https://www2.census.gov/geo/docs/reference/state.txt?#" 34 | 35 | uspop = load(uspopurl) |> @filter(_.COUNTY == 0) |> @select(:STATE,:STNAME,:POPESTIMATE2019) |> DataFrame 36 | 37 | 38 | statecodes = load(File(format"CSV",statecodesurl), delim='|') |> DataFrame 39 | 40 | 41 | 42 | dat = DataFrame(load(usdata)) 43 | 44 | 45 | dat = join(dat,statecodes, on = :state => :STUSAB,kind=:left) 46 | 47 | dat = join(dat,uspop, on = :STATE_NAME => :STNAME, kind=:left, makeunique=true) 48 | 49 | 50 | display(first(dat,5)) 51 | display(first(uspop)) 52 | 53 | ``` 54 | 55 | # Understanding the Data 56 | 57 | The Covid Tracking Project aggregates data from all the states on 58 | various important measures. For the moment let's focus on daily 59 | positive tests, the ratio of total tests to positive tests, the number 60 | of hospitalized patients, and the number of deaths per day. 61 | 62 | Let's figure out which columns those correspond to: 63 | 64 | ```julia 65 | 66 | println(names(dat)) 67 | 68 | using Dates 69 | 70 | function convdate(d::Int) 71 | return(Date(div(d,10000),div(mod(d,10000),100),mod(d,100))); 72 | end 73 | 74 | dat2 = dat |> @mutate(thedate=convdate(_.date)) |> DataFrame 75 | 76 | allstates = unique(dat.state); 77 | 78 | ``` 79 | 80 | Let's create a function that plots percentage positive, positive per 81 | day, and deaths per day for a given state...
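First, a quick sanity check of the `convdate` helper defined above; it peels year, month, and day out of an integer like `20200731` using integer division (standard-library `Dates` only):

```julia
using Dates

# same definition as above: yyyymmdd as an Int -> Date
convdate(d::Int) = Date(div(d,10000), div(mod(d,10000),100), mod(d,100))

convdate(20200731)   # Date(2020, 7, 31)
```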
86 | 87 | ```julia 88 | 89 | function plotstate(df,state) 90 | dfstate = df |> @filter(_.state == state) |> 91 | @mutate(testpct=_.positiveIncrease/(_.totalTestResultsIncrease+.1)) |> 92 | @select(:thedate,:state,:testpct,:positiveIncrease,:deathIncrease)|>DataFrame 93 | 94 | testing = @vlplot(width=300,layer=[],title="Testing in $state") + 95 | @vlplot(:point,x=:thedate,y={:testpct,axis={title="Percentage Positive"}}) + 96 | @vlplot(transform=[{loess=:testpct,on=:thedate}], 97 | mark=:line,x=:thedate,y=:testpct) 98 | 99 | cases = @vlplot(width=300,layer=[],title="Cases in $state") + 100 | @vlplot(mark={:point,filled=true}, 101 | x=:thedate,y={:positiveIncrease,axis={title="Cases Per Day"}})+ 102 | @vlplot(transform=[{loess=:positiveIncrease,on=:thedate,bandwidth=.2}], 103 | mark=:line,x=:thedate,y=:positiveIncrease) 104 | 105 | deaths = @vlplot(width=300,layer=[],title="Deaths in $state") + 106 | @vlplot(:point,x=:thedate,y={:deathIncrease,axis={title="Deaths Per Day"}}) + 107 | @vlplot(transform=[{loess=:deathIncrease,on=:thedate,bandwidth=.2}], 108 | mark=:line,x=:thedate,y=:deathIncrease) 109 | 110 | return(dfstate |> hcat(testing,cases,deaths)) 111 | 112 | end 113 | 114 | #test output 115 | #plotstate(dat2,"CA") 116 | 117 | ``` 118 | 119 | # Plotting All The States: 120 | 121 | 122 | 123 | ```julia 124 | 125 | for i in allstates 126 | display(plotstate(dat2,i)) 127 | end 128 | 129 | 130 | ``` 131 | 132 | # Hospitalization: 133 | 134 | Hospitalization data is reported rather incompletely in the 135 | covidtracking data. We'll filter out the missing values, and then 136 | graph states for which it's available. Note that for Queryverse 137 | @filter we must filter on isna rather than ismissing. 138 | 139 | ```julia 140 | 141 | 142 | hdat = dat2 |> @filter(! 
isna(_.hospitalizedCurrently)) |> DataFrame 143 | display(first(hdat,10)) 144 | 145 | for i in allstates 146 | 147 | hdat |> @filter(_.state == i) |> @mutate(hosppc = _.hospitalizedCurrently / _.POPESTIMATE2019 * 1e3) |> @vlplot(layer=[]) + @vlplot(:point, x=:thedate, y=:hosppc,title="$i Hospitalization/1000 people") + @vlplot(transform=[{loess=:hosppc,on=:thedate,bandwidth=.2}],mark=:line,x=:thedate,y=:hosppc) |> display 148 | 149 | end 150 | 151 | ``` 152 | 153 | 154 | 155 | # Mortality Data: 156 | 157 | Although the CDC in general doesn't have up-to-date mortality data 158 | available, they have made an effort to create a variety of datasets 159 | for COVID. They're required not to release overly specific 160 | information: they can only publish aggregated groups containing more 161 | than 10 people. Ideally you'd like to know, for every day and every 162 | county, exactly how many people in each sex and age category 163 | died... but this kind of data is not released, as it quickly becomes 164 | individually identifiable. 165 | 166 | The most useful datasets are: 167 | 168 | 1. COVID-19 [Case Surveillance Public Use Data](https://catalog.data.gov/dataset/covid-19-case-surveillance-public-use-data) 169 | 1. CSV at: https://data.cdc.gov/api/views/vbim-akqf/rows.csv?accessType=DOWNLOAD 170 | 2. [Provisional Death Counts by Sex,Age,State](https://catalog.data.gov/dataset/provisional-covid-19-death-counts-by-sex-age-and-state-fb69a) 171 | 1. CSV at: https://data.cdc.gov/api/views/9bhg-hcku/rows.csv?accessType=DOWNLOAD 172 | 3. [By Place of Death and State](https://catalog.data.gov/dataset/provisional-covid-19-death-counts-by-place-of-death-and-state-936c6) 173 | 4. [By Week and State](https://catalog.data.gov/dataset/provisional-covid-19-death-counts-by-week-ending-date-and-state) 174 | 1. CSV at: https://data.cdc.gov/api/views/r8kw-7aab/rows.csv?accessType=DOWNLOAD 175 | 5.
[By Sex Age and Week](https://catalog.data.gov/dataset/provisional-covid-19-death-counts-by-sex-age-and-week) 176 | 1. CSV at: https://data.cdc.gov/api/views/vsak-wrfu/rows.csv?accessType=DOWNLOAD 177 | 6. [Death counts by county](https://catalog.data.gov/dataset/provisional-covid-19-death-counts-in-the-united-states-by-county) 178 | 1. CSV at: https://data.cdc.gov/api/views/kn79-hsxy/rows.csv?accessType=DOWNLOAD 179 | 7. [Weekly death counts by state and cause](https://catalog.data.gov/dataset/weekly-counts-of-deaths-by-state-and-select-causes-2019-2020) 180 | 1. CSV at: https://data.cdc.gov/api/views/muzy-jte6/rows.csv?accessType=DOWNLOAD 181 | 182 | 183 | 184 | 185 | ```julia 186 | 187 | 188 | using HTTP 189 | function grabifsmallerolder(url,filename,size,time) 190 | s = stat(filename) # zeroed-out struct when the file doesn't exist 191 | if(s.size < size || s.mtime < time) 192 | isfile(filename) && chmod(filename,0o644) # only chmod an existing file 193 | io = open(filename,"w") 194 | try 195 | HTTP.get(url,response_stream=io) 196 | finally 197 | close(io) # always close, even when the download fails 198 | end 199 | end 200 | end 201 | 202 | covdeaths = "https://data.cdc.gov/api/views/9bhg-hcku/rows.csv?accessType=DOWNLOAD" 203 | grabifsmallerolder(covdeaths,"covSexAgeState.csv",1e6,time()-3600*24); 204 | 205 | 206 | 207 | ``` 208 | 209 | It's useful to ask: which age groups are losing the most expected life 210 | years? To answer this we can use the life expectancy tables provided 211 | by the CDC. To make this analysis simple, we'll use tables for the 212 | whole population against the data for both sexes.
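The idea is simple: life years lost in an age group are the deaths in that group times the remaining life expectancy at that age. A toy version of the calculation (made-up numbers, not CDC data):

```julia
# hypothetical deaths by age-group midpoint, and remaining life expectancy at that age
deaths = Dict(30.0 => 100, 60.0 => 1000, 85.0 => 5000)
expect = Dict(30.0 => 50.0, 60.0 => 23.0, 85.0 => 6.5)

lyl = Dict(age => deaths[age] * expect[age] for age in keys(deaths))

# "nominal lifetimes lost", counting 80 years as one lifetime
sum(values(lyl)) / 80   # 756.25
```

The real version below does exactly this, but pulls the death counts from the CDC file and the expectancies from the CDC life table.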
213 | 214 | 215 | 216 | ```julia; 217 | 218 | covdf = DataFrame(load("covSexAgeState.csv")) 219 | 220 | agedict = Dict("Under 1 year" => 0.5, "1-4 years" => 2.5, "5-14 years" => 10.0, "15-24 years" => 20.0, "0-17 years" => 17.0/2, "18-29 years" => (18+28)/2.0, 221 | "25-34 years" => 30.0, "30-49 years" => (30+49)/2.0, "35-44 years" => 40.0, 222 | "45-54 years" => 50.0, "50-64 years" => (50+64)/2.0, "55-64 years" => 60.0, "65-74 years" => 70.0 , 223 | "75-84 years" => 80.0, "85 years and over" => 90.0, "All ages"=>nothing, "All Ages" => nothing) 224 | 225 | rename!(covdf,[:date,:startwk,:endwk,:state,:sex,:agegrp,:coviddeaths,:totdeaths,:pneumdeaths,:pneumandcovdeaths,:infldeaths,:pnuminfcovdeaths,:footnote]) 226 | 227 | covdf.agenum = map(x -> agedict[x],covdf.agegrp) 228 | 229 | 230 | covallagedf = covdf[map(x -> x in ["All ages","All Ages"], covdf.agegrp ),:] 231 | 232 | filter!(x -> !(x.agegrp in ["All ages","All Ages"]),covdf) # filter! passes whole rows, so test the agegrp field 233 | 234 | deathsplot = covdf |> @filter(_.state == "United States" && _.sex != "Unknown") |> 235 | ( @vlplot(layer=[]) + 236 | @vlplot(:point,x={"agenum:q",title="Age"},y= {"coviddeaths:q",scale={domain=[0,100000]}}, 237 | color="sex:n",width=600,title="Total Coronavirus Deaths By Age") + 238 | @vlplot(:line,x="agenum:q",y="coviddeaths:q",color="sex:n")) 239 | 240 | lifetable = "https://ftp.cdc.gov/pub/Health_Statistics/NCHS/Publications/NVSR/68_07/Table01.xlsx" 241 | 242 | grabifsmallerolder(lifetable,"alllife.xlsx",100,time()-3600*12); # re-download only if older than 12 hours 243 | 244 | ltd = DataFrame(load("alllife.xlsx","Table 1!A4:G104")) 245 | rename!(ltd,[:agegrp,:qx,:nsurv,:ndie,:pylived,:pylivedabov,:expect]) 246 | 247 | ltd.age = 1:100 248 | 249 | lyldf = DataFrame(covdf |> @filter(_.sex == "All" && _.agenum != nothing) |> 250 | @mutate(lyl = _.coviddeaths * ltd[Int(round(_.agenum +.1)),:expect])) 251 | 252 | 253 | 254 | lylplot = lyldf |>@mutate(ltl=_.lyl/80)|> @vlplot(:bar,x={"agenum:q",title="Age"},y={"ltl:q",title="Nominal Lifetimes Lost"},width=600, 255 |
title="Nominal Lifetimes Lost By Age (1 lifetime = 80 years)") 256 | display([deathsplot; lylplot]); 257 | 258 | 259 | ``` 260 | 261 | The CDC, through Data.gov, has made available a de-identified 262 | individualized case database of COVID cases: 263 | 264 | ```julia 265 | 266 | covidcases="https://data.cdc.gov/api/views/vbim-akqf/rows.csv?accessType=DOWNLOAD" 267 | 268 | grabifsmallerolder(covidcases,"covidcasespub.csv",1000000,time()-3600*24); 269 | 270 | covcasedf = DataFrame(load("covidcasespub.csv")) 271 | 272 | ``` 273 | 274 | Manipulating data is sometimes easier with a well-developed query 275 | language. Fortunately, the SQLite library is an 276 | excellent tool for manipulating large datasets in a self-contained 277 | database file, without all the complexity of a database management 278 | system like MariaDB/MySQL or PostgreSQL. We can access SQLite from 279 | Julia. Access is somewhat slower from Julia than from R at the 280 | moment, but still fast enough for our purposes. Hopefully in the 281 | future it will be even better.
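As a self-contained illustration of the round trip we're about to do (toy table, and an in-memory database rather than a file):

```julia
using SQLite, DataFrames

db = SQLite.DB()    # no filename: an in-memory database

SQLite.load!(DataFrame(a=[1,2,2,3], b=["w","x","y","z"]), db, "toy")

# SQL groups and counts; the result materializes back into a DataFrame
DBInterface.execute(db, "select a, count(*) as n from toy group by a") |> DataFrame
```

The result has one row per distinct `a`, which is the same group-and-count shape we'll use on the case database below.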
282 | 283 | ```julia 284 | 285 | using SQLite 286 | 287 | db = SQLite.DB("coviddb.db") 288 | 289 | SQLite.drop!(db,"CovidCases";ifexists=true); 290 | SQLite.load!(covcasedf,db,"CovidCases";ifnotexists=true); 291 | ``` 292 | 293 | Let's grab all deaths by day, aggregated by age group: 294 | 295 | ```julia 296 | byagedeaths = DBInterface.execute(db,"select cdc_report_dt,count(*) as N,age_group,death_yn from CovidCases group by cdc_report_dt,age_group,death_yn") |> DataFrame 297 | 298 | first(byagedeaths,10) 299 | 300 | ``` 301 | 302 | Now let's write a small function that graphs deaths per day for a given age group, and apply it to the 40s through 70s age bands: 303 | 304 | 305 | ```julia 306 | 307 | function plotdeaths(df,agegroup) 308 | df |> @filter(_.age_group == agegroup && _.death_yn == "Yes") |> 309 | @vlplot(layer=[],title="US-wide Deaths per day (age $agegroup)",width=600) + 310 | @vlplot(:point,x=:cdc_report_dt,y={:N,scale={domain=[0,1000]}}) + 311 | @vlplot(:line,x=:cdc_report_dt,y=:N,transform=[{loess=:N, on=:cdc_report_dt,bandwidth=.2}]) 312 | end 313 | 314 | 315 | plot40 = byagedeaths |> x->plotdeaths(x,"40 - 49 Years") 316 | plot50 = byagedeaths |> x->plotdeaths(x,"50 - 59 Years") 317 | plot60 = byagedeaths |> x->plotdeaths(x,"60 - 69 Years") 318 | plot70 = byagedeaths |> x->plotdeaths(x,"70 - 79 Years") 319 | 320 | map(display,[plot40,plot50,plot60,plot70]); 321 | 322 | ``` 323 | -------------------------------------------------------------------------------- /DiscussionBasicDataAndPlots.jmd: -------------------------------------------------------------------------------- 1 | # Discussion of "Acquiring some data and plotting it" 2 | 3 | The Tutorials are designed to be very straightforward, so you 4 | can see what we're doing and learn by doing. The Discussion, by contrast, is 5 | all about **putting what we did in a broader perspective** and **showing you 6 | possible alternatives**, or where things could go wrong. Those are all
Those are all 7 | just distractions during the initial learning phase, but are important 8 | perspectives to have once you're past the initial comfort zone, so you 9 | can make better decisions about how to do things. **In the tutorial we 10 | try to select one good way to accomplish the task and stick to it.** In 11 | the discussion we might show you 3 or 4 other ways. 12 | 13 | First off, we needed to get the CSV file from the Census. The first 14 | thing that will generally go wrong is that **you don't have the 15 | slightest clue where to look for the dang file**. When I searched for 16 | ["census population csv"](https://www.startpage.com/do/dsearch?query=census+population+csv&cat=web&pl=ext-ff&language=english&extVersion=1.3.0) 17 | on StartPage (an anonymous proxy for Google searches) it led to 18 | [State Population Totals](https://www.census.gov/data/datasets/time-series/demo/popest/2010s-state-total.html) 19 | which at the bottom has a link to 20 | [Datasets](https://www2.census.gov/programs-surveys/popest/datasets/) 21 | which unhelpfully plops you into a directory tree by decade... so 22 | selecting 2010-2019 and then counties and then totals led to 23 | [our final csv file](https://www2.census.gov/programs-surveys/popest/datasets/2010-2019/counties/totals/co-est2019-alldata.csv). 24 | 25 | Since we want states, **it might have made more sense to go to the 26 | "states" directory**, but if you had done that, you'd have found one 27 | file with just the 2019 data, whereas **the county directory has the 28 | full time series, and it has summary data for each state**. Such is the 29 | life of a data analyst. 30 | 31 | Once we have a file we want to analyze, we need to get the data into 32 | Julia to analyze it. There are several issues: 33 | 34 | 1. Do you want to download the file via http/https and store it 35 | locally, or just grab the contents from the web and read it into memory? 36 | 2. Which package to do you use to read it into memory? 37 | 3. 
How do you handle the data in memory? 38 | 39 | 40 | ## Choice of Queryverse for tabular data 41 | 42 | 43 | I think the [Queryverse package set](https://www.queryverse.org/) is well thought out from a 44 | computational perspective and well maintained by 45 | [David Anthoff](https://www.david-anthoff.com/). He is committed to 46 | avoiding breaking changes as much as possible, and has a 47 | great [video tutorial](https://www.youtube.com/watch?v=OFPNph-WxLM) on 48 | using Queryverse. But there are a few other useful packages to know 49 | about which we will discuss below. 50 | 51 | 52 | ## Downloading Files to Disk 53 | 54 | If you want to download files to disk, rather than just process them 55 | in memory, you should know about the 56 | [HTTP package](https://juliaweb.github.io/HTTP.jl/stable/). Here is an 57 | example of getting a file from a URL and putting it into a file on 58 | disk called "co-est2019-alldata.csv" which matches the original name. 59 | 60 | ```julia 61 | using HTTP 62 | 63 | outfile = open("co-est2019-alldata.csv","w") 64 | HTTP.get("https://www2.census.gov/programs-surveys/popest/datasets/2010-2019/counties/totals/co-est2019-alldata.csv",response_stream=outfile) 65 | close(outfile) 66 | 67 | ``` 68 | 69 | In this code, we first open our output file for writing, and then hand 70 | it to HTTP.get() as the `response_stream=` keyword argument. We have 71 | to write `HTTP.get` explicitly because an unqualified `get` would 72 | refer to a different function. The `HTTP.get` function will stream the response 73 | to the file for us... we then close the file to ensure it's written 74 | properly to disk. 75 | 76 | If you do this a lot, you might write a utility function 77 | specifically to do it all for you.
78 | 79 | ```julia 80 | function http_get(url,outfile) 81 | out = open(outfile,"w"); 82 | try 83 | HTTP.get(url,response_stream=out) 84 | catch err 85 | println("there was an error $err while saving $url to file $outfile") 86 | finally 87 | close(out) 88 | end 89 | end 90 | ``` 91 | 92 | Here we 93 | [introduce a little error handling](https://docs.julialang.org/en/v1/manual/control-flow/#Exception-Handling-1) 94 | as well. The error handling ensures we are alerted to any problems 95 | with the network connection or disk io... and the `finally` clause ensures that we always close the 96 | file. 97 | 98 | 99 | ## Alternative CSV Handling 100 | 101 | If you're looking to read CSV-type files, there are a number of 102 | alternatives to the Queryverse-provided CSVFiles 103 | package. Queryverse provides packages to handle CSV, 104 | [Apache Feather](https://arrow.apache.org/docs/python/feather.html), 105 | Excel, SPSS, Stata, SAS, and Parquet files. They are 106 | [reasonably fast](https://www.queryverse.org/benchmarks/), with 1M 107 | rows of 20 columns of mixed data taking 2 seconds or so; the 108 | interfaces are uniform across file formats; and the whole ecosystem 109 | works well within itself and has excellent generic interfaces to 110 | other packages as well. This makes it hard to recommend alternatives, 111 | but it is worth mentioning the 112 | [CSV.jl](https://juliadata.github.io/CSV.jl/stable/) package, which 113 | can be used to easily read a CSV file into a DataFrame as follows: 114 | 115 | ```julia;exec=false 116 | 117 | using CSV, DataFrames; 118 | 119 | mydf = CSV.read("myfile.csv", DataFrame) # recent CSV.jl versions take the sink type as a second argument 120 | 121 | ``` 122 | 123 | A convenience function `download` in Julia uses external tools like 124 | curl or wget as available to grab files from urls... but watch out if 125 | you're running on a machine where those external tools aren't 126 | available.
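You can check for those tools up front with `Sys.which`, which returns the full path of an executable found on the PATH, or `nothing` (standard library only):

```julia
# probe the PATH for the external tools `download` relies on
havecurl = Sys.which("curl") !== nothing
havewget = Sys.which("wget") !== nothing

havecurl || havewget || @warn "neither curl nor wget found; `download` may fail"
```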
126 | 127 | ``` 128 | help?> download 129 | search: download 130 | 131 | download(url::AbstractString, [localfile::AbstractString]) 132 | 133 | Download a file from the given url, optionally renaming it to the given 134 | local file name. If no filename is given this will download into a 135 | randomly-named file in your temp directory. Note that this function 136 | relies on the availability of external tools such as curl, wget or fetch 137 | to download the file and is provided for convenience. For production use 138 | or situations in which more options are needed, please use a package 139 | that provides the desired functionality instead. 140 | 141 | Returns the filename of the downloaded file. 142 | ``` 143 | 144 | So we could do: 145 | 146 | ```julia 147 | 148 | using CSV, DataFrames 149 | 150 | mydf = CSV.read(download("https://www2.census.gov/programs-surveys/popest/datasets/2010-2019/counties/totals/co-est2019-alldata.csv"), DataFrame) 151 | 152 | 153 | ``` 154 | 155 | Note that CSVFiles/Queryverse doesn't seem to have these external 156 | requirements and will read from a URL with its built-in `load` 157 | function. 158 | 159 | ## Queryverse queries, syntactic sugar, and Data Frames 160 | 161 | The de-facto standard for in-memory tabular data is the 162 | [DataFrames](https://juliadata.github.io/DataFrames.jl/stable/) 163 | package. We use it extensively here. When we pipe a DataFrame through 164 | Queryverse queries such as `df |> @filter(...)`, the queries 165 | work with 166 | [iterators](https://docs.julialang.org/en/v1/manual/interfaces/) over 167 | [named tuples](https://docs.julialang.org/en/v1/manual/types/#Named-Tuple-Types-1). Basically 168 | a named tuple is the same as 169 | [E.F. Codd's](https://en.wikipedia.org/wiki/Edgar_F._Codd) notion of a 170 | [Relational Model](https://en.wikipedia.org/wiki/Relational_model) of 171 | "rows" or "tuples" of data grouped together into a "relation" or 172 | "table". This makes the interface extremely general purpose. However
However 173 | if you want to be able to grab particular values rather than interate 174 | over the entire table, you'll need to construct a DataFrame to hold 175 | the relation represented by the Query. At the end of the chain of 176 | operations if we want a DataFrame we need to pipe into the DataFrame 177 | constructor. 178 | 179 | ```julia;exec=false 180 | newdf = mydf |> @filter(...) |> @mutate(...) |> DataFrame 181 | 182 | ## or what's the same 183 | 184 | newdf = DataFrame(mydf |> @filter(...) |> @mutate(...) ) 185 | 186 | ``` 187 | 188 | The Pipe operator `|>` is a syntactical shorthand for prefix notation 189 | for calling a function with the first argument coming from the left 190 | hand side of the pipe. So `a |> foo(b,c)` is equivalent to 191 | `foo(a,b,c)`. 192 | 193 | Speaking of syntax, the notation `@foo` denotes a 194 | ["macro"](https://docs.julialang.org/en/v1/manual/metaprogramming/), 195 | that is a function which receives as its arguments the parsed 196 | representation of the language tree and which can then modify the 197 | language itself to generate / expand some syntactic elements into code 198 | to be run. Generating code is known as "metaprogramming" because 199 | you're writing programs that write programs. So presumably 200 | `@filter(_.a == 1)` **generates some code** that iterates through the 201 | named tuples coming in on the left hand side and throws away those for 202 | which the value of the field `a` is not equal to 1. Macros are mostly 203 | useful because they allow you to create specialized sub-languages 204 | within Julia, a feature not found in most languages other than LISP 205 | dialects (some people, including myself, consider Julia a LISP 206 | dialect). 207 | 208 | ## Splitting The What and the When 209 | 210 | In addition to the "wide form" we also need to undo the combination of 211 | the year with the variable name. 212 | 213 | When we split out the year we used the `tryparse` function. 
If it 214 | can't parse a number, it returns `nothing`, which is a special value 215 | meaning nothing... We always want an Integer though, so we call 216 | `something(...)`, which returns the first argument that isn't 217 | `nothing`, so that when we get nothing, something will give us -1 218 | instead, which is unambiguously not a real year for this dataset. This 219 | keeps our column type as Int (and avoids problems later). When I tried 220 | this without using `something()` to ensure an integer was returned, 221 | the ENTIRE table became the special "container" type 222 | `DataValue{Any}`. This made it impossible to convert the Symbols 223 | to Strings. 224 | 225 | When we call the `stack` function, it creates a dataset in which 226 | each variable is stored as a 227 | ["symbol"](https://docs.julialang.org/en/v1/manual/metaprogramming/#Symbols-1). You 228 | can think of a "symbol" as a single structure in memory that 229 | represents a "name". In computer science this is called an 230 | ["interned string"](https://en.wikipedia.org/wiki/String_interning), a 231 | kind of Platonic ideal of the string... there can be only one, it's 232 | immutable, and it can be mapped one to one with a particular location 233 | in memory. This makes working with symbols very easy when it comes to 234 | things like checking to see if two symbol variables are the 235 | same... you can just compare their position in the global "symbol 236 | table" or something similar, whereas to compare two strings we must 237 | compare each character in the string to see if they match.
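As a quick aside, the `tryparse`/`something` combination used in the year-splitting step can be tried in isolation on toy inputs:

```julia
# tryparse returns the parsed value, or nothing on failure,
# rather than throwing an error
tryparse(Int, "2019")        # 2019
tryparse(Int, "POPESTIMATE") # nothing

# something returns its first non-nothing argument, so -1 acts as an
# unambiguous "no year here" sentinel, keeping the column type Int
something(tryparse(Int, "POPESTIMATE2019"[end-3:end]), -1) # 2019
something(tryparse(Int, "CENSUS"[end-3:end]), -1)          # -1
```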
238 | 239 | ```julia 240 | using BenchmarkTools 241 | 242 | a = "thequickbrownfoxjumpedoverthelazydog"; b="thequickbrownfoxjumpedoverthelazydog"; 243 | asym = :thequickbrownfoxjumpedoverthelazydog; bsym = :thequickbrownfoxjumpedoverthelazydog; 244 | 245 | @btime $a === $b;       ## interpolate globals with $ so we time the comparison itself 246 | @btime $asym === $bsym; 247 | 248 | 249 | ``` 250 | 251 | The check `a===b` has to actually check all the characters in the 252 | string, so long strings will take longer, whereas the symbol is just 253 | converted to a kind of marker in the symbol table... To check if two 254 | variables have the same symbol in them, we can immediately just check 255 | to see if they point to the same object, regardless of how long the 256 | string associated to the symbol name is. 257 | 258 | But a symbol is just an atomic thing; it isn't a string whose third 259 | character you can extract, for example. So when we want to split out the 260 | last 4 characters, we want to work with a string. Hence, we converted 261 | it, using this bit of code: 262 | 263 | `dflong |> @mutate(year = something(tryparse(Int,string(_.variable)[end-3:end]),-1), variable=String(_.variable)) |> @mutate(variable = _.year > 0 ? _.variable[1:end-4] : _.variable)` 264 | 265 | 266 | Let's unparse that a little. We first mutated the table to include the 267 | year, and changed the "variable" by constructing a String from the 268 | symbol. Then for rows where year was a positive number, we took all 269 | but the last 4 characters, otherwise we took the whole string. 270 | 271 | Note the conditional 272 | [ternary operator](https://docs.julialang.org/en/v1/manual/control-flow/). It 273 | has the form `a ? b : c` and means the same as `if a b else c end`. As 274 | far as I know this ternary operator comes originally from C. 275 | 276 | 277 | 278 | # Visualizing Aspects of Population Data 279 | 280 | 281 | There are approximately 4 actively maintained Julia plotting 282 | libraries: 283 | 284 | 1. Plots 285 | 2. Gadfly 286 | 3.
Vega/VegaLite 287 | 4. Makie 288 | 289 | There may be others as well, but these are the best known. 290 | 291 | We chose Gadfly in this Tutorial because it offers a "Grammar Of 292 | Graphics" style of plot specification. This is a good style for use in 293 | exploring data because it lets you compose different components 294 | together in a reasonable and understandable way. If you come from a 295 | place where you know Matlab or Python you may be more comfortable with 296 | Plots or Makie. 297 | 298 | The VegaLite library is part of Queryverse and is also a Grammar Of 299 | Graphics influenced library, however its graphical specifications 300 | are written in 301 | [JSON](https://en.wikipedia.org/wiki/JSON) and it's a bit involved to 302 | learn that whole system. Possibly worth it, but maybe not for a first 303 | Tutorial. 304 | 305 | The Gadfly function `set_default_plot_size(w,h)` obviously sets the 306 | default sizes. It takes two arguments, a width and a height, which 307 | should be expressed as elements of the type `Measures.Length`. The 308 | variable `cm` is a constant global and the expression `20cm` is really 309 | the same as `20*cm`, since Julia interprets a numeric literal juxtaposed 310 | with a variable as multiplication. There is also a constant global `inch`. 311 | 312 | 313 | ## Fitting basic models: 314 | 315 | For our first example of model fitting we brought in the GLM 316 | library. This lets you fit relatively common simple models, and uses a 317 | formula syntax that is very similar to the one used by the lm or glm 318 | functions in R. 319 | 320 | 321 | ```julia;exec=false 322 | using GLM 323 | 324 | idgrowth = lm(@formula(stpop ~ (year-2015)+(year-2015)^2), smalldata) 325 | 326 | display(coef(idgrowth)) 327 | 328 | predict(idgrowth,DataFrame(year=[2020,2021]),interval=:prediction) 329 | 330 | ``` 331 | 332 | The GLM package stands for "Generalized Linear Models" and fits models 333 | by essentially the maximum likelihood method.
This means it gives 334 | point estimates, and uses the curvature of the likelihood function to 335 | estimate the standard errors of the parameters. 336 | 337 | Since these tutorials are written by an opinionated Bayesian, you can 338 | take the confidence intervals spit out by GLM as more or less 339 | approximately equal to a high probability density interval under a 340 | broad prior distribution. That's not always a very reasonable thing to 341 | do. In our next tutorial perhaps we will build some explicit Bayesian 342 | models! 343 | 344 | -------------------------------------------------------------------------------- /GMTMaps.jmd: -------------------------------------------------------------------------------- 1 | # Visualizing Spatial Data with GMT 2 | 3 | GMT is the "Generic Mapping Tools", a powerful toolset for geospatial visualizations. With great power comes a great learning curve. 4 | The goal here is to make it possible for someone who has a data analysis background but not much specific knowledge of geospatial software to 5 | make 2D maps that show their data, and simultaneously to learn something about the framework, what it is capable of, and how to 6 | use it. 7 | 8 | # Problem 1: a Choropleth map of US Census Data 9 | 10 | One of the most common tasks in basic geospatial data analysis is to create a map in which regions of the world are colored based on 11 | the value of some statistic pertaining to the region. We will start with a very simple map of population for each county in the US.
11 | 12 | The data for this can be found from the Census Bureau at https://www2.census.gov/programs-surveys/popest/datasets/2010-2020/counties/totals/co-est2020-alldata.csv 13 | 14 | ```julia 15 | using CSV,DataFrames,GMT,Printf,DataFramesMeta,Downloads 16 | 17 | countypopurl = "https://www2.census.gov/programs-surveys/popest/datasets/2010-2020/counties/totals/co-est2020-alldata.csv" 18 | 19 | Downloads.download(countypopurl,"data/countypops.csv") 20 | 21 | cpopall = CSV.read("data/countypops.csv",DataFrame) 22 | 23 | cpop = cpopall[:,[:STATE,:COUNTY,:STNAME,:CTYNAME,:POPESTIMATE2020]] 24 | 25 | ``` 26 | 27 | Now we have a large number of different estimates for each county in the US. For simplicity let's use the POPESTIMATE2020 variable. The COUNTY 28 | variable is the FIPS county code for the county. 29 | 30 | In order to build a Choropleth map, we will need to have a file that defines the spatial boundaries of each county! GMT is a generic mapping tool: 31 | it will read various geospatial data formats which can be used to define the regions to be plotted. The Census Bureau has a file which defines 32 | the boundaries of the counties at 3 different resolutions. These are in "shapefile" format. The files and other related shapefiles are available 33 | [at the Census website](https://www.census.gov/geographies/mapping-files/time-series/geo/carto-boundary-file.html). We will use the medium 34 | resolution version of the county shapefiles (the "5m" in the filename, meaning 1:5,000,000 scale). 35 | 36 | ```julia{eval=false} 37 | countyshapeurl = "https://www2.census.gov/geo/tiger/GENZ2018/shp/cb_2018_us_county_5m.zip" 38 | 39 | download(countyshapeurl,"data/cb_2018_us_county_5m.zip") 40 | cd("data") 41 | run(`unzip cb_2018_us_county_5m.zip`) 42 | cd("..") 43 | 44 | ``` 45 | 46 | Inside the zip file are a number of files; the one ending in ".shp" is the shapefile itself.
We can read this using GMT's function 47 | `gmtread` 48 | 49 | ```julia 50 | counties = gmtread("data/cb_2018_us_county_5m.shp") 51 | ``` 52 | 53 | Now, what kind of thing is "counties"? 54 | 55 | ```julia 56 | typeof(counties) 57 | ``` 58 | 59 | It's a Vector{GMTdataset}. Basically GMT treats the shape file as a collection of shapes; each shape gets its own GMTdataset, which is a structure: 60 | 61 | ```{eval=false} 62 | search: GMTdataset 63 | 64 | No documentation found. 65 | 66 | Summary 67 | ≡≡≡≡≡≡≡≡≡ 68 | 69 | mutable struct GMTdataset{T<:Real, N} 70 | 71 | Fields 72 | ≡≡≡≡≡≡≡≡ 73 | 74 | data :: Array{T<:Real, N} 75 | ds_bbox :: Vector{Float64} 76 | bbox :: Vector{Float64} 77 | attrib :: Dict{String, String} 78 | colnames :: Vector{String} 79 | text :: Vector{String} 80 | 81 | geom :: Int64 82 | 83 | Supertype Hierarchy 84 | ≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡ 85 | 86 | GMTdataset{T<:Real, N} <: AbstractArray{T<:Real, N} <: Any 87 | ``` 88 | 89 | In addition to "data", which is latitude and longitude values in this case, there are a variety of metadata fields, one of which is "header". 90 | 91 | Let's look at the header of the first county: 92 | ```julia 93 | counties[1].header 94 | counties[1].attrib 95 | ``` 96 | 97 | This shows that the header is a comma-separated values string. We want to "join" our data to these counties such that the county FIPS identifier 98 | is the same. The FIPS identifier is a unique numerical code assigned to each county (the names are not necessarily unique). 99 | 100 | In this case, we have the value 39 being the FIPS code for the state of Ohio, and 071 being the county FIPS code for Highland County. 101 | 102 | So we'd like to join field 2 of the header to the COUNTY column of our population data. 103 | However our population data represents the COUNTY field as a number, not a zero-padded string.
Let's fix that then do the join 104 | 105 | 106 | ```julia 107 | 108 | cpop.COUNTY = Printf.format.(Ref(Printf.Format("%03d")),cpop.COUNTY) 109 | cpop.STATE = Printf.format.(Ref(Printf.Format("%02d")),cpop.STATE) 110 | 111 | cpop = cpop[cpop.COUNTY .!= "000",:] ## eliminate county "000" which is "the whole state" for each state 112 | 113 | dfc = DataFrame(STATE = map(x->x.attrib["STATEFP"],counties),COUNTY=map(x -> x.attrib["COUNTYFP"],counties),ORDER=1:length(counties)) 114 | 115 | 116 | joineddata = @chain leftjoin(dfc,cpop,on= [:STATE,:COUNTY],makeunique=true) begin 117 | @orderby(:ORDER) 118 | end 119 | 120 | cptvallog = makecpt(range=(log(1000),log(11e6)),C=:plasma) 121 | 122 | #imshow(counties,level=joineddata.POPESTIMATE2020,cmap=cptval,close=true,fill="+z",pen=0.25,region=(-125,-65,24,50),proj=:guess,colorbar=true) 123 | 124 | ``` 125 | 126 | There are missing values of POPESTIMATE2020 so let's replace them with NaN so we at least still get the polygon drawn. GMT will draw NaN 127 | as a special color. 128 | 129 | ```julia 130 | joineddata.POPESTIMATE2020 = replace(joineddata.POPESTIMATE2020,missing => NaN) 131 | 132 | 133 | 134 | GMT.plot(counties,level=log.(1.0 .+ joineddata.POPESTIMATE2020),cmap=cptvallog,close=true,fill="+z",pen=0.25,region=(-125,-65,24,50), 135 | proj=:guess,colorbar=true,figname="choroplethlog.png",title="Continental US County log(Population)") 136 | ``` 137 | ![Choropleth of log(population) at county level](choroplethlog.png) 138 | 139 | 140 | Let's see what it looks like if we don't take the logarithm... 
141 | 142 | 143 | ```julia 144 | cptval = makecpt(range=(0,11_000_000),C=:plasma) 145 | 146 | GMT.plot(counties,level=joineddata.POPESTIMATE2020,cmap=cptval,close=true,fill="+z",pen=0.25,region=(-125,-65,24,50), 147 | proj=:guess,colorbar=true,figname="choroplethnolog.png",title="Continental US County Population") 148 | ``` 149 | 150 | ![Continental Population](choroplethnolog.png) 151 | 152 | It may be interesting to work further with the polygons that represent the counties in some additional ways. For example we could calculate the population 153 | density by dividing the population by the area of the polygon. We can calculate the area using the function gmtspatial. 154 | 155 | ```julia 156 | 157 | measures = gmtspatial(counties, area="k")[1] # to get area in km^2 158 | print(typeof(measures)) 159 | length(measures) 160 | print(measures) 161 | ``` 162 | 163 | But some counties consist of several polygons (some along the coast may include islands, for example), so to get the total area we 164 | need to group by county identifier and sum all the areas.
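On a toy table, that group-and-sum step looks like this (a sketch with made-up polygon areas, using DataFramesMeta's `@chain`):

```julia
using DataFrames, DataFramesMeta

# Two polygons belong to county "003": a mainland piece and an island
polys = DataFrame(COUNTY = ["001", "003", "003"],
                  area   = [100.0, 40.0, 2.5])

totals = @chain polys begin
    groupby(:COUNTY)
    @combine(:totarea = sum(:area))
end
# totals has one row per county: ("001", 100.0) and ("003", 42.5)
```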
165 | 166 | ```julia 167 | countyareas = @chain dfc begin 168 | @transform(:area = measures[:,3]) 169 | groupby([:STATE,:COUNTY]) 170 | @combine(:totarea = sum(:area)) 171 | @select(:STATE,:COUNTY,:totarea) 172 | end 173 | 174 | 175 | cpop = @chain leftjoin(cpop,countyareas,on = [:STATE, :COUNTY]) begin 176 | @transform(:density = :POPESTIMATE2020 ./ :totarea) 177 | @orderby(:STATE,:COUNTY) 178 | end 179 | 180 | print(cpop[1:10,:]) 181 | 182 | joineddata = @chain leftjoin(dfc,cpop,on = [:STATE,:COUNTY]) begin 183 | @orderby(:ORDER) 184 | end 185 | 186 | 187 | 188 | ``` 189 | 190 | ```julia 191 | denscpt = makecpt(range=(0,11e6/(100*100)),C=:plasma) 192 | 193 | 194 | GMT.plot(counties,level=Vector{Float64}(replace(joineddata.density,missing=>NaN)),cmap=denscpt,close=true,fill="+z",pen=0.25,region=(-125,-65,24,50), 195 | proj=:guess,colorbar=true,figname="choroplethdens.png",title="Continental US County Pop Density (1/km^2)") 196 | 197 | ``` 198 | 199 | ![Continental US Density](choroplethdens.png) 200 | 201 | 202 | ## Smallest Area Containing ~50% of the population 203 | 204 | Our approach to this question will be to sort the counties by density in decreasing order, cumsum the populations... and then create an indicator 205 | variable for whether the cumsum is less than 50% of the total, then plot this indicator variable. 
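The cumulative-sum trick is easy to check on made-up numbers before applying it to the real data:

```julia
# Populations already sorted by decreasing density
pops = [40, 30, 20, 10]
csum = cumsum(pops)            # [40, 70, 90, 100]
inset = csum .< sum(pops) / 2  # [true, false, false, false]
```

Only the densest county falls inside the "smallest 50%" set here, because adding the second county would already push the cumulative total past half.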
206 | 207 | ```julia 208 | 209 | totpop = sum(cpop[cpop.COUNTY .!= "000",:POPESTIMATE2020]) ## ignore county 0, that's the "whole state" 210 | print(totpop) 211 | 212 | areads = @chain cpop begin 213 | @subset(:COUNTY .!= "000") 214 | @orderby(-:density) 215 | @transform( :csum = cumsum(:POPESTIMATE2020)) 216 | @transform( :inset = :csum .< totpop/2.0,:in80set = :csum .< 0.8 * totpop) 217 | end 218 | 219 | joineddata = @chain leftjoin(dfc,areads,on=[:STATE,:COUNTY]) begin 220 | @orderby(:ORDER) 221 | end 222 | 223 | insetcmap = makecpt(range=(0,1),C=:plasma) 224 | 225 | GMT.plot(counties,level=Float64.(replace(joineddata.inset,missing => NaN)),cmap=insetcmap,close=true,fill="+z",pen=0.25,region=(-125,-65,24,50), 226 | proj=:guess,colorbar=true,figname="choroplethhpdset.png",title="Continental US smallest 50% set ") 227 | 228 | ``` 229 | 230 | ![high probability density set 50%](choroplethhpdset.png) 231 | 232 | 233 | ## Smallest Area Containing ~80% of the population 234 | 235 | ```julia 236 | GMT.plot(counties,level=Float64.(replace(joineddata.in80set,missing => NaN)),cmap=insetcmap,close=true,fill="+z",pen=0.25,region=(-125,-65,24,50), 237 | proj=:guess,colorbar=true,figname="choroplethhpd80set.png",title="Continental US smallest 80% set ") 238 | 239 | ``` 240 | 241 | ![high probability density set 80%](choroplethhpd80set.png) 242 | 243 | 244 | 245 | -------------------------------------------------------------------------------- /Manifest.toml: -------------------------------------------------------------------------------- 1 | # This file is machine-generated - editing it directly is not advised 2 | 3 | julia_version = "1.7.1" 4 | manifest_format = "2.0" 5 | 6 | [[deps.ArgTools]] 7 | uuid = "0dad84c5-d112-42e6-8d28-ef12dabb789f" 8 | 9 | [[deps.Arrow]] 10 | deps = ["CategoricalArrays", "Dates"] 11 | git-tree-sha1 = "c86df6ed41b3bd192d663e5e0e7cac0d11fd4375" 12 | uuid = "69666777-d1a9-59fb-9406-91d4454c9d45" 13 | version = "0.2.4" 14 | 15 | [[deps.Artifacts]] 16 | uuid =
"56f22d72-fd6d-98f1-02f0-08ddc0907c33" 17 | 18 | [[deps.Base64]] 19 | uuid = "2a0f44e3-6c83-55bd-87e4-b1978d98bd5f" 20 | 21 | [[deps.BinaryProvider]] 22 | deps = ["Libdl", "Logging", "SHA"] 23 | git-tree-sha1 = "ecdec412a9abc8db54c0efc5548c64dfce072058" 24 | uuid = "b99e7846-7c00-51b0-8f62-c81ae34c0232" 25 | version = "0.5.10" 26 | 27 | [[deps.CEnum]] 28 | git-tree-sha1 = "215a9aa4a1f23fbd05b92769fdd62559488d70e9" 29 | uuid = "fa961155-64e5-5f13-b03f-caf6b980ea82" 30 | version = "0.4.1" 31 | 32 | [[deps.CSV]] 33 | deps = ["CodecZlib", "Dates", "FilePathsBase", "InlineStrings", "Mmap", "Parsers", "PooledArrays", "SentinelArrays", "Tables", "Unicode", "WeakRefStrings"] 34 | git-tree-sha1 = "9519274b50500b8029973d241d32cfbf0b127d97" 35 | uuid = "336ed68f-0bac-5ca0-87d4-7b16caf5d00b" 36 | version = "0.10.2" 37 | 38 | [[deps.CSVFiles]] 39 | deps = ["CodecZlib", "DataValues", "FileIO", "HTTP", "IterableTables", "IteratorInterfaceExtensions", "TableShowUtils", "TableTraits", "TableTraitsUtils", "TextParse"] 40 | git-tree-sha1 = "d4dd66b73d3c811daa67587980bf45a179d16983" 41 | uuid = "5d742f6a-9f54-50ce-8119-2520741973ca" 42 | version = "1.0.1" 43 | 44 | [[deps.CategoricalArrays]] 45 | deps = ["DataAPI", "Future", "JSON", "Missings", "Printf", "Statistics", "StructTypes", "Unicode"] 46 | git-tree-sha1 = "2ac27f59196a68070e132b25713f9a5bbc5fa0d2" 47 | uuid = "324d7699-5711-5eae-9e2f-1d82baa6b597" 48 | version = "0.8.3" 49 | 50 | [[deps.Chain]] 51 | git-tree-sha1 = "339237319ef4712e6e5df7758d0bccddf5c237d9" 52 | uuid = "8be319e6-bccf-4806-a6f7-6fae938471bc" 53 | version = "0.4.10" 54 | 55 | [[deps.ChainRulesCore]] 56 | deps = ["Compat", "LinearAlgebra", "SparseArrays"] 57 | git-tree-sha1 = "f9982ef575e19b0e5c7a98c6e75ee496c0f73a93" 58 | uuid = "d360d2e6-b24c-11e9-a2a3-2a2ae2dbcce4" 59 | version = "1.12.0" 60 | 61 | [[deps.ChangesOfVariables]] 62 | deps = ["ChainRulesCore", "LinearAlgebra", "Test"] 63 | git-tree-sha1 = "bf98fa45a0a4cee295de98d4c1462be26345b9a1" 64 | uuid = 
"9e997f8a-9a97-42d5-a9f1-ce6bfc15e2c0" 65 | version = "0.1.2" 66 | 67 | [[deps.CodecZlib]] 68 | deps = ["TranscodingStreams", "Zlib_jll"] 69 | git-tree-sha1 = "ded953804d019afa9a3f98981d99b33e3db7b6da" 70 | uuid = "944b1d66-785c-5afd-91f1-9de20f533193" 71 | version = "0.7.0" 72 | 73 | [[deps.CodecZstd]] 74 | deps = ["CEnum", "TranscodingStreams", "Zstd_jll"] 75 | git-tree-sha1 = "849470b337d0fa8449c21061de922386f32949d9" 76 | uuid = "6b39b394-51ab-5f42-8807-6242bab2b4c2" 77 | version = "0.7.2" 78 | 79 | [[deps.Compat]] 80 | deps = ["Base64", "Dates", "DelimitedFiles", "Distributed", "InteractiveUtils", "LibGit2", "Libdl", "LinearAlgebra", "Markdown", "Mmap", "Pkg", "Printf", "REPL", "Random", "SHA", "Serialization", "SharedArrays", "Sockets", "SparseArrays", "Statistics", "Test", "UUIDs", "Unicode"] 81 | git-tree-sha1 = "44c37b4636bc54afac5c574d2d02b625349d6582" 82 | uuid = "34da2185-b29b-5c13-b0c7-acf172513d20" 83 | version = "3.41.0" 84 | 85 | [[deps.CompilerSupportLibraries_jll]] 86 | deps = ["Artifacts", "Libdl"] 87 | uuid = "e66e0078-7015-5450-92f7-15fbd957f2ae" 88 | 89 | [[deps.Conda]] 90 | deps = ["Downloads", "JSON", "VersionParsing"] 91 | git-tree-sha1 = "6cdc8832ba11c7695f494c9d9a1c31e90959ce0f" 92 | uuid = "8f4d0f93-b110-5947-807f-2305c1781a2d" 93 | version = "1.6.0" 94 | 95 | [[deps.ConstructionBase]] 96 | deps = ["LinearAlgebra"] 97 | git-tree-sha1 = "f74e9d5388b8620b4cee35d4c5a618dd4dc547f4" 98 | uuid = "187b0558-2788-49d3-abe0-74a17ed4e7c9" 99 | version = "1.3.0" 100 | 101 | [[deps.Crayons]] 102 | git-tree-sha1 = "249fe38abf76d48563e2f4556bebd215aa317e15" 103 | uuid = "a8cc5b0e-0ffa-5ad4-8c14-923d3ee1735f" 104 | version = "4.1.1" 105 | 106 | [[deps.DBInterface]] 107 | git-tree-sha1 = "9b0dc525a052b9269ccc5f7f04d5b3639c65bca5" 108 | uuid = "a10d1c49-ce27-4219-8d33-6db1a4562965" 109 | version = "2.5.0" 110 | 111 | [[deps.DataAPI]] 112 | git-tree-sha1 = "cc70b17275652eb47bc9e5f81635981f13cea5c8" 113 | uuid = "9a962f9c-6df0-11e9-0e5d-c546b8b5ee8a" 114 | 
version = "1.9.0" 115 | 116 | [[deps.DataFrames]] 117 | deps = ["Compat", "DataAPI", "Future", "InvertedIndices", "IteratorInterfaceExtensions", "LinearAlgebra", "Markdown", "Missings", "PooledArrays", "PrettyTables", "Printf", "REPL", "Reexport", "SortingAlgorithms", "Statistics", "TableTraits", "Tables", "Unicode"] 118 | git-tree-sha1 = "ae02104e835f219b8930c7664b8012c93475c340" 119 | uuid = "a93c6f00-e57d-5684-b7b6-d8193f3e46c0" 120 | version = "1.3.2" 121 | 122 | [[deps.DataFramesMeta]] 123 | deps = ["Chain", "DataFrames", "MacroTools", "OrderedCollections", "Reexport"] 124 | git-tree-sha1 = "ab4768d2cc6ab000cd0cec78e8e1ea6b03c7c3e2" 125 | uuid = "1313f7d8-7da2-5740-9ea0-a2ca25f37964" 126 | version = "0.10.0" 127 | 128 | [[deps.DataStructures]] 129 | deps = ["Compat", "InteractiveUtils", "OrderedCollections"] 130 | git-tree-sha1 = "3daef5523dd2e769dad2365274f760ff5f282c7d" 131 | uuid = "864edb3b-99cc-5e75-8d2d-829cb0a9cfe8" 132 | version = "0.18.11" 133 | 134 | [[deps.DataTables]] 135 | deps = ["DataValues", "ReadOnlyArrays", "TableShowUtils", "TableTraitsUtils"] 136 | git-tree-sha1 = "9b069372a767fc6142feecc8e6d737d1b1de4711" 137 | uuid = "743a1d0a-8ebc-4f23-814b-50d006366bc6" 138 | version = "0.1.0" 139 | 140 | [[deps.DataValueInterfaces]] 141 | git-tree-sha1 = "bfc1187b79289637fa0ef6d4436ebdfe6905cbd6" 142 | uuid = "e2d170a0-9d28-54be-80f0-106bbe20a464" 143 | version = "1.0.0" 144 | 145 | [[deps.DataValues]] 146 | deps = ["DataValueInterfaces", "Dates"] 147 | git-tree-sha1 = "d88a19299eba280a6d062e135a43f00323ae70bf" 148 | uuid = "e7dc6d0d-1eca-5fa6-8ad6-5aecde8b7ea5" 149 | version = "0.4.13" 150 | 151 | [[deps.DataVoyager]] 152 | deps = ["DataValues", "Electron", "FilePaths", "IterableTables", "IteratorInterfaceExtensions", "JSON", "TableTraits", "Test", "URIParser", "VegaLite"] 153 | git-tree-sha1 = "159f1d3f07225a59dd4edb8ad15e607fefac9543" 154 | uuid = "5721bf48-af8e-5845-8445-c9e18126e773" 155 | version = "1.0.2" 156 | 157 | [[deps.Dates]] 158 | deps = 
["Printf"] 159 | uuid = "ade2ca70-3891-5945-98fb-dc099432e06a" 160 | 161 | [[deps.DelimitedFiles]] 162 | deps = ["Mmap"] 163 | uuid = "8bb1440f-4735-579b-a4ab-409b98df4dab" 164 | 165 | [[deps.DensityInterface]] 166 | deps = ["InverseFunctions", "Test"] 167 | git-tree-sha1 = "80c3e8639e3353e5d2912fb3a1916b8455e2494b" 168 | uuid = "b429d917-457f-4dbc-8f4c-0cc954292b1d" 169 | version = "0.4.0" 170 | 171 | [[deps.Distributed]] 172 | deps = ["Random", "Serialization", "Sockets"] 173 | uuid = "8ba89e20-285c-5b6f-9357-94700520ee1b" 174 | 175 | [[deps.Distributions]] 176 | deps = ["ChainRulesCore", "DensityInterface", "FillArrays", "LinearAlgebra", "PDMats", "Printf", "QuadGK", "Random", "SparseArrays", "SpecialFunctions", "Statistics", "StatsBase", "StatsFuns", "Test"] 177 | git-tree-sha1 = "2e97190dfd4382499a4ac349e8d316491c9db341" 178 | uuid = "31c24e10-a181-5473-b8eb-7969acd0382f" 179 | version = "0.25.46" 180 | 181 | [[deps.DocStringExtensions]] 182 | deps = ["LibGit2"] 183 | git-tree-sha1 = "b19534d1895d702889b219c382a6e18010797f0b" 184 | uuid = "ffbed154-4ef7-542d-bbb7-c09d3a79fcae" 185 | version = "0.8.6" 186 | 187 | [[deps.DoubleFloats]] 188 | deps = ["GenericLinearAlgebra", "LinearAlgebra", "Polynomials", "Printf", "Quadmath", "Random", "Requires", "SpecialFunctions"] 189 | git-tree-sha1 = "70858638bb1b9acb83bc0a29fdb449891a71af84" 190 | uuid = "497a8b3b-efae-58df-a0af-a86822472b78" 191 | version = "1.1.26" 192 | 193 | [[deps.Downloads]] 194 | deps = ["ArgTools", "LibCURL", "NetworkOptions"] 195 | uuid = "f43a241f-c20a-4ad4-852c-f6b1247861c6" 196 | 197 | [[deps.Electron]] 198 | deps = ["Base64", "FilePaths", "JSON", "Pkg", "Sockets", "URIParser", "UUIDs"] 199 | git-tree-sha1 = "a53025d3eabe23659065b3c5bba7b4ffb1327aa0" 200 | uuid = "a1bb12fb-d4d1-54b4-b10a-ee7951ef7ad3" 201 | version = "3.1.2" 202 | 203 | [[deps.ExcelFiles]] 204 | deps = ["DataValues", "Dates", "ExcelReaders", "FileIO", "IterableTables", "IteratorInterfaceExtensions", "Printf", "PyCall", 
"TableShowUtils", "TableTraits", "TableTraitsUtils", "XLSX"] 205 | git-tree-sha1 = "f3e5f4279d77b74bf6aef2b53562f771cc5a0474" 206 | uuid = "89b67f3b-d1aa-5f6f-9ca4-282e8d98620d" 207 | version = "1.0.0" 208 | 209 | [[deps.ExcelReaders]] 210 | deps = ["DataValues", "Dates", "PyCall", "Test"] 211 | git-tree-sha1 = "6f9db420dd362bd5bcea3a0f6dabf8bda587fec3" 212 | uuid = "c04bee98-12a5-510c-87df-2a230cb6e075" 213 | version = "0.11.0" 214 | 215 | [[deps.ExprTools]] 216 | git-tree-sha1 = "56559bbef6ca5ea0c0818fa5c90320398a6fbf8d" 217 | uuid = "e2ba6199-217a-4e67-a87a-7c52f15ade04" 218 | version = "0.1.8" 219 | 220 | [[deps.EzXML]] 221 | deps = ["Printf", "XML2_jll"] 222 | git-tree-sha1 = "0fa3b52a04a4e210aeb1626def9c90df3ae65268" 223 | uuid = "8f5d6c58-4d21-5cfd-889c-e3ad7ee6a615" 224 | version = "1.1.0" 225 | 226 | [[deps.FeatherFiles]] 227 | deps = ["Arrow", "DataValues", "FeatherLib", "FileIO", "IterableTables", "IteratorInterfaceExtensions", "TableShowUtils", "TableTraits", "TableTraitsUtils", "Test"] 228 | git-tree-sha1 = "a2f2b57b23be259d7839bebae2b8f7bba4851a9b" 229 | uuid = "b675d258-116a-5741-b937-b79f054b0542" 230 | version = "0.8.1" 231 | 232 | [[deps.FeatherLib]] 233 | deps = ["Arrow", "CategoricalArrays", "Dates", "FlatBuffers", "Mmap", "Random"] 234 | git-tree-sha1 = "a3d0c5ca2f08bc8fae4394775f371f8e032149ab" 235 | uuid = "409f5150-fb84-534f-94db-80d1e10f57e1" 236 | version = "0.2.0" 237 | 238 | [[deps.FileIO]] 239 | deps = ["Pkg", "Requires", "UUIDs"] 240 | git-tree-sha1 = "67551df041955cc6ee2ed098718c8fcd7fc7aebe" 241 | uuid = "5789e2e9-d7fb-5bc7-8068-2c6fae9b9549" 242 | version = "1.12.0" 243 | 244 | [[deps.FilePaths]] 245 | deps = ["FilePathsBase", "MacroTools", "Reexport", "Requires"] 246 | git-tree-sha1 = "919d9412dbf53a2e6fe74af62a73ceed0bce0629" 247 | uuid = "8fc22ac5-c921-52a6-82fd-178b2807b824" 248 | version = "0.8.3" 249 | 250 | [[deps.FilePathsBase]] 251 | deps = ["Compat", "Dates", "Mmap", "Printf", "Test", "UUIDs"] 252 | git-tree-sha1 = 
"04d13bfa8ef11720c24e4d840c0033d145537df7" 253 | uuid = "48062228-2e41-5def-b9a4-89aafe57970f" 254 | version = "0.9.17" 255 | 256 | [[deps.FillArrays]] 257 | deps = ["LinearAlgebra", "Random", "SparseArrays", "Statistics"] 258 | git-tree-sha1 = "8756f9935b7ccc9064c6eef0bff0ad643df733a3" 259 | uuid = "1a297f60-69ca-5386-bcde-b61e274b549b" 260 | version = "0.12.7" 261 | 262 | [[deps.FlatBuffers]] 263 | deps = ["Parameters", "Test"] 264 | git-tree-sha1 = "8582924ac52011d08da9cf1e67f13a71dbbc2594" 265 | uuid = "53afe959-3a16-52fa-a8da-cf864710bae9" 266 | version = "0.5.4" 267 | 268 | [[deps.Formatting]] 269 | deps = ["Printf"] 270 | git-tree-sha1 = "8339d61043228fdd3eb658d86c926cb282ae72a8" 271 | uuid = "59287772-0a20-5a39-b81b-1366585eb4c0" 272 | version = "0.4.2" 273 | 274 | [[deps.Future]] 275 | deps = ["Random"] 276 | uuid = "9fa8497b-333b-5362-9e8d-4d0656e87820" 277 | 278 | [[deps.GLM]] 279 | deps = ["Distributions", "LinearAlgebra", "Printf", "Reexport", "SparseArrays", "SpecialFunctions", "Statistics", "StatsBase", "StatsFuns", "StatsModels"] 280 | git-tree-sha1 = "fb764dacfa30f948d52a6a4269ae293a479bbc62" 281 | uuid = "38e38edf-8417-5370-95a0-9cbb8c7f171a" 282 | version = "1.6.1" 283 | 284 | [[deps.GMT]] 285 | deps = ["Conda", "Dates", "Pkg", "Printf", "Statistics"] 286 | git-tree-sha1 = "d1068159f18828ec1831efd34f1880f84a792b98" 287 | uuid = "5752ebe1-31b9-557e-87aa-f909b540aa54" 288 | version = "0.40.1" 289 | 290 | [[deps.GenericLinearAlgebra]] 291 | deps = ["LinearAlgebra", "Printf", "Random"] 292 | git-tree-sha1 = "ac44f4f51ffee9ff1ea50bd3fbb5677ea568d33d" 293 | uuid = "14197337-ba66-59df-a3e3-ca00e7dcff7a" 294 | version = "0.2.7" 295 | 296 | [[deps.HTTP]] 297 | deps = ["Base64", "Dates", "IniFile", "Logging", "MbedTLS", "NetworkOptions", "Sockets", "URIs"] 298 | git-tree-sha1 = "0fa77022fe4b511826b39c894c90daf5fce3334a" 299 | uuid = "cd3eb016-35fb-5094-929b-558a96fad6f3" 300 | version = "0.9.17" 301 | 302 | [[deps.Highlights]] 303 | deps = 
["DocStringExtensions", "InteractiveUtils", "REPL"] 304 | git-tree-sha1 = "f823a2d04fb233d52812c8024a6d46d9581904a4" 305 | uuid = "eafb193a-b7ab-5a9e-9068-77385905fa72" 306 | version = "0.4.5" 307 | 308 | [[deps.IniFile]] 309 | deps = ["Test"] 310 | git-tree-sha1 = "098e4d2c533924c921f9f9847274f2ad89e018b8" 311 | uuid = "83e8ac13-25f8-5344-8a64-a9f2b223428f" 312 | version = "0.5.0" 313 | 314 | [[deps.InlineStrings]] 315 | deps = ["Parsers"] 316 | git-tree-sha1 = "61feba885fac3a407465726d0c330b3055df897f" 317 | uuid = "842dd82b-1e85-43dc-bf29-5d0ee9dffc48" 318 | version = "1.1.2" 319 | 320 | [[deps.InteractiveUtils]] 321 | deps = ["Markdown"] 322 | uuid = "b77e0a4c-d291-57a0-90e8-8db25a27a240" 323 | 324 | [[deps.Intervals]] 325 | deps = ["Dates", "Printf", "RecipesBase", "Serialization", "TimeZones"] 326 | git-tree-sha1 = "323a38ed1952d30586d0fe03412cde9399d3618b" 327 | uuid = "d8418881-c3e1-53bb-8760-2df7ec849ed5" 328 | version = "1.5.0" 329 | 330 | [[deps.InverseFunctions]] 331 | deps = ["Test"] 332 | git-tree-sha1 = "a7254c0acd8e62f1ac75ad24d5db43f5f19f3c65" 333 | uuid = "3587e190-3f89-42d0-90ee-14403ec27112" 334 | version = "0.1.2" 335 | 336 | [[deps.InvertedIndices]] 337 | git-tree-sha1 = "bee5f1ef5bf65df56bdd2e40447590b272a5471f" 338 | uuid = "41ab1584-1d38-5bbf-9106-f11c6c58b48f" 339 | version = "1.1.0" 340 | 341 | [[deps.IrrationalConstants]] 342 | git-tree-sha1 = "7fd44fd4ff43fc60815f8e764c0f352b83c49151" 343 | uuid = "92d709cd-6900-40b7-9082-c6be49f344b6" 344 | version = "0.1.1" 345 | 346 | [[deps.IterableTables]] 347 | deps = ["DataValues", "IteratorInterfaceExtensions", "Requires", "TableTraits", "TableTraitsUtils"] 348 | git-tree-sha1 = "70300b876b2cebde43ebc0df42bc8c94a144e1b4" 349 | uuid = "1c8ee90f-4401-5389-894e-7a04a3dc0f4d" 350 | version = "1.0.0" 351 | 352 | [[deps.IteratorInterfaceExtensions]] 353 | git-tree-sha1 = "a3f24677c21f5bbe9d2a714f95dcd58337fb2856" 354 | uuid = "82899510-4779-5014-852e-03e436cf321d" 355 | version = "1.0.0" 356 | 357 | 
[[deps.JLLWrappers]] 358 | deps = ["Preferences"] 359 | git-tree-sha1 = "abc9885a7ca2052a736a600f7fa66209f96506e1" 360 | uuid = "692b3bcd-3c85-4b1f-b108-f13ce0eb3210" 361 | version = "1.4.1" 362 | 363 | [[deps.JSON]] 364 | deps = ["Dates", "Mmap", "Parsers", "Unicode"] 365 | git-tree-sha1 = "8076680b162ada2a031f707ac7b4953e30667a37" 366 | uuid = "682c06a0-de6a-54ab-a142-c8b1cf79cde6" 367 | version = "0.21.2" 368 | 369 | [[deps.JSONSchema]] 370 | deps = ["HTTP", "JSON", "URIs"] 371 | git-tree-sha1 = "2f49f7f86762a0fbbeef84912265a1ae61c4ef80" 372 | uuid = "7d188eb4-7ad8-530c-ae41-71a32a6d4692" 373 | version = "0.3.4" 374 | 375 | [[deps.LazyArtifacts]] 376 | deps = ["Artifacts", "Pkg"] 377 | uuid = "4af54fe1-eca0-43a8-85a7-787d91b784e3" 378 | 379 | [[deps.LibCURL]] 380 | deps = ["LibCURL_jll", "MozillaCACerts_jll"] 381 | uuid = "b27032c2-a3e7-50c8-80cd-2d36dbcbfd21" 382 | 383 | [[deps.LibCURL_jll]] 384 | deps = ["Artifacts", "LibSSH2_jll", "Libdl", "MbedTLS_jll", "Zlib_jll", "nghttp2_jll"] 385 | uuid = "deac9b47-8bc7-5906-a0fe-35ac56dc84c0" 386 | 387 | [[deps.LibGit2]] 388 | deps = ["Base64", "NetworkOptions", "Printf", "SHA"] 389 | uuid = "76f85450-5226-5b5a-8eaa-529ad045b433" 390 | 391 | [[deps.LibSSH2_jll]] 392 | deps = ["Artifacts", "Libdl", "MbedTLS_jll"] 393 | uuid = "29816b5a-b9ab-546f-933c-edad1886dfa8" 394 | 395 | [[deps.Libdl]] 396 | uuid = "8f399da3-3557-5675-b5ff-fb832c97cbdb" 397 | 398 | [[deps.Libiconv_jll]] 399 | deps = ["Artifacts", "JLLWrappers", "Libdl", "Pkg"] 400 | git-tree-sha1 = "42b62845d70a619f063a7da093d995ec8e15e778" 401 | uuid = "94ce4f54-9a6c-5748-9c1c-f9c7231a4531" 402 | version = "1.16.1+1" 403 | 404 | [[deps.LinearAlgebra]] 405 | deps = ["Libdl", "libblastrampoline_jll"] 406 | uuid = "37e2e46d-f89d-539d-b4ee-838fcccc9c8e" 407 | 408 | [[deps.LogExpFunctions]] 409 | deps = ["ChainRulesCore", "ChangesOfVariables", "DocStringExtensions", "InverseFunctions", "IrrationalConstants", "LinearAlgebra"] 410 | git-tree-sha1 = 
"e5718a00af0ab9756305a0392832c8952c7426c1" 411 | uuid = "2ab3a3ac-af41-5b50-aa03-7779005ae688" 412 | version = "0.3.6" 413 | 414 | [[deps.Logging]] 415 | uuid = "56ddb016-857b-54e1-b83d-db4d58db5568" 416 | 417 | [[deps.MacroTools]] 418 | deps = ["Markdown", "Random"] 419 | git-tree-sha1 = "3d3e902b31198a27340d0bf00d6ac452866021cf" 420 | uuid = "1914dd2f-81c6-5fcd-8719-6d5c9610ff09" 421 | version = "0.5.9" 422 | 423 | [[deps.Markdown]] 424 | deps = ["Base64"] 425 | uuid = "d6f4376e-aef5-505a-96c1-9c027394607a" 426 | 427 | [[deps.MbedTLS]] 428 | deps = ["Dates", "MbedTLS_jll", "Random", "Sockets"] 429 | git-tree-sha1 = "1c38e51c3d08ef2278062ebceade0e46cefc96fe" 430 | uuid = "739be429-bea8-5141-9913-cc70e7f3736d" 431 | version = "1.0.3" 432 | 433 | [[deps.MbedTLS_jll]] 434 | deps = ["Artifacts", "Libdl"] 435 | uuid = "c8ffd9c3-330d-5841-b78e-0817d7145fa1" 436 | 437 | [[deps.MemPool]] 438 | deps = ["DataStructures", "Distributed", "Mmap", "Random", "Serialization", "Sockets", "Test"] 439 | git-tree-sha1 = "d52799152697059353a8eac1000d32ba8d92aa25" 440 | uuid = "f9f48841-c794-520a-933b-121f7ba6ed94" 441 | version = "0.2.0" 442 | 443 | [[deps.Missings]] 444 | deps = ["DataAPI"] 445 | git-tree-sha1 = "f8c673ccc215eb50fcadb285f522420e29e69e1c" 446 | uuid = "e1d29d7a-bbdc-5cf2-9ac0-f12de2c33e28" 447 | version = "0.4.5" 448 | 449 | [[deps.Mmap]] 450 | uuid = "a63ad114-7e13-5084-954f-fe012c677804" 451 | 452 | [[deps.Mocking]] 453 | deps = ["Compat", "ExprTools"] 454 | git-tree-sha1 = "29714d0a7a8083bba8427a4fbfb00a540c681ce7" 455 | uuid = "78c3b35d-d492-501b-9361-3d52fe80e533" 456 | version = "0.7.3" 457 | 458 | [[deps.MozillaCACerts_jll]] 459 | uuid = "14a3606d-f60d-562e-9121-12d972cd8159" 460 | 461 | [[deps.Mustache]] 462 | deps = ["Printf", "Tables"] 463 | git-tree-sha1 = "21d7a05c3b94bcf45af67beccab4f2a1f4a3c30a" 464 | uuid = "ffc61752-8dc7-55ee-8c37-f3e9cdd09e70" 465 | version = "1.0.12" 466 | 467 | [[deps.MutableArithmetics]] 468 | deps = ["LinearAlgebra", 
"SparseArrays", "Test"] 469 | git-tree-sha1 = "73deac2cbae0820f43971fad6c08f6c4f2784ff2" 470 | uuid = "d8a4904e-b15c-11e9-3269-09a3773c0cb0" 471 | version = "0.3.2" 472 | 473 | [[deps.NetworkOptions]] 474 | uuid = "ca575930-c2e3-43a9-ace4-1e988b2c1908" 475 | 476 | [[deps.NodeJS]] 477 | deps = ["Pkg"] 478 | git-tree-sha1 = "905224bbdd4b555c69bb964514cfa387616f0d3a" 479 | uuid = "2bd173c7-0d6d-553b-b6af-13a54713934c" 480 | version = "1.3.0" 481 | 482 | [[deps.Nullables]] 483 | git-tree-sha1 = "8f87854cc8f3685a60689d8edecaa29d2251979b" 484 | uuid = "4d1e1d77-625e-5b40-9113-a560ec7a8ecd" 485 | version = "1.0.0" 486 | 487 | [[deps.OpenBLAS_jll]] 488 | deps = ["Artifacts", "CompilerSupportLibraries_jll", "Libdl"] 489 | uuid = "4536629a-c528-5b80-bd46-f80d51c5b363" 490 | 491 | [[deps.OpenLibm_jll]] 492 | deps = ["Artifacts", "Libdl"] 493 | uuid = "05823500-19ac-5b8b-9628-191a04bc5112" 494 | 495 | [[deps.OpenSpecFun_jll]] 496 | deps = ["Artifacts", "CompilerSupportLibraries_jll", "JLLWrappers", "Libdl", "Pkg"] 497 | git-tree-sha1 = "13652491f6856acfd2db29360e1bbcd4565d04f1" 498 | uuid = "efe28fd5-8261-553b-a9e1-b2916fc3738e" 499 | version = "0.5.5+0" 500 | 501 | [[deps.OrderedCollections]] 502 | git-tree-sha1 = "85f8e6578bf1f9ee0d11e7bb1b1456435479d47c" 503 | uuid = "bac558e1-5e72-5ebc-8fee-abe8a469f55d" 504 | version = "1.4.1" 505 | 506 | [[deps.PDMats]] 507 | deps = ["LinearAlgebra", "SparseArrays", "SuiteSparse"] 508 | git-tree-sha1 = "ee26b350276c51697c9c2d88a072b339f9f03d73" 509 | uuid = "90014a1f-27ba-587c-ab20-58faa44d9150" 510 | version = "0.11.5" 511 | 512 | [[deps.Parameters]] 513 | deps = ["OrderedCollections", "UnPack"] 514 | git-tree-sha1 = "34c0e9ad262e5f7fc75b10a9952ca7692cfc5fbe" 515 | uuid = "d96e819e-fc66-5662-9728-84c9c7592b0a" 516 | version = "0.12.3" 517 | 518 | [[deps.Parquet]] 519 | deps = ["CodecZlib", "CodecZstd", "Dates", "MemPool", "ProtoBuf", "Snappy", "Thrift"] 520 | git-tree-sha1 = "3dc3ed38c932f5e00d75a5af354438c6b80d973d" 521 | uuid = 
"626c502c-15b0-58ad-a749-f091afb673ae" 522 | version = "0.4.0" 523 | 524 | [[deps.ParquetFiles]] 525 | deps = ["DataValues", "FileIO", "IterableTables", "IteratorInterfaceExtensions", "Parquet", "TableShowUtils", "TableTraits", "Test"] 526 | git-tree-sha1 = "7b4414214f41e2ae7844ea827bfd4ec7ae71e749" 527 | uuid = "46a55296-af5a-53b0-aaa0-97023b66127f" 528 | version = "0.2.0" 529 | 530 | [[deps.Parsers]] 531 | deps = ["Dates"] 532 | git-tree-sha1 = "0b5cfbb704034b5b4c1869e36634438a047df065" 533 | uuid = "69de0a69-1ddd-5017-9359-2bf0b02dc9f0" 534 | version = "2.2.1" 535 | 536 | [[deps.Pkg]] 537 | deps = ["Artifacts", "Dates", "Downloads", "LibGit2", "Libdl", "Logging", "Markdown", "Printf", "REPL", "Random", "SHA", "Serialization", "TOML", "Tar", "UUIDs", "p7zip_jll"] 538 | uuid = "44cfe95a-1eb2-52ea-b672-e2afdf69b78f" 539 | 540 | [[deps.Polynomials]] 541 | deps = ["Intervals", "LinearAlgebra", "MutableArithmetics", "RecipesBase"] 542 | git-tree-sha1 = "f184bc53e9add8c737e50fa82885bc3f7d70f628" 543 | uuid = "f27b6e38-b328-58d1-80ce-0feddd5e7a45" 544 | version = "2.0.24" 545 | 546 | [[deps.PooledArrays]] 547 | deps = ["DataAPI", "Future"] 548 | git-tree-sha1 = "db3a23166af8aebf4db5ef87ac5b00d36eb771e2" 549 | uuid = "2dfb63ee-cc39-5dd5-95bd-886bf059d720" 550 | version = "1.4.0" 551 | 552 | [[deps.Preferences]] 553 | deps = ["TOML"] 554 | git-tree-sha1 = "2cf929d64681236a2e074ffafb8d568733d2e6af" 555 | uuid = "21216c6a-2e73-6563-6e65-726566657250" 556 | version = "1.2.3" 557 | 558 | [[deps.PrettyTables]] 559 | deps = ["Crayons", "Formatting", "Markdown", "Reexport", "Tables"] 560 | git-tree-sha1 = "dfb54c4e414caa595a1f2ed759b160f5a3ddcba5" 561 | uuid = "08abe8d2-0d0c-5749-adfa-8a2ac140af0d" 562 | version = "1.3.1" 563 | 564 | [[deps.Printf]] 565 | deps = ["Unicode"] 566 | uuid = "de0858da-6303-5e67-8744-51eddeeeb8d7" 567 | 568 | [[deps.ProtoBuf]] 569 | git-tree-sha1 = "51b74991da46594fb411a715e7e092bef50b99ff" 570 | uuid = "3349acd9-ac6a-5e09-bcdb-63829b23a429" 571 | 
version = "0.8.0" 572 | 573 | [[deps.PyCall]] 574 | deps = ["Conda", "Dates", "Libdl", "LinearAlgebra", "MacroTools", "Serialization", "VersionParsing"] 575 | git-tree-sha1 = "71fd4022ecd0c6d20180e23ff1b3e05a143959c2" 576 | uuid = "438e738f-606a-5dbb-bf0a-cddfbfd45ab0" 577 | version = "1.93.0" 578 | 579 | [[deps.QuadGK]] 580 | deps = ["DataStructures", "LinearAlgebra"] 581 | git-tree-sha1 = "78aadffb3efd2155af139781b8a8df1ef279ea39" 582 | uuid = "1fd47b50-473d-5c70-9696-f719f8f3bcdc" 583 | version = "2.4.2" 584 | 585 | [[deps.Quadmath]] 586 | deps = ["Printf", "Random", "Requires"] 587 | git-tree-sha1 = "5a8f74af8eae654086a1d058b4ec94ff192e3de0" 588 | uuid = "be4d8f0f-7fa4-5f49-b795-2f01399ab2dd" 589 | version = "0.5.5" 590 | 591 | [[deps.Query]] 592 | deps = ["DataValues", "IterableTables", "MacroTools", "QueryOperators", "Statistics"] 593 | git-tree-sha1 = "a66aa7ca6f5c29f0e303ccef5c8bd55067df9bbe" 594 | uuid = "1a8c2f83-1ff3-5112-b086-8aa67b057ba1" 595 | version = "1.0.0" 596 | 597 | [[deps.QueryOperators]] 598 | deps = ["DataStructures", "DataValues", "IteratorInterfaceExtensions", "TableShowUtils"] 599 | git-tree-sha1 = "911c64c204e7ecabfd1872eb93c49b4e7c701f02" 600 | uuid = "2aef5ad7-51ca-5a8f-8e88-e75cf067b44b" 601 | version = "0.9.3" 602 | 603 | [[deps.Queryverse]] 604 | deps = ["CSVFiles", "DataFrames", "DataTables", "DataValues", "DataVoyager", "ExcelFiles", "FeatherFiles", "FileIO", "IterableTables", "ParquetFiles", "Query", "Reexport", "StatFiles", "VegaLite"] 605 | git-tree-sha1 = "c9654374d9c5bd053c3f286b4c41a0f2b3fe161e" 606 | uuid = "612083be-0b0f-5412-89c1-4e7c75506a58" 607 | version = "0.7.0" 608 | 609 | [[deps.REPL]] 610 | deps = ["InteractiveUtils", "Markdown", "Sockets", "Unicode"] 611 | uuid = "3fa0cd96-eef1-5676-8a61-b3b8758bbffb" 612 | 613 | [[deps.Random]] 614 | deps = ["SHA", "Serialization"] 615 | uuid = "9a3f8284-a2c9-5f02-9a11-845980a1fd5c" 616 | 617 | [[deps.ReadOnlyArrays]] 618 | deps = ["SparseArrays", "Test"] 619 | git-tree-sha1 = 
"65f17072a35c2be7ac8941aeeae489013212e71f" 620 | uuid = "988b38a3-91fc-5605-94a2-ee2116b3bd83" 621 | version = "0.1.1" 622 | 623 | [[deps.ReadStat]] 624 | deps = ["DataValues", "Dates", "ReadStat_jll"] 625 | git-tree-sha1 = "f8652515b68572d3362ee38e32245249413fb2d7" 626 | uuid = "d71aba96-b539-5138-91ee-935c3ee1374c" 627 | version = "1.1.1" 628 | 629 | [[deps.ReadStat_jll]] 630 | deps = ["Artifacts", "JLLWrappers", "Libdl", "Libiconv_jll", "Pkg", "Zlib_jll"] 631 | git-tree-sha1 = "afd287b1031406b3ec5d835a60b388ceb041bb63" 632 | uuid = "a4dc8951-f1cc-5499-9034-9ec1c3e64557" 633 | version = "1.1.5+0" 634 | 635 | [[deps.RecipesBase]] 636 | git-tree-sha1 = "6bf3f380ff52ce0832ddd3a2a7b9538ed1bcca7d" 637 | uuid = "3cdcf5f2-1ef4-517c-9805-6587b60abb01" 638 | version = "1.2.1" 639 | 640 | [[deps.Reexport]] 641 | git-tree-sha1 = "45e428421666073eab6f2da5c9d310d99bb12f9b" 642 | uuid = "189a3867-3050-52da-a836-e630ba90ab69" 643 | version = "1.2.2" 644 | 645 | [[deps.RelocatableFolders]] 646 | deps = ["SHA", "Scratch"] 647 | git-tree-sha1 = "cdbd3b1338c72ce29d9584fdbe9e9b70eeb5adca" 648 | uuid = "05181044-ff0b-4ac5-8273-598c1e38db00" 649 | version = "0.1.3" 650 | 651 | [[deps.Requires]] 652 | deps = ["UUIDs"] 653 | git-tree-sha1 = "838a3a4188e2ded87a4f9f184b4b0d78a1e91cb7" 654 | uuid = "ae029012-a4dd-5104-9daa-d747884805df" 655 | version = "1.3.0" 656 | 657 | [[deps.Rmath]] 658 | deps = ["Random", "Rmath_jll"] 659 | git-tree-sha1 = "bf3188feca147ce108c76ad82c2792c57abe7b1f" 660 | uuid = "79098fc4-a85e-5d69-aa6a-4863f24498fa" 661 | version = "0.7.0" 662 | 663 | [[deps.Rmath_jll]] 664 | deps = ["Artifacts", "JLLWrappers", "Libdl", "Pkg"] 665 | git-tree-sha1 = "68db32dff12bb6127bac73c209881191bf0efbb7" 666 | uuid = "f50d1b31-88e8-58de-be2c-1cc44531875f" 667 | version = "0.3.0+0" 668 | 669 | [[deps.SHA]] 670 | uuid = "ea8e919c-243c-51af-8825-aaa63cd721ce" 671 | 672 | [[deps.SQLite]] 673 | deps = ["BinaryProvider", "DBInterface", "Dates", "Libdl", "Random", "SQLite_jll", 
"Serialization", "Tables", "Test", "WeakRefStrings"] 674 | git-tree-sha1 = "8e14d9b200b975e93a0ae0e5d17dea1c262690ee" 675 | uuid = "0aa819cd-b072-5ff4-a722-6bc24af294d9" 676 | version = "1.4.0" 677 | 678 | [[deps.SQLite_jll]] 679 | deps = ["Artifacts", "JLLWrappers", "Libdl", "Pkg", "Zlib_jll"] 680 | git-tree-sha1 = "cca82caa0b6bf7f0bc977e69063c0cf5d7da36e5" 681 | uuid = "76ed43ae-9a5d-5a62-8c75-30186b810ce8" 682 | version = "3.37.0+0" 683 | 684 | [[deps.Scratch]] 685 | deps = ["Dates"] 686 | git-tree-sha1 = "0b4b7f1393cff97c33891da2a0bf69c6ed241fda" 687 | uuid = "6c6a2e73-6563-6170-7368-637461726353" 688 | version = "1.1.0" 689 | 690 | [[deps.SentinelArrays]] 691 | deps = ["Dates", "Random"] 692 | git-tree-sha1 = "15dfe6b103c2a993be24404124b8791a09460983" 693 | uuid = "91c51154-3ec4-41a3-a24f-3f23e20d615c" 694 | version = "1.3.11" 695 | 696 | [[deps.Serialization]] 697 | uuid = "9e88b42a-f829-5b0c-bbe9-9e923198166b" 698 | 699 | [[deps.Setfield]] 700 | deps = ["ConstructionBase", "Future", "MacroTools", "Requires"] 701 | git-tree-sha1 = "fca29e68c5062722b5b4435594c3d1ba557072a3" 702 | uuid = "efcf1570-3423-57d1-acb7-fd33fddbac46" 703 | version = "0.7.1" 704 | 705 | [[deps.SharedArrays]] 706 | deps = ["Distributed", "Mmap", "Random", "Serialization"] 707 | uuid = "1a1011a3-84de-559e-8e89-a11a2f7dc383" 708 | 709 | [[deps.ShiftedArrays]] 710 | git-tree-sha1 = "22395afdcf37d6709a5a0766cc4a5ca52cb85ea0" 711 | uuid = "1277b4bf-5013-50f5-be3d-901d8477a67a" 712 | version = "1.0.0" 713 | 714 | [[deps.Snappy]] 715 | deps = ["BinaryProvider", "Libdl", "Random", "Test"] 716 | git-tree-sha1 = "25620a91907972a05863941d6028791c2613888e" 717 | uuid = "59d4ed8c-697a-5b28-a4c7-fe95c22820f9" 718 | version = "0.3.0" 719 | 720 | [[deps.Sockets]] 721 | uuid = "6462fe0b-24de-5631-8697-dd941f90decc" 722 | 723 | [[deps.SortingAlgorithms]] 724 | deps = ["DataStructures"] 725 | git-tree-sha1 = "b3363d7460f7d098ca0912c69b082f75625d7508" 726 | uuid = "a2af1166-a08f-5f64-846c-94a0d3cef48c" 727 
| version = "1.0.1" 728 | 729 | [[deps.SparseArrays]] 730 | deps = ["LinearAlgebra", "Random"] 731 | uuid = "2f01184e-e22b-5df5-ae63-d93ebab69eaf" 732 | 733 | [[deps.SpecialFunctions]] 734 | deps = ["ChainRulesCore", "IrrationalConstants", "LogExpFunctions", "OpenLibm_jll", "OpenSpecFun_jll"] 735 | git-tree-sha1 = "a4116accb1c84f0a8e1b9932d873654942b2364b" 736 | uuid = "276daf66-3868-5448-9aa4-cd146d93841b" 737 | version = "2.1.1" 738 | 739 | [[deps.StatFiles]] 740 | deps = ["DataValues", "FileIO", "IterableTables", "IteratorInterfaceExtensions", "ReadStat", "TableShowUtils", "TableTraits", "TableTraitsUtils", "Test"] 741 | git-tree-sha1 = "28466ea10caec61c476a262172319d2edf248187" 742 | uuid = "1463e38c-9381-5320-bcd4-4134955f093a" 743 | version = "0.8.0" 744 | 745 | [[deps.Statistics]] 746 | deps = ["LinearAlgebra", "SparseArrays"] 747 | uuid = "10745b16-79ce-11e8-11f9-7d13ad32a3b2" 748 | 749 | [[deps.StatsAPI]] 750 | git-tree-sha1 = "d88665adc9bcf45903013af0982e2fd05ae3d0a6" 751 | uuid = "82ae8749-77ed-4fe6-ae5f-f523153014b0" 752 | version = "1.2.0" 753 | 754 | [[deps.StatsBase]] 755 | deps = ["DataAPI", "DataStructures", "LinearAlgebra", "LogExpFunctions", "Missings", "Printf", "Random", "SortingAlgorithms", "SparseArrays", "Statistics", "StatsAPI"] 756 | git-tree-sha1 = "51383f2d367eb3b444c961d485c565e4c0cf4ba0" 757 | uuid = "2913bbd2-ae8a-5f71-8c99-4fb6c76f3a91" 758 | version = "0.33.14" 759 | 760 | [[deps.StatsFuns]] 761 | deps = ["ChainRulesCore", "InverseFunctions", "IrrationalConstants", "LogExpFunctions", "Reexport", "Rmath", "SpecialFunctions"] 762 | git-tree-sha1 = "f35e1879a71cca95f4826a14cdbf0b9e253ed918" 763 | uuid = "4c63d2b9-4356-54db-8cca-17b64c39e42c" 764 | version = "0.9.15" 765 | 766 | [[deps.StatsModels]] 767 | deps = ["DataAPI", "DataStructures", "LinearAlgebra", "Printf", "REPL", "ShiftedArrays", "SparseArrays", "StatsBase", "StatsFuns", "Tables"] 768 | git-tree-sha1 = "677488c295051568b0b79a77a8c44aa86e78b359" 769 | uuid = 
"3eaba693-59b7-5ba5-a881-562e759f1c8d" 770 | version = "0.6.28" 771 | 772 | [[deps.StringEncodings]] 773 | deps = ["Libiconv_jll"] 774 | git-tree-sha1 = "50ccd5ddb00d19392577902f0079267a72c5ab04" 775 | uuid = "69024149-9ee7-55f6-a4c4-859efe599b68" 776 | version = "0.3.5" 777 | 778 | [[deps.StructTypes]] 779 | deps = ["Dates", "UUIDs"] 780 | git-tree-sha1 = "d24a825a95a6d98c385001212dc9020d609f2d4f" 781 | uuid = "856f2bd8-1eba-4b0a-8007-ebc267875bd4" 782 | version = "1.8.1" 783 | 784 | [[deps.SuiteSparse]] 785 | deps = ["Libdl", "LinearAlgebra", "Serialization", "SparseArrays"] 786 | uuid = "4607b0f0-06f3-5cda-b6b1-a6196a1729e9" 787 | 788 | [[deps.TOML]] 789 | deps = ["Dates"] 790 | uuid = "fa267f1f-6049-4f14-aa54-33bafae1ed76" 791 | 792 | [[deps.TableShowUtils]] 793 | deps = ["DataValues", "Dates", "JSON", "Markdown", "Test"] 794 | git-tree-sha1 = "14c54e1e96431fb87f0d2f5983f090f1b9d06457" 795 | uuid = "5e66a065-1f0a-5976-b372-e0b8c017ca10" 796 | version = "0.2.5" 797 | 798 | [[deps.TableTraits]] 799 | deps = ["IteratorInterfaceExtensions"] 800 | git-tree-sha1 = "c06b2f539df1c6efa794486abfb6ed2022561a39" 801 | uuid = "3783bdb8-4a98-5b6b-af9a-565f29a5fe9c" 802 | version = "1.0.1" 803 | 804 | [[deps.TableTraitsUtils]] 805 | deps = ["DataValues", "IteratorInterfaceExtensions", "Missings", "TableTraits"] 806 | git-tree-sha1 = "78fecfe140d7abb480b53a44f3f85b6aa373c293" 807 | uuid = "382cd787-c1b6-5bf2-a167-d5b971a19bda" 808 | version = "1.0.2" 809 | 810 | [[deps.Tables]] 811 | deps = ["DataAPI", "DataValueInterfaces", "IteratorInterfaceExtensions", "LinearAlgebra", "TableTraits", "Test"] 812 | git-tree-sha1 = "bb1064c9a84c52e277f1096cf41434b675cd368b" 813 | uuid = "bd369af6-aec1-5ad0-b16a-f7cc5008161c" 814 | version = "1.6.1" 815 | 816 | [[deps.Tar]] 817 | deps = ["ArgTools", "SHA"] 818 | uuid = "a4e569a6-e804-4fa4-b0f3-eef7a1d5b13e" 819 | 820 | [[deps.Test]] 821 | deps = ["InteractiveUtils", "Logging", "Random", "Serialization"] 822 | uuid = 
"8dfed614-e22c-5e08-85e1-65c5234f0b40" 823 | 824 | [[deps.TextParse]] 825 | deps = ["CodecZlib", "DataStructures", "Dates", "DoubleFloats", "Mmap", "Nullables", "WeakRefStrings"] 826 | git-tree-sha1 = "eb1f4fb185c8644faa2d18d14c72f2c24412415f" 827 | uuid = "e0df1984-e451-5cb5-8b61-797a481e67e3" 828 | version = "1.0.2" 829 | 830 | [[deps.Thrift]] 831 | deps = ["BinaryProvider", "Distributed", "Sockets"] 832 | git-tree-sha1 = "c3dd01c6067985a77fef761839203838ac12825b" 833 | uuid = "8d9c9c80-f77e-5080-9541-c6f69d204e22" 834 | version = "0.6.2" 835 | 836 | [[deps.TimeZones]] 837 | deps = ["Dates", "Downloads", "InlineStrings", "LazyArtifacts", "Mocking", "Printf", "RecipesBase", "Serialization", "Unicode"] 838 | git-tree-sha1 = "0f1017f68dc25f1a0cb99f4988f78fe4f2e7955f" 839 | uuid = "f269a46b-ccf7-5d73-abea-4c690281aa53" 840 | version = "1.7.1" 841 | 842 | [[deps.TranscodingStreams]] 843 | deps = ["Random", "Test"] 844 | git-tree-sha1 = "216b95ea110b5972db65aa90f88d8d89dcb8851c" 845 | uuid = "3bb67fe8-82b1-5028-8e26-92a6c54297fa" 846 | version = "0.9.6" 847 | 848 | [[deps.URIParser]] 849 | deps = ["Unicode"] 850 | git-tree-sha1 = "53a9f49546b8d2dd2e688d216421d050c9a31d0d" 851 | uuid = "30578b45-9adc-5946-b283-645ec420af67" 852 | version = "0.4.1" 853 | 854 | [[deps.URIs]] 855 | git-tree-sha1 = "97bbe755a53fe859669cd907f2d96aee8d2c1355" 856 | uuid = "5c2747f8-b7ea-4ff2-ba2e-563bfd36b1d4" 857 | version = "1.3.0" 858 | 859 | [[deps.UUIDs]] 860 | deps = ["Random", "SHA"] 861 | uuid = "cf7118a7-6976-5b1a-9a39-7adc72f591a4" 862 | 863 | [[deps.UnPack]] 864 | git-tree-sha1 = "387c1f73762231e86e0c9c5443ce3b4a0a9a0c2b" 865 | uuid = "3a884ed6-31ef-47d7-9d2a-63182c4928ed" 866 | version = "1.0.2" 867 | 868 | [[deps.Unicode]] 869 | uuid = "4ec0a83e-493e-50e2-b9ac-8f72acf5a8f5" 870 | 871 | [[deps.Vega]] 872 | deps = ["DataStructures", "DataValues", "Dates", "FileIO", "FilePaths", "IteratorInterfaceExtensions", "JSON", "JSONSchema", "MacroTools", "NodeJS", "Pkg", "REPL", "Random", 
"Setfield", "TableTraits", "TableTraitsUtils", "URIParser"] 873 | git-tree-sha1 = "43f83d3119a868874d18da6bca0f4b5b6aae53f7" 874 | uuid = "239c3e63-733f-47ad-beb7-a12fde22c578" 875 | version = "2.3.0" 876 | 877 | [[deps.VegaLite]] 878 | deps = ["Base64", "DataStructures", "DataValues", "Dates", "FileIO", "FilePaths", "IteratorInterfaceExtensions", "JSON", "MacroTools", "NodeJS", "Pkg", "REPL", "Random", "TableTraits", "TableTraitsUtils", "URIParser", "Vega"] 879 | git-tree-sha1 = "3e23f28af36da21bfb4acef08b144f92ad205660" 880 | uuid = "112f6efa-9a02-5b7d-90c0-432ed331239a" 881 | version = "2.6.0" 882 | 883 | [[deps.VersionParsing]] 884 | git-tree-sha1 = "58d6e80b4ee071f5efd07fda82cb9fbe17200868" 885 | uuid = "81def892-9a0e-5fdd-b105-ffc91e053289" 886 | version = "1.3.0" 887 | 888 | [[deps.WeakRefStrings]] 889 | deps = ["DataAPI", "InlineStrings", "Parsers"] 890 | git-tree-sha1 = "c69f9da3ff2f4f02e811c3323c22e5dfcb584cfa" 891 | uuid = "ea10d353-3f73-51f8-a26c-33c1cb351aa5" 892 | version = "1.4.1" 893 | 894 | [[deps.Weave]] 895 | deps = ["Base64", "Dates", "Highlights", "JSON", "Markdown", "Mustache", "Pkg", "Printf", "REPL", "RelocatableFolders", "Requires", "Serialization", "YAML"] 896 | git-tree-sha1 = "d62575dcea5aeb2bfdfe3b382d145b65975b5265" 897 | uuid = "44d3d7a6-8a23-5bf8-98c5-b353f8df5ec9" 898 | version = "0.10.10" 899 | 900 | [[deps.XLSX]] 901 | deps = ["Dates", "EzXML", "Printf", "Tables", "ZipFile"] 902 | git-tree-sha1 = "96d05d01d6657583a22410e3ba416c75c72d6e1d" 903 | uuid = "fdbf4ff8-1666-58a4-91e7-1b58723a45e0" 904 | version = "0.7.8" 905 | 906 | [[deps.XML2_jll]] 907 | deps = ["Artifacts", "JLLWrappers", "Libdl", "Libiconv_jll", "Pkg", "Zlib_jll"] 908 | git-tree-sha1 = "1acf5bdf07aa0907e0a37d3718bb88d4b687b74a" 909 | uuid = "02c8fc9c-b97f-50b9-bbe4-9be30ff0a78a" 910 | version = "2.9.12+0" 911 | 912 | [[deps.YAML]] 913 | deps = ["Base64", "Dates", "Printf", "StringEncodings"] 914 | git-tree-sha1 = "3c6e8b9f5cdaaa21340f841653942e1a6b6561e5" 915 | uuid = 
"ddb6d928-2868-570f-bddf-ab3f9cf99eb6" 916 | version = "0.4.7" 917 | 918 | [[deps.ZipFile]] 919 | deps = ["Libdl", "Printf", "Zlib_jll"] 920 | git-tree-sha1 = "3593e69e469d2111389a9bd06bac1f3d730ac6de" 921 | uuid = "a5390f91-8eb1-5f08-bee0-b1d1ffed6cea" 922 | version = "0.9.4" 923 | 924 | [[deps.Zlib_jll]] 925 | deps = ["Libdl"] 926 | uuid = "83775a58-1f1d-513f-b197-d71354ab007a" 927 | 928 | [[deps.Zstd_jll]] 929 | deps = ["Artifacts", "JLLWrappers", "Libdl", "Pkg"] 930 | git-tree-sha1 = "cc4bf3fdde8b7e3e9fa0351bdeedba1cf3b7f6e6" 931 | uuid = "3161d3a3-bdf6-5164-811a-617609db77b4" 932 | version = "1.5.0+0" 933 | 934 | [[deps.libblastrampoline_jll]] 935 | deps = ["Artifacts", "Libdl", "OpenBLAS_jll"] 936 | uuid = "8e850b90-86db-534c-a0d3-1478176c7d93" 937 | 938 | [[deps.nghttp2_jll]] 939 | deps = ["Artifacts", "Libdl"] 940 | uuid = "8e850ede-7688-5339-a07c-302acd2aaf8d" 941 | 942 | [[deps.p7zip_jll]] 943 | deps = ["Artifacts", "Libdl"] 944 | uuid = "3f19e933-33d8-53b3-aaab-bd5110c3b7a0" 945 | -------------------------------------------------------------------------------- /Project.toml: -------------------------------------------------------------------------------- 1 | name = "JuliaDataTutorials" 2 | uuid = "33cdf1e7-ff7b-4b8e-aca7-7f04e2106471" 3 | authors = ["Daniel Lakeland "] 4 | version = "0.1.0" 5 | 6 | [deps] 7 | CSV = "336ed68f-0bac-5ca0-87d4-7b16caf5d00b" 8 | DataFrames = "a93c6f00-e57d-5684-b7b6-d8193f3e46c0" 9 | DataFramesMeta = "1313f7d8-7da2-5740-9ea0-a2ca25f37964" 10 | Dates = "ade2ca70-3891-5945-98fb-dc099432e06a" 11 | GLM = "38e38edf-8417-5370-95a0-9cbb8c7f171a" 12 | GMT = "5752ebe1-31b9-557e-87aa-f909b540aa54" 13 | HTTP = "cd3eb016-35fb-5094-929b-558a96fad6f3" 14 | Queryverse = "612083be-0b0f-5412-89c1-4e7c75506a58" 15 | SQLite = "0aa819cd-b072-5ff4-a722-6bc24af294d9" 16 | Weave = "44d3d7a6-8a23-5bf8-98c5-b353f8df5ec9" 17 | -------------------------------------------------------------------------------- /README.md: 
-------------------------------------------------------------------------------- 1 | # JuliaDataTutorials 2 | Tutorials For Data Analysis in Julia 3 | 4 | [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/dlakelan/JuliaDataTutorials/master) 5 | 6 | 7 | -------------------------------------------------------------------------------- /RegressionDiscontinuityQuestion.jmd: -------------------------------------------------------------------------------- 1 | # Another Regression Discontinuity Confusion 2 | 3 | Andrew Gelman gave an 4 | [example of a Regression Discontinuity Analysis](https://statmodeling.stat.columbia.edu/2020/07/02/no-i-dont-believe-that-claim-based-on-regression-discontinuity-analysis-that/) 5 | in which the original authors found evidence that their fits across the 6 | two regions x < 0 and x > 0 were significantly different, and 7 | concluded that losing a governor's election cut 5-10 years off your 8 | life. 9 | 10 | The relevant data was 11 | [available here](https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/IBKYRX) 12 | but I've downloaded it already and added it to the data directory as 13 | "longevity.csv". 14 | 15 | Here's the plan. We're going to discuss some mathematical concepts (for 16 | example, what it means to have a discontinuity), and then we're 17 | going to try to identify discontinuities and fast transitions in the 18 | given data, as well as in synthetic data where we know an appropriate 19 | signal is present. 20 | 21 | Let's start by loading the data and graphing the raw data: 22 | 23 | ```julia 24 | 25 | using Queryverse, Optim 26 | 27 | longdf = load("data/longevity.csv") |> DataFrame 28 | 29 | longdf |> @vlplot({:point,opacity=.1},x=:margin_pct_1,y=:living_day_post,width=700,height=700) 30 | 31 | ``` 32 | 33 | It would be hard to believe you could be elected *after* you died, or 34 | that you could live many more than 70 years after being elected.
Also 35 | it seems some elections had a 100% win margin... those are 36 | suspicious. So let's just filter that stuff out for ease of 37 | exposition. None of them seems to affect the question at hand. 38 | 39 | At the same time we'll convert the time-scale to years. 40 | 41 | 42 | ```julia 43 | 44 | daypyr = 365.2422 45 | 46 | cleandf = filter(x -> !ismissing(x.living_day_post) && !ismissing(x.margin_pct_1) && x.living_day_post > 0 && x.living_day_post < 70*daypyr && x.margin_pct_1 < 95,longdf) 47 | 48 | cleandf.yrpost = cleandf.living_day_post/daypyr 49 | 50 | 51 | cleandf |> @vlplot({:point,opacity=.1},x=:margin_pct_1,y=:yrpost,width=700,height=700) 52 | 53 | ``` 54 | 55 | ## The practical concept of discontinuity... 56 | 57 | All of the following functions are mathematically continuous: 58 | 59 | ```julia 60 | """ 61 | sigmoid(x,a,b) 62 | 63 | A rising sigmoid function whose transition occurs at a with scale b, 64 | evaluated in a numerically stable way so that we only ever 65 | exponentiate negative numbers. This is a generalized "inverse_logit" 66 | type function. 67 | 68 | """ 69 | function sigmoid(x,a,b) 70 | s = (x-a)/b; 71 | if(s > 0) 72 | return(1.0/(1.0+exp(-s))) 73 | else 74 | return(exp(s)/(1.0+exp(s))) 75 | end 76 | end 77 | 78 | 79 | 80 | 81 | contdf = DataFrame(x=collect(-1.0:.01:1.0)) 82 | contdf.y1 = map( x-> sigmoid(x,0.0,1.0),contdf.x) 83 | contdf.y2 = map( x-> sigmoid(x,0.0,.05),contdf.x) 84 | contdf.y3 = map( x-> sigmoid(x,0.0,.0001),contdf.x) 85 | 86 | contdf |> @vlplot(layers=[],width=700,title="Three continuous functions approaching a discontinuous one") + @vlplot({:line,color="blue"},x=:x,y=:y1) + 87 | @vlplot({:line,color="orange"},x=:x,y=:y2) + @vlplot({:line,color="red"},x=:x,y=:y3) 88 | 89 | ``` 90 | 91 | Practically speaking, however, on the scale where measurements in x 92 | resolve down to .01, the red function is in essence discontinuous 93 | because its value changes extremely rapidly across the least possible 94 | change in x.
In other words, for a data analyst working with real-world 95 | data where all measurements have some discrete, finite 96 | resolution, a discontinuous function can be modeled very effectively by a 97 | rapidly changing continuous function. 98 | 99 | Once we have this insight, then as an analyst we can stop thinking 100 | about whether a function is discontinuous, and instead think about 101 | whether it changes rapidly in a local region or not. In fact, we can 102 | work with a flexible functional form capable of representing slowly 103 | changing or rapidly changing functions, and see if our data shows 104 | evidence of rapid change. 105 | 106 | There are many ways to represent nonlinear functions. One commonly 107 | available tool is the LOESS fit. LOESS in essence fits a line or low-order 108 | polynomial (say quadratic or cubic) to a subset of the data 109 | centered on a point of interest. It does this for many points of 110 | interest, and forms a nonlinear function by connecting the value of 111 | the function at these different points of interest. 112 | 113 | How many points are involved in each local fit is known as the 114 | "bandwidth". In the VegaLite specification this bandwidth is a number 115 | between 0 and 1 representing the fraction of the total data set that 116 | is in use at any given fit point. 117 | 118 | Any flexible family of spline-type functions that is capable of 119 | detecting a rapid change in living_day_post near margin_pct_1 ~ 0 will 120 | do for detecting whether winning an election has an effect on 121 | longevity. We'll fit LOESS curves with 3 different bandwidths (0.2, 122 | 0.1, and 0.05) graphically, using the built-in LOESS function in 123 | VegaLite plotting. As the bandwidth is decreased towards 0, the fit 124 | depends only on the closest few data points, and hence can change 125 | rapidly from one place to another when the data differs in one region 126 | or another.
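
To make the bandwidth idea concrete, here is a minimal toy sketch of what a LOESS-style smoother does. This is an illustration of ours, not the algorithm VegaLite actually runs (which uses a more refined fitting procedure); the name `loess_point` and the demo data are made up for this sketch:

```julia
# Toy local-linear smoother: for each evaluation point x0, fit a
# tricube-weighted straight line to the fraction `bandwidth` of the
# data nearest x0, and report that line's value at x0.
function loess_point(x, y, x0, bandwidth)
    n = length(x)
    k = max(3, round(Int, bandwidth * n))      # points in the local fit
    idx = sortperm(abs.(x .- x0))[1:k]         # the k nearest points
    xs, ys = x[idx], y[idx]
    h = maximum(abs.(xs .- x0))                # local window radius
    w = (1 .- min.(abs.(xs .- x0) ./ h, 1.0) .^ 3) .^ 3  # tricube weights
    X = [ones(k) (xs .- x0)]                   # design for y ≈ a + b*(x - x0)
    a, b = (X' * (w .* X)) \ (X' * (w .* ys))  # weighted least squares
    return a                                   # the local line's value at x0
end

# Demo on noisy data around the sigmoid defined above:
xgrid = collect(-1.0:0.01:1.0)
ynoisy = [sigmoid(x, 0.0, 0.05) for x in xgrid] .+ 0.05 .* randn(length(xgrid))
smoothed = [loess_point(xgrid, ynoisy, x0, 0.2) for x0 in xgrid]
```

Shrinking `bandwidth` makes each local line depend on fewer points, which is exactly why small-bandwidth fits wiggle more.
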
127 | 128 | ```julia 129 | 130 | cleandf |> @vlplot(layer=[],width=700,height=400,title="LOESS of years lived post election against win margin in percentage points\nBandwidths = 0.2, 0.1, 0.05") + @vlplot({:point,opacity=.1},x=:margin_pct_1,y=:yrpost) + 131 | @vlplot({:line,color="blue",opacity=.5},transform=[{loess=:yrpost,on=:margin_pct_1,bandwidth=.2}],x=:margin_pct_1,y=:yrpost)+ 132 | @vlplot({:line,color="orange",opacity=.5},transform=[{loess=:yrpost,on=:margin_pct_1,bandwidth=.1}],x=:margin_pct_1,y=:yrpost)+ 133 | @vlplot({:line,color="red",opacity=.5},transform=[{loess=:yrpost,on=:margin_pct_1,bandwidth=.05}],x=:margin_pct_1,y=:yrpost) 134 | 135 | 136 | ``` 137 | 138 | What this shows is: nothing. As the LOESS fit uses less and less data 139 | due to the shrinking bandwidth, we find not a 140 | discontinuous jump in the function but simply small-scale 141 | oscillations up and down, consistent with an increasingly noisy 142 | fit. The blue curve, which smooths through the noise, appears more 143 | reasonable as an estimate of what to expect than the red curve, which 144 | changes rapidly up and down. Any causal model which somehow explains the 145 | longevity of candidates by saying that winning by 0.1 percentage 146 | points does nothing to your longevity, winning by 0.2 points 147 | adds 4 years relative to 0.1, and winning by 0.3 148 | points subtracts 2 years relative to 0.1 is going to have a 149 | lot of explaining to do. On the other hand, the explanation "this 150 | estimate is noisy and there is no evidence of any meaningful effect" 151 | works just fine. 152 | 153 | Of course this is the raw data. If we believe the referenced analysis, 154 | it just so happens that the people who won the election must have been 155 | much sicker, so that by winning the election, their lifespan was 156 | extended 5 to 10 years, thereby exactly canceling out the effect 157 | of their being much sicker.
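
Since `Optim` was loaded at the top, we can also make "is there a rapid transition?" quantitative by fitting the flexible sigmoid family from earlier to the cleaned data by least squares and inspecting the fitted scale: a genuine near-discontinuity at zero would drive the scale b toward 0 together with a sizeable jump c1. This is a sketch of ours, not part of the original analysis; the additive model form and the starting values are assumptions:

```julia
# Least-squares fit of yrpost ≈ c0 + c1*sigmoid(margin, a, b), reusing
# the sigmoid() defined earlier. We optimize log(b) so the scale stays positive.
xs = collect(skipmissing(cleandf.margin_pct_1))
ys = collect(skipmissing(cleandf.yrpost))

function sse(θ)
    a, logb, c0, c1 = θ
    b = exp(logb)
    return sum((ys[i] - (c0 + c1 * sigmoid(xs[i], a, b)))^2 for i in eachindex(ys))
end

# Start at: transition at 0, scale 1, baseline ≈ a plausible mean, no jump.
res = optimize(sse, [0.0, 0.0, 25.0, 0.0], NelderMead())
abest, logbbest, c0best, c1best = Optim.minimizer(res)
println("transition ≈ $abest, scale ≈ $(exp(logbbest)), jump ≈ $c1best years")
```

If the fitted jump comes out small relative to the spread of the data, or the fitted scale comes out comparable to the whole range of win margins, the sigmoid family agrees with the LOESS picture: no sharp transition worth believing in.
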
158 | 159 | 🤔 160 | 161 | What does it look like when there's a real signal? Let's put a few 162 | signals in place. First we'll just reuse the x values so that the 163 | distribution of x values is the same as our analysis. Then we'll 164 | create pure Normally distributed noise with an appropriate 165 | scale... and add in a signal, which will be plotted in blue. The red 166 | LOESS line will be our estimate. 167 | 168 | ```julia 169 | using Distributions, Random 170 | 171 | 172 | n = nrow(cleandf) 173 | Random.seed!(123) 174 | sdy = std(cleandf.living_day_post/daypyr) 175 | 176 | 177 | #xvals = rand(Normal(meanx,sdx),n) 178 | xvals = cleandf.margin_pct_1 179 | ynoise = rand(Normal(0,sdy),n) 180 | ysig = [if(xvals[i] > 0) 5.0 else 0.0 end for i in eachindex(ynoise)] 181 | 182 | 183 | function plotsig(x,y,n,lab) 184 | spec = DataFrame(x=x,y=y+n,s=y) |> @vlplot(layers=[],width=700,height=500,title=lab)+ 185 | @vlplot({:point,opacity=.1},x=:x,y=:y) + 186 | @vlplot({:line,opacity=1,color="blue"},x=:x,y=:s)+ 187 | @vlplot({:line,color="red"},transform=[{loess=:y,on=:x,bandwidth=.05}],x=:x,y=:y); 188 | return(spec); 189 | end 190 | 191 | 192 | p0 = plotsig(xvals,zeros(n),ynoise,"No signal, pure noise"); 193 | display(p0) 194 | 195 | 196 | p1 = plotsig(xvals,ysig,ynoise,"A reliably identified step-up signal"); 197 | display(p1); 198 | 199 | ysig2 = [3*xvals[i]*exp(-(xvals[i]/5)^2) for i in eachindex(ynoise)] 200 | 201 | p2=plotsig(xvals,ysig2,ynoise,"A reliably identified continuous wavelet"); 202 | display(p2) 203 | 204 | ysig3 = [ if(xvals[i] > 0) 3*xvals[i]*exp(-(xvals[i]/5)^2) else 0.0 end for i in eachindex(ynoise)] 205 | 206 | p3 = plotsig(xvals,ysig3,ynoise,"A reliably identified continuous wavelet x+ only"); 207 | display(p3) 208 | 209 | ``` 210 | 211 | The LOESS methodology using 5% of the data as bandwidth and a similar 212 | number of raw data points clearly identifies the signal in each 213 | case. 
214 | 215 | We conclude that if there were a signal in the raw data from this 216 | study, it would be fairly strongly visible in the LOESS with 5% 217 | bandwidth rather than oscillating wildly around zero on short "time 218 | scales" (in this case vote percentage scale). The reason we can 219 | conclude this is in essence that a true signal couldn't oscillate 220 | rapidly, as once we transitioned across the x=0 boundary, there's 221 | nothing really causally different between people who won by 0.1 222 | percentage points, and people who won by say 0.15 or 0.2 percentage 223 | points. 224 | 225 | -------------------------------------------------------------------------------- /build.jl: -------------------------------------------------------------------------------- 1 | #!/usr/bin/julia 2 | 3 | using Weave 4 | 5 | docs = [ #"0_Prerequisites.jmd", ## don't build this as it installs packages 6 | "BasicDataAndPlots.jmd", 7 | "DiscussionBasicDataAndPlots.jmd", 8 | "COVID-monitoring.jmd" 9 | ]; 10 | 11 | for i in docs 12 | notebook(i); 13 | end 14 | -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | xlrd==1.1.* 2 | --------------------------------------------------------------------------------