└── README.md


/README.md:
--------------------------------------------------------------------------------
  1 | # How to specify survey settings
  2 | 
  3 | ## Stas Kolenikov, Brady T. West, Peter Lugtig
  4 | 
  5 | In this repository, we document our understanding of, and recommendations for, 
  6 | appropriate best practices in specifying the complex sampling design settings 
  7 | in statistical software that enables design-based analyses of survey data.
  8 | We discuss features of the complex survey data such as stratification, clustering, 
  9 | unequal probabilities of selection, and calibration, and outline their impact on estimation procedures.
 10 | We demonstrate how statistical software treats them, and how the survey data providers 
 11 | can make data users' lives easier by clearly documenting accurate and efficient ways to make sure 
 12 | that their software properly accounts for the complex sampling design features.
 13 | 
 14 | - [Survey sampling features](#survey-sampling-features)
 15 |     + [Stratification](#stratification)
 16 |     + [Cluster Sampling](#cluster-sampling)
 17 |     + [Unequal Probabilities of Selection](#unequal-probabilities-of-selection)
 18 |     + [Weight Adjustments](#weight-adjustments)
 19 |     + [Sampling is about doing the best job for the money!](#sampling-is-about-doing-the-best-job-for-the-money-)
 20 |  - [Survey settings in statistical software](#survey-settings-in-statistical-software)
 21 |     + [R](#r)
 22 |     + [Stata](#stata)
 23 |     + [SAS](#sas)
 24 |     + [See also](#see-also)
 25 |  - [Documentation on appropriate design-based analysis techniques for complex sample survey data: rubrics](#documentation-on-appropriate-design-based-analysis-techniques-for-complex-sample-survey-data--rubrics)
 26 |  - [Evaluating documentation in practice](#evaluating-documentation-in-practice)
 27 |     + [Dealing with existing documentation](#dealing-with-existing-documentation)
 28 |     + [The National Survey of Family Growth (NSFG), 2013--2015](#the-national-survey-of-family-growth--nsfg---2013--2015)
 29 |     + [The Population Assessment of Tobacco and Health](#the-population-assessment-of-tobacco-and-health)
 30 |     + [Understanding Society (Waves 1--8)](#understanding-society--waves-1--8-)
 31 |     + [European Social Survey](#european-social-survey)
 32 |     + [American Time Use Survey](#american-time-use-survey)
 33 |     + [The 2005 India Human Development Survey](#the-2005-india-human-development-survey)
 34 |     + [A Portrait of Jewish Americans](#a-portrait-of-jewish-americans)
 35 |     + [Placeholder: Survey name](#placeholder--survey-name)
 36 | - [Recommendations for survey organizations](#recommendations-for-survey-organizations)
 37 |     + [The Data  Documentation Initiative](#the-data--documentation-initiative)
 38 | - [Additional resources](#additional-resources)
 39 | - [Acknowledgements](#acknowledgements)
 40 | 
 41 | ### About authors
 42 | 
 43 | Stas Kolenikov is Principal Scientist at [Abt Associates](http://www.abtassociates.com); 
 44 | @skolenik on GitHub; [@StatStas](https://twitter.com/StatStas) on Twitter.
 45 | His interest in this project is from the triple perspective of the data provider
 46 | at the Division of Data Science, Surveys, and Enabling Technologies of Abt Associates;
 47 | of an occasional continuing education instructor teaching courses on survey design and weighting, and estimation
 48 | with complex survey data; and of the developer of 
 49 | [statistical software for survey weight calibration](https://econpapers.repec.org/software/bocbocode/s458430.htm).
 50 | 
 51 | Brady T. West is Research Associate Professor
 52 | in the [Survey Research Center](https://www.src.isr.umich.edu/) 
 53 | at the [Institute for Social Research](http://www.isr.umich.edu/) on the University of Michigan-Ann Arbor campus. 
 54 | His interest in this project arises from his recent NSF-funded work on analytic error 
 55 | (https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0158120), 
 56 | suggesting that the majority of secondary analyses published in peer-reviewed journals 
 57 | do not correctly account for the features of complex sample designs in the estimation and inference process. 
 58 | [@bradytwest](https://twitter.com/bradytwest) on Twitter.
 59 | Website: http://www-personal.umich.edu/~bwest/
 60 | 
 61 | Peter Lugtig is Associate Professor at 
 62 | [the Department of Methodology and Statistics](https://www.uu.nl/en/organisation/faculty-of-social-and-behavioural-sciences/about-the-faculty/departments/methodology-statistics) 
 63 | in the School of Social and Behavioural Sciences, University of Utrecht.
 64 | His interest in this project is from the perspective of teaching applied social scientists to work with complex survey data. 
 65 | One of his interests is in estimating separate components of Total Survey Error in surveys. To this end, it is vital that 
 66 | potential errors introduced via sampling can be identified and correctly adjusted for. 
 67 | [@PeterLugtig](https://twitter.com/PeterLugtig) on Twitter.
 68 | Website: http://www.peterlugtig.com/
 69 | 
 70 | ___
 71 | 
 72 | ## Survey sampling features
 73 | 
 74 | One of the objectives of survey research is to come up with survey designs that minimize Total Survey Error (TSE). 
 75 | Sampling and adjustment errors are only two of the errors within the larger TSE framework. 
 76 | However, when surveys are built on the principles of probability sampling, these are two types of error that can be estimated correctly. 
 77 | When coverage and nonresponse can be estimated as well, it becomes possible to adjust for these errors so that the analysis of the survey represents the larger population. 
 78 | If this is done well, the results from the survey analysis are asymptotically unbiased, while uncertainty due to the various errors can be estimated.
 79 | In this manifesto, we focus on the "big four" features of complex sampling designs: stratification, cluster sampling, unequal probabilities of selection, and weight adjustments. 
 80 | Each design feature is described in more detail below.
 81 | 
 82 | ### Stratification
 83 | 
 84 | Stratification = breaking up the population/frames into mutually exclusive groups (strata) before sampling. Common examples of strata include:
 85 | 
 86 | *  Geographic regions for in-person samples
 87 | *  Diagnostic groups for patient list samples
 88 | *  Industry and/or employment size and/or geographical regions for establishment samples
 89 | 
 90 | Why do complex sampling designs employ stratification?
 91 | 
 92 | *  Oversample subpopulations of interest if they can be identified on the frame(s)
 93 | *  Oversample areas of higher concentration of the target rare population (c.f. http://www.asasrms.org/Proceedings/y2006/Files/JSM2006-000557.pdf)
 94 | *  Ensure specific accuracy targets in subpopulations of interest
 95 | *  Utilize different sampling designs/frames in different strata
 96 | *  Balance things around/avoid weird outlying samples/spread the sample across the whole population
 97 | *  Optimize costs vs. precision via [Neyman-Chuprow](https://support.sas.com/documentation/cdl/en/statug/63962/HTML/default/viewer.htm#statug_surveyselect_a0000000208.htm) or more complicated allocations
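
As a rough illustration of the last point, Neyman allocation (ignoring per-unit costs) splits a fixed total sample size across strata in proportion to the product of the stratum population size and the stratum standard deviation of the outcome. The sketch below is only an illustration: `neyman_allocation` is a hypothetical helper, not a function from any survey package, and the stratum sizes and standard deviations are made up.

```python
def neyman_allocation(n_total, sizes, sds):
    """Allocate n_total across strata proportionally to N_h * S_h.

    sizes: stratum population sizes N_h (made-up values below)
    sds:   stratum standard deviations S_h (made-up values below)
    Returns unrounded allocations; rounding rules are left to the designer.
    """
    products = [N * S for N, S in zip(sizes, sds)]
    total = sum(products)
    return [n_total * p / total for p in products]


# Three hypothetical strata: large and homogeneous to small and heterogeneous
sizes = [5000, 3000, 2000]
sds = [10.0, 20.0, 40.0]
alloc = neyman_allocation(1000, sizes, sds)

for N, S, n in zip(sizes, sds, alloc):
    print(f"N_h={N:5d}  S_h={S:5.1f}  n_h={n:7.1f}")
```

In this made-up example the smallest stratum receives the largest share of the sample because its outcome is the most variable.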
 98 | 
 99 | ### Cluster Sampling
100 | 
101 | Cluster, or multistage sampling design = sampling groups of units (clusters), rather than the ultimate observation units. Common examples of randomly sampled clusters include:
102 | 
103 | *  Geographic units (e.g., census tracts) in face-to-face surveys
104 | *  Entities in natural hierarchies (e.g. health care providers within practices within hospitals, or students within classes within schools)
105 | 
106 | Why do complex sampling designs employ cluster sampling?
107 | 
108 | *  Complete lists of all units are not available, but the survey statistician can obtain lists of administrative units for which residence or other relevant eligibility status of observation units can be easily identified
109 | *  Reduce interviewer travel time/cost in face-to-face surveys
110 | *  Analytic interest in multilevel modeling of hierarchical structures
111 | 
112 | Terminology: PSU = primary sampling unit = cluster for analytic purposes
113 | 
114 | ### Unequal Probabilities of Selection
115 | 
116 | Why do complex sampling designs assign unequal probabilities of selection to different population units?
117 | 
118 | *  Directly oversample (smaller) subpopulations of interest (e.g., ethnic/racial minorities) that would not have sufficient
119 |         sample sizes in an equal probability of selection method (EPSEM) sample
120 | *  Indirectly oversample, by selecting areas with a higher concentration of the target rare population
121 | *  Result of multiple stage/cluster sampling
122 |    - Most samples for face-to-face surveys are designed with probability proportional to size (PPS) sampling at the first few stages and a fixed sample size at the last stage $\Rightarrow$ approximately EPSEM. In many cases, the sample size at the last stage of selection (e.g., the size of a household) is unknown in advance.
123 |    - If the measures of size are not accurate, or nonresponse depends on the cluster (size), the design is no longer EPSEM.
124 | *  Unintended result of multiple frame sampling
125 |     - Dual phone users, i.e., those who have both landline and cell phone service, are more likely to be selected.
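
The PPS argument above can be checked with a few lines of arithmetic. This is a minimal sketch under simplifying assumptions: one PSU is drawn with probability proportional to its measure of size, and a fixed number of units is then drawn with equal probability within the selected PSU. The function name and all numbers are hypothetical.

```python
from fractions import Fraction


def overall_probability(n_psu, mos, total_mos, take, actual_size):
    """Overall selection probability of a unit under two-stage sampling.

    Stage 1: PPS selection of n_psu clusters using the measure of size (mos).
    Stage 2: a fixed number of units (take) drawn from the cluster's actual size.
    """
    p_stage1 = Fraction(n_psu * mos, total_mos)
    p_stage2 = Fraction(take, actual_size)
    return p_stage1 * p_stage2


mos_list = [200, 500, 1300]   # made-up measures of size
total_mos = sum(mos_list)     # 2000

# Accurate measures of size: every unit has the same overall probability
accurate = [overall_probability(1, m, total_mos, 20, m) for m in mos_list]
print(accurate)

# Inaccurate measure of size: the first cluster really has 250 units, not 200
inaccurate = overall_probability(1, 200, total_mos, 20, 250)
print(inaccurate)
```

With accurate measures of size the cluster size cancels out and every unit ends up with the same probability; once the measure of size is off, the probabilities (and hence the weights) differ across clusters.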
126 | 
127 | ### Weight Adjustments
128 | 
129 | Why are design weights (computed as the inverse of the probability of selection) adjusted further? These adjustments are designed to be corrections for...
130 | 
131 | *  eligibility
132 | *  frame noncoverage
133 | *  frame overlap in multiple frame surveys
134 | *  statistical efficiency
135 | *  nonresponse (unavoidable in the real world)
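
To make the mechanics concrete, here is a minimal sketch of a design weight followed by one simple form of nonresponse adjustment, a weighting-class adjustment. Real surveys typically use more elaborate procedures. The records, the `cell` variable and the helper `nonresponse_adjust` are all made up for illustration.

```python
# Made-up sample: selection probability, adjustment cell, response status
sample = [
    {"id": 1, "prob": 0.01, "cell": "A", "responded": True},
    {"id": 2, "prob": 0.01, "cell": "A", "responded": False},
    {"id": 3, "prob": 0.02, "cell": "B", "responded": True},
    {"id": 4, "prob": 0.02, "cell": "B", "responded": True},
    {"id": 5, "prob": 0.02, "cell": "B", "responded": False},
]

# Design weight = inverse of the probability of selection
for unit in sample:
    unit["design_weight"] = 1.0 / unit["prob"]


def nonresponse_adjust(units):
    """Inflate respondents' weights so each cell keeps its full weight total."""
    totals, resp_totals = {}, {}
    for u in units:
        totals[u["cell"]] = totals.get(u["cell"], 0.0) + u["design_weight"]
        if u["responded"]:
            resp_totals[u["cell"]] = (
                resp_totals.get(u["cell"], 0.0) + u["design_weight"]
            )
    adjusted = {}
    for u in units:
        if u["responded"]:
            factor = totals[u["cell"]] / resp_totals[u["cell"]]
            adjusted[u["id"]] = u["design_weight"] * factor
    return adjusted


final_weights = nonresponse_adjust(sample)
print(final_weights)
```

After the adjustment, the respondents' weights add up to the same total as the design weights of the full sample.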
136 | 
137 | ### Sampling is about doing the best job for the money!
138 | 
139 | At the end of the day, all of the complex sampling features described above are employed for one or more of the following reasons:
140 | 
141 | *  Save money
142 | 
143 |    - use cluster samples to save on travel costs
144 |    - use stratified samples to realize statistical efficiency gains
145 |             
146 | *  Cannot get the full population listing
147 | 
148 |    - ... so have to use area samples to gradually zoom down to individuals
149 |    - ... so have to use infrastructure created for a different purpose (telecom or postal) to contact people
150 |    - ... so have to sample a larger, general population, and screen out the eligible rare/hard to reach population
151 |             
152 | *  Overcome real world data collection difficulties
153 | 
154 |    -  nonresponse weight adjustments
155 |             
156 | As a result of all considerations above, most serious surveys that are using face-to-face data collection or address-based sampling are using a complex sample design in their fieldwork. 
157 | Data resulting from the survey cannot be naively analyzed, and survey weights have to be used. 
158 | Survey statisticians routinely compute weights for users. These weights often take the form of a design weight that corrects for 
159 | eligibility, frame overlap, and unequal selection probabilities in sampling. A separate nonresponse weight corrects for 
160 | nonresponse, and sometimes for noncoverage errors in the frame used. In some surveys additional weights are provided for the 
161 | purpose of doing cross-national comparisons (multi-country surveys) or longitudinal analysis (cohort or panel studies).
162 | For more information on how modern surveys are efficiently designed, and how weights are computed, we refer the reader to Kalton and 
163 | Flores-Cervantes (2003), Lohr (2010), Bethlehem (2011), Valliant, Dever and Kreuter (2013), Valliant and Dever (2017) or Kolenikov 
164 | (2016). 
165 | The weights included in the dataset should be accompanied by detailed documentation on how the weights were computed and how they should be used in practice by applied researchers. 
166 | We have often found that the documentation of survey weights is inadequate. 
167 | Sometimes, details on how the weights were designed are missing. 
168 | More often, the description of the weights is sparse or very technical. This then leads to users not using weights at all, or using them incorrectly. 
169 | West, Sakshaug and Aurelien (2016) have shown, for example, that analytic errors are prevalent in 145 analyses of the survey 'Scientists and Engineers Statistical Data System' (SESTAT). 
170 | This paper seeks to provide a rubric for how survey weights should be documented. We will define a set of rubrics consisting of five main and two bonus elements, and then use these rubrics to discuss the survey documentation of several popular surveys originating in the U.S., U.K. and Europe. 
171 | 
172 | This paper is accompanied by a website, where applied researchers can paste example code from SAS, Stata and R and generate corresponding code in other software packages to facilitate the correct use of weights in the future. Please visit https://statstas.shinyapps.io/svysettings/ for details.
173 | 
174 | ___
175 | 
176 | ## Survey settings in statistical software
177 | 
178 | The most common public use data file specification of an area probability sampling design is that
179 | of a two-stage stratified clustered sample. It is nearly always an approximation to the true sampling design,
180 | as most typically the design would include more stages, and some additional modifications of the 
181 | sampling design variables would be undertaken: true sampling strata or units would be combined
182 | or split, units would be swapped with one another, etc., typically in order to mask the true geographical
183 | locations of respondents, as geography is one of the strongest factors putting individuals
184 | at risk of identification and disclosure (Heeringa et al., 2017, Chapter 4).
185 | 
186 | In the examples below, we provide the following semi-standardized examples:
187 | 
188 | - a "public use" stratified two-stage design:
189 |   * the data file in the package native format is `PUMS_svy`, with an appropriate extension
190 |   * strata are `thisStrat`
191 |   * clusters are `thisPSU`
192 |   * weights are `thisWeight`
193 |   * Taylor Series Linearization (TSL) is generally the default variance estimation procedure in these settings
194 | - a "dual frame RDD" design, approximated by an unequal probability design:
195 |   * the data file in the package native format is `RDD_svy`, with an appropriate extension
196 |   * weights are `thisWeight`
197 | - a design with the bootstrap replicate weights:
198 |   * the data file in the package native format is `BSTRAP_svy`, with an appropriate extension
199 |   * the main weights are `thisWeight`
200 |   * the replicate weights are `bsWeight1`, `bsWeight2`, ..., `bsWeight100`
201 | 
202 | In addition, three analyses are discussed:
203 | - estimation of the total of a continuous variable `y`;
204 | - cross-tabulation of two categorical variables `sex` and `race`;
205 | - analysis in subpopulation/domain defined by age restriction, `age` between 18 and 30.
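
For orientation, the point estimates behind the first and third analyses are plain weighted sums. The sketch below uses made-up records and a hypothetical helper `weighted_total`; it reuses the variable names `y`, `age` and `thisWeight` from the examples. It deliberately computes no standard errors, since those depend on the full design information (strata, clusters, or replicate weights) that the software settings in the following sections are meant to convey.

```python
# Made-up records using the variable names from the examples in this section
records = [
    {"y": 10.0, "age": 25, "thisWeight": 100.0},
    {"y": 20.0, "age": 40, "thisWeight": 150.0},
    {"y": 30.0, "age": 18, "thisWeight": 50.0},
]


def weighted_total(rows, in_domain=lambda r: True):
    """Weighted total of y; units outside the domain contribute zero."""
    return sum(r["thisWeight"] * r["y"] for r in rows if in_domain(r))


total_y = weighted_total(records)
total_y_young = weighted_total(records, lambda r: 18 <= r["age"] <= 30)

print(total_y, total_y_young)
```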
206 | 
207 | ### R
208 | 
209 | Implementation of complex sample survey estimation in `library(survey)` separates the steps of declaring the sampling design
210 | and running estimation.
211 | 
212 | (In terms of reading the input data, we assume that the user follows the best practices of workflow management
213 | and uses `library(here)` to identify the root of the project; 
214 | see [Bryan (2017)](https://www.tidyverse.org/articles/2017/12/workflow-vs-script/)).
215 | 
216 | The "public use" stratified two-stage design:
217 | 
218 | ```
219 | # prerequisites
220 | library(survey)
221 | library(here)
222 | # read the data
223 | thisSurvey <- get(load(here("data/PUMS_svy.Rdata")))  # load() returns the object name; assumes the file holds one data frame
224 | # specify the design
225 | thisDesign <- svydesign(ids = ~thisPSU, strata = ~thisStrat, weights = ~thisWeight, data = thisSurvey)
226 | # estimate the total
227 | (total_y <- svytotal(~y, design = thisDesign))
228 | # tabulate
229 | (tab1_sex_race <- svymean(~interaction(sex, race, drop = TRUE), design = thisDesign))
230 | (tab2_sex_race <- svytable(~sex + race, design = thisDesign))
231 | (tab3_sex_race <- svyby(~sex, by = ~race, design = thisDesign, FUN = svymean))
232 | # subpopulation estimation: subset the design object, not the data
233 | young_adults <- subset(thisDesign, (age >= 18) & (age <= 30))
234 | (total_y_young <- svytotal(~y, design = young_adults))
235 | ```
236 | 
237 | In the above, the line `( object <- function_call(input1, ... ) )` simultaneously creates and assigns 
238 | the object, and prints it. Lumley (2010) notes that by default, all functions give missing values (`NA`)
239 | when they encounter item missing data. To discard the missing data from analysis, `na.rm=TRUE` should
240 | be specified as an option to the `svy...(...,na.rm=TRUE)` functions, with the effect of treating 
241 | the non-missing data as a subpopulation. More complex analyses, such as linear models, are available through the survey package. 
242 | 
243 | The RDD unequal weights design:
244 | 
245 | ```
246 | # prerequisites
247 | library(survey)
248 | library(here)
249 | # read the data
250 | thisSurvey <- get(load(here("data/RDD_svy.Rdata")))  # assumes the file holds one data frame
251 | # specify the design: no clusters or strata, weights only
252 | thisDesign <- svydesign(ids = ~1, weights = ~thisWeight, data = thisSurvey)
253 | # estimation can use the same syntax as above
254 | ```
255 | 
256 | The replicate weight design:
257 | 
258 | ```
259 | # prerequisites
260 | library(survey)
261 | library(here)
262 | # read the data
263 | thisSurvey <- get(load(here("data/BSTRAP_svy.Rdata")))  # assumes the file holds one data frame
264 | # specify the design
265 | thisDesign <- svrepdesign(weights = ~thisWeight, data = thisSurvey,
266 |                           repweights = "bsWeight[0-9]+", type = "bootstrap",
267 |                           combined.weights = TRUE)
268 | # estimation can use the same syntax as above
269 | ```
270 | 
271 | In the above syntax, `"bsWeight[0-9]+"` is a [*regular expression*](https://regexr.com/) which, in this case, builds
272 | a filter for variable names as follows:
273 | 1. the name must contain the text `bsWeight` exactly (prefix the pattern with `^` to require that the name starts with it);
274 | 2. this text must be followed by a digit `[0-9]`;
275 | 3. this digit must occur at least once, and may occur an unlimited number of times (`+` modifier).
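
The behaviour of this pattern can be tried out on a list of candidate variable names. The sketch below uses Python's `re` module rather than R, on the assumption that this simple pattern behaves the same way in both regular expression engines; the variable names are made up.

```python
import re

pattern = re.compile(r"bsWeight[0-9]+")

# Made-up variable names, some of which should not be picked up
names = [
    "thisWeight", "bsWeight1", "bsWeight100",
    "bsWeight", "old_bsWeight7", "bsWeightA",
]

# Unanchored search: the pattern may match anywhere inside the name
searched = [n for n in names if pattern.search(n)]

# Anchored match: the whole name must consist of the prefix plus digits
anchored = [n for n in names if pattern.fullmatch(n)]

print(searched)
print(anchored)
```

An unanchored search also picks up `old_bsWeight7`, so if a data set contains similarly named variables, anchoring the pattern (`^bsWeight[0-9]+$`) may be the safer choice.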
276 | 
277 | For more examples, see Thomas Lumley's documentation of the `library(survey)` package:
278 | - [Lumley (2010)](https://www.amazon.com/Complex-Surveys-Guide-Analysis-Using/dp/0470284307) book;
279 | - http://r-survey.r-forge.r-project.org/survey/, home of the `library(survey)` package.
280 | 
281 | ### Stata
282 | 
283 | In Stata, survey settings can be specified once, and be used later with the `svy:` estimation prefix.
284 | The settings can be saved with the data set. This is a recommended best practice for data providers.
285 | 
286 | ```
287 | use PUMS_svy, clear
288 | svyset 
289 | * if empty, specify svyset on your own
290 | svyset thisPSU [pw=thisWeight], strata(thisStrat)
291 | * estimate the total
292 | svy :  total y
293 | * tabulate
294 | svy : tab sex race, col se
295 | * subpopulation estimation: subpop option
296 | svy , subpop( if inrange(age,18,30) ) : total y
297 | ```
298 | 
299 | The RDD unequal weights design:
300 | 
301 | ```
302 | use RDD_svy, clear
303 | svyset 
304 | * if empty, specify svyset on your own; _n means that individual observations are the sampling units
305 | svyset _n [pw=thisWeight]
306 | * estimation commands as before
307 | * estimate the total
308 | svy :  total y
309 | * tabulate
310 | svy : tab sex race, col se
311 | * subpopulation estimation: subpop option
312 | svy , subpop( if inrange(age,18,30) ) : total y
313 | ```
314 | 
315 | The replicate weight design:
316 | 
317 | ```
318 | use BSTRAP_svy, clear
319 | svyset 
320 | * if empty, specify svyset on your own
321 | svyset [pw=thisWeight], vce(bootstrap) bsrw( bsWeight* ) mse
322 | * estimate the total
323 | svy :  total y
324 | * tabulate
325 | svy : tab sex race, col se
326 | * subpopulation estimation: subpop option
327 | svy , subpop( if inrange(age,18,30) ) : total y
328 | ```
329 | 
330 | The estimation commands themselves are identical to those for the cluster+strata designs.
331 | 
332 | The `mse` option of the `svyset` command requests the MSE version of the variance estimator, 
333 | in which the original full-sample estimate is subtracted from each replicate estimate, <!--
334 | $$
335 | v[\hat\theta] = \frac1R \sum_{r=1}^R (\hat\theta^{(r)}-\hat\theta)^2
336 | $$
337 | -->
338 | as opposed to the variance version, in which the mean of the pseudo-values is subtracted 
339 | when the squared differences are formed.
340 | <!--
341 | $$
342 | \tilde v[\hat\theta] = \frac1R \sum_{r=1}^R (\hat\theta^{(r)}-\bar{\hat\theta})^2, 
343 | \quad \bar{\hat\theta} = \frac1R \sum_{r=1}^R \hat\theta^{(r)}
344 | $$
345 | -->
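
The difference between the two versions can be seen on a toy example. The sketch below assumes the simple `1/R` scaling for bootstrap replicates and uses made-up replicate estimates; `replicate_variances` is a hypothetical helper, not part of Stata or any survey package.

```python
def replicate_variances(theta_hat, replicates):
    """Return (MSE version, variance version) of the replicate variance.

    theta_hat:  estimate from the full sample with the main weights
    replicates: estimates obtained with each set of replicate weights
    """
    R = len(replicates)
    mean_rep = sum(replicates) / R
    # MSE version: deviations from the full-sample estimate
    v_mse = sum((t - theta_hat) ** 2 for t in replicates) / R
    # Variance version: deviations from the mean of the replicate estimates
    v_var = sum((t - mean_rep) ** 2 for t in replicates) / R
    return v_mse, v_var


theta_hat = 10.0                       # made-up full-sample estimate
replicates = [9.0, 10.0, 11.0, 12.0]   # made-up replicate estimates
v_mse, v_var = replicate_variances(theta_hat, replicates)
print(v_mse, v_var)
```

The MSE version exceeds the variance version by the squared difference between the mean of the replicate estimates and the full-sample estimate, so it is never the smaller of the two.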
346 | 
347 | Starting with Stata 15.1, calibrated weights [are supported](https://www.stata.com/bookstore/survey-weights/).
348 | 
349 | ### SAS
350 | 
351 | In SAS, survey settings need to be declared in every `SURVEY` procedure.
352 | 
353 | [Tabulations and cross-tabulations](https://support.sas.com/documentation/cdl/en/statug/63962/HTML/default/viewer.htm#surveyfreq_toc.htm):
354 | 
355 | ```
356 | PROC SURVEYFREQ data=thisSurveyLib.thisSurvey;
357 |    WEIGHT thisWeight;
358 |    CLUSTER thisPSU;
359 |    STRATA thisStrat;
360 |    TABLES sex*race;
361 | RUN;
362 | ```
363 | 
364 | Subpopulation analysis:
365 | 
366 | ```
367 | DATA thisSurveyLib.thisSurvey;
368 |    SET thisSurveyLib.thisSurvey;
369 |    age_18to30 = (age>=18) & (age<=30);
370 | RUN;
371 | PROC SURVEYFREQ data=thisSurveyLib.thisSurvey;
372 |    WEIGHT thisWeight;
373 |    CLUSTER thisPSU;
374 |    STRATA thisStrat;
375 |    /* SURVEYFREQ has no DOMAIN statement: put the domain variable first in the table request */
376 |    TABLES age_18to30*sex*race;
377 | RUN;
378 | ```
379 | 
380 | ### See also
381 | 
382 | https://github.com/skolenik/ICHPS2018-svy
383 | 
384 | https://statstas.shinyapps.io/svysettings/
385 | 
386 | ___
387 | 
388 | ## Documentation on appropriate design-based analysis techniques for complex sample survey data: rubrics
389 | Large-scale data collections are nowadays routinely released to the public. They typically include anonymized survey microdata, along with some variables that include details about the fieldwork itself, 
390 | and one or several weighting variables that allow any data user to correct for unequal sampling probabilities introduced in the survey design, 
391 | as well as for errors due to noncoverage and/or nonresponse. 
392 | The survey datasets are accompanied by survey documentation that explains the design of the surveys and details of the measurements taken. 
393 | Below, we first propose a short checklist for assessing the quality of survey documentation. We then apply the checklist to several existing and widely used public datasets. We argue that the documentation should be understandable and usable to the average applied scientist, and conclude with recommendations on how to improve survey documentation.
394 | 
395 | 
396 | 1. **Can a survey statistician figure out from the documentation how to set the data up for correct estimation?**
397 | This would be a person with training on par with or exceeding the level of the Lohr (1999) or Kish (1965) textbooks, and applied 
398 | experience on par with or exceeding the Lumley (2010) or Heeringa, West and Berglund (2017) books.
399 | 
400 | 2. **Can an applied researcher figure out from the documentation how to set the data up for correct estimation?**
401 | This would be a person who has some background / training in applied statistical analysis, but has only cursory knowledge of 
402 | survey methodology, based on at most several hours of classroom instruction in their "methods" class or a short course at a 
403 | conference.
404 | 
405 | 3. **Is everything described succinctly in one place, or scattered throughout the document?** 
406 | It is of course easier on the user when all the relevant information is easily available in a single section. However, some 
407 | reports put information about weights in one place, e.g. where sampling was described, while information about other complex 
408 | sampling features (e.g., cluster/strata/variance estimation) only appears some twenty pages away.
409 | 
410 | 4. **Are examples of specific syntax to specify survey settings provided?** 
411 | Has the data producer provided worked and clearly-annotated examples of analyses of the complex sample survey data produced by a 
412 | given survey using the syntax for existing procedures in one or more common statistical software packages? And as a bonus, have 
413 | examples been provided in multiple languages (e.g., SAS, R, and Stata)? 
414 | 
415 | 5. **Are there examples given for how to answer substantive research questions?**
416 | In all languages, there are specific ways to run commands that are survey-design-aware. In other words, only specifying the 
417 | design may not be sufficient in ensuring that estimation is done correctly. For instance, are examples provided for both 
418 | descriptive and analytic (i.e., regression-driven) research questions?
419 | 
420 | 6. (Bonus) **Is an executive summary description of the sample design available?**
421 | Many researchers would appreciate a two-three sentence paragraph to summarize the sampling design that
422 | they could copy and paste into their papers, e.g.,
423 | 
424 | > {This survey} is a three-stage areal sampling design survey with census tracts, households, and individuals
425 | as sampling units. The final analysis weights provided by {the organization who collected the data} account 
426 | for unequal selection probabilities, nonresponse, and study eligibility, and are used in all analyses
427 | reported in this paper. Standard errors are estimated using the complex survey bootstrap variance estimation procedures.
428 | 
429 | or
430 | 
431 | > {This survey} is a dual-frame RDD survey that collected data on both landline and mobile phones.
432 | The final analysis weights provided by {the organization who collected the data} account 
433 | for unequal selection probabilities, nonresponse, and study eligibility, and are used in all analyses
434 | reported in this paper. Standard errors are estimated using Taylor series linearization,
435 | the default analytical method available in most statistical packages.
436 | 
437 | 7. (Bonus) **What kinds of references are provided?**
438 | It is often helpful to the end users if the description of the sampling design features is
439 | accompanied by the references to (a) methodological literature describing them in general
440 | (e.g., textbooks such as Korn & Graubard, Kish, Lohr, Heeringa-West-Berglund, Lumley, etc.),
441 | and (b) technical publications specific to the study in question, such as 
442 | the JSM or AAPOR proceedings, technical reports on the provider website, or publications
443 | in technical literature describing the study, if appropriate. E.g., the description
444 | of clustered sampling designs used in the U.S. Census Bureau large scale surveys
445 | such as the American Community Survey or Current Population Survey could refer 
446 | to general descriptions of stratified clustered surveys, to the user Handbooks 
447 | ([*What Researchers Need to Know* ACS Handbook (Census Bureau 2009)](https://www.census.gov/library/publications/2009/acs/researchers.html)), 
448 | and to the technical papers on variance estimation 
449 | ([Ash 2011](http://www.citeulike.org/user/ctacmo/article/13018645)).
450 | 
451 | (Secondary bonus) Are the references pointing to sources other than the authors? (Chances are if there's a JSM Proceedings 
452 | paper by the same group of authors, it won't be any clearer, frankly.)
453 | 
454 | We now use the seven rubrics defined above to "score" several existing examples of documentation for public-use survey 
455 | data files based on these criteria. For example, if the documentation for a public-use data file successfully satisfies / meets 
456 | the first five rubrics above, the documentation will be scored 5/5. These scores are designed to be **illustrative**, in terms 
457 | of rating existing examples of documentation for public-use data files on how effectively they convey complex sampling features 
458 | and how they should be employed in analysis to users. The scores are designed to motivate data producers to improve the clarity 
459 | of their documentation for a variety of data users hoping to analyze large (and usually publicly funded) survey data sets.
460 | 
461 | ## Evaluating documentation in practice
462 | 
463 | In this section, we will evaluate a convenience sample of the documentation for several public use survey data files (PUFs). We 
464 | will apply the above rubrics to see how the documentation compares in terms of effectively describing appropriate analysis 
465 | techniques to data users.
466 | 
467 | ### Dealing with existing documentation
468 | 
469 | 1. Search documentation for the software footprint as keywords: `svyset` per Stata, 
470 | `PROC SURVEY` per SAS, `svydesign` per R `library(survey)`.
471 | 
472 | 2. If that fails, search for "sampling weight", "final weight", "analysis weight", "survey weight" or "design weight". 
473 | You can search for "weight" per se but you should expect that in health studies, this is likely to produce 
474 | many false positives.
475 | 
476 | 3. See if there is any description of the sampling strata and clusters near the text where weights are mentioned.
477 | 
478 | 4. Search for "*PSU*" and "*cluster*" and "*strata*" and "*stratification*" to find
479 | the variables that need to be specified in survey settings.
480 | 
481 | 5. Search for "*variance estimation*", the generic technical term to deal with complexities of survey estimation.
482 | 
483 | 6. Search for "*replicate weights*", "*BRR*", "*jackknife*" and "*bootstrap*", the keywords for the popular
484 | replicate variance estimation methods.
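
The keyword searches in the steps above can be automated once the documentation is available as plain text. The following is a minimal sketch: the grouping of keywords, the helper `find_keywords`, and the sample sentence (including the variable names `FINALWT`, `VSTRAT` and `VPSU`) are all made up for illustration.

```python
import re

# Keywords taken from the steps above, grouped for reporting
KEYWORDS = {
    "software footprint": ["svyset", "PROC SURVEY", "svydesign"],
    "weights": ["sampling weight", "final weight", "analysis weight",
                "survey weight", "design weight"],
    "design variables": ["PSU", "cluster", "strata", "stratification"],
    "variance estimation": ["variance estimation", "replicate weights",
                            "BRR", "jackknife", "bootstrap"],
}


def find_keywords(text):
    """Return the keywords from each group that occur in text (case-insensitive)."""
    hits = {}
    for group, words in KEYWORDS.items():
        found = [w for w in words
                 if re.search(re.escape(w), text, flags=re.IGNORECASE)]
        if found:
            hits[group] = found
    return hits


# Made-up documentation fragment with hypothetical variable names
sample_text = ("Use the final weight FINALWT with svyset; "
               "the strata and PSU variables are VSTRAT and VPSU.")
hits = find_keywords(sample_text)
print(hits)
```

A group that is missing from the result, such as variance estimation here, suggests where the documentation may need a closer manual read.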
485 | 
486 | ___
487 | 
488 | ### The National Survey of Family Growth (NSFG), 2013--2015
489 | 
490 | ⭐⭐⭐⭐⭐
491 | 
492 | **Funding**: 
493 | 
494 | * Eunice Kennedy Shriver National Institute of Child Health and Human Development
495 | * Office of Population Affairs
496 | * NCHS, CDC
497 | * Division of HIV/AIDS Prevention, CDC
498 | * Division of Sexually Transmitted Disease Prevention, CDC
499 | * Division of Reproductive Health, CDC
500 | * Division of Birth Defects and Developmental Disabilities, CDC
501 | * Division of Cancer Prevention and Control, CDC
502 | * Children’s Bureau, Administration for Children and Families (ACF)
503 | * Office of Planning, Research and Evaluation, ACF
504 | 
505 | **Data collection**: The University of Michigan Survey Research Center (http://src.isr.umich.edu)
506 | 
507 | **Host**: The National Center for Health Statistics (http://www.cdc.gov/nchs/)
508 | 
509 | **URL**: http://www.cdc.gov/nchs/nsfg
510 | 
511 | **Rubrics**: 
512 | 
513 | 1. **Can a survey statistician figure out from the documentation how to set the data up for correct estimation?**
514 | Yes. Electronic documents like 
515 | [Example 1: Variance Estimates for Percentages](https://www.cdc.gov/nchs/data/nsfg/NSFG_2013_2015_VarEst_Ex1.pdf) 
516 | linked from the [documentation page](https://www.cdc.gov/nchs/nsfg/nsfg_2013_2015_puf.htm) 
517 | under the *Variance estimation* subtitle make it very easy for survey statisticians and applied researchers alike 
518 | to correctly declare complex sampling features to survey analysis software for design-based analyses.
519 | 
520 | 2. **Can an applied researcher figure out from the documentation how to set the data up for correct estimation?**
521 | Yes. See above.
522 | 
523 | 3. **Is everything that the data user needs to know about the complex sampling contained in one place?** 
524 | Yes, although very little (if anything) is said about the actual complex sample design. 
525 | Instead this information appears in separate electronic files, such as 
526 | [Sample Design Documentation](https://www.cdc.gov/nchs/data/nsfg/NSFG_2013-2015_Sample_Design_Documentation.pdf). 
527 | This is out of necessity, however, given the complexity of the NSFG sample design, 
528 | and all of the information that a user needs to compute weighted point estimates and estimate variance 
529 | accounting for the complex sampling can be found in examples like the one indicated above.
530 | 
531 | 4. **Are examples of specific syntax for performing correct design-based analyses provided?**
532 | Yes. Three examples are clearly documented 
533 | ([tabulations for categorical variables](https://www.cdc.gov/nchs/data/nsfg/NSFG_2013_2015_VarEst_Ex1.pdf); 
534 | [means for continuous variables](https://www.cdc.gov/nchs/data/nsfg/NSFG_2013_2015_VarEst_Ex2.pdf);
535 | [analysis with domains/subpopulations](https://www.cdc.gov/nchs/data/nsfg/NSFG_2013_2015_VarEst_Ex3.pdf)) 
536 | and linked on the main documentation page, and both syntax and output are included in each case. 
537 | Bonus: syntax and output are provided for both SAS and Stata.
538 | 
539 | 5. **Are examples of analyses needed for addressing specific substantive questions provided?**
540 | Yes; see previous item.
541 | 
542 | 6. **(Bonus) Is an executive summary of the sample design provided?**
543 | Yes; such an executive summary is given in the first section of 
544 | [the main sample document](https://www.cdc.gov/nchs/data/nsfg/NSFG_2013-2015_Sample_Design_Documentation.pdf).
545 | 
546 | 7. **(Bonus) What kinds of references are provided?**
547 | There are several references to the most important sample design literature included in Section 11 of the document linked above.
548 | 
549 | **Score**: 5++/5
550 | 
551 | The NSFG provides an excellent example of the type of documentation 
552 | that needs to be provided to data users to minimize the risk of analytic error 
553 | due to a failure to account for complex sampling features.
554 | 
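To illustrate what survey software does once strata, clusters and weights have been declared, here is a minimal Python sketch of a weighted mean with a Taylor-linearized standard error under the usual with-replacement ("ultimate cluster") approximation. It is an illustration under stated assumptions rather than the NSFG's own procedure: the function name and the tuple layout `(stratum, psu, weight, y)` are hypothetical, and real analyses should rely on the survey procedures of R, Stata or SAS.

```python
from collections import defaultdict


def weighted_mean_and_se(rows):
    """rows: iterable of (stratum, psu, weight, y). Returns (mean, standard error).

    Only between-PSU variation within strata contributes to the variance,
    which is the with-replacement approximation used by default in most packages.
    """
    rows = list(rows)
    total_w = sum(w for _, _, w, _ in rows)
    mean = sum(w * y for _, _, w, y in rows) / total_w

    # Sum the linearized values z = w * (y - mean) / total_w within each PSU.
    psu_totals = defaultdict(lambda: defaultdict(float))
    for stratum, psu, w, y in rows:
        psu_totals[stratum][psu] += w * (y - mean) / total_w

    variance = 0.0
    for stratum, psus in psu_totals.items():
        n_h = len(psus)
        if n_h < 2:
            raise ValueError(f"stratum {stratum!r} has a single PSU")
        z_bar = sum(psus.values()) / n_h
        variance += n_h / (n_h - 1) * sum((z - z_bar) ** 2 for z in psus.values())
    return mean, variance ** 0.5


# Toy data: two strata with two PSUs each (all values are made up).
toy = [
    (1, 1, 10.0, 1.0), (1, 1, 10.0, 0.0), (1, 2, 10.0, 1.0), (1, 2, 10.0, 1.0),
    (2, 1, 20.0, 0.0), (2, 1, 20.0, 0.0), (2, 2, 20.0, 1.0), (2, 2, 20.0, 0.0),
]
toy_mean, toy_se = weighted_mean_and_se(toy)
print(toy_mean, toy_se)
```

Omitting the stratum or PSU information changes only the standard error, not the point estimate, which is why missing design variables are easy to overlook.
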
555 | Accessed on 2018-07-15.
556 | 
557 | ___
558 | 
559 | ### The Population Assessment of Tobacco and Health
560 | 
561 | ⭐⭐⭐⭐⭐
562 | 
563 | **Funding**: The Population Assessment of Tobacco and Health (PATH) Study is a collaboration 
564 | between the National Institute on Drug Abuse (NIDA), National Institutes of Health (NIH), 
565 | and the Center for Tobacco Products (CTP), Food and Drug Administration (FDA). 
566 | 
567 | **Data collection**: Westat (http://www.westat.com)
568 | 
569 | **Host**: The National Addiction and HIV Data Archive Program
570 | 
571 | **URL**: https://www.icpsr.umich.edu/icpsrweb/NAHDAP/series/606
572 | 
573 | **Rubrics**: 
574 | 
575 | 1. **Can a survey statistician figure out from the documentation how to set the data up for correct estimation?**
576 | Yes. Section 5 of the Public-Use Files User Guide provides clear detail on the calculation and names 
577 | of the various weight variables that can be used for estimation. This section also discusses variance estimation, 
578 | and clearly describes the replicate weights that have been prepared for data users enabling variance estimation. 
579 | Software options are also discussed in this section, and code illustrating the use of multiple programs 
580 | for the prototype example analyses is provided in Appendix A.
581 | 
582 | 2. **Can an applied researcher figure out from the documentation how to set the data up for correct estimation?**
583 | Yes. Appendix A of the User Guide is very helpful, given that it provides annotated example code for several different packages.
584 | Section 5 is aimed at survey statisticians, and will be overwhelming to an audience that is less technically prepared.
585 | 
586 | 3. **Is everything that the data user needs to know about the complex sampling contained in one place?** 
587 | Yes; Section 5 provides all of the necessary sampling information for analysis purposes, and Appendix A contains all of the necessary code for actual practice.
588 | 
589 | 4. **Are examples of specific syntax for performing correct design-based analyses provided?**
590 | Yes. Appendix A of the Public-Use Files User Guide is an excellent example of providing this kind of resource for data users.
591 | 
592 | 5. **Are examples of analyses needed for addressing specific substantive questions provided?**
593 | Yes. Appendix A illustrates a variety of potential analyses that data users could perform.
594 | 
595 | 6. **(Bonus) Is an executive summary of the sample design provided?**
596 | Chapter 2 of the User Guide provides a detailed summary of the sample design, which serves as an executive summary.
597 | 
598 | 7. **(Bonus) What kinds of references are provided?**
599 | There are several references to the most important sample design literature included at the end of the User Guide.
600 | 
601 | **Score**: 5++/5
602 | 
603 | The PATH PUF user guide is another excellent, gold-standard example of detailed and useful information 
604 | designed to make the life of the survey data user easier. 
605 | 
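For readers unfamiliar with replicate weights, the general idea can be sketched in a few lines of Python: the estimate is recomputed with each set of replicate weights, and the scaled squared deviations from the full-sample estimate are summed. The function names, the toy data and the replicate weights below are hypothetical, and the scale factor that applies to a given study must be taken from its user guide.

```python
def weighted_mean(values, weights):
    """Weighted mean of values."""
    return sum(w * y for y, w in zip(values, weights)) / sum(weights)


def replicate_variance(full_estimate, replicate_estimates, scale):
    """Generic replicate variance: scale * sum((theta_r - theta)^2)."""
    return scale * sum((t - full_estimate) ** 2 for t in replicate_estimates)


def brr_scale(n_replicates, fay=0.0):
    """Scale for BRR; fay=0 is standard BRR, 0 < fay < 1 is Fay's variant."""
    return 1.0 / (n_replicates * (1.0 - fay) ** 2)


# Toy data: four respondents, one main weight and two replicate weight sets.
y = [1.0, 0.0, 1.0, 1.0]
main_weights = [1.0, 1.0, 1.0, 1.0]
replicate_weights = [
    [2.0, 0.0, 2.0, 0.0],
    [0.0, 2.0, 0.0, 2.0],
]

theta = weighted_mean(y, main_weights)
theta_reps = [weighted_mean(y, w) for w in replicate_weights]
variance = replicate_variance(theta, theta_reps, brr_scale(len(theta_reps)))
print(theta, variance ** 0.5)
```

The same `replicate_variance` function covers bootstrap and jackknife weights; only the scale factor differs between methods.
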
606 | Accessed on 2018-12-17.
607 | ___
608 | 
609 | ### Understanding Society (Waves 1--8)
610 | 
611 | ⭐⭐
612 | 
613 | **Funding**: Economic & Social Research Council (ESRC).
614 | 
615 | **Data collection**: 
616 | - The Institute for Social and Economic Research (ISER), University of Essex
617 | - NatCen Social Research (waves 1-5), Great Britain
618 | - Central Survey Unit of NISRA (waves 1-5), Northern Ireland
619 | - Kantar Public UK (wave 6 onwards)
620 | 
621 | **Host**: The UK data archive: https://discover.ukdataservice.ac.uk/series/?sn=2000053
622 | 
623 | **URL**: https://www.understandingsociety.ac.uk/
624 | 
625 | User Guide, including information on sampling design and weighting:
626 | https://www.understandingsociety.ac.uk/sites/default/files/downloads/documentation/mainstage/user-guides/mainstage-user-guide.pdf
627 | 
628 | **Rubrics**:
629 | 
630 | 1. **Can a survey statistician figure out how to declare the complex sampling features?**
631 | Yes. See the link above. The stratification is well described, both for Understanding Society and for its predecessor, 
632 | the British Household Panel Survey (BHPS). The sample design is complex, as this study is longitudinal. 
633 | The study included refreshment samples to increase sample sizes, include minorities, and add new regions into the study. 
634 | Sampling design variables are described in Section 3.2.7 of the User Guide referenced above.
635 | 
636 | 2. **Can an applied researcher figure out from the documentation how to set the data up for correct estimation?**
637 | This is problematic given the complexity of the design, and the different populations
638 | that this data set could be used to analyze (e.g., the latest cross-section, panel based on BHPS starting Wave 1,
639 | panel based on BHPS, GPS and EMBS starting from Wave 2, etc.).
640 | The applied researcher needs to have 
641 | a very clear definition of the target population of their analysis. 
642 | Guidance is provided on what survey weights to use depending on the choice of the target population
643 | in Section 3.3.1 of the above referenced User Guide, but the description is highly technical.
644 | 
645 | 3. **Is everything that the data user needs to know about the complex sampling contained in one place?**
646 | Yes, Section 3.2.7 of the User Guide referenced above includes all information. 
647 | 
648 | 4. **Are examples of specific syntax for performing correct design-based analyses provided?**
649 | No. Examples are provided for data management tasks such as combining waves or identifying individuals within
650 | households, but no `svyset` syntax is provided.
651 | 
652 | 5. **Are examples of analyses needed for addressing specific substantive questions provided?**
653 | There are examples of the code provided, but these analyses are unweighted,
654 | and thus arguably misleading.
655 | The description is technical, and practical examples of the kind of questions 
656 | researchers would want to answer using this data may help users to select the right set of weights.
657 | 
658 | 6. **(Bonus) Is an executive summary of the sample design provided?**
659 | No. The sample design is also too complex for this.
660 | 
661 | 7. **(Bonus) What kinds of references are provided?**
662 | The documentation includes many references to additional papers and technical reports 
663 | written on the design and analysis of *Understanding Society* Data.
664 | 
665 | **Score**: 2+/5
666 | 
667 | *Understanding Society* is a very complex survey. While the technical documentation is excellent
668 | for its technical purposes, guidance for applied researchers ranges from limited to confusing. 
669 | Examples in Stata and SPSS are provided, but they either cover data management or provide
670 | examples of basic *unweighted* analyses, i.e., they do not help researchers set the data up
671 | for correct analysis that would account for the complex sampling nature of the survey.
672 | 
673 | Accessed on 2018-12-12.
674 | ___
675 | 
676 | ### European Social Survey
677 | 
678 | ⭐⭐ 
679 | 
680 | **Funding**: European Commission, Horizon 2020. 
681 | Rounds 1-7 of ESS have been funded by national science foundations and/or European national governments.
682 | 
683 | **Data collection**: coordinated by City University, London, UK. 
684 | Data collection in the individual European countries is coordinated within every country. 
685 | 
686 | **Host**: European Social Survey, formerly at Norwegian data Archive
687 | 
688 | **URL**: www.europeansocialsurvey.org
689 | 
690 | Weighting documentation:
691 | http://www.europeansocialsurvey.org/docs/methodology/ESS_weighting_data_1.pdf
692 | 
693 | **Rubrics**:
694 | 
695 | 1. **Can a survey statistician figure out from the documentation how to set the data up for correct estimation?**
696 | Yes. The European Social Survey is a repeated cross-sectional study conducted in about 30 different countries in Europe.
697 | Sampling is conducted within every country, using either listing methods or registers (of individuals or addresses). 
698 | Three weights (design, poststratification and population equivalence weights) are included in the main datafile. 
699 | This allows for Horvitz-Thompson estimation, but not the specification of a complex survey design.
700 | However, an Integrated Sample data file does include information on stratification or cluster variables, 
701 | as well as selection probabilities for every respondent. 
702 | On top of this, a multilevel file adds regional indicators to the main datafile, allowing for multilevel analysis.
703 | 
704 | 2. **Can an applied researcher figure out from the documentation how to set the data up for correct estimation?**
705 | Yes, three weights are provided: a design weight, a poststratification weight and a population equivalence weight. 
706 | Guidance is included on how to combine the three weights, and on which weight to use in some examples of analyses.
707 | 
708 | 3. **Is everything that the data user needs to know about the complex sampling contained in one place?**
709 | Documentation is scattered across many different documents and files on the ESS website. 
710 | However, most users in practice would use one round of ESS. 
711 | In that case, the country report files contain details on how fieldwork (including sampling) was conducted.
712 | One good aspect of the European Social Survey is that the users are explicitly warned that data need to be weighted 
713 | when data are downloaded from the ESS website. However, there isn't an accompanying warning about using 
714 | the sample design variables for variance estimation as well.
715 | 
716 | 4. **Are examples of specific syntax for performing correct design-based analyses provided?**
717 | No.
718 | 
719 | 5. **Are examples of analyses needed for addressing specific substantive questions provided?**
720 | There are a few examples of data management code, but not of the complex survey analysis syntax.
721 | 
722 | 6. **(Bonus) Is an executive summary of the sample design provided?**
723 | There is an executive summary that describes the basic sampling methodology. 
724 | There is no easily accessible executive summary that explains how and why sampling differs over the countries.
725 | 
726 | 7. **(Bonus) What kinds of references are provided?**
727 | There are references to standard textbooks on complex survey design, and references to other documents 
728 | on the ESS website, with more detailed documentation.
729 | 
730 | **Score**: 2/5
731 | 
732 | The ESS is a typical example of documentation written by survey statisticians for survey statisticians,
733 | and it takes a survey statistician to process it and come up with the requisite syntax. 
734 | Novice users may be deterred by the complexity of the documentation, and may either underutilize
735 | the resource or have to bombard the survey providers with additional questions.
736 | 
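Rubric 2 above notes that the ESS documentation explains how to combine the weights. A minimal Python sketch of one common reading of that guidance follows: use the poststratification weight alone within a single country, and multiply it by the population equivalence weight when pooling countries. The function name, the rule as coded here, and the toy numbers are assumptions for illustration and should be checked against the ESS weighting document linked above.

```python
def analysis_weight(poststrat_weight, pop_equiv_weight, pooled_countries):
    """Weight for one respondent under the assumed rule described above."""
    if pooled_countries:
        return poststrat_weight * pop_equiv_weight
    return poststrat_weight


def weighted_mean(values, weights):
    """Weighted mean of values."""
    return sum(w * y for y, w in zip(values, weights)) / sum(weights)


# Toy respondents: (country, poststratification weight, population equivalence weight, y).
respondents = [
    ("A", 1.0, 0.5, 1.0),
    ("A", 1.0, 0.5, 0.0),
    ("B", 1.0, 2.0, 1.0),
    ("B", 1.0, 2.0, 1.0),
]

y = [r[3] for r in respondents]
pooled_weights = [analysis_weight(r[1], r[2], pooled_countries=True) for r in respondents]
pooled_mean = weighted_mean(y, pooled_weights)

# Within country A only, the population equivalence weight is not needed.
y_a = [r[3] for r in respondents if r[0] == "A"]
w_a = [analysis_weight(r[1], r[2], pooled_countries=False) for r in respondents if r[0] == "A"]
mean_a = weighted_mean(y_a, w_a)
print(pooled_mean, mean_a)
```

This covers point estimation only; standard errors still require the stratum and cluster variables from the Integrated Sample data file.
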
737 | Accessed on 2017-07-19.
738 | ___
739 | 
740 | ### American Time Use Survey
741 | 
742 | **Funding**:
743 | 
744 | **Data collection**:
745 | 
746 | **Host**: ICPSR
747 | 
748 | **URL**: https://www.bls.gov/tus/
749 | 
750 | **Rubrics**:
751 | 
752 | **Score**:
753 | 
754 | 
755 | 
756 | ___
757 | 
758 | ### The 2005 India Human Development Survey
759 | 
760 | ⭐
761 | 
762 | **Funding**: NIH Grants R01HD041455 and R01HD046166.
763 | 
764 | **Data collection**: Per the user guide, the fieldwork was carried out by 24 collaborating institutions under the supervision of the National Council of Applied Economic Research (NCAER) in New Delhi. Interviewers were hired by the institutions and trained by personnel from the University of Maryland and NCAER.
765 | 
766 | **Host**: The data are hosted by ICPSR (see web site below).
767 | 
768 | **URL**: https://www.icpsr.umich.edu/icpsrweb/content/DSDR/idhs-data-guide.html
769 | 
770 | **Rubrics**: 
771 | 
772 | 1. **Can a survey statistician figure out from the documentation how to set the data up for correct estimation?**
773 | No. Even after reviewing the technical documentation (https://ihds.umd.edu/sites/ihds.umd.edu/files/publications/papers/technical%20paper%201.pdf) in addition to the user guide, it is still unclear exactly what variable(s) should be used in variance estimation to reflect the stratification and cluster sampling that was performed. Reading carefully enough reveals the weights that should be used to compute population estimates at the individual or household level (SWEIGHT), but nothing is said about the appropriate design variables (stratum codes, PSU codes) in the data files. Complicating matters is the fact that two apparent PSU variables (IDPSU and PSUID) appear in the data file, with no mention of which one should be used for variance estimation and no references to an appropriate stratum variable. 
774 | 
775 | 2. **Can an applied researcher figure out from the documentation how to set the data up for correct estimation?**
776 | No. The web site above provides a brief description of the weights in Section V, but the user guide only mentions the appropriate weight variable for individuals or households (SWEIGHT) in a section describing the constructed and geographic variables. There is also no mention of which variables should be used for appropriate variance estimation, despite the fact that there are variables called PSUID and IDPSU in the data file. There is no indication of what variable (or variables) should be used to capture the stratified sampling that was performed.
777 | 
778 | 3. **Is everything that the data user needs to know about the complex sampling contained in one place?**
779 | For the most part. The user guide contains a detailed section on sampling, and the technical documentation provides even more detail aimed at survey statisticians.
780 | 
781 | 4. **Are examples of specific syntax for performing correct design-based analyses provided?**
782 | No.
783 | 
784 | 5. **Are examples of analyses needed for addressing specific substantive questions provided?**
785 | No. This documentation is in serious need of analysis examples, demonstrating syntax that should be used for design-based analyses along with indications of how to form the appropriate sample design variables for variance estimation.
786 | 
787 | 6. **(Bonus) Is an executive summary of the sample design provided?**
788 | Yes, in the user guide.
789 | 
790 | 7. **(Bonus) What kinds of references are provided?**
791 | The technical documentation does not provide any references, and the user guide briefly refers to papers that have been published using these data (which won't help users who actually want to analyze the data).
792 | 
793 | **Score**: 1/5
794 | 
795 | The documentation is incomplete, making it impossible to analyze the data correctly.
796 | 
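When a data file contains two candidate cluster identifiers and the documentation does not say which to use, one diagnostic a user might try is to check whether both variables group the records identically, in which case the choice does not affect variance estimation. The Python sketch below is a hypothetical helper with made-up values, not something taken from the IHDS documentation, and it does nothing to resolve the missing stratum variable.

```python
def same_partition(ids_a, ids_b):
    """True if two ID columns group the records identically (one-to-one mapping)."""
    ids_a, ids_b = list(ids_a), list(ids_b)
    if len(ids_a) != len(ids_b):
        raise ValueError("ID columns must have the same length")
    a_to_b, b_to_a = {}, {}
    for a, b in zip(ids_a, ids_b):
        # setdefault stores the first pairing seen and returns it on later visits.
        if a_to_b.setdefault(a, b) != b or b_to_a.setdefault(b, a) != a:
            return False
    return True


# Made-up values standing in for two candidate PSU variables.
candidate_1 = [101, 101, 102, 103]
candidate_2 = ["a", "a", "b", "c"]
candidate_3 = ["a", "b", "b", "c"]

print(same_partition(candidate_1, candidate_2))
print(same_partition(candidate_1, candidate_3))
```
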
797 | Accessed on 2018-12-21.
798 | ___
799 | 
800 | 
801 | ### A Portrait of Jewish Americans
802 | 
803 | ⭐⭐⭐⭐
804 | 
805 | **Funding**: The Pew Research Center’s 2013 survey of U.S. Jews was conducted by the 
806 | center’s Religion & Public Life Project with generous funding from 
807 | The Pew Charitable Trusts and the Neubauer Family Foundation.
808 | 
809 | **Data collection**: Abt SRBI under contract to Pew Research Center 
810 | 
811 | **Host**: Pew Research Center http://www.pewresearch.org/
812 | 
813 | **URL**: http://www.pewforum.org/dataset/a-portrait-of-jewish-americans/
814 | 
815 | **Rubrics**: how well the documentation matches the desired criteria
816 | 
817 | 1. **Can a survey statistician figure out from the documentation how to set the data up for correct estimation?**
818 | Yes; survey documentation explains the differences between the household and the person-level weights,
819 | and stresses that the bootstrap weights should be used for variance estimation.
820 | 
821 | 2. **Can an applied researcher figure out from the documentation how to set the data up for correct estimation?**
822 | Yes; Stata syntax is provided early in the document, or can be found by search in the PDF file.
823 | 
824 | 3. **Is everything described succinctly in one place, or scattered throughout the document?** 
825 | Yes; all of the relevant information is contained in the **Key Elements of the Data** section in about 2 pages.
826 | 
827 | 4. **Are examples of specific syntax to specify survey settings provided?** 
828 | Yes; item 6 of **Key Elements of the Data** section identifies the variables and provides Stata syntax 
829 | for individual-level and household-level analyses. 
830 | (Searching for any of `Stata`, `SAS`, `weight`, or `svyset` leads the researcher to this information.)
831 | A warning is given that the SPSS Statistics Base package cannot correctly compute standard errors.
832 | 
833 | 5. **Are there examples given for how to answer substantive research questions?**
834 | No examples are given.
835 | 
836 | 6. (Bonus) **Is an executive summary description of the sample design available?**
837 | Sample design is described in painstaking detail in about 9 pages. No short summary of the design is available
838 | from the technical documentation, although such a summary can be found in the substantive report
839 | (http://www.pewforum.org/2013/10/01/jewish-american-beliefs-attitudes-culture-survey/).
840 | 
841 | 7. (Bonus) **What kinds of references are provided?**
842 | No additional references are given.
843 | 
844 | **Score**:
845 | 4+/5
846 | 
847 | A Portrait of Jewish Americans is a very well described survey that most researchers will be able to 
848 | analyze correctly by following the instructions of the data provider. 
849 | Slight limitations of the documentation are that examples of the settings are only 
850 | given for one package, Stata, and no examples of substantive analyses, e.g. those leading to the primary
851 | tables in the substantive report, are provided.
852 | 
853 | Accessed on 2018-12-11.
854 | 
855 | ___
856 | 
857 | ### Placeholder: Survey name
858 | 
859 | **Funding**: the ultimate client of the study
860 | 
861 | **Data collection**: the organization who collected and documented the data
862 | 
863 | **Host**: the organization that hosts the data
864 | 
865 | **URL**: 
866 | 
867 | **Rubrics**: how well the documentation matches the desired criteria
868 | 
869 | **Score**:
870 | 
871 | Accessed on [DATE].
872 | ___
873 | 
874 | ## Recommendations for survey organizations
875 | 
876 | Follow best practices and the above rubrics, and strive to earn 5 stars in our ratings!
877 | 
878 | Documentation of survey settings can also be strengthened in the existing documentation standards.
879 | 
880 | ### The Data Documentation Initiative
881 | 
882 | DDI (http://www.ddialliance.org/) is an international standard for describing the data 
883 | produced by surveys and other observational methods in the social, behavioral, economic, and health sciences. 
884 | DDI is a free standard that can document and manage different stages in the research data lifecycle, 
885 | such as conceptualization, collection, processing, distribution, discovery, and archiving. 
886 | Documenting data with DDI facilitates understanding, interpretation, and use -- 
887 | by people, software systems, and computer networks. 
888 | 
889 | The version of DDI as of January 2019, DDI 3, provides support to document the weights in a free format. 
890 | DDI Codebook 2.5 provides the following specifications:
891 | 
892 | - http://www.ddialliance.org/Specification/DDI-Codebook/2.5/XMLSchema/field_level_documentation_files/schemas/codebook_xsd/elements/weight.html
893 | 
894 | 
895 | ## Additional resources
896 | 
897 | Piotr Jabkowski (Adam Mickiewicz University, Poznan, Poland) has been maintaining the Surveys Quality Assessment Database (SQAD).
898 | This project compiles metadata from 67 waves of six major European cross-country comparative surveys, 
899 | totaling 1537 individual surveys. 
900 | See [the project webpage](https://www.researchgate.net/project/Surveys-Quality-Assessment-Database-SQAD)
901 | and [the Technical Report 1.0](https://www.researchgate.net/publication/327136417_COMPARATIVE_ANALYSIS_OF_THE_QUALITY_OF_SURVEY_SAMPLES_IN_THE_CROSSNATIONAL_STUDIES_SURVEY_ARCHIVISATION_AND_META-BASE_OF_RESULTS_TECHNICAL_REPORT_VERSION_10).
902 | 
903 | ___
904 | 
905 | ## Acknowledgements
906 | 
907 | The initial impetus for this work came from the AAPOR presentation
908 | by Margaret Levenstein on how ICPSR handles metadata of the surveys they 
909 | store and distribute, and from 
910 | [the discussion on Twitter that followed](https://twitter.com/MaryELosch/status/997213578917707778).
911 | 
912 | <a href='http://ecotrust-canada.github.io/markdown-toc/'>Table of contents generated with markdown-toc</a>
913 | 
914 | 


--------------------------------------------------------------------------------