├── .Rbuildignore
├── .gitattributes
├── .gitignore
├── .travis.yml
├── CONDUCT.md
├── DESCRIPTION
├── NAMESPACE
├── NEWS.md
├── R
│   ├── deprecated_functions.R
│   └── simulation_functions.R
├── README.Rmd
├── README.md
├── codecov.yml
├── codemeta.json
├── cran-comments.md
├── data-raw
│   └── prep-cakemap.R
├── docs
│   ├── CONDUCT.html
│   ├── authors.html
│   ├── docsearch.css
│   ├── docsearch.js
│   ├── index.html
│   ├── link.svg
│   ├── news
│   │   └── index.html
│   ├── pkgdown.css
│   ├── pkgdown.js
│   ├── pkgdown.yml
│   └── reference
│       ├── check_constraint.html
│       ├── check_ind.html
│       ├── extract.html
│       ├── extract_weights.html
│       ├── index.html
│       ├── integerise.html
│       ├── rake.html
│       ├── rk_extract.html
│       ├── rk_integerise.html
│       ├── rk_rake.html
│       ├── rk_weight.html
│       └── weight.html
├── inst
│   └── extdata
│       ├── cakemap_cons.csv
│       └── cakemap_inds.csv
├── man
│   ├── extract.Rd
│   ├── extract_weights.Rd
│   ├── integerise.Rd
│   ├── rake.Rd
│   ├── rk_extract.Rd
│   ├── rk_integerise.Rd
│   ├── rk_rake.Rd
│   ├── rk_weight.Rd
│   └── weight.Rd
├── rakeR.Rproj
└── tests
    ├── cakemap_cons.csv
    ├── cakemap_inds.csv
    ├── testthat.R
    └── testthat
        ├── test_deprecated.R
        ├── test_rk_extract.R
        ├── test_rk_integerise.R
        ├── test_rk_rake.R
        └── test_rk_weight.R
/.Rbuildignore:
--------------------------------------------------------------------------------
1 | ^codemeta\.json$
2 | ^.*\.Rproj$
3 | ^\.Rproj\.user$
4 |
5 | .travis.yml
6 | codecov.yml
7 | cran-comments.md
8 | data-raw/
9 | ^README\.Rmd$
10 | ^README-.*\.png$
11 | ^CONDUCT\.md$
12 | inst/
13 | revdep/
14 | docs/
15 | ^docs$
16 |
--------------------------------------------------------------------------------
/.gitattributes:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/philmikejones/rakeR/676dbb0c36e7cf6beeac6f3fb8d350d1758123cc/.gitattributes
--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
1 | .Rproj.user
2 | .Rhistory
3 | .RData
4 | .Ruserdata
5 | revdep/
6 |
--------------------------------------------------------------------------------
/.travis.yml:
--------------------------------------------------------------------------------
1 | language: r
2 |
3 | cache: packages
4 |
5 | r_packages:
6 |   - covr
7 |
8 | notifications:
9 |   on_success: change
10 |   on_failure: change
11 |
12 | after_success:
13 |   - Rscript -e 'library(covr); codecov()'
14 |
--------------------------------------------------------------------------------
/CONDUCT.md:
--------------------------------------------------------------------------------
1 | # Contributor Code of Conduct
2 |
3 | As contributors and maintainers of this project, we pledge to respect all people who
4 | contribute through reporting issues, posting feature requests, updating documentation,
5 | submitting pull requests or patches, and other activities.
6 |
7 | We are committed to making participation in this project a harassment-free experience for
8 | everyone, regardless of level of experience, gender, gender identity and expression,
9 | sexual orientation, disability, personal appearance, body size, race, ethnicity, age, or religion.
10 |
11 | Examples of unacceptable behavior by participants include the use of sexual language or
12 | imagery, derogatory comments or personal attacks, trolling, public or private harassment,
13 | insults, or other unprofessional conduct.
14 |
15 | Project maintainers have the right and responsibility to remove, edit, or reject comments,
16 | commits, code, wiki edits, issues, and other contributions that are not aligned to this
17 | Code of Conduct. Project maintainers who do not follow the Code of Conduct may be removed
18 | from the project team.
19 |
20 | Instances of abusive, harassing, or otherwise unacceptable behavior may be reported by
21 | opening an issue or contacting one or more of the project maintainers.
22 |
23 | This Code of Conduct is adapted from the Contributor Covenant
24 | (http://contributor-covenant.org), version 1.0.0, available at
25 | http://contributor-covenant.org/version/1/0/0/
26 |
--------------------------------------------------------------------------------
/DESCRIPTION:
--------------------------------------------------------------------------------
1 | Package: rakeR
2 | Title: Easy Spatial Microsimulation (Raking) in R
3 | Version: 0.2.1.9000
4 | Date: 2018-02-26
5 | Authors@R: c(
6 |     person("Phil Mike", "Jones",
7 |            email = "philmikejones@gmail.com",
8 |            role = c("aut", "cre"),
9 |            comment = c(ORCID = "0000-0001-5173-3245")),
10 |     person("Robin", "Lovelace", role = "aut",
11 |            comment = "Many functions are based on code by Robin Lovelace and Morgane Dumont"),
12 |     person("Morgane", "Dumont", role = "aut",
13 |            comment = "Many functions are based on code by Robin Lovelace and Morgane Dumont"),
14 |     person("Andrew", "Smith", role = "ctb"))
15 | Description: Functions for performing spatial microsimulation ('raking')
16 |     in R.
17 | Depends:
18 |     R (>= 3.4.0)
19 | License: GPL-3
20 | Encoding: UTF-8
21 | LazyData: true
22 | RoxygenNote: 6.1.1
23 | Imports: ipfp,
24 |     wrswoR
25 | Suggests: testthat,
26 |     readr
27 | URL: https://philmikejones.github.io/rakeR/
28 | BugReports: https://github.com/philmikejones/rakeR/issues
29 |
--------------------------------------------------------------------------------
/NAMESPACE:
--------------------------------------------------------------------------------
1 | # Generated by roxygen2: do not edit by hand
2 |
3 | export(extract)
4 | export(extract_weights)
5 | export(integerise)
6 | export(rake)
7 | export(rk_extract)
8 | export(rk_integerise)
9 | export(rk_rake)
10 | export(rk_weight)
11 | export(weight)
12 |
--------------------------------------------------------------------------------
/NEWS.md:
--------------------------------------------------------------------------------
1 | v 0.2.2
2 | =======
3 |
4 | * Functions have been renamed to include a `rk_` prefix. Function names without prefixes have been deprecated but will continue to work for now so as not to affect existing code.
5 | * Increased code coverage.
6 | * check_constraint() and check_ind() are now deprecated. These checks are carried out by the weight() and/or integerise()/extract() functions automatically.
7 | * Update README.md documentation
8 | * Add additional unit tests - thanks Derrick Atherton for feedback
9 | * Add appropriate acknowledgements for source of data set used for examples and testing.
10 |
11 | v 0.2.1
12 | =======
13 |
14 | Patch release:
15 |
16 | * Fixes to examples and tests to use correct variable labels and names (thanks Derrick Atherton for pointing this out).
17 | * Fix version number in NEWS
18 |
19 | v 0.2.0
20 | =======
21 |
22 | * integerise() now uses the wrswoR package for sampling without replacement.
23 |   This is on the order of hundreds of times faster, reducing the time taken for the function to return from hours to minutes.
24 |   I have implemented the sample_int_crank() function as this has given results
25 |   similar to those of base R's sample() in testing, so this should minimise changes
26 |   between the two.
27 |   See https://stackoverflow.com/questions/15113650/faster-weighted-sampling-without-replacement or https://cran.r-project.org/package=wrswoR for details of the
28 |   implementation.
29 | * simulate() is deprecated. Instead of weight() %>% integerise() %>% simulate(),
30 | just use weight() %>% integerise(). This is to improve consistency with the
31 | steps to produce fractional weights (weight() %>% extract()).
32 | * extract_weights() has been deprecated. Use extract() instead.
33 | * extract() (previously extract_weights()) now stops if it encounters a numeric
34 | variable. See https://github.com/philmikejones/rakeR/issues/49
35 | * integerise() now returns weights unmodified with a note if they are
36 | already integers. See https://github.com/philmikejones/rakeR/issues/42 and https://github.com/philmikejones/rakeR/issues/46
37 | * set.seed() is no longer hard-coded in the integerise() function and can be
38 | specified as a function argument. See:
39 | https://github.com/philmikejones/rakeR/issues/41
40 |
41 | v 0.1.2
42 | =======
43 |
44 | Patch release
45 |
46 | New functions
47 | -------------
48 |
49 | * check_ind() checks individual level data for common errors
50 |
51 |
52 | Fixes:
53 | ------
54 |
55 | * Fix weight() #33: https://github.com/philmikejones/rakeR/issues/33
56 | * check functions now named prep functions to prevent conflicts with validation
57 | * Add date to DESCRIPTION for bibliography building
58 |
59 |
60 | v 0.1.1
61 | =======
62 |
63 | Initial CRAN release
64 |
65 | Fixes:
66 | ------
67 |
68 | * Fix license issue: remove LICENSE, state GPL-3
69 |
70 |
71 | v 0.1.0
72 | =======
73 |
74 | New functions
75 | -------------
76 |
77 | * weight(),
78 | * integerise(),
79 | * simulate(),
80 | * rake(), and
81 | * check_constraint()
82 |
--------------------------------------------------------------------------------
/R/deprecated_functions.R:
--------------------------------------------------------------------------------
1 | #' weight
2 | #'
3 | #' Deprecated. Use rk_weight()
4 | #'
5 | #' @param cons A data frame containing all the constraints. This
6 | #' should be in the format of one row per zone, one column per constraint
7 | #' category. The first column should be a zone code; all other columns must be
8 | #' numeric counts.
9 | #' @param inds A data frame containing individual-level (survey) data. This
10 | #' should be in the format of one row per individual, one column per
11 | #' constraint. The first column should be an individual ID.
12 | #' @param vars A character vector of variables that constrain the simulation
13 | #' (i.e. independent variables)
14 | #' @param iterations The number of iterations the algorithm should complete.
15 | #' Defaults to 10
16 | #'
17 | #' @return A data frame of fractional weights for each individual in each zone
18 | #' with zone codes recorded in column names and individual id recorded in row
19 | #' names.
20 | #' @export
21 | #'
22 | #' @examples
23 | #' # Deprecated. Use rk_weight()
24 | weight <- function(cons, inds, vars = NULL, iterations = 10) {
25 |
26 |   .Deprecated("rk_weight")
27 |
28 |   rk_weight(cons = cons, inds = inds, vars = vars, iterations = iterations)
29 |
30 | }
31 |
32 | #' extract
33 | #'
34 | #' Deprecated. Use rk_extract() instead.
35 | #'
36 | #' @param weights A weight table, typically produced by rakeR::weight()
37 | #' @param inds The individual level data
38 | #' @param id The unique id variable in the individual level data (inds),
39 | #' usually the first column
40 | #'
41 | #' @return A data frame with zones and aggregated simulated values for each
42 | #' variable
43 | #' @export
44 | #'
45 | #' @examples
46 | #' \dontrun{
47 | #' Deprecated. Use rk_extract()
48 | #' }
49 | extract <- function(weights, inds, id) {
50 |
51 |   .Deprecated("rk_extract")
52 |
53 |   rk_extract(weights = weights, inds = inds, id = id)
54 |
55 | }
56 |
57 |
58 | #' extract_weights
59 | #'
60 | #' Deprecated: use rakeR::rk_extract()
61 | #'
62 | #' @param weights A weight table, typically produced using rakeR::rk_weight()
63 | #' @param inds The individual level data
64 | #' @param id The unique id variable in the individual level data (inds),
65 | #' usually the first column
66 | #'
67 | #' @return A data frame with zones and aggregated simulated values for each
68 | #' variable
69 | #' @export
70 | #'
71 | #' @examples
72 | #' \dontrun{
73 | #' extract_weights() is deprecated, use rk_extract() instead
74 | #' }
75 | extract_weights <- function(weights, inds, id) {
76 |
77 | stop("extract_weights() is deprecated. Use rk_extract()")
78 |
79 | }
80 |
81 |
82 | #' integerise
83 | #'
84 | #' Deprecated. Use rk_integerise()
85 | #'
86 | #' @param weights A matrix or data frame of fractional weights, typically
87 | #' provided by \code{rakeR::weight()}
88 | #' @param inds The individual--level data (i.e. one row per individual)
89 | #' @param method The integerisation method specified as a character string.
90 | #' Defaults to \code{"trs"}; currently other methods are not implemented.
91 | #' @param seed The seed to use, defaults to 42.
92 | #'
93 | #' @return A data frame of integerised cases
94 | #' @aliases integerize
95 | #' @export
96 | #'
97 | #' @examples
98 | #' \dontrun{
99 | #' Deprecated. Use rk_integerise()
100 | #' }
101 | integerise <- function(weights, inds, method = "trs", seed = 42) {
102 |
103 |   .Deprecated("rk_integerise")
104 |
105 |   rk_integerise(weights = weights, inds = inds, method = method, seed = seed)
106 |
107 | }
108 |
109 |
110 | #' rake
111 | #'
112 | #' Deprecated. Use rk_rake()
113 | #'
114 | #' @param cons A data frame of constraint variables
115 | #' @param inds A data frame of individual--level (survey) data
116 | #' @param vars A character vector of constraint variable names to iterate over
117 | #' @param output A string specifying the desired output, either "fraction"
118 | #' (rk_extract()) or "integer" (rk_integerise())
119 | #' @param iterations The number of iterations to perform. Defaults to 10.
120 | #' @param ... Additional arguments to pass to depending on desired output:
121 | #' \itemize{
122 | #' \item{if "fraction" specify 'id' (see rk_extract() documentation)}
123 | #' \item{if "integer" specify 'method' and 'seed' (see rk_integerise()
124 | #' documentation)}
125 | #' }
126 | #'
127 | #' @return A data frame with extracted weights (if output == "fraction", the
128 | #' default) or integerised cases (if output == "integer")
129 | #' @export
130 | #'
131 | #' @examples
132 | #' \dontrun{
133 | #' Deprecated. Use rk_rake()
134 | #' }
135 | rake <- function(cons, inds, vars, output = "fraction", iterations = 10, ...) {
136 |
137 |   .Deprecated("rk_rake")
138 |
139 |   # Forward every extra argument (id, method, seed) unchanged to rk_rake()
140 |
141 |   result <- rk_rake(
142 |     cons = cons, inds = inds, vars = vars,
143 |     output = output, iterations = iterations,
144 |     ...
145 |   )
146 |
147 |   result
148 |
149 | }
150 |
--------------------------------------------------------------------------------
/README.Rmd:
--------------------------------------------------------------------------------
1 | ---
2 | output: github_document
3 | ---
4 |
5 | ```{r, echo = FALSE}
6 | knitr::opts_chunk$set(
7 | collapse = TRUE,
8 | comment = "#>",
9 | fig.path = "README-"
10 | )
11 | ```
12 |
13 |
14 | # rakeR
15 |
16 | [](https://travis-ci.org/philmikejones/rakeR)
17 | [](https://codecov.io/gh/philmikejones/rakeR)
18 | [](https://cran.r-project.org/package=rakeR)
19 | [](https://doi.org/10.5281/zenodo.821506)
20 | [](https://www.repostatus.org/#active)
21 |
22 |
23 | Create a spatial microsimulated data set in R using iterative proportional
24 | fitting ('raking').
25 |
26 |
27 | ## Install
28 |
29 | Install the latest stable version from CRAN:
30 |
31 | ```{r install-cran, eval=FALSE, include=TRUE}
32 | install.packages("rakeR")
33 | ```
34 |
35 | Or install the development version with `devtools`:
36 |
37 | ```{r install-rakeR, eval=FALSE, include=TRUE}
38 | # Obtain devtools if you don't already have it installed
39 | # install.packages("devtools")
40 |
41 | # Install rakeR development version from GitHub
42 | devtools::install_github("philmikejones/rakeR")
43 | ```
44 |
45 | Load the package with:
46 |
47 | ```{r load-rakeR}
48 | library("rakeR")
49 | ```
50 |
51 |
52 | ## Overview
53 |
54 | `rakeR` has three main functions.
55 | The first stage is always to use `rk_weight()` to produce a matrix of fractional weights.
56 | This matrix stores weights for each individual in each zone.
57 |
58 | From this weight matrix, `rakeR` has functions to create fractional weights (`rk_extract()`) or integerised cases (`rk_integerise()`), depending on your needs and use cases.
59 | Fractional (`rk_extract()`ed) weights are generally more accurate, while integer cases are probably the most intuitive to use and are useful as inputs
60 | for further modeling.
61 |
62 | To create fractional weights use `rk_weight()` then `rk_extract()`, and to produce integerised weights use `rk_weight()` then `rk_integerise()`.
63 |
64 |
65 | ## Inputs
66 |
67 | To perform the weighting you should supply two data frames.
68 | One data frame should contain the constraint information (`cons`) with counts per category for each zone (e.g. census counts).
69 | The other data frame should contain the individual--level data (`inds`), i.e. one row per individual.
70 |
71 | In addition, it is necessary to supply a character vector with the names of the constraint variables in `inds` (`vars`).
72 | This is so that `rakeR` knows which are the constraint variables and which variables it should be simulating as an output.
73 |
74 | Below are examples of `cons`, `inds`, and `vars`.
75 |
76 | ```{r data}
77 | cons <- data.frame(
78 | "zone" = letters[1:3],
79 | "age_0_49" = c(8, 2, 7),
80 | "age_gt_50" = c(4, 8, 4),
81 | "sex_f" = c(6, 6, 8),
82 | "sex_m" = c(6, 4, 3),
83 | stringsAsFactors = FALSE
84 | )
85 |
86 | inds <- data.frame(
87 | "id" = LETTERS[1:5],
88 | "age" = c("age_gt_50", "age_gt_50", "age_0_49", "age_gt_50", "age_0_49"),
89 | "sex" = c("sex_m", "sex_m", "sex_m", "sex_f", "sex_f"),
90 | "income" = c(2868, 2474, 2231, 3152, 2473)
91 | )
92 |
93 | vars <- c("age", "sex")
94 | ```
95 |
96 | It is _essential_ that the unique levels in the constraint variables in the
97 | `inds` data set match the variable names in the `cons` data set.
98 | For example, `age_0_49` and `age_gt_50` are variable names in `cons` and the unique levels of the `age` variable in `inds` precisely match these:
99 |
100 | ```{r labels-test}
101 | all.equal(
102 |   sort(unique(as.character(inds$age))), colnames(cons[, 2:3])  # cons[, 1] is the zone column
103 | )
104 | ```
105 |
106 | Without this, the functions do not know how to match the `inds` and `cons` data
107 | and will fail so as not to return spurious results.
108 |
109 |
110 | ## `rk_weight()`
111 |
112 | (Re-)weighting is done with `rk_weight()` which returns a data frame of raw weights.
113 |
114 | ```{r weight}
115 | weights <- rk_weight(cons = cons, inds = inds, vars = vars)
116 | weights
117 | ```
118 |
119 | The raw weights tell you how frequently each individual (`A`-`E`) should appear
120 | in each zone (`a`-`c`).
121 | The raw weights are useful when validating and checking performance of the
122 | model, so it can be necessary to save these separately.
123 | They aren't very useful for analysis however, so we can `rk_extract()` or `rk_integerise()` them into a useable form.
124 |
125 |
126 | ## `rk_extract()`
127 |
128 | `rk_extract()` produces aggregated totals of the simulated data for each category in each zone.
129 | `rk_extract()`ed data is generally more accurate than `rk_integerise()`d data, although you should judge from context and domain knowledge whether the extra decimal places are spurious precision.
130 | Because `rk_extract()` creates a column for each level of each variable, numerical variables (e.g. income) must be removed or `cut()` (otherwise the result would include a new column for each unique numerical value):
131 |
132 | ```{r extract}
133 | inds$income <- cut(inds$income, breaks = 2, include.lowest = TRUE,
134 | labels = c("low", "high"))
135 |
136 | ext_weights <- rk_extract(weights, inds = inds, id = "id")
137 | ext_weights
138 | ```
139 |
140 | `rk_extract()` returns one row per zone, and the total of each category (for
141 | example female and male, or high and low income) will match the known
142 | population.
143 |
144 |
145 | ## `rk_integerise()`
146 |
147 | The `rk_integerise()` function produces a simulated data frame populated with simulated individuals.
148 | This is typically useful when:
149 |
150 | * You need to include numerical variables, such as income in the example.
151 | * You want individual cases to use as input to a dynamic or agent-based model.
152 | * You want 'case studies' to illustrate characteristics of individuals in an
153 | area.
154 | * Individual-level data is more intuitive to work with.
155 |
156 | ```{r integerise}
157 | int_weights <- rk_integerise(weights, inds = inds)
158 | int_weights[1:6, ]
159 | ```
160 |
161 | `rk_integerise()` returns one row per case, and the number of rows will match
162 | the known population (taken from `cons`).
163 |
164 |
165 | ## `rk_rake()`
166 |
167 | `rk_rake()` is a wrapper for `rk_weight() %>% rk_extract()` or
168 | `rk_weight() %>% rk_integerise()`.
169 | This is useful if the raw weights (from `rk_weight()`) are not required.
170 | The desired output is specified with the `output` argument, which can be
171 | `"fraction"` (the default) or `"integer"`.
172 | The function takes the following arguments in all cases:
173 |
174 | * `cons`
175 | * `inds`
176 | * `vars`
177 | * `output` (default `"fraction"`)
178 | * `iterations` (default 10)
179 |
180 | Additional arguments are required depending on the output requested.
181 | For `output = "fraction"`:
182 |
183 | * `id`
184 |
185 | For `output = "integer"`:
186 |
187 | * `method` (default `"trs"`)
188 | * `seed` (default 42)
189 |
190 | Details of these context-specific arguments can be found in the
191 | respective documentation for `rk_integerise()` or `rk_extract()`.
192 |
193 | ```{r rake-int-example}
194 | rake_int <- rk_rake(cons, inds, vars, output = "integer",
195 | method = "trs", seed = 42)
196 | rake_int[1:6, ]
197 | ```
198 |
199 | ```{r rake-frac-example}
200 | rake_frac <- rk_rake(cons, inds, vars, output = "fraction", id = "id")
201 | rake_frac
202 | ```
203 |
204 |
205 | ## Contributing
206 |
207 | Please note that this project is released with a [Contributor Code of Conduct](CONDUCT.md).
208 | By participating in this project you agree to abide by its terms.
209 |
210 |
211 | ## Issues and feedback
212 |
213 | Feedback on the API,
214 | [bug reports/issues](https://github.com/philmikejones/rakeR/issues),
215 | and pull requests are very welcome.
216 |
217 |
218 | ## Acknowledgements
219 |
220 | Many of the functions in this package are based on code written by
221 | [Robin Lovelace](https://github.com/Robinlovelace) and
222 | [Morgane Dumont](https://github.com/modumont) for their book
223 | [*Spatial Microsimulation with R* (2016), Chapman and Hall/CRC Press](https://www.crcpress.com/Spatial-Microsimulation-with-R/Lovelace-Dumont/p/book/9781498711548).
224 |
225 | Their book is an excellent resource for learning
226 | about spatial microsimulation and understanding what's going on under the hood
227 | of this package.
228 |
229 | The book and code are licensed under the terms below:
230 |
231 | Copyright (c) 2014 Robin Lovelace
232 |
233 | Permission is hereby granted, free of charge, to any person obtaining a copy
234 | of this software and associated documentation files (the "Software"), to deal
235 | in the Software without restriction, including without limitation the rights
236 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
237 | copies of the Software, and to permit persons to whom the Software is
238 | furnished to do so, subject to the following conditions:
239 |
240 | The above copyright notice and this permission notice shall be included in all
241 | copies or substantial portions of the Software.
242 |
243 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
244 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
245 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
246 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
247 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
248 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
249 | SOFTWARE.
250 |
251 |
252 | The [reweighting (ipfp) algorithm](https://github.com/awblocker/ipfp) is
253 | written by Andrew Blocker.
254 |
255 | The [`wrswoR` package](http://krlmlr.github.io/wrswoR/) used for fast sampling
256 | without replacement is written by Kirill Müller.
257 |
258 |
259 | Thanks to [Tom Broomhead](http://mhs.group.shef.ac.uk/members/tom-broomhead/)
260 | for his feedback on error handling and suggestions on function naming, to [Andrew Smith](https://github.com/virgesmith) for bug fixes, and Derrick Atherton for suggestions, feedback, and testing.
261 |
262 |
263 | Data used in some of the examples and tests ('cakeMap') are anonymised data from the [Adult Dental Health Survey](https://data.gov.uk/dataset/adult_dental_health_survey), used under terms of the [Open Government Licence](http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/).
264 |
265 |
266 | ## Contact
267 |
268 | philmikejones at gmail dot com
269 |
270 |
271 | ## License
272 |
273 | Copyright 2016-18 Phil Mike Jones.
274 |
275 | rakeR is free software: you can redistribute it and/or modify
276 | it under the terms of the GNU General Public License as published by
277 | the Free Software Foundation, either version 3 of the License, or
278 | (at your option) any later version.
279 |
280 | rakeR is distributed in the hope that it will be useful,
281 | but WITHOUT ANY WARRANTY; without even the implied warranty of
282 | MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
283 | GNU General Public License for more details.
284 |
285 | You should have received a copy of the GNU General Public License
286 | along with rakeR. If not, see <http://www.gnu.org/licenses/>.
287 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 |
2 | rakeR
3 | =====
4 |
5 | [](https://travis-ci.org/philmikejones/rakeR) [](https://codecov.io/gh/philmikejones/rakeR) [](https://cran.r-project.org/package=rakeR) [](https://doi.org/10.5281/zenodo.821506) [](https://www.repostatus.org/#active)
6 |
7 | Create a spatial microsimulated data set in R using iterative proportional fitting ('raking').
8 |
9 | Install
10 | -------
11 |
12 | Install the latest stable version from CRAN:
13 |
14 | ``` r
15 | install.packages("rakeR")
16 | ```
17 |
18 | Or install the development version with `devtools`:
19 |
20 | ``` r
21 | # Obtain devtools if you don't already have it installed
22 | # install.packages("devtools")
23 |
24 | # Install rakeR development version from GitHub
25 | devtools::install_github("philmikejones/rakeR")
26 | ```
27 |
28 | Load the package with:
29 |
30 | ``` r
31 | library("rakeR")
32 | ```
33 |
34 | Overview
35 | --------
36 |
37 | `rakeR` has three main functions. The first stage is always to use `rk_weight()` to produce a matrix of fractional weights. This matrix stores weights for each individual in each zone.
38 |
39 | From this weight matrix, `rakeR` has functions to create fractional weights (`rk_extract()`) or integerised cases (`rk_integerise()`), depending on your needs and use cases. Fractional (`rk_extract()`ed) weights are generally more accurate, while integer cases are probably the most intuitive to use and are useful as inputs for further modeling.
40 |
41 | To create fractional weights use `rk_weight()` then `rk_extract()`, and to produce integerised weights use `rk_weight()` then `rk_integerise()`.
42 |
43 | Inputs
44 | ------
45 |
46 | To perform the weighting you should supply two data frames. One data frame should contain the constraint information (`cons`) with counts per category for each zone (e.g. census counts). The other data frame should contain the individual--level data (`inds`), i.e. one row per individual.
47 |
48 | In addition, it is necessary to supply a character vector with the names of the constraint variables in `inds` (`vars`). This is so that `rakeR` knows which are the constraint variables and which variables it should be simulating as an output.
49 |
50 | Below are examples of `cons`, `inds`, and `vars`.
51 |
52 | ``` r
53 | cons <- data.frame(
54 | "zone" = letters[1:3],
55 | "age_0_49" = c(8, 2, 7),
56 | "age_gt_50" = c(4, 8, 4),
57 | "sex_f" = c(6, 6, 8),
58 | "sex_m" = c(6, 4, 3),
59 | stringsAsFactors = FALSE
60 | )
61 |
62 | inds <- data.frame(
63 | "id" = LETTERS[1:5],
64 | "age" = c("age_gt_50", "age_gt_50", "age_0_49", "age_gt_50", "age_0_49"),
65 | "sex" = c("sex_m", "sex_m", "sex_m", "sex_f", "sex_f"),
66 | "income" = c(2868, 2474, 2231, 3152, 2473)
67 | )
68 |
69 | vars <- c("age", "sex")
70 | ```
71 |
72 | It is *essential* that the unique levels in the constraint variables in the `inds` data set match the variable names in the `cons` data set. For example, `age_0_49` and `age_gt_50` are variable names in `cons` and the unique levels of the `age` variable in `inds` precisely match these:
73 |
74 | ``` r
75 | all.equal(
76 |   sort(unique(as.character(inds$age))), colnames(cons[, 2:3])  # cons[, 1] is the zone column
77 | )
78 | #> [1] TRUE
79 | ```
80 |
81 | Without this, the functions do not know how to match the `inds` and `cons` data and will fail so as not to return spurious results.
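
A quick pre-flight check along the following lines can catch label mismatches before any weighting is attempted. This is a minimal base-R sketch, not part of `rakeR`: the helper name `unmatched_levels()` is hypothetical, and it assumes the first column of `cons` holds the zone code.

```r
# Example inputs, copied from the README (income omitted, it is not a constraint)
cons <- data.frame(
  zone      = letters[1:3],
  age_0_49  = c(8, 2, 7),
  age_gt_50 = c(4, 8, 4),
  sex_f     = c(6, 6, 8),
  sex_m     = c(6, 4, 3),
  stringsAsFactors = FALSE
)
inds <- data.frame(
  id  = LETTERS[1:5],
  age = c("age_gt_50", "age_gt_50", "age_0_49", "age_gt_50", "age_0_49"),
  sex = c("sex_m", "sex_m", "sex_m", "sex_f", "sex_f"),
  stringsAsFactors = FALSE
)
vars <- c("age", "sex")

# Hypothetical helper: labels used in `inds` that have no column in `cons`
unmatched_levels <- function(cons, inds, vars) {
  seen <- unique(unlist(lapply(inds[vars], as.character)))
  setdiff(seen, colnames(cons)[-1])  # drop the zone column (assumed first)
}

unmatched_levels(cons, inds, vars)      # character(0): every label has a column

# A deliberately mislabelled copy shows what a failure looks like
inds_bad <- inds
inds_bad$sex[inds_bad$sex == "sex_f"] <- "female"
unmatched_levels(cons, inds_bad, vars)  # "female"
```

Converting with `as.character()` first means the check behaves the same whether the constraint variables are stored as factors or as plain character vectors.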
82 |
83 | `rk_weight()`
84 | -------------
85 |
86 | (Re-)weighting is done with `rk_weight()` which returns a data frame of raw weights.
87 |
88 | ``` r
89 | weights <- rk_weight(cons = cons, inds = inds, vars = vars)
90 | weights
91 | #> a b c
92 | #> A 1.227998 1.7250828 0.7250828
93 | #> B 1.227998 1.7250828 0.7250828
94 | #> C 3.544004 0.5498344 1.5498344
95 | #> D 1.544004 4.5498344 2.5498344
96 | #> E 4.455996 1.4501656 5.4501656
97 | ```
98 |
99 | The raw weights tell you how frequently each individual (`A`-`E`) should appear in each zone (`a`-`c`). The raw weights are useful when validating and checking performance of the model, so it can be necessary to save these separately. They aren't very useful for analysis however, so we can `rk_extract()` or `rk_integerise()` them into a useable form.
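
To give a feel for where such weights come from, the sketch below runs iterative proportional fitting for a single zone in base R. It is an illustration under stated assumptions only: `rakeR` delegates the fitting to the `ipfp` package, and the function `ipf_zone()` and its arguments are hypothetical names, not part of either package.

```r
# Example inputs, copied from the README
cons <- data.frame(
  zone      = letters[1:3],
  age_0_49  = c(8, 2, 7),
  age_gt_50 = c(4, 8, 4),
  sex_f     = c(6, 6, 8),
  sex_m     = c(6, 4, 3),
  stringsAsFactors = FALSE
)
inds <- data.frame(
  id  = LETTERS[1:5],
  age = c("age_gt_50", "age_gt_50", "age_0_49", "age_gt_50", "age_0_49"),
  sex = c("sex_m", "sex_m", "sex_m", "sex_f", "sex_f"),
  stringsAsFactors = FALSE
)
vars <- c("age", "sex")

# Hypothetical helper: fit weights for one zone.
# `targets` is a named numeric vector of that zone's constraint counts.
ipf_zone <- function(inds, targets, vars, iterations = 100) {
  w <- rep(1, nrow(inds))                    # every individual starts at weight 1
  for (i in seq_len(iterations)) {
    for (v in vars) {
      cats <- as.character(inds[[v]])
      # current weighted total in each category of this variable
      current <- vapply(split(w, cats), sum, numeric(1))
      # rescale each individual so the category totals hit the targets
      w <- w * unname(targets[cats] / current[cats])
    }
  }
  stats::setNames(w, inds$id)
}

targets_a <- unlist(cons[1, -1])             # constraint counts for zone "a"
w_a <- ipf_zone(inds, targets_a, vars)
round(w_a, 3)
```

Each pass rescales the weights to match one constraint at a time; repeating the passes brings all constraints into agreement at once, provided the constraint totals for the zone are consistent with each other.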
100 |
101 | `rk_extract()`
102 | --------------
103 |
104 | `rk_extract()` produces aggregated totals of the simulated data for each category in each zone. `rk_extract()`ed data is generally more accurate than `rk_integerise()`d data, although you should judge from context and domain knowledge whether the extra decimal places are spurious precision. Because `rk_extract()` creates a column for each level of each variable, numerical variables (e.g. income) must be removed or `cut()` (otherwise the result would include a new column for each unique numerical value):
105 |
106 | ``` r
107 | inds$income <- cut(inds$income, breaks = 2, include.lowest = TRUE,
108 | labels = c("low", "high"))
109 |
110 | ext_weights <- rk_extract(weights, inds = inds, id = "id")
111 | ext_weights
112 | #> code total age_0_49 age_gt_50 sex_f sex_m high low
113 | #> 1 a 12 8 4 6 6 2.772002 9.227998
114 | #> 2 b 10 2 8 6 4 6.274917 3.725083
115 | #> 3 c 11 7 4 8 3 3.274917 7.725083
116 | ```
117 |
118 | `rk_extract()` returns one row per zone, and the total of each category (for example female and male, or high and low income) will match the known population.
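
The aggregation itself amounts to summing each zone's weights within every category. The base-R sketch below reproduces the `high` and `low` income columns from the weight table printed earlier; it uses `rowsum()` purely for illustration and is an assumption about the arithmetic, not a copy of how `rk_extract()` is implemented.

```r
# Raw weights as printed by rk_weight() in the README (individuals x zones)
weights <- matrix(
  c(1.227998, 1.7250828, 0.7250828,
    1.227998, 1.7250828, 0.7250828,
    3.544004, 0.5498344, 1.5498344,
    1.544004, 4.5498344, 2.5498344,
    4.455996, 1.4501656, 5.4501656),
  nrow = 5, byrow = TRUE,
  dimnames = list(LETTERS[1:5], letters[1:3])
)

# Income band of individuals A to E after the cut() step in the README
income <- factor(c("high", "low", "low", "high", "low"))

# Sum the weights of all individuals sharing a category, per zone,
# then transpose so that zones are rows and categories are columns
totals <- t(rowsum(weights, group = income))
totals
```

Because every individual falls in exactly one income band, the `high` and `low` columns add up to the zone population (12, 10, and 11 here).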
119 |
120 | `rk_integerise()`
121 | -----------------
122 |
123 | The `rk_integerise()` function produces a simulated data frame populated with simulated individuals. This is typically useful when:
124 |
125 | - You need to include numerical variables, such as income in the example.
126 | - You want individual cases to use as input to a dynamic or agent-based model.
127 | - You want 'case studies' to illustrate characteristics of individuals in an area.
128 | - Individual-level data is more intuitive to work with.
129 |
130 | ``` r
131 | int_weights <- rk_integerise(weights, inds = inds)
132 | int_weights[1:6, ]
133 | #> id age sex income zone
134 | #> 1 A age_gt_50 sex_m high a
135 | #> 1.1 A age_gt_50 sex_m high a
136 | #> 2 B age_gt_50 sex_m low a
137 | #> 3 C age_0_49 sex_m low a
138 | #> 3.1 C age_0_49 sex_m low a
139 | #> 3.2 C age_0_49 sex_m low a
140 | ```
141 |
142 | `rk_integerise()` returns one row per case, and the number of rows will match the known population (taken from `cons`).
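
The sketch below illustrates the 'truncate, replicate, sample' idea that the method name `"trs"` presumably abbreviates, applied to the zone `a` weights printed earlier. It is a simplified assumption of the approach: the function name `trs_counts()` is hypothetical, and it uses base R's `sample()` where the package uses `wrswoR`, so the individuals it tops up will not necessarily match `rk_integerise()` output.

```r
# Fractional weights for zone "a", as printed by rk_weight() in the README
w <- c(A = 1.227998, B = 1.227998, C = 3.544004, D = 1.544004, E = 4.455996)

# Hypothetical helper: turn fractional weights into whole-number counts
trs_counts <- function(w, seed = 42) {
  set.seed(seed)
  counts    <- floor(w)                 # truncate: keep the whole copies
  remainder <- w - counts               # fractional part left over
  deficit   <- round(sum(remainder))    # cases still needed to reach the total
  if (deficit > 0) {
    # sample: top up distinct individuals, favouring large remainders
    topup <- sample(length(w), size = deficit, prob = remainder)
    counts[topup] <- counts[topup] + 1
  }
  counts
}

counts <- trs_counts(w)

# replicate: one entry per simulated case
cases <- rep(names(w), times = counts)
length(cases)                           # 12, the population of zone "a"
```

Sampling without replacement guarantees that no individual gains more than one extra copy, so each integer count is within one of its fractional weight.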
143 |
144 | `rk_rake()`
145 | -----------
146 |
147 | `rk_rake()` is a wrapper for `rk_weight() %>% rk_extract()` or `rk_weight() %>% rk_integerise()`. This is useful if the raw weights (from `rk_weight()`) are not required. The desired output is chosen with the `output` argument, which accepts `"fraction"` (the default) or `"integer"`. The function takes the following arguments in all cases:
148 |
149 | - `cons`
150 | - `inds`
151 | - `vars`
152 | - `output` (default `"fraction"`)
153 | - `iterations` (default 10)
154 |
155 | Additional arguments are required depending on the output requested. For `output = "fraction"`:
156 |
157 | - `id`
158 |
159 | For `output = "integer"`:
160 |
161 | - `method` (default `"trs"`)
162 | - `seed` (default 42)
163 |
164 | Details of these context-specific arguments can be found in the respective documentation for `rk_integerise()` or `rk_extract()`.
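
The `seed` argument matters because integerisation involves random sampling. Assuming the seed is handed to `set.seed()` (an assumption about the internals, not something stated above), fixing it makes repeated runs return the same draw, as this base R sketch suggests:

``` r
# Two draws made after setting the same seed are identical
set.seed(42)
first_draw <- sample(10, size = 3)

set.seed(42)
second_draw <- sample(10, size = 3)

same_seed <- identical(first_draw, second_draw)
same_seed
```

Changing the seed is a simple way to check how sensitive downstream results are to the sampling step.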
165 |
166 | ``` r
167 | rake_int <- rk_rake(cons, inds, vars, output = "integer",
168 | method = "trs", seed = 42)
169 | rake_int[1:6, ]
170 | #> id age sex income zone
171 | #> 1 A age_gt_50 sex_m high a
172 | #> 1.1 A age_gt_50 sex_m high a
173 | #> 2 B age_gt_50 sex_m low a
174 | #> 3 C age_0_49 sex_m low a
175 | #> 3.1 C age_0_49 sex_m low a
176 | #> 3.2 C age_0_49 sex_m low a
177 | ```
178 |
179 | ``` r
180 | rake_frac <- rk_rake(cons, inds, vars, output = "fraction", id = "id")
181 | rake_frac
182 | #> code total age_0_49 age_gt_50 sex_f sex_m high low
183 | #> 1 a 12 8 4 6 6 2.772002 9.227998
184 | #> 2 b 10 2 8 6 4 6.274917 3.725083
185 | #> 3 c 11 7 4 8 3 3.274917 7.725083
186 | ```
187 |
188 | Contributing
189 | ------------
190 |
191 | Please note that this project is released with a [Contributor Code of Conduct](CONDUCT.md). By participating in this project you agree to abide by its terms.
192 |
193 | Issues and feedback
194 | -------------------
195 |
196 | Feedback on the API, [bug reports/issues](https://github.com/philmikejones/rakeR/issues), and pull requests are very welcome.
197 |
198 | Acknowledgements
199 | ----------------
200 |
201 | Many of the functions in this package are based on code written by [Robin Lovelace](https://github.com/Robinlovelace) and [Morgane Dumont](https://github.com/modumont) for their book [*Spatial Microsimulation with R* (2016), Chapman and Hall/CRC Press](https://www.crcpress.com/Spatial-Microsimulation-with-R/Lovelace-Dumont/p/book/9781498711548).
202 |
203 | Their book is an excellent resource for learning about spatial microsimulation and understanding what's going on under the hood of this package.
204 |
205 | The book and code are licensed under the terms below:
206 |
207 | Copyright (c) 2014 Robin Lovelace
208 |
209 | Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
210 |
211 | The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
212 |
213 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
214 |
215 | The [reweighting (ipfp) algorithm](https://github.com/awblocker/ipfp) is written by Andrew Blocker.
216 |
217 | The [`wrswoR` package](http://krlmlr.github.io/wrswoR/) used for fast sampling without replacement is written by Kirill Müller.
218 |
219 | Thanks to [Tom Broomhead](http://mhs.group.shef.ac.uk/members/tom-broomhead/) for his feedback on error handling and suggestions on function naming, to [Andrew Smith](https://github.com/virgesmith) for bug fixes, and Derrick Atherton for suggestions, feedback, and testing.
220 |
221 | Data used in some of the examples and tests ('cakeMap') are anonymised data from the [Adult Dental Health Survey](https://data.gov.uk/dataset/adult_dental_health_survey), used under terms of the [Open Government Licence](http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/).
222 |
223 | Contact
224 | -------
225 |
226 | philmikejones at gmail dot com
227 |
228 | License
229 | -------
230 |
231 | Copyright 2016-18 Phil Mike Jones.
232 |
233 | rakeR is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
234 |
235 | rakeR is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
236 |
237 | You should have received a copy of the GNU General Public License along with rakeR. If not, see <https://www.gnu.org/licenses/>.
238 |
--------------------------------------------------------------------------------
/codecov.yml:
--------------------------------------------------------------------------------
1 | codecov:
2 | branch: master
3 |
--------------------------------------------------------------------------------
/codemeta.json:
--------------------------------------------------------------------------------
1 | {
2 | "@context": [
3 | "https://doi.org/10.5063/schema/codemeta-2.0",
4 | "http://schema.org"
5 | ],
6 | "@type": "SoftwareSourceCode",
7 | "identifier": "rakeR",
8 | "description": "Functions for performing spatial microsimulation ('raking')\n in R.",
9 | "name": "rakeR: Easy Spatial Microsimulation (Raking) in R",
10 | "codeRepository": "https://philmikejones.github.io/rakeR/",
11 | "issueTracker": "https://github.com/philmikejones/rakeR/issues",
12 | "license": "https://spdx.org/licenses/GPL-3.0",
13 | "version": "0.2.1.9000",
14 | "programmingLanguage": {
15 | "@type": "ComputerLanguage",
16 | "name": "R",
17 | "version": "3.5.1",
18 | "url": "https://r-project.org"
19 | },
20 | "runtimePlatform": "R version 3.5.1 (2018-07-02)",
21 | "provider": {
22 | "@id": "https://cran.r-project.org",
23 | "@type": "Organization",
24 | "name": "Comprehensive R Archive Network (CRAN)",
25 | "url": "https://cran.r-project.org"
26 | },
27 | "author": [
28 | {
29 | "@type": "Person",
30 | "givenName": "Phil Mike",
31 | "familyName": "Jones",
32 | "email": "philmikejones@gmail.com",
33 | "@id": "https://orcid.org/0000-0001-5173-3245"
34 | },
35 | {
36 | "@type": "Person",
37 | "givenName": "Robin",
38 | "familyName": "Lovelace"
39 | },
40 | {
41 | "@type": "Person",
42 | "givenName": "Morgane",
43 | "familyName": "Dumont"
44 | }
45 | ],
46 | "contributor": [
47 | {
48 | "@type": "Person",
49 | "givenName": "Andrew",
50 | "familyName": "Smith"
51 | }
52 | ],
53 | "maintainer": [
54 | {
55 | "@type": "Person",
56 | "givenName": "Phil Mike",
57 | "familyName": "Jones",
58 | "email": "philmikejones@gmail.com",
59 | "@id": "https://orcid.org/0000-0001-5173-3245"
60 | }
61 | ],
62 | "softwareSuggestions": [
63 | {
64 | "@type": "SoftwareApplication",
65 | "identifier": "testthat",
66 | "name": "testthat",
67 | "provider": {
68 | "@id": "https://cran.r-project.org",
69 | "@type": "Organization",
70 | "name": "Comprehensive R Archive Network (CRAN)",
71 | "url": "https://cran.r-project.org"
72 | },
73 | "sameAs": "https://CRAN.R-project.org/package=testthat"
74 | },
75 | {
76 | "@type": "SoftwareApplication",
77 | "identifier": "readr",
78 | "name": "readr",
79 | "provider": {
80 | "@id": "https://cran.r-project.org",
81 | "@type": "Organization",
82 | "name": "Comprehensive R Archive Network (CRAN)",
83 | "url": "https://cran.r-project.org"
84 | },
85 | "sameAs": "https://CRAN.R-project.org/package=readr"
86 | }
87 | ],
88 | "softwareRequirements": [
89 | {
90 | "@type": "SoftwareApplication",
91 | "identifier": "R",
92 | "name": "R",
93 | "version": ">= 3.4.0"
94 | },
95 | {
96 | "@type": "SoftwareApplication",
97 | "identifier": "ipfp",
98 | "name": "ipfp",
99 | "provider": {
100 | "@id": "https://cran.r-project.org",
101 | "@type": "Organization",
102 | "name": "Comprehensive R Archive Network (CRAN)",
103 | "url": "https://cran.r-project.org"
104 | },
105 | "sameAs": "https://CRAN.R-project.org/package=ipfp"
106 | },
107 | {
108 | "@type": "SoftwareApplication",
109 | "identifier": "wrswoR",
110 | "name": "wrswoR",
111 | "provider": {
112 | "@id": "https://cran.r-project.org",
113 | "@type": "Organization",
114 | "name": "Comprehensive R Archive Network (CRAN)",
115 | "url": "https://cran.r-project.org"
116 | },
117 | "sameAs": "https://CRAN.R-project.org/package=wrswoR"
118 | }
119 | ],
120 | "releaseNotes": "https://github.com/philmikejones/rakeR/blob/master/NEWS.md",
121 | "readme": "https://github.com/philmikejones/rakeR/blob/master/README.md",
122 | "fileSize": "29.318KB",
123 | "contIntegration": [
124 | "https://travis-ci.org/philmikejones/rakeR",
125 | "https://codecov.io/gh/philmikejones/rakeR"
126 | ],
127 | "developmentStatus": "https://www.repostatus.org/#active",
128 | "relatedLink": "https://CRAN.R-project.org/package=rakeR"
129 | }
130 |
--------------------------------------------------------------------------------
/cran-comments.md:
--------------------------------------------------------------------------------
1 | ## Test environments
2 | * Local Ubuntu Linux (16.04 Xenial) 64-bit, R 3.4.2
3 | * Ubuntu Linux (14.04.5 Trusty) on Travis-CI, R 3.4.1
4 | * win-builder: R-release v3.4.2 (2017-09-28)
5 | * win-builder: R-devel (unstable) (2017-09-12 r73242)
6 |
7 |
8 | ## R CMD check results
9 | * Local: no ERRORs, WARNINGs, or NOTEs
10 | * R-release: 1 NOTE - spelling of 'microsimulation' (which is correct)
11 | * R-devel: 1 NOTE - development version
12 |
13 |
14 | ## Downstream dependencies
15 | There are currently no reverse/downstream dependencies
16 |
--------------------------------------------------------------------------------
/data-raw/prep-cakemap.R:
--------------------------------------------------------------------------------
1 | # Prep and export cons and inds for tests
2 |
3 | # Cons
4 | cons <- readr::read_csv("inst/extdata/cakemap_cons.csv")
5 | colnames(cons) <- gsub("X", "", colnames(cons))
6 | colnames(cons) <- gsub("\\.", "_", colnames(cons))
7 | colnames(cons)[13:14] <- c("car_yes", "car_no")
8 | colnames(cons)[15:ncol(cons)] <- paste0("n_", colnames(cons)[15:ncol(cons)])
9 | colnames(cons)[ncol(cons)] <- "n_97"
10 | cons$code <- paste0("c", 1:nrow(cons))
11 | cons <- cons[, c(ncol(cons), 14:13, 15:24, 7:12, 1:6)]
12 |
13 | # The nssec population total is 3 out compared with the age/sex and car totals, so correct one cell
14 | cons[1, 6] <- 2775
15 |
16 | cons[, 2:ncol(cons)] <- round(cons[, 2:ncol(cons)], digits = 2)
17 | readr::write_csv(cons, "tests/cakemap_cons.csv")
18 |
19 |
20 | # inds
21 | inds <- readr::read_csv("inst/extdata/cakemap_inds.csv")
22 | inds$code <- paste0("i", 1:nrow(inds))
23 | inds <- inds[, c(6, 2:5, 1)]
24 |
25 | inds$ageband4 <- gsub("-", "_", inds$ageband4)
26 | inds$NSSEC8 <- gsub("\\.", "_", inds$NSSEC8)
27 | inds$NSSEC8 <- paste0("n_", inds$NSSEC8)
28 | inds$Sex <- factor(inds$Sex, levels = 1:2,
29 | labels = c("male", "female"))
30 | inds$Car <- factor(inds$Car, levels = 1:2,
31 | labels = c("car_yes", "car_no"))
32 |
33 | inds$ageband4[inds$Sex == "male"] <-
34 | paste0("sex_m", inds$ageband4[inds$Sex == "male"])
35 | inds$ageband4[inds$Sex == "female"] <-
36 | paste0("sex_f", inds$ageband4[inds$Sex == "female"])
37 | inds <- inds[, -3]
38 |
39 | readr::write_csv(inds, "tests/cakemap_inds.csv")
40 |
--------------------------------------------------------------------------------
/docs/CONDUCT.html:
--------------------------------------------------------------------------------
1 |
2 |
3 |
4 |
As contributors and maintainers of this project, we pledge to respect all people who contribute through reporting issues, posting feature requests, updating documentation, submitting pull requests or patches, and other activities.
103 |
We are committed to making participation in this project a harassment-free experience for everyone, regardless of level of experience, gender, gender identity and expression, sexual orientation, disability, personal appearance, body size, race, ethnicity, age, or religion.
104 |
Examples of unacceptable behavior by participants include the use of sexual language or imagery, derogatory comments or personal attacks, trolling, public or private harassment, insults, or other unprofessional conduct.
105 |
Project maintainers have the right and responsibility to remove, edit, or reject comments, commits, code, wiki edits, issues, and other contributions that are not aligned to this Code of Conduct. Project maintainers who do not follow the Code of Conduct may be removed from the project team.
106 |
Instances of abusive, harassing, or otherwise unacceptable behavior may be reported by opening an issue or contacting one or more of the project maintainers.
Functions have been renamed to include a rk_ prefix. Function names without prefixes have been deprecated but will continue to work for now so as not to affect existing code.
107 |
Increased code coverage.
108 |
check_constraint() and check_ind() are now deprecated. These checks are carried out by the weight() and/or integerise()/extract() functions automatically.
109 |
Update README.md documentation
110 |
Add additional unit tests - thanks Derrick Atherton for feedback
111 |
Add appropriate acknowledgements for source of data set used for examples and testing.
112 |
113 |
114 |
115 |
116 | v 0.2.1 2017-10-10
117 |
118 |
Patch release:
119 |
120 |
Fixes to examples and tests to use correct variable labels and names (thanks Derrick Atherton for pointing this out).
121 |
Fix version number in NEWS
122 |
123 |
124 |
125 |
126 | v 0.2.0 2017-06-30
127 |
128 |
129 |
integerise() now uses the wrswoR package for sampling without replacement. This is on the order of hundreds of times faster, reducing the time taken for the function to return from hours to minutes. I have used the sample_int_crank() function because it gave results similar to those of base R’s sample() in testing, which should minimise differences between the two. See https://stackoverflow.com/questions/15113650/faster-weighted-sampling-without-replacement or https://cran.r-project.org/package=wrswoR for details of the implementation.
130 |
simulate() is deprecated. Instead of weight() %>% integerise() %>% simulate(), just use weight() %>% integerise(). This is to improve consistency with the steps to produce fractional weights (weight() %>% extract()).
131 |
extract_weights() has been deprecated. Use extract() instead.
Deprecated: these checks are now done as part of the weight() and/or
109 | extract()/integerise() steps automatically
110 |
111 |
112 |
113 |
check_constraint(constraint_var, num_zones)
114 |
115 |
Arguments
116 |
117 |
118 |
119 |
constraint_var
120 |
The constraint table to check, usually a data frame
121 |
122 |
123 |
num_zones
124 |
The number of zones that should be present in the table
125 |
126 |
127 |
128 |
Value
129 |
130 |
If no errors are detected the function returns silently. Any errors
131 | stop the function or script so that they can be investigated.
132 |
133 |
Details
134 |
135 |
Checks a constraint table for the following common errors:
136 |
Ensures all zone codes are unique
137 |
Ensures there are the expected number of zones
138 |
Ensures all but the zone column are numeric (integer or double)
139 |
140 |
141 |
142 |
Examples
143 |
## Not run
144 | ## check_constraint() is deprecated. These checks are automatically
145 | ## carried out as part of the weight() and/or extract()/integerise()
146 | ## functions
147 |
Extract aggregate weights from individual weight table
108 |
109 |
110 |
111 |
rk_extract(weights, inds, id)
112 |
113 |
Arguments
114 |
115 |
116 |
117 |
weights
118 |
A weight table, typically produced by rakeR::weight()
119 |
120 |
121 |
inds
122 |
The individual level data
123 |
124 |
125 |
id
126 |
The unique id variable in the individual level data (inds),
127 | usually the first column
128 |
129 |
130 |
131 |
Value
132 |
133 |
A data frame with zones and aggregated simulated values for each
134 | variable
135 |
136 |
Details
137 |
138 |
Extract aggregate weights from individual weight table, typically produced
139 | by rakeR::rk_weight()
140 |
Extract cannot operate with numeric variables because it creates a new
141 | variable for each unique level of each variable.
142 | If you want numeric information, like income, you need to cut() the
143 | numeric values, or use integerise() instead.
144 |
145 |
146 |
Examples
147 |
## Not run
148 | ## Use weights object from rk_weight()
149 | ## ext_weights <- rk_extract(weights = weights, inds = inds, id = "id")
150 |
A matrix or data frame of fractional weights, typically
119 | provided by rakeR::rk_weight()
120 |
121 |
122 |
inds
123 |
The individual-level data (i.e. one row per individual)
124 |
125 |
126 |
method
127 |
The integerisation method specified as a character string.
128 | Defaults to "trs"; currently other methods are not implemented.
129 |
130 |
131 |
seed
132 |
The seed to use, defaults to 42.
133 |
134 |
135 |
136 |
Value
137 |
138 |
A data frame of integerised cases
139 |
140 |
Details
141 |
142 |
Extracted weights (using rakeR::rk_extract()) are more 'precise' than
143 | integerised weights (although the user should be careful this is not
144 | spurious precision based on context) as they return fractions.
145 | Nevertheless, integerised weights are useful in cases when:
146 |
Numeric information (such as income) is required, as this needs
147 | to be cut() to work with rakeR::rk_extract()
148 |
Simulated 'individuals' are required for case studies of key
149 | areas.
150 |
Input individual-level data for agent-based or dynamic models are
151 | required
Produces fractional weights using the iterative proportional fitting
109 | algorithm.
110 |
111 |
112 |
113 |
rk_weight(cons, inds, vars=NULL, iterations=10)
114 |
115 |
Arguments
116 |
117 |
118 |
119 |
cons
120 |
A data frame containing all the constraints. This
121 | should be in the format of one row per zone, one column per constraint
122 | category. The first column should be a zone code; all other columns must be
123 | numeric counts.
124 |
125 |
126 |
inds
127 |
A data frame containing individual-level (survey) data. This
128 | should be in the format of one row per individual, one column per
129 | constraint. The first column should be an individual ID.
130 |
131 |
132 |
vars
133 |
A character vector of variables that constrain the simulation
134 | (i.e. independent variables)
135 |
136 |
137 |
iterations
138 |
The number of iterations the algorithm should complete.
139 | Defaults to 10
140 |
141 |
142 |
143 |
Value
144 |
145 |
A data frame of fractional weights for each individual in each zone
146 | with zone codes recorded in column names and individual id recorded in row
147 | names.
148 |
149 |
Details
150 |
151 |
rk_weight() requires three arguments:
152 |
A data frame of constraints (e.g. census tables)
153 |
A data frame of individual data (e.g. a survey)
154 |
A character vector of constraint variable names
155 |
156 |
The first column of each data frame should be an ID. The first column of
157 | cons should contain the zone codes. The first column of inds
158 | should contain the individual unique identifier.
159 |
Both data frames should only contain:
160 |
an ID column (zone ID for cons or individual ID for inds).
161 |
constraints (inds) or constraint categories (cons).
162 |
inds can optionally contain additional dependent variables
163 | that do not influence the weighting process.
164 |
165 |
No other columns should be present (the user can merge these back in later).
166 |
It is essential that the levels in each inds constraint (i.e. column)
167 | match exactly with the column names in cons. In the example below see
168 | how the column names in cons ('age_0_49', 'sex_f', ...) match exactly
169 | the levels in the appropriate inds variables.
170 |
The columns in cons must be arranged in alphabetical order because
171 | these are created alphabetically when they are 'spread' in the
172 | individual-level data.
#> a b c
198 | #> A 1.227998 1.7250828 0.7250828
199 | #> B 1.227998 1.7250828 0.7250828
200 | #> C 3.544004 0.5498344 1.5498344
201 | #> D 1.544004 4.5498344 2.5498344
202 | #> E 4.455996 1.4501656 5.4501656
A data frame containing all the constraints. This
119 | should be in the format of one row per zone, one column per constraint
120 | category. The first column should be a zone code; all other columns must be
121 | numeric counts.
122 |
123 |
124 |
inds
125 |
A data frame containing individual-level (survey) data. This
126 | should be in the format of one row per individual, one column per
127 | constraint. The first column should be an individual ID.
128 |
129 |
130 |
vars
131 |
A character vector of variables that constrain the simulation
132 | (i.e. independent variables)
133 |
134 |
135 |
iterations
136 |
The number of iterations the algorithm should complete.
137 | Defaults to 10
138 |
139 |
140 |
141 |
Value
142 |
143 |
A data frame of fractional weights for each individual in each zone
144 | with zone codes recorded in column names and individual id recorded in row
145 | names.