├── .Rbuildignore ├── .gitignore ├── DESCRIPTION ├── NAMESPACE ├── R ├── codebookr-package.R ├── create_factors.R ├── print.codebook.R ├── read_codebook.R ├── validate_limits.R └── zzz.R ├── README.md ├── codebookr.Rproj ├── inst └── demoFiles │ ├── CodeBook-small2.csv │ ├── data1-birth.csv │ ├── data1-yr21.csv │ ├── data1_codebook.csv │ ├── small2.csv │ └── small2_codebook.csv ├── man ├── codebookr.Rd ├── create_factors.Rd ├── print.codebook.Rd ├── read_codebook.Rd └── validate_limits.Rd └── notes.org /.Rbuildignore: -------------------------------------------------------------------------------- 1 | ^.*\.Rproj$ 2 | ^\.Rproj\.user$ 3 | ^notes\.org$ 4 | -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | ## From https://gist.github.com/917150 2 | 3 | *~ 4 | *.png 5 | *.tiff 6 | *.jpg 7 | *.jpeg 8 | 9 | ## .gitignore 10 | ## some from http://help.github.com/ignore-files/ 11 | ## and added R/latex specific 12 | ## Note: goes in root of project and is not stored with git repo 13 | 14 | ##################### 15 | # emacs backups etc # 16 | ##################### 17 | *~ 18 | \#*\# 19 | auto/ 20 | 21 | ####################################################### 22 | # R output # 23 | # NB: may want to comment pdf's in doc subdirectories # 24 | ####################################################### 25 | .Rhistory 26 | *.Rout 27 | *_Rout.txt 28 | *.pdf 29 | 30 | ####################################### 31 | # intermediate Sweave/latex/org files # 32 | ####################################### 33 | ## *.tex # uncomment if too many intermediate Sweave tex files 34 | *.aux 35 | *.out 36 | *.log 37 | *.bbl 38 | *.blg 39 | *.ilg 40 | *.ind 41 | 42 | ########################## 43 | # Packages/zip/iso files # 44 | ########################## 45 | # it's better to unpack these files and commit the raw source 46 | # git has its own built in compression methods 47 
| *.7z 48 | *.dmg 49 | *.gz 50 | *.iso 51 | *.jar 52 | *.rar 53 | *.tar 54 | *.zip 55 | 56 | ############# 57 | # Databases # 58 | ############# 59 | *.sql 60 | *.sqlite 61 | 62 | 63 | ############# 64 | # latex # 65 | ############# 66 | *.aux 67 | *.glo 68 | *.idx 69 | *.log 70 | *.toc 71 | *.ist 72 | *.acn 73 | *.acr 74 | *.alg 75 | *.bbl 76 | *.blg 77 | *.dvi 78 | *.glg 79 | *.gls 80 | *.ilg 81 | *.ind 82 | *.lof 83 | *.lot 84 | *.maf 85 | *.mtc 86 | *.mtc1 87 | *.out 88 | *.synctex.gz 89 | *.pdfsync 90 | *.nav 91 | *.snm 92 | 93 | ############# 94 | # latexmk # 95 | ############# 96 | 97 | *.fdb_latexmk 98 | *.fls 99 | 100 | ################# 101 | ## R files 102 | ################# 103 | 104 | *.Rout 105 | *_Rout.txt 106 | .Rhistory 107 | Rplots.pdf 108 | .Rproj.user 109 | .RData 110 | 111 | # C object files 112 | *.o 113 | *.so 114 | -------------------------------------------------------------------------------- /DESCRIPTION: -------------------------------------------------------------------------------- 1 | Package: codebookr 2 | Title: Aids cleaning, checking and formatting datasets using codebook metadata 3 | Version: 0.0.0.9003 4 | Authors@R: person("Peter", "Baker", email = "drpetebaker@gmail.com", role = c("aut", "cre")) 5 | Description: Automates cleaning, checking and formatting data using metadata 6 | from Codebooks or Data Dictionaries. It is primarily aimed at 7 | epidemiological research and medical studies but can be easily 8 | used in other research areas. Codebook metadata can be read into 9 | R and then used directly for data checking, cleaning and factor 10 | level definitions. Packages 'tidyverse' and 'assertr' 11 | are used for reading and cleaning data. 
12 | Depends: R (>= 3.3.3) 13 | License: GPL-3 14 | Encoding: UTF-8 15 | LazyData: true 16 | Imports: tidyverse, 17 | assertr, 18 | magrittr, 19 | utils, 20 | digest, 21 | dplyr, 22 | readr, 23 | stringr, 24 | zoo, 25 | forcats 26 | RoxygenNote: 6.0.1 27 | -------------------------------------------------------------------------------- /NAMESPACE: -------------------------------------------------------------------------------- 1 | # Generated by roxygen2: do not edit by hand 2 | 3 | S3method(print,codebook) 4 | export(create_factors) 5 | export(read_codebook) 6 | export(validate_limits) 7 | import(dplyr) 8 | import(magrittr) 9 | -------------------------------------------------------------------------------- /R/codebookr-package.R: -------------------------------------------------------------------------------- 1 | ##' codebookr: A package for reading codebooks and applying to datasets 2 | ##' 3 | ##' The codebookr package provides functions for reading a codebook from a 4 | ##' spreadsheet (.csv file), creating factors and checking levels and 5 | ##' checking continuous values are in a valid range. 6 | ##' 7 | ##' @import magrittr 8 | ##' @import dplyr 9 | ##' 10 | ##' @docType package 11 | ##' @name codebookr 12 | NULL 13 | -------------------------------------------------------------------------------- /R/create_factors.R: -------------------------------------------------------------------------------- 1 | ##' Use codebook to create factors and check levels for validity 2 | ##' 3 | ##' Uses a codebook which is an S3 class \code{codebook}, possibly 4 | ##' read in by \code{read_codebook}, to convert numerical vectors into 5 | ##' \code{factors} in a \code{tibble}. 6 | ##' 7 | ##' REPEAT: Often, when analysing data, data dictionaries or code books are 8 | ##' provided with data files. Rather than a \code{word} \code{doc} or 9 | ##' \code{pdf} files, the format required here is in a very specific 10 | ##' format stored as a \code{csv} file. 
Once read in, attributes such 11 | ##' as factor labels/levels and variable labels can be added to the 12 | ##' \code{data.frame} and/or also used to check factor labels and 13 | ##' variable names are consistent with the code book. Note that while 14 | ##' various methods may be available which attempt to convert word 15 | ##' docs or pdf's to a spreadsheet and/or csv file, extreme care 16 | ##' should be taken as these are far from perfect. 17 | ##' 18 | ##' @param x \code{tibble} to which \code{codebook} is applied 19 | ##' @param code_book \code{codebook} containing factor names and factor levels 20 | ##' to convert numeric or character vectors to \code{factors} and also test 21 | ##' for valid levels. 22 | ##' @param column_names character vector of column names for conversion to 23 | ##' factors. Default: All \code{factors} defined in \code{codebook}. 24 | ##' @return object of class \code{tibble} 25 | ##' @author Peter Baker \email{pete@@petebaker.id.au} 26 | ##' @examples 27 | ##' file.copy(system.file('demoFiles', 'data1_codebook.csv', 28 | ##' package='codebookr'), 'data1_codebook.csv') 29 | ##' file.copy(system.file('demoFiles', 'data1-birth.csv', 30 | ##' package='codebookr'), 'data1-birth.csv') 31 | ##' data1_codebook <- read_codebook("data1_codebook.csv", 32 | ##' column_names = list(variable_levels = "Factor.Levels", 33 | ##' variable_original = "Old.Variable", 34 | ##' min = "Min", max = "Max")) 35 | ##' data1 <- readr::read_csv('data1-birth.csv') 36 | ##' data1 37 | ##' myData <- create_factors(data1, data1_codebook) 38 | ##' str(myData) 39 | ##' @export 40 | create_factors <- 41 | function(x, code_book, column_names = NULL) 42 | { 43 | ## set and/or check factor names ------------------------------ 44 | if (is.null(column_names)){ 45 | column_names <- code_book$factor_names 46 | } 47 | if (any(!column_names %in% code_book$factor_names)){ 48 | cat("Variable names not in list of factors:\n") 49 | print(column_names[!column_names %in% 
code_book$factor_names]) 50 | cat("List of factors:\n") 51 | print(code_book$factor_names) 52 | stop("One or more 'column_names' not in codebook factors") 53 | } 54 | 55 | ## apply factor levels to variables ---------------------------- 56 | flevels <- code_book$factor_levels 57 | newData <- x 58 | 59 | for (I in seq_along(column_names)){ 60 | 61 | ## try the labels first 62 | var_name <- column_names[I] 63 | xlev <- flevels[[var_name]] 64 | levs <- xlev$fac.level 65 | labs <- xlev$fac.label 66 | if (all(is.na(labs)) | all(levs == labs)) labs <- levs 67 | 68 | test.labs <- unique(newData[var_name]) 69 | 70 | cat("Processing variable:", var_name, "\n\n") 71 | 72 | if (all(as.character(unlist(test.labs)) %in% labs)){ 73 | newData[[var_name]] <- 74 | readr::parse_factor(newData[[var_name]], levels = labs) 75 | cat("Factor:", var_name, "set up with levels:\n") 76 | if (length(labs) <= 6){ 77 | extra <- "" 78 | } else { 79 | extra <- "... [truncated] ..." 80 | } 81 | print(utils::head(labs)) 82 | cat(extra, "\n") 83 | } else if (all(as.character(unlist(test.labs)) %in% levs)) { 84 | newData[[var_name]] <- # data hold the codes, so map codes to labels 85 | factor(as.character(newData[[var_name]]), levels = levs, labels = labs) 86 | cat("Factor:", var_name, "set up with levels:\n") 87 | if (length(labs) <= 6){ 88 | extra <- "" 89 | } else { 90 | extra <- "... [truncated] ..." 91 | } 92 | print(utils::head(labs)) 93 | cat(extra, "\n") 94 | } else { 95 | cat("Warning: Some factor levels in data not in codebook\n", 96 | " These will be set to NA\n\nCodebook Levels:\n") 97 | print(labs) 98 | cat("Levels in dataset:\n") 99 | test.labs <- test.labs %>% .[[1]] # untibble! 
100 | print(test.labs) 101 | cat("present in dataset but not codebook:\n") 102 | print(setdiff(test.labs, labs)) 103 | newData[[var_name]] <- 104 | readr::parse_factor(newData[[var_name]], levels = labs) 105 | } 106 | } 107 | newData 108 | } 109 | 110 | ## data1_codebook <- read_codebook("../inst/demoFiles/data1_codebook.csv", 111 | ## column_names = list(variable_levels = "Factor.Levels", 112 | ## variable_original = "Old.Variable", 113 | ## min = "Min", max = "Max")) 114 | ## code_book <- data1_codebook 115 | ## data1 <- readr::read_csv('../inst/demoFiles/data1-birth.csv') 116 | ## x <- data1 117 | 118 | ## myData <- create_factors(data1, data1_codebook) 119 | ## str(myData) 120 | -------------------------------------------------------------------------------- /R/print.codebook.R: -------------------------------------------------------------------------------- 1 | ##' Print an S3 object of class \code{codebook} 2 | ##' 3 | ##' \code{read_codebook} reads a code book stored as a \code{csv} file 4 | ##' for either checking against a data file or relabelling factor 5 | ##' levels or labelling variables. \code{read_codebook} returns an S3 6 | ##' object of class \code{codebook} 7 | ##' 8 | ##' @aliases codebook 9 | ##' 10 | ##' @param x object of class \code{codebook} 11 | ##' @param extra logical: whether to print extra information. Default: FALSE 12 | ##' @param ... 
extra arguments passed to specific printing functions 13 | ##' 14 | ##' @seealso \code{\link{read_codebook}} 15 | ##' @author Peter Baker \email{pete@@petebaker.id.au} 16 | ##' @examples 17 | ##' file.copy(system.file('demoFiles', 'data1_codebook.csv', 18 | ##' package='codebookr'), 'data1_codebook.csv') 19 | ##' data1_codebook <- read_codebook("data1_codebook.csv", 20 | ##' column_names = list(variable_levels = "Factor.Levels", 21 | ##' variable_original = "Old.Variable", 22 | ##' min = "Min", max = "Max")) 23 | ##' print(data1_codebook) 24 | ##' @export 25 | print.codebook <- 26 | function(x, extra = FALSE, ...) 27 | { 28 | ## check class of object ------------------------------------- 29 | if (!inherits(x, "codebook")) 30 | stop(paste0("Object '", deparse(substitute(x)), 31 | "' not of class 'codebook'")) 32 | 33 | cat("Codebook:", deparse(substitute(x)), "\n\n") 34 | 35 | if (!is.null(x$file_info)){ 36 | file_info <- x$file_info 37 | cat("Codebook read from file:", file_info$codebook_filename, 38 | "\nRead at:", format(file_info$codebook_read_time), "\nColumn names:\n") 39 | print(file_info$column_names) 40 | } 41 | 42 | if (extra && !is.null(x$renamed_variables)){ 43 | cat("Renamed Variables:\n") 44 | print(x$renamed_variables) 45 | } 46 | 47 | if(!is.null(x$variable_labels)){ 48 | cat("\nVariable Labels:\n") 49 | print(x$variable_labels) 50 | } 51 | 52 | if(!is.null(x$factor_levels)){ 53 | cat("\nFactor Levels:\n") 54 | print(x$factor_levels) 55 | } 56 | 57 | if(!is.null(x$limits_continuous)){ 58 | cat("\nLimits for Continuous Variables:\n") 59 | print(x$limits_continuous) 60 | } 61 | 62 | if (extra && !is.null(x$data_management_plan)){ 63 | cat("\nData Management Plan details:\n") 64 | print(x$data_management_plan, ...) 
65 | } 66 | 67 | } 68 | 69 | ## data1_codebook <- read_codebook("../inst/demoFiles/data1_codebook.csv", 70 | ## column_names = list(variable_levels = "Factor.Levels", 71 | ## variable_original = "Old.Variable", 72 | ## min = "Min", max = "Max")) 73 | ## x <- data1_codebook 74 | 75 | ## x 76 | ## print(x, extra = TRUE) 77 | -------------------------------------------------------------------------------- /R/read_codebook.R: -------------------------------------------------------------------------------- 1 | ##' Read a code book in standard format(s) as a csv file 2 | ##' 3 | ##' \code{read_codebook} reads a code book stored as a \code{csv} file 4 | ##' for either checking against a data file or relabelling factor 5 | ##' levels or labelling variables. 6 | ##' 7 | ##' Often, when analysing data, data dictionaries or code books are 8 | ##' provided with data files. Rather than a \code{word} \code{doc} or 9 | ##' \code{pdf} files, the format required here is in a very specific 10 | ##' format stored as a \code{csv} file. Once read in, attributes such 11 | ##' as factor labels/levels and variable labels can be added to the 12 | ##' \code{data.frame} and/or also used to check factor labels and 13 | ##' variable names are consistent with the code book. Note that while 14 | ##' various methods may be available which attempt to convert word 15 | ##' docs or pdf's to a spreadsheet and/or csv file, extreme care 16 | ##' should be taken as these are far from perfect. 17 | ##' 18 | ##' @param x filename of codebook to parse 19 | ##' @param codebook_directory directory containing codebook. Default : 20 | ##' current directory 21 | ##' @param column_names named character vector containing column names 22 | ##' in Code Book file. 
The vector contains components 23 | ##' \code{variable_name} = variable name, 24 | ##' \code{variable_original} = original name (if variable_name was 25 | ##' changed), \code{label} for printing/plotting, 26 | ##' \code{variable_levels} = factor levels, \code{variable_limits} or 27 | ##' \code{min} and \code{max} for continuous measurements, 28 | ##' \code{missing_values} = numeric or strings for values of 29 | ##' variable to be set as missing \code{comments} = comments about 30 | ##' the variable which may include the measurement instrument or 31 | ##' references about the measurement. Note that default values may 32 | ##' be found with \code{options(codebookr.column_names)} 33 | ##' @param na a character vector of strings which are to be 34 | ##' interpreted as \sQuote{NA} values. Blank fields are also 35 | ##' considered to be missing values in logical, integer, numeric and 36 | ##' complex fields. Default: \code{c("", "NA", ".", " ")} 37 | ##' @param data_management_plan a list containing information like url, 38 | ##' location, authors, date, version and so on. 39 | ##' Default: "All possible details should be here" 40 | ##' @return S3 object of type class \code{codebook} 41 | ##' @author Peter Baker \email{pete@@petebaker.id.au} 42 | ##' @examples 43 | ##' file.copy(system.file('demoFiles', 'data1_codebook.csv', 44 | ##' package='codebookr'), 'data1_codebook.csv') 45 | ##' data1_codebook <- read_codebook("data1_codebook.csv", 46 | ##' column_names = list(variable_levels = "Factor.Levels", 47 | ##' variable_original = "Old.Variable", 48 | ##' min = "Min", max = "Max")) 49 | ##' @export 50 | read_codebook <- 51 | function(x, codebook_directory = NULL, 52 | column_names = NULL, 53 | na = c("", "NA", ".", " "), 54 | data_management_plan = "All possible details should be here") 55 | { 56 | 57 | ## check directory ------------------------------------------------ 58 | if (is.null(codebook_directory)) { 59 | codebook_directory <- "." 
60 | } 61 | 62 | if (!dir.exists(codebook_directory)) 63 | stop(paste("Error: 'codebook_directory'", codebook_directory, "not found.")) 64 | 65 | ## check for codebook --------------------------------------------- 66 | codebook_filename <- x 67 | codebook_file <- file.path(codebook_directory, x) 68 | if (!file.exists(codebook_file)) 69 | stop(paste("Error: 'codebook_file'", codebook_file, "not found.")) 70 | 71 | ## set up column names for processing ----------------------------- 72 | default_names <- options("codebookr.column_names")[[1]] 73 | if (!is.null(column_names)){ 74 | ##set_names <- match.arg(names(column_names), choices = names(default_names), 75 | ## several.ok = TRUE) 76 | 77 | ## get names to be changed 78 | set_names <- column_names 79 | column_names <- default_names 80 | ## set names to be changed 81 | column_names[names(set_names)] <- set_names 82 | 83 | ## check that correct names 84 | if (!all(names(column_names) %in% names(default_names))){ 85 | cat("User provided names for 'column_names':\n") 86 | print(names(column_names)) 87 | cat("Should be in:\n") 88 | print(names(default_names)) 89 | stop("Please provide correct names.") 90 | } 91 | } else { 92 | column_names <- default_names 93 | } 94 | 95 | ## read in codebook ---------------------------------------- 96 | ## cat("\nFunction 'read_codebook' largely untested: beware!\n\n") 97 | xCodes <- readr::read_csv(codebook_file, na = na) 98 | dfCodes <- as.data.frame(xCodes) 99 | fileName <- deparse(substitute(x)) 100 | colNames <- names(xCodes) 101 | 102 | ## take md5sum of file and other attributes -------------------- 103 | codebook_read_time <- Sys.time() 104 | codebook_digest <- digest::digest(codebook_file, algo = "md5", file = TRUE) 105 | codebook_time <- 106 | file.info(codebook_file)[c("size", "mtime", "ctime", "atime")] 107 | 108 | ## check names present and not ------------------------------------------- 109 | definedNames <- column_names %in% colNames # are these present 110 | presentNames <- 
column_names[definedNames] # names that are present 111 | presentNames2 <- as.character(column_names[definedNames]) # not named 112 | absentNames <- column_names[!definedNames] 113 | 114 | ## are codebook column names same as specified ------------------ 115 | if (!(all(colNames %in% column_names))){ 116 | cat(stringr::str_c("File: '", fileName, "'"), "\n") 117 | cat("Column Names:\n") 118 | print(colNames) 119 | cat("Warning: some column names in codebook not defined:\n") 120 | print(absentNames) 121 | cat("\nColumns present (which may be all that are necessary):\n") 122 | print(presentNames) 123 | extraCols <- setdiff(names(xCodes), presentNames) 124 | if (length(extraCols) > 0){ 125 | cat("Extra columns that perhaps should be used for codebook:\n") 126 | print(extraCols) 127 | } 128 | } 129 | 130 | ## variable_levels ------------------------------------------------------ 131 | if ("variable_levels" %in% names(absentNames)){ 132 | cat("Warning: factor levels column not found.\n", 133 | "This should be set if any factors present\n") 134 | isFactorLevels <- FALSE 135 | } else { 136 | isFactorLevels <- TRUE 137 | } 138 | 139 | ## variable labels ------------------------------------------------ 140 | if (length(presentNames["variable_label"]) > 0){ 141 | hhh <- presentNames[c("variable_name", "variable_label")] 142 | varLabels <- xCodes[as.character(hhh)] 143 | ## varLabels <- 144 | ## dplyr::select(xCodes, 145 | ## dplyr::starts_with(presentNames["variable_name"]), 146 | ## dplyr::starts_with(presentNames["variable_label"])) 147 | varLabels <- dplyr::filter(varLabels, 148 | !is.na(dplyr::select(varLabels, 1))) 149 | } else { 150 | varLabels <- NA 151 | } 152 | 153 | ## renamed variables: -------------------------------------------------- 154 | ## if variable renamed then construct table with new and old name 155 | if ("variable_original" %in% names(presentNames)){ 156 | ## extract old/new variable names 157 | hhh <- presentNames[c("variable_name", 
"variable_original")] 158 | renamed_table <- xCodes[as.character(hhh)] 159 | ##renamed_table <- 160 | ## dplyr::select(xCodes, 161 | ## dplyr::starts_with(presentNames["variable_name"]), 162 | ## dplyr::starts_with(presentNames["variable_original"])) 163 | renamed_table <- dplyr::filter(renamed_table, 164 | !is.na(dplyr::select(renamed_table, 1))) 165 | names(renamed_table) <- c("variable_name", "variable_original") 166 | } else { 167 | renamed_table <- NA 168 | } 169 | 170 | ## set factor levels ------------------------------------------------ 171 | if (isFactorLevels){ 172 | dfCodes$variable_name.filled <- 173 | as.character(zoo::na.locf(dfCodes[,as.character(presentNames["variable_name"])])) 174 | ## appears more than twice then is a factor 175 | factors <- rle(dfCodes$variable_name.filled) 176 | n.levels <- factors$lengths 177 | factors <- factors$values[factors$lengths>1] 178 | n.levels <- n.levels[n.levels > 1] 179 | names(n.levels) <- factors 180 | 181 | factor.info <- dfCodes[dfCodes[, "variable_name.filled"] %in% factors, ] 182 | tmp <- 183 | strsplit(factor.info[, 184 | as.character(presentNames["variable_levels"])], "=") 185 | factor.info$fac.level <- sapply(tmp, function(y) y[1]) 186 | factor.info$fac.label <- sapply(tmp, function(y) y[2]) 187 | factor.info$Factors <- factor.info$variable_name.filled 188 | ## hadley doesn't like dots so variable_name.filled messes up VNF ok 189 | ## plyr::dlply(factor.info, #.(factor.info$Factors), 190 | ## FACTOR, 191 | ## function(y) list(fac.level = y$fac.level, 192 | ## fac.label = y$fac.label)) 193 | ## but really weird plyr interaction is driving me mad - use by instead 194 | factorLevels <- 195 | by(factor.info, factor.info$Factors, function(y) 196 | list(fac.level = y$fac.level, fac.label = y$fac.label)) 197 | } 198 | 199 | ## determine continuous variables ---------------------------------- 200 | contVars <- 201 | dfCodes[grep("[Cc]ont", 202 | dfCodes[,as.character(presentNames["variable_levels"])]), 203 | 
as.character(presentNames["variable_name"])] 204 | contVars2 <- 205 | dfCodes[is.na(dfCodes[,as.character(presentNames["variable_levels"])]), 206 | as.character(presentNames["variable_name"])] 207 | 208 | contVars <- unique(c(contVars, contVars2)) 209 | 210 | ## min and max for continuous ------------------------------------- 211 | ## min and max specified or limits - need a sensible consistent appoach 212 | 213 | ## choose cols and drop variables without limits 214 | if (any(c("min", "max", "variable_limits") %in% names(presentNames))){ 215 | vars4 <- c("variable_name", "min", "max", "variable_limits") 216 | whichVars4 <- vars4[vars4 %in% names(presentNames)] 217 | chooseCols <- presentNames[whichVars4] 218 | chooseCols <- unlist(chooseCols) 219 | limitsContinuous <- dfCodes[, chooseCols] 220 | 221 | ## rename to standard names 222 | names(limitsContinuous) <- whichVars4 223 | ## drop if all of min, max, limits missing 224 | limitsContinuous <- 225 | limitsContinuous[apply(!is.na(limitsContinuous[,whichVars4[-1]]), 226 | 1, any), ] 227 | ## put limits in min max 228 | limits_continuous <- 229 | limitsContinuous %>% 230 | dplyr::mutate(lims = strsplit(variable_limits, "\\s+")) # %>% 231 | 232 | limits_continuous$l_min <- sapply(limits_continuous$lims, 233 | function(x) as.numeric(x[1])) 234 | limits_continuous$l_max <- sapply(limits_continuous$lims, 235 | function(x) as.numeric(x[2])) 236 | limits_continuous$min <- with(limits_continuous, ifelse(!is.na(min), 237 | min, l_min)) 238 | limits_continuous$max <- with(limits_continuous, ifelse(!is.na(max), 239 | max, l_max)) 240 | limits_continuous <- limits_continuous[,1:3] 241 | } 242 | 243 | 244 | ## store all codebook data away in a S3 "codebook" class 245 | code_book <- list(variable_names = varLabels[,1], 246 | variable_labels = varLabels, 247 | factor_names = factors, 248 | factor_levels = factorLevels, 249 | factor_info = factor.info, 250 | continuous_names = contVars, 251 | limits_continuous = limits_continuous, 
252 | renamed_variables = renamed_table, 253 | names_info = list(present_names = presentNames, 254 | absent_names = absentNames), 255 | file_info = list(codebook_filename = codebook_filename, 256 | codebook_directory = codebook_directory, 257 | codebook_file = codebook_file, 258 | codebook_time = codebook_time, 259 | codebook_read_time = codebook_read_time, 260 | column_names = column_names, 261 | code_book = xCodes), 262 | data_management_plan = data_management_plan 263 | ) 264 | comment(code_book) <- paste0("Codebook read from '", codebook_file, 265 | "' at ", date()) 266 | ## S3 class object containing codebook 267 | ## 268 | ## @method default codebook 269 | class(code_book) <- "codebook" 270 | code_book 271 | } 272 | 273 | ## for development -------------------- 274 | 275 | ## codebook_directory <- "../inst/demoFiles" 276 | ## x <- "data1_codebook.csv" 277 | ## column_names <- c(variable_original = "Old.Variable", variable_levels = "Factor.Levels", min = "Min", max = "Max") 278 | 279 | ## data1_codebook <- read_codebook("../inst/demoFiles/data1_codebook.csv", 280 | ## column_names = list(variable_levels = "Factor.Levels", 281 | ## variable_original = "Old.Variable", 282 | ## min = "Min", max = "Max")) 283 | 284 | -------------------------------------------------------------------------------- /R/validate_limits.R: -------------------------------------------------------------------------------- 1 | ##' Use codebook to check continuous variables for validity 2 | ##' 3 | ##' Uses a codebook which is an S3 class \code{codebook}, possibly 4 | ##' read in by \code{read_codebook}, to check for data within valid 5 | ##' ranges. 6 | ##' 7 | ##' REPEAT: Often, when analysing data, data dictionaries or code books are 8 | ##' provided with data files. Rather than a \code{word} \code{doc} or 9 | ##' \code{pdf} files, the format required here is in a very specific 10 | ##' format stored as a \code{csv} file. 
Once read in, attributes such 11 | ##' as factor labels/levels and variable labels can be added to the 12 | ##' \code{data.frame} and/or also used to check factor labels and 13 | ##' variable names are consistent with the code book. Note that while 14 | ##' various methods may be available which attempt to convert word 15 | ##' docs or pdf's to a spreadsheet and/or csv file, extreme care 16 | ##' should be taken as these are far from perfect. 17 | ##' 18 | ##' @param x \code{tibble} to which \code{codebook} is applied 19 | ##' @param code_book \code{codebook} containing names of continuous 20 | ##' variables and limits for checking data are within range 21 | ##' @param column_names character vector of column names for checking 22 | ##' data in range. Default: All continuous variables defined in 23 | ##' \code{codebook}. 24 | ##' @return object of type class \code{tibble} containing data out of 25 | ##' limits 26 | ##' @author Peter Baker \email{pete@@petebaker.id.au} 27 | ##' @examples 28 | ##' file.copy(system.file('demoFiles', 'data1_codebook.csv', 29 | ##' package='codebookr'), 'data1_codebook.csv') 30 | ##' file.copy(system.file('demoFiles', 'data1-yr21.csv', 31 | ##' package='codebookr'), 'data1-yr21.csv') 32 | ##' data1_codebook <- read_codebook("data1_codebook.csv", 33 | ##' column_names = list(variable_levels = "Factor.Levels", 34 | ##' variable_original = "Old.Variable", 35 | ##' min = "Min", max = "Max")) 36 | ##' data1 <- readr::read_csv('data1-yr21.csv') 37 | ##' data1 38 | ##' non_valid <- validate_limits(data1, data1_codebook) 39 | ##' @export 40 | validate_limits <- 41 | function(x, code_book, column_names = NULL) 42 | { 43 | ## set and/or check names for limits ------------------------------ 44 | limits <- code_book$limits_continuous 45 | limits_names <- limits[["variable_name"]] 46 | 47 | if (is.null(column_names)){ 48 | column_names <- limits_names 49 | } 50 | if (any(!column_names %in% code_book$continuous_names)){ 51 | cat("Variable names not in 
list of continuous variables:\n") 52 | print(column_names[!column_names %in% code_book$continuous_names]) 53 | cat("List of continuous variables:\n") 54 | print(code_book$continuous_names) 55 | stop("One or more 'column_names' not in codebook") 56 | } 57 | 58 | ## apply limits to variables ---------------------------- 59 | newData <- x 60 | 61 | out_of_range <- vector(mode = "list", length = length(column_names)) 62 | names(out_of_range) <- column_names 63 | 64 | for (I in seq_along(column_names)){ 65 | 66 | ## extract min and max 67 | var_name <- column_names[I] 68 | cat("\n+++ Processing variable:", var_name, "\n") 69 | minMax <- dplyr::filter(limits, variable_name == var_name) 70 | 71 | sumData <- summary(x[[var_name]]) 72 | smin <- sumData["Min."] 73 | smax <- sumData["Max."] 74 | cmin <- minMax[["min"]] 75 | cmax <- minMax[["max"]] 76 | 77 | if (smin < cmin | smax > cmax){ 78 | outOfRange <- dplyr::select(x, c(dplyr::starts_with(var_name))) %>% 79 | dplyr::filter(!is.na(.)) %>% dplyr::filter(. < cmin | . 
> cmax) 80 | cat("Out of range values:\n") 81 | print(outOfRange) 82 | cat("\n") 83 | } else { 84 | outOfRange <- NA 85 | cat("No values out of range.\n\n") 86 | } 87 | out_of_range[[I]] <- outOfRange 88 | } 89 | out_of_range 90 | } 91 | 92 | ## data1_codebook <- read_codebook("../inst/demoFiles/data1_codebook.csv", 93 | ## column_names = list(variable_levels = "Factor.Levels", 94 | ## variable_original = "Old.Variable", 95 | ## min = "Min", max = "Max")) 96 | ## code_book <- data1_codebook 97 | ## data1 <- readr::read_csv('../inst/demoFiles/data1-yr21.csv') 98 | ## x <- data1 99 | 100 | ## non_valid <- validate_limits(data1, data1_codebook) 101 | -------------------------------------------------------------------------------- /R/zzz.R: -------------------------------------------------------------------------------- 1 | ## Filename: zzz.R 2 | ## Purpose: Non-documented functions for 'codebookr' package 3 | ## 4 | ## To run in terminal use: R CMD BATCH --vanilla zzz.R 5 | 6 | ## Created at: Wed Apr 5 15:52:23 2017 7 | ## Author: Peter Baker 8 | ## Hostname: clearwell2 9 | ## Directory: /home/pete/Data/dev/codebookr/R/ 10 | ## Licence: GPLv3 see 11 | ## 12 | ## Change Log: 13 | ## 14 | 15 | ##' Internal function called when library is loaded to set options 16 | ##' @keywords internal 17 | .onLoad <- function(libname, pkgname){ 18 | op <- options() 19 | 20 | op.codebookr <- 21 | list(codebookr.column_names = c(variable_name = "Variable", 22 | variable_label = "Label", 23 | variable_original = "Original_Name", 24 | variable_levels = "Levels", 25 | variable_limits = "Limits", 26 | min = "Minimum", max = "Maximum", 27 | missing_values = "Missing_Values", 28 | factor_type = "Factor_Type", 29 | comments = "Comments")) #, 30 | ## codebookr.na = c("", "NA", ".", " ")) # keep simpler for now 31 | ## could have na.extra parameter 32 | toset <- !(names(op.codebookr) %in% names(op)) 33 | if(any(toset)) options(op.codebookr[toset]) 34 | 35 | invisible() 36 | } 37 | 38 | 
-------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # codebookr 2 | 3 | **UNDER CONSTRUCTION** 4 | 5 | **codebookr** is an *R* package under development to automate 6 | cleaning, checking and formatting data using metadata from Codebooks 7 | or Data Dictionaries. It is primarily aimed at epidemiological 8 | research and medical studies but can be easily used in other research 9 | areas. 10 | 11 | Researchers collecting primary, secondary or tertiary data from RCTs 12 | or government and hospital administrative systems often have different 13 | data documentation and data cleaning needs from those scraping data off 14 | the web or collecting in-house data for business analytics. However, 15 | all studies will benefit from using codebooks which comprehensively 16 | document all study variables including derived variables. Codebooks 17 | document data formats, variable names, variable labels, factor levels, 18 | valid ranges for continuous variables, details of measuring 19 | instruments and so on. 20 | 21 | For statistical consultants, each new data set has a new codebook. 22 | While statisticians may get a photocopied codebook or pdf, my 23 | preference is a spreadsheet so that the metadata can be used 24 | directly. Many data analysts are happy to use this metadata to write 25 | syntax to read, clean and check data. I prefer to automate this 26 | process by reading the codebook into *R* and then using the metadata 27 | directly for data checking, cleaning and factor level definitions. 28 | 29 | While there is considerable interest in data wrangling and 30 | cleaning (Jonge and Loo 2013; Wickham 2014; Fischetti 2017), there 31 | appear to be few tools available to read codebooks (see [here](http://jason.bryer.org/posts/2013-01-10/Function_for_Reading_Codebooks_in_R.html)) 32 | and even fewer to automatically apply the metadata to datasets. 
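As a rough illustration of that idea, the base-R sketch below applies `code=label` entries, like those in the demo codebooks, to a raw vector and flags values outside a valid range. It is not the package's implementation: the data values and the BMI limits are made up for the example, and all object names are hypothetical.

```r
# Codebook entries for one factor, in the "code=label" form used in the demo files
level_strings <- c("1=Male", "2=Female")

# Split each entry into its code and its label
parts  <- strsplit(level_strings, "=", fixed = TRUE)
codes  <- vapply(parts, function(p) p[1], character(1))
labels <- vapply(parts, function(p) p[2], character(1))

# Hypothetical raw data: the value 3 does not appear in the codebook
raw_sex <- c(1, 2, 2, 3, 1)

# Map codes to labels; anything not in the codebook becomes NA
sex <- factor(as.character(raw_sex), levels = codes, labels = labels)
invalid <- unique(raw_sex[is.na(sex) & !is.na(raw_sex)])

# Range check for a continuous variable, using assumed limits
bmi <- c(18.9, 36.3, 75.2, NA)
bmi_limits <- c(min = 10, max = 60)
out_of_range <- bmi[!is.na(bmi) &
                    (bmi < bmi_limits[["min"]] | bmi > bmi_limits[["max"]])]

print(sex)
print(invalid)
print(out_of_range)
```

Keeping the codes and labels in the codebook, rather than hard-coding them in each analysis script, means the same check can be rerun unchanged whenever the data or the codebook is updated.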
33 | 34 | Codebook examples are from research projects undertaken at the University 35 | of Queensland's School of Public Health and have subsequently been 36 | used in biostatistics courses. 37 | 38 | ## References 39 | 40 | Fischetti, Tony. 2017. Assertr: Assertive Programming for R Analysis Pipelines. [www](https://CRAN.R-project.org/package=assertr). 41 | 42 | Jonge, Edwin de, and Mark van der Loo. 2013. “An Introduction to Data Cleaning with R.” Technical Report 201313. Statistics Netherlands. [www](http://cran.vinastat.com/doc/contrib/de_Jonge+van_der_Loo-Introduction_to_data_cleaning_with_R.pdf). 43 | 44 | Wickham, Hadley. 2014. “Tidy Data.” The Journal of Statistical Software 59 (10). [www](http://www.jstatsoft.org/v59/i10/). 45 | -------------------------------------------------------------------------------- /codebookr.Rproj: -------------------------------------------------------------------------------- 1 | Version: 1.0 2 | 3 | RestoreWorkspace: No 4 | SaveWorkspace: No 5 | AlwaysSaveHistory: Default 6 | 7 | EnableCodeIndexing: Yes 8 | UseSpacesForTab: Yes 9 | NumSpacesForTab: 2 10 | Encoding: UTF-8 11 | 12 | RnwWeave: knitr 13 | LaTeX: pdfLaTeX 14 | 15 | AutoAppendNewline: Yes 16 | StripTrailingWhitespace: Yes 17 | 18 | BuildType: Package 19 | PackageUseDevtools: Yes 20 | PackageInstallArgs: --no-multiarch --with-keep.source 21 | PackageRoxygenize: rd,collate,namespace 22 | -------------------------------------------------------------------------------- /inst/demoFiles/CodeBook-small2.csv: -------------------------------------------------------------------------------- 1 | Variable,Old.Variable,Label,Factor.Levels,Factor.Type 2 | ID,codea,"Subject ID",Continuous, 3 | bmi.21,bmi22,"BMI at 21 Years",Continuous, 4 | bpSys.21,NA,"Mean Systolic Blood Pressure at 21 Years",Continuous, 5 | ysrExter.14,jexter,"YSR: externalising at 14 years",Continuous, 6 | ysrInter.14,jinter,"YSR: internalising at 14 years",Continuous, 7 | ysrAggre.14,jaggre,"YSR: aggression at 
14 years",Continuous, 8 | matEducat,ra80,"Maternal Education","1=Incomplete High",Ordered 9 | ,,,"2=Complete High", 10 | ,,,"3=Post High", 11 | familyIncome,ra90,"Recoded Income Phase A","1=$10399 or less",Nominal 12 | ,,,"2=$10400 or more", 13 | sex,c45,"Sex of baby",1=Male,Factor 14 | ,,,2=Female, 15 | -------------------------------------------------------------------------------- /inst/demoFiles/data1-birth.csv: -------------------------------------------------------------------------------- 1 | ID,matEducat,familyIncome,bmiM,sex 2 | 5455,Incomplete High,$10400 or more,18.90359116,female 3 | 7036,Complete High,$10399 or less,17.99816322,Female 4 | 5973,Complete High,$10400 or more,23.50780678,Female 5 | 7142,Incomplete High,$10400 or more,36.26243973,Female 6 | 3003,Complete High,$10400 or more,21.35930634,Female 7 | 1020,Complete High,$10399 or less,18.42653275,Female 8 | 1998,Incomplete High,$10399 or less,24.39105797,Female 9 | 3377,Post High,$10400 or more,18.14486885,Female 10 | 5486,Complete High,$10400 or more,20.07733536,Female 11 | 8321,Complete High,$10400 or more,35.29554749,Female 12 | 238,Complete High,$10400 or more,20.0796032,Female 13 | 911,Post High,$10400 or more,17.85651588,Female 14 | 5576,Complete High,$10400 or more,22.48132896,Female 15 | 6,Complete High,$10400 or more,20.28123283,Female 16 | 2476,Complete High,$10399 or less,18.44254684,Female 17 | 3018,Complete High,$10400 or more,25.55932808,Female 18 | 2463,Complete High,$10400 or more,23.12466812,Female 19 | 2525,Complete High,$10400 or more,21.96712112,Female 20 | 1077,Complete High,$10399 or less,21.71924973,Female 21 | 7745,Incomplete High,$10400 or more,NA,Female 22 | 2948,Post High,$10400 or more,18.75,Female 23 | 1992,Complete High,$10399 or less,20.66115761,Female 24 | 7887,Complete High,$10399 or less,23.14725494,Female 25 | 5145,Incomplete High,$10400 or more,22.64737701,Female 26 | 4499,Complete High,$10399 or less,19.19530296,Female 27 | 2461,Post High,$10400 or 
more,19.83471107,Female 28 | 5058,Complete High,$10400 or more,19.49317741,Female 29 | 3590,Complete High,$10399 or less,21.1927433,Female 30 | 1307,Complete High,$10400 or more,19.72386742,Female 31 | 3144,Incomplete High,$10400 or more,20.57613182,Female 32 | 6078,Incomplete High,$10399 or less,23.4375,Female -------------------------------------------------------------------------------- /inst/demoFiles/data1-yr21.csv: -------------------------------------------------------------------------------- 1 | ID,bmi.21,bpSys.21,bpDia.21,ysrExter.14,ysrInter.14,ysrAnxiousDep.14,ysrAggre.14 2 | 3377,23.981262207,97,64,9,19,8,5 3 | 5486,21.1265296936,106.5,58.5,16,15,8,13 4 | 8321,25.3243427277,93,58.5,NA,NA,NA,NA 5 | 238,21.9651050568,118.5,66.5,11,16,9,8 6 | 911,20.5254440308,110,69.5,NA,NA,NA,NA 7 | 5576,26.9159183502,105.5,58,1,9,3,1 8 | 6,23.2088565826,105,66.5,8,13,4,7 9 | 2476,23.8914604187,124,86.5,7,20,9,6 10 | 3018,34.0110778809,122,77,8,12,6,3 11 | 2463,23.9096088409,115.5,58,16,23,11,12 12 | 2525,19.7118644714,106,72,6,4,3,5 13 | 1077,24.9430541992,103,60,NA,NA,NA,NA 14 | 7745,25.0972232819,111.5,65,11,14,9,8 15 | 2948,21.1183815002,114,62,4,7,3,2 16 | 1992,16.8445663452,110.5,76.5,4,7,2,2 17 | 7887,31.9731903076,106,53.5,20,24,14,15 18 | 5145,19.8052864075,103,69,0,8,0,0 19 | 4499,18.8999385834,119,63.5,31,32,16,22 20 | 2461,30.5249824524,98.5,64,15,11,5,14 21 | 5058,19.9596271515,109,58.5,13,17,9,9 22 | 3590,22.6426544189,107,68,26,20,10,21 23 | 1307,19.2290248871,97,64,9,13,5,8 24 | 3144,22.893573761,111.5,69,15,19,9,12 25 | 6078,23.2556152344,111,55,NA,NA,NA,NA 26 | 40,45.010433197,NA,NA,NA,NA,NA,NA 27 | 1110,20.3244380951,106.5,63.5,12,7,4,10 28 | 4845,22.6430568695,87.5,54.5,11,13,4,7 29 | 2301,17.393705368,100,62,9,13,2,6 30 | 2241,17.6504669189,109,69.5,9,7,3,8 31 | 6794,23.3109378815,103,63,29,5,1,22 32 | 7071,26.2172737122,105,68,13,31,13,7 33 | 4180,24.1990242004,132,86.5,6,4,2,4 34 | 
-------------------------------------------------------------------------------- /inst/demoFiles/data1_codebook.csv: -------------------------------------------------------------------------------- 1 | Variable,Old.Variable,Label,Factor.Levels,Min,Max,Limits 2 | ID,codea,"Subject ID",Continuous,,, 3 | bmi.21,bmi22,"BMI at 21 Years",Continuous,,, 4 | bpSys.21,NA,"Mean Systolic Blood Pressure at 21 Years",Continuous,,, 5 | ysrExter.14,jexter,"YSR: externalising at 14 years",Continuous,,,"1 100" 6 | ysrInter.14,jinter,"YSR: internalising at 14 years",Continuous,1,100, 7 | ysrAggre.14,jaggre,"YSR: aggression at 14 years",Continuous,,, 8 | matEducat,ra80,"Maternal Education","1=Incomplete High",,, 9 | ,,,"2=Complete High",,, 10 | ,,,"3=Post High",,, 11 | familyIncome,ra90,"Recoded Income Phase A","1=$10399 or less",,, 12 | ,,,"2=$10400 or more",,, 13 | sex,c45,"Sex of baby",1=Male,,, 14 | ,,,2=Female,,, 15 | -------------------------------------------------------------------------------- /inst/demoFiles/small2.csv: -------------------------------------------------------------------------------- 1 | ID,bmi.21,bpSys.21,bpDia.21,ysrExter.14,ysrInter.14,ysrAnxiousDep.14,ysrAggre.14,matEducat,familyIncome,bmiM,sex 2 | 5455,20.8677444458008,102.5,71,14,13,5,8,1,2,18.9035911560059,1 3 | 7036,23.9626560211182,118,75,15,12,6,13,2,1,17.9981632232666,1 4 | 5973,35.464469909668,105.5,63.5,11,24,8,10,2,2,23.5078067779541,1 5 | 7142,25.5918025970459,121,73.5,9,10,6,7,1,2,36.2624397277832,1 6 | 3003,33.2415084838867,115,74.5,6,9,4,4,2,2,21.3593063354492,1 7 | 1020,19.2122211456299,109,61.5,12,23,15,9,2,1,18.4265327453613,1 8 | 1998,26.3721942901611,112.5,74,12,17,5,8,1,1,24.3910579681396,1 9 | 3377,23.9812622070312,97,64,9,19,8,5,3,2,18.144868850708,1 10 | 5486,21.1265296936035,106.5,58.5,16,15,8,13,2,2,20.077335357666,1 11 | 8321,25.3243427276611,93,58.5,,,,,2,2,35.2955474853516,1 12 | 238,21.9651050567627,118.5,66.5,11,16,9,8,2,2,20.0796031951904,1 13 | 
911,20.5254440307617,110,69.5,,,,,3,2,17.8565158843994,1 14 | 5576,26.9159183502197,105.5,58,1,9,3,1,2,2,22.4813289642334,1 15 | 6,23.2088565826416,105,66.5,8,13,4,7,2,2,20.2812328338623,1 16 | 2476,23.8914604187012,124,86.5,7,20,9,6,2,1,18.4425468444824,1 17 | 3018,34.0110778808594,122,77,8,12,6,3,2,2,25.5593280792236,1 18 | 2463,23.9096088409424,115.5,58,16,23,11,12,2,2,23.1246681213379,1 19 | 2525,19.7118644714355,106,72,6,4,3,5,2,2,21.9671211242676,1 20 | 1077,24.9430541992188,103,60,,,,,2,1,21.7192497253418,1 21 | 7745,25.0972232818604,111.5,65,11,14,9,8,1,2,,1 22 | 2948,21.1183815002441,114,62,4,7,3,2,3,2,18.75,1 23 | 1992,16.8445663452148,110.5,76.5,4,7,2,2,2,1,20.6611576080322,1 24 | 7887,31.9731903076172,106,53.5,20,24,14,15,2,1,23.1472549438477,1 25 | 5145,19.8052864074707,103,69,0,8,0,0,1,2,22.6473770141602,1 26 | 4499,18.899938583374,119,63.5,31,32,16,22,2,1,19.1953029632568,1 27 | 2461,30.5249824523926,98.5,64,15,11,5,14,3,2,19.8347110748291,1 28 | 5058,19.9596271514893,109,58.5,13,17,9,9,2,2,19.4931774139404,1 29 | 3590,22.6426544189453,107,68,26,20,10,21,2,1,21.1927433013916,1 30 | 1307,19.229024887085,97,64,9,13,5,8,2,2,19.7238674163818,1 31 | 3144,22.8935737609863,111.5,69,15,19,9,12,1,2,20.5761318206787,1 32 | 6078,23.255615234375,111,55,,,,,1,1,23.4375,1 33 | 40,45.0104331970215,,,,,,,2,1,27.7551021575928,1 34 | 1110,20.3244380950928,106.5,63.5,12,7,4,10,2,1,23.1084327697754,1 35 | 4845,22.6430568695068,87.5,54.5,11,13,4,7,1,,21.7784366607666,1 36 | 2301,17.393705368042,100,62,9,13,2,6,3,2,17.5278148651123,1 37 | 2241,17.6504669189453,109,69.5,9,7,3,8,2,1,16.7871894836426,1 38 | 6794,23.3109378814697,103,63,29,5,1,22,1,2,31.6455688476562,1 39 | 7071,26.2172737121582,105,68,13,31,13,7,3,2,20.3222541809082,1 40 | 4180,24.1990242004395,132,86.5,6,4,2,4,2,2,23.1833896636963,1 41 | 810,30.0327854156494,103,73,19,8,6,15,2,2,19.5716819763184,1 42 | 5947,22.4303493499756,83,52.5,17,15,7,14,2,2,26.3464946746826,1 43 | 
2685,21.8233470916748,105,70,5,5,2,4,3,2,19.9572830200195,1 44 | 7279,23.515625,111,66,11,4,1,10,2,2,19.0039100646973,1 45 | 5848,22.1951332092285,102,66.5,6,9,3,5,2,2,21.0076656341553,1 46 | 1479,33.565502166748,108,68.5,17,26,12,13,2,2,24.8015880584717,1 47 | 1910,21.1316337585449,133,77.5,12,32,14,11,2,2,,1 48 | 399,31.4460906982422,114.5,55,10,22,10,7,3,1,28.125,1 49 | 302,14.4751873016357,98.5,68,12,20,10,11,2,2,21.0076656341553,1 50 | 373,30.1474914550781,115,85,,,,,2,1,,1 51 | 4220,16.3339214324951,99.5,57.5,,,,,1,2,22.3132762908936,1 52 | 7609,27.9112205505371,121,69,,,,,2,2,19.6751670837402,1 53 | 6231,21.7403793334961,102,56,14,18,10,12,3,2,20.3244380950928,1 54 | 1858,21.9790191650391,117.5,61.5,17,22,10,14,3,2,19.921875,1 55 | 1218,29.8472156524658,104.5,67,9,25,17,7,2,1,20.8325290679932,1 56 | 5220,19.8430080413818,102,64,13,19,13,10,2,2,,1 57 | 3196,19.0460739135742,109.5,64.5,4,9,5,3,2,2,20.8325290679932,1 58 | 5180,24.6959247589111,100.5,60,9,23,11,8,,,,1 59 | 510,32.2904205322266,115,77,7,6,3,5,2,2,26.4380702972412,1 60 | 2689,21.0265522003174,98.5,69.5,16,21,8,12,3,2,18.5178031921387,1 61 | 1317,26.0678462982178,116,71,23,16,8,17,2,2,19.0519733428955,1 62 | 5914,35.5302467346191,115.5,65.5,10,12,4,7,2,2,22.6473770141602,1 63 | 1428,20.5154933929443,82,58,8,14,7,6,,,,1 64 | 7777,26.3855571746826,118.5,77,9,14,5,5,3,2,18.2183227539062,1 65 | 8554,21.92458152771,90,57.5,8,12,7,4,3,2,,1 66 | 7661,21.5674285888672,111.5,56.5,15,15,5,11,1,1,,1 67 | 1249,19.2634773254395,92,59.5,7,9,2,6,1,1,17.6253795623779,1 68 | 7303,22.4844379425049,84.5,48.5,12,11,5,12,3,,22.5465755462646,1 69 | 871,23.1634578704834,99,66.5,24,15,8,16,2,2,31.8877563476562,1 70 | 3860,20.6020793914795,104,59.5,16,18,10,10,2,2,23.5330429077148,1 71 | 7277,22.4729232788086,97,67.5,19,17,8,15,1,2,18.5901260375977,1 72 | 4720,23.4520874023438,92,55.5,11,6,2,10,2,2,19.7210388183594,1 73 | 3185,22.4566268920898,98.5,65.5,23,15,5,12,2,1,18.5911293029785,1 74 | 
2250,18.5063438415527,98,65.5,9,17,6,5,2,,22.8623676300049,1 75 | 1935,22.6705875396729,101,67,9,7,4,6,2,1,23.7953605651855,1 76 | 320,23.3865833282471,123.5,77,9,11,4,9,2,2,18.4911251068115,1 77 | 3991,22.9960041046143,104,65,,,,,2,1,21.8299522399902,1 78 | 7343,17.578125,91,58,9,21,10,6,3,2,21.7554683685303,1 79 | 4335,26.438684463501,115.5,59.5,7,5,0,5,2,2,20.3244380950928,1 80 | 3204,19.2235012054443,110.5,71.5,9,20,8,6,3,2,19.8140525817871,1 81 | 863,27.3452472686768,115.5,72.5,6,18,6,4,1,1,19.6751670837402,1 82 | 6452,43.4574890136719,,,28,20,9,20,3,2,26.171875,1 83 | 3230,19.5713901519775,112.5,67.5,16,12,4,11,2,1,,1 84 | 54,30.6797275543213,117,76,11,31,12,10,2,2,,1 85 | 139,19.1901168823242,104.5,57.5,20,17,8,13,2,1,20.1950912475586,1 86 | 836,31.8324546813965,102.5,72,14,21,9,11,2,2,21.9075813293457,1 87 | 1745,23.7109375,90.5,56.5,13,15,5,11,2,1,18.5528774261475,1 88 | 6062,30.3766708374023,,,5,19,7,4,2,2,19.8351612091064,1 89 | 3121,25.5561008453369,104,68.5,9,6,2,6,2,2,24.5243453979492,1 90 | 5879,23.1833896636963,105.5,74.5,6,12,2,2,2,1,,1 91 | 393,24.8070182800293,110,67,34,39,23,25,2,1,20.6851501464844,1 92 | 6972,20.2385520935059,112,70,19,16,9,12,3,2,20.3244380950928,1 93 | 5898,22.1559009552002,91.5,52.5,,,,,2,1,21.484375,1 94 | 487,18.8195743560791,98,57.5,11,20,8,9,2,2,18.7783451080322,1 95 | 3783,21.6868095397949,104,64.5,11,2,0,7,3,2,23.0111789703369,1 96 | 4843,24.1783771514893,112,58,8,13,5,4,1,1,18.0661678314209,1 97 | 3179,25.4127063751221,119.5,85,16,51,30,9,2,2,21.3382110595703,1 98 | 4865,21.9896717071533,124,69.5,34,36,25,22,2,2,21.4535732269287,1 99 | 8314,20.1276664733887,105,63.5,8,5,2,6,1,1,18.5077667236328,1 100 | 563,28.8791923522949,110,80,26,21,12,19,3,2,19.4674015045166,1 101 | 2366,30.7583770751953,119.5,74,10,15,8,8,2,1,36.4848442077637,1 102 | 1640,25.7859077453613,117,68.5,13,6,2,10,2,2,20.7612438201904,1 103 | 3878,33.4688987731934,125.5,71.5,15,12,4,12,2,2,23.3066806793213,1 104 | 
7283,19.5362567901611,107.5,57,18,8,3,12,2,2,20.4381675720215,1 105 | 7570,19.5252227783203,105,61,22,18,7,19,2,2,33.0904235839844,1 106 | 7780,22.2190551757812,132,81.5,3,2,2,0,2,2,19.5716819763184,1 107 | 6593,20.9771423339844,94,53.5,12,8,2,8,2,1,19.1511917114258,1 108 | 4068,33.0035934448242,114.5,68.5,23,15,4,18,2,2,19.8140525817871,1 109 | 6227,26.6387805938721,102,59,15,41,28,13,3,2,21.3599128723145,1 110 | 1345,28.2099494934082,124.5,73.5,1,15,6,1,2,2,23.2254333496094,1 111 | 1178,23.7202053070068,102,59.5,7,2,0,5,2,1,16.6089954376221,1 112 | 7613,28.5120124816895,105,62.5,16,11,2,12,2,2,20.4381675720215,1 113 | 2234,20.1597938537598,108.5,64.5,3,5,1,3,3,2,33.7924728393555,1 114 | 5117,29.9875030517578,,,11,22,11,10,2,1,21.4535732269287,1 115 | 2196,15.9438781738281,116,54.5,7,16,9,6,2,,17.211088180542,1 116 | 5569,20.4923572540283,104,62,5,1,1,4,1,2,20.6575393676758,1 117 | 6554,27.4722881317139,112,66.5,17,16,9,13,2,2,29.296875,1 118 | 8164,23.446870803833,,,3,8,3,2,1,2,22.4081783294678,1 119 | 3624,23.039098739624,105,67.5,17,13,5,12,2,2,18.7327823638916,1 120 | 56,21.4550914764404,122.5,74,9,10,3,8,2,1,17.7458457946777,1 121 | 1788,25.582691192627,111.5,72.5,,,,,2,1,17.6470584869385,1 122 | 3399,23.6641807556152,111.5,62,5,7,4,4,1,2,28.040376663208,1 123 | 3116,19.7437915802002,88,49.5,4,7,2,4,2,2,17.211088180542,1 124 | 533,23.213472366333,119.5,68,7,10,4,5,3,2,18.5077667236328,1 125 | 4362,34.0397758483887,111.5,66,17,19,9,8,1,2,20.9571704864502,1 126 | 4631,24.12473487854,86,50.5,20,37,16,12,2,2,21.7192497253418,1 127 | 5536,26.2993564605713,111,63,11,25,11,10,2,2,,1 128 | 6881,22.65625,107.5,61.5,9,3,2,8,2,2,25.4028167724609,1 129 | 1037,26.1569213867188,99.5,59,4,9,3,2,2,,31.2394142150879,1 130 | 1488,19.5800628662109,100,61.5,20,22,9,17,2,2,20.3618850708008,1 131 | 8005,37.3174667358398,,,22,39,22,17,2,2,16.7311134338379,1 132 | 214,31.2312698364258,110.5,68.5,14,19,10,12,2,2,18.2531089782715,1 133 | 
4242,23.9470119476318,89,59,11,22,11,9,2,2,22.4996376037598,1 134 | 5589,21.8771076202393,112.5,64.5,14,23,13,9,2,1,31.5334491729736,1 135 | 4161,18.6408767700195,115.5,75,7,9,4,5,2,2,17.8565158843994,1 136 | 6835,22.9030342102051,121,57,10,7,1,5,2,2,20.4444446563721,1 137 | 6161,19.7426815032959,94,57.5,7,2,0,5,2,2,16.4236488342285,1 138 | 4435,22.8859462738037,107.5,56.5,7,3,1,7,3,2,19.3337306976318,1 139 | 1023,19.7446784973145,103.5,67.5,8,14,3,6,2,2,21.5645446777344,1 140 | 556,20.2626857757568,110,65.5,26,38,17,20,2,,18.7783451080322,1 141 | 8426,17.9295806884766,93.5,58.5,16,5,0,12,2,1,26.7094039916992,1 142 | 3106,18.7165184020996,116.5,64.5,15,20,12,13,3,2,22.3132762908936,1 143 | 7344,23.5939750671387,103.5,66.5,11,13,5,10,2,2,22.4813289642334,1 144 | 1168,20.9248237609863,114.5,62.5,11,12,4,10,2,1,16.9746532440186,1 145 | 3808,31.5723533630371,121,77,17,3,2,12,,2,34.6476020812988,1 146 | 815,23.4215145111084,103,64.5,9,33,19,7,2,2,,1 147 | 7326,19.9222717285156,129,76.5,11,14,9,6,3,2,19.2276859283447,1 148 | 7110,28.8682289123535,118.5,78,8,5,2,5,3,2,22.2063312530518,1 149 | 2890,19.535551071167,110,66,22,15,7,18,2,1,24.3910579681396,1 150 | 6444,29.9089832305908,96.5,52,10,6,0,7,3,2,22.0385684967041,1 151 | 2031,32.5567398071289,108.5,58.5,6,21,10,4,2,1,19.1326541900635,1 152 | 5614,27.6816596984863,92.5,55.5,9,12,4,7,3,2,19.045072555542,1 153 | 3308,32.3708724975586,114,75,16,29,12,11,3,1,24.3417568206787,1 154 | 5244,21.4492950439453,94,58,10,15,8,8,3,2,21.9075813293457,1 155 | 1914,25.6224670410156,119,67,30,25,16,18,2,2,26.6097106933594,1 156 | 1126,18.1896991729736,101,55,10,15,6,8,3,2,19.2671699523926,1 157 | 5122,22.9677429199219,118,65.5,12,15,4,7,2,2,25.2993774414062,1 158 | 8125,19.4248008728027,118,58.5,4,4,0,4,3,2,20.9839878082275,1 159 | 2447,19.9704151153564,111,67.5,20,15,6,14,2,1,18.0802116394043,1 160 | 5795,19.0643348693848,118.5,76.5,15,22,8,12,2,2,19.4674015045166,1 161 | 3566,43.5425834655762,,,9,23,12,5,2,2,23.1404972076416,1 162 | 
4708,20.3997135162354,113.5,60,13,7,4,9,2,2,19.8347110748291,1 163 | 3502,23.2353897094727,114.5,64.5,12,11,5,10,3,1,19.6751670837402,1 164 | 8163,21.3015880584717,109.5,64.5,12,16,6,10,2,2,24.9107685089111,1 165 | 5334,17.4265518188477,95.5,61.5,3,7,2,2,1,2,20.5761318206787,1 166 | 2077,35.7177543640137,98,66.5,21,13,4,17,2,2,26.9860019683838,1 167 | 2821,25.3164539337158,107.5,72.5,14,16,11,9,2,2,20.1732521057129,1 168 | 6992,43.2210998535156,,,26,23,13,19,1,1,,1 169 | 912,22.8791885375977,105.5,60,10,26,11,7,3,2,,1 170 | 5575,22.9744052886963,106.5,72,8,17,5,7,2,2,30.4438381195068,1 171 | 2190,19.921875,110.5,57.5,3,5,3,2,2,1,19.1000919342041,1 172 | 8217,22.1659336090088,105.5,64.5,24,22,14,19,1,2,22.4058780670166,1 173 | 1002,20.2261772155762,99,60,4,13,4,3,3,2,20.9566097259521,1 174 | 7874,24.6659183502197,113.5,69,13,23,12,10,2,2,22.0385684967041,1 175 | 1617,42.385799407959,109,74.5,34,36,19,27,2,1,26.0261745452881,1 176 | 2018,29.1981105804443,112.5,65.5,12,16,6,9,2,1,27.9155197143555,1 177 | 8214,19.5828094482422,110,67.5,13,21,12,12,2,2,15.917106628418,1 178 | 4280,21.7375545501709,105.5,65.5,10,7,2,8,2,2,22.3477840423584,1 179 | 6255,17.0020084381104,83,57,10,12,4,8,1,2,17.2879085540771,1 180 | 5089,24.809627532959,99,65,12,6,3,8,3,2,28.228385925293,1 181 | 5393,27.3463859558105,97,71,20,17,9,14,2,2,16.7657375335693,1 182 | 886,25.7647724151611,105.5,61,1,2,1,1,2,2,19.6751670837402,1 183 | 2254,20.5003967285156,116.5,65,16,21,9,14,1,1,18.7961902618408,1 184 | 1038,19.1896629333496,101,65.5,16,29,14,14,3,2,20.9366397857666,1 185 | 1313,24.1357288360596,101,62.5,5,18,8,1,3,2,24.128791809082,1 186 | 6645,22.4277763366699,104,65,13,23,13,11,2,2,23.4952373504639,1 187 | 5172,20.4298992156982,125,71,13,17,9,9,2,1,25.7116622924805,1 188 | 1833,37.4132194519043,113,78.5,14,19,8,8,2,1,22.9130325317383,1 189 | 6314,20.0401554107666,101.5,63,29,18,10,23,3,2,20.2693576812744,1 190 | 861,24.6815395355225,100,50.5,20,35,20,15,2,1,20.6417694091797,1 191 | 
833,21.8741550445557,107.5,65,,,,,3,1,21.9075813293457,1 192 | 4663,23.6081314086914,119.5,66.5,8,11,4,4,3,2,21.2585048675537,1 193 | 8184,23.125,116.5,50.5,20,18,7,16,2,1,25.2173595428467,1 194 | 1577,27.07204246521,114,68,15,23,9,10,2,2,20.1732521057129,1 195 | 1593,18.4425468444824,109.5,69.5,16,13,5,12,2,1,22.1297397613525,1 196 | 6670,24.3214817047119,104.5,73.5,11,7,2,8,3,2,23.7196102142334,1 197 | 8261,23.7350196838379,95.5,58,7,6,0,4,2,2,30.8462772369385,1 198 | 1369,16.1225090026855,111,67,19,13,5,11,2,2,20.077335357666,1 199 | 4985,25.7227897644043,105.5,62.5,6,14,4,6,1,2,32.0499496459961,1 200 | 1248,21.1676368713379,94.5,56.5,17,18,13,14,2,,22.5827083587646,1 201 | 2487,25.3076229095459,117,69,,,,,2,1,22.832878112793,1 202 | 4320,26.6112403869629,124,61,8,4,1,7,2,2,22.3081512451172,2 203 | 7304,20.1956233978271,130.5,81.5,22,11,3,14,2,2,17.6692981719971,2 204 | 791,33.2936973571777,,,3,8,4,2,2,1,26.298490524292,2 205 | 3585,21.5905513763428,113.5,67,3,2,0,3,2,2,,2 206 | 5136,29.8355598449707,127,83,19,4,0,11,3,2,30.7394161224365,2 207 | 6093,18.7822570800781,102,58.5,13,5,2,9,2,2,21.3839416503906,2 208 | 596,25.827486038208,124,67.5,20,20,10,14,1,2,22.8571434020996,2 209 | 436,25.8781127929688,121,59,27,20,9,21,2,2,20.2812328338623,2 210 | 3936,23.7937545776367,115,69,12,8,4,11,3,2,19.921875,2 211 | 5001,21.2272453308105,130.5,72,10,6,2,8,2,1,21.9898910522461,2 212 | 1851,20.4511470794678,115,64,3,3,1,2,3,2,18.6851196289062,2 213 | 5395,21.0751838684082,127.5,63,2,3,0,2,2,2,22.3214302062988,2 214 | 2494,22.7920227050781,122.5,65,17,9,5,12,2,1,20.3125,2 215 | 1693,22.4104404449463,118,65.5,9,13,3,6,2,2,21.2585048675537,2 216 | 7649,30.7669830322266,109.5,59,9,12,4,7,2,2,26.5731315612793,2 217 | 2904,20.4285755157471,116.5,74,2,16,7,2,3,2,19.6623077392578,2 218 | 2085,20.8254127502441,125,71.5,12,7,3,9,2,2,24.5389652252197,2 219 | 3719,19.1058349609375,117.5,60.5,7,10,4,7,3,1,15.3502044677734,2 220 | 
4674,23.9506187438965,112.5,62,11,15,8,8,1,1,23.5078067779541,2 221 | 1062,26.3352699279785,116.5,58,11,14,6,7,1,1,20.8325290679932,2 222 | 712,17.7800483703613,123.5,73,15,17,8,11,3,1,24.8409805297852,2 223 | 699,23.570873260498,134,81,17,14,9,12,2,2,22.3081512451172,2 224 | 592,25.7173118591309,131,61.5,5,4,0,3,2,,27.34375,2 225 | 3687,29.0735969543457,147.5,83,9,12,4,4,2,2,29.3846759796143,2 226 | 7164,20.5938682556152,122,65,16,24,10,10,2,2,19.5964584350586,2 227 | 5498,21.0249328613281,113,63.5,10,14,5,9,1,2,19.713321685791,2 228 | 7255,28.4398307800293,136,71,13,4,2,11,3,2,19.8991146087646,2 229 | 2919,19.3549041748047,117.5,68.5,37,25,5,24,2,2,18.6114959716797,2 230 | 3133,23.4818840026855,133.5,70.5,25,9,3,19,2,1,18.9865894317627,2 231 | 4270,25.1278305053711,127,72,21,1,0,14,3,2,19.4734058380127,2 232 | 1902,22.77614402771,106,60.5,6,6,2,5,3,2,,2 233 | 3993,30.8955554962158,142,75.5,9,14,5,5,2,1,26.2595825195312,2 234 | 3215,29.8158493041992,138.5,74,21,10,1,18,2,2,21.410945892334,2 235 | 5323,26.5842018127441,153.5,72.5,10,9,5,7,2,2,25.8166313171387,2 236 | 5693,26.6680088043213,128.5,67,10,3,1,8,2,2,18.144868850708,2 237 | 3497,20.6481895446777,115,64,13,12,5,7,2,2,18.0737209320068,2 238 | 7969,21.724292755127,123.5,61,5,9,4,5,2,2,20.4381675720215,2 239 | 8537,19.304630279541,121.5,73,14,24,8,13,3,2,19.1511917114258,2 240 | 3136,17.8639068603516,121,76.5,25,28,11,18,2,2,17.2879085540771,2 241 | 2442,28.4737949371338,129.5,75.5,23,16,6,20,2,1,20.4491386413574,2 242 | 7447,23.8715286254883,138.5,71.5,13,7,1,9,3,2,17.3744525909424,2 243 | 7480,26.1141109466553,123.5,65.5,17,15,8,13,2,1,25.8645133972168,2 244 | 7148,25.3975791931152,141.5,79,9,12,5,6,2,2,22.136739730835,2 245 | 6052,21.6285400390625,116,65.5,1,4,3,1,3,2,23.046875,2 246 | 4562,19.537036895752,118.5,82,14,10,6,9,2,2,20.5761318206787,2 247 | 5047,24.7855243682861,134.5,74.5,3,8,2,2,3,1,24.9107685089111,2 248 | 7418,28.5954360961914,114.5,74.5,9,9,4,6,2,1,21.9898910522461,2 249 | 
3613,26.9894924163818,135,75.5,6,12,5,5,2,2,23.9188289642334,2 250 | 2098,23.371072769165,126.5,71.5,11,7,2,7,2,2,23.0680522918701,2 251 | 6079,19.7928066253662,129.5,62.5,10,18,8,8,2,2,22.3081512451172,2 252 | 757,23.146951675415,146.5,90,8,8,4,5,3,2,23.1472549438477,2 253 | 7557,23.589527130127,131,65,4,5,4,3,2,1,18.8271102905273,2 254 | 181,33.9814834594727,127.5,75,19,16,8,13,3,1,23.7332382202148,2 255 | 1645,26.6315879821777,136.5,65.5,7,11,5,3,2,1,22.9130325317383,2 256 | 6729,23.2834968566895,107,56.5,10,7,2,7,2,2,22.5465755462646,2 257 | 6451,22.3381519317627,123,67,5,5,2,2,3,1,24.0054874420166,2 258 | 96,23.6044769287109,111,59,20,7,1,17,3,2,19.5716819763184,2 259 | 1114,34.4740333557129,125,77.5,16,15,9,13,2,2,23.3354663848877,2 260 | 718,27.2196502685547,120.5,69,28,42,22,27,2,2,24.5089435577393,2 261 | 3126,18.7221660614014,117,62,16,25,10,10,2,2,19.3791961669922,2 262 | 7426,28.586877822876,130,78,5,3,1,1,2,2,24.9739875793457,2 263 | 6299,19.3284587860107,116.5,67.5,25,5,3,13,3,2,17.9282169342041,2 264 | 6173,21.8564300537109,127,79,10,12,5,7,3,2,26.7228717803955,2 265 | 7083,19.2950553894043,115.5,69.5,19,9,6,10,2,1,21.0498180389404,2 266 | 5238,23.942268371582,119,70,,,,,2,1,23.6614379882812,2 267 | 514,30.4505710601807,137,82,6,6,2,3,1,2,20.239501953125,2 268 | 727,26.3272686004639,105,56,11,11,4,10,3,2,23.3066806793213,2 269 | 6018,25.5823535919189,150.5,77.5,14,7,3,11,2,2,25,2 270 | 7828,20.7232227325439,119.5,61,6,4,1,4,2,1,20.4491386413574,2 271 | 5066,24.4665012359619,133.5,68.5,1,1,1,1,2,2,20.1955795288086,2 272 | 868,34.8506813049316,128,81,11,23,9,10,2,2,25.153148651123,2 273 | 1911,21.5328578948975,125.5,82.5,10,12,6,6,2,1,20.3244380950928,2 274 | 938,25.6015777587891,127,75.5,13,17,4,11,1,1,17.4043731689453,2 275 | 7789,33.4565200805664,134.5,77.5,22,13,9,16,2,2,22.9421291351318,2 276 | 5598,24.4284496307373,122.5,64,16,30,12,11,3,2,24.9739875793457,2 277 | 2449,26.8859825134277,128.5,74,18,11,6,12,2,1,18.9035911560059,2 278 | 
3657,27.3275566101074,135.5,84,21,15,6,16,2,,27.3356380462646,2 279 | 173,25.0908107757568,133.5,71,7,12,5,6,1,,24.5089435577393,2 280 | 2050,22.4089641571045,110,57.5,9,17,7,7,1,2,26.6727638244629,2 281 | 6422,23.9793319702148,129.5,70,24,19,14,19,2,,18.9020385742188,2 282 | 758,34.8162117004395,138,74.5,21,18,7,15,2,2,18.730489730835,2 283 | 5936,21.3220996856689,138,76.5,14,20,9,10,2,2,24.0346088409424,2 284 | 594,19.1953811645508,118.5,53.5,12,25,11,11,2,2,20.3222541809082,2 285 | 110,22.1910209655762,128,59,17,11,8,13,3,2,,2 286 | 8371,15.7239370346069,107,64.5,12,10,1,12,1,2,23.8086891174316,2 287 | 2556,28.0165328979492,140.5,93,13,14,6,11,1,2,15.2415790557861,2 288 | 400,26.5130004882812,121,71.5,9,12,4,9,2,2,23.5555553436279,2 289 | 5943,22.1630878448486,119.5,63.5,,,,,2,1,20.4294166564941,2 290 | 3261,22.2164325714111,132,69,6,15,5,5,2,2,24.3141250610352,2 291 | 3339,41.3713531494141,153,74.5,10,14,4,4,2,1,22.3081512451172,2 292 | 5092,23.4875011444092,112.5,67,,,,,2,2,20.077335357666,2 293 | 6181,19.2407646179199,118.5,64.5,28,41,20,18,2,2,19.6751670837402,2 294 | 348,20.9302310943604,124.5,64.5,13,16,6,9,2,2,,2 295 | 2020,24.2724304199219,115.5,75,20,16,10,17,2,1,23.7953605651855,2 296 | 5949,17.1682643890381,94.5,51,12,13,10,9,2,2,18.9020385742188,2 297 | 1714,23.2513732910156,122.5,70.5,4,7,0,3,2,2,24.3141250610352,2 298 | 6039,26.6666679382324,100,56,4,18,7,4,1,2,23.4952373504639,2 299 | 1246,24.055549621582,141,69.5,17,22,11,14,1,1,18.9865894317627,2 300 | 3598,22.0031032562256,149.5,73,22,9,5,15,2,1,,2 301 | 87,25.0037593841553,122,70,11,3,0,10,3,2,15.8172903060913,2 302 | 299,30.1281833648682,138,69,6,5,2,4,2,,17.6326522827148,2 303 | 3935,25.6196937561035,119,67.5,8,12,6,6,2,2,20.6575393676758,2 304 | 5053,40.9283065795898,125.5,68,,,,,1,2,,2 305 | 4486,24.2242107391357,138,65,15,16,9,12,1,2,26.3958034515381,2 306 | 3742,30.5959167480469,119.5,75.5,7,8,6,6,2,2,22.3081512451172,2 307 | 
4728,17.7164516448975,111.5,69,17,43,24,15,2,2,18.9069004058838,2 308 | 1205,18.370719909668,112.5,65.5,33,18,13,27,2,,18.5077667236328,2 309 | 1401,20.3780784606934,114.5,63.5,14,13,6,10,1,1,20.8325290679932,2 310 | 2848,32.5252647399902,104.5,50.5,13,18,8,10,2,2,32.7919425964355,2 311 | 1991,21.3102264404297,119,65,7,18,9,5,1,1,22.8623676300049,2 312 | 2432,23.5117454528809,119.5,71,21,22,13,14,2,2,18.6114959716797,2 313 | 2655,40.3215408325195,125,75.5,14,16,7,9,1,2,21.6712589263916,2 314 | 7096,25.6042823791504,122,65,17,16,5,15,2,2,24.0238094329834,2 315 | 1282,22.2342376708984,115.5,67,9,6,1,6,2,2,19.3624744415283,2 316 | 2171,33.8817520141602,129.5,68.5,3,3,1,1,1,2,,2 317 | 6409,20.4284038543701,124.5,70,3,3,0,3,3,2,18.8267936706543,2 318 | 3314,23.5690231323242,117,54,,,,,2,,21.9898910522461,2 319 | 3755,19.3814182281494,127,75.5,1,0,0,0,1,1,23.7332382202148,2 320 | 1510,21.3400535583496,103,66,12,9,2,10,1,2,21.7738437652588,2 321 | 6313,28.5386638641357,132,69.5,9,5,3,6,1,2,22.0741004943848,2 322 | 1802,27.8569183349609,,,26,21,15,19,1,1,21.513858795166,2 323 | 4620,23.1439876556396,131,68,24,17,8,18,3,1,19.2276859283447,2 324 | 4037,29.0119781494141,121.5,78.5,,,,,2,2,16.037956237793,2 325 | 2412,20.1446285247803,118.5,64.5,12,16,5,5,2,2,16.8061332702637,2 326 | 2418,24.4794864654541,137.5,88,12,14,7,7,3,2,20.5498886108398,2 327 | 7077,28.6895599365234,145.5,77.5,7,10,3,5,3,1,20.2020206451416,2 328 | 1803,19.5849361419678,127.5,64.5,,,,,2,1,21.7864933013916,2 329 | 7577,21.4435291290283,118.5,73.5,10,13,8,9,2,2,26.6389198303223,2 330 | 919,24.2105293273926,146.5,62,9,5,1,6,1,1,21.7079219818115,2 331 | 1312,27.9889259338379,142,74.5,12,15,8,9,2,1,21.6206474304199,2 332 | 2290,37.7529678344727,128,81.5,,,,,3,2,19.8140525817871,2 333 | 6095,23.1372699737549,120.5,73,14,11,5,10,2,,26.6727638244629,2 334 | 5140,29.1516017913818,129.5,71.5,14,11,4,8,3,2,23.5330429077148,2 335 | 4563,37.0951118469238,115,72,24,21,9,19,1,2,25,2 336 | 
612,27.4573249816895,147.5,82,14,7,2,9,2,2,,2 337 | 327,16.9547271728516,118,66,11,10,1,9,1,1,17.8565158843994,2 338 | 6947,27.9280757904053,121,71.5,7,8,3,5,2,2,20.7612438201904,2 339 | 8013,26.426586151123,123,67,14,12,1,10,2,2,27.1415824890137,2 340 | 3010,20.1115779876709,126.5,74,14,7,5,9,2,1,17.1029148101807,2 341 | 7934,27.909008026123,127.5,71,4,6,2,2,2,2,20.3125,2 342 | 3631,20.9648876190186,133,68,7,6,2,6,2,2,19.7055320739746,2 343 | 4694,23.9994583129883,108.5,63,1,4,0,1,2,2,26.0261745452881,2 344 | 6471,22.5174713134766,123.5,60.5,15,9,2,9,2,2,21.2183170318604,2 345 | 5960,26.2001647949219,141,86,12,4,1,10,2,2,21.2585048675537,2 346 | 8233,24.1070594787598,121,66.5,5,14,6,3,3,2,20.7008171081543,2 347 | 3973,17.3184051513672,123,73.5,35,29,19,24,2,1,23.1246681213379,2 348 | 7809,18.3289012908936,107.5,62,21,2,0,13,3,2,,2 349 | 6706,17.208065032959,114,67.5,3,3,0,1,2,,16.1616172790527,2 350 | 220,22.0446681976318,130,60,17,11,5,12,1,1,21.5044708251953,2 351 | 5268,18.3085594177246,114.5,53.5,27,16,10,22,2,2,21.0961894989014,2 352 | 7384,24.3132190704346,125.5,67.5,17,21,10,13,2,2,25.5937366485596,2 353 | 3925,27.1171875,132.5,72,7,21,10,5,2,1,,2 354 | 8109,19.8961925506592,114,59.5,3,7,1,3,2,2,20.1732521057129,2 355 | 7945,21.9406394958496,142,72.5,24,23,9,16,2,2,22.8623676300049,2 356 | 4215,25.1862163543701,120,55.5,15,7,4,11,3,2,19.9572830200195,2 357 | 4886,19.1018867492676,100.5,54.5,7,12,4,4,3,2,18.2867813110352,2 358 | 8075,20.5530738830566,134,77.5,22,22,10,16,3,2,20.1950912475586,2 359 | 3540,19.6195011138916,112.5,65.5,10,1,0,7,1,1,22.9421291351318,2 360 | 179,21.1262836456299,115.5,70,6,13,6,5,2,1,24.0882225036621,2 361 | 5419,24.933443069458,143.5,64.5,8,8,3,8,3,2,25.6895618438721,2 362 | 997,20.346549987793,121,72.5,2,5,1,2,3,2,21.09375,2 363 | 7890,21.3237152099609,123,72.5,13,14,4,6,2,2,19.7210388183594,2 364 | 2864,28.1957626342773,142.5,69.5,9,17,4,6,2,2,24.2809734344482,2 365 | 
1357,21.4082946777344,122.5,67,21,14,11,13,3,2,23.1111106872559,2 366 | 1896,17.9269828796387,139,81.5,6,6,1,2,3,1,,2 367 | 4766,20.5638904571533,138.5,70,6,3,1,4,1,1,23.3354663848877,2 368 | 6065,22.659912109375,145,80.5,26,9,6,19,2,2,23.7332382202148,2 369 | 1433,31.7545566558838,110.5,67.5,16,14,5,13,2,2,26.6727638244629,2 370 | 7179,21.141077041626,137.5,61.5,6,2,1,6,1,,22.6666660308838,2 371 | 3469,25.2234325408936,135.5,68.5,18,7,1,14,2,2,20.9366397857666,2 372 | 3169,25.135082244873,129,66,5,12,3,2,3,2,17.9282169342041,2 373 | 5340,25.008186340332,133,80.5,14,13,4,10,2,,20.1091651916504,2 374 | 7453,24.2111911773682,124,64.5,18,8,1,11,2,2,21.9075813293457,2 375 | 3830,22.4392757415771,137,65.5,4,2,1,4,2,2,21.7738437652588,2 376 | 4006,30.2298145294189,125.5,72.5,5,12,6,4,2,2,27.2508888244629,2 377 | 2502,23.9665565490723,117.5,65,6,4,1,6,3,2,24.3141250610352,2 378 | 2489,21.8277416229248,150.5,74,8,9,4,4,2,1,18.7327823638916,2 379 | 6747,25.8613262176514,117,58.5,9,11,3,6,2,2,26.2335548400879,2 380 | 1721,35.738208770752,136.5,68.5,17,5,0,14,2,2,18.7327823638916,2 381 | 2652,30.8369808197021,139.5,76.5,14,10,1,11,2,2,19.1326541900635,2 382 | 4584,20.8154621124268,123.5,67.5,11,10,3,6,1,1,29.017448425293,2 383 | 5518,24.6981239318848,117.5,66,12,5,1,7,2,2,20.8325290679932,2 384 | 1718,24.853588104248,123.5,65,8,15,9,5,2,1,20.3244380950928,2 385 | 4246,34.1212310791016,133.5,76,15,20,6,10,2,2,23.3066806793213,2 386 | 5307,21.4878883361816,140.5,74,28,17,8,19,2,2,,2 387 | 5878,21.4039974212646,143.5,71,19,16,5,13,2,2,25.390625,2 388 | 3732,23.1006546020508,122.5,71,13,30,16,11,3,2,16.6659164428711,2 389 | 1008,22.9831123352051,144.5,84.5,21,9,6,17,1,1,,2 390 | 6183,20.7642002105713,127,59,7,15,7,5,2,2,27.4144535064697,2 391 | 169,31.3174171447754,143,77,9,18,10,6,2,2,21.4535732269287,2 392 | 5304,20.7440433502197,105.5,65.5,7,22,9,4,2,2,19.0519733428955,2 393 | 5791,23.5947704315186,111,68.5,14,9,5,12,2,2,22.1002902984619,2 394 | 
5365,22.6838512420654,123.5,67,12,3,2,6,2,1,18.75,2 395 | 898,25.1557865142822,118.5,66.5,8,1,0,6,2,1,,2 396 | 1781,21.6795711517334,128.5,70.5,12,7,3,9,2,2,18.6620140075684,2 397 | 1070,26.5071315765381,127,82.5,8,2,1,6,2,2,21.1927433013916,2 398 | 1767,29.1915225982666,127,60,6,1,0,4,2,1,22.9130325317383,2 399 | 8360,26.8250923156738,110.5,69.5,5,17,8,2,2,2,,2 400 | 2924,23.5804443359375,124,66,18,12,6,13,2,2,19.9432125091553,2 401 | 705,23.1010303497314,137,73,5,1,1,4,2,2,17.7462291717529,2 402 | -------------------------------------------------------------------------------- /inst/demoFiles/small2_codebook.csv: -------------------------------------------------------------------------------- 1 | Variable,Old.Variable,Label,Factor.Levels,Factor.Type 2 | ID,codea,"Subject ID",Continuous, 3 | bmi.21,bmi22,"BMI at 21 Years",Continuous, 4 | bpSys.21,NA,"Mean Systolic Blood Pressure at 21 Years",Continuous, 5 | ysrExter.14,jexter,"YSR: externalising at 14 years",Continuous, 6 | ysrInter.14,jinter,"YSR: internalising at 14 years",Continuous, 7 | ysrAggre.14,jaggre,"YSR: aggression at 14 years",Continuous, 8 | matEducat,ra80,"Maternal Education","1=Incomplete High",Ordered 9 | ,,,"2=Complete High", 10 | ,,,"3=Post High", 11 | familyIncome,ra90,"Recoded Income Phase A","1=$10399 or less",Nominal 12 | ,,,"2=$10400 or more", 13 | sex,c45,"Sex of baby",1=Male,Factor 14 | ,,,2=Female, 15 | -------------------------------------------------------------------------------- /man/codebookr.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/codebookr-package.R 3 | \docType{package} 4 | \name{codebookr} 5 | \alias{codebookr} 6 | \alias{codebookr-package} 7 | \title{codebookr: A package for reading codebooks and applying to datasets} 8 | \description{ 9 | The codebookr package provides functions for reading a codebook from a 10 | spreadsheet (.csv file), creating 
factors and checking levels and 11 | checking continuous values are in a valid range. 12 | } 13 | -------------------------------------------------------------------------------- /man/create_factors.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/create_factors.R 3 | \name{create_factors} 4 | \alias{create_factors} 5 | \title{Use codebook to create factors and check levels for validity} 6 | \usage{ 7 | create_factors(x, code_book, column_names = NULL) 8 | } 9 | \arguments{ 10 | \item{x}{\code{tibble} to which \code{codebook} is applied} 11 | 12 | \item{code_book}{\code{codebook} containing factor names and factor levels 13 | to convert numeric or character vectors to \code{factors} and also test 14 | for valid levels.} 15 | 16 | \item{column_names}{character vector of column names for conversion to 17 | factors. Default: All \code{factors} defined in \code{codebook}.} 18 | } 19 | \value{ 20 | object of type class \code{tibble} 21 | } 22 | \description{ 23 | Uses a codebook which is an S3 class \code{codebook}, possibly 24 | read in by \code{read_codebook}, to convert numerical vectors into 25 | \code{factors} in a \code{tibble}. 26 | } 27 | \details{ 28 | REPEAT: Often, when analysing data, data dictionaries or code books are 29 | provided with data files. Rather than a \code{word} \code{doc} or 30 | \code{pdf} files, the format required here is in a very specific 31 | format stored as a \code{csv} file. Once read in, attributes such 32 | as factor labels/levels and variable labels can be added to the 33 | \code{data.frame} and/or also used to check factor labels and 34 | variable names are consistent with the code book. Note that while 35 | various methods may be available which attempt to convert word 36 | docs or pdf's to a spreadsheet and/or csv file, extreme care 37 | should be taken as these are far from perfect. 
38 | } 39 | \examples{ 40 | file.copy(system.file('demoFiles', 'data1_codebook.csv', 41 | package='codebookr'), 'data1_codebook.csv') 42 | file.copy(system.file('demoFiles', 'data1-birth.csv', 43 | package='codebookr'), 'data1-birth.csv') 44 | data1_codebook <- read_codebook("data1_codebook.csv", 45 | column_names = list(variable_levels = "Factor.Levels", 46 | variable_original = "Old.Variable", 47 | min = "Min", max = "Max")) 48 | data1 <- readr::read_csv('data1-birth.csv') 49 | data1 50 | myData <- create_factors(data1, data1_codebook) 51 | str(myData) 52 | } 53 | \author{ 54 | Peter Baker \email{pete@petebaker.id.au} 55 | } 56 | -------------------------------------------------------------------------------- /man/print.codebook.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/print.codebook.R 3 | \name{print.codebook} 4 | \alias{print.codebook} 5 | \alias{codebook} 6 | \title{Print an S3 object of class \code{codebook}} 7 | \usage{ 8 | \method{print}{codebook}(x, extra = FALSE, ...) 9 | } 10 | \arguments{ 11 | \item{x}{object of class \code{codebook}} 12 | 13 | \item{extra}{logical: whether to print extra information. Default: FALSE} 14 | 15 | \item{...}{extra arguments passed to specific printing functions} 16 | } 17 | \description{ 18 | \code{read_codebook} reads a code book stored as a \code{csv} file 19 | for either checking against a data file or relabelling factor 20 | levels or labelling variables. 
\code{read_codebook} returns an S3 21 | object of class \code{codebook} 22 | } 23 | \examples{ 24 | file.copy(system.file('demoFiles', 'data1_codebook.csv', 25 | package='codebookr'), 'data1_codebook.csv') 26 | data1_codebook <- read_codebook("data1_codebook.csv", 27 | column_names = list(variable_levels = "Factor.Levels", 28 | variable_original = "Old.Variable", 29 | min = "Min", max = "Max")) 30 | print(data1_codebook) 31 | } 32 | \seealso{ 33 | \code{\link{read_codebook}} 34 | } 35 | \author{ 36 | Peter Baker \email{pete@petebaker.id.au} 37 | } 38 | -------------------------------------------------------------------------------- /man/read_codebook.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/read_codebook.R 3 | \name{read_codebook} 4 | \alias{read_codebook} 5 | \title{Read a code book in standard format(s) as a csv file} 6 | \usage{ 7 | read_codebook(x, codebook_directory = NULL, column_names = NULL, 8 | na = c("", "NA", ".", " "), 9 | data_management_plan = "All possible details should be here") 10 | } 11 | \arguments{ 12 | \item{x}{filename of codebook to parse} 13 | 14 | \item{codebook_directory}{directory containing codebook. Default: 15 | current directory} 16 | 17 | \item{column_names}{named character vector containing column names 18 | in Code Book file. The vector contains components 19 | \code{variable_name} = variable name, 20 | \code{variable_original} = original name (if variable_name was 21 | changed), \code{label} for printing/plotting, 22 | \code{variable_levels} = factor levels, \code{variable_limits} or 23 | \code{min} and \code{max} for continuous measurements, 24 | \code{missing_values} = numeric or strings for values of 25 | variable to be set as missing, \code{comments} = comments about 26 | the variable which may include the measurement instrument or 27 | references about the measurement.
Note that default values may 28 | be found with \code{options(codebookr.column_names)}} 29 | 30 | \item{na}{a character vector of strings which are to be 31 | interpreted as \sQuote{NA} values. Blank fields are also 32 | considered to be missing values in logical, integer, numeric and 33 | complex fields. Default: \code{c("", "NA", ".", " ")}} 34 | 35 | \item{data_management_plan}{a list containing information like url, 36 | location, authors, date, version and so on. 37 | Default: "All possible details should be here"} 38 | } 39 | \value{ 40 | S3 object of type class \code{codebook} 41 | } 42 | \description{ 43 | \code{read_codebook} reads a code book stored as a \code{csv} file 44 | for either checking against a data file or relabelling factor 45 | levels or labelling variables. 46 | } 47 | \details{ 48 | Often, when analysing data, data dictionaries or code books are 49 | provided with data files. Rather than a \code{word} \code{doc} or 50 | \code{pdf} files, the format required here is in a very specific 51 | format stored as a \code{csv} file. Once read in, attributes such 52 | as factor labels/levels and variable labels can be added to the 53 | \code{data.frame} and/or also used to check factor labels and 54 | variable names are consistent with the code book. Note that while 55 | various methods may be available which attempt to convert word 56 | docs or pdf's to a spreadsheet and/or csv file, extreme care 57 | should be taken as these are far from perfect. 
58 | } 59 | \examples{ 60 | file.copy(system.file('demoFiles', 'data1_codebook.csv', 61 | package='codebookr'), 'data1_codebook.csv') 62 | data1_codebook <- read_codebook("data1_codebook.csv", 63 | column_names = list(variable_levels = "Factor.Levels", 64 | variable_original = "Old.Variable", 65 | min = "Min", max = "Max")) 66 | } 67 | \author{ 68 | Peter Baker \email{pete@petebaker.id.au} 69 | } 70 | -------------------------------------------------------------------------------- /man/validate_limits.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/validate_limits.R 3 | \name{validate_limits} 4 | \alias{validate_limits} 5 | \title{Use codebook to check continuous variables for validity} 6 | \usage{ 7 | validate_limits(x, code_book, column_names = NULL) 8 | } 9 | \arguments{ 10 | \item{x}{\code{tibble} to which \code{codebook} is applied} 11 | 12 | \item{code_book}{\code{codebook} containing names of continuous 13 | variables and limits for checking data are within range} 14 | 15 | \item{column_names}{character vector of column names for checking 16 | data in range. Default: All continuous variables defined in 17 | \code{codebook}.} 18 | } 19 | \value{ 20 | object of type class \code{tibble} containing data out of 21 | limits 22 | } 23 | \description{ 24 | Uses a codebook which is an S3 class \code{codebook}, possibly 25 | read in by \code{read_codebook}, to check for data within valid 26 | ranges. 27 | } 28 | \details{ 29 | REPEAT: Often, when analysing data, data dictionaries or code books are 30 | provided with data files. Rather than a \code{word} \code{doc} or 31 | \code{pdf} files, the format required here is in a very specific 32 | format stored as a \code{csv} file. 
Once read in, attributes such 33 | as factor labels/levels and variable labels can be added to the 34 | \code{data.frame} and/or also used to check factor labels and 35 | variable names are consistent with the code book. Note that while 36 | various methods may be available which attempt to convert word 37 | docs or pdf's to a spreadsheet and/or csv file, extreme care 38 | should be taken as these are far from perfect. 39 | } 40 | \examples{ 41 | file.copy(system.file('demoFiles', 'data1_codebook.csv', 42 | package='codebookr'), 'data1_codebook.csv') 43 | file.copy(system.file('demoFiles', 'data1-yr21.csv', 44 | package='codebookr'), 'data1-yr21.csv') 45 | data1_codebook <- read_codebook("data1_codebook.csv", 46 | column_names = list(variable_levels = "Factor.Levels", 47 | variable_original = "Old.Variable", 48 | min = "Min", max = "Max")) 49 | data1 <- readr::read_csv('data1-yr21.csv') 50 | data1 51 | non_valid <- validate_limits(data1, data1_codebook) 52 | } 53 | \author{ 54 | Peter Baker \email{pete@petebaker.id.au} 55 | } 56 | -------------------------------------------------------------------------------- /notes.org: -------------------------------------------------------------------------------- 1 | #+BEGIN_COMMENT 2 | ## Filename: readme.org 3 | ## Hostname: peterbakerlinux.sph.uq.edu.au 4 | ## Directory: /home/pete/Data/dev/codebookr/ 5 | ## Licence: GPLv3 see 6 | ## 7 | ## Created at: Tue Apr 4 16:58:36 2017 8 | ## Change Log: 9 | ## 10 | #+END_COMMENT 11 | #+TITLE: Notes on 'codebookr' package 12 | #+AUTHOR: Peter Baker 13 | #+EMAIL: p.baker1@uq.edu.au 14 | #+TAGS: office(o) home(h) computer(c) graphicalModels(g) workFlow(w) music(m) band(b) 15 | #+SEQ_TODO: TODO(t) STARTED(s) WAITING(w) APPT(a) | DONE(d) CANCELLED(c) DEFERRED(f) 16 | #+HTML_HEAD: 17 | #+EXPORT_SELECT_TAGS: export 18 | #+EXPORT_EXCLUDE_TAGS: noexport 19 | #+OPTIONS: H:2 num:nil toc:nil \n:nil @:t ::t |:t ^:{} _:{} *:t TeX:t LaTeX:t 20 | #+STARTUP: showall 21 | #+STARTUP: indent 22 | 
#+STARTUP: hidestars 23 | #+BABEL: :session *R* :cache yes :results output graphics :exports both :tangle yes 24 | 25 | * Initially 26 | - readCodeBook.R from dryworkflow which I believe is mid 2015 27 | 28 | * Process (as documented previously) 29 | just use devtools:: as outlined in Hadley Wickham's R packages book 30 | 31 | eg devtools::load_all() 32 | devtools::document() 33 | devtools::build() 34 | devtools::check() 35 | 36 | ess seems lacking w.r.t. package development although bindings etc 37 | available with 38 | - ess-r-devtools-ask bound to C-c C-w C-a. 39 | - ess-r-package-mode 40 | - ess-r-set-evaluation-env bound by default to C-c C-t C-s 41 | or set ess-r-package-auto-set-evaluation-env to nil to disable this. 42 | - devtools packages and are accessible with C-c C-w prefix see documentation 43 | for ess-r-package-mode 44 | 45 | 46 | * Codebook data 47 | 48 | Column names for data dictionary (parameter column_names) 49 | 50 | | Name | Value in file | Notes | 51 | |-------------------+------------------+---------------------------| 52 | | variable_name | "Variable" | | 53 | | variable_label | "Label" | | 54 | | variable_original | "Original_Name" | | 55 | | variable_levels | "Levels" | for factors | 56 | | variable_limits | "Limits" | for continuous variables | 57 | | min | "Minimum" | " " " | 58 | | max | "Maximum" | " " " | 59 | | missing_values | "Missing_Values" | documented eg -9, ., etc | 60 | | factor_type | "Factor_Type" | factor, nominal, ordinal | 61 | | comments | "Comments" | | 62 | | | | | 63 | 64 | Note that 65 | - all column_names are unlikely to be present 66 | - can override column names 67 | - might be best to set up as options so that can be changed globally 68 | where necessary (.onLoad - see pp30 R Packages) 69 | 70 | * ideas for other functions 71 | - should I move from camel case to underscore and lower case 72 | - remember can use structure() to make data frame with attribute in 73 | one hit 74 | - I thought I had some of these 
in a previous incarnation of 75 | dryworkflow 76 | ** STARTED create_factors function 77 | - maybe could be named addFactors or addFactorLevels or add_factors or 78 | create_factors 79 | - could use 'forcats' more although readr probably sufficient 80 | - what about 'purrr' or 'stringr'? Of any use? 81 | - check values for factors maybe case change as well - easier to use 82 | readr::parse_factor() 83 | ** TODO validate_limits for continuous variables 84 | - dplyr ? continuous variables ? 85 | ** TODO overall wrapper function to do all in one 86 | 87 | NB: could change column_* to col_* and keep names a bit smaller 88 | 89 | * S3 codebook object 90 | Maybe should have several objects in list - not just codebook. How 91 | about these as possibilities or should some be attributes?? NB: Can 92 | always modify once I get working strategy 93 | - data_dictionary (à la table above) 94 | - version_data_dictionary ?? 95 | - hashsum_data_dictionary (for cross checking) 96 | - data_management_plan (various details as list - can be empty of course) 97 | + version 98 | + date 99 | + description 100 | + url or location 101 | + authors 102 | + just grab some examples and try and populate various components 103 | - data_management_plan_hashsum ? 104 | - spreadsheet_name (attributes??) 105 | - spreadsheet_column_names 106 | - spreadsheet_hashsum 107 | - log - list indexed by times? 108 | 109 | Some of these things are really about version control - this would be 110 | an attempt to make them self contained but VC better IMHO 111 | 112 | 113 | 114 | --------------------------------------------------------------------------------
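The "S3 codebook object" notes in notes.org list several candidate components and mention that structure() can attach attributes in one step. The following is a minimal base R sketch of that idea, not the package's implementation. The names new_codebook_sketch, out_of_limits_sketch and the class "codebook_sketch" are hypothetical, and the toy data dictionary simply assumes the default column names ("Variable", "Label", "Minimum", "Maximum") from the notes table.

```r
# Hypothetical sketch only: these helpers are NOT part of codebookr.
new_codebook_sketch <- function(data_dictionary,
                                data_management_plan = list(),
                                spreadsheet_name = NA_character_) {
  stopifnot(is.data.frame(data_dictionary),
            is.list(data_management_plan))
  # structure() builds the list and attaches the class in one step
  structure(
    list(data_dictionary = data_dictionary,
         data_management_plan = data_management_plan,
         spreadsheet_name = spreadsheet_name,
         spreadsheet_column_names = names(data_dictionary)),
    class = "codebook_sketch"
  )
}

# Toy data dictionary; column names are assumed defaults from the notes
dd <- data.frame(
  Variable = c("bmi.21", "sex"),
  Label    = c("BMI at 21 Years", "Sex of baby"),
  Minimum  = c(10, NA),
  Maximum  = c(60, NA),
  stringsAsFactors = FALSE
)

cb <- new_codebook_sketch(
  dd,
  data_management_plan = list(version = "0.1", authors = "unknown")
)

# Illustrative range check: return rows of `x` whose value for `variable`
# falls outside the [Minimum, Maximum] limits recorded in the dictionary.
# Missing values are not flagged.
out_of_limits_sketch <- function(x, cb, variable) {
  dict <- cb$data_dictionary
  lim  <- dict[dict$Variable == variable, ]
  vals <- x[[variable]]
  bad  <- !is.na(vals) & (vals < lim$Minimum | vals > lim$Maximum)
  x[bad, , drop = FALSE]
}

dat <- data.frame(ID = 1:3, bmi.21 = c(22.5, 75, NA))
flagged <- out_of_limits_sketch(dat, cb, "bmi.21")
```

Keeping the data management plan and spreadsheet details as list components rather than attributes is only one of the options the notes raise; attributes would work equally well for metadata that should not appear when the list is iterated over.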