├── .Rbuildignore ├── .gitignore ├── DESCRIPTION ├── LICENSE ├── NAMESPACE ├── R ├── baseline.R ├── classes-customer.R ├── classes-weather.R ├── dataSetDocs.R ├── features-RamCore.R ├── features-basic.R ├── features-weather.R ├── filter.R ├── iterator.R ├── solaRUtil.R ├── targeting.R ├── testDataSource.R ├── util-base.R ├── util-census.R ├── util-dataSource.R ├── util-dbUtil.R ├── util-export.R ├── util-regression.R ├── util-timer.R ├── util-tree.R └── visdom.R ├── README.md ├── VISDOM_feature_categories.pdf ├── VISDOM_getting_started.pdf ├── apply_copyright.sh ├── copyright.txt ├── data-raw ├── BuildingClimateZonesByZIPCode.csv ├── Erle_zipcodes.csv ├── census │ ├── ACS_11 │ │ ├── Gaz_zcta_national.txt │ │ └── Zip_to_ZCTA_Crosswalk_2011_JSI.csv │ ├── ACS_2007_2011_SF_Tech_Doc.pdf │ └── census_vars.txt └── migrateData.R ├── data ├── CA_ZIP_CLIMATE.rda ├── CENSUS_GAZ.rda ├── CENSUS_VARS_OF_INTEREST.rda ├── ERLE_ZIP_LOCATION.rda └── ZIP_TO_ZCTA.rda ├── inst ├── doc │ ├── FAQ.html │ ├── advanced_usage.html │ ├── authoring_data_source.html │ ├── bootstrap_devel_environment.html │ ├── customer_data_objects.html │ ├── example_feature_extraction.html │ ├── example_iterator_usage.html │ ├── install_visdom.html │ └── weather_data_objects.html ├── feature_set_run_conf │ └── example_feature_set.conf └── sql │ ├── db_connect_example.conf │ └── feature_runs.create.mysql.sql ├── install ├── generic_create.sql ├── generic_insert.sql ├── generic_visdom_data_source.R └── ubuntu_14.04_bash_requirements.sh ├── man ├── CA_ZIP_CLIMATE.Rd ├── CENSUS_GAZ.Rd ├── DataSource.Rd ├── ERLE_ZIP_LOCATION.Rd ├── MeterDataClass.Rd ├── TestData.Rd ├── WeatherClass.Rd ├── ZIP_TO_ZCTA.Rd ├── acs.fetch.and.cache.Rd ├── applyDateFilters.Rd ├── basicFeatures.Rd ├── buildWhere.Rd ├── cleanFeatureDF.Rd ├── clearCons.Rd ├── clear_acs_cache.Rd ├── conf.dbCon.Rd ├── ctree.boot.Rd ├── ctree.run.Rd ├── ctree.subscore.Rd ├── datesToEpoch.Rd ├── dbCfg.Rd ├── exportData.Rd ├── exportFeatureAndShapeResults.Rd ├── exportShapes.Rd ├── fixNames.Rd ├── getRunId.Rd ├── groupHist.Rd ├── iterator.build.idx.Rd ├── iterator.callAll.Rd ├── iterator.callAllFromCtx.Rd ├── iterator.iterateMeters.Rd ├── iterator.iterateZip.Rd ├── iterator.runMeter.Rd ├── iterator.runZip.Rd ├── iterator.todf.Rd ├── loadACS.Rd ├── mergeShapeFeatures.Rd ├── piecewise.regressor.Rd ├── reexports.Rd ├── regressor.piecewise.Rd ├── regressor.split.Rd ├── regressorDF.Rd ├── rm.col.Rd ├── run.query.Rd ├── runDateFilterIfNeeded.Rd ├── sanityCheckDataSource.Rd ├── showCons.Rd ├── spread.boot.Rd ├── spreadScore.Rd ├── visdom.Rd ├── writeCSVData.Rd ├── writeDatabaseData.Rd └── writeH5Data.Rd ├── tests ├── testthat.R └── testthat │ └── test_util-census.R └── vignettes ├── FAQ.rmd ├── advanced_usage.rmd ├── authoring_data_source.rmd ├── bootstrap_devel_environment.rmd ├── customer_data_objects.rmd ├── example_baseline.R ├── example_census.R ├── example_feature_extraction.rmd ├── example_iterator_usage.rmd ├── install_visdom.rmd └── weather_data_objects.rmd /.Rbuildignore: -------------------------------------------------------------------------------- 1 | ^.*\.Rproj$ 2 | ^\.Rproj\.user$ 3 | ^install$ 4 | ^data-raw$ 5 | ^copyright\.txt$ 6 | ^apply_copyright\.sh$ 7 | -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | # History files 2 | .Rhistory 3 | .Rapp.history 4 | 5 | # Output files from R CMD build 6 | /*.tar.gz 7 | 8 | # Example code in package build 
process 9 | *-Ex.R 10 | 11 | # Output files from R CMD check 12 | /*.Rcheck/ 13 | 14 | # RStudio files 15 | .Rproj.user/ 16 | # produced vignettes 17 | vignettes/*.html 18 | vignettes/*.pdf 19 | # Python compiled files 20 | .pyc 21 | # text editor backup files 22 | .bak 23 | 24 | .RData 25 | 26 | # install ready documentation 27 | inst/doc 28 | 29 | # knitr and R markdown default cache directories 30 | /*_cache/ 31 | /cache/ 32 | 33 | # Temporary files created by R markdown 34 | *.utf8.md 35 | *.knit.md 36 | 37 | .Rproj.user 38 | -------------------------------------------------------------------------------- /DESCRIPTION: -------------------------------------------------------------------------------- 1 | Package: visdom 2 | Type: Package 3 | Title: R package for energy data analytics 4 | Version: 0.7.0 5 | Date: 2016-11-03 6 | Authors@R: c( 7 | person("Sam", "Borgeson", email = "sam@convergenceda.com", role = c("aut","cre")), 8 | person("Jungsuk", "Kwac", email = "kwjusu1@gmail.com", role = "aut"), 9 | person("Ram", "Rajagopal", email = "ramr@stanford.edu", role = "aut") 10 | ) 11 | URL: https://github.com/convergenceda/visdom 12 | BugReports: https://github.com/convergenceda/visdom/issues 13 | Depends: 14 | R (>= 3.1.0) 15 | Imports: 16 | akmeans (>= 1.1), 17 | plyr, 18 | dplyr, 19 | magrittr, 20 | assertthat, 21 | tibble, 22 | RCurl, 23 | bitops, 24 | zoo, 25 | latticeExtra, 26 | R.methodsS3, 27 | R.oo, 28 | R.utils, 29 | dplyr, 30 | lubridate, 31 | timeDate, 32 | cvTools, 33 | sandwich, 34 | lmtest, 35 | RColorBrewer, 36 | ggplot2, 37 | solaR, 38 | gridExtra, 39 | party, 40 | testthat, 41 | properties, 42 | digest, 43 | DBI (>= 0.5), 44 | XML, 45 | R.cache, 46 | acs 47 | Remotes: github::josiahjohnston/acs 48 | Suggests: 49 | foreach, 50 | partykit, 51 | doParallel, 52 | foreign, 53 | knitr, 54 | roxygen, 55 | rmarkdown 56 | Description: R package for energy data analytics. VISDOM stands for Visualization 57 | and Insight System for Demand Operations and Management. 58 | License: MIT + file LICENSE 59 | LazyData: true 60 | LazyLoad: yes 61 | RoxygenNote: 5.0.1 62 | VignetteBuilder: knitr 63 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | The MIT License (MIT) 2 | 3 | Copyright (c) 2016, 2017 Convergence Data Analytics, LLC 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 
22 | -------------------------------------------------------------------------------- /NAMESPACE: -------------------------------------------------------------------------------- 1 | # Generated by roxygen2: do not edit by hand 2 | 3 | S3method(print,DescriptorGenerator) 4 | S3method(print,ModelDescriptor) 5 | export("%>%") 6 | export(DataSource) 7 | export(DescriptorGenerator) 8 | export(MeterDataClass) 9 | export(Mode) 10 | export(ModelDescriptor) 11 | export(TestData) 12 | export(WeatherClass) 13 | export(acs.fetch.and.cache) 14 | export(addZCTA) 15 | export(applyDateFilters) 16 | export(as.daily.df) 17 | export(basicFeatures) 18 | export(buildWhere) 19 | export(changeCP) 20 | export(cleanFeatureDF) 21 | export(clearAllVars) 22 | export(clearCons) 23 | export(clear_acs_cache) 24 | export(cmplt) 25 | export(conf.dbCon) 26 | export(coreFeaturesfn) 27 | export(cp24Generator) 28 | export(ctree.boot) 29 | export(ctree.run) 30 | export(ctree.subscore) 31 | export(cvFold) 32 | export(datesToEpoch) 33 | export(daySummarize) 34 | export(dbCfg) 35 | export(descend) 36 | export(diff2) 37 | export(evalCP) 38 | export(exportData) 39 | export(exportFeatureAndShapeResults) 40 | export(exportShapes) 41 | export(fixNames) 42 | export(geometricLagGenerator) 43 | export(getRunId) 44 | export(getSQLdialect) 45 | export(getWeatherSummary) 46 | export(groupHist) 47 | export(hourlyChangePoint) 48 | export(iterator.build.idx) 49 | export(iterator.callAll) 50 | export(iterator.callAllFromCtx) 51 | export(iterator.issues.todf) 52 | export(iterator.iterateMeters) 53 | export(iterator.iterateZip) 54 | export(iterator.runMeter) 55 | export(iterator.runZip) 56 | export(iterator.todf) 57 | export(kFold) 58 | export(lag) 59 | export(lagGenerator) 60 | export(loadACS) 61 | export(ma) 62 | export(mem.usage) 63 | export(mergeCensus) 64 | export(mergeGazeteer) 65 | export(mergeShapeFeatures) 66 | export(partsGenerator) 67 | export(piecewise.regressor) 68 | export(plot.MeterDataClass) 69 | export(plot.WeatherClass) 70 | export(plot.solarGeom) 71 | export(rDFA) 72 | export(rDFG) 73 | export(regressor.piecewise) 74 | export(regressor.split) 75 | export(regressorDF) 76 | export(rm.col) 77 | export(run.profile) 78 | export(run.query) 79 | export(runDateFilterIfNeeded) 80 | export(sanityCheckDataSource) 81 | export(showCons) 82 | export(solarGeom) 83 | export(spread.boot) 84 | export(spreadScore) 85 | export(summarizeModel) 86 | export(target) 87 | export(tic) 88 | export(toc) 89 | export(toutChangePoint) 90 | export(toutChangePointFast2) 91 | export(toutDailyCPGenerator) 92 | export(toutDailyDivergeCPGenerator) 93 | export(toutDailyFixedCPGenerator) 94 | export(toutDailyFlexCPGenerator) 95 | export(toutDailyNPCPGenerator) 96 | export(toutPieces24Generator) 97 | export(toutPieces24LagGenerator) 98 | export(toutPieces24MAGenerator) 99 | export(validateRes) 100 | export(weatherFeatures) 101 | export(writeCSVData) 102 | export(writeDatabaseData) 103 | export(writeH5Data) 104 | importClassesFrom(acs,acs) 105 | importClassesFrom(acs,acs.lookup) 106 | importClassesFrom(acs,geo) 107 | importClassesFrom(acs,geo.set) 108 | importFrom(dplyr,"%>%") 109 | -------------------------------------------------------------------------------- /R/baseline.R: -------------------------------------------------------------------------------- 1 | # Copyright 2016 The Board of Trustees of the Leland Stanford Junior University. 
2 | # Direct inquiries to Sam Borgeson (sborgeson@stanford.edu) 3 | # or professor Ram Rajagopal (ramr@stanford.edu) 4 | 5 | # baselining methods are TBD 6 | -------------------------------------------------------------------------------- /R/classes-weather.R: -------------------------------------------------------------------------------- 1 | # Copyright 2016 The Board of Trustees of the Leland Stanford Junior University. 2 | # Direct inquiries to Sam Borgeson (sborgeson@stanford.edu) 3 | # or professor Ram Rajagopal (ramr@stanford.edu) 4 | 5 | #' @title 6 | #' S3 class that holds and normalizes weather data 7 | #' 8 | #' @description 9 | #' Standard format and functions for loading, manipulating, and visualizing weather data in VISDOM. 10 | #' 11 | #' @section Parameters: 12 | #' 13 | #' @param geocode Primary geographic locator used by the data source. This is 14 | #' very often a zip code, but data sources can implement it as census block 15 | #' or any geographic category that covers all meters, with time series 16 | #' observations of weather data associated with a geocode. 17 | #' @param doMeans Indicator of whether to calculate and retain daily and monthly 18 | #' mean values of observations. Disable when not in use for faster performance. 19 | #' @param useCache Data caching option indicating to the data source whether cached 20 | #' data should be used to populate the class. Data sources implement their own 21 | #' caching, but the expectation is that they will hit the underlying data base 22 | #' once to get the data in the first place and then save it to a local cache from 23 | #' which it will be retrieved on subsequent calls. See \link{run.query} for details. 24 | #' @param doSG Option indicating whether expensive solar geometry calculations 25 | #' that rely on the \link{solaR} package should be performed. Off by default for 26 | #' performance reasons. See \link{solarGeom} for details. 27 | #' 28 | #' @details 29 | #' \code{WeatherData} is compatible by default with the output (i.e. database table) of 30 | #' python weather data scraping code that draws upon NOAA's NOAA Quality Controlled 31 | #' Local Climatological Data (QCLCD) ftp server. See \code{http://github.com/sborgeson/local-weather} 32 | #' for code to download and convert weathre data into CSV files, the weather data zip files at 33 | #' \code{http://www.ncdc.noaa.gov/orders/qclcd/}, and the QCLCD homepage here 34 | #' \code{https://www.ncdc.noaa.gov/data-access/land-based-station-data/land-based-datasets/quality-controlled-local-climatological-data-qclcd} 35 | #' @export 36 | #' @seealso \code{\link{MeterDataClass}} 37 | #' @examples 38 | #' \dontrun{ 39 | #' DATA_SOURCE = TestData(10) 40 | #' weather = WeatherData(geocode='12601') 41 | #' plot(weather) 42 | #' } 43 | WeatherClass = function(geocode,raw=NULL,doMeans=T,useCache=F,doSG=F){ 44 | if( is.null(raw) ) { 45 | raw = DATA_SOURCE$getWeatherData(geocode,useCache=useCache) 46 | } 47 | if( !'date' %in% names(raw) ) { 48 | datesIdx = which( names(raw) %in% 'dates') 49 | if(length(datesIdx) == 1) { 50 | names(raw)[datesIdx] = 'date' 51 | } 52 | } 53 | if(length(raw)==0) stop(paste('No data found for geocode',geocode)) 54 | requiredCols = c("date", "temperaturef", "pressure", "dewpointf", "hourlyprecip" ) 55 | if( any(! requiredCols %in% names(raw)) ) { 56 | missing = paste( requiredCols[which(! requiredCols %in% names(raw))], collapse=', ' ) 57 | stop(paste('Required named data columns are missing:', missing)) 58 | } 59 | dates = raw[,'date'] 60 | if( ! 
'POSIXct' %in% class(raw$dateclass) ) { 61 | dates = as.POSIXct(dates, tz="America/Los_Angeles", origin='1970-01-01', '%Y-%m-%d %H:%M:%S') 62 | } 63 | rawData = data.frame( 64 | dates = dates, 65 | day = as.Date(dates,tz="America/Los_Angeles"), 66 | tout = raw[,'temperaturef'], 67 | pout = raw[,'pressure'], 68 | rain = raw[,'hourlyprecip'], 69 | dp = raw[,'dewpointf'] 70 | #wind = raw[,'windspeed'] 71 | ) 72 | days = unique(as.Date(rawData$day)) 73 | 74 | 75 | sg = list() 76 | if(doSG) { 77 | #TODO: make solarGeom generic to different geo codes 78 | sg = solarGeom(raw[,'date'],zip=geocode) 79 | } 80 | 81 | # FYI, spring forward causes NA dates to find these: 82 | # which(is.na(dates)) 83 | 84 | # TODO: do we need to do anything about the NA values? 85 | dayMeans = c() 86 | dayMins = c() 87 | dayMaxs = c() 88 | dayStats = c() 89 | dayLengths = c() 90 | if(doMeans) { 91 | #dayMeans = dailyMeans(rawData) 92 | #dayMins = dailyMins(rawData) 93 | #dayMaxs = dailyMaxs(rawData) 94 | dayStats = daySummarize(rawData[,-1],.by = 'day') 95 | if(doSG) { 96 | dayLengths = dailySums(sg[,c('dates','daylight')]) 97 | } 98 | } 99 | 100 | obj = list ( 101 | zip = geocode, 102 | geocode = geocode, 103 | days = days, 104 | dates = rawData$dates, 105 | tout = rawData$tout, 106 | sg = sg, 107 | daylight = sg$daylight, 108 | rawData = rawData, 109 | dayStats = dayStats, 110 | #dayMeans = dayMeans, 111 | #dayMins = dayMins, 112 | #dayMaxs = dayMaxs, 113 | dayLengths = dayLengths, 114 | get = function(x) obj[[x]], 115 | # Not sure why <<- is used here 116 | # <<- searches parent environments before assignment 117 | # http://stat.ethz.ch/R-manual/R-patched/library/base/html/assignOps.html 118 | set = function(x, value) obj[[x]] <<- value, 119 | props = list() 120 | ) 121 | 122 | # returns relative humidity as decimal from 0 to 1 given temperature and dewpoint 123 | # using August-Roche-Magnus approximation: http://andrew.rsmas.miami.edu/bmcnoldy/humidity_conversions.pdf 124 | obj$rh = function(tout,dp) { 125 | a = 17.271 126 | b = 237.7 127 | tout = (tout - 32) * 5/9 128 | dp = (dp - 32) * 5/9 129 | rh = exp(a*dp/(b + dp)) / exp(a*tout/(b + tout)) 130 | } 131 | 132 | obj$resample = function(newDates,name='tout') { 133 | # approx returns both the newDates and the interpreted values 134 | # but we only need the values 135 | if(! 'POSIXct' %in% class(newDates)) { 136 | newDates = as.POSIXct(newDates) 137 | } 138 | if(all(is.na(obj$rawData[,name]))) { 139 | return( rep(NA,length(newDates))) 140 | } 141 | # if all the dates match, return the clean data, otherwise, interpolate 142 | dateMatch = obj$dates %in% newDates 143 | 144 | #miss = which(! 
newDates %in% obj$dates) 145 | #print(paste(sum(dateMatch),length(newDates))) 146 | 147 | if(sum(dateMatch) == length(newDates)) { 148 | a = obj$rawData[,name][dateMatch] 149 | return(a) 150 | } 151 | a = approx(obj$dates, obj$rawData[,name], newDates, method="linear")[[2]] 152 | #b = a[2] 153 | if (all(is.na(a)) & name == 'tout'){ 154 | print(paste(obj$dates[1],obj$dates[length(obj$dates)])) 155 | print(paste(newDates[1], newDates[length(newDates)])) 156 | stop("No weather data available") 157 | } 158 | return(a) 159 | } 160 | 161 | #obj <- list2env(obj) 162 | class(obj) = "WeatherClass" 163 | return(obj) 164 | } 165 | 166 | 167 | # Use dplyr to summarize a data frame, typically by date 168 | #' @export 169 | daySummarize = function(rawData, fns=c('mean','min','max'), .by='day') { 170 | agg = rawData %>% 171 | dplyr::group_by_(.dots = .by) %>% 172 | dplyr::summarize_all( dplyr::funs_(fns, args=list(na.rm=T)) ) 173 | #dplyr::mutate( rain = rain * 24 ) 174 | return(data.frame(agg)) 175 | } 176 | 177 | #' @export 178 | plot.WeatherClass = function(w,colorMap=NA,main=NULL,issueTxt='',type='tout',...) { 179 | # needs a list, called w with: 180 | # w$geocode geocode for the title 181 | # w$days (1 date per day of data) 182 | # w$tout (vector of outside temperature readings) 183 | 184 | if(type=='tout') { 185 | if(is.null(main)) { main <- paste(w$geocode,'weather info',sep='') } 186 | if(length(colorMap) < 2) { colorMap = rev(colorRampPalette(RColorBrewer::brewer.pal(11,"RdBu"))(100)) } #colorMap = heat.colors(100) 187 | op <- par(no.readonly = TRUE) 188 | #par( mfrow=c(1,1), oma=c(2,2,3,0),mar=c(2,2,2,2))# Room for the title 189 | 190 | # image is messed up. we need to reverse the rows, convert to a matrix and transpose the data 191 | # to get the right orientation! 192 | image(matrix(w$tout,nrow=24),col=colorMap,axes=F,main='F') 193 | axis(1, at = seq(0, 1, by = 1/6),labels=0:6 * 4,mgp=c(1,0,0),tcl=0.5) 194 | if(length(w$days) > 16) { 195 | axis(2, at = seq(1,0, by = -1/15),labels=format(w$days[seq(1/16, 1, by = 1/16) * length(w$days)],'%m/%d/%y'),las=1,mgp=c(1,0,0),tcl=0.5) 196 | } else { 197 | axis(2, at = seq(1,0, by = -1/(length(w$days)-1)),labels=format(w$days,'%m/%d/%y'),las=1,mgp=c(1,0,0),tcl=0.5) 198 | } 199 | par(op) 200 | } 201 | } 202 | -------------------------------------------------------------------------------- /R/dataSetDocs.R: -------------------------------------------------------------------------------- 1 | # Copyright 2016 The Board of Trustees of the Leland Stanford Junior University. 
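# --- Added illustrative sketch (not part of the original source files) ---
# The WeatherClass object defined in classes-weather.R above exposes the raw
# hourly series, daily summary statistics, and interpolation helpers. The
# commented example below assumes the synthetic TestData data source; the
# geocode, dates, and variable names are hypothetical.
#
# DATA_SOURCE = TestData(10)
# w = WeatherClass(geocode = 94305, doMeans = TRUE, useCache = FALSE, doSG = FALSE)
# head(w$dayStats)                      # daily mean/min/max of tout, pout, rain and dp
# newTimes = as.POSIXct('2013-06-01', tz = 'America/Los_Angeles') + 0:23 * 3600
# tout24 = w$resample(newTimes, name = 'tout')  # interpolate temperature onto new timestamps
# humidity = w$rh(tout = 75, dp = 55)   # relative humidity (0-1) via August-Roche-Magnus
# plot(w)                               # heat map style plot of hourly temperatures by day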
2 | # Direct inquiries to Sam Borgeson (sborgeson@stanford.edu) 3 | # or professor Ram Rajagopal (ramr@stanford.edu) 4 | 5 | # documentation for package data 6 | 7 | #' Zipcode to location mapping 8 | #' 9 | #' A dataset containing columns with US zip codes and the corresponding locaiton information 10 | #' 11 | #' @usage data(ERLE_ZIP_LOCATION) 12 | #' @format A data frame with 43191 rows and 7 variables: 13 | #' \describe{ 14 | #' \item{zip}{zip code, as a number} 15 | #' \item{city}{city the zip code is in} 16 | #' \item{state}{two letter abbreviation for the state the zip code is in} 17 | #' \item{latitude}{latitude of the zip code center (post office?)} 18 | #' \item{longitude}{longitude of the zip code center (post office?)} 19 | #' \item{timezone}{timesone, as offste relative to GMT time} 20 | #' \item{dst}{indicator for participation in daylight savings time} 21 | #' } 22 | #' @source \url{http://can.not.remember/} 23 | "ERLE_ZIP_LOCATION" 24 | 25 | #' A zip code to CA climate zone mapping 26 | #' 27 | #' A dataset containing columns with US zip codes and the corresponding CA CEC climate zones 28 | #' 29 | #' @usage data(CA_ZIP_CLIMATE) 30 | #' @format A data frame with 1706 rows and 2 variables: 31 | #' \describe{ 32 | #' \item{ZIP.Code}{zip code, as a number} 33 | #' \item{Building.Climate.Zone}{the CA CEC climate zone the zip code is in} 34 | #' } 35 | #' @source \url{http://CEC.web.site.somewhere/} 36 | "CA_ZIP_CLIMATE" 37 | 38 | #' A zip code to census zip code tabulation area mapping 39 | #' 40 | #' A dataset containing columns with US zip codes and the corresponding ZCTA ids 41 | #' 42 | #' @usage data(ZIP_TO_ZCTA) 43 | #' 44 | #' @format A data frame with 41979 rows and 5 variables: 45 | #' \describe{ 46 | #' \item{ZIP}{zip code, as a 0 padded string} 47 | #' \item{ZIPType}{zip code type} 48 | #' \item{CityName}{Name of the city the zip code is in} 49 | #' \item{StateAbbr}{Two letter abbreviation for the state the zip code is in} 50 | #' \item{ZCTA}{the census zip code tabulation area (ZCTA) the zip code most overlaps with} 51 | #' } 52 | #' @source \url{http://some.random.helpful.blog/} 53 | "ZIP_TO_ZCTA" 54 | 55 | #' A summary of census statistics for each ZCTA in the census 56 | #' 57 | #' A dataset containing summary census statistics for each ZCTA in the census 58 | #' 59 | #' @usage data(CENSUS_GAZ) 60 | #' 61 | #' @format A data frame with 33120 rows and 9 variables: 62 | #' \describe{ 63 | #' \item{ZCTA}{zip code tabulation area as a 0 padded string} 64 | #' \item{POP10}{Population in 2010} 65 | #' \item{ALAND}{Land area in sqft?} 66 | #' \item{AWATER}{Water area in sqft?} 67 | #' \item{ALAND_SQMI}{Land area in sqare miles} 68 | #' \item{AWATER_SQMI}{Water area in square miles} 69 | #' \item{INTPTLAT}{Latitude of the area} 70 | #' \item{INTPTLONG}{Longitude of the area} 71 | #' } 72 | #' @source \url{http://us.census.link/} 73 | "CENSUS_GAZ" 74 | -------------------------------------------------------------------------------- /R/features-RamCore.R: -------------------------------------------------------------------------------- 1 | # Copyright 2016 The Board of Trustees of the Leland Stanford Junior University. 
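# --- Added illustrative sketch (not part of the original source files) ---
# The package datasets documented in dataSetDocs.R above can be loaded with
# data() and joined on their zip code / ZCTA columns. The zip code used below
# is hypothetical.
#
# data(ERLE_ZIP_LOCATION)   # zip -> city, state, latitude, longitude, timezone
# data(ZIP_TO_ZCTA)         # zip (0 padded string) -> census ZCTA
# data(CENSUS_GAZ)          # ZCTA -> population and land/water area summaries
# ERLE_ZIP_LOCATION[ERLE_ZIP_LOCATION$zip == 94305, c('latitude', 'longitude')]
# zcta = ZIP_TO_ZCTA[ZIP_TO_ZCTA$ZIP == '94305', 'ZCTA']
# CENSUS_GAZ[CENSUS_GAZ$ZCTA %in% zcta, c('POP10', 'ALAND_SQMI', 'AWATER_SQMI')]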
2 | # Direct inquiries to Sam Borgeson (sborgeson@stanford.edu) 3 | # or professor Ram Rajagopal (ramr@stanford.edu) 4 | 5 | #daily_usage = Load 6 | #bud = Base load 7 | #pud = Peak load 8 | #pud/bud = Peak-to-base ratio 9 | #morning_peak_hour = morning Peak hour (most common) 10 | #afternoon_peak_hour = afternoon Peak hour (most common) 11 | #daily_peak_hour = Peak hour (most common) 12 | #Max load [Actual maximum in period] 13 | #Min load [Actual minimum in period] 14 | #Load duration (@ 99%?) 15 | #ldc_sum, ldc_win = Load duration curve on common basis 16 | #Load quantiles (1%, 97%, 10%, 20%, ?, 90%) 17 | #Total load during peak hours 18 | #Cooling sensitivity [using Jungsuk?s improved model!] 19 | #Heating sensitivity [using Jungsuk?s improved model!] 20 | #Lighting sensitivity 21 | #Heating/Cooling change point 22 | #Cooling energy 23 | #Heating energy 24 | #Baseload energy 25 | #Lighting energy 26 | #Load shape frequency in the 7 classes 27 | #Entropy 28 | #Gas 29 | 30 | #' @export 31 | coreFeaturesfn = function(meterData, ctx, ...){ 32 | 33 | #load('dicionary200.RData') ## for dic200 use 34 | dic200 = ctx$dic200 35 | 36 | ## get meter data 37 | #meterData = getMeterDataClass(custID) ## assume this is meter data, but not exactly sure the format yet 38 | ## for the time being, I assume the number of rows are the number of days 39 | dlen = nrow(meterData$kwMat) 40 | ## need to decide whether to return any possible features when the meter doesn't have some data for given period 41 | ## using ctx$start.date and ctx$end.date, we can handle this 42 | 43 | ## declare the features 44 | len = 1 ## as this function is for each meter 45 | # cool_coef = matrix(0,len,48) ## temp coef below a breakpoint, and standard deviation of the coef 46 | # heat_coef = matrix(0,len,48) ## temp coef over a breakpoint, and standard deviation of the coef 47 | 48 | daily_usage = matrix(0,len,dlen) ## daily consumption 49 | bud = matrix(0,len,dlen) ## base usage per day 50 | pud = matrix(0,len,dlen) ## peak usage per day 51 | morning_peak_hour = matrix(0,len,dlen) ## morning peak hour per day 52 | afternoon_peak_hour = matrix(0,len,dlen) ## afternoon peak hour per day 53 | daily_peak_hour = matrix(0,len,dlen) ## daily peak hour per day 54 | encode200 = matrix(0,len,dlen) ## encoding by the dictionary of size 200 55 | entropy200 = matrix(0,len,3) ## load shape code entropy using the dictionary of size 200, 1st column is for summer, 2nd for winter, 3rd for a year 56 | cv = matrix(0,len,dlen) ## consumption variability 57 | ldc.sum = matrix(0,len,24) ## load duration curve 58 | ldc.win = matrix(0,len,24) ## load duration curve 59 | ldc.all = matrix(0,len,24) ## load duration curve 60 | 61 | delta = 0.2 ## parameter to be used in generating 4 features below 62 | ram_sum = matrix(0,len,1) ## ram's idea 63 | ram_win = matrix(0,len,1) ## ram's idea 64 | ram_sum_vec = matrix(0,len,24) ## ram's idea 65 | ram_win_vec = matrix(0,len,24) ## ram's idea 66 | 67 | ## features from ram's definition 68 | RS = matrix(0,len,1) 69 | VS=matrix(0,len,1) 70 | Pu=matrix(0,len,1) 71 | Pu.mean=matrix(0,len,1) 72 | R1=matrix(0,len,1) 73 | R2=matrix(0,len,1) 74 | R3=matrix(0,len,1) 75 | R4=matrix(0,len,1) 76 | 77 | ## to calculate temperature sensitivity, we need to know zip code 78 | ## it can be provided via "meterData" or "ctx". 
79 | ## Anyway, I decided to use temperature sensitivity from Sam's features which can be extracted in more general condition 80 | 81 | odata = kjs.impute(as.matrix(meterData$kwMat),1:24) ## depending on meter data format, adjust here 82 | odata[which(is.na(odata))] <- 0 # HACK!! What to do with missing data!? 83 | i=1 84 | encode200 = class::knn(dic200,odata,1:200) 85 | 86 | ## assume shannon.entropy2 in utility.r is included in the same package. if not, include manually. 87 | ## sumidx and -sumidx should be populated based on ctx$start.date and ctx$end.date 88 | 89 | midx = as.POSIXlt(meterData$days)$mon + 1 ## month index 90 | sumidx = which(midx%in%c(5:10)) ## consider May to Oct as summer 91 | #print(dim(odata)) 92 | #print(length(midx)) 93 | #print(encode200[sumidx]) 94 | entropy200[i,1] = shannon.entropy2(encode200[sumidx]) 95 | entropy200[i,2] = shannon.entropy2(encode200[-sumidx]) 96 | entropy200[i,3] = shannon.entropy2(encode200) 97 | 98 | morning_peak_hour[i,] = apply(odata[,1:12],1,which.max) 99 | afternoon_peak_hour[i,] = apply(odata[,13:24],1,which.max)+12 100 | daily_peak_hour[i,] = apply(odata,1,which.max) 101 | peak.hours = 13:18 # 1 to 5 pm 102 | peak.hr.tot.load = mean(apply(odata[,peak.hours],1,sum)) 103 | 104 | daily_usage[i,] = apply(odata,1,sum) 105 | bud[i,] = apply(odata,1,min) 106 | pud[i,] = apply(odata,1,max) 107 | 108 | bu = sum(odata[,10:22]) 109 | tu = sum(odata) 110 | RS[i] = bu/tu ## ratio between peak time usage(9AM-10PM) and total usage 111 | # VS[cnt] = 0.8/(1+0.2*(tu-bu)/bu) 112 | VS[i] = 0.2/(1+0.8*(tu-bu)/bu) 113 | tmp = apply(odata[,10:22],1,sum) 114 | Pu[i] = max(tmp) ## max of peak time usage 115 | Pu.mean[i] = mean(tmp) ## mean of peak time usage 116 | 117 | ## monthly 1~5th daily peak selection 118 | # note pud is daily peak 119 | top5PerMonth = matrix(0,5,12) # rows are top 5 ranked daily peaks, cols are months 120 | for (j in unique(midx)){ 121 | ## assume the data length is at most 1year 122 | top5PerMonth[,j] = sort(pud[which(midx==j)],decreasing=T)[1:5] 123 | } 124 | 125 | peakSums = apply(top5PerMonth,1,sum) 126 | print(peakSums) 127 | R1[i] = 1-peakSums[2]/peakSums[1] # ratio of 2nd highest peaks to highest peaks 128 | R2[i] = 1-peakSums[3]/peakSums[1] 129 | R3[i] = 1-peakSums[4]/peakSums[1] 130 | R4[i] = 1-peakSums[5]/peakSums[1] 131 | 132 | cv[i,] = apply(odata,1,function(j){ 133 | sum(sqrt((1/24)^2+diff(j)^2)) 134 | }) 135 | #print(length(sumidx)) 136 | #print(dim(odata)) 137 | #print(odata[sumidx,]) 138 | ld = sapply((1:9)*.1,FUN =function(x) { sum(odata>(max(odata)*x)) } ) 139 | ldc.all[i,] = apply(apply(odata,1,sort,decreasing=T),1,mean,na.rm=T) 140 | ldc.sum[i,] = apply(apply(odata[sumidx,],1,sort,decreasing=T),1,mean,na.rm=T) 141 | ldc.win[i,] = apply(apply(odata[-sumidx,],1,sort,decreasing=T),1,mean,na.rm=T) 142 | 143 | fmat = apply(odata,1,function(j){ 144 | maxbin = floor(max(j/delta)) 145 | ftable = matrix(0,1,25) 146 | if (maxbin<1) ftable 147 | tl = c() 148 | for (k in 1:maxbin){ 149 | t3 = diff(c(0,j>k*delta,0)) 150 | tl = c(tl, which(t3==-1)-which(t3==1)) 151 | } 152 | tmp = table(tl) 153 | ftable[as.numeric(names(tmp))] = tmp 154 | ftable[25] = mean(tl) 155 | ftable 156 | }) 157 | ram_sum_vec[i,] = apply(fmat[1:24,sumidx],1,mean) 158 | ram_win_vec[i,] = apply(fmat[1:24,-sumidx],1,mean) 159 | 160 | ram_sum[i,] = mean(fmat[25,sumidx]) 161 | ram_win[i,] = mean(fmat[25,-sumidx]) 162 | 163 | return(list(custID = meterData$id, daily_usage = daily_usage, bud = bud, pud = pud, 164 | morning_peak_hour = morning_peak_hour, 
afternoon_peak_hour = afternoon_peak_hour, daily_peak_hour = daily_peak_hour, 165 | peak.hr.tot.load = peak.hr.tot.load, 166 | encode200 = encode200, entropy200 = entropy200, cv = cv, ld = ld, ldc.all = ldc.all, ldc.sum = ldc.sum, ldc.win = ldc.win, 167 | ram_sum = ram_sum, ram_win = ram_win, ram_sum_vec = ram_sum_vec, ram_win_vec = ram_win_vec, 168 | RS = RS, VS = VS, Pu = Pu, Pu.mean = Pu.mean, R1 = R1, R2 = R2, R3 = R3, R4 = R4)) 169 | 170 | } 171 | -------------------------------------------------------------------------------- /R/features-weather.R: -------------------------------------------------------------------------------- 1 | # Copyright 2016 The Board of Trustees of the Leland Stanford Junior University. 2 | # Direct inquiries to Sam Borgeson (sborgeson@stanford.edu) 3 | # or professor Ram Rajagopal (ramr@stanford.edu) 4 | 5 | #' @export 6 | weatherFeatures = function(w){ # w is an instance of WeatherClass 7 | summerMon = 4:8 # May through Sept - zero based 8 | summerSubset = as.POSIXlt(w$rawData$dates)$mon %in% summerMon 9 | weatherValues = w$rawData[,! names(w$rawData) %in% c('dates','day','date','days')] 10 | yMeans = colMeans(weatherValues,na.rm=T) # annual means 11 | sMeans = colMeans(subset(weatherValues,subset = summerSubset),na.rm=T) # just summer 12 | wMeans = colMeans(subset(weatherValues,subset = !summerSubset),na.rm=T) # just winter 13 | names(sMeans) = paste('summer.',names(sMeans),sep='') 14 | names(wMeans) = paste('winter.',names(wMeans),sep='') 15 | features = append(list(zip5=w$zip),as.list(c(yMeans,wMeans,sMeans))) 16 | 17 | # TODO: Add HDD, CDD 18 | 19 | return(features) 20 | } 21 | 22 | #' @export 23 | getWeatherSummary = function() { 24 | summaryFile = paste(DATA_SOURCE$CACHE_DIR,'/weatherSummary.RData',sep='') 25 | if(file.exists(summaryFile)) { 26 | load(summaryFile) 27 | } 28 | else { 29 | zips = DATA_SOURCE$getZips(useCache=T) 30 | i = 0 31 | n = length(zips) 32 | wList = as.list(rep(NA,length(zips))) 33 | for(zip in zips) { 34 | i = i+1 35 | print(paste(zip,'(',i,'/',n,')')) 36 | tryCatch({ 37 | wList[[i]] = weatherFeatures(WeatherClass(zip,doMeans=F,useCache=T)) 38 | }, 39 | error = function(e) {print(paste('Error in weatherFeatures:',e))}, 40 | finally = {} ) 41 | 42 | } 43 | weatherSummary = data.frame(do.call(rbind,wList)) 44 | colnames(weatherSummary)[1] <- c('zip5') 45 | for(col in colnames(weatherSummary)) { 46 | if(is.factor(weatherSummary[[col]])) { 47 | # as.numeric(levels(f))[f] is the preferred way to convert factors to numerics 48 | weatherSummary[[col]] = as.numeric(levels(weatherSummary[[col]] ))[weatherSummary[[col]]] 49 | } 50 | } 51 | save(weatherSummary,file=summaryFile) 52 | } 53 | weatherSummary$toutC = (weatherSummary$tout - 32) * 5/9 54 | weatherSummary$summer.toutC = (weatherSummary$summer.tout - 32) * 5/9 55 | weatherSummary$winter.toutC = (weatherSummary$winter.tout - 32) * 5/9 56 | weatherSummary$rain = weatherSummary$rain * 365 * 24 57 | weatherSummary$rain[weatherSummary$rain > 120] = NA # there is junk rain data (suprise!!) 
58 | return(weatherSummary) 59 | } 60 | 61 | #print(weatherFeatures(WeatherClass(94610,useCache=T))) 62 | -------------------------------------------------------------------------------- /R/filter.R: -------------------------------------------------------------------------------- 1 | # Copyright 2016 Convergence Data Analytics 2 | # Direct inquiries to Sam Borgeson (sam@convergenceda.com) 3 | 4 | #' @title Date filter meter data 5 | #' 6 | #' @description 7 | #' Utility function that filters raw meter data (one customer day per row) based on date criteria 8 | #' 9 | #' @param df The data frame of data to filter 10 | #' 11 | #' @param filterRules A named list of filtering rules. Supported list entries include: 12 | #' 13 | #' \code{MOY} - list of months of the year to include, using 1 for Jan through 12 for Dec 14 | #' 15 | #' \code{DOW} - days of week to include, using 1 for Sun and 7 for Sat 16 | #' 17 | #' \code{start.date} - the first day of data to include: all dates before this date are excluded 18 | #' 19 | #' \code{end.date} - the last day of data to include: all dates after this date are excluded 20 | #' 21 | #' @param dateCol The name of the column in 'df' with dates in it. 22 | #' 23 | #' @export 24 | applyDateFilters = function(df, filterRules=NULL, dateCol="dates") { 25 | # if no rules, don't filter 26 | if( is.null(filterRules) ) { return(df) } 27 | names(filterRules) = tolower(names(filterRules)) 28 | # sanity check inputs - test for one or more valid filters 29 | validFilters = c('start.date', 'end.date', 'dow', 'moy') 30 | if( ! any(validFilters %in% names(filterRules) ) ) { 31 | print(names(filterRules)) 32 | stop('No valid filter rules.') 33 | } 34 | 35 | # sanity check inputs - check for proper date format 36 | dts = df[, dateCol] 37 | if( any( c('character','factor', 'Date', 'POSIXct') %in% class(dts) )) { 38 | if('Date' %in% class(dts)) { 39 | warning('Conversion from Date class to POSIXlt assumes UTC time zone, which can shift your dates around') 40 | } 41 | dts = as.POSIXlt(dts) 42 | } 43 | if( ! 'POSIXlt' %in% class(dts)) { 44 | stop(sprintf('Dates from column %s not a valid date class type: %s',dateCol, class(dts))) 45 | } 46 | 47 | # if any of these are not in the list, they will be set to null 48 | DOW = filterRules$dow 49 | MOY = filterRules$moy 50 | start = filterRules$start.date 51 | end = filterRules$end.date 52 | 53 | filter = rep(T, length(dts)) 54 | if( ! is.null(DOW) ) { filter = filter & (dts$wday + 1) %in% DOW } 55 | if( ! is.null(MOY) ) { filter = filter & (dts$mon + 1) %in% MOY } 56 | if( ! is.null(start)) { filter = filter & dts >= as.POSIXct(start) } 57 | if( ! is.null(end)) { filter = filter & dts <= as.POSIXct(end) } 58 | return( df[which(filter),]) 59 | } 60 | 61 | 62 | 63 | -------------------------------------------------------------------------------- /R/solaRUtil.R: -------------------------------------------------------------------------------- 1 | # Copyright 2016 The Board of Trustees of the Leland Stanford Junior University. 2 | # Direct inquiries to Sam Borgeson (sborgeson@stanford.edu) 3 | # or professor Ram Rajagopal (ramr@stanford.edu) 4 | 5 | orig_TZ = Sys.timezone() #Sys.getenv(x='TZ') 6 | require('solaR') # this changes the timezone to UTC!!!! 7 | Sys.setenv(tz=orig_TZ) # change timezone back 8 | 9 | 10 | 11 | # zone elevation and azimuth breaks. 
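# --- Added illustrative sketch (not part of the original source files) ---
# One possible call to applyDateFilters() from filter.R above, restricting a
# meter data frame (one customer day per row) to summer weekdays inside a date
# window. The data frame and filter values are hypothetical.
#
# df = DATA_SOURCE$getAllData()
# summerWeekdays = applyDateFilters(df,
#                                   filterRules = list(MOY = 5:10,   # May through Oct
#                                                      DOW = 2:6,    # Mon through Fri (1 = Sun)
#                                                      start.date = '2013-01-01',
#                                                      end.date   = '2013-12-31'),
#                                   dateCol = 'dates')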
12 | # elevation greater than the max here are assugned to the overheat zone z0 13 | # elevation < 0 are assigned to the night zone 14 | eBr = c(60,30,0) 15 | aBr = c(seq(0,360,45)) 16 | 17 | nameZone = function(azim,elev,azimBreaks,elevBreaks) { 18 | if(elev <= 0) return('night') 19 | if(elev >= max(elevBreaks)) return('z0') 20 | for (i in 1:(length(azimBreaks)-1)) { 21 | for (j in 1:(length(elevBreaks)-1)) { 22 | aBounds = sort(c(azimBreaks[i],azimBreaks[i+1])) # ensure we know which is smallest 23 | eBounds = sort(c(elevBreaks[j],elevBreaks[j+1])) 24 | if( (azim >= aBounds[1] & aBounds[2] > azim) & 25 | (elev >= eBounds[1] & eBounds[2] > elev) ) return(paste('z',i,'.',j,sep='')) 26 | } 27 | } 28 | return('?') 29 | } 30 | 31 | #' @export 32 | solarGeom = function(dates,zip=NULL,lat=NULL,azimBreaks=seq(0,360,45),elevBreaks=c(45,0)) { 33 | # Used to load through direct file access 34 | #ZIP_LOCATION <- read.csv(file.path("data/Erle_zipcodes.csv"), header=TRUE) 35 | # now laods as a part of the VISDOM package data 36 | if( ! exists('ERLE_ZIP_LOCATION')) { 37 | data('ERLE_ZIP_LOCATION', envir = environment()) # loads ERLE_ZIP_LOCATION mapping between zip code and lat/lon 38 | } 39 | blanks = is.na(dates) # dates have na's for daylight savings spring forward day 40 | dates[blanks] <- dates[1] # they must be overridden as real dates for calcSol to work 41 | if(is.null(lat)) { 42 | lat=37.87 43 | zip = as.integer(zip) 44 | print( paste('Deriving lat data from',zip) ) 45 | zipRow = ERLE_ZIP_LOCATION[ERLE_ZIP_LOCATION$zip == zip,] 46 | if(dim(zipRow)[1]==0) { 47 | print( paste("Couldn't find zip information. Using lat",lat) ) 48 | } else { 49 | lat = zipRow[1,'latitude'] 50 | } 51 | 52 | } 53 | print(paste('Latitude:',lat)) 54 | 55 | # Berkeley 37.8717 N, 122.2728 W 56 | # calcSol computes the angles which describe the intradaily apparent movement of the Sun from the Earth 57 | # solObj is an S4 class with lots of solar info 58 | # recall that 'slots' in S4 objects are accessed via the @ operator 59 | # suppressWarnings because zoo complains about duplicate dates caused by daylight savings time 60 | solObj = suppressWarnings(calcSol(lat,sample="hour",BTi=dates,EoT=T,keep.night=T)) 61 | sg = as.data.frameI(solObj) 62 | # the solI slot is a zoo object and we want the time stamps 63 | sg$dates = time(solObj@solI) 64 | sg$elevation = sg$AlS * 180/pi # convert from radians 65 | sg$azimuth = sg$AzS * 180/pi # convert from radians 66 | sg$azimuth = sg$azimuth %% 360 # no negative or > 360 degrees 67 | 68 | # get named zones based on breaks into a factor 69 | sg$zone = factor(apply(as.matrix(sg[,c('azimuth','elevation')]),1, 70 | function(X) nameZone(X[1],X[2],azimBreaks,elevBreaks))) 71 | sg$daylight = sg$elevation > 0 72 | #sg$AlS is the solar elevation 73 | #sg$AzS is the solar asimuth 74 | class(sg) <- c('solarGeom',class(sg)) 75 | sg[blanks,] <- NA # put the blanks back in for the overridden values 76 | return(sg) 77 | } 78 | 79 | #' @export 80 | plot.solarGeom = function(solarGeom,azimBreaks=seq(0,360,45),elevBreaks=c(45,0),color=NULL) { 81 | require('ggplot2') 82 | require(gridExtra) 83 | g = ggplot(data.frame(azimuth=c(0,360),elevation=rep(max(elevBreaks),2)),aes(y=elevation,x=azimuth)) + 84 | geom_polygon(color="grey",size=1,fill=NA) + 85 | coord_polar(start=pi) + 86 | scale_x_continuous(breaks=c(seq(0,315,by=45)),limits=c(0,360), # Note that 0 is S in this case. 
87 | labels=paste(c('S','SW','W','NW','N','NE','E','SE'),seq(0,315,by=45)) ) + 88 | scale_y_reverse(breaks=c(seq(0,90,by=15)),limits=c(90,0)) + 89 | labs(title="Annual solar path for each of 24 hours of the day") + 90 | geom_text(x=0,y=-90,label='z0',color='grey') # no reverse axis for text make negative instead 91 | 92 | for (i in 1:(length(azimBreaks)-1)) { 93 | for (j in 1:(length(elevBreaks)-1)) { 94 | zone = data.frame(elevation=c(elevBreaks[j],elevBreaks[j+1],elevBreaks[j+1],elevBreaks[j], elevBreaks[j]), 95 | azimuth =c(azimBreaks[i],azimBreaks[i], azimBreaks[i+1],azimBreaks[i+1],azimBreaks[i])) 96 | g = g + geom_polygon(data=zone, colour="gray",size=1,fill=NA) + 97 | geom_text(x=mean(c(azimBreaks[i],azimBreaks[i+1])), 98 | y=-1*mean(c(elevBreaks[j],elevBreaks[j+1])), # no reverse axis for text make negative instead 99 | label=paste('z',i,'.',j,sep=''),color='grey') 100 | 101 | } 102 | } 103 | cbPalette <- c("#999999", "#E69F00", "#56B4E9", "#009E73", "#F0E442", "#0072B2", "#D55E00", "#CC79A7") 104 | 105 | if (is.null(color)) { 106 | g = g + geom_point(data=solarGeom,aes(x=azimuth,y=elevation)) 107 | } 108 | else if (color == 'zone') { 109 | g = g + geom_point(data=solarGeom,aes(x=azimuth,y=elevation,color=zone)) + 110 | scale_color_brewer(palette='YlGnBu') #color=lubridate::hour(dates) )) color=zone)) 111 | 112 | } 113 | else if(color == 'hour') { 114 | g = g + geom_point(data=solarGeom,aes(x=azimuth,y=elevation,color=lubridate::hour(dates))) + 115 | scale_colour_gradientn(colours=cbPalette) #color=lubridate::hour(dates) )) color=zone)) 116 | } 117 | else { 118 | g = g + geom_point(data=solarGeom,aes(x=azimuth,y=elevation)) 119 | } 120 | grid.arrange(g) 121 | #return(g) 122 | } 123 | 124 | TEST=F 125 | if(TEST) { 126 | # View of whole path, including below horizon 127 | r = MeterDataClass(553991005,93304) 128 | sg = r$weather$sg 129 | plot(sg,color='junk') 130 | } 131 | -------------------------------------------------------------------------------- /R/testDataSource.R: -------------------------------------------------------------------------------- 1 | # Copyright 2016 The Board of Trustees of the Leland Stanford Junior University. 2 | # Direct inquiries to Sam Borgeson (sborgeson@stanford.edu) 3 | # or professor Ram Rajagopal (ramr@stanford.edu) 4 | 5 | #' @title 6 | #' Implementation of a data source that generates synthetic data for testing and examples 7 | #' 8 | #' @description 9 | #' Example implementation of the data source functions as a coherent data source. This one 10 | #' simply generates random synthetic data that conform to the formats required by \code{\link{MeterDataClass}} 11 | #' and \code{\link{WeatherClass}}, returning valid, but meaningless data for all calls. 12 | #' 13 | #' @param n Number of meters for the data source to provide synthetic data for: determines 14 | #' the return values for functions like \code{DATA_SOURCE$getIds()} and and \code{DATA_SOURCE$getAllData()} 15 | #' 16 | #' @details 17 | #' \code{TestData} is instantiated with a certain number of meters to generate data for. It then generates 18 | #' random data for that number and consumes that data to support the rest of its functions. This class 19 | #' can be used for examples, learning how data sources work, and providing data for tests of feature algorithms 20 | #' and meter data related classes. 
21 | #' 22 | #' @seealso \code{\link{DataSource}} 23 | #' 24 | #' @examples 25 | #' \dontrun{ 26 | #' DATA_SOURCE = TestData(n=100) 27 | #' idList = DATA_SOURCE$getIds() 28 | #' dataMatrix = DATA_SOURCE$getAllData() 29 | #' } 30 | #' @export 31 | TestData = function( n=100 ) { 32 | obj = DataSource( ) 33 | 34 | obj$n = n 35 | 36 | obj$getHourlyAlignedData = function( n=NULL ) { 37 | # generate n meters worth of testable data 38 | if(is.null(n)) { n = obj$n } 39 | dates = as.Date('2013-01-01') + 0:364 40 | data = data.frame( id = rep(paste('meter',1:n,sep='_'),each=365), 41 | customerID = rep(paste('cust',1000 + 1:n,sep='_'),each=365), 42 | geocode = '94305', 43 | dates = rep(dates,n)) 44 | # no factors... 45 | data$id = as.character(data$id) 46 | data$customerID = as.character(data$customerID) 47 | data$geocode = as.character(data$geocode) 48 | # generate fake data nx24 49 | obs = matrix(rnorm(n*365*24,0,2) + rnorm(1,10,1.5),ncol=24) 50 | data = cbind(data,obs) 51 | names(data)[5:28] = paste('H',1:24,sep='') 52 | return(data) 53 | } 54 | 55 | obj$getIdMetaData = function(useCache=TRUE) { 56 | ids = obj$getIds() 57 | return( data.frame( id = ids, status='good', income = 'myob' ) ) 58 | } 59 | 60 | obj$getAllData = function(geocode,useCache=TRUE) { 61 | return( obj$getHourlyAlignedData( ) ) 62 | } 63 | 64 | obj$getMeterData = function(id, geo=NULL,useCache=TRUE) { 65 | return( obj$getHourlyAlignedData( n=1 ) ) 66 | } 67 | 68 | obj$getIds = function(geocode=NULL, useCache=TRUE) { 69 | # here we ignore the geo code, but a real data source would return 70 | # just the ids associated with the passed geocode (if applicable) 71 | return( unique(obj$getHourlyAlignedData()$id ) ) 72 | } 73 | 74 | obj$getGeocodes = function( useCache=TRUE ) { 75 | return(c('94305')) 76 | } 77 | 78 | obj$getGeoForId = function(id, useCache=TRUE) { 79 | return('94305') 80 | } 81 | 82 | obj$getWeatherData = function( geocode, useCache=T ) { 83 | set.seed(geocode) 84 | dates = as.POSIXct('2013-01-01',tz = "America/Los_Angeles" ) + 0:(365 * 24 - 1) * 3600 85 | data = data.frame( 86 | date = dates, 87 | temperaturef = (c(0:5,5:0)*3)[as.POSIXlt(dates)$mon + 1] + rep(rnorm(n=365, mean=0, sd=0.5), each=24) + rep( rep(c(54,55,56,57,58,56), each=4), 365 ), 88 | pressure = rep( rep(19,24), 365 ), 89 | hourlyprecip = rep( c( rep(0,12),rep(1,2),rep(0,10) ), 365 ), 90 | dewpointf = rep( rep(55,24), 365 ) 91 | ) 92 | 93 | return(data) 94 | } 95 | 96 | class(obj) = append(class(obj),"TestData") 97 | 98 | return(obj) 99 | } 100 | -------------------------------------------------------------------------------- /R/util-base.R: -------------------------------------------------------------------------------- 1 | # Copyright 2016 The Board of Trustees of the Leland Stanford Junior University. 
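# --- Added illustrative sketch (not part of the original source files) ---
# The synthetic TestData data source implemented in testDataSource.R above
# answers the generic data source calls used throughout VISDOM. Variable names
# below are hypothetical.
#
# DATA_SOURCE = TestData(n = 10)
# ids      = DATA_SOURCE$getIds()              # e.g. 'meter_1' ... 'meter_10'
# meta     = DATA_SOURCE$getIdMetaData()       # one row of metadata per meter id
# oneMeter = DATA_SOURCE$getMeterData(ids[1])  # daily rows with hourly readings in H1..H24
# weather  = DATA_SOURCE$getWeatherData(94305) # synthetic hourly weather observations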
2 | # Direct inquiries to Sam Borgeson (sborgeson@stanford.edu) 3 | # or professor Ram Rajagopal (ramr@stanford.edu) 4 | 5 | 6 | #' @importFrom dplyr %>% 7 | #' @export 8 | dplyr::`%>%` 9 | 10 | 11 | #' @export 12 | Mode <- function(x,rndTieBreak=F) { 13 | ux <- unique(x) 14 | tab = tabulate(match(x, ux)) 15 | maxCount = max(tab) 16 | 17 | modes = ux[tab == maxCount] # there could be more than one 18 | if(rndTieBreak) { 19 | out = sample(rep(modes,2),1) # random choice breaks ties; single mode is deterministic 20 | } else { 21 | out = rep(modes,2)[1] 22 | } 23 | if(length(out) == 0) out = NA 24 | return(out) 25 | } 26 | 27 | 28 | # lag (shift) an array of values by n places, filling in the gap at the beginning with n NAs 29 | #' @export 30 | lag = function(v,n=1) { 31 | if(n==0) return(v) 32 | return(c(rep(NA,n),head(v,-n))) 33 | } # prepend NAs and truncate to preserve length 34 | 35 | # finite difference between observations 36 | #' @export 37 | diff2 = function(v,n=1) { return(c(rep(NA,n),diff(v, n))) } # prepend NAs to preserve length of standard diff 38 | 39 | # calculate moving averages note adds n-1 NAs to beginning 40 | # as.numeric is called because filter returns 'ts' objects which apparently don't play well with cbind and data.frames 41 | #' @export 42 | ma = function(v,n=5,weights=NULL) { 43 | if(length(weights) == 0) { weights = rep(1/n,n) } # standard moving window average 44 | as.numeric(stats::filter(v, weights, sides=1)) 45 | } 46 | 47 | #' @title remove named, index, or logical columns, if present, from a data.frame 48 | #' @export 49 | rm.col = function(df, cols) { 50 | cls = class(cols) 51 | subdf = NULL 52 | if( cls == 'character') { 53 | keepers = setdiff(names(df),cols) #! names(df) %in% cols 54 | } else if (cls %in% c('numeric','integer') ) { 55 | keepers = setdiff(1:ncol(df),cols) #! 1:ncol(df) %in% cols 56 | } else if (cls == 'logical' ) { 57 | keepers = ! cols 58 | } else { 59 | stop( paste( 'Unrecognized class for columns', cls ) ) 60 | } 61 | return( df[,keepers] ) 62 | } 63 | 64 | # return only the complete cases of a data.frame 65 | #' @export 66 | cmplt = function(df) { 67 | return( df[complete.cases(df),]) 68 | } 69 | 70 | 71 | # utility fn to clear all active variables - 72 | #leaves .varName vars behind to eliminate these, use ls(all.names=TRUE) 73 | #' @export 74 | clearAllVars = function() { rm(list=ls(),envir=baseenv()) } 75 | -------------------------------------------------------------------------------- /R/util-timer.R: -------------------------------------------------------------------------------- 1 | # Copyright 2016 The Board of Trustees of the Leland Stanford Junior University. 2 | # Direct inquiries to Sam Borgeson (sborgeson@stanford.edu) 3 | # or professor Ram Rajagopal (ramr@stanford.edu) 4 | 5 | 6 | #' @export 7 | tic <- function(name='default', gcFirst = FALSE, type=c("elapsed", "user.self", "sys.self")) 8 | { 9 | type <- match.arg(type) 10 | assign(".type", type, envir=baseenv()) 11 | if(gcFirst) gc(FALSE) 12 | tic = list() 13 | tryCatch({tic <- get(".tic", envir=baseenv())}, 14 | error=function(e){ }) 15 | tic[name] <- proc.time()[type] 16 | assign(".tic", tic, envir=baseenv()) 17 | invisible(tic) 18 | } 19 | 20 | #' @export 21 | toc <- function(name='default',prefixStr=NA) 22 | { 23 | type <- get(".type", envir=baseenv()) 24 | tic <- get(".tic", envir=baseenv()) 25 | dt <- proc.time()[type] - as.numeric(tic[name]) 26 | 27 | # must be ints... 
28 | d <- floor(dt / 3600 / 24) 29 | h <- floor(dt / 3600) %% 24 30 | m <- floor(dt / 60) %% 60 31 | s <- dt %% 60 32 | #f <- s - floor(s) 33 | #s <- floor(s) 34 | if(is.na(prefixStr)) prefixStr <- name 35 | print(paste(prefixStr,': ',sprintf('%02i:%02i:%02i:%05.2f',d,h,m,s),sep='')) 36 | 37 | } 38 | 39 | #' @export 40 | mem.usage = function() { 41 | mem.use = sapply(ls(), function(x) { object.size(get(x))}, simplify = FALSE) 42 | print(sapply(mem.use[order(as.integer(mem.use))], function(var) { format(var, unit = 'auto') } )) 43 | } 44 | 45 | # print out crude profiling stats fort he padded in function with arguments ... 46 | #' @export 47 | run.profile = function(fn, ...) { 48 | Rprof("profile1.out", line.profiling=TRUE) 49 | test = fn(...) 50 | Rprof(NULL) 51 | summaryRprof("profile1.out", lines = "show") 52 | } 53 | -------------------------------------------------------------------------------- /R/util-tree.R: -------------------------------------------------------------------------------- 1 | # Copyright 2016 The Board of Trustees of the Leland Stanford Junior University. 2 | # Direct inquiries to Sam Borgeson (sborgeson@stanford.edu) 3 | # or professor Ram Rajagopal (ramr@stanford.edu) 4 | 5 | #library(party) 6 | 7 | #' @title 8 | #' bootstrap several runs of the ctree algorithm 9 | #' 10 | #' @description 11 | #' This function repeatedly runs the ctree algorithm on random sub-samples of the underlying data 12 | #' to provide a measure of robustness for the individual ctree results. 13 | #' 14 | #' @param df data.frame with response variable to be explained and all explanatory variables for the ctree to run with 15 | #' @param nRun number of bootstrapping runs to perform 16 | #' @param nSample number of samples (with replacement) for each run 17 | #' @param ctl ctree control instance to be passed to ctree algorithm to control stopping criteria, etc. 18 | #' @param responseVar the name of the variable that the ctree is trying to explain 19 | #' @param ignoreCols an optional list of columns in df, but that should be excluded from the ctree modeling 20 | # 21 | #' @export 22 | ctree.boot = function(df, nRun=100, nSample=NULL, ctl=NULL, responseVar, ignoreCols) { 23 | if(is.null(nSample)) { nSample = nrow(df) / 10 } 24 | out = adply( 1:nRun, 25 | .margins=1, 26 | .fun = ctree.subscore, ctl=ctl, 27 | df=df, nSample=nSample, responseVar=responseVar, ignoreCols=ignoreCols, 28 | .id='run' 29 | #.parallel=T, 30 | #.progress = 'time' 31 | ) 32 | return(out) 33 | } 34 | 35 | #' @title 36 | #' bootstrap spread scores across several runs of the ctree algorithm 37 | #' 38 | #' @description 39 | #' This function repeatedly randomly sub-samples the passed data.frame to provide points for classificaiton 40 | #' by the passed ctree model and calculates the resulting spread score for classifications of each subset 41 | #' and returns a vector of the resulting scores. It is useful for determining the robustness of the spread score 42 | #' for a particular data set. 43 | #' 44 | #' @param df data.frame with response variable to be explained and all explanatory variables to be used in ctree classificaiton 45 | #' @param df.ct ctree already trained on data, that is used to classify the data points in each sub sample. 
46 | #' @param nRun number of bootstrapping runs to perform 47 | #' @param nSample number of samples (with replacement) for each run 48 | #' @param responseVar the name of the variable that the ctree is trying to explain 49 | #' @param ignoreCols an optional list of columns in df, but that should be excluded from the ctree modeling 50 | #' @export 51 | spread.boot = function(df, df.ct, nRun=100, nSample=NULL, responseVar, ignoreCols) { 52 | if(is.null(nSample)) { nSample = nrow(df) / 10 } 53 | out = plyr::adply( 1:nRun, 54 | .margins=1, 55 | .fun = spreadScore, 56 | df=df, 57 | nSample=nSample, 58 | df.ct=df.ct, 59 | responseVar=responseVar, #ignoreCols=ignoreCols, # spreadScore arguments 60 | .id='run' 61 | #.parallel=T, 62 | #.progress = 'time' 63 | ) 64 | names(out)[2] = 'score' 65 | return(out) 66 | } 67 | 68 | #' @title 69 | #' fit a ctree with passed data and compute and return its spread score 70 | #' 71 | #' @description 72 | #' This function trains a ctree using data from a passed data.frame and returns the resulting spread score. 73 | #' 74 | #' @param df data.frame with response variable to be explained and all explanatory variables to be used in ctree classificaiton 75 | #' @param nSample number of samples if hte spread score should be run on a subset of the data 76 | #' @param ctl ctree control to specify ctree model parameters 77 | #' @param responseVar the name of the variable that the ctree is trying to explain 78 | #' @param ignoreCols an optional list of columns in df, but that should be excluded from the ctree modeling 79 | #' @export 80 | ctree.subscore = function(i, df, nSample=NULL, ctl=NULL, responseVar, ignoreCols) { 81 | sub = sample(1:nrow(df),nSample) 82 | df.ct = ctree.run(df, fmla=formula(paste(responseVar,'~ .')), sub, ctl=ctl, responseVar, ignoreCols ) 83 | score = spreadScore(df[sub, ], df.ct, nSample=nSample, responseVar=responseVar, ignoreCols=ignoreCols) 84 | return( data.frame(score=score) ) 85 | } 86 | 87 | #' @title 88 | #' run a ctree model 89 | #' 90 | #' @description 91 | #' This function trains a partykit::ctree using data from a passed data.frame. 92 | #' 93 | #' @param df data.frame with response variable to be explained and all explanatory variables to be used in ctree classificaiton 94 | #' @param fmla formula, referring to values in the df, specifying the variables to be used to train the ctree model 95 | #' @param ctl partykit::ctree_control instance to specify ctree model parameters 96 | #' @param responseVar the name of the variable that the ctree is trying to explain 97 | #' @param ignoreCols an optional list of columns in df, but that should be excluded from the ctree modeling 98 | #' @param ... additional arguments to be passed to partykit::ctree 99 | #' 100 | #' @export 101 | ctree.run = function(df, fmla=NULL, sub=NULL, ctl=NULL, responseVar, ignoreCols, ... ) { 102 | if(is.null(fmla)) { 103 | fmla = formula(paste(responseVar,'~ .')) 104 | } 105 | if(is.null(sub)) { sub=TRUE } 106 | if(is.null(ctl)) { 107 | ctl = partykit::ctree_control(mincriterion = 0.95) 108 | } 109 | df.ct <- partykit::ctree( fmla, 110 | data = rm.col( df[sub,], setdiff(ignoreCols,responseVar)), 111 | control = ctl, ... ) 112 | return(df.ct) 113 | } 114 | 115 | #' @title 116 | #' given a ctree model, compute its spread score 117 | #' 118 | #' @description 119 | #' This function computes the "spread score" for a sample of data, given a ctree explaining a binary variable. 
120 | #' New data samples are "predicted" into their ctree nodes and the probability of a 'True' response value for 121 | #' their members is fed into a weighted average of the absolute distance from the sample mean across all dat asamples. 122 | #' The higher the score, the better the model has done at classifying people with divergent behaviors (higher or loewr than average). 123 | #' 124 | #' @param df data.frame with response variable to be explained and all explanatory variables to be used in ctree classificaiton 125 | #' @param df.ct ctree already trained on data, that is used to classify the data points in each sub sample. 126 | #' @param nSample number of samples (with replacement) to use for the calculation 127 | #' @param responseVar the name of the variable that the ctree is trying to explain 128 | #' @param ignoreCols an optional list of columns in df, but that should be excluded from the ctree modeling 129 | #' 130 | #' @export 131 | spreadScore = function(df, df.ct, nSample=NULL, responseVar, ignoreCols=c()) { 132 | sub=T 133 | if( ! is.null(nSample) ) { 134 | sub = sample(1:nrow(df),nSample) 135 | } 136 | newData = rm.col(df[sub,], c(responseVar,ignoreCols)) 137 | responsePct = mean( as.logical(df[,responseVar]) ) 138 | responseProb = NULL 139 | if("constparty" %in% class(df.ct)) { # if the tree is from party 140 | responseprob = predict(df.ct, newdata = newData, type='prob')[,2] 141 | } else { # the tree must be from partykit 142 | responseprob = sapply(partykit::treeresponse(df.ct, newdata = newData), FUN = function(x) x[2]) 143 | } 144 | relResponse = responsePct - responseprob 145 | #print(head(relResponse)) 146 | return(mean(abs(relResponse))) 147 | } 148 | 149 | 150 | #' @title 151 | #' ctree classificaiton results plot for a binary response variable 152 | #' 153 | #' @description 154 | #' plot a histogram of customer segments and enrollment percentages, based on the membership of ctree leaf nodes. 155 | #' The plot is a "histogram" whose x-axis is the average "True" response for each ctree leaf node and whose 156 | #' height is the number of customers in each corresponding group. Strong results have nodes with high membership 157 | #' far from the sample mean. 158 | #' 159 | #' @param df data.frame with response variable to be explained and all explanatory variables to be used in ctree classificaiton 160 | #' @param df.ct ctree already trained on data, that is used to classify the data points in each sub sample. 161 | #' @param responseCol the name of the variable that the ctree is trying to explain 162 | #' @param title figure title of the plot 163 | #' @param idCol column with data row identifiers in it, which is removed from the df 164 | #' @param xlab label for x-axis of the figure 165 | #' @param normalize boolean for whether the x-axis should be percentages compared to the sample mean or absolute percentages 166 | #' @param compareData optional data.fram of one additional set of bar values to be plotted as well. 167 | #' @param colors optional named list specifying the figure colors. 
168 | #' 169 | #' @export 170 | groupHist = function(df, df.ct, responseCol, title='Group probabilities', 171 | idCol='id', xlab='Enrollment probability (%)', 172 | normalize=FALSE, compareData=NULL, colors=NULL) { 173 | if(is.null(colors)) { 174 | colors = list() 175 | colors$fill1st = 'grey24' 176 | colors$line1st = 'black' 177 | colors$fill2nd = 'lightblue' 178 | colors$line2nd = 'blue' 179 | colors$linemean = 'red' 180 | } 181 | response = as.logical(df[,responseCol]) 182 | newData = rm.col(df, c(idCol,responseCol)) 183 | responseProb = NULL 184 | if("constparty" %in% class(df.ct)) { # the tree is from partykit 185 | df$responseprob = predict(df.ct, newdata = newData, type='prob')[,2] 186 | df$group = predict(df.ct, newdata = newData, type='node') 187 | } else { # the tree is from party 188 | df$responseprob = sapply(party::treeresponse(df.ct, newdata = newData), FUN = function(x) x[2]) 189 | df$group = party::where(df.ct, newdata = newData) 190 | } 191 | meanLine = data.frame(x=mean(response)) * 100 192 | barStats = as.data.frame(as.list(aggregate(df$responseprob, by=list(df$group), FUN=function(x) { c(count=sum(x>0),enroll_prob=mean(x)*100) } ))) 193 | names(barStats) = c('nodeId','counts','enroll_prob') 194 | barStats$label = toupper(letters[rank(-barStats$enroll_prob)]) 195 | barStats$group = barStats$label 196 | barStats$pct = barStats$counts / sum(barStats$counts) * 100 197 | meanLine$meanCount = mean(barStats$counts) 198 | meanLine$maxCount = max(barStats$counts) 199 | meanLine$maxPct = max(barStats$pct) 200 | #print(barStats) 201 | p = ggplot(barStats, aes(x=enroll_prob, y=pct, label=label)) + 202 | geom_bar(stat='identity', width=0.2, color=colors$fill1st, fill=colors$fill1st) + 203 | labs(title=title,x=xlab) + 204 | geom_text(vjust=-1) + 205 | scale_x_continuous(breaks = seq(12,24,1)) + 206 | ylim(0, max(barStats$pct) * 1.1) 207 | if( ! is.null(compareData) ) { 208 | p = p + geom_bar(data=compareData, mapping=aes(x=enroll_prob, y=pct, label=psycographic), 209 | stat='identity', color=colors$fill2nd, fill=colors$fill2nd, width=0.05) 210 | } 211 | #p = p + scale_y_continuous(labels = percent) 212 | p = p + geom_vline(data=meanLine, aes(xintercept=x), color=colors$linemean, size=1) + 213 | geom_text(data=meanLine, aes(x=x, y=maxPct*1.1, hjust=-0.05), 214 | label="full sample mean", 215 | vjust = 0, colour=colors$linemean, angle=0, size=4.5) + 216 | theme( text = element_text(size=15) ) + theme_bw() 217 | 218 | return(p) 219 | } -------------------------------------------------------------------------------- /R/visdom.R: -------------------------------------------------------------------------------- 1 | # Copyright 2016 The Board of Trustees of the Leland Stanford Junior University. 2 | # Direct inquiries to Sam Borgeson (sborgeson@stanford.edu) 3 | # or professor Ram Rajagopal (ramr@stanford.edu) 4 | 5 | #' Smart Meter data analysis tools for R 6 | #' 7 | #' @section Core functions: 8 | #' 9 | #' VISDOM relies most heavily on the following core functions. See the example vignettes for examples of their usage: 10 | #' 11 | #' \itemize{ 12 | #' \item \code{\link{DataSource}}: An S3 class that implements the set of standard data source functions for your data. 13 | #' 14 | #' 15 | #' \item \code{\link[visdom]{MeterDataClass}}: S3 class that holds meter data in both vector time series and daily matrix formats, associated weather data, and several supporting functions.
16 | #' 17 | #' 18 | #' \item \code{\link{TestData}}: Example data source S3 class that implements all required data source functions and generates random synthetic data for use in testing and examples. 19 | #' 20 | #' 21 | #' \item \code{\link[visdom]{WeatherClass}}: S3 class that holds weather data and related functions. 22 | #' 23 | #' 24 | #' \item \code{\link{basicFeatures}}: Function that implements a full suite of "basic" feature calculations, which include annual, seasonal, monthly, hour of day averages and variances, and other simple statistics, like simple correlation between outside temperature and consumption. 25 | #' 26 | #' 27 | #' \item \code{\link{dbCfg}}: S3 class that can parse a database config from a text file. This is used by the util-dbUtil.R file to connect to the configured database. 28 | #' 29 | #' 30 | #' \item \code{\link{run.query}}: Main function used to run SQL queries to load and cache data. 31 | #' 32 | #' 33 | #' \item \code{\link{iterator.callAllFromCtx}}: Function that iterates through all feature algorithms listed in the configuration ctx environment and returns a concatenated named list of all resulting features for a given MeterDataClass instance. 34 | #' 35 | #' 36 | #' \item \code{\link{iterator.iterateMeters}}: Function that iterates through all passed meter ids to instantiate MeterDataClass for each and call one or more feature extraction functions on each. This requires a properly configured data source, database connection (if applicable) and is further configured using fields in the ctx context object. 37 | #' 38 | #' 39 | #' \item \code{\link{iterator.iterateZip}}: Function that iterates through all passed zip codes, looks up local weather data (once), looks up the list of meter ids for each, and calls iterator.iterateMeters with those meter ids and the pre-loaded weather data. This runs faster than calling ids individually, which can load similar weather data over and over. 40 | #' 41 | #' 42 | #' \item \code{\link{iterator.runMeter}}: Utility function that runs the configured set of feature functions on a single passed meter. This is useful for testing and feature development, but also as the function called by parallelizable methods like apply, and the *ply functions of plyr. 43 | #' 44 | #' 45 | #' \item \code{\link{iterator.todf}}: The iterator.iterate* functions return lists of feature lists, indexed by meter id. This function converts all the scalar features in this data structure into a single data.frame, one row per meter id. 46 | #' 47 | #' } 48 | #' 49 | #' @docType package 50 | #' @name visdom 51 | NULL 52 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # VISDOM (Visualization and Insight System for Demand Operations and Management) 2 | This module supports bulk time series analysis and related modeling tools for utility interval meter data. 3 | 4 | To get started, ensure that you can access the repository here: https://github.com/convergenceda/visdom 5 | 6 | See the code in the vignettes directory, especially the R markdown file [example_feature_extraction.rmd](./vignettes/example_feature_extraction.rmd). Note: .rmd files mix commentary and code, and this one will give you a sense of what is required to run features on a given sample of meter data. Cleaning them up is still on our todo list, but the raw source will provide a sense of usage for other tasks.
7 | 8 | ## tl;dr simplest useful run 9 | ```r 10 | devtools::install_github('convergenceda/visdom') # you must have already run install.packages('devtools') 11 | # 100 fake customers with random data - you need to implement your own data source 12 | # make sure it passes sanityCheckDataSource(YourDataSource())! 13 | DATA_SOURCE = visdom::TestData(n=100) 14 | run_results = visdom::iterator.iterateMeters( DATA_SOURCE$getIds()[1:10], # just 10 for speed 15 | visdom::basicFeatures, as_df=T ) 16 | head(run_results) 17 | ``` 18 | 19 | Your more detailed steps should be: 20 | 21 | 1. To install VISDOM as a package directly from the github repository, with all of its module dependencies automatically installed, follow the instructions in [install_visdom.rmd](./vignettes/install_visdom.rmd) in the vignettes folder. In a nutshell, you want to run these commands: 22 | ```r 23 | # install devtools if you don't have it. 24 | install.packages(c("devtools")) 25 | 26 | # install bug fix/enhanced version of acs to support loading census data direct from the US Census API 27 | devtools::install_github('josiahjohnston/acs') 28 | 29 | devtools::install_github("convergenceda/visdom", build_vignettes=T ) 30 | 31 | # check if it works! 32 | library(visdom) 33 | ?visdom 34 | 35 | # find a vignette you are interested in 36 | vignette(package='visdom') 37 | # OR for a menu of options: 38 | browseVignettes('visdom') 39 | vignette('authoring_data_source', package='visdom') 40 | ``` 41 | 2. If you will be contributing code or documentation, follow Hadley Wickham's excellent "Getting Started" introduction to get devtools and documentation generation support: http://r-pkgs.had.co.nz/intro.html#intro-get, or you can follow the same steps and see related notes and caveats in [bootstrap_devel_environment.rmd](./vignettes/bootstrap_devel_environment.rmd) in vignettes. 42 | * Familiarize yourself with the rest of the R package background reading available here: http://r-pkgs.had.co.nz/ 43 | * Get added as a collaborator and clone the project from GitHub `git clone git@github.com:convergenceda/visdom.git` so you can work on it and contribute changes locally. 44 | * You can generate documentation and/or install from your local source as a module with the commands `devtools::document()` and `devtools::install()` 45 | 3. To use VISDOM with your own data, you will need to author a DataSource object that maps between the formatting of your data and the data structures used by VISDOM by implementing all the relevant functions stubbed out by DataSource in [util-dataSource.R](./R/util-dataSource.R). This is the key step to using VISDOM and the implementation and usage of a typical data source is detailed in [authoring_data_source.rmd](./vignettes/authoring_data_source.rmd). You will typically need to set up data access (i.e. to a SQL database if applicable - see [util-dbUtil.R](./R/util-dbUtil.R) - or figure out how you will be loading your data from disk or elsewhere), and write the code to perform the queries or other data access steps as appropriate to load, format, and return your data in the VISDOM standard format expected to come out of a data source. You can see the DataSource implemented for testing purposes in the file [testDataSource.R](./R/testDataSource.R) in the R directory of the package. 46 | 5. Call `DATA_SOURCE = YourDataSource()` to set up your data source for use by VISDOM (i.e. assign it to the global variable DATA_SOURCE) 47 | 6. 
Beyond this point, we assume that you have a working knowledge of the capabilities of the VISDOM package and a decent idea of how you would want to use it. For inspiration, look at the vignettes related to exploring meter data ([customer_data_objects.rmd](./vignettes/customer_data_objects.rmd) and [weather_data_objects.rmd](./vignettes/weather_data_objects.rmd)), doing feature extraction ([example_feature_extraction.rmd](./vignettes/example_feature_extraction.rmd) and [example_iterator_usage.rmd](./vignettes/example_iterator_usage.rmd)), load shape analysis ([example_load_shape_analysis.rmd](./vignettes/example_load_shape_analysis.rmd)), and customizing analysis for advanced users ([advanced_usage.rmd](./vignettes/advanced_usage.rmd)). 48 | 7. Follow the outline of the feature extraction script using your data source, correcting any errors and issues that come along. 49 | 8. Use the function iterator.todf() from [R/iterator.R](./R/iterator.R) to extract a data.frame of scalar meter data features from the list of lists returned by `iterator.iterateMeters()` and the various merge and export capabilities of [util-export.R](./R/util-export.R) to format your feature data for external consumption. 50 | 51 | -------------------------------------------------------------------------------- /VISDOM_feature_categories.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ConvergenceDA/visdom/ee27cb53b55a1e3a0aec92991253035bcec10b4b/VISDOM_feature_categories.pdf -------------------------------------------------------------------------------- /VISDOM_getting_started.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ConvergenceDA/visdom/ee27cb53b55a1e3a0aec92991253035bcec10b4b/VISDOM_getting_started.pdf -------------------------------------------------------------------------------- /apply_copyright.sh: -------------------------------------------------------------------------------- 1 | for i in ./R/*.R # or whatever other pattern... 2 | do 3 | echo $i 4 | if ! /usr/bin/grep -q Copyright $i 5 | then 6 | /usr/bin/cat copyright.txt $i >$i.new && /usr/bin/mv $i.new $i 7 | fi 8 | done -------------------------------------------------------------------------------- /copyright.txt: -------------------------------------------------------------------------------- 1 | # Copyright 2016 The Board of Trustees of the Leland Stanford Junior University.
2 | # Direct inquiries to Sam Borgeson (sborgeson@stanford.edu) 3 | # or professor Ram Rajagopal (ramr@stanford.edu) 4 | 5 | -------------------------------------------------------------------------------- /data-raw/census/ACS_2007_2011_SF_Tech_Doc.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ConvergenceDA/visdom/ee27cb53b55a1e3a0aec92991253035bcec10b4b/data-raw/census/ACS_2007_2011_SF_Tech_Doc.pdf -------------------------------------------------------------------------------- /data-raw/census/census_vars.txt: -------------------------------------------------------------------------------- 1 | census_var label 2 | DP02_0015 avg_hh_size 3 | DP02_0013 hh_under_18 4 | DP02_0013P hh_under_18_pct 5 | DP02_0014 hh_over_65 6 | DP02_0014P hh_over_65_pct 7 | DP02_0066P pop_past_highschool 8 | DP02_0067P pop_past_bachelors 9 | DP02_0079 res_same_1yr 10 | DP02_0079P res_same_1yr_pct 11 | DP02_0080 res_diff_1yr 12 | DP02_0080P res_diff_1yr_pct 13 | DP03_0062 median_hh_income 14 | DP03_0063 mean_hh_income 15 | DP03_0086 median_fam_income 16 | DP03_0087 mean_fam_income 17 | DP03_0088 per_cap_income 18 | DP03_0119P pct_below_poverty 19 | DP04_0002 occupied_units 20 | DP04_0002P occupied_units_pct 21 | DP04_0045 owner_occupied 22 | DP04_0045P owner_occupied_pct 23 | DP04_0046 renter_occupied 24 | DP04_0046P renter_occupied_pct 25 | DP04_0047 owner_hh_size 26 | DP04_0048 renter_hh_size 27 | DP04_0088 median_home_value 28 | DP04_0036 median_rooms 29 | DP05_0017 median_pop_age 30 | DP05_0018 pop_above_18 31 | DP05_0018P pop_above_18_pct 32 | DP05_0023 pop_above_18_M 33 | DP05_0023P pop_above_18_M_pct 34 | DP05_0024 pop_above_18_F 35 | DP05_0024P pop_above_18_F_pct 36 | DP05_0021 pop_above_65 37 | DP05_0021P pop_above_65_pct 38 | DP05_0026 pop_above_65_M 39 | DP05_0026P pop_above_65_M_pct 40 | DP05_0027 pop_above_65_F 41 | DP05_0027P pop_above_65_F_pct 42 | -------------------------------------------------------------------------------- /data-raw/migrateData.R: -------------------------------------------------------------------------------- 1 | # Zipcode to location mapping 2 | ERLE_ZIP_LOCATION = read.csv(file.path("Erle_zipcodes.csv"), header=TRUE) 3 | devtools::use_data(ERLE_ZIP_LOCATION, overwrite = TRUE) 4 | 5 | CA_ZIP_CLIMATE = read.csv(file.path("BuildingClimateZonesByZIPCode.csv"), header=TRUE) 6 | devtools::use_data(CA_ZIP_CLIMATE, overwrite = TRUE) 7 | 8 | CENSUS_VARS_OF_INTEREST = read.table(file.path("census", "census_vars.txt"), header=TRUE, stringsAsFactors=F) 9 | devtools::use_data(CENSUS_VARS_OF_INTEREST, overwrite = TRUE) 10 | 11 | #read in the "gazeteer" file that has population, latitude, and longitude for the zcta's 12 | CENSUS_GAZ <- read.table(file.path("census/ACS_11","Gaz_zcta_national.txt"), header=TRUE, colClasses=c("character", rep("numeric", 8))) 13 | # fun fact: the geoid in the gazeteer file IS the ZCTA 14 | colnames(CENSUS_GAZ)[1] = c('ZCTA') 15 | devtools::use_data(CENSUS_GAZ, overwrite = TRUE) 16 | -------------------------------------------------------------------------------- /data/CA_ZIP_CLIMATE.rda: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ConvergenceDA/visdom/ee27cb53b55a1e3a0aec92991253035bcec10b4b/data/CA_ZIP_CLIMATE.rda -------------------------------------------------------------------------------- /data/CENSUS_GAZ.rda: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/ConvergenceDA/visdom/ee27cb53b55a1e3a0aec92991253035bcec10b4b/data/CENSUS_GAZ.rda -------------------------------------------------------------------------------- /data/CENSUS_VARS_OF_INTEREST.rda: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ConvergenceDA/visdom/ee27cb53b55a1e3a0aec92991253035bcec10b4b/data/CENSUS_VARS_OF_INTEREST.rda -------------------------------------------------------------------------------- /data/ERLE_ZIP_LOCATION.rda: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ConvergenceDA/visdom/ee27cb53b55a1e3a0aec92991253035bcec10b4b/data/ERLE_ZIP_LOCATION.rda -------------------------------------------------------------------------------- /data/ZIP_TO_ZCTA.rda: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ConvergenceDA/visdom/ee27cb53b55a1e3a0aec92991253035bcec10b4b/data/ZIP_TO_ZCTA.rda -------------------------------------------------------------------------------- /inst/doc/advanced_usage.html: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | Advanced Usage 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25 | 26 | 27 | 28 | 29 | 30 | 31 | 32 | 33 |

Advanced Usage

34 |

Sam Borgeson

35 |

2018-01-04

36 | 37 | 38 | 39 |

TBD. This will contain examples of advanced usage options once the basic documentation is completed and stable.

40 | 41 | 42 | 43 | 44 | 52 | 53 | 54 | 55 | -------------------------------------------------------------------------------- /inst/doc/bootstrap_devel_environment.html: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | Developing the VISDOM Module 18 | 19 | 20 | 21 | 22 | 59 | 60 | 61 | 62 | 63 | 64 | 65 | 66 | 67 | 68 | 69 | 70 | 71 |

Developing the VISDOM Module

72 |

Sam Borgeson

73 |

2018-01-04

74 | 75 | 76 | 77 |

First ensure that you have cloned the repository to a working directory on your local machine, choosing one of:

78 |
cd /dev
 79 | git clone git@github.com:convergenceda/visdom.git
 80 | git clone https://github.com/convergenceda/visdom.git
81 |

Then using that location as your working directory (here we assume /dev/visdom), load requirements for package development and install from source.

82 |
83 |

Load requirements

84 |
setwd('/dev/visdom')
 85 | 
 86 | install.packages(c("devtools", "roxygen2", "testthat", "knitr"))
87 |

version check

88 |
rstudioapi::isAvailable("0.99.149")
89 |

For some reason, withr (a devtools dependency) will not install for RRO 3.2.2, with a message that there is no 3.2.2 version of withr. This is probably an ephemeral issue, but install it from source for now:

90 |
devtools::install_github("jimhester/withr")
91 |

Install Wickham’s latest devtools from source. Note that devtools can’t install itself on Windows (!!); https://github.com/hadley/devtools/issues/503 explains.

92 |
devtools::install_github("hadley/devtools")
93 |
94 |
95 |

Install VISDOM as a local package from your source

96 |

Ensure that your getwd() is the visdom source dir and then call:

97 |
devtools::document()
 98 | devtools::build_vignettes()
 99 | ?visdom
100 | ?MeterDataClass
101 | ?WeatherClass
102 |

Followed by installing the package from source, which should resolve and install all the package dependencies, with some exceptions, like for HDF5 support.

103 |
devtools::install(build_vignettes = T)
104 |

Test your install with a fresh R session:

105 |
library(visdom)
106 | ?visdom
107 | browseVignettes(package='visdom')
108 |
109 | 110 | 111 | 112 | 113 | 121 | 122 | 123 | 124 | -------------------------------------------------------------------------------- /inst/doc/customer_data_objects.html: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | Working With Meter Data 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25 | 26 | 27 | 28 | 29 | 30 | 31 | 32 | 33 |

Working With Meter Data

34 |

Sam Borgeson

35 |

2018-01-04

36 | 37 | 38 | 39 |

Documentation on loading MeterDataClass objects and exploring their data, functions and built-in features.
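For example, a minimal sketch using the bundled synthetic TestData source, adapted from the ?MeterDataClass help example (your own data source would take the place of TestData):

library(visdom)
DATA_SOURCE = TestData(10)
cust = MeterDataClass(id=DATA_SOURCE$getIds()[1])
plot(cust)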

40 | 41 | 42 | 43 | 44 | 52 | 53 | 54 | 55 | -------------------------------------------------------------------------------- /inst/doc/example_iterator_usage.html: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | Working With Samples from Many Meters 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25 | 26 | 27 | 28 | 29 | 30 | 31 | 32 | 33 |

Working With Samples from Many Meters

34 |

Sam Borgeson

35 |

2018-01-04

36 | 37 | 38 | 39 |

Examples of iterator function usage TBD. Please see vignette on feature extraction for a working example.
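In the meantime, here is a minimal sketch adapted from the tl;dr example in the README, using the synthetic TestData source in place of a real data source:

library(visdom)
DATA_SOURCE = TestData(n=100)
run_results = iterator.iterateMeters( DATA_SOURCE$getIds()[1:10], basicFeatures, as_df=T )
head(run_results)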

40 | 41 | 42 | 43 | 44 | 52 | 53 | 54 | 55 | -------------------------------------------------------------------------------- /inst/doc/weather_data_objects.html: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | Working With Weather Data 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25 | 26 | 27 | 28 | 29 | 30 | 31 | 32 | 33 |

Working With Weather Data

34 |

Sam Borgeson

35 |

2018-01-04

36 | 37 | 38 | 39 |

Examples illustrating how to gather and work with weather data.
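For example, a minimal sketch adapted from the ?WeatherClass help example, using the synthetic TestData source and the WeatherClass constructor as documented in its usage section (the geocode '12601' is just the illustrative value from that example):

library(visdom)
DATA_SOURCE = TestData(10)
weather = WeatherClass(geocode='12601')
plot(weather)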

40 | 41 | 42 | 43 | 44 | 52 | 53 | 54 | 55 | -------------------------------------------------------------------------------- /inst/feature_set_run_conf/example_feature_set.conf: -------------------------------------------------------------------------------- 1 | feature_set=example_feature_set 2 | feature_set_description="Features described in the example_feature_extraction vignette." 3 | run_name=test_run 4 | run_description="Just testing things out." 5 | -------------------------------------------------------------------------------- /inst/sql/db_connect_example.conf: -------------------------------------------------------------------------------- 1 | dbType=MySQL 2 | user=USERNAME 3 | password=PASSWORD 4 | dbname=visdom 5 | -------------------------------------------------------------------------------- /inst/sql/feature_runs.create.mysql.sql: -------------------------------------------------------------------------------- 1 | -- The SQL standard does not support several fundamental features, so table 2 | -- definitions need to be dialect-specific. This is the MySQL version of a table 3 | -- definition for feature run metadata. 4 | CREATE TABLE feature_runs ( 5 | -- AUTO_INCREMENT isn't part of the SQL standard. 6 | -- http://stackoverflow.com/questions/5823912/considering-the-different-implimentations-of-sequence-auto-increment-how-can-i 7 | -- mysql needs AUTO_INCREMENT 8 | -- psql needs the SERIAL datatype 9 | -- SQL Server needs IDENTITY(1,1) isntead of AUTO_INCREMENT 10 | -- Access needs AUTOINCREMENT 11 | -- Oracle needs a separate create sequence statement and a trigger: http://stackoverflow.com/questions/11296361/how-to-create-id-with-auto-increment-on-oracle 12 | id INT NOT NULL PRIMARY KEY AUTO_INCREMENT, 13 | feature_set VARCHAR(20) NOT NULL UNIQUE, 14 | feature_set_description VARCHAR(250) NULL, 15 | run_name VARCHAR(40) NOT NULL UNIQUE, 16 | run_description VARCHAR(250) NULL, 17 | -- This only works with MySQL version 5.6.5 and above. See http://dev.mysql.com/doc/refman/5.6/en/timestamp-initialization.html 18 | create_time DATETIME NOT NULL, 19 | update_time DATETIME DEFAULT NULL 20 | COMMENT 'To be updated programmatically when features are written to the feature table.)', -- limit 255 chars 21 | CONSTRAINT feature_run UNIQUE (feature_set, run_name) 22 | ) 23 | COMMENT 'Tracking metadata for feature sets and runs.' -- limit 60 chars 24 | ; 25 | -------------------------------------------------------------------------------- /install/generic_create.sql: -------------------------------------------------------------------------------- 1 | /* 2 | SQLyog Community v12.2.4 (64 bit) 3 | MySQL - 10.1.13-MariaDB : Database - pgeres 4 | ********************************************************************* 5 | */ 6 | 7 | /* NOTE: this is obviously a create script for MYSQL/MariaDB, but any other SQL database can work just as well. 8 | The important thing is to ensure that R can connect to your preferred DB. 9 | See also the import script for more information on how the databse can be populated and how indices can be 10 | added to exsure adequate performance. 
11 | */ 12 | 13 | /*!40101 SET NAMES utf8 */; 14 | 15 | /*!40101 SET SQL_MODE=''*/; 16 | 17 | /*!40014 SET @OLD_UNIQUE_CHECKS=@@UNIQUE_CHECKS, UNIQUE_CHECKS=0 */; 18 | /*!40014 SET @OLD_FOREIGN_KEY_CHECKS=@@FOREIGN_KEY_CHECKS, FOREIGN_KEY_CHECKS=0 */; 19 | /*!40101 SET @OLD_SQL_MODE=@@SQL_MODE, SQL_MODE='NO_AUTO_VALUE_ON_ZERO' */; 20 | /*!40111 SET @OLD_SQL_NOTES=@@SQL_NOTES, SQL_NOTES=0 */; 21 | CREATE DATABASE /*!32312 IF NOT EXISTS*/`visdom_data` /*!40100 DEFAULT CHARACTER SET latin1 */; 22 | 23 | USE `visdom_data`; 24 | 25 | /*Table structure for table `account` */ 26 | /* all customer acocunt associated data should go here 27 | add or subtract columns to reflect what you know about customers 28 | the main idea is that account data can be added directly to customer features based on meter or account id 29 | so if you want a feature that reflects account information you already have, put it in this table. 30 | */ 31 | 32 | DROP TABLE IF EXISTS `account`; 33 | 34 | CREATE TABLE `account` ( 35 | `id` int(11) NOT NULL AUTO_INCREMENT, 36 | `ACCOUNT_UUID` varchar(11) DEFAULT NULL, 37 | `METER_UUID` varchar(11) DEFAULT NULL, 38 | `ACCOUNT_START_DATE` datetime NOT NULL, # start of association of account_id with meter_id 39 | `ACCOUNT_END_DATE` datetime NOT NULL, # end of association of account_id with meter_id 40 | `zip5` varchar(5) DEFAULT NULL, 41 | `RATE_PLAN` varchar(6) DEFAULT NULL, 42 | `CLIMATE_ZONE` varchar(4) DEFAULT NULL, 43 | `PV_INDICATOR` char(1) DEFAULT NULL, 44 | # you might also have demographics, census data, site physical characteristics, income, etc. here 45 | PRIMARY KEY (`id`), 46 | KEY `zip5_meter_uuid_idx` (`METER_UUID`,`zip5`), 47 | KEY `account_uuid_idx` (`ACCOUNT_UUID`), 48 | KEY `meter_uuid_idx` (`METER_UUID`) 49 | ) ENGINE=InnoDB AUTO_INCREMENT=327676 DEFAULT CHARSET=latin1; 50 | 51 | 52 | /*Table structure for table `intervention` */ 53 | /* intervention data not required to run visdom, but useful for before/after studies 54 | add or subtract intervention data columns to match available intervention data */ 55 | DROP TABLE IF EXISTS `intervention`; 56 | 57 | CREATE TABLE `intervention` ( 58 | `id` int(11) NOT NULL AUTO_INCREMENT, 59 | `account_uuid` varchar(11) DEFAULT NULL, 60 | `meter_uuid` varchar(11) DEFAULT NULL, 61 | `INSTALL_DATE` date DEFAULT NULL, 62 | `MEASURE_TYPE` varchar(10) DEFAULT NULL, 63 | `MEASURE_DESC` varchar(80) DEFAULT NULL, 64 | `TECHNOLOGY_TYPE` varchar(30) DEFAULT NULL, 65 | `EE_PROGRAM_NAME` varchar(10) DEFAULT NULL, 66 | `EST_ANNUAL_KW_SAVINGS` float DEFAULT NULL, 67 | `EST_ANNUAL_KWH_SAVINGS` float DEFAULT NULL, 68 | `EST_ANNUAL_THM_SAVINGS` float DEFAULT NULL, 69 | `INCENTIVE_PAYMENT` float DEFAULT NULL, 70 | `TOTAL_PROJECT_COSTS` float DEFAULT NULL, 71 | PRIMARY KEY (`id`), 72 | KEY `account_uuid_idx` (`account_uuid`), 73 | KEY `meter_uuid_idx` (`meter_uuid`), 74 | KEY `install_idx` (`INSTALL_DATE`) 75 | ) ENGINE=MyISAM AUTO_INCREMENT=13477 DEFAULT CHARSET=latin1; 76 | 77 | 78 | 79 | /*Table structure for table `local_weather` */ 80 | /* this table matches the format of weather data provided by the NOAA data scraping GitHub project local-weather 81 | which keys off of zip code: http://github.com/sborgeson/local-weather 82 | */ 83 | 84 | DROP TABLE IF EXISTS `local_weather`; 85 | 86 | CREATE TABLE `local_weather` ( 87 | `id` int(6) unsigned NOT NULL AUTO_INCREMENT, 88 | `zip5` int(11) NOT NULL, 89 | `date` datetime NOT NULL, 90 | `TemperatureF` float(5,2) DEFAULT NULL, 91 | `DewpointF` float(5,2) DEFAULT NULL, 92 | `Pressure` float(5,2) 
DEFAULT NULL, 93 | `WindSpeed` float(5,2) DEFAULT NULL, 94 | `Humidity` float(5,2) DEFAULT NULL, 95 | `Clouds` varchar(10) DEFAULT NULL, 96 | `HourlyPrecip` float(5,2) DEFAULT NULL, 97 | `SolarRadiation` float(6,2) DEFAULT NULL, 98 | PRIMARY KEY (`id`), 99 | UNIQUE KEY `zip_date_idx` (`zip5`,`date`), 100 | KEY `zip_idx` (`zip5`), 101 | KEY `date_idx` (`date`) 102 | ) ENGINE=MyISAM AUTO_INCREMENT=1637609 DEFAULT CHARSET=latin1; 103 | 104 | /*Table structure for table `meter_data` */ 105 | /* hourly meter data uses 24 columns per row 15 minute interval data can use 96 cols in the same manner 106 | you can build a data source to pull data for each meter (i.e. all data from a given meter) or account 107 | i.e. for the period of time an individual account holder was behind a given meter. If you get the account_uuid 108 | correct over time, it is very simple for a data source to key by account rather than meter ids. 109 | */ 110 | 111 | DROP TABLE IF EXISTS `meter_data`; 112 | 113 | CREATE TABLE `meter_data` ( 114 | `meter_uuid` varchar(11) NOT NULL, 115 | `account_uuid` varchar(11) NOT NULL, 116 | `date` date NOT NULL, 117 | `zip5` varchar(5) DEFAULT NULL, 118 | `h1` int(11) DEFAULT NULL, # note that these are ints because they are cheaper to store than floats 119 | `h2` int(11) DEFAULT NULL, # and the assimption is that you can store Wh per interval period 120 | `h3` int(11) DEFAULT NULL, # converting to kWh/period on the way out of the db 121 | `h4` int(11) DEFAULT NULL, # thsi table can become HUGE, so optimizations like this are a good idea. 122 | `h5` int(11) DEFAULT NULL, 123 | `h6` int(11) DEFAULT NULL, 124 | `h7` int(11) DEFAULT NULL, 125 | `h8` int(11) DEFAULT NULL, 126 | `h9` int(11) DEFAULT NULL, 127 | `h10` int(11) DEFAULT NULL, 128 | `h11` int(11) DEFAULT NULL, 129 | `h12` int(11) DEFAULT NULL, 130 | `h13` int(11) DEFAULT NULL, 131 | `h14` int(11) DEFAULT NULL, 132 | `h15` int(11) DEFAULT NULL, 133 | `h16` int(11) DEFAULT NULL, 134 | `h17` int(11) DEFAULT NULL, 135 | `h18` int(11) DEFAULT NULL, 136 | `h19` int(11) DEFAULT NULL, 137 | `h20` int(11) DEFAULT NULL, 138 | `h21` int(11) DEFAULT NULL, 139 | `h22` int(11) DEFAULT NULL, 140 | `h23` int(11) DEFAULT NULL, 141 | `h24` int(11) DEFAULT NULL, 142 | PRIMARY KEY (`meter_uuid`,`date`), 143 | KEY `meter_uuid_idx` (`meter_uuid`), 144 | KEY `account_uuid_idx` (`account_uuid`), 145 | KEY `zip_Date_idx` (`date`,`zip5`), 146 | KEY `zip_idx` (`zip5`) 147 | ) ENGINE=InnoDB DEFAULT CHARSET=latin1; 148 | 149 | 150 | -------------------------------------------------------------------------------- /install/generic_insert.sql: -------------------------------------------------------------------------------- 1 | 2 | # under some configurations, this file has to be in the main MySQL data directory 3 | # ou can try to absolutely path to it, but if it doesn't work, see the MySQL docs on LOAD DATA 4 | LOAD DATA LOCAL INFILE 'intervention_data.csv' 5 | IGNORE # ignore duplicates of the primary key 6 | INTO TABLE intervention 7 | FIELDS TERMINATED BY ',' 8 | lines terminated by '\r\n' 9 | IGNORE 1 LINES 10 | ( ACCOUNT_UUID, 11 | METER_UUID, 12 | @INSTALL_DATE_v, # load date as a string variable so it can be parsed with a date format into a date 13 | MEASURE_TYPE, 14 | MEASURE_DESC, 15 | TECHNOLOGY_TYPE, 16 | EE_PROGRAM_NAME, 17 | @EST_ANNUAL_KW_SAVINGS_v, # load numbers as strings to handle nulls. 
18 | @EST_ANNUAL_KWH_SAVINGS_v, 19 | @EST_ANNUAL_THM_SAVINGS_v, 20 | @INCENTIVE_PAYMENT_v, 21 | @TOTAL_PROJECT_COSTS_v, 22 | ) 23 | SET INSTALL_DATE = STR_TO_DATE( SUBSTRING_INDEX(@INSTALL_DATE_v, ' ', 1), '%m/%e/%Y'), # convert dates 24 | EST_ANNUAL_KW_SAVINGS = nullif(@EST_ANNUAL_KW_SAVINGS_v,''), # handle blanks as nulls 25 | EST_ANNUAL_KWH_SAVINGS = nullif(@EST_ANNUAL_KWH_SAVINGS_v,''), 26 | EST_ANNUAL_THM_SAVINGS = nullif(@EST_ANNUAL_THM_SAVINGS_v,''), 27 | INCENTIVE_PAYMENT = nullif(@INCENTIVE_PAYMENT_v,''), 28 | TOTAL_PROJECT_COSTS = nullif(@TOTAL_PROJECT_COSTS_v,''), 29 | ID = NULL # auto-increment 30 | ; 31 | 32 | 33 | LOAD DATA LOCAL INFILE 'account_data.csv' # this file has to be in the main data directory 34 | IGNORE # ignore duplicates of the primary key 35 | INTO TABLE account 36 | FIELDS TERMINATED BY ',' 37 | LINES TERMINATED BY '\r\n' 38 | IGNORE 1 LINES 39 | ( account_uuid, 40 | meter_uuid, 41 | @account_start_date_v, 42 | @account_end_date_v, 43 | zip5, 44 | rate_plan, 45 | climate_zone, 46 | @pv_indicator_v 47 | ) 48 | SET pv_indicator = NULLIF(@pv_indicator_v,''), 49 | account_start_date = STR_TO_DATE( SUBSTRING_INDEX(@account_start_date_v, ' ', 1), '%m/%e/%Y'), # convert dates 50 | account_end_date = STR_TO_DATE( SUBSTRING_INDEX(@account_end_date_v, ' ', 1), '%m/%e/%Y'), # convert dates 51 | ID = NULL 52 | ; 53 | 54 | 55 | /* assuming fields are meter_id, date, and 24 x interval readings per row */ 56 | load data local infile 'interval_data.txt' 57 | into TABLE meter_data 58 | FIELDS TERMINATED BY ',' 59 | IGNORE 1 LINES # ignore headers 60 | ( meter_uuid, 61 | # note here we assume the account id was not included with the meter data, but can be added below 62 | @usage_date, # one row per meter day 63 | @h1, @h2, @h3, @h4, @h5, @h6, 64 | @h7, @h8, @h9, @h10, @h11, @h12, 65 | @h13, @h14, @h15, @h16, @h17, @h18, 66 | @h19, @h20, @h21, @h22, @h23, @h24 67 | ) 68 | set date = STR_TO_DATE( SUBSTRING_INDEX(@usage_date, ' ', 1), '%m/%e/%Y'), 69 | h1 = round(@h1 * 1000), # add as ints in units of Wh/period to save disk space 70 | h2 = round(@h2 * 1000), # assuming raw data is kW/hour, we simply multiply by 1000 71 | h3 = round(@h3 * 1000), # and round into ints. 
72 | h4 = round(@h4 * 1000), 73 | h5 = round(@h5 * 1000), 74 | h6 = round(@h6 * 1000), 75 | h7 = round(@h7 * 1000), 76 | h8 = round(@h8 * 1000), 77 | h9 = round(@h9 * 1000), 78 | h10 = round(@h10 * 1000), 79 | h11 = round(@h11 * 1000), 80 | h12 = round(@h12 * 1000), 81 | h13 = round(@h13 * 1000), 82 | h14 = round(@h14 * 1000), 83 | h15 = round(@h15 * 1000), 84 | h16 = round(@h16 * 1000), 85 | h17 = round(@h17 * 1000), 86 | h18 = round(@h18 * 1000), 87 | h19 = round(@h19 * 1000), 88 | h20 = round(@h20 * 1000), 89 | h21 = round(@h21 * 1000), 90 | h22 = round(@h22 * 1000), 91 | h23 = round(@h23 * 1000), 92 | h24 = round(@h24 * 1000), 93 | zip5 = NULL 94 | ; 95 | 96 | 97 | # pull zip code data from the account table 98 | UPDATE meter_data 99 | LEFT JOIN account ON meter_data.meter_uuid = account.meter_uuid 100 | SET meter_data.zip5 = account.zip5; 101 | 102 | # assuming meter_data doesnt have account ids, but the account table does, you can add them like this: 103 | UPDATE meter_data AS md 104 | LEFT JOIN account AS a ON md.meter_uuid = a.meter_uuid AND 105 | md.date >= a.ACCOUNT_START_DATE AND 106 | md.date <= a.ACCOUNT_END_DATE 107 | SET md.account_uuid = a.account_uuid; 108 | 109 | # get ready for weather data 110 | # use the output of this as the input into weatherDump.py from http://github.com/sborgeson/local-weather 111 | SELECT zip5, MIN(DATE), MAX(DATE) FROM meter_data GROUP BY zip5; 112 | # save results as zip_dates.csv 113 | # run with 3 stations averaged per zip and a preferred radius of less than 20 km 114 | # python weatherDump.py -i path/to/zip_dates.csv -o path/to/out/dir/weather_data.csv -n 3 -d 20 115 | 116 | 117 | /* load weather data in the format produced by local_weather's dumpWeather.py utility */ 118 | LOAD DATA LOCAL INFILE 'weather_data.csv' 119 | INTO TABLE local_weather 120 | FIELDS TERMINATED BY ',' 121 | LINES TERMINATED BY '\n' 122 | IGNORE 1 LINES 123 | (@dateStr, TemperatureF, DewpointF, Pressure, WindSpeed, Humidity, HourlyPrecip, zip5) 124 | SET `date` = STR_TO_DATE(@dateStr, '%Y-%m-%d %H:%i:%s'); 125 | 126 | -------------------------------------------------------------------------------- /install/generic_visdom_data_source.R: -------------------------------------------------------------------------------- 1 | require(RMySQL) # assuming data is in a MySQL database 2 | library(visdom) # includes generic database support from util-dbUtil.R for things like connection management 3 | 4 | MyDataSource = function(dbConfig='data_db.cfg', queryCache=paste(getwd(),'/','DATA_CACHE','/',sep='')){ 5 | obj = DataSource() 6 | obj$CACHE_DIR = queryCache 7 | obj$DB_CFG = dbCfg(dbConfig) 8 | obj$db = 'visdom_data' 9 | 10 | print(queryCache) 11 | #print(obj$DB_CFG) 12 | 13 | obj$accountTable = 'account' 14 | obj$meterTable = 'meter_data' 15 | obj$weatherTable = 'local_weather' 16 | obj$interventionTable = 'intervention' 17 | 18 | obj$geoColumnName = 'zip5' 19 | 20 | # utility function that returns a list of all the zipcodes in the data set 21 | obj$getGeocodes = function(useCache=F,forceRefresh=F) { 22 | query <- paste("select distinct zip5 from",obj$accountTable,'order by zip5') 23 | cacheFile = NULL 24 | if(useCache) { cacheFile='zipList.RData' } 25 | return(run.query(query,obj$DB_CFG,cacheDir=obj$CACHE_DIR,cacheFile=cacheFile,forceRefresh=forceRefresh)[[1]]) 26 | } 27 | 28 | obj$getIds = function(geocode=NULL,useCache=F,forceRefresh=F) { 29 | cacheFile = NULL 30 | if(is.null(geocode)) { 31 | if(useCache) { 32 | cacheFile = paste('meterids_all.RData',sep='') # only cache sp 
list for individual geos 33 | } 34 | # note that this query uses meter ids from the account table. You can also use the ids from the actual meter_table 35 | # if the account table is unreliable, but these queries will take longer. 36 | query <- paste("select distinct meter_uuid as id from",obj$accountTable,'order by id') 37 | } else { 38 | if(useCache) { 39 | cacheFile = paste('meterids_',geocode,'.RData',sep='') # only cache sp list for individual geos 40 | } 41 | query <- paste("select distinct meter_uuid as id from ",obj$account," where zip5='",geocode,"' order by id",sep='') 42 | } 43 | return(run.query(query,obj$DB_CFG,cacheDir=obj$CACHE_DIR,cacheFile=cacheFile,forceRefresh=forceRefresh,debug=T)[[1]]) 44 | } 45 | 46 | 47 | obj$getAllData = function(geocode,useCache=F,forceRefresh=F) { 48 | cacheFile = NULL 49 | if(useCache) { cacheFile=paste('meterData_',geocode,'.RData',sep='') } 50 | query = paste( 51 | # note that VISDOM expects the unique identifiers to be called 'id' and dates to be called 'dates' 52 | # in the data framse returned from here 53 | 'SELECT 54 | meter_uuid as id, zip5, date as dates, 55 | h1, h2, h3, h4, h5, h6, h7, h8, h9, h10,h11,h12, 56 | h13,h14,h15,h16,h17,h18,h19,h20,h21,h22,h23,h24 57 | FROM ',obj$meterTable, 58 | " where zip5 ='",geocode,"' ", 59 | ' ORDER BY id, dates',sep='') 60 | data = run.query(query,obj$DB_CFG,cacheDir=obj$CACHE_DIR,cacheFile=cacheFile,forceRefresh=forceRefresh, debug=F) 61 | # convert last 24 columns to kWh from Wh 62 | data[,c(-23:0) + ncol(data)] = data[,c(-23:0) + ncol(data)] / 1000 63 | return(data) 64 | } 65 | 66 | 67 | obj$getAccountData = function(useCache=F, forceRefresh=F) { 68 | cacheFile = NULL 69 | if(useCache) { cacheFile=paste('accountData.RData',sep='') } 70 | # note that with a select * statement, this generic function returns whatever you are able to cram into the acocunt table. 71 | # this allows for generic support for merging in account features with the other features by account id. 72 | # however, note that the column names can't be controlled by * queries, so all, including the ids will have their db names. 
73 | query = 'SELECT * from account order by meter_uuid, acocunt_uuid' 74 | data = run.query(query,obj$DB_CFG,cacheDir=obj$CACHE_DIR,cacheFile=cacheFile,forceRefresh=forceRefresh, debug=F) 75 | return(data) 76 | } 77 | 78 | obj$getInterventionData = function(useCache=F,forceRefresh=F) { 79 | cacheFile = NULL 80 | if(useCache) { cacheFile=paste('interventionData.RData',sep='') } 81 | query = paste( 82 | 'SELECT * FROM ', obj$interventionTable, 83 | ' ORDER BY meter_uuid, account_uuid, install_date',sep='') 84 | data = run.query(query,obj$DB_CFG,cacheDir=obj$CACHE_DIR,cacheFile=cacheFile,forceRefresh=forceRefresh, debug=F) 85 | return(data) 86 | } 87 | 88 | obj$getMeterData = function(id,geocode=NULL) { 89 | # note that VISDOM expects the unique identifiers to be called 'id' and dates to be called 'dates' 90 | # in the data framse returned from here 91 | query = paste( 92 | 'SELECT 93 | meter_uuid as id, zip5, date as dates, 94 | h1, h2, h3, h4, h5, h6, h7, h8, h9, h10,h11,h12, 95 | h13,h14,h15,h16,h17,h18,h19,h20,h21,h22,h23,h24 96 | FROM ',obj$meterTable, 97 | " where meter_uuid ='",id,"'", 98 | ' ORDER BY dates', sep='') 99 | data = run.query(query, obj$DB_CFG, debug=T) 100 | # convert last 24 columns to kW from W 101 | data[,c(-23:0) + ncol(data)] = data[,c(-23:0) + ncol(data)] / 1000 102 | if( 'date' %in% names(data)) { 103 | names(data)[which(names(data) %in% 'date')] = 'dates' 104 | } 105 | return(data) 106 | } 107 | 108 | obj$getWeatherData = function(geocode,useCache=F,forceRefresh=F) { 109 | cacheFile = NULL 110 | if(useCache) { cacheFile=paste('weather_',geocode,'.RData',sep='') } 111 | query = paste( 112 | 'SELECT `date`, TemperatureF, Pressure, DewpointF, HourlyPrecip, WindSpeed 113 | FROM ',obj$weatherTable," where zip5 ='",geocode,"' ORDER BY DATE",sep='') 114 | data = run.query(query,DATA_SOURCE$DB_CFG,cacheDir=obj$CACHE_DIR,cacheFile=cacheFile,forceRefresh=forceRefresh) 115 | names(data) = tolower(names(data)) 116 | return(data) 117 | } 118 | 119 | obj$getGeoForId = function(id,useCache=F, forceRefresh=F) { 120 | query <- paste("select meter_uuid as id, zip5 from",obj$accountTable,' group by meter_uuid order by meter_uuid') 121 | cacheFile = NULL 122 | if(useCache) { cacheFile='idZipList.RData' } 123 | idZipData = run.query(query,obj$DB_CFG,cacheDir=obj$CACHE_DIR,cacheFile=cacheFile,forceRefresh=forceRefresh) 124 | return( idZipData[idZipData$id == id, 'zip5'] ) 125 | } 126 | 127 | class(obj) = append(class(obj),"MyDataSource") 128 | 129 | return(obj) 130 | } -------------------------------------------------------------------------------- /install/ubuntu_14.04_bash_requirements.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | # Download and install Microsoft R Open (MRO, formerly RRO): 4 | # See https://mran.microsoft.com/documents/rro/installation/#revorinst-lin 5 | 6 | # remove default version of R (if applicable) 7 | sudo apt-get remove r-base-core 8 | 9 | # get MRO 10 | cd /usr/local/src 11 | sudo wget https://mran.microsoft.com/install/mro/3.3.1/microsoft-r-open-3.3.1.tar.gz 12 | 13 | # extract and install 14 | sudo tar zxvf microsoft-r-open-3.3.1.tar.gz 15 | cd microsoft-r-open/ 16 | sudo ./install.sh 17 | #Choose to install MKL libraries and accept licenses 18 | #For unattended install, ./install.sh -a -s 19 | 20 | # back to /usr/local/src 21 | cd /usr/local/src 22 | 23 | #Install rstudio server for running browser-based R sessions: 24 | sudo apt-get install gdebi-core 25 | sudo wget 
https://download2.rstudio.org/rstudio-server-0.99.903-amd64.deb 26 | sudo gdebi rstudio-server-0.99.903-amd64.deb 27 | 28 | # knitr requires pandoc from more recent than the Ubuntu 14.04 packages 29 | # if necessary sudo apt-get remove pandoc pandoc-citeproc 30 | sudo wget https://github.com/jgm/pandoc/releases/download/1.17.2/pandoc-1.17.2-1-amd64.deb 31 | sudo gdebi pandoc-1.17.2-1-amd64.deb 32 | 33 | # install lib dependencies for R modules that will be used 34 | # by devtools, vignettes, or visodm 35 | sudo apt-get install zlib1g-dev libssl-dev libcurl4-openssl-dev 36 | sudo apt-get install gfortran 37 | sudo apt-get install libmariadbclient-dev 38 | 39 | 40 | # install devtools if you don't have it. 41 | # monitor this for errors related to missing dependencies 42 | sudo R -e 'install.packages(c("devtools"))' 43 | 44 | # install visdom from GitHub. 45 | # the unzip='internal' option is a fix for Ubuntu's devtools support 46 | # see discussion at https://github.com/RevolutionAnalytics/RRO/issues/37 47 | sudo R -e 'options(unzip = "internal"); devtools::install_github("convergenceda/visdom", build_vignettes=F )' 48 | 49 | # check if it works! 50 | R -e 'library(visdom);?visdom' 51 | 52 | # now get the vignettes up and running 53 | sudo R -e 'install.packages(c("knitr","rmarkdown","DBI"))' 54 | 55 | sudo R -e 'options(unzip = "internal"); devtools::install_github("convergenceda/visdom", build_vignettes=T, force=T)' 56 | 57 | -------------------------------------------------------------------------------- /man/CA_ZIP_CLIMATE.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/dataSetDocs.R 3 | \docType{data} 4 | \name{CA_ZIP_CLIMATE} 5 | \alias{CA_ZIP_CLIMATE} 6 | \title{A zip code to CA climate zone mapping} 7 | \format{A data frame with 1706 rows and 2 variables: 8 | \describe{ 9 | \item{ZIP.Code}{zip code, as a number} 10 | \item{Building.Climate.Zone}{the CA CEC climate zone the zip code is in} 11 | }} 12 | \source{ 13 | \url{http://CEC.web.site.somewhere/} 14 | } 15 | \usage{ 16 | data(CA_ZIP_CLIMATE) 17 | } 18 | \description{ 19 | A dataset containing columns with US zip codes and the corresponding CA CEC climate zones 20 | } 21 | \keyword{datasets} 22 | 23 | -------------------------------------------------------------------------------- /man/CENSUS_GAZ.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/dataSetDocs.R 3 | \docType{data} 4 | \name{CENSUS_GAZ} 5 | \alias{CENSUS_GAZ} 6 | \title{A summary of census statistics for each ZCTA in the census} 7 | \format{A data frame with 33120 rows and 9 variables: 8 | \describe{ 9 | \item{ZCTA}{zip code tabulation area as a 0 padded string} 10 | \item{POP10}{Population in 2010} 11 | \item{ALAND}{Land area in sqft?} 12 | \item{AWATER}{Water area in sqft?} 13 | \item{ALAND_SQMI}{Land area in sqare miles} 14 | \item{AWATER_SQMI}{Water area in square miles} 15 | \item{INTPTLAT}{Latitude of the area} 16 | \item{INTPTLONG}{Longitude of the area} 17 | }} 18 | \source{ 19 | \url{http://us.census.link/} 20 | } 21 | \usage{ 22 | data(CENSUS_GAZ) 23 | } 24 | \description{ 25 | A dataset containing summary census statistics for each ZCTA in the census 26 | } 27 | \keyword{datasets} 28 | 29 | -------------------------------------------------------------------------------- /man/DataSource.Rd: 
-------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/util-dataSource.R 3 | \name{DataSource} 4 | \alias{DataSource} 5 | \title{Main S3 class that defines the data source interface} 6 | \usage{ 7 | DataSource() 8 | } 9 | \description{ 10 | The functions defined in the DataSource class must be implemented in the custom data source for each 11 | new data set, taking steps to convert data and normalize data access. It is good practice to ensure that 12 | custom data sources "extend" DataSource by using the \code{DataSource()} constructor as the line that creates the 13 | custom DataSource S3 objects, and then overriding functions as required. 14 | } 15 | \details{ 16 | \code{DataSource} functions as a sort of interface definition and/or default implementation for 17 | all data source functions. It is good practice to have custom data sources extend this one because 18 | it provides default implementations for all functions and a handful of useful utility functions. 19 | } 20 | \seealso{ 21 | \code{\link{TestData}}, \code{\link{sanityCheckDataSource}} 22 | } 23 | 24 | -------------------------------------------------------------------------------- /man/ERLE_ZIP_LOCATION.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/dataSetDocs.R 3 | \docType{data} 4 | \name{ERLE_ZIP_LOCATION} 5 | \alias{ERLE_ZIP_LOCATION} 6 | \title{Zipcode to location mapping} 7 | \format{A data frame with 43191 rows and 7 variables: 8 | \describe{ 9 | \item{zip}{zip code, as a number} 10 | \item{city}{city the zip code is in} 11 | \item{state}{two letter abbreviation for the state the zip code is in} 12 | \item{latitude}{latitude of the zip code center (post office?)} 13 | \item{longitude}{longitude of the zip code center (post office?)} 14 | \item{timezone}{timezone, as an offset relative to GMT} 15 | \item{dst}{indicator for participation in daylight savings time} 16 | }} 17 | \source{ 18 | \url{http://can.not.remember/} 19 | } 20 | \usage{ 21 | data(ERLE_ZIP_LOCATION) 22 | } 23 | \description{ 24 | A dataset containing columns with US zip codes and the corresponding location information 25 | } 26 | \keyword{datasets} 27 | 28 | -------------------------------------------------------------------------------- /man/MeterDataClass.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/classes-customer.R 3 | \name{MeterDataClass} 4 | \alias{MeterDataClass} 5 | \title{Main S3 class that holds and normalizes VISDOM customer meter data} 6 | \usage{ 7 | MeterDataClass(id, geocode = NULL, weather = NULL, data = NULL, 8 | gasData = NULL, useCache = T, doSG = F, rawData = NULL) 9 | } 10 | \arguments{ 11 | \item{id}{Unique identifier used by data source to identify a customer meter. 12 | Even with numerical id's in a database, it is a good idea to make these 13 | character strings in R.} 14 | 15 | \item{geocode}{Primary geographic locator used by the data source. This is 16 | very often a zip code, but data sources can implement it as census block 17 | or any geographic category that covers all meters, with each meter 18 | associated with only one geocode.
If it is not provided, the data source 19 | is used to query for the meter's location.} 20 | 21 | \item{weather}{Reference to an instance of a \link{WeatherClass} object that provides 22 | weather data for the meter's location. The DATA_SOURCE is used to try to 23 | load this information for the relevant geocode if it is not provided.} 24 | 25 | \item{data}{Tabular format dataframe of electric meter data for the meter identified 26 | by \code{id}, with each row covering all observations for a meter for a day, with 27 | columns including at least: id, date, and hourly or 15 minute readings.} 28 | 29 | \item{gasData}{Tabular format dataframe of gas meter data, with each row covering all 30 | observations for a meter for a day, with columns including at least: id, date 31 | and daily readings. If this is not provided, the gas meter data is 32 | loaded using the data source if the data source supports gas data. 33 | \code{gasData} is typically passed in as an optimization 34 | to avoid unnecessary database queries during bulk meter feature extraction.} 35 | 36 | \item{useCache}{Data caching option indicating to the data source whether cached 37 | data should be used to populate the class. Data sources implement their own 38 | caching, but the expectation is that they will hit the underlying database 39 | once to get the data in the first place and then save it to a local cache from 40 | which it will be retrieved on subsequent calls. See \link{run.query} for details.} 41 | 42 | \item{doSG}{Option indicating whether expensive solar geometry calculations 43 | that rely on the \link{solaR} package should be performed. Off by default for 44 | performance reasons. See \link{solarGeom} for details.} 45 | 46 | \item{rawData}{Tabular format dataframe of electric meter data, with each row covering all 47 | observations for a meter for a day, with columns including at least: id, date, 48 | and hourly or 15 minute readings. If this is not provided and \code{data} is not provided, 49 | the meter data is loaded using the data source. \code{rawData} is typically passed in as 50 | an optimization to avoid unnecessary database queries during bulk meter feature 51 | extraction.} 52 | } 53 | \description{ 54 | Data structures and functions used to store, manipulate, and visualize customer 55 | meter data. This class provides structure and standardization to the representation 56 | of meter data in VISDOM, with support for passing in pre-loaded versions of data 57 | that is otherwise expensive to load individually. It therefore enables feature 58 | functions, figures, etc. to all be written to draw upon standard structures and 59 | functions and centralizes work on performance. 60 | } 61 | \details{ 62 | MeterDataClass is often called within the iterator framework or using DATA_SOURCE$getMeterDataClass(id) for a data source.
63 | } 64 | \examples{ 65 | \dontrun{ 66 | DATA_SOURCE = TestData(10) 67 | cust = MeterDataClass(id='14') 68 | plot(cust) 69 | } 70 | } 71 | \seealso{ 72 | \code{\link{WeatherClass}} 73 | } 74 | 75 | -------------------------------------------------------------------------------- /man/TestData.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/testDataSource.R 3 | \name{TestData} 4 | \alias{TestData} 5 | \title{Implementation of a data source that generates synthetic data for testing and examples} 6 | \usage{ 7 | TestData(n = 100) 8 | } 9 | \arguments{ 10 | \item{n}{Number of meters for the data source to provide synthetic data for: determines 11 | the return values for functions like \code{DATA_SOURCE$getIds()} and and \code{DATA_SOURCE$getAllData()}} 12 | } 13 | \description{ 14 | Example implementation of the data source functions as a coherent data source. This one 15 | simply generates random synthetic data that conform to the formats required by \code{\link{MeterDataClass}} 16 | and \code{\link{WeatherClass}}, returning valid, but meaningless data for all calls. 17 | } 18 | \details{ 19 | \code{TestData} is instantiated with a certain number of meters to generate data for. It then generates 20 | random data for that number and consumes that data to support the rest of its functions. This class 21 | can be used for examples, learning how data sources work, and providing data for tests of feature algorithms 22 | and meter data related classes. 23 | } 24 | \examples{ 25 | \dontrun{ 26 | DATA_SOURCE = TestData(n=100) 27 | idList = DATA_SOURCE$getIds() 28 | dataMatrix = DATA_SOURCE$getAllData() 29 | } 30 | } 31 | \seealso{ 32 | \code{\link{DataSource}} 33 | } 34 | 35 | -------------------------------------------------------------------------------- /man/WeatherClass.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/classes-weather.R 3 | \name{WeatherClass} 4 | \alias{WeatherClass} 5 | \title{S3 class that holds and normalizes weather data} 6 | \usage{ 7 | WeatherClass(geocode, raw = NULL, doMeans = T, useCache = F, doSG = F) 8 | } 9 | \arguments{ 10 | \item{geocode}{Primary geographic locator used by the data source. This is 11 | very often a zip code, but data sources can implement it as census block 12 | or any geographic category that covers all meters, with time series 13 | observations of weather data associated with a geocode.} 14 | 15 | \item{doMeans}{Indicator of whether to calculate and retain daily and monthly 16 | mean values of observations. Disable when not in use for faster performance.} 17 | 18 | \item{useCache}{Data caching option indicating to the data source whether cached 19 | data should be used to populate the class. Data sources implement their own 20 | caching, but the expectation is that they will hit the underlying data base 21 | once to get the data in the first place and then save it to a local cache from 22 | which it will be retrieved on subsequent calls. See \link{run.query} for details.} 23 | 24 | \item{doSG}{Option indicating whether expensive solar geometry calculations 25 | that rely on the \link{solaR} package should be performed. Off by default for 26 | performance reasons. 
See \link{solarGeom} for details.} 27 | } 28 | \description{ 29 | Standard format and functions for loading, manipulating, and visualizing weather data in VISDOM. 30 | } 31 | \details{ 32 | \code{WeatherData} is compatible by default with the output (i.e. database table) of 33 | python weather data scraping code that draws upon NOAA's NOAA Quality Controlled 34 | Local Climatological Data (QCLCD) ftp server. See \code{http://github.com/sborgeson/local-weather} 35 | for code to download and convert weathre data into CSV files, the weather data zip files at 36 | \code{http://www.ncdc.noaa.gov/orders/qclcd/}, and the QCLCD homepage here 37 | \code{https://www.ncdc.noaa.gov/data-access/land-based-station-data/land-based-datasets/quality-controlled-local-climatological-data-qclcd} 38 | } 39 | \section{Parameters}{ 40 | 41 | } 42 | \examples{ 43 | \dontrun{ 44 | DATA_SOURCE = TestData(10) 45 | weather = WeatherData(geocode='12601') 46 | plot(weather) 47 | } 48 | } 49 | \seealso{ 50 | \code{\link{MeterDataClass}} 51 | } 52 | 53 | -------------------------------------------------------------------------------- /man/ZIP_TO_ZCTA.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/dataSetDocs.R 3 | \docType{data} 4 | \name{ZIP_TO_ZCTA} 5 | \alias{ZIP_TO_ZCTA} 6 | \title{A zip code to census zip code tabulation area mapping} 7 | \format{A data frame with 41979 rows and 5 variables: 8 | \describe{ 9 | \item{ZIP}{zip code, as a 0 padded string} 10 | \item{ZIPType}{zip code type} 11 | \item{CityName}{Name of the city the zip code is in} 12 | \item{StateAbbr}{Two letter abbreviation for the state the zip code is in} 13 | \item{ZCTA}{the census zip code tabulation area (ZCTA) the zip code most overlaps with} 14 | }} 15 | \source{ 16 | \url{http://some.random.helpful.blog/} 17 | } 18 | \usage{ 19 | data(ZIP_TO_ZCTA) 20 | } 21 | \description{ 22 | A dataset containing columns with US zip codes and the corresponding ZCTA ids 23 | } 24 | \keyword{datasets} 25 | 26 | -------------------------------------------------------------------------------- /man/acs.fetch.and.cache.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/util-census.R 3 | \name{acs.fetch.and.cache} 4 | \alias{acs.fetch.and.cache} 5 | \title{Download census data, employing caching to avoid repeated downloads.} 6 | \usage{ 7 | acs.fetch.and.cache(...) 8 | } 9 | \description{ 10 | A thin caching wrapper around \code{acs.fetch(...)} in the acs package. 11 | } 12 | \examples{ 13 | acs.fetch.and.cache(...) 14 | } 15 | 16 | -------------------------------------------------------------------------------- /man/applyDateFilters.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/filter.R 3 | \name{applyDateFilters} 4 | \alias{applyDateFilters} 5 | \title{Date filter meter data} 6 | \usage{ 7 | applyDateFilters(df, filterRules = NULL, dateCol = "dates") 8 | } 9 | \arguments{ 10 | \item{df}{The data frame of data to filter} 11 | 12 | \item{filterRules}{A named list of filtering rules. 
Supported list entries include: 13 | 14 | \code{MOY} - list of months of the year to include, using 1 for Jan through 12 for Dec 15 | 16 | \code{DOW} - days of the week to include, using 1 for Sun and 7 for Sat 17 | 18 | \code{start.date} - the first day of data to include: all dates before this date are excluded 19 | 20 | \code{end.date} - the last day of data to include: all dates after this date are excluded} 21 | 22 | \item{dateCol}{The name of the column in 'df' with dates in it.} 23 | } 24 | \description{ 25 | Utility function that filters raw meter data (one customer day per row) based on date criteria 26 | } 27 | 28 | -------------------------------------------------------------------------------- /man/basicFeatures.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/features-basic.R 3 | \name{basicFeatures} 4 | \alias{basicFeatures} 5 | \title{Implementation of several basic statistical features} 6 | \usage{ 7 | basicFeatures(meterData, ...) 8 | } 9 | \arguments{ 10 | \item{meterData}{An instance of \code{\link{MeterDataClass}} that provides the data used for all the implemented feature calculations.} 11 | } 12 | \description{ 13 | Function that implements a set of common statistical features that can be run on an instance of \code{\link{MeterDataClass}}. 14 | } 15 | \details{ 16 | \code{basicFeatures} is called by passing in an instance of a MeterDataClass, which provides the meter data in both linear series 17 | and day-per-row matrix formats. These data structures and some associated indices and intermediate data structures are 18 | re-used throughout this method, which tries to take advantage of the performance benefits of such re-use. 19 | This function is often called in the context of the \code{iterator.*} suite of functions, which can loop through large sets of 20 | meters, calling feature functions on each. 21 | } 22 | \examples{ 23 | \dontrun{ 24 | DATA_SOURCE = TestData(n=10) 25 | meterData = DATA_SOURCE$getMeterDataClass(DATA_SOURCE$getIds()[1]) 26 | features = basicFeatures(meterData) 27 | names(features) 28 | class(features) 29 | } 30 | } 31 | 32 | -------------------------------------------------------------------------------- /man/buildWhere.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/util-dbUtil.R 3 | \name{buildWhere} 4 | \alias{buildWhere} 5 | \title{Utility function that assembles a where clause from a list of named values. All values are 'AND'ed together. 6 | Does not support nested expressions or 'OR's, except if the values are length() > 1, they are put into an 7 | in() clause enumerating those values.} 8 | \usage{ 9 | buildWhere( config = list(a='foo', b=20, c=c('one','two','three'), d = 1:9) ) 10 | buildWhere( config = list(a='foo', b=20, c=c('one','two','three'), d = 1:9), 11 | other_clause = '(name = "jimmy" or name = "johnny")', 12 | suffix='limit 100' ) 13 | } 14 | \arguments{ 15 | \item{config}{List of named values to be included in the where clause. Supports character and numeric classes, 16 | using in() syntax for those that are length > 1.} 17 | 18 | \item{other_clause}{An arbitrary clause that is prepended to the list of clauses that are 'AND'ed together to form the where statement} 19 | 20 | \item{suffix}{Arbitrary text that is appended to the where clause after it has been generated.
For example 'limit 100' or 'order by id'} 21 | } 22 | \details{ 23 | \code{buildWhere} is a utility function that builds up a where clause from a passed list of named values. 24 | Values of different types require different handling with respect to quotation marks, and values that are arrays 25 | are turned into in( ... ) clauses. In the corner case where nothing is passed in, it returns an empty string, 26 | so it is safe to concatenate onto a select statement regardless of whether there are values. 27 | } 28 | 29 | -------------------------------------------------------------------------------- /man/cleanFeatureDF.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/util-export.R 3 | \name{cleanFeatureDF} 4 | \alias{cleanFeatureDF} 5 | \title{Clean up feature data in preparation for saving} 6 | \usage{ 7 | cleanFeatureDF(features, checkId = TRUE, checkGeo = TRUE) 8 | } 9 | \arguments{ 10 | \item{features}{The data frame of feature data to be cleaned up} 11 | 12 | \item{checkId}{boolean indicating whether to enforce a check for an id column with an error message. This should 13 | be true when exporting features or other id matched data and false otherwise.} 14 | 15 | \item{checkGeo}{boolean indicating whether to enforce a check for zip5 columns with a warning message. This should 16 | be true when exporting features that will be mapped.} 17 | } 18 | \value{ 19 | A copy of the original data frame that is cleaned up 20 | } 21 | \description{ 22 | This function renames data columns for export via fixNames(), converts factors 23 | into characters, and checks for id and zip5 columns 24 | } 25 | 26 | -------------------------------------------------------------------------------- /man/clearCons.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/util-dbUtil.R 3 | \name{clearCons} 4 | \alias{clearCons} 5 | \title{Close and clear all connections} 6 | \usage{ 7 | clearCons(cfg) 8 | } 9 | \arguments{ 10 | \item{cfg}{a named list of configuration options, often loaded from a flat configuration file using \code{\link{dbCfg}}.} 11 | } 12 | \description{ 13 | Utility function to close/clear all active db connections, which can sometimes get left open, especially when errors 14 | prevent connections from being closed.
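A minimal usage sketch (not part of the package's own examples; the \code{db.cfg} file name is a placeholder for a local connection config in the format read by \code{dbCfg}):
\preformatted{
cfg <- dbCfg('db.cfg')    # parse key=value connection settings from a local file
con <- conf.dbCon(cfg)    # open a connection and run queries...
clearCons(cfg)            # ...then close any connections left open, e.g. after an error
}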
15 | } 16 | 17 | -------------------------------------------------------------------------------- /man/clear_acs_cache.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/util-census.R 3 | \name{clear_acs_cache} 4 | \alias{clear_acs_cache} 5 | \title{Flush the cache directory that stores downloaded census data.} 6 | \usage{ 7 | clear_acs_cache() 8 | } 9 | \examples{ 10 | clear_acs_cache() 11 | } 12 | 13 | -------------------------------------------------------------------------------- /man/conf.dbCon.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/util-dbUtil.R 3 | \name{conf.dbCon} 4 | \alias{conf.dbCon} 5 | \title{Get a database connection} 6 | \usage{ 7 | conf.dbCon(cfg) 8 | } 9 | \arguments{ 10 | \item{cfg}{a named list of configuration options, often loaded from a flat configuration file using \code{\link{dbCfg}}.} 11 | } 12 | \description{ 13 | Utility function that opens a database connection using the standard \code{dbCfg} format for all configuration options. 14 | } 15 | \details{ 16 | \code{conf.dbCon} is called by passing in a named list that contains \code{dbType} providing the name of the desired database driver, 17 | which is passed directly into \code{\link{DBI::dbDriver}} to get an instance of the required driver. The rest of the parameters are 18 | driver specific, but typically include \code{host}, \code{user}, \code{password}, and \code{dbname}. These are passed as arguments 19 | into a call to \code{\link{DBI::dbConnect}}, using the named driver to get a database connection and return it. 20 | } 21 | \seealso{ 22 | \code{\link{dbCfg}}, \code{\link{run.query}} 23 | } 24 | 25 | -------------------------------------------------------------------------------- /man/ctree.boot.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/util-tree.R 3 | \name{ctree.boot} 4 | \alias{ctree.boot} 5 | \title{bootstrap several runs of the ctree algorithm} 6 | \usage{ 7 | ctree.boot(df, nRun = 100, nSample = NULL, ctl = NULL, responseVar, 8 | ignoreCols) 9 | } 10 | \arguments{ 11 | \item{df}{data.frame with response variable to be explained and all explanatory variables for the ctree to run with} 12 | 13 | \item{nRun}{number of bootstrapping runs to perform} 14 | 15 | \item{nSample}{number of samples (with replacement) for each run} 16 | 17 | \item{ctl}{ctree control instance to be passed to the ctree algorithm to control stopping criteria, etc.} 18 | 19 | \item{responseVar}{the name of the variable that the ctree is trying to explain} 20 | 21 | \item{ignoreCols}{an optional list of columns that are in df but should be excluded from the ctree modeling} 22 | } 23 | \description{ 24 | This function repeatedly runs the ctree algorithm on random sub-samples of the underlying data 25 | to provide a measure of robustness for the individual ctree results.
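A hedged sketch of a bootstrapped run (the \code{features} data frame and its binary \code{enrolled} response column are hypothetical):
\preformatted{
# 50 ctree fits, each on a random sub-sample of 1000 rows
boot.fits <- ctree.boot(features, nRun = 50, nSample = 1000,
                        responseVar = 'enrolled', ignoreCols = c('id'))
}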
26 | } 27 | 28 | -------------------------------------------------------------------------------- /man/ctree.run.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/util-tree.R 3 | \name{ctree.run} 4 | \alias{ctree.run} 5 | \title{run a ctree model} 6 | \usage{ 7 | ctree.run(df, fmla = NULL, sub = NULL, ctl = NULL, responseVar, 8 | ignoreCols, ...) 9 | } 10 | \arguments{ 11 | \item{df}{data.frame with response variable to be explained and all explanatory variables to be used in ctree classification} 12 | 13 | \item{fmla}{formula, referring to values in the df, specifying the variables to be used to train the ctree model} 14 | 15 | \item{ctl}{partykit::ctree_control instance to specify ctree model parameters} 16 | 17 | \item{responseVar}{the name of the variable that the ctree is trying to explain} 18 | 19 | \item{ignoreCols}{an optional list of columns that are in df but should be excluded from the ctree modeling} 20 | 21 | \item{...}{additional arguments to be passed to partykit::ctree} 22 | } 23 | \description{ 24 | This function trains a partykit::ctree using data from a passed data.frame. 25 | } 26 | 27 | -------------------------------------------------------------------------------- /man/ctree.subscore.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/util-tree.R 3 | \name{ctree.subscore} 4 | \alias{ctree.subscore} 5 | \title{fit a ctree with passed data and compute and return its spread score} 6 | \usage{ 7 | ctree.subscore(i, df, nSample = NULL, ctl = NULL, responseVar, ignoreCols) 8 | } 9 | \arguments{ 10 | \item{df}{data.frame with response variable to be explained and all explanatory variables to be used in ctree classification} 11 | 12 | \item{nSample}{number of samples if the spread score should be run on a subset of the data} 13 | 14 | \item{ctl}{ctree control to specify ctree model parameters} 15 | 16 | \item{responseVar}{the name of the variable that the ctree is trying to explain} 17 | 18 | \item{ignoreCols}{an optional list of columns that are in df but should be excluded from the ctree modeling} 19 | } 20 | \description{ 21 | This function trains a ctree using data from a passed data.frame and returns the resulting spread score.
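A hedged, apply-style sketch (the \code{features} data frame and \code{enrolled} column are hypothetical, and treating the first argument as a run index is an assumption based on the signature):
\preformatted{
scores <- sapply(1:10, ctree.subscore, df = features, nSample = 1000,
                 responseVar = 'enrolled', ignoreCols = c('id'))
}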
22 | } 23 | 24 | -------------------------------------------------------------------------------- /man/datesToEpoch.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/util-export.R 3 | \name{datesToEpoch} 4 | \alias{datesToEpoch} 5 | \title{Convert POSIXct data frame columns to integer seconds since the epoch} 6 | \usage{ 7 | datesToEpoch(df) 8 | } 9 | \arguments{ 10 | \item{df}{Data frame of feature data} 11 | } 12 | \value{ 13 | A data frame identical to the one passed in, but with integer columns replacing POSIXct ones 14 | } 15 | \description{ 16 | Searches the data frame for columns with 'POSIXct' in their class values and 17 | converts them to integers representing seconds since the epoch 18 | } 19 | 20 | -------------------------------------------------------------------------------- /man/dbCfg.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/util-dbUtil.R 3 | \name{dbCfg} 4 | \alias{dbCfg} 5 | \title{Parse standard database configuration file format} 6 | \usage{ 7 | dbCfg(filePath) 8 | } 9 | \arguments{ 10 | \item{filePath}{The file path to the config file to be read in. Must be absolute or relative to the working directory from \code{getwd()}} 11 | } 12 | \description{ 13 | Read a database config file with key=value format: 14 | \code{key=value 15 | key2=value2} 16 | etc. into a named list. 17 | Keys must include \code{dbType} passed in to \code{DBI::dbDriver} to load the appropriate driver (i.e. MySQL, PostgreSQL, SQLite, etc.) 18 | and any parameters like \code{host}, \code{user}, \code{password}, \code{dbname} that should be passed into the DBI::dbConnect function call for your driver. 19 | } 20 | \details{ 21 | The result of this function call is just a list with named entries. As long as the entries are valid, it can be generated in other ways. 22 | Note that the ability to specify a single file that contains database configuration information allows 23 | multiple users to share a code base, but provide their personal credentials in their own copy of the relevant cfg file. 24 | Because user names and passwords are different for each user, 25 | and because it is insecure and unwise to include database credentials in any version control system, 26 | such dbCfg files should not be checked into version control. 27 | } 28 | \seealso{ 29 | \code{\link{conf.dbCon}}, \code{\link{run.query}} 30 | } 31 | 32 | -------------------------------------------------------------------------------- /man/exportData.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/util-export.R 3 | \name{exportData} 4 | \alias{exportData} 5 | \title{Export feature data into a selection of formats} 6 | \usage{ 7 | exportData(df, name, label = NA, format = "hdf5", checkId = TRUE, 8 | checkGeo = TRUE, ...) 9 | } 10 | \arguments{ 11 | \item{df}{Data frame of feature data to export} 12 | 13 | \item{name}{Primary name of export, meaning file name or database table name} 14 | 15 | \item{label}{Optional data label for export formats.
For example if not NA, this would be the name 16 | of the data table within an hdf5 file or a suffix to the csv file name, as in \code{paste(name, label, sep='_')}} 17 | 18 | \item{format}{One of the supported formats for data export, currently 'hdf5', 'csv', or 'database'} 19 | 20 | \item{checkId}{boolean control over whether to error out with a \code{stop()} if an id column is not present} 21 | 22 | \item{checkGeo}{boolean control over whether to warn if a geographic field, \code{zip5} in this case, is not present.} 23 | 24 | \item{...}{Pass through parameters for specific export methods. For example, 25 | database export requires a conn object.} 26 | } 27 | \description{ 28 | Runs the export function for a given data format on feature data 29 | } 30 | 31 | -------------------------------------------------------------------------------- /man/exportFeatureAndShapeResults.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/util-export.R 3 | \name{exportFeatureAndShapeResults} 4 | \alias{exportFeatureAndShapeResults} 5 | \title{Export feature run and load shape results} 6 | \usage{ 7 | exportFeatureAndShapeResults(feature.data, shape.results.data = NULL, 8 | format = "hdf5", prefix = "", filePath = ".") 9 | } 10 | \arguments{ 11 | \item{feature.data}{File path to an RData file containing a feature data frame, or the data frame itself} 12 | 13 | \item{shape.results.data}{Optional file path to an RData file containing load shape clustering results 14 | or the results object itself. i.e. results from 15 | \code{visdomloadshape::shapeFeatures(visdomloadshape::shapeCategoryEncoding())}} 16 | 17 | \item{format}{Export data format - one of the ones supported by exportData()} 18 | 19 | \item{prefix}{Optional prefix to put in front of all feature names} 20 | 21 | \item{filePath}{optional path to the directory where exported data should be written if the export type is a file. '.' by default.} 22 | } 23 | \description{ 24 | Loads feature data and load shape clustering data from RData files and 25 | saves them into the selected export format 26 | } 27 | 28 | -------------------------------------------------------------------------------- /man/exportShapes.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/util-export.R 3 | \name{exportShapes} 4 | \alias{exportShapes} 5 | \title{Save load shape results} 6 | \usage{ 7 | exportShapes(shape.results, prefix = "", format = "hdf5", filePath = ".") 8 | } 9 | \arguments{ 10 | \item{shape.results}{the shape feature results to export. These should be in the format returned by 11 | \code{visdomloadshape::shapeFeatures()}, as in 12 | \code{shapeFeatures(shapeCategoryEncoding(rawData=DATA_SOURCE$getAllData(), metaCols=1:4, encoding.dict=someDict))}} 13 | 14 | \item{prefix}{a prefix to apply to the feature column names} 15 | 16 | \item{format}{the data format for export. One of the values supported by the \code{format} parameter in \code{exportData()}} 17 | 18 | \item{filePath}{optional path to the location where exported files should be written (if applicable).
Default is \code{getwd()}} 19 | } 20 | \description{ 21 | Exports standardized load shape clustering and assignment data into a 22 | corresponding set of exported data tables 23 | } 24 | 25 | -------------------------------------------------------------------------------- /man/fixNames.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/util-export.R 3 | \name{fixNames} 4 | \alias{fixNames} 5 | \title{Prepare data frame column names for export} 6 | \usage{ 7 | fixNames(df, prefix = "") 8 | } 9 | \arguments{ 10 | \item{df}{The data frame whose columns are to be renamed} 11 | 12 | \item{prefix}{An optional prefix to place in front of all column names} 13 | } 14 | \value{ 15 | A data frame identical to the one passed in, but with new column names. 16 | } 17 | \description{ 18 | Removes punctuation from data frame column names, replacing all with underscores 19 | and removing underscores that are repeated one after another 20 | } 21 | 22 | -------------------------------------------------------------------------------- /man/getRunId.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/util-export.R 3 | \name{getRunId} 4 | \alias{getRunId} 5 | \title{Get the database id of a feature_set/run combination} 6 | \usage{ 7 | getRunId(conn, runConfig) 8 | } 9 | \arguments{ 10 | \item{conn}{A database connection, usually obtained from \code{conf.dbCon} or \code{\link{DBI::dbConnect}}} 11 | 12 | \item{runConfig}{The run configuration file with key-value pairs of 13 | feature_set, feature_set_description, run_name and run_description. See 14 | \code{inst/feature_set_run_conf/example_feature_set.conf} for an example.} 15 | } 16 | \description{ 17 | Load user-specified run config file and return a unique numeric 18 | id from the feature_runs metadata table. Create the feature_runs table if 19 | it does not exist.
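A hedged sketch (the file paths shown are illustrative, not prescribed by the package):
\preformatted{
cfg   <- dbCfg('db.cfg')      # database credentials
conn  <- conf.dbCon(cfg)      # DBI connection
runId <- getRunId(conn, 'inst/feature_set_run_conf/example_feature_set.conf')
}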
20 | } 21 | 22 | -------------------------------------------------------------------------------- /man/groupHist.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/util-tree.R 3 | \name{groupHist} 4 | \alias{groupHist} 5 | \title{ctree classification results plot for a binary response variable} 6 | \usage{ 7 | groupHist(df, df.ct, responseCol, title = "Group probabilities", 8 | idCol = "id", xlab = "Enrollment probability (\%)", normalize = FALSE, 9 | compareData = NULL, colors = NULL) 10 | } 11 | \arguments{ 12 | \item{df}{data.frame with response variable to be explained and all explanatory variables to be used in ctree classification} 13 | 14 | \item{df.ct}{ctree already trained on data, that is used to classify the data points in each sub sample.} 15 | 16 | \item{responseCol}{the name of the variable that the ctree is trying to explain} 17 | 18 | \item{title}{figure title of the plot} 19 | 20 | \item{idCol}{column with data row identifiers in it, which is removed from the df} 21 | 22 | \item{xlab}{label for x-axis of the figure} 23 | 24 | \item{normalize}{boolean for whether the x-axis should be percentages compared to the sample mean or absolute percentages} 25 | 26 | \item{compareData}{optional data.frame of one additional set of bar values to be plotted as well.} 27 | 28 | \item{colors}{optional named list specifying the figure colors.} 29 | } 30 | \description{ 31 | plot a histogram of customer segments and enrollment percentages, based on the membership of ctree leaf nodes. 32 | The plot is a "histogram" whose x-axis is the average "True" response for each ctree leaf node and whose 33 | height is the number of customers in each corresponding group. Strong results have nodes with high membership 34 | far from the sample mean. 35 | } 36 | 37 | -------------------------------------------------------------------------------- /man/iterator.build.idx.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/iterator.R 3 | \name{iterator.build.idx} 4 | \alias{iterator.build.idx} 5 | \title{Optimize single meter lookup in a data.frame containing many meters} 6 | \usage{ 7 | iterator.build.idx(ctx) 8 | } 9 | \arguments{ 10 | \item{ctx}{The ctx environment that configures a feature run. This function is specifically looking 11 | to build an index for \code{ctx$RAW_DATA}, if present, and saves the results as \code{ctx$idxLookup}. It allows 12 | for RAW_DATA to be set as a data.frame with data from a large number of meters loaded at once and in 13 | advance of the feature extraction runs to reduce runtimes.} 14 | } 15 | \description{ 16 | This function does a fast search for the first and last indices for each meter's data in a data.frame 17 | with many meters and caches the resulting indices. These indices can be used to very quickly access data 18 | for individual meters. 19 | } 20 | \details{ 21 | Standard methods of searching for all data for a given meterId would use boolean expressions like 22 | \code{ctx$RAW_DATA[ctx$RAW_DATA$meterId == 'some_id',]}. It turns out that this is pretty inefficient 23 | for large data.frames because it generates values for all rows and then does the necessary comparisons for 24 | all rows.
Direct numerical indexing avoids this overhead and such indices can be quickly computed using 25 | \code{\link{match}} and the fact that all data for each meter must be returned from the data source in 26 | contiguous rows. Finally, the constructor for a \code{MeterDataClass} checks for these \code{ctx$idxLookup} indices if 27 | \code{ctx$RAW_DATA} is found and uses them to pull the subset of data associated with the meterId passed in 28 | to that constructor. 29 | } 30 | 31 | -------------------------------------------------------------------------------- /man/iterator.callAll.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/iterator.R 3 | \name{iterator.callAll} 4 | \alias{iterator.callAll} 5 | \title{Run multiple feature algorithms specified in a list of functions} 6 | \usage{ 7 | iterator.callAll(meterDataClass, ctx, fnList, ...) 8 | } 9 | \arguments{ 10 | \item{meterDataClass}{The \code{\link{MeterDataClass}} object that contains the data to be analyzed.} 11 | 12 | \item{ctx}{The ctx environment that configures feature runs and provides a place to store and pass data across feature function calls.} 13 | 14 | \item{fnList}{The list of feature extraction functions to be run against the \code{\link{MeterDataClass}}.} 15 | 16 | \item{...}{Additional arguments that will be passed into the feature functions.} 17 | } 18 | \description{ 19 | This function passes the meterDataClass object into 20 | every feature function in the list found in the parameter \code{fnList}, 21 | concatenating all the results into a single list with named values. 22 | } 23 | 24 | -------------------------------------------------------------------------------- /man/iterator.callAllFromCtx.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/iterator.R 3 | \name{iterator.callAllFromCtx} 4 | \alias{iterator.callAllFromCtx} 5 | \title{Run feature algorithms specified in the ctx environment} 6 | \usage{ 7 | iterator.callAllFromCtx(meterDataClass, ctx, ...) 8 | } 9 | \arguments{ 10 | \item{meterDataClass}{The \code{\link{MeterDataClass}} object that contains the data to be analyzed.} 11 | 12 | \item{ctx}{The ctx environment that contains the list of functions to run under the name \code{fnVector} and 13 | configures feature runs and provides a place to store and pass data across feature function calls.} 14 | 15 | \item{...}{Additional arguments that will be passed into the feature functions.} 16 | } 17 | \description{ 18 | This function passes the provided meter data object into 19 | every feature function in the list found in the ctx environment under \code{fnVector}, 20 | concatenating all the results into a single list with named values. 21 | } 22 | 23 | -------------------------------------------------------------------------------- /man/iterator.iterateMeters.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/iterator.R 3 | \name{iterator.iterateMeters} 4 | \alias{iterator.iterateMeters} 5 | \title{Run features for a list of meter ids} 6 | \usage{ 7 | iterator.iterateMeters(meterList, custFn, ctx = new.env(), as_df = FALSE, 8 | ...)
9 | } 10 | \arguments{ 11 | \item{meterList}{The list of meterIds used to instantiate \code{MeterDataClass} objects} 12 | 13 | \item{custFn}{The feature extraction function to run on each instance of \code{MeterDataClass}. 14 | Can also be a list of feature functions, whose results will be \code{c()}'d together, so make sure the names of their 15 | return values are unique!} 16 | 17 | \item{ctx}{The ctx environment for the feature extraction.} 18 | 19 | \item{as_df}{Boolean (defaulted to FALSE) that determines whether \code{iterator.todf} is run on 20 | the results before returning them.} 21 | 22 | \item{...}{Additional arguments to be passed to the feature extraction function.} 23 | } 24 | \description{ 25 | Utility function that runs the configured set of feature functions on the data from a list of passed 26 | meter ids. This is useful for scripting feature extraction on an arbitrary number of meters. 27 | } 28 | 29 | -------------------------------------------------------------------------------- /man/iterator.iterateZip.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/iterator.R 3 | \name{iterator.iterateZip} 4 | \alias{iterator.iterateZip} 5 | \title{Iterate over zip codes, extracting features for meters in each} 6 | \usage{ 7 | iterator.iterateZip(zipList, custFn, cacheResults = FALSE, ctx = new.env(), 8 | clearCachedResultsFromRAM = FALSE, as_df = FALSE, ...) 9 | } 10 | \arguments{ 11 | \item{zipList}{List of all zip codes to iterate over} 12 | 13 | \item{custFn}{The feature function(s) to call on \code{MeterDataClass} instances from within each zip code. 14 | If it is a list of feature functions, the results will be \code{c()}'d together, so make sure the names of their 15 | return values are unique!} 16 | 17 | \item{cacheResults}{A boolean flag that indicates whether feature results should be cached as RData. 18 | If true, the cached data will be looked up prior to feature extraction to bypass running 19 | features for the given zip code while returning the cached results. This can prevent duplicate processing 20 | and allow an interrupted batch process to resume processing where it left off. 21 | The cache directory is `getwd()` by default, but can be overridden using `ctx$resultsCache`.} 22 | 23 | \item{ctx}{The ctx environment that configures the feature run.} 24 | 25 | \item{clearCachedResultsFromRAM}{A boolean that instructs the code to drop previous results from RAM if caching 26 | is in effect. This allows for larger runs to be executed and cached without accumulating in RAM. The idea is 27 | that after the run, this function will be called again, with cacheResults=TRUE and this flag set to FALSE, which 28 | will simply load and concatenate results from disk to re-create the full set.} 29 | 30 | \item{as_df}{Boolean (defaulted to FALSE) that determines whether \code{iterator.todf} is run on 31 | the results before returning them.} 32 | 33 | \item{...}{Arguments to be passed into the feature function(s).} 34 | } 35 | \description{ 36 | Function that iterates through all passed zip codes, looks up local weather data (once) and 37 | looks up the list of meter ids for each and calls iterator.iterateMeters with those meter 38 | ids and the pre-loaded weather data. This runs faster than calling ids individually, which 39 | can load similar weather data over and over.
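A hedged sketch using the synthetic \code{TestData} source (the zip codes are illustrative placeholders for geocodes your data source actually covers):
\preformatted{
DATA_SOURCE = TestData(n = 50)
zips = c('94709', '94710')   # illustrative zip codes only
features = iterator.iterateZip(zips, basicFeatures, as_df = TRUE)
}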
40 | } 41 | 42 | -------------------------------------------------------------------------------- /man/iterator.runMeter.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/iterator.R 3 | \name{iterator.runMeter} 4 | \alias{iterator.runMeter} 5 | \title{Run features for a single meterId} 6 | \usage{ 7 | iterator.runMeter(meterId, custFn, ctx, ...) 8 | } 9 | \arguments{ 10 | \item{meterId}{The meterId to use to instantiate a \code{MeterDataClass} object} 11 | 12 | \item{custFn}{The feature extraction function(s) to run on the instance of \code{MeterDataClass}. 13 | If it is a list of feature functions, the results will be \code{c()}'d together, so make sure the names of their 14 | return values are unique!} 15 | 16 | \item{ctx}{The ctx environment for the feature extraction.} 17 | 18 | \item{...}{Additional arguments to be passed to the feature extraction function.} 19 | } 20 | \description{ 21 | Utility function that runs the configured set of feature functions on a single passed 22 | meter. This is useful for testing and feature development, but also as the function 23 | called by parallelizable methods like apply, and the *ply functions of plyr, to run feature 24 | extraction on multiple cores of a computer. 25 | } 26 | 27 | -------------------------------------------------------------------------------- /man/iterator.runZip.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/iterator.R 3 | \name{iterator.runZip} 4 | \alias{iterator.runZip} 5 | \title{Iterate over all meters in a zip code, extracting features for each in a performance optimized manner} 6 | \usage{ 7 | iterator.runZip(zip, custFn, cacheResults = F, ctx = new.env(), 8 | as_df = FALSE, ...) 9 | } 10 | \arguments{ 11 | \item{zip}{The zip code to use for the weather, meter ids, and meter data lookups} 12 | 13 | \item{custFn}{The feature function(s) to call on \code{MeterDataClass} instances from within the zip code. 14 | If it is a list of feature functions, the results will be \code{c()}'d together, so make sure the names of their 15 | return values are unique!} 16 | 17 | \item{cacheResults}{A boolean flag that indicates whether results should be cached as RData. 18 | If true, the cached data will be looked up prior to feature extraction to bypass running 19 | features for the given zip code while returning the cached results. This can prevent duplicate processing 20 | and allow an interrupted batch process to resume processing where it left off. 21 | The cache directory is `getwd()` by default, but can be overridden using `ctx$resultsCache`.} 22 | 23 | \item{ctx}{The ctx environment that configures the feature run.} 24 | 25 | \item{as_df}{Boolean (defaulted to FALSE) that determines whether \code{iterator.todf} is run on 26 | the results before returning them.} 27 | 28 | \item{...}{Arguments to be passed into the feature function(s).} 29 | } 30 | \description{ 31 | Function that looks up local weather and all meter data for the passed zip code 32 | (once) and caches them, then looks up all meter ids in the passed zip code and 33 | calls iterator.iterateMeters with those meter ids and the pre-loaded weather and meter data. 34 | This runs faster than calling ids individually, which loads individual meter data and similar 35 | weather data over and over.
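A hedged single-zip sketch (again with an illustrative zip code and the synthetic \code{TestData} source):
\preformatted{
DATA_SOURCE = TestData(n = 50)
features = iterator.runZip('94709', basicFeatures, as_df = TRUE)
}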
36 | } 37 | 38 | -------------------------------------------------------------------------------- /man/iterator.todf.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/iterator.R 3 | \name{iterator.todf} 4 | \alias{iterator.todf} 5 | \title{Flatten feature data returned by iterator functions into a \code{data.frame}.} 6 | \usage{ 7 | iterator.todf(lofl) 8 | } 9 | \arguments{ 10 | \item{lofl}{The list of lists of feature values that should be flattened into a single data frame, with one row per meterId.} 11 | } 12 | \description{ 13 | Iterator functions like \code{iterator.iterateMeters} return lists of lists of derived features indexed by meterIds. 14 | This function flattens the scalar values in these lists into a data.frame, with one row per meterId and columns for every named feature found. 15 | } 16 | \details{ 17 | This function ignores non-scalars, so diagnostic data and other complex data structures can be returned by feature algorithms without 18 | interfering with the task of creating clean vectors of features for each meter. The columns of the returned data.frame are 19 | a superset of all the features returned for every MeterDataClass with computed features. Missing features are given values of NA. 20 | } 21 | 22 | -------------------------------------------------------------------------------- /man/loadACS.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/util-census.R 3 | \name{loadACS} 4 | \alias{loadACS} 5 | \title{Retrieve census data from the American Community Survey.} 6 | \usage{ 7 | loadACS(filterErr = T, endyear = 2014, span = 5, geography = NULL, 8 | acs_vars_to_load = NULL) 9 | } 10 | \arguments{ 11 | \item{geography}{The geographic aggregations and search terms to use with the \code{acs.fetch} 12 | interface. This will default to all zip codes.} 13 | 14 | \item{acs_vars_to_load}{A dataframe to specify which variables to load and what to name the resultant columns. 15 | The dataframe should have columns "census_var" and "label" as characters. See example below. 16 | This will default to the variables specified in data-raw/census/census_vars.txt} 17 | 18 | \item{filterErr=T}{Remove the standard error columns from the results} 19 | 20 | \item{endyear=2014}{Specify which year of data to download. Currently, 2013 & 2014 are available.} 21 | 22 | \item{span=5}{Whether you want the 3-year or the 5-year estimates. See census website for more explanations.} 23 | } 24 | \description{ 25 | Download data from the Census API and cache it locally on disk. This is a 26 | thin wrapper around the \code{acs.fetch} function in the [acs 27 | package](https://cran.r-project.org/web/packages/acs/index.html). The results 28 | of acs.fetch() are converted from a fancy S4 class to a simple data frame. 29 | The geographical designations (zip code, state, census block, etc) are specified 30 | in the row names, and if zip code (actually ZCTA) is used for the geography, the 31 | numeric value is also available as a column. 32 | } 33 | \details{ 34 | Before using these functions, you will need to set a census API key.
You can get one from 35 | http://api.census.gov/data/key_signup.html Once you have obtained a key, set it up like so: 36 | \code{ 37 | acs::api.key.install(key='PASTE_YOUR_KEY_HERE') 38 | } 39 | If you are a developer who frequently reinstalls packages, you may want to 40 | copy this into your .Rprofile 41 | 42 | The path to the cache directory is stored as an R option. You may customize 43 | it by creating a .Rprofile file in either your project or home directory. 44 | \code{ 45 | options("visdom.acs.cache"="/new/path/to/cache/dir") 46 | getOption("visdom.acs.cache") 47 | clear_acs_cache() 48 | } 49 | 50 | The default set of Data Profile variables are specified in 51 | data-raw/census/census_vars.txt. If you edit that list, don't forget to 52 | re-execute migrateData.R. You can look at what variables are available by 53 | going to 54 | \url{http://www.census.gov/data/developers/data-sets/acs-5year.2014.html}, 55 | choosing the appropriate year, searching the page for "ACS Data Profile API 56 | Variables", and clicking 57 | \href{http://api.census.gov/data/2014/acs5/profile/variables.html}{html}. As of 58 | November 11, 2016, the API only works reliably for 2013 and 2014. 59 | } 60 | \examples{ 61 | library(acs) 62 | # Load data by state 63 | loadACS(geography = acs::geo.make(state='*')) 64 | # Load data for 2 zip codes 65 | loadACS(geography = c(acs::geo.make(zip.code='94709'), acs::geo.make(zip.code='94710'))) 66 | # Load data for all tracts, restricted to California. 67 | loadACS(geography = acs::geo.make(state='CA', county='*', tract='*')) 68 | # Unfortunately, Zip codes can't be restricted to a single state like tracts can. 69 | 70 | # Selecting different variables 71 | acs_vars_to_load = data.frame( 72 | census_var = c("DP02_0015", "DP04_0047"), 73 | label = c("avg_hh_size", "owner_hh_size"), 74 | stringsAsFactors = F ) 75 | loadACS(acs_vars_to_load = acs_vars_to_load) 76 | } 77 | \seealso{ 78 | \code{\link{DataSource}} 79 | } 80 | 81 | -------------------------------------------------------------------------------- /man/mergeShapeFeatures.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/util-export.R 3 | \name{mergeShapeFeatures} 4 | \alias{mergeShapeFeatures} 5 | \title{Merge load shape features into feature data frame} 6 | \usage{ 7 | mergeShapeFeatures(features, shape.results) 8 | } 9 | \arguments{ 10 | \item{features}{Data frame of feature data} 11 | 12 | \item{shape.results}{Load shape clustering and assignment results object to pull features from} 13 | } 14 | \value{ 15 | A data frame identical to the one passed in, but with new load shape feature columns 16 | } 17 | \description{ 18 | Pulls load shape features from a shape results object and appends them to 19 | an existing feature data frame 20 | } 21 | 22 | -------------------------------------------------------------------------------- /man/piecewise.regressor.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/util-regression.R 3 | \name{piecewise.regressor} 4 | \alias{piecewise.regressor} 5 | \title{Break a vector into piecewise regressor columns, using the specified breaks (aka bins)} 6 | \usage{ 7 | piecewise.regressor(bins, regressor, ...)
8 | } 9 | \arguments{ 10 | \item{bins}{- numerical array whose values are the break points defining the piecewise regressors} 11 | 12 | \item{regressor}{The single array of regressor values to be broken out into piecewise columns.} 13 | 14 | \item{diverge}{- Default FALSE: Whether the first column contains the distance of 15 | the value from the bottom change point (rather than from 0)} 16 | } 17 | \description{ 18 | convenience reordering of \code{\link{regressor.piecewise}} putting the bins first for apply style calls... 19 | 20 | break a vector out for continuous piecewise regression (i.e. the fitted 21 | segments will join each other at each end) into a matrix whose row 22 | totals are the original values, but whose columns divide the value across 23 | a set of bins, so 82 across bins with boundaries c(50,60,65,70,80,90) 24 | becomes the row 50,10,5,5,10,2,0, which sums to 82... 25 | This is very useful for finding rough change points in thermal response 26 | as is expected for buildings with clear setpoints 27 | } 28 | \details{ 29 | This can be used in an \code{apply} setting to run a parametric grid search of break points. 30 | 31 | TODO: This can create a column of zeros, which should break the regression 32 | so we might need to prune the columns when we're done and keep track of 33 | which bins are in play when comparing across regressions 34 | } 35 | \seealso{ 36 | \code{\link{regressor.piecewise}} for 'normal' syntax 37 | } 38 | 39 | -------------------------------------------------------------------------------- /man/reexports.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/util-base.R 3 | \docType{import} 4 | \name{reexports} 5 | \alias{\%>\%} 6 | \alias{reexports} 7 | \title{Objects exported from other packages} 8 | \description{ 9 | These objects are imported from other packages. Follow the links 10 | below to see their documentation. 11 | 12 | \describe{ 13 | \item{dplyr}{\code{\link[dplyr]{\%>\%}}} 14 | }} 15 | \keyword{internal} 16 | 17 | -------------------------------------------------------------------------------- /man/regressor.piecewise.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/util-regression.R 3 | \name{regressor.piecewise} 4 | \alias{regressor.piecewise} 5 | \title{Break a vector into piecewise regressor columns, using the specified breaks} 6 | \usage{ 7 | regressor.piecewise(regressor, bins, diverge = F) 8 | } 9 | \arguments{ 10 | \item{regressor}{The single array of regressor values to be broken out into piecewise columns.} 11 | 12 | \item{bins}{- numerical array whose values are the break points defining the piecewise regressors} 13 | 14 | \item{diverge}{- Default FALSE: Whether the first column contains the distance of 15 | the value from the bottom change point (rather than from 0)} 16 | } 17 | \description{ 18 | break a vector out for continuous piecewise regression (i.e. the fitted 19 | segments will join each other at each end) into a matrix whose row 20 | totals are the original values, but whose columns divide the value across 21 | a set of bins, so 82 across bins with boundaries c(50,60,65,70,80,90) 22 | becomes the row 50,10,5,5,10,2,0, which sums to 82...
23 | This is very useful for finding rough change points in thermal response 24 | as is expected for buildings with clear setpoints 25 | } 26 | \details{ 27 | TODO: This can create a column of zeros, which should break the regression 28 | so we might need to prune the columns when we're done and keep track of 29 | which bins are in play when comparing across regressions 30 | } 31 | \seealso{ 32 | \code{\link{piecewise.regressor}} for 'apply' syntax 33 | } 34 | 35 | -------------------------------------------------------------------------------- /man/regressor.split.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/util-regression.R 3 | \name{regressor.split} 4 | \alias{regressor.split} 5 | \title{Split a single column of values into multiple columns as determined by group membership} 6 | \usage{ 7 | regressor.split(regressor, membership = NULL) 8 | } 9 | \arguments{ 10 | \item{regressor}{The vector of data to be split} 11 | 12 | \item{membership}{A vector of the same length as the regressor that assigns group membership categorically. 13 | If the regressor is a factor, no membership assignment is required.} 14 | } 15 | \value{ 16 | A matrix of length(sort(unique(membership))) columns, that is all 17 | zeros except for the regressor values that match the membership values 18 | corresponding to the column. This supports regression with separate 19 | coefficients for each group defined by the membership. 20 | For example, regressor.split(Tout,dates$hour) will return a matrix with 24 21 | columns, where the only non-zero entry per row contains the Tout value in 22 | the column corresponding to the hour of day it was recorded 23 | } 24 | \description{ 25 | Split a column of data by membership in discrete groups (as with factor levels) 26 | into multiple columns containing just the values corresponding to the membership 27 | indicator associated with the column and zeros otherwise. 28 | } 29 | 30 | -------------------------------------------------------------------------------- /man/regressorDF.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/util-regression.R 3 | \name{regressorDF} 4 | \alias{regressorDF} 5 | \title{data.frame formatting of \code{\link{MeterDataClass}} data} 6 | \usage{ 7 | regressorDF(meterData, norm = FALSE, rm.na = FALSE) 8 | } 9 | \arguments{ 10 | \item{meterData}{The meter data class to be converted} 11 | } 12 | \description{ 13 | Given a MeterDataClass instance (meterData), regressorDF returns a data.frame consisting of a standard set of 14 | regressor columns suitable for passing into a call to lm.
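A hedged sketch of inspecting the result before modeling (using the synthetic \code{TestData} source; no specific column names are assumed):
\preformatted{
DATA_SOURCE = TestData(n = 10)
md = DATA_SOURCE$getMeterDataClass(DATA_SOURCE$getIds()[1])
df = regressorDF(md)
names(df)   # inspect the available regressor columns before writing an lm() formula
}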
15 | } 16 | 17 | -------------------------------------------------------------------------------- /man/rm.col.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/util-base.R 3 | \name{rm.col} 4 | \alias{rm.col} 5 | \title{remove named, index, or logical columns, if present, from a data.frame} 6 | \usage{ 7 | rm.col(df, cols) 8 | } 9 | 10 | -------------------------------------------------------------------------------- /man/run.query.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/util-dbUtil.R 3 | \name{run.query} 4 | \alias{run.query} 5 | \title{Utility function that runs and returns an arbitrary database query} 6 | \usage{ 7 | run.query(query, cfg, cacheDir = NULL, cacheFile = NULL, forceRefresh = F, 8 | debug = F) 9 | } 10 | \arguments{ 11 | \item{query}{a string representation of the SQL query to be run.} 12 | 13 | \item{cfg}{a named list of connection configuration options, often loaded from a flat configuration file using \code{\link{dbCfg}}.} 14 | 15 | \item{cacheDir}{an optional directory path to a location where query results can be cached as RData for future use without querying the database again.} 16 | 17 | \item{cacheFile}{an optional file name that signals that the results of the query should be cached as RData using the passed cacheFile name. 18 | Note that if there already exists an RData file with that name in the cache directory, the contents will be returned without running the query. 19 | Thus it is imperative that the \code{cacheFile} names are unique to each unique query that should be run.} 20 | 21 | \item{forceRefresh}{an optional parameter for use when a \code{cacheFile} has been specified. If True (default is False), the query is run against 22 | the database with results overwriting any existing cached data. This is similar behavior to entering the cache directory and erasing the 23 | cached data before calling the \code{run.query} function.} 24 | 25 | \item{debug}{print out diagnostic information about the cache path, file, and status, along with the full text of any SQL query executed.} 26 | } 27 | \details{ 28 | \code{run.query} is a utility function that is very often used in \code{DataSource} implementations. It automatically connects to a database 29 | with configuration that can be read from a config file, runs queries, and returns results as data.frames. It also supports caching of query results 30 | in a query cache directory, using passed cacheFile names, which must be carefully managed by the data source author to ensure that the cacheFile names 31 | are truly unique to each unique query made. For example, the cache file for a query to load meter data for an individual meter would need to include 32 | the meter's id (or similar) to ensure that it isn't mistaken for data from another meter already cached. The purpose of all this caching logic is, 33 | of course, to improve the performance of data retrieval for large sets of data where query times can significantly impact performance. Thus it is often good 34 | practice to load and cache data in larger chunks than individual meter data.
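A hedged sketch of a cached query (the table, column, and file names are hypothetical; the cache file name encodes the zip code so it stays unique to this query):
\preformatted{
cfg <- dbCfg('db.cfg')
df  <- run.query("select * from meter_data where zip5 = '94709'", cfg,
                 cacheDir = 'queryCache', cacheFile = 'meter_data_94709.RData')
}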
35 | } 36 | \seealso{ 37 | \code{\link{dbCfg}}, \code{\link{conf.dbCon}} 38 | } 39 | 40 | -------------------------------------------------------------------------------- /man/runDateFilterIfNeeded.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/iterator.R 3 | \name{runDateFilterIfNeeded} 4 | \alias{runDateFilterIfNeeded} 5 | \title{Run date filters on raw meter data} 6 | \usage{ 7 | runDateFilterIfNeeded(ctx) 8 | } 9 | \arguments{ 10 | \item{ctx}{Context object that is an R environment with named parameters} 11 | } 12 | \description{ 13 | Runs \code{applyDateFilters} if the appropriate values are found in the ctx and the data is not yet 14 | flagged as filtered. This function retrieves data from \code{ctx$RAW_DATA} and writes its results to the same field. 15 | } 16 | 17 | -------------------------------------------------------------------------------- /man/sanityCheckDataSource.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/util-dataSource.R 3 | \name{sanityCheckDataSource} 4 | \alias{sanityCheckDataSource} 5 | \title{Function to exercise the core features and calls of a DataSource} 6 | \usage{ 7 | sanityCheckDataSource(DATA_SOURCE, useCache = FALSE) 8 | } 9 | \arguments{ 10 | \item{DATA_SOURCE}{The DataSource to be tested. In VISDOM, there is a convention to put the 11 | active \code{DataSource} into the global environment with the name DATA_SOURCE, so this code follows that 12 | naming convention, even though you are passing in the data source.} 13 | 14 | \item{useCache}{Boolean to control whether or not the DataSource should rely upon its data cache to 15 | retrieve data. If True and called with an empty cache, the function can be used to pre-populate the 16 | caches with id lists, geo codes, meter data by geo code, and weather data by geo code.} 17 | } 18 | \description{ 19 | This function exercises the core functions of a \code{\link{DataSource}}, allowing DataSource authors to 20 | check if their DataSource meets the minimal obligations of data retrieval and formatting. 21 | It can also be used to pre-populate data caches before a feature run. 22 | } 23 | \details{ 24 | While these checks are not comprehensive, they represent the main functions exercised during a feature 25 | extraction run. DataSource authors should feel free to add functions to their DataSource and alter 26 | existing function signatures to meet their individual requirements. As long as this function still runs 27 | without errors, generic iterator functions should still work. The iterator relies on the ability to break 28 | ids and meter data down to geo code level sub-groups, to request all weather and meter data for a zip code, 29 | and to be able to instantiate a MeterDataClass object with the returned meter data and weather data. 30 | 31 | If this function no longer runs on your \code{DataSource}, then advanced users can write their own 32 | iteration methods that break down the feature extraction problem differently.
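A minimal sketch using the synthetic \code{TestData} source:
\preformatted{
DATA_SOURCE = TestData(n = 10)
sanityCheckDataSource(DATA_SOURCE, useCache = FALSE)   # exercises id, meter data, and weather lookups
}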
33 | } 34 | \seealso{ 35 | \code{\link{DataSource}} 36 | } 37 | 38 | -------------------------------------------------------------------------------- /man/showCons.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/util-dbUtil.R 3 | \name{showCons} 4 | \alias{showCons} 5 | \title{List all open database connections} 6 | \usage{ 7 | showCons(cfg) 8 | } 9 | \arguments{ 10 | \item{cfg}{parsed connection configuration data} 11 | } 12 | \description{ 13 | Use this to see whether too many connections are open 14 | } 15 | 16 | -------------------------------------------------------------------------------- /man/spread.boot.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/util-tree.R 3 | \name{spread.boot} 4 | \alias{spread.boot} 5 | \title{bootstrap spread scores across several runs of the ctree algorithm} 6 | \usage{ 7 | spread.boot(df, df.ct, nRun = 100, nSample = NULL, responseVar, ignoreCols) 8 | } 9 | \arguments{ 10 | \item{df}{data.frame with response variable to be explained and all explanatory variables to be used in ctree classification} 11 | 12 | \item{df.ct}{ctree already trained on data, that is used to classify the data points in each sub sample.} 13 | 14 | \item{nRun}{number of bootstrapping runs to perform} 15 | 16 | \item{nSample}{number of samples (with replacement) for each run} 17 | 18 | \item{responseVar}{the name of the variable that the ctree is trying to explain} 19 | 20 | \item{ignoreCols}{an optional list of columns that are in df but should be excluded from the ctree modeling} 21 | } 22 | \description{ 23 | This function repeatedly randomly sub-samples the passed data.frame to provide points for classification 24 | by the passed ctree model and calculates the resulting spread score for classifications of each subset 25 | and returns a vector of the resulting scores. It is useful for determining the robustness of the spread score 26 | for a particular data set. 27 | } 28 | 29 | -------------------------------------------------------------------------------- /man/spreadScore.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/util-tree.R 3 | \name{spreadScore} 4 | \alias{spreadScore} 5 | \title{given a ctree model, compute its spread score} 6 | \usage{ 7 | spreadScore(df, df.ct, nSample = NULL, responseVar, ignoreCols = c()) 8 | } 9 | \arguments{ 10 | \item{df}{data.frame with response variable to be explained and all explanatory variables to be used in ctree classification} 11 | 12 | \item{df.ct}{ctree already trained on data, that is used to classify the data points in each sub sample.} 13 | 14 | \item{nSample}{number of samples (with replacement) to use for the calculation} 15 | 16 | \item{responseVar}{the name of the variable that the ctree is trying to explain} 17 | 18 | \item{ignoreCols}{an optional list of columns that are in df but should be excluded from the ctree modeling} 19 | } 20 | \description{ 21 | This function computes the "spread score" for a sample of data, given a ctree explaining a binary variable.
22 | New data samples are "predicted" into their ctree nodes and the probability of a 'True' response value for 23 | their members is fed into a weighted average of the absolute distance from the sample mean across all data samples. 24 | The higher the score, the better the model has done at classifying people with divergent behaviors (higher or lower than average). 25 | } 26 | 27 | -------------------------------------------------------------------------------- /man/visdom.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/visdom.R 3 | \docType{package} 4 | \name{visdom} 5 | \alias{visdom} 6 | \alias{visdom-package} 7 | \title{Smart Meter data analysis tools for R} 8 | \description{ 9 | Smart Meter data analysis tools for R 10 | } 11 | \section{Core functions}{ 12 | 13 | 14 | VISDOM relies most heavily on the following core functions. See the example vignettes for examples of their usage: 15 | 16 | \itemize{ 17 | \item \code{\link{DataSource}}: An S3 class that implements the set of standard data source functions for your data. 18 | 19 | 20 | \item \code{\link[visdom]{MeterDataClass}}: S3 class that holds meter data in both vector time series and daily matrix formats, associated weather data, and several supporting functions. 21 | 22 | 23 | \item \code{\link{TestData}}: Example data source S3 class that implements all required data source functions and generates random synthetic data for use in testing and examples. 24 | 25 | 26 | \item \code{\link[visdom]{WeatherClass}}: S3 class that holds weather data and related functions. 27 | 28 | 29 | \item \code{\link{basicFeatures}}: Function that implements a full suite of "basic" feature calculations, which include annual, seasonal, monthly, and hour of day averages and variances, and other simple statistics, like the simple correlation between outside temperature and consumption. 30 | 31 | 32 | \item \code{\link{dbCfg}}: S3 class that can parse a database config from a text file. This is used by the util-dbUtil.R file to connect to the configured database. 33 | 34 | 35 | \item \code{\link{run.query}}: Main function used to run SQL queries to load and cache data. 36 | 37 | 38 | \item \code{\link{iterator.callAllFromCtx}}: Function that iterates through all feature algorithms listed in the configuration ctx environment and returns a concatenated named list of all resulting features for a given MeterDataClass instance. 39 | 40 | 41 | \item \code{\link{iterator.iterateMeters}}: Function that iterates through all passed meter ids to instantiate MeterDataClass for each and call one or more feature extraction functions on each. This requires a properly configured data source and database connection (if applicable) and is further configured using fields in the ctx context object. 42 | 43 | 44 | \item \code{\link{iterator.iterateZip}}: Function that iterates through all passed zip codes, looks up local weather data (once) and the list of meter ids for each, and calls iterator.iterateMeters with those meter ids and the pre-loaded weather data. This runs faster than calling ids individually, which can load similar weather data over and over. 45 | 46 | 47 | \item \code{\link{iterator.runMeter}}: Utility function that runs the configured set of feature functions on a single passed meter. This is useful for testing and feature development, but also as the function called by parallelizable methods like apply, and the *ply functions of plyr.
48 | 49 | 50 | \item \code{\link{iterator.todf}}: The iterator.iterate* functions return lists of feature lists, indexed by meter id. This function converts all the scalar features in this data structure into a single data.frame, one row per meter id. 51 | 52 | } 53 | } 54 | 55 | -------------------------------------------------------------------------------- /man/writeCSVData.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/util-export.R 3 | \name{writeCSVData} 4 | \alias{writeCSVData} 5 | \title{Write feature data frame to a csv file} 6 | \usage{ 7 | writeCSVData(data, fName, label = NA, filePath = NA) 8 | } 9 | \arguments{ 10 | \item{data}{The feature data frame to be written} 11 | 12 | \item{fName}{The name of the csv file to write the data to} 13 | 14 | \item{label}{Unused, but present for compatibility with other write* functions} 15 | 16 | \item{filePath}{optional path to the location where exported files should be written (if applicable). Default is \code{getwd()}} 17 | } 18 | \description{ 19 | Write feature data frame to a csv file 20 | } 21 | 22 | -------------------------------------------------------------------------------- /man/writeDatabaseData.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/util-export.R 3 | \name{writeDatabaseData} 4 | \alias{writeDatabaseData} 5 | \title{Write feature data frame to a database} 6 | \usage{ 7 | writeDatabaseData(data, name = NULL, label = NULL, conn, overwrite = TRUE, 8 | runConfig) 9 | } 10 | \arguments{ 11 | \item{data}{The feature data frame to be written} 12 | 13 | \item{name}{Unused, but present for compatibility with other write* functions} 14 | 15 | \item{label}{Unused, but present for compatibility with other write* functions} 16 | 17 | \item{conn}{A DBI dbConnection object to the database that will host the table} 18 | 19 | \item{overwrite}{Boolean indicator for whether the data written should overwrite any existing table or append to it} 20 | 21 | \item{runConfig}{Path to a run configuration file with names and descriptions 22 | of the feature set and run. See 23 | `inst/feature_set_run_conf/example_feature_set.conf` for an example.} 24 | } 25 | \description{ 26 | Write feature data frame to a database using a \code{\link[DBI]{dbWriteTable}} call 27 | } 28 | 29 | -------------------------------------------------------------------------------- /man/writeH5Data.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/util-export.R 3 | \name{writeH5Data} 4 | \alias{writeH5Data} 5 | \title{Write feature data frame to an hdf5 file} 6 | \usage{ 7 | writeH5Data(data, fName, label, filePath = NA) 8 | } 9 | \arguments{ 10 | \item{data}{The feature data frame to be written} 11 | 12 | \item{fName}{The name of the hdf5 formatted file to write the data to} 13 | 14 | \item{label}{The name of the data table within the hdf5 file} 15 | 16 | \item{filePath}{optional path to the location where exported files should be written (if applicable).
Default is \code{getwd()}} 17 | } 18 | \description{ 19 | Write feature data frame to an hdf5 file 20 | } 21 | 22 | -------------------------------------------------------------------------------- /tests/testthat.R: -------------------------------------------------------------------------------- 1 | library(testthat) 2 | library(visdom) 3 | 4 | test_check("visdom") 5 | -------------------------------------------------------------------------------- /tests/testthat/test_util-census.R: -------------------------------------------------------------------------------- 1 | library(visdom) 2 | library(acs) 3 | context("Integration test for merging census dataframes") 4 | 5 | check_acs_key = function() { 6 | if( ! acs::api.key.exists() ) { 7 | skip(paste( 8 | "A census API key has not been registered with the acs package. Skipping test.", 9 | "To obtain a census API key, go to http://api.census.gov/data/key_signup.html", 10 | "To register a census API key, do: acs::api.key.install(key='PASTE_YOUR_KEY_HERE')")) 11 | } 12 | } 13 | 14 | test_that("acs downloading & mergeCensus()", { 15 | check_acs_key() 16 | geography = c(acs::geo.make(zip.code='94709'), acs::geo.make(zip.code='94710')) 17 | censusStats = loadACS(geography = geography) 18 | df = data.frame("ZCTA" = as.numeric(censusStats$ZCTA), "fake_column" = as.numeric(censusStats$mean_fam_income) / 2.0) 19 | df = mergeCensus(df, censusStats=censusStats) 20 | expect_equal(df$fake_column, as.numeric(df$mean_fam_income) / 2.0) 21 | }) 22 | 23 | test_that("acs cache speedup", { 24 | check_acs_key() 25 | old_cache_path = R.cache::getCacheRootPath() 26 | cache_dir = file.path(tempdir(), "visdom_test_R.cache") 27 | dir.create(cache_dir, recursive=T, showWarnings=F) 28 | R.cache::setCacheRootPath(cache_dir) 29 | geography = c(acs::geo.make(zip.code='94709'), acs::geo.make(zip.code='94710')) 30 | time1 = system.time(loadACS(geography=geography)) 31 | time2 = system.time(loadACS(geography=geography)) 32 | expect_gt(time1['elapsed'], time2['elapsed']) 33 | # Clean up: remove the temporary cache directory and restore the previous cache path 34 | unlink(cache_dir, recursive=TRUE) 35 | R.cache::setCacheRootPath(old_cache_path) 36 | }) 37 | -------------------------------------------------------------------------------- /vignettes/FAQ.rmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "Frequently Asked Questions" 3 | author: "Sam Borgeson" 4 | date: "`r Sys.Date()`" 5 | output: rmarkdown::html_vignette 6 | vignette: > 7 | %\VignetteIndexEntry{FAQ} 8 | %\VignetteEngine{knitr::rmarkdown} 9 | %\VignetteEncoding{UTF-8} 10 | --- 11 | 12 | This FAQ will grow with examples of how to resolve common issues or perform common tasks. 13 | 14 | *** 15 | 16 | **Q. When authoring a data source, customerID is my original customer ID and id is the internal id for VISDOM. Should they always be in the 'cust_id' and 'meter_id' format found in TestDataSource? And what is the actual difference? Does this mean you can have different IDs (for example different smart meters) per customer? Or how do I interpret these differences?** 17 | 18 | A. As the author of the data source, you get to choose what set of ids you want to key off of. You just need to be consistent. So if you are using meter ids, you know that the data is associated with the meter over time regardless of who was occupying the house, but if it is customer ids then the data will be some span of time from a specific meter. As long as the correct ids are associated with the meter data you want to use (i.e.
getIds returns the ids that you can use to query for specific meter data), it will work. The actual values of the ids can be whatever you like, but you should ensure that they are character strings even if they are plausibly numerical. We have seen too many cases where there are leading zeros that get trimmed or the "number" of the id is too long for R's ints and the ids don't properly match up anymore. 19 | 20 | *** 21 | 22 | **Q. Time formats. How does VISDOM deal with different time resolutions between readings (1 min vs 1 h)?** 23 | 24 | A. VISDOM currently supports 15 minute and 1 hour readings, in the format of either 24 or 96 columns of data per day. Other timing is technically possible, but the official recommendation is to use your data source to sample it down to one of these intervals for now. 25 | 26 | *** 27 | 28 | **Q. What timezone should be used in input data?** 29 | 30 | A. Just be consistent. We usually try for the local timezone of the data, so that it is easier to have an intuitive feel for what you are seeing (i.e. 8am is breakfast time) and so it is more possible to sync with local weather data. 31 | 32 | *** 33 | 34 | **Q. How does VISDOM account for daylight saving time?** 35 | 36 | A. With the 24/96 column formats it ignores the 25th hour of fall back and treats the 24th hour of spring forward as missing data. Note that R is pretty picky about DST and crabby about dates in general. If you see unusual shifts in data timing around March or November in your plots, chances are good that DST is not being handled well. 37 | 38 | *** 39 | 40 | **Q. In the standard meter data format, what hour does each column correspond to?** 41 | 42 | A. Your column headers don't have to match these, but referring to the TestData output, the first column, H1, is 12-1am (i.e. the average of consumption up to but excluding 1am), so H10 is 9-10am, and H24 is 11pm-12am. 43 | 44 | *** 45 | 46 | **Q. If we want to compare different datasets (for example conventional and high performance buildings), is there currently a way to assign and compare different groups of customers/buildings?** 47 | 48 | A. There isn't a built-in idea of two groups, but if your groups are in the same data set, you can just add a custom feature that returns each customer's group assignment as a feature and then you will be able to slice and dice later. If they are literally from different data sets, you will compute their features separately and maybe combine the feature data frames in R using `rbind()`, assuming you have an identical set of features for both. Or you can use `plyr::rbind.fill()` to ensure that divergent feature sets are matched up. 49 | 50 | *** 51 | 52 | **Q. Data preparation: should missing values be NA or 0?** 53 | 54 | A. Missing values should be NAs. 0 is (or can be) a legitimate reading. 55 | 56 | *** 57 | 58 | **Q. What method should be used to impute missing values (built in or in preprocessing)?** 59 | 60 | A. That is up to you - if you want, your data source can impute values on the fly before returning meter data, or you can do a batch process up front and write the results to a new database table that you will then read from with your data source. The latter will be preferable if you are worried about CPU time during feature calculations, but this won't be a concern unless you have many ids' worth of data. 61 | 62 | *** 63 | 64 | **Q. For what size of total dataset is a database recommended or required?** 65 | 66 | A.
If you are hoping to load data directly from CSVs or RData or something, you are only limited by the RAM of your computer, so you should just try it and see. Thousands of customers should be no problem to load into memory together. 67 | 68 | *** 69 | 70 | **Q. Load Shape Clustering: Is the clustering profile library rich enough to expect it to perform well on customer data that also includes industrial customers and street lighting?** 71 | 72 | A. First of all, the load shape clustering algorithms are protected by a patent and are therefore available separately from the rest of the VISDOM code. You need to ask for permission to work with that code, but non-commercial licensing is free. As for separating data for clustering, this depends on your intended use. Typically you would separate residential, commercial, industrial, etc. and fit clusters separately. Even within commercial there is so much diversity of use that it can be good to do separate fits for each sub-type (i.e. NAICS or other codes). 73 | 74 | *** 75 | 76 | **Q. Can we filter out different types of consumers (like residential, street lighting, commercial buildings, etc.)?** 77 | 78 | A. Yes. You can add an additional parameter to your data source functions that return the lists of ids and meter data to restrict the ids to the ones you are interested in, i.e. if you implement the DATA_SOURCE functions to respond to such a parameter, you can call the functions with that parameter. 79 | 80 | *** 81 | 82 | **Q. DATA_CACHE: Our data is not in a database but loaded from RData files. Is there still a need to use the cache functionality?** 83 | 84 | A. No, the cache isn't necessary. In fact it is pretty tightly integrated into the database access code, so no DB means you probably don't need a cache. Also, the cache is just RData files, so you effectively are already using "cached" data... 85 | 86 | *** 87 | 88 | -------------------------------------------------------------------------------- /vignettes/advanced_usage.rmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "Advanced Usage" 3 | author: "Sam Borgeson" 4 | date: "`r Sys.Date()`" 5 | output: rmarkdown::html_vignette 6 | vignette: > 7 | %\VignetteIndexEntry{Advanced Usage} 8 | %\VignetteEngine{knitr::rmarkdown} 9 | %\VignetteEncoding{UTF-8} 10 | --- 11 | 12 | TBD. This will contain examples of advanced usage options once the basic documentation is completed and stable. 13 | -------------------------------------------------------------------------------- /vignettes/authoring_data_source.rmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "Authoring Data Sources" 3 | author: "Sam Borgeson" 4 | date: "`r Sys.Date()`" 5 | output: rmarkdown::html_vignette 6 | vignette: > 7 | %\VignetteIndexEntry{Authoring Data Sources} 8 | %\VignetteEngine{knitr::rmarkdown} 9 | %\VignetteEncoding{UTF-8} 10 | --- 11 | 12 | #Implementing your own data source 13 | 14 | In VISDOM, a data source is an adapter between your customer and meter data and the internal data formats used by VISDOM functions. To use VISDOM with your own data, you will need to author a DataSource object that maps between the formatting of your data and the data structures used by VISDOM by implementing all the relevant functions stubbed out by `DataSource()` in `util-dataSource.R`. This is the key step and main prerequisite for using VISDOM. You will typically need to set up data access (i.e.
to a SQL database if applicable - see `util-dbUtil.R` - or figure out how you will be loading your data from disk or elsewhere), and write the code to perform the queries or other data access steps as appropriate to load, format, and return your data in the VISDOM standard format expected to come out of a data source. You can see the DataSource implemented for testing purposes in the file `testDataSource.R` in the R directory of the package. 15 | 16 | Assigning your data source to the global variable `DATA_SOURCE` (i.e. `DATA_SOURCE = YourDataSource()`) configures it for use by VISDOM. 17 | 18 | See the entry for the data parameter in the help for MeterDataClass: 19 | 20 | ```{r eval=F} 21 | library(visdom) 22 | ?MeterDataClass 23 | ``` 24 | 25 | 26 | The MeterDataClass object also does the weather data alignment. It matches and interpolates available weather data (from DATA_SOURCE$getWeatherData() ) to the dates associated with the meter data from getAllData. 27 | 28 | ##Data formats 29 | 30 | ```{r setup} 31 | library(visdom) 32 | DATA_SOURCE = TestData(100) 33 | ``` 34 | 35 | 1. Meter data for a single customer. Note that id (i.e. the VISDOM internal identifier for the meter), geocode, and dates (of type Date - without time) are required, as are 24 hourly or 96 15-minute meter observations per day. customerID (i.e. the owner of the meter) and other fields can be added, but are not required. 36 | 37 | ```{r customer_data_sample} 38 | custdata = DATA_SOURCE$getMeterData(id=1) 39 | head(custdata,2) 40 | dim(custdata) 41 | ``` 42 | 43 | 2. Meter data from multiple customers. 44 | 45 | ```{r meter_data_sample} 46 | # this is all data for a given geocode (i.e. zip code) 47 | geosample = DATA_SOURCE$getAllData( geocode='94305' ) 48 | head(geosample,2) 49 | dim(geosample) 50 | unique(geosample$id) 51 | ``` 52 | 53 | 3. Weather data: dates (required, of type POSIXct, with time at whatever intervals observations are available in, ideally hourly or finer), temperaturef (required), pressure, hourlyprecip, dewpoint. 54 | 55 | ```{r weather_attributes} 56 | weather = DATA_SOURCE$getWeatherData(geocode='94305') 57 | head(weather,2) 58 | dim(weather) 59 | ``` 60 | 61 | 4. Misc capabilities 62 | 63 | ```{r geo_attributes} 64 | DATA_SOURCE$getGeoForId('meter_1') 65 | class(DATA_SOURCE$getGeoForId('meter_1')) 66 | ``` 67 | 68 | ```{r ids_attributes} 69 | length(DATA_SOURCE$getIds()) # all the meter ids tracked by the data source 70 | class(DATA_SOURCE$getIds()) 71 | head(DATA_SOURCE$getIds()) 72 | ``` 73 | 74 | ##Testing your data source 75 | 76 | You can call `DATA_SOURCE$getMeterDataClass(id=123)` on your DataSource, replacing 123 with an appropriate id from your data set (i.e. using the provided default implementation of `getMeterDataClass()`), and it will hit your data source for all relevant data and instantiate a MeterDataClass object with associated weather data and a WeatherClass object. Until that call succeeds, you will be getting errors related to deficiencies in your DataSource, so it is a good guide to what else you need to implement.
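For example, with the `TestData` source configured above, a minimal smoke test might look like the following sketch (the id lookup via `getIds()` is just for illustration; substitute an id from your own data set):

```{r eval=F}
# Grab any id the data source knows about, build the full MeterDataClass
# object (meter data aligned with weather data), and plot it.
testId = DATA_SOURCE$getIds()[1]
md = DATA_SOURCE$getMeterDataClass(testId)
plot(md)
```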
77 | 78 | You can also exercise your data source with the function `sanityCheckDataSource()`: 79 | 80 | ```{r sanityCheckDataSource, fig.width=6, fig.height=6} 81 | 82 | # runs a standard set of data checks 83 | # with chatty output 84 | sanityCheckDataSource(DATA_SOURCE) 85 | 86 | ``` 87 | 88 | 89 | 90 | Finally, you can probe specific data source functions with test code like this: 91 | 92 | ```{r exampleTestCode, eval=F} 93 | library(visdom) 94 | 95 | DATA_SOURCE = YourDataSource() 96 | 97 | # if your data source is configured to access a database 98 | DATA_SOURCE$run.query('select count(*) from meter_15min_user') 99 | 100 | DATA_SOURCE$run.query('select distinct zip5 from weather') 101 | 102 | # most important functions: 103 | # ------------------------- 104 | 105 | # primary geographic codes associated with meters, 106 | # typically a list of zip codes or census blocks 107 | geos = DATA_SOURCE$getGeocodes() # all geographic regions 108 | ids = DATA_SOURCE$getIds() # all known ids 109 | DATA_SOURCE$getIds(geos[1]) # all ids from the first geocoded location 110 | 111 | DATA_SOURCE$getAllData(geos[1]) # all meter data from the first geocoded region 112 | DATA_SOURCE$getGeoForId(ids[1]) # get the geo code for a specific meterId 113 | 114 | DATA_SOURCE$getMeterData(ids[1]) # returns meter data for a specific meterId 115 | md = DATA_SOURCE$getMeterDataClass(ids[1]) # returns a MeterDataClass object, with weather data, etc. 116 | 117 | DATA_SOURCE$getWeatherData(geos[1]) # data frame of tabular weather data for a geo location 118 | 119 | # these use DATA_SOURCE internally 120 | w = WeatherClass(geos[1],doMeans=F,useCache=F,doSG=F) 121 | md = MeterDataClass(ids[1],useCache=F) 122 | plot(md) 123 | 124 | # functions of secondary importance (you will likely know if you need these) 125 | # --------------------------------- 126 | DATA_SOURCE$getGeoCounts() 127 | # these can include census statistics and other supplements to customer meter data 128 | DATA_SOURCE$getGeoMetaData(geos[1]) 129 | 130 | # gas data is optional 131 | DATA_SOURCE$getAllGasData() 132 | DATA_SOURCE$getGasMeterData(geo=geos[1]) 133 | DATA_SOURCE$getGasMeterData(id=ids[1]) 134 | 135 | ``` 136 | -------------------------------------------------------------------------------- /vignettes/bootstrap_devel_environment.rmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "Developing the VISDOM Module" 3 | author: "Sam Borgeson" 4 | date: "`r Sys.Date()`" 5 | output: rmarkdown::html_vignette 6 | vignette: > 7 | %\VignetteIndexEntry{Developing VISDOM} 8 | %\VignetteEngine{knitr::rmarkdown} 9 | %\VignetteEncoding{UTF-8} 10 | --- 11 | 12 | First ensure that you have cloned the repository into a working directory on your local machine, choosing one of: 13 | 14 | ```{bash eval=F} 15 | cd /dev 16 | git clone git@github.com:convergenceda/visdom.git 17 | git clone https://github.com/convergenceda/visdom.git 18 | ``` 19 | 20 | Then, using that location as your working directory (here we assume `/dev/visdom`), load requirements for package development and install from source.
21 | 22 | #Load requirements 23 | 24 | ```{r eval=F} 25 | setwd('/dev/visdom') 26 | 27 | install.packages(c("devtools", "roxygen2", "testthat", "knitr")) 28 | ``` 29 | 30 | Version check: 31 | 32 | ```{r eval=F} 33 | rstudioapi::isAvailable("0.99.149") 34 | ``` 35 | 36 | For some reason, withr (a devtools dependency) will not install for RRO 3.2.2, with a message that there is no 3.2.2 version of withr. 37 | This is probably an ephemeral issue, but install it from source for now: 38 | 39 | ```{r eval=F} 40 | devtools::install_github("jimhester/withr") 41 | ``` 42 | 43 | Install Wickham's latest devtools from source. Note that devtools can't install itself on Windows (!!); 44 | https://github.com/hadley/devtools/issues/503 explains. 45 | 46 | ```{r eval=F} 47 | devtools::install_github("hadley/devtools") 48 | ``` 49 | 50 | #Install VISDOM as a local package from your source 51 | 52 | Ensure that your getwd() is the visdom source dir and then call: 53 | 54 | ```{r eval=F} 55 | devtools::document() 56 | devtools::build_vignettes() 57 | ?visdom 58 | ?MeterDataClass 59 | ?WeatherClass 60 | 61 | ``` 62 | 63 | Follow this by installing the package from source, which should resolve and install all the package dependencies, 64 | with some exceptions, like HDF5 support. 65 | 66 | ```{r eval=F} 67 | devtools::install(build_vignettes = T) 68 | ``` 69 | 70 | 71 | Test your install with a fresh R session: 72 | ```{r eval=F} 73 | library(visdom) 74 | ?visdom 75 | browseVignettes(package='visdom') 76 | ``` 77 | -------------------------------------------------------------------------------- /vignettes/customer_data_objects.rmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "Working With Meter Data" 3 | author: "Sam Borgeson" 4 | date: "`r Sys.Date()`" 5 | output: rmarkdown::html_vignette 6 | vignette: > 7 | %\VignetteIndexEntry{Working With Meter Data} 8 | %\VignetteEngine{knitr::rmarkdown} 9 | %\VignetteEncoding{UTF-8} 10 | --- 11 | 12 | Documentation on loading MeterDataClass objects and exploring their data, functions, and built-in features. 13 | -------------------------------------------------------------------------------- /vignettes/example_baseline.R: -------------------------------------------------------------------------------- 1 | #VISDOM_R_PATH = 'C:/dev/VISDOM-R/eneRgy/R' 2 | 3 | #DATASOURCE_PATH = '../../../SSSL_datasources/' 4 | 5 | #setwd( file.path(VISDOM_R_PATH) ) 6 | 7 | source('util-timer.R') # adds tic() and toc() functions 8 | source('iterator.R') # functions to iterate through collections of meters 9 | source('classes-customer.R') # loads MeterDataClass and WeatherClass 10 | 11 | 12 | #source(file.path(DATASOURCE_PATH,'/pgeResDataAccess.R')) # provides data source implementation for pge_res data 13 | 14 | #QUERY_CACHE = 'e:/dev/pge_collab/EnergyAnalytics/batch/QUERY_CACHE_STANFORD/' 15 | #DATA_SOURCE = PgeResData(dbConfig=file.path(DATASOURCE_PATH,'pge_res_DB.cfg'), queryCache=QUERY_CACHE) # Use PGE res data for analysis 16 | #DATA_SOURCE = PgeSmbData(dbConfig=file.path(DATASOURCE_PATH,'pge_smb_DB.cfg'), queryCache=QUERY_CACHE) # Use PGE SMB data for analysis 17 | 18 | 19 | source('baseline.R') # baselining functions 20 | 21 | id = 34543252435432 22 | zip = 93304 23 | startTime = as.POSIXct('2013-01-01 20:00') 24 | endTime = as.POSIXct('2013-01-01 23:00') 25 | daysBefore = 100 26 | daysAfter = 10 27 | 28 | # 1.
Load a MeterDataClass instance 29 | meterData = MeterDataClass(820735863,94610,useCache=T,doSG=F); plot(meterData,type='hourly',estimates=hourlyChangePoint(regressorDF(meterData),as.list(1:24),reweight=F)) # heat, no cooling 30 | meterData = MeterDataClass(553991005,93304,useCache=T,doSG=F); plot(meterData,type='hourly',estimates=hourlyChangePoint(regressorDF(meterData),as.list(1:24),reweight=F)) # very clear cooling 24x7 31 | 32 | # 2. Pass meter data into baseline function with linear regression 33 | result.regression = baseline.regression(meterData,startTime,endTime,daysBefore,daysAfter) 34 | 35 | # 3. Pass meter data into baseline function with averaging over past 10 days 36 | result.average = baseline.average(meterData,startTime,endTime,daysBefore,daysAfter) 37 | 38 | # 4. Pass meter data into baseline function with Gaussian process 39 | result.gp = baseline.gp(meterData,startTime,endTime,daysBefore,daysAfter) 40 | 41 | # 5. Compare and plot the results.... 42 | 43 | 44 | 45 | 46 | -------------------------------------------------------------------------------- /vignettes/example_census.R: -------------------------------------------------------------------------------- 1 | library(visdom) 2 | # use the local census data path 3 | a = data.frame(zippy=c(94611,93304)) # create a data frame with a column of zip codes. 4 | aPlus = mergeCensus(a,zipCol = "zippy") # uses zip column (aka "zippy" in this example) 5 | # to match ZCTA from census and return all the 6 | # ACS 2011 stats for the zips in question 7 | names(aPlus) # look at all the census data columns added 8 | aPlus # print everything out 9 | 10 | # see util-census.R for more useful census data features. -------------------------------------------------------------------------------- /vignettes/example_iterator_usage.rmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "Working With Samples from Many Meters" 3 | author: "Sam Borgeson" 4 | date: "`r Sys.Date()`" 5 | output: rmarkdown::html_vignette 6 | vignette: > 7 | %\VignetteIndexEntry{Working With Samples from Many Meters} 8 | %\VignetteEngine{knitr::rmarkdown} 9 | %\VignetteEncoding{UTF-8} 10 | --- 11 | 12 | Examples of iterator function usage TBD. Please see the vignette on feature extraction for a working example. 13 | ```{r eval=F} 14 | 15 | 16 | ``` 17 | -------------------------------------------------------------------------------- /vignettes/install_visdom.rmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "Installing VISDOM" 3 | author: "Sam Borgeson" 4 | date: "`r Sys.Date()`" 5 | output: rmarkdown::html_vignette 6 | vignette: > 7 | %\VignetteIndexEntry{Installing VISDOM} 8 | %\VignetteEngine{knitr::rmarkdown} 9 | %\VignetteEncoding{UTF-8} 10 | --- 11 | 12 | #Simplest approach 13 | 14 | There is support in R for installing and updating packages directly from their version control repositories. If you plan to use VISDOM without altering any of its source code, this is the preferred method of installation. 15 | 16 | ##Install VISDOM directly from GitHub source 17 | 18 | One useful feature of R documentation is its ability to present formatted examples of code usage and outputs, called vignettes. When installing from GitHub, these vignettes are not built by default, largely due to their additional R module dependencies (and their runtimes in some cases). Still, it is highly recommended that you do build and refer to the vignettes.
Note that any missing packages that R warns you about can likely be installed using the `install.packages` command. For example, if a package called `plyr` is not installed and you try to run code that depends on it, you will get a message like: `there is no package called ‘plyr’`. You can address this by calling `install.packages('plyr')` and repeating until your dependencies are all loading correctly. 19 | 20 | ```{r eval=F} 21 | install.packages(c("devtools")) 22 | 23 | # build_vignettes=T will build vignettes that serve as examples of usage; 24 | # however, this will invoke other dependencies that might complicate things. 25 | # It can be set to F to just get the core code installed, but much of the documentation 26 | # effort to date has been in the form of vignettes. 27 | devtools::install_github("convergenceda/visdom", build_vignettes=T ) 28 | ``` 29 | 30 | ##Install VISDOM directly from local source 31 | 32 | If you are planning to read through, experiment with, or update the VISDOM source code itself, you will want a local copy of the repository on your machine. 33 | 34 | First ensure that you have cloned the repository into a working directory on your local machine, choosing one of: 35 | 36 | ```{bash eval=F} 37 | cd /dev 38 | git clone git@github.com:convergenceda/visdom.git 39 | git clone https://github.com/convergenceda/visdom.git 40 | ``` 41 | 42 | If you are working within a corporate firewall, it may be necessary to use a proxy server account to connect to GitHub. Here you will need to replace `user` and `pass` with your username and password and `proxy.server` with either the name or IP address of your proxy server. The port `8080` may also be different for your specific configuration. 43 | 44 | ```{bash eval=F} 45 | git config --global http.proxy http://user:pass@proxy.server:8080 46 | git config --global https.proxy https://user:pass@proxy.server:8080 47 | ``` 48 | 49 | Then, using that location as your working directory (here we assume `/dev/visdom`), load requirements for package development and install from source. 50 | 51 | 52 | ```{r eval=F} 53 | 54 | install.packages(c("devtools")) 55 | 56 | setwd('/dev/visdom') 57 | 58 | devtools::install(build_vignettes = T) 59 | 60 | ``` 61 | 62 | ##Confirming that the package and documentation are in place 63 | Now check that you can load VISDOM and use it. 64 | 65 | ```{r eval=F} 66 | library(visdom) 67 | ``` 68 | 69 | Browse through the available vignettes. 70 | 71 | ```{r eval=F} 72 | # if you built them above, you can browse well formatted 73 | # code vignettes that provide example usage. 74 | browseVignettes('visdom') 75 | ``` 76 | 77 | Or the original/old school way. 78 | ```{r eval=F} 79 | # to list 80 | vignette(package='visdom') 81 | # to display a specific one as help 82 | vignette('example_feature_extraction',package='visdom') 83 | ``` 84 | -------------------------------------------------------------------------------- /vignettes/weather_data_objects.rmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "Working With Weather Data" 3 | author: "Sam Borgeson" 4 | date: "`r Sys.Date()`" 5 | output: rmarkdown::html_vignette 6 | vignette: > 7 | %\VignetteIndexEntry{Working With Weather Data} 8 | %\VignetteEngine{knitr::rmarkdown} 9 | %\VignetteEncoding{UTF-8} 10 | --- 11 | 12 | Examples illustrating how to gather and work with weather data.
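Until fuller examples are written here, the following is a minimal sketch based on calls that appear in the data source authoring vignette; the bundled `TestData` source and the geocode `'94305'` are stand-ins for your own data source and geocodes.

```{r eval=F}
library(visdom)

# Use the synthetic test data source; replace with DATA_SOURCE = YourDataSource()
DATA_SOURCE = TestData(100)

# Tabular weather observations for a geocode: POSIXct dates plus columns such as
# temperaturef, pressure, hourlyprecip, and dewpoint.
weather = DATA_SOURCE$getWeatherData(geocode='94305')
head(weather, 2)

# Wrap the same observations in a WeatherClass object (arguments as used in the
# data source authoring examples).
w = WeatherClass('94305', doMeans=F, useCache=F, doSG=F)
```
--------------------------------------------------------------------------------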