.
675 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 |
2 |
3 |
4 |
5 | ### Yet Another General Regression (YAG[e]R)
6 | ### Neural Network
7 |
8 | #### Introduction
9 |
10 | Compared with other types of neural networks, [General Regression Neural Network (Specht, 1991)](https://pdfs.semanticscholar.org/45f4/3abc49a8a60e6b43ddbda5af9fc6c88d663d.pdf) is advantageous in several aspects.
11 |
12 | 1. Being an universal approximation function, GRNN has only one tuning parameter to control the overall generalization
13 | 2. The network structure of GRNN is surprisingly simple, with only one hidden layer and the number of neurons equal to the number of training samples.
14 | 3. GRNN is always able to converge globally and won’t be trapped by local solutions.
15 | 4. The training of GRNN is a simple 1-pass, regardless of the sample size, and doesn’t require time-consuming iterations.
16 | 5. Since any projected value of GRNN is the weighted average of training samples, predictions are bounded by the observed range.
17 |
18 | The grnn package (https://cran.r-project.org/web/packages/grnn/index.html), which has not been updated since 2013, is the only implementation of GRNN on CRAN and was designed elegantly with a parsimonious set of functions and lots of opportunities for potential improvements.
19 |
20 | The YAGeR project (https://github.com/statcompute/yager) is my attempt to provide a R implementation of GRNN, with several enhancements.
21 |
22 | 1. While the training function **grnn.fit()** is very similar to learn() and smooth() in the grnn package. three functions were designed to provide GRNN projections. The **grnn.predone()** function generates one projected value based on an input vector. Both **grnn.predict()** and **grnn.parpred()** functions generate a vector of projected values based on an input matrix. The only difference is that **grnn.parpred()** runs in parallel and therefore can be 3 times faster than **grnn.predict()** on my 4-core workstation.
23 | 2. While tuning the only hyper-parameter is the key in GRNN training, there are two functions in the GRnnet project to search for the optimal parameter through the n-fold cross validation, including **grnn.search_rsq()** for numeric outcomes and **grnn.search_auc()** for binary outcomes.
24 | 3. In **grnn.predone()** function, while the default projection is based on the Euclidean distance, there is an option to calculate the GRNN projection based on the Manhattan distance as well for the sake of computational simplicity (Specht, 1991).
25 |
26 | #### Why Use GRNN?
27 |
28 | In the banking industry, GRNN can be useful in several areas. First of all, it can be employed as the replacement of splines to approximate the term structure of interest rates. Secondly, like other neural networks, it can be used in Fraud Detection and Anti-Money Laundering given its flexibility. At last, in the credit risk modeling, it can also be used to develop performance benchmarks and rapid prototypes for scorecards or Expected Loss models due to the simplicity.
29 |
30 | #### Package Dependencies
31 | R version 3.6, base, stats, parallel, MLmetrics, randtoolbox, lhs
32 |
33 | #### Installation
34 |
35 | Download the [yager_0.1.1.tar.gz](https://github.com/statcompute/yager/blob/master/yager_0.1.1.tar.gz) file, save it in your working directory, and then install the package as below.
36 |
37 | ```r
38 | install.packages("yager_0.1.1.tar.gz", repos = NULL, type = "source")
39 | ```
40 |
41 | Alternatively, you can simply install from CRAN.
42 |
43 | ```r
44 | install.packages("yager")
45 | ```
46 |
47 | #### Functions
48 |
49 | ```txt
50 | YAGeR
51 | |
52 | |-- 1D Random Number Generators
53 | | |-- gen_unifm(min = 0, max = 1, n, seed = 1)
54 | | |-- gen_sobol(min = 0, max = 1, n, seed = 1)
55 | | `-- gen_latin(min = 0, max = 1, n, seed = 1)
56 | |
57 | |-- Training
58 | | `-- grnn.fit(x, y, w = rep(1, length(y)), sigma = 1)
59 | |
60 | |-- Prediction
61 | | |-- grnn.predone(net, x, type = 1)
62 | | |-- grnn.predict(net, x)
63 | | `-- grnn.parpred(net, x)
64 | |
65 | |-- Parameter Tuning
66 | | |-- grnn.search_rsq(net, sigmas, nfolds = 4, seed = 1)
67 | | |-- grnn.search_auc(net, sigmas, nfolds = 4, seed = 1)
68 | | `-- grnn.optmiz_auc(net, lower = 0, upper, nfolds = 4, seed = 1, method = 1)
69 | |
70 | |-- Variable Importance
71 | | |-- grnn.x_imp(net, i, class = F)
72 | | |-- grnn.imp(net, class = F)
73 | | |-- grnn.x_pfi(net, i, class = F, ntry = 1e3, seed = 1)
74 | | `-- grnn.pfi(net, class = F, ntry = 1e3, seed = 1)
75 | |
76 | `-- Variable Effect
77 | |-- grnn.margin(net, i, plot = T)
78 | `-- grnn.partial(net, i, plot = T)
79 | ```
80 |
81 | #### Example
82 | It has been mentioned previously that GRNN is an ideal approach employed to develop performance benchmarks for a variety of risk models. People might wonder what the purpose of performance benchmarks is and why we would even need one at all. Sometimes, a model developer had to answer questions about how well the model would perform even before completing the model. Likewise, a model validator also wondered whether the model being validated has a reasonable performance given the data used and the effort spent. As a result, the performance benchmark, which could be built with the same data sample but an alternative methodology, is called for to address aforementioned questions.
83 |
84 | While the performance benchmark can take various forms, including but not limited to business expectations, industry practices, or vendor products, a model-based approach should possess following characteristics:
85 |
86 | - Quick prototype with reasonable efforts
87 | - Comparable baseline with acceptable outcomes
88 | - Flexible framework without strict assumptions
89 | - Practical application to broad domains
90 |
91 | With both empirical and conceptual advantages, GRNN is able to accommendate each of abovementioned requirements and thus can be considered an approriate candidate that might potentially be employed to develop performance benchmarks for a wide variety of models.
92 |
93 | Below is an example illustrating how to use GRNN to develop a benchmark model for the logistic regression shown in https://github.com/statcompute/MonotonicBinning#example.
94 |
95 | ```r
96 | df <- readRDS("df.rds")
97 | source("mob.R")
98 | library(yager)
99 |
100 | # PRE-PROCESS THE DATA WITH MOB PACKAGE
101 | bin_out <- batch_bin(df, 3)
102 | bin_out$BinSum[order(-bin_out$BinSum$iv), ]
103 | # var nbin unique miss min median max ks iv
104 | # bureau_score 34 315 315 443 692.5 848 35.2651 0.8357
105 | # tot_rev_line 20 3617 477 0 10573.0 205395 26.8943 0.4442
106 | # age_oldest_tr 25 460 216 1 137.0 588 20.3646 0.2714
107 | # tot_derog 7 29 213 0 0.0 32 20.0442 0.2599
108 | # ltv 17 145 1 0 100.0 176 16.8807 0.1911
109 | # rev_util 12 101 0 0 30.0 100 16.9615 0.1635
110 | # tot_tr 15 67 213 0 16.0 77 17.3002 0.1425
111 | # tot_rev_debt 8 3880 477 0 3009.5 96260 8.8722 0.0847
112 | # tot_rev_tr 4 21 636 0 3.0 24 9.0779 0.0789
113 | # tot_income 17 1639 5 0 3400.0 8147167 10.3386 0.0775
114 | # tot_open_tr 7 26 1416 0 5.0 26 6.8695 0.0282
115 |
116 | # PERFORMAN WOE TRANSFORMATIONS
117 | df_woe <- batch_woe(df, bin_out$BinLst)
118 |
119 | # PROCESS AND STANDARDIZE THE DATA WITH ZERO MEAN AND UNITY VARIANCE
120 | Y <- df$bad
121 | X <- scale(df_woe$df[, -1])
122 | Reduce(rbind, Map(function(c) data.frame(var = colnames(X)[c], mean = mean(X[, c]), variance = var(X[, c])), seq(dim(X)[2])))
123 | # var mean variance
124 | #1 woe.tot_derog 2.234331e-16 1
125 | #2 woe.tot_tr -2.439238e-15 1
126 | #3 woe.age_oldest_tr -2.502177e-15 1
127 | #4 woe.tot_open_tr -2.088444e-16 1
128 | #5 woe.tot_rev_tr -4.930136e-15 1
129 | #6 woe.tot_rev_debt -2.174607e-16 1
130 | #7 woe.tot_rev_line -8.589630e-16 1
131 | #8 woe.rev_util -8.649849e-15 1
132 | #9 woe.bureau_score 1.439904e-15 1
133 | #10 woe.ltv 3.723332e-15 1
134 | #11 woe.tot_income 5.559240e-16 1
135 |
136 | # INITIATE A GRNN OBJECT
137 | net1 <- grnn.fit(x = X, y = Y)
138 | # CROSS-VALIDATION TO CHOOSE THE OPTIONAL SMOOTH PARAMETER
139 | S <- gen_sobol(min = 0.5, max = 1.5, n = 10, seed = 2019)
140 | cv <- grnn.search_auc(net = net1, sigmas = S, nfolds = 5)
141 | # $test
142 | # sigma auc
143 | #1 1.4066449 0.7543912
144 | #2 0.6205723 0.7303415
145 | #3 1.0710133 0.7553075
146 | #4 0.6764866 0.7378430
147 | #5 1.1322939 0.7553664
148 | #6 0.8402438 0.7507192
149 | #7 1.3590402 0.7546164
150 | #8 1.3031974 0.7548670
151 | #9 0.7555905 0.7455457
152 | #10 1.2174429 0.7552097
153 | # $best
154 | # sigma auc
155 | #5 1.132294 0.7553664
156 |
157 | # REFIT A GRNN WITH THE OPTIMAL PARAMETER VALUE
158 | net2 <- grnn.fit(x = X, y = Y, sigma = cv$best$sigma)
159 | net2.pred <- grnn.parpred(net2, X)
160 |
161 | # BENCHMARK MODEL PERFORMANCE
162 | MLmetrics::KS_Stat(y_pred = net2.pred, y_true = df$bad)
163 | # 44.00242
164 | MLmetrics::AUC(y_pred = net2.pred, y_true = df$bad)
165 | # 0.7895033
166 |
167 | # LOGISTIC REGRESSION PERFORMANCE
168 | MLmetrics::KS_Stat(y_pred = fitted(mdl2), y_true = df$bad)
169 | # 42.61731
170 | MLmetrics::AUC(y_pred = fitted(mdl2), y_true = df$bad)
171 | # 0.7751298
172 |
173 | ```
174 |
175 | The function [grnn.margin()](https://github.com/statcompute/GRnnet/blob/master/code/grnn.margin.R) can also be employed to explore the marginal effect of each attribute in a GRNN.
176 |
177 |
178 |
179 |
180 |
181 |
182 |
183 |
--------------------------------------------------------------------------------
/code/08.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/statcompute/yager/181071cbb3e05630e1deacba65173807cf80aee2/code/08.jpg
--------------------------------------------------------------------------------
/code/GenRandom1D.R:
--------------------------------------------------------------------------------
1 | gen_unifm <- function(min = 0, max = 1, n, seed) {
2 | set.seed(seed)
3 | return(round(min + (max - min) * runif(n), 8))
4 | }
5 |
6 | gen_sobol <- function(min = 0, max = 1, n, seed) {
7 | return(round(min + (max - min) * randtoolbox::sobol(n, dim = 1, scrambling = 3, seed = seed), 8))
8 | }
9 |
10 | gen_latin <- function(min = 0, max = 1, n, seed) {
11 | set.seed(seed)
12 | return(round(min + (max - min) * c(lhs::randomLHS(n, k = 1)), 8))
13 | }
14 |
--------------------------------------------------------------------------------
/code/archive/yager_v1.R:
--------------------------------------------------------------------------------
1 | ###############################################################################
2 | # PACKAGE NAME: Yet Another General Regression (YAG[e]R) Neural Network
3 | # AUTHOR : WENSUI LIU
4 | # DISCLAIMER : THIS IS MY WEEKEND PROJECT AND NOT RELATED TO MY CURRENT WORK WITH MY EMPLOYER
5 | # IT IS FREE (AS FREE BEER) TO USE AND DISTRIBUTE
6 | ###############################################################################
7 |
8 | grnn.fit <- function(x, y, w = rep(1, length(y)), sigma = 1) {
9 | ### CHECK X MATRIX ###
10 | if (is.matrix(x) == F) stop("x needs to be a matrix.", call. = F)
11 | if (anyNA(x) == T) stop("NA found in x.", call. = F)
12 | ### CHECK Y VECTOR ###
13 | if (is.vector(y) == F) stop("y needs to be a vector.", call. = F)
14 | if (anyNA(y) == T) stop("NA found in y.", call. = F)
15 | if (length(y) != nrow(x)) stop("x and y need to share the same length.", call. = F)
16 | ### CHECK W VECTOR ###
17 | if (is.vector(w) == F) stop("w needs to be a vector.", call. = F)
18 | if (anyNA(w) == T) stop("NA found in w.", call. = F)
19 | if (length(w) != nrow(x)) stop("x and w need to share the same length.", call. = F)
20 | ### CHECK SIGMA ###
21 | if (sigma <= 0) stop("sigma needs to be positive", call. = F)
22 |
23 | gn <- structure(list(), class = "General Regression Neural Net")
24 | gn$x <- x
25 | gn$y <- y
26 | gn$w <- w
27 | gn$sigma <- sigma
28 | return(gn)
29 | }
30 |
31 | grnn.predone<- function(net, x, type = 1) {
32 | ### CHECK INPUT X VECTOR ###
33 | if (is.vector(x) == F) stop("x needs to be a vector.", call. = F)
34 | if (anyNA(x) == T) stop("NA found in x.", call. = F)
35 | if (length(x) != ncol(net$x)) stop("x dimension is not consistent with grnn.", call. = F)
36 | ### CHECK INPUT TYPE (CURRENTLY SUPPORTING 1 / 2) ###
37 | if (!(type %in% c(1, 2))) stop("the type is not supported.", call. = F)
38 |
39 | if (type == 1) {
40 | ### EUCLIDEAN DISTANCE BY DEFAULT ###
41 | num <- sum(net$w * net$y * exp(-Reduce(c, lapply(split(net$x, seq(nrow(net$x))), function(xi) sum((x - xi) ^ 2))) / (2 * (net$sigma ^ 2))))
42 | den <- sum(net$w * exp(-Reduce(c, lapply(split(net$x, seq(nrow(net$x))), function(xi) sum((x - xi) ^ 2))) / (2 * (net$sigma ^ 2))))
43 | } else if (type == 2) {
44 | ### MANHATTAN DISTANCE ###
45 | num <- sum(net$w * net$y * exp(-Reduce(c, lapply(split(net$x, seq(nrow(net$x))), function(xi) sum(abs(x - xi)))) / net$sigma))
46 | den <- sum(net$w * exp(-Reduce(c, lapply(split(net$x, seq(nrow(net$x))), function(xi) sum(abs(x - xi)))) / net$sigma ))
47 | }
48 | return(num / den)
49 | }
50 |
51 | grnn.predict <- function(net, x) {
52 | ### CHECK INPUT X MATRIX ###
53 | if (is.matrix(x) == F) stop("x needs to be a matrix.", call. = F)
54 | if (anyNA(x) == T) stop("NA found in x.", call. = F)
55 | if (ncol(x) != ncol(net$x)) stop("x dimension is not consistent with grnn.", call. = F)
56 |
57 | return(Reduce(c, lapply(split(x, seq(nrow(x))), function(x_) grnn.predone(net, x_))))
58 | }
59 |
60 | grnn.parpred <- function(net, x) {
61 | ### CHECK INPUT X MATRIX ###
62 | if (is.matrix(x) == F) stop("x needs to be a matrix.", call. = F)
63 | if (anyNA(x) == T) stop("NA found in x.", call. = F)
64 | if (ncol(x) != ncol(net$x)) stop("x dimension is not consistent with grnn.", call. = F)
65 |
66 | cls <- parallel::makeCluster(min(floor(nrow(x) / 3), parallel::detectCores() - 1), type = "PSOCK")
67 | obj <- c("net", "x", "grnn.predone", "grnn.predict")
68 | parallel::clusterExport(cls, obj, envir = environment())
69 | spx <- parallel::parLapplyLB(cls, parallel::clusterSplit(cls, seq(nrow(x))),
70 | function(c_) x[c_, ])
71 | rst <- parallel::parLapplyLB(cls, spx, function(x_) grnn.predict(net, x_))
72 | parallel::stopCluster(cls)
73 | return(Reduce(c, rst))
74 | }
75 |
76 | grnn.search_rsq <- function(net, sigmas, nfolds = 3, seed = 1) {
77 | set.seed(seed)
78 | fd <- caret::createFolds(seq(nrow(net$x)), k = nfolds)
79 |
80 | cv <- function(s) {
81 | rs <- Reduce(rbind,
82 | lapply(fd,
83 | function(f) data.frame(ya = net$y[unlist(f)],
84 | yp = grnn.predict(grnn.fit(net$x[unlist(-f), ], net$y[unlist(-f)], sigma = s),
85 | net$x[unlist(f), ]))))
86 | return(data.frame(sigma = s, r2 = MLmetrics::R2_Score(y_pred = rs$yp, y_true = rs$ya)))
87 | }
88 | cls <- parallel::makeCluster(min(nfolds, parallel::detectCores() - 1), type = "PSOCK")
89 | obj <- c("fd", "net", "grnn.fit", "grnn.predone", "grnn.predict")
90 | parallel::clusterExport(cls, obj, envir = environment())
91 | rst <- Reduce(rbind, parallel::parLapply(cls, sigmas, cv))
92 | parallel::stopCluster(cls)
93 | return(list(test = rst, best = rst[rst$r2 == max(rst$r2), ]))
94 | }
95 |
96 | grnn.search_auc <- function(net, sigmas, nfolds = 3, seed = 1) {
97 | set.seed(seed)
98 | fd <- caret::createFolds(seq(nrow(net$x)), k = nfolds)
99 |
100 | cv <- function(s) {
101 | rs <- Reduce(rbind,
102 | lapply(fd,
103 | function(f) data.frame(ya = net$y[unlist(f)],
104 | yp = grnn.predict(grnn.fit(net$x[unlist(-f), ], net$y[unlist(-f)], sigma = s),
105 | net$x[unlist(f), ]))))
106 | return(data.frame(sigma = s, auc = MLmetrics::AUC(y_pred = rs$yp, y_true = rs$ya)))
107 | }
108 | cls <- parallel::makeCluster(min(nfolds, parallel::detectCores() - 1), type = "PSOCK")
109 | obj <- c("fd", "net", "grnn.fit", "grnn.predone", "grnn.predict")
110 | parallel::clusterExport(cls, obj, envir = environment())
111 | rst <- Reduce(rbind, parallel::parLapply(cls, sigmas, cv))
112 | parallel::stopCluster(cls)
113 | return(list(test = rst, best = rst[rst$auc == max(rst$auc), ]))
114 | }
115 |
116 | grnn.optmiz_auc <- function(net, lower = 0, upper, nfolds = 3, seed = 1, method = 1) {
117 | if (class(net) != "General Regression Neural Net") stop("net needs to be a GRNN.", call. = F)
118 | set.seed(seed)
119 | fd <- caret::createFolds(seq(nrow(net$x)), k = nfolds)
120 |
121 | cv <- function(s) {
122 | cls <- parallel::makeCluster(min(nfolds, parallel::detectCores() - 1), type = "PSOCK")
123 | obj <- c("fd", "net", "grnn.fit", "grnn.predone", "grnn.predict")
124 | parallel::clusterExport(cls, obj, envir = environment())
125 | rs <- Reduce(rbind,
126 | parallel::parLapply(cls, fd,
127 | function(f) data.frame(ya = net$y[unlist(f)],
128 | yp = grnn.predict(grnn.fit(net$x[unlist(-f), ], net$y[unlist(-f)], sigma = s),
129 | net$x[unlist(f), ]))))
130 | parallel::stopCluster(cls)
131 | return(MLmetrics::AUC(y_pred = rs$yp, y_true = rs$ya))
132 | }
133 |
134 | if (method == 1) {
135 | rst <- optimize(f = cv, interval = c(lower, upper), maximum = T)
136 | } else if (method == 2) {
137 | rst <- optim(par = mean(lower, upper), fn = cv, lower = lower, upper = upper,
138 | method = "Brent", control = list(fnscale = -1))
139 | }
140 | return(data.frame(sigma = rst[[1]], auc = rst[[2]]))
141 | }
142 |
143 | grnn.margin <- function(net, i) {
144 | n <- length(unique(net$x[, i]))
145 | x <- matrix(rep(colMeans(net$x), n), nrow = n, byrow = T)
146 | x[, i] <- unique(net$x[, i])
147 | x.pred <- grnn.parpred(net, x)
148 | xname <- colnames(net$x)[i]
149 | plot(sort(x[, i]), x.pred[order(x[, i])], type = "b", lty = 4, lwd = 3, ylab = '', xlab = xname,
150 | main = "Marginal Effect", pch = 16, cex = 1.5, col = "red", cex.main = 1, cex.lab = 1, yaxt = 'n')
151 | }
152 |
153 | gen_unifm <- function(min = 0, max = 1, n, seed = 1) {
154 | set.seed(seed)
155 | return(round(min + (max - min) * runif(n), 8))
156 | }
157 |
158 | gen_sobol <- function(min = 0, max = 1, n, seed = 1) {
159 | return(round(min + (max - min) * randtoolbox::sobol(n, dim = 1, scrambling = 3, seed = seed), 8))
160 | }
161 |
162 | gen_latin <- function(min = 0, max = 1, n, seed = 1) {
163 | set.seed(seed)
164 | return(round(min + (max - min) * c(lhs::randomLHS(n, k = 1)), 8))
165 | }
166 |
--------------------------------------------------------------------------------
/code/archive/yager_v2.R:
--------------------------------------------------------------------------------
1 | ###########################################################################
2 | # PACKAGE NAME: YAGeR (YET ANOTHER GENERAL REGRESSION NEURAL NETWORK) #
3 | # AUTHOR : WENSUI LIU (liuwensui@gmail.com) #
4 | # DISCLAIMER : THIS IS MY WEEKEND PROJECT AND NOT RELATED TO MY #
5 | # CURRENT WORK WITH MY EMPLOYER #
6 | # IT IS FREE (AS FREE BEER) TO USE AND DISTRIBUTE #
7 | ###########################################################################
8 |
9 | gen_unifm <- function(min = 0, max = 1, n, seed = 1) {
10 | set.seed(seed)
11 | return(round(min + (max - min) * runif(n), 8))
12 | }
13 |
14 | gen_sobol <- function(min = 0, max = 1, n, seed = 1) {
15 | return(round(min + (max - min) * randtoolbox::sobol(n, dim = 1, scrambling = 3, seed = seed), 8))
16 | }
17 |
18 | gen_latin <- function(min = 0, max = 1, n, seed = 1) {
19 | set.seed(seed)
20 | return(round(min + (max - min) * c(lhs::randomLHS(n, k = 1)), 8))
21 | }
22 |
23 | ###########################################################################
24 |
25 | grnn.fit <- function(x, y, w = rep(1, length(y)), sigma = 1) {
26 | ### CHECK X MATRIX ###
27 | if (is.matrix(x) == F) stop("x needs to be a matrix.", call. = F)
28 | if (anyNA(x) == T) stop("NA found in x.", call. = F)
29 | ### CHECK Y VECTOR ###
30 | if (is.vector(y) == F) stop("y needs to be a vector.", call. = F)
31 | if (anyNA(y) == T) stop("NA found in y.", call. = F)
32 | if (length(y) != nrow(x)) stop("x and y need to share the same length.", call. = F)
33 | ### CHECK W VECTOR ###
34 | if (is.vector(w) == F) stop("w needs to be a vector.", call. = F)
35 | if (anyNA(w) == T) stop("NA found in w.", call. = F)
36 | if (length(w) != nrow(x)) stop("x and w need to share the same length.", call. = F)
37 | ### CHECK SIGMA ###
38 | if (sigma <= 0) stop("sigma needs to be positive", call. = F)
39 |
40 | gn <- structure(list(), class = "General Regression Neural Net")
41 | gn$x <- x
42 | gn$y <- y
43 | gn$w <- w
44 | gn$sigma <- sigma
45 | return(gn)
46 | }
47 |
48 | ###########################################################################
49 |
50 | grnn.predone<- function(net, x, type = 1) {
51 | if (class(net) != "General Regression Neural Net") stop("net needs to be a GRNN.", call. = F)
52 | ### CHECK INPUT X VECTOR ###
53 | if (is.vector(x) == F) stop("x needs to be a vector.", call. = F)
54 | if (anyNA(x) == T) stop("NA found in x.", call. = F)
55 | if (length(x) != ncol(net$x)) stop("x dimension is not consistent with grnn.", call. = F)
56 | ### CHECK INPUT TYPE (CURRENTLY SUPPORTING 1 / 2) ###
57 | if (!(type %in% c(1, 2))) stop("the type is not supported.", call. = F)
58 |
59 | if (type == 1) {
60 | ### EUCLIDEAN DISTANCE BY DEFAULT ###
61 | num <- sum(net$w * net$y * exp(-Reduce(c, lapply(split(net$x, seq(nrow(net$x))), function(xi) sum((x - xi) ^ 2))) / (2 * (net$sigma ^ 2))))
62 | den <- sum(net$w * exp(-Reduce(c, lapply(split(net$x, seq(nrow(net$x))), function(xi) sum((x - xi) ^ 2))) / (2 * (net$sigma ^ 2))))
63 | } else if (type == 2) {
64 | ### MANHATTAN DISTANCE ###
65 | num <- sum(net$w * net$y * exp(-Reduce(c, lapply(split(net$x, seq(nrow(net$x))), function(xi) sum(abs(x - xi)))) / net$sigma))
66 | den <- sum(net$w * exp(-Reduce(c, lapply(split(net$x, seq(nrow(net$x))), function(xi) sum(abs(x - xi)))) / net$sigma ))
67 | }
68 | return(num / den)
69 | }
70 |
71 | ###########################################################################
72 |
73 | grnn.predict <- function(net, x) {
74 | if (class(net) != "General Regression Neural Net") stop("net needs to be a GRNN.", call. = F)
75 | if (is.matrix(x) == F) stop("x needs to be a matrix.", call. = F)
76 | if (anyNA(x) == T) stop("NA found in x.", call. = F)
77 | if (ncol(x) != ncol(net$x)) stop("x dimension is not consistent with grnn.", call. = F)
78 | return(Reduce(c, lapply(split(x, seq(nrow(x))), function(x_) grnn.predone(net, x_))))
79 | }
80 |
81 | ###########################################################################
82 |
83 | grnn.parpred <- function(net, x) {
84 | if (class(net) != "General Regression Neural Net") stop("net needs to be a GRNN.", call. = F)
85 | if (is.matrix(x) == F) stop("x needs to be a matrix.", call. = F)
86 | if (anyNA(x) == T) stop("NA found in x.", call. = F)
87 | if (ncol(x) != ncol(net$x)) stop("x dimension is not consistent with grnn.", call. = F)
88 |
89 | cls <- parallel::makeCluster(min(floor(nrow(x) / 3), parallel::detectCores() - 1), type = "PSOCK")
90 | obj <- c("net", "x", "grnn.predone", "grnn.predict")
91 | parallel::clusterExport(cls, obj, envir = environment())
92 | spx <- parallel::parLapplyLB(cls, parallel::clusterSplit(cls, seq(nrow(x))),
93 | function(c_) x[c_, ])
94 | rst <- parallel::parLapplyLB(cls, spx, function(x_) grnn.predict(net, x_))
95 | parallel::stopCluster(cls)
96 | return(Reduce(c, rst))
97 | }
98 |
99 | ###########################################################################
100 |
101 | grnn.search_rsq <- function(net, sigmas, nfolds = 4, seed = 1) {
102 | if (class(net) != "General Regression Neural Net") stop("net needs to be a GRNN.", call. = F)
103 | if (is.vector(sigmas) != T) stop("sigmas needs to be a vector.", call. = F)
104 |
105 | set.seed(seed)
106 | fd <- caret::createFolds(seq(nrow(net$x)), k = nfolds)
107 |
108 | cv <- function(s) {
109 | rs <- Reduce(rbind,
110 | lapply(fd,
111 | function(f) data.frame(ya = net$y[unlist(f)],
112 | yp = grnn.predict(grnn.fit(net$x[unlist(-f), ], net$y[unlist(-f)], sigma = s),
113 | net$x[unlist(f), ]))))
114 | return(data.frame(sigma = s, r2 = MLmetrics::R2_Score(y_pred = rs$yp, y_true = rs$ya)))
115 | }
116 |
117 | cls <- parallel::makeCluster(min(nfolds, parallel::detectCores() - 1), type = "PSOCK")
118 | obj <- c("fd", "net", "grnn.fit", "grnn.predone", "grnn.predict")
119 | parallel::clusterExport(cls, obj, envir = environment())
120 | rst <- Reduce(rbind, parallel::parLapply(cls, sigmas, cv))
121 | parallel::stopCluster(cls)
122 | return(list(test = rst, best = rst[rst$r2 == max(rst$r2), ]))
123 | }
124 |
125 | ###########################################################################
126 |
127 | grnn.search_auc <- function(net, sigmas, nfolds = 4, seed = 1) {
128 | if (class(net) != "General Regression Neural Net") stop("net needs to be a GRNN.", call. = F)
129 | if (is.vector(sigmas) != T) stop("sigmas needs to be a vector.", call. = F)
130 |
131 | set.seed(seed)
132 | fd <- caret::createFolds(seq(nrow(net$x)), k = nfolds)
133 |
134 | cv <- function(s) {
135 | rs <- Reduce(rbind,
136 | lapply(fd,
137 | function(f) data.frame(ya = net$y[unlist(f)],
138 | yp = grnn.predict(grnn.fit(net$x[unlist(-f), ], net$y[unlist(-f)], sigma = s),
139 | net$x[unlist(f), ]))))
140 | return(data.frame(sigma = s, auc = MLmetrics::AUC(y_pred = rs$yp, y_true = rs$ya)))
141 | }
142 |
143 | cls <- parallel::makeCluster(min(nfolds, parallel::detectCores() - 1), type = "PSOCK")
144 | obj <- c("fd", "net", "grnn.fit", "grnn.predone", "grnn.predict")
145 | parallel::clusterExport(cls, obj, envir = environment())
146 | rst <- Reduce(rbind, parallel::parLapply(cls, sigmas, cv))
147 | parallel::stopCluster(cls)
148 | return(list(test = rst, best = rst[rst$auc == max(rst$auc), ]))
149 | }
150 |
151 | ###########################################################################
152 |
153 | grnn.optmiz_auc <- function(net, lower = 0, upper, nfolds = 4, seed = 1, method = 1) {
154 | if (class(net) != "General Regression Neural Net") stop("net needs to be a GRNN.", call. = F)
155 | set.seed(seed)
156 | fd <- caret::createFolds(seq(nrow(net$x)), k = nfolds)
157 |
158 | cv <- function(s) {
159 | cls <- parallel::makeCluster(min(nfolds, parallel::detectCores() - 1), type = "PSOCK")
160 | obj <- c("fd", "net", "grnn.fit", "grnn.predone", "grnn.predict")
161 | parallel::clusterExport(cls, obj, envir = environment())
162 | rs <- Reduce(rbind,
163 | parallel::parLapply(cls, fd,
164 | function(f) data.frame(ya = net$y[unlist(f)],
165 | yp = grnn.predict(grnn.fit(net$x[unlist(-f), ], net$y[unlist(-f)], sigma = s),
166 | net$x[unlist(f), ]))))
167 | parallel::stopCluster(cls)
168 | return(MLmetrics::AUC(y_pred = rs$yp, y_true = rs$ya))
169 | }
170 |
171 | if (method == 1) {
172 | rst <- optimize(f = cv, interval = c(lower, upper), maximum = T)
173 | } else if (method == 2) {
174 | rst <- optim(par = mean(lower, upper), fn = cv, lower = lower, upper = upper,
175 | method = "Brent", control = list(fnscale = -1))
176 | }
177 | return(data.frame(sigma = rst[[1]], auc = rst[[2]]))
178 | }
179 |
180 | ###########################################################################
181 |
182 | grnn.margin <- function(net, i, plot = T) {
183 | if (class(net) != "General Regression Neural Net") stop("net needs to be a GRNN.", call. = F)
184 | if (i > ncol(net$x)) stop("the selected variable is out of bound.", call. = F)
185 | if (!(plot %in% c(T, F))) stop("the plot input is not correct.", call. = F)
186 |
187 | xname <- colnames(net$x)[i]
188 | n <- length(unique(net$x[, i]))
189 | x <- matrix(rep(colMeans(net$x), n), nrow = n, byrow = T)
190 | x[, i] <- sort(unique(net$x[, i]))
191 | rst <- data.frame(x = x[, i], p = grnn.parpred(net, x))
192 | if (plot == T) {
193 | plot(rst[, 1], rst[, 2], type = "b", lty = 4, lwd = 3, ylab = '', xlab = xname,
194 | main = "Marginal Effect", pch = 16, cex = 1.5, col = "red", cex.main = 1, cex.lab = 1, yaxt = 'n')
195 | } else {
196 | return(rst)
197 | }
198 | }
199 |
200 | ###########################################################################
201 |
202 | grnn.partial <- function(net, i, plot = T) {
203 | if (class(net) != "General Regression Neural Net") stop("net needs to be a GRNN.", call. = F)
204 | if (i > ncol(net$x)) stop("the selected variable is out of bound.", call. = F)
205 | if (!(plot %in% c(T, F))) stop("the plot input is not correct.", call. = F)
206 |
207 | xname <- colnames(net$x)[i]
208 | xi <- sort(unique(net$x[, i]))
209 |
210 | partial <- function(x_i) {
211 | x <- net$x
212 | x[, i] <- rep(x_i, length(net$y))
213 | return(data.frame(x = x_i, p = mean(grnn.predict(net, x))))
214 | }
215 |
216 | cls <- parallel::makeCluster(min(length(xi), parallel::detectCores() - 1), type = "PSOCK")
217 | obj <- c("net", "grnn.fit", "grnn.predone", "grnn.predict")
218 | parallel::clusterExport(cls, obj, envir = environment())
219 | rst <- Reduce(rbind, parallel::parLapply(cls, xi, partial))
220 | parallel::stopCluster(cls)
221 | if (plot == T) {
222 | plot(rst[, 1], rst[, 2], type = "b", lty = 4, lwd = 3, ylab = '', xlab = xname,
223 | main = "Partial Dependence", pch = 16, cex = 1.5, col = "royalblue", cex.main = 1, cex.lab = 1, yaxt = 'n')
224 | } else {
225 | return(rst)
226 | }
227 | }
228 |
229 | ###########################################################################
230 |
231 | grnn.x_imp <- function(net, i, class = F) {
232 | if (class(net) != "General Regression Neural Net") stop("net needs to be a GRNN.", call. = F)
233 | if (i > ncol(net$x)) stop("the selected variable is out of bound.", call. = F)
234 | if (!(class %in% c(T, F))) stop("the class input is not correct.", call. = F)
235 |
236 | xname <- colnames(net$x)[i]
237 | x <- net$x
238 | x[, i] <- rep(mean(net$x[, i]), length(net$y))
239 | if (class == T) {
240 | auc0 <- MLmetrics::AUC(grnn.predict(net, net$x), net$y)
241 | auc1 <- MLmetrics::AUC(grnn.predict(net, x), net$y)
242 | auc2 <- MLmetrics::AUC(grnn.predict(grnn.fit(x = x[, -i], y = net$y, sigma = net$sigma), x[, -i]), net$y)
243 | imp1 <- round(max(0, 1 - auc1 / auc0), 8)
244 | imp2 <- round(max(0, 1 - auc2 / auc0), 8)
245 | } else {
246 | rsq0 <- MLmetrics::R2_Score(grnn.predict(net, net$x), net$y)
247 | rsq1 <- MLmetrics::R2_Score(grnn.predict(net, x), net$y)
248 | rsq2 <- MLmetrics::R2_Score(grnn.predict(grnn.fit(x = x[, -i], y = net$y, sigma = net$sigma), x[, -i]), net$y)
249 | imp1 <- round(max(0, 1 - rsq1 / rsq0), 8)
250 | imp2 <- round(max(0, 1 - rsq2 / rsq0), 8)
251 | }
252 | return(data.frame(var = xname, imp1 = imp1, imp2 = imp2))
253 | }
254 |
255 | ###########################################################################
256 |
257 | grnn.x_pfi <- function(net, i, class = F, ntry = 1e3, seed = 1) {
258 | if (class(net) != "General Regression Neural Net") stop("net needs to be a GRNN.", call. = F)
259 | if (!(class %in% c(T, F))) stop("the class input is not correct.", call. = F)
260 |
261 | xname <- colnames(net$x)[i]
262 | set.seed(seed)
263 | seeds <- round(runif(ntry) * 1e8, 0)
264 | ol <- lapply(seeds, function(s) with(set.seed(s), sample(seq(nrow(net$x)), nrow(net$x), replace = F)))
265 | cl <- Reduce(c, lapply(ol, function(o) abs(cor(seq(nrow(net$x)), o))))
266 | x <- net$x
267 | x[, i] <- net$x[ol[[which(cl == min(cl))]], i]
268 | if (class == T) {
269 | auc0 <- MLmetrics::AUC(grnn.predict(net, net$x), net$y)
270 | auc1 <- MLmetrics::AUC(grnn.predict(net, x), net$y)
271 | pfi <- round(max(0, 1 - auc1 / auc0), 8)
272 | } else {
273 | rsq0 <- MLmetrics::R2_Score(grnn.predict(net, net$x), net$y)
274 | rsq1 <- MLmetrics::R2_Score(grnn.predict(net, x), net$y)
275 | pfi <- round(max(0, 1 - rsq1 / rsq0), 8)
276 | }
277 | return(data.frame(var = xname, pfi = pfi))
278 | }
279 |
280 | ###########################################################################
281 |
282 | grnn.imp <- function(net, class = F) {
283 | if (class(net) != "General Regression Neural Net") stop("net needs to be a GRNN.", call. = F)
284 | if (!(class %in% c(T, F))) stop("the class input is not correct.", call. = F)
285 |
286 | cls <- parallel::makeCluster(min(ncol(net$x), parallel::detectCores() - 1), type = "PSOCK")
287 | obj <- c("net", "class", "grnn.fit", "grnn.predone", "grnn.predict", "grnn.x_imp")
288 | parallel::clusterExport(cls, obj, envir = environment())
289 | rst1 <- data.frame(idx = seq(ncol(net$x)),
290 | Reduce(rbind, parallel::parLapply(cls, seq(ncol(net$x)), function(i) grnn.x_imp(net, i, class = class))))
291 | parallel::stopCluster(cls)
292 | rst2 <- rst1[with(rst1, order(-imp1, -imp2)), ]
293 | row.names(rst2) <- NULL
294 | return(rst2)
295 | }
296 |
297 | ###########################################################################
298 |
299 | grnn.pfi <- function(net, class = F, ntry = 1e3, seed = 1) {
300 | if (class(net) != "General Regression Neural Net") stop("net needs to be a GRNN.", call. = F)
301 | if (!(class %in% c(T, F))) stop("the class input is not correct.", call. = F)
302 |
303 | cls <- parallel::makeCluster(min(ncol(net$x), parallel::detectCores() - 1), type = "PSOCK")
304 | obj <- c("net", "class", "grnn.fit", "grnn.predone", "grnn.predict", "grnn.x_pfi", "ntry", "seed")
305 | parallel::clusterExport(cls, obj, envir = environment())
306 | rst1 <- data.frame(idx = seq(ncol(net$x)),
307 | Reduce(rbind, parallel::parLapply(cls, seq(ncol(net$x)),
308 | function(i) grnn.x_pfi(net, i, class = class, ntry = ntry, seed = seed))))
309 | parallel::stopCluster(cls)
310 | rst2 <- rst1[with(rst1, order(-pfi)), ]
311 | row.names(rst2) <- NULL
312 | return(rst2)
313 | }
314 |
315 |
--------------------------------------------------------------------------------
/code/folds.R:
--------------------------------------------------------------------------------
1 | #' Create a list of N-fold index
2 | #'
3 | #' @param idx A vector of index
4 | #' @param n The value of N folds
5 | #' @param seed The seed value with the default = 1
6 | #' @return A list of N-fold index
7 | #' @examples
8 | #' folds(1:10, 3)
9 | #'
10 | folds <- function(idx, n, seed = 1) {
11 | g <- with(set.seed(seed), sample(idx, length(idx))) %% n + 1
12 | r <- split(idx, g)
13 | names(r) <- paste('Fold', seq(n), sep = '')
14 | return(r)
15 | }
16 |
--------------------------------------------------------------------------------
/code/gam.PNG:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/statcompute/yager/181071cbb3e05630e1deacba65173807cf80aee2/code/gam.PNG
--------------------------------------------------------------------------------
/code/grnn.fit.R:
--------------------------------------------------------------------------------
1 | grnn.fit <- function(x, y, w = rep(1, length(y)), sigma = 1) {
2 | ### CHECK X MATRIX ###
3 | if (is.matrix(x) == F) stop("x needs to be a matrix.", call. = F)
4 | if (anyNA(x) == T) stop("NA found in x.", call. = F)
5 | ### CHECK Y VECTOR ###
6 | if (is.vector(y) == F) stop("y needs to be a vector.", call. = F)
7 | if (anyNA(y) == T) stop("NA found in y.", call. = F)
8 | if (length(y) != nrow(x)) stop("x and y need to share the same length.", call. = F)
9 | ### CHECK W VECTOR ###
10 | if (is.vector(w) == F) stop("w needs to be a vector.", call. = F)
11 | if (anyNA(w) == T) stop("NA found in w.", call. = F)
12 | if (length(w) != nrow(x)) stop("x and w need to share the same length.", call. = F)
13 | ### CHECK SIGMA ###
14 | if (sigma <= 0) stop("sigma needs to be positive", call. = F)
15 |
16 | gn <- structure(list(), class = "General Regression Neural Net")
17 | gn$x <- x
18 | gn$y <- y
19 | gn$w <- w
20 | gn$sigma <- sigma
21 | return(gn)
22 | }
23 |
--------------------------------------------------------------------------------
/code/grnn.imp.R:
--------------------------------------------------------------------------------
1 | grnn.imp <- function(net) {
2 | if (class(net) != "General Regression Neural Net") stop("net needs to be a GRNN.", call. = F)
3 |
4 | cls <- parallel::makeCluster(min(ncol(net$x), parallel::detectCores() - 1), type = "PSOCK")
5 | obj <- c("net", "grnn.fit", "grnn.predone", "grnn.predict", "grnn.x_imp")
6 | parallel::clusterExport(cls, obj, envir = environment())
7 | rst1 <- data.frame(idx = seq(ncol(net$x)),
8 | Reduce(rbind, parallel::parLapply(cls, seq(ncol(net$x)), function(i) grnn.x_imp(net, i))))
9 | parallel::stopCluster(cls)
10 | rst2 <- rst1[with(rst1, order(-imp1, -imp2)), ]
11 | row.names(rst2) <- NULL
12 | return(rst2)
13 | }
14 |
--------------------------------------------------------------------------------
/code/grnn.margin.R:
--------------------------------------------------------------------------------
1 | grnn.margin <- function(net, i, plot = T) {
2 | if (class(net) != "General Regression Neural Net") stop("net needs to be a GRNN.", call. = F)
3 | xname <- colnames(net$x)[i]
4 | n <- length(unique(net$x[, i]))
5 | x <- matrix(rep(colMeans(net$x), n), nrow = n, byrow = T)
6 | x[, i] <- sort(unique(net$x[, i]))
7 | rst <- data.frame(x = x[, i], p = grnn.parpred(net, x))
8 | if (plot == T) {
9 | plot(rst[, 1], rst[, 2], type = "b", lty = 4, lwd = 3, ylab = '', xlab = xname,
10 | main = "Marginal Effect", pch = 16, cex = 1.5, col = "red", cex.main = 1, cex.lab = 1, yaxt = 'n')
11 | } else {
12 | return(rst)
13 | }
14 | }
15 |
--------------------------------------------------------------------------------
/code/grnn.optmiz_auc.R:
--------------------------------------------------------------------------------
1 | grnn.optmiz_auc <- function(net, lower = 0, upper, nfolds = 4, seed = 1, method = 1) {
2 | if (class(net) != "General Regression Neural Net") stop("net needs to be a GRNN.", call. = F)
3 | set.seed(seed)
4 | fd <- caret::createFolds(seq(nrow(net$x)), k = nfolds)
5 |
6 | cv <- function(s) {
7 | cls <- parallel::makeCluster(min(nfolds, parallel::detectCores() - 1), type = "PSOCK")
8 | obj <- c("fd", "net", "grnn.fit", "grnn.predone", "grnn.predict")
9 | parallel::clusterExport(cls, obj, envir = environment())
10 | rs <- Reduce(rbind,
11 | parallel::parLapply(cls, fd,
12 | function(f) data.frame(ya = net$y[unlist(f)],
13 | yp = grnn.predict(grnn.fit(net$x[unlist(-f), ], net$y[unlist(-f)], sigma = s),
14 | net$x[unlist(f), ]))))
15 | parallel::stopCluster(cls)
16 | return(MLmetrics::AUC(y_pred = rs$yp, y_true = rs$ya))
17 | }
18 |
19 | if (method == 1) {
20 | rst <- optimize(f = cv, interval = c(lower, upper), maximum = T)
21 | } else if (method == 2) {
22 | rst <- optim(par = mean(lower, upper), fn = cv, lower = lower, upper = upper,
23 | method = "Brent", control = list(fnscale = -1))
24 | }
25 | return(data.frame(sigma = rst[[1]], auc = rst[[2]]))
26 | }
27 |
--------------------------------------------------------------------------------
/code/grnn.parpred.R:
--------------------------------------------------------------------------------
1 | grnn.parpred <- function(net, x) {
2 | if (is.matrix(x) == F) stop("x needs to be a matrix.", call. = F)
3 | if (anyNA(x) == T) stop("NA found in x.", call. = F)
4 | if (ncol(x) != ncol(net$x)) stop("x dimension is not consistent with grnn.", call. = F)
5 |
6 | cls <- parallel::makeCluster(min(floor(nrow(x) / 3), parallel::detectCores() - 1), type = "PSOCK")
7 | parallel::clusterExport(cls, c("net", "grnn.predict", "grnn.predone", "x"), envir = environment())
8 | spx <- parallel::parLapplyLB(cls, parallel::clusterSplit(cls, seq(nrow(x))),
9 | function(c_) x[c_, ])
10 | rst <- parallel::parLapplyLB(cls, spx, function(x_) grnn.predict(net, x_))
11 | parallel::stopCluster(cls)
12 | return(Reduce(c, rst))
13 | }
14 |
--------------------------------------------------------------------------------
/code/grnn.partial.R:
--------------------------------------------------------------------------------
1 | grnn.partial <- function(net, i, plot = T) {
2 | if (class(net) != "General Regression Neural Net") stop("net needs to be a GRNN.", call. = F)
3 |
4 | xname <- colnames(net$x)[i]
5 | xi <- sort(unique(net$x[, i]))
6 | partial <- function(x_i) {
7 | x <- net$x
8 | x[, i] <- rep(x_i, length(net$y))
9 | return(data.frame(x = x_i, p = mean(grnn.predict(net, x))))
10 | }
11 | cls <- parallel::makeCluster(min(length(xi), parallel::detectCores() - 1), type = "PSOCK")
12 | obj <- c("net", "grnn.fit", "grnn.predone", "grnn.predict")
13 | parallel::clusterExport(cls, obj, envir = environment())
14 | rst <- Reduce(rbind, parallel::parLapply(cls, xi, partial))
15 | parallel::stopCluster(cls)
16 | if (plot == T) {
17 | plot(rst[, 1], rst[, 2], type = "b", lty = 4, lwd = 3, ylab = '', xlab = xname,
18 | main = "Partial Dependence", pch = 16, cex = 1.5, col = "royalblue", cex.main = 1, cex.lab = 1, yaxt = 'n')
19 |
20 | } else {
21 | return(rst)
22 | }
23 | }
24 |
--------------------------------------------------------------------------------
/code/grnn.pfi.R:
--------------------------------------------------------------------------------
1 | grnn.pfi <- function(net, ntry = 100, seed = 1) {
2 | if (class(net) != "General Regression Neural Net") stop("net needs to be a GRNN.", call. = F)
3 |
4 | cls <- parallel::makeCluster(min(ncol(net$x), parallel::detectCores() - 1), type = "PSOCK")
5 | obj <- c("net", "grnn.fit", "grnn.predone", "grnn.predict", "grnn.x_pfi", "ntry", "seed")
6 | parallel::clusterExport(cls, obj, envir = environment())
7 | rst1 <- data.frame(idx = seq(ncol(net$x)),
8 | Reduce(rbind, parallel::parLapply(cls, seq(ncol(net$x)), function(i) grnn.x_pfi(net, i, ntry = ntry, seed = seed))))
9 | parallel::stopCluster(cls)
10 | rst2 <- rst1[with(rst1, order(-pfi)), ]
11 | row.names(rst2) <- NULL
12 | return(rst2)
13 | }
14 |
--------------------------------------------------------------------------------
/code/grnn.predict.R:
--------------------------------------------------------------------------------
1 | grnn.predict <- function(net, x) {
2 | if (is.matrix(x) == F) stop("x needs to be a matrix.", call. = F)
3 | if (anyNA(x) == T) stop("NA found in x.", call. = F)
4 | if (ncol(x) != ncol(net$x)) stop("x dimension is not consistent with grnn.", call. = F)
5 |
6 | return(Reduce(c, lapply(split(x, seq(nrow(x))), function(x_) grnn.predone(net, x_))))
7 | }
8 |
--------------------------------------------------------------------------------
/code/grnn.predone.R:
--------------------------------------------------------------------------------
1 | grnn.predone<- function(net, x, type = 1) {
2 | ### CHECK INPUT X VECTOR ###
3 | if (is.vector(x) == F) stop("x needs to be a vector.", call. = F)
4 | if (anyNA(x) == T) stop("NA found in x.", call. = F)
5 | if (length(x) != ncol(net$x)) stop("x dimension is not consistent with grnn.", call. = F)
6 | ### CHECK INPUT TYPE (CURRENTLY SUPPORTING 1 / 2) ###
7 | if (!(type %in% c(1, 2))) stop("the type is not supported.", call. = F)
8 |
9 | if (type == 1) {
10 | ### EUCLIDEAN DISTANCE BY DEFAULT ###
11 | num <- sum(net$w * net$y * exp(-Reduce(c, lapply(split(net$x, seq(nrow(net$x))), function(xi) sum((x - xi) ^ 2))) / (2 * (net$sigma ^ 2))))
12 | den <- sum(net$w * exp(-Reduce(c, lapply(split(net$x, seq(nrow(net$x))), function(xi) sum((x - xi) ^ 2))) / (2 * (net$sigma ^ 2))))
13 | } else if (type == 2) {
14 | ### MANHATTAN DISTANCE ###
15 | num <- sum(net$w * net$y * exp(-Reduce(c, lapply(split(net$x, seq(nrow(net$x))), function(xi) sum(abs(x - xi)))) / net$sigma))
16 | den <- sum(net$w * exp(-Reduce(c, lapply(split(net$x, seq(nrow(net$x))), function(xi) sum(abs(x - xi)))) / net$sigma ))
17 | }
18 | return(num / den)
19 | }
20 |
--------------------------------------------------------------------------------
/code/grnn.search_auc.R:
--------------------------------------------------------------------------------
1 | grnn.search_auc <- function(net, sigmas, nfolds = 4, seed = 1) {
2 | if (class(net) != "General Regression Neural Net") stop("net needs to be a GRNN.", call. = F)
3 | set.seed(seed)
4 | fd <- caret::createFolds(seq(nrow(net$x)), k = nfolds)
5 |
6 | cv <- function(s) {
7 | rs <- Reduce(rbind,
8 | lapply(fd,
9 | function(f) data.frame(ya = net$y[unlist(f)],
10 | yp = grnn.predict(grnn.fit(net$x[unlist(-f), ], net$y[unlist(-f)], sigma = s),
11 | net$x[unlist(f), ]))))
12 | return(data.frame(sigma = s, auc = MLmetrics::AUC(y_pred = rs$yp, y_true = rs$ya)))
13 | }
14 | cls <- parallel::makeCluster(min(nfolds, parallel::detectCores() - 1), type = "PSOCK")
15 | obj <- c("fd", "net", "grnn.fit", "grnn.predone", "grnn.predict")
16 | parallel::clusterExport(cls, obj, envir = environment())
17 | rst <- Reduce(rbind, parallel::parLapply(cls, sigmas, cv))
18 | parallel::stopCluster(cls)
19 | return(list(test = rst, best = rst[rst$auc == max(rst$auc), ]))
20 | }
21 |
--------------------------------------------------------------------------------
/code/grnn.search_rsq.R:
--------------------------------------------------------------------------------
1 | grnn.search_rsq <- function(net, sigmas, nfolds, seed = 1) {
2 | set.seed(seed)
3 | fd <- caret::createFolds(seq(nrow(net$x)), k = nfolds)
4 |
5 | cv <- function(s) {
6 | rs <- Reduce(rbind,
7 | lapply(fd,
8 | function(f) data.frame(ya = net$y[unlist(f)],
9 | yp = grnn.predict(grnn.fit(net$x[unlist(-f), ], net$y[unlist(-f)], sigma = s),
10 | net$x[unlist(f), ]))))
11 | return(data.frame(sigma = s, r2 = MLmetrics::R2_Score(y_pred = rs$yp, y_true = rs$ya)))
12 | }
13 | cls <- parallel::makeCluster(min(nfolds, parallel::detectCores() - 1), type = "PSOCK")
14 | obj <- c("fd", "net", "grnn.fit", "grnn.predone", "grnn.predict")
15 | parallel::clusterExport(cls, obj, envir = environment())
16 | rst <- Reduce(rbind, parallel::parLapply(cls, sigmas, cv))
17 | parallel::stopCluster(cls)
18 | return(list(test = rst, best = rst[rst$r2 == max(rst$r2), ]))
19 | }
20 |
--------------------------------------------------------------------------------
/code/grnn.x_imp.R:
--------------------------------------------------------------------------------
1 | grnn.x_imp <- function(net, i) {
2 | if (class(net) != "General Regression Neural Net") stop("net needs to be a GRNN.", call. = F)
3 |
4 | xname <- colnames(net$x)[i]
5 | x <- net$x
6 | x[, i] <- rep(mean(net$x[, i]), length(net$y))
7 | auc0 <- MLmetrics::AUC(y_pred = grnn.predict(net, net$x), y_true = net$y)
8 | auc1 <- MLmetrics::AUC(y_pred = grnn.predict(net, x), y_true = net$y)
9 | auc2 <- MLmetrics::AUC(y_pred = grnn.predict(grnn.fit(x = x[, -i], y = net$y, sigma = net$sigma), x[, -i]), y_true = net$y)
10 | return(data.frame(var = xname, imp1 = round(max(0, 1 - auc1 / auc0), 8), imp2 = round(max(0, 1 - auc2 / auc0), 8)))
11 | }
12 |
--------------------------------------------------------------------------------
/code/grnn.x_pfi.R:
--------------------------------------------------------------------------------
1 | grnn.x_pfi <- function(net, i, ntry = 1e3, seed = 1) {
2 | if (class(net) != "General Regression Neural Net") stop("net needs to be a GRNN.", call. = F)
3 |
4 | xname <- colnames(net$x)[i]
5 | set.seed(seed)
6 | seeds <- round(runif(ntry) * 1e8, 0)
7 | ol <- lapply(seeds, function(s) with(set.seed(s), sample(seq(nrow(net$x)), nrow(net$x), replace = F)))
8 | cl <- Reduce(c, lapply(ol, function(o) abs(cor(seq(nrow(net$x)), o))))
9 | x <- net$x
10 | x[, i] <- net$x[ol[[which(cl == min(cl))]], i]
11 | auc0 <- MLmetrics::AUC(y_pred = grnn.predict(net, net$x), y_true = net$y)
12 | auc1 <- MLmetrics::AUC(y_pred = grnn.predict(net, x), y_true = net$y)
13 | return(data.frame(var = xname, pfi = round(max(0, 1 - auc1 / auc0), 8)))
14 | }
15 |
--------------------------------------------------------------------------------
/code/grnn_learn.SAS:
--------------------------------------------------------------------------------
1 | %macro grnn_learn(data = , x = , y = , sigma = , nn_out = );
2 | options mprint mlogic nocenter;
3 | ********************************************************;
4 | * THIS MACRO IS TO TRAIN A GENERAL REGRESSION NEURAL *;
5 | * NETWORK (SPECHT, 1991) AND STORE THE SPECIFICATION *;
6 | *------------------------------------------------------*;
7 | * INPUT PARAMETERS: *;
8 | * DATA : INPUT SAS DATASET *;
9 | * X : A LIST OF PREDICTORS IN THE NUMERIC FORMAT *;
10 | * Y : A RESPONSE VARIABLE IN THE NUMERIC FORMAT *;
11 | * SIGMA : THE SMOOTH PARAMETER FOR GRNN *;
12 | * NN_OUT: OUTPUT SAS DATASET CONTAINING THE GRNN *;
13 | * SPECIFICATION *;
14 | *------------------------------------------------------*;
15 | * AUTHOR: *;
16 | * WENSUI.LIU@53.COM *;
17 | ********************************************************;
18 |
19 | data _tmp1;
20 | set &data (keep = &x &y);
21 | where &y ~= .;
22 | array _x_ &x;
23 | _miss_ = 0;
24 | do _i_ = 1 to dim(_x_);
25 | if _x_[_i_] = . then _miss_ = 1;
26 | end;
27 | if _miss_ = 0 then output;
28 | run;
29 |
30 | proc summary data = _tmp1;
31 | output out = _avg_ (drop = _type_ _freq_)
32 | mean(&x) = ;
33 | run;
34 |
35 | proc summary data = _tmp1;
36 | output out = _std_ (drop = _type_ _freq_)
37 | std(&x) = ;
38 | run;
39 |
40 | proc standard data = _tmp1 mean = 0 std = 1 out = _data_;
41 | var &x;
42 | run;
43 |
44 | data &nn_out (keep = _neuron_ _key_ _value_);
45 | set _last_ end = eof;
46 | _neuron_ + 1;
47 | length _key_ $32;
48 | array _a_ &y &x;
49 | do _i_ = 1 to dim(_a_);
50 | if _i_ = 1 then _key_ = '_Y_';
51 | else _key_ = upcase(vname(_a_[_i_]));
52 | _value_ = _a_[_i_];
53 | output;
54 | end;
55 | if eof then do;
56 | _neuron_ = 0;
57 | _key_ = "_SIGMA_";
58 | _value_ = σ
59 | output;
60 | set _avg_;
61 | array _b_ &x;
62 | do _i_ = 1 to dim(_b_);
63 | _neuron_ = -1;
64 | _key_ = upcase(vname(_b_[_i_]));
65 | _value_ = _b_[_i_];
66 | output;
67 | end;
68 | set _std_;
69 | array _c_ &x;
70 | do _i_ = 1 to dim(_c_);
71 | _neuron_ = -2;
72 | _key_ = upcase(vname(_c_[_i_]));
73 | _value_ = _c_[_i_];
74 | output;
75 | end;
76 | end;
77 | run;
78 |
79 | proc datasets library = work;
80 | delete _: / memtype = data;
81 | run;
82 | quit;
83 |
84 | ********************************************************;
85 | * END OF THE MACRO *;
86 | ********************************************************;
87 | %mend grnn_learn;
88 |
--------------------------------------------------------------------------------
/code/grnn_margin.PNG:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/statcompute/yager/181071cbb3e05630e1deacba65173807cf80aee2/code/grnn_margin.PNG
--------------------------------------------------------------------------------
/code/grnn_pred.SAS:
--------------------------------------------------------------------------------
1 | %macro grnn_pred(data = , x = , id = NA, nn_in = , out = grnn_pred);
2 | options mprint mlogic nocenter;
3 | ********************************************************;
4 | * THIS MACRO IS TO GENERATE PREDICTED VALUES BASED ON *;
5 | * THE SPECIFICATION OF GRNN CREATED BY THE %GRNN_LEARN *;
6 | * MACRO *;
7 | *------------------------------------------------------*;
8 | * INPUT PARAMETERS: *;
9 | * DATA : INPUT SAS DATASET *;
10 | * X : A LIST OF PREDICTORS IN THE NUMERIC FORMAT *;
11 | * ID : AN ID VARIABLE (OPTIONAL) *;
12 | * NN_IN: INPUT SAS DATASET CONTAINING THE GRNN *;
13 | * SPECIFICATION GENERATED FROM %GRNN_LEARN *;
14 | * OUT : OUTPUT SAS DATASET WITH GRNN PREDICTIONS *;
15 | *------------------------------------------------------*;
16 | * AUTHOR: *;
17 | * WENSUI.LIU@53.COM *;
18 | ********************************************************;
19 |
20 | data _data_;
21 | set &data;
22 | array _x_ &x;
23 | _miss_ = 0;
24 | do _i_ = 1 to dim(_x_);
25 | if _x_[_i_] = . then _miss_ = 1;
26 | end;
27 | if _miss_ = 0 then output;
28 | run;
29 |
30 | data _data_;
31 | set _last_ (drop = _miss_);
32 | %if &id = NA %then %do;
33 | _id_ + 1;
34 | %end;
35 | %else %do;
36 | _id_ = &id;
37 | %end;
38 | run;
39 |
40 | proc sort data = _last_ sortsize = max nodupkey;
41 | by _id_;
42 | run;
43 |
44 | data _data_ (keep = _id_ _key_ _value_);
45 | set _last_;
46 | array _x_ &x;
47 | length _key_ $32;
48 | do _i_ = 1 to dim(_x_);
49 | _key_ = upcase(vname(_x_[_i_]));
50 | _value_ = _x_[_i_];
51 | output;
52 | end;
53 | run;
54 |
55 | proc sql noprint;
56 | select _value_ ** 2 into :s2 from &nn_in where _neuron_ = 0;
57 |
58 | create table
59 | _last_ as
60 | select
61 | a._id_,
62 | a._key_,
63 | (a._value_ - b._value_) / c._value_ as _value_
64 | from
65 | _last_ as a,
66 | &nn_in as b,
67 | &nn_in as c
68 | where
69 | compress(a._key_, ' ') = compress(b._key_, ' ') and
70 | compress(a._key_, ' ') = compress(c._key_, ' ') and
71 | b._neuron_ = -1 and
72 | c._neuron_ = -2;
73 |
74 | create table
75 | _last_ as
76 | select
77 | a._id_,
78 | b._neuron_,
79 | sum((a._value_ - b._value_) ** 2) as d2,
80 | mean(c._value_) as y,
81 | exp(-(calculated d2) / (2 * &s2)) as exp
82 | from
83 | _last_ as a,
84 | &nn_in as b,
85 | &nn_in as c
86 | where
87 | compress(a._key_, ' ') = compress(b._key_, ' ') and
88 | b._neuron_ = c._neuron_ and
89 | b._neuron_ > 0 and
90 | c._key_ = '_Y_'
91 | group by
92 | a._id_, b._neuron_;
93 |
94 | create table
95 | _last_ as
96 | select
97 | a._id_,
98 | sum(a.y * a.exp / b.sum_exp) as _pred_
99 | from
100 | _last_ as a inner join (select _id_, sum(exp) as sum_exp from _last_ group by _id_) as b
101 | on
102 | a._id_ = b._id_
103 | group by
104 | a._id_;
105 | quit;
106 |
107 | proc sort data = _last_ out = &out sortsize = max;
108 | by _id_;
109 | run;
110 |
111 | ********************************************************;
112 | * END OF THE MACRO *;
113 | ********************************************************;
114 | %mend grnn_pred;
115 |
--------------------------------------------------------------------------------
/code/yager.R:
--------------------------------------------------------------------------------
1 | ###########################################################################
2 | # PACKAGE NAME: YAGeR (YET ANOTHER GENERAL REGRESSION NEURAL NETWORK) #
3 | # AUTHOR : WENSUI LIU #
4 | # DISCLAIMER : THIS IS MY WEEKEND PROJECT AND NOT RELATED TO MY #
5 | # CURRENT WORK WITH MY EMPLOYER #
6 | # IT IS FREE (AS FREE BEER) TO USE AND DISTRIBUTE #
7 | ###########################################################################
8 |
9 | gen_unifm <- function(min = 0, max = 1, n, seed = 1) {
10 | set.seed(seed)
11 | return(round(min + (max - min) * runif(n), 8))
12 | }
13 |
14 | gen_sobol <- function(min = 0, max = 1, n, seed = 1) {
15 | return(round(min + (max - min) * randtoolbox::sobol(n, dim = 1, scrambling = 3, seed = seed), 8))
16 | }
17 |
18 | gen_latin <- function(min = 0, max = 1, n, seed = 1) {
19 | set.seed(seed)
20 | return(round(min + (max - min) * c(lhs::randomLHS(n, k = 1)), 8))
21 | }
22 |
23 | folds <- function(idx, n, seed = 1) {
24 | g <- with(set.seed(seed), sample(idx, length(idx))) %% n + 1
25 | r <- split(idx, g)
26 | names(r) <- paste('Fold', seq(n), sep = '')
27 | return(r)
28 | }
29 |
30 | ###########################################################################
31 |
32 | grnn.fit <- function(x, y, w = rep(1, length(y)), sigma = 1) {
33 | ### CHECK X MATRIX ###
34 | if (is.matrix(x) == F) stop("x needs to be a matrix.", call. = F)
35 | if (anyNA(x) == T) stop("NA found in x.", call. = F)
36 | ### CHECK Y VECTOR ###
37 | if (is.vector(y) == F) stop("y needs to be a vector.", call. = F)
38 | if (anyNA(y) == T) stop("NA found in y.", call. = F)
39 | if (length(y) != nrow(x)) stop("x and y need to share the same length.", call. = F)
40 | ### CHECK W VECTOR ###
41 | if (is.vector(w) == F) stop("w needs to be a vector.", call. = F)
42 | if (anyNA(w) == T) stop("NA found in w.", call. = F)
43 | if (length(w) != nrow(x)) stop("x and w need to share the same length.", call. = F)
44 | ### CHECK SIGMA ###
45 | if (sigma <= 0) stop("sigma needs to be positive", call. = F)
46 |
47 | gn <- structure(list(), class = "General Regression Neural Net")
48 | gn$x <- x
49 | gn$y <- y
50 | gn$w <- w
51 | gn$sigma <- sigma
52 | return(gn)
53 | }
54 |
55 | ###########################################################################
56 |
57 | grnn.predone <- function(net, x, type = 1) {
58 | if (class(net) != "General Regression Neural Net") stop("net needs to be a GRNN.", call. = F)
59 | ### CHECK INPUT X VECTOR ###
60 | if (is.vector(x) == F) stop("x needs to be a vector.", call. = F)
61 | if (anyNA(x) == T) stop("NA found in x.", call. = F)
62 | if (length(x) != ncol(net$x)) stop("x dimension is not consistent with grnn.", call. = F)
63 | ### CHECK INPUT TYPE (CURRENTLY SUPPORTING 1 / 2) ###
64 | if (!(type %in% c(1, 2))) stop("the type is not supported.", call. = F)
65 |
66 | xl <- split(net$x, seq(nrow(net$x)))
67 | if (type == 1) {
68 | ### EUCLIDEAN DISTANCE BY DEFAULT ###
69 | num <- sum(net$w * net$y * exp(-Reduce(c, lapply(xl, function(xi) sum((x - xi) ^ 2))) / (2 * (net$sigma ^ 2))))
70 | den <- sum(net$w * exp(-Reduce(c, lapply(xl, function(xi) sum((x - xi) ^ 2))) / (2 * (net$sigma ^ 2))))
71 | } else if (type == 2) {
72 | ### MANHATTAN DISTANCE ###
73 | num <- sum(net$w * net$y * exp(-Reduce(c, lapply(xl, function(xi) sum(abs(x - xi)))) / net$sigma))
74 | den <- sum(net$w * exp(-Reduce(c, lapply(xl, function(xi) sum(abs(x - xi)))) / net$sigma))
75 | }
76 | return(num / den)
77 | }
78 |
79 | ###########################################################################
80 |
81 | grnn.predict <- function(net, x) {
82 | if (class(net) != "General Regression Neural Net") stop("net needs to be a GRNN.", call. = F)
83 | if (is.matrix(x) == F) stop("x needs to be a matrix.", call. = F)
84 | if (anyNA(x) == T) stop("NA found in x.", call. = F)
85 | if (ncol(x) != ncol(net$x)) stop("x dimension is not consistent with grnn.", call. = F)
86 | return(Reduce(c, lapply(split(x, seq(nrow(x))), function(x_) grnn.predone(net, x_))))
87 | }
88 |
89 | ###########################################################################
90 |
91 | grnn.parpred <- function(net, x) {
92 | if (class(net) != "General Regression Neural Net") stop("net needs to be a GRNN.", call. = F)
93 | if (is.matrix(x) == F) stop("x needs to be a matrix.", call. = F)
94 | if (anyNA(x) == T) stop("NA found in x.", call. = F)
95 | if (ncol(x) != ncol(net$x)) stop("x dimension is not consistent with grnn.", call. = F)
96 |
97 | cls <- parallel::makeCluster(min(floor(nrow(x) / 3), parallel::detectCores() - 1), type = "PSOCK")
98 | obj <- c("net", "x", "grnn.predone", "grnn.predict")
99 | parallel::clusterExport(cls, obj, envir = environment())
100 | spx <- parallel::parLapplyLB(cls, parallel::clusterSplit(cls, seq(nrow(x))),
101 | function(c_) x[c_, ])
102 | rst <- parallel::parLapplyLB(cls, spx, function(x_) grnn.predict(net, x_))
103 | parallel::stopCluster(cls)
104 | return(Reduce(c, rst))
105 | }
106 |
107 | ###########################################################################
108 |
109 | grnn.search_rsq <- function(net, sigmas, nfolds = 4, seed = 1) {
110 | if (class(net) != "General Regression Neural Net") stop("net needs to be a GRNN.", call. = F)
111 | if (is.vector(sigmas) != T) stop("sigmas needs to be a vector.", call. = F)
112 |
113 | fd <- folds(seq(nrow(net$x)), n = nfolds, seed = seed)
114 |
115 | cv <- function(s) {
116 | rs <- Reduce(rbind,
117 | lapply(fd,
118 | function(f) data.frame(ya = net$y[unlist(f)],
119 | yp = grnn.predict(grnn.fit(net$x[unlist(-f), ], net$y[unlist(-f)], sigma = s),
120 | net$x[unlist(f), ]))))
121 | return(data.frame(sigma = s, r2 = MLmetrics::R2_Score(y_pred = rs$yp, y_true = rs$ya)))
122 | }
123 |
124 | cls <- parallel::makeCluster(min(nfolds, parallel::detectCores() - 1), type = "PSOCK")
125 | obj <- c("fd", "net", "grnn.fit", "grnn.predone", "grnn.predict")
126 | parallel::clusterExport(cls, obj, envir = environment())
127 | rst <- Reduce(rbind, parallel::parLapply(cls, sigmas, cv))
128 | parallel::stopCluster(cls)
129 | return(list(test = rst, best = rst[rst$r2 == max(rst$r2), ]))
130 | }
131 |
132 | ###########################################################################
133 |
134 | grnn.search_auc <- function(net, sigmas, nfolds = 4, seed = 1) {
135 | if (class(net) != "General Regression Neural Net") stop("net needs to be a GRNN.", call. = F)
136 | if (is.vector(sigmas) != T) stop("sigmas needs to be a vector.", call. = F)
137 |
138 | fd <- folds(seq(nrow(net$x)), n = nfolds, seed = seed)
139 |
140 | cv <- function(s) {
141 | rs <- Reduce(rbind,
142 | lapply(fd,
143 | function(f) data.frame(ya = net$y[unlist(f)],
144 | yp = grnn.predict(grnn.fit(net$x[unlist(-f), ], net$y[unlist(-f)], sigma = s),
145 | net$x[unlist(f), ]))))
146 | return(data.frame(sigma = s, auc = MLmetrics::AUC(y_pred = rs$yp, y_true = rs$ya)))
147 | }
148 |
149 | cls <- parallel::makeCluster(min(nfolds, parallel::detectCores() - 1), type = "PSOCK")
150 | obj <- c("fd", "net", "grnn.fit", "grnn.predone", "grnn.predict")
151 | parallel::clusterExport(cls, obj, envir = environment())
152 | rst <- Reduce(rbind, parallel::parLapply(cls, sigmas, cv))
153 | parallel::stopCluster(cls)
154 | return(list(test = rst, best = rst[rst$auc == max(rst$auc), ]))
155 | }
156 |
157 | ###########################################################################
158 |
159 | grnn.optmiz_auc <- function(net, lower = 0, upper, nfolds = 4, seed = 1, method = 1) {
160 | if (class(net) != "General Regression Neural Net") stop("net needs to be a GRNN.", call. = F)
161 |
162 | fd <- folds(seq(nrow(net$x)), n = nfolds, seed = seed)
163 |
164 | cv <- function(s) {
165 | cls <- parallel::makeCluster(min(nfolds, parallel::detectCores() - 1), type = "PSOCK")
166 | obj <- c("fd", "net", "grnn.fit", "grnn.predone", "grnn.predict")
167 | parallel::clusterExport(cls, obj, envir = environment())
168 | rs <- Reduce(rbind,
169 | parallel::parLapply(cls, fd,
170 | function(f) data.frame(ya = net$y[unlist(f)],
171 | yp = grnn.predict(grnn.fit(net$x[unlist(-f), ], net$y[unlist(-f)], sigma = s),
172 | net$x[unlist(f), ]))))
173 | parallel::stopCluster(cls)
174 | return(MLmetrics::AUC(y_pred = rs$yp, y_true = rs$ya))
175 | }
176 |
177 | if (method == 1) {
178 | rst <- optimize(f = cv, interval = c(lower, upper), maximum = T)
179 | } else if (method == 2) {
180 | rst <- optim(par = mean(lower, upper), fn = cv, lower = lower, upper = upper,
181 | method = "Brent", control = list(fnscale = -1))
182 | }
183 | return(data.frame(sigma = rst[[1]], auc = rst[[2]]))
184 | }
185 |
186 | ###########################################################################
187 |
188 | grnn.margin <- function(net, i, plot = T) {
189 | if (class(net) != "General Regression Neural Net") stop("net needs to be a GRNN.", call. = F)
190 | if (i > ncol(net$x)) stop("the selected variable is out of bound.", call. = F)
191 | if (!(plot %in% c(T, F))) stop("the plot input is not correct.", call. = F)
192 |
193 | xname <- colnames(net$x)[i]
194 | n <- length(unique(net$x[, i]))
195 | x <- matrix(rep(colMeans(net$x), n), nrow = n, byrow = T)
196 | x[, i] <- sort(unique(net$x[, i]))
197 | rst <- data.frame(x = x[, i], p = grnn.predict(net, x))
198 | if (plot == T) {
199 | plot(rst[, 1], rst[, 2], type = "b", lty = 4, lwd = 3, ylab = '', xlab = xname,
200 | main = "Marginal Effect", pch = 16, cex = 1.5, col = "red", cex.main = 1, cex.lab = 1, yaxt = 'n')
201 | rug(rst[, 1], col = 'green4', ticksize = 0.03, lwd = 3)
202 | } else {
203 | return(rst)
204 | }
205 | }
206 |
207 | ###########################################################################
208 |
209 | grnn.partial <- function(net, i, plot = T) {
210 | if (class(net) != "General Regression Neural Net") stop("net needs to be a GRNN.", call. = F)
211 | if (i > ncol(net$x)) stop("the selected variable is out of bound.", call. = F)
212 | if (!(plot %in% c(T, F))) stop("the plot input is not correct.", call. = F)
213 |
214 | xname <- colnames(net$x)[i]
215 | xi <- sort(unique(net$x[, i]))
216 |
217 | partial <- function(x_i) {
218 | x <- net$x
219 | x[, i] <- rep(x_i, length(net$y))
220 | return(data.frame(x = x_i, p = mean(grnn.predict(net, x))))
221 | }
222 |
223 | cls <- parallel::makeCluster(min(length(xi), parallel::detectCores() - 1), type = "PSOCK")
224 | obj <- c("net", "grnn.fit", "grnn.predone", "grnn.predict")
225 | parallel::clusterExport(cls, obj, envir = environment())
226 | rst <- Reduce(rbind, parallel::parLapply(cls, xi, partial))
227 | parallel::stopCluster(cls)
228 | if (plot == T) {
229 | plot(rst[, 1], rst[, 2], type = "b", lty = 4, lwd = 3, ylab = '', xlab = xname,
230 | main = "Partial Dependence", pch = 16, cex = 1.5, col = "royalblue", cex.main = 1, cex.lab = 1, yaxt = 'n')
231 | rug(rst[, 1], col = 'green4', ticksize = 0.03, lwd = 3)
232 | } else {
233 | return(rst)
234 | }
235 | }
236 |
237 | ###########################################################################
238 |
239 | grnn.x_imp <- function(net, i, class = F) {
240 | if (class(net) != "General Regression Neural Net") stop("net needs to be a GRNN.", call. = F)
241 | if (i > ncol(net$x)) stop("the selected variable is out of bound.", call. = F)
242 | if (!(class %in% c(T, F))) stop("the class input is not correct.", call. = F)
243 |
244 | xname <- colnames(net$x)[i]
245 | x <- net$x
246 | x[, i] <- rep(mean(net$x[, i]), length(net$y))
247 | if (class == T) {
248 | auc0 <- MLmetrics::AUC(grnn.predict(net, net$x), net$y)
249 | auc1 <- MLmetrics::AUC(grnn.predict(net, x), net$y)
250 | auc2 <- MLmetrics::AUC(grnn.predict(grnn.fit(x = x[, -i], y = net$y, sigma = net$sigma), x[, -i]), net$y)
251 | imp1 <- round(max(0, 1 - auc1 / auc0), 8)
252 | imp2 <- round(max(0, 1 - auc2 / auc0), 8)
253 | } else {
254 | rsq0 <- MLmetrics::R2_Score(grnn.predict(net, net$x), net$y)
255 | rsq1 <- MLmetrics::R2_Score(grnn.predict(net, x), net$y)
256 | rsq2 <- MLmetrics::R2_Score(grnn.predict(grnn.fit(x = x[, -i], y = net$y, sigma = net$sigma), x[, -i]), net$y)
257 | imp1 <- round(max(0, 1 - rsq1 / rsq0), 8)
258 | imp2 <- round(max(0, 1 - rsq2 / rsq0), 8)
259 | }
260 | return(data.frame(var = xname, imp1 = imp1, imp2 = imp2))
261 | }
262 |
263 | ###########################################################################
264 |
265 | grnn.x_pfi <- function(net, i, class = F, ntry = 1e3, seed = 1) {
266 | if (class(net) != "General Regression Neural Net") stop("net needs to be a GRNN.", call. = F)
267 | if (!(class %in% c(T, F))) stop("the class input is not correct.", call. = F)
268 |
269 | xname <- colnames(net$x)[i]
270 | set.seed(seed)
271 | seeds <- floor(runif(ntry) * 1e8)
272 | ol <- lapply(seeds, function(s) with(set.seed(s), sample(seq(nrow(net$x)), nrow(net$x), replace = F)))
273 | cl <- Reduce(c, lapply(ol, function(o) abs(cor(seq(nrow(net$x)), o))))
274 | x <- net$x
275 | x[, i] <- net$x[ol[[which(cl == min(cl))]], i]
276 | if (class == T) {
277 | auc0 <- MLmetrics::AUC(grnn.predict(net, net$x), net$y)
278 | auc1 <- MLmetrics::AUC(grnn.predict(net, x), net$y)
279 | pfi <- round(max(0, 1 - auc1 / auc0), 8)
280 | } else {
281 | rsq0 <- MLmetrics::R2_Score(grnn.predict(net, net$x), net$y)
282 | rsq1 <- MLmetrics::R2_Score(grnn.predict(net, x), net$y)
283 | pfi <- round(max(0, 1 - rsq1 / rsq0), 8)
284 | }
285 | return(data.frame(var = xname, pfi = pfi))
286 | }
287 |
288 | ###########################################################################
289 |
290 | grnn.imp <- function(net, class = F) {
291 | if (class(net) != "General Regression Neural Net") stop("net needs to be a GRNN.", call. = F)
292 | if (!(class %in% c(T, F))) stop("the class input is not correct.", call. = F)
293 |
294 | cls <- parallel::makeCluster(min(ncol(net$x), parallel::detectCores() - 1), type = "PSOCK")
295 | obj <- c("net", "class", "grnn.fit", "grnn.predone", "grnn.predict", "grnn.x_imp")
296 | parallel::clusterExport(cls, obj, envir = environment())
297 | rst1 <- data.frame(idx = seq(ncol(net$x)),
298 | Reduce(rbind, parallel::parLapply(cls, seq(ncol(net$x)), function(i) grnn.x_imp(net, i, class = class))))
299 | parallel::stopCluster(cls)
300 | rst2 <- rst1[with(rst1, order(-imp1, -imp2)), ]
301 | row.names(rst2) <- NULL
302 | return(rst2)
303 | }
304 |
305 | ###########################################################################
306 |
307 | grnn.pfi <- function(net, class = F, ntry = 1e3, seed = 1) {
308 | if (class(net) != "General Regression Neural Net") stop("net needs to be a GRNN.", call. = F)
309 | if (!(class %in% c(T, F))) stop("the class input is not correct.", call. = F)
310 |
311 | cls <- parallel::makeCluster(min(ncol(net$x), parallel::detectCores() - 1), type = "PSOCK")
312 | obj <- c("net", "class", "grnn.fit", "grnn.predone", "grnn.predict", "grnn.x_pfi", "ntry", "seed")
313 | parallel::clusterExport(cls, obj, envir = environment())
314 | rst1 <- data.frame(idx = seq(ncol(net$x)),
315 | Reduce(rbind, parallel::parLapply(cls, seq(ncol(net$x)),
316 | function(i) grnn.x_pfi(net, i, class = class, ntry = ntry, seed = seed))))
317 | parallel::stopCluster(cls)
318 | rst2 <- rst1[with(rst1, order(-pfi)), ]
319 | row.names(rst2) <- NULL
320 | return(rst2)
321 | }
322 |
--------------------------------------------------------------------------------
/yager-manual.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/statcompute/yager/181071cbb3e05630e1deacba65173807cf80aee2/yager-manual.pdf
--------------------------------------------------------------------------------
/yager_0.1.0.tar.gz:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/statcompute/yager/181071cbb3e05630e1deacba65173807cf80aee2/yager_0.1.0.tar.gz
--------------------------------------------------------------------------------
/yager_0.1.1.tar.gz:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/statcompute/yager/181071cbb3e05630e1deacba65173807cf80aee2/yager_0.1.1.tar.gz
--------------------------------------------------------------------------------