├── .gitattributes ├── .gitignore ├── DESCRIPTION ├── Documentation ├── fmlogit_docs.Rmd ├── fmlogit_docs.html └── fmlogit_docs.pdf ├── NAMESPACE ├── R ├── fmlogit.R ├── fmlogit_main.R ├── marginals.R ├── plot_effects.R ├── plot_effects_1.R ├── predictions.R ├── spending_data.R ├── summary.R └── wtp.R ├── README.md ├── data └── spending.rda └── man ├── effects.fmlogit.Rd ├── fitted.fmlogit.Rd ├── fmlogit.Rd ├── plot.fmlogit.Rd ├── plot.fmlogit.margins.Rd ├── spending.Rd ├── summary.fmlogit.Rd └── wtp.Rd /.gitattributes: -------------------------------------------------------------------------------- 1 | # Auto detect text files and perform LF normalization 2 | * text=auto 3 | 4 | # Custom for Visual Studio 5 | *.cs diff=csharp 6 | 7 | # Standard to msysgit 8 | *.doc diff=astextplain 9 | *.DOC diff=astextplain 10 | *.docx diff=astextplain 11 | *.DOCX diff=astextplain 12 | *.dot diff=astextplain 13 | *.DOT diff=astextplain 14 | *.pdf diff=astextplain 15 | *.PDF diff=astextplain 16 | *.rtf diff=astextplain 17 | *.RTF diff=astextplain 18 | -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | *.Rproj 2 | 3 | # Windows image file caches 4 | Thumbs.db 5 | ehthumbs.db 6 | 7 | # Folder config file 8 | Desktop.ini 9 | 10 | # Recycle Bin used on file shares 11 | $RECYCLE.BIN/ 12 | 13 | # Windows Installer files 14 | *.cab 15 | *.msi 16 | *.msm 17 | *.msp 18 | 19 | # Windows shortcuts 20 | *.lnk 21 | 22 | # ========================= 23 | # Operating System Files 24 | # ========================= 25 | 26 | # OSX 27 | # ========================= 28 | 29 | .DS_Store 30 | .AppleDouble 31 | .LSOverride 32 | 33 | # Thumbnails 34 | ._* 35 | 36 | # Files that might appear in the root of a volume 37 | .DocumentRevisions-V100 38 | .fseventsd 39 | .Spotlight-V100 40 | .TemporaryItems 41 | .Trashes 42 | .VolumeIcon.icns 43 | 44 | # Directories potentially created on 
remote AFP share 45 | .AppleDB 46 | .AppleDesktop 47 | Network Trash Folder 48 | Temporary Items 49 | .apdisk 50 | -------------------------------------------------------------------------------- /DESCRIPTION: -------------------------------------------------------------------------------- 1 | Package: fmlogit 2 | Title: Fractional Multinomial Logit using QMLE 3 | Version: 2.0 4 | Authors@R: c(person("James Xinde", "Ji", email = "xji1@ufl.edu",role=c("aut","cre"))) 5 | Description: Provides estimation and simple hypothesis testing of the fractional 6 | multinomial logit model. 7 | Depends: 8 | R (>= 2.6.0),maxLik 9 | Imports: maxLik 10 | Suggests: foreign, ggplot2, grid 11 | Encoding: UTF-8 12 | LazyData: true 13 | License: MIT 14 | RoxygenNote: 6.1.1 15 | -------------------------------------------------------------------------------- /Documentation/fmlogit_docs.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: 'The fmlogit Package: An Econometric Document' 3 | author: "Xinde James Ji" 4 | date: "Oct.10, 2016" 5 | output: pdf_document 6 | --- 7 | 8 | This document provides theoretical documentation for the "fmlogit" package in R. Updates will be published at my GitHub site: \url{https://github.com/f1kidd/fmlogit}. Any suggestions or concerns are welcome\footnote{email: xji@vt.edu}. For function usage and calls, please check directly with help(func_name) after loading the fmlogit package. 9 | 10 | # Motivation 11 | Fractional multinomial responses, or multivariate share models, arise naturally in many settings. For example, a municipality allocates its budget to multiple departments, and we are interested in the proportion of the budget that each department receives. Or, there are multiple candidates in a presidential election, and we are interested in the percentage of support for each candidate in each state. 
12 | 13 | The model is distinct in that 1) each response lies between 0 and 1, and 2) the shares of all responses add up to one. The fmlogit model exploits these two features and models them explicitly using a multinomial logit transformation on the response variables. If the true data generating process consists of multinomial fractions, or shares of multiple choices, then the fractional multinomial logit model is consistent and efficient, while other candidate models such as Dirichlet or Beta regression are not. 14 | 15 | # Econometric Model 16 | The basis of this package is Papke and Wooldridge (1996), who proposed a quasi-maximum likelihood (QMLE) estimator for fractional response variables. As their approach applies to binary response variables, here we extend it to multinomial response variables with fractional structure. 17 | 18 | We start by writing:\footnote{The demonstration below is in individual-specific notation, but the matrix notation is not hard to obtain from it. The actual function uses matrix calculation, which increases algorithm speed.} 19 | $$E(y_{ij}|x_i) = G(x_i\beta_j)$$ 20 | for the $j^{th}$ choice of the $i^{th}$ observation, where G(.) 
is a know function satisfying 0 0) { 92 | Xfacnames = colnames(X)[Xfac] 93 | strformFac = paste(Xfacnames, collapse = "+") 94 | Xdum = model.matrix(as.formula(paste("~", strformFac, 95 | sep = "")), data = X)[, -1] 96 | X = cbind(X, Xdum) 97 | X = X[, -Xfac] 98 | } 99 | Xnames = colnames(X) 100 | ynames = colnames(y) 101 | X = as.matrix(X) 102 | y = as.matrix(y) 103 | n = dim(X)[1] 104 | j = dim(y)[2] 105 | k = dim(X)[2] 106 | xy = cbind(X, y) 107 | xy = na.omit(xy) 108 | row.remain = setdiff(1:n, attr(xy, "na.action")) 109 | X = xy[, 1:k] 110 | y = xy[, (k + 1):(k + j)] 111 | n = dim(y)[1] 112 | remove(xy) 113 | # adding in the constant term 114 | if(k==1){ 115 | # check if the input X is constant 116 | if(length(unique(X))==1){ # X is constant 117 | Xnames = "constant" 118 | X = as.matrix(as.numeric(X),nrow=1) 119 | colnames(X) = Xnames 120 | X = as.matrix(X) 121 | k=0 122 | }else{ # one single variable of input 123 | Xnames = "X1" 124 | X = as.matrix(X) 125 | k = dim(X)[2] 126 | X = cbind(X, rep(1, n)) 127 | Xnames = c(Xnames, "constant") 128 | colnames(X) = Xnames 129 | } 130 | }else{ # normal cases 131 | X = X[, apply(X, 2, function(x) length(unique(x)) != 1)] 132 | Xnames = colnames(X) 133 | k = dim(X)[2] 134 | X = cbind(X, rep(1, n)) 135 | Xnames = c(Xnames, "constant") 136 | colnames(X) = Xnames 137 | } 138 | 139 | 140 | testcols <- function(X) { 141 | m = crossprod(as.matrix(X)) 142 | ee = eigen(m) 143 | evecs <- split(zapsmall(ee$vectors), col(ee$vectors)) 144 | mapply(function(val, vec) { 145 | if (val != 0) 146 | NULL 147 | else which(vec != 0) 148 | }, zapsmall(ee$values), evecs) 149 | } 150 | collinear = unique(unlist(testcols(X))) 151 | while (length(collinear) > 0) { 152 | if (qr(X)$rank == dim(X)[2]) 153 | print("Model may suffer from multicollinearity problems.") 154 | break 155 | if ((k + 1) %in% collinear) 156 | collinear = collinear[-length(collinear)] 157 | X = X[, -collinear[length(collinear)]] 158 | Xnames = colnames(X) 159 | k = k - 1 160 | 
collinear = unique(unlist(testcols(X))) 161 | } 162 | QMLE <- function(betas) { 163 | betas = matrix(betas, nrow = j - 1, byrow = T) 164 | betamat = rbind(rep(0, k + 1), betas) 165 | llf = 0 166 | for (i in 1:j) { 167 | L = y[, i] * ((X %*% betamat[i, ]) - log(rowSums(exp(X %*% 168 | t(betamat))))) 169 | llf = llf + sum(L) 170 | } 171 | return(llf) 172 | } 173 | QMLE_Obs <- function(betas) { 174 | betas = matrix(betas, nrow = j - 1, byrow = T) 175 | betamat = rbind(rep(0, k + 1), betas) 176 | llf = rep(0, n) 177 | for (i in 1:j) { 178 | L = y[, i] * ((X %*% betamat[i, ]) - log(rowSums(exp(X %*% 179 | t(betamat))))) 180 | llf = llf + L 181 | } 182 | return(llf) 183 | } 184 | if (length(beta0) == 0){ 185 | beta0 = rep(0, (k + 1) * (j - 1)) 186 | } 187 | if (length(beta0) != (k + 1) * (j - 1)) { 188 | beta0 = rep(0, (k + 1) * (j - 1)) 189 | warning("Wrong length of beta0 given. Use default setting instead.") 190 | } 191 | opt <- maxLik(QMLE_Obs, start = beta0, method = MLEmethod, 192 | control = list(iterlim = maxit, tol = abstol), ...) 193 | betamat = matrix(opt$estimate, ncol = k + 1, byrow = T) 194 | betamat_aug = rbind(rep(0, k + 1), betamat) 195 | colnames(betamat_aug) = Xnames 196 | rownames(betamat_aug) = ynames 197 | sigmat = matrix(nrow = j - 1, ncol = k + 1) 198 | vcov = list() 199 | 200 | ###insert--nonparametric bootstrap procedure (clustered SE and vcov) 201 | 202 | if(is.null(cluster)==F){ 203 | cluster = cluster[row.remain] 204 | clusters <- names(table(cluster)) 205 | for (i in 1:j) { 206 | # cluster should preferably be coming from a same data frame with the original y and X. 
207 | sterrs <- matrix(NA, nrow=reps, ncol=k + 1) 208 | vcov_j_list=list() 209 | 210 | b=1 211 | no_singular_error=c() 212 | while(b<=reps){ 213 | 214 | index <- sample(1:length(clusters), length(clusters), replace=TRUE) 215 | aa <- clusters[index] 216 | bb <- table(aa) 217 | bootdat <- NULL 218 | dat=cbind(y,X) 219 | for(b1 in 1:max(bb)){ 220 | cc <- dat[cluster %in% names(bb[bb %in% b1]),] 221 | for(b2 in 1:b1){ 222 | bootdat <- rbind(bootdat, cc) 223 | } 224 | } 225 | 226 | bootdatX=matrix(bootdat[,(j+1):ncol(bootdat)],nrow=nrow(bootdat)) 227 | bootdaty=bootdat[,1:j] 228 | 229 | sum_expxb = rowSums(exp(bootdatX %*% t(betamat_aug))) 230 | expxb = exp(bootdatX %*% betamat_aug[i, ]) 231 | G = expxb/sum_expxb 232 | g = (expxb * sum_expxb - expxb^2)/sum_expxb^2 233 | X_a = bootdatX * as.vector(sqrt(g^2/(G * (1 - G)))) 234 | A = t(X_a) %*% X_a 235 | mu = bootdaty[, i] - G 236 | X_b = bootdatX * as.vector(mu * g/G/(1 - G)) 237 | B = t(X_b) %*% X_b 238 | 239 | a_solve_error = tryCatch(solve(A),error=function(e){NULL}) 240 | if(is.null(a_solve_error)){ 241 | no_singular_error=c(no_singular_error,b) 242 | next 243 | } 244 | 245 | Var_b = solve(A) %*% B %*% solve(A) 246 | std_b = sqrt(diag(Var_b)) 247 | sterrs[b,]=std_b 248 | vcov_j_list[[b]]=Var_b 249 | 250 | b=b+1 251 | } 252 | if(length(no_singular_error)>0){warning(paste('Error in solve.default(A) : Lapack routine dgesv: system is exactly singular: U[28,28] = 0" Appeared',length(no_singular_error),'times within cluster bootstrap for outcome #',i))} 253 | std_b=apply(sterrs,2,mean) 254 | vcov[[i]] = Reduce("+", vcov_j_list) / length(vcov_j_list) 255 | if (i > 1) 256 | sigmat[i - 1, ] = std_b 257 | } 258 | }else{ 259 | for(i in 1:j){ 260 | # start calculation 261 | sum_expxb = rowSums(exp(X %*% t(betamat_aug))) # sum of the exp(x'b)s 262 | expxb = exp(X %*% betamat_aug[i,]) # individual exp(x'b) 263 | G = expxb / sum_expxb # exp(X'bj) / sum^J(exp(X'bj)) 264 | g = (expxb * sum_expxb - expxb^2) / sum_expxb^2 # derivative 
of the logit function 265 | 266 | # Here the diagonal of A is the 'standard' standard error 267 | # hat(A) = sum hat(gi)^2 * xi'xi / hat(Gi)(1-hat(Gi)) 268 | # or, Xtilde = X * sqrt(g^2/G(1-G)), A = Xtilde'Xtilde 269 | X_a = X * as.vector(sqrt(g^2/(G*(1-G)))) 270 | A = t(X_a) %*% X_a 271 | 272 | # robust standard error, again following PW(1996) 273 | mu = y[,i] - G 274 | X_b = X * as.vector(mu * g / G / (1-G)) 275 | B = t(X_b) %*% X_b 276 | Var_b = solve(A) %*% B %*% solve(A) 277 | std_b = sqrt(diag(Var_b)) 278 | # std_b= sqrt(diag(solve(A))) is the "unrobust" standard error. 279 | vcov[[i]] = Var_b 280 | if(i>1) sigmat[i-1,] = std_b 281 | } 282 | } 283 | 284 | ###end of insert--nonparametric bootstrap procedure (clustered SE and vcov) 285 | 286 | listmat = list() 287 | for (i in 1:(j - 1)) { 288 | tabout = matrix(ncol = 4, nrow = k + 1) 289 | tabout[, 1:2] = t(rbind(betamat[i, ], sigmat[i, ])) 290 | tabout[, 3] = tabout[, 1]/tabout[, 2] 291 | tabout[, 4] = 2 * (1 - pnorm(abs(tabout[, 3]))) 292 | colnames(tabout) = c("estimate", "std", "z", "p-value") 293 | if (length(Xnames) > 0) 294 | rownames(tabout) = Xnames 295 | listmat[[i]] = tabout 296 | } 297 | if (length(ynames) > 0) 298 | names(listmat) = ynames[2:j] 299 | outlist = list() 300 | outlist$estimates = listmat 301 | outlist$baseline = ynames[1] 302 | outlist$likelihood = opt$maximum 303 | outlist$conv_code = opt$code 304 | outlist$convergence = paste(opt$type, paste(as.character(opt$iterations), 305 | "iterations"), opt$message, sep = ",") 306 | outlist$count = c(Obs = n, Explanatories = k, Choices = j) 307 | outlist$y = y 308 | outlist$X = X 309 | outlist$rowNo = row.remain 310 | outlist$coefficient = betamat_aug 311 | names(vcov) = ynames 312 | outlist$vcov = vcov 313 | outlist$cluster = cluster 314 | outlist$reps=ifelse(is.null(cluster),0,reps) 315 | 316 | print(paste("Fractional logit model estimation completed. 
Time:", 317 | round(proc.time()[3] - start.time[3], 1), "seconds")) 318 | return(structure(outlist, class = "fmlogit")) 319 | } 320 | 321 | 322 | 323 | 324 | 325 | 326 | 327 | -------------------------------------------------------------------------------- /R/marginals.R: -------------------------------------------------------------------------------- 1 | #' Average Partial Effects of the Covariates 2 | #' 3 | #' Calculate average partial effects (APE) of independent variable from a fractional multinomial logit model. 4 | #' 5 | #' @param object An "fmlogit" object. 6 | #' @param effect Can be "marginal", for marginal effect; or "discrete", for discrete changes from 7 | #' the min to the max. 8 | #' @param marg.type Type of marginal or discrete effects to be computed. Default to "atmean", the effect at 9 | #' the mean of all covariates. Also take "aveacr", the averaged effects across all observations. See details. 10 | #' @param se Whether to calculate standard errors for those margins. See details. 11 | #' @param varlist A string vector which provides the name of variables to calculate 12 | #' the marginal effect. If missing, all variables except the constant will be calculated. 13 | #' Use "constant" if wish to compute the marginal effect of constant. 14 | #' @param marg.list A list of matrices storing the marginal effect matrix for each observation. Exists 15 | #' only if marg.type="aveacr". 16 | #' @param at Specify values of the X-matrix at which the partial effect will be retrieved. Expect a vector input 17 | #' of length K-1. Only supported for \code{marg.type="atmean"}. See \code{predict.fmlogit(newdata)}. 18 | #' @param R Number of times to sample for the Krinsky-Robb standard error. Default to 1000. 19 | #' @details This module calculates the average partial effects (APEs) from a fractional multinomial logit model. 20 | #' Partial effects are the counterpart of the marginal effects in a linear model setting. 
In linear models, 21 | #' the parameter estimate itself usually represents the marginal effect (if the variable in question is continuous). 22 | #' In logit models, however, the parameter estimates are the effects on the log-ratio between the choice variable 23 | #' and the baseline variable. This function is intended to extract APEs from the 24 | #' coefficient estimates of a fractional multinomial logit model. 25 | #' 26 | #' This function allows for two types of partial effects: marginal effects and discrete effects. 27 | #' A marginal effect represents how a unit change in one continuous variable x influences the choice variable y. 28 | #' Estimating the marginal effect is straightforward. However, special care is needed when averaging 29 | #' the marginal effect across observations to obtain the APE. One approach is to evaluate the marginal effect while setting 30 | #' the other explanatory variables at their means. We call this the marginal effect at the mean (MEM), which corresponds 31 | #' to the option \code{marg.type="atmean"}. Another approach is to take the average of the marginal effects for each 32 | #' individual. We call this the average marginal effect (AME), which corresponds to the option 33 | #' \code{marg.type="aveacr"}. 34 | #' 35 | #' The discrete effect represents how a discrete change in one specific x, discrete or continuous, influences the choice variable y. 36 | #' This is more useful for categorical variables, as calculating the "marginal effect" makes little sense 37 | #' for them. In this function, we calculate the discrete effect by changing the explanatory variable from 38 | #' its minimum to its maximum. For a binary variable, this is just the difference between 0 and 1. Similar 39 | #' to the marginal effect case, we also have the discrete effect at the mean (DEM), corresponding to \code{marg.type="atmean"}, 40 | #' and the average discrete effect (ADE), corresponding to \code{marg.type="aveacr"}. 
41 | #' 42 | #' Standard errors for the effects are computed with the Krinsky-Robb (KR) method. Krinsky-Robb is a simulation-based 43 | #' method that calculates the empirical value of a function given a known distribution of its variables. Here 44 | #' we provide Krinsky-Robb standard errors for MEM and DEM, and the user can specify the number of 45 | #' simulation draws \code{R} that the Krinsky-Robb algorithm should run. 46 | #' 47 | #' The user can also specify a subset of explanatory variables when calculating effects. This is done by 48 | #' passing a string vector containing the column names of the explanatory variables to \code{varlist}. As the 49 | #' KR standard error can be time-consuming, it is advisable to compute effects only for the variables needed. 50 | #' 51 | #' @return The function returns an object of class "fmlogit.margins". It contains the following components: 52 | #' @return \code{effects} A matrix of calculated effects. 53 | #' @return \code{se} A matrix of standard errors corresponding to the effects. Present only if 54 | #' \code{se=T} is passed. 55 | #' @return \code{ztable} A list of matrices containing effects, standard errors, z-stats and p-values. 56 | #' @return \code{R} Number of simulation draws used for the Krinsky-Robb standard error calculation. 0 if se=F. 57 | #' @return \code{expl} String message explaining the effects calculated. 
58 | #' 59 | #' @examples 60 | #' #results1 = fmlogit(y,X) 61 | #' effects(results1,effect="marginal") 62 | #' effects(results1,effect="discrete",varlist = colnames(object$X)[c(1,3)]) 63 | #' @export effects.fmlogit 64 | 65 | effects.fmlogit<-function(object,effect=c("marginal","discrete"), 66 | marg.type="atmean",se=F,varlist = NULL,at=NULL,R=1000){ 67 | j=length(object$estimates)+1; K=dim(object$estimates[[1]])[1]; N=dim(object$y)[1] 68 | betamat = object$coefficient 69 | R = R # for Krinsky-Robb sampling 70 | # determine variables 71 | Xnames = colnames(object$X); ynames = colnames(object$y) 72 | if(length(varlist)==0){ 73 | varlist=Xnames[-K] 74 | var_colNo = c(1:(K-1)) 75 | k = length(var_colNo) 76 | }else{ 77 | var_colNo = unlist(lapply(varlist, function(x) {which(Xnames == x)})) 78 | if(length(varlist) != length(var_colNo)) stop("Unrecognized varlist input. Please double check your spelling") 79 | k = length(var_colNo) 80 | } 81 | 82 | xmarg = matrix(ncol=k,nrow=j) 83 | se_mat = matrix(ncol=k,nrow=j) 84 | marg_list = list() 85 | 86 | if(effect == "marginal"){ 87 | # calculate marginal effects 88 | yhat = predict(object); yhat = as.matrix(yhat) 89 | for(c in var_colNo){ 90 | c1 = which(var_colNo == c) 91 | if(marg.type == "aveacr"){ 92 | # this is the average marginal effect for all observations 93 | beta_bar = as.vector(yhat %*% betamat[,c]) 94 | betak_long = matrix(rep(betamat[,c],N),nrow=N,byrow=T) 95 | marg_mat = yhat * (betak_long-beta_bar) 96 | xmarg[,c1] = colMeans(marg_mat) 97 | marg_list[[c1]] = marg_mat 98 | } 99 | if(marg.type == "atmean"){ 100 | # this is the marginal effect at the mean 101 | # mean calculation 102 | if(is.null(at)) at = colMeans(object$X[,-K]) 103 | yhat_mean = predict(object,newdata=at) 104 | beta_bar = sum(yhat_mean * betamat[,c]) 105 | betak = betamat[,c] 106 | marg_vec = yhat_mean * (betak - beta_bar) 107 | xmarg[,c1] = as.numeric(marg_vec) 108 | } 109 | if(se==T){ 110 | # se calculation, using atmean by default 111 | se_k = 
rep(0,j) 112 | for(i in 1:j){ 113 | se_k[i] = sqrt(diag(object$vcov[[i]])[c]) 114 | new_betak = rnorm(R,betamat[i,c],se_k[i]) # center the draws on choice i's own estimate 115 | marg_matrix = matrix(nrow=R,ncol=j) 116 | for(r in 1:R){ 117 | new_betamat = betamat; new_betamat[i,c] = new_betak[r] 118 | yhat_mean = predict(object,newdata=colMeans(object$X[,-K]),newbeta = new_betamat) 119 | beta_bar = sum(yhat_mean * new_betamat[,c]) 120 | betak = new_betamat[,c] 121 | marg_vec = yhat_mean * (betak - beta_bar) 122 | marg_matrix[r,i] = as.numeric(marg_vec)[i] 123 | } 124 | se_mat[i,c1] = sd(marg_matrix[,i]) 125 | }}}} 126 | 127 | if(effect=="discrete"){ 128 | for(c in var_colNo){ 129 | c1 = which(var_colNo == c) 130 | if(marg.type == "aveacr"){ 131 | Xmin <- Xmax <- object$X[,-K] 132 | Xmin[,c] = min(object$X[,c]) 133 | Xmax[,c] = max(object$X[,c]) 134 | yhat_min = predict(object,newdata=Xmin) 135 | yhat_max = predict(object,newdata=Xmax) 136 | ydisc = yhat_max - yhat_min 137 | xmarg[,c1] = colMeans(ydisc) 138 | marg_list[[c1]] = ydisc 139 | } 140 | if(marg.type == "atmean"){ 141 | if(is.null(at)) at = colMeans(object$X[,-K]) 142 | Xmin <- Xmax <- at 143 | Xmin[c] = min(object$X[,c]) 144 | Xmax[c] = max(object$X[,c]) 145 | yhat_min = predict(object,newdata=Xmin) 146 | yhat_max = predict(object,newdata=Xmax) 147 | ydisc = yhat_max - yhat_min 148 | xmarg[,c1] = as.numeric(ydisc) 149 | } 150 | if(se==T){ 151 | # se calculation for discrete margins. 
using atmean by default 152 | se_k = rep(0,j) 153 | Xmin <- Xmax <- colMeans(object$X[,-K]) 154 | Xmin[c] = min(object$X[,c]) 155 | Xmax[c] = max(object$X[,c]) 156 | marg_matrix = matrix(nrow=R,ncol=j) 157 | for(i in 1:j){ 158 | se_k[i] = sqrt(diag(object$vcov[[i]])[c]) 159 | new_betak = rnorm(R,betamat[i,c],se_k[i]) # center the draws on choice i's own estimate 160 | for(r in 1:R){ 161 | new_betamat = betamat; new_betamat[i,c] = new_betak[r] 162 | yhat_min = predict(object,newdata=Xmin,newbeta = new_betamat) 163 | yhat_max = predict(object,newdata=Xmax,newbeta = new_betamat) 164 | ydisc = yhat_max - yhat_min 165 | marg_matrix[r,i] = as.numeric(ydisc)[i] 166 | } 167 | se_mat[i,c1] = sd(marg_matrix[,i]) 168 | }}}} 169 | # generating hypothesis testing tables. 170 | listmat = list() 171 | if(se){ 172 | for(i in 1:k){ 173 | tabout = matrix(ncol=4,nrow=j) 174 | tabout[,1:2] = cbind(xmarg[,i],se_mat[,i]) 175 | tabout[,3] = tabout[,1] / tabout[,2] 176 | tabout[,4] = 2*(1-pnorm(abs(tabout[,3]))) 177 | colnames(tabout) = c("estimate","std","z","p-value") 178 | rownames(tabout) = ynames 179 | listmat[[i]] = tabout 180 | } 181 | names(listmat)=varlist 182 | } 183 | 184 | 185 | colnames(xmarg) <- colnames(se_mat) <- varlist 186 | rownames(xmarg) <- rownames(se_mat) <-colnames(object$y) 187 | outlist=list() 188 | outlist$effects = xmarg 189 | if(se==T){outlist$se = se_mat; outlist$ztable = listmat} 190 | if(marg.type=="aveacr") {names(marg_list)=varlist; outlist$marg.list = marg_list} 191 | marg.type.out = ifelse(marg.type=="atmean","at the mean,","average across observations,") 192 | 193 | # number of Krinsky-Robb draws used (0 when se=F) 194 | outlist$R = ifelse(se,R,0) 195 | 196 | 197 | outlist$expl = paste(effect,"effect",marg.type.out, 198 | ifelse(se==T,"Krinsky-Robb standard error calculated","standard error not computed")) 199 | return(structure(outlist,class="fmlogit.margins")) 200 | } 201 | -------------------------------------------------------------------------------- /R/plot_effects.R: 
-------------------------------------------------------------------------------- 1 | #' Plot marginal or discrete effects, at each observation & for each choice 2 | #' 3 | #' Plot the desired effect at each observed value for each choice 4 | #' 5 | #' @param object An "fmlogit.margins" object. 6 | #' @param varlist A string vector giving the names of the variables whose effects are plotted. 7 | #' If missing, all variables in object will be plotted. 8 | #' @param X The covariate matrix. It is recommended to use element X from the fmlogit object. 9 | #' @param y The dependent variable matrix. It is recommended to use element y from the fmlogit object. 10 | #' @param against A vector with the same length as the number of observations in the model. 11 | #' Serves as the x-axis in the plots. 12 | #' @param against.x A character string. The column name in the X matrix to plot against. 13 | #' @param against.y A character string. The column name in the y matrix to plot against. 14 | #' @param group.x A character string. The column name in the X matrix to group by. 15 | #' @param group.algebra A character string. An additional algebraic condition imposed on the group variable. 16 | #' @param mfrow A numeric vector with two elements. Specifies the number of rows and columns in a panel. 17 | #' Similar to par(mfrow=c()). Defaults to NULL, in which case the program chooses a square panel. 18 | #' @return Panel plots of effects vs. chosen variables 19 | #' @details 20 | #' This function provides a visualization tool for potentially heterogeneous marginal and discrete effects. 21 | #' The function lets the user plot marginal effects to detect any patterns in the effects, on their own 22 | #' and against other variables. The plot also allows visualization of sub-groups in the data, which can be 23 | #' very useful for visualizing categorical and dummy variables. 24 | #' 25 | #' The function takes an fmlogit.margins object, created by the effects(fmlogit) function. 
Note that since 26 | #' the plotting requires marginal effects for all observations, the object should be created by choosing 27 | #' \code{marg.type="aveacr"}, the average-across-observations method for effects calculation. 28 | #' 29 | #' Additional parameters include \code{varlist}, a vector of string variable names to be plotted, and \code{X} 30 | #' and \code{y}, the independent and dependent variable matrices in the original regression model. 31 | #' 32 | #' \code{against}, \code{against.x}, and \code{against.y} allow different variables to be chosen 33 | #' as the x-axis. \code{against} directly supplies the vector to be plotted against, whereas \code{against.x} 34 | #' and \code{against.y} supply variable names in the original dataset. Note that the user has to provide 35 | #' \code{X} or \code{y}, respectively, in order to use the column name option. 36 | #' 37 | #' \code{group.x} supplies the column name in the X matrix to be grouped by. The plot will 38 | #' distinguish the groups by color. Additionally, the user can supply a string to \code{group.algebra}, 39 | #' which provides an algebraic condition that will be evaluated on the group vector. For example, choosing 40 | #' \code{group.x = "a"} and \code{group.algebra = ">0"} will create two groups, one with X$a>0, and one with X$a 41 | #' <=0 42 | #' @examples 43 | #' # Not running 44 | #' # results1 = fmlogit(y,X) 45 | #' # effect1 = effects(results1,effect="marginal",marg.type="aveacr") 46 | #' 47 | #' # Plot only takes effects with marg.type="aveacr". 
48 | #' plot(effect1,X=results1$X,against.x = "popdens", group.x = "tot", group.algebra = ">3") 49 | #' @export plot.fmlogit.margins 50 | 51 | 52 | 53 | plot.fmlogit.margins = function(object,varlist=NULL,X=NULL,y=NULL, 54 | against=NULL,against.x=NULL,against.y=NULL, 55 | group.x=NULL, group.algebra=NULL, 56 | mfrow=NULL){ 57 | require(ggplot2) 58 | require(grid) 59 | 60 | if(is.null(object[["marg.list"]])) stop("Please choose marg.type=aveacr when calculating effects") 61 | k = ncol(object$effects); j = nrow(object$effects); N = nrow(object$marg.list[[1]]); 62 | Xnames = colnames(object$effects) ; ynames = rownames(object$effects) 63 | # X = object$X; y=object$y 64 | 65 | # determine variable list 66 | if(length(varlist)==0){ 67 | varlist=Xnames 68 | var_colNo = 1:k 69 | }else{ 70 | var_colNo = which(Xnames %in% varlist) 71 | k = length(var_colNo) 72 | } 73 | if(k==0) stop("Variable list not matched. Please check your varlist input.") 74 | 75 | # determine panel size 76 | if(is.null(mfrow)){ 77 | js = ceiling(sqrt(j)) 78 | jr = ifelse(js*(js-1)>j,js-1,js) 79 | }else{ 80 | jr = mfrow[1]; js = mfrow[2] 81 | } 82 | 83 | # determine plotting x axis. 84 | if(is.null(against) & is.null(against.x) & is.null(against.y)){ 85 | M.against=1:N 86 | ag.name = "ObsNo" 87 | }else if(is.null(against.x)==F){ 88 | M.against = X[,against.x] 89 | if(is.null(M.against)){ 90 | stop("against.x not found in variable list. 
Please double check your spelling") 91 | } 92 | ag.name = against.x 93 | }else if(is.null(against.y)==F){ 94 | M.against = y[,against.y] 95 | ag.name = against.y 96 | }else{M.against=against} 97 | 98 | 99 | # determine group variables 100 | if(is.null(group.x) & is.null(group.algebra)) {M.group=NULL; g.name=NULL} 101 | if(is.null(group.x)==F) {M.group = X[,group.x]; g.name.display <- g.name <- group.x;} 102 | if(is.null(group.algebra)==F) { 103 | M.group = eval(parse(text=paste("X[,",'"',group.x,'"',"]",group.algebra,sep=""))) 104 | M.group = ifelse(M.group,"Yes","No") 105 | g.name = group.x 106 | g.name.display = paste(group.x,group.algebra,sep="") 107 | } 108 | 109 | for(c in var_colNo){ 110 | ggplot() 111 | pushViewport(viewport(layout = grid.layout(jr, js))) 112 | temp.data = cbind(object$marg.list[[c]],M.against) 113 | temp.data = as.data.frame(temp.data) 114 | colnames(temp.data) = c(colnames(object$marg.list[[c]]),ag.name) 115 | if(is.null(M.group)==F){ 116 | temp.data = cbind(temp.data,as.factor(M.group)) 117 | colnames(temp.data)[-1] = g.name} 118 | for(i in 1:j){ 119 | g <- ggplot(temp.data,aes_string(ag.name,ynames[i],color=g.name)) + geom_point() 120 | g <- g + geom_hline(yintercept = 0) + theme_classic() + ggtitle(paste("Effects on", Xnames[c])) 121 | if(is.null(M.group)==F) g <- g + theme(legend.title = element_text(colour="black"))+ 122 | scale_color_discrete(name=g.name.display) 123 | print(g,vp = viewport(layout.pos.row = ifelse(i%%jr==0,jr,i%%jr), layout.pos.col = (i-1) %/%js + 1) ) 124 | } 125 | }} -------------------------------------------------------------------------------- /R/plot_effects_1.R: -------------------------------------------------------------------------------- 1 | #' Plot marginal or discrete effects of willingness to pay 2 | #' 3 | #' Plot marginal or discrete effects of willingness to pay, potentially against another variable 4 | #' 5 | #' @param object An "fmlogit" object. 
6 | #' @param varlist A string vector which provides the name of variables to plot the effect. 7 | #' If missing, all variables in object will be plotted. 8 | #' @param X The covariates matrix. Recommend to use element X from the fmlogit object. 9 | #' @param y The covariates matrix. Recommend to use element y from the fmlogit object. 10 | #' @param against A vector with the same length as the number of observations in the model. 11 | #' Serve as the x-axis in the plots. 12 | #' @param mfrow A numeric vector with two elements. Specify the number of rows and columns in a panel. 13 | #' Similar to par(mfrow=c()). Default to Null, and the program will choose a square panel. 14 | #' @param plot.show If true, the plot will be created. Otherwise the function returns raw data that can be 15 | #' used to create user-specified (fancier) plots. 16 | #' @return Panel plots of effects vs. chosen variables 17 | #' @details 18 | #' This function provides a visualization tool for potentially heterogeneous marginal and discrete effects. 19 | #' The function lets the user to plot marginal effects to detect any patterns in the effects, in itself 20 | #' and against other variables. The plot also allows visualization of sub-groups in data, which can be 21 | #' very useful to visualize categorical and dummy variables. 22 | #' 23 | #' The functions takes an fmlogit.margins object, created by the effects(fmlogit) function. Note that since 24 | #' the plotting requires marginal effects for all observations, the object should be created by choosing 25 | #' \code{marg.type="aveacr"}, the average across method for effects calculation. 26 | #' 27 | #' Additional parameters including \code{varlist}, a vector of string variable names to be plotted. \code{X} 28 | #' and \code{y}, the dependent and independent variable matrix in the original regression model. 29 | #' 30 | #' \code{against}, \code{against.x}, and \code{against.y} allows different variables to be chosen 31 | #' as the x-axis. 
\code{against} directly supplies the vector to be plotted against, whereas \code{against.x} 32 | #' and \code{against.y} supplies variable names in the original dataset. Note that the user has to provide 33 | #' \code{X} and \code{y} in order to use the column name option, respectively. 34 | #' 35 | #' \code{group.x} supplies the column name in the X matrix to be grouped by. The plot will be able to 36 | #' differentiate different groups by colors. Additionally, the user can supply a string to \code{group.by}, 37 | #' which provides a algebra method that will be evaluated on the group vector. For example, choose 38 | #' \code{group.x = "a"} and \code{group.by= ">0"} will create two groups, one with X$a>0, and one with X$a 39 | #' <=0 40 | #' @examples 41 | #' # Not running 42 | #' # results1 = fmlogit(y,X) 43 | #' # effect1 = effects(results1,effect="marginal",marg.type="aveacr") 44 | #' 45 | #' # Plot only takes effects with marg.type="aveacr". 46 | #' plot(effect1,X=results1$X,against.x = "popdens", group = "tot", groupby = ">3") 47 | #' @export plot.fmlogit 48 | 49 | 50 | plot.fmlogit = function(object,wtp.vec,varlist, against=NULL,mfrow=NULL,t=500,effect=c("discrete","marginal"), 51 | type="l",plot.show=T,...){ 52 | K = ncol(object$X); j = ncol(object$y); N = nrow(object$X); 53 | Xnames = colnames(object$X) ; ynames = colnames(object$y) 54 | X = object$X; y=object$y 55 | 56 | # determine variable list 57 | var_colNo = which(Xnames %in% varlist) 58 | k = length(var_colNo) 59 | 60 | if(is.null(mfrow)){ 61 | js = ceiling(sqrt(k)) 62 | jr = ifelse(js*(js-1)>=k,js-1,js) 63 | }else{ 64 | jr = mfrow[1]; js = mfrow[2] 65 | } 66 | 67 | if(!is.null(against)) { 68 | ag_No = which(Xnames == against) 69 | if(length(ag_No)==0) stop(paste("The against vector specified,",against, 70 | "is not in the list of explanatory variables. 
Please check again.")) 71 | ag_min = min(X[,ag_No]); ag_max = max(X[,ag_No]) 72 | ag_vec = seq(ag_min,ag_max,length.out = t) 73 | wtp_mat = matrix(nrow=t,ncol=k) 74 | colnames(wtp_mat) = varlist 75 | for(i in 1:t){ 76 | newdata = colMeans(X[,-K]) 77 | newdata[ag_No] = ag_vec[i] 78 | wtp_mat[i,] = wtp(effects(object,effect=effect,se=F,varlist=varlist,at=newdata),wtp.vec)[[1]] 79 | } 80 | }else{ 81 | against="ObsNo" 82 | ag_vec=1:N 83 | wtp_mat = matrix(nrow=N,ncol=k) 84 | colnames(wtp_mat) = varlist 85 | for(i in 1:N){ 86 | newdata = X[i,-K] 87 | wtp_mat[i,] = wtp(effects(object,effect=effect,se=F,varlist=varlist,at=newdata),wtp.vec)[[1]] 88 | } 89 | } 90 | # plotting 91 | if(plot.show){ 92 | par(mfrow=c(jr,js)) 93 | if(is.null(type)){type="l"} # default to line plot. 94 | for(i in 1:k){ 95 | plot(ag_vec,wtp_mat[,i],xlab=against,ylab=paste(effect,"effect of", varlist[i]),...) 96 | }} 97 | return(list(ag_vec,wtp_mat)) 98 | } 99 | -------------------------------------------------------------------------------- /R/predictions.R: -------------------------------------------------------------------------------- 1 | #' Extract fitted values, residuals, and predictions 2 | #' 3 | #' @name fitted.fmlogit 4 | #' @aliases residuals.fmlogit 5 | #' @aliases predicted.fmlogit 6 | #' Extract fitted dependent variable from a fractional multinomial logit model. 7 | #' @param object A "fmlogit" object. 8 | #' @param newdata A new X matrix to perform model prediction. If Null, default to the original dataset. 9 | #' X can be a vector with length k, or a matrix with k columns, where k is the number of explanatory 10 | #' variables in the original model. 11 | #' @param newbeta A new augmented matrix of coefficients that can be used to predict outcome variables. 12 | #' Feeds into object$coefficient, which contains the baseline coefficient. Useful for constructing 13 | #' confidence intervals via simulation or bootstrapping. 
14 | #' @examples 15 | #' #results1 = fmlogit(y,X) 16 | #' fitted(results1) 17 | #' residuals(results1) 18 | #' predict(results1) 19 | #' # predict using the first observation from the original dataset. 20 | #' predict(results1,X[1,]) 21 | #' @rdname fitted.fmlogit 22 | #' @export fitted.fmlogit 23 | #' 24 | 25 | 26 | fitted.fmlogit <-function(object){ 27 | j=length(object$estimates)+1; k=dim(object$estimates[[1]])[1]; N=dim(object$y)[1] 28 | betamat_aug = object$coefficient; X=object$X; y=object$y 29 | sum_expxb = rowSums(exp(X %*% t(betamat_aug))) # sum of the exp(x'b)s 30 | yhat = y 31 | for(i in 1:j){ 32 | expxb = exp(X %*% betamat_aug[i,]) # individual exp(x'b) 33 | yhat[,i] = expxb / sum_expxb 34 | } 35 | return(as.data.frame(yhat)) 36 | } 37 | 38 | #' @rdname fitted.fmlogit 39 | #' @export residuals.fmlogit 40 | #' 41 | residuals.fmlogit <- function(object){ 42 | yhat = fitted(object) 43 | return(as.data.frame(object$y-yhat)) 44 | } 45 | 46 | #' @rdname fitted.fmlogit 47 | #' @export predict.fmlogit 48 | #' 49 | predict.fmlogit <- function(object,newdata=NULL,newbeta = NULL){ 50 | if(length(newdata)==0) return(fitted(object)) 51 | if(length(newbeta)>0) object$coefficient = newbeta 52 | j=length(object$estimates)+1; k=dim(object$estimates[[1]])[1]; N=dim(object$y)[1] 53 | betamat_aug = object$coefficient; 54 | newdata = as.matrix(newdata) 55 | if(length(newdata) == dim(newdata)[1]) newdata = t(newdata) # vector 56 | if(k != dim(newdata)[2]+1) stop(paste("Dimension of newdata is wrong. 
Should be",k-1,"instead of",dim(newdata)[2])) 57 | X = cbind(newdata,1); N = dim(X)[1] 58 | yhat = matrix(ncol=j,nrow=N); colnames(yhat) = colnames(object$y) 59 | sum_expxb = rowSums(exp(X %*% t(betamat_aug))) # sum of the exp(x'b)s 60 | for(i in 1:j){ 61 | expxb = exp(X %*% betamat_aug[i,]) # individual exp(x'b) 62 | yhat[,i] = expxb / sum_expxb 63 | } 64 | return(as.data.frame(yhat)) 65 | } -------------------------------------------------------------------------------- /R/spending_data.R: -------------------------------------------------------------------------------- 1 | #' Government Spending by Dutch Cities in 2005 2 | #' 3 | #' Data from 429 Dutch cities with governmental spending on each sub-category, 4 | #' and city attributes. 5 | #' 6 | #' @docType data 7 | #' 8 | #' @usage data(spending) 9 | #' 10 | #' @format A data frame with 429 rows and 12 columns. 11 | #' @keywords datasets 12 | #' 13 | #' @source \url{http://fmwww.bc.edu/repec/bocode/c/citybudget.dta} 14 | #' 15 | #' @examples 16 | #' spending 17 | 18 | "spending" 19 | 20 | -------------------------------------------------------------------------------- /R/summary.R: -------------------------------------------------------------------------------- 1 | #' Generate summary tables for fmlogit objects 2 | #' 3 | #' Generate tables of coefficient estimates, partial effects, and willingness to pay from 4 | #' fmlogit-type objects. 5 | #' 6 | #' @name summary.fmlogit 7 | #' @aliases summary.fmlogit.margins 8 | #' @aliases summary.fmlogit.wtp 9 | #' 10 | #' @param object an object with class "fmlogit", "fmlogit.margins", or "fmlogit.wtp". 11 | #' @param varlist select a subset of variable names to be processed. Defaults to NULL, in which case all variables will 12 | #' be processed. 13 | #' @param sepline whether the output table uses separate lines for coefficients and standard errors. 14 | #' @param digits number of significant digits to show. Defaults to 3 digits.
15 | #' @param add.info whether to add additional descriptive information to the output. 16 | #' @param list whether to output a list object, or a single data frame. 17 | #' @param sigcode the significance code to be used. Has to be a three-component vector. 18 | #' @return Either a list (for display purposes) or a data.frame (for csv output purposes). If list return (which is 19 | #' the default) is selected with \code{add.info=T}, then the list will contain 5 components: $estimates the estimates; $N number of 20 | #' observations; $llf value of the log-likelihood function; $baseline the name of the baseline choice; and $sigcode the significance legend. 21 | #' 22 | #' @details This module provides summary methods for three fmlogit objects: \code{fmlogit}, \code{fmlogit.margins}, 23 | #' and \code{fmlogit.wtp}. 24 | #' 25 | #' The summary method offers several options to the users. The user can choose a list output \code{list=T}, which is 26 | #' good for display and quoting purposes, or a data frame output \code{list=F}, which is good for table outputs. The user 27 | #' can also specify whether to provide additional information other than the parameter estimates, whether to use 28 | #' separate lines for the estimates and the standard errors (which mimics the output style in Stata), 29 | #' as well as the significance code. 30 | #' 31 | #' @examples 32 | #' # generate fmlogit summary 33 | #' #results1 = fmlogit(y,X) 34 | #' 35 | #' # generate marginal effects summary 36 | #' #effects1 = effects(results1,effect="marginal") 37 | #' summary(effects1) 38 | #' 39 | #' # generate latex style output 40 | #' # require(xtable) 41 | #' xtable(summary(effects1,list=F,sepline=T)) 42 | #' @rdname summary.fmlogit 43 | #' @export summary.fmlogit 44 | 45 | ############ 46 | # generate fmlogit style table 47 | ########### 48 | 49 | summary.fmlogit = function(object,varlist=NULL,sepline=F,digits=3,add.info=T,list=T,sigcode=c(0.05,0.01,0.001), 50 | print=F){ 51 | # define significance code first.
52 | asterisk = function(x,k=sigcode){ 53 | if(x>k[1]) return("") 54 | if(x>k[2]) return("*") 55 | if(x>k[3]){return("**")}else 56 | {return("***")} 57 | } 58 | # main text 59 | # pre matters 60 | if(!class(object)=="fmlogit") stop("Expect an fmlogit object. Wrong object type given.") 61 | ynames = names(object[[1]]); Xnames = rownames(object[[1]][[1]]) 62 | if(length(varlist)==0){varlist=Xnames} 63 | var_colNo = which(Xnames %in% varlist) 64 | j = object$count[3]; K = length(var_colNo) 65 | if(K < length(varlist)) warning("Some variables requested are not in the variable list. Those variables are omitted.") 66 | varlist = Xnames[var_colNo] 67 | # generating tables 68 | if(!sepline){ 69 | store_mat = matrix(ncol=j-1,nrow=K) 70 | colnames(store_mat)=ynames 71 | rownames(store_mat)=Xnames[var_colNo] 72 | for(i in 1:(j-1)){ 73 | temp_data = signif(object$estimates[[i]][var_colNo,],digits=digits) 74 | if(is.null(dim(temp_data))){ 75 | store_mat[,i] = paste(temp_data[1],"(",temp_data[2],")",asterisk(temp_data[4]),sep="") 76 | next 77 | } 78 | store_mat[,i]=apply(temp_data, 1, function(x) paste(x[1],"(",x[2],")",asterisk(x[4]),sep="")) 79 | }}else{ 80 | store_beta = store_se = matrix(ncol=j-1,nrow=K) 81 | colnames(store_beta)=ynames 82 | rownames(store_beta)=varlist 83 | for(i in 1:(j-1)){ 84 | temp_data = signif(object$estimates[[i]][var_colNo,],digits=digits) 85 | if(is.null(dim(temp_data))) temp_data = as.matrix(temp_data) 86 | store_beta[,i]=apply(temp_data,1, function(x) paste(x[1],asterisk(x[4]),sep="")) 87 | store_se[,i]=apply(temp_data, 1, function(x) paste("(",x[2],")",sep="")) 88 | } 89 | for(i in 1:K){ 90 | if(i==1) store_mat=matrix(ncol=j-1) 91 | store_mat = rbind(store_mat,store_beta[i,],store_se[i,]) 92 | } 93 | store_mat=store_mat[-1,] 94 | rownames(store_mat) = rep(" ",length=nrow(store_mat)) 95 | rownames(store_mat)[seq(1,K*2,2)] = varlist 96 | } 97 | # output matters 98 | sig.print = paste("Significance code: 0", "'***'", sigcode[3], "'**'", sigcode[2], 
"'*'", sigcode[1], "' ", 1) 99 | if(add.info){ 100 | nc = paste("N=",object$count[1],sep="") 101 | llf = paste("log pseudo-likelihood=",round(object$likelihood,digits=2),sep="") 102 | bl = paste("Baseline choice:", object$baseline) 103 | } 104 | if(list){ 105 | outlist = list(estimates=store_mat) 106 | if(add.info){ 107 | outlist$N = nc 108 | outlist$llf = llf 109 | outlist$baseline = bl 110 | outlist$sigcode = sig.print 111 | } 112 | if(print){print(outlist)} 113 | return(outlist) 114 | }else{ 115 | if(add.info){ 116 | info = matrix(ncol=j-1,nrow=4) 117 | info[,1] = c(nc,llf,bl,sig.print) 118 | store_mat = rbind(store_mat,info) 119 | } 120 | if(print){print(store_mat)} 121 | return(as.data.frame(store_mat)) 122 | } 123 | } 124 | 125 | ########## 126 | # summary for fmlogit.margins 127 | ########## 128 | 129 | #' @rdname summary.fmlogit 130 | #' @export summary.fmlogit.margins 131 | 132 | summary.fmlogit.margins = function(object,varlist=NULL,sepline=F,digits=3,add.info=T,list=T,sigcode=c(0.05,0.01,0.001), 133 | print=F){ 134 | # define significance code first. 135 | asterisk = function(x,k=sigcode){ 136 | if(x>k[1]) return("") 137 | if(x>k[2]) return("*") 138 | if(x>k[3]){return("**")}else 139 | {return("***")} 140 | } 141 | # main text 142 | if(!class(object)=="fmlogit.margins") stop("Expect an fmlogit.margins object. Wrong object type given.") 143 | ynames = rownames(object[[1]]); Xnames = colnames(object[[1]]) 144 | if(length(varlist)==0) varlist=Xnames 145 | var_colNo = which(Xnames %in% varlist) 146 | j = length(ynames); K = length(var_colNo) 147 | if(K < length(varlist)) warning("Some variables requested are not in the variable list. 
Those variables are omitted.") 148 | varlist = Xnames[var_colNo] 149 | 150 | # table process 151 | if(object$R==0) sepline=FALSE 152 | if(!sepline){ 153 | store_mat = matrix(ncol=j,nrow=K) 154 | colnames(store_mat)=ynames 155 | rownames(store_mat)=varlist # one row per selected variable 156 | if(object$R>0){ 157 | for(i in var_colNo){ 158 | temp_data = signif(object$ztable[[i]],digits=digits) 159 | store_mat[which(var_colNo==i),]=apply(temp_data, 1, function(x) paste(x[1],"(",x[2],")",asterisk(x[4]),sep="")) 160 | }}else{ 161 | store_mat = signif(t(object$effects)[var_colNo,,drop=FALSE],digits=digits) 162 | } 163 | }else{ 164 | store_beta = store_se = matrix(ncol=j,nrow=K) 165 | colnames(store_beta)=ynames 166 | rownames(store_beta)=varlist # one row per selected variable 167 | for(i in var_colNo){ 168 | temp_data = signif(object$ztable[[i]],digits=digits) 169 | store_beta[which(var_colNo==i),]=apply(temp_data,1, function(x) paste(x[1],asterisk(x[4]),sep="")) 170 | store_se[which(var_colNo==i),]=apply(temp_data, 1, function(x) paste("(",x[2],")",sep="")) 171 | } 172 | for(i in 1:K){ 173 | if(i==1) store_mat=matrix(ncol=j) 174 | store_mat = rbind(store_mat,store_beta[i,],store_se[i,]) 175 | } 176 | store_mat=store_mat[-1,] 177 | rownames(store_mat) = rep("",length=nrow(store_mat)) 178 | rownames(store_mat)[seq(1,K*2,2)] = varlist 179 | } 180 | # output matters 181 | sig.print = paste("Significance code: 0", "'***'", sigcode[3], "'**'", sigcode[2], "'*'", sigcode[1], "' ", 1) 182 | if(add.info){ 183 | expl = object$expl 184 | } 185 | if(list){ 186 | outlist = list(estimates=store_mat) 187 | if(add.info){ 188 | outlist$expl = expl 189 | outlist$sigcode = sig.print 190 | } 191 | if(print){print(outlist)} 192 | return(outlist) 193 | }else{ 194 | if(add.info){ 195 | info = matrix(ncol=j,nrow=2) 196 | info[,1] = c(expl,sig.print) 197 | store_mat = rbind(store_mat,info) 198 | } 199 | if(print){print(store_mat)} 200 | return(as.data.frame(store_mat)) 201 | } 202 | } 203 | 204 | ############ 205 | # generate willingness to pay tables 206 | ############ 207 | 208 | #' @rdname summary.fmlogit 209 | #' @export
summary.fmlogit.wtp 210 | 211 | summary.fmlogit.wtp = function(object,varlist=NULL,sepline=F,digits=3,sigcode=c(0.05,0.01,0.001), 212 | print=F){ 213 | # define significance code first. 214 | asterisk = function(x,k=sigcode){ 215 | if(x>k[1]) return("") 216 | if(x>k[2]) return("*") 217 | if(x>k[3]){return("**")}else 218 | {return("***")} 219 | } 220 | # main text 221 | if(!class(object)=="fmlogit.wtp") stop("Expect an fmlogit.wtp object. Wrong object type given.") 222 | if(colnames(object$wtp)[1]!="estimate") return(object$wtp) # no need to summary. 223 | Xnames = rownames(object$wtp) 224 | if(length(varlist)==0) varlist=Xnames 225 | var_colNo = which(Xnames %in% varlist) 226 | K = length(var_colNo) 227 | if(K < length(varlist)) warning("Some variables requested are not in the variable list. Those variables are omitted.") 228 | varlist = Xnames[var_colNo] 229 | sig.print = paste("Significance code: 0", "'***'", sigcode[3], "'**'", sigcode[2], "'*'", sigcode[1], "' ", 1) 230 | if(!sepline){ 231 | # table process 232 | store_mat = apply(signif(object$wtp[var_colNo,],digits=digits), 1, function(x) paste(x[1],"(",x[2],")",asterisk(x[4]),sep="")) 233 | store_mat = as.data.frame(store_mat) 234 | colnames(store_mat)=NULL 235 | # output matters 236 | }else{ 237 | store_beta=apply(signif(object$wtp[var_colNo,],digits=digits),1, function(x) paste(x[1],asterisk(x[4]),sep="")) 238 | store_se=apply(signif(object$wtp[var_colNo,],digits=digits), 1, function(x) paste("(",x[2],")",sep="")) 239 | for(i in 1:K){ 240 | if(i==1) store_mat=vector() 241 | store_mat = c(store_mat,store_beta[i],store_se[i]) 242 | } 243 | names(store_mat) = rep("",length=length(store_mat)) 244 | names(store_mat)[seq(1,K*2,2)] = varlist 245 | } 246 | if(print){print(store_mat);print(sig.print)} 247 | return(store_mat) 248 | } 249 | -------------------------------------------------------------------------------- /R/wtp.R: -------------------------------------------------------------------------------- 1 | #' 
"Willingness to Pay" for fmlogit models 2 | #' 3 | #' Calculates the willingness to pay for fractional multinomial logit models. 4 | #' 5 | #' @param object An "fmlogit.margins" object. 6 | #' @param wtp.vec A 1*J vector that contains the willingness to pay for each choice j. 7 | #' @param varlist A string vector which provides the names of the variables for which to calculate 8 | #' the wtp. If missing, all variables in object will be calculated. 9 | #' @return An object of class "fmlogit.wtp" whose \code{wtp} element is a matrix containing the estimates, standard errors, z-stats, and p-values. 10 | #' @details This function calculates the aggregate effect of a variable on the 11 | #' "willingness to pay" by linearly multiplying the average partial effect with ex-ante (arbitrary) 12 | #' willingness to pay numbers associated with each choice. 13 | #' 14 | #' Suppose there are three choices A, B, C, each with a willingness to pay (or cost, profit, budget) 15 | #' of 100, 200, and 300. The discrete effects of variable X on A, B and C are 0.5, 0.5, and -1, with 16 | #' standard errors 0.2, 0.3 and 0.5. The aggregated discrete effect of X on the total willingness 17 | #' to pay (or cost) is thus 100*0.5 + 200*0.5 + 300*(-1) = -150. The standard error is 18 | #' sqrt(100^2*0.2^2 + 200^2*0.3^2 + 300^2*0.5^2) = 162.8, assuming that the three effects are independent. 19 | #' A simple z-test is provided to test whether the aggregate effect is different from zero. 20 | #' 21 | #' Note that if the input fmlogit.margins object has no standard error computation, then no standard error is reported and only the point estimates are returned. 22 | #' @examples 23 | #' #results1 = fmlogit(y,X) 24 | #' #effects1 = effects(results1,effect="marginal",se=T) 25 | #' # assume that the WTP = 1,2,3,...J for each choice j.
26 | #' wtp(effects1,seq_len(nrow(effects1$effects))) 27 | #' @export wtp 28 | 29 | wtp = function(object,wtp.vec,varlist=NULL,indv.obs=F){ 30 | j=nrow(object$effects); k=ncol(object$effects) 31 | Xnames = colnames(object$effects); ynames = rownames(object$effects) 32 | if(length(varlist)==0){ 33 | varlist=Xnames 34 | var_colNo = c(1:k) 35 | k = length(var_colNo) 36 | }else{ 37 | var_colNo = which(Xnames %in% varlist); varlist = Xnames[var_colNo] # column positions of the requested variables in Xnames 38 | k = length(var_colNo) 39 | } 40 | if(length(wtp.vec)!=j) stop("Wrong length of wtp.vec. Please check specification again.") 41 | # wtp calcs 42 | betamat = object$effects[,varlist]; semat = object$se[,varlist] 43 | wtp_mean = wtp.vec %*% betamat 44 | 45 | if(object$R>0){ # prevent a bug that does not output R in the effects.fmlogit module. 46 | wtp_se = sqrt(wtp.vec^2 %*% semat^2) 47 | # output tables 48 | tabout = matrix(ncol=4,nrow=k) 49 | tabout[,1] = wtp_mean 50 | tabout[,2] = wtp_se 51 | tabout[,3] = tabout[,1] / tabout[,2] 52 | tabout[,4] = 2*(1-pnorm(abs(tabout[,3]))) 53 | colnames(tabout) = c("estimate","std","z","p-value") 54 | rownames(tabout) = varlist 55 | }else tabout = wtp_mean 56 | if(indv.obs){ 57 | wtp_mat = matrix(ncol = k, nrow=nrow(object$marg.list[[1]])) 58 | for(c in var_colNo){ 59 | c1 = which(var_colNo == c) # output column for variable c 60 | wtp_mat[,c1] = as.matrix(object$marg.list[[c]]) %*% wtp.vec 61 | } 62 | colnames(wtp_mat) = varlist 63 | } 64 | # output list 65 | outlist = list() 66 | outlist$wtp = tabout 67 | if(indv.obs) outlist$wtp.obs = wtp_mat 68 | return(structure(outlist,class="fmlogit.wtp")) 69 | } 70 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: "The fmlogit Package: A Light Document" 3 | author: "Xinde James Ji" 4 | date: "Oct 10, 2016" 5 | output: pdf_document 6 | --- 7 | 8 | This document provides an overview of the fmlogit package in R.
Updates will be published at [my github site](https://github.com/f1kidd/fmlogit). Any suggestions or concerns are welcome. 9 | 10 | # What is the fractional multinomial logit model? 11 | Fractional multinomial logit models estimate fractional responses by modelling the dependent variables as fractions using multinomial logits. It is the preferred model when the true data generation process is indeed fractions of multiple choices. Fractional responses arise naturally in various settings. For example, a municipality allocates its budget across multiple departments, and we are interested in the proportion of the budget that each department receives. Or, there are multiple candidates in a presidential election, and we are interested in explaining the percentage of support for each candidate in each state. 12 | 13 | The model is distinct in that: 1) each of the responses lies between 0 and 1, and 2) the shares of all responses add up to one. The fmlogit model uses these two distinct features, and models them explicitly. 14 | 15 | # How to install fmlogit 16 | Type the following code into your R console: 17 | ```R 18 | require(devtools) 19 | install_github("f1kidd/fmlogit") 20 | library(fmlogit) 21 | ``` 22 | 23 | # Why do we need fmlogit in R? 24 | Don't we already have an fmlogit module in Stata? Yes, and you are very welcome to [check that out](http://maartenbuis.nl/software/fmlogit.html) if you can afford a Stata license. 25 | 26 | However, this package offers several advantages over Stata's fmlogit module, namely: 27 | ### 1. Integration with the R Platform 28 | Implementing the model in R offers the opportunity to integrate the whole empirical process within a free, open-source platform. With the help of numerous R packages, everything can be accomplished in a single environment, including data processing, estimation, post-estimation, and final manuscript writing. This is a huge advantage over Stata. 29 | 30 | ### 2.
Post-estimation improvements 31 | The marginal effect estimation in this package is much faster than Stata's fmlogit package. In this package the user can specify which variable(s) and what type of partial effect to calculate. This results in a large reduction in running time for the post-estimation commands. 32 | 33 | Also, this package allows hypothesis testing for marginal and discrete effects while Stata does not. The standard error is calculated via the Krinsky-Robb method, which allows empirical hypothesis testing without knowing the underlying distribution of the effects. 34 | 35 | ### 3. Estimation flexibility 36 | This package allows factor variable inputs, and automatically transforms them into dummy variables. This is not (explicitly) allowed in Stata. 37 | 38 | ### 4. Extensions 39 | This package also allows the user to easily calculate and infer the "average aggregate partial effect" given a user-specified weight scheme. This is done through linearly aggregating the attribute of each choice (e.g., expected profit/utility of each choice) with the calculated APE. 40 | 41 | # How does the estimator work? 42 | The estimator used here is an extension of that used in [Papke and Wooldridge (1996)](http://onlinelibrary.wiley.com.ezproxy.lib.utexas.edu/doi/10.1002/(SICI)1099-1255(199611)11:6%3C619::AID-JAE418%3E3.0.CO;2-1/abstract). There, they proposed a quasi-maximum likelihood (QMLE) estimator for fractional response variables. As their approach applies to binary response variables, here we extend it to multinomial response variables with a fractional structure. 43 | 44 | The steps involved in calculating the estimator are as follows: 45 | ## Step 1. Construct the multinomial logit likelihood 46 | This step is straightforward. A simple multinomial logit transformation will do the job.
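
To make Step 1 concrete, here is a minimal sketch of the multinomial logit transformation and the resulting quasi-log-likelihood. It is an illustration under stated assumptions, not the package's internal code: the names `mlogit_shares`, `X`, `beta`, and `y` are hypothetical, the baseline choice is assumed to have its coefficients fixed at zero, and the objective shown is the usual fractional QMLE form (observed shares times log predicted shares).

```R
# Illustrative sketch only: mlogit_shares, X, beta and y are hypothetical names.
mlogit_shares <- function(X, beta) {
  # X: N x K covariate matrix (constant included); beta: J x K coefficient matrix
  expxb <- exp(X %*% t(beta))   # N x J matrix of exp(x'b)
  expxb / rowSums(expxb)        # divide each row by its sum so shares add up to one
}

# Toy inputs: 3 observations, 1 covariate plus a constant, 3 choices.
X <- cbind(x1 = c(0.2, 1.5, -0.7), constant = 1)
beta <- rbind(base = c(0, 0),       # baseline choice, coefficients fixed at zero
              alt1 = c(0.5, -0.2),
              alt2 = c(-1, 0.3))
shares <- mlogit_shares(X, beta)

# Assumed observed fractional responses (each row sums to one).
y <- rbind(c(0.5, 0.3, 0.2),
           c(0.2, 0.6, 0.2),
           c(0.4, 0.1, 0.5))

# Quasi-log-likelihood at these coefficients: sum over i and j of y_ij * log(p_ij).
qll <- sum(y * log(shares))
```

An optimizer would then search over the non-baseline rows of `beta` to maximize `qll`; the zero baseline row is what pins down the scale of the other coefficients.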
For detailed derivations and formulas, please see the technical document [here](https://github.com/f1kidd/fmlogit/blob/master/Documentation/fmlogit_docs.pdf) where I explain the econometric steps in detail. 47 | ## Step 2. Maximize the sum of the log likelihood function 48 | Generally, R is not the most efficient scientific computing environment that exists, and that is the tradeoff we have to face. Here, the program offers several maximization methods provided in the *maxLik* package. The recommended algorithm is either conjugate gradients (CG), or Berndt-Hall-Hall-Hausman (BHHH). For a large dataset it may take a while (running for one hour is entirely possible, so don't terminate the program prematurely). 49 | ## Step 3. Calculate robust standard error 50 | Here the program follows Papke & Wooldridge (1996), and constructs the robust standard error estimator for the parameters. The program also offers a simple z-test for parameters based on the standard error. 51 | 52 | # How do the post-estimation commands work? 53 | Calculating partial effects for limited dependent variables can be tricky, and this is especially true for multinomial logit models. The coefficients obtained in the regression model represent effects on the log odds of that specific choice against the baseline choice. This is not intuitive at all in terms of actual effects on that specific choice. The bottom line is, the coefficients and standard errors obtained in the original models are not the basis for evaluating hypotheses. 54 | 55 | ## Marginal and discrete effects 56 | Instead, researchers need to compute what are called the "partial effects", as we usually do in linear models. However, the partial effect in logit-type models is tricky because the effects are heterogeneous across different observations. In other words, each unique observation has a different set of partial effects. 57 | 58 | We provide two types of partial effects: marginal and discrete.
The marginal effect represents how a unit change in one variable k changes the value in choice j, i.e., $\frac{\partial y_j}{\partial x_k}$. The discrete effect represents how a discrete change in variable k, usually from the minimum to the maximum, changes the value in choice j, i.e., $\hat{y}_{j,x_k=\max(x_k)}-\hat{y}_{j,x_k=\min(x_k)}$, which for a dummy variable is $\hat{y}_{j,x_k=1}-\hat{y}_{j,x_k=0}$. 59 | 60 | Typically, two types of aggregation measures are used to illustrate the global APE: one is the partial effect at the mean (PEM), which is the partial effect of variable k when all other variables are set at their means; and the other is the partial effect of the average (PEA), which is the average of the partial effects over all observations. We allow either of the two options to be specified. 61 | 62 | A more inclusive approach will be to plot the marginal effect of interest across all individuals. This is not provided in the function, but can certainly be implemented in future developments. Another possibility will be to calculate the so-called locally averaged treatment effect (LATE), where the effect of interest will be centered around a certain range of values. 63 | 64 | ## Standard Errors for APEs 65 | Here we adopt the simulation-based Krinsky-Robb method to compute standard errors for marginal and discrete effects as opposed to the empirical delta method used in Stata. These two methods should be asymptotically equivalent. However, using Krinsky-Robb allows us to perform hypothesis testing on the effects, while the delta method cannot accomplish that. 66 | 67 | Hypothesis testing is done using the standard normal z-test by treating the APE estimates as normally distributed. The approach is very simple: say we test $H_0: D_j=0$. We just need to compare 0 with our N draws, and see if it falls outside the central 95% mass. This is a major advantage we provide here compared to Stata's fmlogit module. 68 | 69 | # Practical Concerns 70 | One of the concerns for the package is the computation speed of the estimation process.
The maximization process can take somewhere from 20 seconds to 1 hour, depending on how large the dataset is. This is certainly a limitation. This is the inherent drawback of R's computation speed, and I can do nothing about that. 71 | 72 | However, the loss in estimation will certainly be compensated in the post-estimation process. Stata's dfmlogit command is very slow (takes somewhere between 5-60 minutes), while here the effects calculation takes seconds to complete. 73 | 74 | # References 75 | Papke, L. E. and Wooldridge, J. M. (1996), Econometric methods for fractional response variables with an application to 401(k) plan participation rates. J. Appl. Econ., 11: 619-632. 76 | 77 | Wulff, Jesper N. "Interpreting Results From the Multinomial Logit Model Demonstrated by Foreign Market Entry." Organizational Research Methods (2014): 1094428114560024. 78 | 79 | Mullahy, J., 2015. Multivariate fractional regression estimation of econometric share models. Journal of Econometric Methods 4(1), 71-100. 
80 | 81 | 82 | 83 | 84 | 85 | 86 | -------------------------------------------------------------------------------- /data/spending.rda: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/f1kidd/fmlogit/62ff38adefc95b1bfc8324c095a7a3f50775607d/data/spending.rda -------------------------------------------------------------------------------- /man/effects.fmlogit.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/marginals.R 3 | \name{effects.fmlogit} 4 | \alias{effects.fmlogit} 5 | \title{Average Partial Effects of the Covariates} 6 | \usage{ 7 | \method{effects}{fmlogit}(object, effect = c("marginal", "discrete"), 8 | marg.type = "atmean", se = F, varlist = NULL, at = NULL, 9 | R = 1000) 10 | } 11 | \arguments{ 12 | \item{object}{An "fmlogit" object.} 13 | 14 | \item{effect}{Can be "marginal", for marginal effect; or "discrete", for discrete changes from 15 | the min to the max.} 16 | 17 | \item{marg.type}{Type of marginal or discrete effects to be computed. Default to "atmean", the effect at 18 | the mean of all covariates. Also take "aveacr", the averaged effects across all observations. See details.} 19 | 20 | \item{se}{Whether to calculate standard errors for those margins. See details.} 21 | 22 | \item{varlist}{A string vector which provides the name of variables to calculate 23 | the marginal effect. If missing, all variables except the constant will be calculated. 24 | Use "constant" if wish to compute the marginal effect of constant.} 25 | 26 | \item{at}{Specify values of the X-matrix at which the partial effect will be retrieved. Expect a vector input 27 | of length K-1. Only supported for \code{marg.type="atmean"}. See \code{predict.fmlogit(newdata)}.} 28 | 29 | \item{R}{Number of times to sample for the Krinsky-Robb standard error. 
Default to 1000.} 30 | 31 | \item{marg.list}{A list of matrices storing the marginal effect matrix for each observation. Exists 32 | only if marg.type="aveacr".} 33 | } 34 | \value{ 35 | The function returns an object of class "fmlogit.margins". It contains the following components: 36 | 37 | \code{effects} A matrix of calculated effects. 38 | 39 | \code{se} A matrix of standard errors corresponding to the effects. Shows up if se=T for the 40 | input parameter. 41 | 42 | \code{ztable} A list of matrices containing effects, standard errors, z-stats and p-values. 43 | 44 | \code{R} Number of simulation times for Krinsky-Robb standard error calculation. Null if se=F. 45 | 46 | \code{expl} String message explaining the effects calculated. 47 | } 48 | \description{ 49 | Calculate average partial effects (APE) of independent variables from a fractional multinomial logit model. 50 | } 51 | \details{ 52 | This module calculates the average partial effects (APEs) from a fractional multinomial logit model. 53 | Partial effects are the counterpart of the marginal effects in a linear model setting. In linear models, 54 | usually the parameter estimate itself represents the marginal effect (if the variable in question is continuous). 55 | In logit models, however, the parameter estimate at hand is the effect on the log-ratio between the choice variable 56 | and the baseline variable. This function is intended to extract APEs from the 57 | coefficient estimates obtained from the fractional multinomial logit models. 58 | 59 | This function allows for two types of partial effects: marginal effect, and discrete effect. 60 | Marginal effect represents how a unit change in one continuous variable x may influence the choice variable y. 61 | The estimate of marginal effect is very straightforward. However, special care is needed when averaging 62 | the marginal effect across observations to acquire APE.
One approach is to use the estimate of the marginal effect while setting 63 | other explanatory variables at the mean. We call this marginal effect at the mean (MEM), which corresponds 64 | to the option \code{marg.type="atmean"}. Another approach is to take the average of marginal effects for each 65 | individual. We call this average marginal effect (AME), which corresponds to the option \code{marg.type= 66 | "aveacr"}. 67 | 68 | The discrete effect represents how a discrete change in one specific x, discrete or continuous, influences the choice variable y. 69 | This is more useful for categorical variables, as calculating the "marginal effect" makes little sense 70 | for them. In this function, we calculate the discrete effect by changing the explanatory variable from 71 | its minimum to its maximum. For a binary variable, this is just the difference between 0 and 1. Similar 72 | to the marginal effect case, we also have discrete effect at the mean (DEM), corresponding to \code{marg.type="atmean"}, 73 | and average discrete effect (ADE), corresponding to \code{marg.type="aveacr"}. 74 | 75 | Standard errors are provided for the effects by using the Krinsky-Robb (KR) method. Krinsky-Robb is a simulation-based 76 | method that calculates the empirical value of a function given a known distribution of its variables. Here 77 | we provide Krinsky-Robb standard errors for MEM and DEM, and the user can specify the number of 78 | simulation draws \code{R} that the Krinsky-Robb algorithm should run. 79 | 80 | The user can also specify a subset of explanatory variables when calculating effects. This is done by 81 | passing a string vector containing the column names of the explanatory variables to \code{varlist}. As the 82 | KR standard error can be time-consuming, it is advised to calculate effects only for the variables needed. 
83 | } 84 | \examples{ 85 | #results1 = fmlogit(y,X) 86 | effects(results1,effect="marginal") 87 | effects(results1,effect="discrete",varlist = colnames(results1$X)[c(1,3)]) 88 | } 89 | -------------------------------------------------------------------------------- /man/fitted.fmlogit.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/predictions.R 3 | \name{fitted.fmlogit} 4 | \alias{fitted.fmlogit} 5 | \alias{residuals.fmlogit} 6 | \alias{predict.fmlogit} 7 | \title{Extract fitted values, residuals, and predictions} 8 | \usage{ 9 | \method{fitted}{fmlogit}(object) 10 | 11 | \method{residuals}{fmlogit}(object) 12 | 13 | \method{predict}{fmlogit}(object, newdata = NULL, newbeta = NULL) 14 | } 15 | \arguments{ 16 | \item{object}{A "fmlogit" object.} 17 | 18 | \item{newdata}{A new X matrix on which to perform model prediction. If NULL, defaults to the original dataset. 19 | X can be a vector with length k, or a matrix with k columns, where k is the number of explanatory 20 | variables in the original model.} 21 | 22 | \item{newbeta}{A new augmented matrix of coefficients that can be used to predict outcome variables. 23 | Feeds into object$coefficient, which contains the baseline coefficient. Useful for constructing 24 | confidence intervals via simulation or bootstrapping.} 25 | } 26 | \description{ 27 | Extract fitted values, residuals, and predictions 28 | } 29 | \examples{ 30 | #results1 = fmlogit(y,X) 31 | fitted(results1) 32 | residuals(results1) 33 | predict(results1) 34 | # predict using the first observation from the original dataset. 
35 | predict(results1,X[1,]) 36 | } 37 | -------------------------------------------------------------------------------- /man/fmlogit.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/fmlogit_main.R 3 | \name{fmlogit} 4 | \alias{fmlogit} 5 | \title{Estimate Fractional Multinomial Logit Models} 6 | \usage{ 7 | fmlogit(y, X, beta0 = NULL, MLEmethod = "CG", maxit = 5e+05, 8 | abstol = 1e-05, cluster = NULL, reps = 1000, ...) 9 | } 10 | \arguments{ 11 | \item{y}{the dependent variable (N*J). Can be a matrix or a named data frame. 12 | The first column of the matrix is automatically treated as the baseline.} 13 | 14 | \item{X}{independent variable (N*K). Can be a matrix or a named data frame. 15 | If there is no intercept term in the X, then an intercept term is 16 | automatically added.} 17 | 18 | \item{beta0}{Initial value for beta used in optimization. Uses a 1*K(J-1) 19 | vector. Defaults to a vector of zeros.} 20 | 21 | \item{MLEmethod}{Method of optimization. Goes into 22 | \code{maxLik(method=MLEmethod)}. Choose from "NR","BFGS","CG","BHHH","SANN", or "NM". 23 | Defaults to "CG", the conjugate gradients method. See Details.} 24 | 25 | \item{maxit}{Maximum number of iterations.} 26 | 27 | \item{abstol}{Convergence tolerance.} 28 | 29 | \item{cluster}{A vector of cluster identifiers to be used for clustered standard error computation. 30 | Defaults to NULL, in which case no clustering is applied.} 31 | 32 | \item{reps}{Number of bootstrap replications to be computed for clustered standard errors.} 33 | 34 | \item{...}{additional parameters that go into \code{maxLik()}} 35 | } 36 | \value{ 37 | The function returns an object of class "fmlogit". Use \code{effects}, \code{predict}, 38 | \code{residuals}, \code{fitted} to extract various useful features of the value returned by 39 | \code{fmlogit}. 
40 | 41 | An object of class "fmlogit" contains the following components: 42 | 43 | \code{estimates} A list of matrices containing parameter estimates, 44 | standard errors, and hypothesis testing results. 45 | 46 | \code{baseline} The baseline choice. 47 | 48 | \code{likelihood} The likelihood value. 49 | 50 | \code{conv_code} Convergence diagnostics code. 51 | 52 | \code{convergence} Convergence messages. 53 | 54 | \code{count} Provides dataset information. 55 | 56 | \code{y} The dependent variable data frame. 57 | 58 | \code{X} The independent variable data frame. Augmented by factor dummy transformation, 59 | with a constant term added. 60 | 61 | \code{rowNo} A vector of row numbers from the original X and y that are used for estimation. 62 | 63 | \code{coefficient} Matrix of estimated coefficients. Augmented with the baseline coefficient 64 | (which is a vector of zeros). 65 | 66 | \code{vcov} A list of matrices containing the robust variance covariance matrix for each choice 67 | variable. 68 | 69 | \code{cluster} The vector of clusters. 70 | 71 | \code{reps} Number of bootstrap replications for clustered standard errors. 72 | } 73 | \description{ 74 | Used to estimate fractional multinomial logit models using quasi-maximum 75 | likelihood estimation following Papke and Wooldridge (1996). 76 | } 77 | \details{ 78 | The fractional multinomial model is the extension of the multinomial 79 | logit to fractional responses. Unlike standard multinomial logit models, 80 | which consider only 0-1 responses, the fractional multinomial model considers the 81 | case where the response variables are fractions that sum to one. Examples 82 | of this type of data are percentages of budget spent on education, defense, 83 | public health; fractions of a population that have middle school, high 84 | school, college, or post college education, etc. 
85 | 86 | This function follows Papke and Wooldridge (1996), in which they 87 | proposed a quasi-maximum likelihood estimator for fractional response data. 88 | The likelihood function used here is a standard multinomial likelihood 89 | function; see \url{http://maartenbuis.nl/software/likelihoodFmlogit.pdf} for 90 | its exact form. Robust standard errors are provided following Papke 91 | and Wooldridge (1996), in which they proposed an asymptotically consistent 92 | estimator of variance. 93 | 94 | Maximization is done by calling \code{\link{maxLik}}. maxLik is a wrapper function 95 | for different maximization methods in R. These include most methods provided by \code{optim}, 96 | but also other methods such as BHHH (Berndt-Hall-Hall-Hausman). 97 | 98 | MLE convergence can be a problem in R, especially if the dataset is large with many explanatory variables. 99 | It is recommended to call CG (Conjugate Gradients) or BHHH (Berndt-Hall-Hall-Hausman). 100 | The conjugate gradients method is usually faster, but could lead to non-convergence under 101 | certain scenarios. BHHH is slower, but has better convergence performance. 102 | } 103 | \examples{ 104 | data = spending 105 | X = data[,2:5] 106 | y = data[,6:11] 107 | results1 = fmlogit(y,X) 108 | } 109 | \references{ 110 | Papke, L. E. and Wooldridge, J. M. (1996), Econometric methods 111 | for fractional response variables with an application to 401(k) plan 112 | participation rates. J. Appl. Econ., 11: 619-632. 
113 | } 114 | -------------------------------------------------------------------------------- /man/plot.fmlogit.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/plot_effects_1.R 3 | \name{plot.fmlogit} 4 | \alias{plot.fmlogit} 5 | \title{Plot marginal or discrete effects of willingness to pay} 6 | \usage{ 7 | \method{plot}{fmlogit}(object, wtp.vec, varlist, against = NULL, 8 | mfrow = NULL, t = 500, effect = c("discrete", "marginal"), 9 | type = "l", plot.show = T, ...) 10 | } 11 | \arguments{ 12 | \item{object}{An "fmlogit" object.} 13 | 14 | \item{varlist}{A string vector which provides the names of the variables whose effects are plotted. 15 | If missing, all variables in object will be plotted.} 16 | 17 | \item{against}{A vector with the same length as the number of observations in the model. 18 | Serves as the x-axis in the plots.} 19 | 20 | \item{mfrow}{A numeric vector with two elements. Specifies the number of rows and columns in a panel. 21 | Similar to par(mfrow=c()). Defaults to NULL, in which case the program will choose a square panel.} 22 | 23 | \item{plot.show}{If true, the plot will be created. Otherwise the function returns raw data that can be 24 | used to create user-specified (fancier) plots.} 25 | 26 | \item{X}{The covariates matrix. Recommend to use element X from the fmlogit object.} 27 | 28 | \item{y}{The dependent variable matrix. Recommend to use element y from the fmlogit object.} 29 | } 30 | \value{ 31 | Panel plots of effects vs. chosen variables 32 | } 33 | \description{ 34 | Plot marginal or discrete effects of willingness to pay, potentially against another variable 35 | } 36 | \details{ 37 | This function provides a visualization tool for potentially heterogeneous marginal and discrete effects. 38 | The function lets the user plot marginal effects to detect patterns in the effects, both on their own 39 | and against other variables. 
The plot also allows visualization of sub-groups in data, which can be 40 | very useful to visualize categorical and dummy variables. 41 | 42 | The function takes an fmlogit.margins object, created by the effects(fmlogit) function. Note that since 43 | the plotting requires marginal effects for all observations, the object should be created by choosing 44 | \code{marg.type="aveacr"}, the average across method for effects calculation. 45 | 46 | Additional parameters include \code{varlist}, a vector of string variable names to be plotted, and \code{X} 47 | and \code{y}, the independent and dependent variable matrices in the original regression model. 48 | 49 | \code{against}, \code{against.x}, and \code{against.y} allow different variables to be chosen 50 | as the x-axis. \code{against} directly supplies the vector to be plotted against, whereas \code{against.x} 51 | and \code{against.y} supply variable names in the original dataset. Note that the user has to provide 52 | \code{X} or \code{y}, respectively, in order to use the column name options. 53 | 54 | \code{group.x} supplies the column name in the X matrix to be grouped by. The plot will 55 | differentiate the groups by color. Additionally, the user can supply a string to \code{group.algebra}, 56 | which provides an algebraic expression that will be evaluated on the group vector. For example, choosing 57 | \code{group.x = "a"} and \code{group.algebra = ">0"} will create two groups, one with X$a>0 and one with X$a 58 | <=0. 59 | } 60 | \examples{ 61 | 62 | # Not running 63 | # results1 = fmlogit(y,X) 64 | # effect1 = effects(results1,effect="marginal",marg.type="aveacr") 65 | 66 | # Plot only takes effects with marg.type="aveacr". 
67 | plot(effect1,X=results1$X,against.x = "popdens", group.x = "tot", group.algebra = ">3") 68 | } 69 | -------------------------------------------------------------------------------- /man/plot.fmlogit.margins.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/plot_effects.R 3 | \name{plot.fmlogit.margins} 4 | \alias{plot.fmlogit.margins} 5 | \title{Plot marginal or discrete effects, at each observation & for each choice} 6 | \usage{ 7 | \method{plot}{fmlogit.margins}(object, varlist = NULL, X = NULL, 8 | y = NULL, against = NULL, against.x = NULL, against.y = NULL, 9 | group.x = NULL, group.algebra = NULL, mfrow = NULL) 10 | } 11 | \arguments{ 12 | \item{object}{An "fmlogit.margins" object.} 13 | 14 | \item{varlist}{A string vector which provides the names of the variables whose effects are plotted. 15 | If missing, all variables in object will be plotted.} 16 | 17 | \item{X}{The covariates matrix. Recommend to use element X from the fmlogit object.} 18 | 19 | \item{y}{The dependent variable matrix. Recommend to use element y from the fmlogit object.} 20 | 21 | \item{against}{A vector with the same length as the number of observations in the model. 22 | Serves as the x-axis in the plots.} 23 | 24 | \item{against.x}{A character string. Supplies the column name in the X matrix to be plotted against.} 25 | 26 | \item{against.y}{A character string. Supplies the column name in the y matrix to be plotted against.} 27 | 28 | \item{group.x}{A character string. Supplies the column name in the X matrix to be grouped upon.} 29 | 30 | \item{mfrow}{A numeric vector with two elements. Specifies the number of rows and columns in a panel. 31 | Similar to par(mfrow=c()). Defaults to NULL, in which case the program will choose a square panel.} 32 | 33 | \item{group.algebra}{A character string. Supplies additional algebra imposed on the group variable.} 34 | } 35 | \value{ 36 | Panel plots of effects vs. 
chosen variables 37 | } 38 | \description{ 39 | Plot the desired effect at each observed value for each choice 40 | } 41 | \details{ 42 | This function provides a visualization tool for potentially heterogeneous marginal and discrete effects. 43 | The function lets the user plot marginal effects to detect patterns in the effects, both on their own 44 | and against other variables. The plot also allows visualization of sub-groups in data, which can be 45 | very useful to visualize categorical and dummy variables. 46 | 47 | The function takes an fmlogit.margins object, created by the effects(fmlogit) function. Note that since 48 | the plotting requires marginal effects for all observations, the object should be created by choosing 49 | \code{marg.type="aveacr"}, the average across method for effects calculation. 50 | 51 | Additional parameters include \code{varlist}, a vector of string variable names to be plotted, and \code{X} 52 | and \code{y}, the independent and dependent variable matrices in the original regression model. 53 | 54 | \code{against}, \code{against.x}, and \code{against.y} allow different variables to be chosen 55 | as the x-axis. \code{against} directly supplies the vector to be plotted against, whereas \code{against.x} 56 | and \code{against.y} supply variable names in the original dataset. Note that the user has to provide 57 | \code{X} or \code{y}, respectively, in order to use the column name options. 58 | 59 | \code{group.x} supplies the column name in the X matrix to be grouped by. The plot will 60 | differentiate the groups by color. Additionally, the user can supply a string to \code{group.algebra}, 61 | which provides an algebraic expression that will be evaluated on the group vector. 
For example, choosing 62 | \code{group.x = "a"} and \code{group.algebra = ">0"} will create two groups, one with X$a>0 and one with X$a 63 | <=0. 64 | } 65 | \examples{ 66 | 67 | # Not running 68 | # results1 = fmlogit(y,X) 69 | # effect1 = effects(results1,effect="marginal",marg.type="aveacr") 70 | 71 | # Plot only takes effects with marg.type="aveacr". 72 | plot(effect1,X=results1$X,against.x = "popdens", group.x = "tot", group.algebra = ">3") 73 | } 74 | -------------------------------------------------------------------------------- /man/spending.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/spending_data.R 3 | \docType{data} 4 | \name{spending} 5 | \alias{spending} 6 | \title{Government Spending by Dutch Cities in 2005} 7 | \format{A data frame with 429 rows and 12 columns.} 8 | \source{ 9 | \url{http://fmwww.bc.edu/repec/bocode/c/citybudget.dta} 10 | } 11 | \usage{ 12 | data(spending) 13 | } 14 | \description{ 15 | Data from 429 Dutch cities with governmental spending on each sub-category, 16 | and city attributes. 
17 | } 18 | \examples{ 19 | spending 20 | } 21 | \keyword{datasets} 22 | -------------------------------------------------------------------------------- /man/summary.fmlogit.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/summary.R 3 | \name{summary.fmlogit} 4 | \alias{summary.fmlogit} 5 | \alias{summary.fmlogit.margins} 6 | \alias{summary.fmlogit.wtp} 7 | \title{Generate summary tables for fmlogit objects} 8 | \usage{ 9 | \method{summary}{fmlogit}(object, varlist = NULL, sepline = F, 10 | digits = 3, add.info = T, list = T, sigcode = c(0.05, 0.01, 11 | 0.001), print = F) 12 | 13 | \method{summary}{fmlogit.margins}(object, varlist = NULL, sepline = F, 14 | digits = 3, add.info = T, list = T, sigcode = c(0.05, 0.01, 15 | 0.001), print = F) 16 | 17 | \method{summary}{fmlogit.wtp}(object, varlist = NULL, sepline = F, 18 | digits = 3, sigcode = c(0.05, 0.01, 0.001), print = F) 19 | } 20 | \arguments{ 21 | \item{object}{an object with class "fmlogit", "fmlogit.margins", or "fmlogit.wtp".} 22 | 23 | \item{varlist}{select a subset of variable names to be processed. Defaults to NULL, in which case all variables will 24 | be processed.} 25 | 26 | \item{sepline}{whether the output table uses separate lines for coefficients and standard errors.} 27 | 28 | \item{digits}{number of significant digits to display. Defaults to 3.} 29 | 30 | \item{add.info}{whether to add additional descriptive information to the output.} 31 | 32 | \item{list}{whether to output a list object, or a single data frame.} 33 | 34 | \item{sigcode}{the significance code to be used. Has to be a three-component vector.} 35 | } 36 | \value{ 37 | Either a list (for display purposes) or a data.frame (for csv output purposes). 
If list output (which is 38 | the default) is selected, then the list will contain 4 components: $estimates the estimates; $N the number of 39 | observations; $llf the value of the log-likelihood function; and $baseline the name of the baseline choice. 40 | } 41 | \description{ 42 | Generate tables of coefficient estimates, partial effects, and willingness to pay from 43 | fmlogit-type objects. 44 | } 45 | \details{ 46 | This module provides summary methods for three fmlogit objects: \code{fmlogit}, \code{fmlogit.margins}, 47 | and \code{fmlogit.wtp}. 48 | 49 | The summary method offers several options to the users. The user can choose a list output \code{list=T}, which is 50 | good for display and quoting purposes, or a data frame output \code{list=F}, which is good for table outputs. The user 51 | can also specify whether to provide additional information other than the parameter estimates, whether to use 52 | separate lines for the estimates and the standard errors (which mimics the output style in Stata), 53 | as well as the significance code. 
54 | } 55 | \examples{ 56 | # generate fmlogit summary 57 | #results1 = fmlogit(y,X) 58 | 59 | # generate marginal effects summary 60 | #effects1 = effects(results1,effect="marginal") 61 | summary(effects1) 62 | 63 | # generate latex style output 64 | # require(xtable) 65 | xtable(summary(effects1,list=F,sepline=T)) 66 | } 67 | -------------------------------------------------------------------------------- /man/wtp.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/wtp.R 3 | \name{wtp} 4 | \alias{wtp} 5 | \title{"Willingness to Pay" for fmlogit models} 6 | \usage{ 7 | wtp(object, wtp.vec, varlist = NULL, indv.obs = F) 8 | } 9 | \arguments{ 10 | \item{object}{An "fmlogit.margins" object.} 11 | 12 | \item{wtp.vec}{A 1*J vector that contains the willingness to pay for each choice j.} 13 | 14 | \item{varlist}{A string vector which provides the names of the variables for which to calculate 15 | the wtp. If missing, the wtp is calculated for all variables in object.} 16 | } 17 | \value{ 18 | A matrix containing the estimates, standard errors, z-stats, and p-values. 19 | } 20 | \description{ 21 | Calculates the willingness to pay for fractional multinomial logit models. 22 | } 23 | \details{ 24 | This function calculates the aggregate effect of a variable on the 25 | "willingness to pay" by linearly multiplying the average partial effect with ex-ante (arbitrary) 26 | willingness to pay numbers associated with each choice. 27 | 28 | Suppose there are three choices A, B, C, with willingness to pay (or cost, profit, budget) 29 | of 100, 200, and 300, respectively. The discrete effects of variable X on A, B and C are 0.5, 0.5, and -1, with 30 | standard errors 0.2, 0.3 and 0.5. The aggregated discrete effect of X on the total willingness 31 | to pay (or cost) is thus 100*0.5 + 200*0.5 + 300*(-1) = -150. 
The standard error can also be 32 | calculated as sqrt((100*0.2)^2 + (200*0.3)^2 + (300*0.5)^2) = 162.8, assuming that the three effects are independent. 33 | A simple z-test is provided to test whether the aggregate effect is different from zero. 34 | 35 | Note that if the input fmlogit.margins object has no standard error computation, then no standard error is calculated for the willingness to pay. 36 | } 37 | \examples{ 38 | #results1 = fmlogit(y,X) 39 | #effects1 = effects(results1,effect="marginal",se=T) 40 | # assume that the WTP = 1,2,3,...J for each choice j. 41 | wtp(effects1,seq(1:nrow(effects1$effects))) 42 | } 43 | --------------------------------------------------------------------------------