├── .gitattributes
├── .gitignore
├── DESCRIPTION
├── Documentation
    ├── fmlogit_docs.Rmd
    ├── fmlogit_docs.html
    └── fmlogit_docs.pdf
├── NAMESPACE
├── R
    ├── fmlogit.R
    ├── fmlogit_main.R
    ├── marginals.R
    ├── plot_effects.R
    ├── plot_effects_1.R
    ├── predictions.R
    ├── spending_data.R
    ├── summary.R
    └── wtp.R
├── README.md
├── data
    └── spending.rda
└── man
    ├── effects.fmlogit.Rd
    ├── fitted.fmlogit.Rd
    ├── fmlogit.Rd
    ├── plot.fmlogit.Rd
    ├── plot.fmlogit.margins.Rd
    ├── spending.Rd
    ├── summary.fmlogit.Rd
    └── wtp.Rd


/.gitattributes:
--------------------------------------------------------------------------------
 1 | # Auto detect text files and perform LF normalization
 2 | * text=auto
 3 | 
 4 | # Custom for Visual Studio
 5 | *.cs     diff=csharp
 6 | 
 7 | # Standard to msysgit
 8 | *.doc	 diff=astextplain
 9 | *.DOC	 diff=astextplain
10 | *.docx diff=astextplain
11 | *.DOCX diff=astextplain
12 | *.dot  diff=astextplain
13 | *.DOT  diff=astextplain
14 | *.pdf  diff=astextplain
15 | *.PDF	 diff=astextplain
16 | *.rtf	 diff=astextplain
17 | *.RTF	 diff=astextplain
18 | 


--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
 1 | *.Rproj
 2 | 
 3 | # Windows image file caches
 4 | Thumbs.db
 5 | ehthumbs.db
 6 | 
 7 | # Folder config file
 8 | Desktop.ini
 9 | 
10 | # Recycle Bin used on file shares
11 | $RECYCLE.BIN/
12 | 
13 | # Windows Installer files
14 | *.cab
15 | *.msi
16 | *.msm
17 | *.msp
18 | 
19 | # Windows shortcuts
20 | *.lnk
21 | 
22 | # =========================
23 | # Operating System Files
24 | # =========================
25 | 
26 | # OSX
27 | # =========================
28 | 
29 | .DS_Store
30 | .AppleDouble
31 | .LSOverride
32 | 
33 | # Thumbnails
34 | ._*
35 | 
36 | # Files that might appear in the root of a volume
37 | .DocumentRevisions-V100
38 | .fseventsd
39 | .Spotlight-V100
40 | .TemporaryItems
41 | .Trashes
42 | .VolumeIcon.icns
43 | 
44 | # Directories potentially created on remote AFP share
45 | .AppleDB
46 | .AppleDesktop
47 | Network Trash Folder
48 | Temporary Items
49 | .apdisk
50 | 


--------------------------------------------------------------------------------
/DESCRIPTION:
--------------------------------------------------------------------------------
 1 | Package: fmlogit
 2 | Title: Fractional Multinomial Logit using QMLE
 3 | Version: 2.0
 4 | Authors@R: c(person("James Xinde", "Ji", email = "xji1@ufl.edu",role=c("aut","cre")))
 5 | Description: Provides estimation and simple hypothesis testing of the fractional
 6 |     multinomial logit model.
 7 | Depends:
 8 |     R (>= 2.6.0),maxLik
 9 | Imports: maxLik
10 | Suggests: foreign, ggplot2, grid
11 | Encoding: UTF-8
12 | LazyData: true
13 | License: MIT
14 | RoxygenNote: 6.1.1
15 | 


--------------------------------------------------------------------------------
/Documentation/fmlogit_docs.Rmd:
--------------------------------------------------------------------------------
  1 | ---
  2 | title: 'The fmlogit Package: An Econometric Document'
  3 | author: "Xinde James Ji"
  4 | date: "Oct.10, 2016"
  5 | output: pdf_document
  6 | ---
  7 | 
  8 | This document provides the theoretical documentation for the "fmlogit" package in R. Updates will be published at my GitHub site: \url{https://github.com/f1kidd/fmlogit}. Any suggestions or concerns are welcome\footnote{email: xji@vt.edu}. For function usage and calls, please consult help(func_name) after loading the fmlogit package. 
  9 | 
 10 | # Motivation
 11 | Fractional multinomial responses, or multivariate share models, arise naturally in many settings. For example, a municipality allocates its budget across multiple departments, and we are interested in the proportion of the budget that each department receives. Or there are multiple candidates in a presidential election, and we are interested in the percentage of support for each candidate in each state. 
 12 | 
 13 | The model is distinct in that 1) each response lies between 0 and 1, and 2) the shares of all responses add up to one. The fmlogit model exploits these two features and models them explicitly by applying a multinomial logit transformation to the response variables. If the true data generating process is multinomial fractions, or shares of multiple choices, then the fractional multinomial logit model is consistent and efficient, while other candidate models such as Dirichlet or Beta regression are not. 
 14 | 
 15 | # Econometric Model
 16 | The basis of this package is Papke and Wooldridge (1996), who proposed a quasi-maximum likelihood estimator (QMLE) for fractional response variables. Their approach applies to the binary response case; here we extend it to multinomial response variables with a fractional structure. 
 17 | 
 18 | We start by writing:\footnote{The demonstration below is in individual specific notation, but matrix notation is not hard to obtain from the individual specific notations. The actual function uses matrix calculation, which increases algorithm speed.}
 19 | $$E(y_{ij}|x_i) = G(x_i\beta_j)$$
 20 | for the $j^{th}$ choice of the $i^{th}$ observation, where G(.) is a known function satisfying 0<G(z)<1 for all $z\in {\rm I\!R}$. Note that here we only allow for common covariates $x_i$, and not for choice-specific attributes. Following the logit convention, G(.) is chosen to be the multinomial logit function, with the form:
 21 | $$G(z_j) = \frac{exp(z_j)}{\sum_{k=1}^J exp(z_k)}$$
 22 | The multinomial log-likelihood contribution of observation $i$ is thus given by
 23 | $$ ln(L_i) = \sum_{j=1}^J y_{ij} ln(G(x_i\beta_j))$$
 24 | with $\beta_1=0$, the baseline coefficient equal to zero. 
 25 | Papke and Wooldridge (1996) showed that the QMLE of $\beta$, obtained from the maximization problem
 26 | $$ argmax_\beta \sum_{i=1}^N ln(L_i)$$
 27 | is a consistent estimator for $\beta$ if G(z) is the correct functional form for E(y|x). 
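
The link function and the quasi log-likelihood above can be sketched in a few lines of R. This is only a minimal illustration, not the package's internal code: `softmax_rows` and `quasi_loglik` are hypothetical helper names, and the toy data are made up.

```r
# Multinomial logit link: entry (i, j) is G(x_i beta_j).
# `softmax_rows` is a hypothetical helper name, not part of the package.
softmax_rows <- function(Z) {
  expZ <- exp(Z - apply(Z, 1, max))  # subtract row maxima for numerical stability
  expZ / rowSums(expZ)
}

# Quasi log-likelihood: sum_i sum_j y_ij * ln G(x_i beta_j).
# B is a K x J coefficient matrix whose first column (the baseline) is zero.
quasi_loglik <- function(B, X, Y) {
  sum(Y * log(softmax_rows(X %*% B)))
}

set.seed(1)
X <- cbind(1, rnorm(5))                  # N = 5 observations: intercept + one covariate
B <- cbind(0, c(0.5, -1), c(-0.2, 0.3))  # J = 3 choices, beta_1 = 0
P <- softmax_rows(X %*% B)               # predicted shares, each row sums to one
Y <- P                                   # toy responses: the fitted shares themselves
ll <- quasi_loglik(B, X, Y)
```

Because the toy responses equal the fitted shares, the quasi log-likelihood at `B` is at least as large as at any other coefficient matrix, for example the all-zero one.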
 28 | 
 29 | To estimate the standard error for the QMLE estimator, define $g(z_j)\equiv \partial G(z_j)/\partial z_j$, the partial derivative of the multinomial logit function with respect to choice j. Specifically, $g(z_j)$ has the following functional form:
 30 | $$ g(z_j) = \frac{\hat{E}\hat{S} - \hat{E}^2}{\hat{S}^2} $$
 31 | where $\hat{E} = exp(x_i\beta_j)$, and $\hat{S} = \sum_{k=1}^J exp(x_i\beta_k)$. 
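
As a short worked step (using only the definitions of $\hat{E}$ and $\hat{S}$ above, and $G(z_j)=\hat{E}/\hat{S}$), dividing the numerator through by $\hat{S}^2$ gives the familiar logistic form of the derivative:

$$ g(z_j) = \frac{\hat{E}}{\hat{S}} - \left(\frac{\hat{E}}{\hat{S}}\right)^2 = G(z_j)\,[1 - G(z_j)] $$

One consequence is that a weight of the form $g^2/[G(1-G)]$ reduces to $G(1-G)$.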
 32 | 
 33 | A robust asymptotic standard error for $\hat{\beta}_j$ is given by the square root of the diagonal elements of the following matrix:
 34 | $$\hat{A}_j^{-1}\hat{B}_j\hat{A}_j^{-1}$$
 35 | where
 36 | $$\hat{A}_j = \sum_{i=1}^N \frac{\hat{g}_{ij}^2 \mathbf{x}_i'\mathbf{x}_i}{\hat{G}_{ij}(1-\hat{G}_{ij})} $$
 37 | $$\hat{B}_j = \sum_{i=1}^N \frac{\hat{u}_{ij}^2 \hat{g}_{ij}^2 \mathbf{x}_i'\mathbf{x}_i} {[\hat{G}_{ij}(1-\hat{G}_{ij})]^2} $$
 38 | in which $\hat{u}_{ij}$ is the residual for the $j^{th}$ choice of the $i^{th}$ observation, given by $\hat{u}_{ij}=y_{ij} - G(x_i\hat{\beta}_{j})$. Here $\hat{A}_j$ is the estimated information matrix, which does not by itself yield a consistent variance estimator, and $\hat{B}_j$ provides the robust correction to $\hat{A}_j$. 
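
The sandwich formula above can be sketched for a single choice as follows. This is an illustration under stated assumptions rather than the package's own routine: `sandwich_j` is a hypothetical helper name, and the fitted and observed shares are simulated toy values.

```r
# Hypothetical helper: robust (sandwich) variance for the coefficients of one choice j.
# X: N x K covariate matrix, y_j: observed shares, G_j: fitted shares for choice j.
sandwich_j <- function(X, y_j, G_j) {
  g_j <- G_j * (1 - G_j)                                # derivative of the logit link
  w_a <- g_j^2 / (G_j * (1 - G_j))                      # observation weights in A
  w_b <- (y_j - G_j)^2 * g_j^2 / (G_j * (1 - G_j))^2    # observation weights in B
  A <- crossprod(X * sqrt(w_a))                         # sum_i w_a[i] * x_i' x_i
  B <- crossprod(X * sqrt(w_b))                         # sum_i w_b[i] * x_i' x_i
  A_inv <- solve(A)
  A_inv %*% B %*% A_inv
}

set.seed(3)
N <- 200
X <- cbind(1, rnorm(N))
G_j <- as.vector(1 / (1 + exp(-(0.3 + 0.5 * X[, 2]))))  # toy fitted shares in (0, 1)
y_j <- pmin(pmax(G_j + rnorm(N, sd = 0.05), 0), 1)       # toy observed shares
V_j <- sandwich_j(X, y_j, G_j)
se_j <- sqrt(diag(V_j))                                  # robust standard errors
```

Multiplying `X` by a length-N vector scales each row, so `crossprod` of the scaled matrix gives the weighted sum of outer products without an explicit loop.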
 39 | 
 40 | In most binary / multinomial response models, the convention is to treat one of the choices as a baseline. Here we apply the same logic, and treat j=1 as the baseline. This imposes the restriction $\beta_1=0$, so all other betas are interpreted as differences relative to the baseline.  
 41 | 
 42 | # Partial Effects
 43 | 
 44 | Interpreting partial effects for limited dependent variables can be tricky, and this is especially true for multinomial logit models. The coefficients obtained from the regression represent effects on the log-odds of a specific choice relative to the baseline choice, and should not be treated as marginal or discrete effects. Instead, the modeller has to derive "partial effects" from the coefficient estimates; these are analogous to the coefficients in a regular linear model. The rest of this section provides detailed information on deriving two types of partial effects: marginal effects and discrete effects. 
 45 | 
 46 | ## Marginal Effects
 47 | 
 48 | The marginal effect is the counterpart of the coefficient on a continuous variable in a linear model. It measures the effect of a marginal change in a continuous variable $x_k$ on the choice variable $y_j$. It should be pointed out that a right-hand-side variable $x_k$ has J marginal effects, one on each choice variable $y_j$.
 49 | 
 50 | The marginal effect in multinomial models has a distinctive form: 
 51 |  
 52 | $$ME_{jk} = \frac{\partial p_j}{\partial x_k} = p_j(\beta_{kj} - \bar{\beta}_i)$$
 53 |  
 54 | where $p_j$ is a 1*N vector of predicted probabilities for choice j, and $\bar{\beta}_i = \sum_{m=1}^J \beta_{km} p_m$ is the probability-weighted average of $\beta_{km}$. This shows that marginal effects differ across individuals, because the predicted probabilities differ across individuals. 
 55 |  
 56 | Typically, two types of summary measures are used to report average marginal effects. The first one is called marginal effects at the mean (MEM). In algebraic form, this is represented as
 57 |  
 58 | $$MEM_{jk} = \bar{p}_j(\beta_{kj} - \bar{\beta}_i)$$
 59 |  
 60 | where $\bar{p}_j$ is the predicted value of choice j at the mean of all X covariates. Evaluating at the mean simplifies the calculation; however, it ignores potential heterogeneity in marginal effects, especially at extreme values. 
 61 |  
 62 | Another measure is called average marginal effects (AME). This can be written as:
 63 |  
 64 | $$AME_{jk} = \frac{1}{N}\sum_{i=1}^N p_j(\beta_{kj} - \bar{\beta}_i)$$
 65 |  
 66 | However, according to Greene (2003), there is no agreement as to which one is preferred. A more comprehensive approach is to plot the marginal effect of interest across all individuals. This is not provided in the package, but can be implemented in a straightforward way in R. 
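
The marginal effect formula, together with the MEM and AME summaries, can be sketched as below. This is a minimal sketch rather than the package's `effects` implementation: `softmax_rows` and `marginal_effects` are hypothetical helper names, and the coefficients and data are made-up toy values.

```r
# Row-wise multinomial logit probabilities (hypothetical helper).
softmax_rows <- function(Z) {
  expZ <- exp(Z - apply(Z, 1, max))
  expZ / rowSums(expZ)
}

# Marginal effects of covariate k on every share, one row per observation:
# ME[i, j] = p_ij * (beta_kj - sum_m beta_km * p_im).
# B is K x J with a zero first column (the baseline).
marginal_effects <- function(X, B, k) {
  P <- softmax_rows(X %*% B)
  beta_k <- B[k, ]                        # beta_k1, ..., beta_kJ
  beta_bar <- as.vector(P %*% beta_k)     # probability-weighted average per observation
  P * (matrix(beta_k, nrow(P), ncol(P), byrow = TRUE) - beta_bar)
}

set.seed(2)
X <- cbind(constant = 1, x1 = rnorm(100))
B <- cbind(0, c(0.5, -1), c(-0.2, 0.3))   # toy coefficients, J = 3 choices
ME  <- marginal_effects(X, B, k = 2)      # N x J matrix of individual effects
AME <- colMeans(ME)                       # average marginal effects
MEM <- marginal_effects(matrix(colMeans(X), nrow = 1), B, k = 2)  # effects at the mean
```

Since the shares always sum to one, the marginal effects of a covariate sum to zero across choices for every observation. A call such as `plot(X[, "x1"], ME[, 2])` would display how one of the effects varies across individuals.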
 67 |  
 68 | ## Discrete Effects 
 69 |  
 70 | Discrete effects differ slightly from marginal effects. Instead of calculating a slope with respect to a covariate, 
 71 | a discrete effect considers the impact of a discrete change in one covariate on the predicted outcome variables. This is especially useful for dummy variables, for which marginal effects do not make much sense. 
 72 | 
 73 | The discrete effect has a straightforward form. Consider a discrete change of a dummy variable k from 0 to 1. The discrete effect is
 74 | $$DE_{jk} = Pr(y=j|\mathbf{x}_{x_k=1}) - Pr(y=j|\mathbf{x}_{x_k=0})$$
 75 | that is, the change in the predicted value between setting $x_k=1$ and $x_k=0$. 
 76 | 
 77 | Similar to the marginal effect case, we can calculate discrete effects at the mean (DEM) by predicting the outcome with all other covariates at their means, or average discrete effects, which average the predicted difference across all observations. 
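
A discrete effect for a dummy variable can be sketched in the same style. Again this is only an illustration under assumed toy values, with `softmax_rows` and `discrete_effects` as hypothetical helper names rather than package functions.

```r
# Row-wise multinomial logit probabilities (hypothetical helper).
softmax_rows <- function(Z) {
  expZ <- exp(Z - apply(Z, 1, max))
  expZ / rowSums(expZ)
}

# Discrete effect of switching column k of X from 0 to 1, one row per observation.
discrete_effects <- function(X, B, k) {
  X1 <- X; X1[, k] <- 1
  X0 <- X; X0[, k] <- 0
  softmax_rows(X1 %*% B) - softmax_rows(X0 %*% B)
}

set.seed(4)
X <- cbind(constant = 1, x1 = rnorm(100), d = rbinom(100, 1, 0.4))
B <- cbind(0, c(0.5, -1, 0.8), c(-0.2, 0.3, -0.5))   # toy K = 3 by J = 3 coefficients
DE  <- discrete_effects(X, B, k = 3)                 # individual discrete effects
ADE <- colMeans(DE)                                  # average discrete effects
DEM <- discrete_effects(matrix(colMeans(X), nrow = 1), B, k = 3)  # effects at the mean
```

In this toy setup the dummy has its largest coefficient on the second choice, so switching it on raises the second share for every observation, while the effects across the three choices still sum to zero.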
 78 | 
 79 | ## Standard Errors
 80 | 
 81 | Here we adopt the Krinsky-Robb method to compute standard errors for marginal and discrete effects. As opposed to the delta method commonly used in other programs such as Stata, Krinsky-Robb is a simulation-based method. The idea is that, to calculate the variance of a function, $Var(f(\mathbf{\beta}))$, we take the following steps:
 82 | 
 83 | For i in 1:N, where N is a very large number, 
 84 | 
 85 | 1) Sample a draw from the known distribution of $\mathbf{\beta}$
 86 | 
 87 | 2) For each draw, calculate $f(\mathbf{\beta})$
 88 | 
 89 | 3) After all N draws, take the empirical variance of the simulated $f(\mathbf{\beta})$ values. 
 90 | 
 91 | With a sufficiently large number of draws, the empirical variance converges to the theoretical variance. 
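
The steps above can be sketched in base R as follows. The sketch assumes that the sampling distribution of $\mathbf{\beta}$ is multivariate normal, centered at the estimate with the estimated covariance matrix; that choice, the toy numbers, and the function `f` are assumptions made for illustration and are not taken from the package code.

```r
set.seed(42)
beta_hat <- c(0.5, -1.0)                       # toy point estimates (assumed)
V <- matrix(c(0.04, 0.01, 0.01, 0.09), 2, 2)   # toy covariance matrix (assumed)

# A nonlinear function of beta, here one predicted share at x = 1.
f <- function(b) exp(b[1]) / (1 + exp(b[1]) + exp(b[2]))

R <- 5000                                      # number of simulation draws
L <- t(chol(V))                                # lower Cholesky factor: V = L %*% t(L)
# Each column of `draws` is one draw from N(beta_hat, V).
draws <- beta_hat + L %*% matrix(rnorm(2 * R), nrow = 2)

f_draws <- apply(draws, 2, f)                  # step 2: evaluate f at every draw
kr_var <- var(f_draws)                         # step 3: empirical variance
kr_se  <- sqrt(kr_var)                         # Krinsky-Robb standard error
```

Keeping the full vector `f_draws`, rather than only its variance, is what makes the percentile-based hypothesis tests in the next section possible.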
 92 | 
 93 | ## Hypothesis Testing of Marginal and Discrete Effects
 94 | One of the major advantages of Krinsky-Robb over the numerical delta method is in hypothesis testing of those effects. The marginal and discrete effects are not normally distributed, since each effect contains the multinomial logit probability $p_{ij}$. So even when the sample size is large, the central limit theorem does not really hold here, and knowing the standard error does not help much if we do not know the actual shape of the density. Recovering that shape is outside the scope of the delta method.
 95 | 
 96 | Krinsky-Robb solves this problem by providing empirical draws of the marginal effects. Hypothesis testing then becomes very simple: say we test $H_0: D_j=0$. We just need to compare 0 with our N draws, and see whether it falls outside the central 95% mass. This is a major advantage we provide compared with Stata's fmlogit module. 
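
The percentile comparison described above can be sketched as a small helper. `kr_test` is a hypothetical name, and the two vectors of draws are simulated from a normal distribution purely to exercise the function; in practice they would be the simulated marginal or discrete effects.

```r
# Hypothetical helper: two-sided percentile test of H0: effect = null.
kr_test <- function(draws, null = 0, level = 0.95) {
  alpha <- 1 - level
  ci <- quantile(draws, probs = c(alpha / 2, 1 - alpha / 2), names = FALSE)
  list(ci = ci, reject = (null < ci[1]) || (null > ci[2]))
}

set.seed(7)
effect_far  <- rnorm(5000, mean = 1,    sd = 0.1)  # toy draws centered far from zero
effect_near <- rnorm(5000, mean = 0.02, sd = 0.1)  # toy draws centered close to zero
res_far  <- kr_test(effect_far)
res_near <- kr_test(effect_near)
```

The null is rejected when zero lies outside the central 95% of the draws, which happens for `effect_far` and not for `effect_near`.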
 97 | 
 98 | # Practical Concerns
 99 | ## Optimization Method
100 | This function calls *maxLik()* in package *maxLik* to maximize the quasi-likelihood function. The *maxLik* function is a wrapper which provides several different maximization methods, including most *optim()* methods in base R, as well as other useful methods such as BHHH (Berndt-Hall-Hall-Hausman). The choice of optimization method can produce vastly different parameter estimates. It is recommended to use either conjugate gradients (CG) or BHHH to ensure convergence. In limited testing, BHHH typically has the best convergence performance for large datasets, while CG is faster for smaller, easy-to-converge datasets.
101 | 
102 | ## Robust Standard Error
103 | It is worth noting that the robust standard error produced by this function is consistently lower than that produced by Stata's fmlogit package, typically by about 20\%. The robust SE here follows Papke and Wooldridge (1996)'s consistent $\hat{A}_j^{-1}\hat{B}_j\hat{A}_j^{-1}$ estimator; nevertheless, given this discrepancy, it is recommended that the numbers be used with caution. 
104 | 
105 | # References
106 | Papke, L. E. and Wooldridge, J. M. (1996), Econometric methods for fractional response variables with an application to 401(k) plan participation rates. J. Appl. Econ., 11: 619-632.
107 | 
108 | Wulff, Jesper N. "Interpreting Results From the Multinomial Logit Model Demonstrated by Foreign Market Entry." Organizational Research Methods (2014): 1094428114560024.
109 | 
110 | Greene, W. H. (2003). Econometric analysis (5th ed.). Upper Saddle River, NJ.: Prentice Hall
111 | 
112 | 
113 | 
114 | 
115 | 
116 | 


--------------------------------------------------------------------------------
/Documentation/fmlogit_docs.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/f1kidd/fmlogit/62ff38adefc95b1bfc8324c095a7a3f50775607d/Documentation/fmlogit_docs.pdf


--------------------------------------------------------------------------------
/NAMESPACE:
--------------------------------------------------------------------------------
 1 | # Generated by roxygen2: do not edit by hand
 2 | 
 3 | export(effects.fmlogit)
 4 | export(fitted.fmlogit)
 5 | export(fmlogit)
 6 | export(plot.fmlogit)
 7 | export(plot.fmlogit.margins)
 8 | export(predict.fmlogit)
 9 | export(residuals.fmlogit)
10 | export(summary.fmlogit)
11 | export(summary.fmlogit.margins)
12 | export(summary.fmlogit.wtp)
13 | export(wtp)
14 | 


--------------------------------------------------------------------------------
/R/fmlogit.R:
--------------------------------------------------------------------------------
1 | #' Estimation of Fractional Multinomial Logit Models
2 | #' 
3 | #' fmlogit allows you to estimate fractional multinomial logit models with Quasi-MLE. It also provides
4 | #' methods for calculating marginal and discrete effects, predictions, and aggregate willingness-to-pay
5 | #' calculations. 
6 | #' 
7 | #' The main function to call is \code{fmlogit}, which will estimate the model. Using \code{effects} to
8 | #' estimate marginal or discrete effects will also be very helpful. 


--------------------------------------------------------------------------------
/R/fmlogit_main.R:
--------------------------------------------------------------------------------
  1 | #' Estimate Fractional Multinomial Logit Models
  2 | #' 
  3 | #' Used to estimate fractional multinomial logit models using quasi-maximum
  4 | #' likelihood estimations following Papke and Wooldridge(1996).
  5 | #' @param y the dependent variable (N*J). Can be a matrix or a named data frame.
  6 | #'   The first column of the matrix is automatically treated as the baseline.
  7 | #' @param X independent variable (N*K). Can be a matrix or a named data frame.
  8 | #'   If there is no intercept term in the X, then an intercept term is
  9 | #'   automatically added.
 10 | #' @param beta0 Initial value for beta used in optimization. Uses a 1*K(J-1)
 11 | #'   vector. Defaults to a vector of zeros.
 12 | #' @param MLEmethod Method of optimization. Passed to
 13 | #'   \code{maxLik(method=MLEmethod)}. Choose from "NR","BFGS","CG","BHHH","SANN", or "NM".  
 14 | #'   Defaults to "CG", the conjugate gradients method. See Details. 
 15 | #' @param maxit Maximum number of iterations.
 16 | #' @param abstol Convergence tolerance.
 17 | #' @param cluster A vector of cluster identifiers used to compute clustered standard errors. 
 18 | #' Defaults to NULL, in which case no clustering is applied. 
 19 | #' @param reps Number of bootstrap replications used to compute clustered standard errors.
 20 | #' @param ... Additional parameters passed to \code{maxLik()}.
 21 | #' @return The function returns an object of class "fmlogit". Use \code{effects}, \code{predict}, 
 22 | #'  \code{residuals}, \code{fitted} to extract various useful features of the value returned by 
 23 | #' \code{fmlogit}. 
 24 | #' @return An object of class "fmlogit" contains the following components: 
 25 | #' @return \code{estimates}   A list of matrices containing parameter estimates,
 26 | #'   standard errors, and hypothesis testing results.
 27 | #' @return \code{baseline}    The baseline choice
 28 | #' @return \code{likelihood}  The likelihood value
 29 | #' @return \code{conv_code}   Convergence diagnostics code. 
 30 | #' @return \code{convergence} Convergence messages. 
 31 | #' @return \code{count}       Provides dataset information
 32 | #' @return \code{y}           The dependent variable data frame.
 33 | #' @return \code{X}           The independent variable data frame. Augmented by factor dummy transformation
 34 | #' , constant term added. 
 35 | #' @return \code{rowNo}       A vector of row numbers from the original X and y that is used for estimation.
 36 | #' @return \code{coefficient} Matrix of estimated coefficients. Augmented with the baseline coefficient
 37 | #' (which is a vector of zeros). 
 38 | #' @return \code{vcov}        A list of matrices containing the robust variance covariance matrix for each choice
 39 | #' variable. 
 40 | #' @return \code{cluster}     The vector of clusters. 
 41 | #' @return \code{reps}        Number of bootstrap replications for clustered standard error
 42 | #' @details The fractional multinomial model is the extension of the multinomial
 43 | #' logit to fractional responses. Unlike standard multinomial logit models,
 44 | #' which only consider 0-1 responses, the fractional multinomial model considers the
 45 | #' case where the response variables are fractions that sum to one. Examples
 46 | #' of this type of data are percentages of a budget spent on education, defense,
 47 | #' and public health; or fractions of a population that have middle school, high
 48 | #' school, college, or post-college education.
 49 | #' 
 50 | #' This function follows Papke and Wooldridge(1996)'s paper, in which they
 51 | #' proposed a quasi-maximum likelihood estimator for fractional response data.
 52 | #' The likelihood function used here is a standard multinomial likelihood
 53 | #' function, see \url{http://maartenbuis.nl/software/likelihoodFmlogit.pdf} for
 54 | #' the likelihood used here. Robust standard errors are provided following Papke
 55 | #' and Wooldridge(1996), in which they proposed an asymptotically consistent
 56 | #' estimator of variance.
 57 | #' 
 58 | #' Maximization is done by calling \code{\link{maxLik}}. maxLik is a wrapper function 
 59 | #' for different maximization methods in R. This includes most methods provided by \code{\link{optim}},  
 60 | #' as well as other methods such as BHHH (Berndt-Hall-Hall-Hausman). 
 61 | #' 
 62 | #' MLE convergence can be a problem in R, especially if the dataset is large with many explanatory variables. 
 63 | #' It is recommended to use CG (Conjugate Gradients) or BHHH (Berndt-Hall-Hall-Hausman).
 64 | #' The conjugate gradients method is usually faster, but could lead to non-convergence under 
 65 | #' certain scenarios. BHHH is slower, but has better convergence performance.
 66 | #' 
 67 | #' 
 68 | #' @examples 
 69 | #' data = spending
 70 | #' X = data[,2:5]
 71 | #' y = data[,6:11]
 72 | #' results1 = fmlogit(y,X)
 73 | #' @references Papke, L. E. and Wooldridge, J. M. (1996), Econometric methods
 74 | #'   for fractional response variables with an application to 401(k) plan
 75 | #'   participation rates. J. Appl. Econ., 11: 619-632.
 76 | #' @export fmlogit
 77 | 
 78 | 
 79 | 
 80 | 
 81 | fmlogit=function(y, X, beta0 = NULL, MLEmethod = "CG", maxit = 5e+05, 
 82 |                           abstol = 1e-05,cluster=NULL,reps=1000, ...){
 83 |   start.time = proc.time()
 84 |   
 85 |   if(!is.null(cluster) && length(cluster)!=nrow(y)){
 86 |     warning("Length of the cluster does not match the data. Cluster is ignored.")
 87 |     cluster = NULL
 88 |   }
 89 |   Xclass = sapply(X, class)
 90 |   Xfac = which(Xclass %in% c("factor", "character"))
 91 |   if (length(Xfac) > 0) {
 92 |     Xfacnames = colnames(X)[Xfac]
 93 |     strformFac = paste(Xfacnames, collapse = "+")
 94 |     Xdum = model.matrix(as.formula(paste("~", strformFac, 
 95 |                                          sep = "")), data = X)[, -1]
 96 |     X = cbind(X, Xdum)
 97 |     X = X[, -Xfac]
 98 |   }
 99 |   Xnames = colnames(X)
100 |   ynames = colnames(y)
101 |   X = as.matrix(X)
102 |   y = as.matrix(y)
103 |   n = dim(X)[1]
104 |   j = dim(y)[2]
105 |   k = dim(X)[2]
106 |   xy = cbind(X, y)
107 |   xy = na.omit(xy)
108 |   row.remain = setdiff(1:n, attr(xy, "na.action"))
109 |   X = xy[, 1:k]
110 |   y = xy[, (k + 1):(k + j)]
111 |   n = dim(y)[1]
112 |   remove(xy)
113 |   # adding in the constant term
114 |   if(k==1){
115 |     # check if the input X is constant
116 |     if(length(unique(X))==1){ # X is constant
117 |       Xnames = "constant"
118 |       X = as.matrix(as.numeric(X),nrow=1)
119 |       colnames(X) = Xnames
120 |       X = as.matrix(X)
121 |       k=0
122 |     }else{ # one single variable of input
123 |       Xnames = "X1"
124 |       X = as.matrix(X)
125 |       k = dim(X)[2]
126 |       X = cbind(X, rep(1, n))
127 |       Xnames = c(Xnames, "constant")
128 |       colnames(X) = Xnames
129 |     }
130 |   }else{ # normal cases
131 |     X = X[, apply(X, 2, function(x) length(unique(x)) != 1)]
132 |     Xnames = colnames(X)
133 |     k = dim(X)[2]
134 |     X = cbind(X, rep(1, n))
135 |     Xnames = c(Xnames, "constant")
136 |     colnames(X) = Xnames
137 |   }
138 |   
139 |   
140 |   testcols <- function(X) {
141 |     m = crossprod(as.matrix(X))
142 |     ee = eigen(m)
143 |     evecs <- split(zapsmall(ee$vectors), col(ee$vectors))
144 |     mapply(function(val, vec) {
145 |       if (val != 0) 
146 |         NULL
147 |       else which(vec != 0)
148 |     }, zapsmall(ee$values), evecs)
149 |   }
150 |   collinear = unique(unlist(testcols(X)))
151 |   while (length(collinear) > 0) {
152 |     if (qr(X)$rank == dim(X)[2]) {  # full column rank: warn only, drop nothing
153 |       print("Model may suffer from multicollinearity problems.")
154 |       break
155 |     }
156 |     collinear = setdiff(collinear, k + 1)  # never drop the constant (last column)
157 |     X = X[, -collinear[length(collinear)]]
158 |     Xnames = colnames(X)
159 |     k = k - 1
160 |     collinear = unique(unlist(testcols(X)))
161 |   }
162 |   QMLE <- function(betas) {
163 |     betas = matrix(betas, nrow = j - 1, byrow = T)
164 |     betamat = rbind(rep(0, k + 1), betas)
165 |     llf = 0
166 |     for (i in 1:j) {
167 |       L = y[, i] * ((X %*% betamat[i, ]) - log(rowSums(exp(X %*% 
168 |                                                              t(betamat)))))
169 |       llf = llf + sum(L)
170 |     }
171 |     return(llf)
172 |   }
173 |   QMLE_Obs <- function(betas) {
174 |     betas = matrix(betas, nrow = j - 1, byrow = T)
175 |     betamat = rbind(rep(0, k + 1), betas)
176 |     llf = rep(0, n)
177 |     for (i in 1:j) {
178 |       L = y[, i] * ((X %*% betamat[i, ]) - log(rowSums(exp(X %*% 
179 |                                                              t(betamat)))))
180 |       llf = llf + L
181 |     }
182 |     return(llf)
183 |   }
184 |   if (length(beta0) == 0){
185 |     beta0 = rep(0, (k + 1) * (j - 1))
186 |   }
187 |   if (length(beta0) != (k + 1) * (j - 1)) {
188 |     beta0 = rep(0, (k + 1) * (j - 1))
189 |     warning("Wrong length of beta0 given. Use default setting instead.")
190 |   }
191 |   opt <- maxLik(QMLE_Obs, start = beta0, method = MLEmethod, 
192 |                 control = list(iterlim = maxit, tol = abstol), ...)
193 |   betamat = matrix(opt$estimate, ncol = k + 1, byrow = T)
194 |   betamat_aug = rbind(rep(0, k + 1), betamat)
195 |   colnames(betamat_aug) = Xnames
196 |   rownames(betamat_aug) = ynames
197 |   sigmat = matrix(nrow = j - 1, ncol = k + 1)
198 |   vcov = list()
199 |   
200 |   ###insert--nonparametric bootstrap procedure (clustered SE and vcov)
201 |   
202 |   if(is.null(cluster)==F){
203 |     cluster = cluster[row.remain]
204 |     clusters <- names(table(cluster))
205 |     for (i in 1:j) {
206 |       # cluster should preferably be coming from a same data frame with the original y and X. 
207 |       sterrs <- matrix(NA, nrow=reps, ncol=k + 1)
208 |       vcov_j_list=list()
209 |       
210 |       b=1
211 |       no_singular_error=c()
212 |       while(b<=reps){
213 |         
214 |         index <- sample(1:length(clusters), length(clusters), replace=TRUE)
215 |         aa <- clusters[index]
216 |         bb <- table(aa)
217 |         bootdat <- NULL
218 |         dat=cbind(y,X)
219 |         for(b1 in 1:max(bb)){
220 |           cc <- dat[cluster %in% names(bb[bb %in% b1]),]
221 |           for(b2 in 1:b1){
222 |             bootdat <- rbind(bootdat, cc)
223 |           }
224 |         }
225 |         
226 |         bootdatX=matrix(bootdat[,(j+1):ncol(bootdat)],nrow=nrow(bootdat))
227 |         bootdaty=bootdat[,1:j]
228 |         
229 |         sum_expxb = rowSums(exp(bootdatX %*% t(betamat_aug)))
230 |         expxb = exp(bootdatX %*% betamat_aug[i, ])
231 |         G = expxb/sum_expxb
232 |         g = (expxb * sum_expxb - expxb^2)/sum_expxb^2
233 |         X_a = bootdatX * as.vector(sqrt(g^2/(G * (1 - G))))
234 |         A = t(X_a) %*% X_a
235 |         mu = bootdaty[, i] - G
236 |         X_b = bootdatX * as.vector(mu * g/G/(1 - G))
237 |         B = t(X_b) %*% X_b
238 |         
239 |         a_solve_error = tryCatch(solve(A),error=function(e){NULL})
240 |         if(is.null(a_solve_error)){
241 |           no_singular_error=c(no_singular_error,b)
242 |           next
243 |         }
244 |         
245 |         Var_b = solve(A) %*% B %*% solve(A)
246 |         std_b = sqrt(diag(Var_b))
247 |         sterrs[b,]=std_b
248 |         vcov_j_list[[b]]=Var_b
249 |         
250 |         b=b+1
251 |       }
252 |       if(length(no_singular_error)>0){warning(paste('solve(A) failed on a singular matrix', length(no_singular_error), 'times within the cluster bootstrap for outcome #', i))}
253 |       std_b=apply(sterrs,2,mean)
254 |       vcov[[i]] = Reduce("+", vcov_j_list) / length(vcov_j_list)
255 |       if (i > 1) 
256 |         sigmat[i - 1, ] = std_b
257 |     }
258 |   }else{
259 |     for(i in 1:j){
260 |       # start calculation  
261 |       sum_expxb = rowSums(exp(X %*% t(betamat_aug))) # sum of the exp(x'b)s
262 |       expxb = exp(X %*% betamat_aug[i,]) # individual exp(x'b)
263 |       G = expxb / sum_expxb # exp(X'bj) / sum^J(exp(X'bj))
264 |       g = (expxb * sum_expxb - expxb^2) / sum_expxb^2 # derivative of the logit function
265 |       
266 |       # Here the diagonal of A is the 'standard' standard error
267 |       # hat(A) = sum hat(gi)^2 * xi'xi / hat(Gi)(1-hat(Gi))
268 |       # or, Xtilde = X * sqrt(g^2/G(1-G)), A = Xtilde'Xtilde
269 |       X_a = X * as.vector(sqrt(g^2/(G*(1-G))))
270 |       A = t(X_a) %*% X_a
271 |       
272 |       # robust standard error, again following PW(1996)
273 |       mu = y[,i] - G
274 |       X_b = X * as.vector(mu * g / G / (1-G))
275 |       B = t(X_b) %*% X_b
276 |       Var_b = solve(A) %*% B %*% solve(A)
277 |       std_b = sqrt(diag(Var_b))
278 |       # std_b= sqrt(diag(solve(A))) is the "unrobust" standard error. 
279 |       vcov[[i]] = Var_b
280 |       if(i>1) sigmat[i-1,] = std_b
281 |     }
282 |   }
283 |   
284 |   ###end of insert--nonparametric bootstrap procedure (clustered SE and vcov)
285 |   
286 |   listmat = list()
287 |   for (i in 1:(j - 1)) {
288 |     tabout = matrix(ncol = 4, nrow = k + 1)
289 |     tabout[, 1:2] = t(rbind(betamat[i, ], sigmat[i, ]))
290 |     tabout[, 3] = tabout[, 1]/tabout[, 2]
291 |     tabout[, 4] = 2 * (1 - pnorm(abs(tabout[, 3])))
292 |     colnames(tabout) = c("estimate", "std", "z", "p-value")
293 |     if (length(Xnames) > 0) 
294 |       rownames(tabout) = Xnames
295 |     listmat[[i]] = tabout
296 |   }
297 |   if (length(ynames) > 0) 
298 |     names(listmat) = ynames[2:j]
299 |   outlist = list()
300 |   outlist$estimates = listmat
301 |   outlist$baseline = ynames[1]
302 |   outlist$likelihood = opt$maximum
303 |   outlist$conv_code = opt$code
304 |   outlist$convergence = paste(opt$type, paste(as.character(opt$iterations), 
305 |                                               "iterations"), opt$message, sep = ",")
306 |   outlist$count = c(Obs = n, Explanatories = k, Choices = j)
307 |   outlist$y = y
308 |   outlist$X = X
309 |   outlist$rowNo = row.remain
310 |   outlist$coefficient = betamat_aug
311 |   names(vcov) = ynames
312 |   outlist$vcov = vcov
313 |   outlist$cluster = cluster
314 |   outlist$reps=ifelse(is.null(cluster),0,reps)
315 |   
316 |   print(paste("Fractional logit model estimation completed. Time:", 
317 |               round(proc.time()[3] - start.time[3], 1), "seconds"))
318 |   return(structure(outlist, class = "fmlogit"))
319 | }
320 | 
321 | 
322 | 
323 | 
324 | 
325 | 
326 | 
327 | 


--------------------------------------------------------------------------------
/R/marginals.R:
--------------------------------------------------------------------------------
  1 | #' Average Partial Effects of the Covariates
  2 | #' 
  3 | #' Calculate average partial effects (APE) of independent variables from a fractional multinomial logit model. 
  4 | #' 
  5 | #' @param object An "fmlogit" object.
  6 | #' @param effect Can be "marginal", for marginal effect; or "discrete", for discrete changes from
  7 | #' the min to the max. 
  8 | #' @param marg.type Type of marginal or discrete effects to be computed. Defaults to "atmean", the effect at 
  9 | #' the mean of all covariates. Also takes "aveacr", the averaged effects across all observations. See details. 
 10 | #' @param se Whether to calculate standard errors for those margins. See details. 
 11 | #' @param varlist A string vector which provides the names of the variables for which to calculate 
 12 | #' the marginal effect. If missing, all variables except the constant will be calculated. 
 13 | #' Use "constant" if you wish to compute the marginal effect of the constant. 
 14 | #' @param marg.list A list of matrices storing the marginal effect matrix for each observation. Exists 
 15 | #' only if marg.type="aveacr". 
 16 | #' @param at Specify values of the X-matrix at which the partial effect will be retrieved. Expects a vector input
 17 | #' of length K-1. Only supported for \code{marg.type="atmean"}. See \code{predict.fmlogit(newdata)}. 
 18 | #' @param R Number of times to sample for the Krinsky-Robb standard error. Defaults to 1000. 
 19 | #' @details This module calculates the average partial effects (APEs) from a fractional multinomial logit model.
 20 | #' Partial effects are the counterpart of the marginal effects in a linear model setting. In linear models, 
 21 | #' the parameter estimate itself usually represents the marginal effect (if the variable in question is continuous). 
 22 | #' In logit models, however, the parameter estimate is the effect on the log-ratio between the choice variable
 23 | #' and the baseline choice. This function extracts APEs from the 
 24 | #' coefficient estimates produced by the fractional multinomial logit model.
 25 | #' 
 26 | #' This function allows for two types of partial effects: marginal effects and discrete effects.
 27 | #' A marginal effect represents how a unit change in one continuous variable x influences the choice variable y. 
 28 | #' Estimating the marginal effect is straightforward. However, special care is needed when averaging 
 29 | #' it across observations to obtain the APE. One approach is to evaluate the marginal effect while setting
 30 | #' the explanatory variables at their means. We call this the marginal effect at the mean (MEM), which corresponds
 31 | #' to the option \code{marg.type="atmean"}. Another approach is to take the average of the marginal effects of each
 32 | #' individual. We call this the average marginal effect (AME), which corresponds to the option 
 33 | #' \code{marg.type="aveacr"}. 
 34 | #' 
 35 | #' The discrete effect represents how a discrete change in one specific x, discrete or continuous, influences the choice variable y. 
 36 | #' This is more useful for categorical variables, as calculating the "marginal effect" makes little sense
 37 | #' for them. In this function, we calculate the discrete effect by changing the explanatory variable from 
 38 | #' its minimum to its maximum. For a binary variable, this is just the difference between 0 and 1. As in 
 39 | #' the marginal effect case, we also have the discrete effect at the mean (DEM), corresponding to \code{marg.type="atmean"},
 40 | #' and the average discrete effect (ADE), corresponding to \code{marg.type="aveacr"}.
 41 | #' 
 42 | #' Standard errors are provided for the effects using the Krinsky-Robb (KR) method. Krinsky-Robb is a simulation-based
 43 | #' method that calculates the empirical distribution of a function given a known distribution of its variables. Here 
 44 | #' we provide Krinsky-Robb standard errors for MEM and DEM, and the user can specify the number of 
 45 | #' simulation draws \code{R} that the Krinsky-Robb algorithm runs. 
 46 | #' 
 47 | #' The user can also specify a subset of explanatory variables when calculating effects. This is done by
 48 | #' passing a string vector containing the column names of the explanatory variables to \code{varlist}. As the
 49 | #' KR standard errors can be time-consuming, it is advisable to calculate effects only for the variables needed. 
 50 | #' 
 51 | #' @return The function returns an object of class "fmlogit.margins". It contains the following components:
 52 | #' @return \code{effects} A matrix of calculated effects.
 53 | #' @return \code{se} A matrix of standard errors corresponding to the effects. Shows up if se=T for the 
 54 | #' input parameter.
 55 | #' @return \code{ztable} A list of matrices containing effects, standard errors, z-stats and p-values.
 56 | #' @return \code{R} Number of simulation times for Krinsky-Robb standard error calculation. Null if se=F.  
 57 | #' @return \code{expl} String message explaining the effects calculated.    
 58 | #' 
 59 | #' @examples 
 60 | #' #results1 = fmlogit(y,X)
 61 | #' effects(results1,effect="marginal")
 62 | #' effects(results1,effect="discrete",varlist = colnames(results1$X)[c(1,3)])
 63 | #' @export effects.fmlogit
 64 | 
 65 | effects.fmlogit<-function(object,effect=c("marginal","discrete"),
 66 |                           marg.type="atmean",se=F,varlist = NULL,at=NULL,R=1000){
 67 |   j=length(object$estimates)+1; K=dim(object$estimates[[1]])[1]; N=dim(object$y)[1]
 68 |   betamat = object$coefficient
 69 |   effect = match.arg(effect[1], c("marginal","discrete")) # resolve the default vector to a single choice
 70 |   # determine variables
 71 |   Xnames = colnames(object$X); ynames = colnames(object$y)
 72 |   if(length(varlist)==0){
 73 |     varlist=Xnames[-K]
 74 |     var_colNo = c(1:(K-1))
 75 |     k = length(var_colNo)
 76 |   }else{
 77 |     var_colNo = unlist(lapply(varlist, function(x) {which(Xnames == x)}))
 78 |     if(length(varlist) != length(var_colNo)) stop("Unrecognized varlist input. Please double check your spelling")
 79 |     k = length(var_colNo)
 80 |   }
 81 |   
 82 |   xmarg = matrix(ncol=k,nrow=j)
 83 |   se_mat = matrix(ncol=k,nrow=j)
 84 |   marg_list = list()
 85 |   
 86 |   if(effect == "marginal"){
 87 |     # calculate marginal effects
 88 |     yhat = predict(object); yhat = as.matrix(yhat)
 89 |     for(c in var_colNo){
 90 |       c1 = which(var_colNo == c)
 91 |       if(marg.type == "aveacr"){
 92 |         # this is the average marginal effect for all observations
 93 |         beta_bar = as.vector(yhat %*% betamat[,c])
 94 |         betak_long = matrix(rep(betamat[,c],N),nrow=N,byrow=T)
 95 |         marg_mat =  yhat * (betak_long-beta_bar)
 96 |         xmarg[,c1] = colMeans(marg_mat)
 97 |         marg_list[[c1]] = marg_mat
 98 |       }
 99 |       if(marg.type == "atmean"){
100 |         # this is the marginal effect at the mean
101 |         # mean calculation
102 |         if(is.null(at)) at = colMeans(object$X[,-K])
103 |         yhat_mean = predict(object,newdata=at)
104 |         beta_bar = sum(yhat_mean * betamat[,c])
105 |         betak = betamat[,c]
106 |         marg_vec = yhat_mean * (betak - beta_bar)
107 |         xmarg[,c1] = as.numeric(marg_vec) 
108 |       }
109 |       if(se==T){
110 |         # se calculation, using atmean by default
111 |         se_k = rep(0,j)
112 |         for(i in 1:j){
113 |           se_k[i] = sqrt(diag(object$vcov[[i]])[c])
114 |           new_betak = rnorm(R,betamat[i,c],se_k[i]) # draws centered on the coefficient being perturbed
115 |           marg_matrix = matrix(nrow=R,ncol=j)
116 |           for(r in 1:R){
117 |             new_betamat = betamat; new_betamat[i,c] = new_betak[r]
118 |             yhat_mean = predict(object,newdata=colMeans(object$X[,-K]),newbeta = new_betamat)
119 |             beta_bar = sum(yhat_mean * new_betamat[,c])
120 |             betak = new_betamat[,c]
121 |             marg_vec = yhat_mean * (betak - beta_bar)
122 |             marg_matrix[r,i] = as.numeric(marg_vec)[i]
123 |           }
124 |           se_mat[i,c1] = sd(marg_matrix[,i])
125 |         }}}}
126 |   
127 |   if(effect=="discrete"){
128 |     for(c in var_colNo){
129 |       c1 = which(var_colNo == c)
130 |       if(marg.type == "aveacr"){
131 |         Xmin <- Xmax <- object$X[,-K]
132 |         Xmin[,c] = min(object$X[,c])
133 |         Xmax[,c] = max(object$X[,c])
134 |         yhat_min = predict(object,newdata=Xmin)
135 |         yhat_max = predict(object,newdata=Xmax)
136 |         ydisc = yhat_max - yhat_min
137 |         xmarg[,c1] = colMeans(ydisc)
138 |         marg_list[[c1]] = ydisc
139 |       }
140 |       if(marg.type == "atmean"){
141 |         if(is.null(at)) at = colMeans(object$X[,-K])
142 |         Xmin <- Xmax <- at
143 |         Xmin[c] = min(object$X[,c])
144 |         Xmax[c] = max(object$X[,c])
145 |         yhat_min = predict(object,newdata=Xmin)
146 |         yhat_max = predict(object,newdata=Xmax)
147 |         ydisc = yhat_max - yhat_min
148 |         xmarg[,c1] = as.numeric(ydisc)
149 |       }
150 |       if(se==T){
151 |         # se calculation for discrete margins. using atmean by default
152 |         se_k = rep(0,j)
153 |         Xmin <- Xmax <- colMeans(object$X[,-K])
154 |         Xmin[c] = min(object$X[,c])
155 |         Xmax[c] = max(object$X[,c])
156 |         marg_matrix = matrix(nrow=R,ncol=j)
157 |         for(i in 1:j){
158 |           se_k[i] = sqrt(diag(object$vcov[[i]])[c])
159 |           new_betak = rnorm(R,betamat[i,c],se_k[i]) # draws centered on the coefficient being perturbed
160 |           for(r in 1:R){
161 |             new_betamat = betamat; new_betamat[i,c] = new_betak[r]
162 |             yhat_min = predict(object,newdata=Xmin,newbeta = new_betamat)
163 |             yhat_max = predict(object,newdata=Xmax,newbeta = new_betamat)
164 |             ydisc = yhat_max - yhat_min
165 |             marg_matrix[r,i] = as.numeric(ydisc)[i]
166 |           }
167 |           se_mat[i,c1] = sd(marg_matrix[,i])
168 |         }}}}
169 |   # generating hypothesis testing tables.
170 |   listmat = list()
171 |   if(se){
172 |     for(i in 1:k){
173 |       tabout = matrix(ncol=4,nrow=j)
174 |       tabout[,1:2] = cbind(xmarg[,i],se_mat[,i])
175 |       tabout[,3] = tabout[,1] / tabout[,2]
176 |       tabout[,4] = 2*(1-pnorm(abs(tabout[,3])))
177 |       colnames(tabout) = c("estimate","std","z","p-value")
178 |       rownames(tabout) = ynames
179 |       listmat[[i]] = tabout
180 |     }
181 |     names(listmat)=varlist
182 |   }
183 |   
184 |   
185 |   colnames(xmarg) <- colnames(se_mat) <- varlist
186 |   rownames(xmarg) <- rownames(se_mat) <-colnames(object$y)
187 |   outlist=list()
188 |   outlist$effects = xmarg
189 |   if(se==T){outlist$se = se_mat; outlist$ztable = listmat}
190 |   if(marg.type=="aveacr") {names(marg_list)=varlist; outlist$marg.list = marg_list}
191 |   marg.type.out = ifelse(marg.type=="atmean","at the mean,","average across observations,")
192 | 
193 |   # number of Krinsky-Robb draws; 0 when standard errors are not computed
194 |   outlist$R = ifelse(se,R,0)
195 | 
196 | 
197 |   outlist$expl = paste(effect,"effect",marg.type.out,
198 |                        ifelse(se==T,"Krinsky-Robb standard error calculated","standard error not computed"))
199 |   return(structure(outlist,class="fmlogit.margins"))
200 | }
201 | 
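
The marginal effect at the mean and a Krinsky-Robb standard error can be illustrated with a small stand-alone sketch in base R. Everything below is hypothetical: the coefficient matrix `B`, the evaluation point `x_bar`, and the covariance matrix `V` are made-up toy values, and the helpers `shares()` and `mem()` are not part of the package. Unlike the package code, which perturbs one coefficient at a time with univariate normal draws, this sketch draws all free coefficients jointly from a multivariate normal, which is the usual textbook form of the method.

```r
# Toy inputs (assumed for illustration): 3 choices, 2 covariates plus a constant.
set.seed(1)
B <- rbind(c(0, 0, 0),          # baseline choice: coefficients fixed at zero
           c(0.5, -0.2, 0.1),
           c(-0.3, 0.4, 0.2))
x_bar <- c(1.0, 2.0, 1)         # evaluation point, constant in the last position

# Predicted shares: softmax of the linear indices.
shares <- function(B, x) {
  e <- exp(as.vector(B %*% x))
  e / sum(e)
}

# Marginal effect of covariate c on every share:
# p_i * (beta_ic - sum_m p_m * beta_mc)
mem <- function(B, x, c) {
  p <- shares(B, x)
  p * (B[, c] - sum(p * B[, c]))
}

me <- mem(B, x_bar, 1)

# Krinsky-Robb: draw the free coefficients from N(beta_hat, V), recompute the
# effect for each draw, and take the standard deviation across draws.
free <- as.vector(t(B[-1, ]))   # free coefficients, stacked row by row
V <- diag(0.01, length(free))   # assumed covariance matrix (illustrative only)
R <- 500
L <- t(chol(V))                 # lower Cholesky factor, V = L %*% t(L)
draws <- replicate(R, {
  b <- free + as.vector(L %*% rnorm(length(free)))
  B_r <- rbind(0, matrix(b, nrow = nrow(B) - 1, byrow = TRUE))
  mem(B_r, x_bar, 1)
})
kr_se <- apply(draws, 1, sd)    # one standard error per choice
```

Because the shares always sum to one, the marginal effects in `me` sum to zero across choices, which is a quick sanity check on any implementation.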


--------------------------------------------------------------------------------
/R/plot_effects.R:
--------------------------------------------------------------------------------
  1 | #' Plot marginal or discrete effects, at each observation & for each choice
  2 | #' 
  3 | #' Plot the desired effect at each observed value for each choice
  4 | #' 
  5 | #' @param object An "fmlogit.margins" object.
  6 | #' @param varlist A string vector which provides the name of variables to plot the effect.
  7 | #'  If missing, all variables in object will be plotted.
  8 | #' @param X The covariates matrix. We recommend using element X from the fmlogit object. 
  9 | #' @param y The dependent variable matrix. We recommend using element y from the fmlogit object. 
 10 | #' @param against A vector with the same length as the number of observations in the model. 
 11 | #' Serves as the x-axis in the plots.
 12 | #' @param against.x A character string. The column name in the X matrix to plot against.
 13 | #' @param against.y A character string. The column name in the y matrix to plot against.
 14 | #' @param group.x A character string. The column name in the X matrix to group by. 
 15 | #' @param group.algebra A character string. An additional comparison imposed on the group variable, e.g. ">0". 
 16 | #' @param mfrow A numeric vector with two elements. Specifies the number of rows and columns in a panel.
 17 | #' Similar to par(mfrow=c()). Defaults to NULL, in which case the program chooses a roughly square panel. 
 18 | #' @return Panel plots of effects vs. chosen variables
 19 | #' @details 
 20 | #' This function provides a visualization tool for potentially heterogeneous marginal and discrete effects.
 21 | #' It lets the user plot the effects to detect patterns, both in the effects themselves
 22 | #' and against other variables. The plot also allows visualization of sub-groups in the data, which can be
 23 | #' very useful for categorical and dummy variables. 
 24 | #' 
 25 | #' The function takes an fmlogit.margins object, created by the effects(fmlogit) function. Note that since 
 26 | #' the plotting requires effects for all observations, the object should be created by choosing 
 27 | #' \code{marg.type="aveacr"}, the average-across method for effects calculation. 
 28 | #' 
 29 | #' Additional parameters include \code{varlist}, a vector of string variable names to be plotted, and \code{X}
 30 | #'  and \code{y}, the independent and dependent variable matrices in the original regression model. 
 31 | #'  
 32 | #'  \code{against}, \code{against.x}, and \code{against.y} allow different variables to be chosen
 33 | #'  as the x-axis. \code{against} directly supplies the vector to be plotted against, whereas \code{against.x}
 34 | #'  and \code{against.y} supply column names in \code{X} and \code{y}. Note that the user has to provide
 35 | #'  \code{X} or \code{y}, respectively, in order to use the column name option. 
 36 | #'  
 37 | #'  \code{group.x} supplies the column name in the X matrix to be grouped by. The plot 
 38 | #'  differentiates groups by color. Additionally, the user can supply a string to \code{group.algebra},
 39 | #'  an algebraic comparison that is evaluated on the group vector. For example, choosing 
 40 | #'  \code{group.x = "a"} and \code{group.algebra = ">0"} creates two groups, one with X$a>0 and one with X$a<=0.
 41 | #'  
 42 | #' @examples  
 43 | #' # Not running
 44 | #' # results1 = fmlogit(y,X)
 45 | #' # effect1 = effects(results1,effect="marginal",marg.type="aveacr")
 46 | #' 
 47 | #' # Plot only takes effects with marg.type="aveacr". 
 48 | #' plot(effect1,X=results1$X,against.x = "popdens", group.x = "tot", group.algebra = ">3")
 49 | #' @export plot.fmlogit.margins
 50 | 
 51 | 
 52 | 
 53 | plot.fmlogit.margins = function(object,varlist=NULL,X=NULL,y=NULL, 
 54 |                                 against=NULL,against.x=NULL,against.y=NULL,
 55 |                                 group.x=NULL, group.algebra=NULL,
 56 |                                 mfrow=NULL){
 57 |   require(ggplot2)
 58 |   require(grid)
 59 |   
 60 |   if(is.null(object[["marg.list"]])) stop("Please choose marg.type=aveacr when calculating effects")
 61 |   k = ncol(object$effects); j = nrow(object$effects); N = nrow(object$marg.list[[1]]); 
 62 |   Xnames = colnames(object$effects) ; ynames = rownames(object$effects)
 63 |   # X = object$X; y=object$y
 64 |   
 65 |   # determine variable list
 66 |   if(length(varlist)==0){
 67 |     varlist=Xnames
 68 |     var_colNo = 1:k
 69 |   }else{
 70 |     var_colNo = which(Xnames %in% varlist)
 71 |     k = length(var_colNo)
 72 |   }
 73 |   if(k==0) stop("Variable list not matched. Please check your varlist input.")
 74 |   
 75 |   # determine panel size
 76 |   if(is.null(mfrow)){
 77 |     js = ceiling(sqrt(j))
 78 |     jr = ifelse(js*(js-1)>j,js-1,js)
 79 |   }else{
 80 |     jr = mfrow[1]; js = mfrow[2]
 81 |   }
 82 |   
 83 |   # determine plotting x axis. 
 84 |   if(is.null(against) & is.null(against.x) & is.null(against.y)){
 85 |     M.against=1:N 
 86 |     ag.name = "ObsNo"
 87 |   }else if(is.null(against.x)==F){
 88 |     M.against = X[,against.x]
 89 |     if(is.null(M.against)){
 90 |       stop("against.x not found in variable list. Please double check your spelling")
 91 |     }
 92 |     ag.name = against.x
 93 |   }else if(is.null(against.y)==F){
 94 |     M.against = y[,against.y] 
 95 |     ag.name = against.y
 96 |   }else{M.against=against; ag.name = "against"} # label the x-axis when a raw vector is supplied
 97 |   
 98 |   
 99 |   # determine group variables
100 |   if(is.null(group.x) & is.null(group.algebra)) {M.group=NULL; g.name=NULL}
101 |   if(is.null(group.x)==F) {M.group = X[,group.x]; g.name.display <- g.name <- group.x;}
102 |   if(is.null(group.algebra)==F) {
103 |     M.group = eval(parse(text=paste("X[,",'"',group.x,'"',"]",group.algebra,sep="")))
104 |     M.group = ifelse(M.group,"Yes","No")
105 |     g.name = group.x
106 |     g.name.display = paste(group.x,group.algebra,sep="")
107 |     }
108 |   
109 |   for(c in var_colNo){
110 |     grid.newpage() # start a fresh page for each variable's panel
111 |     pushViewport(viewport(layout = grid.layout(jr, js)))
112 |     temp.data = cbind(object$marg.list[[c]],M.against)
113 |     temp.data = as.data.frame(temp.data)
114 |     colnames(temp.data) = c(colnames(object$marg.list[[c]]),ag.name)
115 |     if(is.null(M.group)==F){
116 |       temp.data = cbind(temp.data,as.factor(M.group))
117 |       colnames(temp.data)[ncol(temp.data)] = g.name} # name only the appended group column
118 |     for(i in 1:j){
119 |       g <- ggplot(temp.data,aes_string(ag.name,ynames[i],color=g.name)) + geom_point() 
120 |       g <- g + geom_hline(yintercept = 0) + theme_classic() + ggtitle(paste("Effects on", Xnames[c]))
121 |       if(is.null(M.group)==F) g <- g + theme(legend.title = element_text(colour="black"))+
122 |         scale_color_discrete(name=g.name.display)
123 |       print(g,vp = viewport(layout.pos.row = ifelse(i%%jr==0,jr,i%%jr), layout.pos.col = (i-1) %/%jr + 1) )
124 |   }
125 | }}
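
Placing the i-th plot in a panel with `jr` rows, filled column by column, needs a row and a column index that never collide. A minimal sketch of that mapping follows; the helper name `grid_pos` is hypothetical and only illustrates the arithmetic, it is not a function of this package.

```r
# Hypothetical helper: grid cell of the i-th plot when a grid with jr rows
# is filled column by column (rows cycle fastest).
grid_pos <- function(i, jr) {
  c(row = (i - 1) %% jr + 1, col = (i - 1) %/% jr + 1)
}

j  <- 5   # number of plots (choices)
jr <- 2   # number of rows in the panel
pos <- t(sapply(seq_len(j), grid_pos, jr = jr))   # one (row, col) pair per plot
```

Dividing by the number of rows when computing the column is what keeps the cells distinct in a column-major fill; dividing by the number of columns gives the same result only when the panel is square.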


--------------------------------------------------------------------------------
/R/plot_effects_1.R:
--------------------------------------------------------------------------------
 1 | #' Plot marginal or discrete effects of willingness to pay
 2 | #' 
 3 | #' Plot marginal or discrete effects of willingness to pay, potentially against another variable
 4 | #' 
 5 | #' @param object An "fmlogit" object.
 6 | #' @param varlist A string vector which provides the name of variables to plot the effect.
 7 | #'  If missing, all variables in object will be plotted.
 8 | #' @param X The covariates matrix. Recommend to use element X from the fmlogit object. 
 9 | #' @param y The dependent variable matrix. Recommend to use element y from the fmlogit object. 
10 | #' @param against A character string. The name of a column in the X matrix whose range, from minimum to 
11 | #' maximum, serves as the x-axis in the plots. If NULL, effects are plotted against the observation number.
12 | #' @param mfrow A numeric vector with two elements. Specify the number of rows and columns in a panel.
13 | #' Similar to par(mfrow=c()). Default to Null, and the program will choose a square panel. 
14 | #' @param plot.show If true, the plot will be created. Otherwise the function returns raw data that can be
15 | #' used to create user-specified (fancier) plots. 
16 | #' @return Panel plots of effects vs. chosen variables
17 | #' @details 
18 | #' This function provides a visualization tool for potentially heterogeneous marginal and discrete effects.
19 | #' It lets the user plot the effects to detect patterns, both in the effects themselves
20 | #' and against other variables. The plot also allows visualization of sub-groups in the data, which can be
21 | #' very useful for categorical and dummy variables. 
22 | #' 
23 | #' The function takes an fmlogit.margins object, created by the effects(fmlogit) function. Note that since 
24 | #' the plotting requires effects for all observations, the object should be created by choosing 
25 | #' \code{marg.type="aveacr"}, the average-across method for effects calculation. 
26 | #' 
27 | #' Additional parameters include \code{varlist}, a vector of string variable names to be plotted, and \code{X}
28 | #'  and \code{y}, the independent and dependent variable matrices in the original regression model. 
29 | #'  
30 | #'  \code{against}, \code{against.x}, and \code{against.y} allow different variables to be chosen
31 | #'  as the x-axis. \code{against} directly supplies the vector to be plotted against, whereas \code{against.x}
32 | #'  and \code{against.y} supply variable names in the original dataset. Note that the user has to provide
33 | #'  \code{X} or \code{y}, respectively, in order to use the column name option. 
34 | #'  
35 | #'  \code{group.x} supplies the column name in the X matrix to be grouped by. The plot 
36 | #'  differentiates groups by color. Additionally, the user can supply a string to \code{group.by},
37 | #'  an algebraic comparison that is evaluated on the group vector. For example, choosing 
38 | #'  \code{group.x = "a"} and \code{group.by= ">0"} creates two groups, one with X$a>0 and one with X$a<=0.
39 | #'  
40 | #' @examples  
41 | #' # Not running
42 | #' # results1 = fmlogit(y,X)
43 | #' # effect1 = effects(results1,effect="marginal",marg.type="aveacr")
44 | #' 
45 | #' # Plot only takes effects with marg.type="aveacr". 
46 | #' plot(effect1,X=results1$X,against.x = "popdens", group = "tot", groupby = ">3")
47 | #' @export plot.fmlogit
48 | 
49 | 
50 | plot.fmlogit = function(object,wtp.vec,varlist, against=NULL,mfrow=NULL,t=500,effect=c("discrete","marginal"),
51 |                         type="l",plot.show=T,...){
52 |   K = ncol(object$X); j = ncol(object$y); N = nrow(object$X); 
53 |   Xnames = colnames(object$X) ; ynames = colnames(object$y)
54 |   X = object$X; y=object$y; effect = match.arg(effect) # resolve the default to "discrete"
55 |   
56 |   # determine variable list
57 |   var_colNo = which(Xnames %in% varlist)
58 |   k = length(var_colNo)
59 |   
60 |   if(is.null(mfrow)){
61 |     js = ceiling(sqrt(k))
62 |     jr = ifelse(js*(js-1)>=k,js-1,js)
63 |   }else{
64 |     jr = mfrow[1]; js = mfrow[2]
65 |   }
66 |   
67 |   if(!is.null(against)) {
68 |     ag_No = which(Xnames == against)
69 |     if(length(ag_No)==0) stop(paste("The against vector specified,",against,
70 |                                     "is not in the list of explanatory variables. Please check again."))
71 |     ag_min = min(X[,ag_No]); ag_max = max(X[,ag_No])
72 |     ag_vec = seq(ag_min,ag_max,length.out = t)
73 |     wtp_mat = matrix(nrow=t,ncol=k)
74 |     colnames(wtp_mat) = varlist
75 |     for(i in 1:t){
76 |       newdata = colMeans(X[,-K])
77 |       newdata[ag_No] = ag_vec[i]
78 |       wtp_mat[i,] = wtp(effects(object,effect=effect,se=F,varlist=varlist,at=newdata),wtp.vec)[[1]]
79 |     }
80 |   }else{
81 |     against="ObsNo"
82 |     ag_vec=1:N
83 |     wtp_mat = matrix(nrow=N,ncol=k)
84 |     colnames(wtp_mat) = varlist
85 |     for(i in 1:N){
86 |       newdata = X[i,-K]
87 |       wtp_mat[i,] = wtp(effects(object,effect=effect,se=F,varlist=varlist,at=newdata),wtp.vec)[[1]]
88 |     }
89 |   }
90 |   # plotting
91 |   if(plot.show){
92 |     par(mfrow=c(jr,js))
93 |     if(is.null(type)){type="l"} # default to line plot. 
94 |     for(i in 1:k){
95 |       plot(ag_vec,wtp_mat[,i],type=type,xlab=against,ylab=paste(effect,"effect of", varlist[i]),...)
96 |     }}
97 |   return(list(ag_vec,wtp_mat))
98 | }
99 | 


--------------------------------------------------------------------------------
/R/predictions.R:
--------------------------------------------------------------------------------
 1 | #' Extract fitted values, residuals, and predictions
 2 | #' 
 3 | #' Extract the fitted dependent variable from a fractional multinomial logit model. 
 4 | #' @name fitted.fmlogit
 5 | #' @aliases residuals.fmlogit
 6 | #' @aliases predict.fmlogit
 7 | #' @param object An "fmlogit" object.
 8 | #' @param newdata A new X matrix to perform model prediction. If NULL, defaults to the original dataset. 
 9 | #' newdata can be a vector of length k, or a matrix with k columns, where k is the number of explanatory 
10 | #' variables in the original model, excluding the constant (which is appended automatically). 
11 | #' @param newbeta A new augmented matrix of coefficients used to predict the outcome variables. 
12 | #' Replaces object$coefficient, which includes the baseline coefficients. Useful for constructing
13 | #' confidence intervals via simulation or bootstrapping. 
14 | #' @examples 
15 | #' #results1 = fmlogit(y,X)
16 | #' fitted(results1)
17 | #' residuals(results1)
18 | #' predict(results1)
19 | #' # predict using the first observation from the original dataset.
20 | #' predict(results1,X[1,])
21 | #' @rdname fitted.fmlogit
22 | #' @export fitted.fmlogit
23 | #' 
24 | 
25 | 
26 | fitted.fmlogit <-function(object){
27 |   j=length(object$estimates)+1; k=dim(object$estimates[[1]])[1]; N=dim(object$y)[1]
28 |   betamat_aug = object$coefficient; X=object$X; y=object$y
29 |   sum_expxb = rowSums(exp(X %*% t(betamat_aug))) # sum of the exp(x'b)s
30 |   yhat = y
31 |   for(i in 1:j){
32 |     expxb = exp(X %*% betamat_aug[i,]) # individual exp(x'b)
33 |     yhat[,i] = expxb / sum_expxb
34 |   }
35 |   return(as.data.frame(yhat))
36 | }
37 | 
38 | #' @rdname fitted.fmlogit
39 | #' @export residuals.fmlogit
40 | #' 
41 | residuals.fmlogit <- function(object){
42 |   yhat = fitted(object)
43 |   return(as.data.frame(object$y-yhat))
44 | }
45 | 
46 | #' @rdname fitted.fmlogit
47 | #' @export predict.fmlogit
48 | #' 
49 | predict.fmlogit <- function(object,newdata=NULL,newbeta = NULL){
50 |   if(length(newbeta)>0) object$coefficient = newbeta # apply newbeta before any prediction
51 |   if(length(newdata)==0) return(fitted(object))
52 |   j=length(object$estimates)+1; k=dim(object$estimates[[1]])[1]; N=dim(object$y)[1]
53 |   betamat_aug = object$coefficient;
54 |   newdata = as.matrix(newdata)
55 |   if(length(newdata) == dim(newdata)[1]) newdata = t(newdata) # vector
56 |   if(k != dim(newdata)[2]+1) stop(paste("Dimension of newdata is wrong. Should be",k-1,"instead of",dim(newdata)[2]))
57 |   X = cbind(newdata,1); N = dim(X)[1]
58 |   yhat = matrix(ncol=j,nrow=N); colnames(yhat) = colnames(object$y)
59 |   sum_expxb = rowSums(exp(X %*% t(betamat_aug))) # sum of the exp(x'b)s
60 |   for(i in 1:j){
61 |     expxb = exp(X %*% betamat_aug[i,]) # individual exp(x'b)
62 |     yhat[,i] = expxb / sum_expxb
63 |   }
64 |   return(as.data.frame(yhat))
65 | }
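
The prediction step is a row-wise softmax: each share is exp(x'b) for that choice divided by the sum over all choices. A vectorised sketch in base R is shown below; the matrices `X` and `B` are made-up toy values (assumed for illustration), with the constant in the last column of `X` and a zero row in `B` for the baseline choice.

```r
# Toy inputs (assumed): 3 observations, 2 covariates plus a constant column.
X <- cbind(x1 = c(0.2, 1.5, -0.7), x2 = c(1, 0, 2), constant = 1)

# Augmented coefficient matrix: one row per choice, baseline row set to zero.
B <- rbind(base = c(0, 0, 0),
           a    = c(0.5, -0.2, 0.1),
           b    = c(-0.3, 0.4, 0.2))

expxb <- exp(X %*% t(B))          # N x j matrix of exp(x'b)
yhat  <- expxb / rowSums(expxb)   # divide each row by its sum

# The same result computed choice by choice in an explicit loop.
yhat_loop <- matrix(nrow = nrow(X), ncol = nrow(B))
for (i in seq_len(nrow(B))) {
  yhat_loop[, i] <- exp(X %*% B[i, ]) / rowSums(expxb)
}
```

Dividing the whole matrix by `rowSums(expxb)` works because R recycles the length-N vector down each column, so every element is divided by the sum of its own row.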


--------------------------------------------------------------------------------
/R/spending_data.R:
--------------------------------------------------------------------------------
 1 | #' Government Spending by Dutch Cities in 2005
 2 | #'
 3 | #' Data from 429 Dutch cities with governmental spending on each sub-category,
 4 | #' and city attributes. 
 5 | #'
 6 | #' @docType data
 7 | #'
 8 | #' @usage data(spending)
 9 | #' 
10 | #' @format A data frame with 429 rows and 12 columns. 
11 | #' @keywords datasets
12 | #'
13 | #' @source \url{http://fmwww.bc.edu/repec/bocode/c/citybudget.dta}
14 | #'
15 | #' @examples
16 | #' spending
17 | 
18 | "spending"
19 | 
20 | 


--------------------------------------------------------------------------------
/R/summary.R:
--------------------------------------------------------------------------------
  1 | #' Generate summary tables for fmlogit objects
  2 | #' 
  3 | #' Generate tables of coefficient estimates, partial effects, and willingness to pay from
  4 | #' fmlogit-type objects. 
  5 | #' 
  6 | #' @name summary.fmlogit
  7 | #' @aliases summary.fmlogit.margins
  8 | #' @aliases summary.fmlogit.wtp
  9 | #' 
 10 | #' @param object an object with class "fmlogit", "fmlogit.margins", or "fmlogit.wtp". 
 11 | #' @param varlist a subset of variable names to be processed. Defaults to NULL, in which case all variables
 12 | #' are processed.
 13 | #' @param sepline whether the output table uses separate lines for coefficients and standard errors. 
 14 | #' @param digits number of significant digits to display. Defaults to 3. 
 15 | #' @param add.info whether to add additional descriptive information to the output. 
 16 | #' @param list whether to output a list object, or a single data frame. 
 17 | #' @param sigcode the significance levels to be used. Has to be a three-component vector. 
 18 | #' @return Either a list (for display purposes) or a data.frame (for csv output purposes). If list output (the
 19 | #' default) is selected with add.info=T, the list contains 5 components: $estimates the estimates; $N the number of 
 20 | #' observations; $llf the log-likelihood value; $baseline the name of the baseline choice; and $sigcode the significance legend. 
 21 | #' 
 22 | #' @details This module provides summary methods for three fmlogit objects: \code{fmlogit}, \code{fmlogit.margins},
 23 | #' and \code{fmlogit.wtp}. 
 24 | #' 
 25 | #' The summary method offers several options. The user can choose a list output \code{list=T}, which is
 26 | #'  good for display and quoting purposes, or a data frame output \code{list=F}, which is good for table outputs. The user
 27 | #' can also specify whether to provide additional information other than the parameter estimates, whether to use 
 28 | #' separate lines for the estimates and the standard errors (which mimics the output style in Stata),
 29 | #'  and the significance code. 
 30 | #' 
 31 | #' @examples 
 32 | #' # generate fmlogit summary
 33 | #' #results1 = fmlogit(y,X)
 34 | #' 
 35 | #' # generate marginal effects summary
 36 | #' #effects1 = effects(results1,effect="marginal")
 37 | #' summary(effects1)
 38 | #' 
 39 | #' # generate latex style output
 40 | #' # require(xtable)
 41 | #' xtable(summary(effects1,list=F,sepline=T))
 42 | #' @rdname summary.fmlogit
 43 | #' @export summary.fmlogit
 44 | 
 45 | ############
 46 | # generate fmlogit style table
 47 | ###########
 48 | 
 49 | summary.fmlogit = function(object,varlist=NULL,sepline=F,digits=3,add.info=T,list=T,sigcode=c(0.05,0.01,0.001),
 50 |                            print=F){
 51 |   # define significance code first. 
 52 |   asterisk = function(x,k=sigcode){
 53 |     if(x>k[1]) return("")
 54 |     if(x>k[2]) return("*")
 55 |     if(x>k[3]){return("**")}else
 56 |     {return("***")}
 57 |   }
 58 |   # main text  
 59 |   # pre matters
 60 |   if(!inherits(object,"fmlogit")) stop("Expect an fmlogit object. Wrong object type given.")
 61 |   ynames = names(object[[1]]); Xnames = rownames(object[[1]][[1]])
 62 |   if(length(varlist)==0){varlist=Xnames}
 63 |   var_colNo = which(Xnames %in% varlist)
 64 |   j = object$count[3]; K = length(var_colNo)
 65 |   if(K < length(varlist)) warning("Some variables requested are not in the variable list. Those variables are omitted.")
 66 |   varlist = Xnames[var_colNo]
 67 |   # generating tables
 68 |   if(!sepline){
 69 |     store_mat = matrix(ncol=j-1,nrow=K)
 70 |     colnames(store_mat)=ynames
 71 |     rownames(store_mat)=Xnames[var_colNo]
 72 |     for(i in 1:(j-1)){
 73 |       temp_data = signif(object$estimates[[i]][var_colNo,],digits=digits)
 74 |       if(is.null(dim(temp_data))){
 75 |         store_mat[,i] = paste(temp_data[1],"(",temp_data[2],")",asterisk(temp_data[4]),sep="")
 76 |         next
 77 |       }
 78 |       store_mat[,i]=apply(temp_data, 1, function(x) paste(x[1],"(",x[2],")",asterisk(x[4]),sep=""))    
 79 |     }}else{
 80 |       store_beta = store_se = matrix(ncol=j-1,nrow=K)   
 81 |       colnames(store_beta)=ynames
 82 |       rownames(store_beta)=varlist
 83 |       for(i in 1:(j-1)){
 84 |         temp_data = signif(object$estimates[[i]][var_colNo,],digits=digits)
 85 |         if(is.null(dim(temp_data))) temp_data = t(as.matrix(temp_data)) # single variable: keep it as one row
 86 |         store_beta[,i]=apply(temp_data,1, function(x) paste(x[1],asterisk(x[4]),sep=""))
 87 |         store_se[,i]=apply(temp_data, 1, function(x) paste("(",x[2],")",sep=""))
 88 |       }
 89 |       for(i in 1:K){
 90 |         if(i==1) store_mat=matrix(ncol=j-1)
 91 |         store_mat = rbind(store_mat,store_beta[i,],store_se[i,])
 92 |       }
 93 |       store_mat=store_mat[-1,]
 94 |       rownames(store_mat) = rep(" ",length=nrow(store_mat))
 95 |       rownames(store_mat)[seq(1,K*2,2)] = varlist
 96 |     }
 97 |   # output matters
 98 |   sig.print = paste("Significance code: 0", "'***'", sigcode[3], "'**'", sigcode[2], "'*'", sigcode[1], "' '", 1)
 99 |   if(add.info){
100 |     nc = paste("N=",object$count[1],sep="")
101 |     llf = paste("log pseudo-likelihood=",round(object$likelihood,digits=2),sep="")
102 |     bl = paste("Baseline choice:", object$baseline)
103 |   }
104 |   if(list){
105 |     outlist = list(estimates=store_mat)
106 |     if(add.info){
107 |       outlist$N = nc
108 |       outlist$llf = llf
109 |       outlist$baseline = bl
110 |       outlist$sigcode = sig.print
111 |     }
112 |     if(print){print(outlist)}
113 |     return(outlist)
114 |   }else{
115 |     if(add.info){
116 |       info = matrix(ncol=j-1,nrow=4)
117 |       info[,1] = c(nc,llf,bl,sig.print)
118 |       store_mat = rbind(store_mat,info)
119 |     }
120 |     if(print){print(store_mat)}
121 |     return(as.data.frame(store_mat))
122 |   }
123 | }
124 | 
125 | ##########
126 | # summary for fmlogit.margins
127 | ##########
128 | 
129 | #' @rdname summary.fmlogit
130 | #' @export summary.fmlogit.margins
131 | 
132 | summary.fmlogit.margins = function(object,varlist=NULL,sepline=F,digits=3,add.info=T,list=T,sigcode=c(0.05,0.01,0.001),
133 |                                    print=F){
134 |   # define significance code first. 
135 |   asterisk = function(x,k=sigcode){
136 |     if(x>k[1]) return("")
137 |     if(x>k[2]) return("*")
138 |     if(x>k[3]){return("**")}else
139 |     {return("***")}
140 |   }
141 |   # main text  
142 |   if(!inherits(object,"fmlogit.margins")) stop("Expect an fmlogit.margins object. Wrong object type given.")
143 |   ynames = rownames(object[[1]]); Xnames = colnames(object[[1]])
144 |   if(length(varlist)==0) varlist=Xnames
145 |   var_colNo = which(Xnames %in% varlist)
146 |   j = length(ynames); K = length(var_colNo)
147 |   if(K < length(varlist)) warning("Some variables requested are not in the variable list. Those variables are omitted.")
148 |   varlist = Xnames[var_colNo]
149 |   
150 |   # table process
151 |   if(object$R==0) sepline=FALSE
152 |   if(!sepline){
153 |     store_mat = matrix(ncol=j,nrow=K)
154 |     colnames(store_mat)=ynames
155 |     rownames(store_mat)=varlist # only the K selected variables, matching nrow=K
156 |     if(object$R>0){
157 |       for(r in seq_len(K)){ # r indexes rows of store_mat; var_colNo[r] indexes the variable
158 |         temp_data = signif(object$ztable[[var_colNo[r]]],digits=digits)
159 |         store_mat[r,]=apply(temp_data, 1, function(x) paste(x[1],"(",x[2],")",asterisk(x[4]),sep=""))
160 |       }}else{
161 |         store_mat = signif(t(object$effects)[var_colNo,,drop=FALSE],digits=digits)
162 |       }
163 |   }else{
164 |     store_beta = store_se = matrix(ncol=j,nrow=K)
165 |     colnames(store_beta)=ynames
166 |     rownames(store_beta)=varlist # only the K selected variables, matching nrow=K
167 |     for(r in seq_len(K)){ # r indexes rows; var_colNo[r] indexes the variable in ztable
168 |       temp_data = signif(object$ztable[[var_colNo[r]]],digits=digits)
169 |       store_beta[r,]=apply(temp_data,1, function(x) paste(x[1],asterisk(x[4]),sep=""))
170 |       store_se[r,]=apply(temp_data, 1, function(x) paste("(",x[2],")",sep=""))
171 |     }
172 |     for(i in 1:K){
173 |       if(i==1) store_mat=matrix(ncol=j)
174 |       store_mat = rbind(store_mat,store_beta[i,],store_se[i,])
175 |     }
176 |     store_mat=store_mat[-1,]
177 |     rownames(store_mat) = rep("",length=nrow(store_mat))
178 |     rownames(store_mat)[seq(1,K*2,2)] = varlist
179 |   }
180 |   # output matters
181 |   sig.print = paste("Significance code: 0", "'***'", sigcode[3], "'**'", sigcode[2], "'*'", sigcode[1], "' '", 1)
182 |   if(add.info){
183 |     expl = object$expl
184 |   }
185 |   if(list){
186 |     outlist = list(estimates=store_mat)
187 |     if(add.info){
188 |       outlist$expl = expl
189 |       outlist$sigcode = sig.print
190 |     }
191 |     if(print){print(outlist)}
192 |     return(outlist)
193 |   }else{
194 |     if(add.info){
195 |       info = matrix(ncol=j,nrow=2)
196 |       info[,1] = c(expl,sig.print)
197 |       store_mat = rbind(store_mat,info)
198 |     }
199 |     if(print){print(store_mat)}
200 |     return(as.data.frame(store_mat))
201 |   }
202 | }
203 | 
204 | ############
205 | # generate willingness to pay tables
206 | ############
207 | 
208 | #' @rdname summary.fmlogit
209 | #' @export summary.fmlogit.wtp
210 | 
211 | summary.fmlogit.wtp = function(object,varlist=NULL,sepline=F,digits=3,sigcode=c(0.05,0.01,0.001),
212 |                                print=F){
213 |   # define significance code first. 
214 |   asterisk = function(x,k=sigcode){
215 |     if(x>k[1]) return("")
216 |     if(x>k[2]) return("*")
217 |     if(x>k[3]){return("**")}else
218 |     {return("***")}
219 |   }
220 |   # main text  
221 |   if(!inherits(object,"fmlogit.wtp")) stop("Expect an fmlogit.wtp object. Wrong object type given.")
222 |   if(colnames(object$wtp)[1]!="estimate") return(object$wtp) # no need to summary. 
223 |   Xnames = rownames(object$wtp)
224 |   if(length(varlist)==0) varlist=Xnames
225 |   var_colNo = which(Xnames %in% varlist)
226 |   K = length(var_colNo)
227 |   if(K < length(varlist)) warning("Some variables requested are not in the variable list. Those variables are omitted.")
228 |   varlist = Xnames[var_colNo]
229 |   sig.print = paste("Significance code: 0", "'***'", sigcode[3], "'**'", sigcode[2], "'*'", sigcode[1], "' '", 1)
230 |   if(!sepline){
231 |   # table process
232 |   store_mat = apply(signif(object$wtp[var_colNo,],digits=digits), 1, function(x) paste(x[1],"(",x[2],")",asterisk(x[4]),sep=""))
233 |   store_mat = as.data.frame(store_mat)
234 |   colnames(store_mat)=NULL
235 |   # output matters
236 |   }else{          
237 |     store_beta=apply(signif(object$wtp[var_colNo,],digits=digits),1, function(x) paste(x[1],asterisk(x[4]),sep=""))
238 |     store_se=apply(signif(object$wtp[var_colNo,],digits=digits), 1, function(x) paste("(",x[2],")",sep=""))
239 |     for(i in 1:K){
240 |       if(i==1) store_mat=vector()
241 |       store_mat = c(store_mat,store_beta[i],store_se[i])
242 |     }
243 |     names(store_mat) = rep("",length=length(store_mat))
244 |     names(store_mat)[seq(1,K*2,2)] = varlist
245 |   }
246 |   if(print){print(store_mat);print(sig.print)}    
247 |   return(store_mat)
248 | }
249 | 


--------------------------------------------------------------------------------
/R/wtp.R:
--------------------------------------------------------------------------------
 1 | #' "Willingness to Pay" for fmlogit models
 2 | #' 
 3 | #' Calculates the willingness to pay for fractional multinomial logit models.  
 4 | #' 
 5 | #' @param object An "fmlogit.margins" object.
 6 | #' @param wtp.vec A 1*J vector that contains the willingness to pay for each choice j. 
 7 | #' @param varlist A string vector which provides the name of variables to calculate 
 8 | #' the wtp. If missing, all variables in object will be calculated. 
 9 | #' @return A matrix containing the estimates, standard error, z-stats, and p-value. 
10 | #' @details This function calculates the aggregate effect of a variable on the 
 11 | #' "willingness to pay" by linearly multiplying the average partial effect with ex-ante (arbitrary) 
12 | #' willingness to pay numbers associated with each choice. 
13 | #' 
14 | #' Suppose there are three choices A,B,C, each with a willingness to pay (or cost, profit, budget),
15 | #' of 100, 200, and 300. The discrete effect of variable X on A,B and C are 0.5, 0.5, and -1, with 
16 | #' standard error 0.2, 0.3 and 0.5. The aggregated discrete effect of X on the total willingness 
17 | #' to pay (or cost), is thus 100*0.5 + 200*0.5 + 300*(-1) = -150. And the standard error can be also
18 | #' calculated to be 162.8, assuming that the standard error is independent. 
19 | #' A simple z-test is provided to test whether the aggregate effect is different from zero. 
20 | #' 
 21 | #' Note that if the input fmlogit.margins object has no standard error computation, then no standard error is computed for the willingness to pay, and only the point estimates are returned.
22 | #' @examples
23 | #' #results1 = fmlogit(y,X)
24 | #' #effects1 = effects(results1,effect="marginal",se=T)
25 | #' # assume that the WTP = 1,2,3,...J for each choice j. 
26 | #' wtp(effects1,seq(1:nrow(effects1$effects))) 
27 | #' @export wtp
28 | 
29 | wtp = function(object,wtp.vec,varlist=NULL,indv.obs=F){
30 |   j=nrow(object$effects); k=ncol(object$effects)
31 |   Xnames = colnames(object$effects); ynames = rownames(object$effects)
32 |   if(length(varlist)==0){
33 |     varlist=Xnames
34 |     var_colNo = c(1:k)
35 |     k = length(var_colNo)
36 |   }else{
37 |     var_colNo = which(varlist %in% Xnames)
38 |     k = length(var_colNo)
39 |   }
40 |   if(length(wtp.vec)!=j) stop("Wrong length of wtp.vec. Please check specification again.")
41 |   # wtp calcs
42 |   betamat = object$effects[,varlist]; semat = object$se[,varlist]
43 |   wtp_mean = wtp.vec %*% betamat
44 | 
45 |   if(object$R>0){ # prevent a bug that does not output R in the effects.fmlogit module. 
46 |   wtp_se = sqrt(wtp.vec^2 %*% semat^2)
47 |   # output tables
48 |   tabout = matrix(ncol=4,nrow=k)
49 |   tabout[,1] = wtp_mean
50 |   tabout[,2] = wtp_se
51 |   tabout[,3] = tabout[,1] / tabout[,2]
52 |   tabout[,4] = 2*(1-pnorm(abs(tabout[,3])))
53 |   colnames(tabout) = c("estimate","std","z","p-value")
54 |   rownames(tabout) = varlist
55 |   }else tabout = wtp_mean
56 |   if(indv.obs){
57 |     wtp_mat = matrix(ncol = k, nrow=nrow(object$marg.list[[1]]))
58 |     for(c in var_colNo){
59 |       c1 = which(var_colNo == c)
60 |       wtp_mat[,c1] = as.matrix(object$marg.list[[c1]]) %*% wtp.vec
61 |     }
62 |     colnames(wtp_mat) = varlist
63 |   }
64 |   # output list
65 |   outlist = list()
66 |   outlist$wtp = tabout
67 |   if(indv.obs) outlist$wtp.obs = wtp_mat
68 |   return(structure(outlist,class="fmlogit.wtp"))
69 | }
70 |   


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
 1 | ---
 2 | title: "The fmlogit Package: A Light Document"
 3 | author: "Xinde James Ji" 
 4 | date: "Oct 10, 2016"
 5 | output: pdf_document
 6 | ---
 7 | 
 8 | This document provides an overview of the fmlogit package in R. Updates will be published at [my github site](https://github.com/f1kidd/fmlogit). Any suggestions or concerns are welcome. 
 9 | 
10 | # What is the fractional multinomial logit model?
 11 | Fractional multinomial logit models estimate fractional responses by modelling the dependent variables as fractions using multinomial logits. It is the preferred model when the true data generation process is indeed fractions of multiple choices. Fractional responses arise naturally in various settings. For example, a municipality allocates its budget across multiple departments, and we are interested in the proportion of the budget that each department receives. Or, there are multiple candidates in a presidential election, and we are interested in explaining the percentage of support for each candidate in each state. 
12 | 
13 | The model is distinct in that: 1) each of the responses lies between 0 and 1, and 2) the share of all responses adds up to one. The fmlogit model uses these two distinct factors, and models them explicitly. 
14 | 
15 | # How to install fmlogit
16 | Type the following code into your R console:
17 | ```R
18 | require(devtools)
19 | install_github("f1kidd/fmlogit")
20 | library(fmlogit)
21 | ```
22 | 
23 | # Why do we need fmlogit in R? 
24 | Don't we already have an fmlogit module in Stata? Yes, and you are very welcome to [check that out](http://maartenbuis.nl/software/fmlogit.html) if you can afford a Stata license. 
25 | 
26 | However, this package offers several advantages over Stata's fmlogit module, namely:
27 | ### 1. Integration with the R Platform
 28 | Implementing the model in R offers the opportunity to integrate the whole empirical process within a free, open-source platform. With the help of numerous R packages, everything can be accomplished in a single environment, including data processing, estimation, post-estimation, and final manuscript writing. This is a huge advantage over Stata. 
29 | 
30 | ### 2. Post-estimation improvements
 31 | The marginal effect estimation in this package is much faster than in Stata's fmlogit package. Here the user can specify which variable(s) and which type of partial effect to calculate. This results in a huge gain in running time for the post-estimation commands. 
32 | 
33 | Also, this package allows hypothesis testing for marginal and discrete effects while Stata does not. The standard error is calculated via Krinsky-Robb method, which allows empirical hypothesis testing without knowing the underlying distribution of the effects. 
34 | 
35 | ### 3. Estimation flexibility
 36 | This package allows factor variable inputs, and automatically transforms them into dummy variables. This is not (explicitly) allowed in Stata. 
37 | 
38 | ### 4. Extensions
39 | This package also allows the user to easily calculate and infer the "average aggregate partial effect" given a user-specified weight scheme. This is done through linearly aggregating the attribute of each choice (e.g., expected profit/utility of each choice) with the calculated APE. 
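The linear aggregation described above can be sketched in a few lines of base R. This is only an illustration, not the package's `wtp()` implementation: the numbers are the three-choice example from the `wtp` documentation (values 100, 200, 300; effects 0.5, 0.5, -1; standard errors 0.2, 0.3, 0.5), the variable names are made up for this sketch, and the standard error assumes the effects are independent across choices.

```R
# Hypothetical inputs, taken from the worked example in the wtp documentation.
wtp_vec <- c(A = 100, B = 200, C = 300)    # value attached to each choice
effect  <- c(A = 0.5, B = 0.5, C = -1)     # partial effect of X on each share
se      <- c(A = 0.2, B = 0.3, C = 0.5)    # standard error of each effect

# Aggregate effect: weighted sum of the partial effects.
agg_effect <- sum(wtp_vec * effect)

# Standard error under the independence assumption.
agg_se <- sqrt(sum(wtp_vec^2 * se^2))

# Simple two-sided z-test of "aggregate effect = 0".
z <- agg_effect / agg_se
p_value <- 2 * (1 - pnorm(abs(z)))

round(c(estimate = agg_effect, std = agg_se, z = z, p = p_value), 3)
```

The aggregate effect is -150 with a standard error of about 162.8, so in this example the effect is not statistically distinguishable from zero.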
40 | 
41 | # How does the estimator work?
 42 | The estimator used here is an extension of that used in [Papke and Wooldridge (1996)](http://onlinelibrary.wiley.com.ezproxy.lib.utexas.edu/doi/10.1002/(SICI)1099-1255(199611)11:6%3C619::AID-JAE418%3E3.0.CO;2-1/abstract). There, they proposed a quasi-maximum likelihood estimator (QMLE) for fractional response variables. Their approach applies to binary response variables; here we extend it to multinomial response variables with a fractional structure. 
43 | 
44 | The steps involved in calculating the estimator are as follows: 
45 | ## Step 1. Construct the multinomial logit likelihood
46 | This step is straightforward. A simple multinomial logit transformation will do the job. For detailed derivations and formula, please see the technical document [here](https://github.com/f1kidd/fmlogit/blob/master/Documentation/fmlogit_docs.pdf) where I explain the econometric steps in detail.  
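As a rough sketch of what this step involves (not the package's internal code), the quasi-log-likelihood can be written in base R as below. The helper names `softmax_rows` and `fm_loglik` are hypothetical, the baseline is assumed to be the first choice with its coefficients fixed at zero, and the simulated data are noiseless shares used purely for illustration.

```R
# Row-wise softmax: maps an N x J matrix of linear indices to shares summing to one.
softmax_rows <- function(eta) {
  eta <- eta - apply(eta, 1, max)   # subtract row maxima for numerical stability
  expeta <- exp(eta)
  expeta / rowSums(expeta)
}

# Quasi-log-likelihood of a fractional multinomial logit.
# beta: K x (J-1) coefficient matrix; the baseline (first) choice is fixed at zero.
fm_loglik <- function(beta, y, X) {
  eta <- cbind(0, X %*% beta)       # N x J linear indices
  p <- softmax_rows(eta)            # predicted shares
  sum(y * log(p))                   # observed shares weight the log predicted shares
}

set.seed(1)
N <- 50
X <- cbind(1, rnorm(N))                         # intercept plus one covariate
beta_true <- matrix(c(0.5, -1, 0.2, 0.8), 2)    # K = 2, J - 1 = 2
y <- softmax_rows(cbind(0, X %*% beta_true))    # noiseless shares, illustration only

ll_true <- fm_loglik(beta_true, y, X)
ll_zero <- fm_loglik(matrix(0, 2, 2), y, X)
c(at_truth = ll_true, at_zero = ll_zero)
```

Because the shares were generated from `beta_true`, the likelihood at `beta_true` exceeds the likelihood at zero; Step 2 searches for the coefficient matrix that maximizes this function.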
47 | ## Step 2. Maximize the sum of the log likelihood function
48 | Generally, R is not the most efficient scientific computing machine that exists, and that is the tradeoff we have to face. Here, the program offers several maximization methods provided in the *maxLik* package. The recommended algorithm is either conjugate gradients (CG), or Berndt-Hall-Hall-Hausman (BHHH). For a large dataset it may take a while (running for one hour is entirely possible, so don't terminate the program prematurely).
 49 | ## Step 3. Calculate robust standard error
 50 | Here the program follows Papke & Wooldridge (1996), and constructs the robust standard error estimator for the parameters. The program also offers a simple z-test for parameters based on the standard error. 
51 | 
52 | # How do the post-estimation commands work?
53 | Calculating partial effects for limited dependent variables can be tricky, and this is especially true for multinomial logit models. The coefficients obtained in the regression model represent the logit-transformed odds ratio for that specific choice against the baseline choice. This is not intuitive at all in terms of actual effects on that specific choice. The bottom line is, the coefficients and standard errors obtained in the original models are not the basis for evaluating hypotheses. 
54 | 
55 | ##  Marginal and discrete effects
 56 | Instead, researchers need to compute what are called the "partial effects", as we usually do in linear models. However, the partial effect in logit-type models is tricky because the effects are heterogeneous across observations. In other words, each unique observation has a different set of partial effects.
57 |  
 58 | We provide two types of partial effects: marginal and discrete. The marginal effect represents how a unit change in variable k changes the value of choice j, i.e., $\frac{\partial y_j}{\partial x_k}$. The discrete effect represents how a discrete change in variable k, usually from the minimum to the maximum, changes the value of choice j; for a binary variable this is $\hat{y}_{j,x_k=1}-\hat{y}_{j,x_k=0}$. 
59 | 
 60 | Typically, two aggregation measures are used to summarize partial effects: one is the partial effect at the mean (PEM), which is the partial effect of variable k when all other variables are set at their means; the other is the partial effect of the average (PEA), which is the average of the partial effects over all observations. Both options can be specified. 
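To make these definitions concrete, here is a minimal base-R sketch, separate from the package's `effects()` method. It relies on the standard multinomial logit derivative $\frac{\partial y_j}{\partial x_k} = y_j(\beta_{kj} - \sum_m y_m \beta_{km})$; the coefficient matrix `B`, the simulated `X`, and the helper names are all assumptions made for illustration.

```R
# Row-wise softmax: linear indices to predicted shares.
softmax_rows <- function(eta) {
  eta <- eta - apply(eta, 1, max)
  expeta <- exp(eta)
  expeta / rowSums(expeta)
}

# Marginal effect of covariate k on all J shares, one row per observation.
# B is K x J with the baseline (first) column fixed at zero.
marg_effect <- function(X, B, k) {
  p <- softmax_rows(X %*% B)
  bbar <- as.vector(p %*% B[k, ])   # share-weighted average coefficient
  p * (matrix(B[k, ], nrow(p), ncol(p), byrow = TRUE) - bbar)
}

set.seed(2)
X <- cbind(1, rnorm(100), rbinom(100, 1, 0.4))   # intercept, continuous, dummy
B <- cbind(0, c(0.3, 0.8, -0.5), c(-0.2, -0.4, 1.0))

# Effect at the mean: evaluate at the column means of X.
me_at_mean <- marg_effect(matrix(colMeans(X), 1), B, k = 2)

# Average across observations: compute per observation, then average.
me_average <- colMeans(marg_effect(X, B, k = 2))

# Discrete effect of the dummy (column 3): move it from its min to its max.
X_hi <- X; X_hi[, 3] <- max(X[, 3])
X_lo <- X; X_lo[, 3] <- min(X[, 3])
de_average <- colMeans(softmax_rows(X_hi %*% B) - softmax_rows(X_lo %*% B))

rbind(at_mean = as.vector(me_at_mean), average = me_average, discrete = de_average)
```

Each row of the result sums to zero across choices, since a gain in one share must be offset by losses in the others.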
61 | 
62 | A more inclusive approach will be to plot the marginal effect of interest across all individuals. This is not provided in the function, but can certainly be implemented in future developments. Another possibility will be to calculate the so-called locally averaged treatment effect (LATE), where the effect of interest will be centered around a certain range of values.  
63 | 
64 | ## Standard Errors for APEs
 65 | Here we adopt the simulation-based Krinsky-Robb method to compute standard errors for marginal and discrete effects, as opposed to the empirical delta method used in Stata. These two methods should be asymptotically equivalent. However, using Krinsky-Robb allows us to perform hypothesis testing on the effects, which the delta method cannot accomplish.
66 | 
67 | Hypothesis testing is done using the standard normal z-test by treating the APE estimates as normally distributed. The approach is very simple: say we test $H_0: D_j=0$. We just need to compare 0 with our N draws, and see if it falls out of the 95% mass. This is a major advantage we provide here compared to Stata's fmlogit module. 
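The Krinsky-Robb idea can be sketched in base R as follows. This is a toy illustration rather than the package's code: `beta_hat`, `V`, and the effect function `g` are invented for the example (a three-choice model with one covariate evaluated at x = 1), and the draws are generated with a Cholesky factor instead of a dedicated multivariate normal sampler.

```R
set.seed(42)

# Assumed point estimates and covariance matrix of the two non-baseline coefficients.
beta_hat <- c(0.4, -0.7)
V <- matrix(c(0.04, 0.01,
              0.01, 0.09), 2)

# Marginal effect of x on the share of the second choice, evaluated at x = 1.
g <- function(b) {
  p <- exp(c(0, b))
  p <- p / sum(p)                     # shares of the three choices
  p[2] * (b[1] - sum(p[-1] * b))      # y_j * (beta_j - sum_m y_m * beta_m)
}

# Draw R coefficient vectors from N(beta_hat, V): Z %*% chol(V) has covariance V.
R <- 1000
draws <- matrix(rnorm(R * 2), R, 2) %*% chol(V) +
  matrix(beta_hat, R, 2, byrow = TRUE)

# Evaluate the effect at every draw; the spread of the results is the standard error.
sims <- apply(draws, 1, g)
est <- g(beta_hat)
se <- sd(sims)

# z-test treating the effect as approximately normal.
z <- est / se
p_value <- 2 * (1 - pnorm(abs(z)))
c(estimate = est, std = se, z = z, p = p_value)
```

A larger `R` gives a more stable standard error at the cost of running time, which is why restricting the calculation to the variables of interest helps.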
68 | 
69 | # Practical Concerns
70 | One of the concerns for the package is the computation speed of the estimation process. The maximization process can take somewhere from 20 seconds to 1 hour, depending on how large the dataset is. This is certainly a limitation. This is the inherent drawback of R's computation speed, and I can do nothing about that. 
71 | 
 72 | However, the loss in estimation speed will certainly be compensated in the post-estimation process. Stata's dfmlogit command is very slow (it takes somewhere between 5 and 60 minutes), while here the effects calculation takes seconds to complete. 
73 | 
74 | # References
75 | Papke, L. E. and Wooldridge, J. M. (1996), Econometric methods for fractional response variables with an application to 401(k) plan participation rates. J. Appl. Econ., 11: 619-632.
76 | 
77 | Wulff, Jesper N. "Interpreting Results From the Multinomial Logit Model Demonstrated by Foreign Market Entry." Organizational Research Methods (2014): 1094428114560024.
78 | 
79 | Mullahy, J., 2015. Multivariate fractional regression estimation of econometric share models. Journal of Econometric Methods 4(1), 71-100.
80 | 
81 | 
82 | 
83 | 
84 | 
85 | 
86 | 


--------------------------------------------------------------------------------
/data/spending.rda:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/f1kidd/fmlogit/62ff38adefc95b1bfc8324c095a7a3f50775607d/data/spending.rda


--------------------------------------------------------------------------------
/man/effects.fmlogit.Rd:
--------------------------------------------------------------------------------
 1 | % Generated by roxygen2: do not edit by hand
 2 | % Please edit documentation in R/marginals.R
 3 | \name{effects.fmlogit}
 4 | \alias{effects.fmlogit}
 5 | \title{Average Partial Effects of the Covariates}
 6 | \usage{
 7 | \method{effects}{fmlogit}(object, effect = c("marginal", "discrete"),
 8 |   marg.type = "atmean", se = F, varlist = NULL, at = NULL,
 9 |   R = 1000)
10 | }
11 | \arguments{
12 | \item{object}{An "fmlogit" object.}
13 | 
14 | \item{effect}{Can be "marginal", for marginal effect; or "discrete", for discrete changes from
15 | the min to the max.}
16 | 
17 | \item{marg.type}{Type of marginal or discrete effects to be computed. Default to "atmean", the effect at 
18 | the mean of all covariates. Also take "aveacr", the averaged effects across all observations. See details.}
19 | 
20 | \item{se}{Whether to calculate standard errors for those margins. See details.}
21 | 
22 | \item{varlist}{A string vector which provides the name of variables to calculate 
23 | the marginal effect. If missing, all variables except the constant will be calculated. 
24 | Use "constant" if wish to compute the marginal effect of constant.}
25 | 
26 | \item{at}{Specify values of the X-matrix at which the partial effect will be retrieved. Expect a vector input
27 | of length K-1. Only supported for \code{marg.type="atmean"}. See \code{predict.fmlogit(newdata)}.}
28 | 
29 | \item{R}{Number of times to sample for the Krinsky-Robb standard error. Default to 1000.}
30 | 
31 | \item{marg.list}{A list of matrices storing the marginal effect matrix for each observation. Exists 
32 | only if marg.type="aveacr".}
33 | }
34 | \value{
35 | The function returns an object of class "fmlogit.margins". It contains the following component:
36 | 
37 | \code{effects} A matrix of calculated effects.
38 | 
39 | \code{se} A matrix of standard errors corresponding to the effects. Shows up if se=T for the 
40 | input parameter.
41 | 
42 | \code{ztable} A list of matrices containing effects, standard errors, z-stats and p-values.
43 | 
44 | \code{R} Number of simulation times for Krinsky-Robb standard error calculation. Null if se=F.
45 | 
46 | \code{expl} String message explaining the effects calculated.
47 | }
48 | \description{
49 | Calculate average partial effects (APE) of independent variable from a fractional multinomial logit model.
50 | }
51 | \details{
52 | This module calculates the average partial effects (APEs) from a fractional multinomial logit model.
53 | Partial effects are the counterpart of the marginal effects in a linear model setting. In linear models, 
54 | usually the parameter estimate itself represents marginal effect (if the variable in question is continuous). 
 55 | In logit models, however, the parameter estimates at hand are the effects on the log-ratio between the choice variable
56 | and the baseline variable. This function is intended to extract APEs from the 
57 | coefficient estimates completed from the fractional multinomial logit models.
58 | 
59 | This function allows for two types of partial effects: marginal effect, and discrete effect.
60 | Marginal effect represents how a unit change in one continuous variable x may influence the choice variable y. 
 61 | The estimate of marginal effect is very straightforward. However, special care is needed when averaging 
62 | the marginal effect across observations to acquire APE. One approach is to use the estimate of the marginal effect while setting
63 | other explanatory variables at the mean. We call this marginal effect at the mean (MEM), which corresponds
64 | to the option \code{marg.type=atmean}. Another approach is to take the average of marginal effects for each
65 | individual. We call this average marginal effect (AME), which corresponds to the option \code{marg.type=
66 | aveacr}. 
67 | 
 68 | The discrete effect represents how a discrete change in one specific x, discrete or continuous, influences the choice variable y. 
69 | This is more useful for categorical variables, as calculating the "marginal effect" makes little sense
70 | for them. In this function, we calculate the discrete effect by changing the explanatory variable from 
71 | its minimum to its maximum. For a binary variable, this is just the difference between 0 and 1. Similar 
72 | to the marginal effect case, we also have discrete effect at the mean (DEM), corresponding to \code{marg.type= atmean}
 73 |  and average discrete effect (ADE), corresponding to \code{marg.type=aveacr}.
74 | 
75 | Standard error is provided for the effects by using Krinsky-Robb(KR) method. Krinsky-Robb is a simulation-based
76 | method that calculates the empirical value of a function given a known distribution of its variables. Here 
77 | we provide Krinsky-Robb standard error for MEM and DEM, and the user can specify how many times of 
78 | simulation \code{R} should the Krinsky-Robb algorithm run. 
79 | 
80 | The user can also specify a subset of explanatory variables when calculating effects. This is done through
81 | specifying string vectors containing the column names of the explanatory variables to \code{varlist}. As the
82 | KR standard error can be time-consuming, it is advised to calculate only the variables in need.
83 | }
84 | \examples{
85 | #results1 = fmlogit(y,X)
86 | effects(results1,effect="marginal")
 87 | effects(results1,effect="discrete",varlist = colnames(results1$X)[c(1,3)])
88 | }
89 | 


--------------------------------------------------------------------------------
/man/fitted.fmlogit.Rd:
--------------------------------------------------------------------------------
 1 | % Generated by roxygen2: do not edit by hand
 2 | % Please edit documentation in R/predictions.R
 3 | \name{fitted.fmlogit}
 4 | \alias{fitted.fmlogit}
 5 | \alias{residuals.fmlogit}
 6 | \alias{predict.fmlogit}
 7 | \title{Extract fitted values, residuals, and predictions}
 8 | \usage{
 9 | \method{fitted}{fmlogit}(object)
10 | 
11 | \method{residuals}{fmlogit}(object)
12 | 
13 | \method{predict}{fmlogit}(object, newdata = NULL, newbeta = NULL)
14 | }
15 | \arguments{
16 | \item{object}{A "fmlogit" object.}
17 | 
18 | \item{newdata}{A new X matrix to perform model prediction. If Null, default to the original dataset. 
19 | X can be a vector with length k, or a matrix with k columns, where k is the number of explanatory 
20 | variables in the original model.}
21 | 
22 | \item{newbeta}{A new augmented matrix of coefficients that can be used to predict outcome variables. 
23 | Feeds into object$coefficient, which contains the baseline coefficient. Useful for constructing
24 | confidence intervals via simulation or bootstrapping.}
25 | }
26 | \description{
27 | Extract fitted values, residuals, and predictions
28 | }
29 | \examples{
30 | #results1 = fmlogit(y,X)
31 | fitted(results1)
32 | residuals(results1)
33 | predict(results1)
34 | # predict using the first observation from the original dataset.
35 | predict(results1,X[1,])
36 | }
37 | 


--------------------------------------------------------------------------------
/man/fmlogit.Rd:
--------------------------------------------------------------------------------
  1 | % Generated by roxygen2: do not edit by hand
  2 | % Please edit documentation in R/fmlogit_main.R
  3 | \name{fmlogit}
  4 | \alias{fmlogit}
  5 | \title{Estimate Fractional Multinomial Logit Models}
  6 | \usage{
  7 | fmlogit(y, X, beta0 = NULL, MLEmethod = "CG", maxit = 5e+05,
  8 |   abstol = 1e-05, cluster = NULL, reps = 1000, ...)
  9 | }
 10 | \arguments{
 11 | \item{y}{the dependent variable (N*J). Can be a matrix or a named data frame.
 12 | The first column of the matrix is automatically treated as the baseline.}
 13 | 
 14 | \item{X}{independent variable (N*K). Can be a matrix or a named data frame.
 15 | If there is no intercept term in the X, then an intercept term is
 16 | automatically added.}
 17 | 
 18 | \item{beta0}{Initial value for beta used in optimization. Uses a 1*K(J-1)
 19 | vector. Default to a vector of zeros.}
 20 | 
 21 | \item{MLEmethod}{Method of optimization. Goes into
 22 | \code{maxLik(method=MLEmethod)}. Choose from "NR","BFGS","CG","BHHH","SANN",or "NM".  
 23 | Default to "CG", the conjugate gradients method. See Details.}
 24 | 
 25 | \item{maxit}{Maximum number of iteration.}
 26 | 
 27 | \item{abstol}{Tolerance.}
 28 | 
 29 | \item{cluster}{A vector of cluster to be used for clustered standard error computation. 
 30 | Default to NULL, no cluster computed.}
 31 | 
 32 | \item{reps}{Numbers of bootstrap replications to be computed for clustered standard errors.}
 33 | 
 34 | \item{...}{additional parameters that goes into \code{maxLik()}}
 35 | }
 36 | \value{
 37 | The function returns an object of class "fmlogit". Use \code{effects}, \code{predict}, 
 38 |  \code{residuals}, \code{fitted} to extract various useful features of the value returned by 
 39 | \code{fmlogit}.
 40 | 
 41 | An object of class "fmlogit" contains the following components:
 42 | 
 43 | \code{estimates}   A list of matrices containing parameter estimates,
 44 |   standard errors, and hypothesis testing results.
 45 | 
 46 | \code{baseline}    The baseline choice
 47 | 
 48 | \code{likelihood}  The likelihood value
 49 | 
 50 | \code{conv_code}   Convergence diagnostics code.
 51 | 
 52 | \code{convergence} Convergence messages.
 53 | 
 54 | \code{count}       Provides dataset information
 55 | 
 56 | \code{y}           The dependent variable data frame.
 57 | 
 58 | \code{X}           The independent variable data frame. Augmented by factor dummy transformation
 59 | , constant term added.
 60 | 
 61 | \code{rowNo}       A vector of row numbers from the original X and y that is used for estimation.
 62 | 
 63 | \code{coefficient} Matrix of estimated coefficients. Augmented with the baseline coefficient
 64 | (which is a vector of zeros).
 65 | 
 66 | \code{vcov}        A list of matrices containing the robust variance covariance matrix for each choice
 67 | variable.
 68 | 
 69 | \code{cluster}     The vector of clusters.
 70 | 
 71 | \code{reps}        Number of bootstrap replications for clustered standard error
 72 | }
 73 | \description{
 74 | Used to estimate fractional multinomial logit models using quasi-maximum
 75 | likelihood estimations following Papke and Wooldridge(1996).
 76 | }
 77 | \details{
 78 | The fractional multinomial model is the expansion of the multinomial
 79 | logit to fractional responses. Unlike standard multinomial logit models,
 80 | which only consider 0-1 responses, the fractional multinomial model considers the
 81 | case where the response variables are fractions that sum up to one. Examples
 82 | of these type of data are, percentages of budget spent in education, defense,
 83 | public health; fractions of a population that have middle school, high
 84 | school, college, or post college education, etc.
 85 | 
 86 | This function follows Papke and Wooldridge(1996)'s paper, in which they
 87 | proposed a quasi-maximum likelihood estimator for fractional response data.
 88 | The likelihood function used here is a standard multinomial likelihood
 89 | function, see \url{http://maartenbuis.nl/software/likelihoodFmlogit.pdf} for
 90 | the likelihood used here. Robust standard errors are provided following Papke
 91 | and Wooldridge(1996), in which they proposed an asymptotically consistent
 92 | estimator of variance.
 93 | 
 94 | Maximization is done by calling \code{\link{maxLik}}. maxLik is a wrapper function 
 95 | for different maximization methods in R. This includes most methods provided by \code{\link{maxLik}},  
 96 | but also other methods such as BHHH(Berndt-Hall-Hall-Hausman). 
 97 | 
 98 | MLE convergence can be a problem in R, especially if dataset is large with many explanatory variables. 
 99 | It is recommended to call CG(Conjugate Gradients) or BHHH(Berndt-Hall-Hall-Hausman).
100 | Conjugate gradients method is usually faster, but could lead to non-convergence under 
101 | certain scenarios. BHHH is slower, but has better convergence performance.
102 | }
103 | \examples{
104 | data = spending
105 | X = data[,2:5]
106 | y = data[,6:11]
107 | results1 = fmlogit(y,X)
108 | }
109 | \references{
110 | Papke, L. E. and Wooldridge, J. M. (1996), Econometric methods
111 |   for fractional response variables with an application to 401(k) plan
112 |   participation rates. J. Appl. Econ., 11: 619-632.
113 | }
114 | 


--------------------------------------------------------------------------------
/man/plot.fmlogit.Rd:
--------------------------------------------------------------------------------
 1 | % Generated by roxygen2: do not edit by hand
 2 | % Please edit documentation in R/plot_effects_1.R
 3 | \name{plot.fmlogit}
 4 | \alias{plot.fmlogit}
 5 | \title{Plot marginal or discrete effects of willingness to pay}
 6 | \usage{
 7 | \method{plot}{fmlogit}(object, wtp.vec, varlist, against = NULL,
 8 |   mfrow = NULL, t = 500, effect = c("discrete", "marginal"),
 9 |   type = "l", plot.show = T, ...)
10 | }
11 | \arguments{
12 | \item{object}{An "fmlogit" object.}
13 | 
14 | \item{varlist}{A character vector naming the variables whose effects are plotted.
15 | If missing, all variables in object are plotted.}
16 | 
17 | \item{against}{A vector with the same length as the number of observations in the model.
18 | Serves as the x-axis in the plots.}
19 | 
20 | \item{mfrow}{A numeric vector with two elements specifying the number of rows and columns in a panel.
21 | Similar to par(mfrow=c()). Defaults to NULL, in which case the program chooses a square panel.}
22 | 
23 | \item{plot.show}{If TRUE, the plot is created. Otherwise the function returns raw data that can be
24 | used to create user-specified (fancier) plots.}
25 | 
26 | \item{X}{The covariates matrix. Using element X from the fmlogit object is recommended.}
27 | 
28 | \item{y}{The dependent variable matrix. Using element y from the fmlogit object is recommended.}
29 | }
30 | \value{
31 | Panel plots of effects vs. chosen variables
32 | }
33 | \description{
34 | Plot marginal or discrete effects of willingness to pay, potentially against another variable
35 | }
36 | \details{
37 | This function provides a visualization tool for potentially heterogeneous marginal and discrete effects.
38 | It lets the user plot marginal effects to detect patterns in the effects, both on their own
39 | and against other variables. The plot also allows visualization of sub-groups in the data, which can be
40 | very useful for categorical and dummy variables.
41 | 
42 | The function takes an fmlogit.margins object, created by the effects(fmlogit) function. Note that since
43 | the plotting requires marginal effects for all observations, the object should be created by choosing
44 | \code{marg.type="aveacr"}, the average across method for effects calculation.
45 | 
46 | Additional parameters include \code{varlist}, a vector of variable names to be plotted, and \code{X}
47 |  and \code{y}, the independent and dependent variable matrices in the original regression model.
48 | 
49 |  \code{against}, \code{against.x}, and \code{against.y} allow different variables to be chosen
50 |  as the x-axis. \code{against} directly supplies the vector to be plotted against, whereas \code{against.x}
51 |  and \code{against.y} supply variable names in the original dataset. Note that the user has to provide
52 |  \code{X} or \code{y}, respectively, in order to use the column name option.
53 | 
54 |  \code{group.x} supplies the column name in the X matrix to be grouped by. The plot differentiates
55 |  the groups by color. Additionally, the user can supply a string to \code{group.algebra},
56 |  which provides an algebraic condition that is evaluated on the group vector. For example, choosing
57 |  \code{group.x = "a"} and \code{group.algebra = ">0"} creates two groups, one with X$a>0 and one with
58 |  X$a<=0.
59 | }
60 | \examples{
61 |  
62 | # Not running
63 | # results1 = fmlogit(y,X)
64 | # effect1 = effects(results1,effect="marginal",marg.type="aveacr")
65 | 
66 | # Plot only takes effects with marg.type="aveacr". 
67 | plot(effect1,X=results1$X,against.x = "popdens", group.x = "tot", group.algebra = ">3")
68 | }
69 | 


--------------------------------------------------------------------------------
/man/plot.fmlogit.margins.Rd:
--------------------------------------------------------------------------------
 1 | % Generated by roxygen2: do not edit by hand
 2 | % Please edit documentation in R/plot_effects.R
 3 | \name{plot.fmlogit.margins}
 4 | \alias{plot.fmlogit.margins}
 5 | \title{Plot marginal or discrete effects, at each observation & for each choice}
 6 | \usage{
 7 | \method{plot}{fmlogit.margins}(object, varlist = NULL, X = NULL,
 8 |   y = NULL, against = NULL, against.x = NULL, against.y = NULL,
 9 |   group.x = NULL, group.algebra = NULL, mfrow = NULL)
10 | }
11 | \arguments{
12 | \item{object}{An "fmlogit.margins" object.}
13 | 
14 | \item{varlist}{A character vector naming the variables whose effects are plotted.
15 | If missing, all variables in object are plotted.}
16 | 
17 | \item{X}{The covariates matrix. Using element X from the fmlogit object is recommended.}
18 | 
19 | \item{y}{The dependent variable matrix. Using element y from the fmlogit object is recommended.}
20 | 
21 | \item{against}{A vector with the same length as the number of observations in the model.
22 | Serves as the x-axis in the plots.}
23 | 
24 | \item{against.x}{A character string. Supplies the column name in the X matrix to plot against.}
25 | 
26 | \item{against.y}{A character string. Supplies the column name in the y matrix to plot against.}
27 | 
28 | \item{group.x}{A character string. Supplies the column name in the X matrix to group by.}
29 | 
30 | \item{mfrow}{A numeric vector with two elements specifying the number of rows and columns in a panel.
31 | Similar to par(mfrow=c()). Defaults to NULL, in which case the program chooses a square panel.}
32 | 
33 | \item{group.algebra}{A character string. Supplies an additional algebraic condition imposed on the group variable.}
34 | }
35 | \value{
36 | Panel plots of effects vs. chosen variables
37 | }
38 | \description{
39 | Plot the desired effect at each observed value for each choice
40 | }
41 | \details{
42 | This function provides a visualization tool for potentially heterogeneous marginal and discrete effects.
43 | It lets the user plot marginal effects to detect patterns in the effects, both on their own
44 | and against other variables. The plot also allows visualization of sub-groups in the data, which can be
45 | very useful for categorical and dummy variables.
46 | 
47 | The function takes an fmlogit.margins object, created by the effects(fmlogit) function. Note that since
48 | the plotting requires marginal effects for all observations, the object should be created by choosing
49 | \code{marg.type="aveacr"}, the average across method for effects calculation.
50 | 
51 | Additional parameters include \code{varlist}, a vector of variable names to be plotted, and \code{X}
52 |  and \code{y}, the independent and dependent variable matrices in the original regression model.
53 | 
54 |  \code{against}, \code{against.x}, and \code{against.y} allow different variables to be chosen
55 |  as the x-axis. \code{against} directly supplies the vector to be plotted against, whereas \code{against.x}
56 |  and \code{against.y} supply variable names in the original dataset. Note that the user has to provide
57 |  \code{X} or \code{y}, respectively, in order to use the column name option.
58 | 
59 |  \code{group.x} supplies the column name in the X matrix to be grouped by. The plot differentiates
60 |  the groups by color. Additionally, the user can supply a string to \code{group.algebra},
61 |  which provides an algebraic condition that is evaluated on the group vector. For example, choosing
62 |  \code{group.x = "a"} and \code{group.algebra = ">0"} creates two groups, one with X$a>0 and one with
63 |  X$a<=0.
64 | }
65 | \examples{
66 |  
67 | # Not running
68 | # results1 = fmlogit(y,X)
69 | # effect1 = effects(results1,effect="marginal",marg.type="aveacr")
70 | 
71 | # Plot only takes effects with marg.type="aveacr". 
72 | plot(effect1,X=results1$X,against.x = "popdens", group.x = "tot", group.algebra = ">3")
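   | 
   | # Illustration only, in base R. This is a sketch of the grouping idea from Details,
   | # not the package's internal code. tot.demo and in.group are hypothetical names.
   | tot.demo = c(1, 5, 2, 7)
   | # Paste the condition string onto the variable name and evaluate it as an expression
   | in.group = eval(parse(text = paste0("tot.demo", ">3")))
   | # Observations fall into two groups: condition FALSE and condition TRUE
   | split(tot.demo, in.group)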
73 | }
74 | 


--------------------------------------------------------------------------------
/man/spending.Rd:
--------------------------------------------------------------------------------
 1 | % Generated by roxygen2: do not edit by hand
 2 | % Please edit documentation in R/spending_data.R
 3 | \docType{data}
 4 | \name{spending}
 5 | \alias{spending}
 6 | \title{Government Spending by Dutch Cities in 2005}
 7 | \format{A data frame with 429 rows and 12 columns.}
 8 | \source{
 9 | \url{http://fmwww.bc.edu/repec/bocode/c/citybudget.dta}
10 | }
11 | \usage{
12 | data(spending)
13 | }
14 | \description{
15 | Data from 429 Dutch cities with governmental spending on each sub-category,
16 | and city attributes.
17 | }
18 | \examples{
19 | spending
20 | }
21 | \keyword{datasets}
22 | 


--------------------------------------------------------------------------------
/man/summary.fmlogit.Rd:
--------------------------------------------------------------------------------
 1 | % Generated by roxygen2: do not edit by hand
 2 | % Please edit documentation in R/summary.R
 3 | \name{summary.fmlogit}
 4 | \alias{summary.fmlogit}
 5 | \alias{summary.fmlogit.margins}
 6 | \alias{summary.fmlogit.wtp}
 7 | \title{Generate summary tables for fmlogit objects}
 8 | \usage{
 9 | \method{summary}{fmlogit}(object, varlist = NULL, sepline = F,
10 |   digits = 3, add.info = T, list = T, sigcode = c(0.05, 0.01,
11 |   0.001), print = F)
12 | 
13 | \method{summary}{fmlogit.margins}(object, varlist = NULL, sepline = F,
14 |   digits = 3, add.info = T, list = T, sigcode = c(0.05, 0.01,
15 |   0.001), print = F)
16 | 
17 | \method{summary}{fmlogit.wtp}(object, varlist = NULL, sepline = F,
18 |   digits = 3, sigcode = c(0.05, 0.01, 0.001), print = F)
19 | }
20 | \arguments{
21 | \item{object}{an object with class "fmlogit", "fmlogit.margins", or "fmlogit.wtp".}
22 | 
23 | \item{varlist}{a subset of variable names to be processed. Defaults to NULL, in which case all variables
24 | are processed.}
25 | 
26 | \item{sepline}{whether the output table uses separate lines for coefficients and standard errors.}
27 | 
28 | \item{digits}{number of significant digits to show. Defaults to 3.}
29 | 
30 | \item{add.info}{whether to add additional descriptive information to the output.}
31 | 
32 | \item{list}{whether to output a list object, or a single data frame.}
33 | 
34 | \item{sigcode}{the significance levels used for the significance codes. Must be a three-component vector.}
35 | }
36 | \value{
37 | Either a list (for display purposes) or a data.frame (for csv output purposes). If list output (the
38 | default) is selected, the list contains 4 components: $estimates, the estimates; $N, the number of
39 | observations; $llf, the value of the log-likelihood function; and $baseline, the name of the baseline choice.
40 | }
41 | \description{
42 | Generate tables of coefficient estimates, partial effects, and willingness to pay from
43 | fmlogit-type objects.
44 | }
45 | \details{
46 | This module provides summary methods for three fmlogit objects: \code{fmlogit}, \code{fmlogit.margins},
47 | and \code{fmlogit.wtp}.
48 | 
49 | The summary method offers several options. The user can choose a list output (\code{list=T}), which is
50 |  good for display and quoting purposes, or a data frame output (\code{list=F}), which is good for table outputs. The user
51 | can also specify whether to provide additional information other than the parameter estimates, whether to use
52 | separate lines for the estimates and the standard errors (which mimics the output style in Stata),
53 |  and which significance codes to use.
54 | }
55 | \examples{
56 | # generate fmlogit summary
57 | #results1 = fmlogit(y,X)
58 | 
59 | # generate marginal effects summary
60 | #effects1 = effects(results1,effect="marginal")
61 | summary(effects1)
62 | 
63 | # generate latex style output
64 | # require(xtable)
65 | xtable(summary(effects1,list=F,sepline=T))
66 | }
67 | 


--------------------------------------------------------------------------------
/man/wtp.Rd:
--------------------------------------------------------------------------------
 1 | % Generated by roxygen2: do not edit by hand
 2 | % Please edit documentation in R/wtp.R
 3 | \name{wtp}
 4 | \alias{wtp}
 5 | \title{"Willingness to Pay" for fmlogit models}
 6 | \usage{
 7 | wtp(object, wtp.vec, varlist = NULL, indv.obs = F)
 8 | }
 9 | \arguments{
10 | \item{object}{An "fmlogit.margins" object.}
11 | 
12 | \item{wtp.vec}{A 1*J vector that contains the willingness to pay for each choice j.}
13 | 
14 | \item{varlist}{A character vector naming the variables for which the wtp is calculated.
15 | If missing, the wtp is calculated for all variables in object.}
16 | }
17 | \value{
18 | A matrix containing the estimates, standard errors, z-statistics, and p-values.
19 | }
20 | \description{
21 | Calculates the willingness to pay for fractional multinomial logit models.
22 | }
23 | \details{
24 | This function calculates the aggregate effect of a variable on the
25 | "willingness to pay" by multiplying the average partial effects by ex-ante (arbitrary)
26 | willingness to pay numbers associated with each choice and summing the products.
27 | 
28 | Suppose there are three choices A, B, and C, with willingness to pay (or cost, profit, budget)
29 | of 100, 200, and 300. The discrete effects of variable X on A, B, and C are 0.5, 0.5, and -1, with
30 | standard errors 0.2, 0.3, and 0.5. The aggregated discrete effect of X on the total willingness
31 | to pay (or cost) is thus 100*0.5 + 200*0.5 + 300*(-1) = -150. Its standard error is
32 | sqrt((100*0.2)^2 + (200*0.3)^2 + (300*0.5)^2) = 162.8, assuming that the effects are independent.
33 | A simple z-test is provided to test whether the aggregate effect is different from zero.
34 | 
35 | Note that if the input fmlogit.margins object has no standard errors, then no standard error is computed for the willingness to pay.
36 | }
37 | \examples{
38 | #results1 = fmlogit(y,X)
39 | #effects1 = effects(results1,effect="marginal",se=T)
40 | # assume that the WTP = 1,2,3,...J for each choice j. 
41 | wtp(effects1,seq_len(nrow(effects1$effects)))
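   | 
   | # A sketch of the arithmetic described in Details, using base R only.
   | # It does not call wtp(); all variable names below are hypothetical.
   | wtp.values = c(A = 100, B = 200, C = 300)    # willingness to pay for each choice
   | effects.X = c(A = 0.5, B = 0.5, C = -1)      # discrete effects of X on each choice
   | se.X = c(A = 0.2, B = 0.3, C = 0.5)          # standard errors of those effects
   | agg = sum(wtp.values * effects.X)            # aggregate effect: -150
   | agg.se = sqrt(sum((wtp.values * se.X)^2))    # about 162.8, assuming independent effects
   | z = agg / agg.se                             # z-statistic for H0: aggregate effect = 0
   | p = 2 * pnorm(-abs(z))                       # two-sided p-value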
42 | }
43 | 


--------------------------------------------------------------------------------