├── ELMNL_model
├── EMNL_model
├── data
│   ├── X_TEST.pkl
│   ├── X_TRAIN.pkl
│   ├── y_TEST.pkl
│   ├── y_TRAIN.pkl
│   └── Q_test.csv
├── imgs
│   ├── conv_p.png
│   └── MNL_as_CNN.png
├── README.md
├── models_and_utils
│   ├── models_utils.py
│   ├── post_estimation_stats.py
│   └── models.py
└── MNL as a CNN.ipynb

/ELMNL_model:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ioaar/Interpretable-Embeddings-MNL/HEAD/ELMNL_model
--------------------------------------------------------------------------------
/EMNL_model:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ioaar/Interpretable-Embeddings-MNL/HEAD/EMNL_model
--------------------------------------------------------------------------------
/data/X_TEST.pkl:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ioaar/Interpretable-Embeddings-MNL/HEAD/data/X_TEST.pkl
--------------------------------------------------------------------------------
/data/X_TRAIN.pkl:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ioaar/Interpretable-Embeddings-MNL/HEAD/data/X_TRAIN.pkl
--------------------------------------------------------------------------------
/data/y_TEST.pkl:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ioaar/Interpretable-Embeddings-MNL/HEAD/data/y_TEST.pkl
--------------------------------------------------------------------------------
/data/y_TRAIN.pkl:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ioaar/Interpretable-Embeddings-MNL/HEAD/data/y_TRAIN.pkl
--------------------------------------------------------------------------------
/imgs/conv_p.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ioaar/Interpretable-Embeddings-MNL/HEAD/imgs/conv_p.png
--------------------------------------------------------------------------------
/imgs/MNL_as_CNN.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ioaar/Interpretable-Embeddings-MNL/HEAD/imgs/MNL_as_CNN.png
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Interpretable-Embeddings-MNL
2 | This repository contains the source code used in the paper:
3 | 
4 | Combining Discrete Choice Models and Neural Networks through Embeddings: Formulation, Interpretability and Performance.
5 | 
6 | ## Abstract
7 | This study proposes a novel approach that combines theory-driven and data-driven choice models using Artificial Neural Networks (ANNs). In particular, we use continuous vector representations, called embeddings, for encoding categorical or discrete explanatory variables, with a special focus on interpretability and model transparency. Although embedding representations within the logit framework have been conceptualized by [Pereira (2019)](https://arxiv.org/abs/1909.00154), their dimensions do not have an absolute, definitive meaning and hence offer limited behavioral insight. The novelty of our work lies in enforcing interpretability on the embedding vectors by formally associating each of their dimensions with a choice alternative. 
Thus, our approach brings benefits much beyond a simple parsimonious representation improvement over dummy encoding, as it provides behaviorally meaningful outputs that can be used in travel demand analysis and policy decisions. Additionally, in contrast to previously suggested ANN-based Discrete Choice Models (DCMs) that either sacrifice interpretability for performance or are only partially interpretable, our models preserve interpretability of the utility coefficients for all the input variables despite being based on ANN principles. The proposed models were tested on two real world datasets and evaluated against benchmark and baseline models that use dummy-encoding. The results of the experiments indicate that our models deliver state-of-the-art predictive performance, outperforming existing ANN-based models while drastically reducing the number of required network parameters. 8 | -------------------------------------------------------------------------------- /models_and_utils/models_utils.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | from keras.optimizers import Adam 3 | from keras.callbacks import EarlyStopping 4 | from keras.models import load_model 5 | 6 | 7 | def cats2ints(Q_df_train): 8 | 9 | UNIQUE_CATS= sorted(list(set(Q_df_train.values.reshape(-1)))) 10 | cat2index={} 11 | 12 | for i in range(len(UNIQUE_CATS)): 13 | 14 | cat2index[UNIQUE_CATS[i]]=i 15 | 16 | return cat2index 17 | 18 | 19 | 20 | def cats2ints_transform(Q_df, cat2index): 21 | 22 | Q=[] 23 | 24 | for obs in Q_df.values: 25 | 26 | input_i=[cat2index[cat] for cat in obs] 27 | Q.append(input_i) 28 | 29 | 30 | return np.array(Q) 31 | 32 | 33 | 34 | def E_MNL_train(X_train, Q_train, y_train, model, 35 | N_EPOCHS= 10, VERBOSE= 1, 36 | save_model=1, model_filename= str()): 37 | 38 | NUM_CHOICES= X_train.shape[2] 39 | NUM_CONT_VARS= X_train.shape[1] 40 | NUM_EMB_VARS= Q_train.shape[-1] 41 | 42 | UNIQUE_CATS= sorted(list(set(Q_train.reshape(-1)))) 43 | NUM_UNIQUE_CATS= len(UNIQUE_CATS) 44 | 45 | mnl_model = model(NUM_CONT_VARS, NUM_EMB_VARS, 46 | NUM_CHOICES, NUM_UNIQUE_CATS) 47 | 48 | optimizer = Adam(clipnorm= 50.) 49 | mnl_model.compile(optimizer= optimizer, metrics= ["accuracy"], loss= 'categorical_crossentropy') 50 | 51 | Callback = EarlyStopping(monitor= 'loss', min_delta= 0, patience= 20) 52 | 53 | if VERBOSE: 54 | 55 | print(mnl_model.summary()) 56 | 57 | mnl_model.fit([X_train, Q_train], y_train, epochs= N_EPOCHS, 58 | steps_per_epoch= 50, shuffle= 'batch', 59 | verbose= VERBOSE, callbacks=[Callback]) 60 | 61 | 62 | pred_prob_train= mnl_model.predict(x= {'Features': X_train, 'input_categories': Q_train}) 63 | 64 | 65 | LL_train= sum([np.log(x) for x in np.multiply(pred_prob_train.reshape(-1), y_train.reshape(-1)) if x!= 0.0]) 66 | print('LL train:', LL_train) 67 | 68 | else: 69 | 70 | mnl_model.fit([X_train, Q_train], y_train, epochs= N_EPOCHS, 71 | steps_per_epoch= 50, shuffle= 'batch', 72 | verbose= VERBOSE, callbacks=[Callback]) 73 | 74 | if save_model: 75 | 76 | if not model_filename: 77 | 78 | print("Model file name not provided. 
Keras model is saved as 'temp_model'") 79 | 80 | model_filename= 'temp_model' 81 | 82 | mnl_model.save(model_filename) 83 | 84 | return mnl_model 85 | 86 | 87 | 88 | def EL_MNL_train(X_train, Q_train, y_train, model, 89 | n_extra_emb_dims= 2, N_NODES= 15, 90 | N_EPOCHS= 1, VERBOSE= 1, 91 | save_model= 1, model_filename= str()): 92 | 93 | NUM_CHOICES= X_train.shape[2] 94 | NUM_CONT_VARS= X_train.shape[1] 95 | NUM_EMB_VARS= Q_train.shape[-1] 96 | 97 | UNIQUE_CATS= sorted(list(set(Q_train.reshape(-1)))) 98 | NUM_UNIQUE_CATS= len(UNIQUE_CATS) 99 | 100 | XTRA_EMB_DIMS= n_extra_emb_dims 101 | 102 | mnl_model = model(NUM_CONT_VARS, NUM_EMB_VARS, 103 | NUM_CHOICES, NUM_UNIQUE_CATS, 104 | XTRA_EMB_DIMS, N_NODES) 105 | 106 | 107 | optimizer = Adam(clipnorm= 50.) 108 | mnl_model.compile(optimizer= optimizer, metrics= ["accuracy"], loss= 'categorical_crossentropy') 109 | 110 | Callback = EarlyStopping(monitor= 'loss', min_delta= 0, patience= 20) 111 | 112 | if VERBOSE: 113 | 114 | print(mnl_model.summary()) 115 | 116 | mnl_model.fit([X_train, Q_train], y_train, epochs= N_EPOCHS, 117 | steps_per_epoch= 50, shuffle= 'batch', 118 | verbose= VERBOSE, callbacks=[Callback]) 119 | 120 | pred_prob_train= mnl_model.predict(x= {'Features': X_train, 'input_categories': Q_train}) 121 | 122 | 123 | LL_train= sum([np.log(x) for x in np.multiply(pred_prob_train.reshape(-1), y_train.reshape(-1)) if x!= 0.0]) 124 | print('LL train:', LL_train) 125 | 126 | else: 127 | 128 | mnl_model.fit([X_train, Q_train], y_train, epochs= N_EPOCHS, 129 | steps_per_epoch= 50, shuffle= 'batch', 130 | verbose= VERBOSE, callbacks=[Callback]) 131 | 132 | if save_model: 133 | 134 | if not model_filename: 135 | 136 | print("Model file name not provided. Keras model is saved as 'temp_model'") 137 | 138 | model_filename= 'temp_model' 139 | 140 | mnl_model.save(model_filename) 141 | 142 | return mnl_model 143 | 144 | 145 | 146 | def model_load_and_predict(X, Q, y, model_filename= str()): 147 | 148 | mnl_model= load_model(model_filename) 149 | 150 | pred_prob= mnl_model.predict(x={'Features': X, 'input_categories': Q}) 151 | 152 | LL= sum([np.log(x) for x in np.multiply(pred_prob.reshape(-1), y.reshape(-1)) if x!=0.0]) 153 | 154 | return LL 155 | 156 | 157 | 158 | def model_predict(X, Q, y, trained_model): 159 | 160 | pred_prob= trained_model.predict(x={'Features': X, 'input_categories': Q}) 161 | 162 | LL= sum([np.log(x) for x in np.multiply(pred_prob.reshape(-1), y.reshape(-1)) if x!=0.0]) 163 | 164 | return LL 165 | -------------------------------------------------------------------------------- /models_and_utils/post_estimation_stats.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | from scipy.stats import norm 3 | from keras import backend as K 4 | import pandas as pd 5 | 6 | 7 | 8 | def create_index(alfabet): # alphabet-->number of unique categories 9 | 10 | """ Maps categories (strings) to integers and creates 11 | a dictionary with look up index. 
""" 12 | 13 | index2alfa={} 14 | alfa2index={} 15 | 16 | for i in range(len(alfabet)): 17 | index2alfa[i]=alfabet[i] 18 | alfa2index[alfabet[i]]=i 19 | return index2alfa, alfa2index 20 | 21 | 22 | 23 | def get_betas_and_embeddings(trained_model, Q_df_train): 24 | 25 | 26 | UNIQUE_CATS= sorted(list(set(Q_df_train.values.reshape(-1)))) 27 | 28 | DICT={} 29 | 30 | DICT['index2alfa_from'], DICT['alfa2index_from']=create_index(UNIQUE_CATS) 31 | 32 | DICT['index2alfa_from'], DICT['alfa2index_from']=create_index(UNIQUE_CATS) 33 | betas_embs = trained_model.get_layer('Utilities_embs').get_weights()[0].reshape(-1) 34 | betas_exog = trained_model.get_layer('Utilities_exog').get_weights()[0].reshape(-1) 35 | embeddings= trained_model.get_layer('embeddings').get_weights()[0] 36 | 37 | DICT['embeddings']= embeddings 38 | DICT['betas_embs']= betas_embs 39 | DICT['betas_exog']= betas_exog 40 | 41 | return DICT 42 | 43 | 44 | 45 | def get_inverse_Hessian(model, model_inputs, labels, layer_name='Utilities'): 46 | 47 | """ This function was copied from: https://github.com/BSifringer/EnhancedDCM 48 | and was modified to handle singular matrix cases.""" 49 | 50 | data_size = len(model_inputs[0]) 51 | 52 | # Get layer and gradient w.r.t. loss 53 | beta_layer = model.get_layer(layer_name) 54 | beta_gradient = K.gradients(model.total_loss, beta_layer.weights[0])[0] 55 | 56 | # Get second order derivative operators (linewise of Hessian) 57 | Hessian_lines_op = {} 58 | for i in range(len(beta_layer.get_weights()[0])): 59 | Hessian_lines_op[i] = K.gradients(beta_gradient[i], beta_layer.weights[0]) 60 | 61 | # Define Functions that get operator values given inputed data 62 | input_tensors= model.inputs + model.sample_weights + model.targets + [K.learning_phase()] 63 | get_Hess_funcs = {} 64 | for i in range(len(Hessian_lines_op)): 65 | get_Hess_funcs[i] = K.function(inputs=input_tensors, outputs=Hessian_lines_op[i]) 66 | 67 | # Line by line Hessian average multiplied by data length (due to automatic normalization) 68 | Hessian=[] 69 | func_inputs=[*[inputs for inputs in model_inputs], np.ones(data_size), labels, 0] 70 | for j in range(len(Hessian_lines_op)): 71 | Hessian.append((np.array(get_Hess_funcs[j](func_inputs)))) 72 | Hessian = np.squeeze(Hessian)*data_size 73 | 74 | # The inverse Hessian: 75 | try: 76 | invHess = np.linalg.inv(Hessian) 77 | except np.linalg.LinAlgError: 78 | print(LinAlgError('Singular matrix')) 79 | return np.nan 80 | 81 | return invHess 82 | 83 | 84 | 85 | def get_stds(model, model_inputs, labels, layer_name='Utilities'): 86 | 87 | """ Gets the diagonal of the inverse Hessian, square rooted 88 | This function was copied from: https://github.com/BSifringer/EnhancedDCM 89 | and was modified to handle singular matrix cases.""" 90 | 91 | inv_Hess = get_inverse_Hessian(model, model_inputs, labels, layer_name) 92 | 93 | if isinstance(inv_Hess, float): 94 | 95 | return np.nan 96 | 97 | else: 98 | 99 | stds = [inv_Hess[i][i]**0.5 for i in range(inv_Hess.shape[0])] 100 | 101 | return np.array(stds).flatten() 102 | 103 | 104 | 105 | def model_summary(trained_model, X_train, Q_train, y_train, 106 | X_vars_names=[], Q_vars_names=[]): 107 | 108 | 109 | emb_betas_stds= get_stds(trained_model, [X_train, Q_train], y_train, layer_name='Utilities_embs') 110 | exog_betas_stds= get_stds(trained_model, [X_train, Q_train], y_train, layer_name='Utilities_exog') 111 | 112 | betas_embs= trained_model.get_layer('Utilities_embs').get_weights()[0].reshape(-1) 113 | betas_exog= 
trained_model.get_layer('Utilities_exog').get_weights()[0].reshape(-1) 114 | 115 | if not isinstance(emb_betas_stds, float) and not isinstance(exog_betas_stds, float): 116 | 117 | z_embs= betas_embs/emb_betas_stds 118 | 119 | p_embs = (1-norm.cdf(abs(z_embs)))*2 120 | 121 | z_exog= betas_exog/exog_betas_stds 122 | 123 | p_exog = (1-norm.cdf(abs(z_exog)))*2 124 | 125 | stats_exog=np.array(list(zip(X_vars_names, betas_exog, exog_betas_stds, z_exog, p_exog))) 126 | 127 | stats_embs=np.array(list(zip(Q_vars_names, betas_embs,emb_betas_stds,z_embs, p_embs))) 128 | 129 | stats_all=np.vstack([stats_exog,stats_embs]) 130 | 131 | df_stats=pd.DataFrame(index=[i[0] for i in stats_all], 132 | data=np.array([[float(i[1]) for i in stats_all],[float(i[2]) for i in stats_all], 133 | [float(i[3]) for i in stats_all], 134 | [np.round(float(i[4]),4) for i in stats_all]]).T, 135 | columns=['Betas','St errors', 't-stat','p-value']) 136 | 137 | return df_stats 138 | 139 | else: 140 | 141 | return np.nan 142 | -------------------------------------------------------------------------------- /MNL as a CNN.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Multinomial choice model as a NN:\n", 8 | "\n", 9 | "Given a choice set $C=\{1,2,...,J\}$ with $J$ alternatives, we consider a multinomial choice model, where $X=\{ x_{1}, x_{2}, ..., x_{K}\}$ are the explanatory variables representing observed attributes of the choice alternatives and the individual’s socio-demographic characteristics.\n", 10 | "\n", 11 | "The utility that individual $n$ associates with alternative $i$ is formally given by:\n", 12 | "
\n", 13 | "\n", 14 | "$U_{i,n}= V_{i,n} + \epsilon_{i,n}, \quad \forall{i}\in{C} \quad (1)$, where $\epsilon_{i,n}$ is independently and identically distributed Type I Extreme Value.\n", 15 | "\n", 16 | "\n", 17 | "
\n", 18 | "Assuming that the systematic part of the utility is linear-in-parameters and considering a single vector of coefficients that applies to all the utility functions, $V_{i,n}$ can be described by the following equation:\n", 19 | "
\n", 20 | "\n", 21 | "$V_{i,n}=BX_{i,n}, \quad \forall{i}\in{C} \quad (2)$, where $B=\{ \beta_{1}, \beta_{2}, ..., \beta_{K}\}$ are the preference parameters associated with the explanatory variables $X_{i,n}$ of alternative $i$ for individual $n$.\n", 22 | "
\n", 23 | "\n", 24 | "\n", 25 | "The current study adopts the implementation of an MNL as a NN using a simple 2D-CNN architecture, as suggested by [Sifringer et al. (2020)](https://www.researchgate.net/publication/344428513_Enhancing_discrete_choice_models_with_representation_learning). Although CNNs are traditionally used to analyse image and signal data with complex architectures that typically include non-linear activation functions and multiple channels and convolution layers, their weight-sharing architecture conveniently allows us to use them in a simplified form to retrieve the MNL formulation as defined in $(2)$. \n", 26 | "\n", 27 | "\n", 28 | "Given a 2-dimensional input space $X$ of shape $(n_h, n_w)$ and a convolutional filter $k$ consisting of an array of trainable weights of shape $(k_h, k_w)$, we consider a CNN with a single convolutional layer $L$. The CNN maps $X$ to an output space $v$ by sliding $k$ across the input $X$ and applying the dot product between $k$ and each region of $X$ (plus a bias term $\alpha$), yielding a single scalar value each time. Thus, the shape $(v_h, v_w)$ of the output space $v$ is determined by the shapes of $X$ and $k$ according to the following formula: \n", 29 | "
\n", 30 | "\n", 31 | "\n", 32 | "$(v_h, v_w)= (\frac{(n_{h}-k_{h}+s_{h})}{s_{h}}, \frac{(n_{w}-k_{w}+s_{w})}{s_{w}})$,\n", 33 | "\n", 34 | "\n", 35 | "\n", 36 | "where $s_h$ and $s_w$ are the number of rows and columns of $X$ traversed per slide of $k$, also known as the stride $s$ of shape $(s_h,s_w)$.\n", 37 | "\n", 38 | "\n", 39 | "The value of a neuron $v_{i} \in{v}$, stored in the layer $L+1$ of the CNN that follows the convolution, is given by the following equation:\n", 40 | "\n", 41 | "\n", 42 | " \n", 43 | "$v_{i}^{(L+1)}=g\left( x_{i}^{(L)} k^{(L)}+\alpha_{i}^{(L)}\right)$ $(3)$, \n", 44 | "\n", 45 | "\n", 46 | " \n", 47 | "where $g$ is an activation function (usually non-linear), $x_{i}^{(L)}$ is the region of the input $x$ to which the convolution is applied to produce $v_{i}^{(L+1)}$, and $\alpha_{i}^{(L)}$ is the corresponding bias term.\n", 48 | "\n", 49 | "In order to retrieve the MNL formulation as defined in $(2)$, we exclude the bias term and set $g$ to be the identity function ($g(x) = x$). Additionally, we set the input space $X$ to have a shape of $(J,K)$, i.e. *(n of CHOICES, n of exogenous variables)*, and the kernel $k$ and stride $s$ to have a shape of $(1,K)$. As a result, the shape of the output space $v$ will be $(v_h, v_w)= (J, 1)$, while the value of $v_{i}$ according to $(3)$ will be:\n", 50 | "\n", 51 | "$v_{i}^{(L+1)}= x_{i}^{(L)} k^{(L)}$ $(4)$, which is equivalent to the formulation of the utility functions $V_n = \{V_{1,n},..., V_{J,n}\}$ as defined in $(2)$. A graphical representation of the convolution process that is used to produce $V_n$ is presented in the figure below:\n", 52 | "\n", 53 | "\n", 54 | "![convolution](imgs/conv_p.png)\n", 55 | "\n", 56 | "After the convolution takes place, the output $v$, which is stored in layer $L+1$ and represents the utilities $V_n$, is connected to the final activation layer consisting of $J$ neurons, which allows the CNN to generate probability distributions over the $J$ different choice alternatives using the softmax activation function $\sigma$, such that:\n", 57 | "\n", 58 | "$\left(P_n\right)_{i}= \left(\boldsymbol{\sigma}\left(\mathbf{v}_{n}\right)\right)_{i}=\frac{e^{v_{i,n}}}{\sum_{j=1}^{J} e^{v_{j,n}}}$\n", 59 | " \n", 60 | "which is equivalent to the probability of individual $n$ selecting choice alternative $i$ within the MNL framework under the standard assumptions. As is usually the case when the output layer activation function of a NN is softmax, cross entropy is used as the loss function to optimize the model's parameters, i.e. the weights of $k$, during training through backpropagation. \n", 61 | " \n", 62 | " As noted by [Sifringer et al. (2020)](https://www.researchgate.net/publication/344428513_Enhancing_discrete_choice_models_with_representation_learning), minimizing the cross entropy loss is equivalent to maximizing the log-likelihood function, and thus allows us to derive\n", 63 | "the parameters’ Hessian matrix for the CNN and compute useful post-estimation indicators for the model, such as the standard errors and confidence intervals of the parameters. The architecture of the MNL implemented as a CNN is shown in the figure below.\n", 64 | "\n", 65 | "![mnl](imgs/MNL_as_CNN.png) \n" 66 | ] 67 | }, 68 | { 69 | "cell_type": "markdown", 70 | "metadata": {}, 71 | "source": [ 72 | "The Keras implementation of an MNL as a CNN (with linear parameters) is given below:\n", 73 | "\n", 74 | "(based on the implementation by [Sifringer et al. 
(2020)](https://www.researchgate.net/publication/344428513_Enhancing_discrete_choice_models_with_representation_learning) in https://github.com/BSifringer/EnhancedDCM)" 75 | ] 76 | }, 77 | { 78 | "cell_type": "code", 79 | "execution_count": 2, 80 | "metadata": {}, 81 | "outputs": [ 82 | { 83 | "name": "stdout", 84 | "output_type": "stream", 85 | "text": [ 86 | "_________________________________________________________________\n", 87 | "Layer (type) Output Shape Param # \n", 88 | "=================================================================\n", 89 | "Features (InputLayer) (None, 10, 3, 1) 0 \n", 90 | "_________________________________________________________________\n", 91 | "Utilities (Conv2D) (None, 1, 3, 1) 10 \n", 92 | "_________________________________________________________________\n", 93 | "Flatten_Dim (Reshape) (None, 3) 0 \n", 94 | "_________________________________________________________________\n", 95 | "Choice (Activation) (None, 3) 0 \n", 96 | "=================================================================\n", 97 | "Total params: 10\n", 98 | "Trainable params: 10\n", 99 | "Non-trainable params: 0\n", 100 | "_________________________________________________________________\n", 101 | "None\n" 102 | ] 103 | }, 104 | { 105 | "data": { 106 | "text/plain": [ 107 | "" 108 | ] 109 | }, 110 | "execution_count": 2, 111 | "metadata": {}, 112 | "output_type": "execute_result" 113 | } 114 | ], 115 | "source": [ 116 | "from keras.models import Model\n", 117 | "from keras.layers import Input, Conv2D, Reshape, Activation\n", 118 | "\n", 119 | "def MNL(vars_num, choices_num, logits_activation= 'softmax'):\n", 120 | " \n", 121 | "\n", 122 | " main_input= Input((vars_num, choices_num, 1), name= 'Features')\n", 123 | " \n", 124 | " utilities = Conv2D(filters= 1, kernel_size= [vars_num,1], strides= (1,1), \n", 125 | " padding='valid', name= 'Utilities',\n", 126 | " use_bias= False, trainable= True)(main_input)\n", 127 | "\n", 128 | " utilitiesR = Reshape([choices_num], name= 'Flatten_Dim')(utilities)\n", 129 | " logits = Activation(logits_activation, name= 'Choice')(utilitiesR)\n", 130 | " \n", 131 | " model = Model(inputs= main_input, outputs= logits, name= 'Choice')\n", 132 | " print(model.summary())\n", 133 | " \n", 134 | " return model\n", 135 | "\n", 136 | "MNL(10,3)" 137 | ] 138 | } 139 | ], 140 | "metadata": { 141 | "kernelspec": { 142 | "display_name": "Python 3", 143 | "language": "python", 144 | "name": "python3" 145 | }, 146 | "language_info": { 147 | "codemirror_mode": { 148 | "name": "ipython", 149 | "version": 3 150 | }, 151 | "file_extension": ".py", 152 | "mimetype": "text/x-python", 153 | "name": "python", 154 | "nbconvert_exporter": "python", 155 | "pygments_lexer": "ipython3", 156 | "version": "3.7.7" 157 | } 158 | }, 159 | "nbformat": 4, 160 | "nbformat_minor": 4 161 | } 162 | -------------------------------------------------------------------------------- /models_and_utils/models.py: -------------------------------------------------------------------------------- 1 | from keras.models import Model 2 | from keras.layers.embeddings import Embedding 3 | from keras.layers import Input, Dense, Activation, Dropout, Flatten, concatenate, Lambda, Concatenate 4 | from keras.layers import Conv2D, Add, Reshape 5 | from keras import regularizers 6 | import tensorflow as tf 7 | #from keras.utils import plot_model 8 | 9 | 10 | 11 | def E_MNL(cont_vars_num, emb_vars_num, choices_num, 12 | unique_cats_num, pos_constraint= True, 13 | logits_activation= 'softmax'): 14 | 15 | """ E-MNL: 
Multinomial Logit model as a CNN 16 | with interpretable embeddings """ 17 | 18 | emb_size= choices_num 19 | 20 | main_input= Input((cont_vars_num, choices_num, 1), name= 'Features') 21 | emb_input= Input(shape= (emb_vars_num,), name= 'input_categories') 22 | 23 | hidden= Embedding(output_dim= emb_size, name= 'embeddings', 24 | embeddings_regularizer= regularizers.l2(0.01), 25 | input_dim= unique_cats_num, trainable= True)(emb_input) 26 | 27 | emb_dropout= Dropout(0.2, name= 'dropout_layer')(hidden) 28 | emb_final= Reshape([emb_vars_num, choices_num,1], name= 'reshape_embs')(emb_dropout) 29 | 30 | if pos_constraint==True: 31 | 32 | utilities1= Conv2D(filters= 1, kernel_size= [emb_vars_num,1], 33 | kernel_constraint= tf.keras.constraints.NonNeg(), 34 | strides= (1,1), padding= 'valid', name= 'Utilities_embs', 35 | use_bias= False, trainable= True)(emb_final) 36 | 37 | utilities2= Conv2D(filters= 1, kernel_size= [cont_vars_num, 1], 38 | strides= (1,1), padding= 'valid', name= 'Utilities_exog', 39 | use_bias= False, trainable= True)(main_input) 40 | 41 | utilities= Add(name= 'add_Utilities')([utilities1, utilities2]) 42 | 43 | else: 44 | 45 | final_data= concatenate([emb_final]+[main_input], name= 'concat_embs_and_exogenous', axis= 1) 46 | 47 | 48 | utilities= Conv2D(filters= 1, kernel_size= [cont_vars_num + (emb_vars_num),1], 49 | strides= (1,1), padding= 'valid', name= 'Utilities', 50 | use_bias= False, trainable= True)(final_data) 51 | 52 | utilitiesR= Reshape([choices_num], name= 'Flatten_Dim')(utilities) 53 | 54 | logits= Activation(logits_activation, name= 'Choice')(utilitiesR) 55 | model= Model(inputs= [main_input,emb_input], outputs= logits, name= 'Choice') 56 | 57 | return model 58 | 59 | 60 | 61 | def E_BL(cont_vars_num, emb_vars_num, 62 | unique_cats_num, pos_constraint= True, 63 | logits_activation= 'softmax'): 64 | 65 | """ E-BL: Binary Logit model as a CNN 66 | with interpretable embeddings """ 67 | 68 | choices_num= 2 69 | emb_size= choices_num -1 70 | 71 | main_input= Input((cont_vars_num, choices_num, 1), name= 'Features') 72 | emb_input= Input(shape= (emb_vars_num,), name= 'input_categories') 73 | 74 | 75 | emb1 = Embedding(output_dim= emb_size, name= 'embeddings', 76 | embeddings_regularizer= regularizers.l2(0.01), 77 | input_dim= unique_cats_num, trainable= True)(emb_input) 78 | 79 | emb_dropout= Dropout(0.2, name= 'dropout_layer')(emb1) 80 | 81 | # imposing equal and opposite embedding values on the 2nd embedding dimension 82 | emb2= Lambda(lambda x: x * (-1), name= "opposite")(emb1) 83 | 84 | hidden=Concatenate(name= 'Concat')([emb1, emb2]) 85 | 86 | emb_final= Reshape([emb_vars_num, choices_num, 1], name='reshape_embs')(hidden) 87 | 88 | if pos_constraint==True: 89 | 90 | utilities1= Conv2D(filters= 1, kernel_size= [emb_vars_num, 1], 91 | kernel_constraint= tf.keras.constraints.NonNeg(), 92 | strides= (1,1), padding= 'valid', name= 'Utilities_embs', 93 | use_bias= False, trainable= True)(emb_final) 94 | 95 | utilities2= Conv2D(filters= 1, kernel_size= [cont_vars_num, 1], 96 | strides= (1,1), padding= 'valid', name= 'Utilities_exog', 97 | use_bias= False, trainable= True)(main_input) 98 | 99 | utilities= Add(name= 'add_Utilities')([utilities1, utilities2]) 100 | 101 | else: 102 | 103 | final_data= concatenate([emb_final]+[main_input], name= 'concat_embs_and_exogenous', axis= 1) 104 | 105 | 106 | utilities= Conv2D(filters= 1, kernel_size= [cont_vars_num + (emb_vars_num),1], 107 | strides= (1,1), padding= 'valid', name= 'Utilities', 108 | use_bias= False, trainable= 
True)(final_data) 109 | 110 | 111 | utilitiesR= Reshape([choices_num], name= 'Flatten_Dim')(utilities) 112 | 113 | logits= Activation(logits_activation, name= 'Choice')(utilitiesR) 114 | model = Model(inputs= [main_input,emb_input], outputs= logits, name= 'Choice') 115 | 116 | 117 | return model 118 | 119 | 120 | 121 | def EL_MNL(cont_vars_num, emb_vars_num, choices_num, 122 | unique_cats_num, extra_emb_dims, n_nodes, 123 | pos_constraint=True, logits_activation = 'softmax'): 124 | 125 | """ EL-MNL: Multinomial Logit model as a CNN 126 | with interpretable embeddings 127 | plus representation learning term R 128 | according to (Sifringer et al. 2020)""" 129 | 130 | emb_size= choices_num + extra_emb_dims 131 | main_input= Input((cont_vars_num, choices_num, 1), name= 'Features') 132 | emb_input= Input(shape= (emb_vars_num,), name= 'input_categories') 133 | 134 | hidden= Embedding(output_dim= emb_size, name= 'embeddings', 135 | embeddings_regularizer= regularizers.l2(0.01), 136 | input_dim= unique_cats_num, trainable= True)(emb_input) 137 | 138 | emb_dropout= Dropout(0.2, name= 'dropout_layer')(hidden) 139 | 140 | 141 | emb= Lambda(lambda z: z[:,:,:choices_num], name= 'get_emb_utilities')(emb_dropout) 142 | emb_extra= Lambda(lambda z: z[:,:,choices_num:], name= 'get_extra_dims')(emb_dropout) 143 | emb_extra= Reshape([emb_vars_num*(emb_size-choices_num),1,1], name= 'reshape_extra')(emb_extra) 144 | 145 | dense= Conv2D(filters= n_nodes, kernel_size= [emb_vars_num*(emb_size-choices_num), 1], 146 | activation='relu', padding='valid', name= 'Dense_NN_per_frame')(emb_extra) 147 | 148 | new_feature= Dense(units= choices_num, name= 'Output_new_feature')(dense) 149 | 150 | new_featureR= Reshape([choices_num], name= 'Remove_Dim')(new_feature) 151 | 152 | emb_final= Reshape([emb_vars_num,choices_num,1], name= 'reshape_embs')(emb) 153 | 154 | if pos_constraint==True: 155 | 156 | utilities1= Conv2D(filters= 1, kernel_size= [emb_vars_num,1], 157 | kernel_constraint= tf.keras.constraints.NonNeg(), 158 | strides= (1,1), padding= 'valid', name= 'Utilities_embs', 159 | use_bias= False, trainable= True)(emb_final) 160 | 161 | utilities2 = Conv2D(filters= 1, kernel_size= [cont_vars_num, 1], 162 | strides= (1,1), padding= 'valid', name= 'Utilities_exog', 163 | use_bias= False, trainable= True)(main_input) 164 | 165 | utilities= Add(name='add_Utilities')([utilities1, utilities2]) 166 | 167 | else: 168 | 169 | final_data= concatenate([emb_final] + [main_input], name= 'concat_embs_and_exogenous', axis=1) 170 | 171 | 172 | utilities= Conv2D(filters= 1, kernel_size= [cont_vars_num + (emb_vars_num),1], 173 | strides= (1,1), padding= 'valid', name= 'Utilities', 174 | use_bias= False, trainable= True)(final_data) 175 | 176 | 177 | utilitiesR= Reshape([choices_num], name= 'Flatten_Dim')(utilities) 178 | final_utilities= Add(name= 'New_Utility_functions')([utilitiesR, new_featureR]) 179 | 180 | logits= Activation(logits_activation, name= 'Choice')(final_utilities) 181 | model= Model(inputs= [main_input, emb_input], outputs=logits, name= 'Choice') 182 | 183 | 184 | return model 185 | 186 | 187 | 188 | def binary_EL_MNL(cont_vars_num, emb_vars_num, 189 | unique_cats_num, emb_size, n_nodes, 190 | pos_constraint=True, logits_activation = 'softmax'): 191 | 192 | """ EL-BL: Binary Logit model as a CNN 193 | with interpretable embeddings 194 | plus representation learning term R 195 | according to (Sifringer et al. 
2020) """ 196 | 197 | 198 | choices_num= 2 199 | main_input= Input((cont_vars_num, choices_num, 1), name= 'Features') 200 | emb_input= Input(shape= (emb_vars_num,), name= 'input_categories') 201 | 202 | 203 | hidden= Embedding(output_dim= emb_size, name= "embeddings", 204 | embeddings_regularizer= regularizers.l2(0.01), 205 | input_dim= unique_cats_num, trainable= True)(emb_input) 206 | 207 | emb_dropout= Dropout(0.2, name= 'dropout_layer')(hidden) 208 | 209 | emb1= Lambda(lambda z: z[:,:,:1], name= 'get_emb_utilities')(emb_dropout) 210 | emb_extra= Lambda(lambda z: z[:,:,1:], name= 'get_extra_dims')(emb_dropout) 211 | emb_extra= Reshape([emb_vars_num*(emb_size-1),1,1], name='reshape_extra')(emb_extra) 212 | 213 | dense = Conv2D(filters= n_nodes, kernel_size= [emb_vars_num*(emb_size-1), 1], activation= 'relu', 214 | padding= 'valid', name= 'Dense_NN_per_frame')(emb_extra) 215 | 216 | new_feature = Dense(units= choices_num, name= "Output_new_feature")(dense) 217 | new_featureR = Reshape([choices_num], name= 'Remove_Dim')(new_feature) 218 | 219 | # imposing equal and opposite embedding values on the 2nd embedding dimension 220 | emb2= Lambda(lambda x: x * (-1), name= "opposite")(emb1) 221 | 222 | emb= Concatenate(name= 'Concat')([emb1, emb2]) 223 | 224 | emb_final= Reshape([emb_vars_num,choices_num,1], name= 'reshape_embs')(emb) 225 | 226 | if pos_constraint==True: 227 | 228 | utilities1= Conv2D(filters= 1, kernel_size= [emb_vars_num,1], kernel_constraint= tf.keras.constraints.NonNeg(), 229 | strides= (1,1), padding= 'valid', name= 'Utilities_embs', use_bias= False, trainable= True)(emb_final) 230 | 231 | utilities2= Conv2D(filters= 1, kernel_size= [cont_vars_num,1], 232 | strides= (1,1), padding= 'valid', name= 'Utilities_exog', use_bias= False, trainable= True)(main_input) 233 | 234 | utilities= Add(name= 'add_Utilities')([utilities1, utilities2]) 235 | 236 | else: 237 | 238 | final_data= concatenate([emb_final]+[main_input], name= 'concat_embs_and_exogenous', axis=1) 239 | 240 | 241 | utilities= Conv2D(filters= 1, kernel_size= [cont_vars_num + (emb_vars_num),1], 242 | strides= (1,1), padding= 'valid', name= 'Utilities', 243 | use_bias=False, trainable= True)(final_data) 244 | 245 | utilitiesR= Reshape([choices_num], name= 'Flatten_Dim')(utilities) 246 | final_utilities= Add(name= 'New_Utility_functions')([utilitiesR, new_featureR]) 247 | 248 | logits= Activation(logits_activation, name= 'Choice')(final_utilities) 249 | model= Model(inputs= [main_input,emb_input], outputs= logits, name= 'Choice') 250 | 251 | 252 | return model 253 | 254 | -------------------------------------------------------------------------------- /data/Q_test.csv: -------------------------------------------------------------------------------- 1 | PURPOSE,FIRST,TICKET,WHO,LUGGAGE,AGE,MALE,INCOME,GA,ORIGIN,DEST,SM_SEATS 2 | Leisure,no_1stClass,2way_1/2,self,NO_luggage,24
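
For reference, a minimal end-to-end usage sketch of the repository's own functions is given below. It is a sketch only: it assumes the `data/*.pkl` files hold pickled arrays with the shapes expected by the models (exogenous inputs of shape `(n_obs, n_cont_vars, n_choices, 1)`, one-hot labels), uses `Q_test.csv` only as a stand-in for the categorical training frame, and all file paths and placeholder variable names other than the repository's functions are illustrative.

import pickle
import sys

import pandas as pd

sys.path.append('models_and_utils')  # make the repository scripts importable
from models import E_MNL
from models_utils import cats2ints, cats2ints_transform, E_MNL_train, model_predict
from post_estimation_stats import model_summary

# Exogenous inputs: shape (n_obs, n_cont_vars, n_choices, 1); labels are one-hot encoded.
with open('data/X_TRAIN.pkl', 'rb') as f:
    X_train = pickle.load(f)
with open('data/y_TRAIN.pkl', 'rb') as f:
    y_train = pickle.load(f)

# Categorical variables to be embedded (one column per variable, string categories).
Q_df_train = pd.read_csv('data/Q_test.csv')            # placeholder: use the training split here
cat2index = cats2ints(Q_df_train)                       # map category strings to integer ids
Q_train = cats2ints_transform(Q_df_train, cat2index)    # shape (n_obs, n_emb_vars)

# Build and train the interpretable-embeddings MNL (E-MNL) formulated as a CNN.
trained = E_MNL_train(X_train, Q_train, y_train, model=E_MNL,
                      N_EPOCHS=10, VERBOSE=1,
                      save_model=1, model_filename='EMNL_model')

# Post-estimation: training log-likelihood and a table of betas, std errors and p-values.
LL_train = model_predict(X_train, Q_train, y_train, trained)
stats = model_summary(trained, X_train, Q_train, y_train,
                      X_vars_names=['x_%d' % i for i in range(X_train.shape[1])],
                      Q_vars_names=list(Q_df_train.columns))
print(LL_train)
print(stats)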