├── README.md
├── conv_net_classes.py
├── conv_net_sentence.py
├── process_data.py
├── rt-polarity.neg
└── rt-polarity.pos

/README.md:
--------------------------------------------------------------------------------
## Convolutional Neural Networks for Sentence Classification
Code for the paper [Convolutional Neural Networks for Sentence Classification](http://arxiv.org/abs/1408.5882) (EMNLP 2014).

Runs the model on Pang and Lee's movie review dataset (MR in the paper).
Please cite the original paper when using the data.

### Requirements
Code is written in Python (2.7) and requires Theano (0.7).

Using the pre-trained `word2vec` vectors also requires downloading the binary file from
https://code.google.com/p/word2vec/


### Data Preprocessing
To process the raw data, run

```
python process_data.py path
```

where `path` points to the word2vec binary file (i.e. the `GoogleNews-vectors-negative300.bin` file).
This will create a pickle object called `mr.p` in the same folder, which contains the dataset
in the right format.

Note: This will create the dataset with different fold assignments than those used in the paper.
You should still get a CV score of >81% with the CNN-nonstatic model, though.
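
To sanity-check the processed data, you can load the pickle and inspect it. The snippet below is a minimal sketch; it assumes `process_data.py` stores the parsed reviews, the two word-vector matrices, the word-index map, and the vocabulary in that order, so check the script itself if the unpacking does not match.

```
import cPickle

# assumes mr.p stores [revs, W, W2, word_idx_map, vocab] -- verify against process_data.py
with open("mr.p", "rb") as f:
    revs, W, W2, word_idx_map, vocab = cPickle.load(f)

print "number of sentences: %d" % len(revs)
print "vocab size: %d" % len(vocab)
print "word2vec matrix shape:", W.shape
```
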
### Running the models (CPU)
Example commands:

```
THEANO_FLAGS=mode=FAST_RUN,device=cpu,floatX=float32 python conv_net_sentence.py -nonstatic -rand
THEANO_FLAGS=mode=FAST_RUN,device=cpu,floatX=float32 python conv_net_sentence.py -static -word2vec
THEANO_FLAGS=mode=FAST_RUN,device=cpu,floatX=float32 python conv_net_sentence.py -nonstatic -word2vec
```

This will run the CNN-rand, CNN-static, and CNN-nonstatic models from the paper, respectively.

### Using the GPU
Using a GPU gives a good 10x to 20x speed-up, so it is highly recommended.
To use the GPU, simply change `device=cpu` to `device=gpu` (or whichever GPU you are using).
For example:
```
THEANO_FLAGS=mode=FAST_RUN,device=gpu,floatX=float32 python conv_net_sentence.py -nonstatic -word2vec
```

### Example output
CPU output:
```
epoch: 1, training time: 219.72 secs, train perf: 81.79 %, val perf: 79.26 %
epoch: 2, training time: 219.55 secs, train perf: 82.64 %, val perf: 76.84 %
epoch: 3, training time: 219.54 secs, train perf: 92.06 %, val perf: 80.95 %
```
GPU output:
```
epoch: 1, training time: 16.49 secs, train perf: 81.80 %, val perf: 78.32 %
epoch: 2, training time: 16.12 secs, train perf: 82.53 %, val perf: 76.74 %
epoch: 3, training time: 16.16 secs, train perf: 91.87 %, val perf: 81.37 %
```

### Other Implementations
#### TensorFlow
[Denny Britz](http://www.wildml.com) has an implementation of the model in TensorFlow:

https://github.com/dennybritz/cnn-text-classification-tf

He also wrote a [nice tutorial](http://www.wildml.com/2015/12/implementing-a-cnn-for-text-classification-in-tensorflow) on it, as well as a general tutorial on [CNNs for NLP](http://www.wildml.com/2015/11/understanding-convolutional-neural-networks-for-nlp).

#### Torch
The [HarvardNLP](http://harvardnlp.github.io/) group has an implementation in Torch.

https://github.com/harvardnlp/sent-conv-torch

### Hyperparameters
At the time of my original experiments I did not have access to a GPU, so I could not run many different experiments.
Hence the paper is missing things such as ablation studies and performance variance, and some of the conclusions
were premature (e.g. regularization does not always seem to help).

Ye Zhang has written a [very nice paper](http://arxiv.org/abs/1510.03820) doing an extensive analysis of model variants (e.g. filter widths, k-max pooling, word2vec vs. GloVe, etc.) and their effect on performance.

--------------------------------------------------------------------------------
/conv_net_classes.py:
--------------------------------------------------------------------------------
"""
Sample code for
Convolutional Neural Networks for Sentence Classification
http://arxiv.org/pdf/1408.5882v2.pdf

Much of the code is modified from
- deeplearning.net (for ConvNet classes)
- https://github.com/mdenil/dropout (for dropout)
- https://groups.google.com/forum/#!topic/pylearn-dev/3QbKtCumAW4 (for Adadelta)
"""

import numpy
import theano.tensor.shared_randomstreams
import theano
import theano.tensor as T
from theano.tensor.signal import downsample
from theano.tensor.nnet import conv

def ReLU(x):
    y = T.maximum(0.0, x)
    return(y)
def Sigmoid(x):
    y = T.nnet.sigmoid(x)
    return(y)
def Tanh(x):
    y = T.tanh(x)
    return(y)
def Iden(x):
    y = x
    return(y)

class HiddenLayer(object):
    """
    Class for HiddenLayer
    """
    def __init__(self, rng, input, n_in, n_out, activation, W=None, b=None,
                 use_bias=False):

        self.input = input
        self.activation = activation

        if W is None:
            if activation.func_name == "ReLU":
                W_values = numpy.asarray(0.01 * rng.standard_normal(size=(n_in, n_out)), dtype=theano.config.floatX)
            else:
                W_values = numpy.asarray(rng.uniform(low=-numpy.sqrt(6. / (n_in + n_out)),
                                                     high=numpy.sqrt(6. / (n_in + n_out)),
                                                     size=(n_in, n_out)), dtype=theano.config.floatX)
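                # The +/- sqrt(6. / (n_in + n_out)) bound above is the Glorot & Bengio (2010)
                # "Xavier" uniform initialization, which keeps activation and gradient variance
                # roughly constant across layers for tanh/sigmoid units; the ReLU branch uses
                # small Gaussian weights instead.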
            W = theano.shared(value=W_values, name='W')
        if b is None:
            b_values = numpy.zeros((n_out,), dtype=theano.config.floatX)
            b = theano.shared(value=b_values, name='b')

        self.W = W
        self.b = b

        if use_bias:
            lin_output = T.dot(input, self.W) + self.b
        else:
            lin_output = T.dot(input, self.W)

        self.output = (lin_output if activation is None else activation(lin_output))

        # parameters of the model
        if use_bias:
            self.params = [self.W, self.b]
        else:
            self.params = [self.W]

def _dropout_from_layer(rng, layer, p):
    """p is the probability of dropping a unit
    """
    srng = theano.tensor.shared_randomstreams.RandomStreams(rng.randint(999999))
    # p=1-p because 1's indicate keep and p is prob of dropping
    mask = srng.binomial(n=1, p=1-p, size=layer.shape)
    # The cast is important because
    # int * float32 = float64 which pulls things off the gpu
    output = layer * T.cast(mask, theano.config.floatX)
    return output

class DropoutHiddenLayer(HiddenLayer):
    def __init__(self, rng, input, n_in, n_out,
                 activation, dropout_rate, use_bias, W=None, b=None):
        super(DropoutHiddenLayer, self).__init__(
                rng=rng, input=input, n_in=n_in, n_out=n_out, W=W, b=b,
                activation=activation, use_bias=use_bias)

        self.output = _dropout_from_layer(rng, self.output, p=dropout_rate)

class MLPDropout(object):
    """A multilayer perceptron with dropout"""
    def __init__(self, rng, input, layer_sizes, dropout_rates, activations, use_bias=True):

        #rectified_linear_activation = lambda x: T.maximum(0.0, x)

        # Set up all the hidden layers
        self.weight_matrix_sizes = zip(layer_sizes, layer_sizes[1:])
        self.layers = []
        self.dropout_layers = []
        self.activations = activations
        next_layer_input = input
        #first_layer = True
        # dropout the input
        next_dropout_layer_input = _dropout_from_layer(rng, input, p=dropout_rates[0])
        layer_counter = 0
        for n_in, n_out in self.weight_matrix_sizes[:-1]:
            next_dropout_layer = DropoutHiddenLayer(rng=rng,
                    input=next_dropout_layer_input,
                    activation=activations[layer_counter],
                    n_in=n_in, n_out=n_out, use_bias=use_bias,
                    dropout_rate=dropout_rates[layer_counter])
            self.dropout_layers.append(next_dropout_layer)
            next_dropout_layer_input = next_dropout_layer.output

            # Reuse the parameters from the dropout layer here, in a different
            # path through the graph.
            next_layer = HiddenLayer(rng=rng,
                    input=next_layer_input,
                    activation=activations[layer_counter],
                    # scale the weight matrix W with (1-p)
                    W=next_dropout_layer.W * (1 - dropout_rates[layer_counter]),
                    b=next_dropout_layer.b,
                    n_in=n_in, n_out=n_out,
                    use_bias=use_bias)
            self.layers.append(next_layer)
            next_layer_input = next_layer.output
            #first_layer = False
            layer_counter += 1

        # Set up the output layer
        n_in, n_out = self.weight_matrix_sizes[-1]
        dropout_output_layer = LogisticRegression(
                input=next_dropout_layer_input,
                n_in=n_in, n_out=n_out)
        self.dropout_layers.append(dropout_output_layer)

        # Again, reuse parameters in the dropout output.
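        # Why the (1 - p) scaling: on the dropout path each unit feeding a layer is kept
        # with probability (1 - p), so the expected pre-activation there is (1 - p) * W^T x.
        # Multiplying the shared weights by (1 - p) on this deterministic path reproduces
        # that expectation at test time (the standard weight-scaling approximation for
        # dropout).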
        output_layer = LogisticRegression(
            input=next_layer_input,
            # scale the weight matrix W with (1-p)
            W=dropout_output_layer.W * (1 - dropout_rates[-1]),
            b=dropout_output_layer.b,
            n_in=n_in, n_out=n_out)
        self.layers.append(output_layer)

        # Use the negative log likelihood of the logistic regression layer as
        # the objective.
        self.dropout_negative_log_likelihood = self.dropout_layers[-1].negative_log_likelihood
        self.dropout_errors = self.dropout_layers[-1].errors

        self.negative_log_likelihood = self.layers[-1].negative_log_likelihood
        self.errors = self.layers[-1].errors

        # Grab all the parameters together.
        self.params = [ param for layer in self.dropout_layers for param in layer.params ]

    def predict(self, new_data):
        next_layer_input = new_data
        for i, layer in enumerate(self.layers):
            if i