├── .gitignore ├── notebooks ├── 1_mnist_feedforward.Rmd ├── 2_cnn.Rmd ├── 3_text_classification.Rmd ├── 4_sunspots.Rmd ├── 5_flights.Rmd └── rnn.png └── user2018-deeplearning.Rproj /.gitignore: -------------------------------------------------------------------------------- 1 | .Rproj.user 2 | .Rhistory 3 | .RData 4 | .Ruserdata 5 | .DS_Store 6 | -------------------------------------------------------------------------------- /notebooks/1_mnist_feedforward.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "MNIST example" 3 | output: html_notebook 4 | --- 5 | 6 | We'll start with the "hello world" of neural nets --- classifying some handwritten digits! 7 | 8 | First, load the **keras** package and a built-in dataset: 9 | 10 | ```{r} 11 | library(keras) 12 | mnist <- dataset_mnist() 13 | str(mnist) 14 | ``` 15 | 16 | We see that `mnist$train$x` is an array of matrices that represent the images. Let's visualize a couple of them: 17 | 18 | ```{r} 19 | plot_mnist_example <- function(i) { 20 | plot(as.raster(mnist$train$x[i,,] / 255)) 21 | } 22 | plot_mnist_example(1) 23 | plot_mnist_example(42) 24 | ``` 25 | 26 | Some necessary data massaging. 27 | 28 | ```{r} 29 | c(c(x_train, y_train), c(x_test, y_test)) %<-% mnist 30 | 31 | # Collapse each matrix to a 1-dimensional vector 32 | x_train <- array_reshape(x_train, c(nrow(x_train), 784)) 33 | x_test <- array_reshape(x_test, c(nrow(x_test), 784)) 34 | 35 | # Scale the grayscale pixel values into the [0,1] range 36 | x_train <- x_train / 255 37 | x_test <- x_test / 255 38 | 39 | # One-hot encode classes 40 | num_classes <- 10 41 | y_train <- to_categorical(y_train, num_classes) 42 | y_test <- to_categorical(y_test, num_classes) 43 | 44 | dim(x_train) 45 | dim(y_train) 46 | ``` 47 | 48 | We're now ready to define a neural network model! 49 | 50 | ```{r} 51 | # Instantiate a Keras sequential model object 52 | model <- keras_model_sequential() 53 | 54 | # Define the model architecture; note the by-reference semantics 55 | model %>% 56 | layer_dense(units = 256, activation = 'relu', input_shape = c(784)) %>% 57 | layer_dense(units = 128, activation = 'relu') %>% 58 | layer_dense(units = 10, activation = 'softmax') 59 | 60 | summary(model) 61 | ``` 62 | 63 | How do we know what the architecture should look like? It's a combination of (mostly) trial-and-error, experience, and looking at what others have done. Every problem is unique in some way! 64 | 65 | Once we've specified the architecture, we can `compile` the model by specifying the loss we're minimizing, the optimizer, and the metrics we want to track. 66 | 67 | ```{r} 68 | model %>% compile( 69 | loss = 'categorical_crossentropy', 70 | optimizer = "adam", 71 | metrics = c('accuracy') 72 | ) 73 | ``` 74 | 75 | Now that we have a compiled model, we're ready to start training!
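If you'd like training to stop on its own once the validation loss stops improving, you can also create an early-stopping callback and hand it to `fit()` via its `callbacks` argument --- a minimal, optional sketch (the `patience` value is just an example):

```{r}
# optional: stop training when the validation loss hasn't improved for 2 epochs
early_stop <- callback_early_stopping(monitor = "val_loss", patience = 2)
# pass it to the fit() call below with: callbacks = list(early_stop)
```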
76 | 77 | ```{r} 78 | # batch_size: how many observations we use per parameter update 79 | batch_size <- 128 80 | # epoch: one epoch is one full pass through the training dataset 81 | epochs <- 15 82 | 83 | history <- model %>% fit( 84 | x_train, y_train, 85 | batch_size = batch_size, 86 | epochs = epochs, 87 | verbose = 1, 88 | validation_split = 0.2 89 | ) 90 | 91 | plot(history) 92 | ``` 93 | 94 | Once the model is trained, we can `evaluate` it and look at some performance metrics 95 | 96 | ```{r} 97 | score <- model %>% evaluate( 98 | x_test, y_test, 99 | verbose = 0 100 | ) 101 | 102 | # Output metrics 103 | cat('Test loss:', score[[1]], '\n') 104 | cat('Test accuracy:', score[[2]], '\n') 105 | ``` 106 | 107 | # Manually inspecting predictions 108 | 109 | Can you make a couple predictions and check against the actual images? 110 | 111 | To make the predictions, we can use `predict_classes()`: 112 | 113 | ```{r} 114 | predictions <- model %>% 115 | predict_classes(x_test) 116 | predictions[1] 117 | ``` 118 | 119 | To get the raw probabilities: 120 | 121 | ```{r} 122 | predictions_prob <- model %>% 123 | predict_proba(x_test) 124 | ``` 125 | 126 | -------------------------------------------------------------------------------- /notebooks/2_cnn.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "Conv Nets" 3 | output: html_notebook 4 | --- 5 | 6 | In the first example, we "stretched" the matrix of pixels into a long vector which served as the input to the feedforward neural net. In practice, most image processing applications use another type of architecture known as convolutional neural networks (CNNs), which exploit the spatial relationships among the pixels. 7 | 8 | We'll continue with the `mnist` dataset from the previous section. 9 | 10 | ```{r} 11 | library(keras) 12 | mnist <- dataset_mnist() 13 | str(mnist) 14 | ``` 15 | 16 | Note the call to `array_reshape()` below --- we're reshaping the images to volumes of dimension $img\_rows \times img\_cols \times 1$. The last dimension, *depth*, in this case is `1` since our images are grayscale. 17 | 18 | ```{r} 19 | img_rows <- 28 20 | img_cols <- 28 21 | 22 | c(c(x_train, y_train), c(x_test, y_test)) %<-% mnist 23 | 24 | x_train <- array_reshape(x_train, c(nrow(x_train), img_rows, img_cols, 1)) 25 | x_test <- array_reshape(x_test, c(nrow(x_test), img_rows, img_cols, 1)) 26 | input_shape <- c(img_rows, img_cols, 1) 27 | 28 | # Scale the grayscale pixel values into the [0,1] range 29 | x_train <- x_train / 255 30 | x_test <- x_test / 255 31 | 32 | # Convert class vectors to binary class matrices 33 | num_classes <- 10 34 | y_train <- to_categorical(y_train, num_classes) 35 | y_test <- to_categorical(y_test, num_classes) 36 | 37 | dim(x_train) 38 | ``` 39 | 40 | We're now ready to define a neural network model! 41 | 42 | ```{r} 43 | # Instantiate a Keras sequential model object 44 | model <- keras_model_sequential() %>% 45 | layer_conv_2d(filters = 128, kernel_size = c(3, 3), 46 | activation = 'relu', input_shape = input_shape) %>% 47 | layer_conv_2d(filters = 128, kernel_size = c(3, 3), 48 | strides = 2, activation = 'relu') %>% 49 | layer_max_pooling_2d(pool_size = c(2, 2)) %>% 50 | layer_flatten() %>% 51 | layer_dense(units = num_classes, activation = 'softmax') 52 | 53 | summary(model) 54 | ``` 55 | 56 | Now, we can compile the model!
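As an aside, the output shapes reported by `summary()` above follow from the usual convolution arithmetic; here's a rough check, assuming the default "valid" padding:

```{r}
# 28x28 input, 3x3 kernel, stride 1:  (28 - 3)/1 + 1 = 26        ->  26 x 26 x 128
# 3x3 kernel, stride 2:               floor((26 - 3)/2) + 1 = 12 ->  12 x 12 x 128
# 2x2 max pooling:                    12 / 2 = 6                 ->   6 x  6 x 128
# flatten:                            6 * 6 * 128 = 4608 units feeding the softmax layer
```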
57 | 58 | ```{r} 59 | model %>% compile( 60 | loss = 'categorical_crossentropy', 61 | optimizer = "adam", 62 | metrics = c('accuracy') 63 | ) 64 | ``` 65 | 66 | Now that we have a compiled model, we're ready to start training! 67 | 68 | ```{r} 69 | batch_size <- 128 70 | epochs <- 10 71 | 72 | history <- model %>% fit( 73 | x_train, y_train, 74 | batch_size = batch_size, 75 | epochs = epochs, 76 | verbose = 1, 77 | validation_split = 0.2 78 | ) 79 | 80 | plot(history) 81 | ``` 82 | 83 | Once the model is trained, we can `evaluate` it and look at some performance metrics 84 | 85 | ```{r} 86 | score <- model %>% evaluate( 87 | x_test, y_test, 88 | verbose = 0 89 | ) 90 | 91 | # Output metrics 92 | cat('Test loss:', score[[1]], '\n') 93 | cat('Test accuracy:', score[[2]], '\n') 94 | ``` 95 | 96 | -------------------------------------------------------------------------------- /notebooks/3_text_classification.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "IMDb classification" 3 | output: html_notebook 4 | --- 5 | 6 | In this example, we'll try to classify whether movie reviews on IMDb are positive or negative based on the review text. 7 | 8 | We'll start by loading some data that comes with the **keras** package. Here, `max_features` denotes the number of unique word tokens we want to consider. 9 | 10 | ```{r} 11 | library(keras) 12 | max_features <- 20000 13 | imdb <- dataset_imdb(num_words = max_features) 14 | c(c(x_train, y_train), c(x_test, y_test)) %<-% imdb 15 | ``` 16 | 17 | Let's see what the training data looks like 18 | 19 | ```{r} 20 | x_train[[1]] 21 | ``` 22 | 23 | To turn this into English, we can use the following helper function 24 | 25 | ```{r} 26 | word_index <- dataset_imdb_word_index() 27 | reverse_word_index <- names(word_index) 28 | names(reverse_word_index) <- word_index 29 | decode_imdb <- function(indexes) { 30 | force(indexes) 31 | words <- sapply(indexes, function(index) { 32 | word <- if (index >= 3) reverse_word_index[[as.character(index - 3)]] 33 | if (!is.null(word)) word else "?" 34 | }) 35 | paste(words, collapse = " ") 36 | } 37 | decode_imdb(x_train[[1]]) 38 | ``` 39 | 40 | Since the reviews are of variable length, we need to get them all to the same length before we can feed them to the model. 
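To see what the padding does before we apply it to the real data, here's a toy illustration (not part of the IMDb workflow): by default `pad_sequences()` left-pads shorter sequences with zeros and truncates longer ones from the front.

```{r}
# toy example: two "reviews" of lengths 6 and 2, forced to length 4
toy <- list(c(1, 2, 3, 4, 5, 6), c(7, 8))
pad_sequences(toy, maxlen = 4)
```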
41 | 42 | ```{r} 43 | maxlen <- 80 44 | x_train <- pad_sequences(x_train, maxlen = maxlen) 45 | x_test <- pad_sequences(x_test, maxlen = maxlen) 46 | ``` 47 | 48 | Our `x_train` is now a matrix where each row corresponds to a review: 49 | 50 | ```{r} 51 | str(x_train) 52 | ``` 53 | 54 | We can then define a model and compile it: 55 | 56 | ```{r} 57 | model <- keras_model_sequential() 58 | model %>% 59 | layer_embedding(input_dim = max_features, output_dim = 128) %>% 60 | layer_gru(units = 32, dropout = 0.2, recurrent_dropout = 0.2) %>% 61 | layer_dense(units = 1, activation = 'sigmoid') 62 | 63 | model %>% compile( 64 | loss = 'binary_crossentropy', 65 | optimizer = 'adam', 66 | metrics = c('accuracy') 67 | ) 68 | ``` 69 | 70 | Train the model and evaluate its performance: 71 | 72 | ```{r} 73 | model %>% fit( 74 | x_train, y_train, 75 | batch_size = 32, 76 | epochs = 3, 77 | validation_data = list(x_test, y_test) 78 | ) 79 | 80 | scores <- model %>% evaluate( 81 | x_test, y_test, 82 | batch_size = 32 83 | ) 84 | 85 | cat('Test loss:', scores[[1]]) 86 | cat('Test accuracy:', scores[[2]]) 87 | ``` 88 | 89 | -------------------------------------------------------------------------------- /notebooks/4_sunspots.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "Time series forecasting" 3 | output: html_notebook 4 | --- 5 | 6 | We're gonna work with the `sunspot.month` dataset: 7 | 8 | > Monthly numbers of sunspots, as from the World Data Center, aka SIDC. This is the version of the data that will occasionally be updated when new counts become available. 9 | 10 | ```{r} 11 | sunspot.month 12 | ``` 13 | 14 | ```{r} 15 | library(tidyverse) 16 | library(tsibble) 17 | sunspot.month %>% 18 | as_tsibble() %>% 19 | ggplot(aes(index, value)) + 20 | geom_line() 21 | ``` 22 | 23 | 24 | The goal of this exercise will be to build a model that takes 10 years' worth of data (120 observations) and predicts the next 10 years (120 observations). 25 | 26 | # One problem, four ways 27 | 28 | We'll go through four different ways to attack this problem (did I mention there's a bazillion of them?). Our approaches will correspond to each of these diagrams: 29 | 30 | ```{r echo = FALSE} 31 | knitr::include_graphics("rnn.png") 32 | ``` 33 | 34 | ## First way: one step at a time 35 | 36 | As a first attempt, we're going to set up our model so that our response variable is one time step, and our predictors are the 120 values preceding the response. This corresponds to the "many to one" picture in the diagram.
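Before the real data prep, here's a toy illustration of the windowing idea (not part of the sunspots workflow), using a 3-step lookback instead of 120:

```{r}
# each row of x_toy holds the 3 values preceding the corresponding y_toy
toy_series <- 1:8
lookback <- 3
windows <- embed(toy_series, lookback + 1)   # base R sliding windows
x_toy <- windows[, (lookback + 1):2]         # columns are t-3, t-2, t-1
y_toy <- windows[, 1]                        # response at time t
cbind(x_toy, y = y_toy)
```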
37 | 38 | We need to do some data prep first: 39 | 40 | ```{r} 41 | library(lubridate) 42 | library(keras) 43 | 44 | make_series <- function(column, start_offset, end_offset) { 45 | # Given a time series, return a list 46 | # where each element is a vector representing a window 47 | # of the time series determined by the offsets 48 | purrr::map(seq_along(column), 49 | function(x) { 50 | start <- max(0, x + start_offset) 51 | end <- max(0, x + end_offset) 52 | column[start:end] 53 | }) 54 | } 55 | 56 | make_scaler <- function(x) { 57 | scaled <- scale(x) 58 | center_value <- attr(scaled, "scaled:center") 59 | scale_value <- attr(scaled, "scaled:scale") 60 | list( 61 | scaler = function(x) { 62 | force(x) 63 | scale(x, center_value, scale_value) 64 | }, 65 | descaler = function(x) { 66 | force(x) 67 | x * scale_value + center_value 68 | } 69 | ) 70 | } 71 | 72 | c(scale_value, descale_value) %<-% make_scaler( 73 | window(sunspot.month, 1749, c(1992, 12)) 74 | ) 75 | 76 | train_range <- 1749:1992 77 | validation_range <- 1993:2002 78 | testing_range <- 2003:2012 79 | timesteps <- 120 80 | 81 | data <- sunspot.month %>% 82 | as_tsibble() %>% 83 | as_data_frame() %>% 84 | mutate(key = case_when( 85 | year(index) %in% train_range ~ "train", 86 | year(index) %in% validation_range ~ "validation", 87 | year(index) %in% testing_range ~ "testing" 88 | )) %>% 89 | mutate(scaled_value = scale_value(value) %>% 90 | as.vector()) %>% 91 | mutate(lookback = make_series(scaled_value, -timesteps, -1), 92 | target = make_series(scaled_value, 0, timesteps - 1), 93 | target_lag = make_series(scaled_value, -1, timesteps - 2)) 94 | 95 | glimpse(data) 96 | ``` 97 | 98 | We also need to massage our data into a format that Keras can consume: 99 | 100 | ```{r} 101 | make_keras_data <- function(data) { 102 | x <- data$lookback %>% 103 | array_reshape(c(length(data$lookback), timesteps, 1)) 104 | y_sequence <- data$target %>% 105 | array_reshape(c(length(data$target), timesteps, 1)) 106 | y_sequence_lag <- data$target_lag %>% 107 | array_reshape(c(length(data$target_lag), timesteps, 1)) 108 | y <- data$target %>% 109 | sapply(first) 110 | list(x = x, y = y, y_sequence = y_sequence, 111 | y_sequence_lag = y_sequence_lag) 112 | } 113 | 114 | training_data <- data %>% 115 | filter(year(index) > 1758, 116 | key == "train") %>% 117 | make_keras_data() 118 | 119 | validation_data <- data %>% 120 | filter(key == "validation") %>% 121 | make_keras_data() 122 | 123 | full_training_data <- data %>% 124 | filter(key != "testing", year(index) > 1758) %>% 125 | make_keras_data() 126 | 127 | prediction_data <- data %>% 128 | filter(year(index) == 2003, month(index) == 1) %>% 129 | make_keras_data() 130 | 131 | str(training_data) 132 | ``` 133 | 134 | Let's now define our model. 135 | 136 | ```{r} 137 | model <- keras_model_sequential() %>% 138 | layer_lstm(units = 128, input_shape = c(timesteps, 1)) %>% 139 | layer_dense(1) 140 | 141 | model %>% 142 | compile(optimizer='adam', loss='mse') 143 | 144 | model 145 | ``` 146 | 147 | Let's see if it learns! 
148 | 149 | ```{r} 150 | history <- model %>% fit( 151 | training_data$x, 152 | training_data$y, 153 | batch_size = 256, 154 | epochs = 20, 155 | validation_data = list( 156 | validation_data$x, 157 | validation_data$y 158 | ) 159 | ) 160 | plot(history) 161 | ``` 162 | 163 | After we're done tweaking the model, we can train it on the full dataset: 164 | 165 | ```{r} 166 | model <- keras_model_sequential() %>% 167 | layer_lstm(units = 128, input_shape = c(timesteps, 1)) %>% 168 | layer_dense(1) 169 | 170 | model %>% 171 | compile(optimizer='adam', loss='mse') 172 | 173 | history <- model %>% fit( 174 | full_training_data$x, 175 | full_training_data$y, 176 | batch_size = 256, 177 | epochs = 20, 178 | ) 179 | ``` 180 | 181 | Since the model outputs only one prediction and we need 120, we have to come up with a scheme for generating the full forecast. The tricky part here is that for any value more than one time step into the future, we're missing some previous values. To work around this, we're going to feed back the predictions iteratively: 182 | 183 | ```{r} 184 | forecast_future_values <- function(model, x, steps) { 185 | forecasted <- numeric(0) 186 | for (i in seq_len(steps)) { 187 | x_reshaped <- array_reshape(x, c(1, timesteps, 1)) 188 | next_value <- model %>% 189 | predict(x_reshaped) %>% 190 | as.vector() 191 | 192 | forecasted <- c(forecasted, next_value) 193 | x <- c(x[-1], next_value) 194 | } 195 | forecasted 196 | } 197 | 198 | predictions <- forecast_future_values(model, prediction_data$x, 12*10) %>% 199 | descale_value() 200 | 201 | predictions 202 | ``` 203 | 204 | Let's see how our model did: 205 | 206 | ```{r} 207 | plot_predictions <- function(data, predictions) { 208 | data %>% 209 | filter(key == "testing") %>% 210 | add_column(prediction = predictions) %>% 211 | rbind(data %>% 212 | filter(key != "testing", 213 | year(index) > 1980) %>% 214 | add_column(prediction = NA_real_) 215 | ) %>% 216 | select(index, value, prediction) %>% 217 | gather(key, value, value, prediction) %>% 218 | ggplot(aes(x = index, y = value, color = key)) + 219 | geom_line() 220 | } 221 | 222 | plot_predictions(data, predictions) 223 | ``` 224 | 225 | ## Second way: have the dense layer output `timesteps` numbers 226 | 227 | The only difference compared to the previous approach is that we're outputting a vector of 120 steps instead of 1. This still corresponds to the "many to one" picture. 228 | 229 | ```{r} 230 | model <- keras_model_sequential() %>% 231 | layer_lstm(units = 128, input_shape = c(timesteps, 1)) %>% 232 | layer_dense(timesteps) %>% 233 | layer_reshape(c(timesteps, 1)) 234 | 235 | model %>% 236 | compile(optimizer='adam', loss='mse') 237 | ``` 238 | 239 | We also need to tweak our training dataset to accommodate the new model structure. Since the `y_sequence` output of `make_keras_data()` looks ten years into the future, we need to make sure we're not cheating: 240 | 241 | ```{r} 242 | training_data <- data %>% 243 | filter(year(index) > 1758, 244 | year(index) < (min(validation_range) - 10)) %>% 245 | make_keras_data() 246 | 247 | validation_data <- data %>% 248 | filter(year(index) >= (min(validation_range) - 10), 249 | year(index) < (min(testing_range) - 10)) %>% 250 | make_keras_data() 251 | 252 | full_training_data <- data %>% 253 | filter(year(index) > 1758, 254 | year(index) < (min(testing_range) - 10)) %>% 255 | make_keras_data() 256 | ``` 257 | 258 | We'll also skip tuning the model since we're focusing on the high-level architecture.
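As a quick, optional sanity check before fitting, you can confirm that the input and sequence-target arrays have matching shapes --- both should be (n, 120, 1), lining up with the `layer_reshape()` output above:

```{r}
dim(full_training_data$x)
dim(full_training_data$y_sequence)
```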
259 | 260 | ```{r} 261 | history <- model %>% fit( 262 | full_training_data$x, 263 | full_training_data$y_sequence, 264 | batch_size = 256, 265 | epochs = 20, 266 | ) 267 | 268 | predictions <- model %>% 269 | predict(prediction_data$x) %>% 270 | descale_value() %>% 271 | as.vector() 272 | 273 | plot_predictions(data, predictions) 274 | ``` 275 | 276 | ## Third way: "time distributed" dense layer 277 | 278 | We're on a roll! In this next approach, we're going to set `return_sequences = TRUE` in the LSTM layer; what this does is output a prediction for each of the steps in the input sequence. This corresponds to the right-most "many to many" diagram. Since the format of the training dataset is the same, we can just define the model and start training: 279 | 280 | ```{r} 281 | model <- keras_model_sequential() %>% 282 | layer_lstm(units = 64, input_shape = c(timesteps, 1), 283 | return_sequences = TRUE) %>% 284 | layer_lstm(units = 64, return_sequences = TRUE) %>% 285 | layer_dense(1) 286 | 287 | model %>% 288 | compile(optimizer='adam', loss='mse') 289 | 290 | history <- model %>% fit( 291 | full_training_data$x, 292 | full_training_data$y_sequence, 293 | batch_size = 256, 294 | epochs = 20, 295 | ) 296 | 297 | predictions <- model %>% 298 | predict(prediction_data$x) %>% 299 | descale_value() %>% 300 | as.vector() 301 | 302 | plot_predictions(data, predictions) 303 | ``` 304 | 305 | ## Fourth way: Seq2seq/Encoder-decoder architecture 306 | 307 | Adapted from [https://blog.keras.io/a-ten-minute-introduction-to-sequence-to-sequence-learning-in-keras.html](https://blog.keras.io/a-ten-minute-introduction-to-sequence-to-sequence-learning-in-keras.html) 308 | 309 | Our final approach is slightly more involved, and it corresponds to the middle "many to many" picture. We have an *encoder* network that summarizes the historical data into some numbers, and we also have a *decoder* network that translates these summary numbers into a forecast sequence. Each step in the prediction depends on the output and hidden state of the previous time step. During training, we provide the actual lagged values (this is known as teacher forcing), and during prediction, we feed back the prediction for the previous timestep into the network. Let's now build the model.
310 | 311 | ```{r} 312 | # specify the dimension of the LSTM layer 313 | latent_dim <- 128 314 | # takes an input sequence, runs it through a recurrent layer, 315 | # then returns the hidden states 316 | encoder_inputs <- layer_input(shape = list(timesteps, 1)) 317 | encoder_results <- encoder_inputs %>% 318 | layer_lstm(units = latent_dim, return_state = TRUE) 319 | # the hidden states are saved in the 2nd and 3rd elements of the list 320 | encoder_states <- encoder_results[2:3] 321 | 322 | # takes the target sequence lagged by one step 323 | # and feeds it to an LSTM whose hidden state is initialized using 324 | # the encoder states 325 | decoder_inputs <- layer_input(shape = list(NULL, 1)) 326 | decoder_lstm <- layer_lstm( 327 | units = latent_dim, return_sequences = TRUE, 328 | return_state = TRUE) 329 | decoder_results <- decoder_inputs %>% 330 | decoder_lstm(initial_state = encoder_states) 331 | # since we have `return_sequences = TRUE`, this dense layer 332 | # returns a scalar for each step in the `decoder_inputs` 333 | # sequence 334 | decoder_dense <- layer_dense(units = 1, activation = "linear") 335 | decoder_outputs <- decoder_results[[1]] %>% 336 | decoder_dense() 337 | 338 | model <- keras_model( 339 | inputs = list(encoder_inputs, decoder_inputs), 340 | outputs = decoder_outputs 341 | ) 342 | ``` 343 | 344 | Let's now compile and fit the model: 345 | 346 | ```{r} 347 | model %>% 348 | compile(optimizer='adam', loss='mse') 349 | 350 | history <- model %>% fit( 351 | list(full_training_data$x, full_training_data$y_sequence_lag), 352 | full_training_data$y_sequence, 353 | batch_size = 256, 354 | epochs = 20 355 | ) 356 | ``` 357 | 358 | During prediction, we need a way to pass states from one step to the next, so we need to put together encoder and decoder models using the layers that have been trained above: 359 | 360 | ```{r} 361 | # the encoder model takes input values and returns states 362 | encoder_model <- keras_model(encoder_inputs, encoder_states) 363 | 364 | # the decoder model takes both the previous timestep's value and 365 | # its state outputs and returns a prediction along with the new states 366 | decoder_state_input_h <- layer_input(shape = latent_dim) 367 | decoder_state_input_c <- layer_input(shape = latent_dim) 368 | decoder_states_inputs <- c(decoder_state_input_h, decoder_state_input_c) 369 | decoder_results <- decoder_lstm(decoder_inputs, initial_state = decoder_states_inputs) 370 | decoder_states <- decoder_results[2:3] 371 | decoder_outputs <- decoder_dense(decoder_results[[1]]) 372 | decoder_model <- keras_model( 373 | inputs = c(decoder_inputs, decoder_states_inputs), 374 | outputs = c(decoder_outputs, decoder_states) 375 | ) 376 | ``` 377 | 378 | We can now define a forecasting function using these models and make some predictions: 379 | 380 | ```{r} 381 | forecast_values_seq2seq <- function(input_seq, encoder, decoder, steps = 120) { 382 | states_value <- predict(encoder, input_seq) 383 | target_seq <- array(tail(input_seq[1,,], 1), dim=c(1, 1, 1)) 384 | forecasted_sequence <- numeric(0) 385 | 386 | for (i in seq_len(steps)) { 387 | decoder_predict <- predict(decoder, c(list(target_seq), states_value)) 388 | output_value <- decoder_predict[[1]] 389 | 390 | forecasted_sequence <- c(forecasted_sequence, output_value) 391 | 392 | 393 | target_seq <- array(output_value, dim=c(1, 1, 1)) 394 | 395 | ## update states 396 | state_h <- decoder_predict[[2]] 397 | state_c <- decoder_predict[[3]] 398 | states_value <- list(state_h, state_c) 399 | } 400 |
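  # at this point forecasted_sequence holds all `steps` one-step-ahead predictions
  # on the scaled scale; the caller rescales them with descale_value()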
forecasted_sequence 401 | } 402 | 403 | predictions <- forecast_values_seq2seq( 404 | prediction_data$x, encoder_model, decoder_model 405 | ) %>% descale_value() 406 | 407 | ``` 408 | 409 | Then plot the predictions: 410 | ```{r} 411 | plot_predictions(data, predictions) 412 | ``` 413 | 414 | 415 | 416 | -------------------------------------------------------------------------------- /notebooks/5_flights.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "Predicting flight arrival delays" 3 | output: html_notebook 4 | --- 5 | 6 | For this exercise, we'll work with the `flights` table from the `nycflights13` package. 7 | 8 | ```{r} 9 | library(tidyverse); library(magrittr) # magrittr provides %T>% and subtract() 10 | flights <- nycflights13::flights 11 | glimpse(flights) 12 | ``` 13 | 14 | Our goal in this exercise is to predict arrival delays using information available as of takeoff for each flight. 15 | 16 | Let's do some light feature engineering: 17 | 18 | ```{r} 19 | data <- flights %>% 20 | filter(!is.na(arr_delay)) %>% 21 | mutate(orig_dest = paste0(origin, dest)) %T>% 22 | (function(x) { 23 | orig_dest_levels <<- unique(x$orig_dest) 24 | carrier_levels <<- unique(x$carrier) 25 | }) %>% 26 | mutate( 27 | orig_dest = factor(orig_dest, levels = orig_dest_levels) %>% 28 | as.integer() %>% 29 | subtract(1), 30 | carrier = factor(carrier, levels = carrier_levels) %>% 31 | as.integer() %>% 32 | subtract(1), 33 | key = ifelse(month >= 11, "test", "train") 34 | ) 35 | training_data <- filter(data, key == "train") 36 | testing_data <- filter(data, key == "test") 37 | ``` 38 | 39 | We incorporate each predictor we want to include in the model by creating an input layer for it, then concatenating them. 40 | 41 | ```{r} 42 | library(keras) 43 | carrier_input <- layer_input(shape = c(1)) 44 | dep_delay_input <- layer_input(shape = c(1)) 45 | distance_input <- layer_input(shape = c(1)) 46 | origin_destination_input <- layer_input(shape = c(1)) 47 | 48 | carrier_output <- carrier_input %>% 49 | layer_embedding(16, 8) %>% 50 | layer_flatten() 51 | 52 | origin_destination_output <- origin_destination_input %>% 53 | layer_embedding(256, 128) %>% 54 | layer_flatten() 55 | 56 | main_layer <- layer_concatenate( 57 | c(dep_delay_input, distance_input, 58 | carrier_output, origin_destination_output) 59 | ) 60 | 61 | output <- main_layer %>% 62 | layer_dense(units = 256, activation = "relu") %>% 63 | layer_dropout(0.2) %>% 64 | layer_dense(units = 256) %>% 65 | (function(x) layer_concatenate(c(x, dep_delay_input))) %>% 66 | layer_dropout(0.2) %>% 67 | layer_dense(units = 128) %>% 68 | layer_dense(units = 1, activation = "linear") 69 | 70 | model <- keras_model( 71 | list(dep_delay_input, distance_input, carrier_input, 72 | origin_destination_input), 73 | output 74 | ) 75 | model %>% 76 | compile(optimizer = "adam", loss = "mse") 77 | ``` 78 | 79 | We're passing our categorical variables through an embedding layer via `layer_embedding()`. The use case here is slightly different from the NLP example, but the intuition is similar: we're asking the model to learn a lower-dimensional representation of the high-cardinality categorical variables in the context of the predictive modeling problem. 80 | 81 | To make training easier, we're going to scale each of the numeric variables, including the response. Here we define a function factory and use it to create some helper functions.
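The factory works because `scale()` records the centering and scaling values as attributes, which the two returned closures capture. A tiny illustration of those attributes (toy numbers):

```{r}
s <- scale(c(10, 20, 30))
attr(s, "scaled:center")  # 20
attr(s, "scaled:scale")   # 10
```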
82 | 83 | ```{r} 84 | make_scaler <- function(x) { 85 | scaled <- scale(x) 86 | center_value <- attr(scaled, "scaled:center") 87 | scale_value <- attr(scaled, "scaled:scale") 88 | list( 89 | scaler = function(x) { 90 | force(x) 91 | scale(x, center_value, scale_value) 92 | }, 93 | descaler = function(x) { 94 | force(x) 95 | x * scale_value + center_value 96 | } 97 | ) 98 | } 99 | 100 | c(scale_arr_delay, descale_arr_delay) %<-% make_scaler(training_data$arr_delay) 101 | c(scale_distance, descale_distance) %<-% make_scaler(training_data$distance) 102 | c(scale_dep_delay, descale_dep_delay) %<-% make_scaler(training_data$dep_delay) 103 | ``` 104 | 105 | Let's now train the model! 106 | 107 | ```{r} 108 | history <- model %>% 109 | fit(list(scale_dep_delay(training_data$dep_delay), 110 | scale_distance(training_data$distance), 111 | training_data$carrier, training_data$orig_dest), 112 | scale_arr_delay(training_data$arr_delay), 113 | batch_size = 256, epochs = 10, 114 | validation_split = 0.2) 115 | ``` 116 | 117 | Score the testing dataset: 118 | 119 | ```{r} 120 | predictions <- predict( 121 | model, 122 | list( 123 | scale_dep_delay(testing_data$dep_delay), 124 | scale_distance(testing_data$distance), 125 | testing_data$carrier, testing_data$orig_dest) 126 | ) %>% 127 | descale_arr_delay() 128 | ``` 129 | 130 | Look at some qualitative performance results: 131 | 132 | ```{r} 133 | testing_data %>% 134 | add_column(prediction = as.vector(predictions)) %>% 135 | select(arr_delay, prediction) %>% 136 | mutate(decile = cut(prediction, quantile(prediction, probs = seq(0, 1, 0.1)), 137 | labels = FALSE, include.lowest = TRUE) 138 | ) %>% 139 | group_by(decile) %>% 140 | summarize(mean_actual = mean(arr_delay), 141 | mean_predicted = mean(prediction) 142 | ) %>% 143 | gather(key, value, mean_actual, mean_predicted) %>% 144 | ggplot(aes(x = as.factor(decile), y = value, fill = key)) + 145 | geom_bar(stat = "identity", position = "dodge") 146 | ``` 147 | 148 | -------------------------------------------------------------------------------- /notebooks/rnn.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kevinykuo/user2018-deeplearning/258815dfccbbf9289c17b030b8d2b64d17012063/notebooks/rnn.png -------------------------------------------------------------------------------- /user2018-deeplearning.Rproj: -------------------------------------------------------------------------------- 1 | Version: 1.0 2 | 3 | RestoreWorkspace: Default 4 | SaveWorkspace: Default 5 | AlwaysSaveHistory: Default 6 | 7 | EnableCodeIndexing: Yes 8 | UseSpacesForTab: Yes 9 | NumSpacesForTab: 2 10 | Encoding: UTF-8 11 | 12 | RnwWeave: Sweave 13 | LaTeX: pdfLaTeX --------------------------------------------------------------------------------