├── .gitignore ├── notebooks ├── 1_mnist_feedforward.Rmd ├── 2_cnn.Rmd ├── 3_text_classification.Rmd ├── 4_sunspots.Rmd ├── 5_flights.Rmd └── rnn.png └── user2018-deeplearning.Rproj /.gitignore: -------------------------------------------------------------------------------- 1 | .Rproj.user 2 | .Rhistory 3 | .RData 4 | .Ruserdata 5 | .DS_Store 6 | -------------------------------------------------------------------------------- /notebooks/1_mnist_feedforward.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "MNIST example" 3 | output: html_notebook 4 | --- 5 | 6 | We'll start with the "hello world" of neural nets --- classifying some handwritten digits! 7 | 8 | First, load the **keras** package and a built-in dataset: 9 | 10 | ```{r} 11 | library(keras) 12 | mnist <- dataset_mnist() 13 | str(mnist) 14 | ``` 15 | 16 | We see that `mnist$train$x` is an array of matrices that represent the images. Let's visualize a couple of them: 17 | 18 | ```{r} 19 | plot_mnist_example <- function(i) { 20 | plot(as.raster(mnist$train$x[i,,] / 255)) 21 | } 22 | plot_mnist_example(1) 23 | plot_mnist_example(42) 24 | ``` 25 | 26 | Some necessary data massaging. 27 | 28 | ```{r} 29 | c(c(x_train, y_train), c(x_test, y_test)) %<-% mnist 30 | 31 | # Collapse each matrix to a 1-dimensional vector 32 | x_train <- array_reshape(x_train, c(nrow(x_train), 784)) 33 | x_test <- array_reshape(x_test, c(nrow(x_test), 784)) 34 | 35 | # Scale the grayscale pixel values into the [0,1] range 36 | x_train <- x_train / 255 37 | x_test <- x_test / 255 38 | 39 | # One-hot encode classes 40 | num_classes <- 10 41 | y_train <- to_categorical(y_train, num_classes) 42 | y_test <- to_categorical(y_test, num_classes) 43 | 44 | dim(x_train) 45 | dim(y_train) 46 | ``` 47 | 48 | We're now ready to define a neural network model! 49 | 50 | ```{r} 51 | # Instantiate a Keras sequential model object 52 | model <- keras_model_sequential() 53 | 54 | # Define the model architecture; note the by-reference semantics 55 | model %>% 56 | layer_dense(units = 256, activation = 'relu', input_shape = c(784)) %>% 57 | layer_dense(units = 128, activation = 'relu') %>% 58 | layer_dense(units = 10, activation = 'softmax') 59 | 60 | summary(model) 61 | ``` 62 | 63 | How do we know what the architecture should look like? It's a combination of (mostly) trial-and-error, experience, and looking at what others have done. Every problem is unique in some way! 64 | 65 | Once we've specified the architecture, we can `compile` the model by specifying the loss we're minimizing, the optimizer, and the metrics we want to track. 66 | 67 | ```{r} 68 | model %>% compile( 69 | loss = 'categorical_crossentropy', 70 | optimizer = "adam", 71 | metrics = c('accuracy') 72 | ) 73 | ``` 74 | 75 | Now that we have a compiled model, we're ready to start training!
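If you'd like training to stop on its own once the validation loss stops improving, you can also create an early-stopping callback and hand it to `fit()` via its `callbacks` argument --- a minimal, optional sketch (the `patience` value is just an example):

```{r}
# optional: stop training when the validation loss hasn't improved for 2 epochs
early_stop <- callback_early_stopping(monitor = "val_loss", patience = 2)
# pass it to the fit() call below with: callbacks = list(early_stop)
```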
76 | 77 | ```{r} 78 | # batch_size: how many observations we use per parameter update 79 | batch_size <- 128 80 | # epoch: one epoch is one full pass through the training dataset 81 | epochs <- 15 82 | 83 | history <- model %>% fit( 84 | x_train, y_train, 85 | batch_size = batch_size, 86 | epochs = epochs, 87 | verbose = 1, 88 | validation_split = 0.2 89 | ) 90 | 91 | plot(history) 92 | ``` 93 | 94 | Once the model is trained, we can `evaluate` it and look at some performance metrics 95 | 96 | ```{r} 97 | score <- model %>% evaluate( 98 | x_test, y_test, 99 | verbose = 0 100 | ) 101 | 102 | # Output metrics 103 | cat('Test loss:', score[[1]], '\n') 104 | cat('Test accuracy:', score[[2]], '\n') 105 | ``` 106 | 107 | # Manually inspecting predictions 108 | 109 | Can you make a couple predictions and check against the actual images? 110 | 111 | To make the predictions, we can use `predict_classes()`: 112 | 113 | ```{r} 114 | predictions <- model %>% 115 | predict_classes(x_test) 116 | predictions[1] 117 | ``` 118 | 119 | To get the raw probabilities: 120 | 121 | ```{r} 122 | predictions_prob <- model %>% 123 | predict_proba(x_test) 124 | ``` 125 | 126 | -------------------------------------------------------------------------------- /notebooks/2_cnn.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "Conv Nets" 3 | output: html_notebook 4 | --- 5 | 6 | In the first example, we "stretched" the matrix of pixels into a long vector which served as the input to the feedforward neural net. In practice, most image processing applications use another type of architecture known as convolutional neural networks (CNNs), which exploit the spatial relationships among the pixels. 7 | 8 | We'll continue with the `mnist` dataset from the previous section. 9 | 10 | ```{r} 11 | library(keras) 12 | mnist <- dataset_mnist() 13 | str(mnist) 14 | ``` 15 | 16 | Note the call to `array_reshape()` below --- we're reshaping the images to volumes of dimension $img\_rows \times img\_cols \times 1$. The last dimension, *depth*, in this case is `1` since our images are grayscale. 17 | 18 | ```{r} 19 | img_rows <- 28 20 | img_cols <- 28 21 | 22 | c(c(x_train, y_train), c(x_test, y_test)) %<-% mnist 23 | 24 | x_train <- array_reshape(x_train, c(nrow(x_train), img_rows, img_cols, 1)) 25 | x_test <- array_reshape(x_test, c(nrow(x_test), img_rows, img_cols, 1)) 26 | input_shape <- c(img_rows, img_cols, 1) 27 | 28 | # Scale the grayscale pixel values into the [0,1] range 29 | x_train <- x_train / 255 30 | x_test <- x_test / 255 31 | 32 | # Convert class vectors to binary class matrices 33 | num_classes <- 10 34 | y_train <- to_categorical(y_train, num_classes) 35 | y_test <- to_categorical(y_test, num_classes) 36 | 37 | dim(x_train) 38 | ``` 39 | 40 | We're now ready to define a neural network model! 41 | 42 | ```{r} 43 | # Instantiate a Keras sequential model object 44 | model <- keras_model_sequential() %>% 45 | layer_conv_2d(filters = 128, kernel_size = c(3, 3), 46 | activation = 'relu', input_shape = input_shape) %>% 47 | layer_conv_2d(filters = 128, kernel_size = c(3, 3), 48 | strides = 2, activation = 'relu') %>% 49 | layer_max_pooling_2d(pool_size = c(2, 2)) %>% 50 | layer_flatten() %>% 51 | layer_dense(units = num_classes, activation = 'softmax') 52 | 53 | summary(model) 54 | ``` 55 | 56 | Now, we can compile the model!
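As an aside, the output shapes reported by `summary()` above follow from the usual convolution arithmetic; here's a rough check, assuming the default "valid" padding:

```{r}
# 28x28 input, 3x3 kernel, stride 1:  (28 - 3)/1 + 1 = 26        ->  26 x 26 x 128
# 3x3 kernel, stride 2:               floor((26 - 3)/2) + 1 = 12 ->  12 x 12 x 128
# 2x2 max pooling:                    12 / 2 = 6                 ->   6 x  6 x 128
# flatten:                            6 * 6 * 128 = 4608 units feeding the softmax layer
```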
57 | 58 | ```{r} 59 | model %>% compile( 60 | loss = 'categorical_crossentropy', 61 | optimizer = "adam", 62 | metrics = c('accuracy') 63 | ) 64 | ``` 65 | 66 | Now that we have a compiled model, we're ready to start training! 67 | 68 | ```{r} 69 | batch_size <- 128 70 | epochs <- 10 71 | 72 | history <- model %>% fit( 73 | x_train, y_train, 74 | batch_size = batch_size, 75 | epochs = epochs, 76 | verbose = 1, 77 | validation_split = 0.2 78 | ) 79 | 80 | plot(history) 81 | ``` 82 | 83 | Once the model is trained, we can `evaluate` it and look at some performance metrics 84 | 85 | ```{r} 86 | score <- model %>% evaluate( 87 | x_test, y_test, 88 | verbose = 0 89 | ) 90 | 91 | # Output metrics 92 | cat('Test loss:', score[[1]], '\n') 93 | cat('Test accuracy:', score[[2]], '\n') 94 | ``` 95 | 96 | -------------------------------------------------------------------------------- /notebooks/3_text_classification.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "IMDb classification" 3 | output: html_notebook 4 | --- 5 | 6 | In this example, we'll try to classify whether movie reviews on IMDb are positive or negative based on the review text. 7 | 8 | We'll start by loading some data that comes with the **keras** package. Here, `max_features` denotes the number of unique word tokens we want to consider. 9 | 10 | ```{r} 11 | library(keras) 12 | max_features <- 20000 13 | imdb <- dataset_imdb(num_words = max_features) 14 | c(c(x_train, y_train), c(x_test, y_test)) %<-% imdb 15 | ``` 16 | 17 | Let's see what the training data looks like 18 | 19 | ```{r} 20 | x_train[[1]] 21 | ``` 22 | 23 | To turn this into English, we can use the following helper function 24 | 25 | ```{r} 26 | word_index <- dataset_imdb_word_index() 27 | reverse_word_index <- names(word_index) 28 | names(reverse_word_index) <- word_index 29 | decode_imdb <- function(indexes) { 30 | force(indexes) 31 | words <- sapply(indexes, function(index) { 32 | word <- if (index >= 3) reverse_word_index[[as.character(index - 3)]] 33 | if (!is.null(word)) word else "?" 34 | }) 35 | paste(words, collapse = " ") 36 | } 37 | decode_imdb(x_train[[1]]) 38 | ``` 39 | 40 | Since the reviews are of variable length, we need to get them all to the same length before we can feed them to the model. 
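To see what the padding does before we apply it to the real data, here's a toy illustration (not part of the IMDb workflow): by default `pad_sequences()` left-pads shorter sequences with zeros and truncates longer ones from the front.

```{r}
# toy example: two "reviews" of lengths 6 and 2, forced to length 4
toy <- list(c(1, 2, 3, 4, 5, 6), c(7, 8))
pad_sequences(toy, maxlen = 4)
```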
41 | 42 | ```{r} 43 | maxlen <- 80 44 | x_train <- pad_sequences(x_train, maxlen = maxlen) 45 | x_test <- pad_sequences(x_test, maxlen = maxlen) 46 | ``` 47 | 48 | Our `x_train` is now a matrix where each row corresponds to a review: 49 | 50 | ```{r} 51 | str(x_train) 52 | ``` 53 | 54 | We can then define a model and compile it: 55 | 56 | ```{r} 57 | model <- keras_model_sequential() 58 | model %>% 59 | layer_embedding(input_dim = max_features, output_dim = 128) %>% 60 | layer_gru(units = 32, dropout = 0.2, recurrent_dropout = 0.2) %>% 61 | layer_dense(units = 1, activation = 'sigmoid') 62 | 63 | model %>% compile( 64 | loss = 'binary_crossentropy', 65 | optimizer = 'adam', 66 | metrics = c('accuracy') 67 | ) 68 | ``` 69 | 70 | Train the model and evaluate its performance: 71 | 72 | ```{r} 73 | model %>% fit( 74 | x_train, y_train, 75 | batch_size = 32, 76 | epochs = 3, 77 | validation_data = list(x_test, y_test) 78 | ) 79 | 80 | scores <- model %>% evaluate( 81 | x_test, y_test, 82 | batch_size = 32 83 | ) 84 | 85 | cat('Test loss:', scores[[1]]) 86 | cat('Test accuracy:', scores[[2]]) 87 | ``` 88 | 89 | -------------------------------------------------------------------------------- /notebooks/4_sunspots.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "Time series forecasting" 3 | output: html_notebook 4 | --- 5 | 6 | We're gonna work with the `sunspot.month` dataset: 7 | 8 | > Monthly numbers of sunspots, as from the World Data Center, aka SIDC. This is the version of the data that will occasionally be updated when new counts become available. 9 | 10 | ```{r} 11 | sunspot.month 12 | ``` 13 | 14 | ```{r} 15 | library(tidyverse) 16 | library(tsibble) 17 | sunspot.month %>% 18 | as_tsibble() %>% 19 | ggplot(aes(index, value)) + 20 | geom_line() 21 | ``` 22 | 23 | 24 | The goal of this exercise will be to build a model that takes 10 years' worth of data (120 observations) and predicts the next 10 years (120 observations). 25 | 26 | # One problem, four ways 27 | 28 | We'll go through four different ways to attack this problem (did I mention there's a bazillion of them?). Our approaches will correspond to each of these diagrams: 29 | 30 | ```{r echo = FALSE} 31 | knitr::include_graphics("rnn.png") 32 | ``` 33 | 34 | ## First way: one step at a time 35 | 36 | As a first attempt, we're going to set up our model so that our response variable is one time step, and our predictors are the 120 values preceding the response. This corresponds to the "many to one" picture in the diagram.
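Before the real data prep, here's a toy illustration of the windowing idea (not part of the sunspots workflow), using a 3-step lookback instead of 120:

```{r}
# each row of x_toy holds the 3 values preceding the corresponding y_toy
toy_series <- 1:8
lookback <- 3
windows <- embed(toy_series, lookback + 1)   # base R sliding windows
x_toy <- windows[, (lookback + 1):2]         # columns are t-3, t-2, t-1
y_toy <- windows[, 1]                        # response at time t
cbind(x_toy, y = y_toy)
```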
37 | 38 | We need to do some data prep first: 39 | 40 | ```{r} 41 | library(lubridate) 42 | library(keras) 43 | 44 | make_series <- function(column, start_offset, end_offset) { 45 | # Given a time series, return a list 46 | # where each element is a vector representing a window 47 | # of the time series determined by the offsets 48 | purrr::map(seq_along(column), 49 | function(x) { 50 | start <- max(0, x + start_offset) 51 | end <- max(0, x + end_offset) 52 | column[start:end] 53 | }) 54 | } 55 | 56 | make_scaler <- function(x) { 57 | scaled <- scale(x) 58 | center_value <- attr(scaled, "scaled:center") 59 | scale_value <- attr(scaled, "scaled:scale") 60 | list( 61 | scaler = function(x) { 62 | force(x) 63 | scale(x, center_value, scale_value) 64 | }, 65 | descaler = function(x) { 66 | force(x) 67 | x * scale_value + center_value 68 | } 69 | ) 70 | } 71 | 72 | c(scale_value, descale_value) %<-% make_scaler( 73 | window(sunspot.month, 1749, c(1992, 12)) 74 | ) 75 | 76 | train_range <- 1749:1992 77 | validation_range <- 1993:2002 78 | testing_range <- 2003:2012 79 | timesteps <- 120 80 | 81 | data <- sunspot.month %>% 82 | as_tsibble() %>% 83 | as_data_frame() %>% 84 | mutate(key = case_when( 85 | year(index) %in% train_range ~ "train", 86 | year(index) %in% validation_range ~ "validation", 87 | year(index) %in% testing_range ~ "testing" 88 | )) %>% 89 | mutate(scaled_value = scale_value(value) %>% 90 | as.vector()) %>% 91 | mutate(lookback = make_series(scaled_value, -timesteps, -1), 92 | target = make_series(scaled_value, 0, timesteps - 1), 93 | target_lag = make_series(scaled_value, -1, timesteps - 2)) 94 | 95 | glimpse(data) 96 | ``` 97 | 98 | We also need to massage our data into a format that Keras can consume: 99 | 100 | ```{r} 101 | make_keras_data <- function(data) { 102 | x <- data$lookback %>% 103 | array_reshape(c(length(data$lookback), timesteps, 1)) 104 | y_sequence <- data$target %>% 105 | array_reshape(c(length(data$target), timesteps, 1)) 106 | y_sequence_lag <- data$target_lag %>% 107 | array_reshape(c(length(data$target_lag), timesteps, 1)) 108 | y <- data$target %>% 109 | sapply(first) 110 | list(x = x, y = y, y_sequence = y_sequence, 111 | y_sequence_lag = y_sequence_lag) 112 | } 113 | 114 | training_data <- data %>% 115 | filter(year(index) > 1758, 116 | key == "train") %>% 117 | make_keras_data() 118 | 119 | validation_data <- data %>% 120 | filter(key == "validation") %>% 121 | make_keras_data() 122 | 123 | full_training_data <- data %>% 124 | filter(key != "testing", year(index) > 1758) %>% 125 | make_keras_data() 126 | 127 | prediction_data <- data %>% 128 | filter(year(index) == 2003, month(index) == 1) %>% 129 | make_keras_data() 130 | 131 | str(training_data) 132 | ``` 133 | 134 | Let's now define our model. 135 | 136 | ```{r} 137 | model <- keras_model_sequential() %>% 138 | layer_lstm(units = 128, input_shape = c(timesteps, 1)) %>% 139 | layer_dense(1) 140 | 141 | model %>% 142 | compile(optimizer='adam', loss='mse') 143 | 144 | model 145 | ``` 146 | 147 | Let's see if it learns! 
148 | 149 | ```{r} 150 | history <- model %>% fit( 151 | training_data$x, 152 | training_data$y, 153 | batch_size = 256, 154 | epochs = 20, 155 | validation_data = list( 156 | validation_data$x, 157 | validation_data$y 158 | ) 159 | ) 160 | plot(history) 161 | ``` 162 | 163 | After we're done tweaking the model, we can train it on the full dataset: 164 | 165 | ```{r} 166 | model <- keras_model_sequential() %>% 167 | layer_lstm(units = 128, input_shape = c(timesteps, 1)) %>% 168 | layer_dense(1) 169 | 170 | model %>% 171 | compile(optimizer='adam', loss='mse') 172 | 173 | history <- model %>% fit( 174 | full_training_data$x, 175 | full_training_data$y, 176 | batch_size = 256, 177 | epochs = 20, 178 | ) 179 | ``` 180 | 181 | Since the model outputs only one prediction and we need 120, we have to come up with a scheme for generating the full forecast. The tricky part here is that for any value more than one time step into the future, we're missing some previous values. To work around this, we're going to feed back the predictions iteratively: 182 | 183 | ```{r} 184 | forecast_future_values <- function(model, x, steps) { 185 | forecasted <- numeric(0) 186 | for (i in seq_len(steps)) { 187 | x_reshaped <- array_reshape(x, c(1, timesteps, 1)) 188 | next_value <- model %>% 189 | predict(x_reshaped) %>% 190 | as.vector() 191 | 192 | forecasted <- c(forecasted, next_value) 193 | x <- c(x[-1], next_value) 194 | } 195 | forecasted 196 | } 197 | 198 | predictions <- forecast_future_values(model, prediction_data$x, 12*10) %>% 199 | descale_value() 200 | 201 | predictions 202 | ``` 203 | 204 | Let's see how our model did: 205 | 206 | ```{r} 207 | plot_predictions <- function(data, predictions) { 208 | data %>% 209 | filter(key == "testing") %>% 210 | add_column(prediction = predictions) %>% 211 | rbind(data %>% 212 | filter(key != "testing", 213 | year(index) > 1980) %>% 214 | add_column(prediction = NA_real_) 215 | ) %>% 216 | select(index, value, prediction) %>% 217 | gather(key, value, value, prediction) %>% 218 | ggplot(aes(x = index, y = value, color = key)) + 219 | geom_line() 220 | } 221 | 222 | plot_predictions(data, predictions) 223 | ``` 224 | 225 | ## Second way: have the dense layer output `timesteps` numbers 226 | 227 | The only difference compared to the previous approach is that we're outputting a vector of 120 steps instead of 1. This still corresponds to the "many to one" picture. 228 | 229 | ```{r} 230 | model <- keras_model_sequential() %>% 231 | layer_lstm(units = 128, input_shape = c(timesteps, 1)) %>% 232 | layer_dense(timesteps) %>% 233 | layer_reshape(c(timesteps, 1)) 234 | 235 | model %>% 236 | compile(optimizer='adam', loss='mse') 237 | ``` 238 | 239 | We also need to tweak our training dataset to accommodate the new model structure. Since the `y_sequence` output of `make_keras_data()` looks ten years into the future, we need to make sure we're not cheating: 240 | 241 | ```{r} 242 | training_data <- data %>% 243 | filter(year(index) > 1758, 244 | year(index) < (min(validation_range) - 10)) %>% 245 | make_keras_data() 246 | 247 | validation_data <- data %>% 248 | filter(year(index) >= (min(validation_range) - 10), 249 | year(index) < (min(testing_range) - 10)) %>% 250 | make_keras_data() 251 | 252 | full_training_data <- data %>% 253 | filter(year(index) > 1758, 254 | year(index) < (min(testing_range) - 10)) %>% 255 | make_keras_data() 256 | ``` 257 | 258 | We'll also skip tuning the model since we're focusing on the high-level architecture.
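As a quick, optional sanity check before fitting, you can confirm that the input and sequence-target arrays have matching shapes --- both should be (n, 120, 1), lining up with the `layer_reshape()` output above:

```{r}
dim(full_training_data$x)
dim(full_training_data$y_sequence)
```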
259 | 260 | ```{r} 261 | history <- model %>% fit( 262 | full_training_data$x, 263 | full_training_data$y_sequence, 264 | batch_size = 256, 265 | epochs = 20, 266 | ) 267 | 268 | predictions <- model %>% 269 | predict(prediction_data$x) %>% 270 | descale_value() %>% 271 | as.vector() 272 | 273 | plot_predictions(data, predictions) 274 | ``` 275 | 276 | ## Third way: "time distributed" dense layer 277 | 278 | We're on a roll! In this next approach, we're going to set `return_sequences = TRUE` in the LSTM layer; what this does is output a prediction for each of the steps in the input sequence. This corresponds to the right-most "many to many" diagram. Since the format of the training dataset is the same, we can just define the model and start training: 279 | 280 | ```{r} 281 | model <- keras_model_sequential() %>% 282 | layer_lstm(units = 64, input_shape = c(timesteps, 1), 283 | return_sequences = TRUE) %>% 284 | layer_lstm(units = 64, return_sequences = TRUE) %>% 285 | layer_dense(1) 286 | 287 | model %>% 288 | compile(optimizer='adam', loss='mse') 289 | 290 | history <- model %>% fit( 291 | full_training_data$x, 292 | full_training_data$y_sequence, 293 | batch_size = 256, 294 | epochs = 20, 295 | ) 296 | 297 | predictions <- model %>% 298 | predict(prediction_data$x) %>% 299 | descale_value() %>% 300 | as.vector() 301 | 302 | plot_predictions(data, predictions) 303 | ``` 304 | 305 | ## Fourth way: Seq2seq/Encoder-decoder architecture 306 | 307 | Adapted from [https://blog.keras.io/a-ten-minute-introduction-to-sequence-to-sequence-learning-in-keras.html](https://blog.keras.io/a-ten-minute-introduction-to-sequence-to-sequence-learning-in-keras.html) 308 | 309 | Our final approach is slightly more involved, and it corresponds to the middle "many to many" picture. We have an *encoder* network that summarizes the historical data into some numbers, and we also have a *decoder* network that translates these summary numbers into a forecast sequence. Each step in the prediction depends on the output and hidden state of the previous time step. During training, we provide the actual lagged values (this is known as teacher forcing), and during prediction, we feed back the prediction for the previous timestep into the network. Let's now build the model.
310 | 311 | ```{r} 312 | # specify the dimension of the LSTM layer 313 | latent_dim <- 128 314 | # takes an input sequence, runs it through a recurrent layer, 315 | # then returns the hidden states 316 | encoder_inputs <- layer_input(shape = list(timesteps, 1)) 317 | encoder_results <- encoder_inputs %>% 318 | layer_lstm(units = latent_dim, return_state = TRUE) 319 | # the hidden states are saved in the 2nd and 3rd elements of the list 320 | encoder_states <- encoder_results[2:3] 321 | 322 | # takes the target sequence lagged by one step 323 | # and feeds it to an LSTM whose hidden state is initialized using 324 | # the encoder states 325 | decoder_inputs <- layer_input(shape = list(NULL, 1)) 326 | decoder_lstm <- layer_lstm( 327 | units = latent_dim, return_sequences = TRUE, 328 | return_state = TRUE) 329 | decoder_results <- decoder_inputs %>% 330 | decoder_lstm(initial_state = encoder_states) 331 | # since we have `return_sequences = TRUE`, this dense layer 332 | # returns a scalar for each step in the `decoder_inputs` 333 | # sequence 334 | decoder_dense <- layer_dense(units = 1, activation = "linear") 335 | decoder_outputs <- decoder_results[[1]] %>% 336 | decoder_dense() 337 | 338 | model <- keras_model( 339 | inputs = list(encoder_inputs, decoder_inputs), 340 | outputs = decoder_outputs 341 | ) 342 | ``` 343 | 344 | Let's now compile and fit the model: 345 | 346 | ```{r} 347 | model %>% 348 | compile(optimizer='adam', loss='mse') 349 | 350 | history <- model %>% fit( 351 | list(full_training_data$x, full_training_data$y_sequence_lag), 352 | full_training_data$y_sequence, 353 | batch_size = 256, 354 | epochs = 20 355 | ) 356 | ``` 357 | 358 | During prediction, we need a way to pass states from one step to the next, so we need to put together encoder and decoder models using the layers that have been trained above: 359 | 360 | ```{r} 361 | # the encoder model takes input values and returns states 362 | encoder_model <- keras_model(encoder_inputs, encoder_states) 363 | 364 | # the decoder model takes both the previous timestep's value and 365 | # its state outputs and returns a prediction along with the new states 366 | decoder_state_input_h <- layer_input(shape = latent_dim) 367 | decoder_state_input_c <- layer_input(shape = latent_dim) 368 | decoder_states_inputs <- c(decoder_state_input_h, decoder_state_input_c) 369 | decoder_results <- decoder_lstm(decoder_inputs, initial_state = decoder_states_inputs) 370 | decoder_states <- decoder_results[2:3] 371 | decoder_outputs <- decoder_dense(decoder_results[[1]]) 372 | decoder_model <- keras_model( 373 | inputs = c(decoder_inputs, decoder_states_inputs), 374 | outputs = c(decoder_outputs, decoder_states) 375 | ) 376 | ``` 377 | 378 | We can now define a forecasting function using these models and make some predictions: 379 | 380 | ```{r} 381 | forecast_values_seq2seq <- function(input_seq, encoder, decoder, steps = 120) { 382 | states_value <- predict(encoder, input_seq) 383 | target_seq <- array(tail(input_seq[1,,], 1), dim=c(1, 1, 1)) 384 | forecasted_sequence <- numeric(0) 385 | 386 | for (i in seq_len(steps)) { 387 | decoder_predict <- predict(decoder, c(list(target_seq), states_value)) 388 | output_value <- decoder_predict[[1]] 389 | 390 | forecasted_sequence <- c(forecasted_sequence, output_value) 391 | 392 | 393 | target_seq <- array(output_value, dim=c(1, 1, 1)) 394 | 395 | ## update states 396 | state_h <- decoder_predict[[2]] 397 | state_c <- decoder_predict[[3]] 398 | states_value <- list(state_h, state_c) 399 | } 400 |
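  # at this point forecasted_sequence holds all `steps` one-step-ahead predictions
  # on the scaled scale; the caller rescales them with descale_value()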
forecasted_sequence 401 | } 402 | 403 | predictions <- forecast_values_seq2seq( 404 | prediction_data$x, encoder_model, decoder_model 405 | ) %>% descale_value() 406 | 407 | ``` 408 | 409 | Then plot the predictions: 410 | ```{r} 411 | plot_predictions(data, predictions) 412 | ``` 413 | 414 | 415 | 416 | -------------------------------------------------------------------------------- /notebooks/5_flights.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "Predicting flight arrival delays" 3 | output: html_notebook 4 | --- 5 | 6 | For this exercise, we'll work with the `flights` table from the `nycflights13` package. 7 | 8 | ```{r} 9 | library(tidyverse); library(magrittr) # magrittr provides %T>% and subtract() 10 | flights <- nycflights13::flights 11 | glimpse(flights) 12 | ``` 13 | 14 | Our goal in this exercise is to predict arrival delays using information available as of takeoff for each flight. 15 | 16 | Let's do some light feature engineering: 17 | 18 | ```{r} 19 | data <- flights %>% 20 | filter(!is.na(arr_delay)) %>% 21 | mutate(orig_dest = paste0(origin, dest)) %T>% 22 | (function(x) { 23 | orig_dest_levels <<- unique(x$orig_dest) 24 | carrier_levels <<- unique(x$carrier) 25 | }) %>% 26 | mutate( 27 | orig_dest = factor(orig_dest, levels = orig_dest_levels) %>% 28 | as.integer() %>% 29 | subtract(1), 30 | carrier = factor(carrier, levels = carrier_levels) %>% 31 | as.integer() %>% 32 | subtract(1), 33 | key = ifelse(month >= 11, "test", "train") 34 | ) 35 | training_data <- filter(data, key == "train") 36 | testing_data <- filter(data, key == "test") 37 | ``` 38 | 39 | We incorporate each predictor we want to include in the model by creating an input layer for it, then concatenating them. 40 | 41 | ```{r} 42 | library(keras) 43 | carrier_input <- layer_input(shape = c(1)) 44 | dep_delay_input <- layer_input(shape = c(1)) 45 | distance_input <- layer_input(shape = c(1)) 46 | origin_destination_input <- layer_input(shape = c(1)) 47 | 48 | carrier_output <- carrier_input %>% 49 | layer_embedding(16, 8) %>% 50 | layer_flatten() 51 | 52 | origin_destination_output <- origin_destination_input %>% 53 | layer_embedding(256, 128) %>% 54 | layer_flatten() 55 | 56 | main_layer <- layer_concatenate( 57 | c(dep_delay_input, distance_input, 58 | carrier_output, origin_destination_output) 59 | ) 60 | 61 | output <- main_layer %>% 62 | layer_dense(units = 256, activation = "relu") %>% 63 | layer_dropout(0.2) %>% 64 | layer_dense(units = 256) %>% 65 | (function(x) layer_concatenate(c(x, dep_delay_input))) %>% 66 | layer_dropout(0.2) %>% 67 | layer_dense(units = 128) %>% 68 | layer_dense(units = 1, activation = "linear") 69 | 70 | model <- keras_model( 71 | list(dep_delay_input, distance_input, carrier_input, 72 | origin_destination_input), 73 | output 74 | ) 75 | model %>% 76 | compile(optimizer = "adam", loss = "mse") 77 | ``` 78 | 79 | We're passing our categorical variables through an embedding layer via `layer_embedding()`. The use case here is slightly different from the NLP example, but the intuition is similar: we're asking the model to learn a lower-dimensional representation of the high-cardinality categorical variables in the context of the predictive modeling problem. 80 | 81 | To make training easier, we're going to scale each of the numeric variables, including the response. Here we define a function factory and use it to create some helper functions.
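The factory works because `scale()` records the centering and scaling values as attributes, which the two returned closures capture. A tiny illustration of those attributes (toy numbers):

```{r}
s <- scale(c(10, 20, 30))
attr(s, "scaled:center")  # 20
attr(s, "scaled:scale")   # 10
```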
82 | 83 | ```{r} 84 | make_scaler <- function(x) { 85 | scaled <- scale(x) 86 | center_value <- attr(scaled, "scaled:center") 87 | scale_value <- attr(scaled, "scaled:scale") 88 | list( 89 | scaler = function(x) { 90 | force(x) 91 | scale(x, center_value, scale_value) 92 | }, 93 | descaler = function(x) { 94 | force(x) 95 | x * scale_value + center_value 96 | } 97 | ) 98 | } 99 | 100 | c(scale_arr_delay, descale_arr_delay) %<-% make_scaler(training_data$arr_delay) 101 | c(scale_distance, descale_distance) %<-% make_scaler(training_data$distance) 102 | c(scale_dep_delay, descale_dep_delay) %<-% make_scaler(training_data$dep_delay) 103 | ``` 104 | 105 | Let's now train the model! 106 | 107 | ```{r} 108 | history <- model %>% 109 | fit(list(scale_dep_delay(training_data$dep_delay), 110 | scale_distance(training_data$distance), 111 | training_data$carrier, training_data$orig_dest), 112 | scale_arr_delay(training_data$arr_delay), 113 | batch_size = 256, epochs = 10, 114 | validation_split = 0.2) 115 | ``` 116 | 117 | Score the testing dataset: 118 | 119 | ```{r} 120 | predictions <- predict( 121 | model, 122 | list( 123 | scale_dep_delay(testing_data$dep_delay), 124 | scale_distance(testing_data$distance), 125 | testing_data$carrier, testing_data$orig_dest) 126 | ) %>% 127 | descale_arr_delay() 128 | ``` 129 | 130 | Look at some qualitative performance results: 131 | 132 | ```{r} 133 | testing_data %>% 134 | add_column(prediction = as.vector(predictions)) %>% 135 | select(arr_delay, prediction) %>% 136 | mutate(decile = cut(prediction, quantile(prediction, probs = seq(0, 1, 0.1)), 137 | labels = FALSE, include.lowest = TRUE) 138 | ) %>% 139 | group_by(decile) %>% 140 | summarize(mean_actual = mean(arr_delay), 141 | mean_predicted = mean(prediction) 142 | ) %>% 143 | gather(key, value, mean_actual, mean_predicted) %>% 144 | ggplot(aes(x = as.factor(decile), y = value, fill = key)) + 145 | geom_bar(stat = "identity", position = "dodge") 146 | ``` 147 | 148 | -------------------------------------------------------------------------------- /notebooks/rnn.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kevinykuo/user2018-deeplearning/258815dfccbbf9289c17b030b8d2b64d17012063/notebooks/rnn.png -------------------------------------------------------------------------------- /user2018-deeplearning.Rproj: -------------------------------------------------------------------------------- 1 | Version: 1.0 2 | 3 | RestoreWorkspace: Default 4 | SaveWorkspace: Default 5 | AlwaysSaveHistory: Default 6 | 7 | EnableCodeIndexing: Yes 8 | UseSpacesForTab: Yes 9 | NumSpacesForTab: 2 10 | Encoding: UTF-8 11 | 12 | RnwWeave: Sweave 13 | LaTeX: pdfLaTeX --------------------------------------------------------------------------------