├── .gitignore
├── Lessons
│   ├── 1_Deep_Learning_Intro.qmd
│   └── 2_Natural_Language_Processing.qmd
├── R-Deep-Learning.Rproj
├── README.md
└── install_script.R

/.gitignore:
--------------------------------------------------------------------------------
.Rproj.user
.Rhistory
.RData
.Ruserdata
.DS_Store

# Don't track these medical images in git.
data-raw/*.zip
data-raw/Open_I_abd_vs_CXRs/
data-raw/condensed_2018.json

# cloudml
runs

--------------------------------------------------------------------------------
/Lessons/1_Deep_Learning_Intro.qmd:
--------------------------------------------------------------------------------
---
title: "Deep Learning in R: Introduction"
author: "D-Lab"
format: html
editor: visual
---

## Libraries

We will be using the following libraries:

```{r, message = F, warning = F}
library(tensorflow)
library(keras)
library(tfdatasets)
library(tfhub)
library(tidyverse)
library(reticulate)
```

## What is Deep Learning?

In any kind of machine learning, we are interested in mapping inputs (pictures of Oski the Bear) to targets (the label "Oski the Bear"). The "machine" part means a computer running an algorithm. The "learning" part means an automatic search process that transforms the data into useful representations, guided by a feedback signal.

It turns out that this idea of searching for useful representations of data (such as histograms of the pixels in a picture), within a specified set of possibilities and with some rule for how good a representation is, solves a remarkably large set of tasks.

The "deep" in deep learning means that the models we build are layered representations of data. Modern deep learning models in production can have many layers, and all of these layers are learned automatically from training data. For example, the model GPT-3 has 96 layers.

## How to Build a Neural Network

These layered representations are learned by models called *neural networks*, in which information passes through successive layers to produce something (hopefully) useful at the end. The transformation a layer applies to its input data is determined by the layer's *weights*, so we say the weights *parameterize* the layer.

A model learns by finding values for the weights of all layers in the network so that example inputs are correctly mapped to their associated targets. We evaluate this correctness with a loss function, which measures the distance between our model's predictions and the true values.

The error is computed as the target value minus our estimate. The weight delta (how much each weight should change) is calculated as the error, times the slope of the activation function at the current point, times the vector of input features.

The weight deltas are added to the original weights, and the updated weights are used on the next pass through the training data (the next epoch). In a multi-layer network, the error signal is "backpropagated" from the output layer back toward the earlier layers so that every layer's weights can be updated.

With deep networks, this process also takes place between **hidden layers**: layers of *nonlinear* transformations connected only to the layers immediately before and after them. They are called "hidden" because their outputs are not shown as the final output.
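To make these update rules concrete, here is a minimal sketch (not part of the original lesson or of the Keras workflow below) that trains a single sigmoid neuron with plain R arithmetic on made-up data; all of the names (`lr`, `delta`, and so on) are our own.

```{r}
# A toy illustration of the update rule described above:
# a single sigmoid "neuron" with two inputs, trained by repeated weight updates.
set.seed(1)
x <- matrix(rnorm(20), nrow = 10, ncol = 2)  # 10 samples, 2 input features
y <- as.numeric(x[, 1] + x[, 2] > 0)         # made-up binary target
w <- c(0, 0)                                 # initial weights
lr <- 0.1                                    # learning rate

sigmoid <- function(z) 1 / (1 + exp(-z))

for (epoch in 1:100) {
  pred  <- sigmoid(x %*% w)           # forward pass: current estimates
  error <- y - pred                   # target minus estimate
  slope <- pred * (1 - pred)          # slope of the activation at each point
  delta <- t(x) %*% (error * slope)   # how much each weight should change
  w     <- w + lr * delta             # update the weights
}
round(w, 3)                           # both weights end up positive, as expected
```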
## Our First Model: Handwriting Digits

The problem we will try to solve is to classify grayscale images of handwritten digits into their numeric categories. Each image is 28 x 28 pixels and shows one of the ten digits 0-9.

```{r}
library(tensorflow)
library(keras)
mnist = dataset_mnist()
```

The MNIST dataset is in some sense the "Hello World!" of deep learning, so we will use it to explain various properties. The dataset comes preloaded in Keras as four R arrays, organized into two lists named `train` and `test`.

```{r}
train_images <- mnist$train$x
train_labels <- mnist$train$y

test_images <- mnist$test$x
test_labels <- mnist$test$y

str(train_images)
str(train_labels)
```

To solve our problem, we will build a neural network with Keras. Then we'll feed the network our training data, produce predictions for the test images, and see how well those predictions match the test labels.

Let's set up the model architecture.

```{r}
first_model <- keras_model_sequential() |>
  layer_dense(units = 512, activation = "relu") |>
  layer_dense(units = 10, activation = "softmax")
```

What have we done here? We have set up a model with a linear stack of layers using `keras_model_sequential()`. The model has two layers, both of which are *fully connected* (dense). The second (and final) layer returns an array of 10 probability scores, one for each of the 10 digit classes, giving the probability that the current image belongs to that class.

Now that we have a model, we compile it and pick three things:

1. How to optimize the model with `optimizer` (here we use `rmsprop`)
2. How to evaluate how good our predictions are with a `loss` function
3. What metrics we should care about with `metrics`

```{r}
## We don't save this to a variable because it works in place
compile(first_model,
        optimizer = "rmsprop",
        loss = "sparse_categorical_crossentropy",
        metrics = "accuracy")
```

Now that we have a compiled model, we need to make sure that our data is appropriate for the model. This is a *preprocessing* step. To prepare our image data, we reshape it into the shape that the model expects and scale it so all values lie between 0 and 1 instead of being pixel values between 0 and 255. Removing the scale factor of the pixel intensities helps the neural network optimize its weights.

The `array_reshape` function lets us reshape a three-dimensional array, like those in our `mnist` dataset, into a matrix.

```{r}
train_images <- array_reshape(train_images, c(60000, 28 * 28))
train_images <- train_images / 255

test_images <- array_reshape(test_images, c(10000, 28 * 28))
test_images <- test_images / 255
```

Now we fit our model to the training data. We take our model architecture and the training data, and provide the number of iterations through the training data (`epochs`) and the batch size (128 observations here).

```{r}
fit(first_model,
    train_images,
    train_labels,
    epochs = 5,
    # What size should the model break up the data into?
    batch_size = 128)
```

Very quickly we see that our model's accuracy gets very close to perfect. This has to do with the nature of this particular problem. Other deep learning problems may take much longer to train and reach far lower accuracy.
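Before we make predictions, it can be helpful to inspect the fitted model. This quick check is not in the original lesson: `summary()` reports each layer's output shape and parameter count, which for this model is (784 × 512 + 512) + (512 × 10 + 10) = 407,050 trainable parameters.

```{r}
# Inspect the fitted model: layer output shapes and parameter counts.
summary(first_model)
```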
We can now use our model to predict the probabilities of new digits from our test set.

```{r}
test_digits = test_images[1:10,]
predictions = predict(first_model, test_digits)
round(predictions[1,], 5)
```

Our model's highest probability score says that this image is a "7" (it sits in the 8th position because the first possible digit is 0). What does our test data say?

```{r}
test_labels[1]
```

Our model is correct! How good is it on the entire dataset? To find out, we use `evaluate` to compute our metrics over the entire test dataset.

```{r}
metrics <- evaluate(first_model, test_images, test_labels)
metrics["accuracy"]
```

Our accuracy on the test dataset is a bit lower than our accuracy on the training set, which is a sign of overfitting.

## TensorFlow

TensorFlow is an end-to-end open source machine learning platform, which focuses on simplicity and ease of use. Keras is a deep learning API built on top of TensorFlow. At its heart, TensorFlow is based on the concept of *tensors.*

A tensor is defined by:

- its number of axes (its rank). A rank 2 tensor is a matrix.

- its shape: how many entries it has along each axis.

- its datatype: what kind of data the tensor contains.

Let's look at these properties for the image data we have loaded.

```{r}
# Number of axes
length(dim(mnist$train$x))

# Shape
dim(mnist$train$x)

# Datatype
typeof(mnist$train$x)

# Here's what that image representation looks like for the second training example
plot(as.raster(abs(255 - mnist$train$x[2, , ]), max = 255))
```

TensorFlow tensors are immutable. To hold state that can change, we use a `tf$Variable`, which we create by supplying an initial value.

```{r}
exampleTensor <- tf$Variable(initial_value = tf$random$normal(shape(3, 1)))
exampleTensor

# If we want to change our variable we have to explicitly assign values
exampleTensor2 <- tf$Variable(initial_value = tf$random$normal(shape(3,1)))
exampleTensor2 <- exampleTensor2$assign(tf$ones(shape(3,1)))
exampleTensor2
```

TensorFlow offers a large number of mathematical operations. Here are some examples.

```{r}
a = tf$ones(c(2L,2L))
b = tf$square(a)
c = a + b
d = tf$sqrt(c)
e = tf$matmul(a,b)
f = e * d
```
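Because TensorFlow 2 runs eagerly by default, we can print these results directly and convert them back to ordinary R objects. This quick look is not part of the original lesson, and it assumes the `as.array()` method for tensors provided by the tensorflow package.

```{r}
b            # element-wise square of a: still all ones
e            # matrix product of two 2 x 2 matrices of ones: all twos
f            # element-wise product of e and d
as.array(f)  # convert a tensor back into a plain R matrix
```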
## Challenge

Our first model used a single representation layer before the final classification layer. Try the following experiments:

- Build a model with two representation layers. What is the effect on the validation and test accuracy?

- Build a model with three representation layers. What is the effect on the validation and test accuracy?

- What happens if we double the number of units?

- What happens if we halve the number of units?

- What happens if we use a different loss function? Try using `mse` instead of `sparse_categorical_crossentropy`.

## Universal Workflow for Machine Learning

Every deep learning or machine learning project follows a universal workflow. This is high level, but useful when considering whether a deep learning solution can be useful for your project.

1. First, define the problem. What is your end goal? Can you get a dataset that is annotated with appropriate labels for a supervised learning task?
2. Second, prepare your data. Use feature normalization to make sure that your data is appropriate for a deep learning algorithm.
3. Third, pick your evaluation protocol. If you have a small dataset, K-fold validation is a useful protocol. If you have a large dataset, a rule of thumb is to leave aside about 20% of the training data for validation.
4. Fourth, achieve statistical power by beating an appropriate baseline (e.g. random guessing). If your model cannot beat a random guess, then deep learning might not be the right paradigm for the problem.
5. Fifth, develop a model that can overfit. Overfitting indicates that your model has enough capacity to learn; you can then scale it back with regularization such as dropout and with feature engineering. It is much easier to rein in a model that overfits than to coax learning out of one that is too small.

## Sequential and Functional Keras APIs

In this section, we will walk through how to build a deep learning model with Keras. So far we have used the "sequential" API, which is easy to use but also limited. It is convenient for quickly spinning up and training a model, which we can do in a single pipe:

```{r}
model <- keras_model_sequential() |>
  layer_dense(units = 512, activation = "relu") |>
  layer_dense(units = 256, activation = "relu") |>
  layer_dense(units = 10, activation = "softmax") |>
  compile(optimizer = "rmsprop",
          loss = "sparse_categorical_crossentropy",
          metrics = "accuracy")

model |> fit(
  train_images,
  train_labels,
  epochs = 5,
  # What size should the model break up the data into?
  batch_size = 128)
```

```{r}
model |> evaluate(test_images, test_labels)
```

In practice, we build models with Keras using the Functional API. Its creator, Francois Chollet, describes this API as being like playing with LEGO bricks.

First, let's convert our previous model to the Functional API.

```{r}
# Naming layers isn't required, but is an option
# Declare an input layer that holds info about the shape and datatype of the
# data the model will use. Here the model will process batches where each
# sample is a vector of 3 features.
simple_inputs <- layer_input(shape = c(3), name = "first_input")

# Create a layer and compose it with the inputs
features <- simple_inputs |>
  layer_dense(64, activation = "relu")

# Obtain final outputs by chaining together an additional layer
outputs <- features |>
  layer_dense(10, activation = "softmax")

# Instantiate the model by specifying the inputs and outputs with keras_model()
simple_model <- keras_model(inputs = simple_inputs, outputs = outputs)
```
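As a quick sanity check (not in the original lesson), we can call `predict()` on a small batch of made-up inputs; the matrix `fake_batch` is our own invention. Each of the 2 samples gets a vector of 10 class probabilities that sums to 1 because of the softmax output.

```{r}
fake_batch <- matrix(rnorm(6), nrow = 2, ncol = 3)  # 2 samples, 3 features each
probs <- predict(simple_model, fake_batch)
dim(probs)       # 2 x 10
rowSums(probs)   # each row sums to 1
```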
Most models have multiple inputs and multiple outputs. For example, we might want to process different types of information and output different results for each. Consider a research queue on text data with the following structure: the title of an article, the abstract text, and some tagged information about the article added by a research assistant. The first two are text inputs, and the third is a categorical input.

Suppose we want our model to determine how close an article is to our research project, and which research assistant we should send it to for additional processing. We can build a model like this in a few lines of code with Keras.

```{r}
# The following is made-up data to demonstrate the model
# The total number of words that our model knows about
words <- 10000
num_tags <- 100
gsrs <- 3

title <- layer_input(shape = c(words))
text <- layer_input(shape = c(words))
tags <- layer_input(shape = c(num_tags))

## Combine features via concatenation
model_features <- layer_concatenate(list(title, text, tags)) |>
  layer_dense(64, activation = "relu")

closeness <- model_features |>
  layer_dense(1, activation = "sigmoid")

which_gsr <- model_features |>
  layer_dense(gsrs, activation = "softmax")

queue_model <- keras_model(
  inputs = list(title, text, tags),
  outputs = list(closeness, which_gsr)
)
```

Training this model works in a similar way to the Sequential API. We call `fit()` and pass the input and output data.

```{r}
samples <- 1280

## Helper function to create a random vectorized array
random_vectorized_array <- function(dim) {
  array(sample(0:1, prod(dim), replace = TRUE), dim)
}

## Create fake input and output data
title_data <- random_vectorized_array(c(samples, words))
text_data <- random_vectorized_array(c(samples, words))
tags_data <- random_vectorized_array(c(samples, num_tags))

closeness_data <- random_vectorized_array(c(samples, 1))
ra_data <- random_vectorized_array(c(samples, gsrs))

## Compile and fit the model
queue_model |>
  compile(
    optimizer = "rmsprop",
    ## Example of multiple loss functions and metrics, one per output
    loss = c("mse", "categorical_crossentropy"),
    metrics = c("mse", "accuracy")
  )

queue_model |>
  fit(
    x = list(title_data, text_data, tags_data),
    y = list(closeness_data, ra_data),
    epochs = 1
  )

## Evaluate the model metrics
queue_model |>
  evaluate(x = list(title_data, text_data, tags_data),
           y = list(closeness_data, ra_data))
```
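A fitted multi-output model returns one set of predictions per output head. The following check is not in the original lesson; it reuses the fake data from above for two articles.

```{r}
preds <- predict(queue_model, list(title_data[1:2, ], text_data[1:2, ], tags_data[1:2, ]))
str(preds)          # a list with one matrix per output head
preds[[1]]          # closeness scores, one per article
rowSums(preds[[2]]) # assignment probabilities sum to 1 for each article
```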
--------------------------------------------------------------------------------
/Lessons/2_Natural_Language_Processing.qmd:
--------------------------------------------------------------------------------
---
title: "Deep Learning in R: Deep Learning for Text"
author: "D-Lab"
format: html
editor: visual
---

## Libraries

We will be using the following libraries:

```{r, message = F, warning = F}
library(tensorflow)
library(keras)
library(tfdatasets)
library(tfhub)
library(tidyverse)
library(reticulate)
```

## Deep Learning for Text and Natural Language Processing

Language, especially in text, underpins most of our communication and human experience. A natural language is a human-created language shaped by evolutionary and historical processes. In contrast, a machine-readable language is highly structured, with precise syntax and a fixed vocabulary.

Modern Natural Language Processing (NLP) uses machine learning and large datasets to ingest pieces of text as inputs and return some type of prediction.

In this section, we cover how to prepare text data for deep learning and how to use transfer learning for NLP.

### Preparing text data

Deep learning models rely fundamentally on differentiable functions, which can only operate on numeric tensors; this means we need to convert raw text into numbers. *Text vectorization* is the process of taking text and turning it into numeric tensors. Any text vectorization process follows the same template.

1. Standardize the text to make it easy to process. Normally we convert it to lowercase and remove punctuation. Here is a simple example of what we mean.

```{r, standardize_example}
sentence_1 = "d-lab is A great PLACE To LEarn deep learning!!"
sentence_2 = "D-Lab is a great place TO learn Deep Learning!"

## Convert to lowercase and remove punctuation
sentence_1 |>
  str_to_lower() |>
  str_replace_all(pattern = "[:punct:]", "") |>
  trimws()

sentence_2 |>
  str_to_lower() |>
  str_replace_all(pattern = "[:punct:]", "") |>
  trimws()
```

2. Split the text into tokens, usually characters, words, or small groups of words. Most machine learning workflows tend to avoid character splitting; the more common choices are word-level tokenizers and N-gram tokenizers. N-gram representations that discard word order are also referred to as "bag of words."

When we care about word order, we use word-level tokenizers. When we care about which words appear, but not their order, we use N-gram tokenizers.

3. Convert each token into a numeric vector, usually after indexing all tokens present in the data.

```{r, eval = T}
## Use layer_text_vectorization in keras
## Have the layer return sequences of words encoded as integer indices
text_vec = layer_text_vectorization(output_mode = "int")
```

By default, the layer will convert text to lowercase, remove punctuation, and split on whitespace for tokenization.

We can pass custom functions to this layer if we so choose. Here's an example that reproduces the default behavior with custom functions.

```{r}
custom_fn = function(string_tensor){
  string_tensor |>
    ## convert strings to lower case
    tf$strings$lower() |>
    ## Replace punctuation with the empty string
    tf$strings$regex_replace("[[:punct:]]", "")
}
custom_split_fn = function(string_tensor){
  ## split strings on whitespace
  tf$strings$split(string_tensor)
}

text_vectorization_example = layer_text_vectorization(
  output_mode = "int",
  standardize = custom_fn,
  split = custom_split_fn
)
```
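As a quick check (not in the original lesson), we can run the custom standardizer on one of the example sentences from earlier; `tf$constant()` converts the R string into a string tensor first.

```{r}
# Lowercased, punctuation stripped, returned as a string tensor
custom_fn(tf$constant("D-Lab is a great place TO learn Deep Learning!"))
```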
To index the vocabulary of a text corpus, we call the `adapt()` method of the layer with a TF Dataset object that yields strings, or with an ordinary character vector. Here is an example drawn from the excellent book "Deep Learning with R" by Francois Chollet, Tomasz Kalinowski, and J.J. Allaire.

```{r}
dataset = c("I write, erase, rewrite", "Erase again, and then",
            "A poppy blooms.")
adapt(text_vectorization_example, dataset)

## retrieve the computed vocabulary via get_vocabulary()
## The first two entries are the mask token and the OOV index
get_vocabulary(text_vectorization_example)

### Encode and decode our example
vocab = text_vectorization_example |>
  get_vocabulary()
test_sent = "I write, rewrite, and still rewrite again."
encoded_sent = text_vectorization_example(test_sent)
decoded_sent = paste(vocab[as.integer(encoded_sent) + 1],
                     collapse = " ")
encoded_sent
decoded_sent
```

We will demonstrate how to build text models with the IMDB movie reviews dataset from [Maas et al. (2011), "Learning Word Vectors for Sentiment Analysis" (ACL 2011)](https://ai.stanford.edu/~amaas/data/sentiment/).

```{r}
set.seed(1337)
url = "https://ai.stanford.edu/~amaas/data/sentiment/aclImdb_v1.tar.gz"

dataset = get_file(
  "aclImdb_v1",
  url,
  untar = TRUE,
  cache_dir = ".",
  cache_subdir = ""
)

dataset_dir = file.path("aclImdb/")
list.files(dataset_dir)

## Look at the training example directory
train_dir = file.path(dataset_dir, "train")
list.files(train_dir)

## Prepare the data for training by removing the unlabeled reviews
remove_dir = file.path(train_dir, "unsup")
unlink(remove_dir, recursive = TRUE)

### Create a validation split with 20% of the training data held out
### the default batch_size is 32. We set it here to be explicit
batch_size = 32

## We need a shared seed so that the validation and training splits do not overlap
seed = 1337
raw_train_dataset = text_dataset_from_directory(
  "aclImdb/train",
  batch_size = batch_size,
  validation_split = 0.2,
  subset = "training",
  seed = seed
)

raw_val_dataset = text_dataset_from_directory(
  "aclImdb/train",
  batch_size = batch_size,
  validation_split = 0.2,
  subset = "validation",
  seed = seed
)

raw_test_dataset = text_dataset_from_directory(
  "aclImdb/test",
  batch_size = batch_size
)
```
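Before vectorizing, it can help to peek at a single batch to see what these datasets yield. This check is not in the original lesson; it assumes `as_iterator()` and `iter_next()` (from reticulate, as used in the Deep Learning with R book) work with these TF datasets.

```{r}
one_batch <- iter_next(as_iterator(raw_train_dataset))
one_batch[[1]][1]  # the first raw review in the batch (a string tensor)
one_batch[[2]][1]  # its label: 0 = negative, 1 = positive
```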
### Bag of Words Approach

```{r}
## Unigram approach
## Use multi-hot encoding to get binary vectors
text_vec = layer_text_vectorization(ngrams = 1,
                                    max_tokens = 20000,
                                    output_mode = "multi_hot")

## Get only the raw text inputs
raw_text_train = raw_train_dataset |>
  dataset_map(function(x,y) x)

## Index the dataset vocabulary with keras::adapt()
adapt(text_vec, raw_text_train)

binary_unigram_train_data = raw_train_dataset |>
  dataset_map(~list(text_vec(.x), .y))
binary_unigram_val_data = raw_val_dataset |>
  dataset_map(~list(text_vec(.x), .y))
binary_unigram_test_data = raw_test_dataset |>
  dataset_map(~list(text_vec(.x), .y))
```

We can write a reusable model constructor so that we can test different N-gram representations with the same architecture.

```{r}
nlp_model_constructor = function(max_tokens = 20000,
                                 hidden_dimensions = 16){
  inputs = layer_input(shape = c(max_tokens))
  outputs = inputs |>
    layer_dense(hidden_dimensions, activation = "relu") |>
    ## Include dropout
    layer_dropout(0.5) |>
    ## Predicting a single binary class, so sigmoid is appropriate
    layer_dense(1, activation = "sigmoid")

  model = keras_model(inputs, outputs)
  model |>
    compile(
      optimizer = "rmsprop",
      loss = "binary_crossentropy",
      metrics = "accuracy"
    )
  model
}
```

Train and test our basic model.

```{r}
basic_model = nlp_model_constructor()
basic_model

callbacks = list(
  callback_model_checkpoint("binary_unigram.keras", save_best_only = TRUE)
)

basic_model |>
  fit(
    dataset_cache(binary_unigram_train_data),
    validation_data = dataset_cache(binary_unigram_val_data),
    epochs = 5,
    callbacks = callbacks
  )

model = load_model_tf("binary_unigram.keras")
cat(sprintf("Test accuracy: %.3f\n", evaluate(model, binary_unigram_test_data)["accuracy"]))
```

88% is a strong start. A random baseline that assigned reviews to positive or negative at random would score about 50%, so our model definitely learns something from the data. We can use arbitrary N-grams by changing the `ngrams` argument to a different value. Let's try 3:

```{r}
text_vec3 = layer_text_vectorization(ngrams = 3,
                                     max_tokens = 20000,
                                     output_mode = "multi_hot")
adapt(text_vec3, raw_text_train)

## Wrapper function for vectorization
dataset_vectorize = function(dat){
  dat |>
    dataset_map(~list(text_vec3(.x), .y))
}

binary_3grams_train = raw_train_dataset |>
  dataset_vectorize()
binary_3grams_valid = raw_val_dataset |>
  dataset_vectorize()
binary_3grams_test = raw_test_dataset |>
  dataset_vectorize()

model_3gram = nlp_model_constructor()
model_3gram

callbacks = list(
  callback_model_checkpoint("binary_3gram.keras", save_best_only = TRUE)
)

model_3gram |>
  fit(
    dataset_cache(binary_3grams_train),
    validation_data = dataset_cache(binary_3grams_valid),
    ## Set low for time reasons. We'd normally start with more epochs
    epochs = 5,
    callbacks = callbacks
  )

result = load_model_tf("binary_3gram.keras")
cat(sprintf("Test accuracy: %.3f\n", evaluate(result, binary_3grams_test)["accuracy"]))
```

The increase in test accuracy suggests that the word order immediately around each word is quite informative.
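For a side-by-side comparison (not in the original lesson), we can reload the two checkpoints written above and evaluate each on its own test set.

```{r}
# Compare unigram and 3-gram bag-of-words models on their test data
unigram_acc <- evaluate(load_model_tf("binary_unigram.keras"),
                        binary_unigram_test_data)["accuracy"]
trigram_acc <- evaluate(load_model_tf("binary_3gram.keras"),
                        binary_3grams_test)["accuracy"]
round(c(unigram = unigram_acc, trigram = trigram_acc), 3)
```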
### Transfer Learning

Transfer learning is the process of storing knowledge gained while solving one problem and applying it to a different problem. Deep learning models can use pre-trained models as layers, potentially making large gains in accuracy on new problems.

We will use the same IMDB dataset to demonstrate transfer learning with a model called nnlm-en-dim50, a token-based text embedding trained on Google News's 7-billion-word English corpus.

To use it, we first create a Keras layer that wraps the pre-trained model downloaded from TensorFlow Hub and uses it to embed sentences.

```{r}
embedding = "https://tfhub.dev/google/nnlm-en-dim50/2"
nnlm_layer = tfhub::layer_hub(handle = embedding, trainable = TRUE)
```

We can now build a full model. The first layer is the TensorFlow Hub layer. It uses the pre-trained model to map a sentence into its embedding vector.

```{r}
## For ease of use we switch back to the Sequential API, but conceptually we could
## create our own model constructor with the Functional API as before.
hub_model = keras_model_sequential() |>
  nnlm_layer() |>
  layer_dense(16, activation = "relu") |>
  layer_dense(1)
hub_model |>
  compile(
    optimizer = "rmsprop",
    loss = loss_binary_crossentropy(from_logits = TRUE),
    metrics = "accuracy"
  )

## On a CPU this can take a long time to run, so we will only run a single epoch
## since this is a demonstration.
history = hub_model |>
  fit(
    raw_train_dataset,
    epochs = 1,
    validation_data = raw_val_dataset
  )
```

As before, we can evaluate the model.

```{r}
hub_results = hub_model |>
  evaluate(raw_test_dataset)
hub_results
```

Transfer learning can be an excellent tool for deep learning problems. Here, without any real fine-tuning, we already see a respectable accuracy score.

This concludes the introduction to Deep Learning with R. We strongly encourage you to check out the additional TensorFlow resources at [RStudio TensorFlow](https://tensorflow.rstudio.com/) for more ideas.

--------------------------------------------------------------------------------
/R-Deep-Learning.Rproj:
--------------------------------------------------------------------------------
Version: 1.0

RestoreWorkspace: Default
SaveWorkspace: Default
AlwaysSaveHistory: Default

EnableCodeIndexing: Yes
UseSpacesForTab: Yes
NumSpacesForTab: 2
Encoding: UTF-8

RnwWeave: Sweave
LaTeX: pdfLaTeX

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# R Deep Learning

[![DataHub](https://img.shields.io/badge/launch-datahub-blue)](https://dlab.datahub.berkeley.edu/hub/user-redirect/git-pull?repo=https%3A%2F%2Fgithub.com%2Fdlab-berkeley%2FR-Deep-Learning&urlpath=rstudio%2F&branch=main)
[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/dlab-berkeley/R-Deep-Learning/HEAD?urlpath=rstudio)
[![License: CC BY 4.0](https://img.shields.io/badge/License-CC_BY_4.0-lightgrey.svg)](https://creativecommons.org/licenses/by/4.0/)
[![Workshop Materials](https://img.shields.io/badge/D--Lab-Workshop%20Materials-blue)](https://docs.google.com/presentation/d/1eQsjdzcareMpEK59EJS5gLqOWIcmQpTjDvQnXIBOh_c/edit?usp=sharing)

This is the repository for D-Lab's six-hour Introduction to Deep Learning in R
workshop.

### Prerequisites
We recommend attendees be intermediate R users and have had some prior
exposure to the concepts in
[R-Machine-Learning](https://github.com/dlab-berkeley/R-Machine-Learning).

Check D-Lab's [Learning Pathways](https://dlab-berkeley.github.io/dlab-workshops/R_path.html) to figure out which of our workshops to take!

## Workshop Goals

In this workshop, we provide an introduction to deep learning using TensorFlow
and Keras in R.
First, we will cover the basics of what makes deep learning
"deep." Then, we will explore using code to classify images. Along the way, we
will build up the workflow of a deep learning project.

## Installation Instructions

We will use RStudio to go through the workshop materials, which requires
installation of R, RStudio, and TensorFlow. Complete the following steps if you
want to work locally.

1. Download [R](https://cloud.r-project.org/) and
   [RStudio](https://www.rstudio.com/products/rstudio/download/)

2. Within the R console, run the following commands

```
install.packages(c("tensorflow", "keras", "reticulate")) # Pulls in all R dependencies necessary for TensorFlow in R

library(reticulate)

# Set up R with a Python installation it can use
virtualenv_create("r-reticulate", python = install_python())

library(keras)
install_keras(envname = "r-reticulate") # Install the TensorFlow and Keras Python modules
```

After these steps you will have a working Keras and TensorFlow installation.
This process will take some time if you decide to download to your local
machine. To check the TensorFlow installation on your machine, run the
following in the console

```
library(tensorflow)
tf$constant("Hello Tensorflow!")
```

3. Install the additional packages required for this workshop

```
install.packages(c("tfhub", "tfdatasets"))
```

# About the UC Berkeley D-Lab

D-Lab works with Berkeley faculty, research staff, and students to advance
data-intensive social science and humanities research. Our goal at D-Lab is to
provide practical training, staff support, resources, and space to enable you to
use R for your own research applications. Our services cater to all skill levels
and no programming, statistical, or computer science backgrounds are necessary.
We offer these services in the form of workshops, one-to-one consulting, and
working groups that cover a variety of research topics, digital tools, and
programming languages.

Visit the [D-Lab homepage](https://dlab.berkeley.edu/) to learn more about us.
You can view our [calendar](https://dlab.berkeley.edu/events/calendar) for
upcoming events, learn about how to utilize our
[consulting](https://dlab.berkeley.edu/consulting) and [data
services](https://dlab.berkeley.edu/data), and check out upcoming
[workshops](https://dlab.berkeley.edu/events/workshops). Subscribe to our
[newsletter](https://dlab.berkeley.edu/news/weekly-newsletter) to stay up to
date on D-Lab events, services, and opportunities.
# Additional Resources

* Massive open online courses
    * [fast.ai - Practical Deep Learning for Coders](https://course.fast.ai/)
    * [Kaggle Deep Learning](https://www.kaggle.com/learn/deep-learning)
    * [Google Machine Learning Crash Course](https://developers.google.com/machine-learning/crash-course/)
        * [See this](https://developers.google.com/machine-learning/crash-course/fitter/graph) sweet interactive learning rate tool
    * [Google seedbank examples](https://tools.google.com/seedbank/seeds)
    * [DeepLearning.ai](https://www.deeplearning.ai/)

* Workshops
    * [Nvidia's Modeling Time Series Data with Recurrent Neural Networks in Keras](https://courses.nvidia.com/courses/course-v1:DLI+L-HX-05+V1/about)

* Stanford
    * CS 20 - [Tensorflow for Deep Learning Research](http://web.stanford.edu/class/cs20si/syllabus.html)
    * CS 230 - [Deep Learning](http://cs230.stanford.edu/)
    * CS 231n - [Neural Networks for Visual Recognition](http://cs231n.github.io/)
    * CS 224n - [Natural Language Processing with Deep Learning](http://web.stanford.edu/class/cs224n/)

* Berkeley
    * Machine Learning at Berkeley [ML@B](https://ml.berkeley.edu/)
    * CS [189/289A](https://people.eecs.berkeley.edu/~jrs/189/)

* UToronto CSC 321 - [Intro to Deep Learning](http://www.cs.toronto.edu/~rgrosse/courses/csc321_2018/)

* Videos
    * J.J. Allaire [talk at RStudioConf 2018](https://www.rstudio.com/resources/videos/machine-learning-with-tensorflow-and-r/)

* Books
    * F. Chollet and J.J. Allaire - [Deep Learning with R](https://www.manning.com/books/deep-learning-with-r)
    * E. Charniak - [Introduction to Deep Learning](https://mitpress.mit.edu/books/introduction-deep-learning)
    * I. Goodfellow, Y. Bengio, A. Courville - [www.deeplearningbook.org](https://www.deeplearningbook.org/)
    * Zhang et al. - [Dive into Deep Learning](http://en.diveintodeeplearning.org/)

# Other D-Lab R workshops

D-Lab offers a variety of R workshops, catered toward different levels of
expertise.

## Introductory Workshops

* [R Data Wrangling](https://github.com/dlab-berkeley/R-Data-Wrangling)
* [R Data Visualization](https://github.com/dlab-berkeley/R-Data-Visualization)
* [R Census Data](https://github.com/dlab-berkeley/Census-Data-in-R)

## Intermediate and Advanced Workshops

* [R Geospatial Fundamentals](https://github.com/dlab-berkeley/R-Geospatial-Fundamentals)
* [R Machine Learning](https://github.com/dlab-berkeley/R-Machine-Learning)
* [R Deep Learning](https://github.com/dlab-berkeley/R-Deep-Learning)

--------------------------------------------------------------------------------
/install_script.R:
--------------------------------------------------------------------------------
# Pulls in all R dependencies necessary for TensorFlow in R
install.packages(c("tensorflow", "keras", "reticulate"))
# Load reticulate
library(reticulate)
# Set up R with a Python installation it can use
virtualenv_create("r-reticulate", python = install_python())
# Install TensorFlow and Keras python modules
library(keras)
install_keras(envname = "r-reticulate")
# Install additional packages
install.packages(c("tfhub", "tfdatasets"))
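# Optional check (not part of the original script): confirm that TensorFlow loads
# and report the installed version, mirroring the check shown in the README.
library(tensorflow)
tf$constant("Hello Tensorflow!")
tf_version()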