├── .gitignore
├── Lessons
│   ├── 1_Deep_Learning_Intro.qmd
│   └── 2_Natural_Language_Processing.qmd
├── R-Deep-Learning.Rproj
├── README.md
└── install_script.R

/.gitignore:
--------------------------------------------------------------------------------
.Rproj.user
.Rhistory
.RData
.Ruserdata
.DS_Store

# Don't track these medical images in git.
data-raw/*.zip
data-raw/Open_I_abd_vs_CXRs/
data-raw/condensed_2018.json

# cloudml
runs

--------------------------------------------------------------------------------
/Lessons/1_Deep_Learning_Intro.qmd:
--------------------------------------------------------------------------------
---
title: "Deep Learning in R: Introduction"
author: "D-Lab"
format: html
editor: visual
---

## Libraries

We will be using the following libraries:

```{r, message = F, warning = F}
library(tensorflow)
library(keras)
library(tfdatasets)
library(tfhub)
library(tidyverse)
library(reticulate)
```

## What is Deep Learning?

In any kind of machine learning, we are interested in mapping inputs (pictures of Oski the Bear) to targets (the label "Oski the Bear"). The "machine" part means a computer running an algorithm. The "learning" part means an automatic search process that transforms the data into useful representations, guided by a feedback signal.

It turns out that this idea of searching for useful representations of data (such as histograms of the pixels in a picture), within a specified set of possibilities and with some rule for how good a representation is, solves a remarkably large set of tasks.

The "deep" in deep learning means that the models we build are layered representations of data. Modern deep learning models in production can have many layers, and all of these layers are learned automatically from training data. For example, the model GPT-3 has 96 layers.

## How to Build a Neural Network

These layered representations are learned by models called *neural networks*, in which information passes through successive layers to produce something (hopefully) useful at the end. The transformation a layer applies to its input data is determined by the layer's *weights*, so we say the weights *parameterize* the layer.

A model learns by finding values for the weights of all layers in the network so that example inputs are correctly mapped to their associated targets. We evaluate this correctness with a loss function, which measures the distance between our model's predictions and the true values.

The error is computed as the target value minus our estimate. The weight delta (how much each weight should change) is calculated as the error, times the slope of the activation function at the current point, times the vector of input features.

The weight deltas are added to the original weights, and the updated weights are used on the next pass through the training data (the next epoch). In a multi-layer network, the error signal is "backpropagated" from the output layer back toward the earlier layers so that every layer's weights can be updated.

With deep networks, this process also takes place between **hidden layers**: layers of *nonlinear* transformations connected only to the layers immediately before and after them. They are called "hidden" because their outputs are not shown as the final output.
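To make these update rules concrete, here is a minimal sketch (not part of the original lesson or of the Keras workflow below) that trains a single sigmoid neuron with plain R arithmetic on made-up data; all of the names (`lr`, `delta`, and so on) are our own.

```{r}
# A toy illustration of the update rule described above:
# a single sigmoid "neuron" with two inputs, trained by repeated weight updates.
set.seed(1)
x <- matrix(rnorm(20), nrow = 10, ncol = 2)  # 10 samples, 2 input features
y <- as.numeric(x[, 1] + x[, 2] > 0)         # made-up binary target
w <- c(0, 0)                                 # initial weights
lr <- 0.1                                    # learning rate

sigmoid <- function(z) 1 / (1 + exp(-z))

for (epoch in 1:100) {
  pred  <- sigmoid(x %*% w)           # forward pass: current estimates
  error <- y - pred                   # target minus estimate
  slope <- pred * (1 - pred)          # slope of the activation at each point
  delta <- t(x) %*% (error * slope)   # how much each weight should change
  w     <- w + lr * delta             # update the weights
}
round(w, 3)                           # both weights end up positive, as expected
```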
## Our First Model: Handwriting Digits

The problem we will try to solve is to classify grayscale images of handwritten digits into their numeric categories. Each image is 28 x 28 pixels and shows one of the ten digits 0-9.

```{r}
library(tensorflow)
library(keras)
mnist = dataset_mnist()
```

The MNIST dataset is in some sense the "Hello World!" of deep learning, so we will use it to explain various properties. The dataset comes preloaded in Keras as four R arrays, organized into two lists named `train` and `test`.

```{r}
train_images <- mnist$train$x
train_labels <- mnist$train$y

test_images <- mnist$test$x
test_labels <- mnist$test$y

str(train_images)
str(train_labels)
```

To solve our problem, we will build a neural network with Keras. Then we'll feed the network our training data, produce predictions for the test images, and see how well those predictions match the test labels.

Let's set up the model architecture.

```{r}
first_model <- keras_model_sequential() |>
  layer_dense(units = 512, activation = "relu") |>
  layer_dense(units = 10, activation = "softmax")
```

What have we done here? We have set up a model with a linear stack of layers using `keras_model_sequential()`. The model has two layers, both of which are *fully connected* (dense). The second (and final) layer returns an array of 10 probability scores, one for each of the 10 digit classes, giving the probability that the current image belongs to that class.

Now that we have a model, we compile it and pick three things:

1. How to optimize the model with `optimizer` (here we use `rmsprop`)
2. How to evaluate how good our predictions are with a `loss` function
3. What metrics we should care about with `metrics`

```{r}
## We don't save this to a variable because it works in place
compile(first_model,
        optimizer = "rmsprop",
        loss = "sparse_categorical_crossentropy",
        metrics = "accuracy")
```

Now that we have a compiled model, we need to make sure that our data is appropriate for the model. This is a *preprocessing* step. To prepare our image data, we reshape it into the shape that the model expects and scale it so all values lie between 0 and 1 instead of being pixel values between 0 and 255. Removing the scale factor of the pixel intensities helps the neural network optimize its weights.

The `array_reshape` function lets us reshape a three-dimensional array, like those in our `mnist` dataset, into a matrix.

```{r}
train_images <- array_reshape(train_images, c(60000, 28 * 28))
train_images <- train_images / 255

test_images <- array_reshape(test_images, c(10000, 28 * 28))
test_images <- test_images / 255
```

Now we fit our model to the training data. We take our model architecture and the training data, and provide the number of iterations through the training data (`epochs`) and the batch size (128 observations here).

```{r}
fit(first_model,
    train_images,
    train_labels,
    epochs = 5,
    # What size should the model break up the data into?
    batch_size = 128)
```

Very quickly we see that our model's accuracy gets very close to perfect. This has to do with the nature of this particular problem. Other deep learning problems may take much longer to train and reach far lower accuracy.
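Before we make predictions, it can be helpful to inspect the fitted model. This quick check is not in the original lesson: `summary()` reports each layer's output shape and parameter count, which for this model is (784 × 512 + 512) + (512 × 10 + 10) = 407,050 trainable parameters.

```{r}
# Inspect the fitted model: layer output shapes and parameter counts.
summary(first_model)
```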
We can now use our model to predict the probabilities of new digits from our test set.

```{r}
test_digits = test_images[1:10,]
predictions = predict(first_model, test_digits)
round(predictions[1,], 5)
```

Our model's highest probability score says that this image is a "7" (it sits in the 8th position because the first possible digit is 0). What does our test data say?

```{r}
test_labels[1]
```

Our model is correct! How good is it on the entire dataset? To find out, we use `evaluate` to compute our metrics over the entire test dataset.

```{r}
metrics <- evaluate(first_model, test_images, test_labels)
metrics["accuracy"]
```

Our accuracy on the test dataset is a bit lower than our accuracy on the training set, which is a sign of overfitting.

## TensorFlow

TensorFlow is an end-to-end open source machine learning platform, which focuses on simplicity and ease of use. Keras is a deep learning API built on top of TensorFlow. At its heart, TensorFlow is based on the concept of *tensors.*

A tensor is defined by:

- its number of axes (its rank). A rank 2 tensor is a matrix.

- its shape: how many entries it has along each axis.

- its datatype: what kind of data the tensor contains.

Let's look at these properties for the image data we have loaded.

```{r}
# Number of axes
length(dim(mnist$train$x))

# Shape
dim(mnist$train$x)

# Datatype
typeof(mnist$train$x)

# Here's what that image representation looks like for the second training example
plot(as.raster(abs(255 - mnist$train$x[2, , ]), max = 255))
```

TensorFlow tensors are immutable. To hold state that can change, we use a `tf$Variable`, which we create by supplying an initial value.

```{r}
exampleTensor <- tf$Variable(initial_value = tf$random$normal(shape(3, 1)))
exampleTensor

# If we want to change our variable we have to explicitly assign values
exampleTensor2 <- tf$Variable(initial_value = tf$random$normal(shape(3,1)))
exampleTensor2 <- exampleTensor2$assign(tf$ones(shape(3,1)))
exampleTensor2
```

TensorFlow offers a large number of mathematical operations. Here are some examples.

```{r}
a = tf$ones(c(2L,2L))
b = tf$square(a)
c = a + b
d = tf$sqrt(c)
e = tf$matmul(a,b)
f = e * d
```
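Because TensorFlow 2 runs eagerly by default, we can print these results directly and convert them back to ordinary R objects. This quick look is not part of the original lesson, and it assumes the `as.array()` method for tensors provided by the tensorflow package.

```{r}
b            # element-wise square of a: still all ones
e            # matrix product of two 2 x 2 matrices of ones: all twos
f            # element-wise product of e and d
as.array(f)  # convert a tensor back into a plain R matrix
```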
## Challenge

Our first model used a single representation layer before the final classification layer. Try the following experiments:

- Build a model with two representation layers. What is the effect on the validation and test accuracy?

- Build a model with three representation layers. What is the effect on the validation and test accuracy?

- What happens if we double the number of units?

- What happens if we halve the number of units?

- What happens if we use a different loss function? Try using `mse` instead of `sparse_categorical_crossentropy`.

## Universal Workflow for Machine Learning

Every deep learning or machine learning project follows a universal workflow. This is high level, but useful when considering whether a deep learning solution can be useful for your project.

1. First, define the problem. What is your end goal? Can you get a dataset that is annotated with appropriate labels for a supervised learning task?
2. Second, prepare your data. Use feature normalization to make sure that your data is appropriate for a deep learning algorithm.
3. Third, pick your evaluation protocol. If you have a small dataset, K-fold validation is a useful protocol. If you have a large dataset, a rule of thumb is to leave aside about 20% of the training data for validation.
4. Fourth, achieve statistical power by beating an appropriate baseline (e.g. random guessing). If your model cannot beat a random guess, then deep learning might not be the right paradigm for the problem.
5. Fifth, develop a model that can overfit. Overfitting indicates that your model has enough capacity to learn; you can then scale it back with regularization such as dropout and with feature engineering. It is much easier to rein in a model that overfits than to coax learning out of one that is too small.

## Sequential and Functional Keras APIs

In this section, we will walk through how to build a deep learning model with Keras. So far we have used the "sequential" API, which is easy to use but also limited. It is convenient for quickly spinning up and training a model, which we can do in a single pipe:

```{r}
model <- keras_model_sequential() |>
  layer_dense(units = 512, activation = "relu") |>
  layer_dense(units = 256, activation = "relu") |>
  layer_dense(units = 10, activation = "softmax") |>
  compile(optimizer = "rmsprop",
          loss = "sparse_categorical_crossentropy",
          metrics = "accuracy")

model |> fit(
  train_images,
  train_labels,
  epochs = 5,
  # What size should the model break up the data into?
  batch_size = 128)
```

```{r}
model |> evaluate(test_images, test_labels)
```

In practice, we build models with Keras using the Functional API. Its creator, Francois Chollet, describes this API as being like playing with LEGO bricks.

First, let's convert our previous model to the Functional API.

```{r}
# Naming layers isn't required, but is an option
# Declare an input layer that holds info about the shape and datatype of the
# data the model will use. Here the model will process batches where each
# sample is a vector of 3 features.
simple_inputs <- layer_input(shape = c(3), name = "first_input")

# Create a layer and compose it with the inputs
features <- simple_inputs |>
  layer_dense(64, activation = "relu")

# Obtain final outputs by chaining together an additional layer
outputs <- features |>
  layer_dense(10, activation = "softmax")

# Instantiate the model by specifying the inputs and outputs with keras_model()
simple_model <- keras_model(inputs = simple_inputs, outputs = outputs)
```
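As a quick sanity check (not in the original lesson), we can call `predict()` on a small batch of made-up inputs; the matrix `fake_batch` is our own invention. Each of the 2 samples gets a vector of 10 class probabilities that sums to 1 because of the softmax output.

```{r}
fake_batch <- matrix(rnorm(6), nrow = 2, ncol = 3)  # 2 samples, 3 features each
probs <- predict(simple_model, fake_batch)
dim(probs)       # 2 x 10
rowSums(probs)   # each row sums to 1
```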
Most models have multiple inputs and multiple outputs. For example, we might want to process different types of information and output different results for each. Consider a research queue on text data with the following structure: the title of an article, the abstract text, and some tagged information about the article added by a research assistant. The first two are text inputs, and the third is a categorical input.

Suppose we want our model to determine how close an article is to our research project, and which research assistant we should send it to for additional processing. We can build a model like this in a few lines of code with Keras.

```{r}
# The following is made-up data to demonstrate the model
# The total number of words that our model knows about
words <- 10000
num_tags <- 100
gsrs <- 3

title <- layer_input(shape = c(words))
text <- layer_input(shape = c(words))
tags <- layer_input(shape = c(num_tags))

## Combine features via concatenation
model_features <- layer_concatenate(list(title, text, tags)) |>
  layer_dense(64, activation = "relu")

closeness <- model_features |>
  layer_dense(1, activation = "sigmoid")

which_gsr <- model_features |>
  layer_dense(gsrs, activation = "softmax")

queue_model <- keras_model(
  inputs = list(title, text, tags),
  outputs = list(closeness, which_gsr)
)
```

Training this model works in a similar way to the Sequential API. We call `fit()` and pass the input and output data.

```{r}
samples <- 1280

## Helper function to create a random vectorized array
random_vectorized_array <- function(dim) {
  array(sample(0:1, prod(dim), replace = TRUE), dim)
}

## Create fake input and output data
title_data <- random_vectorized_array(c(samples, words))
text_data <- random_vectorized_array(c(samples, words))
tags_data <- random_vectorized_array(c(samples, num_tags))

closeness_data <- random_vectorized_array(c(samples, 1))
ra_data <- random_vectorized_array(c(samples, gsrs))

## Compile and fit the model
queue_model |>
  compile(
    optimizer = "rmsprop",
    ## Example of multiple loss functions and metrics, one per output
    loss = c("mse", "categorical_crossentropy"),
    metrics = c("mse", "accuracy")
  )

queue_model |>
  fit(
    x = list(title_data, text_data, tags_data),
    y = list(closeness_data, ra_data),
    epochs = 1
  )

## Evaluate the model metrics
queue_model |>
  evaluate(x = list(title_data, text_data, tags_data),
           y = list(closeness_data, ra_data))
```
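A fitted multi-output model returns one set of predictions per output head. The following check is not in the original lesson; it reuses the fake data from above for two articles.

```{r}
preds <- predict(queue_model, list(title_data[1:2, ], text_data[1:2, ], tags_data[1:2, ]))
str(preds)          # a list with one matrix per output head
preds[[1]]          # closeness scores, one per article
rowSums(preds[[2]]) # assignment probabilities sum to 1 for each article
```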
--------------------------------------------------------------------------------
/Lessons/2_Natural_Language_Processing.qmd:
--------------------------------------------------------------------------------
---
title: "Deep Learning in R: Deep Learning for Text"
author: "D-Lab"
format: html
editor: visual
---

## Libraries

We will be using the following libraries:

```{r, message = F, warning = F}
library(tensorflow)
library(keras)
library(tfdatasets)
library(tfhub)
library(tidyverse)
library(reticulate)
```

## Deep Learning for Text and Natural Language Processing

Language, especially in text, underpins most of our communication and human experience. A natural language is a human-created language shaped by evolutionary and historical processes. In contrast, a machine-readable language is highly structured, with precise syntax and a fixed vocabulary.

Modern Natural Language Processing (NLP) uses machine learning and large datasets to ingest pieces of text as inputs and return some type of prediction.

In this section, we cover how to prepare text data for deep learning and how to use transfer learning for NLP.

### Preparing text data

Deep learning models rely fundamentally on differentiable functions, which can only operate on numeric tensors; this means we need to convert raw text into numbers. *Text vectorization* is the process of taking text and turning it into numeric tensors. Any text vectorization process follows the same template.

1. Standardize the text to make it easy to process. Normally we convert it to lowercase and remove punctuation. Here is a simple example of what we mean.

```{r, standardize_example}
sentence_1 = "d-lab is A great PLACE To LEarn deep learning!!"
sentence_2 = "D-Lab is a great place TO learn Deep Learning!"

## Convert to lowercase and remove punctuation
sentence_1 |>
  str_to_lower() |>
  str_replace_all(pattern = "[:punct:]", "") |>
  trimws()

sentence_2 |>
  str_to_lower() |>
  str_replace_all(pattern = "[:punct:]", "") |>
  trimws()
```

2. Split the text into tokens, usually characters, words, or small groups of words. Most machine learning workflows tend to avoid character splitting; the more common choices are word-level tokenizers and N-gram tokenizers. N-gram representations that discard word order are also referred to as "bag of words."

When we care about word order, we use word-level tokenizers. When we care about which words appear, but not their order, we use N-gram tokenizers.

3. Convert each token into a numeric vector, usually after indexing all tokens present in the data.

```{r, eval = T}
## Use layer_text_vectorization in keras
## Have the layer return sequences of words encoded as integer indices
text_vec = layer_text_vectorization(output_mode = "int")
```

By default, the layer will convert text to lowercase, remove punctuation, and split on whitespace for tokenization.

We can pass custom functions to this layer if we so choose. Here's an example that reproduces the default behavior with custom functions.

```{r}
custom_fn = function(string_tensor){
  string_tensor |>
    ## convert strings to lower case
    tf$strings$lower() |>
    ## Replace punctuation with the empty string
    tf$strings$regex_replace("[[:punct:]]", "")
}
custom_split_fn = function(string_tensor){
  ## split strings on whitespace
  tf$strings$split(string_tensor)
}

text_vectorization_example = layer_text_vectorization(
  output_mode = "int",
  standardize = custom_fn,
  split = custom_split_fn
)
```
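As a quick check (not in the original lesson), we can run the custom standardizer on one of the example sentences from earlier; `tf$constant()` converts the R string into a string tensor first.

```{r}
# Lowercased, punctuation stripped, returned as a string tensor
custom_fn(tf$constant("D-Lab is a great place TO learn Deep Learning!"))
```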
To index the vocabulary of a text corpus, we call the `adapt()` method of the layer with a TF Dataset object that yields strings, or with an ordinary character vector. Here is an example drawn from the excellent book "Deep Learning with R" by Francois Chollet, Tomasz Kalinowski, and J.J. Allaire.

```{r}
dataset = c("I write, erase, rewrite", "Erase again, and then",
            "A poppy blooms.")
adapt(text_vectorization_example, dataset)

## retrieve the computed vocabulary via get_vocabulary()
## The first two entries are the mask token and the OOV index
get_vocabulary(text_vectorization_example)

### Encode and decode our example
vocab = text_vectorization_example |>
  get_vocabulary()
test_sent = "I write, rewrite, and still rewrite again."
encoded_sent = text_vectorization_example(test_sent)
decoded_sent = paste(vocab[as.integer(encoded_sent) + 1],
                     collapse = " ")
encoded_sent
decoded_sent
```

We will demonstrate how to build text models with the IMDB movie reviews dataset from [Maas et al. (2011), "Learning Word Vectors for Sentiment Analysis" (ACL 2011)](https://ai.stanford.edu/~amaas/data/sentiment/).

```{r}
set.seed(1337)
url = "https://ai.stanford.edu/~amaas/data/sentiment/aclImdb_v1.tar.gz"

dataset = get_file(
  "aclImdb_v1",
  url,
  untar = TRUE,
  cache_dir = ".",
  cache_subdir = ""
)

dataset_dir = file.path("aclImdb/")
list.files(dataset_dir)

## Look at the training example directory
train_dir = file.path(dataset_dir, "train")
list.files(train_dir)

## Prepare the data for training by removing the unlabeled reviews
remove_dir = file.path(train_dir, "unsup")
unlink(remove_dir, recursive = TRUE)

### Create a validation split with 20% of the training data held out
### the default batch_size is 32. We set it here to be explicit
batch_size = 32

## We need a shared seed so that the validation and training splits do not overlap
seed = 1337
raw_train_dataset = text_dataset_from_directory(
  "aclImdb/train",
  batch_size = batch_size,
  validation_split = 0.2,
  subset = "training",
  seed = seed
)

raw_val_dataset = text_dataset_from_directory(
  "aclImdb/train",
  batch_size = batch_size,
  validation_split = 0.2,
  subset = "validation",
  seed = seed
)

raw_test_dataset = text_dataset_from_directory(
  "aclImdb/test",
  batch_size = batch_size
)
```
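Before vectorizing, it can help to peek at a single batch to see what these datasets yield. This check is not in the original lesson; it assumes `as_iterator()` and `iter_next()` (from reticulate, as used in the Deep Learning with R book) work with these TF datasets.

```{r}
one_batch <- iter_next(as_iterator(raw_train_dataset))
one_batch[[1]][1]  # the first raw review in the batch (a string tensor)
one_batch[[2]][1]  # its label: 0 = negative, 1 = positive
```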
### Bag of Words Approach

```{r}
## Unigram approach
## Use multi-hot encoding to get binary vectors
text_vec = layer_text_vectorization(ngrams = 1,
                                    max_tokens = 20000,
                                    output_mode = "multi_hot")

## Get only the raw text inputs
raw_text_train = raw_train_dataset |>
  dataset_map(function(x,y) x)

## Index the dataset vocabulary with keras::adapt()
adapt(text_vec, raw_text_train)

binary_unigram_train_data = raw_train_dataset |>
  dataset_map(~list(text_vec(.x), .y))
binary_unigram_val_data = raw_val_dataset |>
  dataset_map(~list(text_vec(.x), .y))
binary_unigram_test_data = raw_test_dataset |>
  dataset_map(~list(text_vec(.x), .y))
```

We can write a reusable model constructor so that we can test different N-gram representations with the same architecture.

```{r}
nlp_model_constructor = function(max_tokens = 20000,
                                 hidden_dimensions = 16){
  inputs = layer_input(shape = c(max_tokens))
  outputs = inputs |>
    layer_dense(hidden_dimensions, activation = "relu") |>
    ## Include dropout
    layer_dropout(0.5) |>
    ## Predicting a single binary class, so sigmoid is appropriate
    layer_dense(1, activation = "sigmoid")

  model = keras_model(inputs, outputs)
  model |>
    compile(
      optimizer = "rmsprop",
      loss = "binary_crossentropy",
      metrics = "accuracy"
    )
  model
}
```

Train and test our basic model.

```{r}
basic_model = nlp_model_constructor()
basic_model

callbacks = list(
  callback_model_checkpoint("binary_unigram.keras", save_best_only = TRUE)
)

basic_model |>
  fit(
    dataset_cache(binary_unigram_train_data),
    validation_data = dataset_cache(binary_unigram_val_data),
    epochs = 5,
    callbacks = callbacks
  )

model = load_model_tf("binary_unigram.keras")
cat(sprintf("Test accuracy: %.3f\n", evaluate(model, binary_unigram_test_data)["accuracy"]))
```

88% is a strong start. A random baseline that assigned reviews to positive or negative at random would score about 50%, so our model definitely learns something from the data. We can use arbitrary N-grams by changing the `ngrams` argument to a different value. Let's try 3:

```{r}
text_vec3 = layer_text_vectorization(ngrams = 3,
                                     max_tokens = 20000,
                                     output_mode = "multi_hot")
adapt(text_vec3, raw_text_train)

## Wrapper function for vectorization
dataset_vectorize = function(dat){
  dat |>
    dataset_map(~list(text_vec3(.x), .y))
}

binary_3grams_train = raw_train_dataset |>
  dataset_vectorize()
binary_3grams_valid = raw_val_dataset |>
  dataset_vectorize()
binary_3grams_test = raw_test_dataset |>
  dataset_vectorize()

model_3gram = nlp_model_constructor()
model_3gram

callbacks = list(
  callback_model_checkpoint("binary_3gram.keras", save_best_only = TRUE)
)

model_3gram |>
  fit(
    dataset_cache(binary_3grams_train),
    validation_data = dataset_cache(binary_3grams_valid),
    ## Set low for time reasons. We'd normally start with more epochs
    epochs = 5,
    callbacks = callbacks
  )

result = load_model_tf("binary_3gram.keras")
cat(sprintf("Test accuracy: %.3f\n", evaluate(result, binary_3grams_test)["accuracy"]))
```

The increase in test accuracy suggests that the word order immediately around each word is quite informative.
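For a side-by-side comparison (not in the original lesson), we can reload the two checkpoints written above and evaluate each on its own test set.

```{r}
# Compare unigram and 3-gram bag-of-words models on their test data
unigram_acc <- evaluate(load_model_tf("binary_unigram.keras"),
                        binary_unigram_test_data)["accuracy"]
trigram_acc <- evaluate(load_model_tf("binary_3gram.keras"),
                        binary_3grams_test)["accuracy"]
round(c(unigram = unigram_acc, trigram = trigram_acc), 3)
```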
### Transfer Learning

Transfer learning is the process of storing knowledge gained while solving one problem and applying it to a different problem. Deep learning models can use pre-trained models as layers, potentially making large gains in accuracy on new problems.

We will use the same IMDB dataset to demonstrate transfer learning with a model called nnlm-en-dim50, a token-based text embedding trained on Google News's 7-billion-word English corpus.

To use it, we first create a Keras layer that wraps the pre-trained model downloaded from TensorFlow Hub and uses it to embed sentences.

```{r}
embedding = "https://tfhub.dev/google/nnlm-en-dim50/2"
nnlm_layer = tfhub::layer_hub(handle = embedding, trainable = TRUE)
```

We can now build a full model. The first layer is the TensorFlow Hub layer. It uses the pre-trained model to map a sentence into its embedding vector.

```{r}
## For ease of use we switch back to the Sequential API, but conceptually we could
## create our own model constructor with the Functional API as before.
hub_model = keras_model_sequential() |>
  nnlm_layer() |>
  layer_dense(16, activation = "relu") |>
  layer_dense(1)
hub_model |>
  compile(
    optimizer = "rmsprop",
    loss = loss_binary_crossentropy(from_logits = TRUE),
    metrics = "accuracy"
  )

## On a CPU this can take a long time to run, so we will only run a single epoch
## since this is a demonstration.
history = hub_model |>
  fit(
    raw_train_dataset,
    epochs = 1,
    validation_data = raw_val_dataset
  )
```

As before, we can evaluate the model.

```{r}
hub_results = hub_model |>
  evaluate(raw_test_dataset)
hub_results
```

Transfer learning can be an excellent tool for deep learning problems. Here, without any real fine-tuning, we already see a respectable accuracy score.

This concludes the introduction to Deep Learning with R. We strongly encourage you to check out the additional TensorFlow resources at [RStudio TensorFlow](https://tensorflow.rstudio.com/) for more ideas.

--------------------------------------------------------------------------------
/R-Deep-Learning.Rproj:
--------------------------------------------------------------------------------
Version: 1.0

RestoreWorkspace: Default
SaveWorkspace: Default
AlwaysSaveHistory: Default

EnableCodeIndexing: Yes
UseSpacesForTab: Yes
NumSpacesForTab: 2
Encoding: UTF-8

RnwWeave: Sweave
LaTeX: pdfLaTeX

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# R Deep Learning

[![DataHub](https://img.shields.io/badge/launch-datahub-blue)](https://dlab.datahub.berkeley.edu/hub/user-redirect/git-pull?repo=https%3A%2F%2Fgithub.com%2Fdlab-berkeley%2FR-Deep-Learning&urlpath=rstudio%2F&branch=main)
[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/dlab-berkeley/R-Deep-Learning/HEAD?urlpath=rstudio)
[![License: CC BY 4.0](https://img.shields.io/badge/License-CC_BY_4.0-lightgrey.svg)](https://creativecommons.org/licenses/by/4.0/)
[![Workshop Materials](https://img.shields.io/badge/D--Lab-Workshop%20Materials-blue)](https://docs.google.com/presentation/d/1eQsjdzcareMpEK59EJS5gLqOWIcmQpTjDvQnXIBOh_c/edit?usp=sharing)

This is the repository for D-Lab's six-hour Introduction to Deep Learning in R
workshop.

### Prerequisites
We recommend attendees be intermediate R users and have had some prior
exposure to the concepts in
[R-Machine-Learning](https://github.com/dlab-berkeley/R-Machine-Learning).

Check D-Lab's [Learning Pathways](https://dlab-berkeley.github.io/dlab-workshops/R_path.html) to figure out which of our workshops to take!

## Workshop Goals

In this workshop, we provide an introduction to deep learning using TensorFlow
and Keras in R.
First, we will cover the basics of what makes deep learning
"deep." Then, we will explore using code to classify images. Along the way, we
will build up the workflow of a deep learning project.

## Installation Instructions

We will use RStudio to go through the workshop materials, which requires
installation of R, RStudio, and TensorFlow. Complete the following steps if you
want to work locally.

1. Download [R](https://cloud.r-project.org/) and
   [RStudio](https://www.rstudio.com/products/rstudio/download/)

2. Within the R console, run the following commands

```
install.packages(c("tensorflow", "keras", "reticulate")) # Pulls in all R dependencies necessary for TensorFlow in R

library(reticulate)

# Set up R with a Python installation it can use
virtualenv_create("r-reticulate", python = install_python())

library(keras)
install_keras(envname = "r-reticulate") # Install the TensorFlow and Keras Python modules
```

After these steps you will have a working Keras and TensorFlow installation.
This process will take some time if you decide to download to your local
machine. To check the TensorFlow installation on your machine, run the
following in the console

```
library(tensorflow)
tf$constant("Hello Tensorflow!")
```

3. Install the additional packages required for this workshop

```
install.packages(c("tfhub", "tfdatasets"))
```

# About the UC Berkeley D-Lab

D-Lab works with Berkeley faculty, research staff, and students to advance
data-intensive social science and humanities research. Our goal at D-Lab is to
provide practical training, staff support, resources, and space to enable you to
use R for your own research applications. Our services cater to all skill levels
and no programming, statistical, or computer science backgrounds are necessary.
We offer these services in the form of workshops, one-to-one consulting, and
working groups that cover a variety of research topics, digital tools, and
programming languages.

Visit the [D-Lab homepage](https://dlab.berkeley.edu/) to learn more about us.
You can view our [calendar](https://dlab.berkeley.edu/events/calendar) for
upcoming events, learn about how to utilize our
[consulting](https://dlab.berkeley.edu/consulting) and [data
services](https://dlab.berkeley.edu/data), and check out upcoming
[workshops](https://dlab.berkeley.edu/events/workshops). Subscribe to our
[newsletter](https://dlab.berkeley.edu/news/weekly-newsletter) to stay up to
date on D-Lab events, services, and opportunities.
# Additional Resources

* Massive open online courses
    * [fast.ai - Practical Deep Learning for Coders](https://course.fast.ai/)
    * [Kaggle Deep Learning](https://www.kaggle.com/learn/deep-learning)
    * [Google Machine Learning Crash Course](https://developers.google.com/machine-learning/crash-course/)
        * [See this](https://developers.google.com/machine-learning/crash-course/fitter/graph) sweet interactive learning rate tool
    * [Google seedbank examples](https://tools.google.com/seedbank/seeds)
    * [DeepLearning.ai](https://www.deeplearning.ai/)

* Workshops
    * [Nvidia's Modeling Time Series Data with Recurrent Neural Networks in Keras](https://courses.nvidia.com/courses/course-v1:DLI+L-HX-05+V1/about)

* Stanford
    * CS 20 - [Tensorflow for Deep Learning Research](http://web.stanford.edu/class/cs20si/syllabus.html)
    * CS 230 - [Deep Learning](http://cs230.stanford.edu/)
    * CS 231n - [Neural Networks for Visual Recognition](http://cs231n.github.io/)
    * CS 224n - [Natural Language Processing with Deep Learning](http://web.stanford.edu/class/cs224n/)

* Berkeley
    * Machine Learning at Berkeley [ML@B](https://ml.berkeley.edu/)
    * CS [189/289A](https://people.eecs.berkeley.edu/~jrs/189/)

* UToronto CSC 321 - [Intro to Deep Learning](http://www.cs.toronto.edu/~rgrosse/courses/csc321_2018/)

* Videos
    * J.J. Allaire [talk at RStudioConf 2018](https://www.rstudio.com/resources/videos/machine-learning-with-tensorflow-and-r/)

* Books
    * F. Chollet and J.J. Allaire - [Deep Learning with R](https://www.manning.com/books/deep-learning-with-r)
    * E. Charniak - [Introduction to Deep Learning](https://mitpress.mit.edu/books/introduction-deep-learning)
    * I. Goodfellow, Y. Bengio, A. Courville - [www.deeplearningbook.org](https://www.deeplearningbook.org/)
    * Zhang et al. - [Dive into Deep Learning](http://en.diveintodeeplearning.org/)

# Other D-Lab R workshops

D-Lab offers a variety of R workshops, catered toward different levels of
expertise.

## Introductory Workshops

* [R Data Wrangling](https://github.com/dlab-berkeley/R-Data-Wrangling)
* [R Data Visualization](https://github.com/dlab-berkeley/R-Data-Visualization)
* [R Census Data](https://github.com/dlab-berkeley/Census-Data-in-R)

## Intermediate and Advanced Workshops

* [R Geospatial Fundamentals](https://github.com/dlab-berkeley/R-Geospatial-Fundamentals)
* [R Machine Learning](https://github.com/dlab-berkeley/R-Machine-Learning)
* [R Deep Learning](https://github.com/dlab-berkeley/R-Deep-Learning)

--------------------------------------------------------------------------------
/install_script.R:
--------------------------------------------------------------------------------
# Pulls in all R dependencies necessary for TensorFlow in R
install.packages(c("tensorflow", "keras", "reticulate"))
# Load reticulate
library(reticulate)
# Set up R with a Python installation it can use
virtualenv_create("r-reticulate", python = install_python())
# Install TensorFlow and Keras python modules
library(keras)
install_keras(envname = "r-reticulate")
# Install additional packages
install.packages(c("tfhub", "tfdatasets"))
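# Optional check (not part of the original script): confirm that TensorFlow loads
# and report the installed version, mirroring the check shown in the README.
library(tensorflow)
tf$constant("Hello Tensorflow!")
tf_version()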