├── .gitignore
├── README.md
├── tabular_binary.R
├── tabular_multiclass.R
├── tabular_regression.R
└── torch_examples.Rmd
/.gitignore:
--------------------------------------------------------------------------------
1 | .Rproj.user
2 | .Rhistory
3 | .RData
4 | .Ruserdata
5 | 
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # torch_tutorials
2 | 
3 | Torch is new to R, and it can be intimidating to use. The goal of this repo is to provide examples and skeleton scripts that show how to use torch in common ML scenarios (e.g. regression on tabular data, binary classification on text data, etc.). More so than the vignettes, I hope to identify and address common errors and roadblocks that users will come across.
4 | 
5 | Hopefully, data scientists will find this a useful resource for getting up and running with torch.
6 | 
7 | I also feel obligated to note that model performance is not the focus of this repo, and as a result, the models perform poorly. The point of this repo is to walk through simple examples end to end: reading in data, training and evaluating models, and getting the output into a usable format.
8 | 
9 | ## Tabular Input Data
10 | 
11 | The tabular input data examples in this repo all use the Palmer Penguins data.
12 | 
13 | ### Regression
14 | 
15 | `tabular_regression.R`
16 | 
17 | ### Binary Classification
18 | 
19 | `tabular_binary.R`
20 | 
21 | ### Multi-class Classification
22 | 
23 | `tabular_multiclass.R`
24 | 
25 | ## ..Coming Soon..
26 | 
27 | * Image Input Data (torchvision)
28 | 
29 | * Text Input Data (torchtextlib)
30 | 
31 | * Distributional Outputs (Torch distributions/Pyro)
32 | 
33 | * Multimodal Models (multiple inputs/outputs)
34 | 
--------------------------------------------------------------------------------
/tabular_binary.R:
--------------------------------------------------------------------------------
1 | library(torch)
2 | library(palmerpenguins)
3 | 
4 | data("penguins")
5 | 
6 | penguins <- na.omit(penguins)
7 | 
8 | penguins$is_adelie <- as.numeric(penguins$species=='Adelie')
9 | 
10 | 
11 | ## Train/Test Split
12 | train_idx <- sample(nrow(penguins), nrow(penguins)*.7)
13 | 
14 | features <- c('bill_length_mm', 'bill_depth_mm', 'flipper_length_mm', 'body_mass_g')
15 | 
16 | response <- 'is_adelie'
17 | 
18 | df_dataset <- dataset(
19 |   name = "Penguins",
20 | 
21 |   # the input data to your dataset goes in the initialize function.
22 |   # our dataset will take a dataframe and the name of the response
23 |   # variable.
24 |   initialize = function(df, feature_variables, response_variable) {
25 | 
26 |     # conveniently, the categorical data are already factors
27 |     df$species <- as.numeric(df$species)
28 |     df$island <- as.numeric(df$island)
29 | 
30 |     self$df <- df[, feature_variables]
31 |     self$response_variable <- df[[response_variable]]
32 |   },
33 | 
34 |   # the .getitem method takes an index as input and returns the
35 |   # corresponding item from the dataset.
36 |   # the index could be anything. the dataframe could have many
37 |   # rows for each index and the .getitem method would do some
38 |   # kind of aggregation before returning the element.
39 | # in our case the index will be a row of the data.frame, 40 | 41 | .getitem = function(index) { 42 | response <- torch_tensor(self$response_variable[index]) 43 | x <- torch_tensor(as.numeric(self$df[index,])) 44 | 45 | # note that the dataloaders will automatically stack tensors 46 | # creating a new dimension 47 | list(x = x, y = response) 48 | }, 49 | 50 | # It's optional, but helpful to define the .length method returning 51 | # the number of elements in the dataset. This is needed if you want 52 | # to shuffle your dataset. 53 | .length = function() { 54 | length(self$response_variable) 55 | } 56 | 57 | ) 58 | 59 | 60 | penguins_train <- df_dataset(penguins[train_idx,], 61 | feature_variables = features, 62 | response_variable = response) 63 | 64 | penguins_test <- df_dataset(penguins[-train_idx,], 65 | feature_variables = features, 66 | response_variable = response) 67 | 68 | penguins_train$.getitem(100) 69 | 70 | 71 | dl_train <- dataloader(penguins_train, batch_size = 10, shuffle = TRUE) 72 | 73 | dl_test <- dataloader(penguins_test, batch_size = 10) 74 | 75 | for(batch in enumerate(dl_train)) { 76 | cat("X size: ") 77 | print(batch[[1]]$size()) 78 | cat("Y size: ") 79 | print(batch[[2]]$size()) 80 | } 81 | 82 | iter <- dl_train$.iter() 83 | iter$.next() 84 | 85 | 86 | net <- nn_module( 87 | "PenguinNet", 88 | initialize = function() { 89 | self$fc1 <- nn_linear(length(features), 16) 90 | self$fc2 <- nn_linear(16, 8) 91 | self$fc3 <- nn_linear(8, 1) 92 | }, 93 | forward = function(x) { 94 | x %>% 95 | self$fc1() %>% 96 | nnf_relu() %>% 97 | self$fc2() %>% 98 | nnf_relu() %>% 99 | self$fc3() 100 | } 101 | ) 102 | 103 | model <- net() 104 | 105 | optimizer <- optim_adam(model$parameters) 106 | 107 | for (epoch in 1:10) { 108 | 109 | l <- c() 110 | 111 | for (b in enumerate(dl_train)) { 112 | optimizer$zero_grad() 113 | output <- model(b[[1]]) 114 | loss <- nnf_binary_cross_entropy_with_logits(output,b[[2]]) 115 | loss$backward() 116 | optimizer$step() 117 | l <- c(l, loss$item()) 118 | } 119 | 120 | cat(sprintf("Loss at epoch %d: %3f\n", epoch, mean(l))) 121 | 122 | } 123 | 124 | # Put the model into eval mode 125 | model$eval() 126 | 127 | test_losses <- c() 128 | 129 | 130 | for (b in enumerate(dl_test)) { 131 | output <- model(b[[1]]) 132 | loss <- nnf_binary_cross_entropy_with_logits(output, b[[2]]) 133 | test_losses <- c(test_losses, loss$item()) 134 | } 135 | 136 | mean(test_losses) 137 | 138 | 139 | 140 | # Placeholder vector for predictions 141 | preds = c() 142 | 143 | # Placeholder vector for probabilities 144 | out_log_odds = c() 145 | 146 | for (b in enumerate(dl_test)) { 147 | 148 | # get log odds 149 | output <- model(b[[1]]) 150 | 151 | # convert to vector and append 152 | log_odds = output$data() %>% as.array() %>% .[,1] 153 | out_log_odds <- c(out_log_odds, log_odds) 154 | 155 | # get class prediction from log odds and append 156 | predicted <- as.numeric(log_odds>0) 157 | preds <- c(preds, predicted) 158 | 159 | } 160 | 161 | 162 | head(preds) 163 | 164 | head(out_log_odds) 165 | -------------------------------------------------------------------------------- /tabular_multiclass.R: -------------------------------------------------------------------------------- 1 | library(torch) 2 | library(palmerpenguins) 3 | 4 | data("penguins") 5 | 6 | penguins <- na.omit(penguins) 7 | 8 | ## Train/Test Split 9 | train_idx <- sample(nrow(penguins), nrow(penguins)*.7) 10 | 11 | features <- c('bill_length_mm', 'bill_depth_mm', 'flipper_length_mm', 'body_mass_g', 'island') 12 | 
13 | response <- 'species'
14 | 
15 | df_dataset <- dataset(
16 |   name = "Penguins",
17 | 
18 |   # the input data to your dataset goes in the initialize function.
19 |   # our dataset will take a dataframe and the name of the response
20 |   # variable.
21 |   initialize = function(df, feature_variables, response_variable) {
22 | 
23 |     df$species <- as.numeric(df$species)
24 |     df$island <- as.numeric(df$island)
25 | 
26 |     self$df <- df[, feature_variables]
27 |     self$response_variable <- df[[response_variable]]
28 |   },
29 | 
30 | 
31 |   # the .getitem method takes an index as input and returns the
32 |   # corresponding item from the dataset.
33 |   # the index could be anything. the dataframe could have many
34 |   # rows for each index and the .getitem method would do some
35 |   # kind of aggregation before returning the element.
36 |   # in our case the index will be a row of the data.frame,
37 | 
38 |   .getitem = function(index) {
39 |     response <- torch_tensor(self$response_variable[index], dtype = torch_long())
40 |     x <- torch_tensor(as.numeric(self$df[index,]))
41 | 
42 |     # note that the dataloaders will automatically stack tensors
43 |     # creating a new dimension
44 |     list(x = x, y = response)
45 |   },
46 | 
47 |   # It's optional, but helpful to define the .length method returning
48 |   # the number of elements in the dataset. This is needed if you want
49 |   # to shuffle your dataset.
50 |   .length = function() {
51 |     length(self$response_variable)
52 |   }
53 | 
54 | )
55 | 
56 | 
57 | penguins_train <- df_dataset(penguins[train_idx,],
58 |                              feature_variables = features,
59 |                              response_variable = response)
60 | 
61 | penguins_test <- df_dataset(penguins[-train_idx,],
62 |                             feature_variables = features,
63 |                             response_variable = response)
64 | 
65 | penguins_train$.getitem(100)
66 | 
67 | dl_train <- dataloader(penguins_train, batch_size = 10, shuffle = TRUE)
68 | 
69 | dl_test <- dataloader(penguins_test, batch_size = 10)
70 | 
71 | # the dataloaders have to exist before we can peek at the batch shapes
72 | for(batch in enumerate(dl_train)) {
73 |   cat("X size: ")
74 |   print(batch[[1]]$size())
75 |   cat("Y size: ")
76 |   print(batch[[2]]$size())
77 | }
78 | 
79 | 
80 | iter <- dl_train$.iter()
81 | iter$.next()
82 | 
83 | 
84 | net <- nn_module(
85 |   "PenguinNet",
86 |   initialize = function() {
87 |     self$fc1 <- nn_linear(length(features), 16)
88 |     self$fc2 <- nn_linear(16, 8)
89 |     self$fc3 <- nn_linear(8, 3)
90 |   },
91 |   forward = function(x) {
92 |     x %>%
93 |       self$fc1() %>%
94 |       nnf_relu() %>%
95 |       self$fc2() %>%
96 |       nnf_relu() %>%
97 |       self$fc3()
98 |   }
99 | )
100 | 
101 | model <- net()
102 | 
103 | optimizer <- optim_adam(model$parameters)
104 | 
105 | for (epoch in 1:10) {
106 | 
107 |   l <- c()
108 | 
109 |   for (b in enumerate(dl_train)) {
110 |     optimizer$zero_grad()
111 |     output <- model(b[[1]])
112 |     loss <- nnf_cross_entropy(output, torch_squeeze(b[[2]]))
113 |     loss$backward()
114 |     optimizer$step()
115 |     l <- c(l, loss$item())
116 |   }
117 | 
118 |   cat(sprintf("Loss at epoch %d: %3f\n", epoch, mean(l)))
119 | 
120 | }
121 | 
122 | # Put the model into eval mode
123 | model$eval()
124 | 
125 | test_losses <- c()
126 | total <- 0
127 | correct <- 0
128 | 
129 | for (b in enumerate(dl_test)) {
130 |   output <- model(b[[1]])
131 |   labels <- torch_squeeze(b[[2]])
132 |   loss <- nnf_cross_entropy(output, labels)
133 |   test_losses <- c(test_losses, loss$item())
134 |   # torch_max returns a list, with position 1 containing the values
135 |   # and position 2 containing the respective indices
136 |   predicted <- torch_max(output$data(), dim = 2)[[2]]
137 |   total <- total + labels$size(1)
138 |   # add number of correct classifications in this batch to the aggregate
139 |   correct <- correct + (predicted == labels)$sum()$item()
140 | }
141 | 
142 | mean(test_losses)
143 | correct / total
144 | 
145 | 
146 | # Placeholder vector for predictions
147 | preds = c()
148 | 
149 | # Placeholder data frame for the per-class log odds
150 | out_log_odds = data.frame()
151 | 
152 | for (b in enumerate(dl_test)) {
153 | 
154 |   # get log odds
155 |   output <- model(b[[1]])
156 | 
157 |   # convert to df and append
158 |   output_df = output$data() %>% as.array() %>% as.data.frame
159 |   out_log_odds <- rbind(out_log_odds, output_df)
160 | 
161 |   # get class prediction (the class with the largest log odds) and append
162 |   predicted <- torch_max(output$data(), dim = 2)[[2]]
163 |   preds <- c(preds, as.array(predicted))
164 | 
165 | }
166 | 
167 | 
168 | head(preds)
169 | 
170 | head(out_log_odds)
171 | 
--------------------------------------------------------------------------------
/tabular_regression.R:
--------------------------------------------------------------------------------
1 | library(torch)
2 | library(palmerpenguins)
3 | 
4 | data("penguins")
5 | 
6 | penguins <- na.omit(penguins)
7 | 
8 | ## Train/Test Split
9 | train_idx <- sample(nrow(penguins), nrow(penguins)*.7)
10 | 
11 | features <- c('bill_length_mm', 'bill_depth_mm', 'flipper_length_mm', 'island')
12 | 
13 | response <- 'body_mass_g'
14 | 
15 | df_dataset <- dataset(
16 |   name = "Penguins",
17 | 
18 |   # the input data to your dataset goes in the initialize function.
19 |   # our dataset will take a dataframe and the name of the response
20 |   # variable.
21 |   initialize = function(df, feature_variables, response_variable) {
22 | 
23 |     # conveniently, the categorical data are already factors
24 |     df$species <- as.numeric(df$species)
25 |     df$island <- as.numeric(df$island)
26 | 
27 |     self$df <- df[, feature_variables]
28 |     self$response_variable <- df[[response_variable]]
29 |   },
30 | 
31 |   # the .getitem method takes an index as input and returns the
32 |   # corresponding item from the dataset.
33 |   # the index could be anything. the dataframe could have many
34 |   # rows for each index and the .getitem method would do some
35 |   # kind of aggregation before returning the element.
36 |   # in our case the index will be a row of the data.frame,
37 | 
38 |   .getitem = function(index) {
39 |     response <- torch_tensor(self$response_variable[index], dtype = torch_float())
40 |     x <- torch_tensor(as.numeric(self$df[index,]))
41 | 
42 |     # note that the dataloaders will automatically stack tensors
43 |     # creating a new dimension
44 |     list(x = x, y = response)
45 |   },
46 | 
47 |   # It's optional, but helpful to define the .length method returning
48 |   # the number of elements in the dataset. This is needed if you want
49 |   # to shuffle your dataset.
50 | .length = function() { 51 | length(self$response_variable) 52 | } 53 | 54 | ) 55 | 56 | 57 | penguins_train <- df_dataset(penguins[train_idx,], 58 | feature_variables = features, 59 | response_variable = response) 60 | 61 | penguins_test <- df_dataset(penguins[-train_idx,], 62 | feature_variables = features, 63 | response_variable = response) 64 | 65 | penguins_train$.getitem(100) 66 | 67 | 68 | dl_train <- dataloader(penguins_train, batch_size = 10, shuffle = TRUE) 69 | 70 | dl_test <- dataloader(penguins_test, batch_size = 10) 71 | 72 | for(batch in enumerate(dl_train)) { 73 | cat("X size: ") 74 | print(batch[[1]]$size()) 75 | cat("Y size: ") 76 | print(batch[[2]]$size()) 77 | } 78 | 79 | iter <- dl_train$.iter() 80 | iter$.next() 81 | 82 | 83 | net <- nn_module( 84 | "PenguinNet", 85 | initialize = function() { 86 | self$fc1 <- nn_linear(length(features), 16) 87 | self$fc2 <- nn_linear(16, 8) 88 | self$fc3 <- nn_linear(8, 1) 89 | }, 90 | forward = function(x) { 91 | x %>% 92 | self$fc1() %>% 93 | nnf_relu() %>% 94 | self$fc2() %>% 95 | nnf_relu() %>% 96 | self$fc3() 97 | } 98 | ) 99 | 100 | model <- net() 101 | 102 | optimizer <- optim_adam(model$parameters) 103 | 104 | for (epoch in 1:10) { 105 | 106 | l <- c() 107 | 108 | for (b in enumerate(dl_train)) { 109 | optimizer$zero_grad() 110 | output <- model(b[[1]]) 111 | loss <- nnf_mse_loss(output,b[[2]]) 112 | loss$backward() 113 | optimizer$step() 114 | l <- c(l, loss$item()) 115 | } 116 | 117 | cat(sprintf("Loss at epoch %d: %3f\n", epoch, mean(l))) 118 | 119 | } 120 | 121 | # Put the model into eval mode 122 | model$eval() 123 | 124 | test_losses <- c() 125 | 126 | for (b in enumerate(dl_test)) { 127 | output <- model(b[[1]]) 128 | loss <- nnf_mse_loss(output, b[[2]]) 129 | test_losses <- c(test_losses, loss$item()) 130 | 131 | } 132 | 133 | mean(test_losses) 134 | 135 | 136 | 137 | # Placeholder vector for predictions 138 | preds = c() 139 | 140 | for (b in enumerate(dl_test)) { 141 | 142 | # get predictions 143 | output <- model(b[[1]]) 144 | 145 | # convert to vector and append 146 | predicted = output$data() %>% as.array() %>% .[,1] 147 | preds <- c(preds, predicted) 148 | 149 | 150 | } 151 | 152 | 153 | head(preds) 154 | -------------------------------------------------------------------------------- /torch_examples.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "Simple Torch Examples" 3 | author: "Aaron Miles" 4 | date: "10/1/2020" 5 | output: html_document 6 | --- 7 | 8 | The only constants in life are death, taxes, and the RStudio team continually crushing it. This time, they've ported [Torch into R](https://blogs.rstudio.com/ai/posts/2020-09-29-introducing-torch-for-r/). I'm a fairly heavy `tensorflow` user, and coming from an R background had a steep learning curve incorporating it into my toolkit. While `torch` is simpler in a lot of ways (specifically, not requiring a python environment), these deep learning frameworks can be intimidating. What I hope to do here is demystify `torch` workflows a little bit by providing some overly simple use cases. Specifically, show regression and classification (binary and multiclass) models build with torch, and how to extract the correct output in order to feed into your model evaluation/post-modeling process. To be clear, I re-use a lot of the code from the vignettes and examples from [the torch package website](https://mlverse.github.io/torch/). 
They've done a great job, I just wanted to put my own spin on it in case it could be helpful to someone in the future. 9 | 10 | ## Setup 11 | 12 | I'm going to keep dependencies to a minimum, so I'll only be using the `torch` and `palmerpenguins` libraries. 13 | 14 | ```{r libraries, message=FALSE, error=FALSE, warning=FALSE} 15 | 16 | library(torch) 17 | library(palmerpenguins) 18 | 19 | ``` 20 | 21 | For each task I'll be using the `palmerpenguins` dataset. For sake of simplicity, I've just removed the cases that have missing values. There isn't a clear binary target variable, so I create one (flagging if the penguin is of the Adelie species). I also create a train/test split 22 | 23 | ```{r dataprep, message=FALSE, error=FALSE, warning=FALSE} 24 | 25 | penguins <- na.omit(penguins) 26 | penguins$is_adelie <- as.numeric(penguins$species=='Adelie') 27 | train_idx <- sample(nrow(penguins), nrow(penguins)*.7) 28 | 29 | 30 | ``` 31 | 32 | The final setup step is to create a function that converts the data we want into torch tensors.(_side note: this is optional, but recommended way to load data into torch models. When deep learning, you probably can't have all your data in memory at once, and this process helps batch it up_). This code mimics python classes, so it may look a little funky to R users, but just know that the main purpose is to convert data from R datatypes into torch tensors. I've left the helpful comments from [this tutorial on the torch package site](https://mlverse.github.io/torch/articles/examples/dataset.html) 33 | 34 | ```{r dataconvertfun, message=FALSE, error=FALSE, warning=FALSE} 35 | 36 | df_dataset <- dataset( 37 | name = "Penguins", 38 | 39 | # the input data to your dataset goes in the initialize function. 40 | # our dataset will take a dataframe and the name of the response 41 | # variable. 42 | initialize = function(df, feature_variables, response_variable) { 43 | 44 | # conveniently, the categorical data are already factors 45 | df$species <- as.numeric(df$species) 46 | df$island <- as.numeric(df$island) 47 | 48 | self$df <- df[, feature_variables] 49 | self$response_variable <- df[[response_variable]] 50 | }, 51 | 52 | # the .getitem method takes an index as input and returns the 53 | # corresponding item from the dataset. 54 | # the index could be anything. the dataframe could have many 55 | # rows for each index and the .getitem method would do some 56 | # kind of aggregation before returning the element. 57 | # in our case the index will be a row of the data.frame, 58 | 59 | .getitem = function(index) { 60 | response <- torch_tensor(self$response_variable[index], dtype = torch_float()) 61 | x <- torch_tensor(as.numeric(self$df[index,])) 62 | 63 | # note that the dataloaders will automatically stack tensors 64 | # creating a new dimension 65 | list(x = x, y = response) 66 | }, 67 | 68 | # It's optional, but helpful to define the .length method returning 69 | # the number of elements in the dataset. This is needed if you want 70 | # to shuffle your dataset. 71 | .length = function() { 72 | length(self$response_variable) 73 | } 74 | 75 | ) 76 | 77 | 78 | ``` 79 | 80 | 81 | ## Regression 82 | 83 | The first regression task is to predict a penguins weight using their other measurements, which island they were observed on, and their species. 84 | 85 | Will this be a very good model? No. The relationship between these variables isn't super strong, I've done no preprocessing, I'm just going through the process of building the model and getting the correct output. 
When building in real life, absolutely take all those preprocessing steps. Perhaps in a future post I'll show how `torch` models can integrate into a `tidymodels` workflow of some kind. 86 | 87 | First, I pass the names of the features and response variables I want through that data conversion function 88 | 89 | ```{r regbatch, message=FALSE, error=FALSE, warning=FALSE} 90 | features <- c('bill_length_mm', 'bill_depth_mm', 'flipper_length_mm', 'island', 'species') 91 | 92 | response <- 'body_mass_g' 93 | 94 | penguins_train <- df_dataset(penguins[train_idx,], 95 | feature_variables = features, 96 | response_variable = response) 97 | 98 | penguins_test <- df_dataset(penguins[-train_idx,], 99 | feature_variables = features, 100 | response_variable = response) 101 | 102 | ``` 103 | 104 | If I want to look at an example case, here is how you could do that. 105 | ```{r examine, message=FALSE, error=FALSE, warning=FALSE} 106 | penguins_train$.getitem(1) 107 | ``` 108 | 109 | Next, I'll pass this converted data through a data loader. This will explicitly batch my data, and be what feeds into the model. You can specify your batch size here. For sake of simplicity, I'm just going to set 10 110 | 111 | ```{r regdataloader, message=FALSE, error=FALSE, warning=FALSE} 112 | 113 | dl_train <- dataloader(penguins_train, batch_size = 10, shuffle = TRUE) 114 | 115 | dl_test <- dataloader(penguins_test, batch_size = 10) 116 | 117 | ``` 118 | 119 | What that did was allow me to load 10 cases at a time. It's helpful to get an idea of what that actually looks like. I'll iterate through an example here to show. 120 | 121 | ```{r iteration, message=FALSE, error=FALSE, warning=FALSE} 122 | iter <- dl_train$.iter() 123 | iter$.next() 124 | 125 | ``` 126 | 127 | We can see here a single batch of 10 cases. That's what's going to be fed to the model in order to update the weights. If I kept executing `iter$.next()` I'd see the next 10 cases, and so on until I had gone through the entire dataset. 128 | 129 | Now, for modeling. The overall structure is a bit different than `tensorflow`, but still intuitive in it's own way. 130 | 131 | I'd highly recommend reading Sigrid Keydana's [initial blog post on torch](https://blogs.rstudio.com/ai/posts/2020-09-29-introducing-torch-for-r/) for more info on torch model structure 132 | 133 | 134 | ```{r modelspec, message=FALSE, error=FALSE, warning=FALSE} 135 | 136 | net <- nn_module( 137 | "PenguinNet", 138 | initialize = function() { 139 | self$fc1 <- nn_linear(length(features), 16) 140 | self$fc2 <- nn_linear(16, 8) 141 | self$fc3 <- nn_linear(8, 1) 142 | }, 143 | forward = function(x) { 144 | x %>% 145 | self$fc1() %>% 146 | nnf_relu() %>% 147 | self$fc2() %>% 148 | nnf_relu() %>% 149 | self$fc3() 150 | } 151 | ) 152 | 153 | ``` 154 | 155 | 156 | So first I specify my layers in the `initialize` section, things like layer type, shape, etc. Then I specify the network structure and place those layers within that network in the `forward` section. This is combined in `tensorflow`, which may be a stumbling block for some. 157 | 158 | I'll specify the optimizer and assign the network to a model 159 | 160 | ```{r regoptim, message=FALSE, error=FALSE, warning=FALSE} 161 | 162 | model <- net() 163 | 164 | optimizer <- optim_adam(model$parameters) 165 | 166 | ``` 167 | 168 | Now comes a part that will likely look different to R users. We're used to a nice tidy (no pun intended) `fit()` function, or that function being wrapped up in something like `lm()`, `randomForest()` etc. 
With the package in it's infancy (and being a port of PyTorch and borrowing syntax), the fitting is a little more involved. I'm going to set a for loop over the epochs, and explicitly update the model's weights with each pass. This is what is happening under the hood anyway in the functions mentioned above (perhaps without the batching), so it is useful insight into how models in general, and deep learning models in particular, are built. 169 | 170 | ```{r regmodel, error=FALSE, warning=FALSE} 171 | 172 | for (epoch in 1:10) { 173 | 174 | l <- c() 175 | 176 | for (b in enumerate(dl_train)) { 177 | optimizer$zero_grad() 178 | output <- model(b[[1]]) 179 | loss <- nnf_mse_loss(output,b[[2]]) 180 | loss$backward() 181 | optimizer$step() 182 | l <- c(l, loss$item()) 183 | } 184 | 185 | cat(sprintf("Loss at epoch %d: %3f\n", epoch, mean(l))) 186 | 187 | } 188 | 189 | 190 | ``` 191 | 192 | Notice how I specified the loss function within that loop (`nnf_mse_loss()`)? Keep an eye on how that changes as we work through the classification models. 193 | 194 | So I have my crappy model, now I want to evaluate it on the test set, and pull predictions out so I can make dope visualizations. 195 | 196 | First thing to do is to put the model object in evaluation mode, meaning it won't update the weights anymore and stay as a statis object (i.e. you don't want your linear regression model changing coefficients as you eval on the test set.) That's a simple function 197 | 198 | ```{r regfreeze} 199 | model$eval() 200 | ``` 201 | 202 | 203 | For evaluation, I take a similar approach to training, where I loop through the test set, get my loss function, and then aggregate at the end. With a continuous outcome, I'm really only looking at MSE here. 204 | 205 | ```{r regeval, error=FALSE, warning=FALSE, message=FALSE} 206 | 207 | test_losses <- c() 208 | 209 | for (b in enumerate(dl_test)) { 210 | output <- model(b[[1]]) 211 | loss <- nnf_mse_loss(output, b[[2]]) 212 | test_losses <- c(test_losses, loss$item()) 213 | 214 | } 215 | 216 | mean(test_losses) 217 | 218 | ``` 219 | As I go through the classification examples, I'll show how to specify different loss functions. 220 | 221 | And as with any model, it's useless without the output to pass into your production system, visualization models, etc. Extracting the output is simple, even though we have to do some workarounds compared to other packages due to the batching. First, I create an empty prediction vector, and as our cases pass through I populate that vector with the subsequent predictions. 222 | 223 | ```{r} 224 | 225 | preds = c() 226 | 227 | for (b in enumerate(dl_test)) { 228 | 229 | # get predictions 230 | output <- model(b[[1]]) 231 | 232 | # convert to vector and append 233 | predicted = output$data() %>% as.array() %>% .[,1] 234 | preds <- c(preds, predicted) 235 | 236 | 237 | } 238 | 239 | 240 | head(preds) 241 | ``` 242 | 243 | As we can see here, we now have a nice clean vector we can use to in prediction and visualization systems. 244 | 245 | That's end-to-end for regression, now let's move onto binary classification. 246 | 247 | ## Binary Classification 248 | 249 | Keeping with the penguins dataset, let's re-use the data loading function from before, and transform the data we want into torch tensors. As there isn't a natural binary variable in this dataset, the outcome is going to be `is_adelie` variable that I created up above. 
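Before converting anything to tensors, it can be worth a quick look at how balanced `is_adelie` actually is, since a heavily skewed outcome would change how you read the loss and the 50% cutoff later on. This is just a sanity check in base R (the chunk below is my own addition, nothing the model requires):

```{r binbalance, message=FALSE, warning=FALSE, error=FALSE}

# counts and proportions of the binary target created earlier
table(penguins$is_adelie)
prop.table(table(penguins$is_adelie))

```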
250 | 251 | ```{r bindataloader, error=FALSE, message=FALSE, warning=FALSE} 252 | 253 | features <- c('bill_length_mm', 'bill_depth_mm', 'flipper_length_mm', 'body_mass_g') 254 | 255 | response <- 'is_adelie' 256 | 257 | penguins_train <- df_dataset(penguins[train_idx,], 258 | feature_variables = features, 259 | response_variable = response) 260 | 261 | penguins_test <- df_dataset(penguins[-train_idx,], 262 | feature_variables = features, 263 | response_variable = response) 264 | 265 | 266 | ``` 267 | 268 | Now to take a look at a sample case to make sure the data looks correct 269 | 270 | ```{r bindatacheck, message=FALSE, warning=FALSE, error=FALSE} 271 | 272 | penguins_train$.getitem(100) 273 | 274 | 275 | ``` 276 | 277 | Looking good! On to the data loaders... 278 | 279 | ```{r message=FALSE, warning=FALSE, error=FALSE} 280 | 281 | dl_train <- dataloader(penguins_train, batch_size = 10, shuffle = TRUE) 282 | 283 | dl_test <- dataloader(penguins_test, batch_size = 10) 284 | 285 | 286 | ``` 287 | 288 | As this is a classification model, our model structure is going to be mostly the same. 289 | 290 | ```{r binmodelspec, message=FALSE, error=FALSE, warning=FALSE} 291 | 292 | net <- nn_module( 293 | "PenguinNet", 294 | initialize = function() { 295 | self$fc1 <- nn_linear(length(features), 16) 296 | self$fc2 <- nn_linear(16, 8) 297 | self$fc3 <- nn_linear(8, 1) 298 | }, 299 | forward = function(x) { 300 | x %>% 301 | self$fc1() %>% 302 | nnf_relu() %>% 303 | self$fc2() %>% 304 | nnf_relu() %>% 305 | self$fc3() 306 | } 307 | ) 308 | 309 | 310 | model <- net() 311 | 312 | optimizer <- optim_adam(model$parameters) 313 | 314 | ``` 315 | 316 | Some may be wondering why I don't have a sigmoid activation at the end of the network. Torch is able to handle that through the loss function. As we see below, I use the `nnf_binary_cross_entropy_with_logits()` loss function, which handles that transformation. Another way to run this model would be to add the sigmoid activation function and use the `nnf_binary_cross_entropy()` function. As is true in all of coding, there are a lot of way to do the same thing. 317 | 318 | ```{r binmodelrun, message=FALSE, warning=FALSE, error=FALSE} 319 | 320 | for (epoch in 1:10) { 321 | 322 | l <- c() 323 | 324 | for (b in enumerate(dl_train)) { 325 | optimizer$zero_grad() 326 | output <- model(b[[1]]) 327 | loss <- nnf_binary_cross_entropy_with_logits(output,b[[2]]) 328 | loss$backward() 329 | optimizer$step() 330 | l <- c(l, loss$item()) 331 | } 332 | 333 | cat(sprintf("Loss at epoch %d: %3f\n", epoch, mean(l))) 334 | 335 | } 336 | 337 | 338 | ``` 339 | 340 | Next, the model goes into evaluation mode and we get the test loss 341 | 342 | ```{r binloss, message=FALSE, warning=FALSE, error=FALSE} 343 | 344 | model$eval() 345 | 346 | test_losses <- c() 347 | 348 | 349 | for (b in enumerate(dl_test)) { 350 | output <- model(b[[1]]) 351 | loss <- nnf_binary_cross_entropy_with_logits(output, b[[2]]) 352 | test_losses <- c(test_losses, loss$item()) 353 | } 354 | 355 | mean(test_losses) 356 | 357 | 358 | ``` 359 | 360 | Evaluation with classification models need more response vectors than just one (even though they can all be derived from log-odds). The model itself will return log odds, but we can add another vector that returns a class prediction. 
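As a quick aside on why a cutoff of zero on the log-odds scale is the same thing as a 50% probability cutoff: `plogis()` (the logistic CDF in base R) maps log odds onto probabilities, and log odds of zero map to exactly 0.5. This small check is only an illustration and isn't used in the loop below:

```{r logoddscheck, message=FALSE, warning=FALSE, error=FALSE}

# negative log odds map below 0.5, zero maps to exactly 0.5, positive above
plogis(c(-2, 0, 2))

```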
361 | 362 | ```{r binout, message=FALSE, warning=FALSE, error=FALSE} 363 | 364 | # Placeholder vector for predictions 365 | preds = c() 366 | 367 | # Placeholder vector for probabilities 368 | out_log_odds = c() 369 | 370 | for (b in enumerate(dl_test)) { 371 | 372 | # get log odds 373 | output <- model(b[[1]]) 374 | 375 | # convert to df and append 376 | log_odds = output$data() %>% as.array() %>% .[,1] 377 | out_log_odds <- c(out_log_odds, log_odds) 378 | 379 | # get class prediction from log odds and append 380 | predicted <- as.numeric(log_odds>0) 381 | preds <- c(preds, predicted) 382 | 383 | } 384 | 385 | 386 | head(preds) 387 | 388 | head(out_log_odds) 389 | 390 | 391 | ``` 392 | 393 | All that puts out log odds, which you can convert into odds ratios and/or probabilities, as well as class predictions at a 50% cutoff. All that can be fed into even more evaluation, confusion matrices, etc. 394 | 395 | 396 | ## Multi-Class Classification 397 | 398 | Predicting multiple classes is (unsurprisingly) trickier and has more holdups than either of the two previous examples. In this example, I'll be predicting the penguin's species. One thing important to note with multi-class classification is that, contrary to the past two examples, the data type of the outcome variable has to be long, not float. Re-examining the data transformation function from above, we can easily add `dtype = torch_long()` when specifying the outcome variable to account for this 399 | 400 | ```{r multisetup, error=FALSE, warning=FALSE, message=FALSE} 401 | 402 | features <- c('bill_length_mm', 'bill_depth_mm', 'flipper_length_mm', 'body_mass_g') 403 | 404 | response <- 'species' 405 | 406 | df_dataset <- dataset( 407 | name = "Penguins", 408 | 409 | initialize = function(df, feature_variables, response_variable) { 410 | 411 | df$species <- as.numeric(df$species) 412 | df$island <- as.numeric(df$island) 413 | 414 | self$df <- df[, feature_variables] 415 | self$response_variable <- df[[response_variable]] 416 | }, 417 | 418 | 419 | .getitem = function(index) { 420 | 421 | response <- torch_tensor(self$response_variable[index], dtype = torch_long()) 422 | x <- torch_tensor(as.numeric(self$df[index,])) 423 | 424 | list(x = x, y = response) 425 | }, 426 | 427 | .length = function() { 428 | length(self$response_variable) 429 | } 430 | 431 | ) 432 | 433 | 434 | penguins_train <- df_dataset(penguins[train_idx,], 435 | feature_variables = features, 436 | response_variable = response) 437 | 438 | penguins_test <- df_dataset(penguins[-train_idx,], 439 | feature_variables = features, 440 | response_variable = response) 441 | 442 | 443 | ``` 444 | 445 | Now, for a look at the data to make sure the outcome is properly coded. 446 | 447 | ```{r multicheck, error=FALSE, message=FALSE, warning=FALSE} 448 | penguins_train$.getitem(100) 449 | ``` 450 | 451 | With that looking good, the next step is to prep the dataloaders and specify the model structure. Those familiar with deep learning will recognize that I have three nodes on my last layer, which is equal to the number of classes I'm trying to predict. 
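It's also worth knowing which of those three output nodes corresponds to which species. Because the dataset function runs `as.numeric()` on the existing `species` factor, the class indices follow the factor's level order, which you can check directly; the class predictions pulled out later refer back to this ordering:

```{r specieslevels, message=FALSE, warning=FALSE, error=FALSE}

# the factor level order determines the integer coding used as class labels
levels(penguins$species)

```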
452 | ```{r multinet, error=FALSE, message=FALSE, warning=FALSE}
453 | 
454 | dl_train <- dataloader(penguins_train, batch_size = 10, shuffle = TRUE)
455 | dl_test <- dataloader(penguins_test, batch_size = 10)
456 | 
457 | net <- nn_module(
458 |   "PenguinNet",
459 |   initialize = function() {
460 |     self$fc1 <- nn_linear(length(features), 16)
461 |     self$fc2 <- nn_linear(16, 8)
462 |     self$fc3 <- nn_linear(8, 3)
463 |   },
464 |   forward = function(x) {
465 |     x %>%
466 |       self$fc1() %>%
467 |       nnf_relu() %>%
468 |       self$fc2() %>%
469 |       nnf_relu() %>%
470 |       self$fc3()
471 |   }
472 | )
473 | 
474 | ```
475 | 
476 | As with binary classification, I don't have any activation after the last layer, since torch handles that when `nnf_cross_entropy()` is used as the loss function.
477 | 
478 | Another important thing to note is that `torch_squeeze()` has to be applied to the labels, or else this loop will error out. The dataloader stacks the single-element label tensors into a 10x1 tensor (our batch size is 10 here), but `nnf_cross_entropy()` expects a one-dimensional vector of class indices alongside the 10x3 matrix of model outputs. `torch_squeeze()` drops that extra dimension and puts the labels in the right shape.
479 | 
480 | ```{r runmultinet, error=FALSE, message=FALSE, warning=FALSE}
481 | 
482 | model <- net()
483 | 
484 | optimizer <- optim_adam(model$parameters)
485 | 
486 | for (epoch in 1:10) {
487 | 
488 |   l <- c()
489 | 
490 |   for (b in enumerate(dl_train)) {
491 |     optimizer$zero_grad()
492 |     output <- model(b[[1]])
493 |     loss <- nnf_cross_entropy(output, torch_squeeze(b[[2]]))
494 |     loss$backward()
495 |     optimizer$step()
496 |     l <- c(l, loss$item())
497 |   }
498 | 
499 |   cat(sprintf("Loss at epoch %d: %3f\n", epoch, mean(l)))
500 | 
501 | }
502 | 
503 | 
504 | ```
505 | 
506 | After the model is trained, evaluation is the next step. Again, `torch_squeeze()` is necessary to get the labels in the right shape. I also pull from Sigrid's intro to torch post to add an accuracy metric as well.
507 | 
508 | ```{r multieval, error=FALSE, message=FALSE, warning=FALSE}
509 | 
510 | # Put the model into eval mode
511 | model$eval()
512 | 
513 | test_losses <- c()
514 | total <- 0
515 | correct <- 0
516 | 
517 | for (b in enumerate(dl_test)) {
518 |   output <- model(b[[1]])
519 |   labels <- torch_squeeze(b[[2]])
520 |   loss <- nnf_cross_entropy(output, labels)
521 |   test_losses <- c(test_losses, loss$item())
522 |   # torch_max returns a list, with position 1 containing the values
523 |   # and position 2 containing the respective indices
524 |   predicted <- torch_max(output$data(), dim = 2)[[2]]
525 |   total <- total + labels$size(1)
526 |   # add number of correct classifications in this batch to the aggregate
527 |   correct <- correct + (predicted == labels)$sum()$item()
528 | }
529 | 
530 | mean(test_losses)
531 | correct / total
532 | ```
533 | 
534 | Moving on to pulling out the output, some adjustments have to be made for the multi-class model. Rather than pulling a single vector and working with that, I pull the three vectors representing each class's log odds into a data frame. I also take the class with the largest log odds in each row as the class prediction.
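If you want actual class probabilities rather than raw log odds, the softmax transformation does that conversion row by row (torch can also do this on the tensor side, e.g. with `nnf_softmax(output, dim = 2)`). Here's a self-contained sketch on a made-up matrix of logits, with a small helper defined just for the illustration; the same idea applies to the `out_log_odds` data frame filled in below:

```{r softmaxsketch, message=FALSE, warning=FALSE, error=FALSE}

# toy matrix of logits: 2 cases x 3 classes
logits <- matrix(c( 2.0, 0.5, -1.0,
                   -0.2, 1.3,  0.1), nrow = 2, byrow = TRUE)

# softmax: exponentiate and normalise each row so it sums to 1
softmax <- function(z) exp(z) / sum(exp(z))

t(apply(logits, 1, softmax))

```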
535 | 
536 | ```{r multipreds, error=FALSE, message=FALSE, warning=FALSE}
537 | 
538 | 
539 | 
540 | # Placeholder vector for predictions
541 | preds = c()
542 | 
543 | # Placeholder df for log odds
544 | out_log_odds = data.frame()
545 | 
546 | for (b in enumerate(dl_test)) {
547 | 
548 |   # get log odds
549 |   output <- model(b[[1]])
550 | 
551 |   # convert to df and append
552 |   output_df = output$data() %>% as.array() %>% as.data.frame
553 |   out_log_odds <- rbind(out_log_odds, output_df)
554 | 
555 |   # get class prediction (the class with the largest log odds) and append
556 |   predicted <- torch_max(output$data(), dim = 2)[[2]]
557 |   preds <- c(preds, as.array(predicted))
558 | 
559 | }
560 | 
561 | 
562 | head(preds)
563 | 
564 | head(out_log_odds)
565 | 
566 | 
567 | ```
568 | 
569 | And with that, we've gone through some basic examples of ML model types using tabular data!
570 | 
571 | 
572 | ## Final Thoughts
573 | 
574 | Hopefully this post demystified some of the code necessary to get Torch up and running, and the reader will be more comfortable using torch in day-to-day work, and even build more awesome stuff!
575 | 
576 | I hope to make this somewhat of a series, where I put together examples of making binary/continuous predictions with text/image data, etc. I've used `tensorflow` quite a bit, and often ran into seemingly simple errors. My goal with torch is to have a resource online that'll help others overcome these.
577 | 
578 | I'm very excited to see Torch develop within the R community. Binding directly to the C++ libraries is going to pay dividends, as it removes the step of managing Python environments within R. First, it's going to be fun to see libraries like `torchtext` and `pyro` get ported over and used within R. Second, I think this setup makes it likely that we'll see some R torch libraries get created that will have to be ported over to Python. The RStudio team absolutely crushed it with this port, and I'm excited to see how they and the R community at large continue to build on this.
579 | 
--------------------------------------------------------------------------------