├── README.md
├── Blogpost1.Rmd
└── Blogpost1.md

/README.md:
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
/Blogpost1.Rmd:
--------------------------------------------------------------------------------
---
title: "Testing without the internet using mock functions"
author: "Karina Marks, Gabor Csardi"
output:
  html_document:
    keep_md: yes
---

## Testing

At Mango we validate and test packages for our ValidR customers.
We write unit tests for the different requirements
of each function. These tests must run on a customer's machine, where internet
access may be restricted; hence, all tests
must work - and pass - on any OS and without an internet connection. This
creates a problem when testing packages whose main functionality is to
connect to a server, for tasks such as web scraping or calling a web API.
We solve it by
[mocking](https://en.wikipedia.org/wiki/Mock_object).


## Mocking

Mocking allows you to replace parts of your system under test with mock
objects and make assertions about how they have been used. The `testthat`
package supports mocking via the `with_mock()` function. `with_mock()` allows us to
temporarily replace some R functions. In our case we replace the functions
that connect to the internet with mock functions that only *pretend* to do
so.

### Simple Example - a system error

Suppose we use the `system()` function to call out to the operating
system and start some utility. This is a somewhat fragile operation: it
might fail if the utility is not available, so we make sure that we
handle errors properly.
This is somewhat tricky in the case of `system()`, as
it does not signal an R error if the system command fails; it just
returns the exit status of the system shell, which is typically
non-zero on an error.

To test that errors are handled properly, we would need to create an
environment where the system command indeed fails. While this is not
impossible, it is much easier to just pretend that it fails using
`with_mock()`:

```{r, results = "hide"}
## Package or script code
ext <- function() {
  ## This usually works, but probably not always
  status <- system("sleep 5")
  if (status != 0) { stop("sleeping failed") }
}
## Test code
library(testthat)
with_mock(
  `base::system` = function(...) { return(127) },
  expect_error(ext(), "sleeping failed")
)
```

We want to test that `ext()` behaves well if a system error happens,
and throws a proper R error. We use `with_mock()` to temporarily change
`system()`, and pretend that it has failed. `with_mock()` takes two kinds
of arguments: named and unnamed. The named arguments are the mock functions;
their names define the functions to mock. Unnamed arguments are expressions
to evaluate in the mocked environment.

## Using `with_mock` to avoid connecting to the internet

### Example - `GET()`

Consider the function `GET()` from the `httr` package,
which performs an HTTP GET request:

```{r}
library(httr)
response <- GET("http://httpbin.org/get")
```

Under the hood `GET()` uses the `curl_fetch_memory()` function from the
`curl` package, built on top of `libcurl`. The job of the `GET()` function is
to call `curl` with the right arguments, and then process its response
properly. So this is what we need to test (testing `curl` itself is
another mocking story).
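
The split of responsibilities described above can be illustrated without
any network access. The sketch below is not `httr` code:
`parse_fetch_result()` and `fake_fetch_result` are hypothetical names, and
the list only imitates two of the fields that a low-level fetch might
return. It shows the kind of processing logic that stays testable once the
low-level call is replaced by canned data.

```{r}
## Hypothetical high-level step (an assumption, not httr's implementation):
## turn a raw fetch result into something usable, failing on HTTP errors.
parse_fetch_result <- function(raw) {
  if (raw$status_code >= 400) {
    stop("request failed with status ", raw$status_code)
  }
  list(status = raw$status_code, body = rawToChar(raw$content))
}

## Hand-written stand-in for what a low-level fetch function might return.
fake_fetch_result <- list(
  status_code = 200L,
  content = charToRaw("{\"ok\": true}")
)

parsed <- parse_fetch_result(fake_fetch_result)
parsed$body
```

Because the canned list is built by hand, error paths such as a 404 status
can be exercised just as easily as the successful one.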

If we would like to test that `GET()` works correctly without any internet
connection, the steps are:

* Trace `curl_fetch_memory()` to see the input it receives and the
  output it generates.
* Call `GET()` for the scenario we want to test, and record the input
  and output of `curl_fetch_memory()`, so that we can use them to simulate an
  internet connection later in the real test case. Study the recorded
  input and output to see whether they can be reused, and update them as needed.
* Write the test case using `with_mock()` on `curl_fetch_memory()` to
  replace it with a mock function that checks whether the input received from
  `GET()` is correct and provides the recorded output.

For brevity, let's assume that we only use the output of
`curl_fetch_memory()` now. The input can be handled similarly.
We trace it and record its return value:

```{r results = "hide", message = FALSE}
library(curl)
cfm_output <- NULL
trace(
  curl_fetch_memory,
  exit = function() { cfm_output <<- returnValue() }
)
response <- GET("http://httpbin.org/get")
```

The return value of `curl_fetch_memory()` is a list that contains the HTTP
status code, headers, some timing information, and of course the content
itself:

```{r}
names(cfm_output)
```

We save this into a file now, and put the file into the `testthat`
directory where the unit tests run.

```{r}
save(cfm_output, file = "cfm_output.rda")
```

Now that we have the response stored locally we can use it to unit test
`GET()` without connecting to the internet.

```{r}
test_that("GET works as it should", {
  response <- with_mock(
    `curl::curl_fetch_memory` = function(...)
    {
      load("cfm_output.rda")
      cfm_output
    },
    GET("http://httpbin.org/get")
  )
  expect_silent(stop_for_status(response))
  expect_equal(headers(response)$`content-type`, "application/json")
  # ... more tests for the response
})
```

We have successfully tested `GET()` without having to connect to a server.

## Primitive Functions

The only problem with this method is that you may sometimes wish to
mock a primitive function, but `with_mock()` does not allow you to do this.

### Example - `BROWSE()`

`BROWSE()` is again from the `httr` package and relies on the function
`browseURL()` from the base package `utils` to connect to the internet, so we
will need to mock that function. `BROWSE()` only runs if the R session
is interactive, i.e. `isTRUE(interactive())`, which probably holds when
*writing* the unit tests, but not when *running* them.
It seems straightforward to mock `interactive()` to return `TRUE`, but it
is a primitive function, so `with_mock()` cannot deal with it:

```{r, error = TRUE, results = "hide"}
interactive
with_mock(
  `interactive` = function(...) TRUE,
  interactive()
)
```

To work around this we must mock the function manually:

```{r}
test_that("Interactive is always TRUE", {

  interactive_original <- base::interactive
  on.exit(
    {
      assign("interactive", interactive_original, envir = baseenv())
      lockBinding("interactive", baseenv())
    },
    add = TRUE
  )
  unlockBinding("interactive", baseenv())
  assign("interactive", function() TRUE, envir = baseenv())

  expect_true(interactive())
})
```

* We have saved the original version of `interactive()`, so that we can restore it.
* `unlockBinding()` has allowed us to replace the function in the `base`
  package with our version.
* `on.exit()` makes sure that after the test has run, this function is
  changed back, and `lockBinding()` seals the `base` package again.

After this, mocking `browseURL()` is easy and we leave it
as an exercise for the reader.

Note that this method is not allowed for tests in CRAN packages,
because calling `unlockBinding()` is forbidden there. If you are
writing tests for your own R package, then a workaround is to define
your own non-primitive `is_interactive()` function and use (and mock)
that in your package:

```{r}
is_interactive <- function() interactive()
with_mock(
  `is_interactive` = function() TRUE,
  is_interactive()
)
```

## Links

* [This blog post on GitHub](https://github.com/MangoTheCat/blog-with-mock)
* [The `testthat` R package.](https://github.com/hadley/testthat#readme)
* [The `httr` R package.](https://github.com/hadley/httr#readme)
* [The `curl` R package.](https://github.com/jeroenooms/curl#readme)
* [ValidR](http://www.mango-solutions.com/wp/products-services/products/validr/)

--------------------------------------------------------------------------------
/Blogpost1.md:
--------------------------------------------------------------------------------
# Testing without the internet using mock functions
Karina Marks, Gabor Csardi

## Testing

At Mango we validate and test packages for our ValidR customers.
We write unit tests for the different requirements
of each function. These tests must run on a customer's machine, where internet
access may be restricted; hence, all tests
must work - and pass - on any OS and without an internet connection.
This creates a problem when testing packages whose main functionality is to
connect to a server, for tasks such as web scraping or calling a web API.
We solve it by
[mocking](https://en.wikipedia.org/wiki/Mock_object).


## Mocking

Mocking allows you to replace parts of your system under test with mock
objects and make assertions about how they have been used. The `testthat`
package supports mocking via the `with_mock()` function. `with_mock()` allows us to
temporarily replace some R functions. In our case we replace the functions
that connect to the internet with mock functions that only *pretend* to do
so.

### Simple Example - a system error

Suppose we use the `system()` function to call out to the operating
system and start some utility. This is a somewhat fragile operation: it
might fail if the utility is not available, so we make sure that we
handle errors properly. This is somewhat tricky in the case of `system()`, as
it does not signal an R error if the system command fails; it just
returns the exit status of the system shell, which is typically
non-zero on an error.

To test that errors are handled properly, we would need to create an
environment where the system command indeed fails. While this is not
impossible, it is much easier to just pretend that it fails using
`with_mock()`:


```r
## Package or script code
ext <- function() {
  ## This usually works, but probably not always
  status <- system("sleep 5")
  if (status != 0) { stop("sleeping failed") }
}
## Test code
library(testthat)
```

```
## Warning: package 'testthat' was built under R version 3.2.5
```

```r
with_mock(
  `base::system` = function(...)
  { return(127) },
  expect_error(ext(), "sleeping failed")
)
```

We want to test that `ext()` behaves well if a system error happens,
and throws a proper R error. We use `with_mock()` to temporarily change
`system()`, and pretend that it has failed. `with_mock()` takes two kinds
of arguments: named and unnamed. The named arguments are the mock functions;
their names define the functions to mock. Unnamed arguments are expressions
to evaluate in the mocked environment.

## Using `with_mock` to avoid connecting to the internet

### Example - `GET()`

Consider the function `GET()` from the `httr` package,
which performs an HTTP GET request:


```r
library(httr)
```

```
## Warning: package 'httr' was built under R version 3.2.5
```

```r
response <- GET("http://httpbin.org/get")
```

Under the hood `GET()` uses the `curl_fetch_memory()` function from the
`curl` package, built on top of `libcurl`. The job of the `GET()` function is
to call `curl` with the right arguments, and then process its response
properly. So this is what we need to test (testing `curl` itself is
another mocking story).

If we would like to test that `GET()` works correctly without any internet
connection, the steps are:

* Trace `curl_fetch_memory()` to see the input it receives and the
  output it generates.
* Call `GET()` for the scenario we want to test, and record the input
  and output of `curl_fetch_memory()`, so that we can use them to simulate an
  internet connection later in the real test case. Study the recorded
  input and output to see whether they can be reused, and update them as needed.
* Write the test case using `with_mock()` on `curl_fetch_memory()` to
  replace it with a mock function that checks whether the input received from
  `GET()` is correct and provides the recorded output.

For brevity, let's assume that we only use the output of
`curl_fetch_memory()` now. The input can be handled similarly.
We trace it and record its return value:


```r
library(curl)
cfm_output <- NULL
trace(
  curl_fetch_memory,
  exit = function() { cfm_output <<- returnValue() }
)
response <- GET("http://httpbin.org/get")
```

The return value of `curl_fetch_memory()` is a list that contains the HTTP
status code, headers, some timing information, and of course the content
itself:


```r
names(cfm_output)
```

```
## [1] "url"         "status_code" "headers"     "modified"    "times"
## [6] "content"
```

We save this into a file now, and put the file into the `testthat`
directory where the unit tests run.


```r
save(cfm_output, file = "cfm_output.rda")
```

Now that we have the response stored locally we can use it to unit test
`GET()` without connecting to the internet.


```r
test_that("GET works as it should", {
  response <- with_mock(
    `curl::curl_fetch_memory` = function(...) {
      load("cfm_output.rda")
      cfm_output
    },
    GET("http://httpbin.org/get")
  )
  expect_silent(stop_for_status(response))
  expect_equal(headers(response)$`content-type`, "application/json")
  # ... more tests for the response
})
```

We have successfully tested `GET()` without having to connect to a server.
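
The test above ignores the arguments that `GET()` passes down, although the
steps listed earlier also call for checking the input. The following is a
minimal sketch in base R of a mock that records how it was called. It
assumes a hypothetical helper, `make_recording_mock()`, which is not part of
`testthat`, and uses a cut-down stand-in for `cfm_output` so that it runs
without any network access or saved file.

```r
## Hypothetical helper: build a mock that records its arguments on every
## call and returns a canned value instead of touching the network.
make_recording_mock <- function(canned) {
  calls <- list()
  mock <- function(...) {
    calls[[length(calls) + 1]] <<- list(...)
    canned
  }
  list(mock = mock, calls = function() calls)
}

## Stand-in for the recorded `cfm_output`; the real object has more fields.
fake_output <- list(url = "http://httpbin.org/get", status_code = 200L)

rec <- make_recording_mock(fake_output)

## Call the mock directly, the way the mocked caller would.
result <- rec$mock("http://httpbin.org/get", handle = NULL)

## The mock hands back the canned output and remembers its input.
recorded <- rec$calls()
length(recorded)
recorded[[1]][[1]]
```

In a real test, `rec$mock` could presumably be supplied as the replacement
for `curl::curl_fetch_memory` in `with_mock()`, and `rec$calls()` inspected
afterwards with the usual expectations.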

## Primitive Functions

The only problem with this method is that you may sometimes wish to
mock a primitive function, but `with_mock()` does not allow you to do this.

### Example - `BROWSE()`

`BROWSE()` is again from the `httr` package and relies on the function
`browseURL()` from the base package `utils` to connect to the internet, so we
will need to mock that function. `BROWSE()` only runs if the R session
is interactive, i.e. `isTRUE(interactive())`, which probably holds when
*writing* the unit tests, but not when *running* them.
It seems straightforward to mock `interactive()` to return `TRUE`, but it
is a primitive function, so `with_mock()` cannot deal with it:

```r
interactive
with_mock(
  `interactive` = function(...) TRUE,
  interactive()
)
```

```
## Error in FUN(X[[i]], ...): old_fun must be a function
```

```
## Error in FUN(X[[i]], ...): old_fun must be a function
```

To work around this we must mock the function manually:


```r
test_that("Interactive is always TRUE", {

  interactive_original <- base::interactive
  on.exit(
    {
      assign("interactive", interactive_original, envir = baseenv())
      lockBinding("interactive", baseenv())
    },
    add = TRUE
  )
  unlockBinding("interactive", baseenv())
  assign("interactive", function() TRUE, envir = baseenv())

  expect_true(interactive())
})
```

* We have saved the original version of `interactive()`, so that we can restore it.
* `unlockBinding()` has allowed us to replace the function in the `base`
  package with our version.
* `on.exit()` makes sure that after the test has run, this function is
  changed back, and `lockBinding()` seals the `base` package again.
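
The save, replace and restore steps listed above are not specific to
`interactive()`. As an illustration only, the sketch below wraps them in a
hypothetical helper, `with_binding()`, and exercises it on a private
environment rather than on `base`, so nothing global is modified. The
helper name, the `sandbox` environment and its `is_interactive()` function
are all assumptions made for this example.

```r
## Hypothetical helper, not part of testthat: temporarily rebind `name`
## in `env`, evaluate `code`, then restore the original binding.
with_binding <- function(env, name, value, code) {
  original <- get(name, envir = env, inherits = FALSE)
  was_locked <- bindingIsLocked(name, env)
  if (was_locked) unlockBinding(name, env)
  on.exit({
    assign(name, original, envir = env)
    if (was_locked) lockBinding(name, env)
  }, add = TRUE)
  assign(name, value, envir = env)
  ## `code` is a lazily evaluated argument, so it only runs here,
  ## after the replacement is in place.
  force(code)
}

## Demonstrate on a private environment with a locked binding.
sandbox <- new.env()
sandbox$is_interactive <- function() FALSE
lockBinding("is_interactive", sandbox)

during <- with_binding(
  sandbox, "is_interactive", function() TRUE,
  sandbox$is_interactive()
)
after <- sandbox$is_interactive()
```

Here `during` holds the value produced while the replacement was active,
and `after` comes from the restored original. Because the restore runs in
`on.exit()`, it also happens when `code` throws an error.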

After this, mocking `browseURL()` is easy and we leave it
as an exercise for the reader.

Note that this method is not allowed for tests in CRAN packages,
because calling `unlockBinding()` is forbidden there. If you are
writing tests for your own R package, then a workaround is to define
your own non-primitive `is_interactive()` function and use (and mock)
that in your package:


```r
is_interactive <- function() interactive()
with_mock(
  `is_interactive` = function() TRUE,
  is_interactive()
)
```

```
## [1] TRUE
```

## Links

* [This blog post on GitHub](https://github.com/MangoTheCat/blog-with-mock)
* [The `testthat` R package.](https://github.com/hadley/testthat#readme)
* [The `httr` R package.](https://github.com/hadley/httr#readme)
* [The `curl` R package.](https://github.com/jeroenooms/curl#readme)
* [ValidR](http://www.mango-solutions.com/wp/products-services/products/validr/)

--------------------------------------------------------------------------------