├── README.md
├── Blogpost1.Rmd
└── Blogpost1.md


/README.md:
--------------------------------------------------------------------------------
1 | 


--------------------------------------------------------------------------------
/Blogpost1.Rmd:
--------------------------------------------------------------------------------
  1 | ---
  2 | title: "Testing without the internet using mock functions"
  3 | author: "Karina Marks, Gabor Csardi"
  4 | output: 
  5 |   html_document: 
  6 |     keep_md: yes
  7 | ---
  8 | 
  9 | ## Testing
 10 | 
 11 | At Mango we validate and test packages for our ValidR customers.
 12 | We create unit tests for different requirements
 13 | of functions. These tests must run on a customer's machine, where internet
 14 | access could be restricted; hence, all tests 
 15 | must work - and pass - on any OS and without an internet connection. This
 16 | creates a problem when testing packages whose main functionality is to
 17 | connect to a server for tasks such as web scraping or calling a web API.
 18 | We solve this problem by
 19 | [mocking](https://en.wikipedia.org/wiki/Mock_object).
 20 | 
 21 | 
 22 | ## Mocking
 23 | 
 24 | Mocking allows you to replace parts of your system under test with mock
 25 | objects and make assertions about how they have been used. The `testthat`
 26 | package supports mocking via the `with_mock()` function. `with_mock()` allows us to
 27 | temporarily replace some R functions. In our case we replace the functions
 28 | that connect to the internet with mock functions that only *pretend* to do
 29 | so.
 30 | 
 31 | ### Simple Example - a system error
 32 | 
 33 | Suppose we use the `system()` function to call out to the operating
 34 | system and start some utility. This is a somewhat fragile operation: it
 35 | might fail if the utility is not available, so we make sure that we
 36 | handle errors properly. This is somewhat tricky in the case of `system()` as
 37 | it does not signal an R error if the system command fails; it just
 38 | returns the exit status of the system shell, which is typically
 39 | non-zero on an error.
 40 | 
 41 | To test that errors are handled properly, we would need to create an
 42 | environment where the system function indeed fails. While this is not
 43 | impossible, it is much easier to just pretend that it fails using
 44 | `with_mock()`:
 45 | 
 46 | ```{r, results = "hide"}
 47 | ## Package or script code
 48 | ext <- function() {
 49 |   ## This usually works, but probably not always
 50 |   status <- system("sleep 5")
 51 |   if (status != 0) { stop("sleeping failed") }
 52 | }
 53 | ## Test code
 54 | library(testthat)
 55 | with_mock(
 56 |   `base::system` = function(...) { return(127) },
 57 |   expect_error(ext(), "sleeping failed")
 58 | )
 59 | ```
 60 | 
 61 | We want to test that `ext()` behaves well if a system error happens,
 62 | and throws a proper R error. We use `with_mock()` to temporarily change
 63 | `system()`, and pretend that it has failed. `with_mock()` takes two kinds
 64 | of arguments: named and unnamed. The named arguments are the mock functions;
 65 | the names define the functions to mock. Unnamed arguments are expressions
 66 | to evaluate in the mocked environment.
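
The idea behind this temporary replacement can be sketched in plain R. The following is only an illustration of the concept, not how `testthat` implements `with_mock()`: the helper `with_temp_binding()` and the toy environment `toy` are hypothetical names invented for this example, and the toy environment merely stands in for a package namespace.

```r
# Hypothetical helper, not part of testthat: temporarily rebind `name` in
# `env` to `mock`, evaluate `code`, then restore the original binding.
with_temp_binding <- function(env, name, mock, code) {
  original <- get(name, envir = env, inherits = FALSE)
  on.exit(assign(name, original, envir = env), add = TRUE)
  assign(name, mock, envir = env)
  force(code)  # `code` is a lazy promise, so it is evaluated with the mock in place
}

# A toy environment standing in for a package namespace.
toy <- new.env()
toy$run_utility <- function() 0L   # pretend exit status: success
toy$ext <- function() {
  if (toy$run_utility() != 0L) stop("utility failed")
  invisible(TRUE)
}

# With the mock in place, ext() sees a failing exit status and signals an error.
msg <- tryCatch(
  with_temp_binding(toy, "run_utility", function() 127L, toy$ext()),
  error = function(e) conditionMessage(e)
)
```

Because the restore step is registered with `on.exit()`, the original `run_utility()` is back in place even though the evaluated expression failed.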
 67 | 
 68 | ## Using `with_mock` to avoid connecting to the internet
 69 | 
 70 | ### Example - `GET()`
 71 | 
 72 | Consider the function `GET()` from the `httr` package,
 73 | which will perform an HTTP GET request:
 74 | 
 75 | ```{r}
 76 | library(httr)
 77 | response <- GET("http://httpbin.org/get")
 78 | ```
 79 | 
 80 | Under the hood `GET()` uses the `curl_fetch_memory()` function from the
 81 | `curl` package, built on top of `libcurl`. The job of the `GET()` function is
 82 | to call `curl` with the right arguments, and then process its response
 83 | properly. So this is what we need to test (testing `curl` itself is
 84 | another mocking story).
 85 | 
 86 | If we would like to test that `GET()` works correctly, without any internet
 87 | connection, the steps to do this are:
 88 | 
 89 | * Trace `curl_fetch_memory()` to see the input it receives and the
 90 |   output it generates.
 91 | * Call `GET()` for the scenario we want to test, and record the input
 92 |   and output of `curl_fetch_memory()`, so that we can use them to simulate an
 93 |   internet connection later in the real test case. Study the recorded
 94 |   input and output to see whether they can be reused, and update them as needed.
 95 | * Write the test case using `with_mock()` on `curl_fetch_memory()` to
 96 |   replace it with a mock function that checks whether the input received from
 97 |   `GET()` is correct and provides the recorded output.
 98 | 
 99 | For brevity, let's assume that we only use the output of
100 | `curl_fetch_memory()` now. The input can be handled similarly.
101 | We trace it and record its return value:
102 | 
103 | ```{r results = "hide", message = FALSE}
104 | library(curl)
105 | cfm_output <- NULL
106 | trace(
107 |   curl_fetch_memory,
108 |   exit = function() { cfm_output <<- returnValue() }
109 | )
110 | response <- GET("http://httpbin.org/get")
111 | ```
112 | 
113 | The return value of `curl_fetch_memory()` is a list that contains the HTTP
114 | status code, headers, some timing information, and of course the content
115 | itself:
116 | 
117 | ```{r}
118 | names(cfm_output)
119 | ```
120 | 
121 | We save this into a file now, and put the file into the `testthat`
122 | directory where the unit tests run.
123 | 
124 | ```{r}
125 | save(cfm_output, file = "cfm_output.rda")
126 | ```
127 | 
128 | Now that we have the response stored locally we can use it to unit test
129 | `GET()` without connecting to the internet.
130 | 
131 | ```{r}
132 | test_that("GET works as it should", {
133 |   response <- with_mock(
134 |     `curl::curl_fetch_memory` = function(...) {
135 |       load("cfm_output.rda")
136 |       cfm_output
137 |     },
138 |     GET("http://httpbin.org/get")
139 |   )
140 |   expect_silent(stop_for_status(response))
141 |   expect_equal(headers(response)$`content-type`, "application/json")
142 |   # ... more tests for the response
143 | })
144 | ```
145 | 
146 | We have successfully tested `GET()` without having to connect to a server.
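
The input check that was skipped for brevity could look roughly like the sketch below. The factory `make_mock_fetch()` and the `fake_output` list are hypothetical names and made-up data for illustration. The sketch also assumes that the URL reaches `curl_fetch_memory()` unchanged, which should be confirmed against the traced input before relying on it.

```r
# Hypothetical factory: build a stand-in for curl_fetch_memory() that
# verifies the URL it is called with and then returns a recorded response.
make_mock_fetch <- function(expected_url, recorded) {
  function(url, ...) {
    if (!identical(url, expected_url)) {
      stop("unexpected URL: ", url)
    }
    recorded
  }
}

# Made-up recorded response containing only the fields needed here.
fake_output <- list(url = "http://httpbin.org/get", status_code = 200L)
mock_fetch <- make_mock_fetch("http://httpbin.org/get", fake_output)

# The expected URL yields the recorded response; any other URL is an error.
ok <- mock_fetch("http://httpbin.org/get")
bad <- tryCatch(
  mock_fetch("http://example.com/"),
  error = function(e) conditionMessage(e)
)
```

A function built this way could be passed as the `curl::curl_fetch_memory` argument of `with_mock()` in place of the anonymous function used above.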
147 | 
148 | ## Primitive Functions
149 | 
150 | One limitation of this method is that you may sometimes wish to
151 | mock a primitive function, but `with_mock()` does not allow you to do this.
152 | 
153 | ### Example - `BROWSE()`
154 | 
155 | `BROWSE()` is again from the `httr` package and relies on the function
156 | `browseURL()` from the base package `utils` to connect to the internet, so we
157 | will need to mock `browseURL()`. It is only called if the R session
158 | is interactive, i.e. `isTRUE(interactive())`, which probably holds when
159 | *writing* the unit tests, but not when *running* them.
160 | It seems straightforward to mock `interactive()` to return `TRUE`, but it
161 | is a primitive function, so `with_mock()` cannot deal with it:
162 | ```{r, error = TRUE, results = "hide"}
163 | interactive
164 | with_mock(
165 |   `interactive` = function(...) TRUE,
166 |   interactive()
167 | )
168 | ```
169 | 
170 | To work around this we must mock the function manually:
171 | 
172 | ```{r}
173 | test_that("Interactive is always TRUE", {
174 | 
175 |   interactive_original <- base::interactive
176 |   on.exit(
177 |     {
178 |       assign("interactive", interactive_original, envir = baseenv())
179 |       lockBinding("interactive", baseenv())
180 |     },
181 |     add = TRUE
182 |   )
183 |   unlockBinding('interactive', baseenv())
184 |   assign('interactive', function() TRUE, envir = baseenv())
185 | 
186 |   expect_true(interactive())
187 | })
188 | 
189 | ```
190 | 
191 | * We have saved the original version of `interactive()`, so that we can restore it.
192 | * `unlockBinding()` has allowed us to replace the function in the `base`
193 |   package with our version.
194 | * `on.exit()` makes sure that after the test has run, this function is
195 |   changed back, and  `lockBinding()` seals the `base` package again.
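
The unlock, replace and restore sequence can be tried safely on a private environment before touching `base`. This is a minimal sketch in which the environment `toy`, the function `is_live()` and the helper `swap_and_call()` are hypothetical names used only for illustration.

```r
# A private environment with one locked binding, standing in for `base`.
toy <- new.env()
assign("is_live", function() FALSE, envir = toy)
lockBinding("is_live", toy)

# Assigning to a locked binding signals an error.
blocked <- tryCatch(
  {
    assign("is_live", function() TRUE, envir = toy)
    FALSE
  },
  error = function(e) TRUE
)

# Same pattern as in the test above: restore and re-lock on exit.
swap_and_call <- function() {
  original <- get("is_live", envir = toy)
  on.exit(
    {
      assign("is_live", original, envir = toy)
      lockBinding("is_live", toy)
    },
    add = TRUE
  )
  unlockBinding("is_live", toy)
  assign("is_live", function() TRUE, envir = toy)
  toy$is_live()
}

during <- swap_and_call()                    # result while the mock is active
after <- toy$is_live()                       # original behaviour is back
relocked <- bindingIsLocked("is_live", toy)  # binding is sealed again
```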
196 | 
197 | After this, mocking `browseURL()` is easy and we leave it
198 | as an exercise to the reader.
199 | 
200 | Note that this method is not allowed for tests in CRAN packages,
201 | because calling `unlockBinding()` is forbidden there. If you are
202 | writing tests for your own R package, then a workaround is to define
203 | your own non-primitive `is_interactive()` function and use (and mock) 
204 | that in your package:
205 | 
206 | ```{r}
207 | is_interactive <- function() interactive()
208 | with_mock(
209 |   `is_interactive` = function() TRUE,
210 |   is_interactive()
211 | )
212 | ```
213 | 
214 | ## Links
215 | 
216 | * [This blog post on GitHub](https://github.com/MangoTheCat/blog-with-mock)
217 | * [The `testthat` R package.](https://github.com/hadley/testthat#readme)
218 | * [The `httr` R package.](https://github.com/hadley/httr#readme)
219 | * [The `curl` R package.](https://github.com/jeroenooms/curl#readme)
220 | * [ValidR](http://www.mango-solutions.com/wp/products-services/products/validr/)
221 | 


--------------------------------------------------------------------------------
/Blogpost1.md:
--------------------------------------------------------------------------------
  1 | # Testing without the internet using mock functions
  2 | Karina Marks, Gabor Csardi  
  3 | 
  4 | ## Testing
  5 | 
  6 | At Mango we validate and test packages for our ValidR customers.
  7 | We create unit tests for different requirements
  8 | of functions. These tests must run on a customer's machine, where internet
  9 | access could be restricted; hence, all tests 
 10 | must work - and pass - on any OS and without an internet connection. This
 11 | creates a problem when testing packages whose main functionality is to
 12 | connect to a server for tasks such as web scraping or calling a web API.
 13 | We solve this problem by
 14 | [mocking](https://en.wikipedia.org/wiki/Mock_object).
 15 | 
 16 | 
 17 | ## Mocking
 18 | 
 19 | Mocking allows you to replace parts of your system under test with mock
 20 | objects and make assertions about how they have been used. The `testthat`
 21 | package supports mocking via the `with_mock()` function. `with_mock()` allows us to
 22 | temporarily replace some R functions. In our case we replace the functions
 23 | that connect to the internet with mock functions that only *pretend* to do
 24 | so.
 25 | 
 26 | ### Simple Example - a system error
 27 | 
 28 | Suppose we use the `system()` function to call out to the operating
 29 | system and start some utility. This is a somewhat fragile operation: it
 30 | might fail if the utility is not available, so we make sure that we
 31 | handle errors properly. This is somewhat tricky in the case of `system()` as
 32 | it does not signal an R error if the system command fails; it just
 33 | returns the exit status of the system shell, which is typically
 34 | non-zero on an error.
 35 | 
 36 | To test that errors are handled properly, we would need to create an
 37 | environment where the system function indeed fails. While this is not
 38 | impossible, it is much easier to just pretend that it fails using
 39 | `with_mock()`:
 40 | 
 41 | 
 42 | ```r
 43 | ## Package or script code
 44 | ext <- function() {
 45 |   ## This usually works, but probably not always
 46 |   status <- system("sleep 5")
 47 |   if (status != 0) { stop("sleeping failed") }
 48 | }
 49 | ## Test code
 50 | library(testthat)
 51 | ```
 52 | 
 53 | ```
 54 | ## Warning: package 'testthat' was built under R version 3.2.5
 55 | ```
 56 | 
 57 | ```r
 58 | with_mock(
 59 |   `base::system` = function(...) { return(127) },
 60 |   expect_error(ext(), "sleeping failed")
 61 | )
 62 | ```
 63 | 
 64 | We want to test that `ext()` behaves well if a system error happens,
 65 | and throws a proper R error. We use `with_mock()` to temporarily change
 66 | `system()`, and pretend that it has failed. `with_mock()` takes two kinds
 67 | of arguments: named and unnamed. The named arguments are the mock functions;
 68 | the names define the functions to mock. Unnamed arguments are expressions
 69 | to evaluate in the mocked environment.
 70 | 
 71 | ## Using `with_mock` to avoid connecting to the internet
 72 | 
 73 | ### Example - `GET()`
 74 | 
 75 | Consider the function `GET()` from the `httr` package,
 76 | which will perform an HTTP GET request:
 77 | 
 78 | 
 79 | ```r
 80 | library(httr)
 81 | ```
 82 | 
 83 | ```
 84 | ## Warning: package 'httr' was built under R version 3.2.5
 85 | ```
 86 | 
 87 | ```r
 88 | response <- GET("http://httpbin.org/get")
 89 | ```
 90 | 
 91 | Under the hood `GET()` uses the `curl_fetch_memory()` function from the
 92 | `curl` package, built on top of `libcurl`. The job of the `GET()` function is
 93 | to call `curl` with the right arguments, and then process its response
 94 | properly. So this is what we need to test (testing `curl` itself is
 95 | another mocking story).
 96 | 
 97 | If we would like to test that `GET()` works correctly, without any internet
 98 | connection, the steps to do this are:
 99 | 
100 | * Trace `curl_fetch_memory()` to see the input it receives and the
101 |   output it generates.
102 | * Call `GET()` for the scenario we want to test, and record the input
103 |   and output of `curl_fetch_memory()`, so that we can use them to simulate an
104 |   internet connection later in the real test case. Study the recorded
105 |   input and output to see whether they can be reused, and update them as needed.
106 | * Write the test case using `with_mock()` on `curl_fetch_memory()` to
107 |   replace it with a mock function that checks whether the input received from
108 |   `GET()` is correct and provides the recorded output.
109 | 
110 | For brevity, let's assume that we only use the output of
111 | `curl_fetch_memory()` now. The input can be handled similarly.
112 | We trace it and record its return value:
113 | 
114 | 
115 | ```r
116 | library(curl)
117 | cfm_output <- NULL
118 | trace(
119 |   curl_fetch_memory,
120 |   exit = function() { cfm_output <<- returnValue() }
121 | )
122 | response <- GET("http://httpbin.org/get")
123 | ```
124 | 
125 | The return value of `curl_fetch_memory()` is a list that contains the HTTP
126 | status code, headers, some timing information, and of course the content
127 | itself:
128 | 
129 | 
130 | ```r
131 | names(cfm_output)
132 | ```
133 | 
134 | ```
135 | ## [1] "url"         "status_code" "headers"     "modified"    "times"      
136 | ## [6] "content"
137 | ```
138 | 
139 | We save this into a file now, and put the file into the `testthat`
140 | directory where the unit tests run.
141 | 
142 | 
143 | ```r
144 | save(cfm_output, file = "cfm_output.rda")
145 | ```
146 | 
147 | Now that we have the response stored locally we can use it to unit test
148 | `GET()` without connecting to the internet.
149 | 
150 | 
151 | ```r
152 | test_that("GET works as it should", {
153 |   response <- with_mock(
154 |     `curl::curl_fetch_memory` = function(...) {
155 |       load("cfm_output.rda")
156 |       cfm_output
157 |     },
158 |     GET("http://httpbin.org/get")
159 |   )
160 |   expect_silent(stop_for_status(response))
161 |   expect_equal(headers(response)$`content-type`, "application/json")
162 |   # ... more tests for the response
163 | })
164 | ```
165 | 
166 | We have successfully tested `GET()` without having to connect to a server.
167 | 
168 | ## Primitive Functions
169 | 
170 | One limitation of this method is that you may sometimes wish to
171 | mock a primitive function, but `with_mock()` does not allow you to do this.
172 | 
173 | ### Example - `BROWSE()`
174 | 
175 | `BROWSE()` is again from the `httr` package and relies on the function
176 | `browseURL()` from the base package `utils` to connect to the internet, so we
177 | will need to mock `browseURL()`. It is only called if the R session
178 | is interactive, i.e. `isTRUE(interactive())`, which probably holds when
179 | *writing* the unit tests, but not when *running* them.
180 | It seems straightforward to mock `interactive()` to return `TRUE`, but it
181 | is a primitive function, so `with_mock()` cannot deal with it:
182 | 
183 | ```r
184 | interactive
185 | with_mock(
186 |   `interactive` = function(...) TRUE,
187 |   interactive()
188 | )
189 | ```
190 | 
191 | ```
192 | ## Error in FUN(X[[i]], ...): old_fun must be a function
193 | ```
194 | 
199 | To work around this we must mock the function manually:
200 | 
201 | 
202 | ```r
203 | test_that("Interactive is always TRUE", {
204 | 
205 |   interactive_original <- base::interactive
206 |   on.exit(
207 |     {
208 |       assign("interactive", interactive_original, envir = baseenv())
209 |       lockBinding("interactive", baseenv())
210 |     },
211 |     add = TRUE
212 |   )
213 |   unlockBinding('interactive', baseenv())
214 |   assign('interactive', function() TRUE, envir = baseenv())
215 | 
216 |   expect_true(interactive())
217 | })
218 | ```
219 | 
220 | * We have saved the original version of `interactive()`, so that we can restore it.
221 | * `unlockBinding()` has allowed us to replace the function in the `base`
222 |   package with our version.
223 | * `on.exit()` makes sure that after the test has run, this function is
224 |   changed back, and  `lockBinding()` seals the `base` package again.
225 | 
226 | After this, mocking `browseURL()` is easy and we leave it
227 | as an exercise to the reader.
228 | 
229 | Note that this method is not allowed for tests in CRAN packages,
230 | because calling `unlockBinding()` is forbidden there. If you are
231 | writing tests for your own R package, then a workaround is to define
232 | your own non-primitive `is_interactive()` function and use (and mock) 
233 | that in your package:
234 | 
235 | 
236 | ```r
237 | is_interactive <- function() interactive()
238 | with_mock(
239 |   `is_interactive` = function() TRUE,
240 |   is_interactive()
241 | )
242 | ```
243 | 
244 | ```
245 | ## [1] TRUE
246 | ```
247 | 
248 | ## Links
249 | 
250 | * [This blog post on GitHub](https://github.com/MangoTheCat/blog-with-mock)
251 | * [The `testthat` R package.](https://github.com/hadley/testthat#readme)
252 | * [The `httr` R package.](https://github.com/hadley/httr#readme)
253 | * [The `curl` R package.](https://github.com/jeroenooms/curl#readme)
254 | * [ValidR](http://www.mango-solutions.com/wp/products-services/products/validr/)
255 | 


--------------------------------------------------------------------------------