├── .gitignore ├── Unconference_Aus_2016 ├── img │ ├── DSC_0007.JPG │ ├── hackthon1.jpg │ ├── hackthon2.jpg │ ├── hackthon3.jpg │ └── unconf_demog.png ├── Unconference_Aus_2016.md └── Unconference_Aus_2016.Rmd ├── blogs.Rproj └── R_no_primitives ├── R_no_primitives.Rmd └── R_no_primitives.md /.gitignore: -------------------------------------------------------------------------------- 1 | .Rproj.user 2 | .Rhistory 3 | .RData 4 | .Ruserdata 5 | -------------------------------------------------------------------------------- /Unconference_Aus_2016/img/DSC_0007.JPG: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/MilesMcBain/blogs/master/Unconference_Aus_2016/img/DSC_0007.JPG -------------------------------------------------------------------------------- /Unconference_Aus_2016/img/hackthon1.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/MilesMcBain/blogs/master/Unconference_Aus_2016/img/hackthon1.jpg -------------------------------------------------------------------------------- /Unconference_Aus_2016/img/hackthon2.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/MilesMcBain/blogs/master/Unconference_Aus_2016/img/hackthon2.jpg -------------------------------------------------------------------------------- /Unconference_Aus_2016/img/hackthon3.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/MilesMcBain/blogs/master/Unconference_Aus_2016/img/hackthon3.jpg -------------------------------------------------------------------------------- /Unconference_Aus_2016/img/unconf_demog.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/MilesMcBain/blogs/master/Unconference_Aus_2016/img/unconf_demog.png 
-------------------------------------------------------------------------------- /blogs.Rproj: -------------------------------------------------------------------------------- 1 | Version: 1.0 2 | 3 | RestoreWorkspace: Default 4 | SaveWorkspace: Default 5 | AlwaysSaveHistory: Default 6 | 7 | EnableCodeIndexing: Yes 8 | UseSpacesForTab: Yes 9 | NumSpacesForTab: 2 10 | Encoding: UTF-8 11 | 12 | RnwWeave: Sweave 13 | LaTeX: pdfLaTeX 14 | -------------------------------------------------------------------------------- /R_no_primitives/R_no_primitives.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "R Has No Primitives" 3 | author: "Miles McBain" 4 | output: 5 | md_document: 6 | variant: markdown_github 7 | --- 8 | 9 | # The Good Bit 10 | 11 | Some weeks ago Hadley tweeted [this graphic](https://twitter.com/hadleywickham/status/732288980549390336) about objects and names in R. Someone asked him to give a situation where this was important and he said: 12 | 13 | > I haven't been able to figure that out. But you'll make terrible predictions about performance unless you know 14 | 15 | I thought I knew what this meant. I truly did. But it wasn't until I saw the conversation around [this tweet](https://twitter.com/nj_tierney/status/735087930251710464) from @[nj_tierney](https://twitter.com/nj_tierney) that I could honestly say the penny finally dropped. And boy did it drop. I'll say this real slow and clear like for old-school coders like me: 16 | 17 | **R has no primitive types.** 18 | 19 | No, seriously. Everything is an object. Integers and floats are always numeric vectors, even when they are just one element long. EVEN bloody boolean values, which can be represented by a single bit in memory, are objects. Check this out: 20 | 21 | ```{r} 22 | library(pryr) 23 | abool <- TRUE 24 | object_size(abool) 25 | ``` 26 | 27 | 48 bytes for information that can be represented by a single bit!
O.O In truth this isn't so bad, because most languages pad their bools out to at least a whole byte. But the point is `abool` is not a bool. It's a SEXP... How cute. 28 | 29 | SEXPs are pointers to header thingies that describe objects. Further to our discussion today, those headers have a `named` field that records (roughly) how many names an object has been bound to. Hence Hadley's graphic. If you look at the [bestiary of SEXPs in the R language](https://cran.r-project.org/doc/manuals/r-release/R-ints.html#SEXPTYPEs) you will also see there is no scalar type: the basic data types (logical, integer, double, complex, character, raw) are all vectors. 30 | 31 | ![](http://i.giphy.com/OK27wINdQS5YQ.gif) 32 | 33 | I know, right? 34 | 35 | It gets better. This framework allows R to do some neat tricks with the assignment operator. I'll quote section 1.1.2 of the R Internals manual at you, because I think it explains it quite nicely: 36 | 37 | ```{} 38 | The named field is set and accessed by the SET_NAMED and NAMED macros, and take values 0, 1 and 2. R has a 'call by value' illusion, so an assignment like 39 | 40 | b <- a 41 | appears to make a copy of a and refer to it as b. However, if neither a nor b are subsequently altered there is no need to copy. What really happens is that a new symbol b is bound to the same value as a and the named field on the value object is set (in this case to 2). When an object is about to be altered, the named field is consulted. A value of 2 means that the object must be duplicated before being changed. (Note that this does not say that it is necessary to duplicate, only that it should be duplicated whether necessary or not.) A value of 0 means that it is known that no other SEXP shares data with this object, and so it may safely be altered. A value of 1 is used for situations like 42 | 43 | dim(a) <- c(7, 2) 44 | where in principle two copies of a exist for the duration of the computation as (in principle) 45 | 46 | a <- `dim<-`(a, c(7, 2)) 47 | ``` 48 | 49 | **TLDR:** R delays copies due to assignment until it absolutely has to, and can optimise out 'in principle' copies.
There is no call by value. It was all... an 'illusion'. 50 | 51 | ![](http://i.giphy.com/qJxFuXXWpkdEI.gif) 52 | 53 | # A History Lesson 54 | 55 | Why am I making a big deal out of this? Well, for me this was very surprising. I learned to code in C++, where there is a distinction between primitive types and objects. Primitive types don't waste any memory on headers; they are literally just the raw data represented in memory, and the compiler does the job of tracking their type. For example, an `int` typically takes up 4 bytes of memory on both common 32-bit and 64-bit platforms, and all of that memory is used to represent the numerical value of that integer. 56 | 57 | Let's compare that old world: 58 | 59 | ```{r} 60 | library(Rcpp) 61 | Rcpp::cppFunction( 62 | 'void primitiveDemo(){ 63 | // Two plain ints: no header, just the raw integer data. 64 | 65 | int a = 2; 66 | int b = a; // copies the value into separate storage 67 | int asize = sizeof(a); 68 | int bsize = sizeof(b); 69 | 70 | Rcpp::Rcout << "address of a: " << &a << ", address of b: " << &b; 71 | Rcpp::Rcout << ", size of a: " << asize << ", size of b: " << bsize << std::endl; 72 | 73 | return; 74 | }' 75 | ) 76 | 77 | primitiveDemo() 78 | ``` 79 | 80 | With the new: 81 | ```{r} 82 | a <- 2 83 | b <- a 84 | paste0("address of a: ", address(a), ", address of b: ", address(b)) 85 | paste0("size of a: ", object_size(a), ", size of b: ", object_size(b)) 86 | ``` 87 | 88 | So R is currently using a total of 48 bytes for storage, because `a` and `b` are two names bound to the same object: it did not actually make a copy. C++ makes the copy and uses a total of 8 bytes (two 4-byte `int`s). 89 | 90 | # Take Away 91 | If you're trying to optimise R while thinking like a C++ coder, you may well be doing more harm than good. I myself have fallen foul of this in an attempt to modify data frames in place with my `pushr` package. It ended up just being syntactic sugar, with no observable performance boost.
92 | 93 | 94 | 95 | -------------------------------------------------------------------------------- /Unconference_Aus_2016/Unconference_Aus_2016.md: -------------------------------------------------------------------------------- 1 | On April 21st and 22nd of 2016, we had 40 members of the R community gather in Brisbane, Australia, with the goal of reproducing the rOpenSci Unconference events that have been running with great success in San Francisco since 2014. Like every event organiser ever, we went through the usual crises: Where will it be? Will anyone actually show up? Is the problem space over venue, date, attendees, catering and sponsors convex? Is it even possible to organise an event by only uttering TRUE statements? 2 | 3 | It ain't easy being green. The fact that back in December 2015 Karthik Ram agreed to a proposal to essentially franchise the rOpenSci unconference to a few junior researchers he’d never heard of, from a place a long way away, with enthusiasm for rOpenSci’s ideals as their only redeeming quality, still astounds us. More than that, it speaks volumes about the accessibility of this community to anyone who believes in what it stands for. 4 | 5 | So, sustained by coaching from Karthik, encouragement from local R luminaries, and generous support from sponsors who jumped right on the idea, we pulled together an event that captured the spirit of those that came before and got the community interested. The final event attracted a stellar cohort of participants, from postgrad students, post-docs and tenured professors to industry statisticians and government data scientists. Attendees travelled from across Australia, New Zealand and even Singapore. Together we represented a diverse range of organisations, including Queensland University of Technology, University of Melbourne, Monash University, University of Iowa, University of Auckland, Microsoft, University of Adelaide, CSIRO, ACEMS, AACo, QLD Museum, QLD government and others.
6 | 7 | It's hard not to gush about it. It's the most validating feeling to see the community rally around this idea of getting together in the name of open source, open science, and solving each other's problems. It was such a privilege to be amongst this group of incredibly intelligent and giving people. 8 | 9 | The plot below shows the breakdown of the attendees in terms of gender and profession, both areas where we hope to improve diversity in 2017. We particularly hope to attract more women from the grad student cohort, as well as a wider representation from industry and government. 10 | 11 | ![Attendee Demographics](./img/unconf_demog.png) 12 | 13 | Much to our delight, it turns out that collecting a bunch of smart, passionate people and freeing them from distractions for 2 days works at least as spectacularly down under as it has in the United States. And while the outcomes of this event are far greater than a set of packages, we are proud to introduce you to the following: 14 | 15 | [Eechidna](https://github.com/ropenscilabs/eechidna) (now on CRAN!) is a package for easily creating visual mashups of Australian electoral and demographic data. It arrives just in time for the lead-up to the Australian election (July 2). In the next few days we will launch a visualisation competition around the package. 16 | 17 | [A collection of vignettes](https://github.com/saundersk1/auunconf16) for pulling Australian weather data into R. 18 | 19 | [Awaptools](https://github.com/swish-climate-impact-assessment/awaptools), a helpful collection of methods for downloading and unzipping Australian weather grids. 20 | 21 | [Leafier](https://github.com/ropenscilabs/leafier) is an impressive proof of concept for improving leaflet's performance with large spatial datasets. The presentation for this one induced one of those magical ‘collective intake of breath’ moments.
22 | 23 | [Snowball](https://github.com/ropenscilabs/snowball), a framework for administering AWS clusters and scheduling cluster jobs from within R. 24 | 25 | [BURGr](http://www.meetup.com/Brisbane-Users-of-R-Group-BURGr/), the Brisbane Users of R Group, was born out of the unconference and will be having its first meetup this week. 26 | 27 | Feedback from participants has been positive. The unconference format was a real hit, and there has been a lot of encouragement to keep 2017 similarly small, with a similar format. We agree! We also see the merits of a ‘primer’ session on open source/open science tools before the main event, so that everyone can start off on a similar footing. The voting process for selecting issues was also identified as needing tweaking, and we’re going to address that too. This is a great thing about the unconference format: since everyone helps create the experience, everyone is invested and wants to see it thrive. 28 | 29 | If you like what you’ve been reading and want to be involved in 2017, [you can register your interest on our site](http://auunconf.ropensci.org/). 30 | 31 | To conclude this post, we would like to give heartfelt thanks to our sponsors. These guys get it! Thank you to the [Microsoft Innovation Centre, Brisbane](https://www.microsoft.com.au), [ACEMS](https://www.acems.org.au), [rOpenSci](https://www.ropensci.org), and [AACo](https://www.aaco.com.au/) for the support to make the first rOpenSci Unconference happen in Australia.
32 | 33 | Miles, Jessie & Nick 34 | 35 | ![Lounge hacking](./img/hackthon1.jpg) ![Window hacking](./img/hackthon2.jpg) ![Main Room hacking](./img/hackthon3.jpg) ![Miles, Jonathan, Jessie, Nick](./img/DSC_0007.JPG) 36 | -------------------------------------------------------------------------------- /Unconference_Aus_2016/Unconference_Aus_2016.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "Unconference Aus 2016" 3 | author: "Jessie Roberts, Miles McBain, Nick Tierney" 4 | date: "6/14/2016" 5 | output: 6 | md_document: 7 | variant: markdown_github 8 | --- 9 | 10 | 11 | 12 | 13 | On April 21st and 22nd of 2016, we had 40 members of the R community gather in Brisbane, Australia, with the goal of reproducing the rOpenSci Unconference events that have been running with great success in San Francisco since 2014. Like every event organiser ever, we went through the usual crises: Where will it be? Will anyone actually show up? Is the problem space over venue, date, attendees, catering and sponsors convex? Is it even possible to organise an event by only uttering TRUE statements? 14 | 15 | It ain't easy being green. The fact that back in December 2015 Karthik Ram agreed to a proposal to essentially franchise the rOpenSci unconference to a few junior researchers he’d never heard of, from a place a long way away, with enthusiasm for rOpenSci’s ideals as their only redeeming quality, still astounds us. More than that, it speaks volumes about the accessibility of this community to anyone who believes in what it stands for. 16 | 17 | So, sustained by coaching from Karthik, encouragement from local R luminaries, and generous support from sponsors who jumped right on the idea, we pulled together an event that captured the spirit of those that came before and got the community interested.
The final event attracted a stellar cohort of participants, from postgrad students, post-docs and tenured professors to industry statisticians and government data scientists. Attendees travelled from across Australia, New Zealand and even Singapore. Together we represented a diverse range of organisations, including Queensland University of Technology, University of Melbourne, Monash University, University of Iowa, University of Auckland, Microsoft, University of Adelaide, CSIRO, ACEMS, AACo, QLD Museum, QLD government and others. 18 | 19 | It's hard not to gush about it. It's the most validating feeling to see the community rally around this idea of getting together in the name of open source, open science, and solving each other's problems. It was such a privilege to be amongst this group of incredibly intelligent and giving people. 20 | 21 | The plot below shows the breakdown of the attendees in terms of gender and profession, both areas where we hope to improve diversity in 2017. We particularly hope to attract more women from the grad student cohort, as well as a wider representation from industry and government. 22 | 23 | ![Attendee Demographics](./img/unconf_demog.png) 24 | 25 | Much to our delight, it turns out that collecting a bunch of smart, passionate people and freeing them from distractions for 2 days works at least as spectacularly down under as it has in the United States. And while the outcomes of this event are far greater than a set of packages, we are proud to introduce you to the following: 26 | 27 | [Eechidna](https://github.com/ropenscilabs/eechidna) (now on CRAN!) is a package for easily creating visual mashups of Australian electoral and demographic data. It arrives just in time for the lead-up to the Australian election (July 2). In the next few days we will launch a visualisation competition around the package.
28 | 29 | [A collection of vignettes](https://github.com/saundersk1/auunconf16) for pulling Australian weather data into R. 30 | 31 | [Awaptools](https://github.com/swish-climate-impact-assessment/awaptools), a helpful collection of methods for downloading and unzipping Australian weather grids. 32 | 33 | [Leafier](https://github.com/ropenscilabs/leafier) is an impressive proof of concept for improving leaflet's performance with large spatial datasets. The presentation for this one induced one of those magical ‘collective intake of breath’ moments. 34 | 35 | [Snowball](https://github.com/ropenscilabs/snowball), a framework for administering AWS clusters and scheduling cluster jobs from within R. 36 | 37 | [BURGr](http://www.meetup.com/Brisbane-Users-of-R-Group-BURGr/), the Brisbane Users of R Group, was born out of the unconference and will be having its first meetup this week. 38 | 39 | Feedback from participants has been positive. The unconference format was a real hit, and there has been a lot of encouragement to keep 2017 similarly small, with a similar format. We agree! We also see the merits of a ‘primer’ session on open source/open science tools before the main event, so that everyone can start off on a similar footing. The voting process for selecting issues was also identified as needing tweaking, and we’re going to address that too. This is a great thing about the unconference format: since everyone helps create the experience, everyone is invested and wants to see it thrive. 40 | 41 | If you like what you’ve been reading and want to be involved in 2017, [you can register your interest on our site](http://auunconf.ropensci.org/). 42 | 43 | To conclude this post, we would like to give heartfelt thanks to our sponsors. These guys get it!
Thank you to the [Microsoft Innovation Centre, Brisbane](https://www.microsoft.com.au), [ACEMS](https://www.acems.org.au), [rOpenSci](https://www.ropensci.org), and [AACo](https://www.aaco.com.au/) for the support to make the first rOpenSci Unconference happen in Australia. 44 | 45 | Miles, Jessie & Nick 46 | 47 | ![Lounge hacking](./img/hackthon1.jpg) 48 | ![Window hacking](./img/hackthon2.jpg) 49 | ![Main Room hacking](./img/hackthon3.jpg) 50 | ![Miles, Jonathan, Jessie, Nick](./img/DSC_0007.JPG) -------------------------------------------------------------------------------- /R_no_primitives/R_no_primitives.md: -------------------------------------------------------------------------------- 1 | The Good Bit 2 | ============ 3 | 4 | Some weeks ago Hadley tweeted [this 5 | graphic](https://twitter.com/hadleywickham/status/732288980549390336) 6 | about objects and names in R. Someone asked him to give a situation 7 | where this was important and he said: 8 | 9 | > I haven't been able to figure that out. But you'll make terrible 10 | > predictions about performance unless you know 11 | 12 | I thought I knew what this meant. I truly did. But it wasn't until I saw 13 | the conversation around [this 14 | tweet](https://twitter.com/nj_tierney/status/735087930251710464) from 15 | @[nj\_tierney](https://twitter.com/nj_tierney) that I could honestly say 16 | the penny finally dropped. And boy did it drop. I'll say this real slow 17 | and clear like for old-school coders like me: 18 | 19 | **R has no primitive types.** 20 | 21 | No, seriously. Everything is an object. Integers and floats are always 22 | numeric vectors, even when they are just one element long. EVEN bloody 23 | boolean values, which can be represented by a single bit in memory, are 24 | objects. Check this out: 25 | 26 | library(pryr) 27 | abool <- TRUE 28 | object_size(abool) 29 | 30 | ## 48 B 31 | 32 | 48 bytes for information that can be represented by a single bit!
O.O In 33 | truth this isn't so bad, because most languages pad their bools out to at 34 | least a whole byte. But the point is `abool` is not a bool. It's a SEXP... How cute. 35 | 36 | SEXPs are pointers to header thingies that describe objects. Further to our 37 | discussion today, those headers have a `named` field that records (roughly) how 38 | many names an object has been bound to. Hence Hadley's graphic. If you look 39 | at the [bestiary of SEXPs in the R 40 | language](https://cran.r-project.org/doc/manuals/r-release/R-ints.html#SEXPTYPEs) 41 | you will also see there is no scalar type: the basic data types (logical, integer, double, complex, character, raw) are all vectors. 42 | 43 | ![](http://i.giphy.com/OK27wINdQS5YQ.gif) 44 | 45 | I know, right? 46 | 47 | It gets better. This framework allows R to do some neat tricks with the 48 | assignment operator. I'll quote section 1.1.2 of the R Internals manual at you, 49 | because I think it explains it quite nicely: 50 | 51 | The named field is set and accessed by the SET_NAMED and NAMED macros, and take values 0, 1 and 2. R has a 'call by value' illusion, so an assignment like 52 | 53 | b <- a 54 | appears to make a copy of a and refer to it as b. However, if neither a nor b are subsequently altered there is no need to copy. What really happens is that a new symbol b is bound to the same value as a and the named field on the value object is set (in this case to 2). When an object is about to be altered, the named field is consulted. A value of 2 means that the object must be duplicated before being changed. (Note that this does not say that it is necessary to duplicate, only that it should be duplicated whether necessary or not.) A value of 0 means that it is known that no other SEXP shares data with this object, and so it may safely be altered.
A value of 1 is used for situations like 55 | 56 | dim(a) <- c(7, 2) 57 | where in principle two copies of a exist for the duration of the computation as (in principle) 58 | 59 | a <- `dim<-`(a, c(7, 2)) 60 | 61 | **TLDR:** R delays copies due to assignment until it absolutely has to, 62 | and can optimise out 'in principle' copies. There is no call by value. 63 | It was all... an 'illusion'. 64 | 65 | ![](http://i.giphy.com/qJxFuXXWpkdEI.gif) 66 | 67 | A History Lesson 68 | ================ 69 | 70 | Why am I making a big deal out of this? Well, for me this was very 71 | surprising. I learned to code in C++, where there is a distinction 72 | between primitive types and objects. Primitive types don't waste any 73 | memory on headers; they are literally just the raw data represented in 74 | memory, and the compiler does the job of tracking their type. For 75 | example, an `int` typically takes up 4 bytes of memory on both common 32-bit 76 | and 64-bit platforms, and all of that memory is used to represent the 77 | numerical value of that integer.
78 | 79 | Let's compare that old world: 80 | 81 | library(Rcpp) 82 | Rcpp::cppFunction( 83 | 'void primitiveDemo(){ 84 | // Two plain ints: no header, just the raw integer data. 85 | 86 | int a = 2; 87 | int b = a; // copies the value into separate storage 88 | int asize = sizeof(a); 89 | int bsize = sizeof(b); 90 | 91 | Rcpp::Rcout << "address of a: " << &a << ", address of b: " << &b; 92 | Rcpp::Rcout << ", size of a: " << asize << ", size of b: " << bsize << std::endl; 93 | 94 | return; 95 | }' 96 | ) 97 | 98 | primitiveDemo() 99 | 100 | ## address of a: 0x7ffc45730ae8, address of b: 0x7ffc45730aec, size of a: 4, size of b: 4 101 | 102 | With the new: 103 | 104 | a <- 2 105 | b <- a 106 | paste0("address of a: ", address(a), ", address of b: ", address(b)) 107 | 108 | ## [1] "address of a: 0x340abf8, address of b: 0x340abf8" 109 | 110 | paste0("size of a: ", object_size(a), ", size of b: ", object_size(b)) 111 | 112 | ## [1] "size of a: 48, size of b: 48" 113 | 114 | So R is currently using a total of 48 bytes for storage, because `a` and `b` 115 | are two names bound to the same object: it did not actually make a copy. C++ makes the copy and uses a total of 8 bytes (two 4-byte `int`s). 116 | 117 | Take Away 118 | ========= 119 | 120 | If you're trying to optimise R while thinking like a C++ coder, you may 121 | well be doing more harm than good. I myself have fallen foul of this in 122 | an attempt to modify data frames in place with my `pushr` package. It 123 | ended up just being syntactic sugar, with no observable performance 124 | boost. 125 | --------------------------------------------------------------------------------