├── README.md
├── blog-post.md
├── month-1-summary.md
├── month-1.md
├── month-2.md
└── proposal.md


/README.md:
--------------------------------------------------------------------------------
proposal.md


--------------------------------------------------------------------------------
/blog-post.md:
--------------------------------------------------------------------------------

# First public version of the r-hub builder

## Introduction

The r-hub builder is the first major project of the
[R consortium](https://www.r-consortium.org/). It is an R package build
and continuous integration service, open to all members of the R
community.

The goals of r-hub are to:
* simplify the R package development process: creating a package,
  building binaries and continuous integration, publishing, distributing
  and maintaining it;
* encourage community contributions; and
* pre-test CRAN package submissions to ease the burden on CRAN maintainers.

## What's available

* Linux builders for uploaded R source packages. You can watch the
  package check process in real time. Currently Debian and Fedora
  builders are available. Builds are performed in Docker containers,
  and new builders can be added easily.

* Automatic detection of system requirements. We built a system
  requirements database that allows us to automatically install system
  software needed for R packages. Note that the database needs constant
  improvement; if it fails for your R package, please let us know
  (see below).

* Flexible package dependencies. You don't need to have all your package
  dependencies on CRAN in order to use r-hub. We support
  `devtools`-style `Remotes` fields in `DESCRIPTION`, so you can depend
  on packages hosted on GitHub, BitBucket, etc.
  See more about this at
  https://cran.r-project.org/web/packages/devtools/vignettes/dependencies.html

Go to https://builder.r-hub.io to try the r-hub builder!

## What's coming?

Mostly everything else that was promised in the
[proposal](https://github.com/r-hub/proposal#readme). The two major
features that are coming soon are
* Windows builds, and
* The r-hub CI. You'll be able to trigger builds from your GitHub
  repositories.

## You can help

* Please try running the builder on your R package, and report failures
  at https://github.com/r-hub/feedback/issues or by emailing
  `support@r-hub.io`.
* If your package does not build because of an undetected system
  requirement, please let us know at
  https://github.com/r-hub/sysreqsdb/issues
* The code for r-hub itself is open source and available on
  [GitHub](https://github.com/r-hub). Your contributions are welcome!


--------------------------------------------------------------------------------
/month-1-summary.md:
--------------------------------------------------------------------------------

# Summary of the first month

## Deliverables

### 1 Jenkins DockerFile, tailored to r-hub

Available at https://github.com/r-hub/rhub-jenkins and
https://github.com/r-hub/dokku-jenkins. This configuration is not final,
but most challenges related to configuring Jenkins in a Docker container and
running it from Dokku are solved.

### 2 r-hub application, front-end

A minimal web application is at https://github.com/r-hub/rhub-frontend.
It needs a better response page; otherwise it is complete for this stage
of the project. (It will change with other OSes and configuration options.)

### 3 r-hub application, back-end

It is at https://github.com/r-hub/rhub-backend. The principles are
done.
The Jenkins job configuration templates are also included in this
repository; we will extend these substantially.

### 4 A running instance of our Jenkins container

This is not done yet; we only have a local deployment.

### 5 A running instance of the r-hub front-end

This is not done yet; we only have a local deployment.

### 6 A running instance of the r-hub back-end

This is not done yet; we only have a local deployment.

### 7 DockerFile for an Ubuntu Linux builder

We have two Linux builder configurations at
https://github.com/r-hub/rhub-linux-builders.

### 8 A running instance of a Linux builder

This is not done yet; we only have a local deployment.

### 9 A GitHub repository for r-hub's documentation

It is at https://github.com/r-hub/docs. To be extended as we move on.

### 10 A GitHub repository for user feedback

A currently empty repository is at https://github.com/r-hub/feedback.

### 11 r-hub controller

After considering and evaluating various VM and container management
solutions, I decided to use a simple Vagrant-based approach for now.
The controller code and configuration are included in an R package:
https://github.com/r-hub/rhubctrl

### 12 A method for storing sensitive information on GitHub

We use https://github.com/hadley/secure to store secrets in the repository
on the controller, and environment variables to propagate them to the
virtual machines and containers. More details of the configuration
are at https://github.com/r-hub/rhubctrl#configuration.


--------------------------------------------------------------------------------
/month-1.md:
--------------------------------------------------------------------------------

# Plans for the first month of r-hub

Note: this document follows mostly the original proposal.
One week
in the proposal's timeline corresponds to two weeks here, as I will
work for only 87 hours per month (roughly 50%) on the project.

## Introduction

The goal of the first ~5 weeks is to have a Linux-builder web application
and back-end up and running. Users can submit builds via a web page, and
the build results are stored on a static web server. Users
get e-mail notifications about the results. This is very much like the
current win-builder, but for Linux, and for now for one specific Linux
distribution.

## Deliverables

### 1 Jenkins DockerFile, tailored to r-hub

### 2 r-hub application, front-end

The web application that currently only handles the web uploads. It hands over
the submissions to the back-end, by putting them in a message queue,
and storing the submitted source packages in a shared filesystem.

This is a node.js application and it will run in a Docker container.

### 3 r-hub application, back-end

This is the application that coordinates the builds. Currently, it picks up
submissions from the message queue, one by one, and adds each of them as a
new Jenkins job. It also "contains" the message queue itself.

For now it works with a fixed number of builders, all added to Jenkins
manually.

The back-end is also responsible for the (temporary) storage of the source and
binary packages.

This is a node.js application, and it will run in a Docker container.

### 4 A running instance of our Jenkins container

Managed via Dokku.

### 5 A running instance of the r-hub front-end

Managed via Dokku.

### 6 A running instance of the r-hub back-end

Managed via Dokku.

### 7 DockerFile for an Ubuntu Linux builder

We can use one with the most recently released R version, i.e. R 3.2.2.
Most probably we can just take the container from https://github.com/rocker-org.

### 8 A running instance of a Linux builder

A generic Linux server running 2-4 Linux builder containers,
all managed via Dokku. The containers connect to the Jenkins server
automatically.

### 9 A GitHub repository for r-hub's documentation

Using markdown documents.

### 10 A GitHub repository for user feedback

This repository is linked to from the submission page, from the
emails sent to users, etc.

### 11 r-hub controller

Work out a method to control the r-hub micro-services. Look for a simple
solution that is possibly compatible with dokku. The controller is responsible
for supplying the configuration information to all r-hub containers.

Possible solutions to evaluate:
* Google Kubernetes
* Helios from Spotify
* Docker compose
* Flynn
* Decking
* Deis

### 12 A method for storing sensitive information on GitHub

There are already methods for this, of course. Ideally we want something
simple and lightweight. I.e. the whole repo is encrypted, and can be decrypted
by any admin keys. https://github.com/hadley/secure contains most of what we
need, but we probably need to access the encrypted secrets from the command
line without using R, and probably also from JavaScript.


--------------------------------------------------------------------------------
/month-2.md:
--------------------------------------------------------------------------------

# Plans for the second month of r-hub

Note: this document follows mostly the original proposal. One week
in the proposal's timeline corresponds to two weeks here, as I will
work for only 87 hours per month (roughly 50%) on the project.

## Introduction

This month will focus on three themes:
* Bringing up a working web-submission based system, with several Linux
  builders.
* Handling system requirements.
* Redundant services, logging and monitoring.

## Deliverables

### 1 A running instance of our Jenkins container

From month 1.

### 2 A running instance of the r-hub front-end

From month 1.

### 3 A running instance of the r-hub back-end

From month 1.

### 4 DockerFile for an Ubuntu Linux builder

From month 1.

### 5 A running instance of a Linux builder

From month 1.

### 6 Builder for several Linux flavours

Current Linux platforms on CRAN:

* `r-devel-linux-x86_64-debian-clang` (Debian testing)
* `r-devel-linux-x86_64-debian-gcc` (Debian testing)
* `r-devel-linux-x86_64-fedora-clang` (Fedora 21)
* `r-devel-linux-x86_64-fedora-gcc` (Fedora 21)
* `r-patched-linux-x86_64` (Debian testing)
* `r-release-linux-x86_64` (Debian testing)

Other popular Linux platforms we want to support:

* `r-devel` on most recent Ubuntu LTS
* `r-release` on most recent Ubuntu LTS
* `r-devel` on stable Debian
* `r-release` on stable Debian

### 7 Database of system requirements

Create a database of system requirements for CRAN packages. This work
has already started at https://github.com/r-hub/sysreqs.
The builders need to use this database to install additional software
in the container. Caching this software would be ideal but looks challenging.

### 8 Precompiling and caching CRAN packages

Create a volume on AWS and/or Azure that stores the precompiled
binaries of all CRAN packages, for all supported Linux platforms.
Fix the system requirement database and the build system based
on the errors we get while compiling the CRAN packages.

### 9 Logging

Implement logging for all services, to a centralized logger machine.

### 10 Monitoring

Design a sensu dashboard for the r-hub services.


--------------------------------------------------------------------------------
/proposal.md:
--------------------------------------------------------------------------------

# **r-hub**: the everything-builder the R community needs

Gábor Csárdi, based on previous versions by J.J. Allaire (RStudio), Ben
Bolker (McMaster University), Dirk Eddelbuettel (Debian), Jay Emerson (Yale
University), Nicholas Lewin-Koh (Genentech), Joseph Rickert (Revolution),
David Smith (Revolution), Murray Stokely (Google), and Simon Urbanek (AT&T)

July 21, 2015

## Introduction

The infrastructure available for developing, building, testing, and
validating R packages is of critical importance to the R community. CRAN
and R-Forge have traditionally met these needs; however, the maintenance and
enhancement of R-Forge has significant costs in both money and time. This
proposal outlines r-hub, a service that is complementary to CRAN and
R-Forge, that would add capabilities, improve extensibility, and create a
platform for community contributions to r-hub itself.

## Goals

1. Services that ease _all_ steps of the R package development process:
   creating a package, building binaries and continuous integration,
   publishing, distributing and maintaining it.
2. Make these services free for all members of the community.
3. Allow community contributions to r-hub itself.
4. Make CRAN maintainers' work easier by pre-testing CRAN package
   submissions.

## Design principles

1.
   Reuse as much existing technology as possible, especially the tools and
   services already widely used among R package developers.
2. Keep compatibility with current systems.
3. Cover as much of the user base as possible.
4. Create open systems: provide APIs, make all code for the service open
   source, accept contributions to it. Make all software open source on
   GitHub. GitHub can be used with git and svn natively, and also with
   Mercurial via a plugin.
5. Automated system. Make the daily operation of r-hub fully automatic,
   without any human intervention needed.
6. Cost-effective services. Rent machines from the cloud, if possible.
   Adapt number and type of machines to the current workload.
7. Make it possible for organizations and individuals to contribute (the
   cost of) machines to the build farm.
8. Make services reproducible. It should be possible to start up an
   instance of any r-hub service within minutes, for anyone, even
   on their own laptop. This is crucial for development, testing and
   user contribution.
9. Separation is key, so use containers and reproducible VM images as much
   as possible.

## Essential services

### Build server

To check and build R packages. It also creates binaries for various
platforms, including the currently supported CRAN platforms, and
potentially others:
- MS Windows
- OSX, each flavor supported by CRAN
- Linux flavors
- Solaris

Users can upload their packages through a web page or an HTTP
API, and potentially select the platforms to build on. They are provided
with a randomized URL where they can check the status of their builds, see
their output, and cancel the build. Once the builds are finished,
they can download source and binary R packages.
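
As a minimal sketch of the randomized URL idea described above, the snippet
below derives an unguessable build ID with Python's standard `secrets`
module. The base URL, the `/status/` path and the function name are
hypothetical; the proposal does not specify how the URL is formed.

```python
import secrets

# Hypothetical base URL; the proposal does not name one.
BASE_URL = "https://builder.example.org"


def new_status_url(base_url: str = BASE_URL, nbytes: int = 16) -> str:
    """Return a status URL that embeds a random, URL-safe build ID."""
    # token_urlsafe() draws from the OS random source, so the ID is hard
    # to guess, which is what keeps an unlisted status page private.
    build_id = secrets.token_urlsafe(nbytes)
    return f"{base_url}/status/{build_id}"
```

Using a cryptographically strong token rather than a sequential ID means
that knowing one build URL gives no hint about any other.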

### Continuous integration for R packages

Make the build server work as a continuous integration service, free to use
for all R community members, integrated into GitHub and potentially other
popular source code repositories. It will work very similarly to
Travis or other GitHub-integrated CI services, but will specialize in R
packages: better start-up time using caching and binary builds of popular or
all packages, no or minimal configuration requirements for the users.

### Distribution of R package sources and binaries

Store built source and binary packages in a repository from which they can
be installed easily. This service can be used for distributing development
versions of CRAN R packages, or for R packages that are not (yet) fit to be
distributed on CRAN.

### Community web site

To showcase the packages, and to integrate all services. Users can browse
packages, search for packages, search the source code, view build
statistics, link to GitHub issues.

## Implementation details and time line

### Build server: Jenkins in a container (week 1)

[Jenkins][jenkins] server, running in a Docker container. We need one
master server to start with. We will run it in the cloud, initially on
DigitalOcean or AWS. We will use [dokku][dokku] to manage it (and will also
manage other microservices similarly in containers). We will probably
switch to something more integrated in the future, like
[Kubernetes][kubernetes] on [CoreOS][coreos], but for initial development
dokku is satisfactory.

We will not expose Jenkins directly, but both users and admins will access
it through an HTTP API (the r-hub API). The reason for this is that we
might need to use something other than Jenkins for Solaris, and that in the
future we might need multiple Jenkins servers.

Initially the server providing the r-hub API is very simple, and will
essentially serve as a proxy to our single Jenkins master.

The actual build submissions to Jenkins will be handled by another process,
so that the web app can handle requests promptly. The web app simply puts
the file into a shared folder (putting a random submission id in the
file name to make sure it is unique), and then adds a request including
the file name and the id to a RabbitMQ queue. Another process picks up
jobs from the queue, and then communicates with Jenkins to add a job and
start the build. Query parameters are also passed to the queue.

[jenkins]: https://jenkins-ci.org
[dokku]: https://github.com/progrium/dokku
[kubernetes]: https://github.com/googlecloudplatform/kubernetes
[coreos]: https://coreos.com/

### r-hub API (weeks 1-2)

An HTTP API to manipulate Jenkins and the builds. The protocol is in JSON.
The initial design is very simple, which is sufficient for the web-based
submissions.

#### Builds

- `PUT /jobs/submit/tarball`

  Submit a source R package to be checked and built. The server responds
  with a JSON string including a random ID that can be used to query the
  submission later. Query parameters can be used later to select build
  OS, email notification, etc.

- `GET /jobs/info/`

  Query the status of a submission. The server responds with a status
  code (e.g. SUBMITTED, QUEUED, DONE, etc), and if the build is already
  generating output, a summary of the output as well. The response also
  contains info about the build machine, if that is known already.

- `GET /jobs/output/`

  Query the console output of the check.
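
To make the build endpoints above more concrete, here is a hedged client-side
sketch that only constructs the requests, without sending them. The host
name, the convention of appending the submission ID to the `info` path, and
the `status` field of the JSON response are all assumptions made for
illustration; the proposal does not fix them.

```python
import json
import urllib.request

# Hypothetical host; the proposal does not name one.
BASE_URL = "https://api.example.org"


def submit_request(tarball: bytes, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) the PUT request that submits a source package."""
    return urllib.request.Request(
        f"{base_url}/jobs/submit/tarball",
        data=tarball,
        method="PUT",
        headers={"Content-Type": "application/octet-stream"},
    )


def info_request(job_id: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build the status query. Assumption: the ID is appended to the path."""
    return urllib.request.Request(f"{base_url}/jobs/info/{job_id}", method="GET")


def is_finished(response_body: str) -> bool:
    """Assumed response shape: a JSON object with a "status" field."""
    return json.loads(response_body).get("status") == "DONE"
```

A real client would pass these objects to `urllib.request.urlopen` and poll
the info endpoint until `is_finished` returns true.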

#### Worker machines, API for administrators

- `GET /workers`

  Query the list of active worker machines.

- `PUT /worker`

  Instruct Jenkins to create another worker. Its parameters are
  supplied in the body of the request, in JSON.

- `DELETE /worker/`

  Remove a worker.

### Jenkins Linux workers (weeks 2-3), backends

We will support different backends for workers. Initially there will be two
backends: local, and either DigitalOcean or AWS.

The local backend is for development and testing; it creates a worker
container on the local machine.

The DigitalOcean/AWS worker is mainly for production (but can be used for
testing). DigitalOcean has an API to create a droplet with Docker support,
and AWS has an API as well. Jenkins has a Docker plugin that can create the
container, once a Docker environment is set up.

Each platform needs to be able to install a recent version of R-devel (and
R-patched). For the Linux workers, a permanent Jenkins job will build
R-devel daily, in a separate container, and then distribute the
(successful) builds via HTTP. The worker containers download the binary
R-devel versions upon start-up, and they also check if they have the most
recent version at the beginning of each build. This is a simple and cheap
HTTP HEAD query.
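
The freshness check described above could look roughly like the following
sketch. It assumes that the server distributing the R-devel builds returns
an `ETag` header and that the worker remembers the value from its last
download; neither detail is stated in the proposal, and the function names
are illustrative.

```python
import urllib.request
from typing import Optional


def fetch_build_etag(url: str) -> Optional[str]:
    """Send an HTTP HEAD request and return the ETag header, if present."""
    req = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.headers.get("ETag")


def needs_update(local_etag: Optional[str], remote_etag: Optional[str]) -> bool:
    """Decide whether the worker should download the R-devel build again."""
    if remote_etag is None:
        # No validator from the server: download again to be safe.
        return True
    return local_etag != remote_etag
```

Because a HEAD request transfers only headers, running this before every
build costs very little compared to downloading the binary itself.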

Current Linux platforms on CRAN:

- `r-devel-linux-x86_64-debian-clang` (Debian testing)
- `r-devel-linux-x86_64-debian-gcc` (Debian testing)
- `r-devel-linux-x86_64-fedora-clang` (Fedora 21)
- `r-devel-linux-x86_64-fedora-gcc` (Fedora 21)
- `r-patched-linux-x86_64` (Debian testing)
- `r-release-linux-x86_64` (Debian testing)

Other popular Linux platforms we want to support:

- `r-devel` on most recent Ubuntu LTS
- `r-release` on most recent Ubuntu LTS
- `r-devel` on stable Debian
- `r-release` on stable Debian

### System requirements (week 4)

Create a database of system requirements, in the spirit of
https://github.com/metacran/sysreqs. The workers need to use this
database.

Repositories needed so far:

* `rhub-jenkins` The Docker config for our Jenkins instance. It
  contains the Jenkins config as well.
* `rhub-app` The web app to submit and query jobs, in node.js.
- `rhub-api` Web pages that document the API.
* `rhub-queue` The app that handles the queue, picks up the files,
  creates Jenkins jobs, and starts the build.
* `rhub-worker-*` One repo for the Docker config for each worker type.
  I.e. one for `r-devel-linux-x86_64-debian-clang`, etc.
* `rhub-backend-local` The local backend. All the code needed to create
  a local worker, and also the code needed to destroy it. In node.js,
  probably, so that the app can easily use it.
* `rhub-backend-do` DigitalOcean backend to create/destroy workers on DO.
* `rhub-backend-aws` Backend to create/destroy workers on AWS.
* `sysreqs` Database of system requirement mappings.
* `sysreqs-app` Web app with an API for system requirements.
* `rhub-issues` A repository for user feedback.
* `rhub-config` Encrypted r-hub configuration, accessible only via the
  private keys of the admins.
  All sensitive information needed for r-hub to
  work will be included here.

### Online submission system (week 5)

Test the Linux builders and the system requirements by building
all CRAN packages (that are available on Linux). Fix errors; we expect
99% of the CRAN packages to build correctly.

In general, orchestrate, document, test and make public the web app for
submitting builds for Linux machines. At this point, we will have the
Linux-builder equivalent of Win-builder, as an open, extensible system.

### Redundant services, logging and monitoring (weeks 6-7)

For logging, redirect all logs through the network to a single log
server. These logs do not include console output from builds; those
are handled separately. This is only about the r-hub service logs.

For build logs, at this point of the project, we will just leave them in
Jenkins for a while, and then remove them (and the whole job) after
3-5 days.

Create a [sensu][sensu] dashboard and the required reporters, to
oversee all services.

Redundancy: workers are disposable, so we don't need to worry about them.
The web app can be replicated on a second server and then load balanced
using AWS services. The same applies to the queue manager.

[sensu]: https://sensuapp.org/

New repositories:

* `rhub-monitor` Config for running the monitoring server.
* `rhub-logger` Config for running the logger.
* `rhub-dashboard` Web-app to view the actual monitoring dashboard.

### Windows workers (weeks 8-9)

Users will be able to select Windows in the submission web app, and
builder selection will be part of the r-hub API as well. At this point the
user will have to select a single builder.

We can run Windows VMs on AWS.
We will use code from the
[r-appveyor][r-appveyor] project to build the Windows worker backend. This
project builds a disk image containing a binary R-devel build daily.

New repository:

* `rhub-backend-aws-windows` AWS Windows backend.

[jenkins-azure]: https://github.com/jenkinsci/azure-slave-plugin
[node-azure]: https://www.npmjs.com/package/azure
[r-appveyor]: https://github.com/krlmlr/r-appveyor

### OSX workers (weeks 10-11)

For legal reasons, OSX workers need to run on Apple hardware. Many
companies offer OSX as IaaS, and this seems to be the simplest and cheapest
solution for us. Most likely we will rent one or two Mac Mini servers with
16GB memory from https://macstadium.com, unless we get a better offer. This
server comes with a VMware ESXi v6.0 Hypervisor, so we can run multiple
instances of OSX on it, in virtual machines. We create snapshot images for
the latest two or three OSX versions, and we start each R package build
from a snapshot image, assuming that start-up takes a reasonable time
(1-2 minutes maximum). If it takes longer, then we will look into running
the build as a restricted user, cleaning up after the build, reusing the VM
for multiple builds, and only rebooting/recreating the VM once a day.

Jenkins supports ESXi via a plugin, or we can just use the native ESXi API
directly from our web-app middleware, to manage the VMs.

For OSX, a daily build of R-devel is available from r-project.org and the
workers just download this and install it to a read-only shared area,
so that the worker images do not need to be rebuilt daily.

New repositories:

* `rhub-backend-osx`

### On-demand workers (week 12)

While we strive to have a quick build start-up time, we want to make
sure that workers are not idle for long.
For OSX at https://macstadium.com,
flexible scaling is not available for the rented Mac Mini servers.
Linux and Windows workers are typically billed by the hour.
We will work out a simple heuristic to shut down idle Windows and Linux
workers. This could work from within Jenkins or the r-hub application.

### CRAN presubmission (starting from week 13)

We will work with CRAN on the integration of r-hub into the CRAN submission
process. At present, 80% of the CRAN submissions are turned down. We set
out to reduce this to 20%, by providing feedback to package developers,
prior to submitting to CRAN.

The CRAN submission process will have the following steps:
- User submits through web-app on r-hub.
- Package is built and checked on Linux with r-devel.
  If there are warnings or errors, the submission is rejected.
- Package is built and checked by r-hub, on all CRAN platforms. If there
  are warnings or errors, the package is rejected.
- The submission is sent to CRAN, including all checks. *New* warnings and
  errors are marked for CRAN maintainers. They can accept the submission,
  and then the source package moves to CRAN's
  system. Otherwise CRAN maintainers can send back an email to the
  package maintainer, including all build logs, and possibly their
  own annotation.

### CI for GitHub projects (weeks 14-15)

An R-specific CI service, with GitHub integration. It should be completely
automatic, with some sane defaults, e.g. by default build and check with
r-devel, on Linux (Debian), Windows and the most recent OSX platform.
Other platforms can be selected via configuration, or as one-time builds
from the web app or the API.

Having a CI also means that we will have projects, and need a database to
store their settings, build history, etc.
We will use a MongoDB database
for this, except for the build output, which will be compressed and stored
in a key-value datastore, e.g. Redis or something more lightweight. We will
purge the output of old builds if we need to lower the storage costs.

Repositories:

- `rhub-ci` Web app that handles the hooks from GitHub (and possibly other
  services in the future).
- `rhub-ci-docs` Documentation of the r-hub CI.

Binary and source packages from GitHub will be stored on a storage server
in a CRAN-like repository, and we will open this repository for public use.

Design a system to handle dependencies among packages stored at CRAN,
r-hub, BioConductor, GitHub, etc. This should be part of the DESCRIPTION
file, and handled by the r-hub CI automatically.

### Open, documented API (weeks 16-17)

Document and make public the r-hub CI service, including an API to
add/remove jobs, run/query builds. Stabilize services. Make sure that
almost all CRAN packages build on all three major platforms.

Repository:

- `rhub` R package, to query the r-hub API: submit packages to
  build, search packages, etc.

### Community website (week 18)

Simple web-site, where users can browse and search packages, both in the
r-hub repository and CRAN. We will possibly integrate it with the
existing http://www.r-pkg.org web pages.

### Reverse dependency checks (week 19)

We will include the check of reverse dependencies in CRAN pre-submissions.
We compare the new check results to old ones, and include the differences
(i.e. newly appearing warnings and errors) in our email notification to
CRAN, if a package maintainer decides to continue with the submission, even
if reverse dependencies break.
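
The comparison of old and new check results can be sketched as a set
difference per package. This is only an illustration: the data model
(a mapping from package name to a set of warning and error messages) and
the function name are assumptions, not part of the proposal.

```python
from typing import Dict, Set

# Assumed data model: {package name: set of warning/error messages}.
Results = Dict[str, Set[str]]


def new_problems(old: Results, new: Results) -> Results:
    """Return only the problems that appear in `new` but not in `old`."""
    diff: Results = {}
    for pkg, problems in new.items():
        # A package missing from the old results counts as having had none.
        added = problems - old.get(pkg, set())
        if added:
            diff[pkg] = added
    return diff
```

Only the packages with newly appearing problems end up in the result, which
is the part that would go into the notification to CRAN.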

### Solaris builds (week 20)

We will look into having Solaris workers. This is really only important for
packages with compiled code. It is also non-trivial, for two reasons.
First, Jenkins has not officially supported Solaris since late 2014. Second,
ideally we would need Solaris Sparc as well, as it has a different
byte-order than all other CRAN platforms, and many errors only come up
there.

We can run Solaris x86 on AWS in a VM, and we can rent a Solaris Sparc LPAR
in the cloud, and we can use Solaris Zones containers on it.

### Stabilize, publicize r-hub (weeks 21-24)

These four weeks also serve as reserves, if any of the sub-projects run out
of time, which is not unlikely.

## Costs

Some facts first; we base our cost estimates on these:

* There are about 7,000 packages on CRAN, and we estimate that this number
  will double in five years, to about 14,000 packages.
* About 30 packages are updated on CRAN every day; we estimate that this
  number will grow to 60 in the next five years.
* A CRAN package has on average 2.65 hard (Depends, Imports) dependencies.
* There are about 100,000 repositories on GitHub that are classified as
  written mainly in R. Out of them about 2,000 are currently using Travis CI.
* On a modern processor (e.g. Xeon 2.9GHz) it takes on average 30 seconds to
  build and check a CRAN package on Linux, and about twice as long on OSX
  and three to five times as long on Windows. These numbers assume that
  binary packages are available, and no container start-up time.

Our estimates for the number of daily builds and required CPU time one year
after the start of the project, assuming r-hub is integrated into the CRAN
submission process and it is popular on GitHub as a CI service:

* 35 successful CRAN submissions, and 200 unsuccessful ones, a total number
  of 235 submissions, triggering 235 direct Linux builds on Debian, and
  about 50 builds on other Linux flavors and other OSes. This also
  triggers about 150 reverse dependency checks on Debian Linux.
* We estimate 1,000 CI builds from GitHub, although this number might be
  highly inaccurate, as we have essentially no real data on this.
  (The mean number of ~5 pushes per R GitHub repository is useless, as
  most of these repositories are essentially inactive.) The 1,000 CI builds
  will be done on Debian Linux, Windows and the most recent OSX platform.
* This is a total of 1,385 builds on the primary Linux platform, 1,050
  builds on each of the primary Windows and primary OSX platforms, and
  50 builds on other platforms.
* Assuming 2 minutes container start-up time, this amounts to about 60 CPU
  hours for the primary Linux platform, 50 CPU hours for the primary OSX
  platform and 70 CPU hours for the primary Windows platform. For other
  platforms the required resources are negligible.

These estimates are the lower bounds for running a useful service. They do
not include builds that get stuck and run until the end of the build time
limit (20-40 minutes), and they do not include the CPU time required to
build R-devel daily, etc. We estimate that these are negligible compared to
the total cost.

Cost for development work:
* $100 per hour, total of 1000 hours over 9-12 months. $100,000 total.

Yearly costs:
* Domain name registration, virus scanner for the Windows builder,
  SSL certificates, maximum $500 total.
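
The daily CPU-hour estimates above can be reproduced with a short calculation. The sketch below makes two assumptions that the text does not state explicitly: the 2 minute container start-up time is added to every build, and the Windows build time is taken as four times the Linux time, the midpoint of the "three to five times" range. The function and variable names are made up for this example.

```python
# Build times in seconds, taken from the facts listed in this section.
LINUX_BUILD_S = 30
OSX_BUILD_S = 2 * LINUX_BUILD_S        # "about twice as long on OSX"
WINDOWS_BUILD_S = 4 * LINUX_BUILD_S    # assumption: midpoint of 3x to 5x
STARTUP_S = 2 * 60                     # 2 minutes container start-up time


def cpu_hours(builds: int, build_seconds: int, startup_seconds: int = STARTUP_S) -> float:
    """Daily CPU hours, assuming every build also pays the start-up time."""
    return builds * (build_seconds + startup_seconds) / 3600


# Daily builds on the primary Linux platform:
# CRAN submissions + reverse dependency checks + GitHub CI builds.
linux_builds = 235 + 150 + 1000
# Builds on each of the primary Windows and OSX platforms, as stated in the text.
windows_builds = osx_builds = 1050

linux_hours = cpu_hours(linux_builds, LINUX_BUILD_S)
osx_hours = cpu_hours(osx_builds, OSX_BUILD_S)
windows_hours = cpu_hours(windows_builds, WINDOWS_BUILD_S)

print(f"Linux:   {linux_builds} builds, {linux_hours:.1f} CPU hours")
print(f"OSX:     {osx_builds} builds, {osx_hours:.1f} CPU hours")
print(f"Windows: {windows_builds} builds, {windows_hours:.1f} CPU hours")
```

Under these assumptions the results come out at roughly 58, 52.5 and 70 CPU hours, in line with the rounded figures of 60, 50 and 70 given above. With the Windows factor at three or five instead of four, the Windows estimate ranges from about 61 to about 79 CPU hours.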

Ongoing operational costs, from month one:
* Development and production environments, Jenkins server and web
  server hosting, all server side applications, about $100 per month.

From month two:
* Linux build servers: $100 per month, $10 per platform.
* OSX build servers: one Mac mini server, $120 per month.
* Windows servers: four t2.medium instances on AWS, $210 per month.
* Solaris server: Solaris Sparc $300 per month, Solaris x86 $40 per month.
* Storage server for binary and source packages: under $200 per month.

Maintenance cost after the first year:
* Ten hours of work per month, for one person, $1000 per month.

Total monthly operational cost after the first year: $2112 per month
(the monthly items above, about $2070, plus the $500 yearly costs spread
over twelve months).

## Future directions

The following items may be prioritized for inclusion in the project if time
remains from the allocated 1000 hours and the community needs them.

### Builder

- Build native binary packages for popular Linux distributions, e.g. Debian,
  Ubuntu, RedHat Linux or CentOS.
- Builds with memory sanitizers and debuggers: valgrind, UB-SAN, etc.
- Digitally signed binaries, served over HTTPS.
- R CI: since we build R-devel regularly anyway, we can also run its tests
  and provide multi-platform continuous integration for R itself.
- Separate r-hub-devel and r-hub-stable repositories.
- Automatic vertical scaling of build workers. Keep track of how much
  memory they need and use bigger machines only if needed.
- Check CRAN packages with all allowed major R versions. E.g. if a package
  depends on R (>= 3.0.0), then check it with R 3.0.3, 3.1.3, 3.2.1.

### Community web site and r-hub web app

- RSS feeds for new and updated packages, similarly to CRANberries, with
  filters for keywords, dependencies, authors.
- Searchable documentation of R packages, online in HTML with inline plots,
  both for CRAN and r-hub packages.
- Explain R CMD check failures. Link to relevant parts of the
  documentation.
- Badges for package versions, number of installs, downloads, checks.
- RSS or email notifications for CRAN check failures.

--------------------------------------------------------------------------------