├── README.md
└── intro.pdf


/README.md:
--------------------------------------------------------------------------------
# Introduction to parallel computing with R
**Instructor:** Hana Ševčíková,
University of Washington

**Where and when:** useR!2017 conference, Brussels, July 4th 2017

**Keywords**: Parallel Computing, Master-Slave Paradigm, Reproducibility, Load Balancing, Distributed Random Number Generators

# Goal

The goal of this tutorial is to introduce attendees to the concepts and tools available in R for parallel computing. It is aimed at novice R programmers and seeks to lower an often perceived mental hurdle when dealing with code parallelization.


# Description

Over the past few years, R has become increasingly popular outside of the statistical community, and it is now one of the most popular programming languages among data scientists. With the increasing amount of data and the more complex algorithms available to scientists today, parallel processing is almost always a must, and in fact is expected in packages implementing time-consuming methods.

Numerous R packages for parallel computing have been developed over the past two decades, with **snow** being one of the pioneers in providing a high-level interface for parallel computations on a cluster or in a multicore environment. More recently, most of the **snow** functionality has been implemented in the R core package **parallel**.

The main focus of the tutorial will be on the viewpoint implemented in the **parallel** package, namely the master-slave paradigm. Other viewpoints, such as Single Program/Multiple Data (SPMD), grid computing, and map/reduce, will be briefly introduced only as concepts, without going into detail.

In a parallel statistical application a few issues need more attention than in its sequential counterpart.
These include reproducibility, random number generation, computation transparency, and load balancing. We will talk about solutions to these and other issues implemented in user-contributed packages.

# Outline

* Paradigms of parallel computing
* The master-slave paradigm in R
* Examples of using **parallel**
* Random number generation
* Reproducibility and load balancing
* Review of useful **snow**-like R packages with examples (**snowFT** and **foreach**)
* Benchmarking

# Pre-requisites

The tutorial targets people relatively new to R, so only basic knowledge of R is required.

Please install the following packages:

```
install.packages(c("foreach", "doParallel", "doRNG",
                   "snowFT", "extraDistr", "ggplot2",
                   "reshape2", "wpp2017"),
                 dependencies = TRUE)
```

#### Technical Note:

For RStudio users: RStudio currently contains a bug that prevents one of the packages covered in the tutorial from working correctly. I therefore recommend using an R user interface other than RStudio. If you use RStudio, you will not be able to run about 1/4 of the material. (This bug in RStudio was reported and hopefully will be fixed soon.)

# Instructor

[Hana Ševčíková](http://www.stat.washington.edu/hana) is a Senior Research Scientist at the Center for Statistics and the Social Sciences at the University of Washington. She has collaborated on the implementation of R packages for parallel computing and distributed random number generators, such as **snowFT**, **rlecuyer** and **snow**. More recently, she has been involved in developing demographic R packages as part of a collaborative research project with the United Nations.


[Material](https://rawgit.com/PPgp/useR2017public/master/tutorial.html)

--------------------------------------------------------------------------------
/intro.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/hanase/useR2017/0b55937099f35d6ade90b4586e3f09d741ef0d07/intro.pdf
--------------------------------------------------------------------------------