![R Logo](https://www.r-project.org/Rlogo.png)

# Tips and tricks for R
- [Useful tools](https://github.com/IARC-bioinfo/R-tricks#useful-tools)
- [Tip and tricks](https://github.com/IARC-bioinfo/R-tricks#tip-and-tricks)
    - [Use Rscript to run R from bash](https://github.com/IARC-bioinfo/R-tricks#use-rscript-to-run-r-from-bash)
    - [Running bash commands from R](https://github.com/IARC-bioinfo/R-tricks#running-bash-commands-from-r)
    - [Building an argument section in R](https://github.com/IARC-bioinfo/R-tricks#building-an-argument-section-for-your-r-script)
    - [Source github R code](https://github.com/IARCbioinfo/R-tricks/blob/master/README.md#source-an-r-script-hosted-on-github)

## Useful tools
- [R](https://www.r-project.org)
- [RStudio](https://www.rstudio.com)
- [Revolution Analytics](http://www.revolutionanalytics.com) is an R fork made to be faster.

## Modern R
There are new ways to use R for data processing workflows, in particular by combining these packages: [`dplyr`](https://github.com/hadley/dplyr), [`magrittr`](https://github.com/smbache/magrittr), [`tidyr`](https://github.com/hadley/tidyr) and [`ggplot2`](https://github.com/hadley/ggplot2) (with the [`cowplot`](https://github.com/wilkelab/cowplot) add-on).
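
To give a flavour of this style, here is a minimal sketch of a pipe-based summary. It assumes `dplyr` is installed (it re-exports the `%>%` pipe from `magrittr`); the `mtcars` data set ships with R and is used here only for illustration, and the variable name `by_cyl` is arbitrary.

```R
# Minimal sketch of a pipe-based workflow (assumes dplyr is installed;
# mtcars is a data set shipped with R, chosen here only for illustration).
library(dplyr)

by_cyl <- mtcars %>%
  filter(mpg > 20) %>%             # keep the cars with mpg above 20
  group_by(cyl) %>%                # one group per number of cylinders
  summarise(mean_mpg = mean(mpg),  # average mpg within each group
            n_cars = n())          # number of cars within each group

print(by_cyl)
```

Each step takes the result of the previous one as its first argument, so the code reads top to bottom in the order the operations are applied.
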
A few nice tutorials and introductions:
- http://zevross.com/blog/2015/01/13/a-new-data-processing-workflow-for-r-dplyr-magrittr-tidyr-ggplot2/
- http://blog.rstudio.org/2014/07/22/introducing-tidyr/
- https://cran.rstudio.com/web/packages/dplyr/vignettes/introduction.html
- http://www.r-statistics.com/2014/08/simpler-r-coding-with-pipes-the-present-and-future-of-the-magrittr-package/

## Plotting with R
There is a pretty nice R package to easily produce publication-ready plots with ggplot2: [ggpubr](http://www.sthda.com/english/rpkgs/ggpubr/).
In addition, there is a list of color palettes inspired by the colors used in scientific journals: [vignette](https://cran.r-project.org/web/packages/ggsci/vignettes/ggsci.html).

## Tip and tricks

### Use Rscript to run R from bash
R comes with `Rscript`, a command line tool that can run R scripts or R commands directly from the bash shell.

The most compact way to run it is with the `-e` option, which takes the R expression to evaluate. For example, the following command outputs 10 random numbers:
```bash
Rscript -e 'res=runif(10);cat(res,"\n")'
```

Of course this is only useful for very short commands. An alternative is to write an R script. For example, if you create a file called `test.r` containing:
```R
res=runif(10)
cat(res,"\n")
```
You can run it using `Rscript test.r`. Even better, if you add an initial [shebang](https://en.wikipedia.org/wiki/Shebang_(Unix)) line `#!/usr/bin/env Rscript` to the script above and make it executable with `chmod +x test.r`, you can launch your R script directly with `./test.r`!

Command line parameters are a common thing to have in scripts. The R function `commandArgs()` returns the command line arguments as a vector of strings (so you will need to convert them to numeric in some cases).
By default the vector starts with the path of the R executable, followed by R's own options (such as `--file=test.r`), and only then the arguments you passed. Most of the time you won't need these leading elements, so use `commandArgs(trailingOnly = TRUE)` instead (or the compact version `commandArgs(T)`), so that the first element is your first argument and so on. You can easily check whether a command line argument is missing: indexing past the end of the vector gives `NA`. For example, our script outputting random numbers looks like this if we want to output 10 numbers by default, or take this number from the command line:
```R
#!/usr/bin/env Rscript
N=as.numeric(commandArgs(TRUE)[1])
if (is.na(N)) N=10
res=runif(N)
cat(res,"\n")
```
You can run it using:
```bash
./test.r 5
0.2852298 0.9366892 0.752774 0.4416602 0.9603793
```
or
```bash
./test.r
0.3131752 0.3540976 0.07491065 0.125842 0.2947516 0.203168 0.8390772 0.6115891 0.323192 0.783478
```
If you have a second argument it may look like:
```R
#!/usr/bin/env Rscript
N1=as.numeric(commandArgs(TRUE)[1])
N2=as.numeric(commandArgs(TRUE)[2])
res1=runif(N1)
res2=rnorm(N2)
cat(res1,"\n")
cat(res2,"\n")
```
And running:
```bash
./test.r 4 8
0.6631743 0.2670673 0.5929007 0.8545739
-0.6988854 -0.4150706 1.0834 -0.002987133 -2.552233 -0.6456261 0.7652581 0.7687048
```
outputs 4 uniform numbers and 8 normally distributed numbers. Note that in this case it's harder to check the presence or absence of arguments.

Also note that [`funr`](https://github.com/sahilseth/funr) is an interesting tool providing shell access to all R functions.

### Running bash commands from R
Now the opposite: you need to run a bash command from your R script. In this section all commands should be run in R unless specified.
The easiest way is to use the `system()` function:
```R
system('ls -l | wc -l')
```
By default the `system()` function returns the exit status of the command (0 for success, non-zero on failure; 127 means the command could not be run). If you want to get the output of the command back in an R variable, you can use the `intern = TRUE` option. In this case it returns a character vector with one string element for each output line. Again, use `as.numeric()` to transform the output to a number if needed. For example, this command gives you the number of files in the current directory in the variable `num_files` (there are easier and cleaner ways of doing that in R directly, this is just a toy example). It uses plain `ls` rather than `ls -l` so that the `total` line is not counted:
```R
num_files=as.numeric(system('ls | wc -l',intern = TRUE))
```

If the output is more than a single line and you want to load it in a data frame, you can use the function `pipe()` instead of `system()`. For example, the following **bash command** returns the names of the files and their sizes in the current directory:
```bash
ls -l | tail -n +2 | awk '{print $9"\t"$5}'
```
Now in R, you can use `read.table` and `pipe` together to call the command and put the result in a data frame `file_sizes`:
```R
file_sizes=read.table(pipe('ls -l | tail -n +2 | awk \'{print $9"\t"$5}\''))
```
Note that the `'` characters in the bash command need to be escaped as `\'` in R, because the command itself is an R string delimited by `'`.


### Building an argument section for your R script

This is an example of how to build a clean argument section at the top of your R script, including optional arguments:

```R
#! /usr/bin/Rscript

## Collect arguments
args <- commandArgs(TRUE)

## Parse arguments (we expect the form --arg=value)
parseArgs <- function(x) strsplit(sub("^--", "", x), "=")
argsL <- as.list(as.character(as.data.frame(do.call("rbind", parseArgs(args)))$V2))
names(argsL) <- as.data.frame(do.call("rbind", parseArgs(args)))$V1
args <- argsL
rm(argsL)

## Give some value to options if not provided
if(is.null(args$opt_arg1)) {args$opt_arg1="default_option1"}
if(is.null(args$opt_arg2)) {args$opt_arg2=10} else {args$opt_arg2=as.numeric(args$opt_arg2)}

## Print the help when --help is given or when a mandatory argument is missing
## (after parsing, --help is stored under the name "help")
if(!is.null(args$help) || is.null(args$arg1) || is.null(args$arg2)) {
  cat("
      The R Script arguments_section.R

      Mandatory arguments:
      --arg1=type          - description
      --arg2=type          - description
      --help               - print this text

      Optional arguments:
      --opt_arg1=String    - example:an absolute path, default:default_option1
      --opt_arg2=Value     - example:a threshold, default:10

      WARNING : here put all the things the user has to know

      Example:
      ./arguments_section.R --arg1=~/Documents/ --arg2=10 --opt_arg2=8 \n\n")

  q(save="no")
}

cat("first mandatory argument : ", args$arg1,"\n",sep="")
cat("second mandatory argument : ", args$arg2,"\n",sep="")
cat("first optional argument : ", args$opt_arg1,"\n",sep="")
cat("second optional argument : ", args$opt_arg2,"\n",sep="")
```

### Source an R script hosted on GitHub

Once on the webpage of your R script on GitHub, click on Raw and copy the URL from the address bar.
Then use the following code (replace the raw URL of this [example](https://github.com/tdelhomme/R-tips-tricks/blob/master/Rcode/multiplot.r) with your own):

```R
library(RCurl)

# Download the script as a single string
script <- getURL("https://raw.githubusercontent.com/tdelhomme/R-tips-tricks/master/Rcode/multiplot.r")

# Parse and evaluate it in the current environment
eval(parse(text = script))
```
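
If you would rather not depend on `RCurl`, one possible alternative is to download the script to a temporary file with base R and then `source()` it. The sketch below is only an illustration: `source_remote` is a hypothetical helper name (not part of base R or RCurl), and it assumes your R installation can download from the given URL.

```R
# Hypothetical helper (not part of base R or RCurl): download a script to a
# temporary file using base R only, then source it.
source_remote <- function(url, envir = globalenv()) {
  tmp <- tempfile(fileext = ".R")
  on.exit(unlink(tmp), add = TRUE)   # remove the temporary file afterwards
  status <- download.file(url, destfile = tmp, quiet = TRUE)
  if (status != 0) stop("download failed for ", url)
  source(tmp, local = envir)         # evaluate the script in 'envir'
  invisible(NULL)
}

# Usage, with the raw URL of the example script:
# source_remote("https://raw.githubusercontent.com/tdelhomme/R-tips-tricks/master/Rcode/multiplot.r")
```

As with `eval(parse(...))`, this runs whatever code is behind the URL, so only use it with scripts you trust.
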