├── Dates.Rmd ├── RColors.pptx ├── ReproResearch.pptx ├── binarylinearmodelsim.pdf ├── classes-methods.Rnw ├── connections.tex ├── controlstructures.tex ├── datatypes1.tex ├── datatypes2.tex ├── debugging.tex ├── functions.tex ├── ggplot2_part1.pptx ├── ggplot2_part2.pptx ├── grep.tex ├── help.ppt ├── homicide-month.pdf ├── knitr.pptx ├── linearmodelsim.pdf ├── loopfunctions.tex ├── macros.tex ├── mulike.pdf ├── overview_history.tex ├── plotting.tex ├── reading-data.tex ├── regex.tex ├── sigmalike.pdf ├── simpoisson.pdf ├── simulation.tex ├── str.pptx └── vectorized.tex /Dates.Rmd: -------------------------------------------------------------------------------- 1 | % Dates and Times in R 2 | % Computing for Data Analysis 3 | % 4 | 5 | ```{r, echo=FALSE} 6 | options(width = 50) 7 | ``` 8 | 9 | # Dates and Times in R 10 | 11 | R has developed a special representation of dates and times 12 | 13 | - Dates are represented by the `Date` class 14 | 15 | - Times are represented by the `POSIXct` or the `POSIXlt` class 16 | 17 | - Dates are stored internally as the number of days since 1970-01-01 18 | 19 | - Times are stored internally as the number of seconds since 20 | 1970-01-01 21 | 22 | # Dates in R 23 | 24 | Dates are represented by the `Date` class and can be coerced from a 25 | character string using the `as.Date()` function. 
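As a small sketch of the coercion just described (the date strings below are arbitrary examples, not taken from the lecture), `as.Date()` also accepts a `format` argument for strings that are not in the default year-month-day layout:

```r
## Default layout: year-month-day
d1 <- as.Date("1970-01-10")
as.numeric(d1)        ## days since 1970-01-01, i.e. 9

## Non-default layout, described with a format string (see ?strptime)
d2 <- as.Date("01/02/2012", format = "%m/%d/%Y")
format(d2)            ## "2012-01-02"
```

The format codes are the same ones used by `strptime`, so `?strptime` is the place to look them up.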
26 | 27 | ```{r} 28 | x <- as.Date("1970-01-01") 29 | x 30 | unclass(x) 31 | unclass(as.Date("1970-01-02")) 32 | ``` 33 | 34 | # Times in R 35 | 36 | Times are represented using the `POSIXct` or the `POSIXlt` class 37 | 38 | - `POSIXct` is just a very large number under the hood; it is a 39 | useful class when you want to store times in something like a data 40 | frame 41 | 42 | - `POSIXlt` is a list underneath and it stores a bunch of other useful 43 | information like the day of the week, day of the year, month, day of 44 | the month 45 | 46 | There are a number of generic functions that work on dates and times 47 | 48 | - `weekdays`: give the day of the week 49 | 50 | - `months`: give the month name 51 | 52 | - `quarters`: give the quarter number ("Q1", "Q2", "Q3", or "Q4") 53 | 54 | # Times in R 55 | 56 | Times can be coerced from a character string using the `as.POSIXlt` 57 | or `as.POSIXct` function. 58 | 59 | ```{r} 60 | x <- Sys.time() 61 | x 62 | p <- as.POSIXlt(x) 63 | names(unclass(p)) 64 | p$sec 65 | ``` 66 | 67 | # Times in R 68 | 69 | You can also use the `POSIXct` format. 70 | 71 | ```{r, error=TRUE} 72 | x <- Sys.time() 73 | x ## Already in `POSIXct' format 74 | unclass(x) 75 | x$sec ## fails: a `POSIXct' object is not a list 76 | p <- as.POSIXlt(x) 77 | p$sec 78 | ``` 79 | 80 | # Times in R 81 | 82 | Finally, there is the `strptime` function in case your dates are 83 | written in a different format 84 | 85 | ```{r} 86 | datestring <- c("January 10, 2012 10:40", "December 9, 2011 9:10") 87 | x <- strptime(datestring, "%B %d, %Y %H:%M") 88 | x 89 | class(x) 90 | ``` 91 | 92 | I can *never* remember the formatting strings. Check `?strptime` for 93 | details. 94 | 95 | # Operations on Dates and Times 96 | 97 | You can use mathematical operations on dates and times. Well, really 98 | just `+` and `-`. You can do comparisons too (e.g. 
`==`, `<=`) 99 | 100 | ```{r, error=TRUE} 101 | x <- as.Date("2012-01-01") 102 | y <- strptime("9 Jan 2011 11:34:21", "%d %b %Y %H:%M:%S") 103 | x - y ## fails: `Date' and `POSIXlt' objects cannot be subtracted directly 104 | x <- as.POSIXlt(x) 105 | x - y 106 | ``` 107 | 108 | # Operations on Dates and Times 109 | 110 | R even keeps track of leap years, leap seconds, daylight savings, and 111 | time zones. 112 | 113 | ```{r} 114 | x <- as.Date("2012-03-01") 115 | y <- as.Date("2012-02-28") 116 | x - y 117 | x <- as.POSIXct("2012-10-25 01:00:00") 118 | y <- as.POSIXct("2012-10-25 06:00:00", tz = "GMT") 119 | y - x 120 | ``` 121 | 122 | # Summary 123 | 124 | - Dates and times have special classes in R that allow for numerical 125 | and statistical calculations 126 | 127 | - Dates use the `Date` class 128 | 129 | - Times use the `POSIXct` and `POSIXlt` classes 130 | 131 | - Character strings can be coerced to Date/Time classes using the 132 | `strptime` function or the `as.Date`, `as.POSIXlt`, or `as.POSIXct` functions 133 | -------------------------------------------------------------------------------- /RColors.pptx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/rdpeng/CourseraLectures/f58ebd7bc02e275d0428b8c2bb0ff9a2668e5990/RColors.pptx -------------------------------------------------------------------------------- /ReproResearch.pptx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/rdpeng/CourseraLectures/f58ebd7bc02e275d0428b8c2bb0ff9a2668e5990/ReproResearch.pptx -------------------------------------------------------------------------------- /binarylinearmodelsim.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/rdpeng/CourseraLectures/f58ebd7bc02e275d0428b8c2bb0ff9a2668e5990/binarylinearmodelsim.pdf -------------------------------------------------------------------------------- /classes-methods.Rnw: 
-------------------------------------------------------------------------------- 1 | \documentclass[aspectratio=169]{beamer} 2 | 3 | \mode 4 | { 5 | \usetheme{Warsaw} 6 | % or ... 7 | 8 | \setbeamercovered{transparent} 9 | % or whatever (possibly just delete it) 10 | } 11 | 12 | \usepackage[noae]{Sweave} 13 | \usepackage[english]{babel} 14 | \usepackage[latin1]{inputenc} 15 | \usepackage{graphicx} 16 | %\usepackage{times} 17 | %\usepackage[T1]{fontenc} 18 | % Or whatever. Note that the encoding and the font should match. If T1 19 | % does not look nice, try deleting the line with the fontenc. 20 | 21 | \usepackage{amsmath,amsfonts,amssymb} 22 | 23 | \input{macros} 24 | 25 | \title[Classes and Methods in R]{Classes and Methods in R} 26 | 27 | \date{Computing for Data Analysis} 28 | 29 | \setbeamertemplate{footline}[page number] 30 | 31 | \setkeys{Gin}{width=0.4\textwidth} 32 | 33 | 34 | \begin{document} 35 | 36 | 37 | 38 | \begin{frame} 39 | \titlepage 40 | \end{frame} 41 | 42 | \begin{frame}{Classes and Methods} 43 | \begin{itemize} 44 | \item 45 | A system for doing object oriented programming 46 | \item R was originally quite interesting because it is both 47 | interactive \textit{and} has a system for object orientation. 
48 | \begin{itemize} 49 | \item 50 | Other languages which support OOP (C++, Java, Lisp, Python, Perl) 51 | generally speaking are not interactive languages 52 | \end{itemize} 53 | \item In R much of the code for supporting classes/methods is written 54 | by John Chambers himself (the creator of the original S language) 55 | and documented in the book \textit{Programming with Data: A Guide to the S 56 | Language} 57 | \item A natural extension of Chambers' idea of allowing someone to 58 | cross the user $\longrightarrow$ programmer spectrum 59 | \item Object oriented programming is a bit different in R than it is 60 | in most languages --- even if you are familiar with the idea, you 61 | may want to pay attention to the details 62 | \end{itemize} 63 | \end{frame} 64 | 65 | 66 | \begin{frame}{Two styles of classes and methods} 67 | S3 classes/methods 68 | \begin{itemize} 69 | \item 70 | Included with version 3 of the S language. 71 | \item 72 | Informal, a little kludgey 73 | \item 74 | Sometimes called \textit{old-style} classes/methods 75 | \end{itemize} 76 | S4 classes/methods 77 | \begin{itemize} 78 | \item 79 | More formal and rigorous 80 | \item 81 | Included with S-PLUS 6 and R 1.4.0 (December 2001) 82 | \item 83 | Also called \textit{new-style} classes/methods 84 | \end{itemize} 85 | \end{frame} 86 | 87 | 88 | 89 | \begin{frame}{Two worlds living side by side} 90 | \begin{itemize} 91 | \item For now (and the foreseeable future), S3 classes/methods and S4 92 | classes/methods are separate systems (but they can be mixed to some 93 | degree). 94 | \item 95 | Each system can be used fairly independently of the other. 96 | \item 97 | Developers of new projects (you!) are encouraged to use the S4 98 | style classes/methods. 99 | \begin{itemize} 100 | \item 101 | Used extensively in the Bioconductor project 102 | \end{itemize} 103 | \item 104 | But many developers still use S3 classes/methods because they 105 | are ``quick and dirty'' (and easier). 
106 | \item 107 | In this lecture we will focus primarily on S4 classes/methods 108 | \item The code for implementing S4 classes/methods in R is in the 109 | \textbf{methods} package, which is usually loaded by default (but 110 | you can load it with \code{library(methods)} if for some reason it 111 | is not loaded) 112 | \end{itemize} 113 | \end{frame} 114 | 115 | \begin{frame}{Object Oriented Programming in R} 116 | \begin{itemize} 117 | \item A \textit{class} is a description of a thing. A class can be 118 | defined using \code{setClass()} in the \textbf{methods} package. 119 | \item 120 | An \textit{object} is an instance of a class. Objects can be created 121 | using \code{new()}. 122 | \item A \textit{method} is a function that only operates on a certain 123 | class of objects. 124 | \item A generic function is an R function which dispatches methods. A 125 | generic function typically encapsulates a ``generic'' concept 126 | (e.g. \code{plot}, \code{mean}, \code{predict}, ...) 127 | \begin{itemize} 128 | \item 129 | The generic function does not actually do any computation. 130 | \end{itemize} 131 | \item 132 | A \textit{method} is the implementation of a generic function for an 133 | object of a particular class. 134 | \end{itemize} 135 | \end{frame} 136 | 137 | 138 | \begin{frame}{Things to look up} 139 | \begin{itemize} 140 | \item 141 | The help files for the `methods' package are extensive --- do read 142 | them as they are the primary documentation 143 | \item 144 | You may want to start with \code{?Classes} and \code{?Methods} 145 | \item 146 | Check out \code{?setClass}, \code{?setMethod}, and \code{?setGeneric} 147 | \item 148 | Some of it gets technical, but try your best for now---it will make 149 | sense in the future as you keep using it. 
150 | \item Most of the documentation in the \textbf{methods} package is 151 | oriented towards developers/programmers as these are the primary 152 | people using classes/methods 153 | \end{itemize} 154 | \end{frame} 155 | 156 | 157 | \begin{frame}[fragile]{Classes} 158 | All objects in R have a class which can be determined by the \code{class} 159 | function 160 | <<>>= 161 | class(1) 162 | class(TRUE) 163 | class(rnorm(100)) 164 | class(NA) 165 | class("foo") 166 | @ 167 | \end{frame} 168 | 169 | 170 | 171 | \begin{frame}[fragile]{Classes (cont'd)} 172 | Data classes go beyond the atomic classes 173 | <<>>= 174 | x <- rnorm(100) 175 | y <- x + rnorm(100) 176 | fit <- lm(y ~ x) ## linear regression model 177 | class(fit) 178 | @ 179 | \end{frame} 180 | 181 | \begin{frame}{Generics/Methods in R} 182 | \begin{itemize} 183 | \item 184 | S4 and S3 style generic functions look different but conceptually, 185 | they are the same (they play the same role). 186 | \item 187 | When you program you can write new methods for an existing generic OR 188 | create your own generics and associated methods. 
189 | \item Of course, if a data type does not exist in R that matches your 190 | needs, you can always define a new class along with generics/methods 191 | that go with it 192 | \end{itemize} 193 | \end{frame} 194 | 195 | \begin{frame}[fragile]{An S3 generic function (in the `base' package)} 196 | The \code{mean} function is generic 197 | <<>>= 198 | mean 199 | @ 200 | 201 | So is the \code{print} function 202 | <<>>= 203 | print 204 | @ 205 | \end{frame} 206 | 207 | 208 | \begin{frame}[fragile]{S3 methods} 209 | <<>>= 210 | methods("mean") 211 | @ 212 | \end{frame} 213 | 214 | 215 | \begin{frame}[fragile]{An S4 generic function (from the `methods' package)} 216 | The S4 equivalent of \code{print} is \code{show} 217 | <<>>= 218 | show 219 | @ 220 | 221 | The \code{show} function is usually not called directly (much like 222 | \code{print}) because objects are auto-printed 223 | \end{frame} 224 | 225 | \begin{frame}[fragile]{S4 methods} 226 | There are many different methods for the \code{show} generic 227 | function 228 | <<>>= 229 | showMethods("show") 230 | @ 231 | \end{frame} 232 | 233 | 234 | \begin{frame}{Generic/method mechanism} 235 | The first argument of a generic function is an object of a particular 236 | class (there may be other arguments) 237 | \begin{enumerate} 238 | \item 239 | The generic function checks the class of the object. 240 | \item 241 | A search is done to see if there is an appropriate method for 242 | that class. 243 | \item 244 | If there exists a method for that class, then that method is 245 | called on the object and we're done. 246 | \item 247 | If a method for that class does not exist, a search is done to see 248 | if there is a default method for the generic. If a default exists, 249 | then the default method is called. 250 | \item 251 | If a default method doesn't exist, then an error is thrown. 
252 | \end{enumerate} 253 | \end{frame} 254 | 255 | \begin{frame}{Examining Code for Methods} 256 | Examining the code for an S3 or S4 method requires a call to a special 257 | function 258 | \begin{itemize} 259 | \item You cannot just print the code for a method like other 260 | functions because the code for the method is usually hidden. 261 | \item If you want to see the code for an S3 method, you can use the function 262 | \code{getS3method}. 263 | \item The call is \code{getS3method(<generic>, <class>)} 264 | \item For S4 methods you can use the function \code{getMethod} 265 | \item The call is \code{getMethod(<generic>, <signature>)} (more 266 | details later) 267 | \end{itemize} 268 | \end{frame} 269 | 270 | 271 | \begin{frame}[fragile]{S3 Class/Method: Example 1} 272 | What's happening here? 273 | <<>>= 274 | set.seed(2) 275 | x <- rnorm(100) 276 | mean(x) 277 | @ 278 | \begin{enumerate} 279 | \item 280 | The class of x is ``numeric'' 281 | \item 282 | But there is no mean method for ``numeric'' objects! 283 | \item 284 | So we call the default method for \code{mean}. 285 | \end{enumerate} 286 | \end{frame} 287 | 288 | 289 | \begin{frame}[fragile]{S3 Class/Method: Example 1} 290 | <<>>= 291 | head(getS3method("mean", "default")) 292 | tail(getS3method("mean", "default")) 293 | @ 294 | \end{frame} 295 | 296 | \begin{frame}[fragile]{S3 Class/Method: Example 2} 297 | What happens here? 298 | <<>>= 299 | set.seed(3) 300 | df <- data.frame(x = rnorm(100), y = 1:100) 301 | sapply(df, mean) 302 | @ 303 | \begin{enumerate} 304 | \item 305 | The class of df is ``data.frame''; in a data frame each column can be 306 | an object of a different class 307 | \item 308 | We \code{sapply} over the columns and call the \code{mean} function 309 | \item 310 | In each column, \code{mean} checks the class of the object and 311 | dispatches the appropriate method. 
312 | \item Here we have a \code{numeric} column and an \code{integer} 313 | column; in both cases \code{mean} calls the default method 314 | \end{enumerate} 315 | \end{frame} 316 | 317 | 318 | \begin{frame}{Calling Methods} 319 | NOTE: Some methods are visible to the user (e.g. \code{mean.default}), 320 | but you should \textbf{never} call methods directly. Rather, use the 321 | generic function and let the method be dispatched automatically. 322 | \end{frame} 323 | 324 | 325 | \begin{frame}[fragile]{S3 Class/Method: Example 3} 326 | The \code{plot} function is generic and its behavior depends on the 327 | object being plotted. 328 | <<>>= 329 | set.seed(10) 330 | x <- rnorm(100) 331 | plot(x) 332 | @ 333 | \end{frame} 334 | 335 | 336 | \begin{frame}[fragile]{S3 Class/Method: Example 3} 337 | For time series objects, \code{plot} connects the dots 338 | <<>>= 339 | set.seed(10) 340 | x <- rnorm(100) 341 | x <- as.ts(x) ## Convert to a time series object 342 | plot(x) 343 | @ 344 | \end{frame} 345 | 346 | \begin{frame}{Write your own methods!} 347 | If you write new methods for new classes, you'll probably end up 348 | writing methods for the following generics: 349 | \begin{itemize} 350 | \item 351 | print/show 352 | \item 353 | summary 354 | \item 355 | plot 356 | \end{itemize} 357 | There are two ways that you can extend the R system via classes/methods 358 | \begin{itemize} 359 | \item Write a method for a new class but for an existing generic 360 | function (e.g. \code{print}) 361 | \item Write new generic functions and new methods for those generics 362 | \end{itemize} 363 | \end{frame} 364 | 365 | 366 | \begin{frame}{S4 Classes} 367 | Why would you want to create a new class? 368 | \begin{itemize} 369 | \item 370 | To represent new types of data (e.g. gene expression, space-time, 371 | hierarchical, sparse matrices) 372 | \item 373 | New concepts/ideas that haven't been thought of yet (e.g. 
a fitted 374 | point process model, mixed-effects model, a sparse matrix) 375 | \item 376 | To abstract/hide implementation details from the user 377 | \end{itemize} 378 | I say things are ``new'' meaning that R does not know about them (not 379 | that they are new to the statistical community). 380 | \end{frame} 381 | 382 | \begin{frame}{S4 Class/Method: Creating a New Class} 383 | A new class can be defined using the \code{setClass} function 384 | \begin{itemize} 385 | \item At a minimum you need to specify the name of the class 386 | \item You can also specify data elements that are called \textit{slots} 387 | \item You can then define methods for the class with the 388 | \code{setMethod} function 389 | \item Information about a class definition can be obtained with the 390 | \code{showClass} function 391 | \end{itemize} 392 | \end{frame} 393 | 394 | \begin{frame}[fragile]{S4 Class/Method: Polygon Class} 395 | Creating new classes/methods is usually not something done at the 396 | console; you likely want to save the code in a separate file 397 | \begin{verbatim} 398 | setClass("polygon", 399 | representation(x = "numeric", 400 | y = "numeric")) 401 | \end{verbatim} 402 | The slots for this class are \code{x} and \code{y}. The slots for an 403 | S4 object can be accessed with the \code{@} operator. 404 | \end{frame} 405 | 406 | \begin{frame}[fragile]{S4 Class/Method: Polygon Class} 407 | A \code{plot} method can be created with the \code{setMethod} 408 | function. 409 | \begin{itemize} 410 | \item For \code{setMethod} you need to specify a generic function 411 | (\code{plot}), and a \textit{signature}. 412 | \item A signature is a character vector indicating the classes of 413 | objects that are accepted by the method. In this case, the 414 | \code{plot} method will take one type of object--a \code{polygon} 415 | object. 416 | \end{itemize} 417 | \begin{verbatim} 418 | setMethod("plot", "polygon", 419 | function(x, y, ...) { 420 | plot(x@x, x@y, type = "n", ...) 
421 | xp <- c(x@x, x@x[1]) 422 | yp <- c(x@y, x@y[1]) 423 | lines(xp, yp) 424 | }) 425 | \end{verbatim} 426 | Notice that the slots of the polygon (the x- and y-coordinates) are 427 | accessed with the \code{@} operator. 428 | \end{frame} 429 | 430 | \begin{frame}[fragile]{S4 Class/Method: Polygon Class} 431 | Create a new class 432 | <<>>= 433 | setClass("polygon", 434 | representation(x = "numeric", 435 | y = "numeric")) 436 | @ 437 | 438 | Create a plot method for this class 439 | <<>>= 440 | setMethod("plot", "polygon", 441 | function(x, y, ...) { 442 | plot(x@x, x@y, type = "n", ...) 443 | xp <- c(x@x, x@x[1]) 444 | yp <- c(x@y, x@y[1]) 445 | lines(xp, yp) 446 | }) 447 | @ 448 | 449 | If things go well, you will not get any messages or errors and nothing 450 | useful will be returned by either \code{setClass} or \code{setMethod}. 451 | \end{frame} 452 | 453 | \begin{frame}[fragile]{S4 Class/Method: Polygon Class} 454 | After calling \code{setMethod} the new \code{plot} method will be 455 | added to the list of methods for \code{plot}. 456 | <<>>= 457 | showMethods("plot") 458 | @ 459 | 460 | Notice that the signature for class \code{polygon} is listed. The 461 | method for \code{ANY} is the default method and it is what is called 462 | when no other signature matches. 463 | \end{frame} 464 | 465 | \begin{frame}[fragile]{S4 Class/Method: Polygon class} 466 | <<>>= 467 | p <- new("polygon", x = c(1, 2, 3, 4), y = c(1, 2, 3, 1)) 468 | plot(p) 469 | @ 470 | \end{frame} 471 | 472 | 473 | \begin{frame}{Where to Look, Places to Start} 474 | \begin{itemize} 475 | \item 476 | The best way to learn this stuff is to look at examples (and try the 477 | exercises for the course) 478 | \item 479 | There are now quite a few examples on CRAN which use S4 480 | classes/methods. 
481 | \item 482 | Bioconductor (http://www.bioconductor.org) --- a rich 483 | resource, even if you know nothing about bioinformatics 484 | \item 485 | Some packages on CRAN (as far as I know) --- SparseM, 486 | gpclib, flexmix, its, lme4, orientlib, pixmap 487 | \item 488 | The \code{stats4} package (comes with R) has a bunch of 489 | classes/methods for doing maximum likelihood analysis. 490 | \end{itemize} 491 | \end{frame} 492 | 493 | 494 | 495 | \end{document} 496 | -------------------------------------------------------------------------------- /connections.tex: -------------------------------------------------------------------------------- 1 | \documentclass[aspectratio=169]{beamer} 2 | 3 | \mode 4 | { 5 | \usetheme{Warsaw} 6 | % or ... 7 | 8 | \setbeamercovered{transparent} 9 | % or whatever (possibly just delete it) 10 | } 11 | 12 | 13 | \usepackage[english]{babel} 14 | \usepackage[latin1]{inputenc} 15 | \usepackage{graphicx} 16 | %\usepackage{times} 17 | %\usepackage[T1]{fontenc} 18 | % Or whatever. Note that the encoding and the font should match. If T1 19 | % does not look nice, try deleting the line with the fontenc. 20 | 21 | \usepackage{amsmath,amsfonts,amssymb} 22 | 23 | \input{macros} 24 | 25 | \title[The R Language]{Introduction to the R Language} 26 | 27 | \subtitle{Connections} 28 | 29 | \date{Computing for Data Analysis} 30 | 31 | \setbeamertemplate{footline}[page number] 32 | 33 | 34 | \begin{document} 35 | 36 | \begin{frame} 37 | \titlepage 38 | \end{frame} 39 | 40 | 41 | \begin{frame}{Interfaces to the Outside World} 42 | Data are read in using \textit{connection} interfaces. Connections 43 | can be made to files (most common) or to other more exotic things. 
44 | \begin{itemize} 45 | \item 46 | \code{file}, opens a connection to a file 47 | \item 48 | \code{gzfile}, opens a connection to a file compressed with gzip 49 | \item 50 | \code{bzfile}, opens a connection to a file compressed with bzip2 51 | \item 52 | \code{url}, opens a connection to a webpage 53 | \end{itemize} 54 | \end{frame} 55 | 56 | 57 | \begin{frame}[fragile]{File Connections} 58 | \begin{verbatim} 59 | > str(file) 60 | function (description = "", open = "", blocking = TRUE, 61 | encoding = getOption("encoding")) 62 | \end{verbatim} 63 | \begin{itemize} 64 | \item 65 | \code{description} is the name of the file 66 | \item 67 | \code{open} is a code indicating 68 | \begin{itemize} 69 | \item 70 | ``r'' read only 71 | \item 72 | ``w'' writing (and initializing a new file) 73 | \item 74 | ``a'' appending 75 | \item 76 | ``rb'', ``wb'', ``ab'' reading, writing, or appending in binary mode 77 | (Windows) 78 | \end{itemize} 79 | \end{itemize} 80 | \end{frame} 81 | 82 | 83 | \begin{frame}[fragile]{Connections} 84 | In general, connections are powerful tools that let you navigate files 85 | or other external objects. In practice, we often don't need to deal 86 | with the connection interface directly. 87 | \begin{verbatim} 88 | con <- file("foo.txt", "r") 89 | data <- read.csv(con) 90 | close(con) 91 | \end{verbatim} 92 | is the same as 93 | \begin{verbatim} 94 | data <- read.csv("foo.txt") 95 | \end{verbatim} 96 | \end{frame} 97 | 98 | \end{document} 99 | -------------------------------------------------------------------------------- /controlstructures.tex: -------------------------------------------------------------------------------- 1 | \documentclass[aspectratio=169]{beamer} 2 | 3 | \mode 4 | { 5 | \usetheme{Warsaw} 6 | % or ... 
7 | 8 | \setbeamercovered{transparent} 9 | % or whatever (possibly just delete it) 10 | } 11 | 12 | 13 | \usepackage[english]{babel} 14 | \usepackage[latin1]{inputenc} 15 | \usepackage{graphicx} 16 | %\usepackage{times} 17 | %\usepackage[T1]{fontenc} 18 | % Or whatever. Note that the encoding and the font should match. If T1 19 | % does not look nice, try deleting the line with the fontenc. 20 | 21 | \usepackage{amsmath,amsfonts,amssymb} 22 | 23 | \input{macros} 24 | 25 | \title[The R Language]{Introduction to the R Language} 26 | 27 | \subtitle{Control Structures} 28 | 29 | \date{Computing for Data Analysis} 30 | 31 | \setbeamertemplate{footline}[page number] 32 | 33 | 34 | 35 | 36 | \begin{document} 37 | 38 | \begin{frame} 39 | \titlepage 40 | \end{frame} 41 | 42 | \begin{frame}{Control Structures} 43 | Control structures in R allow you to control the flow of execution of 44 | the program, depending on runtime conditions. Common structures are 45 | \begin{itemize} 46 | \item 47 | \code{if}, \code{else}: testing a condition 48 | \item 49 | \code{for}: execute a loop a fixed number of times 50 | \item 51 | \code{while}: execute a loop \textit{while} a condition is true 52 | \item 53 | \code{repeat}: execute an infinite loop 54 | \item 55 | \code{break}: break the execution of a loop 56 | \item 57 | \code{next}: skip an iteration of a loop 58 | \item 59 | \code{return}: exit a function 60 | \end{itemize} 61 | Most control structures are not used in interactive sessions, but 62 | rather when writing functions or longer expressions. 63 | \end{frame} 64 | 65 | 66 | \begin{frame}[fragile]{Control Structures: if} 67 | \begin{verbatim} 68 | if(<condition>) { 69 | ## do something 70 | } else { 71 | ## do something else 72 | } 73 | 74 | if(<condition1>) { 75 | ## do something 76 | } else if(<condition2>) { 77 | ## do something different 78 | } else { 79 | ## do something different 80 | } 81 | \end{verbatim} 82 | \end{frame} 83 | 84 | \begin{frame}[fragile]{if} 85 | This is a valid if/else structure. 
86 | \begin{verbatim} 87 | if(x > 3) { 88 | y <- 10 89 | } else { 90 | y <- 0 91 | } 92 | \end{verbatim} 93 | So is this one. 94 | \begin{verbatim} 95 | y <- if(x > 3) { 96 | 10 97 | } else { 98 | 0 99 | } 100 | \end{verbatim} 101 | \end{frame} 102 | 103 | \begin{frame}[fragile]{if} 104 | Of course, the \code{else} clause is not necessary. 105 | \begin{verbatim} 106 | if(<condition1>) { 107 | 108 | } 109 | 110 | if(<condition2>) { 111 | 112 | } 113 | \end{verbatim} 114 | \end{frame} 115 | 116 | 117 | \begin{frame}[fragile]{for} 118 | \code{for} loops take an iterator variable and assign it successive 119 | values from a sequence or vector. For loops are most commonly used 120 | for iterating over the elements of an object (list, vector, etc.) 121 | \begin{verbatim} 122 | for(i in 1:10) { 123 | print(i) 124 | } 125 | \end{verbatim} 126 | This loop takes the \code{i} variable and in each iteration of the 127 | loop gives it values 1, 2, 3, ..., 10, and then exits. 128 | \end{frame} 129 | 130 | \begin{frame}[fragile]{for} 131 | These four loops have the same behavior. 132 | \begin{verbatim} 133 | x <- c("a", "b", "c", "d") 134 | 135 | for(i in 1:4) { 136 | print(x[i]) 137 | } 138 | 139 | for(i in seq_along(x)) { 140 | print(x[i]) 141 | } 142 | 143 | for(letter in x) { 144 | print(letter) 145 | } 146 | 147 | for(i in 1:4) print(x[i]) 148 | \end{verbatim} 149 | \end{frame} 150 | 151 | \begin{frame}[fragile]{Nested for loops} 152 | \code{for} loops can be nested. 153 | \begin{verbatim} 154 | x <- matrix(1:6, 2, 3) 155 | 156 | for(i in seq_len(nrow(x))) { 157 | for(j in seq_len(ncol(x))) { 158 | print(x[i, j]) 159 | } 160 | } 161 | \end{verbatim} 162 | Be careful with nesting though. Nesting beyond 2--3 levels is often 163 | very difficult to read/understand. 164 | \end{frame} 165 | 166 | \begin{frame}[fragile]{while} 167 | While loops begin by testing a condition. If it is true, then they 168 | execute the loop body. 
Once the loop body is executed, the condition 169 | is tested again, and so forth. 170 | \begin{verbatim} 171 | count <- 0 172 | 173 | while(count < 10) { 174 | print(count) 175 | count <- count + 1 176 | } 177 | \end{verbatim} 178 | While loops can potentially result in infinite loops if not written 179 | properly. Use with care! 180 | \end{frame} 181 | 182 | \begin{frame}[fragile]{while} 183 | Sometimes there will be more than one condition in the test. 184 | \begin{verbatim} 185 | z <- 5 186 | 187 | while(z >= 3 && z <= 10) { 188 | print(z) 189 | coin <- rbinom(1, 1, 0.5) 190 | 191 | if(coin == 1) { ## random walk 192 | z <- z + 1 193 | } else { 194 | z <- z - 1 195 | } 196 | } 197 | \end{verbatim} 198 | Conditions are always evaluated from left to right. 199 | \end{frame} 200 | 201 | \begin{frame}[fragile]{repeat} 202 | Repeat initiates an infinite loop; these are not commonly used in 203 | statistical applications but they do have their uses. The only way to 204 | exit a \code{repeat} loop is to call \code{break}. 205 | \begin{verbatim} 206 | x0 <- 1 207 | tol <- 1e-8 208 | 209 | repeat { 210 | x1 <- computeEstimate() 211 | 212 | if(abs(x1 - x0) < tol) { 213 | break 214 | } else { 215 | x0 <- x1 216 | } 217 | } 218 | \end{verbatim} 219 | \end{frame} 220 | 221 | \begin{frame}{repeat} 222 | The loop in the previous slide is a bit dangerous because there's no 223 | guarantee it will stop. Better to set a hard limit on the number of 224 | iterations (e.g. using a for loop) and then report whether convergence 225 | was achieved or not. 
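\end{frame}

\begin{frame}[fragile]{repeat}
The advice above can be sketched as follows. This is only an
illustration: \code{updateEstimate()} is a hypothetical stand-in for
the estimation step (here a Newton update for the square root of 2),
and \code{maxit} is an assumed name for the iteration limit.
\begin{verbatim}
## Hypothetical update step, for illustration only
updateEstimate <- function(x) (x + 2 / x) / 2

x0 <- 1
tol <- 1e-8
maxit <- 100        ## hard limit on the number of iterations
converged <- FALSE

for(i in seq_len(maxit)) {
        x1 <- updateEstimate(x0)
        if(abs(x1 - x0) < tol) {
                converged <- TRUE
                break
        }
        x0 <- x1
}
if(!converged) warning("no convergence after ", maxit, " iterations")
\end{verbatim}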
226 | \end{frame} 227 | 228 | \begin{frame}[fragile]{next, return} 229 | \code{next} is used to skip an iteration of a loop 230 | \begin{verbatim} 231 | for(i in 1:100) { 232 | if(i <= 20) { 233 | ## Skip the first 20 iterations 234 | next 235 | } 236 | ## Do something here 237 | } 238 | \end{verbatim} 239 | \code{return} signals that a function should exit and return a given 240 | value 241 | \end{frame} 242 | 243 | 244 | \begin{frame}{Control Structures} 245 | Summary 246 | \begin{itemize} 247 | \item Control structures like \code{if}, \code{while}, and \code{for} 248 | allow you to control the flow of an R program 249 | \item Infinite loops should generally be avoided, even if they are 250 | theoretically correct. 251 | \item Control structures mentioned here are primarily useful for 252 | writing programs; for command-line interactive work, the *apply 253 | functions are more useful. 254 | \end{itemize} 255 | \end{frame} 256 | 257 | 258 | 259 | 260 | \end{document} 261 | 262 | 263 | -------------------------------------------------------------------------------- /datatypes1.tex: -------------------------------------------------------------------------------- 1 | \documentclass[aspectratio=169]{beamer} 2 | 3 | \mode 4 | { 5 | \usetheme{Warsaw} 6 | % or ... 7 | 8 | \setbeamercovered{transparent} 9 | % or whatever (possibly just delete it) 10 | } 11 | 12 | 13 | \usepackage[english]{babel} 14 | \usepackage[latin1]{inputenc} 15 | \usepackage{graphicx} 16 | %\usepackage{times} 17 | %\usepackage[T1]{fontenc} 18 | % Or whatever. Note that the encoding and the font should match. If T1 19 | % does not look nice, try deleting the line with the fontenc. 20 | 21 | \usepackage{amsmath,amsfonts,amssymb} 22 | 23 | \input{macros} 24 | 25 | \title[The R Language]{Introduction to the R Language} 26 | 27 | \subtitle{Data Types and Basic Operations} 28 | 29 | %\author{Roger D. Peng} 30 | % - Give the names in the same order as the appear in the paper. 
31 | % - Use the \inst{?} command only if the authors have different 32 | % affiliation. 33 | 34 | %\institute{ 35 | % \inst{1}% 36 | % Department Biostatistics\\ 37 | % Johns Hopkins Bloomberg School of Public Health 38 | % \and 39 | % \inst{2}% 40 | % Department of Preventive Medicine\\ 41 | % Feinberg School of Medicine, Northwestern University 42 | %} 43 | % - Use the \inst command only if there are several affiliations. 44 | % - Keep it simple, no one is interested in your street address. 45 | 46 | \date{Computing for Data Analysis} 47 | 48 | \setbeamertemplate{footline}[page number] 49 | 50 | 51 | \begin{document} 52 | 53 | \begin{frame} 54 | \titlepage 55 | \end{frame} 56 | 57 | 58 | \begin{frame}{Objects} 59 | R has five basic or ``atomic'' classes of objects: 60 | \begin{itemize} 61 | \item 62 | character 63 | \item 64 | numeric (real numbers) 65 | \item 66 | integer 67 | \item 68 | complex 69 | \item 70 | logical (True/False) 71 | \end{itemize} 72 | The most basic object is a vector 73 | \begin{itemize} 74 | \item 75 | A vector can only contain objects of the same class 76 | \item 77 | BUT: The one exception is a \textit{list}, which is represented as a 78 | vector but can contain objects of different classes (indeed, that's 79 | usually why we use them) 80 | \end{itemize} 81 | Empty vectors can be created with the \code{vector()} function. 82 | \end{frame} 83 | 84 | \begin{frame}{Numbers} 85 | \begin{itemize} 86 | \item 87 | Numbers in R are generally treated as numeric objects (i.e. double 88 | precision real numbers) 89 | \item 90 | If you explicitly want an integer, you need to specify the \code{L} 91 | suffix 92 | \item 93 | Ex: Entering \code{1} gives you a numeric object; entering \code{1L} 94 | explicitly gives you an integer. 95 | \item 96 | There is also a special number \code{Inf} which represents infinity; 97 | e.g. \code{1 / 0}; \code{Inf} can be used in ordinary calculations; 98 | e.g. 
\code{1 / Inf} is 0 99 | \item 100 | The value \code{NaN} represents an undefined value (``not a number''); 101 | e.g. 0 / 0; \code{NaN} can also be thought of as a missing value (more 102 | on that later) 103 | \end{itemize} 104 | \end{frame} 105 | 106 | \begin{frame}{Attributes} 107 | R objects can have attributes 108 | \begin{itemize} 109 | \item 110 | names, dimnames 111 | \item 112 | dimensions (e.g. matrices, arrays) 113 | \item 114 | class 115 | \item 116 | length 117 | \item 118 | other user-defined attributes/metadata 119 | \end{itemize} 120 | Attributes of an object can be accessed using the \code{attributes()} 121 | function. 122 | \end{frame} 123 | 124 | \begin{frame}[fragile]{Entering Input} 125 | At the R prompt we type \textit{expressions}. The \code{<-} symbol is 126 | the assignment operator. 127 | \begin{verbatim} 128 | > x <- 1 129 | > print(x) 130 | [1] 1 131 | > x 132 | [1] 1 133 | > msg <- "hello" 134 | \end{verbatim} 135 | The grammar of the language determines whether an expression is 136 | complete or not. 137 | \begin{verbatim} 138 | > x <- ## Incomplete expression 139 | \end{verbatim} 140 | The \code{\#} character indicates a \textit{comment}. Anything to the 141 | right of the \code{\#} (including the \code{\#} itself) is ignored. 142 | \end{frame} 143 | 144 | \begin{frame}[fragile]{Evaluation} 145 | When a complete expression is entered at the prompt, it is 146 | \textit{evaluated} and the result of the evaluated expression is 147 | returned. The result may be \textit{auto-printed}. 148 | \begin{verbatim} 149 | > x <- 5 ## nothing printed 150 | > x ## auto-printing occurs 151 | [1] 5 152 | > print(x) ## explicit printing 153 | [1] 5 154 | \end{verbatim} 155 | The \code{[1]} indicates that \code{x} is a vector and 5 is the first 156 | element. 
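A minimal sketch of the same behaviour, written as a script rather
than as a prompt session; the name \code{y} is only illustrative.
Wrapping an assignment in parentheses is one way to force the
assigned value to be printed.
\begin{verbatim}
x <- 5          ## assignment: nothing is printed
length(x)       ## x is a vector with a single element
(y <- x + 1)    ## parentheses force printing of the assigned value
\end{verbatim}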
157 | \end{frame} 158 | 159 | 160 | \begin{frame}[fragile]{Printing} 161 | \begin{verbatim} 162 | > x <- 1:20 163 | > x 164 | [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 165 | [16] 16 17 18 19 20 166 | \end{verbatim} 167 | The \code{:} operator is used to create integer sequences. 168 | \end{frame} 169 | 170 | \begin{frame}[fragile]{Creating Vectors} 171 | The \code{c()} function can be used to create vectors of objects. 172 | \begin{verbatim} 173 | > x <- c(0.5, 0.6) ## numeric 174 | > x <- c(TRUE, FALSE) ## logical 175 | > x <- c(T, F) ## logical 176 | > x <- c("a", "b", "c") ## character 177 | > x <- 9:29 ## integer 178 | > x <- c(1+0i, 2+4i) ## complex 179 | \end{verbatim} 180 | Using the \code{vector()} function 181 | \begin{verbatim} 182 | > x <- vector("numeric", length = 10) 183 | > x 184 | [1] 0 0 0 0 0 0 0 0 0 0 185 | \end{verbatim} 186 | \end{frame} 187 | 188 | \begin{frame}[fragile]{Mixing Objects} 189 | What about the following? 190 | \begin{verbatim} 191 | > y <- c(1.7, "a") ## character 192 | > y <- c(TRUE, 2) ## numeric 193 | > y <- c("a", TRUE) ## character 194 | \end{verbatim} 195 | When different objects are mixed in a vector, \textit{coercion} occurs 196 | so that every element in the vector is of the same class. 197 | \end{frame} 198 | 199 | \begin{frame}[fragile]{Explicit Coercion} 200 | Objects can be explicitly coerced from one class to another using the 201 | \code{as.*} functions, if available. 202 | \begin{verbatim} 203 | > x <- 0:6 204 | > class(x) 205 | [1] "integer" 206 | > as.numeric(x) 207 | [1] 0 1 2 3 4 5 6 208 | > as.logical(x) 209 | [1] FALSE TRUE TRUE TRUE TRUE TRUE TRUE 210 | > as.character(x) 211 | [1] "0" "1" "2" "3" "4" "5" "6" 212 | > as.complex(x) 213 | [1] 0+0i 1+0i 2+0i 3+0i 4+0i 5+0i 6+0i 214 | \end{verbatim} 215 | \end{frame} 216 | 217 | \begin{frame}[fragile]{Explicit Coercion} 218 | Nonsensical coercion results in \code{NA}s. 
219 | \begin{verbatim} 220 | > x <- c("a", "b", "c") 221 | > as.numeric(x) 222 | [1] NA NA NA 223 | Warning message: 224 | NAs introduced by coercion 225 | > as.logical(x) 226 | [1] NA NA NA 227 | \end{verbatim} 228 | \end{frame} 229 | 230 | 231 | \begin{frame}[fragile]{Matrices} 232 | Matrices are vectors with a \textit{dimension} attribute. The 233 | dimension attribute is itself an integer vector of length 2 (nrow, 234 | ncol) 235 | \begin{verbatim} 236 | > m <- matrix(nrow = 2, ncol = 3) 237 | > m 238 | [,1] [,2] [,3] 239 | [1,] NA NA NA 240 | [2,] NA NA NA 241 | > dim(m) 242 | [1] 2 3 243 | > attributes(m) 244 | $dim 245 | [1] 2 3 246 | \end{verbatim} 247 | \end{frame} 248 | 249 | \begin{frame}[fragile]{Matrices (cont'd)} 250 | Matrices are constructed \textit{column-wise}, so entries can be 251 | thought of starting in the ``upper left'' corner and running down the 252 | columns. 253 | \begin{verbatim} 254 | > m <- matrix(1:6, nrow = 2, ncol = 3) 255 | > m 256 | [,1] [,2] [,3] 257 | [1,] 1 3 5 258 | [2,] 2 4 6 259 | \end{verbatim} 260 | \end{frame} 261 | 262 | \begin{frame}[fragile]{Matrices (cont'd)} 263 | Matrices can also be created directly from vectors by adding a 264 | dimension attribute. 265 | \begin{verbatim} 266 | > m <- 1:10 267 | > m 268 | [1] 1 2 3 4 5 6 7 8 9 10 269 | > dim(m) <- c(2, 5) 270 | > m 271 | [,1] [,2] [,3] [,4] [,5] 272 | [1,] 1 3 5 7 9 273 | [2,] 2 4 6 8 10 274 | \end{verbatim} 275 | \end{frame} 276 | 277 | \begin{frame}[fragile]{cbind-ing and rbind-ing} 278 | Matrices can be created by \textit{column-binding} or 279 | \textit{row-binding} with \code{cbind()} and \code{rbind()}. 
280 | \begin{verbatim} 281 | > x <- 1:3 282 | > y <- 10:12 283 | > cbind(x, y) 284 | x y 285 | [1,] 1 10 286 | [2,] 2 11 287 | [3,] 3 12 288 | > rbind(x, y) 289 | [,1] [,2] [,3] 290 | x 1 2 3 291 | y 10 11 12 292 | \end{verbatim} 293 | \end{frame} 294 | 295 | \begin{frame}[fragile]{Lists} 296 | Lists are a special type of vector that can contain elements of 297 | different classes. Lists are a very important data type in R and you 298 | should get to know them well. 299 | \begin{verbatim} 300 | > x <- list(1, "a", TRUE, 1 + 4i) 301 | > x 302 | [[1]] 303 | [1] 1 304 | 305 | [[2]] 306 | [1] "a" 307 | 308 | [[3]] 309 | [1] TRUE 310 | 311 | [[4]] 312 | [1] 1+4i 313 | \end{verbatim} 314 | \end{frame} 315 | 316 | \begin{frame}{Factors} 317 | Factors are used to represent categorical data. Factors can be 318 | unordered or ordered. One can think of a factor as an integer vector 319 | where each integer has a \textit{label}. 320 | \begin{itemize} 321 | \item 322 | Factors are treated specially by modelling functions like \code{lm()} 323 | and \code{glm()} 324 | \item 325 | Using factors with labels is \textit{better} than using integers 326 | because factors are self-describing; having a variable that has values 327 | ``Male'' and ``Female'' is better than a variable that has values 1 328 | and 2. 329 | \end{itemize} 330 | \end{frame} 331 | 332 | \begin{frame}[fragile]{Factors} 333 | \begin{verbatim} 334 | > x <- factor(c("yes", "yes", "no", "yes", "no")) 335 | > x 336 | [1] yes yes no yes no 337 | Levels: no yes 338 | > table(x) 339 | x 340 | no yes 341 | 2 3 342 | > unclass(x) 343 | [1] 2 2 1 2 1 344 | attr(,"levels") 345 | [1] "no" "yes" 346 | \end{verbatim} 347 | \end{frame} 348 | 349 | 350 | \begin{frame}[fragile]{Factors} 351 | The order of the levels can be set using the \code{levels} argument to 352 | \code{factor()}. This can be important in linear modelling because 353 | the first level is used as the baseline level. 
354 | \begin{verbatim} 355 | > x <- factor(c("yes", "yes", "no", "yes", "no"), 356 | levels = c("yes", "no")) 357 | > x 358 | [1] yes yes no yes no 359 | Levels: yes no 360 | \end{verbatim} 361 | \end{frame} 362 | 363 | \begin{frame}[fragile]{Missing Values} 364 | Missing values are denoted by \code{NA} or \code{NaN} for undefined 365 | mathematical operations. 366 | \begin{itemize} 367 | \item 368 | \code{is.na()} is used to test objects if they are \code{NA} 369 | \item 370 | \code{is.nan()} is used to test for \code{NaN} 371 | \item 372 | \code{NA} values have a class also, so there are integer \code{NA}, 373 | character \code{NA}, etc. 374 | \item 375 | A \code{NaN} value is also \code{NA} but the converse is not true 376 | \end{itemize} 377 | \end{frame} 378 | 379 | \begin{frame}[fragile]{Missing Values} 380 | \begin{verbatim} 381 | > x <- c(1, 2, NA, 10, 3) 382 | > is.na(x) 383 | [1] FALSE FALSE TRUE FALSE FALSE 384 | > is.nan(x) 385 | [1] FALSE FALSE FALSE FALSE FALSE 386 | > x <- c(1, 2, NaN, NA, 4) 387 | > is.na(x) 388 | [1] FALSE FALSE TRUE TRUE FALSE 389 | > is.nan(x) 390 | [1] FALSE FALSE TRUE FALSE FALSE 391 | \end{verbatim} 392 | \end{frame} 393 | 394 | \begin{frame}{Data Frames} 395 | Data frames are used to store tabular data 396 | \begin{itemize} 397 | \item 398 | They are represented as a special type of list where every element of 399 | the list has to have the same length 400 | \item 401 | Each element of the list can be thought of as a column and the length 402 | of each element of the list is the number of rows 403 | \item 404 | Unlike matrices, data frames can store different classes of objects in 405 | each column (just like lists); matrices must have every element be the 406 | same class 407 | \item 408 | Data frames also have a special attribute called \code{row.names} 409 | \item 410 | Data frames are usually created by calling \code{read.table()} or 411 | \code{read.csv()} 412 | \item 413 | Can be converted to a matrix by calling 
\code{data.matrix()} 414 | \end{itemize} 415 | \end{frame} 416 | 417 | \begin{frame}[fragile]{Data Frames} 418 | \begin{verbatim} 419 | > x <- data.frame(foo = 1:4, bar = c(T, T, F, F)) 420 | > x 421 | foo bar 422 | 1 1 TRUE 423 | 2 2 TRUE 424 | 3 3 FALSE 425 | 4 4 FALSE 426 | > nrow(x) 427 | [1] 4 428 | > ncol(x) 429 | [1] 2 430 | \end{verbatim} 431 | \end{frame} 432 | 433 | \begin{frame}[fragile]{Names} 434 | R objects can also have names, which is very useful for writing 435 | readable code and self-describing objects. 436 | \begin{verbatim} 437 | > x <- 1:3 438 | > names(x) 439 | NULL 440 | > names(x) <- c("foo", "bar", "norf") 441 | > x 442 | foo bar norf 443 | 1 2 3 444 | > names(x) 445 | [1] "foo" "bar" "norf" 446 | \end{verbatim} 447 | \end{frame} 448 | 449 | \begin{frame}[fragile]{Names} 450 | Lists can also have names. 451 | \begin{verbatim} 452 | > x <- list(a = 1, b = 2, c = 3) 453 | > x 454 | $a 455 | [1] 1 456 | 457 | $b 458 | [1] 2 459 | 460 | $c 461 | [1] 3 462 | \end{verbatim} 463 | \end{frame} 464 | 465 | \begin{frame}[fragile]{Names} 466 | And matrices. 467 | \begin{verbatim} 468 | > m <- matrix(1:4, nrow = 2, ncol = 2) 469 | > dimnames(m) <- list(c("a", "b"), c("c", "d")) 470 | > m 471 | c d 472 | a 1 3 473 | b 2 4 474 | \end{verbatim} 475 | \end{frame} 476 | 477 | \begin{frame}[fragile]{Summary} 478 | Data Types 479 | \begin{itemize} 480 | \item atomic classes: numeric, logical, character, integer, complex 481 | \item vectors, lists 482 | \item factors 483 | \item missing values 484 | \item data frames 485 | \item names 486 | \end{itemize} 487 | \end{frame} 488 | 489 | 490 | 491 | \end{document} 492 | 493 | 494 | -------------------------------------------------------------------------------- /datatypes2.tex: -------------------------------------------------------------------------------- 1 | \documentclass[aspectratio=169]{beamer} 2 | 3 | \mode 4 | { 5 | \usetheme{Warsaw} 6 | % or ... 
7 | 8 | \setbeamercovered{transparent} 9 | % or whatever (possibly just delete it) 10 | } 11 | 12 | 13 | \usepackage[english]{babel} 14 | \usepackage[latin1]{inputenc} 15 | \usepackage{graphicx} 16 | %\usepackage{times} 17 | %\usepackage[T1]{fontenc} 18 | % Or whatever. Note that the encoding and the font should match. If T1 19 | % does not look nice, try deleting the line with the fontenc. 20 | 21 | \usepackage{amsmath,amsfonts,amssymb} 22 | 23 | \input{macros} 24 | 25 | \title[The R Language]{Introduction to the R Language} 26 | 27 | \subtitle{Data Types and Basic Operations} 28 | 29 | %\author{Roger D. Peng} 30 | % - Give the names in the same order as the appear in the paper. 31 | % - Use the \inst{?} command only if the authors have different 32 | % affiliation. 33 | 34 | %\institute{ 35 | % \inst{1}% 36 | % Department Biostatistics\\ 37 | % Johns Hopkins Bloomberg School of Public Health 38 | % \and 39 | % \inst{2}% 40 | % Department of Preventive Medicine\\ 41 | % Feinberg School of Medicine, Northwestern University 42 | %} 43 | % - Use the \inst command only if there are several affiliations. 44 | % - Keep it simple, no one is interested in your street address. 45 | 46 | \date{Computing for Data Analysis} 47 | 48 | \setbeamertemplate{footline}[page number] 49 | 50 | 51 | \begin{document} 52 | 53 | \begin{frame} 54 | \titlepage 55 | \end{frame} 56 | 57 | 58 | 59 | 60 | 61 | 62 | 63 | 64 | \begin{frame}[fragile]{Subsetting} 65 | There are a number of operators that can be used to extract subsets of 66 | R objects. 
67 | \begin{itemize} 68 | \item 69 | \verb+[+ always returns an object of the same class as the original; 70 | can be used to select more than one element (there is one exception) 71 | \item 72 | \verb+[[+ is used to extract elements of a list or a data frame; it 73 | can only be used to extract a single element and the class of the 74 | returned object will not necessarily be a list or data frame 75 | \item 76 | \verb+$+ is used to extract elements of a list or data frame by name; 77 | semantics are similar to that of \verb+[[+. 78 | \end{itemize} 79 | \end{frame} 80 | 81 | \begin{frame}[fragile]{Subsetting} 82 | \begin{verbatim} 83 | > x <- c("a", "b", "c", "c", "d", "a") 84 | > x[1] 85 | [1] "a" 86 | > x[2] 87 | [1] "b" 88 | > x[1:4] 89 | [1] "a" "b" "c" "c" 90 | > x[x > "a"] 91 | [1] "b" "c" "c" "d" 92 | > u <- x > "a" 93 | > u 94 | [1] FALSE TRUE TRUE TRUE TRUE FALSE 95 | > x[u] 96 | [1] "b" "c" "c" "d" 97 | \end{verbatim} 98 | \end{frame} 99 | 100 | \begin{frame}[fragile]{Subsetting a Matrix} 101 | Matrices can be subsetted in the usual way with $(i, j)$ type indices. 102 | \begin{verbatim} 103 | > x <- matrix(1:6, 2, 3) 104 | > x[1, 2] 105 | [1] 3 106 | > x[2, 1] 107 | [1] 2 108 | \end{verbatim} 109 | Indices can also be missing. 110 | \begin{verbatim} 111 | > x[1, ] 112 | [1] 1 3 5 113 | > x[, 2] 114 | [1] 3 4 115 | \end{verbatim} 116 | \end{frame} 117 | 118 | \begin{frame}[fragile]{Subsetting a Matrix} 119 | By default, when a single element of a matrix is retrieved, it is 120 | returned as a vector of length 1 rather than a $1\times 1$ matrix. 121 | This behavior can be turned off by setting \code{drop = FALSE}.
122 | \begin{verbatim} 123 | > x <- matrix(1:6, 2, 3) 124 | > x[1, 2] 125 | [1] 3 126 | 127 | > x[1, 2, drop = FALSE] 128 | [,1] 129 | [1,] 3 130 | \end{verbatim} 131 | \end{frame} 132 | 133 | \begin{frame}[fragile]{Subsetting a Matrix} 134 | Similarly, subsetting a single column or a single row will give you a 135 | vector, not a matrix (by default). 136 | \begin{verbatim} 137 | > x <- matrix(1:6, 2, 3) 138 | > x[1, ] 139 | [1] 1 3 5 140 | > x[1, , drop = FALSE] 141 | [,1] [,2] [,3] 142 | [1,] 1 3 5 143 | \end{verbatim} 144 | \end{frame} 145 | 146 | \begin{frame}[fragile]{Subsetting Lists} 147 | \begin{verbatim} 148 | > x <- list(foo = 1:4, bar = 0.6) 149 | > x[1] 150 | $foo 151 | [1] 1 2 3 4 152 | 153 | > x[[1]] 154 | [1] 1 2 3 4 155 | 156 | > x$bar 157 | [1] 0.6 158 | > x[["bar"]] 159 | [1] 0.6 160 | > x["bar"] 161 | $bar 162 | [1] 0.6 163 | \end{verbatim} 164 | \end{frame} 165 | 166 | \begin{frame}[fragile]{Subsetting Lists} 167 | Extracting multiple elements of a list. 168 | \begin{verbatim} 169 | > x <- list(foo = 1:4, bar = 0.6, baz = "hello") 170 | > x[c(1, 3)] 171 | $foo 172 | [1] 1 2 3 4 173 | 174 | $baz 175 | [1] "hello" 176 | \end{verbatim} 177 | \end{frame} 178 | 179 | \begin{frame}[fragile]{Subsetting Lists} 180 | The \verb+[[+ operator can be used with \textit{computed} indices; 181 | \verb+$+ can only be used with literal names. 182 | \begin{verbatim} 183 | > x <- list(foo = 1:4, bar = 0.6, baz = "hello") 184 | > name <- "foo" 185 | > x[[name]] ## computed index for `foo' 186 | [1] 1 2 3 4 187 | > x$name ## element `name' doesn't exist! 188 | NULL 189 | > x$foo 190 | [1] 1 2 3 4 ## element `foo' does exist 191 | \end{verbatim} 192 | \end{frame} 193 | 194 | \begin{frame}[fragile]{Subsetting Nested Elements of a List} 195 | The \verb+[[+ can take an integer sequence. 
196 | \begin{verbatim} 197 | > x <- list(a = list(10, 12, 14), b = c(3.14, 2.81)) 198 | > x[[c(1, 3)]] 199 | [1] 14 200 | > x[[1]][[3]] 201 | [1] 14 202 | 203 | > x[[c(2, 1)]] 204 | [1] 3.14 205 | \end{verbatim} 206 | \end{frame} 207 | 208 | \begin{frame}[fragile]{Partial Matching} 209 | Partial matching of names is allowed with \verb+[[+ and \verb+$+. 210 | \begin{verbatim} 211 | > x <- list(aardvark = 1:5) 212 | > x$a 213 | [1] 1 2 3 4 5 214 | > x[["a"]] 215 | NULL 216 | > x[["a", exact = FALSE]] 217 | [1] 1 2 3 4 5 218 | \end{verbatim} 219 | \end{frame} 220 | 221 | \begin{frame}[fragile]{Removing NA Values} 222 | A common task is to remove missing values (\code{NA}s). 223 | \begin{verbatim} 224 | > x <- c(1, 2, NA, 4, NA, 5) 225 | > bad <- is.na(x) 226 | > x[!bad] 227 | [1] 1 2 4 5 228 | \end{verbatim} 229 | \end{frame} 230 | 231 | \begin{frame}[fragile]{Removing NA Values} 232 | What if there are multiple things and you want to take the subset with 233 | no missing values? 234 | \begin{verbatim} 235 | > x <- c(1, 2, NA, 4, NA, 5) 236 | > y <- c("a", "b", NA, "d", NA, "f") 237 | > good <- complete.cases(x, y) 238 | > good 239 | [1] TRUE TRUE FALSE TRUE FALSE TRUE 240 | > x[good] 241 | [1] 1 2 4 5 242 | > y[good] 243 | [1] "a" "b" "d" "f" 244 | \end{verbatim} 245 | \end{frame} 246 | 247 | \begin{frame}[fragile]{Removing NA Values} 248 | \begin{verbatim} 249 | > airquality[1:6, ] 250 | Ozone Solar.R Wind Temp Month Day 251 | 1 41 190 7.4 67 5 1 252 | 2 36 118 8.0 72 5 2 253 | 3 12 149 12.6 74 5 3 254 | 4 18 313 11.5 62 5 4 255 | 5 NA NA 14.3 56 5 5 256 | 6 28 NA 14.9 66 5 6 257 | > good <- complete.cases(airquality) 258 | > airquality[good, ][1:6, ] 259 | Ozone Solar.R Wind Temp Month Day 260 | 1 41 190 7.4 67 5 1 261 | 2 36 118 8.0 72 5 2 262 | 3 12 149 12.6 74 5 3 263 | 4 18 313 11.5 62 5 4 264 | 7 23 299 8.6 65 5 7 265 | 8 19 99 13.8 59 5 8 266 | \end{verbatim} 267 | \end{frame} 268 | 269 | 270 | \end{document} 271 | 
-------------------------------------------------------------------------------- /debugging.tex: -------------------------------------------------------------------------------- 1 | \documentclass[aspectratio=169]{beamer} 2 | 3 | \mode 4 | { 5 | \usetheme{Warsaw} 6 | % or ... 7 | 8 | \setbeamercovered{transparent} 9 | % or whatever (possibly just delete it) 10 | } 11 | 12 | 13 | \usepackage[english]{babel} 14 | \usepackage[latin1]{inputenc} 15 | \usepackage{graphicx} 16 | 17 | \usepackage{amsmath,amsfonts,amssymb} 18 | 19 | \input{macros} 20 | 21 | \title[Debugging]{Debugging} 22 | 23 | 24 | \date{Computing for Data Analysis} 25 | 26 | \setbeamertemplate{footline}[page number] 27 | 28 | \begin{document} 29 | 30 | \begin{frame} 31 | \titlepage 32 | \end{frame} 33 | 34 | 35 | \begin{frame}{Something's Wrong!} 36 | Indications that something's not right 37 | \begin{itemize} 38 | \item \code{message}: A generic notification/diagnostic message 39 | produced by the \code{message} function; execution of the function 40 | continues 41 | \item \code{warning}: An indication that something is wrong but not 42 | necessarily fatal; execution of the function continues; generated by 43 | the \code{warning} function 44 | \item \code{error}: An indication that a fatal problem has occurred; 45 | execution stops; produced by the \code{stop} function 46 | \item \code{condition}: A generic concept for indicating that 47 | something unexpected can occur; programmers can create their own 48 | conditions 49 | \end{itemize} 50 | \end{frame} 51 | 52 | \begin{frame}[fragile]{Something's Wrong!} 53 | Warning 54 | \begin{verbatim} 55 | > log(-1) 56 | [1] NaN 57 | Warning message: 58 | In log(-1) : NaNs produced 59 | \end{verbatim} 60 | \end{frame} 61 | 62 | \begin{frame}[fragile]{Something's Wrong} 63 | \begin{verbatim} 64 | printmessage <- function(x) { 65 | if(x > 0) 66 | print("x is greater than zero") 67 | else 68 | print("x is less than or equal to zero") 69 | invisible(x) 70 | } 71 | 
\end{verbatim} 72 | \end{frame} 73 | 74 | \begin{frame}[fragile]{Something's Wrong} 75 | \begin{verbatim} 76 | printmessage <- function(x) { 77 | if(x > 0) 78 | print("x is greater than zero") 79 | else 80 | print("x is less than or equal to zero") 81 | invisible(x) 82 | } 83 | > printmessage(1) 84 | [1] "x is greater than zero" 85 | > printmessage(NA) 86 | Error in if (x > 0) { : missing value where TRUE/FALSE needed 87 | \end{verbatim} 88 | \end{frame} 89 | 90 | \begin{frame}[fragile]{Something's Wrong!} 91 | \begin{verbatim} 92 | printmessage2 <- function(x) { 93 | if(is.na(x)) 94 | print("x is a missing value!") 95 | else if(x > 0) 96 | print("x is greater than zero") 97 | else 98 | print("x is less than or equal to zero") 99 | invisible(x) 100 | } 101 | \end{verbatim} 102 | \end{frame} 103 | 104 | \begin{frame}[fragile]{Something's Wrong!} 105 | \begin{verbatim} 106 | printmessage2 <- function(x) { 107 | if(is.na(x)) 108 | print("x is a missing value!") 109 | else if(x > 0) 110 | print("x is greater than zero") 111 | else 112 | print("x is less than or equal to zero") 113 | invisible(x) 114 | } 115 | > x <- log(-1) 116 | Warning message: 117 | In log(-1) : NaNs produced 118 | > printmessage2(x) 119 | [1] "x is a missing value!" 120 | \end{verbatim} 121 | \end{frame} 122 | 123 | \begin{frame}{Something's Wrong!} 124 | How do you know that something is wrong with your function? 125 | \begin{itemize} 126 | \item What was your input? How did you call the function? 127 | \item What were you expecting? Output, messages, other results? 128 | \item What did you get? 129 | \item How does what you get differ from what you were expecting? 130 | \item Were your expectations correct in the first place? 131 | \item Can you reproduce the problem (exactly)? 
132 | \end{itemize} 133 | \end{frame} 134 | 135 | \begin{frame}{Debugging Tools in R} 136 | The primary tools for debugging functions in R are 137 | \begin{itemize} 138 | \item \code{traceback}: prints out the function call stack after an 139 | error occurs; does nothing if there's no error 140 | \item \code{debug}: flags a function for ``debug'' mode which allows 141 | you to step through execution of a function one line at a time 142 | \item \code{browser}: suspends the execution of a function wherever it 143 | is called and puts the function in debug mode 144 | \item \code{trace}: allows you to insert debugging code into a 145 | function at specific places 146 | \item \code{recover}: allows you to modify the error behavior so that 147 | you can browse the function call stack 148 | \end{itemize} 149 | These are interactive tools specifically designed to allow you to pick 150 | through a function. There's also the more blunt technique of inserting 151 | \code{print}/\code{cat} statements in the function. 152 | \end{frame} 153 | 154 | 155 | 156 | 157 | \begin{frame}[fragile]{traceback} 158 | \begin{verbatim} 159 | > mean(x) 160 | Error in mean(x) : object 'x' not found 161 | > traceback() 162 | 1: mean(x) 163 | > 164 | \end{verbatim} 165 | \end{frame} 166 | 167 | \begin{frame}[fragile]{traceback} 168 | \begin{verbatim} 169 | > lm(y ~ x) 170 | Error in eval(expr, envir, enclos) : object 'y' not found 171 | > traceback() 172 | 7: eval(expr, envir, enclos) 173 | 6: eval(predvars, data, env) 174 | 5: model.frame.default(formula = y ~ x, drop.unused.levels = TRUE) 175 | 4: model.frame(formula = y ~ x, drop.unused.levels = TRUE) 176 | 3: eval(expr, envir, enclos) 177 | 2: eval(mf, parent.frame()) 178 | 1: lm(y ~ x) 179 | \end{verbatim} 180 | \end{frame} 181 | 182 | \begin{frame}[fragile]{debug} 183 | \begin{verbatim} 184 | > debug(lm) 185 | > lm(y ~ x) 186 | debugging in: lm(y ~ x) 187 | debug: { 188 | ret.x <- x 189 | ret.y <- y 190 | cl <- match.call() 191 | ...
192 | if (!qr) 193 | z$qr <- NULL 194 | z 195 | } 196 | Browse[2]> 197 | \end{verbatim} 198 | \end{frame} 199 | 200 | \begin{frame}[fragile]{debug} 201 | \begin{verbatim} 202 | Browse[2]> n 203 | debug: ret.x <- x 204 | Browse[2]> n 205 | debug: ret.y <- y 206 | Browse[2]> n 207 | debug: cl <- match.call() 208 | Browse[2]> n 209 | debug: mf <- match.call(expand.dots = FALSE) 210 | Browse[2]> n 211 | debug: m <- match(c("formula", "data", "subset", "weights", "na.action", 212 | "offset"), names(mf), 0L) 213 | \end{verbatim} 214 | \end{frame} 215 | 216 | \begin{frame}[fragile]{recover} 217 | \begin{verbatim} 218 | > options(error = recover) 219 | > read.csv("nosuchfile") 220 | Error in file(file, "rt") : cannot open the connection 221 | In addition: Warning message: 222 | In file(file, "rt") : 223 | cannot open file 'nosuchfile': No such file or directory 224 | 225 | Enter a frame number, or 0 to exit 226 | 227 | 1: read.csv("nosuchfile") 228 | 2: read.table(file = file, header = header, sep = sep, quote = quote, dec = de 229 | 3: file(file, "rt") 230 | 231 | Selection: 232 | \end{verbatim} 233 | \end{frame} 234 | 235 | \begin{frame}{Debugging} 236 | Summary 237 | \begin{itemize} 238 | \item There are three main indications of a problem/condition: 239 | message, warning, error; only an error is fatal 240 | \item When analyzing a function with a problem, make sure you can 241 | reproduce the problem, clearly state your expectations and how the 242 | output differs from your expectation 243 | \item Interactive debugging tools \code{traceback}, \code{debug}, 244 | \code{browser}, \code{trace}, and \code{recover} can be used to 245 | find problematic code in functions 246 | \item Debugging tools are not a substitute for thinking! 
247 | \end{itemize} 248 | \end{frame} 249 | 250 | 251 | \end{document} 252 | 253 | 254 | -------------------------------------------------------------------------------- /functions.tex: -------------------------------------------------------------------------------- 1 | \documentclass[aspectratio=169]{beamer} 2 | 3 | \mode 4 | { 5 | \usetheme{Warsaw} 6 | % or ... 7 | 8 | \setbeamercovered{transparent} 9 | % or whatever (possibly just delete it) 10 | } 11 | 12 | 13 | \usepackage[english]{babel} 14 | \usepackage[latin1]{inputenc} 15 | \usepackage{graphicx} 16 | %\usepackage{times} 17 | %\usepackage[T1]{fontenc} 18 | % Or whatever. Note that the encoding and the font should match. If T1 19 | % does not look nice, try deleting the line with the fontenc. 20 | 21 | \usepackage{amsmath,amsfonts,amssymb} 22 | 23 | \input{macros} 24 | 25 | \title[The R Language]{Introduction to the R Language} 26 | 27 | \subtitle{Functions} 28 | 29 | \date{Computing for Data Analysis} 30 | 31 | \setbeamertemplate{footline}[page number] 32 | 33 | 34 | \begin{document} 35 | 36 | \begin{frame} 37 | \titlepage 38 | \end{frame} 39 | 40 | \begin{frame}[fragile]{Functions} 41 | Functions are created using the \code{function()} directive and are 42 | stored as R objects just like anything else. In particular, they are 43 | R objects of class ``function''. 44 | \begin{verbatim} 45 | f <- function() { 46 | ## Do something interesting 47 | } 48 | \end{verbatim} 49 | Functions in R are ``first class objects'', which means that they can 50 | be treated much like any other R object. Importantly, 51 | \begin{itemize} 52 | \item 53 | Functions can be passed as arguments to other functions 54 | \item 55 | Functions can be nested, so that you can define a function inside of 56 | another function 57 | \end{itemize} 58 | The return value of a function is the last expression in the function 59 | body to be evaluated. 
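As a hedged illustration of these points (the names \code{makepower}
and \code{square} are made up for this example), a function can be
defined inside another function, returned as the last evaluated
expression, and then passed as an argument to another function.
\begin{verbatim}
makepower <- function(n) {
        pow <- function(x) {
                x^n     ## nested function defined inside makepower
        }
        pow             ## last expression evaluated: the return value
}
square <- makepower(2)
class(square)           ## "function"
sapply(1:3, square)     ## function passed as an argument: 1 4 9
\end{verbatim}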
60 | \end{frame} 61 | 62 | \begin{frame}{Function Arguments} 63 | Functions have \textit{named arguments} which potentially have 64 | \textit{default values}. 65 | \begin{itemize} 66 | \item 67 | The \textit{formal arguments} are the arguments included in the 68 | function definition 69 | \item 70 | The \code{formals} function returns a list of all the formal arguments 71 | of a function 72 | \item 73 | Not every function call in R makes use of all the formal arguments 74 | \item 75 | Function arguments can be \textit{missing} or might have default 76 | values 77 | \end{itemize} 78 | \end{frame} 79 | 80 | \begin{frame}[fragile]{Argument Matching} 81 | R function arguments can be matched positionally or by name. So the 82 | following calls to \code{sd} are all equivalent 83 | \begin{verbatim} 84 | > mydata <- rnorm(100) 85 | > sd(mydata) 86 | > sd(x = mydata) 87 | > sd(x = mydata, na.rm = FALSE) 88 | > sd(na.rm = FALSE, x = mydata) 89 | > sd(na.rm = FALSE, mydata) 90 | \end{verbatim} 91 | Even though it's legal, I don't recommend messing around with the 92 | order of the arguments too much, since it can lead to some confusion. 93 | \end{frame} 94 | 95 | \begin{frame}[fragile]{Argument Matching} 96 | You can mix positional matching with matching by name. When an 97 | argument is matched by name, it is ``taken out'' of the argument list 98 | and the remaining unnamed arguments are matched in the order that they 99 | are listed in the function definition. 100 | \begin{verbatim} 101 | > args(lm) 102 | function (formula, data, subset, weights, na.action, 103 | method = "qr", model = TRUE, x = FALSE, 104 | y = FALSE, qr = TRUE, singular.ok = TRUE, 105 | contrasts = NULL, offset, ...) 106 | \end{verbatim} 107 | The following two calls are equivalent.
108 | \begin{verbatim} 109 | lm(data = mydata, y ~ x, model = FALSE, 1:100) 110 | lm(y ~ x, mydata, 1:100, model = FALSE) 111 | \end{verbatim} 112 | \end{frame} 113 | 114 | \begin{frame}{Argument Matching} 115 | \begin{itemize} 116 | \item 117 | Most of the time, named arguments are useful on the command line when 118 | you have a long argument list and you want to use the defaults for 119 | everything except for an argument near the end of the list 120 | \item 121 | Named arguments also help if you can remember the name of the argument 122 | and not its position on the argument list (plotting is a good 123 | example). 124 | \end{itemize} 125 | \end{frame} 126 | 127 | \begin{frame}{Argument Matching} 128 | Function arguments can also be \textit{partially} matched, which is 129 | useful for interactive work. The order of operations when given an 130 | argument is 131 | \begin{enumerate} 132 | \item 133 | Check for exact match for a named argument 134 | \item 135 | Check for a partial match 136 | \item 137 | Check for a positional match 138 | \end{enumerate} 139 | \end{frame} 140 | 141 | \begin{frame}[fragile]{Defining a Function} 142 | \begin{verbatim} 143 | f <- function(a, b = 1, c = 2, d = NULL) { 144 | 145 | } 146 | \end{verbatim} 147 | In addition to not specifying a default value, you can also set an 148 | argument value to \code{NULL}. 149 | \end{frame} 150 | 151 | \begin{frame}[fragile]{Lazy Evaluation} 152 | Arguments to functions are evaluated \textit{lazily}, so they are 153 | evaluated only as needed. 154 | \begin{verbatim} 155 | f <- function(a, b) { 156 | a^2 157 | } 158 | f(2) 159 | \end{verbatim} 160 | This function never actually uses the argument \code{b}, so calling 161 | \code{f(2)} will not produce an error because the 2 gets positionally 162 | matched to \code{a}. 
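A small sketch of the same idea, assuming nothing beyond base R:
since \code{b} is never used, even an argument that would raise an
error if evaluated is harmless, and \code{missing()} reports whether
an argument was supplied. The name \code{g} is illustrative only.
\begin{verbatim}
f <- function(a, b) {
        a^2
}
f(2, stop("never evaluated"))   ## returns 4; b is never evaluated
g <- function(a, b) {
        missing(b)              ## TRUE when b is not supplied
}
g(1)
\end{verbatim}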
163 | \end{frame} 164 | 165 | \begin{frame}[fragile]{Lazy Evaluation} 166 | Another example 167 | \begin{verbatim} 168 | f <- function(a, b) { 169 | print(a) 170 | print(b) 171 | } 172 | \end{verbatim} 173 | \begin{verbatim} 174 | > f(45) 175 | [1] 45 176 | Error in print(b) : argument "b" is missing, with no default 177 | > 178 | \end{verbatim} 179 | Notice that ``45'' got printed first before the error was triggered. 180 | This is because \code{b} did not have to be evaluated until after 181 | \code{print(a)}. Once the function tried to evaluate \code{print(b)} 182 | it had to throw an error. 183 | \end{frame} 184 | 185 | \begin{frame}[fragile]{The ``...'' Argument} 186 | The \code{...} argument indicates a variable number of arguments that 187 | are usually passed on to other functions. 188 | \begin{itemize} 189 | \item 190 | \code{...} is often used when extending another function and you don't 191 | want to copy the entire argument list of the original function 192 | \begin{verbatim} 193 | myplot <- function(x, y, type = "l", ...) { 194 | plot(x, y, type = type, ...) 195 | } 196 | \end{verbatim} 197 | \item 198 | Generic functions use \code{...} so that extra arguments can be passed 199 | to methods (more on this later). 200 | \begin{verbatim} 201 | > mean 202 | function (x, ...) 203 | UseMethod("mean") 204 | \end{verbatim} 205 | \end{itemize} 206 | \end{frame} 207 | 208 | \begin{frame}[fragile]{The ``...'' Argument} 209 | The \code{...} argument is also necessary when the number of arguments 210 | passed to the function cannot be known in advance.
211 | \begin{verbatim} 212 | > args(paste) 213 | function (..., sep = " ", collapse = NULL) 214 | 215 | > args(cat) 216 | function (..., file = "", sep = " ", fill = FALSE, 217 | labels = NULL, append = FALSE) 218 | \end{verbatim} 219 | \end{frame} 220 | 221 | \begin{frame}[fragile]{Arguments Coming After the ``...'' Argument} 222 | One catch with \code{...} is that any arguments that appear 223 | \textit{after} \code{...} on the argument list must be named 224 | explicitly and cannot be partially matched. 225 | \begin{verbatim} 226 | > args(paste) 227 | function (..., sep = " ", collapse = NULL) 228 | 229 | > paste("a", "b", sep = ":") 230 | [1] "a:b" 231 | 232 | > paste("a", "b", se = ":") 233 | [1] "a b :" 234 | \end{verbatim} 235 | \end{frame} 236 | 237 | 238 | \begin{frame}[fragile]{A Diversion on Binding Values to Symbol} 239 | How does R know which value to assign to which symbol? When I type 240 | \begin{verbatim} 241 | > lm <- function(x) { x * x } 242 | > lm 243 | function(x) { x * x } 244 | \end{verbatim} 245 | how does R know what value to assign to the symbol \code{lm}? Why 246 | doesn't it give it the value of \code{lm} that is in the \pkg{stats} 247 | package? 248 | \end{frame} 249 | 250 | \begin{frame}[fragile]{A Diversion on Binding Values to Symbol} 251 | When R tries to bind a value to a symbol, it searches through a series 252 | of \code{environments} to find the appropriate value. When you are 253 | working on the command line and need to retrieve the value of an R 254 | object, the order is roughly 255 | \begin{enumerate} 256 | \item 257 | Search the global environment for a symbol name matching the one 258 | requested. 259 | \item 260 | Search the namespaces of each of the packages on the search list 261 | \end{enumerate} 262 | The search list can be found by using the \code{search} function. 
263 | \begin{verbatim} 264 | > search() 265 | [1] ".GlobalEnv" "package:stats" "package:graphics" 266 | [4] "package:grDevices" "package:utils" "package:datasets" 267 | [7] "package:methods" "Autoloads" "package:base" 268 | \end{verbatim} 269 | \end{frame} 270 | 271 | \begin{frame}[fragile]{Binding Values to Symbol} 272 | \begin{itemize} 273 | \item 274 | The \textit{global environment} or the user's workspace is always the 275 | first element of the search list and the \pkg{base} package is always 276 | the last. 277 | \item 278 | The order of the packages on the search list matters! 279 | \item 280 | Users can configure which packages get loaded on startup so you 281 | cannot assume that there will be a set list of packages available. 282 | \item 283 | When a user loads a package with \code{library} the namespace of that 284 | package gets put in position 2 of the search list (by default) and 285 | everything else gets shifted down the list. 286 | \item 287 | Note that R has separate namespaces for functions and non-functions so 288 | it's possible to have an object named \code{c} and a function named 289 | \code{c}. 290 | \end{itemize} 291 | \end{frame} 292 | 293 | \begin{frame}{Scoping Rules} 294 | The scoping rules for R are the main feature that makes it different 295 | from the original S language. 296 | \begin{itemize} 297 | \item 298 | The scoping rules determine how a value is associated with a free 299 | variable in a function 300 | \item 301 | R uses \textit{lexical scoping} or \textit{static scoping}. A common 302 | alternative is \textit{dynamic scoping}. 303 | \item 304 | Related to the scoping rules is how R uses the \textit{search list} to 305 | bind a value to a symbol 306 | \item 307 | Lexical scoping turns out to be particularly useful for simplifying 308 | statistical computations 309 | \end{itemize} 310 | \end{frame} 311 | 312 | \begin{frame}[fragile]{Lexical Scoping} 313 | Consider the following function.
314 | \begin{verbatim} 315 | f <- function(x, y) { 316 | x^2 + y / z 317 | } 318 | \end{verbatim} 319 | This function has 2 formal arguments \code{x} and \code{y}. In the 320 | body of the function there is another symbol \code{z}. In this case 321 | \code{z} is called a \textit{free variable}. 322 | 323 | The scoping rules of a language determine how values are assigned to 324 | free variables. Free variables are not formal arguments and are not 325 | local variables (assigned inside the function body). 326 | \end{frame} 327 | 328 | \begin{frame}[fragile]{Lexical Scoping} 329 | Lexical scoping in R means that 330 | \begin{quote} 331 | \textit{the values of free variables 332 | are searched for in the environment in which the function was 333 | defined}. 334 | \end{quote} 335 | What is an environment? 336 | \begin{itemize} 337 | \item 338 | An \textit{environment} is a collection of (symbol, value) pairs, 339 | e.g. \code{x} is a symbol and \code{3.14} might be its value. 340 | \item 341 | Every environment has a parent environment; it is possible for an 342 | environment to have multiple ``children'' 343 | \item 344 | The only environment without a parent is the empty environment 345 | \item 346 | A function + an environment = a \textit{closure} or \textit{function 347 | closure}. 348 | \end{itemize} 349 | \end{frame} 350 | 351 | 352 | \begin{frame}{Lexical Scoping} 353 | Searching for the value for a free variable: 354 | \begin{itemize} 355 | \item 356 | If the value of a symbol is not found in the environment in which a 357 | function was defined, then the search is continued in the 358 | \textit{parent environment}. 359 | \item 360 | The search continues down the sequence of parent environments until we 361 | hit the \textit{top-level environment}; this is usually the global 362 | environment (workspace) or the namespace of a package.
363 | \item 364 | After the top-level environment, the search continues down the search 365 | list until we hit the \textit{empty environment}. 366 | \item 367 | If a value for a given symbol cannot be found once the empty 368 | environment is arrived at, then an error is thrown. 369 | \end{itemize} 370 | \end{frame} 371 | 372 | 373 | \begin{frame}{Lexical Scoping} 374 | Why does all this matter? 375 | \begin{itemize} 376 | \item 377 | Typically, a function is defined in the global environment, so that 378 | the values of free variables are just found in the user's workspace 379 | \item 380 | This behavior is logical for most people and is usually the ``right 381 | thing'' to do 382 | \item 383 | However, in R you can have functions defined \textit{inside other 384 | functions} 385 | \begin{itemize} 386 | \item 387 | Languages like C don't let you do this 388 | \end{itemize} 389 | \item 390 | Now things get interesting --- In this case the environment in which a 391 | function is defined is the body of another function! 392 | \end{itemize} 393 | \end{frame} 394 | 395 | \begin{frame}[fragile]{Lexical Scoping} 396 | \begin{verbatim} 397 | make.power <- function(n) { 398 | pow <- function(x) { 399 | x^n 400 | } 401 | pow 402 | } 403 | \end{verbatim} 404 | This function returns another function as its value. 405 | \begin{verbatim} 406 | > cube <- make.power(3) 407 | > square <- make.power(2) 408 | > cube(3) 409 | [1] 27 410 | > square(3) 411 | [1] 9 412 | \end{verbatim} 413 | \end{frame} 414 | 415 | \begin{frame}[fragile]{Exploring a Function Closure} 416 | What's in a function's environment? 417 | \begin{verbatim} 418 | > ls(environment(cube)) 419 | [1] "n" "pow" 420 | > get("n", environment(cube)) 421 | [1] 3 422 | 423 | > ls(environment(square)) 424 | [1] "n" "pow" 425 | > get("n", environment(square)) 426 | [1] 2 427 | \end{verbatim} 428 | \end{frame} 429 | 430 | \begin{frame}[fragile]{Lexical vs. 
Dynamic Scoping} 431 | \begin{verbatim} 432 | y <- 10 433 | 434 | f <- function(x) { 435 | y <- 2 436 | y^2 + g(x) 437 | } 438 | 439 | g <- function(x) { 440 | x * y 441 | } 442 | \end{verbatim} 443 | What is the value of 444 | \begin{verbatim} 445 | f(3) 446 | \end{verbatim} 447 | \end{frame} 448 | 449 | \begin{frame}[fragile]{Lexical vs. Dynamic Scoping} 450 | \begin{itemize} 451 | \item 452 | With lexical scoping the value of \code{y} in the function \code{g} is 453 | looked up in the environment in which the function was defined, in 454 | this case the global environment, so the value of \code{y} is 10. 455 | \item 456 | With dynamic scoping, the value of \code{y} is looked up in the 457 | environment from which the function was \textit{called} (sometimes 458 | referred to as the \textit{calling environment}). 459 | \begin{itemize} 460 | \item 461 | In R the calling environment is known as the \textit{parent frame} 462 | \end{itemize} 463 | So the value of \code{y} would be 2. 464 | \end{itemize} 465 | \end{frame} 466 | 467 | \begin{frame}[fragile]{Lexical vs. Dynamic Scoping} 468 | When a function is \textit{defined} in the global environment and is 469 | subsequently \textit{called} from the global environment, then the 470 | defining environment and the calling environment are the same. This 471 | can sometimes give the appearance of dynamic scoping. 
472 | \begin{verbatim} 473 | > g <- function(x) { 474 | + a <- 3 475 | + x + a + y 476 | + } 477 | > g(2) 478 | Error in g(2) : object "y" not found 479 | > y <- 3 480 | > g(2) 481 | [1] 8 482 | \end{verbatim} 483 | \end{frame} 484 | 485 | 486 | \begin{frame}{Other Languages} 487 | Other languages that support lexical scoping 488 | \begin{itemize} 489 | \item 490 | Scheme 491 | \item 492 | Perl 493 | \item 494 | Python 495 | \item 496 | Common Lisp (all languages converge to Lisp) 497 | \end{itemize} 498 | \end{frame} 499 | 500 | \begin{frame}{Consequences of Lexical Scoping} 501 | \begin{itemize} 502 | \item 503 | In R, all objects must be stored in memory 504 | \item 505 | All functions must carry a pointer to their respective defining 506 | environments, which could be anywhere 507 | \item 508 | In S-PLUS, free variables are always looked up in the global 509 | workspace, so everything can be stored on the disk because the 510 | ``defining environment'' of all functions is the same. 511 | \end{itemize} 512 | \end{frame} 513 | 514 | \begin{frame}{Application: Optimization} 515 | Why is any of this information useful? 516 | \begin{itemize} 517 | \item 518 | Optimization routines in R like \code{optim}, \code{nlm}, and 519 | \code{optimize} require you to pass a function whose argument is a 520 | vector of parameters (e.g. 
a log-likelihood) 521 | \item 522 | However, an objective function might depend on a host of other things 523 | besides its parameters (like \textit{data}) 524 | \item 525 | When writing software which does optimization, it may be desirable to 526 | allow the user to hold certain parameters fixed 527 | \end{itemize} 528 | \end{frame} 529 | 530 | \begin{frame}[fragile]{Maximizing a Normal Likelihood} 531 | Write a ``constructor'' function 532 | \begin{verbatim} 533 | make.NegLogLik <- function(data, fixed=c(FALSE,FALSE)) { 534 | params <- fixed 535 | function(p) { 536 | params[!fixed] <- p 537 | mu <- params[1] 538 | sigma <- params[2] 539 | a <- -0.5*length(data)*log(2*pi*sigma^2) 540 | b <- -0.5*sum((data-mu)^2) / (sigma^2) 541 | -(a + b) 542 | } 543 | } 544 | \end{verbatim} 545 | \textbf{Note}: Optimization functions in R \textit{minimize} 546 | functions, so you need to use the negative log-likelihood. 547 | \end{frame} 548 | 549 | 550 | \begin{frame}[fragile]{Maximizing a Normal Likelihood} 551 | \begin{verbatim} 552 | > set.seed(1); normals <- rnorm(100, 1, 2) 553 | > nLL <- make.NegLogLik(normals) 554 | > nLL 555 | function(p) { 556 | params[!fixed] <- p 557 | mu <- params[1] 558 | sigma <- params[2] 559 | a <- -0.5*length(data)*log(2*pi*sigma^2) 560 | b <- -0.5*sum((data-mu)^2) / (sigma^2) 561 | -(a + b) 562 | } 563 | 564 | > ls(environment(nLL)) 565 | [1] "data" "fixed" "params" 566 | \end{verbatim} 567 | \end{frame} 568 | 569 | \begin{frame}[fragile]{Estimating Parameters} 570 | \begin{verbatim} 571 | > optim(c(mu = 0, sigma = 1), nLL)$par 572 | mu sigma 573 | 1.218239 1.787343 574 | \end{verbatim} 575 | Fixing $\sigma = 2$ 576 | \begin{verbatim} 577 | > nLL <- make.NegLogLik(normals, c(FALSE, 2)) 578 | > optimize(nLL, c(-1, 3))$minimum 579 | [1] 1.217775 580 | \end{verbatim} 581 | Fixing $\mu = 1$ 582 | \begin{verbatim} 583 | > nLL <- make.NegLogLik(normals, c(1, FALSE)) 584 | > optimize(nLL, c(1e-6, 10))$minimum 585 | [1] 1.800596 586 | \end{verbatim} 587 |
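As a quick sanity check (a sketch only: the constructor and the
simulated \code{normals} are repeated from the previous slides so the
code runs on its own, and \code{mu.hat} and \code{sigma.hat} are names
introduced here for illustration), the one-parameter estimates can be
compared with their closed-form values.
\begin{verbatim}
## Constructor repeated from the earlier slide
make.NegLogLik <- function(data, fixed = c(FALSE, FALSE)) {
    params <- fixed
    function(p) {
        params[!fixed] <- p
        mu <- params[1]
        sigma <- params[2]
        a <- -0.5 * length(data) * log(2 * pi * sigma^2)
        b <- -0.5 * sum((data - mu)^2) / (sigma^2)
        -(a + b)
    }
}
set.seed(1); normals <- rnorm(100, 1, 2)

## sigma fixed at 2: the MLE of mu is the sample mean
nLL <- make.NegLogLik(normals, c(FALSE, 2))
mu.hat <- optimize(nLL, c(-1, 3))$minimum
all.equal(mu.hat, mean(normals), tolerance = 1e-3)

## mu fixed at 1: the MLE of sigma is sqrt(mean((x - 1)^2))
nLL <- make.NegLogLik(normals, c(1, FALSE))
sigma.hat <- optimize(nLL, c(1e-6, 10))$minimum
all.equal(sigma.hat, sqrt(mean((normals - 1)^2)),
          tolerance = 1e-3)
\end{verbatim}
Both comparisons should agree to within the optimizer's tolerance,
since each closure only has to search over the one parameter that was
left free.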
\end{frame} 588 | 589 | \begin{frame}[fragile]{Plotting the Likelihood} 590 | \begin{verbatim} 591 | nLL <- make.NegLogLik(normals, c(1, FALSE)) 592 | x <- seq(1.7, 1.9, len = 100) 593 | y <- sapply(x, nLL) 594 | plot(x, exp(-(y - min(y))), type = "l") 595 | 596 | nLL <- make.NegLogLik(normals, c(FALSE, 2)) 597 | x <- seq(0.5, 1.5, len = 100) 598 | y <- sapply(x, nLL) 599 | plot(x, exp(-(y - min(y))), type = "l") 600 | \end{verbatim} 601 | \end{frame} 602 | 603 | 604 | \begin{frame}{Plotting the Likelihood} 605 | \includegraphics[width=3in,height=3in]{mulike} 606 | \end{frame} 607 | 608 | \begin{frame}{Plotting the Likelihood} 609 | \includegraphics[width=3in,height=3in]{sigmalike} 610 | \end{frame} 611 | 612 | 613 | \begin{frame}{Lexical Scoping Summary} 614 | \begin{itemize} 615 | \item 616 | Objective functions can be ``built'' which contain all of the 617 | necessary data for evaluating the function 618 | \item 619 | No need to carry around long argument lists --- useful for interactive 620 | and exploratory work. 621 | \item 622 | Code can be simplified and cleaned up 623 | \item 624 | Reference: Robert Gentleman and Ross Ihaka (2000). ``Lexical Scope and 625 | Statistical Computing,'' \textit{JCGS}, 9, 491--508.
626 | \end{itemize} 627 | \end{frame} 628 | 629 | 630 | 631 | \end{document} 632 | -------------------------------------------------------------------------------- /ggplot2_part1.pptx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/rdpeng/CourseraLectures/f58ebd7bc02e275d0428b8c2bb0ff9a2668e5990/ggplot2_part1.pptx -------------------------------------------------------------------------------- /ggplot2_part2.pptx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/rdpeng/CourseraLectures/f58ebd7bc02e275d0428b8c2bb0ff9a2668e5990/ggplot2_part2.pptx -------------------------------------------------------------------------------- /grep.tex: -------------------------------------------------------------------------------- 1 | \documentclass[aspectratio=169]{beamer} 2 | 3 | \mode 4 | { 5 | \usetheme{Warsaw} 6 | % or ... 7 | 8 | \setbeamercovered{transparent} 9 | % or whatever (possibly just delete it) 10 | } 11 | 12 | 13 | \usepackage[english]{babel} 14 | \usepackage[latin1]{inputenc} 15 | \usepackage{graphicx} 16 | \usepackage{amsmath,amsfonts,amssymb} 17 | 18 | \setbeamertemplate{footline}[page number] 19 | 20 | \input{macros} 21 | 22 | \title[Regular Expressions in R]{Regular Expressions in R} 23 | 24 | 25 | \date{Computing for Data Analysis} 26 | 27 | 28 | 29 | \begin{document} 30 | 31 | \begin{frame} 32 | \titlepage 33 | \end{frame} 34 | 35 | \begin{frame}{Regular Expression Functions} 36 | The primary R functions for dealing with regular expressions are 37 | \begin{itemize} 38 | \item \code{grep}, \code{grepl}: Search for matches of a regular 39 | expression/pattern in a character vector; either return the indices 40 | into the character vector that match, the strings that happen to 41 | match, or a TRUE/FALSE vector indicating which elements match 42 | \item \code{regexpr}, \code{gregexpr}: Search a character vector for regular 43 | 
expression matches and return the indices of the string where the 44 | match begins and the length of the match 45 | \item \code{sub}, \code{gsub}: Search a character vector for regular 46 | expression matches and replace that match with another string 47 | \item \code{regexec}: Easier to explain through demonstration. 48 | \end{itemize} 49 | \end{frame} 50 | 51 | \begin{frame}[fragile]{grep} 52 | Here is an excerpt of the Baltimore City homicides dataset: 53 | \begin{verbatim} 54 | > homicides <- readLines("homicides.txt") 55 | > homicides[1] 56 | [1] "39.311024, -76.674227, iconHomicideShooting, 'p2', '
Leon 57 | Nelson
3400 Clifton Ave.
Baltimore, MD 58 | 21216
black male, 17 years old
59 |
Found on January 1, 2007
Victim died at Shock 60 | Trauma
Cause: shooting
'" 61 | 62 | > homicides[1000] 63 | [1] "39.33626300000, -76.55553990000, icon_homicide_shooting, 'p1200',... 64 | \end{verbatim} 65 | How can I find the records for all the victims of shootings (as 66 | opposed to other causes)? 67 | \end{frame} 68 | 69 | \begin{frame}[fragile]{grep} 70 | 71 | \begin{verbatim} 72 | > length(grep("iconHomicideShooting", homicides)) 73 | [1] 228 74 | > length(grep("iconHomicideShooting|icon_homicide_shooting", homicides)) 75 | [1] 1003 76 | > length(grep("Cause: shooting", homicides)) 77 | [1] 228 78 | > length(grep("Cause: [Ss]hooting", homicides)) 79 | [1] 1003 80 | > length(grep("[Ss]hooting", homicides)) 81 | [1] 1005 82 | \end{verbatim} 83 | 84 | \end{frame} 85 | 86 | 87 | \begin{frame}[fragile]{grep} 88 | \begin{verbatim} 89 | > i <- grep("[cC]ause: [Ss]hooting", homicides) 90 | > j <- grep("[Ss]hooting", homicides) 91 | > str(i) 92 | int [1:1003] 1 2 6 7 8 9 10 11 12 13 ... 93 | > str(j) 94 | int [1:1005] 1 2 6 7 8 9 10 11 12 13 ... 95 | > setdiff(i, j) 96 | integer(0) 97 | > setdiff(j, i) 98 | [1] 318 859 99 | \end{verbatim} 100 | \end{frame} 101 | 102 | 103 | \begin{frame}[fragile]{grep} 104 | \begin{verbatim} 105 | > homicides[859] 106 | [1] "39.33743900000, -76.66316500000, icon_homicide_bluntforce, 107 | 'p914', '
Steven Harris 109 |
4200 Pimlico Road
Baltimore, MD 21215 110 |
Race: Black
Gender: male
Age: 38 years old
111 |
Found on July 29, 2010
Victim died at Scene
112 |
Cause: Blunt Force

Harris was 113 | found dead July 22 and ruled a shooting victim; an autopsy 114 | subsequently showed that he had not been shot,...

'" 115 | \end{verbatim} 116 | \end{frame} 117 | 118 | 119 | \begin{frame}[fragile]{grep} 120 | By default, \code{grep} returns the indices into the character vector 121 | where the regex pattern matches. 122 | \begin{verbatim} 123 | > grep("^New", state.name) 124 | [1] 29 30 31 32 125 | \end{verbatim} 126 | Setting \code{value = TRUE} returns 127 | the actual elements of the character vector that match. 128 | \begin{verbatim} 129 | > grep("^New", state.name, value = TRUE) 130 | [1] "New Hampshire" "New Jersey" "New Mexico" "New York" 131 | \end{verbatim} 132 | \code{grepl} returns a logical vector indicating which elements match. 133 | \begin{verbatim} 134 | > grepl("^New", state.name) 135 | [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE 136 | [13] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE 137 | [25] FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE FALSE FALSE FALSE FALSE 138 | [37] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE 139 | [49] FALSE FALSE 140 | \end{verbatim} 141 | \end{frame} 142 | 143 | 144 | \begin{frame}[fragile]{regexpr} 145 | Some limitations of \code{grep} 146 | \begin{itemize} 147 | \item The \code{grep} function tells you which strings in a character 148 | vector match a certain pattern but it doesn't tell you exactly where 149 | the match occurs or what the match is (for a more complicated 150 | regex). 151 | \item The \code{regexpr} function gives you the index into each string 152 | where the match begins and the length of the match for that string. 153 | \item \code{regexpr} only gives you the first match of the string 154 | (reading left to right). \code{gregexpr} will give you all of the 155 | matches in a given string. 156 | \end{itemize} 157 | \end{frame} 158 | 159 | 160 | \begin{frame}[fragile]{regexpr} 161 | How can we find the date of the homicide? 162 | \begin{verbatim} 163 | > homicides[1] 164 | [1] "39.311024, -76.674227, iconHomicideShooting, 'p2', '
Leon 165 | Nelson
3400 Clifton Ave.
Baltimore, 166 | MD 21216
black male, 17 years old
167 |
Found on January 1, 2007
Victim died at Shock 168 | Trauma
Cause: shooting
'" 169 | \end{verbatim} 170 | Can we just 'grep' on ``Found''? 171 | \end{frame} 172 | 173 | \begin{frame}[fragile]{regexpr} 174 | The word 'found' may be found elsewhere in the entry. 175 | \begin{verbatim} 176 | > homicides[954] 177 | [1] "39.30677400000, -76.59891100000, icon_homicide_shooting, 'p816', 178 | '
1400 N Caroline St
Baltimore, MD 21213
179 |
Race: Black
Gender: male
Age: 29 years old
180 |
Found on March 3, 2010
Victim died at Scene
181 |
Cause: Shooting

Wheeler\\'s body 182 | was found on the grounds of Dr. Bernard Harris Sr. Elementary 183 | School

'" 184 | \end{verbatim} 185 | \end{frame} 186 | 187 | \begin{frame}[fragile]{regexpr} 188 | Let's use the pattern 189 | \begin{verbatim} 190 |
<dd>[Ff]ound(.*)</dd> 191 | \end{verbatim} 192 | What does this look for? 193 | \begin{verbatim} 194 | > regexpr("<dd>[Ff]ound(.*)</dd>", homicides[1:10]) 195 | [1] 177 178 188 189 178 182 178 187 182 183 196 | attr(,"match.length") 197 | [1] 93 86 89 90 89 84 85 84 88 84 198 | attr(,"useBytes") 199 | [1] TRUE 200 | > substr(homicides[1], 177, 177 + 93 - 1) 201 | [1] "<dd>Found on January 1, 2007</dd><dd>Victim died at Shock 202 | Trauma</dd><dd>Cause: shooting</dd>
" 203 | \end{verbatim} 204 | \end{frame} 205 | 206 | 207 | \begin{frame}[fragile]{regexpr} 208 | The previous pattern was too greedy and matched too much of the 209 | string. We need to use the \code{?} metacharacter to make the regex 210 | ``lazy''. 211 | \begin{verbatim} 212 | > regexpr("
<dd>[Ff]ound(.*?)</dd>", homicides[1:10]) 213 | [1] 177 178 188 189 178 182 178 187 182 183 214 | attr(,"match.length") 215 | [1] 33 33 33 33 33 33 33 33 33 33 216 | attr(,"useBytes") 217 | [1] TRUE 218 | 219 | > substr(homicides[1], 177, 177 + 33 - 1) 220 | [1] "<dd>Found on January 1, 2007</dd>
" 221 | 222 | \end{verbatim} 223 | \end{frame} 224 | 225 | \begin{frame}[fragile]{regmatches} 226 | One handy function is \code{regmatches} which extracts the matches in 227 | the strings for you without you having to use \code{substr}. 228 | \begin{verbatim} 229 | > r <- regexpr("
<dd>[Ff]ound(.*?)</dd>", homicides[1:5]) 230 | > regmatches(homicides[1:5], r) 231 | [1] "<dd>Found on January 1, 2007</dd>" "<dd>Found on January 2, 2007</dd>" 232 | [3] "<dd>Found on January 2, 2007</dd>" "<dd>Found on January 3, 2007</dd>" 233 | [5] "<dd>Found on January 5, 2007</dd>
" 234 | \end{verbatim} 235 | \end{frame} 236 | 237 | \begin{frame}[fragile]{sub/gsub} 238 | Sometimes we need to clean things up or modify strings by matching a 239 | pattern and replacing it with something else. For example, how can we 240 | extract the data from this string? 241 | \begin{verbatim} 242 | > x <- substr(homicides[1], 177, 177 + 33 - 1) 243 | > x 244 | [1] "
<dd>Found on January 1, 2007</dd>" 245 | \end{verbatim} 246 | We want to strip out the stuff surrounding the ``January 1, 2007'' 247 | piece. 248 | \begin{verbatim} 249 | > sub("<dd>[Ff]ound on |</dd>", "", x) 250 | [1] "January 1, 2007</dd>" 251 | 252 | > gsub("<dd>[Ff]ound on |</dd>", "", x) 253 | [1] "January 1, 2007" 254 | \end{verbatim} 255 | \end{frame} 256 | 257 | 258 | \begin{frame}[fragile]{sub/gsub} 259 | sub/gsub can take vector arguments 260 | \begin{verbatim} 261 | > r <- regexpr("
<dd>[Ff]ound(.*?)</dd>", homicides[1:5]) 262 | > m <- regmatches(homicides[1:5], r) 263 | > m 264 | [1] "<dd>Found on January 1, 2007</dd>" "<dd>Found on January 2, 2007</dd>" 265 | [3] "<dd>Found on January 2, 2007</dd>" "<dd>Found on January 3, 2007</dd>" 266 | [5] "<dd>Found on January 5, 2007</dd>" 267 | > (d <- gsub("<dd>[Ff]ound on |</dd>", "", m)) 268 | [1] "January 1, 2007" "January 2, 2007" "January 2, 2007" "January 3, 2007" 269 | [5] "January 5, 2007" 270 | > as.Date(d, "%B %d, %Y") 271 | [1] "2007-01-01" "2007-01-02" "2007-01-02" "2007-01-03" "2007-01-05" 272 | \end{verbatim} 273 | \end{frame} 274 | 275 | \begin{frame}[fragile]{regexec} 276 | The \code{regexec} function works like \code{regexpr} except it gives 277 | you the indices for parenthesized sub-expressions. 278 | \begin{verbatim} 279 | > regexec("
<dd>[Ff]ound on (.*?)</dd>", homicides[1]) 280 | [[1]] 281 | [1] 177 190 282 | attr(,"match.length") 283 | [1] 33 15 284 | 285 | > regexec("<dd>[Ff]ound on .*?</dd>", homicides[1]) 286 | [[1]] 287 | [1] 177 288 | attr(,"match.length") 289 | [1] 33 290 | \end{verbatim} 291 | \end{frame} 292 | 293 | 294 | \begin{frame}[fragile]{regexec} 295 | Now we can extract the string in the parenthesized sub-expression. 296 | \begin{verbatim} 297 | > regexec("<dd>[Ff]ound on (.*?)</dd>", homicides[1]) 298 | [[1]] 299 | [1] 177 190 300 | attr(,"match.length") 301 | [1] 33 15 302 | 303 | > substr(homicides[1], 177, 177 + 33 - 1) 304 | [1] "<dd>Found on January 1, 2007</dd>
" 305 | 306 | > substr(homicides[1], 190, 190 + 15 - 1) 307 | [1] "January 1, 2007" 308 | \end{verbatim} 309 | \end{frame} 310 | 311 | \begin{frame}[fragile]{regexec} 312 | Even easier with the \code{regmatches} function. 313 | \begin{verbatim} 314 | > r <- regexec("
<dd>[Ff]ound on (.*?)</dd>", homicides[1:2]) 315 | > regmatches(homicides[1:2], r) 316 | [[1]] 317 | [1] "<dd>Found on January 1, 2007</dd>" "January 1, 2007" 318 | 319 | [[2]] 320 | [1] "<dd>Found on January 2, 2007</dd>
" "January 2, 2007" 321 | \end{verbatim} 322 | \end{frame} 323 | 324 | \begin{frame}[fragile]{regexec} 325 | Let's make a plot of monthly homicide counts 326 | \begin{verbatim} 327 | > r <- regexec("
<dd>[Ff]ound on (.*?)</dd>", homicides) 328 | > m <- regmatches(homicides, r) 329 | > dates <- sapply(m, function(x) x[2]) 330 | > dates <- as.Date(dates, "%B %d, %Y") 331 | > hist(dates, "month", freq = TRUE) 332 | \end{verbatim} 333 | \end{frame} 334 | 335 | \begin{frame}[fragile]{regexec} 336 | \includegraphics[height=3.2in]{homicide-month} 337 | \end{frame} 338 | 339 | 340 | \begin{frame}{Summary} 341 | The primary R functions for dealing with regular expressions are 342 | \begin{itemize} 343 | \item \code{grep}, \code{grepl}: Search for matches of a regular 344 | expression/pattern in a character vector 345 | \item \code{regexpr}, \code{gregexpr}: Search a character vector for regular 346 | expression matches and return the indices where the match begins; 347 | useful in conjunction with \code{regmatches} 348 | \item \code{sub}, \code{gsub}: Search a character vector for regular 349 | expression matches and replace that match with another string 350 | \item \code{regexec}: Gives you indices of parenthesized sub-expressions.
351 | \end{itemize} 352 | \end{frame} 353 | 354 | 355 | 356 | 357 | 358 | 359 | 360 | 361 | \end{document} 362 | -------------------------------------------------------------------------------- /help.ppt: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/rdpeng/CourseraLectures/f58ebd7bc02e275d0428b8c2bb0ff9a2668e5990/help.ppt -------------------------------------------------------------------------------- /homicide-month.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/rdpeng/CourseraLectures/f58ebd7bc02e275d0428b8c2bb0ff9a2668e5990/homicide-month.pdf -------------------------------------------------------------------------------- /knitr.pptx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/rdpeng/CourseraLectures/f58ebd7bc02e275d0428b8c2bb0ff9a2668e5990/knitr.pptx -------------------------------------------------------------------------------- /linearmodelsim.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/rdpeng/CourseraLectures/f58ebd7bc02e275d0428b8c2bb0ff9a2668e5990/linearmodelsim.pdf -------------------------------------------------------------------------------- /loopfunctions.tex: -------------------------------------------------------------------------------- 1 | \documentclass[aspectratio=169]{beamer} 2 | 3 | \mode 4 | { 5 | \usetheme{Warsaw} 6 | % or ... 7 | 8 | \setbeamercovered{transparent} 9 | % or whatever (possibly just delete it) 10 | } 11 | 12 | 13 | \usepackage[english]{babel} 14 | \usepackage[latin1]{inputenc} 15 | \usepackage{graphicx} 16 | %\usepackage{times} 17 | %\usepackage[T1]{fontenc} 18 | % Or whatever. Note that the encoding and the font should match. If T1 19 | % does not look nice, try deleting the line with the fontenc. 
20 | 21 | \usepackage{amsmath,amsfonts,amssymb} 22 | 23 | \input{macros} 24 | 25 | \title[The R Language]{Introduction to the R Language} 26 | 27 | \subtitle{Loop Functions} 28 | 29 | \date{Computing for Data Analysis} 30 | 31 | 32 | 33 | \begin{document} 34 | 35 | \begin{frame} 36 | \titlepage 37 | \end{frame} 38 | 39 | \begin{frame}{Looping on the Command Line} 40 | Writing for, while loops is useful when programming but not 41 | particularly easy when working interactively on the command line. 42 | There are some functions which implement looping to make life easier. 43 | \begin{itemize} 44 | \item 45 | \code{lapply}: Loop over a list and evaluate a function on each element 46 | \item 47 | \code{sapply}: Same as \code{lapply} but try to simplify the result 48 | \item 49 | \code{apply}: Apply a function over the margins of an array 50 | \item 51 | \code{tapply}: Apply a function over subsets of a vector 52 | \item 53 | \code{mapply}: Multivariate version of \code{lapply} 54 | \end{itemize} 55 | An auxiliary function \code{split} is also useful, particularly in 56 | conjunction with \code{lapply}. 57 | \end{frame} 58 | 59 | \begin{frame}[fragile]{lapply} 60 | \code{lapply} takes three arguments: a list \code{X}, a function (or 61 | the name of a function) \code{FUN}, and other arguments via its 62 | \code{...} argument. If \code{X} is not a list, it will be coerced to 63 | a list using \code{as.list}. 64 | \begin{verbatim} 65 | > lapply 66 | function (X, FUN, ...) 67 | { 68 | FUN <- match.fun(FUN) 69 | if (!is.vector(X) || is.object(X)) 70 | X <- as.list(X) 71 | .Internal(lapply(X, FUN)) 72 | } 73 | \end{verbatim} 74 | The actual looping is done internally in C code. 75 | \end{frame} 76 | 77 | 78 | \begin{frame}[fragile]{lapply} 79 | \code{lapply} always returns a list, regardless of the class of the 80 | input. 
81 | \begin{verbatim} 82 | > x <- list(a = 1:5, b = rnorm(10)) 83 | > lapply(x, mean) 84 | $a 85 | [1] 3 86 | 87 | $b 88 | [1] 0.0296824 89 | \end{verbatim} 90 | \end{frame} 91 | 92 | 93 | \begin{frame}[fragile]{lapply} 94 | \begin{verbatim} 95 | > x <- list(a = 1:4, b = rnorm(10), c = rnorm(20, 1), d = rnorm(100, 5)) 96 | > lapply(x, mean) 97 | $a 98 | [1] 2.5 99 | 100 | $b 101 | [1] 0.06082667 102 | 103 | $c 104 | [1] 1.467083 105 | 106 | $d 107 | [1] 5.074749 108 | \end{verbatim} 109 | \end{frame} 110 | 111 | 112 | \begin{frame}[fragile]{lapply} 113 | \begin{verbatim} 114 | > x <- 1:4 115 | > lapply(x, runif) 116 | [[1]] 117 | [1] 0.2675082 118 | 119 | [[2]] 120 | [1] 0.2186453 0.5167968 121 | 122 | [[3]] 123 | [1] 0.2689506 0.1811683 0.5185761 124 | 125 | [[4]] 126 | [1] 0.5627829 0.1291569 0.2563676 0.7179353 127 | \end{verbatim} 128 | \end{frame} 129 | 130 | \begin{frame}[fragile]{lapply} 131 | \begin{verbatim} 132 | > x <- 1:4 133 | > lapply(x, runif, min = 0, max = 10) 134 | [[1]] 135 | [1] 3.302142 136 | 137 | [[2]] 138 | [1] 6.848960 7.195282 139 | 140 | [[3]] 141 | [1] 3.5031416 0.8465707 9.7421014 142 | 143 | [[4]] 144 | [1] 1.195114 3.594027 2.930794 2.766946 145 | \end{verbatim} 146 | \end{frame} 147 | 148 | 149 | \begin{frame}[fragile]{lapply} 150 | \code{lapply} and friends make heavy use of \textit{anonymous functions}. 151 | \begin{verbatim} 152 | > x <- list(a = matrix(1:4, 2, 2), b = matrix(1:6, 3, 2)) 153 | > x 154 | $a 155 | [,1] [,2] 156 | [1,] 1 3 157 | [2,] 2 4 158 | 159 | $b 160 | [,1] [,2] 161 | [1,] 1 4 162 | [2,] 2 5 163 | [3,] 3 6 164 | \end{verbatim} 165 | \end{frame} 166 | 167 | 168 | \begin{frame}[fragile]{lapply} 169 | An anonymous function for extracting the first column of each matrix. 
170 | \begin{verbatim} 171 | > lapply(x, function(elt) elt[,1]) 172 | $a 173 | [1] 1 2 174 | 175 | $b 176 | [1] 1 2 3 177 | \end{verbatim} 178 | \end{frame} 179 | 180 | 181 | \begin{frame}[fragile]{sapply} 182 | \code{sapply} will try to simplify the result of \code{lapply} if 183 | possible. 184 | \begin{itemize} 185 | \item 186 | If the result is a list where every element has length 1, then a vector 187 | is returned 188 | \item 189 | If the result is a list where every element is a vector of the same 190 | length ($> 1$), a matrix is returned. 191 | \item 192 | If it can't figure things out, a list is returned 193 | \end{itemize} 194 | \end{frame} 195 | 196 | 197 | 198 | \begin{frame}[fragile]{sapply} 199 | \begin{verbatim} 200 | > x <- list(a = 1:4, b = rnorm(10), c = rnorm(20, 1), d = rnorm(100, 5)) 201 | > lapply(x, mean) 202 | $a 203 | [1] 2.5 204 | 205 | $b 206 | [1] 0.06082667 207 | 208 | $c 209 | [1] 1.467083 210 | 211 | $d 212 | [1] 5.074749 213 | \end{verbatim} 214 | \end{frame} 215 | 216 | \begin{frame}[fragile]{sapply} 217 | \begin{verbatim} 218 | > sapply(x, mean) 219 | a b c d 220 | 2.50000000 0.06082667 1.46708277 5.07474950 221 | 222 | 223 | > mean(x) 224 | [1] NA 225 | Warning message: 226 | In mean.default(x) : argument is not numeric or logical: returning NA 227 | \end{verbatim} 228 | \end{frame} 229 | 230 | \begin{frame}[fragile]{apply} 231 | \code{apply} is used to evaluate a function (often an anonymous one) 232 | over the margins of an array. 233 | \begin{itemize} 234 | \item 235 | It is most often used to apply a function to the rows or columns of a matrix 236 | \item 237 | It can be used with general arrays, e.g. taking the average of an 238 | array of matrices 239 | \item 240 | It is not really faster than writing a loop, but it works in one line! 241 | \end{itemize} 242 | \end{frame} 243 | 244 | 245 | \begin{frame}[fragile]{apply} 246 | \begin{verbatim} 247 | > str(apply) 248 | function (X, MARGIN, FUN, ...)
249 | \end{verbatim} 250 | \begin{itemize} 251 | \item 252 | \code{X} is an array 253 | \item 254 | \code{MARGIN} is an integer vector indicating which margins should be 255 | ``retained''. 256 | \item 257 | \code{FUN} is a function to be applied 258 | \item 259 | \code{...} is for other arguments to be passed to \code{FUN} 260 | \end{itemize} 261 | \end{frame} 262 | 263 | \begin{frame}[fragile]{apply} 264 | \begin{verbatim} 265 | > x <- matrix(rnorm(200), 20, 10) 266 | > apply(x, 2, mean) 267 | [1] 0.04868268 0.35743615 -0.09104379 268 | [4] -0.05381370 -0.16552070 -0.18192493 269 | [7] 0.10285727 0.36519270 0.14898850 270 | [10] 0.26767260 271 | 272 | > apply(x, 1, sum) 273 | [1] -1.94843314 2.60601195 1.51772391 274 | [4] -2.80386816 3.73728682 -1.69371360 275 | [7] 0.02359932 3.91874808 -2.39902859 276 | [10] 0.48685925 -1.77576824 -3.34016277 277 | [13] 4.04101009 0.46515429 1.83687755 278 | [16] 4.36744690 2.21993789 2.60983764 279 | [19] -1.48607630 3.58709251 280 | \end{verbatim} 281 | \end{frame} 282 | 283 | 284 | \begin{frame}{col/row sums and means} 285 | For sums and means of matrix dimensions, we have some shortcuts. 286 | \begin{itemize} 287 | \item 288 | \code{rowSums} = apply(x, 1, sum) 289 | \item 290 | \code{rowMeans} = apply(x, 1, mean) 291 | \item 292 | \code{colSums} = apply(x, 2, sum) 293 | \item 294 | \code{colMeans} = apply(x, 2, mean) 295 | \end{itemize} 296 | The shortcut functions are \textit{much} faster, but you won't notice 297 | unless you're using a large matrix. 298 | \end{frame} 299 | 300 | \begin{frame}[fragile]{Other Ways to Apply} 301 | Quantiles of the rows of a matrix. 
302 | \begin{verbatim} 303 | > x <- matrix(rnorm(200), 20, 10) 304 | > apply(x, 1, quantile, probs = c(0.25, 0.75)) 305 | [,1] [,2] [,3] [,4] 306 | 25% -0.3304284 -0.99812467 -0.9186279 -0.49711686 307 | 75% 0.9258157 0.07065724 0.3050407 -0.06585436 308 | [,5] [,6] [,7] [,8] 309 | 25% -0.05999553 -0.6588380 -0.653250 0.01749997 310 | 75% 0.52928743 0.3727449 1.255089 0.72318419 311 | [,9] [,10] [,11] [,12] 312 | 25% -1.2467955 -0.8378429 -1.0488430 -0.7054902 313 | 75% 0.3352377 0.7297176 0.3113434 0.4581150 314 | [,13] [,14] [,15] [,16] 315 | 25% -0.1895108 -0.5729407 -0.5968578 -0.9517069 316 | 75% 0.5326299 0.5064267 0.4933852 0.8868922 317 | [,17] [,18] [,19] [,20] 318 | 25% -0.2502935 -0.7488003 -0.7190923 -0.638243 319 | 75% 0.7763024 0.2873202 0.6416363 1.271602 320 | \end{verbatim} 321 | \end{frame} 322 | 323 | 324 | \begin{frame}[fragile]{apply} 325 | Average matrix in an array 326 | \begin{verbatim} 327 | > a <- array(rnorm(2 * 2 * 10), c(2, 2, 10)) 328 | > apply(a, c(1, 2), mean) 329 | [,1] [,2] 330 | [1,] -0.2353245 -0.03980211 331 | [2,] -0.3339748 0.04364908 332 | 333 | > rowMeans(a, dims = 2) 334 | [,1] [,2] 335 | [1,] -0.2353245 -0.03980211 336 | [2,] -0.3339748 0.04364908 337 | \end{verbatim} 338 | \end{frame} 339 | 340 | \begin{frame}[fragile]{tapply} 341 | \code{tapply} is used to apply a function over subsets of a vector. I 342 | don't know why it's called \code{tapply}. 343 | \begin{verbatim} 344 | > str(tapply) 345 | function (X, INDEX, FUN = NULL, ..., simplify = TRUE) 346 | \end{verbatim} 347 | \begin{itemize} 348 | \item 349 | \code{X} is a vector 350 | \item 351 | \code{INDEX} is a factor or a list of factors (or else they are coerced to 352 | factors) 353 | \item 354 | \code{FUN} is a function to be applied 355 | \item 356 | \code{...} contains other arguments to be passed to \code{FUN} 357 | \item 358 | \code{simplify} indicates whether the result should be simplified 
359 | \end{itemize} 360 | \end{frame} 361 | 362 | \begin{frame}[fragile]{tapply} 363 | Take group means. 364 | \begin{verbatim} 365 | > x <- c(rnorm(10), runif(10), rnorm(10, 1)) 366 | > f <- gl(3, 10) 367 | > f 368 | [1] 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 3 3 3 369 | [24] 3 3 3 3 3 3 3 370 | Levels: 1 2 3 371 | > tapply(x, f, mean) 372 | 1 2 3 373 | 0.1144464 0.5163468 1.2463678 374 | \end{verbatim} 375 | \end{frame} 376 | 377 | \begin{frame}[fragile]{tapply} 378 | Take group means without simplification. 379 | \begin{verbatim} 380 | > tapply(x, f, mean, simplify = FALSE) 381 | $`1` 382 | [1] 0.1144464 383 | 384 | $`2` 385 | [1] 0.5163468 386 | 387 | $`3` 388 | [1] 1.246368 389 | \end{verbatim} 390 | \end{frame} 391 | 392 | 393 | \begin{frame}[fragile]{tapply} 394 | Find group ranges. 395 | \begin{verbatim} 396 | > tapply(x, f, range) 397 | $`1` 398 | [1] -1.097309 2.694970 399 | 400 | $`2` 401 | [1] 0.09479023 0.79107293 402 | 403 | $`3` 404 | [1] 0.4717443 2.5887025 405 | \end{verbatim} 406 | \end{frame} 407 | 408 | 409 | \begin{frame}[fragile]{split} 410 | \code{split} takes a vector or other objects and splits it into groups 411 | determined by a factor or list of factors. 412 | \begin{verbatim} 413 | > str(split) 414 | function (x, f, drop = FALSE, ...) 
415 | \end{verbatim} 416 | \begin{itemize} 417 | \item 418 | \code{x} is a vector (or list) or data frame 419 | \item 420 | \code{f} is a factor (or coerced to one) or a list of factors 421 | \item 422 | \code{drop} indicates whether empty factor levels should be dropped 423 | \end{itemize} 424 | \end{frame} 425 | 426 | \begin{frame}[fragile]{split} 427 | \begin{verbatim} 428 | > x <- c(rnorm(10), runif(10), rnorm(10, 1)) 429 | > f <- gl(3, 10) 430 | > split(x, f) 431 | $`1` 432 | [1] -0.8493038 -0.5699717 -0.8385255 -0.8842019 433 | [5] 0.2849881 0.9383361 -1.0973089 2.6949703 434 | [9] 1.5976789 -0.1321970 435 | 436 | $`2` 437 | [1] 0.09479023 0.79107293 0.45857419 0.74849293 438 | [5] 0.34936491 0.35842084 0.78541705 0.57732081 439 | [9] 0.46817559 0.53183823 440 | 441 | $`3` 442 | [1] 0.6795651 0.9293171 1.0318103 0.4717443 443 | [5] 2.5887025 1.5975774 1.3246333 1.4372701 444 | [9] 1.3961579 1.0068999 445 | \end{verbatim} 446 | \end{frame} 447 | 448 | \begin{frame}[fragile]{split} 449 | A common idiom is \code{split} followed by an \code{lapply}. 
450 | \begin{verbatim} 451 | > lapply(split(x, f), mean) 452 | $`1` 453 | [1] 0.1144464 454 | 455 | $`2` 456 | [1] 0.5163468 457 | 458 | $`3` 459 | [1] 1.246368 460 | \end{verbatim} 461 | \end{frame} 462 | 463 | 464 | \begin{frame}[fragile]{Splitting a Data Frame} 465 | \begin{verbatim} 466 | > library(datasets) 467 | > head(airquality) 468 | Ozone Solar.R Wind Temp Month Day 469 | 1 41 190 7.4 67 5 1 470 | 2 36 118 8.0 72 5 2 471 | 3 12 149 12.6 74 5 3 472 | 4 18 313 11.5 62 5 4 473 | 5 NA NA 14.3 56 5 5 474 | 6 28 NA 14.9 66 5 6 475 | \end{verbatim} 476 | \end{frame} 477 | 478 | \begin{frame}[fragile]{Splitting a Data Frame} 479 | \begin{verbatim} 480 | > s <- split(airquality, airquality$Month) 481 | > lapply(s, function(x) colMeans(x[, c("Ozone", "Solar.R", "Wind")])) 482 | $`5` 483 | Ozone Solar.R Wind 484 | NA NA 11.62258 485 | 486 | $`6` 487 | Ozone Solar.R Wind 488 | NA 190.16667 10.26667 489 | 490 | $`7` 491 | Ozone Solar.R Wind 492 | NA 216.483871 8.941935 493 | 494 | $`8` 495 | Ozone Solar.R Wind 496 | NA NA 8.793548 497 | 498 | $`9` 499 | Ozone Solar.R Wind 500 | NA 167.4333 10.1800 501 | \end{verbatim} 502 | \end{frame} 503 | 504 | 505 | \begin{frame}[fragile]{Splitting a Data Frame} 506 | \begin{verbatim} 507 | > sapply(s, function(x) colMeans(x[, c("Ozone", "Solar.R", "Wind")])) 508 | 5 6 7 8 9 509 | Ozone NA NA NA NA NA 510 | Solar.R NA 190.16667 216.483871 NA 167.4333 511 | Wind 11.62258 10.26667 8.941935 8.793548 10.1800 512 | 513 | 514 | 515 | > sapply(s, function(x) colMeans(x[, c("Ozone", "Solar.R", "Wind")], 516 | na.rm = TRUE)) 517 | 5 6 7 8 9 518 | Ozone 23.61538 29.44444 59.115385 59.961538 31.44828 519 | Solar.R 181.29630 190.16667 216.483871 171.857143 167.43333 520 | Wind 11.62258 10.26667 8.941935 8.793548 10.18000 521 | \end{verbatim} 522 | \end{frame} 523 | 524 | 525 | 526 | \begin{frame}[fragile]{Splitting on More than One Level} 527 | \begin{verbatim} 528 | > x <- rnorm(10) 529 | > f1 <- gl(2, 5) 530 | > f2 <- gl(5, 2) 531 | > f1 
532 | [1] 1 1 1 1 1 2 2 2 2 2 533 | Levels: 1 2 534 | > f2 535 | [1] 1 1 2 2 3 3 4 4 5 5 536 | Levels: 1 2 3 4 5 537 | > interaction(f1, f2) 538 | [1] 1.1 1.1 1.2 1.2 1.3 2.3 2.4 2.4 2.5 2.5 539 | 10 Levels: 1.1 2.1 1.2 2.2 1.3 2.3 1.4 ... 2.5 540 | \end{verbatim} 541 | \end{frame} 542 | 543 | \begin{frame}[fragile]{Splitting on More than One Level} 544 | Interactions can create empty levels. 545 | \begin{verbatim} 546 | > str(split(x, list(f1, f2))) 547 | List of 10 548 | $ 1.1: num [1:2] -0.378 0.445 549 | $ 2.1: num(0) 550 | $ 1.2: num [1:2] 1.4066 0.0166 551 | $ 2.2: num(0) 552 | $ 1.3: num -0.355 553 | $ 2.3: num 0.315 554 | $ 1.4: num(0) 555 | $ 2.4: num [1:2] -0.907 0.723 556 | $ 1.5: num(0) 557 | $ 2.5: num [1:2] 0.732 0.360 558 | \end{verbatim} 559 | \end{frame} 560 | 561 | \begin{frame}[fragile]{split} 562 | Empty levels can be dropped. 563 | \begin{verbatim} 564 | > str(split(x, list(f1, f2), drop = TRUE)) 565 | List of 6 566 | $ 1.1: num [1:2] -0.378 0.445 567 | $ 1.2: num [1:2] 1.4066 0.0166 568 | $ 1.3: num -0.355 569 | $ 2.3: num 0.315 570 | $ 2.4: num [1:2] -0.907 0.723 571 | $ 2.5: num [1:2] 0.732 0.360 572 | \end{verbatim} 573 | \end{frame} 574 | 575 | 576 | 577 | 578 | \begin{frame}[fragile]{mapply} 579 | \code{mapply} is a multivariate apply of sorts which applies a 580 | function in parallel over a set of arguments. 581 | \begin{verbatim} 582 | > str(mapply) 583 | function (FUN, ..., MoreArgs = NULL, SIMPLIFY = TRUE, 584 | USE.NAMES = TRUE) 585 | \end{verbatim} 586 | \begin{itemize} 587 | \item 588 | \code{FUN} is a function to apply 589 | \item 590 | \code{...} contains arguments to apply over 591 | \item 592 | \code{MoreArgs} is a list of other arguments to \code{FUN}. 
593 | \item 594 | \code{SIMPLIFY} indicates whether the result should be simplified 595 | \end{itemize} 596 | \end{frame} 597 | 598 | 599 | \begin{frame}[fragile]{mapply} 600 | The following is tedious to type 601 | \begin{verbatim} 602 | list(rep(1, 4), rep(2, 3), rep(3, 2), rep(4, 1)) 603 | \end{verbatim} 604 | Instead we can do 605 | \begin{verbatim} 606 | > mapply(rep, 1:4, 4:1) 607 | [[1]] 608 | [1] 1 1 1 1 609 | 610 | [[2]] 611 | [1] 2 2 2 612 | 613 | [[3]] 614 | [1] 3 3 615 | 616 | [[4]] 617 | [1] 4 618 | \end{verbatim} 619 | \end{frame} 620 | 621 | 622 | \begin{frame}[fragile]{Vectorizing a Function} 623 | \begin{verbatim} 624 | > noise <- function(n, mean, sd) { 625 | + rnorm(n, mean, sd) 626 | + } 627 | > noise(5, 1, 2) 628 | [1] 2.4831198 2.4790100 0.4855190 -1.2117759 629 | [5] -0.2743532 630 | 631 | > noise(1:5, 1:5, 2) 632 | [1] -4.2128648 -0.3989266 4.2507057 1.1572738 633 | [5] 3.7413584 634 | \end{verbatim} 635 | \end{frame} 636 | 637 | \begin{frame}[fragile]{Instant Vectorization} 638 | \begin{verbatim} 639 | > mapply(noise, 1:5, 1:5, 2) 640 | [[1]] 641 | [1] 1.037658 642 | 643 | [[2]] 644 | [1] 0.7113482 2.7555797 645 | 646 | [[3]] 647 | [1] 2.769527 1.643568 4.597882 648 | 649 | [[4]] 650 | [1] 4.476741 5.658653 3.962813 1.204284 651 | 652 | [[5]] 653 | [1] 4.797123 6.314616 4.969892 6.530432 6.723254 654 | \end{verbatim} 655 | \end{frame} 656 | 657 | \begin{frame}[fragile]{Instant Vectorization} 658 | Which is the same as 659 | \begin{verbatim} 660 | list(noise(1, 1, 2), noise(2, 2, 2), 661 | noise(3, 3, 2), noise(4, 4, 2), 662 | noise(5, 5, 2)) 663 | \end{verbatim} 664 | \end{frame} 665 | 666 | 667 | \end{document} 668 | -------------------------------------------------------------------------------- /macros.tex: -------------------------------------------------------------------------------- 1 | \newcommand{\pkg}{\textbf} 2 | \newcommand{\code}{\texttt} 3 | -------------------------------------------------------------------------------- 
/mulike.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/rdpeng/CourseraLectures/f58ebd7bc02e275d0428b8c2bb0ff9a2668e5990/mulike.pdf -------------------------------------------------------------------------------- /overview_history.tex: -------------------------------------------------------------------------------- 1 | \documentclass[aspectratio=169]{beamer} 2 | 3 | \mode 4 | { 5 | \usetheme{Warsaw} 6 | % or ... 7 | 8 | \setbeamercovered{transparent} 9 | % or whatever (possibly just delete it) 10 | } 11 | 12 | 13 | \usepackage[english]{babel} 14 | \usepackage[latin1]{inputenc} 15 | \usepackage{graphicx} 16 | %\usepackage{times} 17 | %\usepackage[T1]{fontenc} 18 | % Or whatever. Note that the encoding and the font should match. If T1 19 | % does not look nice, try deleting the line with the fontenc. 20 | 21 | \usepackage{amsmath,amsfonts,amssymb} 22 | 23 | \input{macros} 24 | 25 | \title[Overview and History of R]{Overview and History of R} 26 | 27 | 28 | \date{Computing for Data Analysis} 29 | 30 | \setbeamertemplate{footline}[page number] 31 | 32 | \begin{document} 33 | 34 | \begin{frame} 35 | \titlepage 36 | \end{frame} 37 | 38 | \begin{frame}{What is R?} 39 | What is R? 40 | \end{frame} 41 | 42 | 43 | \begin{frame}{What is R?} 44 | R is a dialect of the S language. 45 | \end{frame} 46 | 47 | \begin{frame}{What is S?} 48 | \begin{itemize} 49 | \item 50 | S is a language that was developed by John Chambers and others at Bell 51 | Labs. 52 | \item 53 | S was initiated in 1976 as an internal statistical analysis 54 | environment---originally implemented as Fortran libraries. 55 | \item 56 | Early versions of the language did not contain functions for 57 | statistical modeling. 58 | \item 59 | In 1988 the system was rewritten in C and began to resemble the system 60 | that we have today (this was Version 3 of the language). 
The book 61 | \textit{Statistical Models in S} by Chambers and Hastie (the white 62 | book) documents the statistical analysis functionality. 63 | \item 64 | Version 4 of the S language was released in 1998 and is the version we 65 | use today. The book \textit{Programming with Data} by John Chambers 66 | (the green book) documents this version of the language. 67 | \end{itemize} 68 | \end{frame} 69 | 70 | \begin{frame}{Historical Notes} 71 | \begin{itemize} 72 | \item 73 | In 1993 Bell Labs gave StatSci (now Insightful Corp.) an exclusive 74 | license to develop and sell the S language. 75 | \item 76 | In 2004 Insightful purchased the S language from Lucent for \$2 77 | million and is the current owner. 78 | \item 79 | In 2006, Alcatel purchased Lucent Technologies and is now called 80 | Alcatel-Lucent. 81 | \item 82 | Insightful sells its implementation of the S language under the 83 | product name S-PLUS and has built a number of fancy features (GUIs, 84 | mostly) on top of it---hence the ``PLUS''. 85 | \item 86 | In 2008 Insightful was acquired by TIBCO for \$25 million. 87 | \item 88 | The fundamentals of the S language itself have not changed dramatically 89 | since 1998. 90 | \item 91 | In 1998, S won the Association for Computing Machinery's Software 92 | System Award. 93 | \end{itemize} 94 | \end{frame} 95 | 96 | 97 | \begin{frame}{S Philosophy} 98 | In ``Stages in the Evolution of S'', John Chambers writes: 99 | \begin{quote} 100 | ``[W]e wanted users to be able to begin in an interactive environment, 101 | where they did not consciously think of themselves as 102 | programming. 
Then as their needs became clearer and their 103 | sophistication increased, they should be able to slide gradually into 104 | programming, when the language and system aspects would become more 105 | important.'' 106 | \end{quote} 107 | http://www.stat.bell-labs.com/S/history.html 108 | \end{frame} 109 | 110 | \begin{frame}{Back to R} 111 | \begin{itemize} 112 | \item 113 | 1991: Created in New Zealand by Ross Ihaka and Robert Gentleman. 114 | Their experience developing R is documented in a 1996 \textit{JCGS} 115 | paper. 116 | \item 117 | 1993: First announcement of R to the public. 118 | \item 119 | 1995: Martin M\"achler convinces Ross and Robert to use the GNU 120 | General Public License to make R free software. 121 | \item 122 | 1996: A public mailing list is created (R-help and R-devel) 123 | \item 124 | 1997: The R Core Group is formed (containing some people associated 125 | with S-PLUS). The core group controls the source code for R. 126 | \item 127 | 2000: R version 1.0.0 is released. 128 | \item 129 | 2012: R version 2.15.1 is released on June 22, 2012. 130 | \end{itemize} 131 | \end{frame} 132 | 133 | 134 | \begin{frame}{Features of R} 135 | \begin{itemize} 136 | \item 137 | Syntax is very similar to S, making it easy for S-PLUS users to switch 138 | over. 139 | \item 140 | Semantics are superficially similar to S, but in reality are quite 141 | different (more on that later). 142 | \item 143 | Runs on almost any standard computing platform/OS (even on the 144 | PlayStation 3) 145 | \item 146 | Frequent releases (annual + bugfix releases); active development. 147 | \end{itemize} 148 | \end{frame} 149 | 150 | \begin{frame}{Features of R (cont'd)} 151 | \begin{itemize} 152 | \item 153 | Quite lean, as far as software goes; functionality is divided into 154 | modular packages 155 | \item 156 | Graphics capabilities very sophisticated and better than most stat 157 | packages. 
158 | \item 159 | Useful for interactive work, but contains a powerful programming 160 | language for developing new tools (user $\longrightarrow$ programmer) 161 | \item 162 | Very active and vibrant user community; R-help and R-devel mailing 163 | lists and Stack Overflow 164 | \end{itemize} 165 | \end{frame} 166 | 167 | \begin{frame}{Features of R (cont'd)} 168 | It's free!\\ 169 | (Both in the sense of beer and in the sense of speech.) 170 | \end{frame} 171 | 172 | \begin{frame}{Free Software} 173 | With \textit{free software}, you are granted 174 | \begin{itemize} 175 | \item 176 | The freedom to run the program, for any purpose (freedom 0). 177 | \item 178 | The freedom to study how the program works, and adapt it to your needs 179 | (freedom 1). Access to the source code is a precondition for this. 180 | \item 181 | The freedom to redistribute copies so you can help your neighbor 182 | (freedom 2). 183 | \item 184 | The freedom to improve the program, and release your improvements to 185 | the public, so that the whole community benefits (freedom 3). Access 186 | to the source code is a precondition for this. 187 | \end{itemize} 188 | http://www.fsf.org 189 | \end{frame} 190 | 191 | \begin{frame}{Drawbacks of R} 192 | \begin{itemize} 193 | \item 194 | Essentially based on 40-year-old technology. 195 | \item 196 | Little built-in support for dynamic or 3-D graphics (but things have 197 | improved greatly since the ``old days''). 198 | \item 199 | Functionality is based on consumer demand and user contributions. If 200 | no one feels like implementing your favorite method, then it's 201 | \textit{your} job! 202 | \begin{itemize} 203 | \item (Or you need to pay someone to do it) 204 | \end{itemize} 205 | \item Objects must generally be stored in physical memory; but there 206 | have been advancements to deal with this too 207 | \item 208 | Not ideal for all possible situations (but this is a drawback of all 209 | software packages). 
210 | \end{itemize} 211 | \end{frame} 212 | 213 | 214 | 215 | \begin{frame}{Design of the R System} 216 | The R system is divided into 2 conceptual parts: 217 | \begin{enumerate} 218 | \item 219 | The ``base'' R system that you download from CRAN 220 | \item 221 | Everything else. 222 | \end{enumerate} 223 | R functionality is divided into a number of \textit{packages}. 224 | \begin{itemize} 225 | \item 226 | The ``base'' R system contains, among other things, the \pkg{base} 227 | package which is required to run R and contains the most fundamental 228 | functions. 229 | \item 230 | The other packages contained in the ``base'' system include 231 | \pkg{utils}, \pkg{stats}, \pkg{datasets}, \pkg{graphics}, 232 | \pkg{grDevices}, \pkg{grid}, \pkg{methods}, \pkg{tools}, 233 | \pkg{parallel}, \pkg{compiler}, \pkg{splines}, \pkg{tcltk}, 234 | \pkg{stats4}. 235 | \item 236 | There are also ``Recommended'' packages: \pkg{boot}, \pkg{class}, 237 | \pkg{cluster}, \pkg{codetools}, \pkg{foreign}, \pkg{KernSmooth}, 238 | \pkg{lattice}, \pkg{mgcv}, \pkg{nlme}, \pkg{rpart}, \pkg{survival}, 239 | \pkg{MASS}, \pkg{spatial}, \pkg{nnet}, \pkg{Matrix}. 240 | \end{itemize} 241 | \end{frame} 242 | 243 | 244 | 245 | 246 | \begin{frame}{Design of the R System} 247 | And there are many other packages available: 248 | \begin{itemize} 249 | \item 250 | There are about $4000$ packages on CRAN that have been developed by 251 | users and programmers around the world. 252 | \item 253 | There are also many packages associated with the Bioconductor project 254 | (http://bioconductor.org). 255 | \item 256 | People often make packages available on their personal websites; there 257 | is no reliable way to keep track of how many packages are available in 258 | this fashion. 
259 | \end{itemize} 260 | \end{frame} 261 | 262 | \begin{frame}{Some R Resources} 263 | Available from CRAN (http://cran.r-project.org) 264 | \begin{itemize} 265 | \item 266 | An Introduction to R 267 | \item 268 | Writing R Extensions 269 | \item 270 | R Data Import/Export 271 | \item 272 | R Installation and Administration (mostly for building R from sources) 273 | \item 274 | R Internals (not for the faint of heart) 275 | \end{itemize} 276 | \end{frame} 277 | 278 | \begin{frame}{Some Useful Books on S/R} 279 | Standard texts 280 | \begin{itemize} 281 | \item 282 | Chambers (2008). \textit{Software for Data Analysis}, Springer. (your 283 | textbook) 284 | \item 285 | Chambers (1998). \textit{Programming with Data}, Springer. 286 | \item 287 | Venables \& Ripley (2002). \textit{Modern Applied Statistics with S}, 288 | Springer. 289 | \item 290 | Venables \& Ripley (2000). \textit{S Programming}, Springer. 291 | \item 292 | Pinheiro \& Bates (2000). \textit{Mixed-Effects Models in S and 293 | S-PLUS}, Springer. 294 | \item 295 | Murrell (2005). \textit{R Graphics}, Chapman \& Hall/CRC Press. 296 | \end{itemize} 297 | Other resources 298 | \begin{itemize} 299 | \item 300 | Springer has a series of books called \textit{Use R!}. 301 | \item 302 | A longer list of books is at 303 | http://www.r-project.org/doc/bib/R-books.html 304 | \end{itemize} 305 | \end{frame} 306 | 307 | 308 | 309 | 310 | 311 | 312 | 313 | 314 | 315 | 316 | 317 | 318 | 319 | \end{document} 320 | 321 | 322 | -------------------------------------------------------------------------------- /plotting.tex: -------------------------------------------------------------------------------- 1 | \documentclass[aspectratio=169]{beamer} 2 | 3 | \mode 4 | { 5 | \usetheme{Warsaw} 6 | % or ... 
7 | 8 | \setbeamercovered{transparent} 9 | % or whatever (possibly just delete it) 10 | } 11 | 12 | 13 | \usepackage[english]{babel} 14 | \usepackage[latin1]{inputenc} 15 | \usepackage{graphicx} 16 | %\usepackage{times} 17 | %\usepackage[T1]{fontenc} 18 | % Or whatever. Note that the encoding and the font should match. If T1 19 | % does not look nice, try deleting the line with the fontenc. 20 | 21 | \usepackage{amsmath,amsfonts,amssymb} 22 | 23 | \input{macros} 24 | 25 | \title[The R Language]{Introduction to the R Language} 26 | 27 | \subtitle{Plotting} 28 | 29 | \date{Computing for Data Analysis} 30 | 31 | 32 | 33 | \begin{document} 34 | 35 | \begin{frame} 36 | \titlepage 37 | \end{frame} 38 | 39 | 40 | \begin{frame}{Plotting} 41 | The plotting and graphics engine in R is encapsulated in a few base 42 | and recommended packages: 43 | \begin{itemize} 44 | \item 45 | \pkg{graphics}: contains plotting functions for the ``base'' graphing 46 | system, including \code{plot}, \code{hist}, \code{boxplot} and many 47 | others. 48 | \item 49 | \pkg{lattice}: contains code for producing Trellis graphics, which are 50 | independent of the ``base'' graphics system; includes functions like 51 | \code{xyplot}, \code{bwplot}, \code{levelplot} 52 | \item 53 | \pkg{grid}: implements a different graphing system independent of the 54 | ``base'' system; the \pkg{lattice} package builds on top of 55 | \pkg{grid}; we seldom call functions from the \pkg{grid} package 56 | directly 57 | \item 58 | \pkg{grDevices}: contains all the code implementing the various 59 | graphics devices, including X11, PDF, PostScript, PNG, etc. 60 | \end{itemize} 61 | \end{frame} 62 | 63 | \begin{frame}{The Process of Making a Plot} 64 | When making a plot one must first make a few choices (not 65 | necessarily in this order): 66 | \begin{itemize} 67 | \item To what device will the plot be sent? 
The default in Unix is 68 | \code{x11}; on Windows it is \code{windows}; on Mac OS X it is 69 | \code{quartz} 70 | \item Is the plot for viewing temporarily on the screen, or will it 71 | eventually end up in a paper? Are you using it in a presentation? 72 | Plots included in a paper/presentation need to use a file device 73 | rather than a screen device. 74 | \item 75 | Is there a large amount of data going into the plot? Or is it just a 76 | few points? 77 | \item 78 | Do you need to be able to resize the graphic? 79 | \end{itemize} 80 | \end{frame} 81 | 82 | 83 | \begin{frame}{The Process of Making a Plot} 84 | \begin{itemize} 85 | \item 86 | What graphics system will you use: base or grid/lattice? These 87 | generally cannot be mixed. 88 | \item 89 | Base graphics are usually constructed piecemeal, with each aspect of 90 | the plot handled separately through a series of function calls; this 91 | is often conceptually simpler and allows plotting to mirror the 92 | thought process 93 | \item 94 | Lattice/grid graphics are usually created in a single function call, 95 | so all of the graphics parameters have to be specified at once; 96 | specifying everything at once allows R to automatically calculate the 97 | necessary spacings and font sizes. 98 | \end{itemize} 99 | \end{frame} 100 | 101 | 102 | \begin{frame}{Base Graphics} 103 | Base graphics are used most commonly and are a very powerful system 104 | for creating 2-D graphics. 105 | \begin{itemize} 106 | \item 107 | Calling \code{plot(x, y)} or \code{hist(x)} will launch a graphics 108 | device (if one is not already open) and draw the plot on the device 109 | \item 110 | If the arguments to \code{plot} are not of some special class, then 111 | the \textit{default method} for \code{plot} is called; this function 112 | has \textit{many} arguments, letting you set the title, x axis label, 113 | y axis label, etc. 
114 | \item 115 | The base graphics system has \textit{many} parameters that can be set and 116 | tweaked; these parameters are documented in \code{?par}; it wouldn't 117 | hurt to memorize this help page! 118 | \end{itemize} 119 | \end{frame} 120 | 121 | \begin{frame}{Some Important Base Graphics Parameters} 122 | The \code{par} function is used to specify global graphics parameters 123 | that affect all plots in an R session. These parameters can often be 124 | overridden as arguments to specific plotting functions. 125 | \begin{itemize} 126 | \item 127 | pch: the plotting symbol (default is open circle) 128 | \item 129 | lty: the line type (default is solid line), can be dashed, dotted, 130 | etc. 131 | \item 132 | lwd: the line width, specified as an integer multiple 133 | \item 134 | col: the plotting color, specified as a number, string, or hex code; 135 | the \code{colors} function gives you a vector of colors by name 136 | \item 137 | las: the orientation of the axis labels on the plot 138 | \end{itemize} 139 | \end{frame} 140 | 141 | \begin{frame}{Some Important Base Graphics Parameters} 142 | \begin{itemize} 143 | \item 144 | bg: the background color 145 | \item 146 | mar: the margin size 147 | \item 148 | oma: the outer margin size (default is 0 for all sides) 149 | \item 150 | mfrow: number of plots per row, column (plots are filled row-wise) 151 | \item 152 | mfcol: number of plots per row, column (plots are filled column-wise) 153 | \end{itemize} 154 | \end{frame} 155 | 156 | \begin{frame}[fragile]{Some Important Base Graphics Parameters} 157 | Some default values. 158 | \begin{verbatim} 159 | > par("lty") 160 | [1] "solid" 161 | > par("lwd") 162 | [1] 1 163 | > par("col") 164 | [1] "black" 165 | > par("pch") 166 | [1] 1 167 | \end{verbatim} 168 | \end{frame} 169 | 170 | \begin{frame}[fragile]{Some Important Base Graphics Parameters} 171 | Some default values. 
172 | \begin{verbatim} 173 | > par("bg") 174 | [1] "transparent" 175 | > par("mar") 176 | [1] 5.1 4.1 4.1 2.1 177 | > par("oma") 178 | [1] 0 0 0 0 179 | > par("mfrow") 180 | [1] 1 1 181 | > par("mfcol") 182 | [1] 1 1 183 | \end{verbatim} 184 | \end{frame} 185 | 186 | \begin{frame}{Some Important Base Plotting Functions} 187 | \begin{itemize} 188 | \item 189 | \code{plot}: make a scatterplot, or other type of plot depending on 190 | the class of the object being plotted 191 | \item 192 | \code{lines}: add lines to a plot, given a vector of x values and a 193 | corresponding vector of y values (or a 2-column matrix); this function 194 | just connects the dots 195 | \item 196 | \code{points}: add points to a plot 197 | \item 198 | \code{text}: add text labels to a plot using specified x, y 199 | coordinates 200 | \item 201 | \code{title}: add annotations to x, y axis labels, title, subtitle, 202 | outer margin 203 | \item 204 | \code{mtext}: add arbitrary text to the margins (inner or outer) of 205 | the plot 206 | \item 207 | \code{axis}: add axis ticks/labels 208 | \end{itemize} 209 | \end{frame} 210 | 211 | 212 | \begin{frame}{Useful Graphics Devices} 213 | The list of devices is found in \code{?Devices}; there are also 214 | devices created by users on CRAN 215 | \begin{itemize} 216 | \item 217 | \code{pdf}: useful for line-type graphics, vector 218 | format, resizes well, usually portable 219 | \item 220 | \code{postscript}: older format, also vector format and resizes well, 221 | usually portable, can be used to create encapsulated postscript files, 222 | Windows systems often don't have a postscript viewer 223 | \item 224 | \code{xfig}: good if you use Unix and want to edit a plot by hand 225 | \end{itemize} 226 | \end{frame} 227 | 228 | \begin{frame}{Useful Graphics Devices} 229 | \begin{itemize} 230 | \item 231 | \code{png}: bitmapped format, good for line drawings or images with 232 | solid colors, uses lossless compression (like the old GIF format), 233 | 
most web browsers can read this format natively, good for plotting 234 | many many many points, does not resize well 235 | \item 236 | \code{jpeg}: good for photographs or natural scenes, uses lossy 237 | compression, good for plotting many many many points, does not resize 238 | well, can be read by almost any computer and any web browser, not 239 | great for line drawings 240 | \item 241 | \code{bitmap}: needed to create bitmap files (png, jpeg) in certain 242 | situations (uses Ghostscript), also can be used to create a variety of 243 | other bitmapped formats not mentioned 244 | \item 245 | \code{bmp}: a native Windows bitmapped format 246 | \end{itemize} 247 | \end{frame} 248 | 249 | 250 | \begin{frame}{Copying Plots} 251 | There are two basic approaches to plotting. 252 | \begin{enumerate} 253 | \item 254 | Launch a graphics device 255 | \item 256 | Make a plot; annotate if needed 257 | \item 258 | Close graphics device 259 | \end{enumerate} 260 | Or 261 | \begin{enumerate} 262 | \item 263 | Make a plot on a screen device (default); annotate if needed 264 | \item 265 | Copy the plot to another device if necessary (not an exact process) 266 | \end{enumerate} 267 | \end{frame} 268 | 269 | 270 | \begin{frame}{Copying Plots} 271 | Copying a plot to another device can be useful because some plots 272 | require a lot of code and it can be a pain to type all that in again 273 | for a different device. 274 | \begin{itemize} 275 | \item \code{dev.copy}: copy a plot from one device to another 276 | \item \code{dev.copy2pdf}: copy a plot to a Portable Document Format 277 | (PDF) file 278 | \item \code{dev.list}: show the list of open graphics devices 279 | \item \code{dev.next}: switch control to the next graphics device on 280 | the device list 281 | \item \code{dev.set}: set control to a specific graphics device 282 | \item \code{dev.off}: close the current graphics device 283 | \end{itemize} 284 | NOTE: Copying a plot is not an exact operation! 
285 | \end{frame} 286 | 287 | 288 | \begin{frame}{Lattice Functions} 289 | \begin{itemize} 290 | \item 291 | \code{xyplot}: this is the main function for creating scatterplots 292 | \item 293 | \code{bwplot}: box-and-whiskers plots (``boxplots'') 294 | \item 295 | \code{histogram}: histograms 296 | \item 297 | \code{stripplot}: like a boxplot but with actual points 298 | \item 299 | \code{dotplot}: plot dots on ``violin strings'' 300 | \item 301 | \code{splom}: scatterplot matrix; like \code{pairs} in the base graphics 302 | system 303 | \item 304 | \code{levelplot}, \code{contourplot}: for plotting ``image'' data 305 | \end{itemize} 306 | \end{frame} 307 | 308 | 309 | \begin{frame}[fragile]{Lattice Functions} 310 | Lattice functions generally take a formula for their first argument, 311 | usually of the form 312 | \begin{verbatim} 313 | y ~ x | f * g 314 | \end{verbatim} 315 | \begin{itemize} 316 | \item 317 | On the left of the \verb+~+ is the y variable, on the right is the x 318 | variable 319 | \item 320 | After the \verb+|+ are \textit{conditioning variables} --- they are 321 | optional; the \verb+*+ indicates an interaction 322 | \item 323 | The second argument is the data frame or list from which the variables 324 | in the formula should be obtained. 325 | \item 326 | If no data frame or list is passed, then the parent frame is used. 327 | \item 328 | If no other arguments are passed, there are defaults that can be used. 329 | \end{itemize} 330 | \end{frame} 331 | 332 | 333 | \begin{frame}{Lattice Behavior} 334 | Lattice functions behave differently from base graphics functions in 335 | one critical way. 336 | \begin{itemize} 337 | \item 338 | Base graphics functions plot data directly to the graphics device 339 | \item 340 | Lattice graphics functions return an object of class \code{trellis}. 341 | \item 342 | The print methods for lattice functions actually do the work of 343 | plotting the data on the graphics device. 
344 | \item 345 | Lattice functions return ``plot objects'' that can, in principle, be 346 | stored (but it's usually better to just save the code + data). 347 | \item 348 | On the command line, \code{trellis} objects are \textit{auto-printed} 349 | so that it appears the function is plotting the data 350 | \end{itemize} 351 | \end{frame} 352 | 353 | 354 | %% \begin{frame}[fragile]{Calling Lattice Functions} 355 | %% \begin{verbatim} 356 | %% p <- xyplot(y ~ x | f, subscripts = TRUE, 357 | %% ylim = lattice:::extend.limits(range(lo, hi)), 358 | %% panel = function(x, y, subscripts, ...) { 359 | %% panel.xyplot(x, y, ...) 360 | %% lsegments(x, lo[subscripts], 361 | %% x, hi[subscripts]) 362 | %% panel.abline(h = 0, lty = 2) 363 | %% }, 364 | %% xlab = NULL, 365 | %% scales = list(x = list(at = 1:nmodels, labels = models, 366 | %% rot = 90, alternating = FALSE), 367 | %% y = list(alternating = 3)), 368 | %% ylab = list(label = expression("% increase in 369 | %% admissions for a 10 " * mu * g/m^3 * " inceas%% e in " * PM[2.5]), cex = 0.8) 370 | %% ) 371 | %% print(p) 372 | %% \end{verbatim} 373 | %% \end{frame} 374 | 375 | 376 | \begin{frame}[fragile]{Lattice Panel Functions} 377 | Lattice functions have a \code{panel} function which controls what 378 | happens inside each panel of the entire plot. 379 | \begin{verbatim} 380 | x <- rnorm(100) 381 | y <- x + rnorm(100, sd = 0.5) 382 | f <- gl(2, 50, labels = c("Group 1", "Group 2")) 383 | xyplot(y ~ x | f) 384 | \end{verbatim} 385 | plots y vs. x conditioned on f. 386 | \end{frame} 387 | 388 | \begin{frame}[fragile]{Lattice Panel Functions} 389 | \begin{verbatim} 390 | xyplot(y ~ x | f, 391 | panel = function(x, y, ...) { 392 | panel.xyplot(x, y, ...) 393 | panel.abline(h = median(y), 394 | lty = 2) 395 | }) 396 | \end{verbatim} 397 | plots y vs. x conditioned on f with horizontal (dashed) line drawn at 398 | the median of y for each panel. 
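The panel-function call above can also be stored instead of being auto-printed. The following is a minimal sketch, assuming the \code{lattice} package is installed; the object name \code{p} and the seed are illustrative choices, not part of the original example.
\begin{verbatim}
library(lattice)
set.seed(1)  ## arbitrary seed (assumption)
x <- rnorm(100)
y <- x + rnorm(100, sd = 0.5)
f <- gl(2, 50, labels = c("Group 1", "Group 2"))
p <- xyplot(y ~ x | f,
            panel = function(x, y, ...) {
                panel.xyplot(x, y, ...)
                panel.abline(h = median(y), lty = 2)
            })
class(p)  ## a "trellis" object; nothing is drawn yet
print(p)  ## the print method does the plotting
\end{verbatim}
An explicit \code{print} is what you would need inside a function or a loop, where auto-printing does not happen.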
399 | \end{frame} 400 | 401 | 402 | \begin{frame}[fragile]{Lattice Panel Functions} 403 | Adding a regression line 404 | \begin{verbatim} 405 | xyplot(y ~ x | f, 406 | panel = function(x, y, ...) { 407 | panel.xyplot(x, y, ...) 408 | panel.lmline(x, y, col = 2) 409 | }) 410 | \end{verbatim} 411 | fits and plots a simple linear regression line to each panel of the 412 | plot. 413 | \end{frame} 414 | 415 | \begin{frame}[fragile]{Using Subscripts} 416 | Sometimes you need to access objects outside the panel environment. 417 | \begin{verbatim} 418 | y <- c(rnorm(10), rnorm(10, 2)) 419 | x <- rep(1:10, 2) 420 | std <- rep(1, 20) 421 | rng <- range(y - 1.96 * std, y + 1.96 * std) 422 | f <- gl(2, 10) 423 | 424 | xyplot(y ~ x | f, subscripts = TRUE, ylim = rng, 425 | panel = function(x, y, subscripts, ...) { 426 | panel.xyplot(x, y, ...) 427 | lsegments(x, y - 1.96 * std[subscripts], 428 | x, y + 1.96 * std[subscripts]) 429 | panel.abline(h = 0, lty = 2) 430 | }) 431 | \end{verbatim} 432 | \end{frame} 433 | 434 | 435 | \begin{frame}{Mathematical Annotation} 436 | R can produce \LaTeX-like symbols on a plot for mathematical 437 | annotation. This is very handy and is useful for making fun of people 438 | who use other statistical packages. 439 | \begin{itemize} 440 | \item 441 | Math symbols are ``expressions'' in R and need to be wrapped in the 442 | \code{expression} function 443 | \item 444 | There is a fixed list of allowed symbols, documented in \code{?plotmath} 445 | \item 446 | Plotting functions that take arguments for text generally allow 447 | expressions for math symbols 448 | \end{itemize} 449 | \end{frame} 450 | 451 | 452 | \begin{frame}[fragile]{Mathematical Annotation} 453 | Some examples. 454 | \begin{verbatim} 455 | plot(0, 0, main = expression(theta == 0), 456 | ylab = expression(hat(gamma) == 0), 457 | xlab = expression(sum(x[i] * y[i], i==1, n))) 458 | \end{verbatim} 459 | Pasting strings together. 
460 | \begin{verbatim} 461 | x <- rnorm(100) 462 | hist(x, 463 | xlab=expression("The mean (" * bar(x) * ") is " * 464 | sum(x[i]/n,i==1,n))) 465 | \end{verbatim} 466 | \end{frame} 467 | 468 | \begin{frame}[fragile]{Substituting} 469 | What if you want to use a computed value in the annotation? 470 | \begin{verbatim} 471 | x <- rnorm(100) 472 | y <- x + rnorm(100, sd = 0.5) 473 | plot(x, y, 474 | xlab=substitute(bar(x) == k, list(k=mean(x))), 475 | ylab=substitute(bar(y) == k, list(k=mean(y))) 476 | ) 477 | \end{verbatim} 478 | Or in a loop of plots 479 | \begin{verbatim} 480 | par(mfrow = c(2, 2)) 481 | for(i in 1:4) { 482 | x <- rnorm(100) 483 | hist(x, main=substitute(theta==num,list(num=i))) 484 | } 485 | \end{verbatim} 486 | \end{frame} 487 | 488 | 489 | \begin{frame}{Summary of Important Help Pages} 490 | \begin{itemize} 491 | \item 492 | ?par 493 | \item 494 | ?plot 495 | \item 496 | ?xyplot 497 | \item 498 | ?plotmath 499 | \item 500 | ?axis 501 | \end{itemize} 502 | \end{frame} 503 | 504 | 505 | \end{document} 506 | -------------------------------------------------------------------------------- /reading-data.tex: -------------------------------------------------------------------------------- 1 | \documentclass[aspectratio=169]{beamer} 2 | 3 | \mode 4 | { 5 | \usetheme{Warsaw} 6 | % or ... 7 | 8 | \setbeamercovered{transparent} 9 | % or whatever (possibly just delete it) 10 | } 11 | 12 | 13 | \usepackage[english]{babel} 14 | \usepackage[latin1]{inputenc} 15 | \usepackage{graphicx} 16 | %\usepackage{times} 17 | %\usepackage[T1]{fontenc} 18 | % Or whatever. Note that the encoding and the font should match. If T1 19 | % does not look nice, try deleting the line with the fontenc. 
20 | 21 | \usepackage{amsmath,amsfonts,amssymb} 22 | 23 | \input{macros} 24 | 25 | \title[The R Language]{Introduction to the R Language} 26 | 27 | \subtitle{Reading and Writing Data} 28 | 29 | \date{Computing for Data Analysis} 30 | 31 | \setbeamertemplate{footline}[page number] 32 | 33 | 34 | \begin{document} 35 | 36 | \begin{frame} 37 | \titlepage 38 | \end{frame} 39 | 40 | 41 | 42 | \begin{frame}{Reading Data} 43 | There are a few principal functions for reading data into R. 44 | \begin{itemize} 45 | \item 46 | \code{read.table}, \code{read.csv}, for reading tabular data 47 | \item 48 | \code{readLines}, for reading lines of a text file 49 | \item 50 | \code{source}, for reading in R code files (inverse of \code{dump}) 51 | \item 52 | \code{dget}, for reading in R code files (inverse of \code{dput}) 53 | \item 54 | \code{load}, for reading in saved workspaces 55 | \item 56 | \code{unserialize}, for reading single R objects in binary form 57 | \end{itemize} 58 | \end{frame} 59 | 60 | 61 | \begin{frame}{Writing Data} 62 | There are analogous functions for writing data to files 63 | \begin{itemize} 64 | \item 65 | \code{write.table} 66 | \item 67 | \code{writeLines} 68 | \item 69 | \code{dump} 70 | \item 71 | \code{dput} 72 | \item 73 | \code{save} 74 | \item 75 | \code{serialize} 76 | \end{itemize} 77 | \end{frame} 78 | 79 | 80 | 81 | \begin{frame}{Reading Data Files with read.table} 82 | The \code{read.table} function is one of the most commonly used 83 | functions for reading data. 
It has a few important arguments: 84 | \begin{itemize} 85 | \item 86 | \code{file}, the name of a file, or a connection 87 | \item 88 | \code{header}, logical indicating if the file has a header line 89 | \item 90 | \code{sep}, a string indicating how the columns are separated 91 | \item 92 | \code{colClasses}, a character vector indicating the class of each 93 | column in the dataset 94 | \item 95 | \code{nrows}, the number of rows in the dataset 96 | \item 97 | \code{comment.char}, a character string indicating the comment 98 | character 99 | \item 100 | \code{skip}, the number of lines to skip from the beginning 101 | \item 102 | \code{stringsAsFactors}, should character variables be coded as 103 | factors? 104 | \end{itemize} 105 | \end{frame} 106 | 107 | 108 | \begin{frame}[fragile]{read.table} 109 | For small to moderately sized datasets, you can usually call 110 | \code{read.table} without specifying any other arguments 111 | \begin{verbatim} 112 | data <- read.table("foo.txt") 113 | \end{verbatim} 114 | R will automatically 115 | \begin{itemize} 116 | \item 117 | skip lines that begin with a \# 118 | \item 119 | figure out how many rows there are (and how much memory needs to be 120 | allocated) 121 | \item 122 | figure out what type of variable is in each column of the table 123 | \end{itemize} 124 | Telling R all these things directly makes R run faster and more 125 | efficiently. 126 | \begin{itemize} 127 | \item 128 | \code{read.csv} behaves like \code{read.table} but with different 129 | defaults, notably a comma separator and \code{header = TRUE}. 130 | \end{itemize} 131 | \end{frame} 132 | 133 | 134 | \begin{frame}{Reading in Larger Datasets with read.table} 135 | With much larger datasets, doing the following things will make your 136 | life easier and will prevent R from choking. 137 | \begin{itemize} 138 | \item 139 | Read the help page for \code{read.table}, which contains many hints 140 | \item 141 | Make a rough calculation of the memory required to store your dataset. 
142 | If the dataset is larger than the amount of RAM on your computer, you 143 | can probably stop right here. 144 | \item 145 | Set \code{comment.char = ""} if there are no commented lines in your 146 | file. 147 | \end{itemize} 148 | \end{frame} 149 | 150 | 151 | \begin{frame}[fragile]{Reading in Larger Datasets with read.table} 152 | \begin{itemize} 153 | \item 154 | Use the \code{colClasses} argument. Specifying this option instead of 155 | using the default can make \code{read.table} run MUCH faster, often twice 156 | as fast. In order to use this option, you have to know the class of 157 | each column in your data frame. If all of the columns are ``numeric'', 158 | for example, then you can just set \code{colClasses = "numeric"}. A 159 | quick and dirty way to figure out the classes of each column is the 160 | following: 161 | \begin{verbatim} 162 | initial <- read.table("datatable.txt", nrows = 100) 163 | classes <- sapply(initial, class) 164 | tabAll <- read.table("datatable.txt", 165 | colClasses = classes) 166 | \end{verbatim} 167 | \item 168 | Set \code{nrows}. This doesn't make R run faster but it helps with 169 | memory usage. A mild overestimate is okay. You can use the Unix tool 170 | \code{wc} to calculate the number of lines in a file. 171 | \end{itemize} 172 | \end{frame} 173 | 174 | 175 | \begin{frame}{Know Thy System} 176 | In general, when using R with larger datasets, it's useful to know a 177 | few things about your system. 178 | \begin{itemize} 179 | \item 180 | How much memory is available? 181 | \item 182 | What other applications are in use? 183 | \item 184 | Are there other users logged into the same system? 185 | \item 186 | What operating system? 187 | \item 188 | Is the OS 32 or 64 bit? 189 | \end{itemize} 190 | \end{frame} 191 | 192 | \begin{frame}{Calculating Memory Requirements} 193 | I have a data frame with 1,500,000 rows and 120 columns, all of which 194 | are numeric data. 
Roughly, how much memory is required to store this 195 | data frame? 196 | \begin{eqnarray*} 197 | 1,500,000\times 120 \times\mbox{$8$ bytes/numeric} 198 | & = & 199 | 1440000000\mbox{ bytes}\\ 200 | & = & 201 | 1440000000 / 2^{20}\mbox{ bytes/MB}\\ 202 | & = & 203 | 1,373.29\mbox{ MB}\\ 204 | & = & 205 | 1.34\mbox{ GB} 206 | \end{eqnarray*} 207 | \end{frame} 208 | 209 | 210 | \begin{frame}{Textual Formats} 211 | \begin{itemize} 212 | \item 213 | \code{dump}ing and \code{dput}ing are useful because the resulting 214 | textual format is edit-able, and in the case of corruption, 215 | potentially recoverable. 216 | \item 217 | Unlike writing out a table or csv file, \code{dump} and \code{dput} 218 | preserve the \textit{metadata} (sacrificing some readability), so that 219 | another user doesn't have to specify it all over again. 220 | \item 221 | Textual formats can work much better with version control programs 222 | like subversion or git which can only track changes meaningfully in 223 | text files 224 | \item Textual formats can be longer-lived; if there is corruption 225 | somewhere in the file, it can be easier to fix the problem 226 | \item 227 | Textual formats adhere to the ``Unix philosophy'' 228 | \item Downside: The format is not very space-efficient 229 | \end{itemize} 230 | \end{frame} 231 | 232 | \begin{frame}[fragile]{dput-ting R Objects} 233 | Another way to pass data around is by deparsing the R object with 234 | \code{dput} and reading it back in using \code{dget}. 
235 | \begin{verbatim} 236 | > y <- data.frame(a = 1, b = "a") 237 | > dput(y) 238 | structure(list(a = 1, 239 | b = structure(1L, .Label = "a", 240 | class = "factor")), 241 | .Names = c("a", "b"), row.names = c(NA, -1L), 242 | class = "data.frame") 243 | > dput(y, file = "y.R") 244 | > new.y <- dget("y.R") 245 | > new.y 246 | a b 247 | 1 1 a 248 | \end{verbatim} 249 | \end{frame} 250 | 251 | 252 | \begin{frame}[fragile]{Dumping R Objects} 253 | Multiple objects can be deparsed using the \code{dump} function and 254 | read back in using \code{source}. 255 | \begin{verbatim} 256 | > x <- "foo" 257 | > y <- data.frame(a = 1, b = "a") 258 | > dump(c("x", "y"), file = "data.R") 259 | > rm(x, y) 260 | > source("data.R") 261 | > y 262 | a b 263 | 1 1 a 264 | > x 265 | [1] "foo" 266 | \end{verbatim} 267 | \end{frame} 268 | 269 | 270 | 271 | \begin{frame}{Interfaces to the Outside World} 272 | Data are read in using \textit{connection} interfaces. Connections 273 | can be made to files (most common) or to other more exotic things. 
274 | \begin{itemize} 275 | \item 276 | \code{file}, opens a connection to a file 277 | \item 278 | \code{gzfile}, opens a connection to a file compressed with gzip 279 | \item 280 | \code{bzfile}, opens a connection to a file compressed with bzip2 281 | \item 282 | \code{url}, opens a connection to a webpage 283 | \end{itemize} 284 | \end{frame} 285 | 286 | 287 | \begin{frame}[fragile]{File Connections} 288 | \begin{verbatim} 289 | > str(file) 290 | function (description = "", open = "", blocking = TRUE, 291 | encoding = getOption("encoding")) 292 | \end{verbatim} 293 | \begin{itemize} 294 | \item 295 | \code{description} is the name of the file 296 | \item 297 | \code{open} is a code indicating 298 | \begin{itemize} 299 | \item 300 | ``r'' read only 301 | \item 302 | ``w'' writing (and initializing a new file) 303 | \item 304 | ``a'' appending 305 | \item 306 | ``rb'', ``wb'', ``ab'' reading, writing, or appending in binary mode 307 | (Windows) 308 | \end{itemize} 309 | \end{itemize} 310 | \end{frame} 311 | 312 | 313 | \begin{frame}[fragile]{Connections} 314 | In general, connections are powerful tools that let you navigate files 315 | or other external objects. In practice, we often don't need to deal 316 | with the connection interface directly. 317 | \begin{verbatim} 318 | con <- file("foo.txt", "r") 319 | data <- read.csv(con) 320 | close(con) 321 | \end{verbatim} 322 | is the same as 323 | \begin{verbatim} 324 | data <- read.csv("foo.txt") 325 | \end{verbatim} 326 | \end{frame} 327 | 328 | 329 | \begin{frame}[fragile]{Reading Lines of a Text File} 330 | The \code{readLines} function can be used to simply read lines of a 331 | text file and store them in a character vector. 
332 | \begin{verbatim} 333 | > con <- gzfile("words.gz") 334 | > x <- readLines(con, 10) 335 | > x 336 | [1] "1080" "10-point" "10th" "11-point" 337 | [5] "12-point" "16-point" "18-point" "1st" 338 | [9] "2" "20-point" 339 | \end{verbatim} 340 | \code{writeLines} takes a character vector and writes each element one 341 | line at a time to a text file. 342 | \end{frame} 343 | 344 | 345 | \begin{frame}[fragile]{Reading Lines of a Text File} 346 | \code{readLines} can be useful for reading in lines of webpages 347 | \begin{verbatim} 348 | ## This might take time 349 | con <- url("http://www.jhsph.edu", "r") 350 | x <- readLines(con) 351 | > head(x) 352 | [1] "" 353 | [2] "" 354 | [3] "" 355 | [4] "" 356 | [5] "\t"[6] "\t" 357 | \end{verbatim} 358 | \end{frame} 359 | 360 | \begin{frame}[fragile]{Saving Data in Non-tabular Forms} 361 | For temporary storage or for transport, it is more efficient to save 362 | data in (compressed) binary form using \code{save} or 363 | \code{save.image}. 364 | \begin{verbatim} 365 | x <- 1 366 | y <- data.frame(a = 1, b = "a") 367 | save(x, y, file = "data.RData") 368 | load("data.RData") ## overwrites existing x and y! 369 | \end{verbatim} 370 | Binary formats are not great for long-term storage because if they are 371 | corrupted, recovery is usually not possible. 372 | \end{frame} 373 | 374 | \begin{frame}{Serialization} 375 | Serialization is the process of taking an R object and converting it into 376 | a representation as a ``series'' of bytes. 377 | \begin{itemize} 378 | \item 379 | The \code{save} and \code{save.image} functions serialize R objects 380 | and then save them to files 381 | \item 382 | The \code{serialize} function can be used to serialize an R object to 383 | an arbitrary connection (database, socket, pipe, etc.) 
384 | \item 385 | \code{unserialize} reads from an arbitrary connection and inverts a 386 | serialization, returning an R object 387 | \end{itemize} 388 | \end{frame} 389 | 390 | 391 | \begin{frame}[fragile]{Serialization} 392 | \begin{verbatim} 393 | > x <- list(1, 2, 3) 394 | > serialize(x, NULL) 395 | [1] 58 0a 00 00 00 02 00 02 06 01 00 02 03 00 00 396 | [16] 00 00 13 00 00 00 03 00 00 00 0e 00 00 00 01 397 | [31] 3f f0 00 00 00 00 00 00 00 00 00 0e 00 00 00 398 | [46] 01 40 00 00 00 00 00 00 00 00 00 00 0e 00 00 399 | [61] 00 01 40 08 00 00 00 00 00 00 400 | \end{verbatim} 401 | \end{frame} 402 | 403 | \begin{frame}[fragile]{Serialization} 404 | \begin{verbatim} 405 | > con <- gzfile("foo.gz", "wb") 406 | > serialize(x, con) 407 | NULL 408 | > close(con) 409 | > 410 | > con <- gzfile("foo.gz", "rb") 411 | > y <- unserialize(con) 412 | > identical(x, y) 413 | [1] TRUE 414 | \end{verbatim} 415 | \end{frame} 416 | 417 | 418 | \begin{frame}{Data Output Summary} 419 | \begin{itemize} 420 | \item 421 | \code{write.table}, \code{write.csv} --- readable output, textual, 422 | little metadata 423 | \item 424 | \code{save}, \code{save.image}, \code{serialize} --- exact 425 | representation, efficient storage if compressed, not recoverable if 426 | corrupted 427 | \item 428 | \code{dput}, \code{dump} --- textual format, somewhat readable, 429 | metadata retained, not usable for more exotic objects (environments) 430 | \end{itemize} 431 | \end{frame} 432 | 433 | 434 | \end{document} 435 | -------------------------------------------------------------------------------- /regex.tex: -------------------------------------------------------------------------------- 1 | \documentclass[aspectratio=169]{beamer} 2 | 3 | \mode 4 | { 5 | \usetheme{Warsaw} 6 | % or ... 
7 | 8 | \setbeamercovered{transparent} 9 | % or whatever (possibly just delete it) 10 | } 11 | 12 | \setbeamertemplate{footline}[page number] 13 | 14 | \usepackage[english]{babel} 15 | \usepackage[latin1]{inputenc} 16 | \usepackage{graphicx} 17 | %\usepackage{times} 18 | %\usepackage[T1]{fontenc} 19 | % Or whatever. Note that the encoding and the font should match. If T1 20 | % does not look nice, try deleting the line with the fontenc. 21 | 22 | \usepackage{amsmath,amsfonts,amssymb} 23 | 24 | \input{macros} 25 | 26 | \title{Regular Expressions} 27 | 28 | \date{Computing for Data Analysis} 29 | 30 | \begin{document} 31 | 32 | \begin{frame} 33 | \titlepage 34 | \end{frame} 35 | 36 | \begin{frame}{Regular expressions} 37 | \begin{itemize} 38 | \item 39 | Regular expressions can be thought of as a combination of literals and 40 | \textit{metacharacters} 41 | \item 42 | To draw an analogy with natural language, think of literal text 43 | forming the words of this language, and the metacharacters defining 44 | its grammar 45 | \item 46 | Regular expressions have a rich set of metacharacters 47 | \end{itemize} 48 | 49 | \end{frame} 50 | 51 | \begin{frame}[fragile]{Literals} 52 | Simplest pattern consists only of literals. The literal ``nuclear'' 53 | would match to the following lines: 54 | \begin{verbatim} 55 | Ooh. I just learned that to keep myself alive after a 56 | nuclear blast! All I have to do is milk some rats 57 | then drink the milk. Aweosme. :} 58 | 59 | Laozi says nuclear weapons are mas macho 60 | 61 | Chaos in a country that has nuclear weapons -- not good. 62 | 63 | my nephew is trying to teach me nuclear physics, or 64 | possibly just trying to show me how smart he is 65 | so I'll be proud of him [which I am]. 
66 | 67 | lol if you ever say "nuclear" people immediately think 68 | DEATH by radiation LOL 69 | \end{verbatim} 70 | \end{frame} 71 | 72 | \begin{frame}[fragile]{Literals} 73 | The literal ``Obama'' would match to the following lines 74 | \begin{verbatim} 75 | Politics r dum. Not 2 long ago Clinton was sayin Obama 76 | was crap n now she sez vote 4 him n unite? WTF? 77 | Screw em both + Mcain. Go Ron Paul! 78 | 79 | Clinton conceeds to Obama but will her followers listen?? 80 | 81 | Are we sure Chelsea didn't vote for Obama? 82 | 83 | thinking ... Michelle Obama is terrific! 84 | 85 | jetlag..no sleep...early mornig to starbux..Ms. Obama 86 | was moving 87 | \end{verbatim} 88 | \end{frame} 89 | 90 | \begin{frame}{Regular Expressions} 91 | \begin{itemize} 92 | \item 93 | Simplest pattern consists only of literals; a match occurs if the 94 | sequence of literals occurs anywhere in the text being tested 95 | \item 96 | What if we only want the word ``Obama''? or sentences that end in 97 | the word ``Clinton'', or ``clinton'' or ``clinto''? 98 | \end{itemize} 99 | \end{frame} 100 | 101 | \begin{frame}{Regular Expressions} 102 | We need a way to express 103 | \begin{itemize} 104 | \item 105 | whitespace word boundaries 106 | \item 107 | sets of literals 108 | \item 109 | the beginning and end of a line 110 | \item 111 | alternatives (``war'' or ``peace'') 112 | \end{itemize} 113 | Metacharacters to the rescue! 114 | \end{frame} 115 | 116 | \begin{frame}[fragile]{Metacharacters} 117 | Some metacharacters represent the start of a line 118 | \begin{verbatim} 119 | ^i think 120 | \end{verbatim} 121 | will match the lines 122 | \begin{verbatim} 123 | i think we all rule for participating 124 | i think i have been outed 125 | i think this will be quite fun actually 126 | i think i need to go to work 127 | i think i first saw zombo in 1999. 
128 | \end{verbatim} 129 | \end{frame} 130 | 131 | \begin{frame}[fragile]{Metacharacters} 132 | \$ represents the end of a line 133 | \begin{verbatim} 134 | morning$ 135 | \end{verbatim} 136 | will match the lines 137 | \begin{verbatim} 138 | well they had something this morning 139 | then had to catch a tram home in the morning 140 | dog obedience school in the morning 141 | and yes happy birthday i forgot to say it earlier this morning 142 | I walked in the rain this morning 143 | good morning 144 | \end{verbatim} 145 | \end{frame} 146 | 147 | \begin{frame}[fragile]{Character Classes with []} 148 | We can list a set of characters we will accept at a given point in the 149 | match 150 | \begin{verbatim} 151 | [Bb][Uu][Ss][Hh] 152 | \end{verbatim} 153 | will match the lines 154 | \begin{verbatim} 155 | The democrats are playing, "Name the worst thing about Bush!" 156 | I smelled the desert creosote bush, brownies, BBQ chicken 157 | BBQ and bushwalking at Molonglo Gorge 158 | Bush TOLD you that North Korea is part of the Axis of Evil 159 | I'm listening to Bush - Hurricane (Album Version) 160 | \end{verbatim} 161 | \end{frame} 162 | 163 | \begin{frame}[fragile]{Character Classes with []} 164 | \begin{verbatim} 165 | ^[Ii] am 166 | \end{verbatim} 167 | will match 168 | \begin{verbatim} 169 | i am so angry at my boyfriend i can't even bear to 170 | look at him 171 | 172 | i am boycotting the apple store 173 | 174 | I am twittering from iPhone 175 | 176 | I am a very vengeful person when you ruin my sweetheart. 177 | 178 | I am so over this. I need food. Mmmm bacon... 179 | \end{verbatim} 180 | \end{frame} 181 | 182 | \begin{frame}[fragile]{Character Classes with []} 183 | Similarly, you can specify a range of letters [a-z] or 184 | [a-zA-Z]; notice that the order doesn't matter 185 | \begin{verbatim} 186 | ^[0-9][a-zA-Z] 187 | \end{verbatim} 188 | will match the lines 189 | \begin{verbatim} 190 | 7th inning stretch 191 | 2nd half soon to begin. 
OSU did just win something 192 | 3am - cant sleep - too hot still.. :( 193 | 5ft 7 sent from heaven 194 | 1st sign of starvagtion 195 | \end{verbatim} 196 | \end{frame} 197 | 198 | \begin{frame}[fragile]{Character Classes with []} 199 | When used at the beginning of a character class, the ``\verb+^+'' is also a 200 | metacharacter and indicates matching characters NOT in the 201 | indicated class 202 | \begin{verbatim} 203 | [^?.]$ 204 | \end{verbatim} 205 | will match the lines 206 | \begin{verbatim} 207 | i like basketballs 208 | 6 and 9 209 | dont worry... we all die anyway! 210 | Not in Baghdad 211 | helicopter under water? hmmm 212 | \end{verbatim} 213 | \end{frame} 214 | 215 | \begin{frame}[fragile]{More Metacharacters} 216 | ``.'' is used to refer to any character. So 217 | \begin{verbatim} 218 | 9.11 219 | \end{verbatim} 220 | will match the lines 221 | \begin{verbatim} 222 | its stupid the post 9-11 rules 223 | if any 1 of us did 9/11 we would have been caught in days. 224 | NetBios: scanning ip 203.169.114.66 225 | Front Door 9:11:46 AM 226 | Sings: 0118999881999119725...3 ! 227 | \end{verbatim} 228 | \end{frame} 229 | 230 | \begin{frame}[fragile]{More Metacharacters: $|$} 231 | This does not mean ``pipe'' in the context of regular expressions; 232 | instead it translates to ``or''; we can use it to combine two 233 | expressions, the subexpressions being called alternatives 234 | \begin{verbatim} 235 | flood|fire 236 | \end{verbatim} 237 | will match the lines 238 | \begin{verbatim} 239 | is firewire like usb on none macs? 240 | the global flood makes sense within the context of the bible 241 | yeah ive had the fire on tonight 242 | ... and the floods, hurricanes, killer heatwaves, rednecks, gun nuts, etc. 243 | \end{verbatim} 244 | \end{frame} 245 | 246 | \begin{frame}[fragile]{More Metacharacters: $|$} 247 | We can include any number of alternatives... 
248 | \begin{verbatim} 249 | flood|earthquake|hurricane|coldfire 250 | \end{verbatim} 251 | will match the lines 252 | \begin{verbatim} 253 | Not a whole lot of hurricanes in the Arctic. 254 | We do have earthquakes nearly every day somewhere in our State 255 | hurricanes swirl in the other direction 256 | coldfire is STRAIGHT! 257 | 'cause we keep getting earthquakes 258 | \end{verbatim} 259 | \end{frame} 260 | 261 | \begin{frame}[fragile]{More Metacharacters: $|$} 262 | The alternatives can be real expressions and not just literals 263 | \begin{verbatim} 264 | ^[Gg]ood|[Bb]ad 265 | \end{verbatim} 266 | will match the lines 267 | \begin{verbatim} 268 | good to hear some good knews from someone here 269 | Good afternoon fellow american infidels! 270 | good on you-what do you drive? 271 | Katie... guess they had bad experiences... 272 | my middle name is trouble, Miss Bad News 273 | \end{verbatim} 274 | \end{frame} 275 | 276 | \begin{frame}[fragile]{More Metacharacters: ( and )} 277 | Subexpressions are often contained in parentheses to constrain the 278 | alternatives 279 | \begin{verbatim} 280 | ^([Gg]ood|[Bb]ad) 281 | \end{verbatim} 282 | will match the lines 283 | \begin{verbatim} 284 | bad habbit 285 | bad coordination today 286 | good, becuase there is nothing worse than a man in kinky underwear 287 | Badcop, its because people want to use drugs 288 | Good Monday Holiday 289 | Good riddance to Limey 290 | \end{verbatim} 291 | \end{frame} 292 | 293 | \begin{frame}[fragile]{More Metacharacters: ?} 294 | The question mark indicates that the indicated expression is optional 295 | \begin{verbatim} 296 | [Gg]eorge( [Ww]\.)? [Bb]ush 297 | \end{verbatim} 298 | will match the lines 299 | \begin{verbatim} 300 | i bet i can spell better than you and george bush combined 301 | BBC reported that President George W. 
Bush claimed God told him to invade Iraq 302 | a bird in the hand is worth two george bushes 303 | \end{verbatim} 304 | \end{frame} 305 | 306 | \begin{frame}[fragile]{One thing to note...} 307 | In the following 308 | \begin{verbatim} 309 | [Gg]eorge( [Ww]\.)? [Bb]ush 310 | \end{verbatim} 311 | we wanted to match a ``.'' as a literal period; to do that, we had to 312 | ``escape'' the metacharacter, preceding it with a backslash. In 313 | general, we have to do this for any metacharacter we want to include 314 | in our match 315 | \end{frame} 316 | 317 | \begin{frame}[fragile]{More metacharacters: * and +} 318 | The * and + signs are metacharacters used to indicate repetition; * 319 | means ``any number, including none, of the item'' and + means ``at 320 | least one of the item'' 321 | \begin{verbatim} 322 | \(.*\) 323 | \end{verbatim} 324 | will match the lines 325 | \begin{verbatim} 326 | anyone wanna chat? (24, m, germany) 327 | hello, 20.m here... ( east area + drives + webcam ) 328 | (he means older men) 329 | () 330 | \end{verbatim} 331 | \end{frame} 332 | 333 | \begin{frame}[fragile]{More metacharacters: * and +} 334 | The * and + signs are metacharacters used to indicate repetition; * 335 | means ``any number, including none, of the item'' and + means ``at 336 | least one of the item'' 337 | \begin{verbatim} 338 | [0-9]+ (.*)[0-9]+ 339 | \end{verbatim} 340 | will match the lines 341 | \begin{verbatim} 342 | working as MP here 720 MP battallion, 42nd birgade 343 | so say 2 or 3 years at colleage and 4 at uni makes us 23 when and if we finish 344 | it went down on several occasions for like, 3 or 4 *days* 345 | Mmmm its time 4 me 2 go 2 bed 346 | \end{verbatim} 347 | \end{frame} 348 | 349 | \begin{frame}[fragile]{More metacharacters: \{ and \}} 350 | \{ and \} are referred to as interval quantifiers; they let us specify 351 | the minimum and maximum number of matches of an expression 352 | \begin{verbatim} 353 | [Bb]ush( +[^ ]+){1,5} debate 354 | \end{verbatim} 
355 | will match the lines 356 | \begin{verbatim} 357 | Bush has historically won all major debates he's done. 358 | in my view, Bush doesn't need these debates.. 359 | bush doesn't need the debates? maybe you are right 360 | That's what Bush supporters are doing about the debate. 361 | Felix, I don't disagree that Bush was poorly prepared for the debate. 362 | indeed, but still, Bush should have taken the debate more seriously. 363 | Keep repeating that Bush smirked and scowled during the debate 364 | \end{verbatim} 365 | \end{frame} 366 | 367 | \begin{frame}[fragile]{More metacharacters: \{ and \}} 368 | \begin{itemize} 369 | \item 370 | \{m,n\} means at least m but not more than n matches 371 | \item 372 | \{m\} means exactly m matches 373 | \item 374 | \{m,\} means at least m matches 375 | \end{itemize} 376 | \end{frame} 377 | 378 | \begin{frame}[fragile]{More metacharacters: ( and ) revisited} 379 | \begin{itemize} 380 | \item 381 | In most implementations of regular expressions, the parentheses not 382 | only limit the scope of alternatives divided by a ``$|$'', but also 383 | can be used to ``remember'' text matched by the subexpression enclosed 384 | \item 385 | We refer to the matched text with \verb+\1+, \verb+\2+, etc. 386 | \end{itemize} 387 | \end{frame} 388 | 389 | \begin{frame}[fragile]{More metacharacters: ( and ) revisited} 390 | So the expression 391 | \begin{verbatim} 392 | +([a-zA-Z]+) +\1 + 393 | \end{verbatim} 394 | will match the lines 395 | \begin{verbatim} 396 | time for bed, night night twitter! 397 | 398 | blah blah blah blah 399 | 400 | my tattoo is so so itchy today 401 | 402 | i was standing all all alone against the world outside... 403 | 404 | hi anybody anybody at home 405 | 406 | estudiando css css css css.... 
que desastritooooo 407 | \end{verbatim} 408 | \end{frame} 409 | 410 | \begin{frame}[fragile]{More metacharacters: ( and ) revisited} 411 | The \verb+*+ is ``greedy'' so it always matches the \textit{longest} 412 | possible string that satisfies the regular expression. So 413 | \begin{verbatim} 414 | ^s(.*)s 415 | \end{verbatim} 416 | matches 417 | \begin{verbatim} 418 | sitting at starbucks 419 | 420 | setting up mysql and rails 421 | 422 | studying stuff for the exams 423 | 424 | spaghetti with marshmallows 425 | 426 | stop fighting with crackers 427 | 428 | sore shoulders, stupid ergonomics 429 | \end{verbatim} 430 | \end{frame} 431 | 432 | \begin{frame}[fragile]{More metacharacters: ( and ) revisited} 433 | The greediness of \verb+*+ can be turned off with the \verb+?+, as in 434 | \begin{verbatim} 435 | ^s(.*?)s 436 | \end{verbatim} 437 | \end{frame} 438 | 439 | 440 | \begin{frame}{Summary} 441 | \begin{itemize} 442 | \item Regular expressions are used in many different languages; not 443 | unique to R. 444 | \item Regular expressions are composed of literals and metacharacters 445 | that represent sets or classes of characters/words 446 | \item Text processing via regular expressions is a very powerful way 447 | to extract data from ``unfriendly'' sources (not all data comes as a 448 | CSV file) 449 | \end{itemize} 450 | (Thanks to Mark Hansen for some material in this lecture.) 
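\end{frame}

\begin{frame}[fragile]{Greedy and non-greedy matching in R}
The greedy pattern \verb+^s(.*)s+ and its non-greedy variant
\verb+^s(.*?)s+ can be compared directly in R. This is only a sketch:
the choice of \verb+regexpr+ and \verb+regmatches+ with
\verb+perl = TRUE+, and the variable names, are illustrative
assumptions rather than part of the original lecture.
\begin{verbatim}
x <- "sitting at starbucks"   ## one of the example lines above

## Greedy: .* runs on to the last "s" in the string
greedy <- regmatches(x, regexpr("^s(.*)s", x, perl = TRUE))

## Non-greedy: .*? stops at the first "s" after the leading one
lazy <- regmatches(x, regexpr("^s(.*?)s", x, perl = TRUE))

greedy   ## "sitting at starbucks"
lazy     ## "sitting at s"
\end{verbatim}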
451 | \end{frame} 452 | 453 | 454 | 455 | \end{document} 456 | -------------------------------------------------------------------------------- /sigmalike.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/rdpeng/CourseraLectures/f58ebd7bc02e275d0428b8c2bb0ff9a2668e5990/sigmalike.pdf -------------------------------------------------------------------------------- /simpoisson.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/rdpeng/CourseraLectures/f58ebd7bc02e275d0428b8c2bb0ff9a2668e5990/simpoisson.pdf -------------------------------------------------------------------------------- /simulation.tex: -------------------------------------------------------------------------------- 1 | \documentclass[aspectratio=169]{beamer} 2 | 3 | \mode<presentation> 4 | { 5 | \usetheme{Warsaw} 6 | % or ... 7 | 8 | \setbeamercovered{transparent} 9 | % or whatever (possibly just delete it) 10 | } 11 | 12 | 13 | \usepackage[english]{babel} 14 | \usepackage[latin1]{inputenc} 15 | \usepackage{graphicx} 16 | %\usepackage{times} 17 | %\usepackage[T1]{fontenc} 18 | % Or whatever. Note that the encoding and the font should match. If T1 19 | % does not look nice, try deleting the line with the fontenc. 
20 | 21 | \usepackage{amsmath,amsfonts,amssymb} 22 | 23 | \input{macros} 24 | 25 | \title[Simulation]{Simulation} 26 | 27 | 28 | \date{Computing for Data Analysis} 29 | 30 | 31 | 32 | \begin{document} 33 | 34 | \begin{frame} 35 | \titlepage 36 | \end{frame} 37 | 38 | \begin{frame}{Generating Random Numbers} 39 | Functions for probability distributions in R 40 | \begin{itemize} 41 | \item \code{rnorm}: generate random Normal variates with a 42 | given mean and standard deviation 43 | \item \code{dnorm}: evaluate the Normal probability density (with a 44 | given mean/SD) at a point (or vector of points) 45 | \item \code{pnorm}: evaluate the cumulative distribution function 46 | for a Normal distribution 47 | \item \code{rpois}: generate random Poisson variates with a given 48 | rate 49 | \end{itemize} 50 | \end{frame} 51 | 52 | 53 | \begin{frame}{Generating Random Numbers} 54 | Probability distributions usually have four functions 55 | associated with them. The functions are prefixed with a 56 | \begin{itemize} 57 | \item \code{d} for density 58 | \item \code{r} for random number generation 59 | \item \code{p} for cumulative distribution 60 | \item \code{q} for quantile function 61 | \end{itemize} 62 | \end{frame} 63 | 64 | 65 | \begin{frame}[fragile]{Generating Random Numbers} 66 | Working with the Normal distribution requires using these four 67 | functions 68 | \begin{verbatim} 69 | dnorm(x, mean = 0, sd = 1, log = FALSE) 70 | pnorm(q, mean = 0, sd = 1, lower.tail = TRUE, log.p = FALSE) 71 | qnorm(p, mean = 0, sd = 1, lower.tail = TRUE, log.p = FALSE) 72 | rnorm(n, mean = 0, sd = 1) 73 | \end{verbatim} 74 | If $\Phi$ is the cumulative distribution function for a standard 75 | Normal distribution, then $\text{\code{pnorm}}(q) = \Phi(q)$ and 76 | $\text{\code{qnorm}}(p) = \Phi^{-1}(p)$. 
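\end{frame}

\begin{frame}[fragile]{Generating Random Numbers}
The inverse relationship between \code{pnorm} and \code{qnorm} can be
checked numerically. This is a minimal sketch; the test points in
\code{q} are arbitrary values chosen here for illustration.
\begin{verbatim}
q <- c(-1.96, 0, 1.96)    ## arbitrary test points (an assumption)
p <- pnorm(q)             ## p = Phi(q)
all.equal(qnorm(p), q)    ## TRUE: qnorm undoes pnorm
qnorm(0.975)              ## about 1.96, the upper 2.5% point
\end{verbatim}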
77 | \end{frame} 78 | 79 | \begin{frame}[fragile]{Generating Random Numbers} 80 | Generating random Normal variates 81 | \begin{verbatim} 82 | > x <- rnorm(10) 83 | > x 84 | [1] 1.38380206 0.48772671 0.53403109 0.66721944 85 | [5] 0.01585029 0.37945986 1.31096736 0.55330472 86 | [9] 1.22090852 0.45236742 87 | > x <- rnorm(10, 20, 2) 88 | > x 89 | [1] 23.38812 20.16846 21.87999 20.73813 19.59020 90 | [6] 18.73439 18.31721 22.51748 20.36966 21.04371 91 | > summary(x) 92 | Min. 1st Qu. Median Mean 3rd Qu. Max. 93 | 18.32 19.73 20.55 20.67 21.67 23.39 94 | \end{verbatim} 95 | \end{frame} 96 | 97 | \begin{frame}[fragile]{Generating Random Numbers} 98 | Setting the random number seed with \code{set.seed} ensures 99 | reproducibility 100 | \begin{verbatim} 101 | > set.seed(1) 102 | > rnorm(5) 103 | [1] -0.6264538 0.1836433 -0.8356286 1.5952808 104 | [5] 0.3295078 105 | > rnorm(5) 106 | [1] -0.8204684 0.4874291 0.7383247 0.5757814 107 | [5] -0.3053884 108 | > set.seed(1) 109 | > rnorm(5) 110 | [1] -0.6264538 0.1836433 -0.8356286 1.5952808 111 | [5] 0.3295078 112 | \end{verbatim} 113 | Always set the random number seed when conducting a simulation! 114 | \end{frame} 115 | 116 | \begin{frame}[fragile]{Generating Random Numbers} 117 | Generating Poisson data 118 | \begin{verbatim} 119 | > rpois(10, 1) 120 | [1] 3 1 0 1 0 0 1 0 1 1 121 | > rpois(10, 2) 122 | [1] 6 2 2 1 3 2 2 1 1 2 123 | > rpois(10, 20) 124 | [1] 20 11 21 20 20 21 17 15 24 20 125 | 126 | > ppois(2, 2) ## Cumulative distribution 127 | [1] 0.6766764 ## Pr(x <= 2) 128 | > ppois(4, 2) 129 | [1] 0.947347 ## Pr(x <= 4) 130 | > ppois(6, 2) 131 | [1] 0.9954662 ## Pr(x <= 6) 132 | \end{verbatim} 133 | \end{frame} 134 | 135 | \begin{frame}[fragile]{Generating Random Numbers From a Linear Model} 136 | Suppose we want to simulate from the following linear model 137 | \[ 138 | y = \beta_0 + \beta_1 x + \varepsilon 139 | \] 140 | where $\varepsilon\sim\mathcal{N}(0, 2^2)$. 
Assume 141 | $x\sim\mathcal{N}(0,1^2)$, $\beta_0 = 0.5$ and $\beta_1 = 2$. 142 | \begin{verbatim} 143 | > set.seed(20) 144 | > x <- rnorm(100) 145 | > e <- rnorm(100, 0, 2) 146 | > y <- 0.5 + 2 * x + e 147 | > summary(y) 148 | Min. 1st Qu. Median Mean 3rd Qu. Max. 149 | -6.4080 -1.5400 0.6789 0.6893 2.9300 6.5050 150 | > plot(x, y) 151 | \end{verbatim} 152 | \end{frame} 153 | 154 | \begin{frame}[fragile]{Generating Random Numbers From a Linear Model} 155 | \includegraphics[height=3.2in]{linearmodelsim} 156 | \end{frame} 157 | 158 | \begin{frame}[fragile]{Generating Random Numbers From a Linear Model} 159 | What if \code{x} is binary? 160 | \begin{verbatim} 161 | > set.seed(10) 162 | > x <- rbinom(100, 1, 0.5) 163 | > e <- rnorm(100, 0, 2) 164 | > y <- 0.5 + 2 * x + e 165 | > summary(y) 166 | Min. 1st Qu. Median Mean 3rd Qu. Max. 167 | -3.4940 -0.1409 1.5770 1.4320 2.8400 6.9410 168 | > plot(x, y) 169 | \end{verbatim} 170 | \end{frame} 171 | 172 | \begin{frame}[fragile]{Generating Random Numbers From a Linear Model} 173 | \includegraphics[height=3.2in]{binarylinearmodelsim} 174 | \end{frame} 175 | 176 | \begin{frame}[fragile]{Generating Random Numbers From a Generalized 177 | Linear Model} 178 | Suppose we want to simulate from a Poisson model where 179 | \begin{eqnarray*} 180 | Y & \sim & \text{Poisson}(\mu)\\ 181 | \log\mu & = & \beta_0 + \beta_1 x 182 | \end{eqnarray*} 183 | and $\beta_0 = 0.5$ and $\beta_1 = 0.3$. We need to use the 184 | \code{rpois} function for this 185 | \begin{verbatim} 186 | > set.seed(1) 187 | > x <- rnorm(100) 188 | > log.mu <- 0.5 + 0.3 * x 189 | > y <- rpois(100, exp(log.mu)) 190 | > summary(y) 191 | Min. 1st Qu. Median Mean 3rd Qu. Max. 
192 | 0.00 1.00 1.00 1.55 2.00 6.00 193 | > plot(x, y) 194 | \end{verbatim} 195 | \end{frame} 196 | 197 | \begin{frame}[fragile]{Generating Random Numbers From a Generalized Linear Model} 198 | \includegraphics[height=3.2in]{simpoisson} 199 | \end{frame} 200 | 201 | \begin{frame}[fragile]{Random Sampling} 202 | The \code{sample} function draws randomly from a specified set of 203 | (scalar) objects allowing you to sample from arbitrary 204 | distributions. 205 | \begin{verbatim} 206 | > set.seed(1) 207 | > sample(1:10, 4) 208 | [1] 3 4 5 7 209 | > sample(1:10, 4) 210 | [1] 3 9 8 5 211 | > sample(letters, 5) 212 | [1] "q" "b" "e" "x" "p" 213 | > sample(1:10) ## permutation 214 | [1] 4 7 10 6 9 2 8 3 1 5 215 | > sample(1:10) 216 | [1] 2 3 4 1 9 5 10 8 6 7 217 | > sample(1:10, replace = TRUE) ## Sample w/replacement 218 | [1] 2 9 7 8 2 8 5 9 7 8 219 | \end{verbatim} 220 | \end{frame} 221 | 222 | \begin{frame}{Simulation} 223 | Summary 224 | \begin{itemize} 225 | \item Drawing samples from specific probability distributions can be 226 | done with \code{r}* functions 227 | \item Standard distributions are built in: Normal, Poisson, Binomial, 228 | Exponential, Gamma, etc. 
229 | \item The \code{sample} function can be used to draw random samples 230 | from arbitrary vectors 231 | \item Setting the random number generator seed via \code{set.seed} is 232 | critical for reproducibility 233 | \end{itemize} 234 | \end{frame} 235 | 236 | \end{document} 237 | 238 | 239 | -------------------------------------------------------------------------------- /str.pptx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/rdpeng/CourseraLectures/f58ebd7bc02e275d0428b8c2bb0ff9a2668e5990/str.pptx -------------------------------------------------------------------------------- /vectorized.tex: -------------------------------------------------------------------------------- 1 | \documentclass[aspectratio=169]{beamer} 2 | 3 | \mode<presentation> 4 | { 5 | \usetheme{Warsaw} 6 | % or ... 7 | 8 | \setbeamercovered{transparent} 9 | % or whatever (possibly just delete it) 10 | } 11 | 12 | 13 | \usepackage[english]{babel} 14 | \usepackage[latin1]{inputenc} 15 | \usepackage{graphicx} 16 | %\usepackage{times} 17 | %\usepackage[T1]{fontenc} 18 | % Or whatever. Note that the encoding and the font should match. If T1 19 | % does not look nice, try deleting the line with the fontenc. 20 | 21 | \usepackage{amsmath,amsfonts,amssymb} 22 | 23 | \input{macros} 24 | 25 | \title[The R Language]{Introduction to the R Language} 26 | 27 | \subtitle{Vectorized Operations} 28 | 29 | \date{Computing for Data Analysis} 30 | 31 | \setbeamertemplate{footline}[page number] 32 | 33 | 34 | \begin{document} 35 | 36 | \begin{frame} 37 | \titlepage 38 | \end{frame} 39 | 40 | 41 | \begin{frame}[fragile]{Vectorized Operations} 42 | Many operations in R are \textit{vectorized}, making code more 43 | efficient, concise, and easier to read. 
44 | \begin{verbatim} 45 | > x <- 1:4; y <- 6:9 46 | > x + y 47 | [1] 7 9 11 13 48 | > x > 2 49 | [1] FALSE FALSE TRUE TRUE 50 | > x >= 2 51 | [1] FALSE TRUE TRUE TRUE 52 | > y == 8 53 | [1] FALSE FALSE TRUE FALSE 54 | > x * y 55 | [1] 6 14 24 36 56 | > x / y 57 | [1] 0.1666667 0.2857143 0.3750000 0.4444444 58 | \end{verbatim} 59 | \end{frame} 60 | 61 | \begin{frame}[fragile]{Vectorized Matrix Operations} 62 | \begin{verbatim} 63 | > x <- matrix(1:4, 2, 2); y <- matrix(rep(10, 4), 2, 2) 64 | > x * y ## element-wise multiplication 65 | [,1] [,2] 66 | [1,] 10 30 67 | [2,] 20 40 68 | > x / y 69 | [,1] [,2] 70 | [1,] 0.1 0.3 71 | [2,] 0.2 0.4 72 | > x %*% y ## true matrix multiplication 73 | [,1] [,2] 74 | [1,] 40 40 75 | [2,] 60 60 76 | \end{verbatim} 77 | \end{frame} 78 | 79 | 80 | \end{document} 81 | --------------------------------------------------------------------------------