├── Dates.Rmd
├── RColors.pptx
├── ReproResearch.pptx
├── binarylinearmodelsim.pdf
├── classes-methods.Rnw
├── connections.tex
├── controlstructures.tex
├── datatypes1.tex
├── datatypes2.tex
├── debugging.tex
├── functions.tex
├── ggplot2_part1.pptx
├── ggplot2_part2.pptx
├── grep.tex
├── help.ppt
├── homicide-month.pdf
├── knitr.pptx
├── linearmodelsim.pdf
├── loopfunctions.tex
├── macros.tex
├── mulike.pdf
├── overview_history.tex
├── plotting.tex
├── reading-data.tex
├── regex.tex
├── sigmalike.pdf
├── simpoisson.pdf
├── simulation.tex
├── str.pptx
└── vectorized.tex


/Dates.Rmd:
--------------------------------------------------------------------------------
  1 | % Dates and Times in R
  2 | % Computing for Data Analysis
  3 | %
  4 | 
  5 | ```{r, echo=FALSE}
  6 | options(width = 50)
  7 | ```
  8 | 
  9 | # Dates and Times in R
 10 | 
 11 | R has developed a special representation of dates and times
 12 | 
 13 | - Dates are represented by the `Date` class
 14 | 
 15 | - Times are represented by the `POSIXct` or the `POSIXlt` class
 16 | 
 17 | - Dates are stored internally as the number of days since 1970-01-01
 18 | 
 19 | - Times are stored internally as the number of seconds since
 20 |   1970-01-01
 21 | 
 22 | # Dates in R
 23 | 
 24 | Dates are represented by the `Date` class and can be coerced from a
 25 | character string using the `as.Date()` function.
 26 | 
 27 | ```{r}
 28 | x <- as.Date("1970-01-01")
 29 | x
 30 | unclass(x)
 31 | unclass(as.Date("1970-01-02"))
 32 | ```
 33 | 
 34 | # Times in R
 35 | 
 36 | Times are represented using the `POSIXct` or the `POSIXlt` class
 37 | 
 38 | - `POSIXct` is just a very large integer under the hood; it is a
 39 |   useful class when you want to store times in something like a data
 40 |   frame
 41 | 
 42 | - `POSIXlt` is a list underneath and it stores a bunch of other useful
 43 |   information like the day of the week, day of the year, month, day of
 44 |   the month
 45 | 
 46 | There are a number of generic functions that work on dates and times
 47 | 
 48 | - `weekdays`: give the day of the week
 49 | 
 50 | - `months`: give the month name
 51 | 
 52 | - `quarters`: give the quarter number ("Q1", "Q2", "Q3", or "Q4")
 53 | 
 54 | # Times in R
 55 | 
 56 | Times can be coerced from a character string using the `as.POSIXlt`
 57 | or `as.POSIXct` function.
 58 | 
 59 | ```{r}
 60 | x <- Sys.time()
 61 | x
 62 | p <- as.POSIXlt(x)
 63 | names(unclass(p))
 64 | p$sec
 65 | ```
 66 | 
 67 | # Times in R
 68 | 
 69 | You can also use the `POSIXct` format.
 70 | 
 71 | ```{r}
 72 | x <- Sys.time()
 73 | x  ## Already in `POSIXct' format
 74 | unclass(x)
 75 | x$sec  ## Error: a `POSIXct' object is a number, not a list, so `$' fails
 76 | p <- as.POSIXlt(x)
 77 | p$sec
 78 | ```
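
The difference between the two classes is easier to see with a fixed
time than with `Sys.time()`. Here is a small sketch, assuming a made-up
timestamp pinned to UTC so the result does not depend on the local time
zone. Note that several `POSIXlt` fields count from zero or from 1900.

```r
## Hypothetical timestamp, pinned to UTC for reproducibility
x <- as.POSIXct("2012-01-10 10:40:30", tz = "UTC")
unclass(x)       ## a single number: seconds since 1970-01-01 00:00:00 UTC

p <- as.POSIXlt(x)
p$hour           ## 10
p$min            ## 40
p$sec            ## 30
p$mon            ## 0   (months are numbered 0-11)
p$year           ## 112 (years since 1900)
p$wday           ## 2   (days are numbered 0-6 starting on Sunday)
p$yday           ## 9   (days of the year are numbered from 0)

## Converting back to POSIXct recovers the same instant
as.POSIXct(p) == x
```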
 79 | 
 80 | # Times in R
 81 | 
 82 | Finally, there is the `strptime` function in case your dates are
 83 | written in a different format
 84 | 
 85 | ```{r}
 86 | datestring <- c("January 10, 2012 10:40", "December 9, 2011 9:10")
 87 | x <- strptime(datestring, "%B %d, %Y %H:%M")
 88 | x
 89 | class(x)
 90 | ```
 91 | 
 92 | I can *never* remember the formatting strings. Check `?strptime` for
 93 | details.
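
A few of the most common conversion codes, as a rough sketch. The input
strings are made up, and the time zone is pinned to UTC so the results
do not depend on the machine. Only numeric codes are used here because
month names such as `%B` depend on the locale.

```r
## %Y = four-digit year, %m = month number, %d = day of the month
## %H = hour (00-23),    %M = minute,       %S = second
t1 <- strptime("2012-01-10 10:40:00", "%Y-%m-%d %H:%M:%S", tz = "UTC")
t2 <- strptime("10/01/2012", "%d/%m/%Y", tz = "UTC")

## Both strings refer to the same calendar day
as.Date(t1) == as.Date(t2)

## format() goes the other way, from a date-time to a string
format(t1, "%d.%m.%Y %H:%M")

## A string that does not match the format gives NA, not an error
bad <- strptime("not a date", "%Y-%m-%d", tz = "UTC")
is.na(bad)
```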
 94 | 
 95 | # Operations on Dates and Times
 96 | 
 97 | You can use mathematical operations on dates and times. Well, really
 98 | just `+` and `-`. You can do comparisons too (e.g. `==`, `<=`)
 99 | 
100 | ```{r}
101 | x <- as.Date("2012-01-01")
102 | y <- strptime("9 Jan 2011 11:34:21", "%d %b %Y %H:%M:%S")
103 | x - y  ## Error: x is a `Date' but y is a `POSIXlt'
104 | x <- as.POSIXlt(x)
105 | x - y
106 | ```
107 | 
108 | # Operations on Dates and Times
109 | 
110 | R even keeps track of leap years, leap seconds, daylight saving
111 | time, and time zones.
112 | 
113 | ```{r}
114 | x <- as.Date("2012-03-01")
115 | y <- as.Date("2012-02-28")
116 | x - y
117 | x <- as.POSIXct("2012-10-25 01:00:00")
118 | y <- as.POSIXct("2012-10-25 06:00:00", tz = "GMT")
119 | y - x
120 | ```
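
The time zone example above depends on the time zone of the machine it
runs on. Here is a sketch with both time zones stated explicitly,
assuming the `America/New_York` zone is available in the system's time
zone database. `difftime` is used to fix the units of the result.

```r
## 2012 is a leap year, 2011 is not
as.Date("2012-03-01") - as.Date("2012-02-28")   ## 2 days
as.Date("2011-03-01") - as.Date("2011-02-28")   ## 1 day

## 01:00 in New York on this date is 05:00 GMT (daylight saving time
## was still in effect), so the two times are one hour apart
x <- as.POSIXct("2012-10-25 01:00:00", tz = "America/New_York")
y <- as.POSIXct("2012-10-25 06:00:00", tz = "GMT")
d <- difftime(y, x, units = "hours")
d
```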
121 | 
122 | # Summary
123 | 
124 | - Dates and times have special classes in R that allow for numerical
125 |   and statistical calculations
126 | 
127 | - Dates use the `Date` class
128 | 
129 | - Times use the `POSIXct` and `POSIXlt` classes
130 | 
131 | - Character strings can be coerced to Date/Time classes using the
132 |   `strptime`, `as.Date`, `as.POSIXlt`, or `as.POSIXct` functions
133 | 


--------------------------------------------------------------------------------
/RColors.pptx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/rdpeng/CourseraLectures/f58ebd7bc02e275d0428b8c2bb0ff9a2668e5990/RColors.pptx


--------------------------------------------------------------------------------
/ReproResearch.pptx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/rdpeng/CourseraLectures/f58ebd7bc02e275d0428b8c2bb0ff9a2668e5990/ReproResearch.pptx


--------------------------------------------------------------------------------
/binarylinearmodelsim.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/rdpeng/CourseraLectures/f58ebd7bc02e275d0428b8c2bb0ff9a2668e5990/binarylinearmodelsim.pdf


--------------------------------------------------------------------------------
/classes-methods.Rnw:
--------------------------------------------------------------------------------
  1 | \documentclass[aspectratio=169]{beamer}
  2 | 
  3 | \mode<presentation>
  4 | {
  5 |   \usetheme{Warsaw}
  6 |   % or ...
  7 | 
  8 |   \setbeamercovered{transparent}
  9 |   % or whatever (possibly just delete it)
 10 | }
 11 | 
 12 | \usepackage[noae]{Sweave}
 13 | \usepackage[english]{babel}
 14 | \usepackage[latin1]{inputenc}
 15 | \usepackage{graphicx}
 16 | %\usepackage{times}
 17 | %\usepackage[T1]{fontenc}
 18 | % Or whatever. Note that the encoding and the font should match. If T1
 19 | % does not look nice, try deleting the line with the fontenc.
 20 | 
 21 | \usepackage{amsmath,amsfonts,amssymb}
 22 | 
 23 | \input{macros}
 24 | 
 25 | \title[Classes and Methods in R]{Classes and Methods in R}
 26 | 
 27 | \date{Computing for Data Analysis}
 28 | 
 29 | \setbeamertemplate{footline}[page number]
 30 | 
 31 | \setkeys{Gin}{width=0.4\textwidth}
 32 | 
 33 | 
 34 | \begin{document}
 35 | 
 36 | 
 37 | 
 38 | \begin{frame}
 39 |   \titlepage
 40 | \end{frame}
 41 | 
 42 | \begin{frame}{Classes and Methods}
 43 | \begin{itemize}
 44 | \item
 45 | A system for doing object oriented programming
 46 | \item R was originally quite interesting because it is both
 47 |   interactive \textit{and} has a system for object orientation.
 48 | \begin{itemize}
 49 | \item
 50 | Other languages which support OOP (C++, Java, Lisp, Python, Perl)
 51 | generally speaking are not interactive languages
 52 | \end{itemize}
 53 | \item In R much of the code for supporting classes/methods is written
 54 |   by John Chambers himself (the creator of the original S language)
 55 |   and documented in the book \textit{Programming with Data: A Guide to the S
 56 |   Language}
 57 | \item A natural extension of Chambers' idea of allowing someone to
 58 |   cross the user $\longrightarrow$ programmer spectrum
 59 | \item Object oriented programming is a bit different in R than it is
 60 |   in most languages --- even if you are familiar with the idea, you
 61 |   may want to pay attention to the details
 62 | \end{itemize}
 63 | \end{frame}
 64 | 
 65 | 
 66 | \begin{frame}{Two styles of classes and methods}
 67 | S3 classes/methods
 68 | \begin{itemize}
 69 | \item
 70 | Included with version 3 of the S language.
 71 | \item
 72 | Informal, a little kludgey
 73 | \item
 74 | Sometimes called \textit{old-style} classes/methods
 75 | \end{itemize}
 76 | S4 classes/methods
 77 | \begin{itemize}
 78 | \item
 79 | more formal and rigorous
 80 | \item
 81 | Included with S-PLUS 6 and R  1.4.0 (December 2001)
 82 | \item
 83 | Also called \textit{new-style} classes/methods
 84 | \end{itemize}
 85 | \end{frame}
 86 | 
 87 | 
 88 | 
 89 | \begin{frame}{Two worlds living side by side}
 90 | \begin{itemize}
 91 | \item For now (and the foreseeable future), S3 classes/methods and S4
 92 |   classes/methods are separate systems (but they can be mixed to some
 93 |   degree).
 94 | \item
 95 | Each system can be used fairly independently of the other.
 96 | \item
 97 | Developers of new projects (you!) are encouraged to use the S4
 98 | style classes/methods.
 99 | \begin{itemize}
100 | \item
101 | Used extensively in the Bioconductor project
102 | \end{itemize}
103 | \item
104 | But many developers still use S3 classes/methods because they
105 | are ``quick and dirty'' (and easier).
106 | \item
107 | In this lecture we will focus primarily on S4 classes/methods
108 | \item The code for implementing S4 classes/methods in R is in the
109 |   \textbf{methods} package, which is usually loaded by default (but
110 |   you can load it with \code{library(methods)} if for some reason it
111 |   is not loaded)
112 | \end{itemize}
113 | \end{frame}
114 | 
115 | \begin{frame}{Object Oriented Programming in R}
116 | \begin{itemize}
117 | \item A \textit{class} is a description of a thing. A class can be
118 |   defined using \code{setClass()} in the \textbf{methods} package.
119 | \item
120 | An \textit{object} is an instance of a class. Objects can be created
121 | using \code{new()}.
122 | \item A \textit{method} is a function that only operates on a certain
123 |   class of objects.
124 | \item A generic function is an R function which dispatches methods.  A
125 |   generic function typically encapsulates a ``generic'' concept
126 |   (e.g. \code{plot}, \code{mean}, \code{predict}, ...)
127 | \begin{itemize}
128 | \item
129 | The generic function does not actually do any computation.
130 | \end{itemize}
131 | \item
132 | A \textit{method} is the implementation of a generic function for an
133 | object of a particular class.
134 | \end{itemize}
135 | \end{frame}
136 | 
137 | 
138 | \begin{frame}{Things to look up}
139 | \begin{itemize}
140 | \item
141 | The help files for the `methods' package are extensive --- do read
142 | them as they are the primary documentation
143 | \item 
144 | You may want to start with \code{?Classes} and \code{?Methods}
145 | \item
146 | Check out \code{?setClass}, \code{?setMethod}, and \code{?setGeneric}
147 | \item
148 | Some of it gets technical, but try your best for now---it will make
149 | sense in the future as you keep using it.
150 | \item Most of the documentation in the \textbf{methods} package is
151 |   oriented towards developers/programmers as these are the primary
152 |   people using classes/methods
153 | \end{itemize}
154 | \end{frame}
155 | 
156 | 
157 | \begin{frame}[fragile]{Classes}
158 | All objects in R have a class which can be determined by the class
159 | function
160 | <<examplesclasses>>=
161 | class(1)
162 | class(TRUE)
163 | class(rnorm(100))
164 | class(NA)
165 | class("foo")
166 | @ 
167 | \end{frame}
168 | 
169 | 
170 | 
171 | \begin{frame}[fragile]{Classes (cont'd)}
172 | Data classes go beyond the atomic classes
173 | <<lmclass>>=
174 | x <- rnorm(100)
175 | y <- x + rnorm(100)
176 | fit <- lm(y ~ x)  ## linear regression model
177 | class(fit)
178 | @ 
179 | \end{frame}
180 | 
181 | \begin{frame}{Generics/Methods in R}
182 | \begin{itemize}
183 | \item
184 | S4 and S3 style generic functions look different but conceptually,
185 | they are the same (they play the same role).
186 | \item
187 | When you program you can write new methods for an existing generic OR
188 | create your own generics and associated methods.
189 | \item Of course, if a data type does not exist in R that matches your
190 |   needs, you can always define a new class along with generics/methods
191 |   that go with it
192 | \end{itemize}
193 | \end{frame}
194 | 
195 | \begin{frame}[fragile]{An S3 generic function (in the `base' package)}
196 | The \code{mean} function is generic
197 | <<printmean>>=
198 | mean
199 | @ 
200 | 
201 | So is the \code{print} function
202 | <<printprint>>=
203 | print
204 | @ 
205 | \end{frame}
206 | 
207 | 
208 | \begin{frame}[fragile]{S3 methods}
209 | <<methodsmean>>=
210 | methods("mean")
211 | @ 
212 | \end{frame}
213 | 
214 | 
215 | \begin{frame}[fragile]{An S4 generic function (from the `methods' package)}
216 | The S4 equivalent of \code{print} is \code{show}  
217 | <<showmethod>>=
218 | show
219 | @ 
220 | 
221 | The \code{show} function is usually not called directly (much like
222 | \code{print}) because objects are auto-printed
223 | \end{frame}
224 | 
225 | \begin{frame}[fragile]{S4 methods}
226 | There are many different methods for the \code{show} generic
227 | function
228 | <<showmethodsshow>>=
229 | showMethods("show")
230 | @   
231 | \end{frame}
232 | 
233 | 
234 | \begin{frame}{Generic/method mechanism}
235 | The first argument of a generic function is an object of a particular
236 | class (there may be other arguments)
237 | \begin{enumerate}
238 | \item
239 | The generic function checks the class of the object.
240 | \item
241 | A search is done to see if there is an appropriate method for
242 | that class.
243 | \item
244 | If there exists a method for that class, then that method is
245 | called on the object and we're done.
246 | \item
247 | If a method for that class does not exist, a search is done to see
248 | if there is a default method for the generic. If a default exists,
249 | then the default method is called.
250 | \item
251 | If a default method doesn't exist, then an error is thrown.
252 | \end{enumerate}
253 | \end{frame}
254 | 
255 | \begin{frame}{Examining Code for Methods}
256 | Examining the code for an S3 or S4 method requires a call to a special
257 | function
258 | \begin{itemize}
259 | \item You cannot just print the code for a method like other
260 | functions because the code for the method is usually hidden.
261 | \item If you want to see the code for an S3 method, you can use the function
262 | \code{getS3method}.
263 | \item The call is \code{getS3method(<generic>, <class>)}
264 | \item For S4 methods you can use the function \code{getMethod}
265 | \item The call is \code{getMethod(<generic>, <signature>)} (more
266 |   details later)
267 | \end{itemize}
268 | \end{frame}
269 | 
270 | 
271 | \begin{frame}[fragile]{S3 Class/Method: Example 1}
272 |  What's happening here?
273 | <<meanexample>>=
274 | set.seed(2)
275 | x <- rnorm(100)
276 | mean(x)
277 | @ 
278 | \begin{enumerate}
279 | \item
280 | The class of x is ``numeric''
281 | \item
282 | But there is no mean method for ``numeric'' objects!
283 | \item
284 | So we call the default function for \code{mean}.
285 | \end{enumerate}
286 | \end{frame}
287 | 
288 | 
289 | \begin{frame}[fragile]{S3 Class/Method: Example 1}
290 | <<showmean>>=
291 | head(getS3method("mean", "default"))
292 | tail(getS3method("mean", "default"))
293 | @ 
294 | \end{frame}
295 | 
296 | \begin{frame}[fragile]{S3 Class/Method: Example 2}
297 |   What happens here?
298 | <<dataframemean>>=
299 | set.seed(3)
300 | df <- data.frame(x = rnorm(100), y = 1:100)
301 | sapply(df, mean)
302 | @   
303 | \begin{enumerate}
304 | \item
305 | The class of df is ``data.frame''; in a data frame each column can be
306 | an object of a different class
307 | \item
308 | We \code{sapply} over the columns and call the \code{mean} function
309 | \item
310 | In each column, \code{mean} checks the class of the object and
311 | dispatches the appropriate method.
312 | \item Here we have a \code{numeric} column and an \code{integer}
313 |   column; in both cases \code{mean} calls the default method
314 | \end{enumerate}
315 | \end{frame}
316 | 
317 | 
318 | \begin{frame}{Calling Methods}
319 | NOTE: Some methods are visible to the user (e.g. \code{mean.default}),
320 | but you should \textbf{never} call methods directly. Rather, use the
321 | generic function and let the method be dispatched automatically.
322 | \end{frame}
323 | 
324 | 
325 | \begin{frame}[fragile]{S3 Class/Method: Example 3}
326 |   The \code{plot} function is generic and its behavior depends on the
327 |   object being plotted.
328 | <<plotdefault,fig=true>>=
329 | set.seed(10)
330 | x <- rnorm(100)
331 | plot(x)
332 | @ 
333 | \end{frame}
334 | 
335 | 
336 | \begin{frame}[fragile]{S3 Class/Method: Example 3}
337 |   For time series objects, \code{plot} connects the dots
338 | <<plotts,fig=true>>=
339 | set.seed(10)
340 | x <- rnorm(100)
341 | x <- as.ts(x)  ## Convert to a time series object
342 | plot(x)
343 | @ 
344 | \end{frame}
345 | 
346 | \begin{frame}{Write your own methods!}
347 | If you write new methods for new classes, you'll probably end up
348 | writing methods for the following generics:
349 | \begin{itemize}
350 | \item
351 | print/show
352 | \item
353 | summary
354 | \item
355 | plot
356 | \end{itemize}
357 | There are two ways that you can extend the R system via classes/methods
358 | \begin{itemize}
359 | \item Write a method for a new class but for an existing generic
360 |   function (e.g. \code{print})
361 | \item Write new generic functions and new methods for those generics
362 | \end{itemize}
363 | \end{frame}
364 | 
365 | 
366 | \begin{frame}{S4 Classes}
367 | Why would you want to create a new class?
368 | \begin{itemize}
369 | \item
370 | To represent new types of data (e.g. gene expression, space-time,
371 | hierarchical, sparse matrices)
372 | \item
373 | New concepts/ideas that haven't been thought of yet (e.g. a fitted
374 | point process model, mixed-effects model, a sparse matrix)
375 | \item
376 | To abstract/hide implementation details from the user
377 | \end{itemize}
378 | I say things are ``new'' meaning that R does not know about them (not
379 | that they are new to the statistical community).
380 | \end{frame}
381 | 
382 | \begin{frame}{S4 Class/Method: Creating a New Class}
383 |   A new class can be defined using the \code{setClass} function
384 | \begin{itemize}
385 | \item At a minimum you need to specify the name of the class
386 | \item You can also specify data elements that are called \textit{slots}
387 | \item You can then define methods for the class with the
388 |   \code{setMethod} function
389 | \item Information about a class definition can be obtained with the
390 |   \code{showClass} function
391 | \end{itemize}
392 | \end{frame}
393 | 
394 | \begin{frame}[fragile]{S4 Class/Method: Polygon Class}
395 | Creating new classes/methods is usually not something done at the
396 | console; you likely want to save the code in a separate file
397 | \begin{verbatim}
398 | setClass("polygon",
399 |          representation(x = "numeric",
400 |                         y = "numeric"))
401 | \end{verbatim}
402 | The slots for this class are \code{x} and \code{y}. The slots for an
403 | S4 object can be accessed with the \code{@} operator.
404 | \end{frame}
405 | 
406 | \begin{frame}[fragile]{S4 Class/Method: Polygon Class}
407 | A \code{plot} method can be created with the \code{setMethod}
408 | function.
409 | \begin{itemize}
410 | \item For \code{setMethod} you need to specify a generic function
411 |   (\code{plot}), and a \textit{signature}.
412 | \item A signature is a character vector indicating the classes of
413 |   objects that are accepted by the method. In this case, the
414 |   \code{plot} method will take one type of object, a \code{polygon}
415 |   object.
416 | \end{itemize}
417 | \begin{verbatim}
418 | setMethod("plot", "polygon",
419 |           function(x, y, ...) {
420 |                   plot(x@x, x@y, type = "n", ...)
421 |                   xp <- c(x@x, x@x[1])
422 |                   yp <- c(x@y, x@y[1])
423 |                   lines(xp, yp)
424 |           })
425 | \end{verbatim}
426 | Notice that the slots of the polygon (the x- and y-coordinates) are
427 | accessed with the \code{@} operator.
428 | \end{frame}
429 | 
430 | \begin{frame}[fragile]{S4 Class/Method: Polygon Class}
431 |   Create a new class
432 | <<createpolygonclass>>=
433 | setClass("polygon",
434 |          representation(x = "numeric",
435 |                         y = "numeric"))
436 | @ 
437 | 
438 | Create a plot method for this class
439 | <<createplotmethod>>=
440 | setMethod("plot", "polygon",
441 |           function(x, y, ...) {
442 |                   plot(x@x, x@y, type = "n", ...)
443 |                   xp <- c(x@x, x@x[1])
444 |                   yp <- c(x@y, x@y[1])
445 |                   lines(xp, yp)
446 |           })
447 | @ 
448 | 
449 | If things go well, you will not get any messages or errors and nothing
450 | useful will be returned by either \code{setClass} or \code{setMethod}.
451 | \end{frame}
452 | 
453 | \begin{frame}[fragile]{S4 Class/Method: Polygon Class}
454 | After calling \code{setMethod} the new \code{plot} method will be
455 | added to the list of methods for \code{plot}.
456 | <<showMethods>>=
457 | showMethods("plot")
458 | @   
459 | 
460 | Notice that the signature for class \code{polygon} is listed.  The
461 | method for \code{ANY} is the default method and it is what is called
462 | when no other signature matches.
463 | \end{frame}
464 | 
465 | \begin{frame}[fragile]{S4 Class/Method: Polygon class}
466 | <<showplotpolygon,fig=true>>=
467 | p <- new("polygon", x = c(1, 2, 3, 4), y = c(1, 2, 3, 1))
468 | plot(p)
469 | @ 
470 | \end{frame}
471 | 
472 | 
473 | \begin{frame}{Where to Look, Places to Start}
474 | \begin{itemize}
475 | \item
476 | The best way to learn this stuff is to look at examples (and try the
477 | exercises for the course)
478 | \item
479 | There are now quite a few examples on CRAN which use S4
480 | classes/methods.
481 | \item
482 | Bioconductor (http://www.bioconductor.org) --- a rich
483 | resource, even if you know nothing about bioinformatics
484 | \item
485 | Some packages on CRAN (as far as I know) --- SparseM,
486 | gpclib, flexmix, its, lme4, orientlib, pixmap
487 | \item
488 | The \code{stats4} package (comes with R) has a bunch of
489 | classes/methods for doing maximum likelihood analysis.
490 | \end{itemize}
491 | \end{frame}
492 | 
493 | 
494 | 
495 | \end{document}
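
The two ways of extending R described in the slides (a new method for an
existing generic, and a new generic with its own methods) can be
sketched side by side. This is only an illustration: the "circle" class
and the `area` generic are made up for the example, and the `show`
method for "polygon" is not part of the lecture code.

```r
library(methods)

## S3: a hypothetical "circle" class with a method for an existing generic
circle <- function(r) structure(list(r = r), class = "circle")
print.circle <- function(x, ...) {
        cat("circle with radius", x$r, "\n")
        invisible(x)
}

## S3: a new generic function with a class method and a default method
area <- function(shape, ...) UseMethod("area")
area.circle <- function(shape, ...) pi * shape$r^2
area.default <- function(shape, ...) stop("no area method for this class")

c1 <- circle(2)
print(c1)     ## dispatches to print.circle
area(c1)      ## dispatches to area.circle

## S4: the polygon class from the slides, plus a show method
setClass("polygon",
         representation(x = "numeric",
                        y = "numeric"))
setMethod("show", "polygon",
          function(object) {
                  cat("polygon with", length(object@x), "vertices\n")
          })

p <- new("polygon", x = c(1, 2, 3, 4), y = c(1, 2, 3, 1))
print(p)      ## printing an S4 object calls its show method

## Looking at the code for a method
getS3method("area", "circle")
getMethod("show", "polygon")
```

Calling `area` on an object with no matching method falls through to
`area.default`, which matches the dispatch steps listed in the
``Generic/method mechanism'' slide.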
496 | 


--------------------------------------------------------------------------------
/connections.tex:
--------------------------------------------------------------------------------
 1 | \documentclass[aspectratio=169]{beamer}
 2 | 
 3 | \mode<presentation>
 4 | {
 5 |   \usetheme{Warsaw}
 6 |   % or ...
 7 | 
 8 |   \setbeamercovered{transparent}
 9 |   % or whatever (possibly just delete it)
10 | }
11 | 
12 | 
13 | \usepackage[english]{babel}
14 | \usepackage[latin1]{inputenc}
15 | \usepackage{graphicx}
16 | %\usepackage{times}
17 | %\usepackage[T1]{fontenc}
18 | % Or whatever. Note that the encoding and the font should match. If T1
19 | % does not look nice, try deleting the line with the fontenc.
20 | 
21 | \usepackage{amsmath,amsfonts,amssymb}
22 | 
23 | \input{macros}
24 | 
25 | \title[The R Language]{Introduction to the R Language}
26 | 
27 | \subtitle{Connections}
28 | 
29 | \date{Computing for Data Analysis}
30 | 
31 | \setbeamertemplate{footline}[page number]
32 | 
33 | 
34 | \begin{document}
35 | 
36 | \begin{frame}
37 |   \titlepage
38 | \end{frame}
39 | 
40 | 
41 | \begin{frame}{Interfaces to the Outside World}
42 | Data are read in using \textit{connection} interfaces.  Connections
43 | can be made to files (most common) or to other more exotic things.
44 | \begin{itemize}
45 | \item
46 | \code{file}, opens a connection to a file
47 | \item
48 | \code{gzfile}, opens a connection to a file compressed with gzip
49 | \item
50 | \code{bzfile}, opens a connection to a file compressed with bzip2
51 | \item
52 | \code{url}, opens a connection to a webpage
53 | \end{itemize}
54 | \end{frame}
55 | 
56 | 
57 | \begin{frame}[fragile]{File Connections}
58 | \begin{verbatim}
59 | > str(file)
60 | function (description = "", open = "", blocking = TRUE, 
61 |           encoding = getOption("encoding"))
62 | \end{verbatim}
63 | \begin{itemize}
64 | \item
65 | \code{description} is the name of the file
66 | \item
67 | \code{open} is a code indicating
68 | \begin{itemize}
69 | \item
70 | ``r'' read only
71 | \item
72 | ``w'' writing (and initializing a new file)
73 | \item
74 | ``a'' appending
75 | \item
76 | ``rb'', ``wb'', ``ab'' reading, writing, or appending in binary mode
77 | (Windows)
78 | \end{itemize}
79 | \end{itemize}
80 | \end{frame}
81 | 
82 | 
83 | \begin{frame}[fragile]{Connections}
84 | In general, connections are powerful tools that let you navigate files
85 | or other external objects.  In practice, we often don't need to deal
86 | with the connection interface directly.
87 | \begin{verbatim}
88 | con <- file("foo.txt", "r")
89 | data <- read.csv(con)
90 | close(con)
91 | \end{verbatim}
92 | is the same as
93 | \begin{verbatim}
94 | data <- read.csv("foo.txt")
95 | \end{verbatim}
96 | \end{frame}
97 | 
98 | \end{document}
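
Explicit connections are useful when only part of a file is needed or
when the file is compressed. Here is a minimal sketch, assuming a small
made-up data set written to a temporary file; the file name and its
contents are not from the lecture.

```r
## Write a few lines to a gzip-compressed temporary file
path <- tempfile(fileext = ".csv.gz")
con <- gzfile(path, "w")
writeLines(c("id,value", "1,10", "2,20", "3,30"), con)
close(con)

## Read only the first two lines through a connection
con <- gzfile(path, "rt")
first <- readLines(con, n = 2)
close(con)
first

## Functions such as read.csv accept an open connection as well
con <- gzfile(path, "rt")
dat <- read.csv(con)
close(con)
dat

## Clean up the temporary file
unlink(path)
```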
99 | 


--------------------------------------------------------------------------------
/controlstructures.tex:
--------------------------------------------------------------------------------
  1 | \documentclass[aspectratio=169]{beamer}
  2 | 
  3 | \mode<presentation>
  4 | {
  5 |   \usetheme{Warsaw}
  6 |   % or ...
  7 | 
  8 |   \setbeamercovered{transparent}
  9 |   % or whatever (possibly just delete it)
 10 | }
 11 | 
 12 | 
 13 | \usepackage[english]{babel}
 14 | \usepackage[latin1]{inputenc}
 15 | \usepackage{graphicx}
 16 | %\usepackage{times}
 17 | %\usepackage[T1]{fontenc}
 18 | % Or whatever. Note that the encoding and the font should match. If T1
 19 | % does not look nice, try deleting the line with the fontenc.
 20 | 
 21 | \usepackage{amsmath,amsfonts,amssymb}
 22 | 
 23 | \input{macros}
 24 | 
 25 | \title[The R Language]{Introduction to the R Language}
 26 | 
 27 | \subtitle{Control Structures}
 28 | 
 29 | \date{Computing for Data Analysis}
 30 | 
 31 | \setbeamertemplate{footline}[page number]
 32 | 
 33 | 
 34 | 
 35 | 
 36 | \begin{document}
 37 | 
 38 | \begin{frame}
 39 |   \titlepage
 40 | \end{frame}
 41 | 
 42 | \begin{frame}{Control Structures}
 43 | Control structures in R allow you to control the flow of execution of
 44 | the program, depending on runtime conditions.  Common structures are
 45 | \begin{itemize}
 46 | \item
 47 | \code{if}, \code{else}: testing a condition
 48 | \item
 49 | \code{for}: execute a loop a fixed number of times
 50 | \item
 51 | \code{while}: execute a loop \textit{while} a condition is true
 52 | \item
 53 | \code{repeat}:  execute an infinite loop
 54 | \item
 55 | \code{break}: break the execution of a loop
 56 | \item
 57 | \code{next}: skip an iteration of a loop
 58 | \item
 59 | \code{return}: exit a function
 60 | \end{itemize}
 61 | Most control structures are not used in interactive sessions, but
 62 | rather when writing functions or longer expressions.
 63 | \end{frame}
 64 | 
 65 | 
 66 | \begin{frame}[fragile]{Control Structures: if}
 67 | \begin{verbatim}
 68 | if(<condition>) {
 69 |         ## do something
 70 | } else {
 71 |         ## do something else
 72 | }
 73 | 
 74 | if(<condition1>) {
 75 |         ## do something
 76 | } else if(<condition2>)  {
 77 |         ## do something different
 78 | } else {
 79 |         ## do something different
 80 | }
 81 | \end{verbatim}
 82 | \end{frame}
 83 | 
 84 | \begin{frame}[fragile]{if}
 85 | This is a valid if/else structure.
 86 | \begin{verbatim}
 87 | if(x > 3) {
 88 |         y <- 10
 89 | } else {
 90 |         y <- 0
 91 | }
 92 | \end{verbatim}
 93 | So is this one.
 94 | \begin{verbatim}
 95 | y <- if(x > 3) {
 96 |         10
 97 | } else {
 98 |         0
 99 | }
100 | \end{verbatim}
101 | \end{frame}
102 | 
103 | \begin{frame}[fragile]{if}
104 | Of course, the \code{else} clause is not necessary.
105 | \begin{verbatim}
106 | if(<condition1>) {
107 | 
108 | }
109 | 
110 | if(<condition2>) {
111 | 
112 | }
113 | \end{verbatim}
114 | \end{frame}
115 | 
116 | 
117 | \begin{frame}[fragile]{for}
118 | \code{for} loops take an iterator variable and assign it successive
119 | values from a sequence or vector.  For loops are most commonly used
120 | for iterating over the elements of an object (list, vector, etc.)
121 | \begin{verbatim}
122 | for(i in 1:10) {
123 |         print(i)
124 | }
125 | \end{verbatim}
126 | This loop takes the \code{i} variable and in each iteration of the
127 | loop gives it values 1, 2, 3, ..., 10, and then exits.
128 | \end{frame}
129 | 
130 | \begin{frame}[fragile]{for}
131 | These four loops have the same behavior.
132 | \begin{verbatim}
133 | x <- c("a", "b", "c", "d")
134 | 
135 | for(i in 1:4) {
136 |         print(x[i])
137 | }
138 | 
139 | for(i in seq_along(x)) {
140 |         print(x[i])
141 | }
142 | 
143 | for(letter in x) {
144 |         print(letter)
145 | }
146 | 
147 | for(i in 1:4) print(x[i])
148 | \end{verbatim}
149 | \end{frame}
150 | 
151 | \begin{frame}[fragile]{Nested for loops}
152 | \code{for} loops can be nested.
153 | \begin{verbatim}
154 | x <- matrix(1:6, 2, 3)
155 | 
156 | for(i in seq_len(nrow(x))) {
157 |         for(j in seq_len(ncol(x))) {
158 |                 print(x[i, j])
159 |         }
160 | }
161 | \end{verbatim}
162 | Be careful with nesting though.  Nesting beyond 2--3 levels is often
163 | very difficult to read/understand.
164 | \end{frame}
165 | 
166 | \begin{frame}[fragile]{while}
167 | While loops begin by testing a condition.  If it is true, then they
168 | execute the loop body.  Once the loop body is executed, the condition
169 | is tested again, and so forth.
170 | \begin{verbatim}
171 | count <- 0
172 | 
173 | while(count < 10) {
174 |         print(count)
175 |         count <- count + 1
176 | }
177 | \end{verbatim}
178 | While loops can potentially result in infinite loops if not written
179 | properly.  Use with care!
180 | \end{frame}
181 | 
182 | \begin{frame}[fragile]{while}
183 | Sometimes there will be more than one condition in the test.
184 | \begin{verbatim}
185 | z <- 5
186 | 
187 | while(z >= 3 && z <= 10) {
188 |         print(z)
189 |         coin <- rbinom(1, 1, 0.5)
190 | 
191 |         if(coin == 1) {  ## random walk
192 |                 z <- z + 1
193 |         } else {
194 |                 z <- z - 1
195 |         }
196 | }
197 | \end{verbatim}
198 | Conditions are always evaluated from left to right.
199 | \end{frame}
200 | 
201 | \begin{frame}[fragile]{repeat}
202 | Repeat initiates an infinite loop; these are not commonly used in
203 | statistical applications but they do have their uses.  The only way to
204 | exit a \code{repeat} loop is to call \code{break}.
205 | \begin{verbatim}
206 | x0 <- 1
207 | tol <- 1e-8
208 | 
209 | repeat {
210 |         x1 <- computeEstimate()
211 | 
212 |         if(abs(x1 - x0) < tol) {
213 |                 break
214 |         } else {
215 |                 x0 <- x1
216 |         }
217 | }
218 | \end{verbatim}
219 | \end{frame}
220 | 
221 | \begin{frame}{repeat}
222 | The loop in the previous slide is a bit dangerous because there's no
223 | guarantee it will stop.  Better to set a hard limit on the number of
224 | iterations (e.g. using a for loop) and then report whether convergence
225 | was achieved or not.
226 | \end{frame}
227 | 
228 | \begin{frame}[fragile]{next, return}
229 | \code{next} is used to skip an iteration of a loop
230 | \begin{verbatim}
231 | for(i in 1:100) {
232 |         if(i <= 20) {
233 |                 ## Skip the first 20 iterations
234 |                 next
235 |         }
236 |         ## Do something here
237 | }
238 | \end{verbatim}
239 | \code{return} signals that a function should exit and return a given
240 | value
241 | \end{frame}
242 | 
243 | 
244 | \begin{frame}{Control Structures}
245 | Summary
246 | \begin{itemize}
247 |   \item Control structures like \code{if}, \code{while}, and \code{for}
248 |     allow you to control the flow of an R program
249 |   \item Infinite loops should generally be avoided, even if they are
250 |     theoretically correct.
251 |   \item Control structures mentioned here are primarily useful for
252 |     writing programs; for command-line interactive work, the *apply
253 |     functions are more useful.
254 | \end{itemize}
255 | \end{frame}
256 | 
257 | 
258 | 
259 | 
260 | \end{document}
261 | 
262 | 
263 | 


--------------------------------------------------------------------------------
/datatypes1.tex:
--------------------------------------------------------------------------------
  1 | \documentclass[aspectratio=169]{beamer}
  2 | 
  3 | \mode<presentation>
  4 | {
  5 |   \usetheme{Warsaw}
  6 |   % or ...
  7 | 
  8 |   \setbeamercovered{transparent}
  9 |   % or whatever (possibly just delete it)
 10 | }
 11 | 
 12 | 
 13 | \usepackage[english]{babel}
 14 | \usepackage[latin1]{inputenc}
 15 | \usepackage{graphicx}
 16 | %\usepackage{times}
 17 | %\usepackage[T1]{fontenc}
 18 | % Or whatever. Note that the encoding and the font should match. If T1
 19 | % does not look nice, try deleting the line with the fontenc.
 20 | 
 21 | \usepackage{amsmath,amsfonts,amssymb}
 22 | 
 23 | \input{macros}
 24 | 
 25 | \title[The R Language]{Introduction to the R Language}
 26 | 
 27 | \subtitle{Data Types and Basic Operations}
 28 | 
 29 | %\author{Roger D. Peng}
 30 | % - Give the names in the same order as the appear in the paper.
 31 | % - Use the \inst{?} command only if the authors have different
 32 | %   affiliation.
 33 | 
 34 | %\institute{
 35 | %  \inst{1}%
 36 | %  Department Biostatistics\\
 37 | %  Johns Hopkins Bloomberg School of Public Health
 38 | %  \and
 39 | %  \inst{2}%
 40 | %  Department of Preventive Medicine\\
 41 | %  Feinberg School of Medicine, Northwestern University
 42 | %}
 43 | % - Use the \inst command only if there are several affiliations.
 44 | % - Keep it simple, no one is interested in your street address.
 45 | 
 46 | \date{Computing for Data Analysis}
 47 | 
 48 | \setbeamertemplate{footline}[page number]
 49 | 
 50 | 
 51 | \begin{document}
 52 | 
 53 | \begin{frame}
 54 |   \titlepage
 55 | \end{frame}
 56 | 
 57 | 
 58 | \begin{frame}{Objects}
 59 | R has five basic or ``atomic'' classes of objects:
 60 | \begin{itemize}
 61 | \item
 62 | character
 63 | \item
 64 | numeric (real numbers)
 65 | \item
 66 | integer
 67 | \item
 68 | complex
 69 | \item
 70 | logical (\code{TRUE}/\code{FALSE})
 71 | \end{itemize}
 72 | The most basic object is a vector
 73 | \begin{itemize}
 74 | \item
 75 | A vector can only contain objects of the same class
 76 | \item
 77 | BUT: The one exception is a \textit{list}, which is represented as a
 78 | vector but can contain objects of different classes (indeed, that's
 79 | usually why we use them)
 80 | \end{itemize}
 81 | Empty vectors can be created with the \code{vector()} function.
 82 | \end{frame}
 83 | 
 84 | \begin{frame}{Numbers}
 85 | \begin{itemize}
 86 | \item
 87 | Numbers in R are generally treated as numeric objects (i.e. double
 88 | precision real numbers)
 89 | \item
 90 | If you explicitly want an integer, you need to specify the \code{L}
 91 | suffix
 92 | \item
 93 | Ex: Entering \code{1} gives you a numeric object; entering \code{1L}
 94 | explicitly gives you an integer. 
 95 | \item
 96 | There is also a special number \code{Inf} which represents infinity;
 97 | e.g.  \code{1 / 0}; \code{Inf} can be used in ordinary calculations;
 98 | e.g. \code{1 / Inf} is 0
 99 | \item
100 | The value \code{NaN} represents an undefined value (``not a number'');
101 | e.g. \code{0 / 0}; \code{NaN} can also be thought of as a missing value (more
102 | on that later)
103 | \end{itemize}
104 | \end{frame}
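
\begin{frame}[fragile]{Numbers}
The points above can be checked directly at the prompt; a short sketch
(the expected results are given as comments):
\begin{verbatim}
class(1)        ## "numeric"
class(1L)       ## "integer"
1 / 0           ## Inf
1 / Inf         ## 0
is.nan(0 / 0)   ## TRUE
\end{verbatim}
\end{frame}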
105 | 
106 | \begin{frame}{Attributes}
107 | R objects can have attributes
108 | \begin{itemize}
109 | \item
110 | names, dimnames
111 | \item
112 | dimensions (e.g. matrices, arrays)
113 | \item
114 | class
115 | \item
116 | length
117 | \item
118 | other user-defined attributes/metadata
119 | \end{itemize}
120 | Attributes of an object can be accessed using the \code{attributes()}
121 | function.
122 | \end{frame}
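
\begin{frame}[fragile]{Attributes}
A brief sketch of inspecting and setting attributes.  The attribute
name \code{units} is a made-up example of user-defined metadata.
\begin{verbatim}
m <- matrix(1:6, nrow = 2, ncol = 3)
attributes(m)               ## a list containing $dim
attr(m, "units") <- "cm"    ## hypothetical metadata
attr(m, "units")            ## "cm"
names(attributes(m))        ## includes "dim" and "units"
\end{verbatim}
\end{frame}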
123 | 
124 | \begin{frame}[fragile]{Entering Input}
125 | At the R prompt we type \textit{expressions}.  The \code{<-} symbol is
126 | the assignment operator.
127 | \begin{verbatim}
128 | > x <- 1
129 | > print(x)
130 | [1] 1
131 | > x
132 | [1] 1
133 | > msg <- "hello"
134 | \end{verbatim}
135 | The grammar of the language determines whether an expression is
136 | complete or not.
137 | \begin{verbatim}
138 | > x <-  ## Incomplete expression
139 | \end{verbatim}
140 | The \code{\#} character indicates a \textit{comment}.  Anything to the
141 | right of the \code{\#} (including the \code{\#} itself) is ignored.
142 | \end{frame}
143 | 
144 | \begin{frame}[fragile]{Evaluation}
145 | When a complete expression is entered at the prompt, it is
146 | \textit{evaluated} and the result of the evaluated expression is
147 | returned.  The result may be \textit{auto-printed}.
148 | \begin{verbatim}
149 | > x <- 5  ## nothing printed
150 | > x       ## auto-printing occurs
151 | [1] 5
152 | > print(x)  ## explicit printing
153 | [1] 5
154 | \end{verbatim}
155 | The \code{[1]} indicates that \code{x} is a vector and 5 is the first
156 | element.
157 | \end{frame}
158 | 
159 | 
160 | \begin{frame}[fragile]{Printing}
161 | \begin{verbatim}
162 | > x <- 1:20
163 | > x
164 |  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
165 | [16] 16 17 18 19 20
166 | \end{verbatim}
167 | The \code{:} operator is used to create integer sequences.
168 | \end{frame}
169 | 
170 | \begin{frame}[fragile]{Creating Vectors}
171 | The \code{c()} function can be used to create vectors of objects.
172 | \begin{verbatim}
173 | > x <- c(0.5, 0.6)       ## numeric
174 | > x <- c(TRUE, FALSE)    ## logical
175 | > x <- c(T, F)           ## logical
176 | > x <- c("a", "b", "c")  ## character
177 | > x <- 9:29              ## integer
178 | > x <- c(1+0i, 2+4i)     ## complex
179 | \end{verbatim}
180 | Using the \code{vector()} function
181 | \begin{verbatim}
182 | > x <- vector("numeric", length = 10)
183 | > x
184 |  [1] 0 0 0 0 0 0 0 0 0 0
185 | \end{verbatim}
186 | \end{frame}
187 | 
188 | \begin{frame}[fragile]{Mixing Objects}
189 | What about the following?
190 | \begin{verbatim}
191 | > y <- c(1.7, "a")   ## character
192 | > y <- c(TRUE, 2)    ## numeric
193 | > y <- c("a", TRUE)  ## character
194 | \end{verbatim}
195 | When different objects are mixed in a vector, \textit{coercion} occurs
196 | so that every element in the vector is of the same class.
197 | \end{frame}
198 | 
199 | \begin{frame}[fragile]{Explicit Coercion}
200 | Objects can be explicitly coerced from one class to another using the
201 | \code{as.*} functions, if available.
202 | \begin{verbatim}
203 | > x <- 0:6
204 | > class(x)
205 | [1] "integer"
206 | > as.numeric(x)
207 | [1] 0 1 2 3 4 5 6
208 | > as.logical(x)
209 | [1] FALSE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
210 | > as.character(x)
211 | [1] "0" "1" "2" "3" "4" "5" "6"
212 | > as.complex(x)
213 | [1] 0+0i 1+0i 2+0i 3+0i 4+0i 5+0i 6+0i
214 | \end{verbatim}
215 | \end{frame}
216 | 
217 | \begin{frame}[fragile]{Explicit Coercion}
218 | Nonsensical coercion results in \code{NA}s.
219 | \begin{verbatim}
220 | > x <- c("a", "b", "c")
221 | > as.numeric(x)
222 | [1] NA NA NA
223 | Warning message:
224 | NAs introduced by coercion 
225 | > as.logical(x)
226 | [1] NA NA NA
227 | \end{verbatim}
228 | \end{frame}
229 | 
230 | 
231 | \begin{frame}[fragile]{Matrices}
232 | Matrices are vectors with a \textit{dimension} attribute.  The
233 | dimension attribute is itself an integer vector of length 2 (nrow,
234 | ncol)
235 | \begin{verbatim}
236 | > m <- matrix(nrow = 2, ncol = 3)
237 | > m
238 |      [,1] [,2] [,3]
239 | [1,]   NA   NA   NA
240 | [2,]   NA   NA   NA
241 | > dim(m)
242 | [1] 2 3
243 | > attributes(m)
244 | $dim
245 | [1] 2 3
246 | \end{verbatim}
247 | \end{frame}
248 | 
249 | \begin{frame}[fragile]{Matrices (cont'd)}
250 | Matrices are constructed \textit{column-wise}, so entries can be
251 | thought of starting in the ``upper left'' corner and running down the
252 | columns.
253 | \begin{verbatim}
254 | > m <- matrix(1:6, nrow = 2, ncol = 3)
255 | > m
256 |      [,1] [,2] [,3]
257 | [1,]    1    3    5
258 | [2,]    2    4    6
259 | \end{verbatim}
260 | \end{frame}
261 | 
262 | \begin{frame}[fragile]{Matrices (cont'd)}
263 | Matrices can also be created directly from vectors by adding a
264 | dimension attribute.
265 | \begin{verbatim}
266 | > m <- 1:10
267 | > m
268 |  [1]  1  2  3  4  5  6  7  8  9 10
269 | > dim(m) <- c(2, 5)
270 | > m
271 |      [,1] [,2] [,3] [,4] [,5]
272 | [1,]    1    3    5    7    9
273 | [2,]    2    4    6    8   10
274 | \end{verbatim}
275 | \end{frame}
276 | 
277 | \begin{frame}[fragile]{cbind-ing and rbind-ing}
278 | Matrices can be created by \textit{column-binding} or
279 | \textit{row-binding} with \code{cbind()} and \code{rbind()}.
280 | \begin{verbatim}
281 | > x <- 1:3
282 | > y <- 10:12
283 | > cbind(x, y)
284 |      x  y
285 | [1,] 1 10
286 | [2,] 2 11
287 | [3,] 3 12
288 | > rbind(x, y)
289 |   [,1] [,2] [,3]
290 | x    1    2    3
291 | y   10   11   12
292 | \end{verbatim}
293 | \end{frame}
294 | 
295 | \begin{frame}[fragile]{Lists}
296 | Lists are a special type of vector that can contain elements of
297 | different classes.  Lists are a very important data type in R and you
298 | should get to know them well.
299 | \begin{verbatim}
300 | > x <- list(1, "a", TRUE, 1 + 4i)
301 | > x
302 | [[1]]
303 | [1] 1
304 | 
305 | [[2]]
306 | [1] "a"
307 | 
308 | [[3]]
309 | [1] TRUE
310 | 
311 | [[4]]
312 | [1] 1+4i
313 | \end{verbatim}
314 | \end{frame}
315 | 
316 | \begin{frame}{Factors}
317 | Factors are used to represent categorical data.  Factors can be
318 | unordered or ordered.  One can think of a factor as an integer vector
319 | where each integer has a \textit{label}.
320 | \begin{itemize}
321 | \item
322 | Factors are treated specially by modelling functions like \code{lm()}
323 | and \code{glm()}
324 | \item
325 | Using factors with labels is \textit{better} than using integers
326 | because factors are self-describing; having a variable that has values
327 | ``Male'' and ``Female'' is better than a variable that has values 1
328 | and 2.
329 | \end{itemize}
330 | \end{frame}
331 | 
332 | \begin{frame}[fragile]{Factors}
333 | \begin{verbatim}
334 | > x <- factor(c("yes", "yes", "no", "yes", "no"))
335 | > x
336 | [1] yes yes no  yes no 
337 | Levels: no yes
338 | > table(x)
339 | x
340 |  no yes 
341 |   2   3 
342 | > unclass(x)
343 | [1] 2 2 1 2 1
344 | attr(,"levels")
345 | [1] "no"  "yes"
346 | \end{verbatim}
347 | \end{frame}
348 | 
349 | 
350 | \begin{frame}[fragile]{Factors}
351 | The order of the levels can be set using the \code{levels} argument to
352 | \code{factor()}.  This can be important in linear modelling because
353 | the first level is used as the baseline level.
354 | \begin{verbatim}
355 | > x <- factor(c("yes", "yes", "no", "yes", "no"), 
356 |               levels = c("yes", "no"))
357 | > x
358 | [1] yes yes no  yes no 
359 | Levels: yes no
360 | \end{verbatim}
361 | \end{frame}
362 | 
363 | \begin{frame}[fragile]{Missing Values}
364 | Missing values are denoted by \code{NA}; the results of undefined
365 | mathematical operations are denoted by \code{NaN}.
366 | \begin{itemize}
367 | \item
368 | \code{is.na()} is used to test whether objects are \code{NA}
369 | \item
370 | \code{is.nan()} is used to test for \code{NaN}
371 | \item
372 | \code{NA} values have a class also, so there are integer \code{NA},
373 | character \code{NA}, etc.  
374 | \item
375 | A \code{NaN} value is also \code{NA} but the converse is not true
376 | \end{itemize}
377 | \end{frame}
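
\begin{frame}[fragile]{Missing Values}
A short sketch of the last two points, using the typed \code{NA}
constants that base R provides (expected results as comments):
\begin{verbatim}
class(NA)              ## "logical"
class(NA_integer_)     ## "integer"
class(NA_character_)   ## "character"
is.na(NaN)             ## TRUE:  NaN is also NA
is.nan(NA)             ## FALSE: NA is not NaN
\end{verbatim}
\end{frame}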
378 | 
379 | \begin{frame}[fragile]{Missing Values}
380 | \begin{verbatim}
381 | > x <- c(1, 2, NA, 10, 3)
382 | > is.na(x)
383 | [1] FALSE FALSE  TRUE FALSE FALSE
384 | > is.nan(x)
385 | [1] FALSE FALSE FALSE FALSE FALSE
386 | > x <- c(1, 2, NaN, NA, 4)
387 | > is.na(x)
388 | [1] FALSE FALSE  TRUE  TRUE FALSE
389 | > is.nan(x)
390 | [1] FALSE FALSE  TRUE FALSE FALSE
391 | \end{verbatim}
392 | \end{frame}
393 | 
394 | \begin{frame}{Data Frames}
395 | Data frames are used to store tabular data
396 | \begin{itemize}
397 | \item
398 | They are represented as a special type of list where every element of
399 | the list has to have the same length
400 | \item
401 | Each element of the list can be thought of as a column and the length
402 | of each element of the list is the number of rows
403 | \item
404 | Unlike matrices, data frames can store different classes of objects in
405 | each column (just like lists); matrices must have every element be the
406 | same class
407 | \item
408 | Data frames also have a special attribute called \code{row.names}
409 | \item
410 | Data frames are usually created by calling \code{read.table()} or
411 | \code{read.csv()}
412 | \item
413 | Can be converted to a matrix by calling \code{data.matrix()}
414 | \end{itemize}
415 | \end{frame}
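
\begin{frame}[fragile]{Data Frames}
A small sketch of the points above, using a made-up data frame
\code{df}; note that \code{data.matrix()} replaces a factor by its
integer codes.
\begin{verbatim}
df <- data.frame(id = 1:3,
                 grp = factor(c("a", "b", "a")))
sapply(df, class)     ## each column keeps its own class
row.names(df)         ## "1" "2" "3"
m <- data.matrix(df)  ## factor becomes integer codes
dim(m)                ## 3 2
m[, "grp"]            ## 1 2 1
\end{verbatim}
\end{frame}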
416 | 
417 | \begin{frame}[fragile]{Data Frames}
418 | \begin{verbatim}
419 | > x <- data.frame(foo = 1:4, bar = c(T, T, F, F))
420 | > x
421 |   foo   bar
422 | 1   1  TRUE
423 | 2   2  TRUE
424 | 3   3 FALSE
425 | 4   4 FALSE
426 | > nrow(x)
427 | [1] 4
428 | > ncol(x)
429 | [1] 2
430 | \end{verbatim}
431 | \end{frame}
432 | 
433 | \begin{frame}[fragile]{Names}
434 | R objects can also have names, which is very useful for writing
435 | readable code and self-describing objects.
436 | \begin{verbatim}
437 | > x <- 1:3
438 | > names(x)
439 | NULL
440 | > names(x) <- c("foo", "bar", "norf")
441 | > x
442 |  foo  bar norf 
443 |    1    2    3 
444 | > names(x)
445 | [1] "foo"  "bar"  "norf"
446 | \end{verbatim}
447 | \end{frame}
448 | 
449 | \begin{frame}[fragile]{Names}
450 | Lists can also have names.
451 | \begin{verbatim}
452 | > x <- list(a = 1, b = 2, c = 3)
453 | > x
454 | $a
455 | [1] 1
456 | 
457 | $b
458 | [1] 2
459 | 
460 | $c
461 | [1] 3
462 | \end{verbatim}
463 | \end{frame}
464 | 
465 | \begin{frame}[fragile]{Names}
466 | And matrices.
467 | \begin{verbatim}
468 | > m <- matrix(1:4, nrow = 2, ncol = 2)
469 | > dimnames(m) <- list(c("a", "b"), c("c", "d"))
470 | > m
471 |   c d
472 | a 1 3
473 | b 2 4
474 | \end{verbatim}
475 | \end{frame}
476 | 
477 | \begin{frame}[fragile]{Summary}
478 | Data Types
479 | \begin{itemize}
480 | \item atomic classes: numeric, logical, character, integer, complex
481 | \item vectors, lists
482 | \item factors
483 | \item missing values
484 | \item data frames
485 | \item names
486 | \end{itemize}
487 | \end{frame}
488 | 
489 | 
490 | 
491 | \end{document}
492 | 
493 | 
494 | 


--------------------------------------------------------------------------------
/datatypes2.tex:
--------------------------------------------------------------------------------
  1 | \documentclass[aspectratio=169]{beamer}
  2 | 
  3 | \mode<presentation>
  4 | {
  5 |   \usetheme{Warsaw}
  6 |   % or ...
  7 | 
  8 |   \setbeamercovered{transparent}
  9 |   % or whatever (possibly just delete it)
 10 | }
 11 | 
 12 | 
 13 | \usepackage[english]{babel}
 14 | \usepackage[latin1]{inputenc}
 15 | \usepackage{graphicx}
 16 | %\usepackage{times}
 17 | %\usepackage[T1]{fontenc}
 18 | % Or whatever. Note that the encoding and the font should match. If T1
 19 | % does not look nice, try deleting the line with the fontenc.
 20 | 
 21 | \usepackage{amsmath,amsfonts,amssymb}
 22 | 
 23 | \input{macros}
 24 | 
 25 | \title[The R Language]{Introduction to the R Language}
 26 | 
 27 | \subtitle{Data Types and Basic Operations}
 28 | 
 29 | %\author{Roger D. Peng}
 30 | % - Give the names in the same order as the appear in the paper.
 31 | % - Use the \inst{?} command only if the authors have different
 32 | %   affiliation.
 33 | 
 34 | %\institute{
 35 | %  \inst{1}%
 36 | %  Department Biostatistics\\
 37 | %  Johns Hopkins Bloomberg School of Public Health
 38 | %  \and
 39 | %  \inst{2}%
 40 | %  Department of Preventive Medicine\\
 41 | %  Feinberg School of Medicine, Northwestern University
 42 | %}
 43 | % - Use the \inst command only if there are several affiliations.
 44 | % - Keep it simple, no one is interested in your street address.
 45 | 
 46 | \date{Computing for Data Analysis}
 47 | 
 48 | \setbeamertemplate{footline}[page number]
 49 | 
 50 | 
 51 | \begin{document}
 52 | 
 53 | \begin{frame}
 54 |   \titlepage
 55 | \end{frame}
 56 | 
 57 | 
 58 | 
 59 | 
 60 | 
 61 | 
 62 | 
 63 | 
 64 | \begin{frame}[fragile]{Subsetting}
 65 | There are a number of operators that can be used to extract subsets of
 66 | R objects.
 67 | \begin{itemize}
 68 | \item
 69 | \verb+[+ always returns an object of the same class as the original;
 70 | can be used to select more than one element (there is one exception)
 71 | \item
 72 | \verb+[[+ is used to extract elements of a list or a data frame; it
 73 | can only be used to extract a single element and the class of the
 74 | returned object will not necessarily be a list or data frame
 75 | \item
 76 | \verb+$+ is used to extract elements of a list or data frame by name;
 77 | semantics are similar to that of \verb+[[+.
 78 | \end{itemize}
 79 | \end{frame}
 80 | 
 81 | \begin{frame}[fragile]{Subsetting}
 82 | \begin{verbatim}
 83 | > x <- c("a", "b", "c", "c", "d", "a")
 84 | > x[1]
 85 | [1] "a"
 86 | > x[2]
 87 | [1] "b"
 88 | > x[1:4]
 89 | [1] "a" "b" "c" "c"
 90 | > x[x > "a"]
 91 | [1] "b" "c" "c" "d"
 92 | > u <- x > "a"
 93 | > u
 94 | [1] FALSE  TRUE  TRUE  TRUE  TRUE FALSE
 95 | > x[u]
 96 | [1] "b" "c" "c" "d"
 97 | \end{verbatim}
 98 | \end{frame}
 99 | 
100 | \begin{frame}[fragile]{Subsetting a Matrix}
101 | Matrices can be subsetted in the usual way with $(i, j)$ type indices.
102 | \begin{verbatim}
103 | > x <- matrix(1:6, 2, 3)
104 | > x[1, 2]
105 | [1] 3
106 | > x[2, 1]
107 | [1] 2
108 | \end{verbatim}
109 | Indices can also be missing.
110 | \begin{verbatim}
111 | > x[1, ]
112 | [1] 1 3 5
113 | > x[, 2]
114 | [1] 3 4
115 | \end{verbatim}
116 | \end{frame}
117 | 
118 | \begin{frame}[fragile]{Subsetting a Matrix}
119 | By default, when a single element of a matrix is retrieved, it is
120 | returned as a vector of length 1 rather than a $1\times 1$ matrix.
121 | This behavior can be turned off by setting \code{drop = FALSE}.
122 | \begin{verbatim}
123 | > x <- matrix(1:6, 2, 3)
124 | > x[1, 2]
125 | [1] 3
126 | 
127 | > x[1, 2, drop = FALSE]
128 |      [,1]
129 | [1,]    3
130 | \end{verbatim}
131 | \end{frame}
132 | 
133 | \begin{frame}[fragile]{Subsetting a Matrix}
134 | Similarly, subsetting a single column or a single row will give you a
135 | vector, not a matrix (by default).
136 | \begin{verbatim}
137 | > x <- matrix(1:6, 2, 3)
138 | > x[1, ]
139 | [1] 1 3 5
140 | > x[1, , drop = FALSE]
141 |      [,1] [,2] [,3]
142 | [1,]    1    3    5
143 | \end{verbatim}
144 | \end{frame}
145 | 
146 | \begin{frame}[fragile]{Subsetting Lists}
147 | \begin{verbatim}
148 | > x <- list(foo = 1:4, bar = 0.6)
149 | > x[1]
150 | $foo
151 | [1] 1 2 3 4
152 | 
153 | > x[[1]]
154 | [1] 1 2 3 4
155 | 
156 | > x$bar
157 | [1] 0.6
158 | > x[["bar"]]
159 | [1] 0.6
160 | > x["bar"]
161 | $bar
162 | [1] 0.6
163 | \end{verbatim}
164 | \end{frame}
165 | 
166 | \begin{frame}[fragile]{Subsetting Lists}
167 | Extracting multiple elements of a list.
168 | \begin{verbatim}
169 | > x <- list(foo = 1:4, bar = 0.6, baz = "hello")
170 | > x[c(1, 3)]
171 | $foo
172 | [1] 1 2 3 4
173 | 
174 | $baz
175 | [1] "hello"
176 | \end{verbatim}
177 | \end{frame}
178 | 
179 | \begin{frame}[fragile]{Subsetting Lists}
180 | The \verb+[[+ operator can be used with \textit{computed} indices;
181 | \verb+$+ can only be used with literal names.
182 | \begin{verbatim}
183 | > x <- list(foo = 1:4, bar = 0.6, baz = "hello")
184 | > name <- "foo"
185 | > x[[name]]  ## computed index for `foo'
186 | [1] 1 2 3 4
187 | > x$name     ## element `name' doesn't exist!
188 | NULL
189 | > x$foo      ## element `foo' does exist
190 | [1] 1 2 3 4
191 | \end{verbatim}
192 | \end{frame}
193 | 
194 | \begin{frame}[fragile]{Subsetting Nested Elements of a List}
195 | The \verb+[[+ can take an integer sequence.
196 | \begin{verbatim}
197 | > x <- list(a = list(10, 12, 14), b = c(3.14, 2.81))
198 | > x[[c(1, 3)]]
199 | [1] 14
200 | > x[[1]][[3]]
201 | [1] 14
202 | 
203 | > x[[c(2, 1)]]
204 | [1] 3.14
205 | \end{verbatim}
206 | \end{frame}
207 | 
208 | \begin{frame}[fragile]{Partial Matching}
209 | Partial matching of names is allowed with \verb+$+ and, when \code{exact = FALSE} is given, with \verb+[[+.
210 | \begin{verbatim}
211 | > x <- list(aardvark = 1:5)
212 | > x$a
213 | [1] 1 2 3 4 5
214 | > x[["a"]]
215 | NULL
216 | > x[["a", exact = FALSE]]
217 | [1] 1 2 3 4 5
218 | \end{verbatim}
219 | \end{frame}
220 | 
221 | \begin{frame}[fragile]{Removing NA Values}
222 | A common task is to remove missing values (\code{NA}s).
223 | \begin{verbatim}
224 | > x <- c(1, 2, NA, 4, NA, 5)
225 | > bad <- is.na(x)
226 | > x[!bad]
227 | [1] 1 2 4 5
228 | \end{verbatim}
229 | \end{frame}
230 | 
231 | \begin{frame}[fragile]{Removing NA Values}
232 | What if there are multiple things and you want to take the subset with
233 | no missing values?
234 | \begin{verbatim}
235 | > x <- c(1, 2, NA, 4, NA, 5)
236 | > y <- c("a", "b", NA, "d", NA, "f")
237 | > good <- complete.cases(x, y)
238 | > good
239 | [1]  TRUE  TRUE FALSE  TRUE FALSE  TRUE
240 | > x[good]
241 | [1] 1 2 4 5
242 | > y[good]
243 | [1] "a" "b" "d" "f"
244 | \end{verbatim}
245 | \end{frame}
246 | 
247 | \begin{frame}[fragile]{Removing NA Values}
248 | \begin{verbatim}
249 | > airquality[1:6, ]
250 |   Ozone Solar.R Wind Temp Month Day
251 | 1    41     190  7.4   67     5   1
252 | 2    36     118  8.0   72     5   2
253 | 3    12     149 12.6   74     5   3
254 | 4    18     313 11.5   62     5   4
255 | 5    NA      NA 14.3   56     5   5
256 | 6    28      NA 14.9   66     5   6
257 | > good <- complete.cases(airquality)
258 | > airquality[good, ][1:6, ]
259 |   Ozone Solar.R Wind Temp Month Day
260 | 1    41     190  7.4   67     5   1
261 | 2    36     118  8.0   72     5   2
262 | 3    12     149 12.6   74     5   3
263 | 4    18     313 11.5   62     5   4
264 | 7    23     299  8.6   65     5   7
265 | 8    19      99 13.8   59     5   8
266 | \end{verbatim}
267 | \end{frame}
268 | 
269 | 
270 | \end{document}
271 | 


--------------------------------------------------------------------------------
/debugging.tex:
--------------------------------------------------------------------------------
  1 | \documentclass[aspectratio=169]{beamer}
  2 | 
  3 | \mode<presentation>
  4 | {
  5 |   \usetheme{Warsaw}
  6 |   % or ...
  7 | 
  8 |   \setbeamercovered{transparent}
  9 |   % or whatever (possibly just delete it)
 10 | }
 11 | 
 12 | 
 13 | \usepackage[english]{babel}
 14 | \usepackage[latin1]{inputenc}
 15 | \usepackage{graphicx}
 16 | 
 17 | \usepackage{amsmath,amsfonts,amssymb}
 18 | 
 19 | \input{macros}
 20 | 
 21 | \title[Debugging]{Debugging}
 22 | 
 23 | 
 24 | \date{Computing for Data Analysis}
 25 | 
 26 | \setbeamertemplate{footline}[page number]
 27 | 
 28 | \begin{document}
 29 | 
 30 | \begin{frame}
 31 |   \titlepage
 32 | \end{frame}
 33 | 
 34 | 
 35 | \begin{frame}{Something's Wrong!}
 36 | Indications that something's not right
 37 | \begin{itemize}
 38 | \item \code{message}: A generic notification/diagnostic message
 39 |   produced by the \code{message} function; execution of the function
 40 |   continues
 41 | \item \code{warning}: An indication that something is wrong but not
 42 |   necessarily fatal; execution of the function continues; generated by
 43 |   the \code{warning} function
 44 | \item \code{error}: An indication that a fatal problem has occurred;
 45 |   execution stops; produced by the \code{stop} function
 46 | \item \code{condition}: A generic concept for indicating that
 47 |   something unexpected can occur; programmers can create their own
 48 |   conditions
 49 | \end{itemize}
 50 | \end{frame}
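
\begin{frame}[fragile]{Something's Wrong!}
A minimal sketch of how the three kinds of condition are raised.  The
function \code{checkValue} is hypothetical; \code{tryCatch()} is used
here only to capture the error so that the example can run to the end.
\begin{verbatim}
checkValue <- function(x) {     ## hypothetical helper
        if(!is.numeric(x))
                stop("x must be numeric")    ## error
        if(x < 0)
                warning("x is negative")     ## warning
        message("checked x = ", x)           ## message
        invisible(x)
}

res <- tryCatch(checkValue("a"),
                error = function(e) conditionMessage(e))
res   ## "x must be numeric"
\end{verbatim}
\end{frame}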
 51 | 
 52 | \begin{frame}[fragile]{Something's Wrong!}
 53 | Warning
 54 | \begin{verbatim}
 55 | > log(-1)
 56 | [1] NaN
 57 | Warning message:
 58 | In log(-1) : NaNs produced
 59 | \end{verbatim}
 60 | \end{frame}
 61 | 
 62 | \begin{frame}[fragile]{Something's Wrong}
 63 | \begin{verbatim}
 64 | printmessage <- function(x) {
 65 |         if(x > 0)
 66 |                 print("x is greater than zero")
 67 |         else 
 68 |                 print("x is less than or equal to zero")
 69 |         invisible(x)
 70 | }
 71 | \end{verbatim}
 72 | \end{frame}
 73 | 
 74 | \begin{frame}[fragile]{Something's Wrong}
 75 | \begin{verbatim}
 76 | printmessage <- function(x) {
 77 |         if(x > 0)
 78 |                 print("x is greater than zero")
 79 |         else 
 80 |                 print("x is less than or equal to zero")
 81 |         invisible(x)
 82 | }
 83 | > printmessage(1)
 84 | [1] "x is greater than zero"
 85 | > printmessage(NA)
 86 | Error in if (x > 0) { : missing value where TRUE/FALSE needed
 87 | \end{verbatim}
 88 | \end{frame}
 89 | 
 90 | \begin{frame}[fragile]{Something's Wrong!}
 91 | \begin{verbatim}
 92 | printmessage2 <- function(x) {
 93 |         if(is.na(x))
 94 |                 print("x is a missing value!")
 95 |         else if(x > 0)
 96 |                 print("x is greater than zero")
 97 |         else 
 98 |                 print("x is less than or equal to zero")
 99 |         invisible(x)
100 | }
101 | \end{verbatim}
102 | \end{frame}
103 | 
104 | \begin{frame}[fragile]{Something's Wrong!}
105 | \begin{verbatim}
106 | printmessage2 <- function(x) {
107 |         if(is.na(x))
108 |                 print("x is a missing value!")
109 |         else if(x > 0)
110 |                 print("x is greater than zero")
111 |         else 
112 |                 print("x is less than or equal to zero")
113 |         invisible(x)
114 | }
115 | > x <- log(-1)
116 | Warning message:
117 | In log(-1) : NaNs produced
118 | > printmessage2(x)
119 | [1] "x is a missing value!"
120 | \end{verbatim}
121 | \end{frame}
122 | 
123 | \begin{frame}{Something's Wrong!}
124 | How do you know that something is wrong with your function?
125 | \begin{itemize}
126 | \item What was your input? How did you call the function?
127 | \item What were you expecting? Output, messages, other results?
128 | \item What did you get?
129 | \item How does what you get differ from what you were expecting?
130 | \item Were your expectations correct in the first place?
131 | \item Can you reproduce the problem (exactly)?
132 | \end{itemize}
133 | \end{frame}
134 | 
135 | \begin{frame}{Debugging Tools in R}
136 | The primary tools for debugging functions in R are
137 | \begin{itemize}
138 | \item \code{traceback}: prints out the function call stack after an
139 |   error occurs; does nothing if there's no error
140 | \item \code{debug}: flags a function for ``debug'' mode which allows
141 |   you to step through execution of a function one line at a time
142 | \item \code{browser}: suspends the execution of a function wherever it
143 |   is called and puts the function in debug mode 
144 | \item \code{trace}: allows you to insert debugging code into a
145 |   function at specific places
146 | \item \code{recover}: allows you to modify the error behavior so that
147 |   you can browse the function call stack
148 | \end{itemize}
149 | These are interactive tools specifically designed to allow you to pick
150 | through a function. There's also the more blunt technique of inserting
151 | \code{print}/\code{cat} statements in the function.
152 | \end{frame}
153 | 
154 | 
155 | 
156 | 
157 | \begin{frame}[fragile]{traceback}
158 | \begin{verbatim}
159 | > mean(x)
160 | Error in mean(x) : object 'x' not found
161 | > traceback()
162 | 1: mean(x)
163 | > 
164 | \end{verbatim}
165 | \end{frame}
166 | 
167 | \begin{frame}[fragile]{traceback}
168 | \begin{verbatim}
169 | > lm(y ~ x)
170 | Error in eval(expr, envir, enclos) : object 'y' not found
171 | > traceback()
172 | 7: eval(expr, envir, enclos)
173 | 6: eval(predvars, data, env)
174 | 5: model.frame.default(formula = y ~ x, drop.unused.levels = TRUE)
175 | 4: model.frame(formula = y ~ x, drop.unused.levels = TRUE)
176 | 3: eval(expr, envir, enclos)
177 | 2: eval(mf, parent.frame())
178 | 1: lm(y ~ x)
179 | \end{verbatim}
180 | \end{frame}
181 | 
182 | \begin{frame}[fragile]{debug}
183 | \begin{verbatim}
184 | > debug(lm)
185 | > lm(y ~ x)
186 | debugging in: lm(y ~ x)
187 | debug: {
188 |     ret.x <- x
189 |     ret.y <- y
190 |     cl <- match.call()
191 |     ...
192 |     if (!qr) 
193 |         z$qr <- NULL
194 |     z
195 | }
196 | Browse[2]> 
197 | \end{verbatim}
198 | \end{frame}
199 | 
200 | \begin{frame}[fragile]{debug}
201 | \begin{verbatim}
202 | Browse[2]> n
203 | debug: ret.x <- x
204 | Browse[2]> n
205 | debug: ret.y <- y
206 | Browse[2]> n
207 | debug: cl <- match.call()
208 | Browse[2]> n
209 | debug: mf <- match.call(expand.dots = FALSE)
210 | Browse[2]> n
211 | debug: m <- match(c("formula", "data", "subset", "weights", "na.action", 
212 |     "offset"), names(mf), 0L)
213 | \end{verbatim}
214 | \end{frame}
215 | 
216 | \begin{frame}[fragile]{recover}
217 | \begin{verbatim}
218 | > options(error = recover)
219 | > read.csv("nosuchfile")
220 | Error in file(file, "rt") : cannot open the connection
221 | In addition: Warning message:
222 | In file(file, "rt") :
223 |   cannot open file 'nosuchfile': No such file or directory
224 | 
225 | Enter a frame number, or 0 to exit   
226 | 
227 | 1: read.csv("nosuchfile")
228 | 2: read.table(file = file, header = header, sep = sep, quote = quote, dec = de
229 | 3: file(file, "rt")
230 | 
231 | Selection: 
232 | \end{verbatim}
233 | \end{frame}
234 | 
235 | \begin{frame}{Debugging}
236 | Summary
237 | \begin{itemize}
238 |   \item There are three main indications of a problem/condition:
239 |     message, warning, error; only an error is fatal
240 |   \item When analyzing a function with a problem, make sure you can
241 |     reproduce the problem, clearly state your expectations and how the
242 |     output differs from your expectation
243 |   \item Interactive debugging tools \code{traceback}, \code{debug},
244 |     \code{browser}, \code{trace}, and \code{recover} can be used to
245 |     find problematic code in functions
246 |   \item Debugging tools are not a substitute for thinking!
247 | \end{itemize}
248 | \end{frame}
249 | 
250 | 
251 | \end{document}
252 | 
253 | 
254 | 


--------------------------------------------------------------------------------
/functions.tex:
--------------------------------------------------------------------------------
  1 | \documentclass[aspectratio=169]{beamer}
  2 | 
  3 | \mode<presentation>
  4 | {
  5 |   \usetheme{Warsaw}
  6 |   % or ...
  7 | 
  8 |   \setbeamercovered{transparent}
  9 |   % or whatever (possibly just delete it)
 10 | }
 11 | 
 12 | 
 13 | \usepackage[english]{babel}
 14 | \usepackage[latin1]{inputenc}
 15 | \usepackage{graphicx}
 16 | %\usepackage{times}
 17 | %\usepackage[T1]{fontenc}
 18 | % Or whatever. Note that the encoding and the font should match. If T1
 19 | % does not look nice, try deleting the line with the fontenc.
 20 | 
 21 | \usepackage{amsmath,amsfonts,amssymb}
 22 | 
 23 | \input{macros}
 24 | 
 25 | \title[The R Language]{Introduction to the R Language}
 26 | 
 27 | \subtitle{Functions}
 28 | 
 29 | \date{Computing for Data Analysis}
 30 | 
 31 | \setbeamertemplate{footline}[page number]
 32 | 
 33 | 
 34 | \begin{document}
 35 | 
 36 | \begin{frame}
 37 |   \titlepage
 38 | \end{frame}
 39 | 
 40 | \begin{frame}[fragile]{Functions}
 41 | Functions are created using the \code{function()} directive and are
 42 | stored as R objects just like anything else.  In particular, they are
 43 | R objects of class ``function''.
 44 | \begin{verbatim}
 45 | f <- function(<arguments>) {
 46 |         ## Do something interesting
 47 | }
 48 | \end{verbatim}
 49 | Functions in R are ``first class objects'', which means that they can
 50 | be treated much like any other R object.  Importantly,
 51 | \begin{itemize}
 52 | \item
 53 | Functions can be passed as arguments to other functions
 54 | \item
 55 | Functions can be nested, so that you can define a function inside of
 56 | another function
 57 | \end{itemize}
 58 | The return value of a function is the last expression in the function
 59 | body to be evaluated.
 60 | \end{frame}
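    | 
    | \begin{frame}[fragile]{Functions as Objects: A Sketch}
    | A minimal sketch of passing and nesting functions.  The names
    | \code{apply.twice} and \code{hypot} are made up for illustration and
    | are not part of R.
    | \begin{verbatim}
    | ## Pass a function as an argument to another function
    | apply.twice <- function(f, x) {
    |         f(f(x))
    | }
    | apply.twice(sqrt, 16)    ## sqrt(sqrt(16)), i.e. 2
    | 
    | ## Define a function inside another function
    | hypot <- function(a, b) {
    |         sq <- function(z) z^2
    |         sqrt(sq(a) + sq(b))    ## last expression is returned
    | }
    | hypot(3, 4)              ## 5
    | \end{verbatim}
    | \end{frame}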
 61 | 
 62 | \begin{frame}{Function Arguments}
 63 | Functions have \textit{named arguments} which potentially have
 64 | \textit{default values}.
 65 | \begin{itemize}
 66 | \item
 67 | The \textit{formal arguments} are the arguments included in the
 68 | function definition
 69 | \item
 70 | The \code{formals} function returns a list of all the formal arguments
 71 | of a function
 72 | \item
 73 | Not every function call in R makes use of all the formal arguments
 74 | \item
 75 | Function arguments can be \textit{missing} or might have default
 76 | values
 77 | \end{itemize}
 78 | \end{frame}
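    | 
    | \begin{frame}[fragile]{Function Arguments: A Sketch}
    | A small example of inspecting formal arguments and relying on a
    | default value.  The function \code{f} is a toy written only for this
    | slide.
    | \begin{verbatim}
    | f <- function(a, b = 1, ...) {
    |         a + b
    | }
    | names(formals(f))    ## "a" "b" "..."
    | formals(f)$b         ## the default value of 'b', which is 1
    | f(2)                 ## 3: 'b' falls back to its default
    | f(2, b = 10)         ## 12
    | \end{verbatim}
    | \end{frame}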
 79 | 
 80 | \begin{frame}[fragile]{Argument Matching}
 81 | R function arguments can be matched positionally or by name, so the
 82 | following calls to \code{sd} are all equivalent:
 83 | \begin{verbatim}
 84 | > mydata <- rnorm(100)
 85 | > sd(mydata)
 86 | > sd(x = mydata)
 87 | > sd(x = mydata, na.rm = FALSE)
 88 | > sd(na.rm = FALSE, x = mydata)
 89 | > sd(na.rm = FALSE, mydata)
 90 | \end{verbatim}
 91 | Even though it's legal, I don't recommend messing around with the
 92 | order of the arguments too much, since it can lead to some confusion.
 93 | \end{frame}
 94 | 
 95 | \begin{frame}[fragile]{Argument Matching}
 96 | You can mix positional matching with matching by name.  When an
 97 | argument is matched by name, it is ``taken out'' of the argument list
 98 | and the remaining unnamed arguments are matched in the order that they
 99 | are listed in the function definition.
100 | \begin{verbatim}
101 | > args(lm)
102 | function (formula, data, subset, weights, na.action, 
103 |           method = "qr", model = TRUE, x = FALSE, 
104 |           y = FALSE, qr = TRUE, singular.ok = TRUE, 
105 |           contrasts = NULL, offset, ...) 
106 | \end{verbatim}
107 | The following two calls are equivalent.
108 | \begin{verbatim}
109 | lm(data = mydata, y ~ x, model = FALSE, 1:100)
110 | lm(y ~ x, mydata, 1:100, model = FALSE)
111 | \end{verbatim}
112 | \end{frame}
113 | 
114 | \begin{frame}{Argument Matching}
115 | \begin{itemize}
116 | \item
117 | Most of the time, named arguments are useful on the command line when
118 | you have a long argument list and you want to use the defaults for
119 | everything except for an argument near the end of the list
120 | \item
121 | Named arguments also help if you can remember the name of the argument
122 | and not its position on the argument list (plotting is a good
123 | example).
124 | \end{itemize}
125 | \end{frame}
126 | 
127 | \begin{frame}{Argument Matching}
128 | Function arguments can also be \textit{partially} matched, which is
129 | useful for interactive work.  The order of operations when given an
130 | argument is
131 | \begin{enumerate}
132 | \item
133 | Check for exact match for a named argument
134 | \item
135 | Check for a partial match
136 | \item
137 | Check for a positional match
138 | \end{enumerate}
139 | \end{frame}
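    | 
    | \begin{frame}[fragile]{Argument Matching: A Sketch}
    | The three kinds of matching can be tried on a toy function (the
    | function \code{f} and its argument names are made up for this
    | example).
    | \begin{verbatim}
    | f <- function(value, verbose = FALSE) {
    |         if(verbose) message("value is ", value)
    |         value * 2
    | }
    | f(value = 3, verbose = TRUE)   ## exact match by name
    | f(val = 3, verb = TRUE)        ## partial match
    | f(3, TRUE)                     ## positional match
    | ## f(v = 3) is an error: 'v' matches both formal arguments
    | \end{verbatim}
    | All three calls return 6.  A partial name has to identify exactly one
    | formal argument.
    | \end{frame}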
140 | 
141 | \begin{frame}[fragile]{Defining a Function}
142 | \begin{verbatim}
143 | f <- function(a, b = 1, c = 2, d = NULL) {
144 |         
145 | }
146 | \end{verbatim}
147 | In addition to not specifying a default value, you can also set an
148 | argument value to \code{NULL}.
149 | \end{frame}
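    | 
    | \begin{frame}[fragile]{Defining a Function: Using a NULL Default}
    | One common use of a \code{NULL} default is to signal ``not supplied''
    | and compute a fallback inside the function.  The fallback rule below
    | is hypothetical and chosen only to make the example concrete.
    | \begin{verbatim}
    | f <- function(a, b = 1, c = 2, d = NULL) {
    |         if(is.null(d)) {
    |                 d <- a + b    ## made-up fallback for 'd'
    |         }
    |         a * b * c * d
    | }
    | f(2)            ## d becomes 3, so 2 * 1 * 2 * 3 = 12
    | f(2, d = 10)    ## 2 * 1 * 2 * 10 = 40
    | \end{verbatim}
    | \end{frame}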
150 | 
151 | \begin{frame}[fragile]{Lazy Evaluation}
152 | Arguments to functions are evaluated \textit{lazily}, so they are
153 | evaluated only as needed.
154 | \begin{verbatim}
155 | f <- function(a, b) {
156 |         a^2
157 | }
158 | f(2)
159 | \end{verbatim}
160 | This function never actually uses the argument \code{b}, so calling
161 | \code{f(2)} will not produce an error because the 2 gets positionally
162 | matched to \code{a}.  
163 | \end{frame}
164 | 
165 | \begin{frame}[fragile]{Lazy Evaluation}
166 | Another example
167 | \begin{verbatim}
168 | f <- function(a, b) {
169 |         print(a)
170 |         print(b)
171 | }
172 | \end{verbatim}
173 | \begin{verbatim}
174 | > f(45)
175 | [1] 45
176 | Error in print(b) : argument "b" is missing, with no default
177 | > 
178 | \end{verbatim}
179 | Notice that ``45'' got printed first before the error was triggered.
180 | This is because \code{b} did not have to be evaluated until after
181 | \code{print(a)}.  Once the function tried to evaluate \code{print(b)}
182 | it had to throw an error.
183 | \end{frame}
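    | 
    | \begin{frame}[fragile]{Lazy Evaluation: Default Values}
    | Default values are evaluated lazily as well, which is standard R
    | behavior not shown in the examples so far.  A small sketch with a toy
    | function:
    | \begin{verbatim}
    | f <- function(a, b = a * 2) {
    |         a <- a + 1
    |         b        ## the default 'a * 2' is only evaluated here
    | }
    | f(1)             ## 4, not 2: 'a' was already 2 when 'b' was needed
    | f(1, b = 10)     ## 10: a supplied value replaces the default
    | \end{verbatim}
    | \end{frame}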
184 | 
185 | \begin{frame}[fragile]{The ``...'' Argument}
186 | The \code{...} argument indicates a variable number of arguments that
187 | are usually passed on to other functions.
188 | \begin{itemize}
189 | \item
190 | \code{...} is often used when extending another function and you don't
191 | want to copy the entire argument list of the original function
192 | \begin{verbatim}
193 | myplot <- function(x, y, type = "l", ...) {
194 |         plot(x, y, type = type, ...)
195 | }
196 | \end{verbatim}
197 | \item
198 | Generic functions use \code{...} so that extra arguments can be passed
199 | to methods (more on this later).
200 | \begin{verbatim}
201 | > mean
202 | function (x, ...) 
203 | UseMethod("mean")
204 | \end{verbatim}
205 | \end{itemize}
206 | \end{frame}
207 | 
208 | \begin{frame}[fragile]{The ``...'' Argument}
209 | The \code{...} argument is also necessary when the number of arguments
210 | passed to the function cannot be known in advance.
211 | \begin{verbatim}
212 | > args(paste)
213 | function (..., sep = " ", collapse = NULL) 
214 | 
215 | > args(cat)
216 | function (..., file = "", sep = " ", fill = FALSE, 
217 |     labels = NULL, append = FALSE) 
218 | \end{verbatim}
219 | \end{frame}
220 | 
221 | \begin{frame}[fragile]{Arguments Coming After the ``...'' Argument}
222 | One catch with \code{...} is that any arguments that appear
223 | \textit{after} \code{...} on the argument list must be named
224 | explicitly and cannot be partially matched.
225 | \begin{verbatim}
226 | > args(paste)
227 | function (..., sep = " ", collapse = NULL) 
228 | 
229 | > paste("a", "b", sep = ":")
230 | [1] "a:b"
231 | 
232 | > paste("a", "b", se = ":")
233 | [1] "a b :"
234 | \end{verbatim}
235 | \end{frame}
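    | 
    | \begin{frame}[fragile]{The ``...'' Argument: A Sketch}
    | The same catch shows up in your own functions.  Here \code{avg} is a
    | hypothetical helper that collects \code{...} into a vector.
    | \begin{verbatim}
    | avg <- function(..., digits = 1) {
    |         vals <- c(...)          ## gather everything in '...'
    |         round(mean(vals), digits)
    | }
    | avg(1.25, 2.5, 4)               ## 2.6
    | avg(1.25, 2.5, 4, digits = 2)   ## 2.58
    | avg(1.25, 2.5, 4, dig = 2)      ## 2.4: 'dig' = 2 was averaged in
    | \end{verbatim}
    | Because \code{digits} comes after \code{...}, the abbreviated name
    | \code{dig} is not matched to it and silently ends up in \code{...}.
    | \end{frame}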
236 | 
237 | 
238 | \begin{frame}[fragile]{A Diversion on Binding Values to Symbol}
239 | How does R know which value to assign to which symbol?  When I type
240 | \begin{verbatim}
241 | > lm <- function(x) { x * x }
242 | > lm
243 | function(x) { x * x }
244 | \end{verbatim}
245 | how does R know what value to assign to the symbol \code{lm}?  Why
246 | doesn't it give it the value of \code{lm} that is in the \pkg{stats}
247 | package?
248 | \end{frame}
249 | 
250 | \begin{frame}[fragile]{A Diversion on Binding Values to Symbol}
251 | When R tries to bind a value to a symbol, it searches through a series
252 | of \code{environments} to find the appropriate value.  When you are
253 | working on the command line and need to retrieve the value of an R
254 | object, the order is roughly
255 | \begin{enumerate}
256 | \item
257 | Search the global environment for a symbol name matching the one
258 | requested.
259 | \item
260 | Search the namespaces of each of the packages on the search list
261 | \end{enumerate}
262 | The search list can be found by using the \code{search} function.
263 | \begin{verbatim}
264 | > search()
265 | [1] ".GlobalEnv"        "package:stats"     "package:graphics" 
266 | [4] "package:grDevices" "package:utils"     "package:datasets" 
267 | [7] "package:methods"   "Autoloads"         "package:base"     
268 | \end{verbatim}
269 | \end{frame}
270 | 
271 | \begin{frame}[fragile]{Binding Values to Symbol}
272 | \begin{itemize}
273 | \item
274 | The \textit{global environment} or the user's workspace is always the
275 | first element of the search list and the \pkg{base} package is always
276 | the last.  
277 | \item
278 | The order of the packages on the search list matters!
279 | \item
280 | Users can configure which packages get loaded on startup so you
281 | cannot assume that there will be a set list of packages available.
282 | \item
283 | When a user loads a package with \code{library} the namespace of that
284 | package gets put in position 2 of the search list (by default) and
285 | everything else gets shifted down the list.
286 | \item
287 | Note that R has separate namespaces for functions and non-functions so
288 | it's possible to have an object named \code{c} and a function named
289 | \code{c}.
290 | \end{itemize}
291 | \end{frame}
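    | 
    | \begin{frame}[fragile]{Binding Values to Symbol: A Sketch}
    | A short session that can be used to try these rules out, assuming the
    | \pkg{stats} package is on the search list (it is by default).
    | \begin{verbatim}
    | lm <- function(x) { x * x }    ## workspace copy masks stats::lm
    | lm(3)                          ## 9
    | identical(lm, stats::lm)       ## FALSE
    | rm(lm)                         ## drop the workspace copy
    | identical(lm, stats::lm)       ## TRUE: found via the search list
    | 
    | c <- 5                         ## a non-function named 'c'
    | c(c, 10)                       ## the function c() is still found
    | rm(c)
    | \end{verbatim}
    | \end{frame}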
292 | 
293 | \begin{frame}{Scoping Rules}
294 | The scoping rules for R are the main feature that makes it different
295 | from the original S language.  
296 | \begin{itemize}
297 | \item
298 | The scoping rules determine how a value is associated with a free
299 | variable in a function
300 | \item
301 | R uses \textit{lexical scoping} or \textit{static scoping}.  A common
302 | alternative is \textit{dynamic scoping}.
303 | \item
304 | Related to the scoping rules is how R uses the \textit{search list} to
305 | bind a value to a symbol
306 | \item
307 | Lexical scoping turns out to be particularly useful for simplifying
308 | statistical computations
309 | \end{itemize}
310 | \end{frame}
311 | 
312 | \begin{frame}[fragile]{Lexical Scoping}
313 | Consider the following function.
314 | \begin{verbatim}
315 | f <- function(x, y) {
316 |         x^2 + y / z
317 | }
318 | \end{verbatim}
319 | This function has 2 formal arguments \code{x} and \code{y}.  In the
320 | body of the function there is another symbol \code{z}.  In this case
321 | \code{z} is called a \textit{free variable}.
322 | 
323 | The scoping rules of a language determine how values are assigned to
324 | free variables.  Free variables are not formal arguments and are not
325 | local variables (assigned inside the function body).
326 | \end{frame}
327 | 
328 | \begin{frame}[fragile]{Lexical Scoping}
329 | Lexical scoping in R means that
330 | \begin{quote}
331 | \textit{the values of free variables
332 |   are searched for in the environment in which the function was
333 |   defined}.
334 | \end{quote}
335 | What is an environment?
336 | \begin{itemize}
337 | \item
338 | An \textit{environment} is a collection of (symbol, value) pairs,
339 | i.e. \code{x} is a symbol and \code{3.14} might be its value.
340 | \item
341 | Every environment has a parent environment; it is possible for an
342 | environment to have multiple ``children''
343 | \item
344 | The only environment without a parent is the empty environment
345 | \item
346 | A function + an environment = a \textit{closure} or \textit{function
347 |   closure}.
348 | \end{itemize}
349 | \end{frame}
350 | 
351 | 
352 | \begin{frame}{Lexical Scoping}
353 | Searching for the value for a free variable:
354 | \begin{itemize}
355 | \item
356 | If the value of a symbol is not found in the environment in which a
357 | function was defined, then the search is continued in the
358 | \textit{parent environment}.
359 | \item
360 | The search continues down the sequence of parent environments until we
361 | hit the \textit{top-level environment}; this is usually the global
362 | environment (workspace) or the namespace of a package.
363 | \item
364 | After the top-level environment, the search continues down the search
365 | list until we hit the \textit{empty environment}.
366 | \item
367 | If a value for a given symbol cannot be found once the empty
368 | environment is arrived at, then an error is thrown.
369 | \end{itemize}
370 | \end{frame}
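    | 
    | \begin{frame}[fragile]{Lexical Scoping: Walking the Parent Environments}
    | The chain of parent environments can be followed by hand.  This
    | sketch starts at the global environment and stops at the empty
    | environment; the names it prints depend on which packages are
    | attached in your session.
    | \begin{verbatim}
    | e <- globalenv()
    | n <- 0
    | while(!identical(e, emptyenv())) {
    |         cat(environmentName(e), "\n")
    |         e <- parent.env(e)     ## move one step down the chain
    |         n <- n + 1
    | }
    | n == length(search())          ## one environment per entry
    | \end{verbatim}
    | \end{frame}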
371 | 
372 | 
373 | \begin{frame}{Lexical Scoping}
374 | Why does all this matter?
375 | \begin{itemize}
376 | \item
377 | Typically, a function is defined in the global environment, so that
378 | the values of free variables are just found in the user's workspace
379 | \item
380 | This behavior is logical for most people and is usually the ``right
381 | thing'' to do
382 | \item
383 | However, in R you can have functions defined \textit{inside other
384 |   functions}
385 | \begin{itemize}
386 | \item
387 | Languages like C don't let you do this
388 | \end{itemize}
389 | \item
390 | Now things get interesting --- In this case the environment in which a
391 | function is defined is the body of another function!
392 | \end{itemize}
393 | \end{frame}
394 | 
395 | \begin{frame}[fragile]{Lexical Scoping}
396 | \begin{verbatim}
397 | make.power <- function(n) {
398 |         pow <- function(x) {
399 |                 x^n
400 |         }
401 |         pow
402 | }
403 | \end{verbatim}
404 | This function returns another function as its value.
405 | \begin{verbatim}
406 | > cube <- make.power(3)
407 | > square <- make.power(2)
408 | > cube(3)
409 | [1] 27
410 | > square(3)
411 | [1] 9
412 | \end{verbatim}
413 | \end{frame}
414 | 
415 | \begin{frame}[fragile]{Exploring a Function Closure}
416 | What's in a function's environment?
417 | \begin{verbatim}
418 | > ls(environment(cube))
419 | [1] "n"   "pow"
420 | > get("n", environment(cube))
421 | [1] 3
422 | 
423 | > ls(environment(square))
424 | [1] "n"   "pow"
425 | > get("n", environment(square))
426 | [1] 2
427 | \end{verbatim}
428 | \end{frame}
429 | 
430 | \begin{frame}[fragile]{Lexical vs. Dynamic Scoping}
431 | \begin{verbatim}
432 | y <- 10
433 | 
434 | f <- function(x) {
435 |         y <- 2
436 |         y^2 + g(x)
437 | }
438 | 
439 | g <- function(x) {
440 |         x * y
441 | }
442 | \end{verbatim}
443 | What is the value of
444 | \begin{verbatim}
445 | f(3)
446 | \end{verbatim}
447 | \end{frame}
448 | 
449 | \begin{frame}[fragile]{Lexical vs. Dynamic Scoping}
450 | \begin{itemize}
451 | \item
452 | With lexical scoping the value of \code{y} in the function \code{g} is
453 | looked up in the environment in which the function was defined, in
454 | this case the global environment, so the value of \code{y} is 10.
455 | \item
456 | With dynamic scoping, the value of \code{y} is looked up in the
457 | environment from which the function was \textit{called} (sometimes
458 | referred to as the \textit{calling environment}).
459 | \begin{itemize}
460 | \item
461 | In R the calling environment is known as the \textit{parent frame}
462 | \end{itemize}
463 | So the value of \code{y} would be 2.
464 | \end{itemize}
465 | \end{frame}
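    | 
    | \begin{frame}[fragile]{Lexical vs. Dynamic Scoping: A Sketch}
    | R does not offer dynamic scoping, but it can be imitated by looking
    | a symbol up in the parent frame.  In this variant \code{f} takes the
    | function to call as a second argument \code{h}, and \code{g.dyn} is a
    | made-up name; both changes are only for illustration.
    | \begin{verbatim}
    | y <- 10
    | g <- function(x) x * y                 ## lexical lookup of 'y'
    | g.dyn <- function(x) {
    |         ## imitate dynamic scoping: use the caller's 'y'
    |         x * get("y", envir = parent.frame())
    | }
    | f <- function(x, h) {
    |         y <- 2
    |         y^2 + h(x)
    | }
    | f(3, g)        ## 4 + 3 * 10 = 34
    | f(3, g.dyn)    ## 4 + 3 * 2  = 10
    | \end{verbatim}
    | \end{frame}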
466 | 
467 | \begin{frame}[fragile]{Lexical vs. Dynamic Scoping}
468 | When a function is \textit{defined} in the global environment and is
469 | subsequently \textit{called} from the global environment, then the
470 | defining environment and the calling environment are the same.  This
471 | can sometimes give the appearance of dynamic scoping.
472 | \begin{verbatim}
473 | > g <- function(x) {
474 | +         a <- 3
475 | +         x + a + y
476 | + }
477 | > g(2)
478 | Error in g(2) : object "y" not found
479 | > y <- 3
480 | > g(2)
481 | [1] 8
482 | \end{verbatim}
483 | \end{frame}
484 | 
485 | 
486 | \begin{frame}{Other Languages}
487 | Other languages that support lexical scoping
488 | \begin{itemize}
489 | \item
490 | Scheme
491 | \item
492 | Perl
493 | \item
494 | Python
495 | \item
496 | Common Lisp (all languages converge to Lisp)
497 | \end{itemize}
498 | \end{frame}
499 | 
500 | \begin{frame}{Consequences of Lexical Scoping}
501 | \begin{itemize}
502 | \item
503 | In R, all objects must be stored in memory
504 | \item
505 | All functions must carry a pointer to their respective defining
506 | environments, which could be anywhere
507 | \item
508 | In S-PLUS, free variables are always looked up in the global
509 | workspace, so everything can be stored on the disk because the
510 | ``defining environment'' of all functions is the same.
511 | \end{itemize}
512 | \end{frame}
513 | 
514 | \begin{frame}{Application: Optimization}
515 | Why is any of this information useful?
516 | \begin{itemize}
517 | \item
518 | Optimization routines in R like \code{optim}, \code{nlm}, and
519 | \code{optimize} require you to pass a function whose argument is a
520 | vector of parameters (e.g. a log-likelihood)
521 | \item
522 | However, an objective function might depend on a host of other things
523 | besides its parameters (like \textit{data})
524 | \item
525 | When writing software which does optimization, it may be desirable to
526 | allow the user to hold certain parameters fixed
527 | \end{itemize}
528 | \end{frame}
529 | 
530 | \begin{frame}[fragile]{Maximizing a Normal Likelihood}
531 | Write a ``constructor'' function
532 | \begin{verbatim}
533 | make.NegLogLik <- function(data, fixed=c(FALSE,FALSE)) {
534 |         params <- fixed
535 |         function(p) {
536 |                 params[!fixed] <- p
537 |                 mu <- params[1]
538 |                 sigma <- params[2]
539 |                 a <- -0.5*length(data)*log(2*pi*sigma^2)
540 |                 b <- -0.5*sum((data-mu)^2) / (sigma^2)
541 |                 -(a + b)
542 |         }
543 | }
544 | \end{verbatim}
545 | \textbf{Note}: Optimization functions in R \textit{minimize}
546 | functions, so you need to use the negative log-likelihood.
547 | \end{frame}
548 | 
549 | 
550 | \begin{frame}[fragile]{Maximizing a Normal Likelihood}
551 | \begin{verbatim}
552 | > set.seed(1); normals <- rnorm(100, 1, 2)
553 | > nLL <- make.NegLogLik(normals)
554 | > nLL
555 | function(p) {
556 |                 params[!fixed] <- p
557 |                 mu <- params[1]
558 |                 sigma <- params[2]
559 |                 a <- -0.5*length(data)*log(2*pi*sigma^2)
560 |                 b <- -0.5*sum((data-mu)^2) / (sigma^2)
561 |                 -(a + b)
562 |         }
563 | <environment: 0x165b1a4>
564 | > ls(environment(nLL))
565 | [1] "data"   "fixed"  "params"
566 | \end{verbatim}
567 | \end{frame}
568 | 
569 | \begin{frame}[fragile]{Estimating Parameters}
570 | \begin{verbatim}
571 | > optim(c(mu = 0, sigma = 1), nLL)$par
572 |       mu    sigma 
573 | 1.218239 1.787343 
574 | \end{verbatim}
575 | Fixing $\sigma = 2$
576 | \begin{verbatim}
577 | > nLL <- make.NegLogLik(normals, c(FALSE, 2))
578 | > optimize(nLL, c(-1, 3))$minimum
579 | [1] 1.217775
580 | \end{verbatim}
581 | Fixing $\mu = 1$
582 | \begin{verbatim}
583 | > nLL <- make.NegLogLik(normals, c(1, FALSE))
584 | > optimize(nLL, c(1e-6, 10))$minimum
585 | [1] 1.800596
586 | \end{verbatim}
587 | \end{frame}
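    | 
    | \begin{frame}[fragile]{Estimating Parameters: A Closed-Form Check}
    | For the normal model the maximum likelihood estimates have closed
    | forms (the sample mean and the square root of the average squared
    | deviation), which gives a quick check on the numerical answers.  This
    | is a sketch using the same simulated data; the names \code{mu.hat}
    | and \code{sigma.hat} are introduced here.
    | \begin{verbatim}
    | set.seed(1); normals <- rnorm(100, 1, 2)
    | mu.hat <- mean(normals)
    | ## the MLE of sigma divides by n, not n - 1 as sd() does
    | sigma.hat <- sqrt(mean((normals - mu.hat)^2))
    | c(mu = mu.hat, sigma = sigma.hat)
    | \end{verbatim}
    | The two values should be close to those returned by \code{optim},
    | up to the tolerance of the optimizer.
    | \end{frame}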
588 | 
589 | \begin{frame}[fragile]{Plotting the Likelihood}
590 | \begin{verbatim}
591 | nLL <- make.NegLogLik(normals, c(1, FALSE))
592 | x <- seq(1.7, 1.9, len = 100)
593 | y <- sapply(x, nLL)
594 | plot(x, exp(-(y - min(y))), type = "l")
595 | 
596 | nLL <- make.NegLogLik(normals, c(FALSE, 2))
597 | x <- seq(0.5, 1.5, len = 100)
598 | y <- sapply(x, nLL)
599 | plot(x, exp(-(y - min(y))), type = "l")
600 | \end{verbatim}
601 | \end{frame}
602 | 
603 | 
604 | \begin{frame}{Plotting the Likelihood}
605 | \includegraphics[width=3in,height=3in]{mulike}
606 | \end{frame}
607 | 
608 | \begin{frame}{Plotting the Likelihood}
609 | \includegraphics[width=3in,height=3in]{sigmalike}
610 | \end{frame}
611 | 
612 | 
613 | \begin{frame}{Lexical Scoping Summary}
614 | \begin{itemize}
615 | \item
616 | Objective functions can be ``built'' which contain all of the
617 | necessary data for evaluating the function
618 | \item
619 | No need to carry around long argument lists --- useful for interactive
620 | and exploratory work.
621 | \item
622 | Code can be simplified and cleaned up
623 | \item
624 | Reference: Robert Gentleman and Ross Ihaka (2000). ``Lexical Scope and
625 | Statistical Computing,'' \textit{JCGS}, 9, 491--508.
626 | \end{itemize}
627 | \end{frame}
628 | 
629 | 
630 | 
631 | \end{document}
632 | 


--------------------------------------------------------------------------------
/ggplot2_part1.pptx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/rdpeng/CourseraLectures/f58ebd7bc02e275d0428b8c2bb0ff9a2668e5990/ggplot2_part1.pptx


--------------------------------------------------------------------------------
/ggplot2_part2.pptx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/rdpeng/CourseraLectures/f58ebd7bc02e275d0428b8c2bb0ff9a2668e5990/ggplot2_part2.pptx


--------------------------------------------------------------------------------
/grep.tex:
--------------------------------------------------------------------------------
  1 | \documentclass[aspectratio=169]{beamer}
  2 | 
  3 | \mode<presentation>
  4 | {
  5 |   \usetheme{Warsaw}
  6 |   % or ...
  7 | 
  8 |   \setbeamercovered{transparent}
  9 |   % or whatever (possibly just delete it)
 10 | }
 11 | 
 12 | 
 13 | \usepackage[english]{babel}
 14 | \usepackage[latin1]{inputenc}
 15 | \usepackage{graphicx}
 16 | \usepackage{amsmath,amsfonts,amssymb}
 17 | 
 18 | \setbeamertemplate{footline}[page number]
 19 | 
 20 | \input{macros}
 21 | 
 22 | \title[Regular Expressions in R]{Regular Expressions in R}
 23 | 
 24 | 
 25 | \date{Computing for Data Analysis}
 26 | 
 27 | 
 28 | 
 29 | \begin{document}
 30 | 
 31 | \begin{frame}
 32 |   \titlepage
 33 | \end{frame}
 34 | 
 35 | \begin{frame}{Regular Expression Functions}
 36 | The primary R functions for dealing with regular expressions are
 37 | \begin{itemize}
 38 | \item \code{grep}, \code{grepl}: Search for matches of a regular
 39 |   expression/pattern in a character vector; either return the indices
 40 |   into the character vector that match, the strings that happen to
 41 |   match, or a TRUE/FALSE vector indicating which elements match
 42 | \item \code{regexpr}, \code{gregexpr}: Search a character vector for regular
 43 |   expression matches and return the indices of the string where the
 44 |   match begins and the length of the match
 45 | \item \code{sub}, \code{gsub}: Search a character vector for regular
 46 |   expression matches and replace that match with another string
 47 | \item \code{regexec}: Easier to explain through demonstration.
 48 | \end{itemize}
 49 | \end{frame}
 50 | 
 51 | \begin{frame}[fragile]{grep}
 52 | Here is an excerpt of the Baltimore City homicides dataset:
 53 | \begin{verbatim}
 54 | > homicides <- readLines("homicides.txt")
 55 | > homicides[1]
 56 | [1] "39.311024, -76.674227, iconHomicideShooting, 'p2', '<dl><dt>Leon 
 57 | Nelson</dt><dd class=\"address\">3400 Clifton Ave.<br />Baltimore, MD 
 58 | 21216</dd><dd>black male, 17 years old</dd>
 59 | <dd>Found on January 1, 2007</dd><dd>Victim died at Shock 
 60 | Trauma</dd><dd>Cause: shooting</dd></dl>'"
 61 | 
 62 | > homicides[1000]
 63 | [1] "39.33626300000, -76.55553990000, icon_homicide_shooting, 'p1200',...
 64 | \end{verbatim}
 65 | How can I find the records for all the victims of shootings (as
 66 | opposed to other causes)?
 67 | \end{frame}
 68 | 
 69 | \begin{frame}[fragile]{grep}
 70 | 
 71 | \begin{verbatim}
 72 | > length(grep("iconHomicideShooting", homicides))
 73 | [1] 228
 74 | > length(grep("iconHomicideShooting|icon_homicide_shooting", homicides))
 75 | [1] 1003
 76 | > length(grep("Cause: shooting", homicides))
 77 | [1] 228
 78 | > length(grep("Cause: [Ss]hooting", homicides))
 79 | [1] 1003
 80 | > length(grep("[Ss]hooting", homicides))
 81 | [1] 1005
 82 | \end{verbatim}
 83 | 
 84 | \end{frame}
 85 | 
 86 | 
 87 | \begin{frame}[fragile]{grep}
 88 | \begin{verbatim}
 89 | > i <- grep("[cC]ause: [Ss]hooting", homicides)
 90 | > j <- grep("[Ss]hooting", homicides)
 91 | > str(i)
 92 |  int [1:1003] 1 2 6 7 8 9 10 11 12 13 ...
 93 | > str(j)
 94 |  int [1:1005] 1 2 6 7 8 9 10 11 12 13 ...
 95 | > setdiff(i, j)
 96 | integer(0)
 97 | > setdiff(j, i)
 98 | [1] 318 859
 99 | \end{verbatim}
100 | \end{frame}
101 | 
102 | 
103 | \begin{frame}[fragile]{grep}
104 | \begin{verbatim}
105 | > homicides[859]
106 | [1] "39.33743900000, -76.66316500000, icon_homicide_bluntforce,
107 | 'p914', '<dl><dt><a href=\"http://essentials.baltimoresun.com/
108 | micro_sun/homicides/victim/914/steven-harris\">Steven Harris</a>
109 | </dt><dd class=\"address\">4200 Pimlico Road<br />Baltimore, MD 21215
110 | </dd><dd>Race: Black<br />Gender: male<br />Age: 38 years old</dd>
111 | <dd>Found on July 29, 2010</dd><dd>Victim died at Scene</dd>
112 | <dd>Cause: Blunt Force</dd><dd class=\"popup-note\"><p>Harris was 
113 | found dead July 22 and ruled a shooting victim; an autopsy
114 | subsequently showed that he had not been shot,...</dd></dl>'"
115 | \end{verbatim}
116 | \end{frame}
117 | 
118 | 
119 | \begin{frame}[fragile]{grep}
120 | By default, \code{grep} returns the indices into the character vector
121 | where the regex pattern matches. 
122 | \begin{verbatim}
123 | > grep("^New", state.name)
124 | [1] 29 30 31 32
125 | \end{verbatim}
126 | Setting \code{value = TRUE} returns
127 | the actual elements of the character vector that match.
128 | \begin{verbatim}
129 | > grep("^New", state.name, value = TRUE)
130 | [1] "New Hampshire" "New Jersey"    "New Mexico"    "New York" 
131 | \end{verbatim}
132 | \code{grepl} returns a logical vector indicating which elements match.
133 | \begin{verbatim}
134 | > grepl("^New", state.name)
135 |  [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
136 | [13] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
137 | [25] FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE
138 | [37] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
139 | [49] FALSE FALSE
140 | \end{verbatim}
141 | \end{frame}
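    | 
    | \begin{frame}[fragile]{grepl: A Sketch}
    | The logical vector from \code{grepl} is handy for counting and
    | subsetting.  A small example with the built-in \code{state.name}
    | vector:
    | \begin{verbatim}
    | i <- grepl("^New", state.name)
    | sum(i)                   ## 4 matching elements
    | state.name[i]            ## same as grep(..., value = TRUE)
    | state.name[!i][1:3]      ## a logical vector is easy to negate
    | \end{verbatim}
    | \end{frame}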
142 | 
143 | 
144 | \begin{frame}[fragile]{regexpr}
145 | Some limitations of \code{grep}
146 | \begin{itemize}
147 | \item  The \code{grep} function tells you which strings in a character
148 |   vector match a certain pattern but it doesn't tell you exactly where
149 |   the match occurs or what the match is (for a more complicated
150 |   regex).
151 | \item The \code{regexpr} function gives you the index into each string
152 |   where the match begins and the length of the match for that string.
153 | \item \code{regexpr} only gives you the first match of the string
154 |   (reading left to right). \code{gregexpr} will give you all of the
155 |   matches in a given string.
156 | \end{itemize}
157 | \end{frame}
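    | 
    | \begin{frame}[fragile]{regexpr vs. gregexpr: A Sketch}
    | The difference between the two functions is easiest to see on a
    | short string.  The string below is made up for this example.
    | \begin{verbatim}
    | x <- "Found on Monday; found again on Friday"
    | r <- regexpr("[Ff]ound", x)
    | as.integer(r)                ## 1: start of the first match
    | attr(r, "match.length")      ## 5
    | g <- gregexpr("[Ff]ound", x)[[1]]
    | as.integer(g)                ## 1 18: starts of all matches
    | \end{verbatim}
    | \end{frame}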
158 | 
159 | 
160 | \begin{frame}[fragile]{regexpr}
161 | How can we find the date of the homicide?
162 | \begin{verbatim}
163 | > homicides[1]
164 | [1] "39.311024, -76.674227, iconHomicideShooting, 'p2', '<dl><dt>Leon
165 | Nelson</dt><dd class=\"address\">3400 Clifton Ave.<br />Baltimore, 
166 | MD 21216</dd><dd>black male, 17 years old</dd>
167 | <dd>Found on January 1, 2007</dd><dd>Victim died at Shock 
168 | Trauma</dd><dd>Cause: shooting</dd></dl>'"
169 | \end{verbatim}
170 | Can we just 'grep' on ``Found''?
171 | \end{frame}
172 | 
173 | \begin{frame}[fragile]{regexpr}
174 | The word 'found' may be found elsewhere in the entry.
175 | \begin{verbatim}
176 | > homicides[954]
177 | [1] "39.30677400000, -76.59891100000, icon_homicide_shooting, 'p816', 
178 | '<dl><dd class=\"address\">1400 N Caroline St<br />Baltimore, MD 21213</dd>
179 | <dd>Race: Black<br />Gender: male<br />Age: 29 years old</dd>
180 | <dd>Found on March  3, 2010</dd><dd>Victim died at Scene</dd>
181 | <dd>Cause: Shooting</dd><dd class=\"popup-note\"><p>Wheeler\\'s body 
182 | was&nbsp;found on the grounds of Dr. Bernard Harris Sr.&nbsp;Elementary 
183 | School</p></dd></dl>'"
184 | \end{verbatim}
185 | \end{frame}
186 | 
187 | \begin{frame}[fragile]{regexpr}
188 | Let's use the pattern
189 | \begin{verbatim}
190 | <dd>[Ff]ound(.*)</dd>
191 | \end{verbatim}
192 | What does this look for?
193 | \begin{verbatim}
194 | > regexpr("<dd>[Ff]ound(.*)</dd>", homicides[1:10])
195 |  [1] 177 178 188 189 178 182 178 187 182 183
196 | attr(,"match.length")
197 |  [1] 93 86 89 90 89 84 85 84 88 84
198 | attr(,"useBytes")
199 | [1] TRUE
200 | > substr(homicides[1], 177, 177 + 93 - 1)
201 | [1] "<dd>Found on January 1, 2007</dd><dd>Victim died at Shock
202 |  Trauma</dd><dd>Cause: shooting</dd>"
203 | \end{verbatim}
204 | \end{frame}
205 | 
206 | 
207 | \begin{frame}[fragile]{regexpr}
208 | The previous pattern was too greedy and matched too much of the
209 | string. We need to use the \code{?} metacharacter to make the regex
210 | ``lazy''.
211 | \begin{verbatim}
212 | > regexpr("<dd>[Ff]ound(.*?)</dd>", homicides[1:10])
213 |  [1] 177 178 188 189 178 182 178 187 182 183
214 | attr(,"match.length")
215 |  [1] 33 33 33 33 33 33 33 33 33 33
216 | attr(,"useBytes")
217 | [1] TRUE
218 | 
219 | > substr(homicides[1], 177, 177 + 33 - 1)
220 | [1] "<dd>Found on January 1, 2007</dd>"
221 | 
222 | \end{verbatim}
223 | \end{frame}
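    | 
    | \begin{frame}[fragile]{Greedy vs. Lazy: A Sketch}
    | The effect of \code{?} can be reproduced without the data file.  The
    | string below is made up in the style of the homicide records.
    | \begin{verbatim}
    | x <- "<dd>Found on May 1, 2010</dd><dd>Cause: Shooting</dd>"
    | greedy <- regexpr("<dd>.*</dd>", x)
    | lazy <- regexpr("<dd>.*?</dd>", x)
    | ## the greedy match runs to the last </dd> in the string
    | attr(greedy, "match.length") == nchar(x)    ## TRUE
    | ## the lazy match stops at the first </dd>
    | s <- as.integer(lazy)
    | substr(x, s, s + attr(lazy, "match.length") - 1)
    | \end{verbatim}
    | The last call returns only the first \code{<dd>} element.
    | \end{frame}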
224 | 
225 | \begin{frame}[fragile]{regmatches}
226 | One handy function is \code{regmatches} which extracts the matches in
227 | the strings for you without you having to use \code{substr}.
228 | \begin{verbatim}
229 | > r <- regexpr("<dd>[Ff]ound(.*?)</dd>", homicides[1:5])
230 | > regmatches(homicides[1:5], r)
231 | [1] "<dd>Found on January 1, 2007</dd>" "<dd>Found on January 2, 2007</dd>"
232 | [3] "<dd>Found on January 2, 2007</dd>" "<dd>Found on January 3, 2007</dd>"
233 | [5] "<dd>Found on January 5, 2007</dd>"
234 | \end{verbatim}
235 | \end{frame}
236 | 
237 | \begin{frame}[fragile]{sub/gsub}
238 | Sometimes we need to clean things up or modify strings by matching a
239 | pattern and replacing it with something else. For example, how can we
240 | extract the date from this string?
241 | \begin{verbatim}
242 | > x <- substr(homicides[1], 177, 177 + 33 - 1)
243 | > x
244 | [1] "<dd>Found on January 1, 2007</dd>"
245 | \end{verbatim}
246 | We want to strip out the stuff surrounding the ``January 1, 2007''
247 | piece.
248 | \begin{verbatim}
249 | > sub("<dd>[Ff]ound on |</dd>", "", x)
250 | [1] "January 1, 2007</dd>"
251 | 
252 | > gsub("<dd>[Ff]ound on |</dd>", "", x)
253 | [1] "January 1, 2007"
254 | \end{verbatim}
255 | \end{frame}
256 | 
257 | 
258 | \begin{frame}[fragile]{sub/gsub}
259 | \code{sub} and \code{gsub} can take vector arguments
260 | \begin{verbatim}
261 | > r <- regexpr("<dd>[Ff]ound(.*?)</dd>", homicides[1:5])
262 | > m <- regmatches(homicides[1:5], r)
263 | > m
264 | [1] "<dd>Found on January 1, 2007</dd>" "<dd>Found on January 2, 2007</dd>"
265 | [3] "<dd>Found on January 2, 2007</dd>" "<dd>Found on January 3, 2007</dd>"
266 | [5] "<dd>Found on January 5, 2007</dd>"
267 | > d <- gsub("<dd>[Ff]ound on |</dd>", "", m)
    | > d
268 | [1] "January 1, 2007" "January 2, 2007" "January 2, 2007" "January 3, 2007"
269 | [5] "January 5, 2007"
270 | > as.Date(d, "%B %d, %Y")
271 | [1] "2007-01-01" "2007-01-02" "2007-01-02" "2007-01-03" "2007-01-05"
272 | \end{verbatim}
273 | \end{frame}
274 | 
275 | \begin{frame}[fragile]{regexec}
276 | The \code{regexec} function works like \code{regexpr} except it gives
277 | you the indices for parenthesized sub-expressions.
278 | \begin{verbatim}
279 | > regexec("<dd>[Ff]ound on (.*?)</dd>", homicides[1])
280 | [[1]]
281 | [1] 177 190
282 | attr(,"match.length")
283 | [1] 33 15
284 | 
285 | > regexec("<dd>[Ff]ound on .*?</dd>", homicides[1])
286 | [[1]]
287 | [1] 177
288 | attr(,"match.length")
289 | [1] 33
290 | \end{verbatim}
291 | \end{frame}
292 | 
293 | 
294 | \begin{frame}[fragile]{regexec}
295 |   Now we can extract the string in the parenthesized sub-expression.
296 | \begin{verbatim}
297 | > regexec("<dd>[Ff]ound on (.*?)</dd>", homicides[1])
298 | [[1]]
299 | [1] 177 190
300 | attr(,"match.length")
301 | [1] 33 15
302 | 
303 | > substr(homicides[1], 177, 177 + 33 - 1)
304 | [1] "<dd>Found on January 1, 2007</dd>"
305 | 
306 | > substr(homicides[1], 190, 190 + 15 - 1)
307 | [1] "January 1, 2007"
308 | \end{verbatim}
309 | \end{frame}
310 | 
311 | \begin{frame}[fragile]{regexec}
312 | Even easier with the \code{regmatches} function.
313 | \begin{verbatim}
314 | > r <- regexec("<dd>[Ff]ound on (.*?)</dd>", homicides[1:2])
315 | > regmatches(homicides[1:2], r)
316 | [[1]]
317 | [1] "<dd>Found on January 1, 2007</dd>" "January 1, 2007"                  
318 | 
319 | [[2]]
320 | [1] "<dd>Found on January 2, 2007</dd>" "January 2, 2007"
321 | \end{verbatim}
322 | \end{frame}
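
\begin{frame}[fragile]{regexec}
The \code{homicides} data are not included with these slides, so here
is a minimal sketch on a made-up character vector (\code{x} is
hypothetical) showing how \code{regexec} and \code{regmatches} pull out
a parenthesized sub-expression.
\begin{verbatim}
> x <- c("<dd>Found on January 1, 2007</dd>",
+        "<dd>found on March 15, 2008</dd>")
> r <- regexec("<dd>[Ff]ound on (.*?)</dd>", x)
> m <- regmatches(x, r)
> sapply(m, function(elt) elt[2])
[1] "January 1, 2007" "March 15, 2008"
\end{verbatim}
The first element of each match is the full match; the second is the
text captured by the parentheses.
\end{frame}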
323 | 
324 | \begin{frame}[fragile]{regexec}
325 | Let's make a plot of monthly homicide counts
326 | \begin{verbatim}
327 | > r <- regexec("<dd>[Ff]ound on (.*?)</dd>", homicides)
328 | > m <- regmatches(homicides, r)
329 | > dates <- sapply(m, function(x) x[2])
330 | > dates <- as.Date(dates, "%B %d, %Y")
331 | > hist(dates, "month", freq = TRUE)
332 | \end{verbatim}
333 | \end{frame}
334 | 
335 | \begin{frame}[fragile]{regexec}
336 | \includegraphics[height=3.2in]{homicide-month}
337 | \end{frame}
338 | 
339 | 
340 | \begin{frame}{Summary}
341 | The primary R functions for dealing with regular expressions are
342 | \begin{itemize}
343 | \item \code{grep}, \code{grepl}: Search for matches of a regular
344 |   expression/pattern in a character vector
345 | \item \code{regexpr}, \code{gregexpr}: Search a character vector for regular
346 |   expression matches and return the indices where the match begins;
347 |   useful in conjunction with \code{regmatches}
348 | \item \code{sub}, \code{gsub}: Search a character vector for regular
349 |   expression matches and replace that match with another string
350 | \item \code{regexec}: Gives you indices of parenthesized sub-expressions.
351 | \end{itemize}
352 | \end{frame}
353 | 
354 | 
355 | 
356 | 
357 | 
358 | 
359 | 
360 | 
361 | \end{document}
362 | 


--------------------------------------------------------------------------------
/help.ppt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/rdpeng/CourseraLectures/f58ebd7bc02e275d0428b8c2bb0ff9a2668e5990/help.ppt


--------------------------------------------------------------------------------
/homicide-month.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/rdpeng/CourseraLectures/f58ebd7bc02e275d0428b8c2bb0ff9a2668e5990/homicide-month.pdf


--------------------------------------------------------------------------------
/knitr.pptx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/rdpeng/CourseraLectures/f58ebd7bc02e275d0428b8c2bb0ff9a2668e5990/knitr.pptx


--------------------------------------------------------------------------------
/linearmodelsim.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/rdpeng/CourseraLectures/f58ebd7bc02e275d0428b8c2bb0ff9a2668e5990/linearmodelsim.pdf


--------------------------------------------------------------------------------
/loopfunctions.tex:
--------------------------------------------------------------------------------
  1 | \documentclass[aspectratio=169]{beamer}
  2 | 
  3 | \mode<presentation>
  4 | {
  5 |   \usetheme{Warsaw}
  6 |   % or ...
  7 | 
  8 |   \setbeamercovered{transparent}
  9 |   % or whatever (possibly just delete it)
 10 | }
 11 | 
 12 | 
 13 | \usepackage[english]{babel}
 14 | \usepackage[latin1]{inputenc}
 15 | \usepackage{graphicx}
 16 | %\usepackage{times}
 17 | %\usepackage[T1]{fontenc}
 18 | % Or whatever. Note that the encoding and the font should match. If T1
 19 | % does not look nice, try deleting the line with the fontenc.
 20 | 
 21 | \usepackage{amsmath,amsfonts,amssymb}
 22 | 
 23 | \input{macros}
 24 | 
 25 | \title[The R Language]{Introduction to the R Language}
 26 | 
 27 | \subtitle{Loop Functions}
 28 | 
 29 | \date{Computing for Data Analysis}
 30 | 
 31 | 
 32 | 
 33 | \begin{document}
 34 | 
 35 | \begin{frame}
 36 |   \titlepage
 37 | \end{frame}
 38 | 
 39 | \begin{frame}{Looping on the Command Line}
 40 | Writing \code{for} and \code{while} loops is useful when programming but not
 41 | particularly easy when working interactively on the command line.
 42 | R provides several functions that implement looping to make life easier.
 43 | \begin{itemize}
 44 | \item
 45 | \code{lapply}:  Loop over a list and evaluate a function on each element
 46 | \item
 47 | \code{sapply}:  Same as \code{lapply} but try to simplify the result
 48 | \item
 49 | \code{apply}:  Apply a function over the margins of an array
 50 | \item
 51 | \code{tapply}:  Apply a function over subsets of a vector
 52 | \item
 53 | \code{mapply}:  Multivariate version of \code{lapply}
 54 | \end{itemize}
 55 | An auxiliary function \code{split} is also useful, particularly in
 56 | conjunction with \code{lapply}.
 57 | \end{frame}
 58 | 
 59 | \begin{frame}[fragile]{lapply}
 60 | \code{lapply} takes three arguments: a list \code{X}, a function (or
 61 | the name of a function) \code{FUN}, and other arguments via its
 62 | \code{...} argument.  If \code{X} is not a list, it will be coerced to
 63 | a list using \code{as.list}.
 64 | \begin{verbatim}
 65 | > lapply
 66 | function (X, FUN, ...) 
 67 | {
 68 |     FUN <- match.fun(FUN)
 69 |     if (!is.vector(X) || is.object(X)) 
 70 |         X <- as.list(X)
 71 |     .Internal(lapply(X, FUN))
 72 | }
 73 | \end{verbatim}
 74 | The actual looping is done internally in C code.
 75 | \end{frame}
 76 | 
 77 | 
 78 | \begin{frame}[fragile]{lapply}
 79 | \code{lapply} always returns a list, regardless of the class of the
 80 | input.
 81 | \begin{verbatim}
 82 | > x <- list(a = 1:5, b = rnorm(10))
 83 | > lapply(x, mean)
 84 | $a
 85 | [1] 3
 86 | 
 87 | $b
 88 | [1] 0.0296824
 89 | \end{verbatim}
 90 | \end{frame}
 91 | 
 92 | 
 93 | \begin{frame}[fragile]{lapply}
 94 | \begin{verbatim}
 95 | > x <- list(a = 1:4, b = rnorm(10), c = rnorm(20, 1), d = rnorm(100, 5))
 96 | > lapply(x, mean)
 97 | $a
 98 | [1] 2.5
 99 | 
100 | $b
101 | [1] 0.06082667
102 | 
103 | $c
104 | [1] 1.467083
105 | 
106 | $d
107 | [1] 5.074749
108 | \end{verbatim}
109 | \end{frame}
110 | 
111 | 
112 | \begin{frame}[fragile]{lapply}
113 | \begin{verbatim}
114 | > x <- 1:4
115 | > lapply(x, runif)
116 | [[1]]
117 | [1] 0.2675082
118 | 
119 | [[2]]
120 | [1] 0.2186453 0.5167968
121 | 
122 | [[3]]
123 | [1] 0.2689506 0.1811683 0.5185761
124 | 
125 | [[4]]
126 | [1] 0.5627829 0.1291569 0.2563676 0.7179353
127 | \end{verbatim}
128 | \end{frame}
129 | 
130 | \begin{frame}[fragile]{lapply}
131 | \begin{verbatim}
132 | > x <- 1:4
133 | > lapply(x, runif, min = 0, max = 10)
134 | [[1]]
135 | [1] 3.302142
136 | 
137 | [[2]]
138 | [1] 6.848960 7.195282
139 | 
140 | [[3]]
141 | [1] 3.5031416 0.8465707 9.7421014
142 | 
143 | [[4]]
144 | [1] 1.195114 3.594027 2.930794 2.766946
145 | \end{verbatim}
146 | \end{frame}
147 | 
148 | 
149 | \begin{frame}[fragile]{lapply}
150 | \code{lapply} and friends make heavy use of \textit{anonymous functions}.
151 | \begin{verbatim}
152 | > x <- list(a = matrix(1:4, 2, 2), b = matrix(1:6, 3, 2))
153 | > x
154 | $a
155 |      [,1] [,2]
156 | [1,]    1    3
157 | [2,]    2    4
158 | 
159 | $b
160 |      [,1] [,2]
161 | [1,]    1    4
162 | [2,]    2    5
163 | [3,]    3    6
164 | \end{verbatim}
165 | \end{frame}
166 | 
167 | 
168 | \begin{frame}[fragile]{lapply}
169 | An anonymous function for extracting the first column of each matrix.
170 | \begin{verbatim}
171 | > lapply(x, function(elt) elt[,1])
172 | $a
173 | [1] 1 2
174 | 
175 | $b
176 | [1] 1 2 3
177 | \end{verbatim}
178 | \end{frame}
179 | 
180 | 
181 | \begin{frame}[fragile]{sapply}
182 | \code{sapply} will try to simplify the result of \code{lapply} if
183 | possible.
184 | \begin{itemize}
185 | \item
186 | If the result is a list where every element is length 1, then a vector
187 | is returned
188 | \item
189 | If the result is a list where every element is a vector of the same
190 | length ($> 1$), a matrix is returned.
191 | \item
192 | If it can't figure things out, a list is returned
193 | \end{itemize}
194 | \end{frame}
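
\begin{frame}[fragile]{sapply}
A minimal sketch of the three cases; the toy list \code{x} is made up
for illustration.
\begin{verbatim}
> x <- list(a = 1:4, b = 5:8)
> sapply(x, mean)                        # length 1 each: vector
  a   b 
2.5 6.5 
> sapply(x, range)                       # length 2 each: matrix
     a b
[1,] 1 5
[2,] 4 8
> sapply(x, function(elt) elt[elt > 2])  # lengths differ: list
$a
[1] 3 4

$b
[1] 5 6 7 8
\end{verbatim}
\end{frame}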
195 | 
196 | 
197 | 
198 | \begin{frame}[fragile]{sapply}
199 | \begin{verbatim}
200 | > x <- list(a = 1:4, b = rnorm(10), c = rnorm(20, 1), d = rnorm(100, 5))
201 | > lapply(x, mean)
202 | $a
203 | [1] 2.5
204 | 
205 | $b
206 | [1] 0.06082667
207 | 
208 | $c
209 | [1] 1.467083
210 | 
211 | $d
212 | [1] 5.074749
213 | \end{verbatim}
214 | \end{frame}
215 | 
216 | \begin{frame}[fragile]{sapply}
217 | \begin{verbatim}
218 | > sapply(x, mean)
219 |          a          b          c          d 
220 | 2.50000000 0.06082667 1.46708277 5.07474950
221 | 
222 | 
223 | > mean(x)
224 | [1] NA
225 | Warning message:
226 | In mean.default(x) : argument is not numeric or logical: returning NA
227 | \end{verbatim}
228 | \end{frame}
229 | 
230 | \begin{frame}[fragile]{apply}
231 | \code{apply} is used to evaluate a function (often an anonymous one)
232 | over the margins of an array.
233 | \begin{itemize}
234 | \item
235 | It is most often used to apply a function to the rows or columns of a matrix
236 | \item
237 | It can be used with general arrays, e.g. taking the average of an
238 | array of matrices
239 | \item
240 | It is not really faster than writing a loop, but it works in one line!
241 | \end{itemize}
242 | \end{frame}
243 | 
244 | 
245 | \begin{frame}[fragile]{apply}
246 | \begin{verbatim}
247 | > str(apply)
248 | function (X, MARGIN, FUN, ...)  
249 | \end{verbatim}
250 | \begin{itemize}
251 | \item
252 | \code{X} is an array
253 | \item
254 | \code{MARGIN} is an integer vector indicating which margins should be
255 | ``retained''.
256 | \item
257 | \code{FUN} is a function to be applied
258 | \item
259 | \code{...} is for other arguments to be passed to \code{FUN}
260 | \end{itemize}
261 | \end{frame}
262 | 
263 | \begin{frame}[fragile]{apply}
264 | \begin{verbatim}
265 | > x <- matrix(rnorm(200), 20, 10)
266 | > apply(x, 2, mean)
267 |  [1]  0.04868268  0.35743615 -0.09104379
268 |  [4] -0.05381370 -0.16552070 -0.18192493
269 |  [7]  0.10285727  0.36519270  0.14898850
270 | [10]  0.26767260
271 | 
272 | > apply(x, 1, sum)
273 |  [1] -1.94843314  2.60601195  1.51772391
274 |  [4] -2.80386816  3.73728682 -1.69371360
275 |  [7]  0.02359932  3.91874808 -2.39902859
276 | [10]  0.48685925 -1.77576824 -3.34016277
277 | [13]  4.04101009  0.46515429  1.83687755
278 | [16]  4.36744690  2.21993789  2.60983764
279 | [19] -1.48607630  3.58709251
280 | \end{verbatim}
281 | \end{frame}
282 | 
283 | 
284 | \begin{frame}{col/row sums and means}
285 | For sums and means of matrix dimensions, we have some shortcuts.
286 | \begin{itemize}
287 | \item
288 | \code{rowSums} = \code{apply(x, 1, sum)}
289 | \item
290 | \code{rowMeans} = \code{apply(x, 1, mean)}
291 | \item
292 | \code{colSums} = \code{apply(x, 2, sum)}
293 | \item
294 | \code{colMeans} = \code{apply(x, 2, mean)}
295 | \end{itemize}
296 | The shortcut functions are \textit{much} faster, but you won't notice
297 | unless you're using a large matrix.
298 | \end{frame}
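
\begin{frame}[fragile]{col/row sums and means}
A quick check, on a small made-up matrix, that the shortcuts agree with
the corresponding \code{apply} calls.
\begin{verbatim}
> x <- matrix(1:6, 2, 3)
> rowSums(x)
[1]  9 12
> all.equal(rowSums(x), apply(x, 1, sum))
[1] TRUE
> all.equal(colMeans(x), apply(x, 2, mean))
[1] TRUE
\end{verbatim}
\end{frame}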
299 | 
300 | \begin{frame}[fragile]{Other Ways to Apply}
301 | Quantiles of the rows of a matrix.
302 | \begin{verbatim}
303 | > x <- matrix(rnorm(200), 20, 10)
304 | > apply(x, 1, quantile, probs = c(0.25, 0.75))
305 |           [,1]        [,2]       [,3]        [,4]
306 | 25% -0.3304284 -0.99812467 -0.9186279 -0.49711686
307 | 75%  0.9258157  0.07065724  0.3050407 -0.06585436
308 |            [,5]       [,6]      [,7]       [,8]
309 | 25% -0.05999553 -0.6588380 -0.653250 0.01749997
310 | 75%  0.52928743  0.3727449  1.255089 0.72318419
311 |           [,9]      [,10]      [,11]      [,12]
312 | 25% -1.2467955 -0.8378429 -1.0488430 -0.7054902
313 | 75%  0.3352377  0.7297176  0.3113434  0.4581150
314 |          [,13]      [,14]      [,15]      [,16]
315 | 25% -0.1895108 -0.5729407 -0.5968578 -0.9517069
316 | 75%  0.5326299  0.5064267  0.4933852  0.8868922
317 |          [,17]      [,18]      [,19]     [,20]
318 | 25% -0.2502935 -0.7488003 -0.7190923 -0.638243
319 | 75%  0.7763024  0.2873202  0.6416363  1.271602
320 | \end{verbatim}
321 | \end{frame}
322 | 
323 | 
324 | \begin{frame}[fragile]{apply}
325 | Average matrix in an array
326 | \begin{verbatim}
327 | > a <- array(rnorm(2 * 2 * 10), c(2, 2, 10))
328 | > apply(a, c(1, 2), mean)
329 |            [,1]        [,2]
330 | [1,] -0.2353245 -0.03980211
331 | [2,] -0.3339748  0.04364908
332 | 
333 | > rowMeans(a, dims = 2)
334 |            [,1]        [,2]
335 | [1,] -0.2353245 -0.03980211
336 | [2,] -0.3339748  0.04364908
337 | \end{verbatim}
338 | \end{frame}
339 | 
340 | \begin{frame}[fragile]{tapply}
341 | \code{tapply} is used to apply a function over subsets of a vector.  I
342 | don't know why it's called \code{tapply}.
343 | \begin{verbatim}
344 | > str(tapply)
345 | function (X, INDEX, FUN = NULL, ..., simplify = TRUE)  
346 | \end{verbatim}
347 | \begin{itemize}
348 | \item
349 | \code{X} is a vector
350 | \item
351 | \code{INDEX} is a factor or a list of factors (or else they are coerced to
352 | factors)
353 | \item
354 | \code{FUN} is a function to be applied
355 | \item
356 | \code{...} contains other arguments to be passed to \code{FUN}
357 | \item
358 | \code{simplify} indicates whether the result should be simplified
359 | \end{itemize}
360 | \end{frame}
361 | 
362 | \begin{frame}[fragile]{tapply}
363 | Take group means.
364 | \begin{verbatim}
365 | > x <- c(rnorm(10), runif(10), rnorm(10, 1))
366 | > f <- gl(3, 10)
367 | > f
368 |  [1] 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 3 3 3
369 | [24] 3 3 3 3 3 3 3
370 | Levels: 1 2 3
371 | > tapply(x, f, mean)
372 |         1         2         3 
373 | 0.1144464 0.5163468 1.2463678 
374 | \end{verbatim}
375 | \end{frame}
376 | 
377 | \begin{frame}[fragile]{tapply}
378 | Take group means without simplification.
379 | \begin{verbatim}
380 | > tapply(x, f, mean, simplify = FALSE)
381 | $`1`
382 | [1] 0.1144464
383 | 
384 | $`2`
385 | [1] 0.5163468
386 | 
387 | $`3`
388 | [1] 1.246368
389 | \end{verbatim}
390 | \end{frame}
391 | 
392 | 
393 | \begin{frame}[fragile]{tapply}
394 | Find group ranges.
395 | \begin{verbatim}
396 | > tapply(x, f, range)
397 | $`1`
398 | [1] -1.097309  2.694970
399 | 
400 | $`2`
401 | [1] 0.09479023 0.79107293
402 | 
403 | $`3`
404 | [1] 0.4717443 2.5887025
405 | \end{verbatim}
406 | \end{frame}
407 | 
408 | 
409 | \begin{frame}[fragile]{split}
410 | \code{split} takes a vector or other object and splits it into groups
411 | determined by a factor or list of factors.
412 | \begin{verbatim}
413 | > str(split)
414 | function (x, f, drop = FALSE, ...)  
415 | \end{verbatim}
416 | \begin{itemize}
417 | \item
418 | \code{x} is a vector (or list) or data frame
419 | \item
420 | \code{f} is a factor (or coerced to one) or a list of factors
421 | \item
422 | \code{drop} indicates whether empty factor levels should be dropped
423 | \end{itemize}
424 | \end{frame}
425 | 
426 | \begin{frame}[fragile]{split}
427 | \begin{verbatim}
428 | > x <- c(rnorm(10), runif(10), rnorm(10, 1))
429 | > f <- gl(3, 10)
430 | > split(x, f)
431 | $`1`
432 |  [1] -0.8493038 -0.5699717 -0.8385255 -0.8842019
433 |  [5]  0.2849881  0.9383361 -1.0973089  2.6949703
434 |  [9]  1.5976789 -0.1321970
435 | 
436 | $`2`
437 |  [1] 0.09479023 0.79107293 0.45857419 0.74849293
438 |  [5] 0.34936491 0.35842084 0.78541705 0.57732081
439 |  [9] 0.46817559 0.53183823
440 | 
441 | $`3`
442 |  [1] 0.6795651 0.9293171 1.0318103 0.4717443
443 |  [5] 2.5887025 1.5975774 1.3246333 1.4372701
444 |  [9] 1.3961579 1.0068999
445 | \end{verbatim}
446 | \end{frame}
447 | 
448 | \begin{frame}[fragile]{split}
449 | A common idiom is \code{split} followed by an \code{lapply}.
450 | \begin{verbatim}
451 | > lapply(split(x, f), mean)
452 | $`1`
453 | [1] 0.1144464
454 | 
455 | $`2`
456 | [1] 0.5163468
457 | 
458 | $`3`
459 | [1] 1.246368
460 | \end{verbatim}
461 | \end{frame}
462 | 
463 | 
464 | \begin{frame}[fragile]{Splitting a Data Frame}
465 | \begin{verbatim}
466 | > library(datasets)
467 | > head(airquality)
468 |   Ozone Solar.R Wind Temp Month Day
469 | 1    41     190  7.4   67     5   1
470 | 2    36     118  8.0   72     5   2
471 | 3    12     149 12.6   74     5   3
472 | 4    18     313 11.5   62     5   4
473 | 5    NA      NA 14.3   56     5   5
474 | 6    28      NA 14.9   66     5   6
475 | \end{verbatim}
476 | \end{frame}
477 | 
478 | \begin{frame}[fragile]{Splitting a Data Frame}
479 | \begin{verbatim}
480 | > s <- split(airquality, airquality$Month)
481 | > lapply(s, function(x) colMeans(x[, c("Ozone", "Solar.R", "Wind")]))
482 | $`5`
483 |    Ozone  Solar.R     Wind 
484 |       NA       NA 11.62258 
485 | 
486 | $`6`
487 |     Ozone   Solar.R      Wind 
488 |        NA 190.16667  10.26667 
489 | 
490 | $`7`
491 |      Ozone    Solar.R       Wind 
492 |         NA 216.483871   8.941935 
493 | 
494 | $`8`
495 |    Ozone  Solar.R     Wind 
496 |       NA       NA 8.793548 
497 | 
498 | $`9`
499 |    Ozone  Solar.R     Wind 
500 |       NA 167.4333  10.1800 
501 | \end{verbatim}
502 | \end{frame}
503 | 
504 | 
505 | \begin{frame}[fragile]{Splitting a Data Frame}
506 | \begin{verbatim}
507 | > sapply(s, function(x) colMeans(x[, c("Ozone", "Solar.R", "Wind")]))
508 |                5         6          7        8        9
509 | Ozone         NA        NA         NA       NA       NA
510 | Solar.R       NA 190.16667 216.483871       NA 167.4333
511 | Wind    11.62258  10.26667   8.941935 8.793548  10.1800
512 | 
513 | 
514 | 
515 | > sapply(s, function(x) colMeans(x[, c("Ozone", "Solar.R", "Wind")], 
516 |                                  na.rm = TRUE))
517 |                 5         6          7          8         9
518 | Ozone    23.61538  29.44444  59.115385  59.961538  31.44828
519 | Solar.R 181.29630 190.16667 216.483871 171.857143 167.43333
520 | Wind     11.62258  10.26667   8.941935   8.793548  10.18000
521 | \end{verbatim}
522 | \end{frame}
523 | 
524 | 
525 | 
526 | \begin{frame}[fragile]{Splitting on More than One Level}
527 | \begin{verbatim}
528 | > x <- rnorm(10)
529 | > f1 <- gl(2, 5)
530 | > f2 <- gl(5, 2)
531 | > f1
532 |  [1] 1 1 1 1 1 2 2 2 2 2
533 | Levels: 1 2
534 | > f2
535 |  [1] 1 1 2 2 3 3 4 4 5 5
536 | Levels: 1 2 3 4 5
537 | > interaction(f1, f2)
538 |  [1] 1.1 1.1 1.2 1.2 1.3 2.3 2.4 2.4 2.5 2.5
539 | 10 Levels: 1.1 2.1 1.2 2.2 1.3 2.3 1.4 ... 2.5
540 | \end{verbatim}
541 | \end{frame}
542 | 
543 | \begin{frame}[fragile]{Splitting on More than One Level}
544 | Interactions can create empty levels.
545 | \begin{verbatim}
546 | > str(split(x, list(f1, f2)))
547 | List of 10
548 |  $ 1.1: num [1:2] -0.378  0.445
549 |  $ 2.1: num(0) 
550 |  $ 1.2: num [1:2] 1.4066 0.0166
551 |  $ 2.2: num(0) 
552 |  $ 1.3: num -0.355
553 |  $ 2.3: num 0.315
554 |  $ 1.4: num(0) 
555 |  $ 2.4: num [1:2] -0.907  0.723
556 |  $ 1.5: num(0) 
557 |  $ 2.5: num [1:2] 0.732 0.360
558 | \end{verbatim}
559 | \end{frame}
560 | 
561 | \begin{frame}[fragile]{split}
562 | Empty levels can be dropped.
563 | \begin{verbatim}
564 | > str(split(x, list(f1, f2), drop = TRUE))
565 | List of 6
566 |  $ 1.1: num [1:2] -0.378  0.445
567 |  $ 1.2: num [1:2] 1.4066 0.0166
568 |  $ 1.3: num -0.355
569 |  $ 2.3: num 0.315
570 |  $ 2.4: num [1:2] -0.907  0.723
571 |  $ 2.5: num [1:2] 0.732 0.360
572 | \end{verbatim}
573 | \end{frame}
574 | 
575 | 
576 | 
577 | 
578 | \begin{frame}[fragile]{mapply}
579 | \code{mapply} is a multivariate apply of sorts which applies a
580 | function in parallel over a set of arguments.
581 | \begin{verbatim}
582 | > str(mapply)
583 | function (FUN, ..., MoreArgs = NULL, SIMPLIFY = TRUE, 
584 |           USE.NAMES = TRUE)
585 | \end{verbatim}
586 | \begin{itemize}
587 | \item
588 | \code{FUN} is a function to apply
589 | \item
590 | \code{...} contains arguments to apply over
591 | \item
592 | \code{MoreArgs} is a list of other arguments to \code{FUN}.
593 | \item
594 | \code{SIMPLIFY} indicates whether the result should be simplified
595 | \end{itemize}
596 | \end{frame}
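
\begin{frame}[fragile]{mapply}
A minimal sketch of \code{MoreArgs} and \code{SIMPLIFY}; the anonymous
functions are made up for illustration.  Arguments in \code{...} are
looped over in parallel, while \code{MoreArgs} is passed unchanged to
every call.
\begin{verbatim}
> mapply(function(x, y, z) x + y + z, 1:3, 4:6,
+        MoreArgs = list(z = 10))
[1] 15 17 19
> mapply(function(x, y) x + y, 1:3, 4:6, SIMPLIFY = FALSE)
[[1]]
[1] 5

[[2]]
[1] 7

[[3]]
[1] 9
\end{verbatim}
\end{frame}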
597 | 
598 | 
599 | \begin{frame}[fragile]{mapply}
600 | The following is tedious to type
601 | \begin{verbatim}
602 | list(rep(1, 4), rep(2, 3), rep(3, 2), rep(4, 1))
603 | \end{verbatim}
604 | Instead we can do
605 | \begin{verbatim}
606 | > mapply(rep, 1:4, 4:1)
607 | [[1]]
608 | [1] 1 1 1 1
609 | 
610 | [[2]]
611 | [1] 2 2 2
612 | 
613 | [[3]]
614 | [1] 3 3
615 | 
616 | [[4]]
617 | [1] 4
618 | \end{verbatim}
619 | \end{frame}
620 | 
621 | 
622 | \begin{frame}[fragile]{Vectorizing a Function}
623 | \begin{verbatim}
624 | > noise <- function(n, mean, sd) {
625 | +         rnorm(n, mean, sd)
626 | + }
627 | > noise(5, 1, 2)
628 | [1]  2.4831198  2.4790100  0.4855190 -1.2117759
629 | [5] -0.2743532
630 | 
631 | > noise(1:5, 1:5, 2)
632 | [1] -4.2128648 -0.3989266  4.2507057  1.1572738
633 | [5]  3.7413584
634 | \end{verbatim}
635 | \end{frame}
636 | 
637 | \begin{frame}[fragile]{Instant Vectorization}
638 | \begin{verbatim}
639 | > mapply(noise, 1:5, 1:5, 2)
640 | [[1]]
641 | [1] 1.037658
642 | 
643 | [[2]]
644 | [1] 0.7113482 2.7555797
645 | 
646 | [[3]]
647 | [1] 2.769527 1.643568 4.597882
648 | 
649 | [[4]]
650 | [1] 4.476741 5.658653 3.962813 1.204284
651 | 
652 | [[5]]
653 | [1] 4.797123 6.314616 4.969892 6.530432 6.723254
654 | \end{verbatim}
655 | \end{frame}
656 | 
657 | \begin{frame}[fragile]{Instant Vectorization}
658 | Which is the same as
659 | \begin{verbatim}
660 | list(noise(1, 1, 2), noise(2, 2, 2),
661 |      noise(3, 3, 2), noise(4, 4, 2),
662 |      noise(5, 5, 2))
663 | \end{verbatim}
664 | \end{frame}
665 | 
666 | 
667 | \end{document}
668 | 


--------------------------------------------------------------------------------
/macros.tex:
--------------------------------------------------------------------------------
1 | \newcommand{\pkg}{\textbf}
2 | \newcommand{\code}{\texttt}
3 | 


--------------------------------------------------------------------------------
/mulike.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/rdpeng/CourseraLectures/f58ebd7bc02e275d0428b8c2bb0ff9a2668e5990/mulike.pdf


--------------------------------------------------------------------------------
/overview_history.tex:
--------------------------------------------------------------------------------
  1 | \documentclass[aspectratio=169]{beamer}
  2 | 
  3 | \mode<presentation>
  4 | {
  5 |   \usetheme{Warsaw}
  6 |   % or ...
  7 | 
  8 |   \setbeamercovered{transparent}
  9 |   % or whatever (possibly just delete it)
 10 | }
 11 | 
 12 | 
 13 | \usepackage[english]{babel}
 14 | \usepackage[latin1]{inputenc}
 15 | \usepackage{graphicx}
 16 | %\usepackage{times}
 17 | %\usepackage[T1]{fontenc}
 18 | % Or whatever. Note that the encoding and the font should match. If T1
 19 | % does not look nice, try deleting the line with the fontenc.
 20 | 
 21 | \usepackage{amsmath,amsfonts,amssymb}
 22 | 
 23 | \input{macros}
 24 | 
 25 | \title[Overview and History of R]{Overview and History of R}
 26 | 
 27 | 
 28 | \date{Computing for Data Analysis}
 29 | 
 30 | \setbeamertemplate{footline}[page number]
 31 | 
 32 | \begin{document}
 33 | 
 34 | \begin{frame}
 35 |   \titlepage
 36 | \end{frame}
 37 | 
 38 | \begin{frame}{What is R?}
 39 | What is R?
 40 | \end{frame}
 41 | 
 42 | 
 43 | \begin{frame}{What is R?}
 44 | R is a dialect of the S language.
 45 | \end{frame}
 46 | 
 47 | \begin{frame}{What is S?}
 48 | \begin{itemize}
 49 | \item
 50 | S is a language that was developed by John Chambers and others at Bell
 51 | Labs.
 52 | \item
 53 | S was initiated in 1976 as an internal statistical analysis
 54 | environment---originally implemented as Fortran libraries.
 55 | \item
 56 | Early versions of the language did not contain functions for
 57 | statistical modeling.
 58 | \item
 59 | In 1988 the system was rewritten in C and began to resemble the system
 60 | that we have today (this was Version 3 of the language).  The book
 61 | \textit{Statistical Models in S} by Chambers and Hastie (the white 
 62 | book) documents the statistical analysis functionality.
 63 | \item
 64 | Version 4 of the S language was released in 1998 and is the version we
 65 | use today.  The book \textit{Programming with Data} by John Chambers
 66 | (the green book) documents this version of the language.
 67 | \end{itemize}
 68 | \end{frame}
 69 | 
 70 | \begin{frame}{Historical Notes}
 71 | \begin{itemize}
 72 | \item
 73 | In 1993 Bell Labs gave StatSci (now Insightful Corp.) an exclusive
 74 | license to develop and sell the S language.
 75 | \item
 76 | In 2004 Insightful purchased the S language from Lucent for \$2
 77 | million and is the current owner.
 78 | \item
 79 | In 2006, Alcatel purchased Lucent Technologies and is now called
 80 | Alcatel-Lucent.
 81 | \item
 82 | Insightful sells its implementation of the S language under the
 83 | product name S-PLUS and has built a number of fancy features (GUIs,
 84 | mostly) on top of it---hence the ``PLUS''.
 85 | \item
 86 | In 2008 Insightful was acquired by TIBCO for \$25 million.
 87 | \item
 88 | The fundamentals of the S language itself have not changed dramatically
 89 | since 1998.
 90 | \item
 91 | In 1998, S won the Association for Computing Machinery's Software
 92 | System Award.
 93 | \end{itemize}
 94 | \end{frame}
 95 | 
 96 | 
 97 | \begin{frame}{S Philosophy}
 98 | In ``Stages in the Evolution of S'', John Chambers writes:
 99 | \begin{quote}
100 | ``[W]e wanted users to be able to begin in an interactive environment,
101 | where they did not consciously think of themselves as
102 | programming. Then as their needs became clearer and their
103 | sophistication increased, they should be able to slide gradually into
104 | programming, when the language and system aspects would become more
105 | important.''
106 | \end{quote}
107 | http://www.stat.bell-labs.com/S/history.html
108 | \end{frame}
109 | 
110 | \begin{frame}{Back to R}
111 | \begin{itemize}
112 | \item
113 | 1991: Created in New Zealand by Ross Ihaka and Robert Gentleman.
114 | Their experience developing R is documented in a 1996 \textit{JCGS}
115 | paper.
116 | \item
117 | 1993: First announcement of R to the public.
118 | \item
119 | 1995: Martin M\"achler convinces Ross and Robert to use the GNU
120 | General Public License to make R free software.
121 | \item
122 | 1996: Public mailing lists are created (R-help and R-devel).
123 | \item
124 | 1997: The R Core Group is formed (containing some people associated
125 | with S-PLUS).  The core group controls the source code for R.
126 | \item
127 | 2000: R version 1.0.0 is released.
128 | \item
129 | 2012: R version 2.15.1 is released on June 22, 2012.
130 | \end{itemize}
131 | \end{frame}
132 | 
133 | 
134 | \begin{frame}{Features of R}
135 | \begin{itemize}
136 | \item
137 | Syntax is very similar to S, making it easy for S-PLUS users to switch
138 | over.
139 | \item
140 | Semantics are superficially similar to S, but in reality are quite
141 | different (more on that later).
142 | \item
143 | Runs on almost any standard computing platform/OS (even on the
144 | PlayStation 3)
145 | \item
146 | Frequent releases (annual + bugfix releases); active development.
147 | \end{itemize}
148 | \end{frame}
149 | 
150 | \begin{frame}{Features of R (cont'd)}
151 | \begin{itemize}
152 | \item
153 | Quite lean, as far as software goes; functionality is divided into
154 | modular packages
155 | \item
156 | Graphics capabilities are very sophisticated and better than most stat
157 | packages.
158 | \item
159 | Useful for interactive work, but contains a powerful programming
160 | language for developing new tools (user $\longrightarrow$ programmer)
161 | \item
162 | Very active and vibrant user community; R-help and R-devel mailing
163 | lists and Stack Overflow
164 | \end{itemize}
165 | \end{frame}
166 | 
167 | \begin{frame}{Features of R (cont'd)}
168 | It's free!\\
169 | (Both in the sense of beer and in the sense of speech.)
170 | \end{frame}
171 | 
172 | \begin{frame}{Free Software}
173 | With \textit{free software}, you are granted
174 | \begin{itemize}
175 | \item
176 | The freedom to run the program, for any purpose (freedom 0).
177 | \item
178 | The freedom to study how the program works, and adapt it to your needs
179 | (freedom 1). Access to the source code is a precondition for this.
180 | \item
181 | The freedom to redistribute copies so you can help your neighbor
182 | (freedom 2).
183 | \item
184 | The freedom to improve the program, and release your improvements to
185 | the public, so that the whole community benefits (freedom 3). Access
186 | to the source code is a precondition for this.
187 | \end{itemize}
188 | http://www.fsf.org
189 | \end{frame}
190 | 
191 | \begin{frame}{Drawbacks of R}
192 | \begin{itemize}
193 | \item
194 | Essentially based on 40-year-old technology.
195 | \item
196 | Little built-in support for dynamic or 3-D graphics (but things have
197 | improved greatly since the ``old days'').
198 | \item
199 | Functionality is based on consumer demand and user contributions.  If
200 | no one feels like implementing your favorite method, then it's
201 | \textit{your} job!
202 | \begin{itemize}
203 | \item (Or you need to pay someone to do it)
204 | \end{itemize}
205 | \item Objects must generally be stored in physical memory; but there
206 |   have been advancements to deal with this too
207 | \item
208 | Not ideal for all possible situations (but this is a drawback of all
209 | software packages).
210 | \end{itemize}
211 | \end{frame}
212 | 
213 | 
214 | 
215 | \begin{frame}{Design of the R System}
216 | The R system is divided into 2 conceptual parts:
217 | \begin{enumerate}
218 | \item
219 | The ``base'' R system that you download from CRAN
220 | \item
221 | Everything else.
222 | \end{enumerate}
223 | R functionality is divided into a number of \textit{packages}.
224 | \begin{itemize}
225 | \item
226 | The ``base'' R system contains, among other things, the \pkg{base}
227 | package which is required to run R and contains the most fundamental
228 | functions.
229 | \item
230 | The other packages contained in the ``base'' system include
231 | \pkg{utils}, \pkg{stats}, \pkg{datasets}, \pkg{graphics},
232 | \pkg{grDevices}, \pkg{grid}, \pkg{methods}, \pkg{tools},
233 | \pkg{parallel}, \pkg{compiler}, \pkg{splines}, \pkg{tcltk},
234 | \pkg{stats4}.
235 | \item
236 | There are also ``Recommended'' packages: \pkg{boot}, \pkg{class},
237 | \pkg{cluster}, \pkg{codetools}, \pkg{foreign}, \pkg{KernSmooth},
238 | \pkg{lattice}, \pkg{mgcv}, \pkg{nlme}, \pkg{rpart}, \pkg{survival},
239 | \pkg{MASS}, \pkg{spatial}, \pkg{nnet}, \pkg{Matrix}.
240 | \end{itemize}
241 | \end{frame}
242 | 
243 | 
244 | 
245 | 
246 | \begin{frame}{Design of the R System}
247 | And there are many other packages available:
248 | \begin{itemize}
249 | \item
250 | There are about $4000$ packages on CRAN that have been developed by
251 | users and programmers around the world.
252 | \item
253 | There are also many packages associated with the Bioconductor project
254 | (http://bioconductor.org).
255 | \item
256 | People often make packages available on their personal websites; there
257 | is no reliable way to keep track of how many packages are available in
258 | this fashion.
259 | \end{itemize}
260 | \end{frame}
261 | 
262 | \begin{frame}{Some R Resources}
263 | Available from CRAN (http://cran.r-project.org)
264 | \begin{itemize}
265 | \item
266 | An Introduction to R
267 | \item
268 | Writing R Extensions
269 | \item
270 | R Data Import/Export
271 | \item
272 | R Installation and Administration (mostly for building R from sources)
273 | \item
274 | R Internals (not for the faint of heart)
275 | \end{itemize}
276 | \end{frame}
277 | 
278 | \begin{frame}{Some Useful Books on S/R}
279 | Standard texts
280 | \begin{itemize}
281 | \item
282 | Chambers (2008). \textit{Software for Data Analysis}, Springer. (your
283 | textbook)
284 | \item
285 | Chambers (1998). \textit{Programming with Data}, Springer.
286 | \item
287 | Venables \& Ripley (2002). \textit{Modern Applied Statistics with S},
288 | Springer.
289 | \item
290 | Venables \& Ripley (2000). \textit{S Programming}, Springer.
291 | \item
292 | Pinheiro \& Bates (2000). \textit{Mixed-Effects Models in S and
293 | S-PLUS}, Springer.
294 | \item
295 | Murrell (2005). \textit{R Graphics}, Chapman \& Hall/CRC Press.
296 | \end{itemize}
297 | Other resources
298 | \begin{itemize}
299 | \item
300 | Springer has a series of books called \textit{Use R!}.
301 | \item
302 | A longer list of books is at
303 | http://www.r-project.org/doc/bib/R-books.html
304 | \end{itemize}
305 | \end{frame}
306 | 
307 | 
308 | 
309 | 
310 | 
311 | 
312 | 
313 | 
314 | 
315 | 
316 | 
317 | 
318 | 
319 | \end{document}
320 | 
321 | 
322 | 


--------------------------------------------------------------------------------
/plotting.tex:
--------------------------------------------------------------------------------
  1 | \documentclass[aspectratio=169]{beamer}
  2 | 
  3 | \mode<presentation>
  4 | {
  5 |   \usetheme{Warsaw}
  6 |   % or ...
  7 | 
  8 |   \setbeamercovered{transparent}
  9 |   % or whatever (possibly just delete it)
 10 | }
 11 | 
 12 | 
 13 | \usepackage[english]{babel}
 14 | \usepackage[latin1]{inputenc}
 15 | \usepackage{graphicx}
 16 | %\usepackage{times}
 17 | %\usepackage[T1]{fontenc}
 18 | % Or whatever. Note that the encoding and the font should match. If T1
 19 | % does not look nice, try deleting the line with the fontenc.
 20 | 
 21 | \usepackage{amsmath,amsfonts,amssymb}
 22 | 
 23 | \input{macros}
 24 | 
 25 | \title[The R Language]{Introduction to the R Language}
 26 | 
 27 | \subtitle{Plotting}
 28 | 
 29 | \date{Computing for Data Analysis}
 30 | 
 31 | 
 32 | 
 33 | \begin{document}
 34 | 
 35 | \begin{frame}
 36 |   \titlepage
 37 | \end{frame}
 38 | 
 39 | 
 40 | \begin{frame}{Plotting}
 41 | The plotting and graphics engine in R is encapsulated in a few base
 42 | and recommended packages:
 43 | \begin{itemize}
 44 | \item
 45 | \pkg{graphics}: contains plotting functions for the ``base'' graphing
 46 | system, including \code{plot}, \code{hist}, \code{boxplot} and many
 47 | others.
 48 | \item
 49 |   \pkg{lattice}: contains code for producing Trellis graphics, which are
 50 |   independent of the ``base'' graphics system; includes functions like
 51 |   \code{xyplot}, \code{bwplot}, \code{levelplot}
 52 | \item
 53 |   \pkg{grid}: implements a different graphing system independent of the
 54 |   ``base'' system; the \pkg{lattice} package builds on top of
 55 |   \pkg{grid}; we seldom call functions from the \pkg{grid} package
 56 | directly
 57 | \item
 58 | \pkg{grDevices}: contains all the code implementing the various
 59 | graphics devices, including X11, PDF, PostScript, PNG, etc.
 60 | \end{itemize}
 61 | \end{frame}
 62 | 
 63 | \begin{frame}{The Process of Making a Plot}
 64 |   When making a plot one must first make a few choices (not
 65 |   necessarily in this order):
 66 | \begin{itemize}
 67 | \item To what device will the plot be sent?  The default in Unix is
 68 |   \code{x11}; on Windows it is \code{windows}; on Mac OS X it is
 69 |   \code{quartz}
 70 | \item Is the plot for viewing temporarily on the screen, or will it
 71 |   eventually end up in a paper?  Are you using it in a presentation?
 72 |   Plots included in a paper/presentation need to use a file device
 73 |   rather than a screen device.
 74 | \item
 75 | Is there a large amount of data going into the plot?  Or is it just a
 76 | few points?  
 77 | \item
 78 | Do you need to be able to resize the graphic?  
 79 | \end{itemize}
 80 | \end{frame}
 81 | 
 82 | 
 83 | \begin{frame}{The Process of Making a Plot}
 84 | \begin{itemize}
 85 | \item
 86 | What graphics system will you use: base or grid/lattice?  These
 87 | generally cannot be mixed.
 88 | \item
 89 | Base graphics are usually constructed piecemeal, with each aspect of
 90 | the plot handled separately through a series of function calls; this
 91 | is often conceptually simpler and allows plotting to mirror the
 92 | thought process
 93 | \item
 94 | Lattice/grid graphics are usually created in a single function call,
 95 | so all of the graphics parameters have to be specified at once;
 96 | specifying everything at once allows R to automatically calculate the
 97 | necessary spacings and font sizes.
 98 | \end{itemize}
 99 | \end{frame}
100 | 
101 | 
102 | \begin{frame}{Base Graphics}
103 | Base graphics are used most commonly and are a very powerful system
104 | for creating 2-D graphics.  
105 | \begin{itemize}
106 | \item
107 | Calling \code{plot(x, y)} or \code{hist(x)} will launch a graphics
108 | device (if one is not already open) and draw the plot on the device
109 | \item
110 | If the arguments to \code{plot} are not of some special class, then
111 | the \textit{default method} for \code{plot} is called; this function
112 | has \textit{many} arguments, letting you set the title, x axis label,
113 | y axis label, etc.
114 | \item
115 | The base graphics system has \textit{many} parameters that can be set and
116 | tweaked; these parameters are documented in \code{?par}; it wouldn't
117 | hurt to memorize this help page!
118 | \end{itemize}
119 | \end{frame}
120 | 
121 | \begin{frame}{Some Important Base Graphics Parameters}
122 | The \code{par} function is used to specify global graphics parameters
123 | that affect all plots in an R session.  These parameters can often be
124 | overridden as arguments to specific plotting functions.
125 | \begin{itemize}
126 | \item
127 | pch:  the plotting symbol (default is open circle)
128 | \item
129 | lty: the line type (default is solid line), can be dashed, dotted,
130 | etc.
131 | \item
132 | lwd: the line width, specified as a positive number (default is 1)
133 | \item
134 | col: the plotting color, specified as a number, string, or hex code;
135 | the \code{colors} function gives you a vector of colors by name
136 | \item
137 | las: the orientation of the axis labels on the plot
138 | \end{itemize}
139 | \end{frame}
140 | 
141 | \begin{frame}{Some Important Base Graphics Parameters}
142 | \begin{itemize}
143 | \item
144 | bg: the background color
145 | \item
146 | mar:  the margin size
147 | \item
148 | oma:  the outer margin size (default is 0 for all sides)
149 | \item
150 | mfrow: number of rows and columns of plots (plots are filled row-wise)
151 | \item
152 | mfcol: number of rows and columns of plots (plots are filled column-wise)
153 | \end{itemize}
154 | \end{frame}
155 | 
156 | \begin{frame}[fragile]{Some Important Base Graphics Parameters}
157 | Some default values.
158 | \begin{verbatim}
159 | > par("lty")
160 | [1] "solid"
161 | > par("lwd")
162 | [1] 1
163 | > par("col")
164 | [1] "black"
165 | > par("pch")
166 | [1] 1
167 | \end{verbatim}
168 | \end{frame}
169 | 
170 | \begin{frame}[fragile]{Some Important Base Graphics Parameters}
171 | Some default values.
172 | \begin{verbatim}
173 | > par("bg")
174 | [1] "transparent"
175 | > par("mar")
176 | [1] 5.1 4.1 4.1 2.1
177 | > par("oma")
178 | [1] 0 0 0 0
179 | > par("mfrow")
180 | [1] 1 1
181 | > par("mfcol")
182 | [1] 1 1
183 | \end{verbatim}
184 | \end{frame}
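
\begin{frame}[fragile]{Some Important Base Graphics Parameters}
A short sketch of setting parameters with \code{par}, assuming a
graphics device can be opened; the particular values are arbitrary
illustrations.
\begin{verbatim}
## par() returns the old values, so they can be restored later
old <- par(mfrow = c(1, 2), mar = c(4, 4, 2, 1), las = 1)

x <- rnorm(50)
hist(x, col = "grey")            ## left panel
plot(x, pch = 19, col = "blue")  ## right panel; pch/col set per call

par(old)                         ## restore the previous settings
\end{verbatim}
\end{frame}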
185 | 
186 | \begin{frame}{Some Important Base Plotting Functions}
187 | \begin{itemize}
188 | \item
189 | \code{plot}: make a scatterplot, or other type of plot depending on
190 | the class of the object being plotted
191 | \item
192 | \code{lines}: add lines to a plot, given a vector of x values and a
193 | corresponding vector of y values (or a 2-column matrix); this function
194 | just connects the dots
195 | \item
196 | \code{points}: add points to a plot
197 | \item
198 | \code{text}: add text labels to a plot using specified x, y
199 | coordinates
200 | \item
201 | \code{title}: add annotations to x, y axis labels, title, subtitle,
202 | outer margin
203 | \item
204 | \code{mtext}: add arbitrary text to the margins (inner or outer) of
205 | the plot
206 | \item
207 | \code{axis}: add axis ticks/labels
208 | \end{itemize}
209 | \end{frame}
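
\begin{frame}[fragile]{Some Important Base Plotting Functions}
A sketch of building a plot piecemeal with these functions; the data
are simulated and the annotation choices are arbitrary.
\begin{verbatim}
x <- 1:10
y <- x + rnorm(10)
plot(x, y, type = "n", axes = FALSE, xlab = "", ylab = "")
points(x, y, pch = 19)               ## add the points
lines(x, x, lty = 2)                 ## reference line y = x
text(2, max(y), "simulated", pos = 4)
axis(1); axis(2, las = 1); box()     ## draw the axes by hand
title(main = "Piecemeal plot", xlab = "x", ylab = "y")
mtext("added with mtext", side = 3, line = 0.3, cex = 0.8)
\end{verbatim}
\end{frame}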
210 | 
211 | 
212 | \begin{frame}{Useful Graphics Devices}
213 | The list of devices is found in \code{?Devices}; there are also
214 | devices created by users on CRAN
215 | \begin{itemize}
216 | \item
217 | \code{pdf}: useful for line-type graphics, vector
218 | format, resizes well, usually portable
219 | \item
220 | \code{postscript}: older format, also vector format and resizes well,
221 | usually portable, can be used to create encapsulated postscript files,
222 | Windows systems often don't have a postscript viewer
223 | \item
224 | \code{xfig}: good if you use Unix and want to edit a plot by hand
225 | \end{itemize}
226 | \end{frame}
227 | 
228 | \begin{frame}{Useful Graphics Devices}
229 | \begin{itemize}
230 | \item
231 | \code{png}: bitmapped format, good for line drawings or images with
232 | solid colors, uses lossless compression (like the old GIF format),
233 | most web browsers can read this format natively, good for plotting
234 | many many many points, does not resize well
235 | \item
236 | \code{jpeg}: good for photographs or natural scenes, uses lossy
237 | compression, good for plotting many many many points, does not resize
238 | well, can be read by almost any computer and any web browser, not
239 | great for line drawings
240 | \item
241 | \code{bitmap}: needed to create bitmap files (png, jpeg) in certain
242 | situations (uses Ghostscript), also can be used to create a variety of
243 | other bitmapped formats not mentioned
244 | \item
245 | \code{bmp}: a native Windows bitmapped format
246 | \end{itemize}
247 | \end{frame}
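
\begin{frame}[fragile]{Useful Graphics Devices}
A minimal sketch of sending a plot to file devices, assuming this
build of R supports them; the file names and dimensions are
placeholders chosen for illustration.
\begin{verbatim}
x <- rnorm(100)

pdf("myplot.pdf", width = 6, height = 4)      ## size in inches
hist(x)
dev.off()                         ## closing the device finishes the file

png("myplot.png", width = 600, height = 400)  ## size in pixels
hist(x)
dev.off()
\end{verbatim}
\end{frame}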
248 | 
249 | 
250 | \begin{frame}{Copying Plots}
251 | There are two basic approaches to plotting.
252 | \begin{enumerate}
253 | \item
254 | Launch a graphics device
255 | \item
256 | Make a plot; annotate if needed
257 | \item
258 | Close graphics device
259 | \end{enumerate}
260 | Or
261 | \begin{enumerate}
262 | \item
263 | Make a plot on a screen device (default); annotate if needed
264 | \item
265 | Copy the plot to another device if necessary (not an exact process)
266 | \end{enumerate}
267 | \end{frame}
268 | 
269 | 
270 | \begin{frame}{Copying Plots}
271 | Copying a plot to another device can be useful because some plots
272 | require a lot of code and it can be a pain to type all that in again
273 | for a different device.
274 | \begin{itemize}
275 | \item \code{dev.copy}: copy a plot from one device to another
276 | \item \code{dev.copy2pdf}: copy a plot to a Portable Document Format
277 |   (PDF) file
278 | \item \code{dev.list}: show the list of open graphics devices
279 | \item \code{dev.next}: switch control to the next graphics device on
280 |   the device list
281 | \item \code{dev.set}: set control to a specific graphics device
282 | \item \code{dev.off}: close the current graphics device
283 | \end{itemize}
284 | NOTE: Copying a plot is not an exact operation!
285 | \end{frame}
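
\begin{frame}[fragile]{Copying Plots}
A sketch of the second approach, assuming an interactive session in
which the plot is drawn on a screen device; the file names are
placeholders.
\begin{verbatim}
plot(rnorm(100), main = "Draft")      ## drawn on the screen device

dev.copy(png, filename = "draft.png") ## copy to a new png device
dev.off()                             ## close the png device

dev.copy2pdf(file = "draft.pdf")      ## copies and closes in one step
dev.list()                            ## devices that are still open
\end{verbatim}
\end{frame}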
286 | 
287 | 
288 | \begin{frame}{Lattice Functions}
289 | \begin{itemize}
290 | \item
291 | \code{xyplot}: this is the main function for creating scatterplots
292 | \item
293 | \code{bwplot}: box-and-whiskers plots (``boxplots'')
294 | \item
295 | \code{histogram}: histograms
296 | \item
297 | \code{stripplot}: like a boxplot but with actual points
298 | \item
299 | \code{dotplot}: plot dots on ``violin strings''
300 | \item
301 | \code{splom}: scatterplot matrix; like \code{pairs} in base graphics
302 | system
303 | \item
304 | \code{levelplot}, \code{contourplot}: for plotting ``image'' data
305 | \end{itemize}
306 | \end{frame}
307 | 
308 | 
309 | \begin{frame}[fragile]{Lattice Functions}
310 | Lattice functions generally take a formula for their first argument,
311 | usually of the form
312 | \begin{verbatim}
313 | y ~ x | f * g
314 | \end{verbatim}
315 | \begin{itemize}
316 | \item
317 | On the left of the \verb+~+ is the y variable, on the right is the x
318 | variable
319 | \item
320 | After the \verb+|+ are \textit{conditioning variables} --- they are
321 | optional; the \verb+*+ indicates an interaction
322 | \item
323 | The second argument is the data frame or list from which the variables
324 | in the formula should be obtained.
325 | \item
326 | If no data frame or list is passed, then the parent frame is used.
327 | \item
328 | If no other arguments are passed, there are defaults that can be used.
329 | \end{itemize}
330 | \end{frame}
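
\begin{frame}[fragile]{Lattice Functions}
A sketch of the formula interface with two conditioning variables; the
data frame \code{d} and its columns are simulated for illustration.
\begin{verbatim}
library(lattice)
d <- data.frame(x = rnorm(200),
                f = gl(2, 100, labels = c("A", "B")),
                g = gl(2, 50, 200, labels = c("lo", "hi")))
d$y <- d$x + rnorm(200, sd = 0.5)

## One panel for each of the 2 x 2 combinations of f and g
xyplot(y ~ x | f * g, data = d)
\end{verbatim}
\end{frame}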
331 | 
332 | 
333 | \begin{frame}{Lattice Behavior}
334 | Lattice functions behave differently from base graphics functions in
335 | one critical way.
336 | \begin{itemize}
337 | \item
338 | Base graphics functions plot data directly to the graphics device
339 | \item
340 | Lattice graphics functions return an object of class \code{trellis}.
341 | \item
342 | The \code{print} method for \code{trellis} objects actually does the
343 | work of plotting the data on the graphics device.
344 | \item
345 | Lattice functions return ``plot objects'' that can, in principle, be
346 | stored (but it's usually better to just save the code + data).
347 | \item
348 | On the command line, \code{trellis} objects are \textit{auto-printed}
349 | so that it appears the function is plotting the data
350 | \end{itemize}
351 | \end{frame}
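
\begin{frame}[fragile]{Lattice Behavior}
A sketch of the consequence: inside a function or loop nothing is
auto-printed, so the object must be printed explicitly.  The function
name \code{makeplot} is hypothetical.
\begin{verbatim}
library(lattice)
p <- xyplot(dist ~ speed, data = cars)  ## nothing is drawn yet
class(p)                                ## "trellis"
print(p)                                ## now the plot is drawn

makeplot <- function() {
        p <- xyplot(dist ~ speed, data = cars)
        print(p)       ## needed: no auto-printing in here
        invisible(p)
}
makeplot()
\end{verbatim}
\end{frame}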
352 | 
353 | 
354 | %% \begin{frame}[fragile]{Calling Lattice Functions}
355 | %% \begin{verbatim}
356 | %% p <- xyplot(y ~ x | f, subscripts = TRUE,
357 | %%             ylim = lattice:::extend.limits(range(lo, hi)),
358 | %%             panel = function(x, y, subscripts, ...) {
359 | %%                 panel.xyplot(x, y, ...)
360 | %%                 lsegments(x, lo[subscripts], 
361 | %%                           x, hi[subscripts])
362 | %%                 panel.abline(h = 0, lty = 2)
363 | %%             },
364 | %%             xlab = NULL, 
365 | %%             scales = list(x = list(at = 1:nmodels, labels = models, 
366 | %%                                    rot = 90, alternating = FALSE), 
367 | %%                           y = list(alternating = 3)),
368 | %%             ylab = list(label = expression("% increase in
369 | %%             admissions for a 10 " * mu * g/m^3 * " inceas%% e in " * PM[2.5]), cex = 0.8)
370 | %%             )
371 | %% print(p)
372 | %% \end{verbatim}
373 | %% \end{frame}
374 | 
375 | 
376 | \begin{frame}[fragile]{Lattice Panel Functions}
377 | Lattice functions have a \code{panel} function which controls what
378 | happens inside each panel of the entire plot.
379 | \begin{verbatim}
380 | x <- rnorm(100)
381 | y <- x + rnorm(100, sd = 0.5)
382 | f <- gl(2, 50, labels = c("Group 1", "Group 2"))
383 | xyplot(y ~ x | f)
384 | \end{verbatim}
385 | plots y vs. x conditioned on f.
386 | \end{frame}
387 | 
388 | \begin{frame}[fragile]{Lattice Panel Functions}
389 | \begin{verbatim}
390 | xyplot(y ~ x | f,
391 |        panel = function(x, y, ...) {
392 |                panel.xyplot(x, y, ...)
393 |                panel.abline(h = median(y),
394 |                             lty = 2)
395 |        })
396 | \end{verbatim}
397 | plots y vs. x conditioned on f with horizontal (dashed) line drawn at
398 | the median of y for each panel.
399 | \end{frame}
400 | 
401 | 
402 | \begin{frame}[fragile]{Lattice Panel Functions}
403 | Adding a regression line
404 | \begin{verbatim}
405 | xyplot(y ~ x | f,
406 |        panel = function(x, y, ...) {
407 |                panel.xyplot(x, y, ...)
408 |                panel.lmline(x, y, col = 2)
409 |        })
410 | \end{verbatim}
411 | fits and plots a simple linear regression line to each panel of the
412 | plot.
413 | \end{frame}
414 | 
415 | \begin{frame}[fragile]{Using Subscripts}
416 | Sometimes you need to access objects outside the panel environment.
417 | \begin{verbatim}
418 | y <- c(rnorm(10), rnorm(10, 2))
419 | x <- rep(1:10, 2)
420 | std <- rep(1, 20)
421 | rng <- range(y - 1.96 * std, y + 1.96 * std)
422 | f <- gl(2, 10)
423 | 
424 | xyplot(y ~ x | f, subscripts = TRUE, ylim = rng,
425 |        panel = function(x, y, subscripts, ...) {
426 |                panel.xyplot(x, y, ...)
427 |                lsegments(x, y - 1.96 * std[subscripts],
428 |                          x, y + 1.96 * std[subscripts])
429 |                panel.abline(h = 0, lty = 2)
430 |        })
431 | \end{verbatim}
432 | \end{frame}
433 | 
434 | 
435 | \begin{frame}{Mathematical Annotation}
436 | R can produce \LaTeX-like symbols on a plot for mathematical
437 | annotation.  This is very handy and is useful for making fun of people
438 | who use other statistical packages.
439 | \begin{itemize}
440 | \item
441 | Math symbols are ``expressions'' in R and need to be wrapped in the
442 | \code{expression} function
443 | \item
444 | There is a fixed list of allowed symbols, documented in \code{?plotmath}
445 | \item
446 | Plotting functions that take arguments for text generally allow
447 | expressions for math symbols
448 | \end{itemize}
449 | \end{frame}
450 | 
451 | 
452 | \begin{frame}[fragile]{Mathematical Annotation}
453 | Some examples.
454 | \begin{verbatim}
455 | plot(0, 0, main = expression(theta == 0), 
456 |      ylab = expression(hat(gamma) == 0),
457 |      xlab = expression(sum(x[i] * y[i], i==1, n)))
458 | \end{verbatim}
459 | Pasting strings together.
460 | \begin{verbatim}
461 | x <- rnorm(100)
462 | hist(x, 
463 |      xlab=expression("The mean (" * bar(x) * ") is " * 
464 |                      sum(x[i]/n,i==1,n)))
465 | \end{verbatim}
466 | \end{frame}
467 | 
468 | \begin{frame}[fragile]{Substituting}
469 | What if you want to use a computed value in the annotation?
470 | \begin{verbatim}
471 | x <- rnorm(100)
472 | y <- x + rnorm(100, sd = 0.5)
473 | plot(x, y, 
474 |      xlab=substitute(bar(x) == k, list(k=mean(x))),
475 |      ylab=substitute(bar(y) == k, list(k=mean(y)))
476 |      )
477 | \end{verbatim}
478 | Or in a loop of plots
479 | \begin{verbatim}
480 | par(mfrow = c(2, 2))
481 | for(i in 1:4) {
482 |         x <- rnorm(100)
483 |         hist(x, main=substitute(theta==num,list(num=i)))
484 | }
485 | \end{verbatim}
486 | \end{frame}
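
\begin{frame}[fragile]{Substituting}
A sketch of two refinements, offered as one possible variation:
rounding the computed value so the label stays short, and using
\code{bquote}, which substitutes anything wrapped in \code{.()}.
\begin{verbatim}
x <- rnorm(100)
y <- x + rnorm(100, sd = 0.5)
plot(x, y,
     xlab = substitute(bar(x) == k,
                       list(k = round(mean(x), 2))),
     ylab = bquote(bar(y) == .(round(mean(y), 2))))
\end{verbatim}
\end{frame}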
487 | 
488 | 
489 | \begin{frame}{Summary of Important Help Pages}
490 | \begin{itemize}
491 | \item
492 | ?par
493 | \item
494 | ?plot
495 | \item
496 | ?xyplot
497 | \item
498 | ?plotmath
499 | \item
500 | ?axis
501 | \end{itemize}
502 | \end{frame}
503 | 
504 | 
505 | \end{document}
506 | 


--------------------------------------------------------------------------------
/reading-data.tex:
--------------------------------------------------------------------------------
  1 | \documentclass[aspectratio=169]{beamer}
  2 | 
  3 | \mode<presentation>
  4 | {
  5 |   \usetheme{Warsaw}
  6 |   % or ...
  7 | 
  8 |   \setbeamercovered{transparent}
  9 |   % or whatever (possibly just delete it)
 10 | }
 11 | 
 12 | 
 13 | \usepackage[english]{babel}
 14 | \usepackage[latin1]{inputenc}
 15 | \usepackage{graphicx}
 16 | %\usepackage{times}
 17 | %\usepackage[T1]{fontenc}
 18 | % Or whatever. Note that the encoding and the font should match. If T1
 19 | % does not look nice, try deleting the line with the fontenc.
 20 | 
 21 | \usepackage{amsmath,amsfonts,amssymb}
 22 | 
 23 | \input{macros}
 24 | 
 25 | \title[The R Language]{Introduction to the R Language}
 26 | 
 27 | \subtitle{Reading and Writing Data}
 28 | 
 29 | \date{Computing for Data Analysis}
 30 | 
 31 | \setbeamertemplate{footline}[page number]
 32 | 
 33 | 
 34 | \begin{document}
 35 | 
 36 | \begin{frame}
 37 |   \titlepage
 38 | \end{frame}
 39 | 
 40 | 
 41 | 
 42 | \begin{frame}{Reading Data}
 43 | There are a few principal functions for reading data into R.
 44 | \begin{itemize}
 45 | \item
 46 | \code{read.table}, \code{read.csv}, for reading tabular data
 47 | \item
 48 | \code{readLines}, for reading lines of a text file
 49 | \item
 50 | \code{source}, for reading in R code files (inverse of \code{dump})
 51 | \item
 52 | \code{dget}, for reading in a deparsed R object (inverse of \code{dput})
 53 | \item
 54 | \code{load}, for reading in saved workspaces
 55 | \item
 56 | \code{unserialize}, for reading single R objects in binary form
 57 | \end{itemize}
 58 | \end{frame}
 59 | 
 60 | 
 61 | \begin{frame}{Writing Data}
 62 | There are analogous functions for writing data to files
 63 | \begin{itemize}
 64 | \item
 65 | \code{write.table}
 66 | \item
 67 | \code{writeLines}
 68 | \item
 69 | \code{dump}
 70 | \item
 71 | \code{dput}
 72 | \item
 73 | \code{save}
 74 | \item
 75 | \code{serialize}
 76 | \end{itemize}
 77 | \end{frame}
 78 | 
 79 | 
 80 | 
 81 | \begin{frame}{Reading Data Files with read.table}
 82 | The \code{read.table} function is one of the most commonly used
 83 | functions for reading data.  It has a few important arguments:
 84 | \begin{itemize}
 85 | \item
 86 | \code{file}, the name of a file, or a connection
 87 | \item
 88 | \code{header}, logical indicating if the file has a header line
 89 | \item
 90 | \code{sep}, a string indicating how the columns are separated
 91 | \item
 92 | \code{colClasses}, a character vector indicating the class of each
 93 | column in the dataset
 94 | \item
 95 | \code{nrows}, the number of rows in the dataset
 96 | \item
 97 | \code{comment.char}, a character string indicating the comment
 98 | character
 99 | \item
100 | \code{skip}, the number of lines to skip from the beginning
101 | \item
102 | \code{stringsAsFactors}, should character variables be coded as
103 | factors?
104 | \end{itemize}
105 | \end{frame}
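
\begin{frame}[fragile]{Reading Data Files with read.table}
A sketch of a call that sets these arguments explicitly; the file is a
temporary one written only for illustration.
\begin{verbatim}
tmp <- tempfile(fileext = ".txt")
writeLines(c("# a comment line",
             "id,name,score",
             "1,ann,3.5",
             "2,bob,4.0"), tmp)
d <- read.table(tmp, header = TRUE, sep = ",",
                colClasses = c("integer", "character", "numeric"),
                nrows = 2, comment.char = "#", skip = 0,
                stringsAsFactors = FALSE)
str(d)
\end{verbatim}
\end{frame}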
106 | 
107 | 
108 | \begin{frame}[fragile]{read.table}
109 | For small to moderately sized datasets, you can usually call
110 | \code{read.table} without specifying any other arguments
111 | \begin{verbatim}
112 | data <- read.table("foo.txt")
113 | \end{verbatim}
114 | R will automatically
115 | \begin{itemize}
116 | \item
117 | skip lines that begin with a \#
118 | \item
119 | figure out how many rows there are (and how much memory needs to be
120 | allocated)
121 | \item
122 | figure out what type of variable is in each column of the table
123 | \end{itemize}
124 | Telling R all these things directly makes R run faster and more
125 | efficiently.
126 | \begin{itemize}
127 | \item
128 | \code{read.csv} is identical to \code{read.table} except for a few
129 | defaults, notably a comma separator and \code{header = TRUE}.
130 | \end{itemize}
131 | \end{frame}
132 | 
133 | 
134 | \begin{frame}{Reading in Larger Datasets with read.table}
135 | With much larger datasets, doing the following things will make your
136 | life easier and will prevent R from choking.
137 | \begin{itemize}
138 | \item
139 | Read the help page for \code{read.table}, which contains many hints
140 | \item
141 | Make a rough calculation of the memory required to store your dataset.
142 | If the dataset is larger than the amount of RAM on your computer, you
143 | can probably stop right here.
144 | \item
145 | Set \code{comment.char = ""} if there are no commented lines in your
146 | file.
147 | \end{itemize}
148 | \end{frame}
149 | 
150 | 
151 | \begin{frame}[fragile]{Reading in Larger Datasets with read.table}
152 | \begin{itemize}
153 | \item
154 | Use the \code{colClasses} argument.  Specifying this option instead of
155 | using the default can make \code{read.table} run MUCH faster, often twice
156 | as fast. In order to use this option, you have to know the class of
157 | each column in your data frame. If all of the columns are ``numeric'',
158 | for example, then you can just set \code{colClasses = "numeric"}.  A
159 | quick and dirty way to figure out the classes of each column is the
160 | following:
161 | \begin{verbatim}
162 | initial <- read.table("datatable.txt", nrows = 100)
163 | classes <- sapply(initial, class)
164 | tabAll <- read.table("datatable.txt", 
165 |                      colClasses = classes)
166 | \end{verbatim}
167 | \item
168 | Set \code{nrows}.  This doesn't make R run faster but it helps with
169 | memory usage.  A mild overestimate is okay.  You can use the Unix tool
170 | \code{wc} to calculate the number of lines in a file.
171 | \end{itemize}
172 | \end{frame}
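
\begin{frame}[fragile]{Reading in Larger Datasets with read.table}
Where \code{wc} is not available, the lines can be counted from within
R.  A sketch, using a temporary file as a stand-in for a real dataset:
\begin{verbatim}
tmp <- tempfile()
write.table(data.frame(a = 1:1000, b = rnorm(1000)), tmp)

## Count lines in chunks so the whole file is never in memory
con <- file(tmp, "r")
n <- 0
while (length(chunk <- readLines(con, n = 250)) > 0)
        n <- n + length(chunk)
close(con)
n   ## 1001: 1000 data rows plus the header line

d <- read.table(tmp, nrows = n)
\end{verbatim}
\end{frame}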
173 | 
174 | 
175 | \begin{frame}{Know Thy System}
176 | In general, when using R with larger datasets, it's useful to know a
177 | few things about your system.
178 | \begin{itemize}
179 | \item
180 | How much memory is available?
181 | \item
182 | What other applications are in use?
183 | \item
184 | Are there other users logged into the same system?
185 | \item
186 | What operating system?
187 | \item
188 | Is the OS 32 or 64 bit?
189 | \end{itemize}
190 | \end{frame}
191 | 
192 | \begin{frame}{Calculating Memory Requirements}
193 | I have a data frame with 1,500,000 rows and 120 columns, all of which
194 | are numeric data.  Roughly, how much memory is required to store this
195 | data frame?
196 | \begin{eqnarray*}
197 | 1{,}500{,}000\times 120 \times\mbox{$8$ bytes/numeric}
198 | & = &
199 | 1{,}440{,}000{,}000\mbox{ bytes}\\
200 | & = &
201 | 1{,}440{,}000{,}000 / 2^{20}\mbox{ bytes/MB}\\
202 | & = &
203 | 1{,}373.29\mbox{ MB}\\
204 | & = &
205 | 1.34\mbox{ GB}
206 | \end{eqnarray*}
207 | \end{frame}
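
\begin{frame}[fragile]{Calculating Memory Requirements}
The same arithmetic as a sketch in R, with a small-scale check using
\code{object.size}; the check includes some overhead for the data
frame's attributes, so it will not match exactly.
\begin{verbatim}
bytes <- 1500000 * 120 * 8
bytes / 2^20        ## about 1373.29 MB
bytes / 2^30        ## about 1.34 GB

## Check on a data frame with 1/1000 of the rows
small <- as.data.frame(matrix(rnorm(1500 * 120), 1500, 120))
object.size(small)  ## a bit more than 1500 * 120 * 8 bytes
\end{verbatim}
\end{frame}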
208 | 
209 | 
210 | \begin{frame}{Textual Formats}
211 | \begin{itemize}
212 | \item
213 | \code{dump}ing and \code{dput}ing are useful because the resulting
214 | textual format is editable, and in the case of corruption,
215 | potentially recoverable.
216 | \item
217 | Unlike writing out a table or csv file, \code{dump} and \code{dput}
218 | preserve the \textit{metadata} (sacrificing some readability), so that
219 | another user doesn't have to specify it all over again.
220 | \item
221 | Textual formats can work much better with version control programs
222 | like Subversion or Git, which can only track changes meaningfully in
223 | text files
224 | \item Textual formats can be longer-lived; if there is corruption
225 |   somewhere in the file, it can be easier to fix the problem
226 | \item
227 | Textual formats adhere to the ``Unix philosophy''
228 | \item Downside: The format is not very space-efficient
229 | \end{itemize}
230 | \end{frame}
231 | 
232 | \begin{frame}[fragile]{dput-ting R Objects}
233 | Another way to pass data around is by deparsing the R object with
234 | \code{dput} and reading it back in using \code{dget}.
235 | \begin{verbatim}
236 | > y <- data.frame(a = 1, b = "a")
237 | > dput(y)
238 | structure(list(a = 1, 
239 |                b = structure(1L, .Label = "a", 
240 |                              class = "factor")), 
241 |           .Names = c("a", "b"), row.names = c(NA, -1L), 
242 |           class = "data.frame")
243 | > dput(y, file = "y.R")
244 | > new.y <- dget("y.R")
245 | > new.y
246 |   a b
247 | 1 1 a
248 | \end{verbatim}
249 | \end{frame}
250 | 
251 | 
252 | \begin{frame}[fragile]{Dumping R Objects}
253 | Multiple objects can be deparsed using the \code{dump} function and
254 | read back in using \code{source}.
255 | \begin{verbatim}
256 | > x <- "foo"
257 | > y <- data.frame(a = 1, b = "a")
258 | > dump(c("x", "y"), file = "data.R")
259 | > rm(x, y)
260 | > source("data.R")
261 | > y
262 |   a b
263 | 1 1 a
264 | > x
265 | [1] "foo"
266 | \end{verbatim}
267 | \end{frame}
268 | 
269 | 
270 | 
271 | \begin{frame}{Interfaces to the Outside World}
272 | Data are read in using \textit{connection} interfaces.  Connections
273 | can be made to files (most common) or to other more exotic things.
274 | \begin{itemize}
275 | \item
276 | \code{file}, opens a connection to a file
277 | \item
278 | \code{gzfile}, opens a connection to a file compressed with gzip
279 | \item
280 | \code{bzfile}, opens a connection to a file compressed with bzip2
281 | \item
282 | \code{url}, opens a connection to a webpage
283 | \end{itemize}
284 | \end{frame}
285 | 
286 | 
287 | \begin{frame}[fragile]{File Connections}
288 | \begin{verbatim}
289 | > str(file)
290 | function (description = "", open = "", blocking = TRUE, 
291 |           encoding = getOption("encoding"))
292 | \end{verbatim}
293 | \begin{itemize}
294 | \item
295 | \code{description} is the name of the file
296 | \item
297 | \code{open} is a code indicating
298 | \begin{itemize}
299 | \item
300 | ``r'' read only
301 | \item
302 | ``w'' writing (and initializing a new file)
303 | \item
304 | ``a'' appending
305 | \item
306 | ``rb'', ``wb'', ``ab'' reading, writing, or appending in binary mode
307 | (Windows)
308 | \end{itemize}
309 | \end{itemize}
310 | \end{frame}
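
\begin{frame}[fragile]{File Connections}
A sketch of the ``w'', ``a'' and ``r'' modes, using a temporary file
as a stand-in for a real one:
\begin{verbatim}
tmp <- tempfile()

con <- file(tmp, "w")            ## create the file and write
writeLines("first line", con)
close(con)

con <- file(tmp, "a")            ## append to the existing file
writeLines("second line", con)
close(con)

con <- file(tmp, "r")            ## read only
readLines(con)
close(con)
\end{verbatim}
\end{frame}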
311 | 
312 | 
313 | \begin{frame}[fragile]{Connections}
314 | In general, connections are powerful tools that let you navigate files
315 | or other external objects.  In practice, we often don't need to deal
316 | with the connection interface directly.
317 | \begin{verbatim}
318 | con <- file("foo.txt", "r")
319 | data <- read.csv(con)
320 | close(con)
321 | \end{verbatim}
322 | is the same as
323 | \begin{verbatim}
324 | data <- read.csv("foo.txt")
325 | \end{verbatim}
326 | \end{frame}
327 | 
328 | 
329 | \begin{frame}[fragile]{Reading Lines of a Text File}
330 | The \code{readLines} function can be used to simply read lines of a
331 | text file and store them in a character vector.
332 | \begin{verbatim}
333 | > con <- gzfile("words.gz")
334 | > x <- readLines(con, 10)
335 | > x
336 |  [1] "1080"     "10-point" "10th"     "11-point"
337 |  [5] "12-point" "16-point" "18-point" "1st"
338 |  [9] "2"        "20-point"
339 | \end{verbatim}
340 | \code{writeLines} takes a character vector and writes each element one
341 | line at a time to a text file.
342 | \end{frame}
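
\begin{frame}[fragile]{Reading Lines of a Text File}
A sketch of a round trip through a gzip-compressed file; the words
and the temporary file are made up for illustration.
\begin{verbatim}
tmp <- tempfile(fileext = ".gz")

con <- gzfile(tmp, "w")
writeLines(c("alpha", "beta", "gamma"), con)
close(con)

con <- gzfile(tmp, "r")
x <- readLines(con, 2)     ## read only the first two lines
close(con)
x
\end{verbatim}
\end{frame}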
343 | 
344 | 
345 | \begin{frame}[fragile]{Reading Lines of a Text File}
346 | \code{readLines} can be useful for reading in lines of webpages
347 | \begin{verbatim}
348 | > con <- url("http://www.jhsph.edu", "r")  ## This might take time
349 | > x <- readLines(con)
350 | > head(x)
351 | [1] "<!DOCTYPE HTML PUBLIC \"-//W3C//DTD HTML 4.0 Transitional//EN\">"
352 | [2] ""
353 | [3] "<html>"
354 | [4] "<head>"
355 | [5] "\t<meta http-equiv=\"Content-Type\" content=\"text/html;charset=utf-8\" />"
356 | [6] "\t"
357 | \end{verbatim}
358 | \end{frame}
359 | 
360 | \begin{frame}[fragile]{Saving Data in Non-tabular Forms}
361 | For temporary storage or for transport, it is more efficient to save
362 | data in (compressed) binary form using \code{save} or
363 | \code{save.image}.
364 | \begin{verbatim}
365 | x <- 1
366 | y <- data.frame(a = 1, b = "a")
367 | save(x, y, file = "data.RData")
368 | load("data.RData")  ## overwrites existing x and y!
369 | \end{verbatim}
370 | Binary formats are not great for long-term storage because if they are
371 | corrupted, recovery is usually not possible.
372 | \end{frame}
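
\begin{frame}[fragile]{Saving Data in Non-tabular Forms}
Because \code{load} silently overwrites objects with the same names,
one option is to load into a separate environment.  A sketch, with a
temporary file and the hypothetical name \code{e}:
\begin{verbatim}
x <- 1
y <- data.frame(a = 1, b = "a")
f <- tempfile(fileext = ".RData")
save(x, y, file = f)

x <- 100                  ## x changes after saving
e <- new.env()
load(f, envir = e)        ## the global x is left alone
ls(e)                     ## "x" "y"
c(x, e$x)                 ## 100 1
\end{verbatim}
\end{frame}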
373 | 
374 | \begin{frame}{Serialization}
375 | Serialization is the process of taking an R object and converting it into
376 | a representation as a ``series'' of bytes.  
377 | \begin{itemize}
378 | \item
379 | The \code{save} and \code{save.image} functions serialize R objects
380 | and then save them to files
381 | \item
382 | The \code{serialize} function can be used to serialize an R object to
383 | an arbitrary connection (database, socket, pipe, etc.)
384 | \item
385 | \code{unserialize} reads from an arbitrary connection and inverts a
386 | serialization, returning an R object
387 | \end{itemize}
388 | \end{frame}
389 | 
390 | 
391 | \begin{frame}[fragile]{Serialization}
392 | \begin{verbatim}
393 | > x <- list(1, 2, 3)
394 | > serialize(x, NULL)
395 |  [1] 58 0a 00 00 00 02 00 02 06 01 00 02 03 00 00
396 | [16] 00 00 13 00 00 00 03 00 00 00 0e 00 00 00 01
397 | [31] 3f f0 00 00 00 00 00 00 00 00 00 0e 00 00 00
398 | [46] 01 40 00 00 00 00 00 00 00 00 00 00 0e 00 00
399 | [61] 00 01 40 08 00 00 00 00 00 00
400 | \end{verbatim}
401 | \end{frame}
402 | 
403 | \begin{frame}[fragile]{Serialization}
404 | \begin{verbatim}
405 | > con <- gzfile("foo.gz", "wb")
406 | > serialize(x, con)
407 | NULL
408 | > close(con)
409 | >
410 | > con <- gzfile("foo.gz", "rb")
411 | > y <- unserialize(con)
412 | > identical(x, y)
413 | [1] TRUE
414 | \end{verbatim}
415 | \end{frame}
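
\begin{frame}[fragile]{Serialization}
A sketch of the same round trip done entirely in memory: with
\code{NULL} as the connection, \code{serialize} returns a raw vector
that \code{unserialize} accepts directly.
\begin{verbatim}
x <- list(1, 2, 3)
bytes <- serialize(x, NULL)
class(bytes)                ## "raw"
y <- unserialize(bytes)
identical(x, y)             ## TRUE
\end{verbatim}
\end{frame}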
416 | 
417 | 
418 | \begin{frame}{Data Output Summary}
419 | \begin{itemize}
420 | \item
421 | \code{write.table}, \code{write.csv} --- readable output, textual,
422 | little metadata
423 | \item
424 | \code{save}, \code{save.image}, \code{serialize} --- exact
425 | representation, efficient storage if compressed, not recoverable if
426 | corrupted
427 | \item
428 | \code{dput}, \code{dump} --- textual format, somewhat readable,
429 | metadata retained, not usable for more exotic objects (environments)
430 | \end{itemize}
431 | \end{frame}
432 | 
433 | 
434 | \end{document}
435 | 


--------------------------------------------------------------------------------
/regex.tex:
--------------------------------------------------------------------------------
  1 | \documentclass[aspectratio=169]{beamer}
  2 | 
  3 | \mode<presentation>
  4 | {
  5 |   \usetheme{Warsaw}
  6 |   % or ...
  7 | 
  8 |   \setbeamercovered{transparent}
  9 |   % or whatever (possibly just delete it)
 10 | }
 11 | 
 12 | \setbeamertemplate{footline}[page number]
 13 | 
 14 | \usepackage[english]{babel}
 15 | \usepackage[latin1]{inputenc}
 16 | \usepackage{graphicx}
 17 | %\usepackage{times}
 18 | %\usepackage[T1]{fontenc}
 19 | % Or whatever. Note that the encoding and the font should match. If T1
 20 | % does not look nice, try deleting the line with the fontenc.
 21 | 
 22 | \usepackage{amsmath,amsfonts,amssymb}
 23 | 
 24 | \input{macros}
 25 | 
 26 | \title{Regular Expressions}
 27 | 
 28 | \date{Computing for Data Analysis}
 29 | 
 30 | \begin{document}
 31 | 
 32 | \begin{frame}
 33 |   \titlepage
 34 | \end{frame}
 35 | 
 36 | \begin{frame}{Regular expressions}
 37 | \begin{itemize}
 38 | \item
 39 | Regular expressions can be thought of as a combination of literals and
 40 | \textit{metacharacters}
 41 | \item
 42 | To draw an analogy with natural language, think of literal text
 43 | forming the words of this language, and the metacharacters defining
 44 | its grammar
 45 | \item
 46 | Regular expressions have a rich set of metacharacters
 47 | \end{itemize}
 48 | 
 49 | \end{frame}
 50 | 
 51 | \begin{frame}[fragile]{Literals}
 52 | The simplest pattern consists only of literals.  The literal ``nuclear''
 53 | would match the following lines:
 54 | \begin{verbatim}
 55 | Ooh. I just learned that to keep myself alive after a 
 56 | nuclear blast! All I have to do is milk some rats 
 57 | then drink the milk. Aweosme. :}
 58 | 
 59 | Laozi says nuclear weapons are mas macho
 60 | 
 61 | Chaos in a country that has nuclear weapons -- not good.
 62 | 
 63 | my nephew is trying to teach me nuclear physics, or 
 64 | possibly just trying to show me how smart he is 
 65 | so I'll be proud of him [which I am].
 66 | 
 67 | lol if you ever say "nuclear" people immediately think 
 68 | DEATH by radiation LOL
 69 | \end{verbatim}
 70 | \end{frame}
 71 | 
 72 | \begin{frame}[fragile]{Literals}
 73 | The literal ``Obama'' would match the following lines:
 74 | \begin{verbatim}
 75 | Politics r dum. Not 2 long ago Clinton was sayin Obama 
 76 | was crap n now she sez vote 4 him n unite? WTF? 
 77 | Screw em both + Mcain. Go Ron Paul!
 78 | 
 79 | Clinton conceeds to Obama but will her followers listen??  
 80 | 
 81 | Are we sure Chelsea didn't vote for Obama?
 82 | 
 83 | thinking ... Michelle Obama is terrific!
 84 | 
 85 | jetlag..no sleep...early mornig to starbux..Ms. Obama 
 86 | was moving
 87 | \end{verbatim}
 88 | \end{frame}
 89 | 
 90 | \begin{frame}{Regular Expressions}
 91 | \begin{itemize}
 92 | \item
 93 | The simplest pattern consists only of literals; a match occurs if the
 94 | sequence of literals occurs anywhere in the text being tested
 95 | \item
 96 | What if we only want the word ``Obama'', or sentences that end in
 97 | the word ``Clinton'', ``clinton'', or ``clinto''?
 98 | \end{itemize}
 99 | \end{frame}
100 | 
101 | \begin{frame}{Regular Expressions}
102 | We need a way to express
103 | \begin{itemize}
104 | \item
105 | whitespace word boundaries
106 | \item
107 | sets of literals
108 | \item
109 | the beginning and end of a line
110 | \item
111 | alternatives (``war'' or ``peace'')
112 | \end{itemize}
113 | Metacharacters to the rescue!
114 | \end{frame}
115 | 
116 | \begin{frame}[fragile]{Metacharacters}
117 | Some metacharacters represent positions: \verb+^+ is the start of a line
118 | \begin{verbatim}
119 | ^i think 
120 | \end{verbatim}
121 | will match the lines 
122 | \begin{verbatim}
123 | i think we all rule for participating
124 | i think i have been outed
125 | i think this will be quite fun actually
126 | i think i need to go to work
127 | i think i first saw zombo in 1999.
128 | \end{verbatim}
129 | \end{frame}
130 | 
131 | \begin{frame}[fragile]{Metacharacters}
132 | \$ represents the end of a line
133 | \begin{verbatim}
134 | morning$ 
135 | \end{verbatim}
136 | will match the lines 
137 | \begin{verbatim}
138 | well they had something this morning
139 | then had to catch a tram home in the morning
140 | dog obedience school in the morning
141 | and yes happy birthday i forgot to say it earlier this morning 
142 | I walked in the rain this morning
143 | good morning
144 | \end{verbatim}
145 | \end{frame}
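    | 
    | \begin{frame}[fragile]{Metacharacters}
    | A minimal sketch of trying both anchors in R with \code{grepl}; the
    | vector \code{x} is made up for illustration
    | \begin{verbatim}
    | > x <- c("i think i need to go to work",
    | +        "good morning",
    | +        "morning, i think")
    | > grepl("^i think", x)   ## must start the line
    | [1]  TRUE FALSE FALSE
    | > grepl("morning$", x)   ## must end the line
    | [1] FALSE  TRUE FALSE
    | \end{verbatim}
    | \end{frame}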
146 | 
147 | \begin{frame}[fragile]{Character Classes with []}
148 | We can list a set of characters we will accept at a given point in the
149 | match
150 | \begin{verbatim}
151 | [Bb][Uu][Ss][Hh]
152 | \end{verbatim}
153 | will match the lines 
154 | \begin{verbatim}
155 | The democrats are playing, "Name the worst thing about Bush!"
156 | I smelled the desert creosote bush, brownies, BBQ chicken
157 | BBQ and bushwalking at Molonglo Gorge
158 | Bush TOLD you that North Korea is part of the Axis of Evil 
159 | I'm listening to Bush - Hurricane (Album Version)
160 | \end{verbatim}
161 | \end{frame}
162 | 
163 | \begin{frame}[fragile]{Character Classes with []}
164 | \begin{verbatim}
165 | ^[Ii] am
166 | \end{verbatim}
167 | will match
168 | \begin{verbatim}
169 | i am so angry at my boyfriend i can't even bear to 
170 | look at him
171 | 
172 | i am boycotting the apple store
173 | 
174 | I am twittering from iPhone
175 | 
176 | I am a very vengeful person when you ruin my sweetheart.
177 | 
178 | I am so over this. I need food. Mmmm bacon...
179 | \end{verbatim}
180 | \end{frame}
181 | 
182 | \begin{frame}[fragile]{Character Classes with []}
183 | Similarly, you can specify a range of characters, such as \verb+[a-z]+ or
184 | \verb+[a-zA-Z]+; the order of the ranges inside the brackets doesn't matter
185 | \begin{verbatim}
186 | ^[0-9][a-zA-Z]
187 | \end{verbatim}
188 | will match the lines 
189 | \begin{verbatim}
190 | 7th inning stretch
191 | 2nd half soon to begin. OSU did just win something
192 | 3am - cant sleep - too hot still.. :(
193 | 5ft 7 sent from heaven 
194 | 1st sign of starvagtion
195 | \end{verbatim}
196 | \end{frame}
197 | 
198 | \begin{frame}[fragile]{Character Classes with []}
199 | When used at the beginning of a character class, the ``\verb+^+'' is also a 
200 | metacharacter and indicates matching characters NOT in the 
201 | indicated class
202 | \begin{verbatim}
203 | [^?.]$
204 | \end{verbatim}
205 | will match the lines 
206 | \begin{verbatim}
207 | i like basketballs 
208 | 6 and 9
209 | dont worry... we all die anyway! 
210 | Not in Baghdad
211 | helicopter under water? hmmm
212 | \end{verbatim}
213 | \end{frame}
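    | 
    | \begin{frame}[fragile]{Character Classes with []}
    | The same classes can be tried in R; a small sketch with a made-up
    | vector \code{x}
    | \begin{verbatim}
    | > x <- c("7th inning stretch",
    | +        "Not in Baghdad",
    | +        "helicopter under water?")
    | > grepl("^[0-9][a-zA-Z]", x)  ## digit, then a letter
    | [1]  TRUE FALSE FALSE
    | > grepl("[^?.]$", x)          ## last char not ? or .
    | [1]  TRUE  TRUE FALSE
    | \end{verbatim}
    | Inside \verb+[]+ the ``?'' and ``.'' are literals, so they need no escaping
    | \end{frame}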
214 | 
215 | \begin{frame}[fragile]{More Metacharacters}
216 | ``.'' is used to refer to any single character. So
217 | \begin{verbatim}
218 | 9.11
219 | \end{verbatim}
220 | will match the lines 
221 | \begin{verbatim}
222 | its stupid the post 9-11 rules 
223 | if any 1 of us did 9/11 we would have been caught in days.
224 | NetBios: scanning ip 203.169.114.66
225 | Front Door 9:11:46 AM
226 | Sings: 0118999881999119725...3 !
227 | \end{verbatim}
228 | \end{frame}
229 | 
230 | \begin{frame}[fragile]{More Metacharacters: $|$}
231 | This does not mean ``pipe'' in the context of regular expressions;
232 | instead it translates to ``or''; we can use it to combine two
233 | expressions, the subexpressions being called alternatives
234 | \begin{verbatim}
235 | flood|fire
236 | \end{verbatim}
237 | will match the lines 
238 | \begin{verbatim}
239 | is firewire like usb on none macs? 
240 | the global flood makes sense within the context of the bible 
241 | yeah ive had the fire on tonight 
242 | ... and the floods, hurricanes, killer heatwaves, rednecks, gun nuts, etc. 
243 | \end{verbatim}
244 | \end{frame}
245 | 
246 | \begin{frame}[fragile]{More Metacharacters: $|$}
247 | We can include any number of alternatives...
248 | \begin{verbatim}
249 | flood|earthquake|hurricane|coldfire
250 | \end{verbatim}
251 | will match the lines 
252 | \begin{verbatim}
253 | Not a whole lot of hurricanes in the Arctic.
254 | We do have earthquakes nearly every day somewhere in our State 
255 | hurricanes swirl in the other direction 
256 | coldfire is STRAIGHT! 
257 | 'cause we keep getting earthquakes
258 | \end{verbatim}
259 | \end{frame}
260 | 
261 | \begin{frame}[fragile]{More Metacharacters: $|$}
262 | The alternatives can be real expressions and not just literals
263 | \begin{verbatim}
264 | ^[Gg]ood|[Bb]ad
265 | \end{verbatim}
266 | will match the lines 
267 | \begin{verbatim}
268 | good to hear some good knews from someone here 
269 | Good afternoon fellow american infidels! 
270 | good on you-what do you drive? 
271 | Katie... guess they had bad experiences... 
272 | my middle name is trouble, Miss Bad News
273 | \end{verbatim}
274 | \end{frame}
275 | 
276 | \begin{frame}[fragile]{More Metacharacters: ( and )}
277 | Subexpressions are often contained in parentheses to constrain the 
278 | alternatives
279 | \begin{verbatim}
280 | ^([Gg]ood|[Bb]ad)
281 | \end{verbatim}
282 | will match the lines 
283 | \begin{verbatim}
284 | bad habbit 
285 | bad coordination today 
286 | good, becuase there is nothing worse than a man in kinky underwear
287 | Badcop, its because people want to use drugs 
288 | Good Monday Holiday 
289 | Good riddance to Limey
290 | \end{verbatim}
291 | \end{frame}
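    | 
    | \begin{frame}[fragile]{More Metacharacters: ( and )}
    | A sketch of the difference the parentheses make, using \code{grepl}
    | on a made-up vector \code{x}
    | \begin{verbatim}
    | > x <- c("good on you",
    | +        "they had bad experiences",
    | +        "bad habbit")
    | > grepl("^[Gg]ood|[Bb]ad", x)    ## "bad" anywhere
    | [1] TRUE TRUE TRUE
    | > grepl("^([Gg]ood|[Bb]ad)", x)  ## both at line start
    | [1]  TRUE FALSE  TRUE
    | \end{verbatim}
    | Without the parentheses, \verb+^+ applies only to the first alternative
    | \end{frame}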
292 | 
293 | \begin{frame}[fragile]{More Metacharacters: ?}
294 | The question mark indicates that the preceding expression is optional
295 | \begin{verbatim}
296 | [Gg]eorge( [Ww]\.)? [Bb]ush
297 | \end{verbatim}
298 | will match the lines 
299 | \begin{verbatim}
300 | i bet i can spell better than you and george bush combined
301 | BBC reported that President George W. Bush claimed God told him to invade Iraq 
302 | a bird in the hand is worth two george bushes 
303 | \end{verbatim}
304 | \end{frame}
305 | 
306 | \begin{frame}[fragile]{One thing to note...}
307 | In the following
308 | \begin{verbatim}
309 | [Gg]eorge( [Ww]\.)? [Bb]ush
310 | \end{verbatim}
311 | we wanted to match a ``.'' as a literal period; to do that, we had to
312 | ``escape'' the metacharacter, preceding it with a backslash.  In
313 | general, we have to do this for any metacharacter we want to include
314 | in our match
315 | \end{frame}
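    | 
    | \begin{frame}[fragile]{One thing to note...}
    | In R the pattern is typed as a character string, where the backslash
    | is itself an escape character, so the regular expression \verb+\.+ is
    | written \verb+"\\."+.  A small sketch (the vector \code{x} is made up)
    | \begin{verbatim}
    | > x <- c("George W. Bush", "George Wx Bush")
    | > grepl("George W\\. Bush", x)   ## literal period
    | [1]  TRUE FALSE
    | > grepl("George W. Bush", x)     ## "." is any character
    | [1] TRUE TRUE
    | > grepl("George W. Bush", x, fixed = TRUE)
    | [1]  TRUE FALSE
    | \end{verbatim}
    | With \code{fixed = TRUE} the whole pattern is treated as literal text
    | \end{frame}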
316 | 
317 | \begin{frame}[fragile]{More metacharacters: * and +}
318 | The * and + signs are metacharacters used to indicate repetition; * 
319 | means ``any number, including none, of the item'' and + means ``at 
320 | least one of the item''
321 | \begin{verbatim}
322 | \(.*\)
323 | \end{verbatim}
324 | will match the lines 
325 | \begin{verbatim}
326 | anyone wanna chat? (24, m, germany)
327 | hello, 20.m here... ( east area + drives + webcam ) 
328 | (he means older men) 
329 | ()
330 | \end{verbatim}
331 | \end{frame}
332 | 
333 | \begin{frame}[fragile]{More metacharacters: * and +}
334 | The * and + signs are metacharacters used to indicate repetition; * 
335 | means ``any number, including none, of the item'' and + means ``at 
336 | least one of the item''
337 | \begin{verbatim}
338 | [0-9]+ (.*)[0-9]+
339 | \end{verbatim}
340 | will match the lines 
341 | \begin{verbatim}
342 | working as MP here 720 MP battallion, 42nd birgade 
343 | so say 2 or 3 years at colleage and 4 at uni makes us 23 when and if we finish
344 | it went down on several occasions for like, 3 or 4 *days*
345 | Mmmm its time 4 me 2 go 2 bed
346 | \end{verbatim}
347 | \end{frame}
348 | 
349 | \begin{frame}[fragile]{More metacharacters: \{ and \}}
350 | \{ and \} are referred to as interval quantifiers; they let us specify
351 | the minimum and maximum number of matches of an expression
352 | \begin{verbatim}
353 | [Bb]ush( +[^ ]+){1,5} debate
354 | \end{verbatim}
355 | will match the lines 
356 | \begin{verbatim}
357 | Bush has historically won all major debates he's done. 
358 | in my view, Bush doesn't need these debates..
359 | bush doesn't need the debates? maybe you are right 
360 | That's what Bush supporters are doing about the debate. 
361 | Felix, I don't disagree that Bush was poorly prepared for the debate.
362 | indeed, but still, Bush should have taken the debate more seriously.
363 | Keep repeating that Bush smirked and scowled during the debate 
364 | \end{verbatim}
365 | \end{frame}
366 | 
367 | \begin{frame}[fragile]{More metacharacters: \{ and \}}
368 | \begin{itemize}
369 | \item
370 | \{m,n\} means at least m but not more than n matches
371 | \item
372 | \{m\} means exactly m matches
373 | \item
374 | \{m,\} means at least m matches
375 | \end{itemize}
376 | \end{frame}
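    | 
    | \begin{frame}[fragile]{More metacharacters: \{ and \}}
    | A minimal sketch of the three forms in R; the strings in \code{x} are
    | made up, and the anchors force the whole string to be matched
    | \begin{verbatim}
    | > x <- c("a", "aa", "aaaa")
    | > grepl("^a{2,3}$", x)   ## two or three
    | [1] FALSE  TRUE FALSE
    | > grepl("^a{2}$", x)     ## exactly two
    | [1] FALSE  TRUE FALSE
    | > grepl("^a{2,}$", x)    ## two or more
    | [1] FALSE  TRUE  TRUE
    | \end{verbatim}
    | \end{frame}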
377 | 
378 | \begin{frame}[fragile]{More metacharacters: ( and ) revisited}
379 | \begin{itemize}
380 | \item
381 | In most implementations of regular expressions, the parentheses not
382 | only limit the scope of alternatives divided by a ``$|$'', but also
383 | can be used to ``remember'' text matched by the subexpression enclosed
384 | \item
385 | We refer to the matched text with \verb+\1+, \verb+\2+, etc.
386 | \end{itemize}
387 | \end{frame}
388 | 
389 | \begin{frame}[fragile]{More metacharacters: ( and ) revisited}
390 | So the expression 
391 | \begin{verbatim}
392 |  +([a-zA-Z]+) +\1 +
393 | \end{verbatim}
394 | will match the lines 
395 | \begin{verbatim}
396 | time for bed, night night twitter!
397 | 
398 | blah blah blah blah 
399 | 
400 | my tattoo is so so itchy today
401 | 
402 | i was standing all all alone against the world outside...
403 | 
404 | hi anybody anybody at home
405 | 
406 | estudiando css css css css.... que desastritooooo
407 | \end{verbatim}
408 | \end{frame}
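    | 
    | \begin{frame}[fragile]{More metacharacters: ( and ) revisited}
    | A sketch of the same back-reference in R, where \verb+\1+ is typed as
    | \verb+"\\1"+ inside a string; the vector \code{x} is made up, and
    | \code{sub} is used here only to show that the remembered text can also
    | appear in a replacement
    | \begin{verbatim}
    | > x <- c("time for bed, night night twitter!",
    | +        "my tattoo is so so itchy",
    | +        "i was all alone")
    | > grepl(" +([a-zA-Z]+) +\\1 +", x)
    | [1]  TRUE  TRUE FALSE
    | > sub(" +([a-zA-Z]+) +\\1 +", " \\1 ", x[2])
    | [1] "my tattoo is so itchy"
    | \end{verbatim}
    | \end{frame}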
409 | 
410 | \begin{frame}[fragile]{More metacharacters: ( and ) revisited}
411 | The \verb+*+ is ``greedy'' so it always matches the \textit{longest}
412 | possible string that satisfies the regular expression. So
413 | \begin{verbatim}
414 | ^s(.*)s
415 | \end{verbatim}
416 | matches
417 | \begin{verbatim}
418 | sitting at starbucks
419 | 
420 | setting up mysql and rails
421 | 
422 | studying stuff for the exams
423 | 
424 | spaghetti with marshmallows
425 | 
426 | stop fighting with crackers
427 | 
428 | sore shoulders, stupid ergonomics
429 | \end{verbatim}
430 | \end{frame}
431 | 
432 | \begin{frame}[fragile]{More metacharacters: ( and ) revisited}
433 | The greediness of \verb+*+ can be turned off with the \verb+?+, as in
434 | \begin{verbatim}
435 | ^s(.*?)s
436 | \end{verbatim}
437 | \end{frame}
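    | 
    | \begin{frame}[fragile]{More metacharacters: ( and ) revisited}
    | A sketch comparing the two in R by extracting the matched text with
    | \code{regexpr} and \code{regmatches}; \code{perl = TRUE} is a cautious
    | choice for the non-greedy form, not a requirement stated in this lecture
    | \begin{verbatim}
    | > x <- "sitting at starbucks"
    | > regmatches(x, regexpr("^s(.*)s", x))
    | [1] "sitting at starbucks"
    | > regmatches(x, regexpr("^s(.*?)s", x, perl = TRUE))
    | [1] "sitting at s"
    | \end{verbatim}
    | The greedy version runs to the last ``s''; the non-greedy one stops at
    | the first
    | \end{frame}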
438 | 
439 | 
440 | \begin{frame}{Summary}
441 | \begin{itemize}
442 | \item Regular expressions are used in many different languages; they
443 |   are not unique to R.
444 | \item Regular expressions are composed of literals and metacharacters
445 |   that represent sets or classes of characters/words
446 | \item Text processing via regular expressions is a very powerful way
447 |   to extract data from ``unfriendly'' sources (not all data comes as a
448 |   CSV file)
449 | \end{itemize}
450 | (Thanks to Mark Hansen for some material in this lecture.)
451 | \end{frame}
452 | 
453 | 
454 | 
455 | \end{document}
456 | 


--------------------------------------------------------------------------------
/sigmalike.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/rdpeng/CourseraLectures/f58ebd7bc02e275d0428b8c2bb0ff9a2668e5990/sigmalike.pdf


--------------------------------------------------------------------------------
/simpoisson.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/rdpeng/CourseraLectures/f58ebd7bc02e275d0428b8c2bb0ff9a2668e5990/simpoisson.pdf


--------------------------------------------------------------------------------
/simulation.tex:
--------------------------------------------------------------------------------
  1 | \documentclass[aspectratio=169]{beamer}
  2 | 
  3 | \mode<presentation>
  4 | {
  5 |   \usetheme{Warsaw}
  6 |   % or ...
  7 | 
  8 |   \setbeamercovered{transparent}
  9 |   % or whatever (possibly just delete it)
 10 | }
 11 | 
 12 | 
 13 | \usepackage[english]{babel}
 14 | \usepackage[latin1]{inputenc}
 15 | \usepackage{graphicx}
 16 | %\usepackage{times}
 17 | %\usepackage[T1]{fontenc}
 18 | % Or whatever. Note that the encoding and the font should match. If T1
 19 | % does not look nice, try deleting the line with the fontenc.
 20 | 
 21 | \usepackage{amsmath,amsfonts,amssymb}
 22 | 
 23 | \input{macros}
 24 | 
 25 | \title[Simulation]{Simulation}
 26 | 
 27 | 
 28 | \date{Computing for Data Analysis}
 29 | 
 30 | 
 31 | 
 32 | \begin{document}
 33 | 
 34 | \begin{frame}
 35 |   \titlepage
 36 | \end{frame}
 37 | 
 38 | \begin{frame}{Generating Random Numbers}
 39 |   Functions for probability distributions in R
 40 | \begin{itemize}
 41 |   \item \code{rnorm}: generate random Normal variates with a
 42 |     given mean and standard deviation
 43 |   \item \code{dnorm}: evaluate the Normal probability density (with a
 44 |     given mean/SD) at a point (or vector of points)
 45 |   \item \code{pnorm}: evaluate the cumulative distribution function
 46 |     for a Normal distribution
 47 |   \item \code{rpois}: generate random Poisson variates with a given
 48 |     rate
 49 | \end{itemize}
 50 | \end{frame}
 51 | 
 52 | 
 53 | \begin{frame}{Generating Random Numbers}
 54 |   Probability distributions usually have four functions
 55 |   associated with them. The functions are prefixed with a
 56 | \begin{itemize}
 57 | \item \code{d} for density
 58 | \item \code{r} for random number generation
 59 | \item \code{p} for cumulative distribution
 60 | \item \code{q} for quantile function
 61 | \end{itemize}
 62 | \end{frame}
 63 | 
 64 | 
 65 | \begin{frame}[fragile]{Generating Random Numbers}
 66 | Working with the Normal distribution requires using these four
 67 | functions
 68 | \begin{verbatim}
 69 | dnorm(x, mean = 0, sd = 1, log = FALSE)
 70 | pnorm(q, mean = 0, sd = 1, lower.tail = TRUE, log.p = FALSE)
 71 | qnorm(p, mean = 0, sd = 1, lower.tail = TRUE, log.p = FALSE)
 72 | rnorm(n, mean = 0, sd = 1)
 73 | \end{verbatim}
 74 | If $\Phi$ is the cumulative distribution function for a standard
 75 | Normal distribution, then $\text{\code{pnorm}}(q) = \Phi(q)$ and
 76 | $\text{\code{qnorm}}(p) = \Phi^{-1}(p)$.
 77 | \end{frame}
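    | 
    | \begin{frame}[fragile]{Generating Random Numbers}
    | A short sketch checking that \code{qnorm} inverts \code{pnorm} for the
    | standard Normal; the evaluation points are chosen only for illustration
    | \begin{verbatim}
    | > pnorm(0)               ## Phi(0)
    | [1] 0.5
    | > qnorm(0.5)             ## inverse of Phi at 0.5
    | [1] 0
    | > qnorm(0.975)
    | [1] 1.959964
    | > pnorm(qnorm(0.975))    ## round trip
    | [1] 0.975
    | > dnorm(0)               ## density at 0
    | [1] 0.3989423
    | \end{verbatim}
    | \end{frame}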
 78 | 
 79 | \begin{frame}[fragile]{Generating Random Numbers}
 80 | Generating random Normal variates
 81 | \begin{verbatim}
 82 | > x <- rnorm(10)
 83 | > x
 84 |  [1] 1.38380206 0.48772671 0.53403109 0.66721944
 85 |  [5] 0.01585029 0.37945986 1.31096736 0.55330472
 86 |  [9] 1.22090852 0.45236742
 87 | > x <- rnorm(10, 20, 2)
 88 | > x
 89 |  [1] 23.38812 20.16846 21.87999 20.73813 19.59020
 90 |  [6] 18.73439 18.31721 22.51748 20.36966 21.04371
 91 | > summary(x)
 92 |    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 93 |   18.32   19.73   20.55   20.67   21.67   23.39 
 94 | \end{verbatim}
 95 | \end{frame}
 96 | 
 97 | \begin{frame}[fragile]{Generating Random Numbers}
 98 | Setting the random number seed with \code{set.seed} ensures
 99 | reproducibility
100 | \begin{verbatim}
101 | > set.seed(1)
102 | > rnorm(5)
103 | [1] -0.6264538  0.1836433 -0.8356286  1.5952808
104 | [5]  0.3295078
105 | > rnorm(5)
106 | [1] -0.8204684  0.4874291  0.7383247  0.5757814
107 | [5] -0.3053884
108 | > set.seed(1)
109 | > rnorm(5)
110 | [1] -0.6264538  0.1836433 -0.8356286  1.5952808
111 | [5]  0.3295078
112 | \end{verbatim}
113 | Always set the random number seed when conducting a simulation!
114 | \end{frame}
115 | 
116 | \begin{frame}[fragile]{Generating Random Numbers}
117 | Generating Poisson data
118 | \begin{verbatim}
119 | > rpois(10, 1)
120 |  [1] 3 1 0 1 0 0 1 0 1 1
121 | > rpois(10, 2)
122 |  [1] 6 2 2 1 3 2 2 1 1 2
123 | > rpois(10, 20)
124 |  [1] 20 11 21 20 20 21 17 15 24 20
125 | 
126 | > ppois(2, 2)  ## Cumulative distribution
127 | [1] 0.6766764  ## Pr(x <= 2)
128 | > ppois(4, 2)
129 | [1] 0.947347   ## Pr(x <= 4)
130 | > ppois(6, 2)
131 | [1] 0.9954662  ## Pr(x <= 6)
132 | \end{verbatim}
133 | \end{frame}
134 | 
135 | \begin{frame}[fragile]{Generating Random Numbers From a Linear Model}
136 | Suppose we want to simulate from the following linear model
137 | \[
138 | y = \beta_0 + \beta_1 x + \varepsilon
139 | \]
140 | where $\varepsilon\sim\mathcal{N}(0, 2^2)$. Assume
141 | $x\sim\mathcal{N}(0,1^2)$, $\beta_0 = 0.5$ and $\beta_1 = 2$.
142 | \begin{verbatim}
143 | > set.seed(20)
144 | > x <- rnorm(100)
145 | > e <- rnorm(100, 0, 2)
146 | > y <- 0.5 + 2 * x + e
147 | > summary(y)
148 |    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
149 | -6.4080 -1.5400  0.6789  0.6893  2.9300  6.5050 
150 | > plot(x, y)
151 | \end{verbatim}
152 | \end{frame}
153 | 
154 | \begin{frame}[fragile]{Generating Random Numbers From a Linear Model}
155 | \includegraphics[height=3.2in]{linearmodelsim}
156 | \end{frame}
157 | 
158 | \begin{frame}[fragile]{Generating Random Numbers From a Linear Model}
159 | What if \code{x} is binary?
160 | \begin{verbatim}
161 | > set.seed(10)
162 | > x <- rbinom(100, 1, 0.5)
163 | > e <- rnorm(100, 0, 2)
164 | > y <- 0.5 + 2 * x + e
165 | > summary(y)
166 |    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
167 | -3.4940 -0.1409  1.5770  1.4320  2.8400  6.9410 
168 | > plot(x, y)
169 | \end{verbatim}
170 | \end{frame}
171 | 
172 | \begin{frame}[fragile]{Generating Random Numbers From a Linear Model}
173 | \includegraphics[height=3.2in]{binarylinearmodelsim}
174 | \end{frame}
175 | 
176 | \begin{frame}[fragile]{Generating Random Numbers From a Generalized
177 |     Linear Model}
178 | Suppose we want to simulate from a Poisson model where
179 | \begin{eqnarray*}
180 | Y & \sim & \text{Poisson}(\mu)\\
181 | \log\mu & = & \beta_0 + \beta_1 x
182 | \end{eqnarray*}
183 | and $\beta_0 = 0.5$ and $\beta_1 = 0.3$.  We need to use the
184 | \code{rpois} function for this
185 | \begin{verbatim}
186 | > set.seed(1)
187 | > x <- rnorm(100)
188 | > log.mu <- 0.5 + 0.3 * x
189 | > y <- rpois(100, exp(log.mu))
190 | > summary(y)
191 |    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
192 |    0.00    1.00    1.00    1.55    2.00    6.00 
193 | > plot(x, y)
194 | \end{verbatim}
195 | \end{frame}
196 | 
197 | \begin{frame}[fragile]{Generating Random Numbers From a Generalized Linear Model}
198 | \includegraphics[height=3.2in]{simpoisson}
199 | \end{frame}
200 | 
201 | \begin{frame}[fragile]{Random Sampling}
202 |   The \code{sample} function draws randomly from a specified set of
203 |   (scalar) objects, allowing you to sample from arbitrary
204 |   distributions.
205 | \begin{verbatim}
206 | > set.seed(1)
207 | > sample(1:10, 4)
208 | [1] 3 4 5 7
209 | > sample(1:10, 4)
210 | [1] 3 9 8 5
211 | > sample(letters, 5)
212 | [1] "q" "b" "e" "x" "p"
213 | > sample(1:10)  ## permutation
214 |  [1]  4  7 10  6  9  2  8  3  1  5
215 | > sample(1:10)
216 |  [1]  2  3  4  1  9  5 10  8  6  7
217 | > sample(1:10, replace = TRUE)  ## Sample w/replacement
218 |  [1] 2 9 7 8 2 8 5 9 7 8
219 | \end{verbatim}
220 | \end{frame}
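    | 
    | \begin{frame}[fragile]{Random Sampling}
    | A sketch of sampling from an arbitrary discrete distribution with the
    | \code{prob} argument of \code{sample}; the outcomes and probabilities
    | are made up, and the output is omitted because it depends on the seed
    | and the R version
    | \begin{verbatim}
    | > set.seed(1)
    | > outcomes <- c("low", "medium", "high")
    | > p <- c(0.6, 0.3, 0.1)
    | > s <- sample(outcomes, 1000, replace = TRUE, prob = p)
    | > table(s) / length(s)  ## sample proportions, near p
    | \end{verbatim}
    | \code{table} sorts the outcomes by name, so the columns appear as high,
    | low, medium
    | \end{frame}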
221 | 
222 | \begin{frame}{Simulation}
223 | Summary
224 | \begin{itemize}
225 | \item Drawing samples from specific probability distributions can be
226 |   done with \code{r}* functions
227 | \item Standard distributions are built in: Normal, Poisson, Binomial,
228 |   Exponential, Gamma, etc.
229 | \item The \code{sample} function can be used to draw random samples
230 |   from arbitrary vectors
231 | \item Setting the random number generator seed via \code{set.seed} is
232 |   critical for reproducibility
233 | \end{itemize}
234 | \end{frame}
235 | 
236 | \end{document}
237 | 
238 | 
239 | 


--------------------------------------------------------------------------------
/str.pptx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/rdpeng/CourseraLectures/f58ebd7bc02e275d0428b8c2bb0ff9a2668e5990/str.pptx


--------------------------------------------------------------------------------
/vectorized.tex:
--------------------------------------------------------------------------------
 1 | \documentclass[aspectratio=169]{beamer}
 2 | 
 3 | \mode<presentation>
 4 | {
 5 |   \usetheme{Warsaw}
 6 |   % or ...
 7 | 
 8 |   \setbeamercovered{transparent}
 9 |   % or whatever (possibly just delete it)
10 | }
11 | 
12 | 
13 | \usepackage[english]{babel}
14 | \usepackage[latin1]{inputenc}
15 | \usepackage{graphicx}
16 | %\usepackage{times}
17 | %\usepackage[T1]{fontenc}
18 | % Or whatever. Note that the encoding and the font should match. If T1
19 | % does not look nice, try deleting the line with the fontenc.
20 | 
21 | \usepackage{amsmath,amsfonts,amssymb}
22 | 
23 | \input{macros}
24 | 
25 | \title[The R Language]{Introduction to the R Language}
26 | 
27 | \subtitle{Vectorized Operations}
28 | 
29 | \date{Computing for Data Analysis}
30 | 
31 | \setbeamertemplate{footline}[page number]
32 | 
33 | 
34 | \begin{document}
35 | 
36 | \begin{frame}
37 |   \titlepage
38 | \end{frame}
39 | 
40 | 
41 | \begin{frame}[fragile]{Vectorized Operations}
42 | Many operations in R are \textit{vectorized}, making code more
43 | efficient, concise, and easier to read.
44 | \begin{verbatim}
45 | > x <- 1:4; y <- 6:9
46 | > x + y
47 | [1]  7  9 11 13
48 | > x > 2
49 | [1] FALSE FALSE  TRUE  TRUE
50 | > x >= 2
51 | [1] FALSE  TRUE  TRUE  TRUE
52 | > y == 8
53 | [1] FALSE FALSE  TRUE FALSE
54 | > x * y
55 | [1]  6 14 24 36
56 | > x / y
57 | [1] 0.1666667 0.2857143 0.3750000 0.4444444
58 | \end{verbatim}
59 | \end{frame}
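   | 
   | \begin{frame}[fragile]{Vectorized Operations}
   | For comparison, a sketch of the explicit loop that \verb-x + y-
   | replaces; the result vector \code{z} is introduced only for this
   | illustration
   | \begin{verbatim}
   | > x <- 1:4; y <- 6:9
   | > z <- numeric(length(x))
   | > for(i in seq_along(x)) z[i] <- x[i] + y[i]
   | > z
   | [1]  7  9 11 13
   | > identical(z, as.numeric(x + y))
   | [1] TRUE
   | \end{verbatim}
   | \end{frame}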
60 | 
61 | \begin{frame}[fragile]{Vectorized Matrix Operations}
62 | \begin{verbatim}
63 | > x <- matrix(1:4, 2, 2); y <- matrix(rep(10, 4), 2, 2)
64 | > x * y       ## element-wise multiplication
65 |      [,1] [,2]
66 | [1,]   10   30
67 | [2,]   20   40
68 | > x / y
69 |      [,1] [,2]
70 | [1,]  0.1  0.3
71 | [2,]  0.2  0.4
72 | > x %*% y     ## true matrix multiplication
73 |      [,1] [,2]
74 | [1,]   40   40
75 | [2,]   60   60
76 | \end{verbatim}
77 | \end{frame}
78 | 
79 | 
80 | \end{document}
81 | 


--------------------------------------------------------------------------------