├── .gitignore
├── 01-introduction
│   ├── 01-introduction.Rnw
│   ├── 01-introduction.pdf
│   ├── 01-introduction.tex
│   └── images
│       ├── blacksmith.png
│       ├── jaume_nit.jpg
│       ├── spider_web.jpg
│       └── wikipedia_table.png
├── 02-reading-files
│   ├── 02-reading-files.Rnw
│   ├── 02-reading-files.pdf
│   ├── 02-reading-files.tex
│   └── images
│       ├── R_data_import_manual.png
│       ├── alpha_data.png
│       ├── blacksmith.png
│       ├── cars2004_data.png
│       ├── dsets_data.png
│       ├── iris_data.png
│       ├── mineria_webpage.png
│       ├── moby_dick.jpg
│       ├── moby_dick.png
│       ├── moby_dick_535.png
│       ├── moby_dick_book.png
│       ├── moby_dick_gutenberg.png
│       ├── rbook_files.png
│       ├── read_functions.pdf
│       ├── skulls_data.png
│       ├── spider_web.jpg
│       ├── taxon_data.png
│       ├── travelbooks_data.png
│       ├── web2r.pdf
│       └── wikipedia_table.png
├── 03-xml-basics
│   ├── 03-xml-basics.Rnw
│   ├── 03-xml-basics.pdf
│   ├── 03-xml-basics.tex
│   └── images
│       ├── blacksmith.png
│       ├── goodwillhunting.jpg
│       ├── parsing_xml.jpg
│       ├── spider_web.jpg
│       ├── xml_movie_tree1.pdf
│       ├── xml_movie_tree2.pdf
│       ├── xml_plants_catalog.png
│       └── xmlcover.png
├── 04-parsing-xml
│   ├── 04-parsing-xml.Rnw
│   ├── 04-parsing-xml.pdf
│   ├── 04-parsing-xml.tex
│   └── images
│       ├── mailing_lists.png
│       ├── mailing_sig.png
│       ├── mailing_sig_source.png
│       ├── xml_package.png
│       ├── xml_tree_navigate.pdf
│       ├── xmlparsing_cover.png
│       ├── xpath_director.pdf
│       ├── xpath_firstname.pdf
│       ├── xpath_lastname.pdf
│       ├── xpath_movie.pdf
│       ├── xpath_title.pdf
│       ├── xpath_tree.pdf
│       └── xpath_ytmt.pdf
├── 05-json-data
│   ├── 05-json-data.Rnw
│   ├── 05-json-data.pdf
│   ├── 05-json-data.tex
│   └── images
│       ├── json_cover.png
│       └── miserables_js.png
├── 06-http-basics-rcurl
│   ├── 06-http-basics-rcurl.Rnw
│   ├── 06-http-basics-rcurl.pdf
│   ├── 06-http-basics-rcurl.tex
│   └── images
│       ├── Rproject.png
│       ├── client_request_server_response.pdf
│       ├── clients_servers.pdf
│       ├── curl_request.png
│       ├── hazards_form.png
│       ├── http_browser.png
│       ├── http_cover.png
│       ├── http_request_response.pdf
│       ├── rcurl_website.png
│       ├── web_request.pdf
│       └── web_surfing.pdf
├── 07-web-forms
│   ├── 07-web-forms.Rnw
│   ├── 07-web-forms.pdf
│   ├── 07-web-forms.tex
│   └── images
│       ├── backpack_form.pdf
│       ├── html_form_coffee.pdf
│       ├── html_form_controls.pdf
│       ├── html_form_get_post.pdf
│       ├── html_form_gmail.pdf
│       ├── html_form_google.pdf
│       ├── html_form_twitter_facebook.pdf
│       ├── html_form_works.pdf
│       ├── national_parks_site.png
│       ├── product_select.pdf
│       └── webforms_cover.jpg
├── 08-web-apis
│   ├── 08-web-apis.Rnw
│   ├── 08-web-apis.pdf
│   ├── 08-web-apis.tex
│   └── images
│       ├── bitly_api.png
│       ├── eutilities_webpage.png
│       ├── google_apis.png
│       ├── ncbi_databases.png
│       ├── ncbi_webpage.png
│       ├── pubmed_search.png
│       ├── pubmed_webpage.png
│       ├── redwood.jpg
│       └── us_census_api.png
├── Makefile
├── README.md
├── data
│   ├── alpha.csv
│   ├── alpha.xls
│   ├── cars2004.csv
│   ├── esearch_retmax103.xml
│   ├── esearch_retmax20.xml
│   ├── esummary.xml
│   ├── harry_potter.html
│   ├── harry_potter.txt
│   ├── iris_data.txt
│   ├── mailing_lists.html
│   ├── miserables.txt
│   ├── moby_dick.txt
│   ├── npca.html
│   ├── plant_catalog.xml
│   ├── rproject_homepage.html
│   ├── skulls_data.html
│   ├── star_wars_json.R
│   ├── swimming1500.html
│   ├── taxon.txt
│   ├── travelbooks.R
│   ├── usingR.RData
│   └── xml_movies_cover.r
└── header.tex
/.gitignore:
--------------------------------------------------------------------------------
1 | # Mac specific
2 | .DS_Store
3 |
4 | # R stuff
5 | .Rhistory
6 |
7 | # latex specific
8 | 0*/.DS_Store
9 |
--------------------------------------------------------------------------------
/01-introduction/01-introduction.Rnw:
--------------------------------------------------------------------------------
1 | \documentclass{beamer}
2 |
3 | % load packages
4 | \usepackage{tikz}
5 | \usepackage{graphicx}
6 | \usepackage{upquote}
7 | \usepackage{listings}
8 | \usepackage{hyperref}
9 | \usepackage{color}
10 | \usepackage{lmodern}
11 |
12 | \input{../header.tex}
13 |
14 | \title[Getting data from the web with R]{\LARGE Getting Data from the Web with R}
15 | \subtitle[Web Data in R]{\large Part 1: Introduction}
16 | \author[gastonsanchez.com]{
17 | \textcolor{gray}{\textbf{G}aston \textbf{S}anchez}
18 | }
19 | \institute[]{\scriptsize \textcolor{lightgray}{April-May 2014}}
20 | \date[CC BY-NC-SA 4.0]{
21 | \textcolor{lightgray}{\tiny{Content licensed under
22 | \href{http://creativecommons.org/licenses/by-nc-sa/4.0/}{CC BY-NC-SA 4.0}}}
23 | }
24 |
25 |
26 | \begin{document}
27 |
28 | <<echo=FALSE>>=
29 | # smaller font size for chunks
30 | opts_chunk$set(size = 'tiny')
31 | thm <- knit_theme$get("bclear")
32 | knit_theme$set(thm)
33 | options(width=78)
34 | @
35 |
36 |
37 | %--- the titlepage frame -------------------------%
38 |
39 | \begin{frame}[plain]
40 | \titlepage
41 | \end{frame}
42 |
43 | %------------------------------------------------
44 |
45 | { % all template changes are local to this group.
46 | \setbeamertemplate{navigation symbols}{}
47 | \begin{frame}[plain]
48 | \begin{tikzpicture}[remember picture,overlay]
49 | \node[at=(current page.center)] {
50 | \includegraphics[height=\paperheight]{images/jaume_nit.jpg}
51 | };
52 | \node[fill=black, opacity=0, text opacity=1] at (5.5,-3.8) {\large{ \color{white} Getting Data from the Web with R}};
53 | \end{tikzpicture}
54 | \end{frame}
55 | }
56 |
57 | %------------------------------------------------
58 |
59 | \begin{frame}[fragile]
60 | \frametitle{Readme}
61 |
62 | \begin{block}{\scriptsize License:}
63 | \tiny
64 | \begin{itemize}
65 | \item[] Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License \\
66 | \url{http://creativecommons.org/licenses/by-nc-sa/4.0/}{}
67 | \end{itemize}
68 | \end{block}
69 |
70 | \begin{block}{\scriptsize You are free to:}
71 | \tiny
72 | \begin{itemize}
73 | \item[] \textcolor{darkgray}{\textbf{Share}} --- \textcolor{gray}{copy and redistribute the material}
74 | \item[] \textcolor{darkgray}{\textbf{Adapt}} --- \textcolor{gray}{rebuild and transform the material}
75 | \end{itemize}
76 | \end{block}
77 |
78 | \vspace{2mm}
79 | \begin{block}{\scriptsize Under the following conditions:}
80 | \tiny
81 | \begin{itemize}
82 | \item[] \textcolor{darkgray}{\textbf{Attribution}} --- \textcolor{gray}{You must give appropriate credit, provide a link to the license, and indicate if changes were made.}
83 | \item[] \textcolor{darkgray}{\textbf{NonCommercial}} --- \textcolor{gray}{You may not use this work for commercial purposes.}
84 | \item[] \textcolor{darkgray}{\textbf{Share Alike}} --- \textcolor{gray}{If you remix, transform, or build upon this
85 | work, you must distribute your contributions under the same license as the original.}
86 | \end{itemize}
87 | \end{block}
88 |
89 | \end{frame}
90 |
91 | %------------------------------------------------
92 |
93 | \begin{frame}
94 | \frametitle{Lectures Menu}
95 |
96 | \begin{columns}[t]
97 | \begin{column}{0.1\textwidth}
98 | %--- empty space ---%
99 | \end{column}
100 | \begin{column}{0.8\textwidth}
101 | \begin{block}{Slide Decks}
102 | \begin{enumerate}
103 | \item \textbf{Introduction}
104 | \item \textcolor{lightgray}{Reading files from the Web}
105 | \item \textcolor{lightgray}{Basics of XML and HTML}
106 | \item \textcolor{lightgray}{Parsing XML / HTML content}
107 | \item \textcolor{lightgray}{Handling JSON data}
108 | \item \textcolor{lightgray}{HTTP Basics and the RCurl Package}
109 | \item \textcolor{lightgray}{Getting data via Web Forms}
110 | \item \textcolor{lightgray}{Getting data via Web APIs}
111 | \end{enumerate}
112 | \end{block}
113 | \end{column}
114 | \begin{column}{0.1\textwidth}
115 | %--- empty space ---%
116 | \end{column}
117 | \end{columns}
118 |
119 | \end{frame}
120 |
121 | %------------------------------------------------
122 |
123 | \begin{frame}
124 | \frametitle{About these lectures}
125 |
126 | \begin{columns}[t]
127 | \begin{column}{0.1\textwidth}
128 | %--- empty space ---%
129 | \end{column}
130 | \begin{column}{0.8\textwidth}
131 | \begin{block}{Goal}
132 | My goal is \textbf{to give you an introduction} to some of the tools in R for getting data from the Web.
133 |
134 | \bigskip
135 | I don't intend to cover everything or to go very deep. I just want to give you an overview of the various Web Data scenarios you can handle with R.
136 | \end{block}
137 | \end{column}
138 | \begin{column}{0.1\textwidth}
139 | %--- empty space ---%
140 | \end{column}
141 | \end{columns}
142 |
143 | \end{frame}
144 |
145 | %------------------------------------------------
146 |
147 | \begin{frame}
148 | \begin{center}
149 | \Huge{\textcolor{mandarina}{Preliminaries}}
150 | \end{center}
151 | \end{frame}
152 |
153 | %------------------------------------------------
154 |
155 | \begin{frame}
156 | \frametitle{Requirements}
157 |
158 | \begin{block}{Must have:}
159 | \begin{itemize}
160 | \item Some experience working with R
161 | \item Some knowledge of HTML
162 | \item An insatiable curiosity for learning new things
163 | \end{itemize}
164 | \end{block}
165 |
166 | \begin{block}{Nice to have:}
167 | \begin{itemize}
168 | \item Knowledge about data storage formats
169 | \item Some programming experience
170 | \item Knowledge of how the Web works
171 | \end{itemize}
172 | \end{block}
173 |
174 | \end{frame}
175 |
176 | %------------------------------------------------
177 |
178 | \begin{frame}
179 | \frametitle{Software}
180 |
181 | \begin{block}{You'll need:}
182 | \begin{itemize}
183 | \item R \textcolor{lightgray}{(preferably the latest version)} \\ \url{http://cran.r-project.org/}
184 | \item RStudio \textcolor{lightgray}{(highly recommended)} \\ \url{https://www.rstudio.com/}
185 | \item Text Editor \\ \textcolor{lightgray}{(e.g. vim, emacs, TextWrangler, notepad, sublime text)}
186 | \item Web Browser \\ \textcolor{lightgray}{(e.g. Chrome, Safari, Firefox, Internet Explorer, Opera)}
187 | \item and a good internet connection!
188 | \end{itemize}
189 | \end{block}
190 |
191 | \end{frame}
192 |
193 | %------------------------------------------------
194 |
195 | \begin{frame}
196 | \frametitle{In my case ...}
197 |
198 | \begin{block}{Software I used for these slides:}
199 | \begin{itemize}
200 | \item R version 3.1.0 (2014-04-10) -- "Spring Dance"
201 | \item Platform: x86\_64-apple-darwin10.8.0 (64-bit)
202 | \item IDE: RStudio Version 0.98.501
203 | \item Text Editor: TextWrangler
204 | \item Web Browser: Google Chrome Version 34.0.1847.131
205 | \item Operating System: OS X Version 10.8.5
206 | \end{itemize}
207 | \end{block}
208 |
209 | \end{frame}
210 |
211 | %------------------------------------------------
212 |
213 | \begin{frame}
214 | \frametitle{Resources}
215 |
216 | \begin{block}{Some R Books}
217 | \begin{itemize}
218 | \item XML and Web Technologies for Data Sciences with R \\
219 | \low{by Deb Nolan and Duncan Temple Lang}
220 | \item Introduction to Data Technologies \\
221 | \low{by Paul Murrell}
222 | \item Data Manipulation with R \\
223 | \low{by Phil Spector}
224 | \item more references in each slide deck
225 | \end{itemize}
226 | \end{block}
227 |
228 | \end{frame}
229 |
230 | %------------------------------------------------
231 |
232 | \begin{frame}
233 | \frametitle{Resources}
234 |
235 | \begin{block}{Web Scraping with R}
236 | \begin{itemize}
237 | \item Web scraping for the humanities and social sciences \\
238 | \low{(by Rolf Fredheim and Aiora Zabala)} \\
239 | {\tiny \url{http://quantifyingmemory.blogspot.co.uk/2014/02/web-scraping-basics.html}}
240 | \item Web Scraping with R \low{(by Xiao Nan)} \\
241 | {\tiny \url{http://cos.name/wp-content/uploads/2013/05/Web-Scraping-with-R-XiaoNan.pdf}}
242 | \item R-bloggers posts on \textit{Web Scraping} \\
243 | {\tiny \url{http://www.r-bloggers.com/?s=web+scraping}}
244 | \end{itemize}
245 | \end{block}
246 |
247 | \end{frame}
248 |
249 | %------------------------------------------------
250 |
251 | \begin{frame}
252 | \frametitle{Some R Packages}
253 |
254 | \begin{center}
255 | \begin{tabular}{l l}
256 | \hline
257 | Package & Description \\
258 | \hline
259 | \highcode{RCurl} & R interface to the \code{libcurl} library \\
260 | & for making general HTTP requests \\
261 | \highcode{RHTMLForms} & Tools to process Web/HTML forms \\
262 | \highcode{XML} & Tools for parsing XML and HTML documents \\
263 | & and working with structured data from the Web \\
264 | \highcode{RJSONIO} & Functions for handling JSON data \\
265 | \highcode{jsonlite} & Functions for handling JSON data \\
266 | \highcode{rjson} & Functions for handling JSON data \\
267 | \highcode{ROAuth} & Interface for authentication via OAuth 1.0 \\
268 | \highcode{SSOAP} & Use SOAP protocol to retrieve data \\
269 | \hline
270 | \end{tabular}
271 | \end{center}
272 |
273 | CRAN Task View: \textit{Web Technologies and Services} \\
274 | {\scriptsize \url{http://cran.r-project.org/web/views/WebTechnologies.html}}
275 |
276 | \end{frame}
277 |
278 | %------------------------------------------------
279 |
280 | { % all template changes are local to this group.
281 | \setbeamertemplate{navigation symbols}{}
282 | \begin{frame}[plain]
283 | \begin{tikzpicture}[remember picture,overlay]
284 | \node[at=(current page.center)] {
285 | \includegraphics[width=\paperwidth]{images/spider_web.jpg}
286 | };
287 | \node[fill=black, opacity=0, text opacity=1] at (7.5,-2.8) {\Huge{ \color{white} The Web}};
288 | \end{tikzpicture}
289 | \end{frame}
290 | }
291 |
292 | %------------------------------------------------
293 |
294 | \begin{frame}
295 | \frametitle{VIP Questions}
296 |
297 | \begin{columns}[t]
298 | \begin{column}{0.1\textwidth}
299 | %--- empty space ---%
300 | \end{column}
301 | \begin{column}{0.8\textwidth}
302 | \begin{block}{Very Important Preliminary Questions}
303 | The Data that you want:
304 | \begin{enumerate}
305 | \item Where is it located?
306 | \item How accessible is it?
307 | \item What is its structure / format?
308 | \end{enumerate}
309 | \end{block}
310 | \end{column}
311 | \begin{column}{0.1\textwidth}
312 | %--- empty space ---%
313 | \end{column}
314 | \end{columns}
315 |
316 | \end{frame}
317 |
318 | %------------------------------------------------
319 |
320 | \begin{frame}
321 | \frametitle{VIP Questions}
322 |
323 | \begin{block}{Location of Data}
324 | \begin{itemize}
325 | \item Do you know the location (URL) beforehand?
326 | \item[] Or do you have to figure it out? \\
327 | \item Is it in one single specific place? \\
328 | \low{(e.g. one HTML table, one file on the Web)}
329 | \item Is it in one website but spread across several pages? \\
330 | \low{(e.g. several HTML tables on different pages)}
331 | \item Is it spread across several websites? \\
332 | \low{(e.g. multiple pieces of information on various sites)}
333 | \item Is it in one or several databases?
334 | \end{itemize}
335 | \end{block}
336 |
337 | \end{frame}
338 |
339 | %------------------------------------------------
340 |
341 | \begin{frame}
342 | \frametitle{VIP Questions}
343 |
344 | \begin{block}{Accessibility of Data}
345 | \begin{itemize}
346 | \item Do you have free, direct, and immediate access to the data?
347 | \item Do you need to fill out a Web Form?
348 | \item Do you need to use a Web API?
349 | \item Do you need a username, password, or other authentication?
350 | \item Do you need to use a specific transfer protocol?
351 | \item Do you need to use a specific type/method of request?
352 | \end{itemize}
353 | \end{block}
354 |
355 | \end{frame}
356 |
357 |
358 | %------------------------------------------------
359 |
360 | \begin{frame}
361 | \frametitle{VIP Questions}
362 |
363 | \begin{block}{Format / Structure of Data}
364 | \begin{itemize}
365 | \item Is it plain text? \\
366 | \item Is it in tabular \low{(spreadsheet-like)} form? \\
367 | \item Is it in HTML? \\
368 | \item Is it in some XML-dialect?
369 | \item Is it in JSON format?
370 | \item Is it in another format (binary, images, maps, etc.)?
371 | \end{itemize}
372 | \end{block}
373 |
374 | \end{frame}
375 |
376 | %------------------------------------------------
377 |
378 | \begin{frame}
379 | \frametitle{Glossary}
380 |
381 | \begin{block}{Some Acronyms}
382 | \begin{itemize}
383 | \item \textbf{WWW} World Wide Web
384 | \item \textbf{W3C} World Wide Web Consortium
385 | \item \textbf{URL} Uniform Resource Locator
386 | \item \textbf{HTTP} HyperText Transfer Protocol
387 | \item \textbf{XML} Extensible Markup Language
388 | \item \textbf{HTML} HyperText Markup Language
389 | \item \textbf{JSON} JavaScript Object Notation
390 | \end{itemize}
391 | \end{block}
392 |
393 | \end{frame}
394 |
395 | %------------------------------------------------
396 |
397 | \end{document}
--------------------------------------------------------------------------------
/01-introduction/01-introduction.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gastonstat/tutorial-R-web-data/2cdb84b2dc4512d7a33f99afd3ceb8e8a4e63bd0/01-introduction/01-introduction.pdf
--------------------------------------------------------------------------------
/01-introduction/01-introduction.tex:
--------------------------------------------------------------------------------
1 | \documentclass{beamer}\usepackage[]{graphicx}\usepackage[]{color}
2 | %% maxwidth is the original width if it is less than linewidth
3 | %% otherwise use linewidth (to make sure the graphics do not exceed the margin)
4 | \makeatletter
5 | \def\maxwidth{ %
6 | \ifdim\Gin@nat@width>\linewidth
7 | \linewidth
8 | \else
9 | \Gin@nat@width
10 | \fi
11 | }
12 | \makeatother
13 |
14 | \definecolor{fgcolor}{rgb}{0.196, 0.196, 0.196}
15 | \newcommand{\hlnum}[1]{\textcolor[rgb]{0.063,0.58,0.627}{#1}}%
16 | \newcommand{\hlstr}[1]{\textcolor[rgb]{0.063,0.58,0.627}{#1}}%
17 | \newcommand{\hlcom}[1]{\textcolor[rgb]{0.588,0.588,0.588}{#1}}%
18 | \newcommand{\hlopt}[1]{\textcolor[rgb]{0.196,0.196,0.196}{#1}}%
19 | \newcommand{\hlstd}[1]{\textcolor[rgb]{0.196,0.196,0.196}{#1}}%
20 | \newcommand{\hlkwa}[1]{\textcolor[rgb]{0.231,0.416,0.784}{#1}}%
21 | \newcommand{\hlkwb}[1]{\textcolor[rgb]{0.627,0,0.314}{#1}}%
22 | \newcommand{\hlkwc}[1]{\textcolor[rgb]{0,0.631,0.314}{#1}}%
23 | \newcommand{\hlkwd}[1]{\textcolor[rgb]{0.78,0.227,0.412}{#1}}%
24 | \let\hlipl\hlkwb
25 |
26 | \usepackage{framed}
27 | \makeatletter
28 | \newenvironment{kframe}{%
29 | \def\at@end@of@kframe{}%
30 | \ifinner\ifhmode%
31 | \def\at@end@of@kframe{\end{minipage}}%
32 | \begin{minipage}{\columnwidth}%
33 | \fi\fi%
34 | \def\FrameCommand##1{\hskip\@totalleftmargin \hskip-\fboxsep
35 | \colorbox{shadecolor}{##1}\hskip-\fboxsep
36 | % There is no \\@totalrightmargin, so:
37 | \hskip-\linewidth \hskip-\@totalleftmargin \hskip\columnwidth}%
38 | \MakeFramed {\advance\hsize-\width
39 | \@totalleftmargin\z@ \linewidth\hsize
40 | \@setminipage}}%
41 | {\par\unskip\endMakeFramed%
42 | \at@end@of@kframe}
43 | \makeatother
44 |
45 | \definecolor{shadecolor}{rgb}{.97, .97, .97}
46 | \definecolor{messagecolor}{rgb}{0, 0, 0}
47 | \definecolor{warningcolor}{rgb}{1, 0, 1}
48 | \definecolor{errorcolor}{rgb}{1, 0, 0}
49 | \newenvironment{knitrout}{}{} % an empty environment to be redefined in TeX
50 |
51 | \usepackage{alltt}
52 |
53 | % load packages
54 | \usepackage{tikz}
55 | \usepackage{graphicx}
56 | \usepackage{upquote}
57 | \usepackage{listings}
58 | \usepackage{hyperref}
59 | \usepackage{color}
60 | \usepackage{lmodern}
61 |
62 | \input{../header.tex}
63 |
64 | \title[Getting data from the web with R]{\LARGE Getting Data from the Web with R}
65 | \subtitle[Web Data in R]{\large Part 1: Introduction}
66 | \author[gastonsanchez.com]{
67 | \textcolor{gray}{\textbf{G}aston \textbf{S}anchez}
68 | }
69 | \institute[]{\scriptsize \textcolor{lightgray}{April-May 2014}}
70 | \date[CC BY-NC-SA 4.0]{
71 | \textcolor{lightgray}{\tiny{Content licensed under
72 | \href{http://creativecommons.org/licenses/by-nc-sa/4.0/}{CC BY-NC-SA 4.0}}}
73 | }
74 | \IfFileExists{upquote.sty}{\usepackage{upquote}}{}
75 | \begin{document}
76 |
77 |
78 |
79 |
80 | %--- the titlepage frame -------------------------%
81 |
82 | \begin{frame}[plain]
83 | \titlepage
84 | \end{frame}
85 |
86 | %------------------------------------------------
87 |
88 | { % all template changes are local to this group.
89 | \setbeamertemplate{navigation symbols}{}
90 | \begin{frame}[plain]
91 | \begin{tikzpicture}[remember picture,overlay]
92 | \node[at=(current page.center)] {
93 | \includegraphics[height=\paperheight]{images/jaume_nit.jpg}
94 | };
95 | \node[fill=black, opacity=0, text opacity=1] at (5.5,-3.8) {\large{ \color{white} Getting Data from the Web with R}};
96 | \end{tikzpicture}
97 | \end{frame}
98 | }
99 |
100 | %------------------------------------------------
101 |
102 | \begin{frame}[fragile]
103 | \frametitle{Readme}
104 |
105 | \begin{block}{\scriptsize License:}
106 | \tiny
107 | \begin{itemize}
108 | \item[] Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License \\
109 | \url{http://creativecommons.org/licenses/by-nc-sa/4.0/}{}
110 | \end{itemize}
111 | \end{block}
112 |
113 | \begin{block}{\scriptsize You are free to:}
114 | \tiny
115 | \begin{itemize}
116 | \item[] \textcolor{darkgray}{\textbf{Share}} --- \textcolor{gray}{copy and redistribute the material}
117 | \item[] \textcolor{darkgray}{\textbf{Adapt}} --- \textcolor{gray}{rebuild and transform the material}
118 | \end{itemize}
119 | \end{block}
120 |
121 | \vspace{2mm}
122 | \begin{block}{\scriptsize Under the following conditions:}
123 | \tiny
124 | \begin{itemize}
125 | \item[] \textcolor{darkgray}{\textbf{Attribution}} --- \textcolor{gray}{You must give appropriate credit, provide a link to the license, and indicate if changes were made.}
126 | \item[] \textcolor{darkgray}{\textbf{NonCommercial}} --- \textcolor{gray}{You may not use this work for commercial purposes.}
127 | \item[] \textcolor{darkgray}{\textbf{Share Alike}} --- \textcolor{gray}{If you remix, transform, or build upon this
128 | work, you must distribute your contributions under the same license as the original.}
129 | \end{itemize}
130 | \end{block}
131 |
132 | \end{frame}
133 |
134 | %------------------------------------------------
135 |
136 | \begin{frame}
137 | \frametitle{Lectures Menu}
138 |
139 | \begin{columns}[t]
140 | \begin{column}{0.1\textwidth}
141 | %--- empty space ---%
142 | \end{column}
143 | \begin{column}{0.8\textwidth}
144 | \begin{block}{Slide Decks}
145 | \begin{enumerate}
146 | \item \textbf{Introduction}
147 | \item \textcolor{lightgray}{Reading files from the Web}
148 | \item \textcolor{lightgray}{Basics of XML and HTML}
149 | \item \textcolor{lightgray}{Parsing XML / HTML content}
150 | \item \textcolor{lightgray}{Handling JSON data}
151 | \item \textcolor{lightgray}{HTTP Basics and the RCurl Package}
152 | \item \textcolor{lightgray}{Getting data via Web Forms}
153 | \item \textcolor{lightgray}{Getting data via Web APIs}
154 | \end{enumerate}
155 | \end{block}
156 | \end{column}
157 | \begin{column}{0.1\textwidth}
158 | %--- empty space ---%
159 | \end{column}
160 | \end{columns}
161 |
162 | \end{frame}
163 |
164 | %------------------------------------------------
165 |
166 | \begin{frame}
167 | \frametitle{About these lectures}
168 |
169 | \begin{columns}[t]
170 | \begin{column}{0.1\textwidth}
171 | %--- empty space ---%
172 | \end{column}
173 | \begin{column}{0.8\textwidth}
174 | \begin{block}{Goal}
175 | My goal is \textbf{to give you an introduction} to some of the tools in R for getting data from the Web.
176 |
177 | \bigskip
178 | I don't intend to cover everything or to go very deep. I just want to give you an overview of the various Web Data scenarios you can handle with R.
179 | \end{block}
180 | \end{column}
181 | \begin{column}{0.1\textwidth}
182 | %--- empty space ---%
183 | \end{column}
184 | \end{columns}
185 |
186 | \end{frame}
187 |
188 | %------------------------------------------------
189 |
190 | \begin{frame}
191 | \begin{center}
192 | \Huge{\textcolor{mandarina}{Preliminaries}}
193 | \end{center}
194 | \end{frame}
195 |
196 | %------------------------------------------------
197 |
198 | \begin{frame}
199 | \frametitle{Requirements}
200 |
201 | \begin{block}{Must have:}
202 | \begin{itemize}
203 | \item Some experience working with R
204 | \item Some knowledge of HTML
205 | \item An insatiable curiosity for learning new things
206 | \end{itemize}
207 | \end{block}
208 |
209 | \begin{block}{Nice to have:}
210 | \begin{itemize}
211 | \item Knowledge about data storage formats
212 | \item Some programming experience
213 | \item Knowledge of how the Web works
214 | \end{itemize}
215 | \end{block}
216 |
217 | \end{frame}
218 |
219 | %------------------------------------------------
220 |
221 | \begin{frame}
222 | \frametitle{Software}
223 |
224 | \begin{block}{You'll need:}
225 | \begin{itemize}
226 | \item R \textcolor{lightgray}{(preferably the latest version)} \\ \url{http://cran.r-project.org/}
227 | \item RStudio \textcolor{lightgray}{(highly recommended)} \\ \url{https://www.rstudio.com/}
228 | \item Text Editor \\ \textcolor{lightgray}{(e.g. vim, emacs, TextWrangler, notepad, sublime text)}
229 | \item Web Browser \\ \textcolor{lightgray}{(e.g. Chrome, Safari, Firefox, Internet Explorer, Opera)}
230 | \item and a good internet connection!
231 | \end{itemize}
232 | \end{block}
233 |
234 | \end{frame}
235 |
236 | %------------------------------------------------
237 |
238 | \begin{frame}
239 | \frametitle{In my case ...}
240 |
241 | \begin{block}{Software I used for these slides:}
242 | \begin{itemize}
243 | \item R version 3.1.0 (2014-04-10) -- "Spring Dance"
244 | \item Platform: x86\_64-apple-darwin10.8.0 (64-bit)
245 | \item IDE: RStudio Version 0.98.501
246 | \item Text Editor: TextWrangler
247 | \item Web Browser: Google Chrome Version 34.0.1847.131
248 | \item Operating System: OS X Version 10.8.5
249 | \end{itemize}
250 | \end{block}
251 |
252 | \end{frame}
253 |
254 | %------------------------------------------------
255 |
256 | \begin{frame}
257 | \frametitle{Resources}
258 |
259 | \begin{block}{Some R Books}
260 | \begin{itemize}
261 | \item XML and Web Technologies for Data Sciences with R \\
262 | \low{by Deb Nolan and Duncan Temple Lang}
263 | \item Introduction to Data Technologies \\
264 | \low{by Paul Murrell}
265 | \item Data Manipulation with R \\
266 | \low{by Phil Spector}
267 | \item more references in each slide deck
268 | \end{itemize}
269 | \end{block}
270 |
271 | \end{frame}
272 |
273 | %------------------------------------------------
274 |
275 | \begin{frame}
276 | \frametitle{Resources}
277 |
278 | \begin{block}{Web Scraping with R}
279 | \begin{itemize}
280 | \item Web scraping for the humanities and social sciences \\
281 | \low{(by Rolf Fredheim and Aiora Zabala)} \\
282 | {\tiny \url{http://quantifyingmemory.blogspot.co.uk/2014/02/web-scraping-basics.html}}
283 | \item Web Scraping with R \low{(by Xiao Nan)} \\
284 | {\tiny \url{http://cos.name/wp-content/uploads/2013/05/Web-Scraping-with-R-XiaoNan.pdf}}
285 | \item R-bloggers posts on \textit{Web Scraping} \\
286 | {\tiny \url{http://www.r-bloggers.com/?s=web+scraping}}
287 | \end{itemize}
288 | \end{block}
289 |
290 | \end{frame}
291 |
292 | %------------------------------------------------
293 |
294 | \begin{frame}
295 | \frametitle{Some R Packages}
296 |
297 | \begin{center}
298 | \begin{tabular}{l l}
299 | \hline
300 | Package & Description \\
301 | \hline
302 | \highcode{RCurl} & R interface to the \code{libcurl} library \\
303 | & for making general HTTP requests \\
304 | \highcode{RHTMLForms} & Tools to process Web/HTML forms \\
305 | \highcode{XML} & Tools for parsing XML and HTML documents \\
306 | & and working with structured data from the Web \\
307 | \highcode{RJSONIO} & Functions for handling JSON data \\
308 | \highcode{jsonlite} & Functions for handling JSON data \\
309 | \highcode{rjson} & Functions for handling JSON data \\
310 | \highcode{ROAuth} & Interface for authentication via OAuth 1.0 \\
311 | \highcode{SSOAP} & Use SOAP protocol to retrieve data \\
312 | \hline
313 | \end{tabular}
314 | \end{center}
315 |
316 | CRAN Task View: \textit{Web Technologies and Services} \\
317 | {\scriptsize \url{http://cran.r-project.org/web/views/WebTechnologies.html}}
318 |
319 | \end{frame}
320 |
321 | %------------------------------------------------
322 |
323 | { % all template changes are local to this group.
324 | \setbeamertemplate{navigation symbols}{}
325 | \begin{frame}[plain]
326 | \begin{tikzpicture}[remember picture,overlay]
327 | \node[at=(current page.center)] {
328 | \includegraphics[width=\paperwidth]{images/spider_web.jpg}
329 | };
330 | \node[fill=black, opacity=0, text opacity=1] at (7.5,-2.8) {\Huge{ \color{white} The Web}};
331 | \end{tikzpicture}
332 | \end{frame}
333 | }
334 |
335 | %------------------------------------------------
336 |
337 | \begin{frame}
338 | \frametitle{VIP Questions}
339 |
340 | \begin{columns}[t]
341 | \begin{column}{0.1\textwidth}
342 | %--- empty space ---%
343 | \end{column}
344 | \begin{column}{0.8\textwidth}
345 | \begin{block}{Very Important Preliminary Questions}
346 | The Data that you want:
347 | \begin{enumerate}
348 | \item Where is it located?
349 | \item How accessible is it?
350 | \item What is its structure / format?
351 | \end{enumerate}
352 | \end{block}
353 | \end{column}
354 | \begin{column}{0.1\textwidth}
355 | %--- empty space ---%
356 | \end{column}
357 | \end{columns}
358 |
359 | \end{frame}
360 |
361 | %------------------------------------------------
362 |
363 | \begin{frame}
364 | \frametitle{VIP Questions}
365 |
366 | \begin{block}{Location of Data}
367 | \begin{itemize}
368 | \item Do you know the location (URL) beforehand?
369 | \item[] Or do you have to figure it out? \\
370 | \item Is it in one single specific place? \\
371 | \low{(e.g. one HTML table, one file on the Web)}
372 | \item Is it in one website but spread across several pages? \\
373 | \low{(e.g. several HTML tables on different pages)}
374 | \item Is it spread across several websites? \\
375 | \low{(e.g. multiple pieces of information on various sites)}
376 | \item Is it in one or several databases?
377 | \end{itemize}
378 | \end{block}
379 |
380 | \end{frame}
381 |
382 | %------------------------------------------------
383 |
384 | \begin{frame}
385 | \frametitle{VIP Questions}
386 |
387 | \begin{block}{Accessibility of Data}
388 | \begin{itemize}
389 | \item Do you have free, direct, and immediate access to the data?
390 | \item Do you need to fill out a Web Form?
391 | \item Do you need to use a Web API?
392 | \item Do you need a username, password, or other authentication?
393 | \item Do you need to use a specific transfer protocol?
394 | \item Do you need to use a specific type/method of request?
395 | \end{itemize}
396 | \end{block}
397 |
398 | \end{frame}
399 |
400 |
401 | %------------------------------------------------
402 |
403 | \begin{frame}
404 | \frametitle{VIP Questions}
405 |
406 | \begin{block}{Format / Structure of Data}
407 | \begin{itemize}
408 | \item Is it plain text? \\
409 | \item Is it in tabular \low{(spreadsheet-like)} form? \\
410 | \item Is it in HTML? \\
411 | \item Is it in some XML-dialect?
412 | \item Is it in JSON format?
413 | \item Is it in another format (binary, images, maps, etc.)?
414 | \end{itemize}
415 | \end{block}
416 |
417 | \end{frame}
418 |
419 | %------------------------------------------------
420 |
421 | \begin{frame}
422 | \frametitle{Glossary}
423 |
424 | \begin{block}{Some Acronyms}
425 | \begin{itemize}
426 | \item \textbf{WWW} World Wide Web
427 | \item \textbf{W3C} World Wide Web Consortium
428 | \item \textbf{URL} Uniform Resource Locator
429 | \item \textbf{HTTP} HyperText Transfer Protocol
430 | \item \textbf{XML} Extensible Markup Language
431 | \item \textbf{HTML} HyperText Markup Language
432 | \item \textbf{JSON} JavaScript Object Notation
433 | \end{itemize}
434 | \end{block}
435 |
436 | \end{frame}
437 |
438 | %------------------------------------------------
439 |
440 | \end{document}
441 |
--------------------------------------------------------------------------------
/01-introduction/images/blacksmith.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gastonstat/tutorial-R-web-data/2cdb84b2dc4512d7a33f99afd3ceb8e8a4e63bd0/01-introduction/images/blacksmith.png
--------------------------------------------------------------------------------
/01-introduction/images/jaume_nit.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gastonstat/tutorial-R-web-data/2cdb84b2dc4512d7a33f99afd3ceb8e8a4e63bd0/01-introduction/images/jaume_nit.jpg
--------------------------------------------------------------------------------
/01-introduction/images/spider_web.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gastonstat/tutorial-R-web-data/2cdb84b2dc4512d7a33f99afd3ceb8e8a4e63bd0/01-introduction/images/spider_web.jpg
--------------------------------------------------------------------------------
/01-introduction/images/wikipedia_table.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gastonstat/tutorial-R-web-data/2cdb84b2dc4512d7a33f99afd3ceb8e8a4e63bd0/01-introduction/images/wikipedia_table.png
--------------------------------------------------------------------------------
/02-reading-files/02-reading-files.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gastonstat/tutorial-R-web-data/2cdb84b2dc4512d7a33f99afd3ceb8e8a4e63bd0/02-reading-files/02-reading-files.pdf
--------------------------------------------------------------------------------
/02-reading-files/images/R_data_import_manual.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gastonstat/tutorial-R-web-data/2cdb84b2dc4512d7a33f99afd3ceb8e8a4e63bd0/02-reading-files/images/R_data_import_manual.png
--------------------------------------------------------------------------------
/02-reading-files/images/alpha_data.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gastonstat/tutorial-R-web-data/2cdb84b2dc4512d7a33f99afd3ceb8e8a4e63bd0/02-reading-files/images/alpha_data.png
--------------------------------------------------------------------------------
/02-reading-files/images/blacksmith.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gastonstat/tutorial-R-web-data/2cdb84b2dc4512d7a33f99afd3ceb8e8a4e63bd0/02-reading-files/images/blacksmith.png
--------------------------------------------------------------------------------
/02-reading-files/images/cars2004_data.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gastonstat/tutorial-R-web-data/2cdb84b2dc4512d7a33f99afd3ceb8e8a4e63bd0/02-reading-files/images/cars2004_data.png
--------------------------------------------------------------------------------
/02-reading-files/images/dsets_data.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gastonstat/tutorial-R-web-data/2cdb84b2dc4512d7a33f99afd3ceb8e8a4e63bd0/02-reading-files/images/dsets_data.png
--------------------------------------------------------------------------------
/02-reading-files/images/iris_data.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gastonstat/tutorial-R-web-data/2cdb84b2dc4512d7a33f99afd3ceb8e8a4e63bd0/02-reading-files/images/iris_data.png
--------------------------------------------------------------------------------
/02-reading-files/images/mineria_webpage.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gastonstat/tutorial-R-web-data/2cdb84b2dc4512d7a33f99afd3ceb8e8a4e63bd0/02-reading-files/images/mineria_webpage.png
--------------------------------------------------------------------------------
/02-reading-files/images/moby_dick.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gastonstat/tutorial-R-web-data/2cdb84b2dc4512d7a33f99afd3ceb8e8a4e63bd0/02-reading-files/images/moby_dick.jpg
--------------------------------------------------------------------------------
/02-reading-files/images/moby_dick.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gastonstat/tutorial-R-web-data/2cdb84b2dc4512d7a33f99afd3ceb8e8a4e63bd0/02-reading-files/images/moby_dick.png
--------------------------------------------------------------------------------
/02-reading-files/images/moby_dick_535.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gastonstat/tutorial-R-web-data/2cdb84b2dc4512d7a33f99afd3ceb8e8a4e63bd0/02-reading-files/images/moby_dick_535.png
--------------------------------------------------------------------------------
/02-reading-files/images/moby_dick_book.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gastonstat/tutorial-R-web-data/2cdb84b2dc4512d7a33f99afd3ceb8e8a4e63bd0/02-reading-files/images/moby_dick_book.png
--------------------------------------------------------------------------------
/02-reading-files/images/moby_dick_gutenberg.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gastonstat/tutorial-R-web-data/2cdb84b2dc4512d7a33f99afd3ceb8e8a4e63bd0/02-reading-files/images/moby_dick_gutenberg.png
--------------------------------------------------------------------------------
/02-reading-files/images/rbook_files.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gastonstat/tutorial-R-web-data/2cdb84b2dc4512d7a33f99afd3ceb8e8a4e63bd0/02-reading-files/images/rbook_files.png
--------------------------------------------------------------------------------
/02-reading-files/images/read_functions.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gastonstat/tutorial-R-web-data/2cdb84b2dc4512d7a33f99afd3ceb8e8a4e63bd0/02-reading-files/images/read_functions.pdf
--------------------------------------------------------------------------------
/02-reading-files/images/skulls_data.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gastonstat/tutorial-R-web-data/2cdb84b2dc4512d7a33f99afd3ceb8e8a4e63bd0/02-reading-files/images/skulls_data.png
--------------------------------------------------------------------------------
/02-reading-files/images/spider_web.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gastonstat/tutorial-R-web-data/2cdb84b2dc4512d7a33f99afd3ceb8e8a4e63bd0/02-reading-files/images/spider_web.jpg
--------------------------------------------------------------------------------
/02-reading-files/images/taxon_data.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gastonstat/tutorial-R-web-data/2cdb84b2dc4512d7a33f99afd3ceb8e8a4e63bd0/02-reading-files/images/taxon_data.png
--------------------------------------------------------------------------------
/02-reading-files/images/travelbooks_data.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gastonstat/tutorial-R-web-data/2cdb84b2dc4512d7a33f99afd3ceb8e8a4e63bd0/02-reading-files/images/travelbooks_data.png
--------------------------------------------------------------------------------
/02-reading-files/images/web2r.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gastonstat/tutorial-R-web-data/2cdb84b2dc4512d7a33f99afd3ceb8e8a4e63bd0/02-reading-files/images/web2r.pdf
--------------------------------------------------------------------------------
/02-reading-files/images/wikipedia_table.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gastonstat/tutorial-R-web-data/2cdb84b2dc4512d7a33f99afd3ceb8e8a4e63bd0/02-reading-files/images/wikipedia_table.png
--------------------------------------------------------------------------------
/03-xml-basics/03-xml-basics.Rnw:
--------------------------------------------------------------------------------
1 | \documentclass{beamer}
2 |
3 | % load packages
4 | \usepackage{tikz}
5 | \usepackage{graphicx}
6 | \usepackage{upquote}
7 | \usepackage{listings}
8 | \usepackage{hyperref}
9 | \usepackage{color}
10 | \usepackage{lmodern}
11 |
12 | \input{../header.tex}
13 |
14 | \title[Getting data from the web with R]{\LARGE Getting Data from the Web with R}
15 | \subtitle[Web Data in R]{\large Part 3: Basics of XML}
16 | \author[gastonsanchez.com]{
17 | \textcolor{gray}{\textbf{G}aston \textbf{S}anchez}
18 | }
19 | \institute[]{\scriptsize \textcolor{lightgray}{April-May 2014}}
20 | \date[CC BY-NC-SA 4.0]{
21 | \textcolor{lightgray}{\tiny{Content licensed under
22 | \href{http://creativecommons.org/licenses/by-nc-sa/4.0/}{CC BY-NC-SA 4.0}}}
23 | }
24 |
25 |
26 | \begin{document}
27 |
28 | <<>>=
29 | # smaller font size for chunks
30 | opts_chunk$set(size = 'tiny')
31 | thm <- knit_theme$get("bclear")
32 | knit_theme$set(thm)
33 | options(width=78)
34 | @
35 |
36 |
37 | %--- the titlepage frame -------------------------%
38 |
39 | \begin{frame}[plain]
40 | \titlepage
41 | \end{frame}
42 |
43 | %------------------------------------------------
44 |
45 | { % all template changes are local to this group.
46 | \setbeamertemplate{navigation symbols}{}
47 | \begin{frame}[plain]
48 | \begin{tikzpicture}[remember picture,overlay]
49 | \node[at=(current page.center)] {
50 | \includegraphics[width=\paperwidth]{images/xmlcover.png}
51 | };
52 | \end{tikzpicture}
53 | \end{frame}
54 | }
55 |
56 | %------------------------------------------------
57 |
58 | \begin{frame}[fragile]
59 | \frametitle{Readme}
60 |
61 | \begin{block}{\scriptsize License:}
62 | \tiny
63 | \begin{itemize}
64 | \item[] Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License \\
65 | \url{http://creativecommons.org/licenses/by-nc-sa/4.0/}{}
66 | \end{itemize}
67 | \end{block}
68 |
69 | \begin{block}{\scriptsize You are free to:}
70 | \tiny
71 | \begin{itemize}
72 | \item[] \textcolor{darkgray}{\textbf{Share}} --- \textcolor{gray}{copy and redistribute the material}
73 | \item[] \textcolor{darkgray}{\textbf{Adapt}} --- \textcolor{gray}{rebuild and transform the material}
74 | \end{itemize}
75 | \end{block}
76 |
77 | \vspace{2mm}
78 | \begin{block}{\scriptsize Under the following conditions:}
79 | \tiny
80 | \begin{itemize}
81 | \item[] \textcolor{darkgray}{\textbf{Attribution}} --- \textcolor{gray}{You must give appropriate credit, provide a link to the license, and indicate if changes were made.}
82 | \item[] \textcolor{darkgray}{\textbf{NonCommercial}} --- \textcolor{gray}{You may not use this work for commercial purposes.}
83 | \item[] \textcolor{darkgray}{\textbf{Share Alike}} --- \textcolor{gray}{If you remix, transform, or build upon this
84 | work, you must distribute your contributions under the same license to this one.}
85 | \end{itemize}
86 | \end{block}
87 |
88 | \end{frame}
89 |
90 | %------------------------------------------------
91 |
92 | \begin{frame}
93 | \frametitle{Lectures Menu}
94 |
95 | \begin{columns}[t]
96 | \begin{column}{0.1\textwidth}
97 | %--- empty space ---%
98 | \end{column}
99 | \begin{column}{0.8\textwidth}
100 | \begin{block}{Slide Decks}
101 | \begin{enumerate}
102 | \item \textcolor{lightgray}{Introduction}
103 | \item \textcolor{lightgray}{Reading files from the Web}
104 | \item \textbf{Basics of XML and HTML}
105 | \item \textcolor{lightgray}{Parsing XML / HTML documents}
106 | \item \textcolor{lightgray}{Handling JSON data}
107 | \item \textcolor{lightgray}{HTTP Basics and the RCurl package}
108 | \item \textcolor{lightgray}{Getting data via Web Forms}
109 | \item \textcolor{lightgray}{Getting data via Web APIs}
110 | \end{enumerate}
111 | \end{block}
112 | \end{column}
113 | \begin{column}{0.1\textwidth}
114 | %--- empty space ---%
115 | \end{column}
116 | \end{columns}
117 |
118 | \end{frame}
119 |
120 | %------------------------------------------------
121 |
122 | \begin{frame}
123 | \begin{center}
124 | \Huge{\textcolor{mandarina}{Basics of XML \\ and HTML}}
125 | \end{center}
126 | \end{frame}
127 |
128 | %------------------------------------------------
129 |
130 | \begin{frame}
131 | \frametitle{Goal}
132 |
133 | \begin{columns}[t]
134 | \begin{column}{0.1\textwidth}
135 | %--- empty space ---%
136 | \end{column}
137 | \begin{column}{0.8\textwidth}
138 |
139 | \begin{block}{XML \& HTML}
140 | The goal of these slides is to give you a \textbf{crash introduction to XML and HTML} so you can get a good grasp of those formats for the rest of the lectures
141 | \end{block}
142 |
143 | \end{column}
144 | \begin{column}{0.1\textwidth}
145 | %--- empty space ---%
146 | \end{column}
147 | \end{columns}
148 |
149 | \end{frame}
150 |
151 | %------------------------------------------------
152 |
153 | \begin{frame}
154 | \frametitle{Synopsis}
155 |
156 | \begin{columns}[t]
157 | \begin{column}{0.1\textwidth}
158 | %--- empty space ---%
159 | \end{column}
160 | \begin{column}{0.8\textwidth}
161 |
162 | \begin{block}{In a nutshell}
163 | We'll cover the following concepts:
164 | \begin{itemize}
165 | \item Importance of XML and HTML
166 | \item Hierarchical Structure
167 | \item Document Object Model (DOM)
168 | \end{itemize}
169 | \end{block}
170 |
171 | \end{column}
172 | \begin{column}{0.1\textwidth}
173 | %--- empty space ---%
174 | \end{column}
175 | \end{columns}
176 |
177 | \end{frame}
178 |
179 | %------------------------------------------------
180 |
181 | \begin{frame}
182 | \frametitle{Some References}
183 |
184 | \begin{itemize}
185 | \item XML Files website {\scriptsize (\url{http://www.xmlfiles.com})} \\
186 | \low{by Jan Egil Refsnes}
187 | \item XML in a Nutshell \\
188 | \low{by Elliotte Rusty Harold; W. Scott Means}
189 | \item XML Tutorial {\scriptsize (\url{http://www.w3schools.com/xml/default.asp})} \\
190 | \low{by w3schools}
191 | \item Introduction to Data Technologies \\
192 | \low{by Paul Murrell}
193 | \item XML and Web Technologies for Data Sciences with R \\
194 | \low{by Deb Nolan and Duncan Temple Lang}
195 | \end{itemize}
196 |
197 | \end{frame}
198 |
199 | %------------------------------------------------
200 |
201 | \begin{frame}
202 | \frametitle{XML and HTML}
203 |
204 | \begin{block}{Why should you care about XML and HTML?}
205 | \begin{itemize}
206 | \item Large amounts of data and information are stored, shared and distributed using HTML and XML-dialects
207 | \item They are widely adopted and used in many applications
208 | \item Working with data from the Web means dealing with HTML
209 | \end{itemize}
210 | \end{block}
211 |
212 | \end{frame}
213 |
214 |
215 | %------------------------------------------------
216 |
217 | \begin{frame}
218 | \begin{center}
219 | {\Huge \textcolor{mandarina}{XML}} \\
220 | \bigskip
221 | {\Large \textcolor{mandarina}{eXtensible Markup Language}}
222 | \end{center}
223 | \end{frame}
224 |
225 | %------------------------------------------------
226 |
227 | { % all template changes are local to this group.
228 | \setbeamertemplate{navigation symbols}{}
229 | \begin{frame}[plain]
230 | \begin{tikzpicture}[remember picture,overlay]
231 | \node[at=(current page.center)] {
232 | \includegraphics[width=\paperwidth]{images/xml_plants_catalog.png}
233 | };
234 | \end{tikzpicture}
235 | \end{frame}
236 | }
237 |
238 | %------------------------------------------------
239 |
240 | \begin{frame}
241 | \frametitle{Some Definitions}
242 |
243 | \begin{quotation}
244 | ``XML is a markup language that defines a set of rules for encoding documents in a format that is both human-readable and machine-readable''
245 | \end{quotation}
246 |
247 | {\footnotesize
248 | \hspace{8mm} \url{http://en.wikipedia.org/wiki/XML}
249 | }
250 |
251 | \bigskip
252 | \begin{quotation}
253 | ``XML is a data description language used for describing data''
254 | \end{quotation}
255 |
256 | {\footnotesize
257 | \hspace{8mm} \high{Paul Murrell} \\
258 | \hspace{8mm} \low{Introduction to Data Technologies}
259 | }
260 |
261 | \end{frame}
262 |
263 | %------------------------------------------------
264 |
265 | \begin{frame}
266 | \frametitle{Some Definitions}
267 |
268 | \begin{quotation}
269 | ``XML is a very general structure with which we can define any number of new formats to represent arbitrary data''
270 | \end{quotation}
271 |
272 | \begin{quotation}
273 | ``XML is a standard for the semantic, hierarchical representation of data''
274 | \end{quotation}
275 |
276 | {\footnotesize
277 | \hspace{8mm} \high{Deb Nolan \& Duncan Temple Lang} \\
278 | \hspace{8mm} \low{XML and Web Technologies for Data Sciences with R}
279 | }
280 |
281 | \end{frame}
282 |
283 | %------------------------------------------------
284 |
285 | \begin{frame}
286 | \frametitle{About XML}
287 |
288 | \begin{columns}[t]
289 | \begin{column}{0.1\textwidth}
290 | %--- empty space ---%
291 | \end{column}
292 | \begin{column}{0.8\textwidth}
293 |
294 | \begin{block}{XML}
295 | XML stands for \textbf{eXtensible Markup Language}
296 | \end{block}
297 |
298 | \begin{block}{Broadly speaking ...}
299 | XML provides a flexible framework to create formats for describing and representing data
300 | \end{block}
301 |
302 | \end{column}
303 | \begin{column}{0.1\textwidth}
304 | %--- empty space ---%
305 | \end{column}
306 | \end{columns}
307 |
308 | \end{frame}
309 |
310 | %------------------------------------------------
311 |
312 | \begin{frame}
313 | \frametitle{Markups}
314 |
315 | \begin{block}{Markup}
316 | A \textbf{markup} is a sequence of characters or other symbols inserted at certain places in a document to indicate either:
317 | \begin{itemize}
318 | \item how the content should be displayed when printed or on screen
319 | \item how the document is structured
320 | \end{itemize}
321 | \end{block}
322 |
323 | \begin{block}{Markup Language}
324 | A markup language is a system for \textbf{annotating} (i.e. \textit{marking}) a document so that the content is distinguished from its representation \low{(e.g. LaTeX, PostScript, HTML, SVG)}
325 | \end{block}
326 |
327 | \end{frame}
328 |
329 | %------------------------------------------------
330 |
331 | \begin{frame}[fragile]
332 | \frametitle{Markups}
333 |
334 | \begin{block}{XML Markups}
335 | In XML (as well as in HTML) the marks (aka \textit{tags}) are defined using angle brackets: {\Huge \highcode{<>}}
336 | \end{block}
337 |
338 | \bigskip
339 |
340 | \code{ \high{}Text marked with special tag\high{} }
341 |
342 | \end{frame}
343 |
344 | %------------------------------------------------
345 |
346 | \begin{frame}[fragile]
347 | \frametitle{Extensible}
348 |
349 | \begin{block}{Extensible?}
350 | The concept of \textit{extensibility} means that we can define our own marks, the order in which they occur, and how they should be processed. For example:
351 | \begin{itemize}
352 | \item \highcode{}
353 | \item \highcode{}
354 | \item \highcode{}
355 | \item \highcode{}
356 | \end{itemize}
357 | \end{block}
358 |
359 | \end{frame}
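As a rough illustration of extensibility, the sketch below builds a small document out of made-up tags with R's XML package. The tag names "recipe" and "ingredient" and the attribute "serves" are hypothetical; nothing in XML predefines them, which is the point. It assumes the XML package is installed.

```r
# Assumes the XML package is installed: install.packages("XML")
library(XML)

# 'recipe', 'ingredient' and 'serves' are hypothetical names invented for this sketch
recipe <- newXMLNode("recipe", attrs = c(serves = "4"))

# each new node is attached to its parent as it is created
invisible(newXMLNode("ingredient", "flour", parent = recipe))
invisible(newXMLNode("ingredient", "water", parent = recipe))

# serialize the tree back to XML text
cat(saveXML(recipe), "\n")
```

Any application that agrees on these names can read the result; XML itself only fixes the syntax, not the vocabulary.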
360 |
361 | %------------------------------------------------
362 |
363 | \begin{frame}
364 | \frametitle{About XML}
365 |
366 | \begin{block}{XML is NOT}
367 | \begin{itemize}
368 | \item a programming language
369 | \item a network transfer protocol
370 | \item a database
371 | \end{itemize}
372 | \end{block}
373 |
374 | \begin{block}{XML is}
375 | \begin{itemize}
376 | \item more than a markup language
377 | \item a generic language that provides structure and syntax for representing any type of information
378 | \item a meta-language: it allows us to create or define other languages
379 | \end{itemize}
380 | \end{block}
381 |
382 | \end{frame}
383 |
384 | %------------------------------------------------
385 |
386 | \begin{frame}
387 | \frametitle{XML Applications}
388 |
389 | \begin{block}{Some XML dialects}
390 | \begin{itemize}
391 | \item \textbf{KML} (\textit{Keyhole Markup Language}) for describing geo-spatial information used in Google Earth, Google Maps, Google Sky
392 | \item \textbf{SVG} (\textit{Scalable Vector Graphics}) for visual graphical displays of two-dimensional graphics with support for interactivity and animation
393 | \item \textbf{PMML} (\textit{Predictive Model Markup Language}) for describing and exchanging models produced by data mining and machine learning algorithms
394 | \end{itemize}
395 | \end{block}
396 |
397 | \end{frame}
398 |
399 |
400 | %------------------------------------------------
401 |
402 | \begin{frame}
403 | \frametitle{XML Applications (cont'd)}
404 |
405 | \begin{block}{Some XML dialects}
406 | \begin{itemize}
407 | \item \textbf{RSS} (\textit{Rich Site Summary}) feeds for publishing blog entries
408 | \item \textbf{SDMX} (\textit{Statistical Data and Metadata Exchange}) for organizing and exchanging statistical information
409 | \item \textbf{GML} (\textit{Geography Markup Language}) for representing geographical features
410 | \item \textbf{SBML} (\textit{Systems Biology Markup Language}) for describing biological systems
411 | \end{itemize}
412 | \end{block}
413 |
414 | \end{frame}
415 |
416 | %------------------------------------------------
417 |
418 | \begin{frame}
419 | \begin{center}
420 | \Huge{\textcolor{mandarina}{Minimalist Example}}
421 | \end{center}
422 | \end{frame}
423 |
424 | %------------------------------------------------
425 |
426 | { % all template changes are local to this group.
427 | \setbeamertemplate{navigation symbols}{}
428 | \begin{frame}[plain]
429 | \begin{tikzpicture}[remember picture,overlay]
430 | \node[at=(current page.center)] {
431 | \includegraphics[width=\paperwidth]{images/goodwillhunting.jpg}
432 | };
433 | \end{tikzpicture}
434 | \end{frame}
435 | }
436 |
437 | %------------------------------------------------
438 |
439 | \begin{frame}[fragile]
440 | \frametitle{XML Example}
441 |
442 | \begin{block}{Ultra Simple XML}
443 | \begin{verbatim}
444 | <movie>
445 |   Good Will Hunting
446 | </movie>
447 | \end{verbatim}
448 | \end{block}
449 |
450 | \bigskip
451 |
452 | \begin{itemize}
453 | \item one single element {\textit{movie}}
454 | \item start-tag: \highcode{<movie>}
455 | \item end-tag: \highcode{</movie>}
456 | \item content: \highcode{Good Will Hunting}
457 | \end{itemize}
458 |
459 | \end{frame}
460 |
461 | %------------------------------------------------
462 |
463 | \begin{frame}[fragile]
464 | \frametitle{XML Example}
465 |
466 | \begin{block}{Ultra Simple XML}
467 | \begin{verbatim}
468 | <movie mins="126" lang="eng">
469 |   Good Will Hunting
470 | </movie>
471 | \end{verbatim}
472 | \end{block}
473 |
474 | \bigskip
475 |
476 | \begin{itemize}
477 | \item xml elements can have \textbf{attributes}
478 | \item attributes: \highcode{mins} \low{(minutes)} and \highcode{lang} \low{(language)}
479 | \item attributes are \textit{attached} to the element's start tag
480 | \item attribute values \textbf{must be quoted!}
481 | \end{itemize}
482 |
483 | \end{frame}
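A minimal sketch of how an element's name, content and attributes can be read in R, assuming the XML package is installed. The attribute values "126" and "eng" are illustrative values used for this example only.

```r
# Assumes the XML package is installed: install.packages("XML")
library(XML)

# a one-element document passed as a string (attribute values are illustrative)
movie_xml <- '<movie mins="126" lang="eng">Good Will Hunting</movie>'

doc  <- xmlParse(movie_xml, asText = TRUE)
root <- xmlRoot(doc)

xmlName(root)              # element name
xmlValue(root)             # text content of the element
xmlAttrs(root)             # all attributes, as a named character vector
xmlGetAttr(root, "mins")   # one attribute; values always come back as strings
```

Because attribute values are strings, a numeric attribute such as mins needs an explicit as.numeric() before any arithmetic.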
484 |
485 | %------------------------------------------------
486 |
487 | \begin{frame}[fragile]
488 | \frametitle{XML Example}
489 |
490 | \begin{block}{Minimalist XML}
491 | \begin{verbatim}
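492 | <movie mins="126" lang="eng">
493 |   <title>Good Will Hunting</title>
494 |   <director>Gus Van Sant</director>
495 |   <year>1998</year>
496 |   <genre>drama</genre>
497 | </movie>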
498 | \end{verbatim}
499 | \end{block}
500 |
501 | \bigskip
502 |
503 | \begin{itemize}
504 | \item an xml element may contain other elements
505 | \item \textit{movie} contains several elements: \textit{title, director, year, genre}
506 | \end{itemize}
507 |
508 | \end{frame}
509 |
510 | %------------------------------------------------
511 |
512 | \begin{frame}[fragile]
513 | \frametitle{XML Example}
514 |
515 | \begin{block}{Simple XML}
516 | \begin{verbatim}
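517 | <movie mins="126" lang="eng">
518 |   <title>Good Will Hunting</title>
519 |   <director>
520 |     <first_name>Gus</first_name>
521 |     <last_name>Van Sant</last_name>
522 |   </director>
523 |   <year>1998</year>
524 |   <genre>drama</genre>
525 | </movie>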
526 | \end{verbatim}
527 | \end{block}
528 |
529 | \bigskip
530 |
531 | \begin{itemize}
532 | \item Now \textit{director} has two child elements: \textit{first\_name} and \textit{last\_name}
533 | \end{itemize}
534 |
535 | \end{frame}
536 |
537 | %------------------------------------------------
538 |
539 | \begin{frame}[fragile]
540 | \frametitle{XML Hierarchy Structure}
541 |
542 | \begin{block}{Conceptual XML}
543 | \begin{verbatim}
544 | <Root>
545 |   <child>...</child>
546 |   <child>...</child>
547 |   <child><subchild>...</subchild></child>
548 |   <child>...</child>
549 | </Root>
550 | \end{verbatim}
551 | \end{block}
552 |
553 | \bigskip
554 |
555 | \begin{itemize}
556 | \item An XML document can be represented with a \textbf{tree structure}
557 | \item An XML document must have \textbf{one single Root} element
558 | \item The \code{Root} may contain \code{child} elements
559 | \item A \code{child} element may contain \code{subchild} elements
560 | \end{itemize}
561 |
562 | \end{frame}
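The root/child/subchild hierarchy described above can be explored from R. The following is a sketch under the assumption that the XML package is installed; the movie document is written without whitespace between tags so that every child node is an element.

```r
# Assumes the XML package is installed: install.packages("XML")
library(XML)

# movie document built as one string, with no whitespace between tags
movie_xml <- paste0(
  '<movie>',
    '<title>Good Will Hunting</title>',
    '<director>',
      '<first_name>Gus</first_name>',
      '<last_name>Van Sant</last_name>',
    '</director>',
    '<year>1998</year>',
    '<genre>drama</genre>',
  '</movie>'
)

doc  <- xmlParse(movie_xml, asText = TRUE)
root <- xmlRoot(doc)                      # the single root element

xmlName(root)                             # name of the root
xmlSize(root)                             # number of children of the root
sapply(xmlChildren(root), xmlName)        # names of the child elements

director <- root[["director"]]            # select a child by name
sapply(xmlChildren(director), xmlValue)   # values of its subchildren
xmlValue(director[["last_name"]])         # one specific subchild
```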
563 |
564 | %------------------------------------------------
565 |
566 | \begin{frame}
567 | \frametitle{XML Tree Structure}
568 |
569 | \begin{center}
570 | \includegraphics[width=9cm]{images/xml_movie_tree1.pdf}
571 | \end{center}
572 |
573 | \end{frame}
574 |
575 | %------------------------------------------------
576 |
577 | \begin{frame}
578 | \frametitle{XML Tree Structure (cont'd)}
579 |
580 | \begin{center}
581 | \includegraphics[width=9cm]{images/xml_movie_tree2.pdf}
582 | \end{center}
583 |
584 | \end{frame}
585 |
586 | %------------------------------------------------
587 |
588 | \begin{frame}
589 | \frametitle{Well-Formedness}
590 |
591 | \begin{block}{Well-formed XML}
592 | We say that an XML document is \textbf{well-formed} when it obeys the basic syntax rules of XML. Some of those rules are:
593 | \begin{itemize}
594 | \item one root element containing the rest of elements
595 | \item properly nested elements
596 | \item every start-tag has a matching end-tag (or the element is a self-closing tag)
597 | \item attributes appear in start-tags of elements
598 | \item attribute values must be quoted
599 | \item element names and attribute names are case sensitive
600 | \end{itemize}
601 | \end{block}
602 |
603 | \end{frame}
604 |
605 | %------------------------------------------------
606 |
607 | \begin{frame}
608 | \frametitle{Well-Formedness}
609 |
610 | \begin{block}{Importance of Well-formed XML}
611 | XML documents that are not well-formed produce errors, often fatal ones, when parsed.
612 |
613 | \bigskip
614 | Documents may be well-formed but not valid. Well-formed just guarantees that the document meets the basic XML structure, not that the content is valid.
615 | \end{block}
616 |
617 | \end{frame}
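One way to see these rules in action is to let a parser decide. The helper below, is_well_formed(), is a hypothetical function written for this sketch (it is not part of the XML package): it simply reports whether xmlParse() raises an error. It checks well-formedness only, not validity.

```r
# Assumes the XML package is installed: install.packages("XML")
library(XML)

# is_well_formed() is a hypothetical helper for this sketch:
# TRUE if the text parses, FALSE if the parser raises an error
is_well_formed <- function(xml_text) {
  tryCatch({
    xmlParse(xml_text, asText = TRUE)
    TRUE
  }, error = function(e) FALSE)
}

ok         <- is_well_formed('<movie mins="126">Good Will Hunting</movie>')
unquoted   <- is_well_formed('<movie mins=126>Good Will Hunting</movie>')
bad_nest   <- is_well_formed('<movie><title>Good Will Hunting</movie></title>')
wrong_case <- is_well_formed('<movie>Good Will Hunting</Movie>')
two_roots  <- is_well_formed('<movie>A</movie><movie>B</movie>')

c(ok = ok, unquoted = unquoted, bad_nest = bad_nest,
  wrong_case = wrong_case, two_roots = two_roots)
```

Only the first document follows all the rules listed above; each of the others breaks exactly one of them.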
618 |
619 | %------------------------------------------------
620 |
621 | \begin{frame}
622 | \begin{center}
623 | \Huge{\textcolor{mandarina}{Additional XML Elements}}
624 | \end{center}
625 | \end{frame}
626 |
627 | %------------------------------------------------
628 |
629 | \begin{frame}[fragile]
630 | \frametitle{Some Additional Elements}
631 |
632 | \begin{block}{Example with extra elements}
633 | { \small
634 | \begin{verbatim}
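635 | <?xml version="1.0" encoding="UTF-8"?>
636 | <!DOCTYPE movie>
637 | <?PI instruction ?>
638 | <!-- a comment -->
639 | <movie mins="126" lang="eng">
640 |   <![CDATA[ a > 5 & b < 10 ]]>
641 |   <title>Good Will Hunting</title>
642 |   <director>
643 |     <first_name>Gus</first_name>
644 |     <last_name>Van Sant</last_name>
645 |   </director>
646 |   <year>1998</year>
647 |   <genre>drama</genre>
648 | </movie>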
649 | \end{verbatim}
650 | }
651 | \end{block}
652 |
653 | \end{frame}
654 |
655 | %------------------------------------------------
656 |
657 | \begin{frame}
658 | \frametitle{Additional Elements}
659 |
660 | \begin{center}
661 | \textcolor{turquoise}{Additional (optional) XML elements}
662 |
663 | \bigskip
664 | \begin{tabular}{l l}
665 | \hline
666 | Markup & Description \\
667 | \hline
668 | \code{<?xml ... ?>} & XML Declaration \\
669 | & \low{identifies content as an XML document} \\
670 | \code{<?PI ... ?>} & Processing Instruction \\
671 | & \low{processing instructions passed to application \code{PI}} \\
672 | \code{<!DOCTYPE ... >} & Document-type Declaration \\
673 | & \low{defines the structure of an XML document} \\
674 | \code{<![CDATA[ ... ]]>} & CDATA Character Data \\
675 | & \low{content inside a CDATA section is treated as literal text, not as markup} \\
676 | \code{<!-- ... -->} & Comment \\
677 | & \low{for writing comments} \\
678 | \hline
679 | \end{tabular}
680 | \end{center}
681 |
682 | \end{frame}
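The effect of a CDATA section can be checked with a small experiment. This is a sketch assuming the XML package is installed; the element name "rule" is hypothetical. Inside CDATA the characters < and & are plain text, while the same text outside CDATA is not well-formed.

```r
# Assumes the XML package is installed: install.packages("XML")
library(XML)

# 'rule' is a hypothetical element name used only for this sketch
xml_text <- paste0(
  '<?xml version="1.0" encoding="UTF-8"?>',
  '<!-- comments are not part of the data -->',
  '<rule><![CDATA[ a > 5 & b < 10 ]]></rule>'
)

doc  <- xmlParse(xml_text, asText = TRUE)
root <- xmlRoot(doc)     # the comment before the root is skipped

xmlName(root)
xmlValue(root)           # the CDATA content comes back as literal text

# the same content without CDATA breaks the parser
bad <- tryCatch(
  xmlParse('<rule> a > 5 & b < 10 </rule>', asText = TRUE),
  error = function(e) NULL
)
is.null(bad)
```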
683 |
684 | %------------------------------------------------
685 |
686 | \begin{frame}
687 | \frametitle{DTD}
688 |
689 | \begin{block}{Document-Type Declaration}
690 | The Document-type Declaration identifies the \textbf{type} of the document. The \textit{type} indicates the structure of a \textbf{valid} document:
691 |
692 | \begin{itemize}
693 | \item what elements are allowed to be present
694 | \item how elements can be combined
695 | \item how elements must be ordered
696 | \end{itemize}
697 | \end{block}
698 |
699 | Basically, the DTD (Document Type Definition) specifies what the format allows.
700 | \end{frame}
701 |
702 | %------------------------------------------------
703 |
704 | \begin{frame}
705 | \begin{center}
706 | \Huge{\textcolor{mandarina}{Wrapping Up}}
707 | \end{center}
708 | \end{frame}
709 |
710 | %------------------------------------------------
711 |
712 | \begin{frame}
713 | \frametitle{About XML}
714 |
715 | \begin{block}{About XML}
716 | \begin{itemize}
717 | \item designed to store and transfer data
718 | \item designed to be self-descriptive
719 | \item tags are not predefined and can be extended
720 | \end{itemize}
721 | \end{block}
722 |
723 | \end{frame}
724 |
725 | %------------------------------------------------
726 |
727 | \begin{frame}
728 | \frametitle{Characteristics of XML}
729 |
730 | \begin{block}{XML is}
731 | \begin{itemize}
732 | \item a generic language that provides structure and syntax for many markup dialects
733 | \item a syntax or format for defining markup languages
734 | \item a standard for the semantic, hierarchical representation of data
735 | \item a general approach for representing all types of information
736 | \end{itemize}
737 | \end{block}
738 |
739 | \end{frame}
740 |
741 | %------------------------------------------------
742 |
743 | \begin{frame}[fragile]
744 | \frametitle{XML document example}
745 |
746 | \begin{block}{Simple XML}
747 | \begin{verbatim}
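748 | <?xml version="1.0" encoding="UTF-8"?>
749 | <!DOCTYPE movie>
750 | <!-- a comment -->
751 | <movie mins="126" lang="eng">
752 |   <title>Good Will Hunting</title>
753 |   <director>
754 |     <first_name>Gus</first_name>
755 |     <last_name>Van Sant</last_name>
756 |   </director>
757 |   <year>1998</year>
758 |   <genre>drama</genre>
759 | </movie>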
760 | \end{verbatim}
761 | \end{block}
762 |
763 | \end{frame}
764 |
765 | %------------------------------------------------
766 |
767 | \begin{frame}
768 | \frametitle{XML Tree Structure}
769 |
770 | \begin{block}{Each Node can have:}
771 | \begin{itemize}
772 | \item a Name
773 | \item any number of attributes
774 | \item optional content
775 | \item other nested elements
776 | \end{itemize}
777 | \end{block}
778 |
779 | \begin{block}{Traversing the tree}
780 | There's a \textbf{unique} path from the root node to any given node
781 | \end{block}
782 |
783 | \end{frame}
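The idea of a unique path from the root to each node can be sketched with a short recursive walk over the tree. The function node_paths() is a hypothetical helper written for this example, assuming the XML package is installed; it visits element nodes only and ignores text nodes.

```r
# Assumes the XML package is installed: install.packages("XML")
library(XML)

movie_xml <- paste0(
  '<movie>',
    '<title>Good Will Hunting</title>',
    '<director>',
      '<first_name>Gus</first_name>',
      '<last_name>Van Sant</last_name>',
    '</director>',
    '<year>1998</year>',
    '<genre>drama</genre>',
  '</movie>'
)
root <- xmlRoot(xmlParse(movie_xml, asText = TRUE))

# node_paths() is a hypothetical helper: it returns the path of a node
# followed by the paths of all element nodes below it (depth first)
node_paths <- function(node, prefix = "") {
  path <- paste0(prefix, "/", xmlName(node))
  # keep element children only; text nodes carry content, not structure
  kids <- Filter(function(k) inherits(k, "XMLInternalElementNode"),
                 xmlChildren(node))
  c(path, unlist(lapply(kids, node_paths, prefix = path), use.names = FALSE))
}

paths <- node_paths(root)
paths
```

Each path appears once because this document has no repeated sibling names; these slash-separated paths are the intuition behind the XPath expressions used later for parsing.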
784 |
785 | %------------------------------------------------
786 |
787 | \end{document}
--------------------------------------------------------------------------------
/03-xml-basics/03-xml-basics.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gastonstat/tutorial-R-web-data/2cdb84b2dc4512d7a33f99afd3ceb8e8a4e63bd0/03-xml-basics/03-xml-basics.pdf
--------------------------------------------------------------------------------
/03-xml-basics/03-xml-basics.tex:
--------------------------------------------------------------------------------
1 | \documentclass{beamer}\usepackage[]{graphicx}\usepackage[]{color}
2 | %% maxwidth is the original width if it is less than linewidth
3 | %% otherwise use linewidth (to make sure the graphics do not exceed the margin)
4 | \makeatletter
5 | \def\maxwidth{ %
6 | \ifdim\Gin@nat@width>\linewidth
7 | \linewidth
8 | \else
9 | \Gin@nat@width
10 | \fi
11 | }
12 | \makeatother
13 |
14 | \definecolor{fgcolor}{rgb}{0.196, 0.196, 0.196}
15 | \newcommand{\hlnum}[1]{\textcolor[rgb]{0.063,0.58,0.627}{#1}}%
16 | \newcommand{\hlstr}[1]{\textcolor[rgb]{0.063,0.58,0.627}{#1}}%
17 | \newcommand{\hlcom}[1]{\textcolor[rgb]{0.588,0.588,0.588}{#1}}%
18 | \newcommand{\hlopt}[1]{\textcolor[rgb]{0.196,0.196,0.196}{#1}}%
19 | \newcommand{\hlstd}[1]{\textcolor[rgb]{0.196,0.196,0.196}{#1}}%
20 | \newcommand{\hlkwa}[1]{\textcolor[rgb]{0.231,0.416,0.784}{#1}}%
21 | \newcommand{\hlkwb}[1]{\textcolor[rgb]{0.627,0,0.314}{#1}}%
22 | \newcommand{\hlkwc}[1]{\textcolor[rgb]{0,0.631,0.314}{#1}}%
23 | \newcommand{\hlkwd}[1]{\textcolor[rgb]{0.78,0.227,0.412}{#1}}%
24 | \let\hlipl\hlkwb
25 |
26 | \usepackage{framed}
27 | \makeatletter
28 | \newenvironment{kframe}{%
29 | \def\at@end@of@kframe{}%
30 | \ifinner\ifhmode%
31 | \def\at@end@of@kframe{\end{minipage}}%
32 | \begin{minipage}{\columnwidth}%
33 | \fi\fi%
34 | \def\FrameCommand##1{\hskip\@totalleftmargin \hskip-\fboxsep
35 | \colorbox{shadecolor}{##1}\hskip-\fboxsep
36 | % There is no \\@totalrightmargin, so:
37 | \hskip-\linewidth \hskip-\@totalleftmargin \hskip\columnwidth}%
38 | \MakeFramed {\advance\hsize-\width
39 | \@totalleftmargin\z@ \linewidth\hsize
40 | \@setminipage}}%
41 | {\par\unskip\endMakeFramed%
42 | \at@end@of@kframe}
43 | \makeatother
44 |
45 | \definecolor{shadecolor}{rgb}{.97, .97, .97}
46 | \definecolor{messagecolor}{rgb}{0, 0, 0}
47 | \definecolor{warningcolor}{rgb}{1, 0, 1}
48 | \definecolor{errorcolor}{rgb}{1, 0, 0}
49 | \newenvironment{knitrout}{}{} % an empty environment to be redefined in TeX
50 |
51 | \usepackage{alltt}
52 |
53 | % load packages
54 | \usepackage{tikz}
55 | \usepackage{graphicx}
56 | \usepackage{upquote}
57 | \usepackage{listings}
58 | \usepackage{hyperref}
59 | \usepackage{color}
60 | \usepackage{lmodern}
61 |
62 | \input{../header.tex}
63 |
64 | \title[Getting data from the web with R]{\LARGE Getting Data from the Web with R}
65 | \subtitle[Web Data in R]{\large Part 3: Basics of XML}
66 | \author[gastonsanchez.com]{
67 | \textcolor{gray}{\textbf{G}aston \textbf{S}anchez}
68 | }
69 | \institute[]{\scriptsize \textcolor{lightgray}{April-May 2014}}
70 | \date[CC BY-NC-SA 4.0]{
71 | \textcolor{lightgray}{\tiny{Content licensed under
72 | \href{http://creativecommons.org/licenses/by-nc-sa/4.0/}{CC BY-NC-SA 4.0}}}
73 | }
74 | \IfFileExists{upquote.sty}{\usepackage{upquote}}{}
75 | \begin{document}
76 |
77 |
78 |
79 |
80 | %--- the titlepage frame -------------------------%
81 |
82 | \begin{frame}[plain]
83 | \titlepage
84 | \end{frame}
85 |
86 | %------------------------------------------------
87 |
88 | { % all template changes are local to this group.
89 | \setbeamertemplate{navigation symbols}{}
90 | \begin{frame}[plain]
91 | \begin{tikzpicture}[remember picture,overlay]
92 | \node[at=(current page.center)] {
93 | \includegraphics[width=\paperwidth]{images/xmlcover.png}
94 | };
95 | \end{tikzpicture}
96 | \end{frame}
97 | }
98 |
99 | %------------------------------------------------
100 |
101 | \begin{frame}[fragile]
102 | \frametitle{Readme}
103 |
104 | \begin{block}{\scriptsize License:}
105 | \tiny
106 | \begin{itemize}
107 | \item[] Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License \\
108 | \url{http://creativecommons.org/licenses/by-nc-sa/4.0/}{}
109 | \end{itemize}
110 | \end{block}
111 |
112 | \begin{block}{\scriptsize You are free to:}
113 | \tiny
114 | \begin{itemize}
115 | \item[] \textcolor{darkgray}{\textbf{Share}} --- \textcolor{gray}{copy and redistribute the material}
116 | \item[] \textcolor{darkgray}{\textbf{Adapt}} --- \textcolor{gray}{rebuild and transform the material}
117 | \end{itemize}
118 | \end{block}
119 |
120 | \vspace{2mm}
121 | \begin{block}{\scriptsize Under the following conditions:}
122 | \tiny
123 | \begin{itemize}
124 | \item[] \textcolor{darkgray}{\textbf{Attribution}} --- \textcolor{gray}{You must give appropriate credit, provide a link to the license, and indicate if changes were made.}
125 | \item[] \textcolor{darkgray}{\textbf{NonCommercial}} --- \textcolor{gray}{You may not use this work for commercial purposes.}
126 | \item[] \textcolor{darkgray}{\textbf{Share Alike}} --- \textcolor{gray}{If you remix, transform, or build upon this
127 | work, you must distribute your contributions under the same license to this one.}
128 | \end{itemize}
129 | \end{block}
130 |
131 | \end{frame}
132 |
133 | %------------------------------------------------
134 |
135 | \begin{frame}
136 | \frametitle{Lectures Menu}
137 |
138 | \begin{columns}[t]
139 | \begin{column}{0.1\textwidth}
140 | %--- empty space ---%
141 | \end{column}
142 | \begin{column}{0.8\textwidth}
143 | \begin{block}{Slide Decks}
144 | \begin{enumerate}
145 | \item \textcolor{lightgray}{Introduction}
146 | \item \textcolor{lightgray}{Reading files from the Web}
147 | \item \textbf{Basics of XML and HTML}
148 | \item \textcolor{lightgray}{Parsing XML / HTML documents}
149 | \item \textcolor{lightgray}{Handling JSON data}
150 | \item \textcolor{lightgray}{HTTP Basics and the RCurl package}
151 | \item \textcolor{lightgray}{Getting data via Web Forms}
152 | \item \textcolor{lightgray}{Getting data via Web APIs}
153 | \end{enumerate}
154 | \end{block}
155 | \end{column}
156 | \begin{column}{0.1\textwidth}
157 | %--- empty space ---%
158 | \end{column}
159 | \end{columns}
160 |
161 | \end{frame}
162 |
163 | %------------------------------------------------
164 |
165 | \begin{frame}
166 | \begin{center}
167 | \Huge{\textcolor{mandarina}{Basics of XML \\ and HTML}}
168 | \end{center}
169 | \end{frame}
170 |
171 | %------------------------------------------------
172 |
173 | \begin{frame}
174 | \frametitle{Goal}
175 |
176 | \begin{columns}[t]
177 | \begin{column}{0.1\textwidth}
178 | %--- empty space ---%
179 | \end{column}
180 | \begin{column}{0.8\textwidth}
181 |
182 | \begin{block}{XML \& HTML}
183 | The goal of these slides is to give you a \textbf{crash introduction to XML and HTML} so you can get a good grasp of those formats for the rest of the lectures
184 | \end{block}
185 |
186 | \end{column}
187 | \begin{column}{0.1\textwidth}
188 | %--- empty space ---%
189 | \end{column}
190 | \end{columns}
191 |
192 | \end{frame}
193 |
194 | %------------------------------------------------
195 |
196 | \begin{frame}
197 | \frametitle{Synopsis}
198 |
199 | \begin{columns}[t]
200 | \begin{column}{0.1\textwidth}
201 | %--- empty space ---%
202 | \end{column}
203 | \begin{column}{0.8\textwidth}
204 |
205 | \begin{block}{In a nutshell}
206 | We'll cover the following concepts:
207 | \begin{itemize}
208 | \item Importance of XML and HTML
209 | \item Hierarchical Structure
210 | \item Document Object Model (DOM)
211 | \end{itemize}
212 | \end{block}
213 |
214 | \end{column}
215 | \begin{column}{0.1\textwidth}
216 | %--- empty space ---%
217 | \end{column}
218 | \end{columns}
219 |
220 | \end{frame}
221 |
222 | %------------------------------------------------
223 |
224 | \begin{frame}
225 | \frametitle{Some References}
226 |
227 | \begin{itemize}
228 | \item XML Files website {\scriptsize (\url{http://www.xmlfiles.com})} \\
229 | \low{by Jan Egil Refsnes}
230 | \item XML in a Nutshell \\
231 | \low{by Elliotte Rusty Harold; W. Scott Means}
232 | \item XML Tutorial {\scriptsize (\url{http://www.w3schools.com/xml/default.asp})} \\
233 | \low{by w3schools}
234 | \item Introduction to Data Technologies \\
235 | \low{by Paul Murrell}
236 | \item XML and Web Technologies for Data Sciences with R \\
237 | \low{by Deb Nolan and Duncan Temple Lang}
238 | \end{itemize}
239 |
240 | \end{frame}
241 |
242 | %------------------------------------------------
243 |
244 | \begin{frame}
245 | \frametitle{XML and HTML}
246 |
247 | \begin{block}{Why should you care about XML and HTML?}
248 | \begin{itemize}
249 | \item Large amounts of data and information are stored, shared and distributed using HTML and XML-dialects
250 | \item They are widely adopted and used in many applications
251 | \item Working with data from the Web means dealing with HTML
252 | \end{itemize}
253 | \end{block}
254 |
255 | \end{frame}
256 |
257 |
258 | %------------------------------------------------
259 |
260 | \begin{frame}
261 | \begin{center}
262 | {\Huge \textcolor{mandarina}{XML}} \\
263 | \bigskip
264 | {\Large \textcolor{mandarina}{eXtensible Markup Language}}
265 | \end{center}
266 | \end{frame}
267 |
268 | %------------------------------------------------
269 |
270 | { % all template changes are local to this group.
271 | \setbeamertemplate{navigation symbols}{}
272 | \begin{frame}[plain]
273 | \begin{tikzpicture}[remember picture,overlay]
274 | \node[at=(current page.center)] {
275 | \includegraphics[width=\paperwidth]{images/xml_plants_catalog.png}
276 | };
277 | \end{tikzpicture}
278 | \end{frame}
279 | }
280 |
281 | %------------------------------------------------
282 |
283 | \begin{frame}
284 | \frametitle{Some Definitions}
285 |
286 | \begin{quotation}
287 | ``XML is a markup language that defines a set of rules for encoding documents in a format that is both human-readable and machine-readable''
288 | \end{quotation}
289 |
290 | {\footnotesize
291 | \hspace{8mm} \url{http://en.wikipedia.org/wiki/XML}
292 | }
293 |
294 | \bigskip
295 | \begin{quotation}
296 | ``XML is a data description language used for describing data''
297 | \end{quotation}
298 |
299 | {\footnotesize
300 | \hspace{8mm} \high{Paul Murrell} \\
301 | \hspace{8mm} \low{Introduction to Data Technologies}
302 | }
303 |
304 | \end{frame}
305 |
306 | %------------------------------------------------
307 |
308 | \begin{frame}
309 | \frametitle{Some Definitions}
310 |
311 | \begin{quotation}
312 | ``XML is a very general structure with which we can define any number of new formats to represent arbitrary data''
313 | \end{quotation}
314 |
315 | \begin{quotation}
316 | ``XML is a standard for the semantic, hierarchical representation of data''
317 | \end{quotation}
318 |
319 | {\footnotesize
320 | \hspace{8mm} \high{Deb Nolan \& Duncan Temple Lang} \\
321 | \hspace{8mm} \low{XML and Web Technologies for Data Sciences with R}
322 | }
323 |
324 | \end{frame}
325 |
326 | %------------------------------------------------
327 |
328 | \begin{frame}
329 | \frametitle{About XML}
330 |
331 | \begin{columns}[t]
332 | \begin{column}{0.1\textwidth}
333 | %--- empty space ---%
334 | \end{column}
335 | \begin{column}{0.8\textwidth}
336 |
337 | \begin{block}{XML}
338 | XML stands for \textbf{eXtensible Markup Language}
339 | \end{block}
340 |
341 | \begin{block}{Broadly speaking ...}
342 | XML provides a flexible framework to create formats for describing and representing data
343 | \end{block}
344 |
345 | \end{column}
346 | \begin{column}{0.1\textwidth}
347 | %--- empty space ---%
348 | \end{column}
349 | \end{columns}
350 |
351 | \end{frame}
352 |
353 | %------------------------------------------------
354 |
355 | \begin{frame}
356 | \frametitle{Markups}
357 |
358 | \begin{block}{Markup}
359 | A \textbf{markup} is a sequence of characters or other symbols inserted at certain places in a document to indicate either:
360 | \begin{itemize}
361 | \item how the content should be displayed when printed or on screen
362 | \item how the document is structured
363 | \end{itemize}
364 | \end{block}
365 |
366 | \begin{block}{Markup Language}
367 | A markup language is a system for \textbf{annotating} (i.e. \textit{marking}) a document in a way that distinguishes the content from its representation \low{(e.g. LaTeX, PostScript, HTML, SVG)}
368 | \end{block}
369 |
370 | \end{frame}
371 |
372 | %------------------------------------------------
373 |
374 | \begin{frame}[fragile]
375 | \frametitle{Markups}
376 |
377 | \begin{block}{XML Markups}
378 | In XML (as well as in HTML) the marks (aka \textit{tags}) are defined using angle brackets: {\Huge \highcode{<>}}
379 | \end{block}
380 |
381 | \bigskip
382 |
383 | \code{ \high{<tag>}Text marked with special tag\high{</tag>} }
384 |
385 | \end{frame}
386 |
387 | %------------------------------------------------
388 |
389 | \begin{frame}[fragile]
390 | \frametitle{Extensible}
391 |
392 | \begin{block}{Extensible?}
393 | The concept of \textit{extensibility} means that we can define our own marks, the order in which they occur, and how they should be processed. For example:
394 | \begin{itemize}
395 | \item \highcode{<movie>}
396 | \item \highcode{<title>}
397 | \item \highcode{<director>}
398 | \item \highcode{<year>}
399 | \end{itemize}
400 | \end{block}
401 |
402 | \end{frame}
403 |
404 | %------------------------------------------------
405 |
406 | \begin{frame}
407 | \frametitle{About XML}
408 |
409 | \begin{block}{XML is NOT}
410 | \begin{itemize}
411 | \item a programming language
412 | \item a network transfer protocol
413 | \item a database
414 | \end{itemize}
415 | \end{block}
416 |
417 | \begin{block}{XML is}
418 | \begin{itemize}
419 | \item more than a markup language
420 | \item a generic language that provides structure and syntax for representing any type of information
421 | \item a meta-language: it allows us to create or define other languages
422 | \end{itemize}
423 | \end{block}
424 |
425 | \end{frame}
426 |
427 | %------------------------------------------------
428 |
429 | \begin{frame}
430 | \frametitle{XML Applications}
431 |
432 | \begin{block}{Some XML dialects}
433 | \begin{itemize}
434 | \item \textbf{KML} (\textit{Keyhole Markup Language}) for describing geo-spatial information used in Google Earth, Google Maps, Google Sky
435 | \item \textbf{SVG} (\textit{Scalable Vector Graphics}) for visual graphical displays of two-dimensional graphics with support for interactivity and animation
436 | \item \textbf{PMML} (\textit{Predictive Model Markup Language}) for describing and exchanging models produced by data mining and machine learning algorithms
437 | \end{itemize}
438 | \end{block}
439 |
440 | \end{frame}
441 |
442 |
443 | %------------------------------------------------
444 |
445 | \begin{frame}
446 | \frametitle{XML Applications (cont'd)}
447 |
448 | \begin{block}{Some XML dialects}
449 | \begin{itemize}
450 | \item \textbf{RSS} (\textit{Rich Site Summary}) feeds for publishing blog entries
451 | \item \textbf{SDMX} (\textit{Statistical Data and Metadata Exchange}) for organizing and exchanging statistical information
452 | \item \textbf{GML} (\textit{Geography Markup Language}) for representing geographical features
453 | \item \textbf{SBML} (\textit{Systems Biology Markup Language}) for describing biological systems
454 | \end{itemize}
455 | \end{block}
456 |
457 | \end{frame}
458 |
459 | %------------------------------------------------
460 |
461 | \begin{frame}
462 | \begin{center}
463 | \Huge{\textcolor{mandarina}{Minimalist Example}}
464 | \end{center}
465 | \end{frame}
466 |
467 | %------------------------------------------------
468 |
469 | { % all template changes are local to this group.
470 | \setbeamertemplate{navigation symbols}{}
471 | \begin{frame}[plain]
472 | \begin{tikzpicture}[remember picture,overlay]
473 | \node[at=(current page.center)] {
474 | \includegraphics[width=\paperwidth]{images/goodwillhunting.jpg}
475 | };
476 | \end{tikzpicture}
477 | \end{frame}
478 | }
479 |
480 | %------------------------------------------------
481 |
482 | \begin{frame}[fragile]
483 | \frametitle{XML Example}
484 |
485 | \begin{block}{Ultra Simple XML}
486 | \begin{verbatim}
487 | <movie>
488 |   Good Will Hunting
489 | </movie>
490 | \end{verbatim}
491 | \end{block}
492 |
493 | \bigskip
494 |
495 | \begin{itemize}
496 | \item one single element {\textit{movie}}
497 | \item start-tag: \highcode{<movie>}
498 | \item end-tag: \highcode{</movie>}
499 | \item content: \highcode{Good Will Hunting}
500 | \end{itemize}
501 |
502 | \end{frame}
503 |
504 | %------------------------------------------------
505 |
506 | \begin{frame}[fragile]
507 | \frametitle{XML Example}
508 |
509 | \begin{block}{Ultra Simple XML}
510 | \begin{verbatim}
511 | <movie mins="126" lang="eng">
512 |   Good Will Hunting
513 | </movie>
514 | \end{verbatim}
515 | \end{block}
516 |
517 | \bigskip
518 |
519 | \begin{itemize}
520 | \item XML elements can have \textbf{attributes}
521 | \item attributes: \highcode{mins} \low{(minutes)} and \highcode{lang} \low{(language)}
522 | \item attributes are \textit{attached} to the element's start tag
523 | \item attribute values \textbf{must be quoted!}
524 | \end{itemize}
525 |
526 | \end{frame}
527 |
528 | %------------------------------------------------
529 |
530 | \begin{frame}[fragile]
531 | \frametitle{XML Example}
532 |
533 | \begin{block}{Minimalist XML}
534 | \begin{verbatim}
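535 | <movie mins="126" lang="eng">
536 |   <title>Good Will Hunting</title>
537 |   <director>Gus Van Sant</director>
538 |   <year>1998</year>
539 |   <genre>drama</genre>
540 | </movie>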
541 | \end{verbatim}
542 | \end{block}
543 |
544 | \bigskip
545 |
546 | \begin{itemize}
547 | \item an XML element may contain other elements
548 | \item \textit{movie} contains several elements: \textit{title, director, year, genre}
549 | \end{itemize}
550 |
551 | \end{frame}
552 |
553 | %------------------------------------------------
554 |
555 | \begin{frame}[fragile]
556 | \frametitle{XML Example}
557 |
558 | \begin{block}{Simple XML}
559 | \begin{verbatim}
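560 | <movie mins="126" lang="eng">
561 |   <title>Good Will Hunting</title>
562 |   <director>
563 |     <first_name>Gus</first_name>
564 |     <last_name>Van Sant</last_name>
565 |   </director>
566 |   <year>1998</year>
567 |   <genre>drama</genre>
568 | </movie>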
569 | \end{verbatim}
570 | \end{block}
571 |
572 | \bigskip
573 |
574 | \begin{itemize}
575 | \item Now \textit{director} has two child elements: \textit{first\_name} and \textit{last\_name}
576 | \end{itemize}
577 |
578 | \end{frame}
579 |
580 | %------------------------------------------------
581 |
582 | \begin{frame}[fragile]
583 | \frametitle{XML Hierarchy Structure}
584 |
585 | \begin{block}{Conceptual XML}
586 | \begin{verbatim}
587 | <Root>
588 |   <child>...</child>
589 |   <child>...</child>
590 |   <child><subchild>...</subchild></child>
591 |   <child>...</child>
592 | </Root>
593 | \end{verbatim}
594 | \end{block}
595 |
596 | \bigskip
597 |
598 | \begin{itemize}
599 | \item An XML document can be represented with a \textbf{tree structure}
600 | \item An XML document must have \textbf{one single Root} element
601 | \item The \code{Root} may contain \code{child} elements
602 | \item A \code{child} element may contain \code{subchild} elements
603 | \end{itemize}
604 |
605 | \end{frame}
606 |
607 | %------------------------------------------------
608 |
609 | \begin{frame}
610 | \frametitle{XML Tree Structure}
611 |
612 | \begin{center}
613 | \includegraphics[width=9cm]{images/xml_movie_tree1.pdf}
614 | \end{center}
615 |
616 | \end{frame}
617 |
618 | %------------------------------------------------
619 |
620 | \begin{frame}
621 | \frametitle{XML Tree Structure (cont'd)}
622 |
623 | \begin{center}
624 | \includegraphics[width=9cm]{images/xml_movie_tree2.pdf}
625 | \end{center}
626 |
627 | \end{frame}
628 |
629 | %------------------------------------------------
630 |
631 | \begin{frame}
632 | \frametitle{Well-Formedness}
633 |
634 | \begin{block}{Well-formed XML}
635 | We say that an XML document is \textbf{well-formed} when it obeys the basic syntax rules of XML. Some of those rules are:
636 | \begin{itemize}
637 | \item one root element containing the rest of the elements
638 | \item properly nested elements
639 | \item every element is closed, with an end-tag or as a self-closing tag
640 | \item attributes appear in start-tags of elements
641 | \item attribute values must be quoted
642 | \item element names and attribute names are case sensitive
643 | \end{itemize}
644 | \end{block}
645 |
646 | \end{frame}
647 |
648 | %------------------------------------------------
649 |
650 | \begin{frame}
651 | \frametitle{Well-Formedness}
652 |
653 | \begin{block}{Importance of Well-formed XML}
654 | XML documents that are not well-formed produce potentially fatal errors or warnings when parsed.
655 |
656 | \bigskip
657 | Documents may be well-formed but not valid. Well-formedness only guarantees that the document follows the basic XML syntax, not that its content is valid.
658 | \end{block}
659 |
660 | \end{frame}
661 |
662 | %------------------------------------------------
663 |
664 | \begin{frame}
665 | \begin{center}
666 | \Huge{\textcolor{mandarina}{Additional XML Elements}}
667 | \end{center}
668 | \end{frame}
669 |
670 | %------------------------------------------------
671 |
672 | \begin{frame}[fragile]
673 | \frametitle{Some Additional Elements}
674 |
675 | \begin{block}{Example with extra elements}
676 | { \small
677 | \begin{verbatim}
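678 | <?xml version="1.0"?>
679 | <![CDATA[ a > 5 & b < 10 ]]>
680 | <?PI ...?>
681 | <!DOCTYPE movie>
682 | <!-- a comment -->
683 | <movie mins="126" lang="eng">
684 |   <title>Good Will Hunting</title>
685 |   <director>
686 |     <first_name>Gus</first_name>
687 |     <last_name>Van Sant</last_name>
688 |   </director>
689 |   <year>1998</year>
690 |   <genre>drama</genre>
691 | </movie>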
692 | \end{verbatim}
693 | }
694 | \end{block}
695 |
696 | \end{frame}
697 |
698 | %------------------------------------------------
699 |
700 | \begin{frame}
701 | \frametitle{Additional Elements}
702 |
703 | \begin{center}
704 | \textcolor{turquoise}{Additional (optional) XML elements}
705 |
706 | \bigskip
707 | \begin{tabular}{l l}
708 | \hline
709 | Markup & Description \\
710 | \hline
711 | \code{<?xml ... ?>} & XML Declaration \\
712 | & \low{identifies content as an XML document} \\
713 | \code{<?PI ... ?>} & Processing Instruction \\
714 | & \low{processing instructions passed to application \code{PI}} \\
715 | \code{<!DOCTYPE ... >} & Document-type Declaration \\
716 | & \low{defines the structure of an XML document} \\
717 | \code{<![CDATA[ ... ]]>} & CDATA Character Data \\
718 | & \low{content inside a CDATA section is not parsed as markup} \\
719 | \code{<!-- ... -->} & Comment \\
720 | & \low{for writing comments} \\
721 | \hline
722 | \end{tabular}
723 | \end{center}
724 |
725 | \end{frame}
726 |
727 | %------------------------------------------------
728 |
729 | \begin{frame}
730 | \frametitle{DTD}
731 |
732 | \begin{block}{Document-Type Declaration}
733 | The Document-type Declaration identifies the \textbf{type} of the document. The \textit{type} indicates the structure of a \textbf{valid} document:
734 |
735 | \begin{itemize}
736 | \item what elements are allowed to be present
737 | \item how elements can be combined
738 | \item how elements must be ordered
739 | \end{itemize}
740 | \end{block}
741 |
742 | Basically, the DTD (Document Type Definition) specifies what the format allows.
743 | \end{frame}
744 |
745 | %------------------------------------------------
746 |
747 | \begin{frame}
748 | \begin{center}
749 | \Huge{\textcolor{mandarina}{Wrapping Up}}
750 | \end{center}
751 | \end{frame}
752 |
753 | %------------------------------------------------
754 |
755 | \begin{frame}
756 | \frametitle{About XML}
757 |
758 | \begin{block}{About XML}
759 | \begin{itemize}
760 | \item designed to store and transfer data
761 | \item designed to be self-descriptive
762 | \item tags are not predefined and can be extended
763 | \end{itemize}
764 | \end{block}
765 |
766 | \end{frame}
767 |
768 | %------------------------------------------------
769 |
770 | \begin{frame}
771 | \frametitle{Characteristics of XML}
772 |
773 | \begin{block}{XML is}
774 | \begin{itemize}
775 | \item a generic language that provides structure and syntax for many markup dialects
776 | \item a syntax or format for defining markup languages
777 | \item a standard for the semantic, hierarchical representation of data
778 | \item a general approach for representing all types of information
779 | \end{itemize}
780 | \end{block}
781 |
782 | \end{frame}
783 |
784 | %------------------------------------------------
785 |
786 | \begin{frame}[fragile]
787 | \frametitle{XML document example}
788 |
789 | \begin{block}{Simple XML}
790 | \begin{verbatim}
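791 | <?xml version="1.0"?>
792 | <!DOCTYPE movie>
793 |
794 | <movie mins="126" lang="eng">
795 |   <title>Good Will Hunting</title>
796 |   <director>
797 |     <first_name>Gus</first_name>
798 |     <last_name>Van Sant</last_name>
799 |   </director>
800 |   <year>1998</year>
801 |   <genre>drama</genre>
802 | </movie>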
803 | \end{verbatim}
804 | \end{block}
805 |
806 | \end{frame}
807 |
808 | %------------------------------------------------
809 |
810 | \begin{frame}
811 | \frametitle{XML Tree Structure}
812 |
813 | \begin{block}{Each Node can have:}
814 | \begin{itemize}
815 | \item a Name
816 | \item any number of attributes
817 | \item optional content
818 | \item other nested elements
819 | \end{itemize}
820 | \end{block}
821 |
822 | \begin{block}{Traversing the tree}
823 | There's a \textbf{unique} path from the root node to any given node
824 | \end{block}
825 |
826 | \end{frame}
827 |
828 | %------------------------------------------------
829 |
830 | \end{document}
831 |
--------------------------------------------------------------------------------
/03-xml-basics/images/blacksmith.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gastonstat/tutorial-R-web-data/2cdb84b2dc4512d7a33f99afd3ceb8e8a4e63bd0/03-xml-basics/images/blacksmith.png
--------------------------------------------------------------------------------
/03-xml-basics/images/goodwillhunting.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gastonstat/tutorial-R-web-data/2cdb84b2dc4512d7a33f99afd3ceb8e8a4e63bd0/03-xml-basics/images/goodwillhunting.jpg
--------------------------------------------------------------------------------
/03-xml-basics/images/parsing_xml.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gastonstat/tutorial-R-web-data/2cdb84b2dc4512d7a33f99afd3ceb8e8a4e63bd0/03-xml-basics/images/parsing_xml.jpg
--------------------------------------------------------------------------------
/03-xml-basics/images/spider_web.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gastonstat/tutorial-R-web-data/2cdb84b2dc4512d7a33f99afd3ceb8e8a4e63bd0/03-xml-basics/images/spider_web.jpg
--------------------------------------------------------------------------------
/03-xml-basics/images/xml_movie_tree1.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gastonstat/tutorial-R-web-data/2cdb84b2dc4512d7a33f99afd3ceb8e8a4e63bd0/03-xml-basics/images/xml_movie_tree1.pdf
--------------------------------------------------------------------------------
/03-xml-basics/images/xml_movie_tree2.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gastonstat/tutorial-R-web-data/2cdb84b2dc4512d7a33f99afd3ceb8e8a4e63bd0/03-xml-basics/images/xml_movie_tree2.pdf
--------------------------------------------------------------------------------
/03-xml-basics/images/xml_plants_catalog.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gastonstat/tutorial-R-web-data/2cdb84b2dc4512d7a33f99afd3ceb8e8a4e63bd0/03-xml-basics/images/xml_plants_catalog.png
--------------------------------------------------------------------------------
/03-xml-basics/images/xmlcover.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gastonstat/tutorial-R-web-data/2cdb84b2dc4512d7a33f99afd3ceb8e8a4e63bd0/03-xml-basics/images/xmlcover.png
--------------------------------------------------------------------------------
/04-parsing-xml/04-parsing-xml.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gastonstat/tutorial-R-web-data/2cdb84b2dc4512d7a33f99afd3ceb8e8a4e63bd0/04-parsing-xml/04-parsing-xml.pdf
--------------------------------------------------------------------------------
/04-parsing-xml/images/mailing_lists.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gastonstat/tutorial-R-web-data/2cdb84b2dc4512d7a33f99afd3ceb8e8a4e63bd0/04-parsing-xml/images/mailing_lists.png
--------------------------------------------------------------------------------
/04-parsing-xml/images/mailing_sig.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gastonstat/tutorial-R-web-data/2cdb84b2dc4512d7a33f99afd3ceb8e8a4e63bd0/04-parsing-xml/images/mailing_sig.png
--------------------------------------------------------------------------------
/04-parsing-xml/images/mailing_sig_source.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gastonstat/tutorial-R-web-data/2cdb84b2dc4512d7a33f99afd3ceb8e8a4e63bd0/04-parsing-xml/images/mailing_sig_source.png
--------------------------------------------------------------------------------
/04-parsing-xml/images/xml_package.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gastonstat/tutorial-R-web-data/2cdb84b2dc4512d7a33f99afd3ceb8e8a4e63bd0/04-parsing-xml/images/xml_package.png
--------------------------------------------------------------------------------
/04-parsing-xml/images/xml_tree_navigate.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gastonstat/tutorial-R-web-data/2cdb84b2dc4512d7a33f99afd3ceb8e8a4e63bd0/04-parsing-xml/images/xml_tree_navigate.pdf
--------------------------------------------------------------------------------
/04-parsing-xml/images/xmlparsing_cover.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gastonstat/tutorial-R-web-data/2cdb84b2dc4512d7a33f99afd3ceb8e8a4e63bd0/04-parsing-xml/images/xmlparsing_cover.png
--------------------------------------------------------------------------------
/04-parsing-xml/images/xpath_director.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gastonstat/tutorial-R-web-data/2cdb84b2dc4512d7a33f99afd3ceb8e8a4e63bd0/04-parsing-xml/images/xpath_director.pdf
--------------------------------------------------------------------------------
/04-parsing-xml/images/xpath_firstname.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gastonstat/tutorial-R-web-data/2cdb84b2dc4512d7a33f99afd3ceb8e8a4e63bd0/04-parsing-xml/images/xpath_firstname.pdf
--------------------------------------------------------------------------------
/04-parsing-xml/images/xpath_lastname.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gastonstat/tutorial-R-web-data/2cdb84b2dc4512d7a33f99afd3ceb8e8a4e63bd0/04-parsing-xml/images/xpath_lastname.pdf
--------------------------------------------------------------------------------
/04-parsing-xml/images/xpath_movie.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gastonstat/tutorial-R-web-data/2cdb84b2dc4512d7a33f99afd3ceb8e8a4e63bd0/04-parsing-xml/images/xpath_movie.pdf
--------------------------------------------------------------------------------
/04-parsing-xml/images/xpath_title.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gastonstat/tutorial-R-web-data/2cdb84b2dc4512d7a33f99afd3ceb8e8a4e63bd0/04-parsing-xml/images/xpath_title.pdf
--------------------------------------------------------------------------------
/04-parsing-xml/images/xpath_tree.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gastonstat/tutorial-R-web-data/2cdb84b2dc4512d7a33f99afd3ceb8e8a4e63bd0/04-parsing-xml/images/xpath_tree.pdf
--------------------------------------------------------------------------------
/04-parsing-xml/images/xpath_ytmt.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gastonstat/tutorial-R-web-data/2cdb84b2dc4512d7a33f99afd3ceb8e8a4e63bd0/04-parsing-xml/images/xpath_ytmt.pdf
--------------------------------------------------------------------------------
/05-json-data/05-json-data.Rnw:
--------------------------------------------------------------------------------
1 | \documentclass{beamer}
2 |
3 | % load packages
4 | \usepackage{tikz}
5 | \usepackage{graphicx}
6 | \usepackage{upquote}
7 | \usepackage{listings}
8 | \usepackage{hyperref}
9 | \usepackage{color}
10 | \usepackage{lmodern}
11 |
12 | \input{../header.tex}
13 |
14 | \title[Getting data from the web with R]{\LARGE Getting Data from the Web with R}
15 | \subtitle[Web Data in R]{\large Part 5: Handling JSON data}
16 | \author[gastonsanchez.com]{
17 | \textcolor{gray}{\textbf{G}aston \textbf{S}anchez}
18 | }
19 | \institute[]{\scriptsize \textcolor{lightgray}{April-May 2014}}
20 | \date[CC BY-NC-SA 4.0]{
21 | \textcolor{lightgray}{\tiny{Content licensed under
22 | \href{http://creativecommons.org/licenses/by-nc-sa/4.0/}{CC BY-NC-SA 4.0}}}
23 | }
24 |
25 |
26 | \begin{document}
27 | <<>>=
28 | # smaller font size for chunks
29 | opts_chunk$set(size = 'tiny')
30 | thm <- knit_theme$get("bclear")
31 | knit_theme$set(thm)
32 | options(width=78)
33 | @
34 |
35 |
36 | %--- the titlepage frame -------------------------%
37 |
38 | \begin{frame}[plain]
39 | \titlepage
40 | \end{frame}
41 |
42 | %------------------------------------------------
43 |
44 | { % all template changes are local to this group.
45 | \setbeamertemplate{navigation symbols}{}
46 | \begin{frame}[plain]
47 | \begin{tikzpicture}[remember picture,overlay]
48 | \node[at=(current page.center)] {
49 | \includegraphics[width=\paperwidth]{images/json_cover.png}
50 | };
51 | \end{tikzpicture}
52 | \end{frame}
53 | }
54 |
55 | %------------------------------------------------
56 |
57 | \begin{frame}[fragile]
58 | \frametitle{Readme}
59 |
60 | \begin{block}{\scriptsize License:}
61 | \tiny
62 | \begin{itemize}
63 | \item[] Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License \\
64 | \url{http://creativecommons.org/licenses/by-nc-sa/4.0/}{}
65 | \end{itemize}
66 | \end{block}
67 |
68 | \begin{block}{\scriptsize You are free to:}
69 | \tiny
70 | \begin{itemize}
71 | \item[] \textcolor{darkgray}{\textbf{Share}} --- \textcolor{gray}{copy and redistribute the material}
72 | \item[] \textcolor{darkgray}{\textbf{Adapt}} --- \textcolor{gray}{rebuild and transform the material}
73 | \end{itemize}
74 | \end{block}
75 |
76 | \vspace{2mm}
77 | \begin{block}{\scriptsize Under the following conditions:}
78 | \tiny
79 | \begin{itemize}
80 | \item[] \textcolor{darkgray}{\textbf{Attribution}} --- \textcolor{gray}{You must give appropriate credit, provide a link to the license, and indicate if changes were made.}
81 | \item[] \textcolor{darkgray}{\textbf{NonCommercial}} --- \textcolor{gray}{You may not use this work for commercial purposes.}
82 | \item[] \textcolor{darkgray}{\textbf{Share Alike}} --- \textcolor{gray}{If you remix, transform, or build upon this
83 | work, you must distribute your contributions under the same license as the original.}
84 | \end{itemize}
85 | \end{block}
86 |
87 | \end{frame}
88 |
89 | %------------------------------------------------
90 |
91 | \begin{frame}
92 | \frametitle{Lectures Menu}
93 |
94 | \begin{columns}[t]
95 | \begin{column}{0.1\textwidth}
96 | %--- empty space ---%
97 | \end{column}
98 | \begin{column}{0.8\textwidth}
99 | \begin{block}{Slide Decks}
100 | \begin{enumerate}
101 | \item \textcolor{lightgray}{Introduction}
102 | \item \textcolor{lightgray}{Reading files from the Web}
103 | \item \textcolor{lightgray}{Basics of XML and HTML}
104 | \item \textcolor{lightgray}{Parsing XML / HTML content}
105 | \item \textbf{Handling JSON data}
106 | \item \textcolor{lightgray}{HTTP Basics and the RCurl Package}
107 | \item \textcolor{lightgray}{Getting data via Web Forms}
108 | \item \textcolor{lightgray}{Getting data via Web APIs}
109 | \end{enumerate}
110 | \end{block}
111 | \end{column}
112 | \begin{column}{0.1\textwidth}
113 | %--- empty space ---%
114 | \end{column}
115 | \end{columns}
116 |
117 | \end{frame}
118 |
119 | %------------------------------------------------
120 |
121 | \begin{frame}
122 | \begin{center}
123 | \Huge{\textcolor{mandarina}{JSON Data}}
124 | \end{center}
125 | \end{frame}
126 |
127 | %------------------------------------------------
128 |
129 | \begin{frame}
130 | \frametitle{Goal}
131 |
132 | \begin{columns}[t]
133 | \begin{column}{0.1\textwidth}
134 | %--- empty space ---%
135 | \end{column}
136 | \begin{column}{0.8\textwidth}
137 |
138 | \begin{block}{JSON}
139 | The goal of these slides is to provide an introduction to \textbf{handling JSON data in R}
140 | \end{block}
141 |
142 | \end{column}
143 | \begin{column}{0.1\textwidth}
144 | %--- empty space ---%
145 | \end{column}
146 | \end{columns}
147 |
148 | \end{frame}
149 |
150 | %------------------------------------------------
151 |
152 | \begin{frame}
153 | \frametitle{Synopsis}
154 |
155 | \begin{columns}[t]
156 | \begin{column}{0.1\textwidth}
157 | %--- empty space ---%
158 | \end{column}
159 | \begin{column}{0.8\textwidth}
160 |
161 | \begin{block}{In a nutshell}
162 | We'll cover the following topics:
163 | \begin{itemize}
164 | \item JSON Basics
165 | \item R packages for JSON data
166 | \item Reading JSON data from the Web
167 | \end{itemize}
168 | \end{block}
169 |
170 | \end{column}
171 | \begin{column}{0.1\textwidth}
172 | %--- empty space ---%
173 | \end{column}
174 | \end{columns}
175 |
176 | \end{frame}
177 |
178 | %------------------------------------------------
179 |
180 | \begin{frame}
181 | \frametitle{Some References}
182 |
183 | \begin{itemize}
184 | \item XML and Web Technologies for Data Sciences with R \\
185 | \low{by Deb Nolan and Duncan Temple Lang}
186 | \item Introducing JSON \\
187 | {\scriptsize \url{http://www.json.org/}}
188 | \item R package RJSONIO \\
189 | {\scriptsize \url{http://cran.r-project.org/web/packages/RJSONIO/index.html}}
190 | \item R package jsonlite \\
191 | {\scriptsize \url{http://cran.r-project.org/web/packages/jsonlite/vignettes/json-mapping.pdf}}
192 | \item R package rjson \\
193 | {\scriptsize \url{http://cran.r-project.org/web/packages/rjson/index.html}}
194 | \end{itemize}
195 |
196 | \end{frame}
197 |
198 | %------------------------------------------------
199 |
200 | \begin{frame}
201 | \begin{center}
202 | \Huge{\textcolor{mandarina}{JSON Basics}}
203 | \end{center}
204 | \end{frame}
205 |
206 | %------------------------------------------------
207 |
208 | \begin{frame}
209 | \frametitle{Basics First}
210 |
211 | \begin{columns}[t]
212 | \begin{column}{0.1\textwidth}
213 | %--- empty space ---%
214 | \end{column}
215 | \begin{column}{0.7\textwidth}
216 |
217 | \begin{block}{Fundamentals}
218 | JSON stands for \textbf{JavaScript Object Notation} \\
219 | and it is a format for representing data
220 | \begin{itemize}
221 | \item general purpose format
222 | \item lightweight format
223 | \item widely popular
224 | \item fairly simple
225 | \end{itemize}
226 | \end{block}
227 |
228 | \end{column}
229 | \begin{column}{0.1\textwidth}
230 | %--- empty space ---%
231 | \end{column}
232 | \end{columns}
233 |
234 | \end{frame}
235 |
236 | %------------------------------------------------
237 |
238 | \begin{frame}
239 | \frametitle{Basics First}
240 |
241 | \begin{block}{Why should we care?}
242 | When working with data from the Web, we'll inevitably find some JSON data
243 | \begin{itemize}
244 | \item JSON can be used directly in JavaScript code for Web pages
245 | \item many Web APIs provide data in JSON format
246 | \item R has packages designed to handle JSON data
247 | \end{itemize}
248 | \end{block}
249 |
250 | \end{frame}
251 |
252 | %------------------------------------------------
253 |
254 | \begin{frame}
255 | \begin{center}
256 | \Huge{\textcolor{mandarina}{Understanding JSON}}
257 | \end{center}
258 | \end{frame}
259 |
260 | %------------------------------------------------
261 |
262 | \begin{frame}
263 | \frametitle{Understanding JSON}
264 |
265 | \begin{columns}[t]
266 | \begin{column}{0.3\textwidth}
267 | \begin{block}{JSON Data Types}
268 | \begin{itemize}
269 | \item[] \highcode{null}
270 | \item[] \highcode{true}
271 | \item[] \highcode{false}
272 | \item[] \highcode{number}
273 | \item[] \highcode{string}
274 | \end{itemize}
275 | \end{block}
276 | \end{column}
277 |
278 | \begin{column}{0.4\textwidth}
279 | \begin{block}{JSON Data Containers}
280 | \begin{itemize}
281 | \item[] square brackets \highcode{[ ]}
282 | \item[] curly brackets \highcode{\{ \}}
283 | \end{itemize}
284 | \end{block}
285 | \end{column}
286 | \end{columns}
287 |
288 | \end{frame}
289 |
290 | %------------------------------------------------
291 |
292 | \begin{frame}
293 | \frametitle{JSON Arrays}
294 |
295 | \begin{block}{Unnamed Arrays}
296 | Square brackets \highcode{[ ]} are used for \textbf{ordered unnamed arrays}
297 | \begin{itemize}
298 | \item \code{ [ 1, 2, 3, ... ] }
299 | \item \code{ [ true, true, false, ... ] }
300 | \end{itemize}
301 | \end{block}
302 |
303 | \begin{block}{Named Arrays}
304 | Curly brackets \highcode{\{ \}} are used for \textbf{named arrays} (JSON calls these \textbf{objects})
305 | \begin{itemize}
306 | \item \code{ \{ "dollars" : 5, "euros" : 20, ... \} }
307 | \item \code{ \{ "city" : "Berkeley", "state" : "CA", ... \} }
308 | \end{itemize}
309 | \end{block}
310 |
311 | \end{frame}
312 |
313 | %------------------------------------------------
314 |
315 | \begin{frame}[fragile]
316 | \frametitle{JSON Arrays}
317 |
318 | Containers can be nested
319 |
320 | \begin{columns}[t]
321 | \begin{column}{0.5\textwidth}
322 | \begin{block}{Example A}
323 | {\footnotesize
324 | \begin{verbatim}
325 | {
326 | "name": ["X", "Y", "Z"],
327 | "grams": [300, 200, 500],
328 | "qty": [4, 5, null],
329 | "new": [true, false, true]
330 | }
331 | \end{verbatim}
332 | }
333 | \end{block}
334 | \end{column}
335 |
336 | \begin{column}{0.5\textwidth}
337 | \begin{block}{Example B}
338 | {\footnotesize
339 | \begin{verbatim}
340 | [
341 | { "name": "X",
342 | "grams": 300,
343 | "qty": 4,
344 | "new": true },
345 | { "name": "Y",
346 | "grams": 200,
347 | "qty": 5,
348 | "new": false },
349 | { "name": "Z",
350 | "grams": 500,
351 | "qty": null,
352 | "new": true}
353 | ]
354 | \end{verbatim}
355 | }
356 | \end{block}
357 | \end{column}
358 | \end{columns}
359 |
360 | \end{frame}
361 |
362 | %------------------------------------------------
363 |
364 | \begin{frame}[fragile]
365 | \frametitle{Data Table Toy Example}
366 |
367 | \begin{center}
368 | \textcolor{turquoise}{Imagine we have some data}
369 |
370 | \bigskip
371 | \begin{tabular}{l l l l l}
372 | \hline
373 | Name & Gender & Homeworld & Born & Jedi \\
374 | \hline
375 | Anakin & male & Tatooine & 41.9BBY & yes \\
376 | Amidala & female & Naboo & 46BBY & no \\
377 | Luke & male & Tatooine & 19BBY & yes \\
378 | Leia & female & Alderaan & 19BBY & no \\
379 | Obi-Wan & male & Stewjon & 57BBY & yes \\
380 | Han & male & Corellia & 29BBY & no \\
381 | Palpatine & male & Naboo & 82BBY & no \\
382 | R2-D2 & unknown & Naboo & 33BBY & no \\
383 | \hline
384 | \end{tabular}
385 | \end{center}
386 |
387 | There are several ways to represent this data in JSON format
388 |
389 | \end{frame}
390 |
391 | %------------------------------------------------
392 |
393 | \begin{frame}[fragile]
394 | \frametitle{One way to represent data}
395 |
396 | {\footnotesize
397 | \begin{verbatim}
398 | [
399 | {
400 | "Name": "Anakin",
401 | "Gender": "male",
402 | "Homeworld": "Tatooine",
403 | "Born": "41.9BBY",
404 | "Jedi": "yes"
405 | },
406 | ...
407 | {
408 | "Name": "R2-D2",
409 | "Gender": "unknown",
410 | "Homeworld": "Naboo",
411 | "Born": "33BBY",
412 | "Jedi": "no"
413 | }
414 | ]
415 | \end{verbatim}
416 | }
417 | \end{frame}
418 |
419 | %------------------------------------------------
420 |
421 | \begin{frame}[fragile]
422 | \frametitle{Another way to represent data}
423 |
424 | {\footnotesize
425 | \begin{verbatim}
426 | {
427 | "Name": [ "Anakin", "Amidala", "Luke", ... , "R2-D2" ],
428 | "Gender": [ "male", "female", "male", ... , "unknown" ],
429 | "Homeworld": [ "Tatooine", "Naboo", "Tatooine", ... , "Naboo" ],
430 | "Born": [ "41.9BBY", "46BBY", "19BBY", ... , "33BBY" ],
431 | "Jedi": [ "yes", "no", "yes", ... , "no" ]
432 | }
433 |
434 | \end{verbatim}
435 | }
436 | \end{frame}
437 |
438 | %------------------------------------------------
439 |
440 | \begin{frame}
441 | \begin{center}
442 | \Huge{\textcolor{mandarina}{JSON R packages}}
443 | \end{center}
444 | \end{frame}
445 |
446 | %------------------------------------------------
447 |
448 | \begin{frame}
449 | \frametitle{R packages}
450 |
451 | \begin{block}{R packages for JSON}
452 | Three commonly used R packages for working with JSON data:
453 |
454 | \begin{itemize}
455 | \item \highcode{"RJSONIO"} by Duncan Temple Lang
456 | \item \highcode{"rjson"} by Alex Couture-Beil
457 | \item \highcode{"jsonlite"} by Jeroen Ooms, Duncan Temple Lang, Jonathan Wallace
458 | \end{itemize}
459 | \end{block}
460 |
461 | All three packages provide two main functions, \highcode{toJSON()} and \highcode{fromJSON()}, that convert \textbf{to} and \textbf{from} JSON format, respectively. \\
462 | \low{We'll focus on the functions from \code{"RJSONIO"}}
463 |
464 | \end{frame}
465 |
466 | %------------------------------------------------
467 |
468 | \begin{frame}[fragile]
469 | \frametitle{R package RJSONIO}
470 |
471 | \begin{block}{R package \code{"RJSONIO"}}
472 | If you don't have \code{"RJSONIO"} you'll have to install it:
473 | <<>>=
474 | # install RJSONIO
475 | install.packages("RJSONIO", dependencies = TRUE)
476 | @
477 | \end{block}
478 |
479 | \end{frame}
480 |
481 | %------------------------------------------------
482 |
483 | \begin{frame}[fragile]
484 | \frametitle{R package RJSONIO}
485 |
486 | \begin{block}{Main functions}
487 | There are 2 primary functions in \code{"RJSONIO"}
488 | \begin{itemize}
489 | \item \highcode{toJSON()} converts an R object to a JSON string
490 | \item \highcode{fromJSON()} converts JSON content to R objects
491 | \end{itemize}
492 | \end{block}
493 |
494 | \end{frame}
495 |
496 | %------------------------------------------------
497 |
498 | \begin{frame}[fragile]
499 | \frametitle{\code{toJSON()}}
500 |
501 | \begin{block}{Function \code{toJSON()}}
502 | \begin{verbatim}
503 | toJSON(x, container = isContainer(x, asIs, .level),
504 | collapse = "\n", ...)
505 | \end{verbatim}
506 |
507 | \begin{itemize}
508 | \item \highcode{x} the R object to be converted to JSON format
509 | \item \highcode{container} whether to treat the object as a vector/container or a scalar
510 | \item \highcode{collapse} string used as separator when combining the individual lines of the generated JSON content
511 | \item \highcode{...} additional arguments controlling the JSON formatting
512 | \end{itemize}
513 | \end{block}
514 |
515 | \end{frame}
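
%------------------------------------------------

\begin{frame}[fragile]
\frametitle{\code{toJSON()}: a minimal sketch}

A small sketch of the arguments described on the previous slide, assuming \code{"RJSONIO"} is installed. The object names \code{amounts} and \code{place} are made up for illustration.

<<>>=
library(RJSONIO)

# an unnamed atomic vector becomes a JSON array: [ ... ]
amounts = c(5, 20, 7.5)
cat(toJSON(amounts), "\n")

# a named list becomes a JSON object: { ... }
place = list(city = "Berkeley", state = "CA")
cat(toJSON(place), "\n")

# 'collapse' sets the separator between the generated lines
place_json = toJSON(place, collapse = "")

# the result is a single character string holding valid JSON
is.character(place_json)
isValidJSON(place_json, asText = TRUE)
@

\end{frame}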
516 |
517 | %------------------------------------------------
518 |
519 | \begin{frame}[fragile]
520 | \frametitle{\code{fromJSON()}}
521 |
522 | \begin{block}{Function \code{fromJSON()}}
523 | \begin{verbatim}
524 | fromJSON(content, handler = NULL, default.size = 100,
525 | depth = 150L, allowComments = TRUE, ...)
526 | \end{verbatim}
527 |
528 | \begin{itemize}
529 | \item \highcode{content} the JSON content: either a file name or a character string
530 | \item \highcode{handler} R object responsible for processing each individual token/element
531 | \item \highcode{default.size} size to use for arrays and objects in an effort to avoid reallocating each time we add a new element.
532 | \item \highcode{depth} maximum number of nested JSON levels
533 | \item \highcode{allowComments} whether to allow C-style comments within the JSON content
534 | \item \highcode{...} additional parameters
535 | \end{itemize}
536 | \end{block}
537 |
538 | \end{frame}
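
%------------------------------------------------

\begin{frame}[fragile]
\frametitle{\code{fromJSON()}: a minimal sketch}

A short sketch of \code{fromJSON()} applied to JSON content given as character strings, assuming \code{"RJSONIO"} is installed. The JSON snippets reuse the toy values from the earlier slides.

<<>>=
library(RJSONIO)

# JSON object supplied as a character string
money_json = '{ "dollars": 5, "euros": 20 }'
money = fromJSON(money_json)
money[["dollars"]]

# a JSON array of objects comes back with one element per object
items_json = '[ {"name": "X", "qty": 4}, {"name": "Y", "qty": 5} ]'
items = fromJSON(items_json)
length(items)
items[[2]][["name"]]
@

\end{frame}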
539 |
540 | %------------------------------------------------
541 |
542 | \begin{frame}[fragile]
543 | \frametitle{Data Table Toy Example}
544 |
545 | \begin{center}
546 | \textcolor{turquoise}{Imagine we have some tabular data}
547 | \end{center}
548 |
549 | \begin{center}
550 | \begin{tabular}{l l l l l}
551 | \hline
552 | Name & Gender & Homeworld & Born & Jedi \\
553 | \hline
554 | Anakin & male & Tatooine & 41.9BBY & yes \\
555 | Amidala & female & Naboo & 46BBY & no \\
556 | Luke & male & Tatooine & 19BBY & yes \\
557 | Leia & female & Alderaan & 19BBY & no \\
558 | Obi-Wan & male & Stewjon & 57BBY & yes \\
559 | Han & male & Corellia & 29BBY & no \\
560 | Palpatine & male & Naboo & 82BBY & no \\
561 | R2-D2 & unknown & Naboo & 33BBY & no \\
562 | \hline
563 | \end{tabular}
564 | \end{center}
565 |
566 | \end{frame}
567 |
568 | %------------------------------------------------
569 |
570 | \begin{frame}[fragile]
571 | \frametitle{R Data Frame}
572 |
573 | <<>>=
574 | # toy data
575 | sw_data = rbind(
576 | c("Anakin", "male", "Tatooine", "41.9BBY", "yes"),
577 | c("Amidala", "female", "Naboo", "46BBY", "no"),
578 | c("Luke", "male", "Tatooine", "19BBY", "yes"),
579 | c("Leia", "female", "Alderaan", "19BBY", "no"),
580 | c("Obi-Wan", "male", "Stewjon", "57BBY", "yes"),
581 | c("Han", "male", "Corellia", "29BBY", "no"),
582 | c("Palpatine", "male", "Naboo", "82BBY", "no"),
583 | c("R2-D2", "unknown", "Naboo", "33BBY", "no"))
584 |
585 | # convert to data.frame and add column names
586 | swdf = data.frame(sw_data)
587 | names(swdf) = c("Name", "Gender", "Homeworld", "Born", "Jedi")
588 | swdf
589 | @
590 |
591 | \end{frame}
592 |
593 | %------------------------------------------------
594 |
595 | \begin{frame}[fragile]
596 | \frametitle{From R to JSON}
597 |
598 | <<>>=
599 | # load RJSONIO
600 | library(RJSONIO)
601 |
602 | # convert R data.frame to JSON
603 | sw_json = toJSON(swdf)
604 |
605 | # what class?
606 | class(sw_json)
607 |
608 | # display JSON format
609 | cat(sw_json)
610 | @
611 |
612 | \end{frame}
613 |
614 | %------------------------------------------------
615 |
616 | \begin{frame}[fragile]
617 | \frametitle{From JSON to R}
618 |
619 | <<>>=
620 | # convert JSON string to R list
621 | sw_R = fromJSON(sw_json)
622 |
623 | # what class?
624 | class(sw_R)
625 |
626 | # display the resulting R list
627 | sw_R
628 | @
629 |
630 | \end{frame}
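
%------------------------------------------------

\begin{frame}[fragile]
\frametitle{From JSON back to a data frame}

The list returned by \code{fromJSON()} can be turned back into a data frame. This is a minimal sketch on a two-row version of the toy data, assuming each list element comes back as one column of equal length.

<<>>=
library(RJSONIO)

# a small data frame (two rows of the toy data)
df = data.frame(
  Name = c("Anakin", "Leia"),
  Homeworld = c("Tatooine", "Alderaan"),
  stringsAsFactors = FALSE)

# round trip: data.frame -> JSON string -> R list
df_list = fromJSON(toJSON(df))
class(df_list)

# each list element holds one column, so the list
# can be converted back into a data frame
df_back = as.data.frame(df_list, stringsAsFactors = FALSE)
df_back
@

\end{frame}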
631 |
632 | %------------------------------------------------
633 |
634 | \begin{frame}
635 | \begin{center}
636 | \Huge{\textcolor{mandarina}{Reading JSON Data}}
637 | \end{center}
638 | \end{frame}
639 |
640 | %------------------------------------------------
641 |
642 | \begin{frame}
643 | \frametitle{JSON Data from the Web}
644 |
645 | \begin{block}{How do we read JSON data from the Web?}
646 | We can read JSON data in several ways. One way is to pass the URL directly to \code{fromJSON()}. Another is to pass \code{fromJSON()} either the name of a file with the JSON content or the content itself as a single string.
647 | \end{block}
648 |
649 | \end{frame}
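
%------------------------------------------------

\begin{frame}[fragile]
\frametitle{File name or string: a minimal sketch}

A sketch of two of the options above, assuming \code{"RJSONIO"} is installed. A temporary file stands in for a downloaded file; a URL that serves plain JSON would be passed to \code{fromJSON()} in the same way.

<<>>=
library(RJSONIO)

# JSON content as a single string
json_str = '{ "city": "Berkeley", "state": "CA" }'

# write the same content to a temporary file
json_file = tempfile(fileext = ".json")
writeLines(json_str, json_file)

# fromJSON() accepts the string itself ...
from_string = fromJSON(json_str)
# ... or the name of a file with the JSON content
from_file = fromJSON(json_file)

# compare both results
identical(from_string, from_file)
@

\end{frame}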
650 |
651 | %------------------------------------------------
652 |
653 | \begin{frame}
654 | \frametitle{File: miserables.js}
655 |
656 | We'll read the \textit{miserables} dataset from: \\
657 | \url{http://mbostock.github.io/protovis/ex/miserables.js}
658 |
659 | \begin{center}
660 | \includegraphics[width=8cm]{images/miserables_js.png}
661 | \end{center}
662 |
663 | \end{frame}
664 |
665 | %------------------------------------------------
666 |
667 | \begin{frame}
668 | \frametitle{Reading Issues}
669 |
670 | \begin{block}{Houston we have a problem ...}
671 | The data is in a file that contains several JavaScript comments and some other JavaScript notation that is not valid JSON.
672 |
673 | \bigskip
674 | Unfortunately, we cannot use any of the \code{fromJSON()} functions directly on this type of content.
675 |
676 | \bigskip
677 | Instead, we need to read the content as text, get rid of the comments, and change some characters before using \code{fromJSON()}
678 | \end{block}
679 |
680 | \end{frame}
681 |
682 | %------------------------------------------------
683 |
684 | \begin{frame}[fragile]
685 | \frametitle{Reading \code{miserables.js}}
686 |
687 | <<>>=
688 | # load RJSONIO and jsonlite
689 | library(RJSONIO)
690 | library(jsonlite)
691 |
692 | # url with JSON content
693 | miser = "http://mbostock.github.io/protovis/ex/miserables.js"
694 |
695 | # import content as text (character vector)
696 | miserables = readLines(miser)
697 |
698 | # eliminate first 11 lines (containing comments)
699 | miserables = miserables[-c(1:11)]
700 | @
701 |
702 | <<>>=
703 | library(RJSONIO)
704 | library(jsonlite)
705 | miser = "/Users/Gaston/Documents/Data_Technologies/data/miserables.txt"
706 | miserables = readLines(miser)
707 | miserables = miserables[-c(1:11)]
708 | @
709 |
710 | Now check the first and the last lines:
711 |
712 | <<>>=
713 | # first line
714 | miserables[1]
715 | # last line
716 | miserables[length(miserables)]
717 | @
718 |
719 | \end{frame}
720 |
721 | %------------------------------------------------
722 |
723 | \begin{frame}[fragile]
724 | \frametitle{Preparing JSON content}
725 |
726 | We need to modify the first and last lines so they don't contain non-JSON JavaScript notation
727 |
728 | <<>>=
729 | # open curly bracket in first line
730 | miserables[1] = "{"
731 |
732 | # closing curly bracket in last line
733 | miserables[length(miserables)] = "}"
734 | @
735 |
736 | Now we must concatenate all the content into a single string:
737 |
738 | <<>>=
739 | # JSON content in one single string
740 | miserables_str = paste(miserables, collapse = "")
741 | @
742 |
743 | Once we have the JSON content in the proper shape, we can parse it with \highcode{fromJSON()}.
744 |
745 | \end{frame}
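
%------------------------------------------------

\begin{frame}[fragile]
\frametitle{Preparing JSON content: an alternative sketch}

The steps above rely on knowing how many lines to drop. A possible alternative, sketched here on made-up content (not the real \code{miserables.js}), removes full-line comments and keeps the text between the first opening and the last closing curly bracket.

<<>>=
# toy JavaScript file content (hypothetical)
js_lines = c(
  "// a comment line",
  "// another comment line",
  "var toy = {",
  "  \"nodes\": [ {\"id\": 1}, {\"id\": 2} ]",
  "};")

# drop full-line comments
js_lines = js_lines[!grepl("^[[:space:]]*//", js_lines)]

# collapse everything into one single string
js_str = paste(js_lines, collapse = "")

# keep only the text from the first "{" to the last "}"
js_str = sub("^[^{]*", "", js_str)
js_str = sub("[^}]*$", "", js_str)
js_str
@

The resulting string can then be passed to \code{fromJSON()}.

\end{frame}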
746 |
747 | %------------------------------------------------
748 |
749 | \begin{frame}[fragile]
750 | \frametitle{Parsing JSON content}
751 |
752 | \highcode{fromJSON()} from package \code{"RJSONIO"}:
753 |
754 | \begin{columns}[t]
755 | \begin{column}{0.5\textwidth}
756 | <<>>=
757 | # fromJSON() in package RJSONIO
758 | mis1 = RJSONIO::fromJSON(miserables_str)
759 |
760 | # class
761 | class(mis1)
762 | # how many elements
763 | length(mis1)
764 | # names
765 | names(mis1)
766 | @
767 | \end{column}
768 |
769 | \begin{column}{0.5\textwidth}
770 | <<>>=
771 | # class of each element
772 | lapply(mis1, class)
773 | # how many elements in each list component
774 | lapply(mis1, length)
775 | @
776 | \end{column}
777 | \end{columns}
778 |
779 | \end{frame}
780 |
781 | %------------------------------------------------
782 |
783 | \begin{frame}[fragile]
784 | \frametitle{Parsing JSON content}
785 |
786 | \begin{columns}[t]
787 | \begin{column}{0.5\textwidth}
788 | <<>>=
789 | # take a peek at nodes
790 | head(mis1[[1]], n = 3)
791 | @
792 | \end{column}
793 |
794 | \begin{column}{0.5\textwidth}
795 | <<>>=
796 | # take a peek at links
797 | head(mis1[[2]], n = 3)
798 | @
799 | \end{column}
800 | \end{columns}
801 |
802 | \end{frame}
803 |
804 | %------------------------------------------------
805 |
806 | \begin{frame}[fragile]
807 | \frametitle{Parsing Differences}
808 |
809 | \begin{block}{\code{"RJSONIO"} -vs- \code{"jsonlite"}}
810 | The package \code{"jsonlite"} is a fork of \code{"RJSONIO"}. However, \code{"jsonlite"} implements a smarter mapping between JSON data and R classes.
811 |
812 | \bigskip
813 | For data shaped like the previous example (arrays of records), \code{"jsonlite"} returns a list of data frames instead of the list of lists returned by \code{"RJSONIO"}
814 | \end{block}
815 |
816 | \end{frame}
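
%------------------------------------------------

\begin{frame}[fragile]
\frametitle{Parsing Differences: a toy comparison}

A minimal sketch of the difference, assuming both packages are installed. The JSON string is a made-up example with the same shape as the miserables data (a set of nodes and a set of links), not the real dataset.

<<>>=
# toy JSON content (values made up)
toy_json = '{
  "nodes": [ {"nodeName": "A", "group": 1}, {"nodeName": "B", "group": 2} ],
  "links": [ {"source": 1, "target": 0, "value": 3} ]
}'

# RJSONIO: each component is a list with one element per record
toy1 = RJSONIO::fromJSON(toy_json)
sapply(toy1, class)

# jsonlite: arrays of records are simplified into data frames
toy2 = jsonlite::fromJSON(toy_json)
sapply(toy2, class)
toy2$nodes
@

\end{frame}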
817 |
818 | %------------------------------------------------
819 |
820 | \end{document}
--------------------------------------------------------------------------------
/05-json-data/05-json-data.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gastonstat/tutorial-R-web-data/2cdb84b2dc4512d7a33f99afd3ceb8e8a4e63bd0/05-json-data/05-json-data.pdf
--------------------------------------------------------------------------------
/05-json-data/images/json_cover.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gastonstat/tutorial-R-web-data/2cdb84b2dc4512d7a33f99afd3ceb8e8a4e63bd0/05-json-data/images/json_cover.png
--------------------------------------------------------------------------------
/05-json-data/images/miserables_js.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gastonstat/tutorial-R-web-data/2cdb84b2dc4512d7a33f99afd3ceb8e8a4e63bd0/05-json-data/images/miserables_js.png
--------------------------------------------------------------------------------
/06-http-basics-rcurl/06-http-basics-rcurl.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gastonstat/tutorial-R-web-data/2cdb84b2dc4512d7a33f99afd3ceb8e8a4e63bd0/06-http-basics-rcurl/06-http-basics-rcurl.pdf
--------------------------------------------------------------------------------
/06-http-basics-rcurl/images/Rproject.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gastonstat/tutorial-R-web-data/2cdb84b2dc4512d7a33f99afd3ceb8e8a4e63bd0/06-http-basics-rcurl/images/Rproject.png
--------------------------------------------------------------------------------
/06-http-basics-rcurl/images/client_request_server_response.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gastonstat/tutorial-R-web-data/2cdb84b2dc4512d7a33f99afd3ceb8e8a4e63bd0/06-http-basics-rcurl/images/client_request_server_response.pdf
--------------------------------------------------------------------------------
/06-http-basics-rcurl/images/clients_servers.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gastonstat/tutorial-R-web-data/2cdb84b2dc4512d7a33f99afd3ceb8e8a4e63bd0/06-http-basics-rcurl/images/clients_servers.pdf
--------------------------------------------------------------------------------
/06-http-basics-rcurl/images/curl_request.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gastonstat/tutorial-R-web-data/2cdb84b2dc4512d7a33f99afd3ceb8e8a4e63bd0/06-http-basics-rcurl/images/curl_request.png
--------------------------------------------------------------------------------
/06-http-basics-rcurl/images/hazards_form.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gastonstat/tutorial-R-web-data/2cdb84b2dc4512d7a33f99afd3ceb8e8a4e63bd0/06-http-basics-rcurl/images/hazards_form.png
--------------------------------------------------------------------------------
/06-http-basics-rcurl/images/http_browser.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gastonstat/tutorial-R-web-data/2cdb84b2dc4512d7a33f99afd3ceb8e8a4e63bd0/06-http-basics-rcurl/images/http_browser.png
--------------------------------------------------------------------------------
/06-http-basics-rcurl/images/http_cover.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gastonstat/tutorial-R-web-data/2cdb84b2dc4512d7a33f99afd3ceb8e8a4e63bd0/06-http-basics-rcurl/images/http_cover.png
--------------------------------------------------------------------------------
/06-http-basics-rcurl/images/http_request_response.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gastonstat/tutorial-R-web-data/2cdb84b2dc4512d7a33f99afd3ceb8e8a4e63bd0/06-http-basics-rcurl/images/http_request_response.pdf
--------------------------------------------------------------------------------
/06-http-basics-rcurl/images/rcurl_website.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gastonstat/tutorial-R-web-data/2cdb84b2dc4512d7a33f99afd3ceb8e8a4e63bd0/06-http-basics-rcurl/images/rcurl_website.png
--------------------------------------------------------------------------------
/06-http-basics-rcurl/images/web_request.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gastonstat/tutorial-R-web-data/2cdb84b2dc4512d7a33f99afd3ceb8e8a4e63bd0/06-http-basics-rcurl/images/web_request.pdf
--------------------------------------------------------------------------------
/06-http-basics-rcurl/images/web_surfing.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gastonstat/tutorial-R-web-data/2cdb84b2dc4512d7a33f99afd3ceb8e8a4e63bd0/06-http-basics-rcurl/images/web_surfing.pdf
--------------------------------------------------------------------------------
/07-web-forms/07-web-forms.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gastonstat/tutorial-R-web-data/2cdb84b2dc4512d7a33f99afd3ceb8e8a4e63bd0/07-web-forms/07-web-forms.pdf
--------------------------------------------------------------------------------
/07-web-forms/images/backpack_form.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gastonstat/tutorial-R-web-data/2cdb84b2dc4512d7a33f99afd3ceb8e8a4e63bd0/07-web-forms/images/backpack_form.pdf
--------------------------------------------------------------------------------
/07-web-forms/images/html_form_coffee.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gastonstat/tutorial-R-web-data/2cdb84b2dc4512d7a33f99afd3ceb8e8a4e63bd0/07-web-forms/images/html_form_coffee.pdf
--------------------------------------------------------------------------------
/07-web-forms/images/html_form_controls.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gastonstat/tutorial-R-web-data/2cdb84b2dc4512d7a33f99afd3ceb8e8a4e63bd0/07-web-forms/images/html_form_controls.pdf
--------------------------------------------------------------------------------
/07-web-forms/images/html_form_get_post.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gastonstat/tutorial-R-web-data/2cdb84b2dc4512d7a33f99afd3ceb8e8a4e63bd0/07-web-forms/images/html_form_get_post.pdf
--------------------------------------------------------------------------------
/07-web-forms/images/html_form_gmail.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gastonstat/tutorial-R-web-data/2cdb84b2dc4512d7a33f99afd3ceb8e8a4e63bd0/07-web-forms/images/html_form_gmail.pdf
--------------------------------------------------------------------------------
/07-web-forms/images/html_form_google.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gastonstat/tutorial-R-web-data/2cdb84b2dc4512d7a33f99afd3ceb8e8a4e63bd0/07-web-forms/images/html_form_google.pdf
--------------------------------------------------------------------------------
/07-web-forms/images/html_form_twitter_facebook.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gastonstat/tutorial-R-web-data/2cdb84b2dc4512d7a33f99afd3ceb8e8a4e63bd0/07-web-forms/images/html_form_twitter_facebook.pdf
--------------------------------------------------------------------------------
/07-web-forms/images/html_form_works.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gastonstat/tutorial-R-web-data/2cdb84b2dc4512d7a33f99afd3ceb8e8a4e63bd0/07-web-forms/images/html_form_works.pdf
--------------------------------------------------------------------------------
/07-web-forms/images/national_parks_site.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gastonstat/tutorial-R-web-data/2cdb84b2dc4512d7a33f99afd3ceb8e8a4e63bd0/07-web-forms/images/national_parks_site.png
--------------------------------------------------------------------------------
/07-web-forms/images/product_select.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gastonstat/tutorial-R-web-data/2cdb84b2dc4512d7a33f99afd3ceb8e8a4e63bd0/07-web-forms/images/product_select.pdf
--------------------------------------------------------------------------------
/07-web-forms/images/webforms_cover.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gastonstat/tutorial-R-web-data/2cdb84b2dc4512d7a33f99afd3ceb8e8a4e63bd0/07-web-forms/images/webforms_cover.jpg
--------------------------------------------------------------------------------
/08-web-apis/08-web-apis.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gastonstat/tutorial-R-web-data/2cdb84b2dc4512d7a33f99afd3ceb8e8a4e63bd0/08-web-apis/08-web-apis.pdf
--------------------------------------------------------------------------------
/08-web-apis/images/bitly_api.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gastonstat/tutorial-R-web-data/2cdb84b2dc4512d7a33f99afd3ceb8e8a4e63bd0/08-web-apis/images/bitly_api.png
--------------------------------------------------------------------------------
/08-web-apis/images/eutilities_webpage.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gastonstat/tutorial-R-web-data/2cdb84b2dc4512d7a33f99afd3ceb8e8a4e63bd0/08-web-apis/images/eutilities_webpage.png
--------------------------------------------------------------------------------
/08-web-apis/images/google_apis.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gastonstat/tutorial-R-web-data/2cdb84b2dc4512d7a33f99afd3ceb8e8a4e63bd0/08-web-apis/images/google_apis.png
--------------------------------------------------------------------------------
/08-web-apis/images/ncbi_databases.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gastonstat/tutorial-R-web-data/2cdb84b2dc4512d7a33f99afd3ceb8e8a4e63bd0/08-web-apis/images/ncbi_databases.png
--------------------------------------------------------------------------------
/08-web-apis/images/ncbi_webpage.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gastonstat/tutorial-R-web-data/2cdb84b2dc4512d7a33f99afd3ceb8e8a4e63bd0/08-web-apis/images/ncbi_webpage.png
--------------------------------------------------------------------------------
/08-web-apis/images/pubmed_search.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gastonstat/tutorial-R-web-data/2cdb84b2dc4512d7a33f99afd3ceb8e8a4e63bd0/08-web-apis/images/pubmed_search.png
--------------------------------------------------------------------------------
/08-web-apis/images/pubmed_webpage.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gastonstat/tutorial-R-web-data/2cdb84b2dc4512d7a33f99afd3ceb8e8a4e63bd0/08-web-apis/images/pubmed_webpage.png
--------------------------------------------------------------------------------
/08-web-apis/images/redwood.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gastonstat/tutorial-R-web-data/2cdb84b2dc4512d7a33f99afd3ceb8e8a4e63bd0/08-web-apis/images/redwood.jpg
--------------------------------------------------------------------------------
/08-web-apis/images/us_census_api.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gastonstat/tutorial-R-web-data/2cdb84b2dc4512d7a33f99afd3ceb8e8a4e63bd0/08-web-apis/images/us_census_api.png
--------------------------------------------------------------------------------
/Makefile:
--------------------------------------------------------------------------------
1 | # Title: Makefile for tutorial-R-web-data
2 | # Author: Gaston Sanchez
3 | #
4 | # Note:
5 | # This makefile makes use of "Automatic-Variables" extensively. Specifically:
6 | # $(@D) the directory part of the file name of the target
7 | # $(
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
27 |
28 | Author: [Gaston Sanchez](http://gastonsanchez.com)
29 |
--------------------------------------------------------------------------------
/data/alpha.csv:
--------------------------------------------------------------------------------
iden,beta1,gamma1,delta1,alfa1,altra1,estr1,beta2,gamma2,delta2,alfa2,edat,memb,estd,eciv,prof1,prof2,ingr
1,0,0,0,0,1,0,1,0,0,0,40,4,12,1,1,0,190
2,0,0,0,0,1,0,0,0,0,1,57,2,12,1,1,0,220
3,0,0,0,0,1,0,0,0,0,1,35,1,16,0,1,0,220
4,0,0,0,0,1,0,0,1,0,0,54,3,14,1,1,0,220
5,0,0,1,0,0,0,0,0,1,0,43,5,14,1,0,1,110
6,0,0,1,0,0,0,0,1,0,0,19,6,14,0,0,0,220
7,0,0,0,0,1,0,0,0,0,0,38,5,18,1,1,0,220
8,0,0,0,0,1,0,0,1,0,0,42,3,14,1,1,0,220
9,0,0,1,0,0,0,0,0,1,0,21,2,12,0,1,0,25
10,0,0,0,1,0,0,0,0,0,0,19,5,14,0,0,0,190
11,0,0,0,1,0,0,0,0,1,0,18,4,10,0,0,1,110
12,0,0,0,0,0,1,0,0,0,1,48,3,18,1,1,0,190
13,0,0,1,0,0,0,0,0,1,0,30,1,18,0,1,0,25
14,0,0,0,0,0,0,0,0,0,0,28,2,12,1,1,0,85
15,1,0,0,0,0,0,0,0,1,0,20,2,14,0,0,0,220
16,0,0,0,0,0,1,0,0,0,1,18,5,14,0,0,0,220
17,0,0,0,1,0,0,0,0,0,1,21,7,14,0,0,0,125
18,0,0,0,0,1,0,1,0,0,0,23,2,16,1,1,0,170
19,0,0,0,1,0,0,0,0,0,1,28,1,14,0,1,0,150
20,0,0,1,0,0,0,0,0,1,0,45,3,18,1,1,0,150
21,0,0,1,0,0,0,0,0,1,0,48,1,16,0,0,1,110
22,0,0,0,1,0,0,0,0,0,1,24,2,12,0,1,0,110
23,0,0,0,0,1,0,0,0,0,1,33,3,16,1,1,0,220
24,0,0,0,1,0,0,1,0,0,0,47,5,16,1,1,0,220
25,0,0,0,0,1,0,0,0,1,0,20,4,14,0,0,1,220
26,0,0,0,0,1,0,0,0,1,0,57,2,12,1,0,1,110
27,0,0,0,0,1,0,1,0,0,0,56,2,18,1,1,0,190
28,0,0,1,0,0,0,0,0,0,1,37,4,12,1,1,0,220
29,0,0,1,0,0,0,0,0,1,0,18,5,14,0,0,0,220
30,0,1,0,0,0,0,0,1,0,0,21,4,14,1,1,0,110
31,0,0,1,0,0,0,1,0,0,0,19,7,12,0,0,1,150
32,0,0,0,1,0,0,0,0,1,0,27,2,16,1,0,0,25
33,0,0,0,0,0,1,0,0,1,0,23,3,14,0,0,1,220
34,0,0,0,0,1,0,0,0,1,0,25,4,16,1,0,1,125
35,0,0,1,0,0,0,0,0,1,0,23,1,14,0,1,0,60
36,1,0,0,0,0,0,1,0,0,0,30,2,18,1,1,0,220
37,0,0,1,0,0,0,0,0,1,0,20,5,12,0,0,1,150
38,0,0,0,1,0,0,1,0,0,0,18,4,14,0,0,0,190
39,0,0,1,0,0,0,0,0,1,0,18,1,12,0,0,1,60
40,0,0,1,0,0,0,0,0,1,0,22,2,14,1,0,1,170
41,0,0,0,1,0,0,1,0,0,0,42,5,12,1,0,1,150
42,0,0,0,0,1,0,0,0,1,0,19,7,14,0,0,1,150
43,0,0,0,0,0,0,0,0,0,0,18,4,10,0,0,1,220
44,0,0,0,0,0,0,0,0,1,0,24,4,14,1,0,0,60
45,0,0,0,0,0,0,0,0,0,1,28,1,14,0,1,0,125
46,0,1,0,0,0,0,0,1,0,0,21,4,12,0,0,1,170
47,0,0,1,0,0,0,0,0,0,1,33,1,18,1,1,0,220
48,0,0,0,0,1,0,0,0,1,0,20,7,14,0,0,0,220
49,0,0,1,0,0,0,0,1,0,0,18,4,12,0,0,0,220
50,0,0,0,1,0,0,0,1,0,0,50,4,18,1,0,1,190
51,0,0,0,0,1,0,0,0,1,0,23,3,16,0,0,0,220
52,0,0,0,0,1,0,0,1,0,0,21,9,12,0,1,0,125
53,1,0,0,0,0,0,1,0,0,0,23,2,12,1,0,1,125
54,0,0,0,0,0,1,0,1,0,0,24,8,14,0,0,1,190
55,0,0,1,0,0,0,0,0,1,0,24,2,14,0,0,1,85
56,0,0,1,0,0,0,0,0,0,1,32,4,12,1,1,0,110
57,1,0,0,0,0,0,1,0,0,0,35,5,14,1,1,0,220
58,0,0,1,0,0,0,0,0,0,1,25,3,16,1,0,1,150
59,0,0,0,0,1,0,0,0,0,1,42,4,16,1,1,0,125
60,0,0,1,0,0,0,0,0,1,0,25,4,14,1,0,1,125
61,0,1,0,0,0,0,0,1,0,0,63,2,16,1,0,0,150
62,0,0,1,0,0,0,0,0,0,1,44,3,18,1,1,0,220
63,0,0,0,0,1,0,0,0,0,1,25,1,12,0,0,1,190
64,0,0,0,0,1,0,0,0,1,0,36,5,14,1,1,0,150
65,0,0,1,0,0,0,0,0,1,0,18,7,14,0,0,0,220
66,0,0,0,0,0,1,0,0,0,0,23,3,14,0,0,0,60
67,0,0,0,1,0,0,0,0,0,1,64,3,12,1,0,1,25
68,1,0,0,0,0,0,0,0,1,0,29,1,14,0,0,1,150
69,0,0,0,0,0,1,0,0,0,1,21,3,14,0,0,0,190
70,0,0,0,1,0,0,0,0,1,0,29,3,18,1,1,0,220
71,1,0,0,0,0,0,0,0,0,1,42,3,16,1,1,0,190
72,0,0,0,1,0,0,0,0,1,0,36,4,14,1,1,0,170
73,0,0,1,0,0,0,0,0,1,0,36,5,18,1,1,0,220
74,0,0,0,1,0,0,0,0,0,0,30,4,18,1,1,0,110
75,0,0,0,0,1,0,0,1,0,0,21,1,12,0,0,1,25
76,0,0,0,0,1,0,0,1,0,0,23,1,14,0,1,0,60
77,0,0,0,0,1,0,0,0,1,0,41,4,16,1,1,0,190
78,0,0,1,0,0,0,0,0,1,0,40,6,16,1,1,0,190
79,0,0,1,0,0,0,0,0,1,0,45,4,18,1,1,0,125
80,0,0,1,0,0,0,0,0,1,0,24,5,14,1,0,1,60
81,0,1,0,0,0,0,1,0,0,0,21,4,14,0,0,0,25
82,0,0,0,0,1,0,0,0,1,0,37,4,12,1,1,0,220
83,0,0,0,1,0,0,0,0,0,1,27,3,12,1,1,0,85
84,0,0,0,0,1,0,0,0,0,1,44,4,14,1,1,0,220
85,1,0,0,0,0,0,1,0,0,0,45,3,12,1,1,0,125
86,0,0,1,0,0,0,0,0,0,1,32,2,14,1,1,0,220
87,0,0,0,0,1,0,0,0,1,0,25,1,14,0,1,0,60
88,0,0,1,0,0,0,0,0,1,0,26,2,12,0,1,0,125
89,1,0,0,0,0,0,0,0,0,1,19,6,12,0,1,0,190
90,0,0,0,0,1,0,0,0,1,0,26,3,10,1,0,1,60
91,1,0,0,0,0,0,0,0,1,0,18,4,14,0,0,1,220
92,0,0,1,0,0,0,0,0,1,0,32,3,14,0,1,0,125
93,0,0,0,1,0,0,0,1,0,0,42,1,8,1,0,1,170
94,0,0,1,0,0,0,0,0,1,0,29,3,16,1,0,1,190
95,0,0,0,0,0,0,0,1,0,0,21,4,14,0,0,0,110
96,0,0,0,0,1,0,0,0,1,0,19,3,12,0,0,0,85
97,0,0,0,0,1,0,1,0,0,0,21,3,14,0,0,0,125
98,0,0,1,0,0,0,0,0,1,0,33,4,12,1,0,1,150
99,0,0,0,0,1,0,0,0,0,1,51,4,18,1,1,0,150
100,0,0,0,0,0,0,0,1,0,0,25,3,16,0,1,0,220
101,0,0,0,0,1,0,0,0,1,0,20,3,14,0,1,0,85
102,0,0,0,0,1,0,0,0,1,0,40,9,18,1,1,0,220
103,0,0,1,0,0,0,0,0,1,0,18,4,10,0,0,0,220
104,0,0,0,1,0,0,0,0,1,0,28,7,12,1,0,1,110
105,0,0,0,0,0,0,0,0,1,0,49,3,12,1,0,1,220
106,0,0,0,0,1,0,0,1,0,0,46,4,8,1,0,1,110
107,0,0,1,0,0,0,0,0,1,0,27,4,16,1,0,1,60
108,0,0,0,0,0,0,0,0,0,1,34,4,18,1,1,0,220
109,0,0,1,0,0,0,0,0,1,0,26,2,14,1,0,1,150
110,0,0,1,0,0,0,0,0,1,0,19,3,14,0,0,0,220
111,0,0,0,1,0,0,0,0,0,1,41,5,16,1,1,0,190
112,1,0,0,0,0,0,1,0,0,0,53,2,16,1,1,0,220
113,0,0,0,1,0,0,0,0,0,1,39,3,18,1,1,0,220
114,0,0,0,0,1,0,0,0,1,0,19,6,14,0,0,1,220
115,0,0,1,0,0,0,0,0,1,0,18,1,12,0,0,1,25
116,0,0,0,0,0,0,0,0,1,0,45,3,16,0,1,0,170
117,0,0,0,0,1,0,0,0,1,0,19,6,14,0,1,0,220
118,1,0,0,0,0,0,0,0,0,1,25,3,12,0,0,1,85
119,0,0,0,0,1,0,0,0,0,1,56,4,16,1,1,0,190
120,0,0,0,1,0,0,1,0,0,0,37,4,14,1,1,0,220
121,0,0,0,0,1,0,0,0,1,0,19,5,10,0,1,0,220
122,0,0,1,0,0,0,0,0,1,0,61,1,18,0,1,0,85
123,0,0,1,0,0,0,0,0,1,0,18,5,10,0,0,0,170
124,0,0,0,0,0,1,0,0,1,0,32,4,16,1,1,0,220
125,0,0,0,0,1,0,0,0,0,1,42,4,16,1,1,0,220
126,0,0,0,1,0,0,0,0,0,1,38,3,16,0,1,0,125
127,0,0,1,0,0,0,0,0,1,0,58,4,14,1,1,0,220
128,0,0,0,0,1,0,0,1,0,0,28,3,14,1,0,1,125
129,0,0,0,0,1,0,0,0,0,1,34,1,14,0,0,1,150
130,0,0,1,0,0,0,0,0,0,1,20,3,12,1,0,1,25
131,0,0,0,1,0,0,0,1,0,0,37,6,12,1,0,1,125
132,1,0,0,0,0,0,0,0,0,1,21,4,12,0,0,1,150
133,0,0,0,0,1,0,1,0,0,0,55,3,16,1,0,1,170
134,1,0,0,0,0,0,1,0,0,0,57,2,12,1,0,0,170
135,0,0,0,0,1,0,0,0,0,1,24,2,18,1,1,0,85
136,0,0,0,0,1,0,0,1,0,0,30,4,16,1,1,0,220
137,0,0,0,1,0,0,0,0,0,1,40,5,18,1,1,0,170
138,0,0,0,0,1,0,0,0,0,1,27,2,16,1,1,0,220
139,0,0,0,0,1,0,0,0,0,1,29,1,18,0,1,0,60
140,1,0,0,0,0,0,0,0,0,1,30,4,14,1,0,1,110
141,0,0,1,0,0,0,0,0,1,0,26,2,18,1,1,0,190
142,0,0,0,0,0,0,0,0,1,0,21,7,14,0,0,1,220
143,0,0,0,0,1,0,1,0,0,0,19,1,10,0,0,1,25
144,0,0,0,0,0,0,0,0,0,1,26,4,14,0,0,0,25
145,0,0,1,0,0,0,0,0,1,0,24,2,18,1,1,0,220
146,1,0,0,0,0,0,0,0,1,0,40,9,16,1,1,0,125
147,0,0,1,0,0,0,0,0,1,0,18,3,12,0,0,1,190
148,0,0,0,0,1,0,0,0,1,0,52,2,12,1,1,0,110
149,0,0,1,0,0,0,0,0,1,0,18,5,14,0,0,0,85
150,1,0,0,0,0,0,1,0,0,0,30,2,14,0,1,0,85
151,0,0,1,0,0,0,0,0,1,0,18,4,12,0,0,1,170
152,0,0,0,0,1,0,0,0,0,1,21,4,14,0,0,0,110
153,0,0,0,0,0,1,0,0,0,1,30,5,14,0,0,1,125
154,0,0,0,0,1,0,0,0,1,0,22,5,18,0,1,0,125
155,0,0,1,0,0,0,1,0,0,0,22,1,14,0,0,1,85
156,0,0,0,0,1,0,0,0,0,1,27,2,16,1,1,0,220
157,0,0,0,0,1,0,0,0,0,0,35,5,14,1,1,0,220
158,1,0,0,0,0,0,0,0,1,0,28,4,18,0,0,0,85
159,0,0,0,0,1,0,0,0,1,0,54,3,12,1,1,0,220
160,0,1,0,0,0,0,1,0,0,0,20,1,14,0,0,1,60
161,0,0,1,0,0,0,0,0,1,0,20,1,12,0,0,1,60
162,0,0,0,1,0,0,0,0,0,1,26,1,14,0,1,0,220
163,0,0,0,0,1,0,1,0,0,0,40,7,12,1,0,1,150
164,0,0,0,1,0,0,1,0,0,0,26,3,12,1,0,1,85
165,0,0,0,1,0,0,0,0,0,1,18,4,10,0,0,0,25
166,0,0,0,0,0,0,0,0,1,0,25,1,12,0,1,0,85
167,0,0,1,0,0,0,1,0,0,0,54,5,18,1,1,0,220
168,0,0,0,1,0,0,0,0,0,1,20,1,14,0,0,0,220
169,0,0,0,1,0,0,0,0,0,1,19,4,14,0,0,0,110
170,0,0,1,0,0,0,0,0,1,0,60,2,14,1,0,0,220
171,0,0,0,0,1,0,0,0,0,0,45,6,18,1,1,0,125
172,0,0,0,0,1,0,0,0,0,0,63,2,16,1,1,0,220
173,0,0,0,0,0,1,0,0,1,0,20,6,16,0,0,0,220
174,1,0,0,0,0,0,0,0,1,0,63,1,18,0,1,0,150
175,0,0,0,1,0,0,0,0,0,1,64,2,16,1,0,0,110
176,0,0,0,1,0,0,0,0,1,0,20,6,14,0,0,0,220
177,0,0,0,1,0,0,0,0,0,1,18,2,10,0,0,1,60
178,0,0,0,0,0,0,0,1,0,0,58,2,10,1,0,0,110
179,0,0,0,0,1,0,0,0,0,1,18,4,10,0,0,0,125
180,0,0,1,0,0,0,0,0,1,0,22,9,14,0,0,0,150
181,0,0,0,0,0,0,0,0,0,0,24,2,14,0,0,1,85
182,0,0,0,0,1,0,0,0,1,0,38,1,12,0,0,1,125
183,0,1,0,0,0,0,0,0,1,0,21,2,14,1,1,0,170
184,1,0,0,0,0,0,0,0,1,0,24,1,12,0,0,1,110
185,0,0,0,0,1,0,0,0,0,1,52,2,16,1,1,0,220
186,0,0,0,0,1,0,0,0,1,0,20,6,12,0,0,1,220
187,0,0,0,1,0,0,0,0,0,1,50,3,10,1,0,1,110
188,0,0,0,1,0,0,0,0,0,1,21,5,12,0,0,1,220
189,0,0,0,0,1,0,0,0,1,0,19,3,10,0,0,1,110
190,0,0,0,0,0,0,0,0,0,1,19,3,10,0,0,1,170
191,0,0,1,0,0,0,0,0,1,0,21,4,12,1,0,1,170
192,0,0,0,0,0,0,1,0,0,0,20,4,14,0,1,0,25
193,0,0,0,0,0,0,0,0,0,1,20,1,14,0,1,0,60
194,1,0,0,0,0,0,0,0,1,0,29,3,14,1,0,1,125
195,0,0,1,0,0,0,0,0,1,0,23,2,14,1,0,1,85
196,0,0,0,0,1,0,1,0,0,0,20,5,12,0,1,0,220
197,0,0,0,0,1,0,0,0,1,0,24,1,14,0,0,1,25
198,0,0,0,0,0,1,0,0,1,0,23,6,14,0,0,1,170
199,0,0,1,0,0,0,0,0,0,1,25,2,14,1,0,0,110
200,0,0,0,0,1,0,0,0,0,1,21,1,14,0,0,1,85
201,0,0,1,0,0,0,0,0,1,0,20,8,14,0,0,1,220
202,0,0,0,1,0,0,0,0,0,1,27,4,18,0,1,0,220
203,0,0,1,0,0,0,1,0,0,0,21,6,10,0,0,1,85
204,1,0,0,0,0,0,1,0,0,0,19,6,10,0,0,1,220
205,1,0,0,0,0,0,1,0,0,0,31,3,14,1,1,0,220
206,0,0,0,0,1,0,0,0,0,1,46,4,12,1,0,1,125
207,0,0,0,0,1,0,1,0,0,0,39,9,12,1,0,1,125
208,0,0,0,1,0,0,0,0,0,1,40,6,10,1,0,1,85
209,0,0,0,0,1,0,0,0,0,1,63,4,18,1,1,0,220
210,0,0,0,0,1,0,0,0,0,1,58,2,16,1,1,0,220
211,0,0,0,0,1,0,0,0,1,0,49,4,14,1,1,0,220
212,0,0,0,0,0,0,0,0,1,0,36,4,14,1,0,1,170
213,1,0,0,0,0,0,1,0,0,0,31,3,14,1,0,1,190
214,1,0,0,0,0,0,1,0,0,0,52,6,14,1,0,1,110
215,0,0,1,0,0,0,0,0,1,0,58,1,6,0,0,1,110
216,0,0,0,1,0,0,0,0,0,1,48,3,16,1,0,1,190
217,0,1,0,0,0,0,0,1,0,0,33,4,12,1,0,1,220
218,0,0,0,0,1,0,0,0,0,1,30,3,10,1,0,1,125
219,0,0,0,1,0,0,0,0,0,0,30,2,18,1,0,0,25
220,0,0,1,0,0,0,0,0,1,0,34,1,14,0,1,0,110
221,0,0,0,1,0,0,0,0,0,1,41,4,16,1,1,0,110
222,0,0,0,0,1,0,0,0,0,1,38,4,16,1,1,0,110
223,0,0,0,1,0,0,0,0,0,1,30,1,16,0,1,0,85
224,0,0,1,0,0,0,0,0,1,0,31,4,14,1,1,0,220
225,1,0,0,0,0,0,1,0,0,0,64,2,14,1,0,0,110
226,0,1,0,0,0,0,1,0,0,0,56,2,8,1,0,0,150
227,0,0,0,1,0,0,0,0,1,0,33,1,12,0,1,0,85
228,0,0,0,0,1,0,0,0,1,0,60,2,14,1,1,0,60
229,0,0,0,0,1,0,0,0,1,0,64,2,14,1,1,0,25
230,0,0,0,0,1,0,0,0,0,1,39,4,10,1,0,1,220
231,0,0,0,1,0,0,0,0,0,1,31,4,12,1,0,1,110
232,0,0,0,0,1,0,0,0,0,1,34,4,12,1,1,0,85
233,0,0,0,1,0,0,0,0,0,1,53,2,14,1,1,0,220
234,0,0,1,0,0,0,0,0,1,0,30,3,14,1,0,1,125
235,0,0,0,0,0,0,0,0,0,1,44,2,10,1,0,1,110
236,0,0,0,1,0,0,0,0,0,1,44,3,12,1,0,1,170
237,0,0,0,0,1,0,0,0,0,0,54,2,12,1,1,0,85
238,0,0,0,0,1,0,0,0,0,1,35,3,6,1,0,1,85
239,0,0,0,1,0,0,0,0,0,1,54,2,10,1,0,1,110
240,1,0,0,0,0,0,1,0,0,0,44,2,16,1,1,0,150
241,0,0,0,1,0,0,0,0,1,0,34,3,18,1,1,0,220
242,0,0,1,0,0,0,0,0,1,0,45,3,14,1,0,1,125
243,0,0,0,1,0,0,0,0,0,1,35,4,12,1,0,1,85
244,0,0,0,0,1,0,0,1,0,0,46,5,16,1,1,0,150
245,0,0,0,0,1,0,0,0,0,1,41,6,6,1,0,1,150
246,0,0,0,1,0,0,1,0,0,0,32,3,12,0,0,1,150
247,0,0,1,0,0,0,0,0,1,0,53,7,12,1,0,1,220
248,0,0,1,0,0,0,0,0,1,0,54,4,10,1,0,1,220
249,0,0,0,1,0,0,0,0,0,1,48,5,14,1,1,0,220
250,0,0,0,0,1,0,0,0,1,0,62,2,16,1,1,0,150
251,0,0,0,0,1,0,0,1,0,0,47,5,12,1,1,0,170
252,0,0,0,0,1,0,0,1,0,0,30,3,12,1,0,1,190
--------------------------------------------------------------------------------
/data/alpha.xls:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gastonstat/tutorial-R-web-data/2cdb84b2dc4512d7a33f99afd3ceb8e8a4e63bd0/data/alpha.xls
--------------------------------------------------------------------------------
/data/cars2004.csv:
--------------------------------------------------------------------------------
1 | Model,Cylinders,Horsepower,Speed,Weight,Width,Length
2 | Citroen C2 1.1 Base,1124,61,158,932,1659,3666
3 | Smart Fortwo Coupe,698,52,135,730,1515,2500
4 | Mini 1.6 170,1598,170,218,1215,1690,3625
5 | Nissan Micra 1.2 65,1240,65,154,965,1660,3715
6 | Renault Clio 3.0 V6,2946,255,245,1400,1810,3812
7 | Audi A3 1.9 TDI,1896,105,187,1295,1765,4203
8 | Peugeot 307 1.4 HDI 70,1398,70,160,1179,1746,4202
9 | Peugeot 407 3.0 V6 BVA,2946,211,229,1640,1811,4676
10 | Mercedes Classe C 270 CDI,2685,170,230,1600,1728,4528
11 | BMW 530d,2993,218,245,1595,1846,4841
12 | Jaguar S-Type 2.7 V6 Bi-Turbo,2720,207,230,1722,1818,4905
13 | BMW 745i,4398,333,250,1870,1902,5029
14 | Mercedes Classe S 400 CDI,3966,260,250,1915,2092,5038
15 | Citroen C3 Pluriel 1.6i,1587,110,185,1177,1700,3934
16 | BMW Z4 2.5i,2494,192,235,1260,1781,4091
17 | Audi TT 1.8T 180,1781,180,228,1280,1764,4041
18 | Aston Martin Vanquish,5935,460,306,1835,1923,4665
19 | Bentley Continental GT,5998,560,318,2385,1918,4804
20 | Ferrari Enzo,5998,660,350,1365,2650,4700
21 | Renault Scenic 1.9 dCi 120,1870,120,188,1430,1805,4259
22 | Volkswagen Touran 1.9 TDI 105,1896,105,180,1498,1794,4391
23 | Land Rover Defender Td5,2495,122,135,1695,1790,3883
24 | Land Rover Discovery Td5,2495,138,157,2175,2190,4705
25 | Nissan X-Trail 2.2 dCi,2184,136,180,1520,1765,4455
--------------------------------------------------------------------------------
/data/esearch_retmax103.xml:
--------------------------------------------------------------------------------
1 |
2 |
3 | 103 103 0 1 NCID_1_347770334_130.14.22.215_9001_1399490825_59805424
4 | 23959643
5 | 23730202
6 | 23520917
7 | 23281708
8 | 23281599
9 | 23266811
10 | 23261826
11 | 23260131
12 | 23251346
13 | 23235118
14 | 23230266
15 | 23166509
16 | 23153121
17 | 23151206
18 | 23148531
19 | 23141859
20 | 23115822
21 | 23112225
22 | 23102215
23 | 23086744
24 | 23066043
25 | 23065348
26 | 23064634
27 | 23056365
28 | 23049969
29 | 22983818
30 | 22965039
31 | 22955987
32 | 22955986
33 | 22955982
34 | 22955974
35 | 22955971
36 | 22955793
37 | 22955617
38 | 22955616
39 | 22952639
40 | 22926102
41 | 22908741
42 | 22906809
43 | 22850674
44 | 22849370
45 | 22829947
46 | 22808122
47 | 22792041
48 | 22779033
49 | 22736453
50 | 22734843
51 | 22714408
52 | 22701604
53 | 22700182
54 | 22683623
55 | 22665225
56 | 22648820
57 | 22642107
58 | 22624854
59 | 22615578
60 | 22608825
61 | 22606341
62 | 22593554
63 | 22586432
64 | 22567964
65 | 22550155
66 | 22548216
67 | 22539467
68 | 22523069
69 | 22495928
70 | 22495107
71 | 22492999
72 | 22465810
73 | 22456607
74 | 22451669
75 | 22445588
76 | 22434427
77 | 22383546
78 | 22372745
79 | 22365773
80 | 22364178
81 | 22344433
82 | 22326088
83 | 22313678
84 | 22309575
85 | 24511453
86 | 22300442
87 | 24478921
88 | 22253817
89 | 22248320
90 | 22231539
91 | 22228013
92 | 22213110
93 | 24705081
94 | 22183967
95 | 22175250
96 | 22170741
97 | 22138481
98 | 22102592
99 | 22093876
100 | 22075116
101 | 22057813
102 | 22057783
103 | 22050290
104 | 22031941
105 | 21945885
106 | 21559844
107 | human genome[title] title 2906 N 2012[pdat] pdat 1067564 N AND human genome[title] AND 2012[pdat]
108 |
--------------------------------------------------------------------------------
/data/esearch_retmax20.xml:
--------------------------------------------------------------------------------
1 |
2 |
3 | 103 20 0 1 NCID_1_282865897_130.14.18.34_9001_1399488846_694838780
4 | 23959643
5 | 23730202
6 | 23520917
7 | 23281708
8 | 23281599
9 | 23266811
10 | 23261826
11 | 23260131
12 | 23251346
13 | 23235118
14 | 23230266
15 | 23166509
16 | 23153121
17 | 23151206
18 | 23148531
19 | 23141859
20 | 23115822
21 | 23112225
22 | 23102215
23 | 23086744
24 | human genome[title] title 2906 N 2012[pdat] pdat 1067564 N AND human genome[title] AND 2012[pdat]
25 |
--------------------------------------------------------------------------------
/data/harry_potter.txt:
--------------------------------------------------------------------------------
1 | Harry Potter and the Sorcerer's Stone
2 |
3 | CHAPTER ONE
4 |
5 | THE BOY WHO LIVED
6 |
7 | Mr. and Mrs. Dursley, of number four, Privet Drive, were proud to say
8 | that they were perfectly normal, thank you very much. They were the last
9 | people you'd expect to be involved in anything strange or mysterious,
10 | because they just didn't hold with such nonsense.
11 |
12 | Mr. Dursley was the director of a firm called Grunnings, which made
13 | drills. He was a big, beefy man with hardly any neck, although he did
14 | have a very large mustache. Mrs. Dursley was thin and blonde and had
15 | nearly twice the usual amount of neck, which came in very useful as she
16 | spent so much of her time craning over garden fences, spying on the
17 | neighbors. The Dursleys had a small son called Dudley and in their
18 | opinion there was no finer boy anywhere.
19 |
20 | The Dursleys had everything they wanted, but they also had a secret, and
21 | their greatest fear was that somebody would discover it. They didn't
22 | think they could bear it if anyone found out about the Potters. Mrs.
23 | Potter was Mrs. Dursley's sister, but they hadn't met for several years;
24 | in fact, Mrs. Dursley pretended she didn't have a sister, because her
25 | sister and her good-for-nothing husband were as unDursleyish as it was
26 | possible to be. The Dursleys shuddered to think what the neighbors would
27 | say if the Potters arrived in the street. The Dursleys knew that the
28 | Potters had a small son, too, but they had never even seen him. This boy
29 | was another good reason for keeping the Potters away; they didn't want
30 | Dudley mixing with a child like that.
31 |
--------------------------------------------------------------------------------
/data/iris_data.txt:
--------------------------------------------------------------------------------
1 | 5.1,3.5,1.4,0.2,Iris-setosa
2 | 4.9,3.0,1.4,0.2,Iris-setosa
3 | 4.7,3.2,1.3,0.2,Iris-setosa
4 | 4.6,3.1,1.5,0.2,Iris-setosa
5 | 5.0,3.6,1.4,0.2,Iris-setosa
6 | 5.4,3.9,1.7,0.4,Iris-setosa
7 | 4.6,3.4,1.4,0.3,Iris-setosa
8 | 5.0,3.4,1.5,0.2,Iris-setosa
9 | 4.4,2.9,1.4,0.2,Iris-setosa
10 | 4.9,3.1,1.5,0.1,Iris-setosa
11 | 5.4,3.7,1.5,0.2,Iris-setosa
12 | 4.8,3.4,1.6,0.2,Iris-setosa
13 | 4.8,3.0,1.4,0.1,Iris-setosa
14 | 4.3,3.0,1.1,0.1,Iris-setosa
15 | 5.8,4.0,1.2,0.2,Iris-setosa
16 | 5.7,4.4,1.5,0.4,Iris-setosa
17 | 5.4,3.9,1.3,0.4,Iris-setosa
18 | 5.1,3.5,1.4,0.3,Iris-setosa
19 | 5.7,3.8,1.7,0.3,Iris-setosa
20 | 5.1,3.8,1.5,0.3,Iris-setosa
21 | 5.4,3.4,1.7,0.2,Iris-setosa
22 | 5.1,3.7,1.5,0.4,Iris-setosa
23 | 4.6,3.6,1.0,0.2,Iris-setosa
24 | 5.1,3.3,1.7,0.5,Iris-setosa
25 | 4.8,3.4,1.9,0.2,Iris-setosa
26 | 5.0,3.0,1.6,0.2,Iris-setosa
27 | 5.0,3.4,1.6,0.4,Iris-setosa
28 | 5.2,3.5,1.5,0.2,Iris-setosa
29 | 5.2,3.4,1.4,0.2,Iris-setosa
30 | 4.7,3.2,1.6,0.2,Iris-setosa
31 | 4.8,3.1,1.6,0.2,Iris-setosa
32 | 5.4,3.4,1.5,0.4,Iris-setosa
33 | 5.2,4.1,1.5,0.1,Iris-setosa
34 | 5.5,4.2,1.4,0.2,Iris-setosa
35 | 4.9,3.1,1.5,0.1,Iris-setosa
36 | 5.0,3.2,1.2,0.2,Iris-setosa
37 | 5.5,3.5,1.3,0.2,Iris-setosa
38 | 4.9,3.1,1.5,0.1,Iris-setosa
39 | 4.4,3.0,1.3,0.2,Iris-setosa
40 | 5.1,3.4,1.5,0.2,Iris-setosa
41 | 5.0,3.5,1.3,0.3,Iris-setosa
42 | 4.5,2.3,1.3,0.3,Iris-setosa
43 | 4.4,3.2,1.3,0.2,Iris-setosa
44 | 5.0,3.5,1.6,0.6,Iris-setosa
45 | 5.1,3.8,1.9,0.4,Iris-setosa
46 | 4.8,3.0,1.4,0.3,Iris-setosa
47 | 5.1,3.8,1.6,0.2,Iris-setosa
48 | 4.6,3.2,1.4,0.2,Iris-setosa
49 | 5.3,3.7,1.5,0.2,Iris-setosa
50 | 5.0,3.3,1.4,0.2,Iris-setosa
51 | 7.0,3.2,4.7,1.4,Iris-versicolor
52 | 6.4,3.2,4.5,1.5,Iris-versicolor
53 | 6.9,3.1,4.9,1.5,Iris-versicolor
54 | 5.5,2.3,4.0,1.3,Iris-versicolor
55 | 6.5,2.8,4.6,1.5,Iris-versicolor
56 | 5.7,2.8,4.5,1.3,Iris-versicolor
57 | 6.3,3.3,4.7,1.6,Iris-versicolor
58 | 4.9,2.4,3.3,1.0,Iris-versicolor
59 | 6.6,2.9,4.6,1.3,Iris-versicolor
60 | 5.2,2.7,3.9,1.4,Iris-versicolor
61 | 5.0,2.0,3.5,1.0,Iris-versicolor
62 | 5.9,3.0,4.2,1.5,Iris-versicolor
63 | 6.0,2.2,4.0,1.0,Iris-versicolor
64 | 6.1,2.9,4.7,1.4,Iris-versicolor
65 | 5.6,2.9,3.6,1.3,Iris-versicolor
66 | 6.7,3.1,4.4,1.4,Iris-versicolor
67 | 5.6,3.0,4.5,1.5,Iris-versicolor
68 | 5.8,2.7,4.1,1.0,Iris-versicolor
69 | 6.2,2.2,4.5,1.5,Iris-versicolor
70 | 5.6,2.5,3.9,1.1,Iris-versicolor
71 | 5.9,3.2,4.8,1.8,Iris-versicolor
72 | 6.1,2.8,4.0,1.3,Iris-versicolor
73 | 6.3,2.5,4.9,1.5,Iris-versicolor
74 | 6.1,2.8,4.7,1.2,Iris-versicolor
75 | 6.4,2.9,4.3,1.3,Iris-versicolor
76 | 6.6,3.0,4.4,1.4,Iris-versicolor
77 | 6.8,2.8,4.8,1.4,Iris-versicolor
78 | 6.7,3.0,5.0,1.7,Iris-versicolor
79 | 6.0,2.9,4.5,1.5,Iris-versicolor
80 | 5.7,2.6,3.5,1.0,Iris-versicolor
81 | 5.5,2.4,3.8,1.1,Iris-versicolor
82 | 5.5,2.4,3.7,1.0,Iris-versicolor
83 | 5.8,2.7,3.9,1.2,Iris-versicolor
84 | 6.0,2.7,5.1,1.6,Iris-versicolor
85 | 5.4,3.0,4.5,1.5,Iris-versicolor
86 | 6.0,3.4,4.5,1.6,Iris-versicolor
87 | 6.7,3.1,4.7,1.5,Iris-versicolor
88 | 6.3,2.3,4.4,1.3,Iris-versicolor
89 | 5.6,3.0,4.1,1.3,Iris-versicolor
90 | 5.5,2.5,4.0,1.3,Iris-versicolor
91 | 5.5,2.6,4.4,1.2,Iris-versicolor
92 | 6.1,3.0,4.6,1.4,Iris-versicolor
93 | 5.8,2.6,4.0,1.2,Iris-versicolor
94 | 5.0,2.3,3.3,1.0,Iris-versicolor
95 | 5.6,2.7,4.2,1.3,Iris-versicolor
96 | 5.7,3.0,4.2,1.2,Iris-versicolor
97 | 5.7,2.9,4.2,1.3,Iris-versicolor
98 | 6.2,2.9,4.3,1.3,Iris-versicolor
99 | 5.1,2.5,3.0,1.1,Iris-versicolor
100 | 5.7,2.8,4.1,1.3,Iris-versicolor
101 | 6.3,3.3,6.0,2.5,Iris-virginica
102 | 5.8,2.7,5.1,1.9,Iris-virginica
103 | 7.1,3.0,5.9,2.1,Iris-virginica
104 | 6.3,2.9,5.6,1.8,Iris-virginica
105 | 6.5,3.0,5.8,2.2,Iris-virginica
106 | 7.6,3.0,6.6,2.1,Iris-virginica
107 | 4.9,2.5,4.5,1.7,Iris-virginica
108 | 7.3,2.9,6.3,1.8,Iris-virginica
109 | 6.7,2.5,5.8,1.8,Iris-virginica
110 | 7.2,3.6,6.1,2.5,Iris-virginica
111 | 6.5,3.2,5.1,2.0,Iris-virginica
112 | 6.4,2.7,5.3,1.9,Iris-virginica
113 | 6.8,3.0,5.5,2.1,Iris-virginica
114 | 5.7,2.5,5.0,2.0,Iris-virginica
115 | 5.8,2.8,5.1,2.4,Iris-virginica
116 | 6.4,3.2,5.3,2.3,Iris-virginica
117 | 6.5,3.0,5.5,1.8,Iris-virginica
118 | 7.7,3.8,6.7,2.2,Iris-virginica
119 | 7.7,2.6,6.9,2.3,Iris-virginica
120 | 6.0,2.2,5.0,1.5,Iris-virginica
121 | 6.9,3.2,5.7,2.3,Iris-virginica
122 | 5.6,2.8,4.9,2.0,Iris-virginica
123 | 7.7,2.8,6.7,2.0,Iris-virginica
124 | 6.3,2.7,4.9,1.8,Iris-virginica
125 | 6.7,3.3,5.7,2.1,Iris-virginica
126 | 7.2,3.2,6.0,1.8,Iris-virginica
127 | 6.2,2.8,4.8,1.8,Iris-virginica
128 | 6.1,3.0,4.9,1.8,Iris-virginica
129 | 6.4,2.8,5.6,2.1,Iris-virginica
130 | 7.2,3.0,5.8,1.6,Iris-virginica
131 | 7.4,2.8,6.1,1.9,Iris-virginica
132 | 7.9,3.8,6.4,2.0,Iris-virginica
133 | 6.4,2.8,5.6,2.2,Iris-virginica
134 | 6.3,2.8,5.1,1.5,Iris-virginica
135 | 6.1,2.6,5.6,1.4,Iris-virginica
136 | 7.7,3.0,6.1,2.3,Iris-virginica
137 | 6.3,3.4,5.6,2.4,Iris-virginica
138 | 6.4,3.1,5.5,1.8,Iris-virginica
139 | 6.0,3.0,4.8,1.8,Iris-virginica
140 | 6.9,3.1,5.4,2.1,Iris-virginica
141 | 6.7,3.1,5.6,2.4,Iris-virginica
142 | 6.9,3.1,5.1,2.3,Iris-virginica
143 | 5.8,2.7,5.1,1.9,Iris-virginica
144 | 6.8,3.2,5.9,2.3,Iris-virginica
145 | 6.7,3.3,5.7,2.5,Iris-virginica
146 | 6.7,3.0,5.2,2.3,Iris-virginica
147 | 6.3,2.5,5.0,1.9,Iris-virginica
148 | 6.5,3.0,5.2,2.0,Iris-virginica
149 | 6.2,3.4,5.4,2.3,Iris-virginica
150 | 5.9,3.0,5.1,1.8,Iris-virginica
151 |
--------------------------------------------------------------------------------
/data/mailing_lists.html:
--------------------------------------------------------------------------------
1 |
2 |
3 |
4 |
5 | R: Mailing Lists
6 |
7 |
8 |
9 |
10 |
Mailing Lists
11 |
12 |
Please read the instructions
13 | below and the posting guide
14 | before sending anything to any mailing list!
15 |
16 |
17 |
Thanks to Martin Maechler (and ETH Zurich),
18 | there are four general mailing lists devoted to R:
19 |
20 |
21 |
R-announce
22 |
This list is for major announcements about the
23 | development of R and the availability of new code.
24 | It has a low volume (typically only a few messages a
25 | month) and everyone mildly interested should consider subscribing,
26 | but note that R-help gets everything from R-announce as well, so
27 | you don't need to subscribe to both of them.
28 |
29 |
30 | Note that the list is moderated to be used for
31 | announcements mainly by the R Core Development Team.
32 |
33 | Use the
35 | web interface for information, subscription, archives, etc.
36 |
37 |
38 |
39 |
R-packages
40 |
41 | This list is for announcements as well, usually on
42 | the availability of new or enhanced contributed packages (on CRAN, typically).
44 |
45 |
46 |
47 | Note that the list is moderated. However, CRAN package
48 | authors (and others, similarly qualified) can freely post.
49 |
50 | As with R-announce, all messages to R-packages are
51 | automatically forwarded to the main R-help mailing list;
52 | we still recommend to subscribe to R-packages if you read R-help
53 | only in digest form.
54 |
55 | Use the
57 | web interface for information, subscription, archives, etc.
58 |
59 |
60 |
61 |
62 |
R-help
63 |
64 | The ‘main’ R mailing list, for discussion
65 | about problems and solutions using R, announcements (not
66 | covered by ‘R-announce’ or ‘R-packages’,
67 | see above), about the availability of new functionality for
68 | R and documentation of R, comparison and
69 | compatibility with S-plus, and for the posting of nice
70 | examples and benchmarks.
71 | Do read the posting guide
72 | before sending anything!
73 |
74 |
75 | This has become quite an active list with dozens of
76 | messages per day. An alternative is to subscribe and choose daily
77 | digests (in plain or MIME format).
78 |
79 | Use the
80 | web interface for information, subscription, archives, etc.
81 |
82 |
83 |
84 |
85 |
R-devel
86 |
This list is intended for questions and discussion about code
87 | development in R. Questions likely to prompt discussion
88 | unintelligible to non-programmers or topics that are too technical
89 | for R-help's audience should go to R-devel, see the posting guide section.
91 | The list is also for proposals of new functionality for R,
92 | and pre-testing of new versions. It is meant particularly for
93 | those who maintain an active position in the development of
94 | R. Therefore, it also receives all (filtered,
95 | i.e. non-spam!) bug reports from R-bugs.
97 |
98 | If you don't want to receive more than a daily message, you
99 | can subscribe and choose digests (in plain or MIME format).
100 |
101 | Use the
102 | web interface for information, subscription, archives, etc.
103 |
104 |
105 |
106 | Additionally, there are several specific Special Interest
107 | Group (=: SIG) mailing lists;
108 | however, do post to only one list at a time ('SIG' or general one),
109 | cross-posting is considered to be impolite.
110 |
203 |
204 | To satisfy geographic or regional (or subject) needs, some R users have
205 | formed "R User Groups" for which there are mailing
206 | lists. Information about (some of) these groups and their lists can be
207 | found at the
208 | RUG web page,
209 | maintained by John C. Nash.
210 |
211 |
212 |
Gmane also several of our lists, with web, news, and RSS interfaces
220 | and search facilities, see the index of Gmane's
222 | mirrored R-lists.
223 |
224 |
Have a look at CRAN's search page for
226 | searchable versions of the mailing list archives.
227 |
Information about the list can be obtained by sending an email
252 | with ‘info’ as its contents to r-help-request@R-project.org.
254 |
255 | Note that you can subscribe and unsubscribe by E-mail
256 | (instead of the web interface), however to unsubscribe you currently need the
257 | mailing list password which you get when subscribing and in a monthly reminder.
258 |
259 |
260 |
To send a message to everyone on the r-help mailing list, send
261 | email to
262 | r-help@R-project.org.
263 |
264 | Do please create a new email message when posting to the list rather than
265 | replying to a previous message and simply changing the subject line!
266 | This allows sensible threading in the mailing list archives (and
267 | many users' e-mail readers).
268 |
269 | Subscription and posting to the other lists is done
270 | analogously, with ‘r-help’ replaced by ‘r-announce’ and
271 | ‘r-devel’, respectively. Note that the r-announce list is
272 | gatewayed into r-help, so you don't need to subscribe to both of
273 | them.
274 |
275 |
It is recommended that you send mail to r-help (or r-devel if
276 | appropriate) rather than only to the R developers (who are also
277 | subscribed to the list, of course). This may save them precious time they
278 | can use for constantly improving R, and will typically also
279 | result in much quicker feedback for yourself.
280 |
281 |
Of course, in the case of bug reports it would be very helpful
282 | to have code which reliably reproduces the problem, see the
283 |
284 | entry in the R FAQ.
285 |
286 |
287 |
288 |
289 | Author: R-core
290 | Problems with the mailing lists: Use the links above and their e-mail
291 | addresses.
292 |
293 |
Thomson, A. and Randall-Maciver, R. (1905) Ancient Races of the Thebaid, Oxford: Oxford University Press.
17 | Also found in: Hand, D.J., et al. (1994) A Handbook of Small Data Sets, New York: Chapman & Hall, pp. 299-301.
18 | Manly, B.F.J. (1986) Multivariate Statistical Methods, New York: Chapman & Hall.
19 |
Authorization:
Contact Authors
20 |
Description:
Four measurements of male Egyptian skulls from 5 different time periods. Thirty skulls are measured from each time period.
21 |
22 |
23 |
Number of cases:
150
24 |
Variable Names:
25 |
26 |
MB: Maximal Breadth of Skull
27 |
BH: Basibregmatic Height of Skull
28 |
BL: Basialveolar Length of Skull
29 |
NH: Nasal Height of Skull
30 |
Year: Approximate Year of Skull Formation (negative = B.C., positive = A.D.)
31 |