├── .gitignore
├── 00_introduction
│   └── notes.md
├── 01_relational-databases
│   ├── relational-model-annotated-slides.pdf
│   ├── relational-querying-annotated-slides.pdf
│   └── summary.md
├── 02_xml-data
│   ├── data
│   │   ├── Bookstore-DTD.xml
│   │   ├── Bookstore-IDREFs.xml
│   │   ├── Bookstore-XSD.xml
│   │   ├── Bookstore-noDTD.xml
│   │   └── Bookstore.xsd
│   ├── dtds-annotated-slides.pdf
│   ├── dtds.md
│   ├── exercises
│   │   ├── countries.dtd
│   │   ├── countries.xml
│   │   ├── courses-ID.dtd
│   │   ├── courses-ID.xml
│   │   ├── courses-noID.dtd
│   │   └── courses-noID.xml
│   ├── well-formed-xml-annotated-slides.pdf
│   ├── well-formed-xml.md
│   ├── xml-quiz-solutions.md
│   ├── xml-quiz.md
│   ├── xml-schema-annotated-slides.pdf
│   └── xml-schema.md
├── 03_json-data
│   ├── data
│   │   ├── Bookstore.json
│   │   └── BookstoreSchema.json
│   ├── json-demo-annotated-slides.pdf
│   ├── json-demo.md
│   ├── json-intro-annotated-slides.pdf
│   ├── json-intro.md
│   ├── json-quiz-solutions.md
│   └── json-quiz.md
├── 04_relational-algebra
│   ├── relational-algebra-1-annotated-slides.pdf
│   ├── relational-algebra-2-annotated-slides.pdf
│   ├── relational-algebra-questions.md
│   └── relational-algebra-questions.pdf
└── README.md
/.gitignore:
--------------------------------------------------------------------------------
1 | script-files
2 |
--------------------------------------------------------------------------------
/00_introduction/notes.md:
--------------------------------------------------------------------------------
1 | # Introduction
2 |
3 | - Database applications may be programmed via **frameworks** (e.g. Ruby on Rails, Django)
4 | - A DBMS may run in conjunction with **middleware** (e.g. application servers, web servers)
5 | - Data-intensive applications may not use a DBMS at all
6 | - Data is not always processed through query languages associated with database systems
7 | - Hadoop is a processing framework for running operations on data that's stored in files
8 |
9 | # Features of a DBMS
10 |
11 | A Database Management System (DBMS) provides ...
12 |
13 | > ... efficient, reliable, convenient, and safe multi-user storage of and access to massive amounts of persistent data
14 |
15 | - Massive:
16 | - Terabytes
17 | - DBMSs are designed to handle data that is residing outside of memory
18 | - Persistent
19 | - Data in the database outlives the programs that execute on that data
20 | - Safe
21 | - Needs to guarantee that managed data will stay in a consistent state
22 | - Failures come in all forms: hardware, software, power, users, etc.
23 | - Multi-user
24 | - Programs may allow different users to access data *concurrently*
25 | - ***Concurrency control***: a database mechanism that controls the way multiple users access the database; the control actually occurs at the level of the data items in the database
26 | - Similar to *file system concurrency* or even *variable concurrency* except it's more centered around the data itself
27 | - Convenient
28 | - Designed to make it easy to work with massive data
29 | - ***Physical Data Independence***: the way that data is actually stored and laid out on disk is independent of the way the programs think about the structure of the data
30 | - You could have a program that operates on a database and underneath there could be a complete change in the way the data is stored, yet the program itself would not have to be changed
31 | - ***High-level query languages***: these are *declarative*, meaning that in a query you describe what you want out of the database, but you don't need to describe the algorithm for getting the data out
32 | - Efficient
33 | - Has to perform thousands of queries or updates per second
34 | - Reliable
35 | - 99.99999% uptime is the type of guarantee that DBMSs are making for their applications
36 |
37 | ---
38 |
39 | # Key Concepts
40 |
41 | ## Data model
42 |
43 | - Set of records: in the relational data model, the data in the database is thought of as a set of records
44 | - XML documents: hierarchical structure of labeled values
45 | - Graph data model: all data in the DB is in the form of nodes and edges
46 |
47 | ## Schema versus data
48 |
49 | Like types and variables in a programming language
50 |
51 | A schema ...
52 | - sets up the structure of the database
53 | - is typically set up at the beginning and doesn't change very much, whereas the data changes rapidly
54 | - is normally set up with what's known as a ***data definition language***
55 |
56 | ## Data definition language (DDL)
57 |
58 | - Sometimes people use higher-level design tools that help them think about the design, and from there go to the data definition language
59 | - Used in general to set up a schema or structure for a particular database
60 | - Once the schema has been set up and data has been loaded, it's possible to start querying and modifying the data with what's known as ***data manipulation language***
61 |
62 | ## Data manipulation or query language (DML)
63 |
64 | - Queries and modifies the database (duh)
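
The DDL/DML split above can be sketched with Python's built-in `sqlite3` module and an in-memory database. This is only an illustration: the `Student` table, its columns, and the sample rows are assumptions, not something taken from the course material.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: define the schema (structure) of a relation.
conn.execute("CREATE TABLE Student (ID INTEGER PRIMARY KEY, name TEXT, GPA REAL)")

# DML: modify the data ...
conn.executemany(
    "INSERT INTO Student (ID, name, GPA) VALUES (?, ?, ?)",
    [(1, "Amy", 3.9), (2, "Bob", 3.6)],
)
conn.execute("UPDATE Student SET GPA = 3.7 WHERE ID = 2")

# ... and query it.
rows = conn.execute("SELECT name, GPA FROM Student ORDER BY ID").fetchall()
print(rows)
conn.close()
```

The schema is created once, while the `INSERT`, `UPDATE`, and `SELECT` statements can be repeated as often as the data changes.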
65 |
66 | ---
67 |
68 | # Key People
69 |
70 | ## DBMS Implementer
71 |
72 | - Implements (builds) the database system itself
73 |
74 | ## Database designer
75 |
76 | - Establishes the schema for a database
77 | - If working on an application, the database designer has to figure out how to structure the data before the application is built
78 | - Surprisingly difficult job when complex data is involved in an application
79 |
80 | ## Database application developer
81 |
82 | - Builds the applications or programs that operate on the database
83 | - Often interfacing between the eventual user and the data itself
84 | - You can have a database with many different programs operating on it, e.g. a sales database where some applications insert sales as they happen while others analyze the sales
85 | - Not necessary to have *one-to-one coupling* between programs and databases
86 |
87 | ## Database administrator
88 |
89 | - Loads the data and keeps it running smoothly
90 | - Very important job for large DB applications (highly paid)
91 | - DBMSs tend to have a number of tuning parameters, and getting those parameters right can make a significant difference in the performance of the database system
92 |
--------------------------------------------------------------------------------
/01_relational-databases/relational-model-annotated-slides.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/eddowh/coursera_intro-to-databases/d13161f219ab2ae95532513c95523d982fddd549/01_relational-databases/relational-model-annotated-slides.pdf
--------------------------------------------------------------------------------
/01_relational-databases/relational-querying-annotated-slides.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/eddowh/coursera_intro-to-databases/d13161f219ab2ae95532513c95523d982fddd549/01_relational-databases/relational-querying-annotated-slides.pdf
--------------------------------------------------------------------------------
/01_relational-databases/summary.md:
--------------------------------------------------------------------------------
1 | ### Atomic types
2 |
3 | It's typical for relational databases to have just atomic types in their attributes, but many database systems also support structured types inside attributes.
4 |
5 | ### Enumerated domain
6 |
7 | Most relational databases have a concept of enumerated domain. For example, the attribute `state` might be an enumerated domain for the 50 abbreviations for states.
8 |
9 | ### Schema
10 |
11 | The schema of a database is the structure of its relations. It includes the name of each relation, the attributes of the relation, and the types of those attributes.
12 |
13 | ### Instance
14 |
15 | The instance is the actual contents of the table at a given point in time.
16 |
17 | Typically, you set up a schema in advance, then the instances of the data will change over time.
18 |
19 | ### NULL
20 |
21 | Null values denote that a particular value is unknown or undefined.
22 |
23 | Null values are useful, but one has to be very careful in a database system when running queries over relations that contain them. For example, a condition such as `attribute == value OR attribute != value` won't match tuples where the attribute is `NULL`.
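
A small sketch of the NULL pitfall, using Python's built-in `sqlite3` module. The `Student` table and its three rows are made up for illustration, and the condition is written with SQL's `=` and `<>` operators.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Student (ID INTEGER PRIMARY KEY, GPA REAL)")
conn.executemany("INSERT INTO Student VALUES (?, ?)", [(1, 3.9), (2, 3.2), (3, None)])

# A condition that looks like it covers every row ...
either = conn.execute(
    "SELECT ID FROM Student WHERE GPA = 3.9 OR GPA <> 3.9 ORDER BY ID"
).fetchall()

# ... misses the NULL row, which has to be asked for explicitly.
with_null = conn.execute(
    "SELECT ID FROM Student WHERE GPA = 3.9 OR GPA <> 3.9 OR GPA IS NULL ORDER BY ID"
).fetchall()

print(either)
print(with_null)
```

Comparing `NULL` with anything yields "unknown" rather than true, so the row with the missing GPA only shows up once `IS NULL` is part of the condition.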
24 |
25 | ### Key
26 |
27 | A key is an attribute (or set of attributes) of a relation whose value is unique in every tuple, so each tuple can be identified by it. For example, a student ID number in a student relation may very well be a key (since each student usually has a unique ID).
28 |
29 | Why it's important to have attributes that are identified as keys:
30 |
31 | - If you want to run a query to get a specific tuple out of the database, you would do so by asking for that tuple by its key.
32 | - Database systems, for efficiency, tend to build special index structures or store the database in a particular way so it's very fast to find a tuple based on its key
33 | - If one relation in a relational database wants to refer to tuples of another, there's no concept of pointer in relational databases. Therefore, the first relation will typically refer to a tuple in the second relation by its unique key.
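
The three uses of keys listed above can be sketched with Python's built-in `sqlite3` module. The table names, the column name `sID`, and the sample values are assumptions chosen for illustration. Note that SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.execute("CREATE TABLE Student (sID INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE Apply (sID INTEGER REFERENCES Student(sID), college TEXT)")
conn.execute("INSERT INTO Student VALUES (123, 'Amy')")
conn.execute("INSERT INTO Apply VALUES (123, 'Stanford')")

# Look up one specific tuple by its key.
amy = conn.execute("SELECT name FROM Student WHERE sID = 123").fetchone()

# A second tuple with the same key value is rejected.
try:
    conn.execute("INSERT INTO Student VALUES (123, 'Bob')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# So is a reference to a key value that does not exist.
try:
    conn.execute("INSERT INTO Apply VALUES (999, 'MIT')")
    dangling_rejected = False
except sqlite3.IntegrityError:
    dangling_rejected = True

print(amy, duplicate_rejected, dangling_rejected)
```

`Apply` never stores a pointer to a student; it stores the student's key value, and the database checks that the value exists.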
34 |
35 | ### Steps in creating and using a (relational) database
36 |
37 | 1. Design schema; create using DDL (data definition language)
38 | 2. "Bulk load" initial data
39 | - fairly common for database to be initially loaded from data that comes from outside source
40 | 3. Repeat: execute queries and modifications
41 |
42 | ### Ad-hoc queries in high-level language
43 |
44 | **ad hoc** = posing queries that one didn't think of in advance; unnecessary to write long programs for specific queries
45 |
46 | Rather, the language can be used to pose a query as you think about what you want to ask.
47 |
48 | - Some easy to pose; some a bit harder
49 | - Some easy for DBMS to execute efficiently; some harder (not correlated with above)
50 | - "Query language" also used to modify data
51 |
52 | #### Example
53 |
54 | - All students with GPA >3.7 applying to Stanford and MIT only
55 | - All engineering departments in CA with < 500 applicants
56 | - College with highest average accept rate over last 5 years
57 |
58 | ### Queries return relations (*compositional, closed*)
59 |
60 | When you get back the same type of object that you query, that's known as ***closure*** of the language.
61 |
62 | ***Compositionality*** is the ability to run a query over the result of a previous query.
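
Closure and compositionality can be illustrated with a nested query, sketched here with Python's built-in `sqlite3` module. The `Student` table, its rows, and the alias `HighGPA` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Student (ID INTEGER PRIMARY KEY, name TEXT, GPA REAL)")
conn.executemany(
    "INSERT INTO Student VALUES (?, ?, ?)",
    [(1, "Amy", 3.9), (2, "Bob", 3.6), (3, "Craig", 3.8)],
)

# Inner query: returns a relation (students with GPA > 3.7).
# Outer query: treats that result as just another relation to query.
names = conn.execute(
    """
    SELECT name
    FROM (SELECT ID, name FROM Student WHERE GPA > 3.7) AS HighGPA
    ORDER BY name
    """
).fetchall()

print(names)
```

Because the inner `SELECT` produces a relation, it can appear anywhere a table name could, which is what makes the language compositional.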
63 |
64 |
65 | ### Query Languages
66 |
67 | - ***Relational Algebra***: formal
68 | - Statement: *IDs of students with GPA > 3.7 applying to Stanford*
69 | - Expression: `π_ID(σ_{GPA > 3.7 ∧ college = 'Stanford'}(Student ⋈ Apply))`
70 |
71 | - ***SQL***: actual/implemented
72 | ```sql
73 | SELECT Student.ID
74 | FROM Student, Apply
75 | WHERE Student.ID = Apply.ID
76 | AND GPA > 3.7 AND college = 'Stanford';
77 | ```
78 |
--------------------------------------------------------------------------------
/02_xml-data/data/Bookstore-DTD.xml:
--------------------------------------------------------------------------------
1 |
2 |
3 |
4 |
5 |
6 |
7 |
8 |
9 |
11 |
12 |
15 |
16 |
17 |
18 |
19 |
20 |
21 |
22 |
23 | ]>
24 |
25 |
26 |
27 | A First Course in Database Systems
28 |
29 |
30 | Jeffrey
31 | Ullman
32 |
33 |
34 | Jennifer
35 | Widom
36 |
37 |
38 |
39 |
40 | Database Systems: The Complete Book
41 |
42 |
43 | Hector
44 | Garcia-Molina
45 |
46 |
47 | Jeffrey
48 | Ullman
49 |
50 |
51 | Jennifer
52 | Widom
53 |
54 |
55 |
56 | Buy this book bundled with "A First Course" - a great deal!
57 |
58 |
59 |
60 |
--------------------------------------------------------------------------------
/02_xml-data/data/Bookstore-IDREFs.xml:
--------------------------------------------------------------------------------
1 |
2 |
3 |
4 |
5 |
6 |
7 |
8 |
10 |
11 |
14 |
15 |
16 |
17 |
18 |
19 |
20 |
21 |
22 | ]>
23 |
24 |
25 |
26 | A First Course in Database Systems
27 |
28 |
29 | Database Systems: The Complete Book
30 |
31 | Amazon.com says: Buy this book bundled with
32 | - a great deal!
33 |
34 |
35 |
36 | Hector
37 | Garcia-Molina
38 |
39 |
40 | Jeffrey
41 | Ullman
42 |
43 |
44 | Jennifer
45 | Widom
46 |
47 |
48 |
--------------------------------------------------------------------------------
/02_xml-data/data/Bookstore-XSD.xml:
--------------------------------------------------------------------------------
1 |
2 |
3 |
4 |
5 |
6 |
7 |
8 |
9 |
10 |
11 |
12 |
13 |
14 |
15 |
16 |
17 |
18 |
19 | A First Course in Database Systems
20 |
21 |
22 |
23 |
24 |
25 |
26 | Database Systems: The Complete Book
27 |
28 |
29 |
30 |
31 |
32 |
33 | Amazon.com says: Buy this book bundled with
34 | - a great deal!
35 |
36 |
37 |
38 | Hector
39 | Garcia-Molina
40 |
41 |
42 | Jeffrey
43 | Ullman
44 |
45 |
46 | Jennifer
47 | Widom
48 |
49 |
50 |
--------------------------------------------------------------------------------
/02_xml-data/data/Bookstore-noDTD.xml:
--------------------------------------------------------------------------------
1 |
2 |
3 |
4 |
5 |
6 | A First Course in Database Systems
7 |
8 |
9 | Jeffrey
10 | Ullman
11 |
12 |
13 | Jennifer
14 | Widom
15 |
16 |
17 |
18 |
19 |
20 | Buy this book bundled with "A First Course" - a great deal!
21 |
22 | Database Systems: The Complete Book
23 |
24 |
25 | Hector
26 | Garcia-Molina
27 |
28 |
29 | Jeffrey
30 | Ullman
31 |
32 |
33 | Jennifer
34 | Widom
35 |
36 |
37 |
38 |
39 |
--------------------------------------------------------------------------------
/02_xml-data/data/Bookstore.xsd:
--------------------------------------------------------------------------------
1 |
2 |
3 |
4 |
5 |
6 |
7 |
8 |
10 |
12 |
13 |
14 |
15 |
16 |
17 |
18 |
19 |
20 |
21 |
22 |
23 |
24 |
25 |
26 |
27 |
28 |
29 |
30 |
31 |
32 |
33 |
34 |
35 |
36 |
37 |
38 |
39 |
41 |
42 |
43 |
44 |
45 |
46 |
47 |
48 |
49 |
51 |
52 |
54 |
55 |
56 |
57 |
58 |
59 |
60 |
61 |
62 |
63 |
64 |
65 |
66 |
67 |
68 |
69 |
70 |
71 |
--------------------------------------------------------------------------------
/02_xml-data/dtds-annotated-slides.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/eddowh/coursera_intro-to-databases/d13161f219ab2ae95532513c95523d982fddd549/02_xml-data/dtds-annotated-slides.pdf
--------------------------------------------------------------------------------
/02_xml-data/dtds.md:
--------------------------------------------------------------------------------
1 | # *Valid* XML
2 |
3 | Valid XML has to meet the same basic structural requirements as well-formed XML, and it also adheres to a ***content-specific specification***, written as one of:
4 |
5 | - **Document Type Definition (DTD)**
6 | - **XML Schema Definition (XSD)**
7 |
8 | ## Parsing Valid XML
9 |
10 | 1. Send the XML document to a validating XML parser
11 | 2. Give the parser a specification as additional input: either a DTD or an XSD
12 | 3. If the structural requirements are not met, the parser reports that the document is not well-formed
13 | 4. If the content-specific specification is not met, the parser reports that the document is invalid
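
Step 3 on its own can be sketched with Python's standard library. `xml.etree.ElementTree` is a non-validating parser, so this only checks well-formedness; checking validity against a DTD or XSD (step 4) needs a validating parser. The helper name `is_well_formed` and the `Title` element in the sample strings are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if `text` parses as XML, False otherwise."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

good = "<Bookstore><Book><Title>A First Course</Title></Book></Bookstore>"
bad = "<Bookstore><Book><Title>A First Course</Book></Bookstore>"  # <Title> never closed

print(is_well_formed(good), is_well_formed(bad))
```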
14 |
15 | ---
16 |
17 | # Document Type Definition (DTD)
18 |
19 | - Grammar-like language for specifying elements, attributes, nesting, ordering, # of occurrences
20 | - Also allows special attribute types such as `ID` and `IDREF(S)`
21 | - effectively allows one to specify pointers within a document, although these *pointers are untyped*
22 |
23 | ---
24 |
25 | # DTD/XSD versus none (well-formed)
26 |
27 | ### Pros
28 |
29 | - Programs can assume structure
30 | - programs can be simpler because they don't have to do a lot of error checking on data
31 | - CSS/XSL
32 | - Specification language for conveying what XML might need to look like
33 | - e.g. data exchange using XML => companies can use the DTD as a specification for what the XML needs to look like when it arrives at the program that is going to operate on it
34 | - Documentation
35 | - Use one of the specifications to document what the data itself looks like
36 | - Essentially enjoys benefits of strongly typed data as opposed to loosely-typed data
37 |
38 | ### Cons
39 |
40 | - Flexibility
41 | - A DTD forces the XML data to conform to a specification
42 | - Without one, data formats can change easily without running into validation errors
43 | - DTDs can potentially be messy
44 | - especially for *irregular* documents
45 | - XSDs can potentially become VERY messy
46 | - it's possible to have a document where the specification is much, much larger than the document itself
47 | - Going without a DTD/XSD essentially enjoys the benefits of untyped data
48 |
49 | ---
50 |
51 | # Examples
52 |
53 | ## [`Bookstore-DTD.xml`](./data/Bookstore-DTD.xml)
54 |
55 | Notice a few things:
56 |
57 | - The DTD is specified before the XML in the same file. However, it is also possible to specify DTDs in separate files.
58 | - The first grammar-like construct, `(Book | Magazine)*`, is a little bit like a regular expression
59 | - `(Book | Magazine)` means that each sub-element of a bookstore element is called either `Book` or `Magazine`
60 | - `*` means zero or more instances
61 | - `CDATA`: type of data, in this case a string
62 | - `#REQUIRED`: has to be present; `#IMPLIED`: doesn't have to be present
63 | - `#PCDATA` is when you have a leaf that consists of text data in the XML tree
64 | - `(Author+)`: one or more instances of author sub-elements
65 |
66 | ```xml
67 |
68 |
69 |
70 |
71 |
72 |
73 |
74 |
75 |
77 |
78 |
81 |
82 |
83 |
84 |
85 |
86 |
87 |
88 |
89 | ]>
90 |
91 |
92 |
93 | A First Course in Database Systems
94 |
95 |
96 | Jeffrey
97 | Ullman
98 |
99 |
100 | Jennifer
101 | Widom
102 |
103 |
104 |
105 |
106 | Database Systems: The Complete Book
107 |
108 |
109 | Hector
110 | Garcia-Molina
111 |
112 |
113 | Jeffrey
114 | Ullman
115 |
116 |
117 | Jennifer
118 | Widom
119 |
120 |
121 |
122 | Buy this book bundled with "A First Course" - a great deal!
123 |
124 |
125 |
126 | ```
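
To make the regular-expression analogy concrete, here is a sketch that checks the `(Book | Magazine)*` content model by hand with Python's `re` module. This is not how a validating parser is implemented; the helper name `children_match` and the sample documents are assumptions for illustration.

```python
import re
import xml.etree.ElementTree as ET

# The DTD content model (Book | Magazine)* written as a regular expression
# over the space-terminated sequence of child element names.
BOOKSTORE_MODEL = re.compile(r"(?:(?:Book|Magazine) )*")

def children_match(xml_text):
    """Return True if the root's children fit the (Book | Magazine)* model."""
    root = ET.fromstring(xml_text)
    names = "".join(child.tag + " " for child in root)
    return BOOKSTORE_MODEL.fullmatch(names) is not None

print(children_match("<Bookstore><Book/><Magazine/><Book/></Bookstore>"))
print(children_match("<Bookstore/>"))  # zero children is allowed by *
print(children_match("<Bookstore><Book/><Pamphlet/></Bookstore>"))
```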
127 |
128 | ## [`Bookstore-IDREFs.xml`](./data/Bookstore-IDREFs.xml)
129 |
130 | As before, the DTD is specified before the XML in the same file.
131 |
132 | Notice a few things:
133 |
134 | - Instead of having authors as subelements of book elements, have authors listed separately, and then effectively point from books to the authors of the books
135 | - Add `Ident` attribute to authors, give a string value to that attribute that we're going to use effectively for the pointers in the book
136 | - `Authors` is an `IDREFS` attribute: its value can refer to one or more strings that are `ID` attributes
137 | - `(#PCDATA | BookRef)*` is the type of construct that can be used to mix strings and sub-elements within an element
138 | - `ID` attribute values all need to be unique within the document
139 | - A singular `IDREF` value has to be exactly one `ID` value
140 | - A plural `IDREFS` value is one or more `ID` values separated by spaces
141 |
142 | ```xml
143 |
144 |
145 |
146 |
147 |
148 |
149 |
150 |
152 |
153 |
156 |
157 |
158 |
159 |
160 |
161 |
162 |
163 |
164 | ]>
165 |
166 |
167 |
168 | A First Course in Database Systems
169 |
170 |
171 | Database Systems: The Complete Book
172 |
173 | Amazon.com says: Buy this book bundled with
174 | - a great deal!
175 |
176 |
177 |
178 | Hector
179 | Garcia-Molina
180 |
181 |
182 | Jeffrey
183 | Ullman
184 |
185 |
186 | Jennifer
187 | Widom
188 |
189 |
190 | ```
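
The `ID`/`IDREFS` rules listed above can be checked by hand, as in this rough Python sketch. The inline document is a hypothetical, stripped-down stand-in for the bookstore data, with a deliberately dangling reference `XX`. The sketch also assumes that only `Author` elements carry IDs, whereas a real validator treats all `ID` attributes in the document as one namespace.

```python
import xml.etree.ElementTree as ET

doc = """
<Bookstore>
  <Book Authors="HG JU"/>
  <Book Authors="JU XX"/>
  <Author Ident="HG"/>
  <Author Ident="JU"/>
</Bookstore>
"""

root = ET.fromstring(doc)

# Collect every ID value; a value seen twice breaks the uniqueness rule.
ids = [a.get("Ident") for a in root.iter("Author")]
duplicates = {i for i in ids if ids.count(i) > 1}

# An IDREFS value is a space-separated list; each entry must be an existing ID.
dangling = sorted(
    ref
    for book in root.iter("Book")
    for ref in book.get("Authors").split()
    if ref not in ids
)

print(duplicates, dangling)
```

The check never asks what kind of element an ID belongs to, which is what "untyped pointers" means in practice.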
191 |
192 | ## Command-line validity checking
193 |
194 | Use `xmllint` for *linting* (error-checking):
195 |
196 | ```sh
197 | $ xmllint --valid --noout Bookstore-DTD.xml # missing data in #REQUIRED field
198 |
199 | Bookstore-DTD.xml:59: element Book: validity error : Database Systems: The Complete Book does not carry attribute Edition
200 |
201 | ^
202 | $ xmllint --valid --noout Bookstore-DTD.xml # no closing tags
203 |
204 | Bookstore-DTD.xml:64: parser error: Premature end of ...
205 |
206 | ```
207 |
208 | If it doesn't return anything, there are no errors, and you should be proud.
209 |
210 | ---
211 |
212 |
--------------------------------------------------------------------------------
/02_xml-data/exercises/countries.dtd:
--------------------------------------------------------------------------------
1 |
3 |
4 |
7 |
8 |
9 |
10 |
11 |
12 | ]>
13 |
--------------------------------------------------------------------------------
/02_xml-data/exercises/countries.xml:
--------------------------------------------------------------------------------
1 |
2 |
3 |
4 |
5 | Turkic
6 | Pashtu
7 | Afghan Persian
8 |
9 |
10 |
11 |
12 | Algiers
13 | 1507241
14 |
15 |
16 |
17 |
18 |
19 |
20 | English
21 |
22 |
23 | English
24 |
25 |
26 |
27 | La Matanza
28 | 1111811
29 |
30 |
31 | Cordoba
32 | 1208713
33 |
34 |
35 | Rosario
36 | 1118984
37 |
38 |
39 | Buenos Aires
40 | 2988006
41 |
42 |
43 |
44 |
45 | Yerevan
46 | 1200000
47 |
48 | Russian
49 | Armenian
50 |
51 |
52 |
53 |
54 | Sydney
55 | 3657000
56 |
57 |
58 | Brisbane
59 | 1302000
60 |
61 |
62 | Adelaide
63 | 1050000
64 |
65 |
66 | Melbourne
67 | 3081000
68 |
69 |
70 | Perth
71 | 1193000
72 |
73 | English
74 |
75 |
76 |
77 | Vienna
78 | 1583000
79 |
80 | German
81 |
82 |
83 |
84 | Baku
85 | 1740000
86 |
87 | Russian
88 | Armenian
89 | Azeri
90 |
91 |
92 |
93 |
94 |
95 | Dhaka
96 | 3839000
97 |
98 |
99 | Chittagong
100 | 1599000
101 |
102 |
103 |
104 | English
105 |
106 |
107 |
108 | Minsk
109 | 1540000
110 |
111 |
112 |
113 | French
114 | German
115 | Dutch
116 |
117 |
118 |
119 |
120 | English
121 |
122 |
123 |
124 |
125 | Serbo-Croatian
126 |
127 |
128 |
129 |
130 | Manaus
131 | 1158265
132 |
133 |
134 | Salvador
135 | 2209465
136 |
137 |
138 | Fortaleza
139 | 1967365
140 |
141 |
142 | Belo Horizonte
143 | 2091770
144 |
145 |
146 | Belem
147 | 1142258
148 |
149 |
150 | Curitiba
151 | 1465698
152 |
153 |
154 | Recife
155 | 1342877
156 |
157 |
158 | Rio de Janeiro
159 | 5533011
160 |
161 |
162 | Porto Alegre
163 | 1286251
164 |
165 |
166 | Sao Paulo
167 | 9811776
168 |
169 |
170 | Brasilia
171 | 1817001
172 |
173 |
174 |
175 | English
176 |
177 |
178 |
179 |
180 | Sofia
181 | 1300000
182 |
183 | Bulgarian
184 |
185 |
186 |
187 |
188 | Rangoon
189 | 2513000
190 |
191 | Burmese
192 |
193 |
194 |
195 |
196 |
197 |
198 | Montreal
199 | 1017666
200 |
201 |
202 |
203 |
204 | English
205 |
206 |
207 |
208 |
209 |
210 | Santiago
211 | 4318000
212 |
213 | Spanish
214 |
215 |
216 |
217 | Hong Kong
218 | 6218000
219 |
220 |
221 | Hefei
222 | 1000000
223 |
224 |
225 | Huainan
226 | 1200000
227 |
228 |
229 | Lanzhou
230 | 1510000
231 |
232 |
233 | Guangzhou
234 | 3580000
235 |
236 |
237 | Guiyang
238 | 1530000
239 |
240 |
241 | Shijiazhuang
242 | 1320000
243 |
244 |
245 | Tangshan
246 | 1500000
247 |
248 |
249 | Handan
250 | 1110000
251 |
252 |
253 | Harbin
254 | 2830000
255 |
256 |
257 | Qiqihar
258 | 1380000
259 |
260 |
261 | Zhengzhou
262 | 1710000
263 |
264 |
265 | Luoyang
266 | 1190000
267 |
268 |
269 | Wuhan
270 | 3750000
271 |
272 |
273 | Changsha
274 | 1330000
275 |
276 |
277 | Nanjing
278 | 2500000
279 |
280 |
281 | Fuzhou
282 | 1290000
283 |
284 |
285 | Nanchang
286 | 1350000
287 |
288 |
289 | Jilin
290 | 1270000
291 |
292 |
293 | Changchun
294 | 2110000
295 |
296 |
297 | Shenyang
298 | 4540000
299 |
300 |
301 | Dalian
302 | 2400000
303 |
304 |
305 | Anshan
306 | 1390000
307 |
308 |
309 | Fushun
310 | 1350000
311 |
312 |
313 | Xian
314 | 2760000
315 |
316 |
317 | Jinan
318 | 2320000
319 |
320 |
321 | Zibo
322 | 2460000
323 |
324 |
325 | Qingdao
326 | 2060000
327 |
328 |
329 | Taiyuan
330 | 1960000
331 |
332 |
333 | Datong
334 | 1110000
335 |
336 |
337 | Chengdu
338 | 2810000
339 |
340 |
341 | Chongqing
342 | 2980000
343 |
344 |
345 | Kunming
346 | 1520000
347 |
348 |
349 | Hangzhou
350 | 1340000
351 |
352 |
353 | Ningbo
354 | 1090000
355 |
356 |
357 | Nanning
358 | 1070000
359 |
360 |
361 | Baotou
362 | 1200000
363 |
364 |
365 | Urumqi
366 | 1160000
367 |
368 |
369 | Beijing
370 | 7000000
371 |
372 |
373 | Shanghai
374 | 7830000
375 |
376 |
377 | Tianjin
378 | 5770000
379 |
380 |
381 |
382 | English
383 |
384 |
385 | English
386 |
387 |
388 |
389 | Medellin
390 | 1621356
391 |
392 |
393 | Barranquilla
394 | 1064255
395 |
396 |
397 | Bogota
398 | 5237635
399 |
400 |
401 | Cali
402 | 1718871
403 |
404 | Spanish
405 |
406 |
407 |
408 |
409 |
410 |
411 |
412 | Serbo-Croatian
413 |
414 |
415 |
416 | Havana
417 | 2241000
418 |
419 | Spanish
420 |
421 |
422 |
423 |
424 | Prague
425 | 1215000
426 |
427 |
428 |
429 |
430 | Copenhagen
431 | 1358540
432 |
433 |
434 |
435 |
436 |
437 |
438 | Santo Domingo
439 | 1400000
440 |
441 | Spanish
442 |
443 |
444 |
445 | Quito
446 | 1200000
447 |
448 |
449 | Guayaquil
450 | 1300868
451 |
452 |
453 |
454 |
455 | El Giza
456 | 1671000
457 |
458 |
459 | Alexandria
460 | 2917000
461 |
462 |
463 | Cairo
464 | 6053000
465 |
466 |
467 |
468 |
469 |
470 |
471 |
472 |
473 | Addis Ababa
474 | 2316400
475 |
476 |
477 |
478 | English
479 |
480 |
481 |
482 |
483 | Swedish
484 | Finnish
485 |
486 |
487 |
488 | Paris
489 | 2152423
490 |
491 | French
492 |
493 |
494 | French
495 |
496 |
497 |
498 |
499 |
500 |
501 | Tbilisi
502 | 1200000
503 |
504 | Russian
505 | Armenian
506 | Azeri
507 | Georgian
508 |
509 |
510 |
511 | Munchen
512 | 1244676
513 |
514 |
515 | Muenchen
516 | 1290079
517 |
518 |
519 | Berlin
520 | 3472009
521 |
522 |
523 | Hamburg
524 | 1705872
525 |
526 | German
527 |
528 |
529 |
530 |
531 |
532 |
533 |
534 | French
535 |
536 |
537 |
538 | Spanish
539 | Indian
540 |
541 |
542 |
543 | French
544 |
545 |
546 |
547 |
548 | French
549 |
550 |
551 |
552 |
553 |
554 | Budapest
555 | 2016000
556 |
557 | Hungarian
558 |
559 |
560 | Icelandic
561 |
562 |
563 |
564 | Hyderabad
565 | 3145939
566 |
567 |
568 | Ahmadabad
569 | 2954526
570 |
571 |
572 | Surat
573 | 1505872
574 |
575 |
576 | Vadodara
577 | 1061598
578 |
579 |
580 | Bangalore
581 | 3302296
582 |
583 |
584 | Bhopal
585 | 1062771
586 |
587 |
588 | Indore
589 | 1091674
590 |
591 |
592 | Mumbai
593 | 9925891
594 |
595 |
596 | Nagpur
597 | 1624752
598 |
599 |
600 | Pune
601 | 1566651
602 |
603 |
604 | Kalyan
605 | 1014557
606 |
607 |
608 | Ludhiana
609 | 1042740
610 |
611 |
612 | Jaipur
613 | 1458183
614 |
615 |
616 | Madras
617 | 3841396
618 |
619 |
620 | Lucknow
621 | 1619115
622 |
623 |
624 | Kanpur
625 | 1879420
626 |
627 |
628 | Calcutta
629 | 4399819
630 |
631 |
632 | New Delhi
633 | 7206704
634 |
635 | Hindi
636 |
637 |
638 |
639 | Jakarta
640 | 8259266
641 |
642 |
643 | Bandung
644 | 2058649
645 |
646 |
647 | Semarang
648 | 1250971
649 |
650 |
651 | Surabaya
652 | 2483871
653 |
654 |
655 | Palembang
656 | 1144279
657 |
658 |
659 | Medan
660 | 1730752
661 |
662 |
663 |
664 |
665 | Tabriz
666 | 1166203
667 |
668 |
669 | Esfahan
670 | 1220595
671 |
672 |
673 | Shiraz
674 | 1042801
675 |
676 |
677 | Mashhad
678 | 1964489
679 |
680 |
681 | Tehran
682 | 6750043
683 |
684 | Turkish
685 | Baloch
686 | Kurdish
687 | Arabic
688 | Luri
689 | Persian Persian
690 | Turkic Turkic
691 |
692 |
693 |
694 | Baghdad
695 | 4478000
696 |
697 |
698 |
699 |
700 |
701 |
702 | Milano
703 | 1432184
704 |
705 |
706 | Rome
707 | 2791354
708 |
709 |
710 | Napoli
711 | 1206013
712 |
713 |
714 |
715 |
716 |
717 | Sapporo
718 | 1748000
719 |
720 |
721 | Tokyo
722 | 7843000
723 |
724 |
725 | Yokohama
726 | 3256000
727 |
728 |
729 | Kawasaki
730 | 1187000
731 |
732 |
733 | Nagoya
734 | 2108000
735 |
736 |
737 | Kyoto
738 | 1415000
739 |
740 |
741 | Osaka
742 | 2492000
743 |
744 |
745 | Kobe
746 | 1388000
747 |
748 |
749 | Hiroshima
750 | 1099000
751 |
752 |
753 | Fukuoka
754 | 1273000
755 |
756 |
757 | Kita kyushu
758 | 1012000
759 |
760 | Japanese
761 |
762 |
763 |
764 |
765 |
766 | Almaty
767 | 1172400
768 |
769 | Kazak
770 |
771 |
772 |
773 | Nairobi
774 | 2000000
775 |
776 |
777 |
778 |
779 |
780 |
781 |
782 |
783 |
784 |
785 | English
786 |
787 |
788 |
789 |
790 |
791 |
792 | Portuguese
793 |
794 |
795 | Albanian
796 | Macedonian
797 | Turkish
798 | Serbo-Croatian
799 |
800 |
801 |
802 | Antananarivo
803 | 1250000
804 |
805 |
806 |
807 |
808 |
809 | Kuala Lumpur
810 | 1145075
811 |
812 |
813 |
814 |
815 | Bambara
816 |
817 |
818 |
819 |
820 |
821 |
822 |
823 |
824 |
825 |
826 | Nezahualcoyotl
827 | 1255456
828 |
829 |
830 | Guadalajara
831 | 1650042
832 |
833 |
834 | Monterrey
835 | 1068996
836 |
837 |
838 | Puebla
839 | 1007170
840 |
841 |
842 | Mexico
843 | 9815795
844 |
845 |
846 |
847 |
848 |
849 |
850 | Khalkha Mongol
851 |
852 |
853 | English
854 |
855 |
856 |
857 | Rabat
858 | 1385872
859 |
860 |
861 | Casablanca
862 | 2940623
863 |
864 |
865 |
866 | Portuguese
867 |
868 |
869 | German
870 | English
871 | Afrikaans
872 |
873 |
874 |
875 | Nepali
876 |
877 |
878 |
879 | Amsterdam
880 | 1101407
881 |
882 |
883 | Rotterdam
884 | 1078747
885 |
886 | Dutch
887 |
888 |
889 |
890 |
891 |
892 |
893 | Managua
894 | 1195000
895 |
896 | Spanish
897 |
898 |
899 |
900 |
901 | Lagos
902 | 5686000
903 |
904 |
905 | Ibadan
906 | 1263000
907 |
908 |
909 |
910 |
911 |
912 |
913 | Pyongyang
914 | 2335000
915 |
916 | Korean
917 |
918 |
919 |
920 | Norwegian
921 |
922 |
923 |
924 |
925 | Hyderabad
926 | 1107000
927 |
928 |
929 | Peshawar
930 | 1676000
931 |
932 |
933 | Lahore
934 | 5085000
935 |
936 |
937 | Karachi
938 | 9863000
939 |
940 |
941 | Faisalabad
942 | 1875000
943 |
944 |
945 | Gujranwala
946 | 1663000
947 |
948 |
949 | Rawalpindi
950 | 1290000
951 |
952 |
953 | Multan
954 | 1257000
955 |
956 | Pashtu
957 | Urdu
958 | Punjabi
959 | Sindhi
960 | Balochi
961 | Hindko
962 | Brahui
963 | Siraiki
964 |
965 |
966 |
967 | English
968 |
969 |
970 | English
971 |
972 |
973 |
974 |
975 | Lima
976 | 6321173
977 |
978 |
979 |
980 |
981 | Manila
982 | 1655000
983 |
984 |
985 | Quezon
986 | 1989000
987 |
988 |
989 | Kalookan
990 | 1023000
991 |
992 |
993 | Davao
994 | 1007000
995 |
996 |
997 |
998 |
999 |
1000 | Warsaw
1001 | 1655000
1002 |
1003 | Polish
1004 |
1005 |
1006 | Portuguese
1007 |
1008 |
1009 |
1010 |
1011 |
1012 |
1013 | Bucharest
1014 | 2037000
1015 |
1016 |
1017 |
1018 |
1019 | Sankt Peterburg
1020 | 4838000
1021 |
1022 |
1023 | Moscow
1024 | 8717000
1025 |
1026 |
1027 | Nizhniy Novgorod
1028 | 1383000
1029 |
1030 |
1031 | Kazan
1032 | 1085000
1033 |
1034 |
1035 | Volgograd
1036 | 1003000
1037 |
1038 |
1039 | Samara
1040 | 1184000
1041 |
1042 |
1043 | Rostov na Donu
1044 | 1026000
1045 |
1046 |
1047 | Ufa
1048 | 1094000
1049 |
1050 |
1051 | Perm
1052 | 1032000
1053 |
1054 |
1055 | Yekaterinburg
1056 | 1280000
1057 |
1058 |
1059 | Chelyabinsk
1060 | 1086000
1061 |
1062 |
1063 | Novosibirsk
1064 | 1369000
1065 |
1066 |
1067 | Omsk
1068 | 1163000
1069 |
1070 | Russian
1071 |
1072 |
1073 |
1074 | English
1075 |
1076 |
1077 | English
1078 |
1079 |
1080 |
1081 | French
1082 |
1083 |
1084 |
1085 | Italian
1086 |
1087 |
1088 | Portuguese
1089 |
1090 |
1091 |
1092 | Riyadh
1093 | 1250000
1094 |
1095 | Arabic
1096 |
1097 |
1098 |
1099 | Dakar
1100 | 1382000
1101 |
1102 |
1103 |
1104 |
1105 | Belgrade
1106 | 1407073
1107 |
1108 | Albanian
1109 | Serbo-Croatian
1110 |
1111 |
1112 |
1113 |
1114 |
1115 | Singapore
1116 | 2558000
1117 |
1118 |
1119 |
1120 |
1121 | Slovenian
1122 | Serbo-Croatian
1123 |
1124 |
1125 | English
1126 |
1127 |
1128 |
1129 |
1130 |
1131 | Seoul
1132 | 10229262
1133 |
1134 |
1135 | Pusan
1136 | 3813814
1137 |
1138 |
1139 | Taegu
1140 | 2449139
1141 |
1142 |
1143 | Inchon
1144 | 2307618
1145 |
1146 |
1147 | Kwangju
1148 | 1257504
1149 |
1150 |
1151 | Taejon
1152 | 1272143
1153 |
1154 |
1155 |
1156 |
1157 | Barcelona
1158 | 1630867
1159 |
1160 |
1161 | Madrid
1162 | 3041101
1163 |
1164 | Catalan
1165 | Galician
1166 | Basque
1167 |
1168 |
1169 | Tamil
1170 | Sinhala
1171 |
1172 |
1173 |
1174 | Omdurman
1175 | 1267077
1176 |
1177 |
1178 |
1179 |
1180 |
1181 |
1182 | Swedish
1183 |
1184 |
1185 | French
1186 | German
1187 | Italian
1188 | Romansch
1189 |
1190 |
1191 |
1192 | Damascus
1193 | 1500000
1194 |
1195 |
1196 |
1197 |
1198 | Kaohsiung
1199 | 1426518
1200 |
1201 |
1202 | Taipei
1203 | 2626138
1204 |
1205 |
1206 |
1207 |
1208 |
1209 | Dar es Salaam
1210 | 1360850
1211 |
1212 |
1213 |
1214 |
1215 | Bangkok
1216 | 5876000
1217 |
1218 |
1219 |
1220 |
1221 |
1222 |
1223 |
1224 |
1225 | Adana
1226 | 1047300
1227 |
1228 |
1229 | Ankara
1230 | 2782200
1231 |
1232 |
1233 | Istanbul
1234 | 7615500
1235 |
1236 |
1237 | Izmir
1238 | 1985300
1239 |
1240 |
1241 |
1242 | Russian
1243 | Uzbek
1244 | Turkmen
1245 |
1246 |
1247 | English
1248 |
1249 |
1250 |
1251 |
1252 |
1253 | Dnipropetrovsk
1254 | 1187000
1255 |
1256 |
1257 | Donetsk
1258 | 1117000
1259 |
1260 |
1261 | Kharkiv
1262 | 1618000
1263 |
1264 |
1265 | Kiev
1266 | 2616000
1267 |
1268 |
1269 | Odesa
1270 | 1106000
1271 |
1272 |
1273 |
1274 |
1275 |
1276 | London
1277 | 6967500
1278 |
1279 |
1280 | Birmingham
1281 | 1008400
1282 |
1283 |
1284 |
1285 |
1286 | Phoenix
1287 | 1159014
1288 |
1289 |
1290 | Los Angeles
1291 | 3553638
1292 |
1293 |
1294 | San Diego
1295 | 1171121
1296 |
1297 |
1298 | Chicago
1299 | 2721547
1300 |
1301 |
1302 | Detroit
1303 | 1000272
1304 |
1305 |
1306 | New York
1307 | 7380906
1308 |
1309 |
1310 | Philadelphia
1311 | 1478002
1312 |
1313 |
1314 | Houston
1315 | 1744058
1316 |
1317 |
1318 | San Antonio
1319 | 1067816
1320 |
1321 |
1322 | Dallas
1323 | 1053292
1324 |
1325 |
1326 |
1327 |
1328 | Montevideo
1329 | 1247000
1330 |
1331 |
1332 |
1333 |
1334 | Tashkent
1335 | 2106000
1336 |
1337 | Russian
1338 | Tajik
1339 | Uzbek
1340 |
1341 |
1342 |
1343 |
1344 | Maracaibo
1345 | 1249670
1346 |
1347 |
1348 | Caracas
1349 | 1822465
1350 |
1351 |
1352 |
1353 |
1354 | Hanoi
1355 | 3056146
1356 |
1357 |
1358 | Haiphong
1359 | 1447523
1360 |
1361 |
1362 | Ho Chi Minh City
1363 | 3924435
1364 |
1365 |
1366 |
1367 |
1368 |
1369 |
1370 |
1371 | Arabic
1372 |
1373 |
1374 |
1375 | Kinshasa
1376 | 4655313
1377 |
1378 |
1379 |
1380 |
1381 |
1382 |
--------------------------------------------------------------------------------
/02_xml-data/exercises/courses-ID.dtd:
--------------------------------------------------------------------------------
1 |
3 |
4 |
6 |
7 |
8 |
9 |
10 |
11 |
15 |
16 |
17 |
18 |
19 |
20 |
21 |
22 | ]>
23 |
--------------------------------------------------------------------------------
/02_xml-data/exercises/courses-ID.xml:
--------------------------------------------------------------------------------
1 |
2 |
3 |
4 | Computer Science
5 |
6 | Programming Methodology
7 | Introduction to the engineering of computer applications emphasizing modern software engineering principles.
8 |
9 |
10 | Programming Abstractions
11 | Abstraction and its relation to programming.
12 |
13 |
14 | Computer Organization and Systems
15 | Introduction to the fundamental concepts of computer systems.
16 |
17 |
18 | Introduction to Probability for Computer Scientists
19 |
20 |
21 | From Languages to Information
22 |
23 | Natural language processing. Cross-listed as
24 |
25 | .
26 |
27 |
28 |
29 | Compilers
30 | Principles and practices for design and implementation of compilers and interpreters.
31 |
32 |
33 | Introduction to Databases
34 | Database design and use of database management systems for applications.
35 |
36 |
37 | Artificial Intelligence: Principles and Techniques
38 |
39 |
40 | Structured Probabilistic Models: Principles and Techniques
41 | Using probabilistic modeling languages to represent complex domains.
42 |
43 |
44 | Machine Learning
45 | A broad introduction to machine learning and statistical pattern recognition.
46 |
47 |
48 | Alex
49 | S.
50 | Aiken
51 |
52 |
53 | Jerry
54 | R.
55 | Cain
56 |
57 |
58 | Daphne
59 | Koller
60 |
61 |
62 | Andrew
63 | Ng
64 |
65 |
66 | Eric
67 | Roberts
68 |
69 |
70 | Mehran
71 | Sahami
72 |
73 |
74 | Sebastian
75 | Thrun
76 |
77 |
78 | Jennifer
79 | Widom
80 |
81 |
82 | Julie
83 | Zelenski
84 |
85 |
86 |
87 | Electrical Engineering
88 |
89 | Digital Systems I
90 | Digital circuit, logic, and system design.
91 |
92 |
93 | Digital Systems II
94 | The design of processor-based digital systems.
95 |
96 |
97 | William
98 | J.
99 | Dally
100 |
101 |
102 | Mark
103 | A.
104 | Horowitz
105 |
106 |
107 | Subhasish
108 | Mitra
109 |
110 |
111 | Oyekunle
112 | Olukotun
113 |
114 |
115 |
116 | Linguistics
117 |
118 | From Languages to Information
119 |
120 | Natural language processing. Cross-listed as
121 |
122 | .
123 |
124 |
125 |
126 | Dan
127 | Jurafsky
128 |
129 |
130 | Beth
131 | Levin
132 |
133 |
134 |
135 |
--------------------------------------------------------------------------------
/02_xml-data/exercises/courses-noID.dtd:
--------------------------------------------------------------------------------
1 |
3 |
4 |
5 |
6 |
7 |
8 |
9 |
11 |
12 |
13 |
14 |
15 |
16 |
17 |
18 |
19 | ]>
20 |
--------------------------------------------------------------------------------
/02_xml-data/exercises/courses-noID.xml:
--------------------------------------------------------------------------------
1 |
2 |
3 |
4 | Computer Science
5 |
6 |
7 | Jennifer
8 | Widom
9 |
10 |
11 |
12 | Programming Methodology
13 | Introduction to the engineering of computer applications emphasizing modern software engineering principles.
14 |
15 |
16 | Jerry
17 | R.
18 | Cain
19 |
20 |
21 | Eric
22 | Roberts
23 |
24 |
25 | Mehran
26 | Sahami
27 |
28 |
29 |
30 |
31 | Programming Abstractions
32 | Abstraction and its relation to programming.
33 |
34 |
35 | Eric
36 | Roberts
37 |
38 |
39 | Jerry
40 | R.
41 | Cain
42 |
43 |
44 |
45 | CS106A
46 |
47 |
48 |
49 | Computer Organization and Systems
50 | Introduction to the fundamental concepts of computer systems.
51 |
52 |
53 | Julie
54 | Zelenski
55 |
56 |
57 |
58 | CS106B
59 |
60 |
61 |
62 | Introduction to Probability for Computer Scientists
63 |
64 |
65 | Mehran
66 | Sahami
67 |
68 |
69 |
70 | CS106B
71 |
72 |
73 |
74 | From Languages to Information
75 | Natural language processing. Cross-listed as LING180.
76 |
77 |
78 | Dan
79 | Jurafsky
80 |
81 |
82 |
83 | CS107
84 | CS109
85 |
86 |
87 |
88 | Compilers
89 | Principles and practices for design and implementation of compilers and interpreters.
90 |
91 |
92 | Alex
93 | S.
94 | Aiken
95 |
96 |
97 |
98 | CS107
99 |
100 |
101 |
102 | Introduction to Databases
103 | Database design and use of database management systems for applications.
104 |
105 |
106 | Jennifer
107 | Widom
108 |
109 |
110 |
111 | CS107
112 |
113 |
114 |
115 | Artificial Intelligence: Principles and Techniques
116 |
117 |
118 | Andrew
119 | Ng
120 |
121 |
122 | Sebastian
123 | Thrun
124 |
125 |
126 |
127 |
128 | Structured Probabilistic Models: Principles and Techniques
129 | Using probabilistic modeling languages to represent complex domains.
130 |
131 |
132 | Daphne
133 | Koller
134 |
135 |
136 |
137 |
138 | Machine Learning
139 | A broad introduction to machine learning and statistical pattern recognition.
140 |
141 |
142 | Andrew
143 | Ng
144 |
145 |
146 |
147 |
148 |
149 | Electrical Engineering
150 |
151 |
152 | Mark
153 | A.
154 | Horowitz
155 |
156 |
157 |
158 | Digital Systems I
159 | Digital circuit, logic, and system design.
160 |
161 |
162 | Subhasish
163 | Mitra
164 |
165 |
166 |
167 |
168 | Digital Systems II
169 | The design of processor-based digital systems.
170 |
171 |
172 | William
173 | J.
174 | Dally
175 |
176 |
177 | Oyekunle
178 | Olukotun
179 |
180 |
181 |
182 | EE108A
183 | CS106B
184 |
185 |
186 |
187 |
188 | Linguistics
189 |
190 |
191 | Beth
192 | Levin
193 |
194 |
195 |
196 | From Languages to Information
197 | Natural language processing. Cross-listed as CS124.
198 |
199 |
200 | Dan
201 | Jurafsky
202 |
203 |
204 |
205 | CS107
206 | CS109
207 |
208 |
209 |
210 |
211 |
--------------------------------------------------------------------------------
/02_xml-data/well-formed-xml-annotated-slides.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/eddowh/coursera_intro-to-databases/d13161f219ab2ae95532513c95523d982fddd549/02_xml-data/well-formed-xml-annotated-slides.pdf
--------------------------------------------------------------------------------
/02_xml-data/well-formed-xml.md:
--------------------------------------------------------------------------------
1 | # Extensible Markup Language (XML)
2 |
3 | - Standard for data representation and exchange
4 | - Document format similar to HTML
5 | - Tags describe content instead of formatting
6 | - Also has a streaming format/standard
7 | - Typically used by programs that emit and consume XML
8 |
9 | ## Basic constructs
10 |
11 | - Tagged elements (nested)
12 | - Attributes
13 | - Each element may have within its opening tag a set of attributes
14 | - Consist of an attribute name, the equal sign, and then an attribute value
15 | - Any element can have any number of attributes as long as the attribute names are unique
16 | - Text
17 | - If XML is thought of as a tree, the text strings are the leaves of the tree
18 |
19 | ## Example ([`Bookstore-noDTD.xml`](./data/Bookstore-noDTD.xml))
20 |
21 | ```xml
22 |
23 |
24 |
25 |
26 |
27 | A First Course in Database Systems
28 |
29 |
30 | Jeffrey
31 | Ullman
32 |
33 |
34 | Jennifer
35 | Widom
36 |
37 |
38 |
39 |
40 |
41 | Buy this book bundled with "A First Course" - a great deal!
42 |
43 | Database Systems: The Complete Book
44 |
45 |
46 | Hector
47 | Garcia-Molina
48 |
49 |
50 | Jeffrey
51 | Ullman
52 |
53 |
54 | Jennifer
55 | Widom
56 |
57 |
58 |
59 |
60 | ```
61 |
62 | ---
63 |
64 | # Relational Model versus XML
65 |
66 | ## Structure
67 |
68 | - Relational: *tables*
69 | - XML: *hierarchical trees or graphs*
70 |
71 | ## Schema
72 |
73 | - Relational: *fixed in advance*
74 | - add the data afterwards to conform to the schema
75 | - XML: *flexible, self-describing*
76 | - the schema and the data are mixed together to some extent
77 | - tags on elements are telling you the kind of data you'll have, and you can have a lot of irregularity
78 | - many mechanisms for introducing schemas into XML but not required
79 |
80 | ## Queries
81 |
82 | - Relational: *simple, nice languages*
83 | - XML: *less simple*
84 |
85 | ## Ordering
86 |
87 | - Relational: *None*
88 | - fundamentally, the data in a relational database is a set of data without an ordering within that set
89 | - XML: *Implied*
90 | - XML can be thought of as either a document model or a stream model
91 | - In either case, the nature of the XML being laid out in a document (or being in a stream) induces an order
92 |
93 | ## Implementation
94 |
95 | - Relational: *Native*
96 | - has been around for at least 35 years
97 | - XML: *Add-on*
98 | - most likely a layer over the relational database system
99 | - entering and querying data in XML will be translated into a relational implementation
100 |
101 | ---
102 |
103 | # *Well-Formed* XML
104 |
105 | ## Basic structural requirements:
106 |
107 | - Single root element
108 | - Matched tags, proper nesting
109 | - Unique attributes within elements
110 |
111 | ## XML Parser
112 |
113 | - The parser checks the basic structure of an XML document
114 | - If the document is not well-formed, the parser reports an error
115 | - Otherwise, the output is parsed XML, which is exposed via several standards:
116 | - **DOM**: document-object model is a programmatic interface for traversing the tree implied by XML
117 | - **SAX**: more of a stream model for XML
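The parser behaviour described above can be tried with Python's standard `xml.etree.ElementTree` module. This is a minimal sketch rather than part of the course material: the `check_well_formed` helper and the sample strings are made up for illustration, with element names borrowed from the bookstore example.

```python
import xml.etree.ElementTree as ET


def check_well_formed(text):
    """Return (True, root tag) if the text parses, else (False, parser message)."""
    try:
        root = ET.fromstring(text)
    except ET.ParseError as err:
        # The parser rejects anything that breaks the structural rules.
        return False, str(err)
    return True, root.tag


# One root element, matched tags, unique attributes: well-formed.
good = "<Bookstore><Book ISBN='1'><Title>T</Title></Book></Bookstore>"
# Interleaved tags: Title is closed after Book.
bad_nesting = "<Bookstore><Book><Title>T</Book></Title></Bookstore>"
# Two top-level elements instead of a single root.
two_roots = "<Book/><Book/>"
# The same attribute name twice within one element.
duplicate_attribute = "<Book ISBN='1' ISBN='2'/>"

for sample in (good, bad_nesting, two_roots, duplicate_attribute):
    print(check_well_formed(sample))
```

`ET.fromstring` builds the whole tree in memory, which is closer in spirit to the DOM style than to SAX.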
118 |
119 | ## Displaying XML
120 |
121 | Use a rule-based language to translate XML to HTML, which can then be rendered in a browser
122 |
123 | - **Cascading stylesheets (CSS)**
124 | - **Extensible stylesheet language (XSL)**
125 |
126 | ### Process
127 |
128 | 1. Send XML document to CSS or XSL interpreter
129 | 2. Apply rules that will be used on that particular document
130 | 3. Get output as HTML document
131 | 4. Render HTML in browser
132 |
133 | ---
134 |
135 |
136 | # Questions
137 |
138 | 1. You're creating a database to contain information about university records: students, courses, grades, etc. Should you use the relational model or XML?
139 | - The database has a simple structure fixed in advance, so it's amenable to the relational model.
140 |
141 | 2. You're creating a database to contain information for a university web site: news, academic announcements, admissions, events, research, etc. Should you use the relational model or XML?
142 | - The database has an unpredictable, complex, dynamic structure, so the flexibility of XML is warranted.
143 |
144 | 3. You're creating a database to contain information about family trees (ancestry). Should you use the relational model or XML?
145 | - The database has a fixed structure suggesting relational, but it's strictly hierarchical suggesting XML. Either may be suitable.
146 |
--------------------------------------------------------------------------------
/02_xml-data/xml-quiz-solutions.md:
--------------------------------------------------------------------------------
1 | # Solutions to [XML Quiz](./xml-quiz.md)
2 |
3 | 1. **ANSWER**:
4 |
5 | ```xml
6 |
7 |
8 |
9 |
10 |
11 | ```
12 |
13 | **EXPLANATION**:
14 |
15 | Well-formed XML must follow these rules (along with others):
16 | - There must be exactly one top level element.
17 | - All opening tags must be closed.
18 | - All elements are properly nested, i.e., there are no interleaved elements.
19 | - Attribute values must be enclosed in single or double quotes.
20 |
21 | 2. **ANSWER**:
22 |
23 | ```xml
24 |
25 | ```
26 |
27 | **EXPLANATION**:
28 |
29 | In the XML snippet, the info element has one address subelement and two phone subelements, in that order. Thus, in the DTD the list of components for `INFO` must include `ADDR`, `ADDR*`, `ADDR+`, or `ADDR?` followed by `PHONE*` or `PHONE+`. Interspersed with these may be any elements that are not required to appear-- that is, any components with a `?` or `*`. Thus, we might also have components like `NAME*` or `MANAGER?` at any point in the list.
30 |
31 | 3. **ANSWER**:
32 |
33 | ```xml
34 |
35 | ```
36 |
37 | **EXPLANATION**:
38 |
39 | The correct choices (i.e., the erroneous DTD snippets) are based on two rules:
40 |
41 | 1. A `#REQUIRED` attribute must appear in every element.
42 | 2. An attribute can have types `CDATA`, `ID`, or `IDREF(S)`, but not `#PCDATA`.
43 |
44 | The incorrect choices (i.e., the snippets that could appear in a DTD), are either optional attributes (#IMPLIED) or are required attributes of a proper type.
45 |
46 | 4. **ANSWER**:
47 |
48 | ```
49 | A B /B B /B C B /B D /D /C /A
50 | ```
51 |
52 | **EXPLANATION**:
53 |
54 | According to the DTD, an A element has within it one or more B subelements, and then a C element. Within the C element is zero or one B elements followed by exactly one D element. In terms of regular expressions, the tag sequences we can see are:
55 |
56 | ```
57 | A (B /B)(B /B)* C (D /D | B /B D /D) /C /A.
58 | ```
59 |
60 | Some text may appear between each B-/B pair and each D-/D pair, but text may not appear elsewhere.
61 |
62 | 5. **ANSWER**:
63 |
64 | Only the second
65 |
66 | **EXPLANATION**:
67 |
68 | Focus on the ID and IDREF attributes: A valid document needs to have unique values across ID attributes. An IDREF attribute can refer to any existing ID attribute value.
69 |
70 | 6. **ANSWER**:
71 |
72 | ```xml
73 |
74 | John
75 | Q
76 | Public
77 | 123 Public Avenue, Seattle, WA 98001
78 | Computer Science
79 |
80 | ```
81 |
82 | **EXPLANATION**:
83 |
84 | This question deals with the `xs:element`, `xs:sequence`, and `xs:choice` elements in XML Schema. In order for XML to be valid according to the specified schema:
85 | - The elements contained in a sequence must appear in exactly the same order as specified in the `xs:sequence`.
86 | - Exactly one of the elements contained in an `xs:choice` must appear.
87 | - If an element specifies a `minOccurs` attribute, the XML must contain at least that many instances of the element.
88 | - If an element specifies a `maxOccurs` attribute, the XML must not contain more than that many instances of the element.
89 | - If `minOccurs` and `maxOccurs` are not specified, their default value is 1.
90 | - Elements not defined as a part of a sequence or choice cannot occur inside the corresponding `xs:sequence` and `xs:choice`.
91 |
92 | The given schema specifies the following constraints:
93 | - The "fname", "initial", "lname", and "address" elements must occur in that sequence.
94 | - The "initial" element is optional due to its `minOccurs` value being 0.
95 | - The "address" element can occur either 1 or 2 times due to its `maxOccurs` value being 2.
96 | - After the "address" element, either exactly one "major" element or exactly 2 "minor" elements can occur, but not both.
97 | - Elements not defined as a part of this schema specification are not allowed to occur as a part of the "person" element.
98 |
99 | Here is an example of valid XML for this schema:
100 |
101 | ```xml
102 |
103 | John
104 | Q
105 | Public
106 | 123 Public Avenue
107 | Seattle, WA 98001
108 | Computer Science
109 |
110 | ```
111 |
--------------------------------------------------------------------------------
/02_xml-data/xml-quiz.md:
--------------------------------------------------------------------------------
1 | # Multiple Choice (6 points)
2 |
3 |
4 | 1. Provide a well-formed XML document that satisfies the following conditions:
5 | - It has a root element "tasklist"
6 | - The root element has 3 "task" subelements
7 | - Each of the "task" subelements has an attribute named "name"
8 | - The values of the "name" attributes for the 3 tasks are "eat", "drink", and "play"
9 |
10 | 2. An XML document contains the following portion:
11 |
12 | ```xml
13 | ...
14 |
15 | 101 Maple St.
16 | 555-1212
17 | 555-4567
18 |
19 | ...
20 | ```
21 |
22 | Which of the following could be the `INFO` element specification in a DTD that the document matches?
23 |
24 | 3. An XML document contains the following portion:
25 |
26 | ```xml
27 |
28 | 123 Sesame St.
29 | 555-1212
30 |
31 | ```
32 |
33 | Which of the following could NOT be part of a DTD that the document matches? Note that there can be multiple `ATTLIST` declarations for a single element type; do not assume the only attributes allowed for an element type are the ones shown in the answer choice.
34 |
35 | 4. Here is a DTD:
36 |
37 | ```xml
38 |
40 |
41 |
42 |
43 | ]>
44 | ```
45 |
46 | Which of the following sequences of opening and closing tags matches this DTD? Note: In actual XML, opening and closing tags would be enclosed in angle brackets, and some elements might have text subelements. This quiz focuses on the element sequencing and interleaving specified by the DTD.
47 |
48 | 5. Here is an XML DTD:
49 |
50 | ```xml
51 |
53 |
54 |
55 |
56 |
57 |
58 |
59 | ]>
60 | ```
61 |
62 | Which of the following documents match the DTD?
63 |
64 | -
65 |
66 | ```xml
67 |
68 |
69 |
70 |
71 |
72 |
73 |
74 |
75 |
76 |
77 |
78 | ```
79 | -
80 |
81 | ```xml
82 |
83 |
84 |
85 |
86 |
87 |
88 |
89 |
90 |
91 |
92 |
93 |
94 | ```
95 | -
96 |
97 | ```xml
98 |
99 |
100 |
101 |
102 |
103 |
104 |
105 | ```
106 |
107 | 6. Study the following XML Schema specification:
108 |
109 | ```xml
110 |
111 |
112 |
113 |
114 |
115 |
117 |
118 |
120 |
121 |
122 |
124 |
125 |
126 |
127 |
128 |
129 | ```
130 |
131 | Provide an XML document that is valid according to the XML Schema specification.
132 |
133 | ---
134 |
135 | # [View Solutions](./xml-quiz-solutions.md)
136 |
--------------------------------------------------------------------------------
/02_xml-data/xml-schema-annotated-slides.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/eddowh/coursera_intro-to-databases/d13161f219ab2ae95532513c95523d982fddd549/02_xml-data/xml-schema-annotated-slides.pdf
--------------------------------------------------------------------------------
/02_xml-data/xml-schema.md:
--------------------------------------------------------------------------------
1 | # XML Schema (XSD)
2 |
3 | - Extensive language
4 | - Like DTDs, can specify elements, attributes, nesting, ordering, number of occurrences
5 | - Allows data types, keys, (typed) pointers, etc.
6 | - Unlike DTDs, is written in XML
7 |
8 | ## Key Features
9 |
10 | 1. Typed values
11 | 2. Key declarations
12 | - in DTDs, `ID`s were globally unique values that could be used to identify specific elements
13 | 3. References (`keyref`)
14 | - similar to pointers but a little more powerful
15 | 4. Occurrence constraints
16 | - `minOccurs="0"`: 0 instances at minimum (not required)
17 | - `maxOccurs="unbounded"`: no upper limits
18 |
19 | ---
20 |
21 | # Example
22 |
23 | ### [`Bookstore-XSD.xml`](./data/Bookstore-XSD.xml) with [`Bookstore.xsd`](./data/Bookstore.xsd)
24 |
25 | - Instead of using `IDREF` attributes to refer from books to authors, we are back to having an authors sub-element with the two authors underneath, and those authors themselves hold what are effectively *pointers to the identifiers* for the authors
26 | - The XSD is in a separate file from the actual XML data
27 |
28 | ```xml
29 |
30 |
31 |
32 |
33 |
34 |
35 |
36 |
37 |
38 |
39 |
40 |
41 |
42 |
43 |
44 |
45 |
46 |
47 | A First Course in Database Systems
48 |
49 |
50 |
51 |
52 |
53 |
54 | Database Systems: The Complete Book
55 |
56 |
57 |
58 |
59 |
60 |
61 | Amazon.com says: Buy this book bundled with
62 | - a great deal!
63 |
64 |
65 |
66 | Hector
67 | Garcia-Molina
68 |
69 |
70 | Jeffrey
71 | Ullman
72 |
73 |
74 | Jennifer
75 | Widom
76 |
77 |
78 | ```
79 |
--------------------------------------------------------------------------------
/03_json-data/data/Bookstore.json:
--------------------------------------------------------------------------------
1 | { "Books":
2 | [
3 | { "ISBN":"ISBN-0-13-713526-2",
4 | "Price":85,
5 | "Edition":3,
6 | "Title":"A First Course in Database Systems",
7 | "Authors":[ {"First_Name":"Jeffrey", "Last_Name":"Ullman"},
8 | {"First_Name":"Jennifer", "Last_Name":"Widom"} ] }
9 | ,
10 | { "ISBN":"ISBN-0-13-815504-6",
11 | "Price":100,
12 | "Remark":"Buy this book bundled with 'A First Course' - a great deal!",
13 | "Title":"Database Systems:The Complete Book",
14 | "Authors":[ {"First_Name":"Hector", "Last_Name":"Garcia-Molina"},
15 | {"First_Name":"Jeffrey", "Last_Name":"Ullman"},
16 | {"First_Name":"Jennifer", "Last_Name":"Widom"} ] }
17 | ],
18 | "Magazines":
19 | [
20 | { "Title":"National Geographic",
21 | "Month":"January",
22 | "Year":2009 }
23 | ,
24 | { "Title":"Newsweek",
25 | "Month":"February",
26 | "Year":2009 }
27 | ]
28 | }
29 |
--------------------------------------------------------------------------------
/03_json-data/data/BookstoreSchema.json:
--------------------------------------------------------------------------------
1 | { "type":"object",
2 | "properties": {
3 | "Books": {
4 | "type":"array",
5 | "items": {
6 | "type":"object",
7 | "properties": {
8 | "ISBN": { "type":"string", "pattern":"ISBN*" },
9 | "Price": { "type":"integer",
10 | "minimum":0, "maximum":200 },
11 | "Edition": { "type":"integer", "optional": true },
12 | "Remark": { "type":"string", "optional": true },
13 | "Title": { "type":"string" },
14 | "Authors": {
15 | "type":"array",
16 | "minItems":1,
17 | "maxItems":10,
18 | "items": {
19 | "type":"object",
20 | "properties": {
21 | "First_Name": { "type":"string" },
22 | "Last_Name": { "type":"string" }}}}}}},
23 | "Magazines": {
24 | "type":"array",
25 | "items": {
26 | "type":"object",
27 | "properties": {
28 | "Title": { "type":"string" },
29 | "Month": { "type":"string",
30 | "enum":["January","February"] },
31 | "Year": { "type":"integer" }}}}
32 | }}
33 |
--------------------------------------------------------------------------------
/03_json-data/json-demo-annotated-slides.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/eddowh/coursera_intro-to-databases/d13161f219ab2ae95532513c95523d982fddd549/03_json-data/json-demo-annotated-slides.pdf
--------------------------------------------------------------------------------
/03_json-data/json-demo.md:
--------------------------------------------------------------------------------
1 | # Demo
2 |
3 | - Constructs and syntactic correctness
4 | - Flexibility of data model
5 | - JSON Schema, validation
6 |
7 | # Example
8 |
9 | ## [`Bookstore.json`](./data/Bookstore.json)
10 |
11 | JSON data representing books and magazines
12 |
13 | ```json
14 | { "Books":
15 | [
16 | { "ISBN":"ISBN-0-13-713526-2",
17 | "Price":85,
18 | "Edition":3,
19 | "Title":"A First Course in Database Systems",
20 | "Authors":[ {"First_Name":"Jeffrey", "Last_Name":"Ullman"},
21 | {"First_Name":"Jennifer", "Last_Name":"Widom"} ] }
22 | ,
23 | { "ISBN":"ISBN-0-13-815504-6",
24 | "Price":100,
25 | "Remark":"Buy this book bundled with 'A First Course' - a great deal!",
26 | "Title":"Database Systems:The Complete Book",
27 | "Authors":[ {"First_Name":"Hector", "Last_Name":"Garcia-Molina"},
28 | {"First_Name":"Jeffrey", "Last_Name":"Ullman"},
29 | {"First_Name":"Jennifer", "Last_Name":"Widom"} ] }
30 | ],
31 | "Magazines":
32 | [
33 | { "Title":"National Geographic",
34 | "Month":"January",
35 | "Year":2009 }
36 | ,
37 | { "Title":"Newsweek",
38 | "Month":"February",
39 | "Year":2009 }
40 | ]
41 | }
42 | ```
43 |
44 | ## [`BookstoreSchema.json`](./data/BookstoreSchema.json)
45 |
46 | - The structure of the schema file reflects the structure of the data file it's describing.
47 | - The outermost constructs in the schema file are the outermost in the data file and as we nest, it parallels the nesting.
48 | - In JSON Schema, `additionalProperties: true | false` determines whether said data are allowed to have any properties beyond those that are specified in the schema.
49 | - Many parsers enforce that labels (properties) are unique within an object, even though syntactically correct JSON technically allows multiple copies.
50 | - Numeric boundaries for integer types:
51 |
52 | "Price": {
53 | "type": integer,
54 | "minimum": min,
55 | "maximum": max
56 | }
57 |
58 | - We can constrain strings using a pattern (similar to regex), and we can constrain any type by enumerating the values that are allowed
59 |
60 | "Month": {
61 | "type": string,
62 | "enum": ["January", "February", "March", ... ]
63 | }
64 |
65 |
66 | ```json
67 | { "type":"object",
68 | "properties": {
69 | "Books": {
70 | "type":"array",
71 | "items": {
72 | "type":"object",
73 | "properties": {
74 | "ISBN": { "type":"string", "pattern":"ISBN*" },
75 | "Price": { "type":"integer",
76 | "minimum":0, "maximum":200 },
77 | "Edition": { "type":"integer", "optional": true },
78 | "Remark": { "type":"string", "optional": true },
79 | "Title": { "type":"string" },
80 | "Authors": {
81 | "type":"array",
82 | "minItems":1,
83 | "maxItems":10,
84 | "items": {
85 | "type":"object",
86 | "properties": {
87 | "First_Name": { "type":"string" },
88 | "Last_Name": { "type":"string" }}}}}}},
89 | "Magazines": {
90 | "type":"array",
91 | "items": {
92 | "type":"object",
93 | "properties": {
94 | "Title": { "type":"string" },
95 | "Month": { "type":"string",
96 | "enum":["January","February"] },
97 | "Year": { "type":"integer" }}}}
98 | }}
99 | ```
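To make the `minimum`/`maximum` and `enum` constraints concrete, here is a hand-rolled sketch that checks a single value against a single rule taken from the schema above. It is an assumption-laden toy, not a JSON Schema implementation: the `violations` helper is hypothetical and understands only the `type` (integer or string), `minimum`, `maximum`, and `enum` keywords.

```python
def violations(value, rule, path="value"):
    """Collect messages for the few schema keywords this sketch understands."""
    expected = rule.get("type")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if expected == "integer" and (not isinstance(value, int) or isinstance(value, bool)):
        return [f"{path}: expected an integer"]
    if expected == "string" and not isinstance(value, str):
        return [f"{path}: expected a string"]

    problems = []
    if "minimum" in rule and value < rule["minimum"]:
        problems.append(f"{path}: {value} is below the minimum {rule['minimum']}")
    if "maximum" in rule and value > rule["maximum"]:
        problems.append(f"{path}: {value} is above the maximum {rule['maximum']}")
    if "enum" in rule and value not in rule["enum"]:
        problems.append(f"{path}: {value!r} is not one of {rule['enum']}")
    return problems


# Rules copied from the Price and Month entries of the schema.
price_rule = {"type": "integer", "minimum": 0, "maximum": 200}
month_rule = {"type": "string", "enum": ["January", "February"]}

print(violations(85, price_rule, "Price"))        # within bounds
print(violations(250, price_rule, "Price"))       # above the maximum
print(violations("March", month_rule, "Month"))   # not an enumerated value
```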
100 |
--------------------------------------------------------------------------------
/03_json-data/json-intro-annotated-slides.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/eddowh/coursera_intro-to-databases/d13161f219ab2ae95532513c95523d982fddd549/03_json-data/json-intro-annotated-slides.pdf
--------------------------------------------------------------------------------
/03_json-data/json-intro.md:
--------------------------------------------------------------------------------
1 | # JavaScript Object Notation (JSON)
2 |
3 | - Like XML, can be thought of as a data model
4 | - Standard for "serializing" data objects, usually in files
5 | - *serializing*: taking objects that are in a program and writing them in a serial fashion
6 | - Human-readable, useful for data interchange
7 | - An alternative to the relational data model that is more appropriate for semi-structured data
8 | - No longer tied to JavaScript
9 | - Parsers available for many languages
10 |
11 | ## Base constructs (recursive)
12 |
13 | - Base values: number, string, boolean
14 | - Objects: sets of label-value pairs (enclosed in `{}`)
15 | - Arrays: list of values (enclosed in `[]`)
16 |
17 | ---
18 |
19 | # Relational Model versus JSON
20 |
21 | ## Structure
22 |
23 | - Relational: *tables*
24 | - JSON: *sets, arrays*
25 | - can be nested recursively
26 |
27 | ## Schema
28 |
29 | - Relational: *fixed in advance*
30 | - add the data afterwards to conform to the schema
31 | - JSON: *no schema required*
32 | - schema and data are kind of mixed together (just like XML)
33 | - often referred to as ***self-describing data***, where the schema elements are within the data itself
34 |
35 | ## Queries
36 |
37 | - Relational: *simple, nice languages*
38 | - JSON: *nothing widely used*
39 | - typically read into a program and manipulated programmatically
40 | - proposals exist, such as JSON Path, JSON Query, and JAQL
41 |
42 | ## Ordering
43 |
44 | - Relational: *None*
45 | - fundamentally, the data in a relational database is a set of data without an ordering within that set
46 | - JSON: *arrays* (ordered)
47 |
48 | ## Implementation
49 |
50 | - Relational: *Native*
51 | - has been around for at least 35 years
52 | - JSON: *coupled with programming languages* (no standalone systems)
53 | - used in NoSQL systems as a format for reading and writing data
54 | - some NoSQL systems are ***document management systems***, where the documents themselves may contain JSON data and then the systems will have special features for manipulating the JSON in the document that is stored by the system
55 |
56 | ---
57 |
58 | # XML versus JSON
59 |
60 | ## Verbosity
61 |
62 | XML > JSON:
63 |
64 | - largely due to closing tags in XML
65 |
66 | ## Complexity
67 |
68 | XML > JSON:
69 |
70 | - a lot of extra stuff in XML
71 | - takes more time to read an entire XML specification than to read JSON
72 |
73 | ## Validity
74 |
75 | - XML: document-type descriptors (DTDs), XML schema descriptors (XSD)
76 | - JSON: JSON schema
77 | - a way to specify structure and test conformance
78 | - not widely used (as of Feb. 2012)
79 |
80 | ## Programming Interface
81 |
82 | - XML: potentially clunky
83 | - ***impedance mismatch***: attributes and sub-elements of an XML model don't typically match the model of data inside a programming language
84 | - discussed heavily in the field of database systems (one of the original criticisms of relational database systems)
85 | - JSON: more direct mapping
86 | - e.g., Python dictionaries are very similar to JSON objects
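The direct mapping can be seen with Python's standard `json` module: objects become dictionaries, arrays become lists, and base values become numbers, strings, and booleans. The snippet below is a small sketch using a trimmed copy of the bookstore data; the variable names are illustrative.

```python
import json

text = """
{ "Books": [
    { "ISBN": "ISBN-0-13-713526-2",
      "Price": 85,
      "Authors": [ {"First_Name": "Jeffrey", "Last_Name": "Ullman"},
                   {"First_Name": "Jennifer", "Last_Name": "Widom"} ] } ] }
"""

data = json.loads(text)          # JSON object -> dict
book = data["Books"][0]          # JSON array  -> list, indexed by position
last_names = [author["Last_Name"] for author in book["Authors"]]

print(type(data).__name__, type(data["Books"]).__name__, type(book["Price"]).__name__)
print(last_names)

# Serializing and parsing again yields an equal structure.
round_trip = json.loads(json.dumps(data))
```

No separate object model sits between the data and the program, which is the contrast with the attribute and sub-element model of XML.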
87 |
88 | ## Querying
89 |
90 | - XML widely used:
91 | - XPath
92 | - XQuery
93 | - XSLT
94 | - JSON:
95 | - JSON Path (proposal)
96 | - JSON Query
97 | - JAQL
98 |
99 | ---
100 |
101 | # Syntactically valid JSON
102 |
103 | ## Basic structural requirements
104 |
105 | - Sets of label-value pairs
106 | - Arrays of values
107 | - Base values from predefined types
108 |
109 | ## Process
110 |
111 | 1. Send JSON file to parser
112 | 2. Parser determines whether there are syntactic errors
113 | 3. If there are none, the parsed data can be manipulated in a programming language
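The process above can be sketched with Python's standard `json` module, whose parser raises `json.JSONDecodeError` on a syntax error. The `parse_or_report` helper is a hypothetical name used only for this illustration.

```python
import json


def parse_or_report(text):
    """Return (parsed data, None) on success or (None, error description) on failure."""
    try:
        return json.loads(text), None
    except json.JSONDecodeError as err:
        # The exception carries the position of the first syntax error.
        return None, f"line {err.lineno}, column {err.colno}: {err.msg}"


valid = '{"Title": "Newsweek", "Month": "February", "Year": 2009}'
unquoted_label = '{Title: "Newsweek"}'                 # labels must be quoted
missing_comma = '{"Title": "Newsweek" "Year": 2009}'   # pairs must be comma-separated

for sample in (valid, unquoted_label, missing_comma):
    data, error = parse_or_report(sample)
    print(data if error is None else f"syntax error at {error}")
```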
114 |
115 | # Semantically valid JSON
116 |
117 | ## Basic structural requirements
118 |
119 | Additionally checks whether data conforms to specified schema
120 |
121 | - if we use JSON schema, we put specification in a separate file
122 | - send the data and the schema to a validator (an additional step on top of checking for syntactic errors)
123 |
124 | ---
125 |
126 | # Questions
127 |
128 | 1. You're creating a database to contain information about students in a class (name and ID), and class projects done in pairs (two students and a project title). Should you use the relational model or JSON?
129 |
130 | **ANSWER**: Relational
131 |
132 | **EXPLANATION**: The database has a fixed structure that lends itself to tables (one table for student information and one for project information) and convenient queries in a relational language.
133 |
134 | 2. You're creating a database to contain information about students in a class (name and ID), and class projects. Projects may include any combination of students; they have a title and optional additional information such as materials, approvals, and milestones. Should you use the relational model or JSON?
135 |
136 | **ANSWER**: JSON
137 |
138 | **EXPLANATION**: The database has a complex, irregular, and possibly dynamic structure, so the flexibility of JSON is warranted.
139 |
140 | 3. You're creating a database to contain a set of sensor measurements from a two-dimensional grid. Each measurement is a time-sequence of readings, and each reading contains ten labeled values. Should you use the relational model or JSON?
141 |
142 | **ANSWER**: Either one is appropriate
143 |
144 | **EXPLANATION**: The database has a fixed structure suggesting relational, but its nested array, list, and label-value structure suggests JSON. Either may be suitable.
145 |
--------------------------------------------------------------------------------
/03_json-data/json-quiz-solutions.md:
--------------------------------------------------------------------------------
1 | # Solutions to [JSON Quiz](./json-quiz.md)
2 |
3 | 1. **EXPLANATION**:
4 |
5 | In JSON objects, all labels (property names) must be quoted, and all label-value pairs must be separated by commas. Values in label-value pairs can be numbers, quoted strings, true, false, null, objects, or arrays. Objects and arrays may be empty.
6 |
7 | 2. **EXPLANATION**:
8 |
9 | A JSON array is a comma-separated, [] enclosed list of JSON values. Values can be numbers, quoted strings, true, false, null, objects, or arrays. Objects and arrays may be empty. Objects must be a set of label-value pairs.
10 |
11 | 3. **ANSWER**:
12 |
13 | ```json
14 | { "ItemID": "Item123",
15 | "ItemName": "desk",
16 | "Price": 50,
17 | "Sellers": ["Amy","Ben"],
18 | "Ratings": [{"Rater":"Amy", "Score":5}, {"Score":1},
19 | {"Rater":"Carl", "Score":4}],
20 | "AvgRating": 10.0,
21 | "FreeShipping": true }
22 | ```
23 |
24 | **EXPLANATION**:
25 |
26 | JSON data that is valid according to the JSON Schema specification must have: an ItemID that is a string beginning with "Item"; a Price that is an integer between 10 and 100; an array of between 0 and 3 Sellers; an array of 0 or more Ratings, each of which is either a Rater-Score pair or just a Score, where scores are integers between 1 and 5; and a FreeShipping designation of either true or false. (AvgRating, a real number, is optional.)
27 |
28 | 4. **EXPLANATION**:
29 |
30 | The following JSON Schema specification is valid for the given data:
31 |
32 | ```json
33 | { "type": "object",
34 | "properties": {
35 | "A": {"type":"array", "minItems":4,
36 | "maxItems":4, "items":{"type":"integer"}},
37 | "B": {"type":"object",
38 | "properties": {"C": {"type":"integer"}, "D": {"type":"integer"}}},
39 | "E": {"type":"array", "items": {"type":["integer","boolean"]}},
40 | "F": {"type":"object",
41 | "properties": {"G": {"type":"array",
42 | "items": {"type":["null","integer"]}}}}
43 | }
44 | }
45 | ```
46 |
47 | Changing the minimum and/or maximum number of items in "A" is valid as long as four items are permitted. Alternative types may be added (e.g., replacing "integer" with ["integer","string"]) without violating validity.
48 |
--------------------------------------------------------------------------------
/03_json-data/json-quiz.md:
--------------------------------------------------------------------------------
1 | # Multiple Choice
2 |
3 |
4 | 1. Why is this NOT a valid JSON object?
5 |
6 | ```json
7 | { "name": "Smiley",
8 | "age": 20,
9 | "phone": { "888-123-4567", "888-765-4321" },
10 | "email": "smiley@xyz.com",
11 | "happy": true }
12 | ```
13 |
14 | 2. Why is this NOT a valid JSON array?
15 |
16 | ```json
17 | [ 1, 2, "dog", "cat", true, false, [1, "dog", null], {} ]
18 | ```
19 |
20 | 3. Consider the following JSON Schema specification:
21 |
22 | ```json
23 | {
24 | "type": "object",
25 | "properties":
26 | { "ItemID": { "type":"string", "pattern":"Item*" },
27 | "ItemName": { "type":"string" },
28 | "Price": { "type":"integer", "minimum":10, "maximum":100 },
29 | "Sellers": { "type":"array", "maxItems":3,
30 | "items": { "type":"string" }},
31 | "Ratings": { "type":"array",
32 | "items":
33 | { "type": "object",
34 | "properties": {"Rater":
35 | {"type": "string", "optional": true},
36 | "Score":
37 | {"type": "integer", "minimum":1,
38 | "maximum":5}}}},
39 | "AvgRating": { "type":"number", "optional":true },
40 | "FreeShipping": {"type":"boolean" }
41 | }
42 | }
43 | ```
44 |
45 | Provide JSON data that is valid according to the JSON Schema specification above.
46 |
47 | 4. Consider the following JSON data:
48 |
49 | ```json
50 | { "A": [1,1,2,2], "B": {"C":3, "D":4}, "E":[5,6,true], "F": {"G": [null,7]} }
51 | ```
52 |
53 | Why could the following NOT be included as part of a JSON Schema specification that is satisfied by the JSON data above? Assume that every letter ("A", "B", "C", ...) appears in the JSON Schema specification exactly once.
54 |
55 | ```json
56 | "G": {"type":"array", "items": {"type":"integer"}}
57 | ```
58 |
59 | ---
60 |
61 | # [View Solutions](./json-quiz-solutions.md)
62 |
--------------------------------------------------------------------------------
/04_relational-algebra/relational-algebra-1-annotated-slides.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/eddowh/coursera_intro-to-databases/d13161f219ab2ae95532513c95523d982fddd549/04_relational-algebra/relational-algebra-1-annotated-slides.pdf
--------------------------------------------------------------------------------
/04_relational-algebra/relational-algebra-2-annotated-slides.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/eddowh/coursera_intro-to-databases/d13161f219ab2ae95532513c95523d982fddd549/04_relational-algebra/relational-algebra-2-annotated-slides.pdf
--------------------------------------------------------------------------------
/04_relational-algebra/relational-algebra-questions.md:
--------------------------------------------------------------------------------
1 | # Part 1 - Select, Project, Join
2 |
3 | 1. `Statement A: It can be useful to compose two selection operators.`
4 |
5 | For example:
6 |
7 | $$\sigma_{state='CA'}(\sigma_{enr>8000} \mathrm{College})$$
8 |
9 | `Statement B: It can be useful to compose two projection operators.`
10 |
11 | For example:
12 |
13 | $$\pi_{GPA} (\pi_{sID,GPA,HS} \mathrm{Student})$$
14 |
15 | ---
16 |
17 | **ANSWER**: Both statements are false.
18 |
19 | **EXPLANATION**: Two selection operators in a row can always be replaced by a single selection operator whose condition is the "and" of the two selection conditions. If there are two projection operators in a row, the attribute list of the second (outer) projection must be a subset of the attribute list of the first (inner) projection. Thus, the first projection can be removed without changing the result of the expression.
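
A small sketch of both equivalences, modeling relations as Python sets of tuples. The helper names `select` and `project` and all sample rows are assumptions made for this example.

```python
# Toy relations; the rows are invented for illustration.
# College rows: (cName, state, enr)
College = {
    ("Stanford", "CA", 15000),
    ("Berkeley", "CA", 36000),
    ("MIT", "MA", 10000),
    ("Tiny", "CA", 500),
}
# Student rows: (sID, sName, GPA, HS)
Student = {
    (123, "Amy", 3.9, 1000),
    (234, "Bob", 3.6, 1500),
    (345, "Craig", 3.9, 500),
}

def select(relation, predicate):
    """Selection: keep the rows that satisfy the predicate."""
    return {row for row in relation if predicate(row)}

def project(relation, *positions):
    """Projection: keep the given column positions; the set removes duplicates."""
    return {tuple(row[i] for i in positions) for row in relation}

# Two selections in a row versus one selection with an "and" condition.
composed = select(select(College, lambda r: r[2] > 8000), lambda r: r[1] == "CA")
single = select(College, lambda r: r[1] == "CA" and r[2] > 8000)

# Two projections in a row versus the outer projection alone.
inner = project(Student, 0, 2, 3)  # (sID, GPA, HS)
nested = project(inner, 1)         # GPA sits at position 1 of inner
direct = project(Student, 2)       # GPA taken straight from Student
```

In both cases the composed form and the single-operator form produce the same set.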
20 |
21 | ---
22 |
23 | 2. Which of the following expressions does NOT return the names and GPAs of students with HS>1000 who applied to CS and were rejected?
24 |
25 | A) $\pi_{sName, GPA}(\sigma_{Student.sID=Apply.sID}(\sigma_{HS>1000}(Student) \times \sigma_{major='CS' \wedge dec='R'}(Apply)))$
26 |
27 | B) $\pi_{sName, GPA}(\sigma_{Student.sID=Apply.sID \wedge HS > 1000 \wedge major='CS' \wedge dec='R'}(Student \times \pi_{sID, major, dec}Apply))$
28 |
29 | C) $\sigma_{Student.sID=Apply.sID}(\pi_{sName, GPA}(\sigma_{HS > 1000}Student \times \sigma_{major='CS' \wedge dec='R'}Apply))$
30 |
31 | ---
32 |
33 | **ANSWER**: C
34 |
35 | **EXPLANATION**: The first two expressions are equivalent to the expression given in the video for this query. The third expression is invalid because the sID attributes are not included in the result of the expression to which condition Student.sID=Apply.sID is applied.
36 |
37 | ---
38 |
39 | 3. Which of the following English sentences describes the result of this expression:
40 |
41 | $$\pi_{sName, cName}(\sigma_{HS > enr}(\sigma_{state='CA'}College \Join Student \Join \sigma_{major='CS'}Apply))$$
42 |
43 | A) All student-college name pairs, where the student is applying to major in CS at the college, the college is in California, and the college is smaller than some high school
44 |
45 | B) Students paired with all California colleges to which the student applied to major in CS, where at least one of those colleges is smaller than the student's high school
46 |
47 | C) Students paired with all colleges smaller than the student's high school to which the student applied to major in CS, where at least one of those colleges is in California
48 |
49 | D) Students paired with all California colleges smaller than the student's high school to which the student applied to major in CS
50 |
51 | ---
52 |
53 | **ANSWER**: D
54 |
55 | **EXPLANATION**: The inner natural join connects students with the colleges to which they've applied, allowing only California colleges and CS-major applications. The outer selection condition filters out all applications except those where the high school is bigger than the college, and the final projection keeps the student and college names.
56 |
57 | ---
58 |
59 | 4. Which of the following expressions finds the IDs of all students such that some college bears the student's name?
60 |
61 | A) $\pi_{sID}(College \Join Student)$
62 |
63 | B) $\pi_{sID}(\sigma_{cName=sName}(College \times Student))$
64 |
65 | C) $\pi_{sID}(\pi_{cName}College \Join \pi_{cName}(\sigma_{sName=cName}Student))$
66 |
67 | D) $\pi_{sID}(\sigma_{cName=sName}(\pi_{sID}Student \times College \times Student))$
68 |
69 | ---
70 |
71 | **ANSWER**: B
72 |
73 | **EXPLANATION**: The first choice returns the IDs of all students in the database. The third choice is invalid because cName is not an attribute of Student. The fourth choice yields all sIDs in Student.
74 |
75 | \pagebreak
76 |
77 | # Part 2 - Set Operators, Renaming, Notation
78 |
79 | 1. Three of the following four expressions find the names of all students who did not apply to major in CS or EE. Which one finds something different? (Hint: You should not assume student names are unique.)
80 |
81 | A) $\pi_{sName}(Student \Join (\pi_{sID}Student - (\pi_{sID}(\sigma_{major='CS'}Apply) \cup \pi_{sID}(\sigma_{major='EE'}Apply))))$
82 |
83 | B) $\pi_{sName}(Student \Join (\pi_{sID}Student - \pi_{sID}(\sigma_{major='CS' \vee major='EE'}Apply)))$
84 |
85 | C) $\pi_{sName}(\pi_{sID, sName}Student - \pi_{sID, sName}(Student \Join \pi_{sID}(\sigma_{major='CS' \vee major='EE'}Apply)))$
86 |
87 | D) $\pi_{sName}Student - \pi_{sName}(Student \Join \pi_{sID}(\sigma_{major='CS' \vee major='EE'}Apply))$
88 |
89 | ---
90 |
91 | **ANSWER**: D
92 |
93 | **EXPLANATION**: If there are two students named Susan, one who did not apply to CS or EE, and one who did, then 'Susan' will correctly be included in the result of the first three expressions, but will incorrectly be omitted from the result of the last one.
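
The two-Susans case can be checked with a small sketch that models relations as Python sets. The rows are invented, and Student is cut down to (sID, sName) and Apply to (sID, major), which is a simplification of the course schema.

```python
# Simplified toy relations (assumed for this sketch).
Student = {(1, "Susan"), (2, "Susan"), (3, "Tom")}  # (sID, sName)
Apply = {(1, "CS"), (3, "history")}                 # (sID, major)

# IDs of students who applied to CS or EE.
cs_ee_ids = {sid for sid, major in Apply if major in ("CS", "EE")}

# Expression C: subtract (sID, sName) pairs, then project onto sName.
pairs_left = Student - {(sid, name) for sid, name in Student if sid in cs_ee_ids}
option_c = {name for _, name in pairs_left}

# Expression D: project onto sName first, then subtract names.
all_names = {name for _, name in Student}
cs_ee_names = {name for sid, name in Student if sid in cs_ee_ids}
option_d = all_names - cs_ee_names
```

Subtracting on names alone removes 'Susan' entirely, even though the Susan with sID 2 never applied to CS or EE.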
94 |
95 | ---
96 |
97 | 2. Which of the following English sentences describes the result of this expression:
98 |
99 | $$\pi_{cName}College - \pi_{cName}(Apply \Join (\pi_{sID}(\sigma_{GPA > 3.5}Student) \cap \pi_{sID}(\sigma_{major='CS'}Apply)))$$
100 |
101 | A) All colleges with no GPA>3.5 applicants who applied for a CS major at that college
102 |
103 | B) All colleges with no GPA>3.5 applicants who applied for a CS major at any college
104 |
105 | C) All colleges where all applicants either have GPA>3.5 or applied for a CS major at that college
106 |
107 | D) All colleges where no applicants have GPA>3.5 or no applicants applied for a CS major at that college
108 |
109 | ---
110 |
111 | **ANSWER**: B
112 |
113 | **EXPLANATION**: The intersection finds the IDs of all students who have GPA>3.5 and applied for a CS major at any college. The natural join with Apply finds all colleges those students applied to. The minus finds all colleges except the ones those students applied to.
114 |
115 | ---
116 |
117 | 3. Suppose relation Student has 20 tuples. What are the minimum and maximum numbers of tuples in the result of this expression:
118 |
119 | $$\rho_{s1(i1, n1, g, h)}Student \Join \rho_{s2(i2, n2, g, h)}Student$$
120 |
121 | A) minimum = 0, maximum = 400
122 |
123 | B) minimum = 20, maximum = 20
124 |
125 | C) minimum = 20, maximum = 400
126 |
127 | D) minimum = 40, maximum = 40
128 |
129 | ---
130 |
131 | **ANSWER**: C
132 |
133 | **EXPLANATION**: If every student has a unique GPA-HS combination, then students join only with themselves, and there are 20 tuples in the result (minimum). If every student has the same GPA and HS as every other student, then all pairs join and the result has $20 \times 20 = 400$ tuples (maximum).
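
Both extremes can be reproduced with a short sketch. The function name `self_join_on_gpa_hs` and the generated rows are assumptions. The natural join matches on the attributes the two renamed copies share, which here are g (GPA) and h (HS).

```python
def self_join_on_gpa_hs(student):
    """Natural join of two renamed copies of Student on the shared g and h."""
    return {
        (i1, n1, g1, h1, i2, n2)
        for (i1, n1, g1, h1) in student
        for (i2, n2, g2, h2) in student
        if g1 == g2 and h1 == h2
    }

# 20 students, each with a unique (GPA, HS) combination.
distinct = {(i, f"s{i}", 3.0, 100 + i) for i in range(20)}

# 20 students who all share the same GPA and HS.
identical = {(i, f"s{i}", 3.5, 1000) for i in range(20)}

min_size = len(self_join_on_gpa_hs(distinct))
max_size = len(self_join_on_gpa_hs(identical))
```

The result can never be empty, because every tuple always matches itself on g and h.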
134 |
135 | ---
136 |
137 | 4. Suppose relations College, Student, and Apply have 5, 20, and 50 tuples in them respectively. Remember that cName is a key for College. Do not assume sName is a key for Student. Do assume that college names in Apply also appear in College. What are the minimum and maximum numbers of tuples in the result of this expression:
138 |
139 | $$\pi_{cName}College \cup \rho_{cName}(\pi_{sName}Student) \cup \pi_{cName}Apply$$
140 |
141 | A) minimum = 5, maximum = 25
142 |
143 | B) minimum = 5, maximum = 75
144 |
145 | C) minimum = 25, maximum = 45
146 |
147 | D) minimum = 75, maximum = 75
148 |
149 | ---
150 |
151 | **ANSWER**: A
152 |
153 | **EXPLANATION**: Recall that duplicates are eliminated automatically in relational algebra. If all students have names that are also college names, then there are only 5 names altogether (minimum). If every student has a unique name and none of them are college names, then there are 5+20=25 names altogether (maximum). Since all college names in Apply are also in College, the third term of the expression cannot add any new names.
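
A sketch of the two extremes using Python sets, which drop duplicates the same way relational algebra does. All names are generated placeholders, and each set stands for the already-projected name column of its relation.

```python
# cName column of College: 5 distinct names, since cName is a key.
college = {f"College{i}" for i in range(5)}

# cName column of Apply: 50 tuples whose names all come from College.
apply_cnames = {f"College{i % 5}" for i in range(50)}

# Minimum case: every one of the 20 students is named after some college.
student_names_min = {f"College{i % 5}" for i in range(20)}

# Maximum case: 20 unique student names, none of them a college name.
student_names_max = {f"Student{i}" for i in range(20)}

min_size = len(college | student_names_min | apply_cnames)
max_size = len(college | student_names_max | apply_cnames)
```

The Apply term never changes the count, because its names are a subset of the College names.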
154 |
--------------------------------------------------------------------------------
/04_relational-algebra/relational-algebra-questions.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/eddowh/coursera_intro-to-databases/d13161f219ab2ae95532513c95523d982fddd549/04_relational-algebra/relational-algebra-questions.pdf
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Table of Contents
2 |
3 | - Introduction ([notes](00_introduction/notes.md))
4 | - Relational Databases ([summary](01_relational-databases/summary.md))
5 | - The Relational Model ([annotated slides](01_relational-databases/relational-model-annotated-slides.pdf))
6 | - Querying Relational Databases ([annotated slides](01_relational-databases/relational-querying-annotated-slides.pdf))
7 | - XML Data
8 | - Well-formed XML ([notes](02_xml-data/well-formed-xml.md) | [annotated slides](02_xml-data/well-formed-xml-annotated-slides.pdf))
9 | - DTDs, IDs, and IDREFS ([notes](02_xml-data/dtds.md) | [annotated slides](02_xml-data/dtds-annotated-slides.pdf))
10 | - XML Schema ([notes](02_xml-data/xml-schema.md) | [annotated slides](02_xml-data/xml-schema-annotated-slides.pdf))
11 | - XML Quiz ([problems](02_xml-data/xml-quiz.md) | [solutions](02_xml-data/xml-quiz-solutions.md))
12 | - JSON Data
13 | - JSON Introduction ([notes](03_json-data/json-intro.md) | [annotated slides](03_json-data/json-intro-annotated-slides.pdf))
14 | - JSON Demo ([notes](03_json-data/json-demo.md) | [annotated slides](03_json-data/json-demo-annotated-slides.pdf))
15 | - JSON Quiz ([problems](03_json-data/json-quiz.md) | [solutions](03_json-data/json-quiz-solutions.md))
16 | - Relational Algebra
17 | - Relational Algebra Part 1: Select, Project, Join ([annotated slides](./04_relational-algebra/relational-algebra-1-annotated-slides.pdf))
18 | - Relational Algebra Part 2: Set Operators, Renaming, Notation ([annotated slides](./04_relational-algebra/relational-algebra-2-annotated-slides.pdf))
19 | - Relational Algebra Lecture Questions ([Raw MD](./04_relational-algebra/relational-algebra-questions.md) | [PDF](./04_relational-algebra/relational-algebra-questions.pdf))
20 | - SQL
21 | - Relational Design Theory
22 | - Querying XML
23 | - Unified Modeling Language
24 | - Indexes
25 | - Transactions
26 | - Constraints and Triggers
27 | - Views
28 | - Authorization
29 | - Recursion
30 | - On-Line Analytical Processing
31 | - NoSQL Systems
32 | - Exams
33 |
34 | # About the Course
35 |
36 | > "Introduction to Databases" was one of Stanford's three inaugural massive open online courses in the fall of 2011; it was offered again in MOOC format in 2013 and 2014. Materials from the MOOC offerings have been available for self-study on Coursera as well as on other platforms. Starting in summer 2014, the materials are now being offered on the OpenEdX platform as a set of smaller self-paced "mini-courses", which can be assembled in a variety of ways to learn about different aspects of databases. All of the mini-courses are based around video lectures and/or video demos. Many of them include in-video quizzes to check understanding, in-depth standalone quizzes, and/or a variety of automatically-checked interactive programming exercises. Each mini-course also includes a discussion forum and pointers to readings and resources. Taught by Professor Jennifer Widom, the overall curriculum draws from Stanford's popular Databases course. To explore and enroll in the new Databases mini-courses, please visit .
37 |
--------------------------------------------------------------------------------