├── .gitignore ├── 00_introduction └── notes.md ├── 01_relational-databases ├── relational-model-annotated-slides.pdf ├── relational-querying-annotated-slides.pdf └── summary.md ├── 02_xml-data ├── data │ ├── Bookstore-DTD.xml │ ├── Bookstore-IDREFs.xml │ ├── Bookstore-XSD.xml │ ├── Bookstore-noDTD.xml │ └── Bookstore.xsd ├── dtds-annotated-slides.pdf ├── dtds.md ├── exercises │ ├── countries.dtd │ ├── countries.xml │ ├── courses-ID.dtd │ ├── courses-ID.xml │ ├── courses-noID.dtd │ └── courses-noID.xml ├── well-formed-xml-annotated-slides.pdf ├── well-formed-xml.md ├── xml-quiz-solutions.md ├── xml-quiz.md ├── xml-schema-annotated-slides.pdf └── xml-schema.md ├── 03_json-data ├── data │ ├── Bookstore.json │ └── BookstoreSchema.json ├── json-demo-annotated-slides.pdf ├── json-demo.md ├── json-intro-annotated-slides.pdf ├── json-intro.md ├── json-quiz-solutions.md └── json-quiz.md ├── 04_relational-algebra ├── relational-algebra-1-annotated-slides.pdf ├── relational-algebra-2-annotated-slides.pdf ├── relational-algebra-questions.md └── relational-algebra-questions.pdf └── README.md /.gitignore: -------------------------------------------------------------------------------- 1 | script-files 2 | -------------------------------------------------------------------------------- /00_introduction/notes.md: -------------------------------------------------------------------------------- 1 | # Introduction 2 | 3 | - Database applications may be programmed via **frameworks** (i.e. Ruby on Rails, Django, etc.) 4 | - DBMS may run in conjunction with **middleware** (i.e. application servers, web servers, etc.) 5 | - Data-intensive applications may not use DBMS at all 6 | - Data is not always processed through query languages associated with database systems 7 | - Hadoop is a processing framework for running operations on data that's stored in files 8 | 9 | # Features of a DBMS 10 | 11 | Database Management System (DBMS) provides ... 12 | 13 | > ... 
efficient, reliable, convenient, and safe multi-user storage of and access to massive amounts of persistent data 14 | 15 | - Massive: 16 | - Terabytes 17 | - DBMSs are designed to handle data that is residing outside of memory 18 | - Persistent 19 | - Data in the database outlives the programs that execute on that data 20 | - Safe 21 | - Needs a guarantee that managed data will stay in a consistent state 22 | - Failures come in all forms: hardware, software, power, users, etc. 23 | - Multi-user 24 | - Programs may allow different users to access data *concurrently* 25 | - ***Concurrency control***: a database mechanism that controls the way multiple users access the database; control actually occurs at the level of the data items in the database 26 | - Similar to *file system concurrency* or even *variable concurrency* except it's more centered around the data itself 27 | - Convenient 28 | - Designed to make it easy to work with massive data 29 | - ***Physical Data Independence***: the way that data is actually stored and laid out on disk is independent of the way the programs think about the structure of the data 30 | - You could have a program that operates on a database and underneath there could be a complete change in the way the data is stored, yet the program itself would not have to be changed 31 | - ***High level query languages***: these are *declarative*, meaning that in the query, you describe what you want out of the database but you don't need to describe the algorithm to get the data out 32 | - Efficient 33 | - Has to perform thousands of queries or updates per second 34 | - Reliable 35 | - 99.99999% uptime is the type of guarantee that DBMSs are making for their applications 36 | 37 | --- 38 | 39 | # Key Concepts 40 | 41 | ## Data model 42 | 43 | - Set of records: the data and the database is thought of as such in the relational data model 44 | - XML documents: hierarchical structure of labeled values 45 | - Graph data model: all
data in the DB is in the form of nodes and edges 46 | 47 | ## Schema versus data 48 | 49 | Like types and variables in a programming language 50 | 51 | A schema ... 52 | - sets up the structure of the database 53 | - is typically set up at the beginning, and doesn't change very much, whereas the data changes rapidly 54 | - is normally set up with what's known as a ***data definition language*** 55 | 56 | ## Data definition language (DDL) 57 | 58 | - Sometimes people use higher level design tools that help them think about the design and then from there go to the data definition language 59 | - Used in general to set up a schema or structure for a particular database 60 | - Once the schema has been set up and data has been loaded, it's possible to start querying and modifying the data with what's known as ***data manipulation language*** 61 | 62 | ## Data manipulation or query language (DML) 63 | 64 | - Queries and modifies the database (duh) 65 | 66 | --- 67 | 68 | # Key People 69 | 70 | ## DBMS Implementer 71 | 72 | - Implements (builds) the database system itself 73 | 74 | ## Database designer 75 | 76 | - Establishes the schema for a database 77 | - If working on an application, the database designer has to figure out how to structure the data before the application is built 78 | - Surprisingly difficult job when complex data is involved in an application 79 | 80 | ## Database application developer 81 | 82 | - Builds the applications or programs that operate on the database 83 | - Often interfacing between the eventual user and the data itself 84 | - You can have a database with many different operating programs, i.e.
a sales database where some applications are actually inserting the sales as they happen, while others are analyzing the sales 85 | - Not necessary to have *one-to-one coupling* between programs and databases 86 | 87 | ## Database administrator 88 | 89 | - Loads the data and keeps it running smoothly 90 | - Very important job for large DB applications (highly paid) 91 | - DBMS tend to have a number of tuning parameters associated with them, and getting those tuning parameters right can make a significant difference in the performance of the database system 92 | -------------------------------------------------------------------------------- /01_relational-databases/relational-model-annotated-slides.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/eddowh/coursera_intro-to-databases/d13161f219ab2ae95532513c95523d982fddd549/01_relational-databases/relational-model-annotated-slides.pdf -------------------------------------------------------------------------------- /01_relational-databases/relational-querying-annotated-slides.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/eddowh/coursera_intro-to-databases/d13161f219ab2ae95532513c95523d982fddd549/01_relational-databases/relational-querying-annotated-slides.pdf -------------------------------------------------------------------------------- /01_relational-databases/summary.md: -------------------------------------------------------------------------------- 1 | ### Atomic types 2 | 3 | It's typical for relational databases to have just atomic types in their attributes, but many database systems do also support structured types inside attributes. 4 | 5 | ### Enumerated domain 6 | 7 | Most relational databases have a concept of enumerated domain. For example, the attribute `state` might be an enumerated domain for the 50 abbreviations for states.
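
An enumerated domain can be approximated in code. The following is a minimal sketch, assuming SQLite through Python's `sqlite3` module; the `College` table, its columns, and the three-state list are hypothetical and only for illustration. SQLite has no dedicated enumerated type, so a `CHECK` constraint stands in for the domain.

```python
import sqlite3

# Hypothetical table: a CHECK constraint restricts `state` to a fixed set
# of abbreviations (only three listed here to keep the sketch short).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE College ("
    " cName TEXT PRIMARY KEY,"
    " state TEXT CHECK (state IN ('CA', 'MA', 'NY'))"
    ")"
)


def try_insert(name, state):
    """Return True if the row is accepted, False if the domain check rejects it."""
    try:
        conn.execute("INSERT INTO College VALUES (?, ?)", (name, state))
        return True
    except sqlite3.IntegrityError:
        # Raised when the CHECK constraint (or the primary key) is violated.
        return False


accepted = try_insert("MIT", "MA")        # 'MA' is inside the domain
rejected = try_insert("Nowhere U", "ZZ")  # 'ZZ' is outside the domain
```

Other systems express the same idea differently (for example with a dedicated enumerated type), so treat the `CHECK` approach as one possible encoding rather than the standard one.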
8 | 9 | ### Schema 10 | 11 | Schema of a database is the structure of the relation. It includes the name of the relation and the attributes of the relation and the types of those attributes. 12 | 13 | ### Instance 14 | 15 | The instance is the actual contents of the table at a given point in time. 16 | 17 | Typically, you set up a schema in advance, then the instances of the data will change over time. 18 | 19 | ### NULL 20 | 21 | Null values are used to denote that a particular value may be unknown or undefined. 22 | 23 | Null values are useful but one has to be very careful in a database system when running queries over relations that have null values. For example, the condition `attribute == value OR attribute != value` won't select tuples where the attribute is `NULL`. 24 | 25 | ### Key 26 | 27 | A key is an attribute of a relation where every value for that attribute is unique. Every tuple is going to have a unique ID. For example, a student ID number in a student relation may very well be a key (since each student usually has a unique ID). 28 | 29 | Why it's important to have attributes that are identified as keys: 30 | 31 | - If you want to run a query to get a specific tuple out of the database, you would do so by asking for that tuple by its key. 32 | - Database systems, for efficiency, tend to build special index structures or store the database in a particular way so it's very fast to find a tuple based on its key 33 | - If one relation in a relational database wants to refer to tuples of another, there's no concept of pointer in relational databases. Therefore, the first relation will typically refer to a tuple in the second relation by its unique key. 34 | 35 | ### Steps in creating and using a (relational) database 36 | 37 | 1. Design schema; create using DDL (data definition language) 38 | 2. "Bulk load" initial data 39 | - fairly common for database to be initially loaded from data that comes from an outside source 40 | 3.
Repeat: execute queries and modifications 41 | 42 | ### Ad-hoc queries in high-level language 43 | 44 | **ad hoc** = posing queries that one didn't think of in advance; unnecessary to write long programs for specific queries 45 | 46 | Rather, the language can be used to pose a query as you think about what you want to ask. 47 | 48 | - Some easy to pose; some a bit harder 49 | - Some easy for DBMS to execute efficiently; some harder (not correlated with above) 50 | - "Query language" also used to modify data 51 | 52 | #### Example 53 | 54 | - All students with GPA >3.7 applying to Stanford and MIT only 55 | - All engineering departments in CA with < 500 applicants 56 | - College with highest average accept rate over last 5 years 57 | 58 | ### Queries return relations (*compositional, closed*) 59 | 60 | When you get back the same type of object that you query, that's known as ***closure*** of the language 61 | 62 | ***Compositionality*** is the ability to run a query over the result of a previous query.
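
Closure and compositionality can be seen in a small runnable sketch. It assumes SQLite through Python's `sqlite3` module; the `Student` and `Apply` tables, their columns, and the sample rows are made up for illustration. The inner query returns a relation, and the outer query uses that result exactly as if it were a stored table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema and sample data, only for this illustration.
conn.executescript("""
CREATE TABLE Student (ID INTEGER PRIMARY KEY, sName TEXT, GPA REAL);
CREATE TABLE Apply (ID INTEGER, cName TEXT);
INSERT INTO Student VALUES (1, 'Amy', 3.9), (2, 'Bob', 3.6), (3, 'Craig', 3.8);
INSERT INTO Apply VALUES (1, 'Stanford'), (2, 'Stanford'), (3, 'MIT');
""")

# Closure: the subquery's result is itself a relation.
# Compositionality: the outer query joins that result with Apply.
rows = conn.execute("""
SELECT Hi.ID
FROM (SELECT ID FROM Student WHERE GPA > 3.7) AS Hi, Apply
WHERE Hi.ID = Apply.ID AND Apply.cName = 'Stanford'
ORDER BY Hi.ID
""").fetchall()
```

With this sample data, only student 1 has a GPA above 3.7 and an application to Stanford, so `rows` holds a single one-column tuple.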
63 | 64 | 65 | ### Query Languages 66 | 67 | - ***Relational Algebra***: formal 68 | - Statement: *IDs of students with GPA > 3.7 applying to Stanford* 69 | - Expression: $\pi_{ID}\left(\sigma_{(GPA > 3.7) \wedge (cName = \text{"Stanford"})}(Student \bowtie Apply)\right)$ 70 | 71 | - ***SQL***: actual/implemented 72 | ``` 73 | Select Student.ID 74 | From Student, Apply 75 | Where Student.ID=Apply.ID 76 | And GPA>3.7 and college='Stanford' 77 | ``` 78 | -------------------------------------------------------------------------------- /02_xml-data/data/Bookstore-DTD.xml: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 11 | 12 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | ]> 24 | 25 | 26 | 27 | A First Course in Database Systems 28 | 29 | 30 | Jeffrey 31 | Ullman 32 | 33 | 34 | Jennifer 35 | Widom 36 | 37 | 38 | 39 | 40 | Database Systems: The Complete Book 41 | 42 | 43 | Hector 44 | Garcia-Molina 45 | 46 | 47 | Jeffrey 48 | Ullman 49 | 50 | 51 | Jennifer 52 | Widom 53 | 54 | 55 | 56 | Buy this book bundled with "A First Course" - a great deal! 57 | 58 | 59 | 60 | -------------------------------------------------------------------------------- /02_xml-data/data/Bookstore-IDREFs.xml: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 10 | 11 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | ]> 23 | 24 | 25 | 26 | A First Course in Database Systems 27 | 28 | 29 | Database Systems: The Complete Book 30 | 31 | Amazon.com says: Buy this book bundled with 32 | - a great deal!
33 | 34 | 35 | 36 | Hector 37 | Garcia-Molina 38 | 39 | 40 | Jeffrey 41 | Ullman 42 | 43 | 44 | Jennifer 45 | Widom 46 | 47 | 48 | -------------------------------------------------------------------------------- /02_xml-data/data/Bookstore-XSD.xml: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | A First Course in Database Systems 20 | 21 | 22 | 23 | 24 | 25 | 26 | Database Systems: The Complete Book 27 | 28 | 29 | 30 | 31 | 32 | 33 | Amazon.com says: Buy this book bundled with 34 | - a great deal! 35 | 36 | 37 | 38 | Hector 39 | Garcia-Molina 40 | 41 | 42 | Jeffrey 43 | Ullman 44 | 45 | 46 | Jennifer 47 | Widom 48 | 49 | 50 | -------------------------------------------------------------------------------- /02_xml-data/data/Bookstore-noDTD.xml: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 5 | 6 | A First Course in Database Systems 7 | 8 | 9 | Jeffrey 10 | Ullman 11 | 12 | 13 | Jennifer 14 | Widom 15 | 16 | 17 | 18 | 19 | 20 | Buy this book bundled with "A First Course" - a great deal! 
21 | 22 | Database Systems: The Complete Book 23 | 24 | 25 | Hector 26 | Garcia-Molina 27 | 28 | 29 | Jeffrey 30 | Ullman 31 | 32 | 33 | Jennifer 34 | Widom 35 | 36 | 37 | 38 | 39 | -------------------------------------------------------------------------------- /02_xml-data/data/Bookstore.xsd: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 10 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25 | 26 | 27 | 28 | 29 | 30 | 31 | 32 | 33 | 34 | 35 | 36 | 37 | 38 | 39 | 41 | 42 | 43 | 44 | 45 | 46 | 47 | 48 | 49 | 51 | 52 | 54 | 55 | 56 | 57 | 58 | 59 | 60 | 61 | 62 | 63 | 64 | 65 | 66 | 67 | 68 | 69 | 70 | 71 | -------------------------------------------------------------------------------- /02_xml-data/dtds-annotated-slides.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/eddowh/coursera_intro-to-databases/d13161f219ab2ae95532513c95523d982fddd549/02_xml-data/dtds-annotated-slides.pdf -------------------------------------------------------------------------------- /02_xml-data/dtds.md: -------------------------------------------------------------------------------- 1 | # *Valid* XML 2 | 3 | Has to adhere to the basic structural requirements as well-formed XML, but it also adheres to ***content-specific specification*** 4 | 5 | - **Document Type Descriptor (DTD)** 6 | - **XML Schema Description (XSD)** 7 | 8 | ## Parsing Valid XML 9 | 10 | 1. Send XML document to a validating XML parser 11 | 2. Add additional input to the process (a specification); either a DTD or an XSD 12 | 3. If structural requirements not met; output error about the lack of good form 13 | 4. 
If content specific specifications not met; output invalid error 14 | 15 | --- 16 | 17 | # Document Type Descriptor (DTD) 18 | 19 | - Grammar-like language for specifying elements, attributes, nesting, ordering, # of occurrences 20 | - Also allows special attribute types such as `ID` and `IDREF(S)` 21 | - effectively allows one to specify pointers within a document, although these *pointers are untyped* 22 | 23 | --- 24 | 25 | # DTD/XSD versus none (well-formed) 26 | 27 | ### Pros 28 | 29 | - Programs can assume structure 30 | - programs can be simpler because they don't have to do a lot of error checking on data 31 | - CSS/XSL 32 | - Specification language for conveying what XML might need to look like 33 | - i.e. data exchange using XML => companies can use the DTD as a specification for what the XML needs to look like when it arrives at the program that's going to operate on it 34 | - Documentation 35 | - Use one of the specifications to document what the data itself looks like 36 | - Essentially enjoys benefits of strongly typed data as opposed to loosely-typed data 37 | 38 | ### Cons 39 | 40 | - Flexibility 41 | - DTD makes XML data have to conform to a specification 42 | - Without one, data formats can change easily without running into errors 43 | - DTDs can potentially be messy 44 | - especially for *irregular* documents 45 | - XSDs can potentially become VERY messy 46 | - it's possible to have a document where the specification is much, much larger than the document itself 47 | - Going without a specification essentially enjoys the benefits of nil typing 48 | 49 | --- 50 | 51 | # Examples 52 | 53 | ## [`Bookstore-DTD.xml`](./data/Bookstore-DTD.xml) 54 | 55 | Notice a few things: 56 | 57 | - The DTD is specified before the XML in the same file. However, it is also possible to specify DTDs in separate files.
58 | - The first grammar-like construct `(Book | Magazine)*` is a little bit like a regular expression 59 | - `(Book | Magazine)` means that each sub-element of a bookstore element is called either `Book` or `Magazine` 60 | - `*` means zero or more instances 61 | - `CDATA`: type of data, in this case a string 62 | - `#REQUIRED`: has to be present; `#IMPLIED`: doesn't have to be present 63 | - `#PCDATA` is when you have a leaf that consists of text data in the XML tree 64 | - `(Author+)`: one or more instances of author sub-elements 65 | 66 | ```xml 67 | 68 | 69 | 70 | 71 | 72 | 73 | 74 | 75 | 77 | 78 | 81 | 82 | 83 | 84 | 85 | 86 | 87 | 88 | 89 | ]> 90 | 91 | 92 | 93 | A First Course in Database Systems 94 | 95 | 96 | Jeffrey 97 | Ullman 98 | 99 | 100 | Jennifer 101 | Widom 102 | 103 | 104 | 105 | 106 | Database Systems: The Complete Book 107 | 108 | 109 | Hector 110 | Garcia-Molina 111 | 112 | 113 | Jeffrey 114 | Ullman 115 | 116 | 117 | Jennifer 118 | Widom 119 | 120 | 121 | 122 | Buy this book bundled with "A First Course" - a great deal! 123 | 124 | 125 | 126 | ``` 127 | 128 | ## [`Bookstore-IDREFs.xml`](./data/Bookstore-IDREFs.xml) 129 | 130 | As before, the DTD is specified before the XML in the same file.
131 | 132 | Notice a few things: 133 | 134 | - Instead of having authors as subelements of book elements, have authors listed separately, and then effectively point from books to the authors of the books 135 | - Add `Ident` attribute to authors, give a string value to that attribute that we're going to use effectively for the pointers in the book 136 | - `Authors` is an `IDREFS` attribute whose value can refer to one or more strings that are `ID` attribute values 137 | - `(#PCDATA | BookRef)*` is the type of construct that can be used to mix strings and sub-elements within an element 138 | - `ID` attributes need to all be unique 139 | - A singular `IDREF` value must be exactly one `ID` value 140 | - A plural `IDREFS` value is a list of `ID` values separated by spaces 141 | 142 | ```xml 143 | 144 | 145 | 146 | 147 | 148 | 149 | 150 | 152 | 153 | 156 | 157 | 158 | 159 | 160 | 161 | 162 | 163 | 164 | ]> 165 | 166 | 167 | 168 | A First Course in Database Systems 169 | 170 | 171 | Database Systems: The Complete Book 172 | 173 | Amazon.com says: Buy this book bundled with 174 | - a great deal! 175 | 176 | 177 | 178 | Hector 179 | Garcia-Molina 180 | 181 | 182 | Jeffrey 183 | Ullman 184 | 185 | 186 | Jennifer 187 | Widom 188 | 189 | 190 | ``` 191 | 192 | ## Command-line validity checking 193 | 194 | Use `xmllint` for *linting* (error-checking): 195 | 196 | ```sh 197 | $ xmllint --valid --noout Bookstore-DTD.xml # missing data in #REQUIRED field 198 | 199 | Bookstore-DTD.xml:59: element Book: validity error : Database Systems: The Complete Book does not carry attribute Edition 200 | 201 | ^ 202 | $ xmllint --valid --noout Bookstore-DTD.xml # no closing tags 203 | 204 | Bookstore-DTD.xml:64: parser error: Premature end of ... 205 | 206 | ``` 207 | 208 | If it doesn't return anything, there are no errors, and you should be proud.
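
The two kinds of failure shown above (parser errors versus validity errors) can also be explored from a script. This is a rough sketch only: Python's standard-library `xml.etree.ElementTree` checks well-formedness but does not validate against a DTD, so the `dangling_refs` helper below hand-checks just one validity rule, namely that every `IDREFS` entry names an existing `ID`. The attribute names `Ident` and `Authors` follow the notes above; the helper names and the three tiny sample documents are hypothetical.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """True if the text parses as XML; says nothing about DTD validity."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


def dangling_refs(text, id_attr="Ident", refs_attr="Authors"):
    """List IDREFS entries that do not match any ID value in the document."""
    root = ET.fromstring(text)
    ids = {el.get(id_attr) for el in root.iter() if el.get(id_attr) is not None}
    missing = []
    for el in root.iter():
        # An IDREFS value is a space-separated list of ID values.
        for ref in (el.get(refs_attr) or "").split():
            if ref not in ids:
                missing.append(ref)
    return missing


# Hypothetical miniature documents, not the course data files.
good = ('<Bookstore><Book Authors="JU JW"/>'
        '<Author Ident="JU"/><Author Ident="JW"/></Bookstore>')
bad_refs = '<Bookstore><Book Authors="JU HG"/><Author Ident="JU"/></Bookstore>'
unclosed = '<Bookstore><Book>'
```

Calling `is_well_formed(unclosed)` corresponds to the "no closing tags" parser error, while `dangling_refs(bad_refs)` mimics a single validity check; a validating parser such as `xmllint --valid` covers the full DTD.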
209 | 210 | --- 211 | 212 | -------------------------------------------------------------------------------- /02_xml-data/exercises/countries.dtd: -------------------------------------------------------------------------------- 1 | 3 | 4 | 7 | 8 | 9 | 10 | 11 | 12 | ]> 13 | -------------------------------------------------------------------------------- /02_xml-data/exercises/countries.xml: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 5 | Turkic 6 | Pashtu 7 | Afghan Persian 8 | 9 | 10 | 11 | 12 | Algiers 13 | 1507241 14 | 15 | 16 | 17 | 18 | 19 | 20 | English 21 | 22 | 23 | English 24 | 25 | 26 | 27 | La Matanza 28 | 1111811 29 | 30 | 31 | Cordoba 32 | 1208713 33 | 34 | 35 | Rosario 36 | 1118984 37 | 38 | 39 | Buenos Aires 40 | 2988006 41 | 42 | 43 | 44 | 45 | Yerevan 46 | 1200000 47 | 48 | Russian 49 | Armenian 50 | 51 | 52 | 53 | 54 | Sydney 55 | 3657000 56 | 57 | 58 | Brisbane 59 | 1302000 60 | 61 | 62 | Adelaide 63 | 1050000 64 | 65 | 66 | Melbourne 67 | 3081000 68 | 69 | 70 | Perth 71 | 1193000 72 | 73 | English 74 | 75 | 76 | 77 | Vienna 78 | 1583000 79 | 80 | German 81 | 82 | 83 | 84 | Baku 85 | 1740000 86 | 87 | Russian 88 | Armenian 89 | Azeri 90 | 91 | 92 | 93 | 94 | 95 | Dhaka 96 | 3839000 97 | 98 | 99 | Chittagong 100 | 1599000 101 | 102 | 103 | 104 | English 105 | 106 | 107 | 108 | Minsk 109 | 1540000 110 | 111 | 112 | 113 | French 114 | German 115 | Dutch 116 | 117 | 118 | 119 | 120 | English 121 | 122 | 123 | 124 | 125 | Serbo-Croatian 126 | 127 | 128 | 129 | 130 | Manaus 131 | 1158265 132 | 133 | 134 | Salvador 135 | 2209465 136 | 137 | 138 | Fortaleza 139 | 1967365 140 | 141 | 142 | Belo Horizonte 143 | 2091770 144 | 145 | 146 | Belem 147 | 1142258 148 | 149 | 150 | Curitiba 151 | 1465698 152 | 153 | 154 | Recife 155 | 1342877 156 | 157 | 158 | Rio de Janeiro 159 | 5533011 160 | 161 | 162 | Porto Alegre 163 | 1286251 164 | 165 | 166 | Sao Paulo 167 | 9811776 168 | 169 | 170 | Brasilia 171 | 
1817001 172 | 173 | 174 | 175 | English 176 | 177 | 178 | 179 | 180 | Sofia 181 | 1300000 182 | 183 | Bulgarian 184 | 185 | 186 | 187 | 188 | Rangoon 189 | 2513000 190 | 191 | Burmese 192 | 193 | 194 | 195 | 196 | 197 | 198 | Montreal 199 | 1017666 200 | 201 | 202 | 203 | 204 | English 205 | 206 | 207 | 208 | 209 | 210 | Santiago 211 | 4318000 212 | 213 | Spanish 214 | 215 | 216 | 217 | Hong Kong 218 | 6218000 219 | 220 | 221 | Hefei 222 | 1000000 223 | 224 | 225 | Huainan 226 | 1200000 227 | 228 | 229 | Lanzhou 230 | 1510000 231 | 232 | 233 | Guangzhou 234 | 3580000 235 | 236 | 237 | Guiyang 238 | 1530000 239 | 240 | 241 | Shijiazhuang 242 | 1320000 243 | 244 | 245 | Tangshan 246 | 1500000 247 | 248 | 249 | Handan 250 | 1110000 251 | 252 | 253 | Harbin 254 | 2830000 255 | 256 | 257 | Qiqihar 258 | 1380000 259 | 260 | 261 | Zhengzhou 262 | 1710000 263 | 264 | 265 | Luoyang 266 | 1190000 267 | 268 | 269 | Wuhan 270 | 3750000 271 | 272 | 273 | Changsha 274 | 1330000 275 | 276 | 277 | Nanjing 278 | 2500000 279 | 280 | 281 | Fuzhou 282 | 1290000 283 | 284 | 285 | Nanchang 286 | 1350000 287 | 288 | 289 | Jilin 290 | 1270000 291 | 292 | 293 | Changchun 294 | 2110000 295 | 296 | 297 | Shenyang 298 | 4540000 299 | 300 | 301 | Dalian 302 | 2400000 303 | 304 | 305 | Anshan 306 | 1390000 307 | 308 | 309 | Fushun 310 | 1350000 311 | 312 | 313 | Xian 314 | 2760000 315 | 316 | 317 | Jinan 318 | 2320000 319 | 320 | 321 | Zibo 322 | 2460000 323 | 324 | 325 | Qingdao 326 | 2060000 327 | 328 | 329 | Taiyuan 330 | 1960000 331 | 332 | 333 | Datong 334 | 1110000 335 | 336 | 337 | Chengdu 338 | 2810000 339 | 340 | 341 | Chongqing 342 | 2980000 343 | 344 | 345 | Kunming 346 | 1520000 347 | 348 | 349 | Hangzhou 350 | 1340000 351 | 352 | 353 | Ningbo 354 | 1090000 355 | 356 | 357 | Nanning 358 | 1070000 359 | 360 | 361 | Baotou 362 | 1200000 363 | 364 | 365 | Urumqi 366 | 1160000 367 | 368 | 369 | Beijing 370 | 7000000 371 | 372 | 373 | Shanghai 374 | 7830000 375 | 376 | 377 | Tianjin 378 
| 5770000 379 | 380 | 381 | 382 | English 383 | 384 | 385 | English 386 | 387 | 388 | 389 | Medellin 390 | 1621356 391 | 392 | 393 | Barranquilla 394 | 1064255 395 | 396 | 397 | Bogota 398 | 5237635 399 | 400 | 401 | Cali 402 | 1718871 403 | 404 | Spanish 405 | 406 | 407 | 408 | 409 | 410 | 411 | 412 | Serbo-Croatian 413 | 414 | 415 | 416 | Havana 417 | 2241000 418 | 419 | Spanish 420 | 421 | 422 | 423 | 424 | Prague 425 | 1215000 426 | 427 | 428 | 429 | 430 | Copenhagen 431 | 1358540 432 | 433 | 434 | 435 | 436 | 437 | 438 | Santo Domingo 439 | 1400000 440 | 441 | Spanish 442 | 443 | 444 | 445 | Quito 446 | 1200000 447 | 448 | 449 | Guayaquil 450 | 1300868 451 | 452 | 453 | 454 | 455 | El Giza 456 | 1671000 457 | 458 | 459 | Alexandria 460 | 2917000 461 | 462 | 463 | Cairo 464 | 6053000 465 | 466 | 467 | 468 | 469 | 470 | 471 | 472 | 473 | Addis Ababa 474 | 2316400 475 | 476 | 477 | 478 | English 479 | 480 | 481 | 482 | 483 | Swedish 484 | Finnish 485 | 486 | 487 | 488 | Paris 489 | 2152423 490 | 491 | French 492 | 493 | 494 | French 495 | 496 | 497 | 498 | 499 | 500 | 501 | Tbilisi 502 | 1200000 503 | 504 | Russian 505 | Armenian 506 | Azeri 507 | Georgian 508 | 509 | 510 | 511 | Munchen 512 | 1244676 513 | 514 | 515 | Muenchen 516 | 1290079 517 | 518 | 519 | Berlin 520 | 3472009 521 | 522 | 523 | Hamburg 524 | 1705872 525 | 526 | German 527 | 528 | 529 | 530 | 531 | 532 | 533 | 534 | French 535 | 536 | 537 | 538 | Spanish 539 | Indian 540 | 541 | 542 | 543 | French 544 | 545 | 546 | 547 | 548 | French 549 | 550 | 551 | 552 | 553 | 554 | Budapest 555 | 2016000 556 | 557 | Hungarian 558 | 559 | 560 | Icelandic 561 | 562 | 563 | 564 | Hyderabad 565 | 3145939 566 | 567 | 568 | Ahmadabad 569 | 2954526 570 | 571 | 572 | Surat 573 | 1505872 574 | 575 | 576 | Vadodara 577 | 1061598 578 | 579 | 580 | Bangalore 581 | 3302296 582 | 583 | 584 | Bhopal 585 | 1062771 586 | 587 | 588 | Indore 589 | 1091674 590 | 591 | 592 | Mumbai 593 | 9925891 594 | 595 | 596 | Nagpur 597 | 
1624752 598 | 599 | 600 | Pune 601 | 1566651 602 | 603 | 604 | Kalyan 605 | 1014557 606 | 607 | 608 | Ludhiana 609 | 1042740 610 | 611 | 612 | Jaipur 613 | 1458183 614 | 615 | 616 | Madras 617 | 3841396 618 | 619 | 620 | Lucknow 621 | 1619115 622 | 623 | 624 | Kanpur 625 | 1879420 626 | 627 | 628 | Calcutta 629 | 4399819 630 | 631 | 632 | New Delhi 633 | 7206704 634 | 635 | Hindi 636 | 637 | 638 | 639 | Jakarta 640 | 8259266 641 | 642 | 643 | Bandung 644 | 2058649 645 | 646 | 647 | Semarang 648 | 1250971 649 | 650 | 651 | Surabaya 652 | 2483871 653 | 654 | 655 | Palembang 656 | 1144279 657 | 658 | 659 | Medan 660 | 1730752 661 | 662 | 663 | 664 | 665 | Tabriz 666 | 1166203 667 | 668 | 669 | Esfahan 670 | 1220595 671 | 672 | 673 | Shiraz 674 | 1042801 675 | 676 | 677 | Mashhad 678 | 1964489 679 | 680 | 681 | Tehran 682 | 6750043 683 | 684 | Turkish 685 | Baloch 686 | Kurdish 687 | Arabic 688 | Luri 689 | Persian Persian 690 | Turkic Turkic 691 | 692 | 693 | 694 | Baghdad 695 | 4478000 696 | 697 | 698 | 699 | 700 | 701 | 702 | Milano 703 | 1432184 704 | 705 | 706 | Rome 707 | 2791354 708 | 709 | 710 | Napoli 711 | 1206013 712 | 713 | 714 | 715 | 716 | 717 | Sapporo 718 | 1748000 719 | 720 | 721 | Tokyo 722 | 7843000 723 | 724 | 725 | Yokohama 726 | 3256000 727 | 728 | 729 | Kawasaki 730 | 1187000 731 | 732 | 733 | Nagoya 734 | 2108000 735 | 736 | 737 | Kyoto 738 | 1415000 739 | 740 | 741 | Osaka 742 | 2492000 743 | 744 | 745 | Kobe 746 | 1388000 747 | 748 | 749 | Hiroshima 750 | 1099000 751 | 752 | 753 | Fukuoka 754 | 1273000 755 | 756 | 757 | Kita kyushu 758 | 1012000 759 | 760 | Japanese 761 | 762 | 763 | 764 | 765 | 766 | Almaty 767 | 1172400 768 | 769 | Kazak 770 | 771 | 772 | 773 | Nairobi 774 | 2000000 775 | 776 | 777 | 778 | 779 | 780 | 781 | 782 | 783 | 784 | 785 | English 786 | 787 | 788 | 789 | 790 | 791 | 792 | Portuguese 793 | 794 | 795 | Albanian 796 | Macedonian 797 | Turkish 798 | Serbo-Croatian 799 | 800 | 801 | 802 | Antananarivo 803 | 1250000 804 | 
805 | 806 | 807 | 808 | 809 | Kuala Lumpur 810 | 1145075 811 | 812 | 813 | 814 | 815 | Bambara 816 | 817 | 818 | 819 | 820 | 821 | 822 | 823 | 824 | 825 | 826 | Nezahualcoyotl 827 | 1255456 828 | 829 | 830 | Guadalajara 831 | 1650042 832 | 833 | 834 | Monterrey 835 | 1068996 836 | 837 | 838 | Puebla 839 | 1007170 840 | 841 | 842 | Mexico 843 | 9815795 844 | 845 | 846 | 847 | 848 | 849 | 850 | Khalkha Mongol 851 | 852 | 853 | English 854 | 855 | 856 | 857 | Rabat 858 | 1385872 859 | 860 | 861 | Casablanca 862 | 2940623 863 | 864 | 865 | 866 | Portuguese 867 | 868 | 869 | German 870 | English 871 | Afrikaans 872 | 873 | 874 | 875 | Nepali 876 | 877 | 878 | 879 | Amsterdam 880 | 1101407 881 | 882 | 883 | Rotterdam 884 | 1078747 885 | 886 | Dutch 887 | 888 | 889 | 890 | 891 | 892 | 893 | Managua 894 | 1195000 895 | 896 | Spanish 897 | 898 | 899 | 900 | 901 | Lagos 902 | 5686000 903 | 904 | 905 | Ibadan 906 | 1263000 907 | 908 | 909 | 910 | 911 | 912 | 913 | Pyongyang 914 | 2335000 915 | 916 | Korean 917 | 918 | 919 | 920 | Norwegian 921 | 922 | 923 | 924 | 925 | Hyderabad 926 | 1107000 927 | 928 | 929 | Peshawar 930 | 1676000 931 | 932 | 933 | Lahore 934 | 5085000 935 | 936 | 937 | Karachi 938 | 9863000 939 | 940 | 941 | Faisalabad 942 | 1875000 943 | 944 | 945 | Gujranwala 946 | 1663000 947 | 948 | 949 | Rawalpindi 950 | 1290000 951 | 952 | 953 | Multan 954 | 1257000 955 | 956 | Pashtu 957 | Urdu 958 | Punjabi 959 | Sindhi 960 | Balochi 961 | Hindko 962 | Brahui 963 | Siraiki 964 | 965 | 966 | 967 | English 968 | 969 | 970 | English 971 | 972 | 973 | 974 | 975 | Lima 976 | 6321173 977 | 978 | 979 | 980 | 981 | Manila 982 | 1655000 983 | 984 | 985 | Quezon 986 | 1989000 987 | 988 | 989 | Kalookan 990 | 1023000 991 | 992 | 993 | Davao 994 | 1007000 995 | 996 | 997 | 998 | 999 | 1000 | Warsaw 1001 | 1655000 1002 | 1003 | Polish 1004 | 1005 | 1006 | Portuguese 1007 | 1008 | 1009 | 1010 | 1011 | 1012 | 1013 | Bucharest 1014 | 2037000 1015 | 1016 | 1017 | 1018 | 1019 | 
Sankt Peterburg 1020 | 4838000 1021 | 1022 | 1023 | Moscow 1024 | 8717000 1025 | 1026 | 1027 | Nizhniy Novgorod 1028 | 1383000 1029 | 1030 | 1031 | Kazan 1032 | 1085000 1033 | 1034 | 1035 | Volgograd 1036 | 1003000 1037 | 1038 | 1039 | Samara 1040 | 1184000 1041 | 1042 | 1043 | Rostov na Donu 1044 | 1026000 1045 | 1046 | 1047 | Ufa 1048 | 1094000 1049 | 1050 | 1051 | Perm 1052 | 1032000 1053 | 1054 | 1055 | Yekaterinburg 1056 | 1280000 1057 | 1058 | 1059 | Chelyabinsk 1060 | 1086000 1061 | 1062 | 1063 | Novosibirsk 1064 | 1369000 1065 | 1066 | 1067 | Omsk 1068 | 1163000 1069 | 1070 | Russian 1071 | 1072 | 1073 | 1074 | English 1075 | 1076 | 1077 | English 1078 | 1079 | 1080 | 1081 | French 1082 | 1083 | 1084 | 1085 | Italian 1086 | 1087 | 1088 | Portuguese 1089 | 1090 | 1091 | 1092 | Riyadh 1093 | 1250000 1094 | 1095 | Arabic 1096 | 1097 | 1098 | 1099 | Dakar 1100 | 1382000 1101 | 1102 | 1103 | 1104 | 1105 | Belgrade 1106 | 1407073 1107 | 1108 | Albanian 1109 | Serbo-Croatian 1110 | 1111 | 1112 | 1113 | 1114 | 1115 | Singapore 1116 | 2558000 1117 | 1118 | 1119 | 1120 | 1121 | Slovenian 1122 | Serbo-Croatian 1123 | 1124 | 1125 | English 1126 | 1127 | 1128 | 1129 | 1130 | 1131 | Seoul 1132 | 10229262 1133 | 1134 | 1135 | Pusan 1136 | 3813814 1137 | 1138 | 1139 | Taegu 1140 | 2449139 1141 | 1142 | 1143 | Inchon 1144 | 2307618 1145 | 1146 | 1147 | Kwangju 1148 | 1257504 1149 | 1150 | 1151 | Taejon 1152 | 1272143 1153 | 1154 | 1155 | 1156 | 1157 | Barcelona 1158 | 1630867 1159 | 1160 | 1161 | Madrid 1162 | 3041101 1163 | 1164 | Catalan 1165 | Galician 1166 | Basque 1167 | 1168 | 1169 | Tamil 1170 | Sinhala 1171 | 1172 | 1173 | 1174 | Omdurman 1175 | 1267077 1176 | 1177 | 1178 | 1179 | 1180 | 1181 | 1182 | Swedish 1183 | 1184 | 1185 | French 1186 | German 1187 | Italian 1188 | Romansch 1189 | 1190 | 1191 | 1192 | Damascus 1193 | 1500000 1194 | 1195 | 1196 | 1197 | 1198 | Kaohsiung 1199 | 1426518 1200 | 1201 | 1202 | Taipei 1203 | 2626138 1204 | 1205 | 1206 | 1207 | 1208 
| 1209 | Dar es Salaam 1210 | 1360850 1211 | 1212 | 1213 | 1214 | 1215 | Bangkok 1216 | 5876000 1217 | 1218 | 1219 | 1220 | 1221 | 1222 | 1223 | 1224 | 1225 | Adana 1226 | 1047300 1227 | 1228 | 1229 | Ankara 1230 | 2782200 1231 | 1232 | 1233 | Istanbul 1234 | 7615500 1235 | 1236 | 1237 | Izmir 1238 | 1985300 1239 | 1240 | 1241 | 1242 | Russian 1243 | Uzbek 1244 | Turkmen 1245 | 1246 | 1247 | English 1248 | 1249 | 1250 | 1251 | 1252 | 1253 | Dnipropetrovsk 1254 | 1187000 1255 | 1256 | 1257 | Donetsk 1258 | 1117000 1259 | 1260 | 1261 | Kharkiv 1262 | 1618000 1263 | 1264 | 1265 | Kiev 1266 | 2616000 1267 | 1268 | 1269 | Odesa 1270 | 1106000 1271 | 1272 | 1273 | 1274 | 1275 | 1276 | London 1277 | 6967500 1278 | 1279 | 1280 | Birmingham 1281 | 1008400 1282 | 1283 | 1284 | 1285 | 1286 | Phoenix 1287 | 1159014 1288 | 1289 | 1290 | Los Angeles 1291 | 3553638 1292 | 1293 | 1294 | San Diego 1295 | 1171121 1296 | 1297 | 1298 | Chicago 1299 | 2721547 1300 | 1301 | 1302 | Detroit 1303 | 1000272 1304 | 1305 | 1306 | New York 1307 | 7380906 1308 | 1309 | 1310 | Philadelphia 1311 | 1478002 1312 | 1313 | 1314 | Houston 1315 | 1744058 1316 | 1317 | 1318 | San Antonio 1319 | 1067816 1320 | 1321 | 1322 | Dallas 1323 | 1053292 1324 | 1325 | 1326 | 1327 | 1328 | Montevideo 1329 | 1247000 1330 | 1331 | 1332 | 1333 | 1334 | Tashkent 1335 | 2106000 1336 | 1337 | Russian 1338 | Tajik 1339 | Uzbek 1340 | 1341 | 1342 | 1343 | 1344 | Maracaibo 1345 | 1249670 1346 | 1347 | 1348 | Caracas 1349 | 1822465 1350 | 1351 | 1352 | 1353 | 1354 | Hanoi 1355 | 3056146 1356 | 1357 | 1358 | Haiphong 1359 | 1447523 1360 | 1361 | 1362 | Ho Chi Minh City 1363 | 3924435 1364 | 1365 | 1366 | 1367 | 1368 | 1369 | 1370 | 1371 | Arabic 1372 | 1373 | 1374 | 1375 | Kinshasa 1376 | 4655313 1377 | 1378 | 1379 | 1380 | 1381 | 1382 | -------------------------------------------------------------------------------- /02_xml-data/exercises/courses-ID.dtd: 
-------------------------------------------------------------------------------- 1 | 3 | 4 | 6 | 7 | 8 | 9 | 10 | 11 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | ]> 23 | -------------------------------------------------------------------------------- /02_xml-data/exercises/courses-ID.xml: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | Computer Science 5 | 6 | Programming Methodology 7 | Introduction to the engineering of computer applications emphasizing modern software engineering principles. 8 | 9 | 10 | Programming Abstractions 11 | Abstraction and its relation to programming. 12 | 13 | 14 | Computer Organization and Systems 15 | Introduction to the fundamental concepts of computer systems. 16 | 17 | 18 | Introduction to Probability for Computer Scientists 19 | 20 | 21 | From Languages to Information 22 | 23 | Natural language processing. Cross-listed as 24 | 25 | . 26 | 27 | 28 | 29 | Compilers 30 | Principles and practices for design and implementation of compilers and interpreters. 31 | 32 | 33 | Introduction to Databases 34 | Database design and use of database management systems for applications. 35 | 36 | 37 | Artificial Intelligence: Principles and Techniques 38 | 39 | 40 | Structured Probabilistic Models: Principles and Techniques 41 | Using probabilistic modeling languages to represent complex domains. 42 | 43 | 44 | Machine Learning 45 | A broad introduction to machine learning and statistical pattern recognition. 46 | 47 | 48 | Alex 49 | S. 50 | Aiken 51 | 52 | 53 | Jerry 54 | R. 55 | Cain 56 | 57 | 58 | Daphne 59 | Koller 60 | 61 | 62 | Andrew 63 | Ng 64 | 65 | 66 | Eric 67 | Roberts 68 | 69 | 70 | Mehran 71 | Sahami 72 | 73 | 74 | Sebastian 75 | Thrun 76 | 77 | 78 | Jennifer 79 | Widom 80 | 81 | 82 | Julie 83 | Zelenski 84 | 85 | 86 | 87 | Electrical Engineering 88 | 89 | Digital Systems I 90 | Digital circuit, logic, and system design. 
91 | 92 | 93 | Digital Systems II 94 | The design of processor-based digital systems. 95 | 96 | 97 | William 98 | J. 99 | Dally 100 | 101 | 102 | Mark 103 | A. 104 | Horowitz 105 | 106 | 107 | Subhasish 108 | Mitra 109 | 110 | 111 | Oyekunle 112 | Olukotun 113 | 114 | 115 | 116 | Linguistics 117 | 118 | From Languages to Information 119 | 120 | Natural language processing. Cross-listed as 121 | 122 | . 123 | 124 | 125 | 126 | Dan 127 | Jurafsky 128 | 129 | 130 | Beth 131 | Levin 132 | 133 | 134 | 135 | -------------------------------------------------------------------------------- /02_xml-data/exercises/courses-noID.dtd: -------------------------------------------------------------------------------- 1 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | ]> 20 | -------------------------------------------------------------------------------- /02_xml-data/exercises/courses-noID.xml: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | Computer Science 5 | 6 | 7 | Jennifer 8 | Widom 9 | 10 | 11 | 12 | Programming Methodology 13 | Introduction to the engineering of computer applications emphasizing modern software engineering principles. 14 | 15 | 16 | Jerry 17 | R. 18 | Cain 19 | 20 | 21 | Eric 22 | Roberts 23 | 24 | 25 | Mehran 26 | Sahami 27 | 28 | 29 | 30 | 31 | Programming Abstractions 32 | Abstraction and its relation to programming. 33 | 34 | 35 | Eric 36 | Roberts 37 | 38 | 39 | Jerry 40 | R. 41 | Cain 42 | 43 | 44 | 45 | CS106A 46 | 47 | 48 | 49 | Computer Organization and Systems 50 | Introduction to the fundamental concepts of computer systems. 51 | 52 | 53 | Julie 54 | Zelenski 55 | 56 | 57 | 58 | CS106B 59 | 60 | 61 | 62 | Introduction to Probability for Computer Scientists 63 | 64 | 65 | Mehran 66 | Sahami 67 | 68 | 69 | 70 | CS106B 71 | 72 | 73 | 74 | From Languages to Information 75 | Natural language processing. Cross-listed as LING180. 
76 | 77 | 78 | Dan 79 | Jurafsky 80 | 81 | 82 | 83 | CS107 84 | CS109 85 | 86 | 87 | 88 | Compilers 89 | Principles and practices for design and implementation of compilers and interpreters. 90 | 91 | 92 | Alex 93 | S. 94 | Aiken 95 | 96 | 97 | 98 | CS107 99 | 100 | 101 | 102 | Introduction to Databases 103 | Database design and use of database management systems for applications. 104 | 105 | 106 | Jennifer 107 | Widom 108 | 109 | 110 | 111 | CS107 112 | 113 | 114 | 115 | Artificial Intelligence: Principles and Techniques 116 | 117 | 118 | Andrew 119 | Ng 120 | 121 | 122 | Sebastian 123 | Thrun 124 | 125 | 126 | 127 | 128 | Structured Probabilistic Models: Principles and Techniques 129 | Using probabilistic modeling languages to represent complex domains. 130 | 131 | 132 | Daphne 133 | Koller 134 | 135 | 136 | 137 | 138 | Machine Learning 139 | A broad introduction to machine learning and statistical pattern recognition. 140 | 141 | 142 | Andrew 143 | Ng 144 | 145 | 146 | 147 | 148 | 149 | Electrical Engineering 150 | 151 | 152 | Mark 153 | A. 154 | Horowitz 155 | 156 | 157 | 158 | Digital Systems I 159 | Digital circuit, logic, and system design. 160 | 161 | 162 | Subhasish 163 | Mitra 164 | 165 | 166 | 167 | 168 | Digital Systems II 169 | The design of processor-based digital systems. 170 | 171 | 172 | William 173 | J. 174 | Dally 175 | 176 | 177 | Oyekunle 178 | Olukotun 179 | 180 | 181 | 182 | EE108A 183 | CS106B 184 | 185 | 186 | 187 | 188 | Linguistics 189 | 190 | 191 | Beth 192 | Levin 193 | 194 | 195 | 196 | From Languages to Information 197 | Natural language processing. Cross-listed as CS124. 
198 | 199 | 200 | Dan 201 | Jurafsky 202 | 203 | 204 | 205 | CS107 206 | CS109 207 | 208 | 209 | 210 | 211 | -------------------------------------------------------------------------------- /02_xml-data/well-formed-xml-annotated-slides.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/eddowh/coursera_intro-to-databases/d13161f219ab2ae95532513c95523d982fddd549/02_xml-data/well-formed-xml-annotated-slides.pdf -------------------------------------------------------------------------------- /02_xml-data/well-formed-xml.md: -------------------------------------------------------------------------------- 1 | # Extensible Markup Language (XML) 2 | 3 | - Standard for data representation and exchange 4 | - Document format similar to HTML 5 | - Tags describe content instead of formatting 6 | - Also has a streaming format/standard 7 | - Typically for the use in programs for admitting and consuming XML 8 | 9 | ## Basic constructs 10 | 11 | - Tagged elements (nested) 12 | - Attributes 13 | - Each element may have within its opening tag a set of attributes 14 | - Consist of an attribute name, the equal sign, and then an attribute value 15 | - Any element can have any number of attributes as long as the attribute names are unique 16 | - Text 17 | - If XML is thought of as a tree, the text (strings) are the leaf elements of the tree 18 | 19 | ## Example ([`Bookstore-noDTD.xml`](./data/Bookstore-noDTD.xml)) 20 | 21 | ```xml 22 | 23 | 24 | 25 | 26 | 27 | A First Course in Database Systems 28 | 29 | 30 | Jeffrey 31 | Ullman 32 | 33 | 34 | Jennifer 35 | Widom 36 | 37 | 38 | 39 | 40 | 41 | Buy this book bundled with "A First Course" - a great deal! 
42 | 43 | Database Systems: The Complete Book 44 | 45 | 46 | Hector 47 | Garcia-Molina 48 | 49 | 50 | Jeffrey 51 | Ullman 52 | 53 | 54 | Jennifer 55 | Widom 56 | 57 | 58 | 59 | 60 | ``` 61 | 62 | --- 63 | 64 | # Relational Model versus XML 65 | 66 | ## Structure 67 | 68 | - Relational: *tables* 69 | - XML: *hierarchical trees or graphs* 70 | 71 | ## Schema 72 | 73 | - Relational: *fixed in advance* 74 | - add the data afterwards to conform to the schema 75 | - XML: *flexible, self-describing* 76 | - the schema and the data are mixed together to some extent 77 | - tags on elements tell you the kind of data you'll have, and you can have a lot of irregularity 78 | - many mechanisms exist for introducing schemas into XML, but none is required 79 | 80 | ## Queries 81 | 82 | - Relational: *simple, nice languages* 83 | - XML: *less simple* 84 | 85 | ## Ordering 86 | 87 | - Relational: *None* 88 | - fundamentally, the data in a relational database is a set of data without an ordering within that set 89 | - XML: *Implied* 90 | - XML can be thought of as either a document model or a stream model 91 | - In either case, the nature of the XML being laid out in a document as we have here (or being in a stream) induces an order 92 | 93 | ## Implementation 94 | 95 | - Relational: *Native* 96 | - has been around for at least 35 years 97 | - XML: *Add-on* 98 | - most likely a layer over the relational database system 99 | - entering and querying data in XML will be translated into a relational implementation 100 | 101 | --- 102 | 103 | # *Well-Formed* XML 104 | 105 | ## Basic structural requirements: 106 | 107 | - Single root element 108 | - Matched tags, proper nesting 109 | - Unique attributes within elements 110 | 111 | ## XML Parser 112 | 113 | - The parser checks the basic structure of an XML document 114 | - If the document is not well-formed, the parser reports an error 115 | - Otherwise, the output is parsed XML, which is exposed through one of several standard interfaces: 116 | - **DOM**: document-object model is a 
programmatic interface for traversing the tree implied by XML 117 | - **SAX**: more of a stream model for XML 118 | 119 | ## Displaying XML 120 | 121 | Use a rule-based language to translate XML to HTML, which we can then render in a browser 122 | 123 | - **Cascading stylesheets (CSS)** 124 | - **Extensible stylesheet language (XSL)** 125 | 126 | ### Process 127 | 128 | 1. Send XML document to CSS or XSL interpreter 129 | 2. Apply rules that will be used on that particular document 130 | 3. Get output as HTML document 131 | 4. Render HTML in browser 132 | 133 | --- 134 | 135 | 136 | # Questions 137 | 138 | 1. You're creating a database to contain information about university records: students, courses, grades, etc. Should you use the relational model or XML? 139 | - The database has a simple structure fixed in advance, so it's amenable to the relational model. 140 | 141 | 2. You're creating a database to contain information for a university web site: news, academic announcements, admissions, events, research, etc. Should you use the relational model or XML? 142 | - The database has an unpredictable, complex, dynamic structure, so the flexibility of XML is warranted. 143 | 144 | 3. You're creating a database to contain information about family trees (ancestry). Should you use the relational model or XML? 145 | - The database has a fixed structure suggesting relational, but it's strictly hierarchical suggesting XML. Either may be suitable. 146 | -------------------------------------------------------------------------------- /02_xml-data/xml-quiz-solutions.md: -------------------------------------------------------------------------------- 1 | # Solutions to [XML Quiz](./xml-quiz.md) 2 | 3 | 1. **ANSWER**: 4 | 5 | ```xml 6 | <tasklist> 7 | <task name="eat"/> 8 | <task name="drink"/> 9 | <task name="play"/> 10 | </tasklist> 11 | ``` 12 | 13 | **EXPLANATION**: 14 | 15 | Well-formed XML must follow these rules (along with others): 16 | - There must be exactly one top-level element. 17 | - All opening tags must be closed. 
18 | - All elements are properly nested, i.e., there are no interleaved elements. 19 | - Attribute values must be enclosed in single or double quotes. 20 | 21 | 2. **ANSWER**: 22 | 23 | ```xml 24 | 25 | ``` 26 | 27 | **EXPLANATION**: 28 | 29 | In the XML snippet, the `INFO` element has one address subelement and two phone subelements, in that order. Thus, in the DTD the list of components for `INFO` must include `ADDR`, `ADDR*`, `ADDR+`, or `ADDR?` followed by `PHONE*` or `PHONE+`. Interspersed with these may be any elements that are not required to appear, that is, any components with a `?` or `*`. Thus, we might also have components like `NAME*` or `MANAGER?` at any point in the list. 30 | 31 | 3. **ANSWER**: 32 | 33 | ```xml 34 | 35 | ``` 36 | 37 | **EXPLANATION**: 38 | 39 | The correct choices (i.e., the erroneous DTD snippets) are based on two rules: 40 | 41 | 1. A `#REQUIRED` attribute must appear in every element. 42 | 2. An attribute can have types `CDATA`, `ID`, or `IDREF(S)`, but not `#PCDATA`. 43 | 44 | The incorrect choices (i.e., the snippets that could appear in a DTD) are either optional attributes (`#IMPLIED`) or required attributes of a proper type. 45 | 46 | 4. **ANSWER**: 47 | 48 | ``` 49 | A B /B B /B C B /B D /D /C /A 50 | ``` 51 | 52 | **EXPLANATION**: 53 | 54 | According to the DTD, an A element has within it one or more B subelements, and then a C element. Within the C element is zero or one B elements followed by exactly one D element. In terms of regular expressions, the tag sequences we can see are: 55 | 56 | ``` 57 | A (B /B)(B /B)* C (D /D | B /B D /D) /C /A. 58 | ``` 59 | 60 | Some text may appear between each B-/B pair and each D-/D pair, but text may not appear elsewhere. 61 | 62 | 5. **ANSWER**: 63 | 64 | Only the second 65 | 66 | **EXPLANATION**: 67 | 68 | Focus on the ID and IDREF attributes: A valid document needs to have unique values across ID attributes. An IDREF attribute can refer to any existing ID attribute value. 
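The two ID/IDREF rules from the explanation above can be sketched in Python. This is only an illustration, not a DTD validator: the function name `check_ids`, the attribute names `id` and `ref`, and the two sample documents are assumptions made up for this sketch, whereas a real validating parser would read the attribute types from the `ATTLIST` declarations in the DTD.

```python
import xml.etree.ElementTree as ET


def check_ids(xml_text, id_attr="id", idref_attr="ref"):
    """Return a list of rule violations; an empty list means both rules hold.

    id_attr and idref_attr are hypothetical attribute names standing in for
    attributes that a DTD would declare with types ID and IDREF.
    """
    root = ET.fromstring(xml_text)
    seen = set()
    problems = []

    # Rule 1: ID values must be unique across the whole document.
    for elem in root.iter():
        value = elem.get(id_attr)
        if value is None:
            continue
        if value in seen:
            problems.append(f"duplicate ID: {value}")
        seen.add(value)

    # Rule 2: every IDREF must point at some existing ID value.
    for elem in root.iter():
        target = elem.get(idref_attr)
        if target is not None and target not in seen:
            problems.append(f"dangling IDREF: {target}")

    return problems


# Made-up sample documents for the sketch.
GOOD = '<db><person id="p1"/><person id="p2" ref="p1"/></db>'
BAD = '<db><person id="p1"/><person id="p1" ref="p9"/></db>'

print(check_ids(GOOD))
print(check_ids(BAD))
```

The second pass runs only after all IDs have been collected, so a reference may point forward to an element that appears later in the document.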
69 | 70 | 6. **ANSWER**: 71 | 72 | ```xml 73 | 74 | John 75 | Q 76 | Public 77 |
123 Public Avenue, Seattle, WA 98001
78 | Computer Science 79 |
80 | ``` 81 | 82 | **EXPLANATION**: 83 | 84 | This question deals with the `xs:element`, `xs:sequence`, and `xs:choice` elements in XML Schema. In order for XML to be valid according to the specified schema: 85 | - The elements contained in a sequence must appear in exactly the same order as specified in the `xs:sequence`. 86 | - Exactly one of the elements contained in an `xs:choice` must appear. 87 | - If an element specifies a `minOccurs` attribute, the XML must contain at least that many instances of the element. 88 | - If an element specifies a `maxOccurs` attribute, the XML must not contain more than that many instances of the element. 89 | - If `minOccurs` and `maxOccurs` are not specified, their default value is 1. 90 | - Elements not defined as a part of a sequence or choice cannot occur inside the corresponding `xs:sequence` and `xs:choice`. 91 | 92 | The given schema specifies the following constraints: 93 | - The "fname", "initial", "lname", and "address" elements must occur in that sequence. 94 | - The "initial" element is optional due to its `minOccurs` value being 0. 95 | - The "address" element can occur either 1 or 2 times due to its `maxOccurs` value being 2. 96 | - After the "address" element, either exactly one "major" element or exactly 2 "minor" elements can occur, but not both. 97 | - Elements not defined as a part of this schema specification are not allowed to occur as a part of the "person" element. 98 | 99 | Here is an example of valid XML for this schema: 100 | 101 | ```xml 102 | 103 | John 104 | Q 105 | Public 106 |
123 Public Avenue
107 |
Seattle, WA 98001
108 | Computer Science 109 |
110 | ``` 111 | -------------------------------------------------------------------------------- /02_xml-data/xml-quiz.md: -------------------------------------------------------------------------------- 1 | # Multiple Choice (6 points) 2 | 3 | 4 | 1. Provide a well-formed XML that satisfies the following conditions: 5 | - It has a root element "tasklist" 6 | - The root element has 3 "task" subelements 7 | - Each of the "task" subelements has an attribute named "name" 8 | - The values of the "name" attributes for the 3 tasks are "eat", "drink", and "play" 9 | 10 | 2. An XML document contains the following portion: 11 | 12 | ```xml 13 | ... 14 | 15 | 101 Maple St. 16 | 555-1212 17 | 555-4567 18 | 19 | ... 20 | ``` 21 | 22 | Which of the following could be the `INFO` element specification in a DTD that the document matches? 23 | 24 | 3. An XML document contains the following portion: 25 | 26 | ```xml 27 | 28 | 123 Sesame St. 29 | 555-1212 30 | 31 | ``` 32 | 33 | Which of the following could NOT be part of a DTD that the document matches? Note that there can be multiple `ATTLIST` declarations for a single element type; do not assume the only attributes allowed for an element type are the ones shown in the answer choice. 34 | 35 | 4. Here is a DTD: 36 | 37 | ```xml 38 | 40 | 41 | 42 | 43 | ]> 44 | ``` 45 | 46 | Which of the following sequences of opening and closing tags matches this DTD? Note: In actual XML, opening and closing tags would be enclosed in angle brackets, and some elements might have text subelements. This quiz focuses on the element sequencing and interleaving specified by the DTD. 47 | 48 | 5. Here is an XML DTD: 49 | 50 | ```xml 51 | 53 | 54 | 55 | 56 | 57 | 58 | 59 | ]> 60 | ``` 61 | 62 | Which of the following documents match the DTD? 
63 | 64 | - 65 | 66 | ```xml 67 | 68 | 69 | 70 | 71 | 72 | 73 | 74 | 75 | 76 | 77 | 78 | ``` 79 | - 80 | 81 | ```xml 82 | 83 | 84 | 85 | 86 | 87 | 88 | 89 | 90 | 91 | 92 | 93 | 94 | ``` 95 | - 96 | 97 | ```xml 98 | 99 | 100 | 101 | 102 | 103 | 104 | 105 | ``` 106 | 107 | 6. Study the following XML Schema specification: 108 | 109 | ```xml 110 | 111 | 112 | 113 | 114 | 115 | 117 | 118 | 120 | 121 | 122 | 124 | 125 | 126 | 127 | 128 | 129 | ``` 130 | 131 | Provide an XML document that is valid according to the XML Schema specification. 132 | 133 | --- 134 | 135 | # [View Solutions](./xml-quiz-solutions.md) 136 | -------------------------------------------------------------------------------- /02_xml-data/xml-schema-annotated-slides.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/eddowh/coursera_intro-to-databases/d13161f219ab2ae95532513c95523d982fddd549/02_xml-data/xml-schema-annotated-slides.pdf -------------------------------------------------------------------------------- /02_xml-data/xml-schema.md: -------------------------------------------------------------------------------- 1 | # XML Schema (XSD) 2 | 3 | - Extensive language 4 | - Like DTDs, can specify elements, attributes, nesting, ordering, number of occurrences 5 | - Allows data types, keys, (typed) pointers, etc. 6 | - Unlike DTDs, is written in XML 7 | 8 | ## Key Features 9 | 10 | 1. Typed values 11 | 2. Key declarations 12 | - in DTDs, `ID`s were globally unique values that could be used to identify specific elements 13 | 3. References (`keyref`) 14 | - similar to pointers but a little more powerful 15 | 4. 
Occurrence constraints 16 | - `minOccurs="0"`: 0 instances at minimum (not required) 17 | - `maxOccurs="unbounded"`: no upper limit 18 | 19 | --- 20 | 21 | # Example 22 | 23 | ### [`Bookstore-XSD.xml`](./data/Bookstore-XSD.xml) with [`Bookstore.xsd`](./data/Bookstore.xsd) 24 | 25 | - Instead of using `IDREF` attributes to refer from books to authors, we are now back to having an authors sub-element with the two authors underneath, and those authors themselves hold what are effectively *pointers to the identifiers* for the authors 26 | - The XSD is in a separate file from the actual XML data 27 | 28 | ```xml 29 | 30 | 31 | 32 | 33 | 34 | 35 | 36 | 37 | 38 | 39 | 40 | 41 | 42 | 43 | 44 | 45 | 46 | 47 | A First Course in Database Systems 48 | 49 | 50 | 51 | 52 | 53 | 54 | Database Systems: The Complete Book 55 | 56 | 57 | 58 | 59 | 60 | 61 | Amazon.com says: Buy this book bundled with 62 | - a great deal! 63 | 64 | 65 | 66 | Hector 67 | Garcia-Molina 68 | 69 | 70 | Jeffrey 71 | Ullman 72 | 73 | 74 | Jennifer 75 | Widom 76 | 77 | 78 | ``` 79 | -------------------------------------------------------------------------------- /03_json-data/data/Bookstore.json: -------------------------------------------------------------------------------- 1 | { "Books": 2 | [ 3 | { "ISBN":"ISBN-0-13-713526-2", 4 | "Price":85, 5 | "Edition":3, 6 | "Title":"A First Course in Database Systems", 7 | "Authors":[ {"First_Name":"Jeffrey", "Last_Name":"Ullman"}, 8 | {"First_Name":"Jennifer", "Last_Name":"Widom"} ] } 9 | , 10 | { "ISBN":"ISBN-0-13-815504-6", 11 | "Price":100, 12 | "Remark":"Buy this book bundled with 'A First Course' - a great deal!", 13 | "Title":"Database Systems:The Complete Book", 14 | "Authors":[ {"First_Name":"Hector", "Last_Name":"Garcia-Molina"}, 15 | {"First_Name":"Jeffrey", "Last_Name":"Ullman"}, 16 | {"First_Name":"Jennifer", "Last_Name":"Widom"} ] } 17 | ], 18 | "Magazines": 19 | [ 20 | { "Title":"National Geographic", 21 | "Month":"January", 22 | "Year":2009 
} 23 | , 24 | { "Title":"Newsweek", 25 | "Month":"February", 26 | "Year":2009 } 27 | ] 28 | } 29 | -------------------------------------------------------------------------------- /03_json-data/data/BookstoreSchema.json: -------------------------------------------------------------------------------- 1 | { "type":"object", 2 | "properties": { 3 | "Books": { 4 | "type":"array", 5 | "items": { 6 | "type":"object", 7 | "properties": { 8 | "ISBN": { "type":"string", "pattern":"ISBN*" }, 9 | "Price": { "type":"integer", 10 | "minimum":0, "maximum":200 }, 11 | "Edition": { "type":"integer", "optional": true }, 12 | "Remark": { "type":"string", "optional": true }, 13 | "Title": { "type":"string" }, 14 | "Authors": { 15 | "type":"array", 16 | "minItems":1, 17 | "maxItems":10, 18 | "items": { 19 | "type":"object", 20 | "properties": { 21 | "First_Name": { "type":"string" }, 22 | "Last_Name": { "type":"string" }}}}}}}, 23 | "Magazines": { 24 | "type":"array", 25 | "items": { 26 | "type":"object", 27 | "properties": { 28 | "Title": { "type":"string" }, 29 | "Month": { "type":"string", 30 | "enum":["January","February"] }, 31 | "Year": { "type":"integer" }}}} 32 | }} 33 | -------------------------------------------------------------------------------- /03_json-data/json-demo-annotated-slides.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/eddowh/coursera_intro-to-databases/d13161f219ab2ae95532513c95523d982fddd549/03_json-data/json-demo-annotated-slides.pdf -------------------------------------------------------------------------------- /03_json-data/json-demo.md: -------------------------------------------------------------------------------- 1 | # Demo 2 | 3 | - Constructs and syntactic correctness 4 | - Flexibility of data model 5 | - JSON Schema, validation 6 | 7 | # Example 8 | 9 | ## [`Bookstore.json`](./data/Bookstore.json) 10 | 11 | JSON data representing books and magazines 12 | 13 | ```json 14 | { 
"Books": 15 | [ 16 | { "ISBN":"ISBN-0-13-713526-2", 17 | "Price":85, 18 | "Edition":3, 19 | "Title":"A First Course in Database Systems", 20 | "Authors":[ {"First_Name":"Jeffrey", "Last_Name":"Ullman"}, 21 | {"First_Name":"Jennifer", "Last_Name":"Widom"} ] } 22 | , 23 | { "ISBN":"ISBN-0-13-815504-6", 24 | "Price":100, 25 | "Remark":"Buy this book bundled with 'A First Course' - a great deal!", 26 | "Title":"Database Systems:The Complete Book", 27 | "Authors":[ {"First_Name":"Hector", "Last_Name":"Garcia-Molina"}, 28 | {"First_Name":"Jeffrey", "Last_Name":"Ullman"}, 29 | {"First_Name":"Jennifer", "Last_Name":"Widom"} ] } 30 | ], 31 | "Magazines": 32 | [ 33 | { "Title":"National Geographic", 34 | "Month":"January", 35 | "Year":2009 } 36 | , 37 | { "Title":"Newsweek", 38 | "Month":"February", 39 | "Year":2009 } 40 | ] 41 | } 42 | ``` 43 | 44 | ## [`BookstoreSchema.json`](./data/BookstoreSchema.json) 45 | 46 | - The structure of the schema file reflects the structure of the data file it's describing. 47 | - The outermost constructs in the schema file are the outermost in the data file and as we nest, it parallels the nesting. 48 | - In JSON Schema, `additionalProperties: true | false` determines whether the data may have any properties beyond those that are specified in the schema. 49 | - Many parsers actually do enforce that labels or properties need to be unique within objects, even though technically syntactically correct JSON does allow multiple copies. 50 | - Numeric boundaries for integer types: 51 | 52 | "Price": { 53 | "type": "integer", 54 | "minimum": min, 55 | "maximum": max 56 | } 57 | 58 | - We can constrain strings using a pattern (similar to regex), and we can constrain any type by enumerating the values that are allowed 59 | 60 | "Month": { 61 | "type": "string", 62 | "enum": ["January", "February", "March", ... 
] 63 | } 64 | 65 | 66 | ```json 67 | { "type":"object", 68 | "properties": { 69 | "Books": { 70 | "type":"array", 71 | "items": { 72 | "type":"object", 73 | "properties": { 74 | "ISBN": { "type":"string", "pattern":"ISBN*" }, 75 | "Price": { "type":"integer", 76 | "minimum":0, "maximum":200 }, 77 | "Edition": { "type":"integer", "optional": true }, 78 | "Remark": { "type":"string", "optional": true }, 79 | "Title": { "type":"string" }, 80 | "Authors": { 81 | "type":"array", 82 | "minItems":1, 83 | "maxItems":10, 84 | "items": { 85 | "type":"object", 86 | "properties": { 87 | "First_Name": { "type":"string" }, 88 | "Last_Name": { "type":"string" }}}}}}}, 89 | "Magazines": { 90 | "type":"array", 91 | "items": { 92 | "type":"object", 93 | "properties": { 94 | "Title": { "type":"string" }, 95 | "Month": { "type":"string", 96 | "enum":["January","February"] }, 97 | "Year": { "type":"integer" }}}} 98 | }} 99 | ``` 100 | -------------------------------------------------------------------------------- /03_json-data/json-intro-annotated-slides.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/eddowh/coursera_intro-to-databases/d13161f219ab2ae95532513c95523d982fddd549/03_json-data/json-intro-annotated-slides.pdf -------------------------------------------------------------------------------- /03_json-data/json-intro.md: -------------------------------------------------------------------------------- 1 | # JavaScript Object Notation (JSON) 2 | 3 | - Like XML, can be thought of as a data model 4 | - Standard for "serializing" data objects, usually in files 5 | - *serializing*: taking objects that are in a program and writing them in a serial fashion 6 | - Human-readable, useful for data interchange 7 | - An alternative to the relational data model that is more appropriate for semi-structured data 8 | - No longer tied to JavaScript 9 | - Parsers available for many languages 10 | 11 | ## Base constructs 
(recursive) 12 | 13 | - Base values: number, string, boolean, null 14 | - Objects: sets of label-value pairs (enclosed in `{}`) 15 | - Arrays: list of values (enclosed in `[]`) 16 | 17 | --- 18 | 19 | # Relational Model versus JSON 20 | 21 | ## Structure 22 | 23 | - Relational: *tables* 24 | - JSON: *sets, arrays* 25 | - can be nested recursively 26 | 27 | ## Schema 28 | 29 | - Relational: *fixed in advance* 30 | - add the data afterwards to conform to the schema 31 | - JSON: *no schema required* 32 | - schema and data are kind of mixed together (just like XML) 33 | - often referred to as ***self-describing data***, where the schema elements are within the data itself 34 | 35 | ## Queries 36 | 37 | - Relational: *simple, nice languages* 38 | - JSON: *nothing widely used* 39 | - typically read into a program and manipulated programmatically 40 | - proposals exist: a JSON Path language, JSON Query, and JAQL 41 | 42 | ## Ordering 43 | 44 | - Relational: *None* 45 | - fundamentally, the data in a relational database is a set of data without an ordering within that set 46 | - JSON: *arrays* (ordered) 47 | 48 | ## Implementation 49 | 50 | - Relational: *Native* 51 | - has been around for at least 35 years 52 | - JSON: *coupled with programming languages* (no standalone systems) 53 | - used in NoSQL systems as a format for reading and writing data 54 | - some NoSQL systems are ***document management systems***, where the documents themselves may contain JSON data and then the systems will have special features for manipulating the JSON in the document that is stored by the system 55 | 56 | --- 57 | 58 | # XML versus JSON 59 | 60 | ## Verbosity 61 | 62 | XML > JSON: 63 | 64 | - largely due to closing tags in XML 65 | 66 | ## Complexity 67 | 68 | XML > JSON: 69 | 70 | - a lot of extra stuff in XML 71 | - takes more time to read an entire XML specification than to read the JSON one 72 | 73 | ## Validity 74 | 75 | - XML: Document Type Definitions (DTDs), XML Schema Definitions (XSD) 76 | - JSON: 
JSON Schema 77 | - a way to specify structure and test conformance 78 | - not widely used (as of Feb. 2012) 79 | 80 | ## Programming Interface 81 | 82 | - XML: potentially clunky 83 | - ***impedance mismatch***: attributes and sub-elements of an XML model don't typically match the model of data inside a programming language 84 | - discussed heavily in the field of database systems (one of the original criticisms of relational database systems) 85 | - JSON: more direct mapping 86 | - e.g., Python dictionaries are very similar to JSON objects 87 | 88 | ## Querying 89 | 90 | - XML widely used: 91 | - XPath 92 | - XQuery 93 | - XSLT 94 | - JSON: 95 | - JSON Path (proposal) 96 | - JSON Query 97 | - JAQL 98 | 99 | --- 100 | 101 | # Syntactically valid JSON 102 | 103 | ## Basic structural requirements 104 | 105 | - Sets of label-value pairs 106 | - Arrays of values 107 | - Base values from predefined types 108 | 109 | ## Process 110 | 111 | 1. Send JSON file to parser 112 | 2. Parser determines whether there are syntactic errors or not 113 | 3. If there are none, the data is parsed and can be manipulated via a programming language 114 | 115 | # Semantically valid JSON 116 | 117 | ## Basic structural requirements 118 | 119 | Additionally checks whether data conforms to specified schema 120 | 121 | - if we use JSON Schema, we put the specification in a separate file 122 | - send the data and the schema to a validator (an additional step on top of the syntax check) 123 | 124 | --- 125 | 126 | # Questions 127 | 128 | 1. You're creating a database to contain information about students in a class (name and ID), and class projects done in pairs (two students and a project title). Should you use the relational model or JSON? 129 | 130 | **ANSWER**: Relational 131 | 132 | **EXPLANATION**: The database has a fixed structure that lends itself to tables (one table for student information and one for project information) and convenient queries in a relational language. 133 | 134 | 2. 
You're creating a database to contain information about students in a class (name and ID), and class projects. Projects may include any combination of students; they have a title and optional additional information such as materials, approvals, and milestones. Should you use the relational model or JSON? 135 | 136 | **ANSWER**: JSON 137 | 138 | **EXPLANATION**: The database has a complex, irregular, and possibly dynamic structure, so the flexibility of JSON is warranted. 139 | 140 | 3. You're creating a database to contain a set of sensor measurements from a two-dimensional grid. Each measurement is a time-sequence of readings, and each reading contains ten labeled values. Should you use the relational model or JSON? 141 | 142 | **ANSWER**: Either one is appropriate 143 | 144 | **EXPLANATION**: The database has a fixed structure suggesting relational, but its nested array, list, and label-value structure suggests JSON. Either may be suitable. 145 | -------------------------------------------------------------------------------- /03_json-data/json-quiz-solutions.md: -------------------------------------------------------------------------------- 1 | # Solutions to [JSON Quiz](./json-quiz.md) 2 | 3 | 1. **EXPLANATION**: 4 | 5 | In JSON objects, all labels (property names) must be quoted, and all label-value pairs must be separated by commas. Values in label-value pairs can be numbers, quoted strings, true, false, null, objects, or arrays. Objects and arrays may be empty. 6 | 7 | 2. **EXPLANATION**: 8 | 9 | A JSON array is a comma-separated, [] enclosed list of JSON values. Values can be numbers, quoted strings, true, false, null, objects, or arrays. Objects and arrays may be empty. Objects must be a set of label-value pairs. 10 | 11 | 3. 
**ANSWER**: 12 | 13 | ```json 14 | { "ItemID": "Item123", 15 | "ItemName": "desk", 16 | "Price": 50, 17 | "Sellers": ["Amy","Ben"], 18 | "Ratings": [{"Rater":"Amy", "Score":5}, {"Score":1}, 19 | {"Rater":"Carl", "Score":4}], 20 | "AvgRating": 10.0, 21 | "FreeShipping": true } 22 | ``` 23 | 24 | **EXPLANATION**: 25 | 26 | JSON data that is valid according to the JSON Schema specification must have: an ItemID that is a string beginning with "Item"; a Price that is an integer between 10 and 100; an array of between 0 and 3 Sellers; an array of 0 or more Ratings, each of which is either a Rater-Score pair, or just a Score, where scores are integers between 1 and 5; a FreeShipping designation of either true or false. (AvgRating, a real number, is optional.) 27 | 28 | 4. **EXPLANATION**: 29 | 30 | The following JSON Schema specification is valid for the given data: 31 | 32 | ```json 33 | { "type": "object", 34 | "properties": { 35 | "A": {"type":"array", "minItems":4, 36 | "maxItems":4, "items":{"type":"integer"}}, 37 | "B": {"type":"object", 38 | "properties": {"C": {"type":"integer"}, "D": {"type":"integer"}}}, 39 | "E": {"type":"array", "items": {"type":["integer","boolean"]}}, 40 | "F": {"type":"object", 41 | "properties": {"G": {"type":"array", 42 | "items": {"type":["null","integer"]}}}} 43 | } 44 | } 45 | ``` 46 | 47 | Changing the minimum and/or maximum number of items in "A" is valid as long as four items are permitted. Alternative types may be added (e.g., replacing "integer" with ["integer","string"]) without violating validity. 48 | -------------------------------------------------------------------------------- /03_json-data/json-quiz.md: -------------------------------------------------------------------------------- 1 | # Multiple Choice 2 | 3 | 4 | 1. Why is this NOT a valid JSON object? 
5 | 6 | ```json 7 | { "name": "Smiley", 8 | "age": 20, 9 | "phone": { "888-123-4567", "888-765-4321" }, 10 | "email": "smiley@xyz.com", 11 | "happy": true } 12 | ``` 13 | 14 | 2. Why is this NOT a valid JSON array? 15 | 16 | ```json 17 | [ 1, 2, "dog", "cat", true, false, [1, "dog", null], {} ] 18 | ``` 19 | 20 | 3. Consider the following JSON Schema specification: 21 | 22 | ```json 23 | { 24 | "type": "object", 25 | "properties": 26 | { "ItemID": { "type":"string", "pattern":"Item*" }, 27 | "ItemName": { "type":"string" }, 28 | "Price": { "type":"integer", "minimum":10, "maximum":100 }, 29 | "Sellers": { "type":"array", "maxItems":3, 30 | "items": { "type":"string" }}, 31 | "Ratings": { "type":"array", 32 | "items": 33 | { "type": "object", 34 | "properties": {"Rater": 35 | {"type": "string", "optional": true}, 36 | "Score": 37 | {"type": "integer", "minimum":1, 38 | "maximum":5}}}}, 39 | "AvgRating": { "type":"number", "optional":true }, 40 | "FreeShipping": {"type":"boolean" } 41 | } 42 | } 43 | ``` 44 | 45 | Provide the JSON data that is valid according to the JSON Schema specification above. 46 | 47 | 4. Consider the following JSON data: 48 | 49 | ```json 50 | { "A": [1,1,2,2], "B": {"C":3, "D":4}, "E":[5,6,true], "F": {"G": [null,7]} } 51 | ``` 52 | 53 | Why could the following NOT be included as part of a JSON Schema specification that is satisfied by the JSON data above? Assume that every letter ("A", "B", "C", ...) appears in the JSON Schema specification exactly once. 
54 | 55 | ```json 56 | "G": {"type":"array", "items": {"type":"integer"}} 57 | ``` 58 | 59 | --- 60 | 61 | # [View Solutions](./json-quiz-solutions.md) 62 | -------------------------------------------------------------------------------- /04_relational-algebra/relational-algebra-1-annotated-slides.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/eddowh/coursera_intro-to-databases/d13161f219ab2ae95532513c95523d982fddd549/04_relational-algebra/relational-algebra-1-annotated-slides.pdf -------------------------------------------------------------------------------- /04_relational-algebra/relational-algebra-2-annotated-slides.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/eddowh/coursera_intro-to-databases/d13161f219ab2ae95532513c95523d982fddd549/04_relational-algebra/relational-algebra-2-annotated-slides.pdf -------------------------------------------------------------------------------- /04_relational-algebra/relational-algebra-questions.md: -------------------------------------------------------------------------------- 1 | # Part 1 - Select, Project, Join 2 | 3 | 1. `Statement A: It can be useful to compose two selection operators.` 4 | 5 | For example: 6 | 7 | $$\sigma_{state='CA'}(\sigma_{enr>8000} \mathrm{College})$$ 8 | 9 | `Statement B: It can be useful to compose two projection operators.` 10 | 11 | For example: 12 | 13 | $$\pi_{GPA} (\pi_{sID,GPA,HS} \mathrm{Student})$$ 14 | 15 | --- 16 | 17 | **ANSWER**: Both statements are false. 18 | 19 | **EXPLANATION**: Two selection operators in a row can always be replaced by a single selection operator whose condition is the "and" of the two selection conditions. If there are two projection operators in a row, the attribute list of the second (outer) projection must be a subset of the attribute list of the first (inner) projection. 
Thus, the first projection can be removed without changing the result of the expression. 20 | 21 | --- 22 | 23 | 2. Which of the following expressions does NOT return the names and GPAs of students with HS>1000 who applied to CS and were rejected? 24 | 25 | A) $\pi_{sName, GPA}(\sigma_{Student.sID=Apply.sID}(\sigma_{HS>1000}(Student) \times \sigma_{major=`CS` \wedge dec=`R`}(Apply)))$ 26 | 27 | B) $\pi_{sName, GPA}(\sigma_{Student.sID=Apply.sID \wedge HS > 1000 \wedge major=`CS` \wedge dec=`R`}(Student \times \pi_{sID, major, dec}Apply))$ 28 | 29 | C) $\sigma_{Student.sID=Apply.sID}(\pi_{sName, GPA}(\sigma_{HS > 1000}Student \times \sigma_{major=`CS` \wedge dec=`R`}Apply))$ 30 | 31 | --- 32 | 33 | **ANSWER**: C 34 | 35 | **EXPLANATION**: The first two expressions are equivalent to the expression given in the video for this query. The third expression is invalid because the sID attributes are not included in the result of the expression to which condition Student.sID=Apply.sID is applied. 36 | 37 | --- 38 | 39 | 3. 
Which of the following English sentences describes the result of this expression: 40 | 41 | $$\pi_{sName, cName}(\sigma_{HS > enr}(\sigma_{state=`CA`}College \Join Student \Join \sigma_{major=`CS`}Apply))$$ 42 | 43 | A) All student-college name pairs, where the student is applying to major in CS at the college, the college is in California, and the college is smaller than some high school 44 | 45 | B) Students paired with all California colleges to which the student applied to major in CS, where at least one of those colleges is smaller than the student's high school 46 | 47 | C) Students paired with all colleges smaller than the student's high school to which the student applied to major in CS, where at least one of those colleges is in California 48 | 49 | D) Students paired with all California colleges smaller than the student's high school to which the student applied to major in CS 50 | 51 | --- 52 | 53 | **ANSWER**: D 54 | 55 | **EXPLANATION**: The inner natural join connects students with the colleges to which they've applied, allowing only California colleges and CS-major applications. The outer selection condition filters out all applications except those where the high school is bigger than the college, and the final projection keeps the student and college names. 56 | 57 | --- 58 | 59 | 4. Which of the following expressions finds the IDs of all students such that some college bears the student's name? 60 | 61 | A) $\pi_{sID}(College \Join Student)$ 62 | 63 | B) $\pi_{sID}(\sigma_{cName=sName}(College \times Student))$ 64 | 65 | C) $\pi_{sID}(\pi_{cName}College \Join \pi_{cName}(\sigma_{sName=cName}Student))$ 66 | 67 | D) $\pi_{sID}(\sigma_{cName=sName}(\pi_{sID}Student \times College \times Student))$ 68 | 69 | --- 70 | 71 | **ANSWER**: B 72 | 73 | **EXPLANATION**: The first choice returns the IDs of all students in the database. The third choice is invalid because cName is not an attribute of Student. The fourth choice yields all sIDs in Student. 
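
The difference between choices A and B in question 4 can be checked with a small in-memory sketch. This is only an illustration under stated assumptions: the sample tuples and the helper functions `cross`, `select`, and `project` are hypothetical and not part of the course material. Because College and Student share no attribute names, the natural join in choice A behaves like a cross-product.

```python
from itertools import product

# Hypothetical sample data; attribute names follow the College/Student schema.
colleges = [
    {"cName": "Stanford", "state": "CA", "enr": 15000},
    {"cName": "Irene", "state": "NY", "enr": 2000},
]
students = [
    {"sID": 123, "sName": "Amy", "GPA": 3.9, "HS": 1000},
    {"sID": 234, "sName": "Irene", "GPA": 3.6, "HS": 1500},
]

def cross(r, s):
    """Cross-product: pair every tuple of r with every tuple of s."""
    return [{**a, **b} for a, b in product(r, s)]

def select(cond, r):
    """Selection: keep only the tuples that satisfy cond."""
    return [t for t in r if cond(t)]

def project(attrs, r):
    """Projection with duplicate elimination (sorted only for stable output)."""
    return sorted({tuple(t[a] for a in attrs) for t in r})

# Choice B: students whose name is also the name of some college.
choice_b = project(["sID"], select(lambda t: t["cName"] == t["sName"],
                                   cross(colleges, students)))

# Choice A: with no shared attributes the join keeps every pairing,
# so every student ID survives the projection.
choice_a = project(["sID"], cross(colleges, students))

print(choice_b)  # [(234,)]
print(choice_a)  # [(123,), (234,)]
```

Only the student named "Irene" matches a college name, so choice B returns a single ID while choice A returns all of them.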
74 | 75 | \pagebreak 76 | 77 | # Part 2 - Set Operators, Renaming, Notation 78 | 79 | 1. Three of the following four expressions find the names of all students who did not apply to major in CS or EE. Which one finds something different? (Hint: You should not assume student names are unique.) 80 | 81 | A) $\pi_{sName}(Student \Join (\pi_{sID}Student - (\pi_{sID}(\sigma_{major='CS'}Apply) \cup \pi_{sID}(\sigma_{major='EE'}Apply))))$ 82 | 83 | B) $\pi_{sName}(Student \Join (\pi_{sID}Student - \pi_{sID}(\sigma_{major='CS' \vee major='EE'}Apply)))$ 84 | 85 | C) $\pi_{sName}(\pi_{sID, sName}Student - \pi_{sID, sName}(Student \Join \pi_{sID}(\sigma_{major='CS' \vee major='EE'}Apply)))$ 86 | 87 | D) $\pi_{sName}Student - \pi_{sName}(Student \Join \pi_{sID}(\sigma_{major='CS' \vee major='EE'}Apply))$ 88 | 89 | --- 90 | 91 | **ANSWER**: D 92 | 93 | **EXPLANATION**: If there are two students named Susan, one who did not apply to CS or EE, and one who did, then 'Susan' will correctly be included in the result of the first three expressions, but will incorrectly be omitted from the result of the last one. 94 | 95 | --- 96 | 97 | 2. Which of the following English sentences describes the result of this expression: 98 | 99 | $$\pi_{cName}College - \pi_{cName}(Apply \Join (\pi_{sID}(\sigma_{GPA > 3.5}Student) \cap \pi_{sID}(\sigma_{major='CS'}Apply)))$$ 100 | 101 | A) All colleges with no GPA>3.5 applicants who applied for a CS major at that college 102 | 103 | B) All colleges with no GPA>3.5 applicants who applied for a CS major at any college 104 | 105 | C) All colleges where all applicants either have GPA>3.5 or applied for a CS major at that college 106 | 107 | D) All colleges where no applicants have GPA>3.5 or no applicants applied for a CS major at that college 108 | 109 | --- 110 | 111 | **ANSWER**: B 112 | 113 | **EXPLANATION**: The intersection finds the IDs of all students who have GPA>3.5 and applied for a CS major at any college. 
The natural-join with Apply finds all colleges those students applied for. The minus finds all colleges except the ones those students applied for. 114 | 115 | --- 116 | 117 | 3. Suppose relation Student has 20 tuples. What is the minimum and maximum number of tuples in the result of this expression: 118 | 119 | $$\rho_{s1(i1, n1, g, h)}Student \Join \rho_{s2(i2, n2, g, h)}Student$$ 120 | 121 | A) minimum = 0, maximum = 400 122 | 123 | B) minimum = 20, maximum = 20 124 | 125 | C) minimum = 20, maximum = 400 126 | 127 | D) minimum = 40, maximum = 40 128 | 129 | --- 130 | 131 | **ANSWER**: C 132 | 133 | **EXPLANATION**: If every student has a unique GPA-HS combination, then students join only with themselves, and there are 20 tuples in the result (minimum). If every student has the same GPA and HS as every other student, then all pairs join and the result has $20*20=400$ tuples. 134 | 135 | --- 136 | 137 | 4. Suppose relations College, Student, and Apply have 5, 20, and 50 tuples in them respectively. Remember that cName is a key for College. Do not assume sName is a key for Student. Do assume that college names in Apply also appear in College. What is the minimum and maximum number of tuples in the result of this expression: 138 | 139 | $$\pi_{cName}College \cup \rho_{cName}(\pi_{sName}Student) \cup \pi_{cName}Apply$$ 140 | 141 | A) minimum = 5, maximum = 25 142 | 143 | B) minimum = 5, maximum = 75 144 | 145 | C) minimum = 25, maximum = 45 146 | 147 | D) minimum = 75, maximum = 75 148 | 149 | --- 150 | 151 | **ANSWER**: A 152 | 153 | **EXPLANATION**: Recall that duplicates are eliminated automatically in relational algebra. If all students have names that are also college names, then there are only 5 names altogether (minimum). If every student has a unique name and none of them are college names, then there are 5+20=25 names altogether (maximum). Since all college names in Apply are also in College, the third term of the expression cannot add any new names. 
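
The minimum and maximum in question 4 can be reproduced with Python sets, which eliminate duplicates the same way relational algebra does. This is a minimal sketch under assumptions: the generated names (`College0`, `Student0`, ...) and the helper `union_of_names` are hypothetical, chosen only to match the stated sizes of 5 colleges, 20 students, and 50 applications.

```python
def union_of_names(college_names, student_names, apply_college_names):
    """Union of three single-attribute relations, with duplicates eliminated."""
    return set(college_names) | set(student_names) | set(apply_college_names)

# Hypothetical data: 5 colleges and 50 applications to those same colleges.
college_names = [f"College{i}" for i in range(5)]
apply_names = [college_names[i % 5] for i in range(50)]

# Minimum case: every one of the 20 student names is also a college name.
min_students = [college_names[i % 5] for i in range(20)]

# Maximum case: 20 distinct student names, none of them a college name.
max_students = [f"Student{i}" for i in range(20)]

min_size = len(union_of_names(college_names, min_students, apply_names))
max_size = len(union_of_names(college_names, max_students, apply_names))

print(min_size, max_size)  # 5 25
```

The 50 application tuples never change the count, because every college name in Apply already appears in College.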
154 | -------------------------------------------------------------------------------- /04_relational-algebra/relational-algebra-questions.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/eddowh/coursera_intro-to-databases/d13161f219ab2ae95532513c95523d982fddd549/04_relational-algebra/relational-algebra-questions.pdf -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Table of Contents 2 | 3 | - Introduction ([notes](00_introduction/notes.md)) 4 | - Relational Databases ([summary](01_relational-databases/summary.md)) 5 | - The Relational Model ([annotated slides](01_relational-databases/relational-model-annotated-slides.pdf)) 6 | - Querying Relational Databases ([annotated slides](01_relational-databases/relational-querying-annotated-slides.pdf)) 7 | - XML Data 8 | - Well-formed XML ([notes](02_xml-data/well-formed-xml.md) | [annotated slides](02_xml-data/well-formed-xml-annotated-slides.pdf)) 9 | - DTDs, IDs, and IDREFS ([notes](02_xml-data/dtds.md) | [annotated slides](02_xml-data/dtds-annotated-slides.pdf)) 10 | - XML Schema ([notes](02_xml-data/xml-schema.md) | [annotated slides](02_xml-data/xml-schema-annotated-slides.pdf)) 11 | - XML Quiz ([problems](02_xml-data/xml-quiz.md) | [solutions](02_xml-data/xml-quiz-solutions.md)) 12 | - JSON Data 13 | - JSON Introduction ([notes](03_json-data/json-intro.md) | [annotated slides](03_json-data/json-intro-annotated-slides.pdf)) 14 | - JSON Demo ([notes](03_json-data/json-demo.md) | [annotated slides](03_json-data/json-demo-annotated-slides.pdf)) 15 | - JSON Quiz ([problems](03_json-data/json-quiz.md) | [solutions](03_json-data/json-quiz-solutions.md)) 16 | - Relational Algebra 17 | - Relational Algebra Part 1: Select, Project, Join ([annotated slides](./04_relational-algebra/relational-algebra-1-annotated-slides.pdf)) 18 | 
- Relational Algebra Part 2: Set Operators, Renaming, Notation ([annotated slides](./04_relational-algebra/relational-algebra-2-annotated-slides.pdf)) 19 | - Relational Algebra Lecture Questions ([Raw MD](./04_relational-algebra/relational-algebra-questions.md) | [PDF](./04_relational-algebra/relational-algebra-questions.pdf)) 20 | - SQL 21 | - Relational Design Theory 22 | - Querying XML 23 | - Unified Modeling Language 24 | - Indexes 25 | - Transactions 26 | - Constraints and Triggers 27 | - Views 28 | - Authorization 29 | - Recursion 30 | - On-Line Analytical Processing 31 | - NoSQL Systems 32 | - Exams 33 | 34 | # About the Course 35 | 36 | > "Introduction to Databases" was one of Stanford's three inaugural massive open online courses in the fall of 2011; it was offered again in MOOC format in 2013 and 2014. Materials from the MOOC offerings have been available for self-study on Coursera as well as on other platforms. Starting in summer 2014, the materials are now being offered on the OpenEdX platform as a set of smaller self-paced "mini-courses", which can be assembled in a variety of ways to learn about different aspects of databases. All of the mini-courses are based around video lectures and/or video demos. Many of them include in-video quizzes to check understanding, in-depth standalone quizzes, and/or a variety of automatically-checked interactive programming exercises. Each mini-course also includes a discussion forum and pointers to readings and resources. Taught by Professor Jennifer Widom, the overall curriculum draws from Stanford's popular Databases course. To explore and enroll in the new Databases mini-courses, please visit . 37 | --------------------------------------------------------------------------------