├── README.md ├── courses.csv ├── sql-workshop.Rmd ├── sql_workshop.pdf ├── sql_workshop_022723.pdf ├── sql_workshop_030122.pdf ├── studentAssessment.csv ├── studentInfo.csv └── studentRegistration.csv /README.md: -------------------------------------------------------------------------------- 1 | # Columbia SQL Workshop 2 | 3 | [Slide Deck](https://github.com/learning-analytics-curriculum-teaching/sql-workshop/blob/master/sql_workshop_030122.pdf) 4 | 5 | ## Create MySQL Instance on Amazon Web Services 6 | 7 | * Log into your [AWS Management Console](https://console.aws.amazon.com) 8 | * Locate `RDS` under the `Databases` heading 9 | * Within Amazon RDS click `Create database` 10 | * Under `Choose a database creation method` click `Standard Create` 11 | * Under `Engine options` choose `MySQL` 12 | * Under `Templates` choose `Free tier` 13 | * Under `Settings` name your `DB instance identifier` as `sqltest` 14 | * Under `Credential settings` create a username and password combination and write it down (you will need it later) 15 | * Under `Connectivity` expand `Additional connectivity configuration` to show additional menu items and make sure that `Publicly accessible` is checked `Yes` 16 | * Expand the `Additional configuration` menu 17 | * Under `Initial database name` write `testdb` 18 | * Uncheck `Automatic backups` 19 | * Click `Create database` 20 | 21 | ## Modify Security Group 22 | 23 | * Under `Security Groups` click `Inbound` and then `Edit` 24 | * Add the rule `SQL/Aurora` on `Port 3306` with the `Connection` of `MyIP` 25 | -------------------------------------------------------------------------------- /courses.csv: -------------------------------------------------------------------------------- 1 | "code_module","code_presentation","module_presentation_length" 2 | "AAA","2013J","268" 3 | "AAA","2014J","269" 4 | "BBB","2013J","268" 5 | "BBB","2014J","262" 6 | "BBB","2013B","240" 7 | "BBB","2014B","234" 8 | "CCC","2014J","269" 9 | "CCC","2014B","241" 10 | "DDD","2013J","261" 11 | "DDD","2014J","262" 12 | "DDD","2013B","240" 13 | "DDD","2014B","241" 14 | "EEE","2013J","268" 15 | "EEE","2014J","269" 16 | "EEE","2014B","241" 17 | "FFF","2013J","268" 18 | "FFF","2014J","269" 19 | "FFF","2013B","240" 20 | "FFF","2014B","241" 21 | "GGG","2013J","261" 22 | "GGG","2014J","269" 23 | "GGG","2014B","241" 24 | -------------------------------------------------------------------------------- /sql-workshop.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "sql-workshop" 3 | author: "Charles Lang" 4 | date: "10/16/2019" 5 | output: html_document 6 | --- 7 | 8 | ## Connect to AMW MySQL Database 9 | ```{r} 10 | install.packages("DBI") 11 | install.packages("RMySQL") 12 | 13 | library(DBI) 14 | library(RMySQL) 15 | 16 | db_user <- 'admin' 17 | db_password <- 'sql022723' 18 | db_name <- 'testdb' 19 | db_host <- 'sqltest.clh43re5lq5h.us-east-1.rds.amazonaws.com' 20 | db_port <- 3306 21 | 22 | mydb <- dbConnect(MySQL(), user = db_user, password = db_password, dbname = db_name, host = db_host, port = db_port) 23 | 24 | summary(mydb) 25 | ``` 26 | 27 | ## Load OU Data 28 | ```{r} 29 | #Student demographic data 30 | studentInfo <- read.csv("studentInfo.csv", header = TRUE) 31 | #Student assessment data 32 | studentAssessment <- read.csv("studentAssessment.csv", header = TRUE) 33 | #Course data 34 | courses <- read.csv("courses.csv", header = TRUE) 35 | studentRegistration <- read.csv("studentRegistration.csv", header = TRUE) 36 | ``` 37 | 38 | ## Write data to the DB using the DBI package 39 | ```{r} 40 | #List the tables in the DB 41 | dbListTables(mydb) 42 | 43 | #Write a new table to the DB 44 | dbWriteTable(mydb, "studentInfo", studentInfo) 45 | dbWriteTable(mydb, "studentAssessment", studentAssessment) 46 | dbWriteTable(mydb, "courses", courses) 47 | dbWriteTable(mydb, "studentRegistration", studentRegistration) 48 | 49 | #List tables to see that table was added 50 | dbListTables(mydb) 51 | 52 | #Read a particular table 53 | dbReadTable(mydb, 'studentInfo') 54 | 55 | #EXERCISE 1 56 | #Make two toy data sets with at least three variables and at least 30 rows each in them. Have a mix of numeric and character variables. Transfer these dataframes to your SQL database using the DBI commands. Name the tables whatever you like. 57 | 58 | ``` 59 | 60 | ## Getting into SQL - READING 61 | ```{r} 62 | #Query a portion of the database (always returns dataframe) 63 | dbGetQuery(mydb, "SELECT * FROM studentInfo LIMIT 10;") 64 | 65 | dbGetQuery(mydb, "SELECT * FROM studentInfo ORDER BY id_student LIMIT 10;") 66 | 67 | dbGetQuery(mydb, "SELECT id_student, gender FROM studentInfo ORDER BY id_student DESC LIMIT 10;") #Order listed will be reflected in order in table 68 | 69 | dbGetQuery(mydb, "SELECT id_student AS 'Student ID', gender FROM studentInfo LIMIT 10;") #SQL Standard single quotes for literal strings and double quotes for everything else but that conflicts with R 70 | 71 | #Count the number of rows 72 | dbGetQuery(mydb, "SELECT COUNT(*) FROM studentAssessment;") 73 | 74 | #Using a WHERE statement on all columns 75 | dbGetQuery(mydb, "SELECT COUNT(*) FROM studentAssessment WHERE score > 50;") 76 | 77 | #Using a WHERE statement on a single column (will not include missing data) 78 | dbGetQuery(mydb, "SELECT COUNT(score) FROM studentAssessment WHERE score > 50;") 79 | 80 | #Using an AND statement 81 | dbGetQuery(mydb, "SELECT COUNT(*) FROM studentAssessment WHERE score > 50 AND id_assessment = '1752';") 82 | 83 | #EXERCISE 2 84 | #Read one of your toy data tables, make sure the output is ordered in descending order, you rename one of the variables and the output is limited to the first 20 rows. 85 | 86 | #Read the other table according to a condition of one of the variables. 87 | 88 | ``` 89 | 90 | ## Getting into SQL - UPDATING 91 | ```{r} 92 | #Count rows 93 | dbGetQuery(mydb, "SELECT COUNT(*) FROM studentAssessment;") 94 | 95 | #Add a row 96 | dbGetQuery(mydb, "INSERT INTO studentAssessment (id_assessment, id_student, date_submitted, is_banked, score) VALUES ('00001', '1', '20', '0', '50');") 97 | 98 | #Count rows again 99 | dbGetQuery(mydb, "SELECT COUNT(*) FROM studentAssessment;") 100 | 101 | #View inserted row 102 | dbGetQuery(mydb, "SELECT * FROM studentAssessment ORDER BY id_student LIMIT 10;") 103 | 104 | #Add a row with missing values 105 | dbGetQuery(mydb, "INSERT INTO studentAssessment (id_assessment, id_student, date_submitted) VALUES ('00001', '1', '20');") 106 | 107 | #View inserted row 108 | dbGetQuery(mydb, "SELECT * FROM studentAssessment ORDER BY id_student LIMIT 10;") 109 | 110 | #Update a row 111 | dbGetQuery(mydb, "UPDATE studentAssessment SET score = '20' WHERE id_student = 1;") 112 | 113 | dbGetQuery(mydb, "SELECT id_student, score FROM studentAssessment ORDER BY id_student LIMIT 10;") 114 | 115 | #Update a row with NULL 116 | dbGetQuery(mydb, "UPDATE studentAssessment SET score = 'NULL' WHERE id_student = 6516;") 117 | 118 | #Delete a row (destructive) 119 | dbGetQuery(mydb, "DELETE FROM studentAssessment WHERE id_student = 1;") 120 | 121 | dbGetQuery(mydb, "SELECT * FROM studentAssessment ORDER BY id_student LIMIT 10;") 122 | 123 | #EXERCISE 3 124 | #Insert a new row in one of your toy data tables leaving one variable empty. Change one value in your other table. Display your new tables. Delete the row you edited and the row you inserted. 125 | 126 | 127 | ``` 128 | 129 | ## Add/Deleting Table 130 | ```{r} 131 | #Creating a new table in SQL 132 | dbGetQuery(mydb,"CREATE TABLE test ( 133 | score INTEGER, 134 | student TEXT 135 | );") 136 | 137 | dbListTables(mydb) 138 | 139 | #Inserting data into the table 140 | dbGetQuery(mydb, "INSERT INTO test VALUES ( 10, 'Amy' );") 141 | dbGetQuery(mydb, "INSERT INTO test VALUES ( 11, 'Jen' );") 142 | dbGetQuery(mydb, "INSERT INTO test VALUES ( 9, 'Frank' );") 143 | 144 | dbGetQuery(mydb, "SELECT * FROM test;") 145 | 146 | #Inserting a NULL row 147 | #dbGetQuery(mydb, "INSERT INTO test DEFAULT VALUES;") 148 | 149 | dbGetQuery(mydb,"INSERT INTO test (score, student) SELECT score, id_student FROM studentAssessment;") 150 | 151 | dbGetQuery(mydb, "SELECT * FROM test;") 152 | 153 | # NULL Value 154 | 155 | #NULL is a state (similar to R), represents the lack of a value. But is not compatible with R backend so this code doesn't work as part of dbGetQuery() 156 | 157 | #This doesn't work because NULL is not a value (it cannot be "equal" to anything as it is ignored) 158 | SELECT * FROM test WHERE score = NULL; 159 | 160 | #Instead use "is" 161 | dbGetQuery(mydb, "SELECT * FROM test WHERE score is NULL;") 162 | 163 | #Delete a table 164 | dbGetQuery(mydb, "DROP TABLE test;") 165 | 166 | dbGetQuery(mydb, "SELECT * FROM test;") #This should produce an error 167 | 168 | #Delete a table if it exists 169 | dbGetQuery(mydb, "DROP TABLE IF EXISTS test;") 170 | 171 | #EXERCISE 4 172 | #Create a table that is exactly the same as your first toy data table but this time use SQL commands. Display your new table. Then delete the original table. 173 | 174 | ``` 175 | 176 | # Constraints 177 | ```{r} 178 | #Create table where student column *cannot* be NULL 179 | dbGetQuery(mydb,"CREATE TABLE test2 ( 180 | score INTEGER, 181 | student TEXT NOT NULL 182 | );") 183 | 184 | dbGetQuery(mydb, "DROP TABLE IF EXISTS test2;") 185 | 186 | #Create table where score column defaults to zero 187 | 188 | dbGetQuery(mydb,"CREATE TABLE test2 ( 189 | score INTEGER DEFAULT 0, 190 | student TEXT 191 | );") 192 | 193 | dbGetQuery(mydb,"INSERT INTO test2 (score, student) VALUES ('1', 'A');") 194 | dbGetQuery(mydb,"INSERT INTO test2 (student) VALUES ('B');") 195 | 196 | dbGetQuery(mydb, "SELECT * FROM test2;") 197 | 198 | dbGetQuery(mydb, "DROP TABLE IF EXISTS test2;") 199 | 200 | 201 | # Create a table where column must have unique value 202 | dbGetQuery(mydb,"CREATE TABLE test2 ( 203 | score INTEGER UNIQUE, 204 | student TEXT 205 | );") 206 | 207 | dbGetQuery(mydb,"INSERT INTO test2 (score, student) VALUES ('1', 'A');") 208 | 209 | #Error because of unique 210 | dbGetQuery(mydb,"INSERT INTO test2 (score, student) VALUES ('1', 'A');") 211 | 212 | #NULL is exempt 213 | dbGetQuery(mydb,"INSERT INTO test2 (score, student) VALUES (NULL, 'A');") 214 | dbGetQuery(mydb,"INSERT INTO test2 (score, student) VALUES (NULL, 'A');") 215 | 216 | #EXERCISE 5 217 | #Recreate one of your toy data tables with the constraint that for one of the integer variables the default value will be zero. Test your table by inserting some empty values. Display your new tables. Then delete your table. 218 | 219 | ``` 220 | 221 | 222 | # Adding a column with a default value 223 | ```{r} 224 | #Add a column with default value 1 225 | dbGetQuery(mydb, "ALTER TABLE studentAssessment ADD email INTEGER DEFAULT 1 ") 226 | 227 | dbGetQuery(mydb, "SELECT * FROM studentAssessment LIMIT 10;") 228 | 229 | #Delete a column 230 | dbGetQuery(mydb, "ALTER TABLE studentAssessment DROP COLUMN email;") 231 | 232 | #EXERCISE 6 233 | #Add a column to one of your toy data tables with a default value of 3. Display your new table. Delete this column. 234 | ``` 235 | 236 | 237 | # ID Columns 238 | ```{r} 239 | dbGetQuery(mydb,"CREATE TABLE test3 ( 240 | id INTEGER AUTO_INCREMENT PRIMARY KEY, #Not standard syntax 241 | score INTEGER, 242 | student TEXT 243 | );") 244 | 245 | dbGetQuery(mydb,"INSERT INTO test3 (score, student) VALUES (1, 'A');") 246 | dbGetQuery(mydb,"INSERT INTO test3 (score, student) VALUES (5, 'B');") 247 | 248 | dbGetQuery(mydb, "SELECT * FROM test3;") 249 | 250 | dbGetQuery(mydb, "DROP TABLE IF EXISTS test3;") 251 | 252 | #EXERCISE 7 253 | #Create a new table with four variables and a primary key that is a sequential id value. 254 | 255 | ``` 256 | 257 | ## Filtering (WHERE) 258 | ```{r} 259 | dbGetQuery(mydb, "SELECT id_student, date_submitted FROM studentAssessment WHERE date_submitted > 550 ORDER BY date_submitted DESC;") 260 | 261 | #OR Statement 262 | dbGetQuery(mydb, "SELECT id_student, date_submitted FROM studentAssessment WHERE date_submitted > 550 OR date_submitted < 2 ORDER BY date_submitted DESC;") 263 | 264 | #AND Statement 265 | dbGetQuery(mydb, "SELECT id_student, date_submitted FROM studentAssessment WHERE date_submitted > 550 AND id_student = 325750 ORDER BY date_submitted DESC;") 266 | 267 | #LIKE - Character string matches a pattern 268 | # %aaa% - Finds any values that have "aaa" in any position 269 | # aaa% - Finds any values that start with "aaa" 270 | # _aaa% - Finds any values that have aaa in the second, third, fourth positions 271 | 272 | dbGetQuery(mydb, "SELECT id_student, gender, region FROM studentInfo WHERE region LIKE '%Region%';") 273 | 274 | #Begin with 'Region' 275 | dbGetQuery(mydb, "SELECT id_student, gender, region FROM studentInfo WHERE region LIKE 'Region%';") 276 | 277 | #End with 'Region' 278 | dbGetQuery(mydb, "SELECT id_student, gender, region FROM studentInfo WHERE region LIKE '%Region';") 279 | 280 | #'c' is the second letter 281 | dbGetQuery(mydb, "SELECT id_student, gender, region FROM studentInfo WHERE region LIKE '_c%';") 282 | 283 | #IN 284 | dbGetQuery(mydb, "SELECT id_student, gender, region FROM studentInfo WHERE region IN ('Wales','Ireland');") 285 | 286 | #EXERCISE 8 287 | #Query one of your original toy data tables, for two different conditions. 288 | 289 | ``` 290 | 291 | ## Removing Duplicates 292 | ```{r} 293 | dbGetQuery(mydb, "SELECT DISTINCT region FROM studentInfo;") 294 | 295 | dbGetQuery(mydb, "SELECT DISTINCT region, gender FROM studentInfo;") 296 | 297 | #EXERCISE 9 298 | #Insert a duplicate row into one of your toy data tables. Then query the table without including duplicates. 299 | 300 | ``` 301 | 302 | ## Conditional Expressions (non-standard) 303 | ```{r} 304 | dbGetQuery(mydb, "CREATE TABLE booltest (a INTEGER, b INTEGER);") 305 | dbGetQuery(mydb, "INSERT INTO booltest VALUES (1, 0);") 306 | dbGetQuery(mydb, "SELECT * FROM booltest;") 307 | 308 | dbGetQuery(mydb,"SELECT 309 | CASE WHEN a THEN 'true' ELSE 'false' END as boolA, 310 | CASE WHEN b THEN 'true' ELSE 'false' END as boolB 311 | FROM booltest") 312 | 313 | dbGetQuery(mydb,"SELECT 314 | CASE a WHEN 1 THEN 'true' ELSE 'false' END as boolA, 315 | CASE b WHEN 1 THEN 'true' ELSE 'false' END as boolB 316 | FROM booltest") 317 | ``` 318 | 319 | #Relationships (JOIN) - *Slide* 320 | ```{r} 321 | 322 | #This will take a while because of the architecture we are using 323 | #dbGetQuery(mydb,"SELECT id_student, gender, region, date_registration 324 | #FROM studentInfo 325 | #JOIN studentRegistration 326 | #ON studentInfo.id_student = studentRegistration.id_student 327 | #;") 328 | 329 | dbGetQuery(mydb, "CREATE TABLE left_table (id INTEGER, description TEXT);") 330 | dbGetQuery(mydb, "CREATE TABLE right_table (id INTEGER, description TEXT);") 331 | 332 | dbGetQuery(mydb, "INSERT INTO left_table VALUES ( 1, 'left 01');") 333 | dbGetQuery(mydb, "INSERT INTO left_table VALUES ( 2, 'left 02');") 334 | dbGetQuery(mydb, "INSERT INTO left_table VALUES ( 3, 'left 03');") 335 | dbGetQuery(mydb, "INSERT INTO left_table VALUES ( 4, 'left 04');") 336 | dbGetQuery(mydb, "INSERT INTO left_table VALUES ( 5, 'left 05');") 337 | dbGetQuery(mydb, "INSERT INTO left_table VALUES ( 6, 'left 06');") 338 | dbGetQuery(mydb, "INSERT INTO left_table VALUES ( 7, 'left 07');") 339 | dbGetQuery(mydb, "INSERT INTO left_table VALUES ( 8, 'left 08');") 340 | dbGetQuery(mydb, "INSERT INTO left_table VALUES ( 9, 'left 09');") 341 | 342 | dbGetQuery(mydb, "INSERT INTO right_table VALUES ( 6, 'left 06');") 343 | dbGetQuery(mydb, "INSERT INTO right_table VALUES ( 7, 'left 07');") 344 | dbGetQuery(mydb, "INSERT INTO right_table VALUES ( 8, 'left 08');") 345 | dbGetQuery(mydb, "INSERT INTO right_table VALUES ( 9, 'left 09');") 346 | dbGetQuery(mydb, "INSERT INTO right_table VALUES ( 10, 'left 10');") 347 | dbGetQuery(mydb, "INSERT INTO right_table VALUES ( 11, 'left 11');") 348 | dbGetQuery(mydb, "INSERT INTO right_table VALUES ( 12, 'left 12');") 349 | dbGetQuery(mydb, "INSERT INTO right_table VALUES ( 13, 'left 13');") 350 | dbGetQuery(mydb, "INSERT INTO right_table VALUES ( 14, 'left 14');") 351 | 352 | dbGetQuery(mydb, "SELECT * FROM left_table;") 353 | dbGetQuery(mydb, "SELECT * FROM right_table;") 354 | 355 | #Everything that matches between tables 356 | dbGetQuery(mydb,"SELECT l.description AS left_table, r.description AS right_table 357 | FROM left_table AS l 358 | JOIN right_table AS r ON l.id = r.id") 359 | 360 | #Evereything from right, matches from left 361 | dbGetQuery(mydb,"SELECT l.description AS left_table, r.description AS right_table 362 | FROM left_table AS l 363 | RIGHT JOIN right_table AS r ON l.id = r.id") 364 | 365 | #Evereything from left, matches from right 366 | dbGetQuery(mydb,"SELECT l.description AS left_table, r.description AS right_table 367 | FROM left_table AS l 368 | LEFT JOIN right_table AS r ON l.id = r.id") 369 | 370 | #Union 371 | dbGetQuery(mydb, "SELECT * FROM left_table 372 | UNION 373 | SELECT * FROM right_table;") 374 | 375 | 376 | #EXERCISE 10 377 | # Create an a common id variable in your two toy data tables. Then do join those tables so that your query returns all the values from one table and only those that match from the other. 378 | 379 | ``` 380 | # ChatGPT Version 381 | ```{r} 382 | #Write the SQL code to create two tables called left and right with seven rows of data in two columns, id which is an integer and description which is text. They should have some overlapping values in them. Then right the code to join all the matching rows between the two tables 383 | 384 | #EXERCISE 11 385 | # Ask CHatGPT to write some SQL code and then try to run it on your DB 386 | ``` 387 | 388 | 389 | 390 | # Disconnect your database 391 | ```{r} 392 | dbDisconnect(mydb) 393 | ``` 394 | 395 | -------------------------------------------------------------------------------- /sql_workshop.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/learning-analytics-curriculum-teaching/sql-workshop/37bb83362db73ac054f18e0b4b3b5c56e2aa8fcc/sql_workshop.pdf -------------------------------------------------------------------------------- /sql_workshop_022723.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/learning-analytics-curriculum-teaching/sql-workshop/37bb83362db73ac054f18e0b4b3b5c56e2aa8fcc/sql_workshop_022723.pdf -------------------------------------------------------------------------------- /sql_workshop_030122.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/learning-analytics-curriculum-teaching/sql-workshop/37bb83362db73ac054f18e0b4b3b5c56e2aa8fcc/sql_workshop_030122.pdf --------------------------------------------------------------------------------