├── README.md ├── discussion ├── disc1.pdf ├── disc10.pdf ├── disc10_sol.pdf ├── disc11_sol.pdf ├── disc12_soln.pdf ├── disc13.pdf ├── disc13_sol.pdf ├── disc1_sol.pdf ├── disc2.pdf ├── disc2_sol.pdf ├── disc3.pdf ├── disc3_soln.pdf ├── disc4.pdf ├── disc4_sol.pdf ├── disc5.pdf ├── disc5_sol.pdf ├── disc6.pdf ├── disc6_sol.pdf ├── disc7.pdf ├── disc7_sol.pdf ├── disc8.pdf ├── disc8_sol.pdf ├── disc9.pdf └── disc9_sol.pdf ├── exam-prep ├── B+Trees&Buffer Mng&Relational Algebra&Sorting:Hashing │ ├── Exam Prep 2 Slides - Fall 2020.pdf │ ├── Exam Prep 2 Solutions - Fall 2020.pdf │ └── Exam Prep 2 Worksheet - Fall 2020.pdf ├── Iterators&Joins &Query Optimization │ ├── Exam Prep 3 Slides - Fall 2020.pdf │ ├── Exam Prep 3 Solutions - Fall 2020.pdf │ └── Exam Prep 3 Worksheet - Fall 2020.pdf ├── Parallel Query Processing, DB Design │ ├── Exam Prep 5 Slides - Fall 2020.pdf │ ├── Exam Prep 5 Solutions - Fall 2020.pdf │ └── Exam Prep 5 Worksheet - Fall 2020.pdf ├── SQL&Files │ ├── Exam Prep 1 Slides - Fall 2020.pdf │ ├── Exam Prep 1 Solutions - Fall 2020.pdf │ └── Exam Prep 1 Worksheet - Fall 2020.pdf └── Transactions & Concurrency, Recovery │ ├── Exam Prep 4 Slides - Fall 2020.pdf │ ├── Exam Prep 4 Solutions - Fall 2020.pdf │ └── Exam Prep 4 Worksheet - Fall 2020.pdf ├── notes ├── n00-SQLPart1.pdf ├── n01-SQLPart2.pdf ├── n02-DisksFiles.pdf ├── n03-B+Trees.pdf ├── n04-BufferMgmt.pdf ├── n05-RelAlg.pdf ├── n06-Sorting.pdf ├── n07-Hashing.pdf ├── n08-Joins.pdf ├── n09-QueryOpt.pdf ├── n10-XactConc-I.pdf ├── n11-XactConc-II.pdf ├── n12-Recovery.pdf ├── n13-DBDesign.pdf ├── n14-PQProcessing.pdf ├── n15-DistXact.pdf ├── n16-NoSQL.pdf └── n17-MRSpark.pdf ├── ppt ├── 02 SQL 1 Final.pptx ├── 03 SQL II Final.pptx ├── 04 Files and Buffers.pptx ├── 04.5 Files,Pages, Records.pptx ├── 05 Files Heap Files vs Sorted Files.pptx ├── 06 Trees and Indexes FINAL animated.pptx ├── 07 Tree-Indexes-final-jmh.pptx ├── 08 Buffer Management Final.pptx ├── 09 Sort Hash -JMH FINAL.pptx ├── 10 
relational algebra - final - jmh.pptx ├── 11 Joins final JMH.pptx ├── 12 Parallel Queries final JMH.pptx ├── 13 Query Plan Space JMH Final (1).pptx ├── 14 Query Optimization JMH FINAL.pptx ├── 15 Text Search JMH FINAL.pptx ├── 16 Relational Modeling FINAL JMH.pptx ├── 17 FDs and Normalization JMH FINAL.pptx ├── 18 Transactions 1 JMH FINAL.pptx ├── 19 Transactions 2 JMH FINAL.pptx ├── 20 Recovery FINAL JMH.pptx ├── 21 Ranking and Crawling.pptx ├── 22 Distributed Transactions JMH FINAL.pptx ├── 23 Big Data and Data Wrangling.pptx ├── 24 Replication and NoSQL.pptx └── 25 Closing Comments.pptx └── project-handout ├── hw0 └── README.md ├── proj0 ├── README.md ├── getting-started.md ├── submitting.md └── your-tasks.md ├── proj1.md ├── proj1 ├── README.md ├── getting-started.md ├── submitting.md ├── testing.md └── your-tasks.md ├── proj2.md ├── proj2 ├── README.md ├── getting-started.md ├── submission.md ├── testing.md └── your-tasks.md ├── proj3.md ├── proj3 ├── README.md ├── getting-started.md ├── part-1-join-algorithms │ ├── README.md │ ├── task-1-debugging.md │ └── task-2-common-errors.md ├── part-2-query-optimization.md ├── skeleton-code.md ├── submitting-the-assignment.md └── testing.md ├── proj4.md ├── proj4 ├── README.md ├── getting-started.md ├── part-1-lockmanager.md ├── part-2-lockcontext-and-lockutil.md ├── skeleton-code.md ├── submitting-the-assignment.md └── testing.md ├── proj5.md ├── proj5 ├── README.md ├── getting-started.md ├── submitting-the-assignment.md ├── testing.md └── your-tasks.md ├── proj6.md └── proj6 ├── README.md ├── getting-started.md ├── submitting-the-assignment.md ├── testing.md └── your-tasks.md /README.md: -------------------------------------------------------------------------------- 1 | # CS186 : Introduction to Database Systems 2 | 3 | This repo contains all the learning materials for Berkeley's Database course CS186. 
4 | 5 | ## Course Resources 6 | 7 | ### General 8 | 9 | [Course website](https://cs186berkeley.net/fa20/): I used the 2021 spring version. 10 | 11 | [Recorded videos](https://www.youtube.com/user/CS186Berkeley/playlists): I watched the recorded videos on YouTube; you can also find the same videos on bilibili. 12 | 13 | Vitamin: Vitamins are short, weekly assignments to keep you on schedule and check your understanding of the basics from lecture. These assignments are not open to the public, but you can find them in [edX's archived CS186W](https://learning.edge.edx.org/course/course-v1:BerkeleyX+CS186+2018_SP/). You can also download the slides and watch the course videos on edX. 14 | 15 | [PPT](./ppt): I downloaded them from edX. 16 | 17 | [Notes](./notes): These notes explain and summarize the course contents well. I read the 2021 spring version. 18 | 19 | [Discussion](./discussion): Enhance your understanding of the course contents. 20 | 21 | [Exam-preparation](./exam-prep): Review the course contents and practice on some exam-like problems. 22 | 23 | ### Projects 24 | 25 | There are six projects in total. There is a [gitbook](https://cs186.gitbook.io/project/) for CS186 projects, but it may be updated each semester. I cloned the 2021 spring version; you can find the project handouts that I used [here](./project-handout). 26 | 27 | - Project 1: SQL 28 | - You will learn to write SQL queries in this project. 29 | - My implementation is in this [repo](https://github.com/PKUFlyingPig/CS186-proj1). 30 | - Project 2 - 5: Implement a simple relational database, rookiedb 31 | - My implementation is in this [repo](https://github.com/PKUFlyingPig/CS186-Rookiedb). 32 | - project 2: B+ tree 33 | - project 3: joins and query optimization 34 | - project 4: concurrency 35 | - project 5: recovery 36 | - Project 6: NoSQL 37 | - You will learn to write MongoDB queries in this project. 
38 | - My implementation is in this [repo](https://github.com/PKUFlyingPig/CS186-proj6). 39 | 40 | ## Want to learn more ? 41 | 42 | Check out [this repository](https://github.com/PKUFlyingPig/Self-learning-Computer-Science) which contains all my self-learning materials : ) 43 | 44 | 45 | 46 | 47 | 48 | -------------------------------------------------------------------------------- /discussion/disc1.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/PKUFlyingPig/CS186/5a021e79e9ffbb4f4aaef2b0c21b953ce2d84920/discussion/disc1.pdf -------------------------------------------------------------------------------- /discussion/disc10.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/PKUFlyingPig/CS186/5a021e79e9ffbb4f4aaef2b0c21b953ce2d84920/discussion/disc10.pdf -------------------------------------------------------------------------------- /discussion/disc10_sol.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/PKUFlyingPig/CS186/5a021e79e9ffbb4f4aaef2b0c21b953ce2d84920/discussion/disc10_sol.pdf -------------------------------------------------------------------------------- /discussion/disc11_sol.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/PKUFlyingPig/CS186/5a021e79e9ffbb4f4aaef2b0c21b953ce2d84920/discussion/disc11_sol.pdf -------------------------------------------------------------------------------- /discussion/disc12_soln.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/PKUFlyingPig/CS186/5a021e79e9ffbb4f4aaef2b0c21b953ce2d84920/discussion/disc12_soln.pdf -------------------------------------------------------------------------------- /discussion/disc13.pdf: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/PKUFlyingPig/CS186/5a021e79e9ffbb4f4aaef2b0c21b953ce2d84920/discussion/disc13.pdf -------------------------------------------------------------------------------- /discussion/disc13_sol.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/PKUFlyingPig/CS186/5a021e79e9ffbb4f4aaef2b0c21b953ce2d84920/discussion/disc13_sol.pdf -------------------------------------------------------------------------------- /discussion/disc1_sol.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/PKUFlyingPig/CS186/5a021e79e9ffbb4f4aaef2b0c21b953ce2d84920/discussion/disc1_sol.pdf -------------------------------------------------------------------------------- /discussion/disc2.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/PKUFlyingPig/CS186/5a021e79e9ffbb4f4aaef2b0c21b953ce2d84920/discussion/disc2.pdf -------------------------------------------------------------------------------- /discussion/disc2_sol.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/PKUFlyingPig/CS186/5a021e79e9ffbb4f4aaef2b0c21b953ce2d84920/discussion/disc2_sol.pdf -------------------------------------------------------------------------------- /discussion/disc3.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/PKUFlyingPig/CS186/5a021e79e9ffbb4f4aaef2b0c21b953ce2d84920/discussion/disc3.pdf -------------------------------------------------------------------------------- /discussion/disc3_soln.pdf: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/PKUFlyingPig/CS186/5a021e79e9ffbb4f4aaef2b0c21b953ce2d84920/discussion/disc3_soln.pdf -------------------------------------------------------------------------------- /discussion/disc4.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/PKUFlyingPig/CS186/5a021e79e9ffbb4f4aaef2b0c21b953ce2d84920/discussion/disc4.pdf -------------------------------------------------------------------------------- /discussion/disc4_sol.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/PKUFlyingPig/CS186/5a021e79e9ffbb4f4aaef2b0c21b953ce2d84920/discussion/disc4_sol.pdf -------------------------------------------------------------------------------- /discussion/disc5.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/PKUFlyingPig/CS186/5a021e79e9ffbb4f4aaef2b0c21b953ce2d84920/discussion/disc5.pdf -------------------------------------------------------------------------------- /discussion/disc5_sol.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/PKUFlyingPig/CS186/5a021e79e9ffbb4f4aaef2b0c21b953ce2d84920/discussion/disc5_sol.pdf -------------------------------------------------------------------------------- /discussion/disc6.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/PKUFlyingPig/CS186/5a021e79e9ffbb4f4aaef2b0c21b953ce2d84920/discussion/disc6.pdf -------------------------------------------------------------------------------- /discussion/disc6_sol.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/PKUFlyingPig/CS186/5a021e79e9ffbb4f4aaef2b0c21b953ce2d84920/discussion/disc6_sol.pdf 
-------------------------------------------------------------------------------- /discussion/disc7.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/PKUFlyingPig/CS186/5a021e79e9ffbb4f4aaef2b0c21b953ce2d84920/discussion/disc7.pdf -------------------------------------------------------------------------------- /discussion/disc7_sol.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/PKUFlyingPig/CS186/5a021e79e9ffbb4f4aaef2b0c21b953ce2d84920/discussion/disc7_sol.pdf -------------------------------------------------------------------------------- /discussion/disc8.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/PKUFlyingPig/CS186/5a021e79e9ffbb4f4aaef2b0c21b953ce2d84920/discussion/disc8.pdf -------------------------------------------------------------------------------- /discussion/disc8_sol.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/PKUFlyingPig/CS186/5a021e79e9ffbb4f4aaef2b0c21b953ce2d84920/discussion/disc8_sol.pdf -------------------------------------------------------------------------------- /discussion/disc9.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/PKUFlyingPig/CS186/5a021e79e9ffbb4f4aaef2b0c21b953ce2d84920/discussion/disc9.pdf -------------------------------------------------------------------------------- /discussion/disc9_sol.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/PKUFlyingPig/CS186/5a021e79e9ffbb4f4aaef2b0c21b953ce2d84920/discussion/disc9_sol.pdf -------------------------------------------------------------------------------- /exam-prep/B+Trees&Buffer Mng&Relational Algebra&Sorting:Hashing/Exam Prep 2 Slides - Fall 
2020.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/PKUFlyingPig/CS186/5a021e79e9ffbb4f4aaef2b0c21b953ce2d84920/exam-prep/B+Trees&Buffer Mng&Relational Algebra&Sorting:Hashing/Exam Prep 2 Slides - Fall 2020.pdf -------------------------------------------------------------------------------- /exam-prep/B+Trees&Buffer Mng&Relational Algebra&Sorting:Hashing/Exam Prep 2 Solutions - Fall 2020.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/PKUFlyingPig/CS186/5a021e79e9ffbb4f4aaef2b0c21b953ce2d84920/exam-prep/B+Trees&Buffer Mng&Relational Algebra&Sorting:Hashing/Exam Prep 2 Solutions - Fall 2020.pdf -------------------------------------------------------------------------------- /exam-prep/B+Trees&Buffer Mng&Relational Algebra&Sorting:Hashing/Exam Prep 2 Worksheet - Fall 2020.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/PKUFlyingPig/CS186/5a021e79e9ffbb4f4aaef2b0c21b953ce2d84920/exam-prep/B+Trees&Buffer Mng&Relational Algebra&Sorting:Hashing/Exam Prep 2 Worksheet - Fall 2020.pdf -------------------------------------------------------------------------------- /exam-prep/Iterators&Joins &Query Optimization/Exam Prep 3 Slides - Fall 2020.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/PKUFlyingPig/CS186/5a021e79e9ffbb4f4aaef2b0c21b953ce2d84920/exam-prep/Iterators&Joins &Query Optimization/Exam Prep 3 Slides - Fall 2020.pdf -------------------------------------------------------------------------------- /exam-prep/Iterators&Joins &Query Optimization/Exam Prep 3 Solutions - Fall 2020.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/PKUFlyingPig/CS186/5a021e79e9ffbb4f4aaef2b0c21b953ce2d84920/exam-prep/Iterators&Joins 
&Query Optimization/Exam Prep 3 Solutions - Fall 2020.pdf -------------------------------------------------------------------------------- /exam-prep/Iterators&Joins &Query Optimization/Exam Prep 3 Worksheet - Fall 2020.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/PKUFlyingPig/CS186/5a021e79e9ffbb4f4aaef2b0c21b953ce2d84920/exam-prep/Iterators&Joins &Query Optimization/Exam Prep 3 Worksheet - Fall 2020.pdf -------------------------------------------------------------------------------- /exam-prep/Parallel Query Processing, DB Design/Exam Prep 5 Slides - Fall 2020.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/PKUFlyingPig/CS186/5a021e79e9ffbb4f4aaef2b0c21b953ce2d84920/exam-prep/Parallel Query Processing, DB Design/Exam Prep 5 Slides - Fall 2020.pdf -------------------------------------------------------------------------------- /exam-prep/Parallel Query Processing, DB Design/Exam Prep 5 Solutions - Fall 2020.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/PKUFlyingPig/CS186/5a021e79e9ffbb4f4aaef2b0c21b953ce2d84920/exam-prep/Parallel Query Processing, DB Design/Exam Prep 5 Solutions - Fall 2020.pdf -------------------------------------------------------------------------------- /exam-prep/Parallel Query Processing, DB Design/Exam Prep 5 Worksheet - Fall 2020.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/PKUFlyingPig/CS186/5a021e79e9ffbb4f4aaef2b0c21b953ce2d84920/exam-prep/Parallel Query Processing, DB Design/Exam Prep 5 Worksheet - Fall 2020.pdf -------------------------------------------------------------------------------- /exam-prep/SQL&Files/Exam Prep 1 Slides - Fall 2020.pdf: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/PKUFlyingPig/CS186/5a021e79e9ffbb4f4aaef2b0c21b953ce2d84920/exam-prep/SQL&Files/Exam Prep 1 Slides - Fall 2020.pdf -------------------------------------------------------------------------------- /exam-prep/SQL&Files/Exam Prep 1 Solutions - Fall 2020.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/PKUFlyingPig/CS186/5a021e79e9ffbb4f4aaef2b0c21b953ce2d84920/exam-prep/SQL&Files/Exam Prep 1 Solutions - Fall 2020.pdf -------------------------------------------------------------------------------- /exam-prep/SQL&Files/Exam Prep 1 Worksheet - Fall 2020.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/PKUFlyingPig/CS186/5a021e79e9ffbb4f4aaef2b0c21b953ce2d84920/exam-prep/SQL&Files/Exam Prep 1 Worksheet - Fall 2020.pdf -------------------------------------------------------------------------------- /exam-prep/Transactions & Concurrency, Recovery/Exam Prep 4 Slides - Fall 2020.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/PKUFlyingPig/CS186/5a021e79e9ffbb4f4aaef2b0c21b953ce2d84920/exam-prep/Transactions & Concurrency, Recovery/Exam Prep 4 Slides - Fall 2020.pdf -------------------------------------------------------------------------------- /exam-prep/Transactions & Concurrency, Recovery/Exam Prep 4 Solutions - Fall 2020.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/PKUFlyingPig/CS186/5a021e79e9ffbb4f4aaef2b0c21b953ce2d84920/exam-prep/Transactions & Concurrency, Recovery/Exam Prep 4 Solutions - Fall 2020.pdf -------------------------------------------------------------------------------- /exam-prep/Transactions & Concurrency, Recovery/Exam Prep 4 Worksheet - Fall 2020.pdf: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/PKUFlyingPig/CS186/5a021e79e9ffbb4f4aaef2b0c21b953ce2d84920/exam-prep/Transactions & Concurrency, Recovery/Exam Prep 4 Worksheet - Fall 2020.pdf -------------------------------------------------------------------------------- /notes/n00-SQLPart1.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/PKUFlyingPig/CS186/5a021e79e9ffbb4f4aaef2b0c21b953ce2d84920/notes/n00-SQLPart1.pdf -------------------------------------------------------------------------------- /notes/n01-SQLPart2.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/PKUFlyingPig/CS186/5a021e79e9ffbb4f4aaef2b0c21b953ce2d84920/notes/n01-SQLPart2.pdf -------------------------------------------------------------------------------- /notes/n02-DisksFiles.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/PKUFlyingPig/CS186/5a021e79e9ffbb4f4aaef2b0c21b953ce2d84920/notes/n02-DisksFiles.pdf -------------------------------------------------------------------------------- /notes/n03-B+Trees.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/PKUFlyingPig/CS186/5a021e79e9ffbb4f4aaef2b0c21b953ce2d84920/notes/n03-B+Trees.pdf -------------------------------------------------------------------------------- /notes/n04-BufferMgmt.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/PKUFlyingPig/CS186/5a021e79e9ffbb4f4aaef2b0c21b953ce2d84920/notes/n04-BufferMgmt.pdf -------------------------------------------------------------------------------- /notes/n05-RelAlg.pdf: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/PKUFlyingPig/CS186/5a021e79e9ffbb4f4aaef2b0c21b953ce2d84920/notes/n05-RelAlg.pdf -------------------------------------------------------------------------------- /notes/n06-Sorting.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/PKUFlyingPig/CS186/5a021e79e9ffbb4f4aaef2b0c21b953ce2d84920/notes/n06-Sorting.pdf -------------------------------------------------------------------------------- /notes/n07-Hashing.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/PKUFlyingPig/CS186/5a021e79e9ffbb4f4aaef2b0c21b953ce2d84920/notes/n07-Hashing.pdf -------------------------------------------------------------------------------- /notes/n08-Joins.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/PKUFlyingPig/CS186/5a021e79e9ffbb4f4aaef2b0c21b953ce2d84920/notes/n08-Joins.pdf -------------------------------------------------------------------------------- /notes/n09-QueryOpt.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/PKUFlyingPig/CS186/5a021e79e9ffbb4f4aaef2b0c21b953ce2d84920/notes/n09-QueryOpt.pdf -------------------------------------------------------------------------------- /notes/n10-XactConc-I.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/PKUFlyingPig/CS186/5a021e79e9ffbb4f4aaef2b0c21b953ce2d84920/notes/n10-XactConc-I.pdf -------------------------------------------------------------------------------- /notes/n11-XactConc-II.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/PKUFlyingPig/CS186/5a021e79e9ffbb4f4aaef2b0c21b953ce2d84920/notes/n11-XactConc-II.pdf 
-------------------------------------------------------------------------------- /notes/n12-Recovery.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/PKUFlyingPig/CS186/5a021e79e9ffbb4f4aaef2b0c21b953ce2d84920/notes/n12-Recovery.pdf -------------------------------------------------------------------------------- /notes/n13-DBDesign.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/PKUFlyingPig/CS186/5a021e79e9ffbb4f4aaef2b0c21b953ce2d84920/notes/n13-DBDesign.pdf -------------------------------------------------------------------------------- /notes/n14-PQProcessing.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/PKUFlyingPig/CS186/5a021e79e9ffbb4f4aaef2b0c21b953ce2d84920/notes/n14-PQProcessing.pdf -------------------------------------------------------------------------------- /notes/n15-DistXact.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/PKUFlyingPig/CS186/5a021e79e9ffbb4f4aaef2b0c21b953ce2d84920/notes/n15-DistXact.pdf -------------------------------------------------------------------------------- /notes/n16-NoSQL.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/PKUFlyingPig/CS186/5a021e79e9ffbb4f4aaef2b0c21b953ce2d84920/notes/n16-NoSQL.pdf -------------------------------------------------------------------------------- /notes/n17-MRSpark.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/PKUFlyingPig/CS186/5a021e79e9ffbb4f4aaef2b0c21b953ce2d84920/notes/n17-MRSpark.pdf -------------------------------------------------------------------------------- /ppt/02 SQL 1 Final.pptx: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/PKUFlyingPig/CS186/5a021e79e9ffbb4f4aaef2b0c21b953ce2d84920/ppt/02 SQL 1 Final.pptx -------------------------------------------------------------------------------- /ppt/03 SQL II Final.pptx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/PKUFlyingPig/CS186/5a021e79e9ffbb4f4aaef2b0c21b953ce2d84920/ppt/03 SQL II Final.pptx -------------------------------------------------------------------------------- /ppt/04 Files and Buffers.pptx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/PKUFlyingPig/CS186/5a021e79e9ffbb4f4aaef2b0c21b953ce2d84920/ppt/04 Files and Buffers.pptx -------------------------------------------------------------------------------- /ppt/04.5 Files,Pages, Records.pptx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/PKUFlyingPig/CS186/5a021e79e9ffbb4f4aaef2b0c21b953ce2d84920/ppt/04.5 Files,Pages, Records.pptx -------------------------------------------------------------------------------- /ppt/05 Files Heap Files vs Sorted Files.pptx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/PKUFlyingPig/CS186/5a021e79e9ffbb4f4aaef2b0c21b953ce2d84920/ppt/05 Files Heap Files vs Sorted Files.pptx -------------------------------------------------------------------------------- /ppt/06 Trees and Indexes FINAL animated.pptx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/PKUFlyingPig/CS186/5a021e79e9ffbb4f4aaef2b0c21b953ce2d84920/ppt/06 Trees and Indexes FINAL animated.pptx -------------------------------------------------------------------------------- /ppt/07 Tree-Indexes-final-jmh.pptx: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/PKUFlyingPig/CS186/5a021e79e9ffbb4f4aaef2b0c21b953ce2d84920/ppt/07 Tree-Indexes-final-jmh.pptx -------------------------------------------------------------------------------- /ppt/08 Buffer Management Final.pptx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/PKUFlyingPig/CS186/5a021e79e9ffbb4f4aaef2b0c21b953ce2d84920/ppt/08 Buffer Management Final.pptx -------------------------------------------------------------------------------- /ppt/09 Sort Hash -JMH FINAL.pptx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/PKUFlyingPig/CS186/5a021e79e9ffbb4f4aaef2b0c21b953ce2d84920/ppt/09 Sort Hash -JMH FINAL.pptx -------------------------------------------------------------------------------- /ppt/10 relational algebra - final - jmh.pptx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/PKUFlyingPig/CS186/5a021e79e9ffbb4f4aaef2b0c21b953ce2d84920/ppt/10 relational algebra - final - jmh.pptx -------------------------------------------------------------------------------- /ppt/11 Joins final JMH.pptx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/PKUFlyingPig/CS186/5a021e79e9ffbb4f4aaef2b0c21b953ce2d84920/ppt/11 Joins final JMH.pptx -------------------------------------------------------------------------------- /ppt/12 Parallel Queries final JMH.pptx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/PKUFlyingPig/CS186/5a021e79e9ffbb4f4aaef2b0c21b953ce2d84920/ppt/12 Parallel Queries final JMH.pptx -------------------------------------------------------------------------------- /ppt/13 Query Plan Space JMH Final (1).pptx: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/PKUFlyingPig/CS186/5a021e79e9ffbb4f4aaef2b0c21b953ce2d84920/ppt/13 Query Plan Space JMH Final (1).pptx -------------------------------------------------------------------------------- /ppt/14 Query Optimization JMH FINAL.pptx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/PKUFlyingPig/CS186/5a021e79e9ffbb4f4aaef2b0c21b953ce2d84920/ppt/14 Query Optimization JMH FINAL.pptx -------------------------------------------------------------------------------- /ppt/15 Text Search JMH FINAL.pptx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/PKUFlyingPig/CS186/5a021e79e9ffbb4f4aaef2b0c21b953ce2d84920/ppt/15 Text Search JMH FINAL.pptx -------------------------------------------------------------------------------- /ppt/16 Relational Modeling FINAL JMH.pptx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/PKUFlyingPig/CS186/5a021e79e9ffbb4f4aaef2b0c21b953ce2d84920/ppt/16 Relational Modeling FINAL JMH.pptx -------------------------------------------------------------------------------- /ppt/17 FDs and Normalization JMH FINAL.pptx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/PKUFlyingPig/CS186/5a021e79e9ffbb4f4aaef2b0c21b953ce2d84920/ppt/17 FDs and Normalization JMH FINAL.pptx -------------------------------------------------------------------------------- /ppt/18 Transactions 1 JMH FINAL.pptx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/PKUFlyingPig/CS186/5a021e79e9ffbb4f4aaef2b0c21b953ce2d84920/ppt/18 Transactions 1 JMH FINAL.pptx -------------------------------------------------------------------------------- /ppt/19 
Transactions 2 JMH FINAL.pptx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/PKUFlyingPig/CS186/5a021e79e9ffbb4f4aaef2b0c21b953ce2d84920/ppt/19 Transactions 2 JMH FINAL.pptx -------------------------------------------------------------------------------- /ppt/20 Recovery FINAL JMH.pptx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/PKUFlyingPig/CS186/5a021e79e9ffbb4f4aaef2b0c21b953ce2d84920/ppt/20 Recovery FINAL JMH.pptx -------------------------------------------------------------------------------- /ppt/21 Ranking and Crawling.pptx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/PKUFlyingPig/CS186/5a021e79e9ffbb4f4aaef2b0c21b953ce2d84920/ppt/21 Ranking and Crawling.pptx -------------------------------------------------------------------------------- /ppt/22 Distributed Transactions JMH FINAL.pptx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/PKUFlyingPig/CS186/5a021e79e9ffbb4f4aaef2b0c21b953ce2d84920/ppt/22 Distributed Transactions JMH FINAL.pptx -------------------------------------------------------------------------------- /ppt/23 Big Data and Data Wrangling.pptx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/PKUFlyingPig/CS186/5a021e79e9ffbb4f4aaef2b0c21b953ce2d84920/ppt/23 Big Data and Data Wrangling.pptx -------------------------------------------------------------------------------- /ppt/24 Replication and NoSQL.pptx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/PKUFlyingPig/CS186/5a021e79e9ffbb4f4aaef2b0c21b953ce2d84920/ppt/24 Replication and NoSQL.pptx -------------------------------------------------------------------------------- /ppt/25 Closing 
Comments.pptx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/PKUFlyingPig/CS186/5a021e79e9ffbb4f4aaef2b0c21b953ce2d84920/ppt/25 Closing Comments.pptx -------------------------------------------------------------------------------- /project-handout/hw0/README.md: -------------------------------------------------------------------------------- 1 | # Homework 0: Setup 2 | 3 | This assignment will be released on **Thursday, 8/27/2020**. -------------------------------------------------------------------------------- /project-handout/proj0/README.md: -------------------------------------------------------------------------------- 1 | # Project 0: Setup 2 | 3 | -------------------------------------------------------------------------------- /project-handout/proj0/getting-started.md: -------------------------------------------------------------------------------- 1 | # Getting Started 2 | 3 | ## Logistics 4 | 5 | This assignment is due **Monday, 1/25/2021 at 11:59PM PST (GMT-8)**. It is worth 0% of your overall grade, but failure to complete it may result in being **administratively dropped from the class**. 6 | 7 | ## Prerequisites 8 | 9 | No lectures are required to work through this assignment. 10 | 11 | ## `git` and GitHub 12 | 13 | [git](https://en.wikipedia.org/wiki/Git) is a _version control_ system that helps developers like you track different versions of your code, synchronize them across different machines, and collaborate with others. If you don't already have git on your machine, you can follow the instructions [here](https://git-scm.com/book/en/v2/Getting-Started-Installing-Git) to install it. 14 | 15 | [GitHub](https://github.com) is a site that hosts git repositories as a service. To get copies of the skeleton code to work on during the semester, you'll need to create an account. 16 | 17 | We will be using git and GitHub to pass out assignments in this course.
If you don't know much about git, that isn't a problem: you will _need_ to use it only in very simple ways that we will show you in order to keep up with class assignments. 18 | 19 | If you'd like to use git for managing your own code versioning, there are many guides to using git online -- [this](http://git-scm.com/book/en/v1/Getting-Started) is a good one. 20 | 21 | ### Fetching the released code 22 | 23 | For each project, we will provide a GitHub Classroom link. Follow the link to create a GitHub repository with the starter code for the project you are working on. Use `git clone` to get a local copy of the newly created repository. For example, if your GitHub username is `oski` after being assigned your repo through GitHub Classroom you would run: 24 | 25 | `git clone https://github.com/berkeley-cs186-student/sp21-proj0-oski` 26 | 27 | The GitHub Classroom link for this project is provided in the project release post on [Piazza](https://piazza.com/class/kjoxqrf1eq04mr). 28 | 29 | ### Debugging Issues with GitHub Classroom 30 | 31 | Feel free to skip this section if you don't have any issues with GitHub Classroom. If you are having issues \(i.e. the page froze or some error message appeared\), first check if you have access to your repo at `https://github.com/berkeley-cs186-student/sp21-proj0-username`, replacing `username` with your GitHub username. If you have access to your repo and the starter code is there, then you can proceed as usual. If you have access to your repo but the starter code is not there, run the following commands in a terminal \(again replacing `username` with your GitHub username\): 32 | 33 | ```text 34 | git clone https://github.com/berkeley-cs186/sp21-rookiedb sp21-proj0 35 | cd sp21-proj0/ 36 | git remote remove origin 37 | git remote add origin https://github.com/berkeley-cs186-student/sp21-proj0-username.git 38 | git push -u origin master 39 | ``` 40 | 41 | Then, you can proceed as usual. 
42 | 43 | #### 404 Not Found 44 | 45 | If you're getting a 404 not found page when trying to access your repo, make sure you've set up your repo using the GitHub Classroom link in the Project 0 release post on [Piazza](https://piazza.com/class/kjoxqrf1eq04mr). 46 | 47 | If you don't have access to your repo at all after following these steps, feel free to contact the course staff on Piazza. 48 | 49 | ## Setting up your local development environment 50 | 51 | You are free to use any text editor or IDE to complete the assignments, but **we will build and test your code in a Docker container with Maven**. 52 | 53 | We recommend setting up a local development environment by installing Java 8 locally \(the version our Docker container runs\) and using an IDE such as IntelliJ. 54 | 55 | [Java 8 downloads](http://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html) \(or alternatively, you're free to use [OpenJDK](https://openjdk.java.net/install/)\) 56 | 57 | If you have a newer version of Java installed that should be fine, we'll do our best to support grading for those versions up to Java 11. We won't be able to support any new syntax or features introduced in Java 12 or later, which won't be necessary for the projects. 58 | 59 | To import the project into IntelliJ, make sure that you import as a Maven project \(select the `pom.xml` file when importing\). 60 | 61 | ![After hitting Import Project navigate to the pom.xml file and open it.](../../.gitbook/assets/image%20%284%29%20%283%29%20%284%29%20%284%29%20%284%29.png) 62 | 63 | If launching IntelliJ takes you to an existing workspace instead of showing you the popup above you can open the project by navigating to `File -> New -> Project From Existing Sources` and then select the `pom.xml` file. 
64 | 65 | ### Running tests in IntelliJ 66 | 67 | If you are using IntelliJ and wish to run the tests for a given assignment, follow the instructions in the following document: [IntelliJ setup](https://github.com/berkeley-cs186/sp21-rookiedb/blob/master/intellij-test-setup.md) 68 | 69 | Once you have a copy of the released code, head to the next section "Your Tasks" and begin working on the assignment. 70 | 71 | -------------------------------------------------------------------------------- /project-handout/proj0/submitting.md: -------------------------------------------------------------------------------- 1 | # Submitting the Assignment 2 | 3 | This project is due on **Monday, 1/25/2021 at 11:59PM PST (GMT-8)**. 4 | 5 | ## Pushing changes to GitHub Classroom 6 | 7 | To submit a project, navigate to the cloned repo in a terminal and stage the files for your submission using `git add`. For example, in this project you would run: 8 | 9 | `git add src/main/java/edu/berkeley/cs186/database/databox/StringDataBox.java` 10 | 11 | to stage your change to `StringDataBox.java`. Once your changes are staged, commit them with `git commit -m "Put your own informative commit message here"`. Finally, use `git push` to push all of your changes to the remote GitHub repository created by GitHub Classroom. 12 | 13 | ## Submitting to Gradescope 14 | 15 | Once your changes are on GitHub, go to the CS186 Gradescope and click on the project for which you want to submit your code. Select GitHub for the submission method \(if it hasn't been selected already\), and select the repository and branch with the code you want to upload and submit. If you have not done this before, then you will have to link your GitHub account to Gradescope using the "Connect to GitHub" button.
If you are unable to find the appropriate repository, then you might need to go to [https://github.com/settings/applications](https://github.com/settings/applications), click Gradescope, and grant access to the `berkeley-cs186-student` organization. 16 | 17 | Note that you are only allowed to modify certain files for each assignment, and changes to any other files will be discarded when we run tests. 18 | 19 | You should make sure that all code you modify belongs to files with `TODO(proj0)` comments in them. The full list of files that you may modify for this project is as follows: 20 | 21 | * `databox/StringDataBox.java` 22 | 23 | Once you've submitted you should see a score of 1.0/1.0. If so, congratulations! You've finished your first assignment for CS 186. 24 | 25 | ### Submitting via upload 26 | 27 | If your GitHub account has access to many repos, the Gradescope UI might time out while trying to load which repos you have available. If this is the case for you, you can submit your code directly via upload. You can zip up your source code with `zip -r submission.zip src/` and submit that directly to the autograder. 28 | 29 | ## Grading 30 | 31 | * 100% of your grade will be made up of one test released to you \(the test that we provided in `database.databox.TestWelcome`\) 32 | * This project will be worth 0% of your overall grade, but failing to complete it may result in you being **administratively dropped from the class** 33 | 34 | -------------------------------------------------------------------------------- /project-handout/proj0/your-tasks.md: -------------------------------------------------------------------------------- 1 | # Your Tasks 2 | 3 | For this assignment you will get acquainted with running RookieDB's command line interface and make a small change to one file to get things working properly.
4 | 5 | ## Task 1: Running the CLI 6 | 7 | Most databases provide a command line interface \(CLI\) to send and view the results of queries. To run the CLI in IntelliJ navigate to the file: 8 | 9 | `src/main/java/edu/berkeley/cs186/database/cli/CommandLineInterface` 10 | 11 | It's okay if you don't understand most of the code here right now; we just want to run it. Locate the arrow next to the class declaration and click on it to start the CLI. 12 | 13 | ![Click the arrow \(circled in red above\) to run the CLI](../../.gitbook/assets/image.png) 14 | 15 | This should open a new panel in IntelliJ resembling the following image: 16 | 17 | ![](../../.gitbook/assets/image%20%2810%29%20%281%29.png) 18 | 19 | Click on this panel and try typing in the following query and hitting enter: 20 | 21 | `SELECT * FROM Courses LIMIT 5;` 22 | 23 | You should get something similar to the following output: 24 | 25 | ![](../../.gitbook/assets/image%20%283%29.png) 26 | 27 | Hmm, that doesn't look quite right! Follow the instructions in the next task to get the proper output. To exit the CLI just type in `exit` and hit enter. 28 | 29 | ## Task 2: Welcome to CS186! 30 | 31 | Open up `src/main/java/edu/berkeley/cs186/database/databox/StringDataBox.java`. It's okay if you do not understand most of the code right now. 32 | 33 | The `toString` method currently looks like: 34 | 35 | ```java 36 | @Override 37 | public String toString() { 38 | // TODO(proj0): replace the following line with `return s;` 39 | return "FIX ME"; 40 | } 41 | ``` 42 | 43 | Follow the instructions in the `TODO(proj0)` comment to fix the return statement. 44 | 45 | Navigate to `src/test/java/edu/berkeley/cs186/database/databox/TestWelcome.java` and try running the test in the file, which should now be passing. Now you can run through Task 1 again to see what the proper output should be. 46 | 47 | ## You're done! 48 | 49 | Follow the instructions in the next section "Submitting the Assignment" to turn in your work.
50 | 51 | -------------------------------------------------------------------------------- /project-handout/proj1.md: -------------------------------------------------------------------------------- 1 | # Project 1: SQL 2 | 3 | This assignment will be released on **Thursday, 1/21/2021**. 4 | 5 | -------------------------------------------------------------------------------- /project-handout/proj1/README.md: -------------------------------------------------------------------------------- 1 | # Project 1: SQL 2 | 3 | -------------------------------------------------------------------------------- /project-handout/proj1/getting-started.md: -------------------------------------------------------------------------------- 1 | # Getting Started 2 | 3 | ## Logistics 4 | 5 | This project is due **Tuesday, 2/2/2021 at 11:59PM PST (GMT-8)**. It is worth 5% of your overall grade in the class. 6 | 7 | ## Prerequisites 8 | 9 | You should watch the SQL I lecture before beginning this project. Later questions will require material from the SQL II lecture. 10 | 11 | ## Fetching the released code 12 | 13 | The GitHub Classroom link for this project is in the Project 1 release post on [Piazza](https://piazza.com/class/kjoxqrf1eq04mr). Once your private repo is set up, clone the project 1 skeleton code onto your local machine. 14 | 15 | ### Debugging Issues with GitHub Classroom 16 | 17 | Feel free to skip this section if you don't have any issues with GitHub Classroom. If you are having issues \(e.g. the page froze or some error message appeared\), first check if you have access to your repo at `https://github.com/berkeley-cs186-student/sp21-proj1-username`, replacing `username` with your GitHub username. If you have access to your repo and the starter code is there, then you can proceed as usual.
If you have access to your repo but the starter code is not there, run the following commands in a terminal \(again replacing `username` with your GitHub username\): 18 | 19 | ```text 20 | git clone https://github.com/berkeley-cs186/sp21-proj1 21 | cd sp21-proj1/ 22 | git remote remove origin 23 | git remote add origin https://github.com/berkeley-cs186-student/sp21-proj1-username.git 24 | git push -u origin master 25 | ``` 26 | 27 | Then, you can proceed as usual. 28 | 29 | ### 404 Not Found 30 | 31 | If you're getting a 404 not found page when trying to access your repo, make sure you've set up your repo using the GitHub Classroom link in the Project 1 release post on [Piazza](https://piazza.com/class/kjoxqrf1eq04mr). 32 | 33 | If you don't have access to your repo at all after following these steps, feel free to contact the course staff on Piazza. 34 | 35 | ## Required Software 36 | 37 | ### SQLite3 38 | 39 | Check if you already have sqlite3 installed by opening a terminal and running `sqlite3 --version`. Any version at 3.8.3 or higher should be fine. 40 | 41 | If you don't already have SQLite on your machine, the simplest way to start using it is to download a precompiled binary from the [SQLite website](http://www.sqlite.org/download.html). The latest version of SQLite at the time of writing is 3.34.1, but you can check for additional updates on the website. 42 | 43 | #### Windows 44 | 45 | 1. Visit the download page linked above and navigate to the section **Precompiled Binaries for Windows**. Click on the link **sqlite-tools-win32-x86-\*.zip** to download the binary. 46 | 2. Unzip the file. There should be a `sqlite3.exe` file in the directory after extraction. 47 | 3. Navigate to the folder containing the `sqlite3.exe` file and check that the version is at least 3.8.3: `cd path/to/sqlite_folder` `./sqlite3 --version` 48 | 4.
Move the `sqlite3.exe` executable into your `sp21-proj1-yourname` directory \(the same place as the `proj1.sql` file\) 49 | 50 | #### macOS Yosemite \(10.10\), El Capitan \(10.11\), Sierra \(10.12\) 51 | 52 | SQLite comes pre-installed. Check that you have a version that's greater than 3.8.3 `./sqlite3 --version` 53 | 54 | #### Mac OS X Mavericks \(10.9\) or older 55 | 56 | SQLite comes pre-installed, but it is the wrong version. 57 | 58 | 1. Visit the download page linked above and navigate to the section **Precompiled Binaries for Mac OS X \(x86\)**. Click on the link **sqlite-tools-osx-x86-\*.zip** to download the binary. 59 | 2. Unzip the file. There should be a `sqlite3` file in the directory after extraction. 60 | 3. Navigate to the folder containing the `sqlite3` file and check that the version is at least 3.8.3: `cd path/to/sqlite_folder` `./sqlite3 --version` 61 | 4. Move the `sqlite3` file into your `sp21-proj1-yourname` directory \(the same place as the `proj1.sql` file\) 62 | 63 | #### Ubuntu 64 | 65 | Install with `sudo apt install sqlite3` 66 | 67 | For other Linux distributions you'll need to find `sqlite3` on your appropriate package manager. Alternatively you can follow the Mac OS X \(10.9\) or older instructions substituting the Mac OS X binary for one from **Precompiled Binaries for Linux.** 68 | 69 | ### Python 70 | 71 | You'll need a copy of Python 3.5 or higher to run the tests for this project locally. You can check if you already have an existing copy by running `python3 --version` in a terminal. If you don't already have a working copy download and install one for your appropriate platform from [here](https://www.python.org/downloads/). 72 | 73 | ## Download and extract the data set 74 | 75 | Download the data set for this project from the course's Google Drive [here](https://drive.google.com/file/d/1WLMFAiNzrA0Qv3p80epO71uN8J6fTXYG/view?usp=sharing). You should get a file called `lahman.db.zip`. 
Unzip the `lahman.db.zip` file inside your `sp21-proj1-yourname` directory. You should now have a `lahman.db` file in your `sp21-proj1-yourname` directory \(the same place as the `proj1.sql` file\). 76 | 77 | ## Running the tests 78 | 79 | If you followed the instructions above you should now be able to test your code. Navigate to your project directory and try using `python3 test.py`. You should get output similar to the following: 80 | 81 | ```text 82 | FAIL q0 see diffs/q0.txt 83 | FAIL q1i see diffs/q1i.txt 84 | FAIL q1ii see diffs/q1ii.txt 85 | FAIL q1iii see diffs/q1iii.txt 86 | FAIL q1iv see diffs/q1iv.txt 87 | FAIL q2i see diffs/q2i.txt 88 | FAIL q2ii see diffs/q2ii.txt 89 | FAIL q2iii see diffs/q2iii.txt 90 | FAIL q3i see diffs/q3i.txt 91 | FAIL q3ii see diffs/q3ii.txt 92 | FAIL q3iii see diffs/q3iii.txt 93 | FAIL q4i see diffs/q4i.txt 94 | FAIL q4ii_bins_0_to_8 see diffs/q4ii_bins_0_to_8.txt 95 | FAIL q4ii_bin_9 see diffs/q4ii_bin_9.txt 96 | FAIL q4iii see diffs/q4iii.txt 97 | FAIL q4iv see diffs/q4iv.txt 98 | FAIL q4v see diffs/q4v.txt 99 | ``` 100 | 101 | If so, move on to the next section to start the project. If you see `ERROR` instead of `FAIL`, create a followup on Piazza with details from your `your_output/` folder. 102 | 103 | -------------------------------------------------------------------------------- /project-handout/proj1/submitting.md: -------------------------------------------------------------------------------- 1 | # Submitting the Assignment 2 | 3 | This project is due on **Tuesday, 2/2/2021 at 11:59PM PST (GMT-8)**. 4 | 5 | Push your changes to your GitHub Classroom private repository and then submit through Gradescope. You may find it helpful to read through the project 0 submission procedure again [here](submitting.md). Alternatively you can submit your `proj1.sql` file directly \(make sure it is named `proj1.sql` or the autograder won't recognize it\).
6 | 7 | The full list of files that you may modify is as follows: 8 | 9 | * `proj1.sql` 10 | 11 | ## Grading 12 | 13 | * 80% of your grade will be made up of tests released to you 14 | * 20% will be determined by hidden, unreleased tests that we will run on your submission after the deadline 15 | * This project will be worth 5% of your overall grade in the class 16 | 17 | -------------------------------------------------------------------------------- /project-handout/proj1/testing.md: -------------------------------------------------------------------------------- 1 | # Testing 2 | 3 | You can run your answers through SQLite directly by running `sqlite3 lahman.db` to open the database and then entering `.read proj1.sql` 4 | 5 | ```text 6 | $ sqlite3 lahman.db 7 | SQLite version 3.33.0 2020-08-14 13:23:32 8 | Enter ".help" for usage hints. 9 | sqlite> .read proj1.sql 10 | ``` 11 | 12 | This can help you catch any syntax errors in your SQL. 13 | 14 | To help debug your logic, we've provided output from each of the views you need to define in questions 1-4 for the data set you've been given. Your views should match ours, but note that your SQL queries should work on ANY data set. **We will test your queries on a \(set of\) different database\(s\), so it is** _**NOT**_ **sufficient to simply return these results in all cases!** 15 | 16 | To run the test, from within the `sp21-proj1-yourname` directory: 17 | 18 | ```text 19 | $ python3 test.py 20 | $ python3 test.py -q 4ii # This would run tests for only q4ii 21 | ``` 22 | 23 | Become familiar with the UNIX [diff](http://en.wikipedia.org/wiki/Diff) format, if you're not already, because our tests save a simplified diff in `diffs/` for any query execution whose output doesn't match.
As an example, the following output for `diffs/q1i.txt`: 24 | 25 | ```text 26 | - 1|1|1 27 | + Jumbo|Diaz|1984 28 | + Walter|Young|1980 29 | ``` 30 | 31 | indicates that your output has an extra `1|1|1` \(the `-` at the beginning means the expected output _doesn't_ include this line but your output has it\) and is missing the lines `Jumbo|Diaz|1984` and `Walter|Young|1980` \(the `+` at the beginning means the expected output _does_ include those lines but your output is missing them\). If there is neither a `+` nor `-` at the beginning then it means that the line is in both your output and the expected output \(your output is correct for that line\). 32 | 33 | If you care to look at the query outputs directly, ours are located in the `expected_output` directory. Your view output should be located in your solution's `your_output` directory once you run the tests. 34 | 35 | **Note:** For queries where we don't specify the order, it doesn't matter how you sort your results; we will reorder before comparing. Note, however, that our test query output is sorted for these cases, so if you're trying to compare yours and ours manually line-by-line, make sure you use the proper ORDER BY clause \(you can determine this by looking in `test.py`\). Different versions of SQLite handle floating points slightly differently so we also round certain floating point values in our own queries.
A full list is specified here for convenience: 36 | 37 | ```sql 38 | SELECT * FROM q0; 39 | SELECT * FROM q1i ORDER BY namefirst, namelast, birthyear; 40 | SELECT * FROM q1ii ORDER BY namefirst, namelast, birthyear; 41 | SELECT birthyear, ROUND(avgheight, 4), count FROM q1iii; 42 | SELECT birthyear, ROUND(avgheight, 4), count FROM q1iv; 43 | SELECT * FROM q2i; 44 | SELECT * FROM q2ii; 45 | SELECT * FROM q2iii; 46 | SELECT playerid, namefirst, namelast, yearid, ROUND(slg, 4) FROM q3i; 47 | SELECT playerid, namefirst, namelast, ROUND(lslg, 4) FROM q3ii; 48 | SELECT namefirst, namelast, ROUND(lslg, 4) FROM q3iii ORDER BY namefirst, namelast; 49 | SELECT yearid, min, max, ROUND(avg, 4) FROM q4i; 50 | SELECT * FROM q4ii WHERE binid <> 9; 51 | WITH max_salary AS (SELECT MAX(salary) AS salary FROM salaries) 52 | SELECT binid, low, 53 | ((CASE WHEN high >= salary THEN '' ELSE 'not ' END) || 54 | 'at least ' || salary) AS high, count 55 | FROM q4ii, max_salary WHERE binid = 9; 56 | SELECT yearid, mindiff, maxdiff, ROUND(avgdiff, 4) FROM q4iii; 57 | SELECT * FROM q4iv ORDER BY yearid, playerid; 58 | SELECT team, ROUND(diffAvg, 4) FROM q4v ORDER BY team; 59 | ``` 60 | 61 | -------------------------------------------------------------------------------- /project-handout/proj1/your-tasks.md: -------------------------------------------------------------------------------- 1 | # Your Tasks 2 | 3 | ![Databaseball](../../.gitbook/assets/databaseball%20%282%29%20%283%29%20%283%29%20%283%29.jpg) 4 | 5 | In this project we will be working with the commonly-used [Lahman baseball statistics database](http://www.seanlahman.com/baseball-archive/statistics/) \(our friends at the San Francisco Giants tell us they use it!\) The database contains pitching, hitting, and fielding statistics for Major League Baseball from 1871 through 2019. 
It includes data from the two current leagues \(American and National\), four other "major" leagues \(American Association, Union Association, Players League, and Federal League\), and the National Association of 1871-1875. 6 | 7 | At this point you should be able to run SQLite and view the database using either `./sqlite3 -header lahman.db` \(if in the previous section you downloaded a precompiled binary\) or `sqlite3 -header lahman.db` otherwise. 8 | 9 | ```text 10 | $ sqlite3 lahman.db 11 | SQLite version 3.33.0 2020-08-14 13:23:32 12 | Enter ".help" for usage hints. 13 | sqlite> .tables 14 | ``` 15 | 16 | Try running a few sample commands in the SQLite console and see what they do: 17 | 18 | ```text 19 | sqlite> .schema people 20 | ``` 21 | 22 | ```text 23 | sqlite> SELECT playerid, namefirst, namelast FROM people; 24 | ``` 25 | 26 | ```text 27 | sqlite> SELECT COUNT(*) FROM fielding; 28 | ``` 29 | 30 | ## Understanding the Schema 31 | 32 | The database is comprised of the following main tables: 33 | 34 | ```text 35 | People - Player names, date of birth (DOB), and biographical info 36 | Batting - batting statistics 37 | Pitching - pitching statistics 38 | Fielding - fielding statistics 39 | ``` 40 | 41 | It is supplemented by these tables: 42 | 43 | ```text 44 | AllStarFull - All-Star appearances 45 | HallofFame - Hall of Fame voting data 46 | Managers - managerial statistics 47 | Teams - yearly stats and standings 48 | BattingPost - post-season batting statistics 49 | PitchingPost - post-season pitching statistics 50 | TeamFranchises - franchise information 51 | FieldingOF - outfield position data 52 | FieldingPost- post-season fielding data 53 | FieldingOFsplit - LF/CF/RF splits 54 | ManagersHalf - split season data for managers 55 | TeamsHalf - split season data for teams 56 | Salaries - player salary data 57 | SeriesPost - post-season series information 58 | AwardsManagers - awards won by managers 59 | AwardsPlayers - awards won by players 60 | 
AwardsShareManagers - award voting for manager awards 61 | AwardsSharePlayers - award voting for player awards 62 | Appearances - details on the positions a player appeared at 63 | Schools - list of colleges that players attended 64 | CollegePlaying - list of players and the colleges they attended 65 | Parks - list of major league ballparks 66 | HomeGames - Number of home games played by each team in each ballpark 67 | ``` 68 | 69 | For more detailed information, see the [docs online](http://www.seanlahman.com/files/database/readme2019.txt). 70 | 71 | ## Writing Queries 72 | 73 | We've provided a skeleton solution file, `proj1.sql`, to help you get started. In the file, you'll find a `CREATE VIEW` statement for each part of the first 4 questions below, specifying a particular view name \(like `q2i`\) and list of column names \(like `playerid`, `lastname`\). The view name and column names constitute the interface against which we will grade this assignment. In other words, _don't change or remove these names_. Your job is to fill out the view definitions in a way that populates the views with the right tuples. 74 | 75 | For example, consider Question 0: "What is the highest `era` \([earned run average](https://en.wikipedia.org/wiki/Earned_run_average)\) recorded in baseball history?". 76 | 77 | In the `proj1.sql` file we provide: 78 | 79 | ```sql 80 | CREATE VIEW q0(era) AS 81 | SELECT 1 -- replace this line 82 | ; 83 | ``` 84 | 85 | You would edit this with your answer, keeping the schema the same: 86 | 87 | ```sql 88 | -- solution you provide 89 | CREATE VIEW q0(era) AS 90 | SELECT MAX(era) 91 | FROM pitching 92 | ; 93 | ``` 94 | 95 | To complete the project, create a view for `q0` as above \(via copy-paste\), and for all of the following queries, which you will need to write yourself.
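To see how a named view with a fixed column list behaves before touching `proj1.sql`, the following is a minimal sketch using Python's standard `sqlite3` module. It assumes a throwaway in-memory database and a two-column toy `pitching` table (the real table has many more columns); the helper name `max_era_via_view` and the sample values are hypothetical and only for illustration.

```python
import sqlite3


def max_era_via_view(eras):
    """Build a toy `pitching` table, define the q0 view over it, and read it back.

    Returns the view's column names and its rows. The table layout here is an
    assumption for illustration, not the real Lahman schema.
    """
    conn = sqlite3.connect(":memory:")  # throwaway in-memory database
    conn.execute("CREATE TABLE pitching (playerid TEXT, era REAL)")
    conn.executemany(
        "INSERT INTO pitching VALUES (?, ?)",
        [(f"p{i}", era) for i, era in enumerate(eras)],
    )
    # Same shape as the q0 statement in the handout: the view name and the
    # column list in parentheses are what a grader would query against.
    conn.execute("CREATE VIEW q0(era) AS SELECT MAX(era) FROM pitching")
    cursor = conn.execute("SELECT * FROM q0")
    column_names = [d[0] for d in cursor.description]
    rows = cursor.fetchall()
    conn.close()
    return column_names, rows


if __name__ == "__main__":
    print(max_era_via_view([2.5, 27.0, 4.1]))
```

The column list after the view name is what gives the result column the name `era`, even though the `SELECT` computes an unnamed `MAX(era)` expression; this is why the skeleton asks you not to change those names.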
96 | 97 | You can confirm the test is now passing by running `python3 test.py -q 0` 98 | 99 | ```text 100 | > python3 test.py -q 0 101 | PASS q0 102 | ``` 103 | 104 | More details on testing can be found in the [Testing](testing.md) section. 105 | 106 | ### Changes from Lecture 107 | 108 | SQLite doesn't support every SQL feature covered in lecture, specifically: 109 | 110 | * There is support for `LEFT OUTER JOIN` but not `RIGHT OUTER` or `FULL OUTER`. 111 | * To get equivalent output to `RIGHT OUTER` you can reverse the order of the tables \(i.e. `A RIGHT JOIN B` is the same as `B LEFT JOIN A`\). 112 | * While it isn't required to complete this assignment, the equivalent to `FULL OUTER JOIN` can be done by `UNION`ing `RIGHT OUTER` and `LEFT OUTER` 113 | * There is no regex match operator \(`~`\). You can use `LIKE` instead. 114 | * There is no `ANY` or `ALL` operator. 115 | 116 | ## Your Tasks 117 | 118 | ### Task 1: **Basics** 119 | 120 | **i.** In the `people` table, find the `namefirst`, `namelast` and `birthyear` for all players with weight greater than 300 pounds. 121 | 122 | **ii.** Find the `namefirst`, `namelast` and `birthyear` of all players whose `namefirst` field contains a space. Order the results by `namefirst`, breaking ties with `namelast`, both in ascending order. 123 | 124 | **iii.** From the `people` table, group together players with the same `birthyear`, and report the `birthyear`, average `height`, and number of players for each `birthyear`. Order the results by `birthyear` in _ascending_ order. 125 | 126 | Note: Some birth years have no players; your answer can simply skip those years. In some other years, you may find that all the players have a `NULL` height value in the dataset \(i.e. `height IS NULL`\); your query should return `NULL` for the height in those years. 127 | 128 | **iv.** Following the results of part iii, now only include groups with an average height > `70`. Again order the results by `birthyear` in _ascending_ order.
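The note in part iii relies on how SQL aggregates treat `NULL`: `AVG` skips `NULL` inputs and yields `NULL` when every input in a group is `NULL`, while `COUNT(*)` still counts those rows. The sketch below illustrates this on a deliberately generic toy table, not on `people`; the table `toy_groups`, its columns `grp` and `value`, and the helper `average_by_group` are all hypothetical names chosen for this example.

```python
import sqlite3


def average_by_group(rows, min_avg=None):
    """Group (grp, value) rows and return (grp, AVG(value), COUNT(*)) per group.

    If `min_avg` is given, a HAVING clause keeps only groups whose average
    exceeds it. Table and column names are illustrative assumptions.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE toy_groups (grp INTEGER, value REAL)")
    conn.executemany("INSERT INTO toy_groups VALUES (?, ?)", rows)

    query = "SELECT grp, AVG(value), COUNT(*) FROM toy_groups GROUP BY grp"
    params = ()
    if min_avg is not None:
        # HAVING filters whole groups after aggregation (WHERE filters rows before it).
        query += " HAVING AVG(value) > ?"
        params = (min_avg,)
    query += " ORDER BY grp ASC"

    result = conn.execute(query, params).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    sample = [
        (1980, 70.0), (1980, 74.0),  # average 72.0
        (1981, None), (1981, None),  # all NULL, so AVG is NULL (None in Python)
        (1982, 68.0), (1982, None),  # the NULL is skipped, average 68.0
    ]
    print(average_by_group(sample))
    print(average_by_group(sample, min_avg=70))
```

With `min_avg=70` only the 1980 group survives: 68.0 fails the comparison, and comparing `NULL` with 70 is itself `NULL`, which `HAVING` treats as not true.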
129 | 130 | ### Task 2: **Hall of Fame Schools** 131 | 132 | **i.** Find the `namefirst`, `namelast`, `playerid` and `yearid` of all people who were successfully inducted into the Hall of Fame in _descending_ order of `yearid`. Break ties on `yearid` by `playerid` \(ascending\). 133 | 134 | **ii.** Find the people who were successfully inducted into the Hall of Fame and played in college at a school located in the state of California. For each person, return their `namefirst`, `namelast`, `playerid`, `schoolid`, and `yearid` in _descending_ order of `yearid`. Break ties on `yearid` by `schoolid, playerid` \(ascending\). For this question, `yearid` refers to the year of induction into the Hall of Fame. 135 | 136 | * Note: a player may appear in the results multiple times \(once per year in a college in California\). 137 | 138 | **iii.** Find the `playerid`, `namefirst`, `namelast` and `schoolid` of all people who were successfully inducted into the Hall of Fame -- whether or not they played in college. Return people in _descending_ order of `playerid`. Break ties on `playerid` by `schoolid` \(ascending\). \(Note: `schoolid` will be `NULL` if they did not play in college.\) 139 | 140 | ### Task 3: [**SaberMetrics**](https://en.wikipedia.org/wiki/Sabermetrics) 141 | 142 | **i.** Find the `playerid`, `namefirst`, `namelast`, `yearid` and single-year `slg` \(Slugging Percentage\) of the players with the 10 best annual Slugging Percentage recorded over all time. For statistical significance, only include players with more than 50 at-bats in the season. Order the results by `slg` descending, and break ties by `yearid, playerid` \(ascending\). 143 | 144 | * Baseball note: Slugging Percentage is not provided in the database; it is computed according to a [simple formula](https://en.wikipedia.org/wiki/Slugging_percentage) you can calculate from the data in the database. 
145 | * SQL note: You should compute `slg` properly as a floating point number; you'll need to figure out how to convince SQL to do this! 146 | * Data set note: The online documentation for `batting` mentions two columns `2B` and `3B`. On your local copy of the data set these have been renamed `H2B` and `H3B` respectively \(columns starting with numbers are tedious to write queries on\). 147 | * Data set note: The column `H` of the `batting` table represents all hits = \(\# singles\) + \(\# doubles\) + \(\# triples\) + \(\# home runs\), not just \(\# singles\), so you’ll need to account for some double-counting. 148 | * If a player played on multiple teams during the same season \(for example `anderma02` in 2006\) treat their time on each team separately for this calculation 149 | 150 | **ii.** Following the results from Part i, find the `playerid`, `namefirst`, `namelast` and `lslg` \(Lifetime Slugging Percentage\) for the players with the top 10 Lifetime Slugging Percentage. Lifetime Slugging Percentage \(LSLG\) uses the same formula as Slugging Percentage \(SLG\), but it uses the number of singles, doubles, triples, home runs, and at bats each player has over their entire career, rather than just over a single season. 151 | 152 | Note that the database only gives batting information broken down by year; you will need to convert to total information across all time \(from the earliest date recorded up to the last date recorded\) to compute `lslg`. Order the results by `lslg` \(descending\) and break ties by `playerid` \(ascending\). 153 | 154 | * Note: Make sure that you only include players with more than 50 at-bats across their lifetime. 155 | 156 | **iii.** Find the `namefirst`, `namelast` and Lifetime Slugging Percentage \(`lslg`\) of batters whose lifetime slugging percentage is higher than that of San Francisco favorite Willie Mays.
157 | 158 | You may include Willie Mays' `playerid` in your query \(`mayswi01`\), but you _may not_ include his slugging percentage -- you should calculate that as part of the query. \(Test your query by replacing `mayswi01` with the playerid of another player -- it should work for that player as well! We may do the same in the autograder.\) 159 | 160 | * Note: Make sure that you still only include players with more than 50 at-bats across their lifetime. 161 | 162 | _Just for fun_: For those of you who are baseball buffs, variants of the above queries can be used to find other more detailed SaberMetrics, like [Runs Created](https://en.wikipedia.org/wiki/Runs_created) or [Value Over Replacement Player](https://en.wikipedia.org/wiki/Value_over_replacement_player). Wikipedia has a nice page on [baseball statistics](https://en.wikipedia.org/wiki/Baseball_statistics); most of these can be computed fairly directly in SQL. 163 | 164 | _Also just for fun_: SF Giants VP of Baseball Operations, [Yeshayah Goldfarb](https://www.mlb.com/giants/team/front-office/yeshayah-goldfarb), suggested the following: 165 | 166 | > Using the Lahman database as your guide, make an argument for when MLBs “Steroid Era” started and ended. There are a number of different ways to explore this question using the data. 167 | 168 | \(Please do not include your "just for fun" answers in your solution file! They will break the autograder.\) 169 | 170 | ### Task 4: **Salaries** 171 | 172 | **i.** Find the `yearid`, min, max and average of all player salaries for each year recorded, ordered by `yearid` in _ascending_ order. 173 | 174 | **ii.** For salaries in 2016, compute a [histogram](https://en.wikipedia.org/wiki/Histogram). Divide the salary range into 10 equal bins from min to max, with `binid`s 0 through 9, and count the salaries in each bin. 
Return the `binid`, `low` and `high` boundaries for each bin, as well as the number of salaries in each bin, with results sorted from smallest bin to largest. 175 | 176 | * Note: `binid` 0 corresponds to the lowest salaries, and `binid` 9 corresponds to the highest. The ranges are left-inclusive \(i.e. `[low, high)`\) -- so the `high` value is excluded. For example, if bin 2 has a `high` value of 100000, salaries of 100000 belong in bin 3, and bin 3 should have a `low` value of 100000. 177 | * Note: The `high` value for bin 9 may be inclusive. 178 | * Note: The test for this question is broken into two parts. Use `python3 test.py -q 4ii_bins_0_to_8` and `python3 test.py -q 4ii_bin_9` to run the tests. 179 | * Hidden testing advice: we will be testing the case where a bin has zero player salaries in it. The correct behavior in this case is to display the correct `binid`, `low` and `high` with a `count` of zero, NOT just excluding the bin altogether. 180 | 181 | Some useful information: 182 | 183 | * You may find it helpful to use the provided helper table containing all the possible `binid`s. We'll only be testing with these possible `binid`s \(there aren't any hidden tests using, say, 100 bins\), so using the hardcoded table is fine. 184 | * If you want to take the [floor](https://en.wikipedia.org/wiki/Floor_and_ceiling_functions) of a positive float value you can do `CAST (some_value AS INT)`. 185 | 186 | **iii.** Now let's compute the Year-over-Year change in min, max and average player salary. For each year with recorded salaries after the first, return the `yearid`, `mindiff`, `maxdiff`, and `avgdiff` with respect to the previous year. Order the output by `yearid` in _ascending_ order. \(You should omit the very first year of recorded salaries from the result.\) 187 | 188 | **iv.** In 2001, the max salary went up by over $6 million. Write a query to find the players that had the max salary in 2000 and 2001. 
Return the `playerid`, `namefirst`, `namelast`, `salary` and `yearid` for those two years. If multiple players tied for the max salary in a year, return all of them. 189 | 190 | * Note on notation: you are computing a relational variant of the [argmax](https://en.wikipedia.org/wiki/Arg_max) for each of those two years. 191 | 192 | **v.** Each team has at least 1 All Star and may have multiple. For each team in the year 2016, give the `teamid` and `diffAvg` \(the difference between the team's highest paid all-star's salary and the team's lowest paid all-star's salary\). 193 | 194 | * Note: Due to some discrepancies in the database, please draw your team names from the All-Star table \(so use `allstarfull.teamid` in the SELECT statement for this\). 195 | 196 | ## You're done! 197 | 198 | Rerun `python3 test.py` to see if you're passing tests. If so, follow the instructions in the next section to submit your work. 199 | 200 | -------------------------------------------------------------------------------- /project-handout/proj2.md: -------------------------------------------------------------------------------- 1 | # Project 2: B+ Trees 2 | 3 | This assignment will be released on **Tuesday, 2/4/2020**. 4 | 5 | -------------------------------------------------------------------------------- /project-handout/proj2/README.md: -------------------------------------------------------------------------------- 1 | # Project 2: B+ Trees 2 | 3 | -------------------------------------------------------------------------------- /project-handout/proj2/getting-started.md: -------------------------------------------------------------------------------- 1 | # Getting Started 2 | 3 | ## Logistics 4 | 5 | This project is due **Thursday, 2/18/2021 at 11:59PM PST (GMT-8)**. It is worth 6% of your overall grade in the class. The workload for the project is designed to be completed solo, but this semester we're allowing students to work on this project with a partner if you want to. 
Feel free to search for a partner on [this Piazza thread](https://piazza.com/class/kjoxqrf1eq04mr?cid=5)! 6 | 7 | ## Prerequisites 8 | 9 | You should watch the B+ Trees lectures before working on this project. 10 | 11 | ## Academic Integrity Policy 12 | 13 | “_As a member of the UC Berkeley community, I act with honesty, integrity, and respect for others._” — UC Berkeley Honor Code 14 | 15 | **Read through the academic integrity guidelines** [**here**](https://piazza.com/class/kjoxqrf1eq04mr?cid=8)**.** We will be running plagiarism detection software on every submission against our own database of this semester's submissions, past submissions, and publicly hosted implementations on platforms such as GitHub and GitLab, followed by a thorough manual review process. Plagiarism on any assignment will result in a [non-reportable warning](https://sa.berkeley.edu/student-code-of-conduct-section6) and a grade penalty based on the severity of the infraction. 16 | 17 | As long as you follow the guidelines there isn't anything to worry about here. While we do rely on software to find possible cases of academic dishonesty, every case is reviewed by multiple TAs who can filter out false positives. 18 | 19 | ## Fetching the released code 20 | 21 | The GitHub Classroom link for this project is in the Project 2 release post on [Piazza](https://piazza.com/class/kjoxqrf1eq04mr). Once your private repo is set up, clone the Project 2 skeleton code onto your local machine. You'll be working off of a fresh copy of the RookieDB skeleton instead of reusing the one from Project 0. 22 | 23 | ### Setting up your local development environment 24 | 25 | If you're using IntelliJ you can follow the instructions [in Project 0](../proj0/getting-started.md#setting-up-your-local-development-environment) to set up your local environment again. Once you have your environment set up you can head to the next section [Your Tasks](your-tasks.md) and begin working on the assignment. 
26 | 27 | ## Adding a partner 28 | 29 | Once you've found a partner fill out [**this form**](https://forms.gle/sJsPSCZaaeKgTJya9) so we know who you're working with. If you want to share code over GitHub you can follow the instructions [here](../../common/adding-a-partner-on-github.md). 30 | 31 | ## Debugging Issues with GitHub Classroom 32 | 33 | Feel free to skip this section if you don't have any issues with GitHub Classroom. If you are having issues \(i.e. the page froze or some error message appeared\), first check if you have access to your repo at `https://github.com/berkeley-cs186-student/sp21-proj2-username`, replacing `username` with your GitHub username. If you have access to your repo and the starter code is there, then you can proceed as usual. If you have access to your repo but the starter code is not there, run the following commands in a terminal \(again replacing `username` with your GitHub username\): 34 | 35 | ```text 36 | git clone https://github.com/berkeley-cs186/sp21-rookiedb sp21-proj2 37 | cd sp21-proj2/ 38 | git remote remove origin 39 | git remote add origin https://github.com/berkeley-cs186-student/sp21-proj2-username.git 40 | git push -u origin master 41 | ``` 42 | 43 | Then, you can proceed as usual. 44 | 45 | #### 404 Not Found 46 | 47 | If you're getting a 404 not found page when trying to access your repo, make sure you've set up your repo using the GitHub Classroom link in the Project 2 release post on [Piazza](https://piazza.com/class/kjoxqrf1eq04mr). 48 | 49 | If you don't have access to your repo at all after following these steps, feel free to contact the course staff on Piazza. 
50 | 51 | -------------------------------------------------------------------------------- /project-handout/proj2/submission.md: -------------------------------------------------------------------------------- 1 | # Submitting the Assignment 2 | 3 | ## Files 4 | 5 | You may **not** modify the signature of any methods or classes that we provide to you, but you're free to add helper methods. 6 | 7 | You should make sure that all code you modify belongs to files with `TODO(proj2)` comments in them \(e.g. don't add helper methods to DataBox\). A full list of files that you may modify follows: 8 | 9 | * `src/main/java/edu/berkeley/cs186/database/index/BPlusTree.java` 10 | * `src/main/java/edu/berkeley/cs186/database/index/InnerNode.java` 11 | * `src/main/java/edu/berkeley/cs186/database/index/LeafNode.java` 12 | * `src/main/java/edu/berkeley/cs186/database/index/BPlusNode.java` \(Optional\) 13 | 14 | Make sure that your code does _not_ use any \(non-final\) static variables -- this may cause odd behavior when running with the autograder vs. in your IDE \(tests run through the IDE often run with a new instance of Java for each test, so the static variables get reset, but multiple tests per Java instance may be run when using maven, where static variables _do not_ get reset\). 15 | 16 | ## Gradescope 17 | 18 | Once all of your files are prepared in your repo you can submit to Gradescope through GitHub the same way you did for [Project 0](../proj0/submitting.md#pushing-changes-to-github-classroom). 19 | 20 | ## Submitting via upload 21 | 22 | If your GitHub account has access to many repos, the Gradescope UI might time out while trying to load which repos you have available. If this is the case for you, you can submit your code directly via upload. You can zip up your source code with `zip -r submission.zip src/` and submit that directly to the autograder. 
23 | 24 | ## Partners 25 | 26 | If you haven't already, be sure to fill out [this form](https://forms.gle/sJsPSCZaaeKgTJya9) so we know who you're working with. Every student is responsible for submitting to Gradescope individually -- if you submit but your partner doesn't, then your partner will not get credit. If you worked off of a shared repo both members of the group are free to submit that repo. Slip days will be deducted individually. For example: You submit on time, but your partner submits a day late. Your partner will have to use a slip day or will receive a late penalty on the project \(but you will not\). 27 | 28 | ## Grade breakdown 29 | 30 | * 60% of your grade will be made up of tests released to you \(the tests that we provided in 31 | 32 | `database.index.*`\). 33 | 34 | * 40% of your grade will be made up of hidden, unreleased tests that we will run on your submission after the deadline. 35 | 36 | -------------------------------------------------------------------------------- /project-handout/proj2/testing.md: -------------------------------------------------------------------------------- 1 | # Testing 2 | 3 | We strongly encourage testing your code yourself. The given tests for this project are not comprehensive tests: it is possible to write incorrect code that passes them all \(and still not get a full score\). 4 | 5 | Things that you might consider testing for include: anything that we specify in the comments or in this document that a method should do that you don't see a test already testing for, and any edge cases that you can think of. Think of what valid inputs might break your code and cause it not to perform as intended, and add a test to make sure things are working. 6 | 7 | We've put together a [video to introduce you to the testing framework](https://drive.google.com/drive/folders/1VeqJHtAJ0fFcGvusLjXa-wyKzZ_TKb8L). 
8 | 9 | To help you get started, here is one case that is _not_ in the given tests \(and will be included in the hidden tests\): everything should work properly even if we delete everything from the BPlusTree. For example, after everything is deleted and a new iterator is created, the new iterator should immediately return false for `hasNext`. 10 | 11 | \(Note that we don't allow mutating the tree during a scan, so the behavior of an iterator created before everything was deleted is undefined, and can be handled however you like or not at all\). 12 | 13 | To add a unit test, open up the appropriate test file \(in `src/test/java/edu/berkeley/cs186/database/index`\) and simply add a new method to the file with a `@Test` annotation, for example: 14 | 15 | ```java 16 | @Test 17 | public void testEverythingDeleted() { 18 | // your test code here 19 | } 20 | ``` 21 | 22 | Many test classes have some setup code done for you already: take a look at other tests in the file for an idea of how to write the test code. 23 | 24 | -------------------------------------------------------------------------------- /project-handout/proj2/your-tasks.md: -------------------------------------------------------------------------------- 1 | # Your Tasks 2 | 3 | ![Datarake](../../.gitbook/assets/b_tree.jpg) 4 | 5 | In this project you'll be implementing B+ tree indices. Since you'll be diving into the code base for the first time we've provided an introduction to the existing skeleton code. 6 | 7 | ## Understanding the Skeleton Code 8 | 9 | ### DataBox 10 | 11 | Every modern database supports a variety of data types to use in records, and RookieDB is no exception. For consistency and convenience most implementations choose to have their own internal representation of their data types built on top of the implementation language's defaults. In RookieDB we represent them using data boxes. 
12 | 13 | A data box can contain data of the following types: `Boolean` \(1 byte\), `Int` \(4 bytes\), `Float` \(4 bytes\), `Long` \(8 bytes\) and `String(N)` \(N bytes\). For this project you'll be working with the abstract [`DataBox`](https://github.com/berkeley-cs186/sp21-rookiedb/blob/master/src/main/java/edu/berkeley/cs186/database/databox/DataBox.java) class which implements `Comparable`. You may find it useful to review how the [Comparable interface works](https://docs.oracle.com/javase/8/docs/api/java/lang/Comparable.html) for this project. 14 | 15 | ### RecordId 16 | 17 | A record in a table is uniquely identified by its page number \(the number of the page on which it resides\) and its entry number \(the record's index on the page\). These two numbers \(pageNum, entryNum\) comprise a [`RecordId`](https://github.com/berkeley-cs186/sp21-rookiedb/blob/master/src/main/java/edu/berkeley/cs186/database/table/RecordId.java). For this project we'll be using record IDs in our leaf nodes as pointers to records in the data pages. 18 | 19 | ### Index 20 | 21 | The [`index`](https://github.com/berkeley-cs186/sp21-rookiedb/tree/master/src/test/java/edu/berkeley/cs186/database/index%20) directory contains a partial implementation of an Alternative 2 B+ tree, an implementation that you will complete in this project. Some of the important files in this directory are: 22 | 23 | * [`BPlusTree.java`](https://github.com/berkeley-cs186/sp21-rookiedb/blob/master/src/main/java/edu/berkeley/cs186/database/index/BPlusTree.java) - This file contains the class that manages the structure of the B+ tree. Every B+ tree maps keys of a type `DataBox` \(a single value or "cell" in a table\) to values of type `RecordId` \(identifiers for records on data pages\). 
An example of inserting and retrieving records using keys can be found in the comments at [`@BPlusTree.java#L12`](https://github.com/berkeley-cs186/sp21-rookiedb/blob/master/src/main/java/edu/berkeley/cs186/database/index/BPlusTree.java#L124) 24 | * [`BPlusNode.java`](https://github.com/berkeley-cs186/sp21-rookiedb/blob/master/src/main/java/edu/berkeley/cs186/database/index/BPlusNode.java) - A B+ node represents a node in the B+ tree, and contains similar methods to `BPlusTree` such as `get`, `put` and `delete`. `BPlusNode` is an abstract class and is implemented as either a `LeafNode` or an `InnerNode`. 25 | * [`LeafNode.java`](https://github.com/berkeley-cs186/sp21-rookiedb/blob/master/src/main/java/edu/berkeley/cs186/database/index/LeafNode.java) - A leaf node is a node with no descendants that contains pairs of keys and Record IDs that point to the relevant records in the table, as well as a pointer to its right sibling. More details can be found at [`@LeafNode.java#L15`](https://github.com/berkeley-cs186/sp21-rookiedb/blob/master/src/main/java/edu/berkeley/cs186/database/index/LeafNode.java#L15) 26 | * [`InnerNode.java`](https://github.com/berkeley-cs186/sp21-rookiedb/blob/master/src/main/java/edu/berkeley/cs186/database/index/InnerNode.java) - An inner node is a node that stores keys and pointers \(page numbers\) to child nodes \(which themselves may either be an inner node or a leaf node\). More details can be found at [`@InnerNode.java#L15`](https://github.com/berkeley-cs186/sp21-rookiedb/blob/master/src/main/java/edu/berkeley/cs186/database/index/InnerNode.java#L15) 27 | * [`BPlusTreeMetadata.java`](https://github.com/berkeley-cs186/sp21-rookiedb/blob/master/src/main/java/edu/berkeley/cs186/database/index/BPlusTreeMetadata.java) - This file contains a class that stores useful information such as the order and height of the tree. You can access instances of this class using the `this.metadata` instance variable available in all of the classes listed above. 
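
To make the key-to-record-ID mapping described above concrete, here is a minimal sketch that uses a `java.util.TreeMap` as a stand-in for the B+ tree. `ToyRecordId` and `ToyIndex` are hypothetical names invented for this illustration and are not part of the RookieDB skeleton; the field types and the duplicate-key check are assumptions chosen only to mirror the description, so treat this as a picture of the interface rather than of the real implementation.

```java
import java.util.Optional;
import java.util.TreeMap;

// Hypothetical stand-in for a record ID: a (pageNum, entryNum) pair.
final class ToyRecordId {
    final long pageNum;
    final int entryNum;

    ToyRecordId(long pageNum, int entryNum) {
        this.pageNum = pageNum;
        this.entryNum = entryNum;
    }

    @Override
    public String toString() {
        return "(" + pageNum + ", " + entryNum + ")";
    }
}

// Hypothetical in-memory index: the TreeMap plays the role of the B+ tree,
// keeping keys ordered by their Comparable implementation.
final class ToyIndex<K extends Comparable<K>> {
    private final TreeMap<K, ToyRecordId> entries = new TreeMap<>();

    // Rejects duplicate keys instead of silently overwriting them.
    void put(K key, ToyRecordId rid) {
        if (entries.containsKey(key)) {
            throw new IllegalArgumentException("duplicate key: " + key);
        }
        entries.put(key, rid);
    }

    // An absent key yields Optional.empty() rather than null.
    Optional<ToyRecordId> get(K key) {
        return Optional.ofNullable(entries.get(key));
    }

    void remove(K key) {
        entries.remove(key);
    }
}
```

A caller would write something like `index.put(4, new ToyRecordId(1, 2))` and then check `index.get(4).isPresent()` before reading the record ID. The real tree exposes a similar key-in, record-ID-out shape, but stores its entries in pages rather than in a Java collection.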
28 | 29 | #### Implementation Details 30 | 31 | You should read through all of the code in the [`index`](https://github.com/berkeley-cs186/sp21-rookiedb/tree/master/src/main/java/edu/berkeley/cs186/database/index) directory. Many comments contain critical information on how you must implement certain functions. For example, `BPlusNode::put` specifies how to redistribute entries after a split. You are responsible for reading these comments. Here are a few of the most notable points: 32 | 33 | * Our implementation of B+ trees **does not** support duplicate keys. You will throw an exception whenever a duplicate key is inserted. 34 | * Our implementation of B+ trees assumes that inner nodes and leaf nodes can be serialized on a single page. You **do not** have to support nodes that span multiple pages. 35 | * Our implementation of delete **does not** rebalance the tree. Thus, the invariant that all non-root leaf nodes in a B+ tree of order `d` contain between `d` and `2d` entries is broken. Note that actual B+ trees **do rebalance** after deletion, but we will **not** be implementing rebalancing trees in this project for the sake of simplicity. 36 | 37 | ### LockContext objects 38 | 39 | There are a few parts in this project where a method will take in objects of the type `LockContext`. You do not need to worry too much about these objects right now; they will become more relevant in Project 4. 40 | 41 | If there are any methods you wish to call that require these objects, use the ones passed in to the method you are implementing, or defined in the class of the method you are implementing \(`this.lockContext` for `BPlusTree` and `this.treeContext` for `InnerNode` and `LeafNode`\). 42 | 43 | ### Optional<T> objects 44 | 45 | This part of the project makes extensive use of `Optional` objects. We recommend reading through the documentation [here](https://docs.oracle.com/javase/8/docs/api/java/util/Optional.html) to get a feel for them. 
In particular, we use `Optional`s for values that may not necessarily be present. For example, a call to `get` may not yield any value for a key that doesn't correspond to a record, in which case an `Optional.empty()` would be returned. If the key did correspond to a record, a populated `Optional.of(RecordId(pageNum, entryNum))` would be returned instead. 46 | 47 | ### Project Structure Diagram 48 | 49 | Here's a diagram that shows the structure of the project with color-coded components. You may find it helpful to refer back to this after you start working on the tasks. 50 | 51 | ![\(Click on the image to zoom in\)](../../.gitbook/assets/impldetails.jpg) 52 | 53 | * Green Boxes: functions that you need to implement 54 | * White boxes: next to each function, contains a quick summary of the important points that you need to consider for that function. **To find more detailed descriptions look at the comments of each method**. 55 | * Orange boxes: hints for each function which may point you to helper functions. 56 | 57 | ## Your Tasks 58 | 59 | ### Task 1: LeafNode::fromBytes 60 | 61 | You should first implement the `fromBytes` in `LeafNode`. This method reads a `LeafNode` from a page. For information on how a leaf node is serialized, see `LeafNode::toBytes`. For an example on how to read a node from disk, see `InnerNode::fromBytes`. Your code should be similar to the inner node version but should account for the differences between how inner nodes and leaf nodes are serialized. You may find the documentation in [`ByteBuffer.java`](https://github.com/berkeley-cs186/sp21-rookiedb/blob/master/src/main/java/edu/berkeley/cs186/database/common/ByteBuffer.java#L5) helpful. 62 | 63 | Once you have implemented `fromBytes` you should be passing `TestLeafNode::testToAndFromBytes`. 
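
The general idea behind a `fromBytes` method, reading fields back in exactly the order and width in which they were written, can be sketched with the standard library's `java.nio.ByteBuffer`. The byte layout below (an 8-byte sibling page number, a 4-byte entry count, then fixed-width key/value pairs) and the `ToyLeaf` class are assumptions made up for this example. They are not the skeleton's actual format or buffer classes, so consult `LeafNode::toBytes` for the real layout.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical leaf with int keys and long values, for illustration only.
final class ToyLeaf {
    final long rightSibling; // -1 means "no right sibling" in this toy format
    final List<Integer> keys;
    final List<Long> values;

    ToyLeaf(long rightSibling, List<Integer> keys, List<Long> values) {
        this.rightSibling = rightSibling;
        this.keys = keys;
        this.values = values;
    }

    // Writes: sibling (8 bytes), count (4 bytes), then (key, value) pairs.
    byte[] toBytes() {
        ByteBuffer buf = ByteBuffer.allocate(8 + 4 + keys.size() * (4 + 8));
        buf.putLong(rightSibling);
        buf.putInt(keys.size());
        for (int i = 0; i < keys.size(); i++) {
            buf.putInt(keys.get(i));
            buf.putLong(values.get(i));
        }
        return buf.array();
    }

    // Reads the fields back in the same order and with the same widths.
    static ToyLeaf fromBytes(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        long rightSibling = buf.getLong();
        int n = buf.getInt();
        List<Integer> keys = new ArrayList<>(n);
        List<Long> values = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            keys.add(buf.getInt());
            values.add(buf.getLong());
        }
        return new ToyLeaf(rightSibling, keys, values);
    }
}
```

A round trip such as `ToyLeaf.fromBytes(leaf.toBytes())` should reproduce the original sibling pointer, keys, and values. If it does not, the usual culprit is a field read in the wrong order or with the wrong width.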
64 | 65 | ### Task 2: get, getLeftmostLeaf, put, remove 66 | 67 | After implementing `fromBytes`, you will need to implement the following methods in `LeafNode`, `InnerNode`, and `BPlusTree`: 68 | 69 | * `get` 70 | * `getLeftmostLeaf` \(`LeafNode` and `InnerNode` only\) 71 | * `put` 72 | * `remove` 73 | 74 | For more information on what these methods should do refer to the comments in `BPlusTree` and `BPlusNode`. 75 | 76 | Each of these methods, although split into three different classes, can be viewed as one recursive action each - the `BPlusTree` method starts the call, the `InnerNode` method is the recursive case, and the `LeafNode` method is the base case. It's suggested that you work on one method at a time \(over all three classes\). 77 | 78 | We've provided a `sync()` method in `LeafNode` and `InnerNode`. The purpose of `sync()` is to ensure that the representation of a node in our buffers is up-to-date with the representation of the node in program memory. **Do not forget to call `sync()` when implementing the two mutating methods** \(`put` and `remove`\); it's easy to forget. 79 | 80 | ### Task 3: Scans 81 | 82 | You will need to implement the following methods in `BPlusTree`: 83 | 84 | * `scanAll` 85 | * `scanGreaterEqual` 86 | 87 | To implement these two methods, you will have to complete the [`BPlusTreeIterator`](https://github.com/berkeley-cs186/sp21-rookiedb/blob/912d6248a59d1f27117796d9e4c5d7e6ee194b91/src/main/java/edu/berkeley/cs186/database/index/BPlusTree.java#L413) inner class in `BPlusTree.java`. 88 | 89 | After completing this Task you should be passing `TestBPlusTree::testRandomPuts`. 90 | 91 | Your implementation **does not** have to account for the tree being modified during a scan. 
For the time being you can think of this as there being a lock that prevents scanning and mutation from overlapping, and that the behavior of iterators created before a modification is undefined \(you can handle any problems with these iterators however you like, or not at all\). 92 | 93 | ### Task 4: Bulk Load 94 | 95 | Much like the methods from Task 2, you'll need to implement `bulkLoad` within all three of `LeafNode`, `InnerNode`, and `BPlusTree`. Since bulk loading is a mutating operation you will need to call `sync()`. Be sure to read the instructions in [`BPlusNode::bulkLoad`](https://github.com/berkeley-cs186/sp21-rookiedb/blob/master/src/main/java/edu/berkeley/cs186/database/index/BPlusNode.java#L139) carefully to ensure you split your nodes properly. We've provided a visualization of bulk loading for an order 2 tree with fill factor 0.75 \([powerpoint slides here](https://docs.google.com/presentation/d/1_ghdp60NV6XRHnutFAL20k2no6tr2PosXGokYtR8WwU/edit?usp=sharing)\): 96 | 97 | ![](../../.gitbook/assets/vis%20%281%29%20%281%29%20%282%29%20%283%29%20%283%29%20%281%29.gif) 98 | 99 | After this you should pass all the Project 2 tests we have provided to you \(and any you add yourselves\). These are all the provided tests in [`database.index.*`](https://github.com/berkeley-cs186/sp21-rookiedb/tree/master/src/test/java/edu/berkeley/cs186/database/index). 100 | 101 | ## Debugging 102 | 103 | To help you debug we have implemented the `toDotPDFFile` method of `BPlusTree`. You can add a call to this method in a test to generate a PDF file of your B+ tree. 104 | 105 | For example, 106 | 107 | ```java 108 | BPlusTree tree = ... 109 | tree.toDotPDFFile("tree.pdf"); 110 | ``` 111 | 112 | If you get `Cannot run program "dot"`, you need to install [GraphViz](https://graphviz.gitlab.io/download/). GraphViz is a software package that generates visualizations of network style graphs. 113 | 114 | ## You're done! 
115 | 116 | Move on to the next sections for details on testing and on submitting the assignment. 117 | 118 | -------------------------------------------------------------------------------- /project-handout/proj3.md: -------------------------------------------------------------------------------- 1 | # Project 3: Joins and Query Optimization 2 | 3 | This assignment will be released on **Tuesday, 2/23/2021**. 4 | 5 | -------------------------------------------------------------------------------- /project-handout/proj3/README.md: -------------------------------------------------------------------------------- 1 | # Project 3: Joins and Query Optimization 2 | 3 | -------------------------------------------------------------------------------- /project-handout/proj3/getting-started.md: -------------------------------------------------------------------------------- 1 | # Getting Started 2 | 3 | ## Logistics 4 | 5 | This project is worth 8% of your overall grade in the class. 6 | 7 | * Part 1 is due **Monday, 3/8/2021 at 11:59PM PST (GMT-8)** and will be worth 30% of your score. Your score will be determined by public tests only. 8 | * Part 2 is due **Monday, 3/15/2021 at 11:59PM PDT (GMT-7)** and will be worth the remaining 70% of your score. We'll be running the public tests for Part 2 and all hidden tests for both Part 1 and Part 2 on this submission. 9 | 10 | The workload for the project is designed to be completed solo, but this semester we're allowing students to work on this project with a partner if you want to. Your partner does not have to be the same one as you had for Project 2. Feel free to search for a partner on [this Piazza thread](https://piazza.com/class/kjoxqrf1eq04mr?cid=5)! 11 | 12 | ## Prerequisites 13 | 14 | You'll need to finish both Iterators & Joins lectures to finish Part 1. 15 | 16 | To finish Part 2 you'll need to watch up to the Query Optimization: Costs & Search lecture. 
17 | 18 | ## Fetching the released code 19 | 20 | The GitHub Classroom link for this project is in the Project 3 release post on [Piazza](https://piazza.com/class/kjoxqrf1eq04mr). Once your private repo is set up clone the Project 3 skeleton code onto your local machine. 21 | 22 | ### Setting up your local development environment 23 | 24 | If you're using IntelliJ you can follow the instructions [in Project 0](../proj0/getting-started.md#setting-up-your-local-development-environment) to set up your local environment again. Once you have your environment set up you can head to the next section [Part 0](skeleton-code.md) and begin working on the assignment. 25 | 26 | ## Adding a partner 27 | 28 | Once you've found a partner fill out [this form](https://docs.google.com/forms/d/e/1FAIpQLSdnDiXgyjRRFvm2mQ0G4M1o2xjGMDLpjXlOjbDzfAxPGibvZg/viewform?usp=sf_link) so we know who you're working with. If you want to share code over GitHub you can follow the instructions [here](../../common/adding-a-partner-on-github.md). 29 | 30 | ## Debugging Issues with GitHub Classroom 31 | 32 | Feel free to skip this section if you don't have any issues with GitHub Classroom. If you are having issues \(i.e. the page froze or some error message appeared\), first check if you have access to your repo at `https://github.com/berkeley-cs186-student/sp21-proj3-username`, replacing `username` with your GitHub username. If you have access to your repo and the starter code is there, then you can proceed as usual. If you have access to your repo but the starter code is not there, run the following commands in a terminal \(again replacing `username` with your GitHub username\): 33 | 34 | ```text 35 | git clone https://github.com/berkeley-cs186/sp21-rookiedb sp21-proj3 36 | cd sp21-proj3/ 37 | git remote remove origin 38 | git remote add origin https://github.com/berkeley-cs186-student/sp21-proj3-username.git 39 | git push -u origin master 40 | ``` 41 | 42 | Then, you can proceed as usual. 
43 | 44 | ### 404 Not Found 45 | 46 | If you're getting a 404 not found page when trying to access your repo, make sure you've set up your repo using the GitHub Classroom link in the Project 3 release post on [Piazza](https://piazza.com/class/kjoxqrf1eq04mr). 47 | 48 | If you don't have access to your repo at all after following these steps, feel free to contact the course staff on Piazza. 49 | 50 | -------------------------------------------------------------------------------- /project-handout/proj3/part-1-join-algorithms/README.md: -------------------------------------------------------------------------------- 1 | # Part 1: Join Algorithms 2 | 3 | ![Datatape](../../../.gitbook/assets/datatape.png) 4 | 5 | In this part, you will implement some join algorithms: block nested loop join, sort merge, and grace hash join. You can complete Task 1, Task 2 and Task 3 in **any order you want**. Task 4 is dependent on the completion of Task 3. 6 | 7 | Aside from when the comments tell you that you can do something in memory, everything else should be **streamed**. You should not hold more pages in memory at once than the given algorithm says you are allowed to. Doing otherwise may result in no credit. 8 | 9 | **Note on terminology**: in lecture, we sometimes use both block and page to describe the unit of transfer between memory and disk. In the context of join algorithms, however, page refers to the unit of transfer between memory and disk, and block refers to a set of one or more pages. All uses of the word `block` in this part refer to this second definition \(a set of pages\). 10 | 11 | **Convenient assumptions**: 12 | 13 | * For all iterators that will be implemented in this project you can assume `hasNext()` will always be called before `next()`. 14 | * Any Record object provided through an argument or as an element of a list or iterator will never be `null`. 
15 | * For testing purposes, we will **not** be testing behavior on invalid inputs \(`null` objects, negative buffer sizes or buffers too small to perform a join, invalid queries, etc...\). You can handle these inputs however you want, or not at all. 16 | * Your join operators, sort operator, and query plans do not need to account for underlying relations being mutated during their execution. 17 | 18 | ## Your Tasks 19 | 20 | ### Task 1: Nested Loop Joins 21 | 22 | **Update \(03/04/2021\)**: If you received your copy of the skeleton code before March 1st, then the following docstring for fetchNextRightPage is incorrect: 23 | 24 | "rightPageIterator should be set to a backtracking iterator over up to one page of records from the left source, and leftRecord should be set to the first record in this block." 25 | 26 | and should instead be: 27 | 28 | "rightPageIterator should be set to a backtracking iterator over up to one page of records from **the right source.**" Note that leftRecord should not be set to the first record in the right page. 29 | 30 | #### Simple Nested Loop Join \(SNLJ\) 31 | 32 | SNLJ has already been implemented for you in `SNLJOperator`. You should take a look at it to get a sense for how the pseudocode in lecture and section translate to code, but you should **not** copy it when writing your own join operators. Although each join algorithm should return the same data, the order differs between each join algorithm, as does the structure of the code. In particular, SNLJ does not need to explicitly manage pages of data \(it only ever needs the next record of each table, and therefore can just use an iterator over all records in a table\), whereas all the algorithms you will be implementing in this part must explicitly manage when pages of data are fetched from disk. 33 | 34 | #### Page Nested Loop Join \(PNLJ\) 35 | 36 | PNLJ has already been implemented for you as a special case of BNLJ with B=3. 
Therefore, it will not function properly until BNLJ has been properly implemented. The test cases for both PNLJ and BNLJ in `TestNestedLoopJoin` depend on a properly implemented BNLJ. 37 | 38 | #### Block Nested Loop Join \(BNLJ\) 39 | 40 | You should read through the given skeleton code in `BNLJOperator`. The `next` and `hasNext` methods of the iterator have already been filled out for you, but you will need to implement the `fetchNextRecord` method, which should do most of the heavy lifting of the BNLJ algorithm. 41 | 42 | There are also two suggested helper methods: `fetchNextLeftBlock`, which should fetch the next non-empty block of left table pages from `leftIterator`, and `fetchNextRightPage`, which should fetch the next non-empty page of the right table \(from `rightIterator`\). We suggest breaking up the problem into smaller subproblems, and adding more helper methods than the two suggested ones -- it will make debugging your code much easier. 43 | 44 | The `fetchNextRecord` method should, as its name suggests, fetch the next record of the join output. When implementing this method there are 4 important cases you should consider: 45 | 46 | * Case 1: The right page iterator has a value to yield 47 | * Case 2: The right page iterator doesn't have a value to yield but the left block iterator does 48 | * Case 3: Neither the right page nor left block iterators have values to yield, but there's more right pages 49 | * Case 4: Neither right page nor left block iterators have values nor are there more right pages, but there are still left blocks 50 | 51 | We've provided the following animation to give you a feel for how the blocks, pages, and records are traversed during the nested looping process. Identifying where each of these cases takes place in the diagram may help guide you on what to do in each case.
52 | 53 | ![](../../../.gitbook/assets/bnlj-slower.gif) 54 | 55 | Animations of SNLJ and PNLJ can be found [here](../../../common/misc/nested-loop-join-animations.md). Loaded left records are highlighted in blue, while loaded right records are highlighted in orange. The dark purple square represents which pair of records is being considered for the join, while light purple shows which pairs have already been considered. 56 | 57 | Once you have implemented `BNLJOperator`, all the tests in `TestNestedLoopJoin` should pass. 58 | 59 | ### Task 2: Hash Joins 60 | 61 | #### Simple Hash Join \(SHJ\) 62 | 63 | We've provided an implementation of Simple Hash Join which can be found in `SHJOperator.java`. Simple Hash Join performs a single pass of partitioning on only the left records before attempting to join. Read the code for SHJ carefully as it should give you a good idea of how to implement Grace Hash Join. 64 | 65 | #### Grace Hash Join \(GHJ\) 66 | 67 | Everything you will need to implement will be done in `GHJOperator.java`. You will need to implement the functions `partition`, `buildAndProbe`, and `run`. Additionally, you will have to provide some inputs in `getBreakSHJInputs` and `getBreakGHJInputs` which will be used to test that Simple Hash Join fails but Grace Hash Join passes \(tested in `testBreakSHJButPassGHJ`\) and that GHJ breaks \(tested in `testGHJBreak`\) respectively. 68 | 69 | The file `Partition.java` in the `query/disk` directory will be useful when working with partitions. Read through the file and get a good idea of what methods you can use. 70 | 71 | Once you have implemented all the methods in `GHJOperator.java`, all tests in `TestGraceHashJoin.java` will pass. There will be **no hidden tests** for Grace Hash Join. Your grade for Grace Hash Join will come solely from the released public tests. 72 | 73 | ### Task 3: External Sort 74 | 75 | The first step in Sort Merge Join is to sort both input relations.
Therefore, before you can work on implementing Sort Merge Join, you must first implement an external sorting algorithm. 76 | 77 | Recall that a "run" in the context of external mergesort is just a sequence of sorted records. This is represented in `SortOperator` by the `Run` class \(located in `query/disk/Run.java`\). As runs in external mergesort can span many pages \(and eventually span the entirety of the table\), the `Run` class does not keep all its data in memory. Rather, it creates a temporary table and writes all of its data to the temporary table \(which is materialized to disk at the buffer manager's discretion\). 78 | 79 | You will need to implement the `sortRun`, `mergeSortedRuns`, `mergePass`, and `sort` methods of `SortOperator`. 80 | 81 | * `sortRun(run)` should sort the passed in data using an in-memory sort \(Pass 0 of external mergesort\). 82 | * `mergeSortedRuns(runs)` should return a new run given a list of sorted runs. 83 | * `mergePass(runs)` should perform a single merge pass of external mergesort, given a list of all the sorted runs from the previous pass. 84 | * `sort()` should run external mergesort from start to finish, and return the final run with the sorted data 85 | 86 | Each of these methods may be tested independently, so you **must** implement each one as described. You may add additional helper methods as you see fit. 87 | 88 | Once you have implemented all four methods, all the tests in `TestSortOperator` should pass. 89 | 90 | ### Task 4: Sort Merge Join 91 | 92 | Now that you have a working external sort, you can now implement Sort Merge Join \(SMJ\). 93 | 94 | For simplicity, your implementation of SMJ should _not_ utilize the optimization discussed in lecture in any case \(where the final merge pass of sorting happens at the same time as the join\). Therefore, you should use `SortOperator` to sort during the sort phase of SMJ. 95 | 96 | You will need to implement the `SortMergeIterator` inner class of `SortMergeOperator`. 
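To make the merge phase concrete, here is a minimal sketch that joins two already-sorted lists of integer keys. It is only an illustration under stated assumptions: the class name `SortMergeJoinSketch` is hypothetical, the inputs are in-memory lists rather than sorted runs, and it produces the whole output at once instead of yielding one record per call the way an iterator would. The part worth noticing is the mark on the right side, which lets every left record with a repeated key rescan the same group of right records.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of the skeleton code.
class SortMergeJoinSketch {
    /** Equijoin of two lists that are already sorted in ascending order. */
    static List<int[]> join(List<Integer> left, List<Integer> right) {
        List<int[]> out = new ArrayList<>();
        int i = 0; // position in the left input
        int j = 0; // position in the right input
        while (i < left.size() && j < right.size()) {
            int l = left.get(i);
            int r = right.get(j);
            if (l < r) {
                i++; // left key is too small, advance left
            } else if (l > r) {
                j++; // right key is too small, advance right
            } else {
                int mark = j; // start of the matching group on the right
                while (i < left.size() && left.get(i) == l) {
                    // Rewind to the mark for every left record with this key.
                    for (j = mark; j < right.size() && right.get(j) == l; j++) {
                        out.add(new int[]{left.get(i), right.get(j)});
                    }
                    i++;
                }
            }
        }
        return out;
    }
}
```

For example, joining `[1, 2, 2, 4]` with `[2, 2, 3, 4]` produces four `(2, 2)` pairs followed by one `(4, 4)` pair.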
97 | 98 | Your implementation in `SortMergeOperator` and your implementation of `SortOperator` may be tested independently. You **must not** use any method of `SortOperator` in `SortMergeOperator`, aside from the public methods given in the skeleton \(in other words: don't add a new public method to `SortOperator` and call it from `SortMergeOperator`\). 99 | 100 | Once you have implemented `SortMergeIterator`, all the tests in `TestSortMergeJoin` should pass. 101 | 102 | ## Submission 103 | 104 | Follow the submission instructions [here](../submitting-the-assignment.md) for the Project 3 Part 1 assignment on Gradescope. If you completed everything you should be passing all the tests in the following files: 105 | 106 | * `database.query.TestNestedLoopJoin` 107 | * `database.query.TestGraceHashJoin` 108 | * `database.query.TestSortOperator` 109 | * `database.query.TestSortMergeJoin` 110 | 111 | -------------------------------------------------------------------------------- /project-handout/proj3/part-1-join-algorithms/task-1-debugging.md: -------------------------------------------------------------------------------- 1 | # Task 1 Debugging 2 | 3 | **Update \(03/05/2021\):** If you received your copy of the skeleton code before March 6th, then the [following line](https://github.com/berkeley-cs186/sp21-rookiedb/commit/0f9805a8c7e3417f9dcd4eba873c9b6d51af7b03) in ExtraNLJTests should be changed. This doesn't affect the way the tests run, but will make print debugging less confusing. 4 | 5 | We put together some extra tests with detailed error outputs that should give you some hints as to what might be going wrong with your BNLJ implementation. They're meant to be easier to reason about than the main BNLJ tests since each page only has 4 records instead of 400. **These tests are ungraded**. They're just meant to help you track down bugs in the nested loop join tests in `TestNestedLoopJoin`.
6 | 7 | ## Overview 8 | 9 | These tests are designed to give you visualizations that might hint as to where you're going wrong. **You should try to get the test cases working in order**, that is, start with the 1x1 PNLJ tests, followed by the 2x2 PNLJ tests, and then finally the 2x2 BNLJ tests. When you fail a test it should give you a detailed description of why you failed. Here's some example output from failing `testPNLJ1x1Full`: 10 | 11 | ```text 12 | edu.berkeley.cs186.database.query.QueryPlanException: 13 | == MISSING OR EXTRA RECORDS == 14 | +---------+ 15 | Left 0 | ? ? ? ? | 16 | Page 0 | x x x x | 17 | #1 0 | x x x x | 18 | 0 | x x x x | 19 | +---------+ 20 | 0 0 0 0 21 | Right 22 | Page #1 23 | 24 | You either excluded or included records when you shouldn't have. Key: 25 | - x means we expected this record to be included and you included it 26 | - + means we expected this record to be excluded and you included it 27 | - ? means we expected this record to be included and you excluded it 28 | - r means you included this record multiple times 29 | - a blank means we expected this record to be excluded and you excluded it 30 | ``` 31 | 32 | In this example we expect every single record in the left table to be joined with every single record in the right table. The question marks on the top row of the box tell you that you're missing 4 records. A likely reason is that your join logic exits too early, before the last left record is ever compared against the right records. In this particular case, the cause is stopping iteration as soon as `!this.leftRecordIterator.hasNext()`, before considering the last left record against any right records.
33 | 34 | Here's a more complicated case that we see in office hours a lot in testPNLJ2x2Full : 35 | 36 | ```text 37 | edu.berkeley.cs186.database.query.QueryPlanException: 38 | == MISMATCH == 39 | +---------+---------+ 40 | Left 0 | | | 41 | Page 0 | | | 42 | #2 0 | | | 43 | 0 | | | 44 | +---------+---------+ 45 | Left 0 | x x x x | A | 46 | Page 0 | x x x x | | 47 | #1 0 | x x x x | | 48 | 0 | x x x x | E | 49 | +---------+---------+ 50 | 0 0 0 0 0 0 0 0 51 | Right Right 52 | Page #1 Page #2 53 | 54 | You had 1 or more mismatched records. The first mismatch 55 | was at record #17. The above shows the state of 56 | the join when the mismatch occurred. Key: 57 | - x means your join properly yielded this record at the right time 58 | - E was the record we expected you to yield 59 | - A was the record that you actually yielded 60 | ``` 61 | 62 | This example found a record returned in the wrong order. To help you debug we give the position of where we expected the next record to be, and where it actually was. Can you spot the bug? We were expecting the first record on right page \#2 to be compared with the first record in left page \#1. It appears that leftRecord was still set to the last record on page \#1. The mistake was that the leftRecord wasn't reset back to the first record in the left page. Many students will remember to call `leftIterator.reset()`, but forget to do `leftRecord = leftIterator.next()` afterwards, causing this issue. 63 | 64 | ## Animations 65 | 66 | Here's some animations of how we expect each test format to be traversed. 
67 | 68 | ### PNLJ 1x1 69 | 70 | ![](../../../.gitbook/assets/1x1%20%283%29%20%284%29.gif) 71 | 72 | ### PNLJ 2x2 73 | 74 | ![](../../../.gitbook/assets/2x2pnlj%20%281%29%20%281%29.gif) 75 | 76 | ### BNLJ 2x2 \(B=4\) 77 | 78 | ![](../../../.gitbook/assets/2x2bnlj%20%284%29%20%284%29%20%283%29.gif) 79 | 80 | ## Cases 81 | 82 | Here are examples of what the cases mentioned in the spec look like in the PNLJ 2x2 tests \(block size of 1\). The dark purple square is the most recently considered record. The red arrow points to the next pair of records that should be considered for the join. 83 | 84 | ![](../../../.gitbook/assets/cases%20%281%29.png) 85 | 86 | Try to think about what should be advanced and what should be reset in each case. As a reminder: 87 | 88 | * Case 1: The right page iterator has a value to yield 89 | * Case 2: The right page iterator doesn't have a value to yield but the left block iterator does 90 | * Case 3: Neither the right page nor left block iterators have values to yield, but there's more right pages 91 | * Case 4: Neither right page nor left block iterators have values nor are there more right pages, but there are still left blocks 92 | 93 | ## Common Errors 94 | 95 | ### PNLJ 1x1 Full 96 | 97 | ```text 98 | == MISSING OR EXTRA RECORDS == 99 | +---------+ 100 | Left 0 | x ? x ? | 101 | Page 0 | x ? x ? | 102 | #1 0 | x ? x ? | 103 | 0 | x ? x ? | 104 | +---------+ 105 | 0 0 0 0 106 | Right 107 | Page #1 108 | 109 | You either excluded or included records when you shouldn't have. Key: 110 | - x means we expected this record to be included and you included it 111 | - + means we expected this record to be excluded and you included it 112 | - ?
means we expected this record to be included and you excluded it 113 | - r means you included this record multiple times 114 | - a blank means we expected this record to be excluded and you excluded it 115 | ``` 116 | 117 | The above case is likely happening because you're calling `rightRecordIterator.next()` more often than you should, and losing every other value. Make sure whenever you call `rightRecordIterator.next()` that you compare the result to the current left record and set it as the next record if there's a match. 118 | 119 | ```text 120 | == MISMATCH == 121 | +---------+ 122 | Left 0 | | 123 | Page 0 | | 124 | #1 0 | E | 125 | 0 | A x x x | 126 | +---------+ 127 | 0 0 0 0 128 | Right 129 | Page #1 130 | 131 | You had 1 or more mismatched records. The first mismatch 132 | was at record #5. The above shows the state of 133 | the join when the mismatch occurred. Key: 134 | - x means your join properly yielded this record at the right time 135 | - E was the record we expected you to yield 136 | - A was the record that you actually yielded 137 | ``` 138 | 139 | The above case is most likely caused by failing to advance the left record in case 2. Remember that even if you call `leftRecordIterator.next()`, `leftRecord` won't get updated unless you assign the result to it. 140 | 141 | ```text 142 | edu.berkeley.cs186.database.query.QueryPlanException: 143 | == MISSING OR EXTRA RECORDS == 144 | +---------+ 145 | Left 0 | ? ? ? ? | 146 | Page 0 | x x x x | 147 | #1 0 | x x x x | 148 | 0 | x x x x | 149 | +---------+ 150 | 0 0 0 0 151 | Right 152 | Page #1 153 | 154 | You either excluded or included records when you shouldn't have. Key: 155 | - x means we expected this record to be included and you included it 156 | - + means we expected this record to be excluded and you included it 157 | - ?
means we expected this record to be included and you excluded it 158 | - r means you included this record multiple times 159 | - a blank means we expected this record to be excluded and you excluded it 160 | ``` 161 | 162 | The above case is likely caused by stopping iteration too early, specifically as soon as `!leftRecordIterator.hasNext()`. Remember that even if there isn't another left record, you still have to compare the current left record against every right record in `rightRecordIterator`. 163 | 164 | ```text 165 | edu.berkeley.cs186.database.query.QueryPlanException: 166 | == MISMATCH == 167 | +---------+ 168 | Left 0 | | 169 | Page 0 | | 170 | #1 0 | A | 171 | 0 | x x x E | 172 | +---------+ 173 | 0 0 0 0 174 | Right 175 | Page #1 176 | 177 | You had 1 or more mismatched records. The first mismatch 178 | was at record #4. The above shows the state of 179 | the join when the mismatch occurred. Key: 180 | - x means your join properly yielded this record at the right time 181 | - E was the record we expected you to yield 182 | - A was the record that you actually yielded 183 | 184 | == MISSING OR EXTRA RECORDS == 185 | +---------+ 186 | Left 0 | x x x ? | 187 | Page 0 | x x x ? | 188 | #1 0 | x x x ? | 189 | 0 | x x x ? | 190 | +---------+ 191 | 0 0 0 0 192 | Right 193 | Page #1 194 | 195 | You either excluded or included records when you shouldn't have. Key: 196 | - x means we expected this record to be included and you included it 197 | - + means we expected this record to be excluded and you included it 198 | - ? means we expected this record to be included and you excluded it 199 | - r means you included this record multiple times 200 | - a blank means we expected this record to be excluded and you excluded it 201 | ``` 202 | 203 | In the above case you're probably handling case 2 too early, before you ever compare the last right record to the current left record.
Make sure that by the time you handle case 2, you've already handled case 1 for the last right record. 204 | 205 | ### PNLJ 2x2 Full 206 | 207 | ```text 208 | == MISMATCH == 209 | +---------+---------+ 210 | Left 0 | | | 211 | Page 0 | | | 212 | #2 0 | | | 213 | 0 | | | 214 | +---------+---------+ 215 | Left 0 | x x x x | A | 216 | Page 0 | x x x x | | 217 | #1 0 | x x x x | | 218 | 0 | x x x x | E | 219 | +---------+---------+ 220 | 0 0 0 0 0 0 0 0 221 | Right Right 222 | Page #1 Page #2 223 | 224 | You had 1 or more mismatched records. The first mismatch 225 | was at record #17. The above shows the state of 226 | the join when the mismatch occurred. Key: 227 | - x means your join properly yielded this record at the right time 228 | - E was the record we expected you to yield 229 | - A was the record that you actually yielded 230 | ``` 231 | 232 | In the above case you're probably not handling case 3 properly. In particular, make sure that when you run out of both left records and right records for a given left block and right page respectively that you call `leftIterator.reset()` AND assign `leftRecord` to the first record of the current page. Many students forget to reassign `leftRecord`. 233 | 234 | ```text 235 | +---------+---------+ 236 | Left 0 | | | 237 | Page 0 | | | 238 | #2 0 | | A | 239 | 0 | E | | 240 | +---------+---------+ 241 | Left 0 | x x x x | x x x x | 242 | Page 0 | x x x x | x x x x | 243 | #1 0 | x x x x | x x x x | 244 | 0 | x x x x | x x x x | 245 | +---------+---------+ 246 | 0 0 0 0 0 0 0 0 247 | Right Right 248 | Page #1 Page #2 249 | 250 | You had 1 or more mismatched records. The first mismatch 251 | was at record #33. The above shows the state of 252 | the join when the mismatch occurred.
Key: 253 | - x means your join properly yielded this record at the right time 254 | - E was the record we expected you to yield 255 | - A was the record that you actually yielded 256 | ``` 257 | 258 | In the above case make sure that by the end of case 4 you've set rightRecordIterator to be an iterator over right page \#1. 259 | 260 | ```text 261 | edu.berkeley.cs186.database.query.QueryPlanException: 262 | == MISMATCH == 263 | +---------+---------+ 264 | Left 0 | | | 265 | Page 0 | | | 266 | #2 0 | | | 267 | 0 | E | | 268 | +---------+---------+ 269 | Left 0 | A x x x | x x x x | 270 | Page 0 | x x x x | x x x x | 271 | #1 0 | x x x x | x x x x | 272 | 0 | x x x x | x x x x | 273 | +---------+---------+ 274 | 0 0 0 0 0 0 0 0 275 | Right Right 276 | Page #1 Page #2 277 | 278 | You had 1 or more mismatched records. The first mismatch 279 | was at record #33. The above shows the state of 280 | the join when the mismatch occurred. Key: 281 | - x means your join properly yielded this record at the right time 282 | - E was the record we expected you to yield 283 | - A was the record that you actually yielded 284 | ``` 285 | 286 | In the above case make sure that by the end of case 4 you've set leftRecordIterator to be an iterator over left page \#2. 287 | 288 | ```text 289 | == MISSING OR EXTRA RECORDS == 290 | +---------+---------+ 291 | Left 0 | ? ? ? ? | ? ? ? ? | 292 | Page 0 | ? ? ? ? | ? ? ? ? | 293 | #2 0 | ? ? ? ? | ? ? ? ? | 294 | 0 | ? ? ? ? | ? ? ? ? | 295 | +---------+---------+ 296 | Left 0 | x x x x | x x x x | 297 | Page 0 | x x x x | x x x x | 298 | #1 0 | x x x x | x x x x | 299 | 0 | x x x x | x x x x | 300 | +---------+---------+ 301 | 0 0 0 0 0 0 0 0 302 | Right Right 303 | Page #1 Page #2 304 | 305 | You either excluded or included records when you shouldn't have. Key: 306 | - x means we expected this record to be included and you included it 307 | - + means we expected this record to be excluded and you included it 308 | - ? 
means we expected this record to be included and you excluded it 309 | - r means you included this record multiple times 310 | - a blank means we expected this record to be excluded and you excluded it 311 | ``` 312 | 313 | In the above case you're probably doing something wrong in case 4. In particular make sure that your code resets your right record iterator to be an iterator over the first page of the right relation. Remember that you'll need to reset your `rightIterator` to do this! 314 | 315 | ```text 316 | == MISSING OR EXTRA RECORDS == 317 | +---------+---------+ 318 | Left 0 | ? ? ? ? | ? ? ? ? | 319 | Page 0 | ? ? ? ? | ? ? ? ? | 320 | #2 0 | ? ? ? ? | ? ? ? ? | 321 | 0 | ? ? ? ? | ? ? ? ? | 322 | +---------+---------+ 323 | Left 0 | x x x x | ? ? ? ? | 324 | Page 0 | x x x x | ? ? ? ? | 325 | #1 0 | x x x x | ? ? ? ? | 326 | 0 | x x x x | ? ? ? ? | 327 | +---------+---------+ 328 | 0 0 0 0 0 0 0 0 329 | Right Right 330 | Page #1 Page #2 331 | 332 | You either excluded or included records when you shouldn't have. Key: 333 | - x means we expected this record to be included and you included it 334 | - + means we expected this record to be excluded and you included it 335 | - ? means we expected this record to be included and you excluded it 336 | - r means you included this record multiple times 337 | - a blank means we expected this record to be excluded and you excluded it 338 | ``` 339 | 340 | In the above case you likely have a problem in your implementation of case 3, and your code is terminating too early. One possible cause of this is forgetting to mark the beginning of your `leftRecordIterator` in `fetchNextLeftBlock`. This could cause problems when you try to reset `leftRecordIterator` in case 3, causing a `NoSuchElementException` to be thrown earlier than you intend.
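If it helps to have the expected traversal order written down, the sketch below lists the (left record, right record) index pairs in the order the tests above expect them to be considered. It is only an illustration: the class `NestedLoopOrderSketch` and its parameters are hypothetical, it assumes a left block holds B - 2 pages (consistent with PNLJ being BNLJ with B=3), and it deliberately uses plain loops over indices rather than the iterators and helper methods in the skeleton.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of the skeleton code.
class NestedLoopOrderSketch {
    /** Returns {leftIndex, rightIndex} pairs in the order a block nested loop join visits them. */
    static List<int[]> traversalOrder(int leftCount, int rightCount,
                                      int recordsPerPage, int numBuffers) {
        int blockSize = (numBuffers - 2) * recordsPerPage; // assumed: B - 2 pages per left block
        List<int[]> order = new ArrayList<>();
        for (int blockStart = 0; blockStart < leftCount; blockStart += blockSize) {
            int blockEnd = Math.min(blockStart + blockSize, leftCount);
            for (int pageStart = 0; pageStart < rightCount; pageStart += recordsPerPage) {
                int pageEnd = Math.min(pageStart + recordsPerPage, rightCount);
                // Within one (left block, right page) pairing: each left record
                // is compared against every record of the right page.
                for (int l = blockStart; l < blockEnd; l++) {
                    for (int r = pageStart; r < pageEnd; r++) {
                        order.add(new int[]{l, r});
                    }
                }
            }
        }
        return order;
    }
}
```

With 8 records per table, 4 records per page, and B=3 (the PNLJ 2x2 layout), the 17th pair is the first left record with the first record of right page \#2, and the 33rd pair is the first record of left page \#2 with the first record of right page \#1, matching the expected positions in the mismatch diagrams above.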
341 | 342 | -------------------------------------------------------------------------------- /project-handout/proj3/part-1-join-algorithms/task-2-common-errors.md: -------------------------------------------------------------------------------- 1 | # Task 2 Common Errors 2 | 3 | ## Index out of bounds error while partitioning 4 | 5 | Hash codes can be negative. Make sure you handle that case. The hash codes can also be larger than the number of partitions, so make sure you handle that too. We recommend you look at [SHJOperator](https://github.com/berkeley-cs186/sp21-rookiedb/blob/master/src/main/java/edu/berkeley/cs186/database/query/join/SHJOperator.java#L53-L56)'s implementation to make sure you partition correctly with hash codes. 6 | 7 | ## Reached the max number of passes cap 8 | 9 | This means that you're doing recursive partitioning infinitely. The most likely cause of this is partitioning using the same hash function every single time. Make sure to update your hash function calls so that a different hash function is used on each pass. 10 | 11 | If you're certain that you're already doing this, make sure your condition for recursive partitioning is correct. An off-by-one error \(for example `<=` vs `<` \) is enough to make it so you never reach the build and probe phase. 12 | 13 | ## Code running forever/recursion depth limit exceeded/java.lang.OutOfMemoryError 14 | 15 | Make sure every time you make a recursive call to `run` that you increment the pass number. 16 | 17 | ## AssertionError: Expected: 1674 Actual: 91 18 | 19 | Make sure when you recursively call `run` that you add all of the resulting records to your output. Additionally make sure that whenever you call `buildAndProbe` that you also add those records to your output.
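For the index out of bounds error described at the top of this page, one way to turn an arbitrary hash code into a valid partition index is sketched below. This is a minimal sketch, not necessarily what `SHJOperator` does: the class and method names are hypothetical, and it relies only on the standard `Math.floorMod`, which returns a result in `[0, numPartitions)` for any positive `numPartitions`, including for negative hash codes.

```java
// Hypothetical illustration only; not part of the skeleton code.
class PartitionIndexSketch {
    /** Maps any int hash code to an index in [0, numPartitions). */
    static int partitionIndex(int hashCode, int numPartitions) {
        // The % operator can return a negative result for negative hash codes,
        // and Math.abs(Integer.MIN_VALUE) is still negative; floorMod avoids both.
        return Math.floorMod(hashCode, numPartitions);
    }
}
```

For instance, `partitionIndex(-7, 4)` is `1`, whereas `-7 % 4` would be `-3` and would index out of bounds.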
20 | 21 | -------------------------------------------------------------------------------- /project-handout/proj3/part-2-query-optimization.md: -------------------------------------------------------------------------------- 1 | # Part 2: Query Optimization 2 | 3 | ![Dataspace](../../.gitbook/assets/dataspace.png) 4 | 5 | In this part, you will implement a piece of a relational query optimizer: Plan space search. 6 | 7 | ## Overview: Plan Space Search 8 | 9 | You will now search the plan space using cost estimates. For our database, this is similar to System R: the set of all left-deep trees, avoiding Cartesian products where possible. Unlike System R, we do not consider interesting orders, and further, we completely disallow Cartesian products in all queries. To search the plan space, we will utilize the dynamic programming algorithm used in the Selinger optimizer. 10 | 11 | Before you begin, you should have a good idea of how the `QueryPlan` class is used \(see the [Skeleton Code](skeleton-code.md) section\) and how query operators fit together. For example, to implement a simple query with a single selection predicate: 12 | 13 | ```java 14 | /** 15 | * SELECT * FROM myTableName WHERE stringAttr = 'CS 186' 16 | */ 17 | QueryOperator source = new SequentialScanOperator(transaction, myTableName); 18 | QueryOperator select = new SelectOperator(source, "stringAttr", PredicateOperator.EQUALS, "CS 186"); 19 | 20 | int estimatedIOCost = select.estimateIOCost(); // estimate I/O cost 21 | Iterator iter = select.iterator(); // iterator over the results 22 | ``` 23 | 24 | A tree of `QueryOperator` objects is formed when we have multiple tables joined together. The current implementation of `QueryPlan#execute`, which is called by the user to run the query, is to join all tables in the order given by the user: if the user says `SELECT * FROM t1 JOIN t2 ON .. JOIN t3 ON ..`, then it scans `t1`, then joins `t2`, then joins `t3`.
This will perform poorly in many cases, so your task is to implement the dynamic programming algorithm to join the tables together in a better order. 25 | 26 | You will have to implement the `QueryPlan#execute` method. To do so, you will also have to implement two helper methods: `QueryPlan#minCostSingleAccess` \(pass 1 of the dynamic programming algorithm\) and `QueryPlan#minCostJoins` \(pass i > 1\). 27 | 28 | ## Your Tasks 29 | 30 | Note that you may **not** modify the signature of any methods or classes that we provide to you, but you're free to add helper methods. Also, you should only modify `query/QueryPlan.java` in this part. 31 | 32 | ### Task 5: Single Table Access Selection \(Pass 1\) 33 | 34 | Recall that the first part of the search algorithm involves finding the lowest estimated cost plans for accessing each table individually \(pass i involves finding the best plans for sets of i tables, so pass 1 involves finding the best plans for sets of 1 table\). 35 | 36 | This functionality should be implemented in the `QueryPlan#minCostSingleAccess` helper method, which takes a table and returns the optimal `QueryOperator` for scanning the table. 37 | 38 | In our database, we only consider two types of table scans: a sequential, full table scan \(`SequentialScanOperator`\) and an index scan \(`IndexScanOperator`\), which requires an index and filtering predicate on a column. 39 | 40 | You should first calculate the estimated I/O cost of a sequential scan, since this is always possible \(it's the default option: we only move away from it in favor of index scans if the index scan is both possible and more efficient\). 41 | 42 | Then, if there are any indices on any column of the table that we have a selection predicate on, you should calculate the estimated I/O cost of doing an index scan on that column. If any of these are more efficient than the sequential scan, take the best one. 
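The comparison described above can be pictured with a small stand-alone sketch. It assumes the cost estimates have already been computed and are handed in as plain integers, and the names `SingleAccessSketch` and `cheapestAccess` are hypothetical; in the real method you would be comparing operators and their `estimateIOCost()` values rather than strings.

```java
import java.util.Map;

// Hypothetical illustration only; not part of the skeleton code.
class SingleAccessSketch {
    /**
     * Returns "seq" if the sequential scan is cheapest, otherwise
     * "index:<column>" for the cheapest eligible index scan.
     */
    static String cheapestAccess(int seqScanCost, Map<String, Integer> indexScanCosts) {
        String best = "seq";        // the sequential scan is always possible
        int bestCost = seqScanCost;
        for (Map.Entry<String, Integer> candidate : indexScanCosts.entrySet()) {
            // Strictly cheaper: ties stay with the default sequential scan.
            if (candidate.getValue() < bestCost) {
                bestCost = candidate.getValue();
                best = "index:" + candidate.getKey();
            }
        }
        return best;
    }
}
```

Starting from the sequential scan and only replacing it when an index scan is strictly cheaper mirrors the "default option" framing above.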
43 | 44 | Finally, as part of a heuristic-based optimization covered in class, you should push down any selection predicates that involve solely the table \(see `QueryPlan#addEligibleSelections`\). 45 | 46 | This should leave you with a query operator beginning with a sequential or index scan operator, followed by zero or more `SelectOperator`s. 47 | 48 | After you have implemented `QueryPlan#minCostSingleAccess`, you should be passing all of the tests in `TestSingleAccess`. These tests do not involve any joins. 49 | 50 | ### **Task 6: Join Selection \(Pass i > 1\)** 51 | 52 | Recall that for i > 1, pass i of the dynamic programming algorithm takes in optimal plans for joining together all possible sets of i - 1 tables \(except those involving cartesian products\), and returns optimal plans for joining together all possible sets of i tables \(again excluding those with cartesian products\). 53 | 54 | We represent the state between two passes as a mapping from sets of strings \(table names\) to the corresponding optimal `QueryOperator`. You will need to implement the logic for pass i \(i > 1\) of the search algorithm in the `QueryPlan#minCostJoins` helper method. 55 | 56 | This method should, given a mapping from sets of i - 1 tables to the optimal plan for joining together those i - 1 tables, return a mapping from sets of i tables to the optimal left-deep plan for joining all sets of i tables \(except those with cartesian products\). 57 | 58 | You should use the list of explicit join conditions added when the user calls the `QueryPlan#join` method to identify potential joins. 59 | 60 | After implementing this method you should be passing `TestOptimizationJoins#testMinCostJoins` 61 | 62 | **Note:** you should not add any selection predicates in this method. 
This is because in our database, we only allow two-column predicates in the join condition, and a conjunction of single-column predicates otherwise, so the only unprocessed selection predicates in pass i > 1 are the join conditions. _This is not generally the case!_ SQL queries can contain selection predicates that can _not_ be processed until multiple tables have been joined together, for example: 63 | 64 | ```sql 65 | SELECT * FROM t1, t2, t3, t4 WHERE (t1.a = t2.b OR t2.b = t3.c) 66 | ``` 67 | 68 | where the single predicate cannot be evaluated until after `t1`, `t2`, _and_ `t3` have been joined together. Therefore, a database that supports all of SQL would have to push down predicates after each pass of the search algorithm. 69 | 70 | ### Task 7: Optimal Plan Selection 71 | 72 | **Update** \(03/10/2021\): The comment for execute\(\) says to "add group by and select operators". This should be changed to "add group by and **project** operators", since selects should have already been added during pass 1. 73 | 74 | **Update** \(03/11/2021\): testJoinOrderB has been updated to accept answers for arbitrary tie breaks when determining join orders. You can get these changes by replacing the assert statements at the end of the test with the ones [here](https://github.com/berkeley-cs186/sp21-rookiedb/blob/master/src/test/java/edu/berkeley/cs186/database/query/TestOptimizationJoins.java#L307-L314). 75 | 76 | Your final task is to write the outermost driver method of the optimizer, `QueryPlan#execute`, which should utilize the two helper methods you have implemented to find the best query plan. 77 | 78 | You will need to add the remaining group by and projection operators that are a part of the query, but have not yet been added to the query plan \(see the private helper methods implemented for you in the `QueryPlan` class\). 79 | 80 | **Note:** The tables in `QueryPlan` are kept in the variable `tableNames`.
81 | 82 | After this, you should pass all the tests we have provided to you in `database.query.*`. 83 | 84 | ## Submission 85 | 86 | Follow the submission instructions [here](submitting-the-assignment.md) for the Project 3 Part 2 assignment on Gradescope. If you completed everything you should be passing all the tests in the following files: 87 | 88 | * `database.query.TestNestedLoopJoin` 89 | * `database.query.TestGraceHashJoin` 90 | * `database.query.TestSortOperator` 91 | * `database.query.TestSingleAccess` 92 | * `database.query.TestOptimizationJoins` 93 | * `database.query.TestBasicQuery` 94 | 95 | -------------------------------------------------------------------------------- /project-handout/proj3/skeleton-code.md: -------------------------------------------------------------------------------- 1 | # Part 0: Skeleton Code 2 | 3 | ![To read, or not to read, that is the question](../../.gitbook/assets/dataskeleton.png) 4 | 5 | In this project you'll be implementing some common join algorithms and a limited version of the Selinger optimizer. We've provided a brief introduction into the new parts of the code base you'll be working with. 
6 | 7 | For **Part 1** we recommend you read through: 8 | 9 | * **common/iterator** - Details on backtracking iterators, which will be needed to implement joins 10 | * **Join Operators** - Details on the base class of the join operators you'll be implementing and some useful helper methods we've provided 11 | * **query/disk** - Details on some useful classes for implementing Grace Hash Join and External Sort 12 | 13 | For **Part 2** we recommend you read through: 14 | 15 | * **Scan and Special Operators** - These talk about additional operators that you'll use while creating query plans 16 | * **query/QueryPlan.java** - Gives a high-level overview of a QueryPlan and some details on how to create and work with them 17 | 18 | ## common/iterator 19 | 20 | The `common/iterator` directory contains an interface called a `BacktrackingIterator`. Iterators that implement this will be able to mark a point during iteration, and reset back to that mark. For example, here we have a backtracking iterator that just returns 1, 2, and 3, but can backtrack: 21 | 22 | ```java 23 | BacktrackingIterator iter = new BacktrackingIteratorImplementation(); 24 | iter.next(); // returns 1 25 | iter.next(); // returns 2 26 | iter.markPrev(); // marks the previously returned value, 2 27 | iter.next(); // returns 3 28 | iter.hasNext(); // returns false 29 | iter.reset(); // reset to the marked value (line 3) 30 | iter.hasNext(); // returns true 31 | iter.next(); // returns 2 32 | iter.markNext(); // mark the value to be returned next, 3 33 | iter.next(); // returns 3 34 | iter.hasNext(); // returns false 35 | iter.reset(); // reset to the marked value (line 11) 36 | iter.hasNext(); // returns true 37 | iter.next(); // returns 3 38 | ``` 39 | 40 | `ArrayBacktrackingIterator` implements this interface. It takes in an array and returns a backtracking iterator over the values in that array. 
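To make the mark/reset semantics above concrete, here is a minimal, self-contained sketch of a backtracking iterator over an `int` array. It is an illustration only: the class name `ToyBacktrackingIterator` and its fields are hypothetical, and it is not the project's `ArrayBacktrackingIterator`. It assumes that `reset()` without a prior mark does nothing.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical toy class for illustration; not part of the project skeleton.
class ToyBacktrackingIterator implements Iterator<Integer> {
    private final int[] values;
    private int nextIndex = 0;    // index of the value next() will return
    private int markedIndex = -1; // index reset() rewinds to; -1 means no mark

    ToyBacktrackingIterator(int[] values) {
        this.values = values;
    }

    @Override
    public boolean hasNext() {
        return nextIndex < values.length;
    }

    @Override
    public Integer next() {
        if (!hasNext()) throw new NoSuchElementException();
        return values[nextIndex++];
    }

    // Mark the value most recently returned by next().
    // Assumption: a no-op if nothing has been returned yet.
    public void markPrev() {
        if (nextIndex > 0) markedIndex = nextIndex - 1;
    }

    // Mark the value that the next call to next() would return.
    // Assumption: a no-op if the iterator is exhausted.
    public void markNext() {
        if (hasNext()) markedIndex = nextIndex;
    }

    // Rewind so that next() returns the marked value again.
    public void reset() {
        if (markedIndex >= 0) nextIndex = markedIndex;
    }
}
```

Running the same sequence of calls as the example above against `new ToyBacktrackingIterator(new int[]{1, 2, 3})` yields the same values. Storing a single index for the mark is what makes backtracking cheap here; an iterator over pages or records would need to remember enough state to re-read from the marked position.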
41 | 42 | ## query/QueryOperator.java 43 | 44 | The `query` directory contains what are called query operators. A single query to the database may be expressed as a composition of these operators. All operators extend the `QueryOperator` class and implement the `Iterable` interface. The scan operators fetch data from a single table. The remaining operators take one or more input operators, transform or combine the input \(e.g. projecting away columns, sorting, joining\), and return a collection of records. 45 | 46 | ### Join Operators 47 | 48 | [`JoinOperator.java`](https://github.com/berkeley-cs186/sp21-rookiedb/blob/master/src/main/java/edu/berkeley/cs186/database/query/JoinOperator.java) is the base class of all the join operators. **Reading this file and understanding the methods given to you can save you a lot of time on Part 1.** It provides methods you may need to deal with tables and the current transaction. You should not be dealing directly with `Table` objects nor `TransactionContext` objects while implementing join algorithms in Part 1 \(aside from passing them into methods that require them\). Subclasses of JoinOperator are all located in `query/join`. 49 | 50 | Some helper methods you might want to be aware of are located [here](https://github.com/berkeley-cs186/sp21-rookiedb/blob/master/src/main/java/edu/berkeley/cs186/database/query/JoinOperator.java#L167-L207). 51 | 52 | ### Scan Operators 53 | 54 | The scan operators fetch data directly from a table. 
55 | 56 | * [`SequentialScanOperator.java`](https://github.com/berkeley-cs186/sp21-rookiedb/blob/master/src/main/java/edu/berkeley/cs186/database/query/SequentialScanOperator.java) - Takes a table name and provides an iterator over all the records of that table 57 | * [`IndexScanOperator.java`](https://github.com/berkeley-cs186/sp21-rookiedb/blob/master/src/main/java/edu/berkeley/cs186/database/query/IndexScanOperator.java) - Takes a table name, column name, a PredicateOperator \(>, <, <=, >=, =\) and a value. The column specified must have an index built on it for this operator to work. If so, the index scan will take advantage of the index to efficiently yield records with columns satisfying the given predicate and value \(e.g. `salaries.yearid >= 2000`\) 58 | 59 | ### Special Operators 60 | 61 | The remaining operators don't fall into a specific category, but rather perform some specific purpose. 62 | 63 | * [`SelectOperator.java`](https://github.com/berkeley-cs186/sp21-rookiedb/blob/master/src/main/java/edu/berkeley/cs186/database/query/SelectOperator.java) - Corresponds to the **σ** operator of relational algebra. This operator takes a column name, a PredicateOperator \(>, <, <=, >=, =, !=\) and a value. It only yields records from the source operator for which the predicate is satisfied, for example \(`yearid >= 2000`\)
* [`ProjectOperator.java`](https://github.com/berkeley-cs186/sp21-rookiedb/blob/master/src/main/java/edu/berkeley/cs186/database/query/ProjectOperator.java) - Corresponds to the **π** operator of relational algebra. This operator takes a list of column names and filters out any columns that weren't listed. Can also compute aggregates, but that is out of scope for this project 64 | * [`SortOperator.java`](https://github.com/berkeley-cs186/sp21-rookiedb/blob/master/src/main/java/edu/berkeley/cs186/database/query/SortOperator.java) - Yields records from the source operator in sorted order. 
You'll be implementing this in Part 1 65 | 66 | ### Other Operators 67 | 68 | These operators are **out of scope** and not directly relevant to the code you'll be writing in this project. 69 | 70 | * [`MaterializeOperator.java`](https://github.com/berkeley-cs186/sp21-rookiedb/blob/master/src/main/java/edu/berkeley/cs186/database/query/MaterializeOperator.java) - Materializes the source operator into a temporary table immediately, and then acts as a sequential scan over the temporary table. Mainly used in testing to control when IOs take place 71 | * [`GroupByOperator.java`](https://github.com/berkeley-cs186/sp21-rookiedb/blob/master/src/main/java/edu/berkeley/cs186/database/query/GroupByOperator.java) - Out of scope for this project. This operator accepts a column name and yields the records of the source operator but with the records grouped by their value and each separated by a marker record. For example, if the source operator had singleton records `[0,1,2,1,2,0,1]` the group by operator might yield `[0,0,M,1,1,1,M,2,2]` where `M` is a marker record. 72 | 73 | ## query/disk 74 | 75 | The classes in this directory are useful for implementing Grace Hash Join and External Sort, and correspond to the concepts of "partitions" and "runs" used in those topics respectively. Both classes have an `add` method that can be used to insert a record into the partition/run. These classes will automatically buffer insertions and reads so that at most one page is needed in memory at a time. 76 | 77 | ## query/aggr 78 | 79 | The classes and functions in this directory implement aggregate functions, and are **not** necessary to complete the project \(though you're free to browse through them if you're interested\). 
80 | 81 | ## query/QueryPlan.java 82 | 83 | ![](../../.gitbook/assets/proj3-volcano-model.png) 84 | 85 | This is the _volcano model_, where the operators are layered atop one another, and each operator requests tuples from the input operator\(s\) as it needs to generate its next output tuple. Note that each operator only fetches tuples from its input operator\(s\) as needed, rather than all at once! 86 | 87 | A query plan is a composition of query operators, and it describes _how_ a query is executed. Recall that SQL is a _declarative_ language - the user does not specify _how_ a query is run, and only _what_ the query should return. Therefore, there are often many possible query plans for a given query. 88 | 89 | The [`QueryPlan`](https://github.com/berkeley-cs186/sp21-rookiedb/blob/master/src/main/java/edu/berkeley/cs186/database/query/QueryPlan.java) class represents a query. Users of the database create queries using the public methods \(such as `join()`, `select()`, etc.\) and then call `execute` to generate a query plan for the query and get back an iterator over the resulting data set \(which is _not_ fully materialized: the iterator generates each tuple as requested\). The current implementation of `execute` simply calls `executeNaive`, which joins tables in the order given; your task in Part 2 will be to generate better query plans. 90 | 91 | **SelectPredicate** 92 | 93 | SelectPredicate is a helper class inside of QueryPlan.java that stores information about the selection predicates that the user has applied, for example `someTable.col1 < 186`. A select predicate has four values that you can access: 94 | 95 | * `tableName` and `columnName` specify which column the predicate applies to 96 | * `operator` represents the type of operator being used \(for example `<`, `<=`, `>`, etc...\) 97 | * `value` is a DataBox containing a constant value that the column should be evaluated against \(in the above example, `186` would be the value\). 
98 | 99 | All of the select predicates for the query are stored inside the selectPredicates instance variable. 100 | 101 | **JoinPredicate** 102 | 103 | JoinPredicate is a helper class inside of QueryPlan.java that stores information about the conditions on which tables are joined together, for example: `leftTable.leftColumn = rightTable.rightColumn`. All joins in RookieDB are equijoins. JoinPredicates have five values: 104 | 105 | * `joinTable`: the name of one of the tables being joined in. Only used for toString\(\) 106 | * `leftTable`: the name of the table on the left side of the equality 107 | * `leftColumn`: the name of the column on the left side of the equality 108 | * `rightTable`: the name of the table on the right side of the equality 109 | * `rightColumn`: the name of the column on the right side of the equality 110 | 111 | All of the join predicates for the query are stored inside of the joinPredicates instance variable. 112 | 113 | ### Interface for querying 114 | 115 | You should read through the `Database.java` section of the [main overview](../../#database-java) and browse through examples in [`src/test/java/edu/berkeley/cs186/database/TestDatabase.java`](https://github.com/berkeley-cs186/sp21-rookiedb/blob/master/src/test/java/edu/berkeley/cs186/database/TestDatabase.java) to familiarize yourself with how queries are written in our database. 
116 | 117 | After `execute()` has been called on a `QueryPlan` object, you can print the final query plan: 118 | 119 | ```java 120 | Iterator result = query.execute(); 121 | QueryOperator finalOperator = query.getFinalOperator(); 122 | System.out.println(finalOperator.toString()); 123 | ``` 124 | 125 | ```text 126 | -> SNLJ on S.sid=E.sid (cost=6) 127 | -> Seq Scan on S (cost=3) 128 | -> Seq Scan on E (cost=3) 129 | ``` 130 | 131 | -------------------------------------------------------------------------------- /project-handout/proj3/submitting-the-assignment.md: -------------------------------------------------------------------------------- 1 | # Submitting the Assignment 2 | 3 | ## Files 4 | 5 | You may **not** modify the signature of any methods or classes that we provide to you, but you're free to add helper methods. 6 | 7 | You should make sure that all code you modify belongs to files with `TODO(proj3)` comments in them \(e.g. don't add helper methods to DataBox\). A full list of files that you may modify follows: 8 | 9 | * `src/main/java/edu/berkeley/cs186/database/query/join/BNLJOperator.java` 10 | * `src/main/java/edu/berkeley/cs186/database/query/join/SortOperator.java` 11 | * `src/main/java/edu/berkeley/cs186/database/query/join/SortMergeOperator.java` 12 | * `src/main/java/edu/berkeley/cs186/database/query/join/GHJOperator.java` 13 | * `src/main/java/edu/berkeley/cs186/database/query/QueryPlan.java` \(Part 2 only\) 14 | 15 | Make sure that your code does _not_ use any static \(non-final\) variables - this may cause odd behavior when running with maven vs. in your IDE \(tests run through the IDE often run with a new instance of Java for each test, so the static variables get reset, but multiple tests per Java instance may be run when using maven, where static variables _do not_ get reset\). 
16 | 17 | ## Gradescope 18 | 19 | Once all of your files are prepared in your repo you can submit to Gradescope through GitHub the same way you did for [Project 0](../proj0/submitting.md#pushing-changes-to-github-classroom). 20 | 21 | ## Submitting via upload 22 | 23 | If your GitHub account has access to many repos, the Gradescope UI might time out while trying to load which repos you have available. If this is the case for you, you can submit your code directly via upload. You can zip up your source code with `zip -r submission.zip src/` and submit that directly to the autograder. 24 | 25 | ## Partners 26 | 27 | If you haven't already, be sure to fill out [this form](https://docs.google.com/forms/d/e/1FAIpQLSdnDiXgyjRRFvm2mQ0G4M1o2xjGMDLpjXlOjbDzfAxPGibvZg/viewform?usp=sf_link) so we know who you're working with. Every student is responsible for submitting to Gradescope individually -- if you submit but your partner doesn't then your partner will not get credit. If you worked off of a shared repo both members of the group are free to submit that repo. Slip days will be deducted individually. For example: You submit on time, but your partner submits a day late. Your partner will have to use a slip day or will receive a late penalty on the project \(but you will not\). 28 | 29 | ## Grade breakdown 30 | 31 | This project is worth 8% of your overall grade. 32 | 33 | * **30% of your score will come from your submission for Part 1**. We will only be running public Part 1 tests on your Part 1 submission. 34 | * **70% of your score will come from your final submission**. We will be running the hidden Part 1 tests, the public Part 2 tests, and the hidden Part 2 tests on your Part 2 submission. 35 | * 60% of your overall score will be made up of the tests released in this project \(the tests that we provided in `database.query.*`\). 36 | * 40% of your overall score will be made up of hidden, unreleased tests that we will run on your submission after the deadline. 
37 | * The combined public and hidden tests from Part 1 are worth 50% 38 | * The combined public and hidden tests from Part 2 are worth 50%. 39 | 40 | -------------------------------------------------------------------------------- /project-handout/proj3/testing.md: -------------------------------------------------------------------------------- 1 | # Testing 2 | 3 | We strongly encourage testing your code yourself. The given tests for this project are not comprehensive tests: it is possible to write incorrect code that passes them all \(but not get full score\). 4 | 5 | Things that you might consider testing for include: anything that we specify in the comments or in this document that a method should do that you don't see a test already testing for, and any edge cases that you can think of. Think of what valid inputs might break your code and cause it not to perform as intended, and add a test to make sure things are working. We will **not** be testing behavior on invalid inputs \(`null` objects, negative buffer sizes or buffers too small to perform a join, invalid queries, etc...\). You can handle these inputs however you want, or not at all. 6 | 7 | To help you get started, here is one case that is _not_ in the given tests \(and will be included in the hidden tests\): joining an empty table with another table should result in an iterator that returns no records \(`hasNext()` should return false immediately\). 8 | 9 | To add a unit test, open up the appropriate test file and simply add a new method to the file with a `@Test` annotation, for example: 10 | 11 | ```java 12 | @Test 13 | public void testEmptyBNLJ() { 14 | // your test code here 15 | } 16 | ``` 17 | 18 | Many test classes have some setup code done for you already: take a look at other tests in the file for an idea of how to write the test code. For example, the SNLJ tests in TestNestedLoopJoin can be used as a template for your own BNLJ, Sort, and SMJ tests. 
19 | 20 | 21 | 22 | -------------------------------------------------------------------------------- /project-handout/proj4.md: -------------------------------------------------------------------------------- 1 | # Project 4: Concurrency 2 | 3 | This assignment will be released on **Tuesday, 3/16/2021**. 4 | 5 | -------------------------------------------------------------------------------- /project-handout/proj4/README.md: -------------------------------------------------------------------------------- 1 | # Project 4: Concurrency 2 | 3 | -------------------------------------------------------------------------------- /project-handout/proj4/getting-started.md: -------------------------------------------------------------------------------- 1 | # Getting Started 2 | 3 | ## Logistics 4 | 5 | This project is worth 8% of your overall grade in the class. 6 | 7 | * Part 1 is due **Monday, 3/29/2021 at 11:59PM PDT (GMT-7)** and will be worth 20% of your score. Your score will be determined by public tests only. 8 | * Part 2 is due **Friday, 4/9/2021 at 11:59PM PDT (GMT-7)** and will be worth the remaining 80% of your score. We'll be running the public tests for Part 2 and all hidden tests for both Part 1 and Part 2 on this submission. 9 | 10 | The workload for the project is designed to be completed solo, but this semester we're allowing students to work on this project with a partner if you want to. Your partner does not have to be the same as the one you had for previous assignments. Feel free to search for a partner on [this Piazza thread](https://piazza.com/class/kjoxqrf1eq04mr?cid=5)! 11 | 12 | ## Prerequisites 13 | 14 | You should watch both Transactions & Concurrency lectures before beginning this project. 15 | 16 | ## Additional Resources 17 | 18 | Debugging walkthrough video for project 4 can be found [here](https://drive.google.com/drive/folders/1UnpcSU-rG9VAHfsD5WXO8CfFuzEDbwH_?usp=sharing). 
19 | 20 | ## Fetching the released code 21 | 22 | The GitHub Classroom link for this project is in the Project 4 release post on [Piazza](https://piazza.com/class/kjoxqrf1eq04mr). Once your private repo is set up, clone the Project 4 skeleton code onto your local machine. 23 | 24 | ### Setting up your local development environment 25 | 26 | If you're using IntelliJ you can follow the instructions [in Project 0](../proj0/getting-started.md#setting-up-your-local-development-environment) to set up your local environment again. Once you have your environment set up you can head to the next section [Part 0](skeleton-code.md) and begin working on the assignment. 27 | 28 | ## Adding a partner 29 | 30 | Once you've found a partner make sure you fill out the details in the form on the Piazza release post. If you want to share code over GitHub you can follow the instructions [here](../../common/adding-a-partner-on-github.md). 31 | 32 | ## Debugging Issues with GitHub Classroom 33 | 34 | Feel free to skip this section if you don't have any issues with GitHub Classroom. If you are having issues \(i.e. the page froze or some error message appeared\), first check if you have access to your repo at `https://github.com/berkeley-cs186-student/sp21-proj4-username`, replacing `username` with your GitHub username. If you have access to your repo and the starter code is there, then you can proceed as usual. If you have access to your repo but the starter code is not there, run the following commands in a terminal \(again replacing `username` with your GitHub username\): 35 | 36 | ```text 37 | git clone https://github.com/berkeley-cs186/sp21-rookiedb sp21-proj4 38 | cd sp21-proj4/ 39 | git remote remove origin 40 | git remote add origin https://github.com/berkeley-cs186-student/sp21-proj4-username.git 41 | git push -u origin master 42 | ``` 43 | 44 | Then, you can proceed as usual. 
45 | 46 | #### 404 Not Found 47 | 48 | If you're getting a 404 not found page when trying to access your repo, make sure you've set up your repo using the GitHub Classroom link in the Project 4 release post on [Piazza](https://piazza.com/class/kjoxqrf1eq04mr). 49 | 50 | If you don't have access to your repo at all after following these steps, feel free to contact the course staff on Piazza. 51 | 52 | -------------------------------------------------------------------------------- /project-handout/proj4/part-1-lockmanager.md: -------------------------------------------------------------------------------- 1 | # Part 1: Queuing 2 | 3 | ![Datarace](../../.gitbook/assets/datarace%20%281%29%20%281%29%20%282%29%20%282%29%20%283%29%20%283%29.png) 4 | 5 | In this part you will implement some helper functions for lock types and the queuing system for locks. The `concurrency` directory contains a partial implementation of a lock manager \(`LockManager.java`\), which you will be completing. 6 | 7 | Note on terminology: in this project, "children" and "parent" refer to the resource\(s\) directly below/above a resource in the hierarchy. "Descendants" and "ancestors" are used when we wish to refer to all resource\(s\) below/above in the hierarchy. 8 | 9 | ## Task 1: LockType 10 | 11 | Before you start implementing the queuing logic, you need to keep track of all the lock types supported, and how they interact with each other. The [`LockType`](https://github.com/berkeley-cs186/sp21-rookiedb/blob/master/src/main/java/edu/berkeley/cs186/database/concurrency/LockType.java) class contains methods reasoning about this, which will come in handy in the rest of the project. 12 | 13 | For the purposes of this project, a transaction with: 14 | 15 | * `S(A)` can read A and all descendants of A. 16 | * `X(A)` can read and write A and all descendants of A. 17 | * `IS(A)` can request shared and intent-shared locks on all children of A. 18 | * `IX(A)` can request any lock on all children of A. 
19 | * `SIX(A)` can do anything that having `S(A)` or `IX(A)` lets it do, except requesting S, IS, or SIX\*\* locks on children of A, which would be redundant. 20 | 21 | \*\* This differs from how it's presented in lecture, where SIX\(A\) allows a transaction to request SIX locks on children of A. We disallow this in the project since the S aspect of the SIX child would be redundant. 22 | 23 | You will need to implement the `compatible`, `canBeParentLock`, and `substitutable` methods: 24 | 25 | * [`compatible(A, B)`](https://github.com/berkeley-cs186/sp21-rookiedb/blob/master/src/main/java/edu/berkeley/cs186/database/concurrency/LockType.java#L14) checks if lock type A is compatible with lock type B -- can one transaction have lock A while another transaction has lock B on the same resource? For example, two transactions can have S locks on the same resource, so `compatible(S, S) = true`, but two transactions cannot have X locks on the same resource, so `compatible(X, X) = false`. 26 | * [`canBeParentLock(A, B)`](https://github.com/berkeley-cs186/sp21-rookiedb/blob/master/src/main/java/edu/berkeley/cs186/database/concurrency/LockType.java#L48) returns true if having A on a resource lets a transaction acquire a lock of type B on a child. For example, in order to get an S lock on a table, we must have \(at the very least\) an IS lock on the parent of the table: the database. So `canBeParentLock(IS, S) = true`. 27 | * [`substitutable(substitute, required)`](https://github.com/berkeley-cs186/sp21-rookiedb/blob/master/src/main/java/edu/berkeley/cs186/database/concurrency/LockType.java#L61) checks if one lock type \(`substitute`\) can be used in place of another \(`required`\). This is only the case if a transaction having `substitute` can do everything that a transaction having `required` can do. Another way of looking at this is: let a transaction request the required lock. Can there be any problems if we secretly give it the substitute lock instead? 
For example, if a transaction requested an X lock, and we quietly gave it an S lock, there would be problems if the transaction tries to write to the resource. Therefore, `substitutable(S, X) = false`. 28 | 29 | Once you complete this task you should be passing all the tests in [`TestLockType.java`](https://github.com/berkeley-cs186/sp21-rookiedb/blob/master/src/test/java/edu/berkeley/cs186/database/concurrency/TestLockType.java). This semester we've provided a [Gradescope assignment](https://www.gradescope.com/courses/225521/assignments/1097657) that you can complete to check your logic for the cases that aren't given. 30 | 31 | ## Task 2: LockManager 32 | 33 | **Note \(Spring 2021\):** To keep workload balanced with the upcoming midterm, your Part 1 submission will only need to pass the following tests from this task for full credit: 34 | 35 | * TestLockManager.testSimpleAcquireLock 36 | * TestLockManager.testSimpleAcquireLockFail 37 | * TestLockManager.testSimpleReleaseLock 38 | * TestLockManager.testReleaseUnheldLock 39 | * TestLockManager.testSimpleConflict 40 | 41 | These tests have been selected to overlap with the material you'll be studying for the midterm. The remaining tests will be tested in the Part 2 submission. 42 | 43 | The [`LockManager`](https://github.com/berkeley-cs186/sp21-rookiedb/blob/master/src/main/java/edu/berkeley/cs186/database/concurrency/LockManager.java) class handles locking for individual resources. We will add multigranularity constraints in Part 2. 44 | 45 | **Before you start coding**, you should understand what the lock manager does, what each method of the lock manager is responsible for, and how the internal state of the lock manager changes with each operation. The different methods of the lock manager are not independent: trying to implement one without consideration of the others will cause you to spend significantly more time on this project. 
There is a fair amount of logic shared between methods, and it may be worth spending a bit of time writing some helper methods. 46 | 47 | A simple example of a blocking acquire call is described at the bottom of this section -- you should understand it and be able to describe any other combination of calls before implementing any method. 48 | 49 | You will need to implement the following methods of `LockManager`: 50 | 51 | * [`acquireAndRelease`](https://github.com/berkeley-cs186/sp21-rookiedb/blob/master/src/main/java/edu/berkeley/cs186/database/concurrency/LockManager.java#L156): this method atomically \(from the user's perspective\) acquires one lock and releases zero or more locks. This method has priority over any queued requests \(it should proceed even if there is a queue, and it is placed in the front of the queue if it cannot proceed\). 52 | * [`acquire`](https://github.com/berkeley-cs186/sp21-rookiedb/blob/master/src/main/java/edu/berkeley/cs186/database/concurrency/LockManager.java#L183): this method is the standard `acquire` method of a lock manager. It allows a transaction to request one lock, and grants the request if there is no queue and the request is compatible with existing locks. Otherwise, it should queue the request \(at the back\) and block the transaction. We do not allow implicit lock upgrades, so requesting an X lock on a resource the transaction already has an S lock on is invalid. 53 | * [`release`](https://github.com/berkeley-cs186/sp21-rookiedb/blob/master/src/main/java/edu/berkeley/cs186/database/concurrency/LockManager.java#L208): this method is the standard `release` method of a lock manager. It allows a transaction to release one lock that it holds. 54 | * [`promote`](https://github.com/berkeley-cs186/sp21-rookiedb/blob/master/src/main/java/edu/berkeley/cs186/database/concurrency/LockManager.java#L238): this method allows a transaction to explicitly promote/upgrade a held lock. 
The lock the transaction holds on a resource is replaced with a stronger lock on the same resource. This method has priority over any queued requests \(it should proceed even if there is a queue, and it is placed in the front of the queue if it cannot proceed\). We do not allow promotions to SIX; those types of requests should go to `acquireAndRelease`. This is because during SIX lock upgrades, it is possible we might need to also release redundant locks, so we need to handle these upgrades with `acquireAndRelease`. 55 | * [`getLockType`](https://github.com/berkeley-cs186/sp21-rookiedb/blob/master/src/main/java/edu/berkeley/cs186/database/concurrency/LockManager.java#L256): this is the main way to query the lock manager, and returns the type of lock that a transaction has on a specific resource, which was implemented in the previous step. 56 | 57 | ### Queues 58 | 59 | Whenever a request for a lock cannot be satisfied \(either because it conflicts with locks other transactions already have on the resource, or because there's a queue of requests for locks on the resource and the operation does not have priority over the queue\), it should be placed on the queue \(at the back, unless otherwise specified\) for the resource, and the transaction making the request should be blocked. 60 | 61 | The queue for each resource is processed independently of other queues, and must be processed after a lock on the resource is released, in the following manner: 62 | 63 | * The request at the front of the queue is considered, and if it doesn't conflict with any of the existing locks on the resource, it should be removed from the queue and: 64 | * the transaction that made the request should be given the lock 65 | * any locks that the request stated should be released are released 66 | * the transaction that made the request should be unblocked 67 | * The previous step should be repeated until the first request on the queue cannot be satisfied or the queue is empty. 
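The queue-processing loop described above can be sketched in isolation. The following is a simplified, self-contained illustration, not the project's `LockManager`: the names `ToyQueueDemo`, `Mode`, and `Request` are hypothetical, only S and X lock types are modeled, and the steps "release any locks the request stated should be released" and "unblock the transaction" are reduced to recording which transactions would be unblocked. It also assumes that a transaction's own locks are not treated as conflicts.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical toy model of one resource's locks and request queue.
class ToyQueueDemo {
    enum Mode { S, X }

    static final class Request {
        final long txn;
        final Mode mode;
        Request(long txn, Mode mode) { this.txn = txn; this.mode = mode; }
    }

    // Simplified compatibility rule: only two shared locks can coexist.
    static boolean compatible(Mode a, Mode b) {
        return a == Mode.S && b == Mode.S;
    }

    final List<Request> granted = new ArrayList<>();  // locks currently held
    final Deque<Request> queue = new ArrayDeque<>();  // waiting requests, FIFO

    // Grants requests from the front of the queue until one conflicts.
    // Returns the transaction numbers that would be unblocked, in order.
    List<Long> processQueue() {
        List<Long> unblocked = new ArrayList<>();
        while (!queue.isEmpty()) {
            Request front = queue.peekFirst();
            boolean conflict = false;
            for (Request held : granted) {
                // Assumption: a transaction never conflicts with itself.
                if (held.txn != front.txn && !compatible(held.mode, front.mode)) {
                    conflict = true;
                    break;
                }
            }
            if (conflict) break;       // stop at the first blocked request
            queue.removeFirst();       // remove it from the queue...
            granted.add(front);        // ...grant the lock...
            unblocked.add(front.txn);  // ...and note the transaction to unblock
        }
        return unblocked;
    }
}
```

For instance, with no locks held and a queue of S (T2), S (T3), X (T4), S (T5), the loop grants the first two requests and stops at T4. T5 stays queued even though its S request is compatible with the granted locks, because the queue is processed strictly from the front.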
68 | 69 | ### **Synchronization** 70 | 71 | `LockManager`'s methods have `synchronized` blocks to ensure that calls to `LockManager` are serial and that there is no interleaving of calls. You may want to read up on [synchronized methods](https://docs.oracle.com/javase/tutorial/essential/concurrency/syncmeth.html) and [synchronized statements](https://docs.oracle.com/javase/tutorial/essential/concurrency/locksync.html) in Java. You should make sure that all accesses \(both queries and modifications\) to lock manager state in a method is inside **one** synchronized block, for example: 72 | 73 | ```java 74 | // Correct, use a single synchronized block 75 | void acquire(...) { 76 | synchronized (this) { 77 | ResourceEntry entry = getResourceEntry(name); // fetch resource entry 78 | // do stuff 79 | entry.locks.add(...); // add to list of locks 80 | } 81 | } 82 | 83 | // Incorrect, multiple synchronized blocks 84 | void acquire(...) { 85 | synchronized (this) { 86 | ResourceEntry entry = getResourceEntry(name); // fetch resource entry 87 | } 88 | // first synchronized block ended: another call to LockManager can start here 89 | synchronized (this) { 90 | // do stuff 91 | entry.locks.add(...); // add to list of locks 92 | } 93 | } 94 | 95 | // Incorrect, doing work outside of the synchronized block 96 | void acquire(...) { 97 | ResourceEntry entry = getResourceEntry(name); // fetch resource entry 98 | // do stuff 99 | // other calls can run while the above code runs, which means we could 100 | // be using outdated lock manager state 101 | synchronized (this) { 102 | entry.locks.add(...); // add to list of locks 103 | } 104 | } 105 | ``` 106 | 107 | Transactions block the entire thread when blocked, which means that you cannot block the transaction inside the `synchronized` block \(this would prevent any other call to `LockManager` from running until the transaction is unblocked... which is never, since the `LockManager` is the one that unblocks the transaction\). 
108 | 109 | To block a transaction, call `Transaction#prepareBlock` **inside** the synchronized block, and then call `Transaction#block` **outside** the synchronized block. The `Transaction#prepareBlock` call needs to be in the synchronized block to avoid a race condition where the transaction may be dequeued between the time it leaves the synchronized block and the time it actually blocks. 110 | 111 | If tests in `TestLockManager` are timing out, double-check that you are calling `prepareBlock` and `block` in the manner described above, and that you are not calling `prepareBlock` without `block`. \(It could also just be a regular infinite loop, but checking that you're handling synchronization correctly is a good place to start\). 112 | 113 | **Example** 114 | 115 | Consider the following calls \(this is what `testSimpleConflict` tests\): 116 | 117 | ```java 118 | // initialized elsewhere, T1 has transaction number 1, 119 | // T2 has transaction number 2 120 | Transaction t1, t2; 121 | 122 | LockManager lockman = new LockManager(); 123 | ResourceName db = new ResourceName("database"); 124 | 125 | lockman.acquire(t1, db, LockType.X); // t1 requests X(db) 126 | lockman.acquire(t2, db, LockType.X); // t2 requests X(db) 127 | lockman.release(t1, db); // t1 releases X(db) 128 | ``` 129 | 130 | In the first call, T1 requests an X lock on the database. There are no other locks on the database, so we grant T1 the lock. We add X\(db\) to the list of locks T1 has \(in `transactionLocks`\), as well as to the locks held on the database \(in `resourceEntries`\). Our internal state now looks like: 131 | 132 | ```text 133 | transactionLocks: { 1 => [ X(db) ] } (transaction 1 has 1 lock: X(db)) 134 | resourceEntries: { db => { locks: [ {1, X(db)} ], queue: [] } } 135 | (there is 1 lock on db: an X lock by transaction 1, nothing on the queue) 136 | ``` 137 | 138 | In the second call, T2 requests an X lock on the database. T1 already has an X lock on the database, so T2 is not granted the lock. 
We add T2's request to the queue, and block T2. Our internal state now looks like: 139 | 140 | ```text 141 | transactionLocks: {1 => [X(db)]} (transaction 1 has 1 lock: X(db)) 142 | resourceEntries: {db => {locks: [{1, X(db)}], queue: [LockRequest(T2, X(db))]}} 143 | (there is 1 lock on db: an X lock by transaction 1, and 1 request on 144 | queue: a request for X by transaction 2) 145 | ``` 146 | 147 | In the last call, T1 releases its X lock on the database. T2's request can now be processed, so we remove T2 from the queue, grant it the lock by updating `transactionLocks` and `resourceEntries`, and unblock it. Our internal state now looks like: 148 | 149 | ```text 150 | transactionLocks: { 2 => [ X(db) ] } (transaction 2 has 1 lock: X(db)) 151 | resourceEntries: { db => { locks: [ {2, X(db)} ], queue: [] } } 152 | (there is 1 lock on db: an X lock by transaction 2, nothing on the queue) 153 | ``` 154 | 155 | ## Submission 156 | 157 | After this, you should pass all the tests we have provided to you in [`database.concurrency.TestLockType`](https://github.com/berkeley-cs186/sp21-rookiedb/blob/master/src/test/java/edu/berkeley/cs186/database/concurrency/TestLockType.java) and [`database.concurrency.TestLockManager`](https://github.com/berkeley-cs186/sp21-rookiedb/blob/master/src/test/java/edu/berkeley/cs186/database/concurrency/TestLockUtil.java). Remember that for the Part 1 submission you only need to pass the first 5 tests of `TestLockManager` for full credit. 158 | 159 | Note that you may **not** modify the signature of any methods or classes that we provide to you, but you're free to add helper methods. Also, you should only modify code in the `concurrency` directory for this part. 
160 | 161 | -------------------------------------------------------------------------------- /project-handout/proj4/part-2-lockcontext-and-lockutil.md: -------------------------------------------------------------------------------- 1 | # Part 2: Multigranularity 2 | 3 | ![Dataphase](../../.gitbook/assets/dataphase%20%281%29%20%281%29%20%282%29%20%282%29%20%283%29%20%283%29%20%281%29.png) 4 | 5 | **A working implementation of Part 1 is required for Part 2. If you have not yet finished** [**Part 1**](part-1-lockmanager.md)**, you should do so before continuing.** 6 | 7 | In this part, you will implement the middle layer \(`LockContext`\) and the declarative layer \(in `LockUtil`\). The `concurrency` directory contains a partial implementation of a lock context \(`LockContext`\), which you must complete in this part of the project. 8 | 9 | ## Your Tasks 10 | 11 | ### Task 3: LockContext 12 | 13 | The [`LockContext`](https://github.com/berkeley-cs186/sp21-rookiedb/blob/master/src/main/java/edu/berkeley/cs186/database/concurrency/LockContext.java) class represents a single resource in the hierarchy; this is where all multigranularity operations \(such as enforcing that you have the appropriate intent locks before acquiring or performing lock escalation\) are implemented. 14 | 15 | You will need to implement the following methods of `LockContext`: 16 | 17 | * [`acquire`](https://github.com/berkeley-cs186/sp21-rookiedb/blob/master/src/main/java/edu/berkeley/cs186/database/concurrency/LockContext.java#L83-L93): this method performs an acquire via the underlying `LockManager` after ensuring that all multigranularity constraints are met. For example, if the transaction has IS\(database\) and requests X\(table\), the appropriate exception must be thrown \(see comments above method\). If a transaction has a SIX lock, then it is redundant for the transaction to have an IS/S lock on any descendant resource. 
Therefore, in our implementation, we prohibit acquiring an IS/S lock if an ancestor has SIX, and consider this to be an invalid request. 18 | * [`release`](https://github.com/berkeley-cs186/sp21-base/blob/master/src/main/java/edu/berkeley/cs186/database/concurrency/LockContext.java#L101-L111): this method performs a release via the underlying `LockManager` after ensuring that all multigranularity constraints will still be met after release. For example, if the transaction has X\(table\) and attempts to release IX\(database\), the appropriate exception must be thrown \(see comments above method\). 19 | * [`promote`](https://github.com/berkeley-cs186/sp21-rookiedb/blob/master/src/main/java/edu/berkeley/cs186/database/concurrency/LockContext.java#L119-L138): this method performs a lock promotion via the underlying `LockManager` after ensuring that all multigranularity constraints are met. For example, if the transaction has IS\(database\) and requests a promotion from S\(table\) to X\(table\), the appropriate exception must be thrown \(see comments above method\). In the special case of promotion to SIX \(from IS/IX/S\), you should simultaneously release all descendant locks of type S/IS, since we disallow having IS/S locks on descendants when a SIX lock is held. You should also disallow promotion to a SIX lock if an ancestor has SIX, because this would be redundant. 20 | 21 | **Note**: this does still allow for SIX locks to be held under a SIX lock, in the case of promoting an ancestor to SIX while a descendant holds SIX. This is redundant, but fixing it is both messy \(have to swap all descendant SIX locks with IX locks\) and pointless \(you still hold a lock on the descendant anyways\), so we just leave it as is. 
22 | 23 | * [`escalate`](https://github.com/berkeley-cs186/sp21-rookiedb/blob/master/src/main/java/edu/berkeley/cs186/database/concurrency/LockContext.java#L145-L178): this method performs lock escalation up to the current level \(see below for more details\). Since interleaving of multiple `LockManager` calls by multiple transactions \(running on different threads\) is allowed, you must make sure to only use one mutating call to the `LockManager` and only request information about the current transaction from the `LockManager` \(since information pertaining to any other transaction may change between the querying and the acquiring\). 24 | * [`getExplicitLockType`](https://github.com/berkeley-cs186/sp21-rookiedb/blob/master/src/main/java/edu/berkeley/cs186/database/concurrency/LockContext.java#L184-L188): this method returns the type of the lock explicitly held at the current level. For example, if a transaction has X\(db\), `dbContext.getExplicitLockType(transaction)` should return X, but `tableContext.getExplicitLockType(transaction)` should return NL \(no lock explicitly held\). 25 | * [`getEffectiveLockType`](https://github.com/berkeley-cs186/sp21-rookiedb/blob/master/src/main/java/edu/berkeley/cs186/database/concurrency/LockContext.java#L194-L200): this method returns the type of the lock either implicitly or explicitly held at the current level. For example, if a transaction has X\(db\): 26 | 27 | * `dbContext.getEffectiveLockType(transaction)` should return X 28 | * `tableContext.getEffectiveLockType(transaction)` should _also_ return X \(since we implicitly have an X lock on every table due to explicitly having an X lock on the entire database\). 
29 | 30 | Since an intent lock does _not_ implicitly grant lock-acquiring privileges to lower levels, if a transaction only has SIX\(database\), `tableContext.getEffectiveLockType(transaction)` should return S \(not SIX\), since the transaction implicitly has S on table via the SIX lock, but not the IX part of the SIX lock \(which is only available at the database level\). It is possible for the explicit lock type to be one type, and the effective lock type to be a different lock type, specifically if an ancestor has a SIX lock. 31 | 32 | **Hierarchy** 33 | 34 | The `LockContext` objects all share a single underlying `LockManager` object. The `parentContext` method returns the parent of the current context \(e.g. the lock context of the database is returned when `tableContext.parentContext()` is called\), and the `childContext` method returns the child lock context with the name passed in \(e.g. `tableContext.childContext(0L)` returns the context of page 0 of the table\). There is exactly one `LockContext` for each resource: calling `childContext` with the same parameters multiple times returns the same object. 35 | 36 | The provided code already initializes this tree of lock contexts for you. For performance reasons, however, we do not create lock contexts for every page of a table immediately. Instead, we create them as the corresponding `Page` objects are created. 37 | 38 | #### Escalation 39 | 40 | Lock escalation is the process of going from many fine locks \(locks at lower levels in the hierarchy\) to a single coarser lock \(lock at a higher level\). For example, we can escalate many page locks a transaction holds into a single lock at the table level. 41 | 42 | We perform lock escalation through `LockContext#escalate`. A call to this method should be interpreted as a request to escalate all locks on descendants \(these are the fine locks\) into one lock on the context `escalate` was called with \(the coarse lock\). 
The fine locks may be any mix of intent and regular locks, but we limit the coarse lock to be either S or X. 43 | 44 | For example, if we have the following locks: IX\(database\), SIX\(table\), X\(page 1\), X\(page 2\), X\(page 4\), and call `tableContext.escalate(transaction)`, we should replace the page-level locks with a single lock on the table that encompasses them: 45 | 46 | ![](../../.gitbook/assets/proj4-escalate1%20%283%29%20%283%29%20%283%29%20%281%29.png) 47 | 48 | Likewise, if we called `dbContext.escalate(transaction)`, we should replace the page-level locks and table-level locks with a single lock on the database that encompasses them: 49 | 50 | ![](../../.gitbook/assets/proj4-escalate2%20%281%29%20%281%29%20%282%29%20%282%29%20%283%29%20%283%29.png) 51 | 52 | Note that escalating to an X lock always "works" in this regard: having a coarse X lock definitely encompasses having a bunch of finer locks. However, this introduces other complications: if the transaction previously held only finer S locks, it would not have the IX locks required to hold an X lock, and escalating to an X reduces the amount of concurrency allowed unnecessarily. We therefore require that `escalate` only escalate to the least permissive lock type \(between either S or X\) that still encompasses the replaced finer locks \(so if we only had IS/S locks, we should escalate to S, not X\). 53 | 54 | Also note that since we are only escalating to S or X, a transaction that only has IS\(database\) would escalate to S\(database\). Though a transaction that only has IS\(database\) technically has no locks at lower levels, the only point in keeping an intent lock at this level would be to acquire a normal lock at a lower level, and the point in escalating is to avoid having locks at a lower level. Therefore, we don't allow escalating to intent locks \(IS/IX/SIX\). 
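The choice between S and X described above can be sketched as follows. This is a standalone illustration, not the project's `LockContext#escalate`: `EscalateSketch`, its local `LockType` enum, and `escalatedType` are hypothetical names, and the input is assumed to be the non-empty list of lock types the transaction holds on the context and its descendants.

```java
import java.util.List;

// Hypothetical helper; not part of the RookieDB skeleton code.
class EscalateSketch {
    // Local stand-in for the project's lock types.
    enum LockType { NL, IS, IX, S, SIX, X }

    // Returns S if every held lock only grants or intends read access
    // (NL/IS/S), otherwise X. Intent locks are never returned.
    // Assumption: `held` covers the context itself and all of its descendants.
    static LockType escalatedType(List<LockType> held) {
        for (LockType t : held) {
            if (t == LockType.IX || t == LockType.SIX || t == LockType.X) {
                return LockType.X; // some write permission must be preserved
            }
        }
        return LockType.S; // read permission is enough
    }
}
```

With SIX\(table\) and several X\(page\) locks the sketch picks X, while a transaction holding only IS and S locks \(or only an IS lock\) gets S, which matches the "least permissive" rule in the text.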
55 | 56 | ### Task 4: LockUtil 57 | 58 | The LockContext class enforces multigranularity constraints for us, but it's a bit cumbersome to use in our database: wherever we want to request some locks, we have to handle requesting the appropriate intent locks, etc. 59 | 60 | To simplify integrating locking into our codebase \(the second half of this part\), we define the `ensureSufficientLockHeld` method. This method is used like a declarative statement. For example, let's say we have some code that reads an entire table. To add locking, we can do: 61 | 62 | ```java 63 | LockUtil.ensureSufficientLockHeld(tableContext, LockType.S); 64 | 65 | // any code that reads the table here 66 | ``` 67 | 68 | After the `ensureSufficientLockHeld` line, we can assume that the current transaction \(the transaction returned by `Transaction.getTransaction()`\) has permission to read the resource represented by `tableContext`, as well as any children \(all the pages\). 69 | 70 | We can call it several times in a row: 71 | 72 | ```java 73 | LockUtil.ensureSufficientLockHeld(tableContext, LockType.S); 74 | LockUtil.ensureSufficientLockHeld(tableContext, LockType.S); 75 | 76 | // any code that reads the table here 77 | ``` 78 | 79 | or write several statements in any order: 80 | 81 | ```java 82 | LockUtil.ensureSufficientLockHeld(pageContext, LockType.S); 83 | LockUtil.ensureSufficientLockHeld(tableContext, LockType.S); 84 | LockUtil.ensureSufficientLockHeld(pageContext, LockType.S); 85 | 86 | // any code that reads the table here 87 | ``` 88 | 89 | and no errors should be thrown, and at the end of the calls, we should be able to read all of the table. 90 | 91 | Note that the caller does not care exactly which locks the transaction actually has: if we gave the transaction an X lock on the database, the transaction would indeed have permission to read all of the table. 
But this doesn't allow for much concurrency \(and actually enforces a serial schedule if used with 2PL\), so we additionally stipulate that `ensureSufficientLockHeld` should grant as little additional permission as possible: if an S lock suffices, we should have the transaction acquire an S lock, not an X lock, but if the transaction already has an X lock, we should leave it alone \(`ensureSufficientLockHeld` should never reduce the permissions a transaction has; it should always let the transaction do at least as much as it used to, before the call\). 92 | 93 | We suggest breaking up the logic of this method into two phases: ensuring that we have the appropriate locks on ancestors, and acquiring the lock on the resource. You will need to promote in some cases, and escalate in some cases \(these cases are not mutually exclusive\). 94 | 95 | ### Task 5: Two-Phase Locking 96 | 97 | At this point, you should have a working system to acquire and release locks on different resources in the database. In this task you'll add logic to acquire and release locks throughout the course of a transaction. 98 | 99 | #### Acquisition Phase 100 | 101 | **Reads and Writes:** The simplest scheme for locking is to simply lock pages as we need them. As all reads and writes to pages are performed via the `Page.PageBuffer` class, it suffices to change only that. Modify the [`get`](https://github.com/berkeley-cs186/sp21-rookiedb/blob/master/src/main/java/edu/berkeley/cs186/database/memory/Page.java#L207-L208) and [`put`](https://github.com/berkeley-cs186/sp21-rookiedb/blob/master/src/main/java/edu/berkeley/cs186/database/memory/Page.java#L223-L224) methods of [`Page.PageBuffer`](https://github.com/berkeley-cs186/sp21-rookiedb/blob/master/src/main/java/edu/berkeley/cs186/database/memory/Page.java#L185) to lock the page \(and acquire locks up the hierarchy as needed\) with the least permissive lock types possible. 
102 | 103 | **Scans**: If we know we'll be scanning multiple pages of a table, we're better off just getting a single lock on the table instead of many fine-grained locks on the table's pages. Modify the `ridIterator` and `recordIterator` methods to acquire an appropriate lock on the table before doing a scan. 104 | 105 | **Write Optimization:** When we modify a page, we'll almost always end up reading it first \(acquiring IS/S locks\) and then writing back our updates to it afterwards \(promoting to IX/X locks\). If we know ahead of time that we're going to modify a page, we can skip the IS/S locks altogether by just acquiring IX/X locks to begin with. Modify the following methods to request the appropriate lock upfront: 106 | 107 | * [`PageDirectory#getPageWithSpace`](https://github.com/berkeley-cs186/sp21-rookiedb/blob/master/src/main/java/edu/berkeley/cs186/database/table/PageDirectory.java#L121-L122) 108 | * [`Table#updateRecord`](https://github.com/berkeley-cs186/sp21-rookiedb/blob/master/src/main/java/edu/berkeley/cs186/database/table/Table.java#L318-L319) 109 | * [`Table#deleteRecord`](https://github.com/berkeley-cs186/sp21-rookiedb/blob/master/src/main/java/edu/berkeley/cs186/database/table/Table.java#L345-L346) 110 | 111 | Note: no additional tests will pass after doing this; see the next section for why. 112 | 113 | #### Release Phase 114 | 115 | At this point, transactions should be acquiring lots of locks needed to do their queries, but no locks are ever released! We will be using Strict Two-Phase Locking in our database, which means that lock releases only happen when the transaction finishes, in the `cleanup` method. 116 | 117 | Modify the `close` method of `Database.TransactionContextImpl` to release all locks the transaction acquired. 
You should only use `LockContext#release` and not `LockManager#release` - `LockManager` will not verify multigranularity constraints, but other transactions running at the same time assume that these constraints are met, so you do want these constraints to be maintained. Note that you can't just release the locks in any order! Think about in what order you are allowed to release the locks. 118 | 119 | You should pass all the tests in [`TestDatabaseDeadlockPrecheck`](https://github.com/berkeley-cs186/sp21-rookiedb/blob/master/src/test/java/edu/berkeley/cs186/database/TestDatabaseDeadlockPrecheck.java) and [`TestDatabase2PL`](https://github.com/berkeley-cs186/sp21-rookiedb/blob/master/src/test/java/edu/berkeley/cs186/database/TestDatabase2PL.java) after implementing the acquisition and release phases. 120 | 121 | ## **Additional Notes** 122 | 123 | After this, you should pass all the tests we have provided to you under `database.concurrency.*`, as well as the tests in `TestDatabaseDeadlockPrecheck` and `TestDatabase2PL`. 124 | 125 | Note that you may **not** modify the signature of any methods or classes that we provide to you, but you're free to add helper methods. Also, you should only modify code in the `concurrency` directory for this section. 126 | 127 | -------------------------------------------------------------------------------- /project-handout/proj4/skeleton-code.md: -------------------------------------------------------------------------------- 1 | # Part 0: Skeleton Code 2 | 3 | ![Data x-ray](../../.gitbook/assets/dataxray.png) 4 | 5 | Read through all of the code in the `concurrency` directory \(including the classes that you are not touching - they may contain useful methods or information pertinent to the project\). Many comments contain critical information on how you must implement certain functions. 
6 | 7 | Try to understand how each class fits in: what each class is responsible for, which methods you have to implement, and how each one manipulates the internal state. Trying to code one method at a time without understanding how all the parts of the lock manager work often results in having to rewrite significant amounts of code. 8 | 9 | ## Layers 10 | 11 | ![](../../.gitbook/assets/proj4-layers.png) 12 | 13 | The skeleton code divides multigranularity locking into three layers. 14 | 15 | * The `LockManager` object manages all the locks, treating each resource as independent \(it doesn't consider the resource hierarchy at all\). This level is responsible for queuing logic and for blocking/unblocking transactions as necessary, and is the single source of authority on whether a transaction has a certain lock. If the `LockManager` says T1 has X\(database\), then T1 has X\(database\). 16 | * A collection of `LockContext` objects, which each represent a single lockable object \(e.g. a page or a table\), lies on top of the `LockManager`. The `LockContext` objects are connected according to the hierarchy \(e.g. a `LockContext` for a table has the database context as its parent, and its pages' contexts as children\). The `LockContext` objects all share a single `LockManager`, and each context enforces multigranularity constraints on its methods \(e.g. an exception will be thrown if a transaction attempts to request X\(table\) without IX\(database\)\). 17 | * A declarative layer lies on top of the collection of `LockContext` objects, and is responsible for acquiring all the intent locks needed for each S or X request that the database uses \(e.g. if S\(page\) is requested, this layer would be responsible for requesting IS\(database\), IS\(table\) if necessary\). 18 | 19 | In Part 1, you will be implementing the bottom layer \(`LockManager`\) and lock types. 
In Part 2, you will be implementing the middle and top layer \(`LockContext` and `LockUtil`\), and integrate your changes into the database. 20 | 21 | 22 | 23 | -------------------------------------------------------------------------------- /project-handout/proj4/submitting-the-assignment.md: -------------------------------------------------------------------------------- 1 | # Submitting the Assignment 2 | 3 | ## Files 4 | 5 | You may **not** modify the signature of any methods or classes that we provide to you, but you're free to add helper methods. 6 | 7 | You should make sure that all code you modify belongs to files with `TODO(proj4_part1)` and `TODO(proj4_part2)` comments in them \(e.g. don't add helper methods to DataBox\). A full list of files that you may modify follows: 8 | 9 | * `src/main/java/edu/berkeley/cs186/database/concurrency/LockType.java` 10 | * `src/main/java/edu/berkeley/cs186/database/concurrency/LockManager.java` 11 | * `src/main/java/edu/berkeley/cs186/database/concurrency/LockContext.java` \(Part 2 only\) 12 | * `src/main/java/edu/berkeley/cs186/database/concurrency/LockUtil.java` \(Part 2 only\) 13 | * `src/main/java/edu/berkeley/cs186/database/table/Table.java` \(Part 2 only\) 14 | * `src/main/java/edu/berkeley/cs186/database/table/PageDirectory.java` \(Part 2 only\) 15 | * `src/main/java/edu/berkeley/cs186/database/memory/Page.java` \(Part 2 only\) 16 | * `src/main/java/edu/berkeley/cs186/database/Database.java` \(Part 2 only\) 17 | 18 | Make sure that your code does _not_ use any static \(non-final\) variables - this may cause odd behavior when running with maven vs. in your IDE \(tests run through the IDE often run with a new instance of Java for each test, so the static variables get reset, but multiple tests per Java instance may be run when using maven, where static variables _do not_ get reset\). 
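To see why static state misbehaves when several tests share one Java instance, consider this standalone sketch. `Counter` and its fields are hypothetical and unrelated to the project code; the two `runTest` calls stand in for two tests executed back to back in the same JVM.

```java
// Hypothetical example class; not part of the RookieDB skeleton code.
class Counter {
    static int sharedCount = 0; // static: survives across "tests" in one JVM
    int ownCount = 0;           // instance field: fresh for every new object

    // Stand-in for a test body: create a fresh object and bump both counters.
    // Returns {static value, instance value} as observed by this "test".
    static int[] runTest() {
        Counter c = new Counter();
        sharedCount++;
        c.ownCount++;
        return new int[] { sharedCount, c.ownCount };
    }
}
```

The first call observes `{1, 1}`, but the second observes `{2, 1}`: the static field was never reset, while the instance field starts from zero each time. A test that assumes a clean static value would pass when run alone and fail when run after another test.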
19 | 20 | ## Gradescope 21 | 22 | Once all of your files are prepared in your repo you can submit to Gradescope through GitHub the same way you did for [Project 0](../proj0/submitting.md#pushing-changes-to-github-classroom). 23 | 24 | ### Submitting via upload 25 | 26 | If your GitHub account has access to many repos, the Gradescope UI might time out while trying to load which repos you have available. If this is the case for you, you can submit your code directly via upload. You can zip up your source code with `zip -r submission.zip src/` and submit that directly to the autograder. 27 | 28 | ## Partners 29 | 30 | If you haven't already, make sure [this form](https://docs.google.com/forms/d/e/1FAIpQLSeqB2F-aXGiORqNlOFuZjtknE_ydTI9yqznaYdI3V2ZCj2WKw/viewform?usp=sf_link) has the correct partner information so we know who you're working with. **We're no longer allowing partner submissions on Gradescope**. If you worked with a partner you'll both need to submit on your own \(you're still free to submit identical code if you like though\). Slip days will be deducted individually. For example: You submit on time, but your partner submits a day late. Your partner will have to use a slip day or will receive a late penalty on the project \(but you will not\). 31 | 32 | ## Grading 33 | 34 | * Your submission for Part 1 will be worth 20% of your Project 4 grade and will come entirely from the Part 1 public tests. 35 | * Your submission for Part 2 will be worth 80% of your Project 4 grade. 36 | * 5% for Part 1 hidden tests 37 | * 60% for Part 2 public tests 38 | * 15% for Part 2 hidden tests 39 | 40 | Your Part 2 submission will be used to run all hidden tests \(and public Part 2 tests\). If you do not have a submission for Part 2, you will receive a 0% on all hidden tests for Part 1 and 2. You may go back and make changes to Part 1 even after that part's deadline has passed before submitting to Part 2. 
41 | 42 | -------------------------------------------------------------------------------- /project-handout/proj4/testing.md: -------------------------------------------------------------------------------- 1 | # Testing 2 | 3 | We strongly encourage testing your code yourself, especially after each part \(rather than all at the end\). The given tests for this project \(even more so than previous projects\) are **not** comprehensive tests: it **is** possible to write incorrect code that passes them all. 4 | 5 | Things that you might consider testing for include: anything that we specify in the comments or in this document that a method should do that you don't see a test already testing for, and any edge cases that you can think of. Think of what valid inputs might break your code and cause it not to perform as intended, and add a test to make sure things are working. 6 | 7 | To help you get started, here is one case that is **not** in the given tests \(and will be included in the hidden tests\): if a transaction holds IX\(database\), IS\(table\), S\(page\) and promotes the database lock to a SIX lock via `LockContext#promote`, `numChildLocks` should be updated to be 0 for both the database and table contexts. 8 | 9 | To add a unit test, open up the appropriate test file \(all test files are located in `src/test/java/edu/berkeley/cs186/database` or subdirectories of it\), and simply add a new method to the test class, for example: 10 | 11 | ```text 12 | @Test 13 | public void testPromoteSIXSaturation() { 14 | // your test code here 15 | } 16 | ``` 17 | 18 | Many test classes have some setup code done for you already: take a look at other tests in the file for an idea of how to write the test code. 
19 | 20 | -------------------------------------------------------------------------------- /project-handout/proj5.md: -------------------------------------------------------------------------------- 1 | # Project 5: Recovery 2 | 3 | This assignment will be released on **Saturday, 4/10/2021**. 4 | 5 | -------------------------------------------------------------------------------- /project-handout/proj5/README.md: -------------------------------------------------------------------------------- 1 | # Project 5: Recovery 2 | 3 | -------------------------------------------------------------------------------- /project-handout/proj5/getting-started.md: -------------------------------------------------------------------------------- 1 | # Getting Started 2 | 3 | ## Logistics 4 | 5 | This project is due **Friday, 4/23/2021 at 11:59PM PDT (GMT-7)**. It is worth 8% of your overall grade in the class. The workload for the project is designed to be completed solo, but this semester we're allowing students to work on this project with a partner if you want to. Feel free to search for a partner on [this Piazza thread](https://piazza.com/class/kjoxqrf1eq04mr?cid=5)! 6 | 7 | ## Prerequisites 8 | 9 | You should watch all the recovery lectures before starting this project. We also highly recommend reviewing the [recovery notes](https://cs186berkeley.net/resources/static/notes/n12-Recovery.pdf). 10 | 11 | ## Fetching the released code 12 | 13 | Complete [this form](https://docs.google.com/forms/d/e/1FAIpQLSdOQsgqO6cNzxB4A7q7O2V4hv4q0Ncl_OGVzQcX3lFWTH-nQQ/viewform?usp=sf_link) to get a GitHub Classroom link. Once your private repo is set up, clone the Project 5 skeleton code onto your local machine. 14 | 15 | ### Setting up your local development environment 16 | 17 | If you're using IntelliJ you can follow the instructions [in Project 0](../proj0/getting-started.md#setting-up-your-local-development-environment) to set up your local environment again. 
Once you have your environment set up you can head to the next section [Your Tasks](your-tasks.md) and begin working on the assignment. 18 | 19 | ## Adding a partner 20 | 21 | Once you've found a partner fill out [**this form**](https://docs.google.com/forms/d/e/1FAIpQLSdOQsgqO6cNzxB4A7q7O2V4hv4q0Ncl_OGVzQcX3lFWTH-nQQ/viewform?usp=sf_link) so we know who you're working with. If you want to share code over GitHub you can follow the instructions [here](../../common/adding-a-partner-on-github.md). 22 | 23 | ## Debugging Issues with GitHub Classroom 24 | 25 | Feel free to skip this section if you don't have any issues with GitHub Classroom. If you are having issues \(i.e. the page froze or some error message appeared\), first check if you have access to your repo at `https://github.com/berkeley-cs186-student/sp21-proj5-username`, replacing `username` with your GitHub username. If you have access to your repo and the starter code is there, then you can proceed as usual. If you have access to your repo but the starter code is not there, run the following commands in a terminal \(again replacing `username` with your GitHub username\): 26 | 27 | ```text 28 | git clone https://github.com/berkeley-cs186/sp21-rookiedb sp21-proj5 29 | cd sp21-proj5/ 30 | git remote remove origin 31 | git remote add origin https://github.com/berkeley-cs186-student/sp21-proj5-username.git 32 | git push -u origin master 33 | ``` 34 | 35 | Then, you can proceed as usual. 36 | 37 | ### 404 Not Found 38 | 39 | If you're getting a 404 not found page when trying to access your repo, make sure you've set up your repo using the GitHub Classroom link in the Project 2 release post on [Piazza](https://piazza.com/class/kjoxqrf1eq04mr). 40 | 41 | If you don't have access to your repo at all after following these steps, feel free to contact the course staff on Piazza. 
42 | 43 | -------------------------------------------------------------------------------- /project-handout/proj5/submitting-the-assignment.md: -------------------------------------------------------------------------------- 1 | # Submitting the Assignment 2 | 3 | ## Files 4 | 5 | You may **not** modify the signature of any methods or classes that we provide to you, but you're free to add helper methods. 6 | 7 | You should make sure that all code you modify belongs to files with `TODO(proj5)` comments in them \(e.g. don't add helper methods to DataBox\). A full list of files that you may modify follows: 8 | 9 | * `src/main/java/edu/berkeley/cs186/database/recovery/ARIESRecoveryManager.java` 10 | 11 | Make sure that your code does _not_ use any static \(non-final\) variables - this may cause odd behavior when running with maven vs. in your IDE \(tests run through the IDE often run with a new instance of Java for each test, so the static variables get reset, but multiple tests per Java instance may be run when using maven, where static variables _do not_ get reset\). 12 | 13 | ## Gradescope 14 | 15 | Once all of your files are prepared in your repo you can submit to Gradescope through GitHub the same way you did for [Project 0](../proj0/submitting.md#pushing-changes-to-github-classroom). 16 | 17 | ## Submitting via upload 18 | 19 | If your GitHub account has access to many repos, the Gradescope UI might time out while trying to load which repos you have available. If this is the case for you, you can submit your code directly via upload. You can zip up your source code with `zip -r submission.zip src/` and submit that directly to the autograder. 20 | 21 | ## Partners 22 | 23 | If you haven't already, be sure to fill out [this form](https://docs.google.com/forms/d/e/1FAIpQLSdOQsgqO6cNzxB4A7q7O2V4hv4q0Ncl_OGVzQcX3lFWTH-nQQ/viewform?usp=sf_link) so we know who you're working with. 
Every student is responsible for submitting to Gradescope individually -- if you submit but your partner doesn't, then your partner will not get credit. If you worked off of a shared repo, both members of the group are free to submit that repo. Slip days will be deducted individually. For example: You submit on time, but your partner submits a day late. Your partner will have to use a slip day or will receive a late penalty on the project \(but you will not\). 24 | 25 | ## Grading 26 | 27 | * 60% of your grade will be made up of tests released to you \(the tests that we provided in `database.recovery.TestRecoveryManager`\). 28 | * 40% of your grade will be made up of hidden, unreleased tests that we will run on your submission after the deadline. 29 | 30 | -------------------------------------------------------------------------------- /project-handout/proj5/testing.md: -------------------------------------------------------------------------------- 1 | # Testing 2 | 3 | We strongly encourage testing your code yourself, especially after each part \(rather than all at the end\). The given tests for this project \(even more so than previous projects\) are **not** comprehensive tests: it **is** possible to write incorrect code that passes them all. 4 | 5 | Things that you might consider testing for include: anything that we specify in the comments or in this document that a method should do that you don't see a test already testing for, and any edge cases that you can think of. Think of what valid inputs might break your code and cause it not to perform as intended, and add a test to make sure things are working. 6 | 7 | ## Running tests with coverage 8 | 9 | To find cases that you've accounted for in your implementation but are not being covered in your tests, you can run all of the Project 5 tests [with coverage](https://www.jetbrains.com/help/idea/code-coverage.html).
Afterwards, you can navigate to your ARIESRecoveryManager file to see what parts of your code are not yet tested. 10 | 11 | ## Cases not covered in the public tests 12 | 13 | Here are a few cases mentioned in the spec but not tested for in the public test set: 14 | 15 | * The checkpoint test provided checkpoints with a small number of transaction table and dirty page table entries -- enough to fit within 1 to 2 pages. Make sure your code still works even when there's a large number of entries. If the entries aren't split up properly and too many entries are inserted into a single EndCheckpointLogRecord, your code will fail to flush the entry for exceeding the log tail size. 16 | * For appropriate transactions after analysis/undo, make sure that transactions [have been cleaned up](https://github.com/berkeley-cs186/sp21-rookiedb/blob/master/src/test/java/edu/berkeley/cs186/database/recovery/DummyTransaction.java#L30) \(calling cleanup\(\) on a transaction should set that flag\). 17 | 18 | And here are two common cases that your code should be prepared to handle: 19 | 20 | * Make sure your redo logic still works without error even if there are no entries in the reconstructed dirty page table after analysis. 21 | * Make sure your undo logic still works without error even if there are no transactions that need to be undone after analysis. 22 | 23 | ## Writing your own tests 24 | 25 | You can use or modify any of the functions we provided in the public test set to write your own tests.
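The checkpoint case above boils down to splitting table entries into batches that each fit in one log record. The following is a minimal, self-contained sketch of that batching idea only; it does not use the project's classes. The class name `CheckpointBatchingSketch`, the method `splitIntoBatches`, and the limit of 2 entries per record are all hypothetical, chosen purely for illustration, and plain `Map<Long, Long>` objects stand in for the real tables:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class CheckpointBatchingSketch {
    // Hypothetical limit for illustration; the real limit depends on
    // the size of a log record and the size of each entry.
    static final int MAX_ENTRIES_PER_RECORD = 2;

    // Splits entries into consecutive batches of at most maxPerRecord
    // entries each, preserving iteration order.
    static List<Map<Long, Long>> splitIntoBatches(Map<Long, Long> entries, int maxPerRecord) {
        List<Map<Long, Long>> batches = new ArrayList<>();
        Map<Long, Long> current = new LinkedHashMap<>();
        for (Map.Entry<Long, Long> entry : entries.entrySet()) {
            // The current batch is full: close it and start a new one.
            if (current.size() == maxPerRecord) {
                batches.add(current);
                current = new LinkedHashMap<>();
            }
            current.put(entry.getKey(), entry.getValue());
        }
        // Keep the final, possibly partial, batch.
        if (!current.isEmpty()) {
            batches.add(current);
        }
        return batches;
    }

    public static void main(String[] args) {
        // Stand-in for a dirty page table: page number -> recLSN.
        Map<Long, Long> dirtyPages = new LinkedHashMap<>();
        for (long page = 1; page <= 5; page++) {
            dirtyPages.put(page, page * 100);
        }
        List<Map<Long, Long>> batches = splitIntoBatches(dirtyPages, MAX_ENTRIES_PER_RECORD);
        System.out.println(batches.size() + " batches"); // prints "3 batches"
        for (Map<Long, Long> batch : batches) {
            System.out.println(batch);
        }
    }
}
```

A test along these lines would fill the tables with more entries than fit in one record and then check that several end-checkpoint records are written, rather than one oversized record.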
26 | 27 | ### Setup 28 | 29 | ```java 30 | @Before 31 | public void setup() throws IOException { 32 | testDir = tempFolder.newFolder("test-dir").getAbsolutePath(); 33 | recoveryManager = loadRecoveryManager(testDir); 34 | DummyTransaction.cleanupTransactions(); 35 | LogRecord.onRedoHandler(t -> { 36 | }); 37 | } 38 | ``` 39 | 40 | The function above is run before every single test, and sets the value of the `recoveryManager` private variable to a new RecoveryManager object that operates on files in the `"test-dir"` directory \(locally this directory will be generated and likely cleaned up every time you run the test wherever JUnit is configured to create temporary directories\). The recovery manager object created will use a dummy locking system to avoid any dependency on Project 4, and 32 pages of memory in its buffer manager. 41 | 42 | ### Getting useful objects 43 | 44 | The following variables of the RecoveryManager can be used for testing purposes: 45 | 46 | * `bufferManager` - Useful if you want to manually run updates using records \(argument to LogRecord.redo\) 47 | * `diskSpaceManager` - Useful if you want to manually run updates using records \(argument to LogRecord.redo\) 48 | * `logManager` - Useful to directly append and flush logs to see how the recovery manager deals with them when rolling back. See `testAbortingEnd` for an example. 49 | * `dirtyPageTable` - Useful to make sure that pages are getting flushed properly and that recLSNs are set correctly, or to check that it's reconstructed properly during analysis. 50 | * `transactionTable` - Useful to make sure that entries are created/removed properly, or to check that it's reconstructed properly during analysis. 51 | 52 | ### Redo checks 53 | 54 | You may have noticed calls to `setupRedoChecks` and `finishRedoChecks`.
To help with testing, every time redo is called on a LogRecord we make a [call to a provided method](https://github.com/berkeley-cs186/sp21-rookiedb/blob/master/src/main/java/edu/berkeley/cs186/database/recovery/LogRecord.java#L156). During regular operation this will just be a no-op function, but during testing we can set this to be whatever we want using [onRedoHandler](https://github.com/berkeley-cs186/sp21-rookiedb/blob/master/src/main/java/edu/berkeley/cs186/database/recovery/LogRecord.java#L225-L232). 55 | 56 | To make it more straightforward to do a series of checks, setupRedoChecks accepts a list of functional objects that take a LogRecord as an argument. Every time redo is called, the first function in the list is removed and called with the LogRecord that was redone. For example, in [testAbortingEnd](https://github.com/berkeley-cs186/sp21-rookiedb/blob/master/src/test/java/edu/berkeley/cs186/database/recovery/TestForwardProcessing.java#L159-L163) we use this to check that the expected CLRs are emitted and redone in the order that we anticipated. This is useful in two situations: 57 | 58 | * when ending an aborted transaction, rolling back changes should involve calling redo on CLRs as they are generated 59 | * during the undo phase, rolling back changes should involve calling redo on CLRs as they are generated 60 | 61 | -------------------------------------------------------------------------------- /project-handout/proj6.md: -------------------------------------------------------------------------------- 1 | # Project 6: NoSQL 2 | 3 | This assignment will be released on **Saturday, 4/24/2021**.
4 | 5 | -------------------------------------------------------------------------------- /project-handout/proj6/README.md: -------------------------------------------------------------------------------- 1 | # Project 6: NoSQL 2 | 3 | -------------------------------------------------------------------------------- /project-handout/proj6/getting-started.md: -------------------------------------------------------------------------------- 1 | # Getting Started 2 | 3 | ## Logistics 4 | 5 | This project is due **Sunday, 5/2/2021 at 11:59PM PDT (GMT-7)**. It is worth 5% of your overall grade in the class. 100% of your grade will come from the public tests released with the data set. 6 | 7 | Like Project 1, this project **must be completed individually.** Note that while this means we expect you to write all your queries on your own, all of the following are **permitted** under our academic integrity guidelines: 8 | 9 | * Discussion of approaches for solving a problem. 10 | * Giving away or receiving conceptual ideas towards a problem solution. 11 | * Discussion of specific syntax issues and bugs in your code. 12 | * Looking at another student's code for the sole purpose of helping that student debug. 13 | * Using small snippets of code that you find online for solving tiny problems \(e.g. Googling “number to string mongo” may lead you to some sample code that you copy and paste into your solution\). Such code should always be cited with relevant code comments. 14 | 15 | ## Prerequisites 16 | 17 | There are no hard prerequisites for this project. The spec will walk you through writing a query in Mongo's syntax. However, you may find that watching the NoSQL lectures \(after they're released\) helps contextualize how Mongo differs from the traditional SQL databases we've been working with for the majority of this semester.
18 | 19 | ## Fetching the released code 20 | 21 | The GitHub Classroom link for this project is in the Project 6 release post on [Piazza](https://piazza.com/class/$piazza-link$). Once your private repo is set up, clone the Project 6 skeleton code onto your local machine. 22 | 23 | ## Required Software 24 | 25 | ### MongoDB v4.4 26 | 27 | We'll be exploring the document-oriented database [MongoDB](https://en.wikipedia.org/wiki/MongoDB) in this project. Check if you already have a copy installed by running `mongo --version` in a terminal. If you already have it installed, you should see output similar to the following: 28 | 29 | ```text 30 | > mongo --version 31 | MongoDB shell version v4.4.1 32 | Build Info: ... 33 | ``` 34 | 35 | If you don't already have MongoDB on your machine, follow the instructions for your platform: 36 | 37 | #### Windows 38 | 39 | Follow the instructions [here](https://docs.mongodb.com/manual/tutorial/install-mongodb-on-windows/) to install MongoDB on Windows. You'll also need to install Database Tools from [here](https://docs.mongodb.com/database-tools/installation/installation-windows/). 40 | 41 | Once you have everything installed, you'll want to locate the mongo shell and mongoimport binaries. Confirm that the binaries are at the following locations: 42 | 43 | * The mongo shell binary \(mongo.exe\) should be located in `C:\Program Files\MongoDB\Server\4.4\bin\`. 44 | * The mongoimport binary \(mongoimport.exe\) should be located in `C:\Program Files\MongoDB\Tools\100\bin\`. If you can't find it at that exact location, check other directories under `C:\Program Files\MongoDB\Tools\`. If your file has a long name like `windows-x86-64-bit-mongoimport.exe`, then rename it to just `mongoimport.exe`. 45 | 46 | Add the two directories to your PATH. To edit your environment variables on Windows 10, use the following steps: 47 | 48 | 1. Open up search and type in "Edit the system environment variables" 49 | 2. Open that up and click "Environment Variables..." near the bottom 50 | 3.
Click "Path" under user variables \(top half of the screen\) and click edit 51 | 4. Click "New" on the top right and add `C:\Program Files\MongoDB\Server\4.4\bin\` 52 | 5. Repeat the same process for database tools with the appropriate PATH 53 | 54 | If everything went well, you should be able to run `mongo --version` and `mongoimport --version` in Git Bash, or `mongo.exe --version` and `mongoimport.exe --version` in other shells. 55 | 56 | #### MacOS 57 | 58 | If you don't already have it, install [Homebrew](https://brew.sh/). Then, in a terminal, run the following: 59 | 60 | ```text 61 | brew tap mongodb/brew 62 | brew install mongodb-community@4.4 63 | brew services start mongodb-community@4.4 64 | ``` 65 | 66 | If you run into a CompilerSelectionError, run `xcode-select --install` and repeat the commands above. Check that everything is installed by running `mongo --version` and `mongoimport --version`. If both of these commands work then you should be good to go. 67 | 68 | If your version of macOS can't support Mongo 4.4 then we recommend you use the Docker approach to complete this assignment. Otherwise, if it doesn't support Docker either then you can use Mongo 4.2 instead. This will work the same as 4.4 except the following functions recommended in the spec will not be available, so you'll have to modify the advice slightly: 69 | 70 | * `{$first: }` will have to be replaced with `{ $arrayElemAt: [ , 0 ]}` 71 | * `$isNumber` will have to be rewritten as an or statement based on the field's [type](https://docs.mongodb.com/manual/reference/operator/aggregation/type/) 72 | 73 | #### Linux 74 | 75 | Follow the instructions [here](https://docs.mongodb.com/manual/administration/install-on-linux/) for your appropriate platform.
If something breaks during the installation process and you can't run `mongo` and `mongoimport --version`, follow the instructions in the next section \(Docker\) to get a docker container with mongo and python pre-installed. 76 | 77 | #### Docker 78 | 79 | If you're on MacOS/Linux and ran into issues with installing mongo directly on your host machine, we recommend using a [docker container](https://www.docker.com/resources/what-container) with mongo and python pre-installed instead. To use our Docker image you'll need to install Docker Community Edition \("CE"\) on your machine. 80 | 81 | * To install Docker CE on Mac open the [Docker getting started page](https://www.docker.com/get-started), stay on the "Developer" tab, and click the button on the right to download the installer for your OS. Follow all the instructions included. 82 | * To install Docker CE on Linux, open the [Docker docs](https://docs.docker.com/install/#server), and click the appropriate link to find instructions for your Linux distro. 83 | 84 | Confirm that Docker is installed by running `docker --version` on the command line. If it works, you should be good to go. From the root of your project directory, run `pwd` to get the path to your present working directory. Then carefully run the following command \(be sure to replace `/path/to/project/directory` with the path from the previous step\): 85 | 86 | `docker run --name mongo186 -v "/path/to/project/directory:/proj6" -it chriskw/mongo186` 87 | 88 | This will download an image of a container with mongo and python preinstalled. You should see output like the following: 89 | 90 | ```text 91 | $ docker run --name mongo186 -v "/replace/this/accordingly:/proj6" -it chriskw/mongo186 92 | Unable to find image 'chriskw/mongo186:latest' locally 93 | latest: Pulling from chriskw/mongo186 94 | 95 | (a bunch of downloads should happen here) 96 | 97 | about to fork child process, waiting until server is ready for connections. 
98 | forked process: 10 99 | child process started successfully, parent exiting 100 | student@a2ba045477a4:/$ 101 | ``` 102 | 103 | This should bring you into a container with the necessary requirements. Run `cd /proj6` to enter the proj6 directory. All of the files on your host machine should be present. Changes on your host machine \(for example, using a text editor like VSCode/Sublime\) should be visible from within this container as you work through the project. If everything went smoothly, you should be able to go to the next section [Extract the data set](getting-started.md#extract-the-data-set). 104 | 105 | If you exit the container and wish to access it again, you can run the command `docker start -ai mongo186` to re-enter the container. 106 | 107 | If your proj6 directory is empty, you most likely provided an invalid path when you ran `docker run`. In this case, run the following two commands: `docker kill mongo186` followed by `docker rm mongo186`. Afterwards, rerun the `docker run` command from above, making sure to replace the string after the `-v` accordingly. Follow up on Piazza if you run into trouble with this step. 108 | 109 | ### Python 110 | 111 | You'll need a copy of Python 3.5 or higher to run the tests for this project locally \(the same as used in Project 1\). You can check if you already have an existing copy by running `python3 --version` in a terminal. If you don't already have a working copy, download and install one for your appropriate platform from [here](https://www.python.org/downloads/). 112 | 113 | ## Extract the data set 114 | 115 | Download the data set from the class drive [here](https://drive.google.com/file/d/1VIA9unz82zVSeHV2EiohAwO7vYiPY56i/view). 116 | 117 | Unzip the `data.zip` file inside your `sp21-proj6-yourname` directory. You should now have a `data/` directory in your `sp21-proj6-yourname` directory. 118 | 119 | Afterwards, try running `python3 load.py`.
You should see output like the following: 120 | 121 | ```text 122 | > python3 load.py 123 | 2021-04-11T08:30:08.567-0700 connected to: mongodb://localhost:27017/ 124 | 2021-04-11T08:30:08.567-0700 dropping: movies.credits 125 | 2021-04-11T08:30:14.590-0700 45475 document(s) imported successfully. 0 document(s) failed to import. 126 | 2021-04-11T08:30:14.609-0700 connected to: mongodb://localhost:27017/ 127 | 2021-04-11T08:30:14.610-0700 dropping: movies.movies_metadata 128 | 2021-04-11T08:30:15.551-0700 45406 document(s) imported successfully. 0 document(s) failed to import. 129 | 2021-04-11T08:30:15.562-0700 connected to: mongodb://localhost:27017/ 130 | 2021-04-11T08:30:15.562-0700 dropping: movies.keywords 131 | 2021-04-11T08:30:16.569-0700 43986 document(s) imported successfully. 0 document(s) failed to import. 132 | 2021-04-11T08:30:16.580-0700 connected to: mongodb://localhost:27017/ 133 | 2021-04-11T08:30:16.581-0700 dropping: movies.ratings 134 | 2021-04-11T08:30:17.968-0700 99958 document(s) imported successfully. 0 document(s) failed to import 135 | ``` 136 | 137 | ### Unfiltered Dataset 138 | 139 | The dataset used for this project is a subset of the original dataset -- we've filtered out overtly offensive and inappropriate keywords and movie descriptions, as well as any keywords that appear fewer than 15 times. If you want to access the original dataset without these filters applied, the unfiltered version is [here](https://drive.google.com/file/d/1ZiYYcW_vqeyL239AcAbsd2nybXZLdt9e/view). Note that the expected output of the project is based on the filtered version, so you'll need to use the filtered version to complete the project. 140 | 141 | ## Running the tests 142 | 143 | If you followed the instructions above, you should now be able to test your code. Navigate to your project directory and try using `python3 test.py`.
You should get output similar to the following: 144 | 145 | ```text 146 | > python3 test.py 147 | q0 FAIL: Empty output 148 | q1i FAIL: Empty output 149 | q1ii FAIL: Empty output 150 | q1iii FAIL: Empty output 151 | q1iv FAIL: Empty output 152 | q2i FAIL: Empty output 153 | q2ii FAIL: Empty output 154 | q2iii FAIL: Empty output 155 | q3i FAIL: Empty output 156 | q3ii FAIL: Empty output 157 | ``` 158 | 159 | If so, move on to the next section to start the project. If you see `ERROR` instead of `FAIL`, create a follow-up on Piazza with details from your `your_output/` folder. 160 | 161 | ## Debugging Issues with GitHub Classroom 162 | 163 | Feel free to skip this section if you don't have any issues with GitHub Classroom. If you are having issues \(e.g. the page froze or some error message appeared\), first check if you have access to your repo at `https://github.com/berkeley-cs186-student/sp21-proj6-username`, replacing `username` with your GitHub username. If you have access to your repo and the starter code is there, then you can proceed as usual. If you have access to your repo but the starter code is not there, run the following commands in a terminal \(again replacing `username` with your GitHub username\): 164 | 165 | ```text 166 | git clone https://github.com/berkeley-cs186/sp21-proj6 sp21-proj6 167 | cd sp21-proj6/ 168 | git remote remove origin 169 | git remote add origin https://github.com/berkeley-cs186-student/sp21-proj6-username.git 170 | git push -u origin master 171 | ``` 172 | 173 | Then, you can proceed as usual. 174 | 175 | ### 404 Not Found 176 | 177 | If you're getting a 404 Not Found page when trying to access your repo, make sure you've set up your repo using the GitHub Classroom link in the Project 6 release post on [Piazza](https://piazza.com/class/$piazza-link$). 178 | 179 | If you don't have access to your repo at all after following these steps, feel free to contact the course staff on Piazza.
180 | 181 | -------------------------------------------------------------------------------- /project-handout/proj6/submitting-the-assignment.md: -------------------------------------------------------------------------------- 1 | # Submitting the Assignment 2 | 3 | ## Files 4 | 5 | You should make sure that all code you modify belongs to the .js files inside the `query/` directory. 6 | 7 | ## Gradescope 8 | 9 | Once all of your files are prepared in your repo, you can submit to Gradescope through GitHub the same way you did for [Project 0](../proj0/submitting.md#pushing-changes-to-github-classroom). 10 | 11 | ## Submitting via upload 12 | 13 | If your GitHub account has access to many repos, the Gradescope UI might time out while trying to load which repos you have available. If you want to submit via upload, you can zip up your `query/` directory and upload that to Gradescope instead. 14 | 15 | ## Grading 16 | 17 | * This project will be worth 5% of your overall grade in the class. 100% of your grade will come from the public tests provided to you. 18 | 19 | -------------------------------------------------------------------------------- /project-handout/proj6/testing.md: -------------------------------------------------------------------------------- 1 | # Testing 2 | 3 | 4 | 5 | You can run your answers through mongo directly by running `mongo movies` to open the database and then entering your query directly: 6 | 7 | ```text 8 | $ mongo movies 9 | db.replace_with_a_collection.aggregate([ ... your query ... ]) 10 | ``` 11 | 12 | This can help you catch any syntax errors in your queries. Alternatively, you can run a query directly through the testing script by pasting your query into the appropriate file \(we'll use `query/q0.js` as an example\) and running `python3 test.py q0 --view`. This will print up to ten of the query's results. 13 | 14 | You can request more results with the `--batch_size` flag \(e.g.
`python3 test.py q0 --view --batch_size 20` will give the first twenty results\). 15 | 16 | If you find yourself dealing with large, hard-to-read documents, you can view a formatted version of the first document by using `--format` instead of `--view`. For example, using the provided query from the [Building your first query](your-tasks.md#building-your-first-query) section of the spec: 17 | 18 | ```text 19 | $ python3 test.py q0 --format 20 | Showing formatted first document of the query 21 | { 22 | "min_rating": 2, 23 | "max_rating": 5, 24 | "title": "Jurassic Park", 25 | "num_ratings": 48 26 | } 27 | ``` 28 | 29 | To run a test, from within the `sp21-proj6-yourname` directory: 30 | 31 | ```text 32 | $ python3 test.py # This runs all of the tests 33 | $ python3 test.py 3ii # This would run tests for only q3ii 34 | ``` 35 | 36 | ## Format Matching 37 | 38 | Before we run a full test on your output, we check that the format of your output matches what we're expecting. Format in this context means that all the field names we expect are there, there are no extra field names, and that the types corresponding to the field names match. Here's an example of some mismatched format for `diffs/q2i.diff`: 39 | 40 | ```text 41 | EXTRA FIELDS 42 | - foo 43 | 44 | MISMATCHED TYPES 45 | - mismatch on field `movieId`: 46 | - expected type: `number` (example: `63`) 47 | - actual type: `null`, (example: `null`) 48 | 49 | FORMAT MISMATCH 50 | - Example of expected document: 51 | { 52 | "movieId": 63 53 | } 54 | 55 | - Your document: 56 | { 57 | "foo": "bar", 58 | "movieId": null 59 | } 60 | ``` 61 | 62 | The above output tells you two things: 63 | 64 | * One of your documents had an extra field. In this case, looking at the "Your document" section, your output had an extra field called "foo" 65 | * One of your documents has a mismatched type for one of its fields.
In this case, looking at the "Your document" section, your output had a value of type null for the field "movieId", when it should have been a number. 66 | 67 | ## Diffs 68 | 69 | If you pass the format check, we'll run a diff against your query and the expected output. Become familiar with the UNIX [diff](http://en.wikipedia.org/wiki/Diff) format, if you're not already, because our tests save a simplified diff for any query executions that don't match in `diffs/`. As an example, consider the following output for `diffs/q2ii.diff`: 70 | 71 | ```text 72 | + {"_id": "only", "count": 521} 73 | {"_id": "just", "count": 481} 74 | - {"_id": "about", "count": 535} 75 | ``` 76 | 77 | This indicates that: 78 | 79 | * your output has an extra document `{"_id": "about", "count": 535}` \(the `-` at the beginning means the expected output _doesn't_ include this line but your output has it\) 80 | * your output is missing the document `{"_id": "only", "count": 521}` \(the `+` at the beginning means the expected output _does_ include this line but your output is missing it\). 81 | * If there is neither a `+` nor `-` at the beginning then it means that the line is in both your output and the expected output \(your output is correct for that line\). 82 | 83 | If you care to look at the query outputs directly, ours are located in the `expected_output` directory. Your output should be located in your solution's `your_output` directory once you run the tests. 84 | 85 | ## Reformatting 86 | 87 | When we generate the diffs we'll be doing some basic reformatting to reorder things that don't have an inherent order to make sure that the results are consistent with each other. These include: 88 | 89 | * field names will be rearranged to be in alphabetical order 90 | * arrays will be reordered when we don't ask for an explicit order to them 91 | 92 | --------------------------------------------------------------------------------