├── .gitignore ├── fundamentals ├── pucit-systems-programming │ ├── lecture-6-git-1.md │ ├── lecture-7-git-2.md │ ├── lecture-1-intro.md │ ├── lecture-8-exit-handlers.md │ ├── lecture-29-shared-memory.md │ └── lecture-5-gnu-cmake.md ├── grok-system-design │ ├── Examples │ │ ├── Twitter │ │ │ └── 6-Twitter.pdf │ │ ├── DropBox │ │ │ └── 4-dropbox-part1.pdf │ │ ├── Pastebin │ │ │ ├── 2-pastebin-part1.pdf │ │ │ └── 2-pastebin-part2.pdf │ │ ├── TinyURL │ │ │ ├── 1-tinyurl-part1.pdf │ │ │ └── 1-tinyurl-part2.pdf │ │ ├── Web-Crawler │ │ │ └── Web-Crawler.pdf │ │ ├── Fleet-Upgrade │ │ │ ├── Fleet-Upgrade.pdf │ │ │ └── algoexpert.md │ │ ├── Instagram │ │ │ ├── 3-instagram-part1.pdf │ │ │ └── 3-instagram-part2.pdf │ │ ├── Facebook-Messenger │ │ │ └── 5-facebook-messenger-part1.pdf │ │ ├── step-by-step.md │ │ └── File-Transfer │ │ │ └── File-Transfer.md │ └── Lectures │ │ ├── 7-Redundancy.md │ │ ├── 9-CAP.md │ │ ├── 6-Proxies.md │ │ ├── 5-Indexes.md │ │ ├── 10-Consistent-Hashing.md │ │ ├── 2-Load-Balancing.md │ │ ├── 11-long-polling-websockets.md │ │ ├── 1-System-Design-Basics.md │ │ ├── 3-Caching.md │ │ ├── 4-Data-Partitioning.md │ │ └── 8-Sql-NoSql.md ├── cse-421-intro-to-os │ ├── lecture-1-intro.md │ ├── lecture-28-raid.md │ ├── lecture-13-simple-schedulers.md │ ├── lecture-10-context-switch.md │ ├── lecture-18-paging.md │ ├── lecture-9-interrupt-exception-2.md │ ├── lecture-24-filesystem-data-structure.md │ ├── lecture-4-fork-and-sync.md │ ├── lecture-27-log-structured-files.md │ ├── lecture-14-scheduling-story.md │ ├── lecture-17-page-translation.md │ ├── lecture-32-performance.md │ ├── lecture-11-threads.md │ ├── lecture-20-page-replacement.md │ ├── lecture-3-process-file-handlers.md │ ├── lecture-22-files.md │ ├── lecture-12-scheduling.md │ ├── lecture-2-processes.md │ ├── lecture-21-disks.md │ ├── lecture-19-swapping.md │ ├── lecture-23-intro-to-filesystems.md │ ├── lecture-5-intro-to-synch-primitives.md │ ├── lecture-30-xen-virtualization.md │ ├── 
lecture-25-file-system-caching.md │ ├── lecture-16-address-translation.md │ ├── lecture-15-virtual-address.md │ ├── lecture-31-containers.md │ └── lecture-6-synch-primitives.md └── system-perf │ ├── network.md │ ├── application.md │ └── cpus.md ├── cramming ├── linux-sys-admin │ └── images │ │ └── 15-sample-network.png ├── past │ ├── csv-parse │ │ ├── dataset1.csv │ │ ├── dataset2.csv │ │ └── main.py │ ├── scrub-phone │ │ ├── phone.new │ │ ├── phone.orig │ │ ├── main.py │ │ └── main.sh │ ├── parse-weblog │ │ ├── main.sh │ │ └── main.py │ └── minesweeper │ │ ├── minesweeper.py │ │ └── minesweeper_2.py └── lfs │ └── code │ ├── host_check.sh │ ├── md5sums │ └── wget-list └── README.md /.gitignore: -------------------------------------------------------------------------------- 1 | *.ipynb_checkpoints* 2 | .DS_Store 3 | -------------------------------------------------------------------------------- /fundamentals/pucit-systems-programming/lecture-6-git-1.md: -------------------------------------------------------------------------------- 1 | # We already know this stuff skipping for now... -------------------------------------------------------------------------------- /fundamentals/pucit-systems-programming/lecture-7-git-2.md: -------------------------------------------------------------------------------- 1 | # We already know this stuff skipping for now... 
-------------------------------------------------------------------------------- /cramming/linux-sys-admin/images/15-sample-network.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/cheuklau/sre-interview-prep/HEAD/cramming/linux-sys-admin/images/15-sample-network.png -------------------------------------------------------------------------------- /fundamentals/grok-system-design/Examples/Twitter/6-Twitter.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/cheuklau/sre-interview-prep/HEAD/fundamentals/grok-system-design/Examples/Twitter/6-Twitter.pdf -------------------------------------------------------------------------------- /fundamentals/grok-system-design/Examples/DropBox/4-dropbox-part1.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/cheuklau/sre-interview-prep/HEAD/fundamentals/grok-system-design/Examples/DropBox/4-dropbox-part1.pdf -------------------------------------------------------------------------------- /fundamentals/grok-system-design/Examples/Pastebin/2-pastebin-part1.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/cheuklau/sre-interview-prep/HEAD/fundamentals/grok-system-design/Examples/Pastebin/2-pastebin-part1.pdf -------------------------------------------------------------------------------- /fundamentals/grok-system-design/Examples/Pastebin/2-pastebin-part2.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/cheuklau/sre-interview-prep/HEAD/fundamentals/grok-system-design/Examples/Pastebin/2-pastebin-part2.pdf -------------------------------------------------------------------------------- /fundamentals/grok-system-design/Examples/TinyURL/1-tinyurl-part1.pdf: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/cheuklau/sre-interview-prep/HEAD/fundamentals/grok-system-design/Examples/TinyURL/1-tinyurl-part1.pdf -------------------------------------------------------------------------------- /fundamentals/grok-system-design/Examples/TinyURL/1-tinyurl-part2.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/cheuklau/sre-interview-prep/HEAD/fundamentals/grok-system-design/Examples/TinyURL/1-tinyurl-part2.pdf -------------------------------------------------------------------------------- /fundamentals/grok-system-design/Examples/Web-Crawler/Web-Crawler.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/cheuklau/sre-interview-prep/HEAD/fundamentals/grok-system-design/Examples/Web-Crawler/Web-Crawler.pdf -------------------------------------------------------------------------------- /fundamentals/grok-system-design/Examples/Fleet-Upgrade/Fleet-Upgrade.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/cheuklau/sre-interview-prep/HEAD/fundamentals/grok-system-design/Examples/Fleet-Upgrade/Fleet-Upgrade.pdf -------------------------------------------------------------------------------- /fundamentals/grok-system-design/Examples/Instagram/3-instagram-part1.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/cheuklau/sre-interview-prep/HEAD/fundamentals/grok-system-design/Examples/Instagram/3-instagram-part1.pdf -------------------------------------------------------------------------------- /fundamentals/grok-system-design/Examples/Instagram/3-instagram-part2.pdf: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/cheuklau/sre-interview-prep/HEAD/fundamentals/grok-system-design/Examples/Instagram/3-instagram-part2.pdf -------------------------------------------------------------------------------- /fundamentals/grok-system-design/Examples/Facebook-Messenger/5-facebook-messenger-part1.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/cheuklau/sre-interview-prep/HEAD/fundamentals/grok-system-design/Examples/Facebook-Messenger/5-facebook-messenger-part1.pdf -------------------------------------------------------------------------------- /cramming/past/csv-parse/dataset1.csv: -------------------------------------------------------------------------------- 1 | NAME,LEG_LENGTH,DIET 2 | Hadrosaurus,1.2,herbivore 3 | Struthiomius,0.92,omnivore 4 | Velociraptor,1.0,carnivore 5 | Triceratops,0.87,herbivore 6 | Euplocephalus,1.6,herbivore 7 | Stegosaurus,1.40,herbivore 8 | Tyrannosaurus Rex,2.5,carnivore 9 | -------------------------------------------------------------------------------- /cramming/past/csv-parse/dataset2.csv: -------------------------------------------------------------------------------- 1 | NAME,STRIDE_LENGTH,STANCE 2 | Euoplocephalus,1.87,quadrupedal 3 | Stegosaurus,1.90,quadrupedal 4 | Tyrannosaurus Rex,5.76,bipedal 5 | Hadrosaurus,1.4,bipedal 6 | Deinonychus,1.21,bipedal 7 | Struthimimus,1.34,bipedal 8 | Velociraptor,2.72,bipedal 9 | -------------------------------------------------------------------------------- /fundamentals/cse-421-intro-to-os/lecture-1-intro.md: -------------------------------------------------------------------------------- 1 | # Lecture 1 2 | 3 | ## Operating Systems Briefly Defined 4 | 5 | - Operating system is a computer program that: 6 | * Multiplexes hardware resources (i.e., sharing resources between processes) 7 | * Implements useful abstractions (improves on low-level realities) 8 | 9 | 
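The "useful abstractions" role can be seen through the kernel's file-descriptor interface. The sketch below is my own illustration (not from the lecture): two very different kernel resources, an in-memory pipe and a file on disk, are read through the same `read` abstraction; the temp-file path is generated just for the example.

```python
import os
import tempfile

# Resource 1: an in-kernel pipe (no disk involved)
r, w = os.pipe()
os.write(w, b"hello")
os.close(w)
from_pipe = os.read(r, 5)
os.close(r)

# Resource 2: a regular file on disk (path is illustrative only)
path = os.path.join(tempfile.mkdtemp(), "demo.txt")
with open(path, "wb") as f:
    f.write(b"hello")
fd = os.open(path, os.O_RDONLY)
from_file = os.read(fd, 5)
os.close(fd)

# Same bytes via the same interface, hiding very different low-level realities
print(from_pipe == from_file)  # → True
```

The point is that user programs never touch the disk controller or the pipe buffer directly; the OS multiplexes those resources and presents one uniform interface.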
-------------------------------------------------------------------------------- /cramming/past/scrub-phone/phone.new: -------------------------------------------------------------------------------- 1 | This is a test of the phone number scrubber. 2 | Here is something close to a phone number 12-49-1098 and here 3 | is a valid phone number XXX-XXX-XXXX 4 | Will the script be able to scrub XXX-XXX-XXXX embedded in a sentence or 5 | XXX-XXX-XXXX that starts at the beginning of the sentence? 6 | Good luck! -------------------------------------------------------------------------------- /cramming/past/scrub-phone/phone.orig: -------------------------------------------------------------------------------- 1 | This is a test of the phone number scrubber. 2 | Here is something close to a phone number 12-49-1098 and here 3 | is a valid phone number 815-294-2039 4 | Will the script be able to scrub 918-203-4950 embedded in a sentence or 5 | 948-102-3049 that starts at the beginning of the sentence? 6 | Good luck! -------------------------------------------------------------------------------- /cramming/past/scrub-phone/main.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | 3 | import re 4 | 5 | # This task is probably too simple to use a Python script for. 6 | # A single sed command is sufficient. 
7 | 8 | infile = open("phone.orig", "r") 9 | outfile = open("phone.new", "w") 10 | for line in infile: 11 | outfile.write(re.sub(r'\d{3}-\d{3}-\d{4}', 'XXX-XXX-XXXX', line)) -------------------------------------------------------------------------------- /cramming/past/parse-weblog/main.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | # Grep options: 4 | # -E is for extended regex; note POSIX ERE has no \d shorthand, so we use [0-9] 5 | # -o is for printing only the matched string with each match on a separate line 6 | # 7 | # Next we sort the results so the same IPs are next to each other 8 | # Next we run uniq -c which gets the count of each unique IP 9 | # Next we sort numerically (-n) by count in reverse order so that the greatest count is first 10 | # Finally we return the top 10 results 11 | grep -Eo "^[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}" webserver.log | sort | uniq -c | sort -rn | head -10 12 | -------------------------------------------------------------------------------- /fundamentals/grok-system-design/Lectures/7-Redundancy.md: -------------------------------------------------------------------------------- 1 | # Redundancy and Replication 2 | 3 | ## Redundancy 4 | 5 | - Redundancy is the duplication of critical components or functions of a system to increase system reliability or to improve system performance 6 | - Redundancy removes SPOFs and provides backups in a crisis 7 | - Example: If two instances of a service are running in prod and one fails, the system can fail over to the other one 8 | 9 | ## Replication 10 | 11 | - Replication means sharing info to ensure consistency between redundant resources e.g., software/hardware components to improve reliability, fault-tolerance or accessibility 12 | - Widely used in database management systems (DBMS), usually with a primary-replica relationship between the original and its replicas 13 | - Primary server gets updates and then it updates the replicas 14 | 
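The primary-replica flow above can be sketched in a few lines of Python. This is an illustrative in-memory toy (class and key names are made up, not from these notes): the primary applies each write locally and then propagates it to every replica, so any replica can serve reads or take over in a failover.

```python
class Replica:
    """A node holding a copy of the data."""
    def __init__(self):
        self.data = {}

    def apply(self, key, value):
        self.data[key] = value


class Primary(Replica):
    """The primary gets updates and then updates the replicas."""
    def __init__(self, replicas):
        super().__init__()
        self.replicas = replicas

    def write(self, key, value):
        self.apply(key, value)      # update the primary first
        for r in self.replicas:     # then propagate to each replica
            r.apply(key, value)


replicas = [Replica(), Replica()]
primary = Primary(replicas)
primary.write("user:1", "alice")

# Every redundant copy is now consistent; reads can go to any node
print(all(r.data["user:1"] == "alice" for r in replicas))  # → True
```

A real DBMS would propagate asynchronously over the network and handle replica lag, but the shape of the flow is the same.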
-------------------------------------------------------------------------------- /cramming/past/parse-weblog/main.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | 3 | import re 4 | import collections 5 | 6 | webfile = "webserver.log" 7 | 8 | # Create regex object with the same expression as bash script. 9 | # Note that we are grouping the entire IP and nothing else. 10 | re_obj = re.compile(r'(^\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})') 11 | 12 | # Go through each line of the webfile. 13 | # Use the re match method on each line. 14 | # If there is a match then get the IP and increment the count 15 | # for that IP in a counter object. 16 | # Note that Counter is preferred over a plain dict here because 17 | # its most_common method returns the top 10 directly; a plain 18 | # dict would need an explicit sort of its items. 19 | result = collections.Counter() 20 | infile = open(webfile, "r") 21 | for line in infile: 22 | match = re_obj.match(line) 23 | if match: 24 | ip_tmp = match.groups()[0] 25 | result[ip_tmp] += 1 26 | for ip, pings in result.most_common(10): 27 | print('%4s %s' % (pings, ip)) 28 | -------------------------------------------------------------------------------- /fundamentals/grok-system-design/Lectures/9-CAP.md: -------------------------------------------------------------------------------- 1 | # CAP Theorem 2 | 3 | - CAP theorem states it is impossible for a distributed software system to provide more than two of the following three guarantees: 4 | 1. Consistency 5 | * All nodes see the same data at the same time 6 | * Achieved by updating several nodes before allowing further reads 7 | 2. Availability 8 | * Every request gets a response on success/failure 9 | * Achieved by replicating data across different servers 10 | 3. 
Partition tolerance 11 | * System continues to work despite message loss or partial failure 12 | * Sustain any amount of network failure that doesn't result in a failure of the entire network 13 | * Data is replicated across nodes and networks to keep the system up through outages 14 | - We can only build a system that has two of the three properties 15 | * To be consistent, all nodes should see the same set of updates in the same order 16 | * But if the network partitions, updates in one partition might not make it to the other partitions before a client reads from an out-of-date partition 17 | * Only fix is to stop serving requests from the out-of-date partition, but then the service is no longer 100% available 18 | - Examples: 19 | * AP: Cassandra, CouchDB 20 | * CA: RDBMS 21 | * CP: BigTable, MongoDB, HBase -------------------------------------------------------------------------------- /cramming/past/scrub-phone/main.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | # Method 1 4 | # Here we do a grep with 5 | # -E to denote extended regex 6 | # -o to denote we only want to output matches in a newline 7 | # Read in the file called phone.orig 8 | # 9 | # The output will be each phone number in a newline 10 | # This will be piped into xargs with 11 | # -I{} replace occurrences of {} with the phone numbers piped 12 | # in from grep one at a time 13 | # 14 | # xargs is wrapped around sed with 15 | # -i '' indicates we want to replace in-place on the file 16 | # Note that the empty quotes '' are only required on macOS to indicate 17 | # we do not want a backup copy of the original. 18 | # 19 | # The extra copies and moves allow us to keep the original file.
20 | # cp phone.orig phone.tmp 21 | # grep -Eo '[0-9]{3}-[0-9]{3}-[0-9]{4}' phone.orig | xargs -I{} sed -i '' 's/{}/XXX-XXX-XXXX/g' phone.orig 22 | # mv phone.orig phone.new 23 | # mv phone.tmp phone.orig 24 | 25 | # Method 2 26 | # Here we use the regex directly in sed but sadly regex in sed is a bit iffy. 27 | # For example, we had to use [[:digit:]] (or [0-9]) instead of \d and 28 | # we have to specify -E for sed to understand extended regular expressions. 29 | # For GNU sed instead of -E we use -r. 30 | # 31 | # The extra copies and moves allow us to keep the original file. 32 | cp phone.orig phone.tmp 33 | sed -i '' -E 's/[[:digit:]]{3}-[[:digit:]]{3}-[[:digit:]]{4}/XXX-XXX-XXXX/g' phone.orig 34 | mv phone.orig phone.new 35 | mv phone.tmp phone.orig -------------------------------------------------------------------------------- /fundamentals/grok-system-design/Lectures/6-Proxies.md: -------------------------------------------------------------------------------- 1 | # Proxies 2 | 3 | - Proxy server is an intermediate server between client and backend server 4 | - Clients connect to proxy to make a request for a service e.g., web page, file, connection, etc 5 | - Proxy is a software or hardware that acts as an intermediary for client requests seeking resources from other servers 6 | - Proxies used to: 7 | * Filter requests 8 | * Log requests 9 | * Transform requests e.g., adding/removing headers, encrypting/decrypting, compressing a resource 10 | - Proxy server can also cache requests so requests can be handled without going to remote server 11 | 12 | ## Proxy Server Types 13 | 14 | - Proxies can reside on client's local server or anywhere between client and remote servers 15 | - Popular proxy servers: 16 | 1. 
Open proxy 17 | * Proxy server accessible by any internet user 18 | * Any user on the internet is able to use this forwarding device 19 | * Two famous types of open proxies: 20 | + Anonymous proxy: proxy reveals its identity as a server but does not disclose the initial IP address 21 | - Hides the user's IP address 22 | + Transparent proxy: proxy identifies itself and with the support of HTTP headers, the first IP address can be viewed 23 | - Main benefit is being able to cache websites 24 | 2. Reverse proxy 25 | * Retrieves resources on behalf of a client from one or more servers 26 | * Resources are then returned to the client appearing as if they originated from the proxy itself -------------------------------------------------------------------------------- /cramming/lfs/code/host_check.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | export LC_ALL=C 3 | bash --version | head -n1 | cut -d" " -f2-4 4 | MYSH=$(readlink -f /bin/sh) 5 | echo "/bin/sh -> $MYSH" 6 | echo $MYSH | grep -q bash || echo "ERROR: /bin/sh does not point to bash" 7 | unset MYSH 8 | echo -n "Binutils: "; ld --version | head -n1 | cut -d" " -f3- 9 | bison --version | head -n1 10 | if [ -h /usr/bin/yacc ]; then 11 | echo "/usr/bin/yacc -> `readlink -f /usr/bin/yacc`"; 12 | elif [ -x /usr/bin/yacc ]; then 13 | echo yacc is `/usr/bin/yacc --version | head -n1` 14 | else 15 | echo "yacc not found" 16 | fi 17 | bzip2 --version 2>&1 < /dev/null | head -n1 | cut -d" " -f1,6- 18 | echo -n "Coreutils: "; chown --version | head -n1 | cut -d")" -f2 19 | diff --version | head -n1 20 | find --version | head -n1 21 | gawk --version | head -n1 22 | if [ -h /usr/bin/awk ]; then 23 | echo "/usr/bin/awk -> `readlink -f /usr/bin/awk`"; 24 | elif [ -x /usr/bin/awk ]; then 25 | echo awk is `/usr/bin/awk --version | head -n1` 26 | else 27 | echo "awk not found" 28 | fi 29 | gcc --version | head -n1 30 | g++ --version | head -n1 31 | ldd --version | head -n1 | cut
-d" " -f2- # glibc version 32 | grep --version | head -n1 33 | gzip --version | head -n1 34 | cat /proc/version 35 | m4 --version | head -n1 36 | make --version | head -n1 37 | patch --version | head -n1 38 | echo Perl `perl -V:version` 39 | python3 --version 40 | sed --version | head -n1 41 | tar --version | head -n1 42 | makeinfo --version | head -n1 # texinfo version 43 | xz --version | head -n1 44 | echo 'int main(){}' > dummy.c && g++ -o dummy dummy.c 45 | if [ -x dummy ] 46 | then echo "g++ compilation OK"; 47 | else echo "g++ compilation failed"; fi 48 | rm -f dummy.c dummy -------------------------------------------------------------------------------- /fundamentals/grok-system-design/Lectures/5-Indexes.md: -------------------------------------------------------------------------------- 1 | # Indexes 2 | 3 | - Goal of creating an index on a particular table in a DB is to make it faster to search through the table and find relevant rows 4 | - Indexes can be created using one or more columns of a database table, providing rapid random lookups and efficient access of ordered records 5 | 6 | ## Example: A Library Catalog 7 | 8 | - Library catalog is a register that contains list of books found in a library 9 | - Catalog organized like a DB table with four columns: `Title`, `Writer`, `Subject` and `Date` 10 | - Two such catalogs: 11 | * One sorted by `Title` 12 | * One sorted by `Writer` 13 | * Provide a sorted list of data that is easily searchable by relevant information 14 | - Index is a data structure that can be perceived as a table of contents that points us to the location where actual data lives 15 | - To create index on a column of a table, we store column and a pointer to the whole row in the index 16 | - For example, we can create an index on `Title` where each row contains the `Title` and a pointer to the full row in the original DB table 17 | - Must carefully consider how users will access the data 18 | - Particularly important when we have a large 
dataset 19 | * Can't possibly iterate over that much data in a reasonable amount of time 20 | * Very likely large data set is spread over multiple machines 21 | 22 | ## How do Indexes Decrease Write Performance? 23 | 24 | - Speeds up data retrieval, but the index may itself be large due to the additional keys, which slows down data insertion and update 25 | - When adding rows or updating existing rows for a table with active index, we have to write the new data and also update the index 26 | * This decreases write performance 27 | - Therefore, adding unnecessary indexes to tables should be avoided 28 | - Adding indexes is about improving performance of search queries 29 | - Should not add indexes to write-heavy databases -------------------------------------------------------------------------------- /fundamentals/grok-system-design/Lectures/10-Consistent-Hashing.md: -------------------------------------------------------------------------------- 1 | # Consistent Hashing 2 | 3 | - Distributed Hash Table (DHT) is a fundamental component used in distributed scalable systems 4 | - Hash tables need a key, value and a hash function to map the key to a location where the value is stored i.e., `index=hash_function(key)` 5 | - Example: 6 | * Given `n` cache servers, an intuitive hash function would be `key % n` 7 | * Two drawbacks: 8 | 1. Not horizontally scalable because whenever a new cache host is added, all existing mappings are broken 9 | 2. 
May not be load balanced especially for non-uniformly distributed data 10 | 11 | ## Consistent Hashing 12 | 13 | - Useful strategy for distributed caching systems and DHTs 14 | - Distribute data across a cluster in such a way that will minimize reorganization when nodes are added or removed 15 | - When hash table is resized e.g., new cache host added, only `k/n` keys need to be remapped 16 | - Objects are mapped to the same host if possible 17 | - When host is removed, objects on that host are shared by the other hosts 18 | 19 | ## How Does it Work? 20 | 21 | - Consistent hashing maps a key to an integer 22 | - Suppose output of the hash function is `[0, 256]` range 23 | - Imagine integers in the range placed on a ring such that the values are wrapped around 24 | - Procedure: 25 | 1. Given a list of cache servers, hash them to integers in the range 26 | 2. To map a key to a server 27 | * Hash it to a single integer 28 | * Move clockwise on the ring until encountering the first cache 29 | * That cache is the one that contains the key 30 | - To add a new server `D`, keys originally in `C` will be split with some of them shifted to `D` 31 | - To remove cache `A`, all keys that were originally mapped to `A` will fall into `B` 32 | - For load balancing, real data is essentially randomly distributed and thus may not be uniform 33 | - To handle the above issue, we add virtual replicas for caches i.e., instead of mapping each cache to a single point on the ring, we map it to multiple points on the ring -------------------------------------------------------------------------------- /fundamentals/cse-421-intro-to-os/lecture-28-raid.md: -------------------------------------------------------------------------------- 1 | # Lecture 28 2 | 3 | ## Redundant Arrays of Inexpensive Disks 4 | 5 | - Big idea: several cheap things can be better than one expensive thing 6 | 7 | ## RAID: Problems 8 | 9 | - What is the problem that the RAID paper identifies 10 | * Computer CPUs are 
getting faster 11 | * Computer memory is getting faster 12 | * Hard drives are not keeping up 13 | - What is the problem with the RAID solution? 14 | * Many cheap things fail much more frequently than one expensive thing 15 | * So need to plan to handle failures 16 | 17 | ## RAID 1 (Common) 18 | 19 | - RAID 1 (mirroring) 20 | * Two duplicate disks 21 | * Writes must go to both disks, reads can come from either 22 | * Performance: better for reads 23 | * Capacity: unchanged 24 | 25 | ## RAID 2 (Uncommon) 26 | 27 | - RAID 2 28 | * Byte-level striping: single error disk 29 | * Hamming codes to detect failures and correct errors 30 | * Most reads and writes require all disks 31 | * Capacity: improved 32 | 33 | ## RAID 3 (Uncommon) 34 | 35 | - RAID 3 36 | * Only correct errors since disks can detect when they fail 37 | * Byte-level striping, single parity disk 38 | * Most reads and writes require all disks 39 | * Capacity: improved 40 | 41 | ## RAID 4 (Uncommon) 42 | 43 | - RAID 4 44 | * Block-level striping, single parity disk 45 | * Better distribution of reads between disks due to larger stripe size 46 | * But all writes must access the parity disk 47 | * Performance: improved for reads 48 | 49 | ## RAID 5 (Full Victory) 50 | 51 | - RAID 5 52 | * Block-level striping 53 | * Parity blocks distributed across the disks rather than kept on one dedicated parity disk 54 | * Better distribution of writes between disks 55 | * Performance: improved for writes 56 | 57 | ## RAID 0 (Non-RAID) 58 | 59 | - RAID 0 60 | * Each disk stores half of the data 61 | * No error correction or redundancy 62 | * Performance: fantastic 63 | * Capacity: fantastic 64 | * Redundancy: none 65 | 66 | ## RAID: Redundancy 67 | 68 | - RAID arrays can tolerate the failure of one (or more) disks 69 | - Once one fails, the array is vulnerable to data loss 70 | - An admin must replace the disks and then rebuild the array 71 | -------------------------------------------------------------------------------- /fundamentals/grok-system-design/Examples/step-by-step.md: 
-------------------------------------------------------------------------------- 1 | # System Design Interviews: Step by Step Guide 2 | 3 | # Step 1: Requirements Clarification 4 | 5 | - Define end goals 6 | - Clarify parts of the system to focus on 7 | - Example designing a Twitter-like service: 8 | 1. Will users be able to post tweets/follow people 9 | 2. Should we design to create user timeline 10 | 3. Will tweets contain photos and videos 11 | 4. Are we focusing on backend only or front end too 12 | 5. Will users be able to search tweets 13 | 14 | # Step 2: Back of Envelope Calculation 15 | 16 | - Estimate scale of the system we are going to design 17 | - Help with scaling, partitioning, load balancing and caching 18 | - What scale expected from system? 19 | - How much storage? 20 | - What network bandwidth? 21 | 22 | # Step 3: System Interface Definition 23 | 24 | - What APIs are expected from the system 25 | - `postTweet(user_id, tweet_data, tweet_location, ...)` 26 | 27 | # Step 4: Defining Data Model 28 | 29 | - Help with clarifying how data will flow between different parts of system 30 | - Guide for data partitioning and management 31 | - Candidate should identify different parts of system, how they will interact and other aspects of data management e.g., storage, transport, encryption, etc 32 | - Example: 33 | * `User: UserID, Name, Email` 34 | * `Tweet: TweetID, Content` 35 | - Decide which database to use e.g., NoSQL or MySQL 36 | - What kind of block storage to store photos and videos 37 | 38 | # Step 5: High-Level Design 39 | 40 | - Draw block diagram representing core components of system 41 | - Enough to solve actual problem end-to-end 42 | - Example: 43 | * Client 44 | * Load balancer 45 | * App servers 46 | * Databases/File storage 47 | 48 | # Step 6: Detailed Design 49 | 50 | - Dig deeper into a few major components 51 | - Present different approaches including pros and cons 52 | - Example: 53 | * How should we partition data 54 | * How 
will we handle hot paths for databases 55 | * How to store data to optimize searches 56 | * Where can we implement caching 57 | * What components need better load balancing 58 | 59 | # Step 7: Identifying and Resolving Bottlenecks 60 | 61 | - Identify bottlenecks and approaches to mitigate them 62 | * SPOFs 63 | * Do we have enough replicas of the data 64 | * Do we have enough copies of services to be highly available 65 | * How are we monitoring performance, alerting 66 | 67 | # Step 8: Summary 68 | 69 | - Be organized -------------------------------------------------------------------------------- /cramming/past/minesweeper/minesweeper.py: -------------------------------------------------------------------------------- 1 | # Implement minesweeper 2 | # 3 | # User selects a grid size. 4 | # Randomly assign mines to grid points. 5 | # For each other grid point calculate how many 6 | # neighboring grid points are mines and 7 | # assign that total to it. 8 | # User will select a grid point: 9 | # If it is a mine, game is over. 10 | # If it is not a mine, display the number 11 | # of neighboring mines. 
12 | # 13 | 14 | import random 15 | 16 | # User input for grid size 17 | size = 10 18 | 19 | # Number of mines 20 | num_mines = int(0.20 * size * size) 21 | 22 | # Initialize grid 23 | grid = [[0 for i in range(size)] for j in range(size)] 24 | 25 | # Randomly assign mine positions 26 | set_mines = 0 27 | while set_mines < num_mines: 28 | x = random.choice(range(0, size)) 29 | y = random.choice(range(0, size)) 30 | if grid[x][y] == 0: 31 | grid[x][y] = "b" 32 | set_mines += 1 33 | 34 | # Assign to each safe grid point the number of mines among 35 | # its eight neighbors (including diagonals) 36 | for x in range(0, size): 37 | for y in range(0, size): 38 | if grid[x][y] != "b": 39 | tmp = 0 40 | for dx in (-1, 0, 1): 41 | for dy in (-1, 0, 1): 42 | if dx == 0 and dy == 0: 43 | continue 44 | nx, ny = x + dx, y + dy 45 | if 0 <= nx < size and 0 <= ny < size and grid[nx][ny] == "b": 46 | tmp += 1 47 | grid[x][y] = tmp 48 | 49 | # Initialize solution grid 50 | solution = [["X" for i in range(size)] for j in range(size)] 51 | 52 | # Start the game 53 | game_over = False 54 | game_won = False 55 | while not game_over and not game_won: 56 | 57 | # Re-display game board 58 | for row in solution: 59 | print(row) 60 | 61 | # Ask user for input 62 | x_user = int(input("choose row: ")) 63 | y_user = int(input("choose column: ")) 64 | mark_user = input("bomb (y/n): ") 65 | 66 | if mark_user == "y": 67 | solution[x_user][y_user] = "?" 68 | else: 69 | if grid[x_user][y_user] == "b": 70 | game_over = True 71 | else: 72 | solution[x_user][y_user] = str(grid[x_user][y_user]) 73 | 74 | # Win once every non-mine cell has been revealed 75 | hidden = sum(row.count("X") + row.count("?") for row in solution) 76 | game_won = not game_over and hidden == num_mines 77 | 78 | if game_over: 79 | print("You lost") 80 | else: 81 | print("You won") 82 | 83 | for row in grid: 84 | print(row) 85 | 86 | 87 | 88 | -------------------------------------------------------------------------------- /fundamentals/grok-system-design/Lectures/2-Load-Balancing.md: -------------------------------------------------------------------------------- 1 | # Load Balancing 2 | 3 | - Load balancers (LB) help spread traffic across a cluster of servers to improve responsiveness and availability of applications 4 | - LB keep track of the status of all resources while distributing requests 5 | - LB will stop sending requests to unhealthy servers 6 | - LB sits between client and server accepting incoming network and application traffic 7 | - LB distributes traffic across multiple backend servers using different algorithms 8 | - This reduces individual server load and prevents any one app server from being a SPOF 9 | - To achieve full scalability and redundancy, we load balance at each layer of the system: 10 | * Between user and web server 11 | * Between web servers and internal platform layer e.g., application servers or cache servers 12 | * Between internal platform layer and database 13 | 14 | ## Benefits of Load Balancing 15 | 16 | - Users experience faster, uninterrupted service 17 | - Less downtime and higher throughput 18 | - Makes it easier for system administrators to handle incoming requests while decreasing wait time for users 19 | - Provide benefits like predictive analytics that determine traffic bottlenecks 20 | - System administrators experience fewer failed or stressed components 21 | 22 | ## Load Balancing Algorithms 23 | 24 | - Two factors: 25 | 1. Is the server actually responding appropriately to requests 26 | 2. 
Use a preconfigured algorithm to select one of the healthy servers 27 | - Health checks 28 | * Only forward traffic to healthy backend servers 29 | * Health checks regularly attempt to connect to backend servers to ensure they are listening 30 | * If a server fails a health check, it is automatically removed from the pool 31 | - Variety of algorithms 32 | 1. Least connection method - directs traffic to the server with the fewest active connections 33 | 2. Least response time method - directs traffic to the server with the fewest active connections and lowest average response time 34 | 3. Least bandwidth method - directs traffic to the server currently serving the least amount of traffic, measured in Mbps 35 | 4. Round robin 36 | 5. Weighted round robin 37 | 6. IP hash - a hash of the client's IP address is calculated to redirect the request to a server 38 | 39 | ## Redundant Load Balancers 40 | 41 | - The load balancer itself can be a SPOF 42 | - To overcome this, a second LB can be connected to the first to form a cluster 43 | - Each LB monitors the health of the other and, in the event of a failure, the secondary LB takes over -------------------------------------------------------------------------------- /fundamentals/cse-421-intro-to-os/lecture-13-simple-schedulers.md: -------------------------------------------------------------------------------- 1 | # Lecture 13 2 | 3 | ## Scheduling Information 4 | 5 | - Schedulers use three kinds of additional information in order to choose which thread to run next: 6 | 1. What will happen next? 7 | * Oracular schedulers cannot be implemented but can be a good point of comparison 8 | 2. What just happened? 9 | * Typical schedulers (and many other OS algorithms) use the past to predict the future 10 | 3. What does the user want? 11 | * Schedulers usually have ways to incorporate user input 12 | 13 | ## Random Scheduling 14 | 15 | - Choose a scheduling quantum 16 | * Maximum amount of time any thread will be able to run at one time 17 | - Then: 18 | 1.
Choose a thread at random from the ready pile 19 | 2. Run the thread until it blocks or the scheduling quantum expires 20 | - What happens when a thread leaves the waiting state? 21 | * Just return it to the ready pile 22 | 23 | ## Round-Robin Scheduling 24 | 25 | - Choose a scheduling quantum 26 | - Establish an ordered ready queue. For example, when a thread is created add it to the tail of the ready queue. 27 | - Then: 28 | 1. Choose the thread at the head of the ready queue 29 | 2. Run the thread until it blocks or the scheduling quantum expires 30 | 3. If its scheduling quantum expires, place it at the tail of the ready queue 31 | - What happens when a thread leaves the waiting state? 32 | * Could put it at the head of the ready queue or at the tail 33 | 34 | ## The Know Nothings 35 | 36 | - The random and round-robin scheduling algorithms 37 | * Require no information about a thread's past, present or future 38 | * Accept no user input 39 | - These algorithms are rarely used except as straw men to compare other approaches to 40 | - Both penalize (or at least do not reward) threads that give up the CPU before their quanta expire 41 | - As one exception, round-robin scheduling is sometimes used once other scheduling decisions have been made and a set of threads are considered equivalent 42 | * As an example, you might rotate round-robin through a set of threads with the same priority 43 | 44 | ## The Know-It-Alls 45 | 46 | - What might we like to know about threads we are about to execute: 47 | 1. How long is it going to use the CPU? 48 | 2. Will it block or yield? 49 | 3. How long will it wait? 50 | 51 | ## Shortest Job First 52 | 53 | - Why would we use this algorithm? 54 | * Minimizes wait time 55 | - More generally, why would we prefer threads that give up the CPU before their time quantum ends?
56 | * They are probably waiting for something else which can be done in parallel with CPU use -------------------------------------------------------------------------------- /fundamentals/grok-system-design/Examples/File-Transfer/File-Transfer.md: -------------------------------------------------------------------------------- 1 | # File Transfer 2 | 3 | ## Functional Requirements 4 | 5 | - Distribute a file from a local computer to a fleet of servers 6 | - Consider: 7 | - Small files 8 | - Large files 9 | - Same local network 10 | - Globally distributed network 11 | 12 | ## Methods 13 | 14 | ### scp 15 | 16 | - Secure copy command 17 | - `scp /path/to/local/file user@host:/path/to/remote/dir` 18 | - `scp user@server1:/path/to/remote/file user@server2:/path/to/remote/file` 19 | - Uses SSH so communication is encrypted 20 | 21 | ### rsync 22 | 23 | - `rsync /path/to/local/file user@host:/path/to/remote/file` 24 | - Use `--rsh=ssh` (or `-e ssh`) to transfer over SSH; otherwise traffic is unencrypted 25 | - In general `rsync` is faster than `scp` since it uses optimizations e.g., the delta-transfer algorithm 26 | 27 | ### Central Webserver 28 | 29 | - Host the file on a central webserver e.g., S3 and run a `wget` command on each server 30 | - The number of machines that can download concurrently depends on bandwidth and the max connections allowed by the webserver 31 | - Most likely will not reach the S3 limit 32 | 33 | ## Case 1) Small file, few servers 34 | 35 | - `scp` might be sufficient since the file is small 36 | * Provides encryption via SSH and is simple 37 | - `rsync` would also work, but since the file is small the benefit of the delta algorithm is less obvious 38 | - A central webserver would be overkill for only a few servers 39 | 40 | ## Case 2) Large file, few servers 41 | 42 | - `rsync` would work best since the file is large, so the delta algorithm shines (assuming an older copy already exists on the destination) 43 | * Can run with the SSH option to enable encryption 44 | - A central webserver would be overkill for only a few servers 45 | 46 | ## Case 3) Small or large file, many servers 47 |
48 | - A central webserver should be best for this scenario 49 | - Each server will connect to the webserver e.g., via `wget` to download the file 50 | - This requires the webserver to handle all of the necessary concurrent connections and the resulting bandwidth 51 | - Alternatively, we could use configuration management software e.g., Ansible to have only a subset of servers download at a time to stay within the webserver's resource limits 52 | - If we don't have a central webserver, we could try using `rsync` to distribute the file to `k` servers and then have each of those `k` servers distribute to `k` additional servers and so on until all servers have been updated. 53 | + This allows us to parallelize the file transfer. 54 | 55 | ## Case 4) Small or large file, many servers in different geographical locations 56 | 57 | - Same as Case 3, but perhaps use a hosted solution e.g., S3 with geographic content distribution at locations where the servers are hosted in order to minimize latency -------------------------------------------------------------------------------- /cramming/past/csv-parse/main.py: -------------------------------------------------------------------------------- 1 | # Write a program that reads in two csv files: 2 | # 3 | # $ cat dataset1.csv 4 | # NAME,LEG_LENGTH,DIET 5 | # Hadrosaurus,1.2,herbivore 6 | # Struthiomius,0.92,omnivore 7 | # Velociraptor,1.0,carnivore 8 | # Triceratops,0.87,herbivore 9 | # Euplocephalus,1.6,herbivore 10 | # Stegosaurus,1.40,herbivore 11 | # Tyrannosaurus Rex,2.5,carnivore 12 | # 13 | # $ cat dataset2.csv 14 | # NAME,STRIDE_LENGTH,STANCE 15 | # Euoplocephalus,1.87,quadrupedal 16 | # Stegosaurus,1.90,quadrupedal 17 | # Tyrannosaurus Rex,5.76,bipedal 18 | # Hadrosaurus,1.4,bipedal 19 | # Deinonychus,1.21,bipedal 20 | # Struthimimus,1.34,bipedal 21 | # Velociraptor,2.72,bipedal 22 | # 23 | # Then prints the names of bipedal dinosaurs from 24 | # fastest to slowest.
25 | # 26 | # Speed is given by 27 | # ((STRIDE_LENGTH / LEG_LENGTH) - 1) * SQRT(LEG_LENGTH * g) 28 | # 29 | # Strategy: 30 | # We need to combine information from both 31 | # datasets in order to calculate the speed. 32 | # We read in the first dataset to form a dictionary 33 | # with dinosaur name as key and leg length as value. 34 | # Next, we read in the second dataset and 35 | # for each row if the stance is bipedal 36 | # we calculate the speed. 37 | # We can store the name of the dinosaur as the key and 38 | # the speed as the value in 39 | # a dictionary. 40 | # We can also store the speed in a separate 41 | # array, sort it, then use the sorted array to look up 42 | # the values in the dictionary to print. 43 | 44 | # Key idea 1 45 | # Use csv library DictReader to read in a csv file. 46 | # Each item in the DictReader object contains 47 | # a row of the csv file mapped to the header. 48 | # 49 | # Key idea 2 50 | # Use collections.Counter instead of a dict to store 51 | # the results because the results must be printed in 52 | # sorted order and Counter.most_common() returns them sorted by value. 53 | 54 | import csv 55 | import math 56 | import collections 57 | 58 | # Read in the first dataset, storing leg length as a float. 59 | leg_length = {} 60 | with open('dataset1.csv') as csvfile: 61 | reader = csv.DictReader(csvfile) 62 | for row in reader: 63 | leg_length[row['NAME']] = float(row['LEG_LENGTH']) 64 | 65 | # Read in the second dataset and calculate speed. 66 | # Store the results in a Counter object.
67 | result = collections.Counter() 68 | with open('dataset2.csv') as csvfile: 69 | reader = csv.DictReader(csvfile) 70 | for row in reader: 71 | if row['STANCE'] == 'bipedal' and row['NAME'] in leg_length: 72 | leg = leg_length[row['NAME']] 73 | result[row['NAME']] = (float(row['STRIDE_LENGTH']) / leg - 1.0) * math.sqrt(leg * 9.81) 74 | 75 | 76 | # Print the results, fastest first. 77 | for name, speed in result.most_common(): 78 | print(name) 79 | 80 | 81 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # SRE Interview Preparation 2 | 3 | This repo contains resources for SRE interview preparation. It is divided into two sections: 4 | 1. Cramming materials 5 | 2. Fundamental materials 6 | 7 | The `cramming` directory contains resources that are useful for cramming for SRE/Devops interviews. These resources work well for 75% of the SRE/Devops interviews that you may face. For the other 25% (more selective companies), you need a deeper understanding that comes from structured learning of the fundamentals. 8 | 9 | For example, let's say you are asked to describe the Linux booting process. Several of the resources in `cramming` will give a cursory overview of the steps involved, but if the interviewer starts digging deeper into a particular step, you will be lost without fundamental knowledge of computer architecture and assembly. Similar examples can be provided for other questions regarding application performance, memory management, CPU scheduling, etc. The `fundamentals` directory contains resources that are much more structured (i.e., full courses and textbooks) that provide a solid background to the material in `cramming`.
10 | 11 | # Cramming Material 12 | 13 | - `algoexpert` 14 | * Contains problems solved in algoexpert stored in Jupyter Notebooks 15 | - `google-sre` 16 | * Contains notes from Google's Site Reliability Engineering [books](https://landing.google.com/sre/books/) 17 | - `lfs` 18 | * Contains notes and scripts from [Linux from Scratch](http://www.linuxfromscratch.org/) 19 | - `linux-sys-admin` 20 | * Contains notes from [Linux System Administration Handbook](https://www.amazon.com/UNIX-Linux-System-Administration-Handbook/dp/0134277554/ref=sr_1_8?dchild=1&keywords=linux&qid=1592369959&sr=8-8) 21 | - `past` 22 | * Contains past interview questions related to parsing and other sys admin tasks 23 | - `random` 24 | * `fb-resource-*.md` contains notes from two study resources that Facebook provided for Production Engineer systems interview 25 | * `google-foobar.ipynb` contains questions from the Google Foobar hiring challenge 26 | * `troubleshooting.md` contains notes from Brendan Gregg's [USE](http://www.brendangregg.com/usemethod.html) and [TSA](http://www.brendangregg.com/tsamethod.html) methods for troubleshooting 27 | * `what-happens-google.md` contains notes on the classic "What happens when you type google.com into your browser" 28 | - `tech-blogs` 29 | * Contains notes on tech blogs of various companies 30 | 31 | # Fundamental Material 32 | 33 | - `kernel` 34 | * Contains notes about the Linux Kernel from [here](https://github.com/0xAX/linux-insides) 35 | - `system-perf` 36 | * Contains notes from Systems Performance [book](https://www.amazon.com/Systems-Performance-Enterprise-Brendan-Gregg/dp/0133390098) -------------------------------------------------------------------------------- /fundamentals/system-perf/network.md: -------------------------------------------------------------------------------- 1 | # Network 2 | 3 | ## Methodology 4 | 5 | ### Tools Method 6 | 7 | - `netstat -s`: find high rate of retransmits 8 | - `netstat -i`: interface error counters 9 | -
`ifconfig`: check errors, dropped and overruns 10 | - `vp`: check rate of bytes transmitted and received 11 | - `tcpdump`: see who is using the network and identify unnecessary work 12 | - `dtrace`: packet inspection between app and wire 13 | 14 | ### USE Method 15 | 16 | - Utilization: time interface busy sending or receiving frames 17 | - Saturation: degree of extra queueing, buffering or blocking due to a fully utilized interface 18 | - Errors: 19 | * For receive: check bad checksum, frame too short/long, collisions 20 | * For transmit: check late collisions 21 | 22 | ### Workload Characterization 23 | 24 | - Network interface throughput: RX and TX, bytes per second 25 | - Network interface IOPS: RX and TX, frames per second 26 | - TCP connection rate: active and passive connections per second 27 | - Questions to ask: 28 | 1. Average packet size received and transmitted 29 | 2. Protocol? UDP or TCP? 30 | 3. Which ports are active? Bytes per second, connections per second? 31 | 4. Which processes are actively using the network 32 | 33 | ### Latency Analysis 34 | 35 | - Network latencies caused by: 36 | 1. System call send/receive latency 37 | 2. System call connect latency 38 | 3. TCP connection initialization time 39 | 4. TCP first-byte latency 40 | 5. TCP connection duration 41 | 6. TCP retransmits 42 | 7. Network round-trip time 43 | 8. Interrupt latency: time from network controller interrupt for a received packet to when it is serviced by the kernel 44 | 9.
Inter-stack latency 45 | 46 | ### Performance Monitoring 47 | 48 | - Look for 49 | * Throughput 50 | * Connections 51 | * Errors e.g., dropped packet counters 52 | * TCP retransmits 53 | * TCP out-of-order packets 54 | 55 | ### Packet Sniffing 56 | 57 | - Capture packets from the network so their protocol headers and data can be inspected at the packet level 58 | * Timestamp 59 | * Entire packet including protocol headers, partial or full payload data 60 | * Metadata e.g., number of packets, number of drops 61 | 62 | ### TCP Analysis 63 | 64 | - Usage of TCP send/receive buffers 65 | - Usage of TCP backlog queues 66 | - Kernel drops due to backlog queue being full 67 | - Congestion window size 68 | - SYNs received during a TCP TIME_WAIT interval 69 | 70 | 71 | ### Static Performance Tuning 72 | 73 | - Number of network interfaces available for use? 74 | - Max speed of network interfaces? 75 | - MTU configured for network interfaces? 76 | - Routing? Default gateway? 77 | - How is DNS configured? 78 | - Known issues with network device driver? Kernel TCP/IP stack? 79 | -------------------------------------------------------------------------------- /fundamentals/grok-system-design/Lectures/11-long-polling-websockets.md: -------------------------------------------------------------------------------- 1 | # Long-Polling vs WebSockets vs Server-Sent Events 2 | 3 | - All three are popular communication protocols between client and server 4 | - Standard HTTP request: 5 | 1. Client opens a connection and requests data from server 6 | 2. Server calculates response 7 | 3. Server sends response back to the client 8 | 9 | ## Ajax Polling 10 | 11 | - Client repeatedly polls (requests) server for data 12 | - If no data available, empty response is returned 13 | - Procedure: 14 | 1. Client opens connection, requests data via HTTP 15 | 2. Requested webpage sends requests to server at regular intervals 16 | 3. Server calculates response and sends it back 17 | 4.
Client repeats above three steps to get updates from server 18 | - The problem is that the client has to keep asking, resulting in many empty responses, creating overhead 19 | 20 | ## HTTP Long-Polling 21 | 22 | - Server pushes info to client when data is available 23 | - Client requests info from server as in normal polling, but with the expectation that the server may not respond immediately 24 | - Also known as `Hanging GET` 25 | * If server does not have data available for client, it holds the request and waits until data is available 26 | * Once data is available, a full response is sent to client 27 | * The client immediately re-requests information 28 | - Procedure: 29 | 1. Client makes initial request using regular HTTP 30 | 2. Server delays response until update available 31 | 3. Update available and server sends a full response 32 | 4. Client sends new long-poll request either immediately upon receiving response or after a pause to allow an acceptable latency period 33 | 5. Each long-poll request has a timeout; client has to reconnect periodically after connection is closed due to timeouts 34 | 35 | ## WebSockets 36 | 37 | - Provides full-duplex communication channels over a single TCP connection 38 | - Persistent connection between a client and a server that both parties can use to send data at any time 39 | - Client establishes a websocket connection through a websocket handshake 40 | - If the process succeeds, server and client can exchange data in both directions at any time 41 | - Lower overhead, facilitating real-time data transfer from and to the server 42 | - Messages passed back and forth while keeping connection open 43 | 44 | ## Server-Sent Events (SSEs) 45 | 46 | - Client establishes a persistent and long-term connection with the server 47 | - Server uses this connection to send data to client 48 | - If client wants to send data to server, it would need another protocol to do so 49 | - Procedure 50 | 1. Client requests data from server using HTTP 51 | 2.
Requested webpage opens connection to server 52 | 3. Server sends data to client when new info is available 53 | - SSEs best when we need real-time traffic from the server to client -------------------------------------------------------------------------------- /fundamentals/system-perf/application.md: -------------------------------------------------------------------------------- 1 | # Application 2 | 3 | ## Methodology and Analysis 4 | 5 | ### Thread State Analysis 6 | 7 | - Where application threads are spending their time 8 | - Two states: 9 | * On-CPU: executing 10 | * Off-CPU: waiting for turn, I/O, locks, paging, etc 11 | - Six states 12 | * Executing 13 | + Check reason for CPU consumption via profiling 14 | + `top` reports this as `%CPU` 15 | * Runnable 16 | + App needs more CPU resources 17 | + `schedstats` via `/proc/<pid>/schedstat` 18 | * Anonymous paging 19 | + Lack of available main memory 20 | + Kernel delay accounting feature or tracing 21 | * Sleeping 22 | + Where is the app blocked (syscall analysis, I/O profiling) 23 | + `pidstat -d`, `iotop`, tracing 24 | * Lock 25 | + Identify lock, thread holding it and why held for so long 26 | + Tracing 27 | * Idle 28 | - Want to minimize time spent in first five states (maximize idle) 29 | 30 | ### CPU Profiling 31 | 32 | - Why is an app consuming CPU resources 33 | - Sample on-CPU user-level stack trace and coalesce results 34 | * Visualize using flame graphs 35 | 36 | ### Syscall Analysis 37 | 38 | - Executing: on-CPU (user mode) 39 | - Syscalls: system call (kernel mode running or waiting) 40 | * I/O, locks, other syscall types 41 | - For runnable and anonymous paging, check CPU and memory saturation via USE 42 | - Executing state can be studied via CPU profiling 43 | - `strace` to show system calls made, their return codes and time spent 44 | * Note: high overhead 45 | * Buffered tracing buffers instrumentation data in-kernel so target program can continue to execute 46 | * Differs from breakpoint
tracing, which interrupts the target program at each tracepoint 47 | 48 | ### I/O Profiling 49 | 50 | - Determines why and how I/O-related system calls are being performed 51 | - Tracing, filtering for read system calls for example 52 | 53 | ### USE Method 54 | 55 | - Utilization, saturation and errors of all hardware resources 56 | - For software resources, depends on the app at hand 57 | * Example: app uses pool of worker threads with a queue for requests to wait their turn 58 | + Utilization: average number of threads busy processing requests during an interval 59 | + Saturation: average length of request queue during an interval 60 | + Errors: requests denied or failed 61 | * Example: file descriptors 62 | + Utilization: % of file descriptors opened 63 | + Saturation: if threads block waiting for FD allocation 64 | + Errors: allocation errors 65 | 66 | ### Lock Analysis 67 | 68 | - Check for excessive hold times 69 | - CPU profiling shows spin locks, mutex locks 70 | 71 | ### Static Performance Tuning 72 | 73 | - Latest version? 74 | - Known performance issues? 75 | - Configured correctly? 76 | - Cache? 77 | - App run concurrently? 78 | - System libraries? 79 | - Large heap pages? 80 | - System imposed resource limits? -------------------------------------------------------------------------------- /fundamentals/cse-421-intro-to-os/lecture-10-context-switch.md: -------------------------------------------------------------------------------- 1 | # Lecture 10 2 | 3 | ## CPU Limitations: Number 4 | 5 | - Historically we were limited to one core i.e., one processor. Why? 6 | * Expensive and complex 7 | - Recently we have many. Why?
8 | * At some point it was no longer feasible to make a single core faster e.g., due to heat 9 | * Therefore, we started adding more cores 10 | - In general, there are fewer cores than tasks to be run 11 | - How does the CPU compare to other parts of the system e.g., memory, disk, etc 12 | * CPU is way faster 13 | * Faster than memory - usually addressed on the processor through out-of-order execution 14 | * Way faster than disk - addressed by the OS 15 | * Way faster than you - partially addressed by the OS 16 | - A human can withstand 15 ms of lag, which is equivalent to 15,000,000 CPU clock cycles on a 1 GHz processor 17 | 18 | ## Birth of the OS 19 | 20 | - The OS emerged partly to hide delays caused by slow devices to keep the processor active 21 | - Hiding processor delays requires only cooperative scheduling 22 | * Threads only stop running when they require a long-latency operation 23 | 24 | ## Supporting Multiple Interactive Users 25 | 26 | - Supporting multiple users requires the notion that multiple tasks are running simultaneously or concurrently, either: 27 | 1. One task per user for multiple users 28 | 2. Multiple tasks for a single user 29 | 3. Multiple tasks for multiple users 30 | 31 | ## The Illusion of Concurrency 32 | 33 | - How is this accomplished? 34 | * The processor rapidly switches between tasks, creating the illusion of concurrency 35 | * We refer to these transitions as context switches 36 | 37 | ## Implementing Context Switching 38 | 39 | - How does the OS get control? 40 | 1. Hardware interrupts 41 | 2. Software interrupts 42 | 3. Software exceptions 43 | - But what if these things don't happen?
44 | - Timer interrupts generated by a timer device ensure that the OS regains control of the system at regular intervals 45 | - Timer interrupts are the basis of preemptive scheduling - the OS doesn't wait for the thread to stop running; instead, it preempts it 46 | - Rest of interrupt handling is unchanged 47 | - Timer interrupts mean that a running thread may be stopped at any time 48 | - When the thread restarts we want it to appear that nothing has changed i.e., that the execution was not interrupted 49 | * Other parts of the system might have changed, but the CPU state should be identical 50 | - How do we do this? 51 | 52 | ## Saving Thread State 53 | 54 | - What does thread state consist of? 55 | * Registers 56 | * Stacks 57 | - We rely on memory protection to keep the stack unchanged until we restart the thread 58 | - Saving thread state is the first thing that happens when the interrupt service routine (ISR) is triggered. Why? 59 | * Saved state is sometimes referred to as a trap frame. 60 | 61 | ## Context Switching 62 | 63 | - Threads switch to a separate kernel stack when executing in the kernel. Why?
64 | * The kernel doesn't trust (or want to pollute) the user thread's stack 65 | -------------------------------------------------------------------------------- /fundamentals/cse-421-intro-to-os/lecture-18-paging.md: -------------------------------------------------------------------------------- 1 | # Lecture 18 2 | 3 | ## Locating Page State 4 | 5 | - We want the MMU's cache (TLB) to have the address mapping to physical memory 6 | - If the MMU does not have it, it asks the kernel, which needs an efficient way to look it up 7 | - Requirements for how we locate page information: 8 | * Speed: translation is a hot path and should be as efficient as possible 9 | * Compactness: the data structures we use should not take up too much physical memory 10 | 11 | ## Page Tables 12 | 13 | - The data structure used to quickly map a virtual page number to a page table entry is called a page table 14 | * Each process has a separate page table 15 | 16 | ## Flat Page Tables 17 | 18 | - Approach: use one array to hold all page table entries for each process 19 | * Virtual page number is index into this array 20 | + Speed: O(1), the VPN is used directly as an index into the array 21 | + Compactness: poor - 4 MB per process that may have to be contiguous (most of it unused) 22 | 23 | ## Linked List Page Tables 24 | 25 | - Approach: list of PTEs for each process, searched on each translation 26 | * Size: scales with the number of valid virtual pages the process has allocated, i.e., 4 bytes * `n` for `n` valid virtual pages 27 | * Speed: O(n) for n valid virtual pages 28 | 29 | ## Multi-Level Page Tables 30 | 31 | - Approach 32 | * Build a tree-like data structure mapping VPN to PTE 33 | * Break VPN into multiple parts, each used as an index at a separate level of the tree 34 | - Example: 35 | * With 4K pages the VPN is 20 bits 36 | * Use top 10 bits as index into top-level page table 37 | * Bottom 10 bits as index into second-level page table 38 | * Each page table is `2^10 * 4` bytes = 4K 39 | - Speed: O(c):
constant number of lookups per translation depending on tree depth 40 | - Compactness: depends on sparsity of address space, but better than flat and worse than linked list 41 | 42 | ## Out of Core 43 | 44 | - So far we have been talking about cases where processes are able to use the physical memory available on the machine 45 | - What happens when we run out? 46 | - When we run out there are two options: 47 | 1. Fail: either don't load `exec()`, don't create a new process `fork()`, refuse to allocate more heap `sbrk()`, or kill the process if it is trying to allocate more stack 48 | 2. Create more space, preserving the contents of memory for later use 49 | 50 | ## Virtually Sneaky 51 | 52 | - Virtual address translation gives the kernel the ability to remove memory from a process behind its back 53 | - What are the requirements for doing this? 54 | * The last time the process used the virtual address, it behaved like memory 55 | * The next time the process uses the virtual address, it behaves like memory 56 | * In between, whatever data was stored at that address must be preserved 57 | 58 | ## Swapping 59 | 60 | - The place the OS typically puts data stored in memory, in order to borrow that memory from the process, is on disk 61 | - We call the process of moving data back and forth from memory to disk in order to improve memory usage swapping 62 | - Goal: when swapping is done well, your system feels like it has memory that is as large as the disk but as fast as actual RAM 63 | - Unfortunately, when swapping is not done well, your system feels like it has memory that is as small as RAM and as slow as disk -------------------------------------------------------------------------------- /fundamentals/cse-421-intro-to-os/lecture-9-interrupt-exception-2.md: -------------------------------------------------------------------------------- 1 | # Lecture 9 2 | 3 | ## Masking Interrupts 4 | 5 | - Hardware interrupts can be either asynchronous or synchronous 6 | -
Asynchronous interrupts can be ignored or masked 7 | * Note: the ISR (interrupt service routine) is the set of instructions the processor executes when an interrupt fires 8 | * Processor provides an interrupt mask allowing the OS to choose which interrupts will trigger the ISR 9 | * If an interrupt is masked, it will not trigger the ISR 10 | * If an interrupt is still asserted when it is unmasked, it will trigger the ISR at that point 11 | - Some interrupts are synchronous or not maskable 12 | * These typically indicate very serious conditions which must be handled immediately 13 | * Example: processor reset 14 | 15 | ## Why Mask Interrupts 16 | 17 | - Choosing to ignore interrupts allows the system to prioritize certain interrupts over others 18 | - In some cases handling certain interrupts generates other interrupts, which would prevent the system from handling the original interrupt 19 | * Applications could take control of devices by preventing the kernel from communicating with them 20 | - Interrupt handlers allow the OS to control access to hardware devices and protect them from direct control by untrusted apps 21 | - Memory that contains interrupt handlers is protected from access by user apps 22 | - One of the first things the kernel does on boot is install its interrupt handlers 23 | 24 | ## Software Interrupts 25 | 26 | - Given that the OS prevents unprivileged code from directly accessing system resources, how do apps gain access to these protected resources? 27 | - CPUs provide a special instruction (`syscall` on the MIPS) that generates a software (or synthetic) interrupt 28 | - Software interrupts provide a mechanism for user code to indicate that it needs help from the kernel 29 | - Rest of interrupt handling path is unchanged. The CPU: 30 | 1. Enters privileged mode 31 | 2. Records state necessary to process interrupt 32 | 3.
Jumps to a pre-determined memory location and begins executing instructions 33 | 34 | ## Making System Calls 35 | 36 | - To access the kernel system call interface an application: 37 | 1. Arranges arguments to the system call in an agreed-on place where the kernel can find them, typically in registers or on its stack 38 | 2. Loads a number identifying the system call it wants the kernel to perform into a pre-determined register 39 | 3. Executes the `syscall` instruction 40 | - `libc` provides the wrappers around the `syscall` instruction that programmers are familiar with 41 | 42 | ## Software Exceptions 43 | 44 | - A software exception indicates that code running on the CPU has created a situation that the processor needs help to address 45 | - Examples: 46 | * Divide by zero - probably kills the process 47 | * Attempt to use a privileged instruction - also probably kills the process 48 | * Attempt to use a virtual address that the CPU does not know how to translate - common exception handled transparently as part of virtual memory management 49 | - Software interrupts are voluntary 50 | * Think of the CPU as saying to the kernel: the `/bin/true` process needs your assistance 51 | - Exceptions are involuntary 52 | * Think of the CPU as saying to the kernel: I need some help with this `/bin/false` process. It just tried to divide by zero and I think it needs to be terminated 53 | -------------------------------------------------------------------------------- /fundamentals/grok-system-design/Lectures/1-System-Design-Basics.md: -------------------------------------------------------------------------------- 1 | # System Design Basics 2 | 3 | - When designing a large system consider: 4 | 1. What are the different architectural pieces needed? 5 | 2. How do these pieces work together? 6 | 3. How can we utilize these pieces? Tradeoffs?
7 | 8 | # Key Characteristics of Distributed Systems 9 | 10 | ## Scalability 11 | 12 | - Scalability is the capability of a system, process or network to grow and manage increased demand 13 | - System may have to scale for a variety of reasons e.g., increased data volume or amount of work 14 | - Scalable system should achieve this without performance loss 15 | - In general, performance of a system declines with system size due to management or environment cost 16 | - Some tasks may not be distributed (either due to atomic nature or system design flaw) 17 | * These will limit speed-up obtained by distribution 18 | - Scalable architecture attempts to balance load evenly across all nodes 19 | 20 | ### Horizontal vs. Vertical Scaling 21 | 22 | - Horizontal scaling adds more servers to pool of resources 23 | - Vertical scaling adds more power (e.g., CPU, RAM, storage) to an existing server 24 | - Horizontal scaling is easier to scale dynamically just by adding more nodes 25 | - Vertical scaling requires downtime and has an upper limit 26 | - Horizontal scaling examples: 27 | * Cassandra 28 | * MongoDB 29 | - Vertical scaling example: 30 | * MySQL 31 | 32 | ## Reliability 33 | 34 | - Reliability is the probability a system will fail in a given period 35 | - Distributed system is reliable if it delivers services even when one or more of its software or hardware components fail 36 | - A reliable distributed system achieves reliability through redundancy in both software components and the data 37 | - Redundancy has a cost and a reliable system has to pay that to achieve resilience for services by eliminating every SPOF 38 | 39 | ## Availability 40 | 41 | - Availability is the time a system remains operational to perform its required function in a given period 42 | - Percentage of time that a system, service or machine remains operational 43 | - If a system is reliable, it is available 44 | - If a system is available, it is not necessarily reliable 45 | * For example, it
is possible to achieve high availability with an unreliable product by minimizing repair time and ensuring spares are always available when needed 46 | 47 | ## Efficiency 48 | 49 | - Consider operation that runs in a distributed manner and delivers set of items 50 | - Two measures of efficiency are response time (delay to obtain the first item) and throughput (number of items delivered in a unit time) 51 | - Above two measures correspond to following unit costs: 52 | 1. Number of messages sent by nodes regardless of message size 53 | 2. Size of messages representing volume of data exchanges 54 | - Complexity of operations supported by distributed data structures characterized as a function of one of those unit costs 55 | - Difficult to develop a precise cost model that accurately accounts for all performance factors e.g., network topology, network load, etc 56 | 57 | ## Serviceability and Manageability 58 | 59 | - Ease of operation and maintenance 60 | - If MTTR increases then availability decreases 61 | - Considerations for manageability: 62 | * Ease of diagnosing problems 63 | * Ease of making updates or modifications 64 | * Ease of operation -------------------------------------------------------------------------------- /fundamentals/pucit-systems-programming/lecture-1-intro.md: -------------------------------------------------------------------------------- 1 | # Lecture 1 2 | 3 | ## System Programmer Perspective 4 | 5 | - Application programming produces software that provides services to the user 6 | - System programming produces software which provides services to computer hardware e.g., a disk defragmenter 7 | * Write kernel code to manage main memory, disk space management, CPU scheduling and management of I/O devices through device drivers 8 | - Executable uses library calls to make system calls which access the kernel 9 | - System programmer writes programs that may have to acquire data 10 | * From a file that may have been opened by some other user 11 | * From
other programs running on the same or different machine 12 | * From OS itself 13 | - After processing, programs may have to write results to a shared resource which other processes are also writing, or results may need to be delivered to another process asynchronously i.e., not when the process asked for it but at some later unpredictable time 14 | - Important tasks a kernel performs: 15 | * File management 16 | * Process management 17 | * Memory management 18 | * Information management 19 | * Signal handling 20 | * Synchronization 21 | * IPC 22 | * Device management 23 | - Two methods a program uses to make requests for kernel services 24 | * Making a system call (entry point built directly in kernel) 25 | * Calling a library routine that makes use of this system call 26 | - System call 27 | * Controlled entry point into kernel code 28 | * Allows process to request kernel to perform privileged operation 29 | * System call changes process state from user to kernel mode so CPU can access protected kernel memory 30 | * Set of system calls is fixed; each system call identified by unique number 31 | * Each system call may have a set of arguments specifying info to be transferred from user to kernel space and vice versa 32 | - System call invocation: `open()` 33 | 1. User app makes `open()` system call 34 | 2. Processor enters kernel mode 35 | 3. System call table maps to code of `open()` system call 36 | 4. Executes and returns output to user app 37 | - System call invocation: `read()` 38 | 1. Application uses `read()` wrapper function from glibc library 39 | ``` 40 | read(fd, buffer, count); # Reads data from the file associated with file descriptor fd into the buffer pointed to by buffer, for count bytes 41 | ``` 42 | - Note: pushes arguments onto stack in reverse order 43 | 2. glibc wrapper function invokes syscall 44 | ``` 45 | read(...) 46 | { 47 | ... 48 | syscall(SYS_read, fd, buffer, count); 49 | ... 50 | return; 51 | } 52 | ``` 53 | - Arguments to `syscall()` are placed in CPU registers 54 | 3.
Trap handler copies `syscall()` parameters from CPU registers to the kernel process stack 55 | * It then uses the system call number (specified by `SYS_read` in the previous step) to find the address of the system call service routine 56 | * Finally it transfers control to the system call service routine 57 | 4. System call service routine 58 | ``` 59 | SYS_read() 60 | { 61 | ... 62 | ... 63 | return error 64 | } 65 | ``` 66 | - Returns results (error or success code) back to the trap handler 67 | 5. Trap handler switches back from kernel to user mode, returning results to the wrapper function 68 | 6. Wrapper function returns back to the application program with the results 69 | -------------------------------------------------------------------------------- /fundamentals/cse-421-intro-to-os/lecture-24-filesystem-data-structure.md: -------------------------------------------------------------------------------- 1 | # Lecture 24 2 | 3 | ## ext4 inodes 4 | 5 | - 1 inode per file 6 | - 256 bytes, so 2 per 512-byte sector or 16 per 4K block 7 | - Contains: 8 | * Location of file data blocks (contents) 9 | * Permissions including user, read/write/execute bits, etc 10 | * Timestamps including creation (`crtime`), access (`atime`), content modification (`mtime`), attribute modification (`ctime`) and delete (`dtime`) times 11 | - Named and located by number 12 | 13 | ## Locating Inodes 14 | 15 | - How does the system translate an inode number into an inode structure? 16 | * All inodes are created at format time at well-known locations 17 | - What are the consequences of this?
18 | * Inodes may not be located near file contents 19 | + `ext4` creates multiple blocks of inodes within the drive to reduce seek times between inodes and data 20 | * Fixed number of inodes for the file system 21 | + Can run out of inodes before we run out of data blocks 22 | + `ext4` creates approximately one inode per 16 KB of data blocks, but this can be configured at format time 23 | 24 | ## Directories 25 | 26 | - Simply a special file the contents of which map relative names to inode numbers 27 | ``` 28 | ls -i / 29 | 131073 bin 30 | 39217 boot 31 | 3 dev 32 | ... 33 | ``` 34 | - Above shows the inode mapping for each of the root directory's contents 35 | - Note: `/proc` and `/sys` have the same inode number `1` since they are pseudo filesystems (they do not live on disk) 36 | 37 | ## open: Path Name Translation 38 | 39 | - `open("/etc/default/keyboard")` must translate `/etc/default/keyboard` into an inode number 40 | 1. Get inode number for root directory (usually a fixed agreed-on inode number e.g., `2`) 41 | 2. Open directory with inode number `2` and look for `etc`. Let's assume it is `393218` 42 | 3. Open directory with inode number `393218` and look for `default`. Let's assume it is `393247` 43 | 4. Open directory with inode number `393247` and look for `keyboard`. Let's assume it is `394692` 44 | 5.
Open file with inode number `394692` 45 | 46 | ## read/write: Retrieving and Modifying Data Blocks 47 | 48 | - `read/write(filehandle, 345)` must translate `345` (the offset) to a data block within the open file to determine what data block to modify 49 | - There are multiple ways of doing this 50 | 51 | ## Data Blocks: Linked Lists 52 | 53 | - One solution: organize data blocks into a linked list 54 | * Inode contains a pointer to the first data block 55 | * Each data block contains a pointer to the previous and next data block 56 | - Pros: 57 | * Simple 58 | * Small amount of information in inode 59 | - Cons: 60 | * Offset lookups are slow - O(n) in the size of the file 61 | 62 | ## Data Blocks: Flat Array 63 | 64 | - Store all data block pointers in the inode in a single array, allocated at file creation time 65 | - Pros: 66 | * Simple 67 | * Offset lookups are fast O(1) 68 | - Cons: 69 | * Maximum file size is fixed at creation time 70 | 71 | ## Data Blocks: Multilevel Index 72 | 73 | - Most files are small, but some can get very large 74 | - Have inode store: 75 | * Some pointers to data blocks, which we refer to as direct blocks 76 | * Some pointers to blocks containing pointers to blocks, which we refer to as indirect blocks 77 | * Some pointers to blocks containing pointers to blocks containing pointers to blocks, which we refer to as doubly indirect blocks 78 | * Etc.
79 | - Pros: 80 | * Index scales with the size of the file 81 | * Offset lookups are still fairly fast 82 | * Small files stay small but big files can get extremely large -------------------------------------------------------------------------------- /fundamentals/cse-421-intro-to-os/lecture-4-fork-and-sync.md: -------------------------------------------------------------------------------- 1 | # Lecture 4 2 | 3 | ## Pipes 4 | 5 | - Chains of communicating processes can be created by exploiting the `pipe()` system call 6 | - Standard output of one process is passed as standard input to another 7 | - `pipe()` creates an anonymous pipe object and returns two file descriptors 8 | 1. For the read-only end 9 | 2. For the write-only end 10 | - Pipe contents are buffered in memory 11 | - IPC using `fork()` and `pipe()`: 12 | 1. Before calling `fork()`, parent creates a pipe object by calling `pipe()` 13 | 2. Next, it calls `fork()` 14 | 3. After `fork()`, the parent closes its copy of the read-only end and the child closes its copy of the write-only end 15 | 4. 
Now the parent can pass information to its child 16 | - Issues with `fork()` 17 | * Copying all the state is expensive 18 | + Especially when the next thing a process does is load a new binary, which destroys most of the state `fork()` has carefully copied 19 | * Several solutions to this problem: 20 | + Optimize existing semantics through copy-on-write 21 | + Change the semantics: `vfork()`, which will fail if the child does anything other than immediately load a new executable 22 | - Note: this does not copy the address space 23 | * What if I don't want to copy process state? 24 | + `fork()` is now replaced by `clone()` - a more flexible primitive that enables more control: 25 | - Over sharing, including sharing memory and signal handlers 26 | - Over child execution, which begins at a function pointer passed to the system call instead of resuming at the point where `fork()` was called 27 | - `fork()` establishes a parent-child relationship between two processes at the point when one is created 28 | - `pstree` utility allows you to visualize these relationships 29 | 30 | ## Synchronization 31 | 32 | - The OS creates the illusion of concurrency by quickly switching the processor between multiple threads 33 | - Threads are used to abstract and multiplex the CPU 34 | - The illusion of concurrency is both powerful and useful 35 | * Helps us think about how to structure our applications 36 | * Hides latencies caused by slow hardware devices 37 | - Unfortunately, concurrency also creates problems 38 | * Coordination: how do we enable efficient communication between the multiple threads involved in performing a single task? 39 | * Correctness: how do we ensure shared state remains consistent when being accessed by multiple threads concurrently?
40 | + How do we enforce time-based semantics? 41 | 42 | ## Patient 0 43 | 44 | - The OS itself is one of the most difficult concurrent programs to write 45 | * It is multiplexing access to hardware resources and therefore sharing a lot of state between multiple processes 46 | * It is frequently using many threads to hide hardware delays while servicing devices and application requests 47 | * Lots of shared state plus lots of threads equals a difficult synchronization problem 48 | - Unless explicitly synchronized, threads may: 49 | 1. Be run in any order 50 | 2. Be stopped and restarted at any time 51 | 3. Remain stopped for arbitrary lengths of time 52 | - In general, these are good properties since the OS is making choices about how to allocate resources 53 | - When accessing shared data, these properties become challenges that force us to program carefully 54 | 55 | ## Race Condition 56 | 57 | - A race condition is when the output of a process is unexpectedly dependent on timing or other events 58 | - Note that the definition of a race depends on what we expected to happen -------------------------------------------------------------------------------- /fundamentals/grok-system-design/Lectures/3-Caching.md: -------------------------------------------------------------------------------- 1 | # Caching 2 | 3 | - LBs help you scale horizontally; caching enables you to make better use of existing resources 4 | - Caches take advantage of the locality of reference principle (recently requested data is likely to be requested again) 5 | - Used in every layer of computing e.g., hardware, OS, web browsers, web apps, etc 6 | - Cache has a limited amount of space but is faster than the original data source, and contains the most recently accessed items 7 | - Caches are implemented near the front end in order to return data quickly 8 | 9 | ## Application Server Cache 10 | 11 | - Place cache on a request layer node to enable local storage of response data 12 | - When request made to service, node returns cached data if it exists;
otherwise, node will query data from disk 13 | - Cache can be in memory (fast) or on node's storage (slow, but still faster than going to network storage) 14 | - If there is a cluster of cache nodes, higher chance of cache miss since LB randomly distributes requests 15 | - To overcome this we can use global caches and distributed caches 16 | 17 | ## Content Distribution Network (CDN) 18 | 19 | - CDN is a kind of cache used for serving large amounts of static data 20 | - CDN setup: 21 | 1. Request asks CDN for a piece of static data 22 | 2. CDN serves content if locally available 23 | 3. If unavailable, CDN queries back-end servers for the file, caches it locally and serves it to the requester 24 | - Note: if system isn't large enough yet to have a CDN, we can just use Nginx with a separate subdomain pointing to it and have it serve static media 25 | 26 | ## Cache Invalidation 27 | 28 | - If data is modified in the database, it should be invalidated in the cache; otherwise inconsistent app behavior will occur 29 | - Three schemes for cache invalidation: 30 | 1. Write-through cache 31 | * Data written into cache and corresponding database at the same time 32 | * Complete data consistency between cache and storage 33 | * Minimizes risk of data loss, but every write operation must be done twice before returning success to the client 34 | * Scheme has higher latency for write operations 35 | 2. Write-around cache 36 | * Data written directly to permanent storage, bypassing cache 37 | * Keeps the cache from being flooded with write operations that will not subsequently be re-read 38 | * Disadvantage that a read request for recently written data will create a cache miss and must be read from the slower backend 39 | 3.
Write-back cache 40 | * Data is written to cache and completion confirmed to client 41 | * Write to permanent storage is done after a specified interval or under certain conditions 42 | * Results in low latency and high throughput for write-intensive apps 43 | * Risk of data loss if the cache crashes, since the only copy of the data is in the cache 44 | 45 | ## Cache Eviction Policies 46 | 47 | - Common cache eviction policies: 48 | 1. First in first out (FIFO) 49 | * Evicts the block accessed first, without regard to how often or how many times it was accessed before 50 | 2. Last in first out (LIFO) 51 | * Evicts the block accessed most recently, without regard to how often or how many times it was accessed before 52 | 3. Least recently used (LRU) 53 | * Discards least recently used items first 54 | 4. Most recently used (MRU) 55 | * Discards most recently used items first 56 | 5. Least frequently used (LFU) 57 | * Counts how often an item is needed and the least needed items are discarded first 58 | 6. Random replacement (RR) 59 | * Randomly selects an item to discard when required -------------------------------------------------------------------------------- /fundamentals/cse-421-intro-to-os/lecture-27-log-structured-files.md: -------------------------------------------------------------------------------- 1 | # Lecture 27 2 | 3 | ## Computers Circa 1991 4 | 5 | - Disk bandwidth is improving rapidly, meaning the OS can stream reads/writes to the disk faster 6 | - Computers have more memory (up to 128MB) 7 | - Disk seek times still slow 8 | 9 | ## Using What We Got 10 | 11 | - So if we can solve this seek issue, we can utilize growing disk bandwidth to improve filesystem performance 12 | * We have a bunch of spare memory, maybe that can be useful 13 | 14 | ## Use a Cache 15 | 16 | - How do we make a big slow thing look faster? 17 | * Use a cache 18 | * In the case of the file system, the smaller faster thing is memory 19 | * We call the memory used to cache file system data the buffer
cache 20 | - With a large cache, we should be able to avoid doing almost any disk reads 21 | - We still have to do disk writes, but the cache will help collect small writes in memory until we can do one larger write 22 | 23 | ## Log Structured File Systems 24 | 25 | - All writes go to an append-only log 26 | - Example: change an existing byte in a file: 27 | 1. Seek to read the inode map 28 | 2. Seek to read the inode 29 | 3. Seek to write the data block 30 | 4. Seek to write the inode 31 | - For a cached-read write: 32 | 1. Read the inode map from cache 33 | 2. Read the inode from cache 34 | 3. Seek to write the data block 35 | 4. Seek to write the inode 36 | - For an LFS write: 37 | * Reads are handled by the cache and writes can stream to the disk at full bandwidth due to short seeks to append to the log 38 | - When do we write to the log? 39 | * When the user calls `sync` or `fsync`, or when blocks are evicted from the buffer cache 40 | 41 | ## Locating LFS Inodes 42 | 43 | - How did FFS translate an inode number to a disk block? 44 | * It stored the inode map in a fixed location on disk 45 | - Why is this a problem for LFS? 46 | * Inodes are just appended to the log and so they can move 47 | - And so what do you think LFS does about this? 48 | * It logs the inode map 49 | 50 | ## LFS Metadata 51 | 52 | - What about file system metadata: inode and data block allocation bitmaps, etc? 53 | * We can log that stuff too 54 | 55 | ## As the Log Turns 56 | 57 | - What happens when the log reaches the end of the disk? 58 | * Probably a lot of unused space earlier in the log due to overwritten inodes, data blocks, etc 59 | - How do we reclaim this space?
60 | * Clean the log by identifying empty space and compacting used blocks 61 | - Conceptually you can think of this happening across the entire disk all at once, but for performance reasons LFS divides the disk into segments which are cleaned separately 62 | 63 | ## The Devil is in the Cleaning 64 | 65 | - LFS seems like a great idea until you think about cleaning 66 | 67 | ## Cleaning Questions 68 | 69 | - When should we run the cleaner? 70 | * Probably when the system is idle, which may be a problem on systems that don't idle much 71 | - What size segments should we clean? 72 | * Large segments amortize the cost to read and write all of the data necessary to clean the segment 73 | * But small segments increase the probability that all blocks in a segment will be dead, making cleaning trivial 74 | - What other effect does log cleaning have? 75 | * Cleaner overhead is very workload-dependent, making it difficult to reason about the performance of log-structured file systems 76 | 77 | ## Reading Questions 78 | 79 | - Let's say that the cache does not soak up as many reads as we were hoping 80 | - What problem can LFS create? 81 | * Block allocation is extremely discontiguous, meaning that reads may seek all over the disk 82 | -------------------------------------------------------------------------------- /fundamentals/cse-421-intro-to-os/lecture-14-scheduling-story.md: -------------------------------------------------------------------------------- 1 | # Lecture 14 2 | 3 | ## Oracular Spectacular 4 | 5 | - Normally we cannot predict the future 6 | * Control flow is unpredictable 7 | * Users are unpredictable 8 | - Instead we use the past to predict the future 9 | * What did the thread do recently? It will probably keep doing that 10 | 11 | ## Multi-Level Feedback Queues 12 | 13 | - Choose a scheduling quantum 14 | - Establish some number of queues, each representing a level 15 | - Threads from the highest-level queues are chosen first 16 | - Then: 17 | 1.
Choose and run a thread from the highest-level non-empty queue 18 | 2. If the thread blocks or yields, promote it to a higher level queue 19 | 3. If the thread must be preempted at the end of a quantum, demote it to a lower level queue 20 | - What happens to: 21 | * CPU-bound threads? They descend to the depths 22 | * I/O-bound threads? They rise to the heights 23 | - Can anyone spot any problems with this approach? 24 | * Starvation i.e., threads trapped in the lower queues may never have a chance to run 25 | - One solution is to periodically rebalance the levels by tossing everyone back to the top level 26 | 27 | ## Establishing Priorities 28 | 29 | - Priorities are a scheduling abstraction that allows user or system to assign relative importance between tasks 30 | - For example: 31 | * Backup task: low priority 32 | * Video encoding: low priority 33 | * Video playback: high priority 34 | * Interactive apps: medium priority 35 | - Priorities are always relative 36 | 37 | ## Priority Starvation 38 | 39 | - Strict priorities can lead to starvation when low-priority threads are constantly blocked by high-priority threads with work to do 40 | - One solution is lottery scheduling: 41 | 1. Give each thread a number of tickets proportional to their priority 42 | 2. 
Choose a ticket at random - the thread holding the ticket gets to run 43 | - Priorities may also be used to determine how long threads are allowed to run i.e., dynamically adjusting their time quantum 44 | 45 | ## Linux Scheduling Pre-2.6 46 | 47 | - Scheduler scaled poorly, requiring O(n) time to schedule tasks where n is the number of runnable threads 48 | 49 | ## Linux 2.6 Scheduler 50 | 51 | - Linux kernel scheduler maintainer implemented a new O(1) scheduler to address scalability issues with the earlier approach 52 | - O(1) scheduler combines a static and dynamic priority 53 | * Static priority: set by user or system using `nice` 54 | * Dynamic priority: potential boost to static priority intended to reward interactive threads 55 | 56 | ## Rotating Staircase Deadline Scheduler (RSDL) 57 | 58 | - One parameter: round-robin interval 59 | - One input: thread priority 60 | - Priority defines levels at which a task can run 61 | * High priority tasks: more levels, more chances to run 62 | * Low priority tasks: fewer levels, fewer chances to run 63 | - Tasks can run for at most a fixed amount of time per level 64 | - Each level can also run for at most a fixed amount of time 65 | 66 | ## RSDL 67 | 68 | - To begin a scheduling epoch: 69 | 1. Put all threads in a queue determined by priority 70 | 2. If a thread blocks or yields, it remains at its level 71 | 3. If a thread runs out of quota, it moves to the next level down 72 | 4. If a level runs out of its quota, all threads move to the next level down 73 | 5.
Continue until all quotas exhausted or no threads are runnable, then restart another epoch 74 | 75 | ## RSDL Pros 76 | 77 | - Easily calculate how long it will be before a thread at a certain priority level runs 78 | - Simple, fixed accounting; scheduling is O(1) 79 | - More recent versions use interleaving to further reduce the delay between scheduling tasks with different priorities -------------------------------------------------------------------------------- /fundamentals/cse-421-intro-to-os/lecture-17-page-translation.md: -------------------------------------------------------------------------------- 1 | # Lecture 17 2 | 3 | ## Pages 4 | 5 | - Modern solution is to choose a translation granularity that is small enough to limit internal fragmentation but large enough to allow the TLB to cache entries covering a significant amount of memory 6 | * Also limits the size of kernel data structures associated with memory management 7 | - Execution locality also helps here: process memory accesses are typically highly spatially clustered, meaning that even a small cache can be effective 8 | 9 | ## Page Size 10 | 11 | - 4K is a very common page size 12 | - 8K or larger pages are also sometimes used 13 | - 4K pages and a 128-entry TLB allow caching translations for 512 KB of memory 14 | - You can think of pages as fixed size segments so the bound is the same for each 15 | 16 | ## Page Translation 17 | 18 | - We refer to the portion of the virtual address that identifies the page as the virtual page number (VPN) and the remainder as the offset 19 | - Virtual pages map to physical pages 20 | - All addresses inside a single virtual page map to the same physical page 21 | - Check: for 4K pages, split 32-bit address into virtual page number (top 20 bits) and offset (bottom 12 bits); check if a virtual page to physical page translation exists for this page 22 | - Translate: physical address = physical page + offset 23 | 24 | ## TLB Example 25 | 26 | - Assume we have a TLB: 27 | * 0x10
to 0x50 28 | * 0x800 to 0x306 29 | * 0x110 to 0x354 30 | * 0x674 to 0x232 31 | - For the virtual address 0x800346 we split it into 0x800 and 0x346, where 0x346 is the offset 32 | - 0x800 maps to the physical page number 0x306 33 | - We combine the physical page number and the offset to get the physical address of 0x306346 34 | 35 | ## TLB Management 36 | 37 | - Where do entries in the TLB come from? 38 | * The OS loads them 39 | - What happens if a process tries to access an address not in the TLB? 40 | * TLB asks OS for help via a TLB exception 41 | * OS either loads the mapping or figures out what to do with the process (possibly termination) 42 | 43 | ## Paging Pros 44 | 45 | - Maintains many of the pros of segmentation, which can be layered on top of paging 46 | - Can organize and protect regions of memory appropriately 47 | - Better fit for address spaces 48 | * Even less internal fragmentation than segmentation due to smaller allocation size 49 | 50 | ## Paging Cons 51 | 52 | - Requires per-page hardware translation: use hardware to help us 53 | - Requires per-page OS state: a lot of clever engineering here 54 | 55 | ## Page State 56 | 57 | - In order to keep the TLB up-to-date we need to be able to: 58 | * Store info about each virtual page 59 | * Locate that information quickly 60 | 61 | ## Page Table Entries (PTEs) 62 | 63 | - We refer to a single entry storing information about a single virtual page used by a single process as a page table entry (PTE) 64 | * Can usually jam everything into one 32-bit machine word: 65 | + Location: 20 bits (physical page number or location on disk) 66 | + Permissions: 3 bits (read, write, execute) 67 | + Valid: 1 bit (is the page located in memory) 68 | + Referenced: 1 bit (has the page been read/written to recently) 69 | 70 | ## Locating Page State 71 | 72 | - Process: "Machine, store to address 0x10000" 73 | - MMU: "Where is the virtual address 0x10000 supposed to be? Kernel, help!"
74 | - Exception 75 | - Kernel: "Let's see, where did I put that page table entry for 0x10000? I should be more organized!" 76 | - What are some requirements for how we locate page information: 77 | * Speed: translation is a hot path and should be as efficient as possible 78 | - Data structure used to quickly map a virtual page number to a page table entry is called a page table -------------------------------------------------------------------------------- /cramming/past/minesweeper/minesweeper_2.py: -------------------------------------------------------------------------------- 1 | # Implement minesweeper 2 | # 3 | # Same as initial version but rectangular and 4 | # more object oriented 5 | 6 | import random 7 | 8 | class Grid: 9 | 10 | def __init__(self, n_rows, n_cols, n_mines): 11 | 12 | self.n_rows = n_rows 13 | self.n_cols = n_cols 14 | self.n_mines = n_mines 15 | self.grid = self.create_grid(n_rows, n_cols, n_mines) 16 | 17 | 18 | def create_grid(self, n_rows, n_cols, n_mines): 19 | 20 | # Initialize grid to zeros 21 | grid = [[0 for col in range(n_cols)] for row in range(n_rows)] 22 | 23 | # Randomly assign mines 24 | self.assign_mines(grid, n_rows, n_cols, n_mines) 25 | 26 | # Assign values to each cell based on placed bombs 27 | self.assign_values(grid, n_rows, n_cols) 28 | 29 | return grid 30 | 31 | 32 | def assign_mines(self, grid, n_rows, n_cols, n_mines): 33 | 34 | set_mines = 0 35 | while set_mines < n_mines: 36 | row = random.choice(range(0, n_rows)) 37 | col = random.choice(range(0, n_cols)) 38 | # Mark selected location as a bomb 39 | if grid[row][col] == 0: 40 | grid[row][col] = "b" 41 | set_mines += 1 42 | 43 | 44 | def assign_values(self, grid, n_rows, n_cols): 45 | 46 | for row in range(0, n_rows): 47 | for col in range(0, n_cols): 48 | if grid[row][col] != "b": 49 | n_bombs = 0 50 | # Check one to the right 51 | if col < n_cols - 1: 52 | if grid[row][col + 1] == "b": 53 | n_bombs += 1 54 | # Check one to the left 55 | if col > 0: 56 | if
grid[row][col - 1] == "b": 57 | n_bombs += 1 58 | # Check one below 59 | if row > 0: 60 | if grid[row - 1][col] == "b": 61 | n_bombs += 1 62 | # Check one above 63 | if row < n_rows - 1: 64 | if grid[row + 1][col] == "b": 65 | n_bombs += 1 66 | grid[row][col] = n_bombs 67 | 68 | 69 | if __name__ == "__main__": 70 | 71 | # Grid setup 72 | n_rows, n_cols, n_mines = 2, 3, 3 73 | 74 | # Create the grid 75 | grid = Grid(n_rows, n_cols, n_mines).grid 76 | 77 | # Initialize solution grid 78 | solution = [["X" for i in range(n_cols)] for j in range(n_rows)] 79 | 80 | # Start the game 81 | n_found, n_total = 0, n_rows * n_cols - n_mines 82 | while True: 83 | 84 | # Re-display game board 85 | for row in solution: 86 | print row 87 | 88 | # Ask for user input 89 | row_inp = int(raw_input("choose row: ")) 90 | col_inp = int(raw_input("choose column: ")) 91 | mark = raw_input("mark as potential bomb? (y/n): ") 92 | 93 | # Check input against grid 94 | if mark == "y": 95 | solution[row_inp][col_inp] = "?" 96 | else: 97 | if grid[row_inp][col_inp] == "b": 98 | game_status = "You Lost" 99 | break 100 | else: 101 | solution[row_inp][col_inp] = str(grid[row_inp][col_inp]) 102 | n_found += 1 103 | if n_found == n_total: 104 | game_status = "You Won" 105 | break 106 | 107 | # Print solution 108 | for row in grid: 109 | print row 110 | 111 | print game_status 112 | 113 | 114 | -------------------------------------------------------------------------------- /fundamentals/cse-421-intro-to-os/lecture-32-performance.md: -------------------------------------------------------------------------------- 1 | # Lecture 32 2 | 3 | ## OS Performance 4 | 5 | 1. Measure your system 6 | * How and doing what? 7 | + High level software counters may not have fine enough resolution to measure extremely fast events 8 | + Low level hardware counters may have extremely device-specific interfaces, making cross-platform measurements more difficult 9 | + Measurements should be repeatable, right?
10 | - Wrong, you are measuring the present but the rest of the system is trying to use the past to predict the future 11 | - In general, real systems are almost never in the exact same state as they were last time you measured whatever you are trying to measure 12 | + Measurement tends to affect the thing that you are trying to measure 13 | - This has three results: 14 | 1. Measurement may destroy the problem you are trying to measure 15 | 2. Must separate results from the noise produced by measurement 16 | 3. Measurement overhead may limit your access to real systems 17 | + This is even more fraught given how central the OS is to the operation of the computer itself 18 | - Difficult to find the appropriate places to insert debugging hooks 19 | - OS can generate a lot of debugging output e.g., imagine tracing every page fault 20 | * Benchmarking real systems seems really hard. What else can we do? 21 | + Build a model: abstract away all of the low-level details and reason analytically 22 | + Build a simulator: write some additional code to perform a simplified simulation of more complex parts of the system - particularly hardware 23 | * Models: 24 | + Pros: 25 | - Can make strong mathematical guarantees about system performance 26 | + Cons: 27 | - Usually after making a bunch of unrealistic assumptions 28 | * Simulations: 29 | + Pros: 30 | - Best case, experimental speedup outweighs lack of hardware details 31 | + Cons: 32 | - Worst case, bugs in the simulation lead you in all sorts of wrong directions 33 | * What metric do I use to compare: 34 | + Two disk drives 35 | + Two scheduling algorithms 36 | + Two page replacement algorithms 37 | + Two file systems 38 | * Microbenchmarks: isolate one aspect of system performance 39 | + Example: measuring virtual memory system 40 | - Time to handle single page fault 41 | - Time to look up page in the page table 42 | - Time to choose a page to evict 43 | + Problem: may not be studying the right thing 44 | * Macrobenchmarks: measure
one operation involving many parts of the system working together 45 | + Example: measuring virtual memory system 46 | - Aggregate time to handle page faults on a heavily-loaded system 47 | - Page fault rate 48 | + Problem: introduces many, many variables that can complicate analysis 49 | * Application benchmarks: focus on the performance of the system as observed by one application 50 | + Example: measuring virtual memory system 51 | - `triplesort` 52 | - `parallelvm` 53 | + Problem: improvements for the app may harm others 54 | * Benchmark bias 55 | + People choosing and running benchmarks may be trying to justify some change by making their system look faster 56 | + Alternatively, people chose a benchmark and did work to improve its performance while ignoring other effects on the system 57 | * Fundamental tension: the most useful system is general purpose, but the fastest system is single purpose 58 | 2. Analyze the results 59 | * Use statistics 60 | 3. Improve the slow parts 61 | * How and which slow parts? 62 | 4. Repeat 63 | 64 | -------------------------------------------------------------------------------- /fundamentals/cse-421-intro-to-os/lecture-11-threads.md: -------------------------------------------------------------------------------- 1 | # Lecture 11 2 | 3 | ## Threads 4 | 5 | - What is a thread? 6 | * Registers 7 | * Stack 8 | - How is each of the following shared between threads or processes? 9 | * Registers (private to thread) 10 | * Stack (private to thread) 11 | * Memory - shared between multiple threads (part of process) 12 | * File descriptor table - shared between multiple threads (part of process) 13 | 14 | ## Why Use Threads?
15 | 16 | - Threads can be a good way of thinking about apps that do multiple things "simultaneously" 17 | - Threads may naturally encapsulate some data about a certain thing that the app is doing 18 | - Threads may help apps hide or parallelize delays caused by slow devices 19 | 20 | ## Threads vs Events 21 | 22 | - While threads are a reasonable way of thinking about concurrent programming, they are not the only way to make use of system resources 23 | - Another approach is event-driven programming 24 | - Anyone who has done JavaScript development or used frameworks e.g., `node.js` has grown familiar with this programming model 25 | - Simplification of events vs threads: 26 | * Threads can block so we make use of the CPU by switching between threads 27 | * Event handlers cannot block so we can make use of the CPU by simply running events until completion 28 | 29 | ## Naturally Multithreaded Applications 30 | 31 | - Web server 32 | * Use a separate thread to handle each incoming request 33 | - Web browser 34 | * Separate threads for each open tab 35 | * When loading a page, separate threads to request and receive each unique part of the page 36 | - Scientific applications 37 | * Divide-and-conquer parallelizable datasets 38 | 39 | ## Why Not Processes? 40 | 41 | - IPC is more difficult because kernel tries to protect processes from each other 42 | * Inside a single process, anything goes 43 | - There is state associated with processes that doesn't scale very well 44 | 45 | ## Implementing Threads 46 | 47 | - Threads can be implemented in userspace by unprivileged libraries 48 | * This is the `M:1` threading model 49 | + `M` user threads that look like `1` thread to the OS kernel 50 | - Threads can be implemented by the kernel directly 51 | * This is the `1:1` threading model 52 | 53 | ## Implementing Threads in Userspace 54 | 55 | - How is this possible?
56 | * Doesn't involve multiplexing between processes so no kernel privilege required 57 | - How do I: 58 | * Save and restore context? 59 | + Just saving and restoring registers 60 | + The C library has an implementation: `setjmp()` and `longjmp()` 61 | * Preempt other threads? 62 | + Use periodic signals delivered by the OS to activate a userspace thread scheduler 63 | 64 | ## Comparing Threading Implementations 65 | 66 | - `M:1` userspace threading 67 | * Pros: 68 | + Threading operations are much faster because they do not have to cross the user/kernel boundary 69 | + Thread state can be smaller 70 | * Cons: 71 | + Can't use multiple cores 72 | + OS may not schedule the app correctly because it doesn't know about the fact that it contains more than one thread 73 | - `1:1` kernel threading 74 | * Pros: 75 | + Scheduling might improve because kernel can schedule all threads in the process 76 | * Cons: 77 | + Context switch overhead for all threading operations 78 | 79 | ## Thread States 80 | 81 | - Several different states: 82 | 1. Running: executing instructions on CPU core 83 | 2. Ready: not executing instructions but capable of being restarted 84 | 3. Waiting, blocked, sleeping: not executing instructions and not able to be restarted until some event occurs 85 | - Transitions: 86 | 1. Running to ready: thread was descheduled 87 | 2. Running to waiting: thread performed a blocking system call 88 | 3. Waiting to ready: event thread was waiting for happened 89 | 4. Ready to running: thread was scheduled 90 | 5.
Running to terminated -------------------------------------------------------------------------------- /fundamentals/grok-system-design/Lectures/4-Data-Partitioning.md: -------------------------------------------------------------------------------- 1 | # Data Partitioning 2 | 3 | - Data partitioning breaks up big database into smaller parts 4 | - Splits up database across machines to improve manageability, performance, availability, and load balancing of an application 5 | - At certain scale point, cheaper to scale horizontally than grow vertically 6 | 7 | ## Partitioning Methods 8 | 9 | - Three most popular schemes used by various large scale apps: 10 | 1. Horizontal partitioning 11 | * Put different rows into different tables 12 | * Example: zip codes < 10000 stored in one table and the rest in another (range based partitioning) 13 | * Also known as data sharding 14 | * Need to carefully choose ranges or else unbalanced servers occur 15 | 2. Vertical partitioning 16 | * Divide our data to store tables related to a specific feature in their own server 17 | * Example: place profile info on one DB server, friends list on another and photos on a third server 18 | * Straightforward to implement and has low impact on app 19 | * Main problem is that if app grows, may be necessary to further partition a feature specific DB across various servers 20 | 3.
Directory based partitioning 21 | * Create a lookup service that knows current partitioning scheme and abstracts it away from DB access code 22 | * To find out where a particular data entity resides, we query the directory server that holds the mapping from each tuple key to its DB server 23 | * We can perform tasks e.g., adding servers to DB pool or change partitioning scheme without having impact on app 24 | 25 | ## Partitioning Criteria 26 | 27 | - Key or hash-based partitioning 28 | * Apply a hash function to some key attributes that yields the partition number 29 | * Example: if we have 100 DB servers and our ID is an incrementing numerical value we can hash with `ID % 100` to determine which DB server to assign a record 30 | * Problem with approach is that it effectively fixes the number of total DB servers since adding a new one would require redistribution of data and downtime 31 | * Workaround is to use consistent hashing 32 | - List partitioning 33 | * Each partition is assigned a list of values so whenever we want to insert a new record, we see which partition contains our key and then store it there 34 | * Example: users living in Iceland, Norway, Sweden, Finland and Denmark are in partition for Nordic countries 35 | - Round-robin partitioning 36 | * With `n` partitions, the `i`-th tuple is assigned to partition `i mod n` 37 | - Composite partitioning 38 | * Combine any of the above partitioning schemes to devise a new scheme 39 | 40 | ## Common Problems of Data Partitioning 41 | 42 | - Extra constraints on different operations that can be performed on a partitioned database 43 | - Operations across multiple tables or multiple rows in the same table can no longer run on the same server 44 | - Some constraints and additional complexities caused by partitioning: 45 | 1.
Joins and denormalization 46 | * Once DB is partitioned and spread across multiple machines, often not feasible to perform joins 47 | * Data has to be compiled from multiple servers 48 | * Workaround is to denormalize the DB so that queries that previously required joins can be performed on a single table 49 | 2. Referential integrity 50 | * Trying to enforce data integrity constraints e.g., foreign keys in a partitioned DB can be difficult 51 | * Most RDBMS do not support foreign key constraints across DBs on different DB servers 52 | * This means apps that require referential integrity on partitioned DBs have to enforce it in the app code 53 | 3. Rebalancing 54 | * Many reasons we have to change our partitioning scheme: 55 | 1. Data partition is not uniform 56 | 2. A lot of load on a particular partition 57 | * In these cases, we either have to: 58 | 1. Create more DB partitions 59 | 2. Rebalance existing partitions, which means the partitioning scheme changed and all existing data moved to new locations 60 | * Rebalancing without incurring downtime is difficult -------------------------------------------------------------------------------- /fundamentals/cse-421-intro-to-os/lecture-20-page-replacement.md: -------------------------------------------------------------------------------- 1 | # Lecture 20 2 | 3 | ## Page Eviction 4 | 5 | - In order to swap out a page we need to choose which page to move to disk 6 | - In order to swap in a page we might need to choose which page to swap out 7 | - Swapping cost-benefit calculation: 8 | * Cost: mainly time and disk bandwidth required to move a page to and from disk 9 | * Benefit: Use of 4K (or a page) of memory as long as the page on disk remains unused 10 | - There are tricks that the OS might play to minimize the cost, but mainly we focus on algorithms that maximize the benefit 11 | - Another complementary description of our goal is minimizing the page fault rate 12 | 13 | ## Maximizing Benefit 14 | 15 | - Benefit: use
of 4K memory as long as the page on disk remains unused 16 | - How do we maximize the benefit: 17 | * Pick the page to evict that will remain unused the longest 18 | 19 | ## Best Case Scenario 20 | 21 | - What is the absolute best page to evict, the one that page replacement algorithms dream about? 22 | * A page that will never be used again 23 | 24 | ## Thrashing 25 | 26 | - Virtual memory subsystem is in constant state of paging, rapidly exchanging data in memory for data on disk 27 | - Causes performance of the computer to degrade or collapse 28 | 29 | ## Break Out the Ball 30 | 31 | - What would we like to know about a page when choosing one to evict? 32 | * How long will it be before this page is used again? 33 | - The optimal scheduler evicts the page that will remain unused the longest 34 | - This clearly maximizes our swapping cost-benefit calculation 35 | 36 | ## Past Didn't Go Anywhere 37 | 38 | - Intelligent page replacement requires three things: 39 | 1. Determining what information to track 40 | 2. Figuring out how to collect that information 41 | 3. 
How to store it 42 | 43 | ## There are Tradeoffs 44 | 45 | - Collecting statistics may be expensive, slowing down the process of translating virtual addresses 46 | - Storing statistics may be expensive, occupying kernel memory that could be used for other things 47 | 48 | ## Simplest 49 | 50 | - What is the simplest possible page replacement algorithm? 51 | * Random 52 | - Pros 53 | * Easy 54 | * Good baseline for algorithms that may try to be smarter 55 | - Cons 56 | * Too simple 57 | 58 | ## Use the Past 59 | 60 | - What is an algorithm that uses a page's past to predict its future? 61 | * Least recently used 62 | * Choose page that has not been used for the longest period of time 63 | * Hopefully this is a page that will not be used for a while 64 | - Pros 65 | * As good as we can do without predicting the future 66 | - Cons 67 | * How do we tell how long it has been since a page has been accessed 68 | 69 | ## LRU: Collecting Statistics 70 | 71 | - At what point does OS know that a process has accessed a virtual page? 72 | * When we load the entry into the TLB 73 | - Does this reflect every virtual page access? 74 | * No, only the first 75 | * A page that is accessed once and one that is accessed 1000 times are indistinguishable 76 | - Why not record every page access? 77 | * Too slow 78 | 79 | ## LRU: Storing Statistics 80 | 81 | - How much access time information can we store? 82 | * 32 bits = 2^32 ticks but doubles the page table entry size 83 | * 8 bits = 256 ticks 84 | - How do we find the least recently used page? 85 | * Need some kind of efficient data structure holding all physical pages on the system that is searched on every page eviction 86 | 87 | ## Clock LRU 88 | 89 | - Simple and efficient LRU-like algorithm 90 | - One bit of accessed information, set when loading a virtual address into the TLB 91 | - To locate a page to evict: 92 | 1. Cycle through all pages in memory in a fixed order 93 | 2. If a page's accessed bit is clear, evict that page 94 | 3.
If a page's accessed bit is set, clear the bit 95 | - If clock hand turning slowly 96 | * Little memory pressure 97 | * Making good decisions on what to evict 98 | - If clock hand turning rapidly 99 | * Lots of memory pressure 100 | * Making bad decisions on what to evict -------------------------------------------------------------------------------- /fundamentals/cse-421-intro-to-os/lecture-3-process-file-handlers.md: -------------------------------------------------------------------------------- 1 | # Lecture 3 2 | 3 | ## Processes 4 | - `ps aux | grep bash` to view bash processes running 5 | - `pgrep bash` to view PID of bash processes 6 | - `ps -Lf <pid>` to view threads 7 | * `UID` = user running the process 8 | * `PID` 9 | * `PPID` 10 | * `LWP` = lightweight process i.e., thread ID 11 | * `PRI` = scheduling priority 12 | * `SZ` = size of core image (kB) 13 | * `WCHAN` = if process is sleeping, description of what it is waiting on 14 | * `RSS` = total amount of resident memory in use by process (kB) 15 | * `TIME` = measure of amount of time process spent running 16 | - `pmap <pid>` to show mapping between process address space and content 17 | * `0005637ceb00000 1368K r-x-- systemd` 18 | * ` ` 19 | * Note: `anon` = memory used by process for its heap 20 | * Note: `stack` = memory used by process for its stack 21 | - `lsof -p <pid>` to list open files of a process 22 | * Note: `/dev/pts/0` indicates open terminal 23 | - Information obtained by `ps`, `pmap`, `lsof` is from `/proc` filesystem 24 | * No disk block storing the `/proc` filesystem 25 | * Linux creates these pseudofiles 26 | 27 | ## File Handles 28 | 29 | - File descriptor that processes receive from `open()` and pass to other file system calls is just an integer, an index into the process file table 30 | - That integer refers to a file handle object maintained by the kernel 31 | - That file handle object contains a reference to a separate file object also maintained by the kernel 32 | - The file object is mapped by
the file system to blocks on disk 33 | - Three levels of indirection: 34 | 1. File descriptor --> file handle 35 | 2. File handle --> file object 36 | 3. File object --> blocks on disk 37 | - Why this extra indirection with the file handle? 38 | * Allows certain pieces of state to be shared separately 39 | * File descriptors are private to each process 40 | * File handles are private to each process but shared after process creation 41 | + File handles store the current file offset or position in the file that the next read will come from or write will go to 42 | + File handles can be deliberately shared between two processes 43 | * File objects hold other file state and can be shared transparently between many processes 44 | * This follows an OS design principle of separating policy from mechanism 45 | * This also facilitates control or sharing by adding a level of indirection 46 | 47 | ## fork: Create a New Process 48 | 49 | - `fork()` is the system call to create a new process 50 | * `fork()` creates a new process that is a copy of the calling process 51 | * After `fork()` completes, we refer to the caller as the parent and the newly created process as the child 52 | - Generally `fork()` tries to make an exact copy of the calling process 53 | - Threads are a notable exception 54 | - Single-threaded `fork()` has reliable semantics because the only thread the process had is the one that called `fork()` 55 | * So nothing else is happening while we complete the system call 56 | - Multi-threaded `fork()` creates a host of problems that many systems choose to ignore 57 | * Linux will only copy state for the thread that called `fork()` 58 | * Two major problems with multi-threaded `fork()` 59 | 1. Another thread could be blocked in the middle of doing something (uniprocessor systems) 60 | 2.
Another thread could be actually doing something (multiprocessor systems) 61 | * This ends up being a mess so we just copy the calling thread 62 | - `fork()` copies 63 | * One thread -- the caller 64 | * The address space 65 | * The process file table 66 | - `fork()` returns two times 67 | * The child thread returns executing at the exact same point that its parent called `fork()` 68 | * `fork()` returns twice: the PID to the parent and 0 to the child 69 | - All contents of memory in the parent and child are identical 70 | - Both child and parent have the same files open at the same position but since they are sharing file handles changes to the file made by the parent or child will be reflected in the other 71 | -------------------------------------------------------------------------------- /fundamentals/cse-421-intro-to-os/lecture-22-files.md: -------------------------------------------------------------------------------- 1 | # Lecture 22 2 | 3 | ## File Systems to the Rescue 4 | 5 | - Low-level disk interface is messy and very limited 6 | * Requires reading and writing entire 512-byte blocks 7 | * No notion of files, directories, etc 8 | - File systems take this limited block-level device and create the file abstraction almost entirely in software 9 | * Compared to the CPU and memory that we have studied previously, more of the file abstraction is implemented in software 10 | * This explains the plethora of available file systems: EXT2/3/4, reiserfs, ntfs, jfs, xfs, etc 11 | 12 | ## What About Flash 13 | 14 | - No moving parts 15 | - We can eliminate a lot of the complexity of modern file systems 16 | - Except that: 17 | * Have to erase an entire large chunk before we can rewrite it 18 | * Wears out faster than magnetic drives, and can wear unevenly if we are not careful 19 | 20 | ## Clarifying the Concept of a File 21 | 22 | - Most of us are familiar with files but the semantics of a file have a variety of sources which are worth separating: 23 | * Just a
file: minimum it takes to be a file 24 | * About a file: what other useful information do most file systems typically store about files 25 | * Files and processes: what additional properties does the UNIX file system interface introduce to allow user processes to manipulate files 26 | * Files together: given multiple files, how do we organize them in a useful way 27 | 28 | ## Just a File: The Minimum 29 | 30 | - What does a file have to do to be useful: 31 | * Reliably store data 32 | * Be locatable, usually via a name 33 | 34 | ## Basic File Expectations 35 | 36 | - At minimum we expect that 37 | * File contents should not change unexpectedly 38 | * File contents should change when requested and as requested 39 | - These requirements seem simple but many file systems do not meet them 40 | - Failures such as power outages and sudden ejects make file system design difficult and expose tradeoffs between durability and performance 41 | * Memory: fast, transient 42 | * Disk: slow, stable 43 | 44 | ## About a File: File Metadata 45 | 46 | - What else might we want to know about a file: 47 | * When was the file created, last accessed, or last modified 48 | * Who is allowed to do what to the file e.g., read, write, rename, change other attributes, etc 49 | * Other file attributes?
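The kinds of metadata listed above can be inspected directly from a user process. As a quick illustration (plain Python, not part of the original notes), `os.stat` reports the timestamps, ownership, and permission bits a UNIX file system keeps for each file:

```python
import os
import stat
import tempfile

# Create a throwaway file so the example is self-contained
fd, path = tempfile.mkstemp()
os.write(fd, b"hello")
os.close(fd)

info = os.stat(path)

# Timestamps: last access, last modification, last metadata change
print("accessed:", info.st_atime)
print("modified:", info.st_mtime)
print("changed: ", info.st_ctime)

# "Who is allowed to do what": ownership and permission bits
print("owner uid:", info.st_uid)
print("mode:", stat.filemode(info.st_mode))

# Size is metadata too
print("size:", info.st_size)

os.unlink(path)
```

Note that nothing here answers "when was the file created": classic UNIX file systems do not store a creation time (`st_ctime` is the last metadata change), which is itself a design decision about which metadata is worth keeping.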
50 | 51 | ## Where to Store File Metadata 52 | 53 | - MP3 file contains audio data but also has title, artist, date 54 | - Where should these attributes be stored: 55 | * In the file itself 56 | * In another file 57 | * In attributes associated with the file maintained by the file system 58 | - In the file: 59 | * Ex) MP3 ID3 tag, data container stored within an MP3 file 60 | * Pro: travels with the file from computer to computer 61 | * Con: requires programs that access the file to understand the format of the embedded data 62 | - In another file: 63 | * Ex) iTunes database 64 | * Pro: can be maintained separately by each application 65 | * Con: Does not move with the file and the separate file must be kept in sync when the files it stores info about change 66 | - In attributes: 67 | * Ex) attributes have been supported by a variety of file systems including BFS 68 | * Pro: maintained by the file system so can be queried quickly 69 | * Con: does not move with the file and creates compatibility problems with other file systems 70 | 71 | ## Processes and Files: UNIX Semantics 72 | 73 | - Many file systems provide an interface for establishing a relationship between a process and a file: 74 | * "I have a file open. I am using this file" 75 | * "I am finished using the file and will close it now" 76 | - Why does the file system want to establish these process-file relationships?
77 | * Can improve performance if the OS knows what files are actively being used by using caching or read-ahead 78 | * File system may provide guarantees to processes based on this relationship e.g., exclusive access 79 | * Some file systems (particularly network file systems) don't even bother with establishing these relationships 80 | - UNIX semantics simplify reads and writes to files by storing the file position for processes 81 | * This is a convenience, not a requirement: processes could be required to provide a position with every read and write 82 | -------------------------------------------------------------------------------- /fundamentals/cse-421-intro-to-os/lecture-12-scheduling.md: -------------------------------------------------------------------------------- 1 | # Lecture 12 2 | 3 | ## Scheduling 4 | 5 | - Scheduling is the process of choosing the next thread (or threads) to run on the CPU (or CPUs) 6 | - Why schedule threads? 7 | * CPU multiplexing: we have more threads than cores to run them on 8 | * Kernel privilege: we are in charge of allocating the CPU and must try to make good decisions 9 | - When does scheduling occur? 10 | * When a thread voluntarily gives up the CPU by calling `yield()` 11 | * When a thread makes a blocking system call and must sleep until the call completes 12 | * When a thread exits 13 | * When the kernel decides that a thread has run for long enough 14 | + This is what makes a scheduling policy preemptive as opposed to cooperative 15 | + Kernel can preempt (i.e., stop) a thread that has not requested to be stopped 16 | - What is the rationale behind having a way for threads to voluntarily give up the CPU? 17 | * `yield()` can be a useful way of allowing a well-behaved thread to tell the CPU it has no more useful work to do 18 | * `yield()` is inherently cooperative i.e., "let me get out of the way so that another, more useful, thread can run" 19 | - How do I schedule threads? 20 | * Mechanism: how do we switch between threads?
21 | * Policy: how do we choose the next thread to run? 22 | - How do we switch between threads? 23 | * Perform a context switch and move threads between the ready, running and waiting queues 24 | 25 | ## Policy vs. Mechanism 26 | 27 | - Scheduling is an example of useful separation between policy and mechanism: 28 | * Policies 29 | + Deciding what thread to run 30 | + Giving preference to interactive tasks 31 | + Choosing a thread to run at random 32 | * Mechanisms: 33 | + Context switch 34 | + Maintaining the running, ready and waiting queues 35 | + Using timer interrupts to stop running threads 36 | 37 | ## Scheduling Matters 38 | 39 | - How the CPU is scheduled impacts every other part of the system 40 | * Using other system resources requires the CPU 41 | * Intelligent scheduling makes a modestly-powered system seem fast and responsive 42 | * Bad scheduling makes a powerful system seem sluggish and laggy 43 | - Responsiveness: when you give the computer an instruction and it responds in a timely manner 44 | * May not finish, but at least you know it started 45 | * Most of what we do with computers consists of responsive tasks 46 | * Examples: web browsing, editing, chatting 47 | - Continuity: when you ask the computer to perform a continuous task it does so smoothly 48 | * Implies active waiting: not interacting with computer but you are expecting it to continue to perform a task you have initiated 49 | * Examples: blinking cursor, playing music or a movie 50 | - Completion: when we ask the computer to perform a task (or it performs one on our behalf) that we expect to take a long time, we want it to complete eventually 51 | * Implies passive waiting: asking computer to continue to deliver interactive performance while working on your long-running task 52 | * Unlike responsive and continuous tasks, background tasks may not be user initiated 53 | * Examples: performing a system backup, indexing files on computer 54 | - Conflicting goals 55 | * Scheduling is a balance
between meeting deadlines and optimizing resource allocation 56 | + Optimal resource allocation: allocate tasks so that all resources are constantly in use 57 | + Meeting deadlines: drop everything and do a certain task 58 | * Responsiveness and continuity require meeting deadlines 59 | + Responsiveness has unpredictable deadlines e.g., when user moves the mouse, I need to be ready to redraw the cursor 60 | + Continuity has predictable deadlines e.g., every 5ms I need to write more data to the sound card buffer 61 | * Throughput requires careful resource allocation 62 | + Throughput requires optimal resource allocation e.g., I should really give the backup process more resources so that it can finish overnight 63 | - Deadlines win 64 | * Humans are sensitive to responsiveness and continuity 65 | * We don't notice resource allocation as much 66 | * Poor responsiveness or continuity wastes our time 67 | 68 | ## Scheduling Goals 69 | 70 | - How well does it meet deadlines (unpredictable or predictable) 71 | - How completely does it allocate system resources 72 | * No point having idle CPU, memory or disk bandwidth when something useful could be happening 73 | - On human-facing systems, typically deadlines win 74 | - For human-facing systems, if system doesn't meet deadlines it is typically just annoying e.g., buffering 75 | - For other classes of systems, failure to meet deadlines could be fatal -------------------------------------------------------------------------------- /fundamentals/cse-421-intro-to-os/lecture-2-processes.md: -------------------------------------------------------------------------------- 1 | # Lecture 2 2 | 3 | ## Operating System Abstractions 4 | 5 | - Abstractions simplify application design by: 6 | 1. Hiding undesirable properties 7 | 2. Adding new capabilities 8 | 3.
Organizing information 9 | - Abstractions provide an interface to application programmers that separates 10 | * Policy (what interface commits to accomplishing) from 11 | * Mechanism (how the interface is implemented) 12 | 13 | ## Example Abstraction: File 14 | 15 | - What undesirable properties do files hide: 16 | * Disks are slow 17 | * Chunks of storage are distributed over the disk 18 | * Disk storage may fail 19 | - What capabilities do files add: 20 | * Growth and shrinking 21 | * Organization into directories 22 | - What information do files help organize: 23 | * Ownership and permissions 24 | 25 | ## Preview of Coming Abstractions 26 | 27 | - Thread map to CPU 28 | - Address space map to memory 29 | - File map to disk 30 | 31 | ## The Process 32 | 33 | - Processes are the most fundamental OS abstractions 34 | - Unlike threads, address spaces and files, processes are not tied to a hardware component 35 | - Instead processes contain other abstractions: 36 | * One or more threads 37 | * Address space 38 | - OS is responsible for isolating processes from each other 39 | * What you do in your own process is your own business but it shouldn't be able to crash the machine or affect other processes 40 | * Therefore, safe intra-process communication is your problem 41 | * Safe inter-process communication is an OS problem 42 | - Intra-process communication 43 | * Communication between multiple threads in a process usually accomplished using shared memory 44 | * Threads within a process share open file handles and both static and dynamically-allocated global variables 45 | * Thread stacks and thus thread local variables are typically private 46 | * Sharing data requires synchronization mechanisms to ensure consistency 47 | - Inter-process communication 48 | * A variety of mechanisms exist to enable inter-process communication (IPC) 49 | + Shared files or sockets, exit codes, signals, pipes, shared memory 50 | * All require coordination between the communicating 
processes 51 | * Most have semantics limiting the degree to which processes can interfere with each other 52 | + A process can't just send a `SIGKILL` to any other process running on the machine 53 | - Return codes are an example of IPC 54 | * When a process calls `exit()` it returns its exit code to the parent process 55 | * In Bash run `echo $?` to get return code of previous command 56 | - Pipes 57 | * `ps aux | grep myprog` 58 | * Pipes create a producer-consumer buffer between two processes 59 | * Allows output from one process to be used as the input to another 60 | * OS manages a queue for each pipe to accommodate different input and output rates 61 | - Signals 62 | * `kill <pid>` 63 | * `kill -9 <pid>` sends `SIGKILL` which cannot be ignored by a process 64 | * `control-c` sends `SIGINT` which processes can catch or ignore 65 | * Signals are a limited form of asynchronous communication between processes 66 | * Processes can register a signal handler to run when a signal is received 67 | * Users can send signals to processes owned by them 68 | * Super-user can send a signal to any process 69 | * Processes can ignore most signals except `SIGKILL` (non-graceful termination) 70 | 71 | ## Processes vs. Threads 72 | 73 | - Note: We can describe both a process and a thread as running 74 | * Most apps are multi-threaded 75 | * A process requires multiple resources: CPU, memory, files, etc 76 | * A thread of execution abstracts CPU state 77 | - Processes contain threads and threads belong to a process 78 | * Only one exception: kernel may have threads of execution not associated with any user process 79 | * Note: Except the kernel process which is a process 80 | - A process is considered to be running when one or more of its threads are running 81 | 82 | ## Process Example: Firefox 83 | 84 | - Firefox has multiple threads. What are they doing?
85 | * Waiting for and processing interface events e.g., mouse clicks, keyboard input, etc 86 | * Redrawing the screen as necessary in response to user input, web page loading, etc 87 | * Loading web pages - usually multiple parts in parallel to speed things up 88 | - Firefox is using memory. For what? 89 | * `Firefox.exe` i.e., the executable code of Firefox itself 90 | * Shared libraries for web page parsing, security, etc 91 | * Stacks storing local variables for running threads 92 | * A heap storing dynamically-allocated memory 93 | - Firefox has files open. Why? 94 | * Configuration files 95 | * Fonts -------------------------------------------------------------------------------- /cramming/lfs/code/md5sums: -------------------------------------------------------------------------------- 1 | 007aabf1dbb550bcddde52a244cd1070 acl-2.2.53.tar.gz 2 | bc1e5cb5c96d99b24886f1f527d3bb3d attr-2.4.48.tar.gz 3 | 50f97f4159805e374639a73e2636f22e autoconf-2.69.tar.xz 4 | 53f38e7591fa57c3d2cee682be668e5b automake-1.16.1.tar.xz 5 | 2b44b47b905be16f45709648f671820b bash-5.0.tar.gz 6 | 6582c6fbbae943fbfb8fe14a34feab57 bc-2.5.3.tar.gz 7 | 664ec3a2df7805ed3464639aaae332d6 binutils-2.34.tar.xz 8 | 49fc2cf23e31e697d5072835e1662a97 bison-3.5.2.tar.xz 9 | 67e051268d0c475ea773822f7500d0e5 bzip2-1.0.8.tar.gz 10 | 270e82a445be6026040267a5e11cc94b check-0.14.0.tar.gz 11 | 0009a224d8e288e8ec406ef0161f9293 coreutils-8.31.tar.xz 12 | e1b07516533f351b3aba3423fafeffd6 dejagnu-1.6.2.tar.gz 13 | 4824adc0e95dbbf11dfbdfaad6a1e461 diffutils-3.7.tar.xz 14 | 6d35428e4ce960cb7e875afe5849c0f3 e2fsprogs-1.45.5.tar.gz 15 | 5480d0b7174446aba13a6adde107287f elfutils-0.178.tar.bz2 16 | dedfb1964f6098fe9320de827957331f eudev-3.2.9.tar.gz 17 | d2384fa607223447e713e1b9bd272376 expat-2.2.9.tar.xz 18 | 00fce8de158422f5ccd2666512329bd2 expect5.45.4.tar.gz 19 | 3217633ed09c7cd35ed8d04191675574 file-5.38.tar.gz 20 | 731356dec4b1109b812fecfddfead6b2 findutils-4.7.0.tar.xz 21 | 2882e3179748cc9f9c23ec593d6adc8d 
flex-2.6.4.tar.gz 22 | f9db3f6715207c6f13719713abc9c707 gawk-5.0.1.tar.xz 23 | 3818ad8600447f05349098232c2ddc78 gcc-9.2.0.tar.xz 24 | 988dc82182121c7570e0cb8b4fcd5415 gdbm-1.18.1.tar.gz 25 | 9ed9e26ab613b668e0026222a9c23639 gettext-0.20.1.tar.xz 26 | 78a720f17412f3c3282be5a6f3363ec6 glibc-2.31.tar.xz 27 | a325e3f09e6d91e62101e59f9bda3ec1 gmp-6.2.0.tar.xz 28 | 9e251c0a618ad0824b51117d5d9db87e gperf-3.1.tar.gz 29 | 111b117d22d6a7d049d6ae7505e9c4d2 grep-3.4.tar.xz 30 | 08fb04335e2f5e73f23ea4c3adbf0c5f groff-1.22.4.tar.gz 31 | 5aaca6713b47ca2456d8324a58755ac7 grub-2.04.tar.xz 32 | 691b1221694c3394f1c537df4eee39d3 gzip-1.10.tar.xz 33 | 3ba3afb1d1b261383d247f46cb135ee8 iana-etc-2.30.tar.bz2 34 | 87fef1fa3f603aef11c41dcc097af75e inetutils-1.9.4.tar.xz 35 | 12e517cac2b57a0121cda351570f1e63 intltool-0.51.0.tar.gz 36 | ee8e2cdb416d4a8ef39525d39ab7c2d0 iproute2-5.5.0.tar.xz 37 | d1d7ae0b5fb875dc082731e09cd0c8bc kbd-2.2.0.tar.xz 38 | 1129c243199bdd7db01b55a61aa19601 kmod-26.tar.xz 39 | 4ad4408b06d7a6626a055cb453f36819 less-551.tar.gz 40 | e9249541960df505e4dfac0c32369372 lfs-bootscripts-20191031.tar.xz 41 | 52120c05dc797b01f5a7ae70f4335e96 libcap-2.31.tar.xz 42 | 6313289e32f1d38a9df4770b014a2ca7 libffi-3.3.tar.gz 43 | 169de4cc1f6f7f7d430a5bed858b2fd3 libpipeline-1.5.2.tar.gz 44 | 1bfb9b923f2c1339b4d2ce1807064aa5 libtool-2.4.6.tar.xz 45 | 3ea50025d8c679a327cf2fc225d81a46 linux-5.5.3.tar.xz 46 | 730bb15d96fffe47e148d1e09235af82 m4-1.4.18.tar.xz 47 | fc7a67ea86ace13195b0bce683fd4469 make-4.3.tar.gz 48 | 897576a19ecbef376a916485608cd790 man-db-2.9.0.tar.xz 49 | da25a4f8dfed0a34453c90153b98752d man-pages-5.05.tar.xz 50 | 9bf73f7b5a2426a7c8674a809bb8cae2 meson-0.53.1.tar.gz 51 | 4125404e41e482ec68282a2e687f6c73 mpc-1.1.0.tar.gz 52 | 320fbc4463d4c8cb1e566929d8adc4f8 mpfr-4.0.2.tar.xz 53 | cf1d964113a171da42a8940e7607e71a ninja-1.10.0.tar.gz 54 | e812da327b1c2214ac1aed440ea3ae8d ncurses-6.2.tar.gz 55 | 3be209000dbc7e1b95bcdf47980a3baa openssl-1.1.1d.tar.gz 56 | 
78ad9937e4caadcba1526ef1853730d5 patch-2.7.6.tar.xz 57 | f399f3aaee90ddcff5eadd3bccdaacc0 perl-5.30.1.tar.xz 58 | f6e931e319531b736fadc017f470e68a pkg-config-0.29.2.tar.gz 59 | 2b0717a7cb474b3d6dfdeedfbad2eccc procps-ng-3.3.15.tar.xz 60 | 0524258861f00be1a02d27d39d8e5e62 psmisc-23.2.tar.xz 61 | b3fb85fd479c0bf950c626ef80cacb57 Python-3.8.1.tar.xz 62 | edc8c97f9680373fcc1dd952f0ea7fcc python-3.8.1-docs-html.tar.bz2 63 | 7e6c1f16aee3244a69aba6e438295ca3 readline-8.0.tar.gz 64 | 6d906edfdb3202304059233f51f9a71d sed-4.8.tar.xz 65 | 4b05eff8a427cf50e615bda324b5bc45 shadow-4.8.1.tar.xz 66 | c70599ab0d037fde724f7210c2c8d7f8 sysklogd-1.5.1.tar.gz 67 | 48cebffebf2a96ab09bec14bf9976016 sysvinit-2.96.tar.xz 68 | 83e38700a80a26e30b2df054e69956e5 tar-1.32.tar.xz 69 | 97c55573f8520bcab74e21bfd8d0aadc tcl8.6.10-src.tar.gz 70 | d4c5d8cc84438c5993ec5163a59522a6 texinfo-6.7.tar.xz 71 | f6987e6dfdb2eb83a1b5076a50b80894 tzdata2019c.tar.gz 72 | 27cd82f9a61422e186b9d6759ddf1634 udev-lfs-20171102.tar.xz 73 | 7f64882f631225f0295ca05080cee1bf util-linux-2.35.1.tar.xz 74 | f5337b1170df90e644a636539a0313a3 vim-8.2.0190.tar.gz 75 | 80bb18a8e6240fcf7ec2f7b57601c170 XML-Parser-2.46.tar.gz 76 | 003e4d0b1b1899fc6e3000b24feddf7c xz-5.2.4.tar.xz 77 | 85adef240c5f370b308da8c938951a68 zlib-1.2.11.tar.xz 78 | 487f7ee1562dee7c1c8adf85e2a63df9 zstd-1.4.4.tar.gz 79 | c1545da2ad7d78574b52c465ec077ed9 bash-5.0-upstream_fixes-1.patch 80 | 6a5ac7e89b791aae556de0f745916f7f bzip2-1.0.8-install_docs-1.patch 81 | a9404fb575dfd5514f3c8f4120f9ca7d coreutils-8.31-i18n-1.patch 82 | 9a5997c3452909b1769918c759eff8a2 glibc-2.31-fhs-1.patch 83 | f75cca16a38da6caa7d52151f7136895 kbd-2.2.0-backspace-1.patch 84 | 4900322141d493e74020c9cf437b2cdc sysvinit-2.96-consolidated-1.patch -------------------------------------------------------------------------------- /fundamentals/cse-421-intro-to-os/lecture-21-disks.md: -------------------------------------------------------------------------------- 1 | # Lecture 21 2 | 3 | ## 
SSDs vs HDDs 4 | 5 | - Some important technology 6 | * Stable storage: storage that does not lose its contents when the computer is turned off 7 | - Today we have two main categories of stable storage with very different characteristics 8 | 1. HDD, spinning disk or hard drive: stable storage device constructed of rotating magnetic platters 9 | 2. SSD or flash drive: stable storage constructed of non-moving non-volatile memory 10 | - HDDs are bigger, slower and cheaper than SSDs 11 | - HDDs and SSDs lead to very different system designs 12 | 13 | ## Why Study Spinning Disks? 14 | 15 | - Flash is the future but HDDs are still around 16 | - Hierarchical file systems are dead, long live search 17 | - Local search is still built on top of hierarchical file systems today 18 | - New storage technologies will completely alter the way that OSes store data but new solutions will benefit from and probably resemble earlier efforts 19 | 20 | ## Disk Parts 21 | 22 | - Platter: 23 | * Circular flat disk on which magnetic data is stored 24 | * Constructed of a rigid non-magnetic material coated with a very thin layer of magnetic material 25 | * Can have data written on both sides 26 | - Spindle 27 | * Drive shaft on which multiple platters are mounted and spun between 4200 and 15000 RPM 28 | - Head 29 | * Actuator that reads and writes data onto the magnetic surface of the platters while flying tens of nanometers above the platter surface 30 | 31 | ## Disk Locations 32 | 33 | - Track 34 | * Think of a lane on a race track running around the platter 35 | - Sector 36 | * Resembles a slice of pie cut out of a single platter 37 | - Cylinder 38 | * Imagine the intersection between a cylinder and the set of platters 39 | * Composed of a set of vertically-aligned tracks on all platters 40 | 41 | ## Spinning Disks are Different 42 | 43 | - Spinning disks are fundamentally different from other system components we have discussed so far 44 | - Difference in kind: disks move 45 | - Difference
in degree: disks are slow 46 | - Difference in integration: disks are devices, and less tightly coupled to the abstraction built on top of them 47 | 48 | ## Disks Move, Ergo Disks are Slow 49 | 50 | - Electronics time scale: time for electron to flow from one part of the computer or chip to another (fast) 51 | - Mechanics time scale: time necessary for a physical object to move from one point to another (comparatively slow) 52 | 53 | ## Disks Move, Ergo Disks Fail 54 | 55 | - Disks can fail by parts 56 | * Many disks ship with sectors already broken; the OS detects and ignores these sectors 57 | * Sectors may fail over time, potentially resulting in data loss 58 | - Disks can fail catastrophically 59 | * Head crash occurs when a jolt sends the disk heads crashing into the magnetic surface, scraping off material and destroying the platter 60 | * When this happens, you have about 20 seconds to say goodbye 61 | 62 | ## Head Crash 63 | 64 | - We will discuss RAID, a clever way to use multiple disks to create a more reliable and better-performing device that looks like a single disk 65 | - Many interesting approaches to fault tolerance began with people thinking about spinning disks 66 | 67 | ## Disks are Slow 68 | 69 | - Disks frequently bottleneck other parts of the OS 70 | - OS plays some of our usual games with disks to try and hide disk latency 71 | * Use past to predict the future 72 | * Use a cache 73 | * Procrastination 74 | - Contrast with memory latencies which are hidden transparently by the processor 75 | - Here OS software is directly involved 76 | 77 | ## Source of Slowness 78 | 79 | - Reading or writing from the disk requires a series of steps, each of which is a potential source of latency 80 | 1. Issue the command: OS has to tell device what to do 81 | * Command has to cross the device interconnect and the drive has to select which head to use 82 | 2. Seek time: drive has to move the head to appropriate track 83 | 3.
Settle time: heads have to stabilize on the very narrow track 84 | 4. Rotation time: platters have to rotate to the position where the data is stored 85 | 5. Transfer time: Data has to be read and transmitted back across the interconnect into system memory 86 | 87 | ## What Improves? 88 | 89 | - Interconnect speeds: seem to be increasing e.g., SATA-6 90 | - Seek times: not improving rapidly (the moving-physical-objects part) 91 | - Rotation speeds: vary between devices but may not be primary source of latency anyway (physical limitations come into play) 92 | 93 | ## The Already-Came I/O Crisis 94 | 95 | - Two factors collide: 96 | 1. Hard drive densities and capacities soar, encouraging users to save more stuff and increasing I/O demand 97 | 2. Seek times limit the ability of disks to keep up 98 | - Three orders of magnitude increase in capacity between 1991 and 2006, but only two in speed 99 | -------------------------------------------------------------------------------- /fundamentals/grok-system-design/Lectures/8-Sql-NoSql.md: 1 | # SQL vs NoSQL 2 | 3 | - Two main DB solutions: SQL (relational) and NoSQL (non-relational) 4 | - Relational databases are structured with pre-defined schemas 5 | - Non-relational databases are unstructured, distributed and have a dynamic schema 6 | 7 | ## SQL 8 | 9 | - Relational databases store data in rows and columns 10 | - Each row contains all info about one entity 11 | - Each column contains a separate data point 12 | - Examples: MySQL, Postgres, MariaDB 13 | 14 | ## NoSQL 15 | 16 | - Most common types of NoSQL: 17 | 1. Key-value stores 18 | * Data is stored in an array of key-value pairs 19 | * Examples: Redis, DynamoDB 20 | 2. Document databases 21 | * Data stored in documents 22 | * Documents grouped together into collections 23 | * Each document can have an entirely different structure 24 | * Examples: MongoDB 25 | 3.
Wide-column databases 26 | * Columnar databases have column families which are containers for rows 27 | * Don't need to know all the columns upfront 28 | * Each row doesn't have to have the same number of columns 29 | * Best suited for analyzing large datasets 30 | * Examples: Cassandra, HBase 31 | 4. Graph databases 32 | * Used to store data whose relations are best represented in a graph 33 | * Data saved in graph structure with nodes (entities), properties (info about entities) and lines (connections between entities) 34 | * Examples: Neo4J 35 | 36 | ## High Level Differences Between SQL and NoSQL 37 | 38 | - Storage 39 | * SQL stores data in tables where each row represents an entity and each column represents a data point about that entity 40 | * NoSQL databases have different storage models e.g., key-value, document, graph and columnar 41 | - Schema 42 | * In SQL, each record conforms to a fixed schema i.e., columns must be decided before data entry 43 | + Schema can be modified later but requires modifying entire database and going offline 44 | * In NoSQL, schemas are dynamic 45 | + Columns added on the fly and each row doesn't have to contain data for each column 46 | - Querying 47 | * SQL databases use SQL for defining and manipulating the data which is very powerful 48 | * In NoSQL, queries are focused on a collection of documents 49 | + Different databases have different syntax 50 | - Scalability 51 | * SQL databases are vertically scalable i.e., by increasing memory, CPU, etc 52 | + Possible to scale a relational database across multiple servers, but challenging and time-consuming 53 | * NoSQL are horizontally scalable 54 | - Reliability or ACID (atomicity, consistency, isolation, durability) compliance 55 | * Vast majority of relational databases are ACID compliant 56 | + When it comes to data reliability and a safe guarantee of performing transactions, SQL is the best bet 57 | * Most NoSQL solutions sacrifice ACID compliance for performance and scalability 58
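The atomicity half of the ACID trade-off above can be seen in miniature with any relational database. Below is a small sketch using Python's built-in `sqlite3` module as a stand-in relational store (the `accounts` table, names and amounts are invented for illustration): either both halves of a transfer commit, or neither does.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 0)")
conn.commit()

try:
    # Using the connection as a context manager wraps a transaction:
    # commit on success, rollback if an exception escapes the block
    with conn:
        conn.execute("UPDATE accounts SET balance = balance - 50 WHERE name = 'alice'")
        raise RuntimeError("simulated crash before bob is credited")
except RuntimeError:
    pass

# The debit was rolled back with the rest of the transaction, so the
# database never exposes the half-finished (inconsistent) transfer
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(balances)  # {'alice': 100, 'bob': 0}
```

A store without multi-record transactions would leave repairing such a half-finished transfer to the application, which is exactly the cost hidden in "sacrifice ACID compliance for performance and scalability".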
| 59 | ## SQL vs NoSQL - Which One to Use? 60 | 61 | - Reasons to use SQL database 62 | 1. Need to ensure ACID compliance 63 | * ACID compliance reduces anomalies and protects integrity of database by prescribing exactly how transactions interact with the database 64 | 2. Data is structured and unchanging 65 | * If business is not experiencing massive growth that would require more servers and data is consistent, there may be no reason to use a system designed to support a variety of data types and high traffic volume 66 | - Reasons to use NoSQL database 67 | 1. Storing large volumes of data that often have little to no structure 68 | * NoSQL database sets no limits on the types of data we can store together and allows us to add new types as the need changes 69 | * With document-based databases, you can store data in one place without having to define what types of data those are in advance 70 | 2. Making the most of cloud computing and storage 71 | * Cloud-based storage is excellent cost-saving solution but requires data to be easily spread across multiple servers to scale up 72 | * NoSQL databases designed to be scaled across multiple datacenters 73 | 3. 
Rapid development 74 | * NoSQL useful for rapid development as it doesn't need to be prepped ahead of time 75 | * Require frequent updates to data structure without much downtime between versions 76 | 77 | ## Aside: ACID 78 | 79 | - Set of properties that guarantee database transactions are processed reliably 80 | - Atomicity 81 | * Guarantee that either all of the transaction succeeds or none of it does 82 | * All or nothing 83 | - Consistency 84 | * Guarantee that all data will be consistent 85 | - Isolation 86 | * Guarantee that all transactions occur in isolation i.e., no transaction will be affected by any other transaction 87 | * A transaction cannot read data from another transaction that has not yet completed 88 | - Durability 89 | * Once a transaction is committed it will remain in the system -------------------------------------------------------------------------------- /fundamentals/pucit-systems-programming/lecture-8-exit-handlers.md: 1 | # Lecture 8 2 | 3 | ## How a C Program Starts and Terminates 4 | 5 | 1. User executes binary on terminal 6 | 2. `fork()` creates a copy of the parent 7 | 3. Child process calls `exec()` to load binary program into address space of that process 8 | 4. C startup routines: 9 | * Various dynamic libraries (possibly already shared with other processes) are mapped into the memory map of this process 10 | * Set the command line arguments, environment variables in the process stack 11 | 5. Main function is called 12 | * Calls other user defined functions which return back to it 13 | 6. Finally either: 14 | * Main or user function calls exit function call 15 | 7. Exit function call performs the following before terminating the process: 16 | * Calls standard I/O cleanup routine which flushes I/O buffer and closes open files 17 | * Calls exit handlers 18 | + User supplied functions 19 | 8.
Process terminates and returns exit code to parent 20 | * Note: if program uses `_exit()` to exit the program then the exit handlers (`atexit()`) are not called 21 | 22 | ## Example 23 | 24 | - `cat prog1.c` 25 | ``` 26 | #include <stdio.h> 27 | void display(char* msg){ 28 | printf("%s\n",msg); 29 | } 30 | int main(){ 31 | display("Learning is fun with Arif"); 32 | return 54; 33 | } 34 | ``` 35 | - `gcc prog1.c` 36 | - `./a.out` to execute 37 | - `strace ./a.out` to monitor system calls made by a process and the signals sent to it 38 | ``` 39 | execve("./a.out", ["./a.out"], [/* 45 vars */]) = 0 # Exec call with 0 as the return value 40 | ... other system calls 41 | brk(0x55555777000) = 0x55555777000 # Sets the heap 42 | write(1, "Learning is fun with Arif\n", 26Learning is fun with Arif) = 26 # Converts the printf library call to the write system call (file descriptor 1 is stdout) 43 | exit(54) 44 | ``` 45 | - Memory-mapping calls map the different shared objects into this process's address space 46 | - `echo $?` will show `54` 47 | 48 | ## Exit Function 49 | 50 | - Sample program that uses `atexit` exit handler: 51 | ``` 52 | #include <stdio.h> // for perror function 53 | #include <stdlib.h> // for exit and atexit functions 54 | #include <unistd.h> // for _exit system call 55 | 56 | void exit_handler(){ 57 | printf("Exit handler\n"); 58 | } 59 | int main(){ 60 | atexit(exit_handler); 61 | printf("Main is done!\n"); 62 | return 0; // or exit(0); 63 | } 64 | ``` 65 | - `gcc atexit_ex1.c` 66 | - `./a.out` 67 | ``` 68 | Main is done!
69 | Exit handler 70 | ``` 71 | - For more than one exit handler, they are executed in the reverse of the order in which they were registered with `atexit()` 72 | 73 | ## How a C Program Terminates 74 | 75 | - Normal termination 76 | * Main function's `return` statement 77 | * Any function calling `exit()` library call 78 | * Any function calling `_exit()` system call 79 | - Abnormal termination 80 | * Calling `abort()` function 81 | * Terminated by a signal 82 | 83 | ## Limitations of atexit() 84 | 85 | - An exit handler doesn't know what exit status was passed to `exit()` 86 | * Knowing the status may be useful e.g., we may like to perform different actions depending on whether the process is exiting successfully 87 | - We can't specify an argument to the exit handler when it is called 88 | 89 | ## Library Call on_exit() 90 | 91 | - `int on_exit(void (*func)(int, void *), void *arg)` 92 | - `on_exit()` is also used to register exit handlers like `atexit()` but is more powerful 93 | - Accepts two arguments: a function pointer and a void pointer 94 | - `func` is a function pointer that is passed two arguments (an integer and a `void*`) 95 | - First argument to `func` is the integer value passed to `exit()` 96 | - Second argument is the second argument to `on_exit()` 97 | 98 | ## Process Resource Limits 99 | 100 | - Every process has a set of resource limits that can be used to restrict the amounts of various system resources that the process may consume 101 | - We can set the resource limits of the shell using the `ulimit` built-in command 102 | * These limits are inherited by the processes that the shell creates to execute user commands 103 | - Since kernel 2.6.24, the Linux-specific `/proc/PID/limits` file can be used to view all of the resource limits of any process 104 | - Example: `cat /proc/2/limits` 105 | ``` 106 | Limit Soft Limit Hard Limit Units 107 | Max cpu time unlimited unlimited seconds # Max cpu time in seconds that can be used by a process 108 | Max file size unlimited unlimited bytes 109 | Max
data size unlimited unlimited bytes 110 | Max stack size 8388608 unlimited bytes 111 | Max processes 12017 12017 processes # Max number of child processes a parent can create 112 | Max open files 1024 1024 files # Max number of files a process can open at one time 113 | Max address space unlimited unlimited bytes 114 | Max nice priority 0 0 115 | etc 116 | ``` 117 | - Cannot increase soft limit above hard limit 118 | - `ulimit -a` shows default limits -------------------------------------------------------------------------------- /fundamentals/cse-421-intro-to-os/lecture-19-swapping.md: 1 | # Lecture 19 2 | 3 | ## Out of Core 4 | 5 | - When we run out of memory there are two options: 6 | 1. Fail: refuse to load a program (`exec()`), to create a new process (`fork()`), or to allocate more memory (`sbrk()`), or kill the process if it is trying to allocate more stack 7 | 2. Create more space, preserving contents of memory for future use 8 | - Virtual address translation gives the kernel the ability to remove memory from a process behind its back 9 | * Requirements for doing this: 10 | * Last time the process used the virtual address it behaved like memory 11 | * Next time the process uses the virtual address it behaves like memory 12 | * In between, whatever data was stored at that address must be preserved 13 | 14 | ## Swapping 15 | 16 | - The place the OS typically puts data stored in memory, in order to borrow that memory from the process, is on disk 17 | - We call the process of moving data back and forth from memory to disk in order to improve memory usage swapping 18 | - Goal: when swapping is done well, your system feels like it has memory that is as large as the disk but as fast as actual RAM 19 | - Unfortunately, when swapping is not done well, your system feels like it has memory that is as small as RAM and as slow as disk 20 | 21 | ## TLB vs.
Page Faults 22 | 23 | - We distinguish between two kinds of memory-related fault: 24 | 1. TLB fault: required virtual to physical address translation not in TLB 25 | 2. Page fault: contents of a virtual page are either not initialized or not in memory 26 | - Every page fault is preceded by a TLB fault 27 | * If the contents of the virtual page are not in memory, a translation cannot exist for it 28 | - Not every TLB fault generates a page fault 29 | * If page is in memory and the translation is in the page table, the TLB fault can be handled without generating a page fault 30 | 31 | ## Swap Out 32 | 33 | - To swap a page to disk we must: 34 | 1. Remove the translation from the TLB, if it exists 35 | 2. Copy the contents of the page to disk 36 | 3. Update page table entry to indicate that the page is on disk 37 | - Note: If process tries to access same virtual address, TLB will not have it so process will sleep until swap in is done 38 | 39 | ## Swap Out Speed 40 | 41 | - Remove translation from TLB: fast 42 | - Copy contents of page to disk: slow 43 | - Update the page table entry to indicate that page is on disk: fast 44 | 45 | ## Page Cleaning 46 | 47 | - Frequently when we are swapping out a page it is in order to allocate new memory to a running process or possibly to swap in a page 48 | - So it would be great if swapping out were fast 49 | - Can we prepare the system to optimize swap out? Yes 50 | * Each page has a dedicated place on disk 51 | * During idle periods, OS writes data from active memory pages to swap disk 52 | * Pages with matching content on the swap disk are called clean 53 | * Pages that do not match their swap disk content are called dirty 54 | 55 | ## Swap In 56 | 57 | - When must we swap in a page? 58 | * When the virtual address is used by the process 59 | - To translate a virtual address used by a process that points to a page that has been swapped out, we must: 60 | 1.
Stop the instruction that is trying to translate the address until we can retrieve the contents 61 | 2. Allocate a page in memory to hold the new page contents 62 | 3. Locate the page on disk using the page table entry 63 | 4. Copy the contents of the page from disk 64 | 5. Update the page table entry to indicate that the page is in memory 65 | 6. Load the TLB 66 | 7. Restart the instruction that was addressing the virtual address we retrieved 67 | - Note: thrashing refers to programs continuously triggering page faults, swapping pages in and out, until the OOM killer starts killing processes 68 | 69 | ## On-Demand Paging 70 | 71 | - Sometimes procrastination is useful, particularly when you end up never having to do the thing you're being asked to do 72 | - Process: kernel, load this huge chunk of code into my address space 73 | - Kernel: I am busy, but I will make a note of it 74 | - Process: kernel, give me 4MB more heap 75 | - Kernel: request is granted, but come back when you really need it 76 | - Until an instruction on a code page is executed or a read or write occurs to a data or heap page, the kernel does not load the contents of that page into memory 77 | - Why not? 78 | * A lot of code is never executed and some global variables are never used. Why waste memory? 79 | 80 | ## Demanded Paging 81 | 82 | - What happens the first time a process executes an instruction from a new code page? 83 | * That page's contents are loaded from disk and the instruction is started 84 | - What happens the first time a process does a load or store to an uninitialized heap, stack or data page?
85 | * The kernel allocates a new page filled with zeros and the instruction is restarted 86 | 87 | ## Aside: Hardware-Managed TLBs 88 | 89 | - On certain architectures, the MMU will search the page table itself to locate virtual-to-physical address translations missing from the TLB 90 | * Pro: Hardware is faster 91 | * Con: OS must set up page tables in a fixed way that the hardware understands 92 | - With hardware-managed TLB, kernel never sees TLB faults (handled in the hardware) 93 | -------------------------------------------------------------------------------- /fundamentals/cse-421-intro-to-os/lecture-23-intro-to-filesystems.md: 1 | # Lecture 23 2 | 3 | ## UNIX File Interface 4 | 5 | - Establishing relationships: 6 | * `open("foo")`: I would like to use the file named `foo` 7 | * `close("foo")`: I am finished with `foo` 8 | - Reading and writing: 9 | * `read(2)`: I would like to perform a read from file handle `2` at the current position 10 | * `write(2)`: I would like to perform a write to file handle `2` at the current position 11 | - Positioning: 12 | * `lseek(2, 100)`: Please move my saved position for file handle `2` to position `100` 13 | 14 | ## Files Together: File Organization 15 | 16 | - Each file has to have a unique name 17 | - Flat name spaces were actually used by some early file systems but file naming got bad fast: 18 | * `letterToMom.txt` 19 | * `letterToSuzanna.txt` 20 | * `AnotherLetterToSuzanna.txt` 21 | * etc 22 | 23 | ## Hierarchical Implications 24 | 25 | - Don't look at everything all at once, allow users to store and examine related files together: 26 | * `letters/Mom/Letter.txt` 27 | * `letters/Suzanna/Letters/1.txt` 28 | * `letters/Suzanna/Letters/2.txt` 29 | * etc 30 | - Each file should be stored in one place 31 | 32 | ## Location Implications 33 | 34 | - Location requires navigation and relative navigation is useful, meaning that locations
(directories) can include pointers to other locations (other directories) 35 | - Finally location is only meaningful if it is tied to a file's name so hierarchical file systems implement name spaces, which require that a file's name map to a single unique location within the file system 36 | 37 | ## Why Trees 38 | 39 | - File systems usually require that files be organized into an acyclic graph with a single root also known as a tree 40 | - Why? 41 | * Trees produce a single canonical name for each file on the system as well as an infinite number of relative names 42 | + Canonical name: `/you/used/to/love/well` 43 | + Relative name: `/you/used/to/love/me/../well` 44 | 45 | ## File System Design Goals 46 | 47 | 1. Efficiently translate file names to file contents 48 | 2. Allow files to move, grow, shrink, and otherwise change 49 | 3. Optimize access to single files 50 | 4. Optimize access to multiple files, particularly related files 51 | 5. Survive failures and maintain a consistent view of file names and contents 52 | 53 | ## Three of These Things Are All Like Each Other 54 | 55 | - The file systems we will discuss all support the following features: 56 | * Files, including some number of file attributes and permissions 57 | * Names, organized into a hierarchical name space 58 | - This is the file interface and feature set we are all used to 59 | - The differences lie in the implementation and what happens on disk 60 | 61 | ## Implementing Hierarchical File Systems 62 | 63 | - Broadly speaking, two types of disk blocks: 64 | 1. Data blocks: contain file data 65 | 2.
Index nodes (inodes): contain metadata, not file data 66 | - What makes file systems different: 67 | * On-disk layout: how does the file system decide where to put data and metadata blocks in order to optimize file access 68 | * Data structures: what data structures does the filesystem use to translate names and locate file data 69 | * Crash recovery: how does the file system prepare for and recover from crashes 70 | 71 | ## File System Challenges 72 | 73 | - File systems are really maintaining a large and complex data structure using disk blocks as storage 74 | - This is hard because making changes potentially requires updating many different structures 75 | 76 | ## Example write 77 | 78 | - Say a process wants to `write` data to the end of a file. What does a file system have to do? 79 | 1. Find empty disk blocks to use and mark them as in use 80 | 2. Associate those blocks with the file that is being written to 81 | 3. Adjust the size of the file that is being written to 82 | 4. Actually copy the data to the disk blocks being used 83 | - From the perspective of a process all of these things need to happen synchronously 84 | - In reality, many different asynchronous operations are involved touching many different disk blocks 85 | - This creates both a consistency and a performance problem 86 | 87 | ## What Happens on Disk?
88 | 89 | - Let's consider the on-disk structures used by modern file systems 90 | - Specifically we are going to investigate how file systems: 91 | * Translate paths to file index nodes or inodes 92 | * Find data blocks associated with a given inode (file) 93 | * Allocate and free inodes and data blocks 94 | - We will try to keep this high level but the examples used are drawn from the `ext4` file system 95 | 96 | ## Sectors, Blocks and Extents 97 | 98 | - Sector: smallest unit that the disk allows to be written (typically 512 bytes) 99 | - Block: smallest unit that the filesystem actually writes (usually 4k bytes) 100 | - Extent: set of contiguous blocks used to hold part of a file (described by start and end block) 101 | - Why would file systems not write chunks smaller than 4k? 102 | * Because contiguous writes are good for disk head scheduling and 4k is the page size which affects in-memory file caching 103 | - Why would file systems want to write file data in even larger chunks? 104 | * Because contiguous writes are good for disk head scheduling and many files are larger than 4k 105 | -------------------------------------------------------------------------------- /fundamentals/pucit-systems-programming/lecture-29-shared-memory.md: 1 | # Lecture 29 2 | 3 | ## Introduction to Shared Memory 4 | 5 | - Shared memory allows 2+ processes to share a memory region or segment of memory for reading and writing purposes 6 | - Problem with pipes, fifo and message queue is that mode switches are involved as the data has to pass from one process buffer to the kernel buffer and then to another process buffer 7 | - Since access to user-space memory does not require a mode switch, shared memory is considered one of the quickest means of IPC 8 | 9 | ## APIs to shared memory 10 | 11 | - System-V API 12 | * Header file: `sys/shm.h` 13 | * Data structure: `shmid_ds` 14 | * Create/open: `shmget()`,
`shmat()` 15 | * Close: `shmdt()` 16 | * Perform IPC: Access memory 17 | * Control options: `shmctl()` 18 | 19 | ## Creating/Opening Shared Memory Segment 20 | 21 | - `int shmget(key_t key, size_t size, int shmflg);` 22 | - `shmget()` system call creates a new shared memory segment or obtains the identifier of an existing segment 23 | * Contents of a newly created shared memory segment are initialized to `0` 24 | * Return value is the ID of the shared memory segment 25 | - First argument `key` can be the constant `IPC_PRIVATE` or can be obtained using the `ftok()` library call 26 | - Second argument `size` specifies the desired size of the segment in bytes 27 | * Kernel rounds it up to the next multiple of the system page size 28 | * If we use `shmget()` to obtain the identifier of an existing segment then size has no effect on the segment 29 | - `shmflg` argument specifies permissions to be placed on a new shared memory segment or checked against an existing segment 30 | * In addition, it can be a bitwise OR of constants like `IPC_CREAT` and `IPC_EXCL` 31 | 32 | ## Using shared memory segment 33 | 34 | - `shmat()` system call attaches the shared memory segment identified by `shmid` to the address space of the calling process 35 | - Second argument `shmaddr` is the address where the memory segment will be attached 36 | * If we want the OS kernel to select a suitable address, keep it NULL 37 | - Third argument `shmflg` can be `SHM_RDONLY` to attach the shared memory segment for read-only access 38 | * We can place a zero there to give both read and write access 39 | - On success `shmat()` returns the address at which the shared memory segment is attached, which can be treated like a normal C pointer 40 | * We can assign the return value from `shmat()` to a pointer of some intrinsic data type or a programmer-defined structure 41 | 42 | ## Detaching shared memory segment 43 | 44 | - When a process no longer needs to access a shared memory segment, it can call `shmdt()` to detach the segment
from its address space 45 | - The only argument `shmaddr` identifies the segment to be detached 46 | * It should be a value returned by a previous call to `shmat()` 47 | - Detaching a shared memory segment is not the same as deleting it 48 | * Deleting can be performed using `shmctl()` 49 | - Child created by `fork()` inherits parent's attached shared memory segments 50 | * Thus shared memory provides an easy method of IPC between parent and child 51 | * However, after an `exec()` all attached shared memory segments are detached 52 | - Shared memory segments are also automatically detached on process termination 53 | 54 | ## Delete shared memory segment 55 | 56 | - `shmctl()` system call is used to perform control operations on the shared memory segment specified in its first argument `shmid` 57 | - One of the basic control operations is deletion of the shared memory segment 58 | * This can be done by giving `IPC_RMID` as the `cmd` in the second argument 59 | * This will destroy the memory segment after the last process detaches it 60 | - For the deletion operation the third argument is kept NULL 61 | 62 | ## Example 63 | 64 | - `cat /proc/sys/kernel/shmmni` to see the maximum number of shared memory segments 65 | - `cat /proc/sys/kernel/shmmax` to see the maximum size of a shared memory segment 66 | - `ipcs -m` to view shared memory segments 67 | - Example of writer (header names below were lost in rendering and restored from the calls used): 68 | ``` 69 | #include <stdio.h> 70 | #include <string.h> 71 | #include <sys/ipc.h> 72 | #include <sys/shm.h> 73 | int main(){ 74 | // ftok to generate unique key 75 | key_t key = ftok("f1.txt", 65); 76 | // shmget returns an identifier in shmid 77 | int shmid = shmget(key, 1024, 0666|IPC_CREAT); 78 | // shmat to attach to shared memory 79 | char *buffer = (char*)shmat(shmid, NULL, 0); 80 | printf("Please enter a string to be written in shared memory:\n"); 81 | fgets(buffer, 512, stdin); 82 | printf("\nData has been written in shared memory.
Bye\n"); 83 | //detach from shared memory 84 | shmdt(buffer); 85 | return 0; 86 | } 87 | ``` 88 | - Example of reader (header names restored as above): 89 | ``` 90 | #include <stdio.h> 91 | #include <string.h> 92 | #include <sys/ipc.h> 93 | #include <sys/shm.h> 94 | int main(){ 95 | // ftok to generate unique key 96 | key_t key = ftok("f1.txt", 65); 97 | // shmget returns an identifier of existing shared memory 98 | int shmid = shmget(key, 1024, 0666|IPC_CREAT); 99 | // shmat to attach to shared memory 100 | char *buffer = (char*)shmat(shmid, NULL, 0); 101 | printf("Data read from memory: %s\n", buffer); 102 | //detach from shared memory 103 | shmdt(buffer); 104 | // destroy the shared memory 105 | // shmctl(shmid, IPC_RMID, NULL); 106 | return 0; 107 | } 108 | ``` 109 | -------------------------------------------------------------------------------- /cramming/lfs/code/wget-list: -------------------------------------------------------------------------------- 1 | http://download.savannah.gnu.org/releases/acl/acl-2.2.53.tar.gz 2 | http://download.savannah.gnu.org/releases/attr/attr-2.4.48.tar.gz 3 | http://ftp.gnu.org/gnu/autoconf/autoconf-2.69.tar.xz 4 | http://ftp.gnu.org/gnu/automake/automake-1.16.1.tar.xz 5 | http://ftp.gnu.org/gnu/bash/bash-5.0.tar.gz 6 | https://github.com/gavinhoward/bc/archive/2.5.3/bc-2.5.3.tar.gz 7 | http://ftp.gnu.org/gnu/binutils/binutils-2.34.tar.xz 8 | http://ftp.gnu.org/gnu/bison/bison-3.5.2.tar.xz 9 | https://www.sourceware.org/pub/bzip2/bzip2-1.0.8.tar.gz 10 | https://github.com/libcheck/check/releases/download/0.14.0/check-0.14.0.tar.gz 11 | http://ftp.gnu.org/gnu/coreutils/coreutils-8.31.tar.xz 12 | https://dbus.freedesktop.org/releases/dbus/dbus-1.12.16.tar.gz 13 | http://ftp.gnu.org/gnu/dejagnu/dejagnu-1.6.2.tar.gz 14 | http://ftp.gnu.org/gnu/diffutils/diffutils-3.7.tar.xz 15 | https://downloads.sourceforge.net/project/e2fsprogs/e2fsprogs/v1.45.5/e2fsprogs-1.45.5.tar.gz 16 | https://sourceware.org/ftp/elfutils/0.178/elfutils-0.178.tar.bz2 17 | https://dev.gentoo.org/~blueness/eudev/eudev-3.2.9.tar.gz 18 |
https://prdownloads.sourceforge.net/expat/expat-2.2.9.tar.xz 19 | https://prdownloads.sourceforge.net/expect/expect5.45.4.tar.gz 20 | ftp://ftp.astron.com/pub/file/file-5.38.tar.gz 21 | http://ftp.gnu.org/gnu/findutils/findutils-4.7.0.tar.xz 22 | https://github.com/westes/flex/releases/download/v2.6.4/flex-2.6.4.tar.gz 23 | http://ftp.gnu.org/gnu/gawk/gawk-5.0.1.tar.xz 24 | http://ftp.gnu.org/gnu/gcc/gcc-9.2.0/gcc-9.2.0.tar.xz 25 | http://ftp.gnu.org/gnu/gdbm/gdbm-1.18.1.tar.gz 26 | http://ftp.gnu.org/gnu/gettext/gettext-0.20.1.tar.xz 27 | http://ftp.gnu.org/gnu/glibc/glibc-2.31.tar.xz 28 | http://ftp.gnu.org/gnu/gmp/gmp-6.2.0.tar.xz 29 | http://ftp.gnu.org/gnu/gperf/gperf-3.1.tar.gz 30 | http://ftp.gnu.org/gnu/grep/grep-3.4.tar.xz 31 | http://ftp.gnu.org/gnu/groff/groff-1.22.4.tar.gz 32 | https://ftp.gnu.org/gnu/grub/grub-2.04.tar.xz 33 | http://ftp.gnu.org/gnu/gzip/gzip-1.10.tar.xz 34 | http://anduin.linuxfromscratch.org/LFS/iana-etc-2.30.tar.bz2 35 | http://ftp.gnu.org/gnu/inetutils/inetutils-1.9.4.tar.xz 36 | https://launchpad.net/intltool/trunk/0.51.0/+download/intltool-0.51.0.tar.gz 37 | https://www.kernel.org/pub/linux/utils/net/iproute2/iproute2-5.5.0.tar.xz 38 | https://www.kernel.org/pub/linux/utils/kbd/kbd-2.2.0.tar.xz 39 | https://www.kernel.org/pub/linux/utils/kernel/kmod/kmod-26.tar.xz 40 | http://www.greenwoodsoftware.com/less/less-551.tar.gz 41 | http://www.linuxfromscratch.org/lfs/downloads/9.1/lfs-bootscripts-20191031.tar.xz 42 | https://www.kernel.org/pub/linux/libs/security/linux-privs/libcap2/libcap-2.31.tar.xz 43 | ftp://sourceware.org/pub/libffi/libffi-3.3.tar.gz 44 | http://download.savannah.gnu.org/releases/libpipeline/libpipeline-1.5.2.tar.gz 45 | http://ftp.gnu.org/gnu/libtool/libtool-2.4.6.tar.xz 46 | https://www.kernel.org/pub/linux/kernel/v5.x/linux-5.5.3.tar.xz 47 | http://ftp.gnu.org/gnu/m4/m4-1.4.18.tar.xz 48 | http://ftp.gnu.org/gnu/make/make-4.3.tar.gz 49 | http://download.savannah.gnu.org/releases/man-db/man-db-2.9.0.tar.xz 50 | 
https://www.kernel.org/pub/linux/docs/man-pages/man-pages-5.05.tar.xz 51 | https://github.com/mesonbuild/meson/releases/download/0.53.1/meson-0.53.1.tar.gz 52 | https://ftp.gnu.org/gnu/mpc/mpc-1.1.0.tar.gz 53 | http://www.mpfr.org/mpfr-4.0.2/mpfr-4.0.2.tar.xz 54 | https://github.com/ninja-build/ninja/archive/v1.10.0/ninja-1.10.0.tar.gz 55 | http://ftp.gnu.org/gnu/ncurses/ncurses-6.2.tar.gz 56 | https://www.openssl.org/source/openssl-1.1.1d.tar.gz 57 | http://ftp.gnu.org/gnu/patch/patch-2.7.6.tar.xz 58 | https://www.cpan.org/src/5.0/perl-5.30.1.tar.xz 59 | https://pkg-config.freedesktop.org/releases/pkg-config-0.29.2.tar.gz 60 | https://sourceforge.net/projects/procps-ng/files/Production/procps-ng-3.3.15.tar.xz 61 | https://sourceforge.net/projects/psmisc/files/psmisc/psmisc-23.2.tar.xz 62 | https://www.python.org/ftp/python/3.8.1/Python-3.8.1.tar.xz 63 | https://www.python.org/ftp/python/doc/3.8.1/python-3.8.1-docs-html.tar.bz2 64 | http://ftp.gnu.org/gnu/readline/readline-8.0.tar.gz 65 | http://ftp.gnu.org/gnu/sed/sed-4.8.tar.xz 66 | https://github.com/shadow-maint/shadow/releases/download/4.8.1/shadow-4.8.1.tar.xz 67 | http://www.infodrom.org/projects/sysklogd/download/sysklogd-1.5.1.tar.gz 68 | https://github.com/systemd/systemd/archive/v244/systemd-244.tar.gz 69 | http://anduin.linuxfromscratch.org/LFS/systemd-man-pages-244.tar.xz 70 | http://download.savannah.gnu.org/releases/sysvinit/sysvinit-2.96.tar.xz 71 | http://ftp.gnu.org/gnu/tar/tar-1.32.tar.xz 72 | https://downloads.sourceforge.net/tcl/tcl8.6.10-src.tar.gz 73 | http://ftp.gnu.org/gnu/texinfo/texinfo-6.7.tar.xz 74 | https://www.iana.org/time-zones/repository/releases/tzdata2019c.tar.gz 75 | http://anduin.linuxfromscratch.org/LFS/udev-lfs-20171102.tar.xz 76 | https://www.kernel.org/pub/linux/utils/util-linux/v2.35/util-linux-2.35.1.tar.xz 77 | http://anduin.linuxfromscratch.org/LFS/vim-8.2.0190.tar.gz 78 | https://cpan.metacpan.org/authors/id/T/TO/TODDR/XML-Parser-2.46.tar.gz 79 | 
https://tukaani.org/xz/xz-5.2.4.tar.xz 80 | https://zlib.net/zlib-1.2.11.tar.xz 81 | https://github.com/facebook/zstd/releases/download/v1.4.4/zstd-1.4.4.tar.gz 82 | http://www.linuxfromscratch.org/patches/lfs/9.1/bash-5.0-upstream_fixes-1.patch 83 | http://www.linuxfromscratch.org/patches/lfs/9.1/bzip2-1.0.8-install_docs-1.patch 84 | http://www.linuxfromscratch.org/patches/lfs/9.1/coreutils-8.31-i18n-1.patch 85 | http://www.linuxfromscratch.org/patches/lfs/9.1/glibc-2.31-fhs-1.patch 86 | http://www.linuxfromscratch.org/patches/lfs/9.1/kbd-2.2.0-backspace-1.patch 87 | http://www.linuxfromscratch.org/patches/lfs/9.1/sysvinit-2.96-consolidated-1.patch -------------------------------------------------------------------------------- /fundamentals/cse-421-intro-to-os/lecture-5-intro-to-synch-primitives.md: -------------------------------------------------------------------------------- 1 | # Lecture 5 2 | 3 | ## Review 4 | 5 | - Use as little synchronization as possible while keeping the multi-threaded process correct 6 | - Increase concurrency as much as possible so the multi-threaded process is fast 7 | 8 | ## Concurrency vs. Atomicity 9 | 10 | - Concurrency: the illusion that multiple things are happening at once 11 | * Requires stopping or starting any thread at any time 12 | - Atomicity: the illusion that a set of separate actions occurred all at once 13 | * Requires not stopping certain threads at certain times or not starting certain threads at certain times 14 | * Providing some limited control to threads over their scheduling 15 | 16 | ## Critical Sections 17 | 18 | - A critical section contains a series of instructions that only one thread can be executing at any given time 19 | - Questions to ask: 20 | 1. What is the local state private to each thread? 21 | 2. What is the shared state that is being accessed by multiple threads? 22 | 3. What lines are in the critical section?
23 | - Example 24 | ``` 25 | void giveGWATheMoolah(account_t account, int largeAmount) { 26 | int gwaHas = get_balance(account); 27 | gwaHas = gwaHas + largeAmount; 28 | put_balance(account, gwaHas); 29 | notifyGWAThatHeIsRich(gwaHas); 30 | return; 31 | } 32 | ``` 33 | - In the above example: 34 | 1. `gwaHas` is local state private to each thread 35 | 2. `account` is the shared state accessed by multiple threads 36 | 3. Lines 2-4 need to be in a critical section 37 | - Critical section requirements: 38 | 1. Mutual exclusion: only one thread should be executing in the critical section at a time 39 | 2. Progress: all threads should eventually be able to proceed through the critical section 40 | 3. Performance: keep critical sections as small as possible without sacrificing correctness 41 | 42 | ## Implementing Critical Sections 43 | 44 | - Two possible approaches: 45 | 1. Don't stop 46 | 2. Don't enter 47 | - On uniprocessors, a single thread can prevent other threads from executing in a critical section by not being descheduled 48 | * In the kernel we can do this by masking interrupts (no timer, no scheduler, no stopping) 49 | * In the multi-core era, this is only of historical interest 50 | - More generally we need a way to force other threads (potentially running on other cores) not to enter the critical section while one thread is inside 51 | * How do we do this? 52 | 53 | ## Atomic Instructions 54 | 55 | - Software synchronization primitives utilize special hardware instructions guaranteed to be atomic across all cores 56 | * Test-and-set: write a memory location and return its old value 57 | ``` 58 | int testAndSet(int * target, int value) { 59 | int oldvalue = *target; 60 | *target = value; 61 | return oldvalue; 62 | } 63 | ``` 64 | * Compare-and-swap: compare the contents of a memory location to a given value.
If they are the same, then set the variable to a new given value 65 | ``` 66 | bool compareAndSwap(int * target, int compare, int newvalue) { 67 | if (*target == compare) { 68 | *target = newvalue; 69 | return 1; 70 | } else { 71 | return 0; 72 | } 73 | } 74 | ``` 75 | * Load-link and store-conditional: load-link returns the value of a memory address, while the following store-conditional succeeds only if the value has not changed since the load-link 76 | ``` 77 | y = 1; 78 | __asm volatile( 79 | ".set push;" /* save assembler mode */ 80 | ".set mips32;" /* allow MIPS32 instructions */ 81 | ".set volatile;" /* avoid unwanted optimization */ 82 | "ll %0, 0(%2);" /* x = *sd */ 83 | "sc %1, 0(%2);" /* *sd = y; y = success? */ 84 | ".set pop" /* restore assembler mode */ 85 | : "=r" (x), "+r" (y) : "r" (sd)); 86 | if (y == 0) { 87 | return 1; 88 | } 89 | ``` 90 | - Many processors provide either test-and-set or compare-and-swap 91 | * Modify the example from earlier: 92 | ``` 93 | int payGWA = 0; // Shared variable for our test and set. 94 | 95 | void giveGWATheMoolah(account_t account, int largeAmount) { 96 | while (testAndSet(&payGWA, 1) == 1) { 97 | // Keep looping until testAndSet is unlocked 98 | } 99 | int gwaHas = get_balance(account); 100 | gwaHas = gwaHas + largeAmount; 101 | put_balance(account, gwaHas); 102 | testAndSet(&payGWA, 0); // Clear the test and set.
103 | notifyGWAThatHeIsRich(gwaHas); 104 | return; 105 | } 106 | ``` 107 | * Busy waiting: threads wait for the critical section by "pounding on the door", executing test-and-set repeatedly 108 | * This is bad on a multi-core system (worse on a single-core system) since busy waiting prevents the thread in the critical section from making progress 109 | 110 | ## Locks 111 | 112 | - Locks are a synchronization primitive used to implement critical sections 113 | * Threads acquire a lock when entering a critical section 114 | * Threads release a lock when leaving a critical section 115 | 116 | ## Spinlocks 117 | 118 | - What we implemented in the example is known as a spinlock 119 | * Lock for the fact that it guards a critical section 120 | * Spin describing the process of acquiring it 121 | - Spinlocks are rarely used on their own to solve synchronization problems 122 | - Spinlocks are commonly used to build more useful synchronization primitives -------------------------------------------------------------------------------- /fundamentals/cse-421-intro-to-os/lecture-30-xen-virtualization.md: -------------------------------------------------------------------------------- 1 | # Lecture 30 2 | 3 | ## Paper Overview 4 | 5 | - What is the wrong way? 6 | * Full virtualization 7 | + There are situations in which it is desirable for the hosted OS to see real as well as virtual resources 8 | + Providing both real and virtual time allows a guest OS to better support time-sensitive tasks and to correctly handle TCP timeouts and RTT estimates 9 | + Exposing real machine addresses allows a guest OS to improve performance by using superpages or page coloring 10 | - So what is the big idea?
11 | * Paravirtualization 12 | + Trade off small changes to the guest OS for big improvements in performance and VMM simplicity 13 | + Present a virtual machine abstraction that is similar but not identical to the underlying hardware 14 | + Promises improved performance although it does require modifications to the guest OS 15 | + Does not require changes to the application binary interface (ABI) and hence no modifications required to guest applications 16 | 17 | ## Xen Design Principles 18 | 19 | 1. Support for unmodified application binaries is essential 20 | * Must virtualize all architectural features required by existing standard ABIs 21 | 2. Supporting full multi-application OSes is important, allowing complex server configurations to be virtualized within a single guest OS instance 22 | 3. Paravirtualization is necessary to obtain high performance and strong resource isolation on uncooperative machine architectures such as x86 23 | 4. Even on cooperative machine architectures, completely hiding the effects of resource virtualization from guest OSes risks both correctness and performance 24 | - Xen introduces the idea of a hypervisor, a small piece of control software similar to the VMM running below all OSes on the machine 25 | - Much of the typical VMM functionality is moved to the control plane software that runs inside a Xen guest 26 | 27 | ## Summary of Xen Changes 28 | 29 | - Memory management 30 | * Segmentation: cannot install fully-privileged segment descriptors and cannot overlap with the top end of the linear address space 31 | * Paging: guest OS has direct read access to hardware page tables but updates are batched and validated by the hypervisor 32 | - CPU 33 | * Protection: guest OS must run at a lower privilege level than Xen 34 | * Exceptions: guest OS must register a descriptor table for exception handlers with Xen; aside from page faults, the handlers remain the same 35 | * System calls: guest OS may install a fast handler for system calls,
allowing direct calls from an application into its guest OS and avoiding indirecting through Xen on every call 36 | * Interrupts: hardware interrupts are replaced with a lightweight event system 37 | * Time: each guest OS has a timer interface and is aware of both real and virtual time 38 | - Device I/O 39 | * Network, disk, etc: virtual devices are elegant and simple to access; data is transferred using asynchronous I/O rings; an event mechanism replaces hardware interrupts for notifications 40 | 41 | ## Virtual Machine Memory Interface 42 | 43 | - Virtualizing memory is hard, but it's easier if the architecture has 44 | * A software-managed TLB which can be efficiently virtualized, or 45 | * A TLB with address space identifiers which does not need to be flushed on every transition 46 | - Of course the x86 has neither of these features 47 | * Given these limitations: 48 | 1. Guest OSes are responsible for allocating and managing hardware page tables with minimal involvement from Xen to ensure safety and isolation 49 | 2. Xen exists in a 64MB section at the top of every address space, thus avoiding a TLB flush when entering and leaving the hypervisor 50 | - But how then do we ensure safety?
51 | * Each time a guest OS requires a new page table, perhaps because a new process is being created, it allocates and initializes a page from its own memory reservation and registers it with Xen 52 | * At this point, the OS must relinquish direct write privileges to the page table memory: all subsequent updates must be validated by Xen 53 | * Guest OSes may batch update requests to amortize the overhead of entering the hypervisor 54 | * The top 64MB region of each address space, which is reserved for Xen, is not accessible or remappable by guest OSes 55 | * This address region is not used by any of the common x86 ABIs, however, so this restriction does not break application compatibility 56 | 57 | ## Virtual Machine CPU Interface 58 | 59 | - Principally, the insertion of a hypervisor below the OS violates the usual assumption that the OS is the most privileged entity in the system 60 | - To protect the hypervisor from OS misbehavior, guest OSes must be modified to run at a lower privilege level 61 | * x86 privilege rings to the rescue 62 | * Rings 1 and 2 have not been used by any well-known x86 OS since OS/2 63 | - What exceptions happen enough to create a performance problem?
Page faults and system calls 64 | * Typically only two types of exceptions occur frequently enough to affect system performance: system calls and page faults 65 | * We improve the performance of system calls by allowing each guest OS to register a fast exception handler which is accessed directly by the processor without indirecting via ring 0 66 | * This handler is validated before installing it in the hardware exception table 67 | 68 | ## Para vs Full Virtualization 69 | 70 | - Full virtualization: do not change the OS except at run time 71 | - Paravirtualization: minimal changes to the OS which sometimes result in better interaction between the OS and virtual hardware -------------------------------------------------------------------------------- /fundamentals/cse-421-intro-to-os/lecture-25-file-system-caching.md: -------------------------------------------------------------------------------- 1 | # Lecture 25 2 | 3 | ## Making File Systems Fast 4 | 5 | - How do we make a big slow thing look faster? 6 | * Use a cache 7 | - In the case of the file system, the smaller, faster thing is memory 8 | - We call the memory used to cache file system data the buffer cache 9 | 10 | ## Putting Spare Memory to Work 11 | 12 | - OSes use memory 13 | * As memory 14 | * To cache file data in order to improve performance 15 | - These two uses of memory compete with each other 16 | * Big buffer cache, small main memory: file access is fast, but potential thrashing in the memory subsystem 17 | * Small buffer cache, large main memory: little swapping occurs but file access is extremely slow 18 | - On Linux the `swappiness` kernel parameter controls how aggressively the OS prunes unused process memory pages and hence the balance between memory and buffer cache 19 | 20 | ## Where to Put the Buffer Cache? 21 | 22 | ## Above the File System 23 | 24 | - What do we cache?
25 | * Entire files and directories 26 | - What is the buffer cache interface? 27 | * `open`, `close`, `read`, `write` (same as the file system call interface) 28 | 29 | ## Above the File System: Operations 30 | 31 | - `open` 32 | * Pass down to the underlying file system 33 | - `read` 34 | * If the file is not in the buffer cache, load its contents into the buffer cache and then read the cached contents 35 | * If the file is in the cache, read the cached contents 36 | - `write` 37 | * If the file is not in the buffer cache, load its contents into the buffer cache and then modify them 38 | * If the file is in the cache, modify the cached contents 39 | - `close` 40 | * Remove from the cache (if necessary) and flush contents through the file system 41 | 42 | ## Above the File System: Pros and Cons 43 | 44 | - Pros: 45 | * Buffer cache sees file operations, which may lead to better prediction or performance 46 | - Cons: 47 | * Hides many file operations from the file system, preventing it from providing consistency guarantees 48 | * Can't cache file system metadata: inodes, superblocks, etc 49 | 50 | ## Below the File System 51 | 52 | - What do we cache? 53 | * Disk blocks 54 | - What is the buffer cache interface?
55 | * `readblock`, `writeblock` (same as the disk interface) 56 | 57 | ## Below the File System: Pros and Cons 58 | 59 | - Pros: 60 | * Can cache all blocks including file system data structures, inodes, superblocks, etc 61 | * Allows the file system to see all file operations even if they eventually hit the cache 62 | - Cons: 63 | * Cannot observe file semantics or relationships 64 | - This is what modern OSes do 65 | 66 | ## Review: Data Blocks: Multilevel Index 67 | 68 | - Most files are small, but some can get very large 69 | - Have the inode store: 70 | * Some pointers to blocks, which we refer to as direct blocks 71 | * Some pointers to blocks containing pointers to blocks, which we refer to as indirect blocks 72 | * Some pointers to blocks containing pointers to blocks containing pointers to blocks, which we refer to as doubly indirect blocks 73 | * Etc. 74 | - Pros: 75 | * Index scales with the size of the file 76 | * Offset lookups are still fairly fast 77 | * Small files stay small but big files can get extremely large 78 | 79 | ## Buffer Cache Location 80 | 81 | - Where is the buffer cache typically located? 82 | * Below the file system 83 | - What does the buffer cache store? 84 | * Complete disk blocks including file system metadata 85 | 86 | ## Caching and Consistency 87 | 88 | - How can the cache cause consistency problems? 89 | * Objects in the cache are lost on failures 90 | - Remember: almost every file system operation involves modifying multiple disk blocks 91 | - Example of creating a new file in an existing directory: 92 | 1. Allocate an inode, mark the used inode bitmap 93 | 2. Allocate data blocks, mark the used data block bitmap 94 | 3. Associate data blocks with the file by modifying the inode 95 | 4. Add the inode to the given directory by modifying the directory file 96 | 5.
Write data blocks 97 | 98 | ## How Caching Exacerbates Consistency 99 | 100 | - Observation: file system operations that modify multiple blocks may leave the file system in an inconsistent state if partially complete 101 | - How does caching exacerbate this situation? 102 | * It may increase the time span between when the first write of the operation hits the disk and when the last is completed 103 | 104 | ## What Can Go Wrong? 105 | 106 | - What kinds of inconsistency can take place if the system is interrupted between the multiple operations necessary to complete a write? 107 | 1. Allocate an inode, mark the used inode bitmap (inode incorrectly marked in use) 108 | 2. Allocate data blocks, mark the used data block bitmap (data blocks incorrectly marked in use) 109 | 3. Associate data blocks with the file by modifying the inode (dangling file not present in any directory) 110 | 4. Add the inode to the given directory by modifying the directory file 111 | 5. Write data blocks (data loss) 112 | 113 | ## Maintaining File System Consistency 114 | 115 | - What is the safest approach? 116 | * Do not buffer writes 117 | * We call this a write-through cache because writes are not buffered in the cache 118 | - What is the most dangerous approach? 119 | * Buffer all operations until blocks are evicted 120 | * We call this a write-back cache 121 | - Which approach is better for 122 | * Performance 123 | * Safety 124 | - What about a middle ground?
125 | * Write important file system metadata structures (superblock, inodes, bitmaps, etc) immediately but delay data writes 126 | - File systems also give user processes some control through `sync` (sync the entire file system) and `fsync` (sync one file) -------------------------------------------------------------------------------- /fundamentals/system-perf/cpus.md: -------------------------------------------------------------------------------- 1 | # CPUs 2 | 3 | ## Terminology 4 | 5 | - Processor: physical chip with 1+ CPUs implemented as cores or hardware threads 6 | - Core: independent CPU instance on a multi-core processor (multiprocessing) 7 | - Hardware thread: supports executing multiple threads in parallel on a single core where each thread is an independent CPU instance (multithreading) 8 | - CPU instruction: single CPU operation 9 | - Logical CPU: virtual processor 10 | - Scheduler: kernel subsystem that assigns threads to run on CPUs 11 | - Run queue: queue of runnable threads waiting to be serviced by CPUs 12 | 13 | ## Concepts 14 | 15 | - Clock Rate 16 | * Digital signal that drives all processor logic 17 | * A CPU instruction takes one or more cycles (CPU cycles) to execute 18 | * CPUs execute at a clock rate, e.g., a 5GHz CPU performs 5 billion clock cycles per second 19 | - Instruction 20 | * An instruction includes: 21 | 1. Instruction fetch 22 | 2. Instruction decode 23 | 3. Execute 24 | 4. Memory access 25 | 5.
Register write-back 26 | * Each step takes at least one clock cycle 27 | * Memory access can take dozens of clock cycles (making CPU caching important) 28 | - Cycles per instruction: high-level metric for describing where a CPU is spending its clock cycles and understanding the nature of CPU utilization 29 | * High CPI indicates CPUs are often stalled, typically for memory access 30 | - CPU utilization: time the CPU is busy performing work during an interval (%) 31 | * Performance does not degrade steeply with high utilization 32 | * Kernel prioritizes processes 33 | - Saturation: CPU at 100% utilization 34 | * Threads encounter scheduler latency as they wait for on-CPU time 35 | - Preemption: allows a higher-priority thread to preempt the running thread and begin its own execution 36 | * Eliminates run-queue latency for higher-priority work 37 | - Multiprocess 38 | * Use `fork()` 39 | * Separate address space per process 40 | * Cost of `fork()`, `exit()` 41 | * Communicate with IPC, which incurs CPU cost; context switching to move data between address spaces 42 | - Multithreading 43 | * Use `threads` API 44 | * Small memory overhead 45 | * Small CPU overhead; just API calls 46 | * Direct access to shared memory (integrity via synchronization primitives) 47 | - Word size 48 | * Processors are designed around a max word size, e.g., 32-bit or 64-bit, which is the integer size and register size 49 | - CPU performance counters 50 | * Counters for: 51 | 1. CPU cycles 52 | 2. CPU instructions 53 | 3. Level 1, 2, 3 cache accesses (misses and hits) 54 | 4. Floating point unit 55 | 5. Memory I/O 56 | 6.
Resource I/O 57 | 58 | ## Methodology and Analysis 59 | 60 | ### Tools Method 61 | 62 | - `uptime`: load averages over time 63 | - `vmstat`: check idle columns to see how much headroom there is (<10% can be a problem) 64 | - `mpstat`: check for hot CPUs to identify thread scalability problems 65 | - `top`: see which processes are the top CPU consumers 66 | - `pidstat`: break down top CPU consumers into user and system time 67 | - `perf/dtrace`: profile CPU usage stack traces to identify why CPUs are in use 68 | 69 | ### USE Method 70 | 71 | - Identify bottlenecks and errors across all components 72 | - For CPU: 73 | * Utilization: time CPU was busy 74 | + Percent busy; check per CPU to see if there are scalability issues 75 | * Saturation: degree to which runnable threads are queued waiting for their turn on the CPU 76 | * Errors: CPU errors 77 | + Are all CPUs still online? 78 | 79 | ### Workload Characterization 80 | 81 | - Important for capacity planning, benchmarking and simulating workloads 82 | - Skip since we care about troubleshooting 83 | 84 | ### Profiling 85 | 86 | - Sampling the state of the CPU at timed intervals: 87 | 1. Select the type of profile data to capture and the sampling rate 88 | 2. Begin sampling at timed intervals 89 | 3. Wait while the activity of interest occurs 90 | 4. End sampling and collect sample data 91 | 5.
Process the data 92 | - Generate flame graphs 93 | - CPU profile data on: 94 | * User and/or kernel level 95 | * Function and offset, function only, partial stack trace or full stack trace 96 | - Sampled stack traces point to higher-level reasons for CPU usage 97 | 98 | ### Cycle Analysis 99 | 100 | - For usage of specific CPU resources such as caches and interconnects, profiling can use CPU performance counter (CPC)-based event triggers instead of timed intervals 101 | - Can reveal that cycles are spent stalled on Level 1, 2 or 3 cache misses, memory I/O or resource I/O, or spent on floating-point operations or other activities 102 | 103 | ### Performance Monitoring 104 | 105 | - Identify issues and patterns over time 106 | - Key metrics for CPUs: 107 | 1. Utilization: percent busy 108 | 2. Saturation: run-queue length 109 | 110 | ### Static Performance Tuning 111 | 112 | - Examine: 113 | * CPUs available 114 | * Size of CPU caches 115 | * CPU clock speed 116 | * CPU-related features enabled/disabled by the BIOS? 117 | * Software-imposed CPU usage limits?
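Some of these static facts can be gathered programmatically. A minimal Python sketch (the `/sys` path is a Linux-specific assumption; cache sizes and BIOS settings generally need platform tools such as `lscpu` or `dmidecode`):

```python
import os
import platform

def cpu_static_facts():
    # Portable facts available from the standard library
    facts = {
        "logical_cpus": os.cpu_count(),   # CPUs available
        "arch": platform.machine(),       # word-size hint (e.g., x86_64)
    }
    # Linux exposes per-CPU clock limits under /sys (assumed path; absent elsewhere)
    freq_path = "/sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq"
    if os.path.exists(freq_path):
        with open(freq_path) as f:
            facts["max_freq_khz"] = int(f.read().strip())
    return facts
```
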
118 | 119 | ### Priority Tuning 120 | 121 | - `nice` for adjusting process priority 122 | - Identify low-priority work e.g., monitoring agents and scheduled backups and modify them to start with a higher `nice` value 123 | - Can also change scheduler class/policy 124 | - Real-time scheduling class allows processes to preempt all other work 125 | 126 | ### Resource Control 127 | 128 | - Skip 129 | 130 | ### CPU Binding 131 | 132 | - Bind processes and threads to individual CPUs or collections of CPUs 133 | - Improves CPU cache warmth for processes, improving memory I/O performance 134 | 135 | ### Microbenchmarking 136 | 137 | - Skip 138 | 139 | ### Scaling 140 | 141 | - Skip 142 | 143 | ## Analysis 144 | 145 | -------------------------------------------------------------------------------- /fundamentals/grok-system-design/Examples/Fleet-Upgrade/algoexpert.md: 1 | # Code Deployment 2 | 3 | ## Functional Requirements 4 | 5 | - System to repeatedly build and deploy code to hundreds of thousands of servers 6 | - Servers across 5-10 regions 7 | - Building code will involve grabbing snapshots of code using commit SHAs 8 | - Building code will take up to 15 minutes 9 | - Binaries of up to 10GB in size 10 | - Entire process takes at most 30 minutes 11 | - Each build ends in SUCCESS or FAILURE 12 | - We care about availability (2 to 3 nines) 13 | 14 | ## Coming up with a Plan 15 | 16 | - Two clear sub-systems 17 | 1. Build system that builds code into binaries 18 | 2.
Deployment system that deploys binaries to machines 19 | 20 | ## Build System - Overview 21 | 22 | - Jobs get added to a queue where each job has a commit identifier (SHA identifier) 23 | - Pool of servers (workers) are going to handle building these jobs 24 | - Each worker will repeatedly take jobs off the queue in a FIFO manner (no prioritization for now) 25 | - Write resulting binaries to blob storage e.g., S3 26 | - Blob storage makes sense since binaries are just blobs of data 27 | 28 | ## Build System - Job Queue 29 | 30 | - Naive design is to put the job queue in memory 31 | - This implementation is problematic because we could lose the state of jobs (queued jobs and past jobs) 32 | - Better off implementing the queue in a SQL database 33 | 34 | ## Build System - SQL Job Queue 35 | 36 | - `jobs` table where every record represents a job 37 | - Use record-creation timestamps as the queue's ordering mechanism 38 | * `id`: ID of job, autogenerated 39 | * `created_at`: timestamp of creation 40 | * `commit_sha`: string of commit SHA 41 | * `name`: pointer to the job's eventual binary in blob storage 42 | * `status`: QUEUED, RUNNING, SUCCEEDED, FAILED 43 | - Implement the dequeueing mechanism by looking at the oldest `created_at` with a QUEUED status 44 | 45 | ## Build System - Concurrency 46 | 47 | - ACID transactions make it safe for potentially hundreds of workers grabbing jobs off the queue without unintentionally running the same job twice 48 | ``` 49 | BEGIN TRANSACTION; 50 | SELECT * FROM jobs_table WHERE status = 'QUEUED' ORDER BY created_at ASC LIMIT 1; 51 | -- if there is none, then we ROLLBACK 52 | UPDATE jobs_table SET status = 'RUNNING' WHERE id = (id from previous query); 53 | COMMIT; 54 | ``` 55 | - All workers will be running this transaction every so often to dequeue the next job 56 | - If we assume that we will have 100 workers sharing the same queue, we will have 100/5=20 reads per second which is easy to handle for SQL 57 | * Note: assuming each worker dequeues every 5
seconds 58 | 59 | ## Build System - Lost Jobs 60 | 61 | - What if there is a network partition with our workers or one of our workers dies mid-build? 62 | - Job may remain in RUNNING state forever 63 | - Use an extra column in the job called `last_heartbeat` 64 | * This will be updated in a heartbeat fashion by the worker, where the worker updates the row every 3-5 minutes to let us know the job is still running 65 | - Separate service that polls the table every 5 minutes, checks all RUNNING jobs and if `last_heartbeat` is longer than 2 heartbeats ago then something is wrong and the job can be reset to QUEUED 66 | ``` 67 | UPDATE jobs_table SET status = 'QUEUED' WHERE 68 | status = 'RUNNING' AND 69 | last_heartbeat < NOW() - 10 minutes 70 | ``` 71 | 72 | ## Build System - Scale Estimation 73 | 74 | - Previously assumed we would have 100 workers, which made the SQL database queue able to handle the expected load 75 | - Back-of-envelope math shows a single worker can run 4 jobs per hour (builds can take up to 15 minutes) which is roughly 100 jobs per day 76 | - If we have 5000-10000 builds per day then we would need 50-100 workers 77 | - System should scale horizontally fairly easily so we can automatically add or remove workers based on load 78 | 79 | ## Build System - Storage 80 | 81 | - When a worker completes a build, it stores the binary in blob storage before updating the relevant row in the `jobs` table 82 | - Ensures the binary is available before it is marked as SUCCEEDED 83 | - Want to use regional storage 84 | 85 | ## Deployment System - General Overview 86 | 87 | - Want the actual deployment system to allow for very fast distribution of 10GB binaries to hundreds of thousands of servers across global regions 88 | - Want some service that tells us when a binary has been replicated in all regions 89 | - Another service that can serve as the source of truth for what binary should currently be run on all machines 90 | - Peer-to-peer network design for the actual machines across the world 91 | 92 | ## Deployment system -
replication-status service 93 | 94 | - Service that continuously checks all regional buckets and aggregates replication status for successful builds 95 | - Once a binary has been replicated across all regions, the service updates a separate SQL database with rows containing the name of the binary and a replication status 96 | - Once a binary has a complete replication_status, it is officially deployable 97 | 98 | ## Deployment system - block distribution 99 | 100 | - A sequential download of a 10GB file by each server will be slow 101 | - Instead we want all of our regional clusters to behave as peer-to-peer networks 102 | 103 | ## Deployment system - trigger 104 | 105 | - Each regional cluster will have a key-value store holding config for that cluster about what builds should be running on that cluster 106 | - Also have a global key-value store 107 | - When an engineer clicks the Deploy build B1 button: 108 | 1. Global key-value store's build_version gets updated 109 | 2. Regional key-value stores will be continuously polling the global key-value store for updates to the build_version and will update themselves 110 | 3. Machines in the clusters/regions will poll the relevant regional key-value store and when build_version changes, they will try to fetch that build from the P2P network and run the binary -------------------------------------------------------------------------------- /fundamentals/cse-421-intro-to-os/lecture-16-address-translation.md: 1 | # Lecture 16 2 | 3 | ## Translation is Control 4 | 5 | - Forcing processes to translate a reference to gain access to the underlying object provides the kernel with a great deal of control 6 | - References can be revoked, shared, moved, altered 7 | 8 | ## Virtual vs.
Physical Memory 9 | 10 | - Address space abstraction requires breaking the connection between a memory address and physical memory 11 | - We refer to data accessed via the memory interface as using virtual addresses 12 | - Physical address points to memory 13 | - Virtual address points to something that acts like memory 14 | - Virtual addresses have much richer semantics than physical addresses, encapsulating location, permanence and protection 15 | 16 | ## Creating Virtual Addresses 17 | 18 | - `exec()` creates virtual addresses using an ELF file as a blueprint 19 | - `fork()` copies the virtual address space of the parent process 20 | - `sbrk()` extends the process heap, used e.g., by `malloc()` 21 | - `mmap()` creates a virtual address region that points to a file 22 | 23 | ## Efficient Translation 24 | 25 | - Goal: almost every virtual address translation should be able to proceed without kernel assistance 26 | - Why: 27 | * Kernel is too slow 28 | * Recall: kernel sets policy, hardware provides the mechanism 29 | 30 | ## Explicit Translation 31 | 32 | - Process: "Dear kernel, I would like to use virtual address 0x10000, please tell me what physical address this maps to?" 33 | - Does this work? 34 | * No, it is unsafe. We can't allow a process to use physical addresses directly. All addresses must be translated. 35 | 36 | ## Implicit Translation 37 | 38 | - Process: "Machine, store to address 0x10000" 39 | - MMU: "Where is virtual address 0x10000 supposed to map to? Kernel, help!" 40 | - Exception 41 | - Kernel: "Machine, virtual address 0x10000 maps to physical address 0x567400" 42 | - MMU: "Process, store completed" 43 | - Note: if not translatable then we get a segmentation fault 44 | 45 | ## K.I.S.S Base and Bound 46 | 47 | - Simplest virtual address mapping approach: 48 | 1. Assign each process a base physical address and bound 49 | 2. Check: virtual address is okay if virtual address is less than bound 50 | 3.
Translate: physical address = virtual address + base 51 | * Example: 52 | + Virtual address = 0x10000 53 | + Base = 0x40600, Bounds = 0x30000 54 | + Physical address = virtual address + base = 0x50600 55 | + Note: if virtual address > bounds then the check fails and the process faults (translation fails) 56 | 57 | ## Base and Bounds: Pro 58 | 59 | - Pro: simple, hardware only needs to know base and bounds 60 | - Pro: fast 61 | * Protection: one comparison 62 | * Translation: one addition 63 | 64 | ## Base and Bounds: Con 65 | 66 | - Con: this is not a good fit for our address space abstraction 67 | * Address spaces encourage discontiguous allocation 68 | * Base and bounds allocation must be mostly contiguous, otherwise we lose memory to internal fragmentation 69 | - Con: also a significant chance of external fragmentation due to large contiguous allocations 70 | 71 | ## Segmentation 72 | 73 | - One base and bounds isn't a good fit for the address space abstraction 74 | - We can extend this idea 75 | * Multiple bases and bounds per process (call each a segment) 76 | * Assign each logical region of the address space (code, data, heap, stack) to its own segment 77 | + Each can be a different size 78 | + Each can have separate permissions 79 | - Segmentation works as follows: 80 | 1. Each segment has a start virtual address, base physical address and bound 81 | 2. Check: virtual address is okay if it is inside some segment, i.e., for some segment: 82 | + segment start < virtual address < segment start + segment bound 83 | 3.
Translate: for the segment that contains this virtual address: 84 | + physical address = virtual address - segment start + segment base 85 | - Example: 86 | * Virtual address 0x10000 87 | * MMU asks kernel if a valid segment exists, and kernel replies with 88 | + start 0x10000, base 0x43000, bounds 0x1000 89 | * Mapped to physical address 0x43000 90 | - Example: 91 | * Virtual address 0x400 92 | * MMU asks kernel if a valid segment exists, and kernel replies with 93 | + start 0x100, base 0x16000, bounds 0x500 94 | * Mapped to physical address 0x16300 95 | - "Segmentation fault, core dumped" means you tried to access invalid virtual memory 96 | 97 | ## Segmentation: Pros 98 | 99 | - Still fairly simple 100 | * Protection (segment exists): N comparisons for N segments 101 | * Translation: one addition 102 | - Can organize and protect regions of memory appropriately 103 | - Better fit for address spaces, leading to less internal fragmentation 104 | 105 | ## Segmentation: Cons 106 | 107 | - Still requires the entire segment to be contiguous in memory 108 | - Potential for external fragmentation due to segment contiguity 109 | 110 | ## Ideal 111 | 112 | - Ideally, we would like: 113 | * Fast mapping from any virtual byte to any physical byte 114 | * OS cannot do this. Can hardware help?
115 | 116 | ## Translation Lookaside Buffer 117 | 118 | - Common system trick: when something is slow, throw a cache at it 119 | - Translation Lookaside Buffer (TLB) typically uses content-addressable memory (CAM) to quickly search for a cached virtual-to-physical translation 120 | - Example: 121 | * TLB contains virtual to physical mappings: 122 | + 0x10 to 0x50 123 | + 0x800 to 0x306 124 | + 0x110 to 0x354 125 | * CAMs can search a large number of mappings e.g., 256 at once, turning an O(n) search operation into O(1) 126 | 127 | ## TLB Cons 128 | 129 | - CAMs are limited in size, cannot be arbitrarily large 130 | - Segments are too large and lead to internal fragmentation 131 | - Mapping individual bytes would mean that the TLB would not be able to cache many entries and performance would suffer 132 | - Is there a middle ground? Yes: page translation and page management -------------------------------------------------------------------------------- /fundamentals/cse-421-intro-to-os/lecture-15-virtual-address.md: 1 | # Lecture 15 2 | 3 | ## Convention 4 | 5 | - Process layout is specified by the executable and linkable format (ELF) 6 | - Some of the layout is a matter of convention 7 | - Example: why not load the code at `0x0`? 8 | * To catch possibly the most common programmer error i.e., NULL pointer problems 9 | * Leaving a large portion of the process address space starting at `0x0` empty allows the kernel to catch these errors 10 | 11 | ## Destined to Ever Meet? 12 | 13 | - Stack starts at the top of the address space and grows down 14 | - Note: registers are on the CPU 15 | - Heap starts towards the bottom and grows up 16 | - Will the stack and heap ever meet? 17 | * Probably not, because that would mean either the stack or the heap was huge 18 | 19 | ## Relocation 20 | 21 | - Given our address space model, no more problems with locating things, right?
22 | * Not quite; dynamically loaded libraries still need to be relocated at run time 23 | 24 | ## Address Space: A Great Idea? 25 | 26 | - Address space abstraction sounds powerful and useful 27 | - Can we implement it? What is required? 28 | * Address translation: 29 | + Example: `0x10000` in process 1 is not the same as `0x10000` in process 2 30 | * Protection: 31 | + Address spaces are intended to provide a private view of memory to each process 32 | * Memory management 33 | + Together, one or several processes may have more address space allocated than physical memory on the machine 34 | - Implementing address spaces requires breaking the direct connection between a memory address and physical memory 35 | - Introducing another level of indirection is a classic systems technique (e.g., for file handles) 36 | - Forcing processes to translate a reference to gain access to the underlying object provides the kernel with a great deal of control 37 | - References can be revoked, shared, moved and altered 38 | 39 | ## Memory Interface 40 | 41 | - We don't usually think about memory as having an interface but it does 42 | * `load(address)`: load data from the given address, usually into a register or possibly into another memory location 43 | * `store(address, value)`: store value to the given address where value may be in a register or another memory location 44 | - Address space abstraction requires breaking the connection between a memory address and physical memory 45 | - We refer to data accessed via the memory interface as using virtual addresses 46 | * Physical address points to memory 47 | * Virtual address points to something that acts like memory 48 | - Virtual addresses have much richer semantics than physical addresses, encapsulating location, permanence and protection 49 | 50 | ## Virtual Addresses: Location 51 | 52 | - Data referenced by a virtual address might be 53 | * In memory: but kernel may have moved it to the disk 54 | -
Virtual address --> physical address 55 | * On-disk but the kernel may be caching it in memory 56 | - Virtual address --> disk, block, offset 57 | * In memory on another machine 58 | - Virtual address --> IP address, physical address 59 | * On a port on a hardware device 60 | - Virtual address --> device, port 61 | 62 | ## Virtual Addresses: Permanence 63 | 64 | - Processes expect data written to virtual addresses that point to physical memory to store values transiently 65 | - Processes expect data written to virtual addresses that point to disk to store values persistently 66 | 67 | ## Virtual Addresses: Permissions and Protection 68 | 69 | - Some virtual addresses may only be used by the kernel while in kernel mode 70 | - Virtual addresses may also be assigned read, write or execute permissions 71 | * `read/write`: process can load/store to this address 72 | * `execute`: process can load and execute instructions from this address 73 | 74 | ## Creating Virtual Addresses: exec() 75 | 76 | - `exec()` uses a blueprint from an ELF file to determine how the address space should look when `exec()` completes 77 | - `exec()` creates and initializes virtual memory that points to memory: 78 | * Code: usually marked as read-only 79 | * Data: marked as read-write, but not executable 80 | * Heap: area used for dynamic allocations, marked read-write 81 | * Stack: space for the first thread 82 | - Recall: `pmap <pid>` to look at memory mappings 83 | * `pmap` shows virtual addresses, so multiple instances of the same program will have the same virtual addresses 84 | * `pmap` does not show you where those virtual addresses point 85 | 86 | ## Creating Virtual Addresses: fork() 87 | 88 | - `fork()` copies the address space of the calling process 89 | - The child has the same virtual addresses as the parent but they point to different memory locations 90 | - Copying all the memory is expensive 91 | * Especially when the next thing that a process does is to load a new binary, which destroys most of the
copied state 92 | 93 | ## Creating Virtual Addresses: sbrk() 94 | 95 | - Dynamic memory allocation is performed by the `sbrk()` system call 96 | - `sbrk()` asks the kernel to move the program break, the point at which the process heap ends 97 | - Used by `malloc()` when it wants more heap 98 | 99 | ## Creating Virtual Addresses: mmap() 100 | 101 | - `mmap()` is a system call that creates virtual addresses that map to a portion of a file 102 | 103 | ## Example Machine Memory Layout: System/161 104 | 105 | - System/161 emulates a 32-bit MIPS architecture 106 | - Addresses are 32 bits wide: from 0x0 to 0xFFFFFFFF 107 | - MIPS architecture defines four address regions: 108 | 1. 0x0 to 0x7FFFFFFF: process virtual addresses (accessible to user processes, translated by the kernel, 2GB) 109 | 2. 0x80000000 to 0x9FFFFFFF: kernel direct-mapped addresses (only accessible to the kernel, translated by subtracting 0x80000000, 512 MB cached) 110 | 3. 0xA0000000 to 0xBFFFFFFF: kernel direct-mapped addresses (only accessible to the kernel, 512 MB uncached) 111 | 4. 0xC0000000 to 0xFFFFFFFF: kernel virtual addresses (only accessible to the kernel, translated by the kernel, 1GB) 112 | -------------------------------------------------------------------------------- /fundamentals/cse-421-intro-to-os/lecture-31-containers.md: 1 | # Lecture 31 2 | 3 | ## Operating System Virtualization 4 | 5 | - How do we create a virtual OS (container)? 6 | * Start with a real OS 7 | * Create software responsible for isolating guest software inside the container 8 | + That software seems to lack a canonical name and today it's actually a bunch of different tools 9 | * Container resources (processes, files, network sockets, etc) are provided by the real OS but visibility outside the container is limited 10 | - What are the implications?
11 | * Container and real OS share the same kernel 12 | * So apps inside and outside the container must share the same ABI 13 | * Challenges in getting this to work are due to shared OS namespaces 14 | 15 | ## Containers vs VMs 16 | 17 | - You can run Windows inside a container provided by Linux: False, containers share the kernel with the host 18 | - You can run SUSE Linux inside an Ubuntu container: True, as long as both distributions use the same kernel, differences are confined to different binary tools and file locations 19 | - Running `ps` inside the container will show all processes: False, the container's process namespace is isolated from the host 20 | 21 | ## Hypervisor vs Container Virtualization 22 | 23 | - Hypervisor virtualization 24 | * Server hardware 25 | * Host OS 26 | * Hypervisor 27 | * Guest OSes 28 | * Binaries/libraries 29 | * Apps 30 | - Container virtualization 31 | * Server hardware 32 | * Host OS 33 | * Binaries/libraries 34 | * Apps 35 | 36 | ## Why Virtualize an OS? 37 | 38 | - Shares many of the same benefits of hardware virtualization with much lower overhead 39 | - Decoupling 40 | 1. Cannot run multiple OSes (kernels) on the same machine, unlike VMs 41 | 2. Can transfer software setups to another machine as long as it has an identical or nearly identical kernel 42 | 3. Can adjust container resources to system needs 43 | - Isolation 44 | 1. Container should not leak info inside and outside the container 45 | 2. Can isolate all of the config and software packages a particular app needs to run 46 | 47 | ## OS vs. Hardware Overhead 48 | 49 | - Hardware virtualization system call path 50 | * App inside the VM makes a system call 51 | * Trap to the host OS (or hypervisor) 52 | * Hand trap back to guest OS 53 | - OS virtualization system call path 54 | * App inside the container makes a system call 55 | * Trap to the OS 56 | * Remember all of the work we had to do to deprivilege the guest OS and deal with uncooperative machine architectures like x86?
57 | + OS virtualization does not require any of this: there is only one OS 58 | 59 | ## OS Virtualization is About Names 60 | 61 | - What kind of names must the container virtualize? 62 | * Process IDs 63 | + `top` inside the container shows only processes running inside the container 64 | + `top` outside the container may show processes inside the container but with different pids 65 | * File names 66 | + Processes inside the container may have a limited or different view of the mounted file system 67 | + File names may resolve to different names and some file names outside the container may be removed 68 | * User names 69 | + Containers may have different users with different roles 70 | + `root` inside the container should not be root outside the container 71 | * Host name and IP address 72 | + Processes inside the container may use a different host name and IP address when performing network operations 73 | 74 | ## OS Virtualization is About Control 75 | 76 | - OS may want to ensure that the entire container cannot consume more than a certain amount of 77 | * CPU time 78 | * Memory 79 | * Disk or network I/O 80 | - Forms of OS virtualization go back to `chroot` (run a command or interactive shell with a special root directory) 81 | * Instead of starting path resolution at inode #2, start somewhere else 82 | - Modern container management systems e.g., Docker combine and build upon multiple lower-level tools and services 83 | 84 | ## Linux Namespaces 85 | 86 | - Linux has provided namespace separation for a variety of resources that typically had unified namespaces 87 | * Mount points: allows different namespaces to see different views of the file system 88 | * Process IDs: new processes are allocated IDs in their current namespace and all parent namespaces 89 | * Network: namespaces can have private IP addresses and their own routing tables, and can communicate with other namespaces through virtual interfaces 90 | * Devices: devices can be present or hidden in different
namespaces 91 | 92 | ## Cgroups 93 | 94 | - Cgroups is a Linux kernel feature that limits, accounts for, and isolates the resource usage of a collection of processes 95 | - Processes and their children remain in the same cgroup 96 | - Cgroups make it possible to control the resources allocated to a set of processes 97 | 98 | ## UnionFS 99 | 100 | - A stackable unification file system 101 | - Path name resolution: 102 | * Does `/foo/bar` exist in the top layer; if yes, return its contents 103 | * Does `/foo/bar` exist in the next layer; if yes, return its contents 104 | * Etc 105 | - Can also hide parts of the lower file system 106 | * Does `/foo/bar` exist in the top layer; if yes, return its contents 107 | * Access to `/ff` in the next layer is prohibited, so stop 108 | 109 | ## COW File System 110 | 111 | - Previous container libraries made a copy of the parent's entire file system 112 | - What could we do instead? 113 | * Copy on write 114 | * Only make modifications to the underlying file system when the container modifies files 115 | * Speeds up startup and reduces storage usage 116 | + The container mainly needs read-only access to host files 117 | 118 | ## What is Docker?
119 | 120 | - Docker builds on previous technologies 121 | * Provides a unified set of tools for container management on a variety of systems 122 | * Layered file system images for easy updates 123 | * Now involved in development of containerization libraries on Linux 124 | -------------------------------------------------------------------------------- /fundamentals/pucit-systems-programming/lecture-5-gnu-cmake.md: 1 | # Lecture 5 2 | 3 | ## Binary Software Packages 4 | 5 | - A binary package is a collection of files bundled into a single file containing 6 | * Executable files 7 | * man/info pages 8 | * copyright info 9 | * configuration and installation scripts 10 | - It is easy to install software from binary packages built for your machine and OS as the dependencies are already resolved 11 | - For Debian distributions (Ubuntu, Kali, Mint), they come in `.deb` format and package managers are available e.g., `apt`, `dpkg` 12 | - For Redhat-based distributions (Fedora, CentOS, OpenSuse), they come in `.rpm` format and available package managers are `rpm` and `yum` 13 | 14 | ## Open-Source Software Packages 15 | 16 | - Open-source software is software with source code made available under a license in which the copyright holder provides rights to study, change and distribute the software to anyone and for any purpose (GNU GPL) 17 | - Normally distributed as a tarball with: 18 | * Source code files 19 | * README and INSTALL 20 | * AUTHORS 21 | * Configure script 22 | * `Makefile.am` and `Makefile.in` 23 | - Source package is eventually converted into a binary package for a platform on which it is configured, built and installed 24 | - Normally we use source packages to install software because: 25 | 1. We cannot find a corresponding binary package 26 | 2. We want to enhance the functionality of a piece of software 27 | 3.
We want to fix a bug in a piece of software 28 | - Download options: 29 | * Download via ftp or `wget` 30 | * Use the advanced packaging tool: `sudo apt-get source hello` 31 | * Use github 32 | - Source package is eventually converted into a binary for a platform on which it is configured, built and installed 33 | * Many times we all have recited the following magic spell to install a unix open-source tarball: 34 | + `./configure` 35 | + `make` 36 | + `sudo make install` 37 | - Example: 38 | ``` 39 | mkdir hellopackage 40 | cd hellopackage 41 | wget ftp://ftp.gnu.org/gnu/hello/hello-2.10.tar.gz 42 | tar xzf hello-2.10.tar.gz 43 | cd hello-2.10 44 | ``` 45 | - `src` contains source code with `hello.c` and `system.h` 46 | - `man` contains the man page 47 | - Note: there is no `makefile` in this package, we need to create it 48 | - `./configure` 49 | * Checks for dependencies required for the build and install process 50 | * This script will create a `makefile` for you 51 | - `make` 52 | - `ls src` will show `.o` files created by `make` 53 | - `sudo make install` 54 | * Binary and man pages are copied 55 | - `which hello` 56 | * Now exists in `/usr/local/bin` 57 | - `hello` 58 | * Just prints hello world 59 | - `sudo make uninstall` 60 | 61 | ## Packaging Your Software using GNU Autotools autoconf & automake 62 | 63 | - Packaging software using GNU autotools 64 | * `configure.ac` is used by `aclocal` to generate `aclocal.m4` 65 | * `configure.ac` is used by `autoconf` to generate `configure` 66 | * `Makefile.am` is used by `automake` to generate `Makefile.in` 67 | * `Makefile.in` is used by `configure` to generate `makefile` 68 | * `make dist` generates `myexe-1.0.tar.gz` (tarball) 69 | - Example: 70 | * Same `src` as in previous examples 71 | * `configure.ac` 72 | ``` 73 | AC_INIT([myexe],[1.0],[arif@pucit.edu.pk]) # Required 74 | AM_INIT_AUTOMAKE # Use automake for this project 75 | AC_PROG_CC([gcc cl cc]) # Compiler dependencies 76 | AC_CONFIG_FILES([Makefile]) # Convert
makefile.in to makefile 77 | AC_OUTPUT 78 | ``` 79 | * `aclocal` will generate `aclocal.m4` 80 | * `autoconf` will generate `configure` 81 | * `makefile.am` 82 | ``` 83 | AUTOMAKE_OPTIONS=foreign 84 | bin_PROGRAMS = myexe 85 | myexe_SOURCES = src/myadd.c src/mysub.c src/mymul.c src/mydiv.c src/prog1.c src/mymath.h 86 | ``` 87 | * `automake` 88 | * `./configure` 89 | * `makefile` that results is huge 90 | * `make dist` to create the tarball 91 | + Tarball will contain all of the files in `src` as well as the executable 92 | 93 | ## Packaging Software with CMake 94 | 95 | - `cmake` is a cross-platform `makefile` generator 96 | - Effort to deploy a better way to configure, build and deploy complex software written in various languages across many different platforms 97 | 98 | ## How Does CMake Work 99 | 100 | - CMake utility reads the project description from a file named `CMakeLists.txt` and generates a build system for a Makefile project, VS project, Eclipse project, etc 101 | - Example: 102 | * Consider the following project: 103 | + `CMakeLists.txt` 104 | + `include` 105 | - `mymath.h` (contains the prototypes) 106 | + `lib` 107 | - `libarifmath.a` (contains `.o` files created in previous sections - view with the `ar` utility) 108 | - `libarifmath.so` 109 | + `man` 110 | - `myadd.3` (contains man page info) 111 | + `src` 112 | - `prog1.c` (contains C code) 113 | - We should build out of source so that: 114 | * Generated files remain separate from source files 115 | * We can generate multiple build trees from the same source tree 116 | * We can delete the build directory later and perform a clean build again 117 | - `CMakeLists.txt` 118 | ``` 119 | cmake_minimum_required (VERSION 3.7) 120 | project(ex1_cmakeproject) 121 | include_directories(${CMAKE_SOURCE_DIR}/include) 122 | link_directories(${CMAKE_SOURCE_DIR}/lib) 123 | set(SOURCES src/prog1.c) 124 | add_executable(myexe ${PROJECT_SOURCE_DIR}/${SOURCES}) 125 | target_link_libraries(myexe libc.so libarifmath.a) 126 | install(TARGETS
myexe DESTINATION /usr/bin) 127 | install(FILES man/myadd.3 DESTINATION /usr/share/man/man3) 128 | include(InstallRequiredSystemLibraries) 129 | set(CPACK_GENERATOR "DEB") 130 | set(CPACK_DEBIAN_PACKAGE_MAINTAINER "Arif Butt") 131 | include(CPack) 132 | ``` 133 | - `cmake` 134 | * Produces `makefile` along with other cmake-related files 135 | * `make` to generate `myexe` 136 | * `make install` to install `myexe` 137 | - `cpack --config CPackSourceConfig.cmake` to generate the source package 138 | - `cpack --config CPackConfig.cmake` to generate the Debian package (per `CPACK_GENERATOR`) -------------------------------------------------------------------------------- /fundamentals/cse-421-intro-to-os/lecture-6-synch-primitives.md: 1 | # Lecture 6 2 | 3 | ## Locks 4 | 5 | - Locks are a synchronization primitive used to implement critical sections 6 | - Threads release a lock when leaving a critical section 7 | - Previously we covered spinlocks 8 | * Lock for the fact that it guards a critical section 9 | * Spin describing the process of acquiring it 10 | - Spinlocks are rarely used on their own to solve synchronization problems 11 | - Spinlocks are commonly used to build more useful synchronization primitives 12 | - If we go back to the previous problem: 13 | ``` 14 | void giveGWATheMoolah(account_t account, int largeAmount) { 15 | int gwaHas = get_balance(account); 16 | gwaHas = gwaHas + largeAmount; 17 | put_balance(account, gwaHas); 18 | notifyGWAThatHeIsRich(gwaHas); 19 | return; 20 | } 21 | ``` 22 | - We can apply a lock: 23 | ``` 24 | lock gwaWalletLock; // Need to initialize somewhere 25 | 26 | void giveGWATheMoolah(account_t account, int largeAmount) { 27 | + lock_acquire(&gwaWalletLock); 28 | int gwaHas = get_balance(account); 29 | gwaHas = gwaHas + largeAmount; 30 | put_balance(account, gwaHas); 31 | + lock_release(&gwaWalletLock); 32 | notifyGWAThatHeIsRich(gwaHas); 33 | return; 34 | } 35 | ``` 36 | -
If we call `lock_acquire()` while another thread is in the critical section, then the acquiring thread must wait until the thread holding the lock calls `lock_release()` 37 | 38 | ## How to Wait 39 | 40 | - Two ways to wait: 41 | 1. Active waiting: repeatedly try to acquire the lock until it is released 42 | 2. Passive waiting: tell the kernel what we are waiting for, go to sleep, and rely on `lock_release()` to awaken us 43 | - There are cases where spinning is the right thing to do. When? 44 | * Only on multi-core systems. Why? 45 | + On single-core systems, nothing can change unless we allow another thread to run 46 | * If the critical section is short 47 | + Balance the length of the critical section against the overhead of a context switch 48 | 49 | ## How to Sleep 50 | 51 | - The kernel provides functionality allowing kernel threads to sleep and wake on a `key` 52 | - `thread_sleep(key)` tells the kernel, "I am going to sleep, but please wake me up when `key` happens" 53 | - `thread_wake(key)` tells the kernel, "please wake all the threads that were waiting for `key`" 54 | - Similar functionality can be implemented in user space 55 | 56 | 57 | - Locks are designed to protect critical sections 58 | - `lock_release()` can be considered a signal from the thread inside the critical section to other threads indicating that they can proceed 59 | - What about other kinds of signals that I want to deliver, e.g., 60 | * The buffer has data in it 61 | * Child has exited 62 | 63 | ## Condition Variables 64 | 65 | - We can do this using condition variables 66 | * A condition variable is a signaling mechanism allowing threads to: 67 | + `cv_wait` until a condition is true 68 | + `cv_signal` to notify other threads when the condition becomes true 69 | - The condition is usually represented as some change to shared state, e.g., 70 | * The buffer has data in it: `bufsize > 0` 71 | * `cv_wait`: notify me when the buffer has data in it 72 | * `cv_signal`: I just put data in the
buffer, so notify the threads that are waiting for the buffer to have data 73 | - Condition variables can convey more information than locks about changes to the state of the world 74 | * Example: the buffer can be full, empty, or neither 75 | * If the buffer is full, we can let threads withdraw but not add items 76 | * If the buffer is empty, we can let threads add but not withdraw items 77 | * If the buffer is neither full nor empty, we can let threads both add and withdraw items 78 | * We have three different buffer states and two different threads (producer and consumer) 79 | 80 | ## Locking Multiple Resources 81 | 82 | - Locks protect access to shared resources 83 | - Threads may need multiple shared resources to perform some operation 84 | - Example: 85 | * Consider two threads `A` and `B` that both need simultaneous access to resources `1` and `2` 86 | 1. Thread `A` runs, grabs the lock for resource `1` 87 | 2. Context switch 88 | 3. Thread `B` runs, grabs the lock for resource `2` 89 | 4. Context switch 90 | 5. Thread `A` runs, tries to acquire the lock for resource `2` 91 | 6. Thread sleeps 92 | 7. Thread `B` runs, tries to acquire the lock for resource `1` 93 | - Neither thread will ever wake up, as they have a circular dependency 94 | * This is referred to as a deadlock 95 | - Self deadlock happens when a single thread deadlocks with itself 96 | * Thread `A` acquires resource `1` 97 | * Thread `A` then tries to reacquire resource `1` 98 | * Why would this happen? 99 | + `foo()` needs resource `1`, `bar()` needs resource `1` 100 | + While holding the lock on resource `1`, `foo()` calls `bar()` 101 | * Solve this problem with recursive locks, which allow a thread to reacquire a lock it already holds as long as calls to acquire are matched by calls to release 102 | * This is fairly common 103 | 104 | ## Conditions for Deadlock 105 | 106 | - A deadlock cannot occur unless all of the following conditions are met: 107 | 1. Protected access to shared resources, which implies waiting 108 | 2. 
No resource preemption, meaning that the system cannot forcibly take a resource from a thread holding it 109 | 3. Multiple independent requests, meaning a thread can hold some resources while requesting others 110 | 4. Circular dependency graph, meaning that Thread `A` is waiting for Thread `B`, which is waiting for Thread `C`, which is waiting for Thread `D`, which is waiting for Thread `A` 111 | 112 | ## Deadlock vs Starvation 113 | 114 | - Starvation is an equally problematic condition in which one or more threads do not make progress 115 | * Starvation differs from deadlock in that some threads make progress and it is, in fact, those threads that are preventing the starving threads from proceeding 116 | 117 | ## Producer-Consumer 118 | 119 | - The producer and consumer share a fixed-size buffer 120 | - The producer can add items to the buffer if it is not full 121 | - The consumer can withdraw items from the buffer if it is not empty 122 | - Ensure: 123 | 1. Producers must wait if the buffer is full 124 | 2. Consumers must wait if the buffer is empty 125 | 3. Producers should not be sleeping if there is room in the buffer 126 | 4. Consumers should not be sleeping if there are items in the buffer --------------------------------------------------------------------------------