├── .gitignore
├── .vscode
│   └── settings.json
├── Lesson 02 - Course Overview
│   └── lesson-02-notes.md
├── .gitattributes
├── Lesson 01 - Course Readiness Survey
│   └── lesson-01-notes.md
├── Lesson 06 - Thread Case Study - PThreads
│   ├── 01-pthread-creation-1-quiz.md
│   ├── 04-lesson-review.md
│   ├── 03-pthread-creation-3-quiz.md
│   ├── 02-pthread-creation-2-quiz.md
│   └── lesson-06-notes.md
├── Lesson 18 - Distributed File Systems
│   ├── 01-file-caching-quiz.md
│   ├── 04-nfs-file-handle-quiz.md
│   ├── 06-lesson-review.md
│   ├── 05-nfs-cache-consistency-quiz.md
│   ├── 02-dfs-data-structure-quiz.md
│   ├── 03-replication-vs-partitioning-quiz.md
│   └── lesson-18-notes.md
├── Lesson 14 - Synchronization Constructs
│   ├── 01-mutex-via-semaphore-quiz.md
│   ├── 07-lesson-review.md
│   ├── 05-test-and-test-and-set-spinlock-quiz.md
│   ├── 06-queueing-lock-array-quiz.md
│   ├── 03-spinlock-2-quiz.md
│   ├── 02-spinlock-1-quiz.md
│   ├── 04-conflicting-metrics-quiz.md
│   └── lesson-14-notes.md
├── Lesson 15 - IO Management
│   ├── 06-block-device-quiz.md
│   ├── 02-io-devices-as-files-quiz.md
│   ├── 08-lesson-review
│   ├── 04-looking-at-dev-quiz.md
│   ├── 07-inode-quiz.md
│   ├── 03-pseudo-devices-quiz.md
│   ├── 01-io-devices-quiz.md
│   ├── 05-dma-vs-pio-quiz.md
│   └── lesson-15-notes.md
├── Lesson 17 - Remote Procedure Calls
│   ├── 03-xdr-data-types-quiz.md
│   ├── 04-xdr-encoding-quiz.md
│   ├── 02-square.x-return-type-quiz.md
│   ├── 05-lesson-review.md
│   ├── 01-rpc-failure-quiz.md
│   └── lesson-17-notes.md
├── Lesson 03 - Introduction to Operating Systems
│   ├── 01-operating-system-components-quiz.md
│   ├── 04-lesson-review.md
│   ├── 03-system-calls-quiz.md
│   ├── 02-abstraction-or-arbitration-quiz.md
│   └── lesson-03-notes.md
├── Lesson 12 - Memory Management
│   ├── 02-page-table-size-quiz.md
│   ├── 05-lesson-review.md
│   ├── 04-check-pointing-quiz.md
│   ├── 01-multi-level-page-table-quiz.md
│   ├── 03-least-recently-used-quiz.md
│   └── lesson-12-notes.md
├── Lesson 05 - Threads and Concurrency
│   ├── 06-lesson-review.md
│   ├── 02-mutex-quiz.md
│   ├── 04-critical-section-quiz.md
│   ├── 03-condition-variable-quiz.md
│   ├── 01-process-vs-threads-quiz.md
│   ├── 05-multi-threading-patterns-quiz.md
│   └── lesson-05-notes.md
├── Lesson 19 - Distributed Shared Memory
│   ├── 06-consistency-models-4-quiz.md
│   ├── 03-consistency-models-1-quiz.md
│   ├── 05-consistency-models-3-quiz.md
│   ├── 04-consistency-models-2-quiz.md
│   ├── 07-consistency-models-5-quiz.md
│   ├── 01-implementing-dsm-quiz.md
│   ├── 08-lesson-review.md
│   ├── 02-dsm-performance-quiz.md
│   └── lesson-19-notes.md
├── Lesson 04 - Processes and Process Management
│   ├── 07-lesson-review.md
│   ├── 03-process-state-quiz.md
│   ├── 04-parent-process-quiz.md
│   ├── 06-shared-memory-quiz.md
│   ├── 02-hot-cache-quiz.md
│   ├── 05-scheduler-responsibility-quiz.md
│   ├── 01-virtual-address-quiz.md
│   └── lesson-04-notes.md
├── Lesson 20 - Data Center Technologies
│   ├── 01-data-center-quiz.md
│   ├── 02-homogeneous-design-quiz.md
│   ├── 03-heterogeneous-design-quiz.md
│   ├── 04-scale-out-limitations-quiz.md
│   ├── 06-cloud-failure-probability-quiz.md
│   ├── 07-lesson-review.md
│   ├── 05-cloud-computing-definitions-quiz.md
│   └── lesson-20-notes.md
├── Lesson 08 - Thread Design Considerations
│   ├── 03-number-of-threads-quiz.md
│   ├── 05-lesson-review.md
│   ├── 01-thread-structures-quiz.md
│   ├── 02-pthread-concurrency-quiz.md
│   ├── 04-signals-quiz.md
│   └── lesson-08-notes.md
├── Lesson 11 - Scheduling
│   ├── 05-cpi-experiment-quiz.md
│   ├── 02-preemptive-scheduling-quiz.md
│   ├── 06-lesson-review.md
│   ├── 04-linux-schedulers-quiz.md
│   ├── 01-sjf-performance-quiz.md
│   ├── 03-time-slice-quiz.md
│   └── lesson-11-notes.md
├── Lesson 16 - Virtualization
│   ├── 07-bt-and-pv-quiz.md
│   ├── 01-virtualization-tech-quiz.md
│   ├── 05-virtualization-requirements-quiz.md
│   ├── 09-lesson-review.md
│   ├── 02-benefits-of-virtualization-1-quiz.md
│   ├── 08-hardware-virtualization-quiz.md
│   ├── 03-benefits-of-virtualization-2-quiz.md
│   ├── 04-bare-metal-or-hosted-quiz.md
│   ├── 06-problematic-instructions-quiz.md
│   └── lesson-16-notes.md
├── Lesson 13 - Inter-process Communication
│   ├── 01-ipc-comparison-quiz.md
│   ├── 03-lesson-review.md
│   ├── 02-message-queue-quiz.md
│   └── lesson-13-notes.md
├── Lesson 21 - Sample Final Questions
│   ├── 09-rpc-data-types.md
│   ├── 07-pio.md
│   ├── 06-page-table-size.md
│   ├── 08-inode-structure.md
│   ├── 04-synchronization.md
│   ├── 05-spinlocks.md
│   ├── 03-hardware-counters.md
│   ├── 11-consistency-models.md
│   ├── 12-distributed-applications.md
│   ├── 10-dfs-semantics.md
│   ├── 01-time-slices.md
│   └── 02-linux-o(1)-scheduler.md
├── Lesson 10 - Sample Midterm Questions
│   ├── 05-signals.md
│   ├── 01-process-creation.md
│   ├── 08-performance-observations.md
│   ├── 06-solaris-papers.md
│   ├── 02-multi-threading-and-one-cpu.md
│   ├── 04-calendar-critical-section.md
│   ├── 03-critical-section.md
│   └── 07-pipeline-model.md
├── Lesson 07 - Problem Set 1
│   ├── 03-simple-socket-server.c
│   ├── 02-simple-socket-client.c
│   ├── 04-the-echo-protocol-client.c
│   ├── 01-priority-readers-and-writers.c
│   └── 04-the-echo-protocol-server.c
└── Lesson 09 - Thread Performance Considerations
    ├── 04-lesson-review.md
    ├── 02-performance-observation-quiz.md
    ├── 01-models-and-memory-quiz.md
    ├── 03-experimental-design-quiz.md
    └── lesson-09-notes.md

/.gitignore:
--------------------------------------------------------------------------------

*.bin
--------------------------------------------------------------------------------
/.vscode/settings.json:
--------------------------------------------------------------------------------
{
  "files.insertFinalNewline": true
}
--------------------------------------------------------------------------------
/Lesson 02 - Course Overview/lesson-02-notes.md:
--------------------------------------------------------------------------------
# Lesson 2: Course Overview
--------------------------------------------------------------------------------
/.gitattributes:
--------------------------------------------------------------------------------
# Auto detect text files and perform LF normalization
* text=auto
--------------------------------------------------------------------------------
/Lesson 01 - Course Readiness Survey/lesson-01-notes.md:
--------------------------------------------------------------------------------
# Lesson 1: Course Readiness Survey
--------------------------------------------------------------------------------
/Lesson 06 - Thread Case Study - PThreads/01-pthread-creation-1-quiz.md:
--------------------------------------------------------------------------------
# Quiz 1: PThread Creation 1

**Code provided in lecture**.

A: The following output is valid:

```bash
Hello Thread
Hello Thread
Hello Thread
Hello Thread
```
--------------------------------------------------------------------------------
/Lesson 18 - Distributed File Systems/01-file-caching-quiz.md:
--------------------------------------------------------------------------------
# Quiz 1: File Caching

**Where** do you think **files or file blocks can be cached** in a **DFS** with a **single file server** and **many clients**?

A: Files or file blocks can be cached in the clients' memory, on the clients' local disks (if present), and on the server side in the server's memory (buffer cache).
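
For intuition beyond the quiz answer: client-side caching in NFS-style systems is usually paired with a freshness check before a cached block is reused. The sketch below is illustrative only — the structure and names are invented, not from the lecture:

```c
/* Hypothetical sketch of time-based cache validation in a DFS client. */
#include <stdbool.h>
#include <time.h>

typedef struct {
    time_t last_validated; /* when this entry was last checked with the server */
    time_t freshness;      /* window during which the cached copy is trusted   */
} cache_entry_t;

/* True if the cached block may be used without contacting the file server. */
bool cache_entry_is_fresh(const cache_entry_t *e, time_t now) {
    return (now - e->last_validated) < e->freshness;
}
```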
--------------------------------------------------------------------------------
/Lesson 14 - Synchronization Constructs/01-mutex-via-semaphore-quiz.md:
--------------------------------------------------------------------------------
# Quiz 1: Mutex Via Semaphore

Complete the code snippet (shown in lecture slide) so that the **semaphore** has behavior identical to a **mutex** used by threads within a process.

A: The answers are `1` and `0`, respectively.
--------------------------------------------------------------------------------
/Lesson 15 - IO Management/06-block-device-quiz.md:
--------------------------------------------------------------------------------
# Quiz 6: Block Device

In Linux, the `ioctl()` command can be used to manipulate devices. Complete the code snippet (provided in lecture slides), using `ioctl()`, to **determine the size** of a **block device**.

A: The answer is `BLKGETSIZE`.
--------------------------------------------------------------------------------
/Lesson 17 - Remote Procedure Calls/03-xdr-data-types-quiz.md:
--------------------------------------------------------------------------------
# Quiz 3: XDR Data Types

An **RPC routine** uses the following **XDR data type**: `int data<5>`; assume the array is full. **How many bytes** are needed to represent this **5 element array** in a C client on a **32-bit machine**?

A: 28 bytes (4 for the length field, 4 for the pointer, and 5 × 4 = 20 for the integers themselves)
--------------------------------------------------------------------------------
/Lesson 03 - Introduction to Operating Systems/01-operating-system-components-quiz.md:
--------------------------------------------------------------------------------
# Quiz 1: Operating Systems Components

Which of the following are likely components of an operating system?

A: file system (hides hardware complexity), device driver (makes decisions regarding hardware use), scheduler (distributes CPU time across processes).
--------------------------------------------------------------------------------
/Lesson 12 - Memory Management/02-page-table-size-quiz.md:
--------------------------------------------------------------------------------
# Quiz 2: Page Table Size

On a **12-bit architecture**, what is the **number of entries in the page table** if the **page size** is **32 bytes**? How about **512 bytes**? (assume single-level page table)

A: The answers are 128 entries ($2^{12} / 2^5 = 2^7$) and 8 entries ($2^{12} / 2^9 = 2^3$), respectively.
--------------------------------------------------------------------------------
/Lesson 05 - Threads and Concurrency/06-lesson-review.md:
--------------------------------------------------------------------------------
# Quiz 6: Lesson Review

What did you learn in this lesson?

A: In this lesson, we learned about threads, how threads differ from processes, thread mechanisms, ways to control threads and avoid spurious wake-up or deadlock scenarios, and different multi-threading models and patterns.
--------------------------------------------------------------------------------
/Lesson 19 - Distributed Shared Memory/06-consistency-models-4-quiz.md:
--------------------------------------------------------------------------------
# Quiz 6: Consistency Models 4

Consider the following sequence of operations (see lecture slides). Is this execution **weakly consistent**?

- Option 1: yes
- Option 2: no

A: The correct answer is Option 1, since _P2_ and _P3_ are not synchronized.
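
For intuition, here is an invented timeline (not the one from the lecture slides): weak consistency only imposes ordering around explicit sync operations, so a processor that never synchronizes may observe updates in any order.

```
P1: W(x)1  W(y)1  sync
P2:                      sync  R(x)1  R(y)1   <- synced: must see both writes
P3:        R(y)1  R(x)0                       <- never syncs: any order is allowed
```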
--------------------------------------------------------------------------------
/Lesson 04 - Processes and Process Management/07-lesson-review.md:
--------------------------------------------------------------------------------
# Quiz 7: Lesson Review

What did you learn in this lesson?

A: In this lesson, we learned about what a process is, the ways in which an OS manages processes, address space and memory management, the PCB, the process life cycle, and also how processes communicate with each other.
--------------------------------------------------------------------------------
/Lesson 20 - Data Center Technologies/01-data-center-quiz.md:
--------------------------------------------------------------------------------
# Quiz 1: Data Center

How many data centers were there in the world in 2015? How much space (in square feet) was required to house all of the world's data centers (in 2015)?

A: There were around 500K data centers in 2015 which took up approximately 286 million square feet.
--------------------------------------------------------------------------------
/Lesson 17 - Remote Procedure Calls/04-xdr-encoding-quiz.md:
--------------------------------------------------------------------------------
# Quiz 4: XDR Encoding

An **RPC routine** uses the following **XDR data type**: `int data<5>`; assume the array is full. **How many bytes** are needed to **encode** this **5 element array** to be sent from client to server (32-bit; do not include bytes for headers/protocol!)?

A: 24 bytes (a 4-byte length field plus 5 × 4 = 20 bytes of data; no pointer is sent on the wire)
--------------------------------------------------------------------------------
/Lesson 08 - Thread Design Considerations/03-number-of-threads-quiz.md:
--------------------------------------------------------------------------------
# Quiz 3: Number of Threads

- In the Linux kernel's codebase, **a minimum of how many threads** are needed to allow a system to boot?
- What is the **name of the variable** used to set this limit?

A: The following answers are valid,

- `20`
- `max_threads`
--------------------------------------------------------------------------------
/Lesson 17 - Remote Procedure Calls/02-square.x-return-type-quiz.md:
--------------------------------------------------------------------------------
# Quiz 2: Square.x Return Type

What is the **return type** of `squareproc_1` if this `square.x` file is compiled with:

- `rpcgen -C`
- `rpcgen -C -M`

A: The answers are as follows,

- `rpcgen -C`: `square_out*`
- `rpcgen -C -M`: `enum clnt_stat`
--------------------------------------------------------------------------------
/Lesson 03 - Introduction to Operating Systems/04-lesson-review.md:
--------------------------------------------------------------------------------
# Quiz 4: Lesson Review

What did you learn in this lesson?

A: In this lesson, we learned about OS elements (abstractions, mechanisms, and policies), design principles, the boundary between user level and kernel level, and finally, the difference between monolithic and modular OS designs.
--------------------------------------------------------------------------------
/Lesson 08 - Thread Design Considerations/05-lesson-review.md:
--------------------------------------------------------------------------------
# Quiz 5: Lesson Review

What did you learn in this lesson?

A: In this lesson, we learned about KLTs vs ULTs, thread data structures, hard and light process state, thread management, issues when using multiple CPUs, timing issues (synching), interrupts vs signals, and interrupt and signal handling.
--------------------------------------------------------------------------------
/Lesson 11 - Scheduling/05-cpi-experiment-quiz.md:
--------------------------------------------------------------------------------
# Quiz 5: CPI Experiment

This diagram (shown in lecture) shows the results from **Fedorova's experiments**.
What do you think these results say about using **CPI** (cycles-per-instruction) for scheduling?

A: Scheduling tasks with mixed CPIs together is the way to go, since mixed-CPI workloads yield a higher IPC while same-CPI workloads yield a lower IPC.
--------------------------------------------------------------------------------
/Lesson 19 - Distributed Shared Memory/03-consistency-models-1-quiz.md:
--------------------------------------------------------------------------------
# Quiz 3: Consistency Models 1

Consider the following sequence of operations (see lecture slides). Is this execution **sequentially consistent**?

- Option 1: yes
- Option 2: no

A: The correct answer is Option 1, since a single processor performs the updates while another reads them.
--------------------------------------------------------------------------------
/Lesson 19 - Distributed Shared Memory/05-consistency-models-3-quiz.md:
--------------------------------------------------------------------------------
# Quiz 5: Consistency Models 3

Consider the following sequence of operations (see lecture slides). Is this execution **sequentially consistent**? **Causally consistent**?

- Option 1: yes
- Option 2: no

A: The sequence is neither sequentially consistent nor causally consistent.
--------------------------------------------------------------------------------
/Lesson 06 - Thread Case Study - PThreads/04-lesson-review.md:
--------------------------------------------------------------------------------
# Quiz 4: Lesson Review

What did you learn in this lesson?

A: In this lesson, we learned about PThreads and equivalent Birrell's mechanisms, what PThread stands for, how PThreads are created, detaching PThreads, compiling PThreads, PThread mutexes, and PThread condition variables as well as examples.
--------------------------------------------------------------------------------
/Lesson 15 - IO Management/02-io-devices-as-files-quiz.md:
--------------------------------------------------------------------------------
# Quiz 2: I/O Devices as Files

The following **Linux commands** all perform the same **operation** on an **I/O device (represented as a file)**. What **operation** do they perform?

- `cp file /dev/lp0`
- `cat file > /dev/lp0`
- `echo "Hello, world" > /dev/lp0`

A: The answer is _print_
--------------------------------------------------------------------------------
/Lesson 19 - Distributed Shared Memory/04-consistency-models-2-quiz.md:
--------------------------------------------------------------------------------
# Quiz 4: Consistency Models 2

Consider the following sequence of operations (see lecture slides). Is this execution **sequentially consistent**? **Causally consistent**?

- Option 1: yes
- Option 2: no

A: The sequence is not sequentially consistent, but it is causally consistent.
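
A classic textbook-style example (again invented, not the lecture's) of an execution that is causally but not sequentially consistent: two causally unrelated writes are observed in opposite orders by two readers, so no single total order exists, yet no causal relationship is violated.

```
P1: W(x)1
P2:        W(y)1
P3: R(x)1  R(y)0   <- sees x's write first
P4: R(y)1  R(x)0   <- sees y's write first
```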
--------------------------------------------------------------------------------
/Lesson 05 - Threads and Concurrency/02-mutex-quiz.md:
--------------------------------------------------------------------------------
# Quiz 2: Mutex

Threads *T1-T5* are contending for a mutex *m*, *T1* is the first to obtain the mutex. Which thread will get access to *m* after *T1* releases it?

- T2
- T3
- T4
- T5

A: From the diagram in lecture, only *T2*, *T4*, and *T5* are valid; *T3* does not request the mutex until later in time.
--------------------------------------------------------------------------------
/Lesson 16 - Virtualization/07-bt-and-pv-quiz.md:
--------------------------------------------------------------------------------
# Quiz 7: Binary Translation and Paravirtualization

Which of the following do you think will cause a **trap** and **exit to the hypervisor** for both **binary translation** and **paravirtualization** VMs?

- Option 1: access a page that's swapped
- Option 2: update to page table entry

A: The correct answer is Option 1.
--------------------------------------------------------------------------------
/Lesson 19 - Distributed Shared Memory/07-consistency-models-5-quiz.md:
--------------------------------------------------------------------------------
# Quiz 7: Consistency Models 5

If you ignore the sync operations (see lecture slides), is this execution **causally consistent**?

- Option 1: yes
- Option 2: no

A: The correct answer is Option 2, since causal consistency does not permit writes from a single processor to be observed out of order.
--------------------------------------------------------------------------------
/Lesson 04 - Processes and Process Management/03-process-state-quiz.md:
--------------------------------------------------------------------------------
# Quiz 3: Process State

The CPU is able to execute a process when the process is in which state(s)?

- Option 1: running
- Option 2: ready
- Option 3: waiting
- Option 4: new

A: Options 1 and 2 are valid; the CPU is not able to execute a process when the process state is waiting or new.
--------------------------------------------------------------------------------
/Lesson 15 - IO Management/08-lesson-review:
--------------------------------------------------------------------------------
# Quiz 8: Lesson Review

What did you learn in this lesson?

A: In this lesson, we learned about O/S support for I/O devices, I/O device basics, device drivers, types of devices, programmed I/O, direct memory access, different types of accesses, the block device stack, file systems, inodes (including inodes with indirect pointers), and disk access optimizations.
--------------------------------------------------------------------------------
/Lesson 15 - IO Management/04-looking-at-dev-quiz.md:
--------------------------------------------------------------------------------
# Quiz 4: Looking at /Dev

Run the command `ls -la /dev` in a **Linux environment**. What are some of the **device names** you see? Enter at least **five device names** in a comma-separated list in the text-box below (see lecture slides).

A: The answer is as follows,

- `zero`
- `null`
- `tty`
- `hda`
- `sda`
--------------------------------------------------------------------------------
/Lesson 16 - Virtualization/01-virtualization-tech-quiz.md:
--------------------------------------------------------------------------------
# Quiz 1: Virtualization Tech

Based on the classical definition of **virtualization** by Popek and Goldberg, which of the following do you think are **virtualization technologies**? Check all that apply.

- Option 1: Virtual Box
- Option 2: Java Virtual Machine
- Option 3: Virtual Game Boy

A: The answer is Option 1.
--------------------------------------------------------------------------------
/Lesson 03 - Introduction to Operating Systems/03-system-calls-quiz.md:
--------------------------------------------------------------------------------
# Quiz 3: System Calls

On a 64-bit Linux-based OS, which system call is used to:

- Send a signal to a process?
- Set the group identity of a process?
- Mount a file system?
- Read/write system parameters?

A: The answers are respectively,

- `kill`
- `setgid`
- `mount`
- `sysctl`
--------------------------------------------------------------------------------
/Lesson 04 - Processes and Process Management/04-parent-process-quiz.md:
--------------------------------------------------------------------------------
# Quiz 4: Parent Process

On UNIX-based OS, which process is often regarded as *the parent of all processes*?

Extra credit: on the Android OS, which process is regarded as *the parent of all app processes*?

A: `init` is the parent of all processes on UNIX-based OS and `zygote` is the parent of all app processes on Android.
--------------------------------------------------------------------------------
/Lesson 13 - Inter-process Communication/01-ipc-comparison-quiz.md:
--------------------------------------------------------------------------------
# Quiz 1: IPC Comparison

Consider using IPC to communicate between processes. You can either use a **message-passing** or a **memory-based** API. Which one do you think will perform better?

- Option 1: message-passing
- Option 2: shared memory
- Option 3: neither, it depends

A: The correct answer is Option 3.
--------------------------------------------------------------------------------
/Lesson 15 - IO Management/07-inode-quiz.md:
--------------------------------------------------------------------------------
# Quiz 7: Inode

An **inode** has the following structure: **each block pointer is 4 bytes**. If **a block on disk is 1 kB**, what is the **maximum file size** that can be supported by this **inode structure** (nearest GB)? What is the **maximum file size** if **a block on disk is 8 kB** (nearest TB)?

A: The answers are as follows,

- 16 GB
- 64 TB
--------------------------------------------------------------------------------
/Lesson 21 - Sample Final Questions/09-rpc-data-types.md:
--------------------------------------------------------------------------------
# Problem 9: RPC Data Types

A RPC routine `get_coordinates()` returns the N-dimensional coordinates of a data point, where each coordinate is an integer.

Write the elements of the C data structure that corresponds to the 3D coordinates of a data point.

A: We would need an `int len` (here 3) and an `int *val` pointing to the coordinate values to represent the 3D coordinates of a data point.
--------------------------------------------------------------------------------
/Lesson 04 - Processes and Process Management/06-shared-memory-quiz.md:
--------------------------------------------------------------------------------
# Quiz 6: Shared Memory

**Shared memory-based communication** performs better than **message passing communication**.

- Option 1: true
- Option 2: false
- Option 3: it depends

A: Option 3; it depends on whether the one-time setup cost of shared memory can be amortized across a sufficiently large number of messages.
--------------------------------------------------------------------------------
/Lesson 08 - Thread Design Considerations/01-thread-structures-quiz.md:
--------------------------------------------------------------------------------
# Quiz 1: Thread Structure

1. What is the name of the **kernel thread structure** (name of C struct)?
2. What is the name of the data structure - contained in the above data structure - that **describes the process the kernel thread is running** (name of C struct)?

A: The answers are as follows,
1. `kthread_worker`
2. `task_struct`
--------------------------------------------------------------------------------
/Lesson 11 - Scheduling/02-preemptive-scheduling-quiz.md:
--------------------------------------------------------------------------------
# Quiz 2: Preemptive Scheduling

An OS scheduler uses a priority-based algorithm with preemption to schedule tasks. Given values shown in the table (see lecture), complete the finishing times of each task. Assume that _P3_ < _P2_ < _P1_.

A: The answers are as follows,

- _T1_ finishes at 8 s
- _T2_ finishes at 10 s
- _T3_ finishes at 11 s
--------------------------------------------------------------------------------
/Lesson 11 - Scheduling/06-lesson-review.md:
--------------------------------------------------------------------------------
# Quiz 6: Lesson Review

What did you learn in this lesson?

A: In this lesson, we learned about scheduling mechanisms, how the CPU scheduler decides how and when processes and threads access shared CPUs, how and when the scheduler schedules tasks, FCFS vs SJF scheduling, preemptive scheduling, round robin scheduling, time-sharing and time slices, run-queues, and Linux schedulers.
--------------------------------------------------------------------------------
/Lesson 13 - Inter-process Communication/03-lesson-review.md:
--------------------------------------------------------------------------------
# Quiz 3: Lesson Review

What did you learn in this lesson?

A: In this lesson, we learned about IPCs, how IPCs can be used to support message passing, different levels of message-based IPC (i.e., processes, OS, kernel), forms of message passing, shared memory IPC, copying versus mapping, SysV, synchronization methods, and what to consider when designing for memory.
--------------------------------------------------------------------------------
/Lesson 16 - Virtualization/05-virtualization-requirements-quiz.md:
--------------------------------------------------------------------------------
# Quiz 5: Virtualization Requirements

Which of the following do you think are **virtualization requirements**?

- Option 1: present virtual platform interface to VMs
- Option 2: provide isolation across VMs
- Option 3: protect guest OS from apps
- Option 4: protect VMM from guest OS

A: The correct answers are all of the above.
--------------------------------------------------------------------------------
/Lesson 18 - Distributed File Systems/04-nfs-file-handle-quiz.md:
--------------------------------------------------------------------------------
# Quiz 4: File Handle

In the previous morsel, we mentioned that **a file handle can become stale**. What does that mean?

- Option 1: the file is outdated
- Option 2: the remote server is not working
- Option 3: the file on the remote server has been removed
- Option 4: the file has been open for too long

A: The correct answer is Option 3.
--------------------------------------------------------------------------------
/Lesson 08 - Thread Design Considerations/02-pthread-concurrency-quiz.md:
--------------------------------------------------------------------------------
# Quiz 2: Pthread Concurrency

- In the pthreads library, **which function sets the concurrency level** (function name)?
- For the above function, **which concurrency value** instructs the implementation to manage the concurrency level as it deems appropriate (integer)?

A: The following answers are valid,

- `pthread_setconcurrency()`
- `0`
--------------------------------------------------------------------------------
/Lesson 10 - Sample Midterm Questions/05-signals.md:
--------------------------------------------------------------------------------
# Problem 5: Signals

If the kernel cannot see user-level signal masks, then how is a signal delivered to a user-level thread (where the signal can be handled)?

A: There must exist a handler (more specifically, a ULT library handler) between the user level and the kernel level: since the kernel cannot see the user-level signal masks, the library handler catches the signal first and then dispatches it to a user-level thread that has the signal enabled.
--------------------------------------------------------------------------------
/Lesson 12 - Memory Management/05-lesson-review.md:
--------------------------------------------------------------------------------
# Quiz 5: Lesson Review

What did you learn in this lesson?

A: In this lesson, we learned about memory management, how the OS uses pages or segments, how to optimize for performance, virtual vs physical memory, page-based vs segment-based memory management, hardware support in memory management, page tables, TLBs, memory allocation and its challenges, demand paging, and check-pointing.
--------------------------------------------------------------------------------
/Lesson 16 - Virtualization/09-lesson-review.md:
--------------------------------------------------------------------------------
# Quiz 9: Lesson Review

What did you learn in this lesson?

A: In this lesson, we learned about virtualization, benefits of virtualization, different models of virtualization (bare-metal vs hosted), processor virtualization, BT vs PV, different types of memory virtualization, device virtualization and models (passthrough, hypervisor direct, split-device), and lastly, hardware virtualization.
--------------------------------------------------------------------------------
/Lesson 19 - Distributed Shared Memory/01-implementing-dsm-quiz.md:
--------------------------------------------------------------------------------
# Quiz 1: Implementing DSM

According to the paper _Distributed Shared Memory: Concepts and Systems_, what is a common task that's **implemented in software** in **hybrid (hardware + software) DSM implementations**?

- Option 1: prefetch pages
- Option 2: address translation
- Option 3: triggering invalidations

A: The correct answer is Option 1.
--------------------------------------------------------------------------------
/Lesson 19 - Distributed Shared Memory/08-lesson-review.md:
--------------------------------------------------------------------------------
# Quiz 8: Lesson Review

What did you learn in this lesson?

A: In this lesson, we learned about DSM, distributed applications vs DSM, hardware DSM vs software DSM and why hardware-supported DSM is expensive, DSM design (granularity, access algorithm, migration, replication, consistency management, and architecture), and consistency models (strict, sequential, causal, and weak).
--------------------------------------------------------------------------------
/Lesson 03 - Introduction to Operating Systems/02-abstraction-or-arbitration-quiz.md:
--------------------------------------------------------------------------------
# Quiz 2: Operating Systems Components

For the following options, indicate if they are examples of abstraction or arbitration.

A: The answers are as follows,

- Distributing memory between multiple processes (arbitration)
- Supporting different types of speakers (abstraction)
- Interchangeable access of hard disk or SSD (abstraction)
--------------------------------------------------------------------------------
/Lesson 17 - Remote Procedure Calls/05-lesson-review.md:
--------------------------------------------------------------------------------
# Quiz 5: Lesson Review

What did you learn in this lesson?

A: In this lesson, we learned about RPC, why we would want to use an RPC (as well as benefits), some requirements for RPCs, RPC structure, RPC steps (i.e., register, bind, call, marshal, receive, unmarshal, actual call, and result), IDL, SUN RPC, XDR compilation (as well as XDR data types, routines, and encoding), and Java RMI.
--------------------------------------------------------------------------------
/Lesson 08 - Thread Design Considerations/04-signals-quiz.md:
--------------------------------------------------------------------------------
# Quiz 4: Signals

Using the most recent POSIX standard, indicate the correct signal names for the following events:

- Terminal interrupt signal
- High bandwidth data is available on a socket
- Background process attempting write
- File size limit exceeded

A: The answers are as follows,

- `SIGINT`
- `SIGURG`
- `SIGTTOU`
- `SIGXFSZ`
--------------------------------------------------------------------------------
/Lesson 13 - Inter-process Communication/02-message-queue-quiz.md:
--------------------------------------------------------------------------------
# Quiz 2: Message Queue

For **message queues**, what are the **Linux system calls** that are used to:

- Send a message to a message queue?
- Receive messages from a message queue?
- Perform a message control operation?
- Get a message identifier?

A: The answers are as follows,

- `msgsnd()`
- `msgrcv()`
- `msgctl()`
- `msgget()`
--------------------------------------------------------------------------------
/Lesson 04 - Processes and Process Management/02-hot-cache-quiz.md:
--------------------------------------------------------------------------------
# Quiz 2: Hot Cache

For the following sentence, check all options that correctly complete it: *when a cache is hot...*

- Option 1: it can malfunction so we must context switch to another process
- Option 2: most process data is in the cache so the process performance will be at its best
- Option 3: sometimes we must context switch

A: Options 2 and 3 are valid.
--------------------------------------------------------------------------------
/Lesson 16 - Virtualization/02-benefits-of-virtualization-1-quiz.md:
--------------------------------------------------------------------------------
# Quiz 2: Benefits of Virtualization 1

If **virtualization** has been around since the 60's, why has it not been used ubiquitously since that time?

- Option 1: virtualization was not efficient
- Option 2: everyone used Microsoft Windows
- Option 3: mainframes were not ubiquitous
- Option 4: other hardware was cheap

A: The correct answers are Options 3 and 4.
--------------------------------------------------------------------------------
/Lesson 20 - Data Center Technologies/02-homogeneous-design-quiz.md:
--------------------------------------------------------------------------------
# Quiz 2: Homogeneous Design

Consider a toy shop where every worker knows how to build any toy (**homogeneous architecture**). If higher order rates start arriving you can keep the **homogeneous architecture** balanced by:

A: The following are valid answers,

- Add more workers (processes)
- Add more workbenches (servers)
- Add more tools, parts, etc. (storage)
--------------------------------------------------------------------------------
/Lesson 05 - Threads and Concurrency/04-critical-section-quiz.md:
--------------------------------------------------------------------------------
# Quiz 4: Critical Section

A toy shop has the following policy, at any point in time:

- Max three new orders can be processed
- If only one new order being processed, then any number of old orders can be processed

Select the appropriate check that needs to be made for the **critical section** (see lecture).

A: The first and last choices are correct (see lecture).
--------------------------------------------------------------------------------
/Lesson 14 - Synchronization Constructs/07-lesson-review.md:
--------------------------------------------------------------------------------
# Quiz 7: Lesson Review

What did you learn in this lesson?

A: In this lesson, we learned about more synchronization constructs, how hardware can support synchronization, spinlocks and similarities to mutexes, semaphores, atomic instructions, SMPs and their characteristics, cache coherence, contention with spinlocks, picking a delay, queueing locks, and spinlock performance for heavy and light loads.
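
To make the spinlock portion of this review concrete, here is a minimal test-and-test-and-set lock. It is a sketch assuming GCC's `__sync` atomic builtins, not code from the lecture:

```c
/* Minimal test-and-test-and-set spinlock sketch (GCC __sync builtins assumed). */
typedef struct { volatile int busy; } spinlock_t;

void spin_lock(spinlock_t *l) {
    while (1) {
        while (l->busy)
            ; /* spin on the (cached) value; no bus traffic while the lock is held */
        if (__sync_lock_test_and_set(&l->busy, 1) == 0)
            return; /* atomic test-and-set succeeded: previous value was free */
    }
}

void spin_unlock(spinlock_t *l) {
    __sync_lock_release(&l->busy); /* writes 0 with release semantics */
}
```

The inner loop spins on an ordinary read, so waiting CPUs hit their caches; the atomic operation is attempted only when the lock looks free, which is what keeps coherence traffic low.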
--------------------------------------------------------------------------------
/Lesson 07 - Problem Set 1/03-simple-socket-server.c:
--------------------------------------------------------------------------------
// Simple socket server
// Minimal sketch

// Include packages
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <unistd.h>

// Constants
#define PORT 8000

// Accept a client socket if available
int main() {
    // Create socket
    int server_fd = socket(AF_INET, SOCK_STREAM, 0);
    if (server_fd < 0) { perror("socket"); exit(1); }

    // Bind to PORT on any local interface
    struct sockaddr_in addr = {0};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = INADDR_ANY;
    addr.sin_port = htons(PORT);
    if (bind(server_fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) { perror("bind"); exit(1); }

    // Listen on socket
    if (listen(server_fd, 1) < 0) { perror("listen"); exit(1); }

    // Accept client
    int client_fd = accept(server_fd, NULL, NULL);
    if (client_fd < 0) { perror("accept"); exit(1); }

    // Close sockets
    close(client_fd);
    close(server_fd);
    return 0;
}
--------------------------------------------------------------------------------
/Lesson 16 - Virtualization/08-hardware-virtualization-quiz.md:
--------------------------------------------------------------------------------
# Quiz 8: Hardware Virtualization

With hardware support for virtualization, guest VMs can run unmodified and can have access to the underlying devices. Given this, do you think the split-device driver model is still relevant?

- Option 1: yes
- Option 2: no

A: The correct answer is Option 1, since the split-device driver model consolidates all of the requests for device access in the service VM.
--------------------------------------------------------------------------------
/Lesson 09 - Thread Performance Considerations/04-lesson-review.md:
--------------------------------------------------------------------------------
# Quiz 4: Lesson Review

What did you learn in this lesson?

A: In this lesson, we learned about performance comparisons between MP, MT, and event-driven models. We also learned about event-driven architectures such as Flash and compared it against the Apache web server. We learned that comparisons depend on the chosen metrics, about AMPED and AMTED architectures, experimental methodology, and designing and running experiments.
--------------------------------------------------------------------------------
/Lesson 11 - Scheduling/04-linux-schedulers-quiz.md:
--------------------------------------------------------------------------------
# Quiz 4: Linux Schedulers

What was the **main reason** the **Linux O(1) scheduler** was replaced by the **CFS scheduler**?

- Option 1: Scheduling a task under high loads took unpredictable amount of time
- Option 2: Low priority task could wait indefinitely and starve
- Option 3: Interactive tasks could wait unpredictable amounts of time to be scheduled

A: The correct answer is Option 3.
--------------------------------------------------------------------------------
/Lesson 12 - Memory Management/04-check-pointing-quiz.md:
--------------------------------------------------------------------------------
# Quiz 4: Check-pointing

Which one of these endings correctly completes the following statement?

*The more frequently you checkpoint...*

- Option 1: the more state you will checkpoint
- Option 2: the higher the overheads of the checkpointing process
- Option 3: the faster you will be able to recover from a fault
- Option 4: all of the above

A: The correct answer is Option 4.
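
A back-of-the-envelope rule not covered in lecture (Young's approximation): with checkpoint cost $\delta$ and mean time between failures $M$, the checkpoint interval that balances overhead against expected recomputation is roughly $T_{opt} \approx \sqrt{2 \delta M}$. Checkpointing much more often than this wastes overhead; much less often risks long recoveries.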
--------------------------------------------------------------------------------
/Lesson 20 - Data Center Technologies/03-heterogeneous-design-quiz.md:
--------------------------------------------------------------------------------
# Quiz 3: Heterogeneous Design

Consider a toy shop where every worker knows how to build some specific toy (**heterogeneous architecture**). If higher order rates start arriving you can keep the **heterogeneous architecture** balanced by:

A: The following are valid answers,

- Add more workers (processes)
- Add more workbenches (servers)
- Add more tools, parts, etc. (storage)
--------------------------------------------------------------------------------
/Lesson 20 - Data Center Technologies/04-scale-out-limitations-quiz.md:
--------------------------------------------------------------------------------
# Quiz 4: Scale Out Limitations

Consider a toy shop where every worker knows how to build any toy (**homogeneous architecture**). Higher order rates start arriving, so the manager keeps _scaling out_: add workers, workbenches, parts. This works until:

A: The following answers are valid,

- Unable to manage immense amount of resources
- Unable to outsource
- Capacity is full
--------------------------------------------------------------------------------
/Lesson 19 - Distributed Shared Memory/02-dsm-performance-quiz.md:
--------------------------------------------------------------------------------
# Quiz 2: DSM Performance

If **access latency (performance)** is a primary concern, which of the following **techniques** would be best to use in your **DSM** design?

- Option 1: migration
- Option 2: caching
- Option 3: replication

A: The correct answers are Options 2 and 3, since migration only pays off for single-reader/single-writer (SRSW) access patterns, while caching and replication make data more readily available at closer locations.
--------------------------------------------------------------------------------
/Lesson 11 - Scheduling/01-sjf-performance-quiz.md:
--------------------------------------------------------------------------------
# Quiz 1: SJF Performance

Assume SJF is used to schedule tasks _T1_, _T2_, and _T3_. Also make the following assumptions:

- Scheduler does not preempt tasks
- Known execution times: _T1_ = 1 s, _T2_ = 10 s, _T3_ = 1 s
- All arrive at the same time t = 0

Calculate the **throughput**, **average completion time**, and **average wait time**.

A: With execution order _T1_, _T3_, _T2_, the tasks complete at 1 s, 2 s, and 12 s. The answers are, respectively, **0.25 tasks/sec** (3/12), **5 sec** ((1 + 2 + 12)/3), and **1 sec** ((0 + 1 + 2)/3)
--------------------------------------------------------------------------------
/Lesson 14 - Synchronization Constructs/05-test-and-test-and-set-spinlock-quiz.md:
--------------------------------------------------------------------------------
# Quiz 5: Test and Test and Set Spinlock

In an SMP system with **N processors**, what is the **complexity** of the memory contention (accesses), relative to N, that will result from releasing a `test_and_test_and_set` **spinlock** (code provided in lecture)?

- Cache coherent with `write_update`
- Cache coherent with `write_invalidate`

A: The answers are as follows,

- *O(n)*
- *O(n^2)*
--------------------------------------------------------------------------------
/Lesson 18 - Distributed File Systems/06-lesson-review.md:
--------------------------------------------------------------------------------
# Quiz 6: Lesson Review

What did you learn in this lesson?

A: In this lesson, we learned about DFS, how DFS are designed and implemented, we also learned about NFS, various models of DFS (replicated, partitioned, or a combination of both), remote file service, the difference between stateless and stateful and how that affects DFS, caching in DFS, replication versus partitioning, and some details on Sprite DFS (workloads, design, and file access).
--------------------------------------------------------------------------------
/Lesson 16 - Virtualization/03-benefits-of-virtualization-2-quiz.md:
--------------------------------------------------------------------------------
# Quiz 3: Benefits of Virtualization 2

If **virtualization** was not widely adopted in the past, what changed? Why did we start to care about virtualization?

- Option 1: servers were underutilized
- Option 2: data centers were becoming too large
- Option 3: companies had to hire more system admins
- Option 4: companies were paying high utility bills to run and cool servers

A: The correct answer is all of the above.
--------------------------------------------------------------------------------
/Lesson 18 - Distributed File Systems/05-nfs-cache-consistency-quiz.md:
--------------------------------------------------------------------------------
# Quiz 5: NFS Cache Consistency

Which of the following **file sharing semantics** are supported by **NFS** and its **cache consistency mechanisms**?

- Option 1: UNIX
- Option 2: session
- Option 3: periodic
- Option 4: immutable
- Option 5: neither

A: The correct choice is Option 5: NFS is not purely a session-based DFS, nor does it purely support periodic file sharing semantics.
--------------------------------------------------------------------------------
/Lesson 18 - Distributed File Systems/02-dfs-data-structure-quiz.md:
--------------------------------------------------------------------------------
# Quiz 2: DFS Data Structure

Imagine a **DFS** where **file sharing** is implemented via a **server-driven** mechanism with **session semantics**. Given this design, which of the following items **should be** part of the **per file data structures** maintained by the **server**?

- Option 1: readers
- Option 2: current writer
- Option 3: current writers
- Option 4: version number

A: The correct answers are Options 1, 3, and 4.
--------------------------------------------------------------------------------
/Lesson 14 - Synchronization Constructs/06-queueing-lock-array-quiz.md:
--------------------------------------------------------------------------------
# Quiz 6: Queueing Lock Array

Assume we are using **Anderson's queueing spinlock implementation** where each array element of the **queue** can have one of two values: `has_lock(0)` and `must_wait(1)`. If a system has 32 CPUs, then how large is the **array data structure**?

- Option 1: 32 bits
- Option 2: 32 bytes
- Option 3: neither

A: The answer is Option 3, since the size of the data structure depends on the size of the cache line (each array element must sit in its own cache line to avoid false sharing).
--------------------------------------------------------------------------------
/Lesson 15 - IO Management/03-pseudo-devices-quiz.md:
--------------------------------------------------------------------------------
# Quiz 3: Pseudo Devices

**Linux** supports a number of **pseudo (virtual) devices** that provide special functionality to a system. Given the following **functions**, name the **pseudo device** that provides that **functionality**.

- Option 1: Accept and discard all output (produces no output)
- Option 2: Produces a variable-length string of pseudo-random numbers

A: The answer is the following,

- Option 1: `/dev/null`
- Option 2: `/dev/random`
--------------------------------------------------------------------------------
/Lesson 07 - Problem Set 1/02-simple-socket-client.c:
--------------------------------------------------------------------------------
// Simple socket client
// Minimal sketch

// Include packages
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netdb.h>
#include <unistd.h>

// Constants
#define SERVER_NAME "localhost"
#define PORT 8000

// Connect to server socket if available
int main() {
    // Set localhost
    struct hostent *server = gethostbyname(SERVER_NAME);
    if (server == NULL) { fprintf(stderr, "no such host\n"); exit(1); }

    // Create socket
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) { perror("socket"); exit(1); }

    // Connect to server if it exists
    struct sockaddr_in addr = {0};
    addr.sin_family = AF_INET;
    memcpy(&addr.sin_addr.s_addr, server->h_addr, server->h_length);
    addr.sin_port = htons(PORT);
    if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) { perror("connect"); exit(1); }

    // Close socket
    close(fd);
    return 0;
}
--------------------------------------------------------------------------------
/Lesson 12 - Memory Management/01-multi-level-page-table-quiz.md:
--------------------------------------------------------------------------------
# Quiz 1: Multi-level Page Table

A process with **12-bit addresses** has an address space where only the **first 2 KB** and the **last 1 KB** are allocated and used.

- How many total entries are there in a **single-level page table** that uses the **first address format**?
- How many entries are needed in the inner page tables of the **2-level page table** when the **second format** is used?

A: The answers are as follows,

- 64
- 48
--------------------------------------------------------------------------------
/Lesson 05 - Threads and Concurrency/03-condition-variable-quiz.md:
--------------------------------------------------------------------------------
# Quiz 3: Condition Variable

Recall the consumer code from the previous example for **condition variables**. Instead of `while`, why did we not simply use `if`?

- Option 1: `while` can support multiple consumer threads
- Option 2: cannot guarantee access to *m* once the condition is signaled
- Option 3: the list can change before the consumer gets access again
- Option 4: all of the above

A: The valid answer is Option 4, all answers are valid.
--------------------------------------------------------------------------------
/Lesson 09 - Thread Performance Considerations/02-performance-observation-quiz.md:
--------------------------------------------------------------------------------
# Quiz 2: Performance Observation

Here is another graph from the _Flash_ paper (see lecture). Focus on the performance of _Flash_ and _SPED_. At about 100 MB, _Flash_ becomes **better** than _SPED_. Why?

- Option 1: Flash can handle I/O operations without blocking
- Option 2: SPED starts receiving more requests
- Option 3: The workload becomes I/O bound
- Option 4: Flash can cache more files

A: The answers are Options 1 and 3.
--------------------------------------------------------------------------------
/Lesson 21 - Sample Final Questions/07-pio.md:
--------------------------------------------------------------------------------
# Problem 7: PIO

Answer the following questions about PIO:

1. Considering I/O devices, what does PIO stand for?
2. List the steps performed by the OS or process running on the CPU when sending a network packet using PIO.

A: The following answers are valid,

1. PIO stands for programmed I/O
2. When sending a network packet using PIO:
   - Write command to request packet transmission
   - Copy packet to data registers
   - Repeat until packet sent
--------------------------------------------------------------------------------
/Lesson 10 - Sample Midterm Questions/01-process-creation.md:
--------------------------------------------------------------------------------
# Problem 1: Process Creation

How is a process created? Select all that apply.

- Via fork
- Via exec
- Via fork followed by exec
- Via exec followed by fork
- Via exec or fork followed by exec
- Via fork or fork followed by exec
- None of the above
- All of the above

A: A process is created by the following:

- Via fork, exact clone
- Via fork followed by exec, not an exact clone
- Via fork or fork followed by exec
--------------------------------------------------------------------------------
/Lesson 16 - Virtualization/04-bare-metal-or-hosted-quiz.md:
--------------------------------------------------------------------------------
# Quiz 4: Bare-metal or Hosted

Do you think that the following **virtualization products** are **bare-metal/hypervisor-based (HV) or host-OS-based (OS)**?

- KVM
- VMware ESX
- Fusion
- Citrix XenServer
- VirtualBox
- Microsoft Hyper-V
- VMware Player

A: The answers are as follows:

- KVM: OS
- VMware ESX: HV
- Fusion: OS
- Citrix XenServer: HV
- VirtualBox: OS
- Microsoft Hyper-V: HV
- VMware Player: OS
--------------------------------------------------------------------------------
/Lesson 15 - IO Management/01-io-devices-quiz.md:
--------------------------------------------------------------------------------
# Quiz 1: I/O Devices

For each **device**, indicate whether it's typically used for **input (I)**, **output (O)**, or **both (B)**.

- Keyboard
- Microphone
- Speaker
- Network interface card (NIC)
- Display
- Flash card
- Hard disk drive

A: The answers are as follows:

- Keyboard: **I**
- Microphone: **I**
- Speaker: **O**
- Network interface card (NIC): **B**
- Display: **O**
- Flash card: **B**
- Hard disk drive: **B**
--------------------------------------------------------------------------------
/Lesson 15 - IO Management/05-dma-vs-pio-quiz.md:
--------------------------------------------------------------------------------
# Quiz 5: DMA vs PIO

For a hypothetical system, assume the following:

- It costs one cycle to run a **store instruction** to a **device register**
- It costs five cycles to configure a **DMA controller**
- The PCI-bus is eight bytes wide
- All devices support both **DMA** and **PIO** access

Which **device access method** is best for the following devices?

- Keyboard
- NIC

A: The answer is as follows,

- Keyboard: PIO (transfers are tiny and infrequent)
- NIC: depends (small packets favor PIO, large transfers favor DMA)
--------------------------------------------------------------------------------
/Lesson 21 - Sample Final Questions/06-page-table-size.md:
--------------------------------------------------------------------------------
# Problem 6: Page Table Size

Consider a 32-bit (x86) platform running Linux that uses a single-level page table. What are the **maximum number of page table entries** when the following page sizes are used?

1. Regular (4 kB) pages?
2. Large (2 MB) pages?

A: The following answers are valid,

1. Regular pages use a 12-bit offset, leaving 20 bits for the virtual page number, so the page table has $2^{20}$ entries (one per virtual page)
2. Large pages use a 21-bit offset, leaving 11 bits, so the page table has $2^{11}$ entries
--------------------------------------------------------------------------------
/Lesson 04 - Processes and Process Management/05-scheduler-responsibility-quiz.md:
--------------------------------------------------------------------------------
# Quiz 5: Scheduler Responsibility

Which of the following ARE NOT a **responsibility of the CPU scheduler**?

- Option 1: maintaining the I/O queue
- Option 2: maintaining the ready queue
- Option 3: decision on when to context switch
- Option 4: decision on when to generate an event that a process is waiting on

A: Options 1 and 4 are not responsibilities of the CPU scheduler; the CPU scheduler maintains the ready queue and decides when to context switch.
--------------------------------------------------------------------------------
/Lesson 14 - Synchronization Constructs/03-spinlock-2-quiz.md:
--------------------------------------------------------------------------------
# Quiz 3: Spinlock 2

Does this **spinlock implementation** (see lecture slide) correctly guarantee **mutual exclusion**? Is it **efficient**?
4 | 
5 | A: The answer is as follows,
6 | 
7 | - Mutual exclusion for this spinlock implementation is still incorrect:
8 |   - If threads are allowed to execute concurrently, there is no way to guarantee that a race condition will not occur between checking and setting the lock (hardware support, i.e., an atomic instruction, is needed)
9 | - It is also still not efficient:
10 |   - The continuous loop will spin as long as the lock is busy
11 | 
--------------------------------------------------------------------------------
/Lesson 14 - Synchronization Constructs/02-spinlock-1-quiz.md:
--------------------------------------------------------------------------------
1 | # Quiz 2: Spinlock 1
2 | 
3 | Does this **spinlock implementation** (see lecture slide) correctly guarantee **mutual exclusion**? Is it **efficient**?
4 | 
5 | A: The answer is as follows,
6 | 
7 | - Mutual exclusion for this spinlock implementation is incorrect:
8 |   - It is possible that more than one thread will see that the lock is free at the same time
9 | - It is also not efficient:
10 |   - As long as the lock is not free, `goto spin` will be executed, so the cycle is repeatedly executed (a waste of CPU resources)
11 | 
--------------------------------------------------------------------------------
/Lesson 20 - Data Center Technologies/06-cloud-failure-probability-quiz.md:
--------------------------------------------------------------------------------
1 | # Quiz 6: Cloud Failure Probability
2 | 
3 | A hypothetical cloud has `N = 10` components (CPUs). Each has failure probability of `p = 0.03`. What is the probability that there will be a **failure** somewhere in the system? What if the system has `N = 100` components?
4 | 
5 | A: Using the equation given in lecture, $P_{system} = (1 - (1 - p)^N) \times 100\%$, the answers are as follows,
6 | 
7 | - `N = 10 and p = 0.03`: probability of a system failure is $(1 - 0.97^{10}) \times 100\% \approx$ 26%
8 | - `N = 100 and p = 0.03`: probability of a system failure is $(1 - 0.97^{100}) \times 100\% \approx$ 95%
9 | 
--------------------------------------------------------------------------------
/Lesson 21 - Sample Final Questions/08-inode-structure.md:
--------------------------------------------------------------------------------
1 | # Problem 8: Inode Structure
2 | 
3 | Assume an inode has the following structure:
4 | 
5 | ![inode-structure](https://s3.amazonaws.com/content.udacity-data.com/courses/ud923/notes/ud923-final-inodes.png)
6 | 
7 | Also assume that each **block pointer element is 4 bytes**.
8 | 
9 | If a block on the disk is 4 kB, then what is the **maximum file size** that can be supported by this inode structure?
10 | 
11 | A: About 4 TB when we take into account the first 12 direct disk blocks plus the single, double, and triple indirect blocks: each 4 kB block holds $2^{10}$ 4-byte pointers, so the inode addresses $(12 + 2^{10} + 2^{20} + 2^{30})$ blocks of 4 kB each, and the triple-indirect term dominates ($2^{30} \times 4$ kB $=$ 4 TB).
12 | 
--------------------------------------------------------------------------------
/Lesson 09 - Thread Performance Considerations/01-models-and-memory-quiz.md:
--------------------------------------------------------------------------------
1 | # Quiz 1: Models and Memory
2 | 
3 | Of the three **models** mentioned so far, which **model** likely requires **least amount of memory**?
4 | 
5 | - Boss-worker model
6 | - Pipeline model
7 | - Event-driven model
8 | 
9 | Why do you think this **model** requires the **least amount of memory**?
10 | 
11 | A: The event-driven model requires the least amount of memory since it operates on a single thread; extra memory is only required for the helper threads that handle concurrent blocking I/O, not for every concurrent request.
12 | 
--------------------------------------------------------------------------------
/Lesson 20 - Data Center Technologies/07-lesson-review.md:
--------------------------------------------------------------------------------
1 | # Quiz 7: Lesson Review
2 | 
3 | What did you learn in this lesson?
4 | 
5 | A: In this lesson, we learned about data center technologies, challenges in conventional data centers, multi-tier architectures (internet service, homogeneous, and heterogeneous), cloud computing and its requirements, how cloud computing works at scale, deployment models for the cloud, cloud service models, cloud enabling technologies (virtualization and big data), and the cloud as a big data engine.
6 | 
--------------------------------------------------------------------------------
/Lesson 17 - Remote Procedure Calls/01-rpc-failure-quiz.md:
--------------------------------------------------------------------------------
1 | # Quiz 1: RPC Failure
2 | 
3 | Assume an **RPC fails** and returns a **timeout message**. Given this **timeout message**, what is the reason for the **RPC failure**?
4 | 
5 | - Option 1: client packet lost
6 | - Option 2: server packet lost
7 | - Option 3: network link down
8 | - Option 4: server machine down
9 | - Option 5: server process failed
10 | - Option 6: server process overloaded
11 | - Option 7: all of the above
12 | - Option 8: any of the above
13 | 
14 | A: The answer is Option 8, any of the above could cause an RPC failure.
15 | 
--------------------------------------------------------------------------------
/Lesson 05 - Threads and Concurrency/01-process-vs-threads-quiz.md:
--------------------------------------------------------------------------------
1 | # Quiz 1: Process vs Threads
2 | 
3 | Do the following statements apply to **processes (P)**, **threads (T)**, or **both (B)**?
4 | 
5 | - Option 1: can share a virtual address space
6 | - Option 2: take longer to context switch
7 | - Option 3: have an execution context
8 | - Option 4: usually result in hotter caches when multiple exist
9 | - Option 5: make use of some communication mechanisms
10 | 
11 | A: The following answers are correct,
12 | 
13 | - Option 1: T
14 | - Option 2: P
15 | - Option 3: B
16 | - Option 4: T
17 | - Option 5: B
18 | 
--------------------------------------------------------------------------------
/Lesson 07 - Problem Set 1/04-the-echo-protocol-client.c:
--------------------------------------------------------------------------------
1 | // The echo protocol client
2 | // Pseudo code, fleshed out below into a minimal working sketch
3 | 
4 | // Include packages
5 | #include <stdio.h>
6 | #include <string.h>
7 | #include <unistd.h>
8 | #include <sys/socket.h>
9 | #include <netinet/in.h>
10 | #include <netdb.h>
11 | 
12 | // Constants
13 | #define SERVER_NAME "localhost"
14 | #define PORT 8000
15 | #define MESSAGE "hello world!"
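// Note: per the echo protocol (RFC 862), the server simply sends back
// whatever bytes it receives; SERVER_NAME and PORT above are this
// problem set's placeholder values, not fixed by the protocol.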
16 | 
17 | // Connect to server socket if available and send message
18 | int main() {
19 |     // Set localhost: resolve SERVER_NAME to an address
20 |     struct hostent *server = gethostbyname(SERVER_NAME);
21 |     if (server == NULL) { fprintf(stderr, "no such host\n"); return 1; }
22 | 
23 |     // Create socket
24 |     int sockfd = socket(AF_INET, SOCK_STREAM, 0);
25 |     if (sockfd < 0) { perror("socket"); return 1; }
26 | 
27 |     // Connect to server if it exists
28 |     struct sockaddr_in addr;
29 |     memset(&addr, 0, sizeof(addr));
30 |     addr.sin_family = AF_INET;
31 |     addr.sin_port = htons(PORT);
32 |     memcpy(&addr.sin_addr.s_addr, server->h_addr_list[0], server->h_length);
33 |     if (connect(sockfd, (struct sockaddr *)&addr, sizeof(addr)) < 0) { perror("connect"); return 1; }
34 | 
35 |     // Send message
36 |     if (send(sockfd, MESSAGE, strlen(MESSAGE), 0) < 0) perror("send");
37 | 
38 |     // Close socket
39 |     close(sockfd);
40 |     return 0;
41 | }
42 | 
--------------------------------------------------------------------------------
/Lesson 10 - Sample Midterm Questions/08-performance-observations.md:
--------------------------------------------------------------------------------
1 | # Problem 8: Performance Observations
2 | 
3 | Here is a graph from the paper [*Flash: An Efficient and Portable Web Server*](https://s3.amazonaws.com/content.udacity-data.com/courses/ud923/references/ud923-pai-paper.pdf), that compares the performance of Flash with other web servers.
4 | 
5 | For data sets **where the data set size is less than 100 MB** why does...
6 | 
7 | 1. Flash perform worse than SPED?
8 | 2. Flash perform better than MP?
9 | 
10 | A: The answers are as follows,
11 | 
12 | 1. When the data set fits in the cache, Flash still pays the extra cost of checking whether each file is in memory, a step SPED skips.
13 | 2. Flash runs in a single address space with no context switching between processes, overheads that MP incurs on every request.
14 | 
--------------------------------------------------------------------------------
/Lesson 07 - Problem Set 1/01-priority-readers-and-writers.c:
--------------------------------------------------------------------------------
1 | // Priority readers and writers
2 | // Pseudo code
3 | 
4 | // Include packages
5 | #include <stdio.h>
6 | #include <stdlib.h>
7 | #include <unistd.h>
8 | #include <pthread.h>
9 | 
10 | // Constants
11 | #define READERS 5
12 | #define READS 5
13 | #define WRITERS 5
14 | #define WRITES 5
15 | 
16 | // Globals
17 | int shared_resource = 0;
18 | 
19 | // Main
20 | int main() {}
21 | 
22 | // Reader helper function to print value read and number of readers
23 | void reader_helper() {}
24 | 
25 | // Writer helper function to print value written and number of readers during write
26 | void writer_helper() {}
27 | 
--------------------------------------------------------------------------------
/Lesson 21 - Sample Final Questions/04-synchronization.md:
--------------------------------------------------------------------------------
1 | # Problem 4: Synchronization
2 | 
3 | In a multi-processor system, a thread is trying to acquire a locked mutex.
4 | 
5 | 1. Should the thread spin until the mutex is released or block?
6 | 2. Why might it be better to spin in some instances?
7 | 3. What if this were a uni-processor system?
8 | 
9 | A: The answers are as follows,
10 | 
11 | 1. Spin if the mutex owner is running on another CPU (the lock will likely be released soon), otherwise block
12 | 2. Spinning can cost less than the overhead of context switching the blocked thread out and back in
13 | 3. Block when using a uni-processor system, since the owner cannot run (and release the mutex) while the waiting thread spins on the only CPU
14 | 
--------------------------------------------------------------------------------
/Lesson 11 - Scheduling/03-time-slice-quiz.md:
--------------------------------------------------------------------------------
1 | # Quiz 3: Time-slice
2 | 
3 | On a single CPU system, consider the following workload and conditions:
4 | 
5 | - **10 I/O bound tasks** and **1 CPU bound** task
6 | - I/O bound tasks issue an I/O op every **1 ms** of CPU computing
7 | - I/O operations always take **10 ms** to complete
8 | - Context switching overhead is **0.1 ms**
9 | - All tasks are long running
10 | 
11 | What is the **CPU utilization** (%) for a **round robin scheduler** where the **time-slice** is **1 ms**? How about for a **10 ms time-slice**? (round to nearest percent)
12 | 
13 | A: The answers are as follows,
14 | 
15 | - 1 ms: 91%
16 | - 10 ms: 95%
17 | 
--------------------------------------------------------------------------------
/Lesson 06 - Thread Case Study - PThreads/03-pthread-creation-3-quiz.md:
--------------------------------------------------------------------------------
1 | # Quiz 3: PThread Creation 3
2 | 
3 | **Code provided in lecture**.
4 | 
5 | What are the possible **outputs** to this program?
6 | 
7 | - Option 1:
8 |   ```bash
9 |   Thread number 0
10 |   Thread number 0
11 |   Thread number 2
12 |   Thread number 3
13 |   ```
14 | - Option 2:
15 |   ```bash
16 |   Thread number 0
17 |   Thread number 2
18 |   Thread number 1
19 |   Thread number 3
20 |   ```
21 | - Option 3:
22 |   ```bash
23 |   Thread number 3
24 |   Thread number 2
25 |   Thread number 1
26 |   Thread number 0
27 |   ```
28 | 
29 | A: Only Options 2 and 3 are valid; each thread receives a private copy of its index, so there is no race condition in the code and every thread number appears exactly once (though possibly in any order).
30 | 
--------------------------------------------------------------------------------
/Lesson 07 - Problem Set 1/04-the-echo-protocol-server.c:
--------------------------------------------------------------------------------
1 | // The echo protocol server
2 | // Pseudo code, fleshed out below into a minimal working sketch
3 | 
4 | // Include packages
5 | #include <stdio.h>
6 | #include <stdlib.h>
7 | #include <string.h>
8 | #include <unistd.h>
9 | #include <sys/types.h>
10 | #include <sys/socket.h>
11 | #include <netinet/in.h>
12 | #include <arpa/inet.h>
13 | #include <netdb.h>
14 | 
15 | // Constants
16 | #define PORT 8000
17 | 
18 | // Accept a client socket if available, read message, modify message, and reply
19 | int main() {
20 |     // Create socket
21 |     int servfd = socket(AF_INET, SOCK_STREAM, 0);
22 |     if (servfd < 0) { perror("socket"); return 1; }
23 | 
24 |     // Bind to PORT on any local interface, then listen on socket
25 |     struct sockaddr_in addr;
26 |     memset(&addr, 0, sizeof(addr));
27 |     addr.sin_family = AF_INET;
28 |     addr.sin_addr.s_addr = htonl(INADDR_ANY);
29 |     addr.sin_port = htons(PORT);
30 |     if (bind(servfd, (struct sockaddr *)&addr, sizeof(addr)) < 0) { perror("bind"); return 1; }
31 |     if (listen(servfd, 1) < 0) { perror("listen"); return 1; }
32 | 
33 |     // Accept client
34 |     int clientfd = accept(servfd, NULL, NULL);
35 |     if (clientfd < 0) { perror("accept"); return 1; }
36 | 
37 |     // Read message and reply (the echo protocol sends the bytes back as-is)
38 |     char buf[1024];
39 |     ssize_t n = recv(clientfd, buf, sizeof(buf), 0);
40 |     if (n > 0) send(clientfd, buf, n, 0);
41 | 
42 |     // Close sockets
43 |     close(clientfd);
44 |     close(servfd);
45 |     return 0;
46 | }
47 | 
--------------------------------------------------------------------------------
/Lesson 16 - Virtualization/06-problematic-instructions-quiz.md:
--------------------------------------------------------------------------------
1 | # Quiz 6: Problematic Instructions
2 | 
3 | In earlier x86 platforms the flags **privileged register** was accessed via the **instructions POPF (pop flags off stack) and PUSHF (push flags onto stack)** that **failed silently** if not called from **ring 0 (hypervisor)**. What do you think can occur as a result?
4 | 
5 | - Option 1: guest VM could not request interrupts enabled
6 | - Option 2: guest VM could not request interrupts disabled
7 | - Option 3: guest VM could not find out what is the state of the interrupts enabled/disabled bit
8 | - Option 4: all of the above
9 | 
10 | A: The correct answer is Option 4, all of the above.
11 | 
--------------------------------------------------------------------------------
/Lesson 04 - Processes and Process Management/01-virtual-address-quiz.md:
--------------------------------------------------------------------------------
1 | # Quiz 1: Virtual Address
2 | 
3 | If two processes, *P1* and *P2*, are running at the same time, what are the **virtual address space** ranges they will have?
4 | 
5 | - Option 1:
6 |   - *P1*: 0-32000
7 |   - *P2*: 32001-64000
8 | - Option 2:
9 |   - *P1*: 0-64000
10 |   - *P2*: 0-64000
11 | - Option 3:
12 |   - *P1*: 32001-64000
13 |   - *P2*: 0-32000
14 | 
15 | A: Option 2 is the only valid answer: decoupling the virtual addresses used by the processes from the physical addresses where the data actually resides makes it possible for different processes to have the exact same address space range and to use the exact same addresses.
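
As an illustration (not lecture code), here is a minimal C sketch of how two processes can use the same virtual address for different physical contents, assuming a POSIX system with `fork()`:

```c
// Illustrative sketch: identical virtual addresses in two processes
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

int value = 0; // after fork, each process has its own physical copy

int main() {
    pid_t pid = fork();
    if (pid == 0)
        value = 1; // the child writes only to its own copy
    // Parent and child print the SAME virtual address for DIFFERENT values
    printf("pid %d: &value = %p, value = %d\n", getpid(), (void *)&value, value);
    return 0;
}
```

Both processes typically print the same `&value`, confirming that the same virtual address can map to different physical locations.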
16 | 
--------------------------------------------------------------------------------
/Lesson 14 - Synchronization Constructs/04-conflicting-metrics-quiz.md:
--------------------------------------------------------------------------------
1 | # Quiz 4: Conflicting Metrics
2 | 
3 | 1. Reduce **latency**:
4 | 
5 |    - _Time to acquire a free lock_
6 |    - Ideally immediately execute atomic
7 | 
8 | 2. Reduce **waiting time (delay)**:
9 | 
10 |    - _Time to stop spinning and acquire a lock that has been freed_
11 |    - Ideally immediately
12 | 
13 | 3. Reduce **contention**:
14 | 
15 |    - _Bus/network (interconnect) traffic_
16 |    - Ideally zero
17 | 
18 | Among the described **metrics** are there any conflicting goals? Check all that apply.
19 | 
20 | - Option 1: 1 conflicts with 2
21 | - Option 2: 1 conflicts with 3
22 | - Option 3: 2 conflicts with 3
23 | 
24 | A: The correct answer is Options 2 and 3.
25 | 
--------------------------------------------------------------------------------
/Lesson 18 - Distributed File Systems/03-replication-vs-partitioning-quiz.md:
--------------------------------------------------------------------------------
1 | # Quiz 3: Replication vs Partitioning
2 | 
3 | Consider **server machines** that hold **100 files each**. Using **three** such machines, a **DFS** can be configured using **replication** or **partitioning**. Answer the following:
4 | 
5 | 1. How many **total files can be stored** in the **replicated** vs the **partitioned DFS**?
6 | 
7 | 2. What **percentage of the total files will be lost if one machine fails** in the **replicated** versus **partitioned DFS** (round to the nearest %)?
8 | 
9 | A: The answers are as follows,
10 | 
11 | 1. There will be **100** files in the replicated DFS and **300** files in the partitioned DFS.
12 | 2. There will be **0**% lost in the replicated DFS and **33**% lost in the partitioned DFS.
13 | 
--------------------------------------------------------------------------------
/Lesson 21 - Sample Final Questions/05-spinlocks.md:
--------------------------------------------------------------------------------
1 | # Problem 5: Spinlocks
2 | 
3 | For the following question, consider a multi-processor with write-invalidated cache coherence.
4 | 
5 | Determine whether the use of a dynamic (exponential backoff) delay has the **same, better, or worse performance** than a test-and-test-and-set (“spin on read”) lock. Then, explain why.
6 | 
7 | Make a performance comparison using each of the following metrics:
8 | 
9 | 1. Latency
10 | 2. Delay
11 | 3. Contention
12 | 
13 | A: The answers are as follows,
14 | 
15 | 1. Latency is the same since the operations are identical when the lock is free
16 | 2. Delay is worse for dynamic delay since the lock could be released during the delay
17 | 3. Contention is better for dynamic delay since spinning pauses during the delay (it does not trigger any additional memory access requests)
18 | 
--------------------------------------------------------------------------------
/Lesson 10 - Sample Midterm Questions/06-solaris-papers.md:
--------------------------------------------------------------------------------
1 | # Problem 6: Solaris Papers
2 | 
3 | The implementation of Solaris threads described in the paper _Beyond Multiprocessing: Multi-threading the Sun OS Kernel_, describes four key data structures used by the OS to support threads.
4 | 
5 | For each of these data structures, **list at least two elements** they must contain.
6 | 
7 | 1. Process
8 | 2. LWP
9 | 3. Kernel-threads
10 | 4. CPU
11 | 
12 | A: The answers are as follows,
13 | 
14 | 1. Process must include at least the virtual address space and a list of kernel-level threads (KLTs)
15 | 2. LWP must include at least user-level registers and system call arguments
16 | 3. Kernel-threads must include at least kernel-level registers and scheduling information
17 | 4. CPU must include information on the current thread and a list of KLTs
18 | 
--------------------------------------------------------------------------------
/Lesson 20 - Data Center Technologies/05-cloud-computing-definitions-quiz.md:
--------------------------------------------------------------------------------
1 | # Quiz 5: Cloud Computing Definitions
2 | 
3 | > Cloud computing is a model for enabling **ubiquitous**, convenient, **on-demand network access** to a **shared pool** of configurable computing resources (e.g., network, servers, etc.) that can be **rapidly provisioned** and released with **minimal management** effort or service provider interactions.
4 | 
5 | Place the underlined (in this case bold) phrases into the appropriate textbox based on the **cloud computing requirement** they best describe.
6 | 
7 | A: The following answers are valid,
8 | 
9 | - Elastic resources:
10 |   - On-demand network access
11 |   - Shared pool
12 |   - Rapidly provisioned
13 | - Fine-grained pricing: N/A
14 | - Professionally managed:
15 |   - Minimal management
16 | - API-based:
17 |   - Ubiquitous
18 | 
--------------------------------------------------------------------------------
/Lesson 12 - Memory Management/03-least-recently-used-quiz.md:
--------------------------------------------------------------------------------
1 | # Quiz 3: Least Recently Used
2 | 
3 | Suppose you have an array with **11 page-sized entries** that are **accessed one-by-one and then manipulated one-by-one in a loop**. Also, suppose you have a system with **10 pages of physical memory**.
4 | 
5 | What is the **percentage of pages** that will need to be **demand paged using the LRU policy**? (round to the nearest %)
6 | 
7 | Assume the following structure:
8 | 
9 | ```c
10 | int i = 0;
11 | int j = 0;
12 | 
13 | while(1) {
14 |     for(i = 0; i < 11; ++i) {
15 |         // access page[i]
16 |     }
17 | 
18 |     for(j = 0; j < 11; ++j) {
19 |         // manipulate page[j]
20 |     }
21 |     break;
22 | }
23 | ```
24 | 
25 | A: 100%: since physical memory only has 10 pages, the first pass evicts page 0 (the least recently used) to bring in page 10; from then on LRU always evicts exactly the page that will be accessed next, so every page must be demand paged back in.
26 | 
--------------------------------------------------------------------------------
/Lesson 06 - Thread Case Study - PThreads/02-pthread-creation-2-quiz.md:
--------------------------------------------------------------------------------
1 | # Quiz 2: PThread Creation 2
2 | 
3 | **Code provided in lecture**.
4 | 
5 | What are the possible **outputs** to this program?
6 | 
7 | - Option 1:
8 |   ```bash
9 |   Thread number 0
10 |   Thread number 1
11 |   Thread number 2
12 |   Thread number 3
13 |   ```
14 | - Option 2:
15 |   ```bash
16 |   Thread number 0
17 |   Thread number 2
18 |   Thread number 1
19 |   Thread number 3
20 |   ```
21 | - Option 3:
22 |   ```bash
23 |   Thread number 0
24 |   Thread number 2
25 |   Thread number 2
26 |   Thread number 3
27 |   ```
28 | 
29 | A: All options are possible: there is no control over when each thread is scheduled, and `i` is a global variable, so all threads see its latest value. Additionally, a race condition is possible (a thread may read the value while another is modifying it).
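
The lecture code is not copied into these notes; below is a minimal sketch of the pattern the quiz describes (an assumed reconstruction: every thread receives a pointer to the same global index that `main` keeps advancing):

```c
// Assumed sketch of the quiz's racy pattern (not the exact lecture code)
#include <stdio.h>
#include <pthread.h>

#define NUM_THREADS 4

int i; // global loop index, shared with all threads

void *hello(void *arg) {
    printf("Thread number %d\n", *(int *)arg); // may read i after main advanced it
    return NULL;
}

int main() {
    pthread_t tid[NUM_THREADS];
    for (i = 0; i < NUM_THREADS; i++)
        pthread_create(&tid[i], NULL, hello, &i); // every thread gets the SAME &i
    for (int t = 0; t < NUM_THREADS; t++)
        pthread_join(tid[t], NULL);
    return 0;
}
```

Because all threads dereference the same `&i`, duplicated or skipped numbers (as in Option 3) are possible; passing each thread a private copy of its index removes the race.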
30 | 
--------------------------------------------------------------------------------
/Lesson 10 - Sample Midterm Questions/02-multi-threading-and-one-cpu.md:
--------------------------------------------------------------------------------
1 | # Problem 2: Multi-threading and One CPU
2 | 
3 | Is there a benefit of multi-threading on one CPU?
4 | 
5 | - Yes
6 | - No
7 | 
8 | A: Yes, reasons (from lecture) include:
9 | 
10 | - Threads can hide the latency of blocking I/O: while one thread waits on I/O, another can do useful work on the CPU
11 | - Threads can implement **parallelization** which can process the input much faster than if only a single thread had to process, say, an entire matrix
12 | - Threads may execute completely different portions of the program
13 | - Threads can also utilize **specialization** which takes advantage of the hot cache present on each thread
14 | - A multi-threaded application is more memory efficient and has lower memory requirements than its multi-process alternative
15 | - Additionally, a multi-threaded application incurs lower overheads for its inter-thread communication than the corresponding inter-process alternatives
16 | 
--------------------------------------------------------------------------------
/Lesson 10 - Sample Midterm Questions/04-calendar-critical-section.md:
--------------------------------------------------------------------------------
1 | # Problem 4: Calendar Critical Section
2 | 
3 | A shared calendar supports three types of operations for reservations:
4 | 
5 | 1. Read
6 | 2. Cancel
7 | 3. Enter
8 | 
9 | Requests for cancellations should have priority above reads, which in turn have priority over new updates.
10 | 
11 | In pseudo-code, write the critical section enter/exit code for the **read** operation.
12 | 
13 | A: One possible pseudo-code sketch for the **read** enter/exit sections is as follows,
14 | 
15 | ```c
16 | // Calendar Critical Section
17 | // Pseudo code: shared state is a mutex m, condition variables read_go,
18 | // cancel_go, and update_go, plus waiting/active counters per operation
19 | 
20 | // Enter critical section for read
21 | lock(m);
22 | while (active_cancels > 0 || waiting_cancels > 0 || active_updates > 0)
23 |     wait(read_go, m); // cancellations (and in-flight updates) go first
24 | active_readers++;
25 | unlock(m);
26 | 
27 | // ... read from the calendar ...
28 | 
29 | // Exit critical section for read
30 | lock(m);
31 | active_readers--;
32 | if (active_readers == 0) {
33 |     if (waiting_cancels > 0)
34 |         signal(cancel_go);  // cancellations have top priority
35 |     else if (waiting_updates > 0)
36 |         signal(update_go);  // updates run only when nothing else is pending
37 | }
38 | unlock(m);
39 | ```
40 | 
--------------------------------------------------------------------------------
/Lesson 09 - Thread Performance Considerations/03-experimental-design-quiz.md:
--------------------------------------------------------------------------------
1 | # Quiz 3: Experimental Design
2 | 
3 | A toy shop manager wants to determine how many workers to hire to be able to handle the **worst case** scenario. Orders range in difficulty from _blocks_ to _teddy bears_ to _trains_. The shop has _three working areas_, each with tools for any toy.
4 | 
5 | Which of the following **experiments** (types of orders, number of workers) will allow us to make meaningful conclusions about the manager's question?
6 | 
7 | - Configuration 1: `{ (train, 3), (train, 4), (train, 5) }`
8 | - Configuration 2: `{ (blocks, 3), (bears, 6), (train, 9) }`
9 | - Configuration 3: `{ (mixed, 3), (mixed, 6), (mixed, 9) }`
10 | - Configuration 4: `{ (train, 3), (train, 6), (train, 9) }`
11 | 
12 | A: Configuration 4 is valid since it holds the order type fixed at trains (the worst case) while varying the number of workers (adding three each time).
13 | 
--------------------------------------------------------------------------------
/Lesson 21 - Sample Final Questions/03-hardware-counters.md:
--------------------------------------------------------------------------------
1 | # Problem 3: Hardware Counters
2 | 
3 | Consider a quad-core machine with a single memory module connected to the CPUs via a shared _bus_. On this machine, a CPU instruction takes 1 cycle, and a memory instruction takes 4 cycles.
4 | 
5 | The machine has two hardware counters:
6 | 
7 | - Counter that measures IPC
8 | - Counter that measures CPI
9 | 
10 | Answer the following:
11 | 
12 | 1. What does IPC stand for in this context?
13 | 2. What does CPI stand for in this context?
14 | 3. What is the highest IPC on this machine?
15 | 4. What is the highest CPI on this machine?
16 | 
17 | A: The answers are as follows,
18 | 
19 | 1. IPC (instructions per cycle)
20 | 2. CPI (cycles per instruction)
21 | 3. Since it is a quad-core machine, the highest IPC is 4 (one instruction completing per core per cycle)
22 | 4. Since a memory instruction takes 4 cycles, the highest CPI is 16 if each core issues a memory instruction at the same time (contention on the shared bus)
23 | 
--------------------------------------------------------------------------------
/Lesson 21 - Sample Final Questions/11-consistency-models.md:
--------------------------------------------------------------------------------
1 | # Problem 11: Consistency Models
2 | 
3 | Consider the following sequence of operations by processors _P1_, _P2_, and _P3_ which occurred in a distributed shared memory system:
4 | 
5 | ![consistency-model-diagram](https://s3.amazonaws.com/content.udacity-data.com/courses/ud923/notes/ud923-final-consistency-models.png)
6 | 
7 | Notation:
8 | 
9 | - `R_m1(X)`: `X` was read from memory location m1 (**does not indicate where it was stored**)
10 | - `W_m1(Y)`: `Y` was written to memory location m1
11 | - Initially all memory is set to 0
12 | 
13 | Answer the following questions:
14 | 
15 | 1. Name all processors (_P1_, _P2_, or _P3_) that observe causally consistent reads.
16 | 2. Is this execution causally consistent?
17 | 
18 | A: The answers are as follows,
19 | 
20 | 1. Only *P1* and *P2* observe causally consistent reads
21 | 2. No; since *P3*'s reads are not causally consistent, the execution as a whole is not causally consistent
22 | 
--------------------------------------------------------------------------------
/Lesson 21 - Sample Final Questions/12-distributed-applications.md:
--------------------------------------------------------------------------------
1 | # Problem 12: Distributed Applications
2 | 
3 | You are designing the new image datastore for an application that stores users' images (like [Picasa](http://picasa.google.com/)). The new design must consider the following scale:
4 | 
5 | - The current application has 30 million users
6 | - Each user has on average 2,000 photos
7 | - Each photo is on average 500 kB
8 | - Requests are evenly distributed across all images
9 | 
10 | Answer the following:
11 | 
12 | 1. Would you use replication or partitioning as a mechanism to ensure high responsiveness of the image store?
13 | 2. If you have 10 server machines at your disposal, and one of them crashes, what's the percentage of requests that the image store will not be able to respond to, if any?
14 | 
15 | A: The answers are as follows,
16 | 
17 | 1. Partitioning, since the current data set is extremely large (30M users × 2,000 photos × 500 kB ≈ 30 PB)
18 | 2. 10%, as requests are evenly distributed and each partition lives on only one of the 10 machines
19 | 
--------------------------------------------------------------------------------
/Lesson 05 - Threads and Concurrency/05-multi-threading-patterns-quiz.md:
--------------------------------------------------------------------------------
1 | # Quiz 5: Multi-threading Patterns
2 | 
3 | For the 6-step toy order application, we have designed two solutions:
4 | 
5 | - A **boss-workers** solution
6 | - A **pipeline** solution
7 | 
8 | Both solutions have six threads:
9 | 
10 | - In the **boss-workers** solution, a worker processes a toy order in 120 ms
11 | - In the **pipeline** solution, each of the six stages (= step) takes 20 ms
12 | 
13 | How long will it take for these solutions to complete 10 toy orders?
14 | What about if there were 11 toy orders?
15 | 
16 | A: Using the following formulas:
17 | 
18 | - **Boss-worker formula**: `time_to_finish_one_order * ceiling(num_orders / num_concurrent_threads)`, where five of the six threads are workers (one is the boss)
19 | - **Pipeline formula**: `time_to_finish_first_order + (remaining_orders * time_to_finish_last_stage)`
20 | 
21 | We get the following:
22 | 
23 | - **Boss-workers (10)**: 120 ms × ⌈10/5⌉ = 240 ms
24 | - **Boss-workers (11)**: 120 ms × ⌈11/5⌉ = 360 ms
25 | - **Pipeline (10)**: 120 ms + 9 × 20 ms = 300 ms
26 | - **Pipeline (11)**: 120 ms + 10 × 20 ms = 320 ms
27 | 
--------------------------------------------------------------------------------
/Lesson 21 - Sample Final Questions/10-dfs-semantics.md:
--------------------------------------------------------------------------------
1 | # Problem 10: DFS Semantics
2 | 
3 | Consider the following timeline where 'f' is a distributed shared file and P1 and P2 are processes:
4 | 
5 | ![dfs-semantics-diagram](https://s3.amazonaws.com/content.udacity-data.com/courses/ud923/notes/ud923-final-dfs-semantics.png)
6 | 
7 | Other Assumptions:
8 | 
9 | - 't' represents the time intervals in which functions execute
10 | - The 'w' flag means write/append
11 | - The 'r' flag means read
12 | - The original content of 'f' was _a_
13 | - The `read()` function returns the entire contents of the file
14 | 
15 | For each of the following DFS semantics, what will be read -- **the contents of 'f'** -- by P2 when t = 4s?
16 | 
17 | 1. UNIX semantics
18 | 2. NFS semantics
19 | 3. Sprite semantics
20 | 
21 | A: The following answers are valid,
22 | 
23 | 1. `ab` since all updates are instantaneous
24 | 2. `a` since the write at 2 seconds would not yet be visible to P2's cached copy
25 | 3. `ab` since Sprite checks with the server for *P1*'s most recent value
26 | 
--------------------------------------------------------------------------------
/Lesson 10 - Sample Midterm Questions/03-critical-section.md:
--------------------------------------------------------------------------------
1 | # Problem 3: Critical Section
2 | 
3 | In the (pseudo) code segments for the **producer code** and **consumer code**, mark and explain all the lines where there are errors.
4 | 
5 | - **Global Section**:
6 | 
7 | ```c
8 | int in, out, buffer[BUFFERSIZE];
9 | mutex_t m;
10 | cond_var_t not_empty, not_full;
11 | ```
12 | 
13 | - **Producer Code**:
14 | 
15 | ```c
16 | while (more_to_produce) {
17 |     mutex_lock(&m);
18 |     if (out == (in + 1) % BUFFERSIZE) // buffer full, should use a while loop
19 |         condition_wait(&not_full); // condition_wait also needs the mutex as an argument
20 |     add_item(buffer[in]); // add item
21 |     in = (in + 1) % BUFFERSIZE;
22 |     cond_broadcast(&not_empty); // use signal instead
23 |     // Insert mutex unlock here!
24 | } // end producer code
25 | ```
26 | 
27 | - **Consumer Code**:
28 | 
29 | ```c
30 | while (more_to_consume) {
31 |     mutex_lock(&m);
32 |     if (out == in) // buffer empty, should use while loop
33 |         condition_wait(&not_empty); // needs the mutex as an argument
34 |     remove_item(out);
35 |     out = (out + 1) % BUFFERSIZE;
36 |     condition_signal(&not_empty); // should be not_full
37 |     // Insert mutex unlock here!
38 | } // end consumer code
39 | ```
40 | 
--------------------------------------------------------------------------------
/Lesson 10 - Sample Midterm Questions/07-pipeline-model.md:
--------------------------------------------------------------------------------
1 | # Problem 7: Pipeline Model
2 | 
3 | An image web server has three stages with average execution times as follows:
4 | 
5 | - Stage 1: read and parse request (10 ms)
6 | - Stage 2: read and process image (30 ms)
7 | - Stage 3: send image (20 ms)
8 | 
9 | You have been asked to build a multi-threaded implementation of this server using the **pipeline model**. Using a **pipeline model**, answer the following questions:
10 | 
11 | 1. How many threads will you allocate to each pipeline stage?
12 | 2. What is the expected execution time for 100 requests (in sec)?
13 | 3. What is the average throughput of the system in Question 2 (in req/sec)? Assume there are infinite processing resources (CPU's, memory, etc.).
14 | 
15 | A: The answers are as follows,
16 | 
17 | 1. We need to vary the number of threads for each stage so that every stage can accept a new request every 10 ms:
18 |    1. Stage 1 needs one thread (10 ms / 1 thread = 10 ms per request)
19 |    2. Stage 2 needs three threads (30 ms / 3 threads = 10 ms per request)
20 |    3. Stage 3 needs two threads (20 ms / 2 threads = 10 ms per request)
21 | 2. Around 1.05 s: the first request takes 60 ms (each stage added up), and each of the remaining 99 requests completes 10 ms after the previous one (60 + 99 × 10 = 1050 ms).
22 | 3. If we complete 100 requests in 1.05 s then our rate is about 95.2 req/s.
23 | 
--------------------------------------------------------------------------------
/Lesson 21 - Sample Final Questions/01-time-slices.md:
--------------------------------------------------------------------------------
1 | # Problem 1: Time-slices
2 | 
3 | On a single CPU system, consider the following workload and conditions:
4 | 
5 | - 10 I/O-bound tasks and 1 CPU-bound task
6 | - I/O-bound tasks issue an I/O operation once every 1 ms of CPU computing
7 | - I/O operations always take 10 ms to complete
8 | - Context switching overhead is 0.1 ms
9 | - I/O device(s) have infinite capacity to handle concurrent I/O requests
10 | - All tasks are long-running
11 | 
12 | Now, answer the following questions (for each question, round to the nearest percent):
13 | 
14 | 1. What is the **CPU utilization** (%) for a round-robin scheduler where the time-slice is 20 ms?
15 | 2. What is the **I/O utilization** (%) for a round-robin scheduler where the time-slice is 20 ms?
16 | 
17 | A: The answers are as follows,
18 | 
19 | 1. Using $CPU_{utilization} = \frac{\text{useful CPU time}}{\text{useful CPU time} + \text{context switching overhead}} = \frac{10 \times 1 + 20}{10 \times (1 + 0.1) + (20 + 0.1)} = \frac{30}{31.1}$, the CPU utilization is 97%
20 | 2. Task $k$ issues its I/O request at $t = k \times 1 + (k - 1) \times 0.1$ ms, so the first I/O request is issued at 1 ms and the last at 10.9 ms, completing at 20.9 ms; one full round of all tasks takes the 31.1 ms found above, so $I/O_{utilization} = \frac{20.9 - 1}{31.1} \approx 64\%$ (the I/O device is busy for 19.9 ms of each round)
21 | 
--------------------------------------------------------------------------------
/Lesson 21 - Sample Final Questions/02-linux-o(1)-scheduler.md:
--------------------------------------------------------------------------------
1 | # Problem 2: Linux O(1) Scheduler
2 | 
3 | For the next four questions, consider a Linux system with the following assumptions:
4 | 
5 | - Uses the O(1) scheduling algorithm for time sharing threads
6 | - Must assign a time quantum for thread T1 with priority 110
7 | - Must assign a time quantum for thread T2 with priority 135
8 | 
9 | Provide answers to the following:
10 | 
11 | 1. Which thread has a *higher* priority (will be serviced first)?
12 | 2. Which thread is assigned a **longer time quantum**?
13 | 3. Assume T2 has used its time quantum without blocking. What will happen to the value that represents its priority level when T2 gets scheduled again?
14 |    - Lower/decrease
15 |    - Higher/increase
16 |    - Same
17 | 4. Assume now that T2 blocks for I/O before its time quantum expired. What will happen to the value that represents its priority level when T2 gets scheduled again?
18 |    - Lower/decrease
19 |    - Higher/increase
20 |    - Same
21 | 
22 | A: The answers are as follows,
23 | 
24 | 1. The O(1) scheduling algorithm assigns higher priority to threads with a lower priority number (T1)
25 | 2. The O(1) scheduling algorithm dictates that higher priority tasks will have a longer time quantum (T1)
26 | 3. Increase; if T2 used its time quantum without blocking, it is treated as CPU intensive and gets a lower priority (higher value) next time around
27 | 4. Decrease; if T2 blocks for I/O before its time quantum expires, it is treated as more I/O intensive and gets a higher priority (lower value) next time around
28 | 
--------------------------------------------------------------------------------
/Lesson 03 - Introduction to Operating Systems/lesson-03-notes.md:
--------------------------------------------------------------------------------
1 | # Lesson 3: Introduction to Operating Systems
2 | 
3 | - Topics to be covered in this lesson:
4 |   - What is an OS (operating system)?
5 |   - What are key components of an OS?
6 |   - Design and implementation considerations of OSs
7 | 
8 | ## What is an Operating System?
9 | 
10 | - An OS is a special piece of software that abstracts and arbitrates the use of a computer system
11 | - An **OS** is like **a toy shop manager** in that an OS:
12 |   - Directs operational resources
13 |   - Enforces working policies
14 |   - Mitigates difficulty of complex tasks
15 | - By definition, an OS is a layer of systems software that:
16 |   - Directly has privileged access to the underlying hardware
17 |   - Hides the hardware complexity
18 |   - Manages hardware on behalf of one or more applications according to some predefined policies
19 | - In addition, it ensures that applications are isolated and protected from one another
20 | 
21 | ## OS Elements
22 | 
23 | - Abstractions:
24 |   - Process, thread, file, socket, memory page
25 | - Mechanisms:
26 |   - Create, schedule, open, write, allocate
27 | - Policies:
28 |   - Least recently used (LRU), earliest deadline first (EDF)
29 | 
30 | ## Design Principles
31 | 
32 | - Separation of mechanism from policy:
33 |   - Implement flexible mechanisms to support many policies
34 | - Optimize for the common case:
35 |   - Where will the OS be used?
36 |   - What will the user want to execute on that machine?
37 |   - What are the workload requirements?
38 | 
39 | ## OS Protection Boundary
40 | 
41 | - Generally, applications operate in unprivileged mode (user level) while operating systems operate in privileged mode (kernel level)
42 | - Kernel level software is able to access hardware directly
43 | - The user-kernel switch is supported by hardware
44 | 
45 | ## Crossing The OS Boundary
46 | 
47 | - Applications need user-kernel transitions, which are supported by hardware; each transition involves a number of instructions and switches locality
48 | - Switching locality will affect the hardware cache (transitions are costly)
49 | - Hardware will set _traps_ on illegal instructions or memory accesses requiring special privilege
50 | 
51 | ## Monolithic OS
52 | 
53 | - Pros:
54 |   - Everything included
55 |   - Inlining, compile-time optimizations
56 | - Cons:
57 |   - Customization, portability, manageability
58 |   - Memory footprint
59 |   - Performance
60 | 
61 | ## Modular OS
62 | 
63 | - Pros:
64 |   - Maintainability
65 |   - Smaller footprint
66 |   - Fewer resource needs
67 | - Cons:
68 |   - Indirection can impact performance
69 |   - Maintenance can still be an issue
70 | 
71 | ## Microkernel
72 | 
73 | - Pros:
74 |   - Size
75 |   - Verifiability
76 | - Cons:
77 |   - Portability
78 |   - Complexity of software development
79 |   - Cost of user/kernel crossing
80 | 
--------------------------------------------------------------------------------
/Lesson 06 - Thread Case Study - PThreads/lesson-06-notes.md:
--------------------------------------------------------------------------------
1 | # Lesson 6: Thread Case Study - PThreads
2 | 
3 | - Topics to be covered in this lesson:
4 |   - What are PThreads?
5 |   - POSIX Threads
6 |   - What is POSIX?
7 |   - POSIX stands for Portable Operating System Interface
8 | - **POSIX Threads**:
9 |   - POSIX versions of Birrell's API
10 |   - Specifies syntax and semantics of the operations
11 | 
12 | ## PThread Creation
13 | 
14 | - PThread creation is similar to the thread abstraction proposed by Birrell:
15 | 
16 | | Birrell's Mechanisms | PThreads                     |
17 | | -------------------- | ---------------------------- |
18 | | `Thread`             | `pthread_t` (type of thread) |
19 | | `Fork()`             | `pthread_create()`           |
20 | | `Join()`             | `pthread_join()`             |
21 | 
22 | - `pthread_attr_t`:
23 | 
24 |   - Specified in `pthread_create`
25 |   - Defines features of the new thread
26 |   - Has default behavior with NULL in `pthread_create`
27 | 
28 | - **Detaching PThreads**:
29 |   - Mechanism not considered by Birrell
30 |   - Default: joinable threads
31 |     - The parent thread creates children threads and joins them at a later time
32 |     - The parent thread should not terminate until the children threads have completed their executions and have been joined via the explicit join operation
33 |     - If the parent thread exits early, children threads can become _zombies_
34 |   - **Detached threads**:
35 |     - There is a possibility for children threads to be detached from the parent
36 |     - Once detached, threads cannot be joined
37 |     - If a parent exits, children threads are free to continue their execution
38 |     - Parent and children are equivalent to one another
39 | 
40 | ## Compiling Threads
41 | 
42 | - Ensure to include the PThread header file, `pthread.h`, in your main file that contains the PThreads code, otherwise the program will not compile
43 | - Compile source with `-lpthread` or `-pthread`
44 | - Check the return values of common functions
45 | 
46 | ## PThread Mutexes
47 | 
48 | - PThread mutexes were designed to solve mutual exclusion problems among concurrent threads
49 | - Below is a comparison of Birrell's mechanisms and PThreads for mutexes:
50 | 
51 | | Birrell's Mechanisms      | PThreads                                   |
52 | | ------------------------- | ------------------------------------------ |
53 | | `Mutex`                   | `pthread_mutex_t` (mutex type)             |
54 | | `Lock()` (to lock)        | `pthread_mutex_lock()` (explicit lock)     |
55 | | `Lock()` (also to unlock) | `pthread_mutex_unlock()` (explicit unlock) |
56 | 
57 | - **Mutex safety tips**:
58 |   - Shared data should always be accessed through a single mutex!
59 |   - Mutex scope must be visible to all!
60 |   - Globally order locks
61 |     - For all threads, lock mutexes in order
62 |   - Always unlock a mutex
63 |     - Always unlock the correct mutex
64 | 
65 | ## PThread Condition Variables
66 | 
67 | - Below is a comparison of Birrell's mechanisms and PThreads for condition variables:
68 | 
69 | | Birrell's Mechanisms | PThreads                                 |
70 | | -------------------- | ---------------------------------------- |
71 | | `Condition`          | `pthread_cond_t` (type of cond variable) |
72 | | `Wait()`             | `pthread_cond_wait()`                    |
73 | | `Signal()`           | `pthread_cond_signal()`                  |
74 | | `Broadcast()`        | `pthread_cond_broadcast()`               |
75 | 
76 | - There are also other condition variable operations such as `pthread_cond_init()` and `pthread_cond_destroy()`
77 | - **Condition variable safety tips**:
78 |   - Do not forget to notify waiting threads!
79 |   - Predicate change => signal/broadcast the correct condition variable
80 |   - When in doubt, broadcast
81 |     - However, broadcasting too often will result in **performance loss**
82 |   - You do not need to hold the mutex to signal/broadcast (it can even make sense to unlock the mutex first, since a woken thread still needs the mutex before it can proceed)
83 | 
--------------------------------------------------------------------------------
/Lesson 13 - Inter-process Communication/lesson-13-notes.md:
--------------------------------------------------------------------------------
1 | # Lesson 13: Inter-process Communication
2 | 
3 | - Topics to be covered in this lesson:
4 |   - IPC (inter-process communication)
5 |   - Shared Memory IPC
6 | 
7 | ## Visual Metaphor
8 | 
9 | - An **IPC** is like **working together** in the toy shop:
10 |   - Processes share memory:
11 |     - Data shared in memory
12 |   - Processes exchange messages:
13 |     - Message passing via sockets
14 |   - Requires synchronization:
15 |     - Mutexes, writing, etc.
16 | 
17 | ## Inter-process Communication
18 | 
19 | - **IPC**: OS-supported mechanisms for interaction among processes (coordination and communication)
20 |   - Message passing: e.g., sockets, pipes, message queues
21 |   - Memory-based IPC: shared memory, memory mapped files
22 |   - Higher-level semantics: files, RPC
23 |   - Synchronization primitives
24 | 
25 | ## Message Based IPC
26 | 
27 | - **Processes** create messages and send/receive them:
28 |   - Send/write messages to a port
29 |   - Receive/read messages from a port
30 | - **OS** creates and maintains a channel (i.e., buffer, FIFO queue, etc.):
31 |   - OS provides interface to processes - a port
32 | - **Kernel** required to:
33 |   - Establish a connection
34 |   - Perform each IPC operation
35 |     - Send: system call + data copy
36 |     - Receive: system call + data copy
37 | - Pros:
38 |   - Simplicity: kernel does channel management and synchronization
39 | - Cons: overheads
40 | 
41 | ## Forms of Message Passing
42 | 
43 | - **Pipes**:
44 |   - Carry byte stream between two processes (e.g., connect output from one process to input of another)
45 | - **Message queues**:
46 |   - Carry _messages_ among processes
47 |   - OS management includes priorities, scheduling of message delivery, etc.
48 |   - APIs: `SysV` and `POSIX`
49 | - **Sockets**:
50 |   - `send()`, `recv()` to pass message buffers
51 |   - `socket()` to create kernel-level socket buffer
52 |   - Associate necessary kernel-level processing (TCP/IP)
53 |   - If different machines, channel between process and network device
54 |   - If same machine, bypass full protocol stack
55 | 
56 | ## Shared Memory IPC
57 | 
58 | - Read and write to shared memory region
59 | - OS establishes shared channel between the processes
60 |   - Physical pages mapped into virtual address space
61 |   - VA (_P1_) and VA (_P2_) map to the same physical address (see lecture for diagram)
62 |   - VA (_P1_) is **not** equal to VA (_P2_)
63 |   - Physical memory does not need to be contiguous
64 | - Pros:
65 |   - System calls only for setup; data copies potentially reduced (but not eliminated)
66 | - Cons:
67 |   - Explicit synchronization
68 |   - Communication protocol, shared buffer management, etc. (programmer responsibility)
69 | 
70 | ## Copy vs Map
71 | 
72 | - Goal: transfer data from one address space into a target address space
73 | - **Copy**:
74 |   - CPU cycles to copy data to/from port
75 |   - Large data: `t(copy) >> t(map)`
76 | - **Map**:
77 |   - CPU cycles to map memory into address space
78 |   - CPU to copy data to channel
79 |   - Set up once, use many times (good payoff)
80 |   - Can perform well for one-time use
81 | - Tradeoff exercised in Windows LPC (local procedure calls)
82 | 
83 | ## SysV Shared Memory
84 | 
85 | - **Segments** of shared memory: not necessarily contiguous physical pages
86 | - Shared memory is system-wide: system limits on number of segments and total size
87 | - **Create**: OS assigns unique key
88 | - **Attach**: map VA to PA
89 | - **Detach**: invalidate address mappings
90 | - **Destroy**: only removed when explicitly deleted (or on reboot)
91 | 
92 | ## Shared Memory and Sync
93 | 
94 | - _Like threads accessing shared state in a single address space, but for processes_
95 | - **Synchronization method**:
96 |   - Mechanisms supported by process threading library (pthreads)
97 |   - OS-supported IPC for synchronization
98 | - **Either method must coordinate**:
99 |   - Number of concurrent accesses to shared segment
100 |   - When data is available and ready for consumption
101 | 
102 | ## Shared Memory Design Considerations
103 | 
104 | - Consider the following when designing for memory:
105 |   - Different APIs/mechanisms for synchronization
106 |   - OS provides shared memory and is out of the way
107 |   - Data passing/sync protocols are up to the programmer
108 | 
109 | ## How Many Segments?
110 | 
111 | - One large segment: manager for allocating/freeing memory from shared segment
112 | - Many small segments:
113 |   - Use pool of segments, queue of segment ids
114 |   - Communicate segment IDs among processes
115 | 
116 | ## Design Considerations
117 | 
118 | - Consider the following questions:
119 |   - What size segments?
120 |   - What if data doesn't fit?
121 | - Segment size is equivalent to data size:
122 |   - Works for well-known static sizes
123 |   - Limits max data size
124 | - Segment size is greater than message size:
125 |   - Transfer data in rounds
126 |   - Include protocol to track progress
127 | 
--------------------------------------------------------------------------------
/Lesson 04 - Processes and Process Management/lesson-04-notes.md:
--------------------------------------------------------------------------------
1 | # Lesson 4: Processes and Process Management
2 | 
3 | - Topics to be covered in this lesson:
4 |   - What is a process?
5 |   - How are processes represented by OS's?
6 |   - How are multiple concurrent processes managed by OS's?
7 | 
8 | ## What is a Process?
9 | 
10 | - What is a process?
11 |   - A **process** is like **an order of toys** in that a process:
12 |     - Has a state of execution (e.g., program counter, stack)
13 |     - Has parts and temporary holding area (e.g., data, register state occupies state in memory)
14 |     - May require special hardware (e.g., I/O devices)
15 | - OS manages hardware on behalf of applications
16 | - **Application** - program on disk, flash memory (static entity)
17 | - **Process** - state of a program while executing, loaded in memory (active entity)
18 | 
19 | ## What does a Process look like?
20 | 
21 | - A process encapsulates all data of a running application
22 | - Every single element of the process state has to be uniquely identified by its address (the OS abstraction used to encapsulate the process state is an address space)
23 | - Some types of state include:
24 |   - Text and data (static state when the process first loads)
25 |   - Heap - dynamically created during execution
26 |   - Stack - grows and shrinks (LIFO behavior)
27 | 
28 | ## Process Address Space
29 | 
30 | - **Address space** - *in memory* representation of a process
31 | - **Page tables** - mapping of virtual to physical addresses
32 | - **Physical address** - locations in physical memory
33 | 
34 | ## Address Space and Memory Management
35 | 
36 | - Parts of virtual address space may not be allocated
37 | - May not be enough physical memory for all state
38 |   - Solution: the operating system dynamically decides which portion of which address space will be present where in physical memory
39 | 
40 | ## Process Execution State
41 | 
42 | - How does the OS know what a process is doing?
43 |   - The **program counter** allows the OS to know where a process currently is in the instruction sequence
44 |   - The program counter is maintained in a **CPU register** while the process is executing
45 |   - There also exists a **stack pointer** which points to the top of the stack (useful for LIFO operations)
46 |   - To maintain all of the above, the OS maintains a **PCB** (process control block)
47 | 
48 | ## Process Control Block
49 | 
50 | - What is a PCB?
51 |   - A PCB (process control block) is a data structure that the OS maintains for every one of the processes that it manages
52 |   - A PCB is created when the process is created
53 |   - Certain fields are updated when the process state changes
54 |   - Other fields change too frequently (e.g., the program counter, which is tracked in a CPU register instead)
55 | 
56 | ## Context Switch
57 | 
58 | - **Context switch** - switching the CPU from the context of one process to the context of another
59 | - Context switching is expensive!
60 |   - **Direct costs** - number of cycles for load and store instructions
61 |   - **Indirect costs** - COLD cache! Cache misses!
62 | - Ultimately, we want to limit how frequently context switching is done!
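
To make the PCB discussion concrete, here is a minimal illustrative C sketch of a PCB-like structure (the field names and types are hypothetical, not taken from the lecture); these are the kinds of fields saved and restored on a context switch:

```c
// Hypothetical sketch of a PCB-like structure (illustrative only)
typedef struct pcb {
    unsigned long pid;            // process identifier
    int state;                    // new, ready, running, waiting, or terminated
    void *program_counter;        // saved PC while the process is not running
    unsigned long registers[16];  // saved general-purpose registers
    void *stack_pointer;          // saved top-of-stack pointer
    void *page_table;             // this process's virtual-to-physical mappings
    struct pcb *next;             // e.g., link in the scheduler's ready queue
} pcb_t;
```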
63 | 
64 | ## Process Life Cycle: States
65 | 
66 | - Processes can be **running** or **idle**
67 | - Process states can be: new, ready, running, waiting, or terminated
68 | 
69 | ## Process Life Cycle: Creation
70 | 
71 | - Two mechanisms for process creation:
72 |   - **Fork**:
73 |     - Copies the parent PCB into new child PCB
74 |     - Child continues execution at instruction after fork
75 |   - **Exec**:
76 |     - Replace child image
77 |     - Load new program and start from first instruction
78 | 
79 | ## Role of the CPU Scheduler
80 | 
81 | - A **CPU Scheduler** determines which one of the currently ready processes will be dispatched to the CPU to start running, and how long it should run for
82 | - In general, the OS must be efficient:
83 |   - **Preempt** - interrupt and save current context
84 |   - **Schedule** - run scheduler to choose next process
85 |   - **Dispatch** - dispatch process to switch into its context
86 | 
87 | ## Length of Process
88 | 
89 | - Useful CPU work can be determined by the following: `total processing time / total time`
90 | - In general, total scheduling time should be considered overhead; we want most of the CPU time to be spent doing useful work
91 | - **Time-slice** - time allocated to a process on the CPU
92 | 
93 | ## Inter Process Communication
94 | 
95 | - An OS must provide mechanisms to allow processes to interact with one another
96 | - **IPC mechanisms**:
97 |   - Help transfer data/info between address spaces
98 |   - Maintain protection and isolation
99 |   - Provide flexibility and performance
100 | - **Message-passing IPC**:
101 |   - OS provides communication channel like a shared buffer
102 |   - Processes write (send)/read (receive) messages to/from channel
103 |   - Pros: OS manages
104 |   - Cons: overheads
105 | - **Shared Memory IPC**:
106 |   - OS establishes a shared channel and maps it into each process address space
107 |   - Processes directly read/write from this memory
108 |   - Pros: OS is out of the way!
109 |   - Cons: may need to re-implement code
110 | 
--------------------------------------------------------------------------------
/Lesson 20 - Data Center Technologies/lesson-20-notes.md:
--------------------------------------------------------------------------------
1 | # Lesson 20: Data Center Technologies
2 | 
3 | - Topics to be covered in this lesson:
4 |   - Brief and high-level overview of challenges and technologies facing data centers
5 |   - **Goal**: provide context for mechanisms from previous lessons
6 |   - Multi-tier architectures for internet services
7 |   - Cloud computing
8 |   - Cloud and _big data_ technologies
9 | 
10 | ## Internet Services
11 | 
12 | - **Internet service**: any type of service provided via web interface
13 |   - **Presentation**: static content
14 |   - **Business logic**: dynamic content
15 |   - **Database tier**: data store
16 | - Not necessarily separate processes on separate machines
17 | - Many available open source and proprietary technologies
18 |   - **Middleware**: supporting, integrative or value-added software technologies
19 | - In multi-process configurations: some form of IPC used, including RPC/RMI, shared memory, etc.
20 | 
21 | ## Internet Service Architectures
22 | 
23 | - For scale: multi-process, multi-node (_scale out_ architecture)
24 |   1. _Boss-worker_: front-end distributes requests to nodes
25 |   2. _All equal_: all nodes execute any possible step in request processing, for any request (functionally homogeneous)
26 |   3. _Specialized nodes_: nodes execute some specific step(s) in request processing for some request type (functionally heterogeneous)
27 | 
28 | ## Homogeneous Architectures
29 | 
30 | - **Functionally homogeneous**:
31 |   - Each node can do any processing step
32 | - Pros:
33 |   - Keeps front-end simple
34 |   - Does not mean that each node has all data, just that each node can get to all data
35 | - Cons:
36 |   - How to benefit from caching?
37 | 
38 | ## Heterogeneous Architectures
39 | 
40 | - **Functionally heterogeneous**:
41 |   - Different nodes, different tasks/requests
42 |   - Data does not have to be uniformly accessible everywhere
43 | - Pros:
44 |   - Benefit of locality and caching
45 | - Cons:
46 |   - More complex front-end
47 |   - More complex management
48 | 
49 | ## Cloud Computing Requirements
50 | 
51 | - **Traditional approach**:
52 |   - Buy and configure resources: determine capacity based on expected demand (peak)
53 |   - When demand exceeds capacity:
54 |     - Dropped requests
55 |     - Lost opportunity
56 | - **Ideal cloud**:
57 |   - Pros:
58 |     - Capacity scales elastically with demand
59 |     - Scaling is instantaneous, both up and down
60 |     - Cost is proportional to demand, to revenue opportunity
61 |     - All of this happens automatically, no need for hacking wizardry
62 |     - Can access anytime, anywhere
63 |   - Cons:
64 |     - Don't _own_ resources
65 | - **Cloud computing requirements** (summarized):
66 |   - On-demand, elastic resources and services
67 |   - Fine-grained pricing based on usage
68 |   - Professionally managed and hosted
69 |   - API-based access
70 | 
71 | ## Cloud Computing Overview
72 | 
73 | - **Shared resources**:
74 |   - Infrastructure and software/services
75 | - **APIs for access and configuration**:
76 |   - Web-based, libraries, command line, etc.
77 | - **Billing/accounting services**:
78 |   - Many models: spot, reservation, entire marketplace
79 |   - Typically discrete quantities: tiny, medium, x-large, etc.
80 | - **Managed by cloud provider**
81 | 
82 | ## Why Does Cloud Computing Work?
83 | 
84 | - **Law of large numbers**:
85 |   - Per customer there is large variation in resource needs
86 |   - Average across many customers is roughly constant
87 | - **Economies of scale**:
88 |   - Unit cost of providing resources or service drops at _bulk_
89 | 
90 | ## Cloud Computing Vision
91 | 
92 | > If computers of the kind I have advocated become computers of the future, then computing may some day be organized as a public utility, just as the telephone system is a public utility... The computer utility could become the basis of a new and important industry. (_John McCarthy, MIT Centennial, 1961_)
93 | 
94 | - Computing: fungible utility
95 | - Limitations exist: API lock-in, hardware dependence, latency, privacy, security, etc.
96 | 
97 | ## Cloud Deployment Models
98 | 
99 | - **Public**: third-party customers/tenants
100 | - **Private**: leverage technology internally
101 | - **Hybrid (public + private)**: failover, dealing with spikes, testing
102 | - **Community**: used by certain type of users
103 | 
104 | ## Cloud Service Models
105 | 
106 | - **On-premise**:
107 |   - You must manage all components and services
108 | - **IaaS (Infrastructure as a Service)**:
109 |   - You manage components such as applications, data, run-time, middleware, OS
110 |   - Others manage virtualization, servers, storage, and networking
111 | - **PaaS (Platform as a Service)**:
112 |   - You manage components such as applications and data
113 |   - Others manage run-time, middleware, OS, virtualization, servers, storage, and networking
114 | - **SaaS (Software as a Service)**:
115 |   - Opposite of on-premise, others manage all components and services
116 | 
117 | ## Requirements for the Cloud
118 | 
119 | 1. _Fungible_ resources
120 | 2. Elastic, dynamic resource allocation methods
121 | 3. Scale: management at scale, scalable resource allocations
122 | 4. Dealing with failures
123 | 5. Multi-tenancy: performance and isolation
124 | 6. Security
125 | 
126 | ## Cloud Enabling Technologies
127 | 
128 | - **Virtualization**:
129 |   - Resource provisioning (scheduling)
130 | - **Big data** processing (Hadoop MapReduce, Spark, etc.)
131 | - Storage:
132 |   - Distributed front-end (_append only_)
133 |   - NoSQL, distributed in-memory caches
134 | - Software-defined networking, storage, data centers, etc.
135 | - Monitoring: real-time log processing (e.g., AWS CloudWatch)
136 | 
137 | ## The Cloud as a Big Data Engine
138 | 
139 | - Data storage layer
140 | - Data processing layer
141 | - Caching layer
142 | - Language front-ends (e.g., querying)
143 | - Analytics libraries (e.g., ML)
144 | - Continuously streaming data
145 | 
--------------------------------------------------------------------------------
/Lesson 19 - Distributed Shared Memory/lesson-19-notes.md:
--------------------------------------------------------------------------------
1 | # Lesson 19: Distributed Shared Memory
2 | 
3 | - Topics to be covered in this lesson:
4 |   - DSM (distributed shared memory)
5 |   - Distributed state management and design alternatives
6 |   - Consistency model
7 | 
8 | ## Visual Metaphor
9 | 
10 | - **Managing distributed shared memory** is like **managing tools/parts across all workspaces in a toy shop**:
11 |   - Must decide placement:
12 |     - Place memory (pages) close to relevant processes
13 |   - Must decide migration:
14 |     - When to copy memory (pages) from remote to local
15 |   - Must decide sharing rules:
16 |     - Ensure memory operations are properly ordered
17 | 
18 | ## Reviewing DFS
19 | 
20 | - **Clients**:
21 |   - Send requests to file service
22 | - **Caching**:
23 |   - Improve performance (seen by clients) and scalability (supported by servers)
24 | - **Servers**:
25 |   - Own and manage state (files)
26 |   - Provide service (file access)
27 | 
28 | ## Peer Distributed Applications
29 | 
30 | - **Each node**:
31 |   - _Owns_ state
32 |   - Provides service
33 | - All nodes are _peers_
34 | - In _peer-to-peer_ even overall control and management are done by all
35 | 
36 | ## DSM (Distributed Shared Memory)
37 | 
38 | - **Each node**:
39 |   - _Owns_ state: memory
40 |   - Provides service:
41 |     - Memory reads/writes from any node
42 |     - Consistency protocols
43 | - Permits scaling beyond single machine memory limits:
44 |   - More _shared_ memory at lower cost
45 |   - Slower overall memory access
  - Commodity interconnect technologies support this (e.g., RDMA)

## Hardware vs Software DSM

- **Hardware-supported** (expensive!):
  - Relies on the interconnect
  - OS manages the larger physical memory
  - NICs (network interface cards) translate remote memory accesses into messages
  - NICs involved in all aspects of memory management; support atomics, etc.
- **Software-supported**:
  - Everything done in software
  - By the OS, or by the language run-time

## DSM Design: Sharing Granularity

- Cache-line granularity? Overheads too high for DSM
- Coarser options:
  - Page granularity (OS-level)
  - Object granularity (language run-time)
  - Variable granularity
- At coarser granularity, beware of false sharing, e.g., X and Y are on the same page!

## DSM Design: Access Algorithm

- **What types of applications use DSM?**
- Application access algorithms:
  - Single reader/single writer (SRSW)
  - Multiple readers/single writer (MRSW)
  - Multiple readers/multiple writers (MRMW)

## DSM Design: Migration vs Replication

- **DSM performance metric**: access latency
- Achieving low latency through:
  - **Migration**:
    - Makes sense for SRSW
    - Requires data movement
  - **Replication** (caching):
    - More general
    - Requires consistency management

## DSM Design: Consistency Management

- **DSM** ~ shared memory in SMPs
- **In SMPs**:
  - Write-invalidate
  - Write-update
  - Coherence operations on each write: overhead too high for DSM
- Push invalidations when data is written to:
  - Proactive
  - Eager
  - Pessimistic
- Pull modification info periodically:
  - On-demand (reactive)
  - Lazy
  - Optimistic
- When these methods get triggered depends on the consistency model for the shared state!

## DSM Architecture

- **DSM architecture** (page-based, OS-supported):
  - Each node contributes part of its memory pages to DSM
  - Local caches are needed for performance (latency)
  - All nodes are responsible for part of the distributed memory
  - The home node manages access and tracks page ownership
- **Exact replicas** (explicit replication):
  - For load balancing, performance, or reliability
  - The _home_/manager node controls management

## Indexing Distributed State

- **Each page (object) has**:
  - An address: node ID and page frame number
  - Node ID: the _home_ node
- **Global map (replicated)**:
  - Object (page) ID → manager node ID
  - The manager map is available on each node!
- **Metadata for local pages (partitioned)**:
  - Per-page metadata is distributed across managers
- **Global mapping table**:
  - Object ID → index into the mapping table → manager node

## Implementing DSM

- **Problem**: DSM must _intercept_ accesses to DSM state
  - To send remote messages requesting access
  - To trigger coherence messages
  - Overheads should be avoided for local, non-shared state (pages)
  - Dynamically _engage_ and _disengage_ DSM when necessary
- **Solution**: use hardware MMU support!
  - Trap into the OS if the mapping is invalid or the access is not permitted
  - Remote address mapping: trap and pass to DSM to send a message
  - Cached content: trap and pass to DSM to perform the necessary coherence operations
  - Other MMU information is useful too (e.g., dirty page)
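A minimal sketch of this interception mechanism on Linux, assuming a page-based software DSM (illustrative, not the lecture's code): write-protect a "shared" page with `mprotect()`, catch the `SIGSEGV` trap, and let the handler stand in for the coherence work a real DSM layer would do before re-enabling access.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

static char *dsm_page;   /* stand-in for one page of DSM state */

static void on_fault(int sig, siginfo_t *si, void *ctx) {
    (void)sig; (void)ctx;
    /* A real DSM would fetch the page / send coherence messages here.
       (fprintf is not async-signal-safe; fine for a demo only.) */
    fprintf(stderr, "DSM trap: faulting access at %p\n", si->si_addr);
    mprotect(dsm_page, (size_t)getpagesize(), PROT_READ | PROT_WRITE);
}

int main(void) {
    dsm_page = mmap(NULL, (size_t)getpagesize(), PROT_READ,
                    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    struct sigaction sa = { .sa_sigaction = on_fault,
                            .sa_flags = SA_SIGINFO };
    sigaction(SIGSEGV, &sa, NULL);
    dsm_page[0] = 'x';   /* write to a read-only page: traps, then retries */
    printf("after trap: %c\n", dsm_page[0]);
    return 0;
}
```

The write faults, the handler "engages" DSM and upgrades the mapping, and the faulting instruction is retried transparently, which is exactly why local, already-writable pages pay no overhead.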
## What is a Consistency Model?

- **Consistency model**: an agreement between memory (state) and the upper software layers
  - _Memory behaves correctly if and only if software follows specific rules_
- Memory (state) guarantees to behave correctly with respect to:
  - Access ordering
  - Propagation/visibility of updates

## Strict Consistency

- **Strict consistency**: updates are visible everywhere, immediately
- In practice:
  - Even on a single SMP, there are no guarantees on ordering without extra locking and synchronization
  - In distributed systems, latency and message reordering/loss make this even harder (impossible to guarantee)

## Sequential Consistency

- **Sequential consistency**:
  - Memory updates from different processors may be arbitrarily interleaved
  - All processes will see the same interleaving
  - Operations from the same process always appear in the order they were issued

## Causal Consistency

- **Causal consistency**:
  - _Concurrent_ writes: no guarantees
  - Causally related writes: ordered

## Weak Consistency

- **Synchronization points**: operations (read, write, sync)
  - All updates prior to a sync point will be visible
  - No guarantee what happens in between
- **Variations**:
  - A single sync operation (sync)
  - Separate sync per subset of state (page)
  - Separate _entry/acquire_ vs _exit/release_ operations
- Pros: limits data movement and coherence operations
- Cons: must maintain extra state for the additional operations
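A concrete contrast between sequential and causal consistency (the scenario is an illustrative assumption, not from the lecture): suppose P1 writes `x = 1`, and P2 reads `x == 1` and then writes `y = 1`. These writes are causally related, so causal consistency forbids any processor from observing `y == 1` while still seeing the old `x`. If instead P1 writes `x = 1` and P2 writes `y = 1` concurrently, with no intervening reads, causal consistency lets different processors see the two writes in different orders, whereas sequential consistency would still force all processors to agree on a single interleaving.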
-------------------------------------------------------------------------------- /Lesson 16 - Virtualization/lesson-16-notes.md: --------------------------------------------------------------------------------

# Lesson 16: Virtualization

- Topics to be covered in this lesson:
  - Overview of virtualization
  - Main technical approaches in popular virtualization solutions
  - Virtualization-related hardware advances

## What is Virtualization?

- **Virtualization** allows concurrent execution of multiple OSs (and their applications) on the same physical machine
- **Virtual resources**: each OS thinks that it _owns_ the hardware resources
- **Virtual machine (VM)**: an OS, its applications, and its virtual resources (guest domain)
- **Virtualization layer**: management of the physical hardware (virtual machine monitor, hypervisor)

## Defining Virtualization

- A **virtual machine** is an **efficient, isolated duplicate of the real machine**, supported by a **VMM (virtual machine monitor)**:
  - Provides an environment essentially identical to the original machine
  - Programs show at worst only a minor decrease in speed
  - The VMM is in complete control of system resources

## Benefits of Virtualization

- Pros:
  - Consolidation: decreases cost, improves manageability
  - Migration: availability, reliability
  - Security, debugging, support for legacy OSs

## Virtualization Models: Bare-metal

- **Bare-metal**: hypervisor-based
  - VMM (hypervisor) manages all hardware resources and supports execution of VMs
  - Privileged, service VM to deal with devices
- Xen (open source or Citrix XenServer):
  - Dom0 and DomU
  - Drivers in Dom0
- ESX (VMware):
  - Many open APIs
  - Drivers in the VMM
  - Used to have a Linux control core, now remote APIs

## Virtualization Models: Hosted

- **Hosted**:
  - Host OS owns all hardware
  - A special VMM module provides **hardware interfaces** to VMs and deals with VM context switching
- Example: KVM (kernel-based VM, shown in lecture)
  - Based on Linux
  - KVM kernel module plus QEMU (Quick Emulator) for hardware virtualization
  - Leverages the Linux open-source community

## Processor Virtualization

- **Trap-and-emulate**:
  - Guest instructions are executed directly by hardware
  - Non-privileged operations run at hardware speeds: efficiency
  - Privileged operations trap to the hypervisor
  - The hypervisor determines what needs to be done:
    - If an illegal operation: terminate the VM
    - If a legal operation: emulate the behavior the guest OS was expecting from the hardware

## x86 Virtualization in the Past

- **x86 virtualization pre-2005**:
  - Four rings, no root/non-root modes yet
  - Hypervisor in ring 0, guest OS in ring 1
  - However, 17 privileged instructions do not trap, they fail silently!
  - Cons:
    - The hypervisor does not know about the attempt, so it does not try to change settings
    - The OS does not know, so it assumes the change was successful

## Binary Translation

- **Main idea**: rewrite the VM binary to never issue those 17 instructions
  - Pioneered by Mendel Rosenblum's group at Stanford, commercialized as VMware
- **Binary translation**:
  - Goal: full virtualization (guest OS not modified)
  - Approach: dynamic binary translation
    - Inspect code blocks about to be executed
    - If needed, translate to an alternate instruction sequence
    - Otherwise, run at hardware speeds

## Paravirtualization

- Goal: performance; give up on unmodified guests
- Approach: paravirtualization, modify the guest so that:
  - It knows it's running virtualized
  - It makes explicit calls to the hypervisor (hypercalls)
- **Hypercalls** (analogous to system calls):
  - Package context info
  - Specify the desired hypercall
  - Trap to the VMM

## Memory Virtualization Full

- All guests expect contiguous physical memory, starting at 0
- Virtual vs physical vs machine addresses (VA vs PA vs MA) and page frame numbers
- Still leverages hardware MMU, TLB, etc.
- **Option 1**:
  - Guest page table: VA → PA
  - Hypervisor: PA → MA
  - **Too expensive!**
- **Option 2**:
  - Guest page tables: VA → PA
  - Hypervisor shadow page table: VA → MA
  - Hypervisor maintains consistency

## Memory Virtualization Paravirtualized

- **Paravirtualized**:
  - Guest is aware of virtualization
  - No longer a strict requirement on contiguous physical memory starting at 0
  - Explicitly registers its page tables with the hypervisor
  - Can _batch_ page table updates to reduce VM exits
  - _Other optimizations_
- **Bottom line**: overheads eliminated or reduced on newer platforms

## Device Virtualization

- For CPUs and memory:
  - Less diversity
  - ISA-level (instruction set architecture level) _standardization_ of the interface
- For devices:
  - High diversity
  - Lack of standard specification of device interface and behavior
- Three key models for device virtualization (see the following sections)

## Passthrough Model

- **Approach**: a VMM-level driver configures device access permissions
- Pros:
  - VM is provided with exclusive access to the device
  - VM can directly access the device (VMM-bypass)
- Cons:
  - Device sharing is difficult
  - VMM must have the exact type of device the VM expects
  - VM migration is tricky

## Hypervisor Direct Model

- **Approach**:
  - VMM intercepts all device accesses
  - Emulates the device operation:
    - Translate to a generic I/O operation
    - Traverse the VMM-resident I/O stack
    - Invoke the VMM-resident driver
- Cons:
  - Latency of device operations
  - Device driver ecosystem complexities in the hypervisor

## Split-device Driver Model

- **Approach**:
  - Device access control is **split** between:
    - A front-end driver in the guest VM (device API)
    - A back-end driver in the service VM (or host)
  - Modified guest drivers
- Pros:
  - Eliminates emulation overhead
  - Allows for better management of shared devices

## Hardware Virtualization

- **AMD Pacifica and Intel Vanderpool Technology (Intel-VT)**, 2005:
  - _Close holes_ in the x86 ISA
  - Modes: root/non-root (or _host_ and _guest_ mode)
  - VM control structure
  - Extended page tables and tagged TLB with VM IDs
  - Multi-queue devices and interrupt routing
  - Security and management support
  - Additional instructions to exercise the previously mentioned features

-------------------------------------------------------------------------------- /Lesson 15 - IO Management/lesson-15-notes.md: --------------------------------------------------------------------------------

# Lesson 15: I/O Management

- Topics to be covered in this lesson:
  - OS support for I/O devices
  - Block device stack
  - File system architecture

## Visual Metaphor

- **I/O** is like **a toy shop shipping department**:
  - Has protocols:
    - Interfaces for device I/O
  - Has dedicated handlers:
    - Device drivers, interrupt handlers, etc.
  - Decouples I/O details from core processing:
    - Abstracts I/O device details from applications

## I/O Devices

- Basic I/O device features:
  - Control registers:
    - Command
    - Data transfers
    - Status
  - Micro-controller (the device's CPU)
  - On-device memory
  - Other logic: e.g., analog-to-digital converters

## CPU Device Interconnect

- Peripheral Component Interconnect (PCI):
  - PCI Express (PCIe)
- Other types of interconnects:
  - SCSI (small computer system interface) bus
  - Peripheral bus
- Bridges handle the differences

## Device Drivers

- One per device type
- Responsible for device access, management, and control
- Provided by device manufacturers per OS/version
- Each OS standardizes its interfaces:
  - Device independence
  - Device diversity

## Types of Devices

- **Block: disk**
  - Read/write blocks of data
  - Direct access to an arbitrary block
- **Character: keyboard**
  - Get/put character
- **Network devices**
- OS representation of a device: a special device file

## Device Interactions

- Access device registers via memory load/store
- **Memory-mapped I/O**:
  - Part of the _host_ physical memory is dedicated to device interactions
  - BARs (base address registers)
- **I/O port**:
  - Dedicated in/out instructions for device access
  - Specify the target device (I/O port) and a value in a register
- **Interrupts**:
  - Pros: can be generated as soon as possible
  - Cons: interrupt handling steps
- **Polling**:
  - Pros: handled when convenient for the OS
  - Cons: delay or CPU overhead

## Device Access PIO

- No additional hardware support needed
- CPU _programs_ the device:
  - Via command registers
  - Via data movement
- An example of PIO (programmed I/O): NIC data (network packet, shown in lecture)
  - Write a command to request packet transmission
  - Copy the packet to the data registers
  - Repeat until the packet is sent

## Device Access DMA

- Relies on a DMA (direct memory access) controller
- CPU _programs_ the device:
  - Via command registers
  - Via DMA controls
- An example of DMA: NIC data (network packet, shown in lecture)
  - Write a command to request packet transmission
  - Configure the DMA controller with the **in-memory address and size of the packet buffer**
  - Fewer steps, but the DMA configuration is more complex
- For DMAs:
  - The data buffer must stay in physical memory until the transfer completes: pinned regions (non-swappable)

## Typical Device Access

- _See lecture for diagram_
- Typical device access includes the following:
  - System call
  - In-kernel stack
  - Driver invocation
  - Device request configuration
  - Device performs the request

## OS Bypass

- _See lecture for diagram_
- Device registers/data are directly accessible
- OS configures access, then gets out of the way
- **User-level driver**:
  - OS retains coarse-grained control
- Relies on device features:
  - Sufficient registers
  - Demux capability

## Sync vs Async Access

- _See lecture for diagram_
- **Synchronous I/O operations**: the process blocks
- **Asynchronous I/O operations**: the process continues, and then either:
  - The process checks and retrieves the result, or
  - The process is notified that the operation completed and results are ready

## Block Device Stack

- _See lecture for diagram_
- Processes use files: the logical storage unit
- Kernel file system (FS):
  - Where and how to find and access a file
  - OS specifies the interface
- Generic block layer:
  - OS-standardized block interface
- Device driver

## Virtual File System

- Problem: how to address the following?
  - What if files are on more than one device?
  - What if devices work better with different file system implementations?
  - What if files are not on a local device (accessed via network)?
- Solution: use a virtual file system (VFS)

## Virtual File System Abstractions

- **File**: the element on which the VFS (virtual file system) operates
- **File descriptor**: OS representation of a file
  - Open, read, write, sendfile, lock, close, etc.
- **Inode**: persistent representation of a file _index_
  - List of all data blocks
  - Device, permissions, size, etc.
- **Dentry**: directory entry, corresponds to a single path component
- **Superblock**: file-system-specific information regarding the file system layout

## VFS on Disk

- **File**: data blocks on disk
- **Inode**: tracks a file's blocks; also resides on disk in some block
- **Superblock**: overall map of the disk blocks
  - Inode blocks
  - Data blocks
  - Free blocks

## `ext2`: Second Extended File System

- For each block group:
  - **Superblock**: number of inodes, disk blocks, start of free blocks
  - **Group descriptor**: bitmaps, number of free nodes, directories
  - **Bitmaps**: track free blocks and inodes
  - **Inodes**: numbered one to max (one per file)
  - **Data blocks**: file data

## Inodes

- **Inodes**: an index of all disk blocks corresponding to a file
  - File: identified by an inode
  - Inode: list of all blocks plus other metadata
- Pros: easy to perform sequential or random access
- Cons: limit on file size

## Inodes with Indirect Pointers

- **Inodes with indirect pointers**: index of all disk blocks corresponding to a file
- Inodes contain:
  - Metadata
  - Pointers to blocks:
    - **Direct pointer**: points to a data block
    - **Indirect pointer**: points to a block of pointers
    - **Double indirect pointer**: points to a block of blocks of pointers
- Pros: a small inode supports a large maximum file size
- Cons: file access slows down
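A worked example of the size trade-off (all parameter values assumed for illustration): with 1 KB blocks and 4-byte block pointers, one pointer block holds 256 pointers. Twelve direct pointers then cover 12 KB, a single indirect pointer adds 256 × 1 KB = 256 KB, and a double indirect pointer adds 256 × 256 × 1 KB = 64 MB, so the double indirect level dominates the maximum file size; the cost is up to three extra disk accesses just to locate a data block.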
## Disk Access Optimizations

- **Caching/buffering**: reduce the number of disk accesses
  - Buffer cache in main memory
  - Read/write from the cache
  - Periodically flush to disk (`fsync()`)
- **I/O scheduling**: reduce disk head movement
  - Maximize sequential vs random access
- **Prefetching**: increase cache hits
  - Leverages locality
- **Journaling/logging**: reduce random access
  - _Describe_ the write in a log: block, offset, value, etc.
  - Periodically apply the updates to the proper disk locations

-------------------------------------------------------------------------------- /Lesson 14 - Synchronization Constructs/lesson-14-notes.md: --------------------------------------------------------------------------------

# Lesson 14: Synchronization Constructs

- Topics to be covered in this lesson:
  - More synchronization constructs
  - Hardware-supported synchronization

## Visual Metaphor

- **Synchronization** is like **waiting for a coworker to finish so you can continue working**:
  - May repeatedly check whether you can continue:
    - Sync using spinlocks
  - May wait for a signal to continue:
    - Sync using mutexes and condition variables
  - Waiting hurts performance:
    - CPUs waste cycles on checking
    - Cache effects

## More About Synchronization

- Limitations of mutexes and condition variables:
  - Error-prone/correctness/ease-of-use:
    - Unlock the wrong mutex, signal the wrong condition variable
  - Lack of expressive power:
    - Helper variables are needed for access or priority control
- Low-level support needed:
  - Hardware atomic instructions

## Spinlocks

- A spinlock is like a mutex:
  - Mutual exclusion
  - Lock and unlock (free)
- But when the lock is busy, the waiting thread spins instead of blocking

## Semaphores

- Semaphores are a common sync construct in OS kernels:
  - Similar to a traffic light (stop and go states)
  - Similar to a mutex, but more general
- **Count-based sync** (a semaphore carries an integer value):
  - On init: assigned a positive integer value (the maximum count)
  - On try (wait): if non-zero, decrement and proceed (counting semaphore)
  - On exit (post): increment
  - If initialized with `1`: the semaphore is equivalent to a mutex (binary semaphore)

## Reader Writer Locks

- Syncing different types of accesses:
  - Read (never modify): shared access
  - Write (always modify): exclusive access
- Read/write locks:
  - Specify the type of access, then the lock behaves accordingly

## Monitors

- Monitors are a high-level synchronization construct (a pthreads sketch follows below)
- Monitors specify:
  - The shared resource
  - The entry procedure
  - Possible condition variables
- On entry: lock, check, etc.
- On exit: unlock, check, signal, etc.
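Classic monitors are a language-level construct, but the entry/exit discipline can be sketched with pthreads (an assumed mapping for illustration, not lecture code): the entry procedure locks and checks the condition; the exit procedure updates state and signals.

```c
#include <pthread.h>

static pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  c = PTHREAD_COND_INITIALIZER;
static int busy = 0;                 /* state of the shared resource */

void monitor_enter(void) {           /* on entry: lock, check, wait */
    pthread_mutex_lock(&m);
    while (busy)
        pthread_cond_wait(&c, &m);   /* releases m while waiting */
    busy = 1;
    pthread_mutex_unlock(&m);
}

void monitor_exit(void) {            /* on exit: update state, signal */
    pthread_mutex_lock(&m);
    busy = 0;
    pthread_cond_signal(&c);
    pthread_mutex_unlock(&m);
}
```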
## Need for Hardware Support

- Problem: concurrent check/update operations on different CPUs can overlap
- Solution: hardware-supported atomic instructions

## Atomic Instructions

- Hardware-specific examples:
  - `test_and_set()`
  - `read_and_increment()`
  - `compare_and_swap()`
- Guarantees:
  - Atomicity
  - Mutual exclusion
  - Queue all concurrent instructions but one
- Atomic instructions make the lock acquisition a critical section with hardware-supported synchronization:

```c
// Lecture pseudocode: specialize/optimize to the available atomics.
spinlock_lock(lock):
    // Spin until the lock is free.
    while (test_and_set(lock) == busy)
        ;
```

- `test_and_set(lock)`: atomically returns (tests) the original value and sets the new value to `1` (busy)
  - First thread: `test_and_set(lock) == 0` (free), so it acquires the lock
  - Next threads: `test_and_set(lock) == 1` (busy); they merely reset the lock to `1` (busy), which it already was

## Shared Memory Multi-processors

- A multi-processor system consists of more than one CPU, with memory accessible to all CPUs (see lecture slide for bus-based vs interconnect-based)
- SMP (shared memory multi-processor): a system where a bus is shared across all modules, making the system's memory accessible to all CPUs
- SMPs also have caches:
  - Hide memory latency
  - Memory is _further away_ due to contention
  - Write policies: no-write, write-through, write-back

## Cache Coherence and Atomics

- Atomics are always issued to the memory controller:
  - Pros: can be ordered and synchronized
  - Cons: take much longer and generate coherence traffic regardless of change
- Atomics and SMPs:
  - Expensive because of bus or I/C contention
  - Expensive because of cache bypass and coherence traffic

## Spinlock Performance Metrics

- Reduce **latency**:
  - _Time to acquire a free lock_
  - Ideally, immediately execute the atomic
- Reduce **waiting time (delay)**:
  - _Time to stop spinning and acquire a lock that has been freed_
  - Ideally, immediately
- Reduce **contention**:
  - _Bus/network (I/C) traffic_
  - Ideally, zero

## Test and Set Spinlock

- Test-and-set spinlock implementation (see lecture; a C11 sketch follows below):
  - Pros:
    - Latency: minimal (just the atomic)
    - Delay: potentially minimal (spinning continuously on the atomic)
  - Cons:
    - Contention: processors go to memory on each spin
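For concreteness, here is how this lock might look with C11 atomics (an assumed mapping of the lecture's abstract `test_and_set` onto real primitives; the test-and-test-and-set variant shown second is the subject of the next section):

```c
#include <stdatomic.h>

static atomic_flag lock = ATOMIC_FLAG_INIT;

void tas_lock(void) {
    /* Every spin iteration is an atomic: simple, but high contention. */
    while (atomic_flag_test_and_set(&lock))
        ;
}

void tas_unlock(void) { atomic_flag_clear(&lock); }

static atomic_int lock2 = 0;

void ttas_lock(void) {
    for (;;) {
        while (atomic_load(&lock2))        /* spin on a (cached) read */
            ;
        if (!atomic_exchange(&lock2, 1))   /* atomic only when lock looks free */
            return;
    }
}

void ttas_unlock(void) { atomic_store(&lock2, 0); }
```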
## Test and Test and Set Spinlock

- Test-and-test-and-set spinlock implementation (see lecture):
  - First _test_ the cached value; only if the lock looks free, execute the atomic
  - Spin on the read, i.e., spin on the cached value
- Pros:
  - Latency: ok
  - Delay: ok
- Cons:
  - Contention: better than the test-and-set spinlock, but...
    - Non-cache-coherent architecture: no difference
    - Cache coherence with write-update: ok
    - Cache coherence with write-invalidate: horrible
      - Atomics plus invalidated caches mean even more contention
      - Everyone sees the lock is free at the same time
      - Everyone tries to acquire the lock at the same time

## Spinlock _Delay_ Alternatives

- Delay after lock release:
  - Everyone sees the lock is free
  - But not everyone attempts to acquire it
  - Pros:
    - Contention: improved
    - Latency: ok
  - Cons:
    - Delay: worse
- Delay after each lock reference:
  - Does not spin constantly
  - Works on non-cache-coherent architectures
  - But can hurt delay even more
  - Pros:
    - Contention: improved
    - Latency: ok
  - Cons:
    - Delay: **much worse**

## Picking a Delay

- **Static delay** (based on a fixed value, e.g., CPU ID):
  - Simple approach
  - Unnecessary delay under low contention
- **Dynamic delay** (backoff-based):
  - Random delay in a range that increases with _perceived_ contention
  - Perceived contention ~ the number of failed `test_and_set()` attempts
  - With delay after each reference, the delay keeps growing based on contention or the length of the critical section

## Queueing Lock

- **Common problems in spinlock implementations**:
  - Everyone tries to acquire the lock at the same time once it is freed: addressed by the delay alternatives
  - Everyone sees the lock is free at the same time: addressed by Anderson's queueing lock
- Solution (a sketch follows below):
  - Set a unique **ticket** for each arriving thread
  - The assigned `queue[ticket]` is a private lock
  - Enter the critical section when you have the lock:
    - `queue[ticket] == must_wait`: spin
    - `queue[ticket] == has_lock`: enter the critical section
  - Signal/set the next lock holder on exit:
    - `queue[ticket + 1] = has_lock`
- Cons:
  - Assumes a `read_and_increment` atomic
  - _O(N)_ size
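A minimal sketch of Anderson's array-based queueing lock in C11 (illustrative; a real implementation pads each slot to a cache line, which is omitted here, and `atomic_fetch_add` plays the role of `read_and_increment`):

```c
#include <stdatomic.h>

#define NCPU 64                        /* max concurrent lockers (assumed) */
enum { MUST_WAIT = 0, HAS_LOCK = 1 };

static _Atomic int flags[NCPU] = { HAS_LOCK };  /* slot 0 starts open */
static atomic_int queue_last = 0;

int qlock(void) {
    /* read_and_increment: take a ticket, i.e., a private slot to spin on. */
    int ticket = atomic_fetch_add(&queue_last, 1) % NCPU;
    while (atomic_load(&flags[ticket]) == MUST_WAIT)
        ;                              /* spin only on my own slot */
    return ticket;
}

void qunlock(int ticket) {
    atomic_store(&flags[ticket], MUST_WAIT);             /* reset my slot for reuse */
    atomic_store(&flags[(ticket + 1) % NCPU], HAS_LOCK); /* signal exactly one waiter */
}
```

Only the thread spinning on `queue[ticket + 1]` notices the release, which is why contention drops compared to the test-and-set variants.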
## Queueing Lock Implementation

- Pros:
  - Delay: directly signals the next CPU/thread to run
  - Contention: better, but requires cache coherence and cache-line-aligned elements
  - Only one CPU/thread sees that the lock is free and tries to acquire it!
- Cons:
  - Latency: the read-and-increment atomic is more costly

## Spinlock Performance Comparisons

- **Setup** (see lecture for figure):
  - _N_ processes running a critical section one million times
  - _N_ varied based on the system
- **Metrics**:
  - Overhead compared to ideal performance
  - Theoretical limit based on the number of critical sections to be run
- **Under high loads**:
  - Queueing lock best (most scalable), `test_and_test_and_set` worst
  - Static delay better than dynamic; delay-after-reference better than delay-after-release (avoids extra invalidations)
- **Under light loads**:
  - `test_and_test_and_set` good (low latency)
  - Dynamic delay better than static (lower delay)
  - Queueing lock worst (high latency due to read-and-increment)

-------------------------------------------------------------------------------- /Lesson 18 - Distributed File Systems/lesson-18-notes.md: --------------------------------------------------------------------------------

# Lesson 18: Distributed File Systems

- Topics to be covered in this lesson:
  - DFS (distributed file systems) design and implementation
  - NFS (network file systems)

## Visual Metaphor

- **Distributed file systems** are like **distributed storage facilities**:
  - Accessed via a well-defined interface:
    - Access via VFS
  - Focus on consistent state:
    - Tracking state, file updates, cache coherence, etc.
  - Mixed distribution models are possible:
    - Replicated vs partitioned, peer-like systems, etc.

## Distributed File Systems

- _See lecture for figure_
- **DFS**: an environment where multiple machines are involved in the delivery of the file system service
  - Includes file-system interfaces that use the VFS interface to abstract and hide the file system organization
  - Uses the OS to hide the local physical storage on a machine
  - Files are maintained on a remote machine, or on a remote file system accessed over the network

## DFS Models

- **Client/server on different machines**
- File server distributed across multiple machines:
  - **Replicated** (each server: all files)
  - **Partitioned** (each server: part of the files)
  - **Both** (files partitioned; each partition replicated)
- Files stored on and served from all machines (peers):
  - Blurred distinction between clients and servers

## Remote File Service: Extremes

- **Extreme 1: upload/download**
  - Like FTP, SVN, etc.
- **Extreme 2: true remote file access**
  - Every access goes to the remote file; nothing is done locally
  - Pros:
    - File accesses are centralized, easy to reason about consistency
  - Cons:
    - Every file operation pays the network cost
    - Limits server scalability

## Remote File Service: A Compromise

- A more practical remote file access (with caching):
- Allow clients to store parts of files locally (blocks)
  - Pros:
    - Low latency on file operations
    - Server load reduced: more scalable
- Force clients to interact with the server (frequently)
  - Pros:
    - Server has insight into what clients are doing
    - Server has control over which accesses can be permitted: easier to maintain consistency
  - Cons:
    - Server is more complex, and different file sharing semantics are required

## Stateless vs Stateful File Server

- **Stateless**: keeps no state; ok with the extreme models, but cannot support the _practical_ model
  - Pros:
    - No resources used on the server side (CPU/memory)
    - On failure, just restart
  - Cons:
    - Cannot support caching and consistency management
    - Every request must be self-contained: more bits transferred
- **Stateful**: keeps client state; needed for the _practical_ model to track what is cached/accessed
  - Pros:
    - Can support locking, caching, and incremental operations
  - Cons:
    - On failure, needs check-pointing and recovery mechanisms
    - Overheads to maintain state and consistency: depend on the caching mechanism and consistency protocol

## Caching State in a DFS

- Clients locally maintain a portion of the state (e.g., file blocks)
- Clients locally perform operations on cached state (e.g., open/read/write): requires coherence mechanisms
- How?
  - SMP: write-update/write-invalidate
  - DFS: client-driven or server-driven
- When?
  - SMP: on write
  - DFS: on demand, periodically, on open, etc.
- Details depend on the file sharing semantics

## File Sharing Semantics on a DFS

- **UNIX semantics**: every write is visible immediately
- **Session semantics** (between open and close = a session):
  - Write-back on `close()`, update on `open()`
  - Easy to reason about, but may be insufficient
- **Periodic updates**:
  - Client writes back periodically: clients have a _lease_ on cached data (not necessarily exclusive)
  - Server invalidates periodically: provides bounds on _inconsistency_
  - Augment with a `flush()`/`sync()` API
- **Immutable files**: never modify; create new files instead
- **Transactions**: all changes are atomic

## File vs Directory Service

- Too many options? Consider:
  - Sharing frequency
  - Write frequency
  - Importance of a consistent view
- Optimize for the common case
- Two types of files:
  - Regular files vs directories
  - Choose different policies for each:
    - E.g., session semantics for files, UNIX semantics for directories
    - E.g., less frequent write-back for files than for directories

## Replication and Partitioning

- **Replication**: each machine holds all files
  - Pros: load balancing, availability, fault tolerance
  - Cons:
    - Writes become more complex:
      - Write synchronously to all replicas
      - Or write to one replica, then propagate to the others
    - Replicas must be reconciled:
      - E.g., voting
- **Partitioning**: each machine holds a subset of the files
  - Pros:
    - Availability vs a single-server DFS
    - Scalability with file system size
    - Single-file writes are simpler
  - Cons:
    - On failure, a portion of the data is lost
    - Load balancing is harder; if not balanced, then hot-spots are possible
- Can combine both techniques: replicate each partition

## NFS (Network File System)

- _See lecture for figure_
- NFS includes clients and a server; clients access files on the remote server over the network
- **Client**:
  - A client's request for file access starts at the _system call layer_ and moves to the _VFS layer_
  - At the _VFS layer_, a decision is made about where the file belongs (the _local file system interface_ or the _NFS client_)
  - If the _NFS client_ is chosen, the request moves on to the _RPC client stub_, which communicates with the _RPC server stub_
- **Server**:
  - Continuing from the _RPC server stub_, the call makes its way to the _NFS server_, which resides on the remote machine
  - The _NFS server_ communicates with the _VFS layer_ on the server side to get access to the file
  - From the _VFS layer_ down, the layout is about the same as on the client side
  - When an open request comes from a client, the _NFS server_ creates a file handle (a byte sequence that encodes both the server machine and the server-local file information), which is returned to the client
  - If the file is deleted or the server machine dies, using the file handle returns a _stale_ (invalid) error

## NFS Versions

- Around since the 80s; currently NFSv3 and NFSv4
  - NFSv3: stateless; NFSv4: stateful
- **Caching**:
  - Session-based (for non-concurrent access)
  - Periodic updates
    - Default: three seconds for files; 30 seconds for directories
  - NFSv4: delegation to a client for a period of time (avoids _update checks_)
- **Locking**:
  - Lease-based
  - NFSv4: also _share reservation_, a reader/writer lock

## Sprite Distributed File Systems

- _Caching in the Sprite Network File System_, by Nelson et al.
- A research DFS
- Great value in the explanation of the design process
- Takeaway: used trace data on usage/file access patterns to analyze DFS design requirements and justify decisions

## Sprite DFS Access Pattern Analysis

- **Access pattern (workload) analysis**:
  - 33% of all file accesses are writes
    - Caching is ok, but write-through is not sufficient
  - 75% of files are open less than 0.5 seconds; 90% of files are open less than 10 seconds
    - Session semantics still carry too much overhead
  - 20-30% of new data is deleted within 30 seconds; 50% of new data is deleted within 5 minutes
    - Write-back on close is not necessary
  - File sharing is rare!
    - No need to optimize for concurrent access, but it must be supported

## Sprite DFS from Analysis to Design

- **From analysis to design**:
  - Cache with write-back
    - Every 30 seconds, write back blocks that have NOT been modified for the last 30 seconds
  - When another client opens a file: get the dirty blocks
  - Open goes to the server; directories are not cached
  - On _concurrent write_: disable caching
- **Sprite sharing semantics**:
  - Sequential write sharing: caching and sequential semantics
  - Concurrent write sharing: no caching

## File Access Operations in Sprite

- $R_1 \ldots R_n$ **readers**, $w_1$ **writer**:
  - All `open()` calls go through the server
  - All clients cache blocks
  - The writer keeps timestamps for each modified block
- $w_2$ **sequential writer** (sequential sharing):
  - The server contacts the last writer for its dirty blocks
  - If $w_2$ has not closed the file: caching is disabled!

-------------------------------------------------------------------------------- /Lesson 17 - Remote Procedure Calls/lesson-17-notes.md: --------------------------------------------------------------------------------

# Lesson 17: Remote Procedure Calls

- Topics to be covered in this lesson:
  - RPC (remote procedure calls)

## Why RPC?

- **Example 1: Get File App**
  - Client-server
  - **Create** and **init sockets**
  - **Allocate** and **populate buffers**
  - **Include _protocol_ info** (e.g., get file, size, etc.)
  - **Copy data** into buffers (e.g., filename, file, etc.)
- **Example 2: Mod Image App**
  - Client-server
  - **Create** and **init sockets**
  - **Allocate** and **populate buffers**
  - **Include _protocol_ info** (e.g., algorithm, parameters, etc.)
  - **Copy data** into buffers (e.g., image data, etc.)
- The common steps (highlighted in bold) relate to remote IPC (inter-process communication): hence RPC

## Benefits of RPC

- RPC is intended to simplify the development of cross-address-space and cross-machine interactions
- Pros:
  - Higher-level interface for data movement and communication
  - Error handling
  - Hides the complexities of cross-machine interactions

## RPC Requirements

- Some requirements of RPC systems include:
  - Client/server interactions
  - Procedure call interface:
    - Sync call semantics
  - Type checking:
    - Error handling
    - Packet byte interpretation
  - Cross-machine conversion:
    - E.g., big/little endian
  - Higher-level protocol:
    - Access control, fault tolerance, etc.
    - Support for different transport protocols

## Structure of RPC

- _See lecture slides for figure_
- The execution order of an RPC is as follows:
  1. Client makes a call to the procedure
  2. Client stub builds a message
  3. Message is sent across the network
  4. Server OS hands the message to the server stub
  5. Server stub unpacks the message
  6. Server stub makes a local call to _add_

## Steps in RPC

- The general steps in RPC are:
  1. **Register**: server _registers_ the procedure, argument types, location, etc.
  2. **Bind**: client finds and _binds_ to the desired server
  3. **Call**: client makes the RPC call; control passes to the stub, client code blocks
  4. **Marshal**: client stub _marshals_ the arguments (serializes arguments into a buffer)
  5. **Send**: client sends the message to the server
  6. **Receive**: server receives the message; passes it to the server stub; access control
  7. **Unmarshal**: server stub _unmarshals_ the arguments (extracts arguments and creates data structures)
  8. **Actual call**: server stub calls the local procedure implementation
  9. **Result**: server performs the operation and computes the result of the RPC operation
- The steps are similar on the return path

## Interface Definition Language

- What can the server do?
- What arguments are required for the various operations (agreement needed!)?
- Why:
  - Client-side bind decision
  - Run-time can automate stub generation: IDL (interface definition language)

## Specifying an IDL

- An IDL is used to describe the interface the server exports
- RPC can use an IDL that is:
  - Language-agnostic
  - Language-specific
- Remember that an IDL is just an interface, not an implementation!

## Binding

- The client determines:
  - Which server should it connect to?
  - How will it connect to that server?
- **Registry**: a database of available services
  - Search by service name to find a service (which) and its contact details (how)
  - **Distributed**:
    - Any RPC service can register
  - **Machine-specific**:
    - For services running on the same machine
    - Clients must know the machine address; the registry provides the port number needed for the connection
  - **Needs a naming protocol**:
    - Exact match for _add_
    - Or also consider _summation_, _sum_, _addition_, etc.

## Visual Metaphor

- Applications use **binding and registries** like toy shops use directories of **outsourcing services**:
  - Who can provide services?
    - Look up image processing in the registry
  - What services are provided?
    - Compress, filter, version number, etc. (IDL)
  - How will they send/receive?
    - TCP/UDP (registry)

## What About Pointers?

- Solutions:
  - No pointers!
  - Serialize pointers: copy the referenced (_pointed-to_) data structure into the send buffer

## Handling Partial Failures

- When a client hangs, what is the problem?
  - Server, service, or network down? Message lost?
  - Timeout and retry: no guarantees!
- Special RPC error notification (signal, exception, etc.):
  - A catch-all for the ways in which an RPC call can (partially) fail

## RPC Design Choice Summary

- **Design decisions for RPC systems** (e.g., Sun RPC, Java RMI):
  - **Binding**: how to find the server
  - **IDL**: how to talk to the server; how to package data
  - **Pointers as arguments**: disallow, or serialize the pointed-to data
  - **Partial failures**: special error notifications

## What is Sun RPC?

- Sun RPC was developed in the 80s by Sun for UNIX; now widely available on other platforms
- Design choices:
  - **Binding**: per-machine registry daemon
  - **IDL**: XDR (for interface specification and for encoding)
  - **Pointers as arguments**: allowed and serialized
  - **Partial failures**: retries; return as much information as possible

## Sun RPC Overview

- _See lecture for figure_
- Client-server via procedure calls
- Interface specified via XDR (a `.x` file)
- `rpcgen` compiler: converts the `.x` file into language-specific stubs
- Server registers with the local registry daemon
- Registry:
  - Name of service, version, protocol(s), port number, etc.
- Binding creates a handle:
  - Client uses the handle in calls
  - RPC run-time uses the handle to track per-client RPC state
- Client and server can be on the same or different machines
- Documentation, tutorials, and examples are now maintained by Oracle
  - TI-RPC: transport-independent Sun RPC
  - Provides Sun RPC/XDR documentation and code examples
  - Older online references are still relevant
  - Linux man pages for _rpc_

## Compiling XDR

- `rpcgen` compiler outputs (for `square.x`):
  - `square.h`: data types and function definitions
  - `square_svc.c`: server stub and skeleton (main)
  - `square_clnt.c`: client stub
  - `square_xdr.c`: common marshalling routines

## Summarizing XDR Compilation

- _See lecture for figure_
- **Developer**:
  - Server side: implementation of `squareproc_1_svc`
  - Client side: call `squareproc_1()` (a sketch follows below)
  - `#include "square.h"`
  - Link with the stub objects
- **RPC run-time does the rest**:
  - OS interactions, communication management, etc.
- `rpcgen -C square.x`: not thread safe!
- `rpcgen -C -M square.x`: multi-threading safe!
  - Does not make the _svc.c_ server itself multi-threaded
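To make the developer's side concrete, a client for the `square.x` example might look like the following sketch (the type and constant names `square_in`, `square_out`, `SQUARE_PROG`, and `SQUARE_VERS` follow common `rpcgen` conventions but are assumptions here, not the course's exact code):

```c
#include <stdio.h>
#include <stdlib.h>
#include <rpc/rpc.h>
#include "square.h"   /* generated by rpcgen from square.x */

int main(int argc, char **argv) {
    if (argc != 3) { fprintf(stderr, "usage: %s host num\n", argv[0]); exit(1); }

    /* Bind: look up the service via the remote registry, get a handle. */
    CLIENT *clnt = clnt_create(argv[1], SQUARE_PROG, SQUARE_VERS, "tcp");
    if (clnt == NULL) { clnt_pcreateerror(argv[1]); exit(1); }

    square_in in = { .arg1 = atoi(argv[2]) };
    square_out *out = squareproc_1(&in, clnt);   /* the actual RPC call */
    if (out == NULL)
        clnt_perror(clnt, "rpc failed");         /* partial-failure notification */
    else
        printf("%d^2 = %d\n", in.arg1, out->res1);

    clnt_destroy(clnt);
    return 0;
}
```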
## Sun RPC Registry

- **RPC daemon**: the port mapper
- **Query with** `rpcinfo -p`:
  - `/usr/sbin/rpcinfo -p`
  - Program ID, version, protocol (tcp, udp), socket port number, service name, etc.
- The port mapper runs on port 111, over both tcp and udp

## Sun RPC Binding

- **Client type**:
  - Client handle
  - Status, error, authentication, etc.

## XDR Data Types

- **Default types**:
  - `char`, `byte`, `int`, `float`
- Additional XDR types:
  - `const` (compiles to `#define`)
  - `hyper` (64-bit integer)
  - `quadruple` (128-bit float)
  - `opaque` (similar to a C `byte`):
    - Uninterpreted binary data
- **Fixed-length array** (e.g., `int data[80]`)
- **Variable-length array** (e.g., `int data<80>`): translates into a data structure with _len_ and _val_ fields
- **Except for strings**:
  - `string line<80>`: a C pointer to `char`
  - Stored in memory as a normal null-terminated string
  - Encoded (for transmission) as a pair of length and data

## XDR Routines

- **Marshalling/unmarshalling**: found in `square_xdr.c`
- **Clean-up**:
  - `xdr_free()`
  - User-defined `freeresult` procedure (e.g., `square_prog_1_freeresult`)
  - Called after the results are returned

## Encoding

- What goes on the wire?
  - **Transport header** (e.g., TCP, UDP)
  - **RPC header**: service procedure ID, version number, request ID, etc.
  - **Actual data**:
    - Arguments or results
    - Encoded into a byte stream depending on the data type

## XDR Encoding

- **XDR**: IDL **+ the encoding** (i.e., the binary representation of data _on-the-wire_)
- **XDR encoding rules**:
  - All data types are encoded in multiples of four bytes
  - Big endian is the transmission standard
  - Two's complement is used for integers
  - IEEE format is used for floating point
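A worked example of these rules (the values are illustrative): a `string data<10>` holding `"Hello"` is transmitted as a four-byte length field containing 5, then the five ASCII bytes, then three bytes of padding to reach the next multiple of four — 12 bytes on the wire for what is a 6-byte string in memory (5 characters plus the null terminator, which is not transmitted).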
## Java RMI

- **Java RMI (Remote Method Invocation)**:
  - Among address spaces in JVM(s)
  - Matches Java OO semantics
  - **IDL**: Java (language-specific)
- **RMI run-time**:
  - **Remote reference layer**:
    - Unicast, broadcast, return-first-response, return-if-all-match
  - **Transport**:
    - TCP, UDP, shared memory, etc.

-------------------------------------------------------------------------------- /Lesson 11 - Scheduling/lesson-11-notes.md: --------------------------------------------------------------------------------

# Lesson 11: Scheduling

- Topics to be covered in this lesson:
  - Scheduling mechanisms, algorithms, and data structures
  - Linux O(1) and CFS schedulers
  - Scheduling on multi-CPU platforms

## Visual Metaphor

- Like an **OS scheduler**, a **toy shop manager** schedules work:
  - Assign tasks immediately:
    - Scheduling is simple (first-come first-serve)
  - Assign simple tasks first:
    - Maximize throughput (shortest job first)
  - Assign complex tasks first:
    - Maximize utilization of CPU, devices, memory, etc.

## CPU Scheduling

- The **CPU scheduler**:
  - Decides how and when **processes** (and their **threads**) access shared CPUs
  - Schedules tasks running **user-level processes/threads** as well as **KLTs**
  - Chooses one of the ready **tasks** to run on a CPU when:
    - The CPU becomes idle
    - A new **task** becomes ready
    - The time-slice expires (timeout)
  - The chosen **thread** is dispatched on the CPU
- Scheduling is equivalent to choosing a **task** from the ready queue (the run-queue):
  - Which **task** should be selected?
    - Determined by the scheduling policy/algorithm
  - How is this done?
    - Depends on the run-queue data structure (the run-queue and the scheduling algorithm are tightly coupled)

## Run To Completion Scheduling

- Initial assumptions:
  - Group of tasks/jobs
  - Known execution times
  - No preemption
  - Single CPU
- Metrics:
  - Throughput
  - Average job completion time
  - Average job wait time
  - CPU utilization
- First-come first-serve (FCFS):
  - Schedules tasks in order of arrival
  - Run-queue: a FIFO queue
- Shortest job first (SJF):
  - Schedules tasks in order of their execution time
  - Run-queue: an ordered queue, or a tree

## Preemptive Scheduling: SJF + Preempt

- **SJF + preemption**:
  - _T2_ arrives first
  - _T2_ should be preempted when a shorter task arrives
- **Heuristics based on history**: estimate the job's running time
  - How long did the task run last time?
  - How long did the task run the last _n_ times?

## Preemptive Scheduling: Priority

- **Priority scheduling**:
  - Tasks have different priority levels
  - Run the highest-priority task next (preemption)
  - Run-queue: per-priority queues, a tree ordered by priority, etc.
  - Low-priority tasks can get stuck in the run-queue (starvation)
    - _Priority aging_: `priority = f(actual priority, time spent in run-queue)`
    - Eventually the task will run (prevents starvation!)

## Priority Inversion

- Assume SJF (see lecture for table and graph):
  - Priorities: _T1_, _T2_, _T3_ (highest to lowest)
  - Order of execution: _T2_, _T3_, _T1_ (priorities inverted)
- Solution:
  - Temporarily boost the priority of the mutex owner
  - Lower it again on release

## Round Robin Scheduling

- Pick the first task from the queue (like FCFS)
- A task may yield, e.g., to wait on I/O (unlike FCFS)
- Round robin with priorities:
  - Include preemption
- Round robin with interleaving:
  - Time-slicing

## Time-sharing and Time-slices

- **Time-slice**: the maximum amount of uninterrupted time given to a task (time quantum)
- A task may run for less than its time-slice:
  - Has to wait for I/O, synchronization, etc. (will be placed on a wait queue)
  - A higher-priority task becomes runnable
- Using time-slices, tasks are interleaved (time-sharing the CPU):
  - CPU-bound tasks are preempted after their time-slice
- Pros:
  - Short tasks finish sooner
  - More responsive
  - Lengthy I/O operations are initiated sooner
- Cons:
  - Overheads (interrupt, schedule, context switch)

## Summarizing Time-slice Length

- How long should a time-slice be? (See the worked numbers below.)
- **CPU-bound tasks prefer longer time-slices**:
  - Limits context switching overheads
  - Keeps CPU utilization and throughput high
- **I/O-bound tasks prefer shorter time-slices**:
  - I/O-bound tasks can issue I/O operations earlier
  - Keeps both CPU and device utilization high
  - Better user-perceived performance
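Illustrative arithmetic (the overhead value is assumed): with a 0.1 ms context-switch overhead, a 1 ms time-slice spends 0.1 / (1 + 0.1) ≈ 9% of the machine on overhead, while a 100 ms time-slice spends only about 0.1%. The longer slice wins on throughput and CPU utilization, but an I/O-bound task that becomes runnable may now wait up to 100 ms before it can issue its next I/O request.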
## Run-queue Data Structure

- If we want I/O-bound and CPU-bound tasks to have different time-slice values, then either:
  - Use the same run-queue and check the task type, or
  - Use two different structures
- One solution: a multi-queue data structure with separate internal queues
  - First queue (shortest time-slice): most I/O-intensive tasks (highest priority)
  - Second queue (medium time-slice): medium I/O-intensive tasks (a mix of I/O and CPU processing)
  - Third queue and beyond (longest time-slice): CPU-intensive tasks (lowest priority)
- Pros:
  - Time-slicing benefits provided for I/O-bound tasks
  - Time-slicing overheads avoided for CPU-bound tasks
- Handling the different time-slice values:
  - Tasks enter the top-most queue
  - If a task yields voluntarily: keep it at this level
  - If a task uses up its entire time-slice: push it down to a lower level
  - A task in a lower queue gets a priority boost when releasing the CPU due to I/O waits
- In summary, the MLFQ (multi-level feedback queue) is not just a priority queue: it has a feedback mechanism and offers different treatment of threads at each level

## Linux O(1) Scheduler

- The Linux O(1) scheduler has several unique characteristics:
  - The name **O(1)** means it takes constant time to select/add a task, regardless of the task count
  - **Preemptive, priority-based**:
    - Real-time priorities (0-99)
    - Time-sharing priorities (100-139)
  - **User processes**:
    - Default priority 120
    - Nice values (-20 to 19)
- Time-slice values in the Linux O(1) scheduler:
  - Depend on priority
  - Smallest for low priority
  - Highest for high priority
- Feedback in the Linux O(1) scheduler:
  - Based on sleep time: waiting/idling
  - Longer sleep: interactive
  - Smaller sleep: compute-intensive
- Run-queue in the Linux O(1) scheduler: two arrays of tasks...
  - Active:
    - Used to pick the next task to run
    - Constant time to add/select
    - Tasks remain queued in the active array until their time-slice expires
  - Expired:
    - Inactive list
    - When there are no more tasks in the active array, swap active and expired

## Linux CFS Scheduler

- Problems with the Linux O(1) scheduler:
  - Performance of interactive tasks is not satisfactory
  - Lacks fairness in task prioritization
- Solution: Linux CFS (Completely Fair Scheduler)
  - CFS is the default scheduler since Linux 2.6.23
- Run-queue is based on a red-black tree:
  - Ordered by `vruntime`, where `vruntime` is the time spent on the CPU
- CFS scheduling works as follows:
  - Always pick the left-most node
  - Periodically adjust `vruntime`
  - Compare to the left-most `vruntime`:
    - If smaller, continue running
    - If larger, preempt and place the task appropriately in the tree
- `vruntime` progress rate depends on priority and niceness:
  - The rate is faster for low priority
  - The rate is slower for high priority
  - The same tree is used for all priorities
- Performance:
  - Select a task: _O(1)_
  - Add a task: _O(log(n))_

## Scheduling on Multi-processors

- **Cache affinity** is important!
  - Keep tasks on the same CPU as much as possible
  - Hierarchical scheduler architecture
- **Per-CPU run-queue** and **scheduler**:
  - Load balance across CPUs based on queue length, or when a CPU is idle
- **NUMA (Non-Uniform Memory Access)**:
  - Multiple memory nodes
  - A memory node is closer to a _socket_ of multiple processors:
    - Access to the local memory node is faster than access to a remote memory node
  - Keep tasks on a CPU close to the memory node where their state is
  - NUMA-aware scheduling

## Hyper-threading

- Multiple hardware-supported execution contexts
- Still one CPU, but with a **very fast** context switch:
  - Hardware multi-threading
  - Hyper-threading
  - CMT (chip multi-threading)
  - SMT (simultaneous multi-threading)

## Scheduling for Hyper-threading

- **Assumptions**:
  - A thread issues an instruction on each cycle (max IPC of one, where IPC = instructions per cycle)
  - A memory access takes four cycles
  - Hardware context switching is instantaneous
  - SMT with two hardware threads
- Co-scheduling CPU-bound threads: they _interfere_ and _contend_ for CPU pipeline resources:
  - Performance degrades
  - The memory component idles
- Co-scheduling memory-bound threads:
  - The CPU idles: wasted CPU cycles
- Co-scheduling a mix of CPU- and memory-intensive threads:
  - Avoids/limits contention on the processor pipeline
  - All components (CPU and memory) are well utilized
  - Still leads to some interference and degradation, but it is minimal

## CPU-bound or Memory-bound?

- Use historic information:
  - _Sleep time_ won't work:
    - The thread is not sleeping when waiting on memory
    - Software-based tracking takes too much time to compute
  - What about hardware counters?
    - Hardware counters estimate what kind of resources a thread needs
- The scheduler can then make informed decisions:
  - Typically using multiple counters
  - Models with per-architecture thresholds
  - Based on well-understood workloads

## CPI Experiment Results

- There is resource contention in SMTs for the processor pipeline
- Hardware counters can be used to characterize the workload
- Schedulers should be aware of resource contention, not just load balancing

-------------------------------------------------------------------------------- /Lesson 05 - Threads and Concurrency/lesson-05-notes.md: --------------------------------------------------------------------------------

# Lesson 5: Threads and Concurrency

- Topics to be covered in this lesson:
  - What are threads?
  - How do threads differ from processes?
  - What data structures are used to implement and manage threads?

## What is a Thread?

- A **thread** is like a **worker in a toy shop** in that a thread:
  - Is an active entity (e.g., an executing unit of a process)
  - Works simultaneously with others (e.g., many threads executing)
  - Requires coordination (e.g., sharing of I/O devices, CPUs, memory, etc.)
15 |
16 | ## Process vs Thread
17 |
18 | - A single-threaded process is represented by its address space and one execution context
19 | - Threads represent multiple, independent execution contexts
20 | - Threads are part of the same virtual address space: all threads share all of the virtual-to-physical address mappings as well as the code, data, and files
21 | - Key differences:
22 |   - Threads will potentially execute different instructions, access different portions of that address space, operate on different portions of the input, and differ in other ways
23 |   - Each thread needs its own program counter, stack pointer, stack, and thread-specific registers
24 |   - Implication: for each thread we must have separate data structures to represent this per-thread information; consequently, the OS needs a more complex PCB structure for a multi-threaded process than for a single-threaded one
25 |
26 | ## Why are threads useful?
27 |
28 | - Threads can implement **parallelization**, which can process the input much faster than if a single thread on a single CPU had to process, say, an entire matrix
29 | - Threads may execute completely different portions of the program
30 | - Threads can also exploit **specialization**, which takes advantage of the hot cache on the CPU where a thread repeatedly runs
31 | - A multi-threaded application is more memory efficient and has lower memory requirements than its multi-process alternative
32 | - Additionally, a multi-threaded application incurs lower overheads for its inter-thread communication than the corresponding inter-process alternatives
33 |
34 | ## Basic Thread Mechanisms
35 |
36 | - **Thread data structure** - identify threads, keep track of resource usage, etc.
37 | - Mechanisms to **create** and **manage** threads
38 | - Mechanisms to safely **coordinate** among threads running **concurrently** in the same address space
39 | - Processes:
40 |   - Operate within their own address spaces
41 |   - The OS and hardware make sure that no access from one address space is allowed on memory that belongs to another
42 | - Threads:
43 |   - Share the same virtual-to-physical address mappings
44 |   - Can access the same data at the same time (concurrency issues)
45 | - To address concurrency issues we use **synchronization mechanisms**:
46 |   - **Mutual exclusion**:
47 |     - Exclusive access for only one thread at a time
48 |     - **Mutex** (mutual exclusion object) - a program object created so that multiple program threads can take turns sharing the same resource
49 |   - **Waiting** on other threads:
50 |     - Wait for a specific condition before proceeding
51 |     - **Condition variable** - a container of threads that are waiting for a certain condition
52 |   - Waking up other threads from a wait state
53 |
54 | ## Thread Creation
55 |
56 | - There are three main elements in the thread creation process (the sketch below shows the PThreads equivalents):
57 |   - **Thread type**:
58 |     - Thread data structure
59 |   - **Fork (proc, args)**:
60 |     - Create a thread
61 |     - Not the UNIX fork
62 |   - **Join (thread)**:
63 |     - Wait for the thread to terminate and retrieve its result
64 |
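- As referenced above, a minimal, hedged sketch of how the generic fork/join calls map onto PThreads; the thread count and the "Hello Thread" message are arbitrary illustration choices:

```c
/* Minimal sketch mapping the generic fork/join API onto PThreads:
 * pthread_create() plays the role of fork(proc, args), and
 * pthread_join() waits for a child thread to terminate. */
#include <pthread.h>
#include <stdio.h>

#define NUM_THREADS 4  /* arbitrary illustrative count */

void *hello(void *arg) {
    printf("Hello Thread %d\n", *(int *)arg);
    return NULL;
}

int main(void) {
    pthread_t threads[NUM_THREADS];
    int ids[NUM_THREADS];

    for (int i = 0; i < NUM_THREADS; i++) {
        ids[i] = i;  /* give each thread a private argument */
        pthread_create(&threads[i], NULL, hello, &ids[i]);
    }
    for (int i = 0; i < NUM_THREADS; i++)
        pthread_join(threads[i], NULL);  /* wait for each child */
    return 0;
}
```

- Compile with `-lpthread`; note each thread receives a pointer to its own `ids[i]` slot, so no two threads share a mutable argument.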
65 | ## Mutual Exclusion
66 |
67 | - **Mutex** - a lock that should be used whenever accessing data or state that is shared among threads
68 | - When a thread locks a mutex (also termed acquiring the mutex), it has exclusive access to the resource until it unlocks the mutex
69 | - A mutex has the following information:
70 |   - Is the mutex locked?
71 |   - Which thread owns the mutex?
72 |   - Which threads are blocked?
73 | - **Critical section** - the portion of code protected by the mutex
74 |
75 | ## Condition Variable
76 |
77 | - Condition variables can be used in conjunction with mutexes to control the behavior of concurrent threads
78 | - In the producer/consumer example in lecture, the producer and the consumer each check a condition on the shared list (is it full? is it empty?) before moving forward
79 | - A condition variable handles this waiting: `Wait()` **releases** the mutex so that, e.g., producers can finish filling up the list, and **re-acquires** the mutex before the `Wait()` statement returns
80 |
81 | ## Condition Variable API
82 |
83 | - A condition variable API consists of the following (a producer/consumer sketch follows this list):
84 |   - **Condition** type
85 |   - **Wait (mutex, condition)**:
86 |     - The mutex is automatically released and re-acquired on wait
87 |   - **Signal (condition)**:
88 |     - Notify one thread waiting on the condition
89 |   - **Broadcast (condition)**:
90 |     - Notify all waiting threads
91 |
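- The sketch promised above: a hedged producer/consumer illustration using POSIX condition variables (the buffer size and item counts are arbitrary). The `while` re-check around `pthread_cond_wait()` also guards against the spurious wake-ups discussed in the pitfalls below:

```c
/* Hedged sketch of the producer/consumer pattern with POSIX condition
 * variables; sizes and counts are arbitrary illustration values. */
#include <pthread.h>
#include <stdio.h>

#define MAX_ITEMS 8

static int list[MAX_ITEMS];
static int count = 0;  /* items currently in the list */
static pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t not_full  = PTHREAD_COND_INITIALIZER;
static pthread_cond_t not_empty = PTHREAD_COND_INITIALIZER;

void *producer(void *arg) {
    for (int i = 0; i < 32; i++) {
        pthread_mutex_lock(&m);
        while (count == MAX_ITEMS)            /* re-check on every wake-up */
            pthread_cond_wait(&not_full, &m); /* releases m while waiting  */
        list[count++] = i;
        pthread_cond_signal(&not_empty);      /* wake one waiting consumer */
        pthread_mutex_unlock(&m);
    }
    return NULL;
}

void *consumer(void *arg) {
    for (int i = 0; i < 32; i++) {
        pthread_mutex_lock(&m);
        while (count == 0)
            pthread_cond_wait(&not_empty, &m);
        printf("consumed %d\n", list[--count]);
        pthread_cond_signal(&not_full);
        pthread_mutex_unlock(&m);
    }
    return NULL;
}

int main(void) {
    pthread_t p, c;
    pthread_create(&p, NULL, producer, NULL);
    pthread_create(&c, NULL, consumer, NULL);
    pthread_join(p, NULL);
    pthread_join(c, NULL);
    return 0;
}
```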
92 | ## Common Pitfalls
93 |
94 | - Keep track of the mutex/condition variables used with a resource:
95 |   - e.g., `mutex_type m1; // mutex for file1`
96 | - Check that you are always (and correctly) using lock and unlock:
97 |   - e.g., did you forget to lock/unlock? What about compilers?
98 | - Use a single mutex to access a single resource!
99 |   - We do not want reads and writes to happen concurrently!
100 | - Check that you are signaling the correct condition
101 | - Check that you are not using signal when broadcast is needed:
102 |   - Signal - only one thread will proceed; the remaining threads will continue to wait, possibly indefinitely!
103 | - Ask yourself: do you need priority guarantees?
104 |   - Thread execution order is not controlled by signals to condition variables!
105 | - Other pitfalls include:
106 |   - Spurious wake-ups
107 |   - Deadlocks
108 |
109 | ## Spurious Wake-ups
110 |
111 | - Spurious wake-ups occur when threads are woken up only to find they cannot proceed, wasting cycles on context switching them onto the CPU and then back onto the wait queue
112 | - When you signal/broadcast before unlocking a mutex, the woken threads cannot acquire the lock yet
113 | - Solution: signal/broadcast after the mutex is unlocked; however, this reordering only works in some cases (e.g., the write-to-file example in lecture)
114 |
115 | ## Deadlocks
116 |
117 | - Deadlocks occur when two or more competing threads are waiting on each other to complete, but none of them ever do
118 | - Solution: a good general solution is to maintain a lock order, e.g., always first `m_a` then `m_b`
119 |
120 | ## Kernel vs User level Threads
121 |
122 | - Kernel level:
123 |   - Kernel-level threads imply that the OS itself is multi-threaded
124 |   - Kernel threads are managed by kernel-level components like the kernel-level scheduler (the OS scheduler decides how kernel-level threads are mapped onto the physical CPUs and which of the threads will execute)
125 | - User level:
126 |   - The processes themselves are multi-threaded
127 |   - For a user-level thread to execute, it must be associated with a kernel-level thread, and the OS-level scheduler must schedule that kernel-level thread onto a CPU
128 | - What is the relationship between a kernel-level thread and a user-level thread?
129 |
130 | ## Multi-threading models
131 |
132 | - **One-to-one model**:
133 |   - Pros:
134 |     - OS sees/understands threads, synchronization, blocking, etc.
135 |   - Cons:
136 |     - Must go to the OS for all operations (may be expensive)
137 |     - OS may have limits on policies, thread numbers
138 |     - Portability
139 | - **Many-to-one model**:
140 |   - Pros:
141 |     - Totally portable; does not depend on OS limits and policies
142 |   - Cons:
143 |     - OS has no insight into application needs
144 |     - OS may block the entire process if one user-level thread blocks on I/O
145 | - **Many-to-many model**:
146 |   - Pros:
147 |     - Can be the best of both worlds
148 |     - Can have bound or unbound threads
149 |   - Cons:
150 |     - Requires coordination between user- and kernel-level thread managers
151 |
152 | ## Scope of Multi-threading
153 |
154 | - **Process scope**:
155 |   - User-level library manages threads within a single process
156 | - **System scope**:
157 |   - System-wide thread management by OS-level thread managers (e.g., CPU scheduler)
158 |
159 | ## Multi-threading Patterns
160 |
161 | - **Boss-workers**:
162 |   - Boss: assigns work to workers
163 |   - Worker: performs an entire task
164 |   - Scenario 1: boss assigns work by directly signaling a specific worker
165 |     - Pros:
166 |       - Workers don't need to synchronize
167 |     - Cons:
168 |       - Boss must track what each worker is doing
169 |       - Throughput will go down!
170 |   - Scenario 2: boss assigns work via a producer/consumer queue
171 |     - Pros:
172 |       - Boss does not need to know details about workers
173 |     - Cons:
174 |       - Queue synchronization
175 |   - Scenario 3: worker pool (static or dynamic)
176 |     - Pros:
177 |       - Simplicity
178 |     - Cons:
179 |       - Thread pool management
180 |       - Locality
181 | - **Boss-workers variants**:
182 |   - All workers created equal versus workers specialized for certain tasks
183 |   - Pros:
184 |     - Better locality
185 |     - Quality-of-service management
186 |   - Cons:
187 |     - Load balancing
188 | - **Pipeline pattern**:
189 |   - Threads are assigned one subtask in the system
190 |   - An entire task is a pipeline of threads
191 |   - Multiple tasks run concurrently in the system, in different pipeline stages
192 |   - Throughput is limited by the longest (weakest-link) stage in the pipeline
193 |   - Pipeline stages can be managed via thread pools
194 |   - The best way to pass work is through shared-buffer-based communication between stages
195 |   - Pros:
196 |     - Specialization and locality
197 |   - Cons:
198 |     - Balancing and synchronization overheads
199 | - **Layered pattern**:
200 |   - Each layer groups related subtasks
201 |   - An end-to-end task must pass up and down through all layers
202 |   - Pros:
203 |     - Specialization
204 |     - Less fine-grained than pipeline
205 |   - Cons:
206 |     - Not suitable for all applications
207 |     - Synchronization
208 |
-------------------------------------------------------------------------------- /Lesson 08 - Thread Design Considerations/lesson-08-notes.md: --------------------------------------------------------------------------------
1 | # Lesson 8: Thread Design Considerations
2 |
3 | - Topics to be covered in this lesson:
4 |   - Kernel vs user-level threads
5 |   - Threads and interrupts
6 |   - Threads and signal handling
7 |
8 | ## Kernel vs User-level Threads
9 |
10 | - **User-level library**:
11 |   - Provides thread abstraction, scheduling, sync
12 | - **OS kernel**:
13 |   - Maintains thread abstraction, scheduling, sync
14 |
15 | ## Thread Data Structures: Single CPU
16 |
17 | - **ULT (user-level thread)**:
18 |   - User-level thread ID
19 |   - User-level registers
20 |   - Thread stack
21 | - **PCB**:
22 |   - Virtual address mapping
23 | - **KLT (kernel-level thread)**:
24 |   - Stack
25 |   - Registers
26 |
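- To make the ULT/PCB/KLT split above concrete, a sketch of what these structures might contain; every field name here is an illustrative assumption, not the layout of any real kernel or threading library:

```c
/* Illustrative sketch only: field names are assumptions for this
 * example, not taken from any particular kernel or library. */
#include <stdint.h>
#include <stdio.h>

struct pcb {                 /* process control block */
    void *page_table;        /* virtual address mappings */
    /* ... credentials, open files, signal handlers, ... */
};

struct klt {                 /* kernel-level thread */
    void       *kstack;      /* kernel stack */
    uint64_t    regs[32];    /* saved registers */
    struct pcb *proc;        /* which address space this KLT runs in */
};

struct ult {                 /* user-level thread, managed by the library */
    int      tid;            /* user-level thread ID */
    uint64_t pc, sp;         /* program counter and stack pointer */
    void    *stack;          /* thread stack */
};

int main(void) {
    printf("ult: %zu B, klt: %zu B, pcb: %zu B\n",
           sizeof(struct ult), sizeof(struct klt), sizeof(struct pcb));
    return 0;
}
```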
27 | ## Thread Data Structures: At Scale
28 |
29 | - When running multiple processes:
30 |   - We need copies of the ULT, PCB, and KLT structures
31 |   - We need relationships among the ULT, PCB, and KLT structures so we can tell which address space a given thread executes in
32 | - For a system with multiple CPUs, we need another data structure to represent the CPU, as well as a relationship between the KLTs and the CPUs
33 | - When the kernel is multi-threaded:
34 |   - We can have multiple kernel-level threads supporting a single user-level process
35 |   - When the kernel needs to context switch among KLTs that belong to different processes, it can quickly determine that they point to different PCBs
36 |
37 | ## Hard and Light Process State
38 |
39 | - When two KLTs belong to the same address space:
40 |   - Information in the PCB is split into _hard_ and _light_ process state
41 |   - **Hard process state** - relevant for all of the ULTs that execute within that process
42 |   - **Light process state** - relevant only for the subset of ULTs currently associated with a particular KLT
43 |
44 | ## Rationale for Data Structures
45 |
46 | - **Single PCB**:
47 |   - Large contiguous data structure
48 |   - Private for each entity
49 |   - Saved and restored on each context switch
50 |   - Updated for any changes
51 | - **Multiple data structures**:
52 |   - Smaller data structures
53 |   - Easier to share
54 |   - On context switch, only save and restore what needs to change
55 |   - User-level library need only update a portion of the state
56 | - In general, pivoting to multiple data structures improves scalability, performance, and flexibility, and reduces overheads
57 | - Modern operating systems adopt multiple data structures for organizing information about their execution contexts
58 |
59 | ## Basic Thread Management Interactions
60 |
61 | - Problem:
62 |   - The user-level library does not know what is happening in the kernel
63 |   - The kernel does not know what is happening in the user-level library
64 | - Solution:
65 |   - System calls and special signals allow the kernel and the user-level library to interact and coordinate (as shown in the Solaris 2.0 demo in lecture)
66 |
67 | ## Thread Management Visibility and Design
68 |
69 | - Problem:
70 |   - Visibility of state and decisions between the kernel and the user-level library
71 | - **User-level library** sees:
72 |   - ULTs
73 |   - Available KLTs
74 | - **Kernel** sees:
75 |   - KLTs
76 |   - CPUs
77 |   - Kernel-level scheduler
78 | - Invisible to the kernel:
79 |   - Mutex variables
80 |   - Wait queues
81 | - Additional problems in the many-to-many model:
82 |   - User-level scheduling decisions are invisible to the kernel
83 |   - The ULT-KLT mapping changes
84 | - One way to address the visibility issue is to use the one-to-one model
85 | - How/when does the user-level library run?
86 |   - The process jumps to the user-level library scheduler when:
87 |     - ULTs explicitly yield
88 |     - A timer set by the user-level library expires
89 |     - ULTs call library functions like lock/unlock
90 |     - Blocked threads become runnable
91 |   - The user-level library scheduler:
92 |     - Runs on ULT operations
93 |     - Runs on signals from timers
94 |
95 | ## Issues on Multiple CPUs
96 |
97 | - Problem:
98 |   - With ULTs running across multiple CPUs, how do we get the CPUs to communicate?
99 | - Solution:
100 |   - At the kernel level, send a signal to the KLT on the other CPU so that it runs the library code locally
101 |
102 | ## Synchronization-related Issues
103 |
104 | - Problem:
105 |   - With ULTs running across multiple CPUs, how do we get the CPUs to synchronize?
106 | - Solution:
107 |   - Use **adaptive mutexes**:
108 |     - If the critical section is short, do not block; spin instead!
109 |     - For long critical sections, resort to the default blocking behavior
110 | - Destroying threads:
111 |   - Instead of destroying, we should reuse threads
112 |   - When a thread exits:
113 |     - Put it on **death row**
114 |     - It is periodically destroyed by a **reaper thread**
115 |     - Otherwise, thread structures/stacks are reused (performance gains!)
116 |
117 | ## Interrupts vs Signals
118 |
119 | - **Interrupts**:
120 |   - Events generated externally by components other than the CPU (I/O devices, timers, other CPUs)
121 |   - Determined based on the physical platform
122 |   - Appear asynchronously
123 | - **Signals**:
124 |   - Events triggered by the CPU and the software running on it
125 |   - Determined based on the operating system
126 |   - Appear synchronously or asynchronously
127 | - **Interrupts and signals**:
128 |   - Have a unique ID depending on the hardware or OS
129 |   - Can be masked and disabled/suspended via a corresponding mask:
130 |     - Per-CPU interrupt mask, per-process signal mask
131 |   - If enabled, trigger the corresponding handler:
132 |     - Interrupt handlers are set for the entire system by the OS
133 |     - Signal handlers are set on a per-process basis, by the process
134 |
135 | ## Interrupt Handling
136 |
137 | - Recall that interrupts are generated externally
138 | - When a device sends an interrupt to the CPU, it is essentially sending a signal through the interconnect that connects the device to the CPU complex
139 | - For most modern devices, an MSI (message signaled interrupt) can be carried on the same interconnect that connects the devices to the CPU complex
140 | - Based on the MSI message, the interrupt can be uniquely identified through an interrupt handler table
141 |
142 | ## Signal Handling
143 |
144 | - Recall that signals are generated internally
145 | - If a thread attempts an illegal memory access, a signal (`SIGSEGV`) is generated by the OS
146 | - The OS maintains a signal handler table for every process in the system
147 | - A process may specify how a signal should be handled; otherwise, the OS applies one of its default actions
148 | - **Handlers/actions** (default actions; the sketch below shows how a process overrides one):
149 |   - Terminate
150 |   - Ignore
151 |   - Terminate and core dump
152 |   - Stop or continue
153 | - **Synchronous**:
154 |   - `SIGSEGV` (access to protected memory)
155 |   - `SIGFPE` (divide by zero)
156 |   - `SIGKILL` (kill, tid) can be directed to a specific thread
157 | - **Asynchronous**:
158 |   - `SIGKILL` (kill)
159 |   - `SIGALRM` (alarm)
160 |
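- The sketch referenced above: a minimal, hedged example of a process installing its own handler in place of the default action, assuming a POSIX system (the handler body and exit status are arbitrary choices):

```c
/* Minimal sketch of overriding a default signal action on a POSIX
 * system; the handler body and exit status are arbitrary. */
#include <signal.h>
#include <string.h>
#include <unistd.h>

static void on_sigsegv(int sig) {
    (void)sig;
    /* Only async-signal-safe calls belong in a handler; write() is one. */
    const char msg[] = "caught SIGSEGV, exiting\n";
    write(STDERR_FILENO, msg, sizeof(msg) - 1);
    _exit(1);  /* the default action would be terminate + core dump */
}

int main(void) {
    struct sigaction sa;
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_sigsegv;
    sigemptyset(&sa.sa_mask);       /* block no extra signals in handler */
    sigaction(SIGSEGV, &sa, NULL);  /* install in this process's table */

    raise(SIGSEGV);                 /* delivered synchronously to this thread */
    return 0;                       /* never reached */
}
```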
161 | ## Why Disable Interrupts or Signals?
162 |
163 | - Problem:
164 |   - Interrupts and signals are handled on the interrupted thread's stack, which can cause handler code to deadlock (e.g., if the handler needs a mutex that the interrupted thread already holds)
165 | - Solution:
166 |   - Control interruptions by handler code (use interrupt/signal masks)
167 | - A **mask** is a sequence of bits where each bit corresponds to a specific interrupt or signal, and the value of the bit, zero or one, indicates whether that interrupt or signal is disabled or enabled
168 |
169 | ## More on Masks
170 |
171 | - **Interrupt masks** are per CPU:
172 |   - If a mask disables an interrupt, the hardware interrupt routing mechanism will not deliver the interrupt to the CPU
173 | - **Signal masks** are per execution context (ULT on top of KLT): if a mask disables a signal, the kernel sees the mask and will not interrupt the corresponding thread
174 |
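- As a concrete illustration of the per-execution-context signal masks just described, a hedged POSIX sketch (`SIGALRM` and the "critical region" are arbitrary stand-ins):

```c
/* Sketch of a per-thread signal mask with POSIX threads: block SIGALRM
 * around a critical region, then restore the previous mask. */
#include <pthread.h>
#include <signal.h>
#include <stdio.h>

int main(void) {
    sigset_t block, old;
    sigemptyset(&block);
    sigaddset(&block, SIGALRM);

    /* Disable (mask) SIGALRM for this thread; delivery stays pending. */
    pthread_sigmask(SIG_BLOCK, &block, &old);

    /* ... critical work that must not be interrupted by SIGALRM ... */
    puts("SIGALRM masked during this region");

    /* Re-enable: any pending SIGALRM is delivered after this call. */
    pthread_sigmask(SIG_SETMASK, &old, NULL);
    return 0;
}
```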
175 | ## Interrupts on Multi-core systems
176 |
177 | - Interrupts can be directed to any CPU that has them enabled
178 | - We may set interrupts to go to just a single core:
179 |   - Avoids overheads and perturbations on all other cores
180 |
181 | ## Types of Signals
182 |
183 | - **One-shot signals**:
184 |   - _n_ signals pending behave like one signal pending: the handler executes at least once
185 |   - Must be explicitly re-enabled
186 | - **Real-time signals**:
187 |   - _If n signals are raised, then the handler is called n times_
188 |
189 | ## Interrupts as Threads
190 |
191 | - Problem:
192 |   - Deadlocks can happen in interrupt/signal handling routines
193 | - Solution:
194 |   - As described in the SunOS paper, we can allow interrupts to become full-fledged threads whenever they perform blocking operations
195 |   - However, dynamic thread creation is expensive!
196 | - **Dynamic decision**:
197 |   - If the handler doesn't lock, execute on the interrupted thread's stack
198 |   - If the handler can block, turn it into a real thread
199 | - **Optimization**:
200 |   - Pre-create and pre-initialize thread structures for interrupt routines
201 |
202 | ## Interrupts: Top vs Bottom Half
203 |
204 | - Interrupts as threads can be handled in two parts (see diagram from lecture):
205 |   - **Top half**:
206 |     - Fast, non-blocking
207 |     - Minimal amount of processing
208 |   - **Bottom half**:
209 |     - Arbitrary complexity
210 |     - May block and synchronize
211 | - Bottom line:
212 |   - To permit arbitrary functionality to be incorporated into interrupt-handling operations, the handling routine must be executed by another thread, where synchronization and blocking are a possibility
213 |
214 | ## Performance of Threads as Interrupts
215 |
216 | - **Overall cost**:
217 |   - Overhead of 40 SPARC instructions per interrupt
218 |   - Saving of 12 instructions per mutex:
219 |     - No changes in interrupt mask, level, etc.
220 |   - Fewer interrupts than mutex lock/unlock operations
221 | - To summarize: optimize for the common case!
222 |
223 | ## Task Structure in Linux
224 |
225 | - Main execution abstraction: the task
226 |   - A task corresponds to a KLT
227 |   - Single-threaded process: one task
228 |   - Multi-threaded process: many tasks
229 | - **Task creation** - the `clone()` system call (a hedged sketch follows this section)
230 | - **Linux threads model**:
231 |   - NPTL (Native POSIX Threads Library) - *one-to-one model*:
232 |     - Kernel sees information for each ULT
233 |     - Kernel traps are cheaper
234 |     - More resources: memory, large range of IDs, etc.
235 |   - Older LinuxThreads - *many-to-many model*:
236 |     - Similar issues to those described in the Solaris papers
237 |
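- The sketch referenced above: a hedged example of task creation via `clone()`. This is Linux/glibc-specific; the stack size is arbitrary, and `CLONE_VM` shares the address space thread-style (passing only `SIGCHLD` would instead behave like `fork()`):

```c
/* Hedged sketch of Linux task creation with clone(); Linux/glibc only.
 * CLONE_VM makes parent and child share one address space. */
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>

static int shared = 0;

static int child_fn(void *arg) {
    shared = 42;  /* visible to the parent because of CLONE_VM */
    return 0;
}

int main(void) {
    const size_t stack_size = 64 * 1024;  /* arbitrary size */
    char *stack = malloc(stack_size);
    if (!stack) return 1;

    /* Stacks grow down on most architectures: pass the top of the block. */
    pid_t tid = clone(child_fn, stack + stack_size, CLONE_VM | SIGCHLD, NULL);
    if (tid == -1) return 1;

    waitpid(tid, NULL, 0);              /* reap the child task */
    printf("shared = %d\n", shared);    /* prints 42 */
    free(stack);
    return 0;
}
```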
-------------------------------------------------------------------------------- /Lesson 12 - Memory Management/lesson-12-notes.md: --------------------------------------------------------------------------------
1 | # Lesson 12: Memory Management
2 |
3 | - Topics to be covered in this lesson:
4 |   - Physical and virtual memory management
5 |   - Review of memory management mechanisms
6 |   - Illustration of advanced services
7 |
8 | ## Visual Metaphor
9 |
10 | - **OS** and **toy shops** each have **memory (parts) management systems**:
11 |   - Use intelligently sized containers:
12 |     - Memory pages or segments
13 |   - Not all memory is needed at once:
14 |     - Tasks operate on a subset of memory
15 |   - Optimized for performance:
16 |     - Reduce time to access state in memory (better performance)
17 |
18 | ## Memory Management: Goals
19 |
20 | - Virtual vs physical memory:
21 |   - Allocate: allocation, replacement, etc.
22 |   - Arbitrate: address translation and validation
23 | - Page-based memory management:
24 |   - Allocate: pages to page frames
25 |   - Arbitrate: page tables
26 | - Segment-based memory management:
27 |   - Allocate: segments
28 |   - Arbitrate: segment registers
29 |
30 | ## Memory Management: Hardware Support
31 |
32 | - MMU (memory management unit):
33 |   - Translates **VA (virtual address)** to **PA (physical address)**
34 |   - Reports faults: illegal accesses, permission violations, page not present in memory
35 | - Registers:
36 |   - Pointers to the page table
37 |   - Base and limit sizes, number of segments, etc.
38 | - Cache - TLB (translation look-aside buffer):
39 |   - Caches valid virtual-to-physical address translations
40 | - Translation:
41 |   - Actual PA generation is done in hardware
42 |
43 | ## Page Tables
44 |
45 | - Virtual memory pages and physical memory page frames are the same size
46 | - Useful acronyms for page tables:
47 |   - VPN (virtual page number)
48 |   - PFN (physical frame number)
49 | - Page tables use allocation on first touch!
50 | - Unused pages are reclaimed:
51 |   - The mapping becomes invalid
52 |   - Hardware will fault
53 |   - Re-established on re-access
54 | - In summary, the OS creates a page table for every process:
55 |   - On context switch, switch to the valid page table
56 |   - Update the register that points to it
57 |
58 | ## Page Table Entry
59 |
60 | - Flags:
61 |   - Present (valid/invalid)
62 |   - Dirty (written to)
63 |   - Accessed (for read or write)
64 |   - Protection bits (read, write, and execute)
65 | - **Page fault**:
66 |   - On a memory reference there are two options for the PFN:
67 |     - Generate the physical address and access it
68 |     - Generate an error code on the kernel stack (trap into the kernel)
69 |   - The page fault handler determines the action based on the error code and the faulting address:
70 |     - Bring the page from disk to memory
71 |     - Protection error (`SIGSEGV`)
72 |   - The error code comes from the PTE flags
73 |   - The faulting address is stored in the CR2 register (on x86)
74 |
75 | ## Page Table Size
76 |
77 | - **64-bit architecture**:
78 |   - PTE (page table entry): 8 bytes, including PFN + flags
79 |   - Number of VPNs: 2^64 / page size
80 |   - Page table size: (2^64 / 2^12) * 8 bytes = 32 petabytes per process with 4 KB pages (a quick sanity-check sketch follows the Inverted Page Tables section below)
81 | - A **process** does not use the entire address space:
82 |   - Even on a 32-bit architecture it will not always use all of 4 GB
83 |   - But the **page table** assumes an entry per **VPN**, regardless of whether the corresponding virtual memory is needed or not
84 |
85 | ## Multi-level Page Tables
86 |
87 | - **Outer page table**: page table directory
88 | - **Internal page table**: exists only for valid virtual memory regions
89 | - **Additional layers**:
90 |   - Page table directory pointer (third level)
91 |   - Page table directory pointer map (fourth level)
92 |   - Important on 64-bit architectures:
93 |     - Address spaces are larger and more sparse
94 |     - Larger gaps can save more internal page table components
95 | - **Multi-level page tables**:
96 |   - Pros:
97 |     - Smaller internal page tables/directories
98 |     - Granularity of coverage (potentially reduced page table size)
99 |   - Cons:
100 |     - More memory accesses required for translation (increased translation latency)
101 |
102 | ## Speeding Up Translation TLB
103 |
104 | - Overhead of address translation:
105 |   - Single-level page table:
106 |     - 1x access to the page table entry
107 |     - 1x access to memory
108 |   - Four-level page table:
109 |     - 4x accesses to page table entries
110 |     - 1x access to memory (can lead to slowdown!)
111 | - **Page table cache (TLB)**:
112 |   - MMU-level address translation cache
113 |   - On a TLB miss: page table access from memory
114 |   - Has protection/validity bits
115 |   - Even a small number of cached addresses yields a high TLB hit rate, thanks to temporal and spatial locality
116 |
117 | ## Inverted Page Tables
118 |
119 | - Another way of organizing the address translation process (see lecture for the inverted page table diagram):
120 |   - Components:
121 |     - Logical address
122 |     - Physical address
123 |     - Physical memory
124 |     - Page table
125 |     - Search
126 |     - TLB to absorb most memory references
127 | - Inverted page tables use hashing (see lecture for diagram) to optimize efficiency:
128 |   - Hashing narrows the linear search down to a few possible entries in the inverted page table, which speeds up address translation
129 |
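- The sanity-check promised above: a few lines of C verifying the flat page-table arithmetic and showing the VPN/offset split for 4 KB pages (the constants are the common x86-64 example values; the sample address is arbitrary):

```c
/* Sanity-check of the page-table arithmetic, plus the VPN/offset
 * split for 4 KB pages; constants are common x86-64 example values. */
#include <stdint.h>
#include <stdio.h>

int main(void) {
    /* Flat (single-level) table over a full 64-bit address space: */
    double entries = (double)(1ULL << 52);  /* 2^64 addresses / 2^12 page size */
    double bytes   = entries * 8.0;         /* 8-byte PTEs -> 2^55 bytes */
    printf("flat page table: %.0f PB per process\n",
           bytes / (double)(1ULL << 50));   /* prints 32 */

    /* Splitting a virtual address with a 12-bit offset: */
    uint64_t va     = 0x7f8a12345678ULL;    /* arbitrary example address */
    uint64_t vpn    = va >> 12;             /* virtual page number */
    uint64_t offset = va & 0xfff;           /* offset within the page */
    printf("va=0x%llx -> vpn=0x%llx offset=0x%llx\n",
           (unsigned long long)va, (unsigned long long)vpn,
           (unsigned long long)offset);
    return 0;
}
```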
130 | ## Segmentation
131 |
132 | - **Segments** are arbitrarily granular and correspond to logical units:
133 |   - E.g., code, heap, data, stack, etc.
134 | - An address is the segment selector + offset
135 | - A **segment** is a contiguous region of physical memory:
136 |   - Segment size is defined by the segment's base and limit registers
137 | - **Segmentation + paging**:
138 |   - Intel x86_32: segmentation + paging
139 |     - Linux: up to 8K per-process / global segments
140 |   - Intel x86_64: paging
141 |
142 | ## Page Size
143 |
144 | - 10-bit offset: 1 KB page size
145 | - 12-bit offset: 4 KB page size
146 | - Below is a table detailing _large vs huge_ pages:
147 |
148 | |                                        | Large   | Huge    |
149 | | -------------------------------------- | ------- | ------- |
150 | | Page size                              | 2 MB    | 1 GB    |
151 | | Offset bits                            | 21 bits | 30 bits |
152 | | Reduction factor (on page table size)  | x512    | x1024   |
153 |
154 | - In general, for larger pages:
155 |   - Pros: fewer page table entries, smaller page tables, more TLB hits, etc.
156 |   - Cons: internal fragmentation (wastes memory)
157 |
158 | ## Memory Allocation
159 |
160 | - **Memory allocator**:
161 |   - Determines the VA-to-PA mapping
162 |   - Address translation, page tables, etc. then simply determine the PA from a VA and check validity/permissions
163 | - **Kernel-level allocators**:
164 |   - Kernel state, static process state
165 | - **User-level allocators**:
166 |   - Dynamic process state (heap), malloc/free
167 |   - E.g., `dlmalloc`, `jemalloc`, `hoard`, `tcmalloc`
168 |
169 | ## Memory Allocation Challenges
170 |
171 | - Problem: **external fragmentation**
172 |   - Occurs with multiple interleaved allocate and free operations; as a result, we have holes of free memory that are not contiguous
173 |   - Requests for larger contiguous memory allocations cannot be satisfied
174 | - Solution:
175 |   - When pages are freed, the allocator can aggregate adjacent areas of free pages into one larger free area; this allows larger future requests to be satisfied
176 |
177 | ## Allocators in the Linux Kernel
178 |
179 | - The Linux kernel relies on two basic allocation mechanisms:
180 |   - **Buddy**:
181 |     - Starts with a consecutive memory region that is free (a 2^x area)
182 |     - On request, sub-divides into 2^x chunks and finds the smallest 2^x chunk that can satisfy the request (fragmentation is still there)
183 |     - On free:
184 |       - Checks the buddy to see if it can aggregate into a larger chunk
185 |       - Aggregates further up the tree (aggregation works well and fast)
186 |   - **Slab**:
187 |     - Addresses the 2^x granularity in Buddy
188 |     - Addresses internal fragmentation
189 | - **Slab allocator**:
190 |   - Caches for common object types/sizes, built on top of contiguous memory
191 |   - Pros:
192 |     - Internal fragmentation avoided
193 |     - External fragmentation not an issue
194 |
195 | ## Demand Paging
196 |
197 | - Virtual memory >> physical memory:
198 |   - A virtual memory page is not always in physical memory
199 |   - Physical page frames are saved and restored to/from secondary storage
200 | - **Demand paging**: pages are swapped in/out of memory and a swap partition
201 |   - The original PA will generally differ from the PA after swapping back in
202 |   - If a page is _pinned_, swapping is disabled for it (see the sketch below)
203 |
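- The sketch referenced above: a small, hedged example of pinning pages so demand paging cannot swap them out, assuming a POSIX system (the buffer size is arbitrary, and `mlock()` may fail under `RLIMIT_MEMLOCK`):

```c
/* Sketch of pinning a buffer so it cannot be swapped out; useful
 * for, e.g., DMA buffers or latency-sensitive state. POSIX-specific. */
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>

int main(void) {
    size_t len = 1 << 20;  /* 1 MB buffer, arbitrary size */
    char *buf = malloc(len);
    if (!buf) return 1;

    if (mlock(buf, len) != 0) {  /* pin: pages stay in physical memory */
        perror("mlock");         /* may fail due to RLIMIT_MEMLOCK */
        return 1;
    }
    /* ... use buf; it cannot be moved to the swap partition ... */
    munlock(buf, len);           /* unpin when done */
    free(buf);
    return 0;
}
```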
204 | ## Freeing Up Physical Memory
205 |
206 | - When should pages be swapped out?
207 |   - The OS runs a page (out) daemon:
208 |     - When memory usage is above a threshold (high watermark)
209 |     - When CPU usage is below a threshold (low watermark)
210 | - Which pages should be swapped out?
211 |   - Pages that won't be used again
212 |   - History-based prediction:
213 |     - LRU (least recently used) policy: the access bit tracks whether a page has been referenced
214 |   - Pages that don't need to be written out:
215 |     - The dirty bit tracks whether a page was modified
216 |   - Avoid non-swappable pages
217 | - In Linux:
218 |   - Parameters to tune thresholds: target page count, etc.
219 |   - Pages are categorized into different types: e.g., claimable, swappable, etc.
220 |   - A _second chance_ variation of LRU is used
221 |
222 | ## Copy On Write
223 |
224 | - MMU hardware:
225 |   - Performs translation, tracks accesses, enforces protection, etc.
226 |   - Useful for building other services and optimizations
227 | - **COW (copy-on-write)**:
228 |   - On process creation (naively):
229 |     - Copy the entire parent address space
230 |     - But many pages are static and don't change (why keep multiple copies?)
231 |   - On create (with COW):
232 |     - Map the new VAs to the original pages
233 |     - Write-protect the original pages
234 |     - If pages are only read: saves memory and the time to copy
235 |   - On write:
236 |     - A page fault triggers the copy
237 |     - Pay the copy cost only if necessary (a small fork()-based sketch follows at the end of these notes)
238 |
239 | ## Failure Management Check-pointing
240 |
241 | - **Check-pointing**: a failure and recovery management technique
242 |   - Periodically save process state
243 |   - Failure may be unavoidable, but restarting from a checkpoint makes recovery much faster
244 | - **Simple approach**: pause and copy
245 | - **Better approach**:
246 |   - Write-protect everything and copy it once
247 |   - Copy diffs of _dirtied_ pages for incremental checkpoints; rebuild from multiple diffs, or do so in the background
248 | - **Debugging**:
249 |   - RR (rewind-replay):
250 |     - Rewind means restarting from a checkpoint
251 |     - Gradually go back to older checkpoints until the error is found
252 | - **Migration**:
253 |   - Continue on another machine:
254 |     - Disaster recovery
255 |     - Consolidation
256 |   - Repeated checkpoints in a fast loop until pause-and-copy becomes acceptable (or unavoidable)
257 |
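- Both COW and incremental check-pointing rely on the same MMU trick described above: write-protect pages and let the page fault trigger a copy. The sketch referenced earlier, a minimal illustration of the `fork()` case on a Unix-like OS:

```c
/* Sketch of copy-on-write via fork(): the child initially shares the
 * parent's pages; its write faults and copies the page, leaving the
 * parent's value untouched. Unix-like systems. */
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    int *value = malloc(sizeof(int));
    if (!value) return 1;
    *value = 1;

    pid_t pid = fork();
    if (pid == 0) {          /* child */
        *value = 2;          /* write fault -> kernel copies the page */
        printf("child sees %d\n", *value);   /* 2 */
        exit(0);
    }
    waitpid(pid, NULL, 0);
    printf("parent sees %d\n", *value);      /* still 1 */
    free(value);
    return 0;
}
```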
-------------------------------------------------------------------------------- /Lesson 09 - Thread Performance Considerations/lesson-09-notes.md: --------------------------------------------------------------------------------
1 | # Lesson 9: Thread Performance Considerations
2 |
3 | - Topics to be covered in this lesson:
4 |   - Performance comparisons:
5 |     - Multi-process vs multi-threaded vs event-driven
6 |   - Event-driven architectures
7 |   - "Flash: An Efficient and Portable Web Server" vs Apache
8 |
9 | ## Which Threading Model is Better?
10 |
11 | - Consider the _Boss-worker_ vs _Pipeline_ example as discussed in lesson 5 (see lecture for the specific initial conditions):
12 |   - We care about two metrics: execution time and average time to complete an order
13 |   - The _Boss-worker_ model has an execution time greater than that of the _Pipeline_ model (undesirable)
14 |   - However, the _Boss-worker_ model has an average time to complete an order less than that of the _Pipeline_ model (desirable)
15 | - Which model is better?
16 |   - It really depends on the metrics!
17 |
18 | ## Are Threads Useful?
19 |
20 | - Threads are useful because of:
21 |   - Parallelization: speed-up
22 |   - Specialization: hot cache!
23 |   - Efficiency: lower memory requirements and cheaper synchronization
24 |   - Threads hide the latency of I/O operations (even on a single CPU)
25 | - Now consider what "useful" means...
26 |   - For a matrix multiply application: execution time
27 |   - For a web service application:
28 |     - Number of client requests per unit time
29 |     - Response time
30 |   - For hardware: higher utilization (e.g., CPU)
31 | - Again, it depends on the metrics!
32 |
33 | ## Visual Metaphor
34 |
35 | - **Metrics** exist for OSs and for toy shops (some examples below):
36 |   - **Throughput**:
37 |     - Process completion rate
38 |   - **Response time**:
39 |     - Average time to respond to input (e.g., a mouse click)
40 |   - **Utilization**:
41 |     - Percentage of CPU used
42 |
43 | ## Performance Metrics Intro
44 |
45 | - A metric is a measurement standard: a **measurable** and/or **quantifiable property** (e.g., execution time) of the **system** we're interested in (a software implementation of a problem) that can be used to **evaluate the system's behavior** (e.g., its improvement compared to other implementations)
46 |
47 | ## Performance Metrics
48 |
49 | - What are some performance metrics computer scientists typically care about?
50 |   - Previously covered metrics:
51 |     - Execution time
52 |     - Throughput
53 |     - Request rate
54 |     - CPU utilization
55 |   - Other metrics one might care about:
56 |     - Wait time
57 |     - Platform efficiency
58 |     - Performance/cost
59 |     - Performance/power
60 |     - Percentage of SLA violations
61 |     - Client-perceived performance
62 |     - Aggregate performance
63 |     - Average resource usage
64 |
65 | ## Performance Metrics Summary
66 |
67 | - Performance metrics are **measurable quantities** obtained from:
68 |   - Experiments with real software deployments, real machines, real workloads
69 |   - _Toy_ experiments representative of realistic settings
70 |   - Simulation on a test-bed
71 |
72 | ## Really... Are Threads Useful?
73 |
74 | - Depends on the **metrics**!
75 | - Depends on the **workload**!
76 | - Bottom line: it depends!
77 |
78 | ## Multi-process vs Multi-thread
79 |
80 | - Consider how to best provide concurrency (see lecture for the simple web server example):
81 |   - **Multi-process web server**:
82 |     - Pros: simple programming
83 |     - Cons:
84 |       - Many processes mean high memory usage, costly context switches, and hard/costly-to-maintain shared state (plus tricky port setup)
85 |   - **Multi-threaded web server**:
86 |     - Pros:
87 |       - Shared address space
88 |       - Shared state
89 |       - Cheap context switch
90 |     - Cons:
91 |       - Implementation is not simple
92 |       - Requires synchronization
93 |       - Requires underlying support for threads
94 |
95 | ## Event-driven Model
96 |
97 | - An event-driven model contains the following elements (see lecture for the diagram):
98 |   - **Event handlers**:
99 |     - Accept connection
100 |     - Read request
101 |     - Send header
102 |     - Read file/send data
103 |   - **Dispatch loop**
104 |   - **Events**:
105 |     - Receipt of a request
106 |     - Completion of a send
107 |     - Completion of a disk read
108 | - An event-driven model has a single address space, a single process, and a single thread of control!
109 | - The **dispatcher** is a state machine reacting to external events (it calls a handler, i.e., jumps to the handler's code)
110 | - A **handler**:
111 |   - Runs to completion
112 |   - To avoid blocking:
113 |     - It initiates the blocking operation and passes control back to the dispatch loop
114 |
115 | ## Concurrency in the Event-driven Model
116 |
117 | - If the event-driven model operates on a single thread, how does it achieve concurrency?
118 |   - The single thread switches among the processing of different requests
119 | - **Multi-process and multi-threaded**:
120 |   - One request per execution context (process/thread)
121 | - **Event-driven**:
122 |   - Many requests interleaved in one execution context
123 |
124 | ## Event-driven Model: Why
125 |
126 | - Why does this work?
127 |   - On one CPU, _threads hide latency_:
128 |     - `if (t_idle > 2 * t_context_switch)`, context switch to hide latency
129 |     - `if (t_idle == 0)`, context switching just wastes cycles that could have been used for request processing
130 |   - **Event-driven**:
131 |     - Process a request until a wait is necessary, then switch to another request
132 | - **Multiple CPUs**:
133 |   - Multiple event-driven processes
134 |
135 | ## Event-driven Model: How
136 |
137 | - How does this work?
138 |   - An **event** corresponds to input on an FD (file descriptor)
139 |   - Which **file descriptor**? Determined via:
140 |     - `select()`
141 |     - `poll()`
142 |     - `epoll()`
143 | - Benefits of the event-driven model (a `select()`-based sketch follows this section):
144 |   - Single address space and single flow of control
145 |   - Smaller memory requirements and no context switching
146 |   - No synchronization
147 |
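- The sketch referenced above: a minimal `select()`-based dispatch loop, assuming a POSIX system and watching only stdin for simplicity (a real server would watch many sockets and likely prefer `epoll` for scalability):

```c
/* Minimal sketch of an FD-based dispatch loop using select();
 * watches only stdin for simplicity. POSIX-specific. */
#include <stdio.h>
#include <sys/select.h>
#include <unistd.h>

int main(void) {
    char buf[256];

    for (;;) {
        fd_set readable;
        FD_ZERO(&readable);
        FD_SET(STDIN_FILENO, &readable);  /* register the FDs we care about */

        /* Dispatch loop: block until some watched FD has an event. */
        if (select(STDIN_FILENO + 1, &readable, NULL, NULL, NULL) < 0)
            break;

        if (FD_ISSET(STDIN_FILENO, &readable)) {
            /* "Handler": runs to completion without blocking for long. */
            ssize_t n = read(STDIN_FILENO, buf, sizeof(buf) - 1);
            if (n <= 0) break;            /* EOF or error: leave the loop */
            buf[n] = '\0';
            printf("handled event: %s", buf);
        }
    }
    return 0;
}
```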
148 | ## Helper Threads and Processes
149 |
150 | - Problem: in the event-driven model, a blocking request/handler will block the entire process
151 | - Solution:
152 |   - Use **asynchronous I/O operations**:
153 |     - The process/thread makes a system call
154 |     - The OS obtains all relevant info from the stack and either learns where to return results, or tells the caller where to get results later
155 |     - The process/thread can continue
156 |   - However, asynchronous system calls require support from the kernel (e.g., threads) and/or the device
157 |   - In general, asynchronous system calls fit nicely with the event-driven model!
158 | - Another problem: what if async calls are not available?
159 |   - Use **helpers**:
160 |     - Designated for blocking I/O operations only
161 |     - Pipe/socket-based communication with the event dispatcher
162 |     - The helper blocks, but the main event loop (and process) will not!
163 |   - At the time, multi-threading support was not widespread, so the AMPED (Asymmetric Multi-Process Event-Driven) model was created, similar to the helper approach above
164 |   - With the addition of multi-threading support, the analogous multi-threaded variant became known as the AMTED (Asymmetric Multi-Threaded Event-Driven) model
165 | - In summary, helper threads/processes:
166 |   - Pros:
167 |     - Resolve the portability limitations of the basic event-driven model
168 |     - Smaller footprint than a regular worker thread
169 |   - Cons:
170 |     - Applicability limited to certain classes of applications
171 |     - Event routing on multi-CPU systems
172 |
173 | ## Flash Web Server
174 |
175 | - **Flash: event-driven web server**:
176 |   - An **event-driven web server (AMPED)** with **asymmetric helper processes**
177 |   - _Helpers_ are used for disk reads
178 |   - Pipes are used for communication with the dispatcher
179 |   - A helper reads the file into memory (via memory mapping)
180 |   - The dispatcher checks (via `mincore()`) whether the pages are in memory, to decide between a _local handler_ and a _helper_ (a sketch of this check follows this section)
181 | - In general, Flash's architecture can offer big savings!
182 | - **Flash: additional optimizations**:
183 |   - Application-level caching (of data and computation)
184 |   - Alignment for DMA (direct memory access)
185 |   - Use of DMA with scatter-gather, vector I/O operations
186 |   - Back in the day these optimizations were novel; now they are fairly common
187 |
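- The sketch referenced above: a hedged illustration of the dispatcher's residency check using `mincore()` (Linux/BSD-specific; `/etc/hosts` is just a convenient file to map):

```c
/* Sketch of a Flash-style residency check: mmap a file and ask
 * mincore() whether its pages are resident, i.e., whether a read
 * would block. Linux/BSD-specific; simplified error handling. */
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void) {
    int fd = open("/etc/hosts", O_RDONLY);  /* any file, for illustration */
    struct stat st;
    if (fd < 0 || fstat(fd, &st) < 0 || st.st_size == 0) return 1;

    char *map = mmap(NULL, st.st_size, PROT_READ, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED) return 1;

    long page = sysconf(_SC_PAGESIZE);
    size_t npages = (st.st_size + page - 1) / page;
    unsigned char vec[npages];              /* one residency byte per page */

    if (mincore(map, st.st_size, vec) == 0) {
        size_t resident = 0;
        for (size_t i = 0; i < npages; i++)
            resident += vec[i] & 1;         /* low bit = page is resident */
        /* Flash-style decision: all resident -> local handler; else helper. */
        printf("%zu of %zu pages resident\n", resident, npages);
    }
    munmap(map, st.st_size);
    close(fd);
    return 0;
}
```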
188 | ## Apache Web Server
189 |
190 | - An Apache web server (diagram available in the lecture slides) consists of the following elements:
191 |   - **Core** - basic server skeleton
192 |   - **Modules** - one per functionality
193 |   - **Flow of control** - similar to the event-driven model
194 | - However, the Apache web server differs in its:
195 |   - Combination of MP and MT:
196 |     - Each process is a boss/worker with a dynamic thread pool
197 |     - The number of processes can also be dynamically adjusted
198 |
199 | ## Experimental Methodology
200 |
201 | - To set up performance comparisons, consider the following:
202 |   - First, define the comparison points:
203 |     - What systems are you comparing?
204 |   - Second, define the inputs:
205 |     - What workloads will be used?
206 |   - Third, define the metrics:
207 |     - How will you measure performance?
208 |
209 | ## Summary of Performance Results
210 |
211 | - **When data is in cache**:
212 |   - SPED (single-process event-driven) >> AMPED Flash:
213 |     - AMPED performs an unnecessary test for memory presence
214 |   - SPED and AMPED Flash >> MT/MP:
215 |     - Sync and context-switching overheads
216 | - **Disk-bound workload**:
217 |   - AMPED Flash >> SPED:
218 |     - SPED blocks because there is no async I/O
219 |   - AMPED Flash >> MT/MP:
220 |     - More memory efficient and less context switching
221 |
222 | ## Advice on Designing Experiments
223 |
224 | - **Design relevant experiments**: statements about a solution that others believe in and care about
225 | - **Purpose of relevant experiments** (e.g., a web server experiment):
226 |   - Clients care about: response time
227 |   - Operators care about: throughput
228 | - **Possible goals**:
229 |   - Improve both response time and throughput
230 |   - Improve response time
231 |   - Improve response time even if throughput decreases
232 |   - Maintain response time as the request rate increases
233 | - **Goals** determine the metrics and the configuration of the experiments
234 | - _Rule of thumb_ for picking **metrics**:
235 |   - Standard **metrics** reach a broader audience
236 |   - **Metrics** answering the _"Who? What? Why?"_ questions:
237 |     - Client performance: response time, timed-out requests, etc.
238 |     - Operator costs: throughput, costs, etc.
239 | - **Picking the right configuration space**:
240 |   - **System resources**:
241 |     - Hardware and software
242 |   - **Workload**:
243 |     - Web server: request rate, number of concurrent requests, file size, access pattern
244 | - **Now pick!**:
245 |   - Choose a subset of configuration parameters
246 |   - Pick ranges for each variable factor
247 |   - Pick a relevant workload
248 |   - Include the best/worst case scenarios
249 | - **Are you comparing apples to apples?**:
250 |   - Pick useful combinations of factors; many combinations just reiterate the same point
251 | - **What about the competition and the baseline?**:
252 |   - Compare your system to:
253 |     - The state of the art
254 |     - The most common practice
255 |     - The ideal best/worst case scenario
256 |
257 | ## Advice on Running Experiments
258 |
259 | - Once you have designed the experiments, you should:
260 |   - Run test cases _n_ times each
261 |   - Compute the metrics
262 |   - Represent the results
263 | - Additionally, do not forget to draw conclusions!
264 |
--------------------------------------------------------------------------------