├── aspell.words ├── fig ├── as.pdf ├── os.pdf ├── race.pdf ├── smp.pdf ├── inode.pdf ├── switch.pdf ├── deadlock.pdf ├── fslayer.pdf ├── fslayout.pdf ├── mkernel.pdf ├── xv6_layout.pdf ├── processlayout.png ├── riscv_address.pdf ├── riscv_pagetable.pdf ├── order.tex ├── trap.tex ├── sleep.tex ├── switch.tex ├── mkernel.svg ├── os.svg ├── race.svg ├── fslayer.svg ├── deadlock.svg ├── smp.svg ├── as.svg ├── switch.svg └── fslayout.svg ├── font ├── MinionPro-It.otf ├── MinionPro-Bold.otf ├── MinionPro-BoldIt.otf ├── MinionPro-Regular.otf ├── MinionPro-Semibold.otf ├── MinionPro-SemiboldIt.otf ├── README └── LucidaSans-Typewriter83.afm ├── .gitignore ├── xv6-riscv-src-booklet ├── toc.hdr ├── Makefile ├── toc.ftr ├── pr.pl ├── runoff.list ├── runoff.spec ├── runoff1 └── runoff ├── README.md ├── bin ├── double.pl └── capital.py ├── sum.tex ├── LICENSE ├── Makefile ├── acks.tex ├── book.tex ├── lineref ├── book.bib ├── tr2tex ├── lock2.tex ├── pgfault.tex └── interrupt.tex /aspell.words: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/mit-pdos/xv6-riscv-book/HEAD/aspell.words -------------------------------------------------------------------------------- /fig/as.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/mit-pdos/xv6-riscv-book/HEAD/fig/as.pdf -------------------------------------------------------------------------------- /fig/os.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/mit-pdos/xv6-riscv-book/HEAD/fig/os.pdf -------------------------------------------------------------------------------- /fig/race.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/mit-pdos/xv6-riscv-book/HEAD/fig/race.pdf -------------------------------------------------------------------------------- /fig/smp.pdf: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/mit-pdos/xv6-riscv-book/HEAD/fig/smp.pdf -------------------------------------------------------------------------------- /fig/inode.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/mit-pdos/xv6-riscv-book/HEAD/fig/inode.pdf -------------------------------------------------------------------------------- /fig/switch.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/mit-pdos/xv6-riscv-book/HEAD/fig/switch.pdf -------------------------------------------------------------------------------- /fig/deadlock.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/mit-pdos/xv6-riscv-book/HEAD/fig/deadlock.pdf -------------------------------------------------------------------------------- /fig/fslayer.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/mit-pdos/xv6-riscv-book/HEAD/fig/fslayer.pdf -------------------------------------------------------------------------------- /fig/fslayout.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/mit-pdos/xv6-riscv-book/HEAD/fig/fslayout.pdf -------------------------------------------------------------------------------- /fig/mkernel.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/mit-pdos/xv6-riscv-book/HEAD/fig/mkernel.pdf -------------------------------------------------------------------------------- /fig/xv6_layout.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/mit-pdos/xv6-riscv-book/HEAD/fig/xv6_layout.pdf 
-------------------------------------------------------------------------------- /fig/processlayout.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/mit-pdos/xv6-riscv-book/HEAD/fig/processlayout.png -------------------------------------------------------------------------------- /fig/riscv_address.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/mit-pdos/xv6-riscv-book/HEAD/fig/riscv_address.pdf -------------------------------------------------------------------------------- /font/MinionPro-It.otf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/mit-pdos/xv6-riscv-book/HEAD/font/MinionPro-It.otf -------------------------------------------------------------------------------- /fig/riscv_pagetable.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/mit-pdos/xv6-riscv-book/HEAD/fig/riscv_pagetable.pdf -------------------------------------------------------------------------------- /font/MinionPro-Bold.otf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/mit-pdos/xv6-riscv-book/HEAD/font/MinionPro-Bold.otf -------------------------------------------------------------------------------- /font/MinionPro-BoldIt.otf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/mit-pdos/xv6-riscv-book/HEAD/font/MinionPro-BoldIt.otf -------------------------------------------------------------------------------- /font/MinionPro-Regular.otf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/mit-pdos/xv6-riscv-book/HEAD/font/MinionPro-Regular.otf 
-------------------------------------------------------------------------------- /font/MinionPro-Semibold.otf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/mit-pdos/xv6-riscv-book/HEAD/font/MinionPro-Semibold.otf -------------------------------------------------------------------------------- /font/MinionPro-SemiboldIt.otf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/mit-pdos/xv6-riscv-book/HEAD/font/MinionPro-SemiboldIt.otf -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | *.pdf 2 | *.*~ 3 | *.ps 4 | latex.out/ 5 | xv6-riscv-src 6 | xv6-riscv-src-booklet/fmt 7 | *.aux 8 | *.idx 9 | *.ilg 10 | *.ind 11 | *.log 12 | *.toc 13 | *.bbl 14 | *.blg 15 | *.out 16 | -------------------------------------------------------------------------------- /font/README: -------------------------------------------------------------------------------- 1 | These fonts are taken from 2 | 3 | /home/am8/rsc/font/Adobe/TypeClassics/MinionPro 4 | /home/am8/rsc/font/BellLabs 5 | 6 | They are copyrighted material used under license 7 | and cannot be redistributed. 8 | 9 | -------------------------------------------------------------------------------- /xv6-riscv-src-booklet/toc.hdr: -------------------------------------------------------------------------------- 1 | The numbers to the left of the file names in the table are sheet numbers. 2 | The source code has been printed in a double column format with fifty 3 | lines per column, giving one hundred lines per sheet (or page). 4 | Thus there is a convenient relationship between line numbers and sheet numbers. 
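The relationship between line numbers and sheet numbers described above can be sketched in a few lines of Python. This is a minimal illustration, assuming four-digit booklet line numbers, fifty lines per column, and two columns per sheet as stated in the text; the names `sheet_of` and `column_of` are hypothetical and are not part of the booklet tools.

```python
# Hypothetical helpers (not part of the repository) illustrating how a
# booklet line number maps to a sheet and a column.
LINES_PER_COLUMN = 50
COLUMNS_PER_SHEET = 2
LINES_PER_SHEET = LINES_PER_COLUMN * COLUMNS_PER_SHEET  # one hundred lines


def sheet_of(line_number):
    """Return the sheet on which a booklet line number is printed."""
    return line_number // LINES_PER_SHEET


def column_of(line_number):
    """Return 0 for the left column of the sheet, 1 for the right column."""
    return (line_number % LINES_PER_SHEET) // LINES_PER_COLUMN


if __name__ == "__main__":
    for n in (374, 2428, 2658):
        print(f"line {n:04d}: sheet {sheet_of(n):02d}, column {column_of(n)}")
```

In other words, the first two digits of a four-digit line number give the sheet, and the last two digits tell whether the line falls in the first or second column.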
5 | 6 | 7 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | This edition of the book has been converted to LaTeX. 2 | In order to build it, ensure you have a TeX distribution that contains 3 | the `pdflatex` command. With that, you should be able to build the book 4 | by running `make`, which will clone the OS itself and build the book 5 | to `book.pdf` in the main directory. 6 | 7 | Figures are drawn using `inkscape`. 8 | -------------------------------------------------------------------------------- /xv6-riscv-src-booklet/Makefile: -------------------------------------------------------------------------------- 1 | # make a printout 2 | SRC=../xv6-riscv-src 3 | FS = $(shell grep -v '^\#' runoff.list) 4 | FILES = $(addprefix $(SRC)/,$(FS)) 5 | 6 | PRINT = runoff.list runoff.spec $(SRC)/README toc.hdr toc.ftr $(FILES) 7 | 8 | xv6.pdf: $(PRINT) 9 | ./runoff 10 | ls -l xv6-src-booklet.pdf 11 | 12 | clean: 13 | rm xv6-src-booklet.pdf 14 | rm -rf fmt 15 | -------------------------------------------------------------------------------- /bin/double.pl: -------------------------------------------------------------------------------- 1 | #!/usr/bin/perl 2 | 3 | # Detects duplicated words even when they are 4 | # repeated between lines. 
5 | # Taken from the ORA regex book 6 | 7 | $/ = ".\n"; 8 | while (<>) { 9 | next if !s/\b([a-z]+)((\s|<[^>]+>)+)(\1\b)/\e[7m$1\e[m$2\e[7m$4\e[m/ig; 10 | 11 | s/^([^\e]*\n)+//mg; 12 | s/^/$ARGV: /mg; 13 | print; 14 | } 15 | 16 | # also test for things like 17 | # [^\w+]a\w+[aeiou] and [^\w+]an\w+[!aeiou] 18 | -------------------------------------------------------------------------------- /xv6-riscv-src-booklet/toc.ftr: -------------------------------------------------------------------------------- 1 | 2 | 3 | The source listing is preceded by a cross-reference that lists every defined 4 | constant, struct, global variable, and function in xv6. Each entry gives, 5 | on the same line as the name, the line number (or, in a few cases, numbers) 6 | where the name is defined. Successive lines in an entry list the line 7 | numbers where the name is used. For example, this entry: 8 | 9 | swtch 2658 10 | 0374 2428 2466 2657 2658 11 | 12 | indicates that swtch is defined on line 2658 and is mentioned on five lines 13 | on sheets 03, 24, and 26. 14 | -------------------------------------------------------------------------------- /fig/order.tex: -------------------------------------------------------------------------------- 1 | \begin{tikzpicture} 2 | 3 | \node at (0, 1.5) {CPU C1}; 4 | \node at (5, 1.5) {CPU C2}; 5 | 6 | \node (code0) at (0,0) { 7 | \begin{lstlisting}[] 8 | acquire(&A); 9 | acquire(&B); 10 | ... 11 | release(&B); 12 | release(&A); 13 | \end{lstlisting} 14 | 15 | %% this line is important 16 | }; 17 | 18 | \node (code1) at (5,0.0) { 19 | \begin{lstlisting}[] 20 | acquire(&B); 21 | acquire(&A); 22 | ... 
23 | release(&A); 24 | release(&B); 25 | \end{lstlisting} 26 | 27 | %% this line is important 28 | }; 29 | 30 | \end{tikzpicture} 31 | -------------------------------------------------------------------------------- /fig/trap.tex: -------------------------------------------------------------------------------- 1 | \begin{tikzpicture}[>=latex] 2 | 3 | \tikzset{ 4 | lnode/.style={ 5 | thick, 6 | text width = 3cm, 7 | align = center, 8 | } 9 | } 10 | 11 | \node[lnode] at (0, 0) (n0) {User code}; 12 | \node[lnode] at (0, -2) (n1) {trampoline\\\lstinline{uservec}}; 13 | \node[lnode] at (0, -4.0) (n2) {\lstinline{usertrap}}; 14 | \node[lnode] at (0, -6.0) (n3) {syscall\\or device driver}; 15 | 16 | \node[lnode] at (6, 0) (n4) {User code}; 17 | \node[lnode] at (6, -2) (n5) {trampoline\\\lstinline{userret}}; 18 | 19 | \draw[->] (n0) -- (n1); 20 | \draw[->] (n1) -- (n2); 21 | \draw[->] (n2) -- (n3); 22 | 23 | \draw[->] (n5) -- (n4); 24 | \draw[->] (n2) -- (6,-4.0) -- (n5); 25 | 26 | 27 | \end{tikzpicture} 28 | -------------------------------------------------------------------------------- /xv6-riscv-src-booklet/pr.pl: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env perl 2 | 3 | use POSIX qw(strftime); 4 | 5 | if($ARGV[0] eq "-h"){ 6 | shift @ARGV; 7 | $h = $ARGV[0]; 8 | shift @ARGV; 9 | }else{ 10 | $h = $ARGV[0]; 11 | } 12 | 13 | $page = 0; 14 | $now = strftime "%b %e %H:%M %Y", localtime; 15 | 16 | @lines = <>; 17 | for($i=0; $i<@lines; $i+=50){ 18 | print "\n\n"; 19 | ++$page; 20 | print "$now $h Page $page\n"; 21 | print "\n\n"; 22 | for($j=$i; $j<@lines && $j<$i +50; $j++){ 23 | $lines[$j] =~ s!//DOC.*!!; 24 | print $lines[$j]; 25 | } 26 | for(; $j<$i+50; $j++){ 27 | print "\n"; 28 | } 29 | $sheet = ""; 30 | if($lines[$i] =~ /^([0-9][0-9])[0-9][0-9] /){ 31 | $sheet = "Sheet $1"; 32 | } 33 | print "\n\n"; 34 | print "$sheet\n"; 35 | print "\n\n"; 36 | } 37 | 
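The pagination loop in pr.pl above can be sketched in Python as follows. This is an illustrative sketch only, not part of the repository: the function name `paginate` and its return shape are assumptions, input lines are assumed to have no trailing newline, and the dated page header that pr.pl prints is omitted.

```python
import re

LINES_PER_PAGE = 50
# Same pattern pr.pl uses: the first two digits of a four-digit line
# number identify the sheet.
SHEET_RE = re.compile(r"^([0-9][0-9])[0-9][0-9] ")


def paginate(lines):
    """Split numbered source lines into fixed-size pages.

    Returns a list of (page_number, sheet_label, body) tuples. Each body
    has exactly LINES_PER_PAGE entries; short pages are padded with "".
    """
    pages = []
    for start in range(0, len(lines), LINES_PER_PAGE):
        chunk = lines[start:start + LINES_PER_PAGE]
        # Drop //DOC annotations, as pr.pl does before printing.
        body = [re.sub(r"//DOC.*", "", line) for line in chunk]
        # Pad the final page so every page has the same height.
        body += [""] * (LINES_PER_PAGE - len(body))
        match = SHEET_RE.match(lines[start])
        sheet = f"Sheet {match.group(1)}" if match else ""
        pages.append((len(pages) + 1, sheet, body))
    return pages


if __name__ == "__main__":
    demo = [f"{n:04d} x" for n in range(2600, 2675)]
    for number, sheet, body in paginate(demo):
        print(number, sheet, len(body))
```

Padding every page to a fixed height is what keeps the line-number-to-sheet relationship stable across the whole printout.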
-------------------------------------------------------------------------------- /bin/capital.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | 3 | import re 4 | import sys 5 | 6 | # look for uncapitalized xv6 at the start of a sentence 7 | regexsmall = re.compile(r'\.\s+(xv6)') 8 | 9 | with open(sys.argv[1], 'r') as f: 10 | d = f.read() 11 | for w in re.findall(regexsmall, d): 12 | print("Error smallcaps %s: %s" % (sys.argv[1], w)) 13 | 14 | # look for capitalized code names (e.g., \lstinline{Exec}), but 15 | # not names that are all caps 16 | regexbig = re.compile(r'\\(lstinline|indexcode){([A-Z][a-z]+[a-zA-Z_]+)') 17 | 18 | with open(sys.argv[1], 'r') as f: 19 | line = f.readline() 20 | cnt = 1 21 | while line: 22 | for w in re.findall(regexbig, line): 23 | print("%s:%d: error: %s" % (sys.argv[1], cnt, w[1])) 24 | line = f.readline() 25 | cnt += 1 26 | -------------------------------------------------------------------------------- /sum.tex: -------------------------------------------------------------------------------- 1 | \chapter{Summary} 2 | \label{CH:SUM} 3 | 4 | This text introduced the main ideas in operating systems by studying one 5 | operating system, xv6, line by line. Some code lines embody the essence of the 6 | main ideas (e.g., context switching, user/kernel boundary, locks) and each 7 | line is important; other code lines provide an illustration of how to implement 8 | a particular operating system idea and could easily be done in different ways 9 | (e.g., a better algorithm for scheduling, better on-disk data structures to 10 | represent files, better logging to allow for concurrent transactions). 11 | All the ideas were illustrated in the context of one particular, very successful 12 | system call interface, the Unix interface, but those ideas carry over to the 13 | design of other operating systems. 
14 | 15 | -------------------------------------------------------------------------------- /fig/sleep.tex: -------------------------------------------------------------------------------- 1 | \begin{tikzpicture} 2 | 3 | \node (code0) at (0,0) { 4 | \begin{lstlisting}[] 5 | piperead() { 6 | acquire(&pipe->lock); 7 | while(no data in pipe->buffer) { 8 | sleep(&pipe, &pipe->lock) { 9 | // in sleep() 10 | acquire(&p->lock) 11 | release(&pipe->lock) 12 | p->state = SLEEPING 13 | ... 14 | swtch() { 15 | // in scheduler() 16 | release(&p->lock) 17 | ... 18 | \end{lstlisting} 19 | 20 | %% this line is important 21 | }; 22 | 23 | \draw[<->,thick] (4.7,2.5) -- (4.7,-0.3); 24 | \draw[<->,thick] (4.9,0.7) -- (4.9,-2.6); 25 | \node at (6.5, 1.5) {Holding pipe->lock}; 26 | \node at (6.5, -1.3) {Holding p->lock}; 27 | \end{tikzpicture} 28 | -------------------------------------------------------------------------------- /xv6-riscv-src-booklet/runoff.list: -------------------------------------------------------------------------------- 1 | # basic headers 2 | kernel/types.h 3 | kernel/param.h 4 | kernel/memlayout.h 5 | kernel/defs.h 6 | kernel/riscv.h 7 | kernel/types.h 8 | kernel/elf.h 9 | 10 | # entering xv6 11 | kernel/entry.S 12 | kernel/start.c 13 | kernel/main.c 14 | 15 | # locks 16 | kernel/spinlock.h 17 | kernel/spinlock.c 18 | 19 | # processes 20 | kernel/vm.c 21 | kernel/proc.h 22 | kernel/proc.c 23 | kernel/swtch.S 24 | kernel/kalloc.c 25 | 26 | # system calls 27 | kernel/trampoline.S 28 | kernel/kernelvec.S 29 | kernel/trap.c 30 | kernel/syscall.h 31 | kernel/syscall.c 32 | kernel/sysproc.c 33 | 34 | # file system 35 | kernel/buf.h 36 | kernel/sleeplock.h 37 | kernel/fcntl.h 38 | kernel/stat.h 39 | kernel/fs.h 40 | kernel/file.h 41 | kernel/bio.c 42 | kernel/sleeplock.c 43 | kernel/log.c 44 | kernel/fs.c 45 | kernel/file.c 46 | kernel/sysfile.c 47 | kernel/exec.c 48 | 49 | # pipes 50 | kernel/pipe.c 51 | 52 | # string operations 53 | kernel/string.c 54 | 55 | # 
low-level hardware 56 | kernel/plic.c 57 | kernel/console.c 58 | kernel/uart.c 59 | kernel/virtio_disk.c 60 | 61 | # user-level 62 | user/init.c 63 | user/sh.c 64 | 65 | # link 66 | kernel/kernel.ld 67 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | The xv6 book sources are: 2 | 3 | Copyright (c) 2006-2024 Russ Cox, Frans Kaashoek, and Robert Morris, 4 | Massachusetts Institute of Technology 5 | 6 | Permission is hereby granted, free of charge, to any person obtaining a copy of 7 | this source and associated documentation files (the "Book"), to deal in the Book 8 | without restriction, including without limitation the rights to use, copy, 9 | modify, merge, publish, distribute, sublicense, and/or sell copies of the Book, 10 | and to permit persons to whom the Book is furnished to do so, subject to the 11 | following conditions: 12 | 13 | The above copyright notice and this permission notice shall be included in all 14 | copies or substantial portions of the Book. 15 | 16 | THE BOOK IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, 17 | INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A 18 | PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR 19 | COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER 20 | IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN 21 | CONNECTION WITH THE BOOK OR THE USE OR OTHER DEALINGS IN THE BOOK. 
22 | 23 | -------------------------------------------------------------------------------- /fig/switch.tex: -------------------------------------------------------------------------------- 1 | \begin{tikzpicture} 2 | 3 | \node at (-0.4, 1.2) {Process 1}; 4 | \node at (4.2, 1.2) {Scheduler}; 5 | \node at (8.5, 1.2) {Process 2}; 6 | 7 | \node (code0) at (0,0) { 8 | \begin{lstlisting}[basicstyle=\tt\footnotesize] 9 | (*@\textcolor{blue}{acquire(\&p->lock);}@*) 10 | ... 11 | p->state = RUNNABLE; 12 | swtch(&p->context, ...); 13 | \end{lstlisting} 14 | 15 | %% this line is important 16 | }; 17 | 18 | \node (code1) at (4.5,-3.0) { 19 | \begin{lstlisting}[basicstyle=\tt\footnotesize] 20 | swtch(...); // return 21 | (*@\textcolor{blue}{release(\&p->lock)};@*) 22 | 23 | // find a RUNNABLE p 24 | 25 | (*@\textcolor{green}{acquire(\&p->lock);}@*) 26 | p->state = RUNNING; 27 | swtch(...,&p->context); 28 | \end{lstlisting} 29 | 30 | %% this line is important 31 | }; 32 | 33 | \node (code2) at (9.6,-5.5) { 34 | \begin{lstlisting}[basicstyle=\tt\footnotesize] 35 | swtch(&p->context,...); // return 36 | (*@\textcolor{green}{release(\&p->lock);}@*) 37 | \end{lstlisting} 38 | 39 | %% this line is important 40 | }; 41 | 42 | \draw[->,thick] (1.2,-0.95) -- (2.5,-1.4); 43 | \draw[->,thick] (5.3,-4.75) -- (6.8,-5.2); 44 | 45 | \end{tikzpicture} 46 | -------------------------------------------------------------------------------- /Makefile: -------------------------------------------------------------------------------- 1 | SRC=xv6-riscv-src/ 2 | 3 | T=latex.out 4 | 5 | TEX=$(patsubst %,$(T)/%,$(wildcard *.tex)) 6 | SPELLTEX=$(wildcard *.tex) 7 | 8 | all: book.pdf 9 | .PHONY: all src clean 10 | 11 | $(T)/%.tex: %.tex | src 12 | mkdir -p latex.out 13 | ./lineref $(notdir $@) $(SRC) xv6-riscv-src-booklet/fmt > $@ 14 | 15 | src: 16 | if [ ! 
-d $(SRC) ]; then \ 17 | git clone git@github.com:mit-pdos/xv6-riscv.git $(SRC) ; \ 18 | else \ 19 | git -C $(SRC) pull ; \ 20 | fi; \ 21 | true 22 | 23 | booklet: src 24 | (cd xv6-riscv-src-booklet; make) 25 | mv xv6-riscv-src-booklet/xv6-src-booklet.pdf . 26 | 27 | book.pdf: booklet book.tex $(TEX) 28 | pdflatex book.tex 29 | bibtex book 30 | pdflatex book.tex 31 | pdflatex book.tex 32 | 33 | 34 | lineref: $(TEX) booklet 35 | echo done 36 | 37 | clean: 38 | rm -f book.aux book.idx book.ilg book.ind book.log\ 39 | book.toc book.bbl book.blg book.out 40 | rm -rf latex.out 41 | rm -rf $(SRC) 42 | 43 | spell: 44 | @ for i in $(SPELLTEX); do aspell --mode=tex -p ./aspell.words -c $$i; done 45 | @ for i in $(SPELLTEX); do perl bin/double.pl $$i; done 46 | @ for i in $(SPELLTEX); do perl bin/capital.py $$i; done 47 | @ ( head -1 aspell.words ; tail -n +2 aspell.words | sort ) > aspell.words~ 48 | @ mv aspell.words~ aspell.words 49 | -------------------------------------------------------------------------------- /acks.tex: -------------------------------------------------------------------------------- 1 | \chapter*{Foreword and acknowledgments} 2 | 3 | 4 | This is a draft text intended for a class on operating systems. It 5 | explains the main concepts of operating systems by studying an example 6 | kernel, named xv6. Xv6 is modeled on Dennis Ritchie's and 7 | Ken Thompson's Unix Version 6 (v6)~\cite{unix}. Xv6 loosely follows the structure 8 | and style of v6, but is implemented in ANSI C~\cite{kernighan} for 9 | a multi-core RISC-V~\cite{riscv}. 10 | 11 | This text should be read along with the source code for xv6, an 12 | approach inspired by John Lions' Commentary on UNIX 6th 13 | Edition~\cite{lions}; the text has hyperlinks to the source code at 14 | \url{https://github.com/mit-pdos/xv6-riscv}. See 15 | \url{https://pdos.csail.mit.edu/6.1810} for additional pointers to 16 | on-line resources for v6 and xv6, including several lab assignments 17 | using xv6. 
18 | 19 | We have used this text in 6.828 and 6.1810, the operating system 20 | classes at MIT. We thank the faculty, teaching assistants, and 21 | students of those classes who have all directly or indirectly 22 | contributed to xv6. In particular, we would like to thank Adam Belay, 23 | Austin Clements, and Nickolai Zeldovich. Finally, we would like to 24 | thank people who emailed us bugs in the text or suggestions for 25 | improvements: Abutalib Aghayev, Sebastian Boehm, brandb97, Anton 26 | Burtsev, Raphael Carvalho, Tej Chajed, Brendan Davidson, Rasit 27 | Eskicioglu, Color Fuzzy, Wojciech Gac, Giuseppe, Tao Guo, Haibo Hao, 28 | Naoki Hayama, Chris Henderson, Robert Hilderman, Eden Hochbaum, 29 | Wolfgang Keller, Paweł Kraszewski, Henry Laih, Jin Li, Austin Liew, 30 | lyazj@github.com, Pavan Maddamsetti, Jacek Masiulaniec, Michael 31 | McConville, m3hm00d, Mes0903, miguelgvieira, Mark Morrissey, Muhammed Mourad, 32 | Harry Pan, Harry Porter, Siyuan Qian, Zhefeng Qiao, Askar Safin, 33 | Salman Shah, Huang Sha, Vikram Shenoy, Adeodato Simó, Ruslan 34 | Savchenko, Pawel Szczurko, Warren Toomey, tyfkda, tzerbib, Vanush 35 | Vaswani, Chen Wang, Xi Wang, Zou Chang Wei, Sam Whitlock, Qiongsi Wu, 36 | LucyShawYang, ykf1114@gmail.com, and Meng Zhou. 37 | 38 | If you spot errors or have suggestions for improvement, please send email to 39 | Frans Kaashoek and Robert Morris (kaashoek,rtm@csail.mit.edu). 40 | -------------------------------------------------------------------------------- /xv6-riscv-src-booklet/runoff.spec: -------------------------------------------------------------------------------- 1 | # Is sheet 01 (after the TOC) a left sheet or a right sheet? 2 | sheet1: left 3 | 4 | # "left" and "right" specify which page of a two-page spread a file 5 | # must start on. "left" means that a file must start on the first of 6 | # the two pages. "right" means it must start on the second of the two 7 | # pages. The file may start in either column. 
8 | # 9 | # "even" and "odd" specify which column a file must start on. "even" 10 | # means it must start in the left of the two columns (00). "odd" means it 11 | # must start in the right of the two columns (50). 12 | # 13 | # You'd think these would be the other way around. 14 | 15 | # types.h either 16 | # param.h either 17 | # defs.h either 18 | # x86.h either 19 | # asm.h either 20 | # mmu.h either 21 | # elf.h either 22 | # mp.h either 23 | 24 | even: kernel/main.c 25 | # mp.c don't care at all 26 | # even: initcode.S 27 | # odd: init.c 28 | 29 | left: kernel/spinlock.h 30 | even: kernel/spinlock.h 31 | 32 | # This gets struct proc and allocproc on the same spread 33 | left: kernel/proc.h 34 | even: kernel/proc.c 35 | 36 | # goal is to have two action-packed 2-page spreads, 37 | # one with 38 | # userinit growproc fork exit wait 39 | # and another with 40 | # scheduler sched yield forkret sleep wakeup1 wakeup 41 | right: kernel/proc.c # VERY important 42 | even: kernel/proc.c # VERY important 43 | 44 | # A few more action packed spreads 45 | # page table creation and process loading 46 | # walkpgdir mappages setupkvm switch[ku]vm inituvm (loaduvm) 47 | # process memory management 48 | # allocuvm deallocuvm freevm 49 | left: kernel/vm.c 50 | 51 | even: kernel/kalloc.c # mild preference 52 | 53 | # syscall.h either 54 | # trapasm.S either 55 | # traps.h either 56 | # even: trap.c 57 | # vectors.pl either 58 | # syscall.c either 59 | # sysproc.c either 60 | 61 | # buf.h either 62 | # dev.h either 63 | # fcntl.h either 64 | # stat.h either 65 | # file.h either 66 | # fs.h either 67 | # fsvar.h either 68 | # left: ide.c # mild preference 69 | # odd: bio.c 70 | 71 | # log.c fits nicely in a spread 72 | even: kernel/log.c 73 | left: kernel/log.c 74 | 75 | # with fs.c starting on 2nd column of a left page, we get these 2-page spreads: 76 | # ialloc iupdate iget idup ilock iunlock iput iunlockput 77 | # bmap itrunc stati readi writei 78 | # namecmp dirlookup dirlink 
skipelem namex namei 79 | # fileinit filealloc filedup fileclose filestat fileread filewrite 80 | # starting on 2nd column of a right page is not terrible either 81 | odd: kernel/fs.c # VERY important 82 | left: kernel/fs.c # mild preference 83 | # file.c either 84 | # exec.c either 85 | # sysfile.c either 86 | 87 | # even: pipe.c # mild preference 88 | # string.c either 89 | even: kernel/console.c 90 | odd: user/sh.c 91 | 92 | -------------------------------------------------------------------------------- /xv6-riscv-src-booklet/runoff1: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env perl 2 | 3 | $n = 0; 4 | $v = 0; 5 | if($ARGV[0] eq "-v") { 6 | $v = 1; 7 | shift @ARGV; 8 | } 9 | if($ARGV[0] eq "-n") { 10 | $n = $ARGV[1]; 11 | shift @ARGV; 12 | shift @ARGV; 13 | } 14 | $n = int(($n+49)/50)*50 - 1; 15 | 16 | $file = $ARGV[0]; 17 | @lines = <>; 18 | $linenum = 0; 19 | foreach (@lines) { 20 | $linenum++; 21 | chomp; 22 | s/\s+$//; 23 | if(length() >= 75){ 24 | print STDERR "$file:$linenum: line too long\n"; 25 | } 26 | } 27 | @outlines = (); 28 | $nextout = 0; 29 | 30 | for($i=0; $i<@lines; ){ 31 | # Skip leading blank lines. 32 | $i++ while $i<@lines && $lines[$i] =~ /^$/; 33 | last if $i>=@lines; 34 | 35 | # If the rest of the file fits, use the whole thing. 36 | if(@lines <= $i+50 && !grep { /PAGEBREAK/ } @lines){ 37 | $breakbefore = @lines; 38 | }else{ 39 | # Find a good next page break; 40 | # Hope for end of function. 41 | # but settle for a blank line (but not first blank line 42 | # in function, which comes after variable declarations). 
43 | $breakbefore = $i; 44 | $lastblank = $i; 45 | $sawbrace = 0; 46 | $breaksize = 15; # 15 lines to get to function 47 | for($j=$i; $j<$i+50 && $j < @lines; $j++){ 48 | if($lines[$j] =~ /PAGEBREAK!/){ 49 | $lines[$j] = ""; 50 | $breakbefore = $j; 51 | $breaksize = 100; 52 | last; 53 | } 54 | if($lines[$j] =~ /PAGEBREAK:\s*([0-9]+)/){ 55 | $breaksize = $1; 56 | $breakbefore = $j; 57 | $lines[$j] = ""; 58 | } 59 | if($lines[$j] =~ /^};?$/){ 60 | $breakbefore = $j+1; 61 | $breaksize = 15; 62 | } 63 | if($lines[$j] =~ /^{$/){ 64 | $sawbrace = 1; 65 | } 66 | if($lines[$j] =~ /^$/){ 67 | if($sawbrace){ 68 | $sawbrace = 0; 69 | }else{ 70 | $lastblank = $j; 71 | } 72 | } 73 | } 74 | if($j<@lines && $lines[$j] =~ /^$/){ 75 | $lastblank = $j; 76 | } 77 | 78 | # If we are not putting enough on a page, try a blank line. 79 | if($breakbefore - $i < 50 - $breaksize && $lastblank > $breakbefore && $lastblank >= $i+50 - 5){ 80 | if($v){ 81 | print STDERR "breakbefore $breakbefore i $i breaksize $breaksize\n"; 82 | } 83 | $breakbefore = $lastblank; 84 | $breaksize = 5; # only 5 lines to get to blank line 85 | } 86 | 87 | # If we are not putting enough on a page, force a full page. 88 | if($breakbefore - $i < 50 - $breaksize && $breakbefore != @lines){ 89 | $breakbefore = $i + 50; 90 | $breakbefore = @lines if @lines < $breakbefore; 91 | } 92 | 93 | if($breakbefore < $i+2){ 94 | $breakbefore = $i+2; 95 | } 96 | } 97 | 98 | # Emit the page. 
99 | $i50 = $i + 50; 100 | for(; $i<$breakbefore; $i++){ 101 | printf "%04d %s\n", ++$n, $lines[$i]; 102 | } 103 | 104 | # Finish page 105 | for($j=$i; $j<$i50; $j++){ 106 | printf "%04d \n", ++$n; 107 | } 108 | } 109 | -------------------------------------------------------------------------------- /book.tex: -------------------------------------------------------------------------------- 1 | \documentclass[12pt]{book} 2 | \usepackage[T1]{fontenc} 3 | \usepackage{times} 4 | \usepackage{listings} 5 | \usepackage{graphicx} 6 | \usepackage{xcolor} 7 | \usepackage{url} 8 | % \usepackage{showidx} 9 | \usepackage{imakeidx} 10 | \usepackage{booktabs} 11 | \usepackage{url} 12 | \usepackage{etoolbox} % for showidx 13 | \usepackage{fullpage} 14 | \usepackage{soul} 15 | \usepackage[utf8]{inputenc} 16 | \usepackage{gnuplot-lua-tikz} 17 | 18 | \usepackage{hyperref} % should be last 19 | 20 | % One space after periods 21 | \frenchspacing 22 | 23 | \hypersetup{pdfauthor={Russ Cox, Frans Kaashoek, Robert Morris}, 24 | pdftitle={xv6: a simple, Unix-like teaching operating system},} 25 | 26 | \lstset{basicstyle=\small\ttfamily} 27 | \lstset{morecomment=[is]{[[[}{]]]}} 28 | \lstset{escapeinside={(*@}{@*)}} 29 | \lstset{xleftmargin=5.0ex} 30 | 31 | \newcommand{\github}{https://github.com/mit-pdos/xv6-riscv/blob/riscv/} 32 | 33 | \newcommand{\fileref}[1]{\small{\textcolor{red}{(#1)}}} 34 | \newcommand{\lineref}[2]{\small{\textcolor{red}{(#1:#2)}}} 35 | \newcommand{\linerefs}[3]{\small{\textcolor{red}{(#1:#2-#3)}}} 36 | 37 | % MFK: this (or file:) gives a security alert 38 | % \newcommand{\showlineref}[3]{\href{run:./xv6-riscv-src/#1:#2}{\small{(\textcolor{blue}{#3})}}} 39 | 40 | \newcommand{\showfileref}[2]{\href{\github/#1}{\small{(\textcolor{blue}{#2})}}} 41 | \newcommand{\showlineref}[3]{\href{\github/#1\#L#2}{\small{(\textcolor{blue}{#3})}}} 42 | \newcommand{\showlinerefs}[5]{\href{\github/#1\#L#2-L#3}{\small\textcolor{blue}{(#4-#5)}}} 43 | 44 | 
\newcommand{\indextext}[1]{\textit{#1}\index{#1}} 45 | \newcommand{\indextextx}[1]{{#1}\index{#1}} 46 | \newcommand{\indexcode}[1]{\lstinline{#1}\index{#1@\lstinline{#1}}} 47 | 48 | %% editing markup 49 | \newcommand{\insertnote}[3]{\noindent\textcolor{#1}{\textbf{#2:} #3}} 50 | \newcommand{\note}[1]{\insertnote{blue}{NOTE}{#1}} 51 | \newcommand{\rtm}[1]{\insertnote{red}{RTM}{#1}} 52 | \newcommand{\mfk}[1]{\insertnote{red}{MFK}{#1}} 53 | %% for publishing book without notes 54 | %\renewcommand{\insertnote}[3]{} 55 | 56 | \title{\textbf{xv6: a simple, Unix-like teaching operating system}} 57 | \author{Russ Cox \and Frans Kaashoek \and Robert Morris} 58 | 59 | \makeindex 60 | 61 | \begin{document} 62 | 63 | \maketitle 64 | 65 | \tableofcontents 66 | 67 | \input{latex.out/acks} 68 | \input{latex.out/unix} 69 | \input{latex.out/first} 70 | \input{latex.out/mem} 71 | \input{latex.out/trap} 72 | \input{latex.out/pgfault} 73 | \input{latex.out/interrupt} 74 | \input{latex.out/lock} 75 | \input{latex.out/sched} 76 | \input{latex.out/sleep} 77 | \input{latex.out/fs} 78 | \input{latex.out/lock2} 79 | \input{latex.out/sum} 80 | 81 | { 82 | % The following prevents latex from splitting a bibliography entry with a page 83 | % break 84 | \interlinepenalty=10000 85 | % Since we're using natbib in numbers mode, we don't need plainnat, 86 | % which exists to feed authors and years back in to natbib. As a 87 | % result, it complains about entries without years, which we don't 88 | % care about. 
89 | %\bibliographystyle{plainnat} 90 | \bibliographystyle{plain} 91 | \bibliography{book} 92 | } 93 | 94 | \printindex 95 | 96 | \end{document} 97 | -------------------------------------------------------------------------------- /lineref: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | 3 | import os 4 | import re 5 | import sys 6 | 7 | if len(sys.argv) < 4: 8 | sys.exit("error: too few arguments") 9 | 10 | xv6src = sys.argv[2] 11 | xv6booklet = sys.argv[3] 12 | 13 | # format line in lions booklet style 14 | def fmt_line(i): 15 | return f"{i:04d}" 16 | 17 | def line_number(path, i, line): 18 | if path == xv6booklet: 19 | return int(line[0:4]) 20 | else: 21 | return i 22 | 23 | def lookup_fileref_booklet(fname): 24 | p = xv6booklet 25 | try: 26 | with open(p +'/'+fname) as f: 27 | line = f.readline() 28 | return line_number(p, 1, line) 29 | except IOError: 30 | print("error: cannot open %s" % fname, file=sys.stderr) 31 | return None 32 | 33 | def lookup_regex_booklet(fname, pat1, pat2=None): 34 | pat1 = pat1.replace(r"^", r"\d\d\d\d ") 35 | if pat2 != None: 36 | pat2 = pat2.replace(r"^", r"\d\d\d\d ") 37 | return lookup_regex(xv6booklet, fname, pat1, pat2) 38 | 39 | def lookup_regex(path, fname, pat1, pat2=None): 40 | fname = fname.replace(r"\_", "_") 41 | p1 = re.compile(pat1) 42 | if pat2 != None: 43 | p2 = re.compile(pat2) 44 | else: 45 | p2 = None 46 | cnt = 1 47 | i = None 48 | j = None 49 | p = p1 50 | try: 51 | with open(path+'/'+fname) as f: 52 | line = f.readline() 53 | while line: 54 | m = p.search(line) 55 | if m != None: 56 | if p2 == None: 57 | cnt = line_number(path, cnt, line) 58 | return (cnt, None) 59 | else: 60 | if i == None: 61 | i = line_number(path, cnt, line) 62 | p = p2 63 | else: 64 | j = line_number(path, cnt, line) 65 | return (i, j) 66 | cnt += 1 67 | line = f.readline() 68 | if i == None: 69 | print("%s: cannot find pat %s" % (fname, p1), file=sys.stderr) 70 | 
else: 71 | print("%s: cannot find pat %s" % (fname, p2), file=sys.stderr) 72 | return (None, None) 73 | except IOError: 74 | print("error: cannot open %s" % fname, file=sys.stderr) 75 | return (None, None) 76 | 77 | def lineref(l): 78 | # file:/pattern/delta 79 | p = re.compile(r'\\lineref{(.*):\/(.*)\/([+-]?\d+)?}') 80 | m = p.search(l) 81 | if m != None: 82 | f = m.groups()[0] 83 | (i, j) = lookup_regex(xv6src, f, m.groups()[1]) 84 | (i0, j0) = lookup_regex_booklet(f, m.groups()[1]) 85 | # print(f, m.groups()[1], i, i0, file=sys.stderr) 86 | delta = 0 87 | if m.groups()[2] != None: 88 | delta = int(m.groups()[2]) 89 | if i != None: 90 | l = p.sub(r'\\showlineref{%s}{%s}{%s}' % (f,str(i+delta),fmt_line(i0+delta)), l) 91 | print(l, end="") 92 | return 93 | # fileref 94 | p = re.compile(r'\\fileref\{([^}]*)\}') 95 | m = p.search(l) 96 | if m != None: 97 | f = m.groups()[0] 98 | n0 = lookup_fileref_booklet(f) 99 | print("fileref", f, n0, file=sys.stderr) 100 | l = p.sub(r'\\showfileref{%s}{%s}' % (f, fmt_line(n0)), l) 101 | print(l, end="") 102 | return 103 | # file:/pattern/delta,pattern/delta 104 | p = re.compile(r'\\linerefs{(.*):\/(.*)\/([+-]?\d+)?,\/(.*)\/([+-]?\d+)?}') 105 | m = p.search(l) 106 | if m != None: 107 | f = m.groups()[0] 108 | (i, j) = lookup_regex(xv6src, f, m.groups()[1], m.groups()[3]) 109 | (i0, j0) = lookup_regex_booklet(f, m.groups()[1], m.groups()[3]) 110 | delta1 = 0 111 | if m.groups()[2] != None: 112 | delta1 = int(m.groups()[2]) 113 | delta2 = 0 114 | if m.groups()[4] != None: 115 | delta2 = int(m.groups()[4]) 116 | if i != None and j != None: 117 | l = p.sub(r'\\showlinerefs{%s}{%s}{%s}{%s}{%s}' % (f, str(i+delta1), str(j+delta2), fmt_line(i0+delta1), fmt_line(j0+delta2)), l) 118 | print(l, end="") 119 | return 120 | print(l, end="") 121 | 122 | with open(sys.argv[1]) as f: 123 | line = f.readline() 124 | while line: 125 | lineref(line) 126 | line = f.readline() 127 | 
-------------------------------------------------------------------------------- /book.bib: -------------------------------------------------------------------------------- 1 | @book{riscv, 2 | author = {Patterson, David and Waterman, Andrew}, 3 | title = {The {RISC-V} Reader: an open architecture Atlas}, 4 | year = {2017}, 5 | isbn = {099924910X, 9780999249109}, 6 | publisher = {Strawberry Canyon}, 7 | } 8 | 9 | @book{lions, 10 | author = {John Lions}, 11 | title = {Commentary on UNIX 6th Edition}, 12 | year = 2000, 13 | publisher = {Peer to Peer Communications}, 14 | isbn = {1-57398-013-7}, 15 | } 16 | 17 | @article{unix, 18 | author = {Ritchie, Dennis M. and Thompson, Ken}, 19 | title = {The {UNIX} Time-sharing System}, 20 | journal = {Commun. ACM}, 21 | issue_date = {July 1974}, 22 | volume = {17}, 23 | number = {7}, 24 | month = jul, 25 | year = {1974}, 26 | pages = {365--375}, 27 | numpages = {11}, 28 | url = {http://doi.acm.org/10.1145/361011.361061}, 29 | doi = {10.1145/361011.361061}, 30 | publisher = {ACM}, 31 | } 32 | 33 | @book{knuth, 34 | author = {Knuth, Donald}, 35 | title = {Fundamental Algorithms. The Art of Computer Programming. 
(Second ed.)}, 36 | year = 1997, 37 | volume = 1, 38 | publisher = {Addison-Wesley}, 39 | isbn = {0-201-89683-4}, 40 | } 41 | 42 | @document{riscv:priv, 43 | title = {The {RISC-V} instruction set manual {Volume II}: privileged specification}, 44 | editor = {Andrew Waterman and Krste Asanovic and John Hauser}, 45 | year = 2024, 46 | howpublished = {\url{https://drive.google.com/file/d/1uviu1nH-tScFfgrovvFCrj7Omv8tFtkp/view?usp=drive_link}}, 47 | } 48 | 49 | @document{riscv:user, 50 | title = {The {RISC-V} instruction set manual {Volume I}: unprivileged specification {ISA}}, 51 | editor = {Andrew Waterman and Krste Asanovic}, 52 | year = 2024, 53 | howpublished={\url{https://drive.google.com/file/d/17GeetSnT5wW3xNuAHI95-SI1gPGd5sJ_/view?usp=drive_link}}, 54 | } 55 | 56 | @book{kernighan, 57 | author = {Kernighan, Brian W.}, 58 | editor = {Ritchie, Dennis M.}, 59 | title = {The C Programming Language}, 60 | year = {1988}, 61 | isbn = {0131103709}, 62 | edition = {2nd}, 63 | publisher = {Prentice Hall Professional Technical Reference}, 64 | } 65 | 66 | @document{u54, 67 | author = {SiFive}, 68 | title = {SiFive {FU540-C000} manual}, 69 | howpublished={\url{https://sifive.cdn.prismic.io/sifive%2F590bbcb6-598e-4ed8-b5d3-88c2c7458ebf_u54-core-complex-manual-v19.05.pdf}}, 70 | year = "2018", 71 | } 72 | 73 | @document{virtio, 74 | author = {{OASIS} Open}, 75 | title = {Virtual {I/O} Device ({VIRTIO}) Version 1.0}, 76 | year = "2016", 77 | month = "March", 78 | howpublished={\url{http://docs.oasis-open.org/virtio/virtio/v1.0/virtio-v1.0.html}}, 79 | } 80 | 81 | 82 | @document{dijkstra65, 83 | author = {Edsger Dijkstra}, 84 | title = {Cooperating Sequential Processes}, 85 | year = "1965", 86 | howpublished={\url{https://www.cs.utexas.edu/users/EWD/transcriptions/EWD01xx/EWD123.html}}, 87 | } 88 | 89 | @document{ns16550a, 90 | author = {Martin Michael and Daniel Durich}, 91 | year = "1987", 92 | title = {The {NS16550A}: {UART} Design and Application Considerations}, 93 | 
howpublished = {\url{http://bitsavers.trailing-edge.com/components/national/_appNotes/AN-0491.pdf}}, 94 | } 95 | 96 | @article{boehm04, 97 | author = {Boehm, Hans-J}, 98 | title = {Threads cannot be implemented as a library}, 99 | journal = {ACM PLDI Conference}, 100 | year = {2005}, 101 | } 102 | 103 | 104 | @article{lamport:bakery, 105 | author = {Lamport, L}, 106 | title = {A New Solution of Dijkstra's Concurrent Programming Problem}, 107 | journal = {Communications of the ACM}, 108 | year = {1974}, 109 | } 110 | 111 | 112 | @MISC{mckenney:rcuusage, 113 | author = {Paul E. Mckenney and Silas Boyd-wickizer and Jonathan Walpole}, 114 | title = {{RCU} Usage In the Linux Kernel: One Decade Later}, 115 | year = {2013} 116 | } 117 | 118 | 119 | @book{herlihy:art, 120 | author = {Herlihy, Maurice and Shavit, Nir}, 121 | title = {The Art of Multiprocessor Programming, Revised Reprint}, 122 | year = {2012}, 123 | } 124 | 125 | @INPROCEEDINGS{Presotto91plan9, 126 | author = {Dave Presotto and Rob Pike and Ken Thompson and Howard Trickey}, 127 | title = {Plan 9, A Distributed System}, 128 | booktitle = {In Proceedings of the Spring 1991 EurOpen Conference}, 129 | year = {1991}, 130 | pages = {43--50} 131 | } 132 | 133 | @inproceedings{sel4, 134 | author = {Klein, Gerwin and Elphinstone, Kevin and Heiser, Gernot and Andronick, June and Cock, David and Derrin, Philip and Elkaduwe, Dhammika and Engelhardt, Kai and Kolanski, Rafal and Norrish, Michael and Sewell, Thomas and Tuch, Harvey and Winwood, Simon}, 135 | title = {SeL4: Formal Verification of an {OS} Kernel}, 136 | year = {2009}, 137 | booktitle = {Proceedings of the ACM SIGOPS 22nd Symposium on Operating Systems Principles}, 138 | pages = {207–220}, 139 | } 140 | 141 | @MISC{aleph:smashing, 142 | author="Aleph One", 143 | title={Smashing The Stack For Fun And Profit}, 144 | howpublished={\url{http://phrack.org/issues/49/14.html#article}}, 145 | } 146 | 147 | @MISC{mitre:cves, 148 | title = {Linux Common 
Vulnerabilities and Exposures ({CVEs})}, 149 | howpublished={\url{https://cve.mitre.org/cgi-bin/cvekey.cgi?keyword=linux}}, 150 | } 151 | -------------------------------------------------------------------------------- /xv6-riscv-src-booklet/runoff: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env sh 2 | 3 | echo This script takes a minute to run. Be patient. 1>&2 4 | 5 | LC_CTYPE=C export LC_CTYPE 6 | SRC=../xv6-riscv-src 7 | 8 | # pad stdin to multiple of 120 lines 9 | pad() 10 | { 11 | awk '{print} END{for(; NR%120!=0; NR++) print ""}' 12 | } 13 | 14 | # create formatted (numbered) files 15 | mkdir -p fmt/kernel 16 | mkdir -p fmt/user 17 | rm -f fmt/kernel/* 18 | rm -f fmt/user/* 19 | cp $SRC/README fmt 20 | echo > fmt/blank 21 | files=`grep -v '^#' runoff.list | awk '{print $1}'` 22 | n=99 23 | for i in $files 24 | do 25 | ./runoff1 -n $n $SRC/$i >fmt/$i 26 | nn=`tail -1 fmt/$i | sed 's/ .*//; s/^0*//'` 27 | if [ "x$nn" != x ]; then 28 | n=$nn 29 | fi 30 | done 31 | 32 | # create table of contents 33 | cat toc.hdr >fmt/toc 34 | pr -e8 -t runoff.list | awk ' 35 | /^[a-z0-9]/ { 36 | s=$0 37 | f="fmt/"$1 38 | getline"fmt/tocdata" 43 | next 44 | } 45 | { 46 | print 47 | }' | pr -3 -t >>fmt/toc 48 | cat toc.ftr >>fmt/toc 49 | 50 | # check for bad alignments 51 | perl -e ' 52 | $leftwarn = 0; 53 | while(<>){ 54 | chomp; 55 | s!#.*!!; 56 | s!\s+! !g; 57 | s! +$!!; 58 | next if /^$/; 59 | 60 | if(/TOC: (\d+) (.*)/){ 61 | $toc{$2} = $1; 62 | next; 63 | } 64 | 65 | if(/sheet1: (left|right)$/){ 66 | print STDERR "assuming that sheet 1 is a $1 page. double-check!\n"; 67 | $left = $1 eq "left" ? "13579" : "02468"; 68 | $right = $1 eq "left" ? 
"02468" : "13579"; 69 | next; 70 | } 71 | 72 | if(/even: (.*)/){ 73 | $file = $1; 74 | if(!defined($toc{$file})){ 75 | print STDERR "Have no toc for $file\n"; 76 | next; 77 | } 78 | if($toc{$file} =~ /^\d\d[^0]/){ 79 | print STDERR "$file does not start on a fresh page.\n"; 80 | } 81 | next; 82 | } 83 | 84 | if(/odd: (.*)/){ 85 | $file = $1; 86 | if(!defined($toc{$file})){ 87 | print STDERR "Have no toc for $file\n"; 88 | next; 89 | } 90 | if($toc{$file} !~ /^\d\d5/){ 91 | print STDERR "$file does not start on a second half page.\n"; 92 | } 93 | next; 94 | } 95 | 96 | if(/(left|right): (.*)/){ 97 | $what = $1; 98 | $file = $2; 99 | if(!defined($toc{$file})){ 100 | print STDERR "Have no toc for $file\n"; 101 | next; 102 | } 103 | if($what eq "left" && !($toc{$file} =~ /^\d[$left][05]/)){ 104 | print STDERR "$file does not start on a left page [$toc{$file}]\n"; 105 | } 106 | # why does this not work if I inline $x in the if? 107 | $x = ($toc{$file} =~ /^\d[$right][05]/); 108 | if($what eq "right" && !$x){ 109 | print STDERR "$file does not start on a right page [$toc{$file}] [$x]\n"; 110 | } 111 | next; 112 | } 113 | 114 | print STDERR "Unknown spec: $_\n"; 115 | } 116 | ' fmt/tocdata runoff.spec 117 | 118 | # make definition list 119 | cd fmt 120 | perl -e ' 121 | while(<>) { 122 | chomp; 123 | 124 | s!//.*!!; 125 | s!/\*([^*]|[*][^/])*\*/!!g; 126 | s!\s! !g; 127 | s! 
+$!!; 128 | 129 | # look for declarations like char* x; 130 | if (/^[0-9]+ typedef .* u(int|short|long|char);/) { 131 | next; 132 | } 133 | if (/^[0-9]+ extern/) { 134 | next; 135 | } 136 | if (/^[0-9]+ struct [a-zA-Z0-9_]+;/) { 137 | next; 138 | } 139 | if (/^([0-9]+) #define +([A-Za-z0-9_]+) +?\(.*/) { 140 | print "$1 $2\n" 141 | } 142 | elsif (/^([0-9]+) #define +([A-Za-z0-9_]+) +([^ ]+)/) { 143 | print "$1 $2 $3\n"; 144 | } 145 | elsif (/^([0-9]+) #define +([A-Za-z0-9_]+)/) { 146 | print "$1 $2\n"; 147 | } 148 | 149 | if(/^^([0-9]+) \.globl ([a-zA-Z0-9_]+)/){ 150 | $isglobl{$2} = 1; 151 | } 152 | if(/^^([0-9]+) ([a-zA-Z0-9_]+):$/ && $isglobl{$2}){ 153 | print "$1 $2\n"; 154 | } 155 | 156 | if (/\(/) { 157 | next; 158 | } 159 | 160 | if (/^([0-9]+) (((static|struct|extern|union|enum) +)*([A-Za-z0-9_]+))( .*)? +([A-Za-z_][A-Za-z0-9_]*)(,|;|=| =)/) { 161 | print "$1 $7\n"; 162 | } 163 | 164 | elsif(/^([0-9]+) (enum|struct|union) +([A-Za-z0-9_]+) +{/){ 165 | print "$1 $3\n"; 166 | } 167 | # TODO: enum members 168 | } 169 | ' $files >defs 170 | 171 | (for i in $files 172 | do 173 | case "$i" in 174 | *.S) 175 | cat $i | sed 's;#.*;;; s;//.*;;;' 176 | ;; 177 | *) 178 | cat $i | sed 's;//.*;;; s;"([^"\\]|\\.)*";;;' 179 | esac 180 | done 181 | ) >alltext 182 | 183 | perl -n -e 'print if s/^([0-9]+ [a-zA-Z0-9_]+)\(.*$/\1/;' alltext | 184 | grep -E -v ' (STUB|usage|main|if|for)$' >>defs 185 | #perl -n -e 'print if s/^([0-9]+) STUB\(([a-zA-Z0-9_]+)\)$/\1 \2/;' alltext \ 186 | # >>defs 187 | ( 188 | >s.defs 189 | 190 | # make reference list 191 | for i in `awk '{print $2}' defs | sort -f | uniq` 192 | do 193 | defs=`grep -E '^[0-9]+ '$i'( |$)' defs | awk '{print $1}'` 194 | echo $i $defs >>s.defs 195 | uses=`grep -E -h '([^a-zA-Z_0-9])'$i'($|[^a-zA-Z_0-9])' alltext | awk '{print $1}'` 196 | if [ "x$defs" != "x$uses" ]; then 197 | echo $i $defs 198 | echo $uses |fmt -29 | sed 's/^/ /' 199 | # else 200 | # echo $i defined but not used >&2 201 | fi 202 | done 203 | ) >refs 
204 | 205 | # build defs list 206 | awk ' 207 | { 208 | printf("%04d %s\n", $2, $1); 209 | for(i=3; i<=NF; i++) 210 | printf("%04d \" \n", $i); 211 | } 212 | ' s.defs > t.defs 213 | 214 | # format the whole thing 215 | ( 216 | ../pr.pl README 217 | ../pr.pl -h "table of contents" toc 218 | # pr -t -2 t.defs | ../pr.pl -h "definitions" | pad 219 | pr -t -l50 -2 refs | ../pr.pl -h "cross-references" | pad 220 | # pr.pl -h "definitions" -2 t.defs | pad 221 | # pr.pl -h "cross-references" -2 refs | pad 222 | ../pr.pl blank # make sheet 1 start on left page 223 | ../pr.pl blank 224 | for i in $files 225 | do 226 | ../pr.pl -h "xv6/$i" $i 227 | done 228 | ) | mpage -m50t50b -o -bLetter -T -t -2 -FCourier -L60 >all.ps 229 | grep Pages: all.ps 230 | 231 | # if we have the nice font, use it 232 | nicefont=LucidaSans-Typewriter83 233 | if [ ! -f ../$nicefont ] 234 | then 235 | if git cat-file blob font:$nicefont > ../$nicefont~; then 236 | mv ../$nicefont~ ../$nicefont 237 | fi 238 | fi 239 | if [ -f ../$nicefont ] 240 | then 241 | echo nicefont 242 | (sed 1q all.ps; cat ../$nicefont; sed "1d; s/Courier/$nicefont/" all.ps) >allf.ps 243 | else 244 | echo ugly font! 245 | cp all.ps allf.ps 246 | fi 247 | ps2pdf allf.ps ../xv6-src-booklet.pdf 248 | # cd .. 
249 | # pdftops xv6.pdf xv6.ps 250 | -------------------------------------------------------------------------------- /fig/mkernel.svg: -------------------------------------------------------------------------------- [SVG source; markup lost in extraction. Surviving text labels: Microkernel, shell, File server, userspace, kernelspace, Send message] -------------------------------------------------------------------------------- /tr2tex: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | 3 | import os 4 | import re 5 | import sys 6 | 7 | def end(l): 8 | # look for " .", etc. 9 | p = re.compile(r'\s+([\.,:;])') 10 | l = p.sub(r'\1', l) 11 | # look for ) ( 12 | p = re.compile(r'\s+\)([\.,]*) \(\n') 13 | m = p.search(l) 14 | if m != None: 15 | l = p.sub(r'', l) 16 | if len(m.groups()) > 0: 17 | l = "(" + l + ")%s\n" % m.groups()[0] 18 | else: 19 | l = "(" + l + ")\n" 20 | # look for '' `` 21 | p = re.compile(r"\s+''([\.,]*) ``\n") 22 | m = p.search(l) 23 | if m != None: 24 | l = p.sub(r'', l) 25 | if len(m.groups()) > 0: 26 | l = "``" + l + "''%s\n" % m.groups()[0] 27 | else: 28 | l = "``" + l + "''\n" 29 | # check for " )" 30 | p = re.compile(r'\s+([\)])') 31 | l = p.sub(r'\1', l) 32 | return l 33 | 34 | def ig2dotdot(): 35 | line = f.readline() 36 | while line: 37 | if line.startswith('..'): 38 | return 39 | print("%", line, end="") 40 | line = f.readline() 41 | 42 | def chapter(l): 43 | p = re.compile(r'.chapter ([:\w]+) "(.+)"') 44 | l = p.sub(r'\\chapter{\2}\n\\label{\1}', l) 45 | print(l, end="") 46 | 47 | def chapter1(l): 48 | p = re.compile(r'.chapterlike "(.+)"') 49 | l = p.sub(r'\\chapter*{\1}\n', l) 50 | print(l, end="") 51 | 52 | def comment(l): 53 | p = re.compile(r'\.\\"(.*)') 54 | l = p.sub(r'%% \1', l) 55 | print(l, 
end="") 56 | 57 | def index(l): 58 | p = re.compile(r'.index ([-\w]+)') 59 | l = p.sub(r'\\index{\1}', l) 60 | p = re.compile(r'.index "(.*)"') 61 | l = p.sub(r'\\index{\1}', l) 62 | l = end(l) 63 | print(l, end="") 64 | 65 | def defindex(l): 66 | p = re.compile(r'.italic-index ([-\w]+)') 67 | l = p.sub(r'\\textit{\1}\\index{\1}', l) 68 | p = re.compile(r'.italic-index "(.*)"') 69 | l = p.sub(r'\\textit{\1}\\index{\1}', l) 70 | l = end(l) 71 | print(l, end="") 72 | 73 | codep = r'([\/\.\->_\w\&<\(\)\#\[\]\|=\*%\!]+)' 74 | 75 | def code(l): 76 | p = re.compile(r'.code "(.*)"') 77 | l = p.sub(r'\\lstinline{\1}', l) 78 | p = re.compile(r'.code ' + codep) 79 | l = p.sub(r'\\lstinline{\1}', l) 80 | l = end(l) 81 | print(l, end="") 82 | 83 | def codeindex(l): 84 | p = re.compile(r'.code-index ' + codep) 85 | l = p.sub(r'\\lstinline{\1}\\index{\1@\\lstinline{\1}}', l) 86 | p = re.compile(r'.code-index "(.*)"') 87 | l = p.sub(r'\\lstinline{\1}\\index{\1@\\lstinline{\1}}', l) 88 | l = end(l) 89 | print(l, end="") 90 | 91 | def section(l): 92 | p = re.compile(r'.section "(.+)"') 93 | l = p.sub(r'\\section{\1}', l) 94 | print(l, end="") 95 | 96 | def figref(l): 97 | p = re.compile(r'.figref ([:\w]+)') 98 | m = p.search(l) 99 | if m != None: 100 | n = m.groups()[0].split(':') 101 | if len(n) > 1: 102 | n = n[1] 103 | else: 104 | n = n[0] 105 | l = p.sub(r'Figure~\\ref{fig:%s}' % n, l) 106 | l = end(l) 107 | print(l, end="") 108 | 109 | def caption(fname): 110 | with open(fname) as f: 111 | line = f.readline() 112 | while line: 113 | if not line.startswith("."): 114 | return line.rstrip('\n') 115 | line = f.readline() 116 | 117 | def istable(fname): 118 | with open(fname) as f: 119 | line = f.readline() 120 | while line: 121 | if line.startswith(".TS"): 122 | return True 123 | line = f.readline() 124 | return False 125 | 126 | def table(fname): 127 | try: 128 | with open(fname) as f: 129 | print(r'\begin{tabular}{ll}') 130 | line = f.readline() # F1 131 | line = f.readline() # 
.TS 132 | line = f.readline() # center 133 | line = f.readline() # lb lb 134 | line = f.readline() # l l. 135 | line = f.readline() # heading 136 | p = re.compile(r'([\w\ ]+)\t+([\w]+)\n') 137 | line = p.sub(r'{\\bf \1} & {\\bf \2} \\\\', line) 138 | print(line) 139 | print("\midrule") 140 | # table 141 | line = f.readline() 142 | while line: 143 | if line.startswith(".TE"): 144 | break 145 | p = re.compile(r'([\(\)\w]+)\t+(.+)\n') 146 | line = p.sub(r'\1 & \2 \\\\', line) 147 | print(line) 148 | line = f.readline() 149 | line = f.readline() # F2 150 | line = f.readline() # caption 151 | print("\end{tabular}") 152 | print("\caption{" + line.strip("\n") + "}") 153 | except IOError: 154 | print("error: couldn't open %s" % fname, file=sys.stderr) 155 | 156 | def insert(l): 157 | l = l.rstrip('\n') 158 | fname = l.split(' ')[1] 159 | try: 160 | with open(fname) as f: 161 | line = f.readline() 162 | while line: 163 | print(line, end="") 164 | line = f.readline() 165 | except IOError: 166 | print("error: couldn't open %s" % fname, file=sys.stderr) 167 | 168 | def listing(l): 169 | print(r'\begin{lstlisting}[]') 170 | line = f.readline() 171 | while line: 172 | if line.startswith(".so"): 173 | insert(line) 174 | elif line.startswith(".P2"): 175 | break 176 | else: 177 | print(line, end="") 178 | line = f.readline() 179 | print("\end{lstlisting}"); 180 | 181 | def figure(l): 182 | p = re.compile(r'.figure (.+)\n') 183 | l = p.sub(r'\1', l) 184 | name = "fig/%s.t" % l 185 | print("") 186 | print(r'\begin{figure}[t]') 187 | print("\center") 188 | if istable(name): 189 | table(name) 190 | else: 191 | print("\includegraphics[scale=0.5]{fig/%s.eps}" % l) 192 | print("\caption{%s}" % caption(name)) 193 | print("\label{fig:%s}" % l) 194 | print("\end{figure}") 195 | 196 | def italic(l): 197 | p = re.compile(r'.italic ([\w]+)') 198 | l = p.sub(r'\\textit{\1}', l) 199 | p = re.compile(r'.italic "(.*)"') 200 | l = p.sub(r'\\textit{\1}', l) 201 | 202 | l = end(l) 203 | print(l, 
end="") 204 | 205 | def address(l): 206 | p = re.compile(r'.address (\w+)') 207 | l = p.sub(r'\\texttt{\1}', l) 208 | l = end(l) 209 | print(l, end="") 210 | 211 | def register(l): 212 | p = re.compile(r'.register (\w+)') 213 | l = p.sub(r'\\texttt{\%\1}', l) 214 | l = end(l) 215 | print(l, end="") 216 | 217 | def lineref(l): 218 | l = l.replace('!', '\\') 219 | p = re.compile(r'.line (.*):\/(.*)\/([+-]?\d+)?') 220 | l = p.sub(r'\\lineref{\1:/\2/\3}', l) 221 | p = re.compile(r'.line (.*):(\d+)') 222 | l = p.sub(r'\\lineref{\1:\2}', l) 223 | l = end(l) 224 | print(l, end="") 225 | 226 | def linerefs(l): 227 | l = l.replace('!', '\\') 228 | p = re.compile(r'.lines (.*):\/(.*)\/([+-]?\d+)?,\/(.*)\/([+-]?\d+)?') 229 | l = p.sub(r'\\linerefs{\1:/\2/\3,/\4/\5}', l) 230 | l = end(l) 231 | print(l, end="") 232 | 233 | def indent(l): 234 | p = re.compile(r'.IP (.*)\n') 235 | l = p.sub(r'\\paragraph{\\textbullet}', l) 236 | l = end(l) 237 | print(l, end="") 238 | 239 | def file(l): 240 | p = re.compile(r'.file ([-\w\/\.]+)') 241 | l = p.sub(r'\\fileref{\1}', l) 242 | p = re.compile(r'.file "(.*)"') 243 | l = p.sub(r'\\fileref{\1}', l) 244 | l = end(l) 245 | print(l, end="") 246 | 247 | def doref(l): 248 | # replace references in a line 249 | p = re.compile(r' \\\*\[([\w:]*)\]') 250 | l = p.sub(r'~\\ref{\1}', l) 251 | return l 252 | 253 | def doquote(l): 254 | # replace references in a line 255 | p = re.compile(r'"([\w]+)"') 256 | l = p.sub(r"``\1''", l) 257 | return l 258 | 259 | def docarrot(l): 260 | p = re.compile(r'(\d+)\^(\d+)(-\d+)*') 261 | l = p.sub(r'$\1^{\2}\3$', l) 262 | return l 263 | 264 | def dourl(l): 265 | p = re.compile(r'https://([\w\.\/]+)') 266 | l = p.sub(r'\\url{https://\1}', l) 267 | return l 268 | 269 | def tr2tex(l): 270 | if l.startswith(". "): 271 | l = "." 
+ l[2:] 272 | 273 | if l.startswith(".ig"): 274 | ig2dotdot() 275 | elif l.startswith(".chapter "): 276 | chapter(l) 277 | elif l.startswith(".chapterlike"): 278 | chapter1(l) 279 | elif l.startswith(".PP"): 280 | print("") 281 | elif l.startswith(".code "): 282 | code(l) 283 | elif l.startswith(".italic-index"): 284 | defindex(l) 285 | elif l.startswith(".index"): 286 | index(l) 287 | elif l.startswith(".code-index"): 288 | codeindex(l) 289 | elif l.startswith(".section"): 290 | section(l) 291 | elif l.startswith(r'.\"'): 292 | comment(l) 293 | elif l.startswith(r'.figref'): 294 | figref(l) 295 | elif l.startswith(r'.figure '): 296 | figure(l) 297 | elif l.startswith(r'.italic'): 298 | italic(l) 299 | elif l.startswith(r'.address '): 300 | address(l) 301 | elif l.startswith(r'.register '): 302 | register(l) 303 | elif l.startswith(r'.line '): 304 | lineref(l) 305 | elif l.startswith(r'.lines '): 306 | linerefs(l) 307 | elif l.startswith(r'.P1'): 308 | listing(l) 309 | elif l.startswith(r'.IP'): 310 | indent(l) 311 | elif l.startswith(r'.file'): 312 | file(l) 313 | elif l.startswith(r'.sheet'): 314 | return 315 | elif l.startswith(r"."): 316 | print("TODO", l) 317 | else: 318 | l = docarrot(l) 319 | l = doref(l) 320 | l = doquote(l) 321 | l = dourl(l) 322 | print(l, end="") 323 | 324 | with open(sys.argv[1]) as f: 325 | line = f.readline() 326 | while line: 327 | tr2tex(line) 328 | line = f.readline() 329 | -------------------------------------------------------------------------------- /fig/os.svg: -------------------------------------------------------------------------------- [SVG source; markup lost in extraction. Surviving text labels: Kernel, shell, cat, userspace, kernelspace, systemcall] 
-------------------------------------------------------------------------------- /fig/race.svg: -------------------------------------------------------------------------------- [SVG source; markup lost in extraction. Surviving text labels: Memory, CPU 1, CPU2, 15, l->next, 16, list, Time] -------------------------------------------------------------------------------- /lock2.tex: -------------------------------------------------------------------------------- 1 | \chapter{Concurrency revisited} 2 | \label{CH:LOCK2} 3 | 4 | Simultaneously obtaining good parallel 5 | performance, correctness despite concurrency, and understandable code 6 | is a big challenge in kernel design. 7 | Straightforward use of locks is the best path to correctness, 8 | but is not always possible. 9 | This 10 | chapter highlights examples in which xv6 is forced to use locks in an 11 | involved way, and examples where xv6 uses lock-like techniques but not 12 | locks. 13 | 14 | \section{Locking patterns} 15 | 16 | Cached items are often a challenge to lock. 17 | For example, 18 | the file system's block cache \lineref{kernel/bio.c:/^struct/} stores 19 | copies of up to {\tt NBUF} disk blocks. 20 | It's vital that a given disk block have at most 21 | one copy in the cache; otherwise, different processes might make 22 | conflicting changes to different copies of what ought to be the same 23 | block. Each cached block is stored in a {\tt struct buf} 24 | \fileref{kernel/buf.h}. A {\tt struct buf} has a lock field which 25 | helps ensure that only one process uses a given disk block at a time. 26 | However, that lock is not enough: what if a block is not present in 27 | the cache at all, and two processes want to use it at the same time? 
28 | There is no {\tt struct buf} (since the block isn't yet cached), and 29 | thus there is nothing to lock. Xv6 deals with this situation by 30 | associating an additional lock ({\tt bcache.lock}) with the set of 31 | identities of cached blocks. Code that needs to check if a block is 32 | cached (e.g., {\tt bget} \lineref{kernel/bio.c:/^bget/}), or change the 33 | set of cached blocks, must hold {\tt bcache.lock}; after that code has 34 | found the block and {\tt struct buf} it needs, it can release {\tt 35 | bcache.lock} and lock just the specific block. This is a common 36 | pattern: one lock for the set of items, plus one lock per item. 37 | 38 | Ordinarily the same function that acquires a lock will release it. But 39 | a more precise way to view things is that a lock is acquired at the 40 | start of a sequence that must appear atomic, and released when that 41 | sequence ends. If the sequence starts and ends in different functions, 42 | or different threads, or on different CPUs, then the lock acquire and 43 | release must do the same. The function of the lock is to force 44 | other uses to wait, not to pin a piece of data to a particular agent. One 45 | example is the {\tt acquire} in {\tt yield} 46 | \lineref{kernel/proc.c:/^yield/}, which is released in the scheduler 47 | thread rather than in the acquiring process. Another example is the 48 | {\tt acquiresleep} in {\tt ilock} \lineref{kernel/fs.c:/^ilock/}; this 49 | code often sleeps while reading the disk; it may wake up on a 50 | different CPU, which means the lock may be acquired 51 | and released on different CPUs. 52 | 53 | Freeing an object that is protected by a lock embedded in the object 54 | is a delicate business, since owning the lock is 55 | not enough to guarantee that freeing would be correct. 
The problem 56 | case arises when some other thread is waiting in {\tt acquire} to use 57 | the object; freeing the object implicitly frees the embedded lock, which will 58 | cause the waiting thread to malfunction. One solution is to track how 59 | many references to the object exist, so that it is only freed when the 60 | last reference disappears. See {\tt pipeclose} 61 | \lineref{kernel/pipe.c:/^pipeclose/} for an example; 62 | {\tt pi->readopen} and {\tt pi->writeopen} track whether 63 | the pipe has file descriptors referring to it. 64 | 65 | Usually one sees locks around sequences of reads and writes to sets of related 66 | items; the locks ensure that other threads see only completed sequences of 67 | updates (as long as they, too, lock). 68 | What about situations where the update is a simple write to a 69 | single shared variable? For example, 70 | \texttt{setkilled} and \texttt{killed} 71 | \lineref{kernel/proc.c:/^setkilled/} 72 | lock around their simple uses of 73 | \lstinline{p->killed}. 74 | If there were no lock, one thread could write 75 | \lstinline{p->killed} 76 | at the same time that another thread reads it. 77 | This is a \indextextx{race}, and the C language specification says 78 | that a race yields \indextext{undefined behavior}, which means the 79 | program may crash or yield incorrect results\footnote{``Threads and 80 | data races'' in \url{ 81 | https://en.cppreference.com/w/c/language/memory_model}}. The locks 82 | prevent the race and avoid the undefined behavior. 83 | 84 | One reason races can break programs is that, if there are no 85 | locks or equivalent constructs, the compiler may generate machine code 86 | that reads and writes memory in ways quite different than 87 | the original C code. 
For example, the machine code 88 | of a thread calling 89 | \texttt{killed} could copy \lstinline{p->killed} to a register and 90 | read only that cached value; this would mean that the thread 91 | might never see any writes to 92 | \lstinline{p->killed}. The locks prevent such caching. 93 | 94 | % sleep locks. 95 | % hand-over-hand locking in namei. 96 | % namei's use of refcount to prevent changing underfoot, 97 | % and lock when actually using it. 98 | % example of where deadlock is avoided? namei? 99 | % spawn a thread to evade a lock order problem? 100 | 101 | \section{Lock-like patterns} 102 | 103 | In many places xv6 uses a reference count or a flag in a lock-like way 104 | to indicate that an object is allocated and should not be freed 105 | or re-used. A process's {\tt p->state} acts in this way, as do the 106 | reference counts in {\tt file}, {\tt inode}, and {\tt buf} structures. 107 | While in each case a lock protects the flag or reference count, it is 108 | the latter that prevents the object from being prematurely freed. 109 | 110 | The file system uses {\tt struct inode} reference counts as a kind of 111 | shared lock that can be held by multiple processes, in order to avoid 112 | deadlocks that would occur if the code used ordinary locks. For 113 | example, the loop in {\tt namex} \lineref{kernel/fs.c:/^namex/} locks 114 | the directory named by each pathname component in turn. However, {\tt namex} 115 | must release each lock at the end of the loop, since if it 116 | held multiple locks it could deadlock with itself if the pathname 117 | included a dot (e.g., {\tt a/./b}). It might also deadlock with a 118 | concurrent lookup involving the directory and {\tt ..}. As 119 | Chapter~\ref{CH:FS} explains, the solution is for the loop to carry 120 | the directory inode over to the next iteration with its reference 121 | count incremented, but not locked. 
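
The loop just described can be sketched in a few lines of self-contained C. This is a toy model under stated assumptions, not xv6's {\tt namex}: {\tt struct node}, {\tt node\_lock}, {\tt node\_unlock}, {\tt lookup}, {\tt walk}, and {\tt demo} are hypothetical names, each directory has at most one child, and the ``lock'' is a plain flag whose assertion stands in for the point where a real lock would block (and, if the walker already held it, deadlock).

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for struct inode. */
struct node {
  const char *name;
  int ref;              /* reference count: keeps the node from being freed */
  int locked;           /* 1 while a walker holds the node's lock */
  struct node *parent;
  struct node *child;   /* a single child is enough for this sketch */
};

static int locks_held, max_locks_held;

static void node_lock(struct node *n) {
  assert(!n->locked);   /* a real lock would block here: self-deadlock */
  n->locked = 1;
  if (++locks_held > max_locks_held)
    max_locks_held = locks_held;
}

static void node_unlock(struct node *n) {
  assert(n->locked);
  n->locked = 0;
  locks_held--;
}

/* Caller holds dir's lock. Returns the next node with its reference
   count incremented but NOT locked, or 0 if elem is not found. */
static struct node *lookup(struct node *dir, const char *elem) {
  struct node *next = 0;
  if (strcmp(elem, ".") == 0)
    next = dir;
  else if (strcmp(elem, "..") == 0)
    next = dir->parent;
  else if (dir->child && strcmp(dir->child->name, elem) == 0)
    next = dir->child;
  if (next)
    next->ref++;
  return next;
}

/* Walk path from root, holding at most one lock at any moment. */
static struct node *walk(struct node *root, const char *path) {
  char buf[64];
  struct node *cur = root, *next;
  strncpy(buf, path, sizeof(buf) - 1);
  buf[sizeof(buf) - 1] = '\0';
  cur->ref++;
  for (char *elem = strtok(buf, "/"); elem; elem = strtok(0, "/")) {
    node_lock(cur);
    next = lookup(cur, elem);
    node_unlock(cur);   /* release before the next iteration */
    cur->ref--;         /* drop the reference to the old directory */
    if (next == 0)
      return 0;
    cur = next;         /* carried over referenced, but unlocked */
  }
  return cur;
}

/* Builds /a/b, walks "a/./b", and returns the peak number of locks held. */
int demo(void) {
  struct node root = {"/", 0, 0, 0, 0};
  struct node a = {"a", 0, 0, &root, 0};
  struct node b = {"b", 0, 0, &a, 0};
  root.parent = &root;
  root.child = &a;
  a.child = &b;
  locks_held = max_locks_held = 0;
  struct node *r = walk(&root, "a/./b");
  assert(r == &b && b.ref == 1);
  assert(root.ref == 0 && a.ref == 0);
  return max_locks_held;
}
```

If {\tt walk} instead kept {\tt cur} locked while locking {\tt next}, the {\tt .} component of {\tt a/./b} would make it lock {\tt a} twice and trip the assertion in {\tt node\_lock}; in a kernel that would be a self-deadlock. Carrying only the reference count across iterations avoids this while still keeping the node from being freed.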
122 | 123 | Some data items are protected by different mechanisms at different 124 | times, and may at times be protected from concurrent access implicitly 125 | by the structure of the xv6 code rather than by explicit locks. For 126 | example, when a physical page is free, it is protected by \texttt{kmem.lock} 127 | \lineref{kernel/kalloc.c:/^. kmem;/}. If the page is then 128 | allocated as a pipe \lineref{kernel/pipe.c:/^pipealloc/}, it is 129 | protected by a different lock (the embedded \lstinline{pi->lock}). If the page 130 | is re-allocated for a new process's user memory, it is not protected by a 131 | lock at all. Instead, the fact that the allocator won't give that page 132 | to any other process (until it is freed) protects it from concurrent 133 | access. 134 | The ownership of a new process's memory is complex: 135 | first the parent allocates and 136 | manipulates it in {\tt fork}, then the child uses it, and (after the 137 | child exits) the parent again owns the memory and passes it to {\tt 138 | kfree}. There are two lessons here: a data object may be protected 139 | from concurrency in different ways at different points in its 140 | lifetime, and the protection may take the form of implicit structure 141 | rather than explicit locks. 142 | 143 | A final lock-like example is the need to disable interrupts around 144 | calls to {\tt mycpu()} \lineref{kernel/proc.c:/^myproc/}. Disabling 145 | interrupts causes the calling code to be atomic with respect to timer 146 | interrupts that could force a context switch, and thus move the 147 | process to a different CPU. 148 | 149 | \section{No locks at all} 150 | 151 | There are a few places where xv6 shares mutable data with no locks at 152 | all. One is in the implementation of spinlocks, although one could 153 | view the RISC-V atomic instructions as relying on locks implemented in 154 | hardware. 
Another is the {\tt started} variable in {\tt main.c} 155 | \lineref{kernel/main.c:/^volatile/}, used to prevent other CPUs from 156 | running until CPU zero has finished initializing xv6; 157 | the {\tt volatile} ensures that the compiler actually generates 158 | load and store instructions. 159 | 160 | Xv6 contains cases in which one CPU or thread writes some data, and 161 | another CPU or thread reads the data, but there is no specific lock 162 | dedicated to protecting that data. For example, in {\tt fork}, the 163 | parent writes the child's user memory pages, and the child (a 164 | different thread, perhaps on a different CPU) reads those pages; no 165 | lock explicitly protects those pages. This is not strictly a locking 166 | problem, since the child doesn't start executing until after the parent has 167 | finished writing. It is a potential memory ordering problem 168 | (see Chapter~\ref{CH:LOCK}), since without a memory barrier there's no 169 | reason to expect one CPU to see another CPU's writes. However, since 170 | the parent releases locks, and the child acquires locks as it starts 171 | up, the memory barriers in {\tt acquire} and {\tt release} 172 | ensure that the child's CPU sees the parent's writes. 173 | 174 | \section{Parallelism} 175 | 176 | Locking is primarily about suppressing parallelism in the interests of 177 | correctness. Because performance is also important, kernel designers 178 | often have to think about how to use locks in a way that both achieves 179 | correctness and allows parallelism. While xv6 is not systematically 180 | designed for high performance, it's still worth considering which xv6 181 | operations can execute in parallel, and which might conflict on locks. 182 | 183 | Pipes in xv6 are an example of fairly good parallelism. Each pipe has 184 | its own lock, so that different processes can read and write 185 | different pipes in parallel on different CPUs. 
For a given pipe, 186 | however, the writer and reader must wait for each other to release the 187 | lock; they can't read/write the same pipe at the same time. It is also 188 | the case that a read from an empty pipe (or a write to a full pipe) 189 | must block, but this is not due to the locking scheme. 190 | 191 | Context switching is a more complex example. Two kernel threads, each 192 | executing on its own CPU, can call {\tt yield}, {\tt sched}, and {\tt 193 | swtch} at the same time, and the calls will execute in parallel. The 194 | threads each hold a lock, but they are different locks, so they don't 195 | have to wait for each other. Once in {\tt scheduler}, however, the two 196 | CPUs may conflict on locks while searching the table of processes for 197 | one that is {\tt RUNNABLE}. That is, xv6 is likely to get a 198 | performance benefit from multiple CPUs during context switch, but 199 | perhaps not as much as it could. 200 | 201 | Another example is concurrent calls to {\tt fork} from different 202 | processes on different CPUs. The calls may have to wait for each other 203 | for {\tt pid\_lock} and {\tt kmem.lock}, and for per-process locks 204 | needed to search the process table for an {\tt UNUSED} process. On the 205 | other hand, the two forking processes can copy user memory pages and 206 | format page-table pages fully in parallel. 207 | 208 | The locking scheme in each of the above examples sacrifices parallel 209 | performance in certain cases. In each case it's possible to obtain 210 | more parallelism using a more elaborate design. Whether it's 211 | worthwhile depends on details: how often the relevant operations are 212 | invoked, how long the code spends with a contended lock held, how many 213 | CPUs might be running conflicting operations at the same time, whether 214 | other parts of the code are more restrictive bottlenecks. 
It can be 215 | difficult to guess whether a given locking scheme might cause 216 | performance problems, or whether a new design is significantly better, 217 | so measurement on realistic workloads is often required. 218 | 219 | \section{Exercises} 220 | 221 | \begin{enumerate} 222 | 223 | \item Modify xv6's pipe implementation to allow a read and 224 | a write to the same pipe to proceed in parallel on different CPUs. 225 | 226 | \item Modify xv6's \texttt{scheduler()} to reduce lock contention 227 | when different CPUs are looking for runnable processes at the same time. 228 | 229 | \item Eliminate some of the serialization in xv6's \texttt{fork()}. 230 | 231 | \end{enumerate} 232 | -------------------------------------------------------------------------------- /fig/fslayer.svg: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 19 | 21 | 28 | 34 | 35 | 42 | 48 | 49 | 56 | 57 | 79 | 84 | 85 | 87 | 88 | 90 | image/svg+xml 91 | 93 | 94 | 95 | 96 | 97 | 102 |     Directory 135 | Inode 146 | Logging 157 | Buffer cache 168 | 175 | 180 |   191 | Pathname 202 | File descriptor 213 | 218 | 223 | 228 | 233 | 240 |   251 |   262 | Disk 273 | 274 | 275 | -------------------------------------------------------------------------------- /font/LucidaSans-Typewriter83.afm: -------------------------------------------------------------------------------- 1 | StartFontMetrics 2.0 2 | Comment AFM Generated by Ghostscript/pf2afm 3 | FontName LucidaSans-Typewriter83 4 | FullName Lucida Sans Typewriter 83 5 | FamilyName LucidaSansTypewriter83 6 | Weight Medium 7 | Notice Copyright (c) 1991 Bigelow & Holmes Inc. and Y&Y, Inc. (508) 371-3286. All Rights Reserved. 
8 | ItalicAngle 0 9 | IsFixedPitch true 10 | UnderlinePosition -100 11 | UnderlineThickness 50 12 | Version 1.003 13 | EncodingScheme FontSpecific 14 | FontBBox -12 -205 618 928 15 | StartCharMetrics 246 16 | C 32 ; WX 502 ; N space ; B 0 0 0 0 ; 17 | C 33 ; WX 502 ; N exclam ; B 201 0 301 603 ; 18 | C 34 ; WX 502 ; N quotedbl ; B 100 422 402 643 ; 19 | C 35 ; WX 502 ; N numbersign ; B 22 0 486 603 ; 20 | C 36 ; WX 502 ; N dollar ; B 88 -50 431 653 ; 21 | C 37 ; WX 502 ; N percent ; B 0 -15 502 618 ; 22 | C 38 ; WX 502 ; N ampersand ; B 9 -15 502 618 ; 23 | C 39 ; WX 502 ; N quoteright ; B 191 392 312 643 ; 24 | C 40 ; WX 502 ; N parenleft ; B 141 -121 442 643 ; 25 | C 41 ; WX 502 ; N parenright ; B 60 -121 362 643 ; 26 | C 42 ; WX 502 ; N asterisk ; B 67 252 435 603 ; 27 | C 43 ; WX 502 ; N plus ; B 35 0 467 432 ; 28 | C 44 ; WX 502 ; N comma ; B 191 -131 312 121 ; 29 | C 45 ; WX 502 ; N hyphen ; B 85 186 417 246 ; 30 | C 46 ; WX 502 ; N period ; B 191 0 312 121 ; 31 | C 47 ; WX 502 ; N slash ; B 30 -121 472 643 ; 32 | C 48 ; WX 502 ; N zero ; B 45 -15 457 618 ; 33 | C 49 ; WX 502 ; N one ; B 75 0 477 618 ; 34 | C 50 ; WX 502 ; N two ; B 68 0 413 618 ; 35 | C 51 ; WX 502 ; N three ; B 95 -15 428 618 ; 36 | C 52 ; WX 502 ; N four ; B 45 0 452 603 ; 37 | C 53 ; WX 502 ; N five ; B 110 -15 419 603 ; 38 | C 54 ; WX 502 ; N six ; B 66 -15 457 618 ; 39 | C 55 ; WX 502 ; N seven ; B 82 0 445 603 ; 40 | C 56 ; WX 502 ; N eight ; B 65 -15 448 618 ; 41 | C 57 ; WX 502 ; N nine ; B 61 -15 451 618 ; 42 | C 58 ; WX 502 ; N colon ; B 191 0 312 442 ; 43 | C 59 ; WX 502 ; N semicolon ; B 191 -131 312 442 ; 44 | C 60 ; WX 502 ; N less ; B 35 0 467 432 ; 45 | C 61 ; WX 502 ; N equal ; B 35 116 467 316 ; 46 | C 62 ; WX 502 ; N greater ; B 35 0 467 432 ; 47 | C 63 ; WX 502 ; N question ; B 67 0 442 618 ; 48 | C 64 ; WX 502 ; N at ; B 30 -15 502 618 ; 49 | C 65 ; WX 502 ; N A ; B 5 0 497 603 ; 50 | C 66 ; WX 502 ; N B ; B 75 0 453 603 ; 51 | C 67 ; WX 502 ; N C ; B 42 -15 470 618 ; 52 
| C 68 ; WX 502 ; N D ; B 53 0 471 603 ; 53 | C 69 ; WX 502 ; N E ; B 88 0 459 603 ; 54 | C 70 ; WX 502 ; N F ; B 100 0 472 603 ; 55 | C 71 ; WX 502 ; N G ; B 31 -15 459 618 ; 56 | C 72 ; WX 502 ; N H ; B 60 0 442 603 ; 57 | C 73 ; WX 502 ; N I ; B 70 0 432 603 ; 58 | C 74 ; WX 502 ; N J ; B 80 -15 379 603 ; 59 | C 75 ; WX 502 ; N K ; B 69 0 489 603 ; 60 | C 76 ; WX 502 ; N L ; B 90 0 452 603 ; 61 | C 77 ; WX 502 ; N M ; B 38 0 463 603 ; 62 | C 78 ; WX 502 ; N N ; B 60 0 442 603 ; 63 | C 79 ; WX 502 ; N O ; B 25 -15 477 618 ; 64 | C 80 ; WX 502 ; N P ; B 93 0 471 603 ; 65 | C 81 ; WX 502 ; N Q ; B 25 -131 513 618 ; 66 | C 82 ; WX 502 ; N R ; B 75 0 489 603 ; 67 | C 83 ; WX 502 ; N S ; B 67 -15 454 618 ; 68 | C 84 ; WX 502 ; N T ; B 10 0 492 603 ; 69 | C 85 ; WX 502 ; N U ; B 63 -15 440 603 ; 70 | C 86 ; WX 502 ; N V ; B 6 0 496 603 ; 71 | C 87 ; WX 502 ; N W ; B 7 0 495 603 ; 72 | C 88 ; WX 502 ; N X ; B 4 0 496 603 ; 73 | C 89 ; WX 502 ; N Y ; B 13 0 499 603 ; 74 | C 90 ; WX 502 ; N Z ; B 40 0 462 603 ; 75 | C 91 ; WX 502 ; N bracketleft ; B 181 -121 432 643 ; 76 | C 92 ; WX 502 ; N backslash ; B 30 -121 472 643 ; 77 | C 93 ; WX 502 ; N bracketright ; B 70 -121 322 643 ; 78 | C 94 ; WX 502 ; N asciicircum ; B 35 121 467 603 ; 79 | C 95 ; WX 502 ; N underscore ; B 0 -60 502 0 ; 80 | C 96 ; WX 502 ; N quoteleft ; B 191 392 312 643 ; 81 | C 97 ; WX 502 ; N a ; B 55 -7 481 452 ; 82 | C 98 ; WX 502 ; N b ; B 70 -10 457 643 ; 83 | C 99 ; WX 502 ; N c ; B 67 -10 449 452 ; 84 | C 100 ; WX 502 ; N d ; B 52 -10 438 643 ; 85 | C 101 ; WX 502 ; N e ; B 58 -10 447 452 ; 86 | C 102 ; WX 502 ; N f ; B 60 0 488 653 ; 87 | C 103 ; WX 502 ; N g ; B 50 -171 439 452 ; 88 | C 104 ; WX 502 ; N h ; B 74 0 431 643 ; 89 | C 105 ; WX 502 ; N i ; B 70 0 322 643 ; 90 | C 106 ; WX 502 ; N j ; B 60 -171 364 643 ; 91 | C 107 ; WX 502 ; N k ; B 87 0 483 643 ; 92 | C 108 ; WX 502 ; N l ; B 70 0 322 643 ; 93 | C 109 ; WX 502 ; N m ; B 38 0 464 452 ; 94 | C 110 ; WX 502 ; N n ; B 74 0 431 452 ; 95 
| C 111 ; WX 502 ; N o ; B 43 -10 459 452 ; 96 | C 112 ; WX 502 ; N p ; B 73 -161 459 452 ; 97 | C 113 ; WX 502 ; N q ; B 45 -161 432 452 ; 98 | C 114 ; WX 502 ; N r ; B 126 0 452 452 ; 99 | C 115 ; WX 502 ; N s ; B 76 -10 429 452 ; 100 | C 116 ; WX 502 ; N t ; B 53 -10 448 528 ; 101 | C 117 ; WX 502 ; N u ; B 72 -10 428 442 ; 102 | C 118 ; WX 502 ; N v ; B 25 0 477 442 ; 103 | C 119 ; WX 502 ; N w ; B 5 0 497 442 ; 104 | C 120 ; WX 502 ; N x ; B 38 0 469 442 ; 105 | C 121 ; WX 502 ; N y ; B 34 -161 479 442 ; 106 | C 122 ; WX 502 ; N z ; B 55 0 447 442 ; 107 | C 123 ; WX 502 ; N braceleft ; B 95 -121 417 643 ; 108 | C 124 ; WX 502 ; N bar ; B 221 -121 281 643 ; 109 | C 125 ; WX 502 ; N braceright ; B 85 -121 407 643 ; 110 | C 126 ; WX 502 ; N asciitilde ; B 35 142 467 290 ; 111 | C 161 ; WX 502 ; N exclamdown ; B 201 -161 301 442 ; 112 | C 162 ; WX 502 ; N cent ; B 80 0 413 603 ; 113 | C 163 ; WX 502 ; N sterling ; B 103 0 434 618 ; 114 | C 164 ; WX 502 ; N fraction ; B 0 -15 502 618 ; 115 | C 165 ; WX 502 ; N yen ; B 20 0 492 603 ; 116 | C 166 ; WX 502 ; N florin ; B 93 -121 466 618 ; 117 | C 167 ; WX 502 ; N section ; B 88 -133 418 618 ; 118 | C 168 ; WX 502 ; N currency ; B 45 96 457 508 ; 119 | C 169 ; WX 502 ; N quotesingle ; B 191 392 312 643 ; 120 | C 170 ; WX 502 ; N quotedblleft ; B 100 422 402 643 ; 121 | C 171 ; WX 502 ; N guillemotleft ; B 48 35 445 404 ; 122 | C 172 ; WX 502 ; N guilsinglleft ; B 142 35 361 404 ; 123 | C 173 ; WX 502 ; N guilsinglright ; B 142 35 361 404 ; 124 | C 174 ; WX 502 ; N fi ; B 50 0 427 653 ; 125 | C 175 ; WX 502 ; N fl ; B 50 0 427 653 ; 126 | C 177 ; WX 502 ; N endash ; B 55 221 447 271 ; 127 | C 178 ; WX 502 ; N dagger ; B 80 -121 422 603 ; 128 | C 179 ; WX 502 ; N daggerdbl ; B 80 -121 422 603 ; 129 | C 180 ; WX 502 ; N periodcentered ; B 201 171 301 271 ; 130 | C 182 ; WX 502 ; N paragraph ; B 51 -121 408 603 ; 131 | C 183 ; WX 502 ; N bullet ; B 131 118 372 359 ; 132 | C 184 ; WX 502 ; N quotesinglbase ; B 191 -131 312 
121 ; 133 | C 185 ; WX 502 ; N quotedblbase ; B 100 -121 402 100 ; 134 | C 186 ; WX 502 ; N quotedblright ; B 100 422 402 643 ; 135 | C 187 ; WX 502 ; N guillemotright ; B 57 35 454 404 ; 136 | C 188 ; WX 502 ; N ellipsis ; B 43 0 458 80 ; 137 | C 189 ; WX 502 ; N perthousand ; B 0 -15 502 618 ; 138 | C 191 ; WX 502 ; N questiondown ; B 60 -171 435 442 ; 139 | C 193 ; WX 502 ; N grave ; B 171 522 352 643 ; 140 | C 194 ; WX 502 ; N acute ; B 151 522 332 643 ; 141 | C 195 ; WX 502 ; N circumflex ; B 116 522 387 643 ; 142 | C 196 ; WX 502 ; N tilde ; B 116 522 387 618 ; 143 | C 197 ; WX 502 ; N macron ; B 136 522 367 583 ; 144 | C 198 ; WX 502 ; N breve ; B 116 522 387 643 ; 145 | C 199 ; WX 502 ; N dotaccent ; B 211 522 291 603 ; 146 | C 200 ; WX 502 ; N dieresis ; B 136 522 367 593 ; 147 | C 202 ; WX 502 ; N ring ; B 180 522 327 669 ; 148 | C 203 ; WX 502 ; N cedilla ; B 202 -161 336 0 ; 149 | C 205 ; WX 502 ; N hungarumlaut ; B 123 522 399 643 ; 150 | C 206 ; WX 502 ; N ogonek ; B 172 -161 306 0 ; 151 | C 207 ; WX 502 ; N caron ; B 116 522 387 643 ; 152 | C 208 ; WX 502 ; N emdash ; B 25 221 477 271 ; 153 | C 225 ; WX 502 ; N AE ; B 0 0 487 603 ; 154 | C 227 ; WX 502 ; N ordfeminine ; B 97 291 421 618 ; 155 | C 232 ; WX 502 ; N Lslash ; B -10 0 452 603 ; 156 | C 233 ; WX 502 ; N Oslash ; B 25 -15 478 618 ; 157 | C 234 ; WX 502 ; N OE ; B 30 -15 495 618 ; 158 | C 235 ; WX 502 ; N ordmasculine ; B 90 291 412 618 ; 159 | C 241 ; WX 502 ; N ae ; B 20 -10 490 452 ; 160 | C 245 ; WX 502 ; N dotlessi ; B 70 0 311 442 ; 161 | C 248 ; WX 502 ; N lslash ; B 70 0 422 643 ; 162 | C 249 ; WX 502 ; N oslash ; B 43 -10 460 452 ; 163 | C 250 ; WX 502 ; N oe ; B 20 -10 492 452 ; 164 | C 251 ; WX 502 ; N germandbls ; B 78 -10 478 653 ; 165 | C -1 ; WX 502 ; N threesuperior ; B 128 291 388 618 ; 166 | C -1 ; WX 502 ; N Aring ; B 5 0 497 737 ; 167 | C -1 ; WX 502 ; N Ntilde ; B 60 0 442 748 ; 168 | C -1 ; WX 502 ; N Yacute ; B 13 0 499 773 ; 169 | C -1 ; WX 502 ; N ecircumflex ; B 58 
-10 447 643 ; 170 | C -1 ; WX 502 ; N igrave ; B 70 0 327 643 ; 171 | C -1 ; WX 502 ; N odieresis ; B 43 -10 459 593 ; 172 | C -1 ; WX 502 ; N otilde ; B 43 -10 459 618 ; 173 | C -1 ; WX 502 ; N notequal ; B 35 35 467 397 ; 174 | C -1 ; WX 502 ; N twosuperior ; B 123 301 394 618 ; 175 | C -1 ; WX 502 ; N Eth ; B 13 0 471 603 ; 176 | C -1 ; WX 502 ; N Adieresis ; B 5 0 497 723 ; 177 | C -1 ; WX 502 ; N Udieresis ; B 63 -15 440 723 ; 178 | C -1 ; WX 502 ; N eacute ; B 58 -10 447 643 ; 179 | C -1 ; WX 502 ; N ocircumflex ; B 43 -10 459 643 ; 180 | C -1 ; WX 502 ; N partialdiff ; B 70 -10 434 653 ; 181 | C -1 ; WX 502 ; N plusminus ; B 35 0 467 432 ; 182 | C -1 ; WX 502 ; N Atilde ; B 5 0 497 748 ; 183 | C -1 ; WX 502 ; N Idieresis ; B 70 0 432 723 ; 184 | C -1 ; WX 502 ; N Ucircumflex ; B 63 -15 440 773 ; 185 | C -1 ; WX 502 ; N egrave ; B 58 -10 447 643 ; 186 | C -1 ; WX 502 ; N zcaron ; B 55 0 447 643 ; 187 | C -1 ; WX 502 ; N oacute ; B 43 -10 459 643 ; 188 | C -1 ; WX 502 ; N ydieresis ; B 34 -161 479 593 ; 189 | C -1 ; WX 502 ; N degree ; B 160 437 340 618 ; 190 | C -1 ; WX 502 ; N Acircumflex ; B 5 0 497 773 ; 191 | C -1 ; WX 502 ; N Icircumflex ; B 70 0 432 773 ; 192 | C -1 ; WX 502 ; N Uacute ; B 63 -15 440 773 ; 193 | C -1 ; WX 502 ; N integral ; B 97 -121 406 618 ; 194 | C -1 ; WX 502 ; N ccedilla ; B 67 -161 449 452 ; 195 | C -1 ; WX 502 ; N ograve ; B 43 -10 459 643 ; 196 | C -1 ; WX 502 ; N infinity ; B 10 45 492 382 ; 197 | C -1 ; WX 502 ; N thorn ; B 73 -161 459 643 ; 198 | C -1 ; WX 502 ; N greaterequal ; B 35 -60 467 492 ; 199 | C -1 ; WX 502 ; N registered ; B 70 256 432 618 ; 200 | C -1 ; WX 502 ; N Delta ; B 13 0 489 603 ; 201 | C -1 ; WX 502 ; N Aacute ; B 5 0 497 773 ; 202 | C -1 ; WX 502 ; N summation ; B 20 0 496 603 ; 203 | C -1 ; WX 502 ; N Iacute ; B 70 0 432 773 ; 204 | C -1 ; WX 502 ; N Ugrave ; B 63 -15 440 773 ; 205 | C -1 ; WX 502 ; N Zcaron ; B 40 0 462 773 ; 206 | C -1 ; WX 502 ; N lessequal ; B 35 -60 467 492 ; 207 | C -1 ; WX 502 ; 
N aring ; B 55 -7 481 669 ; 208 | C -1 ; WX 502 ; N logicalnot ; B 35 121 467 322 ; 209 | C -1 ; WX 502 ; N Agrave ; B 5 0 497 773 ; 210 | C -1 ; WX 502 ; N ntilde ; B 74 0 431 618 ; 211 | C -1 ; WX 502 ; N Igrave ; B 70 0 432 773 ; 212 | C -1 ; WX 502 ; N multiply ; B 31 0 471 440 ; 213 | C -1 ; WX 502 ; N Scaron ; B 67 -15 454 773 ; 214 | C -1 ; WX 502 ; N adieresis ; B 55 -7 481 593 ; 215 | C -1 ; WX 502 ; N eth ; B 45 -10 458 643 ; 216 | C -1 ; WX 502 ; N scaron ; B 76 -10 429 643 ; 217 | C -1 ; WX 502 ; N udieresis ; B 72 -10 428 593 ; 218 | C -1 ; WX 502 ; N yacute ; B 34 -161 479 643 ; 219 | C -1 ; WX 502 ; N brokenbar ; B 221 -121 281 643 ; 220 | C -1 ; WX 502 ; N threequarters ; B 17 -15 515 618 ; 221 | C -1 ; WX 502 ; N Edieresis ; B 88 0 459 723 ; 222 | C -1 ; WX 502 ; N Odieresis ; B 25 -15 477 723 ; 223 | C -1 ; WX 502 ; N atilde ; B 55 -7 481 618 ; 224 | C -1 ; WX 502 ; N idieresis ; B 70 0 382 593 ; 225 | C -1 ; WX 502 ; N ucircumflex ; B 72 -10 428 643 ; 226 | C -1 ; WX 502 ; N underline ; B 0 -90 502 -30 ; 227 | C -1 ; WX 502 ; N minus ; B 35 186 467 246 ; 228 | C -1 ; WX 502 ; N onehalf ; B -1 -15 492 618 ; 229 | C -1 ; WX 502 ; N Omega ; B 20 0 482 618 ; 230 | C -1 ; WX 502 ; N radical ; B 0 -121 502 658 ; 231 | C -1 ; WX 502 ; N Ecircumflex ; B 88 0 459 773 ; 232 | C -1 ; WX 502 ; N Otilde ; B 25 -15 477 748 ; 233 | C -1 ; WX 502 ; N Ydieresis ; B 13 0 499 723 ; 234 | C -1 ; WX 502 ; N acircumflex ; B 55 -7 481 643 ; 235 | C -1 ; WX 502 ; N icircumflex ; B 70 0 402 643 ; 236 | C -1 ; WX 502 ; N uacute ; B 72 -10 428 643 ; 237 | C -1 ; WX 502 ; N zero1 ; B 45 -15 457 618 ; 238 | C -1 ; WX 502 ; N onequarter ; B 0 -15 492 618 ; 239 | C -1 ; WX 502 ; N Eacute ; B 88 0 459 773 ; 240 | C -1 ; WX 502 ; N trademark ; B 0 301 492 603 ; 241 | C -1 ; WX 502 ; N Ocircumflex ; B 25 -15 477 773 ; 242 | C -1 ; WX 502 ; N pi ; B 12 0 493 442 ; 243 | C -1 ; WX 502 ; N aacute ; B 55 -7 481 643 ; 244 | C -1 ; WX 502 ; N iacute ; B 70 0 367 643 ; 245 | C -1 ; WX 
502 ; N ugrave ; B 72 -10 428 643 ; 246 | C -1 ; WX 502 ; N underscore1 ; B 33 -60 468 0 ; 247 | C -1 ; WX 502 ; N onesuperior ; B 139 301 285 618 ; 248 | C -1 ; WX 502 ; N product ; B 28 0 474 603 ; 249 | C -1 ; WX 502 ; N lozenge ; B 10 0 492 482 ; 250 | C -1 ; WX 502 ; N Egrave ; B 88 0 459 773 ; 251 | C -1 ; WX 502 ; N Oacute ; B 25 -15 477 773 ; 252 | C -1 ; WX 502 ; N divide ; B 35 0 467 432 ; 253 | C -1 ; WX 502 ; N agrave ; B 55 -7 481 643 ; 254 | C -1 ; WX 502 ; N approxequal ; B 35 60 467 372 ; 255 | C -1 ; WX 502 ; N mu ; B 73 -121 429 442 ; 256 | C -1 ; WX 502 ; N copyright ; B 25 -15 477 618 ; 257 | C -1 ; WX 502 ; N nbspace ; B 0 0 0 0 ; 258 | C -1 ; WX 502 ; N Ccedilla ; B 42 -161 470 618 ; 259 | C -1 ; WX 502 ; N Thorn ; B 93 0 471 603 ; 260 | C -1 ; WX 502 ; N Ograve ; B 25 -15 477 773 ; 261 | C -1 ; WX 502 ; N edieresis ; B 58 -10 447 593 ; 262 | EndCharMetrics 263 | EndFontMetrics 264 | -------------------------------------------------------------------------------- /fig/deadlock.svg: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 18 | 20 | 27 | 32 | 33 | 40 | 45 | 46 | 53 | 54 | 72 | 79 | 80 | 82 | 83 | 85 | image/svg+xml 86 | 88 | 89 | 90 | 91 | 92 | 97 | 101 | recv 113 | send 124 | 128 | 206 139 | 207 150 | 216 161 | Time 172 | sleep 183 | wakeup 194 | wait for wakeup forever 205 | 215 216 | test 227 | store p 238 | 204 249 | test 260 | 205 271 | spin forever 282 | 283 | 284 | -------------------------------------------------------------------------------- /fig/smp.svg: -------------------------------------------------------------------------------- 1 | 2 | 17 | 19 | 27 | 32 | 33 | 41 | 46 | 47 | 56 | 61 | 62 | 70 | 75 | 76 | 84 | 89 | 90 | 98 | 103 | 104 | 112 | 117 | 118 | 119 | 142 | 147 | 148 | 150 | 151 | 153 | image/svg+xml 154 | 156 | 157 | 158 | 159 | 160 | 166 | 173 | 180 | 187 | CPU 198 | 205 | CPU 216 | l->next = list 227 | l->next = list 238 | list 249 | 256 | 263 | 267 | 271 | 
Memory 282 | 286 | 290 | 294 | BUS 305 | 306 | 307 | -------------------------------------------------------------------------------- /fig/as.svg: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 18 | 20 | 27 | 33 | 34 | 41 | 47 | 48 | 55 | 62 | 69 | 76 | 83 | 90 | 97 | 104 | 105 | 128 | 137 | 138 | 140 | 141 | 143 | image/svg+xml 144 | 146 | 147 | 148 | 149 | 150 | 155 | 162 | 167 | 172 | 177 | 182 | 187 |   198 | 0 209 |   220 |   231 | user textand data 247 | user stack 258 | heap 269 | 274 | MAXVA 285 | trampoline 296 | 301 | trapframe 312 | 313 | 314 | -------------------------------------------------------------------------------- /pgfault.tex: -------------------------------------------------------------------------------- 1 | \chapter{Page faults} 2 | \label{CH:PGFAULTS} 3 | %% 4 | 5 | The RISC-V CPU raises a page-fault exception 6 | when a virtual address is used that has no mapping 7 | in the page table, or has a mapping whose \lstinline{PTE_V} 8 | flag is clear, or a mapping whose permission bits 9 | (\lstinline{PTE_R}, 10 | \lstinline{PTE_W}, 11 | \lstinline{PTE_X}, 12 | \lstinline{PTE_U}) 13 | forbid the operation being attempted. 14 | RISC-V distinguishes three 15 | kinds of page fault: load page faults (caused by load instructions), 16 | store page faults (caused by store instructions), 17 | and instruction 18 | page faults (caused by fetches of instructions to be executed). The 19 | \lstinline{scause} register indicates the type of the 20 | page fault and the \indexcode{stval} register contains the address 21 | that couldn't be translated. 22 | 23 | The combination of page tables and page faults is a powerful tool. 24 | Page tables give the kernel a level of indirection between virtual and 25 | physical addresses, so that the kernel can control the structure and 26 | content of address spaces. 
Page faults allow the kernel to intercept 27 | loads and stores and, by modifying the page table, specify on the fly 28 | what data those references refer to. The kernel can use these 29 | capabilities to increase efficiency: for example, copy-on-write fork 30 | allows the kernel to transparently share memory between parent and 31 | child, avoiding the cost of copying pages that neither writes. 32 | Application programmers can also benefit. One possibility is 33 | memory-mapped files, where the kernel uses paging to cause a file's 34 | content to appear in an application's address space, transparently 35 | reading file blocks in response to page faults. Another is lazy memory 36 | allocation, which allows a program to ask for a huge virtual 37 | address space, but only to pay the cost of allocating physical memory 38 | for the pages the program actually reads and writes. Xv6 uses page 39 | faults for only one purpose: lazy allocation. 40 | 41 | Before proceeding, please read the functions {\tt sys\_sbrk()} in {\tt 42 | kernel/sysproc.c}, and {\tt vmfault} in {\tt kernel/vm.c}. 43 | Search for calls to {\tt vmfault} in {\tt kernel/trap.c} and {\tt 44 | kernel/vm.c}. 45 | 46 | \section{Lazy allocation} 47 | \label{sec:lazy} 48 | 49 | Xv6's \indextext{lazy allocation} has two parts. 50 | First, when an application asks for memory by calling 51 | \lstinline{sbrk} with the flag \lstinline{SBRK_LAZY}, the kernel notes the increase in size, but does not 52 | allocate physical memory and does not create PTEs for the new range of 53 | virtual addresses. Second, on a page fault on one of those new 54 | addresses, the kernel allocates a page of physical memory and maps it 55 | into the page table. The kernel implements lazy allocation 56 | transparently to applications: no modifications to applications are 57 | necessary for them to benefit.
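The two parts of lazy allocation can be mimicked in a small user-space model. This is a sketch under stated assumptions, not xv6's implementation: \lstinline{toy_as}, \lstinline{toy_sbrk_lazy}, \lstinline{toy_fault}, and \lstinline{toy_mapped_pages} are hypothetical names, a fixed array of pointers stands in for the page table, and {\tt calloc} stands in for {\tt kalloc} plus zeroing.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define TOY_PGSIZE 4096u
#define TOY_NPAGES 16u   // capacity of this toy address space (an assumption)

// Hypothetical address space: a size plus one "PTE" per page.
struct toy_as {
  uint64_t sz;              // bytes granted so far
  void *page[TOY_NPAGES];   // NULL means "no mapping yet"
};

// Part one: note the increase in size, allocate nothing.
// Returns the old size, or UINT64_MAX if the request does not fit.
uint64_t toy_sbrk_lazy(struct toy_as *as, uint64_t n)
{
  uint64_t old = as->sz;
  if (n > (uint64_t)TOY_NPAGES * TOY_PGSIZE - old)
    return UINT64_MAX;
  as->sz = old + n;
  return old;
}

// Part two: on a fault at va, allocate and map one zeroed page.
// Returns 0 on success, -1 if va was never granted or memory ran out.
int toy_fault(struct toy_as *as, uint64_t va)
{
  if (va >= as->sz)
    return -1;                      // outside the granted region
  uint64_t i = va / TOY_PGSIZE;
  assert(i < TOY_NPAGES);           // follows from the check in toy_sbrk_lazy
  if (as->page[i] != NULL)
    return 0;                       // already mapped, nothing to do
  void *pa = calloc(1, TOY_PGSIZE); // allocate a zero-filled page
  if (pa == NULL)
    return -1;
  as->page[i] = pa;
  return 0;
}

// Count pages that actually have memory behind them.
unsigned toy_mapped_pages(const struct toy_as *as)
{
  unsigned n = 0;
  for (unsigned i = 0; i < TOY_NPAGES; i++)
    if (as->page[i] != NULL)
      n++;
  return n;
}
```

In this model, growing by three pages costs nothing up front; memory is consumed only for the pages that later fault.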
58 | 59 | Lazy allocation is convenient for applications because they don't have 60 | to accurately predict how much memory they will need. For example, an 61 | application may process input, but not know in advance how large the 62 | input will be. With lazy allocation, an application can ask for 63 | memory for the worst case, but not have to pay for this worst case: 64 | the kernel doesn't have to do any work at all for pages that the 65 | application never uses. 66 | 67 | Furthermore, if the application is asking to grow the address space by 68 | a lot, then \lstinline{sbrk} without lazy allocation is expensive: if 69 | an application asks for a gigabyte of memory, the kernel has to 70 | allocate and zero 262,144 4096-byte physical pages. Lazy allocation allows 71 | this cost to be spread over time. On the other hand, lazy allocation 72 | incurs the extra overhead of page faults, which involve a user/kernel 73 | transition. Operating systems can reduce this cost by allocating a 74 | batch of consecutive pages per page fault instead of one page and by 75 | specializing the kernel entry/exit code for such page faults (though 76 | xv6 does neither). 77 | 78 | Another drawback is that, when taking a page fault for a lazily-allocated 79 | page, the kernel may find that it has no free memory to allocate. In 80 | this case, the kernel has no easy way of returning an out-of-memory 81 | error to the application and instead kills the application. For 82 | applications that prefer an error on a failed allocation, xv6 allows 83 | an application to allocate memory eagerly by calling \lstinline{sbrk} 84 | with the flag \lstinline{SBRK_EAGER}. 85 | 86 | \section{Code} 87 | 88 | The system call {\tt sbrk(n)} grows (or shrinks if {\tt n} 89 | is negative) a process's memory size by {\tt n} bytes, and returns 90 | the start of the newly allocated region (i.e., the old size). 91 | The kernel implementation is \lstinline{sys_sbrk} 92 | \lineref{kernel/sysproc.c:/^sys_sbrk/}.
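Two facts stated above lend themselves to a quick check: {\tt sbrk(n)} returns the old size whether it grows or shrinks, and eagerly granting a gigabyte means touching 262,144 pages of 4096 bytes. The following is a minimal sketch, assuming hypothetical helpers \lstinline{toy_sbrk} and \lstinline{toy_pages_for} that model only the size arithmetic; it does not reproduce \lstinline{sys_sbrk}.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_BYTES 4096ULL

// Number of pages an eager allocator would have to allocate and zero
// to cover nbytes (rounding up to a whole page).
uint64_t toy_pages_for(uint64_t nbytes)
{
  return (nbytes + TOY_PAGE_BYTES - 1) / TOY_PAGE_BYTES;
}

// Model of the size bookkeeping: apply n (which may be negative) to *sz
// and return the old size, or -1 if the result would be negative.
int64_t toy_sbrk(int64_t *sz, int64_t n)
{
  assert(sz != NULL);
  int64_t old = *sz;
  if (old + n < 0)
    return -1;          // cannot shrink below an empty address space
  *sz = old + n;
  return old;           // start of the new region when growing
}
```

For example, \lstinline{toy_pages_for(1ULL << 30)} evaluates to $2^{30} / 2^{12} = 2^{18} = 262{,}144$, matching the figure in the text.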
93 | 94 | If the application specifies \lstinline{SBRK_EAGER}, the system 95 | call is implemented by the function 96 | \lstinline{growproc} 97 | \lineref{kernel/proc.c:/^growproc/}. 98 | \lstinline{growproc} calls \lstinline{uvmalloc}. 99 | \lstinline{uvmalloc} 100 | \lineref{kernel/vm.c:/^uvmalloc/} 101 | allocates physical memory with {\tt kalloc}, zeros the allocated memory, 102 | and adds PTEs to the user page table with {\tt mappages}. 103 | 104 | If the application allocates memory lazily, \lstinline{sys_sbrk} 105 | just increments the process's size 106 | (\lstinline{myproc()->sz}) by {\tt n} and returns the old size; it does 107 | not allocate physical memory or add PTEs to the process's page table. 108 | 109 | When a process loads or stores to a virtual address that 110 | lacks a valid page-table mapping, the CPU will 111 | raise a \indextext{page-fault exception}. 112 | \lstinline{usertrap} checks for this case 113 | \lineref{kernel/trap.c:/page fault/} 114 | and calls \lstinline{vmfault} 115 | \lineref{kernel/vm.c:/^vmfault/} 116 | to handle the page fault. \lstinline{vmfault} 117 | checks that the faulting address is within the 118 | region previously granted by {\tt sbrk}, 119 | allocates a page of physical memory with {\tt kalloc}, 120 | zeros the allocated page, and adds a PTE to the user page table with 121 | {\tt mappages}. Xv6 sets the \lstinline{PTE_W}, \lstinline{PTE_R}, 122 | \lstinline{PTE_U}, and \lstinline{PTE_V} flags in the PTE for the 123 | new page. Then, \lstinline{usertrap} resumes the 124 | process at the instruction that caused the fault. Because the 125 | PTE is now valid, the re-executed load or store instruction 126 | will execute without a fault. 127 | 128 | If an application frees memory using {\tt sbrk}, \lstinline{sys_sbrk} 129 | calls \lstinline{shrinkproc}, which calls \lstinline{uvmdealloc}. The 130 | real work is done by {\tt uvmunmap} \lineref{kernel/vm.c:/^uvmunmap/}, 131 | which uses {\tt walk} to find PTEs.
Since some pages may never have 132 | been used by the process and thus never have been allocated by {\tt 133 | vmfault}, {\tt uvmunmap} skips PTEs without the \lstinline{PTE_V} 134 | flag. If a PTE mapping is valid, {\tt uvmunmap} calls {\tt kfree} 135 | to free the physical memory it refers to. 136 | 137 | Note that Xv6 uses a process's page table not just to tell the 138 | hardware how to map user virtual addresses, but also as the only 139 | record of which physical memory pages are allocated to that 140 | process. That is the reason why freeing user memory (in {\tt 141 | uvmunmap}) requires examination of the user page table. 142 | 143 | \section{Real world: Copy-On-Write (COW) fork} 144 | 145 | Many kernels (though not xv6) use page faults to implement 146 | \indextext{copy-on-write (COW) fork}. The {\tt fork} system call 147 | promises that the 148 | child sees memory whose initial content is the same as the parent's 149 | memory at the time of the fork. One way to implement this is to copy 150 | the entire memory of the parent to newly allocated physical memory for 151 | the child; this is what xv6 does. Copying can be slow, and it 152 | would be more efficient if the child could share the parent's physical 153 | memory. A straightforward implementation of this would not work, 154 | however, since it would cause the parent and child to disrupt each 155 | other's execution with their writes to the shared stack and heap. 156 | 157 | Copy-on-write fork causes parent and child to safely share physical 158 | memory by appropriate use of page-table permissions and page faults. 159 | The basic plan is for the parent and child to initially share all 160 | physical pages, but for each to map them read-only (with the 161 | \lstinline{PTE_W} flag clear). Parent and child can then read from the 162 | shared physical memory. If either writes a shared page, the RISC-V CPU 163 | raises a page-fault exception. 
A kernel supporting COW would respond 164 | by allocating a new page of physical memory and copying the shared 165 | page into that new page. Then the kernel would change the relevant PTE in 166 | the faulting process's page table to point to the copy and to allow 167 | writes as well as reads, and then resume the faulting process at the 168 | instruction that caused the fault. Because the PTE now allows writes, 169 | the re-executed store instruction will execute without a fault, and 170 | will modify a private copy of the page rather than the shared page. 171 | 172 | Copy-on-write requires book-keeping 173 | to help decide when physical pages can be freed, since each page can 174 | be referenced by a varying number of page tables depending on the history of 175 | forks, page faults, execs, and exits. This book-keeping allows 176 | an important optimization: if a process incurs a store page 177 | fault and the physical page is only referred to from that process's 178 | page table, no copy is needed. 179 | 180 | Copy-on-write makes \lstinline{fork} faster, since \lstinline{fork} 181 | need not copy memory. Some of the memory will have to be copied 182 | later, when written, but it's often the case that most of the 183 | memory never has to be copied. 184 | A common example is 185 | \lstinline{fork} followed by \lstinline{exec}: 186 | a few pages may be written after the \lstinline{fork}, 187 | but then the child's \lstinline{exec} releases 188 | the bulk of the memory inherited from the parent. 189 | Copy-on-write \lstinline{fork} eliminates the need to 190 | ever copy this memory. 191 | Furthermore, COW fork is transparent: 192 | no modifications to applications are necessary for 193 | them to benefit. 194 | 195 | \section{Real world: Demand paging} 196 | 197 | Yet another widely-used feature that exploits page faults is 198 | \indextext{demand paging}.
In the \lstinline{exec} system call, xv6 loads all 199 | of an application's text 200 | and data into memory before starting 201 | the application. Since applications 202 | can be large and reading from disk takes time, this startup cost can 203 | be noticeable to users. To 204 | decrease startup time, a modern kernel doesn't initially load 205 | the executable file into memory, but just creates the user page table with 206 | all PTEs marked invalid. The kernel starts the program running; 207 | each time the program uses a page for the first time, a page 208 | fault occurs, and in response 209 | the kernel reads the content of the page from disk and 210 | maps it into the user address space. As with COW fork and lazy 211 | allocation, the kernel can implement this feature transparently to 212 | applications. 213 | 214 | The programs running on a computer may need more memory than the 215 | computer has RAM. To cope gracefully, the operating system may 216 | implement \indextext{paging to disk}. The idea is to store only a 217 | fraction of user pages in RAM, and to store the rest on disk in a 218 | \indextext{paging area}. The kernel marks PTEs that correspond to 219 | memory stored in the paging area (and thus not in RAM) as invalid. If 220 | an application tries to use one of the pages that has been {\it paged 221 | out} to disk, the application will incur a page fault, and the page 222 | must be {\it paged in}: the kernel trap handler will allocate a page 223 | of physical RAM, read the page from disk into the RAM, and modify the 224 | relevant PTE to point to the RAM. 225 | 226 | What happens if a page needs to be paged in, but there is no free 227 | physical RAM? In that case, the kernel must first free a physical page 228 | by paging it out or {\it evicting} it to the paging area on disk, and 229 | marking the PTEs referring to that physical page as invalid.
Eviction 230 | is expensive, so paging performs best if it's infrequent: if 231 | applications use only a subset of their memory pages and the union of 232 | the subsets fits in RAM. This property is often referred to as having 233 | good locality of reference. As with many virtual memory techniques, 234 | kernels usually implement paging to disk in a way that's transparent 235 | to applications. 236 | 237 | Computers often operate with little or no {\it free} physical memory, 238 | regardless of how much RAM the hardware provides. For example, cloud 239 | providers multiplex many customers on a single machine to use their 240 | hardware cost-effectively. As another example, users run many 241 | applications on smart phones in a small amount of physical memory. In 242 | such settings allocating a page may require first evicting an existing 243 | page. Thus, when free physical memory is scarce, allocation is 244 | expensive. 245 | 246 | Lazy allocation and demand paging are particularly advantageous when 247 | free memory is scarce and programs actively use only a fraction of 248 | their allocated memory. These techniques can also avoid the work 249 | wasted when a page is allocated or loaded but either never used or 250 | evicted before it can be used. 251 | 252 | \section{Real world: Memory-mapped files} 253 | 254 | Other features that combine paging and page-fault exceptions include 255 | automatically extending stacks and \indextext{memory-mapped files}, 256 | which are files that a program maps into its address space using 257 | the \texttt{mmap} system call so that the program can read and write 258 | them using load and store instructions. 
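As a rough illustration of what a memory-mapped file looks like from the application's side, the following C sketch maps a small file and modifies it with an ordinary store. It assumes a POSIX system such as Linux (xv6 itself has no {\tt mmap}; adding one is left as an exercise), and the helper name {\tt mmap\_store\_demo} and the scratch file path passed to it are hypothetical, chosen only for this example.

```c
#include <fcntl.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper (not part of xv6): create a 5-byte file at `path`,
 * map it, change its first byte with a plain store, and then read the
 * file back with read() to check that the store reached the file.
 * Returns 0 on success and -1 on any error or mismatch. */
int mmap_store_demo(const char *path)
{
  int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
  if (fd < 0)
    return -1;
  if (write(fd, "hello", 5) != 5) {
    close(fd);
    return -1;
  }

  /* MAP_SHARED asks that stores to the mapping be carried to the file. */
  char *p = mmap(NULL, 5, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
  if (p == MAP_FAILED) {
    close(fd);
    return -1;
  }

  char before = p[0];   /* a load instruction reads file content */
  p[0] = 'J';           /* a store instruction modifies the file */

  /* Flush the modified page so that read() is guaranteed to see it. */
  int synced = msync(p, 5, MS_SYNC);
  munmap(p, 5);

  char buf[5] = {0};
  int ok = synced == 0 &&
           lseek(fd, 0, SEEK_SET) == 0 &&
           read(fd, buf, 5) == 5 &&
           before == 'h' &&
           memcmp(buf, "Jello", 5) == 0;

  close(fd);
  unlink(path);         /* remove the scratch file */
  return ok ? 0 : -1;
}
```

A kernel typically services such a mapping lazily: the first load or store to each mapped page faults, and the fault handler reads the corresponding part of the file into memory, much as in demand paging.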
259 | 260 | % "virtual" memory, eviction, page-in 261 | % lazy allocation 262 | % auto stack expansion 263 | % guard pages 264 | % mmap files 265 | % cow fork 266 | % shared text, shared libraries 267 | % demand paging of text 268 | % virtual machine migration 269 | % distributed shared memory 270 | % fast IPC 271 | % zero-copy write 272 | % DPDK 273 | % (unified block / page cache) 274 | 275 | %% 276 | \section{Exercises} 277 | %% 278 | 279 | \begin{enumerate} 280 | 281 | \item Write a user program that grows its address space by one byte by calling 282 | \lstinline{sbrk(1)}. 283 | Run the program and investigate the page table for the program before the call 284 | to 285 | \lstinline{sbrk} 286 | and after the call to 287 | \lstinline{sbrk}. 288 | How much space has the kernel allocated? What does the 289 | PTE 290 | for the new memory contain? 291 | 292 | \item Implement COW fork. 293 | 294 | \item Implement {\tt mmap}. 295 | 296 | \end{enumerate} 297 | -------------------------------------------------------------------------------- /fig/switch.svg: -------------------------------------------------------------------------------- [SVG figure; markup not preserved. Text labels: Kernel, shell, cat, userspace, kernelspace, kstack shell, kstack cat, kstack scheduler, save, restore, swtch, swtch] -------------------------------------------------------------------------------- /fig/fslayout.svg: -------------------------------------------------------------------------------- [SVG figure; markup not preserved. Text labels: 0, boot, super, inodes, bit map, data, log, 1, data, ...., 2]
-------------------------------------------------------------------------------- /interrupt.tex: -------------------------------------------------------------------------------- 1 | \chapter{Interrupts and device drivers} 2 | \label{CH:INTERRUPT} 3 | 4 | A 5 | \indextext{driver} 6 | is the code in an operating system that manages a particular device: 7 | it configures the device hardware, 8 | tells the device to perform operations, 9 | handles the resulting interrupts, 10 | and interacts with processes using the device. 11 | Driver code can be tricky 12 | because a driver executes concurrently with the device, 13 | and often concurrently with processes using the device. In 14 | addition, the driver must understand the device's hardware interface, 15 | which can be complex and poorly documented. 16 | 17 | Devices that need attention from the operating system can usually be 18 | configured to generate interrupts, which are one type of trap. 19 | The kernel trap handling code recognizes when a device 20 | has raised an interrupt and calls the driver's interrupt handler; 21 | in xv6, this dispatch happens in {\tt devintr} \lineref{kernel/trap.c:/^devintr/}. 22 | 23 | Many device drivers execute code in two contexts: a \indextext{top 24 | half} that runs in a process's kernel thread, and a \indextext{bottom 25 | half} that executes at interrupt time. The top half 26 | is called via system calls such as {\tt read} and {\tt write} that want 27 | the device to perform I/O. This code may ask the hardware to start an 28 | operation (e.g., ask the disk to read a block); then the code waits for 29 | the operation to complete. Eventually the device completes the 30 | operation and raises an interrupt.
The driver's interrupt handler, 31 | acting as the bottom half, 32 | figures out what operation has completed, wakes up a waiting 33 | process if appropriate, and tells the hardware to start work 34 | on the next operation, if any. 35 | 36 | \section{Code: Console input} 37 | 38 | The console driver \fileref{kernel/console.c} 39 | is a simple illustration of driver structure. The 40 | console driver accepts characters typed by a human, via the \indextext{UART} 41 | serial-port hardware attached to the RISC-V. The console driver accumulates a 42 | line of input at a time, processing special input characters such as 43 | backspace and control-u. User processes, such as the shell, use 44 | the {\tt read} system call to fetch lines of input from the console. 45 | When you type input to xv6 in QEMU, your keystrokes are delivered to 46 | xv6 by way of QEMU's simulated UART hardware. 47 | 48 | The UART hardware that the driver talks to is a 16550 49 | chip~\cite{ns16550a} emulated by QEMU. On a real computer, a 16550 50 | would manage an RS232 serial link connecting to a terminal or other 51 | computer. When running QEMU, it's connected to your keyboard and 52 | display. 53 | 54 | The UART hardware appears to software as a set of \indextext{memory-mapped} 55 | control registers. That is, there are some physical addresses that 56 | are connected to the UART device, so that loads and stores 57 | interact with the device hardware rather than RAM. 58 | The memory-mapped addresses for the UART start at 0x10000000, or {\tt UART0} 59 | \lineref{kernel/memlayout.h:/UART0.0x/}. 60 | There are a handful of UART control registers, each the width 61 | of a byte. Their offsets from {\tt UART0} are defined in 62 | \lineref{kernel/uart.c:/define.RHR/}. For example, the 63 | {\tt LSR} register contains bits that indicate whether input 64 | characters are waiting to be read by the driver. These 65 | characters (if any) are available for reading from the 66 | {\tt RHR} register. 
Each time one is read, the UART hardware 67 | deletes it from an internal FIFO of waiting characters, and 68 | clears the ``ready'' bit in {\tt LSR} when the FIFO is empty. 69 | To transmit, the driver writes a byte to the {\tt THR} register, 70 | which causes the UART to append the byte to a FIFO of bytes that the 71 | UART will send on the RS232 serial link. 72 | The UART transmit and receive hardware are largely independent 73 | of each other. 74 | 75 | Xv6's {\tt main} calls {\tt consoleinit} 76 | \lineref{kernel/console.c:/^consoleinit/} to initialize the UART 77 | hardware. This code configures the UART to generate 78 | a receive 79 | interrupt when the UART receives each byte of input, and 80 | a \indextext{transmit complete} interrupt each time the 81 | UART finishes sending a byte of output \lineref{kernel/uart.c:/^uartinit/}. 82 | 83 | The xv6 shell reads from the console by way of a file descriptor 84 | opened by {\tt init.c} \lineref{user/init.c:/open..console/}. Calls to 85 | the {\tt read} system call make their way through the kernel to {\tt 86 | consoleread} \lineref{kernel/console.c:/^consoleread/}. {\tt 87 | consoleread} waits for input to arrive (via interrupts) and be 88 | buffered in {\tt cons.buf}, copies the input to user space, and (after 89 | a whole line has arrived) returns to the user process. If the user 90 | hasn't typed a full line yet, any reading processes will wait in the 91 | {\tt sleep} call 92 | \lineref{kernel/console.c:/sleep..cons/} 93 | (Chapter~\ref{CH:SLEEP} explains the details of {\tt sleep}). 94 | 95 | When the user types a character, the UART hardware asks the RISC-V 96 | to raise an interrupt, which activates 97 | xv6's trap handler. 98 | The trap handler calls {\tt devintr} 99 | \lineref{kernel/trap.c:/^devintr/}, 100 | which looks at the RISC-V {\tt scause} register to discover that 101 | the interrupt is from an external device. 
102 | Then it asks a hardware unit called the PLIC 103 | \cite{riscv:priv} 104 | to tell it which device interrupted 105 | \lineref{kernel/trap.c:/plic.claim/}. 106 | If it was the UART, {\tt devintr} calls {\tt uartintr}. 107 | 108 | {\tt uartintr} 109 | \lineref{kernel/uart.c:/^uartintr/} 110 | reads any waiting input characters from the UART hardware 111 | and hands them to {\tt consoleintr} 112 | \lineref{kernel/console.c:/^consoleintr/}; it doesn't 113 | wait for characters, since future input will raise a new interrupt. 114 | The job of {\tt consoleintr} is to accumulate input characters in 115 | {\tt cons.buf} 116 | until a whole line arrives. 117 | {\tt consoleintr} treats backspace and a few other characters 118 | specially. 119 | When a newline arrives, {\tt consoleintr} wakes up a 120 | waiting {\tt consoleread} (if there is one). 121 | 122 | Once woken, {\tt consoleread} will observe a full line in {\tt 123 | cons.buf}, copy it to user space, and return (via the system call 124 | machinery) to user space. 125 | 126 | \section{Code: Console output} 127 | 128 | A {\tt write} system call on a file descriptor connected to the console 129 | eventually arrives at 130 | {\tt uartputc} 131 | \lineref{kernel/uart.c:/^uartputc/}. 132 | The device driver maintains an output buffer ({\tt uart\_tx\_buf}) 133 | so that writing processes do not have to wait for the UART to finish 134 | sending; instead, {\tt uartputc} appends each character to the buffer, 135 | calls {\tt uartstart} to start the device transmitting (if it isn't 136 | already), and returns. The only situation in which {\tt uartputc} 137 | waits is if the buffer is already full. 138 | 139 | Each time the UART finishes sending a byte, it generates an interrupt. 140 | {\tt uartintr} calls {\tt uartstart}, which checks that the device 141 | really has finished sending, and hands the device the next buffered 142 | output character. 
Thus if a process writes multiple bytes to the 143 | console, typically the first byte will be sent by {\tt uartputc}'s 144 | call to {\tt uartstart}, and the remaining buffered bytes will be sent 145 | by {\tt uartstart} calls from {\tt uartintr} as transmit complete 146 | interrupts arrive. 147 | 148 | A general pattern to note is the decoupling of device activity from 149 | process activity via buffering and interrupts. The console driver can 150 | process input even when no process is waiting to read it; a subsequent 151 | read will see the input. Similarly, processes can send output without 152 | having to wait for the device. This decoupling can increase 153 | performance by allowing processes to execute concurrently with device 154 | I/O, and is particularly important when the device is slow (as with 155 | the UART) or needs immediate attention (as with echoing typed 156 | characters). This idea is sometimes called \indextext{I/O 157 | concurrency}. 158 | 159 | \section{Concurrency in drivers} 160 | 161 | You may have noticed calls to {\tt acquire} in {\tt consoleread} 162 | and in {\tt consoleintr}. These calls acquire a lock, which protects 163 | the console driver's data structures from concurrent access. 164 | There are three concurrency dangers here: two processes on 165 | different CPUs might call {\tt consoleread} at the same time; 166 | the hardware might ask a CPU to deliver a console (really 167 | UART) interrupt while that CPU is already executing inside 168 | {\tt consoleread}; 169 | and the hardware might deliver a console interrupt on 170 | a different CPU while {\tt consoleread} is executing. 171 | Chapter~\ref{CH:LOCK} explains how to use locks 172 | to ensure that these dangers don't lead to incorrect results. 
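The role of the lock can be sketched in user-space C. This is a simplified model under stated assumptions, not xv6's code: a pthread mutex stands in for the kernel spinlock, and the names {\tt struct input\_buf}, {\tt input\_init}, {\tt input\_put}, and {\tt input\_get} are hypothetical. {\tt input\_put} plays the part of the interrupt-time code that buffers a character, and {\tt input\_get} plays the part of the reading code; both touch the buffer and its indices only while holding the lock.

```c
#include <pthread.h>
#include <stddef.h>

#define INPUT_BUF_SIZE 128

/* Hypothetical stand-in for the console driver's buffer. A pthread mutex
 * plays the role that a spinlock plays in the kernel. */
struct input_buf {
  pthread_mutex_t lock;
  char buf[INPUT_BUF_SIZE];
  unsigned r;   /* count of characters consumed so far */
  unsigned w;   /* count of characters produced so far */
};

void input_init(struct input_buf *b)
{
  pthread_mutex_init(&b->lock, NULL);
  b->r = 0;
  b->w = 0;
}

/* Bottom-half analogue: buffer one character.
 * Returns 1 on success, or 0 if the buffer is full. */
int input_put(struct input_buf *b, char c)
{
  int ok = 0;
  pthread_mutex_lock(&b->lock);
  if (b->w - b->r < INPUT_BUF_SIZE) {
    b->buf[b->w % INPUT_BUF_SIZE] = c;
    b->w++;
    ok = 1;
  }
  pthread_mutex_unlock(&b->lock);
  return ok;
}

/* Top-half analogue: copy up to n buffered characters into dst.
 * Returns the number of characters copied. */
size_t input_get(struct input_buf *b, char *dst, size_t n)
{
  size_t got = 0;
  pthread_mutex_lock(&b->lock);
  while (got < n && b->r != b->w) {
    dst[got++] = b->buf[b->r % INPUT_BUF_SIZE];
    b->r++;
  }
  pthread_mutex_unlock(&b->lock);
  return got;
}
```

The sketch models only the dangers that involve different CPUs (two readers, or a reader and an interrupt handler running elsewhere). It does not model an interrupt arriving on the same CPU that already holds the lock, and it omits sleeping until a full line has arrived.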
173 | 174 | Another way in which concurrency requires care in drivers is that one 175 | process may be waiting for input from a device, but the interrupt 176 | signaling arrival of the input may arrive when a different process (or 177 | no process at all) is running. Thus interrupt handlers are not allowed 178 | to think about the process or code that they have interrupted. For 179 | example, an interrupt handler cannot safely call {\tt copyout} with 180 | the current process's page table. Interrupt handlers typically do 181 | relatively little work (e.g., just copy the input data to a buffer), 182 | and wake up top-half code to do the rest. 183 | 184 | \section{Timer interrupts} 185 | 186 | Xv6 uses timer interrupts to maintain its idea of the 187 | current time and to switch among compute-bound processes. Timer 188 | interrupts come from clock hardware attached to each RISC-V CPU. Xv6 189 | programs each CPU's clock hardware to interrupt the CPU periodically. 190 | 191 | Code in {\tt start.c} 192 | \lineref{kernel/start.c:/^timerinit/} sets some control bits 193 | that allow supervisor-mode access to the timer control 194 | registers, and then asks for the first timer interrupt. 195 | The {\tt time} control register contains a count that the 196 | hardware increments at a steady rate; this serves as a 197 | notion of the current time. The {\tt stimecmp} register 198 | contains a time at which the CPU will raise a timer 199 | interrupt; setting {\tt stimecmp} to the current value 200 | of {\tt time} plus {\it x} will schedule an interrupt 201 | {\it x} time units in the future. For {\tt qemu}'s RISC-V 202 | emulation, 1000000 time units is roughly a tenth of a second. 203 | 204 | Timer interrupts arrive via {\tt usertrap} or {\tt kerneltrap} 205 | and {\tt devintr}, like other device interrupts.
206 | Timer interrupts arrive with {\tt scause}'s low bits set to 207 | five; {\tt devintr} in {\tt trap.c} detects this situation 208 | and calls {\tt clockintr} 209 | \lineref{kernel/trap.c:/clockintr/}. 210 | The latter function increments {\tt ticks}, 211 | allowing the kernel to track the 212 | passage of time. The increment occurs on only one CPU, to avoid time 213 | passing faster if there are multiple CPUs. 214 | {\tt clockintr} wakes up any processes waiting in the {\tt pause} 215 | system call, 216 | and schedules the next timer interrupt by writing 217 | {\tt stimecmp}. 218 | 219 | {\tt devintr} returns 2 for a timer interrupt 220 | in order to indicate to {\tt kerneltrap} 221 | or {\tt usertrap} that they should call {\tt yield} so that 222 | CPUs can be multiplexed among runnable processes. 223 | 224 | The fact that kernel code can be interrupted by a timer interrupt that 225 | forces a context switch via {\tt yield} is part of the reason why 226 | early code in {\tt usertrap} is careful to save state such as {\tt 227 | sepc} before enabling interrupts. These context switches also mean 228 | that kernel code must be written in the knowledge that it may move 229 | from one CPU to another without warning. 230 | 231 | \section{Real world} 232 | 233 | Xv6, like many operating systems, allows interrupts and even context 234 | switches (via {\tt yield}) while executing in the kernel. The reason 235 | for this is to retain quick response times during complex system calls 236 | that run for a long time. However, as noted above, allowing interrupts 237 | in the kernel is the source of some complexity; as a result, a few 238 | operating systems allow interrupts only while executing user code. 239 | 240 | Supporting all the devices on a typical computer in its full glory is 241 | much work, because there are many devices, the devices have many 242 | features, and the protocol between device and driver can be complex 243 | and poorly documented. 
In many operating systems, the drivers account 244 | for more code than the core kernel. 245 | 246 | The UART driver retrieves data a byte at a time by reading the UART 247 | control registers; this pattern is called \indextext{programmed I/O}, since 248 | software is driving the data movement. Programmed I/O is simple, but 249 | too slow to be used at high data rates. Devices that need to move lots 250 | of data at high speed typically use \indextext{direct memory access (DMA)}. 251 | DMA device hardware directly writes incoming data to RAM, and reads 252 | outgoing data from RAM. Modern disk and network devices use DMA. A 253 | driver for a DMA device would prepare data in RAM, and then use a 254 | single write to a control register to tell the device to process the 255 | prepared data. 256 | 257 | Interrupts make sense when a device needs attention at unpredictable 258 | times, and not too often. But interrupts have high CPU overhead. Thus 259 | high speed devices, such as network and disk controllers, use tricks 260 | that reduce the need for interrupts. One trick is to raise a single 261 | interrupt for a whole batch of incoming or outgoing requests. Another 262 | trick is for the driver to disable interrupts entirely, and to check 263 | the device periodically to see if it needs attention. This technique 264 | is called \indextext{polling}. Polling makes sense if the device performs 265 | operations at a high rate, but it wastes CPU time if the device is mostly 266 | idle. Some drivers dynamically switch between polling and interrupts 267 | depending on the current device load. 268 | 269 | The UART driver copies incoming data first to a buffer in the kernel, 270 | and then to user space. This makes sense at low data rates, but such a 271 | double copy can significantly reduce performance for devices that 272 | generate or consume data very quickly. 
Some operating systems are able 273 | to directly move data between user-space buffers and device hardware, 274 | often with DMA. 275 | 276 | As mentioned in Chapter~\ref{CH:UNIX}, the console appears to 277 | applications as a regular file, and applications read input and write 278 | output using the \lstinline{read} and \lstinline{write} system calls. 279 | Applications may want to control aspects of a device that cannot be 280 | expressed through the standard file system calls (e.g., 281 | enabling/disabling line buffering in the console driver). Unix 282 | operating systems provide an \lstinline{ioctl} system call for such 283 | cases. 284 | 285 | Some uses of computers require ``real-time'' responses to external 286 | events: responses guaranteed to occur within a bounded time. For 287 | example, in safety-critical systems missing a deadline can lead to 288 | disasters. Xv6 is not suitable for real-time settings. Among other 289 | things, xv6's scheduler does not take into account real-time deadlines 290 | when it decides what process to run next, and xv6 has long kernel code 291 | paths with interrupts disabled, so that it may not respond to 292 | interrupts quickly. A real-time operating system must not only fix 293 | these problems, but also be structured in a way that allows analysis 294 | of worst-case response times. 295 | 296 | \section{Exercises} 297 | 298 | \begin{enumerate} 299 | 300 | \item Modify {\tt uart.c} to not use interrupts at all. You may need 301 | to modify {\tt console.c} as well. 302 | 303 | \item Add a driver for an Ethernet card. 304 | 305 | \end{enumerate} 306 | --------------------------------------------------------------------------------