├── egscc.jpg ├── metagraph.jpg ├── prooflemmascc.jpg ├── vertexcover1.jpg ├── closestpointlemma.jpg ├── .gitignore ├── README.md ├── notes.tex ├── preamble.tex ├── Introduction.tex ├── RandomizedAlgorithms.tex ├── DivideConquer.tex ├── Graphs.tex ├── NPCompleteness.tex ├── DynamicProgramming.tex ├── GreedyAlgorithms.tex └── DataStructures.tex /egscc.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/pkulijing/AlgorithmDesignAnalysisNotes/HEAD/egscc.jpg -------------------------------------------------------------------------------- /metagraph.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/pkulijing/AlgorithmDesignAnalysisNotes/HEAD/metagraph.jpg -------------------------------------------------------------------------------- /prooflemmascc.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/pkulijing/AlgorithmDesignAnalysisNotes/HEAD/prooflemmascc.jpg -------------------------------------------------------------------------------- /vertexcover1.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/pkulijing/AlgorithmDesignAnalysisNotes/HEAD/vertexcover1.jpg -------------------------------------------------------------------------------- /closestpointlemma.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/pkulijing/AlgorithmDesignAnalysisNotes/HEAD/closestpointlemma.jpg -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | *.aux 2 | *.fdb_latexmk 3 | *.fls 4 | *.log 5 | *.loa 6 | *.out 7 | *.synctex.gz 8 | *.glg 9 | *.glo 10 | *.gls 11 | *.ist 12 | *.toc 13 | *.xdy 14 | *.pdf 15 | !notes.pdf 16 | *.1 17 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # AlgorithmDesignAnalysisNotes 2 | Notes for coursera course *Algorithms: Design and Analysis* by Prof. Tim Roughgarden from Stanford University. 3 | 4 | Feel free to take and edit. But you have to know latex to edit it. Otherwise just take the file *notes.pdf* from the release. 5 | -------------------------------------------------------------------------------- /notes.tex: -------------------------------------------------------------------------------- 1 | \input{preamble} 2 | \def\PREAMBLE{PREAMBLE} 3 | \title{Notes for Algorithms: Design and Analysis} 4 | \author{Jing LI\\pkuyplijing@gmail.com} 5 | 6 | \begin{document} 7 | \pagestyle{empty} 8 | \hypersetup{pageanchor=false} 9 | \maketitle 10 | \begin{center} 11 | \emph{Sincere gratitude to Professor Tim Roughgarden \\for offering such a wonderful course.} 12 | \end{center} 13 | \tableofcontents 14 | \newpage 15 | \hypersetup{pageanchor=true} 16 | \pagenumbering{arabic} 17 | \pagestyle{headings} 18 | \addtocontents{toc}{\protect\thispagestyle{empty}} 19 | \input{Introduction} 20 | \input{DivideConquer} 21 | \input{RandomizedAlgorithms} 22 | \input{Graphs} 23 | \input{DataStructures} 24 | \input{GreedyAlgorithms} 25 | \input{DynamicProgramming} 26 | \input{NPCompleteness} 27 | \clearpage%somehow needed so that list of algorithms in toc jumps to the correct page. 
28 | \phantomsection 29 | \addcontentsline{toc}{chapter}{\listalgorithmname} 30 | \listofalgorithms 31 | \end{document} -------------------------------------------------------------------------------- /preamble.tex: -------------------------------------------------------------------------------- 1 | \documentclass{report} 2 | \usepackage[format = hang, font = bf]{caption} 3 | \usepackage{subcaption} 4 | % The following is needed in order to make the code compatible 5 | % with both latex/dvips and pdflatex. Added for using UML generated by MetaUML. 6 | \ifx\pdftexversion\undefined 7 | \usepackage[dvips]{graphicx} 8 | \else 9 | \usepackage[pdftex]{graphicx} 10 | \DeclareGraphicsRule{*}{mps}{*}{} 11 | \fi 12 | \usepackage{array} 13 | \usepackage{amsmath} 14 | \usepackage{amsthm} 15 | \usepackage{mathtools} 16 | \usepackage{boxedminipage} 17 | \usepackage{listings} 18 | \usepackage{multicol} 19 | \usepackage{makecell}%diagonal line in table 20 | \usepackage{float}%allowing forceful figure[H] 21 | \usepackage{xcolor} 22 | \usepackage{amsfonts}%allowing \mathbb{R} 23 | \usepackage{amssymb} 24 | \usepackage{alltt} 25 | \usepackage{algorithmicx} 26 | \usepackage[chapter]{algorithm} 27 | %chapter option ensures that algorithms are numbered within each chapter rather than in the whole article 28 | \usepackage[noend]{algpseudocode} %If end if, end procdeure, etc is expected to appear, remove the noend option 29 | \usepackage{xspace} 30 | \usepackage{physics} 31 | \usepackage{color} 32 | \usepackage{tikz} 33 | \usetikzlibrary{shapes,positioning} 34 | \usepackage{url} 35 | \def\UrlBreaks{\do\A\do\B\do\C\do\D\do\E\do\F\do\G\do\H\do\I\do\J\do\K\do\L\do\M\do\N\do\O\do\P\do\Q\do\R\do\S\do\T\do\U\do\V\do\W\do\X\do\Y\do\Z\do\[\do\\\do\]\do\^\do\_\do\`\do\a\do\b\do\c\do\d\do\e\do\f\do\g\do\h\do\i\do\j\do\k\do\l\do\m\do\n\do\o\do\p\do\q\do\r\do\s\do\t\do\u\do\v\do\w\do\x\do\y\do\z\do\0\do\1\do\2\do\3\do\4\do\5\do\6\do\7\do\8\do\9\do\.\do\@\do\\\do\/\do\!\do\_\do\|\do\;\do\>\do\]\do\)\do\,\do\?\do\'\do+\do\=\do\#\do\-} 36 | \usepackage{xr}%allow cross-file references 37 | \usepackage[breaklinks = true]{hyperref} 38 | \lstset{ 39 | language = C++, 40 | showspaces = false, 41 | breaklines = true, 42 | tabsize = 2, 43 | numbers = left, 44 | extendedchars = false, 45 | basicstyle = {\ttfamily \footnotesize}, 46 | keywordstyle=\color{blue!70}, 47 | commentstyle=\color{gray}, 48 | frame=shadowbox, 49 | rulesepcolor=\color{red!20!green!20!blue!20}, 50 | numberstyle={\color[RGB]{0,192,192}}, 51 | moredelim=[is][\underbar]{_}{_} 52 | } 53 | \mathchardef\myhyphen="2D 54 | % switch-case environment definitions 55 | \algblock{switch}{endswitch} 56 | \algblock{case}{endcase} 57 | %\algrenewtext{endswitch}{\textbf{end switch}} %If end switch is expected to appear, uncomment this line. 
58 | \algtext*{endswitch} % Make end switch disappear 59 | \algtext*{endcase} 60 | \algnewcommand\algorithmicinput{\textbf{Input}} 61 | \algnewcommand\Input{\item[\algorithmicinput]} 62 | \algnewcommand\algorithmicoutput{\textbf{Output}} 63 | \algnewcommand\Output{\item[\algorithmicoutput]} 64 | \algnewcommand\algorithmicinputoutput{\textbf{input and output:}} 65 | \algnewcommand\InputOutput{\item[\algorithmicinputoutput]} 66 | \allowdisplaybreaks 67 | \newtheorem{theorem}{Theorem} 68 | \newtheorem{corollary}[theorem]{Corollary} 69 | \newtheorem{lemma}[theorem]{Lemma} 70 | \newtheorem{definition}{Definition} -------------------------------------------------------------------------------- /Introduction.tex: -------------------------------------------------------------------------------- 1 | \ifx\PREAMBLE\undefined 2 | \input{preamble} 3 | \begin{document} 4 | \fi 5 | \chapter{Introduction} 6 | \section{Warmup: Integer Multiplication Problem} 7 | \subsection{The Primary School Approach} 8 | Consider the integer multiplication algorithm that everyone learned in primary school. 9 | \begin{description} 10 | \item[Input]two n-digit numbers $x$ and $y$. 11 | \item[Output]the product $x\times y$. 12 | \end{description} 13 | We will assess its performance by counting the number of \textbf{primitive operations}, here being the addition or multiplication of two single-digit numbers, required to carry it out. In this case, it is clearly $\Theta(n^2)$. It might have been taken for granted that this is the unique, or at least optimal approach, while actually it isn't. As Aho, Hopcroft and Ullman put it in their 1974 book \emph{The Design and Analysis of Computer Algorithms}, ``Perhaps the most important principle for the good algorithm designer is to refuse to be content.'' Always be ready to ask yourself the question: CAN WE DO BETTER? 14 | \subsection{Karatsuba Multiplication} 15 | Consider the calculation of $5678\times1234$. We will note $a = 56, b = 78, c = 12$ and $d = 34$. The calculation can be carried out with the following steps: 16 | \begin{enumerate} 17 | \item Calculate $a\cdot c = 672$; 18 | \item Calculate $b\cdot d = 2652$; 19 | \item Calculate $(a+b)(c+d) = 134\times46 = 6164$; 20 | \item Calculate \textcircled{\raisebox{-.9pt}{3}} - \textcircled{\raisebox{-.9pt}{2}} - \textcircled{\raisebox{-.9pt}{1}} = 2840; 21 | \item Calculate \textcircled{\raisebox{-.9pt}{1}}$\times$10000 + \textcircled{\raisebox{-.9pt}{4}}$\times$100 + \textcircled{\raisebox{-.9pt}{2}} = 6720000 + 284000 + 2652 = 7006652. 22 | \end{enumerate} 23 | The procedure here can be formalized into a recursive algorithm to calculate $x\times y$. Write $x=10^{n/2}a + b$ and $y=10^{n/2}c + d$. Obviously we have $xy=10^nac +\allowbreak 10^{n/2}(ad+bc) +\allowbreak bd =\allowbreak 10^nac + 10^{n/2}((a+b)(c+d)-ac-bd) + bd$, which inspires us of the following algorithm: 24 | \begin{algorithm}[ht] 25 | \caption{Karatsuba Multiplication}\label{kmultiplication} 26 | \begin{algorithmic}[1]\item Recursively calculate $ac$; 27 | \State{Recursively calculate $bc$} 28 | \State{Recursively calculate $(a+b)(c+d)$} 29 | \State{Calculate \textcircled{\raisebox{-.9pt}{3}} - \textcircled{\raisebox{-.9pt}{2}} - \textcircled{\raisebox{-.9pt}{1}} to get $ad+bc$} 30 | \State{Combine the results appropriately, i.e. 
$xy=10^nac +\allowbreak 10^{n/2}(ad+bc) +\allowbreak bd$} 31 | \end{algorithmic} 32 | \end{algorithm} 33 | 34 | Algorithm \ref{kmultiplication} demonstrates that even a simple problem with a presumably unique solution has plenty of room for subtle algorithm analysis and design. 35 | \section{Course Topic} 36 | The course will cover the following topics: 37 | \begin{itemize} 38 | \item Vocabulary for design and analysis of algorithms; 39 | \item Divide-and-conquer algorithm design paradigm; 40 | \item Randomization in algorithm design; 41 | \item Primitives for reasoning about graphs; 42 | \item Use and implementation of data structures. 43 | \end{itemize} 44 | In part II of the course the following topics will be covered: 45 | \begin{itemize} 46 | \item Greedy algorithm design paradigm; 47 | \item Dynamic programming algorithm design paradigm; 48 | \item NP-complete problems and what to do with them; 49 | \item Fast heuristics with provable guarantees for NP problems; 50 | \item Fast exact algorithms for special cases of NP problems; 51 | \item Exact algorithms that beat brute-force search for NP problems. 52 | \end{itemize} 53 | \section{Merge Sort} 54 | We will use \textbf{merge sort} to illustrate a few basic ideas of the course. Merge sort is a non-trivial algorithm to tackle the sorting problem: 55 | \begin{description} 56 | \item[Input]An unsorted array of $n$ numbers; 57 | \item[Output]The same numbers in sorted order. 58 | \end{description} 59 | \subsection{Pseudo Code} 60 | Merge sort is more efficient than selection sort, insertion sort and bubble sort. It is a good introductory example for the divide \& conquer paradigm. Its pseudo code is shown in Algorithm \ref{mergesort}. 61 | \begin{algorithm}[ht] 62 | \caption{Merge sort}\label{mergesort} 63 | \begin{algorithmic}[1] 64 | \Input\Statex{Unsorted array of length n} 65 | \Output\Statex{Sorted array of length n} 66 | \If{length of the array = 0 or 1} 67 | \State{return}\Comment{Basic case. Array already sorted.} 68 | \EndIf 69 | \State{Recursively merge sort 1st half of the array.} 70 | \State{Recursively merge sort 2nd half of the array.} 71 | \State{Merge the two sorted halves into one sorted array.} 72 | \end{algorithmic} 73 | \end{algorithm} 74 | The merging process may not seem intuitive. It is implemented with parallel traverses of the two sorted sub-arrays, as shown in Algorithm \ref{mergehalves}. 75 | \begin{algorithm}[ht] 76 | \caption{Merging two sorted sub-arrays}\label{mergehalves} 77 | \begin{algorithmic}[1] 78 | \Input 79 | \Statex{A = 1st sorted sub-array, of length $\lfloor n/2 \rfloor$} 80 | \Statex{B = 2nd sorted sub-array, of length $\lceil n/2 \rceil$} 81 | \Output\Statex{C = sorted array of length $n$} 82 | \State{i = 1, j = 1} 83 | \For{k = 1 \textbf{to} $n$} 84 | \If{i $>$ A.len}\Comment{A has been exhausted} 85 | \State{C[k] = B[j++]} 86 | \ElsIf{j $>$ B.len}\Comment{B has been exhausted} 87 | \State{C[k] = A[i++]} 88 | \ElsIf{A[i] $<$ B[j]} 89 | \State{C[k] = A[i++]} 90 | \Else\Comment{A[i]$\geq$B[j]} 91 | \State{C[k] = B[j++]} 92 | \EndIf 93 | \EndFor 94 | \end{algorithmic} 95 | \end{algorithm} 96 | \subsection{Running Time} 97 | The running time of the merging operation is obviously linear to the length of the array. Precisely speaking, each iteration involves one increment of i or j, one increment of k, an assignment to C[k] and at most 3 comparisons\footnote{Here we are taking an approach more detailed than that in the lecture: end cases, i.e. when A or B is exhausted, are taken into account.}. 
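In other words, each of the $n$ loop iterations costs at most $1+1+1+3=6$ primitive operations, hence at most $6n$ operations for the merge loop as a whole.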
Taking the initialization of i and j into account, in total we have to carry out 6n + 2 primitive operations, which is smaller than 8n. 98 | 99 | We can then draw the \textbf{recursive tree} of the problem. The original merge sort problem of size $n$ resides at level 0. At level 1 we have 2 sub-problems of size $n/2$, etc. In general, at level $j$ we have $2^j$ sub-problems of size $\frac{n}{2^j}$, and in total we have $\log n+1$ levels\footnote{At the last level $k$ we must have $\frac{n}{2^k}=1$, thus $k=\log n$.}. At each level, the number of required primitive operations is smaller than $8\cdot\frac{n}{2^j}\cdot 2^j=8n$. In the end, we have an upper bound of the total number of primitive operations needed to solve the original merge sort problem of size $n$: $8n(\log n+1)$. 100 | \section{Asymptotic Analysis} 101 | In the analysis above, we have been applying 3 guiding principles that will serve as universal tactics in future analysis of algorithms: 102 | \begin{enumerate} 103 | \item Focus on worst-case analysis, rather than average-case analysis or benchmarks on a specified set of inputs. 104 | \item Analyze with no regard to the constant factor. 105 | \item Conduct asymptotic analysis, i.e. focus on running time when $n$ is large. 106 | \end{enumerate} 107 | 108 | \textbf{Asymptotic analysis} provides basic vocabulary for the design and analysis of algorithms. It is essential for high-level reasoning about algorithms because it is both coarse enough to suppress details dependent upon architecture, language, compiler and implementation details, and sharp enough to facilitate comparisons between different algorithms, especially for inputs of large size. Its high-level idea is to \textbf{suppress constant factors as well as lower-order terms}. For our example of merge sort, the running time of $8n(\log n+1)$ is actually equivalent to $n\log n$, or in big-O notation, $O(n\log n)$. 109 | \subsection{Big-O Notation} 110 | Let $T(n),n\in\mathbb{N}$ be the function representing the running time of an algorithm with input of size $n$. The \textbf{Big-O notation} $T(n) = O(f(n))$ means that eventually (for all sufficiently large $n$), $T(n)$ will be bounded above by a constant multiple of $f(n)$. We hereby provide its formal definition. 111 | \begin{definition}\label{bigodef} 112 | Big-O notation $T(n)=O(f(n))$ holds if and only if there exist constants $c,n_0$ such that 113 | $$T(n)\leq c\cdot f(n),\forall n\geq n_0.$$ 114 | \end{definition} 115 | \begin{theorem} 116 | \begin{equation*} 117 | a_kn^k+\dots+a_1n+a_0=O(n^k) 118 | \end{equation*} 119 | \end{theorem} 120 | \begin{proof} 121 | Constants $n_0=1$ and $c=\sum\limits_{i=0}^{k}\lvert a_i\rvert$ satisfy Definition \ref{bigodef}. 122 | \end{proof} 123 | \begin{theorem} 124 | For every $k\geq 1$, $n^k\neq O(n^{k-1})$. 125 | \end{theorem} 126 | \begin{proof} 127 | The theorem can be proved by contradiction. Suppose $n^k=O(n^{k-1})$, i.e. $\exists$ constants $c,n_0$ such that $$n^k\leq c\cdot n^{k-1}, \forall n>n_0.$$Then $$n\leq c,\forall n>n_0,$$which is an obvious contradiction. 
128 | \end{proof} 129 | \subsection{Omega, Theta and Little-O Notations} 130 | \begin{definition} 131 | Omega notation $T(n)=\Omega(f(n))$ holds if and only if there exist constants $c,n_0$ such that 132 | $$T(n)\geq c\cdot f(n),n\geq n_0.$$ 133 | \end{definition} 134 | \begin{definition} 135 | Theta notation $T(n)=\Theta(f(n))$ holds if and only if $T(n)=\Omega(f(n))$ and $T(n)=O(f(n))$, which is equivalent to $\exists$ constants $c_1,c_2,n_0$ such that 136 | $$c_1\cdot f(n)\leq T(n)\leq c_2\cdot f(n),\forall n\geq n_0$$ 137 | \end{definition} 138 | A convention in algorithm design, though a sloppy one, is to use big-O notation to represent Theta notation. 139 | \begin{definition} 140 | Little-O notation $T(n)=o(f(n))$ holds if and only if $\forall$ constant $c>0$, $\exists$ constant $n_0$ such that 141 | $$T(n)\leq c\cdot f(n), \forall n\geq n_0.$$ 142 | \end{definition} 143 | \begin{theorem} 144 | $n^{k-1}=o(n^k),\forall k\geq 1$ 145 | \end{theorem} 146 | \begin{proof} 147 | Constant $n=1/c$ can satisfy the condition. 148 | \end{proof} 149 | \ifx\PREAMBLE\undefined 150 | \end{document} 151 | \fi -------------------------------------------------------------------------------- /RandomizedAlgorithms.tex: -------------------------------------------------------------------------------- 1 | \ifx\PREAMBLE\undefined 2 | \input{preamble} 3 | \begin{document} 4 | \fi 5 | \chapter{Randomized Algorithms} 6 | \section{Quick Sort} 7 | \subsection{Overview} 8 | Quick sort is a prevalent sorting algorithm in practice. It is $O(n\log n)$ on average, and it works in place, i.e. extra memory need to carry out the sort is minimal, whereas for merge sort, we need at least $O(n)$ extra memory. The problem is the same as specified for merge sort. Here we assume that all items inside the array are distinct for simplicity. 9 | 10 | The key idea of merge sort is \textbf{partition the array around a pivot element}. Plenty of deliberation remains for the choice of the pivot element. For the moment we just assume that the first element is used. In a partition, the array is rearranged so that elements smaller than the pivot are put before the pivot, while elements larger than it are put after the pivot. The partition puts the pivot in the correct position. By recursively partition the two sub-arrays on both sides of the pivot, the whole array becomes sorted. As will be revealed later, a partition can be finished with $O(n)$ time and no extra memory. The skeleton of the algorithm is shown in Algorithm \ref{quicksortskeleton}. 11 | \begin{algorithm}[ht] 12 | \caption{Skeleton of Quick Sort}\label{quicksortskeleton} 13 | \begin{algorithmic}[1] 14 | \Input\Statex{Array $A$ with $n$ distinct elements in any order} 15 | \Output\Statex{Array $A$ in sorted order} 16 | \If{$A.len$ == 1} 17 | \State{return} 18 | \Else 19 | \State{$p$ = ChoosePivot($A$)} 20 | \State{Partition $A$ around $p$} 21 | \State{Recursively sort 1st part(on left of $p$)} 22 | \State{Recursively sort 2nd part(on right of $p$)} 23 | \EndIf 24 | \end{algorithmic} 25 | \end{algorithm} 26 | \subsection{Partition Subroutine} 27 | If the in place requirement is thrown away, it is easy to come up with a partition algorithm using $O(n)$ time and $O(n)$ extra memory, as shown in Algorithm \ref{partitionextramemory}. 
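As a concrete point of reference, one possible C++ rendering of this idea is sketched below (a sketch only: it assumes distinct integer elements, the pivot stored at the first position and 0-based indexing; the function name is illustrative, not from the lectures).
\begin{lstlisting}
#include <vector>
using std::vector;

// Partition A around the pivot A[0] using a temporary array.
// Assumes distinct elements. Returns the final index of the pivot.
// O(n) time, O(n) extra memory.
int partitionWithExtraMemory(vector<int>& A) {
    int n = A.size();
    int p = A[0];                  // pivot
    vector<int> temp(n);
    int small = 0, big = n - 1;    // smaller elements fill temp from the left, larger from the right
    for (int i = 1; i < n; ++i) {
        if (A[i] > p) temp[big--] = A[i];
        else          temp[small++] = A[i];
    }
    temp[small] = p;               // here small == big: the pivot lands in its sorted position
    A = temp;                      // copy back
    return small;
}
\end{lstlisting}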
28 | \begin{algorithm}[ht] 29 | \caption{Partition with $O(n)$ Extra Memory}\label{partitionextramemory} 30 | \begin{algorithmic}[1] 31 | \Input\Statex{Array $A$ with $n(n>1)$ distinct elements in any order} 32 | \Statex{Pivot element $p$, put at first position of $A$} 33 | \Output\Statex{Array $A$ partitioned around $p$} 34 | \State{Allocate $temp[n]$} 35 | \State{$small = 1, big = n$} 36 | \For{$i$ = 2 \textbf{to} $n$} 37 | \If{$A[i]>p$} 38 | \State{$temp[big--] = A[i]$} 39 | \Else\Comment{$A[i]1)$ distinct elements in any order} 54 | \Statex{Pivot element $p$, put at first position of $A$} 55 | \Output\Statex{Array $A$ partitioned around $p$} 56 | \State{$i = 2,\:j = 2$} 57 | \For{$k$ = 2 \textbf{to} $n$} 58 | \If{$A[k]i$} 140 | \State{return $RSelect(A_1,i)$}\Comment{$A_1.len=j-1$} 141 | \Else\State{return $RSelect(A_2,i-j)$}\Comment{$ji$}\State{return $DSelect(A_1,i)$}\Comment{$T(?)$ running time to be determined}\label{line1} 185 | \Else\State{return $DSelect(A_2,i-j$)}\label{line2} 186 | \EndIf 187 | \EndFunction 188 | \end{algorithmic} 189 | \end{algorithm} 190 | Two recursive calls are made in Algorithm \ref{dselection}. Let $T(n)$ represent the running time of $DSelect$ on an array of length $n$, then there exists constant $c$ such that 191 | \begin{itemize} 192 | \item $T(1)=1$ 193 | \item $T(n)\leq cn+T(n/5)+T(?)$ 194 | \end{itemize} 195 | in which the running time of the second recursive call needs to be determined. 196 | \begin{lemma} 197 | The 2nd recursive call (line \ref{line1} or \ref{line2}) is guaranteed to be $O(\frac{7}{10}n)$. 198 | \end{lemma} 199 | \begin{proof} 200 | Let $k=n/5$, and let $x_i$ represent the $i^{th}$ smallest of the $k$ medians. Then the final pivot is $x_{k/2}$. Our goal is to prove that at least 30\% of the elements in the input array are smaller than the $x_{k/2}$, and at least 30\% are bigger than it. Let's put all elements of the array in a 2D grid as follow, with each sorted group as a column, and $x_i$ in ascending order. 201 | \begin{table}[H] 202 | \centering 203 | \begin{tabular}{ccccccc} 204 | {\color{red}$\circ$}&{\color{red}$\circ$}&{\color{red}$\cdots$}&{\color{red}$\circ$}&$\cdots$&$\circ$&$\circ$ 205 | \\ 206 | {\color{red}$\circ$}&{\color{red}$\circ$}&{\color{red}$\cdots$}&{\color{red}$\circ$}&$\cdots$&$\circ$&$\circ$ 207 | \\ 208 | {\color{red}$x_1$}&{\color{red}$x_2$}&{\color{red}$\cdots$}&$x_{k/2}$&{\color{blue}$\cdots$}&{\color{blue}$x_{k-1}$}&{\color{blue}$x_k$}\\ 209 | $\circ$&$\circ$&$\cdots$&{\color{blue}$\circ$}&{\color{blue}$\cdots$}&{\color{blue}$\circ$}&{\color{blue}$\circ$}\\ 210 | $\circ$&$\circ$&$\cdots$&{\color{blue}$\circ$}&{\color{blue}$\cdots$}&{\color{blue}$\circ$}&{\color{blue}$\circ$} 211 | \end{tabular} 212 | \end{table} 213 | Obviously, elements in red color are smaller than $x_{k/2}$, while elements in blue color are bigger than it. Both include $\frac{3}{5}\times\frac{1}{2}=30\%$ of the elements. Thus the size of the sub-array is guaranteed to shrink by at least 30\% in the 2nd recursive call, so its running time is $O(\frac{7}{10}n)$. 214 | \end{proof} 215 | \begin{theorem} 216 | $\forall$ input array of size $n$, $DSelect$ runs in $O(n)$ time. 217 | \end{theorem} 218 | \begin{proof} 219 | Now we have 220 | \begin{equation*} 221 | T(n)\leq cn+T\left(\frac{n}{5}\right)+T\left(\frac{7n}{10}\right). 222 | \end{equation*} 223 | The master method is not applicable because the two sub-problems are not of the same size. We will prove $T(n)\leq 10cn$ by induction. 224 | 225 | The base case($n=1$) is trivial. 
Suppose that $T(k)\leq 10ck$ holds for all $k\log(n/2)^{n/2}=\frac{n}{2}\log\frac{n}{2},$$ 249 | which means that the running time of the algorithm is $\Omega(n\log n)$. 250 | 251 | 252 | \end{proof} 253 | \ifx\PREAMBLE\undefined 254 | \end{document} 255 | \fi -------------------------------------------------------------------------------- /DivideConquer.tex: -------------------------------------------------------------------------------- 1 | \ifx\PREAMBLE\undefined 2 | \input{preamble} 3 | \begin{document} 4 | \fi 5 | \chapter{Divide and Conquer Algorithms} 6 | A typical divide-and-conquer solution to a problem consists of the following steps: 7 | \begin{enumerate} 8 | \item Divide the problem into smaller sub-problems. 9 | \item Conquer the sub-problems via recursive calls. 10 | \item Combine solutions of sub-problems into a solution to the original problem, often involving some clean-up work. 11 | \end{enumerate} 12 | \section{Inversion Counting Problem} 13 | We will solve the inversion problem using the divide-and-conquer paradigm. The problem is described as follow. 14 | \begin{description} 15 | \item[Input]An array $A$ containing numbers $1,2,\dots,n$ in some arbitrary order. 16 | \item[Output]Number of inversions in this array, i.e. number of pairs $[i,j]$ such that $iA[j]$. 17 | \end{description} 18 | A geometrical solution to the problem is to draw two parallel series of points, mark one series in the order inside array $A$, and mark the other in the order $1,2,\dots,n$. Connect points marked by the same number, i.e. 1 with 1, 2 with 2, etc, then the number of crossing lines is exactly the number of inversions. 19 | 20 | The inversion number is widely useful in comparison and recommendation systems. A movie rating website wants to compare tastes of its users and recommend to a user movies liked by other users with similar taste to his. One criterion of such comparison is to pick the ratings given by one user to a series of movies and compare them against other users' ratings by calculating the number of inversions. The fewer inversions there are, the more similar their tastes are. 21 | 22 | A brute-force approach is obviously $\Theta(n^2)$. We can do better by applying the divide-and-conquer paradigm. Suppose that the array has been divided into two halves. An inversion $[i,j]$ is called a left inversion if both $i,j$ are in the left half, a right inversion if they are both in the right half, and a split inversion if $i$ is in the left half and $j$ is in the right half. A high-level divide-and-conquer algorithm is provided in Algorithm \ref{inversioncounting}. If \texttt{countSplitInv} can be implemented as $\Theta(n)$, then the whole algorithm will be $\Theta(n\log n)$. 23 | \begin{algorithm}[ht] 24 | \caption{Divide-and-conquer Inversion Counting}\label{inversioncounting} 25 | \begin{algorithmic}[1] 26 | \Input\Statex{Array A} 27 | \Output\Statex{Number of inversions in A} 28 | \Function{count}{Array A} 29 | \If{A.len == 1}\State{return 0} 30 | \Else 31 | \State{x = count(1st half of A)}\Comment{Left inversions.} 32 | \State{y = count(2nd half of A)}\Comment{Right inversions.} 33 | \State{z = countSplitInv(A)}\Comment{Count split inversions.} 34 | \State{return x+y+z} 35 | \EndIf 36 | \EndFunction 37 | \end{algorithmic} 38 | \end{algorithm} 39 | 40 | The implementation of \texttt{countSplitInv} seems quite subtle, but it can actually be developed from merge sort. In Algorithm \ref{inversioncounting}, the subroutine \texttt{count} only counts the number of inversions. 
In addition to that, we rename it with \texttt{sortAndCount} and require that it also sorts the array. Subroutine \texttt{countSplitInv} now becomes \texttt{mergeAndCountSplitInv}. It merges the two sorted sub-arrays into one sorted array and counts the number of split inversions. 41 | 42 | If there exists no split inversion, it must be the case that any element of the left sub-array A is smaller than any element of the right sub-array B. As a result, when merging the two sorted sub-arrays, A will be exhausted before any element of B is put in the result. Once an element of B is chosen during the merging process before A is exhausted, every element left in A forms an inversion with it. Algorithm \ref{countsplitinversion}, developed from Algorithm \ref{mergehalves}%in file Introduction.tex 43 | , uses this idea to carry out the \texttt{mergeAndCountSplitInv} process. It is still $\Theta(n)$, thus our inversion counting algorithm is guaranteed to be $\Theta(n\log n)$. 44 | \begin{algorithm}[ht] 45 | \caption{Merge and Count Split Inversion}\label{countsplitinversion} 46 | \begin{algorithmic}[1] 47 | \Input 48 | \Statex{A = 1st sorted sub-array, of length $\lfloor n/2 \rfloor$} 49 | \Statex{B = 2nd sorted sub-array, of length $\lceil n/2 \rceil$} 50 | \Output 51 | \Statex{C = sorted array of length $n$} 52 | \Statex{numSplitInv = number of split inversions} 53 | \State{i = 1, j = 1, numSplitInv = 0} 54 | \For{k = 1 \textbf{to} $n$} 55 | \If{i $>$ A.len}\Comment{A has been exhausted} 56 | \State{C[k] = B[j++]} 57 | \ElsIf{j $>$ B.len}\Comment{B has been exhausted} 58 | \State{C[k] = A[i++]} 59 | \ElsIf{A[i] $\leq$ B[j]}\Comment{In this case equality is actually impossible.} 60 | \State{C[k] = A[i++]} 61 | \Else\Comment{A[i]$>$B[j]} 62 | \State{C[k] = B[j++]} 63 | \State{numSplitInv += A.len} 64 | \EndIf 65 | \EndFor 66 | \end{algorithmic} 67 | \end{algorithm} 68 | \section{Matrix Multiplication} 69 | Matrix multiplication is an important mathematical problem. 70 | \begin{description} 71 | \item[Input]Two matrices $X,Y$ of dimension $N\times N$. 72 | \item[Output]Product matrix $Z=X\cdot Y$. 73 | \end{description} 74 | The definition of matrix multiplication is $Z_{ij}=\sum\limits_{k=1}^NX_{ik}Y_{kj}$. Calculating the product matrix directly will result in a $\Theta(n^3)$ algorithm.\footnote{Here the input size is $\Theta(n^2)$, rather than $\Theta(n)$.} We will introduce an ingenious divide-and-conquer algorithm developed by Strassen that is more efficient. 75 | 76 | At first sight, it might seem plausible to divide each matrix into 4 sub-matrices of dimension $N/2\times N/2$ in the divide phase of the divide-and conquer process: 77 | \begin{equation*} 78 | XY=\begin{pmatrix}A&B\\C&D\end{pmatrix}\begin{pmatrix}E&F\\G&H\end{pmatrix}= 79 | \begin{pmatrix} 80 | AE+BG&AF+BH\\CE+DG&CF+DH 81 | \end{pmatrix}. 82 | \end{equation*} 83 | However, this division does not make great difference. The algorithm is still $\Theta(n^3)$, as we will prove later. Recall that in the Karatsuba Multiplication algorithm, we reduced the number of products by 1 by applying the Gauss's trick: obtain some products by linear combination of other products, rather than direct multiplication. Since addition/subtraction is generally more efficient than multiplication, it usually pays off to appropriately choose the products to calculate in order to reduce the number of products to calculate. In the naive divide-and-conquer design above, we have to calculate 8 products of sub-matrices. 
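To see the problem quantitatively (anticipating the recurrence analysis formalized by the master method later in this chapter), eight recursive products of half-size matrices plus $\Theta(n^2)$ additions give
\begin{equation*}
T(n)\leq 8\,T(n/2)+O(n^2)\implies T(n)=O(n^{\log_28})=O(n^3),
\end{equation*}
which is no better than computing the product directly from the definition.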
Strassen's brilliant algorithm reduces this number to 7, as shown in Algorithm \ref{strassen}, and ends up with smaller time consumption. 84 | \begin{algorithm} 85 | \caption{Strassen's Matrix Multiplication}\label{strassen} 86 | \begin{algorithmic}[1] 87 | \Input\Statex{Two matrices $X,Y$ of dimension $N\times N$} 88 | \Output\Statex{Matrix product $X\cdot Y$} 89 | \State{Divide the matrices: $X=\begin{pmatrix}A&B\\C&D\end{pmatrix}$, $Y=\begin{pmatrix}E&F\\G&H\end{pmatrix}$} 90 | \State{Recursively calculate 91 | \begin{align*} 92 | P_1&=A(F-H),\:P_2=(A+B)H,\:P_3=(C+D)E,\:P_4=D(G-E)\\ 93 | P_5&=(A+D)(E+H),\:P_6=(B-D)(G+H),\:P_7=(A-C)(E+F)\\ 94 | \end{align*}} 95 | \State{Linearly combine the products in step 2 to obtain $XY$: 96 | \begin{equation*} 97 | XY=\begin{pmatrix} 98 | P_5+P_4-P_2+P_6&P_1+P_2\\ 99 | P_3+P_4&P_1+P_5-P_3-P_7\\ 100 | \end{pmatrix}=\begin{pmatrix} 101 | AE+BG&AF+BH\\CE+DG&CF+DH 102 | \end{pmatrix}\end{equation*}} 103 | \end{algorithmic} 104 | \end{algorithm} 105 | The explanation of its time complexity, as well as that of the naive divide-and-conquer algorithm will be addressed later. 106 | \section{Closest Pair} 107 | The closest pair problem is the first computational geometry problem we meet. 108 | \begin{description} 109 | \item[Input]A set of $n$ points $P=\{p_1,\dots,p_n\}$ on the $\mathbb{R}^2$ plane. For simplicity, we assume that they have distinct $x$ coordinates and $y$ coordinates. 110 | \item[Output]A pair of distinct points $p^*,q^*\in P$ that minimizes the Euclidean distance between two points $d(p,q)$ with $p,q\in P$. 111 | \end{description} 112 | A brute-force algorithm is obviously $\Theta(n^2)$. A divide-and-conquer approach can improve it to $\Theta(n\log n)$. Its subtlety lies, as usual, in the 3rd step: combination of solutions to the sub-problems. As preparation before the divide-and-conquer, we sort the points respectively by $x$ and $y$ coordinates, and note the results as $P_x,P_y$. The sort process is $\Theta(n\log n)$ using merge sort, thus we can obtain a $\Theta(n \log n)$ algorithm as long as the divide-and-conquer process takes no more than $\Theta(n\log n)$. 113 | 114 | The skeleton of the process is shown in Algorithm \ref{closetpair}. \texttt{ClosestSplitPair} remains to be illustrated. It outputs the ``split pair'', i.e. one point in $Q$ and the other in $R$, with minimum distance. 115 | \begin{algorithm}[ht] 116 | \caption{Closest Pair Searching ClosetPair($P_x,P_y$)}\label{closetpair} 117 | \begin{algorithmic}[1] 118 | \Input\Statex{A set of $n$ points $P=\{p_1,\dots,p_n\}$ on $\mathbb{R}^2$, sorted respectively by $x$ and $y$ coordinates as $P_x$ and $P_y$} 119 | \Output\Statex{$p,q$ with minimum Euclidean distance} 120 | \State{Let $Q$ be the left half of $P$ and $R$ be right half of $P$. According to $P_x,P_y$, form $Q_x,Q_y,R_x,R_y$, i.e. $Q,R$ sorted by $x$ and $y$ coordinates.}\Comment{$\Theta(n)$} 121 | \State{$(p_1,q_1)$ = ClosestPair($Q_x,Q_y$)} 122 | \State{$(p_2,q_2)$ = ClosestPair($R_x,R_y$)} 123 | \State{$\delta=\min\{d(p_1,q_1),d(p_2,q_2)\}$} 124 | \State{$(p_3,q_3)$ = ClosetSplitPair($P_x,P_y,\delta$)}\Comment{Should be $O(n)$} 125 | \State{Return the best among $(p_1,q_1),(p_2,q_2)$ and $(p_3,q_3)$} 126 | \end{algorithmic} 127 | \end{algorithm} 128 | 129 | Let $\overline{x}$ represent the largest $x$ coordinate in $Q$, i.e. in the left half of $P$. Since we have $P_x$, $\overline{x}$ can be obtained in $O(1)$ time. 
Define $S_y$ as points in $P$ with $x$ coordinate inside $[\overline{x}-\delta,\overline{x}+\delta]$, sorted by $y$ coordinate. We have the following lemma. 130 | \begin{lemma}\label{lemmaforclosesplitpair} 131 | Let $p\in Q,q\in R$ be the split pair with $d(p,q)<\delta$. Then we must have 132 | \begin{itemize} 133 | \item $p,q\in S_y$; 134 | \item $p,q$ are at most 7 positions away from each other in $S_y$. 135 | \end{itemize} 136 | \end{lemma} 137 | \begin{proof} 138 | Let $p(x_1,y_1)\in Q$, $q(x_2,y_2)\in R$, and we have 139 | $$d(p,q)=\sqrt{(x_1-x_2)^2+(y_1-y_2)^2}<\delta.$$ 140 | Since $\overline{x}$ is the largest coordinate in $Q$, we have 141 | $$x_1\leq \overline{x}\leq x_2.$$ 142 | Thus 143 | \begin{align*} 144 | \lvert x_2-\overline{x}\rvert=x_2-\overline{x}\leq x_2-x_1\leq\sqrt{(x_1-x_2)^2+(y_1-y_2)^2}<\delta,\\ 145 | \lvert x_1-\overline{x}\rvert=\overline{x}-x_1\leq x_2-x_1\leq\sqrt{(x_1-x_2)^2+(y_1-y_2)^2}<\delta. 146 | \end{align*} 147 | Which leads to the conclusion 148 | $$x_1,x_2\in[\overline{x}-\delta,\overline{x}+\delta].$$ 149 | This can be directly translated to the first claim of the lemma: $p,q\in S_y$. Figure \ref{pq7position} helps to prove the 2nd claim. 150 | \begin{figure}[H] 151 | \centering 152 | \includegraphics[width=0.5\textwidth]{closestpointlemma.jpg} 153 | \caption{$p,q$ at most 7 positions away from each other}\label{pq7position} 154 | \end{figure} 155 | 156 | In Figure \ref{pq7position}, we draw 8 $\delta/2\times\delta/2$ grids around $\overline{x}$ and $\min\{y_1,y_2\}$. Since $\lvert y_1-y_2\rvert$\eqref{gausstrick}$>$\eqref{mergesortrecur}. 193 | \subsection{Mathematical Statement} 194 | The master method helps to obtain the running time of an algorithm according to its recurrence relation. It assumes that all sub-problems have equal size, thus it's not applicable to the closest pair problem, in which the left and right halves of the points are not guaranteed to have the same number of points. We also assume that the base case is trivial: $T(n)\leq a$ ($a$ is constant) for all sufficiently small $n$. Consider an algorithm that has the recurrence relation 195 | \begin{equation*} 196 | T(n)\leq a\cdot T(n/b)+O(n^d), 197 | \end{equation*} 198 | in which $a,b,n$ are constants with clear meanings. $a$ is the number of recursive calls, $b$ is the input size shrinkage factor, and $O(n^d)$ describes the amount of work needed to combine the solutions to the sub-problems. $a,b$ are both larger than 1, while $d$ can be as small as 0. Master method provides the form of $T(n)$ in different cases, as expressed in Theorem \ref{mastermethod}. 199 | \begin{theorem}\label{mastermethod} 200 | $T(n)$ can be expressed in big-O notation as follow\footnote{If the recurrence relation is written with = rather than $\leq$, the result will be in $\Theta$ notation.}: 201 | \begin{equation*} 202 | T(n)= 203 | \begin{cases} 204 | O(n^d\log n)&\text{if}\:a=b^d\\ 205 | O(n^d)&\text{if}\:ab^d\\ 207 | \end{cases} 208 | \end{equation*} 209 | \end{theorem} 210 | Now let's look at a few examples. 211 | 212 | For merge sort algorithm \ref{mergesort} with recurrence relation \eqref{mergesortrecur}, we have $a=2,\:b=2,\:d=1$, thus it belongs to case 1 and has running time $O(n\log n)$. 213 | 214 | Consider binary search, which has the recurrence relation 215 | \begin{equation*} 216 | T(n)=T(n/2)+O(1), 217 | \end{equation*} 218 | i.e. $a=1,\:b=2,\:d=0$, hence it is $O(\log n)$. 
219 | 220 | For the integer multiplication algorithm without Gauss's trick \eqref{nogausstrick}, we have $a=4,\:b=2,\:d=1$, ending up with case 3. Thus it has time complexity $O(n^2)$, i.e. the divide-and-conquer approach fails to improve the time consumption in comparison with the primary school method. 221 | 222 | When we take Gauss's trick into account, i.e. with Karatsuba multiplication algorithm \ref{kmultiplication}, in \eqref{gausstrick} we have $a=3,\:b=2,\:d=1$. We are still in case 3 and end up with $O(n^{\log_23})$, which is smaller than $O(n^2)$ but larger than $O(n\log n)$. 223 | 224 | Strassen's algorithm \ref{strassen} for matrix multiplication has the recurrence relation 225 | \begin{equation*} 226 | T(n)=7\cdot T(n/2)+O(n^2), 227 | \end{equation*} 228 | which leaves us in case 3. Its running time is therefore $O(n^{\log_27})$, which is better than $O(n^3)$. 229 | 230 | As an illustration of case 2, consider the recurrence relation 231 | \begin{equation*} 232 | T(n)\leq 2\cdot T(n/2)+O(n^2). 233 | \end{equation*} 234 | With $a=b=d=2$, we are in case 2, and end up with running time $O(n^2)$. In this case, the running time is governed by the work outside the recursive call, i.e. the time spent on combining solutions to the sub-problems dominates the global time consumption. 235 | \subsection{Proof} 236 | We will prove the correctness of the master method in this section. As having been stated above, we assume that the recurrence relation takes the form 237 | \begin{itemize} 238 | \item $T(1)\leq c$ 239 | \item $T(n)\leq a\cdot T(n/b)+cn^d$ 240 | \end{itemize} 241 | It's fine to use the same constant $c$ in both the base case and the recurrence relation because we are using $\leq$. In order to make the process less tedious, we also assume that $n$ is a power of $b$. The argument will be similar to what we did to obtain the running time of merge sort: through analysis on the recursive tree. Note that in this section when we refer to a value of time consumption, we always mean that the actual time consumption is smaller than or equal to this value. 242 | 243 | The recursive tree has $\log_bn+1$ levels, from level 0 (the original problem) to level $\log_bn$ (trivial problem of size 1). At level $j$, there are in total $a^j$ sub-problems, each of size $n/b^j$. For $j\neq\log_bn$, the time consumption at this level is contributed by the $cn^d$ term: 244 | \begin{equation*} 245 | a^j\cdot c\cdot\left(\frac{n}{b^j}\right)^d=c\cdot n^d\cdot\left(\frac{a}{b^d}\right)^j. 246 | \end{equation*} 247 | Summing it up over all levels leads to the result $c\cdot n^d\cdot\sum\limits_{j=0}^{\log_bn-1}\left(\frac{a}{b^d}\right)^j$. At level $\log_bn$, the time consumption is simply the combination of all base cases: 248 | \begin{equation*} 249 | c\cdot a^{\log_bn} 250 | \end{equation*} 251 | The total time consumption is thus 252 | \begin{equation}\label{totaltime} 253 | c\cdot n^d\cdot\sum\limits_{j=0}^{\log_bn-1}\left(\frac{a}{b^d}\right)^j+c\cdot a^{\log_bn}=c\cdot n^d\cdot\sum\limits_{j=0}^{\log_bn}\left(\frac{a}{b^d}\right)^j 254 | \end{equation} 255 | This leads to classified discussion over the value of $\frac{a}{b^d}$. 256 | \begin{enumerate} 257 | \item $a=b^d$. In this case, \eqref{totaltime} becomes 258 | \begin{equation*} 259 | c\cdot n^d(\log_bn+1)=O(n^d\log_bn) 260 | \end{equation*} 261 | \item $ab^d$. 
In this case, \eqref{totaltime} becomes 266 | \begin{equation*} 267 | c\cdot n^d\frac{\left(\frac{a}{b^d}\right)^{\log_bn+1}-1}{\frac{a}{b^d}-1}<\frac{c}{\frac{a}{b^d}-1}n^d\frac{a}{b^d}\left(\frac{a}{b^d}\right)^{\log_bn}=O(a^{\log_bn})=O(n^{\log_ba}) 268 | \end{equation*} 269 | \end{enumerate} 270 | Therefore we have completed the proof of the master method.\hfill$\square$ 271 | 272 | The essential role that $a/b^d$ plays here comes naturally from the meaning of $a,b,d$. Each problem produces $a$ sub-problems in the next level. We call $a$ the \textbf{rate of sub-problem proliferation, abbr. RSP}. Size of each sub-problem shrinks by $b$ times after each recurrence, and the work load shrinks by $b^d$ times, so we call $b^d$ the \textbf{rate of work shrinkage, abbr. RWS}. The three cases of the master method can be interpreted as follow. 273 | \begin{enumerate} 274 | \item RSP = RWS. The amount of work at each level is $cn^d$. With totally $\log_bn$ levels, the problem should be $O(n^d\log_bn)$. 275 | \item RSP $>$ RWS. The amount of work increases with the recursion level $j$. The last level dominates the running time, thus the overall running time is proportional to the number of sub-problems (base cases) in the last level. The problem is $O(a^{\log_bn})$. 276 | \item RSP $<$ RWS. The amount of work decreases with the recursion level $j$. The root level dominates the running time, thus the problem is $O(n^d)$. 277 | \end{enumerate} 278 | \ifx\PREAMBLE\undefined 279 | \end{document} 280 | \fi -------------------------------------------------------------------------------- /Graphs.tex: -------------------------------------------------------------------------------- 1 | \ifx\PREAMBLE\undefined 2 | \input{preamble} 3 | \begin{document} 4 | \fi 5 | \chapter{Graph Primitives} 6 | \textbf{Graphs represent pairwise relationships amongst a set of objects}. The objects are called vertices or nodes. The relationships are called edges or arcs, each connecting a pair of vertices. An edge can be directed or undirected. The set of vertices and the set of edges are noted respectively as $V$ and $E$. Graph is a concept heavily used in reality. Road networks, the web, social networks, precedence constraints are all examples of graphs. 7 | 8 | A connected graph composed of $n$ vertices with no parallel edges has at least $n-1$ and at most $n(n-1)/2$ edges. Let $m$ represent the number of edges. In most applications, $m$ is $\Omega(n)$ and $O(n^2)$. If $m$ is $O(n)$ or close to it, the graph is called a sparse graph, while if $m$ is closer to $O(n^2)$, it's called a dense graph. Yet their delimitation is not strictly clear. 9 | \section{Representation} 10 | \subsection{Adjacent Matrix} 11 | An undirected graph $G$ with $n$ vertices and no parallel edges can be represented by an $n\times n$ 0-1 matrix $A$. $A_{ij}=1$ when and only when $G$ has an $i-j$ edge. Variants of this representation can easily accommodate parallel edges, weighted edges: just let $A_{ij}$ represent the number of parallel edges or the weight of the edge. For directed graphs, $i\rightarrow j$ can be represented by $A_{ij}=1$ and $A_{ji}=-1$. 12 | 13 | Adjacent matrix representation requires $\Theta(n^2)$ space. For a dense graph this is fine, but for a sparse graph it is wasteful. 14 | \subsection{Adjacent Lists} 15 | The adjacent lists representation is composed of 4 ingredients: 16 | \begin{itemize} 17 | \item Array/List of vertices. $\Theta(n)$ space. 18 | \item Array/List of edges. $\Theta(m)$ space. 
19 | \item Each edge points to its end points. $\Theta(m)$ space. 20 | \item Each vertex points to edges incident on it. $\Theta(m)$ space. 21 | \end{itemize} 22 | Adjacent lists representation requires $\Theta(n+m)$ space. 23 | 24 | The choice between the two representations depends on the density of the graph and operations to take. We will mainly use adjacent lists in this chapter. 25 | \section{Minimum Cut} 26 | \subsection{Definition} 27 | \begin{definition} 28 | A cut of a graph ($V,E$) is a partition of $V$ into two non-empty sets $A$ and $B$. 29 | \end{definition} 30 | A graph with $n$ vertices has $2^n-2$ possible cuts. 31 | \begin{definition} 32 | The crossing edges of a cut($A,B$) are those with 33 | \begin{itemize} 34 | \item one endpoint in $A$ and the other in $B$, for undirected graphs; 35 | \item tail in $A$ and head in $B$, for directed graphs. 36 | \end{itemize} 37 | \end{definition} 38 | We will try to solve the minimum cut problem: 39 | \begin{description} 40 | \item[Input]An undirected graph $G=(V,E)$ in which parallel edges are allowed. 41 | \item[Output]A cut $(A,B)$ with minimum number of crossing edges. 42 | \end{description} 43 | A lot of problems in reality can be reduced to a minimum cut problems: 44 | \begin{itemize} 45 | \item Identify weakness point of physical networks; 46 | \item Community detection in social networks; 47 | \item Image segmentation. 48 | \end{itemize} 49 | \subsection{Random Contraction Algorithm} 50 | Algorithm \ref{randomcontraction} provides a random approach to find a cut. 51 | \begin{algorithm}[ht] 52 | \caption{Random Contraction}\label{randomcontraction} 53 | \begin{algorithmic}[1] 54 | \Input\Statex{An undirected graph $G=(V,E)$ in which parallel edges are allowed.} 55 | \Output\Statex{A cut $(A,B)$ with minimum number of crossing edges} 56 | \While{There are more than 2 vertices left} 57 | \State{Pick a remaining edge $(u,v)$ randomly}\label{randomselection} 58 | \State{Merge(or contract) $u,v$ into a single vertex} 59 | \State{Remove self-loops} 60 | \EndWhile 61 | \State{Return cut represented by 2 final vertices} 62 | \end{algorithmic} 63 | \end{algorithm} 64 | \subsection{Probability of Correctness} 65 | Algorithm \ref{randomcontraction} is not guaranteed to always return a minimum cut. We have to iterate the whole algorithm multiple times and choose the cut with minimum number of crossing edges obtained during the process. In order to estimate the number of iterations needed, we will calculate the probability that a specific minimum cut $(A,B)$ is returned in one iteration. 66 | 67 | If one of the crossing edges of $(A,B)$ is selected in step \ref{randomselection}, the minimum cut cannot be returned. If there are $k$ crossing edges in $(A,B)$, the probability that a crossing edge is selected at the first iteration is $k/m$. The number of edges decrease by an indefinite number at each iteration, while the number of vertices decrease by exactly 1 each time. Thus it is preferable to express the probability in terms of $n$ instead of $m$. 68 | 69 | Each vertex $v$ is related to a cut $({v},V-{v})$. The number of crossing edges of this cut is the number of edges incident on $v$, i.e. the degree of $v$. Considering the definition of minimum cut, this number must be larger than $k$. We also know the sum of the degrees of all vertices: $\sum\limits_{v}degree(v)=2m$. Thus we have $2m=\sum\limits_{v}degree(v)\geq kn$. 
Hence the probability that a crossing edge is not selected at the first iteration in step \ref{randomselection} is $$1-k/m\geq 2/n.$$ 70 | 71 | At the second iteration, the same argument still holds. Let $m'$ represent the number of remaining edges after the first iteration. We have $2m'=\sum\limits_{v}degree(v)\geq k(n-1)$. Thus the probability that a crossing edge is selected at the 2nd iteration under the condition that no crossing edge was selected at the 1st iteration is $$1-k/m'\geq 2/(n-1)$$. 72 | 73 | The same argument continues further until the last iteration, i.e. the $(n-2)^{th}$ iteration. Finally, the probability that no crossing edge is selected in the whole process, thus resulting in the minimum cut $(A,B)$ is 74 | \begin{align*} 75 | P(A,B)&\geq\left(1-\frac{2}{n}\right)\left(1-\frac{2}{n-1}\right)\cdots\left(1-\frac{2}{n-(n-3)}\right)\\ 76 | &=\frac{(n-2)\cdot(n-3)\cdots 2\cdot 1}{n\cdot(n-1)\cdots 4\cdot 3}\\ 77 | &=\frac{2}{n(n-1)}>\frac{1}{n^2}. 78 | \end{align*} 79 | The probability seems small, but it already makes great advancement when compared with brute-force method, in which case the probability to obtain $(A,B)$ in an iteration is $\frac{1}{2^n}$. 80 | 81 | After $N$ iterations, the probability that $(A,B)$ has not been found is 82 | \begin{equation*} 83 | P(not\:found)<\left(1-\frac{1}{n^2}\right)^N\leq e^{-\frac{N}{n^2}}. 84 | \end{equation*} 85 | If $N=n^2$, the probability is smaller than $1/e$. If $N=n^2\ln n$, it is smaller than $1/n$. 86 | \subsection{Number of Minimum Cuts} 87 | A graph can have multiple minimum cuts. For example, a tree with $n$ vertices has $n-1$ minimum cuts. We would like to find the largest number of minimum cuts that a graph with $n$ vertices can have. 88 | \begin{theorem} 89 | A graph with $n$ vertices can have at most $\binom{n}{2}$ minimum cuts. 90 | \end{theorem} 91 | \begin{proof} 92 | Consider the graph in which the $n$ vertices form a circle. Removing any two edges results in a minimum cut. Thus in total it has $\binom{n}{2}$ minimum cuts. The rest of the proof aims at proving that a graph with $n$ vertices cannot have more minimum cuts. 93 | 94 | Recall that the probability for Algorithm \ref{randomcontraction} to return a specific minimum cut is bigger than $\frac{2}{n(n-1)}$. Suppose there are $k$ minimum cuts. The events ``Algorithm \ref{randomcontraction} returns minimum cut $C_i$'' and ``Algorithm \ref{randomcontraction} returns minimum cut $C_j$'' are disjoint events when $i\neq j$. Thus we have 95 | $$1\geq P(\text{return a minimum cut})\geq\frac{2k}{n(n-1)}.$$ 96 | Therefore, 97 | $$k\leq \frac{n(n-1)}{2}=\binom{n}{2}.$$ 98 | \end{proof} 99 | \section{Breadth First Search} 100 | Graph search is widely used for various purposes: 101 | \begin{itemize} 102 | \item Check if a network is connected; 103 | \item Find shortest paths, e.g. for driving navigation, or formulating a plan; 104 | \item Calculate connected components. 105 | \item $\dots$ 106 | \end{itemize} 107 | We will introduce a few fast algorithms based on graph search. Graph search usually starts from a source vertex. When searching the graph, we want to find everything that is findable, i.e. every vertex reachable from the source via a path. Moreover, we never explore anything twice. In terms of running time, our goal is $O(m+n).$ 108 | 109 | BFS explores the nodes of a graph in layers. Nodes with the same distances from the source are in the same layer. 
It can be used to compute shortest paths of graphs, and to compute connected components of undirected graphs. It guarantees $O(m+n)$ running time. The general pattern of BFS is shown in Algorithm \ref{bfs}. 110 | \begin{algorithm}[ht] 111 | \caption{Breadth First Search(BFS)}\label{bfs} 112 | \begin{algorithmic}[1] 113 | \Input\Statex{Graph $G$ with all vertices unexplored} 114 | \Statex{Source vertex $s$} 115 | \Output\Statex{$G$ with all vertices reachable from $s$ explored.} 116 | \State{Mark $s$ as explored.} 117 | \State{Let $Q$ = queue initialized with $s$} 118 | \While{$Q\neq\emptyset$} 119 | \State{Remove first element $v$ of $Q$} 120 | \For{each edge $(v,w)$} 121 | \If{$w$ is unexplored} 122 | \State{Mark $w$ as explored} 123 | \State{Add $w$ to $Q$} 124 | \EndIf 125 | \EndFor 126 | \EndWhile 127 | \end{algorithmic} 128 | \end{algorithm} 129 | 130 | At the end of BFS, the fact that a node $v$ is explored means the existence of a path from $s$ to $v$. 131 | \subsection{Shortest Path} 132 | Algorithm \ref{shortestpath} calculates the shortest path from $s$ to any vertex reachable from $s$. 133 | \begin{algorithm}[ht] 134 | \caption{Shortest Path - BFS}\label{shortestpath} 135 | \begin{algorithmic}[1] 136 | \Input\Statex{Graph $G$ with all vertices unexplored} 137 | \Statex{Source vertex $s$} 138 | \Output\Statex{$dist(v)$ for any vertex $v$, i.e. min number of edges on a path from $s$ to $v$} 139 | \State{Initialize $dist(v)=\left\{\begin{array}{rl}0,&if(v==s)\\+\infty,&if(v\neq s)\end{array}\right.$} 140 | \State{Mark $s$ as explored.} 141 | \State{Let $Q$ = queue initialized with $s$} 142 | \While{$Q\neq\emptyset$} 143 | \State{Remove first element $v$ of $Q$} 144 | \For{each edge $(v,w)$} 145 | \If{$w$ is unexplored} 146 | \State{Mark $w$ as explored} 147 | \State{$dist(w)=dist(v)+1$} 148 | \State{Add $w$ to $Q$} 149 | \EndIf 150 | \EndFor 151 | \EndWhile 152 | \end{algorithmic} 153 | \end{algorithm} 154 | 155 | After the algorithm terminates, $dist(v)=i$ means that $v$ is in the $i^{th}$ layer and that the shortest path connecting $s$ and $v$ has $i$ edges. 156 | \subsection{Undirected Connectivity} 157 | \begin{definition} 158 | For an undirected graph $G(V,E)$, connected components are equivalence classes of the equivalence relation\footnote{An equivalence relation on a set must satisfy: 1. $a\sim a$; 2. If $a\sim b$, then $b\sim a$; 3. If $a\sim b$ and $b\sim c$, then $a\sim c$.} $u\sim v$, in which $u,v$ are its vertices and $u\sim v\iff\exists$ path from $u$ to $v$. 159 | \end{definition} 160 | 161 | Identifying connected components of graphs is useful for various purposes: 162 | \begin{itemize} 163 | \item Check if a network is disconnected; 164 | \item Graph visualization; 165 | \item Clustering. 166 | \end{itemize} 167 | When it comes to the calculation of connected component, undirected graphs and directed graphs are significantly different. BFS is an effective method for calculating the connectivity of undirected graphs. Algorithm \ref{ccundirected} computes the CCs of an undirected graph in $O(m+n)$ time. 
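As a concrete illustration, the following C++ sketch labels each vertex with the index of its connected component using the same BFS loop (adjacency lists stored as nested \texttt{std::vector}s; the names are illustrative and not from the lectures):
\begin{lstlisting}
#include <queue>
#include <vector>
using std::queue; using std::vector;

// adj[v] lists the neighbours of vertex v (0-indexed).
// Returns comp, where comp[v] is the id of v's connected component.
vector<int> connectedComponents(const vector<vector<int>>& adj) {
    int n = adj.size();
    vector<int> comp(n, -1);          // -1 marks an unexplored vertex
    int id = 0;
    for (int s = 0; s < n; ++s) {
        if (comp[s] != -1) continue;  // s was reached by an earlier BFS
        comp[s] = id;                 // BFS from s discovers exactly s's component
        queue<int> q;
        q.push(s);
        while (!q.empty()) {
            int v = q.front(); q.pop();
            for (int w : adj[v])
                if (comp[w] == -1) { comp[w] = id; q.push(w); }
        }
        ++id;
    }
    return comp;
}
\end{lstlisting}
Every vertex enters the queue at most once and every edge is inspected at most twice, which gives the claimed $O(m+n)$ bound.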
168 | \begin{algorithm}[ht] 169 | \caption{Connected Components of Undirected Graph - BFS}\label{ccundirected} 170 | \begin{algorithmic}[1] 171 | \Input\Statex{Undirected graph $G$ with all vertices unexplored and labeled 1 to $n$} 172 | \Output\Statex{Connected components of $G$} 173 | \For{$i$ = 1 \textbf{to} $n$} 174 | \If{$i$ not explored} 175 | \State{$BFS(G,i)$}\Comment{discovers $i$'s connected component} 176 | \EndIf 177 | \EndFor 178 | \end{algorithmic} 179 | \end{algorithm} 180 | \section{Depth First Search} 181 | DFS is a more aggressive method to search an graph than BFS. It explores the nodes following the edges as deeply as possible, and only backtracks when necessary. DFS is especially important for dealing with directed graphs. As we are about to demonstrate, it helps to compute topological ordering of directed acyclic graphs and strongly connected components of directed graphs. As with BFS, DFS also runs in $O(m+n)$ time. 182 | 183 | DFS can be implemented by mimicking BFS in Algorithm \ref{bfs}. A stack should be used instead of a queue. A recursive approach is shown in Algorithm \ref{dfs}. 184 | \begin{algorithm}[ht] 185 | \caption{Depth First Search (Recursive)}\label{dfs} 186 | \begin{algorithmic}[1] 187 | \Input\Statex{Graph $G$ with all vertices unexplored} 188 | \Statex{Source vertex $s$} 189 | \Output\Statex{$G$ with all vertices reachable from $s$ explored.} 190 | \Function{$DFS$}{Graph $G$, node $s$} 191 | \State{Mark $s$ as explored} 192 | \For{each edge $(s,v)$} 193 | \If{$v$ not explored} 194 | \State{$DFS(G,v)$} 195 | \EndIf 196 | \EndFor 197 | \EndFunction 198 | \end{algorithmic} 199 | \end{algorithm} 200 | DFS can also be used to calculate connected components of undirected graphs. But we will focus on two applications of DFS that cannot be handled with BFS. 201 | \subsection{Topological Sort} 202 | Topological sort aims at putting the nodes of a directed graph in topological ordering. 203 | \begin{definition} 204 | A topological ordering of a directed graph $G$ is a labeling $f$ of $G$'s nodes among $\{1,2,\dots,n\}$ such that $\forall(u,v)\in G,\:f(u)\max\limits_{i\in C_1}f(i)$ 326 | \end{proof} 327 | 328 | An obvious corollary of Lemma \ref{keylemmascc} is as follow. 329 | \begin{corollary}\label{corollaryscc} 330 | $\max\limits_{v\in V}f(v)$ must lie in a sink SCC, i.e. an SCC that has no outgoing arc. 331 | \end{corollary} 332 | Therefore, by starting from the node with the largest $f$ value in the 2$^{nd}$ loop, we are guaranteed to explore a sink SCC of $G$ first. Nodes in this sink SCC are ruled out from further exploration because there have been marked as explored. Every time we set up a new leader, it is guaranteed to be the node with the largest $f$ value amongst all unexplored nodes. DFS from the leader will reach and will only reach nodes in the same SCC as the leader, which is a sink SCC of the unexplored part of the graph. The SCCs will be found one by one. 333 | \section{Dijkstra's Shortest Path Algorithm} 334 | If all edges in a graph have equal lengths, the shortest path problem can be solved by BFS, as discussed in Algorithm \ref{shortestpath}. Dijkstra's algorithm computes shortest paths when edges have different lengths. 335 | \begin{description} 336 | \item[Input]Directed graph $G(V,E)$. Each edge $e\in E$ has non-negative length $l_e$. A source vertex $s$. 337 | \item[Output]For each $v\in V$, compute the length of shortest path from $s$ to $v$ in $G$. 
338 | \end{description} 339 | For convenience, we assume that there exists a path from $s$ to any vertex in $G$. 340 | \subsection{Algorithm} 341 | Dijkstra's algorithm is shown in Algorithm \ref{dijkstra}. 342 | \begin{algorithm}[ht] 343 | \caption{Dijkstra's Shortest Path Algorithm}\label{dijkstra} 344 | \begin{algorithmic}[1] 345 | \Input\Statex{Directed graph $G(V,E)$. Each edge $e\in E$ has non-negative lengths $l_e$} 346 | \Statex{Source vertex $s$} 347 | \Output\Statex{Shortest path from $s$ to $v$ for all $v\in V$} 348 | \State{Initialize $X=\{s\}$}\Comment{$X$: vertices processed so far} 349 | \State{$A[s]=0$}\Comment{$A[v]$: length of shortest path from $s$ to $v$} 350 | \State{$B[s]= empty\:path$}\Comment{$B[v]$: shortest path from $s$ to $v$} 351 | \While{$X\neq V$} 352 | \State{Among all edges $v\rightarrow w$ with $v\in X,w\notin X$, choose $v^*\rightarrow w^*$ that minimizes $A[v]+l_{vw}$}\Comment{Let's call it ``Dijkstra's greedy score''} 353 | \State{$X\coloneqq X\cup\{w^*\}$} 354 | \State{$A[w^*]\coloneqq A[v^*]+l_{v^*w^*}$} 355 | \State{$B[w^*]\coloneqq B[v^*]+v*\rightarrow w^*$} 356 | \EndWhile 357 | \end{algorithmic} 358 | \end{algorithm} 359 | \subsection{Correctness} 360 | The correctness of Dijkstra's algorithm can be proved by induction. 361 | \begin{proof} 362 | We will try to prove by induction that after each iteration,$\forall v\in X$, $B[v]$ is the shortest path from $s$ to $v$, and $A[v]$ is the length of the shortest path. 363 | 364 | At the beginning, $X=\{s\}$, $A[s]=0$, $B[s]=empty\:path$. Obviously the conclusion is correct. Let's assume that it holds before an iteration, and in this iteration we have chosen the edge $v^*\rightarrow w^*$. In order to add $w^*$ to $X$, we have to prove that $B[v^*]+v^*\rightarrow w^*$ with length $A[v^*]+l_{v^*w^*}$ is the shortest path from $s$ to $w^*$. 365 | 366 | Take any path $S$ from $s$ to $w^*$. It has to cross the boundary between $X$ and $V-X$ somewhere. Suppose the edge from $X$ to $V-X$ is $y\rightarrow z$. This path can be divided into 3 segments: 367 | \begin{enumerate} 368 | \item $S_1$: from $s$ to $y$. According to our assumption, it is at least as long as $A[y]$: $L(S_1)\geq A[y]$. 369 | \item $S_2$: the edge $y\rightarrow z$. $L(S_2)=l_{yz}.$ 370 | \item $S_3$: from $z$ to $w$. All edges have non-negative length, thus $L(S_3)\geq 0.$ 371 | \end{enumerate} 372 | Dijkstra's algorithm guarantees that 373 | $$A[v^*]+l_{v^*w^*}\leq A[y]+l_{yz}.$$ 374 | Thus we have 375 | $$L(S)=L(S_1)+L(S_2)+L(S_3)\geq A[y]+l_{yz}\geq A[v^*]+l_{v^*w^*}.$$ 376 | Therefore, $B[v^*]+v^*\rightarrow w^*$ is the shortest path from $s$ to $w^*$. 377 | \end{proof} 378 | \subsection{Implementation and Running Time} 379 | A naive implementation of Dijkstra's algorithm can take as long as $O(nm)$ time to run: in each iteration, we have to scan through all edges to find $v^*\rightarrow w^*$. In order to speed up the execution, we have to turn to the heap data structure. 380 | 381 | Heap is a data structure designed to perform insertion and extraction of minimum in $O(\log n)$ time. Conceptually, a heap is an almost perfectly balanced binary tree (null leaves are only allowed at the lowest level). The key of each node must be smaller (or equal to) that of its two children. This property guarantees that the node with the smallest key is at the root. Insertion is performed by adding the element behind the last node and bubbling up, while extraction of minimum is performed by swapping the root and the last node and then bubbling down. 
The height of the tree is $O(\log n)$, thus insertion and extraction of minimum can be executed in $O(\log n)$ time. 382 | 383 | In the implementation of Dijkstra's algorithm, we use a heap to store the vertices in $V-X$. The key of a vertex is the smallest Dijkstra's greedy score among edges entering it from $X$, i.e. for $v\in V-X$, $key[v]$ is the smallest value of $A[u]+l_{uv}$ over all $u\in X$. If no such edge $u\rightarrow v$ exists, $key[v]=+\infty.$ In each iteration of Dijkstra's algorithm, we extract the minimum of the heap, denote it by $w$, and add $w$ to $X$; $A[w]$ is then the length of the shortest path from $s$ to $w$. Then for all edges $w\rightarrow v$ with $v\in V-X$, we update the key of $v$: 384 | $$key[v]\coloneqq\min\{A[w]+l_{wv},key[v]\}.$$ 385 | If $key[v]$ is decreased here, we bubble it up in the heap (a key can only get smaller, so the vertex can only move towards the root), which is an $O(\log n)$ operation. In this way the heap gets maintained at each iteration. 386 | 387 | In total, we do $n-1$ extractions of minimum, and at most $m$ key updates (each a bubble-up) in the heap. Each of these operations is $O(\log n)$, thus the total time consumption is $O((m+n)\log n)$. Since every vertex is reachable from $s$ ($\forall v\:\exists$ path from $s$ to $v$), we have $m\geq n-1$, hence $O(m+n)=O(m)$. In conclusion, the running time of Dijkstra's algorithm implemented using a heap is $O(m\log n)$. 388 | \ifx\PREAMBLE\undefined 389 | \end{document} 390 | \fi -------------------------------------------------------------------------------- /NPCompleteness.tex: -------------------------------------------------------------------------------- 1 | \ifx\PREAMBLE\undefined 2 | \input{preamble} 3 | \begin{document} 4 | \fi 5 | \chapter{NP Completeness} 6 | Up to now we have been focusing on problems that can be solved in polynomial time. However, many important problems seem impossible to solve efficiently. We will introduce NP-completeness to formalize the computational intractability of these problems, and introduce some algorithmic approaches to NP-complete problems. We will discuss only deterministic algorithms, but this does not affect the correctness of the conclusions drawn from our discussion: it is not likely that a problem requiring exponential time with a deterministic algorithm can be solved by a randomized algorithm in polynomial time. 7 | \section{NP Complete Problems} 8 | \subsection{The Class P} 9 | \begin{definition} 10 | A problem is \textbf{polynomial-time solvable} if there is an algorithm that correctly solves it in $O(n^k)$ time for some constant $k$, where $n$ is the length of the input. 11 | \end{definition} 12 | \begin{definition} 13 | The class \textbf{P} is defined as the set of all polynomial-time solvable problems. 14 | \end{definition} 15 | Every problem we've seen in the course is in the class P except for the cycle-free shortest path problem for graphs with negative cycles, as well as the knapsack problem. The knapsack problem is not known to be polynomial-time solvable: the running time $O(nW)$ of our dynamic programming algorithm is not polynomial in the input length, which is only proportional to $\log W$. 16 | \subsection{Reductions and Completeness} 17 | Computational intractability can be illustrated via relative difficulty, i.e. by showing that a problem is ``as hard as'' a lot of other problems. This requires the definition of reduction. 18 | \begin{definition} 19 | Problem $\Pi_1$ \textbf{reduces} to problem $\Pi_2$ if, given a polynomial time subroutine to solve $\Pi_2$, $\Pi_1$ can be solved in polynomial time based on it. 20 | \end{definition} 21 | Suppose $\Pi_1$ reduces to $\Pi_2$.
If $\Pi_2\in P$, then $\Pi_1\in P$. The contrapositive is also correct: if $\Pi_1\notin P$, then $\Pi_2\notin P$, i.e. $\Pi_2$ is at least as hard as $\Pi_1$. 22 | \begin{definition} 23 | Let C be a set of problems. A problem $\Pi$ is C-complete if 24 | \begin{enumerate} 25 | \item $\Pi\in C$; 26 | \item Every problem in C reduces to $\Pi$. 27 | \end{enumerate} 28 | \end{definition} 29 | A C-complete problem is the hardest problem in C. 30 | \subsection{Traveling Salesman Problem} 31 | \begin{description} 32 | \Input{A completed undirected graph with non-negative edge costs.} 33 | \Output{A minimum cost tour, i.e. a cycle that visits every vertex exactly once with minimum total cost.} 34 | \end{description} 35 | It has been conjectured for a long time that there exists no polynomial-time algorithm for the TSP problem. In order to demonstrate its computational intractability, we would like to show that it is C-complete for a really big set C. C cannot be the set of all problems. For instance, the Halting problem, i.e. given a program and an input for it, determine whether it will halt, has been proved to be unsolvable with any algorithm. TSP is obviously solvable in finite time via brute-force search. A less ambitious yet more promising idea is to try to prove that TSP is as hard as all brute-force solvable problems. 36 | \subsection{The Class NP} 37 | \begin{definition} 38 | A problem is in the class NP\footnote{NP stands for ``nondeterministic polynomial'', instead of ``not polynomial''.} if 39 | \begin{enumerate} 40 | \item Solutions always have length polynomial in the input size; 41 | \item Purported solutions can be verified in polynomial time. 42 | \end{enumerate} 43 | \end{definition} 44 | As an example, looking for a TSP tour with total cost smaller than $T$ is an NP problem. The original TSP problem reduces to this problem via binary search over the threshold $T$. Also, constraint satisfaction problems, e.g. the 3SAT problem, are NP problems. 45 | 46 | The definition of NP ensures that each problem in NP can be solved by brute-force search in exponential time. The first condition implies that the number of candidate solutions is at most exponential in the input size. According to the second condition, each candidate can be verified in polynomial time, thus brute-force search takes at most exponential time. 47 | 48 | The two conditions in the definition of NP are quite weak, thus NP is a really big class. In fact, the majority of natural computational problems are in NP. By definition of completeness, a polynomial-time algorithm for one NP-complete problem can help to solve every problem in NP in polynomial time, which implies that P=NP. Thus NP-completeness is a strong evidence of computational intractability. 49 | 50 | A lot of NP-complete problems have been recognized, including the TSP problem. In general, in order to prove the NP-completeness of a problem, we should prove that a known NP-complete problem can reduce to it. 51 | 52 | Whether P = NP or not is one of the most important open mathematical questions. It is widely conjectured, but not yet proved that P $\neq$ NP. 53 | \subsection{Algorithmic Approaches to NP-complete Problems} 54 | Up to now no polynomial algorithm has been found for NP-complete problems. There are nonetheless a few useful strategies that can help us beat brute-force search. 
55 | \subsubsection{Focus on computationally tractable special cases.} 56 | For example, the max-weight independent set problem is NP-complete for general graphs, but in P for path graphs and trees. Knapsack problems with polynomial capacity, i.e. $W=O(n)$, can be solved in polynomial time. 2SAT is in P, while 3SAT is NP-complete, etc. 57 | \subsubsection{Heuristics} 58 | Heuristics are fast algorithms that are not always correct. We will introduce a few greedy and dynamic-programming based heuristics for the Knapsack problem. 59 | \subsubsection{Find exponential algorithms better than brute-force search.} 60 | The dynamic programming algorithm that we have introduced is one such example. We will introduce more later. 61 | \section{Exact Algorithms for NP-complete Problems} 62 | \subsection{The Vertex Cover Problem} 63 | \begin{description} 64 | \Input{An undirected graph $G=(V,E)$.} 65 | \Output{A minimum-cardinality \textbf{vertex cover}, i.e. a subset $S\subseteq V$ that contains at least one endpoint of each edge of $G$.} 66 | \end{description} 67 | For instance, the minimum size of a vertex cover for a star graph is 1, while for a complete graph (a ``clique'') it is $n-1$. 68 | \begin{theorem} 69 | A set is a vertex cover if and only if its complement is an independent set. 70 | \end{theorem} 71 | \begin{proof} 72 | A set is a vertex cover \\ 73 | $\iff$ Each edge of the graph is incident to at least one vertex in the set \\ 74 | $\iff$ Each edge of the graph is incident to at most one vertex not in the set, i.e. in its complement\\ 75 | $\iff$ Its complement is an independent set. 76 | \end{proof} 77 | In general, the vertex cover problem is an NP-complete problem. We will take the 3rd approach above, i.e. try to devise an exponential algorithm better than brute-force search. 78 | 79 | Consider a simpler problem: given a positive integer $k$, we would like to check whether or not there exists a vertex cover of size $\leq k$. There are in total $\binom{n}{k}$ candidate solutions. Verifying each candidate takes $O(m)$ time, thus for small $k$, brute-force search takes $O(mn^k)$ time. We can actually do better. 80 | \begin{lemma}\textbf{Substructure Lemma} 81 | Consider an undirected graph $G$, an edge $(u,v)\in G$ and an integer $k\geq 1$. Let $G_u$ represent the subgraph of $G$ obtained by deleting $u$ and its incident edges from $G$, and let $G_v$ be defined similarly. Then $G$ has a vertex cover of size $k$ $\iff$ $G_u$ or $G_v$ has a vertex cover of size $k-1$. 82 | \end{lemma} 83 | \begin{proof} 84 | ($\Leftarrow$) 85 | Suppose $G_u$ has a vertex cover $S$ of size $k-1$. Denote all edges inside $G_u$ as $E_u$ and edges incident to $u$ as $F_u$, then $E=E_u\cup F_u$ and $E_u\cap F_u=\emptyset$, as shown in Figure \ref{vertexcoverfigure}. 86 | \begin{figure} 87 | \centering 88 | \includegraphics[width=0.4\textwidth]{vertexcover1.jpg} 89 | \caption{Vertex Cover Substructure Lemma}\label{vertexcoverfigure} 90 | \end{figure} 91 | Since $S$ contains at least one endpoint of each edge in $E_u$, and $u$ covers every edge in $F_u$, $S\cup\{u\}$ is a vertex cover of $G$ of size $k$. 92 | 93 | ($\Rightarrow$) 94 | Let $S$ be a vertex cover of $G$ of size $k$. For the edge $(u,v)$, at least one of $u,v$ is inside $S$; suppose $u\in S$. For any edge in $E_u$, at least one of its endpoints is in $S$, and it cannot be $u$, thus $S-\{u\}$ is a vertex cover of $G_u$ of size $k-1$. 95 | \end{proof} 96 | According to the substructure lemma, we can compute a vertex cover of size $k$ (if one exists) with Algorithm \ref{vertexcover}.
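Before the pseudocode, here is a compact C++ sketch of this $O(2^km)$ search on an edge-list representation; the \texttt{Edge} alias, the recursion interface and the way the cover is reported are choices made for this illustration rather than part of the lecture.
\begin{lstlisting}
#include <utility>
#include <vector>
using namespace std;
using Edge = pair<int,int>;

// Returns true and fills "cover" if the edges can be covered by at most k vertices.
bool vertexCover(const vector<Edge>& edges, int k, vector<int>& cover) {
  if (edges.empty()) return true;            // nothing left to cover
  if (k == 0) return false;                  // edges remain but the budget is exhausted
  Edge e = edges[0];                         // pick an arbitrary edge (u,v)
  for (int u : {e.first, e.second}) {
    vector<Edge> rest;                       // edges of the graph with u deleted
    for (const Edge& f : edges)
      if (f.first != u && f.second != u) rest.insert(rest.end(), f);
    if (vertexCover(rest, k - 1, cover)) {   // recurse with budget k-1
      cover.insert(cover.end(), u);          // u joins the cover
      return true;
    }
  }
  return false;                              // no vertex cover of size at most k
}
\end{lstlisting}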
97 | \begin{algorithm}[ht] 98 | \caption{Vertex Cover Problem}\label{vertexcover} 99 | \begin{algorithmic}[1] 100 | \Input{Undirected graph $G=(V,E)$.} 101 | \Output{Vertex cover of $G$ of size $k$.} 102 | \If{$k=0$} 103 | \If{$V=\emptyset$} 104 | \State{Return $\emptyset$} 105 | \Else\State{Fail} 106 | \EndIf\EndIf 107 | \State{Pick an arbitrary edge $(u,v)\in E$.} 108 | \State{Recursively search for a vertex cover $S$ of size $k-1$ in $G_u$.} 109 | \State{If found, return $S\cup\{u\}$.} 110 | \State{Recursively search for a vertex cover $S$ of size $k-1$ in $G_v$.} 111 | \State{If found, return $S\cup\{v\}$.} 112 | \State{Fail.}\Comment{$G$ has no vertex cover of size $k$.} 113 | \end{algorithmic} 114 | \end{algorithm} 115 | 116 | There are at most $O(2^k)$ recursive calls, each call takes $O(m)$ time (construction of $G_u$ or $G_v$), thus the overall running time is $O(m2^k)$, which is much better than $O(mn^k)$. 117 | 118 | \subsection{Traveling Salesman Problem} 119 | A brute-force search algorithm takes $O(n!)$ time. We will try to devise an algorithm with better running time. 120 | 121 | For every destination $j\in\{1,2,\dots,n\}$ and every subset $S\subseteq\{1,2,\dots,n\}$ that contains 1 and $j$, let $L_{S,j}$ represent the minimum length of a path from 1 to $j$ that visits each vertex in $S$ exactly once. 122 | \begin{lemma}\textbf{Optimal Substructure Lemma} 123 | Let $P$ be a shortest path from 1 to $j$ that visits each vertex in $S$ exactly once. If the last hop of $P$ is $(k,j)$ and the rest of $P$ is $P'$, then $P'$ is a shortest path from 1 to $k$ that visits each vertex in $S-\{j\}$ exactly once. 124 | \end{lemma} 125 | The proof of the optimal substructure lemma is straightforward. According to the lemma, we have the induction relation 126 | $$L_{S,j}=\min_{k\in S,k\neq j}\{L_{S-\{j\},k}+c_{kj}\}.$$ 127 | Algorithm \ref{tsp} is the dynamic programming algorithm based on this relation. 128 | \begin{algorithm}[ht] 129 | \caption{TSP Problem}\label{tsp} 130 | \begin{algorithmic}[1] 131 | \Input{A completed undirected graph with cost $c_{uv}$ for edge $(u,v)$.} 132 | \Output{$2^n\times n$ 2D array $A$ with $A[S][j]=L_{S,j}$.} 133 | \State{Initialize $A[S][1]=\begin{cases} 134 | 0&if\:S=\{1\}\\ 135 | +\infty&otherwise 136 | \end{cases}$} 137 | \For{$m=2,3,\dots,n$}\Comment{$m$ = subproblem size} 138 | \For{each $S\subseteq\{1,2,\dots,n\}$ with $\lvert S\rvert=m$ and $1\in S$} 139 | \For{each $j\in S$ and $j\neq 1$} 140 | \State{$A[S][j]=\min_{k\in S,k\neq j}\{A[S-\{j\}][k]+c_{kj}\}$} 141 | \EndFor\EndFor\EndFor 142 | \State{Return $\min_{j=2,\dots,n}{A[\{1,2,\dots,n\}][j]+c_{j1}}$} 143 | \end{algorithmic} 144 | \end{algorithm} 145 | 146 | There are $n2^n$ subproblems, each of which takes $O(n)$ time. Thus the overall running time is $O(n^22^n)$. 147 | 148 | \section{Heuristics for NP-Complete Problems} 149 | In this section we will use Knapsack problem as an example to illustrate the use of heuristics to tackle NP-complete problems. 150 | \subsection{Greedy Heuristic} 151 | In the Knapsack problem, we aim at maximize the total value of items under limited budget of total weight. Thus, ideal items are those with big value and small weight. A natural greedy choice is to always select the item with the largest $\frac{v}{w}$ ratio. However, this greedy approach has no guarantee of proximity to the optimal solution at all. For the simple example with $v_1=1,w_1=1,v_2=999,w_2=1000,W=1000$, we will end up with result 1, while the optimal is actually 999. 
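To make the failure mode concrete, here is a small self-contained C++ demo of the pure ratio-greedy rule on the instance above; the \texttt{Item} struct and the function name are invented for this example.
\begin{lstlisting}
#include <algorithm>
#include <iostream>
#include <vector>
using namespace std;

struct Item { double v, w; };

// Greedy-by-ratio: take items in decreasing v/w order while they still fit.
double greedyByRatio(vector<Item> items, double W) {
  sort(items.begin(), items.end(),
       [](const Item& a, const Item& b) { return a.v / a.w > b.v / b.w; });
  double used = 0, value = 0;
  for (const Item& it : items)
    if (used + it.w <= W) { used += it.w; value += it.v; }
  return value;
}

int main() {
  // The instance from the text: v1=1,w1=1 and v2=999,w2=1000, with W=1000.
  vector<Item> items = {{1, 1}, {999, 1000}};
  cout << greedyByRatio(items, 1000) << endl;  // prints 1; the optimum is 999
  return 0;
}
\end{lstlisting}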
152 | 153 | A simple modification to this greedy heuristic can provide a fairly good proximity guarantee, as shown in Algorithm \ref{knapsackgreedyheuristic}. 154 | \begin{algorithm}[ht] 155 | \caption{Refined Greedy Heuristic for Knapsack Problem}\label{knapsackgreedyheuristic} 156 | \begin{algorithmic}[1] 157 | \Input{$n$ items with value $v_i$ and weight $w_i$ for item $i$. Total weight budget $W$ (assume that $w_i\leq W,\forall i$).} 158 | \Output{A subset $S$ of the items that maximizes $\sum_{i\in S} v_i$ subject to $\sum_{i\in S}w_i\leq W$.} 159 | \State{Re-index the items so that $\frac{v_1}{w_1}\geq\frac{v_2}{w_2}\geq\dots\geq\frac{v_n}{w_n}$.} 160 | \State{Put the items into $S'$ in order until one does not fit.} 161 | \State{Let $k$ be the item with the largest value.} 162 | \If{$\sum_{i\in S'}v_i>v_k$} 163 | \State{$S=S'$} 164 | \Else\State{$S=\{k\}$.} 165 | \EndIf 166 | \end{algorithmic} 167 | \end{algorithm} 168 | 169 | The value of the solution provided by Algorithm \ref{knapsackgreedyheuristic} is always $\geq 50\%$ of the value of the optimal solution. 170 | \begin{proof} 171 | Suppose $S'$ contains $m$ items, and imagine that we are allowed to put a fraction of item $m+1$ into $S'$ so that the weight budget $W$ is used up exactly. Call this solution the ``greedy fractional solution''. Obviously this solution is at least as good as the optimal solution: it uses up the whole weight budget with the best possible $\frac{v}{w}$ ratios. 172 | 173 | Now let's consider the result of Algorithm \ref{knapsackgreedyheuristic}. It guarantees two inequalities: 174 | \begin{align*} 175 | \sum_{i\in S}v_i&\geq\sum_{i\in S'}v_i,\\ 176 | \sum_{i\in S}v_i&\geq v_k\geq v_{m+1}. 177 | \end{align*} 178 | Summing up the two inequalities, and noting that the greedy fractional solution consists of $S'$ plus at most the whole of item $m+1$, we have 179 | \begin{align*} 180 | 2\sum_{i\in S}v_i&\geq\sum_{i\in S'}v_i+v_{m+1}\\ 181 | &\geq\text{value of the greedy fractional solution}\\ 182 | &\geq\text{value of the optimal solution.} 183 | \end{align*} 184 | Hence the value of the heuristic solution is at least 50\% of the value of the optimal solution. 185 | \end{proof} 186 | The 50\% proximity guarantee is tight, i.e. it is mathematically impossible to universally prove a better guarantee. Consider an example with $v_1=102,w_1=101,v_2=v_3=100,w_2=w_3=100$ and $W=200$. The optimal value is 200, while the solution provided by the refined greedy heuristic has value 102. 187 | 188 | Nonetheless, for specific inputs, the guarantee can be better. If $w_i\leq\delta W,\forall i$, in which $\delta$ is a small value such as 10\%, $S'$ is sure to use up at least $(1-\delta)$ of $W$, thus the value of the heuristic solution is at least $(1-\delta)$ times that of the optimal solution. 189 | \subsection{DP Heuristic} 190 | A dynamic programming heuristic can provide an arbitrary proximity guarantee specified by the user, making it possible to tune the trade-off between accuracy and running time. However, such a scheme does not exist for all NP-complete problems. 191 | 192 | In the last chapter we introduced a DP algorithm that solves the Knapsack problem in $O(nW)$ time. The subproblems were to calculate the maximal total value with total weight at most $x$ using only the first $i$ items, in which $i=0,1,\dots,n$ and $x=0,1,\dots,W$. There is actually another DP algorithm that solves another set of subproblems: for $i=0,1,\dots,n$ and $x=0,1,\dots,nv_{max}$, calculate the minimum total weight $S_{i,x}$ needed to achieve total value $\geq x$ using only the first $i$ items.
The recurrence relation is 193 | \begin{equation*} 194 | S_{i,x} = \min\{S_{i-1,x}, w_i+S_{i-1,x-v_i}\}, 195 | \end{equation*} 196 | in which $S_{i-1,x-v_i}$ is interpreted as 0 if $v_i\geq x$. The algorithm is shown in Algorithm \ref{knapsackdp2}. The overall running time is $O(n^2v_{max})$: there are $O(n^2v_{max})$ subproblems and each of them takes $O(1)$ time. 197 | \begin{algorithm}[ht] 198 | \caption{Another DP Algorithm for Knapsack}\label{knapsackdp2} 199 | \begin{algorithmic}[1] 200 | \Input{$n$ items with value $v_i$ and weight $w_i$ for item $i$. Total weight budget $W$ (assume that $w_i\leq W,\forall i$).} 201 | \Output{$n\times(nv_{max}+1)$ 2D array $A$ with $A[i][x]=S_{i,x}$.} 202 | \State{Base case: $A[0][0]=0$, $A[0][x]=+\infty$ for $x\neq 0$.} 203 | \For{$i=1,2,\dots,n$} 204 | \For{$x=0,1,\dots,nv_{max}$} 205 | \If{$v_i<x$} 206 | \State{$A[i][x]=\min\{A[i-1][x],w_i+A[i-1][x-v_i]\}$} 207 | \Else\State{$A[i][x]=\min\{A[i-1][x],w_i\}$} 208 | \EndIf\EndFor\EndFor 209 | \State{Return the largest $x$ such that $A[n][x]\leq W$.} 210 | \end{algorithmic} 211 | \end{algorithm} 212 | 213 | The DP heuristic runs this algorithm on rounded values: choose a rounding parameter $m>0$, let $\hat{v_i}=\lfloor\frac{v_i}{m}\rfloor$, and solve the instance with values $\hat{v_i}$ (and the original weights) using Algorithm \ref{knapsackdp2}, which takes $O(n^2\hat{v_{max}})$ time. Let $S$ be the solution obtained with the rounded values, and let $S^*$ be an optimal solution with the original values. Since $m\hat{v_i}\leq v_i<m(\hat{v_i}+1)$, and since $S$ is optimal for the rounded values, we have 215 | \begin{equation}\label{npknapsackeq1} 216 | \sum_{i\in S}v_i\geq m\sum_{i\in S}\hat{v_i}\geq m\sum_{i\in S^*}\hat{v_i}> \sum_{i\in S^*}(v_i-m)\geq \sum_{i\in S^*}v_i-mn. 218 | \end{equation} 219 | In order to achieve $(1-\epsilon)$ proximity, we need to guarantee 220 | \begin{equation}\label{npknapsackeq2} 221 | \sum_{i\in S^*}v_i-\sum_{i\in S}v_i\leq\epsilon\sum_{i\in S^*}v_i. 222 | \end{equation} 223 | According to \eqref{npknapsackeq1}, a sufficient condition for \eqref{npknapsackeq2} is 224 | \begin{equation*} 225 | mn\leq \epsilon\sum_{i\in S^*}v_i, 226 | \end{equation*} 227 | which can be satisfied by setting 228 | \begin{equation*} 229 | m=\frac{\epsilon v_{max}}{n}, 230 | \end{equation*} 231 | because $\sum_{i\in S^*}v_i\geq v_{max}$ (every single item fits in the knapsack by itself). In terms of running time, since $\hat{v_{max}}\leq\frac{v_{max}}{m}=\frac{n}{\epsilon}$, the overall running time is $O(n^2\hat{v_{max}})=O(n^3/\epsilon)$. 232 | 233 | In summary, in order to guarantee $(1-\epsilon)$ proximity to the optimal solution, we should solve the Knapsack instance using $\hat{v_i}=\lfloor\frac{nv_i}{\epsilon v_{max}}\rfloor$ as values, and the running time will be $O(n^3/\epsilon)$. 234 | \section{Local Search} 235 | \subsection{Principles of Local Search} 236 | In this section we will introduce a widely used paradigm to address NP-complete problems: local search. 237 | 238 | Let $X$ be a set of solutions to a problem, for example all cuts of a graph, all possible tours of a TSP problem, all possible variable assignments of a constraint satisfaction problem, etc. For each $x\in X$, specify a subset of $X$ as its neighbors. For a graph cut, its neighborhood can be defined as the set of all cuts obtained by switching the side of one vertex; for a TSP tour, each neighbor differs from it by 2 edges; for a CSP assignment, each neighbor differs from it in the value of a single variable; etc. A general local search algorithm has the following structure: 239 | \begin{enumerate} 240 | \item Let $x$ be some solution. 241 | \item While $x$ has a neighbor $y$ superior to itself, set $x\coloneqq y$. 242 | \item Return the locally optimal solution $x$. 243 | \end{enumerate} 244 | 245 | There can be multiple locally optimal solutions of a problem. If $X$ is finite and each iteration is guaranteed to improve some objective function, then the local search algorithm is guaranteed to terminate and converge to one of the locally optimal solutions. However, it usually does not converge quickly. Neither is it guaranteed to provide a good approximation to the global optimal solution. To mitigate this shortcoming, we can run the local search multiple times with randomly chosen starting points, and return the best locally optimal solution found. Different definitions of neighborhood result in different locally optimal solutions.
In general, a bigger neighborhood leads to fewer locally optimal solutions, but makes it more expensive to verify local optimality, which is a quality-speed trade-off worthy of some tuning. 246 | \subsection{Maximum Cut Problem} 247 | \begin{description} 248 | \Input{An undirected graph $G(V,E)$.} 249 | \Output{A cut $(A,B)$ that maximizes the number of crossing edges.} 250 | \end{description} 251 | If the graph is bipartite, the problem is solvable in linear time via BFS: in each connected component, start from any vertex and mark the vertices reachable from it in an odd number of steps with 1 and the rest with 0; every edge then crosses the resulting cut, so it is maximum. However, in general this is an NP-complete problem. We will introduce a local search algorithm for the problem. 252 | 253 | For any cut $(A,B)$ and a vertex $v$, define 254 | \begin{description} 255 | \item[$c_v(A,B)$]number of edges incident on $v$ that cross $(A,B)$. 256 | \item[$d_v(A,B)$]number of edges incident on $v$ that do not cross $(A,B)$. 257 | \end{description} 258 | Then we have the local search algorithm shown in Algorithm \ref{maxcutlocalsearch}. 259 | 260 | \begin{algorithm}[ht] 261 | \caption{Local Search of Maximum Cut}\label{maxcutlocalsearch} 262 | \begin{algorithmic}[1] 263 | \State{Let $(A,B)$ be an arbitrary cut of $G$.} 264 | \While{there is a vertex $v$ with $d_v(A,B)>c_v(A,B)$} 265 | \State{Move $v$ to the other side of the cut.} 266 | \EndWhile 267 | \State{Return the final cut $(A,B)$.} 268 | \end{algorithmic} 269 | \end{algorithm} 270 | For graphs containing no parallel edges, the algorithm terminates within $\binom{n}{2}$ iterations, because there are at most $\binom{n}{2}$ edges and the number of crossing edges increases by at least 1 in every iteration before the algorithm converges. Hence the algorithm completes in polynomial time. 271 | 272 | In the output cut, we have $d_v(A,B)\leq c_v(A,B),\forall v\in V$. Hence 273 | \begin{align*} 274 | \sum_vd_v(A,B)\leq \sum_vc_v(A,B). 275 | \end{align*} 276 | The lhs counts each non-crossing edge twice, while the rhs counts each crossing edge twice, thus 277 | \begin{align*} 278 | 2\cdot(\text{num of non-crossing edges})&\leq 2\cdot(\text{num of crossing edges})\\ 279 | &=2\cdot(\lvert E\rvert-\text{num of non-crossing edges}). 280 | \end{align*} 281 | As a result, we have 282 | $$\text{num of non-crossing edges}\leq\frac{1}{2}\lvert E\rvert,$$ i.e. Algorithm \ref{maxcutlocalsearch} outputs a cut in which the number of crossing edges is at least $\frac{1}{2}\lvert E\rvert$, which is not at all an impressive performance guarantee, because the expected number of crossing edges of a uniformly random cut is already $\frac{1}{2}\lvert E\rvert$. 283 | 284 | In a more general case, each $e\in E$ has a non-negative weight $w_e$, and the aim of the problem becomes to maximize the total weight of the crossing edges. The local search algorithm is still well defined, and there is a similar 50\% performance guarantee. However, the algorithm is no longer guaranteed to converge in polynomial time, because the number of crossing edges is no longer guaranteed to increase in each iteration. The number of iterations can be exponential. 285 | \subsection{The 2SAT Problem} 286 | \begin{description} 287 | \Input{$n$ Boolean variables $x_i$, $i=1,2,\dots,n$.
$m$ clauses of 2 literals each.} 288 | \Output{Whether there exists an assignment to all $x_i$ that simultaneously satisfies all clauses.} 289 | \end{description} 290 | One example of the clauses with $n=m=4$ is $(x_1\lor x_2)\land(\neg x_1\lor x_3)\land(x_3\lor x_4)\land(\neg x_2\lor\neg x_4)$. 291 | 292 | The 2SAT problem is a special case of the CSP that can be solved in polynomial time\footnote{It can be reduced to computing SCCs of a directed graph. The graph contains 2 vertices $x_i$, $\neg x_i$ for each variable, and two edges $\neg x\rightarrow y,\neg y\rightarrow x$ for each clause $x\lor y$. If $x_i$ and $\neg x_i$ are in the same SCC for any $x_i$, there exists no viable assignment.}. 3SAT, on the contrary, is NP-complete. We will analyze a randomized local search algorithm for the 2SAT problem, shown in Algorithm \ref{twosatlocalsearch}. 293 | \begin{algorithm}[ht] 294 | \caption{Papadimitriou's Local Search 2SAT Algorithm}\label{twosatlocalsearch} 295 | \begin{algorithmic}[1] 296 | \For{$i=1$ \textbf{to} $\log n$} 297 | \State{Choose a random initial assignment.} 298 | \For{$j=1$ \textbf{to} $2n^2$} 299 | \If{current assignment satisfies all clauses} 300 | \State{Return this assignment} 301 | \Else\State{Pick an arbitrary unsatisfied clause and flip the value of one of its two variables, chosen uniformly at random.} 302 | \EndIf\EndFor\EndFor 303 | \State{Report that no satisfying assignment exists.} 304 | \end{algorithmic} 305 | \end{algorithm} 306 | Obviously the algorithm runs in polynomial time, and it is always correct when no satisfying assignment exists. The non-trivial part is its performance guarantee when a satisfying assignment does exist. 307 | 308 | To analyze Papadimitriou's algorithm, we first need to consider the problem of random walks on non-negative integers. Starting at 0, the position goes up or down by 1 at each step with 50/50 probability, except when the current position is 0, in which case the next position is 1 with probability 1. Let $T_n$ represent the number of steps until the random walk reaches position $n$. 309 | \begin{theorem} 310 | $E(T_n)=n^2$ 311 | \end{theorem} 312 | \begin{proof} 313 | Let $Z_i$ represent the number of steps needed to get from $i$ to $n$. Then we have 314 | \begin{align*} 315 | E(Z_n)&=0\\ 316 | E(Z_0)&=E(Z_1)+1\\ 317 | E(Z_i)&=\frac{1}{2}E(Z_{i+1})+\frac{1}{2}E(Z_{i-1})+1,\forall i\geq 1 318 | \end{align*} 319 | Therefore we have 320 | \begin{equation*} 321 | E(Z_i)-E(Z_{i+1})=E(Z_{i-1})-E(Z_i)+2,\forall i\geq 1, 322 | \end{equation*} 323 | which, together with $E(Z_0)-E(Z_1)=1$, leads by induction to 324 | \begin{align*} 325 | E(Z_i)-E(Z_{i+1})=2i+1,\forall i\geq 0. 326 | \end{align*} 327 | As a result, 328 | \begin{align*} 329 | E(Z_0)-E(Z_n)=\sum\limits_{i=0}^{n-1}\left(E(Z_i)-E(Z_{i+1})\right)=n^2. 330 | \end{align*} 331 | Hence $E(T_n)=E(Z_0)=n^2.$ 332 | \end{proof} 333 | \begin{corollary}\label{localsearchcorollary} 334 | $P\left(T_n>2n^2\right)<\frac{1}{2}$ 335 | \end{corollary} 336 | \begin{proof} 337 | \begin{align*} 338 | n^2=E(T_n)&=\sum\limits_{i=0}^{2n^2}iP(T_n=i)+\sum\limits_{i=2n^2+1}^{\infty}iP(T_n=i)\\ 339 | &\geq\sum\limits_{i=2n^2+1}^{\infty}iP(T_n=i)\\ 340 | &>2n^2P\left(T_n>2n^2\right). 341 | \end{align*} 342 | Hence $P\left(T_n>2n^2\right)<\frac{1}{2}.$ 343 | \end{proof} 344 | Now we can prove the following performance guarantee of Papadimitriou's local search algorithm. 345 | \begin{theorem} 346 | For a satisfiable 2SAT instance with $n$ variables, Papadimitriou's algorithm produces a satisfying assignment with probability $\geq 1-\frac{1}{n}$.
347 | \end{theorem} 348 | \begin{proof} 349 | Let's focus on one iteration of the outer loop. 350 | 351 | Let $a^*$ represent an arbitrary satisfying assignment (there can be multiple such assignments), and let $a_t$ represent the assignment after $t$ iterations of the inner loop ($t=1,2,\dots,2n^2$). Let $\chi_t$ represent the number of variables on whose value $a^*$ and $a_t$ agree, thus $\chi_t$ is between 0 and $n$. 352 | 353 | If $\chi_t\neq n$, then there must be at least one unsatisfied clause. Suppose the clause concerns $x_i$ and $x_j$. The consequence of flipping the value of $x_i$ \textbf{or} $x_j$ (choose randomly) is: 354 | \begin{itemize} 355 | \item If $a^*$ and $a_t$ disagree on both $x_i$ and $x_j$: $\chi_{t+1}=\chi_t+1$. 356 | \item If $a^*$ and $a_t$ disagree on one of $x_i$ and $x_j$: 357 | \begin{equation*} 358 | \chi_{t+1}=\begin{cases} 359 | \chi_t+1&(50\%\:possibility)\\ 360 | \chi_t-1&(50\%\:possibility)\\ 361 | \end{cases} 362 | \end{equation*} 363 | \end{itemize} 364 | There is an obvious analogy between the behavior of $\chi_t$ and the position in the random walks problem except: 365 | \begin{enumerate} 366 | \item Sometimes $\chi_t$ increases by 1 with possibility 100\%; 367 | \item The process may terminate before $\chi_t=n$ because there could be other viable assignments; 368 | \item Usually we start with $\chi_1>0$ instead of $\chi_1=0$: the miserable situation in which we start with the exact opposite of a satisfying assignment is rare. 369 | \end{enumerate} 370 | All three differences only make it easier for the process to terminate correctly. Let $T$ represent the number of iterations needed inside each inner loop. According to Corollary \ref{localsearchcorollary}, we have $P(T>2n^2)<\frac{1}{2}$. Thus the probability that one iteration of the outer loop ends up with a satisfying assignment is at least $\frac{1}{2}$. With $\log n$ iterations, the probability that we end up with a correct solution is $\geq 1-\frac{1}{2^{\log n}}=1-\frac{1}{n}.$ 371 | 372 | 373 | \end{proof} 374 | \ifx\PREAMBLE\undefined 375 | \end{document} 376 | \fi -------------------------------------------------------------------------------- /DynamicProgramming.tex: -------------------------------------------------------------------------------- 1 | \ifx\PREAMBLE\undefined 2 | \input{preamble} 3 | \begin{document} 4 | \fi 5 | \chapter{Dynamic Programming} 6 | In this chapter we will introduce the last algorithm design paradigm: dynamic programming. 7 | \section{Max-weight Independent Sets} 8 | Our first example of dynamic programming is a relatively simple graph problem. 9 | \begin{description} 10 | \item[Input]A path graph $G(V,E)$ with non-negative weights on vertices. 11 | \item[Output]An independent set, i.e. a subset of $V$ in which no vertices are adjacent, of maximum total weight. 12 | \end{description} 13 | \begin{center} 14 | \begin{tikzpicture} 15 | \tikzstyle{chosen} = [fill=red!20!, circle, draw=red] 16 | \tikzstyle{normal} = [draw, circle] 17 | \node[normal] (0) at (-4,0) {1}; 18 | \node[chosen] (1) at (-2,0) {4}; 19 | \node[normal] (2) at (0,0) {5}; 20 | \node[chosen] (3) at (2,0) {4}; 21 | \draw (0) -- (1) -- (2) --(3); 22 | \end{tikzpicture} 23 | \end{center} 24 | In the example above, the WIS is obviously the two red nodes. Generally, a brute-force approach takes exponential time. An intuitive greedy algorithm does not guarantee a correct answer: it is actually wrong for the simple example above. 
The divide-and-conquer paradigm cannot be applied because there is no natural correct way to combine solutions to the two sub-problems. This is when dynamic programming comes to our rescue. 25 | 26 | Let's consider the structure of an optimal solution in terms of its relationship with solutions to smaller problems. Let $S\subseteq V$ be a max-weight independent set (IS) of $G$, $v_n$ be the last vertex of the path and $v_{n-1}$ be the last but one vertex. Denote $G$ with $v_n$ deleted as $G'$, and $G$ with $v_n,v_{n-1}$ deleted as $G''$. 27 | \begin{itemize} 28 | \item If $v_n\notin S$, then $S$ must also be a max-weight IS of $G'$, which can be proved easily by contradiction. 29 | \item If $v_n\in S$, then $v_{n-1}\notin S$. It can be proved easily by contradiction that $S-\{v_n\}$ is a max-weight IS of $G''$. 30 | \end{itemize} 31 | Therefore, a max-weight IS of $G$ is either a max-weight IS of $G'$, or a max-weight IS of $G''$ + $v_n$. The same reasoning holds for smaller problems, which induces a correct recursive algorithm: 32 | \begin{enumerate} 33 | \item Recursively compute $S_1$ = max-weight IS of $G'$. 34 | \item Recursively compute $S_2$ = max-weight IS of $G''$. 35 | \item Return $S_1$ or $S_2\cup\{v_n\}$, whichever is better. 36 | \end{enumerate} 37 | The correctness of the algorithm can be verified by induction. However it takes exponential time because it is per se a variant of the brute-force algorithm. 38 | \begin{center} 39 | \begin{tikzpicture}[ 40 | level 1/.style={sibling distance=6cm, level distance=1.5cm}, 41 | level 2/.style={sibling distance=3cm, level distance=1.5cm}, 42 | level 3/.style={sibling distance=2cm, level distance=1.5cm}] 43 | level 4/.style={sibling distance=1cm, level distance=1.5cm}] 44 | \tikzstyle{node} = [draw, circle,minimum size=0.8cm] 45 | \node[node] (0) at (0,0) {n} 46 | child {node[node]{n-1} 47 | child {node[node]{n-2} 48 | child {node[node]{n-3}} 49 | child {node[node]{n-4}} 50 | } 51 | child {node[node]{n-3} 52 | child {node[node]{n-4}} 53 | child {node[node]{n-5}} 54 | } 55 | } 56 | child {node[node]{n-2} 57 | child {node[node]{n-3} 58 | child {node[node]{n-4}} 59 | child {node[node]{n-5}} 60 | } 61 | child {node[node]{n-4} 62 | child {node[node]{n-5}} 63 | child {node[node]{n-6}} 64 | } 65 | }; 66 | \end{tikzpicture} 67 | \end{center} 68 | As shown above, each sub-problem is calculated multiple times. The number of distinct sub-problems is actually $O(n)$. If we can reformulate the recursive algorithm into a bottom-up iterative algorithm, and cache the solution to a sub-problem the first time it is solved, the problem can be solved in linear time, as shown in Algorithm \ref{maxweightis}. 69 | \begin{algorithm}[ht] 70 | \caption{Max-weight Independent Set(DP)}\label{maxweightis} 71 | \begin{algorithmic}[1] 72 | \Input{Path graph $G=(V,E)$ with non-negative weight $w_i$ for each vertex $v_i$. Sub graph composed of the first $i$ vertices is denoted by $G_i$.} 73 | \Output{Array $A$ with $A[i]=$ total weight of max-weight IS of $G_i$.} 74 | \State{$A[0]=0,A[1]=w_1$.} 75 | \For{$i = 2,3,\dots,n$} 76 | \State{$A[i]=\max\{A[i-1], A[i-2]+w_i\}$} 77 | \EndFor 78 | \end{algorithmic} 79 | \end{algorithm} 80 | 81 | Algorithm \ref{maxweightis} only outputs the total weight of the max-weight IS of $G$. The IS itself can be reconstructed according to array $A$, as shown in Algorithm \ref{reconstructionis}. The running time is also $O(n)$. 
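For concreteness, the following C++ sketch combines the forward pass of Algorithm \ref{maxweightis} with the backward reconstruction of Algorithm \ref{reconstructionis}; the 1-indexed weight array and the return type are choices made for this example.
\begin{lstlisting}
#include <algorithm>
#include <vector>
using namespace std;

// w[1..n] holds the vertex weights of the path graph; w[0] is unused.
// Returns the indices of a max-weight independent set.
vector<int> maxWeightIS(const vector<long long>& w) {
  int n = (int)w.size() - 1;
  vector<long long> A(n + 1, 0);
  if (n >= 1) A[1] = w[1];
  for (int i = 2; i <= n; ++i)             // forward pass: fill the table
    A[i] = max(A[i - 1], A[i - 2] + w[i]);
  vector<int> S;
  int i = n;
  while (i >= 2) {                         // backward pass: reconstruction
    if (A[i - 1] >= A[i - 2] + w[i]) {
      --i;                                 // vertex i is excluded
    } else {
      S.insert(S.end(), i);                // vertex i is included, skip vertex i-1
      i -= 2;
    }
  }
  if (i == 1) S.insert(S.end(), 1);        // the first vertex is included
  return S;
}
\end{lstlisting}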
82 | 83 | \begin{algorithm}[ht] 84 | \caption{Reconstruction of Max-weight Independent Set}\label{reconstructionis} 85 | \begin{algorithmic}[1] 86 | \Input{Array $A$ computed in Algorithm \ref{maxweightis}.} 87 | \Output{The max-weight IS $S$ of path graph $G$.} 88 | \State{Initialize $S=\emptyset$, $i=n$.} 89 | \While{$i\geq 2$} 90 | \If{$A[i-1]0$ or $j>0$} 205 | \If{$i==0$} 206 | \State{Align all $j$ left characters in $Y$ align with a gap and return} 207 | \ElsIf{$j==0$} 208 | \State{Align all $i$ left characters in $X$ align with a gap and return} 209 | \ElsIf{$A[i][j]==A[i-1][j-1]+\alpha_{ij}$} 210 | \State{Align $x_i$ with $y_j$} 211 | \State{$i=i-1,j=j-1$} 212 | \ElsIf{$A[i][j]==A[i][j-1]+\alpha_{gap}$} 213 | \State{Align $y_j$ with a gap} 214 | \State{$j=j-1$} 215 | \Else\State{Align $x_i$ with a gap} 216 | \State{$i=i-1$} 217 | \EndIf\EndWhile 218 | \end{algorithmic} 219 | \end{algorithm} 220 | \section{Optimal Binary Search Trees} 221 | In a BST, the time consumption of searching for an item $i$ is $O(d_i)$, in which $d_i$ is the depth of $i$ in the BST. If the search for each item is equally likely to occur, the ideal BST should be balanced. Nonetheless, if we have knowledge of the likelihood for each item to be searched for, the optimal BST is not necessarily balanced. More frequently visited items should be put closer to the root. In Huffman code problem, we aimed at minimizing the average coding length; here, our target is to minimize the average search time. 222 | \begin{description} 223 | \Input{Frequencies $p_i$ for items $i=1,2,\dots,n$. } 224 | \Output{An optimal BST that minimizes the average search time 225 | $C(T)=\sum\limits_{i}p_i(d_i+1).$} 226 | \end{description} 227 | For Huffman coding, a bottom-up greedy algorithm efficiently solves the problem. But here either bottom-up or top-down greedy algorithm cannot guarantee an optimal solution. 228 | 229 | Suppose the optimal BST $T$ has left child tree $T_1$, right child tree $T_2$ and root $r$. Items $\{1,\dots,r-1\}$ are contained in $T_1$, while $\{r+1,\dots,n\}$ lie in $T_2$. Then we have 230 | \begin{align*} 231 | C(T)&=\sum\limits_{i=1}^np_i(d_i+1)=p_r + \sum\limits_{i=1}^{r-1}p_i(d_i+1) + \sum\limits_{i=r+1}^{n}p_i(d_i+1)\\ 232 | &=p_r+\sum\limits_{i=1}^{r-1}p_i(d_{1i}+1 + 1) + \sum\limits_{i=r+1}^{n}p_i(d_{2i}+1 + 1)\\ 233 | &=p_r+C(T_1)+\sum\limits_{i=1}^{r-1}p_i + C(T_2)+\sum\limits_{i=r+1}^{n}p_i\\ 234 | &=C(T_1)+C(T_2)+\sum\limits_{i=1}^{n}p_i. 235 | \end{align*} 236 | Thus it can be proved by contradiction that $T_1$ must be optimal for $\{1,\dots,r-1\}$, and $T_2$ must be optimal for $\{r+1,\dots,n\}$. 237 | \begin{lemma}\textbf{Optimal Structure Lemma} 238 | If $T$ is an optimal BST for the keys $\{1,\dots,n\}$ with root $r$, then its subtrees $T_1$ and $T_2$ must be optimal BSTs respectively for $\{1,\dots,r-1\}$ and $\{r+1,\dots,n\}$. 239 | \end{lemma} 240 | For $1\leq i\leq j\leq n$, let $C_{ij}$ represent the average search time of an optimal BST for items $\{i,\dots,j\}$. According to the optimal structure lemma, we can set up the recurrence relation: 241 | \begin{equation*} 242 | C_{ij}=\min\limits_{r=i}^j\left(C_{i,r-1}+C_{r+1,j}\right)+\sum\limits_{k=i}^jp_k, 243 | \end{equation*} 244 | which leads to the DP algorithm \ref{optimalbstdp} to solve the problem. 245 | \begin{algorithm}[ht] 246 | \caption{Optimal BST(DP)}\label{optimalbstdp} 247 | \begin{algorithmic}[1] 248 | \Input{Frequencies $p_i$ for items $i=1,2,\dots,n$. 
} 249 | \Output{$n\times n$ 2D array $A$ with $A[i][j]$ representing optimal average search time for items $\{i,\dots,j\}$. } 250 | \State{Initialize $A[i][i]=p_i$ for $i=1,\dots,n$.}\Comment{Base case: single node.} 251 | \For{$s=0$ \textbf{to} $n-1$}\Comment{$A[i][j]=0$ if $i>j$ or $i,j$ out of bound. } 252 | \For{$i=1$ \textbf{to} $n-s$} 253 | \State{$A[i][i+s]=\min\limits_{r=i}^{i+s}\left(A[i][r-1]+A[r+1][i+s]\right)+\sum\limits_{k=i}^{i+s}p_k$} 254 | \EndFor\EndFor 255 | \end{algorithmic} 256 | \end{algorithm} 257 | 258 | In total there are $O(n^2)$ sub-problem, and each requires $O(j-i)$ time. Hence the overall running time is $O(n^3)$. Nonetheless it has been proven that an optimized version of the algorithm takes only $O(n^2)$ time. 259 | \section{Bellman Ford Algorithm} 260 | We've already introduced Dijkstra's algorithm to solve the shortest path problem when edge lengths are non-negative. Bellman Ford algorithm comes to our rescue when there exist edge lengths with negative lengths. Also, it provides a distributed alternative to Dijkstra's algorithm, which is needed to solve the Internet routing problem. 261 | \begin{description} 262 | \Input{Directed graph $G(V,E)$ with edge lengths $c_e$ for each $e\in E$ and source vertex $s\in V$. } 263 | \Output{For every destination $v\in V$, compute the length of a shortest $s-v$ path.} 264 | \end{description} 265 | We face a dilemma when it comes to negative cycles: if we take them into account in the search for shortest paths, a lot of vertices will end up with shortest paths with length $-\infty$. If we require that they should not be included in the paths, the problem becomes unsolved in polynomial time, i.e. NP hard. For the moment, we just assume that they do not exist in the graphs. Later we will introduce a criteria that detects negative cycles with little increase in the amount of workload. 266 | \subsection{Algorithm} 267 | If there exists no negative cycles, then the shortest $s-v$ path for any vertex $v$ contains at most $n-1$ edges, because any path containing $n$ edges is sure to contain a cycle, and we can always get rid of the cycle, thus reducing the total cost, without breaking the reachability from $s$ to $v$. This inspires us of the definition of the subproblems in the dynamic programming paradigm: a shortest path from $s$ to $v$ with $i$ edges, in which $i=0,1,\dots,n-1$. We have the following lemma correct even for graphs with negative cycles. 268 | \begin{lemma} 269 | Let $G(V,E)$ be a directed graph with edge length $c_e$ for edge $e$ and source vertex $s$. For any vertex $v$, let $P$ represent the shortest path from $s$ to $v$ with at most $i$ paths. Then one of the two cases must be true. 270 | \begin{description} 271 | \item[case 1]If $P$ has $\leq(i-1)$ edges, then it is also a shortest $s-v$ path with $\leq(i-1)$ edges. 272 | \item[case 2]If $P$ contains $i$ edges with $(w,v)$ being the last hop, then $P'=P-(w,v)$ must be the shortest $s-w$ path with $\leq(i-1)$ edges. 273 | \end{description} 274 | \end{lemma} 275 | The lemma can be easily proved by contradiction. It can serve as the recursion relation for our dynamic programming algorithm. The number of candidates for the solution to the subproblem for vertex $v$ of size $i$ , whose length we will denote as $L_{i,v}$, is $1+in-degree(v)$: solution to the subproblem with size $i-1$, and every incident edge $(w,v)$ + solution to the subproblem for $w$ of size $i-1$, i.e. 
276 | \begin{equation*} 277 | L_{i,v} = \min\left\{L_{i-1,v}, \min\limits_{(w,v)\in E}\left(L_{i-1,w}+c_{wv}\right)\right\},\:\forall v\in V. 278 | \end{equation*} 279 | \begin{algorithm}[ht] 280 | \caption{Bellman Ford Algorithm}\label{bellmanford} 281 | \begin{algorithmic}[1] 282 | \Input{Directed graph $G(V,E)$ with edge length $c_e$ for all $e$ and source vertex $s$.} 283 | \Output{$n\times n$ 2D array $A$ with $A[i][v]=L_{i,v}$, in which $i=0,1,\dots,n-1$, $v\in V$.} 284 | \State{Initialize $A[0][s]=0$ and $A[0][v]=+\infty$ for all $v\neq s$.} 285 | \For{$i=1$ \textbf{to} $n-1$} 286 | \For{each $v\in V$} 287 | \State{$A[i][v]=\min\left\{A[i-1][v], \min\limits_{(w,v)\in E}\left(A[i-1][w]+c_{wv}\right)\right\}$} 288 | \EndFor\EndFor 289 | \end{algorithmic} 290 | \end{algorithm} 291 | 292 | As long as $G$ contains no negative cycle, $A[n-1][v]$ is guaranteed to be the length of the shortest $s-v$ path for any $v\in V$. The running time of Algorithm \ref{bellmanford} is $O(mn)$, because there are $n$ iterations in the outer loop, and each edge is examined exactly once in the inner loop. Note that if or some $jw_j$ and $l_i>l_j$? This can be achieved by assigning scores to jobs, and scheduling jobs with higher scores in front. The score has to increase with weight and decrease with length. Two intuitive choices are 27 | \begin{itemize} 28 | \item $w_j-l_j$ 29 | \item $w_j/l_j$ 30 | \end{itemize} 31 | A simple 2-job case with $l_1=5,\allowbreak w_1=3$ and $l_2=2,\allowbreak w_2=1$ rules the first option out. We will try to prove the correctness of the second option, whose correctness is absolutely not trivial. 32 | \begin{proof} 33 | First we assume that all jobs have distinct scores, i.e. $w_i/l_i\neq w_j/l_j$ for $i\neq j$. This case can be addressed via contradiction. 34 | 35 | The $n$ jobs can be renamed so that $\frac{w_1}{l_1}>\frac{w_2}{l_2}>\dots>\frac{w_n}{l_n}$. According to the rule above, the optimal order should be $1,2,\dots,n$. Suppose that there exists an order superior to this one. In this order, there must exist at least one pair of consecutive jobs $i,j$ such that $i>j$ but $i$ is behind $j$. If we exchange $i,j$, the completion time of $i$ will decrease by $l_j$, whilst that of $j$ will increase by $l_i$. In total, the weighted sum of completion times decreases by 36 | $$w_il_j-w_jl_i.$$ 37 | Since $i>j$, we must have $\frac{w_i}{l_i}>\frac{w_j}{l_j}$, thus $w_il_j>w_jl_i$. In conclusion, we have obtained a better order than the one supposed to be the optimal, which negates the initial assumption. 38 | 39 | With similar argument, the correctness of the algorithm can be verified for the general case with possible ties in score. In an arbitrary order of the jobs, a consecutive pair $(i,j)$ with $i>j$ and $i$ behind $j$ can be called an inversion\footnote{The definition of inversion here is different from that in Algorithm \ref{inversioncounting}.} The number of inversions is at most $\frac{n(n-1)}{2}$, and the only order without any inversion is $1,2,\dots,n$. Since we have $w_il_j\geq w_jl_i$ for $i>j$, exchanging an inversion is guaranteed not to increase the weighted sum of completion times. Each exchange decreases the number of inversions strictly by one. Thus after at most $\frac{n(n-1)}{2}$ exchanges, we must arrive at the order $1,2,\dots,n$ with no increase of the weighted sum. Therefore $1,2,\dots,n$ is at least as good as any other order in terms of weighted sum of completion times, making it a guaranteed optimal solution to the scheduling problem. 
40 | \end{proof} 41 | \section{Minimum Spanning Tree} 42 | Minimum spanning tree is a problem to which there exist a bunch of correct and fast greedy solutions. We will discuss two of them: Prim's algorithm and Kruskal's algorithm. 43 | \begin{description} 44 | \item[Input]An undirected graph $G(V,E)$ with a possibly negative cost $c_e$ for each $e\in E$. 45 | \item[Output]A minimum cost tree $T\subseteq E$ that spans all vertices, i.e. connected subgraph $(V,T)$ that contains no cycles with minimum sum of edge costs. 46 | \end{description} 47 | In order to facilitate the discussion, we assume that graph $G$ is connected, and that edge costs are distinct, although Prim and Kruskal remain correct for ties in edge costs. 48 | \subsection{Prims' Algorithm} 49 | Prim's MST algorithm is shown in Algorithm \ref{primmst}. 50 | \begin{algorithm}[ht] 51 | \caption{Prim's MST Algorithm}\label{primmst} 52 | \begin{algorithmic}[1] 53 | \Input{Undirected graph $G(V,E)$ with distinct cost $c_e$ for all $e\in E$.} 54 | \Output{MST of $G$} 55 | \State{Initialize $X=\{s\}$, $s\in V$ chosen arbitrarily} 56 | \State{Initialize $T=\emptyset$} 57 | \While{$X\neq V$} 58 | \State{Let $e=(u,v)$ be the cheapest edge of $G$ with $u\in X, v\notin X$} 59 | \State{Add $e$ to $T$} 60 | \State{Add $v$ to $X$} 61 | \EndWhile 62 | \end{algorithmic} 63 | \end{algorithm} 64 | \subsubsection{Correctness} 65 | We will prove its correctness in two steps. First, we verify that it does compute a spanning tree $T^*$. Then we prove that $T^*$ is an MST. 66 | \begin{lemma}\textbf{(Empty Cut Lemma)}\label{emptycutlemma} 67 | A graph is not connected if and only if $\exists$ cut $(A,B)$ with no crossing edges. 68 | \end{lemma} 69 | \begin{proof} 70 | $(\Leftarrow)$The proof is trivial. Just take vertex $u\in A$ and $v\in B$. There cannot exist any edges between $u,v$, thus the graph is not connected. 71 | 72 | $(\Rightarrow)$Suppose there exists no path between $u,v$. Take $A=\{$All vertices reachable from $u\}$, i.e. the connected component of $u$, $B=\{$All other vertices$\}$, i.e. other connected components. Then there exists no crossing edges of the cut $(A,B)$. 73 | \end{proof} 74 | \begin{lemma}\label{doublecrossinglemma} 75 | \textbf{(Double Crossing Lemma)} 76 | Suppose the cycle $C\subseteq E$ has an edge crossing the cut $(A,B)$, then there must exist some other edge $e\in C$ that crosses the same cut. 77 | \end{lemma} 78 | \begin{corollary} 79 | \textbf{(Lonely Cut Corollary)}\label{lonelycutcorollary} 80 | If $e$ is the only edge crossing a cut $(A,B)$, then it is not contained in any cycle. 81 | \end{corollary} 82 | With the lemmas and the corollaries above, we can prove that Prim's algorithm outputs a spanning tree. 83 | \begin{proof} 84 | It can be proved by induction that Prim's algorithm maintains the invariant that $T$ spans $X$. The proof of connectivity is trivial. No cycle can be created in $T$ because each time an edge $e$ is added into $T$, it becomes the only crossing edge of the cut $(X,\{v\})$ of $T$, and therefore cannot be contained in a cycle according to Corollary \ref{lonelycutcorollary}. 85 | 86 | The algorithm cannot get stuck when $X\neq V$, because otherwise the cut $(X,V-X)$ must be empty, and according to Lemma \ref{emptycutlemma}, the graph would be disconnected. 87 | 88 | As a conclusion, Prim's algorithm is guaranteed to output a spanning tree of the original graph. 89 | \end{proof} 90 | The second part of the proof is based on the cut property. 
91 | \begin{theorem}\label{cutproperty} 92 | \textbf{(Cut Property)} 93 | Consider an edge $e$ of graph $G$. If $\exists$ cut $(A,B)$ such that $e$ is the cheapest crossing edge of the cut, then $e$ belongs to the\footnote{We use ``the'' rather than ``a'' because the MST is unique if edge costs are distinct.} MST of $G$. 94 | \end{theorem} 95 | \begin{proof} 96 | The cut property can be proved by exchange argument. 97 | 98 | Suppose there is an edge $e$ that is the cheapest crossing edge of a cut $(A,B)$, yet $e$ is not in the MST $T^*$. As shown in Figure \ref{proofcutproperty}, in which all blue edges form the minimum spanning tree $T^*$, and the minimum crossing edge $e$ of $(A,B)$ is not contained in $T^*$. 99 | \begin{figure}[ht] 100 | \centering 101 | \begin{tikzpicture} 102 | \tikzstyle{nodestyle} = [circle, draw=blue, fill=blue!20!] 103 | \tikzstyle{usualedge} = [blue, very thick] 104 | \tikzstyle{emphedge} = [red, very thick] 105 | \tikzstyle{label} = [midway, above] 106 | \node (A) at (-2,0.5) {A}; 107 | \node (B) at (2,0.5) {B}; 108 | \node[nodestyle](1) at (-2,0){1}; 109 | \node[nodestyle](2) at (-2,-1){2}; 110 | \node[nodestyle](3) at (-2,-2){3}; 111 | \node[nodestyle](4) at (2,0){4}; 112 | \node[nodestyle](5) at (2,-1){5}; 113 | \node[nodestyle](6) at (2,-2){6}; 114 | \draw[usualedge] (1) -- (2); 115 | \draw[usualedge] (2) -- (3); 116 | \draw[usualedge] (1) -- (4) node[label] {f}; 117 | \draw[emphedge] (2) -- (5) node[label] {e}; 118 | \draw[usualedge] (3) -- (6) node[label] {e'}; 119 | \draw[usualedge] (5) -- (6); 120 | \draw (2) ellipse (1cm and 2cm); 121 | \draw (5) ellipse (1cm and 2cm); 122 | \end{tikzpicture} 123 | \caption{Proof of cut property}\label{proofcutproperty} 124 | \end{figure} 125 | 126 | We cannot exchange $e$ with a random crossing edge of the cut $(A,B)$. In this example, if we exchange $e$ with $f$, we no longer have a spanning tree. Rather, if $e$ is exchanged with $e'$, we obtain a spanning tree with smaller cost than $T^*$. Our task is to prove that such an edge $e'$ always exists. 127 | 128 | Since $T^*$ is a spanning tree, $T^*\cup\{e\}$ must contains a cycle that includes $e$. According to Lemma \ref{doublecrossinglemma}, there must exist another edge $e'$ that crosses the cut $(A,B)$. According to the assumption, $e'$ must be more expensive than $e$. By substituting $e'$ with $e$ in $T^*$, we obtain a spanning tree $(T^*-\{e'\})\cup\{e\}$ with smaller cost than $T^*$, which contradicts with our assumption that $T^*$ is the MST. 129 | \end{proof} 130 | 131 | According to the cut property, each edge selected in Prim's algorithm is guaranteed to be part of the MST. Since we obtain a spanning tree in the end, it must be the MST. 132 | \subsubsection{Implementation} 133 | A brute-force implementation of Prim's algorithm has $O(nm)$ running time. In each iteration, all edges going out of $X$ needs to be scanned and the cheapest among them is chosen. The scanning process is $O(m)$, and there are totally $n$ iterations, thus the overall running time is $O(nm)$. This may not seem fast, but considering the fact than there exist $2^m$ possible sub-graphs among which the MST has to be selected, it already provides plenty of performance amelioration. 134 | 135 | Using heap can make Prim's algorithm run even faster. A straightforward idea is to use the heap to store crossing edges of the cut $(X,V-X)$, as shown in Algorithm \ref{primmstheap1}. 
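As one possible concrete rendering of this idea, the C++ sketch below keeps the crossing edges in an ordered \texttt{std::set} and discards stale entries lazily when they are extracted, instead of deleting them explicitly as Algorithm \ref{primmstheap1} does; the adjacency-list types and the fact that only the total cost is returned are simplifications made for this example.
\begin{lstlisting}
#include <set>
#include <tuple>
#include <utility>
#include <vector>
using namespace std;

// adj[u] holds (cost, neighbor) pairs; vertices are 0..n-1; the graph is connected.
// Returns the total cost of the MST; the chosen edges could be recorded analogously.
long long primMST(const vector<vector<pair<long long,int>>>& adj) {
  int n = adj.size();
  vector<bool> inX(n, false);
  set<tuple<long long,int,int>> H;  // ordered set acting as a heap of edges (cost, u, v)
  long long total = 0;
  inX[0] = true;                    // arbitrary start vertex s = 0
  for (const auto& e : adj[0]) H.insert({e.first, 0, e.second});
  for (int added = 1; added < n; ) {
    auto [c, u, v] = *H.begin();    // cheapest stored edge
    H.erase(H.begin());
    if (inX[v]) continue;           // stale entry: both endpoints already in X
    inX[v] = true;                  // (u,v) is the cheapest crossing edge
    total += c;
    ++added;
    for (const auto& e : adj[v])
      if (!inX[e.second]) H.insert({e.first, v, e.second});
  }
  return total;
}
\end{lstlisting}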
136 | \begin{algorithm}[ht] 137 | \caption{Prim's MST Algorithm, Heap Implementation 1}\label{primmstheap1} 138 | \begin{algorithmic}[1] 139 | \Input{Undirected graph $G(V,E)$ with distinct cost $c_e$ for all $e\in E$.} 140 | \Output{MST of $G$} 141 | \State{Initialize an empty heap $H$ and an empty set of nodes $X$.} 142 | \State{Randomly select a node $s$ and add it to $X$.} 143 | \State{Insert all edges $(s,v)$ into the heap.} 144 | \While{$X\neq V$} 145 | \State{Add edge $e=H.extractMin()$ to $T$.} 146 | \State{Add the node $n$ of $e$ not in $X$ to X.} 147 | \For{each edge $(n,v)\in E$} 148 | \If{$v\in V-X$} 149 | \State{Insert $(n,v)$ into $H$.} 150 | \ElsIf{$(n,v)\in H$} 151 | \State{Delete $(n,v)$ from $H$.} 152 | \EndIf 153 | \EndFor 154 | \EndWhile 155 | \end{algorithmic} 156 | \end{algorithm} 157 | Each edge can be inserted into and deleted from the heap at most once respectively, thus the overall running time is $O(m\log m)$. 158 | 159 | The heap can also be used to store vertices in $V-X$, with the key of node $v$ being the cost of the cheapest edge $(u,v)$ with $u\in X$, or $\infty$ if such edge does not exist. The implementation is shown in Algorithm \ref{primmstheap2}. 160 | \begin{algorithm}[ht] 161 | \caption{Prim's MST Algorithm, Heap Implementation 2}\label{primmstheap2} 162 | \begin{algorithmic}[1] 163 | \Input{Undirected graph $G(V,E)$ with distinct cost $c_e$ for all $e\in E$} 164 | \Output{MST of $G$} 165 | \State{Initialize heap $H$ with all nodes in $V$. Obviously all nodes have key $\infty$.} 166 | \State{Initialize empty set of nodes $X$.} 167 | \While{$H$ is not empty} 168 | \State{Add node $v=H.extractMin()$ to $X$.} 169 | \State{Add the edge associated with $v$ to $T$ if it exists.} 170 | \For{each edge $(v,w)\in E$} 171 | \If{$w\in V-X$ \textbf{and} $cost(v,w)2$. 377 | 378 | The base case is trivial: for an alphabet containing 2 characters, 1 bit is enough for the encoding. Suppose the algorithm is correct for any $k\leq n$, in which $n\geq 2$. Let's denote the alphabet for $n$ as $\Sigma'$, and the tree obtained by the algorithm to encode $\Sigma'$ as $T'_0$, which contains a leaf $ab$ that will be split into siblings $a,b$ to obtain tree $T_0$ that will be used to encode $\Sigma$. 379 | 380 | For any tree $T'$ used to encode $\Sigma'$, and tree $T$ obtained by splitting $ab$ of $T'$ into $a$ and $b$, since $p_{ab}=p_a+p_b$ and $d_{a}=d_{b}=d'_{ab}+1$, we have 381 | \begin{equation}\label{proofhuffman} 382 | L(T)-L(T')=p_ad_a+p_bd_b-p_{ab}d'_{ab}=p_a+p_b. 383 | \end{equation} 384 | According to our assumption, $T'_0$ produces the minimum average encoding length $L(T')$ among all possible $T'$. \eqref{proofhuffman} verifies that $L(T)$ has a constant difference from $L(T')$, thus $T_0$ is the optimal choice among all $T$, i.e. all encoding trees of $\Sigma$ that have $a$ and $b$ as siblings. Next we will prove by exchange argument that this optimum is guaranteed to be the overall optimum. 385 | 386 | Encoding trees of $\Sigma$ are guaranteed to have the following properties: 387 | \begin{itemize} 388 | \item Each node has either no child or two children. 
If a node has only one child, as shown in the following figure, node B can simply be removed.\\ 389 | \begin{figure}[H] 390 | \centering 391 | \begin{tikzpicture} 392 | \tikzstyle{node}=[circle, draw] 393 | \node[node] (0) at (0,0) {A}; 394 | \node[node,red] (1) at (-1,-1) {B}; 395 | \node[node] (2) at (-2,-2) {C}; 396 | \node[node] (3) at (4,-0.5) {A}; 397 | \node[node] (4) at (3,-1.5) {C}; 398 | \draw (0) -- (1); 399 | \draw (1) -- (2); 400 | \draw (3) -- (4); 401 | \draw[->] (0.5,-1) -- (2.5,-1); 402 | \end{tikzpicture} 403 | \end{figure} 404 | \item Leaf nodes at the same level can be interchanged arbitrarily without affecting the average encoding length. 405 | \end{itemize} 406 | 407 | For an encoding tree that do not have $a$ and $b$ as siblings but having them at the same level, we can always find another tree with the same average encoding length having $a,b$ as siblings. Suppose that an encoding tree $T$ with $a,b$ at different levels has the minimum average encoding length. There is at least one of them not at the bottom level, which let's suppose is $a$. There is at least one node other than $b$ in the bottom level, which let's suppose is $x$. Up to now we have $p_a1$} 420 | \State{$a=selectMin(Q_1,Q_2)$} 421 | \State{$b=selectMin(Q_1,Q_2)$} 422 | \State{Push $ab$ with $p_{ab}=p_a+p_b$ to $Q_2$.}\Comment{$a,b$ are siblings and $ab$ is their parent.} 423 | \EndWhile 424 | \State{The only node left in $Q_2$ is the root of $T$.} 425 | \Function{$selectMin$}{$Q_1,Q_2$} 426 | \State{$a=Q_1.front(), b=Q_2.front()$} 427 | \State{Pop the smaller between $a,b$ from its queue and return it.} 428 | \EndFunction 429 | \end{algorithmic} 430 | \end{algorithm} 431 | 432 | \ifx\PREAMBLE\undefined 433 | \end{document} 434 | \fi -------------------------------------------------------------------------------- /DataStructures.tex: -------------------------------------------------------------------------------- 1 | \ifx\PREAMBLE\undefined 2 | \input{preamble} 3 | \begin{document} 4 | \fi 5 | \chapter{Data Structures} 6 | Data structures help us organize data so that it can be accessed quickly and usefully. Different data structures support different sets of operations, thus are suitable for different tasks. 7 | \section{Heap} 8 | A heap, also named a priority queue, is a container for objects with comparable keys. It should support at least two basic operations: insertion of new object, and extraction(i.e. removal) of the object with minimum\footnote{A heap can also support extraction of object with maximum key, but extract-min and extract-max cannot be supported simultaneously.} key. Both operations are expected to take $O(\log n)$ time. Typical heap implementations also support deletion of an object from the key, which is also $O(\log n)$. The construction of a heap, namely ``heapify'', takes $O(n)$ rather than $O(n\log n)$. 9 | \subsection{Use Cases} 10 | Heap can be used for sorting. First construct a heap with the $n$ items to be sorted, and then execute extract-min $n$ times. The process takes $O(n\log n)$ time, which is already the optimal running time for comparison based sorting algorithms. 11 | 12 | We've already covered the use of a heap to accelerate Dijkstra's algorithm in the previous chapter. 13 | 14 | An interesting use case of heap is median maintenance. We define the median of a sorted sequence of $n$ items $x_1,\dots,x_n$ to be $x_{(n+1)/2}$, for example $x_4$ for 8 items and $x_5$ for 9 items. 
15 | \begin{description} 16 | \item[Input]A sequence of unsorted items $x_1,x_2,\dots,x_n$ provided one-by-one. 17 | \item[Output]At each step $i$, calculate the median of $x_1,\dots,x_i$ in $O(\log i)$ time. 18 | \end{description} 19 | The problem can be solved using two heaps, as shown in Algorithm \ref{medianmaintenance}. For convenience, we assume that the heaps used here support not only the extraction of min/max, but also checking the key value of the min/max without removing it.
20 | \begin{algorithm}[ht] 21 | \caption{Median Maintenance using Heaps}\label{medianmaintenance} 22 | \begin{algorithmic}[1] 23 | \InputOutput\Statex{see above} 24 | \State{Initialize empty MaxHeap that supports extract-max}\Comment{Stores smaller half} 25 | \State{Initialize empty MinHeap that supports extract-min}\Comment{Stores larger half} 26 | \For{$i$ = 1 \textbf{to} $n$} 27 | \If{MaxHeap is nonempty \textbf{and} $x_i<$ MaxHeap.checkMax()}\Comment{Should insert into smaller half} 28 | \State{MaxHeap.insert($x_i$)} 29 | \Else\Comment{insert into larger half} 30 | \State{MinHeap.insert($x_i$)} 31 | \EndIf 32 | \If{MinHeap.size() - MaxHeap.size() == 2}\Comment{If unbalanced, balance the two halves} 33 | \State{MaxHeap.insert(MinHeap.extractMin())} 34 | \ElsIf{MaxHeap.size() - MinHeap.size() == 2} 35 | \State{MinHeap.insert(MaxHeap.extractMax())} 36 | \EndIf 37 | \If{MinHeap.size() $>$ MaxHeap.size()}\Comment{Set median} 38 | \State{median = MinHeap.checkMin()} 39 | \Else\State{median = MaxHeap.checkMax()} 40 | \EndIf 41 | \EndFor 42 | \end{algorithmic} 43 | \end{algorithm}
44 | \subsection{Implementation} 45 | A heap can be conceptually thought of as a binary tree that is as complete as possible, i.e. null leaves are only allowed at the lowest level. The key of any node should be smaller than or equal to the keys of its children, if there are any. This guarantees that the object at the root of the tree has the smallest key. This tree can be implemented as an array, with the root at the first position, and nodes at lower levels sequentially concatenated afterwards. If the array $A$ is 1-indexed, then the parent of $A[i]$ is $A[\lfloor i/2\rfloor]$, and the children of this node are $A[2i]$ and $A[2i+1]$. 46 | 
47 | With the array representation of a heap, insertion can be implemented as follows: 48 | \begin{itemize} 49 | \item Put the new object at the end of the array. 50 | \item As long as the key of the new object is smaller than that of its parent, bubble it up. 51 | \end{itemize} 52 | And extract-min can be implemented as follows: 53 | \begin{itemize} 54 | \item Remove the root. 55 | \item Move the last object in the array to the first position. 56 | \item As long as the key of the object at the root is larger than that of at least one of its children, sink it down. If the keys of both children are smaller, the child with the smaller key should be used in the sink-down. 57 | \end{itemize} 58 | The height of the tree is $O(\log n)$, thus either bubble-up or sink-down can be executed at most $O(\log n)$ times, which guarantees that the two operations take $O(\log n)$ running time.
59 | \section{Binary Search Tree} 60 | A sorted array supports quick search of an element in $O(\log n)$ time, but it takes $O(n)$ time to insert or delete an element. A binary search tree (BST) is a data structure that supports both quick search and quick insertion / deletion. 61 | \subsection{Basic Operations} 62 | Each node of a BST contains the key and three pointers to other nodes: the left child, the right child and the parent. Some of the three pointers can be null.
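In C++, such a node might be declared as follows, together with the recursive search routine described below (a minimal sketch; the struct and function names are ours and the key type is fixed to \texttt{int} for simplicity):
\begin{lstlisting}
// Minimal BST node: a key plus pointers to left child, right child and parent.
struct Node {
    int key;
    Node* left;
    Node* right;
    Node* parent;
};

// Recursive search: returns the node whose key equals k, or nullptr if absent.
Node* search(Node* root, int k) {
    if (root == nullptr || root->key == k) return root;
    if (k < root->key) return search(root->left, k);
    return search(root->right, k);
}
\end{lstlisting}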
The most important property of a BST is that for any node, all nodes in its left subtree have smaller keys than itself, while all nodes in its right subtree have larger keys. The height of a BST is at least $\log n$ and at most $n$. A \textbf{balanced} BST supports search, insertion and deletion in $O(\log n)$ time. But if it is not balanced, these operations can take as long as $O(n)$ time. Some of its basic operations are listed below.
63 | \begin{description} 64 | \item[search]In order to search for a node with a specific key value $k$: 65 | \begin{itemize} 66 | \item Start from the root node. 67 | \item If the node is null or its key is equal to $k$, return this node. 68 | \item If $k$ is smaller than its key, recursively search its left child. 69 | \item If $k$ is larger than its key, recursively search its right child. 70 | \end{itemize} 71 | \item[insert]In order to insert a new node with key value $k$: 72 | \begin{itemize} 73 | \item Start from the root node. 74 | \item If the node is null, make a new node here with key value $k$. 75 | \item If $k$ is smaller than its key, go to its left child. 76 | \item If $k$ is larger than its key, go to its right child. 77 | \end{itemize} 78 | \item[max]In order to obtain the node with the maximum key value: 79 | \begin{itemize} 80 | \item Start from the root node. 81 | \item As long as the node has a right child, go to its right child. 82 | \item Return the node. 83 | \end{itemize} 84 | \item[min]Similar to max. 85 | \item[successor]In order to obtain the successor of a node with key value $k$: 86 | \begin{itemize} 87 | \item If the node has a right child, return the min of its right subtree. 88 | \item Otherwise recursively go to its parent, until the key becomes larger than $k$. 89 | \end{itemize} 90 | \item[predecessor]Similar to successor. 91 | \item[in order traversal]In order to traverse all nodes of a BST in order: 92 | \begin{itemize} 93 | \item Start from the root node. 94 | \item If the node is null, stop. 95 | \item Recursively traverse the left child. 96 | \item Do something to the node, e.g. print its key. 97 | \item Recursively traverse the right child. 98 | \end{itemize} 99 | \item[delete]In order to delete a node with key value $k$: 100 | \begin{itemize} 101 | \item Search for the node. 102 | \item If it has no child, simply remove it. 103 | \item If it has 1 child, replace it with its child. 104 | \item If it has 2 children, find its predecessor, which is guaranteed to have at most 1 child, and swap their keys. Then delete the node (currently at its predecessor's old position). 105 | \end{itemize} 106 | \end{description}
107 | Sometimes a tree node can contain some information about the tree itself, for example the size of the subtree rooted at this node. For each node $n$, we have 108 | $$size(n) = size(n.left) + size(n.right) + 1.$$ 109 | With this information, we can find the node with the $i^{th}$ smallest key among all nodes (the $i^{th}$ order statistic): 110 | \begin{itemize} 111 | \item Start from the root node. 112 | \item If $size(n.left) = i - 1$, return the node. 113 | \item If $size(n.left) > i - 1$, return the node with the $i^{th}$ smallest key in the left subtree. 114 | \item If $size(n.left) < i - 1$, return the node with the $(i-size(n.left)-1)^{th}$ smallest key in the right subtree. 115 | \end{itemize}
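Assuming each node stores this subtree size, the selection procedure just described can be sketched in C++ as follows (illustrative names; maintaining the \texttt{size} field during insertions and deletions is omitted):
\begin{lstlisting}
// BST node augmented with the size of the subtree rooted at it.
struct SNode {
    int key;
    int size;       // number of nodes in this subtree, kept up to date on updates
    SNode* left;
    SNode* right;
};

// Returns the node holding the i-th smallest key (1-indexed), or nullptr.
SNode* select(SNode* root, int i) {
    if (root == nullptr) return nullptr;
    int leftSize = (root->left != nullptr) ? root->left->size : 0;
    if (i == leftSize + 1) return root;
    if (i <= leftSize)     return select(root->left, i);
    return select(root->right, i - leftSize - 1);
}
\end{lstlisting}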
116 | \subsection{Red-Black Tree} 117 | The height of a BST can vary between $O(\log n)$ and $O(n)$. Balanced BSTs are guaranteed to have $O(\log n)$ height, thus ensuring the efficiency of operations on them. The red-black tree is one implementation of a balanced BST. In addition to the key and pointers to the parent and children, each node in a red-black tree also stores a bit to indicate whether the node is red or black. The following conditions are satisfied by a red-black tree: 118 | \begin{enumerate} 119 | \item Each node is either red or black; 120 | \item The root is black;\label{redblackcondition2} 121 | \item There can never be two red nodes in a row, i.e. red nodes must have black parents and children;\label{redblackcondition3} 122 | \item Every root $\rightarrow$ null path has the same number of black nodes.\label{redblackcondition4} 123 | \end{enumerate}
124 | \begin{theorem} 125 | The height of a red-black tree with $n$ nodes is at most $2\log(n+1)$, i.e. $O(\log n)$. 126 | \end{theorem} 127 | \begin{proof} 128 | Suppose all root $\rightarrow$ null paths contain $k$ black nodes. Then the red-black tree contains at least $k$ complete levels, because otherwise there would exist root $\rightarrow$ null paths with fewer than $k$ nodes, thus of course fewer than $k$ black nodes. Therefore we have 129 | $$n\geq 1 + 2 + \dots + 2^{k-1} = 2^k-1,$$ 130 | which means $k\leq\log(n+1).$ Suppose the height of the tree is $h$. According to condition \ref{redblackcondition3}, at least half of the nodes on any root $\rightarrow$ null path are black, so we come to the conclusion 131 | $$h\leq 2k\leq 2\log(n+1).$$ 132 | \end{proof} 133 | An important idea in the implementation of red-black trees is rotation, as illustrated in Figure \ref{rotations}. It alters the structure of the tree in a way that makes the tree more balanced, while preserving the BST property. 134 | 
135 | \begin{figure}[ht] 136 | \begin{subfigure}{\textwidth} 137 | \centering 138 | \begin{tikzpicture} 139 | \tikzstyle{subtree} = [regular polygon, regular polygon sides = 4, draw] 140 | \tikzstyle{singlenode} = [circle,draw] 141 | \node[circle, draw](1) at (0,0){1} 142 | child { node[subtree] {A} } 143 | child { node [singlenode] {2} 144 | child { node[subtree] {B} } 145 | child { node[subtree] {C} } 146 | }; 147 | \node[circle, draw](2)[right= 5cm of 1]{2} 148 | child { node [singlenode] {1} 149 | child { node[subtree] {A} } 150 | child { node[subtree] {B} } 151 | } 152 | child { node[subtree] {C} }; 153 | \draw[->,very thick] (2.3,-1.5) -- (3.3,-1.5); 154 | \end{tikzpicture} 155 | \caption{left rotation} 156 | \end{subfigure}\\ 157 | \begin{subfigure}{\textwidth} 158 | \centering 159 | \begin{tikzpicture} 160 | \tikzstyle{subtree} = [regular polygon, regular polygon sides = 4, draw] 161 | \tikzstyle{singlenode} = [circle,draw] 162 | \node[singlenode](1) at (0,0){1} 163 | child { node [singlenode] {2} 164 | child { node[subtree] {A} } 165 | child { node[subtree] {B} } 166 | } 167 | child { node[subtree] {C} }; 168 | \node[singlenode](2)[right= 3cm of 1]{2} 169 | child { node[subtree] {A} } 170 | child { node [singlenode] {1} 171 | child { node[subtree] {B} } 172 | child { node[subtree] {C} } 173 | }; 174 | \draw[->,very thick] (1.3,-1.5) -- (2.3,-1.5); 175 | \end{tikzpicture} 176 | \caption{right rotation} 177 | \end{subfigure} 178 | \caption{Rotations in Red Black Tree}\label{rotations} 179 | \end{figure} 180 | 
181 | Insertion and deletion in a red-black tree are carried out in two steps. First a normal BST insertion / deletion is executed. Some of the red-black conditions may be violated as a result, so we then modify the tree by recoloring nodes and performing rotations in order to restore the conditions. 182 | 183 | When we insert a node into the red-black tree as we do for any BST, we first try to color it red.
If condition \ref{redblackcondition3} is not violated, then everything is fine. Otherwise we wind up in two possible cases, as shown in Figure \ref{redblackinsertion}, in which $x$ is the newly inserted node. 184 | 185 | In case 1, all we need to do is a recoloring of the nodes. The red node is propagated upwards, which may possibly induce another violation of \ref{redblackcondition3}. The process can last as much as $O(\log n)$ times until we reach the root. If the root is colored red, condition \ref{redblackcondition2} will be violated, and the solution is to color it back to black. 186 | 187 | During the upward propagation process, it is possible that we meet case 2. Tackling this case is a little bit more complex, but it can be proven that the conditions can be restored via 2-3 rotations and recolorings in $O(1)$ time. 188 | 189 | \begin{figure}[H] 190 | \begin{subfigure}{.6\textwidth} 191 | \centering 192 | \begin{tikzpicture} 193 | \tikzstyle{rednode}=[circle,draw,red] 194 | \tikzstyle{blacknode}=[circle,draw] 195 | \node[blacknode](w) at (0,0) {w} 196 | child {node[rednode]{z}} 197 | child {node[rednode]{y} 198 | child{node[rednode,left]{x}} 199 | }; 200 | \node[rednode](w1) [right = 3cm of w] {w} 201 | child {node[blacknode]{z}} 202 | child {node[blacknode]{y} 203 | child{node[rednode,left]{x}} 204 | }; 205 | \draw[->,very thick] (1.3,-1.5) -- (2.3,-1.5); 206 | \end{tikzpicture} 207 | \caption{case 1} 208 | \end{subfigure} 209 | \begin{subfigure}{.3\textwidth} 210 | \centering 211 | \begin{tikzpicture} 212 | \tikzstyle{rednode}=[circle,draw,red] 213 | \tikzstyle{blacknode}=[circle,draw] 214 | \tikzstyle{subtree} = [regular polygon, regular polygon sides = 4, draw] 215 | \node[blacknode](w2) {w} 216 | child {node[blacknode]{z}} 217 | child {node[rednode]{y} 218 | child{node[rednode]{x} 219 | child{node[subtree]{...}} 220 | child{node[subtree]{...}} 221 | } 222 | child{node[subtree]{...}} 223 | }; 224 | \end{tikzpicture} 225 | \caption{case 2} 226 | \end{subfigure} 227 | \caption{Insertion in a Red-Black Tree}\label{redblackinsertion} 228 | \end{figure} 229 | \section{Hash Table} 230 | \subsection{Concepts and Applications} 231 | Hash table is a data structure designed to efficiently maintain a (possibly evolving) set of items, such as financial transactions, IP addresses, people associated with some data, etc. It supports insertion of a new record, deletion of existing records, and lookup of a particular record (like a dictionary). Assuming that the hash table is properly implemented, and that the data is non-pathological, all these operations can be executed in $O(1)$ time: amazingly fast! 232 | 233 | Let's first introduce a few typical use cases of hash table before diving into its implementation. 234 | 235 | Hash table can be used to solve the de-duplicates problem. 236 | \begin{description} 237 | \item[Input]A stream of objects. 238 | \item[Output]Unique objects in the stream, i.e. the objects with all duplicates removed. 239 | \end{description} 240 | The problem arises when we want to record the number of unique visitors to a website, or when we want to remove duplicates in the result of a search. With a hash table on the objects implemented, we can solve the problem in linear time. Just examine the objects one by one. For each object $x$, do a lookup in the hash table $H$. If $x$ is not found in $H$, insert it into $H$ and append it to the result, otherwise just continue with the next object. 241 | 242 | Another application is the 2-sum problem. 
243 | \begin{description} 244 | \item[Input]An unsorted array $A$ of $n$ integers, and a target sum $t$. 245 | \item[Output]Whether there exist two numbers $x,y\in A$ such that $x+y=t$. 246 | \end{description} 247 | A naive enumerative solution is $O(n^2)$. If we sort $A$ and then search for $t-x$ in $A$ for every $x\in A$, the time consumption can be reduced to $O(n\log n)$. But with a hash table, the problem can be solved in merely $O(n)$ time. Just insert all items into the hash table, and then for each $x\in A$ check whether $t-x$ is in the hash table via a lookup. 248 | 249 | In the early days of compilers, hash tables were used to implement symbol tables. The administrator of a network can use a hash table to block certain IP addresses. When exploring huge game trees of chess or Go, a hash table can be used to avoid duplicate explorations of the same configuration, which can appear an enormous number of times in the tree. In the last case, the size of the tree is so large that a hash table is the only plausible method to record whether a configuration has been explored.
250 | \subsection{Implementation} 251 | When implementing a hash table, we should think of a generally really big universe $U$ (e.g. all IP addresses, all names, all chessboard configurations, etc.), of which we wish to maintain an evolving subset $S$ of reasonable size. The general approach is as follows. 252 | \begin{enumerate} 253 | \item Pick $n$ as the number of ``buckets''. $n$ should be of size comparable with $S$. 254 | \item Choose a hash function $h:U\rightarrow\{0,1,2,\dots,n-1\}$. 255 | \item Use an array $A$ of length $n$ to store the items. $x$ should be stored in $A[h(x)]$. 256 | \end{enumerate} 257 | \begin{definition} 258 | For a specific hash function $h$ on a universe $U$, we say there is a collision if $\exists$ distinct $x,y\in U$ such that $h(x)=h(y)$. 259 | \end{definition} 260 | Think of the famous ``same birthday'' problem: what's the number of people needed so that the probability for at least 2 of them to have the same birthday is more than 50\%? The answer is 23, which is quite a small number. This problem is an example to demonstrate that collisions are not unlikely to happen, and thus a good implementation of a hash table must be able to resolve collisions properly. There are two popular solutions: 261 | \begin{description} 262 | \item[Chaining]A linked list is kept in each bucket, containing the items with the corresponding hash value. Given an object $x$, an insertion / deletion / lookup on the hash table is carried out as the corresponding operation on the list $A[h(x)]$. 263 | \item[Open addressing]A bucket only stores one object. The hash function specifies a probe sequence $h_1(x),h_2(x),$ etc. When an object is inserted into the hash table, the sequence is followed until an empty slot is found. The sequence can be linear (i.e. slots are probed consecutively), or decided by two independent hash functions. 264 | \end{description} 265 | 
266 | For a hash table with chaining, insertions are always $\Theta(1)$ because we simply insert a new element at the front of a list, while deletions and lookups are $\Theta(list\:length)$. The maximal length of a list can be anywhere from $m/n$ (where $m$ is the number of stored objects), meaning that all lists have equal length, to $m$, meaning that all objects are in the same bucket. The situation with open addressing is similar. Obviously, the performance of an implementation depends heavily on the choice of the hash function. A good hash function should lead to good performance, i.e.
data should be spread out among all hash values, and the hash function should be easy to evaluate and its result easy to store. 267 | 268 | A widely used method to define a hash function consists of two steps. First an object is transformed into a (usually large) integer, namely the hash code, and then the integer is mapped by a compression function to a number between $0$ and $n-1$, i.e. the index of a bucket. The $\bmod\:n$ function can serve as the compression function. 269 | 270 | The number of buckets $n$ must be selected with caution. It should be a prime within a constant factor of the number of objects supposed to be saved in the table, and it should not be close to a power of 2 or 10. 271 | \begin{definition} 272 | The load factor $\alpha$ of a hash table is defined as 273 | $$\alpha=\frac{\text{\# of objects in the hash table}}{\text{\# of buckets in the hash table}}.$$ 274 | \end{definition} 275 | Obviously, for open addressing, $\alpha$ has to be smaller than 1, whereas chaining can cope with $\alpha\geq 1$. 276 | In general, $\alpha$ has to be $O(1)$ to guarantee constant running time for hash table operations. In particular, $\alpha\ll 1$ is expected for open addressing.
277 | \subsection{Universal Hashing} 278 | We wish to fabricate a clever hash function that can spread any data set quasi-evenly among all buckets. Unfortunately, no such function exists, because every hash function has a pathological data set. The reason is that for any hash function $h:U\rightarrow\{0,1,\dots,n-1\}$, according to the Pigeonhole Principle, there exists a bucket $i$ such that at least $\lvert U\rvert/n$ elements of $U$ hash to $i$ under $h$. If the data set is a subset of these elements, all of them will collide. This could become dangerous in real-world systems: a simple hash function can be reverse engineered and abused. 279 | 280 | There are two solutions to this problem. Either a cryptographic hash function, e.g. SHA-2, should be used to make the reverse engineering infeasible, or a randomized approach should be taken: we should design a family $H$ of hash functions such that for any data set $S$, a randomly chosen function $h\in H$ is almost guaranteed to spread $S$ out quasi-evenly. Such a family of hash functions is called universal.
281 | \begin{definition} 282 | Let $H$ be a set of hash functions $h:U\rightarrow\{0,1,\dots,n-1\}$. $H$ is universal if and only if $\forall x,y\in U(x\neq y),$ 283 | $$P(h(x)=h(y))\leq \frac{1}{n},$$ 284 | in which $n$ is the number of buckets and $h$ is a hash function chosen uniformly at random from $H$. $1/n$ is exactly the probability of a collision under pure random hashing. 285 | \end{definition} 286 | We will now provide a universal hash function family for IP addresses. Let $U$ represent the universe of all IP addresses of the form $(x_1,x_2,x_3,x_4)$, in which each $x_i$ is an integer between 0 and 255 inclusive. Let $n$ be a prime whose value is comparable with the number of objects in the hash table, and larger than 255. We define a hash function $h_a$ for each 4-tuple $a=(a_1,a_2,a_3,a_4)$ with each $a_i\in\{0,1,\dots,n-1\}:$ 287 | $$h_a(x_1,x_2,x_3,x_4)=\left(\sum\limits_{i=1}^4a_ix_i\right)\mod n.$$ 288 | Then the family of all $h_a$ is universal. 289 | \begin{proof} 290 | Consider two distinct IP addresses $x=(x_1,x_2,x_3,x_4)$, $y=(y_1,y_2,y_3,y_4)$, and assume without loss of generality that $x_4\neq y_4$.
If $x$ and $y$ collide, we have 291 | \begin{align*} 292 | \left(\sum\limits_{i=1}^4a_ix_i\right)\mod n=\left(\sum\limits_{i=1}^4a_iy_i\right)\mod n\\ 293 | a_4(x_4-y_4)\mod n=\left(\sum\limits_{i=1}^3a_i(y_i-x_i)\right)\mod n\\ 294 | \end{align*} 295 | For an arbitrarily fixed choice of $a_1,a_2,a_3$, the rhs is a fixed number between 0 and $n-1$ inclusive. With $x_4-y_4\mod n\neq 0$ (guaranteed by $n>255$ and $x_4\neq y_4$) and $a_4$ randomly chosen in $\{0,1,\dots,n-1\}$, the lhs is actually equally likely to be any of $\{0,1,\dots,n-1\}$. Therefore the probability of collision is $\frac{1}{n}$. 296 | \end{proof} 297 | Now we would like to verify the $O(1)$ running time guarantee of hash table implemented with chaining and hash function $h$ selected randomly from a universal family $H$. Here we assume that $\lvert S\rvert=O(n)$, i.e. $\alpha=\frac{\lvert S\rvert}{n}=O(1)$, and that it takes $O(1)$ to evaluate the hash function. 298 | \begin{proof} 299 | As discussed before, the running time of basic operations on a hash table implemented with chaining is $O(list\:length)$. So here we will try to prove that the expectation of the list length $L$ is $O(1)$. 300 | 301 | For a specific list corresponding to hash value $h(x)$, we define 302 | \begin{equation*} 303 | Z_y=\begin{cases} 304 | 1\text{ if }h(y)=h(x)\\ 305 | 0\text{ otherwise}\\ 306 | \end{cases} 307 | \end{equation*} 308 | for any $y\in S$. Then obviously $L=\sum\limits_{y\in S}Z_y$. Thus we have 309 | \begin{align*} 310 | E[L]&= \sum\limits_{y\in S}E[Z_y]=\sum\limits_{y\in S}P(h(y)=h(x))\\ 311 | &\leq\sum\limits_{y\in S}\frac{1}{n}=\frac{\lvert S\rvert}{n}=O(1). 312 | \end{align*} 313 | \end{proof} 314 | The running time of operations on hash table implemented with open addressing is hard to analyze. We will use a heuristic assumption that all $n!$ probe sequences are equally possible, which is indeed not true but facilitates an idealized quick analysis. Under this heuristic assumption, the expected running time of operations is $\frac{1}{1-\alpha}$. 315 | \begin{proof} 316 | A random probe finds an empty slot with probability $1-\alpha$. A random probe sequence can be regarded as repetitions of random probes\footnote{Actually the probability for probes after the first probe to find an empty slot is larger, because we don't examine the same slot twice. The running time $\frac{1}{1-\alpha}$ is an upper bound.}. Thus the expectation of the number of probes needed for finding an empty slot is $\frac{1}{1-\alpha}$. 317 | \end{proof} 318 | For linear probing, the heuristic assumption is deadly wrong. So we assume instead that the initial probe is random, which is again not true in practice. Knuth proved in 1962 that the expected running time of an insertion under this assumption is $\frac{1}{(1-\alpha)^2}$. 319 | \section{Bloom Filters} 320 | Bloom filter is another data structure that facilitates fast insertions and lookups besides hash tables. It uses much less space than hash table, at the price of the following shortcomings: 321 | \begin{itemize} 322 | \item It cannot store associated objects; 323 | \item It does not support deletions; 324 | \item There is a small probability of false positive for the lookup result. 325 | \end{itemize} 326 | Historically, bloom filters are used to implement spell-checkers. A canonical use case is to forbid passwords of certain patterns. It is also used in network routers to complete tasks like banning certain IP addresses. 
It is desirable in such environments because memory is limited, the lookups are supposed to be super-fast, and occasional false positives are tolerable. 327 | 328 | The basic ingredients of a bloom filter are an array $A$ of $n$ bits and $k$ hash functions $h_i, i=1,2,\dots,k$. To insert element $x$, we set $A[h_i(x)] = 1$ for all $i$. To do a lookup for $x$, we return true if we find that $A[h_i(x)] = 1$ for all $i$. It is obvious that if all bits related to element $x$ via the $k$ hash functions have been set to 1 before $x$ itself is inserted, a false positive will happen in a lookup for $x$. If the probability of a false positive is too large, a bloom filter should not be used. We will use a heuristic analysis to illustrate that this probability is very small in reality. 329 | 330 | We assume that across different $i$ and $x$, all $h_i(x)$ are uniformly random and independent. The assumption is generally not true, but it helps to understand the trade-off between space and error in bloom filters. 331 | 
332 | We would like to insert a data set $S$ into a bloom filter using $n$ bits. The probability that a certain bit has been set to 1 after inserting $S$ is 333 | \begin{equation}\label{bloomfilterheuristic} 334 | 1- \left(1-\frac{1}{n}\right)^{k\lvert S\rvert}\approx 1-e^{-k\lvert S\rvert/n}=1-e^{-k/b}, 335 | \end{equation} 336 | in which $b=\frac{n}{\lvert S\rvert}$ is the number of bits per object. Note that the approximation is only accurate when $n$ is large, i.e. when $1/n$ is small. For an element not in $S$, the probability of a false positive is 337 | \begin{equation*} 338 | \epsilon=\left(1-e^{-k/b}\right)^k. 339 | \end{equation*} 340 | Let $t=k/b$, then we have 341 | \begin{align*} 342 | \ln\epsilon&=bt\ln(1-e^{-t})\\ 343 | \dv{\ln\epsilon}{t}&=\frac{b}{e^t-1}\left(t+(e^t-1)\ln(1-e^{-t})\right).\\ 344 | \end{align*} 345 | When $t=\ln 2$, $\dv{\ln\epsilon}{t}=0$ and $\epsilon$ is minimized. Thus we should use $k=(\ln 2)b\approx 0.693b.$ With $b=8$, we should choose $k=5$ or 6, and $\epsilon$ will be approximately 2\%.
346 | \section[Union Find]{Union Find\protect\footnote{This topic was originally covered as an optional topic in Part 2. I put it here because it's a pure data structure topic that fits this chapter better.}}\label{unionfind} 347 | The union find data structure maintains a partition of a set of objects. It supports two essential operations: 348 | \begin{description} 349 | \item[$find(x)$]Returns the name of the group to which $x$ belongs. 350 | \item[$union(C_i, C_j)$]Merges groups $C_i,C_j$ into one group. 351 | \end{description} 352 | \subsection{Quick Find UF} 353 | \begin{multicols}{2} 354 | \begin{algorithmic}[1] 355 | \Function{find}{x} 356 | \State{return leader(x)} 357 | \EndFunction 358 | \end{algorithmic} 359 | \columnbreak 360 | \begin{algorithmic}[1] 361 | \Function{union}{x,y} 362 | \If{size(x) $>$ size(y)} 363 | \For{i in y's group} 364 | \State{leader(i) = x} 365 | \EndFor 366 | \State{size(x) += size(y)} 367 | \Else 368 | \For{i in x's group} 369 | \State{leader(i) = y} 370 | \EndFor 371 | \State{size(y) += size(x)} 372 | \EndIf 373 | \EndFunction 374 | \end{algorithmic} 375 | \end{multicols} 376 | In this implementation, a leader is chosen arbitrarily from each group. Each group is represented by its leader. Each object maintains a pointer to its leader. When two groups get merged, the leader of the larger group (i.e. the group that contains more objects) becomes the leader of the merged group.
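The quick-find idea is short enough to write out concretely; the following C++ sketch (illustrative, not taken from the course) keeps the leader of every object and the size of every group in two arrays, and calls the merge operation \texttt{unite} because \texttt{union} is a reserved word in C++:
\begin{lstlisting}
#include <vector>
using namespace std;

// Quick-find union-find: leader[i] names the group of object i, and
// groupSize[l] counts the objects whose leader is l.
struct QuickFindUF {
    vector<int> leader, groupSize;

    explicit QuickFindUF(int n) : leader(n), groupSize(n, 1) {
        for (int i = 0; i < n; ++i) leader[i] = i;  // every object starts alone
    }

    int find(int x) const { return leader[x]; }     // O(1)

    // Merge the groups of x and y by relabelling the smaller group: O(n).
    void unite(int x, int y) {
        int lx = leader[x], ly = leader[y];
        if (lx == ly) return;
        if (groupSize[lx] < groupSize[ly]) { int t = lx; lx = ly; ly = t; }
        for (int& l : leader)                       // ly leads the smaller group
            if (l == ly) l = lx;
        groupSize[lx] += groupSize[ly];
    }
};
\end{lstlisting}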
377 | 378 | $find()$ is easy for this implementation: just return the object's leader, which takes $O(1)$ time. However $union()$ takes $O(n)$ time, because all objects in the smaller group have to have their leader pointer updated. Nonetheless, if we consecutively merge groups so that in the end all objects are in the same group, the leader of each object is updated at most $\log n$ times, because each time an object has its leader updated, the size of the group to which it belongs at least doubles. Therefore in total there can be at most $O(n\log n)$ leader updates.
379 | \subsection{Quick Union UF} 380 | \begin{multicols}{2} 381 | \begin{algorithmic}[1] 382 | \Function{find}{x} 383 | \While{x $\neq$ parent(x)} 384 | \State{x = parent(x)} 385 | \EndWhile 386 | \State{return x} 387 | \EndFunction 388 | \end{algorithmic} 389 | \columnbreak 390 | \begin{algorithmic}[1] 391 | \Function{union}{x,y} 392 | \State{lx = find(x), ly = find(y)} 393 | \If{size(lx) $>$ size(ly)} 394 | \State{parent(ly) = lx} 395 | \State{size(lx) += size(ly)} 396 | \Else 397 | \State{parent(lx) = ly} 398 | \State{size(ly) += size(lx)} 399 | \EndIf 400 | \EndFunction 401 | \end{algorithmic} 402 | \end{multicols} 403 | In this implementation\footnote{It is different from the lazy union implementation in the lectures. It is as efficient as union by rank, but much easier to verify.}, each object maintains a pointer to its parent instead of its leader. Only the leader has itself as parent. In this way each group forms a tree, with the leader as root. $find()$ follows the parent pointers until the leader is met. $union()$ changes the parent of the leader of the smaller group into the leader of the larger group. 404 | 
405 | It can be proved by induction that a group with $k$ objects forms a tree of height no more than $\log k$. 406 | \begin{proof} 407 | The base case is trivial: each group has 1 object and height 0. Suppose $1\leq i\leq j$. When a group with $i$ objects is merged into a group with $j$ objects, the root of the group with $i$ objects becomes a child of the other root, so the height of the new tree is either the old height $h_j\leq\log j\leq\log(i+j)$, or 408 | $$h_{i+j}=h_i+1\leq \log i + 1 = \log(2i)\leq\log(i+j).$$ 409 | \end{proof} 410 | Since the tree is at most of logarithmic height, the running times of $find()$ and $union()$ are both $O(\log n).$
411 | \subsection{Union by Rank} 412 | \begin{multicols}{2} 413 | \begin{algorithmic}[1] 414 | \Function{find}{x} 415 | \While{x $\neq$ parent(x)} 416 | \State{x = parent(x)} 417 | \EndWhile 418 | \State{return x} 419 | \EndFunction 420 | \end{algorithmic} 421 | \columnbreak 422 | \begin{algorithmic}[1] 423 | \Function{union}{x,y} 424 | \State{lx = find(x), ly = find(y)} 425 | \If{rank(lx) $>$ rank(ly)} 426 | \State{parent(ly) = lx} 427 | \ElsIf{rank(lx) $<$ rank(ly)} 428 | \State{parent(lx) = ly} 429 | \Else\Comment{equal} 430 | \State{parent(lx) = ly} 431 | \State{rank(ly) += 1} 432 | \EndIf 433 | \EndFunction 434 | \end{algorithmic} 435 | \end{multicols} 436 | $find()$ is the same as in quick union. Each object maintains a rank field, which is initialized to 0 for all objects and can only increase in a merge of two trees whose roots have the same rank. In $union()$, rank instead of size is used to determine which root will serve as the root of the merged tree. It can be verified that union by rank also achieves $O(\log n)$ running time. 437 | 438 | It follows immediately from the implementation of $union()$ that 439 | \begin{enumerate} 440 | \item For any object x, rank(x) can only increase over time. 441 | \item Only ranks of roots can go up.
442 | \item Along a path to the root, ranks strictly increase. 443 | \end{enumerate} 444 | \begin{lemma}\textbf{(Rank Lemma)} 445 | After an arbitrary number of union operations, there are at most $n/2^r$ objects with rank $r$, in which $r\geq 0$. 446 | \end{lemma} 447 | \begin{proof} 448 | First, if $x,y$ have the same rank $r$, then their subtrees must be disjoint. If they had a common node $z$, then the paths $z\rightarrow x$ and $z\rightarrow y$ would both lie on the unique upward path from $z$, because each node has only one parent. It would become inevitable that one of $x$ and $y$ was an ancestor of the other, which is impossible because ranks strictly increase along such a path while $x$ and $y$ have the same rank. 449 | 450 | Then it can be verified that a rank-$r$ object has a subtree of size $\geq 2^r$. We will prove it by induction. In the base case, all subtrees are of rank $0$ and size 1. When two subtrees whose roots have different ranks merge, the situation is simple: no rank changes, while sizes become larger, hence the claim cannot be violated. When two subtrees $t_1, t_2$ whose roots have the same rank $r$ merge, the rank of the new root is $r+1$, and it is the only node whose rank changes. Since $t_1,t_2$ both have size $\geq 2^r$, the new tree must have size $\geq 2^{r+1}$, therefore the claim holds. 451 | 452 | The rank lemma follows directly from the two claims above. 453 | \end{proof} 454 | According to the rank lemma, there is at most 1 object with rank $\log n$, which can only be the root. Thus the tree is at most of height $\log n$, and the running time of $find()$ and $union()$ is $O(\log n)$.
455 | \subsection{Path Compression} 456 | If the $find()$ operation is expected to be executed multiple times for each object, which is almost always the case, it makes no sense to repeat the same traversal job every time. Instead, we can make the parent pointers of all objects met during the process point to the root, i.e. the leader, so that later $find()$ operations on these objects take $O(1)$ time. This modification adds only a constant factor overhead to the first $find()$ on objects that are not direct children of the leader, and greatly speeds up subsequent $find()$ operations. 457 | \begin{algorithmic}[1] 458 | \Function{find}{x} 459 | \State{leader = x} 460 | \While{leader $\neq$ parent(leader)} 461 | \State{leader = parent(leader)} 462 | \EndWhile 463 | \While{parent(x) $\neq$ leader} 464 | \State{t = parent(x)} 465 | \State{parent(x) = leader} 466 | \State{x = t} 467 | \EndWhile 468 | \State{return leader} 469 | \EndFunction 470 | \end{algorithmic} 471 | We will now precisely analyze the performance enhancement that path compression brings to union by rank. Ranks are maintained exactly as without path compression. In this case, rank(x) is only an upper bound on the maximum number of steps along a path from a leaf to x. But the rank lemma still holds, and we still have rank(parent(x)) $>$ rank(x) for all non-root x.
472 | \subsubsection{Hopcroft-Ullman's Analysis} 473 | \begin{theorem} \textbf{(Hopcroft-Ullman Theorem)} 474 | With union by rank and path compression, $m$ union + find operations take $O(m\log^*n)$ time, where 475 | \begin{equation*} 476 | \log^*n=\begin{cases} 477 | 0&if\:n\leq 1\\ 478 | 1+\log^*(\log n)&if\:n>1 479 | \end{cases} 480 | \end{equation*} 481 | \end{theorem} 482 | We will focus on the case when $m=\Omega(n).$ 483 | \begin{proof} 484 | First we divide the interval $[0,n]$ into a few rank blocks: \{0\}, \{1\}, \{2,3,4\}, \{5,$\dots,2^4$\}, \{17,$\dots,2^{16}$\}, \{65537,$\dots,2^{65536}$\}, $\dots$, \{$\dots,n$\}.
In general, there are $O(\log^*n)$ rank blocks. 485 | 486 | Consider a non-root object x, thus rank(x) is fixed. At a given time point, we call an object x good if one of the two conditions is satisfied: 487 | \begin{itemize} 488 | \item x or parent(x) is a root; 489 | \item rank(parent(x)) is in a larger block than rank(x). 490 | \end{itemize} 491 | If neither condition is satisfied, we say x is bad. 492 | 493 | A $find()$ operation can visit at most $O(\log^*n)$ good nodes (root, direct child of root + at most 1 in each rank block). In $m$ operation, these visits take $O(m\log^*n)$ time. 494 | 495 | To compute the time consumption of visits to bad nodes, consider a rank block \{k+1, $\dots, 2^k$\}. Note that each time a bad node is visited, its parent is changed to another node (its then root) with strictly larger rank than its current parent. For a bad node x with rank(x) in this block, this process can happen at most $2^k$ times before x becomes good. Therefore the number of visits to x while x is bad and rank(x) is in this block is $\leq 2^k$. According to the rank lemma, the number of objects x with rank in this block is $\leq\sum\limits_{i=k+1}^{2^k}\frac{n}{2^i}<\frac{n}{2^k}$. Thus the total number of visits to bad objects in this block is $\leq 2^k\cdot\frac{n}{2^k}=n$. Since there are $O(\log^* n)$ blocks, the total time spent on visiting bad nodes is $O(n\log^*n)$. 496 | 497 | In conclusion, the running time of $m$ operations is $O((m+n)\log^*n)$. Since we are interested in the case when $m=\Omega(n)$, it is equivalent to $O(m\log^*n)$. 498 | \end{proof} 499 | \subsubsection{Tarjan's Analysis} 500 | Hopcroft-Ullman theorem already provides an upper bound quite close to linear running time. Yet Tarjan proved that there is an even better upper bound. 501 | 502 | For integers $r\geq 1, k\geq 0$, the Ackermann function $A_k(r)$ is defined as 503 | \begin{align*} 504 | A_0(r)&=r+1\\ 505 | A_k(r)&=\underbrace{(A_{k-1}\circ A_{k-1}\circ\dots\circ A_{k-1})}_{r\text{ times}}(r),\:k\geq 1 506 | \end{align*} 507 | It's easy to derive that $A_1(r)=2r$, $A_2(r)=r2^r$. Because $A_2(r)$ is larger than $2^r$, $A_3(r)$ is larger than the result of applying $2^r$ $r$ times on $r$, i.e. the ``exponential tower'' of height $r$: 508 | $$A_3(r)>{{2^2}^2}^{\dots r\:times\dots}.$$ 509 | Specifically, $A_1(2)=4$, $A_2(2)=8$, $A_3(2)=A_2(A_2(2))=A_2(8)=8\times 2^8=2048$. $A_4(2)=A_3(2048)$, which is larger than the exponential tower of height 2048. 510 | 511 | For integer $n\geq 4$, we define the inverse Ackermann function 512 | $$\alpha(n)=\text{ minimum value of $k$ such that }A_k(2)\geq n.$$ 513 | Since $A_k(2)$ blows up fast as $k$ gets larger, $\alpha(n)$ grows extremely slow. As a comparison: 514 | \begin{equation*} 515 | \alpha(n)=\begin{cases} 516 | 1&n=4\\ 517 | 2&n=5,\dots,8\\ 518 | 3&n=9,\dots,2048\\ 519 | 4&n=2049,\dots,>{{2^2}^2}^{\dots 2048\:times\dots}\\ 520 | &\dots\\ 521 | \end{cases} 522 | \log^*n=\begin{cases} 523 | 1&n=2\\ 524 | 2&n=3,4\\ 525 | 3&n=5,\dots,16\\ 526 | 4&n=17,\dots,65536\\ 527 | 5&n=65537,\dots,2^{65536}\\ 528 | \end{cases} 529 | \end{equation*} 530 | For $n={{2^2}^2}^{\dots 2048\:times\dots}$, $\log^*n=2048$ while $\alpha(n)=4$. 531 | \begin{theorem} \textbf{(Tarjan's Theorem)} 532 | With union by rank and path compression, $m$ union + find operations take $O(m\alpha(n))$ time. 533 | \end{theorem} 534 | In order to prove Hopcroft-Ullman's theorem, we used the fact that if parent(x) is updated from p to p', then rank(p') $\geq$ rank(p) + 1. 
To verify Tarjan's theorem, we will use a stronger version of this claim: in most cases rank(p') is much bigger than rank(p) (not just by 1). 535 | \begin{proof} 536 | Consider a non-root object x, thus rank(x) is fixed. Define 537 | \begin{center} 538 | $\delta(x)$ = max $k$ such that rank(parent(x)) $\geq$ $A_k($rank(x)). 539 | \end{center} 540 | As a few examples:\\ 541 | \begin{equation*} 542 | \begin{cases} 543 | \delta(x)\geq 0\iff \text{rank(parent(x))} \geq \text{rank(x) + 1 (always)}\\ 544 | \delta(x)\geq 1\iff \text{rank(parent(x))} \geq 2\cdot\text{rank(x)}\\ 545 | \delta(x)\geq 2\iff \text{rank(parent(x))} \geq \text{rank(x)}\cdot 2^{\text{rank(x)}}\\ 546 | \end{cases} 547 | \end{equation*} 548 | Note that for every object x with rank(x)$\geq 2$, we must have $\delta(x)\leq\alpha(n)$, because 549 | $$A_{\alpha(n)}(\text{rank(x)})\geq A_{\alpha(n)}(2)\geq n\geq\text{rank(parent(x))}.$$ 550 | An object x is defined as bad if \textbf{all} of the following conditions hold: 551 | \begin{enumerate} 552 | \item x is not a root; 553 | \item parent(x) is not a root; 554 | \item rank(x)$\geq$2; 555 | \item x has an ancestor y with $\delta(y)=\delta(x)$. 556 | \end{enumerate} 557 | Otherwise x is good. Along an object-root path, the maximum number of good objects is $\Theta(\alpha(n))$: 1 root, 1 direct child of the root, 1 object with rank 0, 1 object with rank 1, and 1 object x with $\delta(x)=k$ for each $k=0,1,\dots,\alpha(n)$. Thus the total number of visits to good objects is $O(m\alpha(n))$. 558 | 
559 | Consider a visit to a bad object x. x has an ancestor y with $\delta(x)=\delta(y)=k$. Suppose x's parent is p and y's parent is p'; then we have 560 | \begin{center} 561 | rank(x's new parent) $\geq$ rank(p') $\geq A_k($rank(y)) $\geq A_k($rank(p)) 562 | \end{center} 563 | The first and third $\geq$ hold because rank only goes up from child to parent (and $A_k$ is non-decreasing); the second $\geq$ comes from the definition of $\delta(y)$. This relation indicates that path compression at least applies the $A_k$ function to rank(x's parent). If r = rank(x), then after r such pointer updates, we have 564 | \begin{center} 565 | rank(parent(x)) $\geq\underbrace{(A_k\circ\dots\circ A_k)}_{\text{r times}}(r)=A_{k+1}(r).$ 566 | \end{center} 567 | Hence every r visits to x while x is bad increase $\delta(x)$ by at least 1. Because $\delta(x)\leq\alpha(n)$, there can be at most $r\alpha(n)$ visits to x while it is bad. Thus the total number of visits to bad objects is 568 | \begin{equation*} 569 | N(bad)\leq\sum\limits_{\text{x is bad}}rank(x)\alpha(n)\leq\alpha(n)\sum\limits_{r\geq 2}\frac{n}{2^r}