├── egscc.jpg ├── metagraph.jpg ├── prooflemmascc.jpg ├── vertexcover1.jpg ├── closestpointlemma.jpg ├── .gitignore ├── README.md ├── notes.tex ├── preamble.tex ├── Introduction.tex ├── RandomizedAlgorithms.tex ├── DivideConquer.tex ├── Graphs.tex ├── NPCompleteness.tex ├── DynamicProgramming.tex ├── GreedyAlgorithms.tex └── DataStructures.tex /egscc.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/pkulijing/AlgorithmDesignAnalysisNotes/HEAD/egscc.jpg -------------------------------------------------------------------------------- /metagraph.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/pkulijing/AlgorithmDesignAnalysisNotes/HEAD/metagraph.jpg -------------------------------------------------------------------------------- /prooflemmascc.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/pkulijing/AlgorithmDesignAnalysisNotes/HEAD/prooflemmascc.jpg -------------------------------------------------------------------------------- /vertexcover1.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/pkulijing/AlgorithmDesignAnalysisNotes/HEAD/vertexcover1.jpg -------------------------------------------------------------------------------- /closestpointlemma.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/pkulijing/AlgorithmDesignAnalysisNotes/HEAD/closestpointlemma.jpg -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | *.aux 2 | *.fdb_latexmk 3 | *.fls 4 | *.log 5 | *.loa 6 | *.out 7 | *.synctex.gz 8 | *.glg 9 | *.glo 10 | *.gls 11 | *.ist 12 | *.toc 13 | *.xdy 14 | *.pdf 15 | !notes.pdf 16 | *.1 17 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # AlgorithmDesignAnalysisNotes 2 | Notes for coursera course *Algorithms: Design and Analysis* by Prof. Tim Roughgarden from Stanford University. 3 | 4 | Feel free to take and edit. But you have to know latex to edit it. Otherwise just take the file *notes.pdf* from the release. 5 | -------------------------------------------------------------------------------- /notes.tex: -------------------------------------------------------------------------------- 1 | \input{preamble} 2 | \def\PREAMBLE{PREAMBLE} 3 | \title{Notes for Algorithms: Design and Analysis} 4 | \author{Jing LI\\pkuyplijing@gmail.com} 5 | 6 | \begin{document} 7 | \pagestyle{empty} 8 | \hypersetup{pageanchor=false} 9 | \maketitle 10 | \begin{center} 11 | \emph{Sincere gratitude to Professor Tim Roughgarden \\for offering such a wonderful course.} 12 | \end{center} 13 | \tableofcontents 14 | \newpage 15 | \hypersetup{pageanchor=true} 16 | \pagenumbering{arabic} 17 | \pagestyle{headings} 18 | \addtocontents{toc}{\protect\thispagestyle{empty}} 19 | \input{Introduction} 20 | \input{DivideConquer} 21 | \input{RandomizedAlgorithms} 22 | \input{Graphs} 23 | \input{DataStructures} 24 | \input{GreedyAlgorithms} 25 | \input{DynamicProgramming} 26 | \input{NPCompleteness} 27 | \clearpage%somehow needed so that list of algorithms in toc jumps to the correct page. 
28 | \phantomsection 29 | \addcontentsline{toc}{chapter}{\listalgorithmname} 30 | \listofalgorithms 31 | \end{document} -------------------------------------------------------------------------------- /preamble.tex: -------------------------------------------------------------------------------- 1 | \documentclass{report} 2 | \usepackage[format = hang, font = bf]{caption} 3 | \usepackage{subcaption} 4 | % The following is needed in order to make the code compatible 5 | % with both latex/dvips and pdflatex. Added for using UML generated by MetaUML. 6 | \ifx\pdftexversion\undefined 7 | \usepackage[dvips]{graphicx} 8 | \else 9 | \usepackage[pdftex]{graphicx} 10 | \DeclareGraphicsRule{*}{mps}{*}{} 11 | \fi 12 | \usepackage{array} 13 | \usepackage{amsmath} 14 | \usepackage{amsthm} 15 | \usepackage{mathtools} 16 | \usepackage{boxedminipage} 17 | \usepackage{listings} 18 | \usepackage{multicol} 19 | \usepackage{makecell}%diagonal line in table 20 | \usepackage{float}%allowing forceful figure[H] 21 | \usepackage{xcolor} 22 | \usepackage{amsfonts}%allowing \mathbb{R} 23 | \usepackage{amssymb} 24 | \usepackage{alltt} 25 | \usepackage{algorithmicx} 26 | \usepackage[chapter]{algorithm} 27 | %chapter option ensures that algorithms are numbered within each chapter rather than in the whole article 28 | \usepackage[noend]{algpseudocode} %If end if, end procdeure, etc is expected to appear, remove the noend option 29 | \usepackage{xspace} 30 | \usepackage{physics} 31 | \usepackage{color} 32 | \usepackage{tikz} 33 | \usetikzlibrary{shapes,positioning} 34 | \usepackage{url} 35 | \def\UrlBreaks{\do\A\do\B\do\C\do\D\do\E\do\F\do\G\do\H\do\I\do\J\do\K\do\L\do\M\do\N\do\O\do\P\do\Q\do\R\do\S\do\T\do\U\do\V\do\W\do\X\do\Y\do\Z\do\[\do\\\do\]\do\^\do\_\do\`\do\a\do\b\do\c\do\d\do\e\do\f\do\g\do\h\do\i\do\j\do\k\do\l\do\m\do\n\do\o\do\p\do\q\do\r\do\s\do\t\do\u\do\v\do\w\do\x\do\y\do\z\do\0\do\1\do\2\do\3\do\4\do\5\do\6\do\7\do\8\do\9\do\.\do\@\do\\\do\/\do\!\do\_\do\|\do\;\do\>\do\]\do\)\do\,\do\?\do\'\do+\do\=\do\#\do\-} 36 | \usepackage{xr}%allow cross-file references 37 | \usepackage[breaklinks = true]{hyperref} 38 | \lstset{ 39 | language = C++, 40 | showspaces = false, 41 | breaklines = true, 42 | tabsize = 2, 43 | numbers = left, 44 | extendedchars = false, 45 | basicstyle = {\ttfamily \footnotesize}, 46 | keywordstyle=\color{blue!70}, 47 | commentstyle=\color{gray}, 48 | frame=shadowbox, 49 | rulesepcolor=\color{red!20!green!20!blue!20}, 50 | numberstyle={\color[RGB]{0,192,192}}, 51 | moredelim=[is][\underbar]{_}{_} 52 | } 53 | \mathchardef\myhyphen="2D 54 | % switch-case environment definitions 55 | \algblock{switch}{endswitch} 56 | \algblock{case}{endcase} 57 | %\algrenewtext{endswitch}{\textbf{end switch}} %If end switch is expected to appear, uncomment this line. 
58 | \algtext*{endswitch} % Make end switch disappear 59 | \algtext*{endcase} 60 | \algnewcommand\algorithmicinput{\textbf{Input}} 61 | \algnewcommand\Input{\item[\algorithmicinput]} 62 | \algnewcommand\algorithmicoutput{\textbf{Output}} 63 | \algnewcommand\Output{\item[\algorithmicoutput]} 64 | \algnewcommand\algorithmicinputoutput{\textbf{input and output:}} 65 | \algnewcommand\InputOutput{\item[\algorithmicinputoutput]} 66 | \allowdisplaybreaks 67 | \newtheorem{theorem}{Theorem} 68 | \newtheorem{corollary}[theorem]{Corollary} 69 | \newtheorem{lemma}[theorem]{Lemma} 70 | \newtheorem{definition}{Definition} -------------------------------------------------------------------------------- /Introduction.tex: -------------------------------------------------------------------------------- 1 | \ifx\PREAMBLE\undefined 2 | \input{preamble} 3 | \begin{document} 4 | \fi 5 | \chapter{Introduction} 6 | \section{Warmup: Integer Multiplication Problem} 7 | \subsection{The Primary School Approach} 8 | Consider the integer multiplication algorithm that everyone learned in primary school. 9 | \begin{description} 10 | \item[Input]two n-digit numbers $x$ and $y$. 11 | \item[Output]the product $x\times y$. 12 | \end{description} 13 | We will assess its performance by counting the number of \textbf{primitive operations}, here being the addition or multiplication of two single-digit numbers, required to carry it out. In this case, it is clearly $\Theta(n^2)$. It might have been taken for granted that this is the unique, or at least optimal approach, while actually it isn't. As Aho, Hopcroft and Ullman put it in their 1974 book \emph{The Design and Analysis of Computer Algorithms}, ``Perhaps the most important principle for the good algorithm designer is to refuse to be content.'' Always be ready to ask yourself the question: CAN WE DO BETTER? 14 | \subsection{Karatsuba Multiplication} 15 | Consider the calculation of $5678\times1234$. We will note $a = 56, b = 78, c = 12$ and $d = 34$. The calculation can be carried out with the following steps: 16 | \begin{enumerate} 17 | \item Calculate $a\cdot c = 672$; 18 | \item Calculate $b\cdot d = 2652$; 19 | \item Calculate $(a+b)(c+d) = 134\times46 = 6164$; 20 | \item Calculate \textcircled{\raisebox{-.9pt}{3}} - \textcircled{\raisebox{-.9pt}{2}} - \textcircled{\raisebox{-.9pt}{1}} = 2840; 21 | \item Calculate \textcircled{\raisebox{-.9pt}{1}}$\times$10000 + \textcircled{\raisebox{-.9pt}{4}}$\times$100 + \textcircled{\raisebox{-.9pt}{2}} = 6720000 + 284000 + 2652 = 7006652. 22 | \end{enumerate} 23 | The procedure here can be formalized into a recursive algorithm to calculate $x\times y$. Write $x=10^{n/2}a + b$ and $y=10^{n/2}c + d$. Obviously we have $xy=10^nac +\allowbreak 10^{n/2}(ad+bc) +\allowbreak bd =\allowbreak 10^nac + 10^{n/2}((a+b)(c+d)-ac-bd) + bd$, which inspires us of the following algorithm: 24 | \begin{algorithm}[ht] 25 | \caption{Karatsuba Multiplication}\label{kmultiplication} 26 | \begin{algorithmic}[1]\item Recursively calculate $ac$; 27 | \State{Recursively calculate $bc$} 28 | \State{Recursively calculate $(a+b)(c+d)$} 29 | \State{Calculate \textcircled{\raisebox{-.9pt}{3}} - \textcircled{\raisebox{-.9pt}{2}} - \textcircled{\raisebox{-.9pt}{1}} to get $ad+bc$} 30 | \State{Combine the results appropriately, i.e. 
$xy=10^nac +\allowbreak 10^{n/2}(ad+bc) +\allowbreak bd$} 31 | \end{algorithmic} 32 | \end{algorithm} 33 | 34 | Algorithm \ref{kmultiplication} demonstrates that even a simple problem with a presumably unique solution has plenty of room for subtle algorithm analysis and design. 35 | \section{Course Topic} 36 | The course will cover the following topics: 37 | \begin{itemize} 38 | \item Vocabulary for design and analysis of algorithms; 39 | \item Divide-and-conquer algorithm design paradigm; 40 | \item Randomization in algorithm design; 41 | \item Primitives for reasoning about graphs; 42 | \item Use and implementation of data structures. 43 | \end{itemize} 44 | In part II of the course the following topics will be covered: 45 | \begin{itemize} 46 | \item Greedy algorithm design paradigm; 47 | \item Dynamic programming algorithm design paradigm; 48 | \item NP-complete problems and what to do with them; 49 | \item Fast heuristics with provable guarantees for NP problems; 50 | \item Fast exact algorithms for special cases of NP problems; 51 | \item Exact algorithms that beat brute-force search for NP problems. 52 | \end{itemize} 53 | \section{Merge Sort} 54 | We will use \textbf{merge sort} to illustrate a few basic ideas of the course. Merge sort is a non-trivial algorithm to tackle the sorting problem: 55 | \begin{description} 56 | \item[Input]An unsorted array of $n$ numbers; 57 | \item[Output]The same numbers in sorted order. 58 | \end{description} 59 | \subsection{Pseudo Code} 60 | Merge sort is more efficient than selection sort, insertion sort and bubble sort. It is a good introductory example for the divide \& conquer paradigm. Its pseudo code is shown in Algorithm \ref{mergesort}. 61 | \begin{algorithm}[ht] 62 | \caption{Merge sort}\label{mergesort} 63 | \begin{algorithmic}[1] 64 | \Input\Statex{Unsorted array of length n} 65 | \Output\Statex{Sorted array of length n} 66 | \If{length of the array = 0 or 1} 67 | \State{return}\Comment{Basic case. Array already sorted.} 68 | \EndIf 69 | \State{Recursively merge sort 1st half of the array.} 70 | \State{Recursively merge sort 2nd half of the array.} 71 | \State{Merge the two sorted halves into one sorted array.} 72 | \end{algorithmic} 73 | \end{algorithm} 74 | The merging process may not seem intuitive. It is implemented with parallel traverses of the two sorted sub-arrays, as shown in Algorithm \ref{mergehalves}. 75 | \begin{algorithm}[ht] 76 | \caption{Merging two sorted sub-arrays}\label{mergehalves} 77 | \begin{algorithmic}[1] 78 | \Input 79 | \Statex{A = 1st sorted sub-array, of length $\lfloor n/2 \rfloor$} 80 | \Statex{B = 2nd sorted sub-array, of length $\lceil n/2 \rceil$} 81 | \Output\Statex{C = sorted array of length $n$} 82 | \State{i = 1, j = 1} 83 | \For{k = 1 \textbf{to} $n$} 84 | \If{i $>$ A.len}\Comment{A has been exhausted} 85 | \State{C[k] = B[j++]} 86 | \ElsIf{j $>$ B.len}\Comment{B has been exhausted} 87 | \State{C[k] = A[i++]} 88 | \ElsIf{A[i] $<$ B[j]} 89 | \State{C[k] = A[i++]} 90 | \Else\Comment{A[i]$\geq$B[j]} 91 | \State{C[k] = B[j++]} 92 | \EndIf 93 | \EndFor 94 | \end{algorithmic} 95 | \end{algorithm} 96 | \subsection{Running Time} 97 | The running time of the merging operation is obviously linear to the length of the array. Precisely speaking, each iteration involves one increment of i or j, one increment of k, an assignment to C[k] and at most 3 comparisons\footnote{Here we are taking an approach more detailed than that in the lecture: end cases, i.e. when A or B is exhausted, are taken into account.}. 
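In other words, each of the $n$ loop iterations costs at most $1+1+1+3=6$ primitive operations, hence at most $6n$ operations for the merge loop as a whole.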
Taking the initialization of i and j into account, in total we have to carry out 6n + 2 primitive operations, which is smaller than 8n. 98 | 99 | We can then draw the \textbf{recursive tree} of the problem. The original merge sort problem of size $n$ resides at level 0. At level 1 we have 2 sub-problems of size $n/2$, etc. In general, at level $j$ we have $2^j$ sub-problems of size $\frac{n}{2^j}$, and in total we have $\log n+1$ levels\footnote{At the last level $k$ we must have $\frac{n}{2^k}=1$, thus $k=\log n$.}. At each level, the number of required primitive operations is smaller than $8\cdot\frac{n}{2^j}\cdot 2^j=8n$. In the end, we have an upper bound of the total number of primitive operations needed to solve the original merge sort problem of size $n$: $8n(\log n+1)$. 100 | \section{Asymptotic Analysis} 101 | In the analysis above, we have been applying 3 guiding principles that will serve as universal tactics in future analysis of algorithms: 102 | \begin{enumerate} 103 | \item Focus on worst-case analysis, rather than average-case analysis or benchmarks on a specified set of inputs. 104 | \item Analyze with no regard to the constant factor. 105 | \item Conduct asymptotic analysis, i.e. focus on running time when $n$ is large. 106 | \end{enumerate} 107 | 108 | \textbf{Asymptotic analysis} provides basic vocabulary for the design and analysis of algorithms. It is essential for high-level reasoning about algorithms because it is both coarse enough to suppress details dependent upon architecture, language, compiler and implementation details, and sharp enough to facilitate comparisons between different algorithms, especially for inputs of large size. Its high-level idea is to \textbf{suppress constant factors as well as lower-order terms}. For our example of merge sort, the running time of $8n(\log n+1)$ is actually equivalent to $n\log n$, or in big-O notation, $O(n\log n)$. 109 | \subsection{Big-O Notation} 110 | Let $T(n),n\in\mathbb{N}$ be the function representing the running time of an algorithm with input of size $n$. The \textbf{Big-O notation} $T(n) = O(f(n))$ means that eventually (for all sufficiently large $n$), $T(n)$ will be bounded above by a constant multiple of $f(n)$. We hereby provide its formal definition. 111 | \begin{definition}\label{bigodef} 112 | Big-O notation $T(n)=O(f(n))$ holds if and only if there exist constants $c,n_0$ such that 113 | $$T(n)\leq c\cdot f(n),\forall n\geq n_0.$$ 114 | \end{definition} 115 | \begin{theorem} 116 | \begin{equation*} 117 | a_kn^k+\dots+a_1n+a_0=O(n^k) 118 | \end{equation*} 119 | \end{theorem} 120 | \begin{proof} 121 | Constants $n_0=1$ and $c=\sum\limits_{i=0}^{k}\lvert a_i\rvert$ satisfy Definition \ref{bigodef}. 122 | \end{proof} 123 | \begin{theorem} 124 | For every $k\geq 1$, $n^k\neq O(n^{k-1})$. 125 | \end{theorem} 126 | \begin{proof} 127 | The theorem can be proved by contradiction. Suppose $n^k=O(n^{k-1})$, i.e. $\exists$ constants $c,n_0$ such that $$n^k\leq c\cdot n^{k-1}, \forall n>n_0.$$Then $$n\leq c,\forall n>n_0,$$which is an obvious contradiction. 
128 | \end{proof} 129 | \subsection{Omega, Theta and Little-O Notations} 130 | \begin{definition} 131 | Omega notation $T(n)=\Omega(f(n))$ holds if and only if there exist constants $c,n_0$ such that 132 | $$T(n)\geq c\cdot f(n),n\geq n_0.$$ 133 | \end{definition} 134 | \begin{definition} 135 | Theta notation $T(n)=\Theta(f(n))$ holds if and only if $T(n)=\Omega(f(n))$ and $T(n)=O(f(n))$, which is equivalent to $\exists$ constants $c_1,c_2,n_0$ such that 136 | $$c_1\cdot f(n)\leq T(n)\leq c_2\cdot f(n),\forall n\geq n_0$$ 137 | \end{definition} 138 | A convention in algorithm design, though a sloppy one, is to use big-O notation to represent Theta notation. 139 | \begin{definition} 140 | Little-O notation $T(n)=o(f(n))$ holds if and only if $\forall$ constant $c>0$, $\exists$ constant $n_0$ such that 141 | $$T(n)\leq c\cdot f(n), \forall n\geq n_0.$$ 142 | \end{definition} 143 | \begin{theorem} 144 | $n^{k-1}=o(n^k),\forall k\geq 1$ 145 | \end{theorem} 146 | \begin{proof} 147 | Constant $n=1/c$ can satisfy the condition. 148 | \end{proof} 149 | \ifx\PREAMBLE\undefined 150 | \end{document} 151 | \fi -------------------------------------------------------------------------------- /RandomizedAlgorithms.tex: -------------------------------------------------------------------------------- 1 | \ifx\PREAMBLE\undefined 2 | \input{preamble} 3 | \begin{document} 4 | \fi 5 | \chapter{Randomized Algorithms} 6 | \section{Quick Sort} 7 | \subsection{Overview} 8 | Quick sort is a prevalent sorting algorithm in practice. It is $O(n\log n)$ on average, and it works in place, i.e. extra memory need to carry out the sort is minimal, whereas for merge sort, we need at least $O(n)$ extra memory. The problem is the same as specified for merge sort. Here we assume that all items inside the array are distinct for simplicity. 9 | 10 | The key idea of merge sort is \textbf{partition the array around a pivot element}. Plenty of deliberation remains for the choice of the pivot element. For the moment we just assume that the first element is used. In a partition, the array is rearranged so that elements smaller than the pivot are put before the pivot, while elements larger than it are put after the pivot. The partition puts the pivot in the correct position. By recursively partition the two sub-arrays on both sides of the pivot, the whole array becomes sorted. As will be revealed later, a partition can be finished with $O(n)$ time and no extra memory. The skeleton of the algorithm is shown in Algorithm \ref{quicksortskeleton}. 11 | \begin{algorithm}[ht] 12 | \caption{Skeleton of Quick Sort}\label{quicksortskeleton} 13 | \begin{algorithmic}[1] 14 | \Input\Statex{Array $A$ with $n$ distinct elements in any order} 15 | \Output\Statex{Array $A$ in sorted order} 16 | \If{$A.len$ == 1} 17 | \State{return} 18 | \Else 19 | \State{$p$ = ChoosePivot($A$)} 20 | \State{Partition $A$ around $p$} 21 | \State{Recursively sort 1st part(on left of $p$)} 22 | \State{Recursively sort 2nd part(on right of $p$)} 23 | \EndIf 24 | \end{algorithmic} 25 | \end{algorithm} 26 | \subsection{Partition Subroutine} 27 | If the in place requirement is thrown away, it is easy to come up with a partition algorithm using $O(n)$ time and $O(n)$ extra memory, as shown in Algorithm \ref{partitionextramemory}. 
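As a concrete point of reference, one possible C++ rendering of this idea is sketched below (a sketch only: it assumes distinct integer elements, the pivot stored at the first position and 0-based indexing; the function name is illustrative, not from the lectures).
\begin{lstlisting}
#include <vector>
using std::vector;

// Partition A around the pivot A[0] using a temporary array.
// Assumes distinct elements. Returns the final index of the pivot.
// O(n) time, O(n) extra memory.
int partitionWithExtraMemory(vector<int>& A) {
    int n = A.size();
    int p = A[0];                  // pivot
    vector<int> temp(n);
    int small = 0, big = n - 1;    // smaller elements fill temp from the left, larger from the right
    for (int i = 1; i < n; ++i) {
        if (A[i] > p) temp[big--] = A[i];
        else          temp[small++] = A[i];
    }
    temp[small] = p;               // here small == big: the pivot lands in its sorted position
    A = temp;                      // copy back
    return small;
}
\end{lstlisting}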
28 | \begin{algorithm}[ht] 29 | \caption{Partition with $O(n)$ Extra Memory}\label{partitionextramemory} 30 | \begin{algorithmic}[1] 31 | \Input\Statex{Array $A$ with $n(n>1)$ distinct elements in any order} 32 | \Statex{Pivot element $p$, put at first position of $A$} 33 | \Output\Statex{Array $A$ partitioned around $p$} 34 | \State{Allocate $temp[n]$} 35 | \State{$small = 1, big = n$} 36 | \For{$i$ = 2 \textbf{to} $n$} 37 | \If{$A[i]>p$} 38 | \State{$temp[big--] = A[i]$} 39 | \Else\Comment{$A[i]1)$ distinct elements in any order} 54 | \Statex{Pivot element $p$, put at first position of $A$} 55 | \Output\Statex{Array $A$ partitioned around $p$} 56 | \State{$i = 2,\:j = 2$} 57 | \For{$k$ = 2 \textbf{to} $n$} 58 | \If{$A[k]i$} 140 | \State{return $RSelect(A_1,i)$}\Comment{$A_1.len=j-1$} 141 | \Else\State{return $RSelect(A_2,i-j)$}\Comment{$ji$}\State{return $DSelect(A_1,i)$}\Comment{$T(?)$ running time to be determined}\label{line1} 185 | \Else\State{return $DSelect(A_2,i-j$)}\label{line2} 186 | \EndIf 187 | \EndFunction 188 | \end{algorithmic} 189 | \end{algorithm} 190 | Two recursive calls are made in Algorithm \ref{dselection}. Let $T(n)$ represent the running time of $DSelect$ on an array of length $n$, then there exists constant $c$ such that 191 | \begin{itemize} 192 | \item $T(1)=1$ 193 | \item $T(n)\leq cn+T(n/5)+T(?)$ 194 | \end{itemize} 195 | in which the running time of the second recursive call needs to be determined. 196 | \begin{lemma} 197 | The 2nd recursive call (line \ref{line1} or \ref{line2}) is guaranteed to be $O(\frac{7}{10}n)$. 198 | \end{lemma} 199 | \begin{proof} 200 | Let $k=n/5$, and let $x_i$ represent the $i^{th}$ smallest of the $k$ medians. Then the final pivot is $x_{k/2}$. Our goal is to prove that at least 30\% of the elements in the input array are smaller than the $x_{k/2}$, and at least 30\% are bigger than it. Let's put all elements of the array in a 2D grid as follow, with each sorted group as a column, and $x_i$ in ascending order. 201 | \begin{table}[H] 202 | \centering 203 | \begin{tabular}{ccccccc} 204 | {\color{red}$\circ$}&{\color{red}$\circ$}&{\color{red}$\cdots$}&{\color{red}$\circ$}&$\cdots$&$\circ$&$\circ$ 205 | \\ 206 | {\color{red}$\circ$}&{\color{red}$\circ$}&{\color{red}$\cdots$}&{\color{red}$\circ$}&$\cdots$&$\circ$&$\circ$ 207 | \\ 208 | {\color{red}$x_1$}&{\color{red}$x_2$}&{\color{red}$\cdots$}&$x_{k/2}$&{\color{blue}$\cdots$}&{\color{blue}$x_{k-1}$}&{\color{blue}$x_k$}\\ 209 | $\circ$&$\circ$&$\cdots$&{\color{blue}$\circ$}&{\color{blue}$\cdots$}&{\color{blue}$\circ$}&{\color{blue}$\circ$}\\ 210 | $\circ$&$\circ$&$\cdots$&{\color{blue}$\circ$}&{\color{blue}$\cdots$}&{\color{blue}$\circ$}&{\color{blue}$\circ$} 211 | \end{tabular} 212 | \end{table} 213 | Obviously, elements in red color are smaller than $x_{k/2}$, while elements in blue color are bigger than it. Both include $\frac{3}{5}\times\frac{1}{2}=30\%$ of the elements. Thus the size of the sub-array is guaranteed to shrink by at least 30\% in the 2nd recursive call, so its running time is $O(\frac{7}{10}n)$. 214 | \end{proof} 215 | \begin{theorem} 216 | $\forall$ input array of size $n$, $DSelect$ runs in $O(n)$ time. 217 | \end{theorem} 218 | \begin{proof} 219 | Now we have 220 | \begin{equation*} 221 | T(n)\leq cn+T\left(\frac{n}{5}\right)+T\left(\frac{7n}{10}\right). 222 | \end{equation*} 223 | The master method is not applicable because the two sub-problems are not of the same size. We will prove $T(n)\leq 10cn$ by induction. 224 | 225 | The base case($n=1$) is trivial. 
Suppose that $T(k)\leq 10ck$ holds for all $k\log(n/2)^{n/2}=\frac{n}{2}\log\frac{n}{2},$$ 249 | which means that the running time of the algorithm is $\Omega(n\log n)$. 250 | 251 | 252 | \end{proof} 253 | \ifx\PREAMBLE\undefined 254 | \end{document} 255 | \fi -------------------------------------------------------------------------------- /DivideConquer.tex: -------------------------------------------------------------------------------- 1 | \ifx\PREAMBLE\undefined 2 | \input{preamble} 3 | \begin{document} 4 | \fi 5 | \chapter{Divide and Conquer Algorithms} 6 | A typical divide-and-conquer solution to a problem consists of the following steps: 7 | \begin{enumerate} 8 | \item Divide the problem into smaller sub-problems. 9 | \item Conquer the sub-problems via recursive calls. 10 | \item Combine solutions of sub-problems into a solution to the original problem, often involving some clean-up work. 11 | \end{enumerate} 12 | \section{Inversion Counting Problem} 13 | We will solve the inversion problem using the divide-and-conquer paradigm. The problem is described as follow. 14 | \begin{description} 15 | \item[Input]An array $A$ containing numbers $1,2,\dots,n$ in some arbitrary order. 16 | \item[Output]Number of inversions in this array, i.e. number of pairs $[i,j]$ such that $iA[j]$. 17 | \end{description} 18 | A geometrical solution to the problem is to draw two parallel series of points, mark one series in the order inside array $A$, and mark the other in the order $1,2,\dots,n$. Connect points marked by the same number, i.e. 1 with 1, 2 with 2, etc, then the number of crossing lines is exactly the number of inversions. 19 | 20 | The inversion number is widely useful in comparison and recommendation systems. A movie rating website wants to compare tastes of its users and recommend to a user movies liked by other users with similar taste to his. One criterion of such comparison is to pick the ratings given by one user to a series of movies and compare them against other users' ratings by calculating the number of inversions. The fewer inversions there are, the more similar their tastes are. 21 | 22 | A brute-force approach is obviously $\Theta(n^2)$. We can do better by applying the divide-and-conquer paradigm. Suppose that the array has been divided into two halves. An inversion $[i,j]$ is called a left inversion if both $i,j$ are in the left half, a right inversion if they are both in the right half, and a split inversion if $i$ is in the left half and $j$ is in the right half. A high-level divide-and-conquer algorithm is provided in Algorithm \ref{inversioncounting}. If \texttt{countSplitInv} can be implemented as $\Theta(n)$, then the whole algorithm will be $\Theta(n\log n)$. 23 | \begin{algorithm}[ht] 24 | \caption{Divide-and-conquer Inversion Counting}\label{inversioncounting} 25 | \begin{algorithmic}[1] 26 | \Input\Statex{Array A} 27 | \Output\Statex{Number of inversions in A} 28 | \Function{count}{Array A} 29 | \If{A.len == 1}\State{return 0} 30 | \Else 31 | \State{x = count(1st half of A)}\Comment{Left inversions.} 32 | \State{y = count(2nd half of A)}\Comment{Right inversions.} 33 | \State{z = countSplitInv(A)}\Comment{Count split inversions.} 34 | \State{return x+y+z} 35 | \EndIf 36 | \EndFunction 37 | \end{algorithmic} 38 | \end{algorithm} 39 | 40 | The implementation of \texttt{countSplitInv} seems quite subtle, but it can actually be developed from merge sort. In Algorithm \ref{inversioncounting}, the subroutine \texttt{count} only counts the number of inversions. 
In addition to that, we rename it with \texttt{sortAndCount} and require that it also sorts the array. Subroutine \texttt{countSplitInv} now becomes \texttt{mergeAndCountSplitInv}. It merges the two sorted sub-arrays into one sorted array and counts the number of split inversions. 41 | 42 | If there exists no split inversion, it must be the case that any element of the left sub-array A is smaller than any element of the right sub-array B. As a result, when merging the two sorted sub-arrays, A will be exhausted before any element of B is put in the result. Once an element of B is chosen during the merging process before A is exhausted, every element left in A forms an inversion with it. Algorithm \ref{countsplitinversion}, developed from Algorithm \ref{mergehalves}%in file Introduction.tex 43 | , uses this idea to carry out the \texttt{mergeAndCountSplitInv} process. It is still $\Theta(n)$, thus our inversion counting algorithm is guaranteed to be $\Theta(n\log n)$. 44 | \begin{algorithm}[ht] 45 | \caption{Merge and Count Split Inversion}\label{countsplitinversion} 46 | \begin{algorithmic}[1] 47 | \Input 48 | \Statex{A = 1st sorted sub-array, of length $\lfloor n/2 \rfloor$} 49 | \Statex{B = 2nd sorted sub-array, of length $\lceil n/2 \rceil$} 50 | \Output 51 | \Statex{C = sorted array of length $n$} 52 | \Statex{numSplitInv = number of split inversions} 53 | \State{i = 1, j = 1, numSplitInv = 0} 54 | \For{k = 1 \textbf{to} $n$} 55 | \If{i $>$ A.len}\Comment{A has been exhausted} 56 | \State{C[k] = B[j++]} 57 | \ElsIf{j $>$ B.len}\Comment{B has been exhausted} 58 | \State{C[k] = A[i++]} 59 | \ElsIf{A[i] $\leq$ B[j]}\Comment{In this case equality is actually impossible.} 60 | \State{C[k] = A[i++]} 61 | \Else\Comment{A[i]$>$B[j]} 62 | \State{C[k] = B[j++]} 63 | \State{numSplitInv += A.len} 64 | \EndIf 65 | \EndFor 66 | \end{algorithmic} 67 | \end{algorithm} 68 | \section{Matrix Multiplication} 69 | Matrix multiplication is an important mathematical problem. 70 | \begin{description} 71 | \item[Input]Two matrices $X,Y$ of dimension $N\times N$. 72 | \item[Output]Product matrix $Z=X\cdot Y$. 73 | \end{description} 74 | The definition of matrix multiplication is $Z_{ij}=\sum\limits_{k=1}^NX_{ik}Y_{kj}$. Calculating the product matrix directly will result in a $\Theta(n^3)$ algorithm.\footnote{Here the input size is $\Theta(n^2)$, rather than $\Theta(n)$.} We will introduce an ingenious divide-and-conquer algorithm developed by Strassen that is more efficient. 75 | 76 | At first sight, it might seem plausible to divide each matrix into 4 sub-matrices of dimension $N/2\times N/2$ in the divide phase of the divide-and conquer process: 77 | \begin{equation*} 78 | XY=\begin{pmatrix}A&B\\C&D\end{pmatrix}\begin{pmatrix}E&F\\G&H\end{pmatrix}= 79 | \begin{pmatrix} 80 | AE+BG&AF+BH\\CE+DG&CF+DH 81 | \end{pmatrix}. 82 | \end{equation*} 83 | However, this division does not make great difference. The algorithm is still $\Theta(n^3)$, as we will prove later. Recall that in the Karatsuba Multiplication algorithm, we reduced the number of products by 1 by applying the Gauss's trick: obtain some products by linear combination of other products, rather than direct multiplication. Since addition/subtraction is generally more efficient than multiplication, it usually pays off to appropriately choose the products to calculate in order to reduce the number of products to calculate. In the naive divide-and-conquer design above, we have to calculate 8 products of sub-matrices. 
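To see the problem quantitatively (anticipating the recurrence analysis formalized by the master method later in this chapter), eight recursive products of half-size matrices plus $\Theta(n^2)$ additions give
\begin{equation*}
T(n)\leq 8\,T(n/2)+O(n^2)\implies T(n)=O(n^{\log_28})=O(n^3),
\end{equation*}
which is no better than computing the product directly from the definition.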
Strassen's brilliant algorithm reduces this number to 7, as shown in Algorithm \ref{strassen}, and ends up with smaller time consumption. 84 | \begin{algorithm} 85 | \caption{Strassen's Matrix Multiplication}\label{strassen} 86 | \begin{algorithmic}[1] 87 | \Input\Statex{Two matrices $X,Y$ of dimension $N\times N$} 88 | \Output\Statex{Matrix product $X\cdot Y$} 89 | \State{Divide the matrices: $X=\begin{pmatrix}A&B\\C&D\end{pmatrix}$, $Y=\begin{pmatrix}E&F\\G&H\end{pmatrix}$} 90 | \State{Recursively calculate 91 | \begin{align*} 92 | P_1&=A(F-H),\:P_2=(A+B)H,\:P_3=(C+D)E,\:P_4=D(G-E)\\ 93 | P_5&=(A+D)(E+H),\:P_6=(B-D)(G+H),\:P_7=(A-C)(E+F)\\ 94 | \end{align*}} 95 | \State{Linearly combine the products in step 2 to obtain $XY$: 96 | \begin{equation*} 97 | XY=\begin{pmatrix} 98 | P_5+P_4-P_2+P_6&P_1+P_2\\ 99 | P_3+P_4&P_1+P_5-P_3-P_7\\ 100 | \end{pmatrix}=\begin{pmatrix} 101 | AE+BG&AF+BH\\CE+DG&CF+DH 102 | \end{pmatrix}\end{equation*}} 103 | \end{algorithmic} 104 | \end{algorithm} 105 | The explanation of its time complexity, as well as that of the naive divide-and-conquer algorithm will be addressed later. 106 | \section{Closest Pair} 107 | The closest pair problem is the first computational geometry problem we meet. 108 | \begin{description} 109 | \item[Input]A set of $n$ points $P=\{p_1,\dots,p_n\}$ on the $\mathbb{R}^2$ plane. For simplicity, we assume that they have distinct $x$ coordinates and $y$ coordinates. 110 | \item[Output]A pair of distinct points $p^*,q^*\in P$ that minimizes the Euclidean distance between two points $d(p,q)$ with $p,q\in P$. 111 | \end{description} 112 | A brute-force algorithm is obviously $\Theta(n^2)$. A divide-and-conquer approach can improve it to $\Theta(n\log n)$. Its subtlety lies, as usual, in the 3rd step: combination of solutions to the sub-problems. As preparation before the divide-and-conquer, we sort the points respectively by $x$ and $y$ coordinates, and note the results as $P_x,P_y$. The sort process is $\Theta(n\log n)$ using merge sort, thus we can obtain a $\Theta(n \log n)$ algorithm as long as the divide-and-conquer process takes no more than $\Theta(n\log n)$. 113 | 114 | The skeleton of the process is shown in Algorithm \ref{closetpair}. \texttt{ClosestSplitPair} remains to be illustrated. It outputs the ``split pair'', i.e. one point in $Q$ and the other in $R$, with minimum distance. 115 | \begin{algorithm}[ht] 116 | \caption{Closest Pair Searching ClosetPair($P_x,P_y$)}\label{closetpair} 117 | \begin{algorithmic}[1] 118 | \Input\Statex{A set of $n$ points $P=\{p_1,\dots,p_n\}$ on $\mathbb{R}^2$, sorted respectively by $x$ and $y$ coordinates as $P_x$ and $P_y$} 119 | \Output\Statex{$p,q$ with minimum Euclidean distance} 120 | \State{Let $Q$ be the left half of $P$ and $R$ be right half of $P$. According to $P_x,P_y$, form $Q_x,Q_y,R_x,R_y$, i.e. $Q,R$ sorted by $x$ and $y$ coordinates.}\Comment{$\Theta(n)$} 121 | \State{$(p_1,q_1)$ = ClosestPair($Q_x,Q_y$)} 122 | \State{$(p_2,q_2)$ = ClosestPair($R_x,R_y$)} 123 | \State{$\delta=\min\{d(p_1,q_1),d(p_2,q_2)\}$} 124 | \State{$(p_3,q_3)$ = ClosetSplitPair($P_x,P_y,\delta$)}\Comment{Should be $O(n)$} 125 | \State{Return the best among $(p_1,q_1),(p_2,q_2)$ and $(p_3,q_3)$} 126 | \end{algorithmic} 127 | \end{algorithm} 128 | 129 | Let $\overline{x}$ represent the largest $x$ coordinate in $Q$, i.e. in the left half of $P$. Since we have $P_x$, $\overline{x}$ can be obtained in $O(1)$ time. 
Define $S_y$ as points in $P$ with $x$ coordinate inside $[\overline{x}-\delta,\overline{x}+\delta]$, sorted by $y$ coordinate. We have the following lemma. 130 | \begin{lemma}\label{lemmaforclosesplitpair} 131 | Let $p\in Q,q\in R$ be the split pair with $d(p,q)<\delta$. Then we must have 132 | \begin{itemize} 133 | \item $p,q\in S_y$; 134 | \item $p,q$ are at most 7 positions away from each other in $S_y$. 135 | \end{itemize} 136 | \end{lemma} 137 | \begin{proof} 138 | Let $p(x_1,y_1)\in Q$, $q(x_2,y_2)\in R$, and we have 139 | $$d(p,q)=\sqrt{(x_1-x_2)^2+(y_1-y_2)^2}<\delta.$$ 140 | Since $\overline{x}$ is the largest coordinate in $Q$, we have 141 | $$x_1\leq \overline{x}\leq x_2.$$ 142 | Thus 143 | \begin{align*} 144 | \lvert x_2-\overline{x}\rvert=x_2-\overline{x}\leq x_2-x_1\leq\sqrt{(x_1-x_2)^2+(y_1-y_2)^2}<\delta,\\ 145 | \lvert x_1-\overline{x}\rvert=\overline{x}-x_1\leq x_2-x_1\leq\sqrt{(x_1-x_2)^2+(y_1-y_2)^2}<\delta. 146 | \end{align*} 147 | Which leads to the conclusion 148 | $$x_1,x_2\in[\overline{x}-\delta,\overline{x}+\delta].$$ 149 | This can be directly translated to the first claim of the lemma: $p,q\in S_y$. Figure \ref{pq7position} helps to prove the 2nd claim. 150 | \begin{figure}[H] 151 | \centering 152 | \includegraphics[width=0.5\textwidth]{closestpointlemma.jpg} 153 | \caption{$p,q$ at most 7 positions away from each other}\label{pq7position} 154 | \end{figure} 155 | 156 | In Figure \ref{pq7position}, we draw 8 $\delta/2\times\delta/2$ grids around $\overline{x}$ and $\min\{y_1,y_2\}$. Since $\lvert y_1-y_2\rvert$\eqref{gausstrick}$>$\eqref{mergesortrecur}. 193 | \subsection{Mathematical Statement} 194 | The master method helps to obtain the running time of an algorithm according to its recurrence relation. It assumes that all sub-problems have equal size, thus it's not applicable to the closest pair problem, in which the left and right halves of the points are not guaranteed to have the same number of points. We also assume that the base case is trivial: $T(n)\leq a$ ($a$ is constant) for all sufficiently small $n$. Consider an algorithm that has the recurrence relation 195 | \begin{equation*} 196 | T(n)\leq a\cdot T(n/b)+O(n^d), 197 | \end{equation*} 198 | in which $a,b,n$ are constants with clear meanings. $a$ is the number of recursive calls, $b$ is the input size shrinkage factor, and $O(n^d)$ describes the amount of work needed to combine the solutions to the sub-problems. $a,b$ are both larger than 1, while $d$ can be as small as 0. Master method provides the form of $T(n)$ in different cases, as expressed in Theorem \ref{mastermethod}. 199 | \begin{theorem}\label{mastermethod} 200 | $T(n)$ can be expressed in big-O notation as follow\footnote{If the recurrence relation is written with = rather than $\leq$, the result will be in $\Theta$ notation.}: 201 | \begin{equation*} 202 | T(n)= 203 | \begin{cases} 204 | O(n^d\log n)&\text{if}\:a=b^d\\ 205 | O(n^d)&\text{if}\:ab^d\\ 207 | \end{cases} 208 | \end{equation*} 209 | \end{theorem} 210 | Now let's look at a few examples. 211 | 212 | For merge sort algorithm \ref{mergesort} with recurrence relation \eqref{mergesortrecur}, we have $a=2,\:b=2,\:d=1$, thus it belongs to case 1 and has running time $O(n\log n)$. 213 | 214 | Consider binary search, which has the recurrence relation 215 | \begin{equation*} 216 | T(n)=T(n/2)+O(1), 217 | \end{equation*} 218 | i.e. $a=1,\:b=2,\:d=0$, hence it is $O(\log n)$. 
219 | 220 | For the integer multiplication algorithm without Gauss's trick \eqref{nogausstrick}, we have $a=4,\:b=2,\:d=1$, ending up with case 3. Thus it has time complexity $O(n^2)$, i.e. the divide-and-conquer approach fails to improve the time consumption in comparison with the primary school method. 221 | 222 | When we take Gauss's trick into account, i.e. with Karatsuba multiplication algorithm \ref{kmultiplication}, in \eqref{gausstrick} we have $a=3,\:b=2,\:d=1$. We are still in case 3 and end up with $O(n^{\log_23})$, which is smaller than $O(n^2)$ but larger than $O(n\log n)$. 223 | 224 | Strassen's algorithm \ref{strassen} for matrix multiplication has the recurrence relation 225 | \begin{equation*} 226 | T(n)=7\cdot T(n/2)+O(n^2), 227 | \end{equation*} 228 | which leaves us in case 3. Its running time is therefore $O(n^{\log_27})$, which is better than $O(n^3)$. 229 | 230 | As an illustration of case 2, consider the recurrence relation 231 | \begin{equation*} 232 | T(n)\leq 2\cdot T(n/2)+O(n^2). 233 | \end{equation*} 234 | With $a=b=d=2$, we are in case 2, and end up with running time $O(n^2)$. In this case, the running time is governed by the work outside the recursive call, i.e. the time spent on combining solutions to the sub-problems dominates the global time consumption. 235 | \subsection{Proof} 236 | We will prove the correctness of the master method in this section. As having been stated above, we assume that the recurrence relation takes the form 237 | \begin{itemize} 238 | \item $T(1)\leq c$ 239 | \item $T(n)\leq a\cdot T(n/b)+cn^d$ 240 | \end{itemize} 241 | It's fine to use the same constant $c$ in both the base case and the recurrence relation because we are using $\leq$. In order to make the process less tedious, we also assume that $n$ is a power of $b$. The argument will be similar to what we did to obtain the running time of merge sort: through analysis on the recursive tree. Note that in this section when we refer to a value of time consumption, we always mean that the actual time consumption is smaller than or equal to this value. 242 | 243 | The recursive tree has $\log_bn+1$ levels, from level 0 (the original problem) to level $\log_bn$ (trivial problem of size 1). At level $j$, there are in total $a^j$ sub-problems, each of size $n/b^j$. For $j\neq\log_bn$, the time consumption at this level is contributed by the $cn^d$ term: 244 | \begin{equation*} 245 | a^j\cdot c\cdot\left(\frac{n}{b^j}\right)^d=c\cdot n^d\cdot\left(\frac{a}{b^d}\right)^j. 246 | \end{equation*} 247 | Summing it up over all levels leads to the result $c\cdot n^d\cdot\sum\limits_{j=0}^{\log_bn-1}\left(\frac{a}{b^d}\right)^j$. At level $\log_bn$, the time consumption is simply the combination of all base cases: 248 | \begin{equation*} 249 | c\cdot a^{\log_bn} 250 | \end{equation*} 251 | The total time consumption is thus 252 | \begin{equation}\label{totaltime} 253 | c\cdot n^d\cdot\sum\limits_{j=0}^{\log_bn-1}\left(\frac{a}{b^d}\right)^j+c\cdot a^{\log_bn}=c\cdot n^d\cdot\sum\limits_{j=0}^{\log_bn}\left(\frac{a}{b^d}\right)^j 254 | \end{equation} 255 | This leads to classified discussion over the value of $\frac{a}{b^d}$. 256 | \begin{enumerate} 257 | \item $a=b^d$. In this case, \eqref{totaltime} becomes 258 | \begin{equation*} 259 | c\cdot n^d(\log_bn+1)=O(n^d\log_bn) 260 | \end{equation*} 261 | \item $ab^d$. 
In this case, \eqref{totaltime} becomes 266 | \begin{equation*} 267 | c\cdot n^d\frac{\left(\frac{a}{b^d}\right)^{\log_bn+1}-1}{\frac{a}{b^d}-1}<\frac{c}{\frac{a}{b^d}-1}n^d\frac{a}{b^d}\left(\frac{a}{b^d}\right)^{\log_bn}=O(a^{\log_bn})=O(n^{\log_ba}) 268 | \end{equation*} 269 | \end{enumerate} 270 | Therefore we have completed the proof of the master method.\hfill$\square$ 271 | 272 | The essential role that $a/b^d$ plays here comes naturally from the meaning of $a,b,d$. Each problem produces $a$ sub-problems in the next level. We call $a$ the \textbf{rate of sub-problem proliferation, abbr. RSP}. Size of each sub-problem shrinks by $b$ times after each recurrence, and the work load shrinks by $b^d$ times, so we call $b^d$ the \textbf{rate of work shrinkage, abbr. RWS}. The three cases of the master method can be interpreted as follow. 273 | \begin{enumerate} 274 | \item RSP = RWS. The amount of work at each level is $cn^d$. With totally $\log_bn$ levels, the problem should be $O(n^d\log_bn)$. 275 | \item RSP $>$ RWS. The amount of work increases with the recursion level $j$. The last level dominates the running time, thus the overall running time is proportional to the number of sub-problems (base cases) in the last level. The problem is $O(a^{\log_bn})$. 276 | \item RSP $<$ RWS. The amount of work decreases with the recursion level $j$. The root level dominates the running time, thus the problem is $O(n^d)$. 277 | \end{enumerate} 278 | \ifx\PREAMBLE\undefined 279 | \end{document} 280 | \fi -------------------------------------------------------------------------------- /Graphs.tex: -------------------------------------------------------------------------------- 1 | \ifx\PREAMBLE\undefined 2 | \input{preamble} 3 | \begin{document} 4 | \fi 5 | \chapter{Graph Primitives} 6 | \textbf{Graphs represent pairwise relationships amongst a set of objects}. The objects are called vertices or nodes. The relationships are called edges or arcs, each connecting a pair of vertices. An edge can be directed or undirected. The set of vertices and the set of edges are noted respectively as $V$ and $E$. Graph is a concept heavily used in reality. Road networks, the web, social networks, precedence constraints are all examples of graphs. 7 | 8 | A connected graph composed of $n$ vertices with no parallel edges has at least $n-1$ and at most $n(n-1)/2$ edges. Let $m$ represent the number of edges. In most applications, $m$ is $\Omega(n)$ and $O(n^2)$. If $m$ is $O(n)$ or close to it, the graph is called a sparse graph, while if $m$ is closer to $O(n^2)$, it's called a dense graph. Yet their delimitation is not strictly clear. 9 | \section{Representation} 10 | \subsection{Adjacent Matrix} 11 | An undirected graph $G$ with $n$ vertices and no parallel edges can be represented by an $n\times n$ 0-1 matrix $A$. $A_{ij}=1$ when and only when $G$ has an $i-j$ edge. Variants of this representation can easily accommodate parallel edges, weighted edges: just let $A_{ij}$ represent the number of parallel edges or the weight of the edge. For directed graphs, $i\rightarrow j$ can be represented by $A_{ij}=1$ and $A_{ji}=-1$. 12 | 13 | Adjacent matrix representation requires $\Theta(n^2)$ space. For a dense graph this is fine, but for a sparse graph it is wasteful. 14 | \subsection{Adjacent Lists} 15 | The adjacent lists representation is composed of 4 ingredients: 16 | \begin{itemize} 17 | \item Array/List of vertices. $\Theta(n)$ space. 18 | \item Array/List of edges. $\Theta(m)$ space. 
19 | \item Each edge points to its end points. $\Theta(m)$ space. 20 | \item Each vertex points to edges incident on it. $\Theta(m)$ space. 21 | \end{itemize} 22 | Adjacent lists representation requires $\Theta(n+m)$ space. 23 | 24 | The choice between the two representations depends on the density of the graph and operations to take. We will mainly use adjacent lists in this chapter. 25 | \section{Minimum Cut} 26 | \subsection{Definition} 27 | \begin{definition} 28 | A cut of a graph ($V,E$) is a partition of $V$ into two non-empty sets $A$ and $B$. 29 | \end{definition} 30 | A graph with $n$ vertices has $2^n-2$ possible cuts. 31 | \begin{definition} 32 | The crossing edges of a cut($A,B$) are those with 33 | \begin{itemize} 34 | \item one endpoint in $A$ and the other in $B$, for undirected graphs; 35 | \item tail in $A$ and head in $B$, for directed graphs. 36 | \end{itemize} 37 | \end{definition} 38 | We will try to solve the minimum cut problem: 39 | \begin{description} 40 | \item[Input]An undirected graph $G=(V,E)$ in which parallel edges are allowed. 41 | \item[Output]A cut $(A,B)$ with minimum number of crossing edges. 42 | \end{description} 43 | A lot of problems in reality can be reduced to a minimum cut problems: 44 | \begin{itemize} 45 | \item Identify weakness point of physical networks; 46 | \item Community detection in social networks; 47 | \item Image segmentation. 48 | \end{itemize} 49 | \subsection{Random Contraction Algorithm} 50 | Algorithm \ref{randomcontraction} provides a random approach to find a cut. 51 | \begin{algorithm}[ht] 52 | \caption{Random Contraction}\label{randomcontraction} 53 | \begin{algorithmic}[1] 54 | \Input\Statex{An undirected graph $G=(V,E)$ in which parallel edges are allowed.} 55 | \Output\Statex{A cut $(A,B)$ with minimum number of crossing edges} 56 | \While{There are more than 2 vertices left} 57 | \State{Pick a remaining edge $(u,v)$ randomly}\label{randomselection} 58 | \State{Merge(or contract) $u,v$ into a single vertex} 59 | \State{Remove self-loops} 60 | \EndWhile 61 | \State{Return cut represented by 2 final vertices} 62 | \end{algorithmic} 63 | \end{algorithm} 64 | \subsection{Probability of Correctness} 65 | Algorithm \ref{randomcontraction} is not guaranteed to always return a minimum cut. We have to iterate the whole algorithm multiple times and choose the cut with minimum number of crossing edges obtained during the process. In order to estimate the number of iterations needed, we will calculate the probability that a specific minimum cut $(A,B)$ is returned in one iteration. 66 | 67 | If one of the crossing edges of $(A,B)$ is selected in step \ref{randomselection}, the minimum cut cannot be returned. If there are $k$ crossing edges in $(A,B)$, the probability that a crossing edge is selected at the first iteration is $k/m$. The number of edges decrease by an indefinite number at each iteration, while the number of vertices decrease by exactly 1 each time. Thus it is preferable to express the probability in terms of $n$ instead of $m$. 68 | 69 | Each vertex $v$ is related to a cut $({v},V-{v})$. The number of crossing edges of this cut is the number of edges incident on $v$, i.e. the degree of $v$. Considering the definition of minimum cut, this number must be larger than $k$. We also know the sum of the degrees of all vertices: $\sum\limits_{v}degree(v)=2m$. Thus we have $2m=\sum\limits_{v}degree(v)\geq kn$. 
Hence the probability that a crossing edge is not selected at the first iteration in step \ref{randomselection} is $$1-k/m\geq 2/n.$$ 70 | 71 | At the second iteration, the same argument still holds. Let $m'$ represent the number of remaining edges after the first iteration. We have $2m'=\sum\limits_{v}degree(v)\geq k(n-1)$. Thus the probability that a crossing edge is selected at the 2nd iteration under the condition that no crossing edge was selected at the 1st iteration is $$1-k/m'\geq 2/(n-1)$$. 72 | 73 | The same argument continues further until the last iteration, i.e. the $(n-2)^{th}$ iteration. Finally, the probability that no crossing edge is selected in the whole process, thus resulting in the minimum cut $(A,B)$ is 74 | \begin{align*} 75 | P(A,B)&\geq\left(1-\frac{2}{n}\right)\left(1-\frac{2}{n-1}\right)\cdots\left(1-\frac{2}{n-(n-3)}\right)\\ 76 | &=\frac{(n-2)\cdot(n-3)\cdots 2\cdot 1}{n\cdot(n-1)\cdots 4\cdot 3}\\ 77 | &=\frac{2}{n(n-1)}>\frac{1}{n^2}. 78 | \end{align*} 79 | The probability seems small, but it already makes great advancement when compared with brute-force method, in which case the probability to obtain $(A,B)$ in an iteration is $\frac{1}{2^n}$. 80 | 81 | After $N$ iterations, the probability that $(A,B)$ has not been found is 82 | \begin{equation*} 83 | P(not\:found)<\left(1-\frac{1}{n^2}\right)^N\leq e^{-\frac{N}{n^2}}. 84 | \end{equation*} 85 | If $N=n^2$, the probability is smaller than $1/e$. If $N=n^2\ln n$, it is smaller than $1/n$. 86 | \subsection{Number of Minimum Cuts} 87 | A graph can have multiple minimum cuts. For example, a tree with $n$ vertices has $n-1$ minimum cuts. We would like to find the largest number of minimum cuts that a graph with $n$ vertices can have. 88 | \begin{theorem} 89 | A graph with $n$ vertices can have at most $\binom{n}{2}$ minimum cuts. 90 | \end{theorem} 91 | \begin{proof} 92 | Consider the graph in which the $n$ vertices form a circle. Removing any two edges results in a minimum cut. Thus in total it has $\binom{n}{2}$ minimum cuts. The rest of the proof aims at proving that a graph with $n$ vertices cannot have more minimum cuts. 93 | 94 | Recall that the probability for Algorithm \ref{randomcontraction} to return a specific minimum cut is bigger than $\frac{2}{n(n-1)}$. Suppose there are $k$ minimum cuts. The events ``Algorithm \ref{randomcontraction} returns minimum cut $C_i$'' and ``Algorithm \ref{randomcontraction} returns minimum cut $C_j$'' are disjoint events when $i\neq j$. Thus we have 95 | $$1\geq P(\text{return a minimum cut})\geq\frac{2k}{n(n-1)}.$$ 96 | Therefore, 97 | $$k\leq \frac{n(n-1)}{2}=\binom{n}{2}.$$ 98 | \end{proof} 99 | \section{Breadth First Search} 100 | Graph search is widely used for various purposes: 101 | \begin{itemize} 102 | \item Check if a network is connected; 103 | \item Find shortest paths, e.g. for driving navigation, or formulating a plan; 104 | \item Calculate connected components. 105 | \item $\dots$ 106 | \end{itemize} 107 | We will introduce a few fast algorithms based on graph search. Graph search usually starts from a source vertex. When searching the graph, we want to find everything that is findable, i.e. every vertex reachable from the source via a path. Moreover, we never explore anything twice. In terms of running time, our goal is $O(m+n).$ 108 | 109 | BFS explores the nodes of a graph in layers. Nodes with the same distances from the source are in the same layer. 
It can be used to compute shortest paths of graphs, and to compute connected components of undirected graphs. It guarantees $O(m+n)$ running time. The general pattern of BFS is shown in Algorithm \ref{bfs}. 110 | \begin{algorithm}[ht] 111 | \caption{Breadth First Search(BFS)}\label{bfs} 112 | \begin{algorithmic}[1] 113 | \Input\Statex{Graph $G$ with all vertices unexplored} 114 | \Statex{Source vertex $s$} 115 | \Output\Statex{$G$ with all vertices reachable from $s$ explored.} 116 | \State{Mark $s$ as explored.} 117 | \State{Let $Q$ = queue initialized with $s$} 118 | \While{$Q\neq\emptyset$} 119 | \State{Remove first element $v$ of $Q$} 120 | \For{each edge $(v,w)$} 121 | \If{$w$ is unexplored} 122 | \State{Mark $w$ as explored} 123 | \State{Add $w$ to $Q$} 124 | \EndIf 125 | \EndFor 126 | \EndWhile 127 | \end{algorithmic} 128 | \end{algorithm} 129 | 130 | At the end of BFS, the fact that a node $v$ is explored means the existence of a path from $s$ to $v$. 131 | \subsection{Shortest Path} 132 | Algorithm \ref{shortestpath} calculates the shortest path from $s$ to any vertex reachable from $s$. 133 | \begin{algorithm}[ht] 134 | \caption{Shortest Path - BFS}\label{shortestpath} 135 | \begin{algorithmic}[1] 136 | \Input\Statex{Graph $G$ with all vertices unexplored} 137 | \Statex{Source vertex $s$} 138 | \Output\Statex{$dist(v)$ for any vertex $v$, i.e. min number of edges on a path from $s$ to $v$} 139 | \State{Initialize $dist(v)=\left\{\begin{array}{rl}0,&if(v==s)\\+\infty,&if(v\neq s)\end{array}\right.$} 140 | \State{Mark $s$ as explored.} 141 | \State{Let $Q$ = queue initialized with $s$} 142 | \While{$Q\neq\emptyset$} 143 | \State{Remove first element $v$ of $Q$} 144 | \For{each edge $(v,w)$} 145 | \If{$w$ is unexplored} 146 | \State{Mark $w$ as explored} 147 | \State{$dist(w)=dist(v)+1$} 148 | \State{Add $w$ to $Q$} 149 | \EndIf 150 | \EndFor 151 | \EndWhile 152 | \end{algorithmic} 153 | \end{algorithm} 154 | 155 | After the algorithm terminates, $dist(v)=i$ means that $v$ is in the $i^{th}$ layer and that the shortest path connecting $s$ and $v$ has $i$ edges. 156 | \subsection{Undirected Connectivity} 157 | \begin{definition} 158 | For an undirected graph $G(V,E)$, connected components are equivalence classes of the equivalence relation\footnote{An equivalence relation on a set must satisfy: 1. $a\sim a$; 2. If $a\sim b$, then $b\sim a$; 3. If $a\sim b$ and $b\sim c$, then $a\sim c$.} $u\sim v$, in which $u,v$ are its vertices and $u\sim v\iff\exists$ path from $u$ to $v$. 159 | \end{definition} 160 | 161 | Identifying connected components of graphs is useful for various purposes: 162 | \begin{itemize} 163 | \item Check if a network is disconnected; 164 | \item Graph visualization; 165 | \item Clustering. 166 | \end{itemize} 167 | When it comes to the calculation of connected component, undirected graphs and directed graphs are significantly different. BFS is an effective method for calculating the connectivity of undirected graphs. Algorithm \ref{ccundirected} computes the CCs of an undirected graph in $O(m+n)$ time. 
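As a concrete illustration, the following C++ sketch labels each vertex with the index of its connected component using the same BFS loop (adjacency lists stored as nested \texttt{std::vector}s; the names are illustrative and not from the lectures):
\begin{lstlisting}
#include <queue>
#include <vector>
using std::queue; using std::vector;

// adj[v] lists the neighbours of vertex v (0-indexed).
// Returns comp, where comp[v] is the id of v's connected component.
vector<int> connectedComponents(const vector<vector<int>>& adj) {
    int n = adj.size();
    vector<int> comp(n, -1);          // -1 marks an unexplored vertex
    int id = 0;
    for (int s = 0; s < n; ++s) {
        if (comp[s] != -1) continue;  // s was reached by an earlier BFS
        comp[s] = id;                 // BFS from s discovers exactly s's component
        queue<int> q;
        q.push(s);
        while (!q.empty()) {
            int v = q.front(); q.pop();
            for (int w : adj[v])
                if (comp[w] == -1) { comp[w] = id; q.push(w); }
        }
        ++id;
    }
    return comp;
}
\end{lstlisting}
Every vertex enters the queue at most once and every edge is inspected at most twice, which gives the claimed $O(m+n)$ bound.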
168 | \begin{algorithm}[ht] 169 | \caption{Connected Components of Undirected Graph - BFS}\label{ccundirected} 170 | \begin{algorithmic}[1] 171 | \Input\Statex{Undirected graph $G$ with all vertices unexplored and labeled 1 to $n$} 172 | \Output\Statex{Connected components of $G$} 173 | \For{$i$ = 1 \textbf{to} $n$} 174 | \If{$i$ not explored} 175 | \State{$BFS(G,i)$}\Comment{discovers $i$'s connected component} 176 | \EndIf 177 | \EndFor 178 | \end{algorithmic} 179 | \end{algorithm} 180 | \section{Depth First Search} 181 | DFS is a more aggressive method to search an graph than BFS. It explores the nodes following the edges as deeply as possible, and only backtracks when necessary. DFS is especially important for dealing with directed graphs. As we are about to demonstrate, it helps to compute topological ordering of directed acyclic graphs and strongly connected components of directed graphs. As with BFS, DFS also runs in $O(m+n)$ time. 182 | 183 | DFS can be implemented by mimicking BFS in Algorithm \ref{bfs}. A stack should be used instead of a queue. A recursive approach is shown in Algorithm \ref{dfs}. 184 | \begin{algorithm}[ht] 185 | \caption{Depth First Search (Recursive)}\label{dfs} 186 | \begin{algorithmic}[1] 187 | \Input\Statex{Graph $G$ with all vertices unexplored} 188 | \Statex{Source vertex $s$} 189 | \Output\Statex{$G$ with all vertices reachable from $s$ explored.} 190 | \Function{$DFS$}{Graph $G$, node $s$} 191 | \State{Mark $s$ as explored} 192 | \For{each edge $(s,v)$} 193 | \If{$v$ not explored} 194 | \State{$DFS(G,v)$} 195 | \EndIf 196 | \EndFor 197 | \EndFunction 198 | \end{algorithmic} 199 | \end{algorithm} 200 | DFS can also be used to calculate connected components of undirected graphs. But we will focus on two applications of DFS that cannot be handled with BFS. 201 | \subsection{Topological Sort} 202 | Topological sort aims at putting the nodes of a directed graph in topological ordering. 203 | \begin{definition} 204 | A topological ordering of a directed graph $G$ is a labeling $f$ of $G$'s nodes among $\{1,2,\dots,n\}$ such that $\forall(u,v)\in G,\:f(u)\max\limits_{i\in C_1}f(i)$ 326 | \end{proof} 327 | 328 | An obvious corollary of Lemma \ref{keylemmascc} is as follow. 329 | \begin{corollary}\label{corollaryscc} 330 | $\max\limits_{v\in V}f(v)$ must lie in a sink SCC, i.e. an SCC that has no outgoing arc. 331 | \end{corollary} 332 | Therefore, by starting from the node with the largest $f$ value in the 2$^{nd}$ loop, we are guaranteed to explore a sink SCC of $G$ first. Nodes in this sink SCC are ruled out from further exploration because there have been marked as explored. Every time we set up a new leader, it is guaranteed to be the node with the largest $f$ value amongst all unexplored nodes. DFS from the leader will reach and will only reach nodes in the same SCC as the leader, which is a sink SCC of the unexplored part of the graph. The SCCs will be found one by one. 333 | \section{Dijkstra's Shortest Path Algorithm} 334 | If all edges in a graph have equal lengths, the shortest path problem can be solved by BFS, as discussed in Algorithm \ref{shortestpath}. Dijkstra's algorithm computes shortest paths when edges have different lengths. 335 | \begin{description} 336 | \item[Input]Directed graph $G(V,E)$. Each edge $e\in E$ has non-negative length $l_e$. A source vertex $s$. 337 | \item[Output]For each $v\in V$, compute the length of shortest path from $s$ to $v$ in $G$. 
338 | \end{description} 339 | For convenience, we assume that there exists a path from $s$ to any vertex in $G$. 340 | \subsection{Algorithm} 341 | Dijkstra's algorithm is shown in Algorithm \ref{dijkstra}. 342 | \begin{algorithm}[ht] 343 | \caption{Dijkstra's Shortest Path Algorithm}\label{dijkstra} 344 | \begin{algorithmic}[1] 345 | \Input\Statex{Directed graph $G(V,E)$. Each edge $e\in E$ has non-negative lengths $l_e$} 346 | \Statex{Source vertex $s$} 347 | \Output\Statex{Shortest path from $s$ to $v$ for all $v\in V$} 348 | \State{Initialize $X=\{s\}$}\Comment{$X$: vertices processed so far} 349 | \State{$A[s]=0$}\Comment{$A[v]$: length of shortest path from $s$ to $v$} 350 | \State{$B[s]= empty\:path$}\Comment{$B[v]$: shortest path from $s$ to $v$} 351 | \While{$X\neq V$} 352 | \State{Among all edges $v\rightarrow w$ with $v\in X,w\notin X$, choose $v^*\rightarrow w^*$ that minimizes $A[v]+l_{vw}$}\Comment{Let's call it ``Dijkstra's greedy score''} 353 | \State{$X\coloneqq X\cup\{w^*\}$} 354 | \State{$A[w^*]\coloneqq A[v^*]+l_{v^*w^*}$} 355 | \State{$B[w^*]\coloneqq B[v^*]+v*\rightarrow w^*$} 356 | \EndWhile 357 | \end{algorithmic} 358 | \end{algorithm} 359 | \subsection{Correctness} 360 | The correctness of Dijkstra's algorithm can be proved by induction. 361 | \begin{proof} 362 | We will try to prove by induction that after each iteration,$\forall v\in X$, $B[v]$ is the shortest path from $s$ to $v$, and $A[v]$ is the length of the shortest path. 363 | 364 | At the beginning, $X=\{s\}$, $A[s]=0$, $B[s]=empty\:path$. Obviously the conclusion is correct. Let's assume that it holds before an iteration, and in this iteration we have chosen the edge $v^*\rightarrow w^*$. In order to add $w^*$ to $X$, we have to prove that $B[v^*]+v^*\rightarrow w^*$ with length $A[v^*]+l_{v^*w^*}$ is the shortest path from $s$ to $w^*$. 365 | 366 | Take any path $S$ from $s$ to $w^*$. It has to cross the boundary between $X$ and $V-X$ somewhere. Suppose the edge from $X$ to $V-X$ is $y\rightarrow z$. This path can be divided into 3 segments: 367 | \begin{enumerate} 368 | \item $S_1$: from $s$ to $y$. According to our assumption, it is at least as long as $A[y]$: $L(S_1)\geq A[y]$. 369 | \item $S_2$: the edge $y\rightarrow z$. $L(S_2)=l_{yz}.$ 370 | \item $S_3$: from $z$ to $w$. All edges have non-negative length, thus $L(S_3)\geq 0.$ 371 | \end{enumerate} 372 | Dijkstra's algorithm guarantees that 373 | $$A[v^*]+l_{v^*w^*}\leq A[y]+l_{yz}.$$ 374 | Thus we have 375 | $$L(S)=L(S_1)+L(S_2)+L(S_3)\geq A[y]+l_{yz}\geq A[v^*]+l_{v^*w^*}.$$ 376 | Therefore, $B[v^*]+v^*\rightarrow w^*$ is the shortest path from $s$ to $w^*$. 377 | \end{proof} 378 | \subsection{Implementation and Running Time} 379 | A naive implementation of Dijkstra's algorithm can take as long as $O(nm)$ time to run: in each iteration, we have to scan through all edges to find $v^*\rightarrow w^*$. In order to speed up the execution, we have to turn to the heap data structure. 380 | 381 | Heap is a data structure designed to perform insertion and extraction of minimum in $O(\log n)$ time. Conceptually, a heap is an almost perfectly balanced binary tree (null leaves are only allowed at the lowest level). The key of each node must be smaller (or equal to) that of its two children. This property guarantees that the node with the smallest key is at the root. Insertion is performed by adding the element behind the last node and bubbling up, while extraction of minimum is performed by swapping the root and the last node and then bubbling down. 
The height of the tree is $O(\log n)$, thus insertion and extraction of minimum can be executed in $O(\log n)$ time. 382 | 383 | In the implementation of Dijkstra's algorithm, we use a heap to store the vertices in $V-X$. The key of a vertex is the smallest Dijkstra's greedy score among edges entering it from $X$, i.e. for $v\in V-X$, $key[v]$ is the smallest value of $A[u]+l_{uv}$ over all $u\in X$. If no such edge $u\rightarrow v$ exists, $key[v]=+\infty.$ In each iteration of Dijkstra's algorithm, we extract the minimum of the heap, denote it by $w$, and add $w$ to $X$; $A[w]$ is then the length of the shortest path from $s$ to $w$. Then for all edges $w\rightarrow v$ with $v\in V-X$, we update the key of $v$: 384 | $$key[v]\coloneqq\min\{A[w]+l_{wv},key[v]\}.$$ 385 | If $key[v]$ is decreased here, we bubble it up in the heap (a key can only get smaller, so the vertex can only move towards the root), which is an $O(\log n)$ operation. In this way the heap gets maintained at each iteration. 386 | 387 | In total, we do $n-1$ extractions of minimum, and at most $m$ key updates (each a bubble-up) in the heap. Each of these operations is $O(\log n)$, thus the total time consumption is $O((m+n)\log n)$. Since every vertex is reachable from $s$ ($\forall v\:\exists$ path from $s$ to $v$), we have $m\geq n-1$, hence $O(m+n)=O(m)$. In conclusion, the running time of Dijkstra's algorithm implemented using a heap is $O(m\log n)$. 388 | \ifx\PREAMBLE\undefined 389 | \end{document} 390 | \fi -------------------------------------------------------------------------------- /NPCompleteness.tex: -------------------------------------------------------------------------------- 1 | \ifx\PREAMBLE\undefined 2 | \input{preamble} 3 | \begin{document} 4 | \fi 5 | \chapter{NP Completeness} 6 | Up to now we have been focusing on problems that can be solved in polynomial time. However, many important problems seem impossible to solve efficiently. We will introduce NP-completeness to formalize the computational intractability of these problems, and introduce some algorithmic approaches to NP-complete problems. We will discuss only deterministic algorithms, but this does not affect the correctness of the conclusions drawn from our discussion: it is not likely that a problem requiring exponential time with a deterministic algorithm can be solved by a randomized algorithm in polynomial time. 7 | \section{NP Complete Problems} 8 | \subsection{The Class P} 9 | \begin{definition} 10 | A problem is \textbf{polynomial-time solvable} if there is an algorithm that correctly solves it in $O(n^k)$ time for some constant $k$, where $n$ is the length of the input. 11 | \end{definition} 12 | \begin{definition} 13 | The class \textbf{P} is defined as the set of all polynomial-time solvable problems. 14 | \end{definition} 15 | Every problem we've seen in the course is in the class P except for the cycle-free shortest path problem for graphs with negative cycles, as well as the knapsack problem. The knapsack problem is not known to be polynomial-time solvable: the running time $O(nW)$ of our dynamic programming algorithm is not polynomial in the input length, which is only proportional to $\log W$. 16 | \subsection{Reductions and Completeness} 17 | Computational intractability can be illustrated via relative difficulty, i.e. by showing that a problem is ``as hard as'' a lot of other problems. This requires the definition of reduction. 18 | \begin{definition} 19 | Problem $\Pi_1$ \textbf{reduces} to problem $\Pi_2$ if, given a polynomial time subroutine to solve $\Pi_2$, $\Pi_1$ can be solved in polynomial time based on it. 20 | \end{definition} 21 | Suppose $\Pi_1$ reduces to $\Pi_2$.
If $\Pi_2\in P$, then $\Pi_1\in P$. The contrapositive is also correct: if $\Pi_1\notin P$, then $\Pi_2\notin P$, i.e. $\Pi_2$ is at least as hard as $\Pi_1$. 22 | \begin{definition} 23 | Let C be a set of problems. A problem $\Pi$ is C-complete if 24 | \begin{enumerate} 25 | \item $\Pi\in C$; 26 | \item Every problem in C reduces to $\Pi$. 27 | \end{enumerate} 28 | \end{definition} 29 | A C-complete problem is the hardest problem in C. 30 | \subsection{Traveling Salesman Problem} 31 | \begin{description} 32 | \Input{A completed undirected graph with non-negative edge costs.} 33 | \Output{A minimum cost tour, i.e. a cycle that visits every vertex exactly once with minimum total cost.} 34 | \end{description} 35 | It has been conjectured for a long time that there exists no polynomial-time algorithm for the TSP problem. In order to demonstrate its computational intractability, we would like to show that it is C-complete for a really big set C. C cannot be the set of all problems. For instance, the Halting problem, i.e. given a program and an input for it, determine whether it will halt, has been proved to be unsolvable with any algorithm. TSP is obviously solvable in finite time via brute-force search. A less ambitious yet more promising idea is to try to prove that TSP is as hard as all brute-force solvable problems. 36 | \subsection{The Class NP} 37 | \begin{definition} 38 | A problem is in the class NP\footnote{NP stands for ``nondeterministic polynomial'', instead of ``not polynomial''.} if 39 | \begin{enumerate} 40 | \item Solutions always have length polynomial in the input size; 41 | \item Purported solutions can be verified in polynomial time. 42 | \end{enumerate} 43 | \end{definition} 44 | As an example, looking for a TSP tour with total cost smaller than $T$ is an NP problem. The original TSP problem reduces to this problem via binary search over the threshold $T$. Also, constraint satisfaction problems, e.g. the 3SAT problem, are NP problems. 45 | 46 | The definition of NP ensures that each problem in NP can be solved by brute-force search in exponential time. The first condition implies that the number of candidate solutions is at most exponential in the input size. According to the second condition, each candidate can be verified in polynomial time, thus brute-force search takes at most exponential time. 47 | 48 | The two conditions in the definition of NP are quite weak, thus NP is a really big class. In fact, the majority of natural computational problems are in NP. By definition of completeness, a polynomial-time algorithm for one NP-complete problem can help to solve every problem in NP in polynomial time, which implies that P=NP. Thus NP-completeness is a strong evidence of computational intractability. 49 | 50 | A lot of NP-complete problems have been recognized, including the TSP problem. In general, in order to prove the NP-completeness of a problem, we should prove that a known NP-complete problem can reduce to it. 51 | 52 | Whether P = NP or not is one of the most important open mathematical questions. It is widely conjectured, but not yet proved that P $\neq$ NP. 53 | \subsection{Algorithmic Approaches to NP-complete Problems} 54 | Up to now no polynomial algorithm has been found for NP-complete problems. There are nonetheless a few useful strategies that can help us beat brute-force search. 
55 | \subsubsection{Focus on computationally tractable special cases.} 56 | For example, the max-weight independent set problem is NP-complete for general graphs, but in P for path graphs and trees. Knapsack problems with polynomial capacity, i.e. $W=O(n)$, can be solved in polynomial time. 2SAT is in P, while 3SAT is NP-complete, etc. 57 | \subsubsection{Heuristics} 58 | Heuristics are fast algorithms that are not always correct. We will introduce a few greedy and dynamic-programming based heuristics for the Knapsack problem. 59 | \subsubsection{Find exponential algorithms better than brute-force search.} 60 | The dynamic programming algorithm that we have introduced is one such example. We will introduce more later. 61 | \section{Exact Algorithms for NP-complete Problems} 62 | \subsection{The Vertex Cover Problem} 63 | \begin{description} 64 | \Input{An undirected graph $G=(V,E)$.} 65 | \Output{A minimum-cardinality \textbf{vertex cover}, i.e. a subset $S\subseteq V$ that contains at least one endpoint of each edge of $G$.} 66 | \end{description} 67 | For instance, the minimum size of a vertex cover for a star graph is 1, while for a complete graph (a ``clique'') it is $n-1$. 68 | \begin{theorem} 69 | A set is a vertex cover if and only if its complement is an independent set. 70 | \end{theorem} 71 | \begin{proof} 72 | A set is a vertex cover \\ 73 | $\iff$ Each edge of the graph is incident to at least one vertex in the set \\ 74 | $\iff$ Each edge of the graph is incident to at most one vertex not in the set, i.e. in its complement\\ 75 | $\iff$ Its complement is an independent set. 76 | \end{proof} 77 | In general, the vertex cover problem is an NP-complete problem. We will take the 3rd approach above, i.e. try to devise an exponential algorithm better than brute-force search. 78 | 79 | Consider a simpler problem: given a positive integer $k$, we would like to check whether or not there exists a vertex cover of size $\leq k$. There are in total $\binom{n}{k}$ candidate solutions. Verifying each candidate takes $O(m)$ time, thus for small $k$, brute-force search takes $O(mn^k)$ time. We can actually do better. 80 | \begin{lemma}\textbf{Substructure Lemma} 81 | Consider an undirected graph $G$, an edge $(u,v)\in G$ and an integer $k\geq 1$. Let $G_u$ represent the subgraph of $G$ obtained by deleting $u$ and its incident edges from $G$, and let $G_v$ be defined similarly. Then $G$ has a vertex cover of size $k$ $\iff$ $G_u$ or $G_v$ has a vertex cover of size $k-1$. 82 | \end{lemma} 83 | \begin{proof} 84 | ($\Leftarrow$) 85 | Suppose $G_u$ has a vertex cover $S$ of size $k-1$. Denote all edges inside $G_u$ as $E_u$ and edges incident to $u$ as $F_u$, then $E=E_u\cup F_u$ and $E_u\cap F_u=\emptyset$, as shown in Figure \ref{vertexcoverfigure}. 86 | \begin{figure} 87 | \centering 88 | \includegraphics[width=0.4\textwidth]{vertexcover1.jpg} 89 | \caption{Vertex Cover Substructure Lemma}\label{vertexcoverfigure} 90 | \end{figure} 91 | Since $S$ contains at least one endpoint of each edge in $E_u$, and $u$ covers every edge in $F_u$, $S\cup\{u\}$ is a vertex cover of $G$ of size $k$. 92 | 93 | ($\Rightarrow$) 94 | Let $S$ be a vertex cover of $G$ of size $k$. For the edge $(u,v)$, at least one of $u,v$ is inside $S$; suppose $u\in S$. For any edge in $E_u$, at least one of its endpoints is in $S$, and it cannot be $u$, thus $S-\{u\}$ is a vertex cover of $G_u$ of size $k-1$. 95 | \end{proof} 96 | According to the substructure lemma, we can compute a vertex cover of size $k$ (if one exists) with Algorithm \ref{vertexcover}.
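Before the pseudocode, here is a compact C++ sketch of this $O(2^km)$ search on an edge-list representation; the \texttt{Edge} alias, the recursion interface and the way the cover is reported are choices made for this illustration rather than part of the lecture.
\begin{lstlisting}
#include <utility>
#include <vector>
using namespace std;
using Edge = pair<int,int>;

// Returns true and fills "cover" if the edges can be covered by at most k vertices.
bool vertexCover(const vector<Edge>& edges, int k, vector<int>& cover) {
  if (edges.empty()) return true;            // nothing left to cover
  if (k == 0) return false;                  // edges remain but the budget is exhausted
  Edge e = edges[0];                         // pick an arbitrary edge (u,v)
  for (int u : {e.first, e.second}) {
    vector<Edge> rest;                       // edges of the graph with u deleted
    for (const Edge& f : edges)
      if (f.first != u && f.second != u) rest.insert(rest.end(), f);
    if (vertexCover(rest, k - 1, cover)) {   // recurse with budget k-1
      cover.insert(cover.end(), u);          // u joins the cover
      return true;
    }
  }
  return false;                              // no vertex cover of size at most k
}
\end{lstlisting}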
97 | \begin{algorithm}[ht] 98 | \caption{Vertex Cover Problem}\label{vertexcover} 99 | \begin{algorithmic}[1] 100 | \Input{Undirected graph $G=(V,E)$.} 101 | \Output{Vertex cover of $G$ of size $k$.} 102 | \If{$k=0$} 103 | \If{$V=\emptyset$} 104 | \State{Return $\emptyset$} 105 | \Else\State{Fail} 106 | \EndIf\EndIf 107 | \State{Pick an arbitrary edge $(u,v)\in E$.} 108 | \State{Recursively search for a vertex cover $S$ of size $k-1$ in $G_u$.} 109 | \State{If found, return $S\cup\{u\}$.} 110 | \State{Recursively search for a vertex cover $S$ of size $k-1$ in $G_v$.} 111 | \State{If found, return $S\cup\{v\}$.} 112 | \State{Fail.}\Comment{$G$ has no vertex cover of size $k$.} 113 | \end{algorithmic} 114 | \end{algorithm} 115 | 116 | There are at most $O(2^k)$ recursive calls, each call takes $O(m)$ time (construction of $G_u$ or $G_v$), thus the overall running time is $O(m2^k)$, which is much better than $O(mn^k)$. 117 | 118 | \subsection{Traveling Salesman Problem} 119 | A brute-force search algorithm takes $O(n!)$ time. We will try to devise an algorithm with better running time. 120 | 121 | For every destination $j\in\{1,2,\dots,n\}$ and every subset $S\subseteq\{1,2,\dots,n\}$ that contains 1 and $j$, let $L_{S,j}$ represent the minimum length of a path from 1 to $j$ that visits each vertex in $S$ exactly once. 122 | \begin{lemma}\textbf{Optimal Substructure Lemma} 123 | Let $P$ be a shortest path from 1 to $j$ that visits each vertex in $S$ exactly once. If the last hop of $P$ is $(k,j)$ and the rest of $P$ is $P'$, then $P'$ is a shortest path from 1 to $k$ that visits each vertex in $S-\{j\}$ exactly once. 124 | \end{lemma} 125 | The proof of the optimal substructure lemma is straightforward. According to the lemma, we have the induction relation 126 | $$L_{S,j}=\min_{k\in S,k\neq j}\{L_{S-\{j\},k}+c_{kj}\}.$$ 127 | Algorithm \ref{tsp} is the dynamic programming algorithm based on this relation. 128 | \begin{algorithm}[ht] 129 | \caption{TSP Problem}\label{tsp} 130 | \begin{algorithmic}[1] 131 | \Input{A completed undirected graph with cost $c_{uv}$ for edge $(u,v)$.} 132 | \Output{$2^n\times n$ 2D array $A$ with $A[S][j]=L_{S,j}$.} 133 | \State{Initialize $A[S][1]=\begin{cases} 134 | 0&if\:S=\{1\}\\ 135 | +\infty&otherwise 136 | \end{cases}$} 137 | \For{$m=2,3,\dots,n$}\Comment{$m$ = subproblem size} 138 | \For{each $S\subseteq\{1,2,\dots,n\}$ with $\lvert S\rvert=m$ and $1\in S$} 139 | \For{each $j\in S$ and $j\neq 1$} 140 | \State{$A[S][j]=\min_{k\in S,k\neq j}\{A[S-\{j\}][k]+c_{kj}\}$} 141 | \EndFor\EndFor\EndFor 142 | \State{Return $\min_{j=2,\dots,n}{A[\{1,2,\dots,n\}][j]+c_{j1}}$} 143 | \end{algorithmic} 144 | \end{algorithm} 145 | 146 | There are $n2^n$ subproblems, each of which takes $O(n)$ time. Thus the overall running time is $O(n^22^n)$. 147 | 148 | \section{Heuristics for NP-Complete Problems} 149 | In this section we will use Knapsack problem as an example to illustrate the use of heuristics to tackle NP-complete problems. 150 | \subsection{Greedy Heuristic} 151 | In the Knapsack problem, we aim at maximize the total value of items under limited budget of total weight. Thus, ideal items are those with big value and small weight. A natural greedy choice is to always select the item with the largest $\frac{v}{w}$ ratio. However, this greedy approach has no guarantee of proximity to the optimal solution at all. For the simple example with $v_1=1,w_1=1,v_2=999,w_2=1000,W=1000$, we will end up with result 1, while the optimal is actually 999. 
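To make the failure mode concrete, here is a small self-contained C++ demo of the pure ratio-greedy rule on the instance above; the \texttt{Item} struct and the function name are invented for this example.
\begin{lstlisting}
#include <algorithm>
#include <iostream>
#include <vector>
using namespace std;

struct Item { double v, w; };

// Greedy-by-ratio: take items in decreasing v/w order while they still fit.
double greedyByRatio(vector<Item> items, double W) {
  sort(items.begin(), items.end(),
       [](const Item& a, const Item& b) { return a.v / a.w > b.v / b.w; });
  double used = 0, value = 0;
  for (const Item& it : items)
    if (used + it.w <= W) { used += it.w; value += it.v; }
  return value;
}

int main() {
  // The instance from the text: v1=1,w1=1 and v2=999,w2=1000, with W=1000.
  vector<Item> items = {{1, 1}, {999, 1000}};
  cout << greedyByRatio(items, 1000) << endl;  // prints 1; the optimum is 999
  return 0;
}
\end{lstlisting}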
152 | 153 | A simple modification to this greedy heuristic can provide a fairly good proximity guarantee, as shown in Algorithm \ref{knapsackgreedyheuristic}. 154 | \begin{algorithm}[ht] 155 | \caption{Refined Greedy Heuristic for Knapsack Problem}\label{knapsackgreedyheuristic} 156 | \begin{algorithmic}[1] 157 | \Input{$n$ items with value $v_i$ and weight $w_i$ for item $i$. Total weight budget $W$ (assume that $w_i\leq W,\forall i$).} 158 | \Output{A subset $S$ of the items that maximizes $\sum_{i\in S} v_i$ subject to $\sum_{i\in S}w_i\leq W$.} 159 | \State{Re-index the items so that $\frac{v_1}{w_1}\geq\frac{v_2}{w_2}\geq\dots\geq\frac{v_n}{w_n}$.} 160 | \State{Put the items into $S'$ in order until one does not fit.} 161 | \State{Let $k$ be the item with the largest value.} 162 | \If{$\sum_{i\in S'}v_i>v_k$} 163 | \State{$S=S'$} 164 | \Else\State{$S=\{k\}$.} 165 | \EndIf 166 | \end{algorithmic} 167 | \end{algorithm} 168 | 169 | The value of the solution provided by Algorithm \ref{knapsackgreedyheuristic} is always $\geq 50\%$ of the value of the optimal solution. 170 | \begin{proof} 171 | Suppose $S'$ contains $m$ items, and imagine that we are allowed to put a fraction of item $m+1$ into $S'$ so that the weight budget $W$ is used up exactly. Call this solution the ``greedy fractional solution''. Obviously this solution is at least as good as the optimal solution: it uses up the whole weight budget with the best possible $\frac{v}{w}$ ratios. 172 | 173 | Now let's consider the result of Algorithm \ref{knapsackgreedyheuristic}. It guarantees two inequalities: 174 | \begin{align*} 175 | \sum_{i\in S}v_i&\geq\sum_{i\in S'}v_i,\\ 176 | \sum_{i\in S}v_i&\geq v_k\geq v_{m+1}. 177 | \end{align*} 178 | Summing up the two inequalities, and noting that the greedy fractional solution consists of $S'$ plus at most the whole of item $m+1$, we have 179 | \begin{align*} 180 | 2\sum_{i\in S}v_i&\geq\sum_{i\in S'}v_i+v_{m+1}\\ 181 | &\geq\text{value of the greedy fractional solution}\\ 182 | &\geq\text{value of the optimal solution.} 183 | \end{align*} 184 | Hence the value of the heuristic solution is at least 50\% of the value of the optimal solution. 185 | \end{proof} 186 | The 50\% proximity guarantee is tight, i.e. it is mathematically impossible to universally prove a better guarantee. Consider an example with $v_1=102,w_1=101,v_2=v_3=100,w_2=w_3=100$ and $W=200$. The optimal value is 200, while the solution provided by the refined greedy heuristic has value 102. 187 | 188 | Nonetheless, for specific inputs, the guarantee can be better. If $w_i\leq\delta W,\forall i$, in which $\delta$ is a small value such as 10\%, $S'$ is sure to use up at least $(1-\delta)$ of $W$, thus the value of the heuristic solution is at least $(1-\delta)$ times that of the optimal solution. 189 | \subsection{DP Heuristic} 190 | A dynamic programming heuristic can provide an arbitrary proximity guarantee specified by the user, making it possible to tune the trade-off between accuracy and running time. However, such a scheme does not exist for all NP-complete problems. 191 | 192 | In the last chapter we introduced a DP algorithm that solves the Knapsack problem in $O(nW)$ time. The subproblems were to calculate the maximal total value with total weight at most $x$ using only the first $i$ items, in which $i=0,1,\dots,n$ and $x=0,1,\dots,W$. There is actually another DP algorithm that solves another set of subproblems: for $i=0,1,\dots,n$ and $x=0,1,\dots,nv_{max}$, calculate the minimum total weight $S_{i,x}$ needed to achieve total value $\geq x$ using only the first $i$ items.
The recurrence relation is 193 | \begin{equation*} 194 | S_{i,x} = \min\{S_{i-1,x}, w_i+S_{i-1,x-v_i}\}, 195 | \end{equation*} 196 | in which $S_{i-1,x-v_i}$ is interpreted as 0 if $v_i\geq x$. The algorithm is shown in Algorithm \ref{knapsackdp2}. The overall running time is $O(n^2v_{max})$: there are $O(n^2v_{max})$ subproblems and each of them takes $O(1)$ time. 197 | \begin{algorithm}[ht] 198 | \caption{Another DP Algorithm for Knapsack}\label{knapsackdp2} 199 | \begin{algorithmic}[1] 200 | \Input{$n$ items with value $v_i$ and weight $w_i$ for item $i$. Total weight budget $W$ (assume that $w_i\leq W,\forall i$).} 201 | \Output{$n\times(nv_{max}+1)$ 2D array $A$ with $A[i][x]=S_{i,x}$.} 202 | \State{Base case: $A[0][0]=0$, $A[0][x]=+\infty$ for $x\neq 0$.} 203 | \For{$i=1,2,\dots,n$} 204 | \For{$x=0,1,\dots,nv_{max}$} 205 | \If{$v_i<x$} 206 | \State{$A[i][x]=\min\{A[i-1][x],w_i+A[i-1][x-v_i]\}$} 207 | \Else\State{$A[i][x]=\min\{A[i-1][x],w_i\}$} 208 | \EndIf\EndFor\EndFor 209 | \State{Return the largest $x$ such that $A[n][x]\leq W$.} 210 | \end{algorithmic} 211 | \end{algorithm} 212 | 213 | The DP heuristic runs this algorithm on rounded values: choose a rounding parameter $m>0$, let $\hat{v_i}=\lfloor\frac{v_i}{m}\rfloor$, and solve the instance with values $\hat{v_i}$ (and the original weights) using Algorithm \ref{knapsackdp2}, which takes $O(n^2\hat{v_{max}})$ time. Let $S$ be the solution obtained with the rounded values, and let $S^*$ be an optimal solution with the original values. Since $m\hat{v_i}\leq v_i<m(\hat{v_i}+1)$, and since $S$ is optimal for the rounded values, we have 215 | \begin{equation}\label{npknapsackeq1} 216 | \sum_{i\in S}v_i\geq m\sum_{i\in S}\hat{v_i}\geq m\sum_{i\in S^*}\hat{v_i}> \sum_{i\in S^*}(v_i-m)\geq \sum_{i\in S^*}v_i-mn. 218 | \end{equation} 219 | In order to achieve $(1-\epsilon)$ proximity, we need to guarantee 220 | \begin{equation}\label{npknapsackeq2} 221 | \sum_{i\in S^*}v_i-\sum_{i\in S}v_i\leq\epsilon\sum_{i\in S^*}v_i. 222 | \end{equation} 223 | According to \eqref{npknapsackeq1}, a sufficient condition for \eqref{npknapsackeq2} is 224 | \begin{equation*} 225 | mn\leq \epsilon\sum_{i\in S^*}v_i, 226 | \end{equation*} 227 | which can be satisfied by setting 228 | \begin{equation*} 229 | m=\frac{\epsilon v_{max}}{n}, 230 | \end{equation*} 231 | because $\sum_{i\in S^*}v_i\geq v_{max}$ (every single item fits in the knapsack by itself). In terms of running time, since $\hat{v_{max}}\leq\frac{v_{max}}{m}=\frac{n}{\epsilon}$, the overall running time is $O(n^2\hat{v_{max}})=O(n^3/\epsilon)$. 232 | 233 | In summary, in order to guarantee $(1-\epsilon)$ proximity to the optimal solution, we should solve the Knapsack instance using $\hat{v_i}=\lfloor\frac{nv_i}{\epsilon v_{max}}\rfloor$ as values, and the running time will be $O(n^3/\epsilon)$. 234 | \section{Local Search} 235 | \subsection{Principles of Local Search} 236 | In this section we will introduce a widely used paradigm to address NP-complete problems: local search. 237 | 238 | Let $X$ be a set of solutions to a problem, for example all cuts of a graph, all possible tours of a TSP problem, all possible variable assignments of a constraint satisfaction problem, etc. For each $x\in X$, specify a subset of $X$ as its neighbors. For a graph cut, its neighborhood can be defined as the set of all cuts obtained by switching the side of one vertex; for a TSP tour, each neighbor differs from it by 2 edges; for a CSP assignment, each neighbor differs from it in the value of a single variable; etc. A general local search algorithm has the following structure: 239 | \begin{enumerate} 240 | \item Let $x$ be some solution. 241 | \item While $x$ has a neighbor $y$ superior to itself, set $x\coloneqq y$. 242 | \item Return the locally optimal solution $x$. 243 | \end{enumerate} 244 | 245 | There can be multiple locally optimal solutions of a problem. If $X$ is finite and each iteration is guaranteed to improve some objective function, then the local search algorithm is guaranteed to terminate and converge to one of the locally optimal solutions. However, it usually does not converge quickly. Neither is it guaranteed to provide a good approximation to the global optimal solution. To mitigate this shortcoming, we can run the local search multiple times with randomly chosen starting points, and return the best locally optimal solution found. Different definitions of neighborhood result in different locally optimal solutions.
In general, a bigger neighborhood leads to fewer locally optimal solutions, but makes it more expensive to verify local optimality, which is a quality-speed trade-off worthy of some tuning. 246 | \subsection{Maximum Cut Problem} 247 | \begin{description} 248 | \Input{An undirected graph $G(V,E)$.} 249 | \Output{A cut $(A,B)$ that maximizes the number of crossing edges.} 250 | \end{description} 251 | If the graph is bipartite, the problem is solvable in linear time via BFS: in each connected component, start from any vertex and mark the vertices reachable from it in an odd number of steps with 1 and the rest with 0; every edge then crosses the resulting cut, so it is maximum. However, in general this is an NP-complete problem. We will introduce a local search algorithm for the problem. 252 | 253 | For any cut $(A,B)$ and a vertex $v$, define 254 | \begin{description} 255 | \item[$c_v(A,B)$]number of edges incident on $v$ that cross $(A,B)$. 256 | \item[$d_v(A,B)$]number of edges incident on $v$ that do not cross $(A,B)$. 257 | \end{description} 258 | Then we have the local search algorithm shown in Algorithm \ref{maxcutlocalsearch}. 259 | 260 | \begin{algorithm}[ht] 261 | \caption{Local Search of Maximum Cut}\label{maxcutlocalsearch} 262 | \begin{algorithmic}[1] 263 | \State{Let $(A,B)$ be an arbitrary cut of $G$.} 264 | \While{there is a vertex $v$ with $d_v(A,B)>c_v(A,B)$} 265 | \State{Move $v$ to the other side of the cut.} 266 | \EndWhile 267 | \State{Return the final cut $(A,B)$.} 268 | \end{algorithmic} 269 | \end{algorithm} 270 | For graphs containing no parallel edges, the algorithm terminates within $\binom{n}{2}$ iterations, because there are at most $\binom{n}{2}$ edges and the number of crossing edges increases by at least 1 in every iteration before the algorithm converges. Hence the algorithm completes in polynomial time. 271 | 272 | In the output cut, we have $d_v(A,B)\leq c_v(A,B),\forall v\in V$. Hence 273 | \begin{align*} 274 | \sum_vd_v(A,B)\leq \sum_vc_v(A,B). 275 | \end{align*} 276 | The lhs counts each non-crossing edge twice, while the rhs counts each crossing edge twice, thus 277 | \begin{align*} 278 | 2\cdot(\text{num of non-crossing edges})&\leq 2\cdot(\text{num of crossing edges})\\ 279 | &=2\cdot(\lvert E\rvert-\text{num of non-crossing edges}). 280 | \end{align*} 281 | As a result, we have 282 | $$\text{num of non-crossing edges}\leq\frac{1}{2}\lvert E\rvert,$$ i.e. Algorithm \ref{maxcutlocalsearch} outputs a cut in which the number of crossing edges is at least $\frac{1}{2}\lvert E\rvert$, which is not at all an impressive performance guarantee, because the expected number of crossing edges of a uniformly random cut is already $\frac{1}{2}\lvert E\rvert$. 283 | 284 | In a more general case, each $e\in E$ has a non-negative weight $w_e$, and the aim of the problem becomes to maximize the total weight of the crossing edges. The local search algorithm is still well defined, and there is a similar 50\% performance guarantee. However, the algorithm is no longer guaranteed to converge in polynomial time, because the number of crossing edges is no longer guaranteed to increase in each iteration. The number of iterations can be exponential. 285 | \subsection{The 2SAT Problem} 286 | \begin{description} 287 | \Input{$n$ Boolean variables $x_i$, $i=1,2,\dots,n$.
$m$ clauses of 2 literals each.} 288 | \Output{Whether there exists an assignment to all $x_i$ that simultaneously satisfies all clauses.} 289 | \end{description} 290 | One example of the clauses with $n=m=4$ is $(x_1\lor x_2)\land(\neg x_1\lor x_3)\land(x_3\lor x_4)\land(\neg x_2\lor\neg x_4)$. 291 | 292 | The 2SAT problem is a special case of the CSP that can be solved in polynomial time\footnote{It can be reduced to computing SCCs of a directed graph. The graph contains 2 vertices $x_i$, $\neg x_i$ for each variable, and two edges $\neg x\rightarrow y,\neg y\rightarrow x$ for each clause $x\lor y$. If $x_i$ and $\neg x_i$ are in the same SCC for any $x_i$, there exists no viable assignment.}. 3SAT, on the contrary, is NP-complete. We will analyze a randomized local search algorithm for the 2SAT problem, shown in Algorithm \ref{twosatlocalsearch}. 293 | \begin{algorithm}[ht] 294 | \caption{Papadimitriou's Local Search 2SAT Algorithm}\label{twosatlocalsearch} 295 | \begin{algorithmic}[1] 296 | \For{$i=1$ \textbf{to} $\log n$} 297 | \State{Choose a random initial assignment.} 298 | \For{$j=1$ \textbf{to} $2n^2$} 299 | \If{current assignment satisfies all clauses} 300 | \State{Return this assignment} 301 | \Else\State{Pick an arbitrary unsatisfied clause and flip the value of one of its two variables, chosen uniformly at random.} 302 | \EndIf\EndFor\EndFor 303 | \State{Report that no satisfying assignment exists.} 304 | \end{algorithmic} 305 | \end{algorithm} 306 | Obviously the algorithm runs in polynomial time, and it is always correct when no satisfying assignment exists. The non-trivial part is its performance guarantee when a satisfying assignment does exist. 307 | 308 | To analyze Papadimitriou's algorithm, we first need to consider the problem of random walks on non-negative integers. Starting at 0, the position goes up or down by 1 at each step with 50/50 probability, except when the current position is 0, in which case the next position is 1 with probability 1. Let $T_n$ represent the number of steps until the random walk reaches position $n$. 309 | \begin{theorem} 310 | $E(T_n)=n^2$ 311 | \end{theorem} 312 | \begin{proof} 313 | Let $Z_i$ represent the number of steps needed to get from $i$ to $n$. Then we have 314 | \begin{align*} 315 | E(Z_n)&=0\\ 316 | E(Z_0)&=E(Z_1)+1\\ 317 | E(Z_i)&=\frac{1}{2}E(Z_{i+1})+\frac{1}{2}E(Z_{i-1})+1,\forall i\geq 1 318 | \end{align*} 319 | Therefore we have 320 | \begin{equation*} 321 | E(Z_i)-E(Z_{i+1})=E(Z_{i-1})-E(Z_i)+2,\forall i\geq 1, 322 | \end{equation*} 323 | which, together with $E(Z_0)-E(Z_1)=1$, leads by induction to 324 | \begin{align*} 325 | E(Z_i)-E(Z_{i+1})=2i+1,\forall i\geq 0. 326 | \end{align*} 327 | As a result, 328 | \begin{align*} 329 | E(Z_0)-E(Z_n)=\sum\limits_{i=0}^{n-1}\left(E(Z_i)-E(Z_{i+1})\right)=n^2. 330 | \end{align*} 331 | Hence $E(T_n)=E(Z_0)=n^2.$ 332 | \end{proof} 333 | \begin{corollary}\label{localsearchcorollary} 334 | $P\left(T_n>2n^2\right)<\frac{1}{2}$ 335 | \end{corollary} 336 | \begin{proof} 337 | \begin{align*} 338 | n^2=E(T_n)&=\sum\limits_{i=0}^{2n^2}iP(T_n=i)+\sum\limits_{i=2n^2+1}^{\infty}iP(T_n=i)\\ 339 | &\geq\sum\limits_{i=2n^2+1}^{\infty}iP(T_n=i)\\ 340 | &>2n^2P\left(T_n>2n^2\right). 341 | \end{align*} 342 | Hence $P\left(T_n>2n^2\right)<\frac{1}{2}.$ 343 | \end{proof} 344 | Now we can prove the following performance guarantee of Papadimitriou's local search algorithm. 345 | \begin{theorem} 346 | For a satisfiable 2SAT instance with $n$ variables, Papadimitriou's algorithm produces a satisfying assignment with probability $\geq 1-\frac{1}{n}$.
347 | \end{theorem} 348 | \begin{proof} 349 | Let's focus on one iteration of the outer loop. 350 | 351 | Let $a^*$ represent an arbitrary satisfying assignment (there can be multiple such assignments), and let $a_t$ represent the assignment after $t$ iterations of the inner loop ($t=1,2,\dots,2n^2$). Let $\chi_t$ represent the number of variables on whose value $a^*$ and $a_t$ agree, thus $\chi_t$ is between 0 and $n$. 352 | 353 | If $\chi_t\neq n$, then there must be at least one unsatisfied clause. Suppose the clause concerns $x_i$ and $x_j$. The consequence of flipping the value of $x_i$ \textbf{or} $x_j$ (choose randomly) is: 354 | \begin{itemize} 355 | \item If $a^*$ and $a_t$ disagree on both $x_i$ and $x_j$: $\chi_{t+1}=\chi_t+1$. 356 | \item If $a^*$ and $a_t$ disagree on one of $x_i$ and $x_j$: 357 | \begin{equation*} 358 | \chi_{t+1}=\begin{cases} 359 | \chi_t+1&(50\%\:possibility)\\ 360 | \chi_t-1&(50\%\:possibility)\\ 361 | \end{cases} 362 | \end{equation*} 363 | \end{itemize} 364 | There is an obvious analogy between the behavior of $\chi_t$ and the position in the random walks problem except: 365 | \begin{enumerate} 366 | \item Sometimes $\chi_t$ increases by 1 with possibility 100\%; 367 | \item The process may terminate before $\chi_t=n$ because there could be other viable assignments; 368 | \item Usually we start with $\chi_1>0$ instead of $\chi_1=0$: the miserable situation in which we start with the exact opposite of a satisfying assignment is rare. 369 | \end{enumerate} 370 | All three differences only make it easier for the process to terminate correctly. Let $T$ represent the number of iterations needed inside each inner loop. According to Corollary \ref{localsearchcorollary}, we have $P(T>2n^2)<\frac{1}{2}$. Thus the probability that one iteration of the outer loop ends up with a satisfying assignment is at least $\frac{1}{2}$. With $\log n$ iterations, the probability that we end up with a correct solution is $\geq 1-\frac{1}{2^{\log n}}=1-\frac{1}{n}.$ 371 | 372 | 373 | \end{proof} 374 | \ifx\PREAMBLE\undefined 375 | \end{document} 376 | \fi -------------------------------------------------------------------------------- /DynamicProgramming.tex: -------------------------------------------------------------------------------- 1 | \ifx\PREAMBLE\undefined 2 | \input{preamble} 3 | \begin{document} 4 | \fi 5 | \chapter{Dynamic Programming} 6 | In this chapter we will introduce the last algorithm design paradigm: dynamic programming. 7 | \section{Max-weight Independent Sets} 8 | Our first example of dynamic programming is a relatively simple graph problem. 9 | \begin{description} 10 | \item[Input]A path graph $G(V,E)$ with non-negative weights on vertices. 11 | \item[Output]An independent set, i.e. a subset of $V$ in which no vertices are adjacent, of maximum total weight. 12 | \end{description} 13 | \begin{center} 14 | \begin{tikzpicture} 15 | \tikzstyle{chosen} = [fill=red!20!, circle, draw=red] 16 | \tikzstyle{normal} = [draw, circle] 17 | \node[normal] (0) at (-4,0) {1}; 18 | \node[chosen] (1) at (-2,0) {4}; 19 | \node[normal] (2) at (0,0) {5}; 20 | \node[chosen] (3) at (2,0) {4}; 21 | \draw (0) -- (1) -- (2) --(3); 22 | \end{tikzpicture} 23 | \end{center} 24 | In the example above, the WIS is obviously the two red nodes. Generally, a brute-force approach takes exponential time. An intuitive greedy algorithm does not guarantee a correct answer: it is actually wrong for the simple example above. 
The divide-and-conquer paradigm cannot be applied because there is no natural correct way to combine solutions to the two sub-problems. This is when dynamic programming comes to our rescue. 25 | 26 | Let's consider the structure of an optimal solution in terms of its relationship with solutions to smaller problems. Let $S\subseteq V$ be a max-weight independent set (IS) of $G$, $v_n$ be the last vertex of the path and $v_{n-1}$ be the last but one vertex. Denote $G$ with $v_n$ deleted as $G'$, and $G$ with $v_n,v_{n-1}$ deleted as $G''$. 27 | \begin{itemize} 28 | \item If $v_n\notin S$, then $S$ must also be a max-weight IS of $G'$, which can be proved easily by contradiction. 29 | \item If $v_n\in S$, then $v_{n-1}\notin S$. It can be proved easily by contradiction that $S-\{v_n\}$ is a max-weight IS of $G''$. 30 | \end{itemize} 31 | Therefore, a max-weight IS of $G$ is either a max-weight IS of $G'$, or a max-weight IS of $G''$ + $v_n$. The same reasoning holds for smaller problems, which induces a correct recursive algorithm: 32 | \begin{enumerate} 33 | \item Recursively compute $S_1$ = max-weight IS of $G'$. 34 | \item Recursively compute $S_2$ = max-weight IS of $G''$. 35 | \item Return $S_1$ or $S_2\cup\{v_n\}$, whichever is better. 36 | \end{enumerate} 37 | The correctness of the algorithm can be verified by induction. However it takes exponential time because it is per se a variant of the brute-force algorithm. 38 | \begin{center} 39 | \begin{tikzpicture}[ 40 | level 1/.style={sibling distance=6cm, level distance=1.5cm}, 41 | level 2/.style={sibling distance=3cm, level distance=1.5cm}, 42 | level 3/.style={sibling distance=2cm, level distance=1.5cm}] 43 | level 4/.style={sibling distance=1cm, level distance=1.5cm}] 44 | \tikzstyle{node} = [draw, circle,minimum size=0.8cm] 45 | \node[node] (0) at (0,0) {n} 46 | child {node[node]{n-1} 47 | child {node[node]{n-2} 48 | child {node[node]{n-3}} 49 | child {node[node]{n-4}} 50 | } 51 | child {node[node]{n-3} 52 | child {node[node]{n-4}} 53 | child {node[node]{n-5}} 54 | } 55 | } 56 | child {node[node]{n-2} 57 | child {node[node]{n-3} 58 | child {node[node]{n-4}} 59 | child {node[node]{n-5}} 60 | } 61 | child {node[node]{n-4} 62 | child {node[node]{n-5}} 63 | child {node[node]{n-6}} 64 | } 65 | }; 66 | \end{tikzpicture} 67 | \end{center} 68 | As shown above, each sub-problem is calculated multiple times. The number of distinct sub-problems is actually $O(n)$. If we can reformulate the recursive algorithm into a bottom-up iterative algorithm, and cache the solution to a sub-problem the first time it is solved, the problem can be solved in linear time, as shown in Algorithm \ref{maxweightis}. 69 | \begin{algorithm}[ht] 70 | \caption{Max-weight Independent Set(DP)}\label{maxweightis} 71 | \begin{algorithmic}[1] 72 | \Input{Path graph $G=(V,E)$ with non-negative weight $w_i$ for each vertex $v_i$. Sub graph composed of the first $i$ vertices is denoted by $G_i$.} 73 | \Output{Array $A$ with $A[i]=$ total weight of max-weight IS of $G_i$.} 74 | \State{$A[0]=0,A[1]=w_1$.} 75 | \For{$i = 2,3,\dots,n$} 76 | \State{$A[i]=\max\{A[i-1], A[i-2]+w_i\}$} 77 | \EndFor 78 | \end{algorithmic} 79 | \end{algorithm} 80 | 81 | Algorithm \ref{maxweightis} only outputs the total weight of the max-weight IS of $G$. The IS itself can be reconstructed according to array $A$, as shown in Algorithm \ref{reconstructionis}. The running time is also $O(n)$. 
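For concreteness, the following C++ sketch combines the forward pass of Algorithm \ref{maxweightis} with the backward reconstruction of Algorithm \ref{reconstructionis}; the 1-indexed weight array and the return type are choices made for this example.
\begin{lstlisting}
#include <algorithm>
#include <vector>
using namespace std;

// w[1..n] holds the vertex weights of the path graph; w[0] is unused.
// Returns the indices of a max-weight independent set.
vector<int> maxWeightIS(const vector<long long>& w) {
  int n = (int)w.size() - 1;
  vector<long long> A(n + 1, 0);
  if (n >= 1) A[1] = w[1];
  for (int i = 2; i <= n; ++i)             // forward pass: fill the table
    A[i] = max(A[i - 1], A[i - 2] + w[i]);
  vector<int> S;
  int i = n;
  while (i >= 2) {                         // backward pass: reconstruction
    if (A[i - 1] >= A[i - 2] + w[i]) {
      --i;                                 // vertex i is excluded
    } else {
      S.insert(S.end(), i);                // vertex i is included, skip vertex i-1
      i -= 2;
    }
  }
  if (i == 1) S.insert(S.end(), 1);        // the first vertex is included
  return S;
}
\end{lstlisting}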
82 | 83 | \begin{algorithm}[ht] 84 | \caption{Reconstruction of Max-weight Independent Set}\label{reconstructionis} 85 | \begin{algorithmic}[1] 86 | \Input{Array $A$ computed in Algorithm \ref{maxweightis}.} 87 | \Output{The max-weight IS $S$ of path graph $G$.} 88 | \State{Initialize $S=\emptyset$, $i=n$.} 89 | \While{$i\geq 2$} 90 | \If{$A[i-1]0$ or $j>0$} 205 | \If{$i==0$} 206 | \State{Align all $j$ left characters in $Y$ align with a gap and return} 207 | \ElsIf{$j==0$} 208 | \State{Align all $i$ left characters in $X$ align with a gap and return} 209 | \ElsIf{$A[i][j]==A[i-1][j-1]+\alpha_{ij}$} 210 | \State{Align $x_i$ with $y_j$} 211 | \State{$i=i-1,j=j-1$} 212 | \ElsIf{$A[i][j]==A[i][j-1]+\alpha_{gap}$} 213 | \State{Align $y_j$ with a gap} 214 | \State{$j=j-1$} 215 | \Else\State{Align $x_i$ with a gap} 216 | \State{$i=i-1$} 217 | \EndIf\EndWhile 218 | \end{algorithmic} 219 | \end{algorithm} 220 | \section{Optimal Binary Search Trees} 221 | In a BST, the time consumption of searching for an item $i$ is $O(d_i)$, in which $d_i$ is the depth of $i$ in the BST. If the search for each item is equally likely to occur, the ideal BST should be balanced. Nonetheless, if we have knowledge of the likelihood for each item to be searched for, the optimal BST is not necessarily balanced. More frequently visited items should be put closer to the root. In Huffman code problem, we aimed at minimizing the average coding length; here, our target is to minimize the average search time. 222 | \begin{description} 223 | \Input{Frequencies $p_i$ for items $i=1,2,\dots,n$. } 224 | \Output{An optimal BST that minimizes the average search time 225 | $C(T)=\sum\limits_{i}p_i(d_i+1).$} 226 | \end{description} 227 | For Huffman coding, a bottom-up greedy algorithm efficiently solves the problem. But here either bottom-up or top-down greedy algorithm cannot guarantee an optimal solution. 228 | 229 | Suppose the optimal BST $T$ has left child tree $T_1$, right child tree $T_2$ and root $r$. Items $\{1,\dots,r-1\}$ are contained in $T_1$, while $\{r+1,\dots,n\}$ lie in $T_2$. Then we have 230 | \begin{align*} 231 | C(T)&=\sum\limits_{i=1}^np_i(d_i+1)=p_r + \sum\limits_{i=1}^{r-1}p_i(d_i+1) + \sum\limits_{i=r+1}^{n}p_i(d_i+1)\\ 232 | &=p_r+\sum\limits_{i=1}^{r-1}p_i(d_{1i}+1 + 1) + \sum\limits_{i=r+1}^{n}p_i(d_{2i}+1 + 1)\\ 233 | &=p_r+C(T_1)+\sum\limits_{i=1}^{r-1}p_i + C(T_2)+\sum\limits_{i=r+1}^{n}p_i\\ 234 | &=C(T_1)+C(T_2)+\sum\limits_{i=1}^{n}p_i. 235 | \end{align*} 236 | Thus it can be proved by contradiction that $T_1$ must be optimal for $\{1,\dots,r-1\}$, and $T_2$ must be optimal for $\{r+1,\dots,n\}$. 237 | \begin{lemma}\textbf{Optimal Structure Lemma} 238 | If $T$ is an optimal BST for the keys $\{1,\dots,n\}$ with root $r$, then its subtrees $T_1$ and $T_2$ must be optimal BSTs respectively for $\{1,\dots,r-1\}$ and $\{r+1,\dots,n\}$. 239 | \end{lemma} 240 | For $1\leq i\leq j\leq n$, let $C_{ij}$ represent the average search time of an optimal BST for items $\{i,\dots,j\}$. According to the optimal structure lemma, we can set up the recurrence relation: 241 | \begin{equation*} 242 | C_{ij}=\min\limits_{r=i}^j\left(C_{i,r-1}+C_{r+1,j}\right)+\sum\limits_{k=i}^jp_k, 243 | \end{equation*} 244 | which leads to the DP algorithm \ref{optimalbstdp} to solve the problem. 245 | \begin{algorithm}[ht] 246 | \caption{Optimal BST(DP)}\label{optimalbstdp} 247 | \begin{algorithmic}[1] 248 | \Input{Frequencies $p_i$ for items $i=1,2,\dots,n$. 
} 249 | \Output{$n\times n$ 2D array $A$ with $A[i][j]$ representing optimal average search time for items $\{i,\dots,j\}$. } 250 | \State{Initialize $A[i][i]=p_i$ for $i=1,\dots,n$.}\Comment{Base case: single node.} 251 | \For{$s=0$ \textbf{to} $n-1$}\Comment{$A[i][j]=0$ if $i>j$ or $i,j$ out of bound. } 252 | \For{$i=1$ \textbf{to} $n-s$} 253 | \State{$A[i][i+s]=\min\limits_{r=i}^{i+s}\left(A[i][r-1]+A[r+1][i+s]\right)+\sum\limits_{k=i}^{i+s}p_k$} 254 | \EndFor\EndFor 255 | \end{algorithmic} 256 | \end{algorithm} 257 | 258 | In total there are $O(n^2)$ sub-problem, and each requires $O(j-i)$ time. Hence the overall running time is $O(n^3)$. Nonetheless it has been proven that an optimized version of the algorithm takes only $O(n^2)$ time. 259 | \section{Bellman Ford Algorithm} 260 | We've already introduced Dijkstra's algorithm to solve the shortest path problem when edge lengths are non-negative. Bellman Ford algorithm comes to our rescue when there exist edge lengths with negative lengths. Also, it provides a distributed alternative to Dijkstra's algorithm, which is needed to solve the Internet routing problem. 261 | \begin{description} 262 | \Input{Directed graph $G(V,E)$ with edge lengths $c_e$ for each $e\in E$ and source vertex $s\in V$. } 263 | \Output{For every destination $v\in V$, compute the length of a shortest $s-v$ path.} 264 | \end{description} 265 | We face a dilemma when it comes to negative cycles: if we take them into account in the search for shortest paths, a lot of vertices will end up with shortest paths with length $-\infty$. If we require that they should not be included in the paths, the problem becomes unsolved in polynomial time, i.e. NP hard. For the moment, we just assume that they do not exist in the graphs. Later we will introduce a criteria that detects negative cycles with little increase in the amount of workload. 266 | \subsection{Algorithm} 267 | If there exists no negative cycles, then the shortest $s-v$ path for any vertex $v$ contains at most $n-1$ edges, because any path containing $n$ edges is sure to contain a cycle, and we can always get rid of the cycle, thus reducing the total cost, without breaking the reachability from $s$ to $v$. This inspires us of the definition of the subproblems in the dynamic programming paradigm: a shortest path from $s$ to $v$ with $i$ edges, in which $i=0,1,\dots,n-1$. We have the following lemma correct even for graphs with negative cycles. 268 | \begin{lemma} 269 | Let $G(V,E)$ be a directed graph with edge length $c_e$ for edge $e$ and source vertex $s$. For any vertex $v$, let $P$ represent the shortest path from $s$ to $v$ with at most $i$ paths. Then one of the two cases must be true. 270 | \begin{description} 271 | \item[case 1]If $P$ has $\leq(i-1)$ edges, then it is also a shortest $s-v$ path with $\leq(i-1)$ edges. 272 | \item[case 2]If $P$ contains $i$ edges with $(w,v)$ being the last hop, then $P'=P-(w,v)$ must be the shortest $s-w$ path with $\leq(i-1)$ edges. 273 | \end{description} 274 | \end{lemma} 275 | The lemma can be easily proved by contradiction. It can serve as the recursion relation for our dynamic programming algorithm. The number of candidates for the solution to the subproblem for vertex $v$ of size $i$ , whose length we will denote as $L_{i,v}$, is $1+in-degree(v)$: solution to the subproblem with size $i-1$, and every incident edge $(w,v)$ + solution to the subproblem for $w$ of size $i-1$, i.e. 
276 | \begin{equation*} 277 | L_{i,v} = \min\left\{L_{i-1,v}, \min\limits_{(w,v)\in E}\left(L_{i-1,w}+c_{wv}\right)\right\},\:\forall v\in V. 278 | \end{equation*} 279 | \begin{algorithm}[ht] 280 | \caption{Bellman Ford Algorithm}\label{bellmanford} 281 | \begin{algorithmic}[1] 282 | \Input{Directed graph $G(V,E)$ with edge length $c_e$ for all $e$ and source vertex $s$.} 283 | \Output{$n\times n$ 2D array $A$ with $A[i][v]=L_{i,v}$, in which $i=0,1,\dots,n-1$, $v\in V$.} 284 | \State{Initialize $A[0][s]=0$ and $A[0][v]=+\infty$ for all $v\neq s$.} 285 | \For{$i=1$ \textbf{to} $n-1$} 286 | \For{each $v\in V$} 287 | \State{$A[i][v]=\min\left\{A[i-1][v], \min\limits_{(w,v)\in E}\left(A[i-1][w]+c_{wv}\right)\right\}$} 288 | \EndFor\EndFor 289 | \end{algorithmic} 290 | \end{algorithm} 291 | 292 | As long as $G$ contains no negative cycle, $A[n-1][v]$ is guaranteed to be the length of the shortest $s-v$ path for any $v\in V$. The running time of Algorithm \ref{bellmanford} is $O(mn)$, because there are $n$ iterations in the outer loop, and each edge is examined exactly once in the inner loop. Note that if or some $jw_j$ and $l_i>l_j$? This can be achieved by assigning scores to jobs, and scheduling jobs with higher scores in front. The score has to increase with weight and decrease with length. Two intuitive choices are 27 | \begin{itemize} 28 | \item $w_j-l_j$ 29 | \item $w_j/l_j$ 30 | \end{itemize} 31 | A simple 2-job case with $l_1=5,\allowbreak w_1=3$ and $l_2=2,\allowbreak w_2=1$ rules the first option out. We will try to prove the correctness of the second option, whose correctness is absolutely not trivial. 32 | \begin{proof} 33 | First we assume that all jobs have distinct scores, i.e. $w_i/l_i\neq w_j/l_j$ for $i\neq j$. This case can be addressed via contradiction. 34 | 35 | The $n$ jobs can be renamed so that $\frac{w_1}{l_1}>\frac{w_2}{l_2}>\dots>\frac{w_n}{l_n}$. According to the rule above, the optimal order should be $1,2,\dots,n$. Suppose that there exists an order superior to this one. In this order, there must exist at least one pair of consecutive jobs $i,j$ such that $i>j$ but $i$ is behind $j$. If we exchange $i,j$, the completion time of $i$ will decrease by $l_j$, whilst that of $j$ will increase by $l_i$. In total, the weighted sum of completion times decreases by 36 | $$w_il_j-w_jl_i.$$ 37 | Since $i>j$, we must have $\frac{w_i}{l_i}>\frac{w_j}{l_j}$, thus $w_il_j>w_jl_i$. In conclusion, we have obtained a better order than the one supposed to be the optimal, which negates the initial assumption. 38 | 39 | With similar argument, the correctness of the algorithm can be verified for the general case with possible ties in score. In an arbitrary order of the jobs, a consecutive pair $(i,j)$ with $i>j$ and $i$ behind $j$ can be called an inversion\footnote{The definition of inversion here is different from that in Algorithm \ref{inversioncounting}.} The number of inversions is at most $\frac{n(n-1)}{2}$, and the only order without any inversion is $1,2,\dots,n$. Since we have $w_il_j\geq w_jl_i$ for $i>j$, exchanging an inversion is guaranteed not to increase the weighted sum of completion times. Each exchange decreases the number of inversions strictly by one. Thus after at most $\frac{n(n-1)}{2}$ exchanges, we must arrive at the order $1,2,\dots,n$ with no increase of the weighted sum. Therefore $1,2,\dots,n$ is at least as good as any other order in terms of weighted sum of completion times, making it a guaranteed optimal solution to the scheduling problem. 
40 | \end{proof} 41 | \section{Minimum Spanning Tree} 42 | Minimum spanning tree is a problem to which there exist a bunch of correct and fast greedy solutions. We will discuss two of them: Prim's algorithm and Kruskal's algorithm. 43 | \begin{description} 44 | \item[Input]An undirected graph $G(V,E)$ with a possibly negative cost $c_e$ for each $e\in E$. 45 | \item[Output]A minimum cost tree $T\subseteq E$ that spans all vertices, i.e. connected subgraph $(V,T)$ that contains no cycles with minimum sum of edge costs. 46 | \end{description} 47 | In order to facilitate the discussion, we assume that graph $G$ is connected, and that edge costs are distinct, although Prim and Kruskal remain correct for ties in edge costs. 48 | \subsection{Prims' Algorithm} 49 | Prim's MST algorithm is shown in Algorithm \ref{primmst}. 50 | \begin{algorithm}[ht] 51 | \caption{Prim's MST Algorithm}\label{primmst} 52 | \begin{algorithmic}[1] 53 | \Input{Undirected graph $G(V,E)$ with distinct cost $c_e$ for all $e\in E$.} 54 | \Output{MST of $G$} 55 | \State{Initialize $X=\{s\}$, $s\in V$ chosen arbitrarily} 56 | \State{Initialize $T=\emptyset$} 57 | \While{$X\neq V$} 58 | \State{Let $e=(u,v)$ be the cheapest edge of $G$ with $u\in X, v\notin X$} 59 | \State{Add $e$ to $T$} 60 | \State{Add $v$ to $X$} 61 | \EndWhile 62 | \end{algorithmic} 63 | \end{algorithm} 64 | \subsubsection{Correctness} 65 | We will prove its correctness in two steps. First, we verify that it does compute a spanning tree $T^*$. Then we prove that $T^*$ is an MST. 66 | \begin{lemma}\textbf{(Empty Cut Lemma)}\label{emptycutlemma} 67 | A graph is not connected if and only if $\exists$ cut $(A,B)$ with no crossing edges. 68 | \end{lemma} 69 | \begin{proof} 70 | $(\Leftarrow)$The proof is trivial. Just take vertex $u\in A$ and $v\in B$. There cannot exist any edges between $u,v$, thus the graph is not connected. 71 | 72 | $(\Rightarrow)$Suppose there exists no path between $u,v$. Take $A=\{$All vertices reachable from $u\}$, i.e. the connected component of $u$, $B=\{$All other vertices$\}$, i.e. other connected components. Then there exists no crossing edges of the cut $(A,B)$. 73 | \end{proof} 74 | \begin{lemma}\label{doublecrossinglemma} 75 | \textbf{(Double Crossing Lemma)} 76 | Suppose the cycle $C\subseteq E$ has an edge crossing the cut $(A,B)$, then there must exist some other edge $e\in C$ that crosses the same cut. 77 | \end{lemma} 78 | \begin{corollary} 79 | \textbf{(Lonely Cut Corollary)}\label{lonelycutcorollary} 80 | If $e$ is the only edge crossing a cut $(A,B)$, then it is not contained in any cycle. 81 | \end{corollary} 82 | With the lemmas and the corollaries above, we can prove that Prim's algorithm outputs a spanning tree. 83 | \begin{proof} 84 | It can be proved by induction that Prim's algorithm maintains the invariant that $T$ spans $X$. The proof of connectivity is trivial. No cycle can be created in $T$ because each time an edge $e$ is added into $T$, it becomes the only crossing edge of the cut $(X,\{v\})$ of $T$, and therefore cannot be contained in a cycle according to Corollary \ref{lonelycutcorollary}. 85 | 86 | The algorithm cannot get stuck when $X\neq V$, because otherwise the cut $(X,V-X)$ must be empty, and according to Lemma \ref{emptycutlemma}, the graph would be disconnected. 87 | 88 | As a conclusion, Prim's algorithm is guaranteed to output a spanning tree of the original graph. 89 | \end{proof} 90 | The second part of the proof is based on the cut property. 
91 | \begin{theorem}\label{cutproperty} 92 | \textbf{(Cut Property)} 93 | Consider an edge $e$ of graph $G$. If $\exists$ cut $(A,B)$ such that $e$ is the cheapest crossing edge of the cut, then $e$ belongs to the\footnote{We use ``the'' rather than ``a'' because the MST is unique if edge costs are distinct.} MST of $G$. 94 | \end{theorem} 95 | \begin{proof} 96 | The cut property can be proved by exchange argument. 97 | 98 | Suppose there is an edge $e$ that is the cheapest crossing edge of a cut $(A,B)$, yet $e$ is not in the MST $T^*$. As shown in Figure \ref{proofcutproperty}, in which all blue edges form the minimum spanning tree $T^*$, and the minimum crossing edge $e$ of $(A,B)$ is not contained in $T^*$. 99 | \begin{figure}[ht] 100 | \centering 101 | \begin{tikzpicture} 102 | \tikzstyle{nodestyle} = [circle, draw=blue, fill=blue!20!] 103 | \tikzstyle{usualedge} = [blue, very thick] 104 | \tikzstyle{emphedge} = [red, very thick] 105 | \tikzstyle{label} = [midway, above] 106 | \node (A) at (-2,0.5) {A}; 107 | \node (B) at (2,0.5) {B}; 108 | \node[nodestyle](1) at (-2,0){1}; 109 | \node[nodestyle](2) at (-2,-1){2}; 110 | \node[nodestyle](3) at (-2,-2){3}; 111 | \node[nodestyle](4) at (2,0){4}; 112 | \node[nodestyle](5) at (2,-1){5}; 113 | \node[nodestyle](6) at (2,-2){6}; 114 | \draw[usualedge] (1) -- (2); 115 | \draw[usualedge] (2) -- (3); 116 | \draw[usualedge] (1) -- (4) node[label] {f}; 117 | \draw[emphedge] (2) -- (5) node[label] {e}; 118 | \draw[usualedge] (3) -- (6) node[label] {e'}; 119 | \draw[usualedge] (5) -- (6); 120 | \draw (2) ellipse (1cm and 2cm); 121 | \draw (5) ellipse (1cm and 2cm); 122 | \end{tikzpicture} 123 | \caption{Proof of cut property}\label{proofcutproperty} 124 | \end{figure} 125 | 126 | We cannot exchange $e$ with a random crossing edge of the cut $(A,B)$. In this example, if we exchange $e$ with $f$, we no longer have a spanning tree. Rather, if $e$ is exchanged with $e'$, we obtain a spanning tree with smaller cost than $T^*$. Our task is to prove that such an edge $e'$ always exists. 127 | 128 | Since $T^*$ is a spanning tree, $T^*\cup\{e\}$ must contains a cycle that includes $e$. According to Lemma \ref{doublecrossinglemma}, there must exist another edge $e'$ that crosses the cut $(A,B)$. According to the assumption, $e'$ must be more expensive than $e$. By substituting $e'$ with $e$ in $T^*$, we obtain a spanning tree $(T^*-\{e'\})\cup\{e\}$ with smaller cost than $T^*$, which contradicts with our assumption that $T^*$ is the MST. 129 | \end{proof} 130 | 131 | According to the cut property, each edge selected in Prim's algorithm is guaranteed to be part of the MST. Since we obtain a spanning tree in the end, it must be the MST. 132 | \subsubsection{Implementation} 133 | A brute-force implementation of Prim's algorithm has $O(nm)$ running time. In each iteration, all edges going out of $X$ needs to be scanned and the cheapest among them is chosen. The scanning process is $O(m)$, and there are totally $n$ iterations, thus the overall running time is $O(nm)$. This may not seem fast, but considering the fact than there exist $2^m$ possible sub-graphs among which the MST has to be selected, it already provides plenty of performance amelioration. 134 | 135 | Using heap can make Prim's algorithm run even faster. A straightforward idea is to use the heap to store crossing edges of the cut $(X,V-X)$, as shown in Algorithm \ref{primmstheap1}. 
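As one possible concrete rendering of this idea, the C++ sketch below keeps the crossing edges in an ordered \texttt{std::set} and discards stale entries lazily when they are extracted, instead of deleting them explicitly as Algorithm \ref{primmstheap1} does; the adjacency-list types and the fact that only the total cost is returned are simplifications made for this example.
\begin{lstlisting}
#include <set>
#include <tuple>
#include <utility>
#include <vector>
using namespace std;

// adj[u] holds (cost, neighbor) pairs; vertices are 0..n-1; the graph is connected.
// Returns the total cost of the MST; the chosen edges could be recorded analogously.
long long primMST(const vector<vector<pair<long long,int>>>& adj) {
  int n = adj.size();
  vector<bool> inX(n, false);
  set<tuple<long long,int,int>> H;  // ordered set acting as a heap of edges (cost, u, v)
  long long total = 0;
  inX[0] = true;                    // arbitrary start vertex s = 0
  for (const auto& e : adj[0]) H.insert({e.first, 0, e.second});
  for (int added = 1; added < n; ) {
    auto [c, u, v] = *H.begin();    // cheapest stored edge
    H.erase(H.begin());
    if (inX[v]) continue;           // stale entry: both endpoints already in X
    inX[v] = true;                  // (u,v) is the cheapest crossing edge
    total += c;
    ++added;
    for (const auto& e : adj[v])
      if (!inX[e.second]) H.insert({e.first, v, e.second});
  }
  return total;
}
\end{lstlisting}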
136 | \begin{algorithm}[ht] 137 | \caption{Prim's MST Algorithm, Heap Implementation 1}\label{primmstheap1} 138 | \begin{algorithmic}[1] 139 | \Input{Undirected graph $G(V,E)$ with distinct cost $c_e$ for all $e\in E$.} 140 | \Output{MST of $G$} 141 | \State{Initialize an empty heap $H$ and an empty set of nodes $X$.} 142 | \State{Randomly select a node $s$ and add it to $X$.} 143 | \State{Insert all edges $(s,v)$ into the heap.} 144 | \While{$X\neq V$} 145 | \State{Add edge $e=H.extractMin()$ to $T$.} 146 | \State{Add the node $n$ of $e$ not in $X$ to X.} 147 | \For{each edge $(n,v)\in E$} 148 | \If{$v\in V-X$} 149 | \State{Insert $(n,v)$ into $H$.} 150 | \ElsIf{$(n,v)\in H$} 151 | \State{Delete $(n,v)$ from $H$.} 152 | \EndIf 153 | \EndFor 154 | \EndWhile 155 | \end{algorithmic} 156 | \end{algorithm} 157 | Each edge can be inserted into and deleted from the heap at most once respectively, thus the overall running time is $O(m\log m)$. 158 | 159 | The heap can also be used to store vertices in $V-X$, with the key of node $v$ being the cost of the cheapest edge $(u,v)$ with $u\in X$, or $\infty$ if such edge does not exist. The implementation is shown in Algorithm \ref{primmstheap2}. 160 | \begin{algorithm}[ht] 161 | \caption{Prim's MST Algorithm, Heap Implementation 2}\label{primmstheap2} 162 | \begin{algorithmic}[1] 163 | \Input{Undirected graph $G(V,E)$ with distinct cost $c_e$ for all $e\in E$} 164 | \Output{MST of $G$} 165 | \State{Initialize heap $H$ with all nodes in $V$. Obviously all nodes have key $\infty$.} 166 | \State{Initialize empty set of nodes $X$.} 167 | \While{$H$ is not empty} 168 | \State{Add node $v=H.extractMin()$ to $X$.} 169 | \State{Add the edge associated with $v$ to $T$ if it exists.} 170 | \For{each edge $(v,w)\in E$} 171 | \If{$w\in V-X$ \textbf{and} $cost(v,w)2$. 377 | 378 | The base case is trivial: for an alphabet containing 2 characters, 1 bit is enough for the encoding. Suppose the algorithm is correct for any $k\leq n$, in which $n\geq 2$. Let's denote the alphabet for $n$ as $\Sigma'$, and the tree obtained by the algorithm to encode $\Sigma'$ as $T'_0$, which contains a leaf $ab$ that will be split into siblings $a,b$ to obtain tree $T_0$ that will be used to encode $\Sigma$. 379 | 380 | For any tree $T'$ used to encode $\Sigma'$, and tree $T$ obtained by splitting $ab$ of $T'$ into $a$ and $b$, since $p_{ab}=p_a+p_b$ and $d_{a}=d_{b}=d'_{ab}+1$, we have 381 | \begin{equation}\label{proofhuffman} 382 | L(T)-L(T')=p_ad_a+p_bd_b-p_{ab}d'_{ab}=p_a+p_b. 383 | \end{equation} 384 | According to our assumption, $T'_0$ produces the minimum average encoding length $L(T')$ among all possible $T'$. \eqref{proofhuffman} verifies that $L(T)$ has a constant difference from $L(T')$, thus $T_0$ is the optimal choice among all $T$, i.e. all encoding trees of $\Sigma$ that have $a$ and $b$ as siblings. Next we will prove by exchange argument that this optimum is guaranteed to be the overall optimum. 385 | 386 | Encoding trees of $\Sigma$ are guaranteed to have the following properties: 387 | \begin{itemize} 388 | \item Each node has either no child or two children. 
If a node has only one child, as shown in the following figure, node B can simply be removed.\\ 389 | \begin{figure}[H] 390 | \centering 391 | \begin{tikzpicture} 392 | \tikzstyle{node}=[circle, draw] 393 | \node[node] (0) at (0,0) {A}; 394 | \node[node,red] (1) at (-1,-1) {B}; 395 | \node[node] (2) at (-2,-2) {C}; 396 | \node[node] (3) at (4,-0.5) {A}; 397 | \node[node] (4) at (3,-1.5) {C}; 398 | \draw (0) -- (1); 399 | \draw (1) -- (2); 400 | \draw (3) -- (4); 401 | \draw[->] (0.5,-1) -- (2.5,-1); 402 | \end{tikzpicture} 403 | \end{figure} 404 | \item Leaf nodes at the same level can be interchanged arbitrarily without affecting the average encoding length. 405 | \end{itemize} 406 | 407 | For an encoding tree that do not have $a$ and $b$ as siblings but having them at the same level, we can always find another tree with the same average encoding length having $a,b$ as siblings. Suppose that an encoding tree $T$ with $a,b$ at different levels has the minimum average encoding length. There is at least one of them not at the bottom level, which let's suppose is $a$. There is at least one node other than $b$ in the bottom level, which let's suppose is $x$. Up to now we have $p_a1$} 420 | \State{$a=selectMin(Q_1,Q_2)$} 421 | \State{$b=selectMin(Q_1,Q_2)$} 422 | \State{Push $ab$ with $p_{ab}=p_a+p_b$ to $Q_2$.}\Comment{$a,b$ are siblings and $ab$ is their parent.} 423 | \EndWhile 424 | \State{The only node left in $Q_2$ is the root of $T$.} 425 | \Function{$selectMin$}{$Q_1,Q_2$} 426 | \State{$a=Q_1.front(), b=Q_2.front()$} 427 | \State{Pop the smaller between $a,b$ from its queue and return it.} 428 | \EndFunction 429 | \end{algorithmic} 430 | \end{algorithm} 431 | 432 | \ifx\PREAMBLE\undefined 433 | \end{document} 434 | \fi -------------------------------------------------------------------------------- /DataStructures.tex: -------------------------------------------------------------------------------- 1 | \ifx\PREAMBLE\undefined 2 | \input{preamble} 3 | \begin{document} 4 | \fi 5 | \chapter{Data Structures} 6 | Data structures help us organize data so that it can be accessed quickly and usefully. Different data structures support different sets of operations, thus are suitable for different tasks. 7 | \section{Heap} 8 | A heap, also named a priority queue, is a container for objects with comparable keys. It should support at least two basic operations: insertion of new object, and extraction(i.e. removal) of the object with minimum\footnote{A heap can also support extraction of object with maximum key, but extract-min and extract-max cannot be supported simultaneously.} key. Both operations are expected to take $O(\log n)$ time. Typical heap implementations also support deletion of an object from the key, which is also $O(\log n)$. The construction of a heap, namely ``heapify'', takes $O(n)$ rather than $O(n\log n)$. 9 | \subsection{Use Cases} 10 | Heap can be used for sorting. First construct a heap with the $n$ items to be sorted, and then execute extract-min $n$ times. The process takes $O(n\log n)$ time, which is already the optimal running time for comparison based sorting algorithms. 11 | 12 | We've already covered the use of a heap to accelerate Dijkstra's algorithm in the previous chapter. 13 | 14 | An interesting use case of heap is median maintenance. We define the median of a sorted sequence of $n$ items $x_1,\dots,x_n$ to be $x_{(n+1)/2}$, for example $x_4$ for 8 items and $x_5$ for 9 items. 
15 | \begin{description} 16 | \item[Input]A sequence of unsorted items $x_1,x_2,\dots,x_n$ provided one-by-one. 17 | \item[Output]At each step $i$, calculate the median of $x_1,\dots,x_i$ in $O(\log i)$ time. 18 | \end{description} 19 | The problem can be solved using two heaps, as shown in Algorithm \ref{medianmaintenance}. For convenience, we assume that the heaps used here support not only the extraction of min/max, but also checking the key value of the min/max without removing it.
20 | \begin{algorithm}[ht] 21 | \caption{Median Maintenance using Heaps}\label{medianmaintenance} 22 | \begin{algorithmic}[1] 23 | \InputOutput\Statex{see above} 24 | \State{Initialize empty MaxHeap that supports extract-max}\Comment{Stores smaller half} 25 | \State{Initialize empty MinHeap that supports extract-min}\Comment{Stores larger half} 26 | \For{$i$ = 1 \textbf{to} $n$} 27 | \If{MaxHeap is nonempty \textbf{and} $x_i<$ MaxHeap.checkMax()}\Comment{Should insert into smaller half} 28 | \State{MaxHeap.insert($x_i$)} 29 | \Else\Comment{insert into larger half} 30 | \State{MinHeap.insert($x_i$)} 31 | \EndIf 32 | \If{MinHeap.size() - MaxHeap.size() == 2}\Comment{If unbalanced, balance the two halves} 33 | \State{MaxHeap.insert(MinHeap.extractMin())} 34 | \ElsIf{MaxHeap.size() - MinHeap.size() == 2} 35 | \State{MinHeap.insert(MaxHeap.extractMax())} 36 | \EndIf 37 | \If{MinHeap.size() $>$ MaxHeap.size()}\Comment{Set median} 38 | \State{median = MinHeap.checkMin()} 39 | \Else\State{median = MaxHeap.checkMax()} 40 | \EndIf 41 | \EndFor 42 | \end{algorithmic} 43 | \end{algorithm}
44 | \subsection{Implementation} 45 | A heap can be conceptually thought of as a binary tree that is as complete as possible, i.e. null leaves are only allowed at the lowest level. The key of any node should be smaller than or equal to the keys of its children, if there are any. This guarantees that the object at the root of the tree has the smallest key. This tree can be implemented as an array, with the root at the first position, and nodes at lower levels sequentially concatenated afterwards. If the array $A$ is 1-indexed, then the parent of $A[i]$ is $A[\lfloor i/2\rfloor]$, and the children of this node are $A[2i]$ and $A[2i+1]$. 46 | 
47 | With the array representation of a heap, insertion can be implemented as follows: 48 | \begin{itemize} 49 | \item Put the new object at the end of the array. 50 | \item As long as the key of the new object is smaller than that of its parent, bubble it up. 51 | \end{itemize} 52 | And extract-min can be implemented as follows: 53 | \begin{itemize} 54 | \item Remove the root. 55 | \item Move the last object in the array to the first position. 56 | \item As long as the key of the object at the root is larger than that of at least one of its children, sink it down. If the keys of both children are smaller, the child with the smaller key should be used in the sink-down. 57 | \end{itemize} 58 | The height of the tree is $O(\log n)$, thus either bubble-up or sink-down can be executed at most $O(\log n)$ times, which guarantees that the two operations take $O(\log n)$ running time.
59 | \section{Binary Search Tree} 60 | A sorted array supports quick search of an element in $O(\log n)$ time, but it takes $O(n)$ time to insert or delete an element. A binary search tree (BST) is a data structure that supports both quick search and quick insertion / deletion. 61 | \subsection{Basic Operations} 62 | Each node of a BST contains the key and three pointers to other nodes: the left child, the right child and the parent. Some of the three pointers can be null.
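In C++, such a node might be declared as follows, together with the recursive search routine described below (a minimal sketch; the struct and function names are ours and the key type is fixed to \texttt{int} for simplicity):
\begin{lstlisting}
// Minimal BST node: a key plus pointers to left child, right child and parent.
struct Node {
    int key;
    Node* left;
    Node* right;
    Node* parent;
};

// Recursive search: returns the node whose key equals k, or nullptr if absent.
Node* search(Node* root, int k) {
    if (root == nullptr || root->key == k) return root;
    if (k < root->key) return search(root->left, k);
    return search(root->right, k);
}
\end{lstlisting}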
The most important property of a BST is that for any node, all nodes in its left subtree have smaller keys than itself, while all nodes in its right subtree have larger keys. The height of a BST is at least $\log n$ and at most $n$. A \textbf{balanced} BST supports search, insertion and deletion in $O(\log n)$ time. But if it is not balanced, these operations can take as long as $O(n)$ time. Some of its basic operations are listed below.
63 | \begin{description} 64 | \item[search]In order to search for a node with a specific key value $k$: 65 | \begin{itemize} 66 | \item Start from the root node. 67 | \item If the node is null or its key is equal to $k$, return this node. 68 | \item If $k$ is smaller than its key, recursively search its left child. 69 | \item If $k$ is larger than its key, recursively search its right child. 70 | \end{itemize} 71 | \item[insert]In order to insert a new node with key value $k$: 72 | \begin{itemize} 73 | \item Start from the root node. 74 | \item If the node is null, make a new node here with key value $k$. 75 | \item If $k$ is smaller than its key, go to its left child. 76 | \item If $k$ is larger than its key, go to its right child. 77 | \end{itemize} 78 | \item[max]In order to obtain the node with the maximum key value: 79 | \begin{itemize} 80 | \item Start from the root node. 81 | \item As long as the node has a right child, go to its right child. 82 | \item Return the node. 83 | \end{itemize} 84 | \item[min]Similar to max. 85 | \item[successor]In order to obtain the successor of a node with key value $k$: 86 | \begin{itemize} 87 | \item If the node has a right child, return the min of its right subtree. 88 | \item Otherwise recursively go to its parent, until the key becomes larger than $k$. 89 | \end{itemize} 90 | \item[predecessor]Similar to successor. 91 | \item[in order traversal]In order to traverse all nodes of a BST in order: 92 | \begin{itemize} 93 | \item Start from the root node. 94 | \item If the node is null, stop. 95 | \item Recursively traverse the left child. 96 | \item Do something to the node, e.g. print its key. 97 | \item Recursively traverse the right child. 98 | \end{itemize} 99 | \item[delete]In order to delete a node with key value $k$: 100 | \begin{itemize} 101 | \item Search for the node. 102 | \item If it has no child, simply remove it. 103 | \item If it has 1 child, replace it with its child. 104 | \item If it has 2 children, find its predecessor, which is guaranteed to have at most 1 child, and swap their keys. Then delete the node (currently at its predecessor's old position). 105 | \end{itemize} 106 | \end{description}
107 | Sometimes a tree node can contain some information about the tree itself, for example the size of the subtree rooted at this node. For each node $n$, we have 108 | $$size(n) = size(n.left) + size(n.right) + 1.$$ 109 | With this information, we can find the node with the $i^{th}$ smallest key among all nodes (the $i^{th}$ order statistic): 110 | \begin{itemize} 111 | \item Start from the root node. 112 | \item If $size(n.left) = i - 1$, return the node. 113 | \item If $size(n.left) > i - 1$, return the node with the $i^{th}$ smallest key in the left subtree. 114 | \item If $size(n.left) < i - 1$, return the node with the $(i-size(n.left)-1)^{th}$ smallest key in the right subtree. 115 | \end{itemize}
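Assuming each node stores this subtree size, the selection procedure just described can be sketched in C++ as follows (illustrative names; maintaining the \texttt{size} field during insertions and deletions is omitted):
\begin{lstlisting}
// BST node augmented with the size of the subtree rooted at it.
struct SNode {
    int key;
    int size;       // number of nodes in this subtree, kept up to date on updates
    SNode* left;
    SNode* right;
};

// Returns the node holding the i-th smallest key (1-indexed), or nullptr.
SNode* select(SNode* root, int i) {
    if (root == nullptr) return nullptr;
    int leftSize = (root->left != nullptr) ? root->left->size : 0;
    if (i == leftSize + 1) return root;
    if (i <= leftSize)     return select(root->left, i);
    return select(root->right, i - leftSize - 1);
}
\end{lstlisting}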
116 | \subsection{Red-Black Tree} 117 | The height of a BST can vary between $O(\log n)$ and $O(n)$. Balanced BSTs are guaranteed to have $O(\log n)$ height, thus ensuring the efficiency of operations on them. The red-black tree is one implementation of a balanced BST. In addition to the key and pointers to the parent and children, each node in a red-black tree also stores a bit to indicate whether the node is red or black. The following conditions are satisfied by a red-black tree: 118 | \begin{enumerate} 119 | \item Each node is either red or black; 120 | \item The root is black;\label{redblackcondition2} 121 | \item There can never be two red nodes in a row, i.e. red nodes must have black parents and children;\label{redblackcondition3} 122 | \item Every root $\rightarrow$ null path has the same number of black nodes.\label{redblackcondition4} 123 | \end{enumerate}
124 | \begin{theorem} 125 | The height of a red-black tree with $n$ nodes is at most $2\log(n+1)$, i.e. $O(\log n)$. 126 | \end{theorem} 127 | \begin{proof} 128 | Suppose all root $\rightarrow$ null paths contain $k$ black nodes. Then the red-black tree contains at least $k$ complete levels, because otherwise there would exist root $\rightarrow$ null paths with fewer than $k$ nodes, thus of course fewer than $k$ black nodes. Therefore we have 129 | $$n\geq 1 + 2 + \dots + 2^{k-1} = 2^k-1,$$ 130 | which means $k\leq\log(n+1).$ Suppose the height of the tree is $h$. According to condition \ref{redblackcondition3}, at least half of the nodes on any root $\rightarrow$ null path are black, so we come to the conclusion 131 | $$h\leq 2k\leq 2\log(n+1).$$ 132 | \end{proof} 133 | An important idea in the implementation of red-black trees is rotation, as illustrated in Figure \ref{rotations}. It alters the structure of the tree in a way that makes the tree more balanced, while preserving the BST property. 134 | 
135 | \begin{figure}[ht] 136 | \begin{subfigure}{\textwidth} 137 | \centering 138 | \begin{tikzpicture} 139 | \tikzstyle{subtree} = [regular polygon, regular polygon sides = 4, draw] 140 | \tikzstyle{singlenode} = [circle,draw] 141 | \node[circle, draw](1) at (0,0){1} 142 | child { node[subtree] {A} } 143 | child { node [singlenode] {2} 144 | child { node[subtree] {B} } 145 | child { node[subtree] {C} } 146 | }; 147 | \node[circle, draw](2)[right= 5cm of 1]{2} 148 | child { node [singlenode] {1} 149 | child { node[subtree] {A} } 150 | child { node[subtree] {B} } 151 | } 152 | child { node[subtree] {C} }; 153 | \draw[->,very thick] (2.3,-1.5) -- (3.3,-1.5); 154 | \end{tikzpicture} 155 | \caption{left rotation} 156 | \end{subfigure}\\ 157 | \begin{subfigure}{\textwidth} 158 | \centering 159 | \begin{tikzpicture} 160 | \tikzstyle{subtree} = [regular polygon, regular polygon sides = 4, draw] 161 | \tikzstyle{singlenode} = [circle,draw] 162 | \node[singlenode](1) at (0,0){1} 163 | child { node [singlenode] {2} 164 | child { node[subtree] {A} } 165 | child { node[subtree] {B} } 166 | } 167 | child { node[subtree] {C} }; 168 | \node[singlenode](2)[right= 3cm of 1]{2} 169 | child { node[subtree] {A} } 170 | child { node [singlenode] {1} 171 | child { node[subtree] {B} } 172 | child { node[subtree] {C} } 173 | }; 174 | \draw[->,very thick] (1.3,-1.5) -- (2.3,-1.5); 175 | \end{tikzpicture} 176 | \caption{right rotation} 177 | \end{subfigure} 178 | \caption{Rotations in Red Black Tree}\label{rotations} 179 | \end{figure} 180 | 
181 | Insertion and deletion in a red-black tree are carried out in two steps. First a normal BST insertion / deletion is executed. Some of the red-black conditions may be violated as a result, so we then modify the tree by recoloring nodes and performing rotations in order to restore the conditions. 182 | 183 | When we insert a node into the red-black tree as we do for any BST, we first try to color it red.
If condition \ref{redblackcondition3} is not violated, then everything is fine. Otherwise we wind up in two possible cases, as shown in Figure \ref{redblackinsertion}, in which $x$ is the newly inserted node. 184 | 185 | In case 1, all we need to do is a recoloring of the nodes. The red node is propagated upwards, which may possibly induce another violation of \ref{redblackcondition3}. The process can last as much as $O(\log n)$ times until we reach the root. If the root is colored red, condition \ref{redblackcondition2} will be violated, and the solution is to color it back to black. 186 | 187 | During the upward propagation process, it is possible that we meet case 2. Tackling this case is a little bit more complex, but it can be proven that the conditions can be restored via 2-3 rotations and recolorings in $O(1)$ time. 188 | 189 | \begin{figure}[H] 190 | \begin{subfigure}{.6\textwidth} 191 | \centering 192 | \begin{tikzpicture} 193 | \tikzstyle{rednode}=[circle,draw,red] 194 | \tikzstyle{blacknode}=[circle,draw] 195 | \node[blacknode](w) at (0,0) {w} 196 | child {node[rednode]{z}} 197 | child {node[rednode]{y} 198 | child{node[rednode,left]{x}} 199 | }; 200 | \node[rednode](w1) [right = 3cm of w] {w} 201 | child {node[blacknode]{z}} 202 | child {node[blacknode]{y} 203 | child{node[rednode,left]{x}} 204 | }; 205 | \draw[->,very thick] (1.3,-1.5) -- (2.3,-1.5); 206 | \end{tikzpicture} 207 | \caption{case 1} 208 | \end{subfigure} 209 | \begin{subfigure}{.3\textwidth} 210 | \centering 211 | \begin{tikzpicture} 212 | \tikzstyle{rednode}=[circle,draw,red] 213 | \tikzstyle{blacknode}=[circle,draw] 214 | \tikzstyle{subtree} = [regular polygon, regular polygon sides = 4, draw] 215 | \node[blacknode](w2) {w} 216 | child {node[blacknode]{z}} 217 | child {node[rednode]{y} 218 | child{node[rednode]{x} 219 | child{node[subtree]{...}} 220 | child{node[subtree]{...}} 221 | } 222 | child{node[subtree]{...}} 223 | }; 224 | \end{tikzpicture} 225 | \caption{case 2} 226 | \end{subfigure} 227 | \caption{Insertion in a Red-Black Tree}\label{redblackinsertion} 228 | \end{figure} 229 | \section{Hash Table} 230 | \subsection{Concepts and Applications} 231 | Hash table is a data structure designed to efficiently maintain a (possibly evolving) set of items, such as financial transactions, IP addresses, people associated with some data, etc. It supports insertion of a new record, deletion of existing records, and lookup of a particular record (like a dictionary). Assuming that the hash table is properly implemented, and that the data is non-pathological, all these operations can be executed in $O(1)$ time: amazingly fast! 232 | 233 | Let's first introduce a few typical use cases of hash table before diving into its implementation. 234 | 235 | Hash table can be used to solve the de-duplicates problem. 236 | \begin{description} 237 | \item[Input]A stream of objects. 238 | \item[Output]Unique objects in the stream, i.e. the objects with all duplicates removed. 239 | \end{description} 240 | The problem arises when we want to record the number of unique visitors to a website, or when we want to remove duplicates in the result of a search. With a hash table on the objects implemented, we can solve the problem in linear time. Just examine the objects one by one. For each object $x$, do a lookup in the hash table $H$. If $x$ is not found in $H$, insert it into $H$ and append it to the result, otherwise just continue with the next object. 241 | 242 | Another application is the 2-sum problem. 
243 | \begin{description} 244 | \item[Input]An unsorted array $A$ of $n$ integers, and a target sum $t$. 245 | \item[Output]Whether there exist two numbers $x,y\in A$ such that $x+y=t$. 246 | \end{description} 247 | A naive enumerative solution is $O(n^2)$. If we sort $A$ and then search for $t-x$ in $A$ for every $x\in A$, the time consumption can be reduced to $O(n\log n)$. But with a hash table, the problem can be solved in merely $O(n)$ time. Just insert all items into the hash table, and then for each $x\in A$ check whether $t-x$ is in the hash table via a lookup. 248 | 249 | In the early days of compilers, hash tables were used to implement symbol tables. The administrator of a network can use a hash table to block certain IP addresses. When exploring huge game trees of chess or Go, a hash table can be used to avoid duplicate explorations of the same configuration, which can appear an enormous number of times in the tree. In the last case, the size of the tree is so large that a hash table is the only plausible method to record whether a configuration has been explored.
250 | \subsection{Implementation} 251 | When implementing a hash table, we should think of a generally really big universe $U$ (e.g. all IP addresses, all names, all chessboard configurations, etc.), of which we wish to maintain an evolving subset $S$ of reasonable size. The general approach is as follows. 252 | \begin{enumerate} 253 | \item Pick $n$ as the number of ``buckets''. $n$ should be of size comparable with $S$. 254 | \item Choose a hash function $h:U\rightarrow\{0,1,2,\dots,n-1\}$. 255 | \item Use an array $A$ of length $n$ to store the items. $x$ should be stored in $A[h(x)]$. 256 | \end{enumerate} 257 | \begin{definition} 258 | For a specific hash function $h$ on a universe $U$, we say there is a collision if $\exists$ distinct $x,y\in U$ such that $h(x)=h(y)$. 259 | \end{definition} 260 | Think of the famous ``same birthday'' problem: what's the number of people needed so that the probability for at least 2 of them to have the same birthday is more than 50\%? The answer is 23, which is quite a small number. This problem is an example to demonstrate that collisions are not unlikely to happen, and thus a good implementation of a hash table must be able to resolve collisions properly. There are two popular solutions: 261 | \begin{description} 262 | \item[Chaining]A linked list is kept in each bucket, containing the items with the corresponding hash value. Given an object $x$, an insertion / deletion / lookup on the hash table is carried out as the corresponding operation on the list $A[h(x)]$. 263 | \item[Open addressing]A bucket only stores one object. The hash function specifies a probe sequence $h_1(x),h_2(x),$ etc. When an object is inserted into the hash table, the sequence is followed until an empty slot is found. The sequence can be linear (i.e. slots are probed consecutively), or decided by two independent hash functions. 264 | \end{description} 265 | 
266 | For a hash table with chaining, insertions are always $\Theta(1)$ because we simply insert a new element at the front of a list, while deletions and lookups are $\Theta(list\:length)$. The maximal length of a list can be anywhere from $m/n$ (where $m$ is the number of stored objects), meaning that all lists have equal length, to $m$, meaning that all objects are in the same bucket. The situation with open addressing is similar. Obviously, the performance of an implementation depends heavily on the choice of the hash function. A good hash function should lead to good performance, i.e.
data should be spread out among all hash values, and the hash function should be easy to evaluate and its result easy to store. 267 | 268 | A widely used method to define a hash function consists of two steps. First an object is transformed into a (usually large) integer, namely the hash code, and then the integer is mapped by a compression function to a number between $0$ and $n-1$, i.e. the index of a bucket. The $\bmod\:n$ function can serve as the compression function. 269 | 270 | The number of buckets $n$ must be selected with caution. It should be a prime within a constant factor of the number of objects supposed to be saved in the table, and it should not be close to a power of 2 or 10. 271 | \begin{definition} 272 | The load factor $\alpha$ of a hash table is defined as 273 | $$\alpha=\frac{\text{\# of objects in the hash table}}{\text{\# of buckets in the hash table}}.$$ 274 | \end{definition} 275 | Obviously, for open addressing, $\alpha$ has to be smaller than 1, whereas chaining can cope with $\alpha\geq 1$. 276 | In general, $\alpha$ has to be $O(1)$ to guarantee constant running time for hash table operations. In particular, $\alpha\ll 1$ is expected for open addressing.
277 | \subsection{Universal Hashing} 278 | We wish to fabricate a clever hash function that can spread any data set quasi-evenly among all buckets. Unfortunately, no such function exists, because every hash function has a pathological data set. The reason is that for any hash function $h:U\rightarrow\{0,1,\dots,n-1\}$, according to the Pigeonhole Principle, there exists a bucket $i$ such that at least $\lvert U\rvert/n$ elements of $U$ hash to $i$ under $h$. If the data set is a subset of these elements, all of them will collide. This could become dangerous in real-world systems: a simple hash function can be reverse engineered and abused. 279 | 280 | There are two solutions to this problem. Either a cryptographic hash function, e.g. SHA-2, should be used to make the reverse engineering infeasible, or a randomized approach should be taken: we should design a family $H$ of hash functions such that for any data set $S$, a randomly chosen function $h\in H$ is almost guaranteed to spread $S$ out quasi-evenly. Such a family of hash functions is called universal.
281 | \begin{definition} 282 | Let $H$ be a set of hash functions $h:U\rightarrow\{0,1,\dots,n-1\}$. $H$ is universal if and only if $\forall x,y\in U(x\neq y),$ 283 | $$P(h(x)=h(y))\leq \frac{1}{n},$$ 284 | in which $n$ is the number of buckets and $h$ is a hash function chosen uniformly at random from $H$. $1/n$ is exactly the probability of a collision under pure random hashing. 285 | \end{definition} 286 | We will now provide a universal hash function family for IP addresses. Let $U$ represent the universe of all IP addresses of the form $(x_1,x_2,x_3,x_4)$, in which each $x_i$ is an integer between 0 and 255 inclusive. Let $n$ be a prime whose value is comparable with the number of objects in the hash table, and larger than 255. We define a hash function $h_a$ for each 4-tuple $a=(a_1,a_2,a_3,a_4)$ with each $a_i\in\{0,1,\dots,n-1\}:$ 287 | $$h_a(x_1,x_2,x_3,x_4)=\left(\sum\limits_{i=1}^4a_ix_i\right)\mod n.$$ 288 | Then the family of all $h_a$ is universal. 289 | \begin{proof} 290 | Consider two distinct IP addresses $x=(x_1,x_2,x_3,x_4)$, $y=(y_1,y_2,y_3,y_4)$, and assume without loss of generality that $x_4\neq y_4$.
If $x$ and $y$ collide, we have 291 | \begin{align*} 292 | \left(\sum\limits_{i=1}^4a_ix_i\right)\mod n=\left(\sum\limits_{i=1}^4a_iy_i\right)\mod n\\ 293 | a_4(x_4-y_4)\mod n=\left(\sum\limits_{i=1}^3a_i(y_i-x_i)\right)\mod n\\ 294 | \end{align*} 295 | For an arbitrarily fixed choice of $a_1,a_2,a_3$, the rhs is a fixed number between 0 and $n-1$ inclusive. With $x_4-y_4\mod n\neq 0$ (guaranteed by $n>255$ and $x_4\neq y_4$) and $a_4$ randomly chosen in $\{0,1,\dots,n-1\}$, the lhs is actually equally likely to be any of $\{0,1,\dots,n-1\}$. Therefore the probability of collision is $\frac{1}{n}$. 296 | \end{proof} 297 | Now we would like to verify the $O(1)$ running time guarantee of hash table implemented with chaining and hash function $h$ selected randomly from a universal family $H$. Here we assume that $\lvert S\rvert=O(n)$, i.e. $\alpha=\frac{\lvert S\rvert}{n}=O(1)$, and that it takes $O(1)$ to evaluate the hash function. 298 | \begin{proof} 299 | As discussed before, the running time of basic operations on a hash table implemented with chaining is $O(list\:length)$. So here we will try to prove that the expectation of the list length $L$ is $O(1)$. 300 | 301 | For a specific list corresponding to hash value $h(x)$, we define 302 | \begin{equation*} 303 | Z_y=\begin{cases} 304 | 1\text{ if }h(y)=h(x)\\ 305 | 0\text{ otherwise}\\ 306 | \end{cases} 307 | \end{equation*} 308 | for any $y\in S$. Then obviously $L=\sum\limits_{y\in S}Z_y$. Thus we have 309 | \begin{align*} 310 | E[L]&= \sum\limits_{y\in S}E[Z_y]=\sum\limits_{y\in S}P(h(y)=h(x))\\ 311 | &\leq\sum\limits_{y\in S}\frac{1}{n}=\frac{\lvert S\rvert}{n}=O(1). 312 | \end{align*} 313 | \end{proof} 314 | The running time of operations on hash table implemented with open addressing is hard to analyze. We will use a heuristic assumption that all $n!$ probe sequences are equally possible, which is indeed not true but facilitates an idealized quick analysis. Under this heuristic assumption, the expected running time of operations is $\frac{1}{1-\alpha}$. 315 | \begin{proof} 316 | A random probe finds an empty slot with probability $1-\alpha$. A random probe sequence can be regarded as repetitions of random probes\footnote{Actually the probability for probes after the first probe to find an empty slot is larger, because we don't examine the same slot twice. The running time $\frac{1}{1-\alpha}$ is an upper bound.}. Thus the expectation of the number of probes needed for finding an empty slot is $\frac{1}{1-\alpha}$. 317 | \end{proof} 318 | For linear probing, the heuristic assumption is deadly wrong. So we assume instead that the initial probe is random, which is again not true in practice. Knuth proved in 1962 that the expected running time of an insertion under this assumption is $\frac{1}{(1-\alpha)^2}$. 319 | \section{Bloom Filters} 320 | Bloom filter is another data structure that facilitates fast insertions and lookups besides hash tables. It uses much less space than hash table, at the price of the following shortcomings: 321 | \begin{itemize} 322 | \item It cannot store associated objects; 323 | \item It does not support deletions; 324 | \item There is a small probability of false positive for the lookup result. 325 | \end{itemize} 326 | Historically, bloom filters are used to implement spell-checkers. A canonical use case is to forbid passwords of certain patterns. It is also used in network routers to complete tasks like banning certain IP addresses. 
It is desirable in such environments because memory is limited, the lookups are supposed to be super-fast, and occasional false positives are tolerable. 327 | 328 | The basic ingredients of a bloom filter are an array $A$ of $n$ bits and $k$ hash functions $h_i, i=1,2,\dots,k$. To insert element $x$, we set $A[h_i(x)] = 1$ for all $i$. To do a lookup for $x$, we return true if we find that $A[h_i(x)] = 1$ for all $i$. It is obvious that if all bits related to element $x$ via the $k$ hash functions have been set to 1 before $x$ itself is inserted, a false positive will happen in a lookup for $x$. If the probability of a false positive is too large, a bloom filter should not be used. We will use a heuristic analysis to illustrate that this probability is very small in reality. 329 | 330 | We assume that across different $i$ and $x$, all $h_i(x)$ are uniformly random and independent. The assumption is generally not true, but it helps to understand the trade-off between space and error in bloom filters. 331 | 
332 | We would like to insert a data set $S$ into a bloom filter using $n$ bits. The probability that a certain bit has been set to 1 after inserting $S$ is 333 | \begin{equation}\label{bloomfilterheuristic} 334 | 1- \left(1-\frac{1}{n}\right)^{k\lvert S\rvert}\approx 1-e^{-k\lvert S\rvert/n}=1-e^{-k/b}, 335 | \end{equation} 336 | in which $b=\frac{n}{\lvert S\rvert}$ is the number of bits per object. Note that the approximation is only accurate when $n$ is large, i.e. when $1/n$ is small. For an element not in $S$, the probability of a false positive is 337 | \begin{equation*} 338 | \epsilon=\left(1-e^{-k/b}\right)^k. 339 | \end{equation*} 340 | Let $t=k/b$, then we have 341 | \begin{align*} 342 | \ln\epsilon&=bt\ln(1-e^{-t})\\ 343 | \dv{\ln\epsilon}{t}&=\frac{b}{e^t-1}\left(t+(e^t-1)\ln(1-e^{-t})\right).\\ 344 | \end{align*} 345 | When $t=\ln 2$, $\dv{\ln\epsilon}{t}=0$ and $\epsilon$ is minimized. Thus we should use $k=(\ln 2)b\approx 0.693b.$ With $b=8$, we should choose $k=5$ or 6, and $\epsilon$ will be approximately 2\%.
346 | \section[Union Find]{Union Find\protect\footnote{This topic was originally covered as an optional topic in Part 2. I put it here because it's a pure data structure topic that fits this chapter better.}}\label{unionfind} 347 | The union find data structure maintains a partition of a set of objects. It supports two essential operations: 348 | \begin{description} 349 | \item[$find(x)$]Returns the name of the group to which $x$ belongs. 350 | \item[$union(C_i, C_j)$]Merges groups $C_i,C_j$ into one group. 351 | \end{description} 352 | \subsection{Quick Find UF} 353 | \begin{multicols}{2} 354 | \begin{algorithmic}[1] 355 | \Function{find}{x} 356 | \State{return leader(x)} 357 | \EndFunction 358 | \end{algorithmic} 359 | \columnbreak 360 | \begin{algorithmic}[1] 361 | \Function{union}{x,y} 362 | \If{size(x) $>$ size(y)} 363 | \For{i in y's group} 364 | \State{leader(i) = x} 365 | \EndFor 366 | \State{size(x) += size(y)} 367 | \Else 368 | \For{i in x's group} 369 | \State{leader(i) = y} 370 | \EndFor 371 | \State{size(y) += size(x)} 372 | \EndIf 373 | \EndFunction 374 | \end{algorithmic} 375 | \end{multicols} 376 | In this implementation, a leader is chosen arbitrarily from each group. Each group is represented by its leader. Each object maintains a pointer to its leader. When two groups get merged, the leader of the larger group (i.e. the group that contains more objects) becomes the leader of the merged group.
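The quick-find idea is short enough to write out concretely; the following C++ sketch (illustrative, not taken from the course) keeps the leader of every object and the size of every group in two arrays, and calls the merge operation \texttt{unite} because \texttt{union} is a reserved word in C++:
\begin{lstlisting}
#include <vector>
using namespace std;

// Quick-find union-find: leader[i] names the group of object i, and
// groupSize[l] counts the objects whose leader is l.
struct QuickFindUF {
    vector<int> leader, groupSize;

    explicit QuickFindUF(int n) : leader(n), groupSize(n, 1) {
        for (int i = 0; i < n; ++i) leader[i] = i;  // every object starts alone
    }

    int find(int x) const { return leader[x]; }     // O(1)

    // Merge the groups of x and y by relabelling the smaller group: O(n).
    void unite(int x, int y) {
        int lx = leader[x], ly = leader[y];
        if (lx == ly) return;
        if (groupSize[lx] < groupSize[ly]) { int t = lx; lx = ly; ly = t; }
        for (int& l : leader)                       // ly leads the smaller group
            if (l == ly) l = lx;
        groupSize[lx] += groupSize[ly];
    }
};
\end{lstlisting}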
377 | 378 | $find()$ is easy for this implementation: just return the object's leader, which takes $O(1)$ time. However $union()$ takes $O(n)$ time, because all objects in the smaller group have to have their leader pointer updated. Nonetheless, if we consecutively merge groups so that in the end all objects are in the same group, the leader of each object is updated at most $\log n$ times, because each time an object has its leader updated, the size of the group to which it belongs at least doubles. Therefore in total there can be at most $O(n\log n)$ leader updates.
379 | \subsection{Quick Union UF} 380 | \begin{multicols}{2} 381 | \begin{algorithmic}[1] 382 | \Function{find}{x} 383 | \While{x $\neq$ parent(x)} 384 | \State{x = parent(x)} 385 | \EndWhile 386 | \State{return x} 387 | \EndFunction 388 | \end{algorithmic} 389 | \columnbreak 390 | \begin{algorithmic}[1] 391 | \Function{union}{x,y} 392 | \State{lx = find(x), ly = find(y)} 393 | \If{size(lx) $>$ size(ly)} 394 | \State{parent(ly) = lx} 395 | \State{size(lx) += size(ly)} 396 | \Else 397 | \State{parent(lx) = ly} 398 | \State{size(ly) += size(lx)} 399 | \EndIf 400 | \EndFunction 401 | \end{algorithmic} 402 | \end{multicols} 403 | In this implementation\footnote{It is different from the lazy union implementation in the lectures. It is as efficient as union by rank, but much easier to verify.}, each object maintains a pointer to its parent instead of its leader. Only the leader has itself as parent. In this way each group forms a tree, with the leader as root. $find()$ follows the parent pointers until the leader is met. $union()$ changes the parent of the leader of the smaller group into the leader of the larger group. 404 | 
405 | It can be proved by induction that a group with $k$ objects forms a tree of height no more than $\log k$. 406 | \begin{proof} 407 | The base case is trivial: each group has 1 object and height 0. Suppose $1\leq i\leq j$. When a group with $i$ objects is merged into a group with $j$ objects, the root of the group with $i$ objects becomes a child of the other root, so the height of the new tree is either the old height $h_j\leq\log j\leq\log(i+j)$, or 408 | $$h_{i+j}=h_i+1\leq \log i + 1 = \log(2i)\leq\log(i+j).$$ 409 | \end{proof} 410 | Since the tree is at most of logarithmic height, the running times of $find()$ and $union()$ are both $O(\log n).$
411 | \subsection{Union by Rank} 412 | \begin{multicols}{2} 413 | \begin{algorithmic}[1] 414 | \Function{find}{x} 415 | \While{x $\neq$ parent(x)} 416 | \State{x = parent(x)} 417 | \EndWhile 418 | \State{return x} 419 | \EndFunction 420 | \end{algorithmic} 421 | \columnbreak 422 | \begin{algorithmic}[1] 423 | \Function{union}{x,y} 424 | \State{lx = find(x), ly = find(y)} 425 | \If{rank(lx) $>$ rank(ly)} 426 | \State{parent(ly) = lx} 427 | \ElsIf{rank(lx) $<$ rank(ly)} 428 | \State{parent(lx) = ly} 429 | \Else\Comment{equal} 430 | \State{parent(lx) = ly} 431 | \State{rank(ly) += 1} 432 | \EndIf 433 | \EndFunction 434 | \end{algorithmic} 435 | \end{multicols} 436 | $find()$ is the same as in quick union. Each object maintains a rank field, which is initialized to 0 for all objects and can only increase in a merge of two trees whose roots have the same rank. In $union()$, rank instead of size is used to determine which root will serve as the root of the merged tree. It can be verified that union by rank also achieves $O(\log n)$ running time. 437 | 438 | It follows immediately from the implementation of $union()$ that 439 | \begin{enumerate} 440 | \item For any object x, rank(x) can only increase over time. 441 | \item Only ranks of roots can go up.
442 | \item Along a path to the root, ranks strictly increase. 443 | \end{enumerate} 444 | \begin{lemma}\textbf{(Rank Lemma)} 445 | After an arbitrary number of union operations, there are at most $n/2^r$ objects with rank $r$, in which $r\geq 0$. 446 | \end{lemma} 447 | \begin{proof} 448 | First, if $x,y$ have the same rank $r$, then their subtrees must be disjoint. If they had a common node $z$, then the paths $z\rightarrow x$ and $z\rightarrow y$ would both lie on the unique upward path from $z$, because each node has only one parent. It would become inevitable that one of $x$ and $y$ was an ancestor of the other, which is impossible because ranks strictly increase along such a path while $x$ and $y$ have the same rank. 449 | 450 | Then it can be verified that a rank-$r$ object has a subtree of size $\geq 2^r$. We will prove it by induction. In the base case, all subtrees are of rank $0$ and size 1. When two subtrees whose roots have different ranks merge, the situation is simple: no rank changes, while sizes become larger, hence the claim cannot be violated. When two subtrees $t_1, t_2$ whose roots have the same rank $r$ merge, the rank of the new root is $r+1$, and it is the only node whose rank changes. Since $t_1,t_2$ both have size $\geq 2^r$, the new tree must have size $\geq 2^{r+1}$, therefore the claim holds. 451 | 452 | The rank lemma follows directly from the two claims above. 453 | \end{proof} 454 | According to the rank lemma, there is at most 1 object with rank $\log n$, which can only be the root. Thus the tree is at most of height $\log n$, and the running time of $find()$ and $union()$ is $O(\log n)$.
455 | \subsection{Path Compression} 456 | If the $find()$ operation is expected to be executed multiple times for each object, which is almost always the case, it makes no sense to repeat the same traversal job every time. Instead, we can make the parent pointers of all objects met during the process point to the root, i.e. the leader, so that later $find()$ operations on these objects take $O(1)$ time. This modification adds only a constant factor overhead to the first $find()$ on objects that are not direct children of the leader, and greatly speeds up subsequent $find()$ operations. 457 | \begin{algorithmic}[1] 458 | \Function{find}{x} 459 | \State{leader = x} 460 | \While{leader $\neq$ parent(leader)} 461 | \State{leader = parent(leader)} 462 | \EndWhile 463 | \While{parent(x) $\neq$ leader} 464 | \State{t = parent(x)} 465 | \State{parent(x) = leader} 466 | \State{x = t} 467 | \EndWhile 468 | \State{return leader} 469 | \EndFunction 470 | \end{algorithmic} 471 | We will now precisely analyze the performance enhancement that path compression brings to union by rank. Ranks are maintained exactly as without path compression. In this case, rank(x) is only an upper bound on the maximum number of steps along a path from a leaf to x. But the rank lemma still holds, and we still have rank(parent(x)) $>$ rank(x) for all non-root x.
472 | \subsubsection{Hopcroft-Ullman's Analysis} 473 | \begin{theorem} \textbf{(Hopcroft-Ullman Theorem)} 474 | With union by rank and path compression, $m$ union + find operations take $O(m\log^*n)$ time, where 475 | \begin{equation*} 476 | \log^*n=\begin{cases} 477 | 0&if\:n\leq 1\\ 478 | 1+\log^*(\log n)&if\:n>1 479 | \end{cases} 480 | \end{equation*} 481 | \end{theorem} 482 | We will focus on the case when $m=\Omega(n).$ 483 | \begin{proof} 484 | First we divide the interval $[0,n]$ into a few rank blocks: \{0\}, \{1\}, \{2,3,4\}, \{5,$\dots,2^4$\}, \{17,$\dots,2^{16}$\}, \{65537,$\dots,2^{65536}$\}, $\dots$, \{$\dots,n$\}.
In general, there are $O(\log^*n)$ rank blocks. 485 | 486 | Consider a non-root object x, thus rank(x) is fixed. At a given time point, we call an object x good if one of the two conditions is satisfied: 487 | \begin{itemize} 488 | \item x or parent(x) is a root; 489 | \item rank(parent(x)) is in a larger block than rank(x). 490 | \end{itemize} 491 | If neither condition is satisfied, we say x is bad. 492 | 493 | A $find()$ operation can visit at most $O(\log^*n)$ good nodes (root, direct child of root + at most 1 in each rank block). In $m$ operation, these visits take $O(m\log^*n)$ time. 494 | 495 | To compute the time consumption of visits to bad nodes, consider a rank block \{k+1, $\dots, 2^k$\}. Note that each time a bad node is visited, its parent is changed to another node (its then root) with strictly larger rank than its current parent. For a bad node x with rank(x) in this block, this process can happen at most $2^k$ times before x becomes good. Therefore the number of visits to x while x is bad and rank(x) is in this block is $\leq 2^k$. According to the rank lemma, the number of objects x with rank in this block is $\leq\sum\limits_{i=k+1}^{2^k}\frac{n}{2^i}<\frac{n}{2^k}$. Thus the total number of visits to bad objects in this block is $\leq 2^k\cdot\frac{n}{2^k}=n$. Since there are $O(\log^* n)$ blocks, the total time spent on visiting bad nodes is $O(n\log^*n)$. 496 | 497 | In conclusion, the running time of $m$ operations is $O((m+n)\log^*n)$. Since we are interested in the case when $m=\Omega(n)$, it is equivalent to $O(m\log^*n)$. 498 | \end{proof} 499 | \subsubsection{Tarjan's Analysis} 500 | Hopcroft-Ullman theorem already provides an upper bound quite close to linear running time. Yet Tarjan proved that there is an even better upper bound. 501 | 502 | For integers $r\geq 1, k\geq 0$, the Ackermann function $A_k(r)$ is defined as 503 | \begin{align*} 504 | A_0(r)&=r+1\\ 505 | A_k(r)&=\underbrace{(A_{k-1}\circ A_{k-1}\circ\dots\circ A_{k-1})}_{r\text{ times}}(r),\:k\geq 1 506 | \end{align*} 507 | It's easy to derive that $A_1(r)=2r$, $A_2(r)=r2^r$. Because $A_2(r)$ is larger than $2^r$, $A_3(r)$ is larger than the result of applying $2^r$ $r$ times on $r$, i.e. the ``exponential tower'' of height $r$: 508 | $$A_3(r)>{{2^2}^2}^{\dots r\:times\dots}.$$ 509 | Specifically, $A_1(2)=4$, $A_2(2)=8$, $A_3(2)=A_2(A_2(2))=A_2(8)=8\times 2^8=2048$. $A_4(2)=A_3(2048)$, which is larger than the exponential tower of height 2048. 510 | 511 | For integer $n\geq 4$, we define the inverse Ackermann function 512 | $$\alpha(n)=\text{ minimum value of $k$ such that }A_k(2)\geq n.$$ 513 | Since $A_k(2)$ blows up fast as $k$ gets larger, $\alpha(n)$ grows extremely slow. As a comparison: 514 | \begin{equation*} 515 | \alpha(n)=\begin{cases} 516 | 1&n=4\\ 517 | 2&n=5,\dots,8\\ 518 | 3&n=9,\dots,2048\\ 519 | 4&n=2049,\dots,>{{2^2}^2}^{\dots 2048\:times\dots}\\ 520 | &\dots\\ 521 | \end{cases} 522 | \log^*n=\begin{cases} 523 | 1&n=2\\ 524 | 2&n=3,4\\ 525 | 3&n=5,\dots,16\\ 526 | 4&n=17,\dots,65536\\ 527 | 5&n=65537,\dots,2^{65536}\\ 528 | \end{cases} 529 | \end{equation*} 530 | For $n={{2^2}^2}^{\dots 2048\:times\dots}$, $\log^*n=2048$ while $\alpha(n)=4$. 531 | \begin{theorem} \textbf{(Tarjan's Theorem)} 532 | With union by rank and path compression, $m$ union + find operations take $O(m\alpha(n))$ time. 533 | \end{theorem} 534 | In order to prove Hopcroft-Ullman's theorem, we used the fact that if parent(x) is updated from p to p', then rank(p') $\geq$ rank(p) + 1. 
To verify Tarjan's theorem, we will use a stronger version of this claim: in most cases rank(p') is much bigger than rank(p) (not just by 1). 535 | \begin{proof} 536 | Consider a non-root object x, thus rank(x) is fixed. Define 537 | \begin{center} 538 | $\delta(x)$ = max $k$ such that rank(parent(x)) $\geq$ $A_k($rank(x)). 539 | \end{center} 540 | As a few examples:\\ 541 | \begin{equation*} 542 | \begin{cases} 543 | \delta(x)\geq 0\iff \text{rank(parent(x))} \geq \text{rank(x) + 1 (always)}\\ 544 | \delta(x)\geq 1\iff \text{rank(parent(x))} \geq 2\cdot\text{rank(x)}\\ 545 | \delta(x)\geq 2\iff \text{rank(parent(x))} \geq \text{rank(x)}\cdot 2^{\text{rank(x)}}\\ 546 | \end{cases} 547 | \end{equation*} 548 | Note that for every object x with rank(x)$\geq 2$, we must have $\delta(x)\leq\alpha(n)$, because 549 | $$A_{\alpha(n)}(\text{rank(x)})\geq A_{\alpha(n)}(2)\geq n\geq\text{rank(parent(x))}.$$ 550 | An object x is defined as bad if \textbf{all} of the following conditions hold: 551 | \begin{enumerate} 552 | \item x is not a root; 553 | \item parent(x) is not a root; 554 | \item rank(x)$\geq$2; 555 | \item x has an ancestor y with $\delta(y)=\delta(x)$. 556 | \end{enumerate} 557 | Otherwise x is good. Along an object-root path, the maximum number of good objects is $\Theta(\alpha(n))$: 1 root, 1 direct child of the root, 1 object with rank 0, 1 object with rank 1, and 1 object x with $\delta(x)=k$ for each $k=0,1,\dots,\alpha(n)$. Thus the total number of visits to good objects is $O(m\alpha(n))$. 558 | 
559 | Consider a visit to a bad object x. x has an ancestor y with $\delta(x)=\delta(y)=k$. Suppose x's parent is p and y's parent is p'; then we have 560 | \begin{center} 561 | rank(x's new parent) $\geq$ rank(p') $\geq A_k($rank(y)) $\geq A_k($rank(p)) 562 | \end{center} 563 | The first and third $\geq$ hold because rank only goes up from child to parent (and $A_k$ is non-decreasing); the second $\geq$ comes from the definition of $\delta(y)$. This relation indicates that path compression at least applies the $A_k$ function to rank(x's parent). If r = rank(x), then after r such pointer updates, we have 564 | \begin{center} 565 | rank(parent(x)) $\geq\underbrace{(A_k\circ\dots\circ A_k)}_{\text{r times}}(r)=A_{k+1}(r).$ 566 | \end{center} 567 | Hence every r visits to x while x is bad increase $\delta(x)$ by at least 1. Because $\delta(x)\leq\alpha(n)$, there can be at most $r\alpha(n)$ visits to x while it is bad. Thus the total number of visits to bad objects is 568 | \begin{equation*} 569 | N(bad)\leq\sum\limits_{\text{x is bad}}rank(x)\alpha(n)\leq\alpha(n)\sum\limits_{r\geq 2}\frac{n}{2^r}