├── .gitattributes ├── .github └── ISSUE_TEMPLATE │ ├── bug-or-typo-in-a-chapter-.md │ └── suggestions.md ├── CONTRIBUTING.md ├── LICENSE.md ├── README.md ├── __latexindent_temp.tex ├── acknowledgements.md ├── index.md ├── introtcs.bib ├── lec_00_0_preface.md ├── lec_00_1_math_background.md ├── lec_01_introduction.md ├── lec_02_representation.md ├── lec_03_computation.md ├── lec_03a_computing_every_function.md ├── lec_04_code_and_data.md ├── lec_05_infinite.md ├── lec_06_loops.md ├── lec_07_other_models.md ├── lec_08_uncomputability.md ├── lec_08a_restricted_models.md ├── lec_09_godel.md ├── lec_10_efficient_alg.md ├── lec_11_running_time.md ├── lec_12_NP.md ├── lec_13_Cook_Levin.md ├── lec_14_PvsNP.md ├── lec_14a_space_complexity.md ├── lec_15_probability.md ├── lec_16_randomized_alg.md ├── lec_17_model_rand.md ├── lec_19_cryptography.md ├── lec_20_alg_society.md ├── lec_24_proofs.md ├── lec_26_quantum_computing.md ├── macros.tex ├── metadata.yaml └── msword.md /.gitattributes: -------------------------------------------------------------------------------- 1 | # Set the default behavior, in case people don't have core.autocrlf set. 2 | * text eol=lf 3 | 4 | # Explicitly declare text files you want to always be normalized and converted 5 | # to native line endings on checkout. 6 | *.md text 7 | *.tex text 8 | *.sh text 9 | *.lua text 10 | *.html text 11 | 12 | 13 | # Declare files that will always have CRLF line endings on checkout. 14 | 15 | # Denote all files that are truly binary and should not be modified. 16 | *.png binary 17 | *.jpg binary 18 | *.pdf binary 19 | *.ttf binary 20 | *.woff binary 21 | *.eot binary 22 | -------------------------------------------------------------------------------- /.github/ISSUE_TEMPLATE/bug-or-typo-in-a-chapter-.md: -------------------------------------------------------------------------------- 1 | --- 2 | name: 'Bug or typo in a chapter ' 3 | about: Report bug(s) or typo(s) in a chapter 4 | 5 | --- 6 | 7 | Please open a different issue for each chapter, with the **chapter name** in the title - thank you! 8 | 9 | Please edit the fields below. 10 | 11 | **Chapter name**: The names of the chapter and section(s) where the typos/bugs are. (Please use _names_, not _numbers_, since the numbers tend to change). 12 | 13 | **List of bugs/typos** 14 | 15 | 16 | Please don't refer to page numbers or even section numbers since they tend to change - section titles are best. Also, please give some context (the sentence before/after the issue) to make it easier for me to find it - thanks! 17 | 18 | 1. bug 1 19 | 20 | 2. bug 2 21 | -------------------------------------------------------------------------------- /.github/ISSUE_TEMPLATE/suggestions.md: -------------------------------------------------------------------------------- 1 | --- 2 | name: Suggestions 3 | about: Suggest ideas, areas to cover, any other comments 4 | 5 | --- 6 | 7 | Please add your comments here. If you refer to a specific chapter or section of the book, please use _names_, and not _numbers_, since the latter tend to change. 8 | 9 | Thank you! 10 | -------------------------------------------------------------------------------- /CONTRIBUTING.md: -------------------------------------------------------------------------------- 1 | # Suggesting comments or typos 2 | 3 | If you see a typo, something that could be explained better, an error in a proof, a good reference I should point to, or have any other suggestion to make these notes better, then I would greatly appreciate hearing about it. 
4 | 5 | You can either do this by posting on the [issues](https://github.com/boazbk/tcs/issues) page, or if it's a very localized edit (such as a typo fix), you can also simply edit it yourself via a [pull request](https://github.com/boazbk/tcs/pulls). 6 | 7 | In an issue, please also write your full name so I can acknowledge you properly. If you do a pull request, please also edit the file `acknowledgements.md` to add your name. 8 | 9 | These notes are provided under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. To view a copy of this license, visit https://creativecommons.org/licenses/by-nc-nd/4.0/ or send a letter to Creative Commons, PO Box 1866, Mountain View, CA 94042, USA. 10 | 11 | It will remain freely and publicly available, but I may also create a printed book version of these notes in the future. 12 | By making any contribution to this work, you are assigning me the rights to use your contribution in the online or any other version of this work. 13 | 14 | 15 | _Thank you!_ 16 | 17 | Boaz 18 | -------------------------------------------------------------------------------- /LICENSE.md: -------------------------------------------------------------------------------- 1 | This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-nd/4.0/ or send a letter to Creative Commons, PO Box 1866, Mountain View, CA 94042, USA. 2 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Introduction to Theoretical Computer Science 2 | 3 | This is the git repository for a book in preparation for an introductory undergraduate course on theoretical computer science. 4 | The book is posted (in both HTML and PDF formats) on the web page https://introtcs.org 5 | 6 | Please use the [issues](https://github.com/boazbk/tcs/issues) and [pull requests](https://github.com/boazbk/tcs/pulls) to post any suggestions, comments, typo fixes, etc. 7 | 8 | 9 | There are two chapters that are still missing: space-bounded computation, and proofs and programs. 10 | 11 | Supplemental code for the book is on [github.com/boazbk/tcscode](https://github.com/boazbk/tcscode) 12 | 13 | 14 | I am producing this book from the markdown source using 15 | [Pandoc](https://pandoc.org/). 16 | The templates for the LaTeX and HTML versions are derived from [Tufte LaTeX](https://tufte-latex.github.io/tufte-latex/), [Gitbook](https://www.gitbook.com/) and [Bookdown](https://bookdown.org/). You can see the [following repository](https://github.com/boazbk/panbook) for some of the templates, scripts and [panflute](http://scorreia.com/software/panflute/) filter I am using (a minimal illustrative filter is sketched at the end of this README). 17 | 18 | 19 | This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. To view a copy of this license, visit https://creativecommons.org/licenses/by-nc-nd/4.0/ or send a letter to Creative Commons, PO Box 1866, Mountain View, CA 94042, USA. 20 | 21 | While this text will remain freely and publicly available, I might create a printed book version in the future. 22 | By making any contribution to this work, such as a typo fix or any other suggestion or edit, you are assigning me the rights to use your contribution in both the online and any other version of this work. 
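To give a flavor of what such a filter looks like, here is a minimal illustrative example (a hypothetical filter written for this README, not one of the actual filters from the panbook repository):

```python
"""A minimal Pandoc filter written with panflute (illustrative only)."""
import panflute as pf

def action(elem, doc):
    # Rewrite every literal word "TODO" in the document as an
    # emphasized placeholder, leaving all other elements untouched.
    if isinstance(elem, pf.Str) and elem.text == "TODO":
        return pf.Emph(pf.Str("[to be written]"))

def main(doc=None):
    return pf.run_filter(action, doc=doc)

if __name__ == "__main__":
    main()
```

Pandoc pipes the document's AST through such a script via its `--filter` option (e.g., `pandoc chapter.md --filter ./todo_filter.py -o chapter.pdf`, where `todo_filter.py` is the hypothetical script above), which is what lets a single set of markdown sources drive both the HTML and LaTeX builds.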
23 | -------------------------------------------------------------------------------- /__latexindent_temp.tex: -------------------------------------------------------------------------------- 1 | @inproceedings{DBLP:conf/focs/Shor94, 2 | author = {Peter W. Shor}, 3 | title = {Algorithms for Quantum Computation: Discrete Logarithms and Factoring}, 4 | booktitle = {35th Annual Symposium on Foundations of Computer Science, Santa Fe, 5 | New Mexico, USA, 20-22 November 1994}, 6 | pages = {124--134}, 7 | publisher = {{IEEE} Computer Society}, 8 | year = {1994}, 9 | url = {https://doi.org/10.1109/SFCS.1994.365700}, 10 | doi = {10.1109/SFCS.1994.365700}, 11 | timestamp = {Wed, 16 Oct 2019 14:14:54 +0200}, 12 | biburl = {https://dblp.org/rec/conf/focs/Shor94.bib}, 13 | bibsource = {dblp computer science bibliography, https://dblp.org} 14 | } -------------------------------------------------------------------------------- /acknowledgements.md: -------------------------------------------------------------------------------- 1 | # Acknowledgements for people that contributed comments or typos 2 | 3 | If you make a pull request, please also add your name here in the alphabetical order 4 | 5 | * Scott Aaronson 6 | * Michele Amoretti 7 | * Luke Bailey 8 | * Aadi Bajpai 9 | * Marguerite Basta 10 | * Anindya Basu 11 | * Sam Benkelman 12 | * Jarosław Błasiok 13 | * Emily Chan 14 | * Christy Cheng 15 | * Michelle Chiang 16 | * Gabriel Chiong 17 | * Daniel Chiu 18 | * Je-Qin Chooi 19 | * Chi-Ning Chou 20 | * Michael Colavita 21 | * Brenna Courtney 22 | * Rodrigo Daboin Sanchez 23 | * Robert Darley Waddilove 24 | * Anlan Du 25 | * Dennis Du 26 | * Juan Esteller 27 | * Junxuan Liao 28 | * David Evans 29 | * Michael Fine 30 | * Simon Fischer 31 | * Leor Fishman 32 | * Zaymon Foulds-Cook 33 | * William Fu 34 | * Kent Furuie 35 | * Piotr Galuszka 36 | * Carolyn Ge 37 | * Jason Giroux 38 | * Mark Goldstein 39 | * Alexander Golovnev 40 | * Sayan Goswami 41 | * Maxwell Grozovsky 42 | * Jeffrey Gu 43 | * Kenneth Gu 44 | * Michael Haak 45 | * Rebecca Hao 46 | * Lucia Hoerr 47 | * Joosep Hook 48 | * Austin Houck 49 | * Catherine Huang 50 | * Thomas HUET 51 | * Emily Jia 52 | * Serdar Kaçka 53 | * Chan Kang 54 | * Nina Katz-Christy 55 | * Vidak Kazic 56 | * Joe Kerrigan 57 | * Eddie Kohler 58 | * Estefania Lahera 59 | * Allison Lee 60 | * Benjamin Lee 61 | * Ondřej Lengál 62 | * Raymond Lin 63 | * Emma Ling 64 | * Alex Lombardi 65 | * Lisa Lu 66 | * Zijian(Carl) Ma 67 | * Kai Ma 68 | * Aditya Mahadevan 69 | * Kunal Marwaha 70 | * Christian May 71 | * Josh Mehr 72 | * Jacob Meyerson 73 | * Leon Mlodzian 74 | * George Moe 75 | * Kian Moretz 76 | * Todd Morrill 77 | * Glenn Moss 78 | * Haley Mulligan 79 | * Hamish Nicholson 80 | * Hanry Xu 81 | * Owen Niles 82 | * Sandip Nirmel 83 | * Sebastian Oberhoff 84 | * Thomas Orton 85 | * Joshua Pan 86 | * Pablo Parrilo 87 | * Juan Perdomo 88 | * Banks Pickett 89 | * Aaron Sachs 90 | * Abdelrhman Saleh 91 | * Brian Sapozhnikov 92 | * Anthony Scemama 93 | * Peter Schäfer 94 | * Josh Seides 95 | * Alaisha Sharma 96 | * Nathan Sheely 97 | * Haneul Shin 98 | * Noah Singer 99 | * Matthew Smedberg 100 | * Miguel Solano 101 | * Hikari Sorensen 102 | * David Steurer 103 | * Alec Sun 104 | * Amol Surati 105 | * Everett Sussman 106 | * Marika Swanberg 107 | * Garrett Tanzer 108 | * Eric Thomas 109 | * Sarah Turnill 110 | * Salil Vadhan 111 | * Adrien Vandenbroucque 112 | * Jeffrey Wang 113 | * Patrick Watts 114 | * Jonah Weissman 115 | * Abraham Wieland 116 | * Ryan Williams 117 | * 
Christina Xiao 118 | * Licheng Xu 119 | * Richard Xu 120 | * Wanqian Yang 121 | * Elizabeth Yeoh-Wang 122 | * Sun-Jung Yum 123 | * Josh Zelinsky 124 | * Fred Zhang 125 | * Grace Zhang 126 | * Alex Zhao 127 | * Jessica Zhu 128 | 129 | 130 | -------------------------------------------------------------------------------- /index.md: -------------------------------------------------------------------------------- 1 | --- 2 | indexpage: true 3 | indexcontents: "" 4 | suppress-bibliography: true 5 | title: "index" 6 | filename: "lnotes_book" 7 | --- 8 | 9 | # Introduction to Theoretical Computer Science 10 | 11 | __Boaz Barak__ 12 | 13 | _Work in progress_ 14 | 15 | 16 | This is a textbook in preparation for an introductory undergraduate course on theoretical computer science. 17 | I am using this text for [Harvard CS 121](http://cs121.boazbarak.org). 18 | It is also used for [UVa CS 3102](https://uvatoc.github.io) and [UCLA CS181](https://hackmd.io/@raghum/introtcs). 19 | 20 | 21 | See below for individual chapters. You can also download the [book in a single PDF file](https://files.boazbarak.org/introtcs/lnotes_book.pdf) (about 600 pages, 10MB). 22 | 23 | If you have any _comments, suggestions, typo fixes_, etc., I would be very grateful if you post them as an [**issue**](https://github.com/boazbk/tcs/issues) or [**pull request**](https://github.com/boazbk/tcs/pulls) in the [**GitHub repository boazbk/tcs**](https://github.com/boazbk/tcs) where I am maintaining the source files for these notes. 24 | You can also post comments on each chapter in the links below. 25 | 26 | 27 | See [the tcscode repository](https://github.com/boazbk/tcscode) for Jupyter notebooks with supplementary code for the book. 28 | 29 | 30 | For prior versions of the book, see the [repository release page](https://github.com/boazbk/tcs/releases). The most updated version of this book is always on this page. 31 | 32 | 33 | __Frozen version for Fall 2023:__ I will only be making minor edits (typos, local fixes) during the fall so as not to disrupt teaching. For consistency in references and exercises, instructors can use the following version frozen as of July 24, 2023: [Introduction to TCS version 0.95](https://github.com/boazbk/tcs/releases/download/v0.95/lnotes_book_fall2023.pdf) 34 | 35 | -------------------------------------------------------------------------------- /introtcs.bib: -------------------------------------------------------------------------------- 1 | 2 | @Book{hopcroft, 3 | author = {Hopcroft, John and Motwani, Rajeev and Ullman, Jeffrey}, 4 | title = {Introduction to automata theory, languages, and computation}, 5 | publisher = {Pearson Education}, 6 | year = {2014}, 7 | address = {Harlow, Essex}, 8 | isbn = {1292039051} 9 | } 10 | 11 | @Book{kozen1997automata, 12 | author = {Kozen, Dexter}, 13 | title = {Automata and computability}, 14 | publisher = {Springer}, 15 | year = {1997}, 16 | address = {New York}, 17 | isbn = {978-3-642-85706-5} 18 | } 19 | 20 | @book{SipserBook, 21 | author = {Michael Sipser}, 22 | title = {Introduction to the theory of computation}, 23 | publisher = {{PWS} Publishing Company}, 24 | year = {1997}, 25 | isbn = {978-0-534-94728-6}, 26 | timestamp = {Thu, 21 Apr 2011 19:59:45 +0200}, 27 | biburl = {https://dblp.org/rec/bib/books/daglib/0086373}, 28 | bibsource = {dblp computer science bibliography, https://dblp.org} 29 | } 30 | 31 | @book{HopcroftUllman79, 32 | author = {John E. Hopcroft and 33 | Jeffrey D. 
Ullman}, 34 | title = {Introduction to Automata Theory, Languages and Computation}, 35 | publisher = {Addison-Wesley}, 36 | year = {1979}, 37 | isbn = {0-201-02988-X}, 38 | timestamp = {Thu, 03 Jan 2002 11:51:07 +0100}, 39 | biburl = {https://dblp.org/rec/bib/books/aw/HopcroftU79}, 40 | bibsource = {dblp computer science bibliography, https://dblp.org} 41 | } 42 | 43 | @book{Boole1854, 44 | title={An investigation of the laws of thought: on which are founded the mathematical theories of logic and probabilities}, 45 | author={Boole, George}, 46 | year={1854}, 47 | publisher={Dover Publications} 48 | } 49 | 50 | 51 | @book{DeMorgan1847, 52 | title={Formal logic: or, the calculus of inference, necessary and probable}, 53 | author={De Morgan, Augustus}, 54 | year={1847}, 55 | publisher={Taylor and Walton} 56 | } 57 | 58 | 59 | @book{Boole1847mathematical, 60 | title={The mathematical analysis of logic}, 61 | author={Boole, George}, 62 | year={1847}, 63 | publisher={Philosophical Library} 64 | } 65 | 66 | @article{Turing37, 67 | title={On computable numbers, with an application to the Entscheidungsproblem}, 68 | author={Turing, Alan M}, 69 | journal={Proceedings of the London mathematical society}, 70 | volume={2}, 71 | number={1}, 72 | pages={230--265}, 73 | year={1937} 74 | } 75 | 76 | 77 | @article{McCullochPitts43, 78 | title={A logical calculus of the ideas immanent in nervous activity}, 79 | author={McCulloch, Warren S and Pitts, Walter}, 80 | journal={The bulletin of mathematical biophysics}, 81 | volume={5}, 82 | number={4}, 83 | pages={115--133}, 84 | year={1943}, 85 | publisher={Springer} 86 | } 87 | 88 | 89 | @article{RabinScott59, 90 | title={Finite automata and their decision problems}, 91 | author={Rabin, Michael O and Scott, Dana}, 92 | journal={IBM journal of research and development}, 93 | volume={3}, 94 | number={2}, 95 | pages={114--125}, 96 | year={1959}, 97 | publisher={IBM} 98 | } 99 | 100 | 101 | @article{Chomsky56, 102 | title={Three models for the description of language}, 103 | author={Chomsky, Noam}, 104 | journal={IRE Transactions on information theory}, 105 | volume={2}, 106 | number={3}, 107 | pages={113--124}, 108 | year={1956}, 109 | publisher={IEEE} 110 | } 111 | 112 | @article{Karatsuba95, 113 | title={The complexity of computations}, 114 | author={Karatsuba, ANATOLII ALEXEEVICH}, 115 | journal={Proceedings of the Steklov Institute of Mathematics-Interperiodica Translation}, 116 | volume={211}, 117 | pages={169--183}, 118 | year={1995}, 119 | publisher={Providence, RI: American Mathematical Society} 120 | } 121 | 122 | 123 | @inproceedings{Toom63, 124 | title={The complexity of a scheme of functional elements realizing the multiplication of integers}, 125 | author={Toom, Andrei L}, 126 | booktitle={Soviet Mathematics Doklady}, 127 | volume={3}, 128 | number={4}, 129 | pages={714--716}, 130 | year={1963} 131 | } 132 | 133 | 134 | @article{Cook66, 135 | title={On the minimum computation time for multiplication}, 136 | author={Cook, SA}, 137 | journal={Doctoral dissertation, Harvard University Department of Mathematics, Cambridge, Mass.}, 138 | volume={1}, 139 | year={1966} 140 | } 141 | 142 | 143 | @article{SchonhageStrassen71, 144 | title={Schnelle multiplikation grosser zahlen}, 145 | author={Sch{\"o}nhage, Arnold and Strassen, Volker}, 146 | journal={Computing}, 147 | volume={7}, 148 | number={3-4}, 149 | pages={281--292}, 150 | year={1971}, 151 | publisher={Springer} 152 | } 153 | 154 | @inproceedings{Furer07, 155 | author = {Martin F\"{u}rer}, 156 | title = 
{Faster integer multiplication}, 157 | booktitle = {Proceedings of the 39th Annual {ACM} Symposium on Theory of Computing, 158 | San Diego, California, USA, June 11-13, 2007}, 159 | pages = {57--66}, 160 | year = {2007}, 161 | } 162 | 163 | 164 | @article{Strassen69, 165 | title={Gaussian elimination is not optimal}, 166 | author={Strassen, Volker}, 167 | journal={Numerische mathematik}, 168 | volume={13}, 169 | number={4}, 170 | pages={354--356}, 171 | year={1969}, 172 | publisher={Springer} 173 | } 174 | 175 | 176 | @book{CLRS, 177 | author = {Thomas H. Cormen and 178 | Charles E. Leiserson and 179 | Ronald L. Rivest and 180 | Clifford Stein}, 181 | title = {Introduction to Algorithms, 3rd Edition}, 182 | publisher = {{MIT} Press}, 183 | year = {2009}, 184 | url = {http://mitpress.mit.edu/books/introduction-algorithms}, 185 | isbn = {978-0-262-03384-8}, 186 | timestamp = {Fri, 12 May 2017 13:30:20 +0200}, 187 | biburl = {https://dblp.org/rec/bib/books/daglib/0023376}, 188 | bibsource = {dblp computer science bibliography, https://dblp.org} 189 | } 190 | 191 | 192 | @book{KleinbergTardos06, 193 | author = {Jon M. Kleinberg and 194 | {\'{E}}va Tardos}, 195 | title = {Algorithm design}, 196 | publisher = {Addison-Wesley}, 197 | year = {2006}, 198 | isbn = {978-0-321-37291-8}, 199 | timestamp = {Wed, 23 Mar 2011 16:58:37 +0100}, 200 | biburl = {https://dblp.org/rec/bib/books/daglib/0015106}, 201 | bibsource = {dblp computer science bibliography, https://dblp.org} 202 | } 203 | 204 | @book{DasguptaPV08, 205 | author = {Sanjoy Dasgupta and 206 | Christos H. Papadimitriou and 207 | Umesh V. Vazirani}, 208 | title = {Algorithms}, 209 | publisher = {McGraw-Hill}, 210 | year = {2008}, 211 | isbn = {978-0-07-352340-8}, 212 | timestamp = {Wed, 09 Feb 2011 13:20:29 +0100}, 213 | biburl = {https://dblp.org/rec/bib/books/daglib/0017733}, 214 | bibsource = {dblp computer science bibliography, https://dblp.org} 215 | } 216 | 217 | 218 | @book{Blaser13, 219 | author = {Bl{\"a}ser, Markus}, 220 | title = {Fast Matrix Multiplication}, 221 | year = {2013}, 222 | pages = {1--60}, 223 | doi = {10.4086/toc.gs.2013.005}, 224 | publisher = {Theory of Computing Library}, 225 | number = {5}, 226 | series = {Graduate Surveys}, 227 | URL = {http://www.theoryofcomputing.org/library.html}, 228 | } 229 | 230 | 231 | @Book{LewisZax19, 232 | author = {Lewis, Harry and Zax, Rachel}, 233 | title = {Essential Discrete Mathematics for Computer Science}, 234 | publisher = {Princeton University Press}, 235 | year = {2019}, 236 | isbn = {9780691190617} 237 | } 238 | 239 | @Book{Livio05, 240 | author = {Livio, Mario}, 241 | title = {The equation that couldn't be solved : how mathematical genius discovered the language of symmetry}, 242 | publisher = {Simon \& Schuster}, 243 | year = {2005}, 244 | address = {New York}, 245 | isbn = {0743258207} 246 | } 247 | 248 | 249 | @article{Cooley87FFTdiscovery, 250 | title={The re-discovery of the fast Fourier transform algorithm}, 251 | author={Cooley, James W}, 252 | journal={Microchimica Acta}, 253 | volume={93}, 254 | number={1-6}, 255 | pages={33--45}, 256 | year={1987}, 257 | publisher={Springer} 258 | } 259 | 260 | 261 | @article{heideman1985gauss, 262 | title={Gauss and the history of the fast Fourier transform}, 263 | author={Heideman, Michael T and Johnson, Don H and Burrus, C Sidney}, 264 | journal={Archive for history of exact sciences}, 265 | volume={34}, 266 | number={3}, 267 | pages={265--277}, 268 | year={1985}, 269 | publisher={Springer} 270 | } 271 | 272 | 273 | 
@article{Werbos74, 274 | title={Beyond regression: New tools for prediction and analysis in the behavioral sciences. Ph. D. thesis, Harvard University, Cambridge, MA, 1974.}, 275 | author={Werbos, PJ}, 276 | year={1974} 277 | } 278 | 279 | 280 | @techreport{pagerank99, 281 | title={The PageRank citation ranking: Bringing order to the web.}, 282 | author={Page, Lawrence and Brin, Sergey and Motwani, Rajeev and Winograd, Terry}, 283 | year={1999}, 284 | institution={Stanford InfoLab} 285 | } 286 | 287 | @article{Kleinber99, 288 | title={Authoritative sources in a hyperlinked environment}, 289 | author={Kleinberg, Jon M}, 290 | journal={Journal of the ACM (JACM)}, 291 | volume={46}, 292 | number={5}, 293 | pages={604--632}, 294 | year={1999}, 295 | publisher={ACM} 296 | } 297 | 298 | @inproceedings{Akamai97, 299 | author = {David R. Karger and 300 | Eric Lehman and 301 | Frank Thomson Leighton and 302 | Rina Panigrahy and 303 | Matthew S. Levine and 304 | Daniel Lewin}, 305 | title = {Consistent Hashing and Random Trees: Distributed Caching Protocols for Relieving Hot Spots on the World Wide Web}, 306 | booktitle = {Proceedings of the Twenty-Ninth Annual {ACM} Symposium on the Theory of Computing, El Paso, Texas, USA, May 4-6, 1997}, 307 | pages = {654--663}, 308 | year = {1997}, 309 | url = {https://doi.org/10.1145/258533.258660}, 310 | doi = {10.1145/258533.258660}, 311 | timestamp = {Wed, 14 Nov 2018 10:51:37 +0100}, 312 | biburl = {https://dblp.org/rec/bib/conf/stoc/KargerLLPLL97}, 313 | bibsource = {dblp computer science bibliography, https://dblp.org} 314 | } 315 | 316 | @article{compressedmri08, 317 | title={Compressed sensing MRI}, 318 | author={Lustig, Michael and Donoho, David L and Santos, Juan M and Pauly, John M}, 319 | journal={IEEE signal processing magazine}, 320 | volume={25}, 321 | number={2}, 322 | pages={72--82}, 323 | year={2008}, 324 | publisher={IEEE} 325 | } 326 | 327 | @article{AgrawalKayalSaxena04, 328 | title={{PRIMES} is in {P}}, 329 | author={Agrawal, Manindra and Kayal, Neeraj and Saxena, Nitin}, 330 | journal={Annals of mathematics}, 331 | pages={781--793}, 332 | year={2004} 333 | } 334 | 335 | @article{CandesRombergTao06, 336 | title={Stable signal recovery from incomplete and inaccurate measurements}, 337 | author={Candes, Emmanuel J and Romberg, Justin K and Tao, Terence}, 338 | journal={Communications on Pure and Applied Mathematics: A Journal Issued by the Courant Institute of Mathematical Sciences}, 339 | volume={59}, 340 | number={8}, 341 | pages={1207--1223}, 342 | year={2006}, 343 | } 344 | 345 | @article{Donoho2006compressed, 346 | title={Compressed sensing}, 347 | author={Donoho, David L}, 348 | journal={IEEE Transactions on information theory}, 349 | volume={52}, 350 | number={4}, 351 | pages={1289--1306}, 352 | year={2006}, 353 | publisher={IEEE} 354 | } 355 | 356 | @article{Ellenberg10wired, 357 | title={Fill in the blanks: Using math to turn lo-res datasets into hi-res samples}, 358 | author={Ellenberg, Jordan}, 359 | journal={Wired Magazine}, 360 | volume={18}, 361 | number={3}, 362 | pages={501--509}, 363 | year={2010} 364 | } 365 | 366 | 367 | @article{Hardy41, 368 | title={A Mathematician's Apology}, 369 | author={Hardy, GH}, 370 | year={1941} 371 | } 372 | 373 | @Book{LehmanLeightonMeyer, 374 | title = {Mathematics for Computer Science}, 375 | author = {Lehman, Eric and Leighton, Thomson F. 
and Meyer, Albert R.}, 376 | year = {2018}, 377 | notes = {Lecture notes for MIT Course 6.042; latest version available from the course webpage.} 378 | } 379 | 380 | @Book{Rosen19discrete, 381 | author = {Rosen, Kenneth}, 382 | title = {Discrete mathematics and its applications}, 383 | publisher = {McGraw-Hill}, 384 | year = {2019}, 385 | address = {New York, NY}, 386 | isbn = {125967651x} 387 | } 388 | 389 | @misc{Fleck, 390 | title = {Building Blocks for Theoretical Computer Science}, 391 | author = {Margaret M. Fleck}, 392 | year = {2018}, 393 | note = {Online book, available at \url{http://mfleck.cs.illinois.edu/building-blocks/}.} 394 | } 395 | 396 | @Book{Kun18, 397 | author = {Kun, Jeremy}, 398 | title = {A programmer's introduction to mathematics}, 399 | publisher = {CreateSpace Independent Publishing Platform}, 400 | year = {2018}, 401 | address = {Middletown, DE}, 402 | isbn = {1727125452} 403 | } 404 | 405 | @misc{AspensDiscreteMath, 406 | author = {James Aspens}, 407 | title = {Notes on Discrete Mathematics}, 408 | year = {2018}, 409 | note = {Online textbook for CS 202. Available on \url{http://www.cs.yale.edu/homes/aspnes/classes/202/notes.pdf}.} 410 | } 411 | 412 | @Book{Solow14, 413 | author = {Solow, Daniel}, 414 | title = {How to read and do proofs : an introduction to mathematical thought processes}, 415 | publisher = {John Wiley \& Sons, Inc}, 416 | year = {2014}, 417 | address = {Hoboken, New Jersey}, 418 | isbn = {9781118164020} 419 | } 420 | 421 | 422 | @article{CoverThomas06, 423 | title={Elements of information theory 2nd edition}, 424 | author={Cover, Thomas M. and Thomas, Joy A.}, 425 | journal={Willey-Interscience: NJ}, 426 | year={2006}, 427 | isbn = {0471241954} 428 | } 429 | 430 | 431 | @book{MooreMertens11, 432 | title={The nature of computation}, 433 | author={Moore, Cristopher and Mertens, Stephan}, 434 | year={2011}, 435 | publisher={Oxford University Press} 436 | } 437 | 438 | @book{Aaronson13democritus, 439 | title={Quantum computing since Democritus}, 440 | author={Aaronson, Scott}, 441 | year={2013}, 442 | publisher={Cambridge University Press} 443 | } 444 | 445 | 446 | @book{Dauben90cantor, 447 | title={Georg Cantor: His mathematics and philosophy of the infinite}, 448 | author={Dauben, Joseph Warren}, 449 | year={1990}, 450 | publisher={Princeton University Press} 451 | } 452 | 453 | 454 | @book{halmos1960naive, 455 | title={Naive set theory}, 456 | author={Halmos, Paul R}, 457 | year={1960}, 458 | note ={Republished in 2017 by Courier Dover Publications.} 459 | } 460 | 461 | 462 | @article{Shannon1938, 463 | title={A symbolic analysis of relay and switching circuits}, 464 | author={Shannon, Claude E}, 465 | journal={Electrical Engineering}, 466 | volume={57}, 467 | number={12}, 468 | pages={713--723}, 469 | year={1938}, 470 | publisher={IEEE} 471 | } 472 | 473 | 474 | @article{HopcroftUllman69, 475 | title={Formal languages and their relation to automata}, 476 | author={Hopcroft, John E and Ullman, Jeffrey D}, 477 | year={1969}, 478 | publisher={Addison-Wesley Longman Publishing Co., Inc.} 479 | } 480 | 481 | 482 | @book{Savage1998models, 483 | title={Models of computation}, 484 | author={Savage, John E}, 485 | volume={136}, 486 | year={1998}, 487 | publisher={Addison-Wesley Reading, MA}, 488 | note = {Available electronically at \url{http://cs.brown.edu/people/jsavage/book/}.} 489 | } 490 | 491 | 492 | @article{Sheffer1913, 493 | title={A set of five independent postulates for Boolean algebras, with application to logical constants}, 494 | 
author={Sheffer, Henry Maurice}, 495 | journal={Transactions of the American mathematical society}, 496 | volume={14}, 497 | number={4}, 498 | pages={481--488}, 499 | year={1913}, 500 | publisher={JSTOR} 501 | } 502 | 503 | 504 | @book{WhiteheadRussell1912, 505 | title={Principia mathematica}, 506 | author={Whitehead, Alfred North and Russell, Bertrand}, 507 | volume={2}, 508 | year={1912}, 509 | publisher={University Press} 510 | } 511 | 512 | @book{Peirce1976, 513 | title={The new elements of mathematics}, 514 | author={Peirce, Charles Sanders and Eisele, Carolyn}, 515 | volume={4}, 516 | year={1976}, 517 | publisher={Mouton The Hague}, 518 | note = {Works of Charles Peirce, collected and edited by Carolyn Eisele.} 519 | } 520 | 521 | 522 | 523 | @article{Burks1978charles, 524 | title={Booke review: Charles S. Peirce, The new elements of mathematics}, 525 | author={Burks, Arthur W}, 526 | journal={Bulletin of the American Mathematical Society}, 527 | volume={84}, 528 | number={5}, 529 | pages={913--918}, 530 | year={1978}, 531 | publisher={American Mathematical Society} 532 | } 533 | 534 | 535 | @book{Jukna12, 536 | title={Boolean function complexity: advances and frontiers}, 537 | author={Jukna, Stasys}, 538 | volume={27}, 539 | year={2012}, 540 | publisher={Springer Science \& Business Media} 541 | } 542 | 543 | 544 | @book{NisanShocken2005, 545 | title={The elements of computing systems: building a modern computer from first principles}, 546 | author={Nisan, Noam and Schocken, Shimon}, 547 | year={2005}, 548 | publisher={MIT press} 549 | } 550 | 551 | 552 | @article{Lupanov1958, 553 | title={A circuit synthesis method}, 554 | author={Lupanov, O}, 555 | journal={Izv. Vuzov, Radiofizika}, 556 | volume={1}, 557 | number={1}, 558 | pages={120--130}, 559 | year={1958} 560 | } 561 | 562 | 563 | @inproceedings{Valiant1976, 564 | title={Universal circuits (preliminary report)}, 565 | author={Valiant, Leslie G}, 566 | booktitle={Proceedings of the eighth annual ACM symposium on Theory of computing}, 567 | pages={196--203}, 568 | year={1976}, 569 | organization={ACM} 570 | } 571 | 572 | @article{lipmaa2016valiant, 573 | title={Valiant's Universal Circuit: Improvements, Implementation, and Applications.}, 574 | author={Lipmaa, Helger and Mohassel, Payman and Sadeghian, Seyed Saeed}, 575 | journal={IACR Cryptology ePrint Archive}, 576 | volume={2016}, 577 | pages={17}, 578 | year={2016} 579 | } 580 | 581 | @inproceedings{Gunther2017, 582 | title={More efficient universal circuit constructions}, 583 | author={G{\"u}nther, Daniel and Kiss, {\'A}gnes and Schneider, Thomas}, 584 | booktitle={International Conference on the Theory and Application of Cryptology and Information Security}, 585 | pages={443--470}, 586 | year={2017}, 587 | organization={Springer} 588 | } 589 | 590 | 591 | @book{wegener1987complexity, 592 | title={The complexity of Boolean functions}, 593 | author={Wegener, Ingo}, 594 | volume={1}, 595 | year={1987}, 596 | publisher={BG Teubner Stuttgart} 597 | } 598 | 599 | 600 | @phdthesis{Ernst2009phd, 601 | title={Optimal combinational multi-level logic synthesis}, 602 | author={Ernst, Elizabeth Ann}, 603 | year={2009}, 604 | school={University of Michigan}, 605 | note = {Available on \url{https://deepblue.lib.umich.edu/handle/2027.42/62373}.} 606 | } 607 | 608 | 609 | @Book{shetterly2016hidden, 610 | author = {Shetterly, Margot}, 611 | title = {Hidden figures : the American dream and the untold story of the Black women mathematicians who helped win the space race}, 612 | publisher = 
{William Morrow}, 613 | year = {2016}, 614 | address = {New York, NY}, 615 | isbn = {9780062363602} 616 | } 617 | 618 | 619 | @Book{sobel2017the, 620 | author = {Sobel, Dava}, 621 | title = {The Glass Universe : How the Ladies of the Harvard Observatory Took the Measure of the Stars}, 622 | publisher = {Penguin Books}, 623 | year = {2017}, 624 | address = {New York}, 625 | isbn = {9780143111344} 626 | } 627 | 628 | 629 | @article{Wang1957, 630 | title={A variant to Turing's theory of computing machines}, 631 | author={Wang, Hao}, 632 | journal={Journal of the ACM (JACM)}, 633 | volume={4}, 634 | number={1}, 635 | pages={63--92}, 636 | year={1957}, 637 | publisher={ACM} 638 | } 639 | 640 | 641 | @book{lupanov1984, 642 | title={Asymptotic complexity bounds for control circuits}, 643 | author={Lupanov, Oleg B.}, 644 | year={1984}, 645 | publisher={MSU}, 646 | note = {In Russian.} 647 | } 648 | 649 | 650 | @Book{church1941, 651 | author = {Church, Alonzo}, 652 | title = {The calculi of lambda-conversion}, 653 | publisher = {Princeton University Press H. Milford, Oxford University Press}, 654 | year = {1941}, 655 | address = {Princeton London}, 656 | isbn = {978-0-691-08394-0} 657 | } 658 | 659 | 660 | @Book{pierce2002types, 661 | author = {Pierce, Benjamin}, 662 | title = {Types and programming languages}, 663 | publisher = {MIT Press}, 664 | year = {2002}, 665 | address = {Cambridge, Mass}, 666 | isbn = {0262162091} 667 | } 668 | 669 | 670 | @Book{barendregt1984, 671 | author = {Barendregt, H. P.}, 672 | title = {The lambda calculus : its syntax and semantics}, 673 | publisher = {North-Holland Sole distributors for the U.S.A. and Canada, Elsevier Science Pub. Co}, 674 | year = {1984}, 675 | address = {Amsterdam New York New York, N.Y}, 676 | isbn = {0444875085} 677 | } 678 | 679 | 680 | @inproceedings{hagerup1998, 681 | title={Sorting and searching on the word RAM}, 682 | author={Hagerup, Torben}, 683 | booktitle={Annual Symposium on Theoretical Aspects of Computer Science}, 684 | pages={366--398}, 685 | year={1998}, 686 | organization={Springer} 687 | } 688 | 689 | 690 | @article{fredman1993, 691 | title={Surpassing the information theoretic bound with fusion trees}, 692 | author={Fredman, Michael L and Willard, Dan E}, 693 | journal={Journal of computer and system sciences}, 694 | volume={47}, 695 | number={3}, 696 | pages={424--436}, 697 | year={1993}, 698 | publisher={Elsevier} 699 | } 700 | 701 | 702 | @article{shamir1979, 703 | title={Factoring numbers in O (logn) arithmetic steps}, 704 | author={Shamir, Adi}, 705 | journal={Information Processing Letters}, 706 | volume={8}, 707 | number={1}, 708 | pages={28--31}, 709 | year={1979}, 710 | publisher={Elsevier} 711 | } 712 | 713 | 714 | @article{thompson1984reflections, 715 | title={Reflections on trusting trust}, 716 | author={Thompson, Ken}, 717 | journal={Communications of the ACM}, 718 | volume={27}, 719 | number={8}, 720 | pages={761--763}, 721 | year={1984}, 722 | publisher={ACM} 723 | } 724 | 725 | 726 | 727 | @Book{grabiner2005the, 728 | author = {Grabiner, Judith}, 729 | title = {The origins of Cauchy's rigorous calculus}, 730 | publisher = {Dover Publications}, 731 | year = {2005}, 732 | address = {Mineola, N.Y}, 733 | isbn = {9780486438153} 734 | } 735 | 736 | @article{grabiner1983gave, 737 | title={Who gave you the epsilon? 
Cauchy and the origins of rigorous calculus}, 738 | author={Grabiner, Judith V}, 739 | journal={The American Mathematical Monthly}, 740 | volume={90}, 741 | number={3}, 742 | pages={185--194}, 743 | year={1983}, 744 | publisher={Taylor \& Francis} 745 | } 746 | 747 | 748 | 749 | @inbook{Lutzen2002, 750 | place={Cambridge}, 751 | series={The Cambridge History of Science}, 752 | title={Between Rigor and Applications: Developments in the Concept of Function in Mathematical Analysis}, 753 | volume={5}, 754 | DOI={10.1017/CHOL9780521571999.026}, 755 | booktitle={The Cambridge History of Science}, 756 | publisher={Cambridge University Press}, 757 | author={Lützen, Jesper}, 758 | editor={Nye, Mary JoEditor}, 759 | year={2002}, 760 | pages={468–487}, 761 | collection={The Cambridge History of Science} 762 | } 763 | 764 | 765 | @article{Kleiner91, 766 | ISSN = {0025570X, 19300980}, 767 | URL = {http://www.jstor.org/stable/2690647}, 768 | author = {Israel Kleiner}, 769 | journal = {Mathematics Magazine}, 770 | number = {5}, 771 | pages = {291--314}, 772 | publisher = {Mathematical Association of America}, 773 | title = {Rigor and Proof in Mathematics: A Historical Perspective}, 774 | volume = {64}, 775 | year = {1991} 776 | } 777 | 778 | 779 | @Book{cohen2008set, 780 | author = {Cohen, Paul}, 781 | title = {Set theory and the continuum hypothesis}, 782 | publisher = {Dover Publications}, 783 | year = {2008}, 784 | address = {Mineola, N.Y}, 785 | isbn = {0486469212} 786 | } 787 | 788 | 789 | @Book{hofstadter1999, 790 | author = {Hofstadter, Douglas}, 791 | title = {Gödel, Escher, Bach : an eternal golden braid}, 792 | publisher = {Basic Books}, 793 | year = {1999}, 794 | address = {New York}, 795 | isbn = {0465026567} 796 | } 797 | 798 | 799 | @Book{Holt2018, 800 | author = {Holt, Jim}, 801 | title = {When Einstein walked with Gödel : excursions to the edge of thought}, 802 | publisher = {Farrar, Straus and Giroux}, 803 | year = {2018}, 804 | address = {New York}, 805 | isbn = {0374146705} 806 | } 807 | 808 | 809 | @article{shelah2003logical, 810 | title={Logical dreams}, 811 | author={Shelah, Saharon}, 812 | journal={Bulletin of the American Mathematical Society}, 813 | volume={40}, 814 | number={2}, 815 | pages={203--228}, 816 | year={2003} 817 | } 818 | 819 | 820 | @Book{singh1997fermat, 821 | author = {Singh, Simon}, 822 | title = {Fermat's enigma : the quest to solve the world's greatest mathematical problem}, 823 | publisher = {Walker}, 824 | year = {1997}, 825 | address = {New York}, 826 | isbn = {0385493622} 827 | } 828 | 829 | 830 | @article{rice1953classes, 831 | title={Classes of recursively enumerable sets and their decision problems}, 832 | author={Rice, Henry Gordon}, 833 | journal={Transactions of the American Mathematical Society}, 834 | volume={74}, 835 | number={2}, 836 | pages={358--366}, 837 | year={1953}, 838 | publisher={JSTOR} 839 | } 840 | 841 | @Book{stein1987ada, 842 | author = {Stein, Dorothy}, 843 | title = {Ada : a life and a legacy}, 844 | publisher = {MIT Press}, 845 | year = {1987}, 846 | address = {Cambridge, Mass}, 847 | isbn = {0262691167} 848 | } 849 | 850 | @article{holt2001ada, 851 | title={The Ada perplex: how Byron’s daughter came to be celebrated as a cybervisionary}, 852 | author={Holt, Jim}, 853 | journal={The New Yorker}, 854 | volume={5}, 855 | pages={88--93}, 856 | year={2001} 857 | } 858 | 859 | @book{collier2000charles, 860 | title={Charles Babbage: And the engines of perfection}, 861 | author={Collier, Bruce and MacLachlan, James}, 862 | year={2000}, 
863 | publisher={Oxford University Press} 864 | } 865 | 866 | @Book{swade2002the, 867 | author = {Swade, Doron}, 868 | title = {The difference engine : Charles Babbage and the quest to build the first computer}, 869 | publisher = {Penguin Books}, 870 | year = {2002}, 871 | address = {New York}, 872 | isbn = {0142001449} 873 | } 874 | 875 | 876 | @article{vonNeumann45, 877 | title={First Draft of a Report on the EDVAC}, 878 | author={von Neumann, John}, 879 | year={1945}, 880 | note ={Reprinted in the IEEE Annals of the History of Computing journal, 1993}, 881 | } 882 | 883 | 884 | @unpublished{HarveyvdHoeven2019, 885 | TITLE = {Integer multiplication in time O(n log n)}, 886 | AUTHOR = {Harvey, David and Van Der Hoeven, Joris}, 887 | URL = {https://hal.archives-ouvertes.fr/hal-02070778}, 888 | NOTE = {working paper or preprint}, 889 | YEAR = {2019}, 890 | MONTH = {3}, 891 | PDF = {https://hal.archives-ouvertes.fr/hal-02070778/file/nlogn.pdf}, 892 | HAL_ID = {hal-02070778}, 893 | HAL_VERSION = {v1}, 894 | } 895 | 896 | 897 | @article{schrijver2005history, 898 | title={On the history of combinatorial optimization (till 1960)}, 899 | author={Schrijver, Alexander}, 900 | journal={Handbooks in operations research and management science}, 901 | volume={12}, 902 | pages={1--68}, 903 | year={2005}, 904 | publisher={Elsevier} 905 | } 906 | 907 | @misc{TardosKleinberg, 908 | title={Algorithm Design}, 909 | author={Tardos, Eva and Kleinberg, Jon}, 910 | year={2006}, 911 | publisher={Reading (MA): Addison-Wesley} 912 | } 913 | 914 | @book{dasgupta2008algorithms, 915 | title={Algorithms}, 916 | author={Dasgupta, Sanjoy and Papadimitriou, Christos H and Vazirani, Umesh Virkumar}, 917 | year={2008}, 918 | publisher={McGraw-Hill Higher Education} 919 | } 920 | 921 | 922 | @article{maass1985combinatorial, 923 | title={Combinatorial lower bound arguments for deterministic and nondeterministic Turing machines}, 924 | author={Maass, Wolfgang}, 925 | journal={Transactions of the American Mathematical Society}, 926 | volume={292}, 927 | number={2}, 928 | pages={675--693}, 929 | year={1985} 930 | } 931 | 932 | @article{gomes2008satisfiability, 933 | title={Satisfiability solvers}, 934 | author={Gomes, Carla P and Kautz, Henry and Sabharwal, Ashish and Selman, Bart}, 935 | journal={Foundations of Artificial Intelligence}, 936 | volume={3}, 937 | pages={89--134}, 938 | year={2008}, 939 | publisher={Elsevier} 940 | } 941 | 942 | 943 | @article{cohen1981holy, 944 | title={On holy wars and a plea for peace}, 945 | author={Cohen, Danny}, 946 | journal={Computer}, 947 | volume={14}, 948 | number={10}, 949 | pages={48--54}, 950 | year={1981}, 951 | publisher={IEEE} 952 | } 953 | 954 | @article{cook1973time, 955 | title={Time bounded random access machines}, 956 | author={Cook, Stephen A and Reckhow, Robert A}, 957 | journal={Journal of Computer and System Sciences}, 958 | volume={7}, 959 | number={4}, 960 | pages={354--375}, 961 | year={1973}, 962 | publisher={Elsevier} 963 | } 964 | 965 | @inproceedings{baiocchi2001three, 966 | title={Three small universal Turing machines}, 967 | author={Baiocchi, Claudio}, 968 | booktitle={International Conference on Machines, Computations, and Universality}, 969 | pages={1--10}, 970 | year={2001}, 971 | organization={Springer} 972 | } 973 | 974 | @article{rogozhin1996small, 975 | title={Small universal Turing machines}, 976 | author={Rogozhin, Yurii}, 977 | journal={Theoretical Computer Science}, 978 | volume={168}, 979 | number={2}, 980 | pages={215--240}, 981 | year={1996}, 982 | 
publisher={Elsevier} 983 | } 984 | 985 | @article{woods2009complexity, 986 | title={The complexity of small universal Turing machines: A survey}, 987 | author={Woods, Damien and Neary, Turlough}, 988 | journal={Theoretical Computer Science}, 989 | volume={410}, 990 | number={4-5}, 991 | pages={443--450}, 992 | year={2009}, 993 | publisher={Elsevier} 994 | } 995 | 996 | @book{mermin2007quantum, 997 | title={Quantum computer science: an introduction}, 998 | author={Mermin, N David}, 999 | year={2007}, 1000 | publisher={Cambridge University Press} 1001 | } 1002 | 1003 | 1004 | @article{williams2009finding, 1005 | title={Finding paths of length k in $O^*(2^k)$ time}, 1006 | author={Williams, Ryan}, 1007 | journal={Information Processing Letters}, 1008 | volume={109}, 1009 | number={6}, 1010 | pages={315--318}, 1011 | year={2009}, 1012 | publisher={Elsevier} 1013 | } 1014 | 1015 | @article{bjorklund2014determinant, 1016 | title={Determinant sums for undirected hamiltonicity}, 1017 | author={Bjorklund, Andreas}, 1018 | journal={SIAM Journal on Computing}, 1019 | volume={43}, 1020 | number={1}, 1021 | pages={280--299}, 1022 | year={2014}, 1023 | publisher={SIAM} 1024 | } 1025 | 1026 | 1027 | @article{buchfuhrer2011complexity, 1028 | title={The complexity of Boolean formula minimization}, 1029 | author={Buchfuhrer, David and Umans, Christopher}, 1030 | journal={Journal of Computer and System Sciences}, 1031 | volume={77}, 1032 | number={1}, 1033 | pages={142--153}, 1034 | year={2011}, 1035 | publisher={Elsevier} 1036 | } 1037 | 1038 | @incollection{aaronson2016p, 1039 | title={P =? NP}, 1040 | author={Aaronson, Scott}, 1041 | booktitle={Open problems in mathematics}, 1042 | pages={1--122}, 1043 | year={2016}, 1044 | publisher={Springer}, 1045 | note = {Available on \url{https://www.scottaaronson.com/papers/pnp.pdf}} 1046 | } 1047 | 1048 | 1049 | 1050 | @article{aaronson2005physicalreality, 1051 | title={NP-complete problems and physical reality}, 1052 | author={Aaronson, Scott}, 1053 | journal={ACM Sigact News}, 1054 | volume={36}, 1055 | number={1}, 1056 | pages={30--52}, 1057 | year={2005}, 1058 | publisher={ACM}, 1059 | note = {Available on \url{https://arxiv.org/abs/quant-ph/0502072}} 1060 | } 1061 | 1062 | 1063 | 1064 | @article{johnson2012brief, 1065 | title={A brief history of NP-completeness, 1954--2012}, 1066 | author={Johnson, David S}, 1067 | journal={Documenta Mathematica}, 1068 | pages={359--376}, 1069 | year={2012} 1070 | } 1071 | 1072 | 1073 | 1074 | 1075 | @book{wigderson2017mathematics, 1076 | title={Mathematics and Computation}, 1077 | author={Wigderson, Avi}, 1078 | year={2019}, 1079 | publisher = {Princeton University Press}, 1080 | note = {Draft available on \url{https://www.math.ias.edu/avi/book}} 1081 | } 1082 | 1083 | 1084 | @article{vadhan2012pseudorandomness, 1085 | title={Pseudorandomness}, 1086 | author={Vadhan, Salil P and others}, 1087 | journal={Foundations and Trends{\textregistered} in Theoretical Computer Science}, 1088 | volume={7}, 1089 | number={1--3}, 1090 | pages={1--336}, 1091 | year={2012}, 1092 | publisher={Now Publishers, Inc.} 1093 | } 1094 | 1095 | 1096 | @book{motwani1995randomized, 1097 | title={Randomized algorithms}, 1098 | author={Motwani, Rajeev and Raghavan, Prabhakar}, 1099 | year={1995}, 1100 | publisher={Cambridge university press} 1101 | } 1102 | 1103 | 1104 | 1105 | @book{mitzenmacher2017probability, 1106 | title={Probability and computing: randomization and probabilistic techniques in algorithms and data analysis}, 1107 | 
author={Mitzenmacher, Michael and Upfal, Eli}, 1108 | year={2017}, 1109 | publisher={Cambridge University Press} 1110 | } 1111 | 1123 | @article{aaronson20beaver, 1124 | title = {The Busy Beaver Frontier}, 1125 | author = {Scott Aaronson}, 1126 | year = {2020}, 1127 | note = {Available on \url{https://www.scottaaronson.com/papers/bb.pdf}}, 1128 | journal = {{SIGACT} News} 1129 | } 1130 | 1131 | 1132 | @inproceedings{Shor94, 1133 | author = {Peter W. Shor}, 1134 | title = {Algorithms for Quantum Computation: Discrete Logarithms and Factoring}, 1135 | booktitle = {35th Annual Symposium on Foundations of Computer Science, Santa Fe, 1136 | New Mexico, USA, 20-22 November 1994}, 1137 | pages = {124--134}, 1138 | publisher = {{IEEE} Computer Society}, 1139 | year = {1994}, 1140 | url = {https://doi.org/10.1109/SFCS.1994.365700}, 1141 | doi = {10.1109/SFCS.1994.365700}, 1142 | timestamp = {Wed, 16 Oct 2019 14:14:54 +0200}, 1143 | biburl = {https://dblp.org/rec/conf/focs/Shor94.bib}, 1144 | bibsource = {dblp computer science bibliography, https://dblp.org} 1145 | } -------------------------------------------------------------------------------- /lec_00_0_preface.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: "Preface" 3 | filename: "lec_00_0_preface" 4 | chapternum: "p" 5 | --- 6 | 7 | 8 | # Preface {#prefacechap } 9 | 10 | >_"We make ourselves no promises, but we cherish the hope that the unobstructed pursuit of useless knowledge will prove to have consequences in the future as in the past"_ ... 11 | >_"An institution which sets free successive generations of human souls is amply justified whether or not this graduate or that makes a so-called useful contribution to human knowledge. A poem, a symphony, a painting, a mathematical truth, a new scientific fact, all bear in themselves all the justification that universities, colleges, and institutes of research need or require"_, Abraham Flexner, [The Usefulness of Useless Knowledge](https://www.ias.edu/sites/default/files/library/UsefulnessHarpers.pdf), 1939. 12 | 13 | >_"I suggest that you take the hardest courses that you can, because you learn the most when you challenge yourself... CS 121 I found pretty hard."_, [Mark Zuckerberg](https://youtu.be/xFFs9UgOAlE?t=3646), 2005. 14 | 15 | 16 | This is a textbook for an undergraduate introductory course on theoretical computer science. 17 | The educational goals of this book are to convey the following: 18 | 19 | * That computation arises in a variety of natural and human-made systems, and not only in modern silicon-based computers. 20 | 21 | * Similarly, beyond being an extremely important _tool_, computation also serves as a useful _lens_ to describe natural, physical, mathematical and even social concepts. 22 | 23 | * The notion of _universality_ of many different computational models, and the related notion of the duality between _code_ and _data_. 24 | 25 | * The idea that one can precisely define a mathematical model of computation, and then use that to prove (or sometimes only conjecture) lower bounds and impossibility results. 
26 | 27 | * Some of the surprising results and discoveries in modern theoretical computer science, including the prevalence of NP-completeness, the power of interaction, the power of randomness on one hand and the possibility of derandomization on the other, the ability to use hardness "for good" in cryptography, and the fascinating possibility of quantum computing. 28 | 29 | I hope that following this course, students would be able to recognize computation, with both its power and pitfalls, as it arises in various settings, including seemingly "static" content or "restricted" formalisms such as macros and scripts. 30 | They should be able to follow through the logic of _proofs_ about computation, including the central concept of a _reduction_, as well as understanding "self-referential" proofs (such as diagonalization-based proofs that involve programs given their own code as input). 31 | Students should understand that some problems are _inherently intractable_, and be able to recognize the potential for intractability when they are faced with a new problem. 32 | While this book only touches on cryptography, students should understand the basic idea of how we can use computational hardness for cryptographic purposes. 33 | However, more than any specific skill, this book aims to introduce students to a new way of thinking of computation as an object in its own right and to illustrate how this new way of thinking leads to far-reaching insights and applications. 34 | 35 | My aim in writing this text is to try to convey these concepts in the simplest possible way and try to make sure that the formal notation and model help elucidate, rather than obscure, the main ideas. 36 | I also tried to take advantage of modern students' familiarity (or at least interest!) in programming, and hence use (highly simplified) programming languages to describe our models of computation. 37 | That said, this book does not assume fluency with any particular programming language, but rather only some familiarity with the general _notion_ of programming. 38 | We will use programming metaphors and idioms, occasionally mentioning specific programming languages such as _Python_, _C_, or _Lisp_, but students should be able to follow these descriptions even if they are not familiar with these languages. 39 | 40 | Proofs in this book, including the existence of a universal Turing Machine, the fact that every finite function can be computed by some circuit, the Cook-Levin theorem, and many others, are often constructive and algorithmic, in the sense that they ultimately involve transforming one program to another. 41 | While it is possible to follow these proofs without seeing the code, I do think that having access to the code, and the ability to play around with it and see how it acts on various programs, can make these theorems more concrete for the students. 42 | To that end, an accompanying website (which is still a work in progress) allows executing programs in the various computational models we define, as well as seeing constructive proofs of some of the theorems. 43 | 44 | ## To the student 45 | 46 | This book can be challenging, mainly because it brings together a variety of ideas and techniques in the study of computation. 
47 | There are quite a few technical hurdles to master, whether it is following the diagonalization argument for proving the Halting Problem is undecidable, combinatorial gadgets in NP-completeness reductions, analyzing probabilistic algorithms, or arguing about the adversary to prove the security of cryptographic primitives. 48 | 49 | The best way to engage with this material is to read these notes __actively__, so make sure you have a pen ready. 50 | While reading, I encourage you to stop and think about the following: 51 | 52 | * When I state a theorem, stop and take a shot at proving it on your own _before_ reading the proof. You will be amazed by how much better you can understand a proof even after only 5 minutes of attempting it on your own. 53 | 54 | * When reading a definition, make sure that you understand what the definition means, and what the natural examples are of objects that satisfy it and objects that do not. Try to think of the motivation behind the definition, and whether there are other natural ways to formalize the same concept. 55 | 56 | * Actively notice which questions arise in your mind as you read the text, and whether or not they are answered in the text. 57 | 58 | As a general rule, it is more important that you understand the __definitions__ than the __theorems__, and it is more important that you understand a __theorem statement__ than its __proof__. 59 | After all, before you can prove a theorem, you need to understand what it states, and to understand what a theorem is about, you need to know the definitions of the objects involved. 60 | Whenever a proof of a theorem is at least somewhat complicated, I provide a "proof idea." 61 | Feel free to skip the actual proof in a first reading, focusing only on the proof idea. 62 | 63 | 64 | 65 | This book contains some code snippets, but this is by no means a programming text. You don't need to know how to program to follow this material. The reason we use code is that it is a _precise_ way to describe computation. Particular implementation details are not as important to us, and so we will emphasize code readability at the expense of considerations such as error handling, encapsulation, etc. that can be extremely important for real-world programming. 66 | 67 | ### Is the effort worth it? 68 | 69 | This is not an easy book, and you might reasonably wonder why should you spend the effort in learning this material. 70 | A traditional justification for a "Theory of Computation" course is that you might encounter these concepts later on in your career. 71 | Perhaps you will come across a hard problem and realize it is NP complete, or find a need to use what you learned about regular expressions. 72 | This might very well be true, but the main benefit of this book is not in teaching you any practical tool or technique, but instead in giving you a _different way of thinking_: an ability to recognize computational phenomena even when they occur in non-obvious settings, a way to model computational tasks and questions, and to reason about them. 73 | 74 | 75 | Regardless of any use you will derive from this book, I believe learning this material is important because it contains concepts that are both beautiful and fundamental. 76 | The role that _energy_ and _matter_ played in the 20th century is played in the 21st by _computation_ and _information_, not just as tools for our technology and economy, but also as the basic building blocks we use to understand the world. 
77 | This book will give you a taste of some of the theory behind those, and hopefully spark your curiosity to study more. 78 | 79 | 80 | ## To potential instructors 81 | 82 | I wrote this book for my Harvard course, but I hope that other lecturers will find it useful as well. 83 | To some extent, it is similar in content to "Theory of Computation" or "Great Ideas" courses such as those taught at [CMU](http://www.cs.cmu.edu/~./15251/) or [MIT](http://stellar.mit.edu/S/course/6/sp16/6.045/materials.html). 84 | 85 | 86 | The most significant difference between our approach and more traditional ones (such as Hopcroft and Ullman's [@HopcroftUllman69, @HopcroftUllman79] and Sipser's [@SipserBook]) is that we do not start with _finite automata_ as our initial computational model. 87 | Instead, our initial computational model is _Boolean Circuits_.^[An earlier book that starts with circuits as the initial model is John Savage's [@Savage1998models].] 88 | We believe that Boolean Circuits are more fundamental to the theory of computing (and even its practice!) than automata. 89 | In particular, Boolean Circuits are a prerequisite for many concepts that one would want to teach in a modern course on theoretical computer science, including cryptography, quantum computing, derandomization, attempts at proving $\mathbf{P} \neq \mathbf{NP}$, and more. 90 | Even in cases where Boolean Circuits are not strictly required, they can often offer significant simplifications (as in the case of the proof of the Cook-Levin Theorem). 91 | 92 | Furthermore, I believe there are pedagogical reasons to start with Boolean circuits as opposed to finite automata. 93 | Boolean circuits are a more natural model of computation, and one that corresponds more closely to computing in silicon, making the connection to practice more immediate to the students. 94 | Finite functions are arguably easier to grasp than infinite ones, as we can fully write down their truth table. 95 | The theorem that _every_ finite function can be computed by some Boolean circuit is both simple enough and important enough to serve as an excellent starting point for this course. 96 | Moreover, many of the main conceptual points of the theory of computation, including the notions of the duality between _code_ and _data_, and the idea of _universality_, can already be seen in this context. 97 | 98 | After Boolean circuits, we move on to Turing machines and prove results such as the existence of a universal Turing machine, the uncomputability of the halting problem, and Rice's Theorem. 99 | Automata are discussed after we see Turing machines and undecidability, as an example for a _restricted computational model_ where problems such as determining halting can be effectively solved. 100 | 101 | 102 | While this is not our motivation, the order we present circuits, Turing machines, and automata roughly corresponds to the chronological order of their discovery. 103 | Boolean algebra goes back to Boole's and DeMorgan's works in the 1840s [@Boole1847mathematical, @DeMorgan1847] (though the definition of Boolean circuits and the connection to physical computation was given 90 years later by Shannon [@Shannon1938]). Alan Turing defined what we now call "Turing Machines" in the 1930s [@Turing37], while finite automata were introduced in the 1943 work of McCulloch and Pitts [@McCullochPitts43] but only really understood in the seminal 1959 work of Rabin and Scott [@RabinScott59]. 
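Returning to the pedagogical point about finite functions: the theorem that _every_ finite function can be computed by some Boolean circuit is so constructive that a toy version fits in a few lines of code. The following is a minimal illustrative Python sketch (a hypothetical example for this preface, not the book's actual code; the names `truth_table`, `dnf_circuit`, and `evaluate` are ad hoc). It reads off from a function's truth table an OR-of-ANDs (DNF) formula that computes it:

```python
import itertools

def truth_table(f, n):
    """Return the truth table of f:{0,1}^n -> {0,1} as a dictionary."""
    return {x: f(x) for x in itertools.product([0, 1], repeat=n)}

def dnf_circuit(f, n):
    """Return an OR-of-ANDs formula computing f: one clause per input on
    which f outputs 1, where a clause is a tuple of (variable, value)
    pairs that must all hold."""
    return [tuple(enumerate(x)) for x, val in truth_table(f, n).items() if val == 1]

def evaluate(circuit, x):
    """Evaluate the DNF formula on input x (a tuple of n bits)."""
    return int(any(all(x[i] == v for i, v in clause) for clause in circuit))

# Example: the majority function on 3 bits.
maj = lambda x: int(x[0] + x[1] + x[2] >= 2)
circ = dnf_circuit(maj, 3)
assert all(evaluate(circ, x) == maj(x) for x in itertools.product([0, 1], repeat=3))
```

Of course, the formula produced this way can have size exponential in the number of inputs, which foreshadows the quantitative questions about circuit size that the book takes up later.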
104 |
105 | More importantly, while models such as finite-state machines, regular expressions, and context-free grammars are incredibly important for practice, the main applications for these models
106 | (whether it is for parsing, for analyzing properties such as _liveness_ and _safety_, or even for [software-defined routing tables](https://www.cs.cornell.edu/~kozen/Papers/NetKAT-APLAS.pdf)) rely crucially on the fact that these are _tractable_ models for which we can effectively answer _semantic questions_.
107 | This practical motivation can be better appreciated _after_ students see the undecidability of semantic properties of general computing models.
108 |
109 |
110 | The fact that we start with circuits makes proving the Cook-Levin Theorem much easier. In fact, our proof of this theorem can be (and is) done using a handful of lines of Python. Combining this proof with the standard reductions (which are also implemented in Python) allows students to appreciate visually how a question about computation can be mapped into a question about (for example) the existence of an independent set in a graph.
111 |
112 |
113 | Some other differences between this book and previous texts are the following:
114 |
115 |
116 | 1. For measuring _time complexity_, we use the standard RAM machine model used (implicitly) in algorithms courses, rather than Turing machines. While these two models are of course polynomially equivalent, and hence make no difference for the definitions of the classes $\mathbf{P}$, $\mathbf{NP}$, and $\mathbf{EXP}$, our choice makes the distinction between notions such as $O(n)$ or $O(n^2)$ time more meaningful. This choice also ensures that these finer-grained time complexity classes correspond to the informal definitions of linear and quadratic time that students encounter in their algorithms lectures (or their whiteboard coding interviews...).
117 |
118 |
119 | 2. We use the terminology of _functions_ rather than _languages_. That is, rather than saying that a Turing Machine $M$ _decides a language_ $L \subseteq \{0,1\}^*$, we say that it _computes a function_ $F:\{0,1\}^* \rightarrow \{0,1\}$. The terminology of "languages" arises from Chomsky's work [@Chomsky56], but it is often more confusing than illuminating. The language terminology also makes it cumbersome to discuss concepts such as algorithms that compute functions with more than one bit of output (including basic tasks such as addition, multiplication, etc...). The fact that we use functions rather than languages means we have to be extra vigilant about students distinguishing between the _specification_ of a computational task (e.g., the _function_) and its _implementation_ (e.g., the _program_). On the other hand, this point is so important that it is worth repeatedly emphasizing and drilling into the students, regardless of the notation used. The book does mention the language terminology and occasionally reminds students of it, to make it easier for them to consult outside resources.
120 |
121 |
122 |
123 |
124 | Reducing the time dedicated to finite automata and context-free languages allows instructors to spend more time on topics that a modern course in the theory of computing needs to touch upon. These include randomness and computation, the interactions between _proofs_ and _programs_ (including Gödel's incompleteness theorem, interactive proof systems, and even a bit on the $\lambda$-calculus and the Curry-Howard correspondence), cryptography, and quantum computing.
125 | 126 | This book contains sufficient detail to enable its use for self-study. 127 | Toward that end, every chapter starts with a list of learning objectives, ends with a recap, and is peppered with "pause boxes" which encourage students to stop and work out an argument or make sure they understand a definition before continuing further. 128 | 129 | [roadmapsec](){.ref} contains a "roadmap" for this book, with descriptions of the different chapters, as well as the dependency structure between them. 130 | This can help in planning a course based on this book. 131 | 132 | 133 | 134 | 135 | 136 | ## Acknowledgements 137 | 138 | This text is continually evolving, and I am getting input from many people, for which I am deeply grateful. 139 | Salil Vadhan co-taught with me the first iteration of this course and gave me a tremendous amount of useful feedback and insights during this process. 140 | Michele Amoretti and Marika Swanberg carefully read several chapters of this text and gave extremely helpful detailed comments. Dave Evans and Richard Xu contributed many pull requests fixing errors and improving phrasing. 141 | Thanks to Anil Ada, Venkat Guruswami, and Ryan O'Donnell for helpful tips from their experience in teaching [CMU 15-251](http://www.cs.cmu.edu/~./15251/). 142 | Thanks to Adam Hesterberg and Madhu Sudan for their comments on their experience teaching CS 121 with this book. 143 | Kunal Marwaha gave many comments, as well as provided great help with the technical aspects of producing the book. 144 | 145 | Thanks to everyone that sent me comments, typo reports, or posted issues or pull requests on the GitHub repository [https://github.com/boazbk/tcs](https://github.com/boazbk/tcs). 146 | In particular I would like to acknowledge helpful feedback from Scott Aaronson, Michele Amoretti, Aadi Bajpai, Marguerite Basta, Anindya Basu, Sam Benkelman, Jarosław Błasiok, Emily Chan, Christy Cheng, Michelle Chiang, Daniel Chiu, Chi-Ning Chou, Michael Colavita, Brenna Courtney, Rodrigo Daboin Sanchez, Robert Darley Waddilove, Anlan Du, Juan Esteller, David Evans, Michael Fine, Simon Fischer, Leor Fishman, Zaymon Foulds-Cook, William Fu, Kent Furuie, Piotr Galuszka, Carolyn Ge, Jason Giroux, Mark Goldstein, Alexander Golovnev, Sayan Goswami, Maxwell Grozovsky, Michael Haak, Rebecca Hao, Lucia Hoerr, Joosep Hook, Austin Houck, Thomas Huet, Emily Jia, Serdar Kaçka, Chan Kang, Nina Katz-Christy, Vidak Kazic, Joe Kerrigan, Eddie Kohler, Estefania Lahera, Allison Lee, Benjamin Lee, Ondřej Lengál, Raymond Lin, Emma Ling, Alex Lombardi, Lisa Lu, Kai Ma, Aditya Mahadevan, Kunal Marwaha, Christian May, Josh Mehr, Jacob Meyerson, Leon Mlodzian, George Moe, Todd Morrill, Glenn Moss, Haley Mulligan, Hamish Nicholson, Owen Niles, Sandip Nirmel, Sebastian Oberhoff, Thomas Orton, Joshua Pan, Pablo Parrilo, Juan Perdomo, Banks Pickett, Aaron Sachs, Abdelrhman Saleh, Brian Sapozhnikov, Anthony Scemama, Peter Schäfer, Josh Seides, Alaisha Sharma, Nathan Sheely, Haneul Shin, Noah Singer, Matthew Smedberg, Miguel Solano, Hikari Sorensen, David Steurer, Alec Sun, Amol Surati, Everett Sussman, Marika Swanberg, Garrett Tanzer, Eric Thomas, Sarah Turnill, Salil Vadhan, Patrick Watts, Jonah Weissman, Ryan Williams, Licheng Xu, Richard Xu, Wanqian Yang, Elizabeth Yeoh-Wang, Josh Zelinsky, Fred Zhang, Grace Zhang, Alex Zhao, and Jessica Zhu. 147 | 148 | 149 | 150 | I am using many open-source software packages in the production of these notes for which I am grateful. 
151 | In particular, I am thankful to Donald Knuth and Leslie Lamport for [LaTeX](https://www.latex-project.org/) and 152 | to John MacFarlane for [Pandoc](http://pandoc.org/). 153 | David Steurer wrote the original scripts to produce this text. 154 | The current version uses Sergio Correia's [panflute](http://scorreia.com/software/panflute/). 155 | The templates for the LaTeX and HTML versions are derived from [Tufte LaTeX](https://tufte-latex.github.io/tufte-latex/), [Gitbook](https://www.gitbook.com/) and [Bookdown](https://bookdown.org/). 156 | Thanks to Amy Hendrickson for some LaTeX consulting. 157 | Juan Esteller and Gabe Montague initially implemented the NAND* programming languages in OCaml and Javascript. 158 | I used the [Jupyter project](http://jupyter.org/) to write the supplemental code snippets. 159 | 160 | Finally, I would like to thank my family: my wife Ravit, and my children Alma and Goren. 161 | Working on this book (and the corresponding course) took so much of my time that Alma wrote an essay for her fifth-grade class saying that "universities should not pressure professors to work too much." 162 | I'm afraid all I have to show for this effort is 600 pages of ultra-boring mathematical text. 163 | -------------------------------------------------------------------------------- /lec_01_introduction.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: "Introduction" 3 | filename: "lec_01_introduction" 4 | chapternum: "0" 5 | --- 6 | 7 | 8 | # Introduction { #chapintro } 9 | 10 | 11 | 12 | > ### { .objectives } 13 | * Introduce and motivate the study of computation for its own sake, irrespective of particular implementations. 14 | * The notion of an _algorithm_ and some of its history. 15 | * Algorithms as not just _tools_, but also _ways of thinking and understanding_. 16 | * Taste of Big-$O$ analysis and the surprising creativity in the design of efficient algorithms. 17 | 18 | >_"Computer Science is no more about computers than astronomy is about telescopes"_, attributed to Edsger Dijkstra.^[This quote is typically read as disparaging the importance of actual physical computers in Computer Science, but note that telescopes are absolutely essential to astronomy as they provide us with the means to connect theoretical predictions with actual experimental observations.] 19 | 20 | 21 | >_"Hackers need to understand the theory of computation about as much as painters need to understand paint chemistry."_, Paul Graham 2003.^[To be fair, in the following sentence Graham says "you need to know how to calculate time and space complexity and about Turing completeness". This book includes these topics, as well as others such as NP-hardness, randomization, cryptography, quantum computing, and more.] 22 | 23 | 24 | 25 | >_"The subject of my talk is perhaps most directly indicated by simply 26 | asking two questions: first, is it harder to multiply than to 27 | add? and second, why?...I (would like to) show that there is no 28 | algorithm for multiplication computationally as simple as that for 29 | addition, and this proves something of a stumbling block."_, Alan Cobham, 1964 30 | 31 | 32 | One of the ancient Babylonians' greatest innovations is the _place-value number system_. 33 | The place-value system represents numbers as sequences of digits where the _position_ of each digit determines its value. 34 | 35 | This is opposed to a system like Roman numerals, where every digit has a fixed value regardless of position. 
36 | For example, the average distance to the moon is approximately 259,956 Roman miles. In standard Roman numerals, that would be 37 | 38 | ``` 39 | MMMMMMMMMMMMMMMMMMMMMMMMMMMMMM 40 | MMMMMMMMMMMMMMMMMMMMMMMMMMMMMM 41 | MMMMMMMMMMMMMMMMMMMMMMMMMMMMMM 42 | MMMMMMMMMMMMMMMMMMMMMMMMMMMMMM 43 | MMMMMMMMMMMMMMMMMMMMMMMMMMMMMM 44 | MMMMMMMMMMMMMMMMMMMMMMMMMMMMMM 45 | MMMMMMMMMMMMMMMMMMMMMMMMMMMMMM 46 | MMMMMMMMMMMMMMMMMMMMMMMMMMMMMM 47 | MMMMMMMMMMMMMMMMMMMDCCCCLVI 48 | ``` 49 | 50 | Writing the distance to the _sun_ in Roman numerals would require about 100,000 symbols; it would take a 50-page book to contain this single number! 51 | 52 | For someone who thinks of numbers in an additive system like Roman numerals, quantities like the distance to the moon or sun are not merely large---they are _unspeakable_: they cannot be expressed or even grasped. 53 | It's no wonder that Eratosthenes, the first to calculate the earth's diameter (up to about ten percent error), and Hipparchus, the first to calculate the distance to the moon, used not a Roman-numeral type system but the Babylonian sexagesimal (base 60) place-value system. 54 | 55 | ## Integer multiplication: an example of an algorithm 56 | 57 | In the language of Computer Science, the place-value system for representing numbers is known as a _data structure_: a set of instructions, or "recipe", for representing objects as symbols. 58 | An _algorithm_ is a set of instructions, or "recipe", for performing operations on such representations. 59 | Data structures and algorithms have enabled amazing applications that have transformed human society, but their importance goes beyond their practical utility. 60 | Structures from computer science, such as bits, strings, graphs, and even the notion of a program itself, as well as concepts such as universality and replication, have not just found (many) practical uses but contributed a new language and a new way to view the world. 61 | 62 | 63 | In addition to coming up with the place-value system, the Babylonians also invented the "standard algorithms" that we were all taught in elementary school for adding and multiplying numbers. 64 | These algorithms have been essential throughout the ages for people using abaci, papyrus, or pencil and paper, but in our computer age, do they still serve any purpose beyond torturing third-graders? 65 | To see why these algorithms are still very much relevant, let us compare the Babylonian digit-by-digit multiplication algorithm ("grade-school multiplication") with the naive algorithm that multiplies numbers through repeated addition. 66 | We start by formally describing both algorithms, see [naivemultalg](){.ref} and [gradeschoolalg](){.ref}. 67 | 68 | 69 | ``` { .algorithm title="Multiplication via repeated addition" #naivemultalg } 70 | INPUT: Non-negative integers $x,y$ 71 | OUTPUT: Product $x\cdot y$ 72 | 73 | Let $result \leftarrow 0$. 74 | For{$i=1,\ldots,y$} 75 | $result \leftarrow result + x$ 76 | endfor 77 | return $result$ 78 | ``` 79 | 80 | ``` {.algorithm title="Grade-school multiplication" #gradeschoolalg} 81 | INPUT: Non-negative integers $x,y$ 82 | OUTPUT: Product $x\cdot y$ 83 | 84 | Write $x=x_{n-1}x_{n-2}\cdots x_0$ and $y = y_{m-1}y_{m-2}\cdots y_0$ in decimal place-value notation. # $x_0$ is the ones digit of $x$, $x_1$ is the tens digit, etc. 
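# Thus $x = \sum_{i=0}^{n-1} 10^i x_i$ and $y = \sum_{j=0}^{m-1} 10^j y_j$, so $x \cdot y = \sum_{i,j} 10^{i+j} x_i y_j$; the loops below compute this sum term by term.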
85 | Let $result \leftarrow 0$
86 | For{$i=0,\ldots,n-1$}
87 | For{$j=0,\ldots,m-1$}
88 | $result \leftarrow result + 10^{i+j}\cdot x_i \cdot y_j$
89 | endfor
90 | endfor
91 | return $result$
92 | ```
93 |
94 |
95 | Both [naivemultalg](){.ref} and [gradeschoolalg](){.ref} assume that we already know how to add numbers, and [gradeschoolalg](){.ref} also assumes that we can multiply a number by a power of $10$ (which is, after all, a simple shift).
96 | Suppose that $x$ and $y$ are two integers of $n=20$ decimal digits each.
97 | (This roughly corresponds to 64 binary digits, which is a common size in many programming languages.)
98 | Computing $x \cdot y$ using [naivemultalg](){.ref} entails adding $x$ to itself $y$ times, which (since $y$ is a $20$-digit number) requires at least $10^{19}$ additions.
99 | In contrast, the grade-school algorithm (i.e., [gradeschoolalg](){.ref}) involves $n^2$ shifts and single-digit products, and so at most $2n^2 = 800$ single-digit operations.
100 | To understand the difference, consider that a grade-schooler can perform a single-digit operation in about 2 seconds, and so would require about $1,600$ seconds (about half an hour) to compute $x\cdot y$ using [gradeschoolalg](){.ref}.
101 | In contrast, even though a modern PC is more than a billion times faster than a human, if we used [naivemultalg](){.ref} to compute $x\cdot y$ on such a PC, it would take us $10^{20}/10^9 = 10^{11}$ seconds (which is more than three millennia!) to compute the same result.
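To make this gap concrete without actually performing $10^{19}$ additions, here is a short Python sketch that merely _counts_ the operations each algorithm would perform. (This is our own illustration, not part of the book's supplemental code; the cost model, which charges $y$ additions to the first algorithm and one shift plus one single-digit product per digit pair to the second, follows the estimates above.)

```python
def repeated_addition_ops(x, y):
    # Multiplying by repeatedly adding x to itself costs y additions.
    return y

def grade_school_ops(x, y):
    # One shift and one single-digit product for each of the n*m digit pairs.
    n, m = len(str(x)), len(str(y))
    return 2 * n * m

x = 10**19 + 3  # a 20-digit number
y = 10**19 + 7  # another 20-digit number
print(repeated_addition_ops(x, y))  # about 10**19 additions
print(grade_school_ops(x, y))       # 800 single-digit operations
```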
102 |
103 |
104 | Computers have not made algorithms obsolete.
105 | On the contrary, the vast increase in our ability to measure, store, and communicate data has led to much higher demand for developing better and more sophisticated algorithms that empower us to make better decisions based on these data.
106 | We also see that, to no small extent, the notion of _algorithm_ is independent of the actual computing device that executes it.
107 | The digit-by-digit multiplication algorithm is vastly better than iterated addition, regardless of whether the technology we use to implement it is a silicon-based chip, or a third-grader with pen and paper.
108 |
109 |
110 | Theoretical computer science is concerned with the _inherent_ properties of algorithms and computation; namely, those properties that are _independent_ of current technology.
111 | We ask some questions that were already pondered by the Babylonians, such as "what is the best way to multiply two numbers?", but also questions that rely on cutting-edge science such as "could we use the effects of quantum entanglement to factor numbers faster?".
112 |
113 |
114 |
115 | ::: {.remark title="Specification, implementation, and analysis of algorithms." #implspecanarem}
116 | A full description of an algorithm has three components:
117 |
118 | * __Specification__: __What__ is the task that the algorithm performs (e.g., multiplication in the case of [naivemultalg](){.ref} and [gradeschoolalg](){.ref}).
119 |
120 | * __Implementation__: __How__ is the task accomplished: what is the sequence of instructions to be performed. Even though [naivemultalg](){.ref} and [gradeschoolalg](){.ref} perform the same computational task (i.e., they have the same _specification_), they do it in different ways (i.e., they have different _implementations_).
121 |
122 | * __Analysis:__ __Why__ does this sequence of instructions achieve the desired task. A full description of [naivemultalg](){.ref} and [gradeschoolalg](){.ref} will include a _proof_ for each one of these algorithms that on input $x,y$, the algorithm does indeed output $x\cdot y$.
123 |
124 | Often as part of the analysis we show that the algorithm is not only __correct__ but also __efficient__. That is, we want to show not only that the algorithm computes the desired task, but also that it does so within a prescribed number of operations. For example [gradeschoolalg](){.ref} computes the multiplication function on inputs of $n$ digits using $O(n^2)$ operations, while [karatsubaalg](){.ref} (described below) computes the same function using $O(n^{1.6})$ operations. (We define the $O$ notations used here in [secbigohnotation](){.ref}.)
125 | :::
126 |
127 |
128 |
129 |
130 |
131 | ## Extended Example: A faster way to multiply (optional) {#karatsubasec }
132 |
133 | Once you think of the standard digit-by-digit multiplication algorithm, it seems like the ``obviously best'' way to multiply numbers.
134 | In 1960, the famous mathematician Andrey Kolmogorov organized a seminar at Moscow State University in which he conjectured that every algorithm for multiplying two $n$ digit numbers would require a number of basic operations that is proportional to $n^2$ ($\Omega(n^2)$ operations, using $O$-notation as defined in [chapmath](){.ref}).
135 | In other words, Kolmogorov conjectured that in any multiplication algorithm, doubling the number of digits would _quadruple_ the number of basic operations required.
136 | A young student named Anatoly Karatsuba was in the audience, and within a week he disproved Kolmogorov's conjecture by discovering an algorithm that requires only about $Cn^{1.6}$ operations for some constant $C$.
137 | Such a number becomes much smaller than $n^2$ as $n$ grows and so for large $n$ Karatsuba's algorithm
138 | is superior to the grade-school one. (For example, [Python's implementation](https://svn.python.org/projects/python/trunk/Objects/longobject.c) switches from the grade-school algorithm to Karatsuba's algorithm for numbers that are 1000 bits or larger.)
139 | While the difference between an $O(n^{1.6})$ and an $O(n^2)$ algorithm can sometimes be crucial in practice (see [algsbeyondarithmetic](){.ref} below), in this book we will mostly ignore such distinctions.
140 | However, we describe Karatsuba's algorithm below since it is a good example of how algorithms can often be surprising, as well as a demonstration of the _analysis of algorithms_, which is central to this book and to theoretical computer science at large.
141 |
142 | Karatsuba's algorithm is based on a faster way to multiply _two-digit_ numbers.
143 | Suppose that $x,y \in [100]=\{0,\ldots, 99 \}$ are a pair of two-digit numbers.
144 | Let's write $\overline{x}$ for the "tens" digit of $x$, and $\underline{x}$ for the "ones" digit, so that $x = 10\overline{x} + \underline{x}$, and write similarly $y = 10\overline{y} + \underline{y}$ for $\overline{x},\underline{x},\overline{y},\underline{y} \in [10]$.
145 | The grade-school algorithm for multiplying $x$ and $y$ is illustrated in [gradeschoolmult](){.ref}.
146 |
147 | ![The grade-school multiplication algorithm illustrated for multiplying $x=10\overline{x}+\underline{x}$ and $y=10\overline{y}+\underline{y}$.
It uses the formula $(10\overline{x}+\underline{x}) \times (10 \overline{y}+\underline{y}) = 100\overline{x}\overline{y}+10(\overline{x}\underline{y} + \underline{x}\overline{y}) + \underline{x}\underline{y}$.](../figure/gradeschoolmult.png){#gradeschoolmult .margin }
148 |
149 | The grade-school algorithm can be thought of as transforming the task of multiplying a pair of two-digit numbers into _four_ single-digit multiplications via the formula
150 |
151 | $$
152 | (10\overline{x}+\underline{x}) \times (10 \overline{y}+\underline{y}) = 100\overline{x}\overline{y}+10(\overline{x}\underline{y} + \underline{x}\overline{y}) + \underline{x}\underline{y} \label{eq:gradeschooltwodigit}
153 | $$
154 |
155 | Generally, in the grade-school algorithm _doubling_ the number of digits in the input results in _quadrupling_ the number of operations, leading to an $O(n^2)$ time algorithm.
156 | In contrast, Karatsuba's algorithm is based on the observation that we can express [eq:gradeschooltwodigit](){.ref} also as
157 |
158 | $$
159 | (10\overline{x}+\underline{x}) \times (10 \overline{y}+\underline{y}) = (100-10)\overline{x}\overline{y}+10\left[(\overline{x}+\underline{x})(\overline{y}+\underline{y})\right] -(10-1)\underline{x}\underline{y} \label{eq:karatsubatwodigit}
160 | $$
161 |
162 | which reduces multiplying the two-digit numbers $x$ and $y$ to computing the following three simpler products: $\overline{x}\overline{y}$, $\underline{x}\underline{y}$ and $(\overline{x}+\underline{x})(\overline{y}+\underline{y})$.
163 | By repeating the same strategy recursively, we can reduce the task of multiplying two $n$-digit numbers to the task of multiplying _three_ pairs of $\floor{n/2}+1$ digit numbers.^[If $x$ is a number then $\floor{x}$ is the integer obtained by rounding it down, see [notationsec](){.ref}.]
164 | Since every time we _double_ the number of digits we _triple_ the number of operations, we will be able
165 | to multiply numbers of $n=2^\ell$ digits using about $3^\ell = n^{\log_2 3} \sim n^{1.585}$ operations.
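Expanding the bracketed product and collecting terms confirms that [eq:karatsubatwodigit](){.eqref} agrees with [eq:gradeschooltwodigit](){.eqref}:

$$
\begin{aligned}
(100-10)\overline{x}\overline{y}+10\left[(\overline{x}+\underline{x})(\overline{y}+\underline{y})\right] -(10-1)\underline{x}\underline{y} &= 90\overline{x}\overline{y} + 10\overline{x}\overline{y}+10(\overline{x}\underline{y}+\underline{x}\overline{y})+10\underline{x}\underline{y} - 9\underline{x}\underline{y} \\
&= 100\overline{x}\overline{y}+10(\overline{x}\underline{y} + \underline{x}\overline{y}) + \underline{x}\underline{y} \;.
\end{aligned}
$$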
166 |
167 | The above is the intuitive idea behind Karatsuba's algorithm, but it is not enough to fully specify it.
168 | A complete description of an algorithm entails a _precise specification_ of its operations together with its _analysis_: proof that the algorithm does in fact do what it's supposed to do.
169 | The operations of Karatsuba's algorithm are detailed in [karatsubaalg](){.ref}, while the analysis is given in [karatsubacorrect](){.ref} and [karatsubaefficient](){.ref}.
170 |
171 |
172 | ![Karatsuba's multiplication algorithm illustrated for multiplying $x=10\overline{x}+\underline{x}$ and $y=10\overline{y}+\underline{y}$. We compute the three orange, green and purple products $\underline{x}\underline{y}$, $\overline{x}\overline{y}$ and $(\overline{x}+\underline{x})(\overline{y}+\underline{y})$ and then add and subtract them to obtain the result.](../figure/karatsubatwodigit.png){#karatsubafig .margin }
173 |
174 |
175 | ![Running time of Karatsuba's algorithm vs. the grade-school algorithm. (Python implementation available [online](https://goo.gl/zwzpYe).) Note the existence of a "cutoff" length, where for sufficiently large inputs Karatsuba becomes more efficient than the grade-school algorithm. The precise cutoff location varies by implementation and platform details, but will always occur eventually.](../figure/karastubavsgschoolv2.png){#karatsubaruntimefig .margin }
176 |
177 |
178 |
179 | ``` { .algorithm title="Karatsuba multiplication" #karatsubaalg }
180 | INPUT: non-negative integers $x,y$ each of at most $n$ digits
181 |
182 | OUTPUT: $x\cdot y$
183 |
184 | procedure{Karatsuba}{$x$,$y$}
185 | lif {$n \leq 4$} return $x\cdot y$ lendif
186 | Let $m = \floor{n/2}$
187 | Write $x= 10^{m}\overline{x} + \underline{x}$ and $y= 10^{m}\overline{y}+ \underline{y}$
188 | $A \leftarrow Karatsuba(\overline{x},\overline{y})$
189 | $B \leftarrow Karatsuba(\overline{x}+\underline{x},\overline{y}+\underline{y})$
190 | $C \leftarrow Karatsuba(\underline{x},\underline{y})$
191 | Return $(10^{2m}-10^m)\cdot A + 10^m \cdot B +(1-10^m)\cdot C$
192 | endprocedure
193 | ```
194 |
195 |
196 |
197 |
198 |
199 |
200 | [karatsubaalg](){.ref} is only half of the full description of Karatsuba's algorithm.
201 | The other half is the _analysis_, which entails proving that __(1)__ [karatsubaalg](){.ref} indeed computes the multiplication operation and __(2)__ it does so using $O(n^{\log_2 3})$ operations.
202 | We now turn to showing both facts:
203 |
204 | > ### {.lemma #karatsubacorrect}
205 | For all non-negative integers $x,y$, when given input $x,y$, [karatsubaalg](){.ref} will output $x\cdot y$.
206 |
207 |
208 |
209 | ::: {.proof data-ref="karatsubacorrect"}
210 | Let $n$ be the maximum number of digits of $x$ and $y$. We prove the lemma by induction on $n$.
211 | The base case is $n \leq 4$ where the algorithm returns $x\cdot y$ by definition.
212 | (It does not matter which algorithm we use to multiply four-digit numbers - we can even use repeated addition.)
213 | Otherwise, if $n>4$, we define $m = \floor{n/2}$, and write
214 | $x= 10^{m}\overline{x} + \underline{x}$ and $y= 10^{m}\overline{y}+ \underline{y}$.
215 |
216 | Plugging this into $x\cdot y$, we get
217 |
218 | $$
219 | x \cdot y = 10^{2m} \overline{x}\overline{y} + 10^{m}(\overline{x}\underline{y} +\underline{x}\overline{y}) + \underline{x}\underline{y} \;. \label{eqkarastubaone}
220 | $$
221 |
222 | Rearranging the terms we see that
223 |
224 | $$
225 | x\cdot y = 10^{2m}\overline{x}\overline{y} + 10^{m}\left[ (\overline{x}+\underline{x})(\overline{y}+\underline{y}) - \underline{x}\underline{y} - \overline{x}\overline{y} \right] + \underline{x}\underline{y} \;.
226 | \label{eqkarastubatwo}
227 | $$
228 | Since the numbers $\underline{x}$,$\overline{x}$, $\underline{y}$,$\overline{y}$,$\overline{x}+\underline{x}$,$\overline{y}+\underline{y}$ all have at most $m+2<n$ digits, the induction hypothesis implies that the values computed by the recursive calls satisfy $A = \overline{x}\overline{y}$, $B = (\overline{x}+\underline{x})(\overline{y}+\underline{y})$, and $C = \underline{x}\underline{y}$.
229 | Plugging these values into [eqkarastubatwo](){.eqref}, we see that the output $(10^{2m}-10^m)\cdot A + 10^m \cdot B + (1-10^m)\cdot C$ of [karatsubaalg](){.ref} on input $x,y$ is indeed equal to $x \cdot y$.
230 | :::
231 |
232 |
233 | > ### {.lemma #karatsubaefficient}
234 | If $x,y$ are integers of at most $n$ digits, [karatsubaalg](){.ref} will take $O(n^{\log_2 3})$ operations on input $x,y$.
235 |
236 | ::: {.proof data-ref="karatsubaefficient"}
237 | [karatsuba-fig](){.ref} illustrates the idea behind the proof, which we only sketch here, leaving filling out the details as [karatsuba-ex](){.ref}.
238 | The proof is again by induction. We define $T(n)$ to be the maximum number of steps that [karatsubaalg](){.ref} takes on inputs of length at most $n$.
239 | Since in the base case $n\leq 4$, [karatsubaalg](){.ref} performs a constant number of operations, we know that $T(4) \leq c$ for some constant $c$, and for $n>4$, it satisfies the recursive equation
240 | $$
241 | T(n) \leq 3T(\floor{n/2}+1) + c' n \label{eqkaratsubarecursion}
242 | $$
243 | for some constant $c'$ (using the fact that addition can be done in $O(n)$ operations).
244 |
245 | The recursive equation [eqkaratsubarecursion](){.eqref} solves to $O(n^{\log_2 3})$.
246 | The intuition behind this is presented in [karatsuba-fig](){.ref}, and this is also a consequence of the so-called ["Master Theorem"](https://en.wikipedia.org/wiki/Master_theorem_\(analysis_of_algorithms\)) on recurrence relations.
247 | As mentioned above, we leave completing the proof to the reader as [karatsuba-ex](){.ref}.
248 | :::
249 |
250 |
251 |
252 |
253 |
254 | ![Karatsuba's algorithm reduces an $n$-bit multiplication to three $n/2$-bit multiplications, which in turn are reduced to nine $n/4$-bit multiplications and so on. We can represent the computational cost of all these multiplications in a $3$-ary tree of depth $\log_2 n$, where at the root the extra cost is $cn$ operations, at the first level the extra cost is $c(n/2)$ operations, and at each of the $3^i$ nodes of level $i$, the extra cost is $c(n/2^i)$. The total cost is $cn\sum_{i=0}^{\log_2 n} (3/2)^i \leq 10cn^{\log_2 3}$ by the formula for summing a geometric series.](../figure/karatsuba_analysis2.png){#karatsuba-fig }
255 |
256 |
257 |
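The pseudocode of [karatsubaalg](){.ref} translates almost line by line into Python. The following sketch is our own illustrative rendering (the implementation used for [karatsubaruntimefig](){.ref} is linked from that figure); it leans on Python's built-in integer arithmetic, so it is meant to clarify the structure of the recursion rather than to be fast:

```python
def karatsuba(x, y):
    """Multiply non-negative integers x and y following the recursion above."""
    n = max(len(str(x)), len(str(y)))
    if n <= 4:
        return x * y  # base case: any multiplication method will do
    m = n // 2
    xbar, xund = divmod(x, 10**m)  # x = 10**m * xbar + xund
    ybar, yund = divmod(y, 10**m)  # y = 10**m * ybar + yund
    A = karatsuba(xbar, ybar)
    B = karatsuba(xbar + xund, ybar + yund)
    C = karatsuba(xund, yund)
    return (10**(2*m) - 10**m) * A + 10**m * B + (1 - 10**m) * C

x, y = 3141592653589793238, 2718281828459045235
assert karatsuba(x, y) == x * y
```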
258 | Karatsuba's algorithm is by no means the end of the line for multiplication algorithms.
259 | In the 1960s, Toom and Cook extended Karatsuba's ideas to get an $O(n^{\log_k (2k-1)})$ time multiplication algorithm for every constant $k$.
260 | In 1971, Schönhage and Strassen got even better algorithms using the _Fast Fourier Transform_; their idea was to somehow treat integers as "signals" and do the multiplication more efficiently by moving to the Fourier domain.
261 | (The _Fourier transform_ is a central tool in mathematics and engineering, used in a great many applications; if you have not seen it yet, you are likely to encounter it at some point in your studies.)
262 | In the years that followed researchers kept improving the algorithm, and only very recently Harvey and Van Der Hoeven managed to obtain an $O(n \log n)$ time algorithm for multiplication (though it only starts beating the Schönhage-Strassen algorithm for truly astronomical numbers).
263 | Yet, despite all this progress, we still don't know whether or not there is an $O(n)$ time algorithm for multiplying two $n$ digit numbers!
264 |
265 |
266 |
267 |
268 |
269 |
270 |
271 | ::: {.remark title="Matrix Multiplication (advanced note)" #matrixmult}
272 | (This book contains many "advanced" or "optional" notes and sections. These may assume background that not every student has, and can be safely skipped over as none of the future parts depends on them.)
273 |
274 | Ideas similar to Karatsuba's can be used to speed up _matrix_ multiplications as well.
275 | Matrices are a powerful way to represent linear equations and operations, widely used in numerous applications of scientific computing, graphics, machine learning, and many many more.
276 |
277 | One of the basic operations one can do with two matrices is to _multiply_ them.
278 | For example, if $x = \begin{pmatrix} x_{0,0} & x_{0,1}\\ x_{1,0}& x_{1,1} \end{pmatrix}$ and $y = \begin{pmatrix} y_{0,0} & y_{0,1}\\ y_{1,0}& y_{1,1} \end{pmatrix}$ then the product of $x$ and $y$ is the matrix $\begin{pmatrix} x_{0,0}y_{0,0} + x_{0,1}y_{1,0} & x_{0,0}y_{0,1} + x_{0,1}y_{1,1}\\ x_{1,0}y_{0,0}+x_{1,1}y_{1,0} & x_{1,0}y_{0,1}+x_{1,1}y_{1,1} \end{pmatrix}$.
279 | You can see that we can compute this matrix using _eight_ products of numbers.
280 |
281 | Now suppose that $n$ is even and $x$ and $y$ are a pair of $n\times n$ matrices which we can think of as each composed of four $(n/2)\times (n/2)$ blocks $x_{0,0},x_{0,1},x_{1,0},x_{1,1}$ and $y_{0,0},y_{0,1},y_{1,0},y_{1,1}$.
282 | Then the formula for the matrix product of $x$ and $y$ can be expressed in the same way as above, just replacing products $x_{a,b}y_{c,d}$ with _matrix_ products, and addition with matrix addition.
283 | This means that we can use the formula above to give an algorithm that _doubles_ the dimension of the matrices at the expense of increasing the number of operations by a factor of $8$, which for $n=2^\ell$ results in $8^\ell = n^3$ operations.
284 |
285 |
286 | In 1969 Volker Strassen noted that we can compute the product of a pair of two-by-two matrices using only _seven_ products of numbers by observing that each entry of the matrix $xy$ can be computed by adding and subtracting the following seven terms: $t_1 = (x_{0,0}+x_{1,1})(y_{0,0}+y_{1,1})$, $t_2 = (x_{1,0}+x_{1,1})y_{0,0}$, $t_3 = x_{0,0}(y_{0,1}-y_{1,1})$, $t_4 = x_{1,1}(y_{1,0}-y_{0,0})$, $t_5 = (x_{0,0}+x_{0,1})y_{1,1}$, $t_6 = (x_{1,0}-x_{0,0})(y_{0,0}+y_{0,1})$,
287 | $t_7 = (x_{0,1}-x_{1,1})(y_{1,0}+y_{1,1})$.
288 | Indeed, one can verify that $xy = \begin{pmatrix} t_1 + t_4 - t_5 + t_7 & t_3 + t_5 \\ t_2 +t_4 & t_1 + t_3 - t_2 + t_6 \end{pmatrix}$.
289 |
290 |
291 | Using this observation, we can obtain an algorithm such that doubling the dimension of the matrices results in increasing the number of operations by a factor of $7$, which means that for $n=2^\ell$ the cost is $7^\ell = n^{\log_2 7} \sim n^{2.807}$.
292 | A long sequence of work has since improved this algorithm, and the [current record](https://en.wikipedia.org/wiki/Matrix_multiplication_algorithm#Sub-cubic_algorithms) has a running time of about $O(n^{2.373})$.
293 | However, unlike the case of integer multiplication, at the moment we don't know of any algorithm for matrix multiplication that runs in time linear or even close to linear in the size of the input matrices (e.g., an $O(n^2 polylog(n))$ time algorithm).
294 | People have tried to use [group representations](https://en.wikipedia.org/wiki/Group_representation), which can be thought of as generalizations of the Fourier transform, to obtain faster algorithms, but this effort [has not yet succeeded](http://discreteanalysisjournal.com/article/1245-on-cap-sets-and-the-group-theoretic-approach-to-matrix-multiplication).
295 | :::
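To see the seven-product idea in action, here is a short Python sketch of Strassen's recursion for square matrices whose dimension is a power of two. (This is our own illustration using `numpy`, not code from the book; a practical implementation would stop the recursion at some cutoff dimension and fall back to the straightforward algorithm, just as with Karatsuba.)

```python
import numpy as np

def strassen(x, y):
    """Multiply two n-by-n matrices (n a power of 2) via seven recursive products."""
    n = x.shape[0]
    if n == 1:
        return x * y
    m = n // 2  # split each matrix into four m-by-m blocks
    x00, x01, x10, x11 = x[:m, :m], x[:m, m:], x[m:, :m], x[m:, m:]
    y00, y01, y10, y11 = y[:m, :m], y[:m, m:], y[m:, :m], y[m:, m:]
    t1 = strassen(x00 + x11, y00 + y11)
    t2 = strassen(x10 + x11, y00)
    t3 = strassen(x00, y01 - y11)
    t4 = strassen(x11, y10 - y00)
    t5 = strassen(x00 + x01, y11)
    t6 = strassen(x10 - x00, y00 + y01)
    t7 = strassen(x01 - x11, y10 + y11)
    return np.block([[t1 + t4 - t5 + t7, t3 + t5],
                     [t2 + t4, t1 + t3 - t2 + t6]])

a = np.random.randint(0, 10, (8, 8))
b = np.random.randint(0, 10, (8, 8))
assert (strassen(a, b) == a @ b).all()
```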
296 |
297 |
298 | ## Algorithms beyond arithmetic {#algsbeyondarithmetic }
299 |
300 | The quest for better algorithms is by no means restricted to arithmetic tasks such as adding, multiplying or solving equations.
301 | Many _graph algorithms_, including algorithms for finding paths, matchings, spanning trees, cuts, and flows, have been discovered in the last several decades, and this is still an intensive area of research.
302 | (For example, the last few years saw many advances in algorithms for the _maximum flow_ problem, borne out of unexpected connections with electrical circuits and linear equation solvers.)
303 | These algorithms are being used not just for the "natural" applications of routing network traffic or GPS-based navigation, but also for applications ranging from drug discovery (through searching for structures in gene-interaction graphs) to computing risks from correlations in financial investments.
304 |
305 |
306 | Google was founded based on the _PageRank_ algorithm, which is an efficient algorithm to approximate the "principal eigenvector" of (a dampened version of) the adjacency matrix of the web graph.
307 | The _Akamai_ company was founded based on a new data structure, known as _consistent hashing_, for a hash table where buckets are stored at different servers.
308 | The _backpropagation algorithm_, which computes partial derivatives of a neural network in $O(n)$ instead of $O(n^2)$ time, underlies many of the recent phenomenal successes of learning deep neural networks.
309 | Algorithms for solving linear equations under sparsity constraints, a concept known as _compressed sensing_, have been used to drastically reduce the amount and quality of data needed to analyze MRI images.
310 | This made a critical difference for MRI imaging of cancer tumors in children, where previously doctors needed to use anesthesia to suspend the patient's breathing during the MRI exam, sometimes with dire consequences.
311 |
312 | Even for classical questions, studied through the ages, new discoveries are still being made.
313 | For example, for the question of determining whether a given integer is prime or composite, which has been studied since the days of Pythagoras, efficient probabilistic algorithms were only discovered in the 1970s, while the first [deterministic polynomial-time algorithm](https://en.wikipedia.org/wiki/AKS_primality_test) was only found in 2002.
314 | For the related problem of actually finding the factors of a composite number, new algorithms were found in the 1980s, and (as we'll see later in this course) discoveries in the 1990s raised the tantalizing prospect of obtaining faster algorithms through the use of quantum mechanical effects.
315 |
316 | Despite all this progress, there are still many more questions than answers in the world of algorithms.
317 | For almost all natural problems, we do not know whether the current algorithm is the "best", or whether a significantly better one is still waiting to be discovered.
318 | As alluded to in Cobham's opening quote for this chapter, even for the basic problem of multiplying numbers we have not yet answered the question of whether there is a multiplication algorithm that is as efficient as our algorithms for addition.
319 | But at least we now know the right way to _ask_ it.
320 |
321 |
322 |
323 | ## On the importance of negative results
324 |
325 | Finding better algorithms for problems such as multiplication, solving equations, graph problems, or fitting neural networks to data, is undoubtedly a worthwhile endeavor.
326 | But why is it important to prove that such algorithms _don't_ exist?
327 | One motivation is pure intellectual curiosity.
328 | Another reason to study impossibility results is that they correspond to the fundamental limits of our world.
329 | In other words, impossibility results are _laws of nature_.
330 |
331 | Here are some examples of impossibility results outside computer science (see [bnotesintrosec](){.ref} for more about these).
332 | In physics, the impossibility of building a _perpetual motion machine_ corresponds to the _law of conservation of energy_.
333 | The impossibility of building a heat engine beating Carnot's bound corresponds to the second law of thermodynamics, while the impossibility of faster-than-light information transmission is a cornerstone of special relativity.
334 | In mathematics, while we all learned the formula for solving quadratic equations in high school, the impossibility of generalizing this formula to equations of degree five or more gave birth to _group theory_.
335 | The impossibility of proving Euclid's fifth postulate from the first four gave rise to [non-Euclidean geometries](https://en.wikipedia.org/wiki/Non-Euclidean_geometry), which ended up being crucial for the theory of general relativity.
336 |
337 | In an analogous way, impossibility results for computation correspond to "computational laws of nature" that tell us about the fundamental limits of any information processing apparatus, whether based on silicon, neurons, or quantum particles.
338 | Moreover, computer scientists have found creative approaches to _apply_ computational limitations to achieve certain useful tasks.
339 | For example, much of modern Internet traffic is encrypted using the [RSA encryption scheme](https://en.wikipedia.org/wiki/RSA_\(cryptosystem\)), the security of which relies on the (conjectured) impossibility of efficiently factoring large integers.
340 | More recently, the [Bitcoin](https://en.wikipedia.org/wiki/Bitcoin) system uses a digital analog of the "gold standard" where, instead of using a precious metal, new currency is obtained by "mining" solutions for computationally difficult problems.
341 |
342 |
343 |
344 | > ### { .recap }
345 | * The history of algorithms goes back thousands of years; they have been essential to much of human progress and these days form the basis of multi-billion dollar industries, as well as life-saving technologies.
346 | * There is often more than one algorithm to achieve the same computational task. Finding a faster algorithm can often make a much bigger difference than improving computing hardware.
347 | * Better algorithms and data structures don't just speed up calculations, but can yield new qualitative insights.
348 | * One question we will study is finding the _most efficient_ algorithm for a given problem.
349 | * To show that an algorithm is the most efficient one for a given problem, we need to be able to _prove_ that it is _impossible_ to solve the problem using a smaller amount of computational resources.
350 |
351 | ## Roadmap to the rest of this book {#roadmapsec }
352 |
353 | Often, when we try to solve a computational problem, whether it is solving a system of linear equations, finding the top eigenvector of a matrix, or trying to rank Internet search results, it is enough to use the "I know it when I see it" standard for describing algorithms.
354 | As long as we find some way to solve the problem, we are happy and might not care much about the exact mathematical model for our algorithm.
355 | But when we want to answer a question such as "does there _exist_ an algorithm to solve the problem $P$?" we need to be much more precise.
356 |
357 | In particular, we will need to __(1)__ define exactly what it means to solve $P$, and __(2)__ define exactly what an algorithm is.
358 | Even __(1)__ can sometimes be non-trivial, but __(2)__ is particularly challenging; it is not at all clear how (and even whether) we can encompass all potential ways to design algorithms.
359 | We will consider several simple _models of computation_, and argue that, despite their simplicity, they do capture all "reasonable" approaches to computing, including all those that are currently used in modern computing devices.
360 |
361 | Once we have these formal models of computation, we can try to obtain _impossibility results_ for computational tasks, showing that some problems _can not be solved_ (or perhaps can not be solved within the resources of our universe).
362 | Archimedes once said that given a fulcrum and a long enough lever, he could move the world.
363 | We will see how _reductions_ allow us to leverage one hardness result into a great many others, illuminating the boundaries between the computable and uncomputable (or tractable and intractable) problems.
364 |
365 | Later in this book we will go back to examining our models of computation, and see how resources such as randomness or quantum entanglement could potentially change the power of our model.
366 | In the context of probabilistic algorithms, we will see a glimpse of how randomness has become an indispensable tool for understanding computation, information, and communication.
367 | We will also see how computational difficulty can be an asset rather than a hindrance, and be used for the "derandomization" of probabilistic algorithms.
368 | The same ideas also show up in _cryptography_, which has undergone not just a technological but also an intellectual revolution in the last few decades, much of it building on the foundations that we explore in this course.
369 |
370 | Theoretical Computer Science is a vast topic, branching out and touching upon many scientific and engineering disciplines.
371 | This book provides a very partial (and biased) sample of this area.
372 | More than anything, I hope I will manage to "infect" you with at least some of my love for this field, which is inspired and enriched by the connection to practice, but is also deep and beautiful regardless of applications.
373 |
374 | ### Dependencies between chapters
375 |
376 | This book is divided into the following parts, see [dependencystructurefig](){.ref}.
377 |
378 | * __Preliminaries:__ Introduction, mathematical background, and representing objects as strings.
379 |
380 | * __Part I: Finite computation (Boolean circuits):__ Equivalence of circuits and straight-line programs. Universal gate sets. Existence of a circuit for every function, representing circuits as strings, universal circuit, lower bound on circuit size using the counting argument.
381 |
382 | * __Part II: Uniform computation (Turing machines):__ Equivalence of Turing machines and programs with loops. Equivalence of models (including RAM machines, $\lambda$ calculus, and cellular automata), configurations of Turing machines, existence of a universal Turing machine, uncomputable functions (including the Halting problem and Rice's Theorem), Gödel's incompleteness theorem, restricted computational models (regular and context free languages).
383 |
384 | * __Part III: Efficient computation:__ Definition of running time, time hierarchy theorem, $\mathbf{P}$ and $\mathbf{NP}$, $\mathbf{P_{/poly}}$, $\mathbf{NP}$ completeness and the Cook-Levin Theorem, space bounded computation.
385 |
386 | * __Part IV: Randomized computation:__ Probability, randomized algorithms, $\mathbf{BPP}$, amplification, $\mathbf{BPP} \subseteq \mathbf{P}_{/poly}$, pseudorandom generators and derandomization.
387 |
388 | * __Part V: Advanced topics:__ Cryptography, proofs and algorithms (interactive and zero knowledge proofs, Curry-Howard correspondence), quantum computing.
389 |
390 | ![The dependency structure of the different parts. Part I introduces the model of Boolean circuits to study _finite functions_ with an emphasis on _quantitative_ questions (how many gates to compute a function). Part II introduces the model of Turing machines to study functions that have _unbounded input lengths_ with an emphasis on _qualitative_ questions (is this function computable or not). Much of Part II does not depend on Part I, as Turing machines can be used as the first computational model. Part III depends on both parts as it introduces a _quantitative_ study of functions with unbounded input length. The more advanced parts IV (randomized computation) and V (advanced topics) rely on the material of Parts I, II and III.](../figure/dependencystructure.png){#dependencystructurefig }
391 |
392 |
393 | The book largely proceeds in linear order, with each chapter building on the previous ones, with the following exceptions:
394 |
395 | * The topics of the $\lambda$ calculus ([lambdacalculussec](){.ref}), Gödel's incompleteness theorem ([godelchap](){.ref}), Automata/regular expressions and context-free grammars ([restrictedchap](){.ref}), and space-bounded computation ([spacechap](){.ref}), are not used in the following chapters. Hence you can choose whether to cover or skip any subset of them.
396 |
397 | * Part II (Uniform Computation / Turing Machines) does not have a strong dependency on Part I (Finite computation / Boolean circuits) and it should be possible to teach them in the reverse order with minor modifications. Boolean circuits are used in Part III (efficient computation) for results such as $\mathbf{P} \subseteq \mathbf{P_{/poly}}$ and the Cook-Levin Theorem, as well as in Part IV (for $\mathbf{BPP} \subseteq \mathbf{P_{/poly}}$ and derandomization) and Part V (specifically in cryptography and quantum computing).
398 |
399 | * All chapters in [advancedpart](){.ref} (Advanced topics) are independent of one another and can be covered in any order.
400 |
401 |
402 | A course based on this book can use all of Parts I, II, and III (possibly skipping over some or all of the $\lambda$ calculus, [godelchap](){.ref}, [restrictedchap](){.ref} or [spacechap](){.ref}), and then either cover all or some of Part IV (randomized computation), and add a "sprinkling" of advanced topics from Part V based on student or instructor interest.
403 |
404 |
405 |
406 |
407 | ## Exercises
408 |
409 | ::: {.exercise }
410 | Rank the significance of the following inventions in speeding up the multiplication of large (that is, 100-digit or more) numbers. That is, use "back of the envelope" estimates to order them in terms of the speedup factor they offered over the previous state of affairs.
411 |
412 | a. Discovery of the grade-school digit by digit algorithm (improving upon repeated addition).
413 |
414 | b. Discovery of Karatsuba's algorithm (improving upon the digit by digit algorithm).
415 |
416 | c. Invention of modern electronic computers (improving upon calculations with pen and paper).
417 | :::
418 |
419 | ::: {.exercise}
420 | The 1977 Apple II personal computer had a processor speed of 1.023 MHz, or about $10^6$ operations per second. At the time of this writing the world's fastest supercomputer performs 93 "petaflops" ($10^{15}$ floating point operations per second) or about $10^{18}$ basic steps per second. For each one of the following running times (as a function of the input length $n$), compute for both computers how large an input they could handle in a week of computation, if they run an algorithm that has this running time:
421 |
422 | a. $n$ operations.
423 |
424 | b. $n^2$ operations.
425 |
426 | c. $n\log n$ operations.
427 |
428 | d. $2^n$ operations.
429 |
430 | e. $n!$ operations.
431 | :::
432 |
433 | ::: {.exercise title="Usefulness of algorithmic non-existence"}
434 | In this chapter we mentioned several companies that were founded based on the discovery of new algorithms. Can you give an example of a company that was founded based on the _non-existence_ of an algorithm? See footnote for hint.^[As we will see in Chapter [chapcryptography](){.ref}, almost any company relying on cryptography needs to assume the _non-existence_ of certain algorithms. In particular, [RSA Security](https://goo.gl/tMsAui) was founded based on the security of the RSA cryptosystem, which presumes the _non-existence_ of an efficient algorithm to compute the prime factorization of large integers.]
435 | :::
436 |
437 | ::: {.exercise title="Analysis of Karatsuba's Algorithm" #karatsuba-ex}
438 |
439 | a. Suppose that $T_1,T_2,T_3,\ldots$ is a sequence of numbers such that $T_2 \leq 10$ and for every $n$, $T_n \leq 3T_{\lfloor n/2 \rfloor+1} + Cn$ for some $C \geq 1$. Prove that $T_n \leq 20Cn^{\log_2 3}$ for every $n>2$.^[__Hint:__ Use a proof by induction - suppose that this is true for all $n$'s from $1$ to $m$ and prove that this is true also for $m+1$.] \
440 |
441 | b. Prove that the number of single-digit operations that Karatsuba's algorithm takes to multiply two $n$ digit numbers is at most $1000n^{\log_2 3}$.
442 |
443 | :::
444 |
445 | ::: {.exercise }
446 | Implement in the programming language of your choice functions ```Gradeschool_multiply(x,y)``` and ```Karatsuba_multiply(x,y)``` that take two arrays of digits ```x``` and ```y``` and return an array representing the product of ```x``` and ```y``` (where ```x``` is identified with the number ```x[0]+10*x[1]+100*x[2]+...``` etc..) using the grade-school algorithm and the Karatsuba algorithm respectively. At what number of digits does the Karatsuba algorithm beat the grade-school one?
447 |
448 | :::
449 |
450 | ::: {.exercise title="Matrix Multiplication (optional, advanced)" #matrixex}
451 | In this exercise, we show that if for some $\omega>2$, we can write the product of two $k\times k$ real-valued matrices $A,B$ using at most $k^\omega$ multiplications, then we can multiply two $n\times n$ matrices in roughly $n^\omega$ time for every large enough $n$.
452 |
453 | To make this precise, we need to introduce some notation that is unfortunately somewhat cumbersome. Assume that there is some $k\in \N$ and $m \leq k^\omega$ such that for every $k\times k$ matrices $A,B,C$ such that $C=AB$, we can write for every $i,j \in [k]$:
454 | $$
455 | C_{i,j} = \sum_{\ell=0}^{m-1} \alpha_{i,j}^\ell f_\ell(A)g_\ell(B)
456 | $$
457 | for some linear functions $f_0,\ldots,f_{m-1},g_0,\ldots,g_{m-1}:\mathbb{R}^{k^2} \rightarrow \mathbb{R}$ and coefficients $\{ \alpha_{i,j}^\ell \}_{i,j \in [k],\ell \in [m]}$.
458 | Prove that under this assumption for every $\epsilon>0$, if $n$ is sufficiently large, then there is an algorithm that computes the product of two $n\times n$ matrices using at most $O(n^{\omega+\epsilon})$ arithmetic operations. See footnote for hint.^[Start by showing this for the case that $n=k^t$ for some natural number $t$, in which case you can do so recursively by breaking the matrices into $k\times k$ blocks.]
459 | :::
460 |
461 |
462 |
463 | ## Bibliographical notes { #bnotesintrosec }
464 |
465 | For a brief overview of what we'll see in this book, you could do far worse than read [Bernard Chazelle's wonderful essay on the Algorithm as an Idiom of modern science](https://www.cs.princeton.edu/~chazelle/pubs/algorithm.html).
466 | The book of Moore and Mertens [@MooreMertens11] gives a wonderful and comprehensive overview of the theory of computation, including much of the content discussed in this chapter and the rest of this book.
467 | Aaronson's book [@Aaronson13democritus] is another great read that touches upon many of the same themes.
468 |
469 |
470 | For more on the algorithms the Babylonians used, see [Knuth's paper](http://steiner.math.nthu.edu.tw/disk5/js/computer/1.pdf) and Neugebauer's [classic book](https://www.amazon.com/Exact-Sciences-Antiquity-Neugebauer/dp/0486223329).
471 |
472 |
473 | Many of the algorithms we mention in this chapter are covered in algorithms textbooks such as those by Cormen, Leiserson, Rivest, and Stein [@CLRS], Kleinberg and Tardos [@KleinbergTardos06], and Dasgupta, Papadimitriou and Vazirani [@DasguptaPV08], as well as [Jeff Erickson's textbook](http://jeffe.cs.illinois.edu/teaching/algorithms/).
474 | Erickson's book is freely available online and contains a great exposition of recursive algorithms in general and Karatsuba's algorithm in particular.
475 |
476 |
477 |
478 | The story of Karatsuba's discovery of his multiplication algorithm is recounted by him in [@Karatsuba95]. As mentioned above, further improvements were made by Toom and Cook [@Toom63, @Cook66], Schönhage and Strassen [@SchonhageStrassen71], Fürer [@Furer07], and recently by Harvey and Van Der Hoeven [@HarveyvdHoeven2019]; see [this article](https://www.quantamagazine.org/mathematicians-discover-the-perfect-way-to-multiply-20190411/) for a nice overview.
479 | The last papers crucially rely on the _Fast Fourier transform_ algorithm. The fascinating story of the (re)discovery of this algorithm by John Tukey in the context of the cold war is recounted in [@Cooley87FFTdiscovery]. (We say re-discovery because it later turned out that the algorithm dates back to Gauss [@heideman1985gauss].)
480 | The Fast Fourier Transform is covered in some of the books mentioned above, and there are also lectures available online such as [Jeff Erickson's](http://jeffe.cs.illinois.edu/teaching/algorithms/). See also this [popular article by David Austin](http://www.ams.org/samplings/feature-column/fcarc-multiplication).
481 | Fast _matrix_ multiplication was discovered by Strassen [@Strassen69], and since then this has been an active area of research. [@Blaser13] is a recommended self-contained survey of this area.
482 |
483 | The _Backpropagation_ algorithm for fast differentiation of neural networks was invented by Werbos [@Werbos74].
484 | The _Pagerank_ algorithm was invented by Larry Page and Sergey Brin [@pagerank99]. It is closely related to the _HITS_ algorithm of Kleinberg [@Kleinber99].
The _Akamai_ company was founded based on the _consistent hashing_ data structure described in [@Akamai97]. _Compressed sensing_ has a long history but two foundational papers are [@CandesRombergTao06, @Donoho2006compressed]. [@compressedmri08] gives a survey of applications of compressed sensing to MRI; see also this popular article by Ellenberg [@Ellenberg10wired].
485 | The deterministic polynomial-time algorithm for testing primality was given by Agrawal, Kayal, and Saxena [@AgrawalKayalSaxena04].
486 |
487 |
488 | We alluded briefly to classical impossibility results in mathematics, including the impossibility of proving Euclid's fifth postulate from the other four, the impossibility of trisecting an angle with a straightedge and compass, and the impossibility of solving a quintic equation via radicals. A geometric proof of the impossibility of angle trisection (one of the three [geometric problems of antiquity](http://mathworld.wolfram.com/GeometricProblemsofAntiquity.html), going back to the ancient Greeks) is given in this [blog post of Tao](https://terrytao.wordpress.com/2011/08/10/a-geometric-proof-of-the-impossibility-of-angle-trisection-by-straightedge-and-compass/). The book of Mario Livio [@Livio05] covers some of the background and ideas behind these impossibility results.
489 | Some [exciting recent research](http://www.scottaaronson.com/barbados-2016.pdf) is focused on trying to use computational complexity to shed light on fundamental questions in physics such as understanding black holes and reconciling general relativity with quantum mechanics.
490 |
-------------------------------------------------------------------------------- /lec_08a_restricted_models.md: --------------------------------------------------------------------------------
1 | ---
2 | title: "Restricted computational models"
3 | filename: "lec_08a_restricted_models"
4 | chapternum: "10"
5 | ---
6 |
7 |
8 | # Restricted computational models { #restrictedchap }
9 |
10 | > ### { .objectives }
11 | * See that Turing completeness is not always a good thing.
12 | * More examples of always-halting formalisms: _context-free grammars_ and the _simply typed $\lambda$ calculus_.
13 | * The pumping lemma for proving that functions are not context free.
14 | * Examples of computable and uncomputable _semantic properties_ of regular expressions and context-free grammars.
15 |
16 | >_"Happy families are all alike; every unhappy family is unhappy in its own way"_, Leo Tolstoy (opening of the book "Anna Karenina").
17 |
18 |
19 | We have seen that many models of computation are _Turing equivalent_, including Turing machines, NAND-TM/NAND-RAM programs, standard programming languages such as C/Python/Javascript, as well as other models such as the $\lambda$ calculus and even the game of life.
20 | The flip side of this is that for all these models, Rice's theorem ([rice-thm](){.ref}) holds as well, which means that any semantic property of programs in such a model is _uncomputable_.
21 |
22 | The uncomputability of halting and other semantic specification problems for Turing equivalent models motivates __restricted computational models__ that are __(a)__ powerful enough to capture a set of functions useful for certain applications but __(b)__ weak enough that we can still solve semantic specification problems on them.
23 | In this chapter we discuss several such examples.
24 |
25 | ::: { .bigidea #restrictedmodel}
26 | We can use _restricted computational models_ to bypass limitations such as uncomputability of the Halting problem and Rice's Theorem. Such models can compute only a restricted subclass of functions, but allow us to answer at least some _semantic questions_ about programs.
27 | :::
28 |
29 |
30 | ![Some restricted computational models. We have already seen two equivalent restricted models of computation: regular expressions and deterministic finite automata. We show a more powerful model: context-free grammars. We also present tools to demonstrate that some functions _cannot_ be computed in these models. ](../figure/restrictedoverview.png){#restrictedmodelsoverviewfig}
31 |
32 |
33 |
34 |
35 | ## Turing completeness as a bug
36 |
37 | We have seen that seemingly simple computational models or systems can turn out to be Turing complete.
38 | The [following webpage](https://goo.gl/xRXq7p) lists several examples of formalisms that "accidentally" turned out to be Turing complete, including supposedly limited languages such as the C preprocessor, CSS, (certain variants of) SQL, sendmail configuration, as well as games such as Minecraft, Super Mario, and the card game "Magic: The Gathering".
39 | Turing completeness is not always a good thing, as it means that such formalisms can give rise to arbitrarily complex behavior.
40 | For example, the PostScript format (a precursor of PDF) is a Turing-complete programming language meant to describe documents for printing.
41 | The expressive power of PostScript can allow for short descriptions of very complex images, but it also gave rise to some nasty surprises, such as the attacks described in [this page](http://hacking-printers.net/wiki/index.php/PostScript) ranging from using infinite loops as a denial of service attack, to accessing the printer's file system.
42 |
43 |
44 |
45 | ::: {.example title="The DAO Hack" #ethereum}
46 | An interesting recent example of the pitfalls of Turing-completeness arose in the context of the cryptocurrency [Ethereum](https://www.ethereum.org/).
47 | The distinguishing feature of this currency is the ability to design "smart contracts" using an expressive (and in particular Turing-complete) programming language.
48 | In our current "human operated" economy, Alice and Bob might sign a contract to agree that if condition X happens then they will jointly invest in Charlie's company.
49 | Ethereum allows Alice and Bob to create a joint venture where Alice and Bob pool their funds together into an account that will be governed by some program $P$ that decides under what conditions it disburses funds from it.
50 | For example, one could imagine a piece of code that mediates between Alice, Bob, and some program running on Bob's car, and that allows Alice to rent out Bob's car without any human intervention or overhead.
51 |
52 |
53 | Specifically, Ethereum uses the Turing-complete programming language [Solidity](https://solidity.readthedocs.io/en/develop/index.html) which has a syntax similar to JavaScript.
54 | The flagship of Ethereum was an experiment known as The "Decentralized Autonomous Organization" or [The DAO](https://goo.gl/NegW77).
55 | The idea was to create a smart contract that would create an autonomously run decentralized venture capital fund, without human managers, where shareholders could decide on investment opportunities.
56 | The DAO was at the time the biggest crowdfunding success in history.
At its height the DAO was worth 150 million dollars, which was more than ten percent of the total Ethereum market.
57 | Investing in the DAO (or entering any other "smart contract") amounts to entrusting your funds to a computer program, i.e., "code is law", or, to use the words with which the DAO described itself: _"The DAO is borne from immutable, unstoppable, and irrefutable computer code"_.
58 | Unfortunately, it turns out that (as we saw in [chapcomputable](){.ref}) understanding the behavior of computer programs is quite a hard thing to do.
59 | A hacker (or perhaps, some would say, a savvy investor) was able to fashion an input that caused the DAO code to enter into an infinite recursive loop in which it continuously transferred funds into the hacker's account, thereby [cleaning out about 60 million dollars](https://www.bloomberg.com/features/2017-the-ether-thief/) from the DAO.
60 | While this transaction was "legal" in the sense that it complied with the code of the smart contract, it was obviously not what the humans who wrote this code had in mind.
61 | The Ethereum community struggled with the response to this attack.
62 | Some tried the "Robin Hood" approach of using the same loophole to drain the DAO funds into a secure account, but it only had limited success.
63 | Eventually, the Ethereum community decided that the code can be mutable, stoppable, and refutable after all.
64 | Specifically, the Ethereum maintainers and miners agreed on a "hard fork" (also known as a "bailout") to revert history to before the hacker's transaction occurred.
65 | Some community members strongly opposed this decision, and so an alternative currency called [Ethereum Classic](https://ethereumclassic.github.io/) was created that preserved the original history.
66 | :::
67 |
68 |
69 |
70 |
71 | ## Context free grammars { #seccfg }
72 |
73 | If you have ever written a program, you've experienced a _syntax error_.
74 | You probably also had the experience of your program entering into an _infinite loop_.
75 | What is less likely is that the compiler or interpreter entered an infinite loop while trying to figure out if your program has a syntax error.
76 |
77 | When a person designs a programming language, they need to determine its _syntax_.
78 | That is, the designer decides which strings correspond to valid programs, and which ones do not (i.e., which strings contain a syntax error).
79 | To ensure that a compiler or interpreter always halts when checking for syntax errors, language designers typically _do not_ use a general Turing-complete mechanism to express their syntax.
80 | Rather they use a _restricted_ computational model.
81 | One of the most popular choices for such models is _context free grammars_.
82 |
83 | To explain context free grammars, let us begin with a canonical example.
84 | Consider the function $ARITH:\Sigma^* \rightarrow \{0,1\}$ that takes as input a string $x$ over the alphabet $\Sigma = \{ (,),+,-,\times,\div,0,1,2,3,4,5,6,7,8,9\}$ and returns $1$ if and only if the string $x$ represents a valid arithmetic expression.
85 | Intuitively, we build expressions by applying an operation such as $+$,$-$,$\times$ or $\div$ to smaller expressions, or enclosing them in parentheses, where the "base case" corresponds to expressions that are simply numbers.
86 | More precisely, we can make the following definitions:
87 |
88 | * A _digit_ is one of the symbols $0,1,2,3,4,5,6,7,8,9$.
89 |
90 | * A _number_ is a sequence of digits.
(For simplicity we drop the condition that the sequence does not have a leading zero, though it is not hard to encode it in a context-free grammar as well.)
91 |
92 | * An _operation_ is one of $+,-,\times,\div$.
93 |
94 | * An _expression_ has either the form "_number_", the form "_sub-expression1 operation sub-expression2_", or the form "(_sub-expression1_)", where "sub-expression1" and "sub-expression2" are themselves expressions. (Note that this is a _recursive_ definition.)
95 |
96 | A context free grammar (CFG) is a formal way of specifying such conditions.
97 | A CFG consists of a set of _rules_ that tell us how to generate strings from smaller components.
98 | In the above example, one of the rules is "if $exp1$ and $exp2$ are valid expressions, then $exp1 \times exp2$ is also a valid expression"; we can also write this rule using the shorthand $expression \; \Rightarrow \; expression \; \times \; expression$.
99 | As in the above example, the rules of a context-free grammar are often _recursive_: the rule $expression \; \Rightarrow\; expression \; \times \; expression$ defines valid expressions in terms of itself.
100 | We now formally define context-free grammars:
101 |
102 |
103 | ::: {.definition title="Context Free Grammar" #defcfg}
104 | Let $\Sigma$ be some finite set.
105 | A _context free grammar (CFG) over $\Sigma$_ is a triple $(V,R,s)$ such that:
106 |
107 | * $V$, known as the _variables_, is a set disjoint from $\Sigma$.
108 |
109 | * $s\in V$ is known as the _initial variable_.
110 |
111 | * $R$ is a set of _rules_. Each rule is a pair $(v,z)$ with $v\in V$ and $z\in (\Sigma \cup V)^*$. We often write the rule $(v,z)$ as $v \Rightarrow z$ and say that the string $z$ _can be derived_ from the variable $v$.
112 | :::
113 |
114 | ::: {.example title="Context free grammar for arithmetic expressions" #cfgarithmeticex}
115 | The example above of well-formed arithmetic expressions can be captured formally by the following context free grammar:
116 |
117 | * The alphabet $\Sigma$ is $\{ (,),+,-,\times,\div,0,1,2,3,4,5,6,7,8,9\}$.
118 |
119 | * The variables are $V = \{ expression \;,\; number \;,\; digit \;,\; operation \}$.
120 |
121 | * The rules are the set $R$ containing the following $19$ rules:
122 |
123 | - The $4$ rules $operation \Rightarrow +$, $operation \Rightarrow -$, $operation \Rightarrow \times$, and $operation \Rightarrow \div$.
124 |
125 | - The $10$ rules $digit \Rightarrow 0$,$\ldots$, $digit \Rightarrow 9$.
126 |
127 | - The rule $number \Rightarrow digit$.
128 |
129 | - The rule $number \Rightarrow digit\; number$.
130 |
131 | - The rule $expression \Rightarrow number$.
132 |
133 | - The rule $expression \Rightarrow expression \; operation \; expression$.
134 |
135 | - The rule $expression \Rightarrow (expression)$.
136 |
137 | * The starting variable is $expression$.
138 | :::
139 |
140 |
141 |
142 | People use many different notations to write context free grammars.
143 | One of the most common notations is the [Backus–Naur form](https://goo.gl/R4qZji).
144 | In this notation we write a rule of the form $v \Rightarrow a$ (where $v$ is a variable and $a$ is a string) in the form `<v> := a`.
145 | If we have several rules of the form $v \Rightarrow a$, $v \Rightarrow b$, and $v \Rightarrow c$ then we can combine them as `<v> := a|b|c`.
146 | (In words we say that $v$ can derive either $a$, $b$, or $c$.)
147 | For example, the Backus-Naur description for the context free grammar of [cfgarithmeticex](){.ref} is the following (using ASCII equivalents for operations):
148 |
149 | ```python
150 | operation := +|-|*|/
151 | digit := 0|1|2|3|4|5|6|7|8|9
152 | number := digit|digit number
153 | expression := number|expression operation expression|(expression)
154 | ```
155 |
156 | Another example of a context free grammar is the "matching parentheses" grammar, which can be represented in Backus-Naur as follows:
157 |
158 | ```python
159 | match := ""|match match|(match)
160 | ```
161 |
162 | A string over the alphabet $\{$ `(`,`)` $\}$ can be generated from this grammar (where `match` is the starting expression and `""` corresponds to the empty string) if and only if it consists of a matching set of parentheses.
163 | In contrast, by [regexpparn](){.ref} there is no regular expression that matches a string $x$ if and only if $x$ contains a valid sequence of matching parentheses.
164 |
165 |
166 |
167 |
168 |
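To make the notion of "generating a string from rules" concrete, here is a short Python sketch (our own illustration, not part of the book's code) that samples random strings from the arithmetic grammar above by repeatedly replacing a variable with the right-hand side of a randomly chosen rule; the depth bound is an artificial device to force the derivation to terminate:

```python
import random

# The arithmetic grammar above as a Python dictionary: each variable
# maps to a list of possible right-hand sides (lists of symbols).
GRAMMAR = {
    "operation": [["+"], ["-"], ["*"], ["/"]],
    "digit": [[d] for d in "0123456789"],
    "number": [["digit"], ["digit", "number"]],
    "expression": [["number"],
                   ["expression", "operation", "expression"],
                   ["(", "expression", ")"]],
}

def sample(symbol, depth=0):
    """Derive a random string from `symbol` by expanding random rules."""
    if symbol not in GRAMMAR:           # a terminal symbol of the alphabet
        return symbol
    # past a certain depth, always take the first rule, which for each
    # variable leads to a non-recursive derivation, so the recursion stops
    rhs = GRAMMAR[symbol][0] if depth > 8 else random.choice(GRAMMAR[symbol])
    return "".join(sample(s, depth + 1) for s in rhs)

print(sample("expression"))  # prints, e.g., (7+52)*3
```

Every string this procedure prints is derivable from the variable $expression$ and hence matched by the grammar; a recognizer has to work in the opposite direction, which is the subject of the next section.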
169 | ### Context-free grammars as a computational model
170 |
171 | We can think of a context-free grammar over the alphabet $\Sigma$ as defining a function that maps every string $x$ in $\Sigma^*$ to $1$ or $0$ depending on whether $x$ can be generated by the rules of the grammar.
172 | We now make this definition formal.
173 |
174 | ::: {.definition title="Deriving a string from a grammar" #CFGderive}
175 | If $G=(V,R,s)$ is a context-free grammar over $\Sigma$, then for two strings $\alpha,\beta \in (\Sigma \cup V)^*$ we say that $\beta$ _can be derived in one step_ from $\alpha$, denoted by $\alpha \Rightarrow_G \beta$, if we can obtain $\beta$ from $\alpha$ by applying one of the rules of $G$. That is, we obtain $\beta$ by replacing in $\alpha$ one occurrence of the variable $v$ with the string $z$, where $v \Rightarrow z$ is a rule of $G$.
176 |
177 | We say that $\beta$ _can be derived_ from $\alpha$, denoted by $\alpha \Rightarrow_G^* \beta$, if it can be derived by some finite number $k$ of steps.
178 | That is, if there are $\alpha_1,\ldots,\alpha_{k-1} \in (\Sigma \cup V)^*$, so that $\alpha \Rightarrow_G \alpha_1 \Rightarrow_G \alpha_2 \Rightarrow_G \cdots \Rightarrow_G \alpha_{k-1} \Rightarrow_G \beta$.
179 |
180 | We say that $x\in \Sigma^*$ is _matched_ by $G=(V,R,s)$ if $x$ can be derived from the starting variable $s$ (i.e., if $s \Rightarrow_G^* x$).
181 | We define the _function computed by_ $(V,R,s)$ to be the map $\Phi_{V,R,s}:\Sigma^* \rightarrow \{0,1\}$ such that $\Phi_{V,R,s}(x)=1$ iff $x$ is matched by $(V,R,s)$.
182 | A function $F:\Sigma^* \rightarrow \{0,1\}$ is _context free_ if $F = \Phi_{V,R,s}$ for some CFG $(V,R,s)$.^[As in the case of [matchingregexpdef](){.ref} we can also use _language_ rather than _function_ notation and say that a language $L \subseteq \Sigma^*$ is _context free_ if the function $F$ such that $F(x)=1$ iff $x\in L$ is context free.]
183 | :::
184 |
185 | A priori it might not be clear that the map $\Phi_{V,R,s}$ is computable, but it turns out that this is the case.
186 |
187 | > ### {.theorem title="Context-free grammars always halt" #CFGhalt}
188 | For every CFG $(V,R,s)$ over $\{0,1\}$, the function $\Phi_{V,R,s}:\{0,1\}^* \rightarrow \{0,1\}$ is computable.
189 |
190 | As usual we restrict attention to grammars over $\{0,1\}$ although the proof extends to any finite alphabet $\Sigma$.
191 |
192 | ::: {.proof #proofofcfghalt data-ref="CFGhalt"}
193 | We only sketch the proof.
194 | We start with the observation that we can convert every CFG to an equivalent version in _Chomsky normal form_, where all rules either have the form $u \rightarrow vw$ for variables $u,v,w$ or the form $u \rightarrow \sigma$ for a variable $u$ and symbol $\sigma \in \Sigma$, plus potentially the rule $s \rightarrow ""$ where $s$ is the starting variable.
195 |
196 | The idea behind such a transformation is to simply add new variables as needed, and so for example we can translate a rule such as $v \rightarrow u\sigma w$ into the three rules $v \rightarrow ur$, $r \rightarrow tw$ and $t \rightarrow \sigma$.
197 |
198 | Using the Chomsky Normal form we get a natural recursive algorithm for computing whether $s \Rightarrow_G^* x$ for a given grammar $G$ and string $x$.
199 | We simply try all possible guesses for the first rule $s \rightarrow uv$ that is used in such a derivation, and then all possible ways to partition $x$ as a concatenation $x=x'x''$.
200 | If we guessed the rule and the partition correctly, then this reduces our task to checking whether $u \Rightarrow_G^* x'$ and $v \Rightarrow_G^* x''$, which (as it involves shorter strings) can be done recursively.
201 | The base cases are when $x$ is empty or a single symbol, and can be easily handled.
202 | :::
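The recursive algorithm just sketched translates directly into code. Below is a minimal Python sketch (our own illustration, not from the book; the grammar shown is a made-up Chomsky-Normal-Form grammar for nonempty balanced parentheses). Memoizing on the pair (variable, substring) turns the naive exponential recursion into a polynomial-time procedure, essentially the classic CYK parsing algorithm:

```python
from functools import lru_cache

# A CNF grammar for nonempty balanced parentheses (hypothetical example):
# a right-hand side is either a pair of variables or a single terminal.
RULES = {
    "S": [("L", "R"), ("L", "T"), ("S", "S")],
    "T": [("S", "R")],
    "L": ["("],
    "R": [")"],
}

@lru_cache(maxsize=None)
def derives(var, s):
    """Return True iff the variable `var` can derive the string `s`."""
    for rhs in RULES[var]:
        if isinstance(rhs, str):      # terminal rule  var -> symbol
            if s == rhs:
                return True
        elif len(s) >= 2:             # binary rule  var -> (u, w):
            u, w = rhs                # try every split of s into two
            if any(derives(u, s[:i]) and derives(w, s[i:])
                   for i in range(1, len(s))):
                return True
    return False

assert derives("S", "(()())") and not derives("S", "(()")
```

On a string of length $n$ there are at most $O(|V| \cdot n^2)$ distinct memoized calls (a variable together with a contiguous substring), each trying at most $n$ splits, so the total running time is polynomial in $n$.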
203 |
204 |
205 | ::: {.remark title="Parse trees" #parsetreesrem}
206 | While we focus on the task of _deciding_ whether a CFG matches a string, the algorithm to compute $\Phi_{V,R,s}$ actually gives more information than that.
207 | That is, on input a string $x$, if $\Phi_{V,R,s}(x)=1$ then the algorithm yields the sequence of rules that one can apply from the starting variable $s$ to obtain the final string $x$.
208 | We can think of these rules as determining a _tree_ with $s$ being the _root_ vertex and the sinks (or _leaves_) corresponding to the substrings of $x$ that are obtained by the rules that do not have a variable in their second element.
209 | This tree is known as the _parse tree_ of $x$, and often yields very useful information about the structure of $x$.
210 |
211 | Often the first step in a compiler or interpreter for a programming language is a _parser_ that transforms the source into the parse tree (also known as the [abstract syntax tree](https://en.wikipedia.org/wiki/Abstract_syntax_tree)).
212 | There are also tools that can automatically convert a description of a context-free grammars into a parser algorithm that computes the parse tree of a given string.
213 | (Indeed, the above recursive algorithm can be used to achieve this, but there are much more efficient versions, especially for grammars that have [particular forms](https://en.wikipedia.org/wiki/LR_parser), and programming language designers often try to ensure their languages have these more efficient grammars.)
214 | :::
215 |
216 |
217 | ### The power of context free grammars
218 |
219 | Context free grammars can capture every regular expression:
220 |
221 | > ### {.theorem title="Context free grammars and regular expressions" #CFGreg}
222 | Let $e$ be a regular expression over $\{0,1\}$. Then there is a CFG $(V,R,s)$ over $\{0,1\}$ such that $\Phi_{V,R,s}=\Phi_{e}$.
223 |
224 | ::: {.proof #proofofCFGreg data-ref="CFGreg"}
225 | We prove the theorem by induction on the length of $e$.
226 | If $e$ is an expression of one bit length, then $e=0$ or $e=1$, in which case we leave it to the reader to verify that there is a (trivial) CFG that computes it.
227 | Otherwise, we fall into one of the following cases: __case 1:__ $e = e'e''$, __case 2:__ $e = e'|e''$ or __case 3:__ $e=(e')^*$ where in all cases $e',e''$ are shorter regular expressions.
228 | By the induction hypothesis, we can define grammars $(V',R',s')$ and $(V'',R'',s'')$ that compute $\Phi_{e'}$ and $\Phi_{e''}$ respectively. By renaming variables, we can also assume without loss of generality that $V'$ and $V''$ are disjoint.
229 |
230 | In case 1, we can define the new grammar as follows: we add a new starting variable $s \not\in V' \cup V''$ and the rule $s \Rightarrow s's''$.
231 | In case 2, we can define the new grammar as follows: we add a new starting variable $s \not\in V' \cup V''$ and the rules $s \Rightarrow s'$ and $s \Rightarrow s''$.
232 | Case 3 will be the only one that uses _recursion_.
233 | As before we add a new starting variable $s \not\in V' \cup V''$, but now add the rules $s \Rightarrow ""$ (i.e., the empty string) and also add, for every rule of the form $(s',\alpha) \in R'$, the rule $s \Rightarrow s\alpha$ to $R$.
234 |
235 | We leave it to the reader as (a very good!) exercise to verify that in all three cases the grammars we produce capture the same function as the original expression.
236 | :::
237 |
238 | It turns out that CFG's are strictly more powerful than regular expressions.
239 | In particular, as we've seen, the "matching parentheses" function $MATCHPAREN$ can be computed by a context free grammar, whereas, as shown in [regexpparn](){.ref}, it cannot be computed by regular expressions.
240 | Here is another example:
241 |
242 | ::: {.solvedexercise title="Context free grammar for palindromes" #reversedstringcfg}
243 | Let $PAL:\{0,1,;\}^* \rightarrow \{0,1\}$ be the function defined in [palindromenotreg](){.ref} where $PAL(w)=1$ iff $w$ has the form $u;u^R$.
244 | Then $PAL$ can be computed by a context-free grammar.
245 | :::
246 |
247 | ::: {.solution data-ref="reversedstringcfg"}
248 | A simple grammar computing $PAL$ can be described using Backus–Naur notation:
249 |
250 | ```python
251 | start := ; | 0 start 0 | 1 start 1
252 | ```
253 |
254 | One can prove by induction that this grammar generates exactly the strings $w$ such that $PAL(w)=1$.
255 | :::
256 |
257 | A more interesting example is computing the strings of the form $u;v$ that are _not_ palindromes:
258 |
259 | ::: {.solvedexercise title="Non-palindromes" #nonpalindrome}
260 | Prove that there is a context free grammar that computes $NPAL:\{0,1,;\}^* \rightarrow \{0,1\}$ where $NPAL(w)=1$ if $w=u;v$ but $v \neq u^R$.
261 | :::
262 |
263 | ::: {.solution data-ref="nonpalindrome"}
264 | Using Backus–Naur notation we can describe such a grammar as follows:
265 |
266 | ```python
267 | palindrome := ; | 0 palindrome 0 | 1 palindrome 1
268 | different := 0 palindrome 1 | 1 palindrome 0
269 | start := different | 0 start | 1 start | start 0 | start 1
270 | ```
271 |
272 | In words, this means that we can characterize a string $w$ such that $NPAL(w)=1$ as having the following form
273 |
274 | $$
275 | w = \alpha b u ; u^R b' \beta
276 | $$
277 |
278 | where $\alpha,\beta,u$ are arbitrary strings and $b \neq b'$.
279 | Hence we can generate such a string by first generating a palindrome $u; u^R$ (the `palindrome` variable), then adding $0$ on either the left or right and $1$ on the opposite side to get something that is _not_ a palindrome (the `different` variable), and then adding an arbitrary number of $0$'s and $1$'s on either end (the `start` variable).
280 | :::
281 |
282 |
283 | ### Limitations of context-free grammars (optional)
284 |
285 | Even though context-free grammars are more powerful than regular expressions, there are some simple languages that are _not_ captured by context free grammars.
286 | One tool to show this is the context-free grammar analog of the "pumping lemma" ([pumping](){.ref}):
287 |
288 | > ### {.theorem title="Context-free pumping lemma" #cfgpumping}
289 | Let $(V,R,s)$ be a CFG over $\Sigma$. Then there are numbers $n_0,n_1 \in \N$ such that for every $x \in \Sigma^*$ with $|x|>n_0$, if $\Phi_{V,R,s}(x)=1$ then $x=abcde$ such that $|b|+|c|+|d| \leq n_1$, $|b|+|d| \geq 1$, and $\Phi_{V,R,s}(ab^kcd^ke)=1$ for every $k\in \N$.
290 |
291 | ::: { .pause }
292 | The context-free pumping lemma is even more cumbersome to state than its regular analog, but you can remember it as saying the following: _"If a long enough string is matched by a grammar, there must be a variable that is repeated in the derivation."_
293 | :::
294 |
295 | ::: {.proof #proofofcfgpumping data-ref="cfgpumping"}
296 | We only sketch the proof. The idea is that if the total number of symbols in the rules of the grammar is $n_0$, then the only way to get $|x|>n_0$ with $\Phi_{V,R,s}(x)=1$ is to use _recursion_.
297 | That is, there must be some variable $v \in V$ such that we are able to derive from $v$ the value $bvd$ for some strings $b,d \in \Sigma^*$, and then further on derive from $v$ some string $c\in \Sigma^*$ such that $bcd$ is a substring of $x$ (in other words, $x=abcde$ for some $a,e \in \Sigma^*$). If we take the variable $v$ satisfying this requirement with a minimum number of derivation steps, then we can ensure that $|bcd|$ is at most some constant depending on $n_0$ and we can set $n_1$ to be that constant ($n_1=10 \cdot |R| \cdot n_0$ will do, since we will not need more than $|R|$ applications of rules, and each such application can grow the string by at most $n_0$ symbols).
298 |
299 |
300 | Thus by the definition of the grammar, we can repeat the derivation to replace the substring $bcd$ in $x$ with $b^kcd^k$ for every $k\in \N$ while retaining the property that the output of $\Phi_{V,R,s}$ is still one. Since $bcd$ is a substring of $x$, we can write $x=abcde$ and are guaranteed that $ab^kcd^ke$ is matched by the grammar for every $k$.
301 | :::
302 |
303 | Using [cfgpumping](){.ref} one can show that even the simple function $F:\{0,1\}^* \rightarrow \{0,1\}$ defined as follows:
304 | $$F(x) = \begin{cases}1 & x =ww \text{ for some } w\in \{0,1\}^* \\ 0 & \text{otherwise} \end{cases}$$
305 | is not context free.
306 | (In contrast, the function $G:\{0,1\}^* \rightarrow \{0,1\}$ defined as $G(x)=1$ iff $x=w_0w_1\cdots w_{n-1}w_{n-1}w_{n-2}\cdots w_0$ for some $w\in \{0,1\}^*$ and $n=|w|$ is context free; can you see why?)
307 |
308 | ::: {.solvedexercise title="Equality is not context-free" #equalisnotcfg}
309 | Let $EQ:\{0,1,;\}^* \rightarrow \{0,1\}$ be the function such that $EQ(x)=1$ if and only if $x=u;u$ for some $u\in \{0,1\}^*$.
310 | Then $EQ$ is not context free.
311 | :::
312 |
313 | ::: {.solution data-ref="equalisnotcfg"}
314 | We use the context-free pumping lemma.
315 | Suppose, towards a contradiction, that there is a grammar $G$ that computes $EQ$, and let $n_0,n_1$ be the constants obtained from [cfgpumping](){.ref}.
316 |
317 | Consider the string $x= 1^{n}0^{n};1^{n}0^{n}$ for $n=\max(n_0,n_1)$, and write it as $x=abcde$ as per [cfgpumping](){.ref}, with $|b|+|c|+|d| \leq n_1$ and with $|b|+|d| \geq 1$.
318 | By [cfgpumping](){.ref} (taking $k=0$), it should hold that $EQ(ace)=1$.
319 | However, as the following case analysis shows, this leads to a contradiction.
320 |
321 | Firstly, unless $b$ is on the left side of the $;$ separator and $d$ is on the right side, dropping $b$ and $d$ will definitely make the two parts different.
322 | But if it is the case that $b$ is on the left side and $d$ is on the right side, then by the condition that $|b|+|c|+|d| \leq n_1 \leq n$ we know that $b$ is a string of only zeros and $d$ is a string of only ones.
323 | If we drop $b$ and $d$ then since one of them is non-empty, we get that there are either fewer zeroes on the left side than on the right side, or fewer ones on the right side than on the left side.
324 | In either case, we get that $EQ(ace)=0$, obtaining the desired contradiction.
325 | :::
326 |
327 |
328 | ## Semantic properties of context free languages
329 |
330 |
331 | As in the case of regular expressions, the limitations of context free grammars do provide some advantages.
332 | For example, emptiness of context free grammars is decidable:
333 |
334 | > ### {.theorem title="Emptiness for CFG's is decidable" #cfgemptinessthem}
335 | There is an algorithm that on input a context-free grammar $G$, outputs $1$ if and only if $\Phi_G$ is the constant zero function.
336 |
337 | > ### {.proofidea data-ref="cfgemptinessthem"}
338 | The proof is easier to see if we transform the grammar to Chomsky Normal Form as in [CFGhalt](){.ref}. Given a grammar $G$, we can recursively define a non-terminal variable $v$ to be _non-empty_ if there is either a rule of the form $v \Rightarrow \sigma$, or there is a rule of the form $v \Rightarrow uw$ where both $u$ and $w$ are non-empty.
339 | Then the grammar is non-empty if and only if the starting variable $s$ is non-empty.
340 |
341 | ::: {.proof data-ref="cfgemptinessthem"}
342 | We assume that the grammar $G$ is in Chomsky Normal Form as in [CFGhalt](){.ref}. We consider the following procedure for marking variables as "non-empty":
343 |
344 | 1. We start by marking all variables $v$ that are involved in a rule of the form $v \Rightarrow \sigma$ as non-empty.
345 |
346 | 2. We then continue to mark $v$ as non-empty if it is involved in a rule of the form $v \Rightarrow uw$ where $u,w$ have been marked before.
347 |
348 | We continue this way until we cannot mark any more variables.
349 | We then declare that the grammar is empty if and only if $s$ has not been marked.
350 | To see why this is a valid algorithm, note that if a variable $v$ has been marked as "non-empty" then there is some string $\alpha\in \Sigma^*$ that can be derived from $v$.
351 | On the other hand, if $v$ has not been marked, then every sequence of derivations from $v$ will always have a variable that has not been replaced by alphabet symbols.
352 | Hence in particular $\Phi_G$ is the all zero function if and only if the starting variable $s$ is not marked "non-empty".
353 | :::
354 |
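The marking procedure above is a simple fixed-point computation, and translates directly into a few lines of code. Here is a Python sketch (our own illustration, using the same CNF rule format as the parser sketch earlier; the example grammar is made up):

```python
def is_empty(rules, start):
    """Return True iff the grammar derives no string from `start`.
    `rules` maps each variable to right-hand sides that are either a
    terminal symbol (a string) or a pair of variables (a tuple)."""
    non_empty, changed = set(), True
    while changed:                 # iterate until no new variable is marked
        changed = False
        for var, rhss in rules.items():
            if var not in non_empty and any(
                isinstance(rhs, str) or all(u in non_empty for u in rhs)
                for rhs in rhss
            ):
                non_empty.add(var)
                changed = True
    return start not in non_empty

# Hypothetical example: B has no terminal rule, so S derives nothing.
RULES = {"S": [("A", "B")], "A": ["a"], "B": [("B", "B")]}
assert is_empty(RULES, "S") and not is_empty(RULES, "A")
```

Each pass over the rules either marks a new variable or stops, so the loop terminates after at most $|V|$ passes.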
355 | ### Uncomputability of context-free grammar equivalence (optional)
356 |
357 | By analogy to regular expressions, one might have hoped to get an algorithm for deciding whether two given context free grammars are equivalent.
358 | Alas, no such luck. It turns out that the equivalence problem for context free grammars is _uncomputable_.
359 | This is a direct corollary of the following theorem:
360 |
361 | > ### {.theorem title="Fullness of CFG's is uncomputable" #fullnesscfgdef}
362 | For every set $\Sigma$, let $CFGFULL_\Sigma$ be the function that on input a context-free grammar $G$ over $\Sigma$, outputs $1$ if and only if $G$ computes the constant $1$ function.
363 | Then there is some finite $\Sigma$ such that $CFGFULL_\Sigma$ is uncomputable.
364 |
365 | [fullnesscfgdef](){.ref} immediately implies that equivalence for context-free grammars is uncomputable, since computing "fullness" of a grammar $G$ over some alphabet $\Sigma = \{\sigma_0,\ldots,\sigma_{k-1} \}$ corresponds to checking whether $G$ is equivalent to the grammar $s \Rightarrow ""|s\sigma_0|\cdots|s\sigma_{k-1}$.
366 | Note that [fullnesscfgdef](){.ref} and [cfgemptinessthem](){.ref} together imply that context-free grammars, unlike regular expressions, are _not_ closed under complement. (Can you see why?)
367 | Since we can encode every element of $\Sigma$ using $\ceil{\log |\Sigma|}$ bits (and this finite encoding can be easily carried out within a grammar), [fullnesscfgdef](){.ref} implies that fullness is also uncomputable for grammars over the binary alphabet.
368 |
369 |
370 | ::: {.proofidea data-ref="fullnesscfgdef"}
371 | We prove the theorem by reducing from the Halting problem.
372 | To do that we use the notion of _configurations_ of Turing machines, as defined in [configtmdef](){.ref}.
373 | Recall that a _configuration_ of a machine $M$ is a binary string $s$ that encodes all the information about the computation at the current step.
374 |
375 | We define $\Sigma$ to be $\{0,1\}$ plus some separator characters and define $INVALID_M:\Sigma^* \rightarrow \{0,1\}$ to be the function that maps every string $L\in \Sigma^*$ to $1$ if and only if $L$ does _not_ encode a sequence of configurations that correspond to a valid halting history of the computation of $M$ on the input $0$.
376 |
377 | The heart of the proof is to show that $INVALID_M$ is context-free. Once we do that, we see that $M$ does _not_ halt on the input $0$ if and only if $INVALID_M(L)=1$ for _every_ $L$.
378 | To show that, we will encode the list in a special way that makes it amenable to deciding via a context-free grammar.
379 | Specifically we will reverse all the odd-numbered strings.
380 | :::
381 |
382 |
383 | ::: {.proof data-ref="fullnesscfgdef"}
384 | We only sketch the proof.
385 | We will show that if we can compute $CFGFULL$ then we can solve $HALTONZERO$, which has been proven uncomputable in [haltonzero-thm](){.ref}.
386 | Let $M$ be an input Turing machine for $HALTONZERO$. We will use the notion of _configurations_ of a Turing machine, as defined in [configtmdef](){.ref}.
387 |
388 | Recall that a _configuration_ of Turing machine $M$ and input $x$ captures the full state of $M$ at some point of the computation.
389 | The particular details of configurations are not so important, but what you need to remember is that:
390 |
391 | * A configuration can be encoded by a binary string $\sigma \in \{0,1\}^*$.
392 |
393 | * The _initial_ configuration of $M$ on the input $0$ is some fixed string.
394 |
395 | * A _halting configuration_ has a certain coordinate (which can be easily "read off" from it) set to $1$, indicating that the machine has halted.
396 |
397 | * If $\sigma$ is a configuration at some step $i$ of the computation, we denote by $NEXT_M(\sigma)$ the configuration at the next step.
$NEXT_M(\sigma)$ is a string that agrees with $\sigma$ on all but a constant number of coordinates (those corresponding to the head position and the two adjacent ones). On those coordinates, the value of $NEXT_M(\sigma)$ can be computed by some finite function.
398 |
399 | We will let the alphabet $\Sigma = \{0,1\} \cup \{ \| , \# \}$.
400 | A _computation history_ of $M$ on the input $0$ is a string $L\in \Sigma^*$ that corresponds to a list $\| \sigma_0 \# \sigma_1 \| \sigma_2 \# \sigma_3 \cdots \sigma_{t-2} \| \sigma_{t-1} \#$ (i.e., $\|$ comes before an even numbered block, and $\#$ comes before an odd numbered one) such that if $i$ is even then $\sigma_i$ is the string encoding the configuration of $M$ on input $0$ at the beginning of its $i$-th iteration, and if $i$ is odd then it is the same except the string is _reversed_.
401 | (That is, for odd $i$, $rev(\sigma_i)$ encodes the configuration of $M$ on input $0$ at the beginning of its $i$-th iteration.)
402 | Reversing the odd-numbered blocks is a technical trick to ensure that the function $INVALID_M$ we define below is context free.
403 |
404 | We now define $INVALID_M:\Sigma^* \rightarrow \{0,1\}$ as follows:
405 |
406 | $$INVALID_M(L) = \begin{cases}0 & \text{$L$ is a valid computation history of $M$ on $0$} \\
407 | 1 & \text{otherwise} \end{cases}
408 | $$
409 |
410 | We will show the following claim:
411 |
412 | __CLAIM:__ $INVALID_M$ is context-free.
413 |
414 | The claim implies the theorem. Since $M$ halts on $0$ if and only if there exists a valid computation history, $INVALID_M$ is the constant one function if and only if $M$ does _not_ halt on $0$.
415 | In particular, this allows us to reduce determining whether $M$ halts on $0$ to determining whether the grammar $G_M$ corresponding to $INVALID_M$ is full.
416 |
417 | We now turn to the proof of the claim.
418 | We will not show all the details, but the main point is that $INVALID_M(L)=1$ if and only if _at least one_ of the following three conditions holds:
419 |
420 | 1. $L$ is not of the right format, i.e. not of the form $\| \langle \text{binary-string} \rangle \# \langle \text{binary-string} \rangle \| \langle \text{binary-string} \rangle \# \cdots$.
421 |
422 | 2. $L$ contains a substring of the form $\| \sigma \# \sigma' \|$ such that $\sigma' \neq rev(NEXT_M(\sigma))$
423 |
424 | 3. $L$ contains a substring of the form $\# \sigma \| \sigma' \#$ such that $\sigma' \neq NEXT_M(rev(\sigma))$
425 |
426 | Since context-free functions are closed under the OR operation, the claim will follow if we show that we can verify conditions 1, 2 and 3 via a context-free grammar.
427 |
428 | For condition 1 this is very simple: checking that $L$ _is_ of the correct format can be done using a regular expression.
429 | Since regular expressions are closed under negation, this means that checking that $L$ is _not_ of this format can also be done by a regular expression and hence by a context-free grammar.
430 |
431 | For conditions 2 and 3, this follows via very similar reasoning to that showing that the function $F$ such that $F(u\#v)=1$ iff $u \neq rev(v)$ is context-free; see [nonpalindrome](){.ref}.
432 | After all, the $NEXT_M$ function only modifies its input in a constant number of places. We leave filling out the details as an exercise to the reader.
433 | Since $INVALID_M(L)=1$ if and only if $L$ satisfies one of the conditions 1., 2. or 3., and all three conditions can be tested for via a context-free grammar, this completes the proof of the claim and hence the theorem.
434 | :::
435 |
436 | ## Summary of semantic properties for regular expressions and context-free grammars
437 |
438 | To summarize, we can often trade _expressiveness_ of the model for _amenability to analysis_.
439 | If we consider computational models that are _not_ Turing complete, then we are sometimes able to bypass Rice's Theorem and answer certain semantic questions about programs in such models.
440 | Here is a summary of some of what is known about semantic questions for the different models we have seen.
441 |
442 | ```table
443 | ---
444 | caption: 'Computability of semantic properties'
445 | alignment: ''
446 | table-width: ''
447 | id: semantictable
448 | ---
449 | _Model_, **Halting**, **Emptiness**, **Equivalence**
450 | _Regular expressions_, Computable, Computable, Computable
451 | _Context free grammars_, Computable, Computable, Uncomputable
452 | _Turing-complete models_, Uncomputable, Uncomputable, Uncomputable
453 | ```
454 |
455 |
456 |
457 |
458 |
459 |
460 | > ### { .recap }
461 | * The uncomputability of the Halting problem for general models motivates the definition of restricted computational models.
462 | * In some restricted models we can answer _semantic_ questions such as: does a given program terminate, or do two programs compute the same function?
463 | * _Regular expressions_ are a restricted model of computation that is often useful for capturing tasks of string matching. We can test efficiently whether an expression matches a string, as well as answer questions such as Halting and Equivalence.
464 | * _Context free grammars_ are a stronger, yet still not Turing complete, model of computation. The halting problem for context free grammars is computable, but their equivalence is not.
465 |
466 |
467 | ## Exercises
468 |
469 | ::: {.exercise title="Closure properties of context-free functions" #closurecfgex}
470 | Suppose that $F,G:\{0,1\}^* \rightarrow \{0,1\}$ are context free. For each one of the following definitions of the function $H$, either prove that $H$ is always context free or give a counterexample for regular $F,G$ that would make $H$ not context free.
471 |
472 | 1. $H(x) = F(x) \vee G(x)$.
473 |
474 | 2. $H(x) = F(x) \wedge G(x)$.
475 |
476 | 3. $H(x) = NAND(F(x),G(x))$.
477 |
478 | 4. $H(x) = F(x^R)$ where $x^R$ is the reverse of $x$: $x^R = x_{n-1}x_{n-2} \cdots x_0$ for $n=|x|$.
479 |
480 | 5. $H(x) = \begin{cases}1 & x=uv \text{ s.t. } F(u)=G(v)=1 \\ 0 & \text{otherwise} \end{cases}$
481 |
482 | 6. $H(x) = \begin{cases}1 & x=uu \text{ s.t. } F(u)=G(u)=1 \\ 0 & \text{otherwise} \end{cases}$
483 |
484 |
485 | 7. $H(x) = \begin{cases}1 & x=uu^R \text{ s.t. } F(u)=G(u)=1 \\ 0 & \text{otherwise} \end{cases}$
486 | :::
487 |
488 | ::: {.exercise #noncontextfreeex}
489 | Prove that the function $F:\{0,1\}^* \rightarrow \{0,1\}$ such that $F(x)=1$ if and only if $|x|$ is a power of two is not context free.
490 | :::
491 |
492 |
493 |
494 | ::: {.exercise title="Syntax for programming languages" #proglanguagecfgex}
495 | Consider the following syntax of a "programming language" whose source can be written using the [ASCII](https://en.wikipedia.org/wiki/ASCII) character set:
496 |
497 | * _Variables_ are sequences of letters, numbers, and underscores that can't start with a number.
498 |
499 |
500 | * A _statement_ has either the form `foo = bar;` where `foo` and `bar` are variables, or the form `IF (foo) BEGIN ... END` where `...` is a list of one or more statements, potentially separated by newlines.
501 |
502 | A _program_ in our language is simply a sequence of statements (possibly separated by newlines or spaces).
503 |
504 | 1. Let $VAR:\{0,1\}^* \rightarrow \{0,1\}$ be the function that given a string $x\in \{0,1\}^*$, outputs $1$ if and only if $x$ corresponds to an ASCII encoding of a valid variable identifier. Prove that $VAR$ is regular.
505 |
506 | 2. Let $SYN:\{0,1\}^* \rightarrow \{0,1\}$ be the function that given a string $s \in \{0,1\}^*$, outputs $1$ if and only if $s$ is an ASCII encoding of a valid program in our language. Prove that $SYN$ is context free. (You do not have to specify the full formal grammar for $SYN$, but you need to show that such a grammar exists.)
507 |
508 | 3. Prove that $SYN$ is not regular. See footnote for hint.^[Try to see if you can "embed" in some way a function that looks similar to $MATCHPAREN$ in $SYN$, so you can use a similar proof. Of course for a function to be non-regular, it does not need to utilize literal parentheses symbols.]
509 | :::
510 |
511 |
512 |
513 |
514 |
515 |
516 |
517 | ## Bibliographical notes
518 |
519 |
520 | As in the case of regular expressions, there are many resources available that cover context-free grammars in great detail.
521 | Chapter 2 of [@SipserBook] contains many examples of context-free grammars and their properties.
522 | There are also websites such as [Grammophone](https://mdaines.github.io/grammophone/) where you can input grammars, and see what strings they generate, as well as some of the properties that they satisfy.
523 |
524 |
525 | The adjective "context free" is used for CFG's because a rule of the form $v \mapsto a$ means that we can _always_ replace $v$ with the string $a$, no matter what is the _context_ in which $v$ appears.
526 | More generally, we might want to consider cases where the replacement rules depend on the context.
527 | This gives rise to the notion of _general (aka "Type 0") grammars_ that allow rules of the form $a \Rightarrow b$ where both $a$ and $b$ are strings over $(V \cup \Sigma)^*$.
528 | The idea is that if, for example, we wanted to enforce the condition that we only apply some rule such as $v \mapsto 0w1$ when $v$ is surrounded by three zeroes on both sides, then we could do so by adding a rule of the form $000v000 \mapsto 0000w1000$ (and of course we can add much more general conditions).
529 | Alas, this generality comes at a cost - general grammars are Turing complete and hence their halting problem is uncomputable.
530 | That is, there is no algorithm $A$ that can determine for every general grammar $G$ and a string $x$, whether or not the grammar $G$ generates $x$.
531 |
532 |
533 |
534 | The [Chomsky Hierarchy](https://en.wikipedia.org/wiki/Chomsky_hierarchy) is a hierarchy of grammars from the least restrictive (most powerful) Type 0 grammars, which correspond to _recursively enumerable_ languages (see [recursiveenumerableex](){.ref}) to the most restrictive Type 3 grammars, which correspond to regular languages.
535 | Context-free languages correspond to Type 2 grammars.
536 | Type 1 grammars are _context sensitive grammars_.
537 | These are more powerful than context-free grammars but still less powerful than Turing machines.
538 | In particular, functions/languages corresponding to context-sensitive grammars are always computable, and in fact can be computed by [linear bounded automata](https://en.wikipedia.org/wiki/Linear_bounded_automaton), which are non-deterministic algorithms that use $O(n)$ space.
539 | For this reason, the class of functions/languages corresponding to context-sensitive grammars is also known as the complexity class $\mathbf{NSPACE}(O(n))$; we discuss space-bounded complexity in [spacechap](){.ref}.
540 | While Rice's Theorem implies that we cannot compute any non-trivial semantic property of Type 0 grammars, the situation is more complex for other types of grammars: some semantic properties can be determined and some cannot, depending on the grammar's place in the hierarchy.
541 |
--------------------------------------------------------------------------------
/lec_14a_space_complexity.md:
--------------------------------------------------------------------------------
1 | ---
2 | title: "Space bounded computation"
3 | filename: "lec_14a_space_complexity"
4 | chapternum: "17"
5 | ---
6 |
7 |
8 |
9 | # Space bounded computation { #spacechap }
10 |
11 | PLAN: Examples of space bounded algorithms, importance of preserving space. The classes L and PSPACE, space hierarchy theorem, PSPACE=NPSPACE, constant space = regular languages.
12 |
13 |
14 |
15 |
16 | ## Exercises
17 |
18 |
19 |
20 |
21 | ## Bibliographical notes
22 |
23 |
24 |
--------------------------------------------------------------------------------
/lec_16_randomized_alg.md:
--------------------------------------------------------------------------------
1 | ---
2 | title: "Probabilistic computation"
3 | filename: "lec_16_randomized_alg"
4 | chapternum: "19"
5 | ---
6 |
7 | # Probabilistic computation { #randomizedalgchap }
8 |
9 | > ### { .objectives }
10 | * See examples of randomized algorithms \
11 | * Get more comfort with analyzing probabilistic processes and tail bounds \
12 | * Success amplification using tail bounds \
13 |
14 |
15 |
16 | > _"in 1946 .. (I asked myself) what are the chances that a Canfield solitaire laid out with 52 cards will come out successfully? After spending a lot of time trying to estimate them by pure combinatorial calculations, I wondered whether a more practical method ... might not be to lay it out say one hundred times and simply observe and count"_, Stanislaw Ulam, 1983
17 |
18 | >_"The salient features of our method are that it is probabilistic ... and with a controllable miniscule probability of error."_, Michael Rabin, 1977
19 |
20 | In early computer systems, much effort went into driving _out_ randomness and noise.
21 | Hardware components were prone to non-deterministic behavior from a number of causes, whether it was vacuum tubes overheating or actual physical bugs causing short circuits (see [bugfig](){.ref}).
22 | This motivated John von Neumann, one of the early computing pioneers, to write a paper on how to _error correct_ computation, introducing the notion of _redundancy_.
23 |
24 | ![A 1947 entry in the [log book](http://americanhistory.si.edu/collections/search/object/nmah_334663) of the Harvard MARK II computer containing an actual bug that caused a hardware malfunction. By Courtesy of the Naval Surface Warfare Center.](../figure/bug.jpg){#bugfig .margin }
25 |
26 | So it is quite surprising that randomness turned out to be not just a hindrance but also a _resource_ for computation, enabling us to achieve tasks much more efficiently than previously known.
27 | One of the first applications involved the very same John von Neumann.
28 | While he was sick in bed and playing cards, Stan Ulam came up with the observation that calculating statistics of a system could be done much faster by running several randomized simulations.
29 | He mentioned this idea to von Neumann, who became very excited about it; indeed, it turned out to be crucial for the neutron transport calculations that were needed for the development of the atomic bomb and later on the hydrogen bomb.
30 | Because this project was highly classified, Ulam, von Neumann and their collaborators came up with the codeword "Monte Carlo" for this approach (based on the famous casinos where Ulam's uncle gambled).
31 | The name stuck, and randomized algorithms are known as Monte Carlo algorithms to this day.^[Some texts also talk about "Las Vegas algorithms" that always return the right answer but whose running time is only polynomial on the average. Since this Monte Carlo vs Las Vegas terminology is confusing, we will not use these terms anymore, and simply talk about randomized algorithms.]
32 |
33 | In this chapter, we will see some examples of randomized algorithms that use randomness to compute a quantity in a faster or simpler way than was known otherwise.
34 | We will describe the algorithms in an informal / "pseudo-code" way, rather than as Turing machines or NAND-TM/NAND-RAM programs.
35 | In [chapmodelrand](){.ref} we will discuss how to augment the computational models we saw before to incorporate the ability to "toss coins".
36 |
37 | ::: {.nonmath}
38 | This chapter gives some examples of randomized algorithms to get a sense of why probability can be useful for computation.
39 | We will also see the technique of _success amplification_ which is key for many randomized algorithms.
40 | :::
41 |
42 |
43 | ## Finding approximately good maximum cuts
44 |
45 | We start with the following example.
46 | Recall the _maximum cut problem_ of finding, given a graph $G=(V,E)$, the cut that maximizes the number of edges that cross it.
47 | This problem is $\mathbf{NP}$-hard, which means that we do not know of any efficient algorithm that can solve it, but randomization enables a simple algorithm that can cut at least half of the edges:
48 |
49 | > ### {.theorem title="Approximating max cut" #maxcutthm}
50 | There is an efficient probabilistic algorithm that on input an $n$-vertex $m$-edge graph $G$, outputs a cut $(S,\overline{S})$ that cuts at least $m/2$ of the edges of $G$ in expectation.
51 |
52 | > ### {.proofidea data-ref="maxcutthm"}
53 | We simply choose a _random cut_: we choose a subset $S$ of vertices by choosing every vertex $v$ to be a member of $S$ with probability $1/2$ independently. It's not hard to see that each edge is cut with probability $1/2$ and so the expected number of cut edges is $m/2$.
54 |
55 | ::: {.proof data-ref="maxcutthm"}
56 | The algorithm is extremely simple:
57 |
58 | ::: {.quote}
59 | __Algorithm Random Cut:__
60 |
61 | __Input:__ Graph $G=(V,E)$ with $n$ vertices and $m$ edges. Denote $V = \{ v_0,v_1,\ldots, v_{n-1}\}$.
62 |
63 | __Operation:__
64 |
65 | 1. Pick $x$ uniformly at random in $\{0,1\}^n$.
66 |
67 | 2. Let $S \subseteq V$ be the set $\{ v_i \;:\; x_i = 1 \;,\; i\in [n] \}$ that includes all vertices corresponding to coordinates of $x$ where $x_i=1$.
68 |
69 | 3. Output the cut $(S,\overline{S})$.
70 | :::
71 |
72 | We claim that the expected number of edges cut by the algorithm is $m/2$.
73 | Indeed, for every edge $e \in E$, let $X_e$ be the random variable such that $X_e(x)=1$ if the edge $e$ is cut by $x$, and $X_e(x)=0$ otherwise.
74 | For every such edge $e =\{ i,j \}$, $X_e(x)=1$ if and only if $x_i \neq x_j$.
75 | Since the pair $(x_i,x_j)$ obtains each of the values $00,01,10,11$ with probability $1/4$, the probability that $x_i \neq x_j$ is $1/2$ and hence $\E[X_e]=1/2$.
76 | If we let $X$ be the random variable corresponding to the total number of edges cut by $S$, then $X = \sum_{e\in E} X_e$ and hence by linearity of expectation
77 |
78 | $$\E[X] = \sum_{e\in E} \E[X_e] = m(1/2) = m/2 \;.$$
79 | :::
80 |
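As a sanity check on the analysis, here is a short Python sketch of Algorithm Random Cut (our own illustration, not part of the book's code), together with a small experiment estimating the expected cut size on a hypothetical example graph:

```python
import random

def random_cut(n, edges):
    """One run of Algorithm Random Cut: returns (S, number of edges cut)."""
    x = [random.randrange(2) for _ in range(n)]   # uniform x in {0,1}^n
    S = {i for i in range(n) if x[i] == 1}
    cut = sum(1 for (i, j) in edges if x[i] != x[j])
    return S, cut

# A 4-cycle (hypothetical example): the expected cut value is m/2 = 2.
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
runs = [random_cut(4, edges)[1] for _ in range(10000)]
print(sum(runs) / len(runs))   # should print a number close to 2
```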
81 | __Randomized algorithms work in the worst case.__ It is tempting to think of a randomized algorithm such as the one of [maxcutthm](){.ref} as an algorithm that works for a "random input graph" but it is actually much better than that.
82 | The expectation in this theorem is _not_ taken over the choice of the graph, but rather only over the _random choices of the algorithm_.
83 | In particular, _for every graph $G$_, the algorithm is guaranteed to cut half of the edges of the input graph in expectation.
84 | That is,
85 |
86 | ::: { .bigidea #randomworstcaseidea}
87 | A randomized algorithm outputs the correct value with good probability on _every possible input_.
88 | :::
89 |
90 | We will define more formally what "good probability" means in [chapmodelrand](){.ref} but the crucial point is that this probability is always only taken over the random choices of the algorithm, while the input is _not_ chosen at random.
91 |
92 |
93 |
94 |
95 |
96 | ### Amplifying the success of randomized algorithms
97 |
98 | [maxcutthm](){.ref} gives us an algorithm that cuts $m/2$ edges in _expectation_.
99 | But, as we saw before, expectation does not immediately imply concentration, and so a priori, it may be the case that when we run the algorithm, most of the time we don't get a cut matching the expectation.
100 | Luckily, we can _amplify_ the probability of success by repeating the process several times and outputting the best cut we find.
101 | We start by arguing that the probability the algorithm above succeeds in cutting at least $m/2$ edges is not _too_ tiny.
102 |
103 | > ### {.lemma #cutprob}
104 | The probability that a random cut in an $m$ edge graph cuts at least $m/2$ edges is at least $1/(2m)$.
105 |
106 | > ### {.proofidea data-ref="cutprob"}
107 | To see the idea behind the proof, think of the case that $m=1000$.
108 | In this case one can show that we will cut at least $500$ edges with probability at least $0.001$ (and so in particular larger than $1/(2m)=1/2000$).
109 | Specifically, if we assume otherwise, then this means that with probability more than $0.999$ the algorithm cuts $499$ or fewer edges.
110 | But since we can never cut more than the total of $1000$ edges, given this assumption, the expected number of edges cut is largest if we cut exactly $499$ edges with probability $0.999$ and cut $1000$ edges with probability $0.001$.
111 | Yet even in this case the expected number of edges cut will be $0.999 \cdot 499 + 0.001 \cdot 1000 < 500$, which contradicts the fact that we've calculated the expectation to be at least $500$ in [maxcutthm](){.ref}.
112 |
113 | ::: {.proof data-ref="cutprob"}
114 | Let $p$ be the probability that we cut at least $m/2$ edges and suppose, towards a contradiction, that $p<1/(2m)$.
115 | Since the number of edges cut is an integer, and $m/2$ is a multiple of $0.5$, by definition of $p$, with probability $1-p$ we cut at most $m/2 - 0.5$ edges.
116 | Moreover, since we can never cut more than $m$ edges, under our assumption that $p<1/(2m)$, we can bound the expected number of edges cut by
117 |
118 | $$
119 | pm + (1-p)(m/2-0.5) \leq pm + m/2-0.5
120 | $$
121 | But if $p<1/(2m)$ then $pm<0.5$ and so the right-hand side is smaller than $m/2$, which contradicts the fact that (as proven in [maxcutthm](){.ref}) the expected number of edges cut is at least $m/2$.
122 | :::
123 |
124 |
125 | ### Success amplification
126 |
127 | [cutprob](){.ref} shows that our algorithm succeeds at least _some_ of the time, but we'd like to succeed almost _all_ of the time. The approach to do that is to simply _repeat_ our algorithm many times, with fresh randomness each time, and output the best cut we get in one of these repetitions.
128 | It turns out that with extremely high probability we will get a cut of size at least $m/2$.
129 | For example, if we repeat this experiment $2000m$ times, then (using the inequality $(1-1/k)^k \leq 1/e \leq 1/2$ with $k=2m$) we can show that the probability that we never cut at least $m/2$ edges is at most
130 |
131 | $$
132 | (1-1/(2m))^{2000 m} = (1-1/k)^{1000 k} = ((1-1/k)^{k})^{1000} \leq 2^{-1000} \;.
133 | $$
134 |
135 | More generally, the same calculations can be used to show the following lemma:
136 |
137 | > ### {.lemma #cutalgorithmamplificationlem}
138 | There is an algorithm that on input a graph $G=(V,E)$ and a number $k$, runs in time polynomial in $|V|$ and $k$ and outputs a cut $(S,\overline{S})$ such that
139 | $$
140 | \Pr[ \text{number of edges cut by $(S,\overline{S})$ } \geq |E|/2 ] \geq 1- 2^{-k} \;.
141 | $$
142 |
143 |
144 | ::: {.proof data-ref="cutalgorithmamplificationlem"}
145 | The algorithm will work as follows:
146 |
147 | ::: {.quote}
148 | __Algorithm AMPLIFY RANDOM CUT:__
149 |
150 | __Input:__ Graph $G=(V,E)$ with $n$ vertices and $m$ edges. Denote $V = \{ v_0,v_1,\ldots, v_{n-1}\}$. Number $k>0$.
151 |
152 | __Operation:__
153 |
154 | 1. Repeat the following $200km$ times:
155 |
156 | a. Pick $x$ uniformly at random in $\{0,1\}^n$.
157 |
158 | b. Let $S \subseteq V$ be the set $\{ v_i \;:\; x_i = 1 \;,\; i\in [n] \}$ that includes all vertices corresponding to coordinates of $x$ where $x_i=1$.
159 |
160 | c. If $(S,\overline{S})$ cuts at least $m/2$ edges then halt and output $(S,\overline{S})$.
161 |
162 | 2. Output "failed"
163 |
164 | :::
165 |
166 | We leave completing the analysis as an exercise to the reader (see [cutalgorithmamplificationlemex](){.ref}).
167 | :::
168 |
169 |
170 |
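In code, the amplification step is just a loop around the `random_cut` sketch from earlier (again our own illustration, not the book's code):

```python
def amplified_cut(n, edges, k):
    """Sketch of AMPLIFY RANDOM CUT: repeat random_cut (defined above)
    until a cut of size at least m/2 is found."""
    m = len(edges)
    for _ in range(200 * k * m):
        S, cut = random_cut(n, edges)
        if cut >= m / 2:
            return S
    return None   # "failed": by the analysis, probability at most 2^-k
```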
171 | ### Two-sided amplification
172 |
173 | The analysis above relied on the fact that the maximum cut algorithm has _one sided error_. By this we mean that if we get a cut of size at least $m/2$ then we know we have succeeded.
174 | This is common for randomized algorithms, but is not the only case.
175 | In particular, consider the task of computing some Boolean function $F:\{0,1\}^* \rightarrow \{0,1\}$.
176 | A randomized algorithm $A$ for computing $F$, given input $x$, might toss coins and succeed in outputting $F(x)$ with probability, say, $0.9$.
177 | We say that $A$ has _two sided errors_ if there is positive probability that $A(x)$ outputs $1$ when $F(x)=0$, and positive probability that $A(x)$ outputs $0$ when $F(x)=1$.
178 | In such a case, to amplify $A$'s success, we cannot simply repeat it $k$ times and output $1$ if a single one of those repetitions resulted in $1$, nor can we output $0$ if a single one of the repetitions resulted in $0$.
179 | But we can output the _majority value_ of these repetitions.
180 | By the Chernoff bound ([chernoffthm](){.ref}), with probability _exponentially close_ to $1$ (i.e., $1-2^{-\Omega(k)}$), the fraction of the repetitions where $A$ will output $F(x)$ will be at least, say $0.89$, and in such cases we will of course output the correct answer.
181 |
182 | The above translates into the following theorem:
183 |
184 | ::: {.theorem title="Two-sided amplification" #amplifyalg}
185 | If $F:\{0,1\}^* \rightarrow \{0,1\}$ is a function such that there is a polynomial-time algorithm $A$ satisfying
186 | $$
187 | \Pr[A(x) = F(x)] \geq 0.51
188 | $$
189 | for every $x\in \{0,1\}^*$, then there is a polynomial time algorithm $B$ satisfying
190 | $$
191 | \Pr[ B(x) = F(x) ] \geq 1 - 2^{-|x|}
192 | $$
193 | for every $x\in \{0,1\}^*$.
194 | :::
195 |
196 | We omit the proof of [amplifyalg](){.ref}, since we will prove a more general result later on in [amplificationthm](){.ref}.
197 |
198 |
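The majority-vote idea behind [amplifyalg](){.ref} is simple enough to sketch in Python (our own illustration; `noisy_parity` is a hypothetical stand-in for a two-sided-error algorithm $A$):

```python
import random
from collections import Counter

def amplify(A, x, k):
    """Run the two-sided-error algorithm A on x for 2k+1 independent
    trials and return the majority answer."""
    votes = Counter(A(x) for _ in range(2 * k + 1))
    return votes.most_common(1)[0][0]

def noisy_parity(x):
    """Hypothetical stand-in for A: computes parity, correct w.p. 0.75."""
    answer = sum(x) % 2
    return answer if random.random() < 0.75 else 1 - answer

print(amplify(noisy_parity, [1, 0, 1], 50))  # 0, except with tiny probability
```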
### What does this mean?

We have shown a probabilistic algorithm that on any $m$-edge graph $G$ will output a cut of at least $m/2$ edges with probability at least $1-2^{-1000}$.
Does this mean that we can consider this problem as "easy"?
Should we be somewhat wary of using a probabilistic algorithm, since it can sometimes fail?

First of all, it is important to emphasize that this is still a _worst case_ guarantee.
That is, we are not assuming anything about the _input graph_: the probability is only due to the _internal randomness of the algorithm_.
While a probabilistic algorithm might not seem as nice as a deterministic algorithm that is _guaranteed_ to give an output, to get a sense of what a failure probability of $2^{-1000}$ means, note that:


* The chance of winning the Massachusetts Mega Millions lottery is one over $\binom{75}{5}\cdot 15$, which is roughly $2^{-28}$. So $2^{-1000}$ corresponds to winning the lottery about $35$ times in a row, at which point you might not care so much about your algorithm failing.

* The chance for a U.S. resident to be struck by lightning in a given year is about $1/700000$, which corresponds to about a $2^{-45}$ chance that you'll be struck by lightning the very second that you're reading this sentence (after which, again, you might not care so much about the algorithm's performance).

* Since the earth is about 5 billion years old, we can estimate the chance that an asteroid of the magnitude that caused the dinosaurs' extinction will hit us this very second to be about $2^{-60}$.
It is quite likely that even a deterministic algorithm will fail if this happens.

So, in practical terms, a probabilistic algorithm is just as good as a deterministic one.
But it is still a theoretically fascinating question whether randomized algorithms actually yield more power, or whether it is the case that for any computational problem that can be solved by a probabilistic algorithm, there is a deterministic algorithm with nearly the same performance.^[This question does have some significance to practice, since hardware that generates high quality randomness at speed is non-trivial to construct.]
For example, we will see in [maxcutex](){.ref} that there is in fact a deterministic algorithm that can cut at least $m/2$ edges in an $m$-edge graph.
We will discuss this question in generality in [chapmodelrand](){.ref}.
For now, let us see a couple of examples where randomization leads to algorithms that are better in some sense than the known deterministic algorithms.

## Solving SAT through randomization

The 3SAT problem is $\mathbf{NP}$-hard, and so it is unlikely that it has a polynomial (or even subexponential) time algorithm.
But this does not mean that we can't do at least somewhat better than the trivial $2^n$ algorithm for $n$-variable 3SAT.
The best known worst-case algorithms for 3SAT are randomized, and are related to the following simple algorithm, variants of which are also used in practice:


::: {.quote}

__Algorithm WalkSAT:__

__Input:__ An $n$-variable 3CNF formula $\varphi$.

__Parameters:__ $T,S \in \N$

__Operation:__

1. Repeat the following $T$ steps:

    a. Choose a random assignment $x\in \{0,1\}^n$ and repeat the following for $S$ steps:

        1. If $x$ satisfies $\varphi$ then output $x$.

        2. Otherwise, choose a random clause $(\ell_i \vee \ell_j \vee \ell_k)$ that $x$ does not satisfy, choose a random literal among $\ell_i,\ell_j,\ell_k$, and modify $x$ to satisfy this literal.


2. If all the above $T\cdot S$ repetitions did not result in a satisfying assignment, then output `Unsatisfiable`
:::


The running time of this algorithm is $S\cdot T \cdot poly(n)$, and so the key question is how small we can make $S$ and $T$ so that the probability that WalkSAT outputs `Unsatisfiable` on a satisfiable formula $\varphi$ is small.
It is known that we can do so with $ST = \tilde{O}((4/3)^n) = \tilde{O}(1.333\ldots^n)$ (see [walksatex](){.ref} for a weaker result), but we'll show below a simpler analysis yielding $ST= \tilde{O}(\sqrt{3}^n) = \tilde{O}(1.74^n)$, which is still much better than the trivial $2^n$ bound.^[At the time of this writing, the best known [randomized](https://arxiv.org/pdf/1103.2165.pdf) algorithms for 3SAT run in time roughly $O(1.308^n)$, and the best known [deterministic](https://arxiv.org/pdf/1102.3766v1.pdf) algorithms run in time $O(1.3303^n)$ in the worst case.]
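Before analyzing WalkSAT, here is a direct Python rendering of it. The encoding of a 3CNF formula as a list of clauses, each a list of three nonzero integers where the literal $+i$ stands for the variable $x_{i-1}$ and $-i$ for its negation (the usual DIMACS convention), is our choice and not part of the algorithm's description above:

```python
import random

def satisfies(x, clause):
    # literal +i is satisfied iff x[i-1] == 1; literal -i iff x[i-1] == 0
    return any(x[abs(l) - 1] == (1 if l > 0 else 0) for l in clause)

def walksat(n, clauses, T, S):
    for _ in range(T):
        x = [random.randint(0, 1) for _ in range(n)]  # random assignment
        for _ in range(S):
            unsat = [c for c in clauses if not satisfies(x, c)]
            if not unsat:
                return x                 # x satisfies every clause of phi
            l = random.choice(random.choice(unsat))  # random literal of a random unsatisfied clause
            x[abs(l) - 1] ^= 1           # flipping the variable satisfies l
    return None  # "Unsatisfiable"
```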
> ### {.theorem title="WalkSAT simple analysis" #walksatthm}
If we set $T=100\cdot \sqrt{3}^{n}$ and $S= n/2$, then the probability we output `Unsatisfiable` for a satisfiable $\varphi$ is at most $1/2$.


::: {.proof data-ref="walksatthm"}
Suppose that $\varphi$ is a satisfiable formula and let $x^*$ be a satisfying assignment for it.
For every $x\in \{0,1\}^n$, denote by $\Delta(x,x^*)$ the number of coordinates that differ between $x$ and $x^*$.
The heart of the proof is the following claim:

__Claim I:__ For every $x,x^*$ as above, in every local improvement step, the value $\Delta(x,x^*)$ is decreased by one with probability at least $1/3$.

__Proof of Claim I:__ Since $x^*$ is a _satisfying_ assignment, if $C$ is a clause that $x$ does _not_ satisfy, then at least one of the variables involved in $C$ must get different values in $x$ and $x^*$.
Thus when we flip the value of one of the three variables of the clause at random, with probability at least $1/3$ we decrease the distance.

The second claim is that our starting point is not that bad:

__Claim II:__ With probability at least $1/2$ over a random $x\in \{0,1\}^n$, $\Delta(x,x^*) \leq n/2$.

__Proof of Claim II:__ Consider the map $FLIP:\{0,1\}^n \rightarrow \{0,1\}^n$ that simply "flips" all the bits of its input from $0$ to $1$ and vice versa. That is, $FLIP(x_0,\ldots,x_{n-1}) = (1-x_0,\ldots,1-x_{n-1})$.
Clearly $FLIP$ is one to one. Moreover, if $x$ is of distance $k$ to $x^*$, then $FLIP(x)$ is of distance $n-k$ to $x^*$.
Now let $B$ be the "bad event" in which $x$ is of distance $>n/2$ from $x^*$.
Then the set $A = FLIP(B) = \{ FLIP(x) \;:\; x\in B \}$ satisfies $|A|=|B|$, and if $x\in A$ then $x$ is of distance $<n/2$ from $x^*$, which means that $A$ and $B$ are disjoint. Since $A$ and $B$ are disjoint subsets of $\{0,1\}^n$ of the same size, $|B| \leq 2^n/2$, and hence the probability that a random $x$ is in $B$ is at most $1/2$. This completes the proof of Claim II.

Together, the two claims imply the theorem. By Claim II, with probability at least $1/2$ the random starting point $x$ satisfies $\Delta(x,x^*) \leq n/2$, and conditioned on this, by Claim I, with probability at least $3^{-n/2} = \sqrt{3}^{-n}$ all of the $S=n/2$ local improvement steps decrease the distance, in which case we reach $x^*$ (or encounter some other satisfying assignment along the way). Hence each of the $T$ iterations of the outer loop succeeds with probability at least $\tfrac{1}{2}\sqrt{3}^{-n}$, and the probability that all $T = 100\cdot \sqrt{3}^{n}$ iterations fail is at most $(1-\tfrac{1}{2}\sqrt{3}^{-n})^{100\sqrt{3}^{n}} \leq e^{-50} \leq 1/2$.
:::

## Bipartite matching

As another example of the power of randomization, let us consider the question of testing whether a bipartite graph contains a _perfect matching_. Let $G$ be a bipartite graph on $2n$ vertices, with the two sides denoted $L = \{\ell_0,\ldots,\ell_{n-1}\}$ and $R = \{r_0,\ldots,r_{n-1}\}$, so that every edge has the form $\{\ell_i, r_j\}$ for some $i,j \in [n]$. A _perfect matching_ in $G$ is a set of $n$ edges that touches every vertex exactly once, and hence corresponds to a permutation $\pi \in S_n$ (where $S_n$ denotes the set of all one-to-one and onto maps $\pi:[n] \rightarrow [n]$) such that $\{\ell_i, r_{\pi(i)}\}$ is an edge of $G$ for every $i\in [n]$. Define $A = A(G)$ to be the $n \times n$ matrix where $A_{i,j}=1$ if $\{\ell_i,r_j\}$ is an edge of $G$ and $A_{i,j}=0$ otherwise. While there are deterministic polynomial-time algorithms for this problem, here we present a simple randomized algorithm, based on the following surprising connection between matchings and _polynomials_:

> ### {.lemma title="Matching polynomial" #matchpolylem}
Define $P=P(G)$ to be the polynomial mapping $\R^{n^2}$ to $\R$ where
$$
P(x_{0,0},\ldots,x_{n-1,n-1}) = \sum_{\pi \in S_n} sign(\pi) \left( \prod_{i=0}^{n-1} A_{i,\pi(i)} \right) \prod_{i=0}^{n-1} x_{i,\pi(i)} \label{matchpolyeq}
$$
Then $G$ has a perfect matching if and only if $P$ is not identically zero.
That is, $G$ has a perfect matching if and only if there exists some assignment $x=(x_{i,j})_{i,j\in [n]} \in \R^{n^2}$ such that $P(x) \neq 0$.^[The [sign](https://goo.gl/ELnXhq) of a permutation $\pi:[n] \rightarrow [n]$, denoted by $sign(\pi)$, can be defined in several equivalent ways, one of which is that it equals $-1$ if the number of pairs $x<y$ such that $\pi(x)>\pi(y)$ is odd, and equals $+1$ otherwise. The importance of the term $sign(\pi)$ is that it makes $P$ equal to the _determinant_ of the matrix $(x_{i,j})$ and hence efficiently computable.]

::: {.proof data-ref="matchpolylem"}
If $G$ has a perfect matching $M^*$, then let $\pi^*$ be the permutation corresponding to $M^*$ and let $x^* \in \mathbb{R}^{n^2}$ be defined as follows: $x^*_{i,j}=1$ if $j=\pi^*(i)$ and $x^*_{i,j}=0$ otherwise. We claim that $P(x^*) = sign(\pi^*)$, which in particular means that $P$ is not identically zero. To see why this is true, write $P(x^*) = \sum_{\pi \in S_n} sign(\pi) P_\pi(x^*)$ where $P_\pi(x^*)=\prod_{i=0}^{n-1} A_{i,\pi(i)} x^*_{i,\pi(i)}$. For every $\pi \neq \pi^*$ there will be some $i$ such that $\pi(i) \neq \pi^*(i)$, and so $x^*_{i,\pi(i)}=0$, which means that $P_{\pi}(x^*)=0$. On the other hand, since $\pi^*$ is a matching in $G$, $A_{i,\pi^*(i)}=1$ for all $i$, and hence $P_{\pi^*}(x^*) = \prod_{i=0}^{n-1} A_{i,\pi^*(i)} x^*_{i,\pi^*(i)}=1$, and so $P(x^*) = sign(\pi^*)$.

On the other hand, suppose that $P$ is not identically zero.
Since the monomials $\prod_{i=0}^{n-1} x_{i,\pi(i)}$ are distinct for distinct permutations $\pi$, by [matchpolyeq](){.eqref} this means that there is some permutation $\pi$ such that $\prod_{i=0}^{n-1}A_{i,\pi(i)} \neq 0$. But for this to happen, it must be that $A_{i,\pi(i)} \neq 0$ for all $i$, which means that for every $i$, the edge $\{\ell_i,r_{\pi(i)}\}$ exists in the graph, and hence $\pi$ must be a perfect matching in $G$.
:::



As we've seen before, for every $x \in \R^{n^2}$, we can compute $P(x)$ by simply computing the _determinant_ of the matrix $A(x)$, which is obtained by replacing $A_{i,j}$ with $A_{i,j}x_{i,j}$.
This reduces testing perfect matching to the _zero testing_ problem for polynomials: given some polynomial $P(\cdot)$, test whether $P$ is identically zero or not.
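As a quick concrete sanity check of this reduction, here is a sketch in Python using `sympy` for symbolic computation, on a small example graph chosen arbitrarily for illustration (both the library choice and the graph are our assumptions, not part of the formal development):

```python
import sympy as sp

n = 3
# an arbitrary example bipartite graph: pairs (i, j) denote edges {l_i, r_j}
edges = {(0, 0), (0, 1), (1, 1), (1, 2), (2, 0), (2, 2)}
x = [[sp.Symbol(f"x_{i}{j}") for j in range(n)] for i in range(n)]
# replace A_{i,j} with A_{i,j} * x_{i,j} and take the determinant
M = sp.Matrix(n, n, lambda i, j: x[i][j] if (i, j) in edges else 0)
P = sp.expand(M.det())  # the matching polynomial P(G) of the lemma
print(P)  # x_00*x_11*x_22 + x_01*x_12*x_20: nonzero, so a matching exists
```

Each surviving monomial in the printed polynomial corresponds to one of this graph's two perfect matchings.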
The intuition behind our randomized algorithm for zero testing is the following:

>_If a polynomial is not identically zero, then it can't have "too many" roots._

![A degree $d$ curve in one variable can have at most $d$ roots. In higher dimensions, an $n$-variate degree-$d$ polynomial can have an infinite number of roots, though the set of roots will be an $(n-1)$-dimensional surface. Over a finite field $\mathbb{F}$, an $n$-variate degree-$d$ polynomial has at most $d|\mathbb{F}|^{n-1}$ roots.](../figure/curves.png){#curvesfig .margin }

This intuition sort of makes sense.
For one-variable polynomials, we know that a non-zero linear function has at most one root, a quadratic function (e.g., a parabola) has at most two roots, and generally a degree-$d$ polynomial has at most $d$ roots.
While in more than one variable there can be an infinite number of roots (e.g., the polynomial $x+y$ vanishes on the line $y=-x$), it is still the case that the set of roots is very "small" compared to the set of all inputs.
For example, the roots of a bivariate polynomial form a curve, the roots of a three-variable polynomial form a surface, and more generally the roots of an $n$-variable polynomial form a space of dimension $n-1$.

This intuition leads to the following simple randomized algorithm:

>_To decide if $P$ is identically zero, choose a "random" input $x$ and check if $P(x)\neq 0$._

This makes sense: if there are only "few" roots, then we expect that with high probability the random input $x$ is not going to be one of those roots.
However, to transform this into an actual algorithm, we need to make both the intuition and the notion of a "random" input precise.
Choosing a random real number is quite problematic, especially when you have only a finite number of coins at your disposal, and so we start by reducing the task to a finite setting.
We will use the following result:

> ### {.theorem title="Schwartz–Zippel lemma" #szlem}
For every integer $q$ and polynomial $P:\R^n \rightarrow \R$ with integer coefficients, if $P$ has degree at most $d$ and is not identically zero, then it has at most $dq^{n-1}$ roots
in the set $[q]^n = \{ (x_0,\ldots,x_{n-1}) : x_i \in \{0,\ldots,q-1\} \}$.

We omit the (not too complicated) proof of [szlem](){.ref}.
We remark that it holds not just over the real numbers but over any field as well.
Since the matching polynomial $P$ of [matchpolylem](){.ref} has degree at most $n$, [szlem](){.ref} leads directly to a simple algorithm for testing if it is non-zero:


::: {.quote}
__Algorithm Perfect-Matching:__

__Input:__ Bipartite graph $G$ on $2n$ vertices $\{ \ell_0,\ldots,\ell_{n-1} , r_0,\ldots,r_{n-1} \}$.

__Operation:__

1. For every $i,j \in [n]$, choose $x_{i,j}$ independently at random from $[2n]=\{0,\ldots, 2n-1\}$.

2. Compute the determinant of the matrix $A(x)$ whose $(i,j)^{th}$ entry equals $x_{i,j}$ if the edge $\{\ell_i,r_j\}$ is present and $0$ otherwise.

3. Output `no perfect matching` if this determinant is zero, and output `perfect matching` otherwise.
:::

Note that by [szlem](){.ref} (applied with $q=2n$ and $d=n$), if $G$ has a perfect matching then the probability that we wrongly output `no perfect matching` is at most $n(2n)^{n-1}/(2n)^n = 1/2$, while if $G$ has no perfect matching then we always answer correctly.

This algorithm can be improved further (e.g., see [matchingmodex](){.ref}).
While it is not necessarily faster than the cut-based algorithms for perfect matching, it does have some advantages. In particular, it is more amenable to parallelization. (However, it also has the significant disadvantage that it does not produce a matching but only states that one exists.)
The Schwartz–Zippel Lemma, and the associated zero-testing algorithm for polynomials, are widely used across computer science, including in several settings where we have no known deterministic algorithm matching their performance.
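Here is a sketch of Algorithm Perfect-Matching in Python, with the graph given as pairs $(i,j)$ denoting the edges $\{\ell_i,r_j\}$ (an encoding chosen here for illustration). Since `numpy` computes determinants in floating point, the zero test below uses a threshold, which is adequate for moderate $n$; working modulo a prime, as in [matchingmodex](){.ref}, avoids this issue altogether:

```python
import random
import numpy as np

def has_perfect_matching(n, edges):
    """One run of Perfect-Matching; errs (one-sidedly) with prob. at most 1/2."""
    A = np.zeros((n, n))
    for (i, j) in edges:                     # edge {l_i, r_j} of the graph
        A[i, j] = random.randrange(2 * n)    # x_{i,j} uniform in {0,...,2n-1}
    # det(A) is an integer, so for moderate n a 0.5 threshold reliably
    # separates zero from nonzero despite floating-point roundoff
    return abs(np.linalg.det(A)) > 0.5
```

As with the random-cut algorithm, running this several times and outputting `perfect matching` if any run says so amplifies the success probability.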
::: { .recap }
* Using concentration results, we can _amplify_ in polynomial time the success probability of a probabilistic algorithm from a mere $1/p(n)$ to $1-2^{-q(n)}$ for all polynomials $p$ and $q$.

* There are several randomized algorithms that are better in various senses (e.g., simpler, faster, or other advantages) than the best known deterministic algorithms for the same problems.
:::

## Exercises


::: {.exercise title="Amplification for max cut" #cutalgorithmamplificationlemex}
Prove [cutalgorithmamplificationlem](){.ref}
:::

> ### {.exercise title="Deterministic max cut algorithm" #maxcutex}
^[TODO: add exercise to give a deterministic max cut algorithm that gives $m/2$ edges. Talk about greedy approach.]

> ### {.exercise title="Simulating distributions using coins" #coindistex}
Our model for probability involves tossing $n$ coins, but sometimes algorithms require sampling from other distributions, such as selecting a uniform number in $\{0,\ldots,M-1\}$ for some $M$.
Fortunately, we can simulate this with an exponentially small probability of error: prove that for every $M$, if $n>k\lceil \log M \rceil$, then there is a function $F:\{0,1\}^n \rightarrow \{0,\ldots,M-1\} \cup \{ \bot \}$ such that __(1)__ the probability that $F(x)=\bot$ is at most $2^{-k}$ and __(2)__ the distribution of $F(x)$ conditioned on $F(x) \neq \bot$ is equal to the uniform distribution over $\{0,\ldots,M-1\}$.^[__Hint:__ Think of $x\in \{0,1\}^n$ as choosing $k$ numbers $y_1,\ldots,y_k \in \{0,\ldots, 2^{\lceil \log M \rceil}-1 \}$. Output the first such number that is in $\{0,\ldots,M-1\}$. ]

> ### {.exercise title="Better WalkSAT analysis" #walksatex}
1. Prove that for every $\epsilon>0$, if $n$ is large enough then for every $x^*\in \{0,1\}^n$ $\Pr_{x \sim \{0,1\}^n}[ \Delta(x,x^*) \leq n/3 ] \leq 2^{-(1-H(1/3)-\epsilon)n}$ where $H(p)=p\log(1/p) + (1-p)\log(1/(1-p))$ is the same function as in [entropybinomex](){.ref}. \
2. Prove that $2^{1-H(1/4)+(1/4) \log 3}=3/2$. \
3. Use the above to prove that for every $\delta>0$ and large enough $n$, if we set $T=1000\cdot (3/2+\delta)^n$ and $S=n/4$ in the WalkSAT algorithm then for every satisfiable 3CNF $\varphi$, the probability that we output `unsatisfiable` is at most $1/2$. \

> ### {.exercise title="Faster bipartite matching (challenge)" #matchingmodex}
(to be completed: improve the matching algorithm by working modulo a prime)


## Bibliographical notes

The books of Motwani and Raghavan [@motwani1995randomized] and Mitzenmacher and Upfal [@mitzenmacher2017probability] are two excellent resources for randomized algorithms.
Some of the history of the discovery of the Monte Carlo method is covered [here](http://permalink.lanl.gov/object/tr?what=info:lanl-repo/lareport/LA-UR-88-9068).


## Acknowledgements
--------------------------------------------------------------------------------
/lec_20_alg_society.md:
--------------------------------------------------------------------------------
---
title: "Algorithms and society"
filename: "lec_20_alg_society"
---


# Algorithms and society

PLAN: Talk about how algorithms interact with society - incentives, privacy, fairness. Maybe talk about cryptocurrencies (if we don't talk about them in crypto)



## Exercises



## Bibliographical notes

--------------------------------------------------------------------------------
/lec_24_proofs.md:
--------------------------------------------------------------------------------
---
title: "Proofs and algorithms"
filename: "lec_24_proofs"
chapternum: "22"
---

# Proofs and algorithms { #chapproofs }

>_"Let's not try to define knowledge, but try to define zero-knowledge."_, Shafi Goldwasser.


_Proofs_ have captured human imagination for thousands of years, ever since the publication of Euclid's _Elements_, a book second only to the Bible in the number of editions printed.

Plan:

* Proofs and algorithms

* Interactive proofs

* Zero knowledge proofs

* Propositions as types, Coq and other proof assistants.



## Exercises



## Bibliographical notes


--------------------------------------------------------------------------------
/macros.tex:
--------------------------------------------------------------------------------
\renewcommand{\label}[1]{}
\newcommand{\N}{\mathbb{N}}
\newcommand{\R}{\mathbb{R}}
\newcommand{\Z}{\mathbb{Z}}
\newcommand{\E}{\mathbb{E}}
\newcommand{\val}{\ensuremath{\mathrm{val}}}
\newcommand{\floor}[1]{\lfloor #1 \rfloor}
\newcommand{\ceil}[1]{\lceil #1 \rceil}
\newcommand{\expr}[1]{\langle #1 \rangle}
\newcommand{\pmod}[1]{(\mod #1)}
--------------------------------------------------------------------------------
/metadata.yaml:
--------------------------------------------------------------------------------
---
booktitle: "Introduction to Theoretical Computer Science"
author: "Boaz Barak"
log: True
logdir: "log"
sourcedir: "content/"
bibfile: "introtcs.bib"
latexsectionheaders:
  1: "chapter"
  2: "section"
  3: "subsection"
  4: "subsubsection"
  5: "paragraph"
  6: "subparagraph"
labelclasses:
  "bigidea": "Big Idea"
  "solvedexercise": "Solved Exercise"
  "pause": "Pause and think box"
  "subsection": "Section"
latexblockcode: "code"
latexinlinecode: ""
auxfile: "bookaux.yaml"
pdfurl: "https://files.boazbarak.org/introtcs/lnotes_book.pdf"
searchindex: "html/search_index.json"
binarybaseurl: "https://files.boazbarak.org/introtcs"
description: "Textbook on Theoretical Computer Science by Boaz Barak"
lang: "en"
url: 'https://introtcs.org/'
github-repo: "boazbk/tcs"
cover-image: "icons/cover.png"
apple-touch-icon: "icons/apple-touch-icon-120x120.png"
apple-touch-icon-size: 120
favicon: "icons/favicon.ico"
---
--------------------------------------------------------------------------------
/msword.md:
--------------------------------------------------------------------------------
**Disclaimer:** The MS-Word version of this text has significant formatting issues compared to the PDF and HTML versions. 2 | --------------------------------------------------------------------------------