├── ref ├── awk1978.pdf ├── mcilroy.htm ├── awk1line.txt └── hist.html ├── present.awk ├── README.md ├── LICENSE └── slides.txt /ref/awk1978.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/mikepea/awk_tawk/HEAD/ref/awk1978.pdf -------------------------------------------------------------------------------- /present.awk: -------------------------------------------------------------------------------- 1 | #!/usr/bin/awk -f 2 | 3 | BEGIN { FS="\n"; RS=""; } # multiline mode 4 | 5 | function get_key() { RS="\n"; getline key < "-"; RS="" } 6 | function refresh() { system("clear"); print "=== Slide " NR " ===" } 7 | function alen(a) { c=0; for (i in a) c++; return c } 8 | function empty_array(a) { split("", a, ":") } 9 | function print_slide(slide) { 10 | l = alen(slide) 11 | for (i=1;i<=l;i++) { 12 | if ( slide[i] ~ /^@/ ) { continue } 13 | print ( slide[i] == "." ) ? "" : slide[i] 14 | } 15 | } 16 | 17 | { 18 | refresh() 19 | if ($1 ~ /^!/) { 20 | system(substr($1, 2)); empty_array(slide_cache) 21 | } else if ($1 ~ /^#/) next 22 | else { 23 | if ( $1 != "@last") empty_array(slide_cache) 24 | orig_len = alen(slide_cache) 25 | for (i=1; i<=NF; i++) { 26 | slide_cache[orig_len + i] = $i 27 | } 28 | } 29 | } 30 | NR >= ENVIRON["SS"] { print_slide(slide_cache); get_key() } 31 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # awk_tawk 2 | 3 | Presentation on how great AWK is, including a review of The AWK Programming Language 4 | 5 | Also includes a presentation tool, written in AWK. Eating own dog food or what! 6 | 7 | SS=1 awk -f ./present.awk slides.txt 8 | 9 | ... where SS optionally gives the starting slide number. 10 | 11 | Hit enter to advance slides. PRs welcome. 
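The slide file is plain text. Reading `present.awk`, slides are separated by blank lines, a line containing only `.` prints as an empty line, a slide whose first line is `@last` is appended to the previous slide, a slide starting with `#` is skipped, and a slide starting with `!` runs the rest of that line as a shell command. Here is a minimal sketch of that format. The file name `demo_slides.txt` and its contents are made up for illustration, and the check only counts the records awk sees in the same paragraph mode the tool uses:

```shell
# Build a tiny deck (hypothetical file name) in the format present.awk reads.
cat > demo_slides.txt <<'EOF'
.
Title
-----

@last
.
A second point, shown under the title.

A fresh slide
EOF

# Paragraph mode: RS="" makes each blank-line-separated block one record,
# and FS="\n" makes each line of the block one field.
slide_count=$(awk 'BEGIN { RS=""; FS="\n" } END { print NR }' demo_slides.txt)
echo "$slide_count"
```

To view it, run `SS=1 awk -f ./present.awk demo_slides.txt` and press enter to step through.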
12 | 13 | ### Notes and References 14 | 15 | Dennis Ritchie on early Unix history: https://www.bell-labs.com/usr/dmr/www/hist.html 16 | 17 | Doug McIlroy Interview: https://www.princeton.edu/~hos/frs122/precis/mcilroy.htm 18 | 19 | 1978 Awk Paper - 'Awk -- A Pattern Scanning and Processing Language (Second 20 | Edition) (1978)': http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.31.1299 21 | 22 | Handy Awk cheat sheet: http://www.catonmat.net/download/awk.cheat.sheet.pdf 23 | 24 | Archive.org link to The Awk Programming Language PDF: 25 | https://ia802309.us.archive.org/25/items/pdfy-MgN0H1joIoDVoIC7/The_AWK_Programming_Language.pdf 26 | 27 | And a nice list of one-liners: http://www.catonmat.net/blog/wp-content/uploads/2008/09/awk1line.txt 28 | 29 | Karabiner Elements - makes it possible to control this presentation with 30 | a clicker (that emulates a keyboard): https://pqrs.org/osx/karabiner/ 31 | 32 | 33 | -------------------------------------------------------------------------------- /ref/mcilroy.htm: -------------------------------------------------------------------------------- 1 | 2 | 3 | Doug McIlroy 4 | 5 | 6 | 7 | The following is an account of an interview with Doug McIlroy, 8 | head of the department where UNIX started. 9 |
  10 |
       In 1969, Bell Labs decided to 11 | get out of the Multics project.  It became clear that Research was 12 | a drag on the Computing Center's budget.  "They had a million dollars 13 | worth of equipment in the attic that was sitting there being played with 14 | by three people. ...A clean, sharp decision was made to get out.  The 15 | project did not wind down.  It just stopped." 16 | 17 |

       With the experience gained from 18 | Multics, "Ken Thompson began to build his own operating system for the 19 | giant 645, starting from scratch."   In the wee hours of the 20 | night, Thompson would take the system down when nobody was using it. 21 | 22 |

       At the same time, the development 23 | group was still working.  The Computing Center still owned the machines 24 | and was separate from Research.  So Computing Research had no computers.  25 | "Visual and acoustics research had had computers for some time."  26 | They were interested in listening to signals in real time and making digital 27 | filters, but this ate up all the cycles of the machine.  As more and 28 | more minicomputers became available, Visual and Acoustics Research kept getting 29 | them.  "They had nice hardware, and we would comment on how inefficiently 30 | they were using their cycles.  Because they really didn't like making 31 | software, when things got tough, they would just buy another machine.  32 | And if the machines got a little faster, they would just throw out the 33 | old one.  That was the origin of the PDP7." 34 | 35 |

       The PDP7 had an improved graphics 36 | engine, that had been sitting idle.  "That's what Thompson grabbed 37 | and finally used to build early versions of Unix on."  As Thompson 38 | brought along his operating system, Ritchie joined in.  McIlroy also 39 | saw its potential, and, being head of the department, he muscled in. 40 | 41 |

       One place that McIlroy exerted 42 | managerial control of Unix was in pushing for pipes.  The idea of 43 | pipes goes way back.  McIlroy began doing macros in the CACM back 44 | in 1959 or 1960.  Macros involve switching among many data streams.  45 | "You're taking in your input, you suddenly come to a macro call, and that 46 | says, stop taking input from here, go take it from the definition.  47 | In the middle of the definition, you'll find another macro call.  48 | Somewhere I talked of a macro processor as a switchyard for data streams.  49 | ...In 1964, [according to a] paper hanging on Brian's wall, I talked about 50 | screwing together streams like garden hoses." 51 | 52 |

       "On MULTICS, Joe Osanna, ... was 53 | actually beginning to build a way to do input-output plumbing.  Input-output 54 | was interpreted over this idea of the segmented address space in the file 55 | system: files were really just segments of the same old address space.  56 | Nevertheless, you had to do I/O because all the programming languages did 57 | it.   And he was making ways of connecting programs together." 58 | 59 |

       While Thompson and Ritchie were 60 | laying out their file system, McIlroy was "sketching out how to do data 61 | processing by connecting together cascades of processes and looking for 62 | a kind of prefix-notation language for connecting processes together." 63 | 64 |

       Over a period from 1970 to 1972, 65 | McIlroy suggested proposal after proposal.  He recalls the break-through 66 | day:  "Then one day, I came up with a syntax for the shell that went 67 | along with the piping, and Ken said, I'm gonna do it.  He was tired 68 | of hearing all this stuff."  Thompson didn't do exactly what McIlroy 69 | had proposed for the pipe system call, but "invented a slightly better 70 | one.  That finally got changed once more to what we have today.  He 71 | put pipes into Unix."  Thompson also had to change most of the programs, 72 | because up until that time, they couldn't take standard input.  There 73 | wasn't really a need; they all had file arguments. "GREP had a file argument, 74 | CAT had a file argument." 75 | 76 |

       The next morning, "we had this 77 | orgy of  `one liners.'  Everybody had a one liner.  Look 78 | at this, look at that.  ...Everybody started putting forth the UNIX 79 | philosophy.  Write programs that do one thing and do it well.  80 | Write programs to work together.  Write programs that handle text 81 | streams, because that is a universal interface."   Those ideas 82 | which add up to the tool approach, were there in some unformed way before 83 | pipes, but they really came together afterwards.  Pipes became the 84 | catalyst for this UNIX philosophy.  "The tool thing has turned out 85 | to be actually successful.  With pipes, many programs could work together, 86 | and they could work together at a distance." 87 | 88 |

       APL influenced the development 89 | of pipes.  APL did not allow the use of operators with variants, which 90 | many utilities had at the time.  It only took a willingness to throw 91 | in a new separator, the vertical bar.  About four years passed, from 92 | the time they started talking about developing a new separator, to the 93 | time it happened. 94 | 95 |

       To most of the Research group at 96 | Bell Labs, the computing theory was there on the side, while they had functionality 97 | to deal with.  Most of the group members were more computer types 98 | than mathematicians, even though they wrote papers occasionally with mathematical 99 | notation.  McIlroy went to Oxford for a year, solely to "imbibe the 100 | notion of semantics from the source."  The Research group included 101 | system builders like Thompson, and theoretical scientists like Aho.  102 | "Aho handed out paper after paper of slightly different models of parsing 103 | and automata, and that was supported with the overt idea that one day [it] 104 | would feed computing practice." 105 | 106 |

    "There is a case where there's absolutely no doubt 107 | that, overtly, theory fed into what we did.  When the sound theory 108 | of parsing went into a compiler-writing system, it became available to 109 | the masses.  There are lots of other places where theory is an inspiration, 110 | or it's in the back of your mind."  Thompson wrote one famous recognizer, 111 | which is still used in GREP.   And Aho decided that he was going 112 | to take that part of automata theory, and so he built EGREP.  "So, 113 | you have the deterministic [parser] in EGREP, and the nondeterministic one in 114 | GREP." 115 | 116 |

    "I think really that YACC and GREP are what people 117 | hold up as the `real tools' and they are the ones where we find a strong 118 | theoretical underpinning.  TROFF has none.   [While] 119 | it's used, and indispensable, nobody holds it up as a programming 120 | gem." 121 | 122 |

    This concludes what is contained in the interview, 123 | as it relates to Unix. 124 |
  125 | 126 | 127 | -------------------------------------------------------------------------------- /ref/awk1line.txt: -------------------------------------------------------------------------------- 1 | HANDY ONE-LINERS FOR AWK 22 July 2003 2 | compiled by Eric Pement version 0.22 3 | Latest version of this file is usually at: 4 | http://www.student.northpark.edu/pemente/awk/awk1line.txt 5 | 6 | 7 | USAGE: 8 | 9 | Unix: awk '/pattern/ {print "$1"}' # standard Unix shells 10 | DOS/Win: awk '/pattern/ {print "$1"}' # okay for DJGPP compiled 11 | awk "/pattern/ {print \"$1\"}" # required for Mingw32 12 | 13 | Most of my experience comes from version of GNU awk (gawk) compiled for 14 | Win32. Note in particular that DJGPP compilations permit the awk script 15 | to follow Unix quoting syntax '/like/ {"this"}'. However, the user must 16 | know that single quotes under DOS/Windows do not protect the redirection 17 | arrows (<, >) nor do they protect pipes (|). Both are special symbols 18 | for the DOS/CMD command shell and their special meaning is ignored only 19 | if they are placed within "double quotes." Likewise, DOS/Win users must 20 | remember that the percent sign (%) is used to mark DOS/Win environment 21 | variables, so it must be doubled (%%) to yield a single percent sign 22 | visible to awk. 23 | 24 | If I am sure that a script will NOT need to be quoted in Unix, DOS, or 25 | CMD, then I normally omit the quote marks. If an example is peculiar to 26 | GNU awk, the command 'gawk' will be used. Please notify me if you find 27 | errors or new commands to add to this list (total length under 65 28 | characters). I usually try to put the shortest script first. 29 | 30 | FILE SPACING: 31 | 32 | # double space a file 33 | awk '1;{print ""}' 34 | awk 'BEGIN{ORS="\n\n"};1' 35 | 36 | # double space a file which already has blank lines in it. Output file 37 | # should contain no more than one blank line between lines of text. 
38 | # NOTE: On Unix systems, DOS lines which have only CRLF (\r\n) are 39 | # often treated as non-blank, and thus 'NF' alone will return TRUE. 40 | awk 'NF{print $0 "\n"}' 41 | 42 | # triple space a file 43 | awk '1;{print "\n"}' 44 | 45 | NUMBERING AND CALCULATIONS: 46 | 47 | # precede each line by its line number FOR THAT FILE (left alignment). 48 | # Using a tab (\t) instead of space will preserve margins. 49 | awk '{print FNR "\t" $0}' files* 50 | 51 | # precede each line by its line number FOR ALL FILES TOGETHER, with tab. 52 | awk '{print NR "\t" $0}' files* 53 | 54 | # number each line of a file (number on left, right-aligned) 55 | # Double the percent signs if typing from the DOS command prompt. 56 | awk '{printf("%5d : %s\n", NR,$0)}' 57 | 58 | # number each line of file, but only print numbers if line is not blank 59 | # Remember caveats about Unix treatment of \r (mentioned above) 60 | awk 'NF{$0=++a " :" $0};{print}' 61 | awk '{print (NF? ++a " :" :"") $0}' 62 | 63 | # count lines (emulates "wc -l") 64 | awk 'END{print NR}' 65 | 66 | # print the sums of the fields of every line 67 | awk '{s=0; for (i=1; i<=NF; i++) s=s+$i; print s}' 68 | 69 | # add all fields in all lines and print the sum 70 | awk '{for (i=1; i<=NF; i++) s=s+$i}; END{print s}' 71 | 72 | # print every line after replacing each field with its absolute value 73 | awk '{for (i=1; i<=NF; i++) if ($i < 0) $i = -$i; print }' 74 | awk '{for (i=1; i<=NF; i++) $i = ($i < 0) ? 
-$i : $i; print }' 75 | 76 | # print the total number of fields ("words") in all lines 77 | awk '{ total = total + NF }; END {print total}' file 78 | 79 | # print the total number of lines that contain "Beth" 80 | awk '/Beth/{n++}; END {print n+0}' file 81 | 82 | # print the largest first field and the line that contains it 83 | # Intended for finding the longest string in field #1 84 | awk '$1 > max {max=$1; maxline=$0}; END{ print max, maxline}' 85 | 86 | # print the number of fields in each line, followed by the line 87 | awk '{ print NF ":" $0 } ' 88 | 89 | # print the last field of each line 90 | awk '{ print $NF }' 91 | 92 | # print the last field of the last line 93 | awk '{ field = $NF }; END{ print field }' 94 | 95 | # print every line with more than 4 fields 96 | awk 'NF > 4' 97 | 98 | # print every line where the value of the last field is > 4 99 | awk '$NF > 4' 100 | 101 | 102 | TEXT CONVERSION AND SUBSTITUTION: 103 | 104 | # IN UNIX ENVIRONMENT: convert DOS newlines (CR/LF) to Unix format 105 | awk '{sub(/\r$/,"");print}' # assumes EACH line ends with Ctrl-M 106 | 107 | # IN UNIX ENVIRONMENT: convert Unix newlines (LF) to DOS format 108 | awk '{sub(/$/,"\r");print}' 109 | 110 | # IN DOS ENVIRONMENT: convert Unix newlines (LF) to DOS format 111 | awk 1 112 | 113 | # IN DOS ENVIRONMENT: convert DOS newlines (CR/LF) to Unix format 114 | # Cannot be done with DOS versions of awk, other than gawk: 115 | gawk -v BINMODE="w" '1' infile >outfile 116 | 117 | # Use "tr" instead. 
118 | tr -d \r <infile >outfile # GNU tr version 1.22 or higher 119 | 120 | # delete leading whitespace (spaces, tabs) from front of each line 121 | # aligns all text flush left 122 | awk '{sub(/^[ \t]+/, ""); print}' 123 | 124 | # delete trailing whitespace (spaces, tabs) from end of each line 125 | awk '{sub(/[ \t]+$/, "");print}' 126 | 127 | # delete BOTH leading and trailing whitespace from each line 128 | awk '{gsub(/^[ \t]+|[ \t]+$/,"");print}' 129 | awk '{$1=$1;print}' # also removes extra space between fields 130 | 131 | # insert 5 blank spaces at beginning of each line (make page offset) 132 | awk '{sub(/^/, "     ");print}' 133 | 134 | # align all text flush right on a 79-column width 135 | awk '{printf "%79s\n", $0}' file* 136 | 137 | # center all text on a 79-character width 138 | awk '{l=length();s=int((79-l)/2); printf "%"(s+l)"s\n",$0}' file* 139 | 140 | # substitute (find and replace) "foo" with "bar" on each line 141 | awk '{sub(/foo/,"bar");print}' # replaces only 1st instance 142 | gawk '{$0=gensub(/foo/,"bar",4);print}' # replaces only 4th instance 143 | awk '{gsub(/foo/,"bar");print}' # replaces ALL instances in a line 144 | 145 | # substitute "foo" with "bar" ONLY for lines which contain "baz" 146 | awk '/baz/{gsub(/foo/, "bar")};{print}' 147 | 148 | # substitute "foo" with "bar" EXCEPT for lines which contain "baz" 149 | awk '!/baz/{gsub(/foo/, "bar")};{print}' 150 | 151 | # change "scarlet" or "ruby" or "puce" to "red" 152 | awk '{gsub(/scarlet|ruby|puce/, "red"); print}' 153 | 154 | # reverse order of lines (emulates "tac") 155 | awk '{a[i++]=$0} END {for (j=i-1; j>=0;) print a[j--] }' file* 156 | 157 | # if a line ends with a backslash, append the next line to it 158 | # (fails if there are multiple lines ending with backslash...) 
159 | awk '/\\$/ {sub(/\\$/,""); getline t; print $0 t; next}; 1' file* 160 | 161 | # print and sort the login names of all users 162 | awk -F ":" '{ print $1 | "sort" }' /etc/passwd 163 | 164 | # print the first 2 fields, in opposite order, of every line 165 | awk '{print $2, $1}' file 166 | 167 | # switch the first 2 fields of every line 168 | awk '{temp = $1; $1 = $2; $2 = temp; print}' file 169 | 170 | # print every line, deleting the second field of that line 171 | awk '{ $2 = ""; print }' 172 | 173 | # print in reverse order the fields of every line 174 | awk '{for (i=NF; i>0; i--) printf("%s ",$i);printf ("\n")}' file 175 | 176 | # remove duplicate, consecutive lines (emulates "uniq") 177 | awk 'a !~ $0; {a=$0}' 178 | 179 | # remove duplicate, nonconsecutive lines 180 | awk '! a[$0]++' # most concise script 181 | awk '!($0 in a) {a[$0];print}' # most efficient script 182 | 183 | # concatenate every 5 lines of input, using a comma separator 184 | # between fields 185 | awk 'ORS=NR%5?",":"\n"' file 186 | 187 | 188 | 189 | SELECTIVE PRINTING OF CERTAIN LINES: 190 | 191 | # print first 10 lines of file (emulates behavior of "head") 192 | awk 'NR < 11' 193 | 194 | # print first line of file (emulates "head -1") 195 | awk 'NR>1{exit};1' 196 | 197 | # print the last 2 lines of a file (emulates "tail -2") 198 | awk '{y=x "\n" $0; x=$0};END{print y}' 199 | 200 | # print the last line of a file (emulates "tail -1") 201 | awk 'END{print}' 202 | 203 | # print only lines which match regular expression (emulates "grep") 204 | awk '/regex/' 205 | 206 | # print only lines which do NOT match regex (emulates "grep -v") 207 | awk '!/regex/' 208 | 209 | # print the line immediately before a regex, but not the line 210 | # containing the regex 211 | awk '/regex/{print x};{x=$0}' 212 | awk '/regex/{print (x=="" ? 
"match on line 1" : x)};{x=$0}' 213 | 214 | # print the line immediately after a regex, but not the line 215 | # containing the regex 216 | awk '/regex/{getline;print}' 217 | 218 | # grep for AAA and BBB and CCC (in any order) 219 | awk '/AAA/; /BBB/; /CCC/' 220 | 221 | # grep for AAA and BBB and CCC (in that order) 222 | awk '/AAA.*BBB.*CCC/' 223 | 224 | # print only lines of 65 characters or longer 225 | awk 'length > 64' 226 | 227 | # print only lines of less than 65 characters 228 | awk 'length < 64' 229 | 230 | # print section of file from regular expression to end of file 231 | awk '/regex/,0' 232 | awk '/regex/,EOF' 233 | 234 | # print section of file based on line numbers (lines 8-12, inclusive) 235 | awk 'NR==8,NR==12' 236 | 237 | # print line number 52 238 | awk 'NR==52' 239 | awk 'NR==52 {print;exit}' # more efficient on large files 240 | 241 | # print section of file between two regular expressions (inclusive) 242 | awk '/Iowa/,/Montana/' # case sensitive 243 | 244 | 245 | SELECTIVE DELETION OF CERTAIN LINES: 246 | 247 | # delete ALL blank lines from a file (same as "grep '.' ") 248 | awk NF 249 | awk '/./' 250 | 251 | 252 | CREDITS AND THANKS: 253 | 254 | Special thanks to Peter S. Tillier for helping me with the first release 255 | of this FAQ file. 256 | 257 | For additional syntax instructions, including the way to apply editing 258 | commands from a disk file instead of the command line, consult: 259 | 260 | "sed & awk, 2nd Edition," by Dale Dougherty and Arnold Robbins 261 | O'Reilly, 1997 262 | "UNIX Text Processing," by Dale Dougherty and Tim O'Reilly 263 | Hayden Books, 1987 264 | "Effective awk Programming, 3rd Edition." by Arnold Robbins 265 | O'Reilly, 2001 266 | 267 | To fully exploit the power of awk, one must understand "regular 268 | expressions." For detailed discussion of regular expressions, see 269 | "Mastering Regular Expressions, 2d edition" by Jeffrey Friedl 270 | (O'Reilly, 2002). 
271 | 272 | The manual ("man") pages on Unix systems may be helpful (try "man awk", 273 | "man nawk", "man regexp", or the section on regular expressions in "man 274 | ed"), but man pages are notoriously difficult. They are not written to 275 | teach awk use or regexps to first-time users, but as a reference text 276 | for those already acquainted with these tools. 277 | 278 | USE OF '\t' IN awk SCRIPTS: For clarity in documentation, we have used 279 | the expression '\t' to indicate a tab character (0x09) in the scripts. 280 | All versions of awk, even the UNIX System 7 version should recognize 281 | the '\t' abbreviation. 282 | 283 | #---end of file--- 284 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | Apache License 2 | Version 2.0, January 2004 3 | http://www.apache.org/licenses/ 4 | 5 | TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION 6 | 7 | 1. Definitions. 8 | 9 | "License" shall mean the terms and conditions for use, reproduction, 10 | and distribution as defined by Sections 1 through 9 of this document. 11 | 12 | "Licensor" shall mean the copyright owner or entity authorized by 13 | the copyright owner that is granting the License. 14 | 15 | "Legal Entity" shall mean the union of the acting entity and all 16 | other entities that control, are controlled by, or are under common 17 | control with that entity. For the purposes of this definition, 18 | "control" means (i) the power, direct or indirect, to cause the 19 | direction or management of such entity, whether by contract or 20 | otherwise, or (ii) ownership of fifty percent (50%) or more of the 21 | outstanding shares, or (iii) beneficial ownership of such entity. 22 | 23 | "You" (or "Your") shall mean an individual or Legal Entity 24 | exercising permissions granted by this License. 
25 | 26 | "Source" form shall mean the preferred form for making modifications, 27 | including but not limited to software source code, documentation 28 | source, and configuration files. 29 | 30 | "Object" form shall mean any form resulting from mechanical 31 | transformation or translation of a Source form, including but 32 | not limited to compiled object code, generated documentation, 33 | and conversions to other media types. 34 | 35 | "Work" shall mean the work of authorship, whether in Source or 36 | Object form, made available under the License, as indicated by a 37 | copyright notice that is included in or attached to the work 38 | (an example is provided in the Appendix below). 39 | 40 | "Derivative Works" shall mean any work, whether in Source or Object 41 | form, that is based on (or derived from) the Work and for which the 42 | editorial revisions, annotations, elaborations, or other modifications 43 | represent, as a whole, an original work of authorship. For the purposes 44 | of this License, Derivative Works shall not include works that remain 45 | separable from, or merely link (or bind by name) to the interfaces of, 46 | the Work and Derivative Works thereof. 47 | 48 | "Contribution" shall mean any work of authorship, including 49 | the original version of the Work and any modifications or additions 50 | to that Work or Derivative Works thereof, that is intentionally 51 | submitted to Licensor for inclusion in the Work by the copyright owner 52 | or by an individual or Legal Entity authorized to submit on behalf of 53 | the copyright owner. 
For the purposes of this definition, "submitted" 54 | means any form of electronic, verbal, or written communication sent 55 | to the Licensor or its representatives, including but not limited to 56 | communication on electronic mailing lists, source code control systems, 57 | and issue tracking systems that are managed by, or on behalf of, the 58 | Licensor for the purpose of discussing and improving the Work, but 59 | excluding communication that is conspicuously marked or otherwise 60 | designated in writing by the copyright owner as "Not a Contribution." 61 | 62 | "Contributor" shall mean Licensor and any individual or Legal Entity 63 | on behalf of whom a Contribution has been received by Licensor and 64 | subsequently incorporated within the Work. 65 | 66 | 2. Grant of Copyright License. Subject to the terms and conditions of 67 | this License, each Contributor hereby grants to You a perpetual, 68 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 69 | copyright license to reproduce, prepare Derivative Works of, 70 | publicly display, publicly perform, sublicense, and distribute the 71 | Work and such Derivative Works in Source or Object form. 72 | 73 | 3. Grant of Patent License. Subject to the terms and conditions of 74 | this License, each Contributor hereby grants to You a perpetual, 75 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 76 | (except as stated in this section) patent license to make, have made, 77 | use, offer to sell, sell, import, and otherwise transfer the Work, 78 | where such license applies only to those patent claims licensable 79 | by such Contributor that are necessarily infringed by their 80 | Contribution(s) alone or by combination of their Contribution(s) 81 | with the Work to which such Contribution(s) was submitted. 
If You 82 | institute patent litigation against any entity (including a 83 | cross-claim or counterclaim in a lawsuit) alleging that the Work 84 | or a Contribution incorporated within the Work constitutes direct 85 | or contributory patent infringement, then any patent licenses 86 | granted to You under this License for that Work shall terminate 87 | as of the date such litigation is filed. 88 | 89 | 4. Redistribution. You may reproduce and distribute copies of the 90 | Work or Derivative Works thereof in any medium, with or without 91 | modifications, and in Source or Object form, provided that You 92 | meet the following conditions: 93 | 94 | (a) You must give any other recipients of the Work or 95 | Derivative Works a copy of this License; and 96 | 97 | (b) You must cause any modified files to carry prominent notices 98 | stating that You changed the files; and 99 | 100 | (c) You must retain, in the Source form of any Derivative Works 101 | that You distribute, all copyright, patent, trademark, and 102 | attribution notices from the Source form of the Work, 103 | excluding those notices that do not pertain to any part of 104 | the Derivative Works; and 105 | 106 | (d) If the Work includes a "NOTICE" text file as part of its 107 | distribution, then any Derivative Works that You distribute must 108 | include a readable copy of the attribution notices contained 109 | within such NOTICE file, excluding those notices that do not 110 | pertain to any part of the Derivative Works, in at least one 111 | of the following places: within a NOTICE text file distributed 112 | as part of the Derivative Works; within the Source form or 113 | documentation, if provided along with the Derivative Works; or, 114 | within a display generated by the Derivative Works, if and 115 | wherever such third-party notices normally appear. The contents 116 | of the NOTICE file are for informational purposes only and 117 | do not modify the License. 
You may add Your own attribution 118 | notices within Derivative Works that You distribute, alongside 119 | or as an addendum to the NOTICE text from the Work, provided 120 | that such additional attribution notices cannot be construed 121 | as modifying the License. 122 | 123 | You may add Your own copyright statement to Your modifications and 124 | may provide additional or different license terms and conditions 125 | for use, reproduction, or distribution of Your modifications, or 126 | for any such Derivative Works as a whole, provided Your use, 127 | reproduction, and distribution of the Work otherwise complies with 128 | the conditions stated in this License. 129 | 130 | 5. Submission of Contributions. Unless You explicitly state otherwise, 131 | any Contribution intentionally submitted for inclusion in the Work 132 | by You to the Licensor shall be under the terms and conditions of 133 | this License, without any additional terms or conditions. 134 | Notwithstanding the above, nothing herein shall supersede or modify 135 | the terms of any separate license agreement you may have executed 136 | with Licensor regarding such Contributions. 137 | 138 | 6. Trademarks. This License does not grant permission to use the trade 139 | names, trademarks, service marks, or product names of the Licensor, 140 | except as required for reasonable and customary use in describing the 141 | origin of the Work and reproducing the content of the NOTICE file. 142 | 143 | 7. Disclaimer of Warranty. Unless required by applicable law or 144 | agreed to in writing, Licensor provides the Work (and each 145 | Contributor provides its Contributions) on an "AS IS" BASIS, 146 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or 147 | implied, including, without limitation, any warranties or conditions 148 | of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A 149 | PARTICULAR PURPOSE. 
You are solely responsible for determining the 150 | appropriateness of using or redistributing the Work and assume any 151 | risks associated with Your exercise of permissions under this License. 152 | 153 | 8. Limitation of Liability. In no event and under no legal theory, 154 | whether in tort (including negligence), contract, or otherwise, 155 | unless required by applicable law (such as deliberate and grossly 156 | negligent acts) or agreed to in writing, shall any Contributor be 157 | liable to You for damages, including any direct, indirect, special, 158 | incidental, or consequential damages of any character arising as a 159 | result of this License or out of the use or inability to use the 160 | Work (including but not limited to damages for loss of goodwill, 161 | work stoppage, computer failure or malfunction, or any and all 162 | other commercial damages or losses), even if such Contributor 163 | has been advised of the possibility of such damages. 164 | 165 | 9. Accepting Warranty or Additional Liability. While redistributing 166 | the Work or Derivative Works thereof, You may choose to offer, 167 | and charge a fee for, acceptance of support, warranty, indemnity, 168 | or other liability obligations and/or rights consistent with this 169 | License. However, in accepting such obligations, You may act only 170 | on Your own behalf and on Your sole responsibility, not on behalf 171 | of any other Contributor, and only if You agree to indemnify, 172 | defend, and hold each Contributor harmless for any liability 173 | incurred by, or claims asserted against, such Contributor by reason 174 | of your accepting any such warranty or additional liability. 175 | 176 | END OF TERMS AND CONDITIONS 177 | 178 | APPENDIX: How to apply the Apache License to your work. 179 | 180 | To apply the Apache License to your work, attach the following 181 | boilerplate notice, with the fields enclosed by brackets "{}" 182 | replaced with your own identifying information. 
(Don't include 183 | the brackets!) The text should be enclosed in the appropriate 184 | comment syntax for the file format. We also recommend that a 185 | file or class name and description of purpose be included on the 186 | same "printed page" as the copyright notice for easier 187 | identification within third-party archives. 188 | 189 | Copyright {yyyy} {name of copyright owner} 190 | 191 | Licensed under the Apache License, Version 2.0 (the "License"); 192 | you may not use this file except in compliance with the License. 193 | You may obtain a copy of the License at 194 | 195 | http://www.apache.org/licenses/LICENSE-2.0 196 | 197 | Unless required by applicable law or agreed to in writing, software 198 | distributed under the License is distributed on an "AS IS" BASIS, 199 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 200 | See the License for the specific language governing permissions and 201 | limitations under the License. 202 | -------------------------------------------------------------------------------- /slides.txt: -------------------------------------------------------------------------------- 1 | . 2 | . 3 | . 4 | . 5 | . 6 | . 7 | ___ ___ __ 8 | / \ \ / / |/ / 9 | / _ \ \ /\ / /| ' / 10 | / ___ \ V V / | . \ 11 | /_/ \_\_/\_/ |_|\_\ 12 | . 13 | _____ ___ ___ __ 14 | |_ _|/ \ \ / / |/ / 15 | | | / _ \ \ /\ / /| ' / 16 | | |/ ___ \ V V / | . \ 17 | |_/_/ \_\_/\_/ |_|\_\ 18 | . 19 | by 20 | . 21 | @mikepea 22 | 23 | . 24 | . 25 | . 26 | . 27 | . 28 | . 29 | . 30 | AWK? Wassat? 31 | ------------ 32 | . 33 | 34 | @last 35 | . 36 | A data transformation and reporting tool. 37 | Reads records from files/stdin, and interprets 38 | each record as a list of fields which can then 39 | be manipulated. 40 | 41 | @last 42 | . 43 | Awk is a programming language designed to 44 | make many common information retrieval and text 45 | manipulation tasks easy to state and to perform. 46 | ( y/awk-1978-paper ) 47 | 48 | @last 49 | . 
50 | Written in 1977, by Alfred Aho, Peter Weinberger, 51 | & Brian Kernighan, 52 | 53 | @last 54 | . 55 | Hence AWK! 56 | 57 | . 58 | . 59 | . 60 | . 61 | . 62 | . 63 | . 64 | AWK? Why? 65 | --------- 66 | . 67 | . 68 | 69 | @last 70 | . 71 | Well, I <3 Awk. 72 | 73 | @last 74 | . 75 | First language I ever wrote anything 'commercial' in. 76 | In 1994, :ahem:. 77 | 78 | @last 79 | . 80 | But I'm not alone - y/cantrill-bigdata (skip to 19:30) 81 | 82 | @last 83 | . 84 | THE [grey] AWK BOOK - y/awk-book 85 | 86 | . 87 | . 88 | . 89 | . 90 | . 91 | . 92 | . 93 | AWK? Why? 94 | --------- 95 | . 96 | . 97 | . 98 | grep ' 200 ' access_log | awk '{print $1}' 99 | 100 | @last 101 | . 102 | :( 103 | 104 | @last 105 | . 106 | But I always remember that it can do loads more! 107 | 108 | . 109 | A Bit Of History 110 | ---------------- 111 | . 112 | 113 | 114 | @last 115 | 1970: Unics [sic] is born. Ken Thompson, Dennis Ritchie at Bell Labs. 116 | Born out of Multics, after much financial wrangling to get 117 | a spare PDP-7. 118 | 119 | @last 120 | First step - Space Travel game! (Then copy, print, edit, delete :) 121 | 122 | @last 123 | . 124 | 1971: man pages appear, at the request of Doug McIlroy (their manager ;) 125 | 126 | @last 127 | . 128 | 1973: Unix pipelines invented (along with sed, grep and tr) 129 | Effectively creating the Unix Philosophy, and 'an unforgettable 130 | orgy of one-liners'. Doug McIlroy <3 131 | 132 | @last 133 | . 134 | 1975: Lex and Yacc are created, enabling tools like 'bc' 135 | 136 | @last 137 | . 138 | 1977: Awk is born. Essentially the first scripting language on Unix! 139 | . 140 | "Awk was originally designed... in part as an experiment to see 141 | how the Unix tools grep and sed could be generalized to deal with 142 | numbers as well as text... based on our interest in regular 143 | expressions" 144 | 145 | @last 146 | . 147 | 1985: Brian Kernighan creates (and still maintains!) New AWK (nawk). 148 | This is on your Macs!
149 | 150 | @last 151 | . 152 | 1988: POSIX standard created, with awk as an included component. 153 | . 154 | 155 | 156 | 157 | . 158 | . 159 | . 160 | . 161 | . 162 | . 163 | . 164 | . 165 | . 166 | 167 | . 168 | . 169 | The AWK Programming Language, 1988 170 | ---------------------------------- 171 | 172 | @last 173 | . 174 | . 175 | <3 <3 <3 <3 <3 <3 <3 <3 <3 <3 <3 <3 176 | <3 <3 177 | <3 Typeset using a DEC VAX 8550 <3 178 | <3 running 9th ed. UNIX® <3 179 | <3 <3 180 | <3 <3 <3 <3 <3 <3 <3 <3 <3 <3 <3 <3 181 | 182 | 183 | 184 | 185 | 186 | . 187 | . 188 | The AWK Programming Language, 1988 189 | ---------------------------------- 190 | . 191 | An absolute delight of a technical book. 192 | 193 | @last 194 | . 195 | Still amazingly current. Awk has evolved (gawk, nawk) 196 | but is still fundamentally the same language. 197 | 198 | @last 199 | . 200 | Has got an 'Algorithms' section! Interview answers in AWK! 201 | 202 | @last 203 | . 204 | Has a chapter on writing DSLs, which it calls 205 | 'Little Languages' <3 206 | 207 | @last 208 | . 209 | Has a section on creating an RDBMS in AWK, with a custom 210 | query language! 211 | 212 | 213 | 214 | 215 | 216 | 217 | . 218 | . 219 | Understand awk a bit betterer 220 | ----------------------------- 221 | . 222 | awk [-Fs] '{program}' input_files 223 | awk [-Fs] -f {script} input_files 224 | . 225 | ... where a program looks like: 226 | 227 | @last 228 | . 229 | BEGIN { 230 | # do initialisation stuff 231 | } 232 | . 233 | conditional { 234 | # do stuff on records matching condition 235 | } 236 | . 237 | END { 238 | # finish up. Like print a report. 239 | } 240 | 241 | . 242 | . 243 | Some important built-in variables 244 | --------------------------------- 245 | . 246 | 247 | @last 248 | . 249 | $1 .. $n - field references 250 | $i - field referenced by integer i 251 | 252 | @last 253 | NF - number of fields in record 254 | 255 | @last 256 | $NF - last field in record 257 | 258 | @last 259 | .
260 | NR - number of records read so far 261 | ('current line') 262 | FNR - number of records read so far 263 | in this file 264 | 265 | @last 266 | . 267 | RS - input record separator (default \n) 268 | FS - input field separator (default whitespace) 269 | 270 | @last 271 | . 272 | ORS - output record separator (default \n) 273 | OFS - output field separator (default single space) 274 | 275 | 276 | . 277 | . 278 | An Unforgettable Orgy Of One-Liners 279 | ----------------------------------- 280 | 281 | . 282 | . 283 | An Unforgettable(*) Orgy Of One-Liners 284 | -------------------------------------- 285 | (*) jk, you're totally gonna forget these. 286 | see github.com/mikepea/awk_tawk for a refresh 287 | 288 | @last 289 | . 290 | awk 'END { print NR }' 291 | 292 | @last 293 | # wc -l 294 | 295 | @last 296 | . 297 | awk 'NR == 10000000' 298 | 299 | @last 300 | # print 10,000,000th line 301 | 302 | @last 303 | . 304 | awk '{ print $NF }' 305 | 306 | @last 307 | # print last field of every line 308 | 309 | @last 310 | . 311 | awk '{ n = n + NF } END { print n }' 312 | 313 | @last 314 | # wc -w 315 | 316 | @last 317 | . 318 | awk 'BEGIN { FS="" } { n = n + NF + 1 } END { print n }' 319 | 320 | @last 321 | # wc -c 322 | 323 | @last 324 | . 325 | 326 | 327 | . 328 | . 329 | An Unforgettable(*) Orgy Of One-Liners 330 | -------------------------------------- 331 | (*) jk, you're totally gonna forget these. 332 | see github.com/mikepea/awk_tawk for a refresh 333 | 334 | @last 335 | . 336 | awk '$9 == 200 { print $1 }' access_log 337 | 338 | @last 339 | # show IPs that had 200s from an Apache access log 340 | 341 | @last 342 | . 343 | ls -l | awk '{ sum = sum + $5 } END {print sum}' 344 | 345 | @last 346 | # print the total bytes of a list of files 347 | 348 | @last 349 | . 350 | awk 'BEGIN { FS=":" } $3 > max { max = $3 } END {print max}' passwd 351 | 352 | @last 353 | # print highest uid from passwd file 354 | 355 | 356 | @last 357 | .
358 | awk '{ $1 = NR; print}' 359 | # replace first field with line number 360 | 361 | @last 362 | . 363 | awk '{ $2 = ""; print}' 364 | # erase the second field 365 | 366 | 367 | . 368 | . 369 | An Unforgettable(*) Orgy Of One-Liners 370 | -------------------------------------- 371 | (*) jk, you're totally gonna forget these. 372 | see github.com/mikepea/awk_tawk for a refresh 373 | . 374 | . 375 | awk 'BEGIN { print "Hello World!" }' 376 | @last 377 | # the BEGIN is important! 378 | 379 | @last 380 | . 381 | awk 'BEGIN { srand(); t=srand(); print t }' 382 | 383 | @last 384 | # date +'%s' # WHAAAAT!!?!1111! 385 | . 386 | 387 | @last 388 | . 389 | Thank @codymello for that one. POSIX! 390 | 391 | . 392 | . 393 | What else? 394 | ---------- 395 | 396 | . 397 | . 398 | . 399 | . 400 | _ __ ___ __ _ _____ ___ __ ___| | 401 | | '__/ _ \/ _` |/ _ \ \/ / '_ \/ __| | 402 | | | | __/ (_| | __/> <| |_) \__ \_| 403 | |_| \___|\__, |\___/_/\_\ .__/|___(_) 404 | |___/ |_| 405 | . 406 | OMG! THEY'RE SO NEW AND EXCITING! (*) 407 | 408 | @last 409 | . 410 | . 411 | . 412 | (*) actually invented in 1956 by Stephen Cole Kleene, a mathematician 413 | . 414 | ... and then repurposed by Ken Thompson in 'ed', and by extension 'grep' 415 | . 416 | . 417 | . 418 | 419 | 420 | . 421 | . 422 | Regexps in AWK 423 | -------------- 424 | . 425 | * Basically no different to sed/grep. Yey standards! 426 | 427 | 428 | @last 429 | * I'm amazed at how little they've changed 430 | since 1988 431 | . 432 | 433 | @last 434 | * Useful in record matching conditionals 435 | . 436 | /regex/ { 437 | # do stuff on records matching regex 438 | } 439 | . 440 | 441 | @last 442 | /from/,/to/ { 443 | # do stuff on records between regexes 444 | } 445 | 446 | @last 447 | . 448 | * No submatch variables :( 449 | i.e. (.+) => \1 in sed 450 | . 451 | 452 | 453 | . 454 | . 455 | Useful built-in functions 456 | ------------------------- 457 | . 458 | .
459 | 460 | @last 461 | sub(/regex/,s) # leftmost sub in $0 462 | sub(/regex/,s,t) # leftmost sub in t 463 | gsub(/regex/,s) # global sub in $0 464 | gsub(/regex/,s,t) # global sub in t 465 | . 466 | 467 | @last 468 | substr(s,p) # return suffix of s, at pos p 469 | substr(s,p,n) # return substr of s, len n, at pos p 470 | . 471 | 472 | Useful built-in functions 473 | ------------------------- 474 | . 475 | . 476 | 477 | @last 478 | srand(s) # seed random number generator with 's' 479 | rand() # return random number 0 <= n < 1 480 | . 481 | 482 | @last 483 | index(s,t) # return pos of string t in s. 0 otherwise 484 | length(s) # return length of string 485 | match(s,/regex/) # return index of match or 0 if not. 486 | . 487 | 488 | @last 489 | . 490 | awk 'length($0) > 80' myprog.py 491 | # print lines >80 chars 492 | 493 | Useful built-in functions 494 | ------------------------- 495 | . 496 | . 497 | 498 | @last 499 | printf(fmt, expr-list) 500 | sprintf(fmt, expr-list) 501 | 502 | . 503 | . 504 | . 505 | How the hell did I not know about printf?! 506 | ----------------------------------------- 507 | . 508 | . 509 | 510 | @last 511 | * Works just like regular printf / sprintf 512 | . 513 | awk '{ printf("Hello %s %s", $3, $4) }' 514 | awk '{ fullname = sprintf("%s %s", $3, $4) }' 515 | 516 | 517 | 518 | . 519 | . 520 | Arrays 521 | ------ 522 | . 523 | . 524 | * associative, so really a dictionary 525 | . 526 | * one dimensional, though the book gives 527 | a workaround for multi-dimensional (and this may be a 528 | feature of modern gawk?) 529 | . 530 | 531 | @last 532 | arr["thing"] = value 533 | . 534 | 535 | @last 536 | records[NR] = $0 537 | . 538 | 539 | @last 540 | for (key in arr) { 541 | print key, arr[key] 542 | } 543 | . 544 | 545 | @last 546 | split($1, arr, ",") 547 | . 548 | 549 | @last 550 | if (arr[key] > 500) 551 | delete arr[key] 552 | 553 | 554 | 555 | . 556 | . 557 | Custom functions are a thing 558 | ---------------------------- 559 | . 560 | .
561 | * As you'd expect, but have probably never used. 562 | 563 | @last 564 | * variables are passed as a copy 565 | * arrays are passed by reference 566 | . 567 | 568 | @last 569 | function add(arg1, arg2) { 570 | return arg1 + arg2 571 | } 572 | 573 | @last 574 | . 575 | function alen(a) { 576 | c=0; 577 | for (i in a) c++; 578 | return c; 579 | } 580 | 581 | @last 582 | . 583 | function clear_array(a) { split("", a, ":"); } 584 | 585 | @last 586 | . 587 | function get_key() { RS="\n"; getline key < "-"; RS="" } 588 | 589 | . 590 | . 591 | Interaction with other programs 592 | ------------------------------- 593 | . 594 | . 595 | 596 | @last 597 | * system() works as you'd expect 598 | . 599 | system("cat " $2) 600 | 601 | @last 602 | . 603 | function refresh() { system("clear"); print ""; } 604 | 605 | @last 606 | . 607 | * also can use a pipe! 608 | . 609 | while ("who" | getline) { 610 | num_users++ 611 | } 612 | . 613 | . 614 | . 615 | 616 | . 617 | . 618 | What's crappy about Awk? 619 | ------------------------ 620 | . 621 | @last 622 | * scope is global. Meh. 623 | 624 | @last 625 | . 626 | * $0 is a string 627 | 628 | @last 629 | . 630 | * OFS isn't as useful as I thought as a result :( 631 | 632 | 633 | . 634 | . 635 | . 636 | . 637 | . 638 | . 639 | . 640 | The AWK Programming Language, 1988 641 | ---------------------------------- 642 | . 643 | "Awk is not a solution to every programming problem, 644 | but it's an indispensable part of a programmer's 645 | toolbox, especially on Unix, where easy connection 646 | of tools is a way of life." 647 | 648 | 649 | !awk '{print " ",$0}' present.awk 650 | 651 | -------------------------------------------------------------------------------- /ref/hist.html: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | Early Unix history and evolution 5 | 6 | 7 |

The Evolution of the Unix Time-sharing System* 8 |

9 |
Dennis M. Ritchie
10 | Bell Laboratories, Murray Hill, NJ, 07974 11 |
12 |

ABSTRACT

13 | This paper presents a brief history of the early development of the Unix operating 14 | system. 15 | It concentrates on the evolution of the file system, 16 | the process-control mechanism, 17 | and the idea of pipelined commands. 18 | Some attention is paid to social conditions 19 | during the development of the system. 20 |
21 |

22 |
23 |
24 | NOTE: *This paper was first presented at the Language 25 | Design and Programming Methodology conference at Sydney, 26 | Australia, September 1979. 27 | The conference proceedings were published as 28 | Lecture Notes in Computer Science #79: 29 | Language Design and Programming Methodology, 30 | Springer-Verlag, 1980. 31 | This rendition is based on a reprinted version appearing in 32 | AT&T Bell Laboratories Technical Journal 33 | 63 34 | No. 6 Part 2, October 1984, pp. 1577-93. 35 |
36 |
37 |

Introduction 38 |

39 |

40 | During the past few years, the Unix operating system 41 | has come into wide use, 42 | so wide that its very name has become a trademark of Bell Laboratories. 43 | Its important characteristics have become known to many people. 44 | It has suffered much rewriting and tinkering 45 | since the first publication describing it in 1974 [1], 46 | but few fundamental changes. 47 | However, Unix was born in 1969 not 1974, 48 | and the account of its development 49 | makes a little-known and perhaps instructive story. 50 | This paper presents a technical and social history 51 | of the evolution of the system. 52 |

53 |

Origins 54 |

55 |

56 | For computer science at Bell Laboratories, the period 1968-1969 was 57 | somewhat unsettled. 58 | The main reason for this was the slow, though clearly 59 | inevitable, withdrawal of the Labs from the Multics project. 60 | To the Labs computing community as a whole, 61 | the problem was the increasing obviousness of the failure of Multics to deliver 62 | promptly any sort of usable system, 63 | let alone the panacea envisioned earlier. 64 | For much of this time, 65 | the Murray Hill Computer Center was also running a costly 66 | GE 645 machine that inadequately simulated 67 | the GE 635. 68 | Another shake-up that occurred during this period 69 | was the organizational separation of computing services 70 | and computing research. 71 |

72 |

73 | From the point of view of the group that was to 74 | be most involved in the beginnings of Unix 75 | (K. Thompson, Ritchie, M. D. McIlroy, J. F. Ossanna), 76 | the decline and fall of Multics had a directly felt effect. 77 | We were among the last Bell Laboratories holdouts 78 | actually working on Multics, 79 | so we still felt some sort of stake in its success. 80 | More important, the convenient interactive computing service 81 | that Multics had promised to the entire community 82 | was in fact available to our limited group, 83 | at first under the CTSS system used to develop Multics, 84 | and later under Multics itself. 85 | Even though Multics could not then 86 | support many users, it could support us, 87 | albeit at exorbitant cost. 88 | We didn't want to lose the pleasant niche we occupied, 89 | because no similar ones were available; 90 | even the time-sharing service that would later be offered 91 | under GE's operating system 92 | did not exist. 93 | What we wanted to preserve was not just a good environment in which to 94 | do programming, but a system around which a fellowship could form. 95 | We knew from experience 96 | that 97 | the essence of communal computing, as supplied by remote-access, time-shared machines, 98 | is not just to type programs 99 | into a terminal instead of a keypunch, 100 | but to encourage close communication. 101 |

102 |

103 | Thus, during 1969, 104 | we began trying to find an alternative to Multics. 105 | The search took several forms. 106 | Throughout 1969 we (mainly Ossanna, Thompson, Ritchie) 107 | lobbied intensively for the purchase of 108 | a medium-scale machine 109 | for which we promised to write an operating system; 110 | the machines we suggested were the DEC PDP-10 111 | and the SDS (later Xerox) Sigma 7. 112 | The effort was frustrating, because our proposals were never 113 | clearly and finally turned down, 114 | but yet were certainly never accepted. 115 | Several times it seemed we were very near success. 116 | The final blow to this effort came when we 117 | presented an exquisitely complicated proposal, 118 | designed to minimize financial outlay, 119 | that involved some outright purchase, some third-party lease, 120 | and a plan to turn in a DEC KA-10 processor on the soon-to-be-announced 121 | and more capable KI-10. 122 | The proposal was rejected, and rumor soon had it that 123 | W. O. Baker (then vice-president of Research) 124 | had reacted to it with the comment `Bell Laboratories 125 | just doesn't do business this way!' 126 |

127 |

128 | Actually, it is perfectly obvious in retrospect 129 | (and should have been at the time) 130 | that we were asking the Labs to spend too much money 131 | on too few people with too vague a plan. 132 | Moreover, I am quite sure that at that time operating systems 133 | were not, for our management, an attractive area in which to support work. 134 | They were in the process of extricating themselves 135 | not only from an operating system development effort that 136 | had failed, 137 | but from running the local Computation Center. 138 | Thus it may have seemed that buying a machine such as we 139 | suggested might lead on the one hand to yet another Multics, 140 | or on the other, if we produced something useful, 141 | to yet another Comp Center for them to be responsible for. 142 |

143 |

144 | Besides the financial agitations that took place in 1969, 145 | there was technical work also. 146 | Thompson, R. H. Canaday, and Ritchie 147 | developed, on blackboards and scribbled notes, 148 | the basic design of a file system 149 | that was later to become the heart of Unix. 150 | Most of the design was Thompson's, 151 | as was the impulse to think about file systems at all, 152 | but I believe I contributed the idea of device files. 153 | Thompson's itch for creation of an operating system took several forms during 154 | this period; 155 | he also wrote (on Multics) 156 | a fairly detailed simulation of the performance of the proposed file 157 | system design 158 | and of paging behavior of programs. 159 | In addition, he started work on a new operating system 160 | for the GE-645, going as far as writing an assembler 161 | for the machine and a rudimentary operating system kernel 162 | whose greatest achievement, so far as I remember, 163 | was to type a greeting message. 164 | The complexity of the machine was such that a mere message was already 165 | a fairly notable accomplishment, but when it became clear that the lifetime 166 | of the 645 at the Labs was measured in months, 167 | the work was dropped. 168 |

169 |

170 | Also during 1969, Thompson developed the game of 171 | `Space Travel.' 172 | First written on Multics, then transliterated into Fortran 173 | for GECOS 174 | (the operating system for the GE, later Honeywell, 635), 175 | it was nothing less than a simulation of the movement of the major bodies 176 | of the Solar System, with the player guiding a ship 177 | here and there, observing the scenery, and attempting to 178 | land on the various planets and moons. 179 | The GECOS version was unsatisfactory in two important respects: 180 | first, the display of the state of the game was jerky and hard to control 181 | because one had to type commands at it, 182 | and second, a game cost about $75 for CPU time on the big computer. 183 | It did not take long, therefore, 184 | for Thompson to find a little-used PDP-7 computer with 185 | an excellent display processor; 186 | the whole system was used as a Graphic-II terminal. 187 | He and I rewrote Space Travel 188 | to run on this machine. 189 | The undertaking was more ambitious than it might seem; 190 | because we disdained all existing software, 191 | we had to write a floating-point arithmetic package, 192 | the pointwise specification of the graphic characters 193 | for the display, 194 | and a debugging subsystem that continuously 195 | displayed the contents of typed-in locations in a corner 196 | of the screen. 197 | All this was written in assembly language for a cross-assembler 198 | that ran under GECOS and produced paper tapes 199 | to be carried to the PDP-7. 200 |

201 |

202 | Space Travel, though it made a very attractive game, 203 | served mainly as an introduction to the clumsy 204 | technology of preparing programs for the PDP-7. 205 | Soon Thompson began implementing the paper file system 206 | (perhaps `chalk file system' would be more accurate) 207 | that had been designed earlier. 208 | A file system without a way to exercise it 209 | is a sterile proposition, 210 | so he 211 | proceeded to flesh it out with 212 | the other requirements for a working operating system, 213 | in particular the notion of processes. 214 | Then came a small set of user-level utilities: 215 | the means to copy, print, delete, and edit files, 216 | and of course a simple command interpreter (shell). 217 | Up to this time all the programs were written using GECOS 218 | and files were transferred 219 | to the PDP-7 on paper tape; 220 | but once an assembler was completed the 221 | system was able to support itself. 222 | Although it was not until well into 1970 that 223 | Brian Kernighan 224 | suggested the name `Unix,' in a somewhat treacherous 225 | pun on `Multics,' 226 | the operating system we know today was born. 227 |

228 |

The PDP-7 Unix file system 229 |

230 |

231 | Structurally, 232 | the file system of PDP-7 Unix 233 | was nearly identical to today's. 234 | It had 235 |

236 |
237 |
1)
238 | An i-list: a linear array of 239 | i-nodes 240 | each describing a file. 241 | An i-node contained less than it does now, 242 | but the essential information was the same: 243 | the protection mode of the file, its type 244 | and size, and the list of physical blocks 245 | holding the contents. 246 |
2)
247 | Directories: 248 | a special kind of file containing a sequence 249 | of names and the associated i-number. 250 |
3)
251 | Special files 252 | describing devices. 253 | The device specification was not contained 254 | explicitly in the i-node, but was instead encoded 255 | in the number: 256 | specific i-numbers corresponded to specific files. 257 |
258 |
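A toy model may make the three pieces listed above concrete. This is a sketch, not PDP-7 code: the field names, the sample i-numbers, the unit of `size` and the `SPECIAL_INUMBERS` table are all hypothetical, chosen only to mirror the description (an i-list indexed by i-number, directories mapping names to i-numbers, and devices recognised by i-number alone).

```python
from dataclasses import dataclass, field


@dataclass
class Inode:
    """One entry of the i-list; field names are assumed, not historical."""
    mode: int                                    # protection mode
    kind: str                                    # "file", "dir" or "special"
    size: int = 0                                # size (assumed to be in words)
    blocks: list = field(default_factory=list)   # physical block numbers


# Hypothetical table: the device is implied by the i-number itself,
# not by anything stored in the i-node.
SPECIAL_INUMBERS = {1: "tty", 2: "paper tape reader"}

# The i-list is a linear array; the index of an entry is its i-number.
ilist = [
    Inode(0o0, "file"),                 # 0: unused placeholder
    Inode(0o6, "special"),              # 1: a device, known by its number
    Inode(0o6, "special"),              # 2: another device
    Inode(0o7, "dir"),                  # 3: a directory
    Inode(0o6, "file", 2, [40, 41]),    # 4: an ordinary file
]

# A directory associates simple names with i-numbers.
directory = {"tty": 1, "notes": 4}


def describe(name):
    """Resolve a simple name (no '/') to a short description."""
    inumber = directory[name]
    inode = ilist[inumber]
    if inode.kind == "special":
        return f"{name}: device {SPECIAL_INUMBERS[inumber]}"
    return f"{name}: {inode.kind}, {inode.size} words in blocks {inode.blocks}"
```

Calling `describe("notes")` follows the same two hops the text describes: name to i-number through the directory, then i-number to i-node through the i-list.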

259 | The important file system calls were also present 260 | from the start. 261 | Read, write, open, creat (sic), close: 262 | with one very important exception, discussed below, 263 | they were similar to what one finds now. 264 | A minor difference was that the unit of I/O was the 265 | word, not the byte, because the PDP-7 was a word-addressed machine. 266 | In practice this meant merely that all programs dealing 267 | with character streams ignored null characters, 268 | because null was used to pad a file to 269 | an even number of characters. 270 | Another minor, occasionally annoying difference 271 | was the lack of erase and kill processing 272 | for terminals. 273 | Terminals, in effect, were always in raw mode. 274 | Only a few programs (notably the shell and the editor) 275 | bothered to implement erase-kill processing. 276 |

277 |

278 | In spite of its considerable similarity 279 | to the current file system, 280 | the PDP-7 file system was in one way remarkably different: 281 | there were no path names, and each file-name argument 282 | to the system was a simple name (without `/') taken 283 | relative to the current directory. 284 | Links, in the usual Unix sense, did exist. 285 | Together with an elaborate set of conventions, 286 | they were the 287 | principal means by which the lack of path names became 288 | acceptable. 289 |

290 |

291 | The 292 | link 293 | call took the form 294 |

 295 | link(dir, file, newname)
 296 | 
297 | where 298 | dir 299 | was a directory file in the current directory, 300 | file 301 | an existing entry in that directory, 302 | and 303 | newname 304 | the name of the link, which was added to the 305 | current directory. 306 | Because 307 | dir 308 | needed to be in the current directory, 309 | it is evident that today's prohibition against 310 | links to directories was not enforced; 311 | the PDP-7 Unix file system had the shape of a general 312 | directed graph. 313 |

314 |

315 | So that every user did not need 316 | to maintain a link to all directories of interest, 317 | there existed a directory 318 | called 319 | dd 320 | that contained entries for the directory of each 321 | user. 322 | Thus, to make a link to file 323 | x 324 | in directory 325 | ken, 326 | I might do 327 |

 328 | ln dd ken ken
 329 | ln ken x x
 330 | rm ken
 331 | 
332 | This scheme rendered subdirectories sufficiently hard to use 333 | as to make them unused in practice. 334 | Another important barrier was that there was no way to create a directory 335 | while the system was running; 336 | all were made during recreation of the file system 337 | from paper tape, so that directories were in effect 338 | a nonrenewable resource. 339 |
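The three-command sequence above can be traced with a small model. This is an illustrative sketch under stated assumptions: directories are represented as dictionaries of name to i-number, the i-numbers (10, 11, 12, 99) are invented, and `link` and `unlink` are stand-ins for the calls behind `ln` and `rm`, not the original implementation.

```python
# Toy model: each directory maps names to i-numbers, and directories
# themselves are stored in a table keyed by their own i-number.
directories = {
    10: {"dd": 11},     # the current directory
    11: {"ken": 12},    # dd: one entry per user
    12: {"x": 99},      # ken's directory, holding file x
}


def link(cwd, dir_name, file_name, new_name):
    """Model of link(dir, file, newname); dir must be in the current directory."""
    source = directories[directories[cwd][dir_name]]
    directories[cwd][new_name] = source[file_name]


def unlink(cwd, name):
    """Remove a name from the current directory (the rm step)."""
    del directories[cwd][name]


cwd = 10
link(cwd, "dd", "ken", "ken")   # ln dd ken ken : temporary link to ken's directory
link(cwd, "ken", "x", "x")      # ln ken x x    : the link that was actually wanted
unlink(cwd, "ken")              # rm ken        : drop the temporary link
```

Because the second argument to `link` may itself name a directory, the model allows links to directories, which is what gives the structure the shape of a general directed graph.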

340 |

341 | The 342 | dd 343 | convention made the 344 | chdir 345 | command relatively convenient. 346 | It took multiple arguments, and switched the current directory to each named directory in turn. 347 | Thus 348 |

 349 | chdir dd ken
 350 | 
351 | would move to 352 | directory 353 | ken. 354 | (Incidentally, 355 | chdir 356 | was spelled 357 | ch; 358 | why this was expanded when we went to the PDP-11 359 | I don't remember.) 360 |
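The multi-argument behaviour is easy to state as a loop. The sketch below assumes a hypothetical table of directories keyed by invented i-numbers; only the idea of resolving each simple name relative to the directory reached so far comes from the text.

```python
# Hypothetical directory table: i-number -> {name: i-number}.
tree = {
    10: {"dd": 11},     # starting directory
    11: {"ken": 12},    # dd
    12: {},             # ken
}


def chdir(cwd, *names):
    """Switch to each named directory in turn, starting from cwd."""
    for name in names:
        cwd = tree[cwd][name]   # each name is looked up in the current directory
    return cwd


# Equivalent of `chdir dd ken` issued from directory 10.
new_cwd = chdir(10, "dd", "ken")
```

With no path names available, the argument list plays the role that `dd/ken` would play today.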

361 |

362 | The most serious inconvenience of the implementation of the file system, 363 | aside from the lack of path names, 364 | was the difficulty of changing its configuration; 365 | as mentioned, directories and special files were both made 366 | only when the disk was recreated. 367 | Installation of a new device was very painful, because the code 368 | for devices was spread widely throughout the system; 369 | for example there were several loops that visited each device in turn. 370 | Not surprisingly, there was no notion of mounting a removable 371 | disk pack, because the machine had only a single fixed-head disk. 372 |

373 |

374 | The operating system code that implemented this file system 375 | was a drastically simplified version of the present 376 | scheme. 377 | One important simplification followed from the fact that 378 | the system was not multi-programmed; 379 | only one program was in memory at a time, 380 | and control was passed between processes 381 | only when an explicit swap took place. 382 | So, for example, 383 | there was an 384 | iget 385 | routine that made a named i-node available, 386 | but it left the i-node in a constant, static location 387 | rather than returning a pointer into a large table 388 | of active i-nodes. 389 | A precursor of the current buffering mechanism was present 390 | (with about 4 buffers) 391 | but there was essentially no overlap of disk I/O with computation. 392 | This was avoided not merely for simplicity. 393 | The disk attached to the PDP-7 was fast for its time; 394 | it transferred one 18-bit word every 2 microseconds. 395 | On the other hand, the PDP-7 itself had a memory cycle time 396 | of 1 microsecond, 397 | and most instructions took 2 cycles (one for the instruction itself, 398 | one for the operand). 399 | However, indirectly addressed instructions required 3 cycles, 400 | and indirection was quite common, because the machine had no index registers. 401 | Finally, the DMA controller was unable to access memory during an instruction. 402 | The upshot was that the disk would incur overrun errors if any 403 | indirectly-addressed instructions 404 | were executed while it was transferring. 405 | Thus control could not be returned to the user, 406 | nor in fact could general system code be executed, 407 | with the disk running. 408 | The interrupt routines for the clock and terminals, 409 | which needed to be runnable at all times, 410 | had to be coded in very strange fashion to avoid indirection. 411 |
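The timing argument can be checked with a few lines of arithmetic. The constants come from the figures quoted above; the overrun rule is a simplifying assumption made here (an overrun occurs when a single instruction keeps the DMA controller away from memory for longer than the gap between two disk words), not a description of the actual hardware logic.

```python
MEMORY_CYCLE_US = 1     # PDP-7 memory cycle time, in microseconds
WORD_INTERVAL_US = 2    # the disk delivers one 18-bit word every 2 microseconds


def instruction_time_us(indirect):
    """Most instructions take 2 cycles; indirectly addressed ones take 3."""
    cycles = 3 if indirect else 2
    return cycles * MEMORY_CYCLE_US


def causes_overrun(indirect):
    """Assumed rule: DMA is locked out for the whole instruction, so an
    instruction longer than the word interval loses a disk word."""
    return instruction_time_us(indirect) > WORD_INTERVAL_US


direct_ok = not causes_overrun(indirect=False)   # 2 us fits in the 2 us window
indirect_bad = causes_overrun(indirect=True)     # 3 us exceeds the 2 us window
```

Under this model a direct instruction just fits between two words while an indirect one does not, which matches the observation that only indirectly addressed instructions had to be avoided during a transfer.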

412 |

Process control 413 |

414 |

415 | By `process control,' I mean 416 | the mechanisms by which processes are created and used; 417 | today the system calls 418 | fork, 419 | exec, 420 | wait, 421 | and 422 | exit 423 | implement these mechanisms. 424 | Unlike the file system, which existed in nearly its 425 | present form from the earliest days, the process control 426 | scheme underwent considerable mutation after PDP-7 427 | Unix was already in use. 428 | (The introduction of path names in the PDP-11 system 429 | was certainly a considerable notational advance, 430 | but not a change in fundamental structure.) 431 |

432 |

433 | Today, the way in which commands are executed by the shell can 434 | be summarized as follows: 435 |

436 |
437 |
1)
438 | The shell reads a command line from the terminal. 439 |
2)
440 | It creates a child process by 441 | fork. 442 |
3)
443 | The child process uses 444 | exec 445 | to call in the command from a file. 446 |
4)
447 | Meanwhile, the parent shell 448 | uses 449 | wait 450 | to wait for the child (command) process 451 | to terminate by calling 452 | exit. 453 |
5)
454 | The parent shell goes back to step 1). 455 |
456 |
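The five steps above map directly onto a few system calls. The following is a minimal sketch for a modern Unix-like system using Python's `os` module; the function names `run_command` and `shell` are invented for illustration, and error handling is reduced to the bare minimum.

```python
import os
import sys


def run_command(argv):
    """Steps 2 to 4: fork, exec in the child, wait in the parent."""
    pid = os.fork()
    if pid == 0:
        # Child: replace this process image with the command.
        try:
            os.execvp(argv[0], argv)
        except OSError:
            pass
        os._exit(127)               # reached only if exec failed
    _, status = os.waitpid(pid, 0)  # parent waits for the child to exit
    return os.WEXITSTATUS(status)


def shell(lines):
    """Steps 1 and 5: take command lines one at a time and loop."""
    statuses = []
    for line in lines:
        argv = line.split()
        if argv:
            statuses.append(run_command(argv))
    return statuses


# Example: a command that exits with status 3.
status = run_command([sys.executable, "-c", "import sys; sys.exit(3)"])
```

A real shell would read lines from the terminal rather than from a list, and would handle children killed by signals; both are left out to keep the correspondence with the numbered steps visible.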

457 | Processes (independently executing entities) 458 | existed very early in PDP-7 Unix. 459 | There were in fact precisely two of them, 460 | one for each of the two terminals attached to the machine. 461 | There was no 462 | fork, 463 | wait, 464 | or 465 | exec. 466 | There was an 467 | exit, 468 | but its meaning was rather different, as will be seen. 469 | The main loop of the shell went as follows. 470 |

471 |
472 |
1)
473 | The shell closed all its open files, then opened the terminal 474 | special file for standard input and output 475 | (file descriptors 0 and 1). 476 |
2)
477 | It read a command line from the terminal. 478 |
3)
479 | It linked to the file specifying the command, 480 | opened the file, and removed the link. 481 | Then it copied a small bootstrap program to the top of memory 482 | and jumped to it; this bootstrap program read in the file 483 | over the shell code, then jumped to the first location 484 | of the command 485 | (in effect an 486 | exec). 487 |
4)
488 | The command did its work, then terminated by calling 489 | exit. 490 | The 491 | exit 492 | call caused the system to read in a fresh copy of the shell 493 | over the terminated command, then to jump to its start 494 | (and thus in effect to go to step 1). 495 |
496 |

497 | The most interesting thing about this primitive implementation 498 | is the degree to which it anticipated themes 499 | developed more fully later. 500 | True, it could support neither background processes 501 | nor shell command files (let alone pipes and filters); 502 | but IO redirection (via `<' and `>') was soon there; 503 | it is discussed below. 504 | The implementation of redirection was quite straightforward; 505 | in step 3) above the shell just replaced its standard input 506 | or output with the appropriate file. 507 | Crucial to subsequent development 508 | was the implementation of the shell as a user-level 509 | program stored in a file, 510 | rather than a part of the operating system. 511 |
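Redirection by replacing standard output before the command runs is still how it works today. The sketch below shows the modern fork-based form, not the PDP-7 shell (which had no fork and simply swapped its own descriptors before overlaying itself with the command); `run_redirected` is a hypothetical helper name.

```python
import os
import sys
import tempfile


def run_redirected(argv, out_path):
    """Run argv with its standard output replaced by out_path."""
    pid = os.fork()
    if pid == 0:
        try:
            fd = os.open(out_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
            if fd != 1:
                os.dup2(fd, 1)      # file descriptor 1 now refers to the file
                os.close(fd)
            os.execvp(argv[0], argv)
        except OSError:
            pass
        os._exit(127)               # reached only if open or exec failed
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status)


# Equivalent of: python -c "print('hi')" > out.txt
out_file = os.path.join(tempfile.mkdtemp(), "out.txt")
exit_status = run_redirected([sys.executable, "-c", "print('hi')"], out_file)
```

The command itself is unaware of the redirection: it writes to descriptor 1 as usual, which is why the feature could live entirely in a user-level shell.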

512 |

513 | The structure of this process control scheme, 514 | with one process per terminal, 515 | is similar to that of many interactive systems, 516 | for example CTSS, Multics, Honeywell TSS, and IBM TSS and TSO. 517 | In general such systems require special mechanisms 518 | to implement useful facilities such as detached computations 519 | and command files; 520 | Unix at that stage didn't bother to supply the special mechanisms. 521 | It also exhibited some irritating, idiosyncratic problems. 522 | For example, a newly recreated shell had to close all its open files 523 | both to get rid of any open files 524 | left by the command just executed and to rescind previous IO 525 | redirection. 526 | Then it had to reopen the special file corresponding to 527 | its terminal, in order to read a new command line. 528 | There was no 529 | /dev 530 | directory (because no path names); 531 | moreover, the shell could retain no memory 532 | across commands, because it was reexecuted afresh 533 | after each command. 534 | Thus a further file system convention was required: 535 | each directory had to contain an entry 536 | tty 537 | for a special file that referred to the terminal 538 | of the process that opened it. 539 | If by accident one changed into some directory that lacked this 540 | entry, the shell would loop hopelessly; 541 | about the only remedy was to reboot. 542 | (Sometimes the missing link could be made from the other terminal.) 543 |

544 |

545 | Process control in its modern form was designed and implemented 546 | within a couple of days. 547 | It is astonishing how easily it fitted into the existing system; 548 | at the same time it is easy to see how some of the slightly 549 | unusual features of the design are present precisely because 550 | they represented small, easily-coded changes to what existed. 551 | A good example is the separation of the 552 | fork 553 | and 554 | exec 555 | functions. 556 | The most common model for the creation of new 557 | processes involves specifying a program for the process 558 | to execute; 559 | in Unix, 560 | a forked process continues to run the same program 561 | as its parent until it performs an explicit 562 | exec. 563 | The separation of the functions is certainly not unique to 564 | Unix, 565 | and in fact it was present in the Berkeley time-sharing 566 | system [2], 567 | which was well-known to Thompson. 568 | Still, it seems reasonable to suppose that it exists in Unix 569 | mainly because of the ease with which 570 | fork 571 | could be implemented without changing much else. 572 | The system already handled multiple 573 | (i.e. two) processes; 574 | there was a process table, and the processes were swapped between 575 | main memory and the disk. 576 | The initial implementation of 577 | fork 578 | required only 579 |

580 |
581 |
1)
582 | Expansion of the process table 583 |
2)
584 | Addition of a fork call that copied the current 585 | process to the disk swap area, 586 | using the already existing swap IO primitives, 587 | and made some adjustments to the process table. 588 |
589 |

590 | In fact, the PDP-7's 591 | fork 592 | call required precisely 27 lines of assembly code. 593 | Of course, other changes in the operating system 594 | and user programs were required, and some of them were 595 | rather interesting and unexpected. 596 | But a combined 597 | fork-exec 598 | would have been considerably more complicated, if only because 599 | exec 600 | as such did not exist; 601 | its function was already performed, 602 | using explicit IO, by the shell. 603 |
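The separation of the two functions survives in current systems and can be sketched in Python, whose `os.fork` and `os.execvp` wrap the modern descendants of these calls. The helper name `spawn` is an assumption made for this example, and nothing here reproduces the 27 lines of PDP-7 assembly.

```python
import os


def spawn(argv):
    """Create a process with fork, then load a program into it with exec.

    Returns the program's exit status (assuming it exits normally).
    """
    pid = os.fork()
    if pid == 0:
        # Child: still running the parent's program until exec replaces it.
        try:
            os.execvp(argv[0], argv)
        finally:
            os._exit(127)  # reached only if exec failed
    # Parent: fork returned the child's ID; wait for the child to finish.
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status)
```

Because the child keeps running the parent's code until the `exec`, anything the parent wants arranged for the new program (open files, current directory) can be done in that gap with ordinary calls.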

604 |

605 | The 606 | exit 607 | system call, which previously read in a new copy of the shell 608 | (actually a sort of automatic 609 | exec 610 | but without arguments), 611 | simplified considerably; 612 | in the new version a process only had to clean out its 613 | process table entry, 614 | and give up control. 615 |

616 |

617 | Curiously, 618 | the primitives that became 619 | wait 620 | were considerably more general 621 | than the present scheme. 622 | A pair of primitives sent one-word messages between named processes: 623 |

 624 | smes(pid, message)
 625 | (pid, message) = rmes()
 626 | 
627 | The target process of 628 | smes 629 | did not need to have any ancestral 630 | relationship with the receiver, 631 | although the system provided no explicit 632 | mechanism for communicating process 633 | IDs except that 634 | fork 635 | returned to each of the parent and child the ID of its 636 | relative. 637 | Messages were not queued; 638 | a sender delayed until the receiver read the message. 639 |

640 |

641 | The message facility was used as follows: 642 | the parent shell, 643 | after creating a process to execute a command, 644 | sent a message to the new process by 645 | smes; 646 | when the command terminated 647 | (assuming it did not try to read any messages) 648 | the shell's blocked 649 | smes 650 | call returned an error indication that the target 651 | process did not exist. 652 | Thus the shell's 653 | smes 654 | became, in effect, 655 | the equivalent of 656 | wait. 657 |

658 |

659 | A different protocol, 660 | which took advantage of more of the generality offered by messages, 661 | was used between the initialization program and the shells 662 | for each terminal. 663 | The initialization process, 664 | whose ID was understood to be 1, 665 | created a shell for each of the terminals, 666 | and then issued 667 | rmes; 668 | each shell, when it read the end of its input file, 669 | used 670 | smes 671 | to send a conventional `I am terminating' 672 | message to the initialization process, 673 | which recreated a new shell process 674 | for that terminal. 675 |

676 |

677 | I can recall no other use of messages. 678 | This explains why the facility 679 | was replaced by the 680 | wait 681 | call of the present system, which is less general, 682 | but more directly applicable 683 | to the desired purpose. 684 | Possibly relevant also is the evident bug in the mechanism: 685 | if a command process attempted to use messages to 686 | communicate with other processes, 687 | it would disrupt the shell's synchronization. 688 | The shell depended on sending a message that 689 | was never received; 690 | if a command executed 691 | rmes, 692 | it would receive the shell's phony message, 693 | and cause the shell to read another input line just as if 694 | the command had terminated. 695 | If a need for general 696 | messages had manifested itself, 697 | the bug would have been repaired. 698 |
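The synchronization and its bug can be traced with a toy model. Everything in it is an assumption made for illustration: the class `ToyKernel`, its method names, the process IDs, and the returned strings are invented, and only the behavior described in the text (unqueued messages, a sender blocked until the message is read or the target is gone) is modeled.

```python
class ToyKernel:
    """Toy model of unqueued one-word messages between processes."""

    def __init__(self):
        self.alive = set()
        self.pending = {}  # target pid -> (sender pid, word); sender is blocked

    def fork(self, pid):
        self.alive.add(pid)

    def smes(self, sender, target, word):
        if target not in self.alive:
            return "target process did not exist"
        self.pending[target] = (sender, word)
        return "blocked"

    def rmes(self, receiver):
        # The receiver takes the message, which releases the blocked sender.
        return self.pending.pop(receiver)

    def exit(self, pid):
        # A sender blocked on a process that dies is released with an error.
        self.alive.discard(pid)
        return self.pending.pop(pid, None)


def shell_waits(command_reads_messages):
    """Return (why the shell's smes returned, whether the command still runs)."""
    shell, command = 2, 3  # hypothetical process IDs
    kernel = ToyKernel()
    kernel.fork(shell)
    kernel.fork(command)
    assert kernel.smes(shell, command, 0) == "blocked"
    if command_reads_messages:
        kernel.rmes(command)  # the command steals the shell's phony message
        reason = "message received"
    else:
        kernel.exit(command)
        reason = "target process did not exist"
    return reason, command in kernel.alive
```

In this model `shell_waits(False)` is the intended case, where the shell resumes because the command is gone; `shell_waits(True)` is the bug, where the shell resumes while the command is still running.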

699 |

700 | At any rate, the new process control scheme 701 | instantly rendered some very valuable features 702 | trivial to implement; 703 | for example detached processes (with `&') 704 | and recursive use of the shell as a command. 705 | Most systems have to supply some 706 | sort of special `batch job submission' facility 707 | and 708 | a special command interpreter for files distinct 709 | from the one used interactively. 710 |

711 |

712 | Although the multiple-process idea slipped in very easily indeed, 713 | there were some aftereffects that weren't anticipated. 714 | The most memorable of these became evident 715 | soon after the new system 716 | came up and apparently worked. 717 | In the midst of our jubilation, it was discovered 718 | that the 719 | chdir 720 | (change current directory) 721 | command had stopped working. 722 | There was much reading of code and anxious introspection about 723 | how the addition of 724 | fork 725 | could have broken the 726 | chdir 727 | call. 728 | Finally the truth dawned: 729 | in the old system 730 | chdir 731 | was an ordinary command; 732 | it adjusted the current directory of the (unique) 733 | process attached to the terminal. 734 | Under the new system, the 735 | chdir 736 | command correctly changed the current directory of the process 737 | created to execute it, 738 | but this process promptly terminated 739 | and had no effect whatsoever on its parent shell! 740 | It was necessary to make 741 | chdir 742 | a special command, executed internally within the shell. 743 | It turns out that several command-like functions have the same 744 | property, 745 | for example 746 | login. 747 |
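The effect is easy to reproduce on a current system. The following Python sketch (the function name `chdir_in_child` is hypothetical) changes directory in a forked child and then checks the parent, which is the situation the shell found itself in.

```python
import os


def chdir_in_child(target):
    """Change directory in a forked child and report what each process saw.

    Returns (child_changed, parent_unchanged).
    """
    target = os.path.abspath(target)
    before = os.getcwd()
    pid = os.fork()
    if pid == 0:
        # Child: the change succeeds here, and then the process exits.
        code = 1
        try:
            os.chdir(target)
            if os.path.samefile(os.getcwd(), target):
                code = 0
        finally:
            os._exit(code)
    _, status = os.waitpid(pid, 0)
    child_changed = os.WEXITSTATUS(status) == 0
    parent_unchanged = os.getcwd() == before
    return child_changed, parent_unchanged
```

Both values come back true: the child really did change directory, and the parent never noticed, which is why the command has to run inside the shell itself.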

748 |

749 | Another mismatch between the system as it had been 750 | and the new process control scheme took longer to become 751 | evident. 752 | Originally, the read/write pointer associated with 753 | each open file was stored within the process that opened 754 | the file. 755 | (This pointer indicates where in the file the next 756 | read or write will take place.) 757 | The problem with this organization became evident only 758 | when we tried to use command files. 759 | Suppose a simple command file contains 760 |

 761 | ls
 762 | who
 763 | 
764 | and it is executed as follows: 765 |
 766 | sh comfile >output
 767 | 
768 | The sequence of events was 769 |

770 |
771 |
1)
772 | The main shell creates a new process, which opens 773 | output 774 | to receive the standard output and executes the shell 775 | recursively. 776 |
2)
777 | The new shell creates another process to execute 778 | ls, 779 | which correctly writes on file 780 | output 781 | and then terminates. 782 |
3)
783 | Another process is created to execute the next command. 784 | However, the IO pointer for the output is copied from 785 | that of the shell, 786 | and it is still 0, because the shell has never written 787 | on its output, and IO pointers are associated with processes. 788 | The effect is that the output of 789 | who 790 | overwrites and destroys the output of the preceding 791 | ls 792 | command. 793 |
794 |

795 | Solution of this problem required creation of a new 796 | system table to contain the IO pointers 797 | of open files independently of the process in which they 798 | were opened. 799 |
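The difference between the two arrangements can be observed on a current system, where the pointer lives with the open file rather than the process. In this Python sketch the per-process pointer of the old system is approximated by having each command open the file for itself; the function names are hypothetical and the strings stand in for the output of `ls` and `who`.

```python
import os


def run_child(action):
    """Run action in a forked child and wait for it."""
    pid = os.fork()
    if pid == 0:
        try:
            action()
        finally:
            os._exit(0)
    os.waitpid(pid, 0)


def read_back(path):
    with open(path, "rb") as f:
        return f.read()


def shared_pointer(path):
    """Both commands inherit one open file, so they share its IO pointer."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    run_child(lambda: os.write(fd, b"ls output\n"))
    run_child(lambda: os.write(fd, b"who\n"))
    os.close(fd)
    return read_back(path)


def private_pointers(path):
    """Each command has its own pointer starting at 0, as in the old system."""
    os.close(os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600))

    def write_from_start(data):
        fd = os.open(path, os.O_WRONLY)
        os.write(fd, data)
        os.close(fd)

    run_child(lambda: write_from_start(b"ls output\n"))
    run_child(lambda: write_from_start(b"who\n"))
    return read_back(path)
```

With the shared pointer the second write lands after the first; with private pointers it starts again at offset 0 and overwrites the beginning of the earlier output.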

800 |

IO Redirection 801 |

802 |

803 | The very convenient notation for IO redirection, using the `>' and `<' 804 | characters, 805 | was not present from the very beginning of the PDP-7 Unix system, 806 | but it did appear quite early. 807 | Like much else in Unix, 808 | it was inspired by an idea from Multics. 809 | Multics has a rather general IO redirection mechanism [3] 810 | embodying named IO streams 811 | that can be dynamically redirected 812 | to various devices, files, and even through special 813 | stream-processing modules. 814 | Even in the version of Multics we were familiar with a decade ago, 815 | there existed a command that switched subsequent output 816 | normally destined for the terminal to a file, and another command 817 | to reattach output to the terminal. 818 | Where under Unix one might say 819 |

 820 | ls >xx
 821 | 
822 | to get a listing of the names of one's files in 823 | xx, 824 | on Multics the notation was 825 |
 826 | iocall attach user_output file xx
 827 | list
 828 | iocall attach user_output syn user_i/o
 829 | 
830 | Even though this very clumsy sequence was used often 831 | during the Multics days, 832 | and would have been utterly straightforward to integrate 833 | into the Multics shell, the idea 834 | did not occur to us or anyone else at the time. 835 | I speculate that the reason it did not was the sheer 836 | size of the Multics project: 837 | the implementors of the IO system were at Bell Labs in Murray Hill, 838 | while the shell was done at MIT. 839 | We didn't consider making changes to the shell 840 | (it was 841 | their 842 | program); 843 | correspondingly, 844 | the keepers of the shell may 845 | not even have known of the usefulness, albeit clumsiness, 846 | of 847 | iocall. 848 | (The 1969 Multics manual [4] 849 | lists 850 | iocall 851 | as an `author-maintained,' that is non-standard, command.) 852 | Because both the Unix IO system and its shell were 853 | under the exclusive control of Thompson, 854 | when the right idea finally surfaced, 855 | it was a matter of an hour or so to implement it. 856 |

857 |

The advent of the PDP-11 858 |

859 |

860 | By the beginning of 1970, 861 | PDP-7 Unix was a going concern. 862 | Primitive by today's standards, 863 | it was still capable of providing 864 | a more congenial programming environment than its alternatives. 865 | Nevertheless, it was clear that the PDP-7, a machine we didn't even own, 866 | was already obsolete, 867 | and its successors in the same line offered little of interest. 868 | In early 1970 we proposed 869 | acquisition of a PDP-11, which had just been introduced by 870 | Digital. 871 | In some sense, this proposal was merely the latest 872 | in the series of attempts that had been made throughout the preceding year. 873 | It differed in two important ways. 874 | First, the amount of money (about $65,000) 875 | was an order of magnitude less than what we had previously asked; 876 | second, 877 | the charter sought was not merely to 878 | write some (unspecified) operating system, 879 | but instead to 880 | create a system specifically designed for editing and formatting 881 | text, 882 | what might today be called a `word-processing system.' 883 | The impetus for the proposal came mainly from J. F. Ossanna, 884 | who was then and until the end of his life interested 885 | in text processing. 886 | If our early proposals were too vague, 887 | this one was perhaps too specific; at first it too 888 | met with disfavor. 889 | Before long, however, 890 | funds were obtained through the efforts of L. E. McMahon 891 | and an order for a PDP-11 was placed in May. 892 |

893 |

894 | The processor arrived at the end of the summer, but 895 | the PDP-11 was so new a product that no disk was available until 896 | December. 897 | In the meantime, a rudimentary, core-only version of Unix was written 898 | using a cross-assembler on the PDP-7. 899 | Most of the time, 900 | the machine sat in a corner, enumerating all the closed Knight's tours 901 | on a 6×8 chess board—a three-month job. 902 |

903 |

The first PDP-11 system 904 |

905 |

906 | Once the disk arrived, 907 | the system was quickly completed. 908 | In internal structure, the first version of Unix for the PDP-11 represented a relatively 909 | minor advance over the PDP-7 system; 910 | writing it was largely a matter of transliteration. 911 | For example, 912 | there was no multi-programming; only one user program 913 | was present in core at any moment. 914 | On the other hand, 915 | there were important changes in the interface to the user: 916 | the present directory structure, 917 | with full path names, 918 | was in place, 919 | along with the modern form of 920 | exec 921 | and 922 | wait, 923 | and conveniences like 924 | character-erase and line-kill 925 | processing for terminals. 926 | Perhaps the most interesting thing about the 927 | enterprise was its small size: 928 | there were 24K bytes of core memory 929 | (16K for the system, 8K for user programs), 930 | and a disk with 1K blocks (512K bytes). 931 | Files were limited to 64K bytes. 932 |

933 |

934 | At the time of the placement of the order for the PDP-11, 935 | it had seemed natural, or perhaps expedient, 936 | to promise a system dedicated to word processing. 937 | During the protracted arrival of the hardware, 938 | the increasing usefulness of PDP-7 Unix 939 | made it appropriate to justify creating PDP-11 Unix 940 | as a development tool, to be used in writing the 941 | more special-purpose system. 942 | By the spring of 1971, 943 | it was generally agreed that 944 | no one had the slightest interest in scrapping Unix. 945 | Therefore, we transliterated the 946 | roff 947 | text formatter 948 | into PDP-11 assembler language, 949 | starting from the PDP-7 version that 950 | had been transliterated 951 | from McIlroy's BCPL version on Multics, 952 | which had in turn been inspired 953 | by J. Saltzer's 954 | runoff 955 | program on CTSS. 956 | In early summer, editor and formatter in hand, 957 | we felt prepared to fulfill our charter by offering 958 | to supply a text-processing service to the 959 | Patent department for preparing patent applications. 960 | At the time, they were evaluating a commercial system 961 | for this purpose; the main advantages 962 | we offered 963 | (besides the dubious one of taking part in 964 | an in-house experiment) 965 | were two in number: 966 | first, 967 | we supported Teletype's model 37 terminals, 968 | which, with an extended type-box, 969 | could print most of the math symbols 970 | they required; 971 | second, we quickly endowed 972 | roff 973 | with the ability to produce line-numbered pages, 974 | which the Patent Office required and which the other 975 | system could not handle. 976 |

977 |

978 | During the last half of 1971, we supported three typists from the Patent 979 | department, who spent the day busily typing, editing, and formatting 980 | patent applications, and meanwhile tried to carry on our own work. 981 | Unix has a reputation for supplying interesting services on modest hardware, 982 | and this period may mark a high point in the benefit/equipment ratio; 983 | on a machine with no memory protection and a single .5 MB disk, 984 | every test of a new program required care and boldness, because it could 985 | easily crash the system, and every few hours' work by the typists 986 | meant pushing out more information onto DECtape, because of the 987 | very small disk. 988 |

989 |

990 | The experiment was trying but successful. 991 | Not only did the Patent department adopt Unix, 992 | and thus become the first of many groups at the Laboratories 993 | to ratify our work, 994 | but we achieved sufficient credibility to convince our own management 995 | to acquire 996 | one of the first PDP 11/45 systems made. 997 | We have accumulated much hardware since then, 998 | and labored continuously on the software, 999 | but because most of the interesting work has already been published, 1000 | (e.g. on the system itself [1, 5, 6, 7, 8, 9]) it seems unnecessary to repeat it here. 1001 |

1002 | 1003 |

Pipes 1004 |

1005 |

1006 | One of the most widely admired contributions of Unix 1007 | to the culture of operating systems and command languages 1008 | is the 1009 | pipe, 1010 | as used in a pipeline of commands. 1011 | Of course, the fundamental idea was by no means new; 1012 | the pipeline is merely a specific form of coroutine. 1013 | Even the implementation was not unprecedented, 1014 | although we didn't know it at the time; 1015 | the `communication files' of the Dartmouth 1016 | Time-Sharing System [10] 1017 | did very nearly what Unix pipes do, 1018 | though they seem not to have been exploited so fully. 1019 |

1020 |

1021 | Pipes appeared in Unix in 1972, 1022 | well after the PDP-11 version of the system was in operation, 1023 | at the suggestion (or perhaps insistence) of M. D. McIlroy, 1024 | a long-time advocate of the non-hierarchical control flow 1025 | that characterizes coroutines. 1026 | Some years before pipes were implemented, he suggested 1027 | that commands should be thought of as binary operators, 1028 | whose left and right operands specified the input and output files. 1029 | Thus a `copy' utility would be commanded by 1030 |

1031 | inputfile copy outputfile
1032 | 
1033 | To make a pipeline, command operators could be stacked up. 1034 | Thus, to sort 1035 | input, 1036 | paginate it neatly, and print the result off-line, 1037 | one would write 1038 |
1039 | input sort paginate offprint
1040 | 
1041 | In today's system, this would correspond to 1042 |
1043 | sort input | pr | opr
1044 | 
1045 | The idea, explained one afternoon on a blackboard, 1046 | intrigued us but failed to ignite any immediate action. 1047 | There were several objections to the idea as put: 1048 | the infix notation seemed too radical (we were too 1049 | accustomed to typing `cp x y' to copy 1050 | x 1051 | to 1052 | y); 1053 | and we were unable to see how to 1054 | distinguish command parameters from the input or output files. 1055 | Also, the one-input one-output model 1056 | of command execution seemed too confining. 1057 | What a failure of imagination! 1058 |

1059 |

1060 | Some time later, thanks to McIlroy's persistence, 1061 | pipes were finally installed in the operating system 1062 | (a relatively simple job), 1063 | and a new notation was introduced. 1064 | It used the same characters as for I/O redirection. 1065 | For example, the pipeline above might have been written 1066 |

1067 | sort input >pr>opr>
1068 | 
1069 | The idea is that following a `>' may be either a file, 1070 | to specify redirection of output to that file, 1071 | or a command into which the output of the preceding command 1072 | is directed as input. 1073 | The trailing `>' was needed in the example to specify 1074 | that the (nonexistent) output of 1075 | opr 1076 | should be directed 1077 | to the console; otherwise the command 1078 | opr 1079 | would not have been executed at all; 1080 | instead a file 1081 | opr 1082 | would have been created. 1083 |

1084 |

1085 | The new facility was enthusiastically received, and 1086 | the term `filter' was soon coined. 1087 | Many commands were changed to make them usable in pipelines. 1088 | For example, no one had imagined that anyone would want the 1089 | sort 1090 | or 1091 | pr 1092 | utility to sort or print its standard input if given no explicit arguments. 1093 |
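The convention that made a command usable as a filter can be sketched as follows. This is a minimal Python illustration, not the code of the Unix `sort`; the function name `sort_filter` and its parameters are assumptions.

```python
import sys


def sort_filter(paths, stdin=None, stdout=None):
    """Sort the named files, or standard input when no files are named."""
    stdin = sys.stdin if stdin is None else stdin
    stdout = sys.stdout if stdout is None else stdout
    if paths:
        lines = []
        for path in paths:
            with open(path) as f:
                lines.extend(f)
    else:
        # No explicit arguments: behave as a filter.
        lines = list(stdin)
    stdout.writelines(sorted(lines))
```

Called with an empty list of paths, the function reads whatever arrives on its input, which is all a pipeline requires of it.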

1094 |

1095 | Soon some problems with the notation became evident. 1096 | Most annoying was a silly lexical problem: 1097 | the string after `>' was delimited by blanks, so, 1098 | to give a parameter to 1099 | pr 1100 | in the example, one had to quote: 1101 |

1102 | sort input >"pr -2">opr>
1103 | 
1104 | Second, in an attempt to give generality, 1105 | the pipe notation accepted `<' as an input redirection 1106 | in a way corresponding to `>'; this meant that the notation was 1107 | not unique. 1108 | One could also write, for example, 1109 |
1110 | opr <pr<"sort input"<
1111 | 
1112 | or even 1113 |
1114 | pr <"sort input"< >opr>
1115 | 
1116 | The pipe notation using `<' and `>' survived 1117 | only a couple of months; 1118 | it was replaced by the present one 1119 | that uses a unique operator 1120 | to separate components of a pipeline. 1121 | Although the old notation had a certain charm and 1122 | inner consistency, 1123 | the new one is certainly superior. 1124 | Of course, it too has limitations. 1125 | It is unabashedly linear, though there are situations 1126 | in which multiple redirected inputs and outputs 1127 | are called for. 1128 | For example, what is the best way to compare the outputs of 1129 | two programs? 1130 | What is the appropriate notation for invoking a program 1131 | with two parallel output streams? 1132 |

1133 |

1134 | I mentioned above in the section on IO redirection that Multics 1135 | provided a mechanism by which IO streams could be directed 1136 | through processing modules on the way to (or from) the device 1137 | or file serving as source or sink. 1138 | Thus it might seem that stream-splicing in Multics 1139 | was the direct precursor of Unix pipes, as Multics 1140 | IO redirection certainly was for its Unix version. 1141 | In fact I do not think this is true, or is true only in a weak sense. 1142 | Not only were coroutines well-known already, 1143 | but their embodiment as Multics spliceable IO modules 1144 | required that the modules be specially coded in such a way 1145 | that they could be used for no other purpose. 1146 | The genius of the Unix pipeline is precisely that it 1147 | is constructed from the very same commands used constantly 1148 | in simplex fashion. 1149 | The mental leap needed to see this possibility 1150 | and to invent the notation is large indeed. 1151 |
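That ordinary commands compose without special coding can be seen in a short Python sketch built on the present-day `pipe`, `fork`, `dup2`, and `exec` calls. The function name `run_pipeline` is hypothetical, the command list is assumed to be non-empty, and the code makes no claim about how the 1972 implementation worked.

```python
import os


def run_pipeline(commands):
    """Run each argv in commands with its output piped to the next one.

    Returns the bytes written by the last command.
    """
    upstream = None  # read end of the pipe fed by the previous command
    pids = []
    for argv in commands:
        read_end, write_end = os.pipe()
        pid = os.fork()
        if pid == 0:
            try:
                if upstream is not None:
                    os.dup2(upstream, 0)  # previous output becomes our input
                os.dup2(write_end, 1)     # our output feeds the next reader
                os.execvp(argv[0], argv)
            finally:
                os._exit(127)  # reached only if setup or exec failed
        pids.append(pid)
        os.close(write_end)
        if upstream is not None:
            os.close(upstream)
        upstream = read_end
    chunks = []
    while True:
        data = os.read(upstream, 4096)
        if not data:
            break
        chunks.append(data)
    os.close(upstream)
    for pid in pids:
        os.waitpid(pid, 0)
    return b"".join(chunks)
```

Each command is started exactly as it would be on its own; only descriptors 0 and 1 are rearranged beforehand, so nothing in the commands has to know that a pipeline exists.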

1152 |

High-level languages 1153 |

1154 |

1155 | Every program for the original PDP-7 Unix system was written in 1156 | assembly language, and bare assembly language it was—for example, 1157 | there were no macros. 1158 | Moreover, there was no loader or link-editor, so every program had to be complete in itself. 1159 | The first interesting language to appear was a version 1160 | of McClure's TMG [11] 1161 | that was implemented by McIlroy. 1162 | Soon after TMG became available, 1163 | Thompson decided 1164 | that we could not pretend to offer a real computing service 1165 | without Fortran, 1166 | so he sat down to write a Fortran in TMG. 1167 | As I recall, 1168 | the intent to handle Fortran lasted about a week. 1169 | What he produced instead was a definition of and a compiler for 1170 | the new language B [12]. 1171 | B was much influenced by the BCPL language [13]; 1172 | other influences were Thompson's taste for spartan syntax, 1173 | and the very small space into which the compiler had to fit. 1174 | The compiler produced simple interpretive code; 1175 | although it and the programs it produced were rather slow, 1176 | it made life much more pleasant. 1177 | Once interfaces to the regular system calls were made available, 1178 | we began once again to enjoy the benefits of using a reasonable 1179 | language to write what are usually called 1180 | `systems programs:' 1181 | compilers, assemblers, and the like. 1182 | (Although some might consider the PL/I we used under 1183 | Multics unreasonable, 1184 | it was much better than assembly language.) 1185 | Among other programs, the PDP-7 B cross-compiler for the PDP-11 1186 | was written in B, and in the course of time, 1187 | the B compiler for the PDP-7 itself was transliterated 1188 | from TMG into B. 1189 |

1190 |

1191 | When the PDP-11 arrived, 1192 | B was moved to it almost immediately. 1193 | In fact, a version of the multi-precision `desk calculator' 1194 | program 1195 | dc 1196 | was one of the earliest programs to run on the PDP-11, 1197 | well before the disk arrived. 1198 | However, B did not take over instantly. 1199 | Only passing thought was given to rewriting the operating system 1200 | in B rather than assembler, 1201 | and the same was true of most of the utilities. 1202 | Even the assembler was rewritten in assembler. 1203 | This approach was taken mainly because of the slowness of the interpretive 1204 | code. 1205 | Of smaller but still real importance was the mismatch 1206 | of the word-oriented B language with the byte-addressed 1207 | PDP-11. 1208 |

1209 |

1210 | Thus, in 1971, work began on what was to become the C language [14]. 1211 | The story of the language developments from BCPL 1212 | through B to C is told elsewhere [15], 1213 | and need not be repeated here. 1214 | Perhaps the most important watershed occurred during 1973, 1215 | when the operating system kernel was rewritten in C. 1216 | It was at this point that the system assumed its modern form; 1217 | the most far-reaching change was the introduction of 1218 | multi-programming. 1219 | There were few externally-visible changes, but the internal structure of the 1220 | system became much more rational and general. 1221 | The success of this effort convinced us that C was useful 1222 | as a nearly universal tool for systems programming, 1223 | instead of just a toy for simple applications. 1224 |

1225 |

1226 | Today, the only important Unix program still written in assembler 1227 | is the assembler itself; 1228 | virtually all the utility programs are in C, 1229 | and so are most of the applications programs, although there are 1230 | sites with many in Fortran, Pascal, and Algol 68 as well. 1231 | It seems certain that much of the success of Unix follows 1232 | from the readability, modifiability, and portability 1233 | of its software that in turn follows 1234 | from its expression in high-level languages. 1235 |

1236 |

Conclusion 1237 |

1238 |

1239 | One of the comforting things about old memories is their tendency 1240 | to take on a rosy glow. 1241 | The programming environment provided by the early versions of Unix seems, 1242 | when described here, to be extremely harsh and primitive. 1243 | I am sure that if forced back to the PDP-7 I would find it intolerably limiting and 1244 | lacking in conveniences. 1245 | Nevertheless, it did not seem so at the time; 1246 | the memory fixes on what was good and what lasted, and on the joy of helping 1247 | to create the improvements that made life better. 1248 | In ten years, I hope we can look back with the same mixed impression 1249 | of progress combined with continuity. 1250 |

1251 |

Acknowledgements 1252 |

1253 |

1254 | I am grateful to S. P. Morgan, K. Thompson, and M. D. McIlroy 1255 | for providing early documents and digging up recollections. 1256 |

1257 |

1258 | Because I am most interested in describing the evolution 1259 | of ideas, this paper attributes ideas and work to individuals only where 1260 | it seems most important. 1261 | The reader will not, on the average, 1262 | go far wrong if he reads each occurrence of `we' 1263 | with unclear antecedent 1264 | as `Thompson, with some assistance from me.' 1265 |

1266 |

References 1267 |

1268 |

1269 |
1270 |
1.
1271 | D. M. Ritchie and K. Thompson, 1272 | `The Unix Time-sharing System,' 1273 | C. ACM 1274 | 17 1275 | No. 7 (July 1974), pp. 365-375. 1276 |
2.
1277 | L. P. Deutsch and B. W. Lampson, 1278 | `SDS 930 Time-sharing System Preliminary 1279 | Reference Manual,' Doc. 30.10.10, Project Genie, 1280 | Univ. Cal. at Berkeley (April 1965). 1281 |
3.
1282 | R. J. Feiertag and 1283 | E. I. Organick, 1284 | `The Multics input-output system,' 1285 | Proc. Third Symposium on Operating Systems Principles, 1286 | October 18-20, 1971, 1287 | pp. 35-41. 1288 |
4.
1289 | The Multiplexed Information and Computing Service: Programmers' Manual, 1290 | Mass. Inst. of Technology, Project MAC, Cambridge MA, (1969). 1291 |
5.
1292 | K. Thompson, 1293 | `Unix Implementation,' 1294 | Bell System Tech J. 1295 | 57 1296 | No. 6, (July-August 1978), pp. 1931-46. 1297 |
6.
1298 | S. C. Johnson and D. M. Ritchie, 1299 | `Portability of C Programs and the Unix System,' 1300 | Bell System Tech J. 1301 | 57 1302 | No. 6, (July-August 1978), pp. 2021-48. 1303 |
7.
1304 | B. W. Kernighan, 1305 | M. E. Lesk, and 1306 | J. F. Ossanna, 1307 | `Document Preparation,' 1308 | Bell Sys. Tech. J., 1309 | 57 1310 | No. 6 (July-August 1978), 1311 | pp. 2115-2135. 1312 |
8.
1313 | B. W. Kernighan and 1314 | L. L. Cherry, 1315 | `A System for Typesetting Mathematics,' 1316 | J. Comm. Assoc. Comp. Mach. 1317 | 18, 1318 | pp. 151-157 1319 | (March 1975). 1320 |
9.
1321 | M. E. Lesk and 1322 | B. W. Kernighan, 1323 | `Computer Typesetting of Technical Journals on Unix,' 1324 | Proc. AFIPS NCC 1325 | 46 1326 | (1977), pp. 879-88. 1327 |
10.
1328 | Systems Programmers Manual for the Dartmouth Time Sharing System for the GE 635 Computer, 1329 | Dartmouth College, 1330 | Hanover, New Hampshire, 1331 | 1971. 1332 |
11.
1333 | R. M. McClure, 1334 | `TMG--A Syntax-Directed Compiler,' 1335 | Proc 20th ACM National Conf. (1968), pp. 262-74. 1336 |
12.
1337 | S. C. Johnson and B. W. Kernighan, 1338 | `The Programming Language B,' 1339 | Comp. Sci. Tech. Rep. #8, Bell Laboratories, 1340 | Murray Hill NJ (1973). 1341 |
13.
1342 | M. Richards, 1343 | `BCPL: A Tool for Compiler Writing and Systems Programming,' 1344 | Proc. AFIPS SJCC 1345 | 34 1346 | (1969), pp. 557-66. 1347 |
14.
1348 | B. W. Kernighan and 1349 | D. M. Ritchie, 1350 | The C Programming Language, 1351 | Prentice-Hall, Englewood Cliffs NJ, 1978. 1352 | Second Edition, 1979. 1353 |
15.
1354 | D. M. Ritchie, S. C. Johnson, and M. E. Lesk, 1355 | `The C Programming Language,' 1356 | Bell Sys. Tech. J. 1357 | 57 1358 | No. 6 1359 | (July-August 1978) pp. 1991-2019. 1360 |
1361 |

1362 | 1363 | Copyright © 1996 Lucent Technologies Inc. All rights reserved. 1364 | 1365 | --------------------------------------------------------------------------------