├── README
├── The_AWK_Programming_Language.djvu
├── The_AWK_Programming_Language_zh_CN.pdf
├── awkcode.txt
└── latex_src
    ├── Makefile
    ├── an_awk_tutorial.tex
    ├── answers_to_selected_exercises.tex
    ├── awk.tex
    ├── awk_summary.tex
    ├── data_processing.tex
    ├── epilog.tex
    ├── experiments_with_algorithms.tex
    ├── images
    │   ├── cover.pdf
    │   ├── heap_sort.eps
    │   ├── insertion_sort.eps
    │   ├── quicksort.eps
    │   ├── report3.eps
    │   ├── sort_cmp.eps
    │   └── traffic_deaths.eps
    ├── index.tex
    ├── little_languages.tex
    ├── preamble.tex
    ├── preface.tex
    ├── processing_words.tex
    ├── reports_and_databases.tex
    └── the_awk_language.tex


/README:
--------------------------------------------------------------------------------
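 1 | A Chinese translation of The AWK Programming Language (by Aho, Kernighan,
 2 | and Weinberger; Chinese title: AWK 程序设计语言), typeset with LaTeX. If you
 3 | have anything to say about it, email me at wuzhouhui250@gmail.com
 4 | 
 5 | Translation status:
 6 |         The translation is complete, though errors are unavoidable. If you find
 7 |         any while reading, please let me know (via an Issue, email, or a Pull
 8 |         Request; Pull Requests should be based on the review branch).
 9 | 
10 | Files:
11 |         The_AWK_Programming_Language.djvu       the original English edition
12 |         The_AWK_Programming_Language_zh_CN.pdf  the Chinese translation
13 |         awkcode.txt                             source code of the awk programs in the book
14 |         latex_src                               LaTeX sources
15 | 
16 | Notes:
17 |         The PDF on the twoside branch has the same content as the one on master,
18 |         but a slightly different layout: two-sided, with finer fonts, suitable for printing.
19 | 
20 | Build: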
21 | 	make -C latex_src/
22 | 
23 | 


--------------------------------------------------------------------------------
/The_AWK_Programming_Language.djvu:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wuzhouhui/awk/787d83cbf2a2f1686d026000a6054e531bb7b538/The_AWK_Programming_Language.djvu


--------------------------------------------------------------------------------
/The_AWK_Programming_Language_zh_CN.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wuzhouhui/awk/787d83cbf2a2f1686d026000a6054e531bb7b538/The_AWK_Programming_Language_zh_CN.pdf


--------------------------------------------------------------------------------
/latex_src/Makefile:
--------------------------------------------------------------------------------
 1 | 
 2 | all: awk.pdf
 3 | 
 4 | awk.pdf: *.tex
 5 | 	latexmk -pdf -pdflatex="xelatex" -use-make awk.tex
 6 | 
 7 | clean:
 8 | 	latexmk -CA
 9 | 
10 | .PHONY: all clean awk.pdf
11 | 


--------------------------------------------------------------------------------
/latex_src/an_awk_tutorial.tex:
--------------------------------------------------------------------------------
  1 | % vim: ts=4 sts=4 sw=4 et tw=75
  2 | 
  3 | \chapter{An Awk Tutorial}
  4 | \label{chap:an_awk_tutorial}
  5 | 
  6 | \marginpar{1}
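  7 | Awk is a convenient and expressive programming language that can be applied
  8 | to a wide variety of computing and data-processing tasks. This chapter is a brief
  9 | tutorial, meant to get you writing your own awk programs as quickly as possible.
 10 | Chapter~\ref{chap:the_awk_language} describes the whole awk language, and the remaining chapters show
 11 | how awk can be used to solve problems from many different areas. We hope you will find the examples
 12 | throughout the book useful, interesting, and instructive.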
 13 | 
 14 | \section{Getting Started}
 15 | \label{sec:getting_started}
 16 | 
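 17 | Useful awk programs are often short, just a line or two. Suppose you have a file
 18 | called \filename{emp.data} that contains the name, the hourly pay rate in dollars,
 19 | and the number of hours worked for each of your employees, one employee record per line, like this: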
 20 | \begin{awkcode}
 21 |     Beth    4.00    0
 22 |     Dan     3.75    0
 23 |     Kathy   4.00    10
 24 |     Mark    5.00    20
 25 |     Mary    5.50    22
 26 |     Susie   4.25    18
 27 | \end{awkcode}
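 28 | Now you want to print the name and pay (rate times hours) of every employee who has
 29 | worked more than zero hours. This kind of job is exactly what awk was designed for, so it's easy.
 30 | Just type this single line: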
 31 | \begin{awkcode}
 32 |     awk '$3 > 0 { print $1, $2 * $3 }' emp.data
 33 | \end{awkcode}
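 34 | This command tells the system to run awk with the program enclosed in single
 35 | \marginpar{2}%
 36 | quotes, taking its input from the file \filename{emp.data}. The part between the
 37 | quotes is a complete awk program. It consists of a single
 38 | \cterm{pattern-action statement}.
 39 | The pattern \verb'$3 > 0' is tested against every input line, and for each line whose
 40 | third column, or \cterm{field}, is greater than zero, the action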
 41 | \begin{awkcode}
 42 |     { print $1, $2 * $3 }
 43 | \end{awkcode}
 44 | is executed, printing the first field and the product of the second and third fields.
 45 | 
 46 | To find out which employees did no work at all, type
 47 | \begin{awkcode}
 48 |     awk '$3 == 0 { print $1 }' emp.data
 49 | \end{awkcode}
 50 | The pattern \verb'$3 == 0' matches each line in which the third field is zero, and the action
 51 | \begin{awkcode}
 52 |     { print $1 }
 53 | \end{awkcode}
 54 | prints its first field.
 55 | 
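 56 | As you read this book, try running and modifying the programs. Since most of them
 57 | are short, this is a quick way to understand how awk works. On a Unix system, the two
 58 | programs above can be run like this: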
 59 | \begin{awkcode}
 60 |     $ awk '$3 > 0 { print $1, $2 * $3 }' emp.data
 61 |     Kathy 40
 62 |     Mark 100
 63 |     Mary 121
 64 |     Susie 76.5
 65 |     $ awk '$3 == 0 { print $1 }' emp.data
 66 |     Beth
 67 |     Dan
 68 |     $
 69 | \end{awkcode}
 70 | The \verb'$' at the start of each command line is the shell's prompt; it may be different on your machine.
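
If you would like to follow along, the following POSIX shell sketch recreates \filename{emp.data} in the current directory and runs the two programs above. It is not part of the original text. The helper name \texttt{make\_emp\_data} is our own. The fields are separated by tabs, which awk's default field splitting treats the same as blanks.
\begin{awkcode}
    # make_emp_data is a hypothetical helper (not from the book): it writes
    # the six sample records, tab-separated, to emp.data.
    make_emp_data() {
        printf '%s\t%s\t%s\n' \
            Beth  4.00 0   Dan  3.75 0   Kathy 4.00 10 \
            Mark  5.00 20  Mary 5.50 22  Susie 4.25 18 > emp.data
    }

    make_emp_data
    awk '$3 > 0 { print $1, $2 * $3 }' emp.data   # pay of those who worked
    awk '$3 == 0 { print $1 }' emp.data           # names of those who did not
\end{awkcode}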
 71 | 
 72 | \subsection{The Structure of an AWK Program}
 73 | \label{subsec:the_structure_of_an_awk_program}
 74 | 
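 75 | Let's step back a moment and look at what is going on. In the command lines above, the parts
 76 | between the single quotes are programs written in the awk language. Each awk program in this chapter is a
 77 | sequence of one or more pattern-action statements: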
 78 | \begin{pattern}
 79 |     \textit{pattern} \texttt{\{} \textit{action} \texttt{\}} \par
 80 |     \textit{pattern} \texttt{\{} \textit{action} \texttt{\}} \par
 81 |     ...
 82 | \end{pattern}
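 83 | The basic operation of awk is to scan a sequence of input lines, one after another, searching for
 84 | lines that are \cterm{matched} by the patterns. The precise meaning of ``match'' depends on
 85 | the pattern in question; for \verb'$3 > 0', for example, it means ``the condition is true.''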
 86 | 
 87 | \marginpar{3}
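 88 | Every input line is tested against each of the patterns in turn. For each pattern that matches, the
 89 | corresponding action (which may involve several steps) is performed. Then the next line is read and the matching starts over. This continues
 90 | until all the input has been read.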
 91 | 
 92 | The programs above are typical examples of patterns and actions. The program
 93 | \begin{awkcode}
 94 |     $3 == 0 { print $1 }
 95 | \end{awkcode}
 96 | consists of a single pattern-action statement: for every line in which the third field is zero, the
 97 | first field is printed.
 98 | 
 99 | In a pattern-action statement, either the pattern or the action may be omitted,
100 | but not both. If a
101 | pattern has no action, for example,
102 | \begin{awkcode}
103 |     $3 == 0
104 | \end{awkcode}
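105 | then each line that the pattern matches (that is, each line for which the condition is true) is printed. This program prints the two lines
106 | of \filename{emp.data} in which the third field is zero: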
107 | \begin{awkcode}
108 |     Beth    4.00    0
109 |     Dan     3.75    0
110 | \end{awkcode}
111 | 
112 | If there is an action with no pattern, for example,
113 | \begin{awkcode}
114 |     { print $1 }
115 | \end{awkcode}
116 | then the action (in this case, printing the first field) is performed for every input line.
117 | 
118 | Since patterns and actions are both optional, actions are enclosed in braces to distinguish them from patterns.
119 | 
120 | \subsection{Running an AWK Program}
121 | \label{running_an_awk_program}
122 | 
123 | There are several ways to run an awk program. You can type a command line of the form
124 | \begin{pattern}
125 |     \texttt{awk} \texttt{'}\textit{program}\texttt{'} \textit{input files}
126 | \end{pattern}
127 | which runs \textit{program} on each line of the named input files. For example, you could type
128 | \begin{awkcode}
129 |     awk '$3 == 0 { print $1 }' file1 file2
130 | \end{awkcode}
131 | to print the first field of every line of \filename{file1} and \filename{file2}
132 | in which the third field is zero.
133 | 
134 | You can also omit the input files from the command line and just type
135 | \begin{pattern}
136 |     \texttt{awk} \texttt{'}\textit{program}\texttt{'}
137 | \end{pattern}
138 | In this case awk applies the \textit{program} to whatever you type next on your terminal,
139 | until you type an end-of-file signal (Control-d on Unix systems). Here is a sample session
140 | on Unix:
141 | \marginpar{4}
142 | \begin{pattern}
143 |     \indent\verb"$ awk '$3 == 0 { print $1 }'"\par
144 |     \indent\verb'Beth    4.00    0'\par
145 |     \indent\textbf{\texttt{Beth}}\par
146 |     \indent\verb'Dan     3.75    0'\par
147 |     \indent\textbf{\texttt{Dan}}\par
148 |     \indent\verb'Kathy   3.75    10'\par
149 |     \indent\verb'Kathy   3.75    0'\par
150 |     \indent\textbf{\texttt{Kathy}}\par
151 |     \indent\verb'...'\par
152 | \end{pattern}
153 | The characters printed by awk are shown in bold.
154 | 
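155 | This behavior makes it easy to experiment with awk: type your program, type some data, and check the output. We
156 | again encourage you to run and modify the programs in this book.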
157 | 
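158 | Notice that the program is enclosed in single quotes on the command line. This keeps characters such as
159 | \verb'$' in the program from being interpreted by the shell, and it also allows the program to be longer than one line.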
160 | 
161 | This arrangement is convenient when the program is short (a few lines). If the program is long, it is better
162 | to put it into a separate file, say \filename{progfile},
163 | and run it by typing
164 | \begin{pattern}
165 |     \texttt{awk -f progfile} \textit{optional list of files}
166 | \end{pattern}
167 | The \verb'-f' option tells awk to fetch the program from the named file. Any file name can be used
168 | in place of \filename{progfile}.
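
For example, the program \verb'$3 == 0 { print $1 }' could be stored in a file and run with \verb'-f' as in this sketch. The sketch writes a file named \filename{progfile} in the current directory, but any name would do. The data are supplied through a pipe rather than from \filename{emp.data}, so the example stands on its own:
\begin{awkcode}
    # Put the awk program in its own file (the name progfile is arbitrary).
    printf '%s\n' '$3 == 0 { print $1 }' > progfile

    # -f tells awk to read the program from that file; the data come from a pipe.
    printf 'Beth 4.00 0\nKathy 4.00 10\nDan 3.75 0\n' | awk -f progfile
\end{awkcode}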
169 | 
170 | \subsection{Errors}
171 | \label{subsec:errors}
172 | 
173 | If you make an error in an awk program, awk will give you a diagnostic message. For example, if you mistype
174 | a brace, like this:
175 | \begin{awkcode}
176 |     awk '$3 == 0 [ print $1 }' emp.data
177 | \end{awkcode}
178 | you will get a message like this (the exact wording varies between awk implementations):
179 | \begin{awkcode}
180 |     awk: syntax error at source line 1
181 |     context is
182 |             $3 == 0 >>> [ <<<
183 |             extra }
184 |             missing ]
185 |     awk: Bailing out at source line 1
186 | \end{awkcode}
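187 | ``Syntax error'' means that you have made a grammatical error, which was detected at the place marked by
188 | \verb'>>> <<<'. ``Bailing out'' means that no recovery was attempted. Sometimes you get a little more
189 | help about what the error was, such as a report of mismatched braces or parentheses.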
190 | 
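191 | Because of the syntax error, awk did not try to execute this program.
192 | Some errors, however, are not detected until your program is running. For example, if the program attempts to divide by zero, awk
193 | stops processing and reports the number of the input line and the line number in the program at which the
194 | division was attempted.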
195 | 
196 | \section{Simple Output}
197 | \label{sec:simple_output}
198 | 
199 | \marginpar{5}
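200 | The rest of this chapter contains a collection of short, typical awk programs, all of which
201 | process the file \filename{emp.data}. We'll explain briefly what each program does, but
202 | the examples are meant mainly to suggest useful operations that are easy to do with awk: printing fields,
203 | selecting input, and transforming data. We are not showing everything awk can do, nor are we
204 | going into much detail. By the end of this chapter, however, you will be able to
205 | accomplish quite a bit with awk, and you'll find the later chapters much easier to read.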
206 | 
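207 | We will usually show only the program itself, not the whole command line. In every case, the program can be run either
208 | by enclosing it in a pair of single quotes as the first argument of the \awk command, or by putting it in a
209 | file and running the \awk command with the \verb'-f' option.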
210 | 
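211 | Awk has only two data types: numbers and strings of characters. The file
212 | \filename{emp.data} is typical of the data awk processes: it contains both words and numbers, with
213 | fields separated by tabs or blanks.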
214 | 
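215 | Awk reads its input one line at a time and splits each line into fields (by default, a field is a
216 | sequence of characters that contains no blanks or tabs). The first field of the current input line is called \verb'$1', the second
217 | \verb'$2', and so forth. The entire line is called \verb'$0'. The number of fields can vary from line to line.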
218 | 
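219 | Often, all we need to do is print some or all of the fields of each line, perhaps doing some calculations along the way.
220 | The programs in this section are all of that form.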
221 | 
222 | \subsection{Printing Every Line}
223 | \label{subsec:printing_every_line}
224 | 
225 | If an action has no pattern, the action is performed for every input line. The \print statement by
226 | itself prints the current input line, so the program
227 | \begin{awkcode}
228 |     { print }
229 | \end{awkcode}
230 | prints all of its input on the standard output. Since \verb'$0' is the whole line, the program
231 | \begin{awkcode}
232 |     { print $0 }
233 | \end{awkcode}
234 | does the same job.
235 | 
236 | \subsection{Printing Certain Fields}
237 | \label{subsec:printing_certain_fields}
238 | 
239 | You can print several items on the same output line with a single \print statement. The program to print
240 | the first and third fields of each input line is
241 | \begin{awkcode}
242 |     { print $1, $3 }
243 | \end{awkcode}
244 | With \filename{emp.data} as input, it produces
245 | \marginpar{6}
246 | \begin{awkcode}
247 |     Beth 0
248 |     Dan 0
249 |     Kathy 10
250 |     Mark 20
251 |     Mary 22
252 |     Susie 18
253 | \end{awkcode}
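254 | Expressions separated by commas in a \print statement are, by default, separated by a single blank in the output. Each line produced by
255 | \print is terminated by a newline character. Both of these defaults can be changed; we'll show how in
256 | Chapter~\ref{chap:the_awk_language}.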
257 | 
258 | \subsection{\texttt{NF}, the Number of Fields}
259 | \label{subsec:nf_the_number_fields}
260 | 
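261 | It might appear that fields must always be referred to as \verb'$1', \verb'$2', and so on, but any
262 | expression can follow \verb'$' to denote a field number: the expression is evaluated, and its value is
263 | used as the field number. Awk counts the fields in the current input line and stores the count in a built-in
264 | variable called \nf. Thus the program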
265 | \begin{awkcode}
266 |     { print NF, $1, $NF }
267 | \end{awkcode}
268 | prints the number of fields, the first field, and the last field of each input line.
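
To see field splitting and \verb'$' expressions together, here is a small sketch run on made-up input with irregular spacing. The wrapper name \texttt{show\_fields} is hypothetical. Leading blanks are skipped and any run of blanks or tabs separates fields, so the second line has four fields. The last column shows why parentheses matter. \verb'$(NF-1)' is the next-to-last field, whereas \verb'$NF-1' is the value of the last field minus one.
\begin{awkcode}
    # show_fields is a hypothetical wrapper; the brackets make field
    # boundaries visible.
    show_fields() {
        awk '{ print NF, "[" $1 "]", "[" $(NF-1) "]", "[" $NF "]", $NF-1 }'
    }
    printf 'Beth    4.00\t0\n   Dan 3.75   0  7\n' | show_fields
\end{awkcode}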
269 | 
270 | \subsection{Computing and Printing}
271 | \label{subsec:computing_and_printing}
272 | 
273 | You can also do computations on the field values and include the results in what is printed. The program
274 | \begin{awkcode}
275 |     { print $1, $2 * $3 }
276 | \end{awkcode}
277 | is a typical example. It prints the name and pay (rate times hours) of each employee:
278 | \begin{awkcode}
279 |     Beth 0
280 |     Dan 0
281 |     Kathy 40
282 |     Mark 100
283 |     Mary 121
284 |     Susie 76.5
285 | \end{awkcode}
286 | We'll show in a moment how to make this output look nicer.
287 | 
288 | \subsection{Printing Line Numbers}
289 | \label{subsec:printing_line_numbers}
290 | 
291 | Awk provides another built-in variable, \nr, that counts the number of lines read so far.
292 | We can use \nr and \verb'$0' to prefix each line of \filename{emp.data} with its line number:
293 | \begin{awkcode}
294 |     { print NR, $0 }
295 | \end{awkcode}
296 | The output looks like this:
297 | \marginpar{7}
298 | \begin{awkcode}
299 |     1 Beth    4.00    0
300 |     2 Dan     3.75    0
301 |     3 Kathy   4.00    10
302 |     4 Mark    5.00    20
303 |     5 Mary    5.50    22
304 |     6 Susie   4.25    18
305 | \end{awkcode}
306 | 
307 | \subsection{Putting Text in the Output}
308 | \label{subsec:putting_text_in_the_output}
309 | 
310 | You can also print words in among the fields and computed values:
311 | \begin{awkcode}
312 |     { print "total pay for", $1, "is", $2 * $3 }
313 | \end{awkcode}
314 | prints
315 | \begin{awkcode}
316 |     total pay for Beth is 0
317 |     total pay for Dan is 0
318 |     total pay for Kathy is 40
319 |     total pay for Mark is 100
320 |     total pay for Mary is 121
321 |     total pay for Susie is 76.5
322 | \end{awkcode}
323 | In the \print statement, the text inside the double quotes is printed along with the fields and computed values.
324 | 
325 | \section{Fancier Output}
326 | \label{sec:fancier_output}
327 | 
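328 | The \print statement is meant for quick and easy output. To format the output exactly as you want, use the \printf
329 | statement. As we shall see in Section~\ref{sec:output}, \printf can produce almost
330 | any kind of output, but in this section we'll show only a few of its capabilities.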
331 | 
332 | \subsection{Lining Up Fields}
333 | \label{subsec:lining_up_fields}
334 | 
335 | The \printf statement has the form
336 | \begin{pattern}
337 |     \texttt{printf(}\textit{format}\texttt{,} \textit{value$_1$}\texttt{,}
338 |     \textit{value$_2$}\texttt{, ... ,} \textit{value$_n$}\texttt{)}
339 | \end{pattern}
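340 | Here \textit{format} is a string that contains text to be printed verbatim, interspersed with specifications
341 | that say how the values are to be printed. A specification is a \verb'%' followed by a few
342 | characters that control the format of a \textit{value}. The first specification tells how
343 | \textit{value$_1$} is to be printed, the second how \textit{value$_2$} is to be
344 | printed, and so on. Thus there should be as many specifications as there are
345 | \textit{value}s to print.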
346 | 
347 | This program uses \printf to print the total pay of every employee:
348 | \begin{awkcode}
349 |     { printf("total pay for %s is $%.2f\n", $1, $2 * $3) }
350 | \end{awkcode}
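351 | The format string in this \printf statement contains two specifications. The first,
352 | \marginpar{8}%
353 | \verb'%s', says to print the first value, \verb'$1', as a string of characters; the second,
354 | \verb'%.2f', says to print the second value, \verb'$2*$3', as a number with two digits after the decimal
355 | point. Everything else in the format
356 | string, including the dollar sign, is printed verbatim; the \verb'\n' at the end of the string stands for a
357 | newline, which causes subsequent output to begin on the next line. With \filename{emp.data} as input,
358 | this program prints: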
359 | \begin{awkcode}
360 |     total pay for Beth is $0.00
361 |     total pay for Dan is $0.00
362 |     total pay for Kathy is $40.00
363 |     total pay for Mark is $100.00
364 |     total pay for Mary is $121.00
365 |     total pay for Susie is $76.50
366 | \end{awkcode}
367 | With \printf, no blanks or newlines are produced automatically; you must create them yourself. Don't forget the
368 | \verb'\n'.
369 | 
370 | Another program prints each employee's name and pay:
371 | \begin{awkcode}
372 |     { printf("%-8s $%6.2f\n", $1, $2 * $3) }
373 | \end{awkcode}
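374 | The first specification, \verb'%-8s', prints a name as a string of characters, left-justified in a field 8 characters wide. The second
375 | specification, \verb'%6.2f', prints the pay as a number with two digits after the decimal point,
376 | in a field at least
377 | 6 characters wide: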
378 | \begin{awkcode}
379 |     Beth     $  0.00
380 |     Dan      $  0.00
381 |     Kathy    $ 40.00
382 |     Mark     $100.00
383 |     Mary     $121.00
384 |     Susie    $ 76.50
385 | \end{awkcode}
386 | We'll show more examples of \printf as we go along; the complete description is in
387 | Section~\ref{sec:output}.
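
The effect of a field width is easiest to see with delimiters around each conversion. This sketch prints the name left-justified and then right-justified in 8 columns, followed by the pay in a field at least 6 characters wide. The helper name \texttt{show\_widths} is our own.
\begin{awkcode}
    # show_widths is a hypothetical helper; the brackets make the padding visible.
    show_widths() {
        awk '{ printf("[%-8s] [%8s] [%6.2f]\n", $1, $1, $2 * $3) }'
    }
    printf 'Kathy 4.00 10\nSusie 4.25 18\n' | show_widths
\end{awkcode}
For Kathy this should print \verb'[Kathy   ] [   Kathy] [ 40.00]'.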
388 | 
389 | \subsection{Sorting the Output}
390 | \label{subsec:sorting_the_output}
391 | 
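392 | Suppose you want to print all the data for each employee, along with his or her pay, sorted in order of increasing pay.
393 | The easiest way is to use awk to prefix each employee's record with the pay, and then run the output through a sort
394 | program. On Unix, the command line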
395 | \begin{awkcode}
396 |     awk '{ printf("%6.2f %s\n", $2 * $3, $0) }' emp.data | sort -n
397 | \end{awkcode}
398 | pipes the output of awk into the \texttt{sort} command, and produces:
399 | \marginpar{9}
400 | \begin{awkcode}
401 |       0.00 Beth    4.00    0
402 |       0.00 Dan     3.75    0
403 |      40.00 Kathy   4.00    10
404 |      76.50 Susie   4.25    18
405 |     100.00 Mark    5.00    20
406 |     121.00 Mary    5.50    22
407 | \end{awkcode}
408 | 
409 | \section{Selection}
410 | \label{sec:selection}
411 | 
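412 | Awk patterns are good for selecting interesting lines from the input for further processing. Since a pattern without
413 | an action prints all the lines it matches, many awk programs consist of nothing more than a single pattern.
414 | This section gives some examples of particularly useful patterns.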
415 | 
416 | \subsection{Selection by Comparison}
417 | \label{subsec:selection_by_comparison}
418 | 
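419 | This program uses a comparison pattern to select the records of employees who earn \$5.00 or more
420 | per hour, that is, lines in which the second field is greater than or equal to 5: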
421 | \begin{awkcode}
422 |     $2 >= 5
423 | \end{awkcode}
424 | It selects these lines from \filename{emp.data}:
425 | \begin{awkcode}
426 |     Mark    5.00    20
427 |     Mary    5.50    22
428 | \end{awkcode}
429 | 
430 | \subsection{Selection by Computation}
431 | \label{subsec:selection_by_computation}
432 | 
433 | The program
434 | \begin{awkcode}
435 |     $2 * $3 > 50 { printf("$%.2f for %s\n", $2 * $3, $1) }
436 | \end{awkcode}
437 | prints the pay of those employees whose total pay exceeds \$50:
438 | \begin{awkcode}
439 |     $100.00 for Mark
440 |     $121.00 for Mary
441 |     $76.50 for Susie
442 | \end{awkcode}
443 | 
444 | \subsection{Selection by Text Content}
445 | \label{subsec:selection_by_text_content}
446 | 
447 | Besides numeric tests, you can select input lines that contain specific words or phrases. This program prints all lines
448 | in which the first field is \texttt{Susie}:
449 | \begin{awkcode}
450 |     $1 == "Susie"
451 | \end{awkcode}
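452 | The operator \texttt{==} tests for equality. You can also look for text containing any letters, words, or phrases
453 | by means of patterns called \cterm{regular expressions}.
454 | This program prints all lines that contain \texttt{Susie} anywhere: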
455 | \marginpar{10}
456 | \begin{awkcode}
457 |     /Susie/
458 | \end{awkcode}
459 | The output is
460 | \begin{awkcode}
461 |     Susie   4.25    18
462 | \end{awkcode}
463 | Regular expressions can be used to specify much more elaborate patterns; Section~\ref{sec:patterns} contains
464 | a full discussion.
465 | 
466 | \subsection{Combinations of Patterns}
467 | \label{subsec:combinations_of_patterns}
468 | 
469 | Patterns can be combined with parentheses and the logical operators \AND, \OR, and \NOT,
470 | which stand for AND, OR, and NOT. The program
471 | \begin{awkcode}
472 |     $2 >= 4 || $3 >= 20
473 | \end{awkcode}
474 | prints those lines where \verb'$2' is at least 4 or \verb'$3' is at least 20:
475 | \begin{awkcode}
476 |     Beth    4.00    0
477 |     Kathy   4.00    10
478 |     Mark    5.00    20
479 |     Mary    5.50    22
480 |     Susie   4.25    18
481 | \end{awkcode}
482 | Lines that satisfy both conditions are printed only once. Contrast this with the following program, which consists of two
483 | patterns:
484 | \begin{awkcode}
485 |     $2 >= 4
486 |     $3 >= 20
487 | \end{awkcode}
488 | If an input line satisfies both conditions, it is printed twice:
489 | \begin{awkcode}
490 |     Beth    4.00    0
491 |     Kathy   4.00    10
492 |     Mark    5.00    20
493 |     Mark    5.00    20
494 |     Mary    5.50    22
495 |     Mary    5.50    22
496 |     Susie   4.25    18
497 | \end{awkcode}
498 | Note that the program
499 | \begin{awkcode}
500 |     !($2 < 4 && $3 < 20)
501 | \end{awkcode}
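502 | prints lines where it is not true that \verb'$2' is less than 4 and \verb'$3' is also less than 20; this
503 | condition is equivalent to the first one above, though perhaps less readable.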
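
The equivalence is an instance of De Morgan's law. ``Not (A and B)'' is the same as ``(not A) or (not B)'', and for numbers the negation of \verb'$2 < 4' is \verb'$2 >= 4'. The following sketch checks this mechanically by running both programs on the same input. \texttt{same\_selection} is a hypothetical helper name, and the check assumes the second and third fields are numeric.
\begin{awkcode}
    # same_selection is a hypothetical helper: it reads the data once from
    # standard input, runs both programs on it, and compares their output.
    same_selection() {
        data=$(cat)
        a=$(printf '%s\n' "$data" | awk '$2 >= 4 || $3 >= 20')
        b=$(printf '%s\n' "$data" | awk '!($2 < 4 && $3 < 20)')
        if [ "$a" = "$b" ]; then echo "same lines selected"; else echo "different"; fi
    }
    printf 'Beth 4.00 0\nDan 3.75 0\nMark 5.00 20\nZoe 2.00 25\n' | same_selection
\end{awkcode}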
504 | 
505 | \subsection{Data Validation}
506 | \label{subsec:data_validation}
507 | 
508 | There are always errors in real data. Checking that data has reasonable values and is in the right format
509 | is a task often called \cterm{data validation}, and awk is an excellent tool
510 | for it.
511 | 
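512 | Data validation is essentially negative: instead of printing lines with desirable properties, one prints lines that are suspicious. The following
513 | program uses comparison patterns to apply five plausibility tests to each line of \filename{emp.data}: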
514 | \marginpar{11}
515 | \begin{awkcode}
516 |     NF != 3   { print $0, "number of fields is not equal to 3" }
517 |     $2 < 3.35 { print $0, "rate is below minimum wage" }
518 |     $2 > 10   { print $0, "rate exceeds $10 per hour" }
519 |     $3 < 0    { print $0, "negative hours worked" }
520 |     $3 > 60   { print $0, "too many hours worked" }
521 | \end{awkcode}
522 | If there are no errors, there is no output.
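
To see the checks fire, you can feed the program a few deliberately bad records. In this sketch the program is wrapped in a shell function named \texttt{validate}, which is our name and not the book's. The sample records are made up. Joe's line fails two tests, and Ann's line has an extra field.
\begin{awkcode}
    # validate is a hypothetical wrapper around the five checks above.
    validate() {
        awk '
            NF != 3   { print $0, "number of fields is not equal to 3" }
            $2 < 3.35 { print $0, "rate is below minimum wage" }
            $2 > 10   { print $0, "rate exceeds $10 per hour" }
            $3 < 0    { print $0, "negative hours worked" }
            $3 > 60   { print $0, "too many hours worked" }
        '
    }
    printf 'Beth 4.00 0\nJoe 2.50 70\nAnn 5.00 10 extra\n' | validate
\end{awkcode}
Beth's record passes every test, so it contributes nothing to the output.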
523 | 
524 | \subsection{\BEGIN and \END}
525 | \label{subsec:an_awk_tutorial_begin_and_end}
526 | 
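527 | The special pattern \BEGIN matches before the first line of the first input file is read, and \END matches after the last line of the last input
528 | file has been processed. This program uses \BEGIN to print a heading: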
529 | \begin{awkcode}
530 |     BEGIN { print "NAME     RATE    HOURS"; print "" }
531 |           { print }
532 | \end{awkcode}
533 | The output is
534 | \begin{awkcode}
535 |     NAME     RATE    HOURS
536 | 
537 |     Beth    4.00    0
538 |     Dan     3.75    0
539 |     Kathy   4.00    10
540 |     Mark    5.00    20
541 |     Mary    5.50    22
542 |     Susie   4.25    18
543 | \end{awkcode}
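544 | You can put several statements on a single line if you separate them by semicolons. Notice that \verb'print ""'
545 | prints a blank line, which is quite different from a plain \print, which prints the current input line.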
546 | 
547 | \section{Computing with AWK}
548 | \label{sec:computing_with_awk}
549 | 
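550 | An action is a sequence of statements separated by newlines or semicolons. You have already seen actions consisting of
551 | a single \print statement. This section provides examples of statements for performing simple
552 | numeric and string computations. In these statements you can use not only built-in variables like \nf, but
553 | also variables of your own for calculating, storing data, and the like. In awk, user-created
554 | variables need not be declared before they are used.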
555 | 
556 | \subsection{Counting}
557 | \label{subsec:counting}
558 | 
559 | This program uses a variable \texttt{emp} to count the employees who have worked more than 15 hours:
560 | \marginpar{12}
561 | \begin{awkcode}
562 |     $3 > 15 { emp = emp + 1 }
563 |     END     { print emp, "employees worked more than 15 hours" }
564 | \end{awkcode}
565 | For every line in which the third field exceeds 15, the value of \texttt{emp} is incremented by 1. With
566 | \filename{emp.data} as input, this program yields:
567 | \begin{awkcode}
568 |     3 employees worked more than 15 hours
569 | \end{awkcode}
570 | Awk variables used as numbers begin life with the value 0, so we didn't need to initialize
571 | \texttt{emp}.
572 | 
573 | \subsection{Computing Sums and Averages}
574 | \label{subsec:computing_sums_and_averages}
575 | 
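576 | To count the number of employees, we can use the built-in variable \nr, which holds the number of lines read
577 | so far; its value at the end of all input is the total number of lines read.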
578 | \begin{awkcode}
579 |     END { print NR, "employees" }
580 | \end{awkcode}
581 | The output is:
582 | \begin{awkcode}
583 |     6 employees
584 | \end{awkcode}
585 | 
586 | Here is a program that uses \nr to compute the average pay:
587 | \begin{awkcode}
588 |         { pay = pay + $2 * $3 }
589 |     END { print NR, "employees"
590 |           print "total pay is", pay
591 |           print "average pay is", pay / NR
592 |         }
593 | \end{awkcode}
594 | The first action accumulates the total pay for all employees. The \END action prints:
595 | \begin{awkcode}
596 |     6 employees
597 |     total pay is 337.5
598 |     average pay is 56.25
599 | \end{awkcode}
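600 | Clearly, \printf could be used to produce neater output. There is also a potential error: in the unlikely case
601 | that \nr is zero (the input is empty), the program will attempt to divide by zero, and awk will generate an
602 | error message.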
603 | 
604 | \subsection{Handling Text}
605 | \label{subsec:handling_text}
606 | 
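607 | One of the strengths of awk is that it can handle strings of characters as conveniently as most other languages handle
608 | numbers. Awk variables can hold strings as well as numbers. This program finds the employee who is paid the most
609 | per hour: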
610 | \begin{awkcode}
611 |     $2 > maxrate { maxrate = $2; maxemp = $1 }
612 |     END { print "highest hourly rate:", maxrate, "for", maxemp }
613 | \end{awkcode}
614 | It prints:
615 | \marginpar{13}
616 | \begin{awkcode}
617 |     highest hourly rate: 5.50 for Mary
618 | \end{awkcode}
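619 | In this program the variable \texttt{maxrate} holds a numeric value, while \texttt{maxemp} holds
620 | a string. (If several employees share the same maximum hourly rate, this program prints only the name of the
621 | first one.)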
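
If you want every employee who shares the top rate, one possible variant tests for equality before testing for a new maximum, and appends names on ties. This is our own sketch, not the book's program. It assumes all rates are positive, because \texttt{maxrate} starts out as zero. The wrapper name \texttt{top\_rate} is hypothetical.
\begin{awkcode}
    # top_rate is a hypothetical wrapper. The equality test comes first so
    # that a line setting a new maximum is not also counted as a tie.
    top_rate() {
        awk '
            $2 == maxrate { maxemp = maxemp ", " $1 }
            $2 > maxrate  { maxrate = $2; maxemp = $1 }
            END { print "highest hourly rate:", maxrate, "for", maxemp }
        '
    }
    printf 'Mark 5.00 20\nMary 5.50 22\nAnn 5.50 10\n' | top_rate
\end{awkcode}
If the two patterns were swapped, the line that sets the maximum would immediately match the equality test as well, and its name would appear twice.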
622 | 
623 | \subsection{String Concatenation}
624 | \label{subsec:string_concatenation}
625 | 
626 | New strings may be created by combining old ones; this operation is called \cterm{concatenation}.
627 | The program
628 | \begin{awkcode}
629 |         { names = names $1 " " }
630 |     END { print names }
631 | \end{awkcode}
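632 | collects all the employee names into a single string, each concatenation appending a name and a blank
633 | to the previous value of the variable \texttt{names}. The value of \texttt{names} is printed by the \END
634 | action: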
635 | \begin{awkcode}
636 |     Beth Dan Kathy Mark Mary Susie 
637 | \end{awkcode}
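638 | In an awk program, concatenation is expressed by writing string values one after the other. For every input line,
639 | the first statement in the program concatenates three strings: the previous value of \texttt{names},
640 | the first field, and a blank; it then assigns the resulting string to \texttt{names}. Thus, after
641 | all input lines have been read, \texttt{names} contains a single string consisting of the names of all the employees,
642 | each
643 | followed by a blank. Variables used to store strings begin life holding the null string (that
644 | is, the string containing no characters), so in this program \texttt{names} did not need to be explicitly
645 | initialized.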
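
The result above ends with a blank. A common way to avoid that is to concatenate a separator variable that is empty the first time through and a blank afterwards. This variation is ours and is shown as a sketch. \texttt{join\_names} and \texttt{sep} are hypothetical names.
\begin{awkcode}
    # join_names is a hypothetical wrapper. sep is the empty string on the
    # first line (it is uninitialized) and a blank on every later line.
    join_names() {
        awk '    { names = names sep $1; sep = " " }
             END { print names }'
    }
    printf 'Beth 4.00 0\nDan 3.75 0\nKathy 4.00 10\n' | join_names
\end{awkcode}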
646 | 
647 | \subsection{Printing the Last Input Line}
648 | \label{subsec:printing_the_last_input_line}
649 | 
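650 | Although \nr keeps its value in an \END action, \verb'$0' is not guaranteed to do so: the original awk discarded it, although gawk, mawk, and Kernighan's own awk now keep the last input line there. The program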
651 | \begin{awkcode}
652 |         { last = $0 }
653 |     END { print last }
654 | \end{awkcode}
655 | is a portable way to print the last input line:
656 | \begin{awkcode}
657 |     Susie   4.25    18
658 | \end{awkcode}
659 | 
660 | \subsection{Built-in Functions}
661 | \label{subsec:built_in_functions}
662 | 
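663 | We have already seen that awk provides built-in variables that maintain frequently used quantities like
664 | the number of fields and the input line number. Similarly, there are built-in functions for computing
665 | other useful values. Besides arithmetic functions for square roots, logarithms, random numbers, and the like, there are also functions that manipulate text.
666 | One of them is \texttt{length}, which counts the number of characters in a string. For example,
667 | this program computes the length of each person's name: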
668 | \begin{awkcode}
669 |     { print $1, length($1) }
670 | \end{awkcode}
671 | The result is:
672 | \marginpar{14}
673 | \begin{awkcode}
674 |     Beth 4
675 |     Dan 3
676 |     Kathy 5
677 |     Mark 4
678 |     Mary 4
679 |     Susie 5
680 | \end{awkcode}
681 | 
682 | \subsection{Counting Lines, Words, and Characters}
683 | \label{subsec:counting_lines_words_and_characters}
684 | 
685 | This program uses \length, \nf, and \nr to count the number of lines, words, and characters in the input. For convenience,
686 | we treat each field as a word.
687 | \begin{awkcode}
688 |     { nc = nc + length($0) + 1
689 |       nw = nw + NF
690 |     }
691 |     END { print NR, "lines,", nw, "words,", nc, "characters" }
692 | \end{awkcode}
693 | The file \filename{emp.data} has
694 | \begin{awkcode}
695 |     6 lines, 18 words, 77 characters
696 | \end{awkcode}
697 | We add one for the newline character at the end of each input line, because \verb'$0' doesn't include it.
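
On plain ASCII input you can check these counts against the Unix \texttt{wc} command. The sketch below runs both on the same two lines, and the wrapper name \texttt{awk\_wc} is ours. There are two caveats. First, the program counts one newline per line even if the last line has none. Second, in a UTF-8 locale gawk's \length counts characters while \texttt{wc -c} counts bytes.
\begin{awkcode}
    # awk_wc is a hypothetical wrapper around the counting program above.
    awk_wc() {
        awk '{ nc = nc + length($0) + 1
               nw = nw + NF
             }
             END { print NR, "lines,", nw, "words,", nc, "characters" }'
    }
    printf 'Beth 4.00 0\nDan 3.75 0\n' | awk_wc   # 2 lines, 6 words, 23 characters
    printf 'Beth 4.00 0\nDan 3.75 0\n' | wc       # same numbers; spacing varies
\end{awkcode}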
698 | 
699 | \section{Control-Flow Statements}
700 | \label{sec:control_flow_statements}
701 | 
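702 | Awk provides an \verb'if-else' statement for making decisions and several statements for writing loops, all modeled on
703 | those of the C programming language. They can be used only in actions.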
704 | 
705 | \subsection{\texttt{If-Else} Statement}
706 | \label{subsec:if_else_statement}
707 | 
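708 | The following program computes the total and average pay of employees making more than \$6.00 an hour. It uses
709 | an \texttt{if} statement to avoid dividing by zero when computing the average.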
710 | \begin{awkcode}
711 |     $2 > 6 { n = n + 1; pay = pay + $2 * $3 }
712 |     END    { if (n > 0)
713 |                print n, "employees, total pay is", pay,
714 |                         "average pay is", pay/n
715 |            else
716 |                print "no employees are paid more than $6/hour"
717 |            }
718 | \end{awkcode}
719 | The output for \filename{emp.data} is:
720 | \begin{awkcode}
721 |     no employees are paid more than $6/hour
722 | \end{awkcode}
723 | \marginpar{15}
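724 | In the \verb'if-else' statement, the condition following the \verb'if' is evaluated. If it is true, the first
725 | \print statement is performed; otherwise, the second \print statement is performed. Note that we can continue a long statement
726 | over several lines by breaking it after a comma.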
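
To try other cutoffs without editing the program, you can pass the threshold in with awk's \verb'-v' option, which assigns a variable before any input is read. This is a sketch. The function name \texttt{avg\_above} and the variable name \texttt{limit} are our own choices, and the \filename{emp.data} records are written inline.
\begin{awkcode}
    # avg_above is a hypothetical wrapper; its first argument becomes the awk
    # variable limit, and the data are read from standard input.
    avg_above() {
        awk -v limit="$1" '
            $2 > limit { n = n + 1; pay = pay + $2 * $3 }
            END { if (n > 0)
                      print n, "employees, total pay is", pay,
                               "average pay is", pay/n
                  else
                      print "no employees are paid more than $" limit "/hour"
                }'
    }
    printf '%s %s %s\n' Beth 4.00 0   Dan 3.75 0    Kathy 4.00 10 \
                        Mark 5.00 20  Mary 5.50 22  Susie 4.25 18 | avg_above 5
\end{awkcode}
With a limit of 5 only Mary qualifies, because Mark's rate of 5.00 is not greater than 5. With a limit of 6 the \texttt{else} branch runs.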
727 | 
728 | \subsection{\texttt{While} Statement}
729 | \label{subsec:while_statement}
730 | 
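731 | A \texttt{while} statement has a condition and a body. The statements in the body are performed repeatedly while the condition is
732 | true. This program shows how the value of an amount of money invested at a particular interest rate grows over a number of
733 | years, using the formula $\mathit{value} = \mathit{amount}\,(1 + \mathit{rate})^{\mathit{years}}$.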
734 | \begin{awkcode}
735 |     # interest1 - compute compound interest
736 |     #   input:  amount  rate  years
737 |     #   output: compounded value at the end of each year
738 | 
739 |     {   i = 1
740 |         while (i <= $3) {
741 |             printf("\t%.2f\n", $1 * (1 + $2) ^ i)
742 |             i = i + 1
743 |         }
744 |     }
745 | \end{awkcode}
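746 | The condition is the parenthesized expression after the \while; the loop body is the two statements enclosed
747 | in
748 | braces after the condition. The \verb'\t' in the \printf format string stands for a tab
749 | character; \verb'^' is the exponentiation operator. Text from a sharp sign (\verb'#') to the end of the line is a
750 | \cterm{comment}, which is ignored by awk but helps other people understand the
751 | program.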
752 | 
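753 | You can type triplets of numbers at this program to see what various amounts, rates, and years produce. For
754 | example, this transaction shows how \$1000 grows at $6\%$ and $12\%$ compound interest
755 | over five years: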
756 | \begin{awkcode}
757 |     $ awk -f interest1
758 |     1000 .06 5
759 |             1060.00
760 |             1123.60
761 |             1191.02
762 |             1262.48
763 |             1338.23
764 |     1000 .12 5
765 |             1120.00
766 |             1254.40
767 |             1404.93
768 |             1573.52
769 |             1762.34
770 | \end{awkcode}
771 | 
772 | \subsection{\texttt{For} Statement}
773 | \label{subsec:for_statement}
774 | \marginpar{16}
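775 | Most loops involve initialization, testing, and incrementing, and the \texttt{for} statement compresses all three into one line.
776 | Here is the previous interest computation, this time with a \for loop: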
777 | \begin{awkcode}
778 |     # interest2 - compute compound interest
779 |     #   input:  amount  rate  years
780 |     #   output: compounded value at the end of each year
781 | 
782 |     {   for (i = 1; i <= $3; i = i + 1)
783 |             printf("\t%.2f\n", $1 * (1 + $2) ^ i)
784 |     }
785 | \end{awkcode}
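786 | The initialization \texttt{i = 1} is performed once. Next, the condition \verb'i <= $3' is
787 | tested;
788 | if it is true, the \printf statement, which is the body of the loop, is performed. After the body, the increment
789 | \texttt{i = i + 1} is performed, and the next iteration of the loop begins with another test of the condition. The code is
790 | more compact, and since the body of the loop is a single statement, no braces are needed.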
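
Because the \for loop keeps the year counter \texttt{i} in plain view, it is easy to print the counter alongside each value. Here is a small variation that reads the amount, rate, and years from standard input. It is a sketch, and the wrapper name \texttt{interest\_table} is hypothetical.
\begin{awkcode}
    # interest_table is a hypothetical wrapper around a variant of interest2
    # that prints the year number before each compounded value.
    interest_table() {
        awk '{ for (i = 1; i <= $3; i = i + 1)
                   printf("%d\t%.2f\n", i, $1 * (1 + $2) ^ i)
             }'
    }
    echo '1000 .06 5' | interest_table
\end{awkcode}
The values match the session shown earlier, from 1060.00 in year 1 to 1338.23 in year 5.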
791 | 
792 | \section{Arrays}
793 | \label{sec:arrays}
794 | 
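795 | Awk provides arrays for storing groups of related values. Although arrays give awk considerable power,
796 | we will show only
797 | a simple example here. The following program prints its input in reverse order by line. The first
798 | action puts each input line into the next element of the array \texttt{line}; that is, the first line goes into
799 | \texttt{line[1]}, the second line into \texttt{line[2]}, and so on. The \END action uses a
800 | \while loop to print the lines from the array, from the last element back to the first: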
801 | \begin{awkcode}
802 |     # reverse - print input in reverse order by line
803 | 
804 |         { line[NR] = $0 }  # remember each input line
805 | 
806 |     END { i = NR           # print lines in reverse order
807 |           while (i > 0) {
808 |               print line[i]
809 |               i = i - 1
810 |           }
811 |         }
812 | \end{awkcode}
813 | With \filename{emp.data} as input, the output is
814 | \begin{awkcode}
815 |     Susie   4.25    18
816 |     Mary    5.50    22
817 |     Mark    5.00    20
818 |     Kathy   4.00    10
819 |     Dan     3.75    0
820 |     Beth    4.00    0
821 | \end{awkcode}
822 | Here is the same program written with a \for loop:
823 | \marginpar{17}
824 | \begin{awkcode}
825 |     # reverse - print input in reverse order by line
826 | 
827 |         { line[NR] = $0 }  # remember each input line
828 | 
829 |     END { for (i = NR; i > 0; i = i - 1)
830 |               print line[i]
831 |         }
832 | \end{awkcode}
833 | 
834 | \section{A Handful of Useful ``One-liners''}
835 | \label{sec:a_handful_of_useful_one_liners}
836 | 
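837 | Although awk can be used to write programs of some complexity, many useful programs are not much more complicated than what we've
838 | seen so far. Here is a collection of short programs that you might find handy or instructive. Most
839 | are variations on programs we have already discussed.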
840 | \begin{enumerate}
841 | \item Print the total number of input lines
842 | \begin{awkcode}
843 |     END { print NR }
844 | \end{awkcode}
845 | \item Print the tenth input line
846 | \begin{awkcode}
847 |     NR == 10
848 | \end{awkcode}
849 | \item Print the last field of every input line
850 | \begin{awkcode}
851 |     { print $NF }
852 | \end{awkcode}
853 | \item Print the last field of the last input line
854 | \begin{awkcode}
855 |         { field = $NF }
856 |     END { print field }
857 | \end{awkcode}
858 | \item Print every input line with more than four fields
859 | \begin{awkcode}
860 |     NF > 4
861 | \end{awkcode}
862 | \item Print every input line in which the last field is greater than 4
863 | \begin{awkcode}
864 |     $NF > 4
865 | \end{awkcode}
866 | \item Print the total number of fields in all input lines
867 | \begin{awkcode}
868 |     { nf = nf + NF }
869 |     END { print nf }
870 | \end{awkcode}
871 | \item Print the total number of lines that contain \texttt{Beth}
872 | \begin{awkcode}
873 |     /Beth/ { nlines = nlines + 1 }
874 |     END { print nlines }
875 | \end{awkcode}
876 | \item Print the largest first field and the line that contains it (assumes that at least one
877 | \marginpar{18}%
878 |     \verb'$1' is positive)
879 | \begin{awkcode}
880 |     $1 > max { max = $1; maxline = $0 }
881 |     END { print max, maxline }
882 | \end{awkcode}
883 | \item Print every line that has at least one field
884 | \begin{awkcode}
885 |     NF > 0
886 | \end{awkcode}
887 | \item Print every line longer than 80 characters
888 | \begin{awkcode}
889 |     length($0) > 80
890 | \end{awkcode}
891 | \item Print the number of fields in every line, followed by the line itself
892 | \begin{awkcode}
893 |     { print NF, $0 }
894 | \end{awkcode}
895 | \item Print the first two fields of every line, in opposite order
896 | \begin{awkcode}
897 |     { print $2, $1 }
898 | \end{awkcode}
899 | \item Exchange the first two fields of every line and then print the line
900 | \begin{awkcode}
901 |     { temp = $1; $1 = $2; $2 = temp; print }
902 | \end{awkcode}
903 | \item Print every line with the first field replaced by the line number
904 | \begin{awkcode}
905 |     { $1 = NR; print }
906 | \end{awkcode}
907 | \item Print every line after erasing the second field
908 | \begin{awkcode}
909 |     { $2 = ""; print }
910 | \end{awkcode}
911 | \item Print the fields of every line in reverse order
912 | \begin{awkcode}
913 |     { for (i = NF; i > 0; i = i - 1) printf("%s ", $i)
914 |       printf("\n")
915 |     }
916 | \end{awkcode}
917 | \item Print the sum of the fields of every line
918 | \begin{awkcode}
919 |     { sum = 0
920 |       for (i = 1; i <= NF; i = i + 1) sum = sum + $i
921 |       print sum
922 |     }
923 | \end{awkcode}
924 | \item Add up all the fields in all the lines and print the sum
925 | \begin{awkcode}
926 |         { for (i = 1; i <= NF; i = i + 1) sum = sum + $i }
927 |     END { print sum }
928 | \end{awkcode}
929 | \item Print every line after replacing each field by its absolute value
930 | \begin{awkcode}
931 |     { for (i = 1; i <= NF; i = i + 1) if ($i < 0) $i = -$i
932 |       print
933 |     }
934 | \end{awkcode}
935 | \end{enumerate}
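
Some of these one-liners have effects that are easy to miss. Assigning to a field makes awk rebuild \verb'$0' from all the fields, joined by single blanks (the default output field separator). As a result, erasing the second field leaves two blanks between its neighbours, and the original spacing of the line is lost. The sketch below wraps three of the programs in shell functions with made-up names (\texttt{swap12}, \texttt{erase2}, \texttt{last\_of\_last}) so that you can try them on a line or two of input:
\begin{awkcode}
    # Hypothetical wrappers around three of the one-liners above.
    swap12()       { awk '{ temp = $1; $1 = $2; $2 = temp; print }'; }
    erase2()       { awk '{ $2 = ""; print }'; }
    last_of_last() { awk '{ field = $NF } END { print field }'; }

    printf 'Beth    4.00    0\n' | swap12      # fields rejoined by single blanks
    printf 'Beth    4.00    0\n' | erase2      # two blanks are left behind
    printf 'Beth 4.00 0\nSusie 4.25 18\n' | last_of_last
\end{awkcode}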
936 | 
937 | \section{What Next?}
938 | \label{sec:what_next}
939 | \marginpar{19}
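940 | You have now seen the essentials of awk. Each program in this chapter has been
941 | a sequence of pattern-action statements.
942 | Awk tests every input line against the patterns, and when a pattern matches, performs the corresponding action. Patterns
943 | can involve numeric and string comparisons, and actions can include computation and formatted printing. Besides reading
944 | your input files automatically, awk splits each input line into fields. It also provides a number of
945 | built-in variables and functions, and lets you define your own as well. With this combination of features, many useful
946 | computations can be expressed by short programs; many of the details that another language would make you handle
947 | explicitly are handled implicitly by awk.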
948 | 
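949 | The rest of the book elaborates on these basic ideas. Since some of the examples are quite a bit bigger than the ones
950 | in this chapter, we strongly encourage you to start writing programs as soon as possible. This will make you more familiar with awk
951 | and make it easier to understand the larger programs. Furthermore, nothing answers a question so well as a simple
952 | experiment. You should also browse through the whole book; each example conveys something about awk, either
953 | about how to use a particular language feature or about how to create an interesting program.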
954 | % Although there is no page 20/210 in original book.
955 | \marginpar{20}
956 | 


--------------------------------------------------------------------------------
/latex_src/answers_to_selected_exercises.tex:
--------------------------------------------------------------------------------
  1 | % vim: ts=4 sts=4 sw=4 et tw=75
  2 | \chapter{Answers to Selected Exercises}
  3 | \label{chap:answers_to_selected_exercises}
  4 | \marginpar{193}
  5 | 
  6 | \myexer\ref{exer:sum3} One simple way to ignore blank lines is to replace the first line of \texttt{sum3}
  7 | with
  8 | \begin{awkcode}
  9 |         nfld == 0 && NF > 0 { nfld = NF
 10 | \end{awkcode}
 11 | 
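 12 | \myexer\ref{exer:without_for_test} Without this test, sums of nonnumeric columns would still
 13 |     be accumulated, though never printed. Accumulating these useless sums could
 14 |     cause errors (overflow, for example); the test prevents that, and it has no
 15 |     noticeable effect on the program's running time.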
 16 | 
 17 | \myexer\ref{exer:accumulate} With associative arrays, this exercise is much easier:
 18 | \begin{awkcode}
 19 |         { total[$1] += $2 }
 20 |     END { for (x in total) print x, total[x] | "sort" }
 21 | \end{awkcode}
 22 | 
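 23 | \myexer\ref{exer:star_num} Suppose we want at most 25 stars on a line. With \texttt{max}
 24 | initially set to 25, the program below leaves the counts unchanged if the longest line is within the limit;
 25 |     otherwise it scales every line proportionally, so that the longest line has 25 stars.
 26 | A new array \texttt{y} holds the scaled lengths, so the elements of \texttt{x}
 27 | still hold the true counts, which are the ones printed.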
 28 | \begin{awkcode}
 29 |         { x[int($1/10)]++ }
 30 |     END { max = MAXSTARS = 25
 31 |           for (i = 0; i <= 10; i++)
 32 |               if (x[i] > max)
 33 |                   max = x[i]
 34 |           for (i = 0; i <= 10; i++)
 35 |               y[i] = x[i]/max * MAXSTARS
 36 |           for (i = 0; i < 10; i++)
 37 |               printf(" %2d - %2d: %3d %s\n",
 38 |                   10*i, 10*i+9, x[i], rep(y[i],"*"))
 39 |           printf("100:      %3d %s\n", x[10], rep(y[10],"*"))
 40 |         }
 41 | 
 42 |     function rep(n,s,   t) {  # return string of n s's
 43 |         while (n-- > 0)
 44 |             t = t s
 45 |         return t
 46 |     }
 47 | \end{awkcode}
 48 | 
 49 | \myexer\ref{exer:bucket} Two passes over the data are needed: one to determine the
 50 |     range of the buckets, the other to assign the items to them.
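Here is a minimal sketch of this two-pass scheme. It assumes the values are in the first field and that ten equal-width buckets are wanted; both are our assumptions, as is the name of the wrapper function \texttt{histo}. The first pass, over the input, saves each value and tracks the smallest and largest. The second pass, over the saved values in the \texttt{END} action, assigns each value to a bucket:

\begin{awkcode}
    # histo - hypothetical histogram whose bucket range comes from the data
    histo() {
        awk '
        {   x = $1 + 0                    # pass 1: remember each value
            v[NR] = x
            if (NR == 1 || x < min) min = x
            if (NR == 1 || x > max) max = x
        }
        END {
            if (NR == 0) exit
            nb = 10                       # number of buckets (an assumption)
            w = (max - min) / nb          # width of each bucket
            if (w == 0) w = 1             # all values equal: avoid dividing by 0
            for (i = 1; i <= NR; i++) {   # pass 2: assign saved values to buckets
                b = int((v[i] - min) / w)
                if (b >= nb) b = nb - 1   # the maximum goes in the last bucket
                count[b]++
            }
            for (b = 0; b < nb; b++)
                printf("%g - %g: %d\n", min + b*w, min + (b+1)*w, count[b])
        }' "$@"
    }
\end{awkcode}

Keeping the values in an array avoids reading the input twice. The cost is memory proportional to the number of items.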
 51 |     
 52 | \marginpar{194}
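 53 | \myexer\ref{exer:sumcomma} The exercise does not say precisely where commas may appear
 54 | in a number. That is unsatisfactory by software-engineering standards, but it is common
 55 | to have to solve a problem that is not precisely specified. Here are two possible answers.
 56 | The first program sums integers whose commas are all in the conventional places: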
 57 | \begin{awkcode}
 58 |     /^[+-]?[0-9][0-9]?[0-9]?(,[0-9][0-9][0-9])*$/ {
 59 |             gsub(/,/, "")
 60 |             sum += $0
 61 |             next
 62 |     }
 63 |           { print "bad format:", $0 }
 64 |     END   { print sum }
 65 | \end{awkcode}
 66 | Commas normally do not appear after the decimal point, so the program
 67 | \begin{awkcode}
 68 |     /^[+-]?[0-9][0-9]?[0-9]?(,[0-9][0-9][0-9])*([.][0-9]*)?$/ {
 69 |             gsub(/,/, "")
 70 |             sum += $0
 71 |             next
 72 |     }
 73 |           { print "bad format:", $0}
 74 |     END   { print sum }
 75 | \end{awkcode}
 76 | sums numbers that have commas in the conventional places and at least one digit before the decimal point.
 77 | 
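 78 | \myexer\ref{exer:date} The function \texttt{daynum(y,m,d)} returns the day number of a
 79 | date, counting January 1, 1901 as day 1. Dates are given as \textit{year month day}, for example
 80 | \texttt{2001 4 1}. In a leap year February has 29 days. A year is a leap year if it is divisible by 4
 81 | but not by 100, or if it is divisible by 400. Thus 1900 and 2100 are not
 82 | leap years (they are divisible by 100), but 2000 is one (it is divisible by 400).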
 83 | \begin{awkcode}
 84 |     function daynum(y, m, d,    days, i, n) {   # 1 == Jan 1, 1901
 85 |         split("31 28 31 30 31 30 31 31 30 31 30 31", days)
 86 |         # 365 days a year, plus one for each leap year
 87 |         n = (y-1901) * 365 + int((y-1901)/4)
 88 |         if (y % 4 == 0) # leap year from 1901 to 2099
 89 |             days[2]++
 90 |         for (i = 1; i < m; i++)
 91 |             n += days[i]
 92 |         return n + d
 93 |     }
 94 |         { print daynum($1, $2, $3) }
 95 | \end{awkcode}
 96 | This program is correct only for years from 1901 through 2099, and it does not check
 97 | its input for validity.
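For reference, here is a sketch of the complete leap-year rule stated above, wrapped in a hypothetical shell function \texttt{isleap}. The \texttt{daynum} program uses the shorter test \texttt{y \% 4 == 0}, which agrees with this rule only from 1901 through 2099:

\begin{awkcode}
    # isleap - hypothetical helper: classify each year on the input
    isleap() {
        awk '{
            y = $1
            # divisible by 4 but not by 100, or divisible by 400
            leap = (y % 4 == 0 && y % 100 != 0) || y % 400 == 0
            print y, (leap ? "leap" : "common")
        }' "$@"
    }
\end{awkcode}

For example, the input lines \texttt{1900}, \texttt{2000}, \texttt{2004} and \texttt{2100} are classified as common, leap, leap and common.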
 98 | 
 99 | \myexer\ref{exer:numtowords} One way to modify \texttt{numtowords} is:
100 | \begin{awkcode}
101 |     function numtowords(n,   cents, dols, s) { # n has 2 decimal places
102 |         cents = substr(n, length(n)-1, 2)
103 |         dols = substr(n, 1, length(n)-3)
104 |         if (dols ==  0)
105 |             s = "zero dollars and " cents " cents exactly"
106 |         else
107 |             s = intowords(dols) " dollars and " cents " cents exactly"
108 |         sub(/^one dollars/, "one dollar", s)
109 |         gsub(/  +/, " ", s)
110 |         return s
    |     }
111 | \end{awkcode}
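112 | The \texttt{sub} fixes the ``one dollars'' problem, and the \texttt{gsub} removes
113 | extra blanks. Both are harmless when the text is already correct. Applying them
114 | unconditionally is much easier than testing first and then fixing.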
115 | 
116 | \marginpar{195}
117 | \myexer\ref{exer:p12check} For simplicity, assume that the paired delimiters are \texttt{aa} and
118 | \texttt{bb}, \texttt{cc} and \texttt{dd}, and \texttt{ee} and \texttt{ff}.
119 | These pairs may not be nested or overlap in the text.
120 | \begin{awkcode}
121 |     BEGIN {
122 |         expects["aa"] = "bb"
123 |         expects["cc"] = "dd"
124 |         expects["ee"] = "ff"
125 |     }
126 |     /^(aa|cc|ee)/ {
127 |         if (p != "")
128 |             print "line", NR, ": expected " p
129 |         p = expects[substr($0, 1, 2)]
130 |     }
131 |     /^(bb|dd|ff)/ {
132 |         x = substr($0, 1, 2)
133 |         if (p != x) {
134 |             print "line", NR, ": saw " x
135 |             if (p)
136 |                 print ", expected", p
137 |         }
138 |         p = ""
139 |     }
140 |     END {
141 |         if (p != "")
142 |             print "at end, missing", p
143 |     }
144 | \end{awkcode}
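145 | The variable \texttt{p} encodes the state by recording the closing delimiter that is expected next. The program uses a trick:
146 | all the delimiters have the same length, so each can be extracted with \verb'substr($0, 1, 2)'. An alternative would be to require each delimiter to be \verb'$1'.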
147 | 
148 | \myexer\ref{exer:checkgen} Choose a marker, such as \texttt{=}, that cannot begin
149 | a legal pattern. The program
150 | \begin{awkcode}
151 |     BEGIN { FS = "\t" }
152 |     /^=/  { print substr($0, 2); next }
153 |     { printf("%s {\n\tprintf(\"line %%d, %s: %%s\\n\", NR, $0) }\n",
154 |             $1, $2)
155 |     }
156 | \end{awkcode}
157 | copies lines that begin with the marker directly into the output, minus the marker itself.
158 | 
159 | \myexer\ref{exer:form3_form4} One possible solution is to give the date explicitly on the command line:
160 | \begin{awkcode}
161 |     awk -f prep3 pass=1 countries pass=2 countries |
162 |         awk -f form3 date='January 1, 1988'
163 | \end{awkcode}
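164 | The variable \texttt{date} is assigned on the command line. The assignment takes effect only after the
165 | \texttt{BEGIN} action of \texttt{form3} has finished, so \texttt{date} must be used outside \texttt{BEGIN}. If the argument contains blanks, it must be
166 | quoted. Another approach is to pipe the output of the \texttt{date} command into
167 | a variable with \texttt{getline}, as shown in Section \ref{sec:data_transformation_and_reduction}.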
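Here is a sketch of that alternative. It assumes a Unix \texttt{date} command is available, and the wrapper name \texttt{showdate} is hypothetical. Because \texttt{getline} runs inside \texttt{BEGIN}, the value is available before any input is read:

\begin{awkcode}
    # showdate - hypothetical: read the output of date(1) into an awk variable
    showdate() {
        awk 'BEGIN {
            "date" | getline date   # one line of output from the date command
            close("date")           # close the pipe
            print "report date:", date
        }'
    }
\end{awkcode}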
168 | 
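169 | \myexer\ref{exer:table_format} Before reading our answer, consider how you would
170 | handle numbers without a decimal point. For simplicity, our solution handles only a single column.
171 | We replace \texttt{nwid} by two variables, \texttt{lwid} and \texttt{rwid}.
172 | \texttt{lwid} is the width of the part to the left of the decimal point, and \texttt{rwid} is the width of the part
173 | to the right of it (including the decimal point itself); both are computed with the patterns \texttt{left} and \texttt{right}.
174 | The space a number needs is then \texttt{lwid+rwid}. This sum may exceed the
175 | length of the longest number, so \texttt{wid} records the maximum width needed over all entries.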
176 | \marginpar{196}
177 | \begin{awkcode}
178 |     # table1 - single column formatter
179 |     #   input:  one column of strings and decimal numbers
180 |     #   output: aligned column
181 | 
182 |     BEGIN {
183 |         blanks = sprintf("%100s", " ")
184 |         number = "^[+-]?([0-9]+[.]?[0-9]*|[.][0-9]+)$"
185 |         left = "^[+-]?[0-9]*"
186 |         right = "[.][0-9]*"
187 |     }
188 | 
189 |     {   row[NR] = $1
190 |         if ($1 ~ number) {
191 |             match($1, left) # matches the empty string, so RLENGTH>=0
192 |             lwid = max(lwid, RLENGTH)
193 |             if (!match($1, right))
194 |                 RLENGTH = 0
195 |             rwid = max(rwid, RLENGTH)
196 |             wid = max(wid, lwid + rwid)
197 |         } else
198 |             wid = max(wid, length($1))
199 |     }
200 | 
201 |     END {
202 |         for (r = 1; r <= NR; r++) {
203 |             if (row[r] ~ number)
204 |                 printf("%" wid "s\n", numjust(row[r]))
205 |             else
206 |                 printf("%-" wid "s\n", row[r])
207 |         }
208 |     }
209 | 
210 |     function max(x, y) { return (x > y) ? x : y }
211 | 
212 |     function numjust(s) {   # position s
213 |         if (!match(s, right))
214 |             RLENGTH = 0
215 |         return s substr(blanks, 1, int(rwid-RLENGTH+(wid-(lwid+rwid))/2))
216 |     }
217 | \end{awkcode}
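218 | Each number is padded on the right by the width its fractional part lacks, so that the decimal points line up.
219 | When some entry is wider than \texttt{lwid+rwid}, the numbers are also shifted left by half the difference. This is why the expression in \texttt{numjust} is complicated.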
220 | 
221 | \myexer\ref{exer:info}
222 | \begin{awkcode}
223 |     awk '
224 |     BEGIN { FS = "\t"; pat = ARGV[1]; ARGV[1] = "-" }
225 |     $1 ~ pat {
226 |         printf("%s:\n", $1)
227 |         printf("\t%d million people\n", $3)
228 |         printf("\t%.3f million sq. mi.\n", $2/1000)
229 |         printf("\t%.1f people per sq. mi.\n", 1000*$3/$2)
230 |     }
231 |     ' "$1" <countries
232 | \end{awkcode}
233 | \marginpar{197}
234 | Another solution uses a \textit{var}\texttt{=}\textit{text} argument instead of
235 | \texttt{ARGV}:
236 | \begin{awkcode}
237 |     awk '
238 |     BEGIN { FS = "\t" }
239 |     $1 ~ pat {
240 |         printf("%s:\n", $1)
241 |         printf("\t%d million people\n", $3)
242 |         printf("\t%.3f million sq. mi.\n", $2/1000)
243 |         printf("\t%.1f people per sq. mi.\n", 1000*$3/$2)
244 |     }
245 |     ' pat="$1" <countries
246 | \end{awkcode}
247 | 
248 | \myexer\ref{exer:join} To check that the files are sorted, remember the last record read
249 | from each input and compare it with the record returned by each call of \texttt{getline}
250 | in \texttt{getone}.
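Since \texttt{getone} is not reproduced here, the following hypothetical filter \texttt{checksorted} shows the same comparison in isolation. It remembers the first field of the previous record and complains when the current one is smaller. Awk compares the fields as numbers when both look numeric and as strings otherwise. That comparison must agree with the order in which the files were sorted:

\begin{awkcode}
    # checksorted - hypothetical: report records whose key ($1) is out of order
    checksorted() {
        awk '
        NR > 1 && $1 < prev {
            printf("line %d: %s is out of order\n", NR, $1)
            bad = 1
        }
        { prev = $1 }              # remember the key of the last record read
        END { exit bad }' "$@"     # nonzero exit status if anything was out of order
    }
\end{awkcode}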
251 | 
252 | \myexer\ref{exer:system} Change the \texttt{for} loop in \texttt{doquery} that calls
253 | \texttt{system}: concatenate all the commands into a single string \texttt{x},
254 | like this:
255 | \begin{awkcode}
256 |     for (j = 1; j <= ncmd[i]; j++) x = x cmd[i, j] "\n"
257 | \end{awkcode}
258 | and then call \texttt{system} once with \texttt{x}. If \texttt{x} is a
259 | local variable of \texttt{doquery}, it is correctly initialized to the empty string
260 | on each call of \texttt{doquery}.
261 | 
262 | \myexer\ref{exer:qawk} Here is a partial answer. The function records which derived files
263 | \texttt{qawk} has already computed during one run, so that they are not computed again:
264 | \begin{awkcode}
265 |     function doquery(s,   i,j,x) {
266 |         for (i in qattr)  # clean up for next query
267 |             delete qattr[i]
268 |         query = s    # put $names in query into qattr, without $
269 |         while (match(s, /\$[A-Za-z]+/)) {
270 |             qattr[substr(s, RSTART+1, RLENGTH-1)] = 1
271 |             s = substr(s, RSTART+RLENGTH+1)
272 |         }
273 |         for (i = 1; i <= nrel && !subset(qattr, attr, i); ) 
274 |             i++
275 |         if (i > nrel)     # didn't find a table with all attributes
276 |             missing(qattr)
277 |         else {            # table i contains attributes in query
278 |             for (j in qattr)   # create awk program
279 |                 gsub("\\$" j, "$" attr[i,j], query)
280 |             if (!exists[i] && ncmd[i] > 0) {
281 |                 for (j = 1; j <= ncmd[i]; j++)
282 |                     x = x cmd[i, j] "\n"
283 |                 print "executing\n" x  # for debugging
284 |                 if (system(x) != 0) { # create table i
285 |                     print "command failed, query skipped\n", x
286 |                     return
287 |                 }
288 |                 exists[i]++
289 |             }
290 |             awkcmd = sprintf("awk -F'\t' '%s' %s", query, relname[i])
291 |             printf("query: %s\n", awkcmd)   # for debugging
292 |             system(awkcmd)
293 |         }
294 |     }
295 | \end{awkcode}
296 | The array \texttt{exists} records the derived files that have already been computed. This version of
297 | \texttt{doquery} also contains the answer to the previous exercise.
298 | 
299 | \myexer\ref{exer:multiline_query} The simplest approach is to change the beginning of \texttt{qawk}
300 | to
301 | \marginpar{198}
302 | \begin{awkcode}
303 |     BEGIN { readrel("relfile"); RS = "" }
304 | \end{awkcode}
305 | Then all the lines up to a blank line form a single query. Whatever the mechanism,
306 | each query must still be turned into a legal awk program.
307 | 
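308 | \myexer\ref{exer:rand} These ``random'' numbers are completely determined: given the seed
309 | and the generating algorithm, the sequence is fixed. Nevertheless, such a sequence shares
310 | many properties with a truly random one. A thorough discussion can be found in Knuth's \textit{The Art of Computer
311 | Programming} (Volume 2).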
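The determinism is easy to observe with a hypothetical function \texttt{randseq}. Given the same explicit seed, \texttt{srand} makes \texttt{rand} produce the same sequence on every run. The actual numbers vary from one awk implementation to another:

\begin{awkcode}
    # randseq - hypothetical: print five "random" integers from a given seed
    randseq() {
        awk -v seed="$1" 'BEGIN {
            srand(seed)                  # same seed, same sequence
            for (i = 1; i <= 5; i++)
                printf("%d ", int(100 * rand()))
            printf("\n")
        }'
    }
\end{awkcode}

Running \texttt{randseq 42} twice prints the same five numbers both times.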
312 | 
313 | \myexer\ref{exer:rand2} The following program generates \textit{k} distinct integers
314 | between 1 and \textit{n}, using an algorithm due to R. W. Floyd:
315 | \begin{awkcode}
316 |     # print k distinct random integers between 1 and n
317 | 
318 |     { random($1, $2) }
319 | 
320 |     function random(k, n,    A, i, r) {
321 |         for (i = n-k+1; i <= n; i++)
322 |             ((r = randint(i)) in A) ? A[i] : A[r]
323 |         for (i in A)
324 |             print i
325 |     }
326 | 
327 |     function randint(n) { return int(n*rand())+1 }
328 | \end{awkcode}
329 | 
330 | \myexer\ref{exer:bridge_hands} The problem is to generate random bridge hands displayed like this:
331 | \begin{awkcode}
332 |                         NORTH
333 |                     S: 10 9 6 4
334 |                     H: 8 7
335 |                     D: J 10 6
336 |                     C: 10 8 5 3
337 |        WEST                                 EAST
338 |     S: K 8 7 3                           S: A J 5
339 |     H: K Q 4 3 2                         H: J
340 |     D: 8 7                               D: A K Q 9 2
341 |     C: A J                               C: K Q 6 2
342 |                         SOUTH
343 |                     S: Q 2
344 |                     H: A 10 9 6 5
345 |                     D: 5 4 3
346 |                     C: 9 7 4
347 | \end{awkcode}
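348 | The program below generates a random permutation of the integers 1 through 52 in the array
349 | \texttt{deck}. The array is divided into four sections of 13 numbers, and each section is sorted.
350 | Each section represents one hand: 52 is the ace of spades, 51 the king of spades, and 1 the two of clubs.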
351 | 
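352 | The function \texttt{permute(k,n)} uses Floyd's algorithm (Exercise \ref{exer:rand2}) to generate a
353 | random permutation of length \texttt{k} of the integers \texttt{1} through \texttt{n}.
354 | The function \texttt{sort(x,y)} uses an insertion sort (Section \ref{sec:sorting})
355 | to sort the elements of \texttt{deck[x..y]} into decreasing order. Finally, \texttt{prhands}
356 | formats and prints each hand in the style shown above.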
357 | \marginpar{199}
358 | \begin{awkcode}
359 |   # bridge - generate random bridge hands
360 | 
361 |   BEGIN { split(permute(52,52), deck)           # generate a random deck
362 |           sort(1,13); sort(14,26); sort(27,39); sort(40,52) # sort hands
363 |           prhands()                    # format and print the four hands
364 |   }
365 | 
366 |   function permute(k, n,    i, p, r) {   # generate a random permutation
367 |       srand(); p = " "                   # of k integers between 1 and n
368 |       for (i = n-k+1; i <= n; i++)
369 |           if (p ~ " " (r = int(i*rand())+1) " " )
370 |               sub(" " r " ", " " r " " i " ", p)    # put i after r in p
371 |           else p = " " r p                     # put r at beginning of p
372 |       return p
373 |   }
374 | 
375 |   function sort(left,right,    i,j,t) { # sort hand in deck[left..right]
376 |       for (i = left+1; i <= right; i++)
377 |           for (j = i; j > left && deck[j-1] < deck[j]; j--) {
378 |               t = deck[j-1]; deck[j-1] = deck[j]; deck[j] = t
379 |           }
380 |   }
381 | 
382 |   function prhands() {                            # print the four hands
383 |       b = sprintf("%20s", " "); b40 = sprintf("%40s", " ")
384 |       card = 1                                  # global index into deck
385 |       suits(13); print b "   NORTH"
386 |       print b spds; print b hrts; print b dnds; print b clbs
387 |       suits(26)  # create the west hand from deck[14..26]
388 |       ws = spds substr(b40, 1, 40 - length(spds))
389 |       wh = hrts substr(b40, 1, 40 - length(hrts))
390 |       wd = dnds substr(b40, 1, 40 - length(dnds))
391 |       wc = clbs substr(b40, 1, 40 - length(clbs))
392 |       suits(39); print "   WEST" sprintf("%36s", " ") "EAST"
393 |       print ws spds; print wh hrts; print wd dnds; print wc clbs
394 |       suits(52); print b "   SOUTH"
395 |       print b spds; print b hrts; print b dnds; print b clbs
396 |   }
397 | 
398 |   function suits(j) {           # collect suits of hand in deck[j-12..j]
399 |       for (spds = "S:"; deck[card] > 39 && card <= j; card++)
400 |           spds = spds " " fvcard(deck[card])
401 |       for (hrts = "H:"; deck[card] > 26 && card <= j; card++)
402 |           hrts = hrts " " fvcard(deck[card])
403 |       for (dnds = "D:"; deck[card] > 13 && card <= j; card++)
404 |           dnds = dnds " " fvcard(deck[card])
405 |       for (clbs = "C:"; card <= j; card++)
406 |           clbs = clbs " " fvcard(deck[card])
407 |   }
408 | 
409 |   function fvcard(i) {                    # compute face value of card i
410 |       if (i % 13 == 0) return "A"
411 |       else if (i % 13 == 12) return "K"
412 |       else if (i % 13 == 11) return "Q"
413 |       else if (i % 13 == 10) return "J"
414 |       else return (i % 13) + 1
415 |   }
416 | \end{awkcode}
417 | \marginpar{200}
418 | 
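419 | \myexer\ref{exer:length_limit} Solving this problem cleverly is surprisingly
420 | hard. The easiest approach is to keep track of how many characters have been output so far and, if there are
421 | too many, print an error message and stop. A more elaborate approach is for \texttt{gen},
422 | once the derivation has become too long, to choose only productions that yield the empty string or terminals.
423 | Unfortunately, this does not always work. A safer approach requires knowing in advance
424 | the shortest output each nonterminal can produce, and forcing the derivation to use those
425 | shortest-output rules once it becomes too long. This requires substantial changes to the grammar, as well as some special knowledge.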
426 | 
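427 | \myexer\ref{exer:weight} We add a probability to the end of each production. The probabilities
428 | are first read into the array \texttt{rhsprob}. After all the rules have been read, \texttt{rhsprob} is modified
429 | so that each element holds the cumulative probability of choosing this production or any earlier one for the same nonterminal. This makes the test in
430 | \texttt{gen} simpler; otherwise the probabilities would have to be summed every time.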
431 | \begin{awkcode}
432 |     # sentgen1 - random sentence generator with probabilities
433 |     #   input:  grammar file; sequence of nonterminals
434 |     #   output: random sentences generated by the grammar
435 | 
436 |     BEGIN {  # read rules from grammar file
437 |         while (getline < "test-gram" > 0)
438 |             if ($2 == "->") {
439 |                 i = ++lhs[$1]              # count lhs
440 |                 rhsprob[$1, i] = $NF       # 0 <= probability <= 1
441 |                 rhscnt[$1, i] = NF-3       # how many in rhs
442 |                 for (j = 3; j < NF; j++)   # record them
443 |                    rhslist[$1, i, j-2] = $j
444 |             } else
445 |                 print "illegal production: " $0
446 |         for (sym in lhs)
447 |             for (i = 2; i <= lhs[sym]; i++)
448 |                 rhsprob[sym, i] += rhsprob[sym, i-1]
449 |     }
450 | 
451 |     {   if ($1 in lhs) {  # nonterminal to expand
452 |             gen($1)
453 |             printf("\n")
454 |         } else 
455 |             print "unknown nonterminal: " $0   
456 |     }
457 | 
458 |     function gen(sym,    i, j) {
459 |         if (sym in lhs) {       # a nonterminal
460 |             j = rand()          # random production
461 |             for (i = 1; i <= lhs[sym] && j > rhsprob[sym, i]; i++)
462 |                 ;
463 |             for (j = 1; j <= rhscnt[sym, i]; j++) # expand rhs's
464 |                 gen(rhslist[sym, i, j])
465 |         } else
466 |             printf("%s ", sym)
467 |     }
468 | \end{awkcode}
469 | 
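470 | \myexer\ref{exer:nonrecursive} The standard approach is to replace the recursion with a stack managed
471 | explicitly by the program. When the right side of a production is expanded, its symbols are pushed onto the stack in reverse order,
472 | so that the output is produced in the correct order.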
473 | \marginpar{201}
474 | \begin{awkcode}
475 |     # sentgen2 - random sentence generator (nonrecursive)
476 |     #   input:  grammar file; sequence of nonterminals
477 |     #   output: random sentences generated by the grammar
478 | 
479 |     BEGIN {  # read rules from grammar file
480 |         while (getline < "grammar" > 0)
481 |             if ($2 == "->") {
482 |                 i = ++lhs[$1]              # count lhs
483 |                 rhscnt[$1, i] = NF-2       # how many in rhs
484 |                 for (j = 3; j <= NF; j++)  # record them
485 |                    rhslist[$1, i, j-2] = $j
486 |             } else
487 |                 print "illegal production: " $0
488 |     }
489 | 
490 |     {   if ($1 in lhs) {  # nonterminal to expand
491 |             push($1)
492 |             gen()
493 |             printf("\n")
494 |         } else 
495 |             print "unknown nonterminal: " $0   
496 |     }
497 | 
498 |     function gen(    i, j) {
499 |         while (stp >= 1) {
500 |             sym = pop()
501 |             if (sym in lhs) {       # a nonterminal
502 |                 i = int(lhs[sym] * rand()) + 1   # random production
503 |                 for (j = rhscnt[sym, i]; j >= 1; j--) # expand rhs's
504 |                     push(rhslist[sym, i, j])
505 |             } else
506 |                 printf("%s ", sym)
507 |         }
508 |     }
509 | 
510 |     function push(s) { stack[++stp] = s }
511 | 
512 |     function pop() { return stack[stp--] }
513 | \end{awkcode}
514 | 
515 | \myexer\ref{exer:quiz} The simplest approach is to generate a random permutation of the integers \texttt{1}
516 | through \texttt{nq} at the start, and then ask the questions in that order.
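Here is a sketch of the permutation step. It uses the Fisher-Yates shuffle instead of the Floyd method used for the bridge hands. The wrapper name \texttt{shuffle} and passing \texttt{nq} with \texttt{-v} are our assumptions:

\begin{awkcode}
    # shuffle - hypothetical: print a random permutation of 1..nq, one per line
    shuffle() {
        awk -v nq="$1" 'BEGIN {
            srand()                         # seed from the time of day
            for (i = 1; i <= nq; i++)
                p[i] = i
            for (i = nq; i > 1; i--) {      # Fisher-Yates: swap p[i] with p[j], 1 <= j <= i
                j = int(i * rand()) + 1
                t = p[i]; p[i] = p[j]; p[j] = t
            }
            for (i = 1; i <= nq; i++)       # ask the questions in this order
                print p[i]
        }'
    }
\end{awkcode}

Every ordering is equally likely if \texttt{rand} is uniform. Swapping in place needs only the one array.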
517 | 
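518 | \myexer\ref{exer:wordcount} The simplest way to convert case in awk is to
519 | build an array that maps each letter to its counterpart in the other case. This is clumsy enough that,
520 | when possible, it is better to use a Unix command such as \texttt{tr} instead.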
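Here is a sketch of the array-based mapping, as a hypothetical function \texttt{lower} that folds ASCII upper case to lower case. Awk implementations that follow POSIX also provide the built-in functions \texttt{tolower} and \texttt{toupper}. Outside awk, \texttt{tr A-Z a-z} does the same job for ASCII text:

\begin{awkcode}
    # lower - hypothetical: convert upper-case ASCII letters to lower case
    lower() {
        awk '
        BEGIN {   # build the map: map["A"] = "a", ..., map["Z"] = "z"
            up = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
            lo = "abcdefghijklmnopqrstuvwxyz"
            for (i = 1; i <= 26; i++)
                map[substr(up, i, 1)] = substr(lo, i, 1)
        }
        {   s = ""
            for (i = 1; i <= length($0); i++) {   # translate one character at a time
                c = substr($0, i, 1)
                s = s ((c in map) ? map[c] : c)
            }
            print s
        }' "$@"
    }
\end{awkcode}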
521 | 
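522 | \myexer\ref{exer:fmt_justify} We save the words in an array. If \texttt{cnt} words are to be printed
523 | on a line, there are \texttt{cnt-1} gaps to be filled with blanks. If there are
524 | \textit{n} blanks to distribute, each gap gets about \textit{n}\texttt{/(cnt-1)} of them. For
525 | each word, the program computes the number of blanks for the following gap, then decrements the counts of gaps and blanks.
526 | When the extra blanks cannot be distributed evenly, they go alternately to the left end or the right end
527 | of successive lines.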
528 | \marginpar{202}
529 | \begin{awkcode}
530 |     # fmt.just - formatter with right justification
531 | 
532 |     BEGIN { blanks = sprintf("%60s", " ") }
533 |     /./   { for (i = 1; i <= NF; i++) addword($i) }
534 |     /^$/  { printline("no"); print "" }
535 |     END   { printline("no") }
536 | 
537 |     function addword(w) {
538 |         if (cnt + size + length(w) > 60)
539 |             printline("yes")
540 |         line[++cnt] = w
541 |         size += length(w)
542 |     }
543 | 
544 |     function printline(f,    i, nb, nsp, holes) {
545 |         if (f == "no" || cnt == 1) {
546 |             for (i = 1; i <= cnt; i++)
547 |                 printf("%s%s", line[i], i < cnt ? " " : "\n")
548 |         } else if (cnt > 1) {
549 |             dir = 1 - dir        # alternate side for extra blanks
550 |             nb = 60 - size       # number of blanks needed
551 |             holes = cnt - 1      # holes
552 |             for (i = 1; holes > 0; i++) {
553 |                 nsp = int((nb-dir) / holes) + dir
554 |                 printf("%s%s", line[i], substr(blanks, 1, nsp))
555 |                 nb -= nsp
556 |                 holes--
557 |             }
558 |             print line[cnt]
559 |         }
560 |         size = cnt = 0
561 |     }
562 | \end{awkcode}
563 | Calling \texttt{printline} with the argument ``no'' keeps the last line of a paragraph
564 | from being right-justified.
565 | 
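566 | \myexer\ref{exer:lack_underscore} It depends on whether the symbolic name without the underscore
567 | appears elsewhere in the document. If it does, those occurrences will be replaced by mistake.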
568 | 
569 | \myexer\ref{exer:xref} 
570 | \begin{awkcode}
571 |     /^\.#/ { printf("{ gsub(/%s/, \"%d\") }\n", $2, ++count[$1])
572 |              if (saw[$2])
573 |                  print NR ": redefinition of", $2, "from line", saw[$2]
574 |              saw[$2] = NR
575 |            }
576 |     END    { printf("!/^[.]#/\n") }
577 | \end{awkcode}
578 | 
579 | \myexer\ref{exer:xref_once}
580 | \begin{awkcode}
581 |     /^\.#/ { s[$2] = ++count[$1]; next }
582 |            { for ( i in s)
583 |                  gsub(i, s[i])
584 |              print
585 |            }
586 | \end{awkcode}
587 | A symbolic name must be defined before it is used.
588 | 
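589 | \myexer\ref{exer:stop_list} The simplest solution in the divide-and-conquer spirit is to add a filter
590 | to the pipeline that deletes rotated lines beginning with a word in the stop list: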
591 | \marginpar{203}
592 | \begin{awkcode}
593 |     ...
594 |     awk '$1 !~ /^(a|an|and|by|for|if|in|is|of|on|the|to)$/' |
595 |     sort -f |
596 |     ...
597 | \end{awkcode}
598 | 
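599 | \myexer\ref{exer:literal} How to distinguish a literal \verb'~' from a \verb'~' used as a blank
600 | is a matter of style. We chose awk's escape convention: to get a literal
601 | character, precede it with a backslash \verb'\'. We consider only
602 | the tilde \verb'~'; the other special characters in \texttt{ix.genkey} and
603 | \texttt{ix.format} can be handled in the same way.
604 | To protect a literal \verb'~', we replace every instance of \verb'\~' with
605 | a string that cannot otherwise occur: a tab followed by a \verb'1'.
606 | No field can contain a tab, because the tab is the field
607 | separator. The remaining tildes are then replaced by blanks, and finally the protected strings are turned
608 | back into literal tildes. So the first \texttt{gsub} of \texttt{ix.genkey}
609 | is replaced by: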
609 | 被替换成:
610 | \begin{awkcode}
611 |     gsub(/\\~/, "\t1", $1)  # protect quoted tildes
612 |     gsub(/~/, " ", $1)      # unprotected tildes now become blanks
613 |     gsub(/\t1/, "~", $1)    # restore protected tildes
614 | \end{awkcode}
615 | In addition, tildes must no longer be deleted from the sort key.
616 | 
617 | \myexer\ref{exer:asm} Only four lines need to be added: two in the first pass and
618 | two in the second.
619 | \begin{awkcode}
620 |     ...
621 |     # ASSEMBLER PASS 1
622 |         nextmem = 0    # new
623 |         FS = "[ \t]+"
624 |         while (getline <srcfile > 0) {
625 |             input[nextmem] = $0    # new: remember source line
626 |             sub(/#.*/, "")         # strip comments
627 |             symtab[$1] = nextmem   # remember label location
628 |             if ($2 != "") {        # save op, addr if present
629 |                 print $2 "\t" $3 >tempfile
630 |                 nextmem++
631 |             }
632 |         }
633 |         close(tempfile)
634 |     
635 |     # ASSEMBLER PASS 2
636 |         nextmem = 0
637 |         while (getline <tempfile > 0) {
638 |             if ($2 !~ /^[0-9]*$/)  # if symbolic addr,
639 |                 $2 = symtab[$2]    # replace by numeric value
640 |             mem[nextmem++] = 1000 * op[$1] + $2  # pack into word
641 |         }
642 |         for (i = 0; i < nextmem; i++)    # new: print memory
643 |             printf("%3d:  %05d   %s\n", i, mem[i], input[i])  # new
644 |     }
645 |     ...
646 | \end{awkcode}
647 | 
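648 | \myexer\ref{exer:zhuangzhi} Doing this with only simple changes to \texttt{graph}
649 | is actually quite difficult, because knowledge of \texttt{x} and
650 | \texttt{y} is embedded throughout the program and in many variables (such as \texttt{bticks}
651 | and \texttt{lticks}). A better approach may be to write a filter, \texttt{transpose}, that processes
652 | the input before it reaches \texttt{graph}. Here is one implementation of such a filter,
653 | derived by modifying \texttt{graph}: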
654 | \marginpar{204}
655 | \begin{awkcode}
656 |     # transpose - input and output suitable for graph
657 |     #   input:  data and specification of a graph
658 |     #   output: data and specification for the transposed graph
659 | 
660 |     BEGIN {
661 |         number = "^[-+]?([0-9]+[.]?[0-9]*|[.][0-9]+)" \
662 |                                 "([eE][-+]?[0-9]+)?$"
663 |     }
664 |     $1 == "bottom" && $2 == "ticks" {     # ticks for x-axis
665 |         $1 = "left"
666 |         print
667 |         next
668 |     }
669 |     $1 == "left" && $2 == "ticks" {       # ticks for y-axis
670 |         $1 = "bottom"
671 |         print
672 |         next
673 |     }
674 |     $1 == "range" {                       # xmin ymin xmax ymax
675 |         print $1, $3, $2, $5, $4
676 |         next
677 |     }
678 |     $1 == "height" { $1 = "width"; print; next }
679 |     $1 == "width"  { $1 = "height"; print; next }
680 |     $1 ~ number && $2 ~ number  { nd++; print $2, $1, $3; next }
681 |     $1 ~ number && $2 !~ number { # single number:
682 |         nd++                      #   count data points
683 |         print $1, nd, $2          #   fill in both x and y
684 |         next
685 |     }
686 |     { print }
687 | \end{awkcode}
688 | A simple version of logarithmic axes can be implemented in the same way.
689 | 
690 | \myexer\ref{exer:calc2} Just add more cases to the \texttt{if} statement,
691 | for example:
692 | \begin{awkcode}
693 |     else if ($i == "pi")
694 |         stack[++top] = 3.14159265358979
695 | \end{awkcode}
696 | 
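697 | \myexer\ref{exer:check} \texttt{check} tests the condition \texttt{A[i] > A[i+1]}, which
698 | % XXX the original text seems to be wrong here
699 | should never be true, because a correct sorting algorithm guarantees that the array ends up in order. The real problem is that \texttt{check} does
700 | not verify that the output is a permutation of the input: if an element were moved outside the array bounds, \texttt{check} would not
701 | notice.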
702 | 
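703 | \myexer\ref{exer:time_used} As described briefly in Chapter \ref{chap:epilog},
704 | awk stores arrays in hash tables. Looking up an element of a small array takes constant time,
705 | but the lookup time increases as the array grows.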
706 | 
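707 | \myexer\ref{exer:end_exit} The \texttt{END} action inserted by \texttt{makeprof}
708 | is executed only after all the other \texttt{END} actions have run. So if an
709 | earlier \texttt{END} contains an \texttt{exit} statement, the program terminates before the profile is printed.
710 | A partial solution is to insert the new \texttt{END} in front of all the other \texttt{END} actions.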
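The behavior behind this is easy to check with a small hypothetical function \texttt{endexit}. Once \texttt{exit} is executed in one \texttt{END} action, the later \texttt{END} actions are skipped, such as the one \texttt{makeprof} appends:

\begin{awkcode}
    # endexit - hypothetical: exit in one END action skips the later END actions
    endexit() {
        awk 'END { print "first END"; exit }
             END { print "second END" }' </dev/null
    }
\end{awkcode}

Only \texttt{first END} is printed.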
711 | 
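712 | \myexer\ref{exer:rtsort} Push the nodes onto a stack instead of printing them. When the input
713 | is exhausted, pop the nodes off the stack and print them, so they come out in reverse order. Another solution is to interchange the roles of \verb'$1' and \verb'$2',
714 | either in \texttt{rtsort} itself or in a separate program.
715 |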
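Here is the stack idea in isolation, as a hypothetical filter \texttt{revlines}. Each line is pushed instead of being printed, and at the end the lines are popped, so the last one pushed comes out first. Placed after a program in a pipeline, a filter like this reverses the order of that program's output:

\begin{awkcode}
    # revlines - hypothetical: print the input lines in reverse order using a stack
    revlines() {
        awk '{ stack[++sp] = $0 }                 # push each line instead of printing it
             END { while (sp > 0) print stack[sp--] }' "$@"   # pop: last in, first out
    }
\end{awkcode}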
716 | 


--------------------------------------------------------------------------------
/latex_src/awk.tex:
--------------------------------------------------------------------------------
 1 | % started at Sat Jul 25 08:57:23 CST 2015
 2 | % written by wuzhouhui250@gmail.com
 3 | % compiled by xelatex
 4 | % vim: ts=4 sts=4 sw=4 et tw=75
 5 | 
 6 | \input{preamble}
 7 | 
 8 | \begin{document}
 9 | 
10 | \frontmatter
11 | \pdfbookmark[0]{封面}{title}
12 | \begin{titlepage}
13 |     \includepdf[width=\paperwidth]{./images/cover.pdf}
14 | \end{titlepage}
15 | \maketitle
16 | \pdfbookmark[0]{目录}{contents}
17 | \tableofcontents
18 | \include{preface}
19 | 
20 | \mainmatter
21 | \include{an_awk_tutorial}
22 | \include{the_awk_language}
23 | \include{data_processing}
24 | \include{reports_and_databases}
25 | \include{processing_words}
26 | \include{little_languages}
27 | \include{experiments_with_algorithms}
28 | \include{epilog}
29 | 
30 | \appendix
31 | \include{awk_summary}
32 | \include{answers_to_selected_exercises}
33 | 
34 | \backmatter
35 | \include{index}
36 | 
37 | \end{document}
38 | 


--------------------------------------------------------------------------------
/latex_src/awk_summary.tex:
--------------------------------------------------------------------------------
  1 | % vim: ts=8 sts=8 sw=4 et tw=75
  2 | \chapter{AWK Summary}
  3 | \label{chap:awk_summary}
  4 | 
  5 | \marginpar{187}
  6 | This appendix contains a summary of the awk language. In the syntax rules, components enclosed in brackets
  7 | \verb'['...\verb']' are optional.
  8 | 
  9 | \subsubsection{Command Line}
 10 | \begin{quote}
 11 |     \texttt{awk [-F}\textit{s}\texttt{] '}%
 12 |     \textit{program}\texttt{'}\ \ \textit{optional list of filenames}
 13 | 
 14 |     \texttt{awk [-F}\textit{s}\texttt{] -f}
 15 |     \textit{progfile}\ \ \textit{optional list of filenames}
 16 | \end{quote}
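 17 | The option \texttt{-F}\textit{s} sets the field separator \texttt{FS} to \textit{s}.
 18 | If no filenames are given, awk reads the standard input. A filename can have the form
 19 | \textit{var}\texttt{=}\textit{text}, which causes \textit{text} to be assigned to
 20 | the variable \textit{var}. The assignment is performed at the point where this argument
 21 | would be accessed as a file.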
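The following sketch shows when these assignments happen. The function name and the scratch file are hypothetical, and \texttt{mktemp} is assumed to be available. \texttt{v} is still empty in \texttt{BEGIN}, and it changes between the two readings of the same file:

\begin{awkcode}
    # assigndemo - hypothetical: show when var=text arguments take effect
    assigndemo() {
        f=$(mktemp) || return 1          # scratch input file with one line
        printf 'line\n' > "$f"
        awk 'BEGIN { print "in BEGIN, v = [" v "]" }
             { print "v = " v ": " $0 }' v=1 "$f" v=2 "$f"
        rm -f "$f"
    }
\end{awkcode}

The output is \texttt{in BEGIN, v = []}, then \texttt{v = 1: line}, then \texttt{v = 2: line}.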
 22 | 
 23 | \subsubsection{AWK Programs}
 24 | An awk program is a sequence of \patact statements and function definitions. A \patact statement has
 25 | the form:
 26 | \begin{quote}
 27 |     \textit{pattern}\ \ \verb'{'\ \textit{action}\ \verb'}'
 28 | \end{quote}
 29 | If the pattern is omitted, the action is performed for every input line; if the action is omitted, each matching line is
 30 | printed.
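Both defaults are shown in one hypothetical function, \texttt{omitdemo}:

\begin{awkcode}
    # omitdemo - hypothetical: a pattern without an action, an action without a pattern
    omitdemo() {
        printf 'a b c\nd e\n' | awk 'NF > 2'          # prints only "a b c"
        printf 'a b c\nd e\n' | awk '{ print $1 }'    # prints "a" and "d"
    }
\end{awkcode}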
 31 | 
 32 | A function definition has the form:
 33 | \begin{quote}
 34 |     \texttt{function}\ \
 35 |     \textit{name}\verb'('\textit{parameter-list}\verb') {'
 36 |         \textit{statement}\ \verb'}'
 37 | \end{quote}
 38 | \patact statements and function definitions are separated by newlines or semicolons, and they can be intermixed.
 39 | 
 40 | \subsubsection{Patterns}
 41 | \begin{quote}
 42 |     \texttt{BEGIN}
 43 | 
 44 |     \texttt{END}
 45 | 
 46 |     \textit{expression}
 47 | 
 48 |     \verb'/'\textit{regular expression}\verb'/'
 49 | 
 50 |     \textit{pattern}\ \verb'&&'\ \textit{pattern}
 51 | 
 52 |     \textit{pattern}\ \verb'||'\ \textit{pattern}
 53 | 
 54 |     \verb'!'\textit{pattern}
 55 | 
 56 |     \verb'('\textit{pattern}\verb')'
 57 | 
 58 |     \textit{pattern}\verb','\ \textit{pattern}
 59 | \end{quote}
 60 | 
 61 | The last pattern is a range pattern; it cannot be part of any other pattern. Similarly,
 62 | \texttt{BEGIN} and \texttt{END} cannot be combined with other patterns.
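For example, the range pattern in the hypothetical function \texttt{between} below selects every group of lines running from one that matches \verb'^start$' through the next one that matches \verb'^stop$', inclusive:

\begin{awkcode}
    # between - hypothetical: print each range of lines from "start" through "stop"
    between() {
        awk '/^start$/, /^stop$/' "$@"
    }
\end{awkcode}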
 63 | \marginpar{188}
 64 | \subsubsection{Actions}
 65 | An action is a sequence of statements of the following kinds:
 66 | \begin{quote}
 67 |     \texttt{break}
 68 | 
 69 |     \texttt{continue}
 70 | 
 71 |     \texttt{delete}\ \textit{array-element}
 72 | 
 73 |     \texttt{do}\ \textit{statement}\ \texttt{while}\
 74 |     \texttt{(}\textit{expression}\texttt{)}
 75 | 
 76 |     \texttt{exit [}\textit{expression}\texttt{]}
 77 | 
 78 |     \textit{expression}
 79 | 
 80 |     \texttt{if (}\textit{expression}\texttt{) }\textit{statement}
 81 |     \ \texttt{[else}\ \textit{statement}\texttt{]}
 82 | 
 83 |     \textit{input-output statement}
 84 | 
 85 |     \texttt{for (}\textit{expression}\texttt{; }\textit{expression}
 86 |     \texttt{; }\textit{expression}\texttt{) }\textit{statement}
 87 | 
 88 |     \texttt{for (}\textit{variable}\texttt{ in }\textit{array}\texttt{) }
 89 |     \textit{statement}
 90 | 
 91 |     \texttt{next}
 92 | 
 93 |     \texttt{return [}\textit{expression}\texttt{]}
 94 | 
 95 |     \texttt{while (}\textit{expression}\texttt{) }\textit{statement}
 96 | 
 97 |     \verb'{' \textit{statement} \verb'}'
 98 | \end{quote}
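 99 | A semicolon by itself denotes the empty statement. In an \texttt{if-else} statement, if the first
100 | \textit{statement} is on the same line as \texttt{else}, it must be terminated by a semicolon
101 | or enclosed in braces. Similarly, in a \texttt{do} statement, if
102 | \textit{statement} is on the same line as \texttt{while}, it must be terminated by a semicolon
103 | or enclosed in braces.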
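Both rules appear in the short sketch below; the wrapper name \texttt{sameline} is hypothetical. Note the semicolon before \texttt{else} and the one before \texttt{while}:

\begin{awkcode}
    # sameline - hypothetical: if-else and do-while each written on one line
    sameline() {
        awk 'BEGIN {
            x = -1
            if (x > 0) print "positive"; else print "not positive"
            i = 1
            do print i++; while (i <= 3)
        }'
    }
\end{awkcode}

This prints \texttt{not positive}, followed by \texttt{1}, \texttt{2} and \texttt{3} on separate lines.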
104 | 
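105 | \subsubsection{Program Format}
106 | Statements are separated by newlines, semicolons, or both. Blank lines may appear before or after any statement, \patact statement,
107 | or function definition. Blanks and tabs may be inserted around operators and operands. A long
108 | statement may be continued onto the next line with a backslash. In addition, no backslash is needed if a statement is broken after a comma, a left brace,
109 | \verb'&&', \verb'||', \texttt{do}, \texttt{else}, or the right parenthesis of an \texttt{if} or
110 | \texttt{for} statement. A comment, which begins with \verb'#', may appear
111 | at the end of any line.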
112 | 
113 | \subsubsection{Input-Output}
114 | \begin{quote}
115 |     \begin{tabbing}
116 |         \texttt{close(}\textit{expr}\texttt{)}\hspace{8em} 
117 |         \= close the file or pipe
118 |         denoted by \textit{expr} \\
119 |         \texttt{getline} \> set \verb'$0' from the next input record; set
120 |         \texttt{NF}, \texttt{NR}, \texttt{FNR} \\
121 |
122 |         \texttt{getline <}\textit{file} \> set \verb'$0' from the next record of
123 |         \textit{file}; set \texttt{NF} \\
124 |
125 |         \texttt{getline}\ \textit{var} \> set \textit{var} from the next input record;
126 |         set \texttt{NR}, \texttt{FNR} \\
127 |
128 |         \texttt{getline}\ \textit{var}\ \texttt{<}\textit{file} \> set
129 |         \textit{var} from the next record of \textit{file} \\
130 |
131 |         \texttt{print}  \> print the current record \\
132 |
133 |         \texttt{print}\ \textit{expr-list} \> print the expressions in
134 |         \textit{expr-list} \\
135 |
136 |         \texttt{print}\ \textit{expr-list}\ \texttt{>}\textit{file} \>
137 |         print the expressions on \textit{file} \\
138 |
139 |         \texttt{printf}\ \textit{fmt}\texttt{,} \textit{expr-list} \>
140 |         format and print \\
141 |
142 |         \texttt{printf}\ \textit{fmt}\texttt{,} \textit{expr-list}\ 
143 |         \texttt{>} \textit{file}        \> format and print on
144 |         \textit{file} \\
145 |
146 |         \texttt{system(}\textit{cmd-line}\texttt{)}     \> execute the command
147 |         \textit{cmd-line}; return its exit status \\
148 |     \end{tabbing}
149 | \end{quote}
150 | 
151 | The \textit{expr-list} following \texttt{print}, and the
152 | \textit{fmt}\texttt{,}\ \textit{expr-list} following \texttt{printf}, may be parenthesized. In
153 | \texttt{print} and \texttt{printf}, \texttt{>>}\textit{file} appends the
154 | output to the end of \textit{file}, and \texttt{|}\ \textit{command} writes
155 | the output on a pipe. Similarly, \textit{command}\texttt{ | getline} pipes the
156 | output of \textit{command} into \texttt{getline}. The function \texttt{getline} returns 1
157 | when it reads a record, 0 at end of file, and $-1$ on error.
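
The following sketch, an illustration rather than anything from the original text, reads every line produced by a command through the form \verb'cmd | getline line'. Comparing the result with 0 matters: a bare \verb'while (cmd | getline line)' would loop forever on an error, because $-1$ counts as true. The name \verb'countlines' is hypothetical, and passing the command with \verb'-v' assumes a POSIX awk.
\begin{awkcode}
    # countlines: hypothetical; prints how many lines the shell command $1 outputs
    countlines() {
        awk -v cmd="$1" 'BEGIN {
            n = 0
            while ((cmd | getline line) > 0)   # 1 per line read, 0 at end, -1 on error
                n++
            close(cmd)                         # lets the same command be run again
            print n
        }'
    }
\end{awkcode}
For example, \verb"countlines 'echo a; echo b'" prints 2.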
158 | 
159 | \marginpar{189}
160 | 
161 | \subsubsection{\textbf{\texttt{printf}} Format Conversions}
162 | \texttt{printf} and \texttt{sprintf} recognize the following format conversions:
163 | \begin{quote}
164 |     \begin{tabbing}
165 |         \verb'%c' \hspace{4em}      \= ASCII character \\
166 | 
167 |         \verb'%d'       \> decimal integer \\
168 | 
169 |         \verb'%e'       \> \texttt{[-]d.ddddddE[+-]dd}  \\
170 | 
171 |         \verb'%f'       \> \texttt{[-]ddd.dddddd} \\
172 | 
173 |         \verb'%g'       \> \verb'%e' if the exponent is $< -4$ or $\geq$ the precision, else \verb'%f'; \\
174 |                 \> nonsignificant zeros are suppressed    \\
175 | 
176 |         \verb'%o'       \> unsigned octal number \\
177 | 
178 |         \verb'%s'       \> string       \\
179 | 
180 |         \verb'%x'       \> unsigned hexadecimal number \\
181 | 
182 |         \verb'%%'       \> print a \verb'%'; no argument is consumed \\
183 |     \end{tabbing}
184 | \end{quote}
185 | 
186 | Additional parameters may appear between the \verb'%' and the control letter:
187 | 
188 | \begin{quote}
189 |     \begin{tabbing}
190 |         \texttt{-} \hspace{4em}     \= left-justify the expression in its field   \\
191 | 
192 |         \textit{width}  \> pad the field to this width as needed; a leading
193 |         \texttt{0} pads with zeros \\
194 | 
195 |         \texttt{.}\textit{prec} \> maximum string width, or digits right of the decimal point \\
196 |     \end{tabbing}
197 | \end{quote}
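
A short sketch of these modifiers follows; the wrapper name \verb'fmtdemo' is hypothetical, and the brackets only make the field widths visible.
\begin{awkcode}
    # fmtdemo: hypothetical; shows -, width, leading 0, .prec and %%
    fmtdemo() {
        awk 'BEGIN {
            printf("[%-6s][%6s]\n", "ab", "ab")     # left- vs right-justified
            printf("[%05d][%5.2f]\n", 42, 3.14159)  # zero padding; 2 decimals
            printf("[%.3s][%%]\n", "abcdef")        # at most 3 chars; a literal %
        }'
    }
\end{awkcode}
The three lines printed are \verb'[ab    ][    ab]', \verb'[00042][ 3.14]', and \verb'[abc][%]'.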
198 | 
199 | \subsubsection{Built-in Variables}
200 | The following built-in variables can be used in any expression:
201 | \begin{quote}
202 |     \begin{tabbing}
203 |         \texttt{ARGC}\hspace{6em}   \= number of command-line arguments \\
204 |         \texttt{ARGV}   \> array of command-line arguments (\texttt{ARGV[0..ARGC-1]})\\
205 |         \texttt{FILENAME}\> name of the current input file \\
206 |         \texttt{FNR}    \> record number in the current input file \\
207 |         \texttt{FS}     \> input field separator (default blank) \\
208 |         \texttt{NF}     \> number of fields in the current input record \\
209 |         \texttt{NR}     \> number of records read since the program began \\
210 |         \texttt{OFMT}   \> output format for numbers (default \verb'"%.6g"') \\
211 |         \texttt{OFS}    \> output field separator (default blank)\\
212 |         \texttt{ORS}    \> output record separator (default newline) \\
213 |         \texttt{RLENGTH}\> length of the string matched by
214 |         \texttt{match} \\
215 |         \texttt{RS}     \> input record separator (default newline) \\
216 |         \texttt{RSTART} \> starting position of the string matched by \texttt{match}
217 |         \\
218 |         \texttt{SUBSEP} \> separator for subscripts of the form
219 |         \texttt{[}\textit{i}\texttt{,}\textit{j}\texttt{,}...\texttt{]}
220 |         (default \verb'"\034"') \\
221 |     \end{tabbing}
222 | \end{quote}
223 | \texttt{ARGC} and \texttt{ARGV} include the name of the invoking program (usually
224 | \texttt{awk}), but not the awk program or the options given on the command line.
225 | \texttt{RSTART} is also the value returned by \texttt{match}.
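
To see which words end up in \texttt{ARGV}, one can run a program with only a \BEGIN{} action, such as the hypothetical \verb'showargs' below; because it never reads input, its arguments need not name existing files. The program text itself is not counted, and \verb'ARGV[0]' (usually \texttt{awk}, though the exact value varies between implementations) is skipped.
\begin{awkcode}
    # showargs: hypothetical; prints ARGC and ARGV[1] .. ARGV[ARGC-1]
    showargs() {
        awk 'BEGIN {
            print "ARGC =", ARGC
            for (i = 1; i < ARGC; i++)      # ARGV[0] is the program name
                print "ARGV[" i "] =", ARGV[i]
        }' "$@"
    }
\end{awkcode}
For example, \verb'showargs x y' prints \verb'ARGC = 3', \verb'ARGV[1] = x', and \verb'ARGV[2] = y'.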
226 | 
227 | \subsubsection{Built-in String Functions}
228 | In the string functions below, \textit{s} and \textit{t} are strings, \textit{r}
229 | is a regular expression, and \textit{i} and \textit{n} are integers.
230 | 
231 | In the replacement string of \texttt{sub} and \texttt{gsub}, an \verb'&' is replaced by the matched
232 | string, while \verb'\&' stands for a literal \verb'&'.
233 | \marginpar{190}
234 | \begin{quote}
235 |     \begin{tabular}{ll}
236 |         \texttt{gsub(}\textit{r}\texttt{,} \textit{s}\texttt{,}
237 |         \textit{t}\texttt{)} &
238 |         \makecell[tl]{replace every substring of \textit{t} matched by \textit{r} \\
239 |         with \textit{s}; return the number of replacements; \\
240 |         if \textit{t} is omitted, \texttt{\$0} is used} \\
241 | 
242 |         \texttt{index(}\textit{s}\texttt{,} \textit{t}\texttt{)} &
243 |         \makecell[tl]{return the position of \textit{t} in \textit{s}, \\ or 0 if \textit{s}
244 |         does not contain \textit{t}} \\
245 | 
246 |         \texttt{length(}\textit{s}\texttt{)} & return the length of \textit{s} \\
247 | 
248 |         \texttt{match(}\textit{s}\texttt{,} \textit{r}\texttt{)} &
249 |         \makecell[tl]{return the position in \textit{s} where a substring
250 |         \\ matched by \textit{r} begins, or 0 if there is none; \\
251 |         also sets \texttt{RSTART} and \texttt{RLENGTH}} \\
252 | 
253 |         \texttt{split(}\textit{s}\texttt{,} \textit{a}\texttt{,}
254 |         \textit{fs}\texttt{)} & \makecell[tl]{split
255 |         \textit{s} into array \textit{a} on \textit{fs} and return 
256 |         \\ the number of fields; if \textit{fs} is omitted, \\
257 |         \texttt{FS} is used} \\
258 | 
259 |         \texttt{sprintf(}\textit{fmt}\texttt{,} \textit{expr-list}
260 |         \texttt{)} & return \textit{expr-list} formatted
261 |         according to \textit{fmt} \\
262 | 
263 |         \texttt{sub(}\textit{r}\texttt{,} \textit{s}\texttt{,}
264 |         \textit{t}\texttt{)}   & \makecell[tl]{like \texttt{gsub}, but only the first
265 |         \\ matched substring is replaced} \\
266 | 
267 |         \texttt{substr(}\textit{s}\texttt{,} \textit{i}\texttt{,}
268 |         \textit{n}\texttt{)} & \makecell[tl]{return the substring of \textit{s} of length
269 |         \textit{n} \\ starting at position \textit{i}; if \textit{n} is omitted, \\ 
270 |         return the suffix of
271 |         \textit{s} starting at \textit{i}} \\
272 |     \end{tabular}
273 | \end{quote}
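
The sketch below exercises several of these functions on constant strings; the wrapper \verb'strdemo' is hypothetical. Note the doubled backslash: the awk string \verb'"\\&"' contains \verb'\&', which \texttt{sub} turns into a literal \verb'&'.
\begin{awkcode}
    # strdemo: hypothetical; one line of output per group of calls
    strdemo() {
        awk 'BEGIN {
            s = "banana"
            n = gsub(/an/, "[&]", s)     # & stands for the matched text
            print n, s
            t = "a+b"
            sub(/\+/, "\\&", t)          # \& in the replacement is a literal &
            print t
            m = match("abcooo!", /o+/)   # also sets RSTART and RLENGTH
            print m, RSTART, RLENGTH
            print index("foobar", "bar"), substr("foobar", 2, 3), substr("foobar", 4)
            k = split("a:b:c", parts, ":")
            print k, parts[1], parts[3]
        }'
    }
\end{awkcode}
Its five output lines are \verb'2 b[an][an]a', \verb'a&b', \verb'4 4 3', \verb'4 oob bar', and \verb'3 a c'.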
274 | 
275 | \subsubsection{Built-in Arithmetic Functions}
276 | \begin{quote}
277 |     \begin{tabbing}
278 |         \texttt{atan2(}\textit{y}\texttt{,} \textit{x}\texttt{)}
279 |         \hspace{4em} \=
280 |         arctangent of \textit{y}/\textit{x} in radians, in the range
281 |         $-\pi$ to $\pi$ \\
282 | 
283 |         \texttt{cos(}\textit{x}\texttt{)} \> cosine of \textit{x} (\textit{x} in radians) \\
284 | 
285 |         \texttt{exp(}\textit{x}\texttt{)} \> exponential function $e^x$ \\
286 | 
287 |         \texttt{int(}\textit{x}\texttt{)} \> integer part of \textit{x}, truncated toward 0 \\
288 | 
289 |         \texttt{log(}\textit{x}\texttt{)} \> natural logarithm of \textit{x} \\
290 | 
291 |         \texttt{rand(}\texttt{)} \> return a pseudo-random number \textit{r}
292 |         (0 $\leqslant$ \textit{r} $<$ 1 )\\
293 | 
294 |         \texttt{sin(}\textit{x}\texttt{)} \> sine of \textit{x} (\textit{x} in radians) \\
295 | 
296 |         \texttt{sqrt(}\textit{x}\texttt{)} \> square root of \textit{x} \\
297 | 
298 |         \texttt{srand(}\textit{x}\texttt{)} \> \textit{x} is the new seed for \texttt{rand}; if
299 |         \textit{x} is omitted, the time of day is used
300 |     \end{tabbing}
301 | \end{quote}
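
A few spot checks of these functions, as a hypothetical sketch (\verb'mathdemo' is not from the text): \texttt{int} truncates toward zero, \verb'atan2(0, -1)' is $\pi$, and \texttt{exp} undoes \texttt{log} up to rounding. Only the range of \texttt{rand} is checked, since the sequence it returns differs between implementations.
\begin{awkcode}
    # mathdemo: hypothetical spot checks of the arithmetic functions
    mathdemo() {
        awk 'BEGIN {
            print int(3.9), int(-3.9)          # truncation toward 0: 3 -3
            printf("%.5f\n", atan2(0, -1))     # pi
            printf("%.5f\n", exp(log(10)))     # 10, up to rounding
            srand(1)                           # fixed seed; sequence is implementation-specific
            r = rand()
            print (r >= 0 && r < 1)            # 1: rand() is in [0, 1)
        }'
    }
\end{awkcode}
It prints \verb'3 -3', \verb'3.14159', \verb'10.00000', and \verb'1'.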
302 | 
303 | \subsubsection{Expression Operators (in Increasing Order of Precedence)}
304 | Expressions can be combined with the following operators:
305 | \begin{quote}
306 |     \begin{tabbing}
307 |         \verb'= += -= *= /= %= ^=' \hspace{2em} \= assignment \\
308 | 
309 |         \texttt{?:} \> conditional expression \\
310 | 
311 |         \texttt{||} \> logical OR \\
312 | 
313 |         \verb'&&' \> logical AND \\
314 | 
315 |         \texttt{in} \> array membership \\
316 | 
317 |         \verb'~  !~' \> regular expression match, negated match \\
318 | 
319 |         \texttt{< <= > >= != ==} \> relational operators \\
320 | 
321 |         \texttt{ }      \> string concatenation (no explicit operator) \\
322 | 
323 |         \texttt{+ -}    \> addition, subtraction \\
324 | 
325 |         \verb'* / %'  \> multiplication, division, remainder \\
326 | 
327 |         \texttt{+ - !}  \> unary plus, unary minus, logical NOT \\
328 | 
329 |         \texttt{\^{}}      \> exponentiation \\
330 | 
331 |         \texttt{++ --}  \> increment, decrement (prefix and postfix) \\
332 | 
333 |         \texttt{\$}     \> field \\
334 |     \end{tabbing}
335 | \end{quote}
336 | All operators are left associative except assignment, \texttt{?:}, and \texttt{\^{}},
337 | which are right associative. Any expression may be enclosed in parentheses.
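
The following sketch (hypothetical name \verb'precdemo') checks a few consequences of the table and of the associativity rules; each comment gives the grouping awk applies.
\begin{awkcode}
    # precdemo: hypothetical; each comment gives the grouping awk uses
    precdemo() {
        awk 'BEGIN {
            print 2^3^2            # right associative: 2^(3^2) = 512
            print -2^2             # ^ before unary minus: -(2^2) = -4
            x = 1 " " 2 + 3        # + before concatenation: 1 " " (2+3)
            print x                # 1 5
            print 10 - 4 - 3       # left associative: (10-4)-3 = 3
            a = b = 5              # right associative: a = (b = 5)
            print a, b             # 5 5
        }'
    }
\end{awkcode}
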
338 | \marginpar{191}
339 | \subsubsection{Regular Expressions}
340 | The regular expression metacharacters are
341 | \begin{awkcode}
342 |     \ ^ $ . [ ] | ( ) * + ?
343 | \end{awkcode}
344 | The following table summarizes regular expressions and the strings they match:
345 | \begin{quote}
346 |     \begin{tabular}{ll}
347 |         \textit{c}       & matches the nonmetacharacter \textit{c} \\
348 |         \texttt{\textbackslash}\textit{c} & matches an escape sequence
349 |         or the literal character \textit{c} \\
350 |         \texttt{\^{}}     & matches the beginning of a string \\
351 |         \texttt{\$}     & matches the end of a string \\
352 |         \texttt{.}      & matches any character \\
353 |         \texttt{[}\textit{abc...}\texttt{]} & character class: matches any of \textit{abc...} \\
354 |         \texttt{[}\texttt{\^{}}\textit{abc...}\texttt{]} & negated class: matches any character but \textit{abc...} \\
355 |         \textit{$r_1$}\texttt{|}\textit{$r_2$} & alternation: matches what \textit{$r_1$} or \textit{$r_2$} matches \\
356 |         \texttt{(}\textit{$r_1$}\texttt{)}\texttt{(}\textit{$r_2$}\texttt{)} & concatenation: matches
357 |         \textit{xy}, where \textit{$r_1$} matches \textit{x} and \textit{$r_2$} matches \textit{y} \\
358 |         \texttt{(}\textit{r}\texttt{)*} & matches zero or more consecutive strings matched
359 |         by \textit{r} \\
360 |         \texttt{(}\textit{r}\texttt{)+} & matches one or more consecutive strings matched
361 |         by \textit{r} \\
362 |         \texttt{(}\textit{r}\texttt{)?} & matches zero or one string matched by \textit{r} \\
363 |         \texttt{(}\textit{r}\texttt{)}   & grouping: matches the same strings as
364 |         \textit{r} \\
365 | 
366 |     \end{tabular}
367 | \end{quote}
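368 | The operators are listed in increasing order of precedence. Redundant parentheses may be omitted as long as the precedence of the operators is respected.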
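
Precedence is easiest to see with alternation, which binds most loosely. In the hypothetical sketch below, \verb'/^ab|cd$/' means ``\texttt{ab} at the start or \texttt{cd} at the end'', so it matches \verb'abxx'; adding parentheses changes the meaning. Each result is 1 for a match and 0 otherwise.
\begin{awkcode}
    # redemo: hypothetical; prints 1 for a match, 0 otherwise
    redemo() {
        awk 'BEGIN {
            r1 = ("abxx" ~ /^ab|cd$/)      # (^ab)|(cd$): matches
            r2 = ("abxx" ~ /^(ab|cd)$/)    # whole string must be ab or cd
            r3 = ("abab" ~ /^(ab)+$/)      # + applies to the group (ab)
            r4 = ("abab" ~ /^ab+$/)        # + applies only to b
            r5 = ("a.c" ~ /a\.c/)          # \. is a literal dot
            r6 = ("abc" ~ /a\.c/)
            print r1, r2, r3, r4, r5, r6   # 1 0 1 0 1 0
        }'
    }
\end{awkcode}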
369 | 
370 | \subsubsection{Escape Sequences}
371 | Within strings and regular expressions, the following escape sequences have special meanings:
372 | \begin{quote}
373 |     \begin{tabbing}
374 |         \verb'\b' \hspace{4em} \= backspace \\
375 |         \verb'\f' \> formfeed \\
376 |         \verb'\n' \> newline \\
377 |         \verb'\r' \> carriage return \\
378 |         \verb'\t' \> tab \\
379 |         \verb'\'\textit{ddd} \> octal value \textit{ddd}, where \textit{ddd} is 1 to 3 digits,
380 |         each between 0 and 7 \\
381 |         \verb'\'\textit{c} \> any other character \textit{c} literally, \\
382 |         \> e.g., \verb'\"' for \verb'"' and \verb'\\' for \verb'\' \\
383 |     \end{tabbing}
384 | \end{quote}
385 | 
386 | \subsubsection{Limits}
387 | Any particular implementation of awk imposes some limits; here are some typical values:
388 | \begin{quote}
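389 |     100 fields \par
390 |     3000 characters per input record \par
391 |     3000 characters per output record \par
392 |     1024 characters per field \par
393 |     3000 characters per \texttt{printf} string \par
394 |     400 characters in a literal string \par
395 |     400 characters in a character class \par
396 |     15 open files \par
397 |     1 pipe \par
398 |     double-precision floating point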
399 | \end{quote}
400 | Numbers are limited to what can be represented on the local machine; on a machine whose range is, say,
401 | $10^{-38}$ to $10^{38}$, numbers outside this range have only string values.
402 | \marginpar{192}
403 | \subsubsection{Initialization, Comparison, and Type Coercion}
404 | Each variable and field can be a string, a number, or both at any time. When a variable
405 | is set by an assignment
406 | \begin{awkcode}
407 |     var = expr
408 | \end{awkcode}
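409 | its type is set to that of the expression (``assignment'' includes \texttt{+=}, \texttt{-=},
410 | and so on). An arithmetic expression is of type number, a concatenation is of type string, and so on. If the assignment
411 | is a simple copy, as in \texttt{v1 = v2}, then \texttt{v1} is
412 | given the type of \texttt{v2}.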
413 | 
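414 | In a comparison, if both operands are numeric, the comparison is made numerically.
415 | Otherwise, the operands are converted to strings if necessary, and the comparison
416 | is made on strings. The type of any expression can be coerced to
417 | numeric by subterfuges such as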
418 | \begin{awkcode}
419 |     expr + 0
420 | \end{awkcode}
421 | and to string by concatenating it with the null string:
422 | \begin{awkcode}
423 |     expr ""
424 | \end{awkcode}
425 | The numeric value of a string is the value of its numeric prefix; for example, the numeric value of \verb'"12abc"' is 12.
426 | 
427 | Uninitialized variables have the numeric value 0 and the string value \verb'""'. Accordingly, if \texttt{x}
428 | is uninitialized, the test
429 | \begin{awkcode}
430 |     if (x) ...
431 | \end{awkcode}
432 | is false, while the tests
433 | \begin{awkcode}
434 |     if (!x) ...
435 |     if (x == 0) ...
436 |     if (x == "") ...
437 | \end{awkcode}
438 | are all true. But note that
439 | \begin{awkcode}
440 |     if (x == "0") ...
441 | \end{awkcode}
442 | is false.
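
A quick way to confirm these rules is a \BEGIN{} program that never assigns \verb'x'. The sketch below (hypothetical name \verb'uninit') stores each test's result, 1 or 0, in a variable and prints them on one line.
\begin{awkcode}
    # uninit: hypothetical; x is never assigned
    uninit() {
        awk 'BEGIN {
            if (x) print "x is true"; else print "x is false"
            a = !x; b = (x == 0); c = (x == ""); d = (x == "0")
            print a, b, c, d       # 1 1 1 0
        }'
    }
\end{awkcode}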
443 | 
444 | The type of a field is determined by context when possible; for example,
445 | \begin{awkcode}
446 |     $1++
447 | \end{awkcode}
448 | implies that \texttt{\$1} is converted to a number if necessary, whereas
449 | \begin{awkcode}
450 |     $1 = $1 "," $2
451 | \end{awkcode}
452 | implies that \texttt{\$1} and \texttt{\$2} are converted to strings if
453 | necessary.
454 | 
455 | When the type cannot be determined from the context, as in
456 | \begin{awkcode}
457 |     if ($1 == $2) ...
458 | \end{awkcode}
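459 | the type of each field is determined by the input data. All fields are strings; in addition,
460 | a field whose contents look like a number to the machine is also considered numeric.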
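
The sketch below, with the hypothetical name \verb'cmpfields', compares the first two fields of each line twice: once as written, where two numeric-looking fields compare as numbers, and once after concatenating each with \verb'""', which forces a string comparison.
\begin{awkcode}
    # cmpfields: hypothetical; compares $1 and $2 numerically if both look
    # like numbers, then again as strings
    cmpfields() {
        awk '{
            r = ($1 == $2) ? "equal" : "different"
            s = ($1 "" == $2 "") ? "equal" : "different"   # forced string comparison
            print r, s
        }' "$@"
    }
\end{awkcode}
For the input line \verb'1.0 1' it prints \verb'equal different'; for \verb'1 1x' it prints \verb'different different', since \verb'1x' is not a number.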
461 | 
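462 | Fields that are explicitly null have the string value \verb'""'; they are not numeric. The same holds for
463 | nonexistent fields (that is, fields past \texttt{NF}) and for \texttt{\$0} of a blank line.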
464 | 
465 | What holds for fields holds equally for array elements created by \texttt{split}.
466 | 
467 | Mentioning a nonexistent variable in an expression creates it, with the initial values 0 and
468 | \texttt{""}. Thus, if the element \texttt{arr[i]} does not currently exist, the statement
469 | \begin{awkcode}
470 |     if (arr[i] == "") ...
471 | \end{awkcode}
472 | causes \texttt{arr[i]} to be created with the initial value \texttt{""}, so the
473 | \texttt{if} condition is true. The test
474 | \begin{awkcode}
475 |     if (i in arr) ...
476 | \end{awkcode}
477 | determines whether \texttt{arr[i]} exists without the side effect of creating it.
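
The difference between the two tests can be observed directly. In the hypothetical sketch below, \verb'indemo' counts the elements of \verb'arr' with a \texttt{for}-\texttt{in} loop after each kind of test.
\begin{awkcode}
    # indemo: hypothetical; shows that a reference creates an element
    # while the in operator does not
    indemo() {
        awk 'BEGIN {
            if (!("k" in arr)) print "no k yet"
            n = 0; for (i in arr) n++; print n      # 0: in created nothing
            if (arr["k"] == "") print "arr[k] is empty"
            n = 0; for (i in arr) n++; print n      # 1: the reference created arr["k"]
        }'
    }
\end{awkcode}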
478 | 


--------------------------------------------------------------------------------
/latex_src/data_processing.tex:
--------------------------------------------------------------------------------
   1 | % vim: ts=4 sts=4 sw=4 et tw=75
   2 | \chapter{Data Processing}
   3 | \label{chap:data_processing}
   4 | 
   5 | \marginpar{67}
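   6 | Awk was originally designed for everyday data-processing tasks, such as information retrieval, data validation,
   7 | and data transformation and reduction; we have already seen simple examples of these in Chapters \ref{chap:an_awk_tutorial} and
   8 | \ref{chap:the_awk_language}. In this chapter,
   9 | we consider more complex tasks along
  10 | the same lines. Most of the examples process one line at a time, but the final section
  11 | discusses how to handle input records that span several lines.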
  12 | 
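  13 | Awk programs are often developed incrementally: write a few lines, test them, add a few more, test again,
  14 | and so on. Most of the larger programs in this book were developed this way.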
  15 | 
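  16 | It is also possible to write awk programs in the traditional way, sketching the overall structure first and consulting the manual,
  17 | but modifying an existing program to get the desired effect is usually easier.
  18 | The programs in this book thus play another role: they provide practical models for programming by example.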
  19 | 
  20 | \section{Data Transformation and Reduction}
  21 | \label{sec:data_transformation_and_reduction}
  22 | 
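  23 | One of the most common uses of awk is to transform data from one form to another, usually from
  24 | the output format of one program to the format required by another. Another common use is to extract the relevant data
  25 | from a large data set, often with reformatting and the preparation of summary information. This section contains
  26 | a variety of examples of these tasks.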
  27 | 
  28 | \subsection{Summing Columns}
  29 | \label{subsec:Summing_columns}
  30 | 
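  31 | We have already seen several variants of the ``two-line'' awk program that adds up all the
  32 | numbers in a single field.
  33 | The next program does a more complicated but still typical data-reduction task. Every input line
  34 | has several fields, each containing a number, and the task is to compute the sum of each column,
  35 | no matter how many columns there are.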
  36 | \begin{awkcode}
  37 |      # sum1 - print column sums
  38 |      #   input:  rows of numbers
  39 |      #   output: sum of each column
  40 |      #     missing entries are treated as zeros
  41 |      
  42 |          { for (i = 1; i <= NF; i++)
  43 |                sum[i] += $i
  44 |            if (NF > maxfld)
  45 |                maxfld = NF
  46 |          }
  47 |      END { for (i = 1; i <= maxfld; i++) {
  48 |                printf("%g", sum[i])
  49 |                if (i < maxfld)
  50 |                    printf("\t")
  51 |                else
  52 |                    printf("\n")
  53 |            }
  54 |          }
  55 | \end{awkcode}
  56 | \marginpar{68}
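  57 | Automatic initialization of variables is convenient here: \verb'maxfld' (the maximum number of
  58 | fields seen so far) automatically starts at 0, and the entries of \verb'sum'
  59 | are created as the program runs, even though the number of entries is not known until the end of the input. Note that
  60 | if the input file is empty, the program prints nothing.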
  61 | 
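  62 | It is convenient that the program need not be told how many fields a line has, but it does not check
  63 | that the entries are all numbers, nor that every line has the same number of fields. The next program does
  64 | the same job, but also checks that each line has the same number of fields as the first: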
  65 | \begin{awkcode}
  66 |      # sum2 - print column sums
  67 |      #     check that each line has the same number of fields
  68 |      #        as line one
  69 |      
  70 |      NR==1 { nfld = NF }
  71 |            { for (i = 1; i <= NF; i++)
  72 |                  sum[i] += $i
  73 |              if (NF != nfld)
  74 |                  print "line " NR " has " NF " entries, not " nfld
  75 |            }
  76 |      END   { for (i = 1; i <= nfld; i++)
  77 |                  printf("%g%s", sum[i], i < nfld ? "\t" : "\n")
  78 |            }
  79 | \end{awkcode}
  80 | We also revised the output code in the \END{} action to show how a conditional expression can
  81 | put a tab between columns and a newline after the last column.
  82 | 
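  83 | Now suppose that some of the fields are nonnumeric, so they should not be included in the sums. The strategy is to add an array
  84 | \verb'numcol' to keep track of which fields are numeric, and a function \verb'isnum' to check whether an entry is
  85 | a number. Making the test a function puts it in one place, which makes future%
  86 | \marginpar{69}%
  87 | changes easier. If the program can trust its input, it need only look at the first line to decide which fields are numeric.
  88 | We still need \verb'nfld': in the awk described here, \verb'NF' is zero inside the \END{} action, and even in implementations that keep its last value there, it describes the last line rather than the first.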
  89 | \begin{awkcode}
  90 |      # sum3 - print sums of numeric columns
  91 |      #     input:  rows of integers and strings
  92 |      #     output: sums of numeric columns
  93 |      #       assumes every line has same layout
  94 |      
  95 |      NR==1 { nfld = NF
  96 |              for (i = 1; i <= NF; i++)
  97 |                  numcol[i] = isnum($i)
  98 |            }
  99 |      
 100 |            { for (i = 1; i <= NF; i++)
 101 |                  if (numcol[i])
 102 |                      sum[i] += $i
 103 |            }
 104 |      
 105 |      END   { for (i = 1; i <= nfld; i++) {
 106 |                  if (numcol[i])
 107 |                      printf("%g", sum[i])
 108 |                  else
 109 |                      printf("--")
 110 |                  printf(i < nfld ? "\t" : "\n")
 111 |              }
 112 |            }
 113 |      
 114 |      function isnum(n) { return n ~ /^[+-]?[0-9]+$/ }
 115 | \end{awkcode}
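 116 | The function \verb'isnum' defines a number as one or more digits, possibly preceded by a sign. A more
 117 | general definition of a number can be found in the discussion of regular expressions in Section \ref{sec:patterns}.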
 118 | 
 119 | \begin{exercise}
 120 |     \label{exer:sum3}
 121 |     Modify the program \verb'sum3' to ignore blank lines.
 122 | \end{exercise}
 123 | \begin{exercise}
 124 |     Add the more general regular expression for a number. How does it affect the running time?
 125 | \end{exercise}
 126 | \begin{exercise}
 127 |     \label{exer:without_for_test}
 128 |     What is the effect of removing the test of \verb'numcol' in the second \verb'for'
 129 |     statement?
 130 | \end{exercise}
 131 | \begin{exercise}
 132 |     \label{exer:accumulate}
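 133 |     Write a program that reads a list of item--amount pairs and, for each
 134 |     item on the list, accumulates its total amount; at the end, it prints each item and its total,
 135 |     sorted alphabetically by item.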
 136 | \end{exercise}
 137 | 
 138 | \subsection{Computing Percentages and Quantiles}
 139 | \label{subsec:computing_percentages_and_quantiles}
 140 | 
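 141 | Suppose that instead of the sum of each column we want the percentage of the total that each number represents. This
 142 | requires two passes over the data. If there is only one column of numbers and not too much data, the easiest approach
 143 | is to store the numbers in an array on the first pass, then compute and print the percentages
 144 | on the second pass: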
 145 | \begin{awkcode}
 146 |      # percent
 147 |      #   input:  a column of nonnegative numbers
 148 |      #   output: each number and its percentage of the total
 149 |      
 150 |          { x[NR] = $1; sum += $1 }
 151 |      
 152 |      END { if (sum != 0)
 153 |                for (i = 1; i <= NR; i++)
 154 |                    printf("%10.2f %5.1f\n", x[i], 100*x[i]/sum)
 155 |          }
 156 | \end{awkcode}
 157 | \marginpar{70}
 158 | 
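 159 | The same approach, with a somewhat more complicated transformation,
 160 | can be used for many things, such as adjusting student grades to fit
 161 | some curve. Once the grades have been computed (as numbers between 0 and 100), it might be
 162 | interesting to see them as a histogram: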
 163 | \begin{awkcode}
 164 |     # histogram
 165 |     #   input:  numbers between 0 and 100
 166 |     #   output: histogram of deciles
 167 |     
 168 |         { x[int($1/10)]++ }
 169 |     
 170 |     END { for (i = 0; i < 10; i++)
 171 |               printf(" %2d - %2d: %3d %s\n",
 172 |                   10*i, 10*i+9, x[i], rep(x[i],"*"))
 173 |           printf("100:      %3d %s\n", x[10], rep(x[10],"*"))
 174 |         }
 175 |     
 176 |     function rep(n,s,   t) {  # return string of n s's
 177 |         while (n-- > 0)
 178 |             t = t s
 179 |         return t
 180 |     }
 181 | \end{awkcode}
 182 | Note how the postfix decrement operator \verb'--' controls the \verb'while' loop.
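
Because \verb'n-- > 0' compares the old value of \verb'n' before decrementing it, the loop body runs exactly \verb'n' times, and not at all when \verb'n' is 0 or negative. The sketch below checks this with the function \verb'rep' copied from \verb'histogram'; the \BEGIN{} driver and the wrapper name \verb'repdemo' are hypothetical.
\begin{awkcode}
    # repdemo: hypothetical driver for rep from histogram
    repdemo() {
        awk '
        BEGIN { print "[" rep(3, "*") "]", "[" rep(0, "*") "]", "[" rep(2, "ab") "]" }

        function rep(n,s,   t) {  # return n copies of s
            while (n-- > 0)       # test uses n, then decrements it
                t = t s
            return t
        }'
    }
\end{awkcode}
It prints \verb'[***] [] [abab]'.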
 183 | 
 184 | We can test \verb'histogram' with randomly generated grades. The first program in the pipeline below
 185 | generates 200 random integers between 0 and 100 and feeds them to \verb'histogram':
 186 | \begin{awkcode}
 187 |     awk '
 188 |     # generate random integers
 189 |     BEGIN { for (i = 1; i <= 200; i++)
 190 |                 print int(101*rand())
 191 |           }
 192 |     ' |
 193 |     awk -f histogram
 194 | \end{awkcode}
 195 | Its output looks like this:
 196 | \begin{awkcode}
 197 |       0 -  9:  20 ********************
 198 |      10 - 19:  18 ******************
 199 |      20 - 29:  20 ********************
 200 |      30 - 39:  16 ****************
 201 |      40 - 49:  23 ***********************
 202 |      50 - 59:  17 *****************
 203 |      60 - 69:  22 **********************
 204 |      70 - 79:  20 ********************
 205 |      80 - 89:  20 ********************
 206 |      90 - 99:  22 **********************
 207 |     100:        2 **
 208 | \end{awkcode}
 209 | 
 210 | \marginpar{71}
 211 | \begin{exercise}
 212 |     \label{exer:star_num}
 213 |     Scale the number of stars so that a line never exceeds the width of the screen when there is a lot of data.
 214 | \end{exercise}
 215 | 
 216 | \begin{exercise}
 217 |     \label{exer:bucket}
 218 |     Modify \verb'histogram' to sort the input into a specified number of buckets, adjusting the range of each bucket according to the
 219 |     data seen so far.
 220 | \end{exercise}
 221 | 
 222 | \subsection{Numbers with Commas}
 223 | \label{subsec:numbers_with_commas}
 224 | 
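 225 | Suppose we have a table of numbers, each containing commas and a decimal point, like
 226 | \verb'12,345.67'. Because awk stops parsing a number at the first comma, these numbers cannot be added
 227 | directly; the commas must be removed first: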
 228 | \begin{awkcode}
 229 |     # sumcomma - add up numbers containing commas
 230 |     
 231 |         { gsub(/,/, ""); sum += $0 }
 232 |     END { print sum }
 233 | \end{awkcode}
 234 | \verb'gsub(/,/, "")' replaces every comma with the null string, that is, it deletes the commas.
 235 | 
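 236 | This program does not check that the commas are in the right places,
 237 | nor does it print commas in its answer. Putting commas into numbers takes
 238 | only a little work, as the next program shows: it formats numbers with commas and two decimal places.
 239 | Its structure is well worth imitating: one function only adds the commas, while the rest of the program just reads
 240 | and prints. Once it has been tested, the new function can be included in the final program.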
 241 | 
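 242 | The basic idea is to insert commas in a loop, working leftward from the decimal point. Each
 243 | iteration puts a comma in front of the leftmost three digits that are followed by a comma or
 244 | the decimal point, provided that at least one digit remains in front of the new comma. The algorithm uses recursion to handle negative numbers: if
 245 | the argument is negative, \verb'addcomma' calls itself with the positive value and prepends
 246 | a minus sign to the result.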
 247 | \begin{awkcode}
 248 |     # addcomma - put commas in numbers
 249 |     #   input:  a number per line
 250 |     #   output: the input number followed by
 251 |     #      the number with commas and two decimal places 
 252 |     
 253 |         { printf("%-12s %20s\n", $0, addcomma($0)) }
 254 |     
 255 |     function addcomma(x,   num) {
 256 |         if (x < 0)
 257 |             return "-" addcomma(-x)
 258 |         num = sprintf("%.2f", x)   # num is dddddd.dd
 259 |         while (num ~ /[0-9][0-9][0-9][0-9]/)
 260 |             sub(/[0-9][0-9][0-9][,.]/, ",&", num)
 261 |         return num
 262 |     }
 263 | \end{awkcode}
 264 | \marginpar{72}
 265 | Note the use of \verb'&' in the replacement text: through it, \verb'sub' inserts a comma in front of each group of three
 266 | digits.
 267 | 
 268 | Here is the output for some test data:
 269 | \begin{awkcode}
 270 |     0                            0.00
 271 |     -1                          -1.00
 272 |     -12.34                     -12.34
 273 |     12345                   12,345.00
 274 |     -1234567.89         -1,234,567.89
 275 |     -123.                     -123.00
 276 |     -123456               -123,456.00
 277 | \end{awkcode}
 278 | 
 279 | \begin{exercise}
 280 |     \label{exer:sumcomma}
 281 |     Fix \verb'sumcomma' (the program that adds numbers containing commas) so that it checks whether the commas
 282 |     in the numbers are in the right places.
 283 | \end{exercise}
 284 | 
 285 | \subsection{Fixed-Field Input}
 286 | \label{subsec:fixed_field_input}
 287 | 
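 288 | Information that appears in fixed-width fields often needs some kind of
 289 | preprocessing before it can be used. Some programs (spreadsheets, for example) put numbers in fixed columns instead of
 290 | separating them with field separators; if the numbers are too wide,
 291 | the columns run together. Fixed-field data is best handled with \verb'substr',
 292 | which can pick out any combination of columns. For example, suppose the first 6 characters of each line contain a date
 293 | of the form \verb'mmddyy'. If we want to sort the lines by date, the easiest way is
 294 | to convert the dates to the form \verb'yymmdd' first: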
 295 | \begin{awkcode}
 296 |     # date convert - convert mmddyy into yymmdd in $1
 297 |     
 298 |     { $1 = substr($1,5,2) substr($1,1,2) substr($1,3,2); print }
 299 | \end{awkcode}
 300 | Given input sorted by month, like this
 301 | \begin{awkcode}
 302 |     013042 mary's birthday
 303 |     032772 mark's birthday
 304 |     052470 anniversary
 305 |     061209 mother's birthday
 306 |     110175 elizabeth's birthday
 307 | \end{awkcode}
 308 | the program produces
 309 | \begin{awkcode}
 310 |     420130 mary's birthday
 311 |     720327 mark's birthday
 312 |     700524 anniversary
 313 |     090612 mother's birthday
 314 |     751101 elizabeth's birthday
 315 | \end{awkcode}
 316 | \marginpar{73}
 317 | Now the data is ready to be sorted by year, month, and day.
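
Since \verb'yymmdd' strings of equal length sort in date order, the converted lines can go straight to \texttt{sort}. A minimal sketch, with the hypothetical wrapper name \verb'datesort'; it assumes all dates fall within one century, since two-digit years cannot distinguish, say, 1909 from 2009.
\begin{awkcode}
    # datesort: hypothetical; converts mmddyy in $1 to yymmdd, then sorts
    datesort() {
        awk '{ $1 = substr($1,5,2) substr($1,1,2) substr($1,3,2); print }' "$@" |
        sort
    }
\end{awkcode}
Applied to the five lines above, it prints them in the order 090612, 420130, 700524, 720327, 751101.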
 318 | 
 319 | \begin{exercise}
 320 |     \label{exer:date}
 321 |     Convert dates into a form that allows arithmetic on them, such as computing the number of days between two
 322 |     dates.
 323 | \end{exercise}
 324 | 
 325 | \subsection{Program Cross-Reference Checking}
 326 | \label{subsec:program_cross_reference_checking}
 327 | 
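 328 | Awk is often used to extract information from the output of other programs. Sometimes that output is just a set of homogeneous lines,
 329 | in which case field splitting or the function \verb'substr'
 330 | is convenient and adequate. Sometimes, however, the program that produces the output intends it to be
 331 | read by people; then awk must undo the careful formatting and turn the output back
 332 | into a form that a machine can easily process, so that the useful information can be separated from the irrelevant. Here is a
 333 | simple example.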
 334 | 
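 335 | Large programs are usually built from many source files. It is convenient (and sometimes vital) to know which files define which functions and where
 336 | those functions are used.
 337 | To that end, Unix provides the command \verb'nm', which extracts information from a set of object files
 338 | and prints a neatly formatted list of names, their definitions, and their uses.
 339 | A typical fragment of \verb'nm' output is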
 340 | \begin{awkcode}
 341 |     file.o:
 342 |     00000c80 T _addroot
 343 |     00000b30 T _checkdev
 344 |     00000a3c T _checkdupl
 345 |              U _chown
 346 |              U _client
 347 |              U _close
 348 |     funmount.o:
 349 |     00000000 T _funmount
 350 |              U cerror
 351 | \end{awkcode}
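 352 | Lines with one field (such as \verb'file.o') are filenames, lines with two fields (such as
 353 | \verb'U _close') are uses of names, and lines with three fields are definitions
 354 | of names. \verb'T' indicates that the definition is a text symbol (a function), and \verb'U' indicates
 355 | that the name is undefined.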
 356 | 
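 357 | Using this raw output to determine which names a file defines, or where a
 358 | symbol is used, is a nuisance, because no symbol carries the name of the file
 359 | it belongs to. For even a moderately large C program the \verb'nm' output is long:
 360 | the source of awk itself consists of 9 files, and its \verb'nm' output runs to more than 850 lines.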
 361 | \marginpar{74}
 362 | A three-line awk program, however, can add the filename to each symbol, after which
 363 | subsequent programs can get the useful information from a single line:
 364 | \begin{awkcode}
 365 |     # nm.format - add filename to each nm output line
 366 |     
 367 |     NF == 1 { file = $1 }
 368 |     NF == 2 { print file, $1, $2 }
 369 |     NF == 3 { print file, $2, $3 }
 370 | \end{awkcode}
 371 | Applied to the \verb'nm' output above, \filename{nm.format} produces
 372 | \begin{awkcode}
 373 |     file.o: T _addroot
 374 |     file.o: T _checkdev
 375 |     file.o: T _checkdupl
 376 |     file.o: U _chown
 377 |     file.o: U _client
 378 |     file.o: U _close
 379 |     funmount.o: T _funmount
 380 |     funmount.o: U cerror
 381 | \end{awkcode}
 382 | Now it is much easier for other programs to search this output or process it further.
 383 | 
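 384 | This output includes no line numbers, and it does not tell how many times a name is used in a file,
 385 | but that information is easy to get with a text editor or another awk program. The program in this subsection
 386 | does not depend on the language the object files were compiled from, so it is more flexible than the usual cross-reference tools, and
 387 | shorter as well.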
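
For instance, a follow-on program could report where one particular name is defined and used. The sketch below is hypothetical (the wrapper \verb'where' is not from the text); it reads lines in the format produced by \filename{nm.format}, assumes \verb'nm' output in the form shown above, and strips the trailing colon from the filename.
\begin{awkcode}
    # where: hypothetical; reads nm.format output and reports where the
    # symbol given as the first argument is defined (T) or used (U)
    where() {
        awk -v sym="$1" '
        $3 == sym {
            f = $1
            sub(/:$/, "", f)                   # drop the colon after the filename
            if ($2 == "T") print sym, "defined in", f
            else if ($2 == "U") print sym, "used in", f
        }'
    }
\end{awkcode}
With the \filename{nm.format} output above as input, \verb'where cerror' prints \verb'cerror used in funmount.o'.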
 388 | 
 389 | \subsection{Formatted Output}
 390 | \label{subsec:formatted_output}
 391 | 
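 392 | Next we will use awk to make money, or at least to print checks. The input consists of lines, each containing
 393 | a check number, an amount, and a payee, separated by tabs. The output is in standard check format, eight lines high:
 394 | the second and third lines hold the check number and date, indented 45 spaces; the fourth line holds the payee in
 395 | a field 45 characters wide, followed by 3 blanks and then the amount; the fifth line holds the amount
 396 | in words; the other lines are blank. A check looks like this: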
 397 | \begin{mdframed}
 398 | \begin{awkcode}
 399 | 
 400 |                                              1026
 401 |                                              Aug 31, 2015
 402 | Pay to Mary R. Worth--------------------------------   $123.45
 403 | the sum of one hundred twenty three dollars and 45 cents exactly
 404 | 
 405 | 
 406 | 
 407 | \end{awkcode}
 408 | \end{mdframed}
 409 | Here is the awk program that prints the checks:
 410 | \marginpar{75}
 411 | \begin{awkcode}
 412 |     # prchecks - print formatted checks
 413 |     #   input:  number \t amount \t payee
 414 |     #   output: eight lines of text for preprinted check forms
 415 |     
 416 |     BEGIN {
 417 |         FS = "\t"
 418 |         dashes = sp45 = sprintf("%45s", " ")
 419 |         gsub(/ /, "-", dashes)        # to protect the payee
 420 |         "date" | getline date         # get today's date
 421 |         split(date, d, " ")
 422 |         date = d[2] " " d[3] ", " d[6]
 423 |         initnum()    # set up tables for number conversion
 424 |     }
 425 |     NF != 3 || $2 >= 1000000 {        # illegal data
 426 |         printf("\nline %d illegal:\n%s\n\nVOID\nVOID\n\n\n", NR, $0)
 427 |         next                          # no check printed
 428 |     }
 429 |     {   printf("\n")                  # nothing on line 1
 430 |         printf("%s%s\n", sp45, $1)    # number, indented 45 spaces
 431 |         printf("%s%s\n", sp45, date)  # date, indented 45 spaces
 432 |         amt = sprintf("%.2f", $2)     # formatted amount
 433 |         printf("Pay to %45.45s   $%s\n", $3 dashes, amt)  # line 4
 434 |         printf("the sum of %s\n", numtowords(amt))        # line 5
 435 |         printf("\n\n\n")              # lines 6, 7 and 8
 436 |     }
 437 |     
 438 |     function numtowords(n,   cents, dols) { # n has 2 decimal places
 439 |         cents = substr(n, length(n)-1, 2)
 440 |         dols = substr(n, 1, length(n)-3)
 441 |         if (dols == 0)
 442 |             return "zero dollars and " cents " cents exactly"
 443 |         return intowords(dols) " dollars and " cents " cents exactly"
 444 |     }
 445 |     
 446 |     function intowords(n) {
 447 |         n = int(n)
 448 |         if (n >= 1000)
 449 |             return intowords(n/1000) " thousand " intowords(n%1000)
 450 |         if (n >= 100)
 451 |             return intowords(n/100) " hundred " intowords(n%100)
 452 |         if (n >= 20)
 453 |             return tens[int(n/10)] " " intowords(n%10)
 454 |         return nums[n]
 455 |     }
 456 |     
 457 |     function initnum() {
 458 |         split("one two three four five six seven eight nine " \
 459 |               "ten eleven twelve thirteen fourteen fifteen " \
 460 |               "sixteen seventeen eighteen nineteen", nums, " ")
 461 |         split("ten twenty thirty forty fifty sixty " \
 462 |               "seventy eighty ninety", tens, " ")
 463 |     }
 464 | \end{awkcode}
 465 | 
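 466 | The program contains several interesting constructs. First, notice how in the \BEGIN{} action we
 467 | \marginpar{76}
 468 | use \verb'sprintf' to generate a string of blanks, and then convert it to
 469 | dashes by substitution. Notice also how, in the function \verb'initnum', we combine line continuation and
 470 | string concatenation to build the argument of \verb'split'; this is a common idiom.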
 471 | 
 472 | The date is obtained from the system by the line
 473 | \begin{awkcode}
 474 |     "date" | getline date   # get today's date
 475 | \end{awkcode}
 476 | which runs \verb'date' and pipes its output into \verb'getline'. Converting
 477 | \begin{awkcode}
 478 |     Wed Jun 17 13:39:36 EDT 1987
 479 | \end{awkcode}
 480 | into
 481 | \begin{awkcode}
 482 |     Jun 17, 1987
 483 | \end{awkcode}
 484 | takes a little processing of our own.
 485 | (On non-Unix systems that do not support pipes, the program needs some changes to run correctly.)
 486 | 
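 487 | The functions \verb'numtowords' and \verb'intowords' convert numbers into the corresponding words. The conversion
 488 | is straightforward, although it takes about half the program. \verb'intowords' is a recursive
 489 | function: it calls itself to handle a smaller part of the problem. It is the second recursive
 490 | function in this chapter, and we will meet more later. In many cases, recursion is an effective way to break a big problem
 491 | into smaller ones that are easier to solve.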
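
To see the conversion at work, the functions can be called from a small driver. In the sketch below, \verb'numtowords', \verb'intowords', and \verb'initnum' are copied unchanged from \verb'prchecks'; the \BEGIN{} driver and the wrapper name \verb'words' are hypothetical. For 1026 the recursion handles the thousands with \verb'intowords(1)' and the rest with \verb'intowords(26)'.
\begin{awkcode}
    # words: hypothetical driver for the conversion functions of prchecks
    words() {
        awk '
        BEGIN {
            initnum()
            print intowords(123)          # one hundred twenty three
            print intowords(1026)         # one thousand twenty six
            print numtowords("0.45")      # zero dollars and 45 cents exactly
        }

        function numtowords(n,   cents, dols) { # n has 2 decimal places
            cents = substr(n, length(n)-1, 2)
            dols = substr(n, 1, length(n)-3)
            if (dols == 0)
                return "zero dollars and " cents " cents exactly"
            return intowords(dols) " dollars and " cents " cents exactly"
        }

        function intowords(n) {
            n = int(n)
            if (n >= 1000)
                return intowords(n/1000) " thousand " intowords(n%1000)
            if (n >= 100)
                return intowords(n/100) " hundred " intowords(n%100)
            if (n >= 20)
                return tens[int(n/10)] " " intowords(n%10)
            return nums[n]
        }

        function initnum() {
            split("one two three four five six seven eight nine " \
                  "ten eleven twelve thirteen fourteen fifteen " \
                  "sixteen seventeen eighteen nineteen", nums, " ")
            split("ten twenty thirty forty fifty sixty " \
                  "seventy eighty ninety", tens, " ")
        }'
    }
\end{awkcode}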
 492 | 
 493 | \begin{exercise}
 494 |     Use the function \verb'addcomma' from an earlier program to add commas to the amount.
 495 | \end{exercise}
 496 | 
 497 | \begin{exercise}
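 498 |     The program \verb'prchecks' does not handle negative or very large amounts gracefully. Modify the
 499 |     program to reject requests for checks with negative amounts, and to print very large amounts
 500 |     on two lines.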
 501 | \end{exercise}
 502 | 
 503 | \begin{exercise}
 504 |     \label{exer:numtowords}
 505 |     The function \verb'numtowords' sometimes prints two blanks in a row, and it produces blunders like
 506 |     ``one dollars''. How would you remove these defects?
 507 | \end{exercise}
 508 | 
 509 | \begin{exercise}
 510 |     Modify \verb'prchecks' to put hyphens in the proper places in the spelled-out amount, as in
 511 |     ``twenty-one dollars''.
 512 | \end{exercise}
 513 | 
 514 | \section{Data Validation}
 515 | \label{sec:data_validation}
 516 | 
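 517 | Another common use of awk is data validation: making sure that data is legal, or at least plausible. This section contains
 518 | several small programs that check input for validity. For example, consider the column-summing programs of the previous section: are there
 519 | any numeric fields where there should be nonnumeric ones, or vice versa? The next program
 520 | is very close to the column-summing programs, but without the summing: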
 521 | \marginpar{77}
 522 | \begin{awkcode}
 523 |     # colcheck - check consistency of columns
 524 |     #   input:  rows of numbers and strings
 525 |     #   output: lines whose format differs from first line
 526 |     
 527 |     NR == 1	{
 528 |         nfld = NF
 529 |         for (i = 1; i <= NF; i++)
 530 |            type[i] = isnum($i)
 531 |     }
 532 |     {   if (NF != nfld)
 533 |            printf("line %d has %d fields instead of %d\n",
 534 |               NR, NF, nfld)
 535 |         for (i = 1; i <= NF; i++)
 536 |            if (isnum($i) != type[i])
 537 |               printf("field %d in line %d differs from line 1\n",
 538 |                  i, NR)
 539 |     }
 540 |     
 541 |     function isnum(n) { return n ~ /^[+-]?[0-9]+$/ }
 542 | \end{awkcode}
 543 | Again, we treat a number as a sequence of digits, possibly with a leading sign; for a more complete test,
 544 | see the discussion of regular expressions in Section \ref{sec:patterns}.
 545 | 
 546 | \subsection{Balanced Delimiters}
 547 | \label{subsec:balanced_delimiters}
 548 | 
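 549 | In the machine-readable text of this book, each program begins with a special
 550 | line that starts with \verb'.P1' and ends with a special
 551 | line that starts with \verb'.P2'.
 552 | These lines are text-formatting commands; with their help, when the book is typeset,
 553 | the programs come out in a distinctive font. Since programs cannot be nested, these
 554 | text-formatting commands must alternate: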
 555 | \begin{awkcode}
 556 |     .P1 .P2 .P1 .P2 ... .P1 .P2
 557 | \end{awkcode}
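 558 | If one of these delimiters is omitted, the output of the text formatter will be badly mangled. To make sure that the book
 559 | is typeset correctly, we wrote this tiny program, which checks that the delimiters occur in the proper order.
 560 | It is small, but it is typical of a large class of checking programs: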
 561 | \begin{awkcode}
 562 |     # p12check - check input for alternating .P1/.P2 delimiters
 563 |     
 564 |     /^\.P1/ { if (p != 0)
 565 |                   print ".P1 after .P1, line", NR
 566 |               p = 1
 567 |             }
 568 |     /^\.P2/ { if (p != 1)
 569 |                   print ".P2 with no preceding .P1, line", NR
 570 |               p = 0
 571 |             }
 572 |     END     { if (p != 0) print "missing .P2 at end" }
 573 | \end{awkcode}
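 574 | If the delimiters occur in the proper order, the variable \verb'p' silently goes through the sequence of values \texttt{0 1 0 1 0
 575 | ... 1 0}. Otherwise, an error message is printed, together with the number of
 576 | the input line on which the error was detected.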
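To see the checker at work, the following sketch wraps \verb'p12check' unchanged in a shell function. The wrapper name and the sample input are hypothetical, not from the book. In the sample, the program that starts on line 4 never gets its \verb'.P2'. As a result, the \verb'.P1' on line 6 is reported, and so is the missing final \verb'.P2':
\begin{awkcode}
    # p12check from the text, unchanged, wrapped in a shell function
    p12check() {
        awk '
        /^\.P1/ { if (p != 0)
                      print ".P1 after .P1, line", NR
                  p = 1
                }
        /^\.P2/ { if (p != 1)
                      print ".P2 with no preceding .P1, line", NR
                  p = 0
                }
        END     { if (p != 0) print "missing .P2 at end" }
        '
    }
    # sample input: the program starting on line 4 has no closing .P2
    printf '.P1\na\n.P2\n.P1\nb\n.P1\nc\n' | p12check
\end{awkcode}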
 577 | \marginpar{78}
 578 | 
 579 | \begin{exercise}
 580 |     \label{exer:p12check}
 581 |     How would you modify this program to handle text with several different kinds of delimiters?
 582 | \end{exercise}
 583 | 
 584 | \subsection{Password-file checking}
 585 | \label{subsec:password_file checking}
 586 | 
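 587 | The password file on a Unix system contains the names of authorized users and related information about them. Each line of the
 588 | password file consists of 7 fields: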
 589 | \begin{awkcode}
 590 |     root:qyxRi2uhuVjrg:0:2::/:
 591 |     bwk:1L./v6iblzzNE:9:1:Brian Kernighan:/usr/bwk:
 592 |     ava:otxs1oTVoyvMQ:15:1:Al Aho:/usr/ava:
 593 |     uucp:xutIBs2hKtcls:48:1:uucp daemon:/usr/lib/uucp:uucico
 594 |     pjw:xNqY//GDc8FFg:170:2:Peter Weinberger:/usr/pjw:
 595 |     mark:j0z1fuQmqIvdE:374:1:Mark Kernighan:/usr/bwk/mark:
 596 |     ...
 597 | \end{awkcode}
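 598 | The first field is the user's login name, which should contain only letters or digits. The second is the encrypted
 599 | password. If this field is empty, anyone can log in as that user. If it
 600 | is nonempty, only someone who knows the password can log in. The third and fourth fields (the user and group ids) should be numeric.
 601 | The sixth field (the login directory) should begin with \verb'/'. The following program prints every line that does not conform to this structure,
 602 | along with its line number and an appropriate diagnostic message. Running this program every night is a small part of
 603 | keeping a system healthy and free from intruders.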
 604 | \begin{awkcode}
 605 |     # passwd - check password file
 606 |     
 607 |     BEGIN {
 608 |         FS = ":" }
 609 |     NF != 7 {
 610 |         printf("line %d, does not have 7 fields: %s\n", NR, $0) }
 611 |     $1 ~ /[^A-Za-z0-9]/ {
 612 |         printf("line %d, nonalphanumeric user id: %s\n", NR, $0) }
 613 |     $2 == "" {
 614 |         printf("line %d, no password: %s\n", NR, $0) }
 615 |     $3 ~ /[^0-9]/ {
 616 |         printf("line %d, nonnumeric user id: %s\n", NR, $0) }
 617 |     $4 ~ /[^0-9]/ {
 618 |         printf("line %d, nonnumeric group id: %s\n", NR, $0) }
 619 |     $6 !~ /^\// {
 620 |         printf("line %d, invalid login directory: %s\n", NR, $0) }
 621 | \end{awkcode}
 622 | 
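 623 | This is a good example of a program that can be developed incrementally. Each time someone thinks of a new condition to check, it can be added
 624 | as one more pattern-action rule while the rest of the program stays untouched, so the program becomes steadily more thorough.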
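For instance, a check for duplicate numeric user ids could be added as one more pair of rules. This check is a hypothetical extension; the \verb'passwd' program above does not include it. The sketch below runs just those rules on a made-up three-line file. The rules are wrapped in a hypothetical shell function \verb'dupcheck':
\begin{awkcode}
    # dupcheck: hypothetical extra rules for passwd, not from the book.
    # uid[id] remembers the first line on which each user id appeared.
    dupcheck() {
        awk '
        BEGIN { FS = ":" }
        ($3 in uid) {
            printf("line %d, user id %s also used on line %d: %s\n",
                NR, $3, uid[$3], $0) }
        !($3 in uid) { uid[$3] = NR }
        '
    }
    printf 'root:x:0:2::/:\nbwk:y:9:1:Brian Kernighan:/usr/bwk:\nava:z:9:1:Al Aho:/usr/ava:\n' |
        dupcheck
\end{awkcode}
Only the third line is reported, because user id 9 first appeared on line 2.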
 625 | 
 626 | \subsection{Generating data-validation programs}
 627 | \label{subsec:generating_data_validation_programs}
 628 | 
 629 | \marginpar{79}
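 630 | We wrote the password-file checking program by hand, but a more interesting approach is to convert a set of conditions and messages
 631 | into a checking program automatically. Here is a small set of error conditions and messages,
 632 | where each condition is taken from the previous program. The message is to be printed for each input line
 633 | that satisfies the corresponding condition.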
 634 | \begin{awkcode}
 635 |     NF != 7                 does not have 7 fields
 636 |     $1 ~ /[^A-Za-z0-9]/     nonalphanumeric user id
 637 |     $2 == ""                no password
 638 | \end{awkcode}
 639 | The following program converts these condition-message pairs into a checking program:
 640 | \begin{awkcode}
 641 |     # checkgen - generate data-checking program
 642 |     #     input:  expressions of the form: pattern tabs message
 643 |     #     output: program to print message when pattern matches
 644 |     
 645 |     BEGIN { FS = "\t+" }
 646 |     { printf("%s {\n\tprintf(\"line %%d, %s: %%s\\n\",NR,$0) }\n",
 647 |           $1, $2)
 648 |     }
 649 | \end{awkcode}
 650 | Its output is a sequence of conditions, each with an action that prints the corresponding message:
 651 | \begin{awkcode}
 652 |     NF != 7 {
 653 |         printf("line %d, does not have 7 fields: %s\n",NR,$0) }
 654 |     $1 ~ /[^A-Za-z0-9]/ {
 655 |         printf("line %d, nonalphanumeric user id: %s\n",NR,$0) }
 656 |     $2 == "" {
 657 |         printf("line %d, no password: %s\n",NR,$0) }
 658 | \end{awkcode}
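 659 | When the resulting checking program runs, each condition is tested on each input line. If a condition is true, the program prints
 660 | the line number, the error message, and the input line.
 661 | Note that in
 662 | \verb'checkgen', some of the special characters in the \verb'printf' format string must be escaped (not enclosed in quotes)
 663 | for the generated program to be valid. For example, to produce a \verb'%' in the output, \verb'checkgen' must write \verb'%%';
 664 | to produce \verb'\n', it must write \verb'\\n'; and each double quote in the generated program is written as \verb'\"'.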
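The whole cycle can be tried in a few lines of shell. In this sketch, the function name \verb'checkgen_demo', the temporary files, and the sample data are hypothetical. The condition-message pairs are written with tabs between them, and \verb'checkgen' (unchanged) turns them into \verb'check.awk'. The generated program then runs with \verb'-F:' on two password-style lines. Only the second of these lines has an empty password field:
\begin{awkcode}
    # checkgen_demo: hypothetical wrapper, not from the book
    checkgen_demo() {
        tmp=$(mktemp -d) || return 1
        # condition, one or more tabs, message
        printf 'NF != 7\t\tdoes not have 7 fields\n$2 == ""\t\tno password\n' \
            >"$tmp/conds"
        # checkgen from the text, unchanged
        awk '
        BEGIN { FS = "\t+" }
        { printf("%s {\n\tprintf(\"line %%d, %s: %%s\\n\",NR,$0) }\n",
              $1, $2)
        }
        ' "$tmp/conds" >"$tmp/check.awk"
        # run the generated checking program on sample data
        printf 'root:qyxRi2uhuVjrg:0:2::/:\nguest::99:1:Guest:/tmp:\n' |
            awk -F: -f "$tmp/check.awk"
        rm -rf "$tmp"
    }
    checkgen_demo
\end{awkcode}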
 665 | 
 666 | Using one awk program to generate another is a widely applicable technique (and not only in awk);
 667 | we will see more examples of it in later chapters.
 668 | 
 669 | \begin{exercise}
 670 |     \label{exer:checkgen}
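 671 |     Enhance \verb'checkgen' so that pieces of code can be passed through to the generated program verbatim,
 672 |     for example, to create a \verb'BEGIN' action that sets the field separator.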
 673 | \end{exercise}
 674 | 
 675 | \subsection{Which version of AWK?}
 676 | \label{subsec:which_version_of_awk}
 677 | 
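 678 | Awk can be used to inspect programs, and also to organize program testing. The programs in this section have a somewhat
 679 | incestuous flavor: awk programs that check awk programs.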
 680 | 
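 681 | Newer versions of awk may contain more built-in variables and functions, and an old program may inadvertently use
 682 | \marginpar{80}
 683 | one of these names. For example, an old program might use \verb'sub' as the name of a variable, while in the new version of awk
 684 | \verb'sub' is a built-in function. The following program detects such problems: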
 685 | \begin{awkcode}
 686 |     # compat - check if awk program uses new built-in names
 687 |     
 688 |     BEGIN { asplit("close system atan2 sin cos rand srand " \
 689 |                    "match sub gsub", fcns)
 690 |             asplit("ARGC ARGV FNR RSTART RLENGTH SUBSEP", vars)
 691 |             asplit("do delete function return", keys)
 692 |           }
 693 |     
 694 |           { line = $0 }
 695 |     
 696 |     /"/   { gsub(/"([^"]|\\")*"/, "", line) }     # remove strings,
 697 |     /\//  { gsub(/\/([^\/]|\\\/)+\//, "", line) } # reg exprs,
 698 |     /#/   { sub(/#.*/, "", line) }                # and comments
 699 |     
 700 |           { n = split(line, x, "[^A-Za-z0-9_]+")  # into words
 701 |             for (i = 1; i <= n; i++) {
 702 |                 if (x[i] in fcns)	
 703 |                     warn(x[i] " is now a built-in function")
 704 |                 if (x[i] in vars)
 705 |                     warn(x[i] " is now a built-in variable")
 706 |                 if (x[i] in keys)
 707 |                     warn(x[i] " is now a keyword")
 708 |             }
 709 |           }
 710 |     
 711 |     function asplit(str, arr) {  # make an assoc array from str
 712 |         n = split(str, temp)
 713 |         for (i = 1; i <= n; i++)
 714 |             arr[temp[i]]++
 715 |         return n
 716 |     }
 717 |     
 718 |     function warn(s) {
 719 |         sub(/^[ \t]*/, "")
 720 |         printf("file %s, line %d: %s\n\t%s\n", FILENAME, FNR, s, $0)
 721 |     }
 722 | \end{awkcode}
 723 | 
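 724 | The only real complexity in the program is in the substitution statements. Before an input line is checked, they try to remove
 725 | quoted strings, regular expressions, and comments from it. The substitutions are not
 726 | perfect, so some lines may not be processed correctly.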
 727 | 
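 728 | The third argument of the first \verb'split' call (the one in the main rule) is interpreted as a regular expression. The leftmost longest
 729 | substring of the input line matched by this expression becomes a field separator. As a result, that \verb'split' treats every run of characters other than
 730 | letters, digits, and underscores as a separator, and breaks the line into pieces that each consist only of
 731 | letters, digits, and underscores. This one split removes all operators and punctuation at once.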
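The effect of that separator is easy to see on a single line. In this sketch, the wrapper \verb'split_words' is hypothetical. It prints each piece in brackets, preceded by its index. The closing parenthesis at the end of the line produces an empty final element. This does no harm in \verb'compat', because the empty string is never one of the names being looked up:
\begin{awkcode}
    # split_words: hypothetical wrapper showing the split call used in compat
    split_words() {
        awk '{
            n = split($0, w, "[^A-Za-z0-9_]+")
            for (i = 1; i <= n; i++)
                printf("%d:[%s]\n", i, w[i])
        }'
    }
    echo 'if (x[i] in fcns)' | split_words
\end{awkcode}
The output lists \verb'if', \verb'x', \verb'i', \verb'in', and \verb'fcns', followed by an empty sixth element.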
 732 | \marginpar{81}
 733 | 
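 734 | The function \verb'asplit' is like \verb'split', except that it creates an array whose subscripts are the words
 735 | in the string; incoming words can then be tested for membership in that array.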
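Here is \verb'asplit' in isolation. The wrapper \verb'asplit_demo' is hypothetical. Unlike the version in \verb'compat', where they are global, \verb'temp', \verb'i', and \verb'n' are declared here as local variables (extra parameters). The \verb'in' operator tests membership without creating an element. The result is therefore 1 for \verb'sub' and 0 for \verb'subst':
\begin{awkcode}
    # asplit_demo: hypothetical wrapper; temp, i, and n are made local here
    asplit_demo() {
        awk '
        function asplit(str, arr,    temp, i, n) {
            n = split(str, temp)
            for (i = 1; i <= n; i++)
                arr[temp[i]]++
            return n
        }
        BEGIN {
            asplit("close system atan2 sin cos rand srand match sub gsub", fcns)
            found = ("sub" in fcns)
            missing = ("subst" in fcns)
            print found, missing
        }'
    }
    asplit_demo
\end{awkcode}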
 736 | 
 737 | When the file \filename{compat} itself is given as input, the output is:
 738 | \begin{awkcode}
 739 |     file tmp.awk, line 12: gsub is now a built-in function
 740 |         /\//  { gsub(/\/([^\/]|\\\/)+\//, "", line) } # reg exprs,
 741 |     file tmp.awk, line 13: sub is now a built-in function
 742 |         /#/   { sub(/#.*/, "", line) }                # and comments
 743 |     file tmp.awk, line 26: function is now a keyword
 744 |         function asplit(str, arr) {  # make an assoc array from str
 745 |     file tmp.awk, line 30: return is now a keyword
 746 |         return n
 747 |     file tmp.awk, line 33: function is now a keyword
 748 |         function warn(s) {
 749 |     file tmp.awk, line 34: sub is now a built-in function
 750 |         sub(/^[ \t]*/, "")
 751 |     file tmp.awk, line 35: FNR is now a built-in variable
 752 |         printf("file %s, line %d: %s\n\t%s\n", FILENAME, FNR, s, $0)
 753 | \end{awkcode}
 754 | 
 755 | \begin{exercise}
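 756 |     Rewrite \verb'compat' to identify keywords, built-in functions, and so on with regular expressions instead of \verb'asplit'.
 757 |     Compare the two versions in complexity and speed.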
 758 | \end{exercise}
 759 | 
 760 | \begin{exercise}
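 761 |     Because awk variables are not declared, awk will not detect a misspelled variable name.
 762 |     Write a program that finds names used only once in a file.
 763 |     To make it truly useful, you will have to give some thought to function definitions and the variables
 764 |     used in them.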
 765 | \end{exercise}
 766 | 
 767 | \section{Bundle and unbundle}
 768 | \label{sec:bundle_and_unbundle}
 769 | 
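 770 | Before discussing multiline records, let us consider a special case: how to combine (``bundle'') several ASCII files
 771 | into one file in such a way that they can later be split (``unbundled'') into the original
 772 | files. This section contains two programs that perform these two operations. They can be used to bundle many small files
 773 | into one file, either to save disk space or to make mailing them more convenient.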
 774 | 
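 775 | The program \verb'bundle' is so short that you can type it directly on the command line. All it does
 776 | is prefix each line with the name of the file it came from, which is available in the built-in variable \verb'FILENAME'.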
 777 | \begin{awkcode}
 778 |     # bundle - combine multiple files into one
 779 |     { print FILENAME, $0 }
 780 | \end{awkcode}
 781 | 
 782 | The matching \verb'unbundle' is only a little more elaborate:
 783 | \marginpar{82}
 784 | \begin{awkcode}
 785 |     # unbundle - unpack a bundle into separate files
 786 |     
 787 |     $1 != prev { close(prev); prev = $1 }
 788 |                { print substr($0, index($0, " ") + 1) >$1 }
 789 | \end{awkcode}
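 790 | The first rule closes the previous file whenever a new file name appears. If there are only a few files (fewer than the limit on
 791 | the number of files that can be open at the same time), this line can be omitted.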
 792 | 
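 793 | There are many other ways to write \verb'bundle' and \verb'unbundle', but the versions here are
 794 | the simplest, and for short files they are reasonably space efficient. Another organization is to precede
 795 | each file with a distinctive line containing the file name, so that the file name appears
 796 | only once.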
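A round trip is a simple check. This sketch bundles two files, unbundles the result in an empty directory, and compares the copies with the originals using \verb'cmp'. The function name \verb'bundle_demo', the temporary directories, and the two sample files are hypothetical. Like the programs themselves, the sketch assumes that file names contain no blanks. The empty line and the double blank in the samples survive the round trip, because \verb'unbundle' copies everything after the first blank with \verb'substr' rather than reassembling fields:
\begin{awkcode}
    # bundle_demo: hypothetical round-trip test of bundle and unbundle
    bundle_demo() {
        src=$(mktemp -d) && dst=$(mktemp -d) || return 1
        printf 'alpha\n\nlast line\n' >"$src/a.txt"
        printf 'one two  three\n' >"$src/b.txt"
        # bundle, run where the files are so FILENAME has no directory part
        (cd "$src" && awk '{ print FILENAME, $0 }' a.txt b.txt) >"$dst/bundle"
        # unbundle into the (otherwise empty) destination directory
        (cd "$dst" && awk '
            $1 != prev { close(prev); prev = $1 }
                       { print substr($0, index($0, " ") + 1) >$1 }
        ' bundle)
        cmp "$src/a.txt" "$dst/a.txt" && cmp "$src/b.txt" "$dst/b.txt" &&
            echo "round trip ok"
        rm -rf "$src" "$dst"
    }
    bundle_demo
\end{awkcode}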
 797 | 
 798 | \begin{exercise}
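 799 |     Compare the speed and space requirements of different versions of \verb'bundle' and \verb'unbundle'
 800 |     that use different headers and trailers. Evaluate the tradeoff between performance
 801 |     and program complexity.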
 802 | \end{exercise}
 803 | 
 804 | \section{Multiline records}
 805 | \label{sec:multiline_records}
 806 | 
 807 | The records we have seen so far each consist of a single line. Much data, however, comes in records
 808 | of several lines, such as address lists
 809 | \begin{awkcode}
 810 |     Adam Smith
 811 |     1234 Wall St., Apt. 5C
 812 |     New York, NY 10021
 813 |     212 555-4321
 814 | \end{awkcode}
 815 | or bibliographic citations
 816 | \begin{awkcode}
 817 |     Donald E. Knuth
 818 |     The Art of Computer Programming
 819 |     Volume 2: Seminumerical Algorithms, Second Edition
 820 |     Addison-Wesley, Reading, Mass.
 821 |     1981
 822 | \end{awkcode}
 823 | or personal notes
 824 | \begin{awkcode}
 825 |     Chateau Lafite Rothschild 1947
 826 |     12 bottles @ 12.95
 827 | \end{awkcode}
 828 | 
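 829 | Such information is easy to create and maintain if it is of modest size and regular structure;
 830 | in effect, each record is the equivalent of an index card. Handling multiline data in awk takes only a little more work
 831 | than handling single-line data. We will show several approaches.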
 832 | 
 833 | \subsection{Records separated by blank lines}
 834 | \label{subsec:records_separated_by_blank_lines}
 835 | 
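 836 | Suppose we have an address list in which each record contains on its first four lines a name, a street address, a city and state,
 837 | and a phone number. After these, there may be additional lines of other information. Records are separated by a single blank line: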
 838 | \marginpar{83}
 839 | \begin{awkcode}
 840 |     Adam Smith
 841 |     1234 Wall St., Apt. 5C
 842 |     New York, NY 10021
 843 |     212 555-4321
 844 | 
 845 |     David W. Copperfield
 846 |     221 Dickens Lane
 847 |     Monterey, CA 93940
 848 |     408 555-0041
 849 |     work phone 408 555-6532
 850 |     Mary, birthday January 30
 851 | 
 852 |     Canadian Consulate
 853 |     555 Fifth Ave
 854 |     New York, NY
 855 |     212 586-2400
 856 | \end{awkcode}
 857 | 
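 858 | When records are separated by blank lines, they can be handled directly. If the record separator \verb'RS'
 859 | is set to the null string (\verb'RS=""'), each block of lines becomes one record. Thus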
 860 | \begin{awkcode}
 861 |     BEGIN { RS = "" }
 862 |     /New York/
 863 | \end{awkcode}
 864 | prints every record that contains \verb'New York', regardless of how many lines it has:
 865 | \begin{awkcode}
 866 |     Adam Smith
 867 |     1234 Wall St., Apt. 5C
 868 |     New York, NY 10021
 869 |     212 555-4321
 870 |     Canadian Consulate
 871 |     555 Fifth Ave
 872 |     New York, NY
 873 |     212 586-2400
 874 | \end{awkcode}
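 875 | When records are printed this way, there is no blank line between them, so the input format is not
 876 | preserved. The easiest way to fix this is to set the output record separator \verb'ORS'
 877 | to \verb'\n\n':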
 878 | \begin{awkcode}
 879 |     BEGIN { RS = ""; ORS = "\n\n" }
 880 |     /New York/
 881 | \end{awkcode}
 882 | 
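 883 | Suppose we want to print the full names and phone numbers of all the Smiths, that is, the first and fourth lines of every record whose first line ends
 884 | with \verb'Smith'. This would be easy if each line were a field, and that can be arranged
 885 | simply by setting \verb'FS' to \verb'\n':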
 886 | \marginpar{84}
 887 | \begin{awkcode}
 888 |     BEGIN           { RS = ""; FS = "\n" }
 889 |     $1 ~ /Smith$/   { print $1, $4 } # name, phone
 890 | \end{awkcode}
 891 | This produces
 892 | \begin{awkcode}
 893 |     Adam Smith 212 555-4321
 894 | \end{awkcode}
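 895 | Recall that for multiline records, newline is always a field separator, in addition to whatever value \verb'FS' has.
 896 | When \verb'RS' is set to \verb'""', the default field separators are therefore blanks, tabs,
 897 | and newlines. When \verb'FS' is \verb'\n', newline is the only field separator.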
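The difference between the two settings shows up in the field counts. In this sketch, the shell functions \verb'sample' and \verb'para_fields' and the two-record input are hypothetical. \verb'para_fields' sets \verb'FS' only when its argument is nonempty. The \verb'-v' option converts the escape sequence \verb'\n' in its value into a newline:
\begin{awkcode}
    # sample and para_fields are hypothetical helpers, not from the book
    sample() {
        printf 'Adam Smith\n1234 Wall St., Apt. 5C\n\nCanadian Consulate\n'
    }
    para_fields() {
        awk -v fs="$1" '
        BEGIN { RS = ""; if (fs != "") FS = fs }
        { print "record " NR ": NF = " NF }
        '
    }
    sample | para_fields ""     # blanks, tabs, and newlines separate fields
    sample | para_fields '\n'   # only newlines separate fields
\end{awkcode}
With the default separators, the first record has 7 fields and the second has 2. With \verb'FS' set to \verb'\n', they have 2 and 1.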
 898 | 
 899 | \subsection{Processing multiline records}
 900 | \label{subsec:processing_multiline_records}
 901 | 
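 902 | If an existing program processes its input one line at a time, we can still apply it to multiline records by writing two more
 903 | awk programs. The first combines each multiline record into a single line, which the existing program
 904 | then processes. Finally, the second transforms the output back into multiline
 905 | format. (We assume that the lines do not exceed awk's limit on line length.)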
 906 | 
 907 | To make this concrete, let us sort our address list with the Unix command \verb'sort'.
 908 | The following \verb'pipeline' sorts the input by last name:
 909 | \begin{awkcode}
 910 |     # pipeline to sort address list by last names
 911 | 
 912 |     awk '
 913 |     BEGIN { RS = ""; FS = "\n" }
 914 |           { printf("%s!!#", x[split($1, x, " ")])
 915 |             for (i = 1; i <= NF; i++)
 916 |                 printf("%s%s", $i, i < NF ? "!!#" : "\n")
 917 |           }
 918 |     ' |
 919 |     sort |
 920 |     awk '
 921 |     BEGIN { FS = "!!#" }
 922 |           { for (i = 2; i <= NF; i++)
 923 |                 printf("%s\n", $i)
 924 |             printf("\n")
 925 |           }
 926 |     '
 927 | \end{awkcode}
 928 | 
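 929 | In the first program, the function call \verb'split($1, x, " ")' splits the first line of each record into
 930 | the array \verb'x' and returns the number of elements. The last name is therefore in
 931 | the element \verb'x[split($1, x, " ")]' (assuming that the last word on the first line really is
 932 | the last name). For each multiline record, the first program creates a single line consisting of the last name,
 933 | followed by the string \verb'!!#', followed by all the fields of the record, which are also
 934 | separated by \verb'!!#'. Any other string that does not occur in the data and that
 935 | sorts before the data could be used instead of \verb'!!#'. The program after the \verb'sort'
 936 | uses \verb'!!#' as its field separator to recover the original lines and rebuild the multiline records.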
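The following sketch packages the pipeline as a shell function so that it can be tried on a small input. The function name \verb'sort_by_last' and the sample records are hypothetical. Setting \verb'LC_ALL=C' for \verb'sort' is also an addition, not part of the original pipeline. It makes \verb'sort' compare bytes, so the result does not depend on the locale. The Copperfield record comes out first:
\begin{awkcode}
    # sort_by_last: the pipeline from the text in a shell function;
    # LC_ALL=C is an added assumption so sort compares plain bytes
    sort_by_last() {
        awk '
        BEGIN { RS = ""; FS = "\n" }
              { printf("%s!!#", x[split($1, x, " ")])
                for (i = 1; i <= NF; i++)
                    printf("%s%s", $i, i < NF ? "!!#" : "\n")
              }
        ' |
        LC_ALL=C sort |
        awk '
        BEGIN { FS = "!!#" }
              { for (i = 2; i <= NF; i++)
                    printf("%s\n", $i)
                printf("\n")
              }
        '
    }
    printf 'Adam Smith\n1234 Wall St., Apt. 5C\n\nDavid W. Copperfield\n221 Dickens Lane\n' |
        sort_by_last
\end{awkcode}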
 937 | 
 938 | \begin{exercise}
 939 |     Modify the first awk program to detect occurrences of the magic string
 940 |     \verb'!!#' in the input data.
 941 | \end{exercise}
 942 | 
 943 | \marginpar{85}
 944 | \subsection{Records with headers and trailers}
 945 | \label{subsec:records_with_headers_and_trailers}
 946 | 
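 947 | Sometimes records are identified by a header and a trailer rather than by a record separator. Consider
 948 | a simple example, again an address list, but this time each record begins with a header that
 949 | indicates some characteristic of the record, such as occupation, followed by the name. Each record except
 950 | the last ends with a trailer consisting of a blank line: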
 951 | \begin{awkcode}
 952 |     accountant
 953 |     Adam Smith
 954 |     1234 Wall St., Apt. 5C
 955 |     New York, NY 10021
 956 | 
 957 |     doctor - ophthalmologist
 958 |     Dr. Will Seymour
 959 |     798 Maple Blvd.
 960 |     Berkeley Heights, NJ 07922
 961 | 
 962 |     lawyer
 963 |     David W. Copperfield
 964 |     221 Dickens Lane
 965 |     Monterey, CA 93940
 966 | 
 967 |     doctor - pediatrician
 968 |     Dr. Susan Mark
 969 |     600 Mountain Avenue
 970 |     Murray Hill, NJ 07974
 971 | \end{awkcode}
 972 | A range pattern is the simplest way to print the records of all the doctors:
 973 | \begin{awkcode}
 974 |     /^doctor/, /^$/
 975 | \end{awkcode}
 976 | The range pattern matches records that begin with \verb'doctor' and end with a blank line (\verb'/^$/' matches
 977 | an empty line).
 978 | 
 979 | To print the doctor records without their headers, we can use
 980 | \begin{awkcode}
 981 |     /^doctor/ { p = 1; next }
 982 |     p == 1
 983 |     /^$/      { p = 0; next }
 984 | \end{awkcode}
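 985 | This program uses the variable \verb'p' to control the printing of lines. When a line containing the desired header is found,
 986 | \verb'p' is set to 1, and a subsequent trailer resets \verb'p' to 0, its initial
 987 | value. Since lines are printed only when \verb'p' is 1, the program prints the body and the trailer of each record (the blank trailer is printed by \verb'p == 1' before the last rule resets \verb'p');
 988 | other combinations of output are just as easy to select.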
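For example, moving the trailer test ahead of \verb'p == 1' drops the blank trailer line, so only the body of each doctor record is printed. The variant, the wrapper name \verb'doctor_bodies', and the sample input below are hypothetical:
\begin{awkcode}
    # doctor_bodies: hypothetical variant of the program above
    doctor_bodies() {
        awk '
        /^doctor/ { p = 1; next }   # header: start printing after it
        /^$/      { p = 0; next }   # trailer: stop, without printing it
        p == 1                      # body line of a doctor record
        '
    }
    printf 'doctor - pediatrician\nDr. Susan Mark\n600 Mountain Avenue\n\nlawyer\nDavid W. Copperfield\n' |
        doctor_bodies
\end{awkcode}
Only the two body lines of the pediatrician's record are printed.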
 989 | 
 990 | \subsection{Name-value data}
 991 | \label{subsec:name_value_data}
 992 | 
 993 | 
 994 | \marginpar{86}
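 995 | In some applications, data has more structure than a sequence of unformatted lines can capture. For example,
 996 | an address might include a country name, or might have no street address.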
 997 | 
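 998 | One way to handle structured data is to attach an identifying name or keyword to each field of each record. For example,
 999 | we might organize a checkbook like this: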
1000 | \begin{awkcode}
1001 |     check	1021
1002 |     to	Champagne Unlimited
1003 |     amount	123.10
1004 |     date	1/1/87
1005 | 
1006 |     deposit
1007 |     amount	500.00
1008 |     date	1/1/87
1009 | 
1010 |     check	1022
1011 |     date	1/2/87
1012 |     amount	45.10
1013 |     to	Getwell Drug Store
1014 |     tax	medical
1015 | 
1016 |     check	1023
1017 |     amount	125.00
1018 |     to	International Travel
1019 |     date	1/3/87
1020 | 
1021 |     amount	50.00
1022 |     to	Carnegie Hall
1023 |     date	1/3/87
1024 |     check	1024
1025 |     tax	charitable contribution
1026 | 
1027 |     to	American Express
1028 |     check	1025
1029 |     amount	75.75
1030 |     date	1/5/87
1031 | \end{awkcode}
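1032 | We are still using multiline records separated by a single blank line, but within each record every piece
1033 | of data is self-identifying: each field consists of an item name, a tab, and the information.
1034 | This means that different records can contain different fields, and similar fields can appear in different
1035 | orders.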
1036 | 
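1037 | One way to process this data is to treat it as single lines, with the occasional blank line acting as a
1038 | separator. Each line names a field and gives its value, and the lines are not otherwise
1039 | connected. For example, to compute the totals of deposits and checks, we need only scan the deposit and check items: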
1040 | \marginpar{87}
1041 | \begin{awkcode}
1042 |     # check1 - print total deposits and checks
1043 | 
1044 |     /^check/   { ck = 1; next }
1045 |     /^deposit/ { dep = 1; next }
1046 |     /^amount/  { amt = $2; next }
1047 |     /^$/       { addup() }
1048 | 
1049 |     END        { addup()
1050 |                  printf("deposits $%.2f, checks $%.2f\n",
1051 |                      deposits, checks)
1052 |                }
1053 | 
1054 |     function addup() {
1055 |         if (ck)
1056 |             checks += amt
1057 |         else if (dep)
1058 |             deposits += amt
1059 |         ck = dep = amt = 0
1060 |     }
1061 | \end{awkcode}
1062 | The output is
1063 | \begin{awkcode}
1064 |     deposits $500.00, checks $418.95
1065 | \end{awkcode}
1066 | 
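1067 | This program is simple, and as long as the input is well formed, it works no matter what order the items of a record appear in.
1068 | But it is also delicate: it requires careful initialization, reinitialization (\verb'addup' resets the variables after each record), and end-of-file
1069 | processing. An appealing alternative is to read an entire record at a time and then pick out items
1070 | as needed. The following program also sums the deposits and checks, but it uses a
1071 | function that extracts the value of the item with a given name: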
1072 | \begin{awkcode}
1073 |     # check2 - print total deposits and checks
1074 | 
1075 |     BEGIN           { RS = ""; FS = "\n" }
1076 |     /(^|\n)deposit/ { deposits += field("amount"); next }
1077 |     /(^|\n)check/   { checks += field("amount"); next }
1078 |     END             { printf("deposits $%.2f, checks $%.2f\n",
1079 |                           deposits, checks)
1080 |                     }
1081 | 
1082 |     function field(name,   i,f) {
1083 |         for (i = 1; i <= NF; i++) {
1084 |             split($i, f, "\t")
1085 |             if (f[1] == name)
1086 |                 return f[2]
1087 |         }
1088 |         printf("error: no field %s in record\n%s\n", name, $0)
1089 |     }
1090 | \end{awkcode}
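1091 | The function \verb'field(name)' searches the current record for an item called \verb'name' and returns its value. If there is no such item,
1092 | it prints an error message and returns nothing, which counts as zero in the sums.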
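To try \verb'field' by itself, the sketch below wraps it unchanged in a small program that prints the value of one named item from each record. The shell function \verb'checkfield' and the sample record are hypothetical. The item name is passed in with \verb'-v', and the tabs in the sample are produced by \verb'printf':
\begin{awkcode}
    # checkfield: hypothetical wrapper around field() from check2
    checkfield() {
        awk -v want="$1" '
        BEGIN { RS = ""; FS = "\n" }
        { print field(want) }

        function field(name,   i,f) {
            for (i = 1; i <= NF; i++) {
                split($i, f, "\t")
                if (f[1] == name)
                    return f[2]
            }
            printf("error: no field %s in record\n%s\n", name, $0)
        }'
    }
    printf 'check\t1022\ndate\t1/2/87\namount\t45.10\n' | checkfield amount
\end{awkcode}
This prints \verb'45.10', the value of the \verb'amount' item.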
1093 | 
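1094 | A third approach is to split every field of the record into an associative array, and then access the values through the array.
1095 | The following program uses this approach to print the check information in a more compact form: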
1096 | \marginpar{88}
1097 | \begin{awkcode}
1098 |       1/1/87  1021  $123.10  Champagne Unlimited
1099 |       1/2/87  1022   $45.10  Getwell Drug Store
1100 |       1/3/87  1023  $125.00  International Travel
1101 |       1/3/87  1024   $50.00  Carnegie Hall
1102 |       1/5/87  1025   $75.75  American Express
1103 | \end{awkcode}
1104 | The program is
1105 | \begin{awkcode}
1106 |     # check3 - print check information
1107 | 
1108 |     BEGIN { RS = ""; FS = "\n" }
1109 |     /(^|\n)check/ {
1110 |         for (i = 1; i <= NF; i++) {
1111 |             split($i, f, "\t")
1112 |             val[f[1]] = f[2]
1113 |         }
1114 |         printf("%8s %5d %8s  %s\n",
1115 |             val["date"],
1116 |             val["check"],
1117 |             sprintf("$%.2f", val["amount"]),
1118 |             val["to"])
1119 |         for (i in val)
1120 |             delete val[i]
1121 |     }
1122 | \end{awkcode}
1123 | The \verb'sprintf' puts a dollar sign in front of the amount, and the \verb'%8s' in the \verb'printf' then right-justifies
1124 | the resulting string.
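The two-step formatting can be checked in isolation. In this sketch, the function name \verb'money_demo' is hypothetical. Brackets mark the edges of each 8-character field. The dollar sign is part of the string that \verb'%8s' right-justifies:
\begin{awkcode}
    # money_demo: hypothetical; brackets show the 8-character field
    money_demo() {
        awk 'BEGIN {
            printf("[%8s]\n", sprintf("$%.2f", 45.1))
            printf("[%8s]\n", sprintf("$%.2f", "123.10"))
        }'
    }
    money_demo
\end{awkcode}
The two lines printed are \verb'[  $45.10]' and \verb'[ $123.10]'.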
1125 | 
1126 | \begin{exercise}
1127 |     Write a command \texttt{lookup}\ \textit{x}\ \textit{y} that prints, from a known file,
1128 |     every multiline record containing an item whose name is \textit{x} and whose value is
1129 |     \textit{y}.
1130 | \end{exercise}
1131 | 
1132 | \section{Summary}
1133 | \label{sec:data_processing_summary}
1134 | 
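1135 | In this chapter we have shown data-processing programs of many kinds: retrieving information
1136 | from address lists, computing simple statistics on numeric data, checking the validity of programs and data, and so on.
1137 | Several factors make awk a natural tool for such jobs. The \patact model fits
1138 | this kind of processing well; the adjustable field and record separators accommodate input of various forms and shapes;
1139 | associative arrays are convenient for storing both numbers and strings; functions like \verb'split' and
1140 | \verb'substr' are good at picking apart textual data; and \printf is a flexible
1141 | formatting tool. We will see further applications of these facilities in the chapters that follow.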
1142 | 


--------------------------------------------------------------------------------
/latex_src/epilog.tex:
--------------------------------------------------------------------------------
  1 | % vim: ts=8 sts=8 sw=4 et tw=75
  2 | \chapter{Epilog}
  3 | \label{chap:epilog}
  4 | \marginpar{181}
  5 | 
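  6 | By now the reader should be a fairly skilled awk user, or at least no longer
  7 | an awkward beginner. As you studied the examples in this book and wrote programs of your own,
  8 | you may have wondered why awk is the way it is, and whether it could be improved.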
  9 | 
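 10 | The first part of this chapter begins with a little history, and then discusses the strengths and weaknesses
 11 | of awk as a programming language. The second part examines the performance of awk programs, and suggests some ways
 12 | to reformulate problems that have grown too large to be solved by a single
 13 | program.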
 14 | 
 15 | \section{AWK as a language}
 16 | \label{sec:awk_as_a_language}
 17 | 
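 18 | Work on awk began in 1977. At that time, the Unix programs that searched files
 19 | (\texttt{grep} and \texttt{sed}) had only regular-expression patterns, and the only actions
 20 | were substituting and printing whole lines. There were no fields and no numeric operations. Our goal was
 21 | to develop a pattern-matching language that understood fields, with patterns to match fields and actions
 22 | to manipulate them. Initially, we intended to use it for transforming data, validating the input to programs, and generating reports by processing
 23 | the output of programs, or rearranging that output to serve as input to other programs.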
 24 | 
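 25 | The 1977 awk had only a few built-in variables and predefined functions, and at the time it was used only for very short
 26 | programs, like the small ones in Chapter~\ref{chap:an_awk_tutorial}. Later,
 27 | we wrote a short tutorial to show new colleagues how to use awk. The regular-expression notation
 28 | came from \texttt{lex} and \texttt{egrep}, and the other expressions and statements came from the C
 29 | language.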
 30 | 
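 31 | We wanted programs to be as concise as possible, ideally one or two lines long, so they could be typed in and run immediately.
 32 | The defaults were all chosen with this aim: white space
 33 | as the default field separator, implicit initialization, no type declarations for variables, and so on. These design
 34 | choices are what make one-line programs possible. As authors, we ``knew'' how awk
 35 | would be used, and so we usually wrote only one-liners.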
 36 | 
 37 | \marginpar{182}
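 38 | Awk spread quickly, and its users pushed the language hard. We were pleasantly surprised at how fast awk caught on as a general-purpose
 39 | programming language, and our first reaction to an awk program that would not fit
 40 | on a single page was shock and amazement. This happened because many people
 41 | restricted their use of the computer to the shell (the command language) and awk, rather than
 42 | developing programs in a ``real'' programming language; they often stretched their favorite
 43 | tools beyond their limits.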
 44 | 
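 45 | Maintaining two representations for the value of a variable, a string and a number, and choosing the appropriate one from context
 46 | was an experimental design. Its purpose was to make it possible to write short programs with a single set of operators
 47 | that behave correctly even when the boundary between strings and numbers is blurred.
 48 | The goal was largely achieved, but carelessness occasionally produces unexpected results. The rules in Chapter
 49 | \ref{chap:the_awk_language} for resolving ambiguous cases
 50 | all grew out of users' experience.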
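Here is a two-line illustration of such a surprise. The example is hypothetical, not taken from the book. Input fields that look like numbers are compared numerically. String constants are compared as strings, even when they look like numbers:
\begin{awkcode}
    # fields that look numeric compare as numbers: 10 > 9 prints 1
    echo '10 9' | awk '{ print ($1 > $2) }'
    # string constants compare as strings: "10" sorts before "9", prints 0
    awk 'BEGIN { print ("10" > "9") }'
\end{awkcode}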
 51 | 
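 52 | Associative arrays were inspired by SNOBOL4 tables, although they are not as general.
 53 | Awk was born on a slow machine with a small memory, and the properties of arrays are a product of that environment.
 54 | Restricting subscripts to strings is one manifestation of this. Another is the restriction to a single dimension (even with
 55 | syntactic sugar, arrays are still one-dimensional underneath). A more general implementation would allow multidimensional arrays,
 56 | or at least allow array elements to be arrays.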
 57 | 
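 58 | Major features were added to awk in 1985, largely in response to user demand. They
 59 | included dynamic regular expressions, new built-in variables and functions, multiple input streams, and, most important of all,
 60 | user-defined functions.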
 61 | 
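 62 | The \texttt{match} function, dynamic regular expressions, and the new substitution functions provide very useful facilities
 63 | at the cost of only a small increase in complexity for users.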
 64 | 
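 65 | Before \texttt{getline} was introduced, the only kind of input was the implicit input loop implied by the \patact
 66 | statements. That was too restrictive. In the original awk, a program
 67 | with several input sources (the form-letter generator, for example) had to read them by setting a flag variable
 68 | or using some similar trick. In the new version, the several inputs can be read
 69 | with \texttt{getline} in the \texttt{BEGIN} action. On the other hand, \texttt{getline}
 70 | is overloaded, and its syntax does not match that of other expressions. Part of the problem is that \texttt{getline}
 71 | has to return the data it reads, but it also returns a value that indicates success or failure.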
 72 | 
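 73 | The implementation of user-defined functions was a compromise, because many difficulties arose
 74 | from the original design of awk. We did not want to add declarations to the language, and one result is the peculiar way of declaring local
 75 | variables: as extra parameters in the parameter list. Besides looking strange, this convention
 76 | makes errors more likely in large programs. Also, the absence of an explicit concatenation operator makes programs
 77 | shorter, but it requires that the opening parenthesis follow the function name immediately in a call.
 78 | Nevertheless, the new features made it much easier to write large programs in awk.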
 79 | \marginpar{183}
 80 | 
 81 | \section{Performance}
 82 | \label{sec:performance}
 83 | In one sense, awk is seductive: it is often easy to write the program you need,
 84 | and for modest amounts of data it runs fast enough, especially when the program itself
 85 | keeps changing.
 86 | 
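 87 | As the amount of data grows, however, awk programs get slower. This
 88 | is only to be expected, but waiting for the results can still be intolerable. Solutions to this problem
 89 | tend to be complicated, but the suggestions in this section may help.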
 90 | 
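 91 | When a program takes too long to run, there are several things to try besides patience. First, it may be
 92 | possible to make the program run faster, either with a better algorithm or by replacing frequently executed operations
 93 | with equivalent but cheaper ones. In Chapter
 94 | \ref{chap:experiments_with_algorithms} you saw how much difference a good algorithm
 95 | can make: even when the data grows only modestly, the gap between a linear algorithm and a quadratic
 96 | one becomes large. Second, you can restrict awk to part of the job and let other,
 97 | faster programs do the rest. Finally, you can rewrite the program in another language.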
 98 | 
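 99 | Before you try to improve the performance of a program, you need to know where its time goes. Even in
100 | languages where each operation maps closely onto the underlying hardware, people's estimates of how the time is distributed
101 | are notoriously unreliable. Such estimates are even trickier in awk, because many of its operations
102 | do not correspond to conventional machine operations; these include pattern matching, field splitting, string concatenation, and
103 | substitution. The instructions that awk executes to implement these operations differ from machine to machine, and so
104 | their relative costs in awk programs differ as well.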
105 | 
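106 | Awk has no built-in timing facility, so it is up to you to learn which operations are expensive
107 | and which are cheap in your local environment. The simplest way to tell them apart
108 | is to measure the differences between constructs. For example, how long does it take to read a line or
109 | to increment a variable? We made measurements on a variety of computers,
110 | from a PC to a mainframe. We ran 3 programs on an input file of 10,000 lines (500,000 characters),
111 | and compared them with the Unix command \texttt{wc}. The running times, in seconds,
112 | were: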
113 | \begin{center}
114 | \begin{tabular}{l|c|c|c|c|c}
115 |     \hline
116 |     \hline
117 |     \multicolumn{1}{c|}{Program} & \makecell{AT\&T \\ 6300+} &
118 |     \makecell{DEC VAX \\ 11-750} &
119 |     \makecell{AT\&T \\ 3B2/600} & \makecell{SUN-3} &
120 |     \makecell{DEC VAX \\ 8550} \\
121 |     \hline
122 |     \texttt{END \{ print NR \}} & 30 & 17.4 & 5.9 & 4.6 & 1.6 \\
123 |     \texttt{\{n++\}; END \{print n\}} & 45 & 24.4 & 8.4 & 6.5 & 2.4 \\
124 |     \texttt{\{ i = NF \}} & 59 & 34.8 & 12.5 & 9.3 & 3.3 \\
125 |     \texttt{wc} command & 30 & 8.2 & 2.9 & 3.3 & 1.0 \\
126 |     \hline
127 | \end{tabular}
128 | \end{center}
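129 | The first program takes 1.6 seconds on a DEC VAX 8550, so reading an average line
130 | takes 0.16 milliseconds (1.6 s / 10,000 lines). The second program shows that incrementing a variable costs an extra
131 | 0.08 milliseconds per line. The third program shows that reading a line and splitting it into fields takes 0.33 milliseconds, about 0.17 milliseconds more than reading alone. By comparison,
132 | \marginpar{184}
133 | counting the 10,000 lines with a C program (here the Unix command \texttt{wc})
134 | takes 1.0 second, or 0.1 milliseconds per line.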
135 | 
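136 | Similar measurements show that a string comparison such as \verb'$1=="xxx"' takes about
137 | the same time as a regular-expression match such as \verb'$1~/xxx/'. The cost of a regular-expression match
138 | is nearly independent of the complexity of the expression, whereas a compound comparison
139 | becomes more expensive as it grows more complicated. Dynamic regular expressions can be expensive, because
140 | the recognizer may have to be rebuilt for each test.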
141 | 
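142 | Concatenating many strings is expensive. The statement
143 | \begin{awkcode}
144 |     print $1 " " $2 " " $3 " " $4 " " $5
145 | \end{awkcode}
146 | takes about twice as long as the equivalent statement
147 | \begin{awkcode}
148 |     print $1, $2, $3, $4, $5
149 | \end{awkcode}
150 | which relies on the output field separator instead of concatenation.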
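With the default output field separator (a single blank), the two statements print exactly the same text, so the cheaper comma form can replace the concatenation. The following sketch, with a hypothetical sample line, confirms that the outputs match. To compare their speed on a large input, each command can be run under the shell's \verb'time':
\begin{awkcode}
    # both commands print "a b c d e" when OFS is the default single blank
    echo 'a b c d e' | awk '{ print $1 " " $2 " " $3 " " $4 " " $5 }'
    echo 'a b c d e' | awk '{ print $1, $2, $3, $4, $5 }'
\end{awkcode}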
151 | 
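152 | As we said earlier, the behavior of arrays is complicated. As long as an array does not have too many elements, accessing an element
153 | takes constant time. Beyond that point, the access time grows roughly linearly
154 | with the number of elements. With very many elements, performance is also affected by the operating
155 | system, which has to supply memory to hold the values. Accessing an element of a large array therefore
156 | costs more than accessing one in a small array. Keep this in mind if you plan to store a large file in an array.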
157 | 
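158 | The second approach is to restructure the computation so that some of the work is done by other programs.
159 | Throughout this book, for example, we have used the
160 | \texttt{sort} command many times rather than writing a sort in awk. If you need to extract some data from a very large file, you can
161 | search for it with \texttt{grep} or \texttt{egrep} and then pass only the result to awk.
162 | If there are many substitutions to make (as in the cross-reference programs of Chapter~\ref{chap:processing_words}),
163 | you can use a stream editor such as \texttt{sed} for that part of
164 | the job. In short, split a large job into smaller pieces, and then choose for each piece
165 | the tool best suited to it.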
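The following sketch shows this division of labor; the shell function \verb'logdata' and its contents are hypothetical. Both commands print the second field of the lines that contain \verb'error'. In the second command, \verb'grep' does the selection, and awk only has to split the lines that remain:
\begin{awkcode}
    # logdata: hypothetical sample input
    logdata() {
        printf 'ok 1\nerror 2\nok 3\nerror 4\n'
    }
    logdata | awk '/error/ { print $2 }'             # awk alone
    logdata | grep 'error' | awk '{ print $2 }'      # grep selects, awk prints
\end{awkcode}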
166 | 
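167 | The last resort is to rewrite the program in another language. The guiding principle is to replace the most useful built-in
168 | features of awk with subroutines, and otherwise to keep the new program as close as possible to the structure of the original.
169 | Don't try to reproduce exactly how awk works; do just enough to solve the problem at hand.
170 | A useful exercise is to write a small library that provides field splitting, associative arrays,
171 | and regular-expression matching. In languages without dynamic strings, such as C, you may
172 | also need routines that make it easy to allocate and free strings. With such a library, converting an awk
173 | program into a faster program in another language becomes much easier.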
174 | 
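175 | Through built-in features such as pattern matching, field splitting, and associative arrays, awk simplifies jobs that are hard to do
176 | in conventional languages. Programs that use these features are easy to write, but they are somewhat less efficient than
177 | \marginpar{185}%
178 | carefully written equivalent C programs. Usually efficiency is not
179 | a major issue, so awk is both convenient to use and fast enough.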
180 | 
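181 | When efficiency does become a problem, it is worth measuring the running time of the various parts of the program
182 | to see where the time goes. Although the relative costs of operations differ from machine to machine,
183 | the measurement techniques apply on any machine. Finally, even though programming in a lower-level language is more work,
184 | take care to understand what the program does and where its time goes; otherwise the new program may be harder to write and no more
185 | efficient.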
186 | 
187 | \section{Conclusion}
188 | \label{sec:conclusion}
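189 | Awk does not solve every programming problem, but it is an indispensable part of a programmer's toolbox, especially
190 | on Unix, where tools are used constantly. The larger programs in this book may have given you a
191 | different impression, but most awk programs are in fact short, and they do the jobs that awk was
192 | originally designed for: counting, converting data from one format to another, computing, and extracting information from reports.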
193 | 
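194 | For tasks like these, development time matters more than running time, and here awk is hard
195 | to beat. The implicit input loop and the \patact paradigm simplify, and often entirely eliminate,
196 | control flow. Field splitting handles the most common form of input, while numbers, strings,
197 | and the conversions between them handle the most common data types. Associative arrays provide both conventional array
198 | storage and flexible subscripts. Regular expressions give a uniform notation for describing text. Default
199 | initialization and the absence of declarations shorten programs.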
200 | 
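201 | What we did not anticipate was how often awk would be used in unconventional ways. For example, the transition from ``non-programming'' to
202 | ``programming'' is gradual. Because awk lacks the syntactic baggage of conventional languages (such as
203 | C and Pascal), it is easy to learn, and it has even been the first programming language of a substantial
204 | number of people.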
205 | 
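206 | The features added in 1985, especially user-defined functions, have led to many unexpected
207 | applications, such as small database systems and compilers for little languages. In many cases awk is used
208 | only for prototyping: testing the feasibility of an idea and experimenting with features and user interfaces. Even so,
209 | in some cases the awk program remains in use as the real product. Awk has even been
210 | used in software engineering courses, because experimenting with designs can be easier in awk than
211 | in larger languages.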
212 | 
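213 | Of course, we must be careful not to go too far, since every tool has its limits. But many people have
214 | found awk a useful tool for a wide range of problems, and we hope that the ways of using it shown in this book
215 | prove equally useful to you.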
216 | 
217 | \marginpar{186}
218 | \subsection*{Bibliographic notes}
219 | 
220 | The original version of awk is described by the authors of this book in ``AWK --- a pattern scanning and processing language'',
221 | \textit{Software --- Practice and Experience},
222 | April 1979; the paper also contains a technical discussion of the language design.
223 | 
224 | Much of the syntax of awk comes from C, which is described completely in \textit{The C Programming Language}
225 | by B. W. Kernighan and D. M. Ritchie (Prentice-Hall, 1978).
226 | The regular expressions used in \texttt{egrep}, \texttt{lex}, and \texttt{sed}
227 | are discussed in part 2 of \textit{The Unix Programmer's Manual}.
228 | Chapter 3 of \textit{Compilers: Principles, Techniques, and Tools} by Aho, Sethi,
229 | and Ullman (Addison-Wesley, 1986) contains a discussion of
230 | regular-expression pattern matching; the new version of awk uses that technique.
231 | 
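232 | You may find it interesting to compare awk with similar languages. The patriarch of the family is, of course,
233 | SNOBOL4, described in detail in \textit{The SNOBOL4 Programming Language} by R. Griswold, J.
234 | Poage, and I. Polonsky (Prentice-Hall, 1971).
235 | Although SNOBOL4 suffers from an unstructured input language, it is still a very
236 | powerful and flexible programming language. ICON, described in \textit{The ICON Programming Language}
237 | by R. Griswold and M. Griswold (Prentice-Hall, 1983),
238 | is a direct descendant of SNOBOL4, with a friendlier syntax and a more integrated pattern facility.
239 | The REXX interpretive language for IBM systems is another example, although it is intended more
240 | as a shell or command interpreter; see \textit{The REXX Language} by M. F.
241 | Cowlishaw (Prentice-Hall, 1985).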
242 | 


--------------------------------------------------------------------------------
/latex_src/images/cover.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wuzhouhui/awk/787d83cbf2a2f1686d026000a6054e531bb7b538/latex_src/images/cover.pdf


--------------------------------------------------------------------------------
/latex_src/images/heap_sort.eps:
--------------------------------------------------------------------------------
  1 | %!PS-Adobe-2.0 EPSF-2.0
  2 | %%BoundingBox: 46 576 353 756
  3 | %%HiResBoundingBox: 46.000000 576.000000 352.500000 755.500000
  4 | %%Creator: groff version 1.22.2
  5 | %%CreationDate: Wed Feb  8 23:12:36 2017
  6 | %%DocumentNeededResources: font Times-Roman
  7 | %%DocumentSuppliedResources: procset grops 1.22 2
  8 | %%PageOrder: Ascend
  9 | %%DocumentMedia: Default 612 792 0 () ()
[Remainder of heap_sort.eps omitted: groff/grops PostScript (converted by ps2eps 1.68) drawing the plot "HEAPSORT", Comparisons + Exchanges (0 to 2000) against Number of elements (0 to 100), with curves for reverse-sorted, random, and equal-element input.]


--------------------------------------------------------------------------------
/latex_src/images/insertion_sort.eps:
--------------------------------------------------------------------------------
[EPS figure, PostScript source omitted: groff 1.22.2 (grops) output converted by ps2eps 1.68, drawing the plot "INSERTION SORT", Comparisons + Exchanges (0 to 10000) against Number of elements (0 to 100), with curves for reverse-sorted, random, and equal-element input.]


--------------------------------------------------------------------------------
/latex_src/images/quicksort.eps:
--------------------------------------------------------------------------------
[EPS figure, PostScript source omitted: groff 1.22.2 (grops) output converted by ps2eps 1.68, drawing the plot "QUICKSORT", Comparisons + Exchanges (0 to 6000) against Number of elements (0 to 100), with curves for reverse-sorted, random, and equal-element input.]
281 | 140.4 184.08 DL 144 183.864 144 183.864 DL 147.6 183.648 147.6 183.648
282 | DL 151.2 183.36 151.2 183.36 DL 151.2 183.36 151.2 183.36 DL 154.8
283 | 183.072 154.8 183.072 DL 158.4 182.712 158.4 182.712 DL 162 182.424 162
284 | 182.424 DL 165.6 182.064 165.6 182.064 DL 169.2 181.776 169.2 181.776 DL
285 | 172.8 181.416 172.8 181.416 DL 172.8 181.416 172.8 181.416 DL 176.4
286 | 180.984 176.4 180.984 DL 180 180.48 180 180.48 DL 183.6 180.048 183.6
287 | 180.048 DL 187.2 179.544 187.2 179.544 DL 190.8 179.112 190.8 179.112 DL
288 | 194.4 178.68 194.4 178.68 DL 194.4 178.68 194.4 178.68 DL 198 178.248
289 | 198 178.248 DL 201.6 177.816 201.6 177.816 DL 205.2 177.384 205.2
290 | 177.384 DL 208.8 177.024 208.8 177.024 DL 212.4 176.592 212.4 176.592 DL
291 | 216 176.16 216 176.16 DL 216 176.16 216 176.16 DL 219.6 176.016 219.6
292 | 176.016 DL 223.2 175.872 223.2 175.872 DL 226.8 175.728 226.8 175.728 DL
293 | 230.4 175.584 230.4 175.584 DL 234 175.44 234 175.44 DL 237.6 175.296
294 | 237.6 175.296 DL 237.6 175.296 237.6 175.296 DL 241.2 174.792 241.2
295 | 174.792 DL 244.8 174.36 244.8 174.36 DL 248.4 173.928 248.4 173.928 DL
296 | 252 173.424 252 173.424 DL 255.6 172.992 255.6 172.992 DL 259.2 172.56
297 | 259.2 172.56 DL 259.2 172.56 259.2 172.56 DL 262.8 171.48 262.8 171.48
298 | DL 266.4 170.472 266.4 170.472 DL 270 169.392 270 169.392 DL 273.6
299 | 168.384 273.6 168.384 DL 277.2 167.376 277.2 167.376 DL 280.8 166.296
300 | 280.8 166.296 DL 280.8 166.296 280.8 166.296 DL 284.4 166.368 284.4
301 | 166.368 DL 288 166.368 288 166.368 DL 291.6 166.368 291.6 166.368 DL
302 | 295.2 166.44 295.2 166.44 DL 298.8 166.44 298.8 166.44 DL 302.4 166.512
303 | 302.4 166.512 DL 302.4 166.512 302.4 166.512 DL 306 165.36 306 165.36 DL
304 | 309.6 164.208 309.6 164.208 DL 313.2 163.056 313.2 163.056 DL 316.8
305 | 161.904 316.8 161.904 DL 320.4 160.752 320.4 160.752 DL 324 159.6 324
306 | 159.6 DL 324 159.6 324 159.6 DL 327.6 159.384 327.6 159.384 DL 331.2
307 | 159.168 331.2 159.168 DL 334.8 158.88 334.8 158.88 DL 338.4 158.664
308 | 338.4 158.664 DL 342 158.448 342 158.448 DL 345.6 158.232 345.6 158.232
309 | DL 0 Cg EP
310 | %%Trailer
311 | end
312 | %%Trailer
313 | cleartomark
314 | countdictstack
315 | exch sub { end } repeat
316 | restore
317 | %%EOF
318 | 


--------------------------------------------------------------------------------
/latex_src/images/report3.eps:
--------------------------------------------------------------------------------
  1 | %!PS-Adobe-2.0 EPSF-2.0
  2 | %%BoundingBox: 110 507 502 787
  3 | %%HiResBoundingBox: 110.500000 507.000000 502.000000 786.500000
  4 | %%Creator: groff version 1.22.2
  5 | %%CreationDate: Mon Feb  6 21:07:47 2017
  6 | %%DocumentNeededResources: font Times-Roman
  7 | %%DocumentSuppliedResources: procset grops 1.22 2
  8 | %%PageOrder: Ascend
  9 | %%DocumentMedia: Default 612 792 0 () ()
 10 | %%EndComments
 11 | % EPSF created by ps2eps 1.68
 12 | %%BeginProlog
 13 | save
 14 | countdictstack
 15 | mark
 16 | newpath
 17 | /showpage {} def
 18 | /setpagedevice {pop} def
 19 | %%EndProlog
 20 | %%Page 1 1
 21 | %%BeginDefaults
 22 | %%PageMedia: Default
 23 | %%EndDefaults
 24 | %%BeginProlog
 25 | %%BeginResource: procset grops 1.22 2
 26 | %!PS-Adobe-3.0 Resource-ProcSet
 27 | /setpacking where{
 28 | pop
 29 | currentpacking
 30 | true setpacking
 31 | }if
 32 | /grops 120 dict dup begin
 33 | /SC 32 def
 34 | /A/show load def
 35 | /B{0 SC 3 -1 roll widthshow}bind def
 36 | /C{0 exch ashow}bind def
 37 | /D{0 exch 0 SC 5 2 roll awidthshow}bind def
 38 | /E{0 rmoveto show}bind def
 39 | /F{0 rmoveto 0 SC 3 -1 roll widthshow}bind def
 40 | /G{0 rmoveto 0 exch ashow}bind def
 41 | /H{0 rmoveto 0 exch 0 SC 5 2 roll awidthshow}bind def
 42 | /I{0 exch rmoveto show}bind def
 43 | /J{0 exch rmoveto 0 SC 3 -1 roll widthshow}bind def
 44 | /K{0 exch rmoveto 0 exch ashow}bind def
 45 | /L{0 exch rmoveto 0 exch 0 SC 5 2 roll awidthshow}bind def
 46 | /M{rmoveto show}bind def
 47 | /N{rmoveto 0 SC 3 -1 roll widthshow}bind def
 48 | /O{rmoveto 0 exch ashow}bind def
 49 | /P{rmoveto 0 exch 0 SC 5 2 roll awidthshow}bind def
 50 | /Q{moveto show}bind def
 51 | /R{moveto 0 SC 3 -1 roll widthshow}bind def
 52 | /S{moveto 0 exch ashow}bind def
 53 | /T{moveto 0 exch 0 SC 5 2 roll awidthshow}bind def
 54 | /SF{
 55 | findfont exch
 56 | [exch dup 0 exch 0 exch neg 0 0]makefont
 57 | dup setfont
 58 | [exch/setfont cvx]cvx bind def
 59 | }bind def
 60 | /MF{
 61 | findfont
 62 | [5 2 roll
 63 | 0 3 1 roll
 64 | neg 0 0]makefont
 65 | dup setfont
 66 | [exch/setfont cvx]cvx bind def
 67 | }bind def
 68 | /level0 0 def
 69 | /RES 0 def
 70 | /PL 0 def
 71 | /LS 0 def
 72 | /MANUAL{
 73 | statusdict begin/manualfeed true store end
 74 | }bind def
 75 | /PLG{
 76 | gsave newpath clippath pathbbox grestore
 77 | exch pop add exch pop
 78 | }bind def
 79 | /BP{
 80 | /level0 save def
 81 | 1 setlinecap
 82 | 1 setlinejoin
 83 | DEFS/BPhook known{DEFS begin BPhook end}if
 84 | 72 RES div dup scale
 85 | LS{
 86 | 90 rotate
 87 | }{
 88 | 0 PL translate
 89 | }ifelse
 90 | 1 -1 scale
 91 | }bind def
 92 | /EP{
 93 | level0 restore
 94 | showpage
 95 | }def
 96 | /DA{
 97 | newpath arcn stroke
 98 | }bind def
 99 | /SN{
100 | transform
101 | .25 sub exch .25 sub exch
102 | round .25 add exch round .25 add exch
103 | itransform
104 | }bind def
105 | /DL{
106 | SN
107 | moveto
108 | SN
109 | lineto stroke
110 | }bind def
111 | /DC{
112 | newpath 0 360 arc closepath
113 | }bind def
114 | /TM matrix def
115 | /DE{
116 | TM currentmatrix pop
117 | translate scale newpath 0 0 .5 0 360 arc closepath
118 | TM setmatrix
119 | }bind def
120 | /RC/rcurveto load def
121 | /RL/rlineto load def
122 | /ST/stroke load def
123 | /MT/moveto load def
124 | /CL/closepath load def
125 | /Fr{
126 | setrgbcolor fill
127 | }bind def
128 | /setcmykcolor where{
129 | pop
130 | /Fk{
131 | setcmykcolor fill
132 | }bind def
133 | }if
134 | /Fg{
135 | setgray fill
136 | }bind def
137 | /FL/fill load def
138 | /LW/setlinewidth load def
139 | /Cr/setrgbcolor load def
140 | /setcmykcolor where{
141 | pop
142 | /Ck/setcmykcolor load def
143 | }if
144 | /Cg/setgray load def
145 | /RE{
146 | findfont
147 | dup maxlength 1 index/FontName known not{1 add}if dict begin
148 | {
149 | 1 index/FID ne
150 | 2 index/UniqueID ne
151 | and
152 | {def}{pop pop}ifelse
153 | }forall
154 | /Encoding exch def
155 | dup/FontName exch def
156 | currentdict end definefont pop
157 | }bind def
158 | /DEFS 0 def
159 | /EBEGIN{
160 | moveto
161 | DEFS begin
162 | }bind def
163 | /EEND/end load def
164 | /CNT 0 def
165 | /level1 0 def
166 | /PBEGIN{
167 | /level1 save def
168 | translate
169 | div 3 1 roll div exch scale
170 | neg exch neg exch translate
171 | 0 setgray
172 | 0 setlinecap
173 | 1 setlinewidth
174 | 0 setlinejoin
175 | 10 setmiterlimit
176 | []0 setdash
177 | /setstrokeadjust where{
178 | pop
179 | false setstrokeadjust
180 | }if
181 | /setoverprint where{
182 | pop
183 | false setoverprint
184 | }if
185 | newpath
186 | /CNT countdictstack def
187 | userdict begin
188 | /showpage{}def
189 | /setpagedevice{}def
190 | mark
191 | }bind def
192 | /PEND{
193 | cleartomark
194 | countdictstack CNT sub{end}repeat
195 | level1 restore
196 | }bind def
197 | end def
198 | /setpacking where{
199 | pop
200 | setpacking
201 | }if
202 | %%EndResource
203 | %%EndProlog
204 | %%BeginSetup
205 | %%BeginFeature: *PageSize Default
206 | << /PageSize [ 612 792 ] /ImagingBBox null >> setpagedevice
207 | %%EndFeature
208 | %%IncludeResource: font Times-Roman
209 | grops begin/DEFS 1 dict def DEFS begin/u{.001 mul}bind def end/RES 72
210 | def/PL 792 def/LS false def/ENC0[/asciicircum/asciitilde/Scaron/Zcaron
211 | /scaron/zcaron/Ydieresis/trademark/quotesingle/Euro/.notdef/.notdef
212 | /.notdef/.notdef/.notdef/.notdef/.notdef/.notdef/.notdef/.notdef/.notdef
213 | /.notdef/.notdef/.notdef/.notdef/.notdef/.notdef/.notdef/.notdef/.notdef
214 | /.notdef/.notdef/space/exclam/quotedbl/numbersign/dollar/percent
215 | /ampersand/quoteright/parenleft/parenright/asterisk/plus/comma/hyphen
216 | /period/slash/zero/one/two/three/four/five/six/seven/eight/nine/colon
217 | /semicolon/less/equal/greater/question/at/A/B/C/D/E/F/G/H/I/J/K/L/M/N/O
218 | /P/Q/R/S/T/U/V/W/X/Y/Z/bracketleft/backslash/bracketright/circumflex
219 | /underscore/quoteleft/a/b/c/d/e/f/g/h/i/j/k/l/m/n/o/p/q/r/s/t/u/v/w/x/y
220 | /z/braceleft/bar/braceright/tilde/.notdef/quotesinglbase/guillemotleft
221 | /guillemotright/bullet/florin/fraction/perthousand/dagger/daggerdbl
222 | /endash/emdash/ff/fi/fl/ffi/ffl/dotlessi/dotlessj/grave/hungarumlaut
223 | /dotaccent/breve/caron/ring/ogonek/quotedblleft/quotedblright/oe/lslash
224 | /quotedblbase/OE/Lslash/.notdef/exclamdown/cent/sterling/currency/yen
225 | /brokenbar/section/dieresis/copyright/ordfeminine/guilsinglleft
226 | /logicalnot/minus/registered/macron/degree/plusminus/twosuperior
227 | /threesuperior/acute/mu/paragraph/periodcentered/cedilla/onesuperior
228 | /ordmasculine/guilsinglright/onequarter/onehalf/threequarters
229 | /questiondown/Agrave/Aacute/Acircumflex/Atilde/Adieresis/Aring/AE
230 | /Ccedilla/Egrave/Eacute/Ecircumflex/Edieresis/Igrave/Iacute/Icircumflex
231 | /Idieresis/Eth/Ntilde/Ograve/Oacute/Ocircumflex/Otilde/Odieresis
232 | /multiply/Oslash/Ugrave/Uacute/Ucircumflex/Udieresis/Yacute/Thorn
233 | /germandbls/agrave/aacute/acircumflex/atilde/adieresis/aring/ae/ccedilla
234 | /egrave/eacute/ecircumflex/edieresis/igrave/iacute/icircumflex/idieresis
235 | /eth/ntilde/ograve/oacute/ocircumflex/otilde/odieresis/divide/oslash
236 | /ugrave/uacute/ucircumflex/udieresis/yacute/thorn/ydieresis]def
237 | /Times-Roman@0 ENC0/Times-Roman RE
238 | %%EndSetup
239 | %%Page: 1 1
240 | %%BeginPageSetup
241 | BP
242 | %%EndPageSetup
243 | /F0 10/Times-Roman@0 SF(Report No. 3)110.245 12 Q(POPULA)32.37 E
244 | (TION, AREA, POPULA)-1.11 E(TION DENSITY)-1.11 E(January 1, 1988)37.79 E
245 | 24.6(CONTINENT COUNTR)110.245 36 R 26.57(YP)-.65 G(OPULA)-26.57 E 51.975
246 | (TION AREA)-1.11 F(POP)44.315 E 2.5(.D)-1.11 G(EN.)-2.5 E 15.985
247 | (Millions Pct.)261.425 48 R 13.91(of Thousands)2.5 F(Pct. of)15.705 E
248 | (People per)16.395 E(of People)259.35 60 Q -.8(To)19.31 G 18.195(tal of)
249 | .8 F(Sq. Mi.)2.5 E -.8(To)19.99 G 23.175(tal Sq.).8 F(Mi.)2.5 E .4 LW
250 | 501.755 64.5 251.145 64.5 DL 63.48(Asia Japan)110.245 74 R 34.74
251 | (120 4.3)56.05 F 34.175(144 0.6)37.38 F(833.3)28.195 E 55.77(India 746)
252 | 194.555 86 R 29.88(26.5 1267)32.24 F 25.695(4.9 588.8)36.675 F 47.43
253 | (China 1032)194.555 98 R 29.88(36.6 3705)32.24 F 25.695(14.4 278.5)
254 | 31.675 F 51.31(USSR 275)194.555 110 R 29.88(9.8 8649)37.24 F 30.695
255 | (33.7 31.8)31.675 F 501.755 114.5 251.145 114.5 DL -.18(TO)117.745 124 S
256 | -.93(TA)-.22 G 2.5(Lf).93 G(or Asia)-2.5 E 29.74(2173 77.1)84.38 F
257 | 29.175(13765 53.6)27.38 F 501.755 128.5 251.145 128.5 DL 251.145 130.5
258 | 501.755 130.5 DL 52.93(Europe German)110.245 140 R 46.76(y6)-.15 G 37.24
259 | (12)-46.76 G 39.88(.2 96)-37.24 F 25.695(0.4 635.4)36.675 F 47.99
260 | (England 56)194.555 152 R 39.88(2.0 94)37.24 F 25.695(0.4 595.7)36.675 F
261 | 54.11(France 55)194.555 164 R 34.88(2.0 211)37.24 F 25.695(0.8 260.7)
262 | 36.675 F 501.755 168.5 251.145 168.5 DL -.18(TO)117.745 178 S -.93(TA)
263 | -.22 G 2.5(Lf).93 G(or Europe)-2.5 E 34.74(172 6.1)78.83 F 34.175
264 | (401 1.6)37.38 F 501.755 182.5 251.145 182.5 DL 251.145 184.5 501.755
265 | 184.5 DL(North America)110.245 194 Q(Me)24.05 E 50.92(xico 78)-.15 F
266 | 34.88(2.8 762)37.24 F 25.695(3.0 102.4)36.675 F 56.32(USA 237)194.555
267 | 206 R 29.88(8.4 3615)37.24 F 30.695(14.1 65.6)31.675 F 51.33(Canada 25)
268 | 194.555 218 R 29.88(0.9 3852)37.24 F 35.695(15.0 6.5)31.675 F 501.755
269 | 222.5 251.145 222.5 DL -.18(TO)117.745 232 S -.93(TA)-.22 G 2.5(Lf).93 G
270 | (or North America)-2.5 E 29.74(340 12.1)47.45 F 29.175(8229 32.0)32.38 F
271 | 501.755 236.5 251.145 236.5 DL 251.145 238.5 501.755 238.5 DL
272 | (South America)110.245 248 Q 51.88(Brazil 134)24.04 F 29.88(4.8 3286)
273 | 37.24 F 30.695(12.8 40.8)31.675 F 501.755 252.5 251.145 252.5 DL -.18
274 | (TO)117.745 262 S -.93(TA)-.22 G 2.5(Lf).93 G(or South America)-2.5 E
275 | 34.74(134 4.8)47.44 F 29.175(3286 12.8)32.38 F 501.755 266.5 251.145
276 | 266.5 DL 251.145 268.5 501.755 268.5 DL(GRAND T)110.245 278 Q -.4(OT)
277 | -.18 G 86.32(AL 2819)-.53 F 24.88(100.0 25681)27.24 F(100.0)26.675 E
278 | 501.755 282.5 251.145 282.5 DL 251.145 284.5 501.755 284.5 DL 0 Cg EP
279 | %%Trailer
280 | end
281 | %%Trailer
282 | cleartomark
283 | countdictstack
284 | exch sub { end } repeat
285 | restore
286 | %%EOF
287 | 


--------------------------------------------------------------------------------
/latex_src/images/sort_cmp.eps:
--------------------------------------------------------------------------------
  1 | %!PS-Adobe-2.0 EPSF-2.0
  2 | %%BoundingBox: 46 576 353 756
  3 | %%HiResBoundingBox: 46.000000 576.000000 352.500000 756.000000
  4 | %%Creator: groff version 1.22.2
  5 | %%CreationDate: Wed Feb  8 23:22:33 2017
  6 | %%DocumentNeededResources: font Times-Roman
  7 | %%DocumentSuppliedResources: procset grops 1.22 2
  8 | %%PageOrder: Ascend
  9 | %%DocumentMedia: Default 612 792 0 () ()
 10 | %%EndComments
 11 | % EPSF created by ps2eps 1.68
 12 | %%BeginProlog
 13 | save
 14 | countdictstack
 15 | mark
 16 | newpath
 17 | /showpage {} def
 18 | /setpagedevice {pop} def
 19 | %%EndProlog
 20 | %%Page 1 1
 21 | %%BeginDefaults
 22 | %%PageMedia: Default
 23 | %%EndDefaults
 24 | %%BeginProlog
 25 | %%BeginResource: procset grops 1.22 2
 26 | %!PS-Adobe-3.0 Resource-ProcSet
 27 | /setpacking where{
 28 | pop
 29 | currentpacking
 30 | true setpacking
 31 | }if
 32 | /grops 120 dict dup begin
 33 | /SC 32 def
 34 | /A/show load def
 35 | /B{0 SC 3 -1 roll widthshow}bind def
 36 | /C{0 exch ashow}bind def
 37 | /D{0 exch 0 SC 5 2 roll awidthshow}bind def
 38 | /E{0 rmoveto show}bind def
 39 | /F{0 rmoveto 0 SC 3 -1 roll widthshow}bind def
 40 | /G{0 rmoveto 0 exch ashow}bind def
 41 | /H{0 rmoveto 0 exch 0 SC 5 2 roll awidthshow}bind def
 42 | /I{0 exch rmoveto show}bind def
 43 | /J{0 exch rmoveto 0 SC 3 -1 roll widthshow}bind def
 44 | /K{0 exch rmoveto 0 exch ashow}bind def
 45 | /L{0 exch rmoveto 0 exch 0 SC 5 2 roll awidthshow}bind def
 46 | /M{rmoveto show}bind def
 47 | /N{rmoveto 0 SC 3 -1 roll widthshow}bind def
 48 | /O{rmoveto 0 exch ashow}bind def
 49 | /P{rmoveto 0 exch 0 SC 5 2 roll awidthshow}bind def
 50 | /Q{moveto show}bind def
 51 | /R{moveto 0 SC 3 -1 roll widthshow}bind def
 52 | /S{moveto 0 exch ashow}bind def
 53 | /T{moveto 0 exch 0 SC 5 2 roll awidthshow}bind def
 54 | /SF{
 55 | findfont exch
 56 | [exch dup 0 exch 0 exch neg 0 0]makefont
 57 | dup setfont
 58 | [exch/setfont cvx]cvx bind def
 59 | }bind def
 60 | /MF{
 61 | findfont
 62 | [5 2 roll
 63 | 0 3 1 roll
 64 | neg 0 0]makefont
 65 | dup setfont
 66 | [exch/setfont cvx]cvx bind def
 67 | }bind def
 68 | /level0 0 def
 69 | /RES 0 def
 70 | /PL 0 def
 71 | /LS 0 def
 72 | /MANUAL{
 73 | statusdict begin/manualfeed true store end
 74 | }bind def
 75 | /PLG{
 76 | gsave newpath clippath pathbbox grestore
 77 | exch pop add exch pop
 78 | }bind def
 79 | /BP{
 80 | /level0 save def
 81 | 1 setlinecap
 82 | 1 setlinejoin
 83 | DEFS/BPhook known{DEFS begin BPhook end}if
 84 | 72 RES div dup scale
 85 | LS{
 86 | 90 rotate
 87 | }{
 88 | 0 PL translate
 89 | }ifelse
 90 | 1 -1 scale
 91 | }bind def
 92 | /EP{
 93 | level0 restore
 94 | showpage
 95 | }def
 96 | /DA{
 97 | newpath arcn stroke
 98 | }bind def
 99 | /SN{
100 | transform
101 | .25 sub exch .25 sub exch
102 | round .25 add exch round .25 add exch
103 | itransform
104 | }bind def
105 | /DL{
106 | SN
107 | moveto
108 | SN
109 | lineto stroke
110 | }bind def
111 | /DC{
112 | newpath 0 360 arc closepath
113 | }bind def
114 | /TM matrix def
115 | /DE{
116 | TM currentmatrix pop
117 | translate scale newpath 0 0 .5 0 360 arc closepath
118 | TM setmatrix
119 | }bind def
120 | /RC/rcurveto load def
121 | /RL/rlineto load def
122 | /ST/stroke load def
123 | /MT/moveto load def
124 | /CL/closepath load def
125 | /Fr{
126 | setrgbcolor fill
127 | }bind def
128 | /setcmykcolor where{
129 | pop
130 | /Fk{
131 | setcmykcolor fill
132 | }bind def
133 | }if
134 | /Fg{
135 | setgray fill
136 | }bind def
137 | /FL/fill load def
138 | /LW/setlinewidth load def
139 | /Cr/setrgbcolor load def
140 | /setcmykcolor where{
141 | pop
142 | /Ck/setcmykcolor load def
143 | }if
144 | /Cg/setgray load def
145 | /RE{
146 | findfont
147 | dup maxlength 1 index/FontName known not{1 add}if dict begin
148 | {
149 | 1 index/FID ne
150 | 2 index/UniqueID ne
151 | and
152 | {def}{pop pop}ifelse
153 | }forall
154 | /Encoding exch def
155 | dup/FontName exch def
156 | currentdict end definefont pop
157 | }bind def
158 | /DEFS 0 def
159 | /EBEGIN{
160 | moveto
161 | DEFS begin
162 | }bind def
163 | /EEND/end load def
164 | /CNT 0 def
165 | /level1 0 def
166 | /PBEGIN{
167 | /level1 save def
168 | translate
169 | div 3 1 roll div exch scale
170 | neg exch neg exch translate
171 | 0 setgray
172 | 0 setlinecap
173 | 1 setlinewidth
174 | 0 setlinejoin
175 | 10 setmiterlimit
176 | []0 setdash
177 | /setstrokeadjust where{
178 | pop
179 | false setstrokeadjust
180 | }if
181 | /setoverprint where{
182 | pop
183 | false setoverprint
184 | }if
185 | newpath
186 | /CNT countdictstack def
187 | userdict begin
188 | /showpage{}def
189 | /setpagedevice{}def
190 | mark
191 | }bind def
192 | /PEND{
193 | cleartomark
194 | countdictstack CNT sub{end}repeat
195 | level1 restore
196 | }bind def
197 | end def
198 | /setpacking where{
199 | pop
200 | setpacking
201 | }if
202 | %%EndResource
203 | %%EndProlog
204 | %%BeginSetup
205 | %%BeginFeature: *PageSize Default
206 | << /PageSize [ 612 792 ] /ImagingBBox null >> setpagedevice
207 | %%EndFeature
208 | %%IncludeResource: font Times-Roman
209 | grops begin/DEFS 1 dict def DEFS begin/u{.001 mul}bind def end/RES 72
210 | def/PL 792 def/LS false def/ENC0[/asciicircum/asciitilde/Scaron/Zcaron
211 | /scaron/zcaron/Ydieresis/trademark/quotesingle/Euro/.notdef/.notdef
212 | /.notdef/.notdef/.notdef/.notdef/.notdef/.notdef/.notdef/.notdef/.notdef
213 | /.notdef/.notdef/.notdef/.notdef/.notdef/.notdef/.notdef/.notdef/.notdef
214 | /.notdef/.notdef/space/exclam/quotedbl/numbersign/dollar/percent
215 | /ampersand/quoteright/parenleft/parenright/asterisk/plus/comma/hyphen
216 | /period/slash/zero/one/two/three/four/five/six/seven/eight/nine/colon
217 | /semicolon/less/equal/greater/question/at/A/B/C/D/E/F/G/H/I/J/K/L/M/N/O
218 | /P/Q/R/S/T/U/V/W/X/Y/Z/bracketleft/backslash/bracketright/circumflex
219 | /underscore/quoteleft/a/b/c/d/e/f/g/h/i/j/k/l/m/n/o/p/q/r/s/t/u/v/w/x/y
220 | /z/braceleft/bar/braceright/tilde/.notdef/quotesinglbase/guillemotleft
221 | /guillemotright/bullet/florin/fraction/perthousand/dagger/daggerdbl
222 | /endash/emdash/ff/fi/fl/ffi/ffl/dotlessi/dotlessj/grave/hungarumlaut
223 | /dotaccent/breve/caron/ring/ogonek/quotedblleft/quotedblright/oe/lslash
224 | /quotedblbase/OE/Lslash/.notdef/exclamdown/cent/sterling/currency/yen
225 | /brokenbar/section/dieresis/copyright/ordfeminine/guilsinglleft
226 | /logicalnot/minus/registered/macron/degree/plusminus/twosuperior
227 | /threesuperior/acute/mu/paragraph/periodcentered/cedilla/onesuperior
228 | /ordmasculine/guilsinglright/onequarter/onehalf/threequarters
229 | /questiondown/Agrave/Aacute/Acircumflex/Atilde/Adieresis/Aring/AE
230 | /Ccedilla/Egrave/Eacute/Ecircumflex/Edieresis/Igrave/Iacute/Icircumflex
231 | /Idieresis/Eth/Ntilde/Ograve/Oacute/Ocircumflex/Otilde/Odieresis
232 | /multiply/Oslash/Ugrave/Uacute/Ucircumflex/Udieresis/Yacute/Thorn
233 | /germandbls/agrave/aacute/acircumflex/atilde/adieresis/aring/ae/ccedilla
234 | /egrave/eacute/ecircumflex/edieresis/igrave/iacute/icircumflex/idieresis
235 | /eth/ntilde/ograve/oacute/ocircumflex/otilde/odieresis/divide/oslash
236 | /ugrave/uacute/ucircumflex/udieresis/yacute/thorn/ydieresis]def
237 | /Times-Roman@0 ENC0/Times-Roman RE
238 | %%EndSetup
239 | %%Page: 1 1
240 | %%BeginPageSetup
241 | BP
242 | %%EndPageSetup
243 | .4 LW 129.6 40.8 129.6 184.8 DL 345.6 40.8 129.6 40.8 DL 345.6 184.8
244 | 345.6 40.8 DL 129.6 184.8 345.6 184.8 DL/F0 10/Times-Roman@0 SF
245 | (Number of elements)196.775 215.8 Q(Comparisons)45.61 109 Q 2.5(+E)46.27
246 | 121 S(xchanges)-2.5 E 135.36 184.8 129.6 184.8 DL(0)121.72 187 Q 135.36
247 | 156 129.6 156 DL(1000)106.72 158.2 Q 135.36 127.2 129.6 127.2 DL(2000)
248 | 106.72 129.4 Q 135.36 98.4 129.6 98.4 DL(3000)106.72 100.6 Q 135.36 69.6
249 | 129.6 69.6 DL(4000)106.72 71.8 Q 135.36 40.8 129.6 40.8 DL(5000)106.72
250 | 43 Q 129.6 190.56 129.6 184.8 DL(0)127.1 199.912 Q 172.8 190.56 172.8
251 | 184.8 DL(20)167.8 199.912 Q 216 190.56 216 184.8 DL(40)211 199.912 Q
252 | 259.2 190.56 259.2 184.8 DL(60)254.2 199.912 Q 302.4 190.56 302.4 184.8
253 | DL(80)297.4 199.912 Q 345.6 190.56 345.6 184.8 DL(100)338.1 199.912 Q
254 | (COMP)154.44 65.56 Q(ARISON OF)-.92 E(SOR)147.62 77.56 Q(TING METHODS)
255 | -.6 E(\(RANDOM D)154.935 89.56 Q -1.21 -1.11(AT A)-.4 H(\))1.11 E/F1 8
256 | /Times-Roman@0 SF(isort)273.688 100.6 Q(hsort)294.4 143.8 Q(qsort)294.4
257 | 172.6 Q 151.2 182.928 129.6 184.8 DL 172.8 179.688 151.2 182.928 DL
258 | 194.4 175.296 172.8 179.688 DL 216 170.904 194.4 175.296 DL 237.6 165.36
259 | 216 170.832 DL 259.2 160.608 237.6 165.36 DL 280.8 155.424 259.2 160.608
260 | DL 302.4 149.664 280.8 155.424 DL 324 144.048 302.4 149.664 DL 345.6
261 | 138.144 324 144.048 DL 133.2 184.584 129.6 184.8 DL 139.176 184.224
262 | 135.576 184.44 DL 145.224 183.864 141.624 184.08 DL 151.2 183.504 147.6
263 | 183.72 DL 154.8 183.216 151.2 183.504 DL 160.776 182.712 157.176 183 DL
264 | 166.824 182.208 163.224 182.496 DL 172.8 181.704 169.2 181.992 DL 176.4
265 | 181.2 172.8 181.704 DL 182.376 180.336 178.776 180.84 DL 188.424 179.544
266 | 184.824 180.048 DL 194.4 178.68 190.8 179.184 DL 198 178.32 194.4 178.68
267 | DL 203.976 177.744 200.376 178.104 DL 210.024 177.096 206.424 177.456 DL
268 | 216 176.52 212.4 176.88 DL 219.528 175.872 216 176.448 DL 225.576
269 | 174.936 222.048 175.512 DL 231.552 174.072 228.024 174.648 DL 237.6
270 | 173.136 234.072 173.712 DL 241.128 172.56 237.6 173.136 DL 247.176
271 | 171.624 243.648 172.2 DL 253.152 170.688 249.624 171.264 DL 259.2
272 | 169.752 255.672 170.328 DL 262.656 168.816 259.2 169.752 DL 268.704
273 | 167.16 265.248 168.096 DL 274.752 165.576 271.296 166.512 DL 280.8
274 | 163.92 277.344 164.856 DL 284.4 163.704 280.8 163.92 DL 290.376 163.344
275 | 286.776 163.56 DL 296.424 162.984 292.824 163.2 DL 302.4 162.552 298.8
276 | 162.768 DL 305.856 161.544 302.4 162.552 DL 311.904 159.744 308.448
277 | 160.752 DL 317.952 157.944 314.496 158.952 DL 324 156.144 320.544
278 | 157.152 DL 327.456 155.28 324 156.144 DL 333.504 153.768 330.048 154.632
279 | DL 339.552 152.256 336.096 153.12 DL 345.6 150.744 342.144 151.608 DL
280 | 129.6 184.8 129.6 184.8 DL 133.2 184.512 133.2 184.512 DL 136.8 184.296
281 | 136.8 184.296 DL 140.4 184.008 140.4 184.008 DL 144 183.792 144 183.792
282 | DL 147.6 183.504 147.6 183.504 DL 151.2 183.288 151.2 183.288 DL 151.2
283 | 183.288 151.2 183.288 DL 154.8 182.568 154.8 182.568 DL 158.4 181.92
284 | 158.4 181.92 DL 162 181.2 162 181.2 DL 165.6 180.552 165.6 180.552 DL
285 | 169.2 179.832 169.2 179.832 DL 172.8 179.184 172.8 179.184 DL 172.8
286 | 179.184 172.8 179.184 DL 175.896 177.744 175.896 177.744 DL 178.992
287 | 176.376 178.992 176.376 DL 182.088 174.936 182.088 174.936 DL 185.112
288 | 173.496 185.112 173.496 DL 188.208 172.128 188.208 172.128 DL 191.304
289 | 170.688 191.304 170.688 DL 194.4 169.32 194.4 169.32 DL 194.4 169.32
290 | 194.4 169.32 DL 198 168.456 198 168.456 DL 201.6 167.52 201.6 167.52 DL
291 | 205.2 166.656 205.2 166.656 DL 208.8 165.792 208.8 165.792 DL 212.4
292 | 164.928 212.4 164.928 DL 216 164.064 216 164.064 DL 216 164.064 216
293 | 164.064 DL 219.096 162.408 219.096 162.408 DL 222.192 160.752 222.192
294 | 160.752 DL 225.288 159.096 225.288 159.096 DL 228.312 157.44 228.312
295 | 157.44 DL 231.408 155.784 231.408 155.784 DL 234.504 154.128 234.504
296 | 154.128 DL 237.6 152.544 237.6 152.544 DL 237.6 152.544 237.6 152.544 DL
297 | 240.336 150.168 240.336 150.168 DL 243 147.864 243 147.864 DL 245.736
298 | 145.56 245.736 145.56 DL 248.4 143.256 248.4 143.256 DL 251.136 140.952
299 | 251.136 140.952 DL 253.8 138.648 253.8 138.648 DL 256.536 136.344
300 | 256.536 136.344 DL 259.2 133.968 259.2 133.968 DL 259.2 133.968 259.2
301 | 133.968 DL 262.296 131.736 262.296 131.736 DL 265.392 129.504 265.392
302 | 129.504 DL 268.488 127.2 268.488 127.2 DL 271.512 124.968 271.512
303 | 124.968 DL 274.608 122.736 274.608 122.736 DL 277.704 120.432 277.704
304 | 120.432 DL 280.8 118.2 280.8 118.2 DL 280.8 118.2 280.8 118.2 DL 282.744
305 | 115.104 282.744 115.104 DL 284.76 111.936 284.76 111.936 DL 286.704
306 | 108.84 286.704 108.84 DL 288.648 105.744 288.648 105.744 DL 290.592
307 | 102.648 290.592 102.648 DL 292.608 99.48 292.608 99.48 DL 294.552 96.384
308 | 294.552 96.384 DL 296.496 93.288 296.496 93.288 DL 298.44 90.192 298.44
309 | 90.192 DL 300.456 87.024 300.456 87.024 DL 302.4 83.928 302.4 83.928 DL
310 | 302.4 83.928 302.4 83.928 DL 305.496 81.768 305.496 81.768 DL 308.592
311 | 79.536 308.592 79.536 DL 311.688 77.376 311.688 77.376 DL 314.712 75.216
312 | 314.712 75.216 DL 317.808 72.984 317.808 72.984 DL 320.904 70.824
313 | 320.904 70.824 DL 324 68.592 324 68.592 DL 324 68.592 324 68.592 DL
314 | 326.016 65.496 326.016 65.496 DL 328.032 62.472 328.032 62.472 DL
315 | 330.048 59.376 330.048 59.376 DL 332.136 56.28 332.136 56.28 DL 334.152
316 | 53.184 334.152 53.184 DL 336.168 50.088 336.168 50.088 DL 338.184 46.992
317 | 338.184 46.992 DL 340.2 43.896 340.2 43.896 DL 342.216 40.8 342.216 40.8
318 | DL 0 Cg EP
319 | %%Trailer
320 | end
321 | %%Trailer
322 | cleartomark
323 | countdictstack
324 | exch sub { end } repeat
325 | restore
326 | %%EOF
327 | 


--------------------------------------------------------------------------------
/latex_src/images/traffic_deaths.eps:
--------------------------------------------------------------------------------
  1 | %!PS-Adobe-2.0 EPSF-2.0
  2 | %%BoundingBox: 69 575 317 752
  3 | %%HiResBoundingBox: 69.500000 575.000000 317.000000 751.500000
  4 | %%Creator: groff version 1.22.2
  5 | %%CreationDate: Tue Feb  7 22:58:36 2017
  6 | %%DocumentNeededResources: font Times-Roman
  7 | %%DocumentSuppliedResources: procset grops 1.22 2
  8 | %%PageOrder: Ascend
  9 | %%DocumentMedia: Default 612 792 0 () ()
 10 | %%EndComments
 11 | % EPSF created by ps2eps 1.68
 12 | %%BeginProlog
 13 | save
 14 | countdictstack
 15 | mark
 16 | newpath
 17 | /showpage {} def
 18 | /setpagedevice {pop} def
 19 | %%EndProlog
 20 | %%Page 1 1
 21 | %%BeginDefaults
 22 | %%PageMedia: Default
 23 | %%EndDefaults
 24 | %%BeginProlog
 25 | %%BeginResource: procset grops 1.22 2
 26 | %!PS-Adobe-3.0 Resource-ProcSet
 27 | /setpacking where{
 28 | pop
 29 | currentpacking
 30 | true setpacking
 31 | }if
 32 | /grops 120 dict dup begin
 33 | /SC 32 def
 34 | /A/show load def
 35 | /B{0 SC 3 -1 roll widthshow}bind def
 36 | /C{0 exch ashow}bind def
 37 | /D{0 exch 0 SC 5 2 roll awidthshow}bind def
 38 | /E{0 rmoveto show}bind def
 39 | /F{0 rmoveto 0 SC 3 -1 roll widthshow}bind def
 40 | /G{0 rmoveto 0 exch ashow}bind def
 41 | /H{0 rmoveto 0 exch 0 SC 5 2 roll awidthshow}bind def
 42 | /I{0 exch rmoveto show}bind def
 43 | /J{0 exch rmoveto 0 SC 3 -1 roll widthshow}bind def
 44 | /K{0 exch rmoveto 0 exch ashow}bind def
 45 | /L{0 exch rmoveto 0 exch 0 SC 5 2 roll awidthshow}bind def
 46 | /M{rmoveto show}bind def
 47 | /N{rmoveto 0 SC 3 -1 roll widthshow}bind def
 48 | /O{rmoveto 0 exch ashow}bind def
 49 | /P{rmoveto 0 exch 0 SC 5 2 roll awidthshow}bind def
 50 | /Q{moveto show}bind def
 51 | /R{moveto 0 SC 3 -1 roll widthshow}bind def
 52 | /S{moveto 0 exch ashow}bind def
 53 | /T{moveto 0 exch 0 SC 5 2 roll awidthshow}bind def
 54 | /SF{
 55 | findfont exch
 56 | [exch dup 0 exch 0 exch neg 0 0]makefont
 57 | dup setfont
 58 | [exch/setfont cvx]cvx bind def
 59 | }bind def
 60 | /MF{
 61 | findfont
 62 | [5 2 roll
 63 | 0 3 1 roll
 64 | neg 0 0]makefont
 65 | dup setfont
 66 | [exch/setfont cvx]cvx bind def
 67 | }bind def
 68 | /level0 0 def
 69 | /RES 0 def
 70 | /PL 0 def
 71 | /LS 0 def
 72 | /MANUAL{
 73 | statusdict begin/manualfeed true store end
 74 | }bind def
 75 | /PLG{
 76 | gsave newpath clippath pathbbox grestore
 77 | exch pop add exch pop
 78 | }bind def
 79 | /BP{
 80 | /level0 save def
 81 | 1 setlinecap
 82 | 1 setlinejoin
 83 | DEFS/BPhook known{DEFS begin BPhook end}if
 84 | 72 RES div dup scale
 85 | LS{
 86 | 90 rotate
 87 | }{
 88 | 0 PL translate
 89 | }ifelse
 90 | 1 -1 scale
 91 | }bind def
 92 | /EP{
 93 | level0 restore
 94 | showpage
 95 | }def
 96 | /DA{
 97 | newpath arcn stroke
 98 | }bind def
 99 | /SN{
100 | transform
101 | .25 sub exch .25 sub exch
102 | round .25 add exch round .25 add exch
103 | itransform
104 | }bind def
105 | /DL{
106 | SN
107 | moveto
108 | SN
109 | lineto stroke
110 | }bind def
111 | /DC{
112 | newpath 0 360 arc closepath
113 | }bind def
114 | /TM matrix def
115 | /DE{
116 | TM currentmatrix pop
117 | translate scale newpath 0 0 .5 0 360 arc closepath
118 | TM setmatrix
119 | }bind def
120 | /RC/rcurveto load def
121 | /RL/rlineto load def
122 | /ST/stroke load def
123 | /MT/moveto load def
124 | /CL/closepath load def
125 | /Fr{
126 | setrgbcolor fill
127 | }bind def
128 | /setcmykcolor where{
129 | pop
130 | /Fk{
131 | setcmykcolor fill
132 | }bind def
133 | }if
134 | /Fg{
135 | setgray fill
136 | }bind def
137 | /FL/fill load def
138 | /LW/setlinewidth load def
139 | /Cr/setrgbcolor load def
140 | /setcmykcolor where{
141 | pop
142 | /Ck/setcmykcolor load def
143 | }if
144 | /Cg/setgray load def
145 | /RE{
146 | findfont
147 | dup maxlength 1 index/FontName known not{1 add}if dict begin
148 | {
149 | 1 index/FID ne
150 | 2 index/UniqueID ne
151 | and
152 | {def}{pop pop}ifelse
153 | }forall
154 | /Encoding exch def
155 | dup/FontName exch def
156 | currentdict end definefont pop
157 | }bind def
158 | /DEFS 0 def
159 | /EBEGIN{
160 | moveto
161 | DEFS begin
162 | }bind def
163 | /EEND/end load def
164 | /CNT 0 def
165 | /level1 0 def
166 | /PBEGIN{
167 | /level1 save def
168 | translate
169 | div 3 1 roll div exch scale
170 | neg exch neg exch translate
171 | 0 setgray
172 | 0 setlinecap
173 | 1 setlinewidth
174 | 0 setlinejoin
175 | 10 setmiterlimit
176 | []0 setdash
177 | /setstrokeadjust where{
178 | pop
179 | false setstrokeadjust
180 | }if
181 | /setoverprint where{
182 | pop
183 | false setoverprint
184 | }if
185 | newpath
186 | /CNT countdictstack def
187 | userdict begin
188 | /showpage{}def
189 | /setpagedevice{}def
190 | mark
191 | }bind def
192 | /PEND{
193 | cleartomark
194 | countdictstack CNT sub{end}repeat
195 | level1 restore
196 | }bind def
197 | end def
198 | /setpacking where{
199 | pop
200 | setpacking
201 | }if
202 | %%EndResource
203 | %%EndProlog
204 | %%BeginSetup
205 | %%BeginFeature: *PageSize Default
206 | << /PageSize [ 612 792 ] /ImagingBBox null >> setpagedevice
207 | %%EndFeature
208 | %%IncludeResource: font Times-Roman
209 | grops begin/DEFS 1 dict def DEFS begin/u{.001 mul}bind def end/RES 72
210 | def/PL 792 def/LS false def/ENC0[/asciicircum/asciitilde/Scaron/Zcaron
211 | /scaron/zcaron/Ydieresis/trademark/quotesingle/Euro/.notdef/.notdef
212 | /.notdef/.notdef/.notdef/.notdef/.notdef/.notdef/.notdef/.notdef/.notdef
213 | /.notdef/.notdef/.notdef/.notdef/.notdef/.notdef/.notdef/.notdef/.notdef
214 | /.notdef/.notdef/space/exclam/quotedbl/numbersign/dollar/percent
215 | /ampersand/quoteright/parenleft/parenright/asterisk/plus/comma/hyphen
216 | /period/slash/zero/one/two/three/four/five/six/seven/eight/nine/colon
217 | /semicolon/less/equal/greater/question/at/A/B/C/D/E/F/G/H/I/J/K/L/M/N/O
218 | /P/Q/R/S/T/U/V/W/X/Y/Z/bracketleft/backslash/bracketright/circumflex
219 | /underscore/quoteleft/a/b/c/d/e/f/g/h/i/j/k/l/m/n/o/p/q/r/s/t/u/v/w/x/y
220 | /z/braceleft/bar/braceright/tilde/.notdef/quotesinglbase/guillemotleft
221 | /guillemotright/bullet/florin/fraction/perthousand/dagger/daggerdbl
222 | /endash/emdash/ff/fi/fl/ffi/ffl/dotlessi/dotlessj/grave/hungarumlaut
223 | /dotaccent/breve/caron/ring/ogonek/quotedblleft/quotedblright/oe/lslash
224 | /quotedblbase/OE/Lslash/.notdef/exclamdown/cent/sterling/currency/yen
225 | /brokenbar/section/dieresis/copyright/ordfeminine/guilsinglleft
226 | /logicalnot/minus/registered/macron/degree/plusminus/twosuperior
227 | /threesuperior/acute/mu/paragraph/periodcentered/cedilla/onesuperior
228 | /ordmasculine/guilsinglright/onequarter/onehalf/threequarters
229 | /questiondown/Agrave/Aacute/Acircumflex/Atilde/Adieresis/Aring/AE
230 | /Ccedilla/Egrave/Eacute/Ecircumflex/Edieresis/Igrave/Iacute/Icircumflex
231 | /Idieresis/Eth/Ntilde/Ograve/Oacute/Ocircumflex/Otilde/Odieresis
232 | /multiply/Oslash/Ugrave/Uacute/Ucircumflex/Udieresis/Yacute/Thorn
233 | /germandbls/agrave/aacute/acircumflex/atilde/adieresis/aring/ae/ccedilla
234 | /egrave/eacute/ecircumflex/edieresis/igrave/iacute/icircumflex/idieresis
235 | /eth/ntilde/ograve/oacute/ocircumflex/otilde/odieresis/divide/oslash
236 | /ugrave/uacute/ucircumflex/udieresis/yacute/thorn/ydieresis]def
237 | /Times-Roman@0 ENC0/Times-Roman RE
238 | %%EndSetup
239 | %%Page: 1 1
240 | %%BeginPageSetup
241 | BP
242 | %%EndPageSetup
243 | .4 LW 100.8 40.8 100.8 184.8 DL 316.8 40.8 100.8 40.8 DL 316.8 184.8
244 | 316.8 40.8 DL 100.8 184.8 316.8 184.8 DL/F0 10/Times-Roman@0 SF
245 | (Annual T)127.725 215.8 Q(raf)-.35 E(\214c Deaths, USA, 1925-1984)-.25 E
246 | 95.04 171.696 100.8 171.696 DL(10000)68.888 173.896 Q 95.04 119.352
247 | 100.8 119.352 DL(30000)68.888 121.552 Q 95.04 67.008 100.8 67.008 DL
248 | (50000)68.888 69.208 Q 131.688 190.56 131.688 184.8 DL(1930)121.688
249 | 199.912 Q 162.504 190.56 162.504 184.8 DL(1940)152.504 199.912 Q 193.392
250 | 190.56 193.392 184.8 DL(1950)183.392 199.912 Q 224.208 190.56 224.208
251 | 184.8 DL(1960)214.208 199.912 Q 255.096 190.56 255.096 184.8 DL(1970)
252 | 245.096 199.912 Q 285.912 190.56 285.912 184.8 DL(1980)275.912 199.912 Q
253 | /F1 9/Times-Roman@0 SF<83>114.633 146.644 Q<83>130.113 119.284 Q<83>
254 | 145.521 110.644 Q<83>160.929 114.82 Q<83>176.337 130.876 Q<83>179.433
255 | 117.556 Q<83>182.529 119.356 Q<83>185.625 120.436 Q<83>188.721 121.804 Q
256 | <83>191.817 114.1 Q<83>194.913 108.556 Q<83>197.937 106.54 Q<83>201.033
257 | 106.252 Q<83>204.129 112.228 Q<83>207.225 104.956 Q<83>210.321 101.572 Q
258 | <83>213.417 104.308 Q<83>216.513 108.484 Q<83>219.537 106.18 Q<83>
259 | 222.633 105.676 Q<83>225.729 105.964 Q<83>228.825 98.908 Q<83>231.921
260 | 91.78 Q<83>235.017 81.484 Q<83>238.113 77.668 Q<83>241.137 67.732 Q<83>
261 | 244.233 68.164 Q<83>247.329 62.98 Q<83>250.425 60.82 Q<83>253.521 63.196
262 | Q<83>256.617 63.412 Q<83>259.713 58.084 Q<83>262.737 59.452 Q<83>265.833
263 | 82.636 Q<83>268.929 84.436 Q<83>272.025 81.772 Q<83>275.121 75.652 Q<83>
264 | 278.217 69.244 Q<83>281.313 67.228 Q<83>284.337 67.228 Q<83>287.433
265 | 71.908 Q<83>290.529 85.948 Q<83>293.625 89.476 Q<83>296.721 85.084 Q 0
266 | Cg EP
267 | %%Trailer
268 | end
269 | %%Trailer
270 | cleartomark
271 | countdictstack
272 | exch sub { end } repeat
273 | restore
274 | %%EOF
275 | 


--------------------------------------------------------------------------------
/latex_src/preamble.tex:
--------------------------------------------------------------------------------
  1 | % vim: ts=4 sts=4 sw=4 et tw=75
  2 | % preamble here.
  3 | 
  4 | \documentclass[nofonts, oneside, fancyhdr]{ctexbook}
  5 | 
  6 | \usepackage{geometry}
  7 | \usepackage{fontspec}
  8 | \usepackage{xeCJK}
  9 | \usepackage{amssymb}
 10 | % for tikz
 11 | \usepackage{tikz}
 12 | % for varwidth
 13 | \usepackage{varwidth}
 14 | \usepackage{hyperref}
 15 | \usepackage{fancyhdr}
 16 | \usepackage{verbatim}
 17 | % for float's caption
 18 | \usepackage{caption}
 19 | % for the exercise environment
 20 | \usepackage{theorem}
 21 | % for varwidth
 22 | \usepackage{varwidth}
 23 | \usepackage{tikz}
 24 | % for formatting the table of contents
 25 | \usepackage{tocloft}
 26 | % for circled footnote numbers (\ding)
 27 | \usepackage{pifont}
 28 | % reset footnote numbering at the end of each page
 29 | \usepackage[perpage]{footmisc}
 30 | % frame for the summary environment
 31 | \usepackage{mdframed}
 32 | % needed for figures
 33 | \usepackage{graphicx}
 34 | % for xy-pic diagrams
 35 | \usepackage[all, pdf]{xy}
 36 | % frames
 37 | \usepackage{mdframed}
 38 | % two-column layout
 39 | \usepackage{multicol}
 40 | % line breaks inside table cells
 41 | \usepackage{makecell}
 42 | \usepackage{listings}
 43 | 
 44 | \usepackage{pdfpages}
 45 | \usepackage{bookmark}
 46 | 
 47 | % circled footnote numbers
 48 | \renewcommand\thefootnote{\ding{\numexpr171+\value{footnote}}}
 49 | 
 50 | % from package geometry
 51 | % draw a frame around margin notes
 52 | \let\oldmarginpar=\marginpar
 53 | \renewcommand\marginpar[1]{%
 54 |     \oldmarginpar{\framebox{#1}}%
 55 | }
 56 | \geometry{%
 57 |     margin=1cm,
 58 |     marginparsep = 0.5cm,
 59 |     marginparwidth=1cm,
 60 |     top = 2.5cm,
 61 |     bottom = 2cm,
 62 |     outer = 2.0cm,
 63 |     inner = 2.0cm
 64 | }
 65 | 
 66 | \pagestyle{fancy}
 67 | \fancyhead[LE,RO]{\rightmark}
 68 | \fancyhead[LO,RE]{\leftmark}
 69 | \fancyfoot[C]{\thepage}
 70 | 
 71 | % from package hyperref
 72 | \hypersetup{
 73 |     bookmarksnumbered = true,
 74 |     pdftitle = {The AWK Programming Language},
 75 |     pdfcreator = {wuzhouhui250@gmail.com},
 76 |     pdfauthor = {Alfred V. Aho, Brian W. Kernighan, Peter J. Weinberger},
 77 |     pdfsubject = {awk programming},
 78 |     pdfkeywords = {awk, pattern scanning, text processing}
 79 | }
 80 | 
 81 | % from package fontspec and xeCJK
 82 | \setCJKmainfont{AR PL KaitiM GB}
 83 | \setCJKsansfont{AR PL KaitiM GB}
 84 | \setCJKmonofont{AR PL KaitiM GB}
 85 | \setmainfont{Century Schoolbook L}
 86 | \setsansfont{FreeSans}
 87 | % "Mapping={}" make quote symbol straight
 88 | \setmonofont[Mapping={}]{Courier 10 Pitch}
 89 | 
 90 | % names of files or directories
 91 | \newcommand\filename[1]{\texttt{#1}}
 92 | 
 93 | % awk program, from package verbatim
 94 | \newenvironment{awkcode}%
 95 | {\verbatim}%
 96 | {\endverbatim}
 97 | 
 98 | % pattern for many situations
 99 | \newenvironment{pattern}%
100 | {\begin{quotation}}%
101 | {\end{quotation}}
102 | 
103 | % environment for summary
104 | \newmdenv{summaryframe}
105 | \newenvironment{summary}[1]
106 | {
107 |     \begin{summaryframe}
108 |     \begin{center} \Large{#1} \end{center}%
109 | }
110 | {
111 |     \end{summaryframe}
112 | }
113 | 
114 | % term in English
115 | \newcommand\term[1]{\textit{#1}}
116 | % term in Chinese
117 | \newcommand\cterm[1]{\textbf{#1}}
118 | 
119 | % subsection unnumbered
120 | \CTEXsetup[number={}]{subsection}
121 | 
122 | % Mimics the \subsection heading format; see the ctex manual (ctex.pdf).
123 | \newcommand\pseudosubsec[1]{
124 |     \vspace{3.25ex plus 1ex minus .2ex}
125 |     {\noindent\large\textbf{\phantom{占}#1}\par}
126 |     \vspace{1.5ex plus .2ex}
127 |     \phantomsection
128 |     \addcontentsline{toc}{subsection}{\protect\numberline{}#1}
129 | }
130 | 
131 | % words that appear frequently
132 | \newcommand\awk{\texttt{awk}}
133 | \newcommand\print{\texttt{print}}
134 | \newcommand\printf{\texttt{printf}}
135 | \newcommand\nf{\texttt{NF}}
136 | \newcommand\nr{\texttt{NR}}
137 | \newcommand\AND{\texttt{\&\&}}
138 | \newcommand\OR{\texttt{||}}
139 | \newcommand\NOT{\texttt{!}}
140 | \newcommand\BEGIN{\texttt{BEGIN}}
141 | \newcommand\END{\texttt{END}}
142 | \newcommand\length{\texttt{length}}
143 | \newcommand\while{\texttt{while}}
144 | \newcommand\for{\texttt{for}}
145 | \newcommand\patact{\ \mbox{模式}\mbox{--}动作\ }
146 | \newcommand\stmt{\textit{statements}}
147 | \newcommand\expr{\textit{expression}}
148 | \newcommand\regexpr{\textit{regular expression}}
149 | \newcommand\pat{\textit{pattern}}
150 | \newcommand\fs{\texttt{FS}}
151 | \newcommand\OFS{\texttt{OFS}}
152 | \newcommand\ctn{\texttt{continue}}
153 | \newcommand\fmt{\textit{format}}
154 | 
155 | \theoremstyle{plain}
156 | \theoremheaderfont{\bfseries}
157 | \theorembodyfont{\normalfont}
158 | \newtheorem{exercise}{习题}[chapter]
159 | \newcommand\myexer{\textbf{习题\ }}
160 | 
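161 | % indentation of subsections in the table of contents
162 | \setlength\cftsubsecindent{2em}
163 | % width of chapter numbers in the table of contents (ctex chapter numbers are in Chinese, so this needs special care).
164 | % See <<LaTeX 入门>> (Introduction to LaTeX) by Liu Haiyang, Publishing House of Electronics Industry, June 2013.
165 | \settowidth\cftchapnumwidth{第十章} % the widest possible number ("Chapter 10")
166 | \renewcommand\cftchapaftersnumb{\hspace{0.5em}} % extra spacing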
167 | 
168 | \title{AWK 程序设计语言}
169 | \author{Alfred V. Aho \and Brian W. Kernighan \and Peter J. Weinberger \and
170 |     \url{https://github.com/wuzhouhui/awk}
171 | }
172 | 


--------------------------------------------------------------------------------
/latex_src/preface.tex:
--------------------------------------------------------------------------------
  1 | % vim: ts=8 sts=8 sw=4 et tw=75
  2 | \chapter{Preface}
  3 | \label{chap:preface}
  4 | 
  5 | \marginpar{iii}
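  6 | Computer users spend a lot of time on simple, mechanical data processing:
  7 | changing the format of data, checking its validity, searching for particular
  8 | items, adding up numbers, printing reports, and so on. These jobs can all be
  9 | automated, but writing a special-purpose program in a standard language (such as C or Pascal) for each one is a nuisance.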
 10 | 
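 11 | Awk is a special-purpose language that handles such tasks well, often in
 12 | just a line or two. An awk program is a sequence of patterns and actions
 13 | that say what to look for in the input and what to do when it is found.
 14 | Awk searches a set of input files for lines matched by any of the patterns,
 15 | and runs the corresponding action on each matching line. A pattern can
 16 | combine regular expressions with comparisons of strings, numbers, fields,
 17 | variables, and array elements to select lines, and an action can do anything
 18 | with them. Actions look much like C, but need no declarations; strings and numbers are built-in types.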
 19 | 
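 20 | Awk scans its input files automatically and splits each line into fields.
 21 | Because so much is done automatically (input, field splitting, storage
 22 | management, initialization), awk programs are much shorter than programs in
 23 | conventional languages. Awk is used most often for the jobs mentioned earlier;
 24 | because its programs are short, people often type one or two lines at the
 25 | keyboard, run them, and throw them away. In fact, awk is a general-purpose programming tool that can replace many special-purpose ones.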
 26 | 
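 27 | Because its expressions and operations are so convenient, awk is also well
 28 | suited to prototyping large programs: start with a few lines, extend them
 29 | gradually, and try out alternative designs until the program does what is
 30 | wanted. Short programs are easy to get started with, and easy to revise (or
 31 | rewrite from scratch) when a better idea comes along. Once the design is right, translating an awk program into another language is straightforward.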
 32 | \marginpar{iv}
 33 | 
 34 | \section*{Organization of the Book}
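 35 | The primary goal of this book is to tell you what awk is and how to use it
 36 | effectively. Chapter~\ref{chap:an_awk_tutorial} is a quick tutorial; after a few pages you
 37 | should know enough to start writing useful programs. The examples in that
 38 | chapter are very short and typical of how awk is used.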
 39 | 
 40 | Chapter~\ref{chap:the_awk_language} describes the whole awk language. Although it contains
 41 | many examples, it reads like a reference manual and can be dry, so skim it
 42 | on a first reading.
 43 | 
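 44 | The remaining chapters contain a wide variety of examples, chosen to show
 45 | the breadth of awk's applications and how to use it effectively. Some are
 46 | routine, some illustrate programming ideas without being especially
 47 | practical, and a few are included simply because they are fun.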
 48 | 
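 49 | Chapter~\ref{chap:data_processing} concentrates on retrieval, transformation, reduction, and
 50 | validation of data, the tasks awk was originally created for. It also
 51 | discusses how to handle multiline records such as address books.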
 52 | 
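 53 | Awk is an excellent tool for managing small personal databases. Chapter~\ref{chap:reports_and_databases}
 54 | discusses how to generate reports from a database, and how to build a simple
 55 | relational database system and query language for data stored in several files.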
 56 | 
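 57 | Awk handles text as conveniently as most languages handle numbers, so it is
 58 | often used for text processing. Chapter~\ref{chap:processing_words} shows how to generate
 59 | text with awk and how to use it in document preparation. It includes an
 60 | indexing program; an enhanced version of that program produced this book's index.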
 61 | 
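 62 | Chapter~\ref{chap:little_languages} is about ``little languages,'' specialized languages
 63 | tailored to a particular domain. Awk is convenient for writing translators,
 64 | because its basic operations support much of the lexical analysis and table
 65 | management they need. The chapter includes an assembler, a graph-drawing program, and several calculators.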
 66 | 
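 67 | Awk is also useful for demonstrating algorithms. With no declarations and no
 68 | storage management to worry about, awk programs have many of the advantages
 69 | of pseudo-code, yet they run. Chapter~\ref{chap:experiments_with_algorithms} covers experiments with algorithms,
 70 | including testing and performance evaluation. It treats several sorting algorithms and ends with the Unix program \texttt{make}.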
 71 | 
 72 | Chapter~\ref{chap:epilog} reviews the history of awk, and offers some suggestions for what
 73 | to do when a program runs too slowly or the constraints are too tight.
 74 | 
 75 | Appendix~\ref{chap:awk_summary} summarizes the awk language, and
 76 | Appendix~\ref{chap:answers_to_selected_exercises} contains answers to selected exercises.
 77 | 
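 78 | Begin by reading Chapter~\ref{chap:an_awk_tutorial} and writing some small programs of your
 79 | own. Skim Chapter~\ref{chap:the_awk_language}, concentrating on the summaries and tables;
 80 | do not get bogged down in the details. Then read whichever later chapters
 81 | interest you. They are independent of one another, so the order does not matter.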
 82 | \marginpar{v}
 83 | \section*{The Examples}
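 84 | The examples pursue several themes, the most important being how to use awk
 85 | well. We have tried to cover as many language constructs as possible, with
 86 | particular emphasis on associative arrays and regular expressions, the most characteristic features of awk programming.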
 87 | 
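 88 | A second theme is the versatility of awk. It has been used in databases and
 89 | chip design, numerical analysis and graphics, compilers and system administration,
 90 | as a first language for non-programmers, and as the implementation language for software engineering courses. We hope our examples suggest new uses to you too.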
 91 | 
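 92 | A third theme is showing how common computing operations are done. The
 93 | examples include a relational database, an assembler and interpreter for a
 94 | toy computer, a graph-drawing language, a recursive-descent parser for a
 95 | subset of awk, and a file-update program based on \texttt{make}. In each case
 96 | a short program conveys the essence of the operation in a form that is easy to understand and experiment with.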
 97 | 
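 98 | We also try to convey several ways of approaching programming problems. Awk
 99 | supports rapid prototyping well. A less obvious strategy is divide and conquer:
100 | breaking a big job into small pieces, each concentrating on one aspect of the
101 | problem. A final approach is writing programs that create other programs.
102 | Little languages make it easier to define a good user interface and a sound
103 | implementation. Although these ideas are presented in the context of awk, they are general techniques that every programmer should know.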
104 | 
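105 | All of the examples have been tested directly from the text, which is kept in
106 | machine-readable form. We have tried to make the programs free of errors, but
107 | we have not added features to them, nor tested them against every possible
108 | kind of invalid input; their main purpose is to illustrate ideas.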
109 | 
110 | \section*{The Evolution of AWK}
111 | 
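112 | Awk was originally designed and implemented by the authors in 1977, as part
113 | of an experiment to see whether the Unix tools \texttt{grep} and \texttt{sed} could
114 | handle numbers as well as text. It grew out of our interest in regular
115 | expressions and programmable editors. Although awk was meant for writing very
116 | short programs, its capabilities soon attracted many users who wrote large
117 | programs; their demands for more features led to an enhanced version in 1985.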
118 | 
119 | One major addition in the enhanced version is user-defined functions.
120 | \marginpar{vi}
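121 | Other enhancements include dynamic regular expressions, with functions for text
122 | substitution and pattern matching; more built-in functions and variables; new
123 | operators and statements; input from multiple files; and access to command-line arguments. Error messages have also been improved. The examples in
124 | Chapter~\ref{chap:an_awk_tutorial} use only features of the original awk, while later examples use many of the new features.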
125 | 
126 | The version of awk described in this book is part of Unix System V Release 3.1.
127 | Its source code is available through AT\&T's Unix System Toolchest software
128 | distribution system: call 1-201-522-6900 and log in as a guest. In Europe,
129 | contact AT\&T Unix Europe in London (44-1-567-7711); in the Far East,
130 | contact AT\&T Unix Pacific in Tokyo (81-3-431-3670).
131 | 
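132 | Because awk was developed on Unix, some of its features are available only
133 | on Unix systems, and a few of our examples rely on these system-dependent features.
134 | We also assume that certain Unix utilities, particularly \texttt{sort}, are available.
135 | Apart from these restrictions, awk should be usable in many environments; in
136 | particular, it runs under MS-DOS. For more information, contact Addison-Wesley.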
137 | 
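138 | Awk is not perfect: it has its share of irregularities, omissions, and plain bad
139 | design choices, and it is sometimes slow. But it is also a rich language that can
140 | solve many programming problems, and we hope you find it as useful as we do.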
141 | 
142 | \section*{Acknowledgments}
143 | Many people helped with this book, and we thank them sincerely, especially
144 | Jon Bentley, whose enthusiasm has encouraged us throughout. Jon contributed
145 | many ideas and programs, drawn from his long experience using and teaching
146 | awk, and he read parts of the drafts carefully. We are also grateful to Doug
147 | McIlroy, a superb reader who helped improve the structure and content of the
148 | entire book. We also thank Susan Aho, Jaap Akkerhuis, Lorinda Cherry, Chris
149 | Fraser, Eric Grosse, Riccardo Gusella, Bob Herbst, Mark Kernighan, John
150 | Linderman, Bob Martin, Howard Moscovitz, Gerard Schmitt, Don Swartwout, Howard
151 | Trickey, Peter van Eijk, Chris Van Wyk, and Mihalis Yannakakis for their help.
152 | 
153 | \begin{flushright}
154 | Alfred V. Aho\par
155 | Brian W. Kernighan \par
156 | Peter J. Weinberger
157 | \end{flushright}
158 | 


--------------------------------------------------------------------------------