├── .gitignore
├── COPYING
├── MANIFEST.in
├── README.md
├── __init__.py
├── bin
    ├── .pandoc-3.0.1.pkg
    ├── .python3-3.11.1.pkg
    ├── README.hermit.md
    ├── activate-hermit
    ├── hermit
    ├── hermit.hcl
    ├── pandoc
    ├── pip
    ├── pip3
    ├── pip3.11
    ├── pydoc3
    ├── pydoc3.11
    ├── python
    ├── python3
    ├── python3-config
    ├── python3.11
    └── python3.11-config
├── pawk
├── pawk.py
├── pawk_test.py
├── pyproject.toml
└── setup.py


/.gitignore:
--------------------------------------------------------------------------------
1 | __pycache__
2 | *.pyc
3 | .venv
4 | 


--------------------------------------------------------------------------------
/COPYING:
--------------------------------------------------------------------------------
 1 | Copyright (C) 2018 Alec Thomas
 2 | 
 3 | Permission is hereby granted, free of charge, to any person obtaining a copy of
 4 | this software and associated documentation files (the "Software"), to deal in
 5 | the Software without restriction, including without limitation the rights to
 6 | use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
 7 | of the Software, and to permit persons to whom the Software is furnished to do
 8 | so, subject to the following conditions:
 9 | 
10 | The above copyright notice and this permission notice shall be included in all
11 | copies or substantial portions of the Software.
12 | 
13 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
14 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
15 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
16 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
17 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
18 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
19 | SOFTWARE.
20 | 


--------------------------------------------------------------------------------
/MANIFEST.in:
--------------------------------------------------------------------------------
1 | include README.md
2 | 


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
  1 | # PAWK - A Python line processor (like AWK)
  2 | 
  3 | PAWK aims to bring the full power of Python to AWK-like line-processing.
  4 | 
  5 | Here are some quick examples to show some of the advantages of pawk over AWK.
  6 | 
  7 | The first example transforms `/etc/hosts` into a JSON map of host to IP:
  8 | 
  9 | 	cat /etc/hosts | pawk -B 'd={}' -E 'json.dumps(d)' '!/^#/ d[f[1]] = f[0]'
 10 | 
 11 | Breaking this down:
 12 | 
 13 | 1. `-B 'd={}'` is a begin statement initializing a dictionary, executed once before processing begins.
 14 | 2. `-E 'json.dumps(d)'` is an end statement expression, producing the JSON representation of the dictionary `d`.
 15 | 3. `!/^#/` tells pawk to match any line *not* beginning with `#`.
 16 | 4. `d[f[1]] = f[0]` adds a dictionary entry where the key is the second field in the line (the first hostname) and the value is the first field (the IP address).
 17 | 
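For comparison, the same steps can be sketched in plain Python. This only illustrates what the one-liner computes and is not how pawk is implemented; the function name `hosts_to_json` and the sample text are made up for the example:

```python
import json


def hosts_to_json(text):
    """Map the first hostname on each non-comment line to its IP address."""
    d = {}                            # -B 'd={}'
    for line in text.splitlines():
        if line.startswith("#"):      # !/^#/ skips comment lines
            continue
        f = line.split()              # default whitespace field split
        try:
            d[f[1]] = f[0]            # d[f[1]] = f[0]
        except IndexError:
            continue                  # without --strict, failing lines are skipped
    return json.dumps(d)              # -E 'json.dumps(d)'


sample = "# comment\n127.0.0.1 localhost\n\n10.0.0.5 build box\n"
print(hosts_to_json(sample))
```

Blank lines and lines with a single field raise `IndexError` on `f[1]`, which this sketch skips in the same way pawk ignores per-line exceptions unless `--strict` is given.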
 18 | And another example showing how to bzip2-compress + base64-encode a file:
 19 | 
 20 | 	cat pawk.py | pawk -E 'base64.encodebytes(bz2.compress(t.encode())).decode()'
 21 | 
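In Python 3, `bz2.compress()` and the `base64` functions operate on `bytes` rather than `str`, so a standalone version of this pipeline has to encode the text first. A minimal sketch, assuming UTF-8 input; the helper names are hypothetical:

```python
import base64
import bz2


def compress_and_encode(text):
    """bzip2-compress text, then base64-encode it into an ASCII string."""
    raw = text.encode("utf-8")
    return base64.encodebytes(bz2.compress(raw)).decode("ascii")


def decode_and_decompress(encoded):
    """Inverse of compress_and_encode, useful for checking a round trip."""
    raw = base64.decodebytes(encoded.encode("ascii"))
    return bz2.decompress(raw).decode("utf-8")


original = "hello pawk\n" * 3
assert decode_and_decompress(compress_and_encode(original)) == original
```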
 22 | ### AWK example translations
 23 | 
 24 | Most basic AWK constructs are available. You can find more idiomatic examples below in the example section, but here are a bunch of awk commands and their equivalent pawk commands to get started with:
 25 | 
 26 | Print lines matching a pattern:
 27 | 
 28 | 	ls -l / | awk '/etc/'
 29 | 	ls -l / | pawk '/etc/'
 30 | 
 31 | Print lines *not* matching a pattern:
 32 | 
 33 | 	ls -l / | awk '!/etc/'
 34 | 	ls -l / | pawk '!/etc/'
 35 | 
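Both forms rely on the optional `/pattern/` prefix (with a leading `!` to negate) being split off from the expression that follows it. A minimal sketch of that split, modelled on the regular expression in `Action._parse_command` from `pawk.py`; the function name `parse_action` is a hypothetical stand-in:

```python
import re


def parse_action(arg):
    """Split an action into (negate, pattern, command)."""
    # Optional "!" and "/regex/" prefix, then everything else as the command.
    match = re.match(r'(?ms)(?:(!)?/((?:\\.|[^/])+)/)?(.*)', arg)
    negate, pattern, cmd = match.groups()
    return bool(negate), pattern, cmd.strip()


print(parse_action('!/^#/ d[f[1]] = f[0]'))
print(parse_action('int(f[4]) > 1024'))
```

When no pattern prefix is present, `pattern` is `None` and the whole argument is treated as the command.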
 36 | Field slicing and dicing (here pawk wins because of Python's array slicing):
 37 | 
 38 | 	ls -l / | awk '/etc/ {print $5, $6, $7, $8, $9}'
 39 | 	ls -l / | pawk '/etc/ f[4:]'
 40 | 
 41 | Begin and end actions (in this case, summing the sizes of all files):
 42 | 
 43 | 	ls -l | awk 'BEGIN {c = 0} {c += $5} END {print c}'
 44 | 	ls -l | pawk -B 'c = 0' -E 'c' 'c += int(f[4])'
 45 | 
 46 | Print lines where a field satisfies a numeric expression (in this case, files larger than 1024 bytes):
 47 | 
 48 | 	ls -l | awk '$5 > 1024'
 49 | 	ls -l | pawk 'int(f[4]) > 1024'
 50 | 
 51 | Matching a single field (any filename with "t" in it):
 52 | 
 53 | 	ls -l | awk '$NF ~/t/'
 54 | 	ls -l | pawk '"t" in f[-1]'
 55 | 
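The translations above lean on one mapping: AWK fields are 1-based (`$1`), while pawk's `f` is an ordinary 0-based Python list. A small sketch of the correspondence in plain Python, using a sample `ls -l` line chosen for illustration:

```python
line = "-rw-r-----  1 alec  staff  5045 Feb 10 11:37 pawk.py"
f = line.split()  # whitespace split, as pawk does when -F is not given

print(f[4])    # awk $5        -> 5045
print(f[4:])   # awk $5 .. $NF -> ['5045', 'Feb', '10', '11:37', 'pawk.py']
print(f[-1])   # awk $NF       -> pawk.py
```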
 56 | ## Installation
 57 | 
 58 | It should be as simple as:
 59 | 
 60 | ```
 61 | pip install pawk
 62 | ```
 63 | 
 64 | But if that doesn't work, just download `pawk.py`, make it executable, and place it somewhere in your path.
 65 | 
 66 | ## Expression evaluation
 67 | 
 68 | PAWK evaluates a Python expression or statement against each line of stdin. The following variables are available in the local context:
 69 | 
 70 | - `line` - Current line text, including newline.
 71 | - `l` - Current line text, excluding the newline and any other trailing whitespace.
 72 | - `n` - The current 1-based line number.
 73 | - `f` - Fields of the line (split by the field separator `-F`).
 74 | - `nf` - Number of fields in this line.
 75 | - `m` - Tuple of match regular expression capture groups, if any.
 76 | 
 77 | 
 78 | In the context of the `-E` block:
 79 | 
 80 | - `t` - The entire input text up to the current cursor position.
 81 | 
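How the per-line variables are derived can be sketched as follows. This mirrors `Context.apply` in `pawk.py` as a standalone function; the name `line_context` is an assumption made for the example:

```python
def line_context(index, line, delim=None):
    """Build the per-line variables from a 0-based line index and raw line."""
    l = line.rstrip()       # strips the newline and other trailing whitespace
    f = l.split(delim)      # delim=None splits on runs of whitespace
    return {"line": line, "l": l, "n": index + 1, "f": f, "nf": len(f)}


ctx = line_context(0, "12 bob\n")
print(ctx["n"], ctx["f"], ctx["nf"])  # 1 ['12', 'bob'] 2
```

With an explicit delimiter such as `,`, empty fields are preserved, so `"A01,,100"` yields three fields.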
 82 | If the flag `-H, --header` is provided, each field in the first row of the input is used as the variable name for the corresponding field in subsequent rows. The header is not output. For example, given the input:
 83 | 
 84 | ```
 85 | count name
 86 | 12 bob
 87 | 34 fred
 88 | ```
 89 | 
 90 | We could do:
 91 | 
 92 | ```
 93 | $ pawk -H '"%s is %s" % (name, count)' < input.txt
 94 | bob is 12
 95 | fred is 34
 96 | ```
 97 | 
 98 | To output a header as well, use `-B`:
 99 | 
100 | ```
101 | $ pawk -H -B '"name is count"' '"%s is %s" % (name, count)' < input.txt
102 | name is count
103 | bob is 12
104 | fred is 34
105 | ```
106 | 
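Header handling amounts to pairing the header fields with each row's fields. A sketch of the idea, following the `zip_longest` call in `Context.apply`; `bind_headers` is a hypothetical helper:

```python
from itertools import zip_longest


def bind_headers(header_line, row_line, delim=None):
    """Pair header names with the fields of one row."""
    headers = header_line.rstrip().split(delim)
    fields = row_line.rstrip().split(delim)
    # zip_longest pads the shorter side with None, so a row with
    # missing fields still defines every header name.
    return dict(zip_longest(headers, fields))


row = bind_headers("count name\n", "12 bob\n")
print("%s is %s" % (row["name"], row["count"]))  # bob is 12
```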
107 | Module references will be automatically imported if possible. Additionally, the `--import <module>[,<module>,...]` flag can be used to import symbols from a set of modules into the evaluation context.
108 | 
109 | For example, `--import os.path` imports all public symbols from `os.path` into the context, so that `os.path.isfile()` can be called as plain `isfile()`.
110 | 
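A rough sketch of what `--import` does to the evaluation context: every public member of each listed module is copied into a dictionary of names. It uses `importlib.import_module` rather than the `__import__` call in `pawk.py`, and `import_symbols` is a hypothetical name:

```python
import importlib
import inspect


def import_symbols(module_names):
    """Return a dict of the public members of each comma-separated module."""
    context = {}
    for name in module_names.split(","):
        module = importlib.import_module(name.strip())
        context.update(
            (key, value)
            for key, value in inspect.getmembers(module)
            if not key.startswith("_")
        )
    return context


ctx = import_symbols("os.path")
print(ctx["basename"]("/tmp/example.txt"))  # example.txt
```

Because later modules overwrite earlier names, the order of the comma-separated list matters when two modules export the same symbol.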
111 | ## Output
112 | 
113 | ### Line actions
114 | 
115 | The type of the evaluated expression determines how output is displayed:
116 | 
117 | - `tuple` or `list`: the elements are converted to strings and joined with the output delimiter (`-O`).
118 | - `None` or `False`: nothing is output for that line.
119 | - `True`: the original line is output.
120 | - Any other value is converted to a string.
121 | 
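These rules can be expressed as a small function. The sketch below follows `write_result` in `pawk.py`, but returns the text instead of writing it; `format_result` and its parameter names are assumptions for illustration:

```python
def format_result(result, line, odelim=" ", line_separator="\n"):
    """Return the text to emit for an evaluated result, or None for no output."""
    if result is True:
        result = line                              # True echoes the original line
    elif isinstance(result, (list, tuple)):
        result = odelim.join(map(str, result))     # join with the output delimiter
    if result is None or result is False:
        return None                                # nothing is output
    result = str(result)
    if not result.endswith(line_separator):
        result = result.rstrip("\n") + line_separator
    return result


print(repr(format_result(("README.md", 3491), "ignored\n")))  # 'README.md 3491\n'
print(repr(format_result(False, "ignored\n")))                # None
```

Note that the checks use identity, so a result of `0` is printed as `0` rather than being suppressed like `False`.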
122 | ### Start/end blocks
123 | 
124 | The rules are the same as for line actions, with one difference: because no line corresponds to these blocks, an expression returning `True` produces no output.
125 | 
126 | 	$ echo -ne 'foo\nbar' | pawk -E t
127 | 	foo
128 | 	bar
129 | 
130 | 
131 | ## Command-line usage
132 | 
133 | ```
134 | Usage: cat input | pawk [<options>] <expr>
135 | 
136 | A Python line-processor (like awk).
137 | 
138 | See https://github.com/alecthomas/pawk for details. Based on
139 | http://code.activestate.com/recipes/437932/.
140 | 
141 | Options:
142 |   -h, --help            show this help message and exit
143 |   -I <filename>, --in_place=<filename>
144 |                         modify given input file in-place
145 |   -i <modules>, --import=<modules>
146 |                         comma-separated list of modules to "from x import *"
147 |                         from
148 |   -F <delim>            input delimiter
149 |   -O <delim>            output delimiter
150 |   -L <delim>            output line separator
151 |   -B <statement>, --begin=<statement>
152 |                         begin statement
153 |   -E <statement>, --end=<statement>
154 |                         end statement
155 |   -s, --statement       DEPRECATED. retained for backward compatibility
156 |   -H, --header          use first row as field variable names in subsequent
157 |                         rows
158 |   --strict              abort on exceptions
159 | ```
160 | 
161 | ## Examples
162 | 
163 | ### Line processing
164 | 
165 | Print the name and size of every file from stdin:
166 | 
167 | 	find . -type f | pawk 'f[0], os.stat(f[0]).st_size'
168 | 
169 | > **Note:** this example also shows how pawk automatically imports referenced modules, in this case `os`.
170 | 
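Auto-import is simple: the expression text is scanned for dotted names, and whatever precedes the final attribute is tried as a module, with failures ignored. A sketch of that approach, using a pattern modelled on the one in `run()`; the function names are hypothetical:

```python
import re


def referenced_modules(expr):
    """Find dotted prefixes such as `os` in `os.stat` or `os.path` in `os.path.isfile`."""
    return re.findall(r'([\w.]+)(?=\.\w+)\b', expr)


def auto_import(expr):
    """Import every candidate module, keyed by its top-level name."""
    context = {}
    for name in referenced_modules(expr):
        try:
            # __import__("os.path") returns the top-level package `os`.
            context[name.split(".")[0]] = __import__(name)
        except Exception:
            pass  # not a module (for example a local variable): ignore
    return context


print(referenced_modules("f[0], os.stat(f[0]).st_size"))  # ['os']
```

The scan is purely textual, so attribute access on ordinary variables also produces candidates; those simply fail to import and are skipped.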
171 | Print the total size of all files from stdin:
172 | 
173 | 	find . -type f | \
174 | 		pawk \
175 | 			--begin 'c=0' \
176 | 			--end c \
177 | 			'c += os.stat(f[0]).st_size'
178 | 
179 | Short-flag version:
180 | 
181 | 	find . -type f | pawk -B c=0 -E c 'c += os.stat(f[0]).st_size'
182 | 
183 | 
184 | ### Whole-file processing
185 | 
186 | If you do not provide a line expression, but do provide an end statement, pawk will accumulate each line, and the entire file's text will be available in the end statement as `t`. This is useful for operations on entire files, like the following example of converting a file from markdown to HTML:
187 | 
188 | 	cat README.md | \
189 | 		pawk --end 'markdown.markdown(t)'
190 | 
191 | Short-flag version:
192 | 
193 | 	cat README.md | pawk -E 'markdown.markdown(t)'
194 | 
195 | 


--------------------------------------------------------------------------------
/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/alecthomas/pawk/d60f78399e8a01857ebd73415a00e7eb424043ab/__init__.py


--------------------------------------------------------------------------------
/bin/.pandoc-3.0.1.pkg:
--------------------------------------------------------------------------------
1 | hermit


--------------------------------------------------------------------------------
/bin/.python3-3.11.1.pkg:
--------------------------------------------------------------------------------
1 | hermit


--------------------------------------------------------------------------------
/bin/README.hermit.md:
--------------------------------------------------------------------------------
1 | # Hermit environment
2 | 
3 | This is a [Hermit](https://github.com/cashapp/hermit) bin directory.
4 | 
5 | The symlinks in this directory are managed by Hermit and will automatically
6 | download and install Hermit itself as well as packages. These packages are
7 | local to this environment.
8 | 


--------------------------------------------------------------------------------
/bin/activate-hermit:
--------------------------------------------------------------------------------
 1 | #!/bin/bash
 2 | # This file must be used with "source bin/activate-hermit" from bash or zsh.
 3 | # You cannot run it directly
 4 | #
 5 | # THIS FILE IS GENERATED; DO NOT MODIFY
 6 | 
 7 | if [ "${BASH_SOURCE-}" = "$0" ]; then
 8 |   echo "You must source this script: \$ source $0" >&2
 9 |   exit 33
10 | fi
11 | 
12 | BIN_DIR="$(dirname "${BASH_SOURCE[0]:-${(%):-%x}}")"
13 | if "${BIN_DIR}/hermit" noop > /dev/null; then
14 |   eval "$("${BIN_DIR}/hermit" activate "${BIN_DIR}/..")"
15 | 
16 |   if [ -n "${BASH-}" ] || [ -n "${ZSH_VERSION-}" ]; then
17 |       hash -r 2>/dev/null
18 |     fi
19 | 
20 |     echo "Hermit environment $("${HERMIT_ENV}"/bin/hermit env HERMIT_ENV) activated"
21 | fi
22 | 


--------------------------------------------------------------------------------
/bin/hermit:
--------------------------------------------------------------------------------
 1 | #!/bin/bash
 2 | #
 3 | # THIS FILE IS GENERATED; DO NOT MODIFY
 4 | 
 5 | set -eo pipefail
 6 | 
 7 | export HERMIT_USER_HOME=~
 8 | 
 9 | if [ -z "${HERMIT_STATE_DIR}" ]; then
10 |   case "$(uname -s)" in
11 |   Darwin)
12 |     export HERMIT_STATE_DIR="${HERMIT_USER_HOME}/Library/Caches/hermit"
13 |     ;;
14 |   Linux)
15 |     export HERMIT_STATE_DIR="${XDG_CACHE_HOME:-${HERMIT_USER_HOME}/.cache}/hermit"
16 |     ;;
17 |   esac
18 | fi
19 | 
20 | export HERMIT_DIST_URL="${HERMIT_DIST_URL:-https://github.com/cashapp/hermit/releases/download/stable}"
21 | HERMIT_CHANNEL="$(basename "${HERMIT_DIST_URL}")"
22 | export HERMIT_CHANNEL
23 | export HERMIT_EXE=${HERMIT_EXE:-${HERMIT_STATE_DIR}/pkg/hermit@${HERMIT_CHANNEL}/hermit}
24 | 
25 | if [ ! -x "${HERMIT_EXE}" ]; then
26 |   echo "Bootstrapping ${HERMIT_EXE} from ${HERMIT_DIST_URL}" 1>&2
27 |   INSTALL_SCRIPT="$(mktemp)"
28 |   # This value must match that of the install script
29 |   INSTALL_SCRIPT_SHA256="180e997dd837f839a3072a5e2f558619b6d12555cd5452d3ab19d87720704e38"
30 |   if [ "${INSTALL_SCRIPT_SHA256}" = "BYPASS" ]; then
31 |     curl -fsSL "${HERMIT_DIST_URL}/install.sh" -o "${INSTALL_SCRIPT}"
32 |   else
33 |     # Install script is versioned by its sha256sum value
34 |     curl -fsSL "${HERMIT_DIST_URL}/install-${INSTALL_SCRIPT_SHA256}.sh" -o "${INSTALL_SCRIPT}"
35 |     # Verify install script's sha256sum
36 |     openssl dgst -sha256 "${INSTALL_SCRIPT}" | \
37 |       awk -v EXPECTED="$INSTALL_SCRIPT_SHA256" \
38 |       '$2!=EXPECTED {print "Install script sha256 " $2 " does not match " EXPECTED; exit 1}'
39 |   fi
40 |   /bin/bash "${INSTALL_SCRIPT}" 1>&2
41 | fi
42 | 
43 | exec "${HERMIT_EXE}" --level=fatal exec "$0" -- "$@"
44 | 


--------------------------------------------------------------------------------
/bin/hermit.hcl:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/alecthomas/pawk/d60f78399e8a01857ebd73415a00e7eb424043ab/bin/hermit.hcl


--------------------------------------------------------------------------------
/bin/pandoc:
--------------------------------------------------------------------------------
1 | .pandoc-3.0.1.pkg


--------------------------------------------------------------------------------
/bin/pip:
--------------------------------------------------------------------------------
1 | .python3-3.11.1.pkg


--------------------------------------------------------------------------------
/bin/pip3:
--------------------------------------------------------------------------------
1 | .python3-3.11.1.pkg


--------------------------------------------------------------------------------
/bin/pip3.11:
--------------------------------------------------------------------------------
1 | .python3-3.11.1.pkg


--------------------------------------------------------------------------------
/bin/pydoc3:
--------------------------------------------------------------------------------
1 | .python3-3.11.1.pkg


--------------------------------------------------------------------------------
/bin/pydoc3.11:
--------------------------------------------------------------------------------
1 | .python3-3.11.1.pkg


--------------------------------------------------------------------------------
/bin/python:
--------------------------------------------------------------------------------
1 | .python3-3.11.1.pkg


--------------------------------------------------------------------------------
/bin/python3:
--------------------------------------------------------------------------------
1 | .python3-3.11.1.pkg


--------------------------------------------------------------------------------
/bin/python3-config:
--------------------------------------------------------------------------------
1 | .python3-3.11.1.pkg


--------------------------------------------------------------------------------
/bin/python3.11:
--------------------------------------------------------------------------------
1 | .python3-3.11.1.pkg


--------------------------------------------------------------------------------
/bin/python3.11-config:
--------------------------------------------------------------------------------
1 | .python3-3.11.1.pkg


--------------------------------------------------------------------------------
/pawk:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env python
2 | 
3 | from pawk import main
4 | 
5 | main()
6 | 


--------------------------------------------------------------------------------
/pawk.py:
--------------------------------------------------------------------------------
  1 | #!/usr/bin/env python
  2 | # -*- coding: utf-8 -*-
  3 | 
  4 | """cat input | pawk [<options>] <expr>
  5 | 
  6 | A Python line-processor (like awk).
  7 | 
  8 | See https://github.com/alecthomas/pawk for details. Based on
  9 | http://code.activestate.com/recipes/437932/.
 10 | """
 11 | 
 12 | import ast
 13 | import codecs
 14 | import inspect
 15 | import optparse
 16 | import os
 17 | import re
 18 | import sys
 19 | 
 20 | 
 21 | __version__ = '0.8.0'
 22 | 
 23 | 
 24 | RESULT_VAR_NAME = "__result"
 25 | 
 26 | 
 27 | if sys.version_info[0] > 2:
 28 |     from itertools import zip_longest
 29 |     
 30 |     try:
 31 |         exec_ = __builtins__['exec']
 32 |     except TypeError:
 33 |         exec_ = getattr(__builtins__, 'exec')
 34 |     STRING_ESCAPE = 'unicode_escape'
 35 | else:
 36 |     from itertools import izip_longest as zip_longest
 37 |     
 38 |     def exec_(_code_, _globs_=None, _locs_=None):
 39 |         if _globs_ is None:
 40 |             frame = sys._getframe(1)
 41 |             _globs_ = frame.f_globals
 42 |             if _locs_ is None:
 43 |                 _locs_ = frame.f_locals
 44 |             del frame
 45 |         elif _locs_ is None:
 46 |             _locs_ = _globs_
 47 |         exec("""exec _code_ in _globs_, _locs_""")
 48 |     STRING_ESCAPE = 'string_escape'
 49 | 
 50 | 
 51 | # Store the last expression, if present, into variable var_name.
 52 | def save_last_expression(tree, var_name=RESULT_VAR_NAME):
 53 |     body = tree.body
 54 |     node = body[-1] if len(body) else None
 55 |     body.insert(0, ast.Assign(targets=[ast.Name(id=var_name, ctx=ast.Store())],
 56 |                               value=ast.Constant(None)))
 57 |     if node and isinstance(node, ast.Expr):
 58 |         body[-1] = ast.copy_location(ast.Assign(
 59 |             targets=[ast.Name(id=var_name, ctx=ast.Store())], value=node.value), node)
 60 |     return ast.fix_missing_locations(tree)
 61 | 
 62 | 
 63 | def compile_command(text):
 64 |     tree = save_last_expression(compile(text, 'EXPR', 'exec', flags=ast.PyCF_ONLY_AST))
 65 |     return compile(tree, 'EXPR', 'exec')
 66 | 
 67 | 
 68 | def eval_in_context(codeobj, context, var_name=RESULT_VAR_NAME):
 69 |     exec_(codeobj, globals(), context)
 70 |     return context.pop(var_name, None)
 71 | 
 72 | 
 73 | class Action(object):
 74 |     """Represents a single action to be applied to each line."""
 75 | 
 76 |     def __init__(self, pattern=None, cmd='l', have_end_statement=False, negate=False, strict=False):
 77 |         self.delim = None
 78 |         self.odelim = ' '
 79 |         self.negate = negate
 80 |         self.pattern = None if pattern is None else re.compile(pattern)
 81 |         self.cmd = cmd
 82 |         self.strict = strict
 83 |         self._compile(have_end_statement)
 84 | 
 85 |     @classmethod
 86 |     def from_options(cls, options, arg):
 87 |         negate, pattern, cmd = Action._parse_command(arg)
 88 |         return cls(pattern=pattern, cmd=cmd, have_end_statement=(options.end is not None), negate=negate, strict=options.strict)
 89 | 
 90 |     def _compile(self, have_end_statement):
 91 |         if not self.cmd:
 92 |             if have_end_statement:
 93 |                 self.cmd = 't += line'
 94 |             else:
 95 |                 self.cmd = 'l'
 96 |         self._codeobj = compile_command(self.cmd)
 97 | 
 98 |     def apply(self, context, line):
 99 |         """Apply action to line.
100 | 
101 |         :return: Line text or None.
102 |         """
103 |         match = self._match(line)
104 |         if match is None:
105 |             return None
106 |         context['m'] = match
107 |         try:
108 |             return eval_in_context(self._codeobj, context)
109 |         except Exception:
110 |             if not self.strict:
111 |                 return None
112 |             raise
113 | 
114 |     def _match(self, line):
115 |         if self.pattern is None:
116 |             return self.negate
117 |         match = self.pattern.search(line)
118 |         if match is not None:
119 |             return None if self.negate else match.groups()
120 |         elif self.negate:
121 |             return ()
122 | 
123 |     @staticmethod
124 |     def _parse_command(arg):
125 |         match = re.match(r'(?ms)(?:(!)?/((?:\\.|[^/])+)/)?(.*)', arg)
126 |         negate, pattern, cmd = match.groups()
127 |         cmd = cmd.strip()
128 |         negate = bool(negate)
129 |         return negate, pattern, cmd
130 | 
131 | 
132 | class Context(dict):
133 |     def apply(self, numz, line, headers=None):
134 |         l = line.rstrip()
135 |         f = l.split(self.delim)
136 |         self.update(line=line, l=l, n=numz + 1, f=f, nf=len(f))
137 |         if headers:
138 |             self.update(zip_longest(headers, f))
139 | 
140 |     @classmethod
141 |     def from_options(cls, options, modules):
142 |         self = cls()
143 |         self['t'] = ''
144 |         self['m'] = ()
145 |         if options.imports:
146 |             for imp in options.imports.split(','):
147 |                 m = __import__(imp.strip(), fromlist=['.'])
148 |                 self.update((k, v) for k, v in inspect.getmembers(m) if k[0] != '_')
149 | 
150 |         self.delim = codecs.decode(options.delim, STRING_ESCAPE) if options.delim else None
151 |         self.odelim = codecs.decode(options.delim_out, STRING_ESCAPE)
152 |         self.line_separator = codecs.decode(options.line_separator, STRING_ESCAPE)
153 | 
154 |         for m in modules:
155 |             try:
156 |                 key = m.split('.')[0]
157 |                 self[key] = __import__(m)
158 |             except Exception:
159 |                 pass
160 |         return self
161 | 
162 | 
163 | def process(context, input, output, begin_statement, actions, end_statement, strict, header):
164 |     """Process a stream."""
165 |     # Override "print"
166 |     old_stdout = sys.stdout
167 |     sys.stdout = output
168 |     write = output.write
169 | 
170 |     def write_result(result, when_true=None):
171 |         if result is True:
172 |             result = when_true
173 |         elif isinstance(result, (list, tuple)):
174 |             result = context.odelim.join(map(str, result))
175 |         if result is not None and result is not False:
176 |             result = str(result)
177 |             if not result.endswith(context.line_separator):
178 |                 result = result.rstrip('\n') + context.line_separator
179 |             write(result)
180 | 
181 |     try:
182 |         headers = None
183 |         if header:
184 |             line = input.readline()
185 |             context.apply(-1, line)
186 |             headers = context['f']
187 | 
188 |         if begin_statement:
189 |             write_result(eval_in_context(compile_command(begin_statement), context))
190 | 
191 |         for numz, line in enumerate(input):
192 |             context.apply(numz, line, headers=headers)
193 |             for action in actions:
194 |                 write_result(action.apply(context, line), when_true=line)
195 | 
196 |         if end_statement:
197 |             write_result(eval_in_context(compile_command(end_statement), context))
198 |     finally:
199 |         sys.stdout = old_stdout
200 | 
201 | 
202 | def parse_commandline(argv):
203 |     parser = optparse.OptionParser(version=__version__)
204 |     parser.set_usage(__doc__.strip())
205 |     parser.add_option('-I', '--in_place', dest='in_place', help='modify given input file in-place', metavar='<filename>')
206 |     parser.add_option('-i', '--import', dest='imports', help='comma-separated list of modules to "from x import *" from', metavar='<modules>')
207 |     parser.add_option('-F', dest='delim', help='input delimiter', metavar='<delim>', default=None)
208 |     parser.add_option('-O', dest='delim_out', help='output delimiter', metavar='<delim>', default=' ')
209 |     parser.add_option('-L', dest='line_separator', help='output line separator', metavar='<delim>', default='\n')
210 |     parser.add_option('-B', '--begin', help='begin statement', metavar='<statement>')
211 |     parser.add_option('-E', '--end', help='end statement', metavar='<statement>')
212 |     parser.add_option('-s', '--statement', action='store_true', help='DEPRECATED. retained for backward compatibility')
213 |     parser.add_option('-H', '--header', action='store_true', help='use first row as field variable names in subsequent rows')
214 |     parser.add_option('--strict', action='store_true', help='abort on exceptions')
215 |     return parser.parse_args(argv[1:])
216 | 
217 | 
218 | # For integration tests.
219 | def run(argv, input, output):
220 |     options, args = parse_commandline(argv)
221 | 
222 |     try:
223 |         if options.in_place:
224 |             os.rename(options.in_place, options.in_place + '~')
225 |             input = open(options.in_place + '~')
226 |             output = open(options.in_place, 'w')
227 | 
228 |         # Auto-import. This is not smart.
229 |         all_text = ' '.join([(options.begin or ''), ' '.join(args), (options.end or '')])
230 |         modules = re.findall(r'([\w.]+)(?=\.\w+)\b', all_text)
231 | 
232 |         context = Context.from_options(options, modules)
233 |         actions = [Action.from_options(options, arg) for arg in args]
234 |         if not actions:
235 |             actions = [Action.from_options(options, '')]
236 | 
237 |         process(context, input, output, options.begin, actions, options.end, options.strict, options.header)
238 |     finally:
239 |         if options.in_place:
240 |             output.close()
241 |             input.close()
242 | 
243 | 
244 | def main():
245 |     try:
246 |         run(sys.argv, sys.stdin, sys.stdout)
247 |     except EnvironmentError as e:
248 |         # Workaround for close failed in file object destructor: sys.excepthook is missing lost sys.stderr
249 |         # http://stackoverflow.com/questions/7955138/addressing-sys-excepthook-error-in-bash-script
250 |         sys.stderr.write(str(e) + '\n')
251 |         sys.exit(1)
252 |     except KeyboardInterrupt:
253 |         sys.exit(1)
254 | 
255 | 
256 | if __name__ == '__main__':
257 |     main()
258 | 
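One technique in `pawk.py` worth isolating is how the value of the last expression in a statement block is captured: the parsed AST is rewritten so that a trailing expression becomes an assignment to a result variable. A self-contained sketch of the same idea, using `ast.parse` in place of `compile(..., flags=ast.PyCF_ONLY_AST)`; `run_and_capture` is a hypothetical name:

```python
import ast


def run_and_capture(source, context, var_name="__result"):
    """Execute `source` in `context`; return the value of its last expression."""
    tree = ast.parse(source, filename="EXPR", mode="exec")
    body = tree.body
    last = body[-1] if body else None

    def result_target():
        return ast.Name(id=var_name, ctx=ast.Store())

    # Always define the result variable so there is something to pop afterwards.
    body.insert(0, ast.Assign(targets=[result_target()],
                              value=ast.Constant(value=None)))
    # If the block ends with a bare expression, store its value instead.
    if isinstance(last, ast.Expr):
        body[-1] = ast.copy_location(
            ast.Assign(targets=[result_target()], value=last.value), last)
    ast.fix_missing_locations(tree)
    exec(compile(tree, "EXPR", "exec"), {}, context)
    return context.pop(var_name, None)


ctx = {"c": 1}
print(run_and_capture("c += 1", ctx))   # None (a statement has no value)
print(run_and_capture("c * 10", ctx))   # 20
```

This is what lets a single argument be either a statement (`c += int(f[4])`) or an expression (`f[4:]`) without a separate flag.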


--------------------------------------------------------------------------------
/pawk_test.py:
--------------------------------------------------------------------------------
  1 | # -*- coding: utf-8 -*-
  2 | import sys
  3 | import timeit
  4 | from pawk import Action, Context, run, parse_commandline
  5 | 
  6 | if sys.version_info[0] > 2:
  7 |     from io import StringIO
  8 | else:
  9 |     from StringIO import StringIO
 10 | 
 11 | 
 12 | TEST_INPUT_LS = r'''
 13 | total 72
 14 | -rw-r-----  1 alec  staff    18 Feb  9 11:52 MANIFEST.in
 15 | -rw-r-----@ 1 alec  staff  3491 Feb 10 11:08 README.md
 16 | drwxr-x---  4 alec  staff   136 Feb  9 23:35 dist/
 17 | -rwxr-x---  1 alec  staff    53 Feb 10 04:47 pawk*
 18 | drwxr-x---  6 alec  staff   204 Feb  9 21:09 pawk.egg-info/
 19 | -rw-r-----  1 alec  staff  5045 Feb 10 11:37 pawk.py
 20 | -rw-r--r--  1 alec  staff   521 Feb 10 04:56 pawk_test.py
 21 | -rw-r-----  1 alec  staff   468 Feb 10 04:42 setup.py
 22 | '''
 23 | 
 24 | TEST_INPUT_CSV_WITH_EMPTY_FIELD = r'''
 25 | model,color,price
 26 | A01,,100
 27 | B03,blue,200
 28 | '''
 29 | 
 30 | def run_integration_test(input, args):
 31 |     input = StringIO(input.strip())
 32 |     output = StringIO()
 33 |     run(['pawk'] + args, input, output)
 34 |     return output.getvalue().strip()
 35 | 
 36 | 
 37 | def test_action_parse():
 38 |     negate, pattern, cmd = Action._parse_command(r'/(\w+)/ l')
 39 |     assert pattern == r'(\w+)'
 40 |     assert cmd == 'l'
 41 |     assert negate is False
 42 | 
 43 | 
 44 | def test_action_match():
 45 |     action = Action(r'(\w+) \w+')
 46 |     groups = action._match('test case')
 47 |     assert groups == ('test',)
 48 | 
 49 | 
 50 | def test_action_match_negate():
 51 |     action = Action(r'(\w+) \w+', negate=True)
 52 |     groups = action._match('test case')
 53 |     assert groups is None
 54 |     groups = action._match('test')
 55 |     assert groups == ()
 56 | 
 57 | 
 58 | def test_integration_sum():
 59 |     out = run_integration_test(TEST_INPUT_LS, ['-Bc = 0', '-Ec', 'c += int(f[4])'])
 60 |     assert out == '9936'
 61 | 
 62 | 
 63 | def test_integration_match():
 64 |     out = run_integration_test(TEST_INPUT_LS, ['/pawk_test/ f[4]'])
 65 |     assert out == '521'
 66 | 
 67 | 
 68 | def test_integration_negate_match():
 69 |     out = run_integration_test(TEST_INPUT_LS, ['!/^total|pawk/ f[-1]'])
 70 |     assert out.splitlines() == ['MANIFEST.in', 'README.md', 'dist/', 'setup.py']
 71 | 
 72 | 
 73 | def test_integration_truth():
 74 |     out = run_integration_test(TEST_INPUT_LS, ['int(f[4]) > 1024'])
 75 |     assert [r.split()[-1] for r in out.splitlines()] == ['README.md', 'pawk.py']
 76 | 
 77 | 
 78 | def test_integration_multiple_actions():
 79 |     out = run_integration_test(TEST_INPUT_LS, ['/setup/', '/README/'])
 80 |     assert [r.split()[-1] for r in out.splitlines()] == ['README.md', 'setup.py']
 81 | 
 82 | 
 83 | def test_integration_csv_empty_fields():
 84 |     out = run_integration_test(TEST_INPUT_CSV_WITH_EMPTY_FIELD, ['-F,', 'f[2]'])
 85 |     assert out.splitlines() == ['price', '100', '200']
 86 |     out = run_integration_test(TEST_INPUT_CSV_WITH_EMPTY_FIELD, ['-F,', 'f[1]'])
 87 |     assert out.splitlines() == ['color', '', 'blue']
 88 | 
 89 | 
 90 | def benchmark_fields():
 91 |     options, _ = parse_commandline([''])
 92 |     action = Action(cmd='f')
 93 |     context = Context.from_options(options, [])
 94 |     t = timeit.Timer(lambda: action.apply(context, 'foo bar waz was haz has hair'))
 95 |     print(t.repeat(repeat=3, number=100000))
 96 | 
 97 | 
 98 | if __name__ == '__main__':
 99 |     benchmark_fields()
100 | 


--------------------------------------------------------------------------------
/pyproject.toml:
--------------------------------------------------------------------------------
 1 | [tool.poetry]
 2 | name = "pawk"
 3 | version = "0.8.0"
 4 | authors = ["Alec Thomas <alec@swapoff.org>"]
 5 | description = "PAWK - A Python line processor (like AWK)"
 6 | license = "MIT"
 7 | readme = "README.md"
 8 | homepage = "https://github.com/alecthomas/pawk"
 9 | repository = "http://github.com/alecthomas/pawk"
10 | classifiers = [
11 |   "Programming Language :: Python :: 3",
12 |   "License :: OSI Approved :: MIT License",
13 |   "Operating System :: OS Independent",
14 | ]
15 | 
16 | [tool.poetry.scripts]
17 | pawk = "pawk:main"
18 | 
19 | [tool.poetry.urls]
20 | "Bug Tracker" = "https://github.com/alecthomas/pawk/issues"
21 | 
22 | [tool.poetry.dependencies]
23 | python = ">=3.7"
24 | 
25 | [tool.poetry.group.dev.dependencies]
26 | knotr = "^0.4.1"
27 | 
28 | [build-system]
29 | requires = ["poetry-core"]
30 | build-backend = "poetry.core.masonry.api"
31 | 


--------------------------------------------------------------------------------
/setup.py:
--------------------------------------------------------------------------------
 1 | import atexit
 2 | import os
 3 | import sys
 4 | import pawk
 5 | try:
 6 |     from setuptools import setup
 7 | except ImportError:
 8 |     from distutils.core import setup
 9 | 
10 | if sys.version_info[0] != 3:
11 |     print("PAWK requires Python 3")
12 |     sys.exit(1)
13 | 
14 | 
15 | try:
16 |     import pypandoc
17 |     long_description = pypandoc.convert_file('README.md', 'rst')
18 |     with open('README.rst', 'w') as f:
19 |         f.write(long_description)
20 |     atexit.register(lambda: os.unlink('README.rst'))
21 | except (ImportError, OSError):
22 |     print('WARNING: Could not locate pandoc, using Markdown long_description.')
23 |     with open('README.md') as f:
24 |         long_description = f.read()
25 | 
26 | description = long_description.splitlines()[0].strip()
27 | 
28 | setup(
29 |     name='pawk',
30 |     url='http://github.com/alecthomas/pawk',
31 |     download_url='http://github.com/alecthomas/pawk',
32 |     version=pawk.__version__,
33 |     description=description,
34 |     long_description=long_description,
35 |     license='MIT',
36 |     platforms=['any'],
37 |     author='Alec Thomas',
38 |     author_email='alec@swapoff.org',
39 |     py_modules=['pawk'],
40 |     scripts=['pawk'],
41 |     )
42 | 


--------------------------------------------------------------------------------