├── .gitignore ├── COPYING ├── MANIFEST.in ├── README.md ├── __init__.py ├── bin ├── .pandoc-3.0.1.pkg ├── .python3-3.11.1.pkg ├── README.hermit.md ├── activate-hermit ├── hermit ├── hermit.hcl ├── pandoc ├── pip ├── pip3 ├── pip3.11 ├── pydoc3 ├── pydoc3.11 ├── python ├── python3 ├── python3-config ├── python3.11 └── python3.11-config ├── pawk ├── pawk.py ├── pawk_test.py ├── pyproject.toml └── setup.py /.gitignore: -------------------------------------------------------------------------------- 1 | __pycache__ 2 | *.pyc 3 | .venv 4 | -------------------------------------------------------------------------------- /COPYING: -------------------------------------------------------------------------------- 1 | Copyright (C) 2018 Alec Thomas 2 | 3 | Permission is hereby granted, free of charge, to any person obtaining a copy of 4 | this software and associated documentation files (the "Software"), to deal in 5 | the Software without restriction, including without limitation the rights to 6 | use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies 7 | of the Software, and to permit persons to whom the Software is furnished to do 8 | so, subject to the following conditions: 9 | 10 | The above copyright notice and this permission notice shall be included in all 11 | copies or substantial portions of the Software. 12 | 13 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 14 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 15 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 16 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 17 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 18 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 19 | SOFTWARE. 
20 | -------------------------------------------------------------------------------- /MANIFEST.in: -------------------------------------------------------------------------------- 1 | include README.md 2 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # PAWK - A Python line processor (like AWK) 2 | 3 | PAWK aims to bring the full power of Python to AWK-like line-processing. 4 | 5 | Here are some quick examples to show some of the advantages of pawk over AWK. 6 | 7 | The first example transforms `/etc/hosts` into a JSON map of host to IP: 8 | 9 | cat /etc/hosts | pawk -B 'd={}' -E 'json.dumps(d)' '!/^#/ d[f[1]] = f[0]' 10 | 11 | Breaking this down: 12 | 13 | 1. `-B 'd={}'` is a begin statement initializing a dictionary, executed once before processing begins. 14 | 2. `-E 'json.dumps(d)'` is an end statement expression, producing the JSON representation of the dictionary `d`. 15 | 3. `!/^#/` tells pawk to match any line *not* beginning with `#`. 16 | 4. `d[f[1]] = f[0]` adds a dictionary entry where the key is the second field in the line (the first hostname) and the value is the first field (the IP address). 17 | 18 | And another example showing how to bzip2-compress + base64-encode a file: 19 | 20 | cat pawk.py | pawk -E 'base64.encodebytes(bz2.compress(t.encode())).decode()' 21 | 22 | ### AWK example translations 23 | 24 | Most basic AWK constructs are available. 
You can find more idiomatic examples below in the example section, but here are a bunch of awk commands and their equivalent pawk commands to get started with: 25 | 26 | Print lines matching a pattern: 27 | 28 | ls -l / | awk '/etc/' 29 | ls -l / | pawk '/etc/' 30 | 31 | Print lines *not* matching a pattern: 32 | 33 | ls -l / | awk '!/etc/' 34 | ls -l / | pawk '!/etc/' 35 | 36 | Field slicing and dicing (here pawk wins because of Python's array slicing): 37 | 38 | ls -l / | awk '/etc/ {print $5, $6, $7, $8, $9}' 39 | ls -l / | pawk '/etc/ f[4:]' 40 | 41 | Begin and end actions (in this case, summing the sizes of all files): 42 | 43 | ls -l | awk 'BEGIN {c = 0} {c += $5} END {print c}' 44 | ls -l | pawk -B 'c = 0' -E 'c' 'c += int(f[4])' 45 | 46 | Print files where a field matches a numeric expression (in this case where files are > 1024 bytes): 47 | 48 | ls -l | awk '$5 > 1024' 49 | ls -l | pawk 'int(f[4]) > 1024' 50 | 51 | Matching a single field (any filename with "t" in it): 52 | 53 | ls -l | awk '$NF ~/t/' 54 | ls -l | pawk '"t" in f[-1]' 55 | 56 | ## Installation 57 | 58 | It should be as simple as: 59 | 60 | ``` 61 | pip install pawk 62 | ``` 63 | 64 | But if that doesn't work, just download `pawk.py`, make it executable, and place it somewhere in your path. 65 | 66 | ## Expression evaluation 67 | 68 | PAWK evaluates a Python expression or statement against each line in stdin. The following variables are available in local context: 69 | 70 | - `line` - Current line text, including newline. 71 | - `l` - Current line text, excluding the newline and any other trailing whitespace. 72 | - `n` - The current 1-based line number. 73 | - `f` - Fields of the line (split by the field separator `-F`). 74 | - `nf` - Number of fields in this line. 75 | - `m` - Tuple of match regular expression capture groups, if any. 76 | 77 | 78 | In the context of the `-E` block: 79 | 80 | - `t` - The entire input text up to the current cursor position. 
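As a rough illustration of how the per-line variables listed above relate to each other, here is a minimal sketch modelled on `Context.apply` in `pawk.py`. The function name `line_variables` and its parameters are hypothetical and exist only for this example; pawk itself stores these values in its evaluation context rather than returning a dictionary.

```python
def line_variables(index, line, delim=None):
    """Build the per-line variables for a zero-based line index.

    Hypothetical helper for illustration; it mirrors the logic of
    Context.apply in pawk.py but is not part of pawk's API.
    """
    l = line.rstrip()    # drops the newline and other trailing whitespace
    f = l.split(delim)   # delim=None splits on runs of whitespace
    return {"line": line, "l": l, "n": index + 1, "f": f, "nf": len(f)}


v = line_variables(0, "12 bob\n")
print(v["l"], v["n"], v["f"], v["nf"])
```

Note that with an explicit delimiter such as `,`, Python's `str.split` keeps empty fields, so a CSV row like `A01,,100` yields three fields with an empty string in the middle.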
81 | 82 | If the flag `-H, --header` is provided, each field in the first row of the input will be treated as field variable names in subsequent rows. The header is not output. For example, given the input: 83 | 84 | ``` 85 | count name 86 | 12 bob 87 | 34 fred 88 | ``` 89 | 90 | We could do: 91 | 92 | ``` 93 | $ pawk -H '"%s is %s" % (name, count)' < input.txt 94 | bob is 12 95 | fred is 34 96 | ``` 97 | 98 | To output a header as well, use `-B`: 99 | 100 | ``` 101 | $ pawk -H -B '"name is count"' '"%s is %s" % (name, count)' < input.txt 102 | name is count 103 | bob is 12 104 | fred is 34 105 | ``` 106 | 107 | Module references will be automatically imported if possible. Additionally, the `--import [,,...]` flag can be used to import symbols from a set of modules into the evaluation context. 108 | 109 | eg. `--import os.path` will import all symbols from `os.path`, such as `os.path.isfile()`, into the context. 110 | 111 | ## Output 112 | 113 | ### Line actions 114 | 115 | The type of the evaluated expression determines how output is displayed: 116 | 117 | - `tuple` or `list`: the elements are converted to strings and joined with the output delimiter (`-O`). 118 | - `None` or `False`: nothing is output for that line. 119 | - `True`: the original line is output. 120 | - Any other value is converted to a string. 121 | 122 | ### Start/end blocks 123 | 124 | The rules are the same as for line actions with one difference. Because there is no "line" that corresponds to them, an expression returning True is ignored. 125 | 126 | $ echo -ne 'foo\nbar' | pawk -E t 127 | foo 128 | bar 129 | 130 | 131 | ## Command-line usage 132 | 133 | ``` 134 | Usage: cat input | pawk [] 135 | 136 | A Python line-processor (like awk). 137 | 138 | See https://github.com/alecthomas/pawk for details. Based on 139 | http://code.activestate.com/recipes/437932/. 
140 | 141 | Options: 142 | -h, --help show this help message and exit 143 | -I , --in_place= 144 | modify given input file in-place 145 | -i , --import= 146 | comma-separated list of modules to "from x import *" 147 | from 148 | -F input delimiter 149 | -O output delimiter 150 | -L output line separator 151 | -B , --begin= 152 | begin statement 153 | -E , --end= 154 | end statement 155 | -s, --statement DEPRECATED. retained for backward compatibility 156 | -H, --header use first row as field variable names in subsequent 157 | rows 158 | --strict abort on exceptions 159 | ``` 160 | 161 | ## Examples 162 | 163 | ### Line processing 164 | 165 | Print the name and size of every file from stdin: 166 | 167 | find . -type f | pawk 'f[0], os.stat(f[0]).st_size' 168 | 169 | > **Note:** this example also shows how pawk automatically imports referenced modules, in this case `os`. 170 | 171 | Print the sum size of all files from stdin: 172 | 173 | find . -type f | \ 174 | pawk \ 175 | --begin 'c=0' \ 176 | --end c \ 177 | 'c += os.stat(f[0]).st_size' 178 | 179 | Short-flag version: 180 | 181 | find . -type f | pawk -B c=0 -E c 'c += os.stat(f[0]).st_size' 182 | 183 | 184 | ### Whole-file processing 185 | 186 | If you do not provide a line expression, but do provide an end statement, pawk will accumulate each line, and the entire file's text will be available in the end statement as `t`. 
This is useful for operations on entire files, like the following example of converting a file from markdown to HTML: 187 | 188 | cat README.md | \ 189 | pawk --end 'markdown.markdown(t)' 190 | 191 | Short-flag version: 192 | 193 | cat README.md | pawk -E 'markdown.markdown(t)' 194 | 195 | -------------------------------------------------------------------------------- /__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/alecthomas/pawk/d60f78399e8a01857ebd73415a00e7eb424043ab/__init__.py -------------------------------------------------------------------------------- /bin/.pandoc-3.0.1.pkg: -------------------------------------------------------------------------------- 1 | hermit -------------------------------------------------------------------------------- /bin/.python3-3.11.1.pkg: -------------------------------------------------------------------------------- 1 | hermit -------------------------------------------------------------------------------- /bin/README.hermit.md: -------------------------------------------------------------------------------- 1 | # Hermit environment 2 | 3 | This is a [Hermit](https://github.com/cashapp/hermit) bin directory. 4 | 5 | The symlinks in this directory are managed by Hermit and will automatically 6 | download and install Hermit itself as well as packages. These packages are 7 | local to this environment. 8 | -------------------------------------------------------------------------------- /bin/activate-hermit: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | # This file must be used with "source bin/activate-hermit" from bash or zsh. 
3 | # You cannot run it directly 4 | # 5 | # THIS FILE IS GENERATED; DO NOT MODIFY 6 | 7 | if [ "${BASH_SOURCE-}" = "$0" ]; then 8 | echo "You must source this script: \$ source $0" >&2 9 | exit 33 10 | fi 11 | 12 | BIN_DIR="$(dirname "${BASH_SOURCE[0]:-${(%):-%x}}")" 13 | if "${BIN_DIR}/hermit" noop > /dev/null; then 14 | eval "$("${BIN_DIR}/hermit" activate "${BIN_DIR}/..")" 15 | 16 | if [ -n "${BASH-}" ] || [ -n "${ZSH_VERSION-}" ]; then 17 | hash -r 2>/dev/null 18 | fi 19 | 20 | echo "Hermit environment $("${HERMIT_ENV}"/bin/hermit env HERMIT_ENV) activated" 21 | fi 22 | -------------------------------------------------------------------------------- /bin/hermit: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | # 3 | # THIS FILE IS GENERATED; DO NOT MODIFY 4 | 5 | set -eo pipefail 6 | 7 | export HERMIT_USER_HOME=~ 8 | 9 | if [ -z "${HERMIT_STATE_DIR}" ]; then 10 | case "$(uname -s)" in 11 | Darwin) 12 | export HERMIT_STATE_DIR="${HERMIT_USER_HOME}/Library/Caches/hermit" 13 | ;; 14 | Linux) 15 | export HERMIT_STATE_DIR="${XDG_CACHE_HOME:-${HERMIT_USER_HOME}/.cache}/hermit" 16 | ;; 17 | esac 18 | fi 19 | 20 | export HERMIT_DIST_URL="${HERMIT_DIST_URL:-https://github.com/cashapp/hermit/releases/download/stable}" 21 | HERMIT_CHANNEL="$(basename "${HERMIT_DIST_URL}")" 22 | export HERMIT_CHANNEL 23 | export HERMIT_EXE=${HERMIT_EXE:-${HERMIT_STATE_DIR}/pkg/hermit@${HERMIT_CHANNEL}/hermit} 24 | 25 | if [ ! 
-x "${HERMIT_EXE}" ]; then 26 | echo "Bootstrapping ${HERMIT_EXE} from ${HERMIT_DIST_URL}" 1>&2 27 | INSTALL_SCRIPT="$(mktemp)" 28 | # This value must match that of the install script 29 | INSTALL_SCRIPT_SHA256="180e997dd837f839a3072a5e2f558619b6d12555cd5452d3ab19d87720704e38" 30 | if [ "${INSTALL_SCRIPT_SHA256}" = "BYPASS" ]; then 31 | curl -fsSL "${HERMIT_DIST_URL}/install.sh" -o "${INSTALL_SCRIPT}" 32 | else 33 | # Install script is versioned by its sha256sum value 34 | curl -fsSL "${HERMIT_DIST_URL}/install-${INSTALL_SCRIPT_SHA256}.sh" -o "${INSTALL_SCRIPT}" 35 | # Verify install script's sha256sum 36 | openssl dgst -sha256 "${INSTALL_SCRIPT}" | \ 37 | awk -v EXPECTED="$INSTALL_SCRIPT_SHA256" \ 38 | '$2!=EXPECTED {print "Install script sha256 " $2 " does not match " EXPECTED; exit 1}' 39 | fi 40 | /bin/bash "${INSTALL_SCRIPT}" 1>&2 41 | fi 42 | 43 | exec "${HERMIT_EXE}" --level=fatal exec "$0" -- "$@" 44 | -------------------------------------------------------------------------------- /bin/hermit.hcl: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/alecthomas/pawk/d60f78399e8a01857ebd73415a00e7eb424043ab/bin/hermit.hcl -------------------------------------------------------------------------------- /bin/pandoc: -------------------------------------------------------------------------------- 1 | .pandoc-3.0.1.pkg -------------------------------------------------------------------------------- /bin/pip: -------------------------------------------------------------------------------- 1 | .python3-3.11.1.pkg -------------------------------------------------------------------------------- /bin/pip3: -------------------------------------------------------------------------------- 1 | .python3-3.11.1.pkg -------------------------------------------------------------------------------- /bin/pip3.11: -------------------------------------------------------------------------------- 1 | .python3-3.11.1.pkg 
-------------------------------------------------------------------------------- /bin/pydoc3: -------------------------------------------------------------------------------- 1 | .python3-3.11.1.pkg -------------------------------------------------------------------------------- /bin/pydoc3.11: -------------------------------------------------------------------------------- 1 | .python3-3.11.1.pkg -------------------------------------------------------------------------------- /bin/python: -------------------------------------------------------------------------------- 1 | .python3-3.11.1.pkg -------------------------------------------------------------------------------- /bin/python3: -------------------------------------------------------------------------------- 1 | .python3-3.11.1.pkg -------------------------------------------------------------------------------- /bin/python3-config: -------------------------------------------------------------------------------- 1 | .python3-3.11.1.pkg -------------------------------------------------------------------------------- /bin/python3.11: -------------------------------------------------------------------------------- 1 | .python3-3.11.1.pkg -------------------------------------------------------------------------------- /bin/python3.11-config: -------------------------------------------------------------------------------- 1 | .python3-3.11.1.pkg -------------------------------------------------------------------------------- /pawk: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | 3 | from pawk import main 4 | 5 | main() 6 | -------------------------------------------------------------------------------- /pawk.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # -*- coding: utf-8 -*- 3 | 4 | """cat input | pawk [] 5 | 6 | A Python line-processor (like awk). 
7 | 8 | See https://github.com/alecthomas/pawk for details. Based on 9 | http://code.activestate.com/recipes/437932/. 10 | """ 11 | 12 | import ast 13 | import codecs 14 | import inspect 15 | import optparse 16 | import os 17 | import re 18 | import sys 19 | 20 | 21 | __version__ = '0.8.0' 22 | 23 | 24 | RESULT_VAR_NAME = "__result" 25 | 26 | 27 | if sys.version_info[0] > 2: 28 | from itertools import zip_longest 29 | 30 | try: 31 | exec_ = __builtins__['exec'] 32 | except TypeError: 33 | exec_ = getattr(__builtins__, 'exec') 34 | STRING_ESCAPE = 'unicode_escape' 35 | else: 36 | from itertools import izip_longest as zip_longest 37 | 38 | def exec_(_code_, _globs_=None, _locs_=None): 39 | if _globs_ is None: 40 | frame = sys._getframe(1) 41 | _globs_ = frame.f_globals 42 | if _locs_ is None: 43 | _locs_ = frame.f_locals 44 | del frame 45 | elif _locs_ is None: 46 | _locs_ = _globs_ 47 | exec("""exec _code_ in _globs_, _locs_""") 48 | STRING_ESCAPE = 'string_escape' 49 | 50 | 51 | # Store the last expression, if present, into variable var_name. 
52 | def save_last_expression(tree, var_name=RESULT_VAR_NAME): 53 | body = tree.body 54 | node = body[-1] if len(body) else None 55 | body.insert(0, ast.Assign(targets=[ast.Name(id=var_name, ctx=ast.Store())], 56 | value=ast.Constant(None))) 57 | if node and isinstance(node, ast.Expr): 58 | body[-1] = ast.copy_location(ast.Assign( 59 | targets=[ast.Name(id=var_name, ctx=ast.Store())], value=node.value), node) 60 | return ast.fix_missing_locations(tree) 61 | 62 | 63 | def compile_command(text): 64 | tree = save_last_expression(compile(text, 'EXPR', 'exec', flags=ast.PyCF_ONLY_AST)) 65 | return compile(tree, 'EXPR', 'exec') 66 | 67 | 68 | def eval_in_context(codeobj, context, var_name=RESULT_VAR_NAME): 69 | exec_(codeobj, globals(), context) 70 | return context.pop(var_name, None) 71 | 72 | 73 | class Action(object): 74 | """Represents a single action to be applied to each line.""" 75 | 76 | def __init__(self, pattern=None, cmd='l', have_end_statement=False, negate=False, strict=False): 77 | self.delim = None 78 | self.odelim = ' ' 79 | self.negate = negate 80 | self.pattern = None if pattern is None else re.compile(pattern) 81 | self.cmd = cmd 82 | self.strict = strict 83 | self._compile(have_end_statement) 84 | 85 | @classmethod 86 | def from_options(cls, options, arg): 87 | negate, pattern, cmd = Action._parse_command(arg) 88 | return cls(pattern=pattern, cmd=cmd, have_end_statement=(options.end is not None), negate=negate, strict=options.strict) 89 | 90 | def _compile(self, have_end_statement): 91 | if not self.cmd: 92 | if have_end_statement: 93 | self.cmd = 't += line' 94 | else: 95 | self.cmd = 'l' 96 | self._codeobj = compile_command(self.cmd) 97 | 98 | def apply(self, context, line): 99 | """Apply action to line. 100 | 101 | :return: Line text or None. 
102 | """ 103 | match = self._match(line) 104 | if match is None: 105 | return None 106 | context['m'] = match 107 | try: 108 | return eval_in_context(self._codeobj, context) 109 | except: 110 | if not self.strict: 111 | return None 112 | raise 113 | 114 | def _match(self, line): 115 | if self.pattern is None: 116 | return self.negate 117 | match = self.pattern.search(line) 118 | if match is not None: 119 | return None if self.negate else match.groups() 120 | elif self.negate: 121 | return () 122 | 123 | @staticmethod 124 | def _parse_command(arg): 125 | match = re.match(r'(?ms)(?:(!)?/((?:\\.|[^/])+)/)?(.*)', arg) 126 | negate, pattern, cmd = match.groups() 127 | cmd = cmd.strip() 128 | negate = bool(negate) 129 | return negate, pattern, cmd 130 | 131 | 132 | class Context(dict): 133 | def apply(self, numz, line, headers=None): 134 | l = line.rstrip() 135 | f = l.split(self.delim) 136 | self.update(line=line, l=l, n=numz + 1, f=f, nf=len(f)) 137 | if headers: 138 | self.update(zip_longest(headers, f)) 139 | 140 | @classmethod 141 | def from_options(cls, options, modules): 142 | self = cls() 143 | self['t'] = '' 144 | self['m'] = () 145 | if options.imports: 146 | for imp in options.imports.split(','): 147 | m = __import__(imp.strip(), fromlist=['.']) 148 | self.update((k, v) for k, v in inspect.getmembers(m) if k[0] != '_') 149 | 150 | self.delim = codecs.decode(options.delim, STRING_ESCAPE) if options.delim else None 151 | self.odelim = codecs.decode(options.delim_out, STRING_ESCAPE) 152 | self.line_separator = codecs.decode(options.line_separator, STRING_ESCAPE) 153 | 154 | for m in modules: 155 | try: 156 | key = m.split('.')[0] 157 | self[key] = __import__(m) 158 | except: 159 | pass 160 | return self 161 | 162 | 163 | def process(context, input, output, begin_statement, actions, end_statement, strict, header): 164 | """Process a stream.""" 165 | # Override "print" 166 | old_stdout = sys.stdout 167 | sys.stdout = output 168 | write = output.write 169 | 170 | 
def write_result(result, when_true=None): 171 | if result is True: 172 | result = when_true 173 | elif isinstance(result, (list, tuple)): 174 | result = context.odelim.join(map(str, result)) 175 | if result is not None and result is not False: 176 | result = str(result) 177 | if not result.endswith(context.line_separator): 178 | result = result.rstrip('\n') + context.line_separator 179 | write(result) 180 | 181 | try: 182 | headers = None 183 | if header: 184 | line = input.readline() 185 | context.apply(-1, line) 186 | headers = context['f'] 187 | 188 | if begin_statement: 189 | write_result(eval_in_context(compile_command(begin_statement), context)) 190 | 191 | for numz, line in enumerate(input): 192 | context.apply(numz, line, headers=headers) 193 | for action in actions: 194 | write_result(action.apply(context, line), when_true=line) 195 | 196 | if end_statement: 197 | write_result(eval_in_context(compile_command(end_statement), context)) 198 | finally: 199 | sys.stdout = old_stdout 200 | 201 | 202 | def parse_commandline(argv): 203 | parser = optparse.OptionParser(version=__version__) 204 | parser.set_usage(__doc__.strip()) 205 | parser.add_option('-I', '--in_place', dest='in_place', help='modify given input file in-place', metavar='') 206 | parser.add_option('-i', '--import', dest='imports', help='comma-separated list of modules to "from x import *" from', metavar='') 207 | parser.add_option('-F', dest='delim', help='input delimiter', metavar='', default=None) 208 | parser.add_option('-O', dest='delim_out', help='output delimiter', metavar='', default=' ') 209 | parser.add_option('-L', dest='line_separator', help='output line separator', metavar='', default='\n') 210 | parser.add_option('-B', '--begin', help='begin statement', metavar='') 211 | parser.add_option('-E', '--end', help='end statement', metavar='') 212 | parser.add_option('-s', '--statement', action='store_true', help='DEPRECATED. 
retained for backward compatibility') 213 | parser.add_option('-H', '--header', action='store_true', help='use first row as field variable names in subsequent rows') 214 | parser.add_option('--strict', action='store_true', help='abort on exceptions') 215 | return parser.parse_args(argv[1:]) 216 | 217 | 218 | # For integration tests. 219 | def run(argv, input, output): 220 | options, args = parse_commandline(argv) 221 | 222 | try: 223 | if options.in_place: 224 | os.rename(options.in_place, options.in_place + '~') 225 | input = open(options.in_place + '~') 226 | output = open(options.in_place, 'w') 227 | 228 | # Auto-import. This is not smart. 229 | all_text = ' '.join([(options.begin or ''), ' '.join(args), (options.end or '')]) 230 | modules = re.findall(r'([\w.]+)+(?=\.\w+)\b', all_text) 231 | 232 | context = Context.from_options(options, modules) 233 | actions = [Action.from_options(options, arg) for arg in args] 234 | if not actions: 235 | actions = [Action.from_options(options, '')] 236 | 237 | process(context, input, output, options.begin, actions, options.end, options.strict, options.header) 238 | finally: 239 | if options.in_place: 240 | output.close() 241 | input.close() 242 | 243 | 244 | def main(): 245 | try: 246 | run(sys.argv, sys.stdin, sys.stdout) 247 | except EnvironmentError as e: 248 | # Workaround for close failed in file object destructor: sys.excepthook is missing lost sys.stderr 249 | # http://stackoverflow.com/questions/7955138/addressing-sys-excepthook-error-in-bash-script 250 | sys.stderr.write(str(e) + '\n') 251 | sys.exit(1) 252 | except KeyboardInterrupt: 253 | sys.exit(1) 254 | 255 | 256 | if __name__ == '__main__': 257 | main() 258 | -------------------------------------------------------------------------------- /pawk_test.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | import sys 3 | import timeit 4 | from pawk import Action, Context, run, parse_commandline 5 | 6 | 
if sys.version_info[0] > 2: 7 | from io import StringIO 8 | else: 9 | from StringIO import StringIO 10 | 11 | 12 | TEST_INPUT_LS = r''' 13 | total 72 14 | -rw-r----- 1 alec staff 18 Feb 9 11:52 MANIFEST.in 15 | -rw-r-----@ 1 alec staff 3491 Feb 10 11:08 README.md 16 | drwxr-x--- 4 alec staff 136 Feb 9 23:35 dist/ 17 | -rwxr-x--- 1 alec staff 53 Feb 10 04:47 pawk* 18 | drwxr-x--- 6 alec staff 204 Feb 9 21:09 pawk.egg-info/ 19 | -rw-r----- 1 alec staff 5045 Feb 10 11:37 pawk.py 20 | -rw-r--r-- 1 alec staff 521 Feb 10 04:56 pawk_test.py 21 | -rw-r----- 1 alec staff 468 Feb 10 04:42 setup.py 22 | ''' 23 | 24 | TEST_INPUT_CSV_WITH_EMPTY_FIELD = r''' 25 | model,color,price 26 | A01,,100 27 | B03,blue,200 28 | ''' 29 | 30 | def run_integration_test(input, args): 31 | input = StringIO(input.strip()) 32 | output = StringIO() 33 | run(['pawk'] + args, input, output) 34 | return output.getvalue().strip() 35 | 36 | 37 | def test_action_parse(): 38 | negate, pattern, cmd = Action()._parse_command(r'/(\w+)/ l') 39 | assert pattern == r'(\w+)' 40 | assert cmd == 'l' 41 | assert negate is False 42 | 43 | 44 | def test_action_match(): 45 | action = Action(r'(\w+) \w+') 46 | groups = action._match('test case') 47 | assert groups == ('test',) 48 | 49 | 50 | def test_action_match_negate(): 51 | action = Action(r'(\w+) \w+', negate=True) 52 | groups = action._match('test case') 53 | assert groups is None 54 | groups = action._match('test') 55 | assert groups == () 56 | 57 | 58 | def test_integration_sum(): 59 | out = run_integration_test(TEST_INPUT_LS, ['-Bc = 0', '-Ec', 'c += int(f[4])']) 60 | assert out == '9936' 61 | 62 | 63 | def test_integration_match(): 64 | out = run_integration_test(TEST_INPUT_LS, ['/pawk_test/ f[4]']) 65 | assert out == '521' 66 | 67 | 68 | def test_integration_negate_match(): 69 | out = run_integration_test(TEST_INPUT_LS, ['!/^total|pawk/ f[-1]']) 70 | assert out.splitlines() == ['MANIFEST.in', 'README.md', 'dist/', 'setup.py'] 71 | 72 | 73 | def 
test_integration_truth(): 74 | out = run_integration_test(TEST_INPUT_LS, ['int(f[4]) > 1024']) 75 | assert [r.split()[-1] for r in out.splitlines()] == ['README.md', 'pawk.py'] 76 | 77 | 78 | def test_integration_multiple_actions(): 79 | out = run_integration_test(TEST_INPUT_LS, ['/setup/', '/README/']) 80 | assert [r.split()[-1] for r in out.splitlines()] == ['README.md', 'setup.py'] 81 | 82 | 83 | def test_integration_csv_empty_fields(): 84 | out = run_integration_test(TEST_INPUT_CSV_WITH_EMPTY_FIELD, ['-F,', 'f[2]']) 85 | assert out.splitlines() == ['price', '100', '200'] 86 | out = run_integration_test(TEST_INPUT_CSV_WITH_EMPTY_FIELD, ['-F,', 'f[1]']) 87 | assert out.splitlines() == ['color', '', 'blue'] 88 | 89 | 90 | def benchmark_fields(): 91 | options, _ = parse_commandline(['']) 92 | action = Action(cmd='f') 93 | context = Context.from_options(options, []) 94 | t = timeit.Timer(lambda: action.apply(context, 'foo bar waz was haz has hair')) 95 | print((t.repeat(repeat=3, number=100000))) 96 | 97 | 98 | if __name__ == '__main__': 99 | benchmark_fields() 100 | -------------------------------------------------------------------------------- /pyproject.toml: -------------------------------------------------------------------------------- 1 | [tool.poetry] 2 | name = "pawk" 3 | version = "0.8.0" 4 | authors = ["Alec Thomas "] 5 | description = "PAWK - A Python line processor (like AWK)" 6 | license = "MIT" 7 | readme = "README.md" 8 | homepage = "https://github.com/alecthomas/pawk" 9 | repository = "http://github.com/alecthomas/pawk" 10 | classifiers = [ 11 | "Programming Language :: Python :: 3", 12 | "License :: OSI Approved :: MIT License", 13 | "Operating System :: OS Independent", 14 | ] 15 | 16 | [tool.poetry.scripts] 17 | pawk = "pawk:main" 18 | 19 | [tool.poetry.urls] 20 | "Bug Tracker" = "https://github.com/alecthomas/pawk/issues" 21 | 22 | [tool.poetry.dependencies] 23 | python = ">=3.7" 24 | 25 | [tool.poetry.group.dev.dependencies] 26 | knotr = 
"^0.4.1" 27 | 28 | [build-system] 29 | requires = ["poetry-core"] 30 | build-backend = "poetry.core.masonry.api" 31 | -------------------------------------------------------------------------------- /setup.py: -------------------------------------------------------------------------------- 1 | import atexit 2 | import os 3 | import sys 4 | import pawk 5 | try: 6 | from setuptools import setup 7 | except ImportError: 8 | from distutils.core import setup 9 | 10 | if sys.version_info[0] != 3: 11 | print("PAWK requires Python 3") 12 | exit(1) 13 | 14 | 15 | try: 16 | import pypandoc 17 | long_description = pypandoc.convert('README.md', 'rst') 18 | with open('README.rst', 'w') as f: 19 | f.write(long_description) 20 | atexit.register(lambda: os.unlink('README.rst')) 21 | except (ImportError, OSError): 22 | print('WARNING: Could not locate pandoc, using Markdown long_description.') 23 | with open('README.md') as f: 24 | long_description = f.read() 25 | 26 | description = long_description.splitlines()[0].strip() 27 | 28 | setup( 29 | name='pawk', 30 | url='http://github.com/alecthomas/pawk', 31 | download_url='http://github.com/alecthomas/pawk', 32 | version=pawk.__version__, 33 | description=description, 34 | long_description=long_description, 35 | license='PSF', 36 | platforms=['any'], 37 | author='Alec Thomas', 38 | author_email='alec@swapoff.org', 39 | py_modules=['pawk'], 40 | scripts=['pawk'], 41 | ) 42 | --------------------------------------------------------------------------------