├── README.md
└── llua

/README.md:
--------------------------------------------------------------------------------
## Fitting in with the Existing Ecosystem

The Unix command-line philosophy is to combine small specialized programs
with pipelines and i/o redirection. Tools like `cat`, `head`, `grep`, etc. are
universally available and play well together. `awk` occupies a special place:
it _is_ possible to write programs in `awk`, but nowadays it is mostly used for
powerful one-liners, particularly for data with delimited columns such as tab- or
space-separated files.

So any new tool should not do what can already be done easily with the standard
toolbox. In particular, making it easier to use a programming language for one-liners
should not reinvent `awk` (this has already happened with `perl`, but I don't care for
dollar-languages).

`llua` is a small Swiss Army knife-style command-line utility which exposes the expressive
power of Lua to command-line scripters.

Run from the directory containing `llua`, this will install `llua` and its common
aliases to `/usr/local/bin` (the default is `~/bin`):

```sh
$ sudo lua llua install /usr/local/bin
```

## Matching Lua string patterns

[Lua string patterns](https://www.lua.org/pil/20.2.html) are a powerful subset of
true regular expressions. Technically they aren't regular expressions because they
lack alternation (several possible matches separated by `|`), but they follow a similar
notation, with `%` rather than `\` as the escape character.

`llua m` applies the Lua `string.match` function to every line in its input.
Here we're picking out proper names, defined as an upper-case letter followed by
one or more lower-case letters. The second example picks out _two_ values, each defined as
one or more non-space characters; this time we must use parentheses to mark the
captured groups.
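
Because `llua m` boils down to calling `string.match` on each line, its behaviour is easy to try out in plain Lua. This is a minimal sketch, not `llua`'s own code: `match_lines` is a hypothetical helper, and it joins several captures with a tab, which is `llua`'s default output delimiter.

```lua
-- A minimal sketch of what `llua m` does per line (hypothetical helper,
-- not llua's actual code): match a Lua pattern and keep the captures.
local function match_lines(lines, patt)
  local out = {}
  for _, line in ipairs(lines) do
    -- string.match returns all captures, or the whole match if the
    -- pattern has none; a non-matching line yields nil and is skipped
    local caps = { line:match(patt) }
    if #caps > 0 then
      out[#out + 1] = table.concat(caps, '\t')
    end
  end
  return out
end

local names = match_lines({ 'your name is Frodo', 'no names here' }, '%u%l+')
print(names[1])  --> Frodo

local dogs = match_lines({ 'Bonzo is a dog' }, '(%S+) is a (%S+)')
print(dogs[1])   -- 'Bonzo' and 'dog', separated by a tab
```

The line without an upper-case word simply drops out, just as non-matching lines are ignored by `lmatch`.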

```sh
$ echo 'your name is Frodo' | llua m '%u%l+'
Frodo
$ echo 'Bonzo is a dog' | llua m '(%S+) is a (%S+)'
Bonzo dog
```
`llua m` is a powerful way to extract columns out of arbitrarily-structured data;
its alias is `lmatch`.

On Linux, `ls /proc | lmatch '%d+'` will return the pids of all running processes by
picking out the integer entries. Lines that do not match the pattern are
ignored.

```sh
$ mount | lmatch '(/dev/%S+) on (%S+)'
/dev/sda2 /
/dev/sdb1 /home
```

## Lua-style global substitution

`sed` is the designated tool of choice, but I think there's sufficient complementary
power provided by `string.gsub` to warrant exposing it on the command-line. `lsub`
is the alias for `llua g`:

```sh
$ echo 'bonzo is a dog' | lsub '(%S+) is a (%S+)' '%2-%1'
dog-bonzo
```
`lsub` takes a pattern argument and a replacement argument; any captures in
the pattern are available in the replacement as `%1`, `%2`, etc. Note that this is
a _global_ replacement: all matches on each line will be substituted.

Any `llua` subcommand that reads input can take the `-t` flag, which provides a string that becomes
the _single_ input line of the command. This is convenient if you want to avoid the
`echo` trick!

With the `-l` flag, the replacement string is instead a Lua _expression_ in `s`, which holds
the first capture (or the whole match if the pattern has no captures):

```sh
$ lsub -t 'HOME is where the heart is' -l '(%u+)' 'getenv(s)'
/home/steve is where the heart is
```

`llua` makes all the functions in the `math` and `os` tables available globally
for your convenience; any other unknown name is looked up as an environment variable.
[os.date](https://www.lua.org/manual/5.3/manual.html#pdf-os.date)
can convert timestamps into human date/time using the same flags as `strftime`. This is
useful for preprocessing log files which don't bother to present time in a friendly way.
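
The `-l` form works because `string.gsub` also accepts a function as the replacement: it is called with the captures, and whatever string it returns is substituted. Here is a minimal plain-Lua sketch of the timestamp rewrite. It assumes UTC output via the `'!%T'` format, so unlike `date("%T",s)`, which uses local time, the result does not depend on the machine's timezone.

```lua
-- Rewrite a leading Unix timestamp as a time of day, in the spirit of
-- lsub -l '^(%d+)' 'date("!%T",s)'. The '!' prefix makes os.date use UTC.
local line = '1465637777 had a bath'

-- gsub calls the function with the first capture and substitutes its result;
-- gsub returns (new string, count), and only the string is kept here
local result = line:gsub('^(%d+)', function(s)
  return os.date('!%T', tonumber(s))
end)

print(result)  --> 09:36:17 had a bath
```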

```sh
$ echo '1465637777 had a bath' | lsub -l '^(%d+)' 'date("%T",s)'
05:36:17 had a bath
```

## Evaluating an expression for each line

`llua l` (alias `leval`) evaluates an expression for each line, with the same relaxed rules
for function visibility as `lsub -l`. The expression is an implicit function of
`l` (the line), `ll` (the _previous_ line) and `lno` (the line number).

```sh
$ leval -t 'hello dolly' 'l:sub(1,4):upper()'
HELL
$ seq 1 5 | leval '10*l, 100*l'
10 100
20 200
30 300
40 400
50 500
```
This only works because Lua will auto-convert strings into numbers if the context demands
it; arguably a misfeature in non-trivial programs, but very convenient for the current
use!

Note that it is easy to produce multiple output items: just separate expressions with
commas. What if we wanted another output delimiter? CSV is a popular format:

```sh
$ seq 1 4 | leval -o, 'l/pi, sin(l)/pi'
0.31831,0.267849
0.63662,0.289438
0.95493,0.0449199
1.27324,-0.240898
```
`-o` applies to the other subcommands as well.

I did not intend to reinvent `awk`, but manipulating pre-split columns is very convenient.
The `-s` flag splits the input according to the input delimiter (set with `-i`; the
default is whitespace), and the fields are available as `F[1]`, `F[2]`, etc.:

```sh
$ echo "10 20" | leval -s "2*F[1], 3*F[2]"
20 60
```
Simple CSV files are also easy to handle, as long as you don't rely on quoted values
(such as fields that contain commas). The `-c` flag implies `-s`, `-i,` and `-o,`.
It assumes that the first line contains the column names, so you can access the
fields by name; if the original names contain spaces or odd characters, those are
replaced by underscores. There is also an option to specify _output_ column names
using a SQL-like `as name` notation.
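
The header handling can be sketched in a few lines of plain Lua: header names become a name-to-index table (non-alphanumeric characters turn into underscores), ` as name` suffixes are stripped and remembered as output names, and `F.name` references are rewritten into positional `F[i]` lookups. This illustrates the idea under those assumptions; the three helper functions are hypothetical.

```lua
-- Map header names to column indices; characters other than letters and
-- digits become '_' so every name can be written as F.name.
local function header_index(header)
  local columns, i = {}, 0
  for col in header:gmatch('[^,]+') do
    i = i + 1
    columns[(col:gsub('%W', '_'))] = i
  end
  return columns
end

-- Strip ' as NAME' suffixes, collecting the names for the output header.
local function output_names(expr)
  local names = {}
  expr = expr:gsub(' as (%a+)', function(name)
    names[#names + 1] = name
    return ''
  end)
  return expr, names
end

-- Rewrite F.name into F[i] so the expression can index a split row.
local function patch_names(expr, columns)
  return (expr:gsub('F%.([%a_][%w_]*)', function(name)
    return 'F[' .. assert(columns[name], 'unknown column ' .. name) .. ']'
  end))
end

local columns = header_index('X,Y')
local expr, names = output_names('F.Y - 2*F.X as A, F.X-F.Y as B')
print(table.concat(names, ','))    --> A,B
print(patch_names(expr, columns))  --> F[2] - 2*F[1], F[1]-F[2]
print(header_index('unit price,qty').unit_price)  --> 1
```

Rewriting names into indices up front means each data line only needs a positional split, and an unknown column name is reported once rather than failing on every line.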

```sh
scratch$ cat tmp.csv
X,Y
0.840188,1.183149
0.783099,2.39532
0.911647,0.592653
0.335223,2.30469
scratch$ leval -c 'F.Y - 2*F.X as A, F.X-F.Y as B' tmp.csv
A,B
-0.497227,-0.342961
0.829122,-1.612221
-1.230641,0.318994
1.634244,-1.969467
```

If none of the output fields use the `as name` notation, then no header is
written out.

## Sorting by column

As with `lsub` and `sed`, there is already `sort`, and it can sort on delimited
fields using `-t` and `-k`. But its key syntax is fiddly (`-k2,2n` sorts numerically
on the second field alone, whereas `-k2` runs from the second field to the end of
the line), and it still carries the flavour of fixed-format data, which was hip when
guys wrote FORTRAN wearing ties. This seems rather feeble at this point in computing
history.

Hence `llua s`, aliased as `lsort`.

First of all, make some random data, then sort on the second column, ascending,
and then on the first column, descending. (We need `n` to specify that the column is
to be interpreted as numbers; without `n` it uses the usual string comparison. `d`
requests descending order.)

```sh
scratch$ seq 1 5 | leval 'random(),random()' > rand.txt
scratch$ lsort 2n rand.txt
0.911647 0.197551
0.840188 0.394383
0.277775 0.55397
0.335223 0.76823
0.783099 0.79844
scratch$ lsort 1nd rand.txt
0.911647 0.197551
0.840188 0.394383
0.783099 0.79844
0.335223 0.76823
0.277775 0.55397
```
`lsort` is not a speed demon. But it is a lot easier to use than `sort` when you
have delimited columns.

## Collecting instances of a captured value

`llua c` aka `lcollect` is occasionally useful. It extracts a value like `lmatch`, but
counts how often each unique value occurs. Here we count the first word on each
line of this README:
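
Counting unique values is the classic Lua table idiom. This minimal sketch covers the single-capture case only; `count_matches` is a hypothetical helper. Because `pairs` visits keys in no particular order (which is why `lcollect` output looks unsorted), the results are sorted explicitly, by descending count with ties broken alphabetically.

```lua
-- Count how often each value captured by `patt` occurs across `lines`.
-- count_matches is a hypothetical helper, not part of llua.
local function count_matches(lines, patt)
  local counts = {}
  for _, line in ipairs(lines) do
    local m = line:match(patt)
    if m then
      counts[m] = (counts[m] or 0) + 1
    end
  end
  return counts
end

local counts = count_matches({ 'the cat', 'the dog', 'a bird', '42' }, '%a+')

-- pairs() has no defined order, so collect the keys and sort them:
-- by count, descending, then alphabetically to break ties
local keys = {}
for k in pairs(counts) do keys[#keys + 1] = k end
table.sort(keys, function(a, b)
  if counts[a] ~= counts[b] then return counts[a] > counts[b] end
  return a < b
end)

for _, k in ipairs(keys) do print(k, counts[k]) end
-- prints 'the' with 2, then 'a' with 1 (tab-separated)
```

This is the same job that piping `lcollect` into `lsort 2nd` does on the command line.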

```sh
$ lcollect '%a+' README.md
as 1
sed 1
lsub 2
to 2
On 1
Lua 3
HELL 1
is 2
LLUA 1
....
```
We can combine our cool tools into pipelines, naturally.

```sh
$ cat README.md | tr ' ' '\n' | lcollect '%a+' | lsort 2nd | head
the 54
is 40
a 28
to 25
as 17
for 17
llua 17
and 15
sh 15
with 14
```

## Evaluating Lua expressions

This is the odd one out. `llua e` (alias `lx`) does not consume standard input.
I include it because it's simply more powerful than `expr`: the standard mathematical
functions and hex literals are available, and with Lua 5.3 or later, bitwise operators as well.

```sh
$ # all functions and constants from math.* are available
$ lx 'sin(1.2)*pi + 1'
3.92809
$ # as well as os.*; this is 'one hour in the future'
$ lx 'time()+3600'
1465477097
$ lx -x '0x100 + 0x200'
0x300
```
The `-x` flag forces hexadecimal output; as with `leval`, multiple expressions can
be evaluated and the output delimiter is controlled with `-o`.

## Environment and Flags

All operations understand the environment variable `LLUA_FMT`, which is a
custom format to use for printing floating-point values (with Lua 5.3, `%g`
is used by default):

```sh
$ LLUA_FMT='%4.2f' lx 'sin(0.234),cos(0.234)'
0.23 0.97
```

If `LLUA_CONFIG` is defined, `llua` will read a configuration file at that location.
If it's just 1, then `llua` will use the default location `~/.lluarc`. It may contain
any Lua definitions:

```sh
$ cat ~/.lluarc
K=1024
function s(x) return x*x end
$ export LLUA_CONFIG=1
$ lx 's(K)'
1048576
```
This is very useful if you have some custom constants and operations that you
would like at your fingertips.
The relaxed lookup rules apply in this
file as well.

Every operation except `lx` understands `-n`, which prints out the line number
as well.

All operations understand `-o` for the output delimiter, for instance `-o,`. The
column-oriented input readers `lfmt`, `lsort` and `leval -s` also understand the
equivalent `-i` for the input delimiter.

## Conclusion

I've found these to be the most useful Lua one-liners; where they overlap existing
standard tools, I've included them only because they offer more functionality than
the standard ones.

I did not resist the temptation to reinvent `awk`, since the functionality was so
convenient.

[Lua patterns](http://man.openbsd.org/patterns.7) are also used for URL rewriting in OpenBSD's
`httpd` daemon.

[sbp](https://f.juef.space/sbp/) by Svyatoslav Mishyn already implements `lmatch`
using OpenBSD's modified version of the Lua string pattern-matching code.

--------------------------------------------------------------------------------
/llua:
--------------------------------------------------------------------------------
#!/usr/bin/lua
local usage = [[
Lua stream commands
llua
Commands:
  m (patt)        extract values with lua string patterns (alias lmatch)
  g (patt) (repl) substitute string pattern with replacement (alias lsub)
                  If -l is specified, replacement is an expression in 's'
  f (repl)        replacement string with implicit captures (alias lfmt)
  e (expr)        evaluate Lua expression (NO input) (alias lx)
                  This expression may have several parts separated by commas
  c (pat)         count occurrences of value extracted with pattern (alias lcollect)
                  If there are two captures, the first is the key and the
                  second is the value ('collect')
  l (expr)        evaluate expression in 'l' for each line (alias leval)
  s (col)         sort input by delimited column.
                  Add 'n' for numeric sort, 'd' for descending (alias lsort)

Flags:
  -t        a string value which becomes the single input line
  -n        output the line number as well
  -o(delim) output delimiter (default is tab)
  -i(delim) input column delimiter (for f, s, and l with -s)
  -s        split input into fields F[1], F[2], ... (for l)
  -c        CSV input with a header row; implies -s -i, -o, (for l)
  -l        use a function in 's' for substitution (for g)
  -x        print output in hex

Expressions: functions in the os and math tables are directly available;
will look up values in the environment if not found.

Type 'llua ex' for examples
]]

local examples = [[
# --- llua examples ---
# matching:
$ echo 'your name is Frodo' | llua m '(%u%l+)'
Frodo
# substitution:
$ echo 'bonzo is a dog' | llua g '(%S+) is a (%S+)' '%2-%1'
dog-bonzo
$ llua g -l -t 'HOME is where the heart is' '(%u+)' 'getenv(s)'
/home/steve is where the heart is
# formatted column output
$ echo 'one two three' | llua f '(%2)?%1:%3'
(two)?one:three
$ echo 'one,two,three' | llua f -i, '%1 %2 %3'
one two three
# expressions
$ llua e 'sin(1.2)*pi'
2.9280871453332
$ llua e -x '0x100 + 0x200'
0x300
$ llua e 'time()+3600'
1465477097
$ llua e -o: '10,20,30'
10:20:30
# line function
$ seq 1 4 | llua l 'l/pi, sin(l)/pi'
0.31830988618379 0.26784853340116
0.63661977236758 0.2894383604401
0.95492965855137 0.044919893703792
1.2732395447352 -0.24089771614508
# sorting by column (here by 2nd, numerical asc order)
$ seq 1 5 | llua l 'random(),random()' | llua s 2n
0.91164735793678 0.19755136929338
0.84018771715471 0.39438292681909
0.27777471080319 0.55396995579543
0.33522275571489 0.7682295948119
0.78309922375861 0.79844003347607
]]

-- these are suggestions for aliases. If you create
-- soft links to llua on the path with these names
-- then 'lmatch' means 'llua m' etc.
local alias = {
  lx = 'e',
  lmatch = 'm',
  lsub = 'g',
  lcollect = 'c',
  lfmt = 'f',
  lsort = 's',
  leval = 'l',
}

local function install(dest)
  local function exec(cmd)
    print(cmd)
    os.execute(cmd)
  end
  exec('chmod +x llua')
  exec('cp llua '..dest)
  for a in pairs(alias) do
    exec('ln -s '..dest..'/llua'..' '..dest..'/'..a)
  end
end

local function quit(msg, show_usage)
  io.stderr:write('llua ',msg,'\n')
  if show_usage then io.stderr:write(usage,'\n') end
  os.exit(1)
end


local select, print, append, getenv = select, print, table.insert, os.getenv
local unpack = table.unpack or unpack

local prog = arg[0]:match '/([^/]+)$'
local cmd = alias[prog]
local start = 1
if not cmd then
  cmd = arg[1]
  start = 2
end

if not cmd then
  print 'must provide command'
  print(usage)
  return
elseif cmd == 'ex' then
  print(examples)
  return
elseif cmd == 'install' then
  local dest = arg[2] or '~/bin'
  print ('installing to '..dest)
  install (dest)
  return
end

local parms = {}
local hex, lineno, lambda, input, idelim, odelim
local split, column_headers
local i = start
while i <= #arg do
  local a = arg[i]
  if a == '-x' then hex = true
  elseif a == '-n' then lineno = true
  elseif a == '-l' then lambda = true
  elseif a == '-s' then split = true
  elseif a == '-c' then
    column_headers = true
    split = true
    idelim = ','
    odelim = ','
  elseif a == '-t' then
    input = arg[i+1]
    i = i + 1
  elseif a == '-h' then
    print(usage)
    return
  elseif a:sub(1,2) == '-i' then
    idelim = a:sub(3)
  elseif a:sub(1,2) == '-o' then
    odelim = a:sub(3)
    if odelim == '\\n' then odelim = '\n' end
  else
    append(parms,a)
  end
  i = i + 1
end

-- relaxed access to Lua library functions: unknown globals are looked
-- up in os, then math, then the environment
setmetatable(_G,{
  __index = function(self,k)
    return os[k] or math[k] or getenv(k)
      or quit("unknown variable "..k)
  end
})

-- can specify float format specifically with env var LLUA_FMT
-- we use '%g' with 5.3 to restore old behaviour when printing floats
local ffmt = getenv 'LLUA_FMT' or (_VERSION=='Lua 5.3' and '%g')

-- optionally load configuration containing Lua definitions
local lcnfig = getenv 'LLUA_CONFIG'
if lcnfig == '1' then
  lcnfig = getenv 'HOME' ..'/.lluarc'
end
if lcnfig then
  local f,chunk,ok,e
  f,e = io.open(lcnfig,'r')
  if not f then quit(e) end -- io.open's message already names the file
  chunk,e = load(f:read '*a',lcnfig,'t',_G)
  f:close()
  if not chunk then quit(e) end
  ok, e = pcall(chunk)
  if not ok then quit(lcnfig..': '..e) end
end

if hex or ffmt then
  local old_tostring,type,format = _G.tostring,type,string.format
  -- math.type (Lua 5.3+) distinguishes integers from floats; older
  -- versions only have floats, so treat every number as a float there
  local mtype = math.type or function() return 'float' end
  _G.tostring = function(s)
    if type(s) == 'number' then
      if hex then
        return format('0x%X',s)
      elseif mtype(s) == 'float' then
        -- LLUA_FMT / '%g' applies to floats only, so integers such as
        -- timestamps are not squeezed into '%g' notation
        return format(ffmt,s)
      end
    end
    return old_tostring(s)
  end
end

-- Always install our own print: since Lua 5.4 the standard print no
-- longer calls the global tostring, which would silently ignore -x and
-- LLUA_FMT. The default delimiter is a tab, as with the standard print.
odelim = odelim or '\t'
do
  local write,tostring = io.write,tostring
  print = function(...)
    local n = select('#',...)
    local args = {...}
    for i = 1,n do
      write(tostring(args[i]))
      if i < n then write(odelim) end
    end
    write('\n')
  end
end

local lno,current_line,current_file = 1,'',''

local function _print(...)
  if select(1,...) then
    if lineno then
      print(current_file..lno..':',...)
    else
      print(...)
    end
  end
  lno = lno + 1
end

local function lines(file)
  if input then
    return function()
      local res = input
      input = nil
      return res
    end
  end
  if file then
    current_file = file..':'
    return io.lines(file)
  else
    return io.lines()
  end
end

local function make_column_match(max,only_last)
  local res,delim,patt
  if idelim then
    -- escape punctuation so that e.g. -i. or -i% act as literal delimiters
    local esc = idelim:gsub('%p','%%%0')
    res = ''
    delim = esc
    patt = '[^'..esc..']+'
  else -- just spaces
    res = '^%s*'
    delim = '%s+'
    patt = '%S+'
  end
  local last_capture = '('..patt..')'
  local capture = only_last and patt or last_capture
  for i = 1,max do
    res = res .. (i < max and capture..delim or last_capture)
  end
  return res
end

local cmds = {}

function cmds.m(patt,file)
  if not patt then quit('expecting pattern') end
  for line in lines(file) do
    _print(line:match(patt))
  end
end

local function compile (args,expr)
  local fn = 'return function('..args..') return '..expr..' end'
  local chunk,e = load(fn,"")
  if e then
    quit("compile error "..e)
  end
  return chunk()
end

function cmds.l (expr,file)
  if not expr then quit('expecting expression') end
  local iter = lines(file)
  local patt
  if split then
    local max,max_of = 1, math.max
    local ofields = {}
    if column_headers then
      -- split the column names
      local line = iter()
      local columns,i = {},0
      for col in line:gmatch '[^,]+' do
        i = i + 1
        col = col:gsub('%W','_')
        columns[col] = i
      end
      -- extract output column names
      expr = expr:gsub(' as (%a+)',function(col)
        append(ofields,col)
        return ''
      end)
      -- and patch F.IDEN!
      expr = expr:gsub('F%.([%a_][%w_]*)',function(col)
        local icol = columns[col]
        if not icol then quit('unknown column '..col) end
        max = max_of(max,icol)
        return 'F['..icol..']'
      end)
      if #ofields > 0 then
        print(table.concat(ofields,','))
      end
    else -- maximum number of fields needed from F[idx]
      expr:gsub('F%[(%d+)%]',function(idx)
        max = max_of(max,tonumber(idx))
      end)
    end
    patt = make_column_match(max)
  end
  local fun = compile('l,ll,lno,F',expr)
  local last
  local ok,err = pcall(function()
    local F = {}
    for line in iter do
      if split then
        F = {line:match(patt)}
      end
      current_line = line
      _print(fun(line,last,lno,F))
      last = line
    end
  end)
  if not ok then quit(tostring(err)) end
end

function cmds.c(patt,file)
  if not patt then quit('expecting pattern') end
  local items = {}
  for line in lines(file) do
    local m,val = line:match(patt)
    if m then
      if val then
        items[m] = val
      else
        items[m] = (items[m] or 0) + 1
      end
    end
  end
  for m,c in pairs(items) do
    print(m,c)
  end
end

function cmds.s(icol,file)
  if not icol then quit('expecting column') end
  local items,postfix = {}
  icol,postfix = icol:match '(%d+)(.*)'
  if not icol then quit('col did not start with a number') end
  local numerical = postfix:match'n'
  local descending = postfix:match'd'
  icol = tonumber(icol)
  local patt = make_column_match(icol,true)
  for line in lines(file) do
    local key = line:match(patt)
    -- lines with too few columns give no key; tonumber(nil) would raise an error
    if numerical and key then key = tonumber(key) end
    if key then
      append(items,{line=line,key=key})
    end
  end
  table.sort(items,
    descending
    and
    (function(a,b) return a.key > b.key end)
    or
    (function(a,b) return a.key < b.key end))
  for i = 1,#items do
    print(items[i].line)
  end
end

function cmds.g(patt,repl,file)
  if not patt or not repl then quit('expecting pattern/replacement') end
  if lambda then -- -l flag
    repl = compile('s',repl)
  end
  for line in lines(file) do
    line = line:gsub(patt,repl)
    _print(line) -- _print honours -n
  end
end

function cmds.f(repl,file)
  if not repl then quit('expecting string') end
  -- maximum number of captures?
  local max = 1
  repl:gsub('%%(%d+)',function(idx)
    idx = tonumber(idx)
    if idx > max then max = idx end
  end)
  local patt = make_column_match(max)
  cmds.g(patt..'.*',repl,file)
end

function cmds.e (expr)
  if not expr then quit('expecting expression') end
  local fn,e = load('return '..expr)
  if e then
    quit(e)
  end
  print(fn())
end

local op = cmds[cmd]
if op then
  op(unpack(parms))
else
  quit('unknown command '..cmd,true)
end
--------------------------------------------------------------------------------