├── README.markdown ├── index.mkd ├── map └── map-vs-gp.mkd /README.markdown: -------------------------------------------------------------------------------- 1 | Documentation is [here](http://github.com/sitaramc/map/blob/master/index.mkd). 2 | 3 | NOTE that this command has been completely rewritten in Nov 2015. The old one 4 | is available in the "old" branch. **The 'usage' HAS changed significantly, 5 | and in a backward incompatible way.** Be sure to read the documentation! 6 | -------------------------------------------------------------------------------- /index.mkd: -------------------------------------------------------------------------------- 1 | # Map -- making xargs simpler *and* more powerful! 2 | 3 | > > NOTE that this command has been completely rewritten in Nov 2015. The 4 | > > old one is available in the "old" branch. **The 'usage' HAS changed 5 | > > significantly, and in a backward incompatible way.** Be sure to read 6 | > > the documentation! 7 | 8 | # For existing users 9 | 10 | The major changes from the old "map" command are: 11 | 12 | * **SECURITY**; see appendix 1 for details. **Even though this is in the 13 | appendix, it is important -- please read!** 14 | * **important**: the meaning of `%%` has changed; please read below. 15 | * `-v` replaced by using environment variable "D" to "1". `-q` no longer 16 | available. 17 | * `-p` defaults to 4, not #CPUs (but see the section on environment 18 | variables later). 19 | * Finally, we no longer use 'xargs' to do the actual parallelism; we do it 20 | natively, in pure perl. 21 | 22 | ---------------------------------------------------------------------- 23 | 24 | # OVERVIEW 25 | 26 | `map` can replace xargs for most purposes as well as replace many `for` loops 27 | in shell. Clone the [repo](https://github.com/sitaramc/map) or grab just the 28 | [script](https://raw.githubusercontent.com/sitaramc/map/master/map). 
Run `map 29 | -h` to get some quick help, but be aware that only scratches the surface! 30 | 31 | When used like xargs, map is **line-oriented**; each input *line* is an 32 | argument, not each word within a line. When used like a shell loop, arguments 33 | come from the command line. There are examples of both kinds later in this 34 | document. 35 | 36 | Map has only **two options**: `-p` (**max procs**, default 4), which says how 37 | many jobs to run, and keep running, in parallel, and `-n` (**max args**), 38 | which says how many of the input arguments to use in *one* command/run. The 39 | default for max args is either 1 or 100; see later. As a **convenient 40 | shortcut**, if the *first* argument is `-1` (in general, `-N` for any integer 41 | N), this is converted to `-p1 -nN` -- i.e., single-processing, N arguments per 42 | command. 43 | 44 | (As of 2021-11-27, if you're running sequentially (i.e., `-p` is `1`, 45 | implicitly or explicitly), `map` will emulate `xargs`'s `--open-tty` option, 46 | allowing you to use, for example, `vim`, as the command to run. There's no 47 | option for this; it just happens automatically. I'll revisit it if there are 48 | problems.) 49 | 50 | Note: please also see the section on environment variables for more on this. 51 | 52 | Map has only **one replacement string**, the `%` symbol, with some easy to 53 | remember suffixes. `%` is the same as `%F`, and stands for the full argument. 54 | `%D` stands for the directory component of the argument (`.` if no directory 55 | component existed), `%B` for the basename, and `%E` for extension. In 56 | general, `%F` equals `%D/%B.%E`. Finally, `%%` renders as a `%` sign in the 57 | resulting command. 58 | 59 | (**Default `%`**: if no `%` sign or variant is supplied, a single `%` is added 60 | at the end.) 
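The suffixes are easiest to see on a concrete path. The sketch below is a shell approximation, not map's own code: `split_path` here is a hypothetical helper that follows what the script's `split_path` routine appears to do. In particular it assumes `%E` keeps its leading period and that a single trailing slash is dropped before splitting.

```sh
# Hypothetical helper (not part of map): print the %D, %B and %E parts of a
# path, one per line. Assumes %E keeps its leading period.
split_path() {
    sp_f=${1%/}                       # drop a single trailing slash
    case $sp_f in
        */*) sp_d=${sp_f%/*}; sp_rest=${sp_f##*/} ;;   # split at the last slash
        *)   sp_d=.;          sp_rest=$sp_f ;;         # no directory component
    esac
    case $sp_rest in
        *.)  sp_b=$sp_rest;      sp_e= ;;              # trailing period: no extension
        *.*) sp_b=${sp_rest%.*}; sp_e=.${sp_rest##*.} ;;
        *)   sp_b=$sp_rest;      sp_e= ;;              # no period at all
    esac
    printf '%s\n' "$sp_d" "$sp_b" "$sp_e"
}

split_path pics/cat.gif      # prints "pics", "cat", ".gif" on separate lines
split_path notes             # prints ".", "notes", and an empty line
```

Only the last extension is split off, so `foo.tar.gz` gives a basename of `foo.tar`.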
61 | 62 | The **default max-args** is 1 if the command template has more than one `%` 63 | (or `%F`, `%D`, `%B`, `%E`) in it, and 100 if the template has only one `%` 64 | (`%%` is not relevant for this purpose). To understand why, consider the 65 | following three commands: 66 | 67 | ls *.gif | map convert % %B.png # default max-args: 1 68 | map "mkdir %B; tar -C %B -xf %" *.tgz # default max-args: 1 69 | 70 | ls *.gif | map cp % /tmp # default max-args: 100 71 | 72 | In the first and second examples, the command must run individually for each 73 | input, because the command has both the basename of the file (`%B`) and 74 | the full filename (`%`) in the template. In fact, I cannot think of any 75 | command with more than one `%` in the command template, where it makes sense 76 | to have `-n` greater than 1. Hence the default in such cases is 1. 77 | 78 | In the third example, we're OK with running `cp a.gif b.gif c.gif [...] /tmp`, 79 | so the default is 100. 80 | 81 | However, in this case there are also **counter-examples**, where you only need 82 | a single `%`, but max-args also must be limited to 1. A common example is: 83 | 84 | ls *.tar | map tar -xf % # will FAIL if more than one tar file found 85 | 86 | because tar cannot extract more than one archive at a time. In such cases you 87 | **must** use `-n 1`: 88 | 89 | ls *.tar | map -n 1 tar -xf % 90 | 91 | Map can take **arguments from the command line** too, if you want a different 92 | way to execute shell loops. In this case the complete actual command must be 93 | in quotes: 94 | 95 | map "cp % /tmp" *.tgz 96 | # same as: ls *.tgz | map cp % /tmp 97 | 98 | By the way, did you notice that in some of the earlier examples, we had **arguments in 99 | the middle of the command**, yet map substituted more than one in its place? 100 | (`xargs` cannot insert multiple arguments in the middle; only at the end.) 101 | 102 | Map can do more than that! 
If you use "%", or one of its variants, *within* a 103 | word, the **entire word is repeated** when the max-args is greater than one: 104 | 105 | map "du -sm /home/%/mail" alice bob carol 106 | # runs: du -sm /home/alice/mail /home/bob/mail /home/carol/mail 107 | # NOT something silly like: du -sm /home/alice bob carol/mail 108 | 109 | Finally, map also has a **delimiter mode**, which introduces replacement 110 | strings `%1` thru `%9`, letting you do things like this: 111 | 112 | cat /etc/passwd | map -d: 'echo user %1 has shell %7' 113 | 114 | In this mode, you can supply `-ds` to mean whitespace, `-dt` to mean tabs, or 115 | specify the actual delimiter (like the `-d:` above). 116 | 117 | # Environment variables 118 | 119 | * If you want to see what command is *actually* being executed, run it with 120 | the environment variable `D` set to 1. If your environment already uses 121 | "D" for something else, you can use `MAP_D` instead. 122 | 123 | * The built-in default of "4" for max-procs (`-p`) can be overridden by 124 | setting the environment variable `MAP_MAX_PROCS`. For example, your 125 | bashrc may contain: 126 | 127 | # default for -p: half the number of CPUs 128 | export MAP_MAX_PROCS=$(( `nproc` / 2 )) 129 | 130 | * The built-in default of "100" for max-args (`-n`) can be overridden by 131 | setting the environment variable `MAP_MAX_ARGS`. 132 | 133 | # Other notes 134 | 135 | * By default, there's a delay of 0.01 seconds before each fork, to prevent 136 | "accidents". Use `-f` (for 'fullspeed') if you are running lots of 137 | short lived jobs. 138 | 139 | * Since many commands don't produce any output for a long time, map has a 140 | very low-noise way of showing progress: 141 | 142 | $ map -n 1 -p 3 "tar cf %B.tar %" */ 143 | +++-+-+-+--- 144 | 145 | This only kicks in when `-p` is greater than 1 and the D environment 146 | variable (see previous bullet) is unset. Each `+` is a new task started, 147 | and `-` is a task ended. This goes to STDERR. 
148 | 149 | * When map exits, the exit code will be the sum of: 150 | 151 | * if one or more jobs failed, add 1 152 | * if unsafe files (see Appendix 1) were found, add 2 153 | 154 | # Thoughts/discussion 155 | 156 | The rationale for some of the changes and decisions goes like this: 157 | 158 | * The whole "%" versus "%%" business was confusing some people, especially 159 | because of the weirdness of things like "convert %% %%B.png", which won't 160 | work! Being able to produce an actual "%" was also considered important, 161 | and "%%" was best for that. 162 | 163 | One could argue that the current scheme is also confusing, but it leans 164 | towards more explicitness. 165 | 166 | * Using #CPUs as the default for `-p` didn't seem to make sense when most of 167 | my jobs are IO- or network-bound. The default is 4 now, but again, think 168 | of this as increasing explicitness. 169 | 170 | * The choice of "100" for the default max-args is somewhat of an educated 171 | guess. The length of the command line very rarely becomes a factor, IMO, 172 | and if you have a list of say 1000 arguments, then it is arguably better 173 | to fire off 4 parallel runs, each taking 100 arguments, and repeat them as 174 | they finish, than to fire off just one with all 1000 arguments. 175 | 176 | * The "delay" of 0.01 seconds before each fork prevents runaway forking 177 | when you make a mistake (don't ask!). In most normal use, you're more 178 | likely to have a few long running ("long" relative to 0.01 seconds!) tasks 179 | rather than several thousand extremely short-lived ones. So you'll rarely 180 | need to use the `-f` override, but it's there in case you do. 181 | 182 | * Switching from xargs to actually running the parallel stuff natively in perl 183 | turned out to be ridiculously easy, so I did. 184 | 185 | # Appendix 1: Security and bad filenames 186 | 187 | A long time ago, I used to say that people who use spaces in filenames should 188 | be shot. 
I have mellowed a bit from those days, but there are still a lot of 189 | characters which have no good reason to be in a file name -- it's either a 190 | serious bug, or someone trying to attack your system. 191 | 192 | (A summary of "invalid" filenames I found on my machines is at the bottom of 193 | this section.) 194 | 195 | As such, programs that *quietly handle them, with no warnings at all*, are 196 | doing their users a disservice. Getting used to such "smartness" in one 197 | program may backfire unless *all* the programs you use are equally smart *and* 198 | you never write oddball one-off scripts. 199 | 200 | People that hide behind "the standard allows it, so the script must support 201 | it", need to read section 2.2 ("Standards permit the exclusion of bad 202 | filenames") of [this page][dwfn] or section 4.7 ("Filename portability") of 203 | "The Open Group Base Specifications Issue 7 IEEE Std 1003.1, 2013 Edition", 204 | [General Concepts][ogpfn]. Not only are you misinformed, you're misleading 205 | your users/readers with your "standarder-than-thou" snootiness. (If you want 206 | to know who I'm railing at, just search stackoverflow for "newlines in 207 | filenames" or something similar -- there's no shortage of them!) 208 | 209 | [dwfn]: http://www.dwheeler.com/essays/fixing-unix-linux-filenames.html 210 | [ogpfn]: http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap04.html 211 | 212 | Map does allow you to work with such files, but imposes some restrictions. 213 | Briefly, you can't use files that are potentially insecure with a command 214 | template that is guaranteed to require a shell. Here are the details: 215 | 216 | * First, some definitions. "Shell meta characters" are one of 217 | 218 | ` ~ # $ & * ( ) [ ] { } \ | ; ' " < > ? 219 | 220 | A "shell unsafe" filename contains one of these characters or a newline or 221 | a tab (we give spaces a pass -- that ship has sailed, sadly) in the 222 | filename, OR starts with a "-". 
223 | 224 | * If your command template has shell metacharacters, then files whose names 225 | are shell unsafe are rejected (but see note below), with a message printed 226 | to that effect. The remaining files are allowed to pass. **This may be a 227 | problem in some cases, where a partial run is not considered to be valid. 228 | YOU HAVE BEEN WARNED.** If you want to detect such problems before you 229 | run your actual command, just run your file list through: 230 | 231 | ... | map ": > /dev/null" 232 | 233 | (NOTE: you can force such files to be accepted for processing by using 234 | `-i`. This option is intentionally NOT shown in the usage message because 235 | I want people to read this section before they use it.) 236 | 237 | * If your command template does not have any shell metacharacters, nothing 238 | is rejected, and the resulting command(s) are run directly from perl in 239 | "list" mode (see 'perldoc -f exec' etc for details). This is secure. 240 | 241 | * **However, note that the direct command you give to map may itself be 242 | a shell script. There is no way for map to know that, and careless 243 | use could lead to the same problems we are trying to catch/prevent.** 244 | But at least you have some warning from map (see next bullet) if there 245 | are unsafe files in a directory where you would not normally expect 246 | them. 247 | 248 | * Either way, at the end of the run you will see a count of files whose 249 | names were shell-unsafe, if there were any, even if the commands ran in 250 | "list" mode and thus did not reject any of the files. 251 | 252 | Initially, I chose the option of making the "unsafe file" detection tighter 253 | (fewer characters are deemed unsafe), and *never* allowing such filenames to 254 | be processed at all. But I quickly realised that there's value in allowing 255 | direct external commands without using a shell ("rm" comes to mind (heh!) 
but 256 | also potentially "tar", "mv", and so on), so I broadened the criteria to be 257 | considered unsafe, and allowed non-shell (safe) execution. 258 | 259 | The best part is that you can now run something like this: 260 | 261 | find ... | map ": > /dev/null" 262 | 263 | and it'll give you a nice little list of files that you probably want to 264 | rename... 265 | 266 | ...or, worse, didn't expect to find! 267 | 268 | ## summary of files I found on my machines: 269 | 270 | I found the following broad types of unsafe filenames. Except for the second 271 | and third items on the list, I don't ever anticipate **needing** to process 272 | them individually. For those two, I'll only use binaries ("mv", "tar", etc.) 273 | or carefully vetted scripts when processing them via map. 274 | 275 | 1. `~` at the end, like some editors leave (or `pubring.gpg~`). Mine 276 | doesn't, but it might not be a bad idea to make an exception for `~` if it 277 | is not at the beginning of a filename; that should be quite safe. 278 | 279 | 2. parentheses and single quote are sometimes found in media and document 280 | names. 281 | 282 | 3. same with `&`, though a bit less often. 283 | 284 | 4. gpg revocation keys appear to contain the keyid in parens. 285 | 286 | 5. mozilla and thunderbird extensions use braces `{}` around some long GUIDs. 
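Going back to the definition of a "shell unsafe" filename earlier in this appendix, the check can be sketched as a small shell function for vetting names outside of map. `is_shell_unsafe` is a hypothetical name, not something map provides. The character set is copied from the definition above, and the sketch does not claim to match map's behaviour in every corner case.

```sh
# Hypothetical helper (not part of map): succeed (exit 0) when a filename is
# "shell unsafe" as defined in Appendix 1.
is_shell_unsafe() {
    case $1 in
        -*) return 0 ;;        # a leading "-" could be taken as an option
    esac
    # Delete every shell metacharacter, newline and tab; if anything was
    # deleted, the name contained at least one of them.
    cleaned=$(printf '%s' "$1" | tr -d '`~#$&*()[]{}\\|;'"'"'"<>?\n\t')
    [ "$cleaned" != "$1" ]
}

# Spaces are given a pass, so only the last three names are reported.
for name in 'holiday photo.jpg' 'pubring.gpg~' '-rf' 'tom & jerry.mkv'; do
    if is_shell_unsafe "$name"; then
        printf 'unsafe: %s\n' "$name"
    fi
done
```

Comparing the name before and after deletion avoids writing the metacharacters into a shell pattern, where quoting them correctly is error-prone.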
287 | -------------------------------------------------------------------------------- /map: -------------------------------------------------------------------------------- 1 | #!/usr/bin/perl 2 | use strict; 3 | use warnings; 4 | use 5.10.0; 5 | use Data::Dumper; 6 | $Data::Dumper::Terse = 1; 7 | $Data::Dumper::Indent = 0; 8 | 9 | use Time::HiRes qw(sleep); 10 | use POSIX ":sys_wait_h"; 11 | use Getopt::Std; 12 | 13 | # ---------------------------------------------------------------------- 14 | # constants, globals, and option handling 15 | use vars qw($opt_n $opt_p $opt_d $opt_f $opt_i $opt_h $cmdt %pid); 16 | my $qr_shell_meta = qr([`~#\$&*()\[\]\{\}\\|;'"<>?]); 17 | my $qr_unsafe_file = qr(^-|[\n\t]|$qr_shell_meta); 18 | my $cmdt_has_shell_meta = 0; 19 | my $unsafe_count = 0; 20 | my $fail_count = 0; 21 | my $D = $ENV{MAP_D} // ( ( $ENV{D} || 0 ) =~ /^[0-9]$/ ? $ENV{D} : 0 ); 22 | 23 | usage() unless @ARGV; 24 | 25 | # -N as *first* arg is shortcut for "-p1 -nN"; do this before calling getopt 26 | splice @ARGV, 0, 1, "-p1", "-n$1" if ( $ARGV[0] =~ /^-(\d+)$/ ); 27 | getopts('n:p:d:fih'); 28 | usage() if $opt_h; 29 | 30 | # ---------------------------------------------------------------------- 31 | # main 32 | 33 | setup(); # uses up $ARGV[0], at least 34 | _log( 1, "input options: -n $opt_n -p $opt_p, template: '$cmdt'" ); 35 | 36 | if (@ARGV) { 37 | batcher($_) for @ARGV; 38 | batcher(undef); # undef signals 'eof' to batcher 39 | } else { 40 | while (<>) { 41 | chomp; next unless /\S/; 42 | batcher($_); 43 | } 44 | batcher(undef); 45 | } 46 | 47 | # stragglers 48 | 1 while _wait() >= 0; 49 | 50 | # status 51 | $unsafe_count = 0 if $opt_i; # user doesn't care! 52 | _log( 0, "==== $unsafe_count unsafe filenames found ====" ) if $unsafe_count; 53 | _log( 0, "==== $fail_count jobs failed ====" ) if $fail_count; 54 | exit( ( $fail_count ? 1 : 0 ) + ( $unsafe_count ? 
2 : 0 ) ); 55 | 56 | # ---------------------------------------------------------------------- 57 | # setup 58 | 59 | sub setup { 60 | @ARGV = ( $ENV{SHELL}, "-c" ) unless @ARGV; 61 | 62 | if ( -t 0 ) { 63 | $cmdt = shift @ARGV; 64 | } else { 65 | $cmdt = join( " ", @ARGV ); @ARGV = (); 66 | } 67 | $opt_p ||= ( $ENV{MAP_MAX_PROCS} // 4 ); 68 | 69 | my $count = ( $cmdt =~ s/%/%/g ); # count %s 70 | $count -= ( ( $cmdt =~ s/%%/%%/g ) * 2 ); # subtract count of %% 71 | $cmdt .= " %F" unless $count; # append single %F if needed 72 | $opt_n ||= ( $count > 1 ? 1 : ( $ENV{MAP_MAX_ARGS} // 100 ) ); 73 | 74 | # default % is %F unless followed by D/B/E/%/1-9 75 | $cmdt =~ s/%([FDBE%1-9])|%(.|$)/"%" . ($1 ? $1 : "F$2")/ge; 76 | 77 | $cmdt_has_shell_meta = 1 if $cmdt =~ $qr_shell_meta; 78 | } 79 | 80 | # ---------------------------------------------------------------------- 81 | # batcher 82 | 83 | sub batcher { 84 | state $count = 0; 85 | state @batch; 86 | 87 | my $arg = shift; 88 | 89 | if ( defined($arg) ) { 90 | if ( $arg =~ $qr_unsafe_file ) { 91 | $unsafe_count++; 92 | if ( not $opt_i and $cmdt_has_shell_meta ) { 93 | # the command template requires shell, but the argument is not shell safe 94 | _log( 0, "rejected: '$arg'" ); 95 | return; 96 | } 97 | } 98 | push @batch, $arg; 99 | $count++; 100 | } 101 | 102 | if ( $count >= $opt_n or ( $count and !defined($arg) ) ) { 103 | builder(@batch); 104 | $count = 0; 105 | @batch = (); 106 | } 107 | } 108 | 109 | # ---------------------------------------------------------------------- 110 | # builder 111 | 112 | sub builder { 113 | my @batch = @_; 114 | 115 | my @out = (); 116 | for my $word ( ( split ' ', $cmdt ) ) { 117 | # every word in the command template needs to be examined for %[FDBE1-9] 118 | if ( $word !~ /(? '%', F => $f, D => $d, B => $b, E => $e ); 146 | $word =~ s/%([FDBE%1-9])/exists($x{$1}) ? 
$x{$1} : $argwords[$1]/ge; 147 | 148 | return $word; 149 | } 150 | 151 | sub split_path { 152 | local $_ = shift; 153 | s(/$)(); # remove irritating trailing slash if present 154 | my $f = $_; 155 | my ( $d, $b, $e ) = ('') x 3; 156 | $d = s(^(.*)/)() ? $1 : "."; 157 | if (m(^(.*)(\.[^.]+)$)) { 158 | ( $b, $e ) = ( $1, $2 ); 159 | } else { # there was no period 160 | $b = $_; 161 | } 162 | return ( $f, $d, $b, $e ); 163 | } 164 | 165 | # ---------------------------------------------------------------------- 166 | # runner 167 | 168 | sub runner { 169 | my @cmd = @_; 170 | 171 | while ( keys(%pid) >= $opt_p ) { 172 | my $pid = _wait(); 173 | delete $pid{$pid}; 174 | } 175 | 176 | my $pid = spawn(@cmd); 177 | $pid{$pid} = join( " ", @cmd ); 178 | _log( 1, "$pid\t" . ( $cmdt_has_shell_meta ? $pid{$pid} : Dumper( \@cmd ) ) ); 179 | } 180 | 181 | sub spawn { 182 | my @cmd = @_; 183 | 184 | # don't spawn more than 100 per second unless '-f' supplied 185 | sleep 0.01 unless $opt_f; 186 | 187 | my $pid = fork; 188 | die "fork: $!" unless defined $pid; 189 | return $pid if $pid; 190 | 191 | # if running only one-at-a-time, let STDIN be /dev/tty, a la `xargs -o` 192 | open(STDIN, "<", "/dev/tty") if $opt_p == 1; 193 | 194 | if ($cmdt_has_shell_meta) { 195 | # command *needs* to be run from the shell, since it has meta 196 | # characters. We have also rejected filenames which could cause a 197 | # problem so it's safe to run this way. 
198 | my $cmd = join( " ", @cmd ); 199 | exec $ENV{SHELL}, "-c", $cmd; 200 | } else { 201 | # command doesn't need to be run from a shell (and some arguments 202 | # *may* be shell unsafe too), so run it securely 203 | exec { $cmd[0] } @cmd; 204 | } 205 | 206 | die "exec: $!"; 207 | } 208 | 209 | sub _wait { 210 | my $pid = wait(); 211 | return $pid if $pid == -1; # no more processes to wait on 212 | my $es = $?; 213 | return $pid unless $es or $D; # nothing to see, nothing asked to show 214 | 215 | $fail_count++ if $es; 216 | if ( $es == -1 ) { 217 | $es = "FAILED: $!"; 218 | } elsif ( $es & 127 ) { 219 | $es = sprintf "KILLED: signal %d, %s coredump", ( $es & 127 ), ( $es & 128 ) ? 'with' : 'without'; 220 | } else { 221 | $es = sprintf "EXITED: exit status %d", $es >> 8; 222 | } 223 | 224 | # you have to show *something*, whether because es != 0 or D != 0. For D 225 | # != 0, the command need not be shown; it was already shown when it started 226 | _log( 0, "$pid\t$es" . ( $D ? "" : "\n\t$pid{$pid}" ) ); 227 | return $pid; 228 | } 229 | 230 | # ---------------------------------------------------------------------- 231 | # service routines 232 | 233 | sub gen_ts { 234 | my ( $s, $m, $h ) = (localtime)[ 0 .. 2 ]; 235 | for ( $s, $m, $h ) { 236 | $_ = "0$_" if $_ < 10; 237 | } 238 | return "[$h:$m:$s] "; 239 | } 240 | 241 | sub _log { 242 | my ( $lvl, $msg ) = @_; 243 | return if $lvl > ( $D || 0 ); 244 | say STDERR gen_ts . 
$msg; 245 | } 246 | 247 | # ---------------------------------------------------------------------- 248 | # usage 249 | 250 | sub usage { 251 | say <DATA>; 252 | exit 1; 253 | } 254 | 255 | __DATA__ 256 | 257 | Usage: map [-p maxprocs] [-n max-args] [-ds|-dt|-dX] [-f] command-template [args] 258 | 259 | -p: maxprocs: number of commands to run in parallel (default 4) 260 | -n: max-args: number of arguments in each invocation of a command (default is 261 | either 1 or 100; see documentation) 262 | -d: delimiter mode: see documentation 263 | -f: fullspeed; do not delay 0.01s before each fork 264 | 265 | Also, if the FIRST argument is '-1' (in general, '-N' for any integer N), this 266 | is converted to '-p1 -nN'. That is, single-processing, N arguments per command. 267 | 268 | Please note THIS IS A COMPLETE REWRITE (Nov 2015). There are several 269 | backward-incompatible changes from the old "map" command. Please see 270 | http://github.com/sitaramc/map/index.mkd for details. 271 | 272 | -------------------------------------------------------------------------------- /map-vs-gp.mkd: -------------------------------------------------------------------------------- 1 | # Map versus GNU Parallel 2 | 3 | Here's a feature comparison of map versus GNU parallel, mostly using examples 4 | in their [man page](http://www.gnu.org/software/parallel/man.html). 5 | 6 | Note that GNU parallel appears to have an unholy obsession with files that 7 | have funky names, like filenames with **newlines** in them. One of their 8 | examples is a file called `My brother's 12" records` (no comments from the 9 | peanut gallery please!). 10 | 11 | In real life, if you have a file with a newline or a double quote (or 12 | combination of quotes) in it, it's more likely someone trying to attack you 13 | than anything else. See Appendix 1 in the main documentation for a longer 14 | rant on this. 
15 | 16 | # Summary 17 | 18 | Anyway, the [main man page](https://www.gnu.org/software/parallel/man.html) 19 | has 48 examples (as of 2015-11). Here are some meta-comments about those 20 | examples that didn't translate cleanly (or in some cases didn't translate at 21 | all) in map: 22 | 23 | * Cartesian product: Map can do that by piping to another map (probably ad 24 | infinitum), but on the rare occasions I have needed them, I used a very 25 | simple program; see Appendix 1 (it's small enough that I can -- if I ever 26 | feel the need -- roll it into map, too). Almost all of the examples 27 | involving CPs can be done using that, with the final result piped to map's 28 | delimiter mode; see example 13 below. 29 | 30 | * Embedding perl code: we don't do that, but it can't be that hard, if 31 | needed. If someone sends me a really convincing -- as in, not contrived 32 | -- use case I will probably add it. 33 | 34 | * Grouping output lines: I have never come across a case where I want 35 | parallelism as well as ordered output. On the rare cases that I did, I 36 | just add `| tail -9999` to the command template; works fine. 37 | 38 | * Tagging output lines: Even rarer than the above. If needed, I would 39 | probably just add `| sed -e "s/^/% /"`. 40 | 41 | * Keep output order same as input: Again, I never needed that *and* 42 | parallelism. Unlike the previous two examples, I don't have a ready 43 | alternative either. So this is definitely something I may look into if I 44 | ever see the need (or someone asks). 45 | 46 | * Splitting a big file and sending it to different invocations: I had a 47 | program called split_map in the old 'map' but I don't think I ever used 48 | it so I got rid of it. (If someone was using it and needs it back, let me 49 | know...) 50 | 51 | * Dealing with remote systems, sending files back and forth, and so on: 52 | fuggedaboudit! 
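For the "tagging output lines" case above, the `sed` one-liner works as long as the argument contains no `/` or `&`, since both are special in a `sed` substitution. A plain shell loop avoids that. `tag_lines` below is a hypothetical helper, not something map provides; it only shows the idea of prefixing each output line with the argument that produced it.

```sh
# Hypothetical helper (not part of map): prefix every line read from stdin
# with the given tag and a space.
tag_lines() {
    tag=$1
    while IFS= read -r line; do
        printf '%s %s\n' "$tag" "$line"
    done
}

# The tag may contain "/" or "&" without any escaping.
printf 'line one\nline two\n' | tag_lines 'docs/a&b.txt'
```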
53 | 54 | PS: Honestly, Parallel has so many options for so many things I'm disappointed 55 | it doesn't come with a game of tetris to play while you're waiting for your 56 | jobs to complete ;-) 57 | 58 | # Details 59 | 60 | Let's briefly recap map's options, defaults, and argument handling first. The 61 | default max-args is 100, or 1 if the template has more than one `%` (`-n` option), and the default parallelism is 4 (`-p` 62 | option). The documentation explains the rationale behind these defaults. The 63 | command template can contain `%` (which is equivalent to `%F`), or `%D`, `%B`, and 64 | `%E`. Generally, `%F` equals `%D/%B.%E`. In 'delimiter mode', you can use 65 | `%1` thru `%9`. 66 | 67 | With that out of the way, here are the details for some of the interesting 68 | ones, as briefly as I could write them up. I've left out many that are 69 | boring, trivially doable, or repeat an earlier example with marginal changes. 70 | Stuff that is completely outside the core idea (like all the remote file 71 | handling stuff) is also ignored. And I've already ranted about filenames with 72 | quote characters and so on. 73 | 74 | * example 1, xargs -n1 75 | 76 | parallel gzip --best 77 | map -n1 "gzip --best" 78 | 79 | * example 2, arguments from command line 80 | 81 | parallel gzip --best ::: *.html 82 | map -n1 "gzip --best" *.html 83 | 84 | * example 3, inserting multiple arguments 85 | 86 | parallel -m mv {} destdir 87 | map "mv % destdir" 88 | 89 | * example 4, context replace 90 | 91 | seq -w 0 9999 | parallel -X rm pict{}.jpg 92 | seq -w 0 9999 | map "rm pict%.jpg" 93 | 94 | * example 5, compute intensive jobs and substitution 95 | 96 | find . -name '*.jpg' | parallel convert -geometry 120 {} {}_thumb.jpg 97 | find . -name '*.jpg' | map "convert -geometry 120 % %_thumb.jpg" 98 | 99 | find . -name '*.jpg' | parallel convert -geometry 120 {} {.}_thumb.jpg 100 | find . 
-name '*.jpg' | map "convert -geometry 120 % %D/%B_thumb.jpg" 101 | 102 | * example 6, substitution and redirection 103 | 104 | parallel "zcat {} >{.}" ::: *.gz 105 | map "zcat % > %B" *.gz 106 | 107 | * example 8, calling bash functions 108 | 109 | So, after impressing us with their ability to handle shell functions 110 | entirely typed at the command line (in example 7), they now stoop down to 111 | how we mere mortals can/should do it. 112 | 113 | Anyway, this works fine (in bash), using map. 114 | 115 | * example 10, log rotate 116 | 117 | seq 9 -1 1 | parallel -j1 mv log.{} log.'{= $_++ =}' 118 | seq 9 -1 1 | map -p1 'echo mv log.% log.$(( % + 1 ))' 119 | 120 | * example 11, removing file extension... 121 | 122 | This is covered extensively in the main documentation. Not sure what 123 | parallel's `{.}` means but for map, you use `%B` if you are sure the file 124 | doesn't have a directory component, and `%D/%B` otherwise. (Or you can 125 | play safe by always using `%D/%B`.) 126 | 127 | * example 12, removing two file extensions 128 | 129 | parallel --plus 'mkdir {..}; tar -C {..} -xf {}' ::: *.tar.gz 130 | 131 | map -1 "echo % %B" *.tar.gz | map -1 echo %B | 132 | map -1 -ds "echo mkdir %2; echo tar -C %2 -xf %1" 133 | 134 | The first map takes "foo.tar.gz" and prints "foo.tar.gz foo.tar". That's 135 | on one line. The next map takes that, and since map treats the entire line 136 | as one argument, it chops off what it sees as the extension, giving us 137 | "foo.tar.gz foo". Then delimiter mode picks up the two fields and 138 | substitutes them for %1 and %2 in the appropriate places. 139 | 140 | But yeah, their solution looks simpler. However, I won't be adding a 141 | "%BB" or something; it's not that important. 
142 | 143 | * example 13, download 10 images for past 30 days 144 | 145 | parallel echo http://foo.com/'$(date -d "today -{1} days" +%Y%m%d)_{2}.jpg' ::: $(seq 6) ::: $(seq -w 2) 146 | 147 | This is basically a cartesian product, with some backtick interpolation 148 | thrown in to complicate things needlessly. (By which I mean, the date 149 | command could have been pushed into the `$(seq 6)` piece, and it would 150 | have been much clearer overall.) 151 | 152 | seq 6 | map -1 'date -d "today -% days" +%%Y%%m%%d' | 153 | map -1 'seq -w 2 | map -1 echo % %%' | 154 | map -1 -ds "echo wget http://foo.com/%1_%2.jpg" 155 | 156 | The first line produces a series of dates (YYYYmmdd), the second creates a 157 | cartesian product with the sequence "1" and "2", producing lines like 158 | "20151115 1", "20151115 2", etc., and the third line uses delimiter mode 159 | to create the actual command from the two fields in each line. 160 | 161 | But honestly, if I needed a cartesian product I'd use a program meant to 162 | do just that; see Appendix 1. 163 | 164 | * example 14, embedding perl code 165 | 166 | The stated purpose is "Copy files as last modified date (ISO8601) with 167 | added random digits", but why would you do that? 168 | 169 | find . | parallel cp {} \ 170 | '../destdir/{= $a=int(10000*rand); $_=`date -r "$_" +%FT%T"$a"`; chomp; =}' 171 | 172 | Map doesn't allow embedded perl code, but this specific example can be 173 | done -- for what it is worth with such a meaningless exercise -- in shell 174 | too: 175 | 176 | find . | map -1 'cp % ../dst/`date -r % +%%FT%%T `_$RANDOM' 177 | 178 | * example 25, supposedly about "using shell variables" 179 | 180 | Please see the appendix on "security" in the documentation for map. 181 | 182 | * example 29, parallel grep: "on multi-core CPUs [...] can often speed this 183 | up", says the manual. 184 | 185 | Not for most people. 
I've seen people make measurements on a warm cache 186 | and walk away blithely saying "grep is not IO-bound". Sorry, but it is. 187 | Disk speeds -- and especially latency -- have not kept pace with CPU. 188 | 189 | There are two exceptions to this. One, your regex is pathologically bad, 190 | (or you need to dust off your regex book). Two, you are using high end 191 | SSD, because then not only is the raw speed much more, there is no 192 | latency. 193 | 194 | Anyway, the only part of their examples that map does not do, is the "keep 195 | order". (Curiously, they are feeding it from `find` without a `sort`, 196 | which makes the "keep order" somewhat useless; I have never known `find` 197 | to print filenames in any predictable order.) 198 | 199 | * example 39, run the same command 10 times 200 | 201 | seq 10 | parallel -n0 my_command my_args 202 | seq 10 | map "my_command my_args # %" 203 | 204 | yeah, it's a trick but it works fine! 205 | 206 | * example 40, working as 'cat | sh' 207 | 208 | parallel -j 100 < jobs_to_run 209 | map -p100 -n1 < jobs_to_run 210 | 211 | # Appendix 1: cartesian products 212 | 213 | I have rarely found a need for a cartesian product but it does happen. Long 214 | before I wrote `map`, I had the following script in my `~/bin`: 215 | 216 | #!/usr/bin/perl 217 | 218 | # seq 1 3 | cart-prod a b c | cart-prod one two 219 | 220 | use strict; 221 | use warnings; 222 | 223 | my @args = @ARGV; 224 | @ARGV = (); 225 | 226 | while (<>) { 227 | chomp; 228 | for my $a (@args) { 229 | print "$_\t" if $_; 230 | print "$a\n"; 231 | } 232 | } 233 | 234 | If I had to do example 13 it would be: 235 | 236 | seq 6 | cart-prod {1..2} | while read a b 237 | do 238 | echo wget http://www.example.com/path/to/$(date -d "today -$a days" +%Y%m%d)_$b.jpg 239 | done 240 | 241 | which is a lot more readable than Parallel's ":::" syntax. 
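If you'd rather not keep a separate Perl script around, roughly the same filter can be sketched as a shell function. `cart_prod` is a hypothetical name. Unlike the Perl version, which skips the prefix for any value Perl treats as false, this sketch treats a line containing just `0` as a normal non-empty line.

```sh
# Hypothetical shell counterpart of the cart-prod script above: for every
# line on stdin, print one line per argument, joined with a tab.
cart_prod() {
    while IFS= read -r line; do
        for arg in "$@"; do
            if [ -n "$line" ]; then
                printf '%s\t%s\n' "$line" "$arg"
            else
                printf '%s\n' "$arg"      # empty input line: no prefix
            fi
        done
    done
}

# Two input lines times two arguments gives four tab-separated lines.
printf '1\n2\n' | cart_prod a b
```

Stages chain the same way as with the script, for example `... | cart_prod a b c | cart_prod one two`.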
242 | 243 | Similarly, the first part of example 16 would be: 244 | 245 | seq 5 | cart-prod {01..10} | cart-prod {1..5} | map -ds 'echo x%1y%2z%3 > x%1y%2z%3' 246 | 247 | And example 21 (finding the lowest difference between files) would be: 248 | 249 | ls | cart-prod * | map -1 -ds 'echo %1 %2\\t`diff %1 %2 | wc -l`' | sort -k3n 250 | --------------------------------------------------------------------------------