├── .gitignore
├── Makefile
├── NOTES.textile
├── README
├── br
└── brutils
    ├── Makefile
    ├── README
    ├── brm.c
    ├── brp.c
    └── brutils.h
/.gitignore:
--------------------------------------------------------------------------------
1 | brutils/brm
2 | brutils/brp
3 | *.o
4 | 
--------------------------------------------------------------------------------
/Makefile:
--------------------------------------------------------------------------------
 1 | BINDIR=/usr/local/bin
 2 | 
 3 | all:
 4 | 	$(MAKE) -C brutils all
 5 | 
 6 | clean:
 7 | 	$(MAKE) -C brutils clean
 8 | 
 9 | install: all
10 | 	$(MAKE) -C brutils install
11 | 	install -c br $(BINDIR)
12 | 
13 | uninstall:
14 | 	$(MAKE) -C brutils uninstall
15 | 	rm -f $(BINDIR)/br
16 | 
--------------------------------------------------------------------------------
/NOTES.textile:
--------------------------------------------------------------------------------
 1 | h2. bashreduce : mapreduce in a bash script
 2 | 
 3 | bashreduce lets you apply your favorite unix tools in a mapreduce fashion across multiple machines/cores. There's no installation, administration, or distributed filesystem. You'll need:
 4 | 
 5 | * "br":http://github.com/erikfrey/bashreduce/blob/master/br somewhere handy in your path
 6 | * vanilla unix tools: sort, awk, ssh, netcat
 7 | * password-less ssh to each machine you plan to use
 8 | 
 9 | h2. Configuration
10 | 
11 | Edit @/etc/br.hosts@ and enter the machines you wish to use as workers. Or specify your machines at runtime:
12 | 
br -m "host1 host2 host3"
14 | 
15 | To take advantage of multiple cores, repeat the host name.
16 | 
17 | h2. Examples
18 | 
19 | h3. sorting
20 | 
21 | br < input > output
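If your key isn't the first column, point br at it with -c; a minimal variant of the same sort, keyed on the second whitespace-delimited column (input/output are placeholders):

br -c 2 < input > output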
22 | 
23 | h3. word count
24 | 
br -r "uniq -c" < input > output
26 | 
27 | h3. great big join
28 | 
29 | LC_ALL='C' br -r "join - /tmp/join_data" < input > output
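The reduce program runs on the workers, so @/tmp/join_data@ has to exist on every host, pre-sorted on the join field (the LC_ALL='C' keeps sort and join agreed on collation). A sketch that stages it out first, with placeholder host names:

LC_ALL='C' sort -k1,1 join_data > /tmp/join_data
for h in host1 host2 host3; do scp /tmp/join_data $h:/tmp/join_data; done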
30 | 
31 | h2. Performance
32 | 
33 | h3. big honkin' local machine
34 | 
35 | Let's start with a simpler scenario: I have a machine with multiple cores, but with normal unix tools I'm relegated to using just one of them. How does br help us here? Here's br on an 8-core machine, essentially operating as a poor man's multi-core sort:
36 | 
37 | |_. command |_. using |_. time |_. rate |
38 | | sort -k1,1 -S2G 4gb_file > 4gb_file_sorted | coreutils | 30m32.078s | 2.24 MBps |
39 | | br -i 4gb_file -o 4gb_file_sorted | coreutils | 11m3.111s | 6.18 MBps |
40 | | br -i 4gb_file -o 4gb_file_sorted | brp/brm | 7m13.695s | 9.44 MBps |
41 | 
42 | The job becomes completely i/o bound, but that's still a reasonable gain!
43 | 
44 | h3. many cheap machines
45 | 
46 | Here lies the promise of mapreduce: rather than use my big honkin' machine, I have a bunch of cheaper machines lying around that I can distribute my work to. How does br behave when I add four cheaper 4-core machines into the mix?
47 | 
48 | |_. command |_. using |_. time |_. rate |
49 | | sort -k1,1 -S2G 4gb_file > 4gb_file_sorted | coreutils | 30m32.078s | 2.24 MBps |
50 | | br -i 4gb_file -o 4gb_file_sorted | coreutils | 8m30.652s | 8.02 MBps |
51 | | br -i 4gb_file -o 4gb_file_sorted | brp/brm | 4m7.596s | 16.54 MBps |
52 | 
53 | We have a new bottleneck: we're limited by how quickly we can partition our dataset and pump it out to the nodes. awk and sort begin to show their limitations (our clever awk script is a bit cpu bound, and @sort -m@ can only merge so many files at once). So we use two little helper programs written in C (yes, I know! it's cheating! if you can think of a better partition/merge using core unix tools, contact me) to partition the data and merge it back.
54 | 
55 | h3. Future work
56 | 
57 | I've tested this on Ubuntu/Debian, but not on other distros. According to Daniel Einspanjer, netcat takes different parameters on Red Hat.
58 | 
59 | br has a poor man's dfs like so:
60 | 
br -r "cat > /tmp/myfile" < input
62 | 
63 | But this breaks if you specify the same host multiple times. Maybe some kind of very basic virtualization is in order. Maybe.
64 | 
65 | Other niceties would be to more closely mimic the options presented in sort (numeric, reverse, etc).
66 | 
--------------------------------------------------------------------------------
/README:
--------------------------------------------------------------------------------
  1 | NAME
  2 |     br - bashreduce, map/reduce in bash
  3 | 
  4 | SYNOPSIS
  5 |     br [-h '<host>[ <host>][...]'] [-f]
  6 |        [-m <map>] [-r <reduce>] [-M <merge>]
  7 |        [-i <input>] [-o <output>] [-c <column>] [-S <memory>]
  8 |        [-t <tmp dir>] [-?]
  9 | 
 10 | DESCRIPTION
 11 |     br implements a map/reduce framework that allows the map and reduce
 12 |     jobs to be constructed from standard Unix tools. It can operate on
 13 |     data piped into it or on files known to exist on the worker nodes.
 14 | 
 15 | INSTALLATION
 16 |     br runs out of the box but, for convenience, put it somewhere on
 17 |     your PATH. The brutils come with a Makefile that will install
 18 |     these tools, by default to /usr/local/bin. If you leave these tools
 19 |     in place relative to br, they'll be picked up and used from there.
 20 | 
 21 |     You'll need passwordless ssh access to each machine you wish to
 22 |     use as a worker (even localhost, for now).
 23 | 
 24 | OPTIONS
 25 |     -h  The hosts to be used, specified as a space-
 26 |         delimited list. Specifying the same host multiple
 27 |         times is allowed and is useful for taking explicit
 28 |         advantage of multiple cores.
 29 | 
 30 |     -f  Only pass filenames from the master to the workers,
 31 |         under the assumption that they all have a mirror of
 32 |         the data set. This is equivalent to prefixing your
 33 |         map program with "xargs cat |".
 34 | 
 35 |     -m  The map program, which should expect initial data on
 36 |         stdin and produce the intermediate data on stdout.
 37 |         The stderr from this program will be logged [1].
 38 | 
 39 |     -r  The reduce program, which should expect intermediate
 40 |         data on stdin and produce the final data on stdout.
 41 |         The stderr from this program will be logged [1].
 42 | 
 43 |     -M  The merge program, which should expect each node's
 44 |         final data on stdin and produce the unified final
 45 |         data on stdout. The stderr from this program will
 46 |         be logged [1].
 47 | 
 48 |     -i  A file or directory to serve as input. If it is a
 49 |         file, this is equivalent to piping that file into
 50 |         br. If it is a directory, it is equivalent to
 51 |         piping the names of every file in that directory
 52 |         into br.
 53 | 
 54 |     -o  The local file where output will be stored.
 55 |         Defaults to stdout.
 56 | 
 57 |     -c  The column used by sort to produce the final output.
 58 |         Defaults to 1.
 59 | 
 60 |     -S  This is passed directly to sort(1) as its -S option.
 61 |         It defaults to 256M.
 62 | 
 63 |     -t  The temporary directory for storing br files. All of
 64 |         these files will be prefixed with "br_" and, with the
 65 |         exception of br_stderr, will be cleaned up upon
 66 |         successful exit. Defaults to /tmp.
 67 | 
 68 |     -?  Shows a help message and details about the arguments.
 69 | 
 70 | FILES
 71 |     All of these files are in the tmp directory specified by the user,
 72 |     which defaults to /tmp.
 73 | 
 74 |     br_job_*
 75 |         The master uses named pipes to move data (or filenames)
 76 |         to the worker(s). These pipes are stored here. This
 77 |         directory is removed at the end of the job.
 78 | 
 79 |     br_node_*
 80 |         Each node uses named pipes to buffer data. These
 81 |         are stored here and do not overlap, even if a
 82 |         node is specified twice in the host list. These
 83 |         directories are removed at the end of the job.
 84 | 
 85 |     br_stderr
 86 |         This file is cleared at the beginning of each
 87 |         job. It records stderr from most commands run
 88 |         by br. Concurrent jobs will currently contend
 89 |         for this file. This file is NOT removed at the
 90 |         end of the job.
 91 | 
 92 | NOTES
 93 |     [1] Because you can specify a pipeline for any or all of the map,
 94 |         reduce and merge steps, br can only reliably log stderr from
 95 |         the last step in those pipelines.
 96 | 
 97 | AUTHORS
 98 |     Erik Frey <erik@fawx.com>
 99 |     Richard Crowley
100 | 
101 | SEE ALSO
102 | 
103 | 
104 | 
105 | LICENSE
106 |     The goodwill of Erik Frey.
107 | 
--------------------------------------------------------------------------------
/br:
--------------------------------------------------------------------------------
  1 | #!/bin/bash
  2 | # bashreduce: mapreduce in bash
  3 | # erik@fawx.com
  4 | 
  5 | usage() {
  6 |   echo "Usage: $1 [-h '<host>[ <host>][...]'] [-f]" >&2
  7 |   echo "       [-m <map>] [-r <reduce>] [-M <merge>]" >&2
  8 |   echo "       [-i <input>] [-o <output>] [-c <column>]" \
  9 |        "[-S <memory>]" >&2
 10 |   echo "       [-t <tmp dir>] [-?]" >&2
 11 |   if [ -n "$2" ] ; then
 12 |     echo "  -h  hosts to use; repeat hosts for multiple cores" >&2
 13 |     echo "      (defaults to contents of /etc/br.hosts)" >&2
 14 |     echo "  -f  send filenames, not data, over the network," >&2
 15 |     echo "      implies each host has a mirror of the dataset" >&2
 16 |     echo "  -m  map program (effectively defaults to cat)" >&2
 17 |     echo "  -r  reduce program" >&2
 18 |     echo "  -M  merge program" >&2
 19 |     echo "  -i  input file or directory (defaults to stdin)" >&2
 20 |     echo "  -o  output file (defaults to stdout)" >&2
 21 |     echo "  -c  column used by sort (defaults to 1)" >&2
 22 |     echo "  -S  memory to use for sort (defaults to 256M)" >&2
 23 |     echo "  -t  tmp directory (defaults to /tmp)" >&2
 24 |     echo "  -?  this help message" >&2
 25 |   fi
 26 |   exit 2
 27 | }
 28 | 
 29 | # Defaults
 30 | hosts=
 31 | filenames=false
 32 | map=
 33 | reduce=
 34 | merge=
 35 | input=
 36 | output=
 37 | column=1
 38 | sort_mem=256M
 39 | tmp=/tmp
 40 | 
 41 | program=$(basename $0)
 42 | while getopts "h:fm:r:M:i:o:c:S:t:?" name; do
 43 |   case "$name" in
 44 |     h) hosts=$OPTARG;;
 45 |     f) filenames=true;;
 46 |     m) map=$OPTARG;;
 47 |     r) reduce=$OPTARG;;
 48 |     M) merge=$OPTARG;;
 49 |     i) input=$OPTARG;;
 50 |     o) output=$OPTARG;;
 51 |     c) column=$OPTARG;;
 52 |     S) sort_mem=$OPTARG;;
 53 |     t) tmp=$OPTARG;;
 54 |     ?) usage $program MOAR;;  # any second argument triggers the long help
 55 |     *) usage $program;;
 56 |   esac
 57 | done
 58 | 
 59 | # If -h wasn't given, try /etc/br.hosts
 60 | if [[ -z "$hosts" ]]; then
 61 |   if [[ -e /etc/br.hosts ]]; then
 62 |     hosts=$(cat /etc/br.hosts)
 63 |   else
 64 |     echo "$program: must specify -h or provide /etc/br.hosts" >&2
 65 |     usage $program
 66 |   fi
 67 | fi
 68 | 
 69 | # Start br_stderr from a clean slate
 70 | cp /dev/null $tmp/br_stderr
 71 | 
 72 | # Set up map and reduce as parts of a pipeline
 73 | [[ -n "$map" ]] && map="| $map 2>>$tmp/br_stderr"
 74 | [[ $filenames == true ]] && map="| xargs -n1 \
 75 |   sh -c 'zcat \$0 2>>$tmp/br_stderr || cat \$0 2>>$tmp/br_stderr' $map"
 76 | [[ -n "$reduce" ]] && reduce="| $reduce 2>>$tmp/br_stderr"
 77 | 
 78 | jobid="$(uuidgen)"
 79 | jobpath="$tmp/br_job_$jobid"
 80 | nodepath="$tmp/br_node_$jobid"
 81 | mkdir -p $jobpath/{in,out}
 82 | 
 83 | port_in=8192
 84 | port_out=$(($port_in + 1))
 85 | host_idx=0
 86 | out_files=
 87 | for host in $hosts; do
 88 |   mkfifo $jobpath/{in,out}/$host_idx
 89 | 
 90 |   # Listen for work (remote)
 91 |   ssh -n $host "mkdir -p $nodepath/"
 92 |   pid=$(ssh -n $host "nc -l -p $port_out >$nodepath/$host_idx \
 93 |     2>>$tmp/br_stderr </dev/null & jobs -l" \
 94 |     | awk '{print $2}')
 95 | 
 96 |   # Do work (remote)
 97 |   ssh -n $host "tail -s0.1 -f --pid=$pid $nodepath/$host_idx \
 98 |     2>>$tmp/br_stderr \
 99 |     | sort -S$sort_mem -T$tmp -k$column,$column \
100 |     2>>$tmp/br_stderr \
101 |     $map $reduce \
102 |     | nc -q0 -l -p $port_in >>$tmp/br_stderr &"
103 | 
104 |   # Receive results (local)
105 |   nc $host $port_in >$jobpath/in/$host_idx &
106 | 
107 |   # Send work (local)
108 |   nc -q0 $host $port_out <$jobpath/out/$host_idx &
109 |   out_files="$out_files $jobpath/out/$host_idx"
110 | 
111 |   # ++i
112 |   port_in=$(($port_in + 2))
113 |   port_out=$(($port_in + 1))
114 |   host_idx=$(($host_idx + 1))
115 | 
116 | done
117 | 
118 | # Create the command to produce input
119 | if [[ -d "$input" ]]; then
120 |   input="find $input -type f |"
121 |   [[ $filenames == false ]] && input="$input xargs -n1 \
122 |     sh -c 'zcat \$0 2>>$tmp/br_stderr || cat \$0 2>>$tmp/br_stderr' |"
123 | elif [[ -f "$input" ]]; then
124 |   input="sh -c 'zcat $input 2>>$tmp/br_stderr \
125 |     || cat $input 2>>$tmp/br_stderr' |"
126 | else
127 |   input=
128 | fi
129 | 
130 | # Partition local input to the remote workers
131 | if which brp >>$tmp/br_stderr; then
132 |   BRP=brp
133 | elif [[ -f brutils/brp ]]; then
134 |   BRP=brutils/brp
135 | fi
136 | if [[ -n "$BRP" ]]; then
137 |   eval "$input $BRP - $(($column - 1)) $out_files"
138 | else
139 |   # use awk if we don't have brp
140 |   # we take advantage of a special property: awk leaves its file handles open until it's done
141 |   # (i think this is universal)
142 |   # we also send a zero-length string to all the handles at the end, in case some pipe got no love
143 |   eval "$input awk '{
144 |     srand(\$$column);
145 |     print \$0 >>\"$jobpath/out/\"int(rand() * $host_idx);
146 |   }
147 |   END {
148 |     for (i = 0; i != $host_idx; ++i)
149 |       printf \"\" >>\"$jobpath/out/\"i;
150 |   }'"
151 | fi
152 | 
153 | # Merge the hosts' output back into one stream:
154 | # use the -M program if given, otherwise sort (preferring brm)
155 | if which brm >>$tmp/br_stderr; then
156 |   BRM=brm
157 | elif [[ -f brutils/brm ]]; then
158 |   BRM=brutils/brm
159 | fi
160 | if [[ -n "$merge" ]]; then
161 |   eval "find $jobpath/in -type p | xargs cat \
162 |     | $merge 2>>$tmp/br_stderr ${output:+| pv >$output}"
163 | else
164 |   if [[ -n "$BRM" ]]; then
165 |     eval "$BRM - $(($column - 1)) $(find $jobpath/in/ -type p | xargs) \
166 |       ${output:+| pv >$output}"
167 |   else
168 |     # use sort -m if we don't have brm
169 |     # sort -m creates tmp files if too many input files are specified;
170 |     # brm doesn't do this
171 |     eval "sort -k$column,$column -S$sort_mem -m $jobpath/in/* \
172 |       ${output:+| pv >$output}"
173 |   fi
174 | fi
175 | 
176 | # Cleanup
177 | rm -rf $jobpath
178 | for host in $hosts; do
179 |   ssh $host "rm -rf $nodepath"
180 | done
181 | 
182 | # TODO: is there a safe way to kill subprocesses upon fail?
183 | # this seems to work: /bin/kill -- -$$
184 | 
--------------------------------------------------------------------------------
/brutils/Makefile:
--------------------------------------------------------------------------------
 1 | CFLAGS = -O3 -Wall
 2 | OBJS_BRP = brp.o
 3 | OBJS_BRM = brm.o
 4 | HEADERS = brutils.h
 5 | LIBS = 
 6 | TARGET_BRP = brp
 7 | TARGET_BRM = brm
 8 | BINDIR=/usr/local/bin
 9 | 
10 | all: $(TARGET_BRP) $(TARGET_BRM)
11 | 
12 | $(TARGET_BRP): $(OBJS_BRP) $(HEADERS)
13 | 	$(CC) -o $(TARGET_BRP) $(OBJS_BRP) $(LIBS)
14 | 
15 | $(TARGET_BRM): $(OBJS_BRM) $(HEADERS)
16 | 	$(CC) -o $(TARGET_BRM) $(OBJS_BRM) $(LIBS)
17 | 
18 | clean:
19 | 	rm -f $(OBJS_BRP) $(OBJS_BRM) $(TARGET_BRP) $(TARGET_BRM)
20 | 
21 | install: all
22 | 	install -c brp $(BINDIR)
23 | 	install -c brm $(BINDIR)
24 | 
25 | uninstall:
26 | 	rm -f $(BINDIR)/brp $(BINDIR)/brm
27 | 
--------------------------------------------------------------------------------
/brutils/README:
--------------------------------------------------------------------------------
1 | Too bad that partitioning using awk is fairly cpu bound. Here's a little C cheat. If someone can think of a way to partition text that's much faster than the awk script in br, email me: erik@fawx.com .
2 | 
--------------------------------------------------------------------------------
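For a feel of how br wires these in, here's a standalone sketch of the same partition/sort/merge cycle on a single machine (the shard paths are placeholders; both tools take a zero-based key column, and '-' means stdin for brp and stdout for brm):

make -C brutils
brutils/brp - 0 /tmp/shard0 /tmp/shard1 /tmp/shard2 /tmp/shard3 < input
for f in /tmp/shard?; do sort -k1,1 -o $f $f; done
brutils/brm - 0 /tmp/shard0 /tmp/shard1 /tmp/shard2 /tmp/shard3 > output

brp scatters lines across the shards by an FNV hash of the key column, and brm n-way merges the sorted shards, so the result matches a plain sort -k1,1 of the whole input (give or take the order of equal keys).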
/brutils/brm.c:
--------------------------------------------------------------------------------
 1 | #include <stdio.h>
 2 | #include <stdlib.h>
 3 | #include <string.h>
 4 | #include <ctype.h>
 5 | #include <errno.h>
 6 | 
 7 | #include "brutils.h"
 8 | 
 9 | void showusage() {
10 |   fprintf(stderr, "usage: brm output column-index input1 input2 input3...\n");
11 |   fprintf(stderr, "       can specify '-' for output, to write to stdout\n");
12 |   exit(1);
13 | }
14 | 
15 | int main(int argc, char * argv[])
16 | {
17 |   FILE * pout = stdout;
18 |   int i, col_index;
19 | 
20 |   if (argc < 4)
21 |     showusage();
22 |   if (strcmp(argv[1], "-") != 0)
23 |     pout = try_open(argv[1], "wb");
24 |   col_index = atoi(argv[2]);
25 | 
26 |   int lines_len = argc - 3;
27 |   line_t ** lines = (line_t **) malloc( lines_len * sizeof(line_t *) );
28 |   line_t ** lines_end = lines;
29 |   for (i = 0; i != lines_len; ++i) {
30 |     *lines_end = (line_t *) malloc( sizeof(line_t) );
31 |     (*lines_end)->pin = try_open(argv[i + 3], "rb");
32 |     if (read_parse(col_index, *lines_end)) {
33 |       ++lines_end;
34 |       lower_bound_move(lines, lines_end);
35 |     }
36 |     else {
37 |       fclose((*lines_end)->pin);
38 |       free(*lines_end);
39 |     }
40 |   }
41 | 
42 |   // okay, merge!
43 |   line_t * back;
44 |   while (lines != lines_end) {
45 |     // the array is kept sorted descending, so the back element holds the
46 |     // smallest current line; restore its clobbered delimiter and write it out
47 |     back = *(lines_end - 1);
48 |     *back->col_end = back->col_end_val;
49 |     fputs(back->buf, pout);
50 |     if (read_parse(col_index, back))
51 |       lower_bound_move(lines, lines_end);
52 |     else {
53 |       fclose(back->pin);
54 |       free(back);
55 |       --lines_end;
56 |     }
57 |   }
58 | 
59 |   if (pout != stdout)
60 |     fclose(pout);
61 | 
62 |   return 0;
63 | }
64 | 
--------------------------------------------------------------------------------
/brutils/brp.c:
--------------------------------------------------------------------------------
 1 | #include <stdio.h>
 2 | #include <stdlib.h>
 3 | #include <string.h>
 4 | #include <ctype.h>
 5 | #include <errno.h>
 6 | 
 7 | #include "brutils.h"
 8 | 
 9 | void showusage() {
10 |   fprintf(stderr, "usage: brp input column-index output1 output2 output3...\n");
11 |   fprintf(stderr, "       can specify '-' for input, to read from stdin\n");
12 |   exit(1);
13 | }
14 | 
15 | int main(int argc, char * argv[])
16 | {
17 |   line_t line;
18 |   int i, col_index;
19 | 
20 |   if (argc < 4)
21 |     showusage();
22 |   if (strcmp(argv[1], "-") != 0)
23 |     line.pin = try_open(argv[1], "rb");
24 |   else
25 |     line.pin = stdin;
26 |   col_index = atoi(argv[2]);
27 | 
28 |   int pouts_len = argc - 3;
29 |   FILE ** pouts = (FILE **) malloc( pouts_len * sizeof(FILE *) );
30 |   for (i = 0; i != pouts_len; ++i)
31 |     pouts[i] = try_open(argv[i + 3], "wb");
32 | 
33 |   while (fgets(line.buf, sizeof(line.buf), line.pin)) {
34 |     if ( find_col(col_index, &line) ) // if this line has the requisite number of columns
35 |       fputs(line.buf, pouts[fnv_hash(line.col_beg, line.col_end) % pouts_len]); // write it to the file its key hashes to
36 |   }
37 | 
38 |   if (line.pin != stdin)
39 |     fclose(line.pin);
40 | 
41 |   for (i = 0; i != pouts_len; ++i)
42 |     fclose(pouts[i]);
43 | 
44 |   return 0;
45 | }
46 | 
47 | 
--------------------------------------------------------------------------------
/brutils/brutils.h:
--------------------------------------------------------------------------------
 1 | #ifndef __BR_UTILS_H__
 2 | #define __BR_UTILS_H__
 3 | 
 4 | #include <string.h>
 5 | #include <errno.h>
 6 | 
 7 | FILE * try_open(const char * path, const char * flags) {
 8 |   FILE * p = fopen(path, flags);
 9 |   if (!p) {
10 |     fprintf(stderr, "could not open %s: %s\n", path, strerror(errno));
11 |     exit(1);
12 |   }
13 |   return p;
14 | }
15 | 
16 | unsigned int fnv_hash(const char *p, const char *end) {
17 |   unsigned int h = 2166136261UL;
18 |   for (; p != end; ++p)
19 |     h = (h * 16777619) ^ *p;
20 |   return h;
21 | }
22 | 
23 | typedef struct
24 | {
25 |   char buf[8192];
26 |   char * col_beg;
27 |   char * col_end;
28 |   char col_end_val;
29 |   FILE * pin;
30 | } line_t;
31 | 
32 | int find_col(int col, line_t * line) {
33 |   for (line->col_beg = line->buf; col != 0 && *line->col_beg != 0; ++line->col_beg) {
34 |     if ( isspace(*line->col_beg) )
35 |       --col;
36 |   }
37 |   if (*line->col_beg == 0)
38 |     return 0;
39 |   // stop at whitespace or at the terminator (don't run off the end of the buffer)
40 |   for (line->col_end = line->col_beg; *line->col_end != 0 && !isspace(*line->col_end); ++line->col_end) {}
41 |   return 1;
42 | }
43 | 
44 | int read_parse(int col, line_t * line) {
45 |   while (fgets(line->buf, sizeof(line->buf), line->pin)) {
46 |     if (find_col(col, line)) {
47 |       line->col_end_val = *line->col_end;
48 |       *line->col_end = 0;  // NUL-terminate the key so strcoll can compare it
49 |       return 1;
50 |     }
51 |   }
52 |   return 0;
53 | }
54 | 
55 | // move end - 1 to the proper position in beg..end, keeping the array
56 | // sorted in descending order (so the smallest line sits at the back)
57 | void lower_bound_move(line_t ** beg, line_t ** end)
58 | {
59 |   if (beg == end)
60 |     return;
61 | 
62 |   int len = end - beg - 1;
63 |   int half;
64 |   line_t ** mid;
65 | 
66 |   // [ * * * * x ]
67 |   // we need to move x to its correct position in the otherwise sorted array
68 |   while (len > 0) {
69 |     half = len >> 1;
70 |     mid = beg + half;
71 |     if ( strcoll( (*mid)->col_beg, (*(end - 1))->col_beg) > 0 ) {
72 |       beg = mid + 1;
73 |       len = len - half - 1;
74 |     }
75 |     else
76 |       len = half;
77 |   }
78 | 
79 |   // if x isn't already in place, shift everything from beg up one slot
80 |   // and drop x into beg
81 |   if (beg < end - 1) {
82 |     line_t * tmp = *(end - 1);
83 |     memmove(beg + 1, beg, (end - beg - 1) * sizeof(line_t *));
84 |     *beg = tmp;
85 |   }
86 | }
87 | 
88 | #endif // __BR_UTILS_H__
89 | 
--------------------------------------------------------------------------------