├── .gitignore
├── Makefile
├── NOTES.textile
├── README
├── br
└── brutils
├── Makefile
├── README
├── brm.c
├── brp.c
└── brutils.h
/.gitignore:
--------------------------------------------------------------------------------
1 | brutils/brm
2 | brutils/brp
3 | *.o
4 |
--------------------------------------------------------------------------------
/Makefile:
--------------------------------------------------------------------------------
1 | BINDIR=/usr/local/bin
2 |
3 | all:
4 | $(MAKE) -C brutils all
5 |
6 | clean:
7 | $(MAKE) -C brutils clean
8 |
9 | install: all
10 | $(MAKE) -C brutils install
11 | install -c br $(BINDIR)
12 |
13 | uninstall:
14 | $(MAKE) -C brutils uninstall
15 | rm $(BINDIR)/br
16 |
--------------------------------------------------------------------------------
/NOTES.textile:
--------------------------------------------------------------------------------
1 | h1. bashreduce : mapreduce in a bash script
2 |
3 | bashreduce lets you apply your favorite unix tools in a mapreduce fashion across multiple machines/cores. There's no installation, administration, or distributed filesystem. You'll need:
4 |
5 | * "br":http://github.com/erikfrey/bashreduce/blob/master/br somewhere handy in your path
6 | * vanilla unix tools: sort, awk, ssh, netcat
7 | * password-less ssh to each machine you plan to use
8 |
9 | h2. Configuration
10 |
11 | Edit @/etc/br.hosts@ and enter the machines you wish to use as workers. Or specify your machines at runtime:
12 |
13 | br -h "host1 host2 host3"
14 |
15 | To take advantage of multiple cores, repeat the host name.
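For example, listing each host twice runs two workers on each:

br -h "host1 host1 host2 host2" < input > output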
16 |
17 | h2. Examples
18 |
19 | h3. sorting
20 |
21 | br < input > output
22 |
23 | h3. word count
24 |
25 | br -r "uniq -c" < input > output
26 |
27 | h3. great big join
28 |
29 | LC_ALL='C' br -r "join - /tmp/join_data" < input > output
30 |
31 | h2. Performance
32 |
33 | h3. big honkin' local machine
34 |
35 | Let's start with a simpler scenario: I have a machine with multiple cores and with normal unix tools I'm relegated to using just one core. How does br help us here? Here's br on an 8-core machine, essentially operating as a poor man's multi-core sort:
36 |
37 | |_. command |_. using |_. time |_. rate |
38 | | sort -k1,1 -S2G 4gb_file > 4gb_file_sorted | coreutils | 30m32.078s | 2.24 MBps |
39 | | br -i 4gb_file -o 4gb_file_sorted | coreutils | 11m3.111s | 6.18 MBps |
40 | | br -i 4gb_file -o 4gb_file_sorted | brp/brm | 7m13.695s | 9.44 MBps |
41 |
42 | The job completely saturates i/o, but that's still a reasonable gain!
43 |
44 | h3. many cheap machines
45 |
46 | Here lies the promise of mapreduce: rather than use my big honkin' machine, I have a bunch of cheaper machines lying around that I can distribute my work to. How does br behave when I add four cheaper 4-core machines into the mix?
47 |
48 | |_. command |_. using |_. time |_. rate |
49 | | sort -k1,1 -S2G 4gb_file > 4gb_file_sorted | coreutils | 30m32.078s | 2.24 MBps |
50 | | br -i 4gb_file -o 4gb_file_sorted | coreutils | 8m30.652s | 8.02 MBps |
51 | | br -i 4gb_file -o 4gb_file_sorted | brp/brm | 4m7.596s | 16.54 MBps |
52 |
53 | We have a new bottleneck: we're limited by how quickly we can partition/pump our dataset out to the nodes. awk and sort begin to show their limitations (our clever awk script is a bit cpu bound, and @sort -m@ can only merge so many files at once). So we use two little helper programs written in C (yes, I know! it's cheating! if you can think of a better partition/merge using core unix tools, contact me) to partition the data and merge it back.
54 |
55 | h2. Future work
56 |
57 | I've tested this on ubuntu/debian, but not on other distros. According to Daniel Einspanjer, netcat takes different parameters on Red Hat.
58 |
59 | br has a poor man's dfs like so:
60 |
61 | br -r "cat > /tmp/myfile" < input
62 |
63 | But this breaks if you specify the same host multiple times. Maybe some kind of very basic virtualization is in order. Maybe.
64 |
65 | Another nicety would be to more closely mimic the options sort presents (numeric, reverse, etc.).
66 |
--------------------------------------------------------------------------------
/README:
--------------------------------------------------------------------------------
1 | NAME
2 | br - bashreduce, map/reduce in bash
3 |
4 | SYNOPSIS
5 | br [-h '<host>[ <host>][...]'] [-f]
6 | [-m <map>] [-r <reduce>] [-M <merge>]
7 | [-i <input>] [-o <output>] [-c <column>] [-S <memory>]
8 | [-t <tmpdir>] [-?]
9 |
10 | DESCRIPTION
11 | br implements a map/reduce framework that allows the map and reduce
12 | jobs to be constructed from standard Unix tools. It can operate on
13 | data piped into it or on files known to exist on the worker nodes.
14 |
15 | INSTALLATION
16 | br can run out of the box, but for convenience, put it somewhere
17 | on your PATH. The brutils come with a Makefile that will install
18 | those tools, to /usr/local/bin by default. If you leave them
19 | in place relative to br, they'll be picked up and used from there.
20 |
21 | You'll need passwordless ssh access to each machine you wish to
22 | use as a worker (even localhost, for now).
23 |
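One common way to arrange this (assuming a stock OpenSSH
client) is to generate a key and copy it to each worker:

ssh-keygen -t rsa
ssh-copy-id user@host1
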
24 | OPTIONS
25 | -h The list of hosts to be used, specified as a space-
26 | delimited list. Specifying the same host multiple
27 | times is allowed and is useful for taking explicit
28 | advantage of multiple cores.
29 |
30 | -f Only pass filenames from the master to the workers,
31 | under the assumption that they all have a mirror of
32 | the data set. This is equivalent to prefixing your
33 | map program with "xargs cat |".
34 |
35 | -m The map program, which should expect initial data on
36 | stdin and produce the intermediate data on stdout.
37 | The stderr from this program will be logged [1].
38 |
39 | -r The reduce program, which should expect intermediate
40 | data on stdin and produce the final data on stdout.
41 | The stderr from this program will be logged [1].
42 |
43 | -M The merge program, which should expect each node's
44 | final data on stdin and produce the unified final
45 | data on stdout. The stderr from this program will
46 | be logged [1].
47 |
48 | -i A file or directory to serve as input. If it is a
49 | file, this is equivalent to piping that file into
50 | br. If it is a directory, it is equivalent to
51 | piping the names of every file in that directory
52 | into br.
53 |
54 | -o The local file where output will be stored.
55 | Defaults to stdout.
56 |
57 | -c The column used by sort to produce the final output.
58 | Defaults to 1.
59 |
60 | -S This is passed directly to sort(1) as its -S option.
61 | Defaults to 256M.
62 |
63 | -t The temporary directory for storing br files. All of
64 | these files will be prefixed with "br_" and, with the
65 | exception of br_stderr, will be cleaned up upon
66 | successful exit. Defaults to /tmp.
67 |
68 | -? Shows a help message and details about the arguments.
69 |
70 | FILES
71 | All of these files are in the tmp directory specified by the user,
72 | which defaults to /tmp.
73 |
74 | br_job_*
75 | The master uses named pipes to move data (or filenames)
76 | to the worker(s). These pipes are stored here. This
77 | directory is removed at the end of the job.
78 |
79 | br_node_*
80 | Each node uses named pipes to buffer data. These
81 | are stored here and do not overlap, even if a
82 | node is specified twice in the host list. These
83 | directories are removed at the end of the job.
84 |
85 | br_stderr
86 | This file is cleared at the beginning of each
87 | job. It records stderr from most commands run
88 | by br. This is currently a point of contention
89 | between jobs. This file is NOT removed at the
90 | end of the job.
91 |
92 | NOTES
93 | [1] Because you can specify a pipeline for any or all of the map,
94 | reduce and merge steps, br can only reliably log stderr from
95 | the last step in those pipelines.
96 |
97 | AUTHORS
98 | Erik Frey
99 | Richard Crowley
100 |
101 | SEE ALSO
102 | http://github.com/erikfrey/bashreduce
103 | http://github.com/rcrowley/bashreduce
104 |
105 | LICENSE
106 | The goodwill of Erik Frey.
107 |
--------------------------------------------------------------------------------
/br:
--------------------------------------------------------------------------------
1 | #!/bin/bash
2 | # bashreduce: mapreduce in bash
3 | # erik@fawx.com
4 |
5 | usage() {
6 | echo "Usage: $1 [-h '<host>[ <host>][...]'] [-f]" >&2
7 | echo " [-m <map>] [-r <reduce>] [-M <merge>]" >&2
8 | echo " [-i <input>] [-o <output>] [-c <column>]" \
9 | "[-S <memory>]" >&2
10 | echo " [-t <tmpdir>] [-?]" >&2
11 | if [ -n "$2" ] ; then
12 | echo " -h hosts to use; repeat hosts for multiple cores" >&2
13 | echo " (defaults to contents of /etc/br.hosts)" >&2
14 | echo " -f send filenames, not data, over the network," >&2
15 | echo " implies each host has a mirror of the dataset" >&2
16 | echo " -m map program (effectively defaults to cat)" >&2
17 | echo " -r reduce program" >&2
18 | echo " -M merge program" >&2
19 | echo " -i input file or directory (defaults to stdin)" >&2
20 | echo " -o output file (defaults to stdout)" >&2
21 | echo " -c column used by sort (defaults to 1)" >&2
22 | echo " -S memory to use for sort (defaults to 256M)" >&2
23 | echo " -t tmp directory (defaults to /tmp)" >&2
24 | echo " -? this help message" >&2
25 | fi
26 | exit 2
27 | }
28 |
29 | # Defaults
30 | hosts=
31 | filenames=false
32 | map=
33 | reduce=
34 | merge=
35 | input=
36 | output=
37 | column=1
38 | sort_mem=256M
39 | tmp=/tmp
40 |
41 | program=$(basename $0)
42 | while getopts "h:fm:r:M:i:o:c:S:t:?" name; do
43 | case "$name" in
44 | h) hosts=$OPTARG;;
45 | f) filenames=true;;
46 | m) map=$OPTARG;;
47 | r) reduce=$OPTARG;;
48 | M) merge=$OPTARG;;
49 | i) input=$OPTARG;;
50 | o) output=$OPTARG;;
51 | c) column=$OPTARG;;
52 | S) sort_mem=$OPTARG;;
53 | t) tmp=$OPTARG;;
54 | ?) usage $program MOAR;;
55 | *) usage $program;;
56 | esac
57 | done
58 |
59 | # If -h wasn't given, try /etc/br.hosts
60 | if [[ -z "$hosts" ]]; then
61 | if [[ -e /etc/br.hosts ]]; then
62 | hosts=$(cat /etc/br.hosts)
63 | else
64 | echo "$program: must specify -h or provide /etc/br.hosts" >&2
65 | usage $program
66 | fi
67 | fi
68 |
69 | # Start br_stderr from a clean slate
70 | cp /dev/null $tmp/br_stderr
71 |
72 | # Setup map and reduce as parts of a pipeline
73 | [[ -n "$map" ]] && map="| $map 2>>$tmp/br_stderr"
74 | [[ $filenames == true ]] && map="| xargs -n1 \
75 | sh -c 'zcat \$0 2>>$tmp/br_stderr || cat \$0 2>>$tmp/br_stderr' $map"
76 | [[ -n "$reduce" ]] && reduce="| $reduce 2>>$tmp/br_stderr"
77 |
78 | jobid="$(uuidgen)"
79 | jobpath="$tmp/br_job_$jobid"
80 | nodepath="$tmp/br_node_$jobid"
81 | mkdir -p $jobpath/{in,out}
82 |
83 | port_in=8192
84 | port_out=$(($port_in + 1))
85 | host_idx=0
86 | out_files=
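# Data flow per worker: the partitioner writes into $jobpath/out/$host_idx,
# which a local nc pushes to the worker's port_out; on the worker, tail
# feeds that file through sort/map/reduce, whose output is served on
# port_in and pulled back into $jobpath/in/$host_idx for the final merge.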
87 | for host in $hosts; do
88 | mkfifo $jobpath/{in,out}/$host_idx
89 |
90 | # Listen for work (remote)
91 | ssh -n $host "mkdir -p $nodepath/"
92 | pid=$(ssh -n $host "nc -l -p $port_out >$nodepath/$host_idx \
93 | 2>>$tmp/br_stderr </dev/null & jobs -l" \
94 | | awk '{print $2}')
95 |
96 | # Do work (remote)
97 | ssh -n $host "tail -s0.1 -f --pid=$pid $nodepath/$host_idx \
98 | 2>>$tmp/br_stderr \
99 | | sort -k$column,$column -S$sort_mem -T$tmp \
100 | 2>>$tmp/br_stderr \
101 | $map $reduce \
102 | | nc -q0 -l -p $port_in >>$tmp/br_stderr &"
103 |
104 | # Receive results (local)
105 | nc $host $port_in >$jobpath/in/$host_idx &
106 |
107 | # Send work (local)
108 | nc -q0 $host $port_out <$jobpath/out/$host_idx &
109 | out_files="$out_files $jobpath/out/$host_idx"
110 |
111 | # ++i
112 | port_in=$(($port_in + 2))
113 | port_out=$(($port_in + 1))
114 | host_idx=$(($host_idx + 1))
115 |
116 | done
117 |
118 | # Create the command to produce input
119 | if [[ -d "$input" ]]; then
120 | input="find $input -type f |"
121 | [[ $filenames == false ]] && input="$input xargs -n1 \
122 | sh -c 'zcat \$0 2>>$tmp/br_stderr || cat \$0 2>>$tmp/br_stderr' |"
123 | elif [[ -f "$input" ]]; then
124 | input="sh -c 'zcat $input 2>>$tmp/br_stderr \
125 | || cat $input 2>>$tmp/br_stderr' |"
126 | else
127 | input=
128 | fi
129 |
130 | # Partition local input to the remote workers
131 | if which brp >>$tmp/br_stderr; then
132 | BRP=brp
133 | elif [[ -f brutils/brp ]]; then
134 | BRP=brutils/brp
135 | fi
136 | if [[ -n "$BRP" ]]; then
137 | eval "$input $BRP - $(($column - 1)) $out_files"
138 | else
139 | # use awk if we don't have brp
140 | # we rely on awk leaving its file handles open until it's done
141 | # (i think this behavior is universal)
142 | # we also send a zero-length string to every handle at the end, in case some pipe got no love
143 | eval "$input awk '{
144 | srand(\$$column);
145 | print \$0 >>\"$jobpath/out/\"int(rand() * $host_idx);
146 | }
147 | END {
148 | for (i = 0; i != $host_idx; ++i)
149 | printf \"\" >>\"$jobpath/out/\"i;
150 | }'"
151 | fi
152 |
153 | # Merge output from hosts into one
154 | # Use the -M program if given; otherwise merge with brm, falling back to sort -m
155 | if which brm >>$tmp/br_stderr; then
156 | BRM=brm
157 | elif [[ -f brutils/brm ]]; then
158 | BRM=brutils/brm
159 | fi
160 | if [[ -n "$merge" ]]; then
161 | eval "find $jobpath/in -type p | xargs cat \
162 | | $merge 2>>$tmp/br_stderr ${output:+| pv >$output}"
163 | else
164 | if [[ -n "$BRM" ]]; then
165 | eval "$BRM - $(($column - 1)) $(find $jobpath/in/ -type p | xargs) \
166 | ${output:+| pv >$output}"
167 | else
168 | # use sort -m if we don't have brm
169 | # sort -m creates tmp files if too many input files are specified
170 | # brm doesn't do this
171 | eval "sort -k$column,$column -S$sort_mem -m $jobpath/in/* \
172 | ${output:+| pv >$output}"
173 | fi
174 | fi
175 |
176 | # Cleanup
177 | rm -rf $jobpath
178 | for host in $hosts; do
179 | ssh $host "rm -rf $nodepath"
180 | done
181 |
182 | # TODO: is there a safe way to kill subprocesses upon fail?
183 | # this seems to work: /bin/kill -- -$$
184 |
--------------------------------------------------------------------------------
/brutils/Makefile:
--------------------------------------------------------------------------------
1 | CFLAGS = -O3 -Wall
2 | OBJS_BRP = brp.o
3 | OBJS_BRM = brm.o
4 | HEADERS = brutils.h
5 | LIBS =
6 | TARGET_BRP = brp
7 | TARGET_BRM = brm
8 | BINDIR=/usr/local/bin
9 |
10 | all: $(TARGET_BRP) $(TARGET_BRM)
11 |
12 | $(TARGET_BRP): $(OBJS_BRP) $(HEADERS)
13 | $(CC) -o $(TARGET_BRP) $(OBJS_BRP) $(LIBS)
14 |
15 | $(TARGET_BRM): $(OBJS_BRM) $(HEADERS)
16 | $(CC) -o $(TARGET_BRM) $(OBJS_BRM) $(LIBS)
17 |
18 | clean:
19 | rm -f $(OBJS_BRP) $(OBJS_BRM) $(TARGET_BRP) $(TARGET_BRM)
20 |
21 | install: all
22 | install -c brp $(BINDIR)
23 | install -c brm $(BINDIR)
24 |
25 | uninstall:
26 | rm $(BINDIR)/brp $(BINDIR)/brm
27 |
--------------------------------------------------------------------------------
/brutils/README:
--------------------------------------------------------------------------------
1 | Too bad partitioning with awk is fairly cpu bound. Here's a little C cheat. If someone can think of a way to partition text that's much faster than the awk script in br, email me: erik@fawx.com.
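Both tools follow the convention in their usage strings: '-' means stdin (for brp) or stdout (for brm), and column indices are zero-based. A quick sketch of a round trip:

brp - 0 part0 part1 part2
brm - 0 part0 part1 part2 > merged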
2 |
--------------------------------------------------------------------------------
/brutils/brm.c:
--------------------------------------------------------------------------------
1 | #include <ctype.h>
2 | #include <errno.h>
3 | #include <stdio.h>
4 | #include <stdlib.h>
5 | #include <string.h>
6 |
7 | #include "brutils.h"
8 |
9 | void showusage() {
10 | fprintf(stderr, "usage: brm output column-index input1 input2 input3...\n");
11 | fprintf(stderr, " can specify '-' for output, to write to stdout\n");
12 | exit(1);
13 | }
14 |
15 | int main(int argc, char * argv[])
16 | {
17 | FILE * pout = stdout;
18 | int i, col_index;
19 |
20 | if (argc < 4)
21 | showusage();
22 | if (strcmp(argv[1], "-") != 0)
23 | pout = try_open(argv[1], "wb");
24 | col_index = atoi(argv[2]);
25 |
26 | int lines_len = argc - 3;
27 | line_t ** lines = (line_t **) malloc( lines_len * sizeof(line_t *) );
28 | line_t ** lines_end = lines;
29 | for (i = 0; i != lines_len; ++i) {
30 | *lines_end = (line_t *) malloc( sizeof(line_t) );
31 | (*lines_end)->pin = try_open(argv[i + 3], "rb");
32 | if (read_parse(col_index, *lines_end)) {
33 | ++lines_end;
34 | lower_bound_move(lines, lines_end);
35 | }
36 | else {
37 | fclose((*lines_end)->pin);
38 | free(*lines_end);
39 | }
40 | }
41 |
42 | // okay, merge!
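// lines is kept sorted in descending order, so the smallest line is always
// at lines_end - 1: write it out, refill it, and re-sort it into position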
43 | line_t * back;
44 | while (lines != lines_end) {
45 | // write to out
46 | back = *(lines_end - 1);
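// restore the column-terminator byte that read_parse replaced with NUL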
47 | *back->col_end = back->col_end_val;
48 | fputs(back->buf, pout);
49 | if (read_parse(col_index, back))
50 | lower_bound_move(lines, lines_end);
51 | else {
52 | fclose(back->pin);
53 | --lines_end;
54 | }
55 | }
56 |
57 | if (pout != stdout)
58 | fclose(pout);
59 |
60 | return 0;
61 | }
62 |
--------------------------------------------------------------------------------
/brutils/brp.c:
--------------------------------------------------------------------------------
1 | #include <ctype.h>
2 | #include <errno.h>
3 | #include <stdio.h>
4 | #include <stdlib.h>
5 | #include <string.h>
6 |
7 | #include "brutils.h"
8 |
9 | void showusage() {
10 | fprintf(stderr, "usage: brp input column-index output1 output2 output3...\n");
11 | fprintf(stderr, " can specify '-' for input, to read from stdin\n");
12 | exit(1);
13 | }
14 |
15 | int main(int argc, char * argv[])
16 | {
17 | line_t line;
18 | int i, col_index;
19 |
20 | if (argc < 4)
21 | showusage();
22 | if (strcmp(argv[1], "-") != 0)
23 | line.pin = try_open(argv[1], "rb");
24 | else
25 | line.pin = stdin;
26 | col_index = atoi(argv[2]);
27 |
28 | int pouts_len = argc - 3;
29 | FILE ** pouts = (FILE **) malloc( pouts_len * sizeof(FILE *) );
30 | for (i = 0; i != pouts_len; ++i)
31 | pouts[i] = try_open(argv[i + 3], "wb");
32 |
33 | while (fgets(line.buf, sizeof(line.buf), line.pin)) {
34 | if ( find_col(col_index, &line) ) // if this string has the requisite number of columns
35 | fputs(line.buf, pouts[fnv_hash(line.col_beg, line.col_end) % pouts_len]); // write it to the correct file
36 | }
37 |
38 | if (line.pin != stdin)
39 | fclose(line.pin);
40 |
41 | for (i = 0; i != pouts_len; ++i)
42 | fclose(pouts[i]);
43 |
44 | return 0;
45 | }
46 |
47 |
--------------------------------------------------------------------------------
/brutils/brutils.h:
--------------------------------------------------------------------------------
1 | #ifndef __BR_UTILS_H__
2 | #define __BR_UTILS_H__
3 |
4 | #include <stdio.h>
5 | #include <string.h>
6 |
7 | FILE * try_open(const char * path, const char * flags) {
8 | FILE * p = fopen(path, flags);
9 | if (!p) {
10 | fprintf(stderr, "could not open %s: %s\n", path, strerror(errno));
11 | exit(1);
12 | }
13 | return p;
14 | }
15 |
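// 32-bit FNV-1 hash of the bytes in [p, end); brp uses this to pick an output partition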
16 | unsigned int fnv_hash(const char *p, const char *end) {
17 | unsigned int h = 2166136261UL;
18 | for (; p != end; ++p)
19 | h = (h * 16777619) ^ *p;
20 | return h;
21 | }
22 |
23 | typedef struct
24 | {
25 | char buf[8192];
26 | char * col_beg;
27 | char * col_end;
28 | char col_end_val;
29 | FILE * pin;
30 | } line_t;
31 |
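// find the col-th (zero-based) whitespace-delimited column in line->buf:
// sets col_beg/col_end and returns 1, or returns 0 if the line has too few columns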
32 | int find_col(int col, line_t * line) {
33 | for (line->col_beg = line->buf; col != 0 && *line->col_beg != 0; ++line->col_beg) {
34 | if ( isspace(*line->col_beg) )
35 | --col;
36 | }
37 | if (*line->col_beg == 0)
38 | return 0;
39 | for (line->col_end = line->col_beg; !isspace(*line->col_end); ++line->col_end) {}
40 | return 1;
41 | }
42 |
43 | int read_parse(int col, line_t * line) {
44 | while (fgets(line->buf, sizeof(line->buf), line->pin)) {
45 | if (find_col(col, line)) {
46 | line->col_end_val = *line->col_end;
47 | *line->col_end = 0;
48 | return 1;
49 | }
50 | }
51 | return 0;
52 | }
53 |
54 | // move end - 1 to the proper position in beg..end
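// the array stays sorted in descending order, so the smallest entry is always at end - 1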
55 | void lower_bound_move(line_t ** beg, line_t ** end)
56 | {
57 | if (beg == end)
58 | return;
59 |
60 | int len = end - beg - 1;
61 | int half;
62 | line_t ** mid;
63 |
64 | // [ * * * * x ]
65 | // we need to move x to its correct position in the otherwise sorted array
66 | while (len > 0) {
67 | half = len >> 1;
68 | mid = beg + half;
69 | if ( strcoll( (*mid)->col_beg, (*(end - 1))->col_beg) > 0 ) {
70 | beg = mid + 1;
71 | len = len - half - 1;
72 | }
73 | else
74 | len = half;
75 | }
76 |
77 | // if beg < end - 1, we need to move beg up
78 | if (beg < end - 1) {
79 | line_t * tmp = *(end - 1);
80 | memmove(beg + 1, beg, (end - beg - 1) * sizeof(line_t *));
81 | *beg = tmp;
82 | }
83 | }
84 |
85 | #endif // __BR_UTILS_H__
86 |
--------------------------------------------------------------------------------