├── readme.txt
├── set-operations-in-unix-shell.pdf
├── set-operations-in-unix-shell.txt
└── set-operations-in-unix-shell.xlsx


/readme.txt:
--------------------------------------------------------------------------------
This is an implementation of 14 set operations using only Unix utilities.

It was created by Peteris Krumins (peter@catonmat.net).
His blog is at http://www.catonmat.net -- good coders code, great reuse.

This document is released under the GNU Free Documentation License.

It was written as supplementary material for my article "Set Operations in
the Unix Shell". This article explains how all the set operations were
created. It can be read here:

    http://www.catonmat.net/blog/set-operations-in-unix-shell/

------------------------------------------------------------------------------

The implementation contains the following 14 set operations:

 * Set Membership.
 * Set Equality.
 * Set Cardinality.
 * Subset Test.
 * Set Union.
 * Set Intersection.
 * Set Complement.
 * Set Symmetric Difference.
 * Power Set.
 * Set Cartesian Product.
 * Disjoint Set Test.
 * Empty Set Test.
 * Minimum.
 * Maximum.

They are implemented using the following Unix utilities:

 * grep
 * awk
 * diff
 * comm
 * cat
 * sort
 * uniq
 * head
 * join
 * tail
 * wc
 * tr
 * sed
 * cut

The implementation is available in .txt (ASCII), .pdf and Excel 2007 (.xlsx)
formats. The latest version of this cheat sheet can always be downloaded here:

    .txt: http://www.catonmat.net/download/setops.txt
    .pdf: http://www.catonmat.net/download/setops.pdf

The Excel file is available only in the source tree.

I am sorry that I didn't use LaTeX for this document, but I wanted to see what
I could create in Excel.

I also wrote another article on the same subject, "Set Operations in the
Unix Shell Simplified", where I just list the operations without explaining
them. It's here:

    http://www.catonmat.net/blog/set-operations-in-unix-shell-simplified/


------------------------------------------------------------------------------

Have fun with Unix and sets! ;)


Sincerely,
Peteris Krumins
http://www.catonmat.net

--------------------------------------------------------------------------------
/set-operations-in-unix-shell.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/pkrumins/set-operations-in-unix-shell/HEAD/set-operations-in-unix-shell.pdf


--------------------------------------------------------------------------------
/set-operations-in-unix-shell.txt:
--------------------------------------------------------------------------------
.----------------------------------------------------------------------------.
|                                                                            |
|                   Set Operations in the Unix Shell v1.01                   |
|                                                                            |
'----------------------------------------------------------------------------'
| Peteris Krumins (peter@catonmat.net), 2008.12.02                           |
| http://www.catonmat.net - good coders code, great reuse                    |
|                                                                            |
| Released under the GNU Free Documentation License                          |
'----------------------------------------------------------------------------'

Set operations covered in this document:
----------------------------------------
- Set Membership.
- Set Equality.
- Set Cardinality.
- Subset Test.
- Set Union.
- Set Intersection.
- Set Complement.
- Set Symmetric Difference.
- Power Set.
- Set Cartesian Product.
- Disjoint Set Test.
- Empty Set Test.
- Minimum.
- Maximum.
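
All commands in this document treat a set as a plain text file with one
element per line. To try them out, two small sample files are enough. The
following is only a sketch: the file names set1 and set2 match the ones used
throughout this document, but the element values are made up for illustration.
It then runs a few of the listed operations, picking the variants that need
no process substitution so they also work in a plain POSIX shell.

```shell
# Hypothetical sample data (not from the original article): one element per line.
printf '%s\n' apple banana cherry > set1
printf '%s\n' banana cherry date  > set2

# Set Cardinality: number of elements in set1.
wc -l < set1                 # prints 3

# Set Membership: exit status 0 means the element is present.
grep -xq 'banana' set1 && echo "banana is in set1"

# Set Union: sort -u also removes the elements both sets share.
sort -u set1 set2            # prints apple, banana, cherry, date

# Set Intersection: lines of set2 that match a whole line of set1.
grep -xF -f set1 set2        # prints banana, cherry

# Set Complement: elements of set1 that are not in set2.
grep -vxF -f set2 set1       # prints apple
```

The files are written to the current directory, so run this somewhere you
do not mind creating set1 and set2.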

Full explanation of these operations at:
http://www.catonmat.net/blog/set-operations-in-unix-shell/


Set Membership
--------------

$ grep -xc 'element' set    # outputs 1 if element is in set
                            # outputs >1 if set is a multi-set that
                            #   contains element more than once
                            # outputs 0 if element is not in set

$ grep -xq 'element' set    # returns 0 (true) if element is in set
                            # returns 1 (false) if element is not in set

$ awk '$0 == "element" { s=1; exit } END { exit !s }' set
    # returns 0 if element is in set, 1 otherwise

$ awk -v e='element' '$0 == e { s=1; exit } END { exit !s }' set
    # ditto, with the element passed in as a variable


Set Equality
------------

$ diff -q <(sort set1) <(sort set2)    # returns 0 if set1 is equal to set2
                                       # returns 1 if set1 != set2

$ diff -q <(sort set1 | uniq) <(sort set2 | uniq)
    # collapses multi-sets into sets and does the same as previous

$ awk '{ if (!($0 in a)) c++; a[$0] } END{ exit !(c==NR/2) }' set1 set2
    # returns 0 if set1 == set2
    # returns 1 if set1 != set2

$ awk '{ a[$0] } END{ exit !(length(a)==NR/2) }' set1 set2
    # same as previous, requires >= gnu awk 3.1.5


Set Cardinality
---------------

$ wc -l set | cut -d' ' -f1    # outputs number of elements in set

$ wc -l < set

$ awk 'END { print NR }' set


Subset Test
-----------

$ comm -23 <(sort subset | uniq) <(sort set | uniq) | head -1
    # outputs something if subset is not a subset of set
    # does not output anything if subset is a subset of set

$ awk 'NR==FNR { a[$0]; next } { if (!($0 in a)) exit 1 }' set subset
    # returns 0 if subset is a subset of set
    # returns 1 if subset is not a subset of set


Set Union
---------

$ cat set1 set2    # outputs union of set1 and set2
                   # assumes they are disjoint

$ awk 1 set1 set2    # ditto

$ cat set1 set2 ... setn    # union over n sets

$ cat set1 set2 | sort -u    # same, but also works if they are not disjoint

$ sort set1 set2 | uniq

$ sort -u set1 set2

$ awk '!a[$0]++' set1 set2    # ditto


Set Intersection
----------------

$ comm -12 <(sort set1) <(sort set2)    # outputs intersection of set1 and set2

$ grep -xF -f set1 set2

$ sort set1 set2 | uniq -d

$ join <(sort set1) <(sort set2)

$ awk 'NR==FNR { a[$0]; next } $0 in a' set1 set2


Set Complement
--------------

$ comm -23 <(sort set1) <(sort set2)
    # outputs elements in set1 that are not in set2

$ grep -vxF -f set2 set1    # ditto

$ sort set2 set2 set1 | uniq -u    # ditto

$ awk 'NR==FNR { a[$0]; next } !($0 in a)' set2 set1


Set Symmetric Difference
------------------------

$ comm -3 <(sort set1) <(sort set2) | sed 's/\t//g'
    # outputs elements that are in set1 or in set2 but not both

$ comm -3 <(sort set1) <(sort set2) | tr -d '\t'

$ sort set1 set2 | uniq -u

$ cat <(grep -vxF -f set1 set2) <(grep -vxF -f set2 set1)

$ grep -vxF -f set1 set2; grep -vxF -f set2 set1

$ awk 'NR==FNR { a[$0]; next } $0 in a { delete a[$0]; next } 1;
       END { for (b in a) print b }' set1 set2


Power Set
---------

$ p() { [ $# -eq 0 ] && echo || (shift; p "$@") |
        while read r ; do echo -e "$1 $r\n$r"; done }
$ p `cat set`

# no nice awk solution, you are welcome to email me one: peter@catonmat.net


Set Cartesian Product
---------------------

$ while read a; do while read b; do echo "$a, $b"; done < set1; done < set2

$ awk 'NR==FNR { a[$0]; next } { for (i in a) print i, $0 }' set1 set2


Disjoint Set Test
-----------------

$ comm -12 <(sort set1) <(sort set2)    # does not output anything if disjoint

$ awk '++seen[$0] == 2 { exit 1 }' set1 set2    # returns 0 if disjoint
                                                # returns 1 if not


Empty Set Test
--------------

$ wc -l set | cut -d' ' -f1    # outputs 0 if the set is empty
                               # outputs >0 if the set is not empty

$ wc -l < set

$ awk '{ exit 1 }' set    # returns 0 if set is empty, 1 otherwise


Minimum
-------

$ head -1 <(sort set)    # outputs the minimum element in the set

$ awk 'NR == 1 { min = $0 } $0 < min { min = $0 } END { print min }' set


Maximum
-------

$ tail -1 <(sort set)    # outputs the maximum element in the set

$ awk 'NR == 1 { max = $0 } $0 > max { max = $0 } END { print max }' set

.----------------------------------------------------------------------------.
| Peteris Krumins (peter@catonmat.net), 2008.12.02                           |
| http://www.catonmat.net - good coders code, great reuse                    |
|                                                                            |
| Released under the GNU Free Documentation License v1.01                    |
|                                                                            |
| Thanks to waldner and pgas from #awk on FreeNode                           |
| Power set function by Andreas: http://lysium.de/blog                       |
'----------------------------------------------------------------------------'

--------------------------------------------------------------------------------
/set-operations-in-unix-shell.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/pkrumins/set-operations-in-unix-shell/HEAD/set-operations-in-unix-shell.xlsx

--------------------------------------------------------------------------------