├── README.md ├── exercises ├── GNU_grep │ ├── .ref_solutions │ │ ├── ex01_basic_match.txt │ │ ├── ex02_basic_options.txt │ │ ├── ex03_multiple_string_match.txt │ │ ├── ex04_filenames.txt │ │ ├── ex05_word_line_matching.txt │ │ ├── ex06_ABC_context_matching.txt │ │ ├── ex07_recursive_search.txt │ │ ├── ex08_search_pattern_from_file.txt │ │ ├── ex09_regex_anchors.txt │ │ ├── ex10_regex_this_or_that.txt │ │ ├── ex11_regex_quantifiers.txt │ │ ├── ex12_regex_character_class_part1.txt │ │ ├── ex13_regex_character_class_part2.txt │ │ ├── ex14_regex_grouping_and_backreference.txt │ │ ├── ex15_regex_PCRE.txt │ │ └── ex16_misc_and_extras.txt │ ├── ex01_basic_match.txt │ ├── ex01_basic_match │ │ └── sample.txt │ ├── ex02_basic_options.txt │ ├── ex02_basic_options │ │ └── sample.txt │ ├── ex03_multiple_string_match.txt │ ├── ex03_multiple_string_match │ │ └── sample.txt │ ├── ex04_filenames.txt │ ├── ex04_filenames │ │ ├── greeting.txt │ │ ├── poem.txt │ │ └── sample.txt │ ├── ex05_word_line_matching.txt │ ├── ex05_word_line_matching │ │ ├── greeting.txt │ │ ├── sample.txt │ │ └── words.txt │ ├── ex06_ABC_context_matching.txt │ ├── ex06_ABC_context_matching │ │ └── sample.txt │ ├── ex07_recursive_search.txt │ ├── ex07_recursive_search │ │ ├── msg │ │ │ ├── greeting.txt │ │ │ └── sample.txt │ │ ├── poem.txt │ │ ├── progs │ │ │ ├── hello.py │ │ │ └── hello.sh │ │ └── words.txt │ ├── ex08_search_pattern_from_file.txt │ ├── ex08_search_pattern_from_file │ │ ├── baz.txt │ │ ├── foo.txt │ │ └── words.txt │ ├── ex09_regex_anchors.txt │ ├── ex09_regex_anchors │ │ └── sample.txt │ ├── ex10_regex_this_or_that.txt │ ├── ex10_regex_this_or_that │ │ └── sample.txt │ ├── ex11_regex_quantifiers.txt │ ├── ex11_regex_quantifiers │ │ └── garbled.txt │ ├── ex12_regex_character_class_part1.txt │ ├── ex12_regex_character_class_part1 │ │ └── sample_words.txt │ ├── ex13_regex_character_class_part2.txt │ ├── ex13_regex_character_class_part2 │ │ └── sample.txt │ ├── 
ex14_regex_grouping_and_backreference.txt │ ├── ex14_regex_grouping_and_backreference │ │ └── sample.txt │ ├── ex15_regex_PCRE.txt │ ├── ex15_regex_PCRE │ │ └── sample.txt │ ├── ex16_misc_and_extras.txt │ ├── ex16_misc_and_extras │ │ ├── garbled.txt │ │ ├── poem.txt │ │ └── sample.txt │ └── solve └── README.md ├── file_attributes.md ├── gnu_awk.md ├── gnu_grep.md ├── gnu_sed.md ├── images ├── color_option.png ├── colordiff.png ├── highlight_string_whole_file_op.png └── wdiff_to_colordiff.png ├── miscellaneous.md ├── overview_presentation ├── baz.json ├── cli_text_processing.pdf ├── foo.xml ├── greeting.txt └── sample.txt ├── perl_the_swiss_knife.md ├── restructure_text.md ├── ruby_one_liners.md ├── sorting_stuff.md ├── tail_less_cat_head.md ├── whats_the_difference.md └── wheres_my_file.md /README.md: -------------------------------------------------------------------------------- 1 | # Command Line Text Processing 2 | 3 | Learn about various commands available for common and exotic text processing needs. Examples have been tested on GNU/Linux — there may be syntax/feature variations on other distributions; consult their respective `man` pages for details. 4 | 5 | --- 6 | 7 | :warning: :warning: I'm no longer actively working on this repo. Instead, I've converted existing chapters into ebooks (see [ebook section](#ebooks) below for links), available under the same license. These ebooks are better formatted, updated for newer versions of the software, and include exercises, solutions, etc. Since all the chapters have been converted, I'm archiving this repo. 8 | 9 | --- 10 | 11 |
12 | 13 | ## Ebooks 14 | 15 | Individual online ebooks with better formatting, explanations, exercises, solutions, etc: 16 | 17 | * [CLI text processing with GNU grep and ripgrep](https://learnbyexample.github.io/learn_gnugrep_ripgrep/) 18 | * [CLI text processing with GNU sed](https://learnbyexample.github.io/learn_gnused/) 19 | * [CLI text processing with GNU awk](https://learnbyexample.github.io/learn_gnuawk/) 20 | * [Ruby One-Liners Guide](https://learnbyexample.github.io/learn_ruby_oneliners/) 21 | * [Perl One-Liners Guide](https://learnbyexample.github.io/learn_perl_oneliners/) 22 | * [CLI text processing with GNU Coreutils](https://learnbyexample.github.io/cli_text_processing_coreutils/) 23 | * [Linux Command Line Computing](https://learnbyexample.github.io/cli-computing/) 24 | 25 | See https://learnbyexample.github.io/books/ for links to PDF/EPUB versions and other ebooks. 26 | 27 |
28 | 29 | ## Chapters 30 | 31 | As mentioned earlier, I'm no longer actively working on these chapters: 32 | 33 | * [Cat, Less, Tail and Head](./tail_less_cat_head.md) 34 | * cat, less, tail, head, Text Editors 35 | * [GNU grep](./gnu_grep.md) 36 | * [GNU sed](./gnu_sed.md) 37 | * [GNU awk](./gnu_awk.md) 38 | * [Perl the swiss knife](./perl_the_swiss_knife.md) 39 | * [Ruby one liners](./ruby_one_liners.md) 40 | * [Sorting stuff](./sorting_stuff.md) 41 | * sort, uniq, comm, shuf 42 | * [Restructure text](./restructure_text.md) 43 | * paste, column, pr, fold 44 | * [Whats the difference](./whats_the_difference.md) 45 | * cmp, diff 46 | * [Wheres my file](./wheres_my_file.md) 47 | * [File attributes](./file_attributes.md) 48 | * wc, du, df, touch, file 49 | * [Miscellaneous](./miscellaneous.md) 50 | * cut, tr, basename, dirname, xargs, seq 51 | 52 |
53 | 54 | ## Webinar recordings 55 | 56 | Recorded a couple of videos based on content in the chapters; not sure if I'll do more: 57 | 58 | * [Using the sort command](https://www.youtube.com/watch?v=qLfAwwb5vGs) 59 | * [Using uniq and comm](https://www.youtube.com/watch?v=uAb2kxA2TyQ) 60 | 61 | See also my short videos on [Linux command line tips](https://www.youtube.com/watch?v=p0KCLusMd5Q&list=PLTv2U3HnAL4PNTmRqZBSUgKaiHbRL2zeY) 62 | 63 |
64 | 65 | ## Exercises 66 | 67 | Check out the [exercises](./exercises) directory to solve practice questions on `grep`, right from the command line itself. 68 | 69 | See also my [TUI-apps](https://github.com/learnbyexample/TUI-apps) repo for interactive CLI text processing exercises. 70 | 71 |
72 | 73 | ## Contributing 74 | 75 | * Please [open an issue](https://github.com/learnbyexample/Command-line-text-processing/issues) for typos or bugs 76 | * As this repo is no longer actively worked on, **please do not submit pull requests** 77 | * Share the repo with friends/colleagues, on social media, etc. to help it reach other learners 78 | * In case you need to reach me, mail me at `echo 'yrneaolrknzcyr.arg@tznvy.pbz' | tr 'a-z' 'n-za-m'` or send a DM via [twitter](https://twitter.com/learn_byexample) 79 | 80 |
81 | 82 | ## Acknowledgements 83 | 84 | * [unix.stackexchange](https://unix.stackexchange.com/) and [stackoverflow](https://stackoverflow.com/) - for getting answers to pertinent questions as well as sharpening skills by understanding and answering questions 85 | * Forums like [Linux users](https://www.linkedin.com/groups/65688), [/r/commandline/](https://www.reddit.com/r/commandline/), [/r/linux/](https://www.reddit.com/r/linux/), [/r/ruby/](https://www.reddit.com/r/ruby/), [news.ycombinator](https://news.ycombinator.com/news), [devup](http://devup.in/) and others for valuable feedback (especially spotting mistakes) and encouragement 86 | * See the [wikipedia entry 'Roses Are Red'](https://en.wikipedia.org/wiki/Roses_Are_Red) for the origin of `poem.txt`, used as a sample input file 87 | 88 |
89 | 90 | ## License 91 | 92 | This work is licensed under a [Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License](https://creativecommons.org/licenses/by-nc-sa/4.0/) 93 | -------------------------------------------------------------------------------- /exercises/GNU_grep/.ref_solutions/ex01_basic_match.txt: -------------------------------------------------------------------------------- 1 | 1) Match lines containing the string: day 2 | Solution: grep 'day' sample.txt 3 | 4 | 2) Match lines containing the string: it 5 | Solution: grep 'it' sample.txt 6 | 7 | 3) Match lines containing the string: do you 8 | Solution: grep 'do you' sample.txt 9 | -------------------------------------------------------------------------------- /exercises/GNU_grep/.ref_solutions/ex02_basic_options.txt: -------------------------------------------------------------------------------- 1 | 1) Match lines containing the string irrespective of lower/upper case: no 2 | Solution: grep -i 'no' sample.txt 3 | 4 | 2) Match lines not containing the string: o 5 | Solution: grep -v 'o' sample.txt 6 | 7 | 3) Match lines with line numbers containing the string: it 8 | Solution: grep -n 'it' sample.txt 9 | 10 | 4) Output only number of matching lines containing the string: a 11 | Solution: grep -c 'a' sample.txt 12 | 13 | 5) Match first two lines containing the string: do 14 | Solution: grep -m2 'do' sample.txt 15 | -------------------------------------------------------------------------------- /exercises/GNU_grep/.ref_solutions/ex03_multiple_string_match.txt: -------------------------------------------------------------------------------- 1 | 1) Match lines containing either of these three strings 2 | String1: Not 3 | String2: he 4 | String3: sun 5 | Solution: grep -e 'Not' -e 'he' -e 'sun' sample.txt 6 | 7 | 2) Match lines containing both these strings 8 | String1: He 9 | String2: or 10 | Solution: grep 'He' sample.txt | grep 'or' 11 | 12 | 3) Match lines containing 
either of these two strings 13 | String1: a 14 | String2: i 15 | and contains this as well 16 | String3: do 17 | Solution: grep -e 'a' -e 'i' sample.txt | grep 'do' 18 | 19 | 4) Match lines containing the string 20 | String1: it 21 | but not these strings 22 | String2: No 23 | String3: no 24 | Solution: grep 'it' sample.txt | grep -vi 'no' 25 | -------------------------------------------------------------------------------- /exercises/GNU_grep/.ref_solutions/ex04_filenames.txt: -------------------------------------------------------------------------------- 1 | Note: All files present in the directory should be given as file inputs to grep 2 | 3 | 1) Show only filenames containing the string: are 4 | Solution: grep -l 'are' * 5 | 6 | 2) Show only filenames NOT containing the string: two 7 | Solution: grep -L 'two' * 8 | 9 | 3) Match all lines containing the string: are 10 | Solution: grep 'are' * 11 | 12 | 4) Match maximum of two matching lines along with filenames containing the character: a 13 | Solution: grep -m2 'a' * 14 | 15 | 5) Match all lines without prefixing filename containing the string: to 16 | Solution: grep -h 'to' * 17 | -------------------------------------------------------------------------------- /exercises/GNU_grep/.ref_solutions/ex05_word_line_matching.txt: -------------------------------------------------------------------------------- 1 | Note: All files present in the directory should be given as file inputs to grep 2 | 3 | 1) Match lines containing whole word: do 4 | Solution: grep -w 'do' * 5 | 6 | 2) Match whole lines containing the string: Hello World 7 | Solution: grep -x 'Hello World' * 8 | 9 | 3) Match lines containing these whole words: 10 | Word1: He 11 | Word2: far 12 | Solution: grep -w -e 'far' -e 'He' * 13 | 14 | 4) Match lines containing the whole word: you 15 | and NOT containing the case insensitive string: How 16 | Solution: grep -w 'you' * | grep -vi 'how' 17 | 
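The `-w` and `-x` options used in the ex05 solutions above can be tried without the exercise files by piping in text with `printf` — a quick sketch, with made-up input strings for illustration:

```shell
# plain substring match: both lines match, since 'Othello' contains 'hello'
printf 'say hello\nOthello\n' | grep 'hello'

# -w: match only whole words; 'Othello' is rejected because the match
# is preceded by a word character there
printf 'say hello\nOthello\n' | grep -w 'hello'

# -x: match only whole lines; 'say hello' is rejected
printf 'say hello\nhello\n' | grep -x 'hello'
```

`-w` counts a match as a word only when it is not adjacent to letters, digits or underscore; `-x` behaves as if the whole pattern were anchored with `^` and `$`.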
-------------------------------------------------------------------------------- /exercises/GNU_grep/.ref_solutions/ex06_ABC_context_matching.txt: -------------------------------------------------------------------------------- 1 | 1) Get lines and 3 following it containing the string: you 2 | Solution: grep -A3 'you' sample.txt 3 | 4 | 2) Get lines and 2 preceding it containing the string: is 5 | Solution: grep -B2 'is' sample.txt 6 | 7 | 3) Get lines and 1 following/preceding containing the string: Not 8 | Solution: grep -C1 'Not' sample.txt 9 | 10 | 4) Get lines and 1 following and 4 preceding containing the string: Not 11 | Solution: grep -A1 -B4 'Not' sample.txt 12 | 13 | 5) Get lines and 1 preceding it containing the string: you 14 | there should be no separator between the matches 15 | Solution: grep --no-group-separator -B1 'you' sample.txt 16 | 17 | 6) Get lines and 1 preceding it containing the string: you 18 | the separator between the matches should be: ##### 19 | Solution: grep --group-separator='#####' -B1 'you' sample.txt 20 | -------------------------------------------------------------------------------- /exercises/GNU_grep/.ref_solutions/ex07_recursive_search.txt: -------------------------------------------------------------------------------- 1 | Note: Every file in this directory and sub-directories is input for grep, unless otherwise specified 2 | 3 | 1) Match all lines containing the string: you 4 | Solution: grep -r 'you' 5 | 6 | 2) Show only filenames matching the string: Hello 7 | filenames should only end with .txt 8 | Solution: grep -rl --include='*.txt' 'Hello' 9 | 10 | 3) Show only filenames matching the string: Hello 11 | filenames should NOT end with .txt 12 | Solution: grep -rl --exclude='*.txt' 'Hello' 13 | 14 | 4) Show only filenames matching the string: are 15 | should not include the directory: progs 16 | Solution: grep -rl --exclude-dir='progs' 'are' 17 | 18 | 5) Show only filenames matching the string: are 19 | should NOT 
include these directories 20 | dir1: progs 21 | dir2: msg 22 | Solution: grep -rl --exclude-dir='progs' --exclude-dir='msg' 'are' 23 | 24 | 6) Show only filenames matching the string: are 25 | should include files only from sub-directories 26 | hint: use shell glob pattern to specify directories to search 27 | Solution: grep -rl 'are' */ 28 | -------------------------------------------------------------------------------- /exercises/GNU_grep/.ref_solutions/ex08_search_pattern_from_file.txt: -------------------------------------------------------------------------------- 1 | Note: words.txt has only whole words per line, use it as file input when task is to match whole words 2 | 3 | 1) Match all strings from file words.txt in file baz.txt 4 | Solution: grep -f words.txt baz.txt 5 | 6 | 2) Match all words from file words.txt in file foo.txt 7 | should only match whole words 8 | should print only matching words, not entire line 9 | Solution: grep -owf words.txt foo.txt 10 | 11 | 3) Show common lines between foo.txt and baz.txt 12 | Solution: grep -Fxf foo.txt baz.txt 13 | 14 | 4) Show lines present in baz.txt but not in foo.txt 15 | Solution: grep -Fxvf foo.txt baz.txt 16 | 17 | 5) Show lines present in foo.txt but not in baz.txt 18 | Solution: grep -Fxvf baz.txt foo.txt 19 | 20 | 6) Find all words common between all three files in the directory 21 | should only match whole words 22 | should print only matching words, not entire line 23 | Solution: grep -owf words.txt foo.txt | grep -owf- baz.txt 24 | -------------------------------------------------------------------------------- /exercises/GNU_grep/.ref_solutions/ex09_regex_anchors.txt: -------------------------------------------------------------------------------- 1 | 1) Match all lines starting with: no 2 | Solution: grep '^no' sample.txt 3 | 4 | 2) Match all lines ending with: it 5 | Solution: grep 'it$' sample.txt 6 | 7 | 3) Match all lines containing whole word: do 8 | Solution: grep -w 'do' sample.txt 9 | 10 
| 4) Match all lines containing words starting with: do 11 | Solution: grep '\<do' sample.txt 12 | 13 | 5) Match all lines containing words ending with: 14 | Solution: grep '\>' sample.txt 15 | 16 | 6) Match all lines starting with: ^ 17 | Solution: grep '^^' sample.txt 18 | 19 | 7) Match all lines ending with: $ 20 | Solution: grep '$$' sample.txt 21 | 22 | 8) Match all lines containing the string: in 23 | not surrounded by word boundaries, for ex: mint but not tin or ink 24 | Solution: grep '\Bin\B' sample.txt 25 | -------------------------------------------------------------------------------- /exercises/GNU_grep/.ref_solutions/ex10_regex_this_or_that.txt: -------------------------------------------------------------------------------- 1 | 1) Match all lines containing any of these strings: 2 | String1: day 3 | String2: not 4 | Solution: grep -E 'day|not' sample.txt 5 | 6 | 2) Match all lines containing any of these whole words: 7 | String1: he 8 | String2: in 9 | Solution: grep -wE 'he|in' sample.txt 10 | 11 | 3) Match all lines containing any of these strings: 12 | String1: you 13 | String2: be 14 | String3: to 15 | String4: he 16 | Solution: grep -E 'he|be|to|you' sample.txt 17 | 18 | 4) Match all lines containing any of these strings: 19 | String1: you 20 | String2: be 21 | String3: to 22 | String4: he 23 | but NOT these strings: 24 | String1: it 25 | String2: do 26 | Solution: grep -E 'he|be|to|you' sample.txt | grep -vE 'do|it' 27 | 28 | 5) Match all lines starting with any of these strings: 29 | String1: no 30 | String2: to 31 | Solution: grep -E '^no|^to' sample.txt 32 | -------------------------------------------------------------------------------- /exercises/GNU_grep/.ref_solutions/ex11_regex_quantifiers.txt: -------------------------------------------------------------------------------- 1 | 1) Extract all 3 character strings surrounded by word boundaries 2 | Solution: grep -ow '...' 
garbled.txt 3 | 4 | 2) Extract largest string from each line 5 | starting with character: d 6 | ending with character : g 7 | Solution: grep -o 'd.*g' garbled.txt 8 | 9 | 3) Extract all strings from each line 10 | starting with character: d 11 | followed by zero or one: o 12 | ending with character : g 13 | Solution: grep -oE 'do?g' garbled.txt 14 | 15 | 4) Extract all strings from each line 16 | starting with character: d 17 | followed by zero or one of any character 18 | ending with character : g 19 | Solution: grep -oE 'd.?g' garbled.txt 20 | 21 | 5) Extract all strings from each line 22 | starting with character: g 23 | followed by at least one: o 24 | ending with character : d 25 | Solution: grep -oE 'go+d' garbled.txt 26 | 27 | 6) Extract all strings from each line 28 | starting with character : g 29 | followed by exactly six: o 30 | ending with character : d 31 | Solution: grep -oE 'go{6}d' garbled.txt 32 | 33 | 7) Extract all strings from each line 34 | starting with character : g 35 | followed by min two and max four: o 36 | ending with character : d 37 | Solution: grep -oE 'go{2,4}d' garbled.txt 38 | 39 | 8) Extract all strings from each line 40 | starting with character: d 41 | followed by max of two : o 42 | ending with character : g 43 | Solution: grep -oE 'do{,2}g' garbled.txt 44 | 45 | 9) Extract all strings from each line 46 | starting with character : g 47 | followed by min of three: o 48 | ending with character : d 49 | Solution: grep -oE 'go{3,}d' garbled.txt 50 | -------------------------------------------------------------------------------- /exercises/GNU_grep/.ref_solutions/ex12_regex_character_class_part1.txt: -------------------------------------------------------------------------------- 1 | 1) Match all lines containing any of these characters: 2 | character1: q 3 | character2: x 4 | character3: z 5 | Solution: grep '[qzx]' sample_words.txt 6 | 7 | 2) Match all lines containing any of these characters: 8 | character1: c 9 | character2: f 
10 | followed by any character 11 | followed by : t 12 | Solution: grep '[cf].t' sample_words.txt 13 | 14 | 3) Extract all words starting with character: s 15 | ignore case 16 | should contain only alphabets 17 | minimum two letters 18 | should be surrounded by word boundaries 19 | Solution: grep -iowE 's[a-z]+' sample_words.txt 20 | 21 | 4) Extract all words made up of these characters: 22 | character1: a 23 | character2: c 24 | character3: e 25 | character4: r 26 | character5: s 27 | ignore case 28 | should contain only alphabets 29 | should be surrounded by word boundaries 30 | Solution: grep -iowE '[acers]+' sample_words.txt 31 | 32 | 5) Extract all numbers surrounded by word boundaries 33 | Solution: grep -ow '[0-9]*' sample_words.txt 34 | 35 | 6) Extract all numbers surrounded by word boundaries matching the condition 36 | 30 <= number <= 70 37 | Solution: grep -owE '[3-6][0-9]|70' sample_words.txt 38 | 39 | 7) Extract all words made up of non-vowel characters 40 | ignore case 41 | should contain only alphabets and at least two 42 | should be surrounded by word boundaries 43 | Solution: grep -iowE '[b-df-hj-np-tv-z]{2,}' sample_words.txt 44 | 45 | 8) Extract all sequence of strings consisting of character: - 46 | surrounded on either side by zero or more case insensitive alphabets 47 | Solution: grep -io '[a-z]*-[a-z]*' sample_words.txt 48 | -------------------------------------------------------------------------------- /exercises/GNU_grep/.ref_solutions/ex13_regex_character_class_part2.txt: -------------------------------------------------------------------------------- 1 | 1) Extract all characters before first occurrence of = 2 | Solution: grep -o '^[^=]*' sample.txt 3 | 4 | 2) Extract all characters from start of line made up of these characters 5 | upper or lower case alphabets 6 | all digits 7 | the underscore character 8 | Solution: grep -o '^\w*' sample.txt 9 | 10 | 3) Match all lines containing the sequence 11 | String1: there 12 | any number of 
whitespace 13 | String2: have 14 | Solution: grep 'there\s*have' sample.txt 15 | 16 | 4) Extract all characters from start of line made up of these characters 17 | upper or lower case alphabets 18 | all digits 19 | the characters [ and ] 20 | ending with ] 21 | Solution: grep -oi '^[]a-z0-9[]*]' sample.txt 22 | 23 | 5) Extract all punctuation characters from first line 24 | Solution: grep -om1 '[[:punct:]]' sample.txt 25 | -------------------------------------------------------------------------------- /exercises/GNU_grep/.ref_solutions/ex14_regex_grouping_and_backreference.txt: -------------------------------------------------------------------------------- 1 | 1) Match lines containing these strings 2 | String1: scare 3 | String2: spore 4 | Solution: grep -E 's(po|ca)re' sample.txt 5 | 6 | 2) Extract these words 7 | Word1: handy 8 | Word2: hand 9 | Word3: hands 10 | Word4: handful 11 | Solution: grep -oE 'hand([sy]|ful)?' sample.txt 12 | 13 | 3) Extract all whole words with at least one letter occurring twice in the word 14 | ignore case 15 | only alphabets 16 | the letter occurring twice need not be placed next to each other 17 | Solution: grep -ioE '[a-z]*([a-z])[a-z]*\1[a-z]*' sample.txt 18 | 19 | 4) Match lines where same sequence of three consecutive alphabets is matched another time in the same line 20 | ignore case 21 | Solution: grep -iE '([a-z]{3}).*\1' sample.txt 22 | -------------------------------------------------------------------------------- /exercises/GNU_grep/.ref_solutions/ex15_regex_PCRE.txt: -------------------------------------------------------------------------------- 1 | 1) Extract all strings to the right of = 2 | provided characters from start of line until = do not include [ or ] 3 | Solution: grep -oP '^[^][=]+=\K.*' sample.txt 4 | 5 | 2) Match all lines containing the string: Hi 6 | but shouldn't be followed afterwards in the line by: are 7 | Solution: grep -P 'Hi(?!.*are)' sample.txt 8 | 9 | 3) Extract from start of line up to the 
string: Hi 10 | provided it is followed afterwards in the line by: you 11 | Solution: grep -oP '.*Hi(?=.*you)' sample.txt 12 | 13 | 4) Extract all sequence of characters surrounded on both sides by space character 14 | the space character should not be part of output 15 | Solution: grep -oP ' \K[^ ]+(?= )' sample.txt 16 | 17 | 5) Extract all words 18 | made of upper or lower case alphabets 19 | at least two letters in length 20 | surrounded by word boundaries 21 | should not contain consecutive repeated alphabets 22 | Solution: grep -iowP '[a-z]*([a-z])\1[a-z]*(*SKIP)(*F)|[a-z]{2,}' sample.txt 23 | 24 | -------------------------------------------------------------------------------- /exercises/GNU_grep/.ref_solutions/ex16_misc_and_extras.txt: -------------------------------------------------------------------------------- 1 | Note: all files in directory are input to grep, unless otherwise specified 2 | 3 | 1) Extract all negative numbers 4 | starts with - followed by one or more digits 5 | do not output filenames 6 | Solution: grep -hoE -- '-[0-9]+' * 7 | 8 | 2) Display only filenames containing these two strings anywhere in the file 9 | String1: day 10 | String2: and 11 | Solution: grep -zlE 'day.*and|and.*day' * 12 | 13 | 3) The below command 14 | grep -c '^Solution:' ../.ref_solutions/* 15 | will give number of questions in each exercise. 
Change it, using another command and pipe if needed, so that only overall total is printed 16 | Solution: cat ../.ref_solutions/* | grep -c '^Solution:' 17 | 18 | -------------------------------------------------------------------------------- /exercises/GNU_grep/ex01_basic_match.txt: -------------------------------------------------------------------------------- 1 | 1) Match lines containing the string: day 2 | 3 | 4 | 2) Match lines containing the string: it 5 | 6 | 7 | 3) Match lines containing the string: do you 8 | 9 | -------------------------------------------------------------------------------- /exercises/GNU_grep/ex01_basic_match/sample.txt: -------------------------------------------------------------------------------- 1 | Hello World! 2 | 3 | Good day 4 | How do you do? 5 | 6 | Just do it 7 | Believe it! 8 | 9 | Today is sunny 10 | Not a bit funny 11 | No doubt you like it too 12 | 13 | Much ado about nothing 14 | He he he 15 | -------------------------------------------------------------------------------- /exercises/GNU_grep/ex02_basic_options.txt: -------------------------------------------------------------------------------- 1 | 1) Match lines containing the string irrespective of lower/upper case: no 2 | 3 | 4 | 2) Match lines not containing the string: o 5 | 6 | 7 | 3) Match lines with line numbers containing the string: it 8 | 9 | 10 | 4) Output only number of matching lines containing the string: a 11 | 12 | 13 | 5) Match first two lines containing the string: do 14 | 15 | -------------------------------------------------------------------------------- /exercises/GNU_grep/ex02_basic_options/sample.txt: -------------------------------------------------------------------------------- 1 | Hello World! 2 | 3 | Good day 4 | How do you do? 5 | 6 | Just do it 7 | Believe it! 
8 | 9 | Today is sunny 10 | Not a bit funny 11 | No doubt you like it too 12 | 13 | Much ado about nothing 14 | He he he 15 | -------------------------------------------------------------------------------- /exercises/GNU_grep/ex03_multiple_string_match.txt: -------------------------------------------------------------------------------- 1 | 1) Match lines containing either of these three strings 2 | String1: Not 3 | String2: he 4 | String3: sun 5 | 6 | 7 | 2) Match lines containing both these strings 8 | String1: He 9 | String2: or 10 | 11 | 12 | 3) Match lines containing either of these two strings 13 | String1: a 14 | String2: i 15 | and contains this as well 16 | String3: do 17 | 18 | 19 | 4) Match lines containing the string 20 | String1: it 21 | but not these strings 22 | String2: No 23 | String3: no 24 | 25 | -------------------------------------------------------------------------------- /exercises/GNU_grep/ex03_multiple_string_match/sample.txt: -------------------------------------------------------------------------------- 1 | Hello World! 2 | 3 | Good day 4 | How do you do? 5 | 6 | Just do it 7 | Believe it! 
8 | 9 | Today is sunny 10 | Not a bit funny 11 | No doubt you like it too 12 | 13 | Much ado about nothing 14 | He he he 15 | -------------------------------------------------------------------------------- /exercises/GNU_grep/ex04_filenames.txt: -------------------------------------------------------------------------------- 1 | Note: All files present in the directory should be given as file inputs to grep 2 | 3 | 1) Show only filenames containing the string: are 4 | 5 | 6 | 2) Show only filenames NOT containing the string: two 7 | 8 | 9 | 3) Match all lines containing the string: are 10 | 11 | 12 | 4) Match maximum of two matching lines along with filenames containing the character: a 13 | 14 | 15 | 5) Match all lines without prefixing filename containing the string: to 16 | 17 | -------------------------------------------------------------------------------- /exercises/GNU_grep/ex04_filenames/greeting.txt: -------------------------------------------------------------------------------- 1 | Hi, how are you? 2 | 3 | Hola :) 4 | 5 | Hello world 6 | 7 | Good day 8 | 9 | Rock on 10 | -------------------------------------------------------------------------------- /exercises/GNU_grep/ex04_filenames/poem.txt: -------------------------------------------------------------------------------- 1 | Roses are red, 2 | Violets are blue, 3 | Sugar is sweet, 4 | And so are you. 5 | -------------------------------------------------------------------------------- /exercises/GNU_grep/ex04_filenames/sample.txt: -------------------------------------------------------------------------------- 1 | Hello World! 2 | 3 | Good day 4 | How do you do? 5 | 6 | Just do it 7 | Believe it! 
8 | 9 | Today is sunny 10 | Not a bit funny 11 | No doubt you like it too 12 | 13 | Much ado about nothing 14 | He he he 15 | -------------------------------------------------------------------------------- /exercises/GNU_grep/ex05_word_line_matching.txt: -------------------------------------------------------------------------------- 1 | Note: All files present in the directory should be given as file inputs to grep 2 | 3 | 1) Match lines containing whole word: do 4 | 5 | 6 | 2) Match whole lines containing the string: Hello World 7 | 8 | 9 | 3) Match lines containing these whole words: 10 | Word1: He 11 | Word2: far 12 | 13 | 14 | 4) Match lines containing the whole word: you 15 | and NOT containing the case insensitive string: How 16 | 17 | -------------------------------------------------------------------------------- /exercises/GNU_grep/ex05_word_line_matching/greeting.txt: -------------------------------------------------------------------------------- 1 | Hi, how are you? 2 | 3 | Hola :) 4 | 5 | Hello World 6 | 7 | Good day 8 | 9 | Rock on 10 | -------------------------------------------------------------------------------- /exercises/GNU_grep/ex05_word_line_matching/sample.txt: -------------------------------------------------------------------------------- 1 | Hello World! 2 | 3 | Good day 4 | How do you do? 5 | 6 | Just do it 7 | Believe it! 
8 | 9 | Today is sunny 10 | Not a bit funny 11 | No doubt you like it too 12 | 13 | Much ado about nothing 14 | He he he 15 | -------------------------------------------------------------------------------- /exercises/GNU_grep/ex05_word_line_matching/words.txt: -------------------------------------------------------------------------------- 1 | afar 2 | far 3 | carfare 4 | farce 5 | faraway 6 | airfare 7 | -------------------------------------------------------------------------------- /exercises/GNU_grep/ex06_ABC_context_matching.txt: -------------------------------------------------------------------------------- 1 | 1) Get lines and 3 following it containing the string: you 2 | 3 | 4 | 2) Get lines and 2 preceding it containing the string: is 5 | 6 | 7 | 3) Get lines and 1 following/preceding containing the string: Not 8 | 9 | 10 | 4) Get lines and 1 following and 4 preceding containing the string: Not 11 | 12 | 13 | 5) Get lines and 1 preceding it containing the string: you 14 | there should be no separator between the matches 15 | 16 | 17 | 6) Get lines and 1 preceding it containing the string: you 18 | the separator between the matches should be: ##### 19 | 20 | -------------------------------------------------------------------------------- /exercises/GNU_grep/ex06_ABC_context_matching/sample.txt: -------------------------------------------------------------------------------- 1 | Hello World! 2 | 3 | Good day 4 | How do you do? 5 | 6 | Just do it 7 | Believe it! 
8 | 9 | Today is sunny 10 | Not a bit funny 11 | No doubt you like it too 12 | 13 | Much ado about nothing 14 | He he he 15 | -------------------------------------------------------------------------------- /exercises/GNU_grep/ex07_recursive_search.txt: -------------------------------------------------------------------------------- 1 | Note: Every file in this directory and sub-directories is input for grep, unless otherwise specified 2 | 3 | 1) Match all lines containing the string: you 4 | 5 | 6 | 2) Show only filenames matching the string: Hello 7 | filenames should only end with .txt 8 | 9 | 10 | 3) Show only filenames matching the string: Hello 11 | filenames should NOT end with .txt 12 | 13 | 14 | 4) Show only filenames matching the string: are 15 | should not include the directory: progs 16 | 17 | 18 | 5) Show only filenames matching the string: are 19 | should NOT include these directories 20 | dir1: progs 21 | dir2: msg 22 | 23 | 24 | 6) Show only filenames matching the string: are 25 | should include files only from sub-directories 26 | hint: use shell glob pattern to specify directories to search 27 | 28 | -------------------------------------------------------------------------------- /exercises/GNU_grep/ex07_recursive_search/msg/greeting.txt: -------------------------------------------------------------------------------- 1 | Hi, how are you? 2 | 3 | Hola :) 4 | 5 | Hello World 6 | 7 | Good day 8 | 9 | Rock on 10 | -------------------------------------------------------------------------------- /exercises/GNU_grep/ex07_recursive_search/msg/sample.txt: -------------------------------------------------------------------------------- 1 | Hello World! 2 | 3 | Good day 4 | How do you do? 5 | 6 | Just do it 7 | Believe it! 
8 | 9 | Today is sunny 10 | Not a bit funny 11 | No doubt you like it too 12 | 13 | Much ado about nothing 14 | He he he 15 | -------------------------------------------------------------------------------- /exercises/GNU_grep/ex07_recursive_search/poem.txt: -------------------------------------------------------------------------------- 1 | Roses are red, 2 | Violets are blue, 3 | Sugar is sweet, 4 | And so are you. 5 | -------------------------------------------------------------------------------- /exercises/GNU_grep/ex07_recursive_search/progs/hello.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/python3 2 | 3 | print("Hello World") 4 | -------------------------------------------------------------------------------- /exercises/GNU_grep/ex07_recursive_search/progs/hello.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | echo "Hello $USER" 4 | echo "Today is $(date -u +%A)" 5 | echo 'Hope you are having a nice day' 6 | -------------------------------------------------------------------------------- /exercises/GNU_grep/ex07_recursive_search/words.txt: -------------------------------------------------------------------------------- 1 | afar 2 | far 3 | carfare 4 | farce 5 | faraway 6 | airfare 7 | -------------------------------------------------------------------------------- /exercises/GNU_grep/ex08_search_pattern_from_file.txt: -------------------------------------------------------------------------------- 1 | Note: words.txt has only whole words per line, use it as file input when task is to match whole words 2 | 3 | 1) Match all strings from file words.txt in file baz.txt 4 | 5 | 6 | 2) Match all words from file words.txt in file foo.txt 7 | should only match whole words 8 | should print only matching words, not entire line 9 | 10 | 11 | 3) Show common lines between foo.txt and baz.txt 12 | 13 | 14 | 4) Show lines present in baz.txt but not in foo.txt 
15 | 16 | 17 | 5) Show lines present in foo.txt but not in baz.txt 18 | 19 | 20 | 6) Find all words common between all three files in the directory 21 | should only match whole words 22 | should print only matching words, not entire line 23 | 24 | -------------------------------------------------------------------------------- /exercises/GNU_grep/ex08_search_pattern_from_file/baz.txt: -------------------------------------------------------------------------------- 1 | I saw a few red cars going that way 2 | To the end! 3 | Are you coming today to the party? 4 | a[5] = 'good'; 5 | Have you read the Harry Potter series? 6 | -------------------------------------------------------------------------------- /exercises/GNU_grep/ex08_search_pattern_from_file/foo.txt: -------------------------------------------------------------------------------- 1 | part 2 | a[5] = 'good'; 3 | I saw a few red cars going that way 4 | Believe it! 5 | to do list 6 | -------------------------------------------------------------------------------- /exercises/GNU_grep/ex08_search_pattern_from_file/words.txt: -------------------------------------------------------------------------------- 1 | car 2 | part 3 | to 4 | read 5 | -------------------------------------------------------------------------------- /exercises/GNU_grep/ex09_regex_anchors.txt: -------------------------------------------------------------------------------- 1 | 1) Match all lines starting with: no 2 | 3 | 4 | 2) Match all lines ending with: it 5 | 6 | 7 | 3) Match all lines containing whole word: do 8 | 9 | 10 | 4) Match all lines containing words starting with: do 11 | 12 | 13 | 5) Match all lines containing words ending with: do 14 | 15 | 16 | 6) Match all lines starting with: ^ 17 | 18 | 19 | 7) Match all lines ending with: $ 20 | 21 | 22 | 8) Match all lines containing the string: in 23 | not surrounded by word boundaries, for ex: mint but not tin or ink 24 | 25 | 
-------------------------------------------------------------------------------- /exercises/GNU_grep/ex09_regex_anchors/sample.txt: -------------------------------------------------------------------------------- 1 | hello world! 2 | 3 | good day 4 | how do you do? 5 | 6 | just do it 7 | believe it! 8 | 9 | today is sunny 10 | not a bit funny 11 | no doubt you like it too 12 | 13 | much ado about nothing 14 | he he he 15 | 16 | ^ could be exponentiation or xor operator 17 | scalar variables in perl start with $ 18 | -------------------------------------------------------------------------------- /exercises/GNU_grep/ex10_regex_this_or_that.txt: -------------------------------------------------------------------------------- 1 | 1) Match all lines containing any of these strings: 2 | String1: day 3 | String2: not 4 | 5 | 6 | 2) Match all lines containing any of these whole words: 7 | String1: he 8 | String2: in 9 | 10 | 11 | 3) Match all lines containing any of these strings: 12 | String1: you 13 | String2: be 14 | String3: to 15 | String4: he 16 | 17 | 18 | 4) Match all lines containing any of these strings: 19 | String1: you 20 | String2: be 21 | String3: to 22 | String4: he 23 | but NOT these strings: 24 | String1: it 25 | String2: do 26 | 27 | 28 | 5) Match all lines starting with any of these strings: 29 | String1: no 30 | String2: to 31 | 32 | -------------------------------------------------------------------------------- /exercises/GNU_grep/ex10_regex_this_or_that/sample.txt: -------------------------------------------------------------------------------- 1 | hello world! 2 | 3 | good day 4 | how do you do? 5 | 6 | just do it 7 | believe it! 
8 | 9 | today is sunny 10 | not a bit funny 11 | no doubt you like it too 12 | 13 | much ado about nothing 14 | he he he 15 | 16 | ^ could be exponentiation or xor operator 17 | scalar variables in perl start with $ 18 | -------------------------------------------------------------------------------- /exercises/GNU_grep/ex11_regex_quantifiers.txt: -------------------------------------------------------------------------------- 1 | 1) Extract all 3 character strings surrounded by word boundaries 2 | 3 | 4 | 2) Extract largest string from each line 5 | starting with character: d 6 | ending with character : g 7 | 8 | 9 | 3) Extract all strings from each line 10 | starting with character: d 11 | followed by zero or one: o 12 | ending with character : g 13 | 14 | 15 | 4) Extract all strings from each line 16 | starting with character: d 17 | followed by zero or one of any character 18 | ending with character : g 19 | 20 | 21 | 5) Extract all strings from each line 22 | starting with character: g 23 | followed by at least one: o 24 | ending with character : d 25 | 26 | 27 | 6) Extract all strings from each line 28 | starting with character : g 29 | followed by exactly six: o 30 | ending with character : d 31 | 32 | 33 | 7) Extract all strings from each line 34 | starting with character : g 35 | followed by min two and max four: o 36 | ending with character : d 37 | 38 | 39 | 8) Extract all strings from each line 40 | starting with character: d 41 | followed by max of two : o 42 | ending with character : g 43 | 44 | 45 | 9) Extract all strings from each line 46 | starting with character : g 47 | followed by min of three: o 48 | ending with character : d 49 | 50 | -------------------------------------------------------------------------------- /exercises/GNU_grep/ex11_regex_quantifiers/garbled.txt: -------------------------------------------------------------------------------- 1 | gd 2 | god 3 | goood 4 | oh gold 5 | goooooodyyyy 6 | dog 7 | dg 8 | dig good gold 9 |
doogoodog 10 | c@t made forty justify 11 | dodging a toy 12 | -------------------------------------------------------------------------------- /exercises/GNU_grep/ex12_regex_character_class_part1.txt: -------------------------------------------------------------------------------- 1 | 1) Match all lines containing any of these characters: 2 | character1: q 3 | character2: x 4 | character3: z 5 | 6 | 7 | 2) Match all lines containing any of these characters: 8 | character1: c 9 | character2: f 10 | followed by any character 11 | followed by : t 12 | 13 | 14 | 3) Extract all words starting with character: s 15 | ignore case 16 | should contain only alphabets 17 | minimum two letters 18 | should be surrounded by word boundaries 19 | 20 | 21 | 4) Extract all words made up of these characters: 22 | character1: a 23 | character2: c 24 | character3: e 25 | character4: r 26 | character5: s 27 | ignore case 28 | should contain only alphabets 29 | should be surrounded by word boundaries 30 | 31 | 32 | 5) Extract all numbers surrounded by word boundaries 33 | 34 | 35 | 6) Extract all numbers surrounded by word boundaries matching the condition 36 | 30 <= number <= 70 37 | 38 | 39 | 7) Extract all words made up of non-vowel characters 40 | ignore case 41 | should contain only alphabets, at least two of them 42 | should be surrounded by word boundaries 43 | 44 | 45 | 8) Extract all sequences consisting of the character: - 46 | surrounded on either side by zero or more case insensitive alphabets 47 | 48 | -------------------------------------------------------------------------------- /exercises/GNU_grep/ex12_regex_character_class_part1/sample_words.txt: -------------------------------------------------------------------------------- 1 | far 30 scarce f@$t 42 fit 2 | Cute 34 quite pry far-fetched Sure 3 | 70 cast-away 12 good hue he 4 | cry just Nymph race Peace.
67 5 | foo;bar;baz;p@t 6 | ARE 72 cut copy paste 7 | p1ate rest 512 Sync 8 | -------------------------------------------------------------------------------- /exercises/GNU_grep/ex13_regex_character_class_part2.txt: -------------------------------------------------------------------------------- 1 | 1) Extract all characters before first occurrence of = 2 | 3 | 4 | 2) Extract all characters from start of line made up of these characters 5 | upper or lower case alphabets 6 | all digits 7 | the underscore character 8 | 9 | 10 | 3) Match all lines containing the sequence 11 | String1: there 12 | any number of whitespace 13 | String2: have 14 | 15 | 16 | 4) Extract all characters from start of line made up of these characters 17 | upper or lower case alphabets 18 | all digits 19 | the characters [ and ] 20 | ending with ] 21 | 22 | 23 | 5) Extract all punctuation characters from first line 24 | 25 | -------------------------------------------------------------------------------- /exercises/GNU_grep/ex13_regex_character_class_part2/sample.txt: -------------------------------------------------------------------------------- 1 | a[2]='sample string' 2 | foo_bar=4232 3 | appx_pi=3.14 4 | greeting="Hi there have a nice day" 5 | food[4]="dosa" 6 | b[0][1]=42 7 | -------------------------------------------------------------------------------- /exercises/GNU_grep/ex14_regex_grouping_and_backreference.txt: -------------------------------------------------------------------------------- 1 | 1) Match lines containing these strings 2 | String1: scare 3 | String2: spore 4 | 5 | 6 | 2) Extract these words 7 | Word1: handy 8 | Word2: hand 9 | Word3: hands 10 | Word4: handful 11 | 12 | 13 | 3) Extract all whole words with at least one letter occurring twice in the word 14 | ignore case 15 | only alphabets 16 | the letter occurring twice need not be placed next to each other 17 | 18 | 19 | 4) Match lines where same sequence of three consecutive alphabets is matched another time in the 
same line 20 | ignore case 21 | 22 | -------------------------------------------------------------------------------- /exercises/GNU_grep/ex14_regex_grouping_and_backreference/sample.txt: -------------------------------------------------------------------------------- 1 | hands hand library scare handy handful 2 | scared too big time eel candy 3 | spare food regulate circuit spore stare 4 | tire tempt cold malady 5 | -------------------------------------------------------------------------------- /exercises/GNU_grep/ex15_regex_PCRE.txt: -------------------------------------------------------------------------------- 1 | 1) Extract all strings to the right of = 2 | provided characters from start of line until = do not include [ or ] 3 | 4 | 5 | 2) Match all lines containing the string: Hi 6 | but shouldn't be followed afterwards in the line by: are 7 | 8 | 9 | 3) Extract from start of line up to the string: Hi 10 | provided it is followed afterwards in the line by: you 11 | 12 | 13 | 4) Extract all sequence of characters surrounded on both sides by space character 14 | the space character should not be part of output 15 | 16 | 17 | 5) Extract all words 18 | made of upper or lower case alphabets 19 | at least two letters in length 20 | surrounded by word boundaries 21 | should not contain consecutive repeated alphabets 22 | 23 | 24 | -------------------------------------------------------------------------------- /exercises/GNU_grep/ex15_regex_PCRE/sample.txt: -------------------------------------------------------------------------------- 1 | a[2]='Hi, how are you?' 
2 | foo_bar=4232 3 | appx_pi=3.14 4 | greeting="Hi there have a nice day" 5 | food[4]="dosa" 6 | b[0][1]=42 7 | -------------------------------------------------------------------------------- /exercises/GNU_grep/ex16_misc_and_extras.txt: -------------------------------------------------------------------------------- 1 | Note: all files in directory are input to grep, unless otherwise specified 2 | 3 | 1) Extract all negative numbers 4 | starts with - followed by one or more digits 5 | do not output filenames 6 | 7 | 8 | 2) Display only filenames containing these two strings anywhere in the file 9 | String1: day 10 | String2: and 11 | 12 | 13 | 3) The below command 14 | grep -c '^Solution:' ../.ref_solutions/* 15 | will give number of questions in each exercise. Change it, using another command and pipe if needed, so that only overall total is printed 16 | 17 | 18 | -------------------------------------------------------------------------------- /exercises/GNU_grep/ex16_misc_and_extras/garbled.txt: -------------------------------------------------------------------------------- 1 | day and night 2 | -43 and 99 and 12 3 | -------------------------------------------------------------------------------- /exercises/GNU_grep/ex16_misc_and_extras/poem.txt: -------------------------------------------------------------------------------- 1 | Roses are red, 2 | Violets are blue, 3 | Sugar is sweet, 4 | And so are you. 
5 | 6 | Good day to you :) 7 | -------------------------------------------------------------------------------- /exercises/GNU_grep/ex16_misc_and_extras/sample.txt: -------------------------------------------------------------------------------- 1 | account balance: -2300 2 | good day 3 | foo and bar and baz 4 | -------------------------------------------------------------------------------- /exercises/GNU_grep/solve: -------------------------------------------------------------------------------- 1 | dir_name=$(basename "$PWD") 2 | ref_file="../.ref_solutions/$dir_name.txt" 3 | sol_file="../$dir_name.txt" 4 | tmp_file='../.tmp.txt' 5 | 6 | # color output 7 | tcolors=$(tput colors) 8 | if [[ -n $tcolors && $tcolors -ge 8 ]]; then 9 | red=$(tput setaf 1) 10 | green=$(tput setaf 2) 11 | blue=$(tput setaf 4) 12 | clr_color=$(tput sgr0) 13 | else 14 | red='' 15 | green='' 16 | blue='' 17 | clr_color='' 18 | fi 19 | 20 | sub_sol=0 21 | if [[ $1 == -s ]]; then 22 | prev_cmd=$(fc -ln -2 | sed 's/^[ \t]*//;q') 23 | sub_sol=1 24 | elif [[ $1 == -q ]]; then 25 | # highlight the question to be solved next 26 | # or show only the (unanswered)? 
question to be solved next 27 | cat "$sol_file" 28 | return 29 | elif [[ -n $1 ]]; then 30 | echo -e 'Unknown option...Exiting script' 31 | return 32 | fi 33 | 34 | count=0 35 | sol_count=0 36 | err_count=0 37 | while IFS= read -u3 -r ref_line && read -u4 -r sol_line; do 38 | if [[ "${ref_line:0:9}" == Solution: ]]; then 39 | (( count++ )) 40 | 41 | if [[ $sub_sol == 1 && -z $sol_line ]]; then 42 | sol_line="$prev_cmd" 43 | sub_sol=0 44 | fi 45 | 46 | if [[ "$(eval "command ${ref_line:10}")" == "$(eval "command $sol_line")" ]]; then 47 | (( sol_count++ )) 48 | # use color if terminal supports 49 | echo '---------------------------------------------' 50 | echo "Match for question $count:" 51 | echo "${red}Submitted solution:${clr_color} $sol_line" 52 | echo "${green}Reference solution:${clr_color} ${ref_line:10}" 53 | echo '---------------------------------------------' 54 | else 55 | (( err_count++ )) 56 | if [[ $err_count == 1 && -n $sol_line ]]; then 57 | echo '---------------------------------------------' 58 | echo "Mismatch for question $count:" 59 | echo "$(tput bold)${red}Expected output is:${clr_color}$(tput rmso)" 60 | eval "command ${ref_line:10}" 61 | echo '---------------------------------------------' 62 | fi 63 | sol_line='' 64 | fi 65 | fi 66 | 67 | echo "$sol_line" >> "$tmp_file" 68 | 69 | done 3<"$ref_file" 4<"$sol_file" 70 | 71 | ((count==sol_count)) && printf "\t\t$(tput bold)${blue}All Pass${clr_color}$(tput rmso)\t\t\n" 72 | 73 | mv "$tmp_file" "$sol_file" 74 | 75 | # vim: syntax=bash 76 | -------------------------------------------------------------------------------- /exercises/README.md: -------------------------------------------------------------------------------- 1 | # Exercises 2 | 3 | Instructions and shell script here assumes `bash` shell. Tested on *GNU bash, version 4.3.46* 4 | 5 |
6 | 7 | * For example, the first exercise for **GNU_grep** 8 | * directory: `ex01_basic_match` 9 | * question file: `ex01_basic_match.txt` 10 | * solution reference: `.ref_solutions/ex01_basic_match.txt` 11 | * Each exercise contains one or more questions to be solved 12 | * The script `solve` will assist in checking solutions 13 | 14 | ```bash 15 | $ git clone https://github.com/learnbyexample/Command-line-text-processing.git 16 | $ cd Command-line-text-processing/exercises/GNU_grep/ 17 | $ ls 18 | ex01_basic_match ex02_basic_options ex03_multiple_string_match solve 19 | ex01_basic_match.txt ex02_basic_options.txt ex03_multiple_string_match.txt 20 | 21 | $ find -name 'ex01*' 22 | ./.ref_solutions/ex01_basic_match.txt 23 | ./ex01_basic_match 24 | ./ex01_basic_match.txt 25 | ``` 26 | 27 |
28 | 29 | * Solving the questions 30 | * Go to the exercise folder 31 | * Use `ls` to see input file(s) 32 | * To see the problems for that exercise, follow the steps below 33 | 34 | ```bash 35 | $ cd ex01_basic_match 36 | $ ls 37 | sample.txt 38 | 39 | $ # to see the questions 40 | $ source ../solve -q 41 | 1) Match lines containing the string: day 42 | 43 | 44 | 2) Match lines containing the string: it 45 | 46 | 47 | 3) Match lines containing the string: do you 48 | 49 | 50 | $ # or open the questions file with your fav editor 51 | $ gvim ../$(basename "$PWD").txt 52 | $ # create an alias to use from any ex* directory 53 | $ alias oq='gvim ../$(basename "$PWD").txt' 54 | $ oq 55 | ``` 56 | 57 |
58 | 59 | * Submitting solutions one by one 60 | * immediately after executing the command that answers a question, call the `solve` script 61 | 62 | ```bash 63 | $ grep 'day' sample.txt 64 | Good day 65 | Today is sunny 66 | $ source ../solve -s 67 | --------------------------------------------- 68 | Match for question 1: 69 | Submitted solution: grep 'day' sample.txt 70 | Reference solution: grep 'day' sample.txt 71 | --------------------------------------------- 72 | ``` 73 | 74 |
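* as an aside on how `-s` works internally: the `solve` script grabs the previous command from shell history using `fc -ln -2` and strips leading whitespace before comparing — the stripping step can be tried in isolation (sample input below assumed for illustration)

```bash
$ # sed deletes leading spaces/tabs and quits after printing the first line
$ printf '   grep day sample.txt\nsource ../solve -s\n' | sed 's/^[ \t]*//;q'
grep day sample.txt
```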
75 | 76 | * Submit all at once 77 | * by editing the `../$(basename "$PWD").txt` file directly 78 | * the answer should replace the empty line immediately following the question 79 | * **Note** 80 | * there are different ways to solve the same question 81 | * but for a specific exercise like **GNU_grep**, try to solve using `grep` only 82 | * also, remember that `eval` is used to check equivalence, so be sure of the commands you submit 83 | 84 | ```bash 85 | $ cat ../$(basename "$PWD").txt 86 | 1) Match lines containing the string: day 87 | grep 'day' sample.txt 88 | 89 | 2) Match lines containing the string: it 90 | sed -n '/it/p' sample.txt 91 | 92 | 3) Match lines containing the string: do you 93 | echo 'How do you do?' 94 | 95 | $ source ../solve 96 | --------------------------------------------- 97 | Match for question 1: 98 | Submitted solution: grep 'day' sample.txt 99 | Reference solution: grep 'day' sample.txt 100 | --------------------------------------------- 101 | --------------------------------------------- 102 | Match for question 2: 103 | Submitted solution: sed -n '/it/p' sample.txt 104 | Reference solution: grep 'it' sample.txt 105 | --------------------------------------------- 106 | --------------------------------------------- 107 | Match for question 3: 108 | Submitted solution: echo 'How do you do?' 109 | Reference solution: grep 'do you' sample.txt 110 | --------------------------------------------- 111 | All Pass 112 | ``` 113 | 114 |
115 | 116 | * Then move on to the next exercise directory 117 | * Create aliases for different commands for easy use, after first checking that those names are not already in use 118 | 119 | ```bash 120 | $ type cs cq ca nq pq 121 | bash: type: cs: not found 122 | bash: type: cq: not found 123 | bash: type: ca: not found 124 | bash: type: nq: not found 125 | bash: type: pq: not found 126 | 127 | $ alias cs='source ../solve -s' 128 | $ alias cq='source ../solve -q' 129 | $ alias ca='source ../solve' 130 | $ # to go to directory of next question, 10# avoids octal interpretation of 08/09 131 | $ nq() { d=$(basename "$PWD"); nd=$(printf "../ex%02d*/" $((10#${d:2:2}+1))); cd $nd ; } 132 | $ # to go to directory of previous question 133 | $ pq() { d=$(basename "$PWD"); pd=$(printf "../ex%02d*/" $((10#${d:2:2}-1))); cd $pd ; } 134 | ``` 135 | 136 |
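* a note on the `nq`/`pq` index arithmetic: `bash` treats numbers with a leading zero as octal inside `$(( ))`, so the extracted `08` and `09` would be invalid tokens unless base 10 is forced with `10#` (directory name below assumed for illustration)

```bash
$ d=ex08_search_pattern_from_file
$ # 10# forces base 10; a plain $(( 08 + 1 )) errors out as invalid octal
$ printf '../ex%02d*/\n' $(( 10#${d:2:2} + 1 ))
../ex09*/
```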
137 | 138 | If a wrong solution is submitted, the expected output is shown. This also helps to better understand the question, as I found it difficult to convey the intent of a question clearly with words alone... 139 | 140 | ```bash 141 | $ source ../solve -q 142 | 1) Match lines containing the string: day 143 | 144 | 145 | 2) Match lines containing the string: it 146 | 147 | 148 | 3) Match lines containing the string: do you 149 | 150 | $ grep 'do' sample.txt 151 | How do you do? 152 | Just do it 153 | No doubt you like it too 154 | Much ado about nothing 155 | $ source ../solve -s 156 | --------------------------------------------- 157 | Mismatch for question 1: 158 | Expected output is: 159 | Good day 160 | Today is sunny 161 | --------------------------------------------- 162 | ``` 163 | -------------------------------------------------------------------------------- /file_attributes.md: -------------------------------------------------------------------------------- 1 | # File attributes 2 | 3 | **Table of Contents** 4 | 5 | * [wc](#wc) 6 | * [Various counts](#various-counts) 7 | * [subtle differences](#subtle-differences) 8 | * [Further reading for wc](#further-reading-for-wc) 9 | * [du](#du) 10 | * [Default size](#default-size) 11 | * [Various size formats](#various-size-formats) 12 | * [Dereferencing links](#dereferencing-links) 13 | * [Filtering options](#filtering-options) 14 | * [Further reading for du](#further-reading-for-du) 15 | * [df](#df) 16 | * [Examples](#examples) 17 | * [Further reading for df](#further-reading-for-df) 18 | * [touch](#touch) 19 | * [Creating empty file](#creating-empty-file) 20 | * [Updating timestamps](#updating-timestamps) 21 | * [Preserving timestamp](#preserving-timestamp) 22 | * [Further reading for touch](#further-reading-for-touch) 23 | * [file](#file) 24 | * [File type examples](#file-type-examples) 25 | * [Further reading for file](#further-reading-for-file) 26 | 27 |
28 | 29 | ## wc 30 | 31 | ```bash 32 | $ wc --version | head -n1 33 | wc (GNU coreutils) 8.25 34 | 35 | $ man wc 36 | WC(1) User Commands WC(1) 37 | 38 | NAME 39 | wc - print newline, word, and byte counts for each file 40 | 41 | SYNOPSIS 42 | wc [OPTION]... [FILE]... 43 | wc [OPTION]... --files0-from=F 44 | 45 | DESCRIPTION 46 | Print newline, word, and byte counts for each FILE, and a total line if 47 | more than one FILE is specified. A word is a non-zero-length sequence 48 | of characters delimited by white space. 49 | 50 | With no FILE, or when FILE is -, read standard input. 51 | ... 52 | ``` 53 | 54 |
55 | 56 | #### Various counts 57 | 58 | ```bash 59 | $ cat sample.txt 60 | Hello World 61 | Good day 62 | No doubt you like it too 63 | Much ado about nothing 64 | He he he 65 | 66 | $ # by default, gives newline/word/byte count (in that order) 67 | $ wc sample.txt 68 | 5 17 78 sample.txt 69 | 70 | $ # options to get individual numbers 71 | $ wc -l sample.txt 72 | 5 sample.txt 73 | $ wc -w sample.txt 74 | 17 sample.txt 75 | $ wc -c sample.txt 76 | 78 sample.txt 77 | 78 | $ # use shell input redirection if filename is not needed 79 | $ wc -l < sample.txt 80 | 5 81 | ``` 82 | 83 | * multiple file input 84 | * automatically displays total at end 85 | 86 | ```bash 87 | $ cat greeting.txt 88 | Hello there 89 | Have a safe journey 90 | $ cat fruits.txt 91 | Fruit Price 92 | apple 42 93 | banana 31 94 | fig 90 95 | guava 6 96 | 97 | $ wc *.txt 98 | 5 10 57 fruits.txt 99 | 2 6 32 greeting.txt 100 | 5 17 78 sample.txt 101 | 12 33 167 total 102 | ``` 103 | 104 | * use `-L` to get length of longest line 105 | 106 | ```bash 107 | $ wc -L < sample.txt 108 | 24 109 | 110 | $ echo 'foo bar baz' | wc -L 111 | 11 112 | $ echo 'hi there!' | wc -L 113 | 9 114 | 115 | $ # last line will show max value, not sum of all input 116 | $ wc -L *.txt 117 | 13 fruits.txt 118 | 19 greeting.txt 119 | 24 sample.txt 120 | 24 total 121 | ``` 122 | 123 |
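* a quick sanity check on the `total` line: for the regular counts it is a per-column sum across the input files, whereas for `-L` it is the maximum (numbers assumed from the outputs above)

```bash
$ # lines, words and bytes summed for fruits.txt, greeting.txt and sample.txt
$ echo $(( 5 + 2 + 5 )) $(( 10 + 6 + 17 )) $(( 57 + 32 + 78 ))
12 33 167
```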
124 | 125 | #### subtle differences 126 | 127 | * byte count vs character count 128 | 129 | ```bash 130 | $ # when input is ASCII 131 | $ printf 'hi there' | wc -c 132 | 8 133 | $ printf 'hi there' | wc -m 134 | 8 135 | 136 | $ # when input has multi-byte characters 137 | $ printf 'hi👍' | od -x 138 | 0000000 6968 9ff0 8d91 139 | 0000006 140 | 141 | $ printf 'hi👍' | wc -m 142 | 3 143 | 144 | $ printf 'hi👍' | wc -c 145 | 6 146 | ``` 147 | 148 | * the `-l` option counts only the newline characters 149 | 150 | ```bash 151 | $ printf 'hi there\ngood day' | wc -l 152 | 1 153 | $ printf 'hi there\ngood day\n' | wc -l 154 | 2 155 | $ printf 'hi there\n\n\nfoo\n' | wc -l 156 | 4 157 | ``` 158 | 159 | * From `man wc` "A word is a non-zero-length sequence of characters delimited by white space" 160 | 161 | ```bash 162 | $ echo 'foo bar ;-*' | wc -w 163 | 3 164 | 165 | $ # use other text processing as needed 166 | $ echo 'foo bar ;-*' | grep -iowE '[a-z]+' 167 | foo 168 | bar 169 | $ echo 'foo bar ;-*' | grep -iowE '[a-z]+' | wc -l 170 | 2 171 | ``` 172 | 173 | * `-L` won't count non-printable characters, and tabs are converted to equivalent spaces 174 | 175 | ```bash 176 | $ printf 'food\tgood' | wc -L 177 | 12 178 | $ printf 'food\tgood' | wc -m 179 | 9 180 | $ printf 'food\tgood' | awk '{print length()}' 181 | 9 182 | 183 | $ printf 'foo\0bar\0baz' | wc -L 184 | 9 185 | $ printf 'foo\0bar\0baz' | wc -m 186 | 11 187 | $ printf 'foo\0bar\0baz' | awk '{print length()}' 188 | 11 189 | ``` 190 | 191 |
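* the tab handling of `-L` can be reproduced with `expand`, which converts tabs to spaces using tab stops assumed at every 8 columns — `food` ends at column 4, the tab jumps to column 8 and `good` extends the line to 12

```bash
$ printf 'food\tgood' | expand | wc -L
12
$ # after expansion, character count and display width agree
$ printf 'food\tgood' | expand | wc -m
12
```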
192 | 193 | #### Further reading for wc 194 | 195 | * `man wc` and `info wc` for more options and detailed documentation 196 | * [wc Q&A on unix stackexchange](https://unix.stackexchange.com/questions/tagged/wc?sort=votes&pageSize=15) 197 | * [wc Q&A on stackoverflow](https://stackoverflow.com/questions/tagged/wc?sort=votes&pageSize=15) 198 | 199 |
200 | 201 | ## du 202 | 203 | ```bash 204 | $ du --version | head -n1 205 | du (GNU coreutils) 8.25 206 | 207 | $ man du 208 | DU(1) User Commands DU(1) 209 | 210 | NAME 211 | du - estimate file space usage 212 | 213 | SYNOPSIS 214 | du [OPTION]... [FILE]... 215 | du [OPTION]... --files0-from=F 216 | 217 | DESCRIPTION 218 | Summarize disk usage of the set of FILEs, recursively for directories. 219 | ... 220 | ``` 221 | 222 |
223 | 224 |
225 | 226 | #### Default size 227 | 228 | * By default, size is reported in units of **1024 bytes** 229 | * Files are not shown individually; all directories and sub-directories are recursively reported 230 | 231 | ```bash 232 | $ ls -F 233 | projs/ py_learn@ words.txt 234 | 235 | $ du 236 | 17920 ./projs/full_addr 237 | 14316 ./projs/half_addr 238 | 32952 ./projs 239 | 33880 . 240 | ``` 241 | 242 | * use `-a` to recursively show both files and directories 243 | * use `-s` to show total directory size without descending into its sub-directories 244 | 245 | ```bash 246 | $ du -a 247 | 712 ./projs/report.log 248 | 17916 ./projs/full_addr/faddr.v 249 | 17920 ./projs/full_addr 250 | 14312 ./projs/half_addr/haddr.v 251 | 14316 ./projs/half_addr 252 | 32952 ./projs 253 | 0 ./py_learn 254 | 924 ./words.txt 255 | 33880 . 256 | 257 | $ du -s 258 | 33880 . 259 | 260 | $ du -s projs words.txt 261 | 32952 projs 262 | 924 words.txt 263 | ``` 264 | 265 | * use `-S` to show directory size without taking into account size of its sub-directories 266 | 267 | ```bash 268 | $ du -S 269 | 17920 ./projs/full_addr 270 | 14316 ./projs/half_addr 271 | 716 ./projs 272 | 928 . 273 | ``` 274 | 275 |
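* a quick cross-check of the outputs above: a directory's `-S` size plus the sizes of its sub-directories adds up to the recursive total shown by plain `du` (numbers assumed from the listings above)

```bash
$ # 716 (projs itself) + 17920 (full_addr) + 14316 (half_addr)
$ echo $(( 716 + 17920 + 14316 ))
32952
```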
276 | 277 |
278 | 279 | #### Various size formats 280 | 281 | ```bash 282 | $ # number of bytes 283 | $ stat -c %s words.txt 284 | 938848 285 | $ du -b words.txt 286 | 938848 words.txt 287 | 288 | $ # kilobytes = 1024 bytes 289 | $ du -sk projs 290 | 32952 projs 291 | $ # megabytes = 1024 kilobytes 292 | $ du -sm projs 293 | 33 projs 294 | 295 | $ # -B to specify custom byte scale size 296 | $ du -sB 5000 projs 297 | 6749 projs 298 | $ du -sB 1048576 projs 299 | 33 projs 300 | ``` 301 | 302 | * human-readable and SI units 303 | 304 | ```bash 305 | $ # in terms of powers of 1024 306 | $ # M = 1048576 bytes and so on 307 | $ du -sh projs/* words.txt 308 | 18M projs/full_addr 309 | 14M projs/half_addr 310 | 712K projs/report.log 311 | 924K words.txt 312 | 313 | $ # in terms of powers of 1000 314 | $ # M = 1000000 bytes and so on 315 | $ du -s --si projs/* words.txt 316 | 19M projs/full_addr 317 | 15M projs/half_addr 318 | 730k projs/report.log 319 | 947k words.txt 320 | ``` 321 | 322 | * sorting 323 | 324 | ```bash 325 | $ du -sh projs/* words.txt | sort -h 326 | 712K projs/report.log 327 | 924K words.txt 328 | 14M projs/half_addr 329 | 18M projs/full_addr 330 | 331 | $ du -sk projs/* | sort -nr 332 | 17920 projs/full_addr 333 | 14316 projs/half_addr 334 | 712 projs/report.log 335 | ``` 336 | 337 | * to get size based on the actual number of bytes in the file rather than the disk space allotted 338 | 339 | ```bash 340 | $ du -b words.txt 341 | 938848 words.txt 342 | 343 | $ du -h words.txt 344 | 924K words.txt 345 | 346 | $ # 938848/1024 = 916.84 347 | $ du --apparent-size -h words.txt 348 | 917K words.txt 349 | ``` 350 | 351 |
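* the unit conversions above boil down to ceiling division of the byte count — a small sketch using numbers assumed from the earlier outputs (32952 units of 1024 bytes of disk usage for `projs`, 938848 bytes for `words.txt`)

```bash
$ # -B 5000 reports the disk usage of projs in units of 5000 bytes
$ echo $(( (32952 * 1024 + 4999) / 5000 ))
6749
$ # --apparent-size -h reports the byte count in units of 1024, rounded up
$ echo $(( (938848 + 1023) / 1024 ))
917
```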
352 | 353 | #### Dereferencing links 354 | 355 | * See `man` and `info` pages for other related options 356 | 357 | ```bash 358 | $ # -D to dereference command line argument 359 | $ du py_learn 360 | 0 py_learn 361 | $ du -shD py_learn 362 | 503M py_learn 363 | 364 | $ # -L to dereference links found by du 365 | $ du -sh 366 | 34M . 367 | $ du -shL 368 | 536M . 369 | ``` 370 | 371 |
372 | 373 | #### Filtering options 374 | 375 | * `-d` to specify maximum depth 376 | 377 | ```bash 378 | $ du -ah projs 379 | 712K projs/report.log 380 | 18M projs/full_addr/faddr.v 381 | 18M projs/full_addr 382 | 14M projs/half_addr/haddr.v 383 | 14M projs/half_addr 384 | 33M projs 385 | 386 | $ du -ah -d1 projs 387 | 712K projs/report.log 388 | 18M projs/full_addr 389 | 14M projs/half_addr 390 | 33M projs 391 | ``` 392 | 393 | * `-c` to also show total size at end 394 | 395 | ```bash 396 | $ du -cshD projs py_learn 397 | 33M projs 398 | 503M py_learn 399 | 535M total 400 | ``` 401 | 402 | * `-t` to provide a threshold comparison 403 | 404 | ```bash 405 | $ # >= 15M 406 | $ du -Sh -t 15M 407 | 18M ./projs/full_addr 408 | 409 | $ # <= 1M 410 | $ du -ah -t -1M 411 | 712K ./projs/report.log 412 | 0 ./py_learn 413 | 924K ./words.txt 414 | ``` 415 | 416 | * excluding files/directories based on **glob** pattern 417 | * see also `--exclude-from=FILE` and `--files0-from=FILE` options 418 | 419 | ```bash 420 | $ # note that excluded files affect directory size reported 421 | $ du -ah --exclude='*addr*' projs 422 | 712K projs/report.log 423 | 716K projs 424 | 425 | $ # depending on shell, brace expansion can be used 426 | $ du -ah --exclude='*.'{v,log} projs 427 | 4.0K projs/full_addr 428 | 4.0K projs/half_addr 429 | 12K projs 430 | ``` 431 | 432 |
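The `--exclude-from=FILE` option noted above behaves like a series of `--exclude` options, reading one glob pattern per line from FILE. A minimal sketch using a throwaway directory (all file and directory names here are made up for illustration):

```bash
# set up a small directory tree to measure
dir=$(mktemp -d)
mkdir -p "$dir/projs"
printf 'log line\n' > "$dir/projs/report.log"
printf 'module\n' > "$dir/projs/faddr.v"

# one glob per line, same syntax as --exclude
printf '*.v\n' > "$dir/skip.txt"

# faddr.v is excluded, so only report.log contributes to the reported sizes
du -a --exclude-from="$dir/skip.txt" "$dir/projs"
```

As with `--exclude`, the skipped files also reduce the directory totals shown.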
433 | 434 | #### Further reading for du 435 | 436 | * `man du` and `info du` for more options and detailed documentation 437 | * [du Q&A on unix stackexchange](https://unix.stackexchange.com/questions/tagged/disk-usage?sort=votes&pageSize=15) 438 | * [du Q&A on stackoverflow](https://stackoverflow.com/questions/tagged/du?sort=votes&pageSize=15) 439 | 440 |
441 | 442 | ## df 443 | 444 | ```bash 445 | $ df --version | head -n1 446 | df (GNU coreutils) 8.25 447 | 448 | $ man df 449 | DF(1) User Commands DF(1) 450 | 451 | NAME 452 | df - report file system disk space usage 453 | 454 | SYNOPSIS 455 | df [OPTION]... [FILE]... 456 | 457 | DESCRIPTION 458 | This manual page documents the GNU version of df. df displays the 459 | amount of disk space available on the file system containing each file 460 | name argument. If no file name is given, the space available on all 461 | currently mounted file systems is shown. 462 | ... 463 | ``` 464 | 465 |
466 | 467 | #### Examples 468 | 469 | ```bash 470 | $ # df without arguments reports all mounted file systems, a path argument restricts it 471 | $ df . 472 | Filesystem 1K-blocks Used Available Use% Mounted on 473 | /dev/sda1 98298500 58563816 34734748 63% / 474 | 475 | $ # use -h for human readable sizes 476 | $ # see also -B for custom scale size and --si for powers of 1000 instead of 1024 477 | $ df -h . 478 | Filesystem Size Used Avail Use% Mounted on 479 | /dev/sda1 94G 56G 34G 63% / 480 | ``` 481 | 482 | * Use `--output` to report only specific fields of interest 483 | 484 | ```bash 485 | $ df -h --output=size,used,file / /media/learnbyexample/projs 486 | Size Used File 487 | 94G 56G / 488 | 92G 35G /media/learnbyexample/projs 489 | 490 | $ df -h --output=pcent . 491 | Use% 492 | 63% 493 | 494 | $ df -h --output=pcent,fstype | awk -F'%' 'NR>1 && $1>=40' 495 | 63% ext3 496 | 40% ext4 497 | 51% ext4 498 | ``` 499 | 500 | <br/>
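Since `--output` takes a comma-separated field list, selecting a single field and stripping the header makes `df` easy to consume in scripts. A small sketch (the value printed will differ from system to system):

```bash
# available space (in 1K blocks by default) on the file system of the current directory
# tail -n1 drops the header line, tr removes padding spaces
avail=$(df --output=avail . | tail -n1 | tr -d ' ')
echo "$avail"
```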
501 | 502 | #### Further reading for df 503 | 504 | * `man df` and `info df` for more options and detailed documentation 505 | * [df Q&A on stackoverflow](https://stackoverflow.com/questions/tagged/df?sort=votes&pageSize=15) 506 | * [Parsing df command output with awk](https://unix.stackexchange.com/questions/360865/parsing-df-command-output-with-awk) 507 | * [processing df output](https://www.reddit.com/r/bash/comments/68dbml/using_an_array_variable_in_an_awk_command/) 508 | 509 |
510 | 511 | ## touch 512 | 513 | ```bash 514 | $ touch --version | head -n1 515 | touch (GNU coreutils) 8.25 516 | 517 | $ man touch 518 | TOUCH(1) User Commands TOUCH(1) 519 | 520 | NAME 521 | touch - change file timestamps 522 | 523 | SYNOPSIS 524 | touch [OPTION]... FILE... 525 | 526 | DESCRIPTION 527 | Update the access and modification times of each FILE to the current 528 | time. 529 | 530 | A FILE argument that does not exist is created empty, unless -c or -h 531 | is supplied. 532 | ... 533 | ``` 534 | 535 |
536 | 537 | #### Creating empty file 538 | 539 | ```bash 540 | $ ls foo.txt 541 | ls: cannot access 'foo.txt': No such file or directory 542 | $ touch foo.txt 543 | $ ls foo.txt 544 | foo.txt 545 | 546 | $ # use -c if new file shouldn't be created 547 | $ rm foo.txt 548 | $ touch -c foo.txt 549 | $ ls foo.txt 550 | ls: cannot access 'foo.txt': No such file or directory 551 | ``` 552 | 553 |
554 | 555 | #### Updating timestamps 556 | 557 | * Updating both access and modification timestamp to current time 558 | 559 | ```bash 560 | $ # last access time 561 | $ stat -c %x fruits.txt 562 | 2017-07-19 17:06:01.523308599 +0530 563 | $ # last modification time 564 | $ stat -c %y fruits.txt 565 | 2017-07-13 13:54:03.576055933 +0530 566 | 567 | $ touch fruits.txt 568 | $ stat -c %x fruits.txt 569 | 2017-07-21 10:11:44.241921229 +0530 570 | $ stat -c %y fruits.txt 571 | 2017-07-21 10:11:44.241921229 +0530 572 | ``` 573 | 574 | * Updating only access or modification timestamp 575 | 576 | ```bash 577 | $ touch -a greeting.txt 578 | $ stat -c %x greeting.txt 579 | 2017-07-21 10:14:08.457268564 +0530 580 | $ stat -c %y greeting.txt 581 | 2017-07-13 13:54:26.004499660 +0530 582 | 583 | $ touch -m sample.txt 584 | $ stat -c %x sample.txt 585 | 2017-07-13 13:48:24.945450646 +0530 586 | $ stat -c %y sample.txt 587 | 2017-07-21 10:14:40.770006144 +0530 588 | ``` 589 | 590 | * Using timestamp from another file to update 591 | 592 | ```bash 593 | $ stat -c $'%x\n%y' power.log report.log 594 | 2017-07-19 10:48:03.978295434 +0530 595 | 2017-07-14 20:50:42.850887578 +0530 596 | 2017-06-24 13:00:31.773583923 +0530 597 | 2017-06-24 12:59:53.316751651 +0530 598 | 599 | $ # copy both access and modification timestamp from power.log to report.log 600 | $ touch -r power.log report.log 601 | $ stat -c $'%x\n%y' report.log 602 | 2017-07-19 10:48:03.978295434 +0530 603 | 2017-07-14 20:50:42.850887578 +0530 604 | 605 | $ # add -a or -m options to limit to only access or modification timestamp 606 | ``` 607 | 608 | * Using date string to update 609 | * See also `-t` option 610 | 611 | ```bash 612 | $ # add -a or -m as needed 613 | $ touch -d '2010-03-17 17:04:23' report.log 614 | $ stat -c $'%x\n%y' report.log 615 | 2010-03-17 17:04:23.000000000 +0530 616 | 2010-03-17 17:04:23.000000000 +0530 617 | ``` 618 | 619 |
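The `-t` option mentioned above is an alternative to `-d` that takes a fixed `[[CC]YY]MMDDhhmm[.ss]` timestamp instead of a free-form date string. A quick sketch on a temporary file:

```bash
f=$(mktemp)
# same timestamp as the -d example: 2010-03-17 17:04:23
touch -t 201003171704.23 "$f"
stat -c '%y' "$f"
```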
620 | 621 | #### Preserving timestamp 622 | 623 | * Text processing on files would update the timestamps 624 | 625 | ```bash 626 | $ stat -c $'%x\n%y' power.log 627 | 2017-07-21 11:11:42.862874240 +0530 628 | 2017-07-13 21:31:53.496323704 +0530 629 | 630 | $ sed -i 's/foo/bar/g' power.log 631 | $ stat -c $'%x\n%y' power.log 632 | 2017-07-21 11:12:20.303504336 +0530 633 | 2017-07-21 11:12:20.303504336 +0530 634 | ``` 635 | 636 | * `touch` can be used to restore timestamps after processing 637 | 638 | ```bash 639 | $ # first copy the timestamps using touch -r 640 | $ stat -c $'%x\n%y' story.txt 641 | 2017-06-24 13:00:31.773583923 +0530 642 | 2017-06-24 12:59:53.316751651 +0530 643 | $ # tmp.txt is temporary empty file 644 | $ touch -r story.txt tmp.txt 645 | $ stat -c $'%x\n%y' tmp.txt 646 | 2017-06-24 13:00:31.773583923 +0530 647 | 2017-06-24 12:59:53.316751651 +0530 648 | 649 | $ # after text processing, copy back the timestamps and remove temporary file 650 | $ sed -i 's/cat/dog/g' story.txt 651 | $ touch -r tmp.txt story.txt && rm tmp.txt 652 | $ stat -c $'%x\n%y' story.txt 653 | 2017-06-24 13:00:31.773583923 +0530 654 | 2017-06-24 12:59:53.316751651 +0530 655 | ``` 656 | 657 |
658 | 659 | #### Further reading for touch 660 | 661 | * `man touch` and `info touch` for more options and detailed documentation 662 | * [touch Q&A on unix stackexchange](https://unix.stackexchange.com/questions/tagged/touch?sort=votes&pageSize=15) 663 | 664 |
665 | 666 | ## file 667 | 668 | ```bash 669 | $ file --version | head -n1 670 | file-5.25 671 | 672 | $ man file 673 | FILE(1) BSD General Commands Manual FILE(1) 674 | 675 | NAME 676 | file — determine file type 677 | 678 | SYNOPSIS 679 | file [-bcEhiklLNnprsvzZ0] [--apple] [--extension] [--mime-encoding] 680 | [--mime-type] [-e testname] [-F separator] [-f namefile] 681 | [-m magicfiles] [-P name=value] file ... 682 | file -C [-m magicfiles] 683 | file [--help] 684 | 685 | DESCRIPTION 686 | This manual page documents version 5.25 of the file command. 687 | 688 | file tests each argument in an attempt to classify it. There are three 689 | sets of tests, performed in this order: filesystem tests, magic tests, 690 | and language tests. The first test that succeeds causes the file type to 691 | be printed. 692 | ... 693 | ``` 694 | 695 |
696 | 697 |
698 | 699 | #### File type examples 700 | 701 | ```bash 702 | $ file sample.txt 703 | sample.txt: ASCII text 704 | $ # without file name in output 705 | $ file -b sample.txt 706 | ASCII text 707 | 708 | $ printf 'hi👍\n' | file - 709 | /dev/stdin: UTF-8 Unicode text 710 | $ printf 'hi👍\n' | file -i - 711 | /dev/stdin: text/plain; charset=utf-8 712 | 713 | $ file ch 714 | ch: Bourne-Again shell script, ASCII text executable 715 | 716 | $ file sunset.jpg moon.png 717 | sunset.jpg: JPEG image data 718 | moon.png: PNG image data, 32 x 32, 8-bit/color RGBA, non-interlaced 719 | ``` 720 | 721 | * different line terminators 722 | 723 | ```bash 724 | $ printf 'hi' | file - 725 | /dev/stdin: ASCII text, with no line terminators 726 | 727 | $ printf 'hi\r' | file - 728 | /dev/stdin: ASCII text, with CR line terminators 729 | 730 | $ printf 'hi\r\n' | file - 731 | /dev/stdin: ASCII text, with CRLF line terminators 732 | 733 | $ printf 'hi\n' | file - 734 | /dev/stdin: ASCII text 735 | ``` 736 | 737 | * find all files of a particular type in the current directory, for example `image` files 738 | 739 | ```bash 740 | $ find -type f -exec bash -c '(file -b "$0" | grep -wq "image data") && echo "$0"' {} \; 741 | ./sunset.jpg 742 | ./moon.png 743 | 744 | $ # if filenames do not contain : or newline characters 745 | $ find -type f -exec file {} + | awk -F: '/image data/{print $1}' 746 | ./sunset.jpg 747 | ./moon.png 748 | ``` 749 | 750 | <br/>
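For scripting, the `--mime-type` output (listed in the synopsis above) is more stable to match against than the free-form description shown by default. A sketch on a throwaway file (names made up for illustration):

```bash
dir=$(mktemp -d)
printf 'hello\n' > "$dir/notes.txt"

# -b to omit the filename, --mime-type for just type/subtype
file -b --mime-type "$dir/notes.txt"

# same find idea as above, but matching on the mime type prefix
find "$dir" -type f -exec bash -c 'file -b --mime-type "$0" | grep -q "^image/" && echo "$0"' {} \;
```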
751 | 752 | #### Further reading for file 753 | 754 | * `man file` and `info file` for more options and detailed documentation 755 | * See also `identify` command which `describes the format and characteristics of one or more image files` 756 | -------------------------------------------------------------------------------- /images/color_option.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/learnbyexample/Command-line-text-processing/ce56c851f078469736bbe51a6938c21cc934022e/images/color_option.png -------------------------------------------------------------------------------- /images/colordiff.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/learnbyexample/Command-line-text-processing/ce56c851f078469736bbe51a6938c21cc934022e/images/colordiff.png -------------------------------------------------------------------------------- /images/highlight_string_whole_file_op.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/learnbyexample/Command-line-text-processing/ce56c851f078469736bbe51a6938c21cc934022e/images/highlight_string_whole_file_op.png -------------------------------------------------------------------------------- /images/wdiff_to_colordiff.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/learnbyexample/Command-line-text-processing/ce56c851f078469736bbe51a6938c21cc934022e/images/wdiff_to_colordiff.png -------------------------------------------------------------------------------- /miscellaneous.md: -------------------------------------------------------------------------------- 1 | # Miscellaneous 2 | 3 | **Table of Contents** 4 | 5 | * [cut](#cut) 6 | * [select specific fields](#select-specific-fields) 7 | * [suppressing lines without
delimiter](#suppressing-lines-without-delimiter) 8 | * [specifying delimiters](#specifying-delimiters) 9 | * [complement](#complement) 10 | * [select specific characters](#select-specific-characters) 11 | * [Further reading for cut](#further-reading-for-cut) 12 | * [tr](#tr) 13 | * [translation](#translation) 14 | * [escape sequences and character classes](#escape-sequences-and-character-classes) 15 | * [deletion](#deletion) 16 | * [squeeze](#squeeze) 17 | * [Further reading for tr](#further-reading-for-tr) 18 | * [basename](#basename) 19 | * [dirname](#dirname) 20 | * [xargs](#xargs) 21 | * [seq](#seq) 22 | * [integer sequences](#integer-sequences) 23 | * [specifying separator](#specifying-separator) 24 | * [floating point sequences](#floating-point-sequences) 25 | * [Further reading for seq](#further-reading-for-seq) 26 | 27 |
28 | 29 | ## cut 30 | 31 | ```bash 32 | $ cut --version | head -n1 33 | cut (GNU coreutils) 8.25 34 | 35 | $ man cut 36 | CUT(1) User Commands CUT(1) 37 | 38 | NAME 39 | cut - remove sections from each line of files 40 | 41 | SYNOPSIS 42 | cut OPTION... [FILE]... 43 | 44 | DESCRIPTION 45 | Print selected parts of lines from each FILE to standard output. 46 | 47 | With no FILE, or when FILE is -, read standard input. 48 | ... 49 | ``` 50 | 51 |
52 | 53 | #### select specific fields 54 | 55 | * The default delimiter is the **tab** character 56 | * `-f` option is used to print specific field(s) from each input line 57 | 58 | ```bash 59 | $ printf 'foo\tbar\t123\tbaz\n' 60 | foo bar 123 baz 61 | 62 | $ # single field 63 | $ printf 'foo\tbar\t123\tbaz\n' | cut -f2 64 | bar 65 | 66 | $ # multiple fields can be specified by using , 67 | $ printf 'foo\tbar\t123\tbaz\n' | cut -f2,4 68 | bar baz 69 | 70 | $ # output is always in ascending order of field numbers 71 | $ printf 'foo\tbar\t123\tbaz\n' | cut -f3,1 72 | foo 123 73 | 74 | $ # range can be specified using - 75 | $ printf 'foo\tbar\t123\tbaz\n' | cut -f1-3 76 | foo bar 123 77 | $ # if ending number is omitted, select till last field 78 | $ printf 'foo\tbar\t123\tbaz\n' | cut -f3- 79 | 123 baz 80 | ``` 81 | 82 | <br/>
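Since `cut` always emits fields in ascending order, `awk` can be used when the output order matters. A small sketch with tab as both input and output separator:

```bash
# print the 3rd field followed by the 1st, which cut cannot do
printf 'foo\tbar\t123\tbaz\n' | awk 'BEGIN{FS=OFS="\t"} {print $3, $1}'
# 123   foo
```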
83 | 84 | #### suppressing lines without delimiter 85 | 86 | ```bash 87 | $ cat marks.txt 88 | jan 2017 89 | foobar 12 45 23 90 | feb 2017 91 | foobar 18 38 19 92 | 93 | $ # by default lines without delimiter will be printed 94 | $ cut -f2- marks.txt 95 | jan 2017 96 | 12 45 23 97 | feb 2017 98 | 18 38 19 99 | 100 | $ # use -s option to suppress such lines 101 | $ cut -s -f2- marks.txt 102 | 12 45 23 103 | 18 38 19 104 | ``` 105 | 106 |
107 | 108 | #### specifying delimiters 109 | 110 | * use `-d` option to specify an input delimiter other than the default **tab** character 111 | * only a single character can be used; for multi-character or regex based delimiters, use `awk` or `perl` 112 | 113 | ```bash 114 | $ echo 'foo:bar:123:baz' | cut -d: -f3 115 | 123 116 | 117 | $ # by default output delimiter is same as input 118 | $ echo 'foo:bar:123:baz' | cut -d: -f1,4 119 | foo:baz 120 | 121 | $ # quote the delimiter character if it clashes with shell special characters 122 | $ echo 'one;two;three;four' | cut -d; -f3 123 | cut: option requires an argument -- 'd' 124 | Try 'cut --help' for more information. 125 | -f3: command not found 126 | $ echo 'one;two;three;four' | cut -d';' -f3 127 | three 128 | ``` 129 | 130 | * use `--output-delimiter` option to specify a different output delimiter 131 | * since this option accepts a string, more than one character can be specified 132 | * See also [using $ prefixed string](https://unix.stackexchange.com/questions/48106/what-does-it-mean-to-have-a-dollarsign-prefixed-string-in-a-script) 133 | 134 | ```bash 135 | $ printf 'foo\tbar\t123\tbaz\n' | cut --output-delimiter=: -f1-3 136 | foo:bar:123 137 | 138 | $ echo 'one;two;three;four' | cut -d';' --output-delimiter=' ' -f1,3- 139 | one three four 140 | 141 | $ # tested on bash, might differ with other shells 142 | $ echo 'one;two;three;four' | cut -d';' --output-delimiter=$'\t' -f1,3- 143 | one three four 144 | 145 | $ echo 'one;two;three;four' | cut -d';' --output-delimiter=' - ' -f1,3- 146 | one - three - four 147 | ``` 148 | 149 | <br/>
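For the multi-character/regex based delimiter case noted above, `awk`'s `-F` option accepts a regular expression. A minimal sketch:

```bash
# split on the two-character delimiter ::
echo 'foo::bar::123::baz' | awk -F'::' '{print $3}'
# 123

# any regular expression works, here one or more ; or : characters
echo 'foo;:bar:;123' | awk -F'[;:]+' '{print $2}'
# bar
```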
150 | 151 | #### complement 152 | 153 | ```bash 154 | $ echo 'one;two;three;four' | cut -d';' -f1,3- 155 | one;three;four 156 | 157 | $ # to print other than specified fields 158 | $ echo 'one;two;three;four' | cut -d';' --complement -f2 159 | one;three;four 160 | ``` 161 | 162 |
163 | 164 | #### select specific characters 165 | 166 | * similar to `-f` for field selection, use `-c` for character selection 167 | * See manual for what defines a character and differences between `-b` and `-c` 168 | 169 | ```bash 170 | $ echo 'foo:bar:123:baz' | cut -c4 171 | : 172 | 173 | $ printf 'foo\tbar\t123\tbaz\n' | cut -c1,4,7 174 | f r 175 | 176 | $ echo 'foo:bar:123:baz' | cut -c8- 177 | :123:baz 178 | 179 | $ echo 'foo:bar:123:baz' | cut --complement -c8- 180 | foo:bar 181 | 182 | $ echo 'foo:bar:123:baz' | cut -c1,6,7 --output-delimiter=' ' 183 | f a r 184 | 185 | $ echo 'abcdefghij' | cut --output-delimiter='-' -c1-3,4-7,8- 186 | abc-defg-hij 187 | 188 | $ cut -c1-3 marks.txt 189 | jan 190 | foo 191 | feb 192 | foo 193 | ``` 194 | 195 |
196 | 197 | #### Further reading for cut 198 | 199 | * `man cut` and `info cut` for more options and detailed documentation 200 | * [cut Q&A on unix stackexchange](https://unix.stackexchange.com/questions/tagged/cut?sort=votes&pageSize=15) 201 | 202 |
203 | 204 | ## tr 205 | 206 | ```bash 207 | $ tr --version | head -n1 208 | tr (GNU coreutils) 8.25 209 | 210 | $ man tr 211 | TR(1) User Commands TR(1) 212 | 213 | NAME 214 | tr - translate or delete characters 215 | 216 | SYNOPSIS 217 | tr [OPTION]... SET1 [SET2] 218 | 219 | DESCRIPTION 220 | Translate, squeeze, and/or delete characters from standard input, writ‐ 221 | ing to standard output. 222 | ... 223 | ``` 224 | 225 |
226 | 227 | #### translation 228 | 229 | * one-to-one mapping of characters, all occurrences are translated 230 | * as good practice, enclose the arguments in single quotes to avoid issues due to shell interpretation 231 | 232 | ```bash 233 | $ echo 'foo bar cat baz' | tr 'abc' '123' 234 | foo 21r 31t 21z 235 | 236 | $ # use - to represent a range in ascending order 237 | $ echo 'foo bar cat baz' | tr 'a-f' '1-6' 238 | 6oo 21r 31t 21z 239 | 240 | $ # changing case 241 | $ echo 'foo bar cat baz' | tr 'a-z' 'A-Z' 242 | FOO BAR CAT BAZ 243 | $ echo 'Hello World' | tr 'a-zA-Z' 'A-Za-z' 244 | hELLO wORLD 245 | 246 | $ echo 'foo;bar;baz' | tr ; : 247 | tr: missing operand 248 | Try 'tr --help' for more information. 249 | $ echo 'foo;bar;baz' | tr ';' ':' 250 | foo:bar:baz 251 | ``` 252 | 253 | * rot13 example 254 | 255 | ```bash 256 | $ echo 'foo bar cat baz' | tr 'a-z' 'n-za-m' 257 | sbb one png onm 258 | $ echo 'sbb one png onm' | tr 'a-z' 'n-za-m' 259 | foo bar cat baz 260 | 261 | $ echo 'Hello World' | tr 'a-zA-Z' 'n-za-mN-ZA-M' 262 | Uryyb Jbeyq 263 | $ echo 'Uryyb Jbeyq' | tr 'a-zA-Z' 'n-za-mN-ZA-M' 264 | Hello World 265 | ``` 266 | 267 | * use shell input redirection for file input 268 | 269 | ```bash 270 | $ cat marks.txt 271 | jan 2017 272 | foobar 12 45 23 273 | feb 2017 274 | foobar 18 38 19 275 | 276 | $ tr 'a-z' 'A-Z' < marks.txt 277 | JAN 2017 278 | FOOBAR 12 45 23 279 | FEB 2017 280 | FOOBAR 18 38 19 281 | ``` 282 | 283 | * if arguments are of different lengths 284 | 285 | ```bash 286 | $ # when second argument is longer, the extra characters are ignored 287 | $ echo 'foo bar cat baz' | tr 'abc' '1-9' 288 | foo 21r 31t 21z 289 | 290 | $ # when first argument is longer 291 | $ # the last character of second argument gets re-used 292 | $ echo 'foo bar cat baz' | tr 'a-z' '123' 293 | 333 213 313 213 294 | 295 | $ # use -t option to truncate first argument to same length as second 296 | $ echo 'foo bar cat baz' | tr -t 'a-z' '123' 297 | foo 21r 31t 21z 298 | 
``` 299 | 300 |
301 | 302 | #### escape sequences and character classes 303 | 304 | * Certain characters like newline, tab, etc can be represented using escape sequences or octal representation 305 | * Certain commonly useful groups of characters like alphabets, digits, punctuations etc have character class as shortcuts 306 | * See [gnu tr manual](http://www.gnu.org/software/coreutils/manual/html_node/Character-sets.html#Character-sets) for all escape sequences and character classes 307 | 308 | ```bash 309 | $ printf 'foo\tbar\t123\tbaz\n' | tr '\t' ':' 310 | foo:bar:123:baz 311 | 312 | $ echo 'foo:bar:123:baz' | tr ':' '\n' 313 | foo 314 | bar 315 | 123 316 | baz 317 | $ # makes it easier to transform 318 | $ echo 'foo:bar:123:baz' | tr ':' '\n' | pr -2ats'-' 319 | foo-bar 320 | 123-baz 321 | 322 | $ echo 'foo bar cat baz' | tr '[:lower:]' '[:upper:]' 323 | FOO BAR CAT BAZ 324 | ``` 325 | 326 | * since `-` is used for character ranges, place it at the end to represent it literally 327 | * cannot be used at start of argument as it would get treated as option 328 | * or use `--` to indicate end of option processing 329 | * similarly, to represent `\` literally, use `\\` 330 | 331 | ```bash 332 | $ echo '/foo-bar/baz/report' | tr '-a-z' '_A-Z' 333 | tr: invalid option -- 'a' 334 | Try 'tr --help' for more information. 335 | 336 | $ echo '/foo-bar/baz/report' | tr 'a-z-' 'A-Z_' 337 | /FOO_BAR/BAZ/REPORT 338 | 339 | $ echo '/foo-bar/baz/report' | tr -- '-a-z' '_A-Z' 340 | /FOO_BAR/BAZ/REPORT 341 | 342 | $ echo '/foo-bar/baz/report' | tr '/-' '\\_' 343 | \foo_bar\baz\report 344 | ``` 345 | 346 |
347 | 348 | #### deletion 349 | 350 | * use `-d` option to specify characters to be deleted 351 | * add complement option `-c` if it is easier to define which characters are to be retained 352 | 353 | ```bash 354 | $ echo '2017-03-21' | tr -d '-' 355 | 20170321 356 | 357 | $ echo 'Hi123 there. How a32re you' | tr -d '1-9' 358 | Hi there. How are you 359 | 360 | $ # delete all punctuation characters 361 | $ echo '"Foo1!", "Bar.", ":Baz:"' | tr -d '[:punct:]' 362 | Foo1 Bar Baz 363 | 364 | $ # deleting carriage return character 365 | $ cat -v greeting.txt 366 | Hi there^M 367 | How are you^M 368 | $ tr -d '\r' < greeting.txt | cat -v 369 | Hi there 370 | How are you 371 | 372 | $ # retain only alphabets, comma and newline characters 373 | $ echo '"Foo1!", "Bar.", ":Baz:"' | tr -cd '[:alpha:],\n' 374 | Foo,Bar,Baz 375 | ``` 376 | 377 |
378 | 379 | #### squeeze 380 | 381 | * to change consecutive repeated characters to a single copy of that character 382 | 383 | ```bash 384 | $ # only lower case alphabets 385 | $ echo 'FFoo seed 11233' | tr -s 'a-z' 386 | FFo sed 11233 387 | 388 | $ # alphabets and digits 389 | $ echo 'FFoo seed 11233' | tr -s '[:alnum:]' 390 | Fo sed 123 391 | 392 | $ # squeeze other than alphabets 393 | $ echo 'FFoo seed 11233' | tr -sc '[:alpha:]' 394 | FFoo seed 123 395 | 396 | $ # only characters present in the second argument are used for squeeze 397 | $ echo 'FFoo seed 11233' | tr -s 'A-Z' 'a-z' 398 | fo sed 11233 399 | 400 | $ # multiple consecutive horizontal spaces to single space 401 | $ printf 'foo\t\tbar \t123 baz\n' 402 | foo bar 123 baz 403 | $ printf 'foo\t\tbar \t123 baz\n' | tr -s '[:blank:]' ' ' 404 | foo bar 123 baz 405 | ``` 406 | 407 | <br/>
408 | 409 | #### Further reading for tr 410 | 411 | * `man tr` and `info tr` for more options and detailed documentation 412 | * [tr Q&A on unix stackexchange](http://unix.stackexchange.com/questions/tagged/tr?sort=votes&pageSize=15) 413 | 414 |
415 | 416 | ## basename 417 | 418 | ```bash 419 | $ basename --version | head -n1 420 | basename (GNU coreutils) 8.25 421 | 422 | $ man basename 423 | BASENAME(1) User Commands BASENAME(1) 424 | 425 | NAME 426 | basename - strip directory and suffix from filenames 427 | 428 | SYNOPSIS 429 | basename NAME [SUFFIX] 430 | basename OPTION... NAME... 431 | 432 | DESCRIPTION 433 | Print NAME with any leading directory components removed. If speci‐ 434 | fied, also remove a trailing SUFFIX. 435 | ... 436 | ``` 437 | 438 |
439 | 440 | **Examples** 441 | 442 | ```bash 443 | $ # same as using pwd command 444 | $ echo "$PWD" 445 | /home/learnbyexample 446 | 447 | $ basename "$PWD" 448 | learnbyexample 449 | 450 | $ # use -a option if there are multiple arguments 451 | $ basename -a foo/a/report.log bar/y/power.log 452 | report.log 453 | power.log 454 | 455 | $ # use single quotes if arguments contain space and other special shell characters 456 | $ # use suffix option -s to strip file extension from filename 457 | $ basename -s '.log' '/home/learnbyexample/proj adder/power.log' 458 | power 459 | $ # -a is implied when using -s option 460 | $ basename -s'.log' foo/a/report.log bar/y/power.log 461 | report 462 | power 463 | ``` 464 | 465 | * Can also use [Parameter expansion](http://mywiki.wooledge.org/BashFAQ/073) if working on file paths saved in variables 466 | * assumes `bash` shell and similar that support this feature 467 | 468 | ```bash 469 | $ # remove from start of string up to last / 470 | $ file='/home/learnbyexample/proj adder/power.log' 471 | $ basename "$file" 472 | power.log 473 | $ echo "${file##*/}" 474 | power.log 475 | 476 | $ t="${file##*/}" 477 | $ # remove .log from end of string 478 | $ echo "${t%.log}" 479 | power 480 | ``` 481 | 482 | * See `man basename` and `info basename` for detailed documentation 483 | 484 |
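A typical scripting use of `basename` is stripping a known extension inside a loop. A sketch on throwaway files (names made up for illustration):

```bash
dir=$(mktemp -d)
touch "$dir/power.log" "$dir/report.log"

for f in "$dir"/*.log; do
    # directory and the .log suffix are stripped, leaving just the stem
    basename -s .log "$f"
done
# power
# report
```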
485 | 486 | ## dirname 487 | 488 | ```bash 489 | $ dirname --version | head -n1 490 | dirname (GNU coreutils) 8.25 491 | 492 | $ man dirname 493 | DIRNAME(1) User Commands DIRNAME(1) 494 | 495 | NAME 496 | dirname - strip last component from file name 497 | 498 | SYNOPSIS 499 | dirname [OPTION] NAME... 500 | 501 | DESCRIPTION 502 | Output each NAME with its last non-slash component and trailing slashes 503 | removed; if NAME contains no /'s, output '.' (meaning the current 504 | directory). 505 | ... 506 | ``` 507 | 508 |
509 | 510 | **Examples** 511 | 512 | ```bash 513 | $ echo "$PWD" 514 | /home/learnbyexample 515 | 516 | $ dirname "$PWD" 517 | /home 518 | 519 | $ # use single quotes if arguments contain space and other special shell characters 520 | $ dirname '/home/learnbyexample/proj adder/power.log' 521 | /home/learnbyexample/proj adder 522 | 523 | $ # unlike basename, by default dirname handles multiple arguments 524 | $ dirname foo/a/report.log bar/y/power.log 525 | foo/a 526 | bar/y 527 | 528 | $ # if no / in argument, output is . to indicate current directory 529 | $ dirname power.log 530 | . 531 | ``` 532 | 533 | * Use `$()` command substitution to further process output as needed 534 | 535 | ```bash 536 | $ dirname '/home/learnbyexample/proj adder/power.log' 537 | /home/learnbyexample/proj adder 538 | 539 | $ dirname "$(dirname '/home/learnbyexample/proj adder/power.log')" 540 | /home/learnbyexample 541 | 542 | $ basename "$(dirname '/home/learnbyexample/proj adder/power.log')" 543 | proj adder 544 | ``` 545 | 546 | * Can also use [Parameter expansion](http://mywiki.wooledge.org/BashFAQ/073) if working on file paths saved in variables 547 | * assumes `bash` shell and similar that support this feature 548 | 549 | ```bash 550 | $ # remove from last / in the string to end of string 551 | $ file='/home/learnbyexample/proj adder/power.log' 552 | $ dirname "$file" 553 | /home/learnbyexample/proj adder 554 | $ echo "${file%/*}" 555 | /home/learnbyexample/proj adder 556 | 557 | $ # remove from second last / to end of string 558 | $ echo "${file%/*/*}" 559 | /home/learnbyexample 560 | 561 | $ # apply basename trick to get just directory name instead of full path 562 | $ t="${file%/*}" 563 | $ echo "${t##*/}" 564 | proj adder 565 | ``` 566 | 567 | * See `man dirname` and `info dirname` for detailed documentation 568 | 569 |
570 | 571 | ## xargs 572 | 573 | ```bash 574 | $ xargs --version | head -n1 575 | xargs (GNU findutils) 4.7.0-git 576 | 577 | $ whatis xargs 578 | xargs (1) - build and execute command lines from standard input 579 | 580 | $ # from 'man xargs' 581 | This manual page documents the GNU version of xargs. xargs reads items 582 | from the standard input, delimited by blanks (which can be protected 583 | with double or single quotes or a backslash) or newlines, and executes 584 | the command (default is /bin/echo) one or more times with any initial- 585 | arguments followed by items read from standard input. Blank lines on 586 | the standard input are ignored. 587 | ``` 588 | 589 | While `xargs` is [primarily used](https://unix.stackexchange.com/questions/24954/when-is-xargs-needed) for passing the output of a command or the contents of a file to another command as input arguments, and for parallel processing, it can be quite handy for certain text processing tasks with the default `echo` command 590 | 591 | ```bash 592 | $ printf ' foo\t\tbar \t123 baz \n' | cat -e 593 | foo bar 123 baz $ 594 | $ # tr helps to change consecutive blanks to single space 595 | $ # but what if blanks at start and end have to be removed as well?
596 | $ printf ' foo\t\tbar \t123 baz \n' | tr -s '[:blank:]' ' ' | cat -e 597 | foo bar 123 baz $ 598 | $ # xargs does this by default 599 | $ printf ' foo\t\tbar \t123 baz \n' | xargs | cat -e 600 | foo bar 123 baz$ 601 | 602 | $ # -n option limits number of arguments per line 603 | $ printf ' foo\t\tbar \t123 baz \n' | xargs -n2 604 | foo bar 605 | 123 baz 606 | 607 | $ # same as using: paste -d' ' - - - 608 | $ # or: pr -3ats' ' 609 | $ seq 6 | xargs -n3 610 | 1 2 3 611 | 4 5 6 612 | ``` 613 | 614 | * use `-a` option to specify file input instead of stdin 615 | 616 | ```bash 617 | $ cat marks.txt 618 | jan 2017 619 | foobar 12 45 23 620 | feb 2017 621 | foobar 18 38 19 622 | 623 | $ xargs -a marks.txt 624 | jan 2017 foobar 12 45 23 feb 2017 foobar 18 38 19 625 | 626 | $ # use -L option to limit max number of lines per command line 627 | $ xargs -L2 -a marks.txt 628 | jan 2017 foobar 12 45 23 629 | feb 2017 foobar 18 38 19 630 | ``` 631 | 632 | * **Note** since `echo` is the command being executed, it can cause issues with option interpretation (`-e` below is consumed as an option instead of being printed) 633 | 634 | ```bash 635 | $ printf ' -e foo\t\tbar \t123 baz \n' | xargs -n2 636 | foo 637 | bar 123 638 | baz 639 | 640 | $ # use -t option to see what is happening (verbose output) 641 | $ printf ' -e foo\t\tbar \t123 baz \n' | xargs -n2 -t 642 | echo -e foo 643 | foo 644 | echo bar 123 645 | bar 123 646 | echo baz 647 | baz 648 | ``` 649 | 650 | * See `man xargs` and `info xargs` for detailed documentation 651 | 652 | <br/>
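One way around the `echo` option-interpretation issue noted above is to give `xargs` an explicit command instead of relying on the default; `printf`, for example, treats the items purely as data to substitute. A sketch:

```bash
# '-e' is substituted into %s instead of being parsed as an option
printf ' -e foo\t\tbar \t123 baz \n' | xargs -n2 printf '%s %s\n'
```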
653 | 654 | ## seq 655 | 656 | ```bash 657 | $ seq --version | head -n1 658 | seq (GNU coreutils) 8.25 659 | 660 | $ man seq 661 | SEQ(1) User Commands SEQ(1) 662 | 663 | NAME 664 | seq - print a sequence of numbers 665 | 666 | SYNOPSIS 667 | seq [OPTION]... LAST 668 | seq [OPTION]... FIRST LAST 669 | seq [OPTION]... FIRST INCREMENT LAST 670 | 671 | DESCRIPTION 672 | Print numbers from FIRST to LAST, in steps of INCREMENT. 673 | ... 674 | ``` 675 | 676 |
677 | 678 | #### integer sequences 679 | 680 | * see `info seq` for details of how large numbers are handled 681 | * for ex: `seq 50000000000000000000 2 50000000000000000004` may not work 682 | 683 | ```bash 684 | $ # default start=1 and increment=1 685 | $ seq 3 686 | 1 687 | 2 688 | 3 689 | 690 | $ # default increment=1 691 | $ seq 25434 25437 692 | 25434 693 | 25435 694 | 25436 695 | 25437 696 | $ seq -5 -3 697 | -5 698 | -4 699 | -3 700 | 701 | $ # different increment value 702 | $ seq 1000 5 1011 703 | 1000 704 | 1005 705 | 1010 706 | 707 | $ # use negative increment for descending order 708 | $ seq 10 -5 -7 709 | 10 710 | 5 711 | 0 712 | -5 713 | ``` 714 | 715 | * use `-w` option for leading zeros 716 | * largest length of start/end value is used to determine padding 717 | 718 | ```bash 719 | $ seq 008 010 720 | 8 721 | 9 722 | 10 723 | 724 | $ # or: seq -w 8 010 725 | $ seq -w 008 010 726 | 008 727 | 009 728 | 010 729 | 730 | $ seq -w 0003 731 | 0001 732 | 0002 733 | 0003 734 | ``` 735 | 736 |
737 | 738 | #### specifying separator 739 | 740 | * As seen already, the default separator between numbers is the newline character 741 | * The `-s` option lets you specify a custom string between numbers 742 | * A newline is always added at the end 743 | 744 | ```bash 745 | $ seq -s: 4 746 | 1:2:3:4 747 | 748 | $ seq -s' ' 4 749 | 1 2 3 4 750 | 751 | $ seq -s' - ' 4 752 | 1 - 2 - 3 - 4 753 | ``` 754 | 755 |

756 | 757 | #### floating point sequences 758 | 759 | ```bash 760 | $ # default increment=1 761 | $ seq 0.5 2.5 762 | 0.5 763 | 1.5 764 | 2.5 765 | 766 | $ seq -s':' -2 0.75 3 767 | -2.00:-1.25:-0.50:0.25:1.00:1.75:2.50 768 | 769 | $ # Scientific notation is supported 770 | $ seq 1.2e2 1.22e2 771 | 120 772 | 121 773 | 122 774 | ``` 775 | 776 | * formatting numbers, see `info seq` for details 777 | 778 | ```bash 779 | $ seq -f'%.3f' -s':' -2 0.75 3 780 | -2.000:-1.250:-0.500:0.250:1.000:1.750:2.500 781 | 782 | $ seq -f'%.3e' 1.2e2 1.22e2 783 | 1.200e+02 784 | 1.210e+02 785 | 1.220e+02 786 | ``` 787 | 788 |
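* the format string isn't restricted to a bare number — text around the single `%` directive is passed through for every value (the `point_` prefix here is illustrative):

```bash
# text around the % directive is emitted as-is for each number
seq -f 'point_%.1f' 0.5 0.5 1.5
# point_0.5
# point_1.0
# point_1.5
```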
789 | 790 | #### Further reading for seq 791 | 792 | * `man seq` and `info seq` for more options, corner cases and detailed documentation 793 | * [seq Q&A on unix stackexchange](https://unix.stackexchange.com/questions/tagged/seq?sort=votes&pageSize=15) 794 | -------------------------------------------------------------------------------- /overview_presentation/baz.json: -------------------------------------------------------------------------------- 1 | { 2 | "abc": { 3 | "@attr": "good", 4 | "text": "Hi there" 5 | }, 6 | "xyz": { 7 | "@attr": "bad", 8 | "text": "I am good. How are you?" 9 | } 10 | } 11 | -------------------------------------------------------------------------------- /overview_presentation/cli_text_processing.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/learnbyexample/Command-line-text-processing/ce56c851f078469736bbe51a6938c21cc934022e/overview_presentation/cli_text_processing.pdf -------------------------------------------------------------------------------- /overview_presentation/foo.xml: -------------------------------------------------------------------------------- 1 | 2 | Hi there 3 | I am good. How are you? 4 | 5 | -------------------------------------------------------------------------------- /overview_presentation/greeting.txt: -------------------------------------------------------------------------------- 1 | Hi there 2 | Have a nice day 3 | -------------------------------------------------------------------------------- /overview_presentation/sample.txt: -------------------------------------------------------------------------------- 1 | Hello World! 2 | 3 | Good day 4 | How do you do? 5 | 6 | Just do it 7 | Believe 42 it! 
8 | 9 | Today is sunny 10 | Not a bit funny 11 | No doubt you like it too 12 | 13 | Much ado about nothing 14 | He he 123 he he 15 | -------------------------------------------------------------------------------- /restructure_text.md: -------------------------------------------------------------------------------- 1 | # Restructure text 2 | 3 | **Table of Contents** 4 | 5 | * [paste](#paste) 6 | * [Concatenating files column wise](#concatenating-files-column-wise) 7 | * [Interleaving lines](#interleaving-lines) 8 | * [Lines to multiple columns](#lines-to-multiple-columns) 9 | * [Different delimiters between columns](#different-delimiters-between-columns) 10 | * [Multiple lines to single row](#multiple-lines-to-single-row) 11 | * [Further reading for paste](#further-reading-for-paste) 12 | * [column](#column) 13 | * [Pretty printing tables](#pretty-printing-tables) 14 | * [Specifying different input delimiter](#specifying-different-input-delimiter) 15 | * [Further reading for column](#further-reading-for-column) 16 | * [pr](#pr) 17 | * [Converting lines to columns](#converting-lines-to-columns) 18 | * [Changing PAGE_WIDTH](#changing-page_width) 19 | * [Combining multiple input files](#combining-multiple-input-files) 20 | * [Transposing a table](#transposing-a-table) 21 | * [Further reading for pr](#further-reading-for-pr) 22 | * [fold](#fold) 23 | * [Examples](#examples) 24 | * [Further reading for fold](#further-reading-for-fold) 25 | 26 |
27 | 28 | ## paste 29 | 30 | ```bash 31 | $ paste --version | head -n1 32 | paste (GNU coreutils) 8.25 33 | 34 | $ man paste 35 | PASTE(1) User Commands PASTE(1) 36 | 37 | NAME 38 | paste - merge lines of files 39 | 40 | SYNOPSIS 41 | paste [OPTION]... [FILE]... 42 | 43 | DESCRIPTION 44 | Write lines consisting of the sequentially corresponding lines from 45 | each FILE, separated by TABs, to standard output. 46 | 47 | With no FILE, or when FILE is -, read standard input. 48 | ... 49 | ``` 50 | 51 |
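* to follow along, the sample files used in this section can be recreated as below (contents inferred from the outputs shown in later examples):

```bash
# recreate colors_1.txt and colors_2.txt used throughout this section
printf 'Blue\nBrown\nPurple\nRed\nTeal\n' > colors_1.txt
printf 'Black\nBlue\nGreen\nRed\nWhite\n' > colors_2.txt
```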
52 | 53 | #### Concatenating files column wise 54 | 55 | * By default, `paste` adds a TAB between corresponding lines of input files 56 | 57 | ```bash 58 | $ paste colors_1.txt colors_2.txt 59 | Blue Black 60 | Brown Blue 61 | Purple Green 62 | Red Red 63 | Teal White 64 | ``` 65 | 66 | * Specifying a different delimiter using `-d` 67 | * The `<()` syntax is [Process Substitution](http://mywiki.wooledge.org/ProcessSubstitution) 68 | * to put it simply - allows output of command to be passed as input file to another command without needing to manually create a temporary file 69 | 70 | ```bash 71 | $ paste -d, <(seq 5) <(seq 6 10) 72 | 1,6 73 | 2,7 74 | 3,8 75 | 4,9 76 | 5,10 77 | 78 | $ # empty cells if number of lines is not same for all input files 79 | $ # -d\| can also be used 80 | $ paste -d'|' <(seq 3) <(seq 4 6) <(seq 7 10) 81 | 1|4|7 82 | 2|5|8 83 | 3|6|9 84 | ||10 85 | ``` 86 | 87 | * to paste without any character in between, use `\0` as delimiter 88 | * note that `\0` here doesn't mean the ASCII NUL character 89 | * can also use `-d ''` with `GNU paste` 90 | 91 | ```bash 92 | $ paste -d'\0' <(seq 3) <(seq 6 8) 93 | 16 94 | 27 95 | 38 96 | ``` 97 | 98 |
99 | 100 | #### Interleaving lines 101 | 102 | * Interleave lines by using newline as delimiter 103 | 104 | ```bash 105 | $ paste -d'\n' <(seq 11 13) <(seq 101 103) 106 | 11 107 | 101 108 | 12 109 | 102 110 | 13 111 | 103 112 | ``` 113 | 114 |
115 | 116 | #### Lines to multiple columns 117 | 118 | * Number of `-` specified determines number of output columns 119 | * Input lines can be passed only as stdin 120 | 121 | ```bash 122 | $ # single column to two columns 123 | $ seq 10 | paste -d, - - 124 | 1,2 125 | 3,4 126 | 5,6 127 | 7,8 128 | 9,10 129 | 130 | $ # single column to five columns 131 | $ seq 10 | paste -d: - - - - - 132 | 1:2:3:4:5 133 | 6:7:8:9:10 134 | 135 | $ # input redirection for file input 136 | $ paste -d, - - < colors_1.txt 137 | Blue,Brown 138 | Purple,Red 139 | Teal, 140 | ``` 141 | 142 | * Use `printf` trick if number of columns to specify is too large 143 | 144 | ```bash 145 | $ # prompt at end of line not shown for simplicity 146 | $ printf -- "- %.s" {1..5} 147 | - - - - - 148 | 149 | $ seq 10 | paste -d, $(printf -- "- %.s" {1..5}) 150 | 1,2,3,4,5 151 | 6,7,8,9,10 152 | ``` 153 | 154 |
155 | 156 | #### Different delimiters between columns 157 | 158 | * For more than 2 columns, different delimiter character can be specified - passed as list to `-d` option 159 | 160 | ```bash 161 | $ # , is used between 1st and 2nd column 162 | $ # - is used between 2nd and 3rd column 163 | $ paste -d',-' <(seq 3) <(seq 4 6) <(seq 7 9) 164 | 1,4-7 165 | 2,5-8 166 | 3,6-9 167 | 168 | $ # re-use list from beginning if not specified for all columns 169 | $ paste -d',-' <(seq 3) <(seq 4 6) <(seq 7 9) <(seq 10 12) 170 | 1,4-7,10 171 | 2,5-8,11 172 | 3,6-9,12 173 | $ # another example 174 | $ seq 10 | paste -d':,' - - - - - 175 | 1:2,3:4,5 176 | 6:7,8:9,10 177 | 178 | $ # so, with single delimiter, it is just re-used for all columns 179 | $ paste -d, <(seq 3) <(seq 4 6) <(seq 7 9) <(seq 10 12) 180 | 1,4,7,10 181 | 2,5,8,11 182 | 3,6,9,12 183 | ``` 184 | 185 | * combination of `-d` and `/dev/null` (empty file) can give multi-character separation between columns 186 | * If this is too confusing to use, consider [pr](#pr) instead 187 | 188 | ```bash 189 | $ paste -d' : ' <(seq 3) /dev/null /dev/null <(seq 4 6) /dev/null /dev/null <(seq 7 9) 190 | 1 : 4 : 7 191 | 2 : 5 : 8 192 | 3 : 6 : 9 193 | 194 | $ # or just use pr instead 195 | $ pr -mts' : ' <(seq 3) <(seq 4 6) <(seq 7 9) 196 | 1 : 4 : 7 197 | 2 : 5 : 8 198 | 3 : 6 : 9 199 | 200 | $ # but paste would allow different delimiters ;) 201 | $ paste -d' : - ' <(seq 3) /dev/null /dev/null <(seq 4 6) /dev/null /dev/null <(seq 7 9) 202 | 1 : 4 - 7 203 | 2 : 5 - 8 204 | 3 : 6 - 9 205 | 206 | $ # pr would need two invocations 207 | $ pr -mts' : ' <(seq 3) <(seq 4 6) | pr -mts' - ' - <(seq 7 9) 208 | 1 : 4 - 7 209 | 2 : 5 - 8 210 | 3 : 6 - 9 211 | ``` 212 | 213 | * example to show using empty file instead of `/dev/null` 214 | 215 | ```bash 216 | $ # assuming file named e doesn't exist 217 | $ touch e 218 | $ # or use this, will empty contents even if file named e already exists :P 219 | $ > e 220 | 221 | $ paste -d' : - ' <(seq 3) 
e e <(seq 4 6) e e <(seq 7 9) 222 | 1 : 4 - 7 223 | 2 : 5 - 8 224 | 3 : 6 - 9 225 | ``` 226 | 227 |
228 | 229 | #### Multiple lines to single row 230 | 231 | ```bash 232 | $ paste -sd, colors_1.txt 233 | Blue,Brown,Purple,Red,Teal 234 | 235 | $ # multiple files each gets a row 236 | $ paste -sd: colors_1.txt colors_2.txt 237 | Blue:Brown:Purple:Red:Teal 238 | Black:Blue:Green:Red:White 239 | 240 | $ # multiple input files need not have same number of lines 241 | $ paste -sd, <(seq 3) <(seq 5 9) 242 | 1,2,3 243 | 5,6,7,8,9 244 | ``` 245 | 246 | * Often used to serialize multiple line output from another command 247 | 248 | ```bash 249 | $ sort -u colors_1.txt colors_2.txt | paste -sd, 250 | Black,Blue,Brown,Green,Purple,Red,Teal,White 251 | ``` 252 | 253 | * For multiple character delimiter, post-process if separator is unique or use another tool like `perl` 254 | 255 | ```bash 256 | $ seq 10 | paste -sd, 257 | 1,2,3,4,5,6,7,8,9,10 258 | 259 | $ # post-process 260 | $ seq 10 | paste -sd, | sed 's/,/ : /g' 261 | 1 : 2 : 3 : 4 : 5 : 6 : 7 : 8 : 9 : 10 262 | 263 | $ # using perl alone 264 | $ seq 10 | perl -pe 's/\n/ : / if(!eof)' 265 | 1 : 2 : 3 : 4 : 5 : 6 : 7 : 8 : 9 : 10 266 | ``` 267 | 268 |
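* an `awk` alternative for the multi-character join, in case `perl` isn't at hand (a sketch, not the only way):

```bash
# print the separator before every line except the first,
# then add the final newline in the END block
seq 5 | awk '{printf "%s%s", (NR>1 ? " : " : ""), $0} END{print ""}'
# 1 : 2 : 3 : 4 : 5
```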
269 | 270 | #### Further reading for paste 271 | 272 | * `man paste` and `info paste` for more options and detailed documentation 273 | * [paste Q&A on unix stackexchange](https://unix.stackexchange.com/questions/tagged/paste?sort=votes&pageSize=15) 274 | 275 |
276 | 277 | ## column 278 | 279 | ```bash 280 | COLUMN(1) BSD General Commands Manual COLUMN(1) 281 | 282 | NAME 283 | column — columnate lists 284 | 285 | SYNOPSIS 286 | column [-entx] [-c columns] [-s sep] [file ...] 287 | 288 | DESCRIPTION 289 | The column utility formats its input into multiple columns. Rows are 290 | filled before columns. Input is taken from file operands, or, by 291 | default, from the standard input. Empty lines are ignored unless the -e 292 | option is used. 293 | ... 294 | ``` 295 | 296 |
297 | 298 | #### Pretty printing tables 299 | 300 | * by default whitespace is input delimiter 301 | 302 | ```bash 303 | $ cat dishes.txt 304 | North alootikki baati khichdi makkiroti poha 305 | South appam bisibelebath dosa koottu sevai 306 | West dhokla khakhra modak shiro vadapav 307 | East handoguri litti momo rosgulla shondesh 308 | 309 | $ column -t dishes.txt 310 | North alootikki baati khichdi makkiroti poha 311 | South appam bisibelebath dosa koottu sevai 312 | West dhokla khakhra modak shiro vadapav 313 | East handoguri litti momo rosgulla shondesh 314 | ``` 315 | 316 | * often useful to get neatly aligned columns from output of another command 317 | 318 | ```bash 319 | $ paste fruits.txt price.txt 320 | Fruits Price 321 | apple 182 322 | guava 90 323 | watermelon 35 324 | banana 72 325 | pomegranate 280 326 | 327 | $ paste fruits.txt price.txt | column -t 328 | Fruits Price 329 | apple 182 330 | guava 90 331 | watermelon 35 332 | banana 72 333 | pomegranate 280 334 | ``` 335 | 336 |
337 | 338 | #### Specifying different input delimiter 339 | 340 | * Use `-s` to specify input delimiter 341 | * Use `-n` to prevent merging empty cells 342 | * From `man column` "This option is a Debian GNU/Linux extension" 343 | 344 | ```bash 345 | $ paste -d, <(seq 3) <(seq 5 9) <(seq 11 13) 346 | 1,5,11 347 | 2,6,12 348 | 3,7,13 349 | ,8, 350 | ,9, 351 | 352 | $ paste -d, <(seq 3) <(seq 5 9) <(seq 11 13) | column -s, -t 353 | 1 5 11 354 | 2 6 12 355 | 3 7 13 356 | 8 357 | 9 358 | 359 | $ paste -d, <(seq 3) <(seq 5 9) <(seq 11 13) | column -s, -nt 360 | 1 5 11 361 | 2 6 12 362 | 3 7 13 363 | 8 364 | 9 365 | ``` 366 | 367 |
368 | 369 | #### Further reading for column 370 | 371 | * `man column` for more options and detailed documentation 372 | * [column Q&A on unix stackexchange](https://unix.stackexchange.com/questions/tagged/columns?sort=votes&pageSize=15) 373 | * More examples [here](http://www.commandlinefu.com/commands/using/column/sort-by-votes) 374 | 375 |
376 | 377 | ## pr 378 | 379 | ```bash 380 | $ pr --version | head -n1 381 | pr (GNU coreutils) 8.25 382 | 383 | $ man pr 384 | PR(1) User Commands PR(1) 385 | 386 | NAME 387 | pr - convert text files for printing 388 | 389 | SYNOPSIS 390 | pr [OPTION]... [FILE]... 391 | 392 | DESCRIPTION 393 | Paginate or columnate FILE(s) for printing. 394 | 395 | With no FILE, or when FILE is -, read standard input. 396 | ... 397 | ``` 398 | 399 | * `Paginate` is not covered, examples related only to `columnate` 400 | * For example, default invocation on a file would add a header, etc 401 | 402 | ```bash 403 | $ # truncated output shown 404 | $ pr fruits.txt 405 | 406 | 407 | 2017-04-21 17:49 fruits.txt Page 1 408 | 409 | 410 | Fruits 411 | apple 412 | guava 413 | watermelon 414 | banana 415 | pomegranate 416 | 417 | ``` 418 | 419 | * Following sections will use `-t` to omit page headers and trailers 420 | 421 |
422 | 423 | #### Converting lines to columns 424 | 425 | * With [paste](#lines-to-multiple-columns), changing input file rows to column(s) is possible only with consecutive lines 426 | * `pr` can do that as well as split entire file itself according to number of columns needed 427 | * And `-s` option in `pr` allows multi-character output delimiter 428 | * As usual, examples to better show the functionalities 429 | 430 | ```bash 431 | $ # note how the input got split into two and resulting splits joined by , 432 | $ seq 6 | pr -2ts, 433 | 1,4 434 | 2,5 435 | 3,6 436 | 437 | $ # note how two consecutive lines gets joined by , 438 | $ seq 6 | paste -d, - - 439 | 1,2 440 | 3,4 441 | 5,6 442 | ``` 443 | 444 | * Default **PAGE_WIDTH** is 72 characters, so each column gets 72 divided by number of columns unless `-s` is used 445 | 446 | ```bash 447 | $ # 3 columns, so each column width is 24 characters 448 | $ seq 9 | pr -3t 449 | 1 4 7 450 | 2 5 8 451 | 3 6 9 452 | 453 | $ # using -s, desired delimiter can be specified 454 | $ seq 9 | pr -3ts' ' 455 | 1 4 7 456 | 2 5 8 457 | 3 6 9 458 | 459 | $ seq 9 | pr -3ts' : ' 460 | 1 : 4 : 7 461 | 2 : 5 : 8 462 | 3 : 6 : 9 463 | 464 | $ # default is TAB when using -s option with no arguments 465 | $ seq 9 | pr -3ts 466 | 1 4 7 467 | 2 5 8 468 | 3 6 9 469 | ``` 470 | 471 | * Using `-a` to change consecutive rows, similar to `paste` 472 | 473 | ```bash 474 | $ seq 8 | pr -4ats: 475 | 1:2:3:4 476 | 5:6:7:8 477 | 478 | $ # no output delimiter for empty cells 479 | $ seq 22 | pr -5ats, 480 | 1,2,3,4,5 481 | 6,7,8,9,10 482 | 11,12,13,14,15 483 | 16,17,18,19,20 484 | 21,22 485 | 486 | $ # note output delimiter even for empty cells 487 | $ seq 22 | paste -d, - - - - - 488 | 1,2,3,4,5 489 | 6,7,8,9,10 490 | 11,12,13,14,15 491 | 16,17,18,19,20 492 | 21,22,,, 493 | ``` 494 | 495 |
496 | 497 | #### Changing PAGE_WIDTH 498 | 499 | * The default PAGE_WIDTH is 72 500 | * The formula `(col-1)*len(delimiter) + col` seems to work in determining minimum PAGE_WIDTH required for multiple column output 501 | * `col` is number of columns required 502 | 503 | ```bash 504 | $ # (36-1)*1 + 36 = 71, so within PAGE_WIDTH limit 505 | $ seq 74 | pr -36ats, 506 | 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36 507 | 37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72 508 | 73,74 509 | $ # (37-1)*1 + 37 = 73, more than default PAGE_WIDTH limit 510 | $ seq 74 | pr -37ats, 511 | pr: page width too narrow 512 | ``` 513 | 514 | * Use `-w` to specify a different PAGE_WIDTH 515 | * The `-J` option turns off truncation 516 | 517 | ```bash 518 | $ # (37-1)*1 + 37 = 73 519 | $ seq 74 | pr -J -w73 -37ats, 520 | 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37 521 | 38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74 522 | 523 | $ # (3-1)*4 + 3 = 11 524 | $ seq 6 | pr -J -w10 -3ats'::::' 525 | pr: page width too narrow 526 | $ seq 6 | pr -J -w11 -3ats'::::' 527 | 1::::2::::3 528 | 4::::5::::6 529 | 530 | $ # if calculating is difficult, simply use a large number 531 | $ seq 6 | pr -J -w500 -3ats'::::' 532 | 1::::2::::3 533 | 4::::5::::6 534 | ``` 535 | 536 |
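* a quick check of the same formula with a two character delimiter:

```bash
# (4-1)*2 + 4 = 10, so -w10 should be just wide enough
# for 4 columns joined by ::
seq 8 | pr -J -w10 -4ats'::'
# 1::2::3::4
# 5::6::7::8
```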
537 | 538 | #### Combining multiple input files 539 | 540 | * Use `-m` option to combine multiple files in parallel, similar to `paste` 541 | 542 | ```bash 543 | $ # 2 columns, so each column width is 36 characters 544 | $ pr -mt fruits.txt price.txt 545 | Fruits Price 546 | apple 182 547 | guava 90 548 | watermelon 35 549 | banana 72 550 | pomegranate 280 551 | 552 | $ # default is TAB when using -s option with no arguments 553 | $ pr -mts <(seq 3) <(seq 4 6) <(seq 7 10) 554 | 1 4 7 555 | 2 5 8 556 | 3 6 9 557 | 10 558 | 559 | $ # double TAB as separator 560 | $ # shell expands $'\t\t' before command is executed 561 | $ pr -mts$'\t\t' colors_1.txt colors_2.txt 562 | Blue Black 563 | Brown Blue 564 | Purple Green 565 | Red Red 566 | Teal White 567 | ``` 568 | 569 | * For interleaving, specify newline as separator 570 | 571 | ```bash 572 | $ pr -mts$'\n' fruits.txt price.txt 573 | Fruits 574 | Price 575 | apple 576 | 182 577 | guava 578 | 90 579 | watermelon 580 | 35 581 | banana 582 | 72 583 | pomegranate 584 | 280 585 | ``` 586 | 587 |
588 | 589 | #### Transposing a table 590 | 591 | ```bash 592 | $ # delimiter is single character, so easy to use tr to change it to newline 593 | $ cat dishes.txt 594 | North alootikki baati khichdi makkiroti poha 595 | South appam bisibelebath dosa koottu sevai 596 | West dhokla khakhra modak shiro vadapav 597 | East handoguri litti momo rosgulla shondesh 598 | 599 | $ # 4 columns, so each column width is 18 characters 600 | $ # $(wc -l < dishes.txt) gives number of columns required 601 | $ tr ' ' '\n' < dishes.txt | pr -$(wc -l < dishes.txt)t 602 | North South West East 603 | alootikki appam dhokla handoguri 604 | baati bisibelebath khakhra litti 605 | khichdi dosa modak momo 606 | makkiroti koottu shiro rosgulla 607 | poha sevai vadapav shondesh 608 | ``` 609 | 610 | * Pipe the output to `column` if spacing is too much 611 | 612 | ```bash 613 | $ tr ' ' '\n' < dishes.txt | pr -$(wc -l < dishes.txt)t | column -t 614 | North South West East 615 | alootikki appam dhokla handoguri 616 | baati bisibelebath khakhra litti 617 | khichdi dosa modak momo 618 | makkiroti koottu shiro rosgulla 619 | poha sevai vadapav shondesh 620 | ``` 621 | 622 |
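* an `awk` sketch doing the same transpose in a single pass — assumes every row has the same number of fields:

```bash
# append each field to its column's output row, then print the rows
printf '1 2 3\n4 5 6\n' |
  awk '{for(i=1;i<=NF;i++) a[i] = a[i] (NR>1 ? " " : "") $i}
       END{for(i=1;i<=NF;i++) print a[i]}'
# 1 4
# 2 5
# 3 6
```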
623 | 624 | #### Further reading for pr 625 | 626 | * `man pr` and `info pr` for more options and detailed documentation 627 | * More examples [here](http://docstore.mik.ua/orelly/unix3/upt/ch21_15.htm) 628 | 629 |
630 | 631 | ## fold 632 | 633 | ```bash 634 | $ fold --version | head -n1 635 | fold (GNU coreutils) 8.25 636 | 637 | $ man fold 638 | FOLD(1) User Commands FOLD(1) 639 | 640 | NAME 641 | fold - wrap each input line to fit in specified width 642 | 643 | SYNOPSIS 644 | fold [OPTION]... [FILE]... 645 | 646 | DESCRIPTION 647 | Wrap input lines in each FILE, writing to standard output. 648 | 649 | With no FILE, or when FILE is -, read standard input. 650 | ... 651 | ``` 652 | 653 |
654 | 655 | #### Examples 656 | 657 | ```bash 658 | $ nl story.txt 659 | 1 The princess of a far away land fought bravely to rescue a travelling group from bandits. And the happy story ends here. Have a nice day. 660 | 2 Still here? okay, read on: The prince of Happalakkahuhu wished he could be as brave as his sister and vowed to train harder 661 | 662 | $ # default folding width is 80 663 | $ fold story.txt 664 | The princess of a far away land fought bravely to rescue a travelling group from 665 | bandits. And the happy story ends here. Have a nice day. 666 | Still here? okay, read on: The prince of Happalakkahuhu wished he could be as br 667 | ave as his sister and vowed to train harder 668 | 669 | $ fold story.txt | nl 670 | 1 The princess of a far away land fought bravely to rescue a travelling group from 671 | 2 bandits. And the happy story ends here. Have a nice day. 672 | 3 Still here? okay, read on: The prince of Happalakkahuhu wished he could be as br 673 | 4 ave as his sister and vowed to train harder 674 | ``` 675 | 676 | * `-s` option breaks at spaces to avoid word splitting 677 | 678 | ```bash 679 | $ fold -s story.txt 680 | The princess of a far away land fought bravely to rescue a travelling group 681 | from bandits. And the happy story ends here. Have a nice day. 682 | Still here? okay, read on: The prince of Happalakkahuhu wished he could be as 683 | brave as his sister and vowed to train harder 684 | ``` 685 | 686 | * Use `-w` to change default width 687 | 688 | ```bash 689 | $ fold -s -w60 story.txt 690 | The princess of a far away land fought bravely to rescue a 691 | travelling group from bandits. And the happy story ends 692 | here. Have a nice day. 693 | Still here? okay, read on: The prince of Happalakkahuhu 694 | wished he could be as brave as his sister and vowed to 695 | train harder 696 | ``` 697 | 698 |
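* `-w1` is a handy special case — it splits input into one character per line:

```bash
# one character per line
printf 'fold' | fold -w1
# f
# o
# l
# d
```

Piping this to `sort | uniq -c` gives a quick character frequency count.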
699 | 700 | #### Further reading for fold 701 | 702 | * `man fold` and `info fold` for more options and detailed documentation 703 | 704 | -------------------------------------------------------------------------------- /sorting_stuff.md: -------------------------------------------------------------------------------- 1 | # Sorting stuff 2 | 3 | **Table of Contents** 4 | 5 | * [sort](#sort) 6 | * [Default sort](#default-sort) 7 | * [Reverse sort](#reverse-sort) 8 | * [Various number sorting](#various-number-sorting) 9 | * [Random sort](#random-sort) 10 | * [Specifying output file](#specifying-output-file) 11 | * [Unique sort](#unique-sort) 12 | * [Column based sorting](#column-based-sorting) 13 | * [Further reading for sort](#further-reading-for-sort) 14 | * [uniq](#uniq) 15 | * [Default uniq](#default-uniq) 16 | * [Only duplicates](#only-duplicates) 17 | * [Only unique](#only-unique) 18 | * [Prefix count](#prefix-count) 19 | * [Ignoring case](#ignoring-case) 20 | * [Combining multiple files](#combining-multiple-files) 21 | * [Column options](#column-options) 22 | * [Further reading for uniq](#further-reading-for-uniq) 23 | * [comm](#comm) 24 | * [Default three column output](#default-three-column-output) 25 | * [Suppressing columns](#suppressing-columns) 26 | * [Files with duplicates](#files-with-duplicates) 27 | * [Further reading for comm](#further-reading-for-comm) 28 | * [shuf](#shuf) 29 | * [Random lines](#random-lines) 30 | * [Random integer numbers](#random-integer-numbers) 31 | * [Further reading for shuf](#further-reading-for-shuf) 32 | 33 |
34 | 35 | ## sort 36 | 37 | ```bash 38 | $ sort --version | head -n1 39 | sort (GNU coreutils) 8.25 40 | 41 | $ man sort 42 | SORT(1) User Commands SORT(1) 43 | 44 | NAME 45 | sort - sort lines of text files 46 | 47 | SYNOPSIS 48 | sort [OPTION]... [FILE]... 49 | sort [OPTION]... --files0-from=F 50 | 51 | DESCRIPTION 52 | Write sorted concatenation of all FILE(s) to standard output. 53 | 54 | With no FILE, or when FILE is -, read standard input. 55 | ... 56 | ``` 57 | 58 | **Note**: All examples shown here assumes ASCII encoded input file 59 | 60 | 61 |
62 | 63 | #### Default sort 64 | 65 | ```bash 66 | $ cat poem.txt 67 | Roses are red, 68 | Violets are blue, 69 | Sugar is sweet, 70 | And so are you. 71 | 72 | $ sort poem.txt 73 | And so are you. 74 | Roses are red, 75 | Sugar is sweet, 76 | Violets are blue, 77 | ``` 78 | 79 | * Well, that was easy. The lines were sorted alphabetically (ascending order by default) and it so happened that first letter alone was enough to decide the order 80 | * For next example, let's extract all the words and sort them 81 | * also allows to showcase `sort` accepting stdin 82 | * See [GNU grep](./gnu_grep.md) chapter if the `grep` command used below looks alien 83 | 84 | ```bash 85 | $ # output might differ depending on locale settings 86 | $ # note the case-insensitiveness of output 87 | $ grep -oi '[a-z]*' poem.txt | sort 88 | And 89 | are 90 | are 91 | are 92 | blue 93 | is 94 | red 95 | Roses 96 | so 97 | Sugar 98 | sweet 99 | Violets 100 | you 101 | ``` 102 | 103 | * heed hereunto 104 | * See also 105 | * [arch wiki - locale](https://wiki.archlinux.org/index.php/locale) 106 | * [Linux: Define Locale and Language Settings](https://www.shellhacks.com/linux-define-locale-language-settings/) 107 | 108 | ```bash 109 | $ info sort | tail 110 | 111 | (1) If you use a non-POSIX locale (e.g., by setting ‘LC_ALL’ to 112 | ‘en_US’), then ‘sort’ may produce output that is sorted differently than 113 | you’re accustomed to. In that case, set the ‘LC_ALL’ environment 114 | variable to ‘C’. Note that setting only ‘LC_COLLATE’ has two problems. 115 | First, it is ineffective if ‘LC_ALL’ is also set. Second, it has 116 | undefined behavior if ‘LC_CTYPE’ (or ‘LANG’, if ‘LC_CTYPE’ is unset) is 117 | set to an incompatible value. For example, you get undefined behavior 118 | if ‘LC_CTYPE’ is ‘ja_JP.PCK’ but ‘LC_COLLATE’ is ‘en_US.UTF-8’. 
119 | ``` 120 | 121 | * Example to help show effect of locale setting 122 | 123 | ```bash 124 | $ # note how uppercase is sorted before lowercase 125 | $ grep -oi '[a-z]*' poem.txt | LC_ALL=C sort 126 | And 127 | Roses 128 | Sugar 129 | Violets 130 | are 131 | are 132 | are 133 | blue 134 | is 135 | red 136 | so 137 | sweet 138 | you 139 | ``` 140 | 141 |
142 | 143 | #### Reverse sort 144 | 145 | * This is simply reversing from default ascending order to descending order 146 | 147 | ```bash 148 | $ sort -r poem.txt 149 | Violets are blue, 150 | Sugar is sweet, 151 | Roses are red, 152 | And so are you. 153 | ``` 154 | 155 |
156 | 157 | #### Various number sorting 158 | 159 | ```bash 160 | $ cat numbers.txt 161 | 20 162 | 53 163 | 3 164 | 101 165 | 166 | $ sort numbers.txt 167 | 101 168 | 20 169 | 3 170 | 53 171 | ``` 172 | 173 | * Whoops, what happened there? `sort` won't know to treat them as numbers unless specified 174 | * Depending on format of numbers, different options have to be used 175 | * First up is `-n` option, which sorts based on numerical value 176 | 177 | ```bash 178 | $ sort -n numbers.txt 179 | 3 180 | 20 181 | 53 182 | 101 183 | 184 | $ sort -nr numbers.txt 185 | 101 186 | 53 187 | 20 188 | 3 189 | ``` 190 | 191 | * The `-n` option can handle negative numbers 192 | * As well as thousands separator and decimal point (depends on locale) 193 | * The `<()` syntax is [Process Substitution](http://mywiki.wooledge.org/ProcessSubstitution) 194 | * to put it simply - allows output of command to be passed as input file to another command without needing to manually create a temporary file 195 | 196 | ```bash 197 | $ # multiple files are merged as single input by default 198 | $ sort -n numbers.txt <(echo '-4') 199 | -4 200 | 3 201 | 20 202 | 53 203 | 101 204 | 205 | $ sort -n numbers.txt <(echo '1,234') 206 | 3 207 | 20 208 | 53 209 | 101 210 | 1,234 211 | 212 | $ sort -n numbers.txt <(echo '31.24') 213 | 3 214 | 20 215 | 31.24 216 | 53 217 | 101 218 | ``` 219 | 220 | * Use `-g` if input contains numbers prefixed by `+` or [E scientific notation](https://en.wikipedia.org/wiki/Scientific_notation#E_notation) 221 | 222 | ```bash 223 | $ cat generic_numbers.txt 224 | +120 225 | -1.53 226 | 3.14e+4 227 | 42.1e-2 228 | 229 | $ sort -g generic_numbers.txt 230 | -1.53 231 | 42.1e-2 232 | +120 233 | 3.14e+4 234 | ``` 235 | 236 | * Commands like `du` have options to display numbers in human readable formats 237 | * `sort` supports sorting such numbers using the `-h` option 238 | 239 | ```bash 240 | $ du -sh * 241 | 104K power.log 242 | 746M projects 243 | 316K report.log 244 | 20K 
sample.txt 245 | $ du -sh * | sort -h 246 | 20K sample.txt 247 | 104K power.log 248 | 316K report.log 249 | 746M projects 250 | 251 | $ # --si uses powers of 1000 instead of 1024 252 | $ du -s --si * 253 | 107k power.log 254 | 782M projects 255 | 324k report.log 256 | 21k sample.txt 257 | $ du -s --si * | sort -h 258 | 21k sample.txt 259 | 107k power.log 260 | 324k report.log 261 | 782M projects 262 | ``` 263 | 264 | * Version sort - dealing with numbers mixed with other characters 265 | * If this sorting is needed simply while displaying directory contents, use `ls -v` instead of piping to `sort -V` 266 | 267 | ```bash 268 | $ cat versions.txt 269 | foo_v1.2 270 | bar_v2.1.3 271 | foobar_v2 272 | foo_v1.2.1 273 | foo_v1.3 274 | 275 | $ sort -V versions.txt 276 | bar_v2.1.3 277 | foobar_v2 278 | foo_v1.2 279 | foo_v1.2.1 280 | foo_v1.3 281 | ``` 282 | 283 | * Another common use case is when there are multiple filenames differentiated by numbers 284 | 285 | ```bash 286 | $ cat files.txt 287 | file0 288 | file10 289 | file3 290 | file4 291 | 292 | $ sort -V files.txt 293 | file0 294 | file3 295 | file4 296 | file10 297 | ``` 298 | 299 | * Can be used when dealing with numbers reported by `time` command as well 300 | 301 | ```bash 302 | $ # different solving durations 303 | $ cat rubik_time.txt 304 | 5m35.363s 305 | 3m20.058s 306 | 4m5.099s 307 | 4m1.130s 308 | 3m42.833s 309 | 4m33.083s 310 | 311 | $ # assuming consistent min/sec format 312 | $ sort -V rubik_time.txt 313 | 3m20.058s 314 | 3m42.833s 315 | 4m1.130s 316 | 4m5.099s 317 | 4m33.083s 318 | 5m35.363s 319 | ``` 320 | 321 |
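* not shown above: the `-c` option only *checks* whether input is already sorted, reporting the first out-of-order line on stderr and via the exit status:

```bash
# exit status 0 means input is already sorted
seq 5 | sort -nc && echo 'sorted'
# sorted

# non-zero exit status for unsorted input (error message goes to stderr)
printf '2\n1\n' | sort -nc 2>/dev/null || echo 'not sorted'
# not sorted
```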
322 | 323 | #### Random sort 324 | 325 | * Note that duplicate lines will always end up next to each other 326 | * might be useful as a feature for some cases ;) 327 | * Use `shuf` if this is not desirable 328 | * See also [How can I shuffle the lines of a text file on the Unix command line or in a shell script?](https://stackoverflow.com/questions/2153882/how-can-i-shuffle-the-lines-of-a-text-file-on-the-unix-command-line-or-in-a-shel) 329 | 330 | ```bash 331 | $ cat nums.txt 332 | 1 333 | 10 334 | 10 335 | 12 336 | 23 337 | 563 338 | 339 | $ # the two 10s will always be next to each other 340 | $ sort -R nums.txt 341 | 563 342 | 12 343 | 1 344 | 10 345 | 10 346 | 23 347 | 348 | $ # duplicates can end up anywhere 349 | $ shuf nums.txt 350 | 10 351 | 23 352 | 1 353 | 10 354 | 563 355 | 12 356 | ``` 357 | 358 |
359 | 360 | #### Specifying output file 361 | 362 | * The `-o` option can be used to specify the output file 363 | * Useful for in place editing 364 | 365 | ```bash 366 | $ sort -R nums.txt -o rand_nums.txt 367 | $ cat rand_nums.txt 368 | 23 369 | 1 370 | 10 371 | 10 372 | 563 373 | 12 374 | 375 | $ sort -R nums.txt -o nums.txt 376 | $ cat nums.txt 377 | 563 378 | 23 379 | 10 380 | 10 381 | 1 382 | 12 383 | ``` 384 | 385 | * Use a shell loop if there are multiple files to be sorted in place 386 | * Below snippet is for `bash` shell 387 | 388 | ```bash 389 | $ for f in *.txt; do echo sort -V "$f" -o "$f"; done 390 | sort -V files.txt -o files.txt 391 | sort -V rubik_time.txt -o rubik_time.txt 392 | sort -V versions.txt -o versions.txt 393 | 394 | $ # remove echo once commands look fine 395 | $ for f in *.txt; do sort -V "$f" -o "$f"; done 396 | ``` 397 | 398 |

399 | 400 | #### Unique sort 401 | 402 | * Keep only first copy of lines that are deemed to be same according to `sort` option used 403 | 404 | ```bash 405 | $ cat duplicates.txt 406 | foo 407 | 12 carrots 408 | foo 409 | 12 apples 410 | 5 guavas 411 | 412 | $ # only one copy of foo in output 413 | $ sort -u duplicates.txt 414 | 12 apples 415 | 12 carrots 416 | 5 guavas 417 | foo 418 | ``` 419 | 420 | * According to option used, definition of duplicate will vary 421 | * For example, when `-n` is used, matching numbers are deemed same even if rest of line differs 422 | * Pipe the output to `uniq` if this is not desirable 423 | 424 | ```bash 425 | $ # note how first copy of line starting with 12 is retained 426 | $ sort -nu duplicates.txt 427 | foo 428 | 5 guavas 429 | 12 carrots 430 | 431 | $ # use uniq when entire line should be compared to find duplicates 432 | $ sort -n duplicates.txt | uniq 433 | foo 434 | 5 guavas 435 | 12 apples 436 | 12 carrots 437 | ``` 438 | 439 | * Use `-f` option to ignore case of alphabets while determining duplicates 440 | 441 | ```bash 442 | $ cat words.txt 443 | CAR 444 | are 445 | car 446 | Are 447 | foot 448 | are 449 | 450 | $ # only the two 'are' were considered duplicates 451 | $ sort -u words.txt 452 | are 453 | Are 454 | car 455 | CAR 456 | foot 457 | 458 | $ # note again that first copy of duplicate is retained 459 | $ sort -fu words.txt 460 | are 461 | CAR 462 | foot 463 | ``` 464 | 465 |
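* a closely related idiom — `sort | uniq -c | sort -rn` builds a frequency table (the trailing `awk` here only squeezes the leading spaces that `uniq -c` adds):

```bash
# count duplicates, then order by count descending
printf 'foo\nbar\nfoo\nfoo\nbar\n' | sort | uniq -c | sort -rn |
  awk '{print $1, $2}'
# 3 foo
# 2 bar
```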
466 | 467 | #### Column based sorting 468 | 469 | From `info sort` 470 | 471 | ``` 472 | ‘-k POS1[,POS2]’ 473 | ‘--key=POS1[,POS2]’ 474 | Specify a sort field that consists of the part of the line between 475 | POS1 and POS2 (or the end of the line, if POS2 is omitted), 476 | _inclusive_. 477 | 478 | Each POS has the form ‘F[.C][OPTS]’, where F is the number of the 479 | field to use, and C is the number of the first character from the 480 | beginning of the field. Fields and character positions are 481 | numbered starting with 1; a character position of zero in POS2 482 | indicates the field’s last character. If ‘.C’ is omitted from 483 | POS1, it defaults to 1 (the beginning of the field); if omitted 484 | from POS2, it defaults to 0 (the end of the field). OPTS are 485 | ordering options, allowing individual keys to be sorted according 486 | to different rules; see below for details. Keys can span multiple 487 | fields. 488 | ``` 489 | 490 | * By default, blank characters (space and tab) serve as field separators 491 | 492 | ```bash 493 | $ cat fruits.txt 494 | apple 42 495 | guava 6 496 | fig 90 497 | banana 31 498 | 499 | $ sort fruits.txt 500 | apple 42 501 | banana 31 502 | fig 90 503 | guava 6 504 | 505 | $ # sort based on 2nd column numbers 506 | $ sort -k2,2n fruits.txt 507 | guava 6 508 | banana 31 509 | apple 42 510 | fig 90 511 | ``` 512 | 513 | * Using a different field separator 514 | * Consider the following sample input file having fields separated by `:` 515 | 516 | ```bash 517 | $ # name:pet_name:no_of_pets 518 | $ cat pets.txt 519 | foo:dog:2 520 | xyz:cat:1 521 | baz:parrot:5 522 | abcd:cat:3 523 | joe:dog:1 524 | bar:fox:1 525 | temp_var:squirrel:4 526 | boss:dog:10 527 | ``` 528 | 529 | * Sorting based on particular column or column to end of line 530 | * In case of multiple entries, by default `sort` would use content of remaining parts of line to resolve 531 | 532 | ```bash 533 | $ # only 2nd column 534 | $ # -k2,4 would mean 2nd column to 
4th column 535 | $ sort -t: -k2,2 pets.txt 536 | abcd:cat:3 537 | xyz:cat:1 538 | boss:dog:10 539 | foo:dog:2 540 | joe:dog:1 541 | bar:fox:1 542 | baz:parrot:5 543 | temp_var:squirrel:4 544 | 545 | $ # from 2nd column to end of line 546 | $ sort -t: -k2 pets.txt 547 | xyz:cat:1 548 | abcd:cat:3 549 | joe:dog:1 550 | boss:dog:10 551 | foo:dog:2 552 | bar:fox:1 553 | baz:parrot:5 554 | temp_var:squirrel:4 555 | ``` 556 | 557 | * Multiple keys can be specified to resolve ties 558 | * Note that if there are still multiple entries with specified keys, remaining parts of lines would be used 559 | 560 | ```bash 561 | $ # default sort for 2nd column, numeric sort on 3rd column to resolve ties 562 | $ sort -t: -k2,2 -k3,3n pets.txt 563 | xyz:cat:1 564 | abcd:cat:3 565 | joe:dog:1 566 | foo:dog:2 567 | boss:dog:10 568 | bar:fox:1 569 | baz:parrot:5 570 | temp_var:squirrel:4 571 | 572 | $ # numeric sort on 3rd column, default sort for 2nd column to resolve ties 573 | $ sort -t: -k3,3n -k2,2 pets.txt 574 | xyz:cat:1 575 | joe:dog:1 576 | bar:fox:1 577 | foo:dog:2 578 | abcd:cat:3 579 | temp_var:squirrel:4 580 | baz:parrot:5 581 | boss:dog:10 582 | ``` 583 | 584 | * Use `-s` option to retain original order of lines in case of tie 585 | 586 | ```bash 587 | $ sort -s -t: -k2,2 pets.txt 588 | xyz:cat:1 589 | abcd:cat:3 590 | foo:dog:2 591 | joe:dog:1 592 | boss:dog:10 593 | bar:fox:1 594 | baz:parrot:5 595 | temp_var:squirrel:4 596 | ``` 597 | 598 | * The `-u` option, as seen earlier, will retain only first match 599 | 600 | ```bash 601 | $ sort -u -t: -k2,2 pets.txt 602 | xyz:cat:1 603 | foo:dog:2 604 | bar:fox:1 605 | baz:parrot:5 606 | temp_var:squirrel:4 607 | 608 | $ sort -u -t: -k3,3n pets.txt 609 | xyz:cat:1 610 | foo:dog:2 611 | abcd:cat:3 612 | temp_var:squirrel:4 613 | baz:parrot:5 614 | boss:dog:10 615 | ``` 616 | 617 | * Sometimes, the input has to be sorted first and then `-u` used on the sorted output 618 | * See also [remove duplicates based on the value of another 
column](https://unix.stackexchange.com/questions/379835/remove-duplicates-based-on-the-value-of-another-column) 619 | 620 | ```bash 621 | $ # sort by number in 3rd column 622 | $ sort -t: -k3,3n pets.txt 623 | bar:fox:1 624 | joe:dog:1 625 | xyz:cat:1 626 | foo:dog:2 627 | abcd:cat:3 628 | temp_var:squirrel:4 629 | baz:parrot:5 630 | boss:dog:10 631 | 632 | $ # then get unique entry based on 2nd column 633 | $ sort -t: -k3,3n pets.txt | sort -t: -u -k2,2 634 | xyz:cat:1 635 | joe:dog:1 636 | bar:fox:1 637 | baz:parrot:5 638 | temp_var:squirrel:4 639 | ``` 640 | 641 | * Specifying particular characters within fields 642 | * If character position is not specified, defaults to `1` for starting column and `0` (last character) for ending column 643 | 644 | ```bash 645 | $ cat marks.txt 646 | fork,ap_12,54 647 | flat,up_342,1.2 648 | fold,tn_48,211 649 | more,ap_93,7 650 | rest,up_5,63 651 | 652 | $ # for 2nd column, sort numerically only from 4th character to end 653 | $ sort -t, -k2.4,2n marks.txt 654 | rest,up_5,63 655 | fork,ap_12,54 656 | fold,tn_48,211 657 | more,ap_93,7 658 | flat,up_342,1.2 659 | 660 | $ # sort uniquely based on first two characters of line 661 | $ sort -u -k1.1,1.2 marks.txt 662 | flat,up_342,1.2 663 | fork,ap_12,54 664 | more,ap_93,7 665 | rest,up_5,63 666 | ``` 667 | 668 | * If there are headers 669 | 670 | ```bash 671 | $ cat header.txt 672 | fruit qty 673 | apple 42 674 | guava 6 675 | fig 90 676 | banana 31 677 | 678 | $ # separate and combine header and content to be sorted 679 | $ cat <(head -n1 header.txt) <(tail -n +2 header.txt | sort -k2nr) 680 | fruit qty 681 | fig 90 682 | apple 42 683 | banana 31 684 | guava 6 685 | ``` 686 | 687 | * See also [sort by last field value when number of fields varies](https://stackoverflow.com/questions/3832068/bash-sort-text-file-by-last-field-value) 688 | 689 |
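* The header trick as a runnable script — the csv file name and data here are made up, and `-t,` handles the comma-separated fields

```bash
# made-up csv input with a header row
printf 'name,score\nbob,7\neve,42\nal,19\n' > /tmp/scores.csv

# keep the header as-is; reverse numeric sort the body on the 2nd field
cat <(head -n1 /tmp/scores.csv) \
    <(tail -n +2 /tmp/scores.csv | sort -t, -k2,2nr) > /tmp/scores_sorted.csv
cat /tmp/scores_sorted.csv
```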
690 | 691 | #### Further reading for sort 692 | 693 | * There are many other options apart from handful presented above. See `man sort` and `info sort` for detailed documentation and more examples 694 | * [sort like a master](http://www.skorks.com/2010/05/sort-files-like-a-master-with-the-linux-sort-command-bash/) 695 | * [When -b to ignore leading blanks is needed](https://unix.stackexchange.com/a/104527/109046) 696 | * [sort Q&A on unix stackexchange](https://unix.stackexchange.com/questions/tagged/sort?sort=votes&pageSize=15) 697 | * [sort on multiple columns using -k option](https://unix.stackexchange.com/questions/249452/unix-multiple-column-sort-issue) 698 | * [sort a string character wise](https://stackoverflow.com/questions/2373874/how-to-sort-characters-in-a-string) 699 | * [Scalability of 'sort -u' for gigantic files](https://unix.stackexchange.com/questions/279096/scalability-of-sort-u-for-gigantic-files) 700 | 701 |
702 | 703 | ## uniq 704 | 705 | ```bash 706 | $ uniq --version | head -n1 707 | uniq (GNU coreutils) 8.25 708 | 709 | $ man uniq 710 | UNIQ(1) User Commands UNIQ(1) 711 | 712 | NAME 713 | uniq - report or omit repeated lines 714 | 715 | SYNOPSIS 716 | uniq [OPTION]... [INPUT [OUTPUT]] 717 | 718 | DESCRIPTION 719 | Filter adjacent matching lines from INPUT (or standard input), writing 720 | to OUTPUT (or standard output). 721 | 722 | With no options, matching lines are merged to the first occurrence. 723 | ... 724 | ``` 725 | 726 |
727 | 728 | #### Default uniq 729 | 730 | ```bash 731 | $ cat word_list.txt 732 | are 733 | are 734 | to 735 | good 736 | bad 737 | bad 738 | bad 739 | good 740 | are 741 | bad 742 | 743 | $ # adjacent duplicate lines are removed, leaving one copy 744 | $ uniq word_list.txt 745 | are 746 | to 747 | good 748 | bad 749 | good 750 | are 751 | bad 752 | 753 | $ # To remove duplicates from entire file, input has to be sorted first 754 | $ # also showcases that uniq accepts stdin as input 755 | $ sort word_list.txt | uniq 756 | are 757 | bad 758 | good 759 | to 760 | ``` 761 | 762 |
763 | 764 | #### Only duplicates 765 | 766 | ```bash 767 | $ # duplicates adjacent to each other 768 | $ uniq -d word_list.txt 769 | are 770 | bad 771 | 772 | $ # duplicates in entire file 773 | $ sort word_list.txt | uniq -d 774 | are 775 | bad 776 | good 777 | ``` 778 | 779 | * To get only duplicate lines, with every copy shown, use the `-D` option 780 | 781 | ```bash 782 | $ uniq -D word_list.txt 783 | are 784 | are 785 | bad 786 | bad 787 | bad 788 | 789 | $ sort word_list.txt | uniq -D 790 | are 791 | are 792 | are 793 | bad 794 | bad 795 | bad 796 | bad 797 | good 798 | good 799 | ``` 800 | 801 | * To distinguish the different groups 802 | 803 | ```bash 804 | $ # using --all-repeated=prepend will add a newline before the first group as well 805 | $ sort word_list.txt | uniq --all-repeated=separate 806 | are 807 | are 808 | are 809 | 810 | bad 811 | bad 812 | bad 813 | bad 814 | 815 | good 816 | good 817 | ``` 818 | 819 | <br>
820 | 821 | #### Only unique 822 | 823 | ```bash 824 | $ # lines with no adjacent duplicates 825 | $ uniq -u word_list.txt 826 | to 827 | good 828 | good 829 | are 830 | bad 831 | 832 | $ # unique lines in entire file 833 | $ sort word_list.txt | uniq -u 834 | to 835 | ``` 836 | 837 |
838 | 839 | #### Prefix count 840 | 841 | ```bash 842 | $ # adjacent lines 843 | $ uniq -c word_list.txt 844 | 2 are 845 | 1 to 846 | 1 good 847 | 3 bad 848 | 1 good 849 | 1 are 850 | 1 bad 851 | 852 | $ # entire file 853 | $ sort word_list.txt | uniq -c 854 | 3 are 855 | 4 bad 856 | 2 good 857 | 1 to 858 | 859 | $ # entire file, only duplicates 860 | $ sort word_list.txt | uniq -cd 861 | 3 are 862 | 4 bad 863 | 2 good 864 | ``` 865 | 866 | * Sorting by count 867 | 868 | ```bash 869 | $ # sort by count 870 | $ sort word_list.txt | uniq -c | sort -n 871 | 1 to 872 | 2 good 873 | 3 are 874 | 4 bad 875 | 876 | $ # reverse the order, highest count first 877 | $ sort word_list.txt | uniq -c | sort -nr 878 | 4 bad 879 | 3 are 880 | 2 good 881 | 1 to 882 | ``` 883 | 884 | * To get only entries with min/max count, bit of [awk](./gnu_awk.md) magic would help 885 | 886 | ```bash 887 | $ # consider this result 888 | $ sort colors.txt | uniq -c | sort -nr 889 | 3 Red 890 | 3 Blue 891 | 2 Yellow 892 | 1 Green 893 | 1 Black 894 | 895 | $ # to get all max count 896 | $ # save 1st line 1st column value to c and then print if 1st column equals c 897 | $ sort colors.txt | uniq -c | sort -nr | awk 'NR==1{c=$1} $1==c' 898 | 3 Red 899 | 3 Blue 900 | $ # to get all min count 901 | $ sort colors.txt | uniq -c | sort -n | awk 'NR==1{c=$1} $1==c' 902 | 1 Black 903 | 1 Green 904 | ``` 905 | 906 | * Get rough count of most used commands from `history` file 907 | 908 | ```bash 909 | $ # awk '{print $1}' will get the 1st column alone 910 | $ awk '{print $1}' "$HISTFILE" | sort | uniq -c | sort -nr | head 911 | 1465 echo 912 | 1180 grep 913 | 552 cd 914 | 531 awk 915 | 451 sed 916 | 423 vi 917 | 418 cat 918 | 392 perl 919 | 325 printf 920 | 320 sort 921 | 922 | $ # extract command name from start of line or preceded by 'spaces|spaces' 923 | $ # won't catch commands in other places like command substitution though 924 | $ grep -oP '(^| +\| +)\K[^ ]+' "$HISTFILE" | sort | uniq -c | sort -nr | 
head 925 | 2006 grep 926 | 1469 echo 927 | 933 sed 928 | 698 awk 929 | 552 cd 930 | 513 perl 931 | 510 cat 932 | 453 sort 933 | 423 vi 934 | 327 printf 935 | ``` 936 | 937 |
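* The same `sort | uniq -c | sort -nr` pattern gives a quick word frequency table; here the text is generated inline (any file would work in its place)

```bash
# one word per line, count occurrences, highest count first
printf 'the cat and the dog and the bird\n' |
    tr ' ' '\n' | sort | uniq -c | sort -nr
```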
938 | 939 | #### Ignoring case 940 | 941 | ```bash 942 | $ cat another_list.txt 943 | food 944 | Food 945 | good 946 | are 947 | bad 948 | Are 949 | 950 | $ # note how first copy is retained 951 | $ uniq -i another_list.txt 952 | food 953 | good 954 | are 955 | bad 956 | Are 957 | 958 | $ uniq -iD another_list.txt 959 | food 960 | Food 961 | ``` 962 | 963 |
964 | 965 | #### Combining multiple files 966 | 967 | ```bash 968 | $ sort -f word_list.txt another_list.txt | uniq -i 969 | are 970 | bad 971 | food 972 | good 973 | to 974 | 975 | $ sort -f word_list.txt another_list.txt | uniq -c 976 | 4 are 977 | 1 Are 978 | 5 bad 979 | 1 food 980 | 1 Food 981 | 3 good 982 | 1 to 983 | 984 | $ sort -f word_list.txt another_list.txt | uniq -ic 985 | 5 are 986 | 5 bad 987 | 2 food 988 | 3 good 989 | 1 to 990 | ``` 991 | 992 | * If only adjacent duplicates are required (i.e. without sorting), the files have to be concatenated using another command 993 | 994 | ```bash 995 | $ uniq -id word_list.txt 996 | are 997 | bad 998 | 999 | $ uniq -id another_list.txt 1000 | food 1001 | 1002 | $ cat word_list.txt another_list.txt | uniq -id 1003 | are 1004 | bad 1005 | food 1006 | ``` 1007 | 1008 | <br>
1009 | 1010 | #### Column options 1011 | 1012 | * `uniq` has a few options dealing with column manipulation. They are not as extensive as `sort -k`, but handy for some cases 1013 | * First up, skipping fields 1014 | * No option to specify a different delimiter 1015 | * From `info uniq`: Fields are sequences of non-space non-tab characters that are separated from each other by at least one space or tab 1016 | * The number of spaces/tabs between fields should be the same 1017 | 1018 | ```bash 1019 | $ cat shopping.txt 1020 | lemon 5 1021 | mango 5 1022 | banana 8 1023 | bread 1 1024 | orange 5 1025 | 1026 | $ # skips first field 1027 | $ uniq -f1 shopping.txt 1028 | lemon 5 1029 | banana 8 1030 | bread 1 1031 | orange 5 1032 | 1033 | $ # use -f3 to skip first three fields and so on 1034 | ``` 1035 | 1036 | * Skipping characters 1037 | 1038 | ```bash 1039 | $ cat text 1040 | glue 1041 | blue 1042 | black 1043 | stack 1044 | stuck 1045 | 1046 | $ # don't consider first 2 characters 1047 | $ uniq -s2 text 1048 | glue 1049 | black 1050 | stuck 1051 | 1052 | $ # to visualize the above example 1053 | $ # assume there are two fields and uniq is applied on 2nd column 1054 | $ sed 's/^../& /' text 1055 | gl ue 1056 | bl ue 1057 | bl ack 1058 | st ack 1059 | st uck 1060 | ``` 1061 | 1062 | * Up to specified characters 1063 | 1064 | ```bash 1065 | $ # consider only first 2 characters 1066 | $ uniq -w2 text 1067 | glue 1068 | blue 1069 | stack 1070 | 1071 | $ # to visualize the above example 1072 | $ # assume there are two fields and uniq is applied on 1st column 1073 | $ sed 's/^../& /' text 1074 | gl ue 1075 | bl ue 1076 | bl ack 1077 | st ack 1078 | st uck 1079 | ``` 1080 | 1081 | * Combining `-s` and `-w` 1082 | * Can be combined with `-f` as well 1083 | 1084 | ```bash 1085 | $ # skip first 3 characters and then use next 2 characters 1086 | $ uniq -s3 -w2 text 1087 | glue 1088 | black 1089 | ``` 1090 | 1091 | 1092 | <br>
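* A typical use of `-f` is ignoring a leading timestamp while collapsing repeated log messages (the log lines below are made up, with consistent single-space separation)

```bash
# skip the 1st field (timestamp), so consecutive identical messages collapse
printf '09:01 disk full\n09:02 disk full\n09:03 net down\n' | uniq -f1
```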
1093 | 1094 | #### Further reading for uniq 1095 | 1096 | * Do check out `man uniq` and `info uniq` for other options and more detailed documentation 1097 | * [uniq Q&A on unix stackexchange](http://unix.stackexchange.com/questions/tagged/uniq?sort=votes&pageSize=15) 1098 | * [process duplicate lines only based on certain fields](https://unix.stackexchange.com/questions/387590/print-the-duplicate-lines-only-on-fields-1-2-from-csv-file) 1099 | 1100 |
1101 | 1102 | ## comm 1103 | 1104 | ```bash 1105 | $ comm --version | head -n1 1106 | comm (GNU coreutils) 8.25 1107 | 1108 | $ man comm 1109 | COMM(1) User Commands COMM(1) 1110 | 1111 | NAME 1112 | comm - compare two sorted files line by line 1113 | 1114 | SYNOPSIS 1115 | comm [OPTION]... FILE1 FILE2 1116 | 1117 | DESCRIPTION 1118 | Compare sorted files FILE1 and FILE2 line by line. 1119 | 1120 | When FILE1 or FILE2 (not both) is -, read standard input. 1121 | 1122 | With no options, produce three-column output. Column one contains 1123 | lines unique to FILE1, column two contains lines unique to FILE2, and 1124 | column three contains lines common to both files. 1125 | ... 1126 | ``` 1127 | 1128 |
1129 | 1130 | #### Default three column output 1131 | 1132 | Consider below sample input files 1133 | 1134 | ```bash 1135 | $ # sorted input files viewed side by side 1136 | $ paste colors_1.txt colors_2.txt 1137 | Blue Black 1138 | Brown Blue 1139 | Purple Green 1140 | Red Red 1141 | Teal White 1142 | Yellow 1143 | ``` 1144 | 1145 | * Without any option, `comm` gives 3 column output 1146 | * lines unique to first file 1147 | * lines unique to second file 1148 | * lines common to both files 1149 | 1150 | ```bash 1151 | $ comm colors_1.txt colors_2.txt 1152 | Black 1153 | Blue 1154 | Brown 1155 | Green 1156 | Purple 1157 | Red 1158 | Teal 1159 | White 1160 | Yellow 1161 | ``` 1162 | 1163 |
1164 | 1165 | #### Suppressing columns 1166 | 1167 | * `-1` suppress lines unique to first file 1168 | * `-2` suppress lines unique to second file 1169 | * `-3` suppress lines common to both files 1170 | 1171 | ```bash 1172 | $ # suppressing column 3 1173 | $ comm -3 colors_1.txt colors_2.txt 1174 | Black 1175 | Brown 1176 | Green 1177 | Purple 1178 | Teal 1179 | White 1180 | Yellow 1181 | ``` 1182 | 1183 | * Combining options gives three distinct and useful constructs 1184 | * First, getting only the lines common to both files 1185 | 1186 | ```bash 1187 | $ comm -12 colors_1.txt colors_2.txt 1188 | Blue 1189 | Red 1190 | ``` 1191 | 1192 | * Second, lines unique to first file 1193 | 1194 | ```bash 1195 | $ comm -23 colors_1.txt colors_2.txt 1196 | Brown 1197 | Purple 1198 | Teal 1199 | Yellow 1200 | ``` 1201 | 1202 | * And the third, lines unique to second file 1203 | 1204 | ```bash 1205 | $ comm -13 colors_1.txt colors_2.txt 1206 | Black 1207 | Green 1208 | White 1209 | ``` 1210 | 1211 | * See also how the above three cases can be done [using grep alone](./gnu_grep.md#search-strings-from-file) 1212 | * **Note** input files do not need to be sorted for the `grep` solution 1213 | 1214 | If a `sort` order different from the default is required, use `--nocheck-order` to suppress the error message 1215 | 1216 | ```bash 1217 | $ comm -23 <(sort -n numbers.txt) <(sort -n nums.txt) 1218 | 3 1219 | comm: file 1 is not in sorted order 1220 | 20 1221 | 53 1222 | 101 1223 | 1224 | $ comm --nocheck-order -23 <(sort -n numbers.txt) <(sort -n nums.txt) 1225 | 3 1226 | 20 1227 | 53 1228 | 101 1229 | ``` 1230 | 1231 | <br>
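* The three constructs act as set operations (intersection and differences); a self-contained sketch with made-up data, sorting the inputs first as `comm` requires

```bash
printf 'c\na\nb\n' | sort > /tmp/set1.txt
printf 'd\nb\nc\n' | sort > /tmp/set2.txt

comm -12 /tmp/set1.txt /tmp/set2.txt    # intersection
comm -23 /tmp/set1.txt /tmp/set2.txt    # lines only in first
```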
1232 | 1233 | #### Files with duplicates 1234 | 1235 | * Duplicate lines are matched up pairwise — as many copies as are present in both files will be considered common 1236 | * The rest will be unique to their respective files 1237 | * This is useful for cases like finding lines present in the first file but not in the second, taking the count of duplicates into consideration as well 1238 | * Such a solution won't be possible with `grep` 1239 | 1240 | ```bash 1241 | $ paste list1 list2 1242 | a a 1243 | a b 1244 | a c 1245 | b c 1246 | b d 1247 | c 1248 | 1249 | $ comm list1 list2 1250 | a 1251 | a 1252 | a 1253 | b 1254 | b 1255 | c 1256 | c 1257 | d 1258 | 1259 | $ comm -23 list1 list2 1260 | a 1261 | a 1262 | b 1263 | ``` 1264 | 1265 | <br>
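* The multiset behavior can be reproduced inline — each copy in the first file must be matched by its own copy in the second to count as common

```bash
# three a's vs one a: two a's remain unique to the first file
printf 'a\na\na\nb\n' > /tmp/m1.txt
printf 'a\nb\nb\n' > /tmp/m2.txt
comm -23 /tmp/m1.txt /tmp/m2.txt
```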
1266 | 1267 | #### Further reading for comm 1268 | 1269 | * `man comm` and `info comm` for more options and detailed documentation 1270 | * [comm Q&A on unix stackexchange](http://unix.stackexchange.com/questions/tagged/comm?sort=votes&pageSize=15) 1271 | 1272 |
1273 | 1274 | ## shuf 1275 | 1276 | ```bash 1277 | $ shuf --version | head -n1 1278 | shuf (GNU coreutils) 8.25 1279 | 1280 | $ man shuf 1281 | SHUF(1) User Commands SHUF(1) 1282 | 1283 | NAME 1284 | shuf - generate random permutations 1285 | 1286 | SYNOPSIS 1287 | shuf [OPTION]... [FILE] 1288 | shuf -e [OPTION]... [ARG]... 1289 | shuf -i LO-HI [OPTION]... 1290 | 1291 | DESCRIPTION 1292 | Write a random permutation of the input lines to standard output. 1293 | 1294 | With no FILE, or when FILE is -, read standard input. 1295 | ... 1296 | ``` 1297 | 1298 |
1299 | 1300 | #### Random lines 1301 | 1302 | * Without repeating input lines 1303 | 1304 | ```bash 1305 | $ cat nums.txt 1306 | 1 1307 | 10 1308 | 10 1309 | 12 1310 | 23 1311 | 563 1312 | 1313 | $ # duplicates can end up anywhere 1314 | $ # all lines are part of output 1315 | $ shuf nums.txt 1316 | 10 1317 | 23 1318 | 1 1319 | 10 1320 | 563 1321 | 12 1322 | 1323 | $ # limit max number of output lines 1324 | $ shuf -n2 nums.txt 1325 | 563 1326 | 23 1327 | ``` 1328 | 1329 | * Use `-o` option to specify output file name instead of displaying on stdout 1330 | * Helpful for inplace editing 1331 | 1332 | ```bash 1333 | $ shuf nums.txt -o nums.txt 1334 | $ cat nums.txt 1335 | 10 1336 | 12 1337 | 23 1338 | 10 1339 | 563 1340 | 1 1341 | ``` 1342 | 1343 | * With repeated input lines 1344 | 1345 | ```bash 1346 | $ # -n3 for max 3 lines, -r allows input lines to be repeated 1347 | $ shuf -n3 -r nums.txt 1348 | 1 1349 | 1 1350 | 563 1351 | 1352 | $ seq 3 | shuf -n5 -r 1353 | 2 1354 | 1 1355 | 2 1356 | 1 1357 | 2 1358 | 1359 | $ # if a limit using -n is not specified, shuf will output lines indefinitely 1360 | ``` 1361 | 1362 | * use `-e` option to specify multiple input lines from command line itself 1363 | 1364 | ```bash 1365 | $ shuf -e red blue green 1366 | green 1367 | blue 1368 | red 1369 | 1370 | $ shuf -e 'hi there' 'hello world' foo bar 1371 | bar 1372 | hi there 1373 | foo 1374 | hello world 1375 | 1376 | $ shuf -n2 -e 'hi there' 'hello world' foo bar 1377 | foo 1378 | hi there 1379 | 1380 | $ shuf -r -n4 -e foo bar 1381 | foo 1382 | foo 1383 | bar 1384 | foo 1385 | ``` 1386 | 1387 |
1388 | 1389 | #### Random integer numbers 1390 | 1391 | * The `-i` option accepts integer range as input to be shuffled 1392 | 1393 | ```bash 1394 | $ shuf -i 3-8 1395 | 3 1396 | 7 1397 | 6 1398 | 4 1399 | 8 1400 | 5 1401 | ``` 1402 | 1403 | * Combine with other options as needed 1404 | 1405 | ```bash 1406 | $ shuf -n3 -i 3-8 1407 | 5 1408 | 4 1409 | 7 1410 | 1411 | $ shuf -r -n4 -i 3-8 1412 | 5 1413 | 5 1414 | 7 1415 | 8 1416 | 1417 | $ shuf -r -n5 -i 0-1 1418 | 1 1419 | 0 1420 | 0 1421 | 1 1422 | 1 1423 | ``` 1424 | 1425 | * Use [seq](./miscellaneous.md#seq) input if negative numbers, floating point, etc are needed 1426 | 1427 | ```bash 1428 | $ seq 2 -1 -2 | shuf 1429 | 2 1430 | -1 1431 | -2 1432 | 0 1433 | 1 1434 | 1435 | $ seq 0.3 0.1 0.7 | shuf -n3 1436 | 0.4 1437 | 0.5 1438 | 0.7 1439 | ``` 1440 | 1441 | 1442 |
1443 | 1444 | #### Further reading for shuf 1445 | 1446 | * `man shuf` and `info shuf` for more options and detailed documentation 1447 | * [Generate random numbers in specific range](https://unix.stackexchange.com/questions/140750/generate-random-numbers-in-specific-range) 1448 | * [Variable - randomly choose among three numbers](https://unix.stackexchange.com/questions/330689/variable-randomly-chosen-among-three-numbers-10-100-and-1000) 1449 | * Related to 'random' stuff: 1450 | * [How to generate a random string?](https://unix.stackexchange.com/questions/230673/how-to-generate-a-random-string) 1451 | * [How can I populate a file with random data?](https://unix.stackexchange.com/questions/33629/how-can-i-populate-a-file-with-random-data) 1452 | * [Run commands at random](https://unix.stackexchange.com/questions/81566/run-commands-at-random) 1453 | 1454 | -------------------------------------------------------------------------------- /tail_less_cat_head.md: -------------------------------------------------------------------------------- 1 | # Cat, Less, Tail and Head 2 | 3 | **Table of Contents** 4 | 5 | * [cat](#cat) 6 | * [Concatenate files](#concatenate-files) 7 | * [Accepting input from stdin](#accepting-input-from-stdin) 8 | * [Squeeze consecutive empty lines](#squeeze-consecutive-empty-lines) 9 | * [Prefix line numbers](#prefix-line-numbers) 10 | * [Viewing special characters](#viewing-special-characters) 11 | * [Writing text to file](#writing-text-to-file) 12 | * [tac](#tac) 13 | * [Useless use of cat](#useless-use-of-cat) 14 | * [Further Reading for cat](#further-reading-for-cat) 15 | * [less](#less) 16 | * [Navigation commands](#navigation-commands) 17 | * [Further Reading for less](#further-reading-for-less) 18 | * [tail](#tail) 19 | * [linewise tail](#linewise-tail) 20 | * [characterwise tail](#characterwise-tail) 21 | * [multiple file input for tail](#multiple-file-input-for-tail) 22 | * [Further Reading for tail](#further-reading-for-tail) 23 | * 
[head](#head) 24 | * [linewise head](#linewise-head) 25 | * [characterwise head](#characterwise-head) 26 | * [multiple file input for head](#multiple-file-input-for-head) 27 | * [combining head and tail](#combining-head-and-tail) 28 | * [Further Reading for head](#further-reading-for-head) 29 | * [Text Editors](#text-editors) 30 | 31 |
32 | 33 | ## cat 34 | 35 | ```bash 36 | $ cat --version | head -n1 37 | cat (GNU coreutils) 8.25 38 | 39 | $ man cat 40 | CAT(1) User Commands CAT(1) 41 | 42 | NAME 43 | cat - concatenate files and print on the standard output 44 | 45 | SYNOPSIS 46 | cat [OPTION]... [FILE]... 47 | 48 | DESCRIPTION 49 | Concatenate FILE(s) to standard output. 50 | 51 | With no FILE, or when FILE is -, read standard input. 52 | ... 53 | ``` 54 | 55 | * For below examples, `marks_201*` files contain 3 fields delimited by TAB 56 | * To avoid formatting issues, TAB has been converted to spaces using `col -x` while pasting the output here 57 | 58 |
59 | 60 | #### Concatenate files 61 | 62 | * One or more files can be given as input and hence a lot of times, `cat` is used to quickly see contents of small single file on terminal 63 | * To save the output of concatenation, just redirect stdout 64 | 65 | ```bash 66 | $ ls 67 | marks_2015.txt marks_2016.txt marks_2017.txt 68 | 69 | $ cat marks_201* 70 | Name Maths Science 71 | foo 67 78 72 | bar 87 85 73 | Name Maths Science 74 | foo 70 75 75 | bar 85 88 76 | Name Maths Science 77 | foo 68 76 78 | bar 90 90 79 | 80 | $ # save stdout to a file 81 | $ cat marks_201* > all_marks.txt 82 | ``` 83 | 84 |
85 | 86 | #### Accepting input from stdin 87 | 88 | ```bash 89 | $ # combining input from stdin and other files 90 | $ printf 'Name\tMaths\tScience \nbaz\t56\t63\nbak\t71\t65\n' | cat - marks_2015.txt 91 | Name Maths Science 92 | baz 56 63 93 | bak 71 65 94 | Name Maths Science 95 | foo 67 78 96 | bar 87 85 97 | 98 | $ # - can be placed in whatever order is required 99 | $ printf 'Name\tMaths\tScience \nbaz\t56\t63\nbak\t71\t65\n' | cat marks_2015.txt - 100 | Name Maths Science 101 | foo 67 78 102 | bar 87 85 103 | Name Maths Science 104 | baz 56 63 105 | bak 71 65 106 | ``` 107 | 108 |
109 | 110 | #### Squeeze consecutive empty lines 111 | 112 | ```bash 113 | $ printf 'hello\n\n\nworld\n\nhave a nice day\n' 114 | hello 115 | 116 | 117 | world 118 | 119 | have a nice day 120 | $ printf 'hello\n\n\nworld\n\nhave a nice day\n' | cat -s 121 | hello 122 | 123 | world 124 | 125 | have a nice day 126 | ``` 127 | 128 |
129 | 130 | #### Prefix line numbers 131 | 132 | ```bash 133 | $ # number all lines 134 | $ cat -n marks_201* 135 | 1 Name Maths Science 136 | 2 foo 67 78 137 | 3 bar 87 85 138 | 4 Name Maths Science 139 | 5 foo 70 75 140 | 6 bar 85 88 141 | 7 Name Maths Science 142 | 8 foo 68 76 143 | 9 bar 90 90 144 | 145 | $ # number only non-empty lines 146 | $ printf 'hello\n\n\nworld\n\nhave a nice day\n' | cat -sb 147 | 1 hello 148 | 149 | 2 world 150 | 151 | 3 have a nice day 152 | ``` 153 | 154 | * For more numbering options, check out the command `nl` 155 | 156 | ```bash 157 | $ whatis nl 158 | nl (1) - number lines of files 159 | ``` 160 | 161 |
162 | 163 | #### Viewing special characters 164 | 165 | * End of line identified by `$` 166 | * Useful for example to see trailing spaces 167 | 168 | ```bash 169 | $ cat -E marks_2015.txt 170 | Name Maths Science $ 171 | foo 67 78$ 172 | bar 87 85$ 173 | ``` 174 | 175 | * TAB identified by `^I` 176 | 177 | ```bash 178 | $ cat -T marks_2015.txt 179 | Name^IMaths^IScience 180 | foo^I67^I78 181 | bar^I87^I85 182 | ``` 183 | 184 | * Non-printing characters 185 | * See [Show Non-Printing Characters](http://docstore.mik.ua/orelly/unix/upt/ch25_07.htm) for more detailed info 186 | 187 | ```bash 188 | $ # NUL character 189 | $ printf 'foo\0bar\0baz\n' | cat -v 190 | foo^@bar^@baz 191 | 192 | $ # to check for dos-style line endings 193 | $ printf 'Hello World!\r\n' | cat -v 194 | Hello World!^M 195 | 196 | $ printf 'Hello World!\r\n' | dos2unix | cat -v 197 | Hello World! 198 | ``` 199 | 200 | * the `-A` option is equivalent to `-vET` 201 | * the `-e` option is equivalent to `-vE` 202 | * If `dos2unix` and `unix2dos` are not available, see [How to convert DOS/Windows newline (CRLF) to Unix newline (\n)](https://stackoverflow.com/questions/2613800/how-to-convert-dos-windows-newline-crlf-to-unix-newline-n-in-a-bash-script) 203 | 204 |
205 | 206 | #### Writing text to file 207 | 208 | ```bash 209 | $ cat > sample.txt 210 | This is an example of adding text to a new file using cat command. 211 | Press Ctrl+d on a newline to save and quit. 212 | 213 | $ cat sample.txt 214 | This is an example of adding text to a new file using cat command. 215 | Press Ctrl+d on a newline to save and quit. 216 | ``` 217 | 218 | * See also how to use [heredoc](http://mywiki.wooledge.org/HereDocument) 219 | * [How can I write a here doc to a file](https://stackoverflow.com/questions/2953081/how-can-i-write-a-here-doc-to-a-file-in-bash-script) 220 | * See also [difference between Ctrl+c and Ctrl+d to signal end of stdin input in bash](https://unix.stackexchange.com/questions/16333/how-to-signal-the-end-of-stdin-input-in-bash) 221 | 222 |
223 | 224 | #### tac 225 | 226 | ```bash 227 | $ whatis tac 228 | tac (1) - concatenate and print files in reverse 229 | $ tac --version | head -n1 230 | tac (GNU coreutils) 8.25 231 | 232 | $ seq 3 | tac 233 | 3 234 | 2 235 | 1 236 | 237 | $ tac marks_2015.txt 238 | bar 87 85 239 | foo 67 78 240 | Name Maths Science 241 | ``` 242 | 243 | * Useful in cases where logic is easier to write when working on reversed file 244 | * Consider this made up log file, many **Warning** lines but need to extract only from last such **Warning** upto **Error** line 245 | * See [GNU sed chapter](./gnu_sed.md#lines-between-two-regexps) for details on the `sed` command used below 246 | 247 | ```bash 248 | $ cat report.log 249 | blah blah 250 | Warning: something went wrong 251 | more blah 252 | whatever 253 | Warning: something else went wrong 254 | some text 255 | some more text 256 | Error: something seriously went wrong 257 | blah blah blah 258 | 259 | $ tac report.log | sed -n '/Error:/,/Warning:/p' | tac 260 | Warning: something else went wrong 261 | some text 262 | some more text 263 | Error: something seriously went wrong 264 | ``` 265 | 266 | * Similarly, if characters in lines have to be reversed, use the `rev` command 267 | 268 | ```bash 269 | $ whatis rev 270 | rev (1) - reverse lines characterwise 271 | ``` 272 | 273 |
274 | 275 | #### Useless use of cat 276 | 277 | * `cat` is used so frequently to view contents of a file that somehow users think other commands cannot handle file input 278 | * [UUOC](https://en.wikipedia.org/wiki/Cat_(Unix)#Useless_use_of_cat) 279 | * [Useless Use of Cat Award](http://porkmail.org/era/unix/award.html) 280 | 281 | ```bash 282 | $ cat report.log | grep -E 'Warning|Error' 283 | Warning: something went wrong 284 | Warning: something else went wrong 285 | Error: something seriously went wrong 286 | $ grep -E 'Warning|Error' report.log 287 | Warning: something went wrong 288 | Warning: something else went wrong 289 | Error: something seriously went wrong 290 | ``` 291 | 292 | * Use [input redirection](http://wiki.bash-hackers.org/howto/redirection_tutorial) if a command doesn't accept file input 293 | 294 | ```bash 295 | $ cat marks_2015.txt | tr 'A-Z' 'a-z' 296 | name maths science 297 | foo 67 78 298 | bar 87 85 299 | $ tr 'A-Z' 'a-z' < marks_2015.txt 300 | name maths science 301 | foo 67 78 302 | bar 87 85 303 | ``` 304 | 305 | * However, `cat` should definitely be used where **concatenation** is needed 306 | 307 | ```bash 308 | $ grep -c 'foo' marks_201* 309 | marks_2015.txt:1 310 | marks_2016.txt:1 311 | marks_2017.txt:1 312 | 313 | $ # concatenation allows to get overall count in one-shot in this case 314 | $ cat marks_201* | grep -c 'foo' 315 | 3 316 | ``` 317 | 318 |
319 | 320 | #### Further Reading for cat 321 | 322 | * [cat Q&A on unix stackexchange](https://unix.stackexchange.com/questions/tagged/cat?sort=votes&pageSize=15) 323 | * [cat Q&A on stackoverflow](https://stackoverflow.com/questions/tagged/cat?sort=votes&pageSize=15) 324 | 325 |
326 | 327 | ## less 328 | 329 | ```bash 330 | $ less --version | head -n1 331 | less 481 (GNU regular expressions) 332 | 333 | $ # By default, pager is used to display the man pages 334 | $ # and usually, pager is linked to the less command 335 | $ type pager less 336 | pager is /usr/bin/pager 337 | less is /usr/bin/less 338 | 339 | $ realpath /usr/bin/pager 340 | /bin/less 341 | $ realpath /usr/bin/less 342 | /bin/less 343 | $ diff -s /usr/bin/pager /usr/bin/less 344 | Files /usr/bin/pager and /usr/bin/less are identical 345 | ``` 346 | 347 | * The `cat` command is NOT suitable for viewing the contents of large files on the Terminal 348 | * `less` displays the contents of a file, automatically fits to the size of the Terminal, allows scrolling in either direction and provides other options for effective viewing 349 | * Usually, the `man` command uses `less` to display the help page 350 | * The navigation commands are similar to those of the `vi` editor 351 | 352 | <br>
353 | 354 | #### Navigation commands 355 | 356 | Commonly used commands are given below, press `h` for a summary of options 357 | 358 | * `g` go to start of file 359 | * `G` go to end of file 360 | * `q` quit 361 | * `/pattern` search for the given pattern in forward direction 362 | * `?pattern` search for the given pattern in backward direction 363 | * `n` go to the next match 364 | * `N` go to the previous match 365 | 366 | <br>
367 | 368 | #### Further Reading for less 369 | 370 | * See `man less` for detailed info on commands and options. For example: 371 | * `-s` option to squeeze consecutive blank lines 372 | * `-N` option to prefix line numbers 373 | * the `less` command is an [improved version](https://unix.stackexchange.com/questions/604/isnt-less-just-more) of the `more` command 374 | * [differences between most, more and less](https://unix.stackexchange.com/questions/81129/what-are-the-differences-between-most-more-and-less) 375 | * [less Q&A on unix stackexchange](https://unix.stackexchange.com/questions/tagged/less?sort=votes&pageSize=15) 376 | 377 | <br>
378 | 379 | ## tail 380 | 381 | ```bash 382 | $ tail --version | head -n1 383 | tail (GNU coreutils) 8.25 384 | 385 | $ man tail 386 | TAIL(1) User Commands TAIL(1) 387 | 388 | NAME 389 | tail - output the last part of files 390 | 391 | SYNOPSIS 392 | tail [OPTION]... [FILE]... 393 | 394 | DESCRIPTION 395 | Print the last 10 lines of each FILE to standard output. With more 396 | than one FILE, precede each with a header giving the file name. 397 | 398 | With no FILE, or when FILE is -, read standard input. 399 | ... 400 | ``` 401 | 402 |
403 | 404 | #### linewise tail 405 | 406 | Consider this sample file, with line numbers prefixed 407 | 408 | ```bash 409 | $ cat sample.txt 410 | 1) Hello World 411 | 2) 412 | 3) Good day 413 | 4) How are you 414 | 5) 415 | 6) Just do-it 416 | 7) Believe it 417 | 8) 418 | 9) Today is sunny 419 | 10) Not a bit funny 420 | 11) No doubt you like it too 421 | 12) 422 | 13) Much ado about nothing 423 | 14) He he he 424 | 15) Adios amigo 425 | ``` 426 | 427 | * default behavior - display the last 10 lines 428 | 429 | ```bash 430 | $ tail sample.txt 431 | 6) Just do-it 432 | 7) Believe it 433 | 8) 434 | 9) Today is sunny 435 | 10) Not a bit funny 436 | 11) No doubt you like it too 437 | 12) 438 | 13) Much ado about nothing 439 | 14) He he he 440 | 15) Adios amigo 441 | ``` 442 | 443 | * Use the `-n` option to control the number of lines displayed 444 | 445 | ```bash 446 | $ tail -n3 sample.txt 447 | 13) Much ado about nothing 448 | 14) He he he 449 | 15) Adios amigo 450 | 451 | $ # some versions of tail allow omitting the explicit n character 452 | $ tail -5 sample.txt 453 | 11) No doubt you like it too 454 | 12) 455 | 13) Much ado about nothing 456 | 14) He he he 457 | 15) Adios amigo 458 | ``` 459 | 460 | * when the number is prefixed with a `+` sign, all lines are fetched from that particular line number to the end of the file 461 | 462 | ```bash 463 | $ tail -n +10 sample.txt 464 | 10) Not a bit funny 465 | 11) No doubt you like it too 466 | 12) 467 | 13) Much ado about nothing 468 | 14) He he he 469 | 15) Adios amigo 470 | 471 | $ seq 13 17 | tail -n +3 472 | 15 473 | 16 474 | 17 475 | ``` 476 | 477 | <br>
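A common use of the `+` form is skipping a header line. For example, with the `marks_2015.txt` file seen in earlier sections:

```bash
$ # print everything except the header line
$ tail -n +2 marks_2015.txt
foo 67 78
bar 87 85
```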
478 | 479 | #### characterwise tail 480 | 481 | * Note that this works bytewise and is not suitable for multi-byte character encodings 482 | 483 | ```bash 484 | $ # last three characters including the newline character 485 | $ echo 'Hi there!' | tail -c3 486 | e! 487 | 488 | $ # excluding the first character 489 | $ echo 'Hi there!' | tail -c +2 490 | i there! 491 | ``` 492 | 493 | <br>
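To see why byte counting matters, here's a quick demonstration with UTF-8 input (assuming a UTF-8 locale; each of these Greek letters occupies 2 bytes):

```bash
$ # 'αβ' is 4 bytes in UTF-8, 2 bytes per letter
$ printf 'αβ' | wc -c
4

$ # the last 2 bytes recover the last letter cleanly
$ printf 'αβ' | tail -c2
β

$ # but asking for the last 3 bytes would split 'α' mid-character
$ # and produce invalid output
```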
494 | 495 | #### multiple file input for tail 496 | 497 | ```bash 498 | $ tail -n2 report.log sample.txt 499 | ==> report.log <== 500 | Error: something seriously went wrong 501 | blah blah blah 502 | 503 | ==> sample.txt <== 504 | 14) He he he 505 | 15) Adios amigo 506 | 507 | $ # -q option to avoid filename in output 508 | $ tail -q -n2 report.log sample.txt 509 | Error: something seriously went wrong 510 | blah blah blah 511 | 14) He he he 512 | 15) Adios amigo 513 | ``` 514 | 515 |
516 | 517 | #### Further Reading for tail 518 | 519 | * `tail -f` and related options are beyond the scope of this tutorial. The links below might be useful 520 | * [look out for buffering](http://mywiki.wooledge.org/BashFAQ/009) 521 | * [Piping tail -f output though grep twice](https://stackoverflow.com/questions/13858912/piping-tail-output-though-grep-twice) 522 | * [tail and less](https://unix.stackexchange.com/questions/196168/does-less-have-a-feature-like-tail-follow-name-f) 523 | * [tail Q&A on unix stackexchange](https://unix.stackexchange.com/questions/tagged/tail?sort=votes&pageSize=15) 524 | * [tail Q&A on stackoverflow](https://stackoverflow.com/questions/tagged/tail?sort=votes&pageSize=15) 525 | 526 | <br>
527 | 528 | ## head 529 | 530 | ```bash 531 | $ head --version | head -n1 532 | head (GNU coreutils) 8.25 533 | 534 | $ man head 535 | HEAD(1) User Commands HEAD(1) 536 | 537 | NAME 538 | head - output the first part of files 539 | 540 | SYNOPSIS 541 | head [OPTION]... [FILE]... 542 | 543 | DESCRIPTION 544 | Print the first 10 lines of each FILE to standard output. With more 545 | than one FILE, precede each with a header giving the file name. 546 | 547 | With no FILE, or when FILE is -, read standard input. 548 | ... 549 | ``` 550 | 551 |
552 | 553 | #### linewise head 554 | 555 | * default behavior - display the first 10 lines 556 | 557 | ```bash 558 | $ head sample.txt 559 | 1) Hello World 560 | 2) 561 | 3) Good day 562 | 4) How are you 563 | 5) 564 | 6) Just do-it 565 | 7) Believe it 566 | 8) 567 | 9) Today is sunny 568 | 10) Not a bit funny 569 | ``` 570 | 571 | * Use the `-n` option to control the number of lines displayed 572 | 573 | ```bash 574 | $ head -n3 sample.txt 575 | 1) Hello World 576 | 2) 577 | 3) Good day 578 | 579 | $ # some versions of head allow omitting the explicit n character 580 | $ head -4 sample.txt 581 | 1) Hello World 582 | 2) 583 | 3) Good day 584 | 4) How are you 585 | ``` 586 | 587 | * when the number is prefixed with a `-` sign, all lines except the last that many lines are displayed 588 | 589 | ```bash 590 | $ # except the last 9 lines of the file 591 | $ head -n -9 sample.txt 592 | 1) Hello World 593 | 2) 594 | 3) Good day 595 | 4) How are you 596 | 5) 597 | 6) Just do-it 598 | 599 | $ # except the last 2 lines 600 | $ seq 13 17 | head -n -2 601 | 13 602 | 14 603 | 15 604 | ``` 605 | 606 | <br>
607 | 608 | #### characterwise head 609 | 610 | * Note that this works bytewise and is not suitable for multi-byte character encodings 611 | 612 | ```bash 613 | $ # if the output of a command doesn't end with a newline, the prompt will be on the same line 614 | $ # to highlight how the command works, the prompt for such cases is not shown here 615 | 616 | $ # first two characters 617 | $ echo 'Hi there!' | head -c2 618 | Hi 619 | 620 | $ # excluding the last four characters 621 | $ echo 'Hi there!' | head -c -4 622 | Hi the 623 | ``` 624 | 625 | <br>
626 | 627 | #### multiple file input for head 628 | 629 | ```bash 630 | $ head -n3 report.log sample.txt 631 | ==> report.log <== 632 | blah blah 633 | Warning: something went wrong 634 | more blah 635 | 636 | ==> sample.txt <== 637 | 1) Hello World 638 | 2) 639 | 3) Good day 640 | 641 | $ # -q option to avoid filename in output 642 | $ head -q -n3 report.log sample.txt 643 | blah blah 644 | Warning: something went wrong 645 | more blah 646 | 1) Hello World 647 | 2) 648 | 3) Good day 649 | ``` 650 | 651 |
652 | 653 | #### combining head and tail 654 | 655 | * Despite involving two commands, this combination is often faster than equivalent sed/awk versions 656 | 657 | ```bash 658 | $ head -n11 sample.txt | tail -n3 659 | 9) Today is sunny 660 | 10) Not a bit funny 661 | 11) No doubt you like it too 662 | 663 | $ tail sample.txt | head -n2 664 | 6) Just do-it 665 | 7) Believe it 666 | ``` 667 | 668 | <br>
669 | 670 | #### Further Reading for head 671 | 672 | * [head Q&A on unix stackexchange](https://unix.stackexchange.com/questions/tagged/head?sort=votes&pageSize=15) 673 | 674 |
675 | 676 | ## Text Editors 677 | 678 | For editing text files, the following applications can be used. Of these, `gedit`, `nano`, `vi` and/or `vim` are available in most distros by default 679 | 680 | Easy to use 681 | 682 | * [gedit](https://wiki.gnome.org/Apps/Gedit) 683 | * [geany](http://www.geany.org/) 684 | * [nano](http://nano-editor.org/) 685 | 686 | Powerful text editors 687 | 688 | * [vim](https://github.com/vim/vim) 689 | * [vim learning resources](https://github.com/learnbyexample/scripting_course/blob/master/Vim_curated_resources.md) and [vim reference](https://github.com/learnbyexample/vim_reference) for further info 690 | * [emacs](https://www.gnu.org/software/emacs/) 691 | * [atom](https://atom.io/) 692 | * [sublime](https://www.sublimetext.com/) 693 | 694 | Check out [this analysis](https://github.com/jhallen/joes-sandbox/tree/master/editor-perf) for some performance/feature comparisons of various text editors 695 | -------------------------------------------------------------------------------- /whats_the_difference.md: -------------------------------------------------------------------------------- 1 | # What's the difference 2 | 3 | **Table of Contents** 4 | 5 | * [cmp](#cmp) 6 | * [diff](#diff) 7 | * [Comparing Directories](#comparing-directories) 8 | * [colordiff](#colordiff) 9 | 10 |
11 | 12 | ## cmp 13 | 14 | ```bash 15 | $ cmp --version | head -n1 16 | cmp (GNU diffutils) 3.3 17 | 18 | $ man cmp 19 | CMP(1) User Commands CMP(1) 20 | 21 | NAME 22 | cmp - compare two files byte by byte 23 | 24 | SYNOPSIS 25 | cmp [OPTION]... FILE1 [FILE2 [SKIP1 [SKIP2]]] 26 | 27 | DESCRIPTION 28 | Compare two files byte by byte. 29 | 30 | The optional SKIP1 and SKIP2 specify the number of bytes to skip at the 31 | beginning of each file (zero by default). 32 | ... 33 | ``` 34 | 35 | * As the comparison is byte by byte, it doesn't matter whether the file is human readable or not 36 | * A typical use case is to check if two executables are the same or not 37 | 38 | ```bash 39 | $ echo 'foo 123' > f1; echo 'food 123' > f2 40 | $ cmp f1 f2 41 | f1 f2 differ: byte 4, line 1 42 | 43 | $ # print differing bytes 44 | $ cmp -b f1 f2 45 | f1 f2 differ: byte 4, line 1 is 40 144 d 46 | 47 | $ # skip the given number of bytes from each file 48 | $ # if only one number is given, it is used for both inputs 49 | $ cmp -i 3:4 f1 f2 50 | $ echo $? 51 | 0 52 | 53 | $ # compare only the given number of bytes from start of inputs 54 | $ cmp -n 3 f1 f2 55 | $ echo $? 56 | 0 57 | 58 | $ # suppress output 59 | $ cmp -s f1 f2 60 | $ echo $? 61 | 1 62 | ``` 63 | 64 | * The comparison stops immediately at the first difference found 65 | * If the verbose option `-l` is used, the comparison stops when either input reaches end of file 66 | 67 | ```bash 68 | $ # first column is byte number 69 | $ # second/third column is respective octal value of differing bytes 70 | $ cmp -l f1 f2 71 | 4 40 144 72 | 5 61 40 73 | 6 62 61 74 | 7 63 62 75 | 8 12 63 76 | cmp: EOF on f1 77 | ``` 78 | 79 | **Further Reading** 80 | 81 | * `man cmp` and `info cmp` for more options and detailed documentation 82 | 83 | 84 | <br>
85 | 86 | ## diff 87 | 88 | ```bash 89 | $ diff --version | head -n1 90 | diff (GNU diffutils) 3.3 91 | 92 | $ man diff 93 | DIFF(1) User Commands DIFF(1) 94 | 95 | NAME 96 | diff - compare files line by line 97 | 98 | SYNOPSIS 99 | diff [OPTION]... FILES 100 | 101 | DESCRIPTION 102 | Compare FILES line by line. 103 | ... 104 | ``` 105 | 106 | * `diff` output shows lines from the first file input starting with `<` 107 | * lines from the second file input start with `>` 108 | * between the two file contents, `---` is used as a separator 109 | * each difference is prefixed by a command that indicates the type of difference (see links at end of section for more details) 110 | 111 | ```bash 112 | $ paste d1 d2 113 | 1 1 114 | 2 hello 115 | 3 3 116 | world 4 117 | 118 | $ diff d1 d2 119 | 2c2 120 | < 2 121 | --- 122 | > hello 123 | 4c4 124 | < world 125 | --- 126 | > 4 127 | 128 | $ diff <(seq 4) <(seq 5) 129 | 4a5 130 | > 5 131 | ``` 132 | 133 | * use the `-i` option to ignore case 134 | 135 | ```bash 136 | $ echo 'Hello World!' > i1 137 | $ echo 'hello world!' > i2 138 | 139 | $ diff i1 i2 140 | 1c1 141 | < Hello World! 142 | --- 143 | > hello world! 144 | 145 | $ diff -i i1 i2 146 | $ echo $? 147 | 0 148 | ``` 149 | 150 | * ignoring differences in white space 151 | 152 | ```bash 153 | $ # -b option to ignore changes in the amount of white space 154 | $ diff -b <(echo 'good  day') <(echo 'good day') 155 | $ echo $? 156 | 0 157 | 158 | $ # -w option to ignore all white spaces 159 | $ diff -w <(echo 'hi there ') <(echo ' hi there') 160 | $ echo $? 161 | 0 162 | $ diff -w <(echo 'hi there ') <(echo 'hithere') 163 | $ echo $?
164 | 0 165 | 166 | # use -B to ignore only blank lines 167 | # use -E to ignore changes due to tab expansion 168 | # use -Z to ignore trailing white space at end of line 169 | ``` 170 | 171 | * side-by-side output 172 | 173 | ```bash 174 | $ diff -y d1 d2 175 | 1 1 176 | 2 | hello 177 | 3 3 178 | world | 4 179 | 180 | $ # -y is usually used along with other options 181 | $ # default width is 130 print columns 182 | $ diff -W 60 --suppress-common-lines -y d1 d2 183 | 2 | hello 184 | world | 4 185 | 186 | $ diff -W 20 --left-column -y <(seq 4) <(seq 5) 187 | 1 ( 188 | 2 ( 189 | 3 ( 190 | 4 ( 191 | > 5 192 | ``` 193 | 194 | * by default, there is no output if the input files are the same. Use the `-s` option to explicitly indicate that the files are identical 195 | * by default, all differences are shown. Use the `-q` option to report only whether the files differ 196 | 197 | ```bash 198 | $ cp i1 i1_copy 199 | $ diff -s i1 i1_copy 200 | Files i1 and i1_copy are identical 201 | $ diff -s i1 i2 202 | 1c1 203 | < Hello World! 204 | --- 205 | > hello world! 206 | 207 | $ diff -q i1 i1_copy 208 | $ diff -q i1 i2 209 | Files i1 and i2 differ 210 | 211 | $ # combine them to always get one line output 212 | $ diff -sq i1 i1_copy 213 | Files i1 and i1_copy are identical 214 | $ diff -sq i1 i2 215 | Files i1 and i2 differ 216 | ``` 217 | 218 | <br>
219 | 220 | #### Comparing Directories 221 | 222 | * when comparing two files of the same name from different directories, specifying the filename is optional for one of the directories 223 | 224 | ```bash 225 | $ mkdir dir1 dir2 226 | $ echo 'Hello World!' > dir1/i1 227 | $ echo 'hello world!' > dir2/i1 228 | 229 | $ diff dir1/i1 dir2 230 | 1c1 231 | < Hello World! 232 | --- 233 | > hello world! 234 | 235 | $ diff -s i1 dir1/ 236 | Files i1 and dir1/i1 are identical 237 | $ diff -s . dir1/i1 238 | Files ./i1 and dir1/i1 are identical 239 | ``` 240 | 241 | * if both arguments are directories, all the files are compared 242 | 243 | ```bash 244 | $ touch dir1/report.log dir1/lists dir2/power.log 245 | $ cp f1 dir1/ 246 | $ cp f1 dir2/ 247 | 248 | $ # by default, all differences are reported 249 | $ # as well as the filenames which are unique to the respective directories 250 | $ diff dir1 dir2 251 | diff dir1/i1 dir2/i1 252 | 1c1 253 | < Hello World! 254 | --- 255 | > hello world! 256 | Only in dir1: lists 257 | Only in dir2: power.log 258 | Only in dir1: report.log 259 | ``` 260 | 261 | * to report only the filenames and their status 262 | 263 | ```bash 264 | $ diff -sq dir1 dir2 265 | Files dir1/f1 and dir2/f1 are identical 266 | Files dir1/i1 and dir2/i1 differ 267 | Only in dir1: lists 268 | Only in dir2: power.log 269 | Only in dir1: report.log 270 | 271 | $ # list only differing files 272 | $ # also useful to copy-paste the command for GUI diffs like tkdiff/vimdiff 273 | $ diff dir1 dir2 | grep '^diff ' 274 | diff dir1/i1 dir2/i1 275 | ``` 276 | 277 | * to recursively compare sub-directories as well, use `-r` 278 | 279 | ```bash 280 | $ mkdir dir1/subdir dir2/subdir 281 | $ echo 'good' > dir1/subdir/f1 282 | $ echo 'goad' > dir2/subdir/f1 283 | 284 | $ diff -srq dir1 dir2 285 | Files dir1/f1 and dir2/f1 are identical 286 | Files dir1/i1 and dir2/i1 differ 287 | Only in dir1: lists 288 | Only in dir2: power.log 289 | Only in dir1: report.log 290 | Files dir1/subdir/f1 and dir2/subdir/f1 differ 291
| 292 | $ diff -r dir1 dir2 | grep '^diff ' 293 | diff -r dir1/i1 dir2/i1 294 | diff -r dir1/subdir/f1 dir2/subdir/f1 295 | ``` 296 | 297 | * See also [GNU diffutils manual - comparing directories](https://www.gnu.org/software/diffutils/manual/diffutils.html#Comparing-Directories) for further options and details like excluding files, ignoring filename case, etc and `dirdiff` command 298 | 299 |
300 | 301 | #### colordiff 302 | 303 | ```bash 304 | $ whatis colordiff 305 | colordiff (1) - a tool to colorize diff output 306 | 307 | $ whatis wdiff 308 | wdiff (1) - display word differences between text files 309 | ``` 310 | 311 | * simply replace `diff` with `colordiff` 312 | 313 | ![colordiff](./images/colordiff.png) 314 | 315 | * or, pass output of a `diff` tool to `colordiff` 316 | 317 | ![wdiff to colordiff](./images/wdiff_to_colordiff.png) 318 | 319 | * See also [stackoverflow - How to colorize diff on the command line?](https://stackoverflow.com/questions/8800578/how-to-colorize-diff-on-the-command-line) for other options 320 | 321 |
322 | 323 | **Further Reading** 324 | 325 | * `man diff` and `info diff` for more options and detailed documentation 326 | * [GNU diffutils manual](https://www.gnu.org/software/diffutils/manual/diffutils.html) for a better documentation 327 | * `man -k diff` to get list of all commands related to `diff` 328 | * [diff Q&A on unix stackexchange](https://unix.stackexchange.com/questions/tagged/diff?sort=votes&pageSize=15) 329 | * [unix.stackexchange - GUI diff and merge tools](https://unix.stackexchange.com/questions/4573/which-gui-diff-viewer-would-you-recommend-with-copy-to-left-right-functionality) 330 | * [unix.stackexchange - Understanding diff output](https://unix.stackexchange.com/questions/81998/understanding-of-diff-output) 331 | * [stackoverflow - Using output of diff to create patch](https://stackoverflow.com/questions/437219/using-the-output-of-diff-to-create-the-patch) 332 | 333 | -------------------------------------------------------------------------------- /wheres_my_file.md: -------------------------------------------------------------------------------- 1 | # Where's my file 2 | 3 | **Table of Contents** 4 | 5 | * [find](#find) 6 | * [locate](#locate) 7 | 8 |
9 | 10 | ## find 11 | 12 | ```bash 13 | $ find --version | head -n1 14 | find (GNU findutils) 4.7.0-git 15 | 16 | $ man find 17 | FIND(1) General Commands Manual FIND(1) 18 | 19 | NAME 20 | find - search for files in a directory hierarchy 21 | 22 | SYNOPSIS 23 | find [-H] [-L] [-P] [-D debugopts] [-Olevel] [starting-point...] 24 | [expression] 25 | 26 | DESCRIPTION 27 | This manual page documents the GNU version of find. GNU find searches 28 | the directory tree rooted at each given starting-point by evaluating 29 | the given expression from left to right, according to the rules of 30 | precedence (see section OPERATORS), until the outcome is known (the 31 | left hand side is false for and operations, true for or), at which 32 | point find moves on to the next file name. If no starting-point is 33 | specified, `.' is assumed. 34 | ... 35 | ``` 36 | 37 | **Examples** 38 | 39 | Filtering based on file name 40 | 41 | * `find . -iname 'power.log'` search and print the path of the file named power.log (ignoring case) in the current directory and its sub-directories 42 | * `find -name '*log'` search and print the path of all files whose name ends with log in the current directory - using `.` is optional when searching in the current directory 43 | * `find -not -name '*log'` print the path of all files whose name does NOT end with log in the current directory 44 | * `find -regextype egrep -regex '.*/\w+'` use an extended regular expression to match filenames containing only `[a-zA-Z0-9_]` characters 45 | * `.*/` is needed to match the initial part of the file path 46 | 47 | Filtering based on file type 48 | 49 | * `find /home/guest1/proj -type f` print the path of all regular files found in the specified directory 50 | * `find /home/guest1/proj -type d` print the path of all directories found in the specified directory 51 | * `find /home/guest1/proj -type f -name '.*'` print the path of all hidden files 52 | 53 | Filtering based on depth 54 | 55 | The relative path `.` is considered a depth 0 directory; files and folders immediately
contained in a directory are at depth 1 and so on 56 | 57 | * `find -maxdepth 1 -type f` all regular files (including hidden ones) from the current directory (without going to sub-directories) 58 | * `find -maxdepth 1 -type f -name '[!.]*'` all regular files (but not hidden ones) from the current directory (without going to sub-directories) 59 | * `-not -name '.*'` can also be used 60 | * `find -mindepth 1 -maxdepth 1 -type d` all directories (including hidden ones) in the current directory (without going to sub-directories) 61 | 62 | Filtering based on file properties 63 | 64 | * `find -mtime -2` print files that were modified within the last two days in the current directory 65 | * Note that day here means 24 hours 66 | * `find -mtime +7` print files that were modified more than seven days back in the current directory 67 | * `find -daystart -type f -mtime -1` files that were modified from the beginning of the day (not past 24 hours) 68 | * `find -size +10k` print files with size greater than 10 kilobytes in the current directory 69 | * `find -size -1M` print files with size less than 1 megabyte in the current directory 70 | * `find -size 2G` print files of size 2 gigabytes in the current directory 71 | 72 | Passing filtered files as input to other commands 73 | 74 | * `find report -name '*log*' -exec rm {} \;` delete all files whose names contain log in the report folder and its sub-folders 75 | * here the `rm` command is called for every file matching the search conditions 76 | * since `;` is a special character for the shell, it needs to be escaped using `\` 77 | * `find report -name '*log*' -delete` delete all files whose names contain log in the report folder and its sub-folders 78 | * `find -name '*.txt' -exec wc {} +` the files ending with txt are all passed together as arguments to the `wc` command instead of executing `wc` for every file 79 | * no need to escape the `+` character in this case 80 | * also note that the specified command may be invoked more than once if the number of files found is too
large 81 | * `find -name '*.log' -exec mv {} ../log/ \;` move files ending with .log to the log directory present one hierarchy above. `mv` is executed once for each filtered file 82 | * `find -name '*.log' -exec mv -t ../log/ {} +` the `-t` option allows specifying the target directory and then providing multiple files to be moved as arguments 83 | * Similarly, one can use `-t` for the `cp` command 84 | 85 | **Further Reading** 86 | 87 | * [using find](http://mywiki.wooledge.org/UsingFind) 88 | * [find examples on SO](https://stackoverflow.com/documentation/bash/566/find#t=201612140534548263961) 89 | * [Collection of find examples](http://alvinalexander.com/unix/edu/examples/find.shtml) 90 | * [find Q&A on unix stackexchange](https://unix.stackexchange.com/questions/tagged/find?sort=votes&pageSize=15) 91 | * [find and tar example](https://unix.stackexchange.com/questions/282762/find-mtime-1-print-xargs-tar-archives-all-files-from-directory-ignoring-t/282885#282885) 92 | * [find Q&A on stackoverflow](https://stackoverflow.com/questions/tagged/find?sort=votes&pageSize=15) 93 | * [Why is looping over find's output bad practice?](https://unix.stackexchange.com/questions/321697/why-is-looping-over-finds-output-bad-practice) 94 | 95 | 96 | <br>
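One more idiom worth knowing, related to the last link above: filenames can contain spaces and other special characters, so when piping `find` output to another command, prefer the NUL-separated form (a sketch, not from the original examples):

```bash
$ # -print0 and xargs -0 handle filenames with spaces/newlines safely
$ find -name '*.txt' -print0 | xargs -0 wc -l

$ # or stick to -exec, which needs no such precaution
$ find -name '*.txt' -exec wc -l {} +
```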
97 | 98 | ## locate 99 | 100 | ```bash 101 | $ locate --version | head -n1 102 | mlocate 0.26 103 | 104 | $ man locate 105 | locate(1) General Commands Manual locate(1) 106 | 107 | NAME 108 | locate - find files by name 109 | 110 | SYNOPSIS 111 | locate [OPTION]... PATTERN... 112 | 113 | DESCRIPTION 114 | locate reads one or more databases prepared by updatedb(8) and writes 115 | file names matching at least one of the PATTERNs to standard output, 116 | one per line. 117 | 118 | If --regex is not specified, PATTERNs can contain globbing characters. 119 | If any PATTERN contains no globbing characters, locate behaves as if 120 | the pattern were *PATTERN*. 121 | ... 122 | ``` 123 | 124 | A faster alternative to the `find` command when searching for a file by its name. It is based on a database, which gets updated by a `cron` job, so newer files may not be present in the results. Use this command if it is available in your distro and you remember some part of the filename. It is very useful when the entire filesystem has to be searched, in which case `find` might take a very long time compared to `locate` 125 | 126 | **Examples** 127 | 128 | * `locate 'power'` print the path of files containing power in the whole filesystem 129 | * matches anywhere in the path, ex: '/home/learnbyexample/lowpower_adder/result.log' and '/home/learnbyexample/power.log' are both valid matches 130 | * implicitly, `locate` would change the string to `*power*` as no globbing characters are present in the specified string 131 | * `locate -b '\power.log'` print paths matching the string power.log exactly at the end of the path 132 | * '/home/learnbyexample/power.log' matches but not '/home/learnbyexample/lowpower.log' 133 | * since the globbing character `\` is used in the search string, it doesn't get implicitly replaced by `*power.log*` 134 | * `locate -b '\proj_adder'` the `-b` option also comes in handy to print only the path of a directory name, otherwise every file under that folder would also be displayed 135 | *
[find vs locate - pros and cons](https://unix.stackexchange.com/questions/60205/locate-vs-find-usage-pros-and-cons-of-each-other) 136 | 137 | 138 | --------------------------------------------------------------------------------