├── README.md ├── exercises ├── GNU_grep │ ├── .ref_solutions │ │ ├── ex01_basic_match.txt │ │ ├── ex02_basic_options.txt │ │ ├── ex03_multiple_string_match.txt │ │ ├── ex04_filenames.txt │ │ ├── ex05_word_line_matching.txt │ │ ├── ex06_ABC_context_matching.txt │ │ ├── ex07_recursive_search.txt │ │ ├── ex08_search_pattern_from_file.txt │ │ ├── ex09_regex_anchors.txt │ │ ├── ex10_regex_this_or_that.txt │ │ ├── ex11_regex_quantifiers.txt │ │ ├── ex12_regex_character_class_part1.txt │ │ ├── ex13_regex_character_class_part2.txt │ │ ├── ex14_regex_grouping_and_backreference.txt │ │ ├── ex15_regex_PCRE.txt │ │ └── ex16_misc_and_extras.txt │ ├── ex01_basic_match.txt │ ├── ex01_basic_match │ │ └── sample.txt │ ├── ex02_basic_options.txt │ ├── ex02_basic_options │ │ └── sample.txt │ ├── ex03_multiple_string_match.txt │ ├── ex03_multiple_string_match │ │ └── sample.txt │ ├── ex04_filenames.txt │ ├── ex04_filenames │ │ ├── greeting.txt │ │ ├── poem.txt │ │ └── sample.txt │ ├── ex05_word_line_matching.txt │ ├── ex05_word_line_matching │ │ ├── greeting.txt │ │ ├── sample.txt │ │ └── words.txt │ ├── ex06_ABC_context_matching.txt │ ├── ex06_ABC_context_matching │ │ └── sample.txt │ ├── ex07_recursive_search.txt │ ├── ex07_recursive_search │ │ ├── msg │ │ │ ├── greeting.txt │ │ │ └── sample.txt │ │ ├── poem.txt │ │ ├── progs │ │ │ ├── hello.py │ │ │ └── hello.sh │ │ └── words.txt │ ├── ex08_search_pattern_from_file.txt │ ├── ex08_search_pattern_from_file │ │ ├── baz.txt │ │ ├── foo.txt │ │ └── words.txt │ ├── ex09_regex_anchors.txt │ ├── ex09_regex_anchors │ │ └── sample.txt │ ├── ex10_regex_this_or_that.txt │ ├── ex10_regex_this_or_that │ │ └── sample.txt │ ├── ex11_regex_quantifiers.txt │ ├── ex11_regex_quantifiers │ │ └── garbled.txt │ ├── ex12_regex_character_class_part1.txt │ ├── ex12_regex_character_class_part1 │ │ └── sample_words.txt │ ├── ex13_regex_character_class_part2.txt │ ├── ex13_regex_character_class_part2 │ │ └── sample.txt │ ├── 
ex14_regex_grouping_and_backreference.txt │ ├── ex14_regex_grouping_and_backreference │ │ └── sample.txt │ ├── ex15_regex_PCRE.txt │ ├── ex15_regex_PCRE │ │ └── sample.txt │ ├── ex16_misc_and_extras.txt │ ├── ex16_misc_and_extras │ │ ├── garbled.txt │ │ ├── poem.txt │ │ └── sample.txt │ └── solve └── README.md ├── file_attributes.md ├── gnu_awk.md ├── gnu_grep.md ├── gnu_sed.md ├── images ├── color_option.png ├── colordiff.png ├── highlight_string_whole_file_op.png └── wdiff_to_colordiff.png ├── miscellaneous.md ├── overview_presentation ├── baz.json ├── cli_text_processing.pdf ├── foo.xml ├── greeting.txt └── sample.txt ├── perl_the_swiss_knife.md ├── restructure_text.md ├── ruby_one_liners.md ├── sorting_stuff.md ├── tail_less_cat_head.md ├── whats_the_difference.md └── wheres_my_file.md /README.md: -------------------------------------------------------------------------------- 1 | # Command Line Text Processing 2 | 3 | Learn about various commands available for common and exotic text processing needs. Examples have been tested on GNU/Linux — there may be syntax/feature variations on other distributions; consult their respective `man` pages for details. 4 | 5 | --- 6 | 7 | :warning: :warning: I'm no longer actively working on this repo. Instead, I've converted existing chapters into ebooks (see [ebook section](#ebooks) below for links), available under the same license. These ebooks are better formatted, updated for newer versions of the software, and include exercises, solutions, etc. Since all the chapters have been converted, I'm archiving this repo. 8 | 9 | --- 10 | 11 |
12 | 13 | ## Ebooks 14 | 15 | Individual online ebooks with better formatting, explanations, exercises, solutions, etc: 16 | 17 | * [CLI text processing with GNU grep and ripgrep](https://learnbyexample.github.io/learn_gnugrep_ripgrep/) 18 | * [CLI text processing with GNU sed](https://learnbyexample.github.io/learn_gnused/) 19 | * [CLI text processing with GNU awk](https://learnbyexample.github.io/learn_gnuawk/) 20 | * [Ruby One-Liners Guide](https://learnbyexample.github.io/learn_ruby_oneliners/) 21 | * [Perl One-Liners Guide](https://learnbyexample.github.io/learn_perl_oneliners/) 22 | * [CLI text processing with GNU Coreutils](https://learnbyexample.github.io/cli_text_processing_coreutils/) 23 | * [Linux Command Line Computing](https://learnbyexample.github.io/cli-computing/) 24 | 25 | See https://learnbyexample.github.io/books/ for links to PDF/EPUB versions and other ebooks. 26 | 27 |
28 | 29 | ## Chapters 30 | 31 | As mentioned earlier, I'm no longer actively working on these chapters: 32 | 33 | * [Cat, Less, Tail and Head](./tail_less_cat_head.md) 34 | * cat, less, tail, head, Text Editors 35 | * [GNU grep](./gnu_grep.md) 36 | * [GNU sed](./gnu_sed.md) 37 | * [GNU awk](./gnu_awk.md) 38 | * [Perl the swiss knife](./perl_the_swiss_knife.md) 39 | * [Ruby one liners](./ruby_one_liners.md) 40 | * [Sorting stuff](./sorting_stuff.md) 41 | * sort, uniq, comm, shuf 42 | * [Restructure text](./restructure_text.md) 43 | * paste, column, pr, fold 44 | * [Whats the difference](./whats_the_difference.md) 45 | * cmp, diff 46 | * [Wheres my file](./wheres_my_file.md) 47 | * [File attributes](./file_attributes.md) 48 | * wc, du, df, touch, file 49 | * [Miscellaneous](./miscellaneous.md) 50 | * cut, tr, basename, dirname, xargs, seq 51 | 52 |
53 | 54 | ## Webinar recordings 55 | 56 | Recorded a couple of videos based on content in the chapters; not sure if I'll do more: 57 | 58 | * [Using the sort command](https://www.youtube.com/watch?v=qLfAwwb5vGs) 59 | * [Using uniq and comm](https://www.youtube.com/watch?v=uAb2kxA2TyQ) 60 | 61 | See also my short videos on [Linux command line tips](https://www.youtube.com/watch?v=p0KCLusMd5Q&list=PLTv2U3HnAL4PNTmRqZBSUgKaiHbRL2zeY) 62 | 63 |
64 | 65 | ## Exercises 66 | 67 | Check out the [exercises](./exercises) directory to solve practice questions on `grep`, right from the command line itself. 68 | 69 | See also my [TUI-apps](https://github.com/learnbyexample/TUI-apps) repo for interactive CLI text processing exercises. 70 | 71 |
72 | 73 | ## Contributing 74 | 75 | * Please [open an issue](https://github.com/learnbyexample/Command-line-text-processing/issues) for typos or bugs 76 | * As this repo is no longer actively worked on, **please do not submit pull requests** 77 | * Share the repo with friends/colleagues, on social media, etc. to help it reach other learners 78 | * In case you need to reach me, mail me at `echo 'yrneaolrknzcyr.arg@tznvy.pbz' | tr 'a-z' 'n-za-m'` or send a DM via [twitter](https://twitter.com/learn_byexample) 79 | 80 |
81 | 82 | ## Acknowledgements 83 | 84 | * [unix.stackexchange](https://unix.stackexchange.com/) and [stackoverflow](https://stackoverflow.com/) - for getting answers to pertinent questions as well as sharpening skills by understanding and answering questions 85 | * Forums like [Linux users](https://www.linkedin.com/groups/65688), [/r/commandline/](https://www.reddit.com/r/commandline/), [/r/linux/](https://www.reddit.com/r/linux/), [/r/ruby/](https://www.reddit.com/r/ruby/), [news.ycombinator](https://news.ycombinator.com/news), [devup](http://devup.in/) and others for valuable feedback (especially spotting mistakes) and encouragement 86 | * See the [wikipedia entry 'Roses Are Red'](https://en.wikipedia.org/wiki/Roses_Are_Red) for the origin of `poem.txt`, used as a sample input file 87 | 88 |
89 | 90 | ## License 91 | 92 | This work is licensed under a [Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License](https://creativecommons.org/licenses/by-nc-sa/4.0/) 93 | -------------------------------------------------------------------------------- /exercises/GNU_grep/.ref_solutions/ex01_basic_match.txt: -------------------------------------------------------------------------------- 1 | 1) Match lines containing the string: day 2 | Solution: grep 'day' sample.txt 3 | 4 | 2) Match lines containing the string: it 5 | Solution: grep 'it' sample.txt 6 | 7 | 3) Match lines containing the string: do you 8 | Solution: grep 'do you' sample.txt 9 | -------------------------------------------------------------------------------- /exercises/GNU_grep/.ref_solutions/ex02_basic_options.txt: -------------------------------------------------------------------------------- 1 | 1) Match lines containing the string irrespective of lower/upper case: no 2 | Solution: grep -i 'no' sample.txt 3 | 4 | 2) Match lines not containing the string: o 5 | Solution: grep -v 'o' sample.txt 6 | 7 | 3) Match lines with line numbers containing the string: it 8 | Solution: grep -n 'it' sample.txt 9 | 10 | 4) Output only number of matching lines containing the string: a 11 | Solution: grep -c 'a' sample.txt 12 | 13 | 5) Match first two lines containing the string: do 14 | Solution: grep -m2 'do' sample.txt 15 | -------------------------------------------------------------------------------- /exercises/GNU_grep/.ref_solutions/ex03_multiple_string_match.txt: -------------------------------------------------------------------------------- 1 | 1) Match lines containing either of these three strings 2 | String1: Not 3 | String2: he 4 | String3: sun 5 | Solution: grep -e 'Not' -e 'he' -e 'sun' sample.txt 6 | 7 | 2) Match lines containing both these strings 8 | String1: He 9 | String2: or 10 | Solution: grep 'He' sample.txt | grep 'or' 11 | 12 | 3) Match lines containing 
either of these two strings 13 | String1: a 14 | String2: i 15 | and contains this as well 16 | String3: do 17 | Solution: grep -e 'a' -e 'i' sample.txt | grep 'do' 18 | 19 | 4) Match lines containing the string 20 | String1: it 21 | but not these strings 22 | String2: No 23 | String3: no 24 | Solution: grep 'it' sample.txt | grep -vi 'no' 25 | -------------------------------------------------------------------------------- /exercises/GNU_grep/.ref_solutions/ex04_filenames.txt: -------------------------------------------------------------------------------- 1 | Note: All files present in the directory should be given as file inputs to grep 2 | 3 | 1) Show only filenames containing the string: are 4 | Solution: grep -l 'are' * 5 | 6 | 2) Show only filenames NOT containing the string: two 7 | Solution: grep -L 'two' * 8 | 9 | 3) Match all lines containing the string: are 10 | Solution: grep 'are' * 11 | 12 | 4) Match maximum of two matching lines along with filenames containing the character: a 13 | Solution: grep -m2 'a' * 14 | 15 | 5) Match all lines without prefixing filename containing the string: to 16 | Solution: grep -h 'to' * 17 | -------------------------------------------------------------------------------- /exercises/GNU_grep/.ref_solutions/ex05_word_line_matching.txt: -------------------------------------------------------------------------------- 1 | Note: All files present in the directory should be given as file inputs to grep 2 | 3 | 1) Match lines containing whole word: do 4 | Solution: grep -w 'do' * 5 | 6 | 2) Match whole lines containing the string: Hello World 7 | Solution: grep -x 'Hello World' * 8 | 9 | 3) Match lines containing these whole words: 10 | Word1: He 11 | Word2: far 12 | Solution: grep -w -e 'far' -e 'He' * 13 | 14 | 4) Match lines containing the whole word: you 15 | and NOT containing the case insensitive string: How 16 | Solution: grep -w 'you' * | grep -vi 'how' 17 | 
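The `-w` and `-x` options used in the ex05 solutions above can be tried without the exercise files by piping in text with `printf` — a quick sketch, with made-up input strings for illustration:

```shell
# plain substring match: both lines match, since 'Othello' contains 'hello'
printf 'say hello\nOthello\n' | grep 'hello'

# -w: match only whole words; 'Othello' is rejected because the match
# is preceded by a word character there
printf 'say hello\nOthello\n' | grep -w 'hello'

# -x: match only whole lines; 'say hello' is rejected
printf 'say hello\nhello\n' | grep -x 'hello'
```

`-w` counts a match as a word only when it is not adjacent to letters, digits or underscore; `-x` behaves as if the whole pattern were anchored with `^` and `$`.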
-------------------------------------------------------------------------------- /exercises/GNU_grep/.ref_solutions/ex06_ABC_context_matching.txt: -------------------------------------------------------------------------------- 1 | 1) Get lines and 3 following it containing the string: you 2 | Solution: grep -A3 'you' sample.txt 3 | 4 | 2) Get lines and 2 preceding it containing the string: is 5 | Solution: grep -B2 'is' sample.txt 6 | 7 | 3) Get lines and 1 following/preceding containing the string: Not 8 | Solution: grep -C1 'Not' sample.txt 9 | 10 | 4) Get lines and 1 following and 4 preceding containing the string: Not 11 | Solution: grep -A1 -B4 'Not' sample.txt 12 | 13 | 5) Get lines and 1 preceding it containing the string: you 14 | there should be no separator between the matches 15 | Solution: grep --no-group-separator -B1 'you' sample.txt 16 | 17 | 6) Get lines and 1 preceding it containing the string: you 18 | the separator between the matches should be: ##### 19 | Solution: grep --group-separator='#####' -B1 'you' sample.txt 20 | -------------------------------------------------------------------------------- /exercises/GNU_grep/.ref_solutions/ex07_recursive_search.txt: -------------------------------------------------------------------------------- 1 | Note: Every file in this directory and sub-directories is input for grep, unless otherwise specified 2 | 3 | 1) Match all lines containing the string: you 4 | Solution: grep -r 'you' 5 | 6 | 2) Show only filenames matching the string: Hello 7 | filenames should only end with .txt 8 | Solution: grep -rl --include='*.txt' 'Hello' 9 | 10 | 3) Show only filenames matching the string: Hello 11 | filenames should NOT end with .txt 12 | Solution: grep -rl --exclude='*.txt' 'Hello' 13 | 14 | 4) Show only filenames matching the string: are 15 | should not include the directory: progs 16 | Solution: grep -rl --exclude-dir='progs' 'are' 17 | 18 | 5) Show only filenames matching the string: are 19 | should NOT 
include these directories 20 | dir1: progs 21 | dir2: msg 22 | Solution: grep -rl --exclude-dir='progs' --exclude-dir='msg' 'are' 23 | 24 | 6) Show only filenames matching the string: are 25 | should include files only from sub-directories 26 | hint: use shell glob pattern to specify directories to search 27 | Solution: grep -rl 'are' */ 28 | -------------------------------------------------------------------------------- /exercises/GNU_grep/.ref_solutions/ex08_search_pattern_from_file.txt: -------------------------------------------------------------------------------- 1 | Note: words.txt has only whole words per line, use it as file input when task is to match whole words 2 | 3 | 1) Match all strings from file words.txt in file baz.txt 4 | Solution: grep -f words.txt baz.txt 5 | 6 | 2) Match all words from file words.txt in file foo.txt 7 | should only match whole words 8 | should print only matching words, not entire line 9 | Solution: grep -owf words.txt foo.txt 10 | 11 | 3) Show common lines between foo.txt and baz.txt 12 | Solution: grep -Fxf foo.txt baz.txt 13 | 14 | 4) Show lines present in baz.txt but not in foo.txt 15 | Solution: grep -Fxvf foo.txt baz.txt 16 | 17 | 5) Show lines present in foo.txt but not in baz.txt 18 | Solution: grep -Fxvf baz.txt foo.txt 19 | 20 | 6) Find all words common between all three files in the directory 21 | should only match whole words 22 | should print only matching words, not entire line 23 | Solution: grep -owf words.txt foo.txt | grep -owf- baz.txt 24 | -------------------------------------------------------------------------------- /exercises/GNU_grep/.ref_solutions/ex09_regex_anchors.txt: -------------------------------------------------------------------------------- 1 | 1) Match all lines starting with: no 2 | Solution: grep '^no' sample.txt 3 | 4 | 2) Match all lines ending with: it 5 | Solution: grep 'it$' sample.txt 6 | 7 | 3) Match all lines containing whole word: do 8 | Solution: grep -w 'do' sample.txt 9 | 10 
| 4) Match all lines containing words starting with: do 11 | Solution: grep '\<do' sample.txt 12 | 13 | 5) Match all lines containing words ending with: 14 | Solution: grep '\>' sample.txt 15 | 16 | 6) Match all lines starting with: ^ 17 | Solution: grep '^^' sample.txt 18 | 19 | 7) Match all lines ending with: $ 20 | Solution: grep '$$' sample.txt 21 | 22 | 8) Match all lines containing the string: in 23 | not surrounded by word boundaries, for ex: mint but not tin or ink 24 | Solution: grep '\Bin\B' sample.txt 25 | -------------------------------------------------------------------------------- /exercises/GNU_grep/.ref_solutions/ex10_regex_this_or_that.txt: -------------------------------------------------------------------------------- 1 | 1) Match all lines containing any of these strings: 2 | String1: day 3 | String2: not 4 | Solution: grep -E 'day|not' sample.txt 5 | 6 | 2) Match all lines containing any of these whole words: 7 | String1: he 8 | String2: in 9 | Solution: grep -wE 'he|in' sample.txt 10 | 11 | 3) Match all lines containing any of these strings: 12 | String1: you 13 | String2: be 14 | String3: to 15 | String4: he 16 | Solution: grep -E 'he|be|to|you' sample.txt 17 | 18 | 4) Match all lines containing any of these strings: 19 | String1: you 20 | String2: be 21 | String3: to 22 | String4: he 23 | but NOT these strings: 24 | String1: it 25 | String2: do 26 | Solution: grep -E 'he|be|to|you' sample.txt | grep -vE 'do|it' 27 | 28 | 5) Match all lines starting with any of these strings: 29 | String1: no 30 | String2: to 31 | Solution: grep -E '^no|^to' sample.txt 32 | -------------------------------------------------------------------------------- /exercises/GNU_grep/.ref_solutions/ex11_regex_quantifiers.txt: -------------------------------------------------------------------------------- 1 | 1) Extract all 3 character strings surrounded by word boundaries 2 | Solution: grep -ow '...' 
garbled.txt 3 | 4 | 2) Extract largest string from each line 5 | starting with character: d 6 | ending with character : g 7 | Solution: grep -o 'd.*g' garbled.txt 8 | 9 | 3) Extract all strings from each line 10 | starting with character: d 11 | followed by zero or one: o 12 | ending with character : g 13 | Solution: grep -oE 'do?g' garbled.txt 14 | 15 | 4) Extract all strings from each line 16 | starting with character: d 17 | followed by zero or one of any character 18 | ending with character : g 19 | Solution: grep -oE 'd.?g' garbled.txt 20 | 21 | 5) Extract all strings from each line 22 | starting with character: g 23 | followed by at least one: o 24 | ending with character : d 25 | Solution: grep -oE 'go+d' garbled.txt 26 | 27 | 6) Extract all strings from each line 28 | starting with character : g 29 | followed by exactly six: o 30 | ending with character : d 31 | Solution: grep -oE 'go{6}d' garbled.txt 32 | 33 | 7) Extract all strings from each line 34 | starting with character : g 35 | followed by min two and max four: o 36 | ending with character : d 37 | Solution: grep -oE 'go{2,4}d' garbled.txt 38 | 39 | 8) Extract all strings from each line 40 | starting with character: d 41 | followed by max of two : o 42 | ending with character : g 43 | Solution: grep -oE 'do{,2}g' garbled.txt 44 | 45 | 9) Extract all strings from each line 46 | starting with character : g 47 | followed by min of three: o 48 | ending with character : d 49 | Solution: grep -oE 'go{3,}d' garbled.txt 50 | -------------------------------------------------------------------------------- /exercises/GNU_grep/.ref_solutions/ex12_regex_character_class_part1.txt: -------------------------------------------------------------------------------- 1 | 1) Match all lines containing any of these characters: 2 | character1: q 3 | character2: x 4 | character3: z 5 | Solution: grep '[qzx]' sample_words.txt 6 | 7 | 2) Match all lines containing any of these characters: 8 | character1: c 9 | character2: f 
10 | followed by any character 11 | followed by : t 12 | Solution: grep '[cf].t' sample_words.txt 13 | 14 | 3) Extract all words starting with character: s 15 | ignore case 16 | should contain only alphabets 17 | minimum two letters 18 | should be surrounded by word boundaries 19 | Solution: grep -iowE 's[a-z]+' sample_words.txt 20 | 21 | 4) Extract all words made up of these characters: 22 | character1: a 23 | character2: c 24 | character3: e 25 | character4: r 26 | character5: s 27 | ignore case 28 | should contain only alphabets 29 | should be surrounded by word boundaries 30 | Solution: grep -iowE '[acers]+' sample_words.txt 31 | 32 | 5) Extract all numbers surrounded by word boundaries 33 | Solution: grep -ow '[0-9]*' sample_words.txt 34 | 35 | 6) Extract all numbers surrounded by word boundaries matching the condition 36 | 30 <= number <= 70 37 | Solution: grep -owE '[3-6][0-9]|70' sample_words.txt 38 | 39 | 7) Extract all words made up of non-vowel characters 40 | ignore case 41 | should contain only alphabets and at least two 42 | should be surrounded by word boundaries 43 | Solution: grep -iowE '[b-df-hj-np-tv-z]{2,}' sample_words.txt 44 | 45 | 8) Extract all sequence of strings consisting of character: - 46 | surrounded on either side by zero or more case insensitive alphabets 47 | Solution: grep -io '[a-z]*-[a-z]*' sample_words.txt 48 | -------------------------------------------------------------------------------- /exercises/GNU_grep/.ref_solutions/ex13_regex_character_class_part2.txt: -------------------------------------------------------------------------------- 1 | 1) Extract all characters before first occurrence of = 2 | Solution: grep -o '^[^=]*' sample.txt 3 | 4 | 2) Extract all characters from start of line made up of these characters 5 | upper or lower case alphabets 6 | all digits 7 | the underscore character 8 | Solution: grep -o '^\w*' sample.txt 9 | 10 | 3) Match all lines containing the sequence 11 | String1: there 12 | any number of 
whitespace 13 | String2: have 14 | Solution: grep 'there\s*have' sample.txt 15 | 16 | 4) Extract all characters from start of line made up of these characters 17 | upper or lower case alphabets 18 | all digits 19 | the characters [ and ] 20 | ending with ] 21 | Solution: grep -oi '^[]a-z0-9[]*]' sample.txt 22 | 23 | 5) Extract all punctuation characters from first line 24 | Solution: grep -om1 '[[:punct:]]' sample.txt 25 | -------------------------------------------------------------------------------- /exercises/GNU_grep/.ref_solutions/ex14_regex_grouping_and_backreference.txt: -------------------------------------------------------------------------------- 1 | 1) Match lines containing these strings 2 | String1: scare 3 | String2: spore 4 | Solution: grep -E 's(po|ca)re' sample.txt 5 | 6 | 2) Extract these words 7 | Word1: handy 8 | Word2: hand 9 | Word3: hands 10 | Word4: handful 11 | Solution: grep -oE 'hand([sy]|ful)?' sample.txt 12 | 13 | 3) Extract all whole words with at least one letter occurring twice in the word 14 | ignore case 15 | only alphabets 16 | the letter occurring twice need not be placed next to each other 17 | Solution: grep -ioE '[a-z]*([a-z])[a-z]*\1[a-z]*' sample.txt 18 | 19 | 4) Match lines where same sequence of three consecutive alphabets is matched another time in the same line 20 | ignore case 21 | Solution: grep -iE '([a-z]{3}).*\1' sample.txt 22 | -------------------------------------------------------------------------------- /exercises/GNU_grep/.ref_solutions/ex15_regex_PCRE.txt: -------------------------------------------------------------------------------- 1 | 1) Extract all strings to the right of = 2 | provided characters from start of line until = do not include [ or ] 3 | Solution: grep -oP '^[^][=]+=\K.*' sample.txt 4 | 5 | 2) Match all lines containing the string: Hi 6 | but shouldn't be followed afterwards in the line by: are 7 | Solution: grep -P 'Hi(?!.*are)' sample.txt 8 | 9 | 3) Extract from start of line up to the 
string: Hi 10 | provided it is followed afterwards in the line by: you 11 | Solution: grep -oP '.*Hi(?=.*you)' sample.txt 12 | 13 | 4) Extract all sequence of characters surrounded on both sides by space character 14 | the space character should not be part of output 15 | Solution: grep -oP ' \K[^ ]+(?= )' sample.txt 16 | 17 | 5) Extract all words 18 | made of upper or lower case alphabets 19 | at least two letters in length 20 | surrounded by word boundaries 21 | should not contain consecutive repeated alphabets 22 | Solution: grep -iowP '[a-z]*([a-z])\1[a-z]*(*SKIP)(*F)|[a-z]{2,}' sample.txt 23 | 24 | -------------------------------------------------------------------------------- /exercises/GNU_grep/.ref_solutions/ex16_misc_and_extras.txt: -------------------------------------------------------------------------------- 1 | Note: all files in directory are input to grep, unless otherwise specified 2 | 3 | 1) Extract all negative numbers 4 | starts with - followed by one or more digits 5 | do not output filenames 6 | Solution: grep -hoE -- '-[0-9]+' * 7 | 8 | 2) Display only filenames containing these two strings anywhere in the file 9 | String1: day 10 | String2: and 11 | Solution: grep -zlE 'day.*and|and.*day' * 12 | 13 | 3) The below command 14 | grep -c '^Solution:' ../.ref_solutions/* 15 | will give number of questions in each exercise. 
Change it, using another command and pipe if needed, so that only overall total is printed 16 | Solution: cat ../.ref_solutions/* | grep -c '^Solution:' 17 | 18 | -------------------------------------------------------------------------------- /exercises/GNU_grep/ex01_basic_match.txt: -------------------------------------------------------------------------------- 1 | 1) Match lines containing the string: day 2 | 3 | 4 | 2) Match lines containing the string: it 5 | 6 | 7 | 3) Match lines containing the string: do you 8 | 9 | -------------------------------------------------------------------------------- /exercises/GNU_grep/ex01_basic_match/sample.txt: -------------------------------------------------------------------------------- 1 | Hello World! 2 | 3 | Good day 4 | How do you do? 5 | 6 | Just do it 7 | Believe it! 8 | 9 | Today is sunny 10 | Not a bit funny 11 | No doubt you like it too 12 | 13 | Much ado about nothing 14 | He he he 15 | -------------------------------------------------------------------------------- /exercises/GNU_grep/ex02_basic_options.txt: -------------------------------------------------------------------------------- 1 | 1) Match lines containing the string irrespective of lower/upper case: no 2 | 3 | 4 | 2) Match lines not containing the string: o 5 | 6 | 7 | 3) Match lines with line numbers containing the string: it 8 | 9 | 10 | 4) Output only number of matching lines containing the string: a 11 | 12 | 13 | 5) Match first two lines containing the string: do 14 | 15 | -------------------------------------------------------------------------------- /exercises/GNU_grep/ex02_basic_options/sample.txt: -------------------------------------------------------------------------------- 1 | Hello World! 2 | 3 | Good day 4 | How do you do? 5 | 6 | Just do it 7 | Believe it! 
8 | 9 | Today is sunny 10 | Not a bit funny 11 | No doubt you like it too 12 | 13 | Much ado about nothing 14 | He he he 15 | -------------------------------------------------------------------------------- /exercises/GNU_grep/ex03_multiple_string_match.txt: -------------------------------------------------------------------------------- 1 | 1) Match lines containing either of these three strings 2 | String1: Not 3 | String2: he 4 | String3: sun 5 | 6 | 7 | 2) Match lines containing both these strings 8 | String1: He 9 | String2: or 10 | 11 | 12 | 3) Match lines containing either of these two strings 13 | String1: a 14 | String2: i 15 | and contains this as well 16 | String3: do 17 | 18 | 19 | 4) Match lines containing the string 20 | String1: it 21 | but not these strings 22 | String2: No 23 | String3: no 24 | 25 | -------------------------------------------------------------------------------- /exercises/GNU_grep/ex03_multiple_string_match/sample.txt: -------------------------------------------------------------------------------- 1 | Hello World! 2 | 3 | Good day 4 | How do you do? 5 | 6 | Just do it 7 | Believe it! 
8 | 9 | Today is sunny 10 | Not a bit funny 11 | No doubt you like it too 12 | 13 | Much ado about nothing 14 | He he he 15 | -------------------------------------------------------------------------------- /exercises/GNU_grep/ex04_filenames.txt: -------------------------------------------------------------------------------- 1 | Note: All files present in the directory should be given as file inputs to grep 2 | 3 | 1) Show only filenames containing the string: are 4 | 5 | 6 | 2) Show only filenames NOT containing the string: two 7 | 8 | 9 | 3) Match all lines containing the string: are 10 | 11 | 12 | 4) Match maximum of two matching lines along with filenames containing the character: a 13 | 14 | 15 | 5) Match all lines without prefixing filename containing the string: to 16 | 17 | -------------------------------------------------------------------------------- /exercises/GNU_grep/ex04_filenames/greeting.txt: -------------------------------------------------------------------------------- 1 | Hi, how are you? 2 | 3 | Hola :) 4 | 5 | Hello world 6 | 7 | Good day 8 | 9 | Rock on 10 | -------------------------------------------------------------------------------- /exercises/GNU_grep/ex04_filenames/poem.txt: -------------------------------------------------------------------------------- 1 | Roses are red, 2 | Violets are blue, 3 | Sugar is sweet, 4 | And so are you. 5 | -------------------------------------------------------------------------------- /exercises/GNU_grep/ex04_filenames/sample.txt: -------------------------------------------------------------------------------- 1 | Hello World! 2 | 3 | Good day 4 | How do you do? 5 | 6 | Just do it 7 | Believe it! 
8 | 9 | Today is sunny 10 | Not a bit funny 11 | No doubt you like it too 12 | 13 | Much ado about nothing 14 | He he he 15 | -------------------------------------------------------------------------------- /exercises/GNU_grep/ex05_word_line_matching.txt: -------------------------------------------------------------------------------- 1 | Note: All files present in the directory should be given as file inputs to grep 2 | 3 | 1) Match lines containing whole word: do 4 | 5 | 6 | 2) Match whole lines containing the string: Hello World 7 | 8 | 9 | 3) Match lines containing these whole words: 10 | Word1: He 11 | Word2: far 12 | 13 | 14 | 4) Match lines containing the whole word: you 15 | and NOT containing the case insensitive string: How 16 | 17 | -------------------------------------------------------------------------------- /exercises/GNU_grep/ex05_word_line_matching/greeting.txt: -------------------------------------------------------------------------------- 1 | Hi, how are you? 2 | 3 | Hola :) 4 | 5 | Hello World 6 | 7 | Good day 8 | 9 | Rock on 10 | -------------------------------------------------------------------------------- /exercises/GNU_grep/ex05_word_line_matching/sample.txt: -------------------------------------------------------------------------------- 1 | Hello World! 2 | 3 | Good day 4 | How do you do? 5 | 6 | Just do it 7 | Believe it! 
8 | 9 | Today is sunny 10 | Not a bit funny 11 | No doubt you like it too 12 | 13 | Much ado about nothing 14 | He he he 15 | -------------------------------------------------------------------------------- /exercises/GNU_grep/ex05_word_line_matching/words.txt: -------------------------------------------------------------------------------- 1 | afar 2 | far 3 | carfare 4 | farce 5 | faraway 6 | airfare 7 | -------------------------------------------------------------------------------- /exercises/GNU_grep/ex06_ABC_context_matching.txt: -------------------------------------------------------------------------------- 1 | 1) Get lines and 3 following it containing the string: you 2 | 3 | 4 | 2) Get lines and 2 preceding it containing the string: is 5 | 6 | 7 | 3) Get lines and 1 following/preceding containing the string: Not 8 | 9 | 10 | 4) Get lines and 1 following and 4 preceding containing the string: Not 11 | 12 | 13 | 5) Get lines and 1 preceding it containing the string: you 14 | there should be no separator between the matches 15 | 16 | 17 | 6) Get lines and 1 preceding it containing the string: you 18 | the separator between the matches should be: ##### 19 | 20 | -------------------------------------------------------------------------------- /exercises/GNU_grep/ex06_ABC_context_matching/sample.txt: -------------------------------------------------------------------------------- 1 | Hello World! 2 | 3 | Good day 4 | How do you do? 5 | 6 | Just do it 7 | Believe it! 
8 | 9 | Today is sunny 10 | Not a bit funny 11 | No doubt you like it too 12 | 13 | Much ado about nothing 14 | He he he 15 | -------------------------------------------------------------------------------- /exercises/GNU_grep/ex07_recursive_search.txt: -------------------------------------------------------------------------------- 1 | Note: Every file in this directory and sub-directories is input for grep, unless otherwise specified 2 | 3 | 1) Match all lines containing the string: you 4 | 5 | 6 | 2) Show only filenames matching the string: Hello 7 | filenames should only end with .txt 8 | 9 | 10 | 3) Show only filenames matching the string: Hello 11 | filenames should NOT end with .txt 12 | 13 | 14 | 4) Show only filenames matching the string: are 15 | should not include the directory: progs 16 | 17 | 18 | 5) Show only filenames matching the string: are 19 | should NOT include these directories 20 | dir1: progs 21 | dir2: msg 22 | 23 | 24 | 6) Show only filenames matching the string: are 25 | should include files only from sub-directories 26 | hint: use shell glob pattern to specify directories to search 27 | 28 | -------------------------------------------------------------------------------- /exercises/GNU_grep/ex07_recursive_search/msg/greeting.txt: -------------------------------------------------------------------------------- 1 | Hi, how are you? 2 | 3 | Hola :) 4 | 5 | Hello World 6 | 7 | Good day 8 | 9 | Rock on 10 | -------------------------------------------------------------------------------- /exercises/GNU_grep/ex07_recursive_search/msg/sample.txt: -------------------------------------------------------------------------------- 1 | Hello World! 2 | 3 | Good day 4 | How do you do? 5 | 6 | Just do it 7 | Believe it! 
8 | 9 | Today is sunny 10 | Not a bit funny 11 | No doubt you like it too 12 | 13 | Much ado about nothing 14 | He he he 15 | -------------------------------------------------------------------------------- /exercises/GNU_grep/ex07_recursive_search/poem.txt: -------------------------------------------------------------------------------- 1 | Roses are red, 2 | Violets are blue, 3 | Sugar is sweet, 4 | And so are you. 5 | -------------------------------------------------------------------------------- /exercises/GNU_grep/ex07_recursive_search/progs/hello.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/python3 2 | 3 | print("Hello World") 4 | -------------------------------------------------------------------------------- /exercises/GNU_grep/ex07_recursive_search/progs/hello.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | echo "Hello $USER" 4 | echo "Today is $(date -u +%A)" 5 | echo 'Hope you are having a nice day' 6 | -------------------------------------------------------------------------------- /exercises/GNU_grep/ex07_recursive_search/words.txt: -------------------------------------------------------------------------------- 1 | afar 2 | far 3 | carfare 4 | farce 5 | faraway 6 | airfare 7 | -------------------------------------------------------------------------------- /exercises/GNU_grep/ex08_search_pattern_from_file.txt: -------------------------------------------------------------------------------- 1 | Note: words.txt has only whole words per line, use it as file input when task is to match whole words 2 | 3 | 1) Match all strings from file words.txt in file baz.txt 4 | 5 | 6 | 2) Match all words from file words.txt in file foo.txt 7 | should only match whole words 8 | should print only matching words, not entire line 9 | 10 | 11 | 3) Show common lines between foo.txt and baz.txt 12 | 13 | 14 | 4) Show lines present in baz.txt but not in foo.txt 
15 | 16 | 17 | 5) Show lines present in foo.txt but not in baz.txt 18 | 19 | 20 | 6) Find all words common between all three files in the directory 21 | should only match whole words 22 | should print only matching words, not entire line 23 | 24 | -------------------------------------------------------------------------------- /exercises/GNU_grep/ex08_search_pattern_from_file/baz.txt: -------------------------------------------------------------------------------- 1 | I saw a few red cars going that way 2 | To the end! 3 | Are you coming today to the party? 4 | a[5] = 'good'; 5 | Have you read the Harry Potter series? 6 | -------------------------------------------------------------------------------- /exercises/GNU_grep/ex08_search_pattern_from_file/foo.txt: -------------------------------------------------------------------------------- 1 | part 2 | a[5] = 'good'; 3 | I saw a few red cars going that way 4 | Believe it! 5 | to do list 6 | -------------------------------------------------------------------------------- /exercises/GNU_grep/ex08_search_pattern_from_file/words.txt: -------------------------------------------------------------------------------- 1 | car 2 | part 3 | to 4 | read 5 | -------------------------------------------------------------------------------- /exercises/GNU_grep/ex09_regex_anchors.txt: -------------------------------------------------------------------------------- 1 | 1) Match all lines starting with: no 2 | 3 | 4 | 2) Match all lines ending with: it 5 | 6 | 7 | 3) Match all lines containing whole word: do 8 | 9 | 10 | 4) Match all lines containing words starting with: do 11 | 12 | 13 | 5) Match all lines containing words ending with: do 14 | 15 | 16 | 6) Match all lines starting with: ^ 17 | 18 | 19 | 7) Match all lines ending with: $ 20 | 21 | 22 | 8) Match all lines containing the string: in 23 | not surrounded by word boundaries, for ex: mint but not tin or ink 24 | 25 | 
-------------------------------------------------------------------------------- /exercises/GNU_grep/ex09_regex_anchors/sample.txt: -------------------------------------------------------------------------------- 1 | hello world! 2 | 3 | good day 4 | how do you do? 5 | 6 | just do it 7 | believe it! 8 | 9 | today is sunny 10 | not a bit funny 11 | no doubt you like it too 12 | 13 | much ado about nothing 14 | he he he 15 | 16 | ^ could be exponentiation or xor operator 17 | scalar variables in perl start with $ 18 | -------------------------------------------------------------------------------- /exercises/GNU_grep/ex10_regex_this_or_that.txt: -------------------------------------------------------------------------------- 1 | 1) Match all lines containing any of these strings: 2 | String1: day 3 | String2: not 4 | 5 | 6 | 2) Match all lines containing any of these whole words: 7 | String1: he 8 | String2: in 9 | 10 | 11 | 3) Match all lines containing any of these strings: 12 | String1: you 13 | String2: be 14 | String3: to 15 | String4: he 16 | 17 | 18 | 4) Match all lines containing any of these strings: 19 | String1: you 20 | String2: be 21 | String3: to 22 | String4: he 23 | but NOT these strings: 24 | String1: it 25 | String2: do 26 | 27 | 28 | 5) Match all lines starting with any of these strings: 29 | String1: no 30 | String2: to 31 | 32 | -------------------------------------------------------------------------------- /exercises/GNU_grep/ex10_regex_this_or_that/sample.txt: -------------------------------------------------------------------------------- 1 | hello world! 2 | 3 | good day 4 | how do you do? 5 | 6 | just do it 7 | believe it! 
8 | 9 | today is sunny 10 | not a bit funny 11 | no doubt you like it too 12 | 13 | much ado about nothing 14 | he he he 15 | 16 | ^ could be exponentiation or xor operator 17 | scalar variables in perl start with $ 18 | -------------------------------------------------------------------------------- /exercises/GNU_grep/ex11_regex_quantifiers.txt: -------------------------------------------------------------------------------- 1 | 1) Extract all 3 character strings surrounded by word boundaries 2 | 3 | 4 | 2) Extract largest string from each line 5 | starting with character: d 6 | ending with character : g 7 | 8 | 9 | 3) Extract all strings from each line 10 | starting with character: d 11 | followed by zero or one: o 12 | ending with character : g 13 | 14 | 15 | 4) Extract all strings from each line 16 | starting with character: d 17 | followed by zero or one of any character 18 | ending with character : g 19 | 20 | 21 | 5) Extract all strings from each line 22 | starting with character: g 23 | followed by at least one: o 24 | ending with character : d 25 | 26 | 27 | 6) Extract all strings from each line 28 | starting with character : g 29 | followed by exactly six: o 30 | ending with character : d 31 | 32 | 33 | 7) Extract all strings from each line 34 | starting with character : g 35 | followed by min two and max four: o 36 | ending with character : d 37 | 38 | 39 | 8) Extract all strings from each line 40 | starting with character: d 41 | followed by max of two : o 42 | ending with character : g 43 | 44 | 45 | 9) Extract all strings from each line 46 | starting with character : g 47 | followed by min of three: o 48 | ending with character : d 49 | 50 | -------------------------------------------------------------------------------- /exercises/GNU_grep/ex11_regex_quantifiers/garbled.txt: -------------------------------------------------------------------------------- 1 | gd 2 | god 3 | goood 4 | oh gold 5 | goooooodyyyy 6 | dog 7 | dg 8 | dig good gold 9 |
doogoodog 10 | c@t made forty justify 11 | dodging a toy 12 | -------------------------------------------------------------------------------- /exercises/GNU_grep/ex12_regex_character_class_part1.txt: -------------------------------------------------------------------------------- 1 | 1) Match all lines containing any of these characters: 2 | character1: q 3 | character2: x 4 | character3: z 5 | 6 | 7 | 2) Match all lines containing any of these characters: 8 | character1: c 9 | character2: f 10 | followed by any character 11 | followed by : t 12 | 13 | 14 | 3) Extract all words starting with character: s 15 | ignore case 16 | should contain only alphabets 17 | minimum two letters 18 | should be surrounded by word boundaries 19 | 20 | 21 | 4) Extract all words made up of these characters: 22 | character1: a 23 | character2: c 24 | character3: e 25 | character4: r 26 | character5: s 27 | ignore case 28 | should contain only alphabets 29 | should be surrounded by word boundaries 30 | 31 | 32 | 5) Extract all numbers surrounded by word boundaries 33 | 34 | 35 | 6) Extract all numbers surrounded by word boundaries matching the condition 36 | 30 <= number <= 70 37 | 38 | 39 | 7) Extract all words made up of non-vowel characters 40 | ignore case 41 | should contain only alphabets, at least two of them 42 | should be surrounded by word boundaries 43 | 44 | 45 | 8) Extract all sequences consisting of the character: - 46 | surrounded on either side by zero or more case insensitive alphabets 47 | 48 | -------------------------------------------------------------------------------- /exercises/GNU_grep/ex12_regex_character_class_part1/sample_words.txt: -------------------------------------------------------------------------------- 1 | far 30 scarce f@$t 42 fit 2 | Cute 34 quite pry far-fetched Sure 3 | 70 cast-away 12 good hue he 4 | cry just Nymph race Peace.
67 5 | foo;bar;baz;p@t 6 | ARE 72 cut copy paste 7 | p1ate rest 512 Sync 8 | -------------------------------------------------------------------------------- /exercises/GNU_grep/ex13_regex_character_class_part2.txt: -------------------------------------------------------------------------------- 1 | 1) Extract all characters before first occurrence of = 2 | 3 | 4 | 2) Extract all characters from start of line made up of these characters 5 | upper or lower case alphabets 6 | all digits 7 | the underscore character 8 | 9 | 10 | 3) Match all lines containing the sequence 11 | String1: there 12 | any number of whitespace 13 | String2: have 14 | 15 | 16 | 4) Extract all characters from start of line made up of these characters 17 | upper or lower case alphabets 18 | all digits 19 | the characters [ and ] 20 | ending with ] 21 | 22 | 23 | 5) Extract all punctuation characters from first line 24 | 25 | -------------------------------------------------------------------------------- /exercises/GNU_grep/ex13_regex_character_class_part2/sample.txt: -------------------------------------------------------------------------------- 1 | a[2]='sample string' 2 | foo_bar=4232 3 | appx_pi=3.14 4 | greeting="Hi there have a nice day" 5 | food[4]="dosa" 6 | b[0][1]=42 7 | -------------------------------------------------------------------------------- /exercises/GNU_grep/ex14_regex_grouping_and_backreference.txt: -------------------------------------------------------------------------------- 1 | 1) Match lines containing these strings 2 | String1: scare 3 | String2: spore 4 | 5 | 6 | 2) Extract these words 7 | Word1: handy 8 | Word2: hand 9 | Word3: hands 10 | Word4: handful 11 | 12 | 13 | 3) Extract all whole words with at least one letter occurring twice in the word 14 | ignore case 15 | only alphabets 16 | the letter occurring twice need not be placed next to each other 17 | 18 | 19 | 4) Match lines where same sequence of three consecutive alphabets is matched another time in the 
same line 20 | ignore case 21 | 22 | -------------------------------------------------------------------------------- /exercises/GNU_grep/ex14_regex_grouping_and_backreference/sample.txt: -------------------------------------------------------------------------------- 1 | hands hand library scare handy handful 2 | scared too big time eel candy 3 | spare food regulate circuit spore stare 4 | tire tempt cold malady 5 | -------------------------------------------------------------------------------- /exercises/GNU_grep/ex15_regex_PCRE.txt: -------------------------------------------------------------------------------- 1 | 1) Extract all strings to the right of = 2 | provided characters from start of line until = do not include [ or ] 3 | 4 | 5 | 2) Match all lines containing the string: Hi 6 | but shouldn't be followed afterwards in the line by: are 7 | 8 | 9 | 3) Extract from start of line up to the string: Hi 10 | provided it is followed afterwards in the line by: you 11 | 12 | 13 | 4) Extract all sequence of characters surrounded on both sides by space character 14 | the space character should not be part of output 15 | 16 | 17 | 5) Extract all words 18 | made of upper or lower case alphabets 19 | at least two letters in length 20 | surrounded by word boundaries 21 | should not contain consecutive repeated alphabets 22 | 23 | 24 | -------------------------------------------------------------------------------- /exercises/GNU_grep/ex15_regex_PCRE/sample.txt: -------------------------------------------------------------------------------- 1 | a[2]='Hi, how are you?' 
2 | foo_bar=4232 3 | appx_pi=3.14 4 | greeting="Hi there have a nice day" 5 | food[4]="dosa" 6 | b[0][1]=42 7 | -------------------------------------------------------------------------------- /exercises/GNU_grep/ex16_misc_and_extras.txt: -------------------------------------------------------------------------------- 1 | Note: all files in directory are input to grep, unless otherwise specified 2 | 3 | 1) Extract all negative numbers 4 | starts with - followed by one or more digits 5 | do not output filenames 6 | 7 | 8 | 2) Display only filenames containing these two strings anywhere in the file 9 | String1: day 10 | String2: and 11 | 12 | 13 | 3) The below command 14 | grep -c '^Solution:' ../.ref_solutions/* 15 | will give number of questions in each exercise. Change it, using another command and pipe if needed, so that only overall total is printed 16 | 17 | 18 | -------------------------------------------------------------------------------- /exercises/GNU_grep/ex16_misc_and_extras/garbled.txt: -------------------------------------------------------------------------------- 1 | day and night 2 | -43 and 99 and 12 3 | -------------------------------------------------------------------------------- /exercises/GNU_grep/ex16_misc_and_extras/poem.txt: -------------------------------------------------------------------------------- 1 | Roses are red, 2 | Violets are blue, 3 | Sugar is sweet, 4 | And so are you. 
5 | 6 | Good day to you :) 7 | -------------------------------------------------------------------------------- /exercises/GNU_grep/ex16_misc_and_extras/sample.txt: -------------------------------------------------------------------------------- 1 | account balance: -2300 2 | good day 3 | foo and bar and baz 4 | -------------------------------------------------------------------------------- /exercises/GNU_grep/solve: -------------------------------------------------------------------------------- 1 | dir_name=$(basename "$PWD") 2 | ref_file="../.ref_solutions/$dir_name.txt" 3 | sol_file="../$dir_name.txt" 4 | tmp_file='../.tmp.txt' 5 | 6 | # color output 7 | tcolors=$(tput colors) 8 | if [[ -n $tcolors && $tcolors -ge 8 ]]; then 9 | red=$(tput setaf 1) 10 | green=$(tput setaf 2) 11 | blue=$(tput setaf 4) 12 | clr_color=$(tput sgr0) 13 | else 14 | red='' 15 | green='' 16 | blue='' 17 | clr_color='' 18 | fi 19 | 20 | sub_sol=0 21 | if [[ $1 == -s ]]; then 22 | prev_cmd=$(fc -ln -2 | sed 's/^[ \t]*//;q') 23 | sub_sol=1 24 | elif [[ $1 == -q ]]; then 25 | # highlight the question to be solved next 26 | # or show only the (unanswered)? 
question to be solved next 27 | cat "$sol_file" 28 | return 29 | elif [[ -n $1 ]]; then 30 | echo -e 'Unknown option...Exiting script' 31 | return 32 | fi 33 | 34 | count=0 35 | sol_count=0 36 | err_count=0 37 | while IFS= read -u3 -r ref_line && read -u4 -r sol_line; do 38 | if [[ "${ref_line:0:9}" == Solution: ]]; then 39 | (( count++ )) 40 | 41 | if [[ $sub_sol == 1 && -z $sol_line ]]; then 42 | sol_line="$prev_cmd" 43 | sub_sol=0 44 | fi 45 | 46 | if [[ "$(eval "command ${ref_line:10}")" == "$(eval "command $sol_line")" ]]; then 47 | (( sol_count++ )) 48 | # use color if terminal supports 49 | echo '---------------------------------------------' 50 | echo "Match for question $count:" 51 | echo "${red}Submitted solution:${clr_color} $sol_line" 52 | echo "${green}Reference solution:${clr_color} ${ref_line:10}" 53 | echo '---------------------------------------------' 54 | else 55 | (( err_count++ )) 56 | if [[ $err_count == 1 && -n $sol_line ]]; then 57 | echo '---------------------------------------------' 58 | echo "Mismatch for question $count:" 59 | echo "$(tput bold)${red}Expected output is:${clr_color}$(tput rmso)" 60 | eval "command ${ref_line:10}" 61 | echo '---------------------------------------------' 62 | fi 63 | sol_line='' 64 | fi 65 | fi 66 | 67 | echo "$sol_line" >> "$tmp_file" 68 | 69 | done 3<"$ref_file" 4<"$sol_file" 70 | 71 | ((count==sol_count)) && printf "\t\t$(tput bold)${blue}All Pass${clr_color}$(tput rmso)\t\t\n" 72 | 73 | mv "$tmp_file" "$sol_file" 74 | 75 | # vim: syntax=bash 76 | -------------------------------------------------------------------------------- /exercises/README.md: -------------------------------------------------------------------------------- 1 | # Exercises 2 | 3 | Instructions and shell script here assumes `bash` shell. Tested on *GNU bash, version 4.3.46* 4 | 5 |
6 | 7 | * For example, the first exercise for **GNU_grep** 8 | * directory: `ex01_basic_match` 9 | * question file: `ex01_basic_match.txt` 10 | * solution reference: `.ref_solutions/ex01_basic_match.txt` 11 | * Each exercise contains one or more questions to be solved 12 | * The script `solve` will assist in checking solutions 13 | 14 | ```bash 15 | $ git clone https://github.com/learnbyexample/Command-line-text-processing.git 16 | $ cd Command-line-text-processing/exercises/GNU_grep/ 17 | $ ls 18 | ex01_basic_match ex02_basic_options ex03_multiple_string_match solve 19 | ex01_basic_match.txt ex02_basic_options.txt ex03_multiple_string_match.txt 20 | 21 | $ find -name 'ex01*' 22 | ./.ref_solutions/ex01_basic_match.txt 23 | ./ex01_basic_match 24 | ./ex01_basic_match.txt 25 | ``` 26 | 27 |
28 | 29 | * Solving the questions 30 | * Go to the exercise folder 31 | * Use `ls` to see input file(s) 32 | * To see the problems for that exercise, follow the steps below 33 | 34 | ```bash 35 | $ cd ex01_basic_match 36 | $ ls 37 | sample.txt 38 | 39 | $ # to see the questions 40 | $ source ../solve -q 41 | 1) Match lines containing the string: day 42 | 43 | 44 | 2) Match lines containing the string: it 45 | 46 | 47 | 3) Match lines containing the string: do you 48 | 49 | 50 | $ # or open the questions file with your fav editor 51 | $ gvim ../$(basename "$PWD").txt 52 | $ # create an alias to use from any ex* directory 53 | $ alias oq='gvim ../$(basename "$PWD").txt' 54 | $ oq 55 | ``` 56 | 57 |
58 | 59 | * Submitting solutions one by one 60 | * immediately after executing the command that answers a question, call the `solve` script 61 | 62 | ```bash 63 | $ grep 'day' sample.txt 64 | Good day 65 | Today is sunny 66 | $ source ../solve -s 67 | --------------------------------------------- 68 | Match for question 1: 69 | Submitted solution: grep 'day' sample.txt 70 | Reference solution: grep 'day' sample.txt 71 | --------------------------------------------- 72 | ``` 73 | 74 |
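* as an aside on how `-s` works internally: the `solve` script grabs the previous command from shell history using `fc -ln -2` and strips leading whitespace before comparing — the stripping step can be tried in isolation (sample input below assumed for illustration)

```bash
$ # sed deletes leading spaces/tabs and quits after printing the first line
$ printf '   grep day sample.txt\nsource ../solve -s\n' | sed 's/^[ \t]*//;q'
grep day sample.txt
```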
75 | 76 | * Submit all at once 77 | * by editing the `../$(basename "$PWD").txt` file directly 78 | * the answer should replace the empty line immediately following the question 79 | * **Note** 80 | * there are different ways to solve the same question 81 | * but for a specific exercise like **GNU_grep**, try to solve using `grep` only 82 | * also, remember that `eval` is used to check equivalence, so be sure of the commands you submit 83 | 84 | ```bash 85 | $ cat ../$(basename "$PWD").txt 86 | 1) Match lines containing the string: day 87 | grep 'day' sample.txt 88 | 89 | 2) Match lines containing the string: it 90 | sed -n '/it/p' sample.txt 91 | 92 | 3) Match lines containing the string: do you 93 | echo 'How do you do?' 94 | 95 | $ source ../solve 96 | --------------------------------------------- 97 | Match for question 1: 98 | Submitted solution: grep 'day' sample.txt 99 | Reference solution: grep 'day' sample.txt 100 | --------------------------------------------- 101 | --------------------------------------------- 102 | Match for question 2: 103 | Submitted solution: sed -n '/it/p' sample.txt 104 | Reference solution: grep 'it' sample.txt 105 | --------------------------------------------- 106 | --------------------------------------------- 107 | Match for question 3: 108 | Submitted solution: echo 'How do you do?' 109 | Reference solution: grep 'do you' sample.txt 110 | --------------------------------------------- 111 | All Pass 112 | ``` 113 | 114 |
115 | 116 | * Then move on to the next exercise directory 117 | * Create aliases for different commands for easy use, after first checking that those names are not already in use 118 | 119 | ```bash 120 | $ type cs cq ca nq pq 121 | bash: type: cs: not found 122 | bash: type: cq: not found 123 | bash: type: ca: not found 124 | bash: type: nq: not found 125 | bash: type: pq: not found 126 | 127 | $ alias cs='source ../solve -s' 128 | $ alias cq='source ../solve -q' 129 | $ alias ca='source ../solve' 130 | $ # to go to directory of next question, 10# avoids octal interpretation of 08/09 131 | $ nq() { d=$(basename "$PWD"); nd=$(printf "../ex%02d*/" $((10#${d:2:2}+1))); cd $nd ; } 132 | $ # to go to directory of previous question 133 | $ pq() { d=$(basename "$PWD"); pd=$(printf "../ex%02d*/" $((10#${d:2:2}-1))); cd $pd ; } 134 | ``` 135 | 136 |
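* a note on the `nq`/`pq` index arithmetic: `bash` treats numbers with a leading zero as octal inside `$(( ))`, so the extracted `08` and `09` would be invalid tokens unless base 10 is forced with `10#` (directory name below assumed for illustration)

```bash
$ d=ex08_search_pattern_from_file
$ # 10# forces base 10; a plain $(( 08 + 1 )) errors out as invalid octal
$ printf '../ex%02d*/\n' $(( 10#${d:2:2} + 1 ))
../ex09*/
```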
137 | 138 | If a wrong solution is submitted, the expected output is shown. This also helps to better understand the question, as I found it difficult to convey the intent of a question clearly with words alone... 139 | 140 | ```bash 141 | $ source ../solve -q 142 | 1) Match lines containing the string: day 143 | 144 | 145 | 2) Match lines containing the string: it 146 | 147 | 148 | 3) Match lines containing the string: do you 149 | 150 | $ grep 'do' sample.txt 151 | How do you do? 152 | Just do it 153 | No doubt you like it too 154 | Much ado about nothing 155 | $ source ../solve -s 156 | --------------------------------------------- 157 | Mismatch for question 1: 158 | Expected output is: 159 | Good day 160 | Today is sunny 161 | --------------------------------------------- 162 | ``` 163 | -------------------------------------------------------------------------------- /file_attributes.md: -------------------------------------------------------------------------------- 1 | # File attributes 2 | 3 | **Table of Contents** 4 | 5 | * [wc](#wc) 6 | * [Various counts](#various-counts) 7 | * [subtle differences](#subtle-differences) 8 | * [Further reading for wc](#further-reading-for-wc) 9 | * [du](#du) 10 | * [Default size](#default-size) 11 | * [Various size formats](#various-size-formats) 12 | * [Dereferencing links](#dereferencing-links) 13 | * [Filtering options](#filtering-options) 14 | * [Further reading for du](#further-reading-for-du) 15 | * [df](#df) 16 | * [Examples](#examples) 17 | * [Further reading for df](#further-reading-for-df) 18 | * [touch](#touch) 19 | * [Creating empty file](#creating-empty-file) 20 | * [Updating timestamps](#updating-timestamps) 21 | * [Preserving timestamp](#preserving-timestamp) 22 | * [Further reading for touch](#further-reading-for-touch) 23 | * [file](#file) 24 | * [File type examples](#file-type-examples) 25 | * [Further reading for file](#further-reading-for-file) 26 | 27 |
28 | 29 | ## wc 30 | 31 | ```bash 32 | $ wc --version | head -n1 33 | wc (GNU coreutils) 8.25 34 | 35 | $ man wc 36 | WC(1) User Commands WC(1) 37 | 38 | NAME 39 | wc - print newline, word, and byte counts for each file 40 | 41 | SYNOPSIS 42 | wc [OPTION]... [FILE]... 43 | wc [OPTION]... --files0-from=F 44 | 45 | DESCRIPTION 46 | Print newline, word, and byte counts for each FILE, and a total line if 47 | more than one FILE is specified. A word is a non-zero-length sequence 48 | of characters delimited by white space. 49 | 50 | With no FILE, or when FILE is -, read standard input. 51 | ... 52 | ``` 53 | 54 |
55 | 56 | #### Various counts 57 | 58 | ```bash 59 | $ cat sample.txt 60 | Hello World 61 | Good day 62 | No doubt you like it too 63 | Much ado about nothing 64 | He he he 65 | 66 | $ # by default, gives newline/word/byte count (in that order) 67 | $ wc sample.txt 68 | 5 17 78 sample.txt 69 | 70 | $ # options to get individual numbers 71 | $ wc -l sample.txt 72 | 5 sample.txt 73 | $ wc -w sample.txt 74 | 17 sample.txt 75 | $ wc -c sample.txt 76 | 78 sample.txt 77 | 78 | $ # use shell input redirection if filename is not needed 79 | $ wc -l < sample.txt 80 | 5 81 | ``` 82 | 83 | * multiple file input 84 | * automatically displays total at end 85 | 86 | ```bash 87 | $ cat greeting.txt 88 | Hello there 89 | Have a safe journey 90 | $ cat fruits.txt 91 | Fruit Price 92 | apple 42 93 | banana 31 94 | fig 90 95 | guava 6 96 | 97 | $ wc *.txt 98 | 5 10 57 fruits.txt 99 | 2 6 32 greeting.txt 100 | 5 17 78 sample.txt 101 | 12 33 167 total 102 | ``` 103 | 104 | * use `-L` to get length of longest line 105 | 106 | ```bash 107 | $ wc -L < sample.txt 108 | 24 109 | 110 | $ echo 'foo bar baz' | wc -L 111 | 11 112 | $ echo 'hi there!' | wc -L 113 | 9 114 | 115 | $ # last line will show max value, not sum of all input 116 | $ wc -L *.txt 117 | 13 fruits.txt 118 | 19 greeting.txt 119 | 24 sample.txt 120 | 24 total 121 | ``` 122 | 123 |
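* a quick sanity check on the `total` line: for the regular counts it is a per-column sum across the input files, whereas for `-L` it is the maximum (numbers assumed from the outputs above)

```bash
$ # lines, words and bytes summed for fruits.txt, greeting.txt and sample.txt
$ echo $(( 5 + 2 + 5 )) $(( 10 + 6 + 17 )) $(( 57 + 32 + 78 ))
12 33 167
```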
124 | 125 | #### subtle differences 126 | 127 | * byte count vs character count 128 | 129 | ```bash 130 | $ # when input is ASCII 131 | $ printf 'hi there' | wc -c 132 | 8 133 | $ printf 'hi there' | wc -m 134 | 8 135 | 136 | $ # when input has multi-byte characters 137 | $ printf 'hi👍' | od -x 138 | 0000000 6968 9ff0 8d91 139 | 0000006 140 | 141 | $ printf 'hi👍' | wc -m 142 | 3 143 | 144 | $ printf 'hi👍' | wc -c 145 | 6 146 | ``` 147 | 148 | * the `-l` option counts only the newline characters 149 | 150 | ```bash 151 | $ printf 'hi there\ngood day' | wc -l 152 | 1 153 | $ printf 'hi there\ngood day\n' | wc -l 154 | 2 155 | $ printf 'hi there\n\n\nfoo\n' | wc -l 156 | 4 157 | ``` 158 | 159 | * From `man wc` "A word is a non-zero-length sequence of characters delimited by white space" 160 | 161 | ```bash 162 | $ echo 'foo bar ;-*' | wc -w 163 | 3 164 | 165 | $ # use other text processing as needed 166 | $ echo 'foo bar ;-*' | grep -iowE '[a-z]+' 167 | foo 168 | bar 169 | $ echo 'foo bar ;-*' | grep -iowE '[a-z]+' | wc -l 170 | 2 171 | ``` 172 | 173 | * `-L` won't count non-printable characters, and tabs are converted to equivalent spaces 174 | 175 | ```bash 176 | $ printf 'food\tgood' | wc -L 177 | 12 178 | $ printf 'food\tgood' | wc -m 179 | 9 180 | $ printf 'food\tgood' | awk '{print length()}' 181 | 9 182 | 183 | $ printf 'foo\0bar\0baz' | wc -L 184 | 9 185 | $ printf 'foo\0bar\0baz' | wc -m 186 | 11 187 | $ printf 'foo\0bar\0baz' | awk '{print length()}' 188 | 11 189 | ``` 190 | 191 |
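* the tab handling of `-L` can be reproduced with `expand`, which converts tabs to spaces using tab stops assumed at every 8 columns — `food` ends at column 4, the tab jumps to column 8 and `good` extends the line to 12

```bash
$ printf 'food\tgood' | expand | wc -L
12
$ # after expansion, character count and display width agree
$ printf 'food\tgood' | expand | wc -m
12
```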
192 | 193 | #### Further reading for wc 194 | 195 | * `man wc` and `info wc` for more options and detailed documentation 196 | * [wc Q&A on unix stackexchange](https://unix.stackexchange.com/questions/tagged/wc?sort=votes&pageSize=15) 197 | * [wc Q&A on stackoverflow](https://stackoverflow.com/questions/tagged/wc?sort=votes&pageSize=15) 198 | 199 |
200 | 201 | ## du 202 | 203 | ```bash 204 | $ du --version | head -n1 205 | du (GNU coreutils) 8.25 206 | 207 | $ man du 208 | DU(1) User Commands DU(1) 209 | 210 | NAME 211 | du - estimate file space usage 212 | 213 | SYNOPSIS 214 | du [OPTION]... [FILE]... 215 | du [OPTION]... --files0-from=F 216 | 217 | DESCRIPTION 218 | Summarize disk usage of the set of FILEs, recursively for directories. 219 | ... 220 | ``` 221 | 222 |
223 | 224 |
225 | 226 | #### Default size 227 | 228 | * By default, size is reported in units of **1024 bytes** 229 | * Files are not shown individually; all directories and sub-directories are recursively reported 230 | 231 | ```bash 232 | $ ls -F 233 | projs/ py_learn@ words.txt 234 | 235 | $ du 236 | 17920 ./projs/full_addr 237 | 14316 ./projs/half_addr 238 | 32952 ./projs 239 | 33880 . 240 | ``` 241 | 242 | * use `-a` to recursively show both files and directories 243 | * use `-s` to show total directory size without descending into its sub-directories 244 | 245 | ```bash 246 | $ du -a 247 | 712 ./projs/report.log 248 | 17916 ./projs/full_addr/faddr.v 249 | 17920 ./projs/full_addr 250 | 14312 ./projs/half_addr/haddr.v 251 | 14316 ./projs/half_addr 252 | 32952 ./projs 253 | 0 ./py_learn 254 | 924 ./words.txt 255 | 33880 . 256 | 257 | $ du -s 258 | 33880 . 259 | 260 | $ du -s projs words.txt 261 | 32952 projs 262 | 924 words.txt 263 | ``` 264 | 265 | * use `-S` to show directory size without taking into account size of its sub-directories 266 | 267 | ```bash 268 | $ du -S 269 | 17920 ./projs/full_addr 270 | 14316 ./projs/half_addr 271 | 716 ./projs 272 | 928 . 273 | ``` 274 | 275 |
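* a quick cross-check of the outputs above: a directory's `-S` size plus the sizes of its sub-directories adds up to the recursive total shown by plain `du` (numbers assumed from the listings above)

```bash
$ # 716 (projs itself) + 17920 (full_addr) + 14316 (half_addr)
$ echo $(( 716 + 17920 + 14316 ))
32952
```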
276 | 277 |
278 | 279 | #### Various size formats 280 | 281 | ```bash 282 | $ # number of bytes 283 | $ stat -c %s words.txt 284 | 938848 285 | $ du -b words.txt 286 | 938848 words.txt 287 | 288 | $ # kilobytes = 1024 bytes 289 | $ du -sk projs 290 | 32952 projs 291 | $ # megabytes = 1024 kilobytes 292 | $ du -sm projs 293 | 33 projs 294 | 295 | $ # -B to specify custom byte scale size 296 | $ du -sB 5000 projs 297 | 6749 projs 298 | $ du -sB 1048576 projs 299 | 33 projs 300 | ``` 301 | 302 | * human-readable and SI units 303 | 304 | ```bash 305 | $ # in terms of powers of 1024 306 | $ # M = 1048576 bytes and so on 307 | $ du -sh projs/* words.txt 308 | 18M projs/full_addr 309 | 14M projs/half_addr 310 | 712K projs/report.log 311 | 924K words.txt 312 | 313 | $ # in terms of powers of 1000 314 | $ # M = 1000000 bytes and so on 315 | $ du -s --si projs/* words.txt 316 | 19M projs/full_addr 317 | 15M projs/half_addr 318 | 730k projs/report.log 319 | 947k words.txt 320 | ``` 321 | 322 | * sorting 323 | 324 | ```bash 325 | $ du -sh projs/* words.txt | sort -h 326 | 712K projs/report.log 327 | 924K words.txt 328 | 14M projs/half_addr 329 | 18M projs/full_addr 330 | 331 | $ du -sk projs/* | sort -nr 332 | 17920 projs/full_addr 333 | 14316 projs/half_addr 334 | 712 projs/report.log 335 | ``` 336 | 337 | * to get size based on the actual number of bytes in the file rather than the disk space allotted 338 | 339 | ```bash 340 | $ du -b words.txt 341 | 938848 words.txt 342 | 343 | $ du -h words.txt 344 | 924K words.txt 345 | 346 | $ # 938848/1024 = 916.84 347 | $ du --apparent-size -h words.txt 348 | 917K words.txt 349 | ``` 350 | 351 |
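* the unit conversions above boil down to ceiling division of the byte count — a small sketch using numbers assumed from the earlier outputs (32952 units of 1024 bytes of disk usage for `projs`, 938848 bytes for `words.txt`)

```bash
$ # -B 5000 reports the disk usage of projs in units of 5000 bytes
$ echo $(( (32952 * 1024 + 4999) / 5000 ))
6749
$ # --apparent-size -h reports the byte count in units of 1024, rounded up
$ echo $(( (938848 + 1023) / 1024 ))
917
```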
352 | 353 | #### Dereferencing links 354 | 355 | * See `man` and `info` pages for other related options 356 | 357 | ```bash 358 | $ # -D to dereference command line argument 359 | $ du py_learn 360 | 0 py_learn 361 | $ du -shD py_learn 362 | 503M py_learn 363 | 364 | $ # -L to dereference links found by du 365 | $ du -sh 366 | 34M . 367 | $ du -shL 368 | 536M . 369 | ``` 370 | 371 |
372 | 373 | #### Filtering options 374 | 375 | * `-d` to specify maximum depth 376 | 377 | ```bash 378 | $ du -ah projs 379 | 712K projs/report.log 380 | 18M projs/full_addr/faddr.v 381 | 18M projs/full_addr 382 | 14M projs/half_addr/haddr.v 383 | 14M projs/half_addr 384 | 33M projs 385 | 386 | $ du -ah -d1 projs 387 | 712K projs/report.log 388 | 18M projs/full_addr 389 | 14M projs/half_addr 390 | 33M projs 391 | ``` 392 | 393 | * `-c` to also show total size at end 394 | 395 | ```bash 396 | $ du -cshD projs py_learn 397 | 33M projs 398 | 503M py_learn 399 | 535M total 400 | ``` 401 | 402 | * `-t` to provide a threshold comparison 403 | 404 | ```bash 405 | $ # >= 15M 406 | $ du -Sh -t 15M 407 | 18M ./projs/full_addr 408 | 409 | $ # <= 1M 410 | $ du -ah -t -1M 411 | 712K ./projs/report.log 412 | 0 ./py_learn 413 | 924K ./words.txt 414 | ``` 415 | 416 | * excluding files/directories based on **glob** pattern 417 | * see also `--exclude-from=FILE` and `--files0-from=FILE` options 418 | 419 | ```bash 420 | $ # note that excluded files affect directory size reported 421 | $ du -ah --exclude='*addr*' projs 422 | 712K projs/report.log 423 | 716K projs 424 | 425 | $ # depending on shell, brace expansion can be used 426 | $ du -ah --exclude='*.'{v,log} projs 427 | 4.0K projs/full_addr 428 | 4.0K projs/half_addr 429 | 12K projs 430 | ``` 431 | 432 |
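The `--exclude-from=FILE` option noted above behaves like a series of `--exclude` options, reading one glob pattern per line from FILE. A minimal sketch using a throwaway directory (all file and directory names here are made up for illustration):

```bash
# set up a small directory tree to measure
dir=$(mktemp -d)
mkdir -p "$dir/projs"
printf 'log line\n' > "$dir/projs/report.log"
printf 'module\n' > "$dir/projs/faddr.v"

# one glob per line, same syntax as --exclude
printf '*.v\n' > "$dir/skip.txt"

# faddr.v is excluded, so only report.log contributes to the reported sizes
du -a --exclude-from="$dir/skip.txt" "$dir/projs"
```

As with `--exclude`, the skipped files also reduce the directory totals shown.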
433 | 434 | #### Further reading for du 435 | 436 | * `man du` and `info du` for more options and detailed documentation 437 | * [du Q&A on unix stackexchange](https://unix.stackexchange.com/questions/tagged/disk-usage?sort=votes&pageSize=15) 438 | * [du Q&A on stackoverflow](https://stackoverflow.com/questions/tagged/du?sort=votes&pageSize=15) 439 | 440 |
441 | 442 | ## df 443 | 444 | ```bash 445 | $ df --version | head -n1 446 | df (GNU coreutils) 8.25 447 | 448 | $ man df 449 | DF(1) User Commands DF(1) 450 | 451 | NAME 452 | df - report file system disk space usage 453 | 454 | SYNOPSIS 455 | df [OPTION]... [FILE]... 456 | 457 | DESCRIPTION 458 | This manual page documents the GNU version of df. df displays the 459 | amount of disk space available on the file system containing each file 460 | name argument. If no file name is given, the space available on all 461 | currently mounted file systems is shown. 462 | ... 463 | ``` 464 | 465 |
466 | 467 | #### Examples 468 | 469 | ```bash 470 | $ # df without arguments reports all mounted file systems, a path argument restricts it 471 | $ df . 472 | Filesystem 1K-blocks Used Available Use% Mounted on 473 | /dev/sda1 98298500 58563816 34734748 63% / 474 | 475 | $ # use -h for human readable sizes 476 | $ # see also -B for custom scale size and --si for powers of 1000 instead of 1024 477 | $ df -h . 478 | Filesystem Size Used Avail Use% Mounted on 479 | /dev/sda1 94G 56G 34G 63% / 480 | ``` 481 | 482 | * Use `--output` to report only specific fields of interest 483 | 484 | ```bash 485 | $ df -h --output=size,used,file / /media/learnbyexample/projs 486 | Size Used File 487 | 94G 56G / 488 | 92G 35G /media/learnbyexample/projs 489 | 490 | $ df -h --output=pcent . 491 | Use% 492 | 63% 493 | 494 | $ df -h --output=pcent,fstype | awk -F'%' 'NR>1 && $1>=40' 495 | 63% ext3 496 | 40% ext4 497 | 51% ext4 498 | ``` 499 | 500 | <br/>
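Since `--output` takes a comma-separated field list, selecting a single field and stripping the header makes `df` easy to consume in scripts. A small sketch (the value printed will differ from system to system):

```bash
# available space (in 1K blocks by default) on the file system of the current directory
# tail -n1 drops the header line, tr removes padding spaces
avail=$(df --output=avail . | tail -n1 | tr -d ' ')
echo "$avail"
```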
501 | 502 | #### Further reading for df 503 | 504 | * `man df` and `info df` for more options and detailed documentation 505 | * [df Q&A on stackoverflow](https://stackoverflow.com/questions/tagged/df?sort=votes&pageSize=15) 506 | * [Parsing df command output with awk](https://unix.stackexchange.com/questions/360865/parsing-df-command-output-with-awk) 507 | * [processing df output](https://www.reddit.com/r/bash/comments/68dbml/using_an_array_variable_in_an_awk_command/) 508 | 509 |
510 | 511 | ## touch 512 | 513 | ```bash 514 | $ touch --version | head -n1 515 | touch (GNU coreutils) 8.25 516 | 517 | $ man touch 518 | TOUCH(1) User Commands TOUCH(1) 519 | 520 | NAME 521 | touch - change file timestamps 522 | 523 | SYNOPSIS 524 | touch [OPTION]... FILE... 525 | 526 | DESCRIPTION 527 | Update the access and modification times of each FILE to the current 528 | time. 529 | 530 | A FILE argument that does not exist is created empty, unless -c or -h 531 | is supplied. 532 | ... 533 | ``` 534 | 535 |
536 | 537 | #### Creating empty file 538 | 539 | ```bash 540 | $ ls foo.txt 541 | ls: cannot access 'foo.txt': No such file or directory 542 | $ touch foo.txt 543 | $ ls foo.txt 544 | foo.txt 545 | 546 | $ # use -c if new file shouldn't be created 547 | $ rm foo.txt 548 | $ touch -c foo.txt 549 | $ ls foo.txt 550 | ls: cannot access 'foo.txt': No such file or directory 551 | ``` 552 | 553 |
554 | 555 | #### Updating timestamps 556 | 557 | * Updating both access and modification timestamp to current time 558 | 559 | ```bash 560 | $ # last access time 561 | $ stat -c %x fruits.txt 562 | 2017-07-19 17:06:01.523308599 +0530 563 | $ # last modification time 564 | $ stat -c %y fruits.txt 565 | 2017-07-13 13:54:03.576055933 +0530 566 | 567 | $ touch fruits.txt 568 | $ stat -c %x fruits.txt 569 | 2017-07-21 10:11:44.241921229 +0530 570 | $ stat -c %y fruits.txt 571 | 2017-07-21 10:11:44.241921229 +0530 572 | ``` 573 | 574 | * Updating only access or modification timestamp 575 | 576 | ```bash 577 | $ touch -a greeting.txt 578 | $ stat -c %x greeting.txt 579 | 2017-07-21 10:14:08.457268564 +0530 580 | $ stat -c %y greeting.txt 581 | 2017-07-13 13:54:26.004499660 +0530 582 | 583 | $ touch -m sample.txt 584 | $ stat -c %x sample.txt 585 | 2017-07-13 13:48:24.945450646 +0530 586 | $ stat -c %y sample.txt 587 | 2017-07-21 10:14:40.770006144 +0530 588 | ``` 589 | 590 | * Using timestamp from another file to update 591 | 592 | ```bash 593 | $ stat -c $'%x\n%y' power.log report.log 594 | 2017-07-19 10:48:03.978295434 +0530 595 | 2017-07-14 20:50:42.850887578 +0530 596 | 2017-06-24 13:00:31.773583923 +0530 597 | 2017-06-24 12:59:53.316751651 +0530 598 | 599 | $ # copy both access and modification timestamp from power.log to report.log 600 | $ touch -r power.log report.log 601 | $ stat -c $'%x\n%y' report.log 602 | 2017-07-19 10:48:03.978295434 +0530 603 | 2017-07-14 20:50:42.850887578 +0530 604 | 605 | $ # add -a or -m options to limit to only access or modification timestamp 606 | ``` 607 | 608 | * Using date string to update 609 | * See also `-t` option 610 | 611 | ```bash 612 | $ # add -a or -m as needed 613 | $ touch -d '2010-03-17 17:04:23' report.log 614 | $ stat -c $'%x\n%y' report.log 615 | 2010-03-17 17:04:23.000000000 +0530 616 | 2010-03-17 17:04:23.000000000 +0530 617 | ``` 618 | 619 |
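The `-t` option mentioned above is an alternative to `-d` that takes a fixed `[[CC]YY]MMDDhhmm[.ss]` timestamp instead of a free-form date string. A quick sketch on a temporary file:

```bash
f=$(mktemp)
# same timestamp as the -d example: 2010-03-17 17:04:23
touch -t 201003171704.23 "$f"
stat -c '%y' "$f"
```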
620 | 621 | #### Preserving timestamp 622 | 623 | * Text processing on files would update the timestamps 624 | 625 | ```bash 626 | $ stat -c $'%x\n%y' power.log 627 | 2017-07-21 11:11:42.862874240 +0530 628 | 2017-07-13 21:31:53.496323704 +0530 629 | 630 | $ sed -i 's/foo/bar/g' power.log 631 | $ stat -c $'%x\n%y' power.log 632 | 2017-07-21 11:12:20.303504336 +0530 633 | 2017-07-21 11:12:20.303504336 +0530 634 | ``` 635 | 636 | * `touch` can be used to restore timestamps after processing 637 | 638 | ```bash 639 | $ # first copy the timestamps using touch -r 640 | $ stat -c $'%x\n%y' story.txt 641 | 2017-06-24 13:00:31.773583923 +0530 642 | 2017-06-24 12:59:53.316751651 +0530 643 | $ # tmp.txt is temporary empty file 644 | $ touch -r story.txt tmp.txt 645 | $ stat -c $'%x\n%y' tmp.txt 646 | 2017-06-24 13:00:31.773583923 +0530 647 | 2017-06-24 12:59:53.316751651 +0530 648 | 649 | $ # after text processing, copy back the timestamps and remove temporary file 650 | $ sed -i 's/cat/dog/g' story.txt 651 | $ touch -r tmp.txt story.txt && rm tmp.txt 652 | $ stat -c $'%x\n%y' story.txt 653 | 2017-06-24 13:00:31.773583923 +0530 654 | 2017-06-24 12:59:53.316751651 +0530 655 | ``` 656 | 657 |
658 | 659 | #### Further reading for touch 660 | 661 | * `man touch` and `info touch` for more options and detailed documentation 662 | * [touch Q&A on unix stackexchange](https://unix.stackexchange.com/questions/tagged/touch?sort=votes&pageSize=15) 663 | 664 |
665 | 666 | ## file 667 | 668 | ```bash 669 | $ file --version | head -n1 670 | file-5.25 671 | 672 | $ man file 673 | FILE(1) BSD General Commands Manual FILE(1) 674 | 675 | NAME 676 | file — determine file type 677 | 678 | SYNOPSIS 679 | file [-bcEhiklLNnprsvzZ0] [--apple] [--extension] [--mime-encoding] 680 | [--mime-type] [-e testname] [-F separator] [-f namefile] 681 | [-m magicfiles] [-P name=value] file ... 682 | file -C [-m magicfiles] 683 | file [--help] 684 | 685 | DESCRIPTION 686 | This manual page documents version 5.25 of the file command. 687 | 688 | file tests each argument in an attempt to classify it. There are three 689 | sets of tests, performed in this order: filesystem tests, magic tests, 690 | and language tests. The first test that succeeds causes the file type to 691 | be printed. 692 | ... 693 | ``` 694 | 695 |
696 | 697 |
698 | 699 | #### File type examples 700 | 701 | ```bash 702 | $ file sample.txt 703 | sample.txt: ASCII text 704 | $ # without file name in output 705 | $ file -b sample.txt 706 | ASCII text 707 | 708 | $ printf 'hi👍\n' | file - 709 | /dev/stdin: UTF-8 Unicode text 710 | $ printf 'hi👍\n' | file -i - 711 | /dev/stdin: text/plain; charset=utf-8 712 | 713 | $ file ch 714 | ch: Bourne-Again shell script, ASCII text executable 715 | 716 | $ file sunset.jpg moon.png 717 | sunset.jpg: JPEG image data 718 | moon.png: PNG image data, 32 x 32, 8-bit/color RGBA, non-interlaced 719 | ``` 720 | 721 | * different line terminators 722 | 723 | ```bash 724 | $ printf 'hi' | file - 725 | /dev/stdin: ASCII text, with no line terminators 726 | 727 | $ printf 'hi\r' | file - 728 | /dev/stdin: ASCII text, with CR line terminators 729 | 730 | $ printf 'hi\r\n' | file - 731 | /dev/stdin: ASCII text, with CRLF line terminators 732 | 733 | $ printf 'hi\n' | file - 734 | /dev/stdin: ASCII text 735 | ``` 736 | 737 | * find all files of a particular type in the current directory, for example `image` files 738 | 739 | ```bash 740 | $ find -type f -exec bash -c '(file -b "$0" | grep -wq "image data") && echo "$0"' {} \; 741 | ./sunset.jpg 742 | ./moon.png 743 | 744 | $ # if filenames do not contain : or newline characters 745 | $ find -type f -exec file {} + | awk -F: '/image data/{print $1}' 746 | ./sunset.jpg 747 | ./moon.png 748 | ``` 749 | 750 | <br/>
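For scripting, the `--mime-type` output (listed in the synopsis above) is more stable to match against than the free-form description shown by default. A sketch on a throwaway file (names made up for illustration):

```bash
dir=$(mktemp -d)
printf 'hello\n' > "$dir/notes.txt"

# -b to omit the filename, --mime-type for just type/subtype
file -b --mime-type "$dir/notes.txt"

# same find idea as above, but matching on the mime type prefix
find "$dir" -type f -exec bash -c 'file -b --mime-type "$0" | grep -q "^image/" && echo "$0"' {} \;
```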
751 | 752 | #### Further reading for file 753 | 754 | * `man file` and `info file` for more options and detailed documentation 755 | * See also `identify` command which `describes the format and characteristics of one or more image files` 756 | -------------------------------------------------------------------------------- /images/color_option.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/learnbyexample/Command-line-text-processing/ce56c851f078469736bbe51a6938c21cc934022e/images/color_option.png -------------------------------------------------------------------------------- /images/colordiff.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/learnbyexample/Command-line-text-processing/ce56c851f078469736bbe51a6938c21cc934022e/images/colordiff.png -------------------------------------------------------------------------------- /images/highlight_string_whole_file_op.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/learnbyexample/Command-line-text-processing/ce56c851f078469736bbe51a6938c21cc934022e/images/highlight_string_whole_file_op.png -------------------------------------------------------------------------------- /images/wdiff_to_colordiff.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/learnbyexample/Command-line-text-processing/ce56c851f078469736bbe51a6938c21cc934022e/images/wdiff_to_colordiff.png -------------------------------------------------------------------------------- /miscellaneous.md: -------------------------------------------------------------------------------- 1 | # Miscellaneous 2 | 3 | **Table of Contents** 4 | 5 | * [cut](#cut) 6 | * [select specific fields](#select-specific-fields) 7 | * [suppressing lines without
delimiter](#suppressing-lines-without-delimiter) 8 | * [specifying delimiters](#specifying-delimiters) 9 | * [complement](#complement) 10 | * [select specific characters](#select-specific-characters) 11 | * [Further reading for cut](#further-reading-for-cut) 12 | * [tr](#tr) 13 | * [translation](#translation) 14 | * [escape sequences and character classes](#escape-sequences-and-character-classes) 15 | * [deletion](#deletion) 16 | * [squeeze](#squeeze) 17 | * [Further reading for tr](#further-reading-for-tr) 18 | * [basename](#basename) 19 | * [dirname](#dirname) 20 | * [xargs](#xargs) 21 | * [seq](#seq) 22 | * [integer sequences](#integer-sequences) 23 | * [specifying separator](#specifying-separator) 24 | * [floating point sequences](#floating-point-sequences) 25 | * [Further reading for seq](#further-reading-for-seq) 26 | 27 |
28 | 29 | ## cut 30 | 31 | ```bash 32 | $ cut --version | head -n1 33 | cut (GNU coreutils) 8.25 34 | 35 | $ man cut 36 | CUT(1) User Commands CUT(1) 37 | 38 | NAME 39 | cut - remove sections from each line of files 40 | 41 | SYNOPSIS 42 | cut OPTION... [FILE]... 43 | 44 | DESCRIPTION 45 | Print selected parts of lines from each FILE to standard output. 46 | 47 | With no FILE, or when FILE is -, read standard input. 48 | ... 49 | ``` 50 | 51 |
52 | 53 | #### select specific fields 54 | 55 | * The default delimiter is the **tab** character 56 | * `-f` option is used to print specific field(s) from each input line 57 | 58 | ```bash 59 | $ printf 'foo\tbar\t123\tbaz\n' 60 | foo bar 123 baz 61 | 62 | $ # single field 63 | $ printf 'foo\tbar\t123\tbaz\n' | cut -f2 64 | bar 65 | 66 | $ # multiple fields can be specified by using , 67 | $ printf 'foo\tbar\t123\tbaz\n' | cut -f2,4 68 | bar baz 69 | 70 | $ # output is always in ascending order of field numbers 71 | $ printf 'foo\tbar\t123\tbaz\n' | cut -f3,1 72 | foo 123 73 | 74 | $ # range can be specified using - 75 | $ printf 'foo\tbar\t123\tbaz\n' | cut -f1-3 76 | foo bar 123 77 | $ # if ending number is omitted, select till last field 78 | $ printf 'foo\tbar\t123\tbaz\n' | cut -f3- 79 | 123 baz 80 | ``` 81 | 82 | <br/>
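Since `cut` always emits fields in ascending order, `awk` can be used when the output order matters. A small sketch with tab as both input and output separator:

```bash
# print the 3rd field followed by the 1st, which cut cannot do
printf 'foo\tbar\t123\tbaz\n' | awk 'BEGIN{FS=OFS="\t"} {print $3, $1}'
# 123   foo
```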
83 | 84 | #### suppressing lines without delimiter 85 | 86 | ```bash 87 | $ cat marks.txt 88 | jan 2017 89 | foobar 12 45 23 90 | feb 2017 91 | foobar 18 38 19 92 | 93 | $ # by default lines without delimiter will be printed 94 | $ cut -f2- marks.txt 95 | jan 2017 96 | 12 45 23 97 | feb 2017 98 | 18 38 19 99 | 100 | $ # use -s option to suppress such lines 101 | $ cut -s -f2- marks.txt 102 | 12 45 23 103 | 18 38 19 104 | ``` 105 | 106 |
107 | 108 | #### specifying delimiters 109 | 110 | * use `-d` option to specify an input delimiter other than the default **tab** character 111 | * only a single character can be used; for multi-character or regex based delimiters, use `awk` or `perl` 112 | 113 | ```bash 114 | $ echo 'foo:bar:123:baz' | cut -d: -f3 115 | 123 116 | 117 | $ # by default output delimiter is same as input 118 | $ echo 'foo:bar:123:baz' | cut -d: -f1,4 119 | foo:baz 120 | 121 | $ # quote the delimiter character if it clashes with shell special characters 122 | $ echo 'one;two;three;four' | cut -d; -f3 123 | cut: option requires an argument -- 'd' 124 | Try 'cut --help' for more information. 125 | -f3: command not found 126 | $ echo 'one;two;three;four' | cut -d';' -f3 127 | three 128 | ``` 129 | 130 | * use `--output-delimiter` option to specify a different output delimiter 131 | * since this option accepts a string, more than one character can be specified 132 | * See also [using $ prefixed string](https://unix.stackexchange.com/questions/48106/what-does-it-mean-to-have-a-dollarsign-prefixed-string-in-a-script) 133 | 134 | ```bash 135 | $ printf 'foo\tbar\t123\tbaz\n' | cut --output-delimiter=: -f1-3 136 | foo:bar:123 137 | 138 | $ echo 'one;two;three;four' | cut -d';' --output-delimiter=' ' -f1,3- 139 | one three four 140 | 141 | $ # tested on bash, might differ with other shells 142 | $ echo 'one;two;three;four' | cut -d';' --output-delimiter=$'\t' -f1,3- 143 | one three four 144 | 145 | $ echo 'one;two;three;four' | cut -d';' --output-delimiter=' - ' -f1,3- 146 | one - three - four 147 | ``` 148 | 149 | <br/>
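For the multi-character/regex based delimiter case noted above, `awk`'s `-F` option accepts a regular expression. A minimal sketch:

```bash
# split on the two-character delimiter ::
echo 'foo::bar::123::baz' | awk -F'::' '{print $3}'
# 123

# any regular expression works, here one or more ; or : characters
echo 'foo;:bar:;123' | awk -F'[;:]+' '{print $2}'
# bar
```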
150 | 151 | #### complement 152 | 153 | ```bash 154 | $ echo 'one;two;three;four' | cut -d';' -f1,3- 155 | one;three;four 156 | 157 | $ # to print other than specified fields 158 | $ echo 'one;two;three;four' | cut -d';' --complement -f2 159 | one;three;four 160 | ``` 161 | 162 |
163 | 164 | #### select specific characters 165 | 166 | * similar to `-f` for field selection, use `-c` for character selection 167 | * See manual for what defines a character and differences between `-b` and `-c` 168 | 169 | ```bash 170 | $ echo 'foo:bar:123:baz' | cut -c4 171 | : 172 | 173 | $ printf 'foo\tbar\t123\tbaz\n' | cut -c1,4,7 174 | f r 175 | 176 | $ echo 'foo:bar:123:baz' | cut -c8- 177 | :123:baz 178 | 179 | $ echo 'foo:bar:123:baz' | cut --complement -c8- 180 | foo:bar 181 | 182 | $ echo 'foo:bar:123:baz' | cut -c1,6,7 --output-delimiter=' ' 183 | f a r 184 | 185 | $ echo 'abcdefghij' | cut --output-delimiter='-' -c1-3,4-7,8- 186 | abc-defg-hij 187 | 188 | $ cut -c1-3 marks.txt 189 | jan 190 | foo 191 | feb 192 | foo 193 | ``` 194 | 195 |
196 | 197 | #### Further reading for cut 198 | 199 | * `man cut` and `info cut` for more options and detailed documentation 200 | * [cut Q&A on unix stackexchange](https://unix.stackexchange.com/questions/tagged/cut?sort=votes&pageSize=15) 201 | 202 |
203 | 204 | ## tr 205 | 206 | ```bash 207 | $ tr --version | head -n1 208 | tr (GNU coreutils) 8.25 209 | 210 | $ man tr 211 | TR(1) User Commands TR(1) 212 | 213 | NAME 214 | tr - translate or delete characters 215 | 216 | SYNOPSIS 217 | tr [OPTION]... SET1 [SET2] 218 | 219 | DESCRIPTION 220 | Translate, squeeze, and/or delete characters from standard input, writ‐ 221 | ing to standard output. 222 | ... 223 | ``` 224 | 225 |
226 | 227 | #### translation 228 | 229 | * one-to-one mapping of characters, all occurrences are translated 230 | * as good practice, enclose the arguments in single quotes to avoid issues due to shell interpretation 231 | 232 | ```bash 233 | $ echo 'foo bar cat baz' | tr 'abc' '123' 234 | foo 21r 31t 21z 235 | 236 | $ # use - to represent a range in ascending order 237 | $ echo 'foo bar cat baz' | tr 'a-f' '1-6' 238 | 6oo 21r 31t 21z 239 | 240 | $ # changing case 241 | $ echo 'foo bar cat baz' | tr 'a-z' 'A-Z' 242 | FOO BAR CAT BAZ 243 | $ echo 'Hello World' | tr 'a-zA-Z' 'A-Za-z' 244 | hELLO wORLD 245 | 246 | $ echo 'foo;bar;baz' | tr ; : 247 | tr: missing operand 248 | Try 'tr --help' for more information. 249 | $ echo 'foo;bar;baz' | tr ';' ':' 250 | foo:bar:baz 251 | ``` 252 | 253 | * rot13 example 254 | 255 | ```bash 256 | $ echo 'foo bar cat baz' | tr 'a-z' 'n-za-m' 257 | sbb one png onm 258 | $ echo 'sbb one png onm' | tr 'a-z' 'n-za-m' 259 | foo bar cat baz 260 | 261 | $ echo 'Hello World' | tr 'a-zA-Z' 'n-za-mN-ZA-M' 262 | Uryyb Jbeyq 263 | $ echo 'Uryyb Jbeyq' | tr 'a-zA-Z' 'n-za-mN-ZA-M' 264 | Hello World 265 | ``` 266 | 267 | * use shell input redirection for file input 268 | 269 | ```bash 270 | $ cat marks.txt 271 | jan 2017 272 | foobar 12 45 23 273 | feb 2017 274 | foobar 18 38 19 275 | 276 | $ tr 'a-z' 'A-Z' < marks.txt 277 | JAN 2017 278 | FOOBAR 12 45 23 279 | FEB 2017 280 | FOOBAR 18 38 19 281 | ``` 282 | 283 | * if arguments are of different lengths 284 | 285 | ```bash 286 | $ # when second argument is longer, the extra characters are ignored 287 | $ echo 'foo bar cat baz' | tr 'abc' '1-9' 288 | foo 21r 31t 21z 289 | 290 | $ # when first argument is longer 291 | $ # the last character of second argument gets re-used 292 | $ echo 'foo bar cat baz' | tr 'a-z' '123' 293 | 333 213 313 213 294 | 295 | $ # use -t option to truncate first argument to same length as second 296 | $ echo 'foo bar cat baz' | tr -t 'a-z' '123' 297 | foo 21r 31t 21z 298 | 
``` 299 | 300 |
301 | 302 | #### escape sequences and character classes 303 | 304 | * Certain characters like newline, tab, etc can be represented using escape sequences or octal representation 305 | * Certain commonly useful groups of characters like alphabets, digits, punctuations etc have character class as shortcuts 306 | * See [gnu tr manual](http://www.gnu.org/software/coreutils/manual/html_node/Character-sets.html#Character-sets) for all escape sequences and character classes 307 | 308 | ```bash 309 | $ printf 'foo\tbar\t123\tbaz\n' | tr '\t' ':' 310 | foo:bar:123:baz 311 | 312 | $ echo 'foo:bar:123:baz' | tr ':' '\n' 313 | foo 314 | bar 315 | 123 316 | baz 317 | $ # makes it easier to transform 318 | $ echo 'foo:bar:123:baz' | tr ':' '\n' | pr -2ats'-' 319 | foo-bar 320 | 123-baz 321 | 322 | $ echo 'foo bar cat baz' | tr '[:lower:]' '[:upper:]' 323 | FOO BAR CAT BAZ 324 | ``` 325 | 326 | * since `-` is used for character ranges, place it at the end to represent it literally 327 | * cannot be used at start of argument as it would get treated as option 328 | * or use `--` to indicate end of option processing 329 | * similarly, to represent `\` literally, use `\\` 330 | 331 | ```bash 332 | $ echo '/foo-bar/baz/report' | tr '-a-z' '_A-Z' 333 | tr: invalid option -- 'a' 334 | Try 'tr --help' for more information. 335 | 336 | $ echo '/foo-bar/baz/report' | tr 'a-z-' 'A-Z_' 337 | /FOO_BAR/BAZ/REPORT 338 | 339 | $ echo '/foo-bar/baz/report' | tr -- '-a-z' '_A-Z' 340 | /FOO_BAR/BAZ/REPORT 341 | 342 | $ echo '/foo-bar/baz/report' | tr '/-' '\\_' 343 | \foo_bar\baz\report 344 | ``` 345 | 346 |
347 | 348 | #### deletion 349 | 350 | * use `-d` option to specify characters to be deleted 351 | * add complement option `-c` if it is easier to define which characters are to be retained 352 | 353 | ```bash 354 | $ echo '2017-03-21' | tr -d '-' 355 | 20170321 356 | 357 | $ echo 'Hi123 there. How a32re you' | tr -d '1-9' 358 | Hi there. How are you 359 | 360 | $ # delete all punctuation characters 361 | $ echo '"Foo1!", "Bar.", ":Baz:"' | tr -d '[:punct:]' 362 | Foo1 Bar Baz 363 | 364 | $ # deleting carriage return character 365 | $ cat -v greeting.txt 366 | Hi there^M 367 | How are you^M 368 | $ tr -d '\r' < greeting.txt | cat -v 369 | Hi there 370 | How are you 371 | 372 | $ # retain only alphabets, comma and newline characters 373 | $ echo '"Foo1!", "Bar.", ":Baz:"' | tr -cd '[:alpha:],\n' 374 | Foo,Bar,Baz 375 | ``` 376 | 377 |
378 | 379 | #### squeeze 380 | 381 | * to change consecutive repeated characters to a single copy of that character 382 | 383 | ```bash 384 | $ # only lower case alphabets 385 | $ echo 'FFoo seed 11233' | tr -s 'a-z' 386 | FFo sed 11233 387 | 388 | $ # alphabets and digits 389 | $ echo 'FFoo seed 11233' | tr -s '[:alnum:]' 390 | Fo sed 123 391 | 392 | $ # squeeze other than alphabets 393 | $ echo 'FFoo seed 11233' | tr -sc '[:alpha:]' 394 | FFoo seed 123 395 | 396 | $ # only characters present in the second argument are used for squeeze 397 | $ echo 'FFoo seed 11233' | tr -s 'A-Z' 'a-z' 398 | fo sed 11233 399 | 400 | $ # multiple consecutive horizontal spaces to single space 401 | $ printf 'foo\t\tbar \t123 baz\n' 402 | foo bar 123 baz 403 | $ printf 'foo\t\tbar \t123 baz\n' | tr -s '[:blank:]' ' ' 404 | foo bar 123 baz 405 | ``` 406 | 407 | <br/>
408 | 409 | #### Further reading for tr 410 | 411 | * `man tr` and `info tr` for more options and detailed documentation 412 | * [tr Q&A on unix stackexchange](http://unix.stackexchange.com/questions/tagged/tr?sort=votes&pageSize=15) 413 | 414 |
415 | 416 | ## basename 417 | 418 | ```bash 419 | $ basename --version | head -n1 420 | basename (GNU coreutils) 8.25 421 | 422 | $ man basename 423 | BASENAME(1) User Commands BASENAME(1) 424 | 425 | NAME 426 | basename - strip directory and suffix from filenames 427 | 428 | SYNOPSIS 429 | basename NAME [SUFFIX] 430 | basename OPTION... NAME... 431 | 432 | DESCRIPTION 433 | Print NAME with any leading directory components removed. If speci‐ 434 | fied, also remove a trailing SUFFIX. 435 | ... 436 | ``` 437 | 438 |
439 | 440 | **Examples** 441 | 442 | ```bash 443 | $ # same as using pwd command 444 | $ echo "$PWD" 445 | /home/learnbyexample 446 | 447 | $ basename "$PWD" 448 | learnbyexample 449 | 450 | $ # use -a option if there are multiple arguments 451 | $ basename -a foo/a/report.log bar/y/power.log 452 | report.log 453 | power.log 454 | 455 | $ # use single quotes if arguments contain space and other special shell characters 456 | $ # use suffix option -s to strip file extension from filename 457 | $ basename -s '.log' '/home/learnbyexample/proj adder/power.log' 458 | power 459 | $ # -a is implied when using -s option 460 | $ basename -s'.log' foo/a/report.log bar/y/power.log 461 | report 462 | power 463 | ``` 464 | 465 | * Can also use [Parameter expansion](http://mywiki.wooledge.org/BashFAQ/073) if working on file paths saved in variables 466 | * assumes `bash` shell and similar that support this feature 467 | 468 | ```bash 469 | $ # remove from start of string up to last / 470 | $ file='/home/learnbyexample/proj adder/power.log' 471 | $ basename "$file" 472 | power.log 473 | $ echo "${file##*/}" 474 | power.log 475 | 476 | $ t="${file##*/}" 477 | $ # remove .log from end of string 478 | $ echo "${t%.log}" 479 | power 480 | ``` 481 | 482 | * See `man basename` and `info basename` for detailed documentation 483 | 484 |
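A typical scripting use of `basename` is stripping a known extension inside a loop. A sketch on throwaway files (names made up for illustration):

```bash
dir=$(mktemp -d)
touch "$dir/power.log" "$dir/report.log"

for f in "$dir"/*.log; do
    # directory and the .log suffix are stripped, leaving just the stem
    basename -s .log "$f"
done
# power
# report
```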
485 | 486 | ## dirname 487 | 488 | ```bash 489 | $ dirname --version | head -n1 490 | dirname (GNU coreutils) 8.25 491 | 492 | $ man dirname 493 | DIRNAME(1) User Commands DIRNAME(1) 494 | 495 | NAME 496 | dirname - strip last component from file name 497 | 498 | SYNOPSIS 499 | dirname [OPTION] NAME... 500 | 501 | DESCRIPTION 502 | Output each NAME with its last non-slash component and trailing slashes 503 | removed; if NAME contains no /'s, output '.' (meaning the current 504 | directory). 505 | ... 506 | ``` 507 | 508 |
509 | 510 | **Examples** 511 | 512 | ```bash 513 | $ echo "$PWD" 514 | /home/learnbyexample 515 | 516 | $ dirname "$PWD" 517 | /home 518 | 519 | $ # use single quotes if arguments contain space and other special shell characters 520 | $ dirname '/home/learnbyexample/proj adder/power.log' 521 | /home/learnbyexample/proj adder 522 | 523 | $ # unlike basename, by default dirname handles multiple arguments 524 | $ dirname foo/a/report.log bar/y/power.log 525 | foo/a 526 | bar/y 527 | 528 | $ # if no / in argument, output is . to indicate current directory 529 | $ dirname power.log 530 | . 531 | ``` 532 | 533 | * Use `$()` command substitution to further process output as needed 534 | 535 | ```bash 536 | $ dirname '/home/learnbyexample/proj adder/power.log' 537 | /home/learnbyexample/proj adder 538 | 539 | $ dirname "$(dirname '/home/learnbyexample/proj adder/power.log')" 540 | /home/learnbyexample 541 | 542 | $ basename "$(dirname '/home/learnbyexample/proj adder/power.log')" 543 | proj adder 544 | ``` 545 | 546 | * Can also use [Parameter expansion](http://mywiki.wooledge.org/BashFAQ/073) if working on file paths saved in variables 547 | * assumes `bash` shell and similar that support this feature 548 | 549 | ```bash 550 | $ # remove from last / in the string to end of string 551 | $ file='/home/learnbyexample/proj adder/power.log' 552 | $ dirname "$file" 553 | /home/learnbyexample/proj adder 554 | $ echo "${file%/*}" 555 | /home/learnbyexample/proj adder 556 | 557 | $ # remove from second last / to end of string 558 | $ echo "${file%/*/*}" 559 | /home/learnbyexample 560 | 561 | $ # apply basename trick to get just directory name instead of full path 562 | $ t="${file%/*}" 563 | $ echo "${t##*/}" 564 | proj adder 565 | ``` 566 | 567 | * See `man dirname` and `info dirname` for detailed documentation 568 | 569 |
570 | 571 | ## xargs 572 | 573 | ```bash 574 | $ xargs --version | head -n1 575 | xargs (GNU findutils) 4.7.0-git 576 | 577 | $ whatis xargs 578 | xargs (1) - build and execute command lines from standard input 579 | 580 | $ # from 'man xargs' 581 | This manual page documents the GNU version of xargs. xargs reads items 582 | from the standard input, delimited by blanks (which can be protected 583 | with double or single quotes or a backslash) or newlines, and executes 584 | the command (default is /bin/echo) one or more times with any initial- 585 | arguments followed by items read from standard input. Blank lines on 586 | the standard input are ignored. 587 | ``` 588 | 589 | While `xargs` is [primarily used](https://unix.stackexchange.com/questions/24954/when-is-xargs-needed) for passing the output of a command or the contents of a file to another command as input arguments, and for parallel processing, it can be quite handy for certain text processing tasks with the default `echo` command 590 | 591 | ```bash 592 | $ printf ' foo\t\tbar \t123 baz \n' | cat -e 593 | foo bar 123 baz $ 594 | $ # tr helps to change consecutive blanks to single space 595 | $ # but what if blanks at start and end have to be removed as well?
596 | $ printf ' foo\t\tbar \t123 baz \n' | tr -s '[:blank:]' ' ' | cat -e 597 | foo bar 123 baz $ 598 | $ # xargs does this by default 599 | $ printf ' foo\t\tbar \t123 baz \n' | xargs | cat -e 600 | foo bar 123 baz$ 601 | 602 | $ # -n option limits number of arguments per line 603 | $ printf ' foo\t\tbar \t123 baz \n' | xargs -n2 604 | foo bar 605 | 123 baz 606 | 607 | $ # same as using: paste -d' ' - - - 608 | $ # or: pr -3ats' ' 609 | $ seq 6 | xargs -n3 610 | 1 2 3 611 | 4 5 6 612 | ``` 613 | 614 | * use `-a` option to specify file input instead of stdin 615 | 616 | ```bash 617 | $ cat marks.txt 618 | jan 2017 619 | foobar 12 45 23 620 | feb 2017 621 | foobar 18 38 19 622 | 623 | $ xargs -a marks.txt 624 | jan 2017 foobar 12 45 23 feb 2017 foobar 18 38 19 625 | 626 | $ # use -L option to limit max number of lines per command line 627 | $ xargs -L2 -a marks.txt 628 | jan 2017 foobar 12 45 23 629 | feb 2017 foobar 18 38 19 630 | ``` 631 | 632 | * **Note** since `echo` is the command being executed, it can cause issues with option interpretation (`-e` below is consumed as an option instead of being printed) 633 | 634 | ```bash 635 | $ printf ' -e foo\t\tbar \t123 baz \n' | xargs -n2 636 | foo 637 | bar 123 638 | baz 639 | 640 | $ # use -t option to see what is happening (verbose output) 641 | $ printf ' -e foo\t\tbar \t123 baz \n' | xargs -n2 -t 642 | echo -e foo 643 | foo 644 | echo bar 123 645 | bar 123 646 | echo baz 647 | baz 648 | ``` 649 | 650 | * See `man xargs` and `info xargs` for detailed documentation 651 | 652 | <br/>
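One way around the `echo` option-interpretation issue noted above is to give `xargs` an explicit command instead of relying on the default; `printf`, for example, treats the items purely as data to substitute. A sketch:

```bash
# '-e' is substituted into %s instead of being parsed as an option
printf ' -e foo\t\tbar \t123 baz \n' | xargs -n2 printf '%s %s\n'
```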
653 | 654 | ## seq 655 | 656 | ```bash 657 | $ seq --version | head -n1 658 | seq (GNU coreutils) 8.25 659 | 660 | $ man seq 661 | SEQ(1) User Commands SEQ(1) 662 | 663 | NAME 664 | seq - print a sequence of numbers 665 | 666 | SYNOPSIS 667 | seq [OPTION]... LAST 668 | seq [OPTION]... FIRST LAST 669 | seq [OPTION]... FIRST INCREMENT LAST 670 | 671 | DESCRIPTION 672 | Print numbers from FIRST to LAST, in steps of INCREMENT. 673 | ... 674 | ``` 675 | 676 |
677 | 678 | #### integer sequences 679 | 680 | * see `info seq` for details of how large numbers are handled 681 | * for ex: `seq 50000000000000000000 2 50000000000000000004` may not work 682 | 683 | ```bash 684 | $ # default start=1 and increment=1 685 | $ seq 3 686 | 1 687 | 2 688 | 3 689 | 690 | $ # default increment=1 691 | $ seq 25434 25437 692 | 25434 693 | 25435 694 | 25436 695 | 25437 696 | $ seq -5 -3 697 | -5 698 | -4 699 | -3 700 | 701 | $ # different increment value 702 | $ seq 1000 5 1011 703 | 1000 704 | 1005 705 | 1010 706 | 707 | $ # use negative increment for descending order 708 | $ seq 10 -5 -7 709 | 10 710 | 5 711 | 0 712 | -5 713 | ``` 714 | 715 | * use `-w` option for leading zeros 716 | * largest length of start/end value is used to determine padding 717 | 718 | ```bash 719 | $ seq 008 010 720 | 8 721 | 9 722 | 10 723 | 724 | $ # or: seq -w 8 010 725 | $ seq -w 008 010 726 | 008 727 | 009 728 | 010 729 | 730 | $ seq -w 0003 731 | 0001 732 | 0002 733 | 0003 734 | ``` 735 | 736 |
737 | 738 | #### specifying separator 739 | 740 | * As seen already, the default separator between numbers is the newline character 741 | * The `-s` option lets you specify a custom string between numbers 742 | * A newline is always added at the end 743 | 744 | ```bash 745 | $ seq -s: 4 746 | 1:2:3:4 747 | 748 | $ seq -s' ' 4 749 | 1 2 3 4 750 | 751 | $ seq -s' - ' 4 752 | 1 - 2 - 3 - 4 753 | ``` 754 | 755 |

756 | 757 | #### floating point sequences 758 | 759 | ```bash 760 | $ # default increment=1 761 | $ seq 0.5 2.5 762 | 0.5 763 | 1.5 764 | 2.5 765 | 766 | $ seq -s':' -2 0.75 3 767 | -2.00:-1.25:-0.50:0.25:1.00:1.75:2.50 768 | 769 | $ # Scientific notation is supported 770 | $ seq 1.2e2 1.22e2 771 | 120 772 | 121 773 | 122 774 | ``` 775 | 776 | * formatting numbers, see `info seq` for details 777 | 778 | ```bash 779 | $ seq -f'%.3f' -s':' -2 0.75 3 780 | -2.000:-1.250:-0.500:0.250:1.000:1.750:2.500 781 | 782 | $ seq -f'%.3e' 1.2e2 1.22e2 783 | 1.200e+02 784 | 1.210e+02 785 | 1.220e+02 786 | ``` 787 | 788 |
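* the format string isn't restricted to a bare number — text around the single `%` directive is passed through for every value (the `point_` prefix here is illustrative):

```bash
# text around the % directive is emitted as-is for each number
seq -f 'point_%.1f' 0.5 0.5 1.5
# point_0.5
# point_1.0
# point_1.5
```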
789 | 790 | #### Further reading for seq 791 | 792 | * `man seq` and `info seq` for more options, corner cases and detailed documentation 793 | * [seq Q&A on unix stackexchange](https://unix.stackexchange.com/questions/tagged/seq?sort=votes&pageSize=15) 794 | -------------------------------------------------------------------------------- /overview_presentation/baz.json: -------------------------------------------------------------------------------- 1 | { 2 | "abc": { 3 | "@attr": "good", 4 | "text": "Hi there" 5 | }, 6 | "xyz": { 7 | "@attr": "bad", 8 | "text": "I am good. How are you?" 9 | } 10 | } 11 | -------------------------------------------------------------------------------- /overview_presentation/cli_text_processing.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/learnbyexample/Command-line-text-processing/ce56c851f078469736bbe51a6938c21cc934022e/overview_presentation/cli_text_processing.pdf -------------------------------------------------------------------------------- /overview_presentation/foo.xml: -------------------------------------------------------------------------------- 1 | 2 | Hi there 3 | I am good. How are you? 4 | 5 | -------------------------------------------------------------------------------- /overview_presentation/greeting.txt: -------------------------------------------------------------------------------- 1 | Hi there 2 | Have a nice day 3 | -------------------------------------------------------------------------------- /overview_presentation/sample.txt: -------------------------------------------------------------------------------- 1 | Hello World! 2 | 3 | Good day 4 | How do you do? 5 | 6 | Just do it 7 | Believe 42 it! 
8 | 9 | Today is sunny 10 | Not a bit funny 11 | No doubt you like it too 12 | 13 | Much ado about nothing 14 | He he 123 he he 15 | -------------------------------------------------------------------------------- /restructure_text.md: -------------------------------------------------------------------------------- 1 | # Restructure text 2 | 3 | **Table of Contents** 4 | 5 | * [paste](#paste) 6 | * [Concatenating files column wise](#concatenating-files-column-wise) 7 | * [Interleaving lines](#interleaving-lines) 8 | * [Lines to multiple columns](#lines-to-multiple-columns) 9 | * [Different delimiters between columns](#different-delimiters-between-columns) 10 | * [Multiple lines to single row](#multiple-lines-to-single-row) 11 | * [Further reading for paste](#further-reading-for-paste) 12 | * [column](#column) 13 | * [Pretty printing tables](#pretty-printing-tables) 14 | * [Specifying different input delimiter](#specifying-different-input-delimiter) 15 | * [Further reading for column](#further-reading-for-column) 16 | * [pr](#pr) 17 | * [Converting lines to columns](#converting-lines-to-columns) 18 | * [Changing PAGE_WIDTH](#changing-page_width) 19 | * [Combining multiple input files](#combining-multiple-input-files) 20 | * [Transposing a table](#transposing-a-table) 21 | * [Further reading for pr](#further-reading-for-pr) 22 | * [fold](#fold) 23 | * [Examples](#examples) 24 | * [Further reading for fold](#further-reading-for-fold) 25 | 26 |
27 | 28 | ## paste 29 | 30 | ```bash 31 | $ paste --version | head -n1 32 | paste (GNU coreutils) 8.25 33 | 34 | $ man paste 35 | PASTE(1) User Commands PASTE(1) 36 | 37 | NAME 38 | paste - merge lines of files 39 | 40 | SYNOPSIS 41 | paste [OPTION]... [FILE]... 42 | 43 | DESCRIPTION 44 | Write lines consisting of the sequentially corresponding lines from 45 | each FILE, separated by TABs, to standard output. 46 | 47 | With no FILE, or when FILE is -, read standard input. 48 | ... 49 | ``` 50 | 51 |
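* to follow along, the sample files used in this section can be recreated as below (contents inferred from the outputs shown in later examples):

```bash
# recreate colors_1.txt and colors_2.txt used throughout this section
printf 'Blue\nBrown\nPurple\nRed\nTeal\n' > colors_1.txt
printf 'Black\nBlue\nGreen\nRed\nWhite\n' > colors_2.txt
```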
52 | 53 | #### Concatenating files column wise 54 | 55 | * By default, `paste` adds a TAB between corresponding lines of input files 56 | 57 | ```bash 58 | $ paste colors_1.txt colors_2.txt 59 | Blue Black 60 | Brown Blue 61 | Purple Green 62 | Red Red 63 | Teal White 64 | ``` 65 | 66 | * Specifying a different delimiter using `-d` 67 | * The `<()` syntax is [Process Substitution](http://mywiki.wooledge.org/ProcessSubstitution) 68 | * to put it simply - allows output of command to be passed as input file to another command without needing to manually create a temporary file 69 | 70 | ```bash 71 | $ paste -d, <(seq 5) <(seq 6 10) 72 | 1,6 73 | 2,7 74 | 3,8 75 | 4,9 76 | 5,10 77 | 78 | $ # empty cells if number of lines is not same for all input files 79 | $ # -d\| can also be used 80 | $ paste -d'|' <(seq 3) <(seq 4 6) <(seq 7 10) 81 | 1|4|7 82 | 2|5|8 83 | 3|6|9 84 | ||10 85 | ``` 86 | 87 | * to paste without any character in between, use `\0` as delimiter 88 | * note that `\0` here doesn't mean the ASCII NUL character 89 | * can also use `-d ''` with `GNU paste` 90 | 91 | ```bash 92 | $ paste -d'\0' <(seq 3) <(seq 6 8) 93 | 16 94 | 27 95 | 38 96 | ``` 97 | 98 |
99 | 100 | #### Interleaving lines 101 | 102 | * Interleave lines by using newline as delimiter 103 | 104 | ```bash 105 | $ paste -d'\n' <(seq 11 13) <(seq 101 103) 106 | 11 107 | 101 108 | 12 109 | 102 110 | 13 111 | 103 112 | ``` 113 | 114 |
115 | 116 | #### Lines to multiple columns 117 | 118 | * Number of `-` specified determines number of output columns 119 | * Input lines can be passed only as stdin 120 | 121 | ```bash 122 | $ # single column to two columns 123 | $ seq 10 | paste -d, - - 124 | 1,2 125 | 3,4 126 | 5,6 127 | 7,8 128 | 9,10 129 | 130 | $ # single column to five columns 131 | $ seq 10 | paste -d: - - - - - 132 | 1:2:3:4:5 133 | 6:7:8:9:10 134 | 135 | $ # input redirection for file input 136 | $ paste -d, - - < colors_1.txt 137 | Blue,Brown 138 | Purple,Red 139 | Teal, 140 | ``` 141 | 142 | * Use `printf` trick if number of columns to specify is too large 143 | 144 | ```bash 145 | $ # prompt at end of line not shown for simplicity 146 | $ printf -- "- %.s" {1..5} 147 | - - - - - 148 | 149 | $ seq 10 | paste -d, $(printf -- "- %.s" {1..5}) 150 | 1,2,3,4,5 151 | 6,7,8,9,10 152 | ``` 153 | 154 |
155 | 156 | #### Different delimiters between columns 157 | 158 | * For more than 2 columns, different delimiter character can be specified - passed as list to `-d` option 159 | 160 | ```bash 161 | $ # , is used between 1st and 2nd column 162 | $ # - is used between 2nd and 3rd column 163 | $ paste -d',-' <(seq 3) <(seq 4 6) <(seq 7 9) 164 | 1,4-7 165 | 2,5-8 166 | 3,6-9 167 | 168 | $ # re-use list from beginning if not specified for all columns 169 | $ paste -d',-' <(seq 3) <(seq 4 6) <(seq 7 9) <(seq 10 12) 170 | 1,4-7,10 171 | 2,5-8,11 172 | 3,6-9,12 173 | $ # another example 174 | $ seq 10 | paste -d':,' - - - - - 175 | 1:2,3:4,5 176 | 6:7,8:9,10 177 | 178 | $ # so, with single delimiter, it is just re-used for all columns 179 | $ paste -d, <(seq 3) <(seq 4 6) <(seq 7 9) <(seq 10 12) 180 | 1,4,7,10 181 | 2,5,8,11 182 | 3,6,9,12 183 | ``` 184 | 185 | * combination of `-d` and `/dev/null` (empty file) can give multi-character separation between columns 186 | * If this is too confusing to use, consider [pr](#pr) instead 187 | 188 | ```bash 189 | $ paste -d' : ' <(seq 3) /dev/null /dev/null <(seq 4 6) /dev/null /dev/null <(seq 7 9) 190 | 1 : 4 : 7 191 | 2 : 5 : 8 192 | 3 : 6 : 9 193 | 194 | $ # or just use pr instead 195 | $ pr -mts' : ' <(seq 3) <(seq 4 6) <(seq 7 9) 196 | 1 : 4 : 7 197 | 2 : 5 : 8 198 | 3 : 6 : 9 199 | 200 | $ # but paste would allow different delimiters ;) 201 | $ paste -d' : - ' <(seq 3) /dev/null /dev/null <(seq 4 6) /dev/null /dev/null <(seq 7 9) 202 | 1 : 4 - 7 203 | 2 : 5 - 8 204 | 3 : 6 - 9 205 | 206 | $ # pr would need two invocations 207 | $ pr -mts' : ' <(seq 3) <(seq 4 6) | pr -mts' - ' - <(seq 7 9) 208 | 1 : 4 - 7 209 | 2 : 5 - 8 210 | 3 : 6 - 9 211 | ``` 212 | 213 | * example to show using empty file instead of `/dev/null` 214 | 215 | ```bash 216 | $ # assuming file named e doesn't exist 217 | $ touch e 218 | $ # or use this, will empty contents even if file named e already exists :P 219 | $ > e 220 | 221 | $ paste -d' : - ' <(seq 3) 
e e <(seq 4 6) e e <(seq 7 9) 222 | 1 : 4 - 7 223 | 2 : 5 - 8 224 | 3 : 6 - 9 225 | ``` 226 | 227 |
228 | 229 | #### Multiple lines to single row 230 | 231 | ```bash 232 | $ paste -sd, colors_1.txt 233 | Blue,Brown,Purple,Red,Teal 234 | 235 | $ # multiple files each gets a row 236 | $ paste -sd: colors_1.txt colors_2.txt 237 | Blue:Brown:Purple:Red:Teal 238 | Black:Blue:Green:Red:White 239 | 240 | $ # multiple input files need not have same number of lines 241 | $ paste -sd, <(seq 3) <(seq 5 9) 242 | 1,2,3 243 | 5,6,7,8,9 244 | ``` 245 | 246 | * Often used to serialize multiple line output from another command 247 | 248 | ```bash 249 | $ sort -u colors_1.txt colors_2.txt | paste -sd, 250 | Black,Blue,Brown,Green,Purple,Red,Teal,White 251 | ``` 252 | 253 | * For multiple character delimiter, post-process if separator is unique or use another tool like `perl` 254 | 255 | ```bash 256 | $ seq 10 | paste -sd, 257 | 1,2,3,4,5,6,7,8,9,10 258 | 259 | $ # post-process 260 | $ seq 10 | paste -sd, | sed 's/,/ : /g' 261 | 1 : 2 : 3 : 4 : 5 : 6 : 7 : 8 : 9 : 10 262 | 263 | $ # using perl alone 264 | $ seq 10 | perl -pe 's/\n/ : / if(!eof)' 265 | 1 : 2 : 3 : 4 : 5 : 6 : 7 : 8 : 9 : 10 266 | ``` 267 | 268 |
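* an `awk` alternative for the multi-character join, in case `perl` isn't at hand (a sketch, not the only way):

```bash
# print the separator before every line except the first,
# then add the final newline in the END block
seq 5 | awk '{printf "%s%s", (NR>1 ? " : " : ""), $0} END{print ""}'
# 1 : 2 : 3 : 4 : 5
```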
269 | 270 | #### Further reading for paste 271 | 272 | * `man paste` and `info paste` for more options and detailed documentation 273 | * [paste Q&A on unix stackexchange](https://unix.stackexchange.com/questions/tagged/paste?sort=votes&pageSize=15) 274 | 275 |
276 | 277 | ## column 278 | 279 | ```bash 280 | COLUMN(1) BSD General Commands Manual COLUMN(1) 281 | 282 | NAME 283 | column — columnate lists 284 | 285 | SYNOPSIS 286 | column [-entx] [-c columns] [-s sep] [file ...] 287 | 288 | DESCRIPTION 289 | The column utility formats its input into multiple columns. Rows are 290 | filled before columns. Input is taken from file operands, or, by 291 | default, from the standard input. Empty lines are ignored unless the -e 292 | option is used. 293 | ... 294 | ``` 295 | 296 |
297 | 298 | #### Pretty printing tables 299 | 300 | * by default whitespace is input delimiter 301 | 302 | ```bash 303 | $ cat dishes.txt 304 | North alootikki baati khichdi makkiroti poha 305 | South appam bisibelebath dosa koottu sevai 306 | West dhokla khakhra modak shiro vadapav 307 | East handoguri litti momo rosgulla shondesh 308 | 309 | $ column -t dishes.txt 310 | North alootikki baati khichdi makkiroti poha 311 | South appam bisibelebath dosa koottu sevai 312 | West dhokla khakhra modak shiro vadapav 313 | East handoguri litti momo rosgulla shondesh 314 | ``` 315 | 316 | * often useful to get neatly aligned columns from output of another command 317 | 318 | ```bash 319 | $ paste fruits.txt price.txt 320 | Fruits Price 321 | apple 182 322 | guava 90 323 | watermelon 35 324 | banana 72 325 | pomegranate 280 326 | 327 | $ paste fruits.txt price.txt | column -t 328 | Fruits Price 329 | apple 182 330 | guava 90 331 | watermelon 35 332 | banana 72 333 | pomegranate 280 334 | ``` 335 | 336 |
337 | 338 | #### Specifying different input delimiter 339 | 340 | * Use `-s` to specify input delimiter 341 | * Use `-n` to prevent merging empty cells 342 | * From `man column` "This option is a Debian GNU/Linux extension" 343 | 344 | ```bash 345 | $ paste -d, <(seq 3) <(seq 5 9) <(seq 11 13) 346 | 1,5,11 347 | 2,6,12 348 | 3,7,13 349 | ,8, 350 | ,9, 351 | 352 | $ paste -d, <(seq 3) <(seq 5 9) <(seq 11 13) | column -s, -t 353 | 1 5 11 354 | 2 6 12 355 | 3 7 13 356 | 8 357 | 9 358 | 359 | $ paste -d, <(seq 3) <(seq 5 9) <(seq 11 13) | column -s, -nt 360 | 1 5 11 361 | 2 6 12 362 | 3 7 13 363 | 8 364 | 9 365 | ``` 366 | 367 |
368 | 369 | #### Further reading for column 370 | 371 | * `man column` for more options and detailed documentation 372 | * [column Q&A on unix stackexchange](https://unix.stackexchange.com/questions/tagged/columns?sort=votes&pageSize=15) 373 | * More examples [here](http://www.commandlinefu.com/commands/using/column/sort-by-votes) 374 | 375 |
376 | 377 | ## pr 378 | 379 | ```bash 380 | $ pr --version | head -n1 381 | pr (GNU coreutils) 8.25 382 | 383 | $ man pr 384 | PR(1) User Commands PR(1) 385 | 386 | NAME 387 | pr - convert text files for printing 388 | 389 | SYNOPSIS 390 | pr [OPTION]... [FILE]... 391 | 392 | DESCRIPTION 393 | Paginate or columnate FILE(s) for printing. 394 | 395 | With no FILE, or when FILE is -, read standard input. 396 | ... 397 | ``` 398 | 399 | * `Paginate` is not covered, examples related only to `columnate` 400 | * For example, default invocation on a file would add a header, etc 401 | 402 | ```bash 403 | $ # truncated output shown 404 | $ pr fruits.txt 405 | 406 | 407 | 2017-04-21 17:49 fruits.txt Page 1 408 | 409 | 410 | Fruits 411 | apple 412 | guava 413 | watermelon 414 | banana 415 | pomegranate 416 | 417 | ``` 418 | 419 | * Following sections will use `-t` to omit page headers and trailers 420 | 421 |
422 | 423 | #### Converting lines to columns 424 | 425 | * With [paste](#lines-to-multiple-columns), changing input file rows to column(s) is possible only with consecutive lines 426 | * `pr` can do that as well as split entire file itself according to number of columns needed 427 | * And `-s` option in `pr` allows multi-character output delimiter 428 | * As usual, examples to better show the functionalities 429 | 430 | ```bash 431 | $ # note how the input got split into two and resulting splits joined by , 432 | $ seq 6 | pr -2ts, 433 | 1,4 434 | 2,5 435 | 3,6 436 | 437 | $ # note how two consecutive lines gets joined by , 438 | $ seq 6 | paste -d, - - 439 | 1,2 440 | 3,4 441 | 5,6 442 | ``` 443 | 444 | * Default **PAGE_WIDTH** is 72 characters, so each column gets 72 divided by number of columns unless `-s` is used 445 | 446 | ```bash 447 | $ # 3 columns, so each column width is 24 characters 448 | $ seq 9 | pr -3t 449 | 1 4 7 450 | 2 5 8 451 | 3 6 9 452 | 453 | $ # using -s, desired delimiter can be specified 454 | $ seq 9 | pr -3ts' ' 455 | 1 4 7 456 | 2 5 8 457 | 3 6 9 458 | 459 | $ seq 9 | pr -3ts' : ' 460 | 1 : 4 : 7 461 | 2 : 5 : 8 462 | 3 : 6 : 9 463 | 464 | $ # default is TAB when using -s option with no arguments 465 | $ seq 9 | pr -3ts 466 | 1 4 7 467 | 2 5 8 468 | 3 6 9 469 | ``` 470 | 471 | * Using `-a` to change consecutive rows, similar to `paste` 472 | 473 | ```bash 474 | $ seq 8 | pr -4ats: 475 | 1:2:3:4 476 | 5:6:7:8 477 | 478 | $ # no output delimiter for empty cells 479 | $ seq 22 | pr -5ats, 480 | 1,2,3,4,5 481 | 6,7,8,9,10 482 | 11,12,13,14,15 483 | 16,17,18,19,20 484 | 21,22 485 | 486 | $ # note output delimiter even for empty cells 487 | $ seq 22 | paste -d, - - - - - 488 | 1,2,3,4,5 489 | 6,7,8,9,10 490 | 11,12,13,14,15 491 | 16,17,18,19,20 492 | 21,22,,, 493 | ``` 494 | 495 |
496 | 497 | #### Changing PAGE_WIDTH 498 | 499 | * The default PAGE_WIDTH is 72 500 | * The formula `(col-1)*len(delimiter) + col` seems to work in determining minimum PAGE_WIDTH required for multiple column output 501 | * `col` is number of columns required 502 | 503 | ```bash 504 | $ # (36-1)*1 + 36 = 71, so within PAGE_WIDTH limit 505 | $ seq 74 | pr -36ats, 506 | 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36 507 | 37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72 508 | 73,74 509 | $ # (37-1)*1 + 37 = 73, more than default PAGE_WIDTH limit 510 | $ seq 74 | pr -37ats, 511 | pr: page width too narrow 512 | ``` 513 | 514 | * Use `-w` to specify a different PAGE_WIDTH 515 | * The `-J` option turns off truncation 516 | 517 | ```bash 518 | $ # (37-1)*1 + 37 = 73 519 | $ seq 74 | pr -J -w73 -37ats, 520 | 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37 521 | 38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74 522 | 523 | $ # (3-1)*4 + 3 = 11 524 | $ seq 6 | pr -J -w10 -3ats'::::' 525 | pr: page width too narrow 526 | $ seq 6 | pr -J -w11 -3ats'::::' 527 | 1::::2::::3 528 | 4::::5::::6 529 | 530 | $ # if calculating is difficult, simply use a large number 531 | $ seq 6 | pr -J -w500 -3ats'::::' 532 | 1::::2::::3 533 | 4::::5::::6 534 | ``` 535 | 536 |
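* a quick check of the same formula with a two character delimiter:

```bash
# (4-1)*2 + 4 = 10, so -w10 should be just wide enough
# for 4 columns joined by ::
seq 8 | pr -J -w10 -4ats'::'
# 1::2::3::4
# 5::6::7::8
```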
537 | 538 | #### Combining multiple input files 539 | 540 | * Use `-m` option to combine multiple files in parallel, similar to `paste` 541 | 542 | ```bash 543 | $ # 2 columns, so each column width is 36 characters 544 | $ pr -mt fruits.txt price.txt 545 | Fruits Price 546 | apple 182 547 | guava 90 548 | watermelon 35 549 | banana 72 550 | pomegranate 280 551 | 552 | $ # default is TAB when using -s option with no arguments 553 | $ pr -mts <(seq 3) <(seq 4 6) <(seq 7 10) 554 | 1 4 7 555 | 2 5 8 556 | 3 6 9 557 | 10 558 | 559 | $ # double TAB as separator 560 | $ # shell expands $'\t\t' before command is executed 561 | $ pr -mts$'\t\t' colors_1.txt colors_2.txt 562 | Blue Black 563 | Brown Blue 564 | Purple Green 565 | Red Red 566 | Teal White 567 | ``` 568 | 569 | * For interleaving, specify newline as separator 570 | 571 | ```bash 572 | $ pr -mts$'\n' fruits.txt price.txt 573 | Fruits 574 | Price 575 | apple 576 | 182 577 | guava 578 | 90 579 | watermelon 580 | 35 581 | banana 582 | 72 583 | pomegranate 584 | 280 585 | ``` 586 | 587 |
588 | 589 | #### Transposing a table 590 | 591 | ```bash 592 | $ # delimiter is single character, so easy to use tr to change it to newline 593 | $ cat dishes.txt 594 | North alootikki baati khichdi makkiroti poha 595 | South appam bisibelebath dosa koottu sevai 596 | West dhokla khakhra modak shiro vadapav 597 | East handoguri litti momo rosgulla shondesh 598 | 599 | $ # 4 columns, so each column width is 18 characters 600 | $ # $(wc -l < dishes.txt) gives number of columns required 601 | $ tr ' ' '\n' < dishes.txt | pr -$(wc -l < dishes.txt)t 602 | North South West East 603 | alootikki appam dhokla handoguri 604 | baati bisibelebath khakhra litti 605 | khichdi dosa modak momo 606 | makkiroti koottu shiro rosgulla 607 | poha sevai vadapav shondesh 608 | ``` 609 | 610 | * Pipe the output to `column` if spacing is too much 611 | 612 | ```bash 613 | $ tr ' ' '\n' < dishes.txt | pr -$(wc -l < dishes.txt)t | column -t 614 | North South West East 615 | alootikki appam dhokla handoguri 616 | baati bisibelebath khakhra litti 617 | khichdi dosa modak momo 618 | makkiroti koottu shiro rosgulla 619 | poha sevai vadapav shondesh 620 | ``` 621 | 622 |
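* an `awk` sketch doing the same transpose in a single pass — assumes every row has the same number of fields:

```bash
# append each field to its column's output row, then print the rows
printf '1 2 3\n4 5 6\n' |
  awk '{for(i=1;i<=NF;i++) a[i] = a[i] (NR>1 ? " " : "") $i}
       END{for(i=1;i<=NF;i++) print a[i]}'
# 1 4
# 2 5
# 3 6
```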
623 | 624 | #### Further reading for pr 625 | 626 | * `man pr` and `info pr` for more options and detailed documentation 627 | * More examples [here](http://docstore.mik.ua/orelly/unix3/upt/ch21_15.htm) 628 | 629 |
630 | 631 | ## fold 632 | 633 | ```bash 634 | $ fold --version | head -n1 635 | fold (GNU coreutils) 8.25 636 | 637 | $ man fold 638 | FOLD(1) User Commands FOLD(1) 639 | 640 | NAME 641 | fold - wrap each input line to fit in specified width 642 | 643 | SYNOPSIS 644 | fold [OPTION]... [FILE]... 645 | 646 | DESCRIPTION 647 | Wrap input lines in each FILE, writing to standard output. 648 | 649 | With no FILE, or when FILE is -, read standard input. 650 | ... 651 | ``` 652 | 653 |
654 | 655 | #### Examples 656 | 657 | ```bash 658 | $ nl story.txt 659 | 1 The princess of a far away land fought bravely to rescue a travelling group from bandits. And the happy story ends here. Have a nice day. 660 | 2 Still here? okay, read on: The prince of Happalakkahuhu wished he could be as brave as his sister and vowed to train harder 661 | 662 | $ # default folding width is 80 663 | $ fold story.txt 664 | The princess of a far away land fought bravely to rescue a travelling group from 665 | bandits. And the happy story ends here. Have a nice day. 666 | Still here? okay, read on: The prince of Happalakkahuhu wished he could be as br 667 | ave as his sister and vowed to train harder 668 | 669 | $ fold story.txt | nl 670 | 1 The princess of a far away land fought bravely to rescue a travelling group from 671 | 2 bandits. And the happy story ends here. Have a nice day. 672 | 3 Still here? okay, read on: The prince of Happalakkahuhu wished he could be as br 673 | 4 ave as his sister and vowed to train harder 674 | ``` 675 | 676 | * `-s` option breaks at spaces to avoid word splitting 677 | 678 | ```bash 679 | $ fold -s story.txt 680 | The princess of a far away land fought bravely to rescue a travelling group 681 | from bandits. And the happy story ends here. Have a nice day. 682 | Still here? okay, read on: The prince of Happalakkahuhu wished he could be as 683 | brave as his sister and vowed to train harder 684 | ``` 685 | 686 | * Use `-w` to change default width 687 | 688 | ```bash 689 | $ fold -s -w60 story.txt 690 | The princess of a far away land fought bravely to rescue a 691 | travelling group from bandits. And the happy story ends 692 | here. Have a nice day. 693 | Still here? okay, read on: The prince of Happalakkahuhu 694 | wished he could be as brave as his sister and vowed to 695 | train harder 696 | ``` 697 | 698 |
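* `-w1` is a handy special case — it splits input into one character per line:

```bash
# one character per line
printf 'fold' | fold -w1
# f
# o
# l
# d
```

Piping this to `sort | uniq -c` gives a quick character frequency count.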
699 | 700 | #### Further reading for fold 701 | 702 | * `man fold` and `info fold` for more options and detailed documentation 703 | 704 | -------------------------------------------------------------------------------- /sorting_stuff.md: -------------------------------------------------------------------------------- 1 | # Sorting stuff 2 | 3 | **Table of Contents** 4 | 5 | * [sort](#sort) 6 | * [Default sort](#default-sort) 7 | * [Reverse sort](#reverse-sort) 8 | * [Various number sorting](#various-number-sorting) 9 | * [Random sort](#random-sort) 10 | * [Specifying output file](#specifying-output-file) 11 | * [Unique sort](#unique-sort) 12 | * [Column based sorting](#column-based-sorting) 13 | * [Further reading for sort](#further-reading-for-sort) 14 | * [uniq](#uniq) 15 | * [Default uniq](#default-uniq) 16 | * [Only duplicates](#only-duplicates) 17 | * [Only unique](#only-unique) 18 | * [Prefix count](#prefix-count) 19 | * [Ignoring case](#ignoring-case) 20 | * [Combining multiple files](#combining-multiple-files) 21 | * [Column options](#column-options) 22 | * [Further reading for uniq](#further-reading-for-uniq) 23 | * [comm](#comm) 24 | * [Default three column output](#default-three-column-output) 25 | * [Suppressing columns](#suppressing-columns) 26 | * [Files with duplicates](#files-with-duplicates) 27 | * [Further reading for comm](#further-reading-for-comm) 28 | * [shuf](#shuf) 29 | * [Random lines](#random-lines) 30 | * [Random integer numbers](#random-integer-numbers) 31 | * [Further reading for shuf](#further-reading-for-shuf) 32 | 33 |
34 | 35 | ## sort 36 | 37 | ```bash 38 | $ sort --version | head -n1 39 | sort (GNU coreutils) 8.25 40 | 41 | $ man sort 42 | SORT(1) User Commands SORT(1) 43 | 44 | NAME 45 | sort - sort lines of text files 46 | 47 | SYNOPSIS 48 | sort [OPTION]... [FILE]... 49 | sort [OPTION]... --files0-from=F 50 | 51 | DESCRIPTION 52 | Write sorted concatenation of all FILE(s) to standard output. 53 | 54 | With no FILE, or when FILE is -, read standard input. 55 | ... 56 | ``` 57 | 58 | **Note**: All examples shown here assumes ASCII encoded input file 59 | 60 | 61 |
62 | 63 | #### Default sort 64 | 65 | ```bash 66 | $ cat poem.txt 67 | Roses are red, 68 | Violets are blue, 69 | Sugar is sweet, 70 | And so are you. 71 | 72 | $ sort poem.txt 73 | And so are you. 74 | Roses are red, 75 | Sugar is sweet, 76 | Violets are blue, 77 | ``` 78 | 79 | * Well, that was easy. The lines were sorted alphabetically (ascending order by default) and it so happened that first letter alone was enough to decide the order 80 | * For next example, let's extract all the words and sort them 81 | * also allows to showcase `sort` accepting stdin 82 | * See [GNU grep](./gnu_grep.md) chapter if the `grep` command used below looks alien 83 | 84 | ```bash 85 | $ # output might differ depending on locale settings 86 | $ # note the case-insensitiveness of output 87 | $ grep -oi '[a-z]*' poem.txt | sort 88 | And 89 | are 90 | are 91 | are 92 | blue 93 | is 94 | red 95 | Roses 96 | so 97 | Sugar 98 | sweet 99 | Violets 100 | you 101 | ``` 102 | 103 | * heed hereunto 104 | * See also 105 | * [arch wiki - locale](https://wiki.archlinux.org/index.php/locale) 106 | * [Linux: Define Locale and Language Settings](https://www.shellhacks.com/linux-define-locale-language-settings/) 107 | 108 | ```bash 109 | $ info sort | tail 110 | 111 | (1) If you use a non-POSIX locale (e.g., by setting ‘LC_ALL’ to 112 | ‘en_US’), then ‘sort’ may produce output that is sorted differently than 113 | you’re accustomed to. In that case, set the ‘LC_ALL’ environment 114 | variable to ‘C’. Note that setting only ‘LC_COLLATE’ has two problems. 115 | First, it is ineffective if ‘LC_ALL’ is also set. Second, it has 116 | undefined behavior if ‘LC_CTYPE’ (or ‘LANG’, if ‘LC_CTYPE’ is unset) is 117 | set to an incompatible value. For example, you get undefined behavior 118 | if ‘LC_CTYPE’ is ‘ja_JP.PCK’ but ‘LC_COLLATE’ is ‘en_US.UTF-8’. 
119 | ``` 120 | 121 | * Example to help show effect of locale setting 122 | 123 | ```bash 124 | $ # note how uppercase is sorted before lowercase 125 | $ grep -oi '[a-z]*' poem.txt | LC_ALL=C sort 126 | And 127 | Roses 128 | Sugar 129 | Violets 130 | are 131 | are 132 | are 133 | blue 134 | is 135 | red 136 | so 137 | sweet 138 | you 139 | ``` 140 | 141 |
142 | 143 | #### Reverse sort 144 | 145 | * This is simply reversing from default ascending order to descending order 146 | 147 | ```bash 148 | $ sort -r poem.txt 149 | Violets are blue, 150 | Sugar is sweet, 151 | Roses are red, 152 | And so are you. 153 | ``` 154 | 155 |
156 | 157 | #### Various number sorting 158 | 159 | ```bash 160 | $ cat numbers.txt 161 | 20 162 | 53 163 | 3 164 | 101 165 | 166 | $ sort numbers.txt 167 | 101 168 | 20 169 | 3 170 | 53 171 | ``` 172 | 173 | * Whoops, what happened there? `sort` won't know to treat them as numbers unless specified 174 | * Depending on format of numbers, different options have to be used 175 | * First up is `-n` option, which sorts based on numerical value 176 | 177 | ```bash 178 | $ sort -n numbers.txt 179 | 3 180 | 20 181 | 53 182 | 101 183 | 184 | $ sort -nr numbers.txt 185 | 101 186 | 53 187 | 20 188 | 3 189 | ``` 190 | 191 | * The `-n` option can handle negative numbers 192 | * As well as thousands separator and decimal point (depends on locale) 193 | * The `<()` syntax is [Process Substitution](http://mywiki.wooledge.org/ProcessSubstitution) 194 | * to put it simply - allows output of command to be passed as input file to another command without needing to manually create a temporary file 195 | 196 | ```bash 197 | $ # multiple files are merged as single input by default 198 | $ sort -n numbers.txt <(echo '-4') 199 | -4 200 | 3 201 | 20 202 | 53 203 | 101 204 | 205 | $ sort -n numbers.txt <(echo '1,234') 206 | 3 207 | 20 208 | 53 209 | 101 210 | 1,234 211 | 212 | $ sort -n numbers.txt <(echo '31.24') 213 | 3 214 | 20 215 | 31.24 216 | 53 217 | 101 218 | ``` 219 | 220 | * Use `-g` if input contains numbers prefixed by `+` or [E scientific notation](https://en.wikipedia.org/wiki/Scientific_notation#E_notation) 221 | 222 | ```bash 223 | $ cat generic_numbers.txt 224 | +120 225 | -1.53 226 | 3.14e+4 227 | 42.1e-2 228 | 229 | $ sort -g generic_numbers.txt 230 | -1.53 231 | 42.1e-2 232 | +120 233 | 3.14e+4 234 | ``` 235 | 236 | * Commands like `du` have options to display numbers in human readable formats 237 | * `sort` supports sorting such numbers using the `-h` option 238 | 239 | ```bash 240 | $ du -sh * 241 | 104K power.log 242 | 746M projects 243 | 316K report.log 244 | 20K 
sample.txt 245 | $ du -sh * | sort -h 246 | 20K sample.txt 247 | 104K power.log 248 | 316K report.log 249 | 746M projects 250 | 251 | $ # --si uses powers of 1000 instead of 1024 252 | $ du -s --si * 253 | 107k power.log 254 | 782M projects 255 | 324k report.log 256 | 21k sample.txt 257 | $ du -s --si * | sort -h 258 | 21k sample.txt 259 | 107k power.log 260 | 324k report.log 261 | 782M projects 262 | ``` 263 | 264 | * Version sort - dealing with numbers mixed with other characters 265 | * If this sorting is needed simply while displaying directory contents, use `ls -v` instead of piping to `sort -V` 266 | 267 | ```bash 268 | $ cat versions.txt 269 | foo_v1.2 270 | bar_v2.1.3 271 | foobar_v2 272 | foo_v1.2.1 273 | foo_v1.3 274 | 275 | $ sort -V versions.txt 276 | bar_v2.1.3 277 | foobar_v2 278 | foo_v1.2 279 | foo_v1.2.1 280 | foo_v1.3 281 | ``` 282 | 283 | * Another common use case is when there are multiple filenames differentiated by numbers 284 | 285 | ```bash 286 | $ cat files.txt 287 | file0 288 | file10 289 | file3 290 | file4 291 | 292 | $ sort -V files.txt 293 | file0 294 | file3 295 | file4 296 | file10 297 | ``` 298 | 299 | * Can be used when dealing with numbers reported by `time` command as well 300 | 301 | ```bash 302 | $ # different solving durations 303 | $ cat rubik_time.txt 304 | 5m35.363s 305 | 3m20.058s 306 | 4m5.099s 307 | 4m1.130s 308 | 3m42.833s 309 | 4m33.083s 310 | 311 | $ # assuming consistent min/sec format 312 | $ sort -V rubik_time.txt 313 | 3m20.058s 314 | 3m42.833s 315 | 4m1.130s 316 | 4m5.099s 317 | 4m33.083s 318 | 5m35.363s 319 | ``` 320 | 321 |
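* not shown above: the `-c` option only *checks* whether input is already sorted, reporting the first out-of-order line on stderr and via the exit status:

```bash
# exit status 0 means input is already sorted
seq 5 | sort -nc && echo 'sorted'
# sorted

# non-zero exit status for unsorted input (error message goes to stderr)
printf '2\n1\n' | sort -nc 2>/dev/null || echo 'not sorted'
# not sorted
```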
322 | 323 | #### Random sort 324 | 325 | * Note that duplicate lines will always end up next to each other 326 | * might be useful as a feature for some cases ;) 327 | * Use `shuf` if this is not desirable 328 | * See also [How can I shuffle the lines of a text file on the Unix command line or in a shell script?](https://stackoverflow.com/questions/2153882/how-can-i-shuffle-the-lines-of-a-text-file-on-the-unix-command-line-or-in-a-shel) 329 | 330 | ```bash 331 | $ cat nums.txt 332 | 1 333 | 10 334 | 10 335 | 12 336 | 23 337 | 563 338 | 339 | $ # the two 10s will always be next to each other 340 | $ sort -R nums.txt 341 | 563 342 | 12 343 | 1 344 | 10 345 | 10 346 | 23 347 | 348 | $ # duplicates can end up anywhere 349 | $ shuf nums.txt 350 | 10 351 | 23 352 | 1 353 | 10 354 | 563 355 | 12 356 | ``` 357 | 358 |
359 | 360 | #### Specifying output file 361 | 362 | * The `-o` option can be used to specify the output file 363 | * Useful for in place editing 364 | 365 | ```bash 366 | $ sort -R nums.txt -o rand_nums.txt 367 | $ cat rand_nums.txt 368 | 23 369 | 1 370 | 10 371 | 10 372 | 563 373 | 12 374 | 375 | $ sort -R nums.txt -o nums.txt 376 | $ cat nums.txt 377 | 563 378 | 23 379 | 10 380 | 10 381 | 1 382 | 12 383 | ``` 384 | 385 | * Use a shell loop if there are multiple files to be sorted in place 386 | * Below snippet is for `bash` shell 387 | 388 | ```bash 389 | $ for f in *.txt; do echo sort -V "$f" -o "$f"; done 390 | sort -V files.txt -o files.txt 391 | sort -V rubik_time.txt -o rubik_time.txt 392 | sort -V versions.txt -o versions.txt 393 | 394 | $ # remove echo once commands look fine 395 | $ for f in *.txt; do sort -V "$f" -o "$f"; done 396 | ``` 397 | 398 |

399 | 400 | #### Unique sort 401 | 402 | * Keep only first copy of lines that are deemed to be same according to `sort` option used 403 | 404 | ```bash 405 | $ cat duplicates.txt 406 | foo 407 | 12 carrots 408 | foo 409 | 12 apples 410 | 5 guavas 411 | 412 | $ # only one copy of foo in output 413 | $ sort -u duplicates.txt 414 | 12 apples 415 | 12 carrots 416 | 5 guavas 417 | foo 418 | ``` 419 | 420 | * According to option used, definition of duplicate will vary 421 | * For example, when `-n` is used, matching numbers are deemed same even if rest of line differs 422 | * Pipe the output to `uniq` if this is not desirable 423 | 424 | ```bash 425 | $ # note how first copy of line starting with 12 is retained 426 | $ sort -nu duplicates.txt 427 | foo 428 | 5 guavas 429 | 12 carrots 430 | 431 | $ # use uniq when entire line should be compared to find duplicates 432 | $ sort -n duplicates.txt | uniq 433 | foo 434 | 5 guavas 435 | 12 apples 436 | 12 carrots 437 | ``` 438 | 439 | * Use `-f` option to ignore case of alphabets while determining duplicates 440 | 441 | ```bash 442 | $ cat words.txt 443 | CAR 444 | are 445 | car 446 | Are 447 | foot 448 | are 449 | 450 | $ # only the two 'are' were considered duplicates 451 | $ sort -u words.txt 452 | are 453 | Are 454 | car 455 | CAR 456 | foot 457 | 458 | $ # note again that first copy of duplicate is retained 459 | $ sort -fu words.txt 460 | are 461 | CAR 462 | foot 463 | ``` 464 | 465 |
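* a closely related idiom — `sort | uniq -c | sort -rn` builds a frequency table (the trailing `awk` here only squeezes the leading spaces that `uniq -c` adds):

```bash
# count duplicates, then order by count descending
printf 'foo\nbar\nfoo\nfoo\nbar\n' | sort | uniq -c | sort -rn |
  awk '{print $1, $2}'
# 3 foo
# 2 bar
```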
466 | 467 | #### Column based sorting 468 | 469 | From `info sort` 470 | 471 | ``` 472 | ‘-k POS1[,POS2]’ 473 | ‘--key=POS1[,POS2]’ 474 | Specify a sort field that consists of the part of the line between 475 | POS1 and POS2 (or the end of the line, if POS2 is omitted), 476 | _inclusive_. 477 | 478 | Each POS has the form ‘F[.C][OPTS]’, where F is the number of the 479 | field to use, and C is the number of the first character from the 480 | beginning of the field. Fields and character positions are 481 | numbered starting with 1; a character position of zero in POS2 482 | indicates the field’s last character. If ‘.C’ is omitted from 483 | POS1, it defaults to 1 (the beginning of the field); if omitted 484 | from POS2, it defaults to 0 (the end of the field). OPTS are 485 | ordering options, allowing individual keys to be sorted according 486 | to different rules; see below for details. Keys can span multiple 487 | fields. 488 | ``` 489 | 490 | * By default, blank characters (space and tab) serve as field separators 491 | 492 | ```bash 493 | $ cat fruits.txt 494 | apple 42 495 | guava 6 496 | fig 90 497 | banana 31 498 | 499 | $ sort fruits.txt 500 | apple 42 501 | banana 31 502 | fig 90 503 | guava 6 504 | 505 | $ # sort based on 2nd column numbers 506 | $ sort -k2,2n fruits.txt 507 | guava 6 508 | banana 31 509 | apple 42 510 | fig 90 511 | ``` 512 | 513 | * Using a different field separator 514 | * Consider the following sample input file having fields separated by `:` 515 | 516 | ```bash 517 | $ # name:pet_name:no_of_pets 518 | $ cat pets.txt 519 | foo:dog:2 520 | xyz:cat:1 521 | baz:parrot:5 522 | abcd:cat:3 523 | joe:dog:1 524 | bar:fox:1 525 | temp_var:squirrel:4 526 | boss:dog:10 527 | ``` 528 | 529 | * Sorting based on particular column or column to end of line 530 | * In case of multiple entries, by default `sort` would use content of remaining parts of line to resolve 531 | 532 | ```bash 533 | $ # only 2nd column 534 | $ # -k2,4 would mean 2nd column to 
4th column 535 | $ sort -t: -k2,2 pets.txt 536 | abcd:cat:3 537 | xyz:cat:1 538 | boss:dog:10 539 | foo:dog:2 540 | joe:dog:1 541 | bar:fox:1 542 | baz:parrot:5 543 | temp_var:squirrel:4 544 | 545 | $ # from 2nd column to end of line 546 | $ sort -t: -k2 pets.txt 547 | xyz:cat:1 548 | abcd:cat:3 549 | joe:dog:1 550 | boss:dog:10 551 | foo:dog:2 552 | bar:fox:1 553 | baz:parrot:5 554 | temp_var:squirrel:4 555 | ``` 556 | 557 | * Multiple keys can be specified to resolve ties 558 | * Note that if there are still multiple entries with specified keys, remaining parts of lines would be used 559 | 560 | ```bash 561 | $ # default sort for 2nd column, numeric sort on 3rd column to resolve ties 562 | $ sort -t: -k2,2 -k3,3n pets.txt 563 | xyz:cat:1 564 | abcd:cat:3 565 | joe:dog:1 566 | foo:dog:2 567 | boss:dog:10 568 | bar:fox:1 569 | baz:parrot:5 570 | temp_var:squirrel:4 571 | 572 | $ # numeric sort on 3rd column, default sort for 2nd column to resolve ties 573 | $ sort -t: -k3,3n -k2,2 pets.txt 574 | xyz:cat:1 575 | joe:dog:1 576 | bar:fox:1 577 | foo:dog:2 578 | abcd:cat:3 579 | temp_var:squirrel:4 580 | baz:parrot:5 581 | boss:dog:10 582 | ``` 583 | 584 | * Use `-s` option to retain original order of lines in case of tie 585 | 586 | ```bash 587 | $ sort -s -t: -k2,2 pets.txt 588 | xyz:cat:1 589 | abcd:cat:3 590 | foo:dog:2 591 | joe:dog:1 592 | boss:dog:10 593 | bar:fox:1 594 | baz:parrot:5 595 | temp_var:squirrel:4 596 | ``` 597 | 598 | * The `-u` option, as seen earlier, will retain only first match 599 | 600 | ```bash 601 | $ sort -u -t: -k2,2 pets.txt 602 | xyz:cat:1 603 | foo:dog:2 604 | bar:fox:1 605 | baz:parrot:5 606 | temp_var:squirrel:4 607 | 608 | $ sort -u -t: -k3,3n pets.txt 609 | xyz:cat:1 610 | foo:dog:2 611 | abcd:cat:3 612 | temp_var:squirrel:4 613 | baz:parrot:5 614 | boss:dog:10 615 | ``` 616 | 617 | * Sometimes, the input has to be sorted first and then `-u` used on the sorted output 618 | * See also [remove duplicates based on the value of another 
column](https://unix.stackexchange.com/questions/379835/remove-duplicates-based-on-the-value-of-another-column) 619 | 620 | ```bash 621 | $ # sort by number in 3rd column 622 | $ sort -t: -k3,3n pets.txt 623 | bar:fox:1 624 | joe:dog:1 625 | xyz:cat:1 626 | foo:dog:2 627 | abcd:cat:3 628 | temp_var:squirrel:4 629 | baz:parrot:5 630 | boss:dog:10 631 | 632 | $ # then get unique entry based on 2nd column 633 | $ sort -t: -k3,3n pets.txt | sort -t: -u -k2,2 634 | xyz:cat:1 635 | joe:dog:1 636 | bar:fox:1 637 | baz:parrot:5 638 | temp_var:squirrel:4 639 | ``` 640 | 641 | * Specifying particular characters within fields 642 | * If character position is not specified, defaults to `1` for starting column and `0` (last character) for ending column 643 | 644 | ```bash 645 | $ cat marks.txt 646 | fork,ap_12,54 647 | flat,up_342,1.2 648 | fold,tn_48,211 649 | more,ap_93,7 650 | rest,up_5,63 651 | 652 | $ # for 2nd column, sort numerically only from 4th character to end 653 | $ sort -t, -k2.4,2n marks.txt 654 | rest,up_5,63 655 | fork,ap_12,54 656 | fold,tn_48,211 657 | more,ap_93,7 658 | flat,up_342,1.2 659 | 660 | $ # sort uniquely based on first two characters of line 661 | $ sort -u -k1.1,1.2 marks.txt 662 | flat,up_342,1.2 663 | fork,ap_12,54 664 | more,ap_93,7 665 | rest,up_5,63 666 | ``` 667 | 668 | * If there are headers 669 | 670 | ```bash 671 | $ cat header.txt 672 | fruit qty 673 | apple 42 674 | guava 6 675 | fig 90 676 | banana 31 677 | 678 | $ # separate and combine header and content to be sorted 679 | $ cat <(head -n1 header.txt) <(tail -n +2 header.txt | sort -k2nr) 680 | fruit qty 681 | fig 90 682 | apple 42 683 | banana 31 684 | guava 6 685 | ``` 686 | 687 | * See also [sort by last field value when number of fields varies](https://stackoverflow.com/questions/3832068/bash-sort-text-file-by-last-field-value) 688 | 689 |
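* The header trick as a runnable script — the csv file name and data here are made up, and `-t,` handles the comma-separated fields

```bash
# made-up csv input with a header row
printf 'name,score\nbob,7\neve,42\nal,19\n' > /tmp/scores.csv

# keep the header as-is; reverse numeric sort the body on the 2nd field
cat <(head -n1 /tmp/scores.csv) \
    <(tail -n +2 /tmp/scores.csv | sort -t, -k2,2nr) > /tmp/scores_sorted.csv
cat /tmp/scores_sorted.csv
```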
690 | 691 | #### Further reading for sort 692 | 693 | * There are many other options apart from handful presented above. See `man sort` and `info sort` for detailed documentation and more examples 694 | * [sort like a master](http://www.skorks.com/2010/05/sort-files-like-a-master-with-the-linux-sort-command-bash/) 695 | * [When -b to ignore leading blanks is needed](https://unix.stackexchange.com/a/104527/109046) 696 | * [sort Q&A on unix stackexchange](https://unix.stackexchange.com/questions/tagged/sort?sort=votes&pageSize=15) 697 | * [sort on multiple columns using -k option](https://unix.stackexchange.com/questions/249452/unix-multiple-column-sort-issue) 698 | * [sort a string character wise](https://stackoverflow.com/questions/2373874/how-to-sort-characters-in-a-string) 699 | * [Scalability of 'sort -u' for gigantic files](https://unix.stackexchange.com/questions/279096/scalability-of-sort-u-for-gigantic-files) 700 | 701 |
702 | 703 | ## uniq 704 | 705 | ```bash 706 | $ uniq --version | head -n1 707 | uniq (GNU coreutils) 8.25 708 | 709 | $ man uniq 710 | UNIQ(1) User Commands UNIQ(1) 711 | 712 | NAME 713 | uniq - report or omit repeated lines 714 | 715 | SYNOPSIS 716 | uniq [OPTION]... [INPUT [OUTPUT]] 717 | 718 | DESCRIPTION 719 | Filter adjacent matching lines from INPUT (or standard input), writing 720 | to OUTPUT (or standard output). 721 | 722 | With no options, matching lines are merged to the first occurrence. 723 | ... 724 | ``` 725 | 726 |
727 | 728 | #### Default uniq 729 | 730 | ```bash 731 | $ cat word_list.txt 732 | are 733 | are 734 | to 735 | good 736 | bad 737 | bad 738 | bad 739 | good 740 | are 741 | bad 742 | 743 | $ # adjacent duplicate lines are removed, leaving one copy 744 | $ uniq word_list.txt 745 | are 746 | to 747 | good 748 | bad 749 | good 750 | are 751 | bad 752 | 753 | $ # To remove duplicates from entire file, input has to be sorted first 754 | $ # also showcases that uniq accepts stdin as input 755 | $ sort word_list.txt | uniq 756 | are 757 | bad 758 | good 759 | to 760 | ``` 761 | 762 |
763 | 764 | #### Only duplicates 765 | 766 | ```bash 767 | $ # duplicates adjacent to each other 768 | $ uniq -d word_list.txt 769 | are 770 | bad 771 | 772 | $ # duplicates in entire file 773 | $ sort word_list.txt | uniq -d 774 | are 775 | bad 776 | good 777 | ``` 778 | 779 | * To get only duplicate lines, with every copy shown, use the `-D` option 780 | 781 | ```bash 782 | $ uniq -D word_list.txt 783 | are 784 | are 785 | bad 786 | bad 787 | bad 788 | 789 | $ sort word_list.txt | uniq -D 790 | are 791 | are 792 | are 793 | bad 794 | bad 795 | bad 796 | bad 797 | good 798 | good 799 | ``` 800 | 801 | * To distinguish the different groups 802 | 803 | ```bash 804 | $ # using --all-repeated=prepend will add a newline before the first group as well 805 | $ sort word_list.txt | uniq --all-repeated=separate 806 | are 807 | are 808 | are 809 | 810 | bad 811 | bad 812 | bad 813 | bad 814 | 815 | good 816 | good 817 | ``` 818 | 819 | <br>
820 | 821 | #### Only unique 822 | 823 | ```bash 824 | $ # lines with no adjacent duplicates 825 | $ uniq -u word_list.txt 826 | to 827 | good 828 | good 829 | are 830 | bad 831 | 832 | $ # unique lines in entire file 833 | $ sort word_list.txt | uniq -u 834 | to 835 | ``` 836 | 837 |
838 | 839 | #### Prefix count 840 | 841 | ```bash 842 | $ # adjacent lines 843 | $ uniq -c word_list.txt 844 | 2 are 845 | 1 to 846 | 1 good 847 | 3 bad 848 | 1 good 849 | 1 are 850 | 1 bad 851 | 852 | $ # entire file 853 | $ sort word_list.txt | uniq -c 854 | 3 are 855 | 4 bad 856 | 2 good 857 | 1 to 858 | 859 | $ # entire file, only duplicates 860 | $ sort word_list.txt | uniq -cd 861 | 3 are 862 | 4 bad 863 | 2 good 864 | ``` 865 | 866 | * Sorting by count 867 | 868 | ```bash 869 | $ # sort by count 870 | $ sort word_list.txt | uniq -c | sort -n 871 | 1 to 872 | 2 good 873 | 3 are 874 | 4 bad 875 | 876 | $ # reverse the order, highest count first 877 | $ sort word_list.txt | uniq -c | sort -nr 878 | 4 bad 879 | 3 are 880 | 2 good 881 | 1 to 882 | ``` 883 | 884 | * To get only entries with min/max count, bit of [awk](./gnu_awk.md) magic would help 885 | 886 | ```bash 887 | $ # consider this result 888 | $ sort colors.txt | uniq -c | sort -nr 889 | 3 Red 890 | 3 Blue 891 | 2 Yellow 892 | 1 Green 893 | 1 Black 894 | 895 | $ # to get all max count 896 | $ # save 1st line 1st column value to c and then print if 1st column equals c 897 | $ sort colors.txt | uniq -c | sort -nr | awk 'NR==1{c=$1} $1==c' 898 | 3 Red 899 | 3 Blue 900 | $ # to get all min count 901 | $ sort colors.txt | uniq -c | sort -n | awk 'NR==1{c=$1} $1==c' 902 | 1 Black 903 | 1 Green 904 | ``` 905 | 906 | * Get rough count of most used commands from `history` file 907 | 908 | ```bash 909 | $ # awk '{print $1}' will get the 1st column alone 910 | $ awk '{print $1}' "$HISTFILE" | sort | uniq -c | sort -nr | head 911 | 1465 echo 912 | 1180 grep 913 | 552 cd 914 | 531 awk 915 | 451 sed 916 | 423 vi 917 | 418 cat 918 | 392 perl 919 | 325 printf 920 | 320 sort 921 | 922 | $ # extract command name from start of line or preceded by 'spaces|spaces' 923 | $ # won't catch commands in other places like command substitution though 924 | $ grep -oP '(^| +\| +)\K[^ ]+' "$HISTFILE" | sort | uniq -c | sort -nr | 
head 925 | 2006 grep 926 | 1469 echo 927 | 933 sed 928 | 698 awk 929 | 552 cd 930 | 513 perl 931 | 510 cat 932 | 453 sort 933 | 423 vi 934 | 327 printf 935 | ``` 936 | 937 |
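* The same `sort | uniq -c | sort -nr` pattern gives a quick word frequency table; here the text is generated inline (any file would work in its place)

```bash
# one word per line, count occurrences, highest count first
printf 'the cat and the dog and the bird\n' |
    tr ' ' '\n' | sort | uniq -c | sort -nr
```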
938 | 939 | #### Ignoring case 940 | 941 | ```bash 942 | $ cat another_list.txt 943 | food 944 | Food 945 | good 946 | are 947 | bad 948 | Are 949 | 950 | $ # note how first copy is retained 951 | $ uniq -i another_list.txt 952 | food 953 | good 954 | are 955 | bad 956 | Are 957 | 958 | $ uniq -iD another_list.txt 959 | food 960 | Food 961 | ``` 962 | 963 |
964 | 965 | #### Combining multiple files 966 | 967 | ```bash 968 | $ sort -f word_list.txt another_list.txt | uniq -i 969 | are 970 | bad 971 | food 972 | good 973 | to 974 | 975 | $ sort -f word_list.txt another_list.txt | uniq -c 976 | 4 are 977 | 1 Are 978 | 5 bad 979 | 1 food 980 | 1 Food 981 | 3 good 982 | 1 to 983 | 984 | $ sort -f word_list.txt another_list.txt | uniq -ic 985 | 5 are 986 | 5 bad 987 | 2 food 988 | 3 good 989 | 1 to 990 | ``` 991 | 992 | * If only adjacent duplicates are required (i.e. without sorting), the files have to be concatenated using another command 993 | 994 | ```bash 995 | $ uniq -id word_list.txt 996 | are 997 | bad 998 | 999 | $ uniq -id another_list.txt 1000 | food 1001 | 1002 | $ cat word_list.txt another_list.txt | uniq -id 1003 | are 1004 | bad 1005 | food 1006 | ``` 1007 | 1008 | <br>
1009 | 1010 | #### Column options 1011 | 1012 | * `uniq` has a few options dealing with column manipulation. They are not as extensive as `sort -k`, but handy for some cases 1013 | * First up, skipping fields 1014 | * No option to specify a different delimiter 1015 | * From `info uniq`: Fields are sequences of non-space non-tab characters that are separated from each other by at least one space or tab 1016 | * The number of spaces/tabs between fields should be the same 1017 | 1018 | ```bash 1019 | $ cat shopping.txt 1020 | lemon 5 1021 | mango 5 1022 | banana 8 1023 | bread 1 1024 | orange 5 1025 | 1026 | $ # skips first field 1027 | $ uniq -f1 shopping.txt 1028 | lemon 5 1029 | banana 8 1030 | bread 1 1031 | orange 5 1032 | 1033 | $ # use -f3 to skip first three fields and so on 1034 | ``` 1035 | 1036 | * Skipping characters 1037 | 1038 | ```bash 1039 | $ cat text 1040 | glue 1041 | blue 1042 | black 1043 | stack 1044 | stuck 1045 | 1046 | $ # don't consider first 2 characters 1047 | $ uniq -s2 text 1048 | glue 1049 | black 1050 | stuck 1051 | 1052 | $ # to visualize the above example 1053 | $ # assume there are two fields and uniq is applied on 2nd column 1054 | $ sed 's/^../& /' text 1055 | gl ue 1056 | bl ue 1057 | bl ack 1058 | st ack 1059 | st uck 1060 | ``` 1061 | 1062 | * Up to specified characters 1063 | 1064 | ```bash 1065 | $ # consider only first 2 characters 1066 | $ uniq -w2 text 1067 | glue 1068 | blue 1069 | stack 1070 | 1071 | $ # to visualize the above example 1072 | $ # assume there are two fields and uniq is applied on 1st column 1073 | $ sed 's/^../& /' text 1074 | gl ue 1075 | bl ue 1076 | bl ack 1077 | st ack 1078 | st uck 1079 | ``` 1080 | 1081 | * Combining `-s` and `-w` 1082 | * Can be combined with `-f` as well 1083 | 1084 | ```bash 1085 | $ # skip first 3 characters and then use next 2 characters 1086 | $ uniq -s3 -w2 text 1087 | glue 1088 | black 1089 | ``` 1090 | 1091 | 1092 | <br>
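* A typical use of `-f` is ignoring a leading timestamp while collapsing repeated log messages (the log lines below are made up, with consistent single-space separation)

```bash
# skip the 1st field (timestamp), so consecutive identical messages collapse
printf '09:01 disk full\n09:02 disk full\n09:03 net down\n' | uniq -f1
```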
1093 | 1094 | #### Further reading for uniq 1095 | 1096 | * Do check out `man uniq` and `info uniq` for other options and more detailed documentation 1097 | * [uniq Q&A on unix stackexchange](http://unix.stackexchange.com/questions/tagged/uniq?sort=votes&pageSize=15) 1098 | * [process duplicate lines only based on certain fields](https://unix.stackexchange.com/questions/387590/print-the-duplicate-lines-only-on-fields-1-2-from-csv-file) 1099 | 1100 |
1101 | 1102 | ## comm 1103 | 1104 | ```bash 1105 | $ comm --version | head -n1 1106 | comm (GNU coreutils) 8.25 1107 | 1108 | $ man comm 1109 | COMM(1) User Commands COMM(1) 1110 | 1111 | NAME 1112 | comm - compare two sorted files line by line 1113 | 1114 | SYNOPSIS 1115 | comm [OPTION]... FILE1 FILE2 1116 | 1117 | DESCRIPTION 1118 | Compare sorted files FILE1 and FILE2 line by line. 1119 | 1120 | When FILE1 or FILE2 (not both) is -, read standard input. 1121 | 1122 | With no options, produce three-column output. Column one contains 1123 | lines unique to FILE1, column two contains lines unique to FILE2, and 1124 | column three contains lines common to both files. 1125 | ... 1126 | ``` 1127 | 1128 |
1129 | 1130 | #### Default three column output 1131 | 1132 | Consider below sample input files 1133 | 1134 | ```bash 1135 | $ # sorted input files viewed side by side 1136 | $ paste colors_1.txt colors_2.txt 1137 | Blue Black 1138 | Brown Blue 1139 | Purple Green 1140 | Red Red 1141 | Teal White 1142 | Yellow 1143 | ``` 1144 | 1145 | * Without any option, `comm` gives 3 column output 1146 | * lines unique to first file 1147 | * lines unique to second file 1148 | * lines common to both files 1149 | 1150 | ```bash 1151 | $ comm colors_1.txt colors_2.txt 1152 | Black 1153 | Blue 1154 | Brown 1155 | Green 1156 | Purple 1157 | Red 1158 | Teal 1159 | White 1160 | Yellow 1161 | ``` 1162 | 1163 |
1164 | 1165 | #### Suppressing columns 1166 | 1167 | * `-1` suppress lines unique to first file 1168 | * `-2` suppress lines unique to second file 1169 | * `-3` suppress lines common to both files 1170 | 1171 | ```bash 1172 | $ # suppressing column 3 1173 | $ comm -3 colors_1.txt colors_2.txt 1174 | Black 1175 | Brown 1176 | Green 1177 | Purple 1178 | Teal 1179 | White 1180 | Yellow 1181 | ``` 1182 | 1183 | * Combining options gives three distinct and useful constructs 1184 | * First, getting only the lines common to both files 1185 | 1186 | ```bash 1187 | $ comm -12 colors_1.txt colors_2.txt 1188 | Blue 1189 | Red 1190 | ``` 1191 | 1192 | * Second, lines unique to first file 1193 | 1194 | ```bash 1195 | $ comm -23 colors_1.txt colors_2.txt 1196 | Brown 1197 | Purple 1198 | Teal 1199 | Yellow 1200 | ``` 1201 | 1202 | * And the third, lines unique to second file 1203 | 1204 | ```bash 1205 | $ comm -13 colors_1.txt colors_2.txt 1206 | Black 1207 | Green 1208 | White 1209 | ``` 1210 | 1211 | * See also how the above three cases can be done [using grep alone](./gnu_grep.md#search-strings-from-file) 1212 | * **Note** input files do not need to be sorted for the `grep` solution 1213 | 1214 | If a `sort` order different from the default is required, use `--nocheck-order` to suppress the error message 1215 | 1216 | ```bash 1217 | $ comm -23 <(sort -n numbers.txt) <(sort -n nums.txt) 1218 | 3 1219 | comm: file 1 is not in sorted order 1220 | 20 1221 | 53 1222 | 101 1223 | 1224 | $ comm --nocheck-order -23 <(sort -n numbers.txt) <(sort -n nums.txt) 1225 | 3 1226 | 20 1227 | 53 1228 | 101 1229 | ``` 1230 | 1231 | <br>
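* The three constructs act as set operations (intersection and differences); a self-contained sketch with made-up data, sorting the inputs first as `comm` requires

```bash
printf 'c\na\nb\n' | sort > /tmp/set1.txt
printf 'd\nb\nc\n' | sort > /tmp/set2.txt

comm -12 /tmp/set1.txt /tmp/set2.txt    # intersection
comm -23 /tmp/set1.txt /tmp/set2.txt    # lines only in first
```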
1232 | 1233 | #### Files with duplicates 1234 | 1235 | * Duplicate lines are matched up pairwise — as many copies as are present in both files will be considered common 1236 | * The rest will be unique to their respective files 1237 | * This is useful for cases like finding lines present in the first file but not in the second, taking the count of duplicates into consideration as well 1238 | * Such a solution won't be possible with `grep` 1239 | 1240 | ```bash 1241 | $ paste list1 list2 1242 | a a 1243 | a b 1244 | a c 1245 | b c 1246 | b d 1247 | c 1248 | 1249 | $ comm list1 list2 1250 | a 1251 | a 1252 | a 1253 | b 1254 | b 1255 | c 1256 | c 1257 | d 1258 | 1259 | $ comm -23 list1 list2 1260 | a 1261 | a 1262 | b 1263 | ``` 1264 | 1265 | <br>
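* The multiset behavior can be reproduced inline — each copy in the first file must be matched by its own copy in the second to count as common

```bash
# three a's vs one a: two a's remain unique to the first file
printf 'a\na\na\nb\n' > /tmp/m1.txt
printf 'a\nb\nb\n' > /tmp/m2.txt
comm -23 /tmp/m1.txt /tmp/m2.txt
```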
1266 | 1267 | #### Further reading for comm 1268 | 1269 | * `man comm` and `info comm` for more options and detailed documentation 1270 | * [comm Q&A on unix stackexchange](http://unix.stackexchange.com/questions/tagged/comm?sort=votes&pageSize=15) 1271 | 1272 |
1273 | 1274 | ## shuf 1275 | 1276 | ```bash 1277 | $ shuf --version | head -n1 1278 | shuf (GNU coreutils) 8.25 1279 | 1280 | $ man shuf 1281 | SHUF(1) User Commands SHUF(1) 1282 | 1283 | NAME 1284 | shuf - generate random permutations 1285 | 1286 | SYNOPSIS 1287 | shuf [OPTION]... [FILE] 1288 | shuf -e [OPTION]... [ARG]... 1289 | shuf -i LO-HI [OPTION]... 1290 | 1291 | DESCRIPTION 1292 | Write a random permutation of the input lines to standard output. 1293 | 1294 | With no FILE, or when FILE is -, read standard input. 1295 | ... 1296 | ``` 1297 | 1298 |
1299 | 1300 | #### Random lines 1301 | 1302 | * Without repeating input lines 1303 | 1304 | ```bash 1305 | $ cat nums.txt 1306 | 1 1307 | 10 1308 | 10 1309 | 12 1310 | 23 1311 | 563 1312 | 1313 | $ # duplicates can end up anywhere 1314 | $ # all lines are part of output 1315 | $ shuf nums.txt 1316 | 10 1317 | 23 1318 | 1 1319 | 10 1320 | 563 1321 | 12 1322 | 1323 | $ # limit max number of output lines 1324 | $ shuf -n2 nums.txt 1325 | 563 1326 | 23 1327 | ``` 1328 | 1329 | * Use `-o` option to specify output file name instead of displaying on stdout 1330 | * Helpful for inplace editing 1331 | 1332 | ```bash 1333 | $ shuf nums.txt -o nums.txt 1334 | $ cat nums.txt 1335 | 10 1336 | 12 1337 | 23 1338 | 10 1339 | 563 1340 | 1 1341 | ``` 1342 | 1343 | * With repeated input lines 1344 | 1345 | ```bash 1346 | $ # -n3 for max 3 lines, -r allows input lines to be repeated 1347 | $ shuf -n3 -r nums.txt 1348 | 1 1349 | 1 1350 | 563 1351 | 1352 | $ seq 3 | shuf -n5 -r 1353 | 2 1354 | 1 1355 | 2 1356 | 1 1357 | 2 1358 | 1359 | $ # if a limit using -n is not specified, shuf will output lines indefinitely 1360 | ``` 1361 | 1362 | * use `-e` option to specify multiple input lines from command line itself 1363 | 1364 | ```bash 1365 | $ shuf -e red blue green 1366 | green 1367 | blue 1368 | red 1369 | 1370 | $ shuf -e 'hi there' 'hello world' foo bar 1371 | bar 1372 | hi there 1373 | foo 1374 | hello world 1375 | 1376 | $ shuf -n2 -e 'hi there' 'hello world' foo bar 1377 | foo 1378 | hi there 1379 | 1380 | $ shuf -r -n4 -e foo bar 1381 | foo 1382 | foo 1383 | bar 1384 | foo 1385 | ``` 1386 | 1387 |
1388 | 1389 | #### Random integer numbers 1390 | 1391 | * The `-i` option accepts integer range as input to be shuffled 1392 | 1393 | ```bash 1394 | $ shuf -i 3-8 1395 | 3 1396 | 7 1397 | 6 1398 | 4 1399 | 8 1400 | 5 1401 | ``` 1402 | 1403 | * Combine with other options as needed 1404 | 1405 | ```bash 1406 | $ shuf -n3 -i 3-8 1407 | 5 1408 | 4 1409 | 7 1410 | 1411 | $ shuf -r -n4 -i 3-8 1412 | 5 1413 | 5 1414 | 7 1415 | 8 1416 | 1417 | $ shuf -r -n5 -i 0-1 1418 | 1 1419 | 0 1420 | 0 1421 | 1 1422 | 1 1423 | ``` 1424 | 1425 | * Use [seq](./miscellaneous.md#seq) input if negative numbers, floating point, etc are needed 1426 | 1427 | ```bash 1428 | $ seq 2 -1 -2 | shuf 1429 | 2 1430 | -1 1431 | -2 1432 | 0 1433 | 1 1434 | 1435 | $ seq 0.3 0.1 0.7 | shuf -n3 1436 | 0.4 1437 | 0.5 1438 | 0.7 1439 | ``` 1440 | 1441 | 1442 |
1443 | 1444 | #### Further reading for shuf 1445 | 1446 | * `man shuf` and `info shuf` for more options and detailed documentation 1447 | * [Generate random numbers in specific range](https://unix.stackexchange.com/questions/140750/generate-random-numbers-in-specific-range) 1448 | * [Variable - randomly choose among three numbers](https://unix.stackexchange.com/questions/330689/variable-randomly-chosen-among-three-numbers-10-100-and-1000) 1449 | * Related to 'random' stuff: 1450 | * [How to generate a random string?](https://unix.stackexchange.com/questions/230673/how-to-generate-a-random-string) 1451 | * [How can I populate a file with random data?](https://unix.stackexchange.com/questions/33629/how-can-i-populate-a-file-with-random-data) 1452 | * [Run commands at random](https://unix.stackexchange.com/questions/81566/run-commands-at-random) 1453 | 1454 | -------------------------------------------------------------------------------- /tail_less_cat_head.md: -------------------------------------------------------------------------------- 1 | # Cat, Less, Tail and Head 2 | 3 | **Table of Contents** 4 | 5 | * [cat](#cat) 6 | * [Concatenate files](#concatenate-files) 7 | * [Accepting input from stdin](#accepting-input-from-stdin) 8 | * [Squeeze consecutive empty lines](#squeeze-consecutive-empty-lines) 9 | * [Prefix line numbers](#prefix-line-numbers) 10 | * [Viewing special characters](#viewing-special-characters) 11 | * [Writing text to file](#writing-text-to-file) 12 | * [tac](#tac) 13 | * [Useless use of cat](#useless-use-of-cat) 14 | * [Further Reading for cat](#further-reading-for-cat) 15 | * [less](#less) 16 | * [Navigation commands](#navigation-commands) 17 | * [Further Reading for less](#further-reading-for-less) 18 | * [tail](#tail) 19 | * [linewise tail](#linewise-tail) 20 | * [characterwise tail](#characterwise-tail) 21 | * [multiple file input for tail](#multiple-file-input-for-tail) 22 | * [Further Reading for tail](#further-reading-for-tail) 23 | * 
[head](#head) 24 | * [linewise head](#linewise-head) 25 | * [characterwise head](#characterwise-head) 26 | * [multiple file input for head](#multiple-file-input-for-head) 27 | * [combining head and tail](#combining-head-and-tail) 28 | * [Further Reading for head](#further-reading-for-head) 29 | * [Text Editors](#text-editors) 30 | 31 |
32 | 33 | ## cat 34 | 35 | ```bash 36 | $ cat --version | head -n1 37 | cat (GNU coreutils) 8.25 38 | 39 | $ man cat 40 | CAT(1) User Commands CAT(1) 41 | 42 | NAME 43 | cat - concatenate files and print on the standard output 44 | 45 | SYNOPSIS 46 | cat [OPTION]... [FILE]... 47 | 48 | DESCRIPTION 49 | Concatenate FILE(s) to standard output. 50 | 51 | With no FILE, or when FILE is -, read standard input. 52 | ... 53 | ``` 54 | 55 | * For below examples, `marks_201*` files contain 3 fields delimited by TAB 56 | * To avoid formatting issues, TAB has been converted to spaces using `col -x` while pasting the output here 57 | 58 |
59 | 60 | #### Concatenate files 61 | 62 | * One or more files can be given as input and hence a lot of times, `cat` is used to quickly see contents of small single file on terminal 63 | * To save the output of concatenation, just redirect stdout 64 | 65 | ```bash 66 | $ ls 67 | marks_2015.txt marks_2016.txt marks_2017.txt 68 | 69 | $ cat marks_201* 70 | Name Maths Science 71 | foo 67 78 72 | bar 87 85 73 | Name Maths Science 74 | foo 70 75 75 | bar 85 88 76 | Name Maths Science 77 | foo 68 76 78 | bar 90 90 79 | 80 | $ # save stdout to a file 81 | $ cat marks_201* > all_marks.txt 82 | ``` 83 | 84 |
85 | 86 | #### Accepting input from stdin 87 | 88 | ```bash 89 | $ # combining input from stdin and other files 90 | $ printf 'Name\tMaths\tScience \nbaz\t56\t63\nbak\t71\t65\n' | cat - marks_2015.txt 91 | Name Maths Science 92 | baz 56 63 93 | bak 71 65 94 | Name Maths Science 95 | foo 67 78 96 | bar 87 85 97 | 98 | $ # - can be placed in whatever order is required 99 | $ printf 'Name\tMaths\tScience \nbaz\t56\t63\nbak\t71\t65\n' | cat marks_2015.txt - 100 | Name Maths Science 101 | foo 67 78 102 | bar 87 85 103 | Name Maths Science 104 | baz 56 63 105 | bak 71 65 106 | ``` 107 | 108 |
109 | 110 | #### Squeeze consecutive empty lines 111 | 112 | ```bash 113 | $ printf 'hello\n\n\nworld\n\nhave a nice day\n' 114 | hello 115 | 116 | 117 | world 118 | 119 | have a nice day 120 | $ printf 'hello\n\n\nworld\n\nhave a nice day\n' | cat -s 121 | hello 122 | 123 | world 124 | 125 | have a nice day 126 | ``` 127 | 128 |
129 | 130 | #### Prefix line numbers 131 | 132 | ```bash 133 | $ # number all lines 134 | $ cat -n marks_201* 135 | 1 Name Maths Science 136 | 2 foo 67 78 137 | 3 bar 87 85 138 | 4 Name Maths Science 139 | 5 foo 70 75 140 | 6 bar 85 88 141 | 7 Name Maths Science 142 | 8 foo 68 76 143 | 9 bar 90 90 144 | 145 | $ # number only non-empty lines 146 | $ printf 'hello\n\n\nworld\n\nhave a nice day\n' | cat -sb 147 | 1 hello 148 | 149 | 2 world 150 | 151 | 3 have a nice day 152 | ``` 153 | 154 | * For more numbering options, check out the command `nl` 155 | 156 | ```bash 157 | $ whatis nl 158 | nl (1) - number lines of files 159 | ``` 160 | 161 |
162 | 163 | #### Viewing special characters 164 | 165 | * End of line identified by `$` 166 | * Useful for example to see trailing spaces 167 | 168 | ```bash 169 | $ cat -E marks_2015.txt 170 | Name Maths Science $ 171 | foo 67 78$ 172 | bar 87 85$ 173 | ``` 174 | 175 | * TAB identified by `^I` 176 | 177 | ```bash 178 | $ cat -T marks_2015.txt 179 | Name^IMaths^IScience 180 | foo^I67^I78 181 | bar^I87^I85 182 | ``` 183 | 184 | * Non-printing characters 185 | * See [Show Non-Printing Characters](http://docstore.mik.ua/orelly/unix/upt/ch25_07.htm) for more detailed info 186 | 187 | ```bash 188 | $ # NUL character 189 | $ printf 'foo\0bar\0baz\n' | cat -v 190 | foo^@bar^@baz 191 | 192 | $ # to check for dos-style line endings 193 | $ printf 'Hello World!\r\n' | cat -v 194 | Hello World!^M 195 | 196 | $ printf 'Hello World!\r\n' | dos2unix | cat -v 197 | Hello World! 198 | ``` 199 | 200 | * the `-A` option is equivalent to `-vET` 201 | * the `-e` option is equivalent to `-vE` 202 | * If `dos2unix` and `unix2dos` are not available, see [How to convert DOS/Windows newline (CRLF) to Unix newline (\n)](https://stackoverflow.com/questions/2613800/how-to-convert-dos-windows-newline-crlf-to-unix-newline-n-in-a-bash-script) 203 | 204 |
205 | 206 | #### Writing text to file 207 | 208 | ```bash 209 | $ cat > sample.txt 210 | This is an example of adding text to a new file using cat command. 211 | Press Ctrl+d on a newline to save and quit. 212 | 213 | $ cat sample.txt 214 | This is an example of adding text to a new file using cat command. 215 | Press Ctrl+d on a newline to save and quit. 216 | ``` 217 | 218 | * See also how to use [heredoc](http://mywiki.wooledge.org/HereDocument) 219 | * [How can I write a here doc to a file](https://stackoverflow.com/questions/2953081/how-can-i-write-a-here-doc-to-a-file-in-bash-script) 220 | * See also [difference between Ctrl+c and Ctrl+d to signal end of stdin input in bash](https://unix.stackexchange.com/questions/16333/how-to-signal-the-end-of-stdin-input-in-bash) 221 | 222 |
223 | 224 | #### tac 225 | 226 | ```bash 227 | $ whatis tac 228 | tac (1) - concatenate and print files in reverse 229 | $ tac --version | head -n1 230 | tac (GNU coreutils) 8.25 231 | 232 | $ seq 3 | tac 233 | 3 234 | 2 235 | 1 236 | 237 | $ tac marks_2015.txt 238 | bar 87 85 239 | foo 67 78 240 | Name Maths Science 241 | ``` 242 | 243 | * Useful in cases where logic is easier to write when working on reversed file 244 | * Consider this made up log file, many **Warning** lines but need to extract only from last such **Warning** upto **Error** line 245 | * See [GNU sed chapter](./gnu_sed.md#lines-between-two-regexps) for details on the `sed` command used below 246 | 247 | ```bash 248 | $ cat report.log 249 | blah blah 250 | Warning: something went wrong 251 | more blah 252 | whatever 253 | Warning: something else went wrong 254 | some text 255 | some more text 256 | Error: something seriously went wrong 257 | blah blah blah 258 | 259 | $ tac report.log | sed -n '/Error:/,/Warning:/p' | tac 260 | Warning: something else went wrong 261 | some text 262 | some more text 263 | Error: something seriously went wrong 264 | ``` 265 | 266 | * Similarly, if characters in lines have to be reversed, use the `rev` command 267 | 268 | ```bash 269 | $ whatis rev 270 | rev (1) - reverse lines characterwise 271 | ``` 272 | 273 |
274 | 275 | #### Useless use of cat 276 | 277 | * `cat` is used so frequently to view contents of a file that somehow users think other commands cannot handle file input 278 | * [UUOC](https://en.wikipedia.org/wiki/Cat_(Unix)#Useless_use_of_cat) 279 | * [Useless Use of Cat Award](http://porkmail.org/era/unix/award.html) 280 | 281 | ```bash 282 | $ cat report.log | grep -E 'Warning|Error' 283 | Warning: something went wrong 284 | Warning: something else went wrong 285 | Error: something seriously went wrong 286 | $ grep -E 'Warning|Error' report.log 287 | Warning: something went wrong 288 | Warning: something else went wrong 289 | Error: something seriously went wrong 290 | ``` 291 | 292 | * Use [input redirection](http://wiki.bash-hackers.org/howto/redirection_tutorial) if a command doesn't accept file input 293 | 294 | ```bash 295 | $ cat marks_2015.txt | tr 'A-Z' 'a-z' 296 | name maths science 297 | foo 67 78 298 | bar 87 85 299 | $ tr 'A-Z' 'a-z' < marks_2015.txt 300 | name maths science 301 | foo 67 78 302 | bar 87 85 303 | ``` 304 | 305 | * However, `cat` should definitely be used where **concatenation** is needed 306 | 307 | ```bash 308 | $ grep -c 'foo' marks_201* 309 | marks_2015.txt:1 310 | marks_2016.txt:1 311 | marks_2017.txt:1 312 | 313 | $ # concatenation allows to get overall count in one-shot in this case 314 | $ cat marks_201* | grep -c 'foo' 315 | 3 316 | ``` 317 | 318 |
319 | 320 | #### Further Reading for cat 321 | 322 | * [cat Q&A on unix stackexchange](https://unix.stackexchange.com/questions/tagged/cat?sort=votes&pageSize=15) 323 | * [cat Q&A on stackoverflow](https://stackoverflow.com/questions/tagged/cat?sort=votes&pageSize=15) 324 | 325 |
326 | 327 | ## less 328 | 329 | ```bash 330 | $ less --version | head -n1 331 | less 481 (GNU regular expressions) 332 | 333 | $ # By default, pager is used to display the man pages 334 | $ # and usually, pager is linked to the less command 335 | $ type pager less 336 | pager is /usr/bin/pager 337 | less is /usr/bin/less 338 | 339 | $ realpath /usr/bin/pager 340 | /bin/less 341 | $ realpath /usr/bin/less 342 | /bin/less 343 | $ diff -s /usr/bin/pager /usr/bin/less 344 | Files /usr/bin/pager and /usr/bin/less are identical 345 | ``` 346 | 347 | * The `cat` command is NOT suitable for viewing the contents of large files on the Terminal 348 | * `less` displays the contents of a file, automatically fits to the size of the Terminal, allows scrolling in either direction and provides other options for effective viewing 349 | * Usually, the `man` command uses `less` to display the help page 350 | * The navigation commands are similar to those of the `vi` editor 351 | 352 | <br>
353 | 354 | #### Navigation commands 355 | 356 | Commonly used commands are given below, press `h` for a summary of options 357 | 358 | * `g` go to start of file 359 | * `G` go to end of file 360 | * `q` quit 361 | * `/pattern` search for the given pattern in forward direction 362 | * `?pattern` search for the given pattern in backward direction 363 | * `n` go to the next match 364 | * `N` go to the previous match 365 | 366 | <br>
367 | 368 | #### Further Reading for less 369 | 370 | * See `man less` for detailed info on commands and options. For example: 371 | * `-s` option to squeeze consecutive blank lines 372 | * `-N` option to prefix line numbers 373 | * the `less` command is an [improved version](https://unix.stackexchange.com/questions/604/isnt-less-just-more) of the `more` command 374 | * [differences between most, more and less](https://unix.stackexchange.com/questions/81129/what-are-the-differences-between-most-more-and-less) 375 | * [less Q&A on unix stackexchange](https://unix.stackexchange.com/questions/tagged/less?sort=votes&pageSize=15) 376 | 377 | <br>
378 | 379 | ## tail 380 | 381 | ```bash 382 | $ tail --version | head -n1 383 | tail (GNU coreutils) 8.25 384 | 385 | $ man tail 386 | TAIL(1) User Commands TAIL(1) 387 | 388 | NAME 389 | tail - output the last part of files 390 | 391 | SYNOPSIS 392 | tail [OPTION]... [FILE]... 393 | 394 | DESCRIPTION 395 | Print the last 10 lines of each FILE to standard output. With more 396 | than one FILE, precede each with a header giving the file name. 397 | 398 | With no FILE, or when FILE is -, read standard input. 399 | ... 400 | ``` 401 | 402 |
403 | 404 | #### linewise tail 405 | 406 | Consider this sample file, with line numbers prefixed 407 | 408 | ```bash 409 | $ cat sample.txt 410 | 1) Hello World 411 | 2) 412 | 3) Good day 413 | 4) How are you 414 | 5) 415 | 6) Just do-it 416 | 7) Believe it 417 | 8) 418 | 9) Today is sunny 419 | 10) Not a bit funny 420 | 11) No doubt you like it too 421 | 12) 422 | 13) Much ado about nothing 423 | 14) He he he 424 | 15) Adios amigo 425 | ``` 426 | 427 | * default behavior - display the last 10 lines 428 | 429 | ```bash 430 | $ tail sample.txt 431 | 6) Just do-it 432 | 7) Believe it 433 | 8) 434 | 9) Today is sunny 435 | 10) Not a bit funny 436 | 11) No doubt you like it too 437 | 12) 438 | 13) Much ado about nothing 439 | 14) He he he 440 | 15) Adios amigo 441 | ``` 442 | 443 | * Use the `-n` option to control the number of lines displayed 444 | 445 | ```bash 446 | $ tail -n3 sample.txt 447 | 13) Much ado about nothing 448 | 14) He he he 449 | 15) Adios amigo 450 | 451 | $ # some versions of tail allow omitting the explicit n character 452 | $ tail -5 sample.txt 453 | 11) No doubt you like it too 454 | 12) 455 | 13) Much ado about nothing 456 | 14) He he he 457 | 15) Adios amigo 458 | ``` 459 | 460 | * when the number is prefixed with a `+` sign, all lines are fetched from that particular line number to the end of the file 461 | 462 | ```bash 463 | $ tail -n +10 sample.txt 464 | 10) Not a bit funny 465 | 11) No doubt you like it too 466 | 12) 467 | 13) Much ado about nothing 468 | 14) He he he 469 | 15) Adios amigo 470 | 471 | $ seq 13 17 | tail -n +3 472 | 15 473 | 16 474 | 17 475 | ``` 476 | 477 | <br>
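A common use of the `+` form is skipping a header line. For example, with the `marks_2015.txt` file seen in earlier sections:

```bash
$ # print everything except the header line
$ tail -n +2 marks_2015.txt
foo 67 78
bar 87 85
```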
478 | 479 | #### characterwise tail 480 | 481 | * Note that this works bytewise and is not suitable for multi-byte character encodings 482 | 483 | ```bash 484 | $ # last three characters including the newline character 485 | $ echo 'Hi there!' | tail -c3 486 | e! 487 | 488 | $ # excluding the first character 489 | $ echo 'Hi there!' | tail -c +2 490 | i there! 491 | ``` 492 | 493 | <br>
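To see why byte counting matters, here's a quick demonstration with UTF-8 input (assuming a UTF-8 locale; each of these Greek letters occupies 2 bytes):

```bash
$ # 'αβ' is 4 bytes in UTF-8, 2 bytes per letter
$ printf 'αβ' | wc -c
4

$ # the last 2 bytes recover the last letter cleanly
$ printf 'αβ' | tail -c2
β

$ # but asking for the last 3 bytes would split 'α' mid-character
$ # and produce invalid output
```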
494 | 495 | #### multiple file input for tail 496 | 497 | ```bash 498 | $ tail -n2 report.log sample.txt 499 | ==> report.log <== 500 | Error: something seriously went wrong 501 | blah blah blah 502 | 503 | ==> sample.txt <== 504 | 14) He he he 505 | 15) Adios amigo 506 | 507 | $ # -q option to avoid filename in output 508 | $ tail -q -n2 report.log sample.txt 509 | Error: something seriously went wrong 510 | blah blah blah 511 | 14) He he he 512 | 15) Adios amigo 513 | ``` 514 | 515 |
516 | 517 | #### Further Reading for tail 518 | 519 | * `tail -f` and related options are beyond the scope of this tutorial. The links below might be useful 520 | * [look out for buffering](http://mywiki.wooledge.org/BashFAQ/009) 521 | * [Piping tail -f output though grep twice](https://stackoverflow.com/questions/13858912/piping-tail-output-though-grep-twice) 522 | * [tail and less](https://unix.stackexchange.com/questions/196168/does-less-have-a-feature-like-tail-follow-name-f) 523 | * [tail Q&A on unix stackexchange](https://unix.stackexchange.com/questions/tagged/tail?sort=votes&pageSize=15) 524 | * [tail Q&A on stackoverflow](https://stackoverflow.com/questions/tagged/tail?sort=votes&pageSize=15) 525 | 526 | <br>
527 | 528 | ## head 529 | 530 | ```bash 531 | $ head --version | head -n1 532 | head (GNU coreutils) 8.25 533 | 534 | $ man head 535 | HEAD(1) User Commands HEAD(1) 536 | 537 | NAME 538 | head - output the first part of files 539 | 540 | SYNOPSIS 541 | head [OPTION]... [FILE]... 542 | 543 | DESCRIPTION 544 | Print the first 10 lines of each FILE to standard output. With more 545 | than one FILE, precede each with a header giving the file name. 546 | 547 | With no FILE, or when FILE is -, read standard input. 548 | ... 549 | ``` 550 | 551 |
552 | 553 | #### linewise head 554 | 555 | * default behavior - display the first 10 lines 556 | 557 | ```bash 558 | $ head sample.txt 559 | 1) Hello World 560 | 2) 561 | 3) Good day 562 | 4) How are you 563 | 5) 564 | 6) Just do-it 565 | 7) Believe it 566 | 8) 567 | 9) Today is sunny 568 | 10) Not a bit funny 569 | ``` 570 | 571 | * Use the `-n` option to control the number of lines displayed 572 | 573 | ```bash 574 | $ head -n3 sample.txt 575 | 1) Hello World 576 | 2) 577 | 3) Good day 578 | 579 | $ # some versions of head allow omitting the explicit n character 580 | $ head -4 sample.txt 581 | 1) Hello World 582 | 2) 583 | 3) Good day 584 | 4) How are you 585 | ``` 586 | 587 | * when the number is prefixed with a `-` sign, all lines except the last that many lines are displayed 588 | 589 | ```bash 590 | $ # except the last 9 lines of the file 591 | $ head -n -9 sample.txt 592 | 1) Hello World 593 | 2) 594 | 3) Good day 595 | 4) How are you 596 | 5) 597 | 6) Just do-it 598 | 599 | $ # except the last 2 lines 600 | $ seq 13 17 | head -n -2 601 | 13 602 | 14 603 | 15 604 | ``` 605 | 606 | <br>
607 | 608 | #### characterwise head 609 | 610 | * Note that this works bytewise and is not suitable for multi-byte character encodings 611 | 612 | ```bash 613 | $ # if the output of a command doesn't end with a newline, the prompt will be on the same line 614 | $ # to highlight how the command works, the prompt for such cases is not shown here 615 | 616 | $ # first two characters 617 | $ echo 'Hi there!' | head -c2 618 | Hi 619 | 620 | $ # excluding the last four characters 621 | $ echo 'Hi there!' | head -c -4 622 | Hi the 623 | ``` 624 | 625 | <br>
626 | 627 | #### multiple file input for head 628 | 629 | ```bash 630 | $ head -n3 report.log sample.txt 631 | ==> report.log <== 632 | blah blah 633 | Warning: something went wrong 634 | more blah 635 | 636 | ==> sample.txt <== 637 | 1) Hello World 638 | 2) 639 | 3) Good day 640 | 641 | $ # -q option to avoid filename in output 642 | $ head -q -n3 report.log sample.txt 643 | blah blah 644 | Warning: something went wrong 645 | more blah 646 | 1) Hello World 647 | 2) 648 | 3) Good day 649 | ``` 650 | 651 |
652 | 653 | #### combining head and tail 654 | 655 | * Despite involving two commands, this combination is often faster than equivalent sed/awk versions 656 | 657 | ```bash 658 | $ head -n11 sample.txt | tail -n3 659 | 9) Today is sunny 660 | 10) Not a bit funny 661 | 11) No doubt you like it too 662 | 663 | $ tail sample.txt | head -n2 664 | 6) Just do-it 665 | 7) Believe it 666 | ``` 667 | 668 | <br>
669 | 670 | #### Further Reading for head 671 | 672 | * [head Q&A on unix stackexchange](https://unix.stackexchange.com/questions/tagged/head?sort=votes&pageSize=15) 673 | 674 |
675 | 676 | ## Text Editors 677 | 678 | For editing text files, the following applications can be used. Of these, `gedit`, `nano`, `vi` and/or `vim` are available in most distros by default 679 | 680 | Easy to use 681 | 682 | * [gedit](https://wiki.gnome.org/Apps/Gedit) 683 | * [geany](http://www.geany.org/) 684 | * [nano](http://nano-editor.org/) 685 | 686 | Powerful text editors 687 | 688 | * [vim](https://github.com/vim/vim) 689 | * [vim learning resources](https://github.com/learnbyexample/scripting_course/blob/master/Vim_curated_resources.md) and [vim reference](https://github.com/learnbyexample/vim_reference) for further info 690 | * [emacs](https://www.gnu.org/software/emacs/) 691 | * [atom](https://atom.io/) 692 | * [sublime](https://www.sublimetext.com/) 693 | 694 | Check out [this analysis](https://github.com/jhallen/joes-sandbox/tree/master/editor-perf) for some performance/feature comparisons of various text editors 695 | -------------------------------------------------------------------------------- /whats_the_difference.md: -------------------------------------------------------------------------------- 1 | # What's the difference 2 | 3 | **Table of Contents** 4 | 5 | * [cmp](#cmp) 6 | * [diff](#diff) 7 | * [Comparing Directories](#comparing-directories) 8 | * [colordiff](#colordiff) 9 | 10 |
11 | 12 | ## cmp 13 | 14 | ```bash 15 | $ cmp --version | head -n1 16 | cmp (GNU diffutils) 3.3 17 | 18 | $ man cmp 19 | CMP(1) User Commands CMP(1) 20 | 21 | NAME 22 | cmp - compare two files byte by byte 23 | 24 | SYNOPSIS 25 | cmp [OPTION]... FILE1 [FILE2 [SKIP1 [SKIP2]]] 26 | 27 | DESCRIPTION 28 | Compare two files byte by byte. 29 | 30 | The optional SKIP1 and SKIP2 specify the number of bytes to skip at the 31 | beginning of each file (zero by default). 32 | ... 33 | ``` 34 | 35 | * As the comparison is byte by byte, it doesn't matter whether the file is human readable or not 36 | * A typical use case is to check if two executables are the same or not 37 | 38 | ```bash 39 | $ echo 'foo 123' > f1; echo 'food 123' > f2 40 | $ cmp f1 f2 41 | f1 f2 differ: byte 4, line 1 42 | 43 | $ # print differing bytes 44 | $ cmp -b f1 f2 45 | f1 f2 differ: byte 4, line 1 is 40 144 d 46 | 47 | $ # skip the given number of bytes from each file 48 | $ # if only one number is given, it is used for both inputs 49 | $ cmp -i 3:4 f1 f2 50 | $ echo $? 51 | 0 52 | 53 | $ # compare only the given number of bytes from start of inputs 54 | $ cmp -n 3 f1 f2 55 | $ echo $? 56 | 0 57 | 58 | $ # suppress output 59 | $ cmp -s f1 f2 60 | $ echo $? 61 | 1 62 | ``` 63 | 64 | * The comparison stops immediately at the first difference found 65 | * If the verbose option `-l` is used, the comparison stops when either input reaches end of file 66 | 67 | ```bash 68 | $ # first column is byte number 69 | $ # second/third column is respective octal value of differing bytes 70 | $ cmp -l f1 f2 71 | 4 40 144 72 | 5 61 40 73 | 6 62 61 74 | 7 63 62 75 | 8 12 63 76 | cmp: EOF on f1 77 | ``` 78 | 79 | **Further Reading** 80 | 81 | * `man cmp` and `info cmp` for more options and detailed documentation 82 | 83 | 84 | <br>
85 | 86 | ## diff 87 | 88 | ```bash 89 | $ diff --version | head -n1 90 | diff (GNU diffutils) 3.3 91 | 92 | $ man diff 93 | DIFF(1) User Commands DIFF(1) 94 | 95 | NAME 96 | diff - compare files line by line 97 | 98 | SYNOPSIS 99 | diff [OPTION]... FILES 100 | 101 | DESCRIPTION 102 | Compare FILES line by line. 103 | ... 104 | ``` 105 | 106 | * `diff` output shows lines from the first file input starting with `<` 107 | * lines from the second file input start with `>` 108 | * between the two file contents, `---` is used as a separator 109 | * each difference is prefixed by a command that indicates the type of difference (see links at end of section for more details) 110 | 111 | ```bash 112 | $ paste d1 d2 113 | 1 1 114 | 2 hello 115 | 3 3 116 | world 4 117 | 118 | $ diff d1 d2 119 | 2c2 120 | < 2 121 | --- 122 | > hello 123 | 4c4 124 | < world 125 | --- 126 | > 4 127 | 128 | $ diff <(seq 4) <(seq 5) 129 | 4a5 130 | > 5 131 | ``` 132 | 133 | * use the `-i` option to ignore case 134 | 135 | ```bash 136 | $ echo 'Hello World!' > i1 137 | $ echo 'hello world!' > i2 138 | 139 | $ diff i1 i2 140 | 1c1 141 | < Hello World! 142 | --- 143 | > hello world! 144 | 145 | $ diff -i i1 i2 146 | $ echo $? 147 | 0 148 | ``` 149 | 150 | * ignoring differences in white space 151 | 152 | ```bash 153 | $ # -b option to ignore changes in the amount of white space 154 | $ diff -b <(echo 'good  day') <(echo 'good day') 155 | $ echo $? 156 | 0 157 | 158 | $ # -w option to ignore all white spaces 159 | $ diff -w <(echo 'hi there ') <(echo ' hi there') 160 | $ echo $? 161 | 0 162 | $ diff -w <(echo 'hi there ') <(echo 'hithere') 163 | $ echo $?
164 | 0 165 | 166 | # use -B to ignore only blank lines 167 | # use -E to ignore changes due to tab expansion 168 | # use -Z to ignore trailing white space at end of line 169 | ``` 170 | 171 | * side-by-side output 172 | 173 | ```bash 174 | $ diff -y d1 d2 175 | 1 1 176 | 2 | hello 177 | 3 3 178 | world | 4 179 | 180 | $ # -y is usually used along with other options 181 | $ # default width is 130 print columns 182 | $ diff -W 60 --suppress-common-lines -y d1 d2 183 | 2 | hello 184 | world | 4 185 | 186 | $ diff -W 20 --left-column -y <(seq 4) <(seq 5) 187 | 1 ( 188 | 2 ( 189 | 3 ( 190 | 4 ( 191 | > 5 192 | ``` 193 | 194 | * by default, there is no output if the input files are the same. Use the `-s` option to explicitly indicate that the files are identical 195 | * by default, all differences are shown. Use the `-q` option to report only whether the files differ 196 | 197 | ```bash 198 | $ cp i1 i1_copy 199 | $ diff -s i1 i1_copy 200 | Files i1 and i1_copy are identical 201 | $ diff -s i1 i2 202 | 1c1 203 | < Hello World! 204 | --- 205 | > hello world! 206 | 207 | $ diff -q i1 i1_copy 208 | $ diff -q i1 i2 209 | Files i1 and i2 differ 210 | 211 | $ # combine them to always get one line output 212 | $ diff -sq i1 i1_copy 213 | Files i1 and i1_copy are identical 214 | $ diff -sq i1 i2 215 | Files i1 and i2 differ 216 | ``` 217 | 218 | <br>
219 | 220 | #### Comparing Directories 221 | 222 | * when comparing two files of the same name from different directories, specifying the filename is optional for one of the directories 223 | 224 | ```bash 225 | $ mkdir dir1 dir2 226 | $ echo 'Hello World!' > dir1/i1 227 | $ echo 'hello world!' > dir2/i1 228 | 229 | $ diff dir1/i1 dir2 230 | 1c1 231 | < Hello World! 232 | --- 233 | > hello world! 234 | 235 | $ diff -s i1 dir1/ 236 | Files i1 and dir1/i1 are identical 237 | $ diff -s . dir1/i1 238 | Files ./i1 and dir1/i1 are identical 239 | ``` 240 | 241 | * if both arguments are directories, all the files are compared 242 | 243 | ```bash 244 | $ touch dir1/report.log dir1/lists dir2/power.log 245 | $ cp f1 dir1/ 246 | $ cp f1 dir2/ 247 | 248 | $ # by default, all differences are reported 249 | $ # as well as the filenames which are unique to the respective directories 250 | $ diff dir1 dir2 251 | diff dir1/i1 dir2/i1 252 | 1c1 253 | < Hello World! 254 | --- 255 | > hello world! 256 | Only in dir1: lists 257 | Only in dir2: power.log 258 | Only in dir1: report.log 259 | ``` 260 | 261 | * to report only the filenames and their status 262 | 263 | ```bash 264 | $ diff -sq dir1 dir2 265 | Files dir1/f1 and dir2/f1 are identical 266 | Files dir1/i1 and dir2/i1 differ 267 | Only in dir1: lists 268 | Only in dir2: power.log 269 | Only in dir1: report.log 270 | 271 | $ # list only differing files 272 | $ # also useful to copy-paste the command for GUI diffs like tkdiff/vimdiff 273 | $ diff dir1 dir2 | grep '^diff ' 274 | diff dir1/i1 dir2/i1 275 | ``` 276 | 277 | * to recursively compare sub-directories as well, use `-r` 278 | 279 | ```bash 280 | $ mkdir dir1/subdir dir2/subdir 281 | $ echo 'good' > dir1/subdir/f1 282 | $ echo 'goad' > dir2/subdir/f1 283 | 284 | $ diff -srq dir1 dir2 285 | Files dir1/f1 and dir2/f1 are identical 286 | Files dir1/i1 and dir2/i1 differ 287 | Only in dir1: lists 288 | Only in dir2: power.log 289 | Only in dir1: report.log 290 | Files dir1/subdir/f1 and dir2/subdir/f1 differ 291
| 292 | $ diff -r dir1 dir2 | grep '^diff ' 293 | diff -r dir1/i1 dir2/i1 294 | diff -r dir1/subdir/f1 dir2/subdir/f1 295 | ``` 296 | 297 | * See also [GNU diffutils manual - comparing directories](https://www.gnu.org/software/diffutils/manual/diffutils.html#Comparing-Directories) for further options and details like excluding files, ignoring filename case, etc and `dirdiff` command 298 | 299 |
300 | 301 | #### colordiff 302 | 303 | ```bash 304 | $ whatis colordiff 305 | colordiff (1) - a tool to colorize diff output 306 | 307 | $ whatis wdiff 308 | wdiff (1) - display word differences between text files 309 | ``` 310 | 311 | * simply replace `diff` with `colordiff` 312 | 313 | ![colordiff](./images/colordiff.png) 314 | 315 | * or, pass output of a `diff` tool to `colordiff` 316 | 317 | ![wdiff to colordiff](./images/wdiff_to_colordiff.png) 318 | 319 | * See also [stackoverflow - How to colorize diff on the command line?](https://stackoverflow.com/questions/8800578/how-to-colorize-diff-on-the-command-line) for other options 320 | 321 |
322 | 323 | **Further Reading** 324 | 325 | * `man diff` and `info diff` for more options and detailed documentation 326 | * [GNU diffutils manual](https://www.gnu.org/software/diffutils/manual/diffutils.html) for a better documentation 327 | * `man -k diff` to get list of all commands related to `diff` 328 | * [diff Q&A on unix stackexchange](https://unix.stackexchange.com/questions/tagged/diff?sort=votes&pageSize=15) 329 | * [unix.stackexchange - GUI diff and merge tools](https://unix.stackexchange.com/questions/4573/which-gui-diff-viewer-would-you-recommend-with-copy-to-left-right-functionality) 330 | * [unix.stackexchange - Understanding diff output](https://unix.stackexchange.com/questions/81998/understanding-of-diff-output) 331 | * [stackoverflow - Using output of diff to create patch](https://stackoverflow.com/questions/437219/using-the-output-of-diff-to-create-the-patch) 332 | 333 | -------------------------------------------------------------------------------- /wheres_my_file.md: -------------------------------------------------------------------------------- 1 | # Where's my file 2 | 3 | **Table of Contents** 4 | 5 | * [find](#find) 6 | * [locate](#locate) 7 | 8 |
9 | 10 | ## find 11 | 12 | ```bash 13 | $ find --version | head -n1 14 | find (GNU findutils) 4.7.0-git 15 | 16 | $ man find 17 | FIND(1) General Commands Manual FIND(1) 18 | 19 | NAME 20 | find - search for files in a directory hierarchy 21 | 22 | SYNOPSIS 23 | find [-H] [-L] [-P] [-D debugopts] [-Olevel] [starting-point...] 24 | [expression] 25 | 26 | DESCRIPTION 27 | This manual page documents the GNU version of find. GNU find searches 28 | the directory tree rooted at each given starting-point by evaluating 29 | the given expression from left to right, according to the rules of 30 | precedence (see section OPERATORS), until the outcome is known (the 31 | left hand side is false for and operations, true for or), at which 32 | point find moves on to the next file name. If no starting-point is 33 | specified, `.' is assumed. 34 | ... 35 | ``` 36 | 37 | **Examples** 38 | 39 | Filtering based on file name 40 | 41 | * `find . -iname 'power.log'` search and print the path of the file named power.log (ignoring case) in the current directory and its sub-directories 42 | * `find -name '*log'` search and print the path of all files whose name ends with log in the current directory - using `.` is optional when searching in the current directory 43 | * `find -not -name '*log'` print the path of all files whose name does NOT end with log in the current directory 44 | * `find -regextype egrep -regex '.*/\w+'` use an extended regular expression to match filenames containing only `[a-zA-Z0-9_]` characters 45 | * `.*/` is needed to match the initial part of the file path 46 | 47 | Filtering based on file type 48 | 49 | * `find /home/guest1/proj -type f` print the path of all regular files found in the specified directory 50 | * `find /home/guest1/proj -type d` print the path of all directories found in the specified directory 51 | * `find /home/guest1/proj -type f -name '.*'` print the path of all hidden files 52 | 53 | Filtering based on depth 54 | 55 | The relative path `.` is considered a depth 0 directory; files and folders immediately
contained in a directory are at depth 1 and so on 56 | 57 | * `find -maxdepth 1 -type f` all regular files (including hidden ones) from the current directory (without going to sub-directories) 58 | * `find -maxdepth 1 -type f -name '[!.]*'` all regular files (but not hidden ones) from the current directory (without going to sub-directories) 59 | * `-not -name '.*'` can also be used 60 | * `find -mindepth 1 -maxdepth 1 -type d` all directories (including hidden ones) in the current directory (without going to sub-directories) 61 | 62 | Filtering based on file properties 63 | 64 | * `find -mtime -2` print files that were modified within the last two days in the current directory 65 | * Note that day here means 24 hours 66 | * `find -mtime +7` print files that were modified more than seven days back in the current directory 67 | * `find -daystart -type f -mtime -1` files that were modified from the beginning of the day (not past 24 hours) 68 | * `find -size +10k` print files with size greater than 10 kilobytes in the current directory 69 | * `find -size -1M` print files with size less than 1 megabyte in the current directory 70 | * `find -size 2G` print files of size 2 gigabytes in the current directory 71 | 72 | Passing filtered files as input to other commands 73 | 74 | * `find report -name '*log*' -exec rm {} \;` delete all files whose names contain log in the report folder and its sub-folders 75 | * here the `rm` command is called for every file matching the search conditions 76 | * since `;` is a special character for the shell, it needs to be escaped using `\` 77 | * `find report -name '*log*' -delete` delete all files whose names contain log in the report folder and its sub-folders 78 | * `find -name '*.txt' -exec wc {} +` the files ending with txt are all passed together as arguments to the `wc` command instead of executing `wc` for every file 79 | * no need to escape the `+` character in this case 80 | * also note that the specified command may be invoked more than once if the number of files found is too
large 81 | * `find -name '*.log' -exec mv {} ../log/ \;` move files ending with .log to the log directory present one hierarchy above. `mv` is executed once for each filtered file 82 | * `find -name '*.log' -exec mv -t ../log/ {} +` the `-t` option allows specifying the target directory and then providing multiple files to be moved as arguments 83 | * Similarly, one can use `-t` for the `cp` command 84 | 85 | **Further Reading** 86 | 87 | * [using find](http://mywiki.wooledge.org/UsingFind) 88 | * [find examples on SO](https://stackoverflow.com/documentation/bash/566/find#t=201612140534548263961) 89 | * [Collection of find examples](http://alvinalexander.com/unix/edu/examples/find.shtml) 90 | * [find Q&A on unix stackexchange](https://unix.stackexchange.com/questions/tagged/find?sort=votes&pageSize=15) 91 | * [find and tar example](https://unix.stackexchange.com/questions/282762/find-mtime-1-print-xargs-tar-archives-all-files-from-directory-ignoring-t/282885#282885) 92 | * [find Q&A on stackoverflow](https://stackoverflow.com/questions/tagged/find?sort=votes&pageSize=15) 93 | * [Why is looping over find's output bad practice?](https://unix.stackexchange.com/questions/321697/why-is-looping-over-finds-output-bad-practice) 94 | 95 | 96 | <br>
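One more idiom worth knowing, related to the last link above: filenames can contain spaces and other special characters, so when piping `find` output to another command, prefer the NUL-separated form (a sketch, not from the original examples):

```bash
$ # -print0 and xargs -0 handle filenames with spaces/newlines safely
$ find -name '*.txt' -print0 | xargs -0 wc -l

$ # or stick to -exec, which needs no such precaution
$ find -name '*.txt' -exec wc -l {} +
```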
97 | 98 | ## locate 99 | 100 | ```bash 101 | $ locate --version | head -n1 102 | mlocate 0.26 103 | 104 | $ man locate 105 | locate(1) General Commands Manual locate(1) 106 | 107 | NAME 108 | locate - find files by name 109 | 110 | SYNOPSIS 111 | locate [OPTION]... PATTERN... 112 | 113 | DESCRIPTION 114 | locate reads one or more databases prepared by updatedb(8) and writes 115 | file names matching at least one of the PATTERNs to standard output, 116 | one per line. 117 | 118 | If --regex is not specified, PATTERNs can contain globbing characters. 119 | If any PATTERN contains no globbing characters, locate behaves as if 120 | the pattern were *PATTERN*. 121 | ... 122 | ``` 123 | 124 | A faster alternative to the `find` command when searching for a file by its name. It is based on a database, which gets updated by a `cron` job, so newer files may not be present in the results. Use this command if it is available in your distro and you remember some part of the filename. It is very useful when the entire filesystem has to be searched, in which case `find` might take a very long time compared to `locate` 125 | 126 | **Examples** 127 | 128 | * `locate 'power'` print the path of files containing power in the whole filesystem 129 | * matches anywhere in the path, ex: '/home/learnbyexample/lowpower_adder/result.log' and '/home/learnbyexample/power.log' are both valid matches 130 | * implicitly, `locate` would change the string to `*power*` as no globbing characters are present in the specified string 131 | * `locate -b '\power.log'` print paths matching the string power.log exactly at the end of the path 132 | * '/home/learnbyexample/power.log' matches but not '/home/learnbyexample/lowpower.log' 133 | * since the globbing character `\` is used in the search string, it doesn't get implicitly replaced by `*power.log*` 134 | * `locate -b '\proj_adder'` the `-b` option also comes in handy to print only the path of a directory name, otherwise every file under that folder would also be displayed 135 | *
[find vs locate - pros and cons](https://unix.stackexchange.com/questions/60205/locate-vs-find-usage-pros-and-cons-of-each-other) 136 | 137 | 138 | --------------------------------------------------------------------------------