├── README.md
├── exercises
│   ├── GNU_grep
│   │   ├── .ref_solutions
│   │   │   ├── ex01_basic_match.txt
│   │   │   ├── ex02_basic_options.txt
│   │   │   ├── ex03_multiple_string_match.txt
│   │   │   ├── ex04_filenames.txt
│   │   │   ├── ex05_word_line_matching.txt
│   │   │   ├── ex06_ABC_context_matching.txt
│   │   │   ├── ex07_recursive_search.txt
│   │   │   ├── ex08_search_pattern_from_file.txt
│   │   │   ├── ex09_regex_anchors.txt
│   │   │   ├── ex10_regex_this_or_that.txt
│   │   │   ├── ex11_regex_quantifiers.txt
│   │   │   ├── ex12_regex_character_class_part1.txt
│   │   │   ├── ex13_regex_character_class_part2.txt
│   │   │   ├── ex14_regex_grouping_and_backreference.txt
│   │   │   ├── ex15_regex_PCRE.txt
│   │   │   └── ex16_misc_and_extras.txt
│   │   ├── ex01_basic_match.txt
│   │   ├── ex01_basic_match
│   │   │   └── sample.txt
│   │   ├── ex02_basic_options.txt
│   │   ├── ex02_basic_options
│   │   │   └── sample.txt
│   │   ├── ex03_multiple_string_match.txt
│   │   ├── ex03_multiple_string_match
│   │   │   └── sample.txt
│   │   ├── ex04_filenames.txt
│   │   ├── ex04_filenames
│   │   │   ├── greeting.txt
│   │   │   ├── poem.txt
│   │   │   └── sample.txt
│   │   ├── ex05_word_line_matching.txt
│   │   ├── ex05_word_line_matching
│   │   │   ├── greeting.txt
│   │   │   ├── sample.txt
│   │   │   └── words.txt
│   │   ├── ex06_ABC_context_matching.txt
│   │   ├── ex06_ABC_context_matching
│   │   │   └── sample.txt
│   │   ├── ex07_recursive_search.txt
│   │   ├── ex07_recursive_search
│   │   │   ├── msg
│   │   │   │   ├── greeting.txt
│   │   │   │   └── sample.txt
│   │   │   ├── poem.txt
│   │   │   ├── progs
│   │   │   │   ├── hello.py
│   │   │   │   └── hello.sh
│   │   │   └── words.txt
│   │   ├── ex08_search_pattern_from_file.txt
│   │   ├── ex08_search_pattern_from_file
│   │   │   ├── baz.txt
│   │   │   ├── foo.txt
│   │   │   └── words.txt
│   │   ├── ex09_regex_anchors.txt
│   │   ├── ex09_regex_anchors
│   │   │   └── sample.txt
│   │   ├── ex10_regex_this_or_that.txt
│   │   ├── ex10_regex_this_or_that
│   │   │   └── sample.txt
│   │   ├── ex11_regex_quantifiers.txt
│   │   ├── ex11_regex_quantifiers
│   │   │   └── garbled.txt
│   │   ├── ex12_regex_character_class_part1.txt
│   │   ├── ex12_regex_character_class_part1
│   │   │   └── sample_words.txt
│   │   ├── ex13_regex_character_class_part2.txt
│   │   ├── ex13_regex_character_class_part2
│   │   │   └── sample.txt
│   │   ├── ex14_regex_grouping_and_backreference.txt
│   │   ├── ex14_regex_grouping_and_backreference
│   │   │   └── sample.txt
│   │   ├── ex15_regex_PCRE.txt
│   │   ├── ex15_regex_PCRE
│   │   │   └── sample.txt
│   │   ├── ex16_misc_and_extras.txt
│   │   ├── ex16_misc_and_extras
│   │   │   ├── garbled.txt
│   │   │   ├── poem.txt
│   │   │   └── sample.txt
│   │   └── solve
│   └── README.md
├── file_attributes.md
├── gnu_awk.md
├── gnu_grep.md
├── gnu_sed.md
├── images
│   ├── color_option.png
│   ├── colordiff.png
│   ├── highlight_string_whole_file_op.png
│   └── wdiff_to_colordiff.png
├── miscellaneous.md
├── overview_presentation
│   ├── baz.json
│   ├── cli_text_processing.pdf
│   ├── foo.xml
│   ├── greeting.txt
│   └── sample.txt
├── perl_the_swiss_knife.md
├── restructure_text.md
├── ruby_one_liners.md
├── sorting_stuff.md
├── tail_less_cat_head.md
├── whats_the_difference.md
└── wheres_my_file.md
/README.md:
--------------------------------------------------------------------------------
1 | # Command Line Text Processing
2 |
3 | Learn about various commands available for common and exotic text processing needs. The examples have been tested on GNU/Linux; expect syntax/feature variations on other platforms, and consult the respective `man` pages for details.
4 |
5 | ---
6 |
7 | :warning: :warning: I'm no longer actively working on this repo. Instead, I've converted the existing chapters into ebooks (see the [ebook section](#ebooks) below for links), available under the same license. These ebooks are better formatted, updated for newer versions of the software, and include exercises, solutions, etc. Since all the chapters have been converted, I'm archiving this repo.
8 |
9 | ---
10 |
11 |
12 |
13 | ## Ebooks
14 |
15 | Individual online ebooks with better formatting, explanations, exercises, solutions, etc:
16 |
17 | * [CLI text processing with GNU grep and ripgrep](https://learnbyexample.github.io/learn_gnugrep_ripgrep/)
18 | * [CLI text processing with GNU sed](https://learnbyexample.github.io/learn_gnused/)
19 | * [CLI text processing with GNU awk](https://learnbyexample.github.io/learn_gnuawk/)
20 | * [Ruby One-Liners Guide](https://learnbyexample.github.io/learn_ruby_oneliners/)
21 | * [Perl One-Liners Guide](https://learnbyexample.github.io/learn_perl_oneliners/)
22 | * [CLI text processing with GNU Coreutils](https://learnbyexample.github.io/cli_text_processing_coreutils/)
23 | * [Linux Command Line Computing](https://learnbyexample.github.io/cli-computing/)
24 |
25 | See https://learnbyexample.github.io/books/ for links to PDF/EPUB versions and other ebooks.
26 |
27 |
28 |
29 | ## Chapters
30 |
31 | As mentioned earlier, I'm no longer actively working on these chapters:
32 |
33 | * [Cat, Less, Tail and Head](./tail_less_cat_head.md)
34 | * cat, less, tail, head, Text Editors
35 | * [GNU grep](./gnu_grep.md)
36 | * [GNU sed](./gnu_sed.md)
37 | * [GNU awk](./gnu_awk.md)
38 | * [Perl the swiss knife](./perl_the_swiss_knife.md)
39 | * [Ruby one liners](./ruby_one_liners.md)
40 | * [Sorting stuff](./sorting_stuff.md)
41 | * sort, uniq, comm, shuf
42 | * [Restructure text](./restructure_text.md)
43 | * paste, column, pr, fold
44 | * [Whats the difference](./whats_the_difference.md)
45 | * cmp, diff
46 | * [Wheres my file](./wheres_my_file.md)
47 | * [File attributes](./file_attributes.md)
48 | * wc, du, df, touch, file
49 | * [Miscellaneous](./miscellaneous.md)
50 | * cut, tr, basename, dirname, xargs, seq
51 |
52 |
53 |
54 | ## Webinar recordings
55 |
56 | I recorded a couple of videos based on content in the chapters; not sure if I'll do more:
57 |
58 | * [Using the sort command](https://www.youtube.com/watch?v=qLfAwwb5vGs)
59 | * [Using uniq and comm](https://www.youtube.com/watch?v=uAb2kxA2TyQ)
60 |
61 | See also my short videos on [Linux command line tips](https://www.youtube.com/watch?v=p0KCLusMd5Q&list=PLTv2U3HnAL4PNTmRqZBSUgKaiHbRL2zeY)
62 |
63 |
64 |
65 | ## Exercises
66 |
67 | Check out the [exercises](./exercises) directory to solve practice questions on `grep`, right from the command line itself.
68 |
69 | See also my [TUI-apps](https://github.com/learnbyexample/TUI-apps) repo for interactive CLI text processing exercises.
70 |
71 |
72 |
73 | ## Contributing
74 |
75 | * Please [open an issue](https://github.com/learnbyexample/Command-line-text-processing/issues) for typos or bugs
76 | * As this repo is no longer actively worked upon, **please do not submit pull requests**
77 | * Share the repo with friends/colleagues, on social media, etc., to help reach other learners
78 | * In case you need to reach me, mail me at `echo 'yrneaolrknzcyr.arg@tznvy.pbz' | tr 'a-z' 'n-za-m'` or send a DM via [twitter](https://twitter.com/learn_byexample)
79 |
80 |
81 |
82 | ## Acknowledgements
83 |
84 | * [unix.stackexchange](https://unix.stackexchange.com/) and [stackoverflow](https://stackoverflow.com/) - for getting answers to pertinent questions as well as sharpening skills by understanding and answering questions
85 | * Forums like [Linux users](https://www.linkedin.com/groups/65688), [/r/commandline/](https://www.reddit.com/r/commandline/), [/r/linux/](https://www.reddit.com/r/linux/), [/r/ruby/](https://www.reddit.com/r/ruby/), [news.ycombinator](https://news.ycombinator.com/news), [devup](http://devup.in/) and others for valuable feedback (especially spotting mistakes) and encouragement
86 | * See [wikipedia entry 'Roses Are Red'](https://en.wikipedia.org/wiki/Roses_Are_Red) for `poem.txt` used as sample text input file
87 |
88 |
89 |
90 | ## License
91 |
92 | This work is licensed under a [Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License](https://creativecommons.org/licenses/by-nc-sa/4.0/)
93 |
--------------------------------------------------------------------------------
/exercises/GNU_grep/.ref_solutions/ex01_basic_match.txt:
--------------------------------------------------------------------------------
1 | 1) Match lines containing the string: day
2 | Solution: grep 'day' sample.txt
3 |
4 | 2) Match lines containing the string: it
5 | Solution: grep 'it' sample.txt
6 |
7 | 3) Match lines containing the string: do you
8 | Solution: grep 'do you' sample.txt
9 |
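For reference, the solutions above can be checked without the exercise directory; this self-contained sketch recreates sample.txt (contents taken from ex01_basic_match/sample.txt) and runs the first solution:

```shell
cd "$(mktemp -d)"    # scratch directory so nothing is overwritten

# recreate the exercise input
cat > sample.txt <<'EOF'
Hello World!

Good day
How do you do?

Just do it
Believe it!

Today is sunny
Not a bit funny
No doubt you like it too

Much ado about nothing
He he he
EOF

grep 'day' sample.txt    # matches "day" anywhere, including inside "Today"
# Good day
# Today is sunny
```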
--------------------------------------------------------------------------------
/exercises/GNU_grep/.ref_solutions/ex02_basic_options.txt:
--------------------------------------------------------------------------------
1 | 1) Match lines containing the string irrespective of lower/upper case: no
2 | Solution: grep -i 'no' sample.txt
3 |
4 | 2) Match lines not containing the string: o
5 | Solution: grep -v 'o' sample.txt
6 |
7 | 3) Match lines with line numbers containing the string: it
8 | Solution: grep -n 'it' sample.txt
9 |
10 | 4) Output only number of matching lines containing the string: a
11 | Solution: grep -c 'a' sample.txt
12 |
13 | 5) Match first two lines containing the string: do
14 | Solution: grep -m2 'do' sample.txt
15 |
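A quick illustration of the counting and match-limiting options, using input lines excerpted from this exercise's sample.txt:

```shell
# -c prints only the number of matching lines, not the lines themselves
printf '%s\n' 'Good day' 'Hello World!' | grep -c 'a'
# 1

# -m2 stops reading after the first two matching lines
printf '%s\n' 'How do you do?' 'Just do it' 'No doubt you like it too' |
    grep -m2 'do'
# How do you do?
# Just do it
```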
--------------------------------------------------------------------------------
/exercises/GNU_grep/.ref_solutions/ex03_multiple_string_match.txt:
--------------------------------------------------------------------------------
1 | 1) Match lines containing either of these three strings
2 | String1: Not
3 | String2: he
4 | String3: sun
5 | Solution: grep -e 'Not' -e 'he' -e 'sun' sample.txt
6 |
7 | 2) Match lines containing both these strings
8 | String1: He
9 | String2: or
10 | Solution: grep 'He' sample.txt | grep 'or'
11 |
12 | 3) Match lines containing either of these two strings
13 | String1: a
14 | String2: i
15 | and contains this as well
16 | String3: do
17 | Solution: grep -e 'a' -e 'i' sample.txt | grep 'do'
18 |
19 | 4) Match lines containing the string
20 | String1: it
21 | but not these strings
22 | String2: No
23 | String3: no
24 | Solution: grep 'it' sample.txt | grep -vi 'no'
25 |
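The pipe-based AND/NOT idiom used in the last two solutions can be seen in miniature (input lines excerpted from this exercise's sample.txt):

```shell
# keep lines containing "it", then drop lines containing "no" in any case
printf '%s\n' 'Just do it' 'Believe it!' 'Not a bit funny' 'No doubt you like it too' |
    grep 'it' | grep -vi 'no'
# Just do it
# Believe it!
```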
--------------------------------------------------------------------------------
/exercises/GNU_grep/.ref_solutions/ex04_filenames.txt:
--------------------------------------------------------------------------------
1 | Note: All files present in the directory should be given as file inputs to grep
2 |
3 | 1) Show only filenames containing the string: are
4 | Solution: grep -l 'are' *
5 |
6 | 2) Show only filenames NOT containing the string: two
7 | Solution: grep -L 'two' *
8 |
9 | 3) Match all lines containing the string: are
10 | Solution: grep 'are' *
11 |
12 | 4) Match maximum of two matching lines along with filenames containing the character: a
13 | Solution: grep -m2 'a' *
14 |
15 | 5) Match all lines without prefixing filename containing the string: to
16 | Solution: grep -h 'to' *
17 |
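A self-contained demo of -l versus -L, with trimmed-down stand-ins for this exercise's three files:

```shell
cd "$(mktemp -d)"    # scratch directory for the demo files

printf 'Hi, how are you?\n' > greeting.txt
printf 'Roses are red,\nAnd so are you.\n' > poem.txt
printf 'Good day\nJust do it\n' > sample.txt

grep -l 'are' *    # only names of files with at least one match
# greeting.txt
# poem.txt
grep -L 'are' *    # only names of files with no match at all
# sample.txt
```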
--------------------------------------------------------------------------------
/exercises/GNU_grep/.ref_solutions/ex05_word_line_matching.txt:
--------------------------------------------------------------------------------
1 | Note: All files present in the directory should be given as file inputs to grep
2 |
3 | 1) Match lines containing whole word: do
4 | Solution: grep -w 'do' *
5 |
6 | 2) Match whole lines containing the string: Hello World
7 | Solution: grep -x 'Hello World' *
8 |
9 | 3) Match lines containing these whole words:
10 | Word1: He
11 | Word2: far
12 | Solution: grep -w -e 'far' -e 'He' *
13 |
14 | 4) Match lines containing the whole word: you
15 | and NOT containing the case insensitive string: How
16 | Solution: grep -w 'you' * | grep -vi 'how'
17 |
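The difference between whole-word (-w) and whole-line (-x) matching in miniature:

```shell
# -w: "do" must be a whole word, so "doubt" and "ado" don't count
printf '%s\n' 'Just do it' 'No doubt' 'Much ado' | grep -w 'do'
# Just do it

# -x: the entire line must equal the pattern, so the trailing ! disqualifies
printf '%s\n' 'Hello World' 'Hello World!' | grep -x 'Hello World'
# Hello World
```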
--------------------------------------------------------------------------------
/exercises/GNU_grep/.ref_solutions/ex06_ABC_context_matching.txt:
--------------------------------------------------------------------------------
1 | 1) Get lines and 3 following it containing the string: you
2 | Solution: grep -A3 'you' sample.txt
3 |
4 | 2) Get lines and 2 preceding it containing the string: is
5 | Solution: grep -B2 'is' sample.txt
6 |
7 | 3) Get lines and 1 following/preceding containing the string: Not
8 | Solution: grep -C1 'Not' sample.txt
9 |
10 | 4) Get lines and 1 following and 4 preceding containing the string: Not
11 | Solution: grep -A1 -B4 'Not' sample.txt
12 |
13 | 5) Get lines and 1 preceding it containing the string: you
14 | there should be no separator between the matches
15 | Solution: grep --no-group-separator -B1 'you' sample.txt
16 |
17 | 6) Get lines and 1 preceding it containing the string: you
18 | the separator between the matches should be: #####
19 | Solution: grep --group-separator='#####' -B1 'you' sample.txt
20 |
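Context matching and group separators, sketched with a pared-down version of this exercise's input:

```shell
cd "$(mktemp -d)"
cat > sample.txt <<'EOF'
Good day
How do you do?
Just do it
Not a bit funny
No doubt you like it too
EOF

grep -B1 'you' sample.txt    # each match plus one line before it
# Good day
# How do you do?
# --
# Not a bit funny
# No doubt you like it too

grep --no-group-separator -B1 'you' sample.txt   # same, without the -- line
```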
--------------------------------------------------------------------------------
/exercises/GNU_grep/.ref_solutions/ex07_recursive_search.txt:
--------------------------------------------------------------------------------
1 | Note: Every file in this directory and sub-directories is input for grep, unless otherwise specified
2 |
3 | 1) Match all lines containing the string: you
4 | Solution: grep -r 'you'
5 |
6 | 2) Show only filenames matching the string: Hello
7 | filenames should only end with .txt
8 | Solution: grep -rl --include='*.txt' 'Hello'
9 |
10 | 3) Show only filenames matching the string: Hello
11 | filenames should NOT end with .txt
12 | Solution: grep -rl --exclude='*.txt' 'Hello'
13 |
14 | 4) Show only filenames matching the string: are
15 | should not include the directory: progs
16 | Solution: grep -rl --exclude-dir='progs' 'are'
17 |
18 | 5) Show only filenames matching the string: are
19 | should NOT include these directories
20 | dir1: progs
21 | dir2: msg
22 | Solution: grep -rl --exclude-dir='progs' --exclude-dir='msg' 'are'
23 |
24 | 6) Show only filenames matching the string: are
25 | should include files only from sub-directories
26 | hint: use shell glob pattern to specify directories to search
27 | Solution: grep -rl 'are' */
28 |
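A minimal recursive-search sketch; it builds a tiny tree resembling this exercise's layout (the real files differ) and passes `.` explicitly, which is equivalent to the bare `grep -r` used in the solutions:

```shell
cd "$(mktemp -d)"    # build a tiny directory tree for the demo
mkdir -p msg progs
printf 'Hi, how are you?\n' > msg/greeting.txt
printf 'echo hello\n' > progs/hello.sh
printf 'Roses are red,\n' > poem.txt

grep -rl 'are' .                       # recursive search from the current directory
grep -rl --exclude-dir='msg' 'are' .   # same, but skip the msg directory
# ./poem.txt
```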
--------------------------------------------------------------------------------
/exercises/GNU_grep/.ref_solutions/ex08_search_pattern_from_file.txt:
--------------------------------------------------------------------------------
1 | Note: words.txt contains only whole words, one per line; use it as the pattern file when the task is to match whole words
2 |
3 | 1) Match all strings from file words.txt in file baz.txt
4 | Solution: grep -f words.txt baz.txt
5 |
6 | 2) Match all words from file words.txt in file foo.txt
7 | should only match whole words
8 | should print only matching words, not entire line
9 | Solution: grep -owf words.txt foo.txt
10 |
11 | 3) Show common lines between foo.txt and baz.txt
12 | Solution: grep -Fxf foo.txt baz.txt
13 |
14 | 4) Show lines present in baz.txt but not in foo.txt
15 | Solution: grep -Fxvf foo.txt baz.txt
16 |
17 | 5) Show lines present in foo.txt but not in baz.txt
18 | Solution: grep -Fxvf baz.txt foo.txt
19 |
20 | 6) Find all words common between all three files in the directory
21 | should only match whole words
22 | should print only matching words, not entire line
23 | Solution: grep -owf words.txt foo.txt | grep -owf- baz.txt
24 |
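The set-operation idiom behind solutions 3–5 can be tried with hypothetical stand-ins for foo.txt and baz.txt (the actual exercise files differ):

```shell
cd "$(mktemp -d)"
printf '%s\n' 'apple' 'banana' 'cherry' > foo.txt
printf '%s\n' 'banana' 'dates' 'cherry' > baz.txt

# -F fixed strings, -x whole lines, -f patterns from a file
grep -Fxf foo.txt baz.txt    # lines common to both files
# banana
# cherry
grep -Fxvf foo.txt baz.txt   # lines in baz.txt but not in foo.txt
# dates
```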
--------------------------------------------------------------------------------
/exercises/GNU_grep/.ref_solutions/ex09_regex_anchors.txt:
--------------------------------------------------------------------------------
1 | 1) Match all lines starting with: no
2 | Solution: grep '^no' sample.txt
3 |
4 | 2) Match all lines ending with: it
5 | Solution: grep 'it$' sample.txt
6 |
7 | 3) Match all lines containing whole word: do
8 | Solution: grep -w 'do' sample.txt
9 |
10 | 4) Match all lines containing words starting with: do
11 | Solution: grep '\<do' sample.txt
12 |
13 | 5) Match all lines containing words ending with: it
14 | Solution: grep 'it\>' sample.txt
15 |
16 | 6) Match all lines starting with: ^
17 | Solution: grep '^^' sample.txt
18 |
19 | 7) Match all lines ending with: $
20 | Solution: grep '$$' sample.txt
21 |
22 | 8) Match all lines containing the string: in
23 | not surrounded by word boundaries, for ex: mint but not tin or ink
24 | Solution: grep '\Bin\B' sample.txt
25 |
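The word-boundary anchors in action on a few sample lines:

```shell
# \< anchors the start of a word: "do" and "doubt" qualify, "ado" does not
printf '%s\n' 'How do you do?' 'No doubt' 'Much ado' | grep '\<do'
# How do you do?
# No doubt

# \B is the opposite of a word boundary: "in" must sit inside a longer word
printf '%s\n' 'mint' 'tin' 'ink' | grep '\Bin\B'
# mint
```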
--------------------------------------------------------------------------------
/exercises/GNU_grep/.ref_solutions/ex10_regex_this_or_that.txt:
--------------------------------------------------------------------------------
1 | 1) Match all lines containing any of these strings:
2 | String1: day
3 | String2: not
4 | Solution: grep -E 'day|not' sample.txt
5 |
6 | 2) Match all lines containing any of these whole words:
7 | String1: he
8 | String2: in
9 | Solution: grep -wE 'he|in' sample.txt
10 |
11 | 3) Match all lines containing any of these strings:
12 | String1: you
13 | String2: be
14 | String3: to
15 | String4: he
16 | Solution: grep -E 'he|be|to|you' sample.txt
17 |
18 | 4) Match all lines containing any of these strings:
19 | String1: you
20 | String2: be
21 | String3: to
22 | String4: he
23 | but NOT these strings:
24 | String1: it
25 | String2: do
26 | Solution: grep -E 'he|be|to|you' sample.txt | grep -vE 'do|it'
27 |
28 | 5) Match all lines starting with any of these strings:
29 | String1: no
30 | String2: to
31 | Solution: grep -E '^no|^to' sample.txt
32 |
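ERE alternation in miniature, with input lines excerpted from this exercise's sample.txt:

```shell
# either "day" or "not" anywhere in the line ("Today" and "nothing" count too)
printf '%s\n' 'Good day' 'Today is sunny' 'Much ado about nothing' 'Just do it' |
    grep -E 'day|not'
# Good day
# Today is sunny
# Much ado about nothing
```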
--------------------------------------------------------------------------------
/exercises/GNU_grep/.ref_solutions/ex11_regex_quantifiers.txt:
--------------------------------------------------------------------------------
1 | 1) Extract all 3 character strings surrounded by word boundaries
2 | Solution: grep -ow '...' garbled.txt
3 |
4 | 2) Extract largest string from each line
5 | starting with character: d
6 | ending with character : g
7 | Solution: grep -o 'd.*g' garbled.txt
8 |
9 | 3) Extract all strings from each line
10 | starting with character: d
11 | followed by zero or one: o
12 | ending with character : g
13 | Solution: grep -oE 'do?g' garbled.txt
14 |
15 | 4) Extract all strings from each line
16 | starting with character: d
17 | followed by zero or one of any character
18 | ending with character : g
19 | Solution: grep -oE 'd.?g' garbled.txt
20 |
21 | 5) Extract all strings from each line
22 | starting with character: g
23 | followed by at least one: o
24 | ending with character : d
25 | Solution: grep -oE 'go+d' garbled.txt
26 |
27 | 6) Extract all strings from each line
28 | starting with character : g
29 | followed by exactly six: o
30 | ending with character : d
31 | Solution: grep -oE 'go{6}d' garbled.txt
32 |
33 | 7) Extract all strings from each line
34 | starting with character : g
35 | followed by min two and max four: o
36 | ending with character : d
37 | Solution: grep -oE 'go{2,4}d' garbled.txt
38 |
39 | 8) Extract all strings from each line
40 | starting with character: d
41 | followed by max of two : o
42 | ending with character : g
43 | Solution: grep -oE 'do{,2}g' garbled.txt
44 |
45 | 9) Extract all strings from each line
46 | starting with character : g
47 | followed by min of three: o
48 | ending with character : d
49 | Solution: grep -oE 'go{3,}d' garbled.txt
50 |
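A sketch of the `?` and `{m,n}` quantifiers; the one-line input here is a made-up stand-in, not the garbled.txt shipped with the exercise:

```shell
cd "$(mktemp -d)"
printf 'dg dog doog gd god good gooood\n' > garbled.txt   # hypothetical input

grep -oE 'do?g' garbled.txt      # d, then zero or one o, then g
# dg
# dog
grep -oE 'go{2,4}d' garbled.txt  # g, then two to four o, then d
# good
# gooood
```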
--------------------------------------------------------------------------------
/exercises/GNU_grep/.ref_solutions/ex12_regex_character_class_part1.txt:
--------------------------------------------------------------------------------
1 | 1) Match all lines containing any of these characters:
2 | character1: q
3 | character2: x
4 | character3: z
5 | Solution: grep '[qzx]' sample_words.txt
6 |
7 | 2) Match all lines containing any of these characters:
8 | character1: c
9 | character2: f
10 | followed by any character
11 | followed by : t
12 | Solution: grep '[cf].t' sample_words.txt
13 |
14 | 3) Extract all words starting with character: s
15 | ignore case
16 | should contain only alphabets
17 | minimum two letters
18 | should be surrounded by word boundaries
19 | Solution: grep -iowE 's[a-z]+' sample_words.txt
20 |
21 | 4) Extract all words made up of these characters:
22 | character1: a
23 | character2: c
24 | character3: e
25 | character4: r
26 | character5: s
27 | ignore case
28 | should contain only alphabets
29 | should be surrounded by word boundaries
30 | Solution: grep -iowE '[acers]+' sample_words.txt
31 |
32 | 5) Extract all numbers surrounded by word boundaries
33 | Solution: grep -ow '[0-9]*' sample_words.txt
34 |
35 | 6) Extract all numbers surrounded by word boundaries matching the condition
36 | 30 <= number <= 70
37 | Solution: grep -owE '[3-6][0-9]|70' sample_words.txt
38 |
39 | 7) Extract all words made up of non-vowel characters
40 | ignore case
41 | should contain only alphabets and at least two
42 | should be surrounded by word boundaries
43 | Solution: grep -iowE '[b-df-hj-np-tv-z]{2,}' sample_words.txt
44 |
45 | 8) Extract all sequence of strings consisting of character: -
46 | surrounded on either side by zero or more case insensitive alphabets
47 | Solution: grep -io '[a-z]*-[a-z]*' sample_words.txt
48 |
--------------------------------------------------------------------------------
/exercises/GNU_grep/.ref_solutions/ex13_regex_character_class_part2.txt:
--------------------------------------------------------------------------------
1 | 1) Extract all characters before first occurrence of =
2 | Solution: grep -o '^[^=]*' sample.txt
3 |
4 | 2) Extract all characters from start of line made up of these characters
5 | upper or lower case alphabets
6 | all digits
7 | the underscore character
8 | Solution: grep -o '^\w*' sample.txt
9 |
10 | 3) Match all lines containing the sequence
11 | String1: there
12 | any number of whitespace
13 | String2: have
14 | Solution: grep 'there\s*have' sample.txt
15 |
16 | 4) Extract all characters from start of line made up of these characters
17 | upper or lower case alphabets
18 | all digits
19 | the characters [ and ]
20 | ending with ]
21 | Solution: grep -oi '^[]a-z0-9[]*]' sample.txt
22 |
23 | 5) Extract all punctuation characters from first line
24 | Solution: grep -om1 '[[:punct:]]' sample.txt
25 |
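Negated classes versus `\w` on a made-up input line (the exercise's actual sample.txt differs):

```shell
cd "$(mktemp -d)"
printf 'foo-bar=42\n' > sample.txt   # hypothetical input

grep -o '^[^=]*' sample.txt   # everything before the first =
# foo-bar
grep -o '^\w*' sample.txt     # leading run of word characters ([A-Za-z0-9_])
# foo
```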
--------------------------------------------------------------------------------
/exercises/GNU_grep/.ref_solutions/ex14_regex_grouping_and_backreference.txt:
--------------------------------------------------------------------------------
1 | 1) Match lines containing these strings
2 | String1: scare
3 | String2: spore
4 | Solution: grep -E 's(po|ca)re' sample.txt
5 |
6 | 2) Extract these words
7 | Word1: handy
8 | Word2: hand
9 | Word3: hands
10 | Word4: handful
11 | Solution: grep -oE 'hand([sy]|ful)?' sample.txt
12 |
13 | 3) Extract all whole words with at least one letter occurring twice in the word
14 | ignore case
15 | only alphabets
16 | the letter occurring twice need not be placed next to each other
17 | Solution: grep -ioE '[a-z]*([a-z])[a-z]*\1[a-z]*' sample.txt
18 |
19 | 4) Match lines where same sequence of three consecutive alphabets is matched another time in the same line
20 | ignore case
21 | Solution: grep -iE '([a-z]{3}).*\1' sample.txt
22 |
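Grouping and backreferences in miniature (the "hand" words are from this exercise; the backreference input is made up):

```shell
# optional group: "hand" plus an optional s, y, or ful suffix
printf 'hand hands handy handful handed\n' | grep -oE 'hand([sy]|ful)?'
# hand
# hands
# handy
# handful
# hand

# \1 repeats whatever the group captured, so a two-letter sequence must recur
printf '%s\n' 'papa' 'mama' 'park' | grep -E '([a-z]{2})\1'
# papa
# mama
```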
--------------------------------------------------------------------------------
/exercises/GNU_grep/.ref_solutions/ex15_regex_PCRE.txt:
--------------------------------------------------------------------------------
1 | 1) Extract all strings to the right of =
2 | provided characters from start of line until = do not include [ or ]
3 | Solution: grep -oP '^[^][=]+=\K.*' sample.txt
4 |
5 | 2) Match all lines containing the string: Hi
6 | but shouldn't be followed afterwards in the line by: are
7 | Solution: grep -P 'Hi(?!.*are)' sample.txt
8 |
9 | 3) Extract from start of line up to the string: Hi
10 | provided it is followed afterwards in the line by: you
11 | Solution: grep -oP '.*Hi(?=.*you)' sample.txt
12 |
13 | 4) Extract all sequence of characters surrounded on both sides by space character
14 | the space character should not be part of output
15 | Solution: grep -oP ' \K[^ ]+(?= )' sample.txt
16 |
17 | 5) Extract all words
18 | made of upper or lower case alphabets
19 | at least two letters in length
20 | surrounded by word boundaries
21 | should not contain consecutive repeated alphabets
22 | Solution: grep -iowP '[a-z]*([a-z])\1[a-z]*(*SKIP)(*F)|[a-z]{2,}' sample.txt
23 |
24 |
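Two PCRE features used above, sketched on made-up input (requires a grep built with PCRE support):

```shell
# \K discards everything matched so far from the reported match
printf 'name=value\n' | grep -oP '^[^=]+=\K.*'
# value

# negative lookahead: Hi matches only when "are" does not appear later in the line
printf '%s\n' 'Hi there, how are you' 'Hi and bye' | grep -P 'Hi(?!.*are)'
# Hi and bye
```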
--------------------------------------------------------------------------------
/exercises/GNU_grep/.ref_solutions/ex16_misc_and_extras.txt:
--------------------------------------------------------------------------------
1 | Note: all files in directory are input to grep, unless otherwise specified
2 |
3 | 1) Extract all negative numbers
4 | starts with - followed by one or more digits
5 | do not output filenames
6 | Solution: grep -hoE -- '-[0-9]+' *
7 |
8 | 2) Display only filenames containing these two strings anywhere in the file
9 | String1: day
10 | String2: and
11 | Solution: grep -zlE 'day.*and|and.*day' *
12 |
13 | 3) The below command
14 | grep -c '^Solution:' ../.ref_solutions/*
15 | will give number of questions in each exercise. Change it, using another command and pipe if needed, so that only overall total is printed
16 | Solution: cat ../.ref_solutions/* | grep -c '^Solution:'
17 |
18 |
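The `--` trick from the first solution, shown on a made-up input line:

```shell
# -- ends option processing, so a pattern starting with - is not read as an option
printf 'low of -4, high of -12\n' | grep -oE -- '-[0-9]+'
# -4
# -12
```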
--------------------------------------------------------------------------------
/exercises/GNU_grep/ex01_basic_match.txt:
--------------------------------------------------------------------------------
1 | 1) Match lines containing the string: day
2 |
3 |
4 | 2) Match lines containing the string: it
5 |
6 |
7 | 3) Match lines containing the string: do you
8 |
9 |
--------------------------------------------------------------------------------
/exercises/GNU_grep/ex01_basic_match/sample.txt:
--------------------------------------------------------------------------------
1 | Hello World!
2 |
3 | Good day
4 | How do you do?
5 |
6 | Just do it
7 | Believe it!
8 |
9 | Today is sunny
10 | Not a bit funny
11 | No doubt you like it too
12 |
13 | Much ado about nothing
14 | He he he
15 |
--------------------------------------------------------------------------------
/exercises/GNU_grep/ex02_basic_options.txt:
--------------------------------------------------------------------------------
1 | 1) Match lines containing the string irrespective of lower/upper case: no
2 |
3 |
4 | 2) Match lines not containing the string: o
5 |
6 |
7 | 3) Match lines with line numbers containing the string: it
8 |
9 |
10 | 4) Output only number of matching lines containing the string: a
11 |
12 |
13 | 5) Match first two lines containing the string: do
14 |
15 |
--------------------------------------------------------------------------------
/exercises/GNU_grep/ex02_basic_options/sample.txt:
--------------------------------------------------------------------------------
1 | Hello World!
2 |
3 | Good day
4 | How do you do?
5 |
6 | Just do it
7 | Believe it!
8 |
9 | Today is sunny
10 | Not a bit funny
11 | No doubt you like it too
12 |
13 | Much ado about nothing
14 | He he he
15 |
--------------------------------------------------------------------------------
/exercises/GNU_grep/ex03_multiple_string_match.txt:
--------------------------------------------------------------------------------
1 | 1) Match lines containing either of these three strings
2 | String1: Not
3 | String2: he
4 | String3: sun
5 |
6 |
7 | 2) Match lines containing both these strings
8 | String1: He
9 | String2: or
10 |
11 |
12 | 3) Match lines containing either of these two strings
13 | String1: a
14 | String2: i
15 | and contains this as well
16 | String3: do
17 |
18 |
19 | 4) Match lines containing the string
20 | String1: it
21 | but not these strings
22 | String2: No
23 | String3: no
24 |
25 |
--------------------------------------------------------------------------------
/exercises/GNU_grep/ex03_multiple_string_match/sample.txt:
--------------------------------------------------------------------------------
1 | Hello World!
2 |
3 | Good day
4 | How do you do?
5 |
6 | Just do it
7 | Believe it!
8 |
9 | Today is sunny
10 | Not a bit funny
11 | No doubt you like it too
12 |
13 | Much ado about nothing
14 | He he he
15 |
--------------------------------------------------------------------------------
/exercises/GNU_grep/ex04_filenames.txt:
--------------------------------------------------------------------------------
1 | Note: All files present in the directory should be given as file inputs to grep
2 |
3 | 1) Show only filenames containing the string: are
4 |
5 |
6 | 2) Show only filenames NOT containing the string: two
7 |
8 |
9 | 3) Match all lines containing the string: are
10 |
11 |
12 | 4) Match maximum of two matching lines along with filenames containing the character: a
13 |
14 |
15 | 5) Match all lines without prefixing filename containing the string: to
16 |
17 |
--------------------------------------------------------------------------------
/exercises/GNU_grep/ex04_filenames/greeting.txt:
--------------------------------------------------------------------------------
1 | Hi, how are you?
2 |
3 | Hola :)
4 |
5 | Hello world
6 |
7 | Good day
8 |
9 | Rock on
10 |
--------------------------------------------------------------------------------
/exercises/GNU_grep/ex04_filenames/poem.txt:
--------------------------------------------------------------------------------
1 | Roses are red,
2 | Violets are blue,
3 | Sugar is sweet,
4 | And so are you.
5 |
--------------------------------------------------------------------------------
/exercises/GNU_grep/ex04_filenames/sample.txt:
--------------------------------------------------------------------------------
1 | Hello World!
2 |
3 | Good day
4 | How do you do?
5 |
6 | Just do it
7 | Believe it!
8 |
9 | Today is sunny
10 | Not a bit funny
11 | No doubt you like it too
12 |
13 | Much ado about nothing
14 | He he he
15 |
--------------------------------------------------------------------------------
/exercises/GNU_grep/ex05_word_line_matching.txt:
--------------------------------------------------------------------------------
1 | Note: All files present in the directory should be given as file inputs to grep
2 |
3 | 1) Match lines containing whole word: do
4 |
5 |
6 | 2) Match whole lines containing the string: Hello World
7 |
8 |
9 | 3) Match lines containing these whole words:
10 | Word1: He
11 | Word2: far
12 |
13 |
14 | 4) Match lines containing the whole word: you
15 | and NOT containing the case insensitive string: How
16 |
17 |
--------------------------------------------------------------------------------
/exercises/GNU_grep/ex05_word_line_matching/greeting.txt:
--------------------------------------------------------------------------------
1 | Hi, how are you?
2 |
3 | Hola :)
4 |
5 | Hello World
6 |
7 | Good day
8 |
9 | Rock on
10 |
--------------------------------------------------------------------------------
/exercises/GNU_grep/ex05_word_line_matching/sample.txt:
--------------------------------------------------------------------------------
1 | Hello World!
2 |
3 | Good day
4 | How do you do?
5 |
6 | Just do it
7 | Believe it!
8 |
9 | Today is sunny
10 | Not a bit funny
11 | No doubt you like it too
12 |
13 | Much ado about nothing
14 | He he he
15 |
--------------------------------------------------------------------------------
/exercises/GNU_grep/ex05_word_line_matching/words.txt:
--------------------------------------------------------------------------------
1 | afar
2 | far
3 | carfare
4 | farce
5 | faraway
6 | airfare
7 |
--------------------------------------------------------------------------------
/exercises/GNU_grep/ex06_ABC_context_matching.txt:
--------------------------------------------------------------------------------
1 | 1) Get lines and 3 following it containing the string: you
2 |
3 |
4 | 2) Get lines and 2 preceding it containing the string: is
5 |
6 |
7 | 3) Get lines and 1 following/preceding containing the string: Not
8 |
9 |
10 | 4) Get lines and 1 following and 4 preceding containing the string: Not
11 |
12 |
13 | 5) Get lines and 1 preceding it containing the string: you
14 | there should be no separator between the matches
15 |
16 |
17 | 6) Get lines and 1 preceding it containing the string: you
18 | the separator between the matches should be: #####
19 |
20 |
--------------------------------------------------------------------------------
/exercises/GNU_grep/ex06_ABC_context_matching/sample.txt:
--------------------------------------------------------------------------------
1 | Hello World!
2 |
3 | Good day
4 | How do you do?
5 |
6 | Just do it
7 | Believe it!
8 |
9 | Today is sunny
10 | Not a bit funny
11 | No doubt you like it too
12 |
13 | Much ado about nothing
14 | He he he
15 |
--------------------------------------------------------------------------------
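The context options these questions exercise can be sketched on inline input. This is an illustrative sketch of GNU grep's `-A`/`-B`/`-C` and group-separator options, not the reference solutions:

```bash
# -A N: N lines After each match; -B N: N lines Before; -C N: both sides
printf 'a\nx\nx\na\n' | grep -A1 'a'
# a
# x
# --
# a

# GNU grep prints '--' between discontiguous groups of context;
# --group-separator replaces it, --no-group-separator drops it
printf 'a\nx\nx\na\n' | grep -A1 --group-separator='#####' 'a'
printf 'a\nx\nx\na\n' | grep -A1 --no-group-separator 'a'
```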
/exercises/GNU_grep/ex07_recursive_search.txt:
--------------------------------------------------------------------------------
1 | Note: Every file in this directory and sub-directories is input for grep, unless otherwise specified
2 |
3 | 1) Match all lines containing the string: you
4 |
5 |
6 | 2) Show only filenames matching the string: Hello
7 | filenames should only end with .txt
8 |
9 |
10 | 3) Show only filenames matching the string: Hello
11 | filenames should NOT end with .txt
12 |
13 |
14 | 4) Show only filenames matching the string: are
15 | should not include the directory: progs
16 |
17 |
18 | 5) Show only filenames matching the string: are
19 | should NOT include these directories
20 | dir1: progs
21 | dir2: msg
22 |
23 |
24 | 6) Show only filenames matching the string: are
25 | should include files only from sub-directories
26 | hint: use shell glob pattern to specify directories to search
27 |
28 |
--------------------------------------------------------------------------------
/exercises/GNU_grep/ex07_recursive_search/msg/greeting.txt:
--------------------------------------------------------------------------------
1 | Hi, how are you?
2 |
3 | Hola :)
4 |
5 | Hello World
6 |
7 | Good day
8 |
9 | Rock on
10 |
--------------------------------------------------------------------------------
/exercises/GNU_grep/ex07_recursive_search/msg/sample.txt:
--------------------------------------------------------------------------------
1 | Hello World!
2 |
3 | Good day
4 | How do you do?
5 |
6 | Just do it
7 | Believe it!
8 |
9 | Today is sunny
10 | Not a bit funny
11 | No doubt you like it too
12 |
13 | Much ado about nothing
14 | He he he
15 |
--------------------------------------------------------------------------------
/exercises/GNU_grep/ex07_recursive_search/poem.txt:
--------------------------------------------------------------------------------
1 | Roses are red,
2 | Violets are blue,
3 | Sugar is sweet,
4 | And so are you.
5 |
--------------------------------------------------------------------------------
/exercises/GNU_grep/ex07_recursive_search/progs/hello.py:
--------------------------------------------------------------------------------
1 | #!/usr/bin/python3
2 |
3 | print("Hello World")
4 |
--------------------------------------------------------------------------------
/exercises/GNU_grep/ex07_recursive_search/progs/hello.sh:
--------------------------------------------------------------------------------
1 | #!/bin/bash
2 |
3 | echo "Hello $USER"
4 | echo "Today is $(date -u +%A)"
5 | echo 'Hope you are having a nice day'
6 |
--------------------------------------------------------------------------------
/exercises/GNU_grep/ex07_recursive_search/words.txt:
--------------------------------------------------------------------------------
1 | afar
2 | far
3 | carfare
4 | farce
5 | faraway
6 | airfare
7 |
--------------------------------------------------------------------------------
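The recursive-search options these questions target can be sketched against a small scratch tree (the `/tmp/grep_rdemo` paths are hypothetical, not the exercise files, and this is not the reference solutions):

```bash
# set up a tiny hypothetical directory tree
mkdir -p /tmp/grep_rdemo/sub
printf 'hello\n' > /tmp/grep_rdemo/a.txt
printf 'hello\n' > /tmp/grep_rdemo/sub/b.py

# -r recurses into directories; -l prints only names of matching files
grep -rl 'hello' /tmp/grep_rdemo | sort

# --include/--exclude filter files by glob, --exclude-dir skips directories
grep -rl --include='*.txt' 'hello' /tmp/grep_rdemo
grep -rl --exclude='*.txt' 'hello' /tmp/grep_rdemo
grep -rl --exclude-dir='sub' 'hello' /tmp/grep_rdemo
```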
/exercises/GNU_grep/ex08_search_pattern_from_file.txt:
--------------------------------------------------------------------------------
1 | Note: words.txt has one whole word per line; use it as the pattern file when the task is to match whole words
2 |
3 | 1) Match all strings from file words.txt in file baz.txt
4 |
5 |
6 | 2) Match all words from file words.txt in file foo.txt
7 | should only match whole words
8 | should print only matching words, not entire line
9 |
10 |
11 | 3) Show common lines between foo.txt and baz.txt
12 |
13 |
14 | 4) Show lines present in baz.txt but not in foo.txt
15 |
16 |
17 | 5) Show lines present in foo.txt but not in baz.txt
18 |
19 |
20 | 6) Find all words common between all three files in the directory
21 | should only match whole words
22 | should print only matching words, not entire line
23 |
24 |
--------------------------------------------------------------------------------
/exercises/GNU_grep/ex08_search_pattern_from_file/baz.txt:
--------------------------------------------------------------------------------
1 | I saw a few red cars going that way
2 | To the end!
3 | Are you coming today to the party?
4 | a[5] = 'good';
5 | Have you read the Harry Potter series?
6 |
--------------------------------------------------------------------------------
/exercises/GNU_grep/ex08_search_pattern_from_file/foo.txt:
--------------------------------------------------------------------------------
1 | part
2 | a[5] = 'good';
3 | I saw a few red cars going that way
4 | Believe it!
5 | to do list
6 |
--------------------------------------------------------------------------------
/exercises/GNU_grep/ex08_search_pattern_from_file/words.txt:
--------------------------------------------------------------------------------
1 | car
2 | part
3 | to
4 | read
5 |
--------------------------------------------------------------------------------
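Reading patterns from a file, as these questions require, can be sketched with hypothetical scratch files (illustrative only, not the reference solutions):

```bash
# hypothetical scratch files standing in for the exercise inputs
printf 'car\npart\n' > /tmp/grep_pats.txt
printf 'cart wheel\nno match here\npart\n' > /tmp/grep_text.txt

# -f FILE reads patterns, one per line, from FILE
grep -f /tmp/grep_pats.txt /tmp/grep_text.txt
# cart wheel
# part

# -w restricts to whole words, -o prints only the matched portion
grep -owf /tmp/grep_pats.txt /tmp/grep_text.txt
# part

# -Fxf treats each pattern line as a fixed string matching a whole line:
# grep -Fxf fileA fileB  -> lines common to both; add -v for lines only in fileB
```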
/exercises/GNU_grep/ex09_regex_anchors.txt:
--------------------------------------------------------------------------------
1 | 1) Match all lines starting with: no
2 |
3 |
4 | 2) Match all lines ending with: it
5 |
6 |
7 | 3) Match all lines containing whole word: do
8 |
9 |
10 | 4) Match all lines containing words starting with: do
11 |
12 |
13 | 5) Match all lines containing words ending with: do
14 |
15 |
16 | 6) Match all lines starting with: ^
17 |
18 |
19 | 7) Match all lines ending with: $
20 |
21 |
22 | 8) Match all lines containing the string: in
23 | not surrounded by word boundaries, for ex: mint but not tin or ink
24 |
25 |
--------------------------------------------------------------------------------
/exercises/GNU_grep/ex09_regex_anchors/sample.txt:
--------------------------------------------------------------------------------
1 | hello world!
2 |
3 | good day
4 | how do you do?
5 |
6 | just do it
7 | believe it!
8 |
9 | today is sunny
10 | not a bit funny
11 | no doubt you like it too
12 |
13 | much ado about nothing
14 | he he he
15 |
16 | ^ could be exponentiation or xor operator
17 | scalar variables in perl start with $
18 |
--------------------------------------------------------------------------------
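The anchors exercised above can be sketched on inline input (illustrative only, not the reference solutions; `\b` and `\B` are GNU extensions):

```bash
# ^ anchors at start of line, $ at end of line
printf 'no doubt\njust do it\nado\n' | grep '^no'    # -> no doubt
printf 'no doubt\njust do it\nado\n' | grep 'it$'    # -> just do it

# \b is a word boundary (see also \< and \> for start/end of word)
printf 'no doubt\njust do it\nado\n' | grep '\bdo\b' # -> just do it

# \B is the opposite: 'in' only when NOT at a word boundary
printf 'mint\ntin\nink\n' | grep '\Bin\B'            # -> mint

# literal ^ or $ need escaping, e.g. grep '^\^' and grep '\$$'
```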
/exercises/GNU_grep/ex10_regex_this_or_that.txt:
--------------------------------------------------------------------------------
1 | 1) Match all lines containing any of these strings:
2 | String1: day
3 | String2: not
4 |
5 |
6 | 2) Match all lines containing any of these whole words:
7 | String1: he
8 | String2: in
9 |
10 |
11 | 3) Match all lines containing any of these strings:
12 | String1: you
13 | String2: be
14 | String3: to
15 | String4: he
16 |
17 |
18 | 4) Match all lines containing any of these strings:
19 | String1: you
20 | String2: be
21 | String3: to
22 | String4: he
23 | but NOT these strings:
24 | String1: it
25 | String2: do
26 |
27 |
28 | 5) Match all lines starting with any of these strings:
29 | String1: no
30 | String2: to
31 |
32 |
--------------------------------------------------------------------------------
/exercises/GNU_grep/ex10_regex_this_or_that/sample.txt:
--------------------------------------------------------------------------------
1 | hello world!
2 |
3 | good day
4 | how do you do?
5 |
6 | just do it
7 | believe it!
8 |
9 | today is sunny
10 | not a bit funny
11 | no doubt you like it too
12 |
13 | much ado about nothing
14 | he he he
15 |
16 | ^ could be exponentiation or xor operator
17 | scalar variables in perl start with $
18 |
--------------------------------------------------------------------------------
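Alternation, which these questions exercise, can be sketched on inline input (illustrative only, not the reference solutions):

```bash
# alternation: | in ERE (-E); BRE needs \|
printf 'good day\nnot now\nhello\n' | grep -E 'day|not'
# good day
# not now

# -w applies whole-word matching to each alternative
printf 'he said\nthe end\n' | grep -wE 'he|in'       # -> he said

# group the alternatives to anchor them together
printf 'no way\ntoo bad\nyes\n' | grep -E '^(no|to)'
# no way
# too bad

# to also exclude strings, chain a second grep with -v
```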
/exercises/GNU_grep/ex11_regex_quantifiers.txt:
--------------------------------------------------------------------------------
1 | 1) Extract all 3-character strings surrounded by word boundaries
2 |
3 |
4 | 2) Extract the longest string from each line
5 | starting with character: d
6 | ending with character : g
7 |
8 |
9 | 3) Extract all strings from each line
10 | starting with character: d
11 | followed by zero or one: o
12 | ending with character : g
13 |
14 |
15 | 4) Extract all strings from each line
16 | starting with character: d
17 | followed by zero or one of any character
18 | ending with character : g
19 |
20 |
21 | 5) Extract all strings from each line
22 | starting with character: g
23 | followed by at least one: o
24 | ending with character : d
25 |
26 |
27 | 6) Extract all strings from each line
28 | starting with character : g
29 | followed by exactly six: o
30 | ending with character : d
31 |
32 |
33 | 7) Extract all strings from each line
34 | starting with character : g
35 | followed by min two and max four: o
36 | ending with character : d
37 |
38 |
39 | 8) Extract all strings from each line
40 | starting with character: d
41 | followed by max of two : o
42 | ending with character : g
43 |
44 |
45 | 9) Extract all strings from each line
46 | starting with character : g
47 | followed by min of three: o
48 | ending with character : d
49 |
50 |
--------------------------------------------------------------------------------
/exercises/GNU_grep/ex11_regex_quantifiers/garbled.txt:
--------------------------------------------------------------------------------
1 | gd
2 | god
3 | goood
4 | oh gold
5 | goooooodyyyy
6 | dog
7 | dg
8 | dig good gold
9 | doogoodog
10 | c@t made forty justify
11 | dodging a toy
12 |
--------------------------------------------------------------------------------
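The quantifiers these questions exercise can be sketched on inline input (illustrative only, not the reference solutions):

```bash
# ? zero or one, + one or more, {m,n} ranges (ERE shown; BRE needs backslashes)
printf 'gd\ngod\ngoood\n' | grep -oE 'go?d'      # gd, god
printf 'gd\ngod\ngoood\n' | grep -oE 'go+d'      # god, goood
printf 'gd\ngod\ngoood\n' | grep -oE 'go{2,3}d'  # goood

# . matches any character; * is greedy, so d.*g grabs the longest span
echo 'dig dog dug' | grep -oE 'd.*g'             # dig dog dug
```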
/exercises/GNU_grep/ex12_regex_character_class_part1.txt:
--------------------------------------------------------------------------------
1 | 1) Match all lines containing any of these characters:
2 | character1: q
3 | character2: x
4 | character3: z
5 |
6 |
7 | 2) Match all lines containing any of these characters:
8 | character1: c
9 | character2: f
10 | followed by any character
11 | followed by : t
12 |
13 |
14 | 3) Extract all words starting with character: s
15 | ignore case
16 | should contain only alphabets
17 | minimum two letters
18 | should be surrounded by word boundaries
19 |
20 |
21 | 4) Extract all words made up of these characters:
22 | character1: a
23 | character2: c
24 | character3: e
25 | character4: r
26 | character5: s
27 | ignore case
28 | should contain only alphabets
29 | should be surrounded by word boundaries
30 |
31 |
32 | 5) Extract all numbers surrounded by word boundaries
33 |
34 |
35 | 6) Extract all numbers surrounded by word boundaries matching the condition
36 | 30 <= number <= 70
37 |
38 |
39 | 7) Extract all words made up of non-vowel characters
40 | ignore case
41 | should contain only alphabets, at least two of them
42 | should be surrounded by word boundaries
43 |
44 |
45 | 8) Extract all sequences consisting of the character: -
46 | surrounded on either side by zero or more alphabets of either case
47 |
48 |
--------------------------------------------------------------------------------
/exercises/GNU_grep/ex12_regex_character_class_part1/sample_words.txt:
--------------------------------------------------------------------------------
1 | far 30 scarce f@$t 42 fit
2 | Cute 34 quite pry far-fetched Sure
3 | 70 cast-away 12 good hue he
4 | cry just Nymph race Peace. 67
5 | foo;bar;baz;p@t
6 | ARE 72 cut copy paste
7 | p1ate rest 512 Sync
8 |
--------------------------------------------------------------------------------
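The character classes these questions exercise can be sketched on inline input (illustrative only, not the reference solutions):

```bash
# [abc] matches any one listed character; [a-z] is a range
printf 'cat\ncot\ncup\nfit\n' | grep '[cf].t'    # cat, cot, fit

# [0-9]+ matches a run of digits
echo 'far 30 scarce 42' | grep -owE '[0-9]+'     # 30, 42

# numeric ranges need spelled-out alternation, e.g. 30 to 70:
echo '25 42 68 99' | grep -owE '[3-6][0-9]|70'   # 42, 68
```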
/exercises/GNU_grep/ex13_regex_character_class_part2.txt:
--------------------------------------------------------------------------------
1 | 1) Extract all characters before first occurrence of =
2 |
3 |
4 | 2) Extract all characters from start of line made up of these characters
5 | upper or lower case alphabets
6 | all digits
7 | the underscore character
8 |
9 |
10 | 3) Match all lines containing the sequence
11 | String1: there
12 | any number of whitespace
13 | String2: have
14 |
15 |
16 | 4) Extract all characters from start of line made up of these characters
17 | upper or lower case alphabets
18 | all digits
19 | the characters [ and ]
20 | ending with ]
21 |
22 |
23 | 5) Extract all punctuation characters from first line
24 |
25 |
--------------------------------------------------------------------------------
/exercises/GNU_grep/ex13_regex_character_class_part2/sample.txt:
--------------------------------------------------------------------------------
1 | a[2]='sample string'
2 | foo_bar=4232
3 | appx_pi=3.14
4 | greeting="Hi there have a nice day"
5 | food[4]="dosa"
6 | b[0][1]=42
7 |
--------------------------------------------------------------------------------
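Negated classes and POSIX character classes, which these questions exercise, can be sketched as follows (illustrative only, not the reference solutions):

```bash
# [^=] is any character except =; everything before the first =
echo 'foo_bar=4232' | grep -oE '^[^=]+'               # foo_bar

# POSIX classes: [[:alnum:]] [[:alpha:]] [[:digit:]] [[:space:]] [[:punct:]]
echo 'a[2]=42' | grep -oE '^[[:alnum:]_]+'            # a
printf 'there   have\n' | grep -cE 'there[[:space:]]+have'   # 1
```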
/exercises/GNU_grep/ex14_regex_grouping_and_backreference.txt:
--------------------------------------------------------------------------------
1 | 1) Match lines containing these strings
2 | String1: scare
3 | String2: spore
4 |
5 |
6 | 2) Extract these words
7 | Word1: handy
8 | Word2: hand
9 | Word3: hands
10 | Word4: handful
11 |
12 |
13 | 3) Extract all whole words with at least one letter occurring twice in the word
14 | ignore case
15 | only alphabets
16 | the two occurrences of the letter need not be next to each other
17 |
18 |
19 | 4) Match lines where the same sequence of three consecutive alphabets occurs again later in the same line
20 | ignore case
21 |
22 |
--------------------------------------------------------------------------------
/exercises/GNU_grep/ex14_regex_grouping_and_backreference/sample.txt:
--------------------------------------------------------------------------------
1 | hands hand library scare handy handful
2 | scared too big time eel candy
3 | spare food regulate circuit spore stare
4 | tire tempt cold malady
5 |
--------------------------------------------------------------------------------
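Grouping and backreferences, which these questions exercise, can be sketched on inline input (illustrative only, not the reference solutions):

```bash
# \(...\) groups in BRE; \1 matches the same text the group matched
# a letter occurring twice anywhere in the line:
printf 'eel\ntempt\nabc\n' | grep '\([a-z]\).*\1'     # eel, tempt

# same idea with a three-letter sequence repeating later in the line:
printf 'scare is scary\nno repeats\n' | grep '\([a-z]\{3\}\).*\1'
# scare is scary
```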
/exercises/GNU_grep/ex15_regex_PCRE.txt:
--------------------------------------------------------------------------------
1 | 1) Extract all strings to the right of =
2 | provided characters from start of line until = do not include [ or ]
3 |
4 |
5 | 2) Match all lines containing the string: Hi
6 | but shouldn't be followed afterwards in the line by: are
7 |
8 |
9 | 3) Extract from start of line up to the string: Hi
10 | provided it is followed afterwards in the line by: you
11 |
12 |
13 | 4) Extract all sequences of characters surrounded on both sides by a space character
14 | the space character should not be part of output
15 |
16 |
17 | 5) Extract all words
18 | made of upper or lower case alphabets
19 | at least two letters in length
20 | surrounded by word boundaries
21 | should not contain consecutive repeated alphabets
22 |
23 |
24 |
--------------------------------------------------------------------------------
/exercises/GNU_grep/ex15_regex_PCRE/sample.txt:
--------------------------------------------------------------------------------
1 | a[2]='Hi, how are you?'
2 | foo_bar=4232
3 | appx_pi=3.14
4 | greeting="Hi there have a nice day"
5 | food[4]="dosa"
6 | b[0][1]=42
7 |
--------------------------------------------------------------------------------
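The PCRE features these questions exercise can be sketched on inline input. `-P` requires a GNU grep built with PCRE support; this is illustrative only, not the reference solutions:

```bash
# \K discards the text matched so far from the reported match
echo 'foo_bar=4232' | grep -oP '=\K.*'                 # 4232

# lookahead (?=...): Hi only when followed later in the line by you
echo 'Hi, how are you?' | grep -oP '^.*?Hi(?=.*you)'   # Hi

# negative lookahead (?!...): Hi not followed later in the line by are
printf 'Hi there\nHi, are you\n' | grep -P 'Hi(?!.*are)'   # Hi there
```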
/exercises/GNU_grep/ex16_misc_and_extras.txt:
--------------------------------------------------------------------------------
1 | Note: all files in directory are input to grep, unless otherwise specified
2 |
3 | 1) Extract all negative numbers
4 | starting with - followed by one or more digits
5 | do not output filenames
6 |
7 |
8 | 2) Display only filenames containing these two strings anywhere in the file
9 | String1: day
10 | String2: and
11 |
12 |
13 | 3) The below command
14 | grep -c '^Solution:' ../.ref_solutions/*
15 | will give the number of questions in each exercise. Change it, using another command and a pipe if needed, so that only the overall total is printed
16 |
17 |
18 |
--------------------------------------------------------------------------------
/exercises/GNU_grep/ex16_misc_and_extras/garbled.txt:
--------------------------------------------------------------------------------
1 | day and night
2 | -43 and 99 and 12
3 |
--------------------------------------------------------------------------------
/exercises/GNU_grep/ex16_misc_and_extras/poem.txt:
--------------------------------------------------------------------------------
1 | Roses are red,
2 | Violets are blue,
3 | Sugar is sweet,
4 | And so are you.
5 |
6 | Good day to you :)
7 |
--------------------------------------------------------------------------------
/exercises/GNU_grep/ex16_misc_and_extras/sample.txt:
--------------------------------------------------------------------------------
1 | account balance: -2300
2 | good day
3 | foo and bar and baz
4 |
--------------------------------------------------------------------------------
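Counting across multiple files, as question 3 asks, can be sketched with hypothetical scratch files (one possible approach, not necessarily the reference solution):

```bash
# hypothetical files standing in for the .ref_solutions directory
printf 'Solution: a\nx\nSolution: b\n' > /tmp/grep_ref1.txt
printf 'Solution: c\n' > /tmp/grep_ref2.txt

# -c gives a per-file count when multiple files are given
grep -c '^Solution:' /tmp/grep_ref1.txt /tmp/grep_ref2.txt
# /tmp/grep_ref1.txt:2
# /tmp/grep_ref2.txt:1

# -h drops the filename prefix; pipe the matching lines to wc -l for one total
grep -h '^Solution:' /tmp/grep_ref1.txt /tmp/grep_ref2.txt | wc -l
# 3
```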
/exercises/GNU_grep/solve:
--------------------------------------------------------------------------------
1 | dir_name=$(basename "$PWD")
2 | ref_file="../.ref_solutions/$dir_name.txt"
3 | sol_file="../$dir_name.txt"
4 | tmp_file='../.tmp.txt'
5 |
6 | # color output
7 | tcolors=$(tput colors)
8 | if [[ -n $tcolors && $tcolors -ge 8 ]]; then
9 | red=$(tput setaf 1)
10 | green=$(tput setaf 2)
11 | blue=$(tput setaf 4)
12 | clr_color=$(tput sgr0)
13 | else
14 | red=''
15 | green=''
16 | blue=''
17 | clr_color=''
18 | fi
19 |
20 | sub_sol=0
21 | if [[ $1 == -s ]]; then
22 | prev_cmd=$(fc -ln -2 | sed 's/^[ \t]*//;q')
23 | sub_sol=1
24 | elif [[ $1 == -q ]]; then
25 | # highlight the question to be solved next
26 | # or show only the (unanswered)? question to be solved next
27 | cat "$sol_file"
28 | return
29 | elif [[ -n $1 ]]; then
30 | echo -e 'Unknown option...Exiting script'
31 | return
32 | fi
33 |
34 | count=0
35 | sol_count=0
36 | err_count=0
37 | while IFS= read -u3 -r ref_line && read -u4 -r sol_line; do
38 | if [[ "${ref_line:0:9}" == Solution: ]]; then
39 | (( count++ ))
40 |
41 | if [[ $sub_sol == 1 && -z $sol_line ]]; then
42 | sol_line="$prev_cmd"
43 | sub_sol=0
44 | fi
45 |
46 | if [[ "$(eval "command ${ref_line:10}")" == "$(eval "command $sol_line")" ]]; then
47 | (( sol_count++ ))
48 | # use color if terminal supports
49 | echo '---------------------------------------------'
50 | echo "Match for question $count:"
51 | echo "${red}Submitted solution:${clr_color} $sol_line"
52 | echo "${green}Reference solution:${clr_color} ${ref_line:10}"
53 | echo '---------------------------------------------'
54 | else
55 | (( err_count++ ))
56 | if [[ $err_count == 1 && -n $sol_line ]]; then
57 | echo '---------------------------------------------'
58 | echo "Mismatch for question $count:"
59 | echo "$(tput bold)${red}Expected output is:${clr_color}$(tput rmso)"
60 | eval "command ${ref_line:10}"
61 | echo '---------------------------------------------'
62 | fi
63 | sol_line=''
64 | fi
65 | fi
66 |
67 | echo "$sol_line" >> "$tmp_file"
68 |
69 | done 3<"$ref_file" 4<"$sol_file"
70 |
71 | ((count==sol_count)) && printf "\t\t$(tput bold)${blue}All Pass${clr_color}$(tput rmso)\t\t\n"
72 |
73 | mv "$tmp_file" "$sol_file"
74 |
75 | # vim: syntax=bash
76 |
--------------------------------------------------------------------------------
/exercises/README.md:
--------------------------------------------------------------------------------
1 | # Exercises
2 |
3 | Instructions and the shell script here assume the `bash` shell. Tested on *GNU bash, version 4.3.46*
4 |
5 |
6 |
7 | * For example, the first exercise for **GNU_grep**
8 | * directory: `ex01_basic_match`
9 | * question file: `ex01_basic_match.txt`
10 | * solution reference: `.ref_solutions/ex01_basic_match.txt`
11 | * Each exercise contains one or more questions to be solved
12 | * The script `solve` will assist in checking solutions
13 |
14 | ```bash
15 | $ git clone https://github.com/learnbyexample/Command-line-text-processing.git
16 | $ cd Command-line-text-processing/exercises/GNU_grep/
17 | $ ls
18 | ex01_basic_match ex02_basic_options ex03_multiple_string_match solve
19 | ex01_basic_match.txt ex02_basic_options.txt ex03_multiple_string_match.txt
20 |
21 | $ find -name 'ex01*'
22 | ./.ref_solutions/ex01_basic_match.txt
23 | ./ex01_basic_match
24 | ./ex01_basic_match.txt
25 | ```
26 |
27 |
28 |
29 | * Solving the questions
30 | * Go to the exercise folder
31 | * Use `ls` to see input file(s)
32 | * To see the problems for that exercise, follow the steps below
33 |
34 | ```bash
35 | $ cd ex01_basic_match
36 | $ ls
37 | sample.txt
38 |
39 | $ # to see the questions
40 | $ source ../solve -q
41 | 1) Match lines containing the string: day
42 |
43 |
44 | 2) Match lines containing the string: it
45 |
46 |
47 | 3) Match lines containing the string: do you
48 |
49 |
50 | $ # or open the questions file with your fav editor
51 | $ gvim ../$(basename "$PWD").txt
52 | $ # create an alias to use from any ex* directory
53 | $ alias oq='gvim ../$(basename "$PWD").txt'
54 | $ oq
55 | ```
56 |
57 |
58 |
59 | * Submitting solutions one by one
60 | * immediately after executing the command that answers a question, call the `solve` script
61 |
62 | ```bash
63 | $ grep 'day' sample.txt
64 | Good day
65 | Today is sunny
66 | $ source ../solve -s
67 | ---------------------------------------------
68 | Match for question 1:
69 | Submitted solution: grep 'day' sample.txt
70 | Reference solution: grep 'day' sample.txt
71 | ---------------------------------------------
72 | ```
73 |
74 |
75 |
76 | * Submit all at once
77 | * by editing the `../$(basename "$PWD").txt` file directly
78 | * the answer should replace the empty line immediately following the question
79 | * **Note**
80 | * there are different ways to solve the same question
81 | * but for specific exercise like **GNU_grep** try to solve using `grep` only
82 | * also, remember that `eval` is used to check equivalence, so be careful about the commands you submit
83 |
84 | ```bash
85 | $ cat ../$(basename "$PWD").txt
86 | 1) Match lines containing the string: day
87 | grep 'day' sample.txt
88 |
89 | 2) Match lines containing the string: it
90 | sed -n '/it/p' sample.txt
91 |
92 | 3) Match lines containing the string: do you
93 | echo 'How do you do?'
94 |
95 | $ source ../solve
96 | ---------------------------------------------
97 | Match for question 1:
98 | Submitted solution: grep 'day' sample.txt
99 | Reference solution: grep 'day' sample.txt
100 | ---------------------------------------------
101 | ---------------------------------------------
102 | Match for question 2:
103 | Submitted solution: sed -n '/it/p' sample.txt
104 | Reference solution: grep 'it' sample.txt
105 | ---------------------------------------------
106 | ---------------------------------------------
107 | Match for question 3:
108 | Submitted solution: echo 'How do you do?'
109 | Reference solution: grep 'do you' sample.txt
110 | ---------------------------------------------
111 | All Pass
112 | ```
113 |
114 |
115 |
116 | * Then move on to next exercise directory
117 | * Create aliases for frequently used commands, after checking that the alias names are not already in use of course
118 |
119 | ```bash
120 | $ type cs cq ca nq pq
121 | bash: type: cs: not found
122 | bash: type: cq: not found
123 | bash: type: ca: not found
124 | bash: type: nq: not found
125 | bash: type: pq: not found
126 |
127 | $ alias cs='source ../solve -s'
128 | $ alias cq='source ../solve -q'
129 | $ alias ca='source ../solve'
130 | $ # to go to directory of next question
131 | $ nq() { d=$(basename "$PWD"); nd=$(printf "../ex%02d*/" $((${d:2:2}+1))); cd $nd ; }
132 | $ # to go to directory of previous question
133 | $ pq() { d=$(basename "$PWD"); pd=$(printf "../ex%02d*/" $((${d:2:2}-1))); cd $pd ; }
134 | ```
135 |
136 |
137 |
138 | If a wrong solution is submitted, the expected output is shown. This also helps to better understand the question, as I found it difficult to convey the intent of a question clearly with words alone...
139 |
140 | ```bash
141 | $ source ../solve -q
142 | 1) Match lines containing the string: day
143 |
144 |
145 | 2) Match lines containing the string: it
146 |
147 |
148 | 3) Match lines containing the string: do you
149 |
150 | $ grep 'do' sample.txt
151 | How do you do?
152 | Just do it
153 | No doubt you like it too
154 | Much ado about nothing
155 | $ source ../solve -s
156 | ---------------------------------------------
157 | Mismatch for question 1:
158 | Expected output is:
159 | Good day
160 | Today is sunny
161 | ---------------------------------------------
162 | ```
163 |
--------------------------------------------------------------------------------
/file_attributes.md:
--------------------------------------------------------------------------------
1 | # File attributes
2 |
3 | **Table of Contents**
4 |
5 | * [wc](#wc)
6 | * [Various counts](#various-counts)
7 | * [subtle differences](#subtle-differences)
8 | * [Further reading for wc](#further-reading-for-wc)
9 | * [du](#du)
10 | * [Default size](#default-size)
11 | * [Various size formats](#various-size-formats)
12 | * [Dereferencing links](#dereferencing-links)
13 | * [Filtering options](#filtering-options)
14 | * [Further reading for du](#further-reading-for-du)
15 | * [df](#df)
16 | * [Examples](#examples)
17 | * [Further reading for df](#further-reading-for-df)
18 | * [touch](#touch)
19 | * [Creating empty file](#creating-empty-file)
20 | * [Updating timestamps](#updating-timestamps)
21 | * [Preserving timestamp](#preserving-timestamp)
22 | * [Further reading for touch](#further-reading-for-touch)
23 | * [file](#file)
24 | * [File type examples](#file-type-examples)
25 | * [Further reading for file](#further-reading-for-file)
26 |
27 |
28 |
29 | ## wc
30 |
31 | ```bash
32 | $ wc --version | head -n1
33 | wc (GNU coreutils) 8.25
34 |
35 | $ man wc
36 | WC(1) User Commands WC(1)
37 |
38 | NAME
39 | wc - print newline, word, and byte counts for each file
40 |
41 | SYNOPSIS
42 | wc [OPTION]... [FILE]...
43 | wc [OPTION]... --files0-from=F
44 |
45 | DESCRIPTION
46 | Print newline, word, and byte counts for each FILE, and a total line if
47 | more than one FILE is specified. A word is a non-zero-length sequence
48 | of characters delimited by white space.
49 |
50 | With no FILE, or when FILE is -, read standard input.
51 | ...
52 | ```
53 |
54 |
55 |
56 | #### Various counts
57 |
58 | ```bash
59 | $ cat sample.txt
60 | Hello World
61 | Good day
62 | No doubt you like it too
63 | Much ado about nothing
64 | He he he
65 |
66 | $ # by default, gives newline/word/byte count (in that order)
67 | $ wc sample.txt
68 | 5 17 78 sample.txt
69 |
70 | $ # options to get individual numbers
71 | $ wc -l sample.txt
72 | 5 sample.txt
73 | $ wc -w sample.txt
74 | 17 sample.txt
75 | $ wc -c sample.txt
76 | 78 sample.txt
77 |
78 | $ # use shell input redirection if filename is not needed
79 | $ wc -l < sample.txt
80 | 5
81 | ```
82 |
83 | * multiple file input
84 | * automatically displays total at end
85 |
86 | ```bash
87 | $ cat greeting.txt
88 | Hello there
89 | Have a safe journey
90 | $ cat fruits.txt
91 | Fruit Price
92 | apple 42
93 | banana 31
94 | fig 90
95 | guava 6
96 |
97 | $ wc *.txt
98 | 5 10 57 fruits.txt
99 | 2 6 32 greeting.txt
100 | 5 17 78 sample.txt
101 | 12 33 167 total
102 | ```
103 |
104 | * use `-L` to get length of longest line
105 |
106 | ```bash
107 | $ wc -L < sample.txt
108 | 24
109 |
110 | $ echo 'foo bar baz' | wc -L
111 | 11
112 | $ echo 'hi there!' | wc -L
113 | 9
114 |
115 | $ # last line will show max value, not sum of all input
116 | $ wc -L *.txt
117 | 13 fruits.txt
118 | 19 greeting.txt
119 | 24 sample.txt
120 | 24 total
121 | ```
122 |
123 |
124 |
125 | #### subtle differences
126 |
127 | * byte count vs character count
128 |
129 | ```bash
130 | $ # when input is ASCII
131 | $ printf 'hi there' | wc -c
132 | 8
133 | $ printf 'hi there' | wc -m
134 | 8
135 |
136 | $ # when input has multi-byte characters
137 | $ printf 'hi👍' | od -x
138 | 0000000 6968 9ff0 8d91
139 | 0000006
140 |
141 | $ printf 'hi👍' | wc -m
142 | 3
143 |
144 | $ printf 'hi👍' | wc -c
145 | 6
146 | ```
147 |
148 | * the `-l` option gives only the count of newline characters, so a final line without a newline isn't counted
149 |
150 | ```bash
151 | $ printf 'hi there\ngood day' | wc -l
152 | 1
153 | $ printf 'hi there\ngood day\n' | wc -l
154 | 2
155 | $ printf 'hi there\n\n\nfoo\n' | wc -l
156 | 4
157 | ```
158 |
159 | * From `man wc` "A word is a non-zero-length sequence of characters delimited by white space"
160 |
161 | ```bash
162 | $ echo 'foo bar ;-*' | wc -w
163 | 3
164 |
165 | $ # use other text processing as needed
166 | $ echo 'foo bar ;-*' | grep -iowE '[a-z]+'
167 | foo
168 | bar
169 | $ echo 'foo bar ;-*' | grep -iowE '[a-z]+' | wc -l
170 | 2
171 | ```
172 |
173 | * `-L` doesn't count non-printable characters, and tabs are expanded to their equivalent in spaces
174 |
175 | ```bash
176 | $ printf 'food\tgood' | wc -L
177 | 12
178 | $ printf 'food\tgood' | wc -m
179 | 9
180 | $ printf 'food\tgood' | awk '{print length()}'
181 | 9
182 |
183 | $ printf 'foo\0bar\0baz' | wc -L
184 | 9
185 | $ printf 'foo\0bar\0baz' | wc -m
186 | 11
187 | $ printf 'foo\0bar\0baz' | awk '{print length()}'
188 | 11
189 | ```
190 |
191 |
192 |
193 | #### Further reading for wc
194 |
195 | * `man wc` and `info wc` for more options and detailed documentation
196 | * [wc Q&A on unix stackexchange](https://unix.stackexchange.com/questions/tagged/wc?sort=votes&pageSize=15)
197 | * [wc Q&A on stackoverflow](https://stackoverflow.com/questions/tagged/wc?sort=votes&pageSize=15)
198 |
199 |
200 |
201 | ## du
202 |
203 | ```bash
204 | $ du --version | head -n1
205 | du (GNU coreutils) 8.25
206 |
207 | $ man du
208 | DU(1) User Commands DU(1)
209 |
210 | NAME
211 | du - estimate file space usage
212 |
213 | SYNOPSIS
214 | du [OPTION]... [FILE]...
215 | du [OPTION]... --files0-from=F
216 |
217 | DESCRIPTION
218 | Summarize disk usage of the set of FILEs, recursively for directories.
219 | ...
220 | ```
221 |
222 |
223 |
224 |
225 |
226 | #### Default size
227 |
228 | * By default, size is reported in units of **1024 bytes**
229 | * Individual files are not listed; all directories and sub-directories are recursively reported
230 |
231 | ```bash
232 | $ ls -F
233 | projs/ py_learn@ words.txt
234 |
235 | $ du
236 | 17920 ./projs/full_addr
237 | 14316 ./projs/half_addr
238 | 32952 ./projs
239 | 33880 .
240 | ```
241 |
242 | * use `-a` to recursively show both files and directories
243 | * use `-s` to show total directory size without descending into its sub-directories
244 |
245 | ```bash
246 | $ du -a
247 | 712 ./projs/report.log
248 | 17916 ./projs/full_addr/faddr.v
249 | 17920 ./projs/full_addr
250 | 14312 ./projs/half_addr/haddr.v
251 | 14316 ./projs/half_addr
252 | 32952 ./projs
253 | 0 ./py_learn
254 | 924 ./words.txt
255 | 33880 .
256 |
257 | $ du -s
258 | 33880 .
259 |
260 | $ du -s projs words.txt
261 | 32952 projs
262 | 924 words.txt
263 | ```
264 |
265 | * use `-S` to show directory size without taking into account size of its sub-directories
266 |
267 | ```bash
268 | $ du -S
269 | 17920 ./projs/full_addr
270 | 14316 ./projs/half_addr
271 | 716 ./projs
272 | 928 .
273 | ```
274 |
275 |
276 |
277 |
278 |
279 | #### Various size formats
280 |
281 | ```bash
282 | $ # number of bytes
283 | $ stat -c %s words.txt
284 | 938848
285 | $ du -b words.txt
286 | 938848 words.txt
287 |
288 | $ # kilobytes = 1024 bytes
289 | $ du -sk projs
290 | 32952 projs
291 | $ # megabytes = 1024 kilobytes
292 | $ du -sm projs
293 | 33 projs
294 |
295 | $ # -B to specify custom byte scale size
296 | $ du -sB 5000 projs
297 | 6749 projs
298 | $ du -sB 1048576 projs
299 | 33 projs
300 | ```
301 |
302 | * human readable and SI units
303 |
304 | ```bash
305 | $ # in terms of powers of 1024
306 | $ # M = 1048576 bytes and so on
307 | $ du -sh projs/* words.txt
308 | 18M projs/full_addr
309 | 14M projs/half_addr
310 | 712K projs/report.log
311 | 924K words.txt
312 |
313 | $ # in terms of powers of 1000
314 | $ # M = 1000000 bytes and so on
315 | $ du -s --si projs/* words.txt
316 | 19M projs/full_addr
317 | 15M projs/half_addr
318 | 730k projs/report.log
319 | 947k words.txt
320 | ```
321 |
322 | * sorting
323 |
324 | ```bash
325 | $ du -sh projs/* words.txt | sort -h
326 | 712K projs/report.log
327 | 924K words.txt
328 | 14M projs/half_addr
329 | 18M projs/full_addr
330 |
331 | $ du -sk projs/* | sort -nr
332 | 17920 projs/full_addr
333 | 14316 projs/half_addr
334 | 712 projs/report.log
335 | ```
336 |
337 | * use `--apparent-size` to get the size based on the number of characters in the file rather than the disk space allotted
338 |
339 | ```bash
340 | $ du -b words.txt
341 | 938848 words.txt
342 |
343 | $ du -h words.txt
344 | 924K words.txt
345 |
346 | $ # 938848/1024 = 916.84
347 | $ du --apparent-size -h words.txt
348 | 917K words.txt
349 | ```
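
The difference is easiest to reproduce with a sparse file, whose apparent size is much larger than the disk space allotted. A quick sketch, assuming GNU coreutils `truncate` is available:

```bash
# create a 1 MiB file that is entirely a "hole" - no data blocks written
cd "$(mktemp -d)"
truncate -s 1M sparse.bin

# apparent size counts the bytes the file claims to contain
du --apparent-size -k sparse.bin

# allocated size counts actual disk blocks - typically 0 for a fully sparse file
du -k sparse.bin
```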
350 |
351 |
352 |
353 | #### Dereferencing links
354 |
355 | * See `man` and `info` pages for other related options
356 |
357 | ```bash
358 | $ # -D to dereference command line arguments that are symbolic links
359 | $ du py_learn
360 | 0 py_learn
361 | $ du -shD py_learn
362 | 503M py_learn
363 |
364 | $ # -L to dereference links found by du
365 | $ du -sh
366 | 34M .
367 | $ du -shL
368 | 536M .
369 | ```
370 |
371 |
372 |
373 | #### Filtering options
374 |
375 | * `-d` to specify maximum depth
376 |
377 | ```bash
378 | $ du -ah projs
379 | 712K projs/report.log
380 | 18M projs/full_addr/faddr.v
381 | 18M projs/full_addr
382 | 14M projs/half_addr/haddr.v
383 | 14M projs/half_addr
384 | 33M projs
385 |
386 | $ du -ah -d1 projs
387 | 712K projs/report.log
388 | 18M projs/full_addr
389 | 14M projs/half_addr
390 | 33M projs
391 | ```
392 |
393 | * `-c` to also show total size at end
394 |
395 | ```bash
396 | $ du -cshD projs py_learn
397 | 33M projs
398 | 503M py_learn
399 | 535M total
400 | ```
401 |
402 | * `-t` to provide a threshold comparison
403 |
404 | ```bash
405 | $ # >= 15M
406 | $ du -Sh -t 15M
407 | 18M ./projs/full_addr
408 |
409 | $ # <= 1M
410 | $ du -ah -t -1M
411 | 712K ./projs/report.log
412 | 0 ./py_learn
413 | 924K ./words.txt
414 | ```
415 |
416 | * excluding files/directories based on a **glob** pattern
417 | * see also `--exclude-from=FILE` and `--files0-from=FILE` options
418 |
419 | ```bash
420 | $ # note that excluded files affect directory size reported
421 | $ du -ah --exclude='*addr*' projs
422 | 712K projs/report.log
423 | 716K projs
424 |
425 | $ # depending on shell, brace expansion can be used
426 | $ du -ah --exclude='*.'{v,log} projs
427 | 4.0K projs/full_addr
428 | 4.0K projs/half_addr
429 | 12K projs
430 | ```
431 |
432 |
433 |
434 | #### Further reading for du
435 |
436 | * `man du` and `info du` for more options and detailed documentation
437 | * [du Q&A on unix stackexchange](https://unix.stackexchange.com/questions/tagged/disk-usage?sort=votes&pageSize=15)
438 | * [du Q&A on stackoverflow](https://stackoverflow.com/questions/tagged/du?sort=votes&pageSize=15)
439 |
440 |
441 |
442 | ## df
443 |
444 | ```bash
445 | $ df --version | head -n1
446 | df (GNU coreutils) 8.25
447 |
448 | $ man df
449 | DF(1) User Commands DF(1)
450 |
451 | NAME
452 | df - report file system disk space usage
453 |
454 | SYNOPSIS
455 | df [OPTION]... [FILE]...
456 |
457 | DESCRIPTION
458 | This manual page documents the GNU version of df. df displays the
459 | amount of disk space available on the file system containing each file
460 | name argument. If no file name is given, the space available on all
461 | currently mounted file systems is shown.
462 | ...
463 | ```
464 |
465 |
466 |
467 | #### Examples
468 |
469 | ```bash
470 | $ # without any arguments, df reports all mounted file systems; a path argument restricts it to the file system containing that path
471 | $ df .
472 | Filesystem 1K-blocks Used Available Use% Mounted on
473 | /dev/sda1 98298500 58563816 34734748 63% /
474 |
475 | $ # -h for human readable sizes; use -B option for custom size
476 | $ # use --si for size in powers of 1000 instead of 1024
477 | $ df -h .
478 | Filesystem Size Used Avail Use% Mounted on
479 | /dev/sda1 94G 56G 34G 63% /
480 | ```
481 |
482 | * Use `--output` to report only specific fields of interest
483 |
484 | ```bash
485 | $ df -h --output=size,used,file / /media/learnbyexample/projs
486 | Size Used File
487 | 94G 56G /
488 | 92G 35G /media/learnbyexample/projs
489 |
490 | $ df -h --output=pcent .
491 | Use%
492 | 63%
493 |
494 | $ df -h --output=pcent,fstype | awk -F'%' 'NR>1 && $1>=40'
495 | 63% ext3
496 | 40% ext4
497 | 51% ext4
498 | ```
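
The `awk` percentage filter can be checked against a canned report, so the result doesn't depend on local mounts; the sample numbers below are made up:

```bash
# same shape as the output of: df -h --output=pcent,fstype
report='Use% Type
 63% ext3
 12% tmpfs
 51% ext4'

# NR>1 skips the header line; -F'%' makes $1 the bare number
printf '%s\n' "$report" | awk -F'%' 'NR>1 && $1>=40'
# prints only the ext3 and ext4 lines
```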
499 |
500 |
501 |
502 | #### Further reading for df
503 |
504 | * `man df` and `info df` for more options and detailed documentation
505 | * [df Q&A on stackoverflow](https://stackoverflow.com/questions/tagged/df?sort=votes&pageSize=15)
506 | * [Parsing df command output with awk](https://unix.stackexchange.com/questions/360865/parsing-df-command-output-with-awk)
507 | * [processing df output](https://www.reddit.com/r/bash/comments/68dbml/using_an_array_variable_in_an_awk_command/)
508 |
509 |
510 |
511 | ## touch
512 |
513 | ```bash
514 | $ touch --version | head -n1
515 | touch (GNU coreutils) 8.25
516 |
517 | $ man touch
518 | TOUCH(1) User Commands TOUCH(1)
519 |
520 | NAME
521 | touch - change file timestamps
522 |
523 | SYNOPSIS
524 | touch [OPTION]... FILE...
525 |
526 | DESCRIPTION
527 | Update the access and modification times of each FILE to the current
528 | time.
529 |
530 | A FILE argument that does not exist is created empty, unless -c or -h
531 | is supplied.
532 | ...
533 | ```
534 |
535 |
536 |
537 | #### Creating empty file
538 |
539 | ```bash
540 | $ ls foo.txt
541 | ls: cannot access 'foo.txt': No such file or directory
542 | $ touch foo.txt
543 | $ ls foo.txt
544 | foo.txt
545 |
546 | $ # use -c if new file shouldn't be created
547 | $ rm foo.txt
548 | $ touch -c foo.txt
549 | $ ls foo.txt
550 | ls: cannot access 'foo.txt': No such file or directory
551 | ```
552 |
553 |
554 |
555 | #### Updating timestamps
556 |
557 | * Updating both access and modification timestamp to current time
558 |
559 | ```bash
560 | $ # last access time
561 | $ stat -c %x fruits.txt
562 | 2017-07-19 17:06:01.523308599 +0530
563 | $ # last modification time
564 | $ stat -c %y fruits.txt
565 | 2017-07-13 13:54:03.576055933 +0530
566 |
567 | $ touch fruits.txt
568 | $ stat -c %x fruits.txt
569 | 2017-07-21 10:11:44.241921229 +0530
570 | $ stat -c %y fruits.txt
571 | 2017-07-21 10:11:44.241921229 +0530
572 | ```
573 |
574 | * Updating only access or modification timestamp
575 |
576 | ```bash
577 | $ touch -a greeting.txt
578 | $ stat -c %x greeting.txt
579 | 2017-07-21 10:14:08.457268564 +0530
580 | $ stat -c %y greeting.txt
581 | 2017-07-13 13:54:26.004499660 +0530
582 |
583 | $ touch -m sample.txt
584 | $ stat -c %x sample.txt
585 | 2017-07-13 13:48:24.945450646 +0530
586 | $ stat -c %y sample.txt
587 | 2017-07-21 10:14:40.770006144 +0530
588 | ```
589 |
590 | * Using timestamp from another file to update
591 |
592 | ```bash
593 | $ stat -c $'%x\n%y' power.log report.log
594 | 2017-07-19 10:48:03.978295434 +0530
595 | 2017-07-14 20:50:42.850887578 +0530
596 | 2017-06-24 13:00:31.773583923 +0530
597 | 2017-06-24 12:59:53.316751651 +0530
598 |
599 | $ # copy both access and modification timestamp from power.log to report.log
600 | $ touch -r power.log report.log
601 | $ stat -c $'%x\n%y' report.log
602 | 2017-07-19 10:48:03.978295434 +0530
603 | 2017-07-14 20:50:42.850887578 +0530
604 |
605 | $ # add -a or -m options to limit to only access or modification timestamp
606 | ```
607 |
608 | * Using date string to update
609 | * See also `-t` option
610 |
611 | ```bash
612 | $ # add -a or -m as needed
613 | $ touch -d '2010-03-17 17:04:23' report.log
614 | $ stat -c $'%x\n%y' report.log
615 | 2010-03-17 17:04:23.000000000 +0530
616 | 2010-03-17 17:04:23.000000000 +0530
617 | ```
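
A quick round trip confirms the behavior: set a known timestamp with `-d`, then read it back with `stat` (both interpret times in the local timezone, so the values match):

```bash
cd "$(mktemp -d)"
touch report.log

# set both access and modification timestamps to a fixed date
touch -d '2010-03-17 17:04:23' report.log

# the date portion of the modification time matches what was set
stat -c %y report.log | cut -d' ' -f1
# 2010-03-17
```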
618 |
619 |
620 |
621 | #### Preserving timestamp
622 |
623 | * Commands that modify files, such as in-place editing, will update the timestamps
624 |
625 | ```bash
626 | $ stat -c $'%x\n%y' power.log
627 | 2017-07-21 11:11:42.862874240 +0530
628 | 2017-07-13 21:31:53.496323704 +0530
629 |
630 | $ sed -i 's/foo/bar/g' power.log
631 | $ stat -c $'%x\n%y' power.log
632 | 2017-07-21 11:12:20.303504336 +0530
633 | 2017-07-21 11:12:20.303504336 +0530
634 | ```
635 |
636 | * `touch` can be used to restore timestamps after processing
637 |
638 | ```bash
639 | $ # first copy the timestamps using touch -r
640 | $ stat -c $'%x\n%y' story.txt
641 | 2017-06-24 13:00:31.773583923 +0530
642 | 2017-06-24 12:59:53.316751651 +0530
643 | $ # tmp.txt is a temporary empty file
644 | $ touch -r story.txt tmp.txt
645 | $ stat -c $'%x\n%y' tmp.txt
646 | 2017-06-24 13:00:31.773583923 +0530
647 | 2017-06-24 12:59:53.316751651 +0530
648 |
649 | $ # after text processing, copy back the timestamps and remove temporary file
650 | $ sed -i 's/cat/dog/g' story.txt
651 | $ touch -r tmp.txt story.txt && rm tmp.txt
652 | $ stat -c $'%x\n%y' story.txt
653 | 2017-06-24 13:00:31.773583923 +0530
654 | 2017-06-24 12:59:53.316751651 +0530
655 | ```
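
The save-and-restore steps can be wrapped in a small helper; a sketch, where the function name `preserve_times` is illustrative rather than a standard utility, and `bash` is assumed:

```bash
# run a command, then restore the given file's original timestamps
preserve_times() {
    local file=$1 ref
    shift
    ref=$(mktemp)
    touch -r "$file" "$ref"    # save timestamps on a scratch file
    "$@"                       # run the command, which may modify the file
    touch -r "$ref" "$file"    # copy the saved timestamps back
    rm "$ref"
}

cd "$(mktemp -d)"
printf 'a cat here\n' > story.txt
touch -d '2017-06-24 13:00:31' story.txt

preserve_times story.txt sed -i 's/cat/dog/g' story.txt
cat story.txt
# a dog here
stat -c %y story.txt | cut -d' ' -f1
# 2017-06-24
```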
656 |
657 |
658 |
659 | #### Further reading for touch
660 |
661 | * `man touch` and `info touch` for more options and detailed documentation
662 | * [touch Q&A on unix stackexchange](https://unix.stackexchange.com/questions/tagged/touch?sort=votes&pageSize=15)
663 |
664 |
665 |
666 | ## file
667 |
668 | ```bash
669 | $ file --version | head -n1
670 | file-5.25
671 |
672 | $ man file
673 | FILE(1) BSD General Commands Manual FILE(1)
674 |
675 | NAME
676 | file — determine file type
677 |
678 | SYNOPSIS
679 | file [-bcEhiklLNnprsvzZ0] [--apple] [--extension] [--mime-encoding]
680 | [--mime-type] [-e testname] [-F separator] [-f namefile]
681 | [-m magicfiles] [-P name=value] file ...
682 | file -C [-m magicfiles]
683 | file [--help]
684 |
685 | DESCRIPTION
686 | This manual page documents version 5.25 of the file command.
687 |
688 | file tests each argument in an attempt to classify it. There are three
689 | sets of tests, performed in this order: filesystem tests, magic tests,
690 | and language tests. The first test that succeeds causes the file type to
691 | be printed.
692 | ...
693 | ```
694 |
695 |
696 |
697 |
698 |
699 | #### File type examples
700 |
701 | ```bash
702 | $ file sample.txt
703 | sample.txt: ASCII text
704 | $ # without file name in output
705 | $ file -b sample.txt
706 | ASCII text
707 |
708 | $ printf 'hi👍\n' | file -
709 | /dev/stdin: UTF-8 Unicode text
710 | $ printf 'hi👍\n' | file -i -
711 | /dev/stdin: text/plain; charset=utf-8
712 |
713 | $ file ch
714 | ch: Bourne-Again shell script, ASCII text executable
715 |
716 | $ file sunset.jpg moon.png
717 | sunset.jpg: JPEG image data
718 | moon.png: PNG image data, 32 x 32, 8-bit/color RGBA, non-interlaced
719 | ```
720 |
721 | * different line terminators
722 |
723 | ```bash
724 | $ printf 'hi' | file -
725 | /dev/stdin: ASCII text, with no line terminators
726 |
727 | $ printf 'hi\r' | file -
728 | /dev/stdin: ASCII text, with CR line terminators
729 |
730 | $ printf 'hi\r\n' | file -
731 | /dev/stdin: ASCII text, with CRLF line terminators
732 |
733 | $ printf 'hi\n' | file -
734 | /dev/stdin: ASCII text
735 | ```
736 |
737 | * find all files of a particular type in the current directory, for example `image` files
738 |
739 | ```bash
740 | $ find -type f -exec bash -c '(file -b "$0" | grep -wq "image data") && echo "$0"' {} \;
741 | ./sunset.jpg
742 | ./moon.png
743 |
744 | $ # if filenames do not contain : or newline characters
745 | $ find -type f -exec file {} + | awk -F: '/image data/{print $1}'
746 | ./sunset.jpg
747 | ./moon.png
748 | ```
749 |
750 |
751 |
752 | #### Further reading for file
753 |
754 | * `man file` and `info file` for more options and detailed documentation
755 | * See also `identify` command which `describes the format and characteristics of one or more image files`
756 |
--------------------------------------------------------------------------------
/images/color_option.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/learnbyexample/Command-line-text-processing/ce56c851f078469736bbe51a6938c21cc934022e/images/color_option.png
--------------------------------------------------------------------------------
/images/colordiff.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/learnbyexample/Command-line-text-processing/ce56c851f078469736bbe51a6938c21cc934022e/images/colordiff.png
--------------------------------------------------------------------------------
/images/highlight_string_whole_file_op.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/learnbyexample/Command-line-text-processing/ce56c851f078469736bbe51a6938c21cc934022e/images/highlight_string_whole_file_op.png
--------------------------------------------------------------------------------
/images/wdiff_to_colordiff.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/learnbyexample/Command-line-text-processing/ce56c851f078469736bbe51a6938c21cc934022e/images/wdiff_to_colordiff.png
--------------------------------------------------------------------------------
/miscellaneous.md:
--------------------------------------------------------------------------------
1 | # Miscellaneous
2 |
3 | **Table of Contents**
4 |
5 | * [cut](#cut)
6 | * [select specific fields](#select-specific-fields)
7 | * [suppressing lines without delimiter](#suppressing-lines-without-delimiter)
8 | * [specifying delimiters](#specifying-delimiters)
9 | * [complement](#complement)
10 | * [select specific characters](#select-specific-characters)
11 | * [Further reading for cut](#further-reading-for-cut)
12 | * [tr](#tr)
13 | * [translation](#translation)
14 | * [escape sequences and character classes](#escape-sequences-and-character-classes)
15 | * [deletion](#deletion)
16 | * [squeeze](#squeeze)
17 | * [Further reading for tr](#further-reading-for-tr)
18 | * [basename](#basename)
19 | * [dirname](#dirname)
20 | * [xargs](#xargs)
21 | * [seq](#seq)
22 | * [integer sequences](#integer-sequences)
23 | * [specifying separator](#specifying-separator)
24 | * [floating point sequences](#floating-point-sequences)
25 | * [Further reading for seq](#further-reading-for-seq)
26 |
27 |
28 |
29 | ## cut
30 |
31 | ```bash
32 | $ cut --version | head -n1
33 | cut (GNU coreutils) 8.25
34 |
35 | $ man cut
36 | CUT(1) User Commands CUT(1)
37 |
38 | NAME
39 | cut - remove sections from each line of files
40 |
41 | SYNOPSIS
42 | cut OPTION... [FILE]...
43 |
44 | DESCRIPTION
45 | Print selected parts of lines from each FILE to standard output.
46 |
47 | With no FILE, or when FILE is -, read standard input.
48 | ...
49 | ```
50 |
51 |
52 |
53 | #### select specific fields
54 |
55 | * Default delimiter is **tab** character
56 | * the `-f` option is used to print specific field(s) from each input line
57 |
58 | ```bash
59 | $ printf 'foo\tbar\t123\tbaz\n'
60 | foo bar 123 baz
61 |
62 | $ # single field
63 | $ printf 'foo\tbar\t123\tbaz\n' | cut -f2
64 | bar
65 |
66 | $ # multiple fields can be specified by using ,
67 | $ printf 'foo\tbar\t123\tbaz\n' | cut -f2,4
68 | bar baz
69 |
70 | $ # output is always in ascending order of field numbers
71 | $ printf 'foo\tbar\t123\tbaz\n' | cut -f3,1
72 | foo 123
73 |
74 | $ # range can be specified using -
75 | $ printf 'foo\tbar\t123\tbaz\n' | cut -f1-3
76 | foo bar 123
77 | $ # if ending number is omitted, select till last field
78 | $ printf 'foo\tbar\t123\tbaz\n' | cut -f3-
79 | 123 baz
80 | ```
81 |
82 |
83 |
84 | #### suppressing lines without delimiter
85 |
86 | ```bash
87 | $ cat marks.txt
88 | jan 2017
89 | foobar 12 45 23
90 | feb 2017
91 | foobar 18 38 19
92 |
93 | $ # by default lines without delimiter will be printed
94 | $ cut -f2- marks.txt
95 | jan 2017
96 | 12 45 23
97 | feb 2017
98 | 18 38 19
99 |
100 | $ # use -s option to suppress such lines
101 | $ cut -s -f2- marks.txt
102 | 12 45 23
103 | 18 38 19
104 | ```
105 |
106 |
107 |
108 | #### specifying delimiters
109 |
110 | * use `-d` option to specify input delimiter other than default **tab** character
111 | * only a single character can be used; for a multi-character or regex based delimiter, use `awk` or `perl`
112 |
113 | ```bash
114 | $ echo 'foo:bar:123:baz' | cut -d: -f3
115 | 123
116 |
117 | $ # by default output delimiter is same as input
118 | $ echo 'foo:bar:123:baz' | cut -d: -f1,4
119 | foo:baz
120 |
121 | $ # quote the delimiter character if it clashes with shell special characters
122 | $ echo 'one;two;three;four' | cut -d; -f3
123 | cut: option requires an argument -- 'd'
124 | Try 'cut --help' for more information.
125 | -f3: command not found
126 | $ echo 'one;two;three;four' | cut -d';' -f3
127 | three
128 | ```
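
A more realistic use of `-d` is pulling fields out of `:` separated records like `/etc/passwd`; a made-up sample is used here so the output is predictable:

```bash
cd "$(mktemp -d)"
printf '%s\n' 'root:x:0:0:root:/root:/bin/bash' \
              'learner:x:1000:1000::/home/learner:/bin/bash' > users.txt

# username and login shell - fields 1 and 7
cut -d: -f1,7 users.txt
# root:/bin/bash
# learner:/bin/bash
```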
129 |
130 | * use `--output-delimiter` option to specify different output delimiter
131 | * since this option accepts a string, more than one character can be specified
132 | * See also [using $ prefixed string](https://unix.stackexchange.com/questions/48106/what-does-it-mean-to-have-a-dollarsign-prefixed-string-in-a-script)
133 |
134 | ```bash
135 | $ printf 'foo\tbar\t123\tbaz\n' | cut --output-delimiter=: -f1-3
136 | foo:bar:123
137 |
138 | $ echo 'one;two;three;four' | cut -d';' --output-delimiter=' ' -f1,3-
139 | one three four
140 |
141 | $ # tested on bash, might differ with other shells
142 | $ echo 'one;two;three;four' | cut -d';' --output-delimiter=$'\t' -f1,3-
143 | one three four
144 |
145 | $ echo 'one;two;three;four' | cut -d';' --output-delimiter=' - ' -f1,3-
146 | one - three - four
147 | ```
148 |
149 |
150 |
151 | #### complement
152 |
153 | ```bash
154 | $ echo 'one;two;three;four' | cut -d';' -f1,3-
155 | one;three;four
156 |
157 | $ # to print other than specified fields
158 | $ echo 'one;two;three;four' | cut -d';' --complement -f2
159 | one;three;four
160 | ```
161 |
162 |
163 |
164 | #### select specific characters
165 |
166 | * similar to `-f` for field selection, use `-c` for character selection
167 | * See manual for what defines a character and differences between `-b` and `-c`
168 |
169 | ```bash
170 | $ echo 'foo:bar:123:baz' | cut -c4
171 | :
172 |
173 | $ printf 'foo\tbar\t123\tbaz\n' | cut -c1,4,7
174 | f r
175 |
176 | $ echo 'foo:bar:123:baz' | cut -c8-
177 | :123:baz
178 |
179 | $ echo 'foo:bar:123:baz' | cut --complement -c8-
180 | foo:bar
181 |
182 | $ echo 'foo:bar:123:baz' | cut -c1,6,7 --output-delimiter=' '
183 | f a r
184 |
185 | $ echo 'abcdefghij' | cut --output-delimiter='-' -c1-3,4-7,8-
186 | abc-defg-hij
187 |
188 | $ cut -c1-3 marks.txt
189 | jan
190 | foo
191 | feb
192 | foo
193 | ```
194 |
195 |
196 |
197 | #### Further reading for cut
198 |
199 | * `man cut` and `info cut` for more options and detailed documentation
200 | * [cut Q&A on unix stackexchange](https://unix.stackexchange.com/questions/tagged/cut?sort=votes&pageSize=15)
201 |
202 |
203 |
204 | ## tr
205 |
206 | ```bash
207 | $ tr --version | head -n1
208 | tr (GNU coreutils) 8.25
209 |
210 | $ man tr
211 | TR(1) User Commands TR(1)
212 |
213 | NAME
214 | tr - translate or delete characters
215 |
216 | SYNOPSIS
217 | tr [OPTION]... SET1 [SET2]
218 |
219 | DESCRIPTION
220 | Translate, squeeze, and/or delete characters from standard input, writ‐
221 | ing to standard output.
222 | ...
223 | ```
224 |
225 |
226 |
227 | #### translation
228 |
229 | * one-to-one mapping of characters, all occurrences are translated
230 | * as good practice, enclose the arguments in single quotes to avoid issues due to shell interpretation
231 |
232 | ```bash
233 | $ echo 'foo bar cat baz' | tr 'abc' '123'
234 | foo 21r 31t 21z
235 |
236 | $ # use - to represent a range in ascending order
237 | $ echo 'foo bar cat baz' | tr 'a-f' '1-6'
238 | 6oo 21r 31t 21z
239 |
240 | $ # changing case
241 | $ echo 'foo bar cat baz' | tr 'a-z' 'A-Z'
242 | FOO BAR CAT BAZ
243 | $ echo 'Hello World' | tr 'a-zA-Z' 'A-Za-z'
244 | hELLO wORLD
245 |
246 | $ echo 'foo;bar;baz' | tr ; :
247 | tr: missing operand
248 | Try 'tr --help' for more information.
249 | $ echo 'foo;bar;baz' | tr ';' ':'
250 | foo:bar:baz
251 | ```
252 |
253 | * rot13 example
254 |
255 | ```bash
256 | $ echo 'foo bar cat baz' | tr 'a-z' 'n-za-m'
257 | sbb one png onm
258 | $ echo 'sbb one png onm' | tr 'a-z' 'n-za-m'
259 | foo bar cat baz
260 |
261 | $ echo 'Hello World' | tr 'a-zA-Z' 'n-za-mN-ZA-M'
262 | Uryyb Jbeyq
263 | $ echo 'Uryyb Jbeyq' | tr 'a-zA-Z' 'n-za-mN-ZA-M'
264 | Hello World
265 | ```
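
Since the mapping is its own inverse, it is easily wrapped in a helper function; applying it twice gives back the original text (the function name `rot13` is just illustrative):

```bash
rot13() { tr 'a-zA-Z' 'n-za-mN-ZA-M'; }

echo 'Hello World' | rot13
# Uryyb Jbeyq
echo 'Hello World' | rot13 | rot13
# Hello World
```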
266 |
267 | * use shell input redirection for file input
268 |
269 | ```bash
270 | $ cat marks.txt
271 | jan 2017
272 | foobar 12 45 23
273 | feb 2017
274 | foobar 18 38 19
275 |
276 | $ tr 'a-z' 'A-Z' < marks.txt
277 | JAN 2017
278 | FOOBAR 12 45 23
279 | FEB 2017
280 | FOOBAR 18 38 19
281 | ```
282 |
283 | * if arguments are of different lengths
284 |
285 | ```bash
286 | $ # when second argument is longer, the extra characters are ignored
287 | $ echo 'foo bar cat baz' | tr 'abc' '1-9'
288 | foo 21r 31t 21z
289 |
290 | $ # when first argument is longer
291 | $ # the last character of second argument gets re-used
292 | $ echo 'foo bar cat baz' | tr 'a-z' '123'
293 | 333 213 313 213
294 |
295 | $ # use -t option to truncate first argument to same length as second
296 | $ echo 'foo bar cat baz' | tr -t 'a-z' '123'
297 | foo 21r 31t 21z
298 | ```
299 |
300 |
301 |
302 | #### escape sequences and character classes
303 |
304 | * Certain characters like newline, tab, etc can be represented using escape sequences or octal representation
305 | * Certain commonly useful groups of characters like alphabets, digits, punctuations etc have character class as shortcuts
306 | * See [gnu tr manual](http://www.gnu.org/software/coreutils/manual/html_node/Character-sets.html#Character-sets) for all escape sequences and character classes
307 |
308 | ```bash
309 | $ printf 'foo\tbar\t123\tbaz\n' | tr '\t' ':'
310 | foo:bar:123:baz
311 |
312 | $ echo 'foo:bar:123:baz' | tr ':' '\n'
313 | foo
314 | bar
315 | 123
316 | baz
317 | $ # makes it easier to transform
318 | $ echo 'foo:bar:123:baz' | tr ':' '\n' | pr -2ats'-'
319 | foo-bar
320 | 123-baz
321 |
322 | $ echo 'foo bar cat baz' | tr '[:lower:]' '[:upper:]'
323 | FOO BAR CAT BAZ
324 | ```
325 |
326 | * since `-` is used for character ranges, place it at the end to represent it literally
327 | * cannot be used at start of argument as it would get treated as option
328 | * or use `--` to indicate end of option processing
329 | * similarly, to represent `\` literally, use `\\`
330 |
331 | ```bash
332 | $ echo '/foo-bar/baz/report' | tr '-a-z' '_A-Z'
333 | tr: invalid option -- 'a'
334 | Try 'tr --help' for more information.
335 |
336 | $ echo '/foo-bar/baz/report' | tr 'a-z-' 'A-Z_'
337 | /FOO_BAR/BAZ/REPORT
338 |
339 | $ echo '/foo-bar/baz/report' | tr -- '-a-z' '_A-Z'
340 | /FOO_BAR/BAZ/REPORT
341 |
342 | $ echo '/foo-bar/baz/report' | tr '/-' '\\_'
343 | \foo_bar\baz\report
344 | ```
345 |
346 |
347 |
348 | #### deletion
349 |
350 | * use `-d` option to specify characters to be deleted
351 | * add complement option `-c` if it is easier to define which characters are to be retained
352 |
353 | ```bash
354 | $ echo '2017-03-21' | tr -d '-'
355 | 20170321
356 |
357 | $ echo 'Hi123 there. How a32re you' | tr -d '1-9'
358 | Hi there. How are you
359 |
360 | $ # delete all punctuation characters
361 | $ echo '"Foo1!", "Bar.", ":Baz:"' | tr -d '[:punct:]'
362 | Foo1 Bar Baz
363 |
364 | $ # deleting carriage return character
365 | $ cat -v greeting.txt
366 | Hi there^M
367 | How are you^M
368 | $ tr -d '\r' < greeting.txt | cat -v
369 | Hi there
370 | How are you
371 |
372 | $ # retain only alphabets, comma and newline characters
373 | $ echo '"Foo1!", "Bar.", ":Baz:"' | tr -cd '[:alpha:],\n'
374 | Foo,Bar,Baz
375 | ```
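
Combining `-d` and `-c` is a handy idiom for generating random strings: delete everything read from `/dev/urandom` except a chosen character set (assumes a Linux-like system where `/dev/urandom` exists):

```bash
# 12 random alphanumeric characters
tr -dc 'a-zA-Z0-9' < /dev/urandom | head -c 12; echo
```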
376 |
377 |
378 |
379 | #### squeeze
380 |
381 | * use `-s` to change consecutive repeated characters to a single copy of that character
382 |
383 | ```bash
384 | $ # only lower case alphabets
385 | $ echo 'FFoo seed 11233' | tr -s 'a-z'
386 | FFo sed 11233
387 |
388 | $ # alphabets and digits
389 | $ echo 'FFoo seed 11233' | tr -s '[:alnum:]'
390 | Fo sed 123
391 |
392 | $ # squeeze characters other than alphabets
393 | $ echo 'FFoo seed 11233' | tr -sc '[:alpha:]'
394 | FFoo seed 123
395 |
396 | $ # when translating, only characters present in the second argument are squeezed
397 | $ echo 'FFoo seed 11233' | tr -s 'A-Z' 'a-z'
398 | fo sed 11233
399 |
400 | $ # multiple consecutive horizontal spaces to single space
401 | $ printf 'foo\t\tbar \t123 baz\n'
402 | foo bar 123 baz
403 | $ printf 'foo\t\tbar \t123 baz\n' | tr -s '[:blank:]' ' '
404 | foo bar 123 baz
405 | ```
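
Squeezing the complement also enables a classic word-frequency pipeline: `-sc` converts every run of non-alphabetic characters into a single newline, leaving one word per line for `sort` and `uniq` to count:

```bash
echo 'the cat and the dog and the bird' |
    tr -sc 'a-zA-Z' '\n' | sort | uniq -c | sort -nr
# the most frequent words bubble to the top: 3 the, 2 and, ...
```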
406 |
407 |
408 |
409 | #### Further reading for tr
410 |
411 | * `man tr` and `info tr` for more options and detailed documentation
412 | * [tr Q&A on unix stackexchange](http://unix.stackexchange.com/questions/tagged/tr?sort=votes&pageSize=15)
413 |
414 |
415 |
416 | ## basename
417 |
418 | ```bash
419 | $ basename --version | head -n1
420 | basename (GNU coreutils) 8.25
421 |
422 | $ man basename
423 | BASENAME(1) User Commands BASENAME(1)
424 |
425 | NAME
426 | basename - strip directory and suffix from filenames
427 |
428 | SYNOPSIS
429 | basename NAME [SUFFIX]
430 | basename OPTION... NAME...
431 |
432 | DESCRIPTION
433 | Print NAME with any leading directory components removed. If speci‐
434 | fied, also remove a trailing SUFFIX.
435 | ...
436 | ```
437 |
438 |
439 |
440 | **Examples**
441 |
442 | ```bash
443 | $ # same as using pwd command
444 | $ echo "$PWD"
445 | /home/learnbyexample
446 |
447 | $ basename "$PWD"
448 | learnbyexample
449 |
450 | $ # use -a option if there are multiple arguments
451 | $ basename -a foo/a/report.log bar/y/power.log
452 | report.log
453 | power.log
454 |
455 | $ # use single quotes if arguments contain spaces or other special shell characters
456 | $ # use suffix option -s to strip file extension from filename
457 | $ basename -s '.log' '/home/learnbyexample/proj adder/power.log'
458 | power
459 | $ # -a is implied when using -s option
460 | $ basename -s'.log' foo/a/report.log bar/y/power.log
461 | report
462 | power
463 | ```
464 |
465 | * Can also use [Parameter expansion](http://mywiki.wooledge.org/BashFAQ/073) if working on file paths saved in variables
466 | * assumes `bash` shell and similar that support this feature
467 |
468 | ```bash
469 | $ # remove from start of string up to last /
470 | $ file='/home/learnbyexample/proj adder/power.log'
471 | $ basename "$file"
472 | power.log
473 | $ echo "${file##*/}"
474 | power.log
475 |
476 | $ t="${file##*/}"
477 | $ # remove .log from end of string
478 | $ echo "${t%.log}"
479 | power
480 | ```
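
The suffix stripping is handy for batch renaming; a sketch that renames `.log` files to `.txt` in a scratch directory:

```bash
cd "$(mktemp -d)"
touch report.log power.log

for f in *.log; do
    mv "$f" "$(basename "$f" .log).txt"
done

ls
# power.txt
# report.txt
```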
481 |
482 | * See `man basename` and `info basename` for detailed documentation
483 |
484 |
485 |
486 | ## dirname
487 |
488 | ```bash
489 | $ dirname --version | head -n1
490 | dirname (GNU coreutils) 8.25
491 |
492 | $ man dirname
493 | DIRNAME(1) User Commands DIRNAME(1)
494 |
495 | NAME
496 | dirname - strip last component from file name
497 |
498 | SYNOPSIS
499 | dirname [OPTION] NAME...
500 |
501 | DESCRIPTION
502 | Output each NAME with its last non-slash component and trailing slashes
503 | removed; if NAME contains no /'s, output '.' (meaning the current
504 | directory).
505 | ...
506 | ```
507 |
508 |
509 |
510 | **Examples**
511 |
512 | ```bash
513 | $ echo "$PWD"
514 | /home/learnbyexample
515 |
516 | $ dirname "$PWD"
517 | /home
518 |
519 | $ # use single quotes if arguments contain spaces or other special shell characters
520 | $ dirname '/home/learnbyexample/proj adder/power.log'
521 | /home/learnbyexample/proj adder
522 |
523 | $ # unlike basename, by default dirname handles multiple arguments
524 | $ dirname foo/a/report.log bar/y/power.log
525 | foo/a
526 | bar/y
527 |
528 | $ # if no / in argument, output is . to indicate current directory
529 | $ dirname power.log
530 | .
531 | ```
532 |
533 | * Use `$()` command substitution to further process output as needed
534 |
535 | ```bash
536 | $ dirname '/home/learnbyexample/proj adder/power.log'
537 | /home/learnbyexample/proj adder
538 |
539 | $ dirname "$(dirname '/home/learnbyexample/proj adder/power.log')"
540 | /home/learnbyexample
541 |
542 | $ basename "$(dirname '/home/learnbyexample/proj adder/power.log')"
543 | proj adder
544 | ```
545 |
546 | * Can also use [Parameter expansion](http://mywiki.wooledge.org/BashFAQ/073) if working on file paths saved in variables
547 | * assumes `bash` shell and similar that support this feature
548 |
549 | ```bash
550 | $ # remove from last / in the string to end of string
551 | $ file='/home/learnbyexample/proj adder/power.log'
552 | $ dirname "$file"
553 | /home/learnbyexample/proj adder
554 | $ echo "${file%/*}"
555 | /home/learnbyexample/proj adder
556 |
557 | $ # remove from second last / to end of string
558 | $ echo "${file%/*/*}"
559 | /home/learnbyexample
560 |
561 | $ # apply basename trick to get just directory name instead of full path
562 | $ t="${file%/*}"
563 | $ echo "${t##*/}"
564 | proj adder
565 | ```
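
A common scripting idiom built on `dirname` is making sure a file's parent directory exists before writing to it:

```bash
cd "$(mktemp -d)"
out='reports/2017/jun/power.log'

# create all missing parent directories, then the file itself
mkdir -p "$(dirname "$out")"
echo 'okay' > "$out"

ls "$(dirname "$out")"
# power.log
```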
566 |
567 | * See `man dirname` and `info dirname` for detailed documentation
568 |
569 |
570 |
571 | ## xargs
572 |
573 | ```bash
574 | $ xargs --version | head -n1
575 | xargs (GNU findutils) 4.7.0-git
576 |
577 | $ whatis xargs
578 | xargs (1) - build and execute command lines from standard input
579 |
580 | $ # from 'man xargs'
581 | This manual page documents the GNU version of xargs. xargs reads items
582 | from the standard input, delimited by blanks (which can be protected
583 | with double or single quotes or a backslash) or newlines, and executes
584 | the command (default is /bin/echo) one or more times with any initial-
585 | arguments followed by items read from standard input. Blank lines on
586 | the standard input are ignored.
587 | ```
588 |
589 | While `xargs` is [primarily used](https://unix.stackexchange.com/questions/24954/when-is-xargs-needed) for passing the output of a command or file contents to another command as input arguments and/or for parallel processing, it can be quite handy for certain text processing tasks since the default command is `echo`
590 |
591 | ```bash
592 | $ printf ' foo\t\tbar \t123 baz \n' | cat -e
593 | foo bar 123 baz $
594 | $ # tr helps to change consecutive blanks to single space
595 | $ # but what if blanks at start and end have to be removed as well?
596 | $ printf ' foo\t\tbar \t123 baz \n' | tr -s '[:blank:]' ' ' | cat -e
597 | foo bar 123 baz $
598 | $ # xargs does this by default
599 | $ printf ' foo\t\tbar \t123 baz \n' | xargs | cat -e
600 | foo bar 123 baz$
601 |
602 | $ # -n option limits number of arguments per line
603 | $ printf ' foo\t\tbar \t123 baz \n' | xargs -n2
604 | foo bar
605 | 123 baz
606 |
607 | $ # same as using: paste -d' ' - - -
608 | $ # or: pr -3ats' '
609 | $ seq 6 | xargs -n3
610 | 1 2 3
611 | 4 5 6
612 | ```
613 |
614 | * use `-a` option to specify file input instead of stdin
615 |
616 | ```bash
617 | $ cat marks.txt
618 | jan 2017
619 | foobar 12 45 23
620 | feb 2017
621 | foobar 18 38 19
622 |
623 | $ xargs -a marks.txt
624 | jan 2017 foobar 12 45 23 feb 2017 foobar 18 38 19
625 |
626 | $ # use -L option to limit max number of lines per command line
627 | $ xargs -L2 -a marks.txt
628 | jan 2017 foobar 12 45 23
629 | feb 2017 foobar 18 38 19
630 | ```
631 |
632 | * **Note** since `echo` is the command being executed, arguments that look like `echo` options will be interpreted as such
633 |
634 | ```bash
635 | $ printf ' -e foo\t\tbar \t123 baz \n' | xargs -n2
636 | foo
637 | bar 123
638 | baz
639 |
640 | $ # use -t option to see what is happening (verbose output)
641 | $ printf ' -e foo\t\tbar \t123 baz \n' | xargs -n2 -t
642 | echo -e foo
643 | foo
644 | echo bar 123
645 | bar 123
646 | echo baz
647 | baz
648 | ```
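
Blank-delimited splitting also breaks filenames containing spaces; when pairing `find` with `xargs`, the NUL-delimited `-print0`/`-0` combination keeps each name intact:

```bash
cd "$(mktemp -d)"
touch 'plain.txt' 'file with spaces.txt'

# default splitting treats the second name as three separate items
find . -type f | xargs -n1 | wc -l
# 4

# NUL-delimited names pass through unharmed
find . -type f -print0 | xargs -0 -n1 | wc -l
# 2
```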
649 |
650 | * See `man xargs` and `info xargs` for detailed documentation
651 |
652 |
653 |
654 | ## seq
655 |
656 | ```bash
657 | $ seq --version | head -n1
658 | seq (GNU coreutils) 8.25
659 |
660 | $ man seq
661 | SEQ(1) User Commands SEQ(1)
662 |
663 | NAME
664 | seq - print a sequence of numbers
665 |
666 | SYNOPSIS
667 | seq [OPTION]... LAST
668 | seq [OPTION]... FIRST LAST
669 | seq [OPTION]... FIRST INCREMENT LAST
670 |
671 | DESCRIPTION
672 | Print numbers from FIRST to LAST, in steps of INCREMENT.
673 | ...
674 | ```
675 |
676 |
677 |
678 | #### integer sequences
679 |
680 | * see `info seq` for details of how large numbers are handled
681 | * for ex: `seq 50000000000000000000 2 50000000000000000004` may not work
682 |
683 | ```bash
684 | $ # default start=1 and increment=1
685 | $ seq 3
686 | 1
687 | 2
688 | 3
689 |
690 | $ # default increment=1
691 | $ seq 25434 25437
692 | 25434
693 | 25435
694 | 25436
695 | 25437
696 | $ seq -5 -3
697 | -5
698 | -4
699 | -3
700 |
701 | $ # different increment value
702 | $ seq 1000 5 1011
703 | 1000
704 | 1005
705 | 1010
706 |
707 | $ # use negative increment for descending order
708 | $ seq 10 -5 -7
709 | 10
710 | 5
711 | 0
712 | -5
713 | ```
714 |
715 | * use `-w` option for leading zeros
716 | * largest length of start/end value is used to determine padding
717 |
718 | ```bash
719 | $ seq 008 010
720 | 8
721 | 9
722 | 10
723 |
724 | $ # or: seq -w 8 010
725 | $ seq -w 008 010
726 | 008
727 | 009
728 | 010
729 |
730 | $ seq -w 0003
731 | 0001
732 | 0002
733 | 0003
734 | ```
735 |
736 |
737 |
738 | #### specifying separator
739 |
740 | * As seen already, the default separator between numbers is a newline
741 | * the `-s` option allows using a custom string between numbers
742 | * A newline is always added at the end
743 |
744 | ```bash
745 | $ seq -s: 4
746 | 1:2:3:4
747 |
748 | $ seq -s' ' 4
749 | 1 2 3 4
750 |
751 | $ seq -s' - ' 4
752 | 1 - 2 - 3 - 4
753 | ```
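The trailing newline mentioned above can be verified with `cat -e`, which marks line endings with `$`:

```shell
# single line of output, terminated by a newline
seq -s: 4 | cat -e
# 1:2:3:4$
```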
754 |
755 |
756 |
757 | #### floating point sequences
758 |
759 | ```bash
760 | $ # default increment=1
761 | $ seq 0.5 2.5
762 | 0.5
763 | 1.5
764 | 2.5
765 |
766 | $ seq -s':' -2 0.75 3
767 | -2.00:-1.25:-0.50:0.25:1.00:1.75:2.50
768 |
769 | $ # Scientific notation is supported
770 | $ seq 1.2e2 1.22e2
771 | 120
772 | 121
773 | 122
774 | ```
775 |
776 | * formatting numbers, see `info seq` for details
777 |
778 | ```bash
779 | $ seq -f'%.3f' -s':' -2 0.75 3
780 | -2.000:-1.250:-0.500:0.250:1.000:1.750:2.500
781 |
782 | $ seq -f'%.3e' 1.2e2 1.22e2
783 | 1.200e+02
784 | 1.210e+02
785 | 1.220e+02
786 | ```
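`seq` output is commonly consumed by shell loops; a minimal sketch (bash's brace expansion `{2..6..2}` is an alternative when the bounds are fixed):

```shell
# iterate over even numbers from 2 to 6
for i in $(seq 2 2 6); do
    echo "item $i"
done
```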
787 |
788 |
789 |
790 | #### Further reading for seq
791 |
792 | * `man seq` and `info seq` for more options, corner cases and detailed documentation
793 | * [seq Q&A on unix stackexchange](https://unix.stackexchange.com/questions/tagged/seq?sort=votes&pageSize=15)
794 |
--------------------------------------------------------------------------------
/overview_presentation/baz.json:
--------------------------------------------------------------------------------
1 | {
2 | "abc": {
3 | "@attr": "good",
4 | "text": "Hi there"
5 | },
6 | "xyz": {
7 | "@attr": "bad",
8 | "text": "I am good. How are you?"
9 | }
10 | }
11 |
--------------------------------------------------------------------------------
/overview_presentation/cli_text_processing.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/learnbyexample/Command-line-text-processing/ce56c851f078469736bbe51a6938c21cc934022e/overview_presentation/cli_text_processing.pdf
--------------------------------------------------------------------------------
/overview_presentation/foo.xml:
--------------------------------------------------------------------------------
1 |
2 | Hi there
3 | I am good. How are you?
4 |
5 |
--------------------------------------------------------------------------------
/overview_presentation/greeting.txt:
--------------------------------------------------------------------------------
1 | Hi there
2 | Have a nice day
3 |
--------------------------------------------------------------------------------
/overview_presentation/sample.txt:
--------------------------------------------------------------------------------
1 | Hello World!
2 |
3 | Good day
4 | How do you do?
5 |
6 | Just do it
7 | Believe 42 it!
8 |
9 | Today is sunny
10 | Not a bit funny
11 | No doubt you like it too
12 |
13 | Much ado about nothing
14 | He he 123 he he
15 |
--------------------------------------------------------------------------------
/restructure_text.md:
--------------------------------------------------------------------------------
1 | # Restructure text
2 |
3 | **Table of Contents**
4 |
5 | * [paste](#paste)
6 | * [Concatenating files column wise](#concatenating-files-column-wise)
7 | * [Interleaving lines](#interleaving-lines)
8 | * [Lines to multiple columns](#lines-to-multiple-columns)
9 | * [Different delimiters between columns](#different-delimiters-between-columns)
10 | * [Multiple lines to single row](#multiple-lines-to-single-row)
11 | * [Further reading for paste](#further-reading-for-paste)
12 | * [column](#column)
13 | * [Pretty printing tables](#pretty-printing-tables)
14 | * [Specifying different input delimiter](#specifying-different-input-delimiter)
15 | * [Further reading for column](#further-reading-for-column)
16 | * [pr](#pr)
17 | * [Converting lines to columns](#converting-lines-to-columns)
18 | * [Changing PAGE_WIDTH](#changing-page_width)
19 | * [Combining multiple input files](#combining-multiple-input-files)
20 | * [Transposing a table](#transposing-a-table)
21 | * [Further reading for pr](#further-reading-for-pr)
22 | * [fold](#fold)
23 | * [Examples](#examples)
24 | * [Further reading for fold](#further-reading-for-fold)
25 |
26 |
27 |
28 | ## paste
29 |
30 | ```bash
31 | $ paste --version | head -n1
32 | paste (GNU coreutils) 8.25
33 |
34 | $ man paste
35 | PASTE(1) User Commands PASTE(1)
36 |
37 | NAME
38 | paste - merge lines of files
39 |
40 | SYNOPSIS
41 | paste [OPTION]... [FILE]...
42 |
43 | DESCRIPTION
44 | Write lines consisting of the sequentially corresponding lines from
45 | each FILE, separated by TABs, to standard output.
46 |
47 | With no FILE, or when FILE is -, read standard input.
48 | ...
49 | ```
50 |
51 |
52 |
53 | #### Concatenating files column wise
54 |
55 | * By default, `paste` adds a TAB between corresponding lines of input files
56 |
57 | ```bash
58 | $ paste colors_1.txt colors_2.txt
59 | Blue Black
60 | Brown Blue
61 | Purple Green
62 | Red Red
63 | Teal White
64 | ```
65 |
66 | * Specifying a different delimiter using `-d`
67 | * The `<()` syntax is [Process Substitution](http://mywiki.wooledge.org/ProcessSubstitution)
68 | * to put it simply - allows output of command to be passed as input file to another command without needing to manually create a temporary file
69 |
70 | ```bash
71 | $ paste -d, <(seq 5) <(seq 6 10)
72 | 1,6
73 | 2,7
74 | 3,8
75 | 4,9
76 | 5,10
77 |
78 | $ # empty cells if number of lines is not same for all input files
79 | $ # -d\| can also be used
80 | $ paste -d'|' <(seq 3) <(seq 4 6) <(seq 7 10)
81 | 1|4|7
82 | 2|5|8
83 | 3|6|9
84 | ||10
85 | ```
86 |
87 | * to paste without any character in between, use `\0` as delimiter
88 | * note that `\0` here doesn't mean the ASCII NUL character
89 | * can also use `-d ''` with `GNU paste`
90 |
91 | ```bash
92 | $ paste -d'\0' <(seq 3) <(seq 6 8)
93 | 16
94 | 27
95 | 38
96 | ```
97 |
98 |
99 |
100 | #### Interleaving lines
101 |
102 | * Interleave lines by using newline as delimiter
103 |
104 | ```bash
105 | $ paste -d'\n' <(seq 11 13) <(seq 101 103)
106 | 11
107 | 101
108 | 12
109 | 102
110 | 13
111 | 103
112 | ```
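The interleaving round-trips: feeding alternating lines back through `paste - -` recovers the two columns (a sketch using stdin):

```shell
# alternate lines back into two space-separated columns
printf '11\n101\n12\n102\n13\n103\n' | paste -d' ' - -
# 11 101
# 12 102
# 13 103
```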
113 |
114 |
115 |
116 | #### Lines to multiple columns
117 |
118 | * Number of `-` specified determines number of output columns
119 | * Input lines can be passed only as stdin
120 |
121 | ```bash
122 | $ # single column to two columns
123 | $ seq 10 | paste -d, - -
124 | 1,2
125 | 3,4
126 | 5,6
127 | 7,8
128 | 9,10
129 |
130 | $ # single column to five columns
131 | $ seq 10 | paste -d: - - - - -
132 | 1:2:3:4:5
133 | 6:7:8:9:10
134 |
135 | $ # input redirection for file input
136 | $ paste -d, - - < colors_1.txt
137 | Blue,Brown
138 | Purple,Red
139 | Teal,
140 | ```
141 |
142 | * Use a `printf` trick if the number of columns to specify is too large to type manually
143 |
144 | ```bash
145 | $ # prompt at end of line not shown for simplicity
146 | $ printf -- "- %.s" {1..5}
147 | - - - - -
148 |
149 | $ seq 10 | paste -d, $(printf -- "- %.s" {1..5})
150 | 1,2,3,4,5
151 | 6,7,8,9,10
152 | ```
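The column count can be parameterized too; a sketch with an illustrative variable `n` (word splitting of the unquoted `$dashes` supplies the separate `-` arguments):

```shell
n=4
# builds the string "- - - -" which expands to four - arguments
dashes=$(printf -- '- %.s' $(seq "$n"))
seq 8 | paste -d, $dashes
# 1,2,3,4
# 5,6,7,8
```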
153 |
154 |
155 |
156 | #### Different delimiters between columns
157 |
158 | * For more than 2 columns, different delimiter characters can be specified by passing a list to the `-d` option
159 |
160 | ```bash
161 | $ # , is used between 1st and 2nd column
162 | $ # - is used between 2nd and 3rd column
163 | $ paste -d',-' <(seq 3) <(seq 4 6) <(seq 7 9)
164 | 1,4-7
165 | 2,5-8
166 | 3,6-9
167 |
168 | $ # re-use list from beginning if not specified for all columns
169 | $ paste -d',-' <(seq 3) <(seq 4 6) <(seq 7 9) <(seq 10 12)
170 | 1,4-7,10
171 | 2,5-8,11
172 | 3,6-9,12
173 | $ # another example
174 | $ seq 10 | paste -d':,' - - - - -
175 | 1:2,3:4,5
176 | 6:7,8:9,10
177 |
178 | $ # so, with single delimiter, it is just re-used for all columns
179 | $ paste -d, <(seq 3) <(seq 4 6) <(seq 7 9) <(seq 10 12)
180 | 1,4,7,10
181 | 2,5,8,11
182 | 3,6,9,12
183 | ```
184 |
185 | * combination of `-d` and `/dev/null` (empty file) can give multi-character separation between columns
186 | * If this is too confusing to use, consider [pr](#pr) instead
187 |
188 | ```bash
189 | $ paste -d' : ' <(seq 3) /dev/null /dev/null <(seq 4 6) /dev/null /dev/null <(seq 7 9)
190 | 1 : 4 : 7
191 | 2 : 5 : 8
192 | 3 : 6 : 9
193 |
194 | $ # or just use pr instead
195 | $ pr -mts' : ' <(seq 3) <(seq 4 6) <(seq 7 9)
196 | 1 : 4 : 7
197 | 2 : 5 : 8
198 | 3 : 6 : 9
199 |
200 | $ # but paste would allow different delimiters ;)
201 | $ paste -d' : - ' <(seq 3) /dev/null /dev/null <(seq 4 6) /dev/null /dev/null <(seq 7 9)
202 | 1 : 4 - 7
203 | 2 : 5 - 8
204 | 3 : 6 - 9
205 |
206 | $ # pr would need two invocations
207 | $ pr -mts' : ' <(seq 3) <(seq 4 6) | pr -mts' - ' - <(seq 7 9)
208 | 1 : 4 - 7
209 | 2 : 5 - 8
210 | 3 : 6 - 9
211 | ```
212 |
213 | * example to show using empty file instead of `/dev/null`
214 |
215 | ```bash
216 | $ # assuming file named e doesn't exist
217 | $ touch e
218 | $ # or use this, will empty contents even if file named e already exists :P
219 | $ > e
220 |
221 | $ paste -d' : - ' <(seq 3) e e <(seq 4 6) e e <(seq 7 9)
222 | 1 : 4 - 7
223 | 2 : 5 - 8
224 | 3 : 6 - 9
225 | ```
226 |
227 |
228 |
229 | #### Multiple lines to single row
230 |
231 | ```bash
232 | $ paste -sd, colors_1.txt
233 | Blue,Brown,Purple,Red,Teal
234 |
235 | $ # multiple files each gets a row
236 | $ paste -sd: colors_1.txt colors_2.txt
237 | Blue:Brown:Purple:Red:Teal
238 | Black:Blue:Green:Red:White
239 |
240 | $ # multiple input files need not have same number of lines
241 | $ paste -sd, <(seq 3) <(seq 5 9)
242 | 1,2,3
243 | 5,6,7,8,9
244 | ```
245 |
246 | * Often used to serialize multiple line output from another command
247 |
248 | ```bash
249 | $ sort -u colors_1.txt colors_2.txt | paste -sd,
250 | Black,Blue,Brown,Green,Purple,Red,Teal,White
251 | ```
252 |
253 | * For a multi-character delimiter, post-process a single-character separator (if it doesn't occur in the data) or use another tool like `perl`
254 |
255 | ```bash
256 | $ seq 10 | paste -sd,
257 | 1,2,3,4,5,6,7,8,9,10
258 |
259 | $ # post-process
260 | $ seq 10 | paste -sd, | sed 's/,/ : /g'
261 | 1 : 2 : 3 : 4 : 5 : 6 : 7 : 8 : 9 : 10
262 |
263 | $ # using perl alone
264 | $ seq 10 | perl -pe 's/\n/ : / if(!eof)'
265 | 1 : 2 : 3 : 4 : 5 : 6 : 7 : 8 : 9 : 10
266 | ```
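An `awk` alternative (not from the original examples) prints the separator before every line except the first:

```shell
# NR==1 suppresses the separator before the first record
seq 10 | awk '{printf "%s%s", (NR==1 ? "" : " : "), $0} END{print ""}'
# 1 : 2 : 3 : 4 : 5 : 6 : 7 : 8 : 9 : 10
```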
267 |
268 |
269 |
270 | #### Further reading for paste
271 |
272 | * `man paste` and `info paste` for more options and detailed documentation
273 | * [paste Q&A on unix stackexchange](https://unix.stackexchange.com/questions/tagged/paste?sort=votes&pageSize=15)
274 |
275 |
276 |
277 | ## column
278 |
279 | ```bash
280 | COLUMN(1) BSD General Commands Manual COLUMN(1)
281 |
282 | NAME
283 | column — columnate lists
284 |
285 | SYNOPSIS
286 | column [-entx] [-c columns] [-s sep] [file ...]
287 |
288 | DESCRIPTION
289 | The column utility formats its input into multiple columns. Rows are
290 | filled before columns. Input is taken from file operands, or, by
291 | default, from the standard input. Empty lines are ignored unless the -e
292 | option is used.
293 | ...
294 | ```
295 |
296 |
297 |
298 | #### Pretty printing tables
299 |
300 | * by default whitespace is input delimiter
301 |
302 | ```bash
303 | $ cat dishes.txt
304 | North alootikki baati khichdi makkiroti poha
305 | South appam bisibelebath dosa koottu sevai
306 | West dhokla khakhra modak shiro vadapav
307 | East handoguri litti momo rosgulla shondesh
308 |
309 | $ column -t dishes.txt
310 | North alootikki baati khichdi makkiroti poha
311 | South appam bisibelebath dosa koottu sevai
312 | West dhokla khakhra modak shiro vadapav
313 | East handoguri litti momo rosgulla shondesh
314 | ```
315 |
316 | * often useful to get neatly aligned columns from output of another command
317 |
318 | ```bash
319 | $ paste fruits.txt price.txt
320 | Fruits Price
321 | apple 182
322 | guava 90
323 | watermelon 35
324 | banana 72
325 | pomegranate 280
326 |
327 | $ paste fruits.txt price.txt | column -t
328 | Fruits Price
329 | apple 182
330 | guava 90
331 | watermelon 35
332 | banana 72
333 | pomegranate 280
334 | ```
335 |
336 |
337 |
338 | #### Specifying different input delimiter
339 |
340 | * Use `-s` to specify input delimiter
341 | * Use `-n` to prevent merging empty cells
342 | * From `man column` "This option is a Debian GNU/Linux extension"
343 |
344 | ```bash
345 | $ paste -d, <(seq 3) <(seq 5 9) <(seq 11 13)
346 | 1,5,11
347 | 2,6,12
348 | 3,7,13
349 | ,8,
350 | ,9,
351 |
352 | $ paste -d, <(seq 3) <(seq 5 9) <(seq 11 13) | column -s, -t
353 | 1 5 11
354 | 2 6 12
355 | 3 7 13
356 | 8
357 | 9
358 |
359 | $ paste -d, <(seq 3) <(seq 5 9) <(seq 11 13) | column -s, -nt
360 | 1 5 11
361 | 2 6 12
362 | 3 7 13
363 | 8
364 | 9
365 | ```
366 |
367 |
368 |
369 | #### Further reading for column
370 |
371 | * `man column` for more options and detailed documentation
372 | * [column Q&A on unix stackexchange](https://unix.stackexchange.com/questions/tagged/columns?sort=votes&pageSize=15)
373 | * More examples [here](http://www.commandlinefu.com/commands/using/column/sort-by-votes)
374 |
375 |
376 |
377 | ## pr
378 |
379 | ```bash
380 | $ pr --version | head -n1
381 | pr (GNU coreutils) 8.25
382 |
383 | $ man pr
384 | PR(1) User Commands PR(1)
385 |
386 | NAME
387 | pr - convert text files for printing
388 |
389 | SYNOPSIS
390 | pr [OPTION]... [FILE]...
391 |
392 | DESCRIPTION
393 | Paginate or columnate FILE(s) for printing.
394 |
395 | With no FILE, or when FILE is -, read standard input.
396 | ...
397 | ```
398 |
399 | * Pagination is not covered here, the examples relate only to columnating
400 | * For example, default invocation on a file would add a header, etc
401 |
402 | ```bash
403 | $ # truncated output shown
404 | $ pr fruits.txt
405 |
406 |
407 | 2017-04-21 17:49 fruits.txt Page 1
408 |
409 |
410 | Fruits
411 | apple
412 | guava
413 | watermelon
414 | banana
415 | pomegranate
416 |
417 | ```
418 |
419 | * Following sections will use `-t` to omit page headers and trailers
420 |
421 |
422 |
423 | #### Converting lines to columns
424 |
425 | * With [paste](#lines-to-multiple-columns), changing input file rows to column(s) is possible only with consecutive lines
426 | * `pr` can do that as well as split the entire input itself according to the number of columns needed
427 | * And the `-s` option in `pr` allows a multi-character output delimiter
428 | * As usual, examples to better show the functionalities
429 |
430 | ```bash
431 | $ # note how the input got split into two and resulting splits joined by ,
432 | $ seq 6 | pr -2ts,
433 | 1,4
434 | 2,5
435 | 3,6
436 |
437 | $ # note how two consecutive lines gets joined by ,
438 | $ seq 6 | paste -d, - -
439 | 1,2
440 | 3,4
441 | 5,6
442 | ```
443 |
444 | * Default **PAGE_WIDTH** is 72 characters, so each column gets 72 divided by number of columns unless `-s` is used
445 |
446 | ```bash
447 | $ # 3 columns, so each column width is 24 characters
448 | $ seq 9 | pr -3t
449 | 1 4 7
450 | 2 5 8
451 | 3 6 9
452 |
453 | $ # using -s, desired delimiter can be specified
454 | $ seq 9 | pr -3ts' '
455 | 1 4 7
456 | 2 5 8
457 | 3 6 9
458 |
459 | $ seq 9 | pr -3ts' : '
460 | 1 : 4 : 7
461 | 2 : 5 : 8
462 | 3 : 6 : 9
463 |
464 | $ # default is TAB when using -s option with no arguments
465 | $ seq 9 | pr -3ts
466 | 1 4 7
467 | 2 5 8
468 | 3 6 9
469 | ```
470 |
471 | * Use `-a` to join consecutive input lines across columns, similar to `paste`
472 |
473 | ```bash
474 | $ seq 8 | pr -4ats:
475 | 1:2:3:4
476 | 5:6:7:8
477 |
478 | $ # no output delimiter for empty cells
479 | $ seq 22 | pr -5ats,
480 | 1,2,3,4,5
481 | 6,7,8,9,10
482 | 11,12,13,14,15
483 | 16,17,18,19,20
484 | 21,22
485 |
486 | $ # note output delimiter even for empty cells
487 | $ seq 22 | paste -d, - - - - -
488 | 1,2,3,4,5
489 | 6,7,8,9,10
490 | 11,12,13,14,15
491 | 16,17,18,19,20
492 | 21,22,,,
493 | ```
494 |
495 |
496 |
497 | #### Changing PAGE_WIDTH
498 |
499 | * The default PAGE_WIDTH is 72
500 | * The formula `(col-1)*len(delimiter) + col` seems to work in determining minimum PAGE_WIDTH required for multiple column output
501 | * `col` is number of columns required
502 |
503 | ```bash
504 | $ # (36-1)*1 + 36 = 71, so within PAGE_WIDTH limit
505 | $ seq 74 | pr -36ats,
506 | 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36
507 | 37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72
508 | 73,74
509 | $ # (37-1)*1 + 37 = 73, more than default PAGE_WIDTH limit
510 | $ seq 74 | pr -37ats,
511 | pr: page width too narrow
512 | ```
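The minimum width formula can be checked with shell arithmetic (illustrative variable names `col` and `len` for the column count and delimiter length):

```shell
col=36; len=1
echo $(( (col-1)*len + col ))
# 71
col=3; len=4
echo $(( (col-1)*len + col ))
# 11
```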
513 |
514 | * Use `-w` to specify a different PAGE_WIDTH
515 | * The `-J` option turns off truncation
516 |
517 | ```bash
518 | $ # (37-1)*1 + 37 = 73
519 | $ seq 74 | pr -J -w73 -37ats,
520 | 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37
521 | 38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74
522 |
523 | $ # (3-1)*4 + 3 = 11
524 | $ seq 6 | pr -J -w10 -3ats'::::'
525 | pr: page width too narrow
526 | $ seq 6 | pr -J -w11 -3ats'::::'
527 | 1::::2::::3
528 | 4::::5::::6
529 |
530 | $ # if calculating is difficult, simply use a large number
531 | $ seq 6 | pr -J -w500 -3ats'::::'
532 | 1::::2::::3
533 | 4::::5::::6
534 | ```
535 |
536 |
537 |
538 | #### Combining multiple input files
539 |
540 | * Use `-m` option to combine multiple files in parallel, similar to `paste`
541 |
542 | ```bash
543 | $ # 2 columns, so each column width is 36 characters
544 | $ pr -mt fruits.txt price.txt
545 | Fruits Price
546 | apple 182
547 | guava 90
548 | watermelon 35
549 | banana 72
550 | pomegranate 280
551 |
552 | $ # default is TAB when using -s option with no arguments
553 | $ pr -mts <(seq 3) <(seq 4 6) <(seq 7 10)
554 | 1 4 7
555 | 2 5 8
556 | 3 6 9
557 | 10
558 |
559 | $ # double TAB as separator
560 | $ # shell expands $'\t\t' before command is executed
561 | $ pr -mts$'\t\t' colors_1.txt colors_2.txt
562 | Blue Black
563 | Brown Blue
564 | Purple Green
565 | Red Red
566 | Teal White
567 | ```
568 |
569 | * For interleaving, specify newline as separator
570 |
571 | ```bash
572 | $ pr -mts$'\n' fruits.txt price.txt
573 | Fruits
574 | Price
575 | apple
576 | 182
577 | guava
578 | 90
579 | watermelon
580 | 35
581 | banana
582 | 72
583 | pomegranate
584 | 280
585 | ```
586 |
587 |
588 |
589 | #### Transposing a table
590 |
591 | ```bash
592 | $ # delimiter is single character, so easy to use tr to change it to newline
593 | $ cat dishes.txt
594 | North alootikki baati khichdi makkiroti poha
595 | South appam bisibelebath dosa koottu sevai
596 | West dhokla khakhra modak shiro vadapav
597 | East handoguri litti momo rosgulla shondesh
598 |
599 | $ # 4 columns, so each column width is 18 characters
600 | $ # $(wc -l < dishes.txt) gives number of columns required
601 | $ tr ' ' '\n' < dishes.txt | pr -$(wc -l < dishes.txt)t
602 | North South West East
603 | alootikki appam dhokla handoguri
604 | baati bisibelebath khakhra litti
605 | khichdi dosa modak momo
606 | makkiroti koottu shiro rosgulla
607 | poha sevai vadapav shondesh
608 | ```
609 |
610 | * Pipe the output to `column` if spacing is too much
611 |
612 | ```bash
613 | $ tr ' ' '\n' < dishes.txt | pr -$(wc -l < dishes.txt)t | column -t
614 | North South West East
615 | alootikki appam dhokla handoguri
616 | baati bisibelebath khakhra litti
617 | khichdi dosa modak momo
618 | makkiroti koottu shiro rosgulla
619 | poha sevai vadapav shondesh
620 | ```
621 |
622 |
623 |
624 | #### Further reading for pr
625 |
626 | * `man pr` and `info pr` for more options and detailed documentation
627 | * More examples [here](http://docstore.mik.ua/orelly/unix3/upt/ch21_15.htm)
628 |
629 |
630 |
631 | ## fold
632 |
633 | ```bash
634 | $ fold --version | head -n1
635 | fold (GNU coreutils) 8.25
636 |
637 | $ man fold
638 | FOLD(1) User Commands FOLD(1)
639 |
640 | NAME
641 | fold - wrap each input line to fit in specified width
642 |
643 | SYNOPSIS
644 | fold [OPTION]... [FILE]...
645 |
646 | DESCRIPTION
647 | Wrap input lines in each FILE, writing to standard output.
648 |
649 | With no FILE, or when FILE is -, read standard input.
650 | ...
651 | ```
652 |
653 |
654 |
655 | #### Examples
656 |
657 | ```bash
658 | $ nl story.txt
659 | 1 The princess of a far away land fought bravely to rescue a travelling group from bandits. And the happy story ends here. Have a nice day.
660 | 2 Still here? okay, read on: The prince of Happalakkahuhu wished he could be as brave as his sister and vowed to train harder
661 |
662 | $ # default folding width is 80
663 | $ fold story.txt
664 | The princess of a far away land fought bravely to rescue a travelling group from
665 | bandits. And the happy story ends here. Have a nice day.
666 | Still here? okay, read on: The prince of Happalakkahuhu wished he could be as br
667 | ave as his sister and vowed to train harder
668 |
669 | $ fold story.txt | nl
670 | 1 The princess of a far away land fought bravely to rescue a travelling group from
671 | 2 bandits. And the happy story ends here. Have a nice day.
672 | 3 Still here? okay, read on: The prince of Happalakkahuhu wished he could be as br
673 | 4 ave as his sister and vowed to train harder
674 | ```
675 |
676 | * `-s` option breaks at spaces to avoid word splitting
677 |
678 | ```bash
679 | $ fold -s story.txt
680 | The princess of a far away land fought bravely to rescue a travelling group
681 | from bandits. And the happy story ends here. Have a nice day.
682 | Still here? okay, read on: The prince of Happalakkahuhu wished he could be as
683 | brave as his sister and vowed to train harder
684 | ```
685 |
686 | * Use `-w` to change default width
687 |
688 | ```bash
689 | $ fold -s -w60 story.txt
690 | The princess of a far away land fought bravely to rescue a
691 | travelling group from bandits. And the happy story ends
692 | here. Have a nice day.
693 | Still here? okay, read on: The prince of Happalakkahuhu
694 | wished he could be as brave as his sister and vowed to
695 | train harder
696 | ```
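A related idiom: `-w1` folds a line into one character per line, handy as input for per-character processing:

```shell
echo 'hello' | fold -w1
# h
# e
# l
# l
# o
```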
697 |
698 |
699 |
700 | #### Further reading for fold
701 |
702 | * `man fold` and `info fold` for more options and detailed documentation
703 |
704 |
--------------------------------------------------------------------------------
/sorting_stuff.md:
--------------------------------------------------------------------------------
1 | # Sorting stuff
2 |
3 | **Table of Contents**
4 |
5 | * [sort](#sort)
6 | * [Default sort](#default-sort)
7 | * [Reverse sort](#reverse-sort)
8 | * [Various number sorting](#various-number-sorting)
9 | * [Random sort](#random-sort)
10 | * [Specifying output file](#specifying-output-file)
11 | * [Unique sort](#unique-sort)
12 | * [Column based sorting](#column-based-sorting)
13 | * [Further reading for sort](#further-reading-for-sort)
14 | * [uniq](#uniq)
15 | * [Default uniq](#default-uniq)
16 | * [Only duplicates](#only-duplicates)
17 | * [Only unique](#only-unique)
18 | * [Prefix count](#prefix-count)
19 | * [Ignoring case](#ignoring-case)
20 | * [Combining multiple files](#combining-multiple-files)
21 | * [Column options](#column-options)
22 | * [Further reading for uniq](#further-reading-for-uniq)
23 | * [comm](#comm)
24 | * [Default three column output](#default-three-column-output)
25 | * [Suppressing columns](#suppressing-columns)
26 | * [Files with duplicates](#files-with-duplicates)
27 | * [Further reading for comm](#further-reading-for-comm)
28 | * [shuf](#shuf)
29 | * [Random lines](#random-lines)
30 | * [Random integer numbers](#random-integer-numbers)
31 | * [Further reading for shuf](#further-reading-for-shuf)
32 |
33 |
34 |
35 | ## sort
36 |
37 | ```bash
38 | $ sort --version | head -n1
39 | sort (GNU coreutils) 8.25
40 |
41 | $ man sort
42 | SORT(1) User Commands SORT(1)
43 |
44 | NAME
45 | sort - sort lines of text files
46 |
47 | SYNOPSIS
48 | sort [OPTION]... [FILE]...
49 | sort [OPTION]... --files0-from=F
50 |
51 | DESCRIPTION
52 | Write sorted concatenation of all FILE(s) to standard output.
53 |
54 | With no FILE, or when FILE is -, read standard input.
55 | ...
56 | ```
57 |
58 | **Note**: All examples shown here assume ASCII encoded input files
59 |
60 |
61 |
62 |
63 | #### Default sort
64 |
65 | ```bash
66 | $ cat poem.txt
67 | Roses are red,
68 | Violets are blue,
69 | Sugar is sweet,
70 | And so are you.
71 |
72 | $ sort poem.txt
73 | And so are you.
74 | Roses are red,
75 | Sugar is sweet,
76 | Violets are blue,
77 | ```
78 |
79 | * Well, that was easy. The lines were sorted alphabetically (ascending order by default) and it so happened that first letter alone was enough to decide the order
80 | * For next example, let's extract all the words and sort them
81 |     * this also showcases `sort` accepting stdin
82 | * See [GNU grep](./gnu_grep.md) chapter if the `grep` command used below looks alien
83 |
84 | ```bash
85 | $ # output might differ depending on locale settings
86 | $ # note the case-insensitiveness of output
87 | $ grep -oi '[a-z]*' poem.txt | sort
88 | And
89 | are
90 | are
91 | are
92 | blue
93 | is
94 | red
95 | Roses
96 | so
97 | Sugar
98 | sweet
99 | Violets
100 | you
101 | ```
102 |
103 | * Heed the following documentation note about locale settings
104 | * See also
105 | * [arch wiki - locale](https://wiki.archlinux.org/index.php/locale)
106 | * [Linux: Define Locale and Language Settings](https://www.shellhacks.com/linux-define-locale-language-settings/)
107 |
108 | ```bash
109 | $ info sort | tail
110 |
111 | (1) If you use a non-POSIX locale (e.g., by setting ‘LC_ALL’ to
112 | ‘en_US’), then ‘sort’ may produce output that is sorted differently than
113 | you’re accustomed to. In that case, set the ‘LC_ALL’ environment
114 | variable to ‘C’. Note that setting only ‘LC_COLLATE’ has two problems.
115 | First, it is ineffective if ‘LC_ALL’ is also set. Second, it has
116 | undefined behavior if ‘LC_CTYPE’ (or ‘LANG’, if ‘LC_CTYPE’ is unset) is
117 | set to an incompatible value. For example, you get undefined behavior
118 | if ‘LC_CTYPE’ is ‘ja_JP.PCK’ but ‘LC_COLLATE’ is ‘en_US.UTF-8’.
119 | ```
120 |
121 | * Example to help show effect of locale setting
122 |
123 | ```bash
124 | $ # note how uppercase is sorted before lowercase
125 | $ grep -oi '[a-z]*' poem.txt | LC_ALL=C sort
126 | And
127 | Roses
128 | Sugar
129 | Violets
130 | are
131 | are
132 | are
133 | blue
134 | is
135 | red
136 | so
137 | sweet
138 | you
139 | ```
140 |
141 |
142 |
143 | #### Reverse sort
144 |
145 | * This is simply reversing from default ascending order to descending order
146 |
147 | ```bash
148 | $ sort -r poem.txt
149 | Violets are blue,
150 | Sugar is sweet,
151 | Roses are red,
152 | And so are you.
153 | ```
154 |
155 |
156 |
157 | #### Various number sorting
158 |
159 | ```bash
160 | $ cat numbers.txt
161 | 20
162 | 53
163 | 3
164 | 101
165 |
166 | $ sort numbers.txt
167 | 101
168 | 20
169 | 3
170 | 53
171 | ```
172 |
173 | * Whoops, what happened there? `sort` won't know to treat them as numbers unless specified
174 | * Depending on format of numbers, different options have to be used
175 | * First up is `-n` option, which sorts based on numerical value
176 |
177 | ```bash
178 | $ sort -n numbers.txt
179 | 3
180 | 20
181 | 53
182 | 101
183 |
184 | $ sort -nr numbers.txt
185 | 101
186 | 53
187 | 20
188 | 3
189 | ```
190 |
191 | * The `-n` option can handle negative numbers
192 | * As well as thousands separator and decimal point (depends on locale)
193 | * The `<()` syntax is [Process Substitution](http://mywiki.wooledge.org/ProcessSubstitution)
194 | * to put it simply - allows output of command to be passed as input file to another command without needing to manually create a temporary file
195 |
196 | ```bash
197 | $ # multiple files are merged as single input by default
198 | $ sort -n numbers.txt <(echo '-4')
199 | -4
200 | 3
201 | 20
202 | 53
203 | 101
204 |
205 | $ sort -n numbers.txt <(echo '1,234')
206 | 3
207 | 20
208 | 53
209 | 101
210 | 1,234
211 |
212 | $ sort -n numbers.txt <(echo '31.24')
213 | 3
214 | 20
215 | 31.24
216 | 53
217 | 101
218 | ```
219 |
220 | * Use `-g` if input contains numbers prefixed by `+` or [E scientific notation](https://en.wikipedia.org/wiki/Scientific_notation#E_notation)
221 |
222 | ```bash
223 | $ cat generic_numbers.txt
224 | +120
225 | -1.53
226 | 3.14e+4
227 | 42.1e-2
228 |
229 | $ sort -g generic_numbers.txt
230 | -1.53
231 | 42.1e-2
232 | +120
233 | 3.14e+4
234 | ```
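To see why `-g` is needed here, compare with `-n`, which stops parsing at the `e` and so compares only the mantissas (assuming a locale where `.` is the decimal point):

```shell
# -n reads 3.14e+4 as 3.14 and 42.1e-2 as 42.1
printf '3.14e+4\n42.1e-2\n' | sort -n
# 3.14e+4
# 42.1e-2
```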
235 |
236 | * Commands like `du` have options to display numbers in human readable formats
237 | * `sort` supports sorting such numbers using the `-h` option
238 |
239 | ```bash
240 | $ du -sh *
241 | 104K power.log
242 | 746M projects
243 | 316K report.log
244 | 20K sample.txt
245 | $ du -sh * | sort -h
246 | 20K sample.txt
247 | 104K power.log
248 | 316K report.log
249 | 746M projects
250 |
251 | $ # --si uses powers of 1000 instead of 1024
252 | $ du -s --si *
253 | 107k power.log
254 | 782M projects
255 | 324k report.log
256 | 21k sample.txt
257 | $ du -s --si * | sort -h
258 | 21k sample.txt
259 | 107k power.log
260 | 324k report.log
261 | 782M projects
262 | ```
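`-h` works on any input with such suffixes, not just `du` output, which is handy for quick checks:

```shell
printf '20K\n746M\n104K\n' | sort -h
# 20K
# 104K
# 746M
```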
263 |
264 | * Version sort - dealing with numbers mixed with other characters
265 | * If this sorting is needed simply while displaying directory contents, use `ls -v` instead of piping to `sort -V`
266 |
267 | ```bash
268 | $ cat versions.txt
269 | foo_v1.2
270 | bar_v2.1.3
271 | foobar_v2
272 | foo_v1.2.1
273 | foo_v1.3
274 |
275 | $ sort -V versions.txt
276 | bar_v2.1.3
277 | foobar_v2
278 | foo_v1.2
279 | foo_v1.2.1
280 | foo_v1.3
281 | ```
282 |
283 | * Another common use case is when there are multiple filenames differentiated by numbers
284 |
285 | ```bash
286 | $ cat files.txt
287 | file0
288 | file10
289 | file3
290 | file4
291 |
292 | $ sort -V files.txt
293 | file0
294 | file3
295 | file4
296 | file10
297 | ```
298 |
299 | * Can be used when dealing with numbers reported by `time` command as well
300 |
301 | ```bash
302 | $ # different solving durations
303 | $ cat rubik_time.txt
304 | 5m35.363s
305 | 3m20.058s
306 | 4m5.099s
307 | 4m1.130s
308 | 3m42.833s
309 | 4m33.083s
310 |
311 | $ # assuming consistent min/sec format
312 | $ sort -V rubik_time.txt
313 | 3m20.058s
314 | 3m42.833s
315 | 4m1.130s
316 | 4m5.099s
317 | 4m33.083s
318 | 5m35.363s
319 | ```
320 |
321 |
322 |
323 | #### Random sort
324 |
325 | * Note that duplicate lines will always end up next to each other
326 | * might be useful as a feature for some cases ;)
327 | * Use `shuf` if this is not desirable
328 | * See also [How can I shuffle the lines of a text file on the Unix command line or in a shell script?](https://stackoverflow.com/questions/2153882/how-can-i-shuffle-the-lines-of-a-text-file-on-the-unix-command-line-or-in-a-shel)
329 |
330 | ```bash
331 | $ cat nums.txt
332 | 1
333 | 10
334 | 10
335 | 12
336 | 23
337 | 563
338 |
339 | $ # the two 10s will always be next to each other
340 | $ sort -R nums.txt
341 | 563
342 | 12
343 | 1
344 | 10
345 | 10
346 | 23
347 |
348 | $ # duplicates can end up anywhere
349 | $ shuf nums.txt
350 | 10
351 | 23
352 | 1
353 | 10
354 | 563
355 | 12
356 | ```
357 |
358 |
359 |
360 | #### Specifying output file
361 |
362 | * The `-o` option can be used to specify the output file
363 | * Useful for in-place editing
364 |
365 | ```bash
366 | $ sort -R nums.txt -o rand_nums.txt
367 | $ cat rand_nums.txt
368 | 23
369 | 1
370 | 10
371 | 10
372 | 563
373 | 12
374 |
375 | $ sort -R nums.txt -o nums.txt
376 | $ cat nums.txt
377 | 563
378 | 23
379 | 10
380 | 10
381 | 1
382 | 12
383 | ```
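
* The reason `-o` is needed for in-place sorting: plain redirection truncates the input file before `sort` gets to read it. A sketch using a throwaway file:

```bash
$ printf '23\n5\n101\n' > tmp_nums.txt

$ # the shell truncates tmp_nums.txt before sort reads it
$ sort -n tmp_nums.txt > tmp_nums.txt
$ cat tmp_nums.txt

$ # sort reads the entire input before writing the file given to -o
$ printf '23\n5\n101\n' > tmp_nums.txt
$ sort -n tmp_nums.txt -o tmp_nums.txt
$ cat tmp_nums.txt
5
23
101
```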
384 |
385 | * Use shell script looping if there are multiple files to be sorted in place
386 | * The below snippet is for the `bash` shell
387 |
388 | ```bash
389 | $ for f in *.txt; do echo sort -V "$f" -o "$f"; done
390 | sort -V files.txt -o files.txt
391 | sort -V rubik_time.txt -o rubik_time.txt
392 | sort -V versions.txt -o versions.txt
393 |
394 | $ # remove echo once commands look fine
395 | $ for f in *.txt; do sort -V "$f" -o "$f"; done
396 | ```
397 |
398 |
399 |
400 | #### Unique sort
401 |
402 | * Keep only the first copy of lines that are deemed the same according to the `sort` options used
403 |
404 | ```bash
405 | $ cat duplicates.txt
406 | foo
407 | 12 carrots
408 | foo
409 | 12 apples
410 | 5 guavas
411 |
412 | $ # only one copy of foo in output
413 | $ sort -u duplicates.txt
414 | 12 apples
415 | 12 carrots
416 | 5 guavas
417 | foo
418 | ```
419 |
420 | * The definition of duplicate varies according to the options used
421 |     * For example, when `-n` is used, lines with matching numbers are deemed the same even if the rest of the line differs
422 | * Pipe the output to `uniq` if this is not desirable
423 |
424 | ```bash
425 | $ # note how first copy of line starting with 12 is retained
426 | $ sort -nu duplicates.txt
427 | foo
428 | 5 guavas
429 | 12 carrots
430 |
431 | $ # use uniq when entire line should be compared to find duplicates
432 | $ sort -n duplicates.txt | uniq
433 | foo
434 | 5 guavas
435 | 12 apples
436 | 12 carrots
437 | ```
438 |
439 | * Use the `-f` option to ignore case while determining duplicates
440 |
441 | ```bash
442 | $ cat words.txt
443 | CAR
444 | are
445 | car
446 | Are
447 | foot
448 | are
449 |
450 | $ # only the two 'are' were considered duplicates
451 | $ sort -u words.txt
452 | are
453 | Are
454 | car
455 | CAR
456 | foot
457 |
458 | $ # note again that first copy of duplicate is retained
459 | $ sort -fu words.txt
460 | are
461 | CAR
462 | foot
463 | ```
464 |
465 |
466 |
467 | #### Column based sorting
468 |
469 | From `info sort`
470 |
471 | ```
472 | ‘-k POS1[,POS2]’
473 | ‘--key=POS1[,POS2]’
474 | Specify a sort field that consists of the part of the line between
475 | POS1 and POS2 (or the end of the line, if POS2 is omitted),
476 | _inclusive_.
477 |
478 | Each POS has the form ‘F[.C][OPTS]’, where F is the number of the
479 | field to use, and C is the number of the first character from the
480 | beginning of the field. Fields and character positions are
481 | numbered starting with 1; a character position of zero in POS2
482 | indicates the field’s last character. If ‘.C’ is omitted from
483 | POS1, it defaults to 1 (the beginning of the field); if omitted
484 | from POS2, it defaults to 0 (the end of the field). OPTS are
485 | ordering options, allowing individual keys to be sorted according
486 | to different rules; see below for details. Keys can span multiple
487 | fields.
488 | ```
489 |
490 | * By default, blank characters (space and tab) serve as field separators
491 |
492 | ```bash
493 | $ cat fruits.txt
494 | apple 42
495 | guava 6
496 | fig 90
497 | banana 31
498 |
499 | $ sort fruits.txt
500 | apple 42
501 | banana 31
502 | fig 90
503 | guava 6
504 |
505 | $ # sort based on 2nd column numbers
506 | $ sort -k2,2n fruits.txt
507 | guava 6
508 | banana 31
509 | apple 42
510 | fig 90
511 | ```
512 |
513 | * Using a different field separator
514 | * Consider the following sample input file having fields separated by `:`
515 |
516 | ```bash
517 | $ # name:pet_name:no_of_pets
518 | $ cat pets.txt
519 | foo:dog:2
520 | xyz:cat:1
521 | baz:parrot:5
522 | abcd:cat:3
523 | joe:dog:1
524 | bar:fox:1
525 | temp_var:squirrel:4
526 | boss:dog:10
527 | ```
528 |
529 | * Sorting based on a particular column, or from a column to the end of line
530 |     * In case of ties, by default `sort` uses the remaining parts of the line to resolve them
531 |
532 | ```bash
533 | $ # only 2nd column
534 | $ # -k2,4 would mean 2nd column to 4th column
535 | $ sort -t: -k2,2 pets.txt
536 | abcd:cat:3
537 | xyz:cat:1
538 | boss:dog:10
539 | foo:dog:2
540 | joe:dog:1
541 | bar:fox:1
542 | baz:parrot:5
543 | temp_var:squirrel:4
544 |
545 | $ # from 2nd column to end of line
546 | $ sort -t: -k2 pets.txt
547 | xyz:cat:1
548 | abcd:cat:3
549 | joe:dog:1
550 | boss:dog:10
551 | foo:dog:2
552 | bar:fox:1
553 | baz:parrot:5
554 | temp_var:squirrel:4
555 | ```
556 |
557 | * Multiple keys can be specified to resolve ties
558 |     * Note that if ties remain even with the specified keys, the remaining parts of the lines would be used
559 |
560 | ```bash
561 | $ # default sort for 2nd column, numeric sort on 3rd column to resolve ties
562 | $ sort -t: -k2,2 -k3,3n pets.txt
563 | xyz:cat:1
564 | abcd:cat:3
565 | joe:dog:1
566 | foo:dog:2
567 | boss:dog:10
568 | bar:fox:1
569 | baz:parrot:5
570 | temp_var:squirrel:4
571 |
572 | $ # numeric sort on 3rd column, default sort for 2nd column to resolve ties
573 | $ sort -t: -k3,3n -k2,2 pets.txt
574 | xyz:cat:1
575 | joe:dog:1
576 | bar:fox:1
577 | foo:dog:2
578 | abcd:cat:3
579 | temp_var:squirrel:4
580 | baz:parrot:5
581 | boss:dog:10
582 | ```
583 |
584 | * Use the `-s` option to retain the original order of lines in case of a tie
585 |
586 | ```bash
587 | $ sort -s -t: -k2,2 pets.txt
588 | xyz:cat:1
589 | abcd:cat:3
590 | foo:dog:2
591 | joe:dog:1
592 | boss:dog:10
593 | bar:fox:1
594 | baz:parrot:5
595 | temp_var:squirrel:4
596 | ```
597 |
598 | * The `-u` option, as seen earlier, will retain only the first match
599 |
600 | ```bash
601 | $ sort -u -t: -k2,2 pets.txt
602 | xyz:cat:1
603 | foo:dog:2
604 | bar:fox:1
605 | baz:parrot:5
606 | temp_var:squirrel:4
607 |
608 | $ sort -u -t: -k3,3n pets.txt
609 | xyz:cat:1
610 | foo:dog:2
611 | abcd:cat:3
612 | temp_var:squirrel:4
613 | baz:parrot:5
614 | boss:dog:10
615 | ```
616 |
617 | * Sometimes, the input has to be sorted first and then `-u` used on the sorted output
618 | * See also [remove duplicates based on the value of another column](https://unix.stackexchange.com/questions/379835/remove-duplicates-based-on-the-value-of-another-column)
619 |
620 | ```bash
621 | $ # sort by number in 3rd column
622 | $ sort -t: -k3,3n pets.txt
623 | bar:fox:1
624 | joe:dog:1
625 | xyz:cat:1
626 | foo:dog:2
627 | abcd:cat:3
628 | temp_var:squirrel:4
629 | baz:parrot:5
630 | boss:dog:10
631 |
632 | $ # then get unique entry based on 2nd column
633 | $ sort -t: -k3,3n pets.txt | sort -t: -u -k2,2
634 | xyz:cat:1
635 | joe:dog:1
636 | bar:fox:1
637 | baz:parrot:5
638 | temp_var:squirrel:4
639 | ```
640 |
641 | * Specifying particular characters within fields
642 |     * If the character position is not specified, it defaults to `1` for the starting position and `0` (last character) for the ending position
643 |
644 | ```bash
645 | $ cat marks.txt
646 | fork,ap_12,54
647 | flat,up_342,1.2
648 | fold,tn_48,211
649 | more,ap_93,7
650 | rest,up_5,63
651 |
652 | $ # for 2nd column, sort numerically only from 4th character to end
653 | $ sort -t, -k2.4,2n marks.txt
654 | rest,up_5,63
655 | fork,ap_12,54
656 | fold,tn_48,211
657 | more,ap_93,7
658 | flat,up_342,1.2
659 |
660 | $ # sort uniquely based on first two characters of line
661 | $ sort -u -k1.1,1.2 marks.txt
662 | flat,up_342,1.2
663 | fork,ap_12,54
664 | more,ap_93,7
665 | rest,up_5,63
666 | ```
667 |
668 | * If there are headers
669 |
670 | ```bash
671 | $ cat header.txt
672 | fruit qty
673 | apple 42
674 | guava 6
675 | fig 90
676 | banana 31
677 |
678 | $ # separate and combine header and content to be sorted
679 | $ cat <(head -n1 header.txt) <(tail -n +2 header.txt | sort -k2nr)
680 | fruit qty
681 | fig 90
682 | apple 42
683 | banana 31
684 | guava 6
685 | ```
686 |
687 | * See also [sort by last field value when number of fields varies](https://stackoverflow.com/questions/3832068/bash-sort-text-file-by-last-field-value)
688 |
689 |
690 |
691 | #### Further reading for sort
692 |
693 | * There are many other options apart from the handful presented above. See `man sort` and `info sort` for detailed documentation and more examples
694 | * [sort like a master](http://www.skorks.com/2010/05/sort-files-like-a-master-with-the-linux-sort-command-bash/)
695 | * [When -b to ignore leading blanks is needed](https://unix.stackexchange.com/a/104527/109046)
696 | * [sort Q&A on unix stackexchange](https://unix.stackexchange.com/questions/tagged/sort?sort=votes&pageSize=15)
697 | * [sort on multiple columns using -k option](https://unix.stackexchange.com/questions/249452/unix-multiple-column-sort-issue)
698 | * [sort a string character wise](https://stackoverflow.com/questions/2373874/how-to-sort-characters-in-a-string)
699 | * [Scalability of 'sort -u' for gigantic files](https://unix.stackexchange.com/questions/279096/scalability-of-sort-u-for-gigantic-files)
700 |
701 |
702 |
703 | ## uniq
704 |
705 | ```bash
706 | $ uniq --version | head -n1
707 | uniq (GNU coreutils) 8.25
708 |
709 | $ man uniq
710 | UNIQ(1) User Commands UNIQ(1)
711 |
712 | NAME
713 | uniq - report or omit repeated lines
714 |
715 | SYNOPSIS
716 | uniq [OPTION]... [INPUT [OUTPUT]]
717 |
718 | DESCRIPTION
719 | Filter adjacent matching lines from INPUT (or standard input), writing
720 | to OUTPUT (or standard output).
721 |
722 | With no options, matching lines are merged to the first occurrence.
723 | ...
724 | ```
725 |
726 |
727 |
728 | #### Default uniq
729 |
730 | ```bash
731 | $ cat word_list.txt
732 | are
733 | are
734 | to
735 | good
736 | bad
737 | bad
738 | bad
739 | good
740 | are
741 | bad
742 |
743 | $ # adjacent duplicate lines are removed, leaving one copy
744 | $ uniq word_list.txt
745 | are
746 | to
747 | good
748 | bad
749 | good
750 | are
751 | bad
752 |
753 | $ # To remove duplicates from entire file, input has to be sorted first
754 | $ # also showcases that uniq accepts stdin as input
755 | $ sort word_list.txt | uniq
756 | are
757 | bad
758 | good
759 | to
760 | ```
761 |
762 |
763 |
764 | #### Only duplicates
765 |
766 | ```bash
767 | $ # duplicates adjacent to each other
768 | $ uniq -d word_list.txt
769 | are
770 | bad
771 |
772 | $ # duplicates in entire file
773 | $ sort word_list.txt | uniq -d
774 | are
775 | bad
776 | good
777 | ```
778 |
779 | * To show all the copies of duplicate lines, use the `-D` option
780 |
781 | ```bash
782 | $ uniq -D word_list.txt
783 | are
784 | are
785 | bad
786 | bad
787 | bad
788 |
789 | $ sort word_list.txt | uniq -D
790 | are
791 | are
792 | are
793 | bad
794 | bad
795 | bad
796 | bad
797 | good
798 | good
799 | ```
800 |
801 | * To distinguish between the different groups of duplicates
802 |
803 | ```bash
804 | $ # using --all-repeated=prepend will add a newline before the first group as well
805 | $ sort word_list.txt | uniq --all-repeated=separate
806 | are
807 | are
808 | are
809 |
810 | bad
811 | bad
812 | bad
813 | bad
814 |
815 | good
816 | good
817 | ```
818 |
819 |
820 |
821 | #### Only unique
822 |
823 | ```bash
824 | $ # lines with no adjacent duplicates
825 | $ uniq -u word_list.txt
826 | to
827 | good
828 | good
829 | are
830 | bad
831 |
832 | $ # unique lines in entire file
833 | $ sort word_list.txt | uniq -u
834 | to
835 | ```
836 |
837 |
838 |
839 | #### Prefix count
840 |
841 | ```bash
842 | $ # adjacent lines
843 | $ uniq -c word_list.txt
844 | 2 are
845 | 1 to
846 | 1 good
847 | 3 bad
848 | 1 good
849 | 1 are
850 | 1 bad
851 |
852 | $ # entire file
853 | $ sort word_list.txt | uniq -c
854 | 3 are
855 | 4 bad
856 | 2 good
857 | 1 to
858 |
859 | $ # entire file, only duplicates
860 | $ sort word_list.txt | uniq -cd
861 | 3 are
862 | 4 bad
863 | 2 good
864 | ```
865 |
866 | * Sorting by count
867 |
868 | ```bash
869 | $ # sort by count
870 | $ sort word_list.txt | uniq -c | sort -n
871 | 1 to
872 | 2 good
873 | 3 are
874 | 4 bad
875 |
876 | $ # reverse the order, highest count first
877 | $ sort word_list.txt | uniq -c | sort -nr
878 | 4 bad
879 | 3 are
880 | 2 good
881 | 1 to
882 | ```
883 |
884 | * To get only the entries with min/max count, a bit of [awk](./gnu_awk.md) magic would help
885 |
886 | ```bash
887 | $ # consider this result
888 | $ sort colors.txt | uniq -c | sort -nr
889 | 3 Red
890 | 3 Blue
891 | 2 Yellow
892 | 1 Green
893 | 1 Black
894 |
895 | $ # to get all max count
896 | $ # save 1st line 1st column value to c and then print if 1st column equals c
897 | $ sort colors.txt | uniq -c | sort -nr | awk 'NR==1{c=$1} $1==c'
898 | 3 Red
899 | 3 Blue
900 | $ # to get all min count
901 | $ sort colors.txt | uniq -c | sort -n | awk 'NR==1{c=$1} $1==c'
902 | 1 Black
903 | 1 Green
904 | ```
905 |
906 | * Get rough count of most used commands from `history` file
907 |
908 | ```bash
909 | $ # awk '{print $1}' will get the 1st column alone
910 | $ awk '{print $1}' "$HISTFILE" | sort | uniq -c | sort -nr | head
911 | 1465 echo
912 | 1180 grep
913 | 552 cd
914 | 531 awk
915 | 451 sed
916 | 423 vi
917 | 418 cat
918 | 392 perl
919 | 325 printf
920 | 320 sort
921 |
922 | $ # extract command name from start of line or preceded by 'spaces|spaces'
923 | $ # won't catch commands in other places like command substitution though
924 | $ grep -oP '(^| +\| +)\K[^ ]+' "$HISTFILE" | sort | uniq -c | sort -nr | head
925 | 2006 grep
926 | 1469 echo
927 | 933 sed
928 | 698 awk
929 | 552 cd
930 | 513 perl
931 | 510 cat
932 | 453 sort
933 | 423 vi
934 | 327 printf
935 | ```
936 |
937 |
938 |
939 | #### Ignoring case
940 |
941 | ```bash
942 | $ cat another_list.txt
943 | food
944 | Food
945 | good
946 | are
947 | bad
948 | Are
949 |
950 | $ # note how first copy is retained
951 | $ uniq -i another_list.txt
952 | food
953 | good
954 | are
955 | bad
956 | Are
957 |
958 | $ uniq -iD another_list.txt
959 | food
960 | Food
961 | ```
962 |
963 |
964 |
965 | #### Combining multiple files
966 |
967 | ```bash
968 | $ sort -f word_list.txt another_list.txt | uniq -i
969 | are
970 | bad
971 | food
972 | good
973 | to
974 |
975 | $ sort -f word_list.txt another_list.txt | uniq -c
976 | 4 are
977 | 1 Are
978 | 5 bad
979 | 1 food
980 | 1 Food
981 | 3 good
982 | 1 to
983 |
984 | $ sort -f word_list.txt another_list.txt | uniq -ic
985 | 5 are
986 | 5 bad
987 | 2 food
988 | 3 good
989 | 1 to
990 | ```
991 |
992 | * If only adjacent duplicates are required (without sorting), the files need to be concatenated using another command
993 |
994 | ```bash
995 | $ uniq -id word_list.txt
996 | are
997 | bad
998 |
999 | $ uniq -id another_list.txt
1000 | food
1001 |
1002 | $ cat word_list.txt another_list.txt | uniq -id
1003 | are
1004 | bad
1005 | food
1006 | ```
1007 |
1008 |
1009 |
1010 | #### Column options
1011 |
1012 | * `uniq` has a few options dealing with column manipulations. They are not as extensive as `sort -k`, but handy for some cases
1013 | * First up, skipping fields
1014 |     * No option to specify a different delimiter
1015 |     * From `info uniq`: Fields are sequences of non-space non-tab characters that are separated from each other by at least one space or tab
1016 |     * The number of spaces/tabs between fields should be the same
1017 |
1018 | ```bash
1019 | $ cat shopping.txt
1020 | lemon 5
1021 | mango 5
1022 | banana 8
1023 | bread 1
1024 | orange 5
1025 |
1026 | $ # skips first field
1027 | $ uniq -f1 shopping.txt
1028 | lemon 5
1029 | banana 8
1030 | bread 1
1031 | orange 5
1032 |
1033 | $ # use -f3 to skip first three fields and so on
1034 | ```
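
* Since there is no option to specify a delimiter, one workaround for delimited input (a sketch with made-up data) is to convert the delimiter to a space first

```bash
$ # bar:5 is deemed a duplicate of foo:5 based on the 2nd field
$ printf 'foo:5\nbar:5\nbaz:8\n' | tr ':' ' ' | uniq -f1
foo 5
baz 8
```

* Note that the output now has spaces instead of the original delimiter; converting back would need another `tr`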
1035 |
1036 | * Skipping characters
1037 |
1038 | ```bash
1039 | $ cat text
1040 | glue
1041 | blue
1042 | black
1043 | stack
1044 | stuck
1045 |
1046 | $ # don't consider first 2 characters
1047 | $ uniq -s2 text
1048 | glue
1049 | black
1050 | stuck
1051 |
1052 | $ # to visualize the above example
1053 | $ # assume there are two fields and uniq is applied on 2nd column
1054 | $ sed 's/^../& /' text
1055 | gl ue
1056 | bl ue
1057 | bl ack
1058 | st ack
1059 | st uck
1060 | ```
1061 |
1062 | * Comparing only up to a specified number of characters
1063 |
1064 | ```bash
1065 | $ # consider only first 2 characters
1066 | $ uniq -w2 text
1067 | glue
1068 | blue
1069 | stack
1070 |
1071 | $ # to visualize the above example
1072 | $ # assume there are two fields and uniq is applied on 1st column
1073 | $ sed 's/^../& /' text
1074 | gl ue
1075 | bl ue
1076 | bl ack
1077 | st ack
1078 | st uck
1079 | ```
1080 |
1081 | * Combining `-s` and `-w`
1082 | * Can be combined with `-f` as well
1083 |
1084 | ```bash
1085 | $ # skip first 3 characters and then use next 2 characters
1086 | $ uniq -s3 -w2 text
1087 | glue
1088 | black
1089 | ```
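
* A sketch of `-f` combined with `-w` on made-up data: fields are skipped first, then `-w` limits how many of the remaining characters are compared

```bash
$ # skip 1st field, then compare only first 2 characters of what remains
$ # note: the remainder starts with the blank before the 2nd field
$ printf '1 glue\n2 blue\n3 black\n' | uniq -f1 -w2
1 glue
2 blue
```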
1090 |
1091 |
1092 |
1093 |
1094 | #### Further reading for uniq
1095 |
1096 | * Do check out `man uniq` and `info uniq` for other options and more detailed documentation
1097 | * [uniq Q&A on unix stackexchange](http://unix.stackexchange.com/questions/tagged/uniq?sort=votes&pageSize=15)
1098 | * [process duplicate lines only based on certain fields](https://unix.stackexchange.com/questions/387590/print-the-duplicate-lines-only-on-fields-1-2-from-csv-file)
1099 |
1100 |
1101 |
1102 | ## comm
1103 |
1104 | ```bash
1105 | $ comm --version | head -n1
1106 | comm (GNU coreutils) 8.25
1107 |
1108 | $ man comm
1109 | COMM(1) User Commands COMM(1)
1110 |
1111 | NAME
1112 | comm - compare two sorted files line by line
1113 |
1114 | SYNOPSIS
1115 | comm [OPTION]... FILE1 FILE2
1116 |
1117 | DESCRIPTION
1118 | Compare sorted files FILE1 and FILE2 line by line.
1119 |
1120 | When FILE1 or FILE2 (not both) is -, read standard input.
1121 |
1122 | With no options, produce three-column output. Column one contains
1123 | lines unique to FILE1, column two contains lines unique to FILE2, and
1124 | column three contains lines common to both files.
1125 | ...
1126 | ```
1127 |
1128 |
1129 |
1130 | #### Default three column output
1131 |
1132 | Consider the below sample input files
1133 |
1134 | ```bash
1135 | $ # sorted input files viewed side by side
1136 | $ paste colors_1.txt colors_2.txt
1137 | Blue Black
1138 | Brown Blue
1139 | Purple Green
1140 | Red Red
1141 | Teal White
1142 | Yellow
1143 | ```
1144 |
1145 | * Without any option, `comm` gives 3 column output
1146 | * lines unique to first file
1147 | * lines unique to second file
1148 | * lines common to both files
1149 |
1150 | ```bash
1151 | $ comm colors_1.txt colors_2.txt
1152 | Black
1153 | Blue
1154 | Brown
1155 | Green
1156 | Purple
1157 | Red
1158 | Teal
1159 | White
1160 | Yellow
1161 | ```
1162 |
1163 |
1164 |
1165 | #### Suppressing columns
1166 |
1167 | * `-1` suppress lines unique to first file
1168 | * `-2` suppress lines unique to second file
1169 | * `-3` suppress lines common to both files
1170 |
1171 | ```bash
1172 | $ # suppressing column 3
1173 | $ comm -3 colors_1.txt colors_2.txt
1174 | Black
1175 | Brown
1176 | Green
1177 | Purple
1178 | Teal
1179 | White
1180 | Yellow
1181 | ```
1182 |
1183 | * Combining options gives three distinct and useful constructs
1184 | * First, getting only common lines to both files
1185 |
1186 | ```bash
1187 | $ comm -12 colors_1.txt colors_2.txt
1188 | Blue
1189 | Red
1190 | ```
1191 |
1192 | * Second, lines unique to first file
1193 |
1194 | ```bash
1195 | $ comm -23 colors_1.txt colors_2.txt
1196 | Brown
1197 | Purple
1198 | Teal
1199 | Yellow
1200 | ```
1201 |
1202 | * And the third, lines unique to second file
1203 |
1204 | ```bash
1205 | $ comm -13 colors_1.txt colors_2.txt
1206 | Black
1207 | Green
1208 | White
1209 | ```
1210 |
1211 | * See also how the above three cases can be done [using grep alone](./gnu_grep.md#search-strings-from-file)
1212 |     * **Note:** input files do not need to be sorted for the `grep` solution
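
* A sketch of those `grep` constructs using the same color files; `-F` treats patterns as fixed strings, `-x` matches whole lines and `-f` reads the patterns from a file

```bash
$ # common lines, like comm -12
$ grep -Fxf colors_2.txt colors_1.txt
Blue
Red

$ # lines unique to first file, like comm -23
$ grep -vFxf colors_2.txt colors_1.txt
Brown
Purple
Teal
Yellow

$ # lines unique to second file, like comm -13
$ grep -vFxf colors_1.txt colors_2.txt
Black
Green
White
```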
1213 |
1214 | If a `sort` order different from the default is required, use the `--nocheck-order` option to suppress the error message
1215 |
1216 | ```bash
1217 | $ comm -23 <(sort -n numbers.txt) <(sort -n nums.txt)
1218 | 3
1219 | comm: file 1 is not in sorted order
1220 | 20
1221 | 53
1222 | 101
1223 |
1224 | $ comm --nocheck-order -23 <(sort -n numbers.txt) <(sort -n nums.txt)
1225 | 3
1226 | 20
1227 | 53
1228 | 101
1229 | ```
1230 |
1231 |
1232 |
1233 | #### Files with duplicates
1234 |
1235 | * As many copies of a line as are present in both files will be considered common
1236 |     * The rest will be unique to the respective files
1237 | * This is useful for cases like finding lines present in the first file but not in the second, taking into consideration the count of duplicates as well
1238 |     * This solution won't be possible with `grep`
1239 |
1240 | ```bash
1241 | $ paste list1 list2
1242 | a a
1243 | a b
1244 | a c
1245 | b c
1246 | b d
1247 | c
1248 |
1249 | $ comm list1 list2
1250 | a
1251 | a
1252 | a
1253 | b
1254 | b
1255 | c
1256 | c
1257 | d
1258 |
1259 | $ comm -23 list1 list2
1260 | a
1261 | a
1262 | b
1263 | ```
1264 |
1265 |
1266 |
1267 | #### Further reading for comm
1268 |
1269 | * `man comm` and `info comm` for more options and detailed documentation
1270 | * [comm Q&A on unix stackexchange](http://unix.stackexchange.com/questions/tagged/comm?sort=votes&pageSize=15)
1271 |
1272 |
1273 |
1274 | ## shuf
1275 |
1276 | ```bash
1277 | $ shuf --version | head -n1
1278 | shuf (GNU coreutils) 8.25
1279 |
1280 | $ man shuf
1281 | SHUF(1) User Commands SHUF(1)
1282 |
1283 | NAME
1284 | shuf - generate random permutations
1285 |
1286 | SYNOPSIS
1287 | shuf [OPTION]... [FILE]
1288 | shuf -e [OPTION]... [ARG]...
1289 | shuf -i LO-HI [OPTION]...
1290 |
1291 | DESCRIPTION
1292 | Write a random permutation of the input lines to standard output.
1293 |
1294 | With no FILE, or when FILE is -, read standard input.
1295 | ...
1296 | ```
1297 |
1298 |
1299 |
1300 | #### Random lines
1301 |
1302 | * Without repeating input lines
1303 |
1304 | ```bash
1305 | $ cat nums.txt
1306 | 1
1307 | 10
1308 | 10
1309 | 12
1310 | 23
1311 | 563
1312 |
1313 | $ # duplicates can end up anywhere
1314 | $ # all lines are part of output
1315 | $ shuf nums.txt
1316 | 10
1317 | 23
1318 | 1
1319 | 10
1320 | 563
1321 | 12
1322 |
1323 | $ # limit max number of output lines
1324 | $ shuf -n2 nums.txt
1325 | 563
1326 | 23
1327 | ```
1328 |
1329 | * Use the `-o` option to specify an output file name instead of displaying on stdout
1330 |     * Helpful for in-place editing
1331 |
1332 | ```bash
1333 | $ shuf nums.txt -o nums.txt
1334 | $ cat nums.txt
1335 | 10
1336 | 12
1337 | 23
1338 | 10
1339 | 563
1340 | 1
1341 | ```
1342 |
1343 | * With repeated input lines
1344 |
1345 | ```bash
1346 | $ # -n3 for max 3 lines, -r allows input lines to be repeated
1347 | $ shuf -n3 -r nums.txt
1348 | 1
1349 | 1
1350 | 563
1351 |
1352 | $ seq 3 | shuf -n5 -r
1353 | 2
1354 | 1
1355 | 2
1356 | 1
1357 | 2
1358 |
1359 | $ # if a limit using -n is not specified, shuf will output lines indefinitely
1360 | ```
1361 |
1362 | * Use the `-e` option to specify input lines from the command line itself
1363 |
1364 | ```bash
1365 | $ shuf -e red blue green
1366 | green
1367 | blue
1368 | red
1369 |
1370 | $ shuf -e 'hi there' 'hello world' foo bar
1371 | bar
1372 | hi there
1373 | foo
1374 | hello world
1375 |
1376 | $ shuf -n2 -e 'hi there' 'hello world' foo bar
1377 | foo
1378 | hi there
1379 |
1380 | $ shuf -r -n4 -e foo bar
1381 | foo
1382 | foo
1383 | bar
1384 | foo
1385 | ```
1386 |
1387 |
1388 |
1389 | #### Random integer numbers
1390 |
1391 | * The `-i` option accepts an integer range as input to be shuffled
1392 |
1393 | ```bash
1394 | $ shuf -i 3-8
1395 | 3
1396 | 7
1397 | 6
1398 | 4
1399 | 8
1400 | 5
1401 | ```
1402 |
1403 | * Combine with other options as needed
1404 |
1405 | ```bash
1406 | $ shuf -n3 -i 3-8
1407 | 5
1408 | 4
1409 | 7
1410 |
1411 | $ shuf -r -n4 -i 3-8
1412 | 5
1413 | 5
1414 | 7
1415 | 8
1416 |
1417 | $ shuf -r -n5 -i 0-1
1418 | 1
1419 | 0
1420 | 0
1421 | 1
1422 | 1
1423 | ```
1424 |
1425 | * Use [seq](./miscellaneous.md#seq) as input if negative numbers, floating point, etc. are needed
1426 |
1427 | ```bash
1428 | $ seq 2 -1 -2 | shuf
1429 | 2
1430 | -1
1431 | -2
1432 | 0
1433 | 1
1434 |
1435 | $ seq 0.3 0.1 0.7 | shuf -n3
1436 | 0.4
1437 | 0.5
1438 | 0.7
1439 | ```
1440 |
1441 |
1442 |
1443 |
1444 | #### Further reading for shuf
1445 |
1446 | * `man shuf` and `info shuf` for more options and detailed documentation
1447 | * [Generate random numbers in specific range](https://unix.stackexchange.com/questions/140750/generate-random-numbers-in-specific-range)
1448 | * [Variable - randomly choose among three numbers](https://unix.stackexchange.com/questions/330689/variable-randomly-chosen-among-three-numbers-10-100-and-1000)
1449 | * Related to 'random' stuff:
1450 | * [How to generate a random string?](https://unix.stackexchange.com/questions/230673/how-to-generate-a-random-string)
1451 | * [How can I populate a file with random data?](https://unix.stackexchange.com/questions/33629/how-can-i-populate-a-file-with-random-data)
1452 | * [Run commands at random](https://unix.stackexchange.com/questions/81566/run-commands-at-random)
1453 |
1454 |
--------------------------------------------------------------------------------
/tail_less_cat_head.md:
--------------------------------------------------------------------------------
1 | # Cat, Less, Tail and Head
2 |
3 | **Table of Contents**
4 |
5 | * [cat](#cat)
6 | * [Concatenate files](#concatenate-files)
7 | * [Accepting input from stdin](#accepting-input-from-stdin)
8 | * [Squeeze consecutive empty lines](#squeeze-consecutive-empty-lines)
9 | * [Prefix line numbers](#prefix-line-numbers)
10 | * [Viewing special characters](#viewing-special-characters)
11 | * [Writing text to file](#writing-text-to-file)
12 | * [tac](#tac)
13 | * [Useless use of cat](#useless-use-of-cat)
14 | * [Further Reading for cat](#further-reading-for-cat)
15 | * [less](#less)
16 | * [Navigation commands](#navigation-commands)
17 | * [Further Reading for less](#further-reading-for-less)
18 | * [tail](#tail)
19 | * [linewise tail](#linewise-tail)
20 | * [characterwise tail](#characterwise-tail)
21 | * [multiple file input for tail](#multiple-file-input-for-tail)
22 | * [Further Reading for tail](#further-reading-for-tail)
23 | * [head](#head)
24 | * [linewise head](#linewise-head)
25 | * [characterwise head](#characterwise-head)
26 | * [multiple file input for head](#multiple-file-input-for-head)
27 | * [combining head and tail](#combining-head-and-tail)
28 | * [Further Reading for head](#further-reading-for-head)
29 | * [Text Editors](#text-editors)
30 |
31 |
32 |
33 | ## cat
34 |
35 | ```bash
36 | $ cat --version | head -n1
37 | cat (GNU coreutils) 8.25
38 |
39 | $ man cat
40 | CAT(1) User Commands CAT(1)
41 |
42 | NAME
43 | cat - concatenate files and print on the standard output
44 |
45 | SYNOPSIS
46 | cat [OPTION]... [FILE]...
47 |
48 | DESCRIPTION
49 | Concatenate FILE(s) to standard output.
50 |
51 | With no FILE, or when FILE is -, read standard input.
52 | ...
53 | ```
54 |
55 | * For the below examples, the `marks_201*` files contain 3 fields delimited by TAB
56 | * To avoid formatting issues, TAB has been converted to spaces using `col -x` while pasting the output here
57 |
58 |
59 |
60 | #### Concatenate files
61 |
62 | * One or more files can be given as input, and hence `cat` is often used to quickly view the contents of a small file on the terminal
63 | * To save the output of concatenation, just redirect stdout
64 |
65 | ```bash
66 | $ ls
67 | marks_2015.txt marks_2016.txt marks_2017.txt
68 |
69 | $ cat marks_201*
70 | Name Maths Science
71 | foo 67 78
72 | bar 87 85
73 | Name Maths Science
74 | foo 70 75
75 | bar 85 88
76 | Name Maths Science
77 | foo 68 76
78 | bar 90 90
79 |
80 | $ # save stdout to a file
81 | $ cat marks_201* > all_marks.txt
82 | ```
83 |
84 |
85 |
86 | #### Accepting input from stdin
87 |
88 | ```bash
89 | $ # combining input from stdin and other files
90 | $ printf 'Name\tMaths\tScience \nbaz\t56\t63\nbak\t71\t65\n' | cat - marks_2015.txt
91 | Name Maths Science
92 | baz 56 63
93 | bak 71 65
94 | Name Maths Science
95 | foo 67 78
96 | bar 87 85
97 |
98 | $ # - can be placed in whatever order is required
99 | $ printf 'Name\tMaths\tScience \nbaz\t56\t63\nbak\t71\t65\n' | cat marks_2015.txt -
100 | Name Maths Science
101 | foo 67 78
102 | bar 87 85
103 | Name Maths Science
104 | baz 56 63
105 | bak 71 65
106 | ```
107 |
108 |
109 |
110 | #### Squeeze consecutive empty lines
111 |
112 | ```bash
113 | $ printf 'hello\n\n\nworld\n\nhave a nice day\n'
114 | hello
115 |
116 |
117 | world
118 |
119 | have a nice day
120 | $ printf 'hello\n\n\nworld\n\nhave a nice day\n' | cat -s
121 | hello
122 |
123 | world
124 |
125 | have a nice day
126 | ```
127 |
128 |
129 |
130 | #### Prefix line numbers
131 |
132 | ```bash
133 | $ # number all lines
134 | $ cat -n marks_201*
135 | 1 Name Maths Science
136 | 2 foo 67 78
137 | 3 bar 87 85
138 | 4 Name Maths Science
139 | 5 foo 70 75
140 | 6 bar 85 88
141 | 7 Name Maths Science
142 | 8 foo 68 76
143 | 9 bar 90 90
144 |
145 | $ # number only non-empty lines
146 | $ printf 'hello\n\n\nworld\n\nhave a nice day\n' | cat -sb
147 | 1 hello
148 |
149 | 2 world
150 |
151 | 3 have a nice day
152 | ```
153 |
154 | * For more numbering options, check out the command `nl`
155 |
156 | ```bash
157 | $ whatis nl
158 | nl (1) - number lines of files
159 | ```
160 |
161 |
162 |
163 | #### Viewing special characters
164 |
165 | * End of line is identified by `$`
166 |     * Useful, for example, to spot trailing spaces
167 |
168 | ```bash
169 | $ cat -E marks_2015.txt
170 | Name Maths Science $
171 | foo 67 78$
172 | bar 87 85$
173 | ```
174 |
175 | * TAB is identified by `^I`
176 |
177 | ```bash
178 | $ cat -T marks_2015.txt
179 | Name^IMaths^IScience
180 | foo^I67^I78
181 | bar^I87^I85
182 | ```
183 |
184 | * Non-printing characters
185 | * See [Show Non-Printing Characters](http://docstore.mik.ua/orelly/unix/upt/ch25_07.htm) for more detailed info
186 |
187 | ```bash
188 | $ # NUL character
189 | $ printf 'foo\0bar\0baz\n' | cat -v
190 | foo^@bar^@baz
191 |
192 | $ # to check for dos-style line endings
193 | $ printf 'Hello World!\r\n' | cat -v
194 | Hello World!^M
195 |
196 | $ printf 'Hello World!\r\n' | dos2unix | cat -v
197 | Hello World!
198 | ```
199 |
200 | * the `-A` option is equivalent to `-vET`
201 | * the `-e` option is equivalent to `-vE`
202 | * If `dos2unix` and `unix2dos` are not available, see [How to convert DOS/Windows newline (CRLF) to Unix newline (\n)](https://stackoverflow.com/questions/2613800/how-to-convert-dos-windows-newline-crlf-to-unix-newline-n-in-a-bash-script)
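
* For instance, a common `sed` alternative (a sketch) is deleting the trailing carriage return

```bash
$ printf 'Hello World!\r\n' | sed 's/\r$//' | cat -v
Hello World!
```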
203 |
204 |
205 |
206 | #### Writing text to file
207 |
208 | ```bash
209 | $ cat > sample.txt
210 | This is an example of adding text to a new file using cat command.
211 | Press Ctrl+d on a newline to save and quit.
212 |
213 | $ cat sample.txt
214 | This is an example of adding text to a new file using cat command.
215 | Press Ctrl+d on a newline to save and quit.
216 | ```
217 |
218 | * See also how to use [heredoc](http://mywiki.wooledge.org/HereDocument)
219 | * [How can I write a here doc to a file](https://stackoverflow.com/questions/2953081/how-can-i-write-a-here-doc-to-a-file-in-bash-script)
220 | * See also [difference between Ctrl+c and Ctrl+d to signal end of stdin input in bash](https://unix.stackexchange.com/questions/16333/how-to-signal-the-end-of-stdin-input-in-bash)
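
* A minimal heredoc sketch for comparison; quoting the delimiter prevents expansions like `$USER` inside the body

```bash
$ cat > sample.txt << 'EOF'
hello world
have a nice day
EOF
$ cat sample.txt
hello world
have a nice day
```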
221 |
222 |
223 |
224 | #### tac
225 |
226 | ```bash
227 | $ whatis tac
228 | tac (1) - concatenate and print files in reverse
229 | $ tac --version | head -n1
230 | tac (GNU coreutils) 8.25
231 |
232 | $ seq 3 | tac
233 | 3
234 | 2
235 | 1
236 |
237 | $ tac marks_2015.txt
238 | bar 87 85
239 | foo 67 78
240 | Name Maths Science
241 | ```
242 |
243 | * Useful in cases where the logic is easier to write when working on a reversed file
244 |     * Consider this made-up log file with many **Warning** lines, where only the lines from the last such **Warning** up to the **Error** line have to be extracted
245 | * See [GNU sed chapter](./gnu_sed.md#lines-between-two-regexps) for details on the `sed` command used below
246 |
247 | ```bash
248 | $ cat report.log
249 | blah blah
250 | Warning: something went wrong
251 | more blah
252 | whatever
253 | Warning: something else went wrong
254 | some text
255 | some more text
256 | Error: something seriously went wrong
257 | blah blah blah
258 |
259 | $ tac report.log | sed -n '/Error:/,/Warning:/p' | tac
260 | Warning: something else went wrong
261 | some text
262 | some more text
263 | Error: something seriously went wrong
264 | ```
265 |
266 | * Similarly, if characters in lines have to be reversed, use the `rev` command
267 |
268 | ```bash
269 | $ whatis rev
270 | rev (1) - reverse lines characterwise
271 | ```
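* For example

```bash
$ echo 'hello' | rev
olleh

$ printf 'one\ntwo\n' | rev
eno
owt
```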
272 |
273 |
274 |
275 | #### Useless use of cat
276 |
277 | * `cat` is used so frequently to view contents of a file that users often assume other commands cannot handle file input directly
278 | * [UUOC](https://en.wikipedia.org/wiki/Cat_(Unix)#Useless_use_of_cat)
279 | * [Useless Use of Cat Award](http://porkmail.org/era/unix/award.html)
280 |
281 | ```bash
282 | $ cat report.log | grep -E 'Warning|Error'
283 | Warning: something went wrong
284 | Warning: something else went wrong
285 | Error: something seriously went wrong
286 | $ grep -E 'Warning|Error' report.log
287 | Warning: something went wrong
288 | Warning: something else went wrong
289 | Error: something seriously went wrong
290 | ```
291 |
292 | * Use [input redirection](http://wiki.bash-hackers.org/howto/redirection_tutorial) if a command doesn't accept file input
293 |
294 | ```bash
295 | $ cat marks_2015.txt | tr 'A-Z' 'a-z'
296 | name maths science
297 | foo 67 78
298 | bar 87 85
299 | $ tr 'A-Z' 'a-z' < marks_2015.txt
300 | name maths science
301 | foo 67 78
302 | bar 87 85
303 | ```
304 |
305 | * However, `cat` should definitely be used where **concatenation** is needed
306 |
307 | ```bash
308 | $ grep -c 'foo' marks_201*
309 | marks_2015.txt:1
310 | marks_2016.txt:1
311 | marks_2017.txt:1
312 |
313 | $ # concatenation gives the overall count in one shot in this case
314 | $ cat marks_201* | grep -c 'foo'
315 | 3
316 | ```
317 |
318 |
319 |
320 | #### Further Reading for cat
321 |
322 | * [cat Q&A on unix stackexchange](https://unix.stackexchange.com/questions/tagged/cat?sort=votes&pageSize=15)
323 | * [cat Q&A on stackoverflow](https://stackoverflow.com/questions/tagged/cat?sort=votes&pageSize=15)
324 |
325 |
326 |
327 | ## less
328 |
329 | ```bash
330 | $ less --version | head -n1
331 | less 481 (GNU regular expressions)
332 |
333 | $ # By default, pager is used to display the man pages
334 | $ # and usually, pager is linked to less command
335 | $ type pager less
336 | pager is /usr/bin/pager
337 | less is /usr/bin/less
338 |
339 | $ realpath /usr/bin/pager
340 | /bin/less
341 | $ realpath /usr/bin/less
342 | /bin/less
343 | $ diff -s /usr/bin/pager /usr/bin/less
344 | Files /usr/bin/pager and /usr/bin/less are identical
345 | ```
346 |
347 | * `cat` command is NOT suitable for viewing contents of large files on the Terminal
348 | * `less` displays contents of a file, automatically fits to size of Terminal, allows scrolling in either direction and other options for effective viewing
349 | * Usually, `man` command uses `less` command to display the help page
350 | * The navigation commands are similar to `vi` editor
351 |
352 |
353 |
354 | #### Navigation commands
355 |
356 | Commonly used commands are given below; press `h` within `less` for a summary of options
357 |
358 | * `g` go to start of file
359 | * `G` go to end of file
360 | * `q` quit
361 | * `/pattern` search for the given pattern in forward direction
362 | * `?pattern` search for the given pattern in backward direction
363 | * `n` go to next match
364 | * `N` go to previous match
365 |
366 |
367 |
368 | #### Further Reading for less
369 |
370 | * See `man less` for detailed info on commands and options. For example:
371 | * `-s` option to squeeze consecutive blank lines
372 | * `-N` option to prefix line number
373 | * `less` command is an [improved version](https://unix.stackexchange.com/questions/604/isnt-less-just-more) of `more` command
374 | * [differences between most, more and less](https://unix.stackexchange.com/questions/81129/what-are-the-differences-between-most-more-and-less)
375 | * [less Q&A on unix stackexchange](https://unix.stackexchange.com/questions/tagged/less?sort=votes&pageSize=15)
376 |
377 |
378 |
379 | ## tail
380 |
381 | ```bash
382 | $ tail --version | head -n1
383 | tail (GNU coreutils) 8.25
384 |
385 | $ man tail
386 | TAIL(1) User Commands TAIL(1)
387 |
388 | NAME
389 | tail - output the last part of files
390 |
391 | SYNOPSIS
392 | tail [OPTION]... [FILE]...
393 |
394 | DESCRIPTION
395 | Print the last 10 lines of each FILE to standard output. With more
396 | than one FILE, precede each with a header giving the file name.
397 |
398 | With no FILE, or when FILE is -, read standard input.
399 | ...
400 | ```
401 |
402 |
403 |
404 | #### linewise tail
405 |
406 | Consider this sample file, with line numbers prefixed
407 |
408 | ```bash
409 | $ cat sample.txt
410 | 1) Hello World
411 | 2)
412 | 3) Good day
413 | 4) How are you
414 | 5)
415 | 6) Just do-it
416 | 7) Believe it
417 | 8)
418 | 9) Today is sunny
419 | 10) Not a bit funny
420 | 11) No doubt you like it too
421 | 12)
422 | 13) Much ado about nothing
423 | 14) He he he
424 | 15) Adios amigo
425 | ```
426 |
427 | * default behavior - display last 10 lines
428 |
429 | ```bash
430 | $ tail sample.txt
431 | 6) Just do-it
432 | 7) Believe it
433 | 8)
434 | 9) Today is sunny
435 | 10) Not a bit funny
436 | 11) No doubt you like it too
437 | 12)
438 | 13) Much ado about nothing
439 | 14) He he he
440 | 15) Adios amigo
441 | ```
442 |
443 | * Use the `-n` option to control the number of lines displayed
444 |
445 | ```bash
446 | $ tail -n3 sample.txt
447 | 13) Much ado about nothing
448 | 14) He he he
449 | 15) Adios amigo
450 |
451 | $ # some versions of tail allow omitting the explicit n character
452 | $ tail -5 sample.txt
453 | 11) No doubt you like it too
454 | 12)
455 | 13) Much ado about nothing
456 | 14) He he he
457 | 15) Adios amigo
458 | ```
459 |
460 | * when the number is prefixed with a `+` sign, all lines are fetched from that particular line number to the end of the file
461 |
462 | ```bash
463 | $ tail -n +10 sample.txt
464 | 10) Not a bit funny
465 | 11) No doubt you like it too
466 | 12)
467 | 13) Much ado about nothing
468 | 14) He he he
469 | 15) Adios amigo
470 |
471 | $ seq 13 17 | tail -n +3
472 | 15
473 | 16
474 | 17
475 | ```
476 |
477 |
478 |
479 | #### characterwise tail
480 |
481 | * Note that this works byte-wise and is not suitable for multi-byte character encodings
482 |
483 | ```bash
484 | $ # last three characters including the newline character
485 | $ echo 'Hi there!' | tail -c3
486 | e!
487 |
488 | $ # excluding the first character
489 | $ echo 'Hi there!' | tail -c +2
490 | i there!
491 | ```
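* For example, the character `é` occupies two bytes in UTF-8, so extracting the last byte alone splits the character

```bash
$ # é is encoded as two bytes in UTF-8
$ printf 'é' | wc -c
2

$ # only the second byte of é is extracted, which is not a valid character
$ printf 'café' | tail -c1 | wc -c
1
```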
492 |
493 |
494 |
495 | #### multiple file input for tail
496 |
497 | ```bash
498 | $ tail -n2 report.log sample.txt
499 | ==> report.log <==
500 | Error: something seriously went wrong
501 | blah blah blah
502 |
503 | ==> sample.txt <==
504 | 14) He he he
505 | 15) Adios amigo
506 |
507 | $ # -q option to avoid filename in output
508 | $ tail -q -n2 report.log sample.txt
509 | Error: something seriously went wrong
510 | blah blah blah
511 | 14) He he he
512 | 15) Adios amigo
513 | ```
514 |
515 |
516 |
517 | #### Further Reading for tail
518 |
519 | * `tail -f` and related options are beyond the scope of this tutorial. Below links might be useful
520 | * [look out for buffering](http://mywiki.wooledge.org/BashFAQ/009)
521 | * [Piping tail -f output though grep twice](https://stackoverflow.com/questions/13858912/piping-tail-output-though-grep-twice)
522 | * [tail and less](https://unix.stackexchange.com/questions/196168/does-less-have-a-feature-like-tail-follow-name-f)
523 | * [tail Q&A on unix stackexchange](https://unix.stackexchange.com/questions/tagged/tail?sort=votes&pageSize=15)
524 | * [tail Q&A on stackoverflow](https://stackoverflow.com/questions/tagged/tail?sort=votes&pageSize=15)
525 |
526 |
527 |
528 | ## head
529 |
530 | ```bash
531 | $ head --version | head -n1
532 | head (GNU coreutils) 8.25
533 |
534 | $ man head
535 | HEAD(1) User Commands HEAD(1)
536 |
537 | NAME
538 | head - output the first part of files
539 |
540 | SYNOPSIS
541 | head [OPTION]... [FILE]...
542 |
543 | DESCRIPTION
544 | Print the first 10 lines of each FILE to standard output. With more
545 | than one FILE, precede each with a header giving the file name.
546 |
547 | With no FILE, or when FILE is -, read standard input.
548 | ...
549 | ```
550 |
551 |
552 |
553 | #### linewise head
554 |
555 | * default behavior - display starting 10 lines
556 |
557 | ```bash
558 | $ head sample.txt
559 | 1) Hello World
560 | 2)
561 | 3) Good day
562 | 4) How are you
563 | 5)
564 | 6) Just do-it
565 | 7) Believe it
566 | 8)
567 | 9) Today is sunny
568 | 10) Not a bit funny
569 | ```
570 |
571 | * Use the `-n` option to control the number of lines displayed
572 |
573 | ```bash
574 | $ head -n3 sample.txt
575 | 1) Hello World
576 | 2)
577 | 3) Good day
578 |
579 | $ # some versions of head allow omitting the explicit n character
580 | $ head -4 sample.txt
581 | 1) Hello World
582 | 2)
583 | 3) Good day
584 | 4) How are you
585 | ```
586 |
587 | * when the number is prefixed with a `-` sign, all lines are fetched except that many lines at the end of the file
588 |
589 | ```bash
590 | $ # except last 9 lines of file
591 | $ head -n -9 sample.txt
592 | 1) Hello World
593 | 2)
594 | 3) Good day
595 | 4) How are you
596 | 5)
597 | 6) Just do-it
598 |
599 | $ # except last 2 lines
600 | $ seq 13 17 | head -n -2
601 | 13
602 | 14
603 | 15
604 | ```
605 |
606 |
607 |
608 | #### characterwise head
609 |
610 | * Note that this works byte-wise and is not suitable for multi-byte character encodings
611 |
612 | ```bash
613 | $ # if output of command doesn't end with newline, prompt will be on same line
614 | $ # to highlight working of command, the prompt for such cases is not shown here
615 |
616 | $ # first two characters
617 | $ echo 'Hi there!' | head -c2
618 | Hi
619 |
620 | $ # excluding last four characters
621 | $ echo 'Hi there!' | head -c -4
622 | Hi the
623 | ```
624 |
625 |
626 |
627 | #### multiple file input for head
628 |
629 | ```bash
630 | $ head -n3 report.log sample.txt
631 | ==> report.log <==
632 | blah blah
633 | Warning: something went wrong
634 | more blah
635 |
636 | ==> sample.txt <==
637 | 1) Hello World
638 | 2)
639 | 3) Good day
640 |
641 | $ # -q option to avoid filename in output
642 | $ head -q -n3 report.log sample.txt
643 | blah blah
644 | Warning: something went wrong
645 | more blah
646 | 1) Hello World
647 | 2)
648 | 3) Good day
649 | ```
650 |
651 |
652 |
653 | #### combining head and tail
654 |
655 | * Despite involving two commands, this combination is often faster than equivalent sed/awk versions
656 |
657 | ```bash
658 | $ head -n11 sample.txt | tail -n3
659 | 9) Today is sunny
660 | 10) Not a bit funny
661 | 11) No doubt you like it too
662 |
663 | $ tail sample.txt | head -n2
664 | 6) Just do-it
665 | 7) Believe it
666 | ```
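* A handy use of this combination is extracting a particular line, say the 3rd line

```bash
$ seq 13 17 | head -n3 | tail -n1
15
```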
667 |
668 |
669 |
670 | #### Further Reading for head
671 |
672 | * [head Q&A on unix stackexchange](https://unix.stackexchange.com/questions/tagged/head?sort=votes&pageSize=15)
673 |
674 |
675 |
676 | ## Text Editors
677 |
678 | For editing text files, the following applications can be used. Of these, `gedit`, `nano`, `vi` and/or `vim` are available in most distros by default
679 |
680 | Easy to use
681 |
682 | * [gedit](https://wiki.gnome.org/Apps/Gedit)
683 | * [geany](http://www.geany.org/)
684 | * [nano](http://nano-editor.org/)
685 |
686 | Powerful text editors
687 |
688 | * [vim](https://github.com/vim/vim)
689 | * [vim learning resources](https://github.com/learnbyexample/scripting_course/blob/master/Vim_curated_resources.md) and [vim reference](https://github.com/learnbyexample/vim_reference) for further info
690 | * [emacs](https://www.gnu.org/software/emacs/)
691 | * [atom](https://atom.io/)
692 | * [sublime](https://www.sublimetext.com/)
693 |
694 | Check out [this analysis](https://github.com/jhallen/joes-sandbox/tree/master/editor-perf) for some performance/feature comparisons of various text editors
695 |
--------------------------------------------------------------------------------
/whats_the_difference.md:
--------------------------------------------------------------------------------
1 | # What's the difference
2 |
3 | **Table of Contents**
4 |
5 | * [cmp](#cmp)
6 | * [diff](#diff)
7 | * [Comparing Directories](#comparing-directories)
8 | * [colordiff](#colordiff)
9 |
10 |
11 |
12 | ## cmp
13 |
14 | ```bash
15 | $ cmp --version | head -n1
16 | cmp (GNU diffutils) 3.3
17 |
18 | $ man cmp
19 | CMP(1) User Commands CMP(1)
20 |
21 | NAME
22 | cmp - compare two files byte by byte
23 |
24 | SYNOPSIS
25 | cmp [OPTION]... FILE1 [FILE2 [SKIP1 [SKIP2]]]
26 |
27 | DESCRIPTION
28 | Compare two files byte by byte.
29 |
30 | The optional SKIP1 and SKIP2 specify the number of bytes to skip at the
31 | beginning of each file (zero by default).
32 | ...
33 | ```
34 |
35 | * As the comparison is byte by byte, it doesn't matter whether the file is human readable or not
36 | * A typical use case is to check if two executables are identical
37 |
38 | ```bash
39 | $ echo 'foo 123' > f1; echo 'food 123' > f2
40 | $ cmp f1 f2
41 | f1 f2 differ: byte 4, line 1
42 |
43 | $ # print differing bytes
44 | $ cmp -b f1 f2
45 | f1 f2 differ: byte 4, line 1 is 40 144 d
46 |
47 | $ # skip given bytes from each file
48 | $ # if only one number is given, it is used for both inputs
49 | $ cmp -i 3:4 f1 f2
50 | $ echo $?
51 | 0
52 |
53 | $ # compare only given number of bytes from start of inputs
54 | $ cmp -n 3 f1 f2
55 | $ echo $?
56 | 0
57 |
58 | $ # suppress output
59 | $ cmp -s f1 f2
60 | $ echo $?
61 | 1
62 | ```
63 |
64 | * Comparison stops immediately at the first difference found
65 | * If the verbose option `-l` is used, comparison stops when the shorter input reaches end of file
66 |
67 | ```bash
68 | $ # first column is byte number
69 | $ # second/third column is respective octal value of differing bytes
70 | $ cmp -l f1 f2
71 | 4 40 144
72 | 5 61 40
73 | 6 62 61
74 | 7 63 62
75 | 8 12 63
76 | cmp: EOF on f1
77 | ```
78 |
79 | **Further Reading**
80 |
81 | * `man cmp` and `info cmp` for more options and detailed documentation
82 |
83 |
84 |
85 |
86 | ## diff
87 |
88 | ```bash
89 | $ diff --version | head -n1
90 | diff (GNU diffutils) 3.3
91 |
92 | $ man diff
93 | DIFF(1) User Commands DIFF(1)
94 |
95 | NAME
96 | diff - compare files line by line
97 |
98 | SYNOPSIS
99 | diff [OPTION]... FILES
100 |
101 | DESCRIPTION
102 | Compare FILES line by line.
103 | ...
104 | ```
105 |
106 | * `diff` output shows lines from the first file input starting with `<`
107 | * lines from the second file input start with `>`
108 | * between the two file contents, `---` is used as a separator
109 | * each difference is prefixed by a change command like `2c2` that indicates the line numbers and type of change (see links at end of section for more details)
110 |
111 | ```bash
112 | $ paste d1 d2
113 | 1 1
114 | 2 hello
115 | 3 3
116 | world 4
117 |
118 | $ diff d1 d2
119 | 2c2
120 | < 2
121 | ---
122 | > hello
123 | 4c4
124 | < world
125 | ---
126 | > 4
127 |
128 | $ diff <(seq 4) <(seq 5)
129 | 4a5
130 | > 5
131 | ```
132 |
133 | * use `-i` option to ignore case
134 |
135 | ```bash
136 | $ echo 'Hello World!' > i1
137 | $ echo 'hello world!' > i2
138 |
139 | $ diff i1 i2
140 | 1c1
141 | < Hello World!
142 | ---
143 | > hello world!
144 |
145 | $ diff -i i1 i2
146 | $ echo $?
147 | 0
148 | ```
149 |
150 | * ignoring difference in white spaces
151 |
152 | ```bash
153 | $ # -b option to ignore changes in the amount of white space
154 | $ diff -b <(echo 'good day') <(echo 'good day')
155 | $ echo $?
156 | 0
157 |
158 | $ # -w option to ignore all white spaces
159 | $ diff -w <(echo 'hi there ') <(echo ' hi there')
160 | $ echo $?
161 | 0
162 | $ diff -w <(echo 'hi there ') <(echo 'hithere')
163 | $ echo $?
164 | 0
165 |
166 | # use -B to ignore only blank lines
167 | # use -E to ignore changes due to tab expansion
168 | # use -z to ignore trailing white spaces at end of line
169 | ```
170 |
171 | * side-by-side output
172 |
173 | ```bash
174 | $ diff -y d1 d2
175 | 1 1
176 | 2 | hello
177 | 3 3
178 | world | 4
179 |
180 | $ # -y is usually used along with other options
181 | $ # default width is 130 print columns
182 | $ diff -W 60 --suppress-common-lines -y d1 d2
183 | 2 | hello
184 | world | 4
185 |
186 | $ diff -W 20 --left-column -y <(seq 4) <(seq 5)
187 | 1 (
188 | 2 (
189 | 3 (
190 | 4 (
191 | > 5
192 | ```
193 |
194 | * by default, there is no output if input files are the same. Use the `-s` option to additionally indicate that the files are identical
195 | * by default, all differences are shown. Use the `-q` option to only report whether the files differ
196 |
197 | ```bash
198 | $ cp i1 i1_copy
199 | $ diff -s i1 i1_copy
200 | Files i1 and i1_copy are identical
201 | $ diff -s i1 i2
202 | 1c1
203 | < Hello World!
204 | ---
205 | > hello world!
206 |
207 | $ diff -q i1 i1_copy
208 | $ diff -q i1 i2
209 | Files i1 and i2 differ
210 |
211 | $ # combine them to always get one line output
212 | $ diff -sq i1 i1_copy
213 | Files i1 and i1_copy are identical
214 | $ diff -sq i1 i2
215 | Files i1 and i2 differ
216 | ```
217 |
218 |
219 |
220 | #### Comparing Directories
221 |
222 | * when comparing two files of the same name from different directories, specifying the filename is optional for one of the directories
223 |
224 | ```bash
225 | $ mkdir dir1 dir2
226 | $ echo 'Hello World!' > dir1/i1
227 | $ echo 'hello world!' > dir2/i1
228 |
229 | $ diff dir1/i1 dir2
230 | 1c1
231 | < Hello World!
232 | ---
233 | > hello world!
234 |
235 | $ diff -s i1 dir1/
236 | Files i1 and dir1/i1 are identical
237 | $ diff -s . dir1/i1
238 | Files ./i1 and dir1/i1 are identical
239 | ```
240 |
241 | * if both arguments are directories, all files are compared
242 |
243 | ```bash
244 | $ touch dir1/report.log dir1/lists dir2/power.log
245 | $ cp f1 dir1/
246 | $ cp f1 dir2/
247 |
248 | $ # by default, all differences are reported
249 | $ # as well as filenames which are unique to respective directories
250 | $ diff dir1 dir2
251 | diff dir1/i1 dir2/i1
252 | 1c1
253 | < Hello World!
254 | ---
255 | > hello world!
256 | Only in dir1: lists
257 | Only in dir2: power.log
258 | Only in dir1: report.log
259 | ```
260 |
261 | * to report only filenames
262 |
263 | ```bash
264 | $ diff -sq dir1 dir2
265 | Files dir1/f1 and dir2/f1 are identical
266 | Files dir1/i1 and dir2/i1 differ
267 | Only in dir1: lists
268 | Only in dir2: power.log
269 | Only in dir1: report.log
270 |
271 | $ # list only differing files
272 | $ # also useful to copy-paste the command for GUI diffs like tkdiff/vimdiff
273 | $ diff dir1 dir2 | grep '^diff '
274 | diff dir1/i1 dir2/i1
275 | ```
276 |
277 | * to recursively compare sub-directories as well, use `-r`
278 |
279 | ```bash
280 | $ mkdir dir1/subdir dir2/subdir
281 | $ echo 'good' > dir1/subdir/f1
282 | $ echo 'goad' > dir2/subdir/f1
283 |
284 | $ diff -srq dir1 dir2
285 | Files dir1/f1 and dir2/f1 are identical
286 | Files dir1/i1 and dir2/i1 differ
287 | Only in dir1: lists
288 | Only in dir2: power.log
289 | Only in dir1: report.log
290 | Files dir1/subdir/f1 and dir2/subdir/f1 differ
291 |
292 | $ diff -r dir1 dir2 | grep '^diff '
293 | diff -r dir1/i1 dir2/i1
294 | diff -r dir1/subdir/f1 dir2/subdir/f1
295 | ```
296 |
297 | * See also [GNU diffutils manual - comparing directories](https://www.gnu.org/software/diffutils/manual/diffutils.html#Comparing-Directories) for further options and details like excluding files, ignoring filename case, etc. as well as the `dirdiff` command
298 |
299 |
300 |
301 | #### colordiff
302 |
303 | ```bash
304 | $ whatis colordiff
305 | colordiff (1) - a tool to colorize diff output
306 |
307 | $ whatis wdiff
308 | wdiff (1) - display word differences between text files
309 | ```
310 |
311 | * simply replace `diff` with `colordiff`
312 |
313 | 
314 |
315 | * or, pass output of a `diff` tool to `colordiff`
316 |
317 | 
318 |
319 | * See also [stackoverflow - How to colorize diff on the command line?](https://stackoverflow.com/questions/8800578/how-to-colorize-diff-on-the-command-line) for other options
320 |
321 |
322 |
323 | **Further Reading**
324 |
325 | * `man diff` and `info diff` for more options and detailed documentation
326 | * [GNU diffutils manual](https://www.gnu.org/software/diffutils/manual/diffutils.html) for a better documentation
327 | * `man -k diff` to get list of all commands related to `diff`
328 | * [diff Q&A on unix stackexchange](https://unix.stackexchange.com/questions/tagged/diff?sort=votes&pageSize=15)
329 | * [unix.stackexchange - GUI diff and merge tools](https://unix.stackexchange.com/questions/4573/which-gui-diff-viewer-would-you-recommend-with-copy-to-left-right-functionality)
330 | * [unix.stackexchange - Understanding diff output](https://unix.stackexchange.com/questions/81998/understanding-of-diff-output)
331 | * [stackoverflow - Using output of diff to create patch](https://stackoverflow.com/questions/437219/using-the-output-of-diff-to-create-the-patch)
332 |
333 |
--------------------------------------------------------------------------------
/wheres_my_file.md:
--------------------------------------------------------------------------------
1 | # Where's my file
2 |
3 | **Table of Contents**
4 |
5 | * [find](#find)
6 | * [locate](#locate)
7 |
8 |
9 |
10 | ## find
11 |
12 | ```bash
13 | $ find --version | head -n1
14 | find (GNU findutils) 4.7.0-git
15 |
16 | $ man find
17 | FIND(1) General Commands Manual FIND(1)
18 |
19 | NAME
20 | find - search for files in a directory hierarchy
21 |
22 | SYNOPSIS
23 | find [-H] [-L] [-P] [-D debugopts] [-Olevel] [starting-point...]
24 | [expression]
25 |
26 | DESCRIPTION
27 | This manual page documents the GNU version of find. GNU find searches
28 | the directory tree rooted at each given starting-point by evaluating
29 | the given expression from left to right, according to the rules of
30 | precedence (see section OPERATORS), until the outcome is known (the
31 | left hand side is false for and operations, true for or), at which
32 | point find moves on to the next file name. If no starting-point is
33 | specified, `.' is assumed.
34 | ...
35 | ```
36 |
37 | **Examples**
38 |
39 | Filtering based on file name
40 |
41 | * `find . -iname 'power.log'` search and print path of file named power.log (ignoring case) in current directory and its sub-directories
42 | * `find -name '*log'` search and print path of all files whose name ends with log in current directory - using `.` is optional when searching in current directory
43 | * `find -not -name '*log'` print path of all files whose name does NOT end with log in current directory
44 | * `find -regextype egrep -regex '.*/\w+'` use extended regular expression to match filename containing only `[a-zA-Z_]` characters
45 | * `.*/` is needed to match initial part of file path
46 |
47 | Filtering based on file type
48 |
49 | * `find /home/guest1/proj -type f` print path of all regular files found in specified directory
50 | * `find /home/guest1/proj -type d` print path of all directories found in specified directory
51 | * `find /home/guest1/proj -type f -name '.*'` print path of all hidden files
52 |
53 | Filtering based on depth
54 |
55 | The relative path `.` is considered as depth 0; files and folders immediately contained in a directory are at depth 1, and so on
56 |
57 | * `find -maxdepth 1 -type f` all regular files (including hidden ones) from current directory (without going to sub-directories)
58 | * `find -maxdepth 1 -type f -name '[!.]*'` all regular files (but not hidden ones) from current directory (without going to sub-directories)
59 | * `-not -name '.*'` can be also used
60 | * `find -mindepth 1 -maxdepth 1 -type d` all directories (including hidden ones) in current directory (without going to sub-directories)
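A sketch combining the name, type and depth filters above, using a throwaway directory (names invented for illustration):

```bash
$ mkdir -p proj/logs
$ touch proj/a.log proj/notes.txt proj/logs/b.log proj/.hidden
$ cd proj

$ find -maxdepth 1 -type f -name '*.log'
./a.log

$ find -type f -name '*.log' | sort
./a.log
./logs/b.log

$ find -maxdepth 1 -type f -name '[!.]*' | sort
./a.log
./notes.txt
```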
61 |
62 | Filtering based on file properties
63 |
64 | * `find -mtime -2` print files that were modified within last two days in current directory
65 | * Note that day here means 24 hours
66 | * `find -mtime +7` print files that were modified more than seven days back in current directory
67 | * `find -daystart -type f -mtime -1` files that were modified from beginning of day (not past 24 hours)
68 | * `find -size +10k` print files with size greater than 10 kilobytes in current directory
69 | * `find -size -1M` print files with size less than 1 megabytes in current directory
70 | * `find -size 2G` print files of size 2 gigabytes in current directory
71 |
72 | Passing filtered files as input to other commands
73 |
74 | * `find report -name '*log*' -exec rm {} \;` delete all filenames containing log in report folder and its sub-folders
75 | * here `rm` command is called for every file matching the search conditions
76 | * since `;` is a special character for shell, it needs to be escaped using `\`
77 | * `find report -name '*log*' -delete` delete all filenames containing log in report folder and its sub-folders
78 | * `find -name '*.txt' -exec wc {} +` files ending with txt are all passed together as arguments to the `wc` command instead of executing `wc` for every file
79 | * no need to escape the `+` character in this case
80 | * also note that the command may be invoked more than once if the number of files found is too large for a single invocation
81 | * `find -name '*.log' -exec mv {} ../log/ \;` move files ending with .log to log directory present in one hierarchy above. `mv` is executed once per each filtered file
82 | * `find -name '*.log' -exec mv -t ../log/ {} +` the `-t` option allows to specify target directory and then provide multiple files to be moved as argument
83 | * Similarly, one can use `-t` for `cp` command
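A sketch of the batched `-exec ... +` form, with a made-up directory layout:

```bash
$ mkdir backup
$ touch r1.log r2.log

$ # mv is invoked once with all matched files as arguments
$ find -maxdepth 1 -name '*.log' -exec mv -t backup/ {} +
$ ls backup
r1.log  r2.log
```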
84 |
85 | **Further Reading**
86 |
87 | * [using find](http://mywiki.wooledge.org/UsingFind)
88 | * [find examples on SO](https://stackoverflow.com/documentation/bash/566/find#t=201612140534548263961)
89 | * [Collection of find examples](http://alvinalexander.com/unix/edu/examples/find.shtml)
90 | * [find Q&A on unix stackexchange](https://unix.stackexchange.com/questions/tagged/find?sort=votes&pageSize=15)
91 | * [find and tar example](https://unix.stackexchange.com/questions/282762/find-mtime-1-print-xargs-tar-archives-all-files-from-directory-ignoring-t/282885#282885)
92 | * [find Q&A on stackoverflow](https://stackoverflow.com/questions/tagged/find?sort=votes&pageSize=15)
93 | * [Why is looping over find's output bad practice?](https://unix.stackexchange.com/questions/321697/why-is-looping-over-finds-output-bad-practice)
94 |
95 |
96 |
97 |
98 | ## locate
99 |
100 | ```bash
101 | $ locate --version | head -n1
102 | mlocate 0.26
103 |
104 | $ man locate
105 | locate(1) General Commands Manual locate(1)
106 |
107 | NAME
108 | locate - find files by name
109 |
110 | SYNOPSIS
111 | locate [OPTION]... PATTERN...
112 |
113 | DESCRIPTION
114 | locate reads one or more databases prepared by updatedb(8) and writes
115 | file names matching at least one of the PATTERNs to standard output,
116 | one per line.
117 |
118 | If --regex is not specified, PATTERNs can contain globbing characters.
119 | If any PATTERN contains no globbing characters, locate behaves as if
120 | the pattern were *PATTERN*.
121 | ...
122 | ```
123 |
124 | Faster alternative to the `find` command when searching for a file by its name. It is based on a database, which gets updated by a `cron` job, so newer files may not be present in the results. Use this command if it is available in your distro and you remember some part of the filename. It is very useful when one has to search the entire filesystem, in which case `find` might take a very long time compared to `locate`
125 |
126 | **Examples**
127 |
128 | * `locate 'power'` print path of files containing power in the whole filesystem
129 | * matches anywhere in the path, ex: '/home/learnbyexample/lowpower_adder/result.log' and '/home/learnbyexample/power.log' are both valid matches
130 | * implicitly, `locate` would change the string to `*power*` as no globbing characters are present in the string specified
131 | * `locate -b '\power.log'` print path matching the string power.log exactly at end of path
132 | * '/home/learnbyexample/power.log' matches but not '/home/learnbyexample/lowpower.log'
133 | * since globbing character '\' is used while specifying search string, it doesn't get implicitly replaced by `*power.log*`
134 | * `locate -b '\proj_adder'` the `-b` option also comes in handy to print only the path of directory name, otherwise every file under that folder would also be displayed
135 | * [find vs locate - pros and cons](https://unix.stackexchange.com/questions/60205/locate-vs-find-usage-pros-and-cons-of-each-other)
136 |
137 |
138 |
--------------------------------------------------------------------------------