├── Chapter01
│   ├── Docker.sh
│   ├── Ubuntu.sh
│   ├── enableWSL.ps1
│   ├── example1.sh
│   └── macOSX.sh
├── Chapter02
│   └── Chapter02_examples.sh
├── Chapter03
│   └── Chapter03_examples.sh
├── Chapter04
│   ├── Chapter04_examples.sh
│   ├── arrays.sh
│   ├── bash_profile
│   ├── gnuplot
│   ├── hello_world-1.sh
│   ├── hello_world-2.sh
│   ├── inject_text.sh
│   ├── netcat.sh
│   └── network.sh
├── Chapter05
│   └── Chapter05_examples.sh
├── Chapter06
│   ├── Chapter06_examples.sh
│   ├── forecast.sh
│   └── python_average.py
├── LICENSE
└── README.md

/Chapter01/Docker.sh:
--------------------------------------------------------------------------------
#!/bin/bash

docker run -it nextrevtech/commandline-book /bin/bash
--------------------------------------------------------------------------------
/Chapter01/Ubuntu.sh:
--------------------------------------------------------------------------------
#!/bin/bash

sudo apt-get update
sudo apt install jq python-pip gnuplot sqlite3 libsqlite3-dev curl netcat bc
pip install pandas
--------------------------------------------------------------------------------
/Chapter01/enableWSL.ps1:
--------------------------------------------------------------------------------
Enable-WindowsOptionalFeature -Online -FeatureName Microsoft-Windows-Subsystem-Linux
--------------------------------------------------------------------------------
/Chapter01/example1.sh:
--------------------------------------------------------------------------------
#!/bin/bash

cat file.txt | tr '[:space:]' '[\n*]' | grep -v "^$" | sort | uniq -c | sort -bnr

(tr '[:space:]' '[\n*]' | grep -v "^$" | sort | uniq -c | sort -bnr) < hello.txt

cat hello.txt

echo -e "1\n3\n19\n1\n25\n5" > numbers.txt

cat numbers.txt

cat numbers.txt | sort -n

cat numbers.txt | sort -n | uniq

history

export PS1="\u@\h:\w>"
--------------------------------------------------------------------------------
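The word-frequency one-liner in example1.sh above packs several steps into one pipeline; broken out against a throwaway file it behaves like this (a sketch — `sample.txt` and its contents are invented for illustration, not part of the repository):

```shell
# Hypothetical input file, not part of the repository.
printf 'foo bar foo\nbaz foo bar\n' > sample.txt

# tr turns every run of whitespace into newlines (one word per line),
# grep -v "^$" drops the blank lines tr leaves behind,
# sort | uniq -c counts each distinct word,
# and sort -bnr orders the counts from most to least frequent.
cat sample.txt \
  | tr '[:space:]' '[\n*]' \
  | grep -v "^$" \
  | sort | uniq -c | sort -bnr
```

With the input above, `foo` (3 occurrences) sorts to the top, followed by `bar` (2) and `baz` (1).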
/Chapter03/Chapter03_examples.sh:
--------------------------------------------------------------------------------
#!/bin/bash

curl -O https://s3.amazonaws.com/amazon-reviews-pds/tsv/amazon_reviews_us_Digital_Ebook_Purchase_v1_00.tsv.gz && curl -O https://s3.amazonaws.com/amazon-reviews-pds/tsv/amazon_reviews_us_Digital_Ebook_Purchase_v1_01.tsv.gz

ls -al amazon*

zcat amazon_reviews_us_Digital_Ebook_Purchase_v1_00.tsv.gz >> amazon_reviews_us_Digital_Ebook_Purchase_v1_00.tsv && zcat amazon_reviews_us_Digital_Ebook_Purchase_v1_01.tsv.gz >> amazon_reviews_us_Digital_Ebook_Purchase_v1_01.tsv

cat *.tsv > reviews.tsv

wc -l reviews.tsv

cut -d$'\t' -f 6,8,13,14 reviews.tsv | more

cut -d$'\t' -f 6,8,13,14 reviews.tsv > stripped_reviews.tsv

grep -i Packt stripped_reviews.tsv | wc -w

cat stripped_reviews.tsv | tr "\\t" "," > all_reviews.csv

cat all_reviews.csv | awk -F "," '{print $4}' | grep -i Packt

cat all_reviews.csv | awk -F "," '{print $4}' | grep -i Packt > background_words.txt &

nohup cat all_reviews.csv | awk -F "," '{print $4}' | grep -i Packt > background_words.txt &

sudo apt install -y screen tmux
--------------------------------------------------------------------------------
/Chapter04/Chapter04_examples.sh:
--------------------------------------------------------------------------------
#!/bin/bash

gnuplot -e "set terminal sixelgd background rgb 'white'; test"


head amazon_reviews_us_Digital_Ebook_Purchase_v1_00.tsv

head -n1 amazon_reviews_us_Digital_Ebook_Purchase_v1_00.tsv


cat amazon_reviews_us_Digital_Ebook_Purchase_v1_00.tsv | cut -d$'\t' -f 4,8-12,15 > test.tsv
sqlite3 aws-ebook-reviews.sq3 < clusterchart.dat
cat clusterchart.dat


gnuplot -e "set style data histograms ; set style fill solid border lt -1 ; plot 'clusterchart.dat' using 2:xtic(1) ti col, '' u 3 ti \
col, '' u 4 ti col, '' u 5 ti col, '' u 6 ti col"


GNUTERM=dumb gnuplot -e "set style data histograms ; set style fill solid border lt -1 ; plot 'clusterchart.dat' using 2:xtic(1) ti col, '' u 3 ti col, '' u 4 ti col, '' u 5 ti col, '' u 6 ti col"


GNUTERM=dumb gnuplot -e "set style data histograms ; set style fill solid border lt -1 ; plot 'clusterchart.dat' using 2:xtic(1) ti col, '' u 3 ti col, '' u 4 ti col, '' u 5 ti col, '' u 6 ti col"


barchart -s -f clusterchart.dat 'plot for [i=2:6] $data using i:xtic(1)'


barchart -s -f clusterchart.dat 'plot for [i=2:6] $data using (100.*column(i)/column(7)):xtic(1) title column(i)'
--------------------------------------------------------------------------------
/Chapter04/arrays.sh:
--------------------------------------------------------------------------------
#!/bin/bash

TMP_PATH=/bin:/usr/bin:/sbin:/usr/sbin
IFS=:
PATH_ARRAY=($TMP_PATH)
unset IFS

echo First element - ${PATH_ARRAY}
echo First element - ${PATH_ARRAY[0]}
echo Second element - ${PATH_ARRAY[1]}
echo All elements - ${PATH_ARRAY[*]}
echo All elements - ${PATH_ARRAY[@]}
--------------------------------------------------------------------------------
/Chapter04/bash_profile:
--------------------------------------------------------------------------------
case "${TERM}" in
  iterm2)
    TERM=xterm-256color
    COLORTERM=${COLORTERM:-truecolor}
    export GNUTERM=${GNUTERM:-png}
    ;;
  wsltty)
    TERM=xterm-256color
    COLORTERM=${COLORTERM:-truecolor}
    export GNUTERM=${GNUTERM:-sixelgd}
    ;;
  screen|xterm|xterm-256color)
    COLORTERM=${COLORTERM:-truecolor}
    export GNUTERM=${GNUTERM:-dumb}
    ;;
  *)
    export GNUTERM=${GNUTERM:-dumb}
    ;;
esac
--------------------------------------------------------------------------------
/Chapter04/gnuplot:
--------------------------------------------------------------------------------
alias gnuplot="__gnuplot"
__gnuplot() {
  SIZE=$(stty size 2>/dev/null)
  SIZE=${SIZE:-$(tput lines) $(tput cols)}
  COLS=${SIZE#* }
  ROWS=${SIZE% *}
  XPX=${XPX:-13}
  YPX=${YPX:-24}
  COLUMNS=${COLUMNS:-${COLS}}
  LINES=$((${LINES:-${ROWS}}-3))
  case "${GNUTERM%% *}" in
    dumb) X=${COLUMNS} ; Y=${LINES} ; DCS_GUARD="cat" ;;
    png) X=$((XPX*COLUMNS)) ; Y=$((YPX*LINES)) ; DCS_GUARD="imgcat" ;;
    sixelgd) X=$((XPX*COLUMNS)) ; Y=$((YPX*LINES)) ;;
  esac
  sed -i "s/^set term[[:space:]][^[:space:]]*/set term ${GNUTERM%% *}/" ~/.gnuplot
  GNUTERM="${GNUTERM} size $X,$Y" \gnuplot "$@" | ${DCS_GUARD:-cat}
}

alias barchart="FUNCNAME=barchart __barchart"
__barchart() {
  local STACKED
  local DATA

  OPTIND=1 ; while getopts ":hf:s" opt; do
    case ${opt} in
      f) [ -r "${OPTARG}" ] && DATA=$(printf '$data <] " ; return
        ;;
    esac
  done
  shift $(($OPTIND - 1))
  HOST=${1%:*}
  PORT=${1#*:}
  PORT=${2:-$PORT}
  (exec 6<>/dev/tcp/${HOST}/${PORT} 2>&1)
  RC=$?
  case "${VERBOSE}${RC}" in
    true0) printf "open\n" ;;
    true*) printf "closed\n" ;;
  esac
  return $RC
}
--------------------------------------------------------------------------------
/Chapter05/Chapter05_examples.sh:
--------------------------------------------------------------------------------
#!/bin/bash

seq -- $(seq 1 1 5)

[ 0 = 1 ] && echo "a" || ([ 0 = 2 ] && echo b || echo c)
[ -f /myconfig ] && read_params /myconfig


testcase() {
  for VAR; do
    case "${VAR}" in
      '') echo "empty" ;;
      a) echo "a" ;;
      b) echo "b" ;;
      c) echo "c" ;;
      *) echo "not a, b, c" ;;
    esac
  done
}
testcase '' foo a bar b c d


ls /
ls / >/dev/null
ls /foobar 2>/dev/null
ls / /foobar >stdout_and_stderr.log 2>&1
ls / /foobar >stdout.log 2>stderr.log
ls / /foobar 2>&1 >/dev/null

cat keys.log


cat <<EOF >options.conf
option=true
option2=false
option3=cat
EOF

zcat amazon_reviews_us_Digital_Ebook_Purchase_v1_01.tsv.gz | cut -f13 | grep aardvark

zcat amazon_reviews_us_Digital_Ebook_Purchase_v1_01.tsv.gz | cut -f13 | grep [Aa]ardvark

zcat amazon_reviews_us_Digital_Ebook_Purchase_v1_01.tsv.gz | cut -f13 | grep .ardvark

zcat amazon_reviews_us_Digital_Ebook_Purchase_v1_01.tsv.gz | cut -f13 | grep -E '(aardvark|giraffe)'

zcat amazon_reviews_us_Digital_Ebook_Purchase_v1_01.tsv.gz | cut -f13 | grep -E '(a)?ardvark'

zcat amazon_reviews_us_Digital_Ebook_Purchase_v1_01.tsv.gz | cut -f13 | grep -E 'aaaaaaa(a)*' | head -n 3

zcat amazon_reviews_us_Digital_Ebook_Purchase_v1_01.tsv.gz | cut -f13 | awk '/aardvark/'

zcat amazon_reviews_us_Digital_Ebook_Purchase_v1_01.tsv.gz | awk -F"\t" '$13 ~ /aardvark/'

zcat amazon_reviews_us_Digital_Ebook_Purchase_v1_01.tsv.gz | awk -F"\t" '$13 ~ /aardvark/ {print $6}'

zcat \
amazon_reviews_us_Digital_Ebook_Purchase_v1_01.tsv.gz | awk -F"\t" 'BEGIN {OFS=";"} ; $13 ~ /aardvark/ {print $6, $2, $3}'

zcat amazon_reviews_us_Digital_Ebook_Purchase_v1_01.tsv.gz | cut -f13 | awk '/aardvark/' | sed 's/aardvark/giraffe/g'

zcat amazon_reviews_us_Digital_Ebook_Purchase_v1_01.tsv.gz | cut -f13 | awk '/aardvark/' | sed '/ant/d'

zcat amazon_reviews_us_Digital_Ebook_Purchase_v1_01.tsv.gz | cut -f13 | awk '/aardvark/' | tr 'a' 'b'

zcat amazon_reviews_us_Digital_Ebook_Purchase_v1_01.tsv.gz | head -n 10 | cut -f13 | sort

zcat amazon_reviews_us_Digital_Ebook_Purchase_v1_01.tsv.gz | head -n 10 | tail -n +2 | cut -f13,8 | sort -n

zcat amazon_reviews_us_Digital_Ebook_Purchase_v1_01.tsv.gz | head -n 10000 | tail -n +2 | cut -f13,8 | sort -n | tail -n 10

zcat amazon_reviews_us_Digital_Ebook_Purchase_v1_01.tsv.gz | head -n 50000 | tail -n +2 | sort -t$'\t' -k9n,9 | tail -n 1

zcat amazon_reviews_us_Digital_Ebook_Purchase_v1_01.tsv.gz | head -n 50000 | tail -n +2 | sort -t$'\t' -k9nr,9 -k10n,10 | tail -n 1

zcat amazon_reviews_us_Digital_Ebook_Purchase_v1_01.tsv.gz | head -n 50000 | tail -n +2 | cut -f8 | sort | uniq
--------------------------------------------------------------------------------
/Chapter06/Chapter06_examples.sh:
--------------------------------------------------------------------------------
#!/bin/bash

zcat amazon_reviews_us_Digital_Ebook_Purchase_v1_01.tsv.gz | cut -d$'\t' -f2,8 | head

zcat amazon_reviews_us_Digital_Ebook_Purchase_v1_01.tsv.gz | cut -d$'\t' -f2,8 | tail -n +2 | head

zcat amazon_reviews_us_Digital_Ebook_Purchase_v1_01.tsv.gz | cut -d$'\t' -f15 | cut -d$'-' -f2,3,1 | head

zcat amazon_reviews_us_Digital_Ebook_Purchase_v1_01.tsv.gz | cut -c1-12 | head

zcat amazon_reviews_us_Digital_Ebook_Purchase_v1_01.tsv.gz | cut -d$'\t' -f2,8 | tail -n +2 | grep "^3" | head

join -j2 \
<(zcat amazon_reviews_us_Digital_Ebook_Purchase_v1_01.tsv.gz | cut -d$'\t' -f2 | sort | uniq -c) <(zcat amazon_reviews_us_Digital_Ebook_Purchase_v1_00.tsv.gz | cut -d$'\t' -f2 | sort | uniq -c) | head

zcat amazon_reviews_us_Digital_Ebook_Purchase_v1_01.tsv.gz | cut -d$'\t' -f 2 | sort | uniq -c | head

zcat amazon_reviews_us_Digital_Ebook_Purchase_v1_01.tsv.gz | cut -d$'\t' -f2,8 | awk '{sum[$1]+=$2;count[$1]+=1} END {for (i in sum) {print i,sum[i],count[i],sum[i]/count[i]}}' | head

zcat amazon_reviews_us_Digital_Ebook_Purchase_v1_01.tsv.gz | cut -d$'\t' -f2,8 | awk '{sum[$1]+=$2;count[$1]+=1} END {for (i in sum) {print i,sum[i],count[i],sum[i]/count[i]}}' | sort -k3 -r -n | head

zcat amazon_reviews_us_Digital_Ebook_Purchase_v1_01.tsv.gz | cut -d$'\t' -f2,8 | awk '{sum[$1]+=$2;count[$1]+=1} END {for (i in sum) {print i,sum[i],count[i],sum[i]/count[i]}}' | sort -k3 -r -n | awk '$3 >= 100 && $3 <=200' | head

cp database.sq3 backups/`date +%F`-database.sq3

sudo apt install sqlite3

sqlite3 test.sq3 <
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# Hands-On Data Science with the Command Line

This is the code repository for [Hands-On Data Science with the Command Line](https://www.packtpub.com/big-data-and-business-intelligence/hands-data-science-command-line?utm_source=github&utm_medium=repository&utm_campaign=9781789132984), published by Packt.

**Automate everyday data science tasks using command-line tools**

## What is this book about?
The command line has existed on UNIX-based OSes, in the form of the Bash shell, for over three decades. However, few developers realize how OSEMN (pronounced "awesome," and standing for Obtaining, Scrubbing, Exploring, Modeling, and iNterpreting data) command-line tools can be for carrying out simple-to-advanced data science tasks at speed.
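The OSEMN stages map naturally onto a single shell pipeline, much as the chapter scripts demonstrate; here is a minimal sketch (the CSV file and its contents are invented for illustration):

```shell
# Obtain: normally a curl/zcat step; here we fabricate a tiny
# key,value CSV instead.
printf 'b,2\na,1\na,3\n' > raw.csv

# Scrub/Explore: group rows by key; Model: sum the values per key
# with awk; the final sort makes the summary easy to interpret.
sort -t, -k1,1 raw.csv \
  | awk -F, '{sum[$1] += $2} END {for (k in sum) print k, sum[k]}' \
  | sort
```

With the fabricated input, the pipeline reports a total of 4 for key `a` and 2 for key `b`.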
This book covers the following exciting features:
* Learn how to manage users, groups, and permissions
* Encrypt and decrypt disks with Linux Unified Key Setup (LUKS)
* Set up SSH for remote access, and connect it to other nodes
* Understand how to add, remove, and search for packages
* Use NFS and Samba to share directories with other users

If you feel this book is for you, get your [copy](https://www.amazon.com/dp/1789132983) today!

https://www.packtpub.com/

## Instructions and Navigations
All of the code is organized into folders. For example, Chapter02.

The code will look like the following:
```
cat <<EOF >greetlib.sh
greet_yourself () {
  echo Hello, \${1:-\$USER}!
}
EOF
```

**Following is what you need for this book:**
This book is for data scientists and data analysts with little to no knowledge of the command line, but who have an understanding of data science. Perform everyday data science tasks using the power of command-line tools.

With the following software and hardware list you can run all code files present in the book (Chapters 1-6).
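The greetlib.sh snippet shown in the sample code block defines a `greet_yourself` function; once the file exists, it can be sourced and called like this (a minimal sketch):

```shell
# Recreate the library as in the README sample; the escaped \$ keeps
# ${1:-$USER} literal inside the unquoted heredoc.
cat <<EOF >greetlib.sh
greet_yourself () {
  echo Hello, \${1:-\$USER}!
}
EOF

# Source the library, then greet an explicit name.
. ./greetlib.sh
greet_yourself "Dana"   # prints "Hello, Dana!"
```

Calling `greet_yourself` with no argument falls back to `$USER`, so the same function greets the current user by default.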
### Software and Hardware List

| Chapter | Software required | OS required                        |
| ------- | ----------------- | ---------------------------------- |
| 1-6     | sqlite3           | Windows, Mac OS X, and Linux (Any) |


### Related products
* Beginning Data Science with Python and Jupyter [[Packt]](https://www.packtpub.com/big-data-and-business-intelligence/beginning-data-science-python-and-jupyter?utm_source=github&utm_medium=repository&utm_campaign=9781789532029) [[Amazon]](https://www.amazon.com/dp/1789532027)

* Hands-On Data Science with Anaconda [[Packt]](https://www.packtpub.com/big-data-and-business-intelligence/hands-data-science-anaconda?utm_source=github&utm_medium=repository&utm_campaign=9781788831192) [[Amazon]](https://www.amazon.com/dp/1788831195)

## Get to Know the Authors
**Jason Morris**
is a systems and research engineer with over 19 years of experience in system architecture, research engineering, and large data analysis. His primary focus is machine learning with TensorFlow, CUDA, and Apache Spark. Jason is also a speaker and a consultant on designing large-scale architectures, implementing best security practices on the cloud, creating near real-time image detection analytics with deep learning, and developing serverless architectures to aid in ETL. His most recent roles include solution architect, big data engineer, big data specialist, and instructor at Amazon Web Services. He is currently the Chief Technology Officer of Next Rev Technologies, and his favorite command-line program is netcat.

**Chris McCubbin**
is a data scientist and software developer with 20 years' experience in developing complex systems and analytics. He co-founded the successful big data security start-up Sqrrl, since acquired by Amazon.
He has also developed smart swarming systems for drones, social network analysis systems in MapReduce, and big data security analytics platforms using the Apache Accumulo and Spark projects. He has been using the Unix command line since his college days on IRIX platforms, and his favorite command-line program is find.

**Raymond Page**
is a computer engineer specializing in site reliability. His experience with embedded development engendered a passion for removing the pervasive bloat from web technologies and cloud computing. His favorite command is cat.


## Other books by the authors
* [Android User Interface Development: Beginner's Guide](https://www.packtpub.com/application-development/android-user-interface-development-beginners-guide?utm_source=github&utm_medium=repository&utm_campaign=9781849514484)
* [Hands-On Android UI Development](https://www.packtpub.com/application-development/hands-android-ui-development?utm_source=github&utm_medium=repository&utm_campaign=9781788475051)

### Suggestions and Feedback
[Click here](https://docs.google.com/forms/d/e/1FAIpQLSdy7dATC6QmEL81FIUuymZ0Wy9vH1jHkvpY57OiMeKGqib_Ow/viewform) if you have any feedback or suggestions.
### Download a free PDF

If you have already purchased a print or Kindle version of this book, you can get a DRM-free PDF version at no cost.
Simply click on the link to claim your free PDF.

https://packt.link/free-ebook/9781789132984

--------------------------------------------------------------------------------