├── .gitattributes
├── LICENSE
├── README.md
└── src
    └── metagoofeel.sh

--------------------------------------------------------------------------------
/.gitattributes:
--------------------------------------------------------------------------------
# Auto detect text files and perform LF normalization
* text=auto

--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
MIT License

Copyright (c) 2019 Ivan Šincek

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# Metagoofeel

Web crawler and downloader based on GNU Wget.

The goal is to be less intrusive than simply mirroring an entire website.

You can also import your own list of already crawled URLs (e.g. from Burp Suite).

The current regular expression for extracting URLs from GNU Wget's output is `(?<=URL\:\ )[^\s]+(?=\ 200\ OK)`; the download filter simply checks whether the supplied keyword is contained in a URL.

Tweak this tool to your liking by modifying the regular expressions.
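For illustration, this is roughly how that extraction behaves on a single log line in the format the regular expression expects (the timestamp and URL below are made up):

```fundamental
echo '2021-06-01 12:00:00 URL: https://example.com/docs/report.pdf 200 OK' | grep -Po '(?<=URL\:\ )[^\s]+(?=\ 200\ OK)'
```

This prints only `https://example.com/docs/report.pdf`.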
Tested on Kali Linux v2021.2 (64-bit).

Made for educational purposes. I hope it will help!

## How to Run

Open your preferred console from [/src/](https://github.com/ivan-sincek/metagoofeel/tree/master/src) and run the commands shown below.

Install required packages:

```fundamental
apt-get -y install bc
```

Change file permissions:

```fundamental
chmod +x metagoofeel.sh
```

Run the script:

```fundamental
./metagoofeel.sh
```

Tail the crawling progress (optional):

```fundamental
tail -f metagoofeel_urls.txt
```

## Usage

```fundamental
Metagoofeel v2.2 ( github.com/ivan-sincek/metagoofeel )

--- Crawl ---
Usage:   ./metagoofeel.sh -d domain [-r recursion]
Example: ./metagoofeel.sh -d https://example.com [-r 20]

--- Crawl and download ---
Usage:   ./metagoofeel.sh -d domain -k keyword [-r recursion]
Example: ./metagoofeel.sh -d https://example.com -k all [-r 20]

--- Download from a file ---
Usage:   ./metagoofeel.sh -f file -k keyword
Example: ./metagoofeel.sh -f metagoofeel_urls.txt -k pdf

DESCRIPTION
    Crawl through an entire website and download specific or all files
DOMAIN
    Domain you want to crawl
    -d - https://example.com | https://192.168.1.10 | etc.
KEYWORD
    Keyword to download only specific files
    Use 'all' to download all files
    -k - pdf | js | png | all | etc.
RECURSION
    Maximum recursion depth
    Use '0' for infinite
    Default: 10
    -r - 0 | 5 | etc.
FILE
    File with [already crawled] URLs
    -f - metagoofeel_urls.txt | etc.
```
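When downloading from a file, the script just reads whitespace-separated URLs; if your export (e.g. from Burp Suite) is in some other format, one way to normalize it might be the following (`burp_export.txt` is a hypothetical file name):

```fundamental
grep -Po 'https?://[^\s"]+' burp_export.txt | sort -u > metagoofeel_urls.txt
./metagoofeel.sh -f metagoofeel_urls.txt -k pdf
```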
--------------------------------------------------------------------------------
/src/metagoofeel.sh:
--------------------------------------------------------------------------------
#!/bin/bash

start=$(date "+%s.%N")

# -------------------------- INFO --------------------------

function basic () {
    proceed=false
    echo "Metagoofeel v2.2 ( github.com/ivan-sincek/metagoofeel )"
    echo ""
    echo "--- Crawl ---"
    echo "Usage:   ./metagoofeel.sh -d domain [-r recursion]"
    echo "Example: ./metagoofeel.sh -d https://example.com [-r 20]"
    echo ""
    echo "--- Crawl and download ---"
    echo "Usage:   ./metagoofeel.sh -d domain -k keyword [-r recursion]"
    echo "Example: ./metagoofeel.sh -d https://example.com -k all [-r 20]"
    echo ""
    echo "--- Download from a file ---"
    echo "Usage:   ./metagoofeel.sh -f file -k keyword"
    echo "Example: ./metagoofeel.sh -f metagoofeel_urls.txt -k pdf"
}

function advanced () {
    basic
    echo ""
    echo "DESCRIPTION"
    echo "    Crawl through an entire website and download specific or all files"
    echo "DOMAIN"
    echo "    Domain you want to crawl"
    echo "    -d - https://example.com | https://192.168.1.10 | etc."
    echo "KEYWORD"
    echo "    Keyword to download only specific files"
    echo "    Use 'all' to download all files"
    echo "    -k - pdf | js | png | all | etc."
    echo "RECURSION"
    echo "    Maximum recursion depth"
    echo "    Use '0' for infinite"
    echo "    Default: 10"
    echo "    -r - 0 | 5 | etc."
    echo "FILE"
    echo "    File with [already crawled] URLs"
    echo "    -f - metagoofeel_urls.txt | etc."
}

# -------------------- VALIDATION BEGIN --------------------

# my own validation algorithm

proceed=true

# $1 (required) - message
function echo_error () {
    echo "ERROR: ${1}" 1>&2
}

# $1 (required) - message
# $2 (required) - help
function error () {
    proceed=false
    echo_error "${1}"
    if [[ $2 == true ]]; then
        echo "Use -h for basic and --help for advanced info" 1>&2
    fi
}

declare -A args=([domain]="" [keyword]="" [recursion]="" [file]="")

# $1 (required) - key
# $2 (required) - value
function validate () {
    if [[ ! -z $2 ]]; then
        if [[ $1 == "-d" && -z ${args[domain]} ]]; then
            args[domain]=$2
        elif [[ $1 == "-k" && -z ${args[keyword]} ]]; then
            args[keyword]=$2
        elif [[ $1 == "-r" && -z ${args[recursion]} ]]; then
            args[recursion]=$2
            if [[ ! ( ${args[recursion]} =~ ^[0-9]+$ ) ]]; then
                error "Recursion depth must be numeric"
            fi
        elif [[ $1 == "-f" && -z ${args[file]} ]]; then
            args[file]=$2
            if [[ ! -f ${args[file]} ]]; then
                error "File does not exist"
            elif [[ ! -r ${args[file]} ]]; then
                error "File does not have read permission"
            elif [[ ! -s ${args[file]} ]]; then
                error "File is empty"
            fi
        fi
    fi
}

# $1 (required) - argc
# $2 (required) - args
function check () {
    local argc=$1
    local -n args_ref=$2
    local count=0
    for key in "${!args_ref[@]}"; do
        if [[ ! -z ${args_ref[$key]} ]]; then
            count=$((count + 1))
        fi
    done
    # echoes 1 only if every supplied option pair was stored,
    # i.e. no duplicate or unrecognized options slipped through
    echo $((argc - count == argc / 2))
}

if [[ $# == 0 ]]; then
    advanced
elif [[ $# == 1 ]]; then
    if [[ $1 == "-h" ]]; then
        basic
    elif [[ $1 == "--help" ]]; then
        advanced
    else
        error "Incorrect usage" true
    fi
elif [[ $(($# % 2)) -eq 0 && $# -le $((${#args[@]} * 2)) ]]; then
    for key in $(seq 1 2 $#); do
        val=$((key + 1))
        validate "${!key}" "${!val}"
    done
    # fail if neither -d nor -f was given, if -d/-r was mixed with -f,
    # or if any supplied option pair was rejected by validate ()
    if [[ -z ${args[domain]} && -z ${args[file]} || ( ! -z ${args[domain]} || ! -z ${args[recursion]} ) && ! -z ${args[file]} || $(check $# args) -eq 0 ]]; then
        error "Missing a mandatory option (-d) and/or optional (-k, -r)"
        error "Missing a mandatory option (-f, -k)" true
    fi
else
    error "Incorrect usage" true
fi

# --------------------- VALIDATION END ---------------------

# ----------------------- TASK BEGIN -----------------------

# $1 (required) - message
function timestamp () {
    echo "${1} -- $(date "+%H:%M:%S %m-%d-%Y")"
}

function interrupt () {
    echo ""
    echo "[Interrupted]"
}

# $1 (required) - domain
# $2 (required) - output
# $3 (optional) - recursion
function crawl () {
    echo "All crawled URLs will be saved in '${2}'"
    echo "You can tail the crawling progress with 'tail -f ${2}'"
    echo "Press CTRL + C to stop early"
    timestamp "Crawling has started"
    # spider only (nothing is downloaded); the recursion depth defaults
    # to 10, and '-l 0' means infinite recursion in GNU Wget
    wget "${1}" -e robots=off -nv --spider --random-wait -nd --no-cache -r -l "${3:-10}" -o "${2}"
    timestamp "Crawling has ended "
    # keep only URLs that returned 200 OK; sort reads its whole input
    # before writing, so reusing the same file here is safe
    grep -Po '(?<=URL\:\ )[^\s]+(?=\ 200\ OK)' "${2}" | sort -u -o "${2}"
    echo "Total URLs crawled: $(grep -Po '[^\s]+' "${2}" | wc -l)"
}

downloading=true

function interrupt_download () {
    downloading=false
    interrupt
}

# $1 (required) - keyword
# $2 (required) - input
function download () {
    local count=0
    local directory="metagoofeel_$(echo "${1}" | sed "s/[[:space:]]/_/g;s/\//_/g")"
    echo "All downloaded files will be saved in '/${directory}/'"
    echo "Press CTRL + C to stop early"
    timestamp "Downloading has started"
    for url in $(grep -Po '[^\s]+' "${2}"); do
        if [[ $downloading == false ]]; then
            break
        fi
        # case-insensitive substring match of the keyword against the whole URL
        if [[ $1 == "all" || $(echo "${url}" | grep -i "${1}") ]]; then
            if [[ $(wget "${url}" -e robots=off -nv -nc -nd --no-cache -P "${directory}" 2>&1) ]]; then
                echo "${url}"
                count=$((count + 1))
            fi
        fi
    done
    timestamp "Downloading has ended "
    echo "Total files downloaded: ${count}"
}
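# Note (editorial, illustrative example): because the keyword in download ()
# is passed straight to grep, '-k pdf' matches a URL such as
# 'https://example.com/pdfs/index.html' as well as files ending in '.pdf';
# since the keyword is treated as a regular expression, something like
# '-k "\.pdf$"' would match only URLs that actually end in '.pdf'.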
if [[ $proceed == true ]]; then
    echo "########################################################################"
    echo "#                                                                      #"
    echo "#                           Metagoofeel v2.2                           #"
    echo "#                            by Ivan Sincek                            #"
    echo "#                                                                      #"
    echo "# Crawl through an entire website and download specific or all files. #"
    echo "# GitHub repository at github.com/ivan-sincek/metagoofeel.             #"
    echo "#                                                                      #"
    echo "########################################################################"
    if [[ ! -z ${args[file]} ]]; then
        trap interrupt_download INT
        download "${args[keyword]}" "${args[file]}"
        # 'trap INT' with no action restores the default SIGINT behavior
        trap INT
    else
        output="metagoofeel_urls.txt"
        input="yes"
        if [[ -f $output ]]; then
            echo "Output file '${output}' already exists"
            read -p "Overwrite the output file (yes): " input
            echo ""
        fi
        if [[ $input == "yes" ]]; then
            trap interrupt INT
            crawl "${args[domain]}" "${output}" "${args[recursion]}"
            trap INT
            if [[ ! -z ${args[keyword]} ]]; then
                echo ""
                read -p "Start downloading (yes): " input
                if [[ $input == "yes" ]]; then
                    echo ""
                    trap interrupt_download INT
                    download "${args[keyword]}" "${output}"
                    trap INT
                fi
            fi
        fi
    fi
    end=$(date "+%s.%N")
    runtime=$(echo "${end} - ${start}" | bc -l)
    echo ""
    echo "Script has finished in ${runtime} seconds"
fi

# ------------------------ TASK END ------------------------
--------------------------------------------------------------------------------