├── LICENSE
├── README.md
├── config.sh
├── domain_enum
├── examples
│   ├── example1.png
│   └── example2.png
├── extractor.sh
├── fuzz
├── log.csv
├── requirements
└── xml_fields

/LICENSE:
--------------------------------------------------------------------------------
The MIT License (MIT)

Copyright (c) 2016 eschultze

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# URLextractor

Information gathering & website reconnaissance
------

**Usage:**
`./extractor.sh http://www.hackthissite.org/`

![](https://github.com/eschultze/URLextractor/blob/master/examples/example1.png)

**Tips:**
* Colorex: colorizes the output (`pip install colorex`); use it like `./extractor.sh http://www.hackthissite.org/ | colorex -g "INFO" -r "ALERT"`
* Tldextract: used by the DNS enumeration function (`pip install tldextract`)

Features:
------

* IP and hosting info like city and country (using [ip-api](http://ip-api.com/), which replaced [FreegeoIP](http://freegeoip.net/))
* DNS servers (using [dig](http://packages.ubuntu.com/precise/dnsutils))
* ASN, Network range, ISP name (using [RISwhois](https://www.ripe.net/analyse/archived-projects/ris-tools-web-interfaces/riswhois))
* Load balancer test
* Whois for abuse mail (using [Spamcop](https://www.spamcop.net/))
* PAC (Proxy Auto Configuration) file detection
* Compares hashes to diff code
* robots.txt (recursively looking for hidden stuff)
* Source code (looking for passwords and users)
* External links (frames from other websites)
* Directory FUZZ (like DirBuster and Wfuzz, using the [DirBuster](https://www.owasp.org/index.php/Category:OWASP_DirBuster_Project) directory list)
* [URLvoid](http://www.urlvoid.com/) API - checks Google page rank, Alexa rank and possible blacklists
* Provides useful links at other websites to correlate with IP/ASN
* Option to open ALL results in browser at the end

Changelog to version 0.2.0:
------

* [Fix] Changed GeoIP from freegeoip to ip-api
* [Fix/Improvement] Remove duplicates from robots.txt
* [Improvement] Better whois abuse contacts (abuse.net)
* [Improvement] Top-passwords collection added to source-code checking
* [New feature] First-run check that installs dependencies if needed
* [New feature] Log file
* [New feature] Check for hostname/IP in the log file (see the sketch below)
* [New feature] Check if hostname is listed on the Spamhaus Domain Blacklist
* [New feature] Run a quick DNS enumeration with common server names

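The log checks are plain text matches against `log.csv`. A minimal stand-alone sketch of the IP check, assuming the `HH:MM:SS DD/MM/YYYY,IP,URL` row format that extractor.sh appends (the IP below is only an example):

```bash
#!/bin/bash
# Count earlier runs against the same IP (column 2 of log.csv).
TARGET_IP="93.184.216.34"   # example value, not from the repo
HITS=$(cut -d',' -f2 log.csv | grep -cx "$TARGET_IP")
[[ $HITS -ge 1 ]] && echo "[*] Same IP $TARGET_IP was previously analyzed $HITS time(s)"
```
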
Changelog to version 0.1.9:
------

* Abuse mail using lynx instead of ~~curl~~
* Target server name parsing fixed
* More verbose about HTTP codes and directory discovery
* MD5 collection for IP fixed
* Links found now show unique URLs from array
* [New feature] **Google** results
* [New feature] **Bing** IP check for other hosts/vhosts
* [New feature] Opened ports from **Shodan**
* [New feature] **VirusTotal** information about IP
* [New feature] **Alexa Rank** information about $TARGET_HOST

Requirements:
------

Tested on Kali (light/mini) and OS X 10.11.3 (with brew)
```
sudo apt-get install bc curl dnsutils libxml2-utils whois md5sha1sum lynx openssl -y
```

**Configuration file:**
```
CURL_TIMEOUT=15 #timeout in --connect-timeout
CURL_UA=Mozilla #user-agent (keep it simple)
INTERNAL=NO #YES OR NO (show internal network info)
URLVOID_KEY=your_API_key #using API from http://www.urlvoid.com/
FUZZ_LIMIT=10 #how many lines it will read from fuzz file
OPEN_TARGET_URLS=NO #open found URLs at the end of script
OPEN_EXTERNAL_LINKS=NO #open external links (frames) at the end of script
FIRST_TIME=YES #if first time, check for dependencies
```

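These values can be edited directly in `config.sh`, or patched from the shell. A hypothetical one-liner (GNU `sed` syntax; on OS X use `sed -i ''`):

```bash
# Read 50 fuzz wordlist entries instead of the default 10.
sed -i 's/^FUZZ_LIMIT=10/FUZZ_LIMIT=50/' config.sh
```
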
Todo list:
------

* [x] Upload to github :)
* [x] Check for installed packages
* [ ] Integration with other APIs
* [ ] Export to CSV
* [ ] Integration with CipherScan

## Stargazers over time

[![Stargazers over time](https://starchart.cc/eschultze/URLextractor.svg)](https://starchart.cc/eschultze/URLextractor)
--------------------------------------------------------------------------------
/config.sh:
--------------------------------------------------------------------------------
#!/bin/bash

CURL_TIMEOUT=15 #timeout in --connect-timeout
CURL_UA=Mozilla #user-agent (keep it simple)
INTERNAL=NO #YES OR NO (show internal network info)
URLVOID_KEY= #using API from http://www.urlvoid.com/
FUZZ_LIMIT=10 #how many lines it will read from fuzz file
OPEN_TARGET_URLS=NO #open found URLs at the end of script
OPEN_EXTERNAL_LINKS=NO #open external links (frames) at the end of script
FIRST_TIME=YES #if first time, check for dependencies

--------------------------------------------------------------------------------
/domain_enum:
--------------------------------------------------------------------------------
ad
admin
ads
alpha
api
api-online
apolo
app
beta
bi
blog
cdn
events
ex
files
ftp
gateway
go
help
ib
images
internetbanking
intranet
jobs
join
live
login
m
mail
mail2
mobile
moodle
mx
mx2
mx3
my
new
news
ns1
ns2
ns3
oauth
old
one
open
out
outlook
portfolio
raw
repo
router
search
siem
slack
slackbot
snmp
stream
support
syslog
tags
test
upload
video
vpn
webconf
webmail
webportal
wiki
www2
www3
zendesk

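Each name above is prepended to the target's registered domain by the DNS-enumeration loop in extractor.sh. A minimal stand-alone equivalent, assuming `dig` is available (`example.com` is a placeholder domain):

```bash
#!/bin/bash
# Resolve every candidate subdomain in the wordlist; print only the hits.
TARGET_DOMAIN="example.com"   # placeholder target
while read -r NAME; do
  ANSWER=$(dig +short "$NAME.$TARGET_DOMAIN" | xargs)
  [[ -n $ANSWER ]] && echo -e "[*] $NAME.$TARGET_DOMAIN \t $ANSWER"
done < domain_enum
```
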
--------------------------------------------------------------------------------
/examples/example1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/eschultze/URLextractor/739864d0f8c6427c835a31e3360d491947fe406a/examples/example1.png
--------------------------------------------------------------------------------
/examples/example2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/eschultze/URLextractor/739864d0f8c6427c835a31e3360d491947fe406a/examples/example2.png
--------------------------------------------------------------------------------
/extractor.sh:
--------------------------------------------------------------------------------
#!/bin/bash

# Clean up temporary files on Ctrl+C and exit with the SIGINT status code.
erase_temp_files(){
  echo -e "\n[ALERT] OK... Let's close"
  rm -f URLs_$TARGET_HOST.txt $TARGET_DOMAIN.xml URLsExternal$TARGET_HOST.txt
  exit 130
}

trap erase_temp_files SIGINT

clear

source config.sh

# First run only: install everything listed in ./requirements, then flip
# FIRST_TIME to NO in config.sh so the check is skipped from now on.
if [[ $FIRST_TIME = "YES" ]]; then
  for APP in $(cat requirements); do sudo apt-get install $APP; done
  sed -i 's/^FIRST_TIME=YES/FIRST_TIME=NO/' config.sh
  clear
fi

TARGET=$1

echo -e "\e[1;32m##################################################"
echo -e "#                  URLextractor                  #"
echo -e "# Information Gathering & Website Reconnaissance #"
echo -e "#               coded by eschultze               #"
echo -e "#            https://phishstats.info/            #"
echo -e "#                version - 0.2.0                 #"
echo -e "##################################################\e[m"

date '+[INFO] Date: %d/%m/%y | Time: %H:%M:%S'
date_begin=$(date +"%s")

if [[ $INTERNAL != "NO" ]]; then
  echo [INFO] ----Machine info----
  distrib=$(cat /etc/issue | cut -d' ' -f1)
  echo [*] Distribution: $distrib
  user=$(whoami)
  echo [*] User: $user
  echo [INFO] ----Network info----
  rede=$(ifconfig | awk '{print$1}' | grep 'eth\|lo\|lan\|pan\|vmnet' | grep ':' | cut -d':' -f1 | head -1)
  echo [*] Network interface: $rede
  internal=$(ifconfig | grep "inet " | awk '{print$2}' | head -1)
  echo [*] Internal IP: $internal

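  # GeoIP lookups use ip-api.com's CSV endpoint; the column order assumed by
  # the cut field numbers below is:
  #   status,country,countryCode,region,regionName,city,zip,lat,lon,timezone,isp,org,as,query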
  EXTERNAL_IP=$(curl -A $CURL_UA --silent --connect-timeout $CURL_TIMEOUT http://ipinfo.io/ip)
  GEOIP=$(curl -A $CURL_UA --silent --connect-timeout $CURL_TIMEOUT http://ip-api.com/csv/$EXTERNAL_IP) && echo [*] External IP: $EXTERNAL_IP
  EXTERNAL_IP_CC=$(echo $GEOIP | cut -d',' -f3 | cut -d '"' -f2) && echo [*] CC: $EXTERNAL_IP_CC

  TRIES=0
  TRIES_MAX=6
  while [[ $EXTERNAL_IP_CC = "Try again later" ]] || [[ $EXTERNAL_IP_CC = "" ]]; do
    echo "[ALERT] Problem with IP-API detected... trying to reconnect with $CURL_TIMEOUT seconds timeout. Number of tries: $TRIES/$TRIES_MAX"
    TRIES=$((TRIES+1))
    GEOIP=`curl -A $CURL_UA --silent --connect-timeout $CURL_TIMEOUT http://ip-api.com/csv/$EXTERNAL_IP`
    EXTERNAL_IP_CC=`echo $GEOIP | cut -d',' -f3 | cut -d '"' -f2`
    echo [*] Number of tries: $TRIES

    if [[ $TRIES -ge $TRIES_MAX ]]; then
      echo "[ALERT] It seems IP-API is currently DOWN... exiting"
      exit 1
    fi
  done

  EXTERNAL_IP_CN=$(echo $GEOIP | cut -d',' -f2 | cut -d '"' -f2) && echo [*] Country: $EXTERNAL_IP_CN
  EXTERNAL_IP_RG=$(echo $GEOIP | cut -d',' -f4 | cut -d '"' -f2) && echo [*] RegionCode: $EXTERNAL_IP_RG
  EXTERNAL_IP_RN=$(echo $GEOIP | cut -d',' -f5 | cut -d '"' -f2) && echo [*] RegionName: $EXTERNAL_IP_RN
  EXTERNAL_IP_CITY=$(echo $GEOIP | cut -d',' -f6 | cut -d '"' -f2) && echo [*] City: $EXTERNAL_IP_CITY
  EXTERNAL_MAP=$(echo $GEOIP | cut -d',' -f8-9) && echo [*] GoogleMaps: https://www.google.com/maps/@$EXTERNAL_MAP,10z

  # RIS whois: ASN, BGP prefix and description for the external IP.
  WHOIS_IP=`whois -h riswhois.ripe.net $EXTERNAL_IP | egrep "route|origin|descr" | head -4`
  EXTERNAL_IP_ASN=$(echo $WHOIS_IP | awk '{print$13}') && echo [*] ASN: $EXTERNAL_IP_ASN
  EXTERNAL_IP_BGP=$(echo $WHOIS_IP | awk '{print$11}') && echo [*] BGP_PREFIX: $EXTERNAL_IP_BGP
  EXTERNAL_IP_ISP=$(echo $WHOIS_IP | cut -d' ' -f15-28) && echo [*] ISP: $EXTERNAL_IP_ISP

fi

TARGET_HOST=$(echo $TARGET | cut -d'/' -f3 | cut -d':' -f1)
if [[ -z $TARGET ]]; then
  echo "[ALERT] NO target set"
  echo "[ALERT] USAGE: ./extractor.sh http://site.com/ OR http://site.com/path/dir/file.php OR http://site.com/path/proxy.pac"
  exit 1
else

TARGET=$(curl --fail -A $CURL_UA -L --write-out "%{url_effective}\n" --silent --connect-timeout $CURL_TIMEOUT --output /dev/null $1)

echo [INFO] ------TARGET info------
echo [*] TARGET: $TARGET

GEOIP=$(curl -A $CURL_UA --silent --connect-timeout $CURL_TIMEOUT http://ip-api.com/csv/$TARGET_HOST)
# ip-api resolves hostnames; the resolved IP comes back in the last CSV field.
TARGET_IP=$(echo $GEOIP | awk -F',' '{print $NF}')

LOG_IP=`cat log.csv | cut -d',' -f2 | grep $TARGET_IP | wc -l | sed -e 's/^[ \t]*//'`
if [[ $LOG_IP -ge 1 ]]; then
  echo "[*] Same IP $TARGET_IP was previously analyzed $LOG_IP time(s)"
fi

LOG_TARGET=`cat log.csv | cut -d',' -f3 | grep $TARGET | wc -l | sed -e 's/^[ \t]*//'`
if [[ $LOG_TARGET -ge 1 ]]; then
  echo "[*] Same target $TARGET was previously analyzed $LOG_TARGET time(s)"
fi

TRIES=0
TRIES_MAX=6
while [[ $TARGET_IP = "Try again later" ]] || [[ $TARGET_IP = "" ]]; do
  TRIES=$((TRIES+1))
  echo "[ALERT] Problem with IP-API detected... trying to reconnect with $CURL_TIMEOUT seconds timeout. Number of tries: $TRIES/$TRIES_MAX"
  GEOIP=`curl -A $CURL_UA --silent --connect-timeout $CURL_TIMEOUT http://ip-api.com/csv/$TARGET_HOST`
  TARGET_IP=`echo $GEOIP | awk -F',' '{print $NF}'`

  if [[ $TRIES -ge $TRIES_MAX ]]; then
    echo "[ALERT] It seems IP-API is currently DOWN... exiting"
    exit 1
  fi
done

if [[ $TARGET_IP =~ ^(([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])\.){3}([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])$ ]]; then
  echo [*] TARGET IP: $TARGET_IP
else
  # Fall back to resolving the hostname ourselves.
  TARGET_IP=$(host $TARGET_HOST | grep "has address" | cut -d' ' -f4 | head -1)
  if [[ -z $TARGET_IP ]]; then
    echo "[ALERT] It seems $TARGET is OFFLINE... exiting"
    exit 1
  else
    echo [*] TARGET IP: $TARGET_IP
  fi
fi

# More than one A record usually means a load balancer / round-robin DNS.
TARGET_LOADB=$(host $TARGET_HOST | grep "has address" | wc -l | sed -e 's/^[ \t]*//')
if [[ $TARGET_LOADB -ge 2 ]]; then
  echo "[ALERT] $TARGET_HOST has a load balancer for IPv4 with the following IPs:"
  for TARGET_LOADB_IP in $(host $TARGET_HOST | grep "has address" | cut -d' ' -f4)
  do
    echo [*] $TARGET_LOADB_IP
  done
else
  echo "[INFO] NO load balancer detected for $TARGET_HOST..."
fi

TARGET_DNS=$(dig -t SOA $TARGET_HOST | grep -A1 "AUTHORITY SECTION\|ANSWER SECTION" | awk '{print$5}' | sed '/^$/d') && echo [*] DNS servers: ${TARGET_DNS[@]}

TARGET_SERVER=$(curl -A $CURL_UA -I -L --silent http://$TARGET_HOST/ | grep Server: | uniq | cut -d' ' -f2-10) && echo [*] TARGET server: $TARGET_SERVER
TARGET_IP_CC=$(echo $GEOIP | cut -d',' -f3 | cut -d '"' -f2) && echo [*] CC: $TARGET_IP_CC
TARGET_IP_CN=$(echo $GEOIP | cut -d',' -f2 | cut -d '"' -f2) && echo [*] Country: $TARGET_IP_CN
TARGET_IP_RG=$(echo $GEOIP | cut -d',' -f4 | cut -d '"' -f2) && echo [*] RegionCode: $TARGET_IP_RG
TARGET_IP_RN=$(echo $GEOIP | cut -d',' -f5 | cut -d '"' -f2) && echo [*] RegionName: $TARGET_IP_RN
TARGET_IP_CITY=$(echo $GEOIP | cut -d',' -f6 | cut -d '"' -f2) && echo [*] City: $TARGET_IP_CITY

WHOIS_IP=`whois -h riswhois.ripe.net $TARGET_IP | egrep "route|origin|descr" | head -4`
TARGET_IP_ASN=$(echo $WHOIS_IP | awk '{print$13}') && echo [*] ASN: $TARGET_IP_ASN
TARGET_IP_BGP=$(echo $WHOIS_IP | awk '{print$11}') && echo [*] BGP_PREFIX: $TARGET_IP_BGP
TARGET_IP_ISP=$(echo $WHOIS_IP | cut -d' ' -f15-28) && echo [*] ISP: $TARGET_IP_ISP

if [[ $TARGET =~ ^https ]]; then
  echo "[INFO] SSL/HTTPS certificate detected"
  SSL_ISSUER=`echo | openssl s_client -servername $TARGET_HOST -connect $TARGET_HOST:443 2>/dev/null | openssl x509 -noout -issuer -subject | grep "issuer"` && echo [*] Issuer: $SSL_ISSUER
  SSL_SUBJECT=`echo | openssl s_client -servername $TARGET_HOST -connect $TARGET_HOST:443 2>/dev/null | openssl x509 -noout -issuer -subject | grep "subject"` && echo [*] Subject: $SSL_SUBJECT
  SSL_ISSUER_LETS=`echo $SSL_ISSUER | grep -oiE "let.?s.?encrypt"`
  if [[ $SSL_ISSUER_LETS != "" ]]; then
    echo "[ALERT] Let's Encrypt is commonly used for Phishing"
  fi
fi

# If tldextract is available, derive the registered domain and probe the
# common subdomain names from ./domain_enum.
TLD_EXTRACT=$(which tldextract)
if [[ $TLD_EXTRACT != "" ]]; then
  TARGET_DOMAIN=$(tldextract $TARGET_HOST | rev | cut -d' ' -f1-2 | rev | sed 's/ /./g')

  echo "[INFO] DNS enumeration:"
  for DOMAIN_ENUM in $(cat domain_enum)
  do
    DOMAIN_ENUMTEST=$(dig +short $DOMAIN_ENUM.$TARGET_DOMAIN | xargs)
    if [[ $DOMAIN_ENUMTEST != "" ]]; then
      echo -e "[*] $DOMAIN_ENUM.$TARGET_DOMAIN \t $DOMAIN_ENUMTEST"
    fi
  done
fi

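# Abuse contacts: harvest mailto: links from Spamcop's tracking page and the
# TXT records that abuse.net publishes for the target host.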
TEMP_MAIL_ARRAY=()
echo "[INFO] Possible abuse mails are:"
for TEMP_MAIL in $(curl -L -A $CURL_UA --silent --connect-timeout $CURL_TIMEOUT "https://www.spamcop.net/sc?track=$TARGET_IP" | grep -oE 'mailto:.*' | grep -v bait | cut -d':' -f2 | cut -d'"' -f1)
do
  TEMP_MAIL_ARRAY+=($TEMP_MAIL)
done
for TEMP_MAIL in $(dig -t TXT +short $TARGET_HOST.contacts.abuse.net | sed 's/"//g')
do
  TEMP_MAIL_ARRAY+=($TEMP_MAIL)
done

SPAMHAUS_DBL=$(dig +short $TARGET_HOST.dbl.spamhaus.org)
if [[ $SPAMHAUS_DBL != "" ]]; then
  echo "[ALERT] $TARGET_HOST is listed on Spamhaus Domain Blacklist"
fi

for TEMP_MAIL in ${TEMP_MAIL_ARRAY[@]}; do echo [*] $TEMP_MAIL; done | sort -u

# A page defining FindProxyForURL is a PAC file; pull any PROXY ip:port out.
PAC_TEST=$(curl -A $CURL_UA --silent --connect-timeout $CURL_TIMEOUT $TARGET | grep -o FindProxyForURL)
if [[ "$PAC_TEST" = "FindProxyForURL" ]]; then
  PAC_PROXY=$(curl -A $CURL_UA --silent --connect-timeout $CURL_TIMEOUT $TARGET | grep PROXY | grep -oE "(([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])\.){3}([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5]):([0-9]{1,5})")
  echo "[ALERT] PAC (Proxy Auto Configuration) file found with possible PROXY: $PAC_PROXY"
else
  echo "[INFO] NO PAC (Proxy Auto Configuration) file FOUND"
fi

TARGET_PATH=$(echo $TARGET | cut -d'/' -f4-20)
FOLDER_COUNT=$(echo $TARGET_PATH | tr "/" " " | wc -w | sed -e 's/^[ \t]*//')
if [[ $FOLDER_COUNT -ge 2 ]]; then
  echo "[INFO] Checking for HTTP status codes recursively from /$TARGET_PATH"
  echo -e "[INFO] Status code \t Folders "
  for (( dir = 1; dir < $FOLDER_COUNT; dir++ )); do
    TEMP_PATH=`echo $TARGET_PATH | cut -d '/' -f1-$dir`
    TEMP_HTTP_CODE=`curl -A $CURL_UA -L --write-out "%{http_code}\n" --silent --connect-timeout $CURL_TIMEOUT --output /dev/null "http://$TARGET_HOST/$TEMP_PATH"`
    echo -e "[*] \t $TEMP_HTTP_CODE \t\t http://$TARGET_HOST/$TEMP_PATH/"
    echo "http://$TARGET_HOST/$TEMP_PATH/" >> URLs_$TARGET_HOST.txt
  done
fi

ROBOTS=$(curl -A $CURL_UA -L --write-out "%{http_code}\n" --silent --connect-timeout $CURL_TIMEOUT --output /dev/null "http://$TARGET_HOST/robots.txt")
if [[ $ROBOTS = 200 ]]; then
  echo "[ALERT] robots.txt file FOUND in http://$TARGET_HOST/robots.txt"
  echo "[INFO] Checking for HTTP status codes recursively from http://$TARGET_HOST/robots.txt"
  echo -e "[INFO] Status code \t Folders "
  for TEMP_ROBOTS in $(curl -A $CURL_UA -L --silent --connect-timeout $CURL_TIMEOUT "http://$TARGET_HOST/robots.txt" | grep -oE "^(All.*|Dis.*).*" | cut -d' ' -f2 | sort | uniq)
  do
    ROBOTS_CODE=`curl -L -A $CURL_UA --write-out "%{http_code}\n" --silent --connect-timeout $CURL_TIMEOUT --output /dev/null "http://$TARGET_HOST$TEMP_ROBOTS"`
    if [[ $ROBOTS_CODE =~ ^2 ]] || [[ $ROBOTS_CODE =~ ^3 ]]; then
      echo -e "[*] \t $ROBOTS_CODE \t\t http://$TARGET_HOST$TEMP_ROBOTS"
      echo http://$TARGET_HOST$TEMP_ROBOTS >> URLs_$TARGET_HOST.txt
    fi
  done
fi

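# Directory fuzzing: try the first $FUZZ_LIMIT entries of the ./fuzz wordlist
# and keep anything that answers with a 2xx or 3xx status code.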
echo "[INFO] Starting FUZZing in http://$TARGET_HOST/FUzZzZzZzZz..."
echo -e "[INFO] Status code \t Folders "
cat fuzz | head -$FUZZ_LIMIT | while read DIR
do
  FUZZ_CODE=`curl -L -A $CURL_UA --write-out "%{http_code}\n" --silent --connect-timeout $CURL_TIMEOUT --output /dev/null "http://$TARGET_HOST/$DIR"`
  if [[ $FUZZ_CODE =~ ^2 ]] || [[ $FUZZ_CODE =~ ^3 ]]; then
    echo -e "[*] \t $FUZZ_CODE \t\t http://$TARGET_HOST/$DIR"
    echo http://$TARGET_HOST/$DIR >> URLs_$TARGET_HOST.txt
  fi
done

# Grep the effective URL, the plain hostname and the raw IP for a collection
# of common passwords left in the page source.
PASS_LIST="1234|12345|111111|121212|123123|123321|123456|555555|654321|666666|696969|1234567|7777777|12345678|123456789|987654321|1234567890|123qwe|18atcskd2w|1q2w3e|1q2w3e4r|1q2w3e4r5t|1qaz2wsx|3rjs1la7qe|abc123|access|admin|adobe123|ashley|azerty|bailey|baseball|batman|dragon|flower|Football|freedom|google|hello|hottie|iloveyou|jesus|letmein|login|loveme|master|michael|monkey|mustang|mynoob|ninja|passw0rd|password|password1|photoshop|princess|qazwsx|qwerty|qwertyuiop|shadow|solo|starwars|sunshine|superman|trustno1|welcome|whatever|zaq1zaq1|zxcvbnm"
PASS1=$(curl -A $CURL_UA -L --silent --connect-timeout $CURL_TIMEOUT $TARGET | grep -Ei "$PASS_LIST")
PASS2=$(curl -A $CURL_UA -L --silent --connect-timeout $CURL_TIMEOUT http://$TARGET_HOST/ | grep -Ei "$PASS_LIST")
PASS3=$(curl -A $CURL_UA -L --silent --connect-timeout $CURL_TIMEOUT http://$TARGET_IP/ | grep -Ei "$PASS_LIST")

if [[ $PASS1 != "" ]] || [[ $PASS2 != "" ]] || [[ $PASS3 != "" ]]; then
  echo "[ALERT] Look in the source code. It may contain passwords"
else
  echo "[INFO] NO passwords found in source code"
fi

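# Hash comparison: fetch the page via hostname, via www.<hostname> and via raw
# IP, then diff the MD5s to spot differing content or redirect targets.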
WWW_CHECK=$(echo $TARGET_HOST | grep -o www)
MD1=$(curl -A $CURL_UA -L --silent --connect-timeout $CURL_TIMEOUT http://$TARGET_HOST/ | md5sum | cut -d' ' -f1)

if [[ -z $WWW_CHECK ]]; then
  MD2=$(curl -A $CURL_UA -L --silent --connect-timeout $CURL_TIMEOUT "http://www.$TARGET_HOST/" | md5sum | cut -d' ' -f1)
  REDIR1=$(curl -A $CURL_UA -L --write-out "%{url_effective}\n" --silent --connect-timeout $CURL_TIMEOUT --output /dev/null http://$TARGET_HOST/)
  REDIR2=$(curl -A $CURL_UA -L --write-out "%{url_effective}\n" --silent --connect-timeout $CURL_TIMEOUT --output /dev/null "http://www.$TARGET_HOST/")

  if [[ $MD1 != $MD2 ]]; then
    echo "[ALERT] Content in http://$TARGET_HOST/ AND http://www.$TARGET_HOST/ is different"
    echo "[INFO] MD5 for http://$TARGET_HOST/ is: $MD1"
    echo "[INFO] MD5 for http://www.$TARGET_HOST/ is: $MD2"
    echo "[INFO] http://$TARGET_HOST/ redirects to $REDIR1"
    echo "[INFO] http://www.$TARGET_HOST/ redirects to $REDIR2"

    echo http://$TARGET_HOST/ >> URLs_$TARGET_HOST.txt
    echo http://www.$TARGET_HOST/ >> URLs_$TARGET_HOST.txt

    URL_ARRAY=($TARGET http://$TARGET_HOST/ http://$TARGET_IP/)
  fi
fi

MD3=$(curl -A $CURL_UA -L --silent --connect-timeout $CURL_TIMEOUT "http://$TARGET_IP/" | md5sum | cut -d' ' -f1)
if [[ $MD1 = $MD3 ]]; then
  echo "[INFO] SAME content in http://$TARGET_HOST/ AND http://$TARGET_IP/"
  URL_ARRAY=($TARGET)
else
  URL_ARRAY=($TARGET http://$TARGET_IP/)
fi

# Dump every http/ftp/irc link lynx can see on each page for later review.
for TEMP_ARRAY in $(echo ${URL_ARRAY[*]})
do
  lynx -dump -force_html -listonly -nonumbers -accept_all_cookies -width=160 "$TEMP_ARRAY" | grep "^http\|^ftp\|^irc" | sort | uniq >> URLsExternal$TARGET_HOST.txt
done
echo "[INFO] Links found from ${URL_ARRAY[*]}:"
if [[ -s URLsExternal$TARGET_HOST.txt ]]; then
  cat URLsExternal$TARGET_HOST.txt | sort | uniq | while read LINKS
  do
    echo [*] $LINKS
  done
fi

# Fallback registered-domain guess for when tldextract is not installed.
if [[ $TLD_EXTRACT = "" ]]; then
  HOST_COUNT=$(echo $TARGET_HOST | tr "." " " | wc -w | sed -e 's/^[ \t]*//')
  if [[ $HOST_COUNT -ge 3 ]]; then
    CUT_TEMP=$(echo $HOST_COUNT -1 | bc)
    TARGET_DOMAIN=$(echo $TARGET_HOST | cut -d'.' -f$CUT_TEMP-$HOST_COUNT)
  else
    TARGET_DOMAIN=$TARGET_HOST
  fi
fi

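# URLvoid lookup: each line of ./xml_fields maps an XML tag in the API reply
# to a printable label; xmllint extracts the tag's value by XPath.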
if [[ $URLVOID_KEY != "" ]]; then
  echo "[INFO] URLvoid API information:"
  curl -L -A $CURL_UA --silent --connect-timeout $CURL_TIMEOUT http://api.urlvoid.com/api1000/$URLVOID_KEY/host/$TARGET_DOMAIN/ > $TARGET_DOMAIN.xml
  IFS=$'\n' && for URL_VOID_F in $(cat xml_fields)
  do
    URL_VOID_F1=$(echo $URL_VOID_F | cut -d',' -f1)
    URL_VOID_F2=$(echo $URL_VOID_F | cut -d',' -f2)
    URLVOID_RESULT=$(xmllint --xpath "string(//$URL_VOID_F1)" $TARGET_DOMAIN.xml)
    if [ -n "$URLVOID_RESULT" ]
    then
      echo "[*] $URL_VOID_F2: $URLVOID_RESULT"
    else
      echo "[*] $URL_VOID_F2: EMPTY"
    fi
  done
fi

if [[ $OPEN_TARGET_URLS != "NO" ]]; then
  COUNT=1
  cat URLs_$TARGET_HOST.txt | cut -d' ' -f2 | while read URL
  do
    if [[ $COUNT -le 1 ]]; then
      COUNT=$((COUNT+1))
      xdg-open $URL 2>/dev/null
      sleep 5
    else
      xdg-open $URL 2>/dev/null
      sleep 1
    fi
  done
fi

if [[ $OPEN_EXTERNAL_LINKS != "NO" ]]; then
  COUNT=1
  cat URLsExternal$TARGET_HOST.txt | cut -d' ' -f2 | while read URL
  do
    if [[ $COUNT -le 1 ]]; then
      COUNT=$((COUNT+1))
      xdg-open $URL 2>/dev/null
      sleep 5
    else
      xdg-open $URL 2>/dev/null
      sleep 1
    fi
  done
fi

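# The sections below scrape Google, Bing, Shodan, VirusTotal and Alexa HTML
# through lynx; they are best-effort and will break if those pages change.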
LYNX_GOOGLE_COUNT=`lynx -dump -force_html -nolist -accept_all_cookies -width=160 "http://google.com/search?q=$TARGET_HOST" | grep "result" | wc -w | sed -e 's/^[ \t]*//'`
LYNX_GOOGLE_COUNT_TEMP=`echo $LYNX_GOOGLE_COUNT -3 | bc`

LYNX_GOOGLE=$(lynx -dump -force_html -nolist -accept_all_cookies -width=160 "http://google.com/search?q=$TARGET_HOST" | grep "result" | sed -e 's/^[ \t]*//' | cut -d' ' -f$LYNX_GOOGLE_COUNT_TEMP-$LYNX_GOOGLE_COUNT)
if [[ $LYNX_GOOGLE != "" ]]; then
  echo [INFO] GOOGLE has $LYNX_GOOGLE about http://$TARGET_HOST/
fi

LYNX_BING_IP=$(lynx -dump -force_html -nolist -accept_all_cookies -width=160 "http://www.bing.com/search?q=ip%3A$TARGET_IP" | grep "resultsDate" | awk '{print$1}')
if [[ $LYNX_BING_IP != "" ]]; then
  echo [INFO] BING shows $TARGET_IP is shared with $LYNX_BING_IP hosts/vhosts
fi

echo [INFO] Shodan detected the following opened ports on $TARGET_IP:
for SHODAN_PROTO in $(lynx -dump -force_html -nolist -accept_all_cookies "https://www.shodan.io/host/$TARGET_IP" | grep '*' | grep -o '[0-9]*' | sort | uniq)
do
  echo [*] $SHODAN_PROTO
done

echo "[INFO] ------VirusTotal SECTION------"
echo "[INFO] VirusTotal passive DNS only stores address records. The following domains resolved to the given IP address:"
IFS=$'\n' && for VIRUST_DNS in $(lynx -dump -force_html -nolist -accept_all_cookies -width=160 "https://www.virustotal.com/pt/ip-address/$TARGET_IP/information/" | grep -A10 'passive DNS only stores address records' | grep -v '/' | grep -o '20.*' | column -t)
do
  echo [*] $VIRUST_DNS
done

echo "[INFO] Latest URLs hosted in this IP address detected by at least one URL scanner or malicious URL dataset:"
IFS=$'\n' && for VIRUST_URLS_D in $(lynx -dump -force_html -nolist -accept_all_cookies -width=160 "https://www.virustotal.com/pt/ip-address/$TARGET_IP/information/" | grep -A10 'URLs hosted in this IP address' | grep "$TARGET_HOST" | column -t)
do
  echo [*] $VIRUST_URLS_D
done

echo "[INFO] Latest files that are not detected by any antivirus solution and were downloaded by VirusTotal from the IP address provided:"
IFS=$'\n' && for VIRUST_URLS_N in $(lynx -dump -force_html -nolist -accept_all_cookies -width=160 "https://www.virustotal.com/pt/ip-address/$TARGET_IP/information/" | grep -A10 'not detected by any antivirus' | grep '/' | column -t)
do
  echo [*] $VIRUST_URLS_N
done

echo "[INFO] ------Alexa Rank SECTION------"
echo "[INFO] Percent of Visitors Rank in Country:"
IFS=$'\n' && for ALEXA_COUNTRY in $(lynx -dump -force_html -nolist -accept_all_cookies -width=160 "http://www.alexa.com/siteinfo/$TARGET_HOST" | grep -A5 'Percent of Visitors Rank in Country' | tail -5 | sed -e 's/^[ \t]*//' | sed -n -e 's/^.*Flag //p' | awk '{print$1,$2,$3,$4,$5}')
do
  echo [*] $ALEXA_COUNTRY
done

echo "[INFO] Percent of Search Traffic:"
IFS=$'\n' && for ALEXA_SEARCH in $(lynx -dump -force_html -nolist -accept_all_cookies -width=160 "http://www.alexa.com/siteinfo/$TARGET_HOST" | grep -A5 'Percent of Search Traffic' | sed -e 's/^[ \t]*//' | grep -o '[0-9].*\..*' | cut -d' ' -f2-50 | sed -e 's/^[ \t]*//')
do
  echo [*] $ALEXA_SEARCH
done

echo "[INFO] Percent of Unique Visits:"
IFS=$'\n' && for ALEXA_VISITS in $(lynx -dump -force_html -nolist -accept_all_cookies -width=160 "http://www.alexa.com/siteinfo/$TARGET_HOST" | grep -A5 'Percent of Unique Visits' | sed -e 's/^[ \t]*//' | grep -o '[0-9].*\..*' | awk '{print$2,$3}' | column -t)
do
  echo [*] $ALEXA_VISITS
done

echo "[INFO] Total Sites Linking In:"
IFS=$'\n' && for ALEXA_LINKING in $(lynx -dump -force_html -nolist -accept_all_cookies -width=160 "http://www.alexa.com/siteinfo/$TARGET_HOST" | grep -A9 'Total Sites Linking In' | sed -e 's/^[ \t]*//' | grep -o '[0-9].*\..*' | awk '{print$2,$3}' | head -5 | column -t)
do
  echo [*] $ALEXA_LINKING
done

echo [INFO] Useful links related to $TARGET_HOST - $TARGET_IP:
echo "[*] https://www.virustotal.com/pt/ip-address/$TARGET_IP/information/" | tee -a URLs_$TARGET_HOST.txt
echo "[*] https://www.hybrid-analysis.com/search?host=$TARGET_IP" | tee -a URLs_$TARGET_HOST.txt
echo "[*] https://www.shodan.io/host/$TARGET_IP" | tee -a URLs_$TARGET_HOST.txt
echo "[*] https://www.senderbase.org/lookup/?search_string=$TARGET_IP" | tee -a URLs_$TARGET_HOST.txt
echo "[*] https://www.alienvault.com/open-threat-exchange/ip/$TARGET_IP" | tee -a URLs_$TARGET_HOST.txt
echo "[*] http://pastebin.com/search?q=$TARGET_IP" | tee -a URLs_$TARGET_HOST.txt
echo "[*] http://urlquery.net/search.php?q=$TARGET_IP" | tee -a URLs_$TARGET_HOST.txt
echo "[*] http://www.alexa.com/siteinfo/$TARGET_HOST" | tee -a URLs_$TARGET_HOST.txt
echo "[*] http://www.google.com/safebrowsing/diagnostic?site=$TARGET_HOST" | tee -a URLs_$TARGET_HOST.txt
echo "[*] https://censys.io/ipv4/$TARGET_IP" | tee -a URLs_$TARGET_HOST.txt
echo "[*] https://www.abuseipdb.com/check/$TARGET_IP" | tee -a URLs_$TARGET_HOST.txt
echo "[*] https://urlscan.io/search/#$TARGET_IP" | tee -a URLs_$TARGET_HOST.txt
echo "[*] https://github.com/search?q=$TARGET_IP&type=Code" | tee -a URLs_$TARGET_HOST.txt

if [[ $TARGET_IP_ASN != "" ]]; then
  echo [INFO] Useful links related to $TARGET_IP_ASN - $TARGET_IP_BGP:
  TARGET_IP_ASN_TEMP=$(echo $TARGET_IP_ASN | cut -c3-12)
  echo "[*] http://www.google.com/safebrowsing/diagnostic?site=AS:$TARGET_IP_ASN_TEMP" | tee -a URLs_$TARGET_HOST.txt
  echo "[*] https://www.senderbase.org/lookup/?search_string=$TARGET_IP_BGP" | tee -a URLs_$TARGET_HOST.txt
  echo "[*] http://bgp.he.net/$TARGET_IP_ASN" | tee -a URLs_$TARGET_HOST.txt
  echo "[*] https://stat.ripe.net/$TARGET_IP_ASN" | tee -a URLs_$TARGET_HOST.txt
fi

rm -f URLs_$TARGET_HOST.txt $TARGET_DOMAIN.xml URLsExternal$TARGET_HOST.txt

echo -e "`date +"%H:%M:%S %d/%m/%Y"`,$TARGET_IP,$TARGET" >> log.csv

date '+[INFO] Date: %d/%m/%y | Time: %H:%M:%S'
date_end=$(date +"%s")
difference=$(($date_end-$date_begin))
echo "[INFO] Total time: $(($difference / 60)) minute(s) and $(($difference % 60)) second(s)"

exit 0
fi
--------------------------------------------------------------------------------
/log.csv:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/eschultze/URLextractor/739864d0f8c6427c835a31e3360d491947fe406a/log.csv
--------------------------------------------------------------------------------
/requirements:
--------------------------------------------------------------------------------
curl
whois
libxml2-utils
bc
dnsutils
md5sha1sum
lynx
openssl
--------------------------------------------------------------------------------
/xml_fields:
--------------------------------------------------------------------------------
domain_age,Domain age
google_page_rank,Google page rank
alexa_rank,Alexa rank
hostname,Hostname
count,Blacklisted
--------------------------------------------------------------------------------
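How extractor.sh consumes these pairs, as a stand-alone sketch (assumes xmllint from libxml2-utils and a previously downloaded URLvoid response saved as `example.com.xml`, a placeholder name):

```bash
#!/bin/bash
# For each "xml_tag,label" pair, print the tag's value from the XML response.
while IFS=',' read -r TAG LABEL; do
  VALUE=$(xmllint --xpath "string(//$TAG)" example.com.xml)
  echo "[*] $LABEL: ${VALUE:-EMPTY}"
done < xml_fields
```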