├── Gemfile ├── README.md ├── domains.txt ├── keywords.txt └── spiderpig.rb /Gemfile: -------------------------------------------------------------------------------- 1 | #GEMS 2 | source "https://rubygems.org" 3 | 4 | gem 'anemone' 5 | gem 'yomu' 6 | gem 'trollop' 7 | gem 'colorize' 8 | gem 'luhn' 9 | gem 'ipaddress' 10 | gem 'exiftool_vendored' 11 | gem 'exiftool' -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | Spiderpig is a document metadata harvester first and foremost. It is intended for use by security professionals and pen testers. Please do not run it against sites without permission. 2 | Spiderpig actively spiders a site, downloads all documents and parses out useful data. You can also provide a domain instead of a full URL; Spiderpig will then brute force sub-domains via DNS before spidering each resolved name, downloading the files and harvesting their metadata. 3 | 4 | Most document metadata harvesters use search results to find documents. Spiderpig was created to provide an alternative to that. 5 | 6 | ### Basic usage 7 | 8 | **./spiderpig -u http://www.somewebsite.com** - Spiders the provided URL, downloads documents and prints out the document creator (potentially a username) and the software used to create the document. 9 | 10 | **./spiderpig -d somewebsite.com** - Performs sub-domain brute forcing, then spiders each resolved name. Currently the default sub-domain list is 'domains.txt', which is included with Spiderpig. This is a slightly modified 'small.txt' from dirb - https://sourceforge.net/projects/dirb/ 11 | 12 | **./spiderpig -d somewebsite.com -b mysubdomains.txt** - Specify your own subdomain text file for brute forcing. 
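
The sub-domain brute forcing described above can be sketched as follows. This is a minimal illustration and not Spiderpig's own code: `brute_force_subdomains` is a hypothetical helper, and the resolver is passed in as an argument (an assumption made here so the lookup can be swapped out). It only relies on `Resolv::DNS#getaddress` from Ruby's standard library, which raises `Resolv::ResolvError` for names that do not resolve.

```ruby
require 'resolv'

# Hypothetical helper (not part of Spiderpig): tries each word as a
# sub-domain label and returns { "label.domain" => "ip" } for the names
# that resolve. `resolver` is any object with a Resolv-style #getaddress.
def brute_force_subdomains(domain, words, resolver)
  words.each_with_object({}) do |word, found|
    label = word.strip
    next if label.empty?

    host = "#{label}.#{domain}"
    begin
      found[host] = resolver.getaddress(host).to_s
    rescue Resolv::ResolvError
      # The name did not resolve, so it is not recorded.
      next
    end
  end
end

# Example wiring (performs real DNS lookups, so it is left commented out):
# resolver = Resolv::DNS.new(nameserver: ['8.8.8.8'])
# hits = brute_force_subdomains('somewebsite.com', File.readlines('domains.txt'), resolver)
# hits.each { |host, ip| puts "#{host}\t#{ip}" }
```

Only names that actually resolve end up in the result, so the spider is not pointed at hosts that do not exist.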
13 | 14 | There are also options to obey robots.txt (or not), use a proxy server, specify the spidering depth, specify a user agent string and specify a DNS server: 15 | 16 | **-o, --obey-robots Should we obey robots.txt? Default is true (default: True)** 17 | 18 | **-e, --depth Spidering depth - Think before setting too large a value (default: 2)** 19 | 20 | **-s, --user-agent Enter your own user agent string in double quotes! 21 | (default: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:40.0) Gecko/20100101 Firefox/40.1)** 22 | 23 | **-n, --dns-server Provide a custom DNS server to use for subdomain lookups - Google resolver1 is the default (default: 8.8.8.8)** 24 | 25 | **-p, --proxy Specify a proxy server** 26 | 27 | **-r, --proxyp Specify a proxy port** 28 | 29 | **-x, --exif Downloads image files and parses them for Exif GeoTags, camera make and model** 30 | **-f, --offline Harvest files you already have** 31 | 32 | Note - Exif mode is currently 'standalone', i.e. it is not to be used in conjunction with other options. Example: 33 | ./spiderpig -u http://somewebsite.com --exif 34 | Exif mode also writes a gmaps.csv file that can be imported into Google Maps, so you can see a map of where the geotagged images were taken, as below: 35 | 36 | ![Google Maps](https://cloud.githubusercontent.com/assets/5301488/14234060/0d51ac24-f9d1-11e5-873c-b5d7a4c121a3.png) 37 | 38 | 39 | 40 | ### Dirtmode 41 | 42 | Dirtmode is where things get a little more interesting. It is designed to find 'dirt' on your target organisation. 43 | Currently, Dirtmode will pull out the following information from all downloaded documents: 44 | 45 | - Email addresses 46 | - Credit Card Numbers (Luhn/Mod10 validated) 47 | - IP Addresses 48 | - Keywords - See keywords.txt and add your own. This functionality is designed to find information that shouldn't be in the public domain, for example passwords in documents, references to internal systems, administrative protocols etc. 
The keyword list can be edited to find whatever you like. Feel free to make a request and I will endeavour to add it. 49 | 50 | When running Dirtmode, you can also generate a wordlist. This simply builds a flat text file of all words seen in all documents. It can be useful in two ways: 1) as a file for sub-domain brute forcing and 2) as a password list for remote password attacks or hash cracking. Example usage: 51 | 52 | **./spiderpig -u http://www.somewebsite.com --dirtmode --passlist** 53 | This will drop a 'passlist.txt' into the datestamped directory that contains all downloaded documents. 54 | 55 | ### Installation 56 | Just run 'bundle install' from the cloned directory. 57 | If you don't have bundler installed, 'sudo gem install bundler' should do it. 58 | 59 | ### Notes/Known issues 60 | If you run into issues, comment out '**$stderr.reopen("/dev/null", "w")**' on line 30 of spiderpig.rb. This will send errors back to your console. 61 | If you get an error about not being able to create a listener, kill java and try again. This is because the Yomu metadata module uses Apache Tika (Java) to get data. It spawns a local server for faster processing, which often does not die correctly and holds onto the port it bound to. 62 | 63 | Tested on OSX El Capitan with ruby 2.2.0p0 (2014-12-25 revision 49005) [x86_64-darwin14] and Kali 2.0 Rolling. Should work on any system that can run Ruby. 
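
The Luhn/Mod10 validation that Dirtmode applies to candidate credit card numbers can be sketched in plain Ruby. Spiderpig itself delegates this check to the `luhn` gem; the hand-written `luhn_valid?` and `card_candidates` below are hypothetical stand-ins for illustration only, and the 13 to 16 digit length limit is an assumption taken from the candidate regex in spiderpig.rb.

```ruby
# Hypothetical stand-in for the luhn gem: true when the digits of `number`
# (spaces and dashes ignored) pass the Luhn/Mod10 checksum.
def luhn_valid?(number)
  digits = number.to_s.gsub(/[ -]/, '')
  return false unless digits =~ /\A\d{13,16}\z/

  # Walk the digits right to left, doubling every second one and
  # subtracting 9 whenever the doubled value exceeds 9.
  values = digits.reverse.chars.each_with_index.map do |char, index|
    value = char.to_i
    next value if index.even?

    value * 2 > 9 ? value * 2 - 9 : value * 2
  end
  (values.sum % 10).zero?
end

# Hypothetical helper: pulls 13-16 digit runs (optionally separated by
# spaces or dashes) out of free text and keeps those that pass the check.
def card_candidates(text)
  text.scan(/\b(?:\d[ -]*?){13,16}\b/)
      .map { |match| match.gsub(/[ -]/, '') }
      .uniq
      .select { |number| luhn_valid?(number) }
end
```

The checksum only filters out random digit runs such as order references; a number that passes is still just a candidate and needs manual review.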
64 | -------------------------------------------------------------------------------- /domains.txt: -------------------------------------------------------------------------------- 1 | Admin 2 | Administration 3 | CVS 4 | CYBERDOCS 5 | CYBERDOCS25 6 | CYBERDOCS31 7 | INSTALL_admin 8 | Log 9 | Logs 10 | Pages 11 | Servlet 12 | Servlets 13 | SiteServer 14 | Sources 15 | Statistics 16 | Stats 17 | W3SVC 18 | W3SVC1 19 | W3SVC2 20 | W3SVC3 21 | WEB-INF 22 | _admin 23 | _pages 24 | a 25 | aa 26 | aaa 27 | abc 28 | about 29 | academic 30 | access 31 | accessgranted 32 | account 33 | accounting 34 | action 35 | actions 36 | active 37 | adm 38 | admin 39 | admin_ 40 | admin_login 41 | admin_logon 42 | administrat 43 | administration 44 | administrator 45 | adminlogin 46 | adminlogon 47 | adminsql 48 | admon 49 | adsl 50 | agent 51 | agents 52 | alias 53 | aliases 54 | all 55 | alpha 56 | analog 57 | analyse 58 | announcements 59 | answer 60 | any 61 | apache 62 | api 63 | app 64 | applet 65 | applets 66 | appliance 67 | application 68 | applications 69 | apps 70 | archive 71 | archives 72 | arrow 73 | asp 74 | aspadmin 75 | assets 76 | attach 77 | attachments 78 | audit 79 | auth 80 | auto 81 | automatic 82 | b 83 | back 84 | back-up 85 | backdoor 86 | backend 87 | backoffice 88 | backup 89 | backups 90 | bak 91 | bak-up 92 | bakup 93 | bank 94 | banks 95 | banner 96 | banners 97 | base 98 | basic 99 | bass 100 | batch 101 | bd 102 | bdata 103 | bea 104 | bean 105 | beans 106 | beta 107 | bill 108 | billing 109 | bin 110 | binaries 111 | biz 112 | blog 113 | blow 114 | board 115 | boards 116 | body 117 | boot 118 | bot 119 | bots 120 | box 121 | boxes 122 | broken 123 | bsd 124 | bug 125 | bugs 126 | build 127 | builder 128 | bulk 129 | buttons 130 | c 131 | cache 132 | cachemgr 133 | cad 134 | can 135 | captcha 136 | car 137 | card 138 | cardinal 139 | cards 140 | carpet 141 | cart 142 | cas 143 | cat 144 | catalog 145 | catalogs 146 | catch 147 | cc 148 | ccs 149 | cert 
150 | certenroll 151 | certificate 152 | certificates 153 | certs 154 | cfdocs 155 | cfg 156 | cgi 157 | cgi-bin 158 | cgi-bin/ 159 | cgi-win 160 | cgibin 161 | chan 162 | change 163 | changepw 164 | channel 165 | chart 166 | chat 167 | class 168 | classes 169 | classic 170 | classified 171 | classifieds 172 | client 173 | clients 174 | cluster 175 | cm 176 | cmd 177 | code 178 | coffee 179 | coke 180 | command 181 | commerce 182 | commercial 183 | common 184 | component 185 | compose 186 | composer 187 | compressed 188 | comunicator 189 | con 190 | config 191 | configs 192 | configuration 193 | configure 194 | connect 195 | connections 196 | console 197 | constant 198 | constants 199 | contact 200 | contacts 201 | content 202 | contents 203 | control 204 | controller 205 | controlpanel 206 | controls 207 | corba 208 | core 209 | corporate 210 | count 211 | counter 212 | cpanel 213 | create 214 | creation 215 | credit 216 | creditcards 217 | cron 218 | crs 219 | css 220 | customer 221 | customers 222 | cv 223 | cvs 224 | d 225 | daemon 226 | dat 227 | data 228 | database 229 | databases 230 | dav 231 | db 232 | dba 233 | dbase 234 | dbm 235 | dbms 236 | debug 237 | default 238 | delete 239 | deletion 240 | demo 241 | demos 242 | deny 243 | deploy 244 | deployment 245 | design 246 | details 247 | dev 248 | dev60cgi 249 | devel 250 | develop 251 | developement 252 | developers 253 | development 254 | device 255 | devices 256 | devs 257 | diag 258 | dial 259 | dig 260 | dir 261 | directory 262 | discovery 263 | disk 264 | dispatch 265 | dispatcher 266 | dms 267 | dns 268 | doc 269 | docs 270 | docs41 271 | docs51 272 | document 273 | documents 274 | down 275 | download 276 | downloads 277 | draft 278 | dragon 279 | dratfs 280 | driver 281 | dump 282 | dumpenv 283 | e 284 | easy 285 | ebriefs 286 | echannel 287 | ecommerce 288 | edit 289 | editor 290 | element 291 | elements 292 | employees 293 | en 294 | eng 295 | engine 296 | english 297 | enterprise 298 | env 299 | 
environ 300 | environment 301 | error 302 | errors 303 | es 304 | esales 305 | esp 306 | established 307 | esupport 308 | etc 309 | event 310 | events 311 | example 312 | examples 313 | exchange 314 | exe 315 | exec 316 | executable 317 | executables 318 | explorer 319 | export 320 | external 321 | extra 322 | Extranet 323 | extranet 324 | fail 325 | failed 326 | fcgi-bin 327 | feedback 328 | field 329 | file 330 | files 331 | filter 332 | firewall 333 | first 334 | flash 335 | folder 336 | foo 337 | forget 338 | forgot 339 | forgotten 340 | form 341 | format 342 | formhandler 343 | formsend 344 | formupdate 345 | fortune 346 | forum 347 | forums 348 | frame 349 | framework 350 | ftp 351 | fun 352 | function 353 | functions 354 | games 355 | gate 356 | generic 357 | gest 358 | get 359 | global 360 | globalnav 361 | globals 362 | gone 363 | gp 364 | gpapp 365 | granted 366 | graphics 367 | group 368 | groups 369 | guest 370 | guestbook 371 | guests 372 | hack 373 | hacker 374 | handler 375 | hanlder 376 | happening 377 | head 378 | header 379 | headers 380 | hello 381 | helloworld 382 | help 383 | hidden 384 | hide 385 | history 386 | hits 387 | home 388 | homepage 389 | homes 390 | homework 391 | host 392 | hosts 393 | htdocs 394 | htm 395 | html 396 | htmls 397 | ibm 398 | icons 399 | idbc 400 | iis 401 | images 402 | img 403 | import 404 | inbox 405 | inc 406 | include 407 | includes 408 | incoming 409 | incs 410 | index 411 | index2 412 | index_adm 413 | index_admin 414 | indexes 415 | info 416 | information 417 | ingres 418 | ingress 419 | ini 420 | init 421 | input 422 | install 423 | installation 424 | interactive 425 | internal 426 | internet 427 | intranet 428 | intro 429 | inventory 430 | invitation 431 | invite 432 | ipp 433 | ips 434 | j 435 | java 436 | java-sys 437 | javascript 438 | jdbc 439 | job 440 | join 441 | jrun 442 | js 443 | jsp 444 | jsps 445 | jsr 446 | keep 447 | kept 448 | kernel 449 | key 450 | lab 451 | labs 452 | launch 453 | 
launchpage 454 | ldap 455 | left 456 | level 457 | lib 458 | libraries 459 | library 460 | libs 461 | link 462 | links 463 | linux 464 | list 465 | load 466 | loader 467 | lock 468 | lockout 469 | log 470 | logfile 471 | logfiles 472 | logger 473 | logging 474 | login 475 | logo 476 | logon 477 | logout 478 | logs 479 | lost%2Bfound 480 | ls 481 | magic 482 | maillist 483 | main 484 | maint 485 | makefile 486 | man 487 | manage 488 | management 489 | manager 490 | manual 491 | map 492 | market 493 | marketing 494 | master 495 | mbo 496 | mdb 497 | me 498 | member 499 | members 500 | memory 501 | menu 502 | message 503 | messages 504 | messaging 505 | meta 506 | metabase 507 | mgr 508 | mine 509 | minimum 510 | mirror 511 | mirrors 512 | misc 513 | mkstats 514 | model 515 | modem 516 | module 517 | modules 518 | monitor 519 | mount 520 | mp3 521 | mp3s 522 | mqseries 523 | mrtg 524 | ms 525 | ms-sql 526 | msql 527 | mssql 528 | music 529 | my 530 | my-sql 531 | mysql 532 | names 533 | navigation 534 | ne 535 | net 536 | netscape 537 | netstat 538 | network 539 | new 540 | news 541 | next 542 | nl 543 | nobody 544 | notes 545 | novell 546 | nul 547 | null 548 | number 549 | object 550 | objects 551 | odbc 552 | of 553 | off 554 | office 555 | ogl 556 | old 557 | oldie 558 | on 559 | online 560 | open 561 | openapp 562 | openfile 563 | operator 564 | oracle 565 | oradata 566 | order 567 | orders 568 | outgoing 569 | output 570 | pad 571 | page 572 | pages 573 | pam 574 | panel 575 | paper 576 | papers 577 | pass 578 | passes 579 | passw 580 | passwd 581 | passwor 582 | password 583 | passwords 584 | path 585 | pdf 586 | perl 587 | perl5 588 | personal 589 | personals 590 | pgsql 591 | phone 592 | php 593 | phpMyAdmin 594 | phpmyadmin 595 | pics 596 | ping 597 | pix 598 | pl 599 | pls 600 | plx 601 | pol 602 | policy 603 | poll 604 | pop 605 | portal 606 | portlet 607 | portlets 608 | post 609 | postgres 610 | power 611 | press 612 | preview 613 | print 614 | printenv 
615 | priv 616 | private 617 | privs 618 | process 619 | processform 620 | prod 621 | production 622 | products 623 | professor 624 | profile 625 | program 626 | project 627 | proof 628 | properties 629 | protect 630 | protected 631 | proxy 632 | ps 633 | pub 634 | public 635 | publish 636 | publisher 637 | purchase 638 | purchases 639 | put 640 | pw 641 | pwd 642 | python 643 | query 644 | queue 645 | quote 646 | ramon 647 | random 648 | rank 649 | rcs 650 | readme 651 | redir 652 | redirect 653 | reference 654 | references 655 | reg 656 | reginternal 657 | regional 658 | register 659 | registered 660 | release 661 | remind 662 | reminder 663 | remote 664 | removed 665 | report 666 | reports 667 | requisite 668 | research 669 | reseller 670 | resource 671 | resources 672 | responder 673 | restricted 674 | retail 675 | right 676 | robot 677 | robotics 678 | root 679 | route 680 | router 681 | rpc 682 | rss 683 | rules 684 | run 685 | sales 686 | sample 687 | samples 688 | save 689 | saved 690 | schema 691 | scr 692 | scratc 693 | script 694 | scripts 695 | sdk 696 | search 697 | secret 698 | secrets 699 | section 700 | sections 701 | secure 702 | secured 703 | security 704 | select 705 | sell 706 | send 707 | sendmail 708 | sensepost 709 | sensor 710 | sent 711 | server 712 | server_stats 713 | servers 714 | service 715 | services 716 | servlet 717 | servlets 718 | session 719 | sessions 720 | set 721 | setting 722 | settings 723 | setup 724 | share 725 | shared 726 | shell 727 | shit 728 | shop 729 | shopper 730 | show 731 | showcode 732 | shtml 733 | sign 734 | signature 735 | signin 736 | simple 737 | single 738 | site 739 | sitemap 740 | sites 741 | small 742 | snoop 743 | soap 744 | soapdocs 745 | software 746 | solaris 747 | solutions 748 | somebody 749 | source 750 | sources 751 | spain 752 | spanish 753 | sql 754 | sqladmin 755 | src 756 | srchad 757 | srv 758 | ssi 759 | ssl 760 | staff 761 | start 762 | startpage 763 | stat 764 | statistic 765 | 
statistics 766 | stats 767 | status 768 | stop 769 | store 770 | story 771 | string 772 | student 773 | stuff 774 | style 775 | stylesheet 776 | stylesheets 777 | submit 778 | submitter 779 | sun 780 | super 781 | support 782 | supported 783 | survey 784 | svc 785 | svn 786 | svr 787 | sw 788 | sys 789 | sysadmin 790 | system 791 | table 792 | tag 793 | tape 794 | tar 795 | target 796 | tech 797 | temp 798 | template 799 | templates 800 | temporal 801 | temps 802 | terminal 803 | test 804 | testing 805 | tests 806 | text 807 | texts 808 | ticket 809 | tmp 810 | today 811 | tool 812 | toolbar 813 | tools 814 | top 815 | topics 816 | tour 817 | tpv 818 | trace 819 | traffic 820 | transactions 821 | transfer 822 | transport 823 | trap 824 | trash 825 | tree 826 | trees 827 | tutorial 828 | uddi 829 | uninstall 830 | unix 831 | up 832 | update 833 | updates 834 | upload 835 | uploader 836 | uploads 837 | usage 838 | user 839 | users 840 | usr 841 | ustats 842 | util 843 | utilities 844 | utility 845 | utils 846 | validation 847 | validatior 848 | vap 849 | var 850 | vb 851 | vbs 852 | vbscript 853 | vbscripts 854 | vfs 855 | view 856 | viewer 857 | views 858 | virtual 859 | visitor 860 | vpn 861 | w 862 | w3 863 | w3c 864 | warez 865 | wdav 866 | web 867 | webaccess 868 | webadmin 869 | webapp 870 | webboard 871 | webcart 872 | webdata 873 | webdav 874 | webdist 875 | webhits 876 | weblog 877 | weblogic 878 | weblogs 879 | webmail 880 | webmaster 881 | websearch 882 | website 883 | webstat 884 | webstats 885 | webvpn 886 | welcome 887 | wellcome 888 | whatever 889 | whatnot 890 | whois 891 | will 892 | win 893 | windows 894 | word 895 | wordpress 896 | work 897 | workplace 898 | workshop 899 | wp 900 | wstats 901 | wusage 902 | www 903 | wwwboard 904 | wwwjoin 905 | wwwlog 906 | wwwstats 907 | xcache 908 | xfer 909 | xml 910 | xmlrpc 911 | xsl 912 | xsql 913 | xyz 914 | zap 915 | zip 916 | zipfiles 917 | zips 918 | ~adm 919 | ~admin 920 | ~administrator 921 | ~bin 922 
| ~ftp 923 | ~guest 924 | ~mail 925 | ~operator 926 | ~root 927 | ~sys 928 | ~sysadm 929 | ~sysadmin 930 | ~test 931 | ~user 932 | ~webmaster 933 | ~www -------------------------------------------------------------------------------- /keywords.txt: -------------------------------------------------------------------------------- 1 | vpn 2 | rdp 3 | ssh 4 | telnet 5 | citrix 6 | pin 7 | code 8 | password 9 | user 10 | username 11 | pass 12 | login 13 | datacenter 14 | datacentre 15 | /p 16 | /u 17 | :u 18 | :p 19 | -u 20 | -p 21 | --user 22 | --pass 23 | --password 24 | sysvol 25 | script 26 | domain 27 | workgroup 28 | c: 29 | d: 30 | database 31 | sql 32 | oracle 33 | su - 34 | sudo 35 | useradd 36 | usermod 37 | userdel 38 | passwd 39 | chmod 40 | chown 41 | /etc/shadow 42 | /etc/passwd 43 | system32 44 | winnt -------------------------------------------------------------------------------- /spiderpig.rb: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env ruby 2 | #test 3 | #https://github.com/hatlord/Spiderpig 4 | 5 | require 'anemone' 6 | require 'yomu' 7 | require 'resolv' 8 | require 'trollop' 9 | require 'colorize' 10 | require 'luhn' 11 | require 'ipaddress' 12 | require 'exiftool' 13 | require 'exiftool_vendored' 14 | 15 | banner = <<-OINK 16 | ┈┈┏━╮╭━┓┈╭━━━━━━━━━━━━━━━━╮ 17 | ┈┈┃┏┗┛┓┃╭┫SpiderPig v0.97b┃ 18 | ┈┈╰┓▋▋┏╯╯╰━━━━━━━━━━━━━━━━╯ 19 | ┈╭━┻╮╲┗━━━━╮╭╮┈ 20 | ┈┃▎▎┃╲╲╲╲╲╲┣━╯┈ 21 | ┈╰━┳┻▅╯╲╲╲╲┃┈┈┈ 22 | ┈┈┈╰━┳┓┏┳┓┏╯┈┈┈ 23 | ┈┈┈┈┈┗┻┛┗┻┛┈┈┈┈ 24 | OINK 25 | puts banner.light_magenta 26 | 27 | 28 | @foldername = Time.now.strftime("%d%b%Y_%H%M%S") 29 | Dir.mkdir @foldername 30 | $stderr.reopen("/dev/null", "w") 31 | 32 | def arguments 33 | 34 | opts = Trollop::options do 35 | version "Spiderpig v0.97beta".light_blue 36 | banner <<-EOS 37 | 38 | Spiderpig is a document metadata harvester that relies on active spidering to find its documents. 
This is to 39 | provide an alternative to harvesters that use search results to identify documents. It requires either a full URL 40 | or a domain name. If you provide a domain name, it will do sub-domain brute forcing and then spider each site it finds. 41 | You can either use the default sub-domains file, or specify your own with a full path to that file. 42 | Examples: 43 | Test a URL: 44 | ./spiderpig.rb -u http://www.website.com 45 | Test a domain - This will do sub-domain enumeration with the default subs file: 46 | ./spiderpig.rb -d website.com 47 | Test domain with your own sub-domain file: 48 | ./spiderpig.rb -d website.com -b mysubsfile.txt 49 | 50 | EOS 51 | 52 | opt :url, "Choose a specific site to spider - Ensure you include http:// etc.", :type => String 53 | opt :domain, "Choose a domain. We will perform sub domain brute forcing, then spider the results", :type => String 54 | opt :obey_robots, "Should we obey robots.txt? Default is true", :default => "True" 55 | opt :depth, "Spidering depth - Think before setting too large a value", :default => 2 56 | opt :user_agent, "Enter your own user agent string in double quotes!", :default => "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:40.0) Gecko/20100101 Firefox/40.1" 57 | opt :subdomains, "subs", :default => "domains.txt" 58 | opt :dns_server, "Provide a custom DNS server to use for subdomain lookups - Google resolver1 is the default", :default => "8.8.8.8" 59 | opt :proxy, "Specify a proxy server", :default => nil 60 | opt :proxyp, "Specify a proxy port", :default => nil 61 | opt :dirtmode, "Dig within documents for sensitive data. Currently IPs, credit card numbers, emails" 62 | opt :passlist, "Builds a wordlist from the content of all downloaded documents. Must be used with --dirtmode" 63 | opt :exif, "Downloads image files and parses them for Exif GeoTags" 64 | opt :offline, "Harvest Files You Already Have" 65 | 66 | if ARGV.empty? 67 | puts "Need Help? 
Try ./spiderpig --help or -h" 68 | exit 69 | end 70 | end 71 | opts 72 | end 73 | 74 | def subdomains(arg) 75 | subs = [] 76 | 77 | if arg[:domain] 78 | subs << "http://#{arg[:domain]}" 79 | end 80 | if arg[:url] 81 | subs << arg[:url] 82 | end 83 | 84 | target = arg[:domain] 85 | if arg[:domain] 86 | puts "Subdomain enumeration for #{target} beginning at #{Time.now.strftime("%H:%M:%S")}" 87 | resolver = Resolv::DNS.new(:nameserver => [arg[:dns_server]]) 88 | File.foreach(arg[:subdomains]) do |subdomain| 89 | # Resolv::DNS#getaddress raises Resolv::ResolvError when a name does not resolve 90 | subdomain.chomp! 91 | ip = resolver.getaddress("#{subdomain}.#{target}").to_s rescue nil 92 | if ip != nil 93 | puts "#{subdomain}.#{target} \t #{ip}" 94 | subs << "http://#{subdomain}.#{target}" 95 | end 96 | end 97 | end 98 | subs 99 | end 100 | 101 | def download(arg, subdomains) 102 | 103 | if !arg[:exif] 104 | doc = /\.(pdf|docx?|xlsx?|pages)$/i 105 | else 106 | doc = /\.(jpg|tiff)$/i 107 | end 108 | if arg[:url] 109 | puts "\nSearching For Files on #{arg[:url]}".red 110 | end 111 | if arg[:domain] 112 | puts "\nSearching For Files on #{arg[:domain]} subdomains".red 113 | end 114 | 115 | puts "Downloading Files:\n".red 116 | subdomains.each do |subs| 117 | Anemone.crawl( 118 | subs, 119 | :depth_limit => arg[:depth], 120 | :obey_robots_txt => arg[:obey_robots], 121 | :user_agent => arg[:user_agent], 122 | :proxy_host => arg[:proxy], 123 | :proxy_port => arg[:proxyp], 124 | :accept_cookies => true, 125 | :skip_query_strings => true, 126 | :verbose => false 127 | ) do |anemone| 128 | anemone.on_pages_like(doc) do |page| 129 | begin 130 | filename = File.basename(page.url.request_uri.to_s) 131 | File.open("#{@foldername}/#{filename}","wb") {|f| f.write(page.body)} 132 | puts "#{page.url}" 133 | File.open('/tmp/links.txt', 'a+') {|f| f.puts(page.url)} 134 | rescue 135 | puts "error while downloading #{page.url}" 136 | end 137 | end 138 | end 139 | end 140 | end 141 | 142 | def metadata(files, arg) 143 | metadata = [] 144 | port = 
rand(54000..59000) 145 | 146 | if !files.empty? and !arg[:exif] 147 | Yomu.server(:metadata, custom_port = port) 148 | puts "\nReading MetaData From Files - This may take some time!\n".red 149 | files.each do |file| 150 | puts "Processing #{file}".green 151 | metadata << Yomu.new(file).metadata 152 | end 153 | Yomu.kill_server! 154 | end 155 | metadata 156 | end 157 | 158 | def filecontent(files, arg) 159 | alltext = [] 160 | port = rand(60000..65000) 161 | 162 | if arg[:dirtmode] 163 | if !files.empty? and !arg[:exif] 164 | Yomu.server(:text, custom_port = port) 165 | files.each do |file| 166 | hash = {} 167 | hash[file] = Yomu.new(file).text 168 | alltext << hash 169 | end 170 | Yomu.kill_server! 171 | end 172 | alltext 173 | end 174 | end 175 | 176 | def emails(content, arg) 177 | email_regex = /[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}/i 178 | 179 | if arg[:dirtmode] 180 | puts "\nEmail addresses found in documents:".light_blue 181 | 182 | content.each do |z| 183 | z.each do |k, v| 184 | email = v.to_s.scan email_regex 185 | email.uniq! 186 | if !email.empty? 187 | puts "\n" + k.to_s + "\n" + email.join("\n").cyan 188 | end 189 | end 190 | end 191 | end 192 | end 193 | 194 | def ipaddr(content, arg) 195 | ip_regex = /\d+\.\d+\.\d+\.\d+/ 196 | 197 | if arg[:dirtmode] 198 | puts "\nPossible IP Addresses found in documents:".light_blue 199 | 200 | content.each do |z| 201 | z.each do |k, v| 202 | ip = v.to_s.scan ip_regex 203 | ip.uniq! 204 | ip.keep_if { |ip| IPAddress.valid? ip } 205 | if !ip.empty? 206 | puts "\n" + k.to_s + "\n" + ip.join("\n").cyan 207 | end 208 | end 209 | end 210 | end 211 | end 212 | 213 | def cc(content, arg) 214 | cc_regex = /\b(?:\d[ -]*?){13,16}\b/ 215 | 216 | if arg[:dirtmode] 217 | puts "\nPossible Credit Card Number Found!!".light_blue 218 | 219 | content.each do |z| 220 | z.each do |k, v| 221 | cc = v.to_s.scan cc_regex 222 | cc.uniq! 223 | cc.each { |card| card.gsub!(/-| /, "") } 224 | cc.keep_if { |card| Luhn.valid? 
card } 225 | if !cc.empty? 226 | puts "\n" + k.to_s + "\n" + cc.join("\n").cyan 227 | end 228 | end 229 | end 230 | end 231 | end 232 | 233 | def passlist(content, arg) 234 | if arg[:passlist] and content 235 | puts "\npasslist.txt generated".light_blue 236 | 237 | content.each do |z| 238 | z.each do |k, v| 239 | words = v.to_s.scan(/\w+/) 240 | words.uniq! 241 | if !words.empty? 242 | out_file = File.new("#{@foldername}/passlist.txt", "a") # append, so words from every document are kept 243 | out_file.puts words.join("\n") # plain text only, no colour codes in the file 244 | out_file.close 245 | end 246 | end 247 | end 248 | end 249 | end 250 | 251 | def keywords(content, arg) 252 | if arg[:dirtmode] 253 | wordarr = [] 254 | file = File.open("keywords.txt") do |f| 255 | f.each_line do |line| 256 | wordarr << line.chomp unless line.strip.empty? 257 | end 258 | puts "\nPotentially Sensitive Data In Document(s)".light_blue 259 | 260 | content.each do |z| 261 | z.each do |k, v| 262 | wordarr.each do |w| 263 | keywords = v.to_s.scan(/#{Regexp.escape(w)}/i) 264 | keywords.uniq! 265 | if !keywords.empty? 266 | printf "%-50s %s\n", k, keywords.join(", ").cyan 267 | end 268 | end 269 | end 270 | end 271 | end 272 | end 273 | end 274 | 275 | def exif(files, arg) 276 | exifarray = [] 277 | exifarray << "ImageName,Latitude,Longitude" 278 | 279 | if arg[:exif] 280 | if !files.empty? 
281 | files.each do |z| 282 | exif = Exiftool.new(z) 283 | exifh = exif.to_hash 284 | if exifh.has_key?(:gps_latitude) 285 | exifarray << "#{exifh[:file_name]},#{exifh[:gps_latitude]},#{exifh[:gps_longitude]}" 286 | puts "#{"\n" + exifh[:file_name].light_blue + "\n" + exifh[:gps_latitude].to_s.yellow + " " + exifh[:gps_longitude].to_s.magenta}" 287 | else 288 | puts "#{"\n" + exifh[:file_name]} ***Image did not contain GPS data***".red 289 | end 290 | if exifh.has_key?(:make) 291 | puts "#{exifh[:make] + ' ' + exifh[:model] + ' ' + exifh[:software].to_s}".green 292 | end 293 | end 294 | gmaps = File.new("#{@foldername}/gmaps.csv", "w") 295 | gmaps.puts exifarray 296 | gmaps.close 297 | puts "\nGoogle Maps Compatible CSV Written to #{@foldername}/gmaps.csv".light_blue 298 | end 299 | end 300 | end 301 | 302 | def printer(meta, arg) 303 | if meta != nil and !arg[:exif] 304 | puts "\nPotential Usernames (Document Creator)".light_blue 305 | puts meta.map { |h| h["Author"] }.compact.reject(&:empty?).uniq 306 | puts "\nSoftware Used to Create Documents".light_blue 307 | puts meta.map { |h| h["producer"] }.compact.reject(&:empty?).uniq 308 | puts meta.map { |h| h["Application-Name"] }.compact.reject(&:empty?).uniq 309 | end 310 | end 311 | 312 | arg = arguments 313 | subdomains = subdomains(arg) 314 | download(arg, subdomains) 315 | numfiles = Dir["#{@foldername}/*"].length 316 | puts "Number of Files Downloaded: #{numfiles}".light_blue 317 | if arg[:offline] 318 | files = Dir.glob(ARGV[0] + '/*.*') 319 | else 320 | files = Dir.glob("#{@foldername}/*") 321 | end 322 | meta = metadata(files, arg) 323 | content = filecontent(files, arg) 324 | emails(content, arg) 325 | ipaddr(content, arg) 326 | cc(content, arg) 327 | keywords(content, arg) 328 | exif(files, arg) 329 | printer(meta, arg) 330 | passlist(content, arg) --------------------------------------------------------------------------------
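
The wordlist step performed by `passlist` can be sketched without any file I/O. This is an illustration under stated assumptions, not the script's own code: `build_wordlist` is a hypothetical helper, and its input mirrors the shape that `filecontent` returns, namely an array of `{ filename => text }` hashes.

```ruby
# Hypothetical helper: collects every word (\w+ run) seen across all
# documents into one de-duplicated list, in first-seen order.
# `content` is assumed to be an array of { filename => text } hashes.
def build_wordlist(content)
  content.flat_map { |doc| doc.values }
         .flat_map { |text| text.to_s.scan(/\w+/) }
         .uniq
end

# Example: write the list next to the downloaded files.
# File.write("passlist.txt", build_wordlist(content).join("\n") + "\n")
```

Building the full list first and writing it once keeps words from every document and avoids duplicates across files.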