├── website
    ├── .gitignore
    ├── static
    │   ├── favicon.ico
    │   ├── img
    │   │   └── devdungeon_logo.png
    │   └── css
    │   │   └── global.css
    ├── templates
    │   ├── premium.tmpl
    │   ├── search_results.tmpl
    │   ├── domain_listing.tmpl
    │   ├── view_domain.tmpl
    │   ├── index.tmpl
    │   └── layout.tmpl
    └── website.go
├── .gitignore
├── screenshots
    ├── website.png
    └── worker_http.png
├── systemctl
    └── webgenome.service
├── core
    └── core.go
├── README.md
├── worker_http
    └── worker_http.go
└── LICENSE.txt

/website/.gitignore:
--------------------------------------------------------------------------------
website.log

--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
*.exe
.dropbox.attr
.idea
*~

--------------------------------------------------------------------------------
/screenshots/website.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/DevDungeon/WebGenome/HEAD/screenshots/website.png

--------------------------------------------------------------------------------
/website/static/favicon.ico:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/DevDungeon/WebGenome/HEAD/website/static/favicon.ico

--------------------------------------------------------------------------------
/screenshots/worker_http.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/DevDungeon/WebGenome/HEAD/screenshots/worker_http.png

--------------------------------------------------------------------------------
/website/static/img/devdungeon_logo.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/DevDungeon/WebGenome/HEAD/website/static/img/devdungeon_logo.png

--------------------------------------------------------------------------------
/website/templates/premium.tmpl:
--------------------------------------------------------------------------------
Current Page: {{ .pageNumber }}

--------------------------------------------------------------------------------
/systemctl/webgenome.service:
--------------------------------------------------------------------------------
[Unit]
Description=Webgenome Webserver
After=network.target

[Service]
Type=simple
User=webgenome
WorkingDirectory=/home/webgenome/go/src/github.com/DevDungeon/WebGenome/website
ExecStart=/home/webgenome/go/bin/website
Restart=always

[Install]
WantedBy=multi-user.target

--------------------------------------------------------------------------------
/website/templates/domain_listing.tmpl:
--------------------------------------------------------------------------------
Current Page: {{ .pageNumber }}

--------------------------------------------------------------------------------
/core/core.go:
--------------------------------------------------------------------------------
package core

import (
	"time"

	"gopkg.in/mgo.v2/bson"
)

type Header struct {
	Key   string
	Value string
}

type Domain struct {
	Id           bson.ObjectId `bson:"_id,omitempty"`
	Name         string
	ParentDomain bson.ObjectId `bson:",omitempty"`
	Skipped      bool          `bson:",omitempty"`
	LastChecked  time.Time     `bson:",omitempty"`
	Headers      []Header      `bson:",omitempty"`
}

--------------------------------------------------------------------------------
/website/templates/view_domain.tmpl:
--------------------------------------------------------------------------------
Random Domain
| {{.Key}} | {{.Value}} |
|---|---|

| Last Checked | {{.domain.LastChecked}} |
|---|---|
| Skipped | {{.domain.Skipped}} |
Any domain found by this crawler was found by following a publicly available hyperlink on the internet. The web crawler was seeded with a single domain, devdungeon.com. This list explains how the crawler ended up on this domain when it only followed visible links starting from a single page.
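
A minimal sketch of how such a discovery path could be rebuilt from the `ParentDomain` field of `core.Domain`. It is not taken from `website.go`: the `tracePath` helper, the in-memory map, and the plain string IDs (standing in for `bson.ObjectId` so that the example runs without MongoDB) are all assumptions made for illustration.

```go
package main

import "fmt"

// Domain is a simplified, hypothetical stand-in for core.Domain.
// Plain strings replace bson.ObjectId so the sketch runs without MongoDB.
type Domain struct {
	ID           string
	Name         string
	ParentDomain string // empty for the seed domain
}

// tracePath follows ParentDomain references from start back to the seed
// and returns the domain names in discovery order (seed first).
func tracePath(byID map[string]Domain, start string) []string {
	var path []string
	seen := make(map[string]bool) // guards against cycles in bad data
	for id := start; id != "" && !seen[id]; {
		d, ok := byID[id]
		if !ok {
			break // parent record missing; return what we have
		}
		seen[id] = true
		path = append([]string{d.Name}, path...) // prepend so the seed ends up first
		id = d.ParentDomain
	}
	return path
}

func main() {
	// Hypothetical records; only devdungeon.com comes from the source text.
	domains := map[string]Domain{
		"1": {ID: "1", Name: "devdungeon.com"},
		"2": {ID: "2", Name: "example.org", ParentDomain: "1"},
		"3": {ID: "3", Name: "example.net", ParentDomain: "2"},
	}
	fmt.Println(tracePath(domains, "3"))
}
```

With a real database, each map lookup would presumably become a query by `_id`; the walk itself stays the same.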
Web Genome is a research project to gather statistics about the usage of various technologies across the internet. It is a breadth-first web crawler that stores HTTP headers, which can provide information about the underlying technology of a website. Below you can browse domain names based on the technologies they use.
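
The breadth-first idea described above can be sketched as follows. This is not the code from `worker_http.go`, which is not shown here. The `page` type, `fetchFunc`, `fakeWeb`, `fakeFetch`, and `crawl` are hypothetical names, and the HTTP request is replaced by an in-memory lookup so that the example runs offline. Only `Header` mirrors a type from `core.go`.

```go
package main

import "fmt"

// Header mirrors core.Header from this repository.
type Header struct {
	Key   string
	Value string
}

// page is a hypothetical stand-in for one HTTP response:
// the headers it returned plus the domains it links to.
type page struct {
	Headers []Header
	Links   []string
}

// fetchFunc abstracts the HTTP request so the sketch needs no network.
type fetchFunc func(domain string) (page, bool)

// fakeWeb is made-up data; only the seed name comes from the source text.
var fakeWeb = map[string]page{
	"devdungeon.com": {Headers: []Header{{"Server", "nginx"}}, Links: []string{"a.example", "b.example"}},
	"a.example":      {Headers: []Header{{"Server", "Apache"}}, Links: []string{"c.example"}},
	"b.example":      {Headers: []Header{{"X-Powered-By", "PHP"}}, Links: []string{"a.example"}},
	"c.example":      {Headers: []Header{{"Server", "nginx"}}},
}

func fakeFetch(domain string) (page, bool) {
	p, ok := fakeWeb[domain]
	return p, ok
}

// crawl visits domains breadth-first starting at seed, stores the headers
// of each fetched domain, and stops after limit successful fetches.
func crawl(seed string, fetch fetchFunc, limit int) ([]string, map[string][]Header) {
	order := []string{}
	headers := make(map[string][]Header)
	visited := map[string]bool{seed: true}
	queue := []string{seed} // FIFO queue gives breadth-first order
	for len(queue) > 0 && len(order) < limit {
		d := queue[0]
		queue = queue[1:]
		p, ok := fetch(d)
		if !ok {
			continue // unreachable domain: skip it
		}
		order = append(order, d)
		headers[d] = p.Headers
		for _, link := range p.Links {
			if !visited[link] {
				visited[link] = true
				queue = append(queue, link)
			}
		}
	}
	return order, headers
}

func main() {
	order, headers := crawl("devdungeon.com", fakeFetch, 10)
	fmt.Println(order)
	fmt.Println(headers["a.example"])
}
```

Marking a domain as visited when it is queued, rather than when it is fetched, keeps the same domain from being queued twice when several pages link to it.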
Total Domains in Database: {{.totalDomains}}