├── .gitignore
├── LICENSE
├── README.md
├── chromeSession.go
├── chromeSession_test.go
└── examples
    └── docker
        ├── Dockerfile
        ├── Makefile
        ├── README.md
        └── main.go


/.gitignore:
--------------------------------------------------------------------------------
1 | # Binaries for programs and plugins
2 | *.exe
3 | *.dll
4 | *.so
5 | *.dylib
6 |
7 | # Test binary, build with `go test -c`
8 | *.test
9 |
10 | # Output of the go coverage tool, specifically when used with LiteIDE
11 | *.out
12 |
13 | # Project-local glide cache, RE: https://github.com/Masterminds/glide/issues/736
14 | .glide/
15 |
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | MIT License
2 |
3 | Copyright (c) 2017 Eric Greer
4 |
5 | Permission is hereby granted, free of charge, to any person obtaining a copy
6 | of this software and associated documentation files (the "Software"), to deal
7 | in the Software without restriction, including without limitation the rights
8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 |
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 |
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # DO NOT USE! This project no longer works due to changes in Chrome. Use the Chrome DevTools protocol in [chromedp](https://github.com/chromedp) instead!
2 |
3 | # headlessChrome 🤖
4 | **Only Ubuntu on Docker is supported for now. Mac does not appear to work.** 😬
5 |
6 | A [Go](https://golang.org) package for working with headless Chrome. Run interactive JavaScript commands on pages with Go and Chrome without a GUI. Includes a few helper functions out of the box to query and click elements by their classes, IDs, or inner HTML.
7 |
8 | You could use this package to click buttons and scrape content from a website as if you were a browser, or to render pages that other tools such as PhantomJS or CasperJS cannot handle. It is especially useful for sites that use EmberJS, where the content is rendered by JavaScript after the HTML payload is delivered.
9 |
10 | #### Examples
11 |
12 | An example project that does some simple things with a `Makefile` and `Dockerfile` is in the examples directory.
13 |
14 | #### Install
15 | `go get github.com/integrii/headlessChrome`
16 |
17 | #### Documentation
18 | [http://godoc.org/github.com/integrii/headlessChrome](http://godoc.org/github.com/integrii/headlessChrome)
19 |
20 | ##### Docker Version
21 | To run Chrome headless with Docker, check out `examples/docker/main.go` as well as `examples/docker/Makefile`. From that directory, run `make test` to build and run the container with the example app inside. The source of httpbin.org is displayed at the end of the build and run.
22 |
23 | ##### Custom Flags
24 | By default, we start up with the bare minimum flags necessary to start headless Chrome and open a JavaScript console. If you want more flags, such as a window size or a custom User-Agent, add them to the `Args` variable. Be sure to append to it rather than overwrite it, so that the default flags are kept.
25 |
26 | ```go
27 | headlessChrome.Args = append(headlessChrome.Args, "--user-agent=Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36")
28 | headlessChrome.Args = append(headlessChrome.Args, "--window-size=1024,768")
29 | ```
30 |
31 | ##### Changing the Path to Chrome
32 |
33 | Change the path to Chrome by setting the `headlessChrome.ChromePath` variable.
34 | ```go
35 | headlessChrome.ChromePath = `/opt/google/chrome-unstable/chrome`
36 | ```
37 |
38 |
39 | ##### JavaScript Helper Examples
40 |
41 | Find the full list in [the docs](http://godoc.org/github.com/integrii/headlessChrome).
42 |
43 |
44 | ```go
45 | // click some span element from the page by its text content
46 | browser.ClickItemWithInnerHTML("span", "Google Search", 0)
47 |
48 | // select the content of something by its css classes
49 | browser.GetContentOfItemWithClasses("button arrow bold", 0)
50 | time.Sleep(time.Second) // give it a second to query
51 |
52 | // read the selected stuff from the console by picking
53 | // the next item from the output channel
54 | fmt.Println(<-browser.Output)
55 | ```
56 |
57 |
58 | #### Contributing
59 |
60 | Please send pull requests! It would be good to have support for more operating systems, or more helpers for commonly used JavaScript snippets. Adding support for another operating system should be as simple as checking the platform type and changing the default value of the `ChromePath` variable.
61 |
--------------------------------------------------------------------------------
/chromeSession.go:
--------------------------------------------------------------------------------
1 | // Package headlessChrome is a Go package for working with headless Chrome. Run interactive JavaScript commands on web pages with Go and Chrome.
2 |
3 | package headlessChrome
4 |
5 | import (
6 | 	"errors"
7 | 	"fmt"
8 | 	"strconv"
9 | 	"strings"
10 | 	"time"
11 |
12 | 	"github.com/integrii/interactive"
13 | )
14 |
15 | // Debug enables debug output for this package to console
16 | var Debug bool
17 |
18 | // BrowserStartupTime is how long chrome has to start up the console
19 | // before we consider it a failure
20 | var BrowserStartupTime = time.Second * 20
21 |
22 | // ChromePath is the command to execute chrome
23 | var ChromePath = ChromePathMacOS
24 |
25 | // ChromePathMacOS is where chrome normally lives on MacOS
26 | var ChromePathMacOS = `/Applications/Google Chrome.app/Contents/MacOS/Google Chrome`
27 |
28 | // ChromePathDocker is where chrome normally lives in the project's docker container
29 | var ChromePathDocker = `/opt/google/chrome-unstable/chrome`
30 |
31 | // Args are the args that will be used to start chrome
32 | var Args = []string{
33 | 	"--headless",
34 | 	"--disable-gpu",
35 | 	"--repl",
36 | 	// "--dump-dom",
37 | 	// "--window-size=1024,768",
38 | 	// "--user-agent=Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36",
39 | 	// "--verbose",
40 | }
41 |
42 | const expectedFirstLine = `Type a Javascript expression to evaluate or "quit" to exit.`
43 | const promptPrefix = `>>>`
44 |
45 | // outputSanitizer puts output coming from the console that
46 | // does not begin with the input prompt into the session
47 | // output channel
48 | func (cs *ChromeSession) outputSanitizer() {
49 | 	for text := range cs.Session.Output {
50 | 		debug("raw output:", text)
51 | 		if !strings.HasPrefix(text, promptPrefix) {
52 | 			cs.Output <- text
53 | 		}
54 | 	}
55 | }
56 |
57 | // ChromeSession is an interactive console Session with a Chrome
58 | // instance.
59 | type ChromeSession struct {
60 | 	Session *interactive.Session
61 | 	Output chan string
62 | 	Input chan string
63 | }
64 |
65 | // Exit exits the running command by issuing a 'quit'
66 | // to the chrome console
67 | func (cs *ChromeSession) Exit() {
68 | 	cs.Session.Write(`;quit`)
69 | 	cs.Session.Exit() // exit the process with an interrupt signal
70 | 	cs.Session.Close() // close the tty session
71 | }
72 |
73 | // Write writes to the Session
74 | func (cs *ChromeSession) Write(s string) {
75 | 	debug("write:", s)
76 | 	cs.Session.Write(s)
77 | }
78 |
79 | // outputPrinter prints all outputs from the output channel to the cli
80 | func (cs *ChromeSession) outputPrinter() {
81 | 	for l := range cs.Session.Output {
82 | 		debug("read:", l)
83 | 		fmt.Println(l)
84 | 	}
85 | }
86 |
87 | // ForceClose issues a force kill to the command
88 | func (cs *ChromeSession) ForceClose() {
89 | 	cs.Session.ForceClose()
90 | }
91 |
92 | // ClickSelector calls a click() on the supplied selector
93 | func (cs *ChromeSession) ClickSelector(s string) {
94 | 	cs.Write(`document.querySelector("` + s + `").click()`)
95 | }
96 |
97 | // ClickItemWithInnerHTML clicks an item that has the matching inner html
98 | func (cs *ChromeSession) ClickItemWithInnerHTML(elementType string, s string, itemIndex int) {
99 | 	cs.Write(`var x = $("` + elementType + `").filter(function(idx) { return this.innerHTML == "` + s + `"});x[` + strconv.Itoa(itemIndex) + `].click()`)
100 | }
101 |
102 | // GetItemWithInnerHTML fetches the item with the specified innerHTML content
103 | func (cs *ChromeSession) GetItemWithInnerHTML(elementType string, s string, itemIndex int) {
104 | 	cs.Write(`var x = $("` + elementType + `").filter(function(idx) { return this.innerHTML == "` + s + `"});x[` + strconv.Itoa(itemIndex) + `]`)
105 | }
106 |
107 | // GetContentOfItemWithClasses fetches the content of the element with the specified classes
108 | func (cs *ChromeSession) GetContentOfItemWithClasses(classes string, itemIndex int) {
109 | 	cs.Write(`document.getElementsByClassName("` + classes + `")[` + strconv.Itoa(itemIndex) + `].innerHTML`)
110 | }
111 |
112 | // GetValueOfItemWithClasses fetches the form value of the specified item
113 | func (cs *ChromeSession) GetValueOfItemWithClasses(classes string, itemIndex int) {
114 | 	cs.Write(`document.getElementsByClassName("` + classes + `")[` + strconv.Itoa(itemIndex) + `].value`)
115 | }
116 |
117 | // GetContentOfItemWithSelector gets the content of an element with the specified selector
118 | func (cs *ChromeSession) GetContentOfItemWithSelector(selector string) {
119 | 	cs.Write(`document.querySelector("` + selector + `").innerHTML`) // innerHTML is a property, not a method
120 | }
121 |
122 | // ClickItemWithClasses clicks the item at itemIndex among those with the provided classes.
123 | // Multiple classes are separated by spaces
124 | func (cs *ChromeSession) ClickItemWithClasses(classes string, itemIndex int) {
125 | 	cs.Write(`document.getElementsByClassName("` + classes + `")[` + strconv.Itoa(itemIndex) + `].click()`)
126 | }
127 |
128 | // SetTextByID sets the text on the div with the specified id
129 | func (cs *ChromeSession) SetTextByID(id string, text string) {
130 | 	cs.Write(`document.getElementById("` + id + `").innerHTML = "` + text + `"`)
131 | }
132 |
133 | // ClickItemWithID clicks an item with the specified id
134 | func (cs *ChromeSession) ClickItemWithID(id string) {
135 | 	cs.Write(`document.getElementById("` + id + `").click()`)
136 | }
137 |
138 | // SetTextByClasses sets the text on the div with the specified classes
139 | func (cs *ChromeSession) SetTextByClasses(classes string, itemIndex int, text string) {
140 | 	cs.Write(`document.getElementsByClassName("` + classes + `")[` + strconv.Itoa(itemIndex) + `].innerHTML = "` + text + `"`)
141 | }
142 |
143 | // SetInputTextByClasses sets the input text for an input field
144 | func (cs *ChromeSession) SetInputTextByClasses(classes string, itemIndex int, text string) {
145 | 	cs.Write(`document.getElementsByClassName("` + classes + `")[` + strconv.Itoa(itemIndex) + `].value = "` + text + `"`)
146 | }
147 |
148 | // NewBrowserWithTimeout starts a new chrome headless session
149 | // but limits how long it can run before it is killed forcefully.
150 | // A time limit of 0 means there is no time limit
151 | func NewBrowserWithTimeout(url string, timeout time.Duration) (*ChromeSession, error) {
152 | 	var err error
153 |
154 | 	debug("Creating a new browser pointed to", url)
155 |
156 | 	chromeSession := ChromeSession{}
157 | 	chromeSession.Output = make(chan string, 5000)
158 |
159 | 	// add url as last arg and create new Session
160 | 	args := append(Args, url)
161 | 	debug(ChromePath, args)
162 | 	chromeSession.Session, err = interactive.NewSessionWithTimeout(ChromePath, args, timeout)
163 | 	if err != nil {
164 | 		return &chromeSession, err
165 | 	}
166 |
167 | 	// map output and input channels for easy use
168 | 	chromeSession.Input = chromeSession.Session.Input
169 | 	go chromeSession.outputSanitizer()
170 |
171 | 	// wait for the console ready line from the browser
172 | 	// and if it does not start in time, return an error
173 | 	startupTime := time.NewTimer(BrowserStartupTime)
174 | 	for {
175 | 		select {
176 | 		case <-startupTime.C:
177 | 			debug("ERROR: Browser failed to start before browser startup time cutoff")
178 | 			chromeSession.ForceClose() // force close the session because it failed
179 | 			err = errors.New("Chrome console failed to init in the allotted time")
180 | 			return &chromeSession, err
181 | 		case line := <-chromeSession.Output:
182 | 			if strings.Contains(line, expectedFirstLine) {
183 | 				debug("Chrome console REPL ready")
184 | 				return &chromeSession, err
185 | 			}
186 | 			debug("WARNING: Unexpected first line when initializing headless Chrome console:", line)
187 | 		}
188 | 	}
189 | }
190 |
191 | // NewBrowser starts a new chrome headless Session.
192 | func NewBrowser(url string) (*ChromeSession, error) {
193 | 	return NewBrowserWithTimeout(url, 0)
194 | }
195 |
196 | func debug(s ...interface{}) {
197 | 	if Debug {
198 | 		fmt.Println(s...)
199 | 	}
200 | }
201 |
--------------------------------------------------------------------------------
/chromeSession_test.go:
--------------------------------------------------------------------------------
1 | package headlessChrome
2 |
3 | import (
4 | 	"strings"
5 | 	"testing"
6 | 	"time"
7 | )
8 |
9 | // TestMainPageScrape tests a scrape of content from google.com
10 | func TestMainPageScrape(t *testing.T) {
11 |
12 | 	Debug = true
13 |
14 | 	// make a new Session
15 | 	chrome, err := NewBrowser(`google.com`)
16 | 	if err != nil {
17 | 		t.Fatal(err)
18 | 	}
19 |
20 | 	time.Sleep(time.Second * 5)
21 | 	chrome.Write(`document.documentElement.outerHTML`)
22 | 	chrome.Exit()
23 |
24 | 	// look for "google" in the output collected from the Session
25 | 	var googleFound bool
26 | 	for l := range chrome.Output {
27 | 		if strings.Contains(l, "google") {
28 | 			googleFound = true
29 | 		}
30 | 		t.Log(l)
31 | 	}
32 |
33 | 	if !googleFound {
34 | 		t.Fatal("Didn't find google in the output")
35 | 	}
36 | }
37 |
--------------------------------------------------------------------------------
/examples/docker/Dockerfile:
--------------------------------------------------------------------------------
1 | FROM yukinying/chrome-headless-browser
2 | MAINTAINER integrii@gmail.com
3 | USER root
4 | RUN apt-get update
5 | RUN apt-get -y install golang-1.8
6 | RUN apt-get -y install git
7 | RUN mkdir /go
8 | RUN mkdir /app
9 | ADD main.go /app/main.go
10 | WORKDIR /app
11 | RUN /usr/lib/go-1.8/bin/go get github.com/integrii/headlessChrome
12 | RUN /usr/lib/go-1.8/bin/go build -o /app/headless
13 | ENTRYPOINT ["/app/headless","--chromePath=/opt/google/chrome-unstable/chrome"]
14 |
--------------------------------------------------------------------------------
/examples/docker/Makefile:
--------------------------------------------------------------------------------
1 | build:
2 | 	docker build . -t headless-chrome
3 |
4 | test: build run
5 |
6 | run:
7 | 	docker run --shm-size 1G --rm -it --cap-add=SYS_ADMIN headless-chrome
8 |
9 | repl:
10 | 	docker run --shm-size 1G --rm -it --cap-add=SYS_ADMIN --entrypoint="/opt/google/chrome-unstable/chrome" headless-chrome --headless --disable-gpu --repl --no-sandbox 'https://ebay.com'
11 |
--------------------------------------------------------------------------------
/examples/docker/README.md:
--------------------------------------------------------------------------------
1 | Runs headless-chrome from an Ubuntu container.
2 |
3 | `make test`
4 |
--------------------------------------------------------------------------------
/examples/docker/main.go:
--------------------------------------------------------------------------------
1 | package main
2 |
3 | import (
4 | 	"fmt"
5 | 	"time"
6 |
7 | 	"github.com/integrii/headlessChrome"
8 | )
9 |
10 | func main() {
11 |
12 | 	// set headless chrome package into debug mode (prints to stdout)
13 | 	// headlessChrome.Debug = true
14 | 	// interactive.Debug = true
15 |
16 | 	// set the path to the docker container chrome executable
17 | 	// headlessChrome.ChromePath = headlessChrome.ChromePathDocker
18 | 	headlessChrome.ChromePath = `/opt/google/chrome-unstable/chrome`
19 |
20 | 	// set some additional arguments for when starting chrome
21 | 	headlessChrome.Args = append(headlessChrome.Args, "--user-agent=Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36")
22 | 	headlessChrome.Args = append(headlessChrome.Args, "--window-size=1024,768")
23 | 	headlessChrome.Args = append(headlessChrome.Args, "--no-sandbox")
24 |
25 | 	// make a new session
26 | 	browser, err := headlessChrome.NewBrowser(`http://httpbin.org`)
27 | 	if err != nil {
28 | 		panic(err)
29 | 	}
30 | 	// Close the browser process when this func returns
31 | 	defer browser.Exit()
32 |
33 | 	// sleep while content is rendered. You could replace this
34 | 	// with some javascript that only returns when the
35 | 	// content exists to be safer.
36 | 	time.Sleep(time.Second * 5)
37 |
38 | 	// Query all the HTML from the web site for fun
39 | 	browser.Write("document.documentElement.outerHTML")
40 | 	time.Sleep(time.Second)
41 |
42 | 	// loop over all the output that came from the output channel
43 | 	// and print it to the console
44 | 	for len(browser.Output) > 0 {
45 | 		fmt.Println(<-browser.Output)
46 | 	}
47 |
48 | }
49 |
--------------------------------------------------------------------------------