├── .gitignore
├── CHANGELOG.md
├── LICENSE
├── README.md
├── config
│   └── config.exs
├── lib
│   ├── hpdf.ex
│   └── hpdf
│       ├── application.ex
│       ├── browser.ex
│       ├── controller.ex
│       ├── controller_supervisor.ex
│       ├── printer.ex
│       ├── receiver.ex
│       └── web_socket.ex
├── mix.exs
├── mix.lock
└── test
    ├── expdf_test.exs
    └── test_helper.exs


/.gitignore:
--------------------------------------------------------------------------------
# The directory Mix will write compiled artifacts to.
/_build

# If you run "mix test --cover", coverage assets end up here.
/cover

# The directory Mix downloads your dependencies sources to.
/deps

# Where 3rd-party dependencies like ExDoc output generated docs.
/doc

# Ignore .fetch files in case you like to edit your project deps locally.
/.fetch

# If the VM crashes, it generates a dump, let's ignore it too.
erl_crash.dump

# Also ignore archive artifacts (built via "mix archive.build").
*.ez

*.pdf

--------------------------------------------------------------------------------
/CHANGELOG.md:
--------------------------------------------------------------------------------
v0.3.1
Correctly detect redirects only on the request URL

v0.3.0
Add a max_wait_time option to force printing after a set time

v0.1.0

Initial Release

--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
Copyright (c) 2017 Daniel Neighman

Permission is hereby granted, free of charge, to any person obtaining
a copy of this software and associated documentation files (the
"Software"), to deal in the Software without restriction, including
without limitation the rights to use, copy, modify, merge, publish,
distribute, sublicense, and/or sell copies of the Software, and to
permit persons to whom the Software is furnished to do so, subject to
the following conditions:

The above copyright notice and this permission notice shall be
included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# HPDF

Headless PDF printing (with Chrome)

Use Chrome in Headless mode to print pages to PDF.
Each page is loaded in its own browser context, similar to an Incognito window.

Pages that require authentication can also be printed, so you can print pages that are behind a login wall.

When using HPDF you need to have a headless Chrome running.
You can get a headless Chrome browser by using a Docker container.
A public container can be found at: https://hub.docker.com/r/justinribeiro/chrome-headless/

By default HPDF will look for Chrome at `http://localhost:9222`.
This can be configured in your configuration files by using:

```elixir
config :hpdf, HPDF,
  address: "http://my_custom_domain:9222"
```

```sh
docker run -d -p 9222:9222 --cap-add=SYS_ADMIN justinribeiro/chrome-headless
```

### Example

```elixir
case HPDF.print_pdf!(my_url, timeout: 30_000) do
  {:ok, pdf_data} -> do_stuff_with_the_pdf_binary_data(pdf_data)
  {:error, error_type, reason} -> # Handle error
  {:error, reason} -> # Handle error
end
```

Common error types provided by HPDF

* `:page_error` - An error was returned by the browser
* `:page_redirected` - The URL was redirected
* `:page_load_failure` - The page loaded with a non-2xx status code
* `:crashed` - The browser crashed

### Using header authentication

When printing a page using header authentication,
usually it's not only the original page, but all AJAX requests made within it that need to have the authentication header included.

Assuming you have a token:

```elixir
header_value = get_my_auth_header()
headers = %{"authorization" => header_value}

case HPDF.print_pdf!(my_url, timeout: 30_000, page_headers: headers) do
  {:ok, pdf_data} -> do_stuff_with_the_pdf_binary_data(pdf_data)
  {:error, error_type, reason} -> # Handle error
  {:error, reason} -> # Handle error
end
```

### Using cookie authentication

An initiating cookie can be used to access pages.

```elixir
cookie = %{
  name: "_cookie_name",
  value: cookie_value,
  domain: "your.domain",
  path: "/",
  secure: true,
  httpOnly: true,
}

{:ok, data} = HPDF.print_pdf!(url, timeout: 30_000, cookie: cookie)
```

### Calling `print_pdf!`

Prints a PDF file with the provided options.
The HPDF.Application must be running before calling this function.

### Options

* `timeout` - The timeout for the call. Default 5_000
* `after_load_delay` - The time in milliseconds to wait after the page finishes loading, allowing for dynamic JS calls and rendering. Default 750
* `cookie` - Supply a cookie for the page to be loaded with. See https://chromedevtools.github.io/devtools-protocol/tot/Network/#method-setCookie
* `page_headers` - A map of headers to supply to the page
* `include_headers_on_same_domain` - A bool. Default `true`. If true, all requests to the same domain will include the same headers as the main page
* `print_options` - A map of options to the print method.
  See https://chromedevtools.github.io/devtools-protocol/tot/Page/#method-printToPDF
* `max_wait_time` - A time in milliseconds after which the page will be forcefully printed even if there are outstanding requests

## Installation

If [available in Hex](https://hex.pm/docs/publish), the package can be installed
by adding `hpdf` to your list of dependencies in `mix.exs`:

```elixir
def deps do
  [{:hpdf, "~> 0.3.1"}]
end

def application do
  # Specify extra applications you'll use from Erlang/Elixir
  [extra_applications: [:hpdf]]
end
```

Documentation can be generated with [ExDoc](https://github.com/elixir-lang/ex_doc)
and published on [HexDocs](https://hexdocs.pm). Once published, the docs can
be found at [https://hexdocs.pm/hpdf](https://hexdocs.pm/hpdf).

--------------------------------------------------------------------------------
/config/config.exs:
--------------------------------------------------------------------------------
# This file is responsible for configuring your application
# and its dependencies with the aid of the Mix.Config module.
use Mix.Config

config :logger, level: :debug

# This configuration is loaded before any dependency and is restricted
# to this project. If another project depends on this project, this
# file won't be loaded nor affect the parent project. For this reason,
# if you want to provide default values for your application for
# 3rd-party users, it should be done in your "mix.exs" file.

# You can configure for your application as:
#
#     config :hpdf, key: :value
#
# And access this configuration in your application as:
#
#     Application.get_env(:hpdf, :key)
#
# Or configure a 3rd-party app:
#
#     config :logger, level: :info
#

# It is also possible to import configuration files, relative to this
# directory. For example, you can emulate configuration per environment
# by uncommenting the line below and defining dev.exs, test.exs and such.
# Configuration from the imported file will override the ones defined
# here (which is why it is important to import them last).
#
#     import_config "#{Mix.env}.exs"

--------------------------------------------------------------------------------
/lib/hpdf.ex:
--------------------------------------------------------------------------------
defmodule HPDF do
  @moduledoc """
  Uses Chrome in Headless mode to print pages to PDF.
  Each page is loaded in its own browser context, similar to an Incognito window.

  Pages that require authentication can also be printed, so you can print pages that are behind a login wall.

  When using HPDF you need to have a headless Chrome running.
  By default HPDF will look for Chrome at `http://localhost:9222`.
  This can be configured in your configuration files by using:

  ```elixir
  config :hpdf, HPDF,
    address: "http://my_custom_domain:9222"
  ```

  You can get a headless Chrome browser by using a Docker container.

  A public container can be found at: https://hub.docker.com/r/justinribeiro/chrome-headless/

  ```sh
  docker run -d -p 9222:9222 --cap-add=SYS_ADMIN justinribeiro/chrome-headless
  ```

  ### Example

  ```elixir
  case HPDF.print_pdf!(my_url, timeout: 30_000) do
    {:ok, pdf_data} -> do_stuff_with_the_pdf_binary_data(pdf_data)
    {:error, error_type, reason} -> # Handle error
    {:error, reason} -> # Handle error
  end
  ```

  Common error types provided by HPDF

  * `:page_error` - An error was returned by the browser
  * `:page_redirected` - The URL was redirected
  * `:page_load_failure` - The page loaded with a non-2xx status code
  * `:crashed` - The browser crashed

  ### Using header authentication

  When printing a page using header authentication,
  usually it's not only the original page, but all AJAX requests made within it that need to have the authentication header included.

  Assuming you have a token:

  ```elixir
  header_value = get_my_auth_header()
  headers = %{"authorization" => header_value}

  case HPDF.print_pdf!(my_url, timeout: 30_000, page_headers: headers) do
    {:ok, pdf_data} -> do_stuff_with_the_pdf_binary_data(pdf_data)
    {:error, error_type, reason} -> # Handle error
    {:error, reason} -> # Handle error
  end
  ```

  ### Using cookie authentication

  An initiating cookie can be used to access pages.

  ```elixir
  cookie = %{
    name: "_cookie_name",
    value: cookie_value,
    domain: "your.domain",
    path: "/",
    secure: true,
    httpOnly: true,
  }

  {:ok, data} = HPDF.print_pdf!(url, timeout: 30_000, cookie: cookie)
  ```
  """

  @doc """
  Prints a PDF file with the provided options.
  The HPDF.Application must be running before calling this function.

  ### Options

  * `timeout` - The timeout for the call.
    Default 5_000
  * `after_load_delay` - The time in milliseconds to wait after the page finishes loading, allowing for dynamic JS calls and rendering. Default 750
  * `cookie` - Supply a cookie for the page to be loaded with. See https://chromedevtools.github.io/devtools-protocol/tot/Network/#method-setCookie
  * `page_headers` - A map of headers to supply to the page
  * `include_headers_on_same_domain` - A bool. Default `true`. If true, all requests to the same domain will include the same headers as the main page
  * `print_options` - A map of options to the print method. See https://chromedevtools.github.io/devtools-protocol/tot/Page/#method-printToPDF
  * `max_wait_time` - The maximum amount of time, in milliseconds, to wait for loading before printing.
  """
  def print_pdf!(url, options \\ []) do
    HPDF.Controller.print_pdf!(url, options)
  end
end

--------------------------------------------------------------------------------
/lib/hpdf/application.ex:
--------------------------------------------------------------------------------
defmodule HPDF.Application do
  # See http://elixir-lang.org/docs/stable/elixir/Application.html
  # for more information on OTP Applications
  @moduledoc false

  use Application

  def start(_type, _args) do
    import Supervisor.Spec, warn: false

    # Define workers and child supervisors to be supervised
    children = [
      worker(HPDF.Browser, []),
      supervisor(HPDF.Controller.Supervisor, [])
    ]

    # See http://elixir-lang.org/docs/stable/elixir/Supervisor.html
    # for other strategies and supported options
    opts = [strategy: :one_for_all, name: HPDF.Supervisor]
    Supervisor.start_link(children, opts)
  end
end

--------------------------------------------------------------------------------
/lib/hpdf/browser.ex:
--------------------------------------------------------------------------------
defmodule HPDF.Browser do
  @moduledoc false

  #
  # Manages browser sessions. This manages creating and cleaning up browser sessions as contexts in chrome.
  # Each session is provided with a new 'Context' - similar to an Incognito window.
  #
  # This module manages the browser and utilizes the master web socket to create and shutdown contexts.
  # Each time we print a page, we create a new context.
  # The browser will monitor the requesting process so that once it stops, the context is removed automatically

  use HPDF.WebSocket
  require Logger

  @initial_state %{
    connected?: false,
    debugger_http_address: nil,
    socket_uri: nil,
    session_counter: 1,
    sessions: %{},
    socket: nil,
    receiver: nil,
  }

  def start_link(), do: start_link([])
  def start_link(args) do
    GenServer.start_link(__MODULE__, args, name: __MODULE__)
  end

  def init(_args) do
    config = Application.get_env(:hpdf, HPDF, [])

    debugger_address = Keyword.get(config, :address, "http://localhost:9222")

    # TODO: fail if we don't get a socket address
    socket_address = fetch_controller_socket_address(debugger_address)
    ws_uri = URI.parse(socket_address)

    {:ok, init_state} = super(socket_args: [{ws_uri.host, ws_uri.port}, [path: ws_uri.path]])

    state =
      %{@initial_state | debugger_http_address: debugger_address,
                         socket_uri: ws_uri,
                         receiver: init_state.receiver,
                         socket: init_state.socket}


    {:ok, state}
  end

  def new_session do
    GenServer.call(__MODULE__, :new_session)
  end

  def close_session(session), do: GenServer.call(__MODULE__, {:close_session, session})

  def handle_connect(state) do
    {:noreply, %{state | connected?: true}}
  end

  def handle_close({:abnormal, _reason}, state) do
    {:stop, :abnormal, %{state | connected?: false}}
  end

  def handle_close(_reason, state) do
    {:stop, :normal, %{state | connected?:
      false}}
  end

  # close_session/1 always sends {:close_session, session}, so match that
  # shape here too; when disconnected there is nothing to tear down.
  def handle_call({:close_session, _session}, _from, %{connected?: false} = state) do
    {:reply, :ok, state}
  end

  def handle_call({:close_session, session}, _from, state) do
    new_state = terminate_session(session, state)
    {:reply, :ok, new_state}
  end

  def handle_call(:new_session, _from, %{connected?: false} = state) do
    {:reply, {:error, :not_connected}, state}
  end

  def handle_call(:new_session, {pid, _ref} = from, state) do
    session_state = %{
      id: state.session_counter,
      owner: pid,
      reply_to: from,
      context_id: nil,
      page_id: nil,
      page_ws_uri: nil,
      monitor_ref: Process.monitor(pid),
    }

    method(state.socket, "Target.createBrowserContext", %{}, session_state.id)
    new_state = put_in(state, [:sessions, session_state.id], session_state)
    method(state.socket, "Browser.getVersion")

    # The reply is sent later, from handle_frame/2, once the page target exists.
    {:noreply, %{new_state | session_counter: state.session_counter + 1}}
  end


  def handle_info({:DOWN, ref, :process, _pid, _reason}, state) do
    session = Enum.find state.sessions, fn {_, session} ->
      session.monitor_ref == ref
    end

    new_state =
      if session do
        terminate_session(elem(session, 1), state)
      else
        state
      end

    {:noreply, new_state}
  end


  def handle_frame(
    {:text,
      %{"id" => session_id, "result" => %{"browserContextId" => context_id}}},
    %{sessions: sessions} = state
  ) do
    session = Map.get(sessions, session_id)
    session = %{session | context_id: context_id}
    request_new_page_for_context(context_id, session, state)

    new_state = put_in(state, [:sessions, session_id], session)
    {:noreply, new_state}
  end

  def handle_frame(
    {:text, %{"id" => session_id, "result" => %{"targetId" => page_id}}},
    %{sessions: sessions} = state
  ) do
    session = Map.get(sessions,
      session_id)
    session = %{session | page_id: page_id,
                          page_ws_uri: page_ws_address(state, page_id)}

    GenServer.reply(session.reply_to, {:ok, session})

    new_state = put_in(state, [:sessions, session_id], session)
    {:noreply, new_state}
  end

  def handle_frame(_frame, state) do
    {:noreply, state}
  end

  defp request_new_page_for_context(context_id, %{id: method_id}, %{socket: socket}) do
    method(
      socket,
      "Target.createTarget",
      %{browserContextId: context_id, url: "about:_blank"},
      method_id
    )
  end

  defp terminate_session(session, %{socket: socket} = state) do
    method(
      socket,
      "Target.closeTarget",
      %{targetId: session.page_id},
      session.id
    )
    method(
      socket,
      "Target.disposeBrowserContext",
      %{browserContextId: session.context_id},
      session.id
    )

    new_sessions = Map.drop(state.sessions, [session.id])
    %{state | sessions: new_sessions}
  end

  defp fetch_controller_socket_address(debugger_address) do
    address = debugger_address |> URI.parse()
    address = %{address | path: "/json"}
    resp = HTTPotion.get(address)

    response =
      case resp do
        %HTTPotion.Response{body: body, status_code: 200} ->
          Poison.decode!(body)
      end

    controller_context = response |> Enum.reverse() |> hd()
    socket_url = Map.get(controller_context, "webSocketDebuggerUrl")

    if socket_url do
      socket_url
    else
      "ws://#{address.host}:#{address.port}/devtools/page/#{controller_context["id"]}"
    end
  end

  defp page_ws_address(%{socket_uri: socket_uri}, page_id) do
    %{socket_uri | path: "/devtools/page/#{page_id}"}
  end

  defp method(socket, meth, params \\ %{}, id \\ nil)
  defp method(socket, meth, params, nil) do
    method(socket, meth, params, 56)
  end

  defp method(socket, meth, params, id) do
    args = %{method: meth, id: id, params: params}
    Socket.Web.send(socket, {:text, Poison.encode!(args)})
  end
end

--------------------------------------------------------------------------------
/lib/hpdf/controller.ex:
--------------------------------------------------------------------------------
defmodule HPDF.Controller do
  @moduledoc false

  # The controller is the entry point
  use GenServer

  def start_link(args) do
    GenServer.start_link(__MODULE__, args)
  end

  def start_link(args, _opts) do
    GenServer.start_link(__MODULE__, args)
  end

  def init(_) do
    {:ok, %{}}
  end

  def print_pdf!(url, opts \\ []) do
    {:ok, pid} = HPDF.Controller.Supervisor.start_child([])
    timeout = Keyword.get(opts, :timeout, :infinity)
    result = GenServer.call(pid, {:print_pdf, url, opts}, timeout)
    GenServer.stop(pid, :normal)
    result
  end

  def handle_call({:print_pdf, url, opts}, _from, state) do
    case HPDF.Browser.new_session() do
      {:ok, session} ->
        result = HPDF.Printer.print_page!(session.page_ws_uri, url, opts)
        HPDF.Browser.close_session(session)
        {:reply, result, state}
      {:error, _reason} = err ->
        # A bare error tuple is not a valid handle_call return; reply with it instead.
        {:reply, err, state}
    end
  end
end

--------------------------------------------------------------------------------
/lib/hpdf/controller_supervisor.ex:
--------------------------------------------------------------------------------
defmodule HPDF.Controller.Supervisor do
  @moduledoc false
  use Supervisor

  def start_link() do
    Supervisor.start_link(__MODULE__, [], name: __MODULE__)
  end

  def start_child(id) do
    Supervisor.start_child(__MODULE__, id: id)
  end

  def init(_args) do
    children = [
      worker(HPDF.Controller, [], restart: :temporary)
    ]

    supervise(children, strategy: :simple_one_for_one)
  end
end

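The controller and its supervisor above follow a per-request worker pattern: start a temporary child, make one call to it, then stop it. The following is a minimal, self-contained sketch of that pattern, not HPDF's own code. `Sketch.Worker`, `Sketch.Supervisor` and `Sketch.run/2` are hypothetical names, the worker only echoes its input instead of driving Chrome, and the sketch swaps the `:simple_one_for_one` strategy for `DynamicSupervisor`, which assumes Elixir 1.6 or later.

```elixir
# Hypothetical stand-in for HPDF.Controller: handles one request, then is stopped.
defmodule Sketch.Worker do
  # restart: :temporary means the supervisor never restarts a finished worker.
  use GenServer, restart: :temporary

  def start_link(args), do: GenServer.start_link(__MODULE__, args)

  @impl true
  def init(_args), do: {:ok, %{}}

  @impl true
  def handle_call({:work, input}, _from, state) do
    # Reply on both the success and the error path so the caller never hangs.
    reply =
      case input do
        "" -> {:error, :empty_input}
        url -> {:ok, "printed " <> url}
      end

    {:reply, reply, state}
  end
end

defmodule Sketch do
  # Start one temporary worker per request, call it, then stop it.
  def run(input, opts \\ []) do
    timeout = Keyword.get(opts, :timeout, 5_000)
    {:ok, pid} = DynamicSupervisor.start_child(Sketch.Supervisor, {Sketch.Worker, []})

    try do
      GenServer.call(pid, {:work, input}, timeout)
    after
      # Runs even when the call exits on timeout, so workers do not pile up.
      if Process.alive?(pid), do: GenServer.stop(pid, :normal)
    end
  end
end

# Stands in for the supervisor that HPDF.Application would start.
{:ok, _sup} = DynamicSupervisor.start_link(strategy: :one_for_one, name: Sketch.Supervisor)
```

With the supervisor running, `Sketch.run("http://example.com")` returns `{:ok, "printed http://example.com"}` and `Sketch.run("")` returns `{:error, :empty_input}`. Wrapping the call in `try ... after` is a design choice of this sketch: it stops the worker even when the call times out.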
--------------------------------------------------------------------------------
/lib/hpdf/printer.ex:
--------------------------------------------------------------------------------
defmodule HPDF.Printer do
  @moduledoc false

  # This module listens to a web socket that has been opened to a chrome context.
  # Implemented in web_socket.ex
  # When `print_page!` is called, it navigates to the provided URL with network interception turned on
  #
  # It then watches each request and response, replacing headers on intercepted requests
  # and tracking responses.
  #
  # Once a `Page.frameStoppedLoading` event has been received the page has mostly stopped loading.
  # It may not be completely done, so a delay of `after_load_delay` is allowed for JS scripts to load content dynamically.
  # Each time a request is made after the page is "loaded", the timer is reset once the response is received.

  use HPDF.WebSocket

  require Logger

  @pdf_req_id 420420
  @default_timeout 5_000

  # When setting a cookie, the cookie value should follow
  # the parameters described in
  # https://chromedevtools.github.io/devtools-protocol/tot/Network/#method-setCookie
  defstruct socket: nil, # The web socket to use. This is given by the starting process.
            receiver: nil, # The receiver process. This is given by the starting process.
            after_load_delay: 750, # How long to wait after the page is loaded before starting to print
            cookie: nil, # The cookie options https://chromedevtools.github.io/devtools-protocol/tot/Network/#method-setCookie
            active_requests: 0, # A counter of the active requests that are in-progress (not including web-sockets)
            page_url: nil, # The URL of the page to print
            page_loaded?: false,
            page_headers: nil, # Page headers will be included on the initial page load
            include_headers_on_same_domain: true, # if set to true, any headers that were given for the original page will also be used on requests on the same domain
            print_timer: nil,
            max_wait_time: nil,
            timer: nil, # private
            reply_to: nil, # private
            print_options: %{}, # options for printing https://chromedevtools.github.io/devtools-protocol/tot/Page/#method-printToPDF
            page_frame: nil, # an id that chrome is using for the frame
            req_count: 1,
            canceled: false


  @doc false
  def init(args) do
    {:ok, init_state} = super(args)

    afd = Keyword.get(args, :after_load_delay, 750)
    headers = Keyword.get(args, :page_headers)
    cookie = Keyword.get(args, :cookie)
    print_options = Keyword.get(args, :print_options, %{})
    max_wait_time = Keyword.get(args, :max_wait_time)
    headers =
      if headers do
        for {k, v} <- headers, into: %{}, do: {to_string(k), v}
      end

    on_same_domain = Keyword.get(args, :include_headers_on_same_domain, true)

    state =
      %__MODULE__{
        receiver: init_state.receiver,
        socket: init_state.socket,
        after_load_delay: afd,
        page_headers: headers,
        include_headers_on_same_domain: on_same_domain,
        cookie: cookie,
        print_options: print_options,
        page_frame: nil,
        max_wait_time: max_wait_time,
      }

    {:ok, state}
  end

  @doc false
  def print_page!(ws_address, url, opts \\ []) do
    ws_uri = URI.parse(ws_address)

    args = Keyword.merge(
      [
        socket_args: [{ws_uri.host, ws_uri.port}, [path: ws_uri.path]],
      ],
      opts
    )

    {:ok, printer} = __MODULE__.start_link(args)
    print_page(printer, url, opts)
  end

  defp print_page(printer, url, opts) do
    timeout = Keyword.get(opts, :timeout, @default_timeout)
    GenServer.call(printer, {:print_pdf, url, opts}, timeout)
  end

  def handle_connect(state) do
    {:noreply, state}
  end

  def handle_call({:print_pdf, url, _opts}, from, state) do
    method(state.socket, "Page.enable", %{}, 1)
    method(state.socket, "Network.enable", %{}, 2)
    method(state.socket, "Network.setRequestInterceptionEnabled", %{enabled: true}, 5)

    if state.cookie do
      method(state.socket, "Network.setCookie", state.cookie, 8)
    end

    method(state.socket, "Page.navigate", %{url: url}, 3)
    print_timer =
      if state.max_wait_time do
        Process.send_after(self(), {:override_print_page}, state.max_wait_time)
      end

    {:noreply, %{state | reply_to: from, page_url: url, print_timer: print_timer}}
  end

  def handle_frame({:text, %{"id" => @pdf_req_id, "result" => %{ "data" => pdf_data}}}, state) do
    handle_pdf(pdf_data, state)
    {:stop, :normal, state}
  end

  def handle_frame({:text, %{"error" => %{"message" => "Invalid InterceptionId."}}}, state) do
    # intercepted requests that have been canceled will error
    {:noreply, state}
  end

  def handle_frame({:text, %{"error" => %{"message" => msg}}} = frame, state) do
    log_debug(inspect(frame))
    GenServer.reply(state.reply_to, {:error, :page_error, msg})
    {:stop, :normal, state}
  end

  def handle_frame({:text, %{"id" => 3, "result" => %{"frameId" => frameId}}}, state) do
    {:noreply, %{state | page_frame: frameId}}
  end

  def handle_frame(
    {:text,
      %{"method" => "Network.requestWillBeSent",
        "params" => %{"frameId" => frameId, "redirectResponse" => %{}, "request" => %{"url" => redirectURL}},
      }}, %{page_url: redirectURL} = state)
  do
    if frameId == state.page_frame do
      GenServer.reply(state.reply_to, {:error, :page_redirected, redirectURL})
      {:stop, :normal, state}
    else
      {:noreply, state}
    end
  end

  def handle_frame(
    {:text,
      %{
        "method" => "Network.responseReceived",
        "params" => %{"response" => %{"status" => status, "url" => url}}
      }} = frame,
    %{page_url: url} = state
  ) when not (status in (200..299)) do
    log_debug("not a 200: #{inspect(frame)}")
    GenServer.reply(state.reply_to, {:error, :page_load_failure, status})
    {:stop, :normal, state}
  end

  def handle_frame(
    {:text, %{"method" => "Network.requestWillBeSent"}},
    %{timer: timer} = state
  ) do
    if timer do
      Process.cancel_timer(timer)
    end
    {:noreply, %{state | active_requests: state.active_requests + 1, timer: nil}}
  end

  def handle_frame(
    {:text, %{"method" => "Network.responseReceived"}},
    %{timer: timer, active_requests: count} = state
  ) do
    if timer do
      Process.cancel_timer(timer)
    end

    new_timer =
      if state.page_loaded? && count <= 1 do
        Process.send_after(self(), {:print_page}, state.after_load_delay)
      end

    {:noreply, %{state | active_requests: count - 1, timer: new_timer}}
  end

  def handle_frame(
    {:text,
      %{
        "method" => "Network.requestIntercepted",
        "params" => %{
          "interceptionId" => interception_id,
          "request" => request,
        }
      }},
    state
  ) do
    updated_headers = updated_headers_for_request(request, state)
    method(
      state.socket,
      "Network.continueInterceptedRequest",
      %{"headers" => updated_headers, "interceptionId" => interception_id},
      state.req_count
    )
    {:noreply, %{state | req_count: state.req_count + 1}}
  end

  def handle_frame({:text, %{"method" => "Page.frameStoppedLoading"}}, state) do
    if state.timer, do: Process.cancel_timer(state.timer)

    timer =
      if state.active_requests <= 1 do
        Process.send_after(self(), {:print_page}, state.after_load_delay)
      end
    {:noreply, %{state | page_loaded?: true, timer: timer}}
  end

  def handle_frame({:text, %{"method" => "Inspector.targetCrashed"}} = frame, state) do
    log_debug("Inspector crashed: #{inspect(frame)}")
    GenServer.reply(state.reply_to, {:error, :crashed, :crashed})
    {:stop, :abnormal, state}
  end

  def handle_frame(_frame, state) do
    {:noreply, state}
  end

  def handle_info({:override_print_page}, state) do
    method(state.socket, "Page.stopLoading", %{}, 999)
    send(self(), {:print_page})
    {:noreply, %{state | canceled: true}}
  end

  def handle_info({:print_page}, state) do
    if state.timer, do: Process.cancel_timer(state.timer)
    if state.print_timer, do: Process.cancel_timer(state.print_timer)

    method(state.socket, "Page.printToPDF", state.print_options, @pdf_req_id)
    {:noreply, %{state | timer: nil, print_timer: nil}}
  end

  def handle_info(_msg, state) do
    {:noreply, state}
  end

  def handle_error(reason, state) do
    log_debug("HPDF Error: #{inspect(reason)}")
    {:noreply, state}
  end

  def handle_close(_reason, state) do
    {:noreply, state}
  end

  def terminate(_reason, state) do
    Process.exit(state.receiver, :kill)
    :ok
  end

  defp handle_pdf(pdf_data, state) do
    data = Base.decode64(pdf_data)
    case data do
      {:ok, bytes} ->
        GenServer.reply(state.reply_to, {:ok, bytes})
      {:error, reason} ->
        GenServer.reply(state.reply_to, {:error, reason})
    end
  end

  defp method(socket, meth, params, nil) do
    method(socket, meth, params, 56)
  end

  defp method(socket, meth, params, id) do
    args = %{method: meth, id: id, params: params}
    Socket.Web.send(socket, {:text, Poison.encode!(args)})
  end

  defp updated_headers_for_request(
    %{"headers" => headers},
    %{page_headers: nil}
  ) do
    headers
  end

  defp updated_headers_for_request(
    %{"headers" => req_headers, "url" => req_url},
    %{page_headers: headers, page_url: url, include_headers_on_same_domain: true}
  ) when is_map(headers) do
    uri = URI.parse(url)
    req_uri = URI.parse(req_url)

    if uri.host == req_uri.host do
      Map.merge(req_headers, headers)
    else
      req_headers
    end
  end

  defp updated_headers_for_request(%{"headers" => req_headers}, _) do
    req_headers
  end

  defp log_debug(message) do
    Logger.debug(message)
  end
end

--------------------------------------------------------------------------------
/lib/hpdf/receiver.ex:
--------------------------------------------------------------------------------
defmodule HPDF.Receiver do
  @moduledoc false

  # Receives messages from the raw web socket.
5 | require Logger 6 | 7 | def start_link({socket, owning_pid}, _opts \\ []) do 8 | pid = spawn_link(__MODULE__, :listen, [socket, owning_pid, %{}]) 9 | {:ok, pid} 10 | end 11 | 12 | def listen(socket, owning_pid, state) do 13 | case Socket.Web.recv(socket) do 14 | :ok -> {:socket_recv, :ok} 15 | :close -> {:socket_close, :close} 16 | {:ok, msg} -> 17 | Process.send owning_pid, {:socket_recv, {:ok, msg}}, [] 18 | listen socket, owning_pid, state 19 | {:error, thing} -> 20 | Process.send owning_pid, {:socket_recv, {:error, thing}}, [] 21 | listen socket, owning_pid, state 22 | {:close, reason} -> 23 | Process.send owning_pid, {:socket_close, reason}, [] 24 | msg -> 25 | Process.send owning_pid, {:socket_recv, {:unknown, msg}}, [] 26 | listen socket, owning_pid, state 27 | end 28 | end 29 | end 30 | -------------------------------------------------------------------------------- /lib/hpdf/web_socket.ex: -------------------------------------------------------------------------------- 1 | defmodule HPDF.WebSocket do 2 | @moduledoc false 3 | 4 | # Converts web socket callbacks into something more useful 5 | 6 | defmacro __using__(_opts \\ []) do 7 | quote do 8 | use GenServer 9 | 10 | require Logger 11 | 12 | alias HPDF.Receiver 13 | 14 | def start_link(args, options \\ []) do 15 | GenServer.start_link(__MODULE__, args, options) 16 | end 17 | 18 | def start(args, options \\ []) do 19 | GenServer.start(__MODULE__, args, options) 20 | end 21 | 22 | def init(args) do 23 | socket_args = Keyword.fetch!(args, :socket_args) 24 | with {:ok, socket} <- apply(Socket.Web, :connect, socket_args) do 25 | {:ok, receiver} = Receiver.start_link({socket, self()}) 26 | Process.send_after(self(), :on_connect, 10) 27 | {:ok, %{socket: socket, receiver: receiver, args: args}} 28 | end 29 | end 30 | 31 | def send_frame(pid, frame) do 32 | GenServer.cast(pid, {:send_frame, frame}) 33 | end 34 | 35 | defp do_send_frame(socket, frame) do 36 | Socket.Web.send(socket, frame) 37 | end 38 | 39 | 
def close(socket) do 40 | Socket.Web.close(socket) 41 | end 42 | 43 | def handle_cast({:send_frame, frame}, %{socket: socket} = state) do 44 | do_send_frame(socket, frame) 45 | {:noreply, state} 46 | end 47 | 48 | def handle_info(:on_connect, state), do: handle_connect(state) 49 | 50 | def handle_info({:socket_recv, :ok}, state), do: {:noreply, state} 51 | 52 | def handle_info({:socket_close, reason}, state), do: handle_close(reason, state) 53 | 54 | def handle_info({:socket_recv, {:ok, {:text, msg}}}, state) do 55 | body = Poison.decode!(msg) 56 | handle_frame({:text, body}, state) 57 | end 58 | 59 | def handle_info({:socket_recv, {:ok, {:close, :abnormal, reason}}}, state) do 60 | handle_close({:abnormal, reason}, state) 61 | end 62 | 63 | def handle_info({:socket_recv, {:error, :ebadf}}, state), do: {:stop, :lost_connection, state} 64 | def handle_info({:socket_recv, {:error, reason}}, state), do: handle_error(reason, state) 65 | 66 | def handle_connect(state), do: {:noreply, state} 67 | def handle_frame(_msg, state), do: {:noreply, state} 68 | def handle_error(_reason, state), do: {:noreply, state} 69 | def handle_close(:close, state), do: {:stop, :normal, state} 70 | def handle_close({:abnormal, _reason}, state), do: {:stop, :abnormal, state} 71 | def handle_close(_msg, state), do: {:stop, :normal, state} 72 | 73 | defoverridable start_link: 1, 74 | start_link: 2, 75 | handle_connect: 1, 76 | handle_frame: 2, 77 | handle_error: 2, 78 | handle_close: 2, 79 | init: 1 80 | end 81 | end 82 | end 83 | -------------------------------------------------------------------------------- /mix.exs: -------------------------------------------------------------------------------- 1 | defmodule HPDF.Mixfile do 2 | @moduledoc false 3 | use Mix.Project 4 | 5 | @version "0.3.1" 6 | @url "https://github.com/hassox/hpdf" 7 | @maintainers [ 8 | "Daniel Neighman", 9 | ] 10 | 11 | def project do 12 | [app: :hpdf, 13 | version: @version, 14 | 
elixir: "~> 1.5", 15 | package: package(), 16 | source_url: @url, 17 | maintainers: @maintainers, 18 | description: "PDF printer using headless Chrome", 19 | homepage_url: @url, 20 | build_embedded: Mix.env == :prod, 21 | start_permanent: Mix.env == :prod, 22 | docs: docs(), 23 | deps: deps()] 24 | end 25 | 26 | # Configuration for the OTP application 27 | # 28 | # Type "mix help compile.app" for more information 29 | def application do 30 | # Specify extra applications you'll use from Erlang/Elixir 31 | [extra_applications: [:logger], 32 | mod: {HPDF.Application, []}] 33 | end 34 | 35 | def docs do 36 | [ 37 | extras: ["README.md", "CHANGELOG.md"], 38 | source_ref: "v#{@version}" 39 | ] 40 | end 41 | 42 | # Dependencies can be Hex packages: 43 | # 44 | # {:my_dep, "~> 0.3.0"} 45 | # 46 | # Or git/path repositories: 47 | # 48 | # {:my_dep, git: "https://github.com/elixir-lang/my_dep.git", tag: "0.1.0"} 49 | # 50 | # Type "mix help deps" for more examples and options 51 | defp deps do 52 | [{:socket, "~> 0.3.12"}, 53 | {:httpotion, "~> 3.0.1"}, 54 | {:uuid, "~>1.1"}, 55 | {:poison, "~> 3.1.0" }, 56 | {:ex_doc, ">= 0.0.0", only: :dev}, 57 | ] 58 | end 59 | 60 | defp package do 61 | [ 62 | maintainers: @maintainers, 63 | licenses: ["MIT"], 64 | links: %{github: @url}, 65 | files: ~w(lib) ++ ~w(CHANGELOG.md LICENSE mix.exs README.md) 66 | ] 67 | end 68 | end 69 | -------------------------------------------------------------------------------- /mix.lock: -------------------------------------------------------------------------------- 1 | %{"earmark": {:hex, :earmark, "1.2.3", "206eb2e2ac1a794aa5256f3982de7a76bf4579ff91cb28d0e17ea2c9491e46a4", [], [], "hexpm"}, 2 | "ex_doc": {:hex, :ex_doc, "0.18.1", "37c69d2ef62f24928c1f4fdc7c724ea04aecfdf500c4329185f8e3649c915baf", [], [{:earmark, "~> 1.1", [hex: :earmark, repo: "hexpm", optional: false]}], "hexpm"}, 3 | "httpotion": {:hex, :httpotion, "3.0.2", "525b9bfeb592c914a61a8ee31fdde3871e1861dfe805f8ee5f711f9f11a93483", [:mix], 
[{:ibrowse, "~> 4.2", [hex: :ibrowse, repo: "hexpm", optional: false]}], "hexpm"}, 4 | "ibrowse": {:hex, :ibrowse, "4.4.0", "2d923325efe0d2cb09b9c6a047b2835a5eda69d8a47ed6ff8bc03628b764e991", [:rebar3], [], "hexpm"}, 5 | "poison": {:hex, :poison, "3.1.0", "d9eb636610e096f86f25d9a46f35a9facac35609a7591b3be3326e99a0484665", [:mix], [], "hexpm"}, 6 | "socket": {:hex, :socket, "0.3.12", "4a6543815136503fee67eff0932da1742fad83f84c49130c854114153cc549a6", [:mix], [], "hexpm"}, 7 | "uuid": {:hex, :uuid, "1.1.7", "007afd58273bc0bc7f849c3bdc763e2f8124e83b957e515368c498b641f7ab69", [:mix], [], "hexpm"}} 8 | -------------------------------------------------------------------------------- /test/expdf_test.exs: -------------------------------------------------------------------------------- 1 | defmodule HPDFTest do 2 | use ExUnit.Case 3 | doctest HPDF 4 | 5 | test "the truth" do 6 | assert 1 + 1 == 2 7 | end 8 | end 9 | -------------------------------------------------------------------------------- /test/test_helper.exs: -------------------------------------------------------------------------------- 1 | ExUnit.start() 2 | --------------------------------------------------------------------------------