9 |
10 | ## About
11 | FetchBot is a library and shell command that provides a simple JSON API for performing human-like interactions and
12 | data extraction on any website. It is built on top of [puppeteer](https://github.com/GoogleChrome/puppeteer).
13 |
14 | **Simple working principle:**
15 |
16 |
17 | **Extended data fetch working principle:**
18 |
19 |
20 | **Introduction video (German):**
21 |
22 | [Introduction video on YouTube](https://www.youtube.com/watch?v=t71saoi4slQ)
23 |
24 |
25 | **Using FetchBot you can do both:**
26 | - automate website interactions like a human
27 | - treat website(s) like an API and use fetched data in your project.
28 |
29 | FetchBot has an "event listener like" system that turns your browser into a bot that knows what to do when the URL
30 | changes. The "event" is a URL or regex, and its configuration is executed once the URL/pattern matches the currently
31 | opened one. From there on it's up to you to configure a friendly bot or a crazy zombie.
32 |
33 | ````javascript
34 | const FetchBot = require('fetchbot');
35 |
36 | (async () => {
37 | const myFetchBotInstance = new FetchBot({attached: true});
38 | const resultForJob1 = await myFetchBotInstance.runAndStandby('/path/to/job1.json');
39 | const resultForJob2 = await myFetchBotInstance.runAndStandby('/path/to/job2.json');
40 | await myFetchBotInstance.exit();
41 | // Now do something with the results
42 | console.log(resultForJob1, resultForJob2);
43 | })();
44 | ````
45 |
46 | ## Installation
47 |
48 | **NOTICE: FetchBot does not run on ARM architectures yet**
49 |
50 | ### Short installation (works well on a mac)
51 |
52 | You can install via npm in your project using:
53 | ```bash
54 | npm install --save fetchbot
55 | ```
56 |
57 |
58 | ### Safe installation (for Debian/Ubuntu or other Linux systems)
59 | Ensure the dependencies below are installed on Debian/Ubuntu systems:
60 |
61 | ````bash
62 | apt-get install gconf-service libasound2 libatk1.0-0 libc6 libcairo2 libcups2 libdbus-1-3 libexpat1 libfontconfig1 libgcc1 libgconf-2-4 libgdk-pixbuf2.0-0 libglib2.0-0 libgtk-3-0 libnspr4 libpango-1.0-0 libpangocairo-1.0-0 libstdc++6 libx11-6 libx11-xcb1 libxcb1 libxcomposite1 libxcursor1 libxdamage1 libxext6 libxfixes3 libxi6 libxrandr2 libxrender1 libxss1 libxtst6 ca-certificates fonts-liberation libappindicator1 libnss3 lsb-release xdg-utils wget
63 | ````
64 |
65 | For other operating systems, see puppeteer's
66 | [troubleshooting section](https://github.com/GoogleChrome/puppeteer/blob/master/docs/troubleshooting.md) for puppeteer-related
67 | problems.
68 |
69 |
70 | **For other problems, open an issue here.**
71 |
72 | ### Use as library in your own project
73 |
74 | To get the most out of FetchBot, it can also be integrated into a software project as a third-party library.
75 | This opens up many possibilities; a list of use cases will follow soon.
76 | ````bash
77 | $ cd /my/existing/project/
78 | $ npm install fetchbot
79 | ````
80 | ### Use as global command
81 |
82 | ````bash
83 | $ sudo npm install -g fetchbot --unsafe-perm=true
84 |
85 | # --unsafe-perm=true is currently required due to global install issues in puppeteer
86 | # https://github.com/GoogleChrome/puppeteer/issues/375#issuecomment-363466257
87 | ````
88 |
89 | ### Options
90 | Many options that control browser and page behavior can be set directly in the configuration object passed to FetchBot.
91 | All of these options can also be passed on the command line.
92 |
93 | > To get a complete list of the command line options, type
94 | ````bash
95 | $ fetchbot --help
96 | ````
97 | or in a local installation
98 | ````bash
99 | $ ./node_modules/.bin/fetchbot --help
100 | ````
101 |
102 | #### Options params
103 | ````text
104 | attached: boolean | default=false            Specifies whether the browser window is shown
105 | trust:    boolean | default=false            Open insecure https pages without a warning
106 | width:    number  | default=800              Browser and viewport width
107 | height:   number  | default=600              Browser and viewport height
108 | wait:     number  | default=750              Delay after each command before execution continues
109 | slowmo:   number  | default=0                Slows down the execution by the given number of milliseconds
110 | agent:    string  | default=Fetchbot-1.10.1  User agent string
111 | debug:    boolean | default=false            Determines whether debug/logging messages are shown
112 | ````
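The defaults listed above can be pictured as a base object that user options are merged into. The following is only a sketch of that idea, not FetchBot's own code: `resolveOptions` is a hypothetical helper, and rejecting unknown keys is an assumption made here so that typos surface early.

```javascript
// Defaults copied from the options table above.
const DEFAULT_OPTIONS = {
  attached: false,
  trust: false,
  width: 800,
  height: 600,
  wait: 750,
  slowmo: 0,
  agent: 'Fetchbot-1.10.1',
  debug: false
};

// Hypothetical helper (not part of FetchBot): returns the effective options
// for a partial options object and throws on keys the table does not list.
function resolveOptions(userOptions = {}) {
  for (const key of Object.keys(userOptions)) {
    if (!Object.prototype.hasOwnProperty.call(DEFAULT_OPTIONS, key)) {
      throw new Error(`Unknown option: ${key}`);
    }
  }
  // Later spreads win, so user values override the defaults.
  return { ...DEFAULT_OPTIONS, ...userOptions };
}

console.log(resolveOptions({ attached: true, slowmo: 250 }));
```

Options that are not passed keep their default, so `{attached: true, slowmo: 250}` still yields a width of 800 and a height of 600.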
113 |
114 |
115 | #### Pass options via command line
116 | > Command line input example
117 | ````bash
118 | $ fetchbot --job=./path/to/job/file.json --slowmo=250 --output=a-json-file.json --attached --debug
119 | ````
120 |
121 | #### Pass options as configuration object in the library
122 | ````javascript
123 | const FetchBot = require('fetchbot');
124 |
125 | (async () => {
126 |
127 | // Pass a path to a job configuration file
128 | const fetchbot = new FetchBot({attached: false});
129 | let fetchBotData = await fetchbot.runAndExit('./path/to/job/file.json');
130 |
131 | console.log(fetchBotData);
132 |
133 |
134 | // Or pass a configuration object directly (a second instance with its own options)
135 | const configuredFetchbot = new FetchBot({
136 | "attached": true,
137 | "slowmo": 250,
138 | "width": 1280,
139 | "height": 1024,
140 | "trust": true
141 | });
142 |
143 | fetchBotData = await configuredFetchbot.runAndExit({
144 | "https://google.com": {
145 | "root": true,
146 | "type": [
147 | [
148 | "input",
149 | "puppeteer-fetchbot aoepeople"
150 | ],
151 | [
152 | "input",
153 | "\n"
154 | ]
155 | ]
156 | },
157 | "/search": {
158 | "fetch": {
159 | "h3.r > a AS headlines": [],
160 | "h3.r > a AS links": {
161 | "attr": "href",
162 | "type": []
163 | }
164 | },
165 | "waitFor": [
166 | [
167 | 1000
168 | ]
169 | ]
170 | }
171 | });
172 | console.log(fetchBotData);
173 | })();
174 | ````
175 |
176 | ## Job configuration
177 | A job configuration is a JSON object whose top-level keys are URIs.
178 | > Example of the configuration's top level
179 |
180 | ````json
181 | {
182 | "https://github.com/aoepeople": {"root":true}
183 | }
184 | ````
185 |
186 | ````json
187 | {
188 | "https://github.com/aoepeople": {"root":true},
189 | "https://github.com/aoepeople/home.html": [{}, {}],
190 | "https://www.aoe.com/en/solutions.html": {"root":true}
191 | }
192 | ````
193 | Each top-level entry is one of two kinds:
194 | - **Root** objects
195 | - **Stopover** objects (can be wrapped in arrays)
196 |
197 | ### Root objects
198 |
199 | A root-level URL forces FetchBot to open that page immediately. A single configuration may contain multiple root
200 | objects. Once all root URLs have been visited, the FetchBot job is finished and the
201 | fetched data is returned (see **Data Fetching**).
202 |
203 | >Example
204 | ````json
205 | {
206 | "https://www.aoe.com/en/": {
207 | "root":true,
208 | "click":"nav.main-menu.ng-scope > ul > li:nth-child(2) > a"
209 | }
210 | }
211 | ````
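To make the distinction concrete, the sketch below separates the keys of a job configuration into root URLs and everything else. `splitJob` is a hypothetical helper written for illustration; it assumes, as described above, that only a plain object with `"root": true` counts as a root.

```javascript
// Hypothetical helper (not part of FetchBot): lists root URLs and the
// remaining (stopover) URLs of a job configuration in declaration order.
function splitJob(job) {
  const roots = [];
  const stopovers = [];
  for (const [url, config] of Object.entries(job)) {
    // Array-wrapped entries are stopovers; a root is a single object with root: true.
    const isRoot = config !== null && !Array.isArray(config) && config.root === true;
    (isRoot ? roots : stopovers).push(url);
  }
  return { roots, stopovers };
}

const job = {
  'https://github.com/aoepeople': { root: true },
  'https://github.com/aoepeople/home.html': [{}, {}],
  'https://www.aoe.com/en/solutions.html': { root: true }
};

console.log(splitJob(job));
```

For the three-entry job shown earlier, this reports two root URLs and one stopover URL.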
212 |
213 | ### Stopover objects
214 |
215 | Stopover objects do **not** have the root property. These objects behave differently and can be understood a bit like
216 | event listeners. Once the browser changes the URL (e.g. after a form submission on a root page or a clicked link) and the
217 | opened URL matches a stopover URL, its configuration is applied. Once a configuration has been applied to an open
218 | page, the object is immediately removed from the FetchBot job list.
219 |
220 | > Syntax
221 |
222 | ````json
223 | {
224 |
225 | "https://www.aoe.com/en/solutions.html": {
226 | "click":"nav.main-menu.ng-scope > ul > li:nth-child(2) > a"
227 | },
228 |
229 | "https://www.aoe.com/en/products.html": [
230 | {
231 | "click":"[data-qa=\"header-navigation-search-icon\"]"
232 | },
233 | {
234 | "type":[["#city-input-field", "Open Source"]],
235 | "click":"#search"
236 | }
237 | ]
238 | }
239 | ````
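The "apply once, then remove" behavior described above can be sketched as follows. This is an illustration under stated assumptions, not FetchBot's implementation: `matchesUrl` and `takeStopover` are hypothetical names, a key is tried as a regular expression first and as a plain substring otherwise, and array-wrapped stopovers are not handled.

```javascript
// Hypothetical matcher: treats the stopover key as a regular expression and
// falls back to substring matching if the key is not a valid pattern.
function matchesUrl(pattern, currentUrl) {
  try {
    if (new RegExp(pattern).test(currentUrl)) return true;
  } catch (err) {
    // Not a valid regular expression; fall through to substring matching.
  }
  return currentUrl.includes(pattern);
}

// Returns the configuration of the first stopover matching currentUrl and
// removes it from the pending list, so each stopover is applied at most once.
function takeStopover(pending, currentUrl) {
  const key = Object.keys(pending).find((p) => matchesUrl(p, currentUrl));
  if (key === undefined) return null;
  const config = pending[key];
  delete pending[key];
  return config;
}

const pending = {
  '/search': { waitFor: [[1000]] },
  'https://www.aoe.com/en/solutions.html': { click: '#search' }
};

// First visit consumes the "/search" stopover, a second visit finds nothing.
console.log(takeStopover(pending, 'https://www.google.com/search?q=fetchbot'));
console.log(takeStopover(pending, 'https://www.google.com/search?q=fetchbot'));
```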
240 |
241 |
242 |
243 | ### Command types for interaction
244 |
245 | There are currently three ways to call page commands:
246 |
247 | - Without a parameter (no argument action)
248 | - With a single argument (single argument action)
249 | - With multiple arguments (multiple arguments action)
250 |
251 | **Note:** Any single argument action can also be called using the multiple arguments form
252 |
253 | #### No argument action
254 |
255 | > Syntax for e.g. page.reload()
256 | ````json
257 | {
258 | "reload":null
259 | }
260 | ````
261 | #### Single argument action
262 | > Syntax for e.g. page.click("#myButton")
263 | ````json
264 | {
265 | "click":"#myButton"
266 | }
267 | ````
268 | #### Multiple arguments action
269 | > Syntax for e.g. page.type("#myInput", "Hello World")
270 | ````json
271 | {
272 | "type":[
273 | ["#myInput", "Hello World"]
274 | ]
275 | }
276 | ````
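One way to read the three forms above is that each of them boils down to a list of argument lists, one per call of the corresponding page method. The sketch below illustrates that reading; `normalizeAction` and `toCalls` are hypothetical helpers, and skipping the `root` and `fetch` keys is an assumption made here because they are configuration rather than page commands.

```javascript
// Keys that configure the job rather than name a page command (assumption).
const RESERVED_KEYS = new Set(['root', 'fetch']);

// Hypothetical normalizer: turns any of the three forms into a list of
// argument lists, one entry per call of the page method.
function normalizeAction(value) {
  if (value === null) return [[]];             // no argument action
  if (!Array.isArray(value)) return [[value]]; // single argument action
  return value;                                // multiple arguments action
}

// Expands a page configuration into a flat list of { method, args } calls.
function toCalls(config) {
  const calls = [];
  for (const [method, value] of Object.entries(config)) {
    if (RESERVED_KEYS.has(method)) continue;
    for (const args of normalizeAction(value)) {
      calls.push({ method, args });
    }
  }
  return calls;
}

console.log(toCalls({
  reload: null,
  click: '#myButton',
  type: [['#myInput', 'Hello World']]
}));
```

Under this reading, `"click": "#myButton"` and `"click": [["#myButton"]]` expand to the same call, which matches the note above.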
277 |
278 | ### Data Fetching aka. "Crawling"
279 |
280 | For data fetching there is a `fetch` API that simplifies puppeteer's evaluation interface.
281 | The `fetch` API provides declarative support for the following data types:
282 |
283 | - `Boolean`
284 | - `Number`
285 | - `String`
286 | - `Array of String(s)`
287 | - `Array of Number(s)`
288 | - `Object` with `type` and `attr`, for reading an attribute other than `textContent`
289 |
290 | Meaningful property names can be mapped to selectors using the `AS` or `as`
291 | keyword.
292 |
293 | Fetching the `textContent` attribute is the default behavior, but any other attribute can be accessed as well.
294 | To do so, replace the data type with an object containing `type` and `attr`:
295 | `type` is the data type as explained above and `attr` is the attribute to fetch.
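The key and value conventions just described can be sketched as two small parsing steps. This is an illustration only, not FetchBot's parser: `parseFetchKey` and `parseFetchValue` are hypothetical names, and using the selector itself as the property name when no `AS` is present is an assumption.

```javascript
// Hypothetical parser for a fetch key such as "h3.r > a AS headlines".
// Without an AS/as part, the selector doubles as the property name (assumption).
function parseFetchKey(key) {
  const match = /^(.*?)\s+(?:AS|as)\s+(\S+)\s*$/.exec(key);
  return match
    ? { selector: match[1].trim(), name: match[2] }
    : { selector: key.trim(), name: key.trim() };
}

// Hypothetical normalizer for a fetch value: a bare value declares the data
// type and implies textContent; an object carries its own type and attr.
function parseFetchValue(value) {
  const isConfig = value !== null && typeof value === 'object' && !Array.isArray(value);
  return isConfig
    ? { type: value.type, attr: value.attr }
    : { type: value, attr: 'textContent' };
}

const fetchConfig = {
  'h3.r > a AS headlines': [],
  'h3.r > a AS links': { attr: 'href', type: [] }
};

for (const [key, value] of Object.entries(fetchConfig)) {
  console.log({ ...parseFetchKey(key), ...parseFetchValue(value) });
}
```

Both entries use the same selector; they differ only in the property name and in the attribute that is read (`textContent` versus `href`).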
296 |
297 | **Fetch syntax**
298 |
299 |
300 | **The configuration above results in an object like in the example below**
301 |
302 |
303 |
304 | ### And now it's time to start interaction with a website
305 |
306 | Feel free to copy the example below, save it to a file, e.g. googlesearch.json, and execute it using the CLI tool.
307 |
308 | ```bash
309 | ./node_modules/.bin/fetchbot --job=googlesearch.json --debug --slowmo=250
310 | ````
311 |
312 | > Example job
313 | ```json
314 | {
315 | "https://google.com": {
316 | "root": true,
317 | "type": [
318 | [
319 | "input",
320 | "puppeteer-fetchbot aoepeople"
321 | ],
322 | [
323 | "input",
324 | "\n"
325 | ]
326 | ]
327 | },
328 | "/search": {
329 | "fetch": {
330 | "h3.r > a AS headlines": [],
331 | "h3.r > a AS links": {
332 | "attr": "href",
333 | "type": []
334 | }
335 | },
336 | "waitFor": [
337 | [
338 | 1000
339 | ]
340 | ]
341 | }
342 | }
343 | ````
344 | > Results in something like this
345 | `````json
346 | {
347 | "headlines": [
348 | "GitHub - AOEpeople/puppeteer-fetchbot: Library and Shell command ...",
349 | "AOE · GitHub",
350 | "fetchbot - npm"
351 | ],
352 | "links": [
353 | "https://github.com/AOEpeople/puppeteer-fetchbot",
354 | "https://github.com/AOEpeople",
355 | "https://www.npmjs.com/package/fetchbot"
356 | ]
357 | }
358 | `````
359 |
360 | A complete list of what is possible on a page is currently only available in the puppeteer documentation, in the
361 | [Page API Chapter](https://github.com/GoogleChrome/puppeteer/blob/master/docs/api.md#class-page).
362 |
363 | ## Examples
364 | ### Boilerplate (plain JS)
365 | ````javascript
366 | var FetchBot = require('fetchbot'),
367 |
368 | // Create an instance; the JSON job file is passed to runAndExit below
369 | myFetchBot = new FetchBot({attached: true, debug:true});
370 |
371 | myFetchBot
372 | .runAndExit('googlesearch.json')
373 | .then(function (result) {
374 | console.log(result);
375 | // {
376 | // "headlines": [
377 | // "GitHub - AOEpeople/puppeteer-fetchbot: Library and Shell command ...",
378 | // "AOE · GitHub",
379 | // "fetchbot - npm"
380 | // ],
381 | // "links": [
382 | // "https://github.com/AOEpeople/puppeteer-fetchbot",
383 | // "https://github.com/AOEpeople",
384 | // "https://www.npmjs.com/package/fetchbot"
385 | // ]
386 | // }
387 | });
388 | ````
389 |
390 | ### Conclusion
391 | FetchBot was introduced to speed up frontend development by automatically stepping through
392 | pages that are not part of the current user story. During development more and more use cases emerged, and it
393 | was a lot of fun building "batch-like" JSON files that turn the browser into a bot. FetchBot is written in [TypeScript](https://www.typescriptlang.org/) and transpiled during the build,
394 | which normally runs automatically during installation.
395 |
396 | Thanks to everyone who lent an open ear and offered a different perspective; all in all, you
397 | made FetchBot much better.
398 |
399 |
400 | [travis-icon]: https://travis-ci.org/AOEpeople/puppeteer-fetchbot.svg?branch=master
401 | [travis]: https://travis-ci.org/AOEpeople/puppeteer-fetchbot "Build status – Travis-CI"
402 |
403 | [codecov]: https://codecov.io/gh/AOEpeople/puppeteer-fetchbot "Code Coverage – Codecov"
404 | [codecov-icon]: https://codecov.io/gh/AOEpeople/puppeteer-fetchbot/branch/master/graph/badge.svg "Code Coverage – Codecov"
405 |
406 | [npm]: https://npmjs.com/package/fetchbot "FetchBot – on NPM"
407 | [npm-icon]: https://img.shields.io/npm/v/fetchbot.svg
408 | [license-icon]: https://img.shields.io/npm/l/fetchbot.svg
409 | [downl-icon]: https://img.shields.io/npm/dt/fetchbot.svg "Count of total downloads – NPM"
410 | [build]: https://github.com/AOEpeople/puppeteer-fetchbot/tree/master/dist
411 |
412 |
413 |
414 |
415 |
416 |
--------------------------------------------------------------------------------
/mocks/gitHubPage.htm:
--------------------------------------------------------------------------------
1 |
2 |
3 |
4 |
5 |
6 |
7 |
8 |
9 |
10 |
11 |
12 |
13 |
14 |
15 |
16 |
17 |
18 |
19 |
20 |
21 |
22 |
23 |
24 |