├── .env.example
├── .gitignore
├── README.md
├── add_scheduled_task.bat
├── aws-lambda.js
├── package.json
├── run.bat
└── server.js

/.env.example:
--------------------------------------------------------------------------------
PACKT_EMAIL=
PACKT_PASSWORD=
--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
# Logs
logs
*.log

# Runtime data
pids
*.pid
*.seed

# Dependency directory
# https://docs.npmjs.com/misc/faq#should-i-check-my-node-modules-folder-into-git
node_modules

# Environment
.env

# Editors
.idea
output.txt
test
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
Grab a book a day for free from Packt Pub, https://www.packtpub.com/packt/offers/free-learning.

## 1. Install prerequisites

Install the script's dependencies from the cloned directory using the following command:

    npm install


## 2. Add credentials

Copy the example environment file into place with:

    cp .env.example .env

Or on Windows:

    copy .env.example .env

Then set your Packt email and password in `.env`.


## 3. Grab on a recurring basis

### Using Node
After that, run the script with the following command, which re-runs it every 5000 seconds:

    watch -n 5000 --differences node server.js

### Using Crontab
Or add it to your crontab:

    crontab -e

In the crontab, all paths **MUST** be absolute.
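
Cron runs jobs with a much smaller environment than your login shell (typically a minimal `PATH`, with your home directory as the working directory), which is why relative paths and a bare `node` are unreliable there. As a minimal sketch, the hypothetical helper `buildCronLine` below (not part of this repo) refuses relative paths and assembles a daily entry in the same `minute hour * * * command >> log` shape as the cron entries used in this section:

```javascript
const path = require('path');

// Hypothetical helper (not part of grab_packt): builds a daily crontab entry
// of the form "MIN HOUR * * * NODE SCRIPT >> LOG".
// Every path must be absolute, because cron does not start the job from the repo directory.
function buildCronLine({ minute, hour, nodePath, scriptPath, logPath }) {
  for (const p of [nodePath, scriptPath, logPath]) {
    if (!path.posix.isAbsolute(p)) {
      throw new Error('Path must be absolute for cron: ' + p);
    }
  }
  if (!Number.isInteger(minute) || minute < 0 || minute > 59) {
    throw new RangeError('minute must be an integer from 0 to 59');
  }
  if (!Number.isInteger(hour) || hour < 0 || hour > 23) {
    throw new RangeError('hour must be an integer from 0 to 23');
  }
  // The three "*" fields mean: every day of the month, every month, every weekday.
  return `${minute} ${hour} * * * ${nodePath} ${scriptPath} >> ${logPath}`;
}

console.log(buildCronLine({
  minute: 0,
  hour: 14,
  nodePath: '/usr/local/bin/node',
  scriptPath: '/home/user/grab_packt/server.js',
  logPath: '/tmp/cron_output',
}));
// prints: 0 14 * * * /usr/local/bin/node /home/user/grab_packt/server.js >> /tmp/cron_output
```

Pasting the printed line into `crontab -e` gives a 14:00 daily schedule; the range checks only cover the two fields the helper fills in.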

Within the open cron editor window, add:

    0 14 * * * /usr/local/bin/node /Users///grab_packt/server.js >> /tmp/cron_output

If your server uses the UTC/BST timezone, you might want to set the crontab as follows:

    25 0 * * * /usr/bin/nodejs /home/user/grab_packt/server.js >> /tmp/cron_output

### Using Task Scheduler in Windows

Check the *run.bat* file in the repo and correct any paths to match your setup. Try running the script manually to verify that it works as expected.

Then add a scheduled task that executes run.bat every day by running:

    add_scheduled_task.bat

### OSX Error
If you get the message:
`crontab: temp file must be edited in place`

**Try:**
1) Add to `.bash_profile`
```sh
alias crontab="VIM_CRONTAB=true crontab"
```
2) Add to `.vimrc`
```vi
if $VIM_CRONTAB == "true"
    set nobackup
    set nowritebackup
endif
```
*note: .bash_profile might be called .profile*
*note: .vimrc and .bash_profile are located in the home directory: `~/`*
*Reference: http://superuser.com/a/750528*

### Using Launchd (OSX)
launchd is recommended over cron on OS X.

This runs on load and from then on every 24 hours (86400 seconds).
Just substitute `` for your own.
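
`StartInterval` is measured in seconds, so one day is 24 × 60 × 60 = 86400 seconds, and with `RunAtLoad` the first run happens when the agent is loaded. As a rough sketch under an idealized model (it ignores sleep, clock changes and other real-world delays; `plannedRuns` is a hypothetical name, not a launchd API), this lists the first few expected start times:

```javascript
// StartInterval is given in seconds: 24 hours * 60 minutes * 60 seconds.
const ONE_DAY_SECONDS = 24 * 60 * 60; // 86400

// Hypothetical helper: idealized start times for a job that runs once when it
// is loaded (RunAtLoad) and then every `intervalSeconds` (StartInterval).
function plannedRuns(loadedAt, intervalSeconds, count) {
  const runs = [];
  for (let i = 0; i < count; i++) {
    runs.push(new Date(loadedAt.getTime() + i * intervalSeconds * 1000));
  }
  return runs;
}

// Example: the agent is loaded at 09:30 UTC on 1 January 2016.
const loadedAt = new Date(Date.UTC(2016, 0, 1, 9, 30));
for (const run of plannedRuns(loadedAt, ONE_DAY_SECONDS, 3)) {
  console.log(run.toISOString());
}
// prints 2016-01-01T09:30:00.000Z, 2016-01-02T09:30:00.000Z, 2016-01-03T09:30:00.000Z
```

In this model the daily run time follows whenever the agent was loaded. If you would rather grab at a fixed time of day, launchd's `StartCalendarInterval` key (with `Hour` and `Minute` entries) is the usual alternative to `StartInterval`.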

*by daemon I am referring to the .plist file*

Navigate to the directory:
```sh
cd $HOME/Library/LaunchAgents
```

Create the file:
```sh
touch com..grab_pkt.plist
```

Edit the file:
```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>Label</key>
    <string>com..grab_pkt</string>

    <key>ProgramArguments</key>
    <array>
        <string>/usr/local/bin/node</string>
        <string>/Users//development/misc/grab_packt/server.js</string>
    </array>

    <key>Nice</key>
    <integer>1</integer>

    <key>StartInterval</key>
    <integer>86400</integer>

    <key>RunAtLoad</key>
    <true/>

    <key>StandardErrorPath</key>
    <string>/tmp/GrabPkt.err</string>

    <key>StandardOutPath</key>
    <string>/tmp/GrabPkt.out</string>
</dict>
</plist>
```

Load this daemon into the system:
```sh
launchctl load com..grab_pkt.plist
```
*to unload, just change `load` to `unload`*

Check the output of the script:
```sh
cat /tmp/GrabPkt.out
```
It should be similar to:
```sh
----------- Packt Grab Started -----------
Book Title: Learning Libgdx Game Development
Claim URL: https://www.packtpub.com/freelearning-claim/13277/21478
----------- Packt Grab Done --------------
```

Check for errors:
```sh
cat /tmp/GrabPkt.err
```
Mine is empty because there were no errors.

To test, I would:
- remove the `GrabPkt.out` file
- unload the daemon
- load the daemon
- check the output of the `GrabPkt.out` file

*reference: http://alvinalexander.com/mac-os-x/mac-osx-startup-crontab-launchd-jobs*

### Using in AWS Lambda

1. You need to [sign up](https://portal.aws.amazon.com/gp/aws/developer/registration/index.html) for an AWS account or use an existing one.
2. After signing in to the **AWS Management Console**, select a region and open the **AWS Lambda console**.
3. Choose **Get Started Now**.
   ![](http://docs.aws.amazon.com/lambda/latest/dg/images/gs-1-10.png)
   - Note: The console shows the **Get Started Now** page only if you do not have any Lambda functions yet. If you have already created functions, you will see the **Lambda** > **Functions** page instead. On that list page, choose **Create a Lambda function** to go to the **Lambda** > **New function** page.
4. On the **Select blueprint** page, choose **Skip**.
5. On the **Configure triggers** page, do the following:
   1. Choose **CloudWatch Events - Schedule**.
   2. Enter a rule name in **Rule name**.
   3. In **Schedule expression**, select `rate(1 day)`.
   4. Check **Enable trigger**.
   5. Choose **Next**.
6. On the **Configure function** page, do the following:
   1. Enter a function name in **Name**.
   2. Set **Runtime** to `Node.js 4.3` or above.
   3. Set **Code entry type** to `Upload a .ZIP file`.
   4. Zip the source code and **Upload** it.
      - Make sure to run `npm install` and configure `.env` before zipping the source code. Zip the contents of the project directory (including `node_modules` and `.env`) so that `aws-lambda.js` sits at the root of the archive.
   5. In the **Lambda function handler and role** section, do the following:
      1. Set **Handler** to `aws-lambda.handler`.
      2. In **Role**, choose **Create new role from template(s)**.
      3. In **Role name**, type a name for the role.
      4. Leave **Role templates** blank; your Lambda function already has the basic execution permission it needs.
   6. In the **Advanced settings** section, do the following:
      1. In **Memory (MB)**, choose `128`.
      2. In **Timeout**, enter `0` min `30` sec.
   7. Choose **Next**.
7. Choose **Create Function** to create the Lambda function.
   ![Imgur](http://i.imgur.com/S3YDeqw.png)
8. Choose **Test**.
9. On the **Input test event** page, enter `{}` in the window.
10. Choose **Save and test**.
11. Upon successful execution, view the results in the console.
    ![Imgur](http://i.imgur.com/TV2E1LO.png)

--------------------------------------------------------------------------------
/add_scheduled_task.bat:
--------------------------------------------------------------------------------
@echo off
cls

REM Temporary file for the current task list, and the name of our task
SET TASKSLIST=taskslist.txt
SET TASKNAME=Grab_Packt_Books

REM Look for an existing task with the same name
schtasks.exe /query > %TASKSLIST%
findstr /B /I %TASKNAME% %TASKSLIST% >nul

IF %errorlevel%==0 GOTO :delete
GOTO :create

:delete
REM Remove the old task first; execution then falls through to :create
echo Removing scheduled task %TASKNAME%
schtasks.exe /DELETE /TN "%TASKNAME%" /F >nul

:create
REM Run run.bat once a day (adjust the path to where you cloned the repo)
echo Creating scheduled task %TASKNAME%
schtasks.exe /Create /SC DAILY /TN "%TASKNAME%" /TR "C:\Users\%UserName%\Documents\GitHub\grab_packt\run.bat"
del "%TASKSLIST%" >nul

--------------------------------------------------------------------------------
/aws-lambda.js:
--------------------------------------------------------------------------------
require('dotenv').load({
    path: __dirname + '/.env'
});

var querystring = require('querystring');
var request = require('request');
var cheerio = require('cheerio');
var loginDetails = {
    email: process.env.PACKT_EMAIL,
    password: process.env.PACKT_PASSWORD,
    op: "Login",
    form_id: "packt_user_login_form",
    form_build_id: ""
};
var url = 'https://www.packtpub.com/packt/offers/free-learning';
var loginError = 'Sorry, you entered an invalid email address and password combination.';
var getBookUrl;
var bookTitle;

// Packt keeps the login session in cookies, so turn the cookie jar on
request = request.defaults({
    jar: true
});

exports.handler = function(event, context, callback) {

    // 1. Fetch the free-learning page to read the claim link, title and login form token
    request(url, function(err, res, body) {
        if (err) {
            callback('Request failed');
            return;
        }

        var $ = cheerio.load(body);
        var claimHref = $("a.twelve-days-claim").attr("href");
        if (!claimHref) {
            callback('Claim link not found on the free-learning page');
            return;
        }
        getBookUrl = 'https://www.packtpub.com' + claimHref;
        bookTitle = $(".dotd-title").text().trim();
        var newFormId = $("input[type='hidden'][id^=form][value^=form]").val();

        if (newFormId) {
            loginDetails.form_build_id = newFormId;
        }

        // 2. Log in by posting the login form back to the same page
        request.post({
            uri: url,
            headers: {
                'content-type': 'application/x-www-form-urlencoded'
            },
            body: querystring.stringify(loginDetails)
        }, function(err, res, body) {
            if (err) {
                callback('Login failed');
                return;
            }
            var $ = cheerio.load(body);
            var loginFailed = $("div.error:contains('" + loginError + "')");
            if (loginFailed.length) {
                callback('Login failed, please check your email address and password');
                return;
            }

            // 3. Request the claim URL while logged in to claim the book
            request(getBookUrl, function(err, res, body) {
                if (err) {
                    callback('Request Error');
                    return;
                }

                var result = {
                    title: bookTitle,
                    url: getBookUrl
                };
                console.log(result);
                // Report success explicitly so the invocation ends with a result
                callback(null, result);
            });
        });
    });
};

--------------------------------------------------------------------------------
/package.json:
--------------------------------------------------------------------------------
{
  "name": "scrapper",
  "version": "0.0.0",
  "description": "grabbing a book a day",
  "main": "server.js",
  "repository": "",
  "author": "",
  "dependencies": {
    "cheerio": "latest",
    "dotenv": "^1.1.0",
    "request": "latest"
  }
}

--------------------------------------------------------------------------------
/run.bat:
--------------------------------------------------------------------------------
"C:\Program Files\nodejs\node.exe" "C:\Users\%UserName%\Documents\GitHub\grab_packt\server.js" >> "C:\Users\%UserName%\Documents\GitHub\grab_packt\output.txt"
--------------------------------------------------------------------------------
/server.js:
--------------------------------------------------------------------------------
require('dotenv').load({
    path: __dirname + '/.env'
});

var querystring = require('querystring');
var request = require('request');
var cheerio = require('cheerio');
var loginDetails = {
    email: process.env.PACKT_EMAIL,
    password: process.env.PACKT_PASSWORD,
    op: "Login",
    form_id: "packt_user_login_form",
    form_build_id: ""
};
var url = 'https://www.packtpub.com/packt/offers/free-learning';
var loginError = 'Sorry, you entered an invalid email address and password combination.';
var getBookUrl;
var bookTitle;

// Packt keeps the login session in cookies, so turn the cookie jar on
request = request.defaults({
    jar: true
});

console.log('----------- Packt Grab Started -----------');

// 1. Fetch the free-learning page to read the claim link, title and login form token
request(url, function(err, res, body) {
    if (err) {
        console.error('Request failed');
        console.log('----------- Packt Grab Done --------------');
        return;
    }

    var $ = cheerio.load(body);
    getBookUrl = $("a.twelve-days-claim").attr("href");
    bookTitle = $(".dotd-title").text().trim();
    var newFormId = $("input[type='hidden'][id^=form][value^=form]").val();

    if (!getBookUrl) {
        console.error('Claim link not found on the free-learning page');
        console.log('----------- Packt Grab Done --------------');
        return;
    }

    if (newFormId) {
        loginDetails.form_build_id = newFormId;
    }

    // 2. Log in by posting the login form back to the same page
    request.post({
        uri: url,
        headers: {
            'content-type': 'application/x-www-form-urlencoded'
        },
        body: querystring.stringify(loginDetails)
    }, function(err, res, body) {
        if (err) {
            console.error('Login failed');
            console.log('----------- Packt Grab Done --------------');
            return;
        }
        var $ = cheerio.load(body);
        var loginFailed = $("div.error:contains('" + loginError + "')");
        if (loginFailed.length) {
            // Written to both stderr and stdout, so the message also reaches output
            // files that only capture stdout (as in the cron and run.bat examples)
            console.error('Login failed, please check your email address and password');
            console.log('Login failed, please check your email address and password');
            console.log('----------- Packt Grab Done --------------');
            return;
        }

        // 3. Request the claim URL while logged in to claim the book
        request('https://www.packtpub.com' + getBookUrl, function(err, res, body) {
            if (err) {
                console.error('Request Error');
                console.log('----------- Packt Grab Done --------------');
                return;
            }

            console.log('Book Title: ' + bookTitle);
            console.log('Claim URL: https://www.packtpub.com' + getBookUrl);
            console.log('----------- Packt Grab Done --------------');
        });
    });
});
--------------------------------------------------------------------------------
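
Both scripts build the claim URL by concatenating `https://www.packtpub.com` with the `href` of `a.twelve-days-claim`. That works as long as the `href` is a root-relative path. A minimal sketch of a more defensive alternative, assuming Node's WHATWG `URL` class from the built-in `url` module (available in Node 7 and later, so not on the old Node.js 4.3 Lambda runtime) and using the hypothetical helper name `resolveClaimUrl`:

```javascript
const { URL } = require('url');

const BASE_URL = 'https://www.packtpub.com';

// Hypothetical helper: turn the claim link's href into an absolute URL.
// Returns null when the page had no claim link (href undefined or empty).
function resolveClaimUrl(href, base = BASE_URL) {
  if (!href) {
    return null;
  }
  // Root-relative hrefs ("/freelearning-claim/...") are resolved against the base;
  // hrefs that are already absolute keep their own host.
  return new URL(href, base).href;
}

console.log(resolveClaimUrl('/freelearning-claim/13277/21478'));
// prints https://www.packtpub.com/freelearning-claim/13277/21478
console.log(resolveClaimUrl(undefined));
// prints null
```

Resolving against a base URL instead of concatenating strings also copes with absolute links and turns a missing link into an explicit `null` that the caller can report.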