├── .gitignore
├── .travis.yml
├── LICENSE
├── README.md
├── download_links.py
├── main.py
└── requirements.txt

/.gitignore:
--------------------------------------------------------------------------------
1 | *.pyc
--------------------------------------------------------------------------------
/.travis.yml:
--------------------------------------------------------------------------------
1 | language: python
2 | python:
3 |   - "3.8"
4 |   - "3.7"
5 |   - "3.6"
6 |   - "3.5"
7 | cache: pip
8 | install:
9 |   - pip install -r requirements.txt
10 | script:
11 |   - python main.py -h
12 |
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | MIT License
2 |
3 | Copyright (c) 2019 Gunjan Nandy
4 |
5 | Permission is hereby granted, free of charge, to any person obtaining a copy
6 | of this software and associated documentation files (the "Software"), to deal
7 | in the Software without restriction, including without limitation the rights
8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 |
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 |
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # tutorial-pdf-downloader
2 |
3 | [](https://travis-ci.com/Gunjan933/tutorial-pdf-downloader) [](https://snyk.io//test/github/Gunjan933/tutorial-pdf-downloader?targetFile=requirements.txt) [](https://github.com/Gunjan933/tutorial-pdf-downloader/graphs/contributors) [](#python-support)
4 |
5 | Downloads full tutorial PDFs from **[Javatpoint](https://www.javatpoint.com/)**, **[Tutorialspoint](https://www.tutorialspoint.com/)**, and other websites.
6 |
7 | ## Disclaimer / Please note:
8 |
9 | These websites provide free, quality education by showing advertisements; that is their only source of income. Don't overuse this script, as it puts heavy pressure on their servers. It is meant for those who cannot afford a stable, sustained internet connection. I believe education should be free for all, but that doesn't mean I support piracy; this is for educational purposes only. Always support their work, either by paying or by visiting their websites.
10 |
11 | ## Usage
12 |
13 | ### First install dependencies:
14 | Make sure you have pip installed, then run:
15 |
16 | ```console
17 | pip install --user -r requirements.txt
18 | ```
19 | ### Set up download links:
20 | * Copy links to any tutorials from **[Javatpoint](https://www.javatpoint.com/)**, **[Tutorialspoint](https://www.tutorialspoint.com/)**, or **both**, and paste them into `download_links.py`.
21 |   - To see examples, open **[download_links.py](./download_links.py)**; a minimal illustrative entry is also shown after this list.
22 | * If you want to download all the listed tutorials from both sites, jump to the next step.
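For illustration, `download_links.py` simply defines a `download_list` collection of tutorial URLs; commented-out entries are ignored. The URLs below are only examples:

```python
download_list = {
    "https://www.javatpoint.com/apache-ant-tutorial",
    # "https://www.tutorialspoint.com/java/index.htm",
}
```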
23 |
24 | ### Run the Downloader:
25 | * To download each link from `download_links.py` that you set up earlier:
26 |
27 | ```console
28 | python main.py -u
29 | ```
30 | * To download all tutorials from Javatpoint:
31 |
32 | ```console
33 | python main.py -j
34 | ```
35 | * To download all tutorials from Tutorialspoint:
36 |
37 | ```console
38 | python main.py -t
39 | ```
40 | * To download all tutorials from Javatpoint and Tutorialspoint:
41 |
42 | ```console
43 | python main.py -a
44 | ```
45 | * To check usage or help:
46 |
47 | ```console
48 | python main.py -h
49 | ```
50 |
51 | ## Save location
52 |
53 | PDFs are saved in a `downloads` folder next to the cloned repository, i.e., in the parent directory into which you cloned it:
54 | ```
55 | -- some_parent_directory
56 |    |
57 |    |-- tutorial-pdf-downloader
58 |    |    |
59 |    |    |--main.py
60 |    |    |--download_links.py
61 |    |    |--requirements.txt
62 |    |    |--README.md
63 |    |
64 |    |
65 |    |-- downloads
66 |         |
67 |         |--artificial-intelligence
68 |         |--mobile-computing
69 | ```
70 |
71 | ## Changelog
72 | * Added **[Tutorialspoint](https://www.tutorialspoint.com/)** support.
73 |
74 | ## Future work
75 | * ~~Add **[Tutorialspoint](https://www.tutorialspoint.com/)** support.~~
76 | * Add a GUI.
77 |
78 | ## Bugs / Issues
79 | * If you find any, please report them in the issues section.
80 |
81 | ## Contribute
82 | * Any contributions or suggestions are welcome.
--------------------------------------------------------------------------------
/download_links.py:
--------------------------------------------------------------------------------
1 | download_list = {
2 |     # "https://www.javatpoint.com/aptitude/quantitative" ,
3 |     # "https://www.tutorialspoint.com/react_native/react_native_alert.htm" ,
4 |     "https://www.javatpoint.com/apache-ant-tutorial" ,
5 |     # "https://www.tutorialspoint.com/java/index.htm" ,
6 | }
7 |
--------------------------------------------------------------------------------
/main.py:
--------------------------------------------------------------------------------
1 | from download_links import *
2 | import multiprocessing as mp
3 | from bs4 import BeautifulSoup
4 | from multiprocessing import Process
5 | from multiprocessing import Pool as process_pool
6 | from multiprocessing.dummy import Pool as thread_pool
7 | import requests, time, os, sys, weasyprint, argparse, fnmatch, shutil
8 |
9 | def tutorialspoint_get_page(url):  # fetch one Tutorialspoint page, retrying until it succeeds
10 |     domain_name = "https://www.tutorialspoint.com"
11 |     while True:
12 |         try:
13 |             page_response = requests.get(url, headers={'User-Agent': 'Chrome'}, timeout=5)  # browser-like UA, 5 s timeout
14 |             soup = BeautifulSoup(page_response.content, "html.parser")
15 |             str_soup = str(soup)
16 |         except Exception:
17 |             print(" >> Could not connect to " + url.split("/")[-1] + ", retrying")
18 |             time.sleep(1)
19 |             continue
20 |         else:
21 |             print(" >> Downloaded " + url.split("/")[-1])
22 |             break
23 |     page = str_soup[str_soup.find('
\n'
62 | head = head.replace("
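The dump of `main.py` is truncated above, so its PDF-generation step is not shown here. As a rough illustration of how a fetched tutorial page could be turned into a PDF with WeasyPrint (which `main.py` imports), here is a minimal, hypothetical sketch; the function name, output folder, and file-naming scheme are assumptions for illustration, not the project's actual code:

```python
# Hypothetical sketch only -- not taken from main.py, whose PDF step is truncated above.
import os
import requests
import weasyprint

def save_page_as_pdf(url, out_dir="downloads"):  # illustrative name and output folder
    os.makedirs(out_dir, exist_ok=True)
    response = requests.get(url, headers={"User-Agent": "Chrome"}, timeout=5)
    response.raise_for_status()
    # base_url lets WeasyPrint resolve the page's relative image and stylesheet links.
    pdf_name = url.rstrip("/").split("/")[-1].replace(".htm", "") + ".pdf"
    weasyprint.HTML(string=response.text, base_url=url).write_pdf(os.path.join(out_dir, pdf_name))

if __name__ == "__main__":
    save_page_as_pdf("https://www.javatpoint.com/apache-ant-tutorial")
```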