├── tst.txt
├── img
    ├── Thumbs.db
    ├── wikilist.png
    ├── UsingWikiPDF.gif
    └── wikipdfcolor.png
├── README.md
├── wikipdf.py
└── wikipdfcolor.py


/tst.txt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/iiviigames/wikipdf/HEAD/tst.txt

--------------------------------------------------------------------------------
/img/Thumbs.db:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/iiviigames/wikipdf/HEAD/img/Thumbs.db

--------------------------------------------------------------------------------
/img/wikilist.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/iiviigames/wikipdf/HEAD/img/wikilist.png

--------------------------------------------------------------------------------
/img/UsingWikiPDF.gif:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/iiviigames/wikipdf/HEAD/img/UsingWikiPDF.gif

--------------------------------------------------------------------------------
/img/wikipdfcolor.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/iiviigames/wikipdf/HEAD/img/wikipdfcolor.png

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# WikiPDF


Included here are 2 scripts, which will download a list of wikipedia pages in PDF format without going through the hassle of clicking the "Download Page as PDF" link every time you find an interesting article.


![WikiPDFColor Usage](./img/UsingWikiPDF.gif)



> `wikipdfcolor.py`
> + The **new and improved** version!
>   As long as you've got a decent terminal (I recommend [ConEmu](https://conemu.github.io/) for Windows users!), you should have a lovely, color-coded, and more intuitive experience overall using the script!
>
> `wikipdf.py`
> + The classic version. Uglier, less user friendly.


---

How To Use the Script
---------------------

+ Download this repo as a zip into the folder you'd like to run the code from. I'll say it's `C:\Users\{your_name}\Desktop` for this example.
+ Extract the zip file to a folder on the Desktop called `WikiPDF` or whatever you want.
+ See the file in there called **`tst.txt`**? That's how you need to format the `.txt` files for whatever pages you want to download. If the titles aren't each on their own line, or aren't written exactly as they appear in their actual Wikipedia page titles, you may see errors.
+ Here's a nice photo, just to have a visual reference of how it should look.

+ > ![Text File Format](./img/wikilist.png)

+ Run **`wikipdf.py`** or **`wikipdfcolor.py`** - _your choice_. A console window will pop up. Here's a screenshot of the color version.
+ > ![wikipdfcolor](./img/wikipdfcolor.png)
+ The _color_ version will offer you a few regularly used folders (Desktop, Documents, and Downloads), along with the folder you launched the program from. You can enter the corresponding number to make a selection, and then press `ENTER`.
+ You can also choose to enter a _custom directory_ by choosing option 5, and then typing in any path you'd like.
+ The _classic_ version requires you to type in the location of the text file you want to use. It's bad, and less pretty, and I don't think you should use this one. (_The code is less efficient and more obnoxious too!_)
+ After making your selection, all `.txt` files present in the chosen directory will be displayed without their extensions visible. This is just for aesthetics, **I promise** they are `.txt` files.
+ Finally, enter the file's name (in the example it's `tst`) and press **`ENTER`**.

**BOOM!**

_The magic begins!_

---

_With Solidarity,_

:crystal_ball: [__*iivii*__](https://merveilles.town/@thelibrarian)

--------------------------------------------------------------------------------
/wikipdf.py:
--------------------------------------------------------------------------------
#!/usr/bin/env python3
#---------------------------------------------------------------------------#
# MODULE:       wikipdf.py                                                  #
# INFO:         Accesses a .txt file with each line being a title of a      #
#               Wikipedia page you'd like to save a PDF of. It then         #
#               downloads all the files, saving them with a nice            #
#               filename into the folder that the .txt is already in.       #
# INSPIRATION:  Data hoarders have problems too.                            #
# CODED BY:     iivii @odd_codes                                            #
# EMAIL:        iiviigames@pm.me                                            #
# WEBSITE:      https://odd.codes                                           #
# LICENSE:      BUDDYPACT                                                   #
#               BORROW, USE, DONATE, DOWNLOAD!                              #
#               Your price? A courteous thanks.                             #
#                                                                           #
#                                                                           #
# Confused about that BUDDYPACT? Well, it's simple. You can use my code     #
# for anything at all, commercial, personal, erotic, or whatever. The       #
# only thing you are required to do if you choose to use it, is to link     #
# me to the thing you used it for, or shoot me an email to tell me what     #
# you're working on, so I can see the cool shit you are doing, and see      #
# the connections forming between disparate groups.                         #
#                                                                           #
# Maybe we can even get others to do this as well, and before you know it,  #
# everybody is connected through collaborative friendships. That's the goal #
# of BUDDYPACT:                                                             #
#                                                                           #
#                     Friendship Through Collaboration.                     #
#---------------------------------------------------------------------------#


# IMPORTS
#_______________________________________________________________________________

import sys
import os
import requests


# GLOBALS
#_______________________________________________________________________________

BASEURL = "https://en.wikipedia.org/"
APIPDF = "api/rest_v1/page/pdf/"

# FUNCTIONS
#________________________________________________________________________________


def fix_string(words, spacer=" ", replacer=" ", addto=""):
    """
    Fixes a string by swapping every occurrence of `spacer` for `replacer`,
    then appending `addto` to the end.
    """
    fixed = words.replace(spacer, replacer)
    fixed += addto
    return fixed


def fix_string_list(wordlist, spacer=" ", replacer=" ", addto=""):
    """
    Like fix_string, but applied to every item in a list.
    """
    formatted_list = []
    for entry in wordlist:
        formatted_list.append(fix_string(entry, spacer, replacer, addto))

    return formatted_list


def get_data_from(line):
    """
    Prints a message, retrieves input from the user, and then prints an
    underline as long as the message.
    """
    prompt = "\n::: "
    res = input(line + prompt)
    print("_" * len(line) + "\r")
    return res


def get_txt_list():
    """
    This function is responsible for moving to the directory where the
    text file containing the desired wikipedia pages to download is
    located.

    It will only seek out text files and has preventative measures to
    ensure no errors will occur if the user types in a directory wrong,
    or a file name wrong.

    Once the user selects the directory and the file containing the
    list of wikipedia entries, the file is parsed, and the list of
    titles is returned so it can be passed to the download function.
    """
    print("Enter the directory name where the text file is located, or,")
    directory = get_data_from("just hit ENTER to use the current directory:")
    if directory != "":
        # Exit cleanly instead of crashing on a mistyped directory.
        try:
            os.chdir(directory)
        except OSError:
            print("Could not switch to that directory.\nExiting!")
            sys.exit()

    # After the chdir above, the working directory is the one to search.
    # (Listing a relative `directory` again after the chdir would fail.)
    txtlist = []
    for name in os.listdir(os.curdir):
        if name.endswith(".txt"):
            txtlist.append(name)

    if len(txtlist) < 1:
        print("No .txt files in this directory\nExiting!")
        sys.exit()

    print("Please enter one of the following file names to use as the reference:\n")
    print("NOTE: Do not add .txt to the end of the filename!\n\n")
    striplist = []
    for name in txtlist:
        # Drop only the trailing ".txt" (4 characters).
        striplist.append(name[:-4])

    for name in striplist:
        print(name)

    print("\n")

    fname = get_data_from("Enter the name of the .txt file to retrieve page names from:")
    if fname not in striplist:
        print("Entered an invalid filename.\nExiting!")
        sys.exit()
    fname += ".txt"

    print("Success!\nReading from: " + fname)

    # READ FROM TEXTFILE
    text_list = []
    with open(fname, 'r') as text_file:
        for line in text_file:
            # Remove the linebreak and any stray whitespace around the title
            line_current = line.strip()
            # Skip blank lines so that no empty title gets requested
            if line_current:
                text_list.append(line_current)

    # FORMAT text_list INTO PROPER REQUEST FORMAT
    api_titles = fix_string_list(text_list, " ", "_")

    # Output API List Contents for Testing
    #print("____________________________________________________________\n")
    #for i in api_titles:
    #    print(i)
    #print("____________________________________________________________\n")

    # RETURN LIST FOR REQUESTING
    return api_titles


def download_pdfs(api_list):
    """
    This is the downloading portion of the code.
    A list of page titles is passed in as the argument; the request
    addresses and the names of the files are created within this code.
    This loops until every title from the .txt file has been handled.
    """
    # Request PDF information from wikipedia.
    for title in api_list:
        address = BASEURL + APIPDF + title
        # Name the output file after the page title
        out_file_name = title + ".pdf"
        # Notify the user of the file downloading and what its name will be
        print("Downloading: %s" % out_file_name)

        # Response from server
        r = requests.get(address, stream=True)

        # Skip pages that fail, rather than saving an error response as a PDF
        if r.status_code != 200:
            print("Could not download %s (HTTP %d)\n" % (title, r.status_code))
            continue

        # Write the file to PDF in Chunks
        with open(out_file_name, 'wb') as pdf:
            for chunk in r.iter_content(chunk_size=4096):
                if chunk:
                    pdf.write(chunk)

        # Notify user that the file has successfully downloaded.
        print("%s downloaded!\n" % out_file_name)

    # Completed Downloads List!
    print("DOWNLOADED ALL FILES!")
    sys.exit()


# MAINLOOP
#_______________________________________________________________________________

if __name__ == '__main__':
    wiki_get_list = get_txt_list()
    download_pdfs(wiki_get_list)

--------------------------------------------------------------------------------
/wikipdfcolor.py:
--------------------------------------------------------------------------------
#!/usr/bin/env python3
#------------------------------------------------------------------------------#
# MODULE:       wikipdfcolor.py
# INFO:         Accesses a .txt file with each line being a title of a
#               Wikipedia page you'd like to save a PDF of. It then
#               downloads all the files, saving them with a nice
#               filename into the folder that the .txt is already in.
# INSPIRATION:  Data hoarders have problems too.
# CODED BY:     iivii @odd_codes
# EMAIL:        iiviigames@pm.me
# WEBSITE:      https://odd.codes
# LICENSE:      BUDDYPACT
#               BORROW, USE, DONATE, DOWNLOAD!
#               Your price? A courteous thanks.
#
#
# Confused about that BUDDYPACT? Well, it's simple. You can use my code
# for anything at all, commercial, personal, erotic, or whatever. The
# only thing you are required to do if you choose to use it, is to link
# me to the thing you used it for, or shoot me an email to tell me what
# you're working on, so I can see the cool shit you are doing, and see
# the connections forming between disparate groups.
#
# Maybe we can even get others to do this as well, and before you know it,
# everybody is connected through collaborative friendships.
# That's the goal of BUDDYPACT:
#                     Friendship Through Collaboration
#------------------------------------------------------------------------------#


# IMPORTS
#_______________________________________________________________________________
from __future__ import print_function
import sys
import os
import requests


# GLOBALS
#_______________________________________________________________________________

# Wikipedia API
BASEURL = "https://en.wikipedia.org/"
APIPDF = "api/rest_v1/page/pdf/"

# Folders for quick selection during the pathing process.
# These can be changed to whatever you want, and then quickly
# selected when prompted, rather than having to type a path
# every time.
CALLDIR = os.getcwd()
DESKTOP = os.path.join(os.path.expanduser('~'), 'Desktop')
DOCUMENTS = os.path.join(os.path.expanduser('~'), 'Documents')
DOWNLOADS = os.path.join(os.path.expanduser('~'), 'Downloads')
CURRENT = CALLDIR + " (The Current Directory)"
CUSTOM = "Enter a custom directory"

USUAL = [DESKTOP, DOCUMENTS, DOWNLOADS, CURRENT, CUSTOM]


# Color tables: names mapped to their ANSI escape codes.
# The two '' placeholders keep the attribute codes lined up (3 and 6 are unused).
ATTRIBUTES = dict(zip(['bold','dark','','underline','blink','','reverse','concealed'], range(1, 9)))
del ATTRIBUTES['']
HIGHLIGHTS = dict(zip(['_grey','_red','_green','_yellow','_blue','_magenta','_cyan','_white'], range(40, 48)))
COLORS = dict(zip(['grey','red','green','yellow','blue','magenta','cyan','white'], range(30, 38)))
RESET = '\033[0m'
# FUNCTIONS
#________________________________________________________________________________


# Alter a single string
def fix_string(words, spacer=" ", replacer=" ", addto=""):
    """
    Fixes a string by removing certain characters and swapping them out for new ones.
    """
    fixed = words.replace(spacer, replacer)
    fixed += addto
    return fixed


# Alter a list of strings
def fix_string_list(wordlist, spacer=" ", replacer=" ", addto=""):
    """
    Like fix_string, but applied to every item in a list.
    """
    formatted_list = []
    for entry in wordlist:
        formatted_list.append(fix_string(entry, spacer, replacer, addto))

    return formatted_list


# User input.
def get_data_from(line,color="grey"):
    """
    Prints a colored message, retrieves input from the user, and then
    prints an underline as long as the message.
    """
    prompt = "\n::: "
    cprint(line, color)
    res = input(prompt)
    cprint("_" * len(line) + "\r", "cyan", None, ["underline", "concealed"])
    return res


# Colorizer for the terminal
def colored(text, color=None, on_color=None, attrs=None):
    """
    The workhorse that cprint calls upon.
    Example:
        colored('Holy Baboons Ass, Robin!', 'grey', '_red', ['bold', 'blink'])
        colored('Napster was an inside job!', 'green', on_color=None, attrs=["bold"])
    """
    if os.getenv('ANSI_COLORS_DISABLED') is None:
        fmt_str = '\033[%dm%s'
        if color is not None:
            text = fmt_str % (COLORS[color], text)

        if on_color is not None:
            text = fmt_str % (HIGHLIGHTS[on_color], text)

        if attrs is not None:
            for attr in attrs:
                text = fmt_str % (ATTRIBUTES[attr], text)

        text += RESET

    return text


# The easy-bake color print function
def cprint(text, color=None, on_color=None, attrs=None, **kwargs):
    """
    Colored printing in the terminal!
    FG: grey, red, green, yellow, blue, magenta, cyan, white
    BG: _grey, _red, _green, _yellow, _blue, _magenta, _cyan, _white
    STYLES: bold, dark, underline, blink, reverse, concealed
    """

    print((colored(text, color, on_color, attrs)), **kwargs)


# Directory Changing
def dshift():
    print("Enter the associated number to switch to that directory")
    # The color values for printing on console, one text color per entry
    # in USUAL. Change these as you want! Check the cprint function for
    # more information on how these are formatted.
    text_colors = ["magenta", "green", "red", "cyan", "yellow"]
    bc = "_grey"
    for count, entry in enumerate(USUAL, start=1):
        cprint(str(count) + ".) " + entry, text_colors[count - 1], bc)

    cprint("---------------------------------------------------", "cyan", "_grey", ["underline"])
    response = get_data_from("Please type out your choice, and press ENTER")
    if response not in ['1','2','3','4','5']:
        cprint("INVALID ENTRY. EXITING...","blue", "_red")
        sys.exit()

    # This is where you could add more options for folders you tend to
    # download to.
    if response == '1':
        return DESKTOP
    elif response == '2':
        return DOCUMENTS
    elif response == '3':
        return DOWNLOADS
    elif response == '4':
        return os.curdir
    else:
        # '5' tells the caller to ask for a custom directory.
        return response


def get_txt_list():
    """
    This function is responsible for moving to the directory where the
    text file containing the desired wikipedia pages to download is
    located.

    It will only seek out text files and has preventative measures to
    ensure no errors will occur if the user types in a directory wrong,
    or a file name wrong.

    Once the user selects the directory and the file containing the
    list of wikipedia entries, the file is parsed, and the list of
    titles is returned so it can be passed to the download function.
    """
    directory = dshift()
    if directory == "5":
        directory = get_data_from("Enter the directory to use:", "magenta")

    # An empty entry falls back to the current directory.
    if directory == "":
        cprint("Nothing entered; so the current directory will be checked!", "magenta", None, ["underline"])
        directory = os.curdir

    # Attempt a directory shift to the chosen location; exit on failure.
    try:
        os.chdir(directory)
    except OSError:
        cprint("Could not switch to the provided directory.\nEXITING!", "red")
        sys.exit()

    # Store all the files in the final directory and create an empty list.
    # The chdir above succeeded, so the working directory is the one to list.
    dirlist = os.listdir(os.curdir)
    txtlist = []

    # Any files ending with the .txt extension will register and append
    # into the txtlist. This is to ensure that the program doesn't attempt
    # to read from other file types based on a user mis-input.
    for name in dirlist:
        if name.endswith(".txt"):
            txtlist.append(name)

    # If after looping there are no files in the txtlist, this means that
    # no .txt files exist in that directory, and the program exits.
    if len(txtlist) < 1:
        cprint("No .txt files in this directory\nExiting!", "red", None, ["underline"])
        sys.exit()

    # Notify user that they should enter the name of the text file (since
    # it's possible there's more than one in the directory) that they want
    # to parse from.
    cprint("Please enter one of the following file names to use as the reference:", "green")
    cprint("NOTE: Do not add .txt to the end of the filename!\n","yellow",None,["underline","bold"])

    # Create a list that will contain the files without their '.txt'
    # extension. This is unnecessary really, and just my preference to
    # not have to type '.txt' every time I want to use this. This would
    # need to be changed (rather easily, too) in order to read other
    # file types.
    striplist = []
    for name in txtlist:
        # Drop only the trailing ".txt" (4 characters).
        striplist.append(name[:-4])

    # Output all the file names with their extensions stripped out.
    for name in striplist:
        print(name)
    fname = get_data_from("Enter the name of the .txt file to retrieve page names from:")

    # Error catching for bad entries.
    if fname not in striplist:
        cprint("Entered an invalid filename.\nExiting!","red", None, ["underline", "bold"])
        sys.exit()
    fname += ".txt"

    # FINALLY! A successful attempt to find a text file and parse it.
    print("Success!\nReading from: " + fname)

    # Now, actually parse the lines and prepare them for their insertion
    # into the wikipedia API and eventually, their file names as PDFs.
    text_list = []
    with open(fname, 'r') as text_file:
        for line in text_file:
            # Remove the linebreak and any stray whitespace around the title
            line_current = line.strip()
            # Skip blank lines so that no empty title gets requested
            if line_current:
                text_list.append(line_current)

    # FORMAT text_list INTO PROPER REQUEST FORMAT
    api_titles = fix_string_list(text_list, " ", "_")

    # RETURN LIST FOR REQUESTING
    return api_titles


def download_pdfs(api_list):
    """
    This function is called once a text file has been parsed and passed to it as
    a list; entries already formatted for the requests.

    For every entry in the .txt file, a call to wikipedia is made, and if that page
    exists, it will be downloaded into the desired destination as a PDF.

    Works like a fucking charm!
    """
    # Request PDF information from wikipedia.
    for title in api_list:
        address = BASEURL + APIPDF + title
        # Name the output file after the page title
        out_file_name = title + ".pdf"
        # Notify the user of the file downloading and what its name will be
        cprint("Downloading: %s" % out_file_name, "blue", "_grey")

        # Response from server
        r = requests.get(address, stream=True)

        # Skip pages that fail, rather than saving an error response as a PDF
        if r.status_code != 200:
            cprint("Could not download %s (HTTP %d)\n" % (title, r.status_code), "red")
            continue

        # Write the file to PDF in Chunks
        with open(out_file_name, 'wb') as pdf:
            for chunk in r.iter_content(chunk_size=4096):
                if chunk:
                    pdf.write(chunk)

        # Notify user that the file has successfully downloaded.
        cprint("%s downloaded!\n" % out_file_name, "green", "_grey", ["dark", "blink"])

    # Completed Downloads List!
    cprint("DOWNLOADED ALL FILES!", "blue", "_yellow", ["underline"])
    sys.exit()


# MAINLOOP
#_______________________________________________________________________________
if __name__ == '__main__':
    wiki_get_list = get_txt_list()
    download_pdfs(wiki_get_list)

--------------------------------------------------------------------------------
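
Both scripts turn each line of the `.txt` list into a request address by swapping spaces for underscores and appending the result to `BASEURL + APIPDF`, then reuse the same string plus `.pdf` as the output file name. Here is a minimal sketch of that mapping, runnable without touching the network. The helper names `build_pdf_url` and `build_file_name` are hypothetical and do not exist in the repo, and the percent-encoding and the slash replacement are assumptions added for titles such as `AC/DC`; the original scripts pass the title through as-is.

```python
from urllib.parse import quote

# Same constants as in wikipdf.py and wikipdfcolor.py.
BASEURL = "https://en.wikipedia.org/"
APIPDF = "api/rest_v1/page/pdf/"


def build_pdf_url(title):
    """Hypothetical helper: turn one line of the .txt list into a request URL."""
    # Spaces become underscores, as fix_string_list does in the scripts.
    api_title = title.strip().replace(" ", "_")
    # Assumption: percent-encode everything else, including "/", so the
    # title stays a single path segment. The original scripts skip this step.
    return BASEURL + APIPDF + quote(api_title, safe="")


def build_file_name(title):
    """Hypothetical helper: derive the output file name for a title."""
    api_title = title.strip().replace(" ", "_")
    # Assumption: swap path separators so the file lands in the current folder.
    return api_title.replace("/", "-").replace("\\", "-") + ".pdf"


if __name__ == "__main__":
    for page in ["Alan Turing", "AC/DC"]:
        print(build_pdf_url(page), "->", build_file_name(page))
```

Keeping the address and the file name in separate helpers means a title can be escaped one way for the URL and another way for the disk, which a single shared string cannot do.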