├── LICENSE
├── README.md
└── bookp.py


/LICENSE:
--------------------------------------------------------------------------------
This is free and unencumbered software released into the public domain.

Anyone is free to copy, modify, publish, use, compile, sell, or
distribute this software, either in source code form or as a compiled
binary, for any purpose, commercial or non-commercial, and by any
means.

In jurisdictions that recognize copyright laws, the author or authors
of this software dedicate any and all copyright interest in the
software to the public domain. We make this dedication for the benefit
of the public at large and to the detriment of our heirs and
successors. We intend this dedication to be an overt act of
relinquishment in perpetuity of all present and future rights to this
software under copyright law.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
IN NO EVENT SHALL THE AUTHORS BE LIABLE FOR ANY CLAIM, DAMAGES OR
OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
OTHER DEALINGS IN THE SOFTWARE.

For more information, please refer to

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# bOOkp

Quick'n'dirty script to download all your Kindle e-books.

I needed to back up all my Kindle e-books, so I put together this script. It
works for now, but a change in the download process will probably break it, and
I may not have the time to fix it right away.

You can download all your e-books (that are eligible for download), or you can
specify one or more ASINs to download. By default the script only displays
warnings, errors, and a finish message. If you want to see progress, use the
`--verbose` flag. Selenium with ChromeDriver is used to handle login, and you
can display the browser with `--showbrowser` - this may come in handy if
something goes wrong.

The only mandatory command line parameter is the e-mail address associated with
your Amazon account, but of course the script will need your password too - it
will ask for it if it is not given as a parameter. Keep in mind that passwords
given as parameters will probably be stored in your shell history!

The script will also ask which of your devices you want to download your books
to. This is important, because the downloaded books will be DRM-locked to that
particular device. The serial number (which is required to remove DRM) will be
printed when the books are downloaded.

## Usage

```
usage: bookp.py [-h] [--verbose] [--showbrowser] --email EMAIL
                [--password PASSWORD] [--outputdir OUTPUTDIR] [--proxy PROXY]
                [--asin [ASIN [ASIN ...]]]

Amazon e-book downloader.

optional arguments:
  -h, --help            show this help message and exit
  --verbose             show info messages
  --showbrowser         display browser while creating session.
  --email EMAIL         Amazon account e-mail address
  --password PASSWORD   Amazon account password
  --outputdir OUTPUTDIR
                        download directory (default: books)
  --proxy PROXY         HTTP proxy server
  --asin [ASIN [ASIN ...]]
                        list of ASINs to download
```

## Requirements

* [Python 3.x](https://www.python.org)
* [ChromeDriver](https://sites.google.com/a/chromium.org/chromedriver/downloads)
* the following Python modules:
    * [requests](https://pypi.org/project/requests/)
    * [PyVirtualDisplay](https://pypi.org/project/PyVirtualDisplay/)
    * [selenium](https://pypi.org/project/selenium/)

--------------------------------------------------------------------------------
/bookp.py:
--------------------------------------------------------------------------------
#!/usr/bin/env python3

import getpass
import json
import logging
import os
import re
import requests
import sys
import urllib.parse

from argparse import ArgumentParser
from pyvirtualdisplay import Display
from selenium import webdriver


user_agent = {'User-Agent': 'krumpli'}
logger = logging.getLogger(__name__)

def create_session(email, password, browser_visible=False, proxy=None):
    # Run Chrome inside a virtual display unless the user wants to watch it.
    if not browser_visible:
        display = Display(visible=0)
        display.start()

    logger.info("Starting browser")
    options = webdriver.ChromeOptions()
    if proxy:
        options.add_argument('--proxy-server='+proxy)
    browser = webdriver.Chrome(chrome_options=options)

    logger.info("Loading www.amazon.com")
    browser.get('https://www.amazon.com')

    logger.info("Logging in")
    browser.find_element_by_css_selector("#nav-signin-tooltip > a.nav-action-button").click()
    browser.find_element_by_id("ap_email").clear()
    browser.find_element_by_id("ap_email").send_keys(email)

    browser.find_element_by_id("ap_password").clear()
    browser.find_element_by_id("ap_password").send_keys(password)
    browser.find_element_by_id("signInSubmit").click()

    logger.info("Getting CSRF token")
    browser.get('https://www.amazon.com/hz/mycd/myx#/home/content/booksAll')

    # The token is read from an inline script in the page source.
    match = re.search(r'var csrfToken = "(.*)";', browser.page_source)
    csrf_token = match.group(1) if match else None

    cookies = {}
    for cookie in browser.get_cookies():
        cookies[cookie['name']] = cookie['value']

    browser.quit()
    if not browser_visible:
        display.stop()

    # Fail with a clear message instead of returning an unbound variable.
    if csrf_token is None:
        raise RuntimeError("CSRF token not found in page source")

    return cookies, csrf_token

"""
NOTE: This function is not used currently, because the download URL can be
constructed without this additional request. This might change in the future,
so I'm keeping this here just in case.

def get_download_url(user_agent, cookies, csrf_token, asin, device_id):
    logger.info("Getting download URL for " + asin)
    data_json = {
        'param':{
            'DownloadViaUSB':{
                'contentName':asin,
                'encryptedDeviceAccountId':device_id, # device['deviceAccountId']
                'originType':'Purchase'
            }
        }
    }

    r = requests.post('https://www.amazon.com/hz/mycd/ajax',
        data={'data':json.dumps(data_json), 'csrfToken':csrf_token},
        headers=user_agent, cookies=cookies)
    rr = json.loads(r.text)["DownloadViaUSB"]
    return rr["URL"] if rr["success"] else None
"""

def get_devices(user_agent, cookies, csrf_token):
    logger.info("Getting device list")
    data_json = {'param': {'GetDevices': {}}}

    r = requests.post('https://www.amazon.com/hz/mycd/ajax',
        data={'data':json.dumps(data_json), 'csrfToken':csrf_token},
        headers=user_agent, cookies=cookies)
    devices = json.loads(r.text)["GetDevices"]["devices"]

    # Only devices that report a serial number can be used for downloads.
    return [device for device in devices if 'deviceSerialNumber' in device]

def get_asins(user_agent, cookies, csrf_token):
    logger.info("Getting e-book list")
    startIndex = 0
    batchSize = 100
    data_json = {
        'param':{
            'OwnershipData':{
                'sortOrder':'DESCENDING',
                'sortIndex':'DATE',
                'startIndex':startIndex,
                'batchSize':batchSize,
                'contentType':'Ebook',
                'itemStatus':['Active'],
                'originType':['Purchase'],
            }
        }
    }

    # NOTE: This loop could be replaced with only one request, since the
    # response tells us how many items are there ('numberOfItems'). I guess that
    # number will never be high enough to cause problems, but I want to be on
    # the safe side, hence the download in batches approach.
    asins = []
    while True:
        r = requests.post('https://www.amazon.com/hz/mycd/ajax',
            data={'data':json.dumps(data_json), 'csrfToken':csrf_token},
            headers=user_agent, cookies=cookies)
        rr = json.loads(r.text)
        asins += [book['asin'] for book in rr['OwnershipData']['items']]

        if rr['OwnershipData']['hasMoreItems']:
            startIndex += batchSize
            data_json['param']['OwnershipData']['startIndex'] = startIndex
        else:
            break

    return asins

def download_books(user_agent, cookies, device, asins, directory):
    logger.info("Downloading {} books".format(len(asins)))
    cdn_url = 'http://cde-g7g.amazon.com/FionaCDEServiceEngine/FSDownloadContent'
    cdn_params = 'type=EBOK&key={}&fsn={}&device_type={}'
    for asin in asins:
        try:
            params = cdn_params.format(asin, device['deviceSerialNumber'], device['deviceType'])
            r = requests.get(cdn_url, params=params, headers=user_agent, cookies=cookies, stream=True)
            # The file name comes percent-encoded in the Content-Disposition header.
            name = re.findall(r"filename\*=UTF-8''(.+)", r.headers['Content-Disposition'])[0]
            name = urllib.parse.unquote(name)
            name = name.replace('/', '_')
            with open(os.path.join(directory, name), 'wb') as f:
                for chunk in r.iter_content(chunk_size=512):
                    f.write(chunk)
            logger.info('Downloaded ' + asin + ': ' + name)
        except Exception as e:
            logger.debug(e)
            logger.error('Failed to download ' + asin)

def main():
    parser = ArgumentParser(description="Amazon e-book downloader.")
    parser.add_argument("--verbose", help="show info messages", action="store_true")
    parser.add_argument("--showbrowser", help="display browser while creating session.", action="store_true")
    parser.add_argument("--email", help="Amazon account e-mail address", required=True)
    parser.add_argument("--password", help="Amazon account password", default=None)
    parser.add_argument("--outputdir", help="download directory (default: books)", default="books")
    parser.add_argument("--proxy", help="HTTP proxy server", default=None)
    parser.add_argument("--asin", help="list of ASINs to download", nargs='*')
    args = parser.parse_args()

    if args.verbose:
        logger.setLevel(logging.INFO)
    else:
        logger.setLevel(logging.WARNING)
    formatter = logging.Formatter('[%(levelname)s]\t%(asctime)s %(message)s')
    handler = logging.StreamHandler()
    handler.setFormatter(formatter)
    logger.addHandler(handler)

    password = args.password
    if not password:
        password = getpass.getpass("Your Amazon password: ")

    if os.path.isfile(args.outputdir):
        logger.error("Output directory is a file!")
        return -1
    elif not os.path.isdir(args.outputdir):
        os.mkdir(args.outputdir)

    cookies, csrf_token = create_session(args.email, password,
        browser_visible=args.showbrowser, proxy=args.proxy)
    if not args.asin:
        asins = get_asins(user_agent, cookies, csrf_token)
    else:
        asins = args.asin

    devices = get_devices(user_agent, cookies, csrf_token)
    print("Please choose which device you want to download your e-books to!")
    for i in range(len(devices)):
        print(" " + str(i) + ". " + devices[i]['deviceAccountName'])
    while True:
        try:
            choice = int(input("Device #: "))
        except ValueError:
            # Ask again; 'choice' may not be defined yet at this point.
            logger.error("Not a number!")
            continue
        if choice in range(len(devices)):
            break

    download_books(user_agent, cookies, devices[choice], asins, args.outputdir)

    print("\n\nAll done!\nNow you can use apprenticeharper's DeDRM tools "
        "(https://github.com/apprenticeharper/DeDRM_tools)\n"
        "with the following serial number to remove DRM: " +
        devices[choice]['deviceSerialNumber'])

if __name__ == '__main__':
    try:
        sys.exit(main())
    except KeyboardInterrupt:
        logger.info("Exiting...")

--------------------------------------------------------------------------------