├── LICENSE
├── README.md
└── bookp.py


/LICENSE:
--------------------------------------------------------------------------------
This is free and unencumbered software released into the public domain.

Anyone is free to copy, modify, publish, use, compile, sell, or
distribute this software, either in source code form or as a compiled
binary, for any purpose, commercial or non-commercial, and by any
means.

In jurisdictions that recognize copyright laws, the author or authors
of this software dedicate any and all copyright interest in the
software to the public domain. We make this dedication for the benefit
of the public at large and to the detriment of our heirs and
successors. We intend this dedication to be an overt act of
relinquishment in perpetuity of all present and future rights to this
software under copyright law.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
IN NO EVENT SHALL THE AUTHORS BE LIABLE FOR ANY CLAIM, DAMAGES OR
OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
OTHER DEALINGS IN THE SOFTWARE.

For more information, please refer to <http://unlicense.org>



--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# bOOkp

Quick'n'dirty script to download all your Kindle e-books.

I needed to back up all my Kindle e-books, so I put together this script. It
works for now, but a change in the download process will probably break it, and
I may not have the time to fix it right away.

You can download all of your e-books that are eligible for download, or you can
specify one or more ASINs to download. By default the script only displays
warnings, errors, and a message when it finishes. To see progress, use the
`--verbose` flag. Login is handled by Selenium with ChromeDriver, and you can
display the browser with `--showbrowser`, which may come in handy if something
goes wrong.
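
When no ASINs are given, the script first builds the list of books. It asks Amazon's library endpoint for purchased e-books in batches of 100 and repeats until the response reports no more items. The loop below is a minimal offline sketch of that pagination. `collect_asins` and `fetch_page` are hypothetical names. `fetch_page` stands in for the real `OwnershipData` request in `get_asins()`, and the sketch assumes it returns the `items` and `hasMoreItems` fields the script reads.

```python
from typing import Callable, Dict, List


def collect_asins(fetch_page: Callable[[int, int], Dict], batch_size: int = 100) -> List[str]:
    """Collect ASINs page by page until the server reports the end of the list.

    fetch_page(start_index, batch_size) is a hypothetical stand-in for the
    OwnershipData request; it must return a dict with 'items' (each item
    carrying an 'asin') and a boolean 'hasMoreItems'.
    """
    asins: List[str] = []
    start_index = 0
    while True:
        page = fetch_page(start_index, batch_size)
        items = page.get('items', [])
        asins.extend(item['asin'] for item in items)
        # Stop on the last page, and also on an empty page so a
        # misbehaving server cannot keep the loop running forever.
        if not page.get('hasMoreItems') or not items:
            return asins
        start_index += batch_size


if __name__ == '__main__':
    # Made-up library of 250 placeholder ASINs, served 100 at a time.
    library = ['B{:09d}'.format(i) for i in range(250)]

    def fake_fetch(start: int, size: int) -> Dict:
        chunk = library[start:start + size]
        return {'items': [{'asin': a} for a in chunk],
                'hasMoreItems': start + size < len(library)}

    asins = collect_asins(fake_fetch)
    print(len(asins), asins[0], asins[-1])  # 250 B000000000 B000000249
```

Passing a stub instead of a real HTTP call keeps the paging logic testable without an Amazon account.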

The only mandatory command-line parameter is the e-mail address associated with
your Amazon account, but of course the script needs your password too: if it is
not given with `--password`, the script asks for it. Keep in mind that passwords
given on the command line will probably be stored in your shell history!
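
The password fallback takes only a few lines. The sketch below mirrors the logic in `main()`: it uses `--password` if that was given, and otherwise prompts with `getpass.getpass`. That function does not echo what you type, so the password stays out of your shell history. The name `resolve_password` and its `prompt` parameter are hypothetical. They exist only so the logic can be exercised without a terminal.

```python
import getpass
from typing import Callable, Optional


def resolve_password(cli_password: Optional[str],
                     prompt: Callable[[str], str] = getpass.getpass) -> str:
    """Return the password from the command line, or ask for it interactively."""
    if cli_password:
        return cli_password
    # getpass.getpass reads from the terminal without echoing the input.
    return prompt("Your Amazon password: ")
```

In the script itself this corresponds to calling `resolve_password(args.password)` with the default prompt.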

The script also asks which of your devices you want to download your books for.
This matters because the downloaded books will be DRM-protected for that
particular device. The device's serial number, which is required to remove the
DRM, is printed once the books have been downloaded.
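
Under the hood, the script only offers devices that report a serial number. Each book is then fetched from a download URL that carries the chosen device's serial number (`fsn`) and device type. The sketch below reproduces that URL construction, with the endpoint and parameter names taken from `download_books()` in `bookp.py`. The helper names are hypothetical. The idea that `fsn` is what binds the DRM to the device is inferred from the script, not from Amazon documentation.

```python
from typing import Dict, List
from urllib.parse import urlencode

# Endpoint copied from download_books() in bookp.py.
CDN_URL = 'http://cde-g7g.amazon.com/FionaCDEServiceEngine/FSDownloadContent'


def downloadable_devices(devices: List[Dict]) -> List[Dict]:
    """Keep only devices with a serial number, as get_devices() does."""
    return [d for d in devices if 'deviceSerialNumber' in d]


def download_url(asin: str, device: Dict) -> str:
    """Build the per-book download URL for the chosen device."""
    query = urlencode({
        'type': 'EBOK',
        'key': asin,                          # the book to fetch
        'fsn': device['deviceSerialNumber'],  # the target device's serial number
        'device_type': device['deviceType'],
    })
    return CDN_URL + '?' + query
```

For example, `download_url('B000000001', device)` yields a URL ending in `?type=EBOK&key=B000000001&fsn=...&device_type=...`. The same serial number is what the script prints at the end for use with the DeDRM tools.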

## Usage

```
usage: bookp.py [-h] [--verbose] [--showbrowser] --email EMAIL
                [--password PASSWORD] [--outputdir OUTPUTDIR] [--proxy PROXY]
                [--asin [ASIN [ASIN ...]]]

Amazon e-book downloader.

optional arguments:
  -h, --help            show this help message and exit
  --verbose             show info messages
  --showbrowser         display browser while creating session.
  --email EMAIL         Amazon account e-mail address
  --password PASSWORD   Amazon account password
  --outputdir OUTPUTDIR
                        download directory (default: books)
  --proxy PROXY         HTTP proxy server
  --asin [ASIN [ASIN ...]]
                        list of ASINs to download
```
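
`--asin` takes any number of values (`nargs='*'`). Omitting it and passing it with no values therefore both leave nothing to filter on, and in both cases the script downloads the whole library. The sketch below rebuilds just the relevant options of the parser from `bookp.py` to show what `args.asin` holds in each case. `build_parser` is a hypothetical helper, and the ASINs and e-mail address are placeholders.

```python
from argparse import ArgumentParser


def build_parser() -> ArgumentParser:
    """Trimmed copy of the bookp.py parser, keeping only the options used here."""
    parser = ArgumentParser(description="Amazon e-book downloader.")
    parser.add_argument("--email", required=True)
    parser.add_argument("--outputdir", default="books")
    parser.add_argument("--asin", nargs='*')
    return parser


parser = build_parser()
for argv in (["--email", "me@example.com"],
             ["--email", "me@example.com", "--asin"],
             ["--email", "me@example.com", "--asin", "B000000001", "B000000002"]):
    args = parser.parse_args(argv)
    # bookp.py downloads everything when args.asin is falsy (None or []).
    print(args.asin, "-> all books" if not args.asin else "-> only these")
```

On the command line, the last case corresponds to `bookp.py --email me@example.com --asin B000000001 B000000002`.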

## Requirements

* [Python 3.x](https://www.python.org)
* [ChromeDriver](https://sites.google.com/a/chromium.org/chromedriver/downloads)
* the following Python modules:
  * [requests](https://pypi.org/project/requests/)
  * [PyVirtualDisplay](https://pypi.org/project/PyVirtualDisplay/) (its default backend needs Xvfb, which hides the browser unless `--showbrowser` is given)
  * [selenium](https://pypi.org/project/selenium/)

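A quick pre-flight check can save you from a confusing failure in the middle of a login. The sketch below is a hypothetical helper, not part of `bookp.py`. It reports which of the Python modules above cannot be imported and which executables are missing from `PATH`. It assumes the import names `requests`, `pyvirtualdisplay` and `selenium`. It also checks for `chromedriver` and for `Xvfb`, which PyVirtualDisplay starts by default when the browser is hidden. Newer Selenium releases can locate a matching driver on their own, so a `chromedriver` missing from `PATH` is not necessarily fatal.

```python
import importlib.util
import shutil
from typing import Iterable, List

PYTHON_MODULES = ('requests', 'pyvirtualdisplay', 'selenium')
# Xvfb is only needed when the browser is hidden (no --showbrowser).
EXECUTABLES = ('chromedriver', 'Xvfb')


def missing_requirements(modules: Iterable[str] = PYTHON_MODULES,
                         executables: Iterable[str] = EXECUTABLES) -> List[str]:
    """Return modules that cannot be imported and executables not found on PATH."""
    missing = [m for m in modules if importlib.util.find_spec(m) is None]
    missing += [e for e in executables if shutil.which(e) is None]
    return missing


if __name__ == '__main__':
    problems = missing_requirements()
    print('Missing: ' + ', '.join(problems) if problems else 'All requirements found.')
```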


--------------------------------------------------------------------------------
/bookp.py:
--------------------------------------------------------------------------------
#!/usr/bin/env python3

import getpass
import json
import logging
import os
import re
import sys
import urllib.parse
from argparse import ArgumentParser

import requests
from pyvirtualdisplay import Display
from selenium import webdriver
from selenium.webdriver.common.by import By


user_agent = {'User-Agent': 'krumpli'}
logger = logging.getLogger(__name__)

def create_session(email, password, browser_visible=False, proxy=None):
    """Log in with a real browser and return (cookies, csrf_token)."""
    display = None
    if not browser_visible:
        # Run Chrome inside a virtual X display so no window appears.
        display = Display(visible=0)
        display.start()

    browser = None
    try:
        logger.info("Starting browser")
        options = webdriver.ChromeOptions()
        if proxy:
            options.add_argument('--proxy-server=' + proxy)
        browser = webdriver.Chrome(options=options)

        logger.info("Loading www.amazon.com")
        browser.get('https://www.amazon.com')

        logger.info("Logging in")
        browser.find_element(By.CSS_SELECTOR, "#nav-signin-tooltip > a.nav-action-button").click()
        email_field = browser.find_element(By.ID, "ap_email")
        email_field.clear()
        email_field.send_keys(email)

        password_field = browser.find_element(By.ID, "ap_password")
        password_field.clear()
        password_field.send_keys(password)
        browser.find_element(By.ID, "signInSubmit").click()

        logger.info("Getting CSRF token")
        browser.get('https://www.amazon.com/hz/mycd/myx#/home/content/booksAll')

        match = re.search(r'var csrfToken = "([^"]*)";', browser.page_source)
        if not match:
            raise RuntimeError("CSRF token not found (login may have failed)")
        csrf_token = match.group(1)

        # Copy the browser's session cookies so plain requests can reuse them.
        cookies = {cookie['name']: cookie['value'] for cookie in browser.get_cookies()}
    finally:
        # Always shut down the browser and the virtual display, even on errors.
        if browser is not None:
            browser.quit()
        if display is not None:
            display.stop()

    return cookies, csrf_token

"""
NOTE: This function is not used currently, because the download URL can be
constructed without this additional request. This might change in the future,
so I'm keeping this here just in case.

def get_download_url(user_agent, cookies, csrf_token, asin, device_id):
    logger.info("Getting download URL for " + asin)
    data_json = {
        'param':{
            'DownloadViaUSB':{
                'contentName':asin,
                'encryptedDeviceAccountId':device_id, # device['deviceAccountId']
                'originType':'Purchase'
            }
        }
    }

    r = requests.post('https://www.amazon.com/hz/mycd/ajax',
        data={'data':json.dumps(data_json), 'csrfToken':csrf_token},
        headers=user_agent, cookies=cookies)
    rr = json.loads(r.text)["DownloadViaUSB"]
    return rr["URL"] if rr["success"] else None
"""

def get_devices(user_agent, cookies, csrf_token):
    logger.info("Getting device list")
    data_json = {'param': {'GetDevices': {}}}

    r = requests.post('https://www.amazon.com/hz/mycd/ajax',
        data={'data': json.dumps(data_json), 'csrfToken': csrf_token},
        headers=user_agent, cookies=cookies)
    r.raise_for_status()
    devices = r.json()["GetDevices"]["devices"]

    # The download URL needs a serial number, so skip devices without one.
    return [device for device in devices if 'deviceSerialNumber' in device]

def get_asins(user_agent, cookies, csrf_token):
    logger.info("Getting e-book list")
    start_index = 0
    batch_size = 100
    data_json = {
        'param': {
            'OwnershipData': {
                'sortOrder': 'DESCENDING',
                'sortIndex': 'DATE',
                'startIndex': start_index,
                'batchSize': batch_size,
                'contentType': 'Ebook',
                'itemStatus': ['Active'],
                'originType': ['Purchase'],
            }
        }
    }

    # NOTE: This loop could be replaced with a single request, since the
    # response tells us how many items there are ('numberOfItems'). I guess that
    # number will never be high enough to cause problems, but I want to be on
    # the safe side, hence the download-in-batches approach.
    asins = []
    while True:
        r = requests.post('https://www.amazon.com/hz/mycd/ajax',
            data={'data': json.dumps(data_json), 'csrfToken': csrf_token},
            headers=user_agent, cookies=cookies)
        r.raise_for_status()
        ownership = r.json()['OwnershipData']
        items = ownership['items']
        asins += [book['asin'] for book in items]

        # Stop after the last batch; the empty-batch check prevents an endless loop.
        if not ownership['hasMoreItems'] or not items:
            break
        start_index += batch_size
        data_json['param']['OwnershipData']['startIndex'] = start_index

    return asins

def download_books(user_agent, cookies, device, asins, directory):
    logger.info("Downloading {} books".format(len(asins)))
    cdn_url = 'http://cde-g7g.amazon.com/FionaCDEServiceEngine/FSDownloadContent'
    for asin in asins:
        try:
            # Let requests URL-encode the query instead of formatting it by hand.
            params = {
                'type': 'EBOK',
                'key': asin,
                'fsn': device['deviceSerialNumber'],
                'device_type': device['deviceType'],
            }
            with requests.get(cdn_url, params=params, headers=user_agent,
                              cookies=cookies, stream=True) as r:
                r.raise_for_status()
                # The name is RFC 5987-encoded: filename*=UTF-8''<percent-encoded name>
                name = re.findall(r"filename\*=UTF-8''(.+)", r.headers['Content-Disposition'])[0]
                name = urllib.parse.unquote(name)
                # Keep the file inside the output directory.
                name = name.replace('/', '_')
                with open(os.path.join(directory, name), 'wb') as f:
                    for chunk in r.iter_content(chunk_size=512):
                        f.write(chunk)
            logger.info('Downloaded ' + asin + ': ' + name)
        except Exception as e:
            logger.debug(e)
            logger.error('Failed to download ' + asin)

def main():
    parser = ArgumentParser(description="Amazon e-book downloader.")
    parser.add_argument("--verbose", help="show info messages", action="store_true")
    parser.add_argument("--showbrowser", help="display browser while creating session.", action="store_true")
    parser.add_argument("--email", help="Amazon account e-mail address", required=True)
    parser.add_argument("--password", help="Amazon account password", default=None)
    parser.add_argument("--outputdir", help="download directory (default: books)", default="books")
    parser.add_argument("--proxy", help="HTTP proxy server", default=None)
    parser.add_argument("--asin", help="list of ASINs to download", nargs='*')
    args = parser.parse_args()

    if args.verbose:
        logger.setLevel(logging.INFO)
    else:
        logger.setLevel(logging.WARNING)
    formatter = logging.Formatter('[%(levelname)s]\t%(asctime)s %(message)s')
    handler = logging.StreamHandler()
    handler.setFormatter(formatter)
    logger.addHandler(handler)

    # Prompt without echo if no password was given on the command line.
    password = args.password
    if not password:
        password = getpass.getpass("Your Amazon password: ")

    if os.path.isfile(args.outputdir):
        logger.error("Output directory is a file!")
        return 1
    os.makedirs(args.outputdir, exist_ok=True)

    cookies, csrf_token = create_session(args.email, password,
        browser_visible=args.showbrowser, proxy=args.proxy)
    # No --asin (or an empty list) means: download the whole library.
    if not args.asin:
        asins = get_asins(user_agent, cookies, csrf_token)
    else:
        asins = args.asin

    devices = get_devices(user_agent, cookies, csrf_token)
    if not devices:
        logger.error("No device with a serial number found!")
        return 1
    print("Please choose which device you want to download your e-books to!")
    for i, device in enumerate(devices):
        print(" {}. {}".format(i, device['deviceAccountName']))
    while True:
        try:
            choice = int(input("Device #: "))
        except ValueError:
            # Ask again instead of falling through with an undefined choice.
            logger.error("Not a number!")
            continue
        if choice in range(len(devices)):
            break

    download_books(user_agent, cookies, devices[choice], asins, args.outputdir)

    print("\n\nAll done!\nNow you can use apprenticeharper's DeDRM tools "
          "(https://github.com/apprenticeharper/DeDRM_tools)\n"
          "with the following serial number to remove DRM: " +
          devices[choice]['deviceSerialNumber'])

if __name__ == '__main__':
    try:
        sys.exit(main())
    except KeyboardInterrupt:
        logger.info("Exiting...")



--------------------------------------------------------------------------------