├── .DS_Store
├── .gitignore
├── Assets
│   ├── readwise-python_1.png
│   ├── readwise-python_2.png
│   ├── readwise-python_3.png
│   ├── readwise-python_4.png
│   ├── readwise-python_5.png
│   └── readwise-python_6.png
├── README.md
├── readwise-GET.py
├── readwise-GET_install.py
└── readwiseMetadata.py.default

/.DS_Store:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/nicolevanderhoeven/readwise2directory/972f91d1a79ccc1d7b4bb359cc50a51ea364595e/.DS_Store
--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
 1 | # Byte-compiled / optimized / DLL files
 2 | __pycache__/
 3 | *.py[cod]
 4 | 
 5 | # C extensions
 6 | *.so
 7 | 
 8 | # Distribution / packaging
 9 | bin/
10 | build/
11 | develop-eggs/
12 | dist/
13 | eggs/
14 | lib/
15 | lib64/
16 | parts/
17 | sdist/
18 | var/
19 | *.egg-info/
20 | .installed.cfg
21 | *.egg
22 | 
23 | # Installer logs
24 | pip-log.txt
25 | pip-delete-this-directory.txt
26 | 
27 | # Unit test / coverage reports
28 | .tox/
29 | .coverage
30 | .cache
31 | nosetests.xml
32 | coverage.xml
33 | 
34 | # Translations
35 | *.mo
36 | 
37 | # Mr Developer
38 | .mr.developer.cfg
39 | .project
40 | .pydevproject
41 | 
42 | # Rope
43 | .ropeproject
44 | 
45 | # Django stuff:
46 | *.log
47 | *.pot
48 | 
49 | # Sphinx documentation
50 | docs/_build/
51 | 
52 | # Metadata with access token
53 | readwiseMetadata.py
54 | 
55 | # readWise Categories
56 | readwiseCategories/
57 | 
58 | # Extra log
59 | readwiseGET_scheduled.log
60 | 
--------------------------------------------------------------------------------
/Assets/readwise-python_1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/nicolevanderhoeven/readwise2directory/972f91d1a79ccc1d7b4bb359cc50a51ea364595e/Assets/readwise-python_1.png
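The `.gitignore` above excludes `readwiseMetadata.py` because it holds your personal Readwise access token. As a purely illustrative sketch of that file (not the actual contents of `readwiseMetadata.py.default` — the names `token`, `email`, `pwd`, `chromedriverDirectory`, `sourceDirectory` and `fetchTagsBoolean` appear in `readwise-GET.py` below, the rest come from the README, and every value here is a placeholder):

```python
# Hypothetical sketch of readwiseMetadata.py -- NOT the file shipped in this
# repo; rename readwiseMetadata.py.default and fill in your own values.
token = "YOUR_READWISE_ACCESS_TOKEN"   # required, from https://readwise.io/access_token
targetDirectory = "/path/to/vault"     # required, where markdown notes are written
sourceDirectory = "/path/to/readwise2directory"  # folder containing readwise-GET.py
dateFrom = ""                          # optional, "YYYY-MM-DD"; empty = use last script run
fetchTagsBoolean = False               # optional, True scrapes tags via Selenium
email = ""                             # optional, Readwise login used for tag fetching
pwd = ""                               # optional, Readwise password
chromedriverDirectory = ""             # optional, path to the chromedriver binary
highlightLimitToFetchTags = 10         # optional, recommended default per the README
```

Keeping all credentials in this one ignored module is what lets the rest of the scripts be committed safely.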
--------------------------------------------------------------------------------
/Assets/readwise-python_2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/nicolevanderhoeven/readwise2directory/972f91d1a79ccc1d7b4bb359cc50a51ea364595e/Assets/readwise-python_2.png
--------------------------------------------------------------------------------
/Assets/readwise-python_3.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/nicolevanderhoeven/readwise2directory/972f91d1a79ccc1d7b4bb359cc50a51ea364595e/Assets/readwise-python_3.png
--------------------------------------------------------------------------------
/Assets/readwise-python_4.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/nicolevanderhoeven/readwise2directory/972f91d1a79ccc1d7b4bb359cc50a51ea364595e/Assets/readwise-python_4.png
--------------------------------------------------------------------------------
/Assets/readwise-python_5.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/nicolevanderhoeven/readwise2directory/972f91d1a79ccc1d7b4bb359cc50a51ea364595e/Assets/readwise-python_5.png
--------------------------------------------------------------------------------
/Assets/readwise-python_6.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/nicolevanderhoeven/readwise2directory/972f91d1a79ccc1d7b4bb359cc50a51ea364595e/Assets/readwise-python_6.png
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
 1 | # Description
 2 | 
 3 | Fetch new books and highlights from Readwise and print the results as markdown files in a chosen directory (e.g. an Obsidian vault).
 4 | 
 5 | I'm a huge fan of [Readwise](https://readwise.io/) and [Obsidian](https://obsidian.md/), and I hope this is helpful to others like me who wanted something a bit different from the basic markdown export (beta).
 6 | 
 7 | # Features
 8 | 
 9 | - Fetch all or a subset of new books and highlights from Readwise (via their [API](https://readwise.io/api_deets))
10 | - Filter by a custom `date from` or the `last script run` date
11 | - Group and sort highlights by book/article/podcast/tweet
12 | - Create new markdown notes or append to existing ones in a chosen directory (e.g. an Obsidian vault)
13 | - Filenames are formatted using [slugify](https://docs.djangoproject.com/en/3.1/ref/utils/)
14 | - Highlights with the 'discard' tag are removed
15 | - Books with no highlights are ignored
16 | - Markdown notes are formatted as:
17 |     - Book metadata - in YAML format
18 |         - Title
19 |         - Author
20 |         - Number of highlights
21 |         - Last updated date - formatted as "YYMMDD dddd" in wikilinks
22 |         - Readwise URL
23 |     - Title - as a heading 1
24 |     - Highlight data
25 |         - Text
26 |         - Block reference ID - using the Readwise highlight ID as the unique block reference
27 |         - Note
28 |         - Tags - optional
29 |         - References (e.g.
original URL)
30 |         - Date - formatted as "YYMMDD dddd" in wikilinks
31 | - Store book and highlight data in JSON files for easy retrieval and manipulation
32 | - Print outputs to the console and store them in a log file for troubleshooting
33 | 
34 | # Screenshots
35 | 
36 | ##### Markdown note with book metadata as YAML frontmatter
37 | ![](Assets/readwise-python_1.png)
38 | 
39 | ##### Cover images with hyperlinks to their source URLs in Readwise
40 | ![](Assets/readwise-python_6.png)
41 | 
42 | ##### Highlight data with Readwise highlight IDs as unique block references
43 | ![](Assets/readwise-python_2.png)
44 | 
45 | ##### Markdown note with headings (h1-h5) from Readwise
46 | ![](Assets/readwise-python_3.png)
47 | 
48 | ##### Graph view of results
49 | ![](Assets/readwise-python_4.png)
50 | 
51 | ##### Log file of outputs
52 | ![](Assets/readwise-python_5.png)
53 | 
54 | # Installation
55 | 
56 | - Clone this repo or download the ZIP folder and move it to a chosen directory - this will serve as the `sourceDirectory` for running the scripts
57 | - Make sure the `readwiseCategories` folder is in the same directory as the `readwise-GET.py` script. This will store your JSON files.
58 | - Configure the `readwiseMetadata.py` file:
59 |     - Required
60 |         - Rename `readwiseMetadata.py.default` to `readwiseMetadata.py`.
61 |         - Add your token from https://readwise.io/access_token
62 |         - Specify a valid `targetDirectory` path for your markdown notes (e.g. Dropbox folder, Obsidian vault).
63 |     - Optional
64 |         - Customise the request query string - add a `dateFrom` (formatted as "YYYY-MM-DD"); otherwise the `last script run` date will be used (if available), or all highlights will be fetched
65 |         - Add your `email` and `password`
66 |         - Specify a `chromedriverDirectory` - instructions [here](https://chromedriver.chromium.org/)
67 |         - Update the `highlightLimitToFetchTags` - the default is 10 (recommended)
68 |         - Specify a valid `downloadsDirectory`
69 | - Install the Python modules specified in `readwise-GET_install.py` via [pip](https://packaging.python.org/tutorials/installing-packages/)
70 | - Open the terminal or command prompt and navigate to the `sourceDirectory` (i.e. the downloaded folder) - e.g. `cd C:/Users/johnsmith/Downloads/readwise2directory-main`
71 | - Run the `readwise-GET.py` script
72 |     - `py readwise-GET.py` (on Windows) or `python3.9 readwise-GET.py` (on Mac)
73 |     - Note: it takes roughly 3 minutes to process ~1300 books, ~6200 highlights and ~2900 tags
74 | 
75 | # Disclaimers
76 | 
77 | - This is NOT an official plugin or integration, so please use it mindfully.
78 | - This is my first real contribution on GitHub, so I'm open to any and all feedback.
79 | 
80 | # Requirements
81 | - A Readwise account and a valid access token (https://readwise.io/access_token)
82 | - Python 3.9.0+ (https://www.python.org/downloads/)
83 | 
84 | # Contributions
85 | 
86 | [![Donate](https://img.shields.io/badge/Donate-PayPal-green.svg)](https://www.paypal.com/paypalme/nicrivard)
87 | 
88 | If you like this plugin, please consider donating; I really appreciate any and all support!
❤️ 89 | -------------------------------------------------------------------------------- /readwise-GET.py: -------------------------------------------------------------------------------- 1 | ############################## 2 | ### Import python packages ### 3 | ############################## 4 | 5 | import requests, os, io, sys, shutil, django, json, time 6 | #from datetime import datetime 7 | import datetime 8 | from itertools import groupby 9 | from operator import itemgetter 10 | from unidecode import unidecode 11 | from pathvalidate import ValidationError, validate_filepath 12 | from pathlib import Path 13 | from django.utils.text import slugify 14 | from json import JSONEncoder 15 | from json.decoder import JSONDecodeError 16 | import pandas as pd 17 | import numpy as np 18 | from selenium import webdriver 19 | from selenium.webdriver.common.by import By 20 | from selenium.common.exceptions import TimeoutException, NoSuchElementException 21 | from selenium.webdriver.common.keys import Keys 22 | from selenium.webdriver.chrome.options import Options 23 | from selenium.webdriver.support.ui import WebDriverWait 24 | from selenium.webdriver.support import expected_conditions as EC 25 | 26 | ########################## 27 | ### Log script outputs ### 28 | ########################## 29 | 30 | old_stdout = sys.stdout 31 | 32 | old_cwd = os.getcwd() 33 | 34 | startTime = datetime.datetime.now() 35 | 36 | def logDateTimeOutput(message): 37 | log_file = open('readwiseGET.log', 'a') 38 | sys.stdout = log_file 39 | now = datetime.datetime.now() 40 | print(now.strftime("%Y-%m-%dT%H:%M:%SZ") + " " + str(message)) 41 | sys.stdout = old_stdout 42 | log_file.close() 43 | 44 | logDateTimeOutput('Script started') 45 | 46 | ######################## 47 | ### Create functions ### 48 | ######################## 49 | 50 | # Check if a directory variable is defined and formatted correctly 51 | # If TRUE, add a new system path for that. If FALSE, do nothing. 
52 | def insertPath(directory):
53 |     if directory == "" or directory is None:
54 |         return
55 |     else:
56 |         try:
57 |             validate_filepath(directory); sys.path.insert(1, directory)
58 |         except ValidationError as e:
59 |             logDateTimeOutput(e)
60 | 
61 | # Check if a 'dateFrom' variable is defined and formatted correctly
62 | # If TRUE, convert to UTC format. If FALSE, default to dateLastScriptRun
63 | def convertDateFromToUtcFormat(dateFrom):
64 |     if dateFrom == "" or dateFrom is None:
65 |         lastScriptRunDateMatchingString = ' Script complete'
66 |         try:
67 |             for line in reversed(list(open('readwiseGET.log', 'r').readlines())):
68 |                 if lastScriptRunDateMatchingString in line:
69 |                     dateLastScriptRun = str(line.replace(lastScriptRunDateMatchingString, '')).rstrip("\n")
70 |                     dateFrom = dateLastScriptRun
71 |                     message = 'Last successful script run = "' + str(dateFrom) + '" used as dateFrom in query string'
72 |                     logDateTimeOutput(message)
73 |                     print(message)
74 |                     return dateLastScriptRun
75 |         except IOError:
76 |             logDateTimeOutput('Failed to read readwiseGET.log file')
77 |     elif dateFrom != "" and dateFrom is not None:
78 |         try:
79 |             dateFrom = datetime.datetime.strptime(dateFrom, '%Y-%m-%d')
80 |             dateFrom = dateFrom.strftime("%Y-%m-%dT%H:%M:%SZ")
81 |             message = 'Date from = "' + str(dateFrom) + '" from readwiseMetadata used in query string'
82 |             logDateTimeOutput(message)
83 |             print(message)
84 |             return dateFrom
85 |         except ValueError:
86 |             logDateTimeOutput("Incorrect date format. It should be 'YYYY-MM-DD'")
87 |     else:
88 |         message = 'No dateFrom variable defined in readwiseMetadata or readwiseGET.log.
Fetching all readwise highlights' 89 | logDateTimeOutput(message) 90 | print(message) 91 | 92 | def replaceNoneInListOfDict(listOfDicts): 93 | for i in range(len(listOfDicts)): 94 | for k, v in iter(listOfDicts[i].items()): 95 | if k == 'location' and v is None: 96 | listOfDicts[i][k] = 0 97 | if k == 'location_type' and v == 'none': 98 | listOfDicts[i][k] = 'custom' 99 | 100 | ###################################################### 101 | ### Manipulating book and highlight data with JSON ### 102 | ###################################################### 103 | 104 | # Load JSON file into list of categories objects 105 | def loadBookDataFromJsonToObject(): 106 | for i in range(len(categoriesObjectNames)): 107 | try: 108 | with open(sourceDirectory + "/readwiseCategories/" + categoriesObjectNames[i] + ".json", 'r') as infile: 109 | try: 110 | categoriesObject[i] = json.load(infile) # list of categories objects with up-to-date data loaded from JSON files 111 | message = str(len(categoriesObject[i])) + ' books loaded from ' + str(categoriesObjectNames[i]) + '.json' 112 | logDateTimeOutput(message) 113 | except JSONDecodeError: 114 | categoriesObject[i] = [] 115 | except FileNotFoundError: 116 | categoriesObject[i] = [] 117 | 118 | # Check if 'book_id' exists already. 
If no, append book data to the relevant category object 119 | def appendBookDataToObject(): 120 | newBooksCounter = 0 121 | updatedBooksCounter = 0 122 | totalNumberOfBooks = len(booksListResultsSort) 123 | print('totalNumber of Books = ' + str(totalNumberOfBooks)) 124 | for key, value in booksListResultsGroup: # key = 'category' 125 | old_newBooksCounter = newBooksCounter 126 | old_updatedBooksCounter = updatedBooksCounter 127 | for data in value: 128 | book_id = str(data['id']) 129 | title = unidecode(data['title']) 130 | if(str(data['author']) == "None"): 131 | author = " " 132 | else: 133 | author = unidecode(data['author']) 134 | source = data['category'] 135 | num_highlights = data['num_highlights'] 136 | updated = data['updated'] 137 | cover_image_url = data['cover_image_url'] 138 | url = data['highlights_url'] 139 | source_url = data['source_url'] 140 | highlights = [] 141 | values = { "book_id" : book_id, "title" : title, "author" : author, "source" : source, "url" : url, "cover_image_url" : cover_image_url, "source_url" : source_url, "num_highlights" : num_highlights, "updated" : updated, "highlights" : highlights } 142 | indexCategory = categoriesObjectNames.index(source) # Identify which position the 'category' corresponds to within the list of category objects 143 | print('title = ' + title) 144 | if not any(d["book_id"] == book_id for d in categoriesObject[indexCategory]): 145 | categoriesObject[indexCategory].append(values) 146 | newBooksCounter += 1 147 | print(str((newBooksCounter + updatedBooksCounter)) + '/' + str(len(booksListResultsSort)) + ' books added or updated') 148 | print('New title = ' + title) 149 | else: 150 | indexBook = list(map(itemgetter('book_id'), categoriesObject[indexCategory])).index(book_id) 151 | categoriesObject[indexCategory][indexBook]['book_id'] = book_id 152 | categoriesObject[indexCategory][indexBook]['title'] = title 153 | categoriesObject[indexCategory][indexBook]['author'] = author 154 | 
categoriesObject[indexCategory][indexBook]['source'] = source 155 | categoriesObject[indexCategory][indexBook]['num_highlights'] = num_highlights 156 | categoriesObject[indexCategory][indexBook]['updated'] = updated 157 | categoriesObject[indexCategory][indexBook]['cover_image_url'] = cover_image_url 158 | categoriesObject[indexCategory][indexBook]['url'] = url 159 | categoriesObject[indexCategory][indexBook]['source_url'] = source_url 160 | updatedBooksCounter += 1 161 | print(str((newBooksCounter + updatedBooksCounter)) + '/' + str(len(booksListResultsSort)) + ' books added or updated') 162 | new_newBooksCounter = newBooksCounter 163 | new_updatedBooksCounter = updatedBooksCounter 164 | message = str(new_newBooksCounter - old_newBooksCounter) + ' new books added and ' + str(new_updatedBooksCounter - old_updatedBooksCounter) + ' updated in ' + str(categoriesObjectNames[indexCategory]) + ' object' 165 | logDateTimeOutput(message) 166 | 167 | # Check if 'highlight_id' exists already. If no, append highlight data to the relevant 'book_id' within the category object 168 | def appendHighlightDataToObject(): 169 | newHighlightsCounter = 0 170 | updatedHighlightsCounter = 0 171 | for key, value in highlightsListResultsGroup: # key = 'book_id' 172 | listCategories = [item for category in categoriesObject for item in category] 173 | if any(d.get('book_id') == str(key) for d in listCategories): # Check if the 'book_id' from the grouped highlights exists. 
174 | index = list(map(itemgetter('book_id'), listCategories)).index(str(key)) 175 | source = listCategories[index]['source'] # Get the 'category' of the corresponding 'book_id' from the grouped highlights 176 | indexCategory = categoriesObjectNames.index(source) # Identify which position the 'category' corresponds to within the list of category objects 177 | indexBook = list(map(itemgetter('book_id'), categoriesObject[indexCategory])).index(str(key)) # Identify which position the 'book_id' corresponds to within the category object 178 | for data in value: 179 | id = str(data['id']) 180 | note = unidecode(data['note']) 181 | location = str(data['location']) 182 | location_type = data['location_type'] 183 | book_id = str(data['book_id']) 184 | url = str(data['url']) 185 | highlighted_at = str(data['highlighted_at']) 186 | updated = str(data['updated']) 187 | text = unidecode(data['text']) 188 | tags = [] 189 | # If the source is a book, add 10 hours to the highlighted at date to account for timezone difference between me and AKST (Amazon's highlighted at timezone) 190 | # if (str(source) == 'books'): 191 | # highlighted_at = str(data['highlighted_at']) # 2021-02-06T04:56:00Z 192 | # highlighted_at = datetime.datetime.strptime(highlighted_at, "%Y-%m-%dT%H:%M:%SZ") 193 | # highlighted_at = highlighted_at + datetime.timedelta(hours=10) 194 | # highlighted_at = highlighted_at.strftime("%Y-%m-%dT%H:%M:%SZ") 195 | # print(' appendHighlightDataToObject highlighted_at =', highlighted_at) 196 | # highlight = { "id" : id, "text" : text, "note" : note, "tags" : tags, "location" : location, "location_type" : location_type, "url" : url, "highlighted_at" : highlighted_at, "updated" : updated } 197 | if not any(d["id"] == id for d in categoriesObject[indexCategory][indexBook]['highlights']): 198 | highlight = { "id" : id, "text" : text, "note" : note, "tags" : tags, "location" : location, "location_type" : location_type, "url" : url, "highlighted_at" : highlighted_at, "updated" : 
updated } 199 | categoriesObject[indexCategory][indexBook]['highlights'].append(highlight) 200 | sorted(categoriesObject[indexCategory][indexBook]['highlights'], key = itemgetter('location')) 201 | newHighlightsCounter += 1 202 | listOfBookIdsToUpdateMarkdownNotes.append([str(key), str(source)]) 203 | print(str((newHighlightsCounter + updatedHighlightsCounter)) + '/' + str(len(highlightsListResultsSort)) + ' highlights added or updated') 204 | else: 205 | indexHighlight = list(map(itemgetter('id'), categoriesObject[indexCategory][indexBook]['highlights'])).index(id) # Should be the same as 'data' 206 | tags = categoriesObject[indexCategory][indexBook]['highlights'][indexHighlight]['tags'] 207 | highlight = { "id" : id, "text" : text, "note" : note, "tags" : tags, "location" : location, "location_type" : location_type, "url" : url, "highlighted_at" : highlighted_at, "updated" : updated } 208 | categoriesObject[indexCategory][indexBook]['highlights'][indexHighlight] = highlight 209 | sorted(categoriesObject[indexCategory][indexBook]['highlights'], key = itemgetter('location')) 210 | updatedHighlightsCounter += 1 211 | listOfBookIdsToUpdateMarkdownNotes.append([str(key), str(source)]) 212 | print(str((newHighlightsCounter + updatedHighlightsCounter)) + '/' + str(len(highlightsListResultsSort)) + ' highlights added or updated') 213 | message = str(newHighlightsCounter) + ' new highlights added and ' + str(updatedHighlightsCounter) + ' updated (excl tags)' # '.json' 214 | logDateTimeOutput(message) 215 | 216 | def appendTagsToHighlightObject(list_highlights): 217 | if fetchTagsBoolean is False: 218 | return 219 | else: 220 | if len(list_highlights) == 0: 221 | return 222 | else: 223 | # Open new Chrome window via Selenium 224 | print('Opening new Chrome browser window...') 225 | options = webdriver.ChromeOptions() 226 | options.add_argument('--ignore-certificate-errors') 227 | options.add_argument('--incognito') 228 | options.add_argument('--headless') 229 | 
options.add_argument('--log-level=3') # to stop logging 230 | options.add_argument("start-maximized") 231 | driver = webdriver.Chrome(chromedriverDirectory, options=options) 232 | # driver = webdriver.Chrome(chromedriverDirectory) 233 | driver.get('https://readwise.io/accounts/login') 234 | print('Logging into readwise using credentials provided in readwiseMetadata') 235 | # Input email as username from readwiseMetadata 236 | username = driver.find_element_by_xpath("//*[@id='id_login']") 237 | username.clear() 238 | username.send_keys(email) # from 'readwiseMetadata' 239 | # Input password from readwiseMetadata 240 | password = driver.find_element_by_xpath("//*[@id='id_password']") 241 | password.clear() 242 | password.send_keys(pwd) # from 'readwiseMetadata' 243 | # Click login button 244 | driver.find_element_by_xpath("/html/body/div[1]/div/div/div/div/div/div/form/div[3]/button").click() 245 | print('Log-in successful! Fetching tags...') 246 | # Loop through new highlights 247 | updatedTagsCounter = 0 248 | newOrUpdatedTagsProgressCounter = 0 249 | for i in range(len(list_highlights)): # key = 'book_id' 250 | listCategories = [item for category in categoriesObject for item in category] 251 | key = str(list_highlights[i]['book_id']) 252 | id = str(list_highlights[i]['id']) 253 | index = list(map(itemgetter('book_id'), listCategories)).index(key) 254 | source = listCategories[index]['source'] # Get the 'category' of the corresponding 'book_id' from the grouped highlights 255 | indexCategory = categoriesObjectNames.index(source) # Identify which position the 'category' corresponds to within the list of category objects 256 | indexBook = list(map(itemgetter('book_id'), categoriesObject[indexCategory])).index(str(key)) # Identify which position the 'book_id' corresponds to within the category object 257 | bookLastUpdated = categoriesObject[indexCategory][indexBook]['updated'] 258 | indexHighlight = list(map(itemgetter('id'), 
categoriesObject[indexCategory][indexBook]['highlights'])).index(id)
259 |             # highlights = categoriesObject[indexCategory][indexBook]['highlights']
260 |             book_id = categoriesObject[indexCategory][indexBook]['book_id']
261 |             bookReviewUrl = 'https://readwise.io/bookreview/' + book_id
262 |             # Open new tab in Chrome window
263 |             driver.find_element_by_tag_name('body').send_keys(Keys.COMMAND + 't')
264 |             # driver.find_element_by_tag_name('body').send_keys(Keys.CONTROL + 't')
265 |             driver.get(bookReviewUrl)
266 |             # Loop through tags and append to highlight object within corresponding book object
267 |             try:
268 |                 xPathHighlightId = "//*[@id=\'highlight" + id + "\']"
269 |                 highlightIdBlock = WebDriverWait(driver, 10).until(
270 |                     EC.presence_of_element_located((By.XPATH, xPathHighlightId))
271 |                 )
272 |                 tagLinks = highlightIdBlock.find_elements_by_class_name("tag-link") # Get tags within 'highlight id' block
273 |                 # Load original tags (if they exist)
274 |                 originalTags = categoriesObject[indexCategory][indexBook]['highlights'][indexHighlight]['tags']
275 |                 originalTags = sorted(originalTags)
276 |                 originalTagsCounter = len(originalTags)
277 |                 # Skip highlights with no tags
278 |                 if tagLinks == []:
279 |                     continue
280 |                 newTags = []
281 |                 for tag in tagLinks:
282 |                     originalHref = tag.get_attribute("href") # e.g. https://readwise.io/tags/
283 |                     trimHref = originalHref.replace('https://readwise.io/tags/', '')
284 |                     if trimHref == 'readwise': # skip the automatic 'readwise' tag
285 |                         continue
286 | newTags.append(trimHref) 287 | newTags = sorted(newTags) 288 | newTagsCounter = len(newTags) 289 | if originalTags == newTags: 290 | pass 291 | else: 292 | categoriesObject[indexCategory][indexBook]['highlights'][indexHighlight]['tags'] = newTags 293 | categoriesObject[indexCategory][indexBook]['highlights'][indexHighlight]['updated'] = bookLastUpdated 294 | updatedTagsCounter += abs((newTagsCounter - originalTagsCounter)) 295 | newOrUpdatedTagsProgressCounter += 1 296 | listOfBookIdsToUpdateMarkdownNotes.append([str(key), str(source)]) 297 | print(str(newOrUpdatedTagsProgressCounter) + '/' + str(len(list_highlights)) + ' highlights updated with tags') 298 | except: 299 | message = 'Error looping through tags in highlight id block "' + str(id) + '". Book id: "' + str(book_id) + '". Book URL: "' + str(bookReviewUrl) + '". File: "' \ 300 | + str(categoriesObjectNames[indexCategory]) + '.json". Book location: "' + str(indexBook) + '". Highlight location: "' + str(indexHighlight) + '".' 
301 | logDateTimeOutput(message) 302 | pass 303 | driver.quit() 304 | try: 305 | message = str(updatedTagsCounter) + ' tags added or updated to ' + str(len(list_highlights)) + ' highlights in ' + str(categoriesObjectNames[indexCategory]) + ' object' 306 | logDateTimeOutput(message) 307 | except UnboundLocalError: 308 | message = 'No tags to add or update' 309 | logDateTimeOutput(message) 310 | 311 | def appendUpdatedHighlightsToObject(): 312 | listOfBookIdsFromBooksList = [] 313 | listOfBookIdsFromHighlightsList = [] 314 | listofBookIdsWithMissingHighlights = [] 315 | for i in range(len(booksListResultsSort)): 316 | listOfBookIdsFromBooksList.append(str(booksListResultsSort[i]['id'])) 317 | for i in range(len(highlightsListResultsSort)): 318 | listOfBookIdsFromHighlightsList.append(str(highlightsListResultsSort[i]['book_id'])) 319 | listOfBookIdsFromHighlightsList = list(dict.fromkeys(listOfBookIdsFromHighlightsList)) # Remove duplicates 320 | for i in range(len(listOfBookIdsFromBooksList)): 321 | if listOfBookIdsFromBooksList[i] not in listOfBookIdsFromHighlightsList: 322 | listofBookIdsWithMissingHighlights.append(str(listOfBookIdsFromBooksList[i])) 323 | else: 324 | pass 325 | for i in range(len(listofBookIdsWithMissingHighlights)): 326 | missingHighlightsListQueryString = { 327 | "page_size": 1000, # 1000 items per page - maximum 328 | "page": 1, # Page 1 >> build for loop to cycle through pages and stop when complete 329 | "book_id": listofBookIdsWithMissingHighlights[i], 330 | } 331 | # Trigger GET request with missingHighlightsListQueryString 332 | missingHighlightsList = requests.get( 333 | url="https://readwise.io/api/v2/highlights/", 334 | headers={"Authorization": "Token " + token}, # token imported from readwiseAccessToken file 335 | params=missingHighlightsListQueryString # query string object 336 | ) 337 | # Convert response into JSON object 338 | try: 339 | missingHighlightsListJson = missingHighlightsList.json() # type(missingHighlightsListJson) = 
'dictionary' 340 | except ValueError: 341 | message = 'Response content from missingHighlightsList request is not valid JSON' 342 | logDateTimeOutput(message) 343 | print(message) # Originally from https://github.com/psf/requests/issues/4908#issuecomment-627486125 344 | break 345 | # JSONDecodeError: Expecting value: line 1 column 1 (char 0) specifically happens with an empty string (i.e. empty response content) 346 | try: 347 | # Create dictionary of missingHighlightsListJson['results'] 348 | missingHighlightsListResults = missingHighlightsListJson['results'] # type(highlightsListResults) = 'list' 349 | except NameError: 350 | message = 'Cannot extract results from empty JSON for missingHighlightsList request' 351 | logDateTimeOutput(message) 352 | print(message) 353 | break 354 | # Loop through pagination using 'next' property from GET response 355 | try: 356 | additionalLoopCounter = 0 357 | while missingHighlightsListJson['next']: 358 | additionalLoopCounter += 1 359 | print('Fetching additional missing highlight data from readwise... (page ' + str(additionalLoopCounter) + ')') 360 | missingHighlightsList = requests.get( 361 | url=missingHighlightsListJson['next'], # keep same query parameters from booksListQueryString object 362 | headers={"Authorization": "Token " + token}, # token imported from readwiseAccessToken file 363 | ) 364 | try: 365 | print('Converting additional missing highlight data returned into JSON... (page ' + str(additionalLoopCounter) + ')') 366 | missingHighlightsListJson = missingHighlightsList.json() # type(missingHighlightsListJson) = 'dictionary' 367 | except ValueError: 368 | message = 'Response content from additional missingHighlightsList request is not valid JSON' 369 | logDateTimeOutput(message) 370 | print(message) # Originally from https://github.com/psf/requests/issues/4908#issuecomment-627486125 371 | # JSONDecodeError: Expecting value: line 1 column 1 (char 0) specifically happens with an empty string (i.e. 
empty response content) 372 | break 373 | try: 374 | missingHighlightsListResults.extend(missingHighlightsListJson['results']) 375 | except NameError: 376 | message = 'Cannot extract results from empty JSON for additional missingHighlightsList request' 377 | logDateTimeOutput(message) 378 | print(message) 379 | break 380 | except NameError: 381 | message = 'Cannot loop through pagination from empty response' 382 | logDateTimeOutput(message) 383 | print(message) 384 | break 385 | # Replace 'location': None and 'location_type': 'none' values in list of dictionaries 386 | replaceNoneInListOfDict(missingHighlightsListResults) 387 | # Sort highlightsListResults data by 'book_id' key and 'location' 388 | missingHighlightsListResultsSort = sorted(missingHighlightsListResults, key = itemgetter('location')) 389 | newMissingHighlightsCounter = 0 390 | updatedMissingHighlightsCounter = 0 391 | if len(missingHighlightsListResults) == 0: 392 | break 393 | else: 394 | try: 395 | for j in range(len(missingHighlightsListResultsSort)): 396 | listCategories = [item for category in categoriesObject for item in category] 397 | book_id = str(missingHighlightsListResultsSort[j]['book_id']) 398 | index = list(map(itemgetter('book_id'), listCategories)).index(book_id) 399 | source = listCategories[index]['source'] # Get the 'category' of the corresponding 'book_id' from the grouped highlights 400 | indexCategory = categoriesObjectNames.index(source) # Identify which position the 'category' corresponds to within the list of category objects 401 | indexBook = list(map(itemgetter('book_id'), categoriesObject[indexCategory])).index(str(book_id)) # Identify which position the 'book_id' corresponds to within the category object 402 | bookLastUpdated = categoriesObject[indexCategory][indexBook]['updated'] 403 | id = str(missingHighlightsListResultsSort[j]['id']) 404 | note = unidecode(missingHighlightsListResultsSort[j]['note']) 405 | location = 
str(missingHighlightsListResultsSort[j]['location'])
406 |                 location_type = missingHighlightsListResultsSort[j]['location_type']
407 |                 url = str(missingHighlightsListResultsSort[j]['url'])
408 |                 # If the source is a book, add 10 hours to the highlighted at date to account for timezone difference between me and AKST (Amazon's highlighted at timezone)
409 |                 highlighted_at = str(missingHighlightsListResultsSort[j]['highlighted_at']) # 2021-02-06T04:56:00Z
410 |                 if (str(source) == 'books'):
411 |                     highlighted_at = datetime.datetime.strptime(highlighted_at, "%Y-%m-%dT%H:%M:%SZ")
412 |                     highlighted_at = highlighted_at + datetime.timedelta(hours=10)
413 |                     highlighted_at = highlighted_at.strftime("%Y-%m-%dT%H:%M:%SZ")
414 |                     print('appendUpdatedHighlightsToObject highlighted_at =', highlighted_at)
415 | 
416 |                 updated = str(missingHighlightsListResultsSort[j]['updated'])
417 |                 text = unidecode(missingHighlightsListResultsSort[j]['text'])
418 |                 tags = []
419 |                 highlight = { "id" : id, "text" : text, "note" : note, "tags" : tags, "location" : location, "location_type" : location_type, "url" : url, "highlighted_at" : highlighted_at, "updated" : updated }
420 |                 if not any(d["id"] == id for d in categoriesObject[indexCategory][indexBook]['highlights']):
421 |                     categoriesObject[indexCategory][indexBook]['highlights'].append(highlight)
422 |                     sorted(categoriesObject[indexCategory][indexBook]['highlights'], key = itemgetter('location'))
423 |                     newMissingHighlightsCounter += 1
424 |                     listOfBookIdsToUpdateMarkdownNotes.append([str(book_id), str(source)])
425 |                     print(str((newMissingHighlightsCounter + updatedMissingHighlightsCounter)) + '/' + str(len(missingHighlightsListResultsSort)) + ' missing highlights added or updated')
426 |                 else:
427 |                     indexHighlight = list(map(itemgetter('id'), categoriesObject[indexCategory][indexBook]['highlights'])).index(id)
428 |                     if categoriesObject[indexCategory][indexBook]['highlights'][indexHighlight]['text'] ==
text: 429 | if categoriesObject[indexCategory][indexBook]['highlights'][indexHighlight]['note'] == note: 430 | pass 431 | else: 432 | categoriesObject[indexCategory][indexBook]['highlights'][indexHighlight]['note'] = note 433 | categoriesObject[indexCategory][indexBook]['highlights'][indexHighlight]['updated'] = bookLastUpdated 434 | else: 435 | categoriesObject[indexCategory][indexBook]['highlights'][indexHighlight]['text'] = text 436 | categoriesObject[indexCategory][indexBook]['highlights'][indexHighlight]['updated'] = bookLastUpdated 437 | if categoriesObject[indexCategory][indexBook]['highlights'][indexHighlight]['note'] == note: 438 | pass 439 | else: 440 | categoriesObject[indexCategory][indexBook]['highlights'][indexHighlight]['note'] = note 441 | categoriesObject[indexCategory][indexBook]['highlights'].sort(key = itemgetter('location')) # sort in place; a bare sorted() call discards its result 442 | updatedMissingHighlightsCounter += 1 443 | listOfBookIdsToUpdateMarkdownNotes.append([str(book_id), str(source)]) 444 | print(str((newMissingHighlightsCounter + updatedMissingHighlightsCounter)) + '/' + str(len(missingHighlightsListResultsSort)) + ' missing highlights added or updated in ' \ 445 | + str(categoriesObjectNames[indexCategory]) + ' object') 446 | except ValueError: 447 | pass 448 | try: 449 | message = str(updatedMissingHighlightsCounter) + ' highlights updated (incl tags) in ' + str(categoriesObjectNames[indexCategory]) + ' object' # '.json' 450 | logDateTimeOutput(message) 451 | appendHighlightsToListForFetchingTags(missingHighlightsListToFetchTagsFor, missingHighlightsListResultsSort) 452 | appendHighlightsToListForFetchingTags(allHighlightsToFetchTagsFor, missingHighlightsListResultsSort) 453 | # appendTagsToHighlightObject(missingHighlightsListResultsSort) 454 | except UnboundLocalError: 455 | message = 'No missing highlights (incl tags) to update' 456 | logDateTimeOutput(message) 457 | 458 | def appendBookAndHighlightObjectToJson(): 459 | for i in range(len(categoriesObjectNames)): 460 | try: 461
| with open(os.path.join(sourceDirectory, "readwiseCategories", categoriesObjectNames[i] + ".json"), 'w') as outfile: 462 | json.dump(categoriesObject[i], outfile, indent=4) 463 | except FileNotFoundError: # raised when the readwiseCategories directory does not exist yet 464 | os.makedirs(os.path.join(sourceDirectory, "readwiseCategories"), exist_ok=True) # mode 'x' would fail for the same reason, so create the directory and retry 465 | with open(os.path.join(sourceDirectory, "readwiseCategories", categoriesObjectNames[i] + ".json"), 'w') as outfile: json.dump(categoriesObject[i], outfile, indent=4) 466 | 467 | def replaceNoneInListOfDict(listOfDicts): 468 | for i in range(len(listOfDicts)): 469 | for k, v in iter(listOfDicts[i].items()): 470 | if k == 'location' and v is None: 471 | listOfDicts[i][k] = 0 472 | if k == 'location_type' and v == 'none': 473 | listOfDicts[i][k] = 'custom' 474 | if k == 'highlighted_at' and v is None: 475 | listOfDicts[i][k] = str(v) 476 | 477 | def removeHighlightsWithDiscardTag(): 478 | listCategories = list(categoriesObject) 479 | highlightsWithDiscardTagCounter = 0 480 | for i in range(len(listCategories)): 481 | for k in range(len(listCategories[i])): 482 | book_id = str(listCategories[i][k]['book_id']) 483 | source = str(listCategories[i][k]['source']) 484 | originalNumberOfhighlights = listCategories[i][k]['num_highlights'] 485 | indexCategory = categoriesObjectNames.index(source) # Identify which position the 'category' corresponds to within the list of category objects 486 | indexBook = list(map(itemgetter('book_id'), categoriesObject[indexCategory])).index(str(book_id)) # Identify which position the 'book_id' corresponds to within the category object 487 | originalListOfHighlights = listCategories[i][k]['highlights'].copy() 488 | newListOfHighlights = categoriesObject[indexCategory][indexBook]['highlights'].copy() 489 | for n in range(len(originalListOfHighlights)): 490 | try: 491 | if any('discard' in s for s in listCategories[i][k]['highlights'][n]['tags']): 492 | id = listCategories[i][k]['highlights'][n]['id'] 493 | indexHighlight = list(map(itemgetter('id'), newListOfHighlights)).index(str(id)) 494 | newListOfHighlights.pop(indexHighlight)
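As a self-contained illustration of the normalisation performed by `replaceNoneInListOfDict` above (the sample dictionary is hypothetical, and the helper below is a minimal mirror of the function, not the script's actual code):

```python
# Minimal mirror of replaceNoneInListOfDict, with hypothetical input,
# showing the three normalisation rules in isolation.
def replace_none_in_list_of_dict(list_of_dicts):
    for d in list_of_dicts:
        if d.get('location') is None:
            d['location'] = 0               # missing locations become 0 so sorting works
        if d.get('location_type') == 'none':
            d['location_type'] = 'custom'
        if d.get('highlighted_at') is None:
            d['highlighted_at'] = str(None)  # becomes the literal string 'None'

sample = [{'location': None, 'location_type': 'none', 'highlighted_at': None}]
replace_none_in_list_of_dict(sample)
print(sample[0])  # {'location': 0, 'location_type': 'custom', 'highlighted_at': 'None'}
```

This is why the script can later sort highlights with `itemgetter('location')` without a `TypeError` from comparing `None` to integers.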
495 | # listCategories[i][k]['highlights'].pop(n) # Remove highlight with 'discard' tag from list 496 | highlightsWithDiscardTagCounter += 1 497 | except IndexError: 498 | continue 499 | categoriesObject[indexCategory][indexBook]['highlights'] = newListOfHighlights 500 | newNumberOfhighlights = len(newListOfHighlights) 501 | categoriesObject[indexCategory][indexBook]['num_highlights'] = newNumberOfhighlights 502 | if str(originalNumberOfhighlights - newNumberOfhighlights) == '0': 503 | pass 504 | else: 505 | print(str(originalNumberOfhighlights - newNumberOfhighlights) + ' highlights removed from ' + str(listCategories[i][k]['book_id'])) 506 | message = str(highlightsWithDiscardTagCounter) + ' highlights discarded' 507 | logDateTimeOutput(message) 508 | print(message) 509 | 510 | def appendHashtagToTags(): 511 | listCategories = list(categoriesObject) 512 | tagsWithNoHashtag = 0 513 | for i in range(len(listCategories)): 514 | for k in range(len(listCategories[i])): 515 | book_id = str(listCategories[i][k]['book_id']) 516 | source = str(listCategories[i][k]['source']) 517 | indexCategory = categoriesObjectNames.index(source) # Identify which position the 'category' corresponds to within the list of category objects 518 | indexBook = list(map(itemgetter('book_id'), categoriesObject[indexCategory])).index(str(book_id)) # Identify which position the 'book_id' corresponds to within the category object 519 | for n in range(len(listCategories[i][k]['highlights'])): 520 | id = listCategories[i][k]['highlights'][n]['id'] 521 | indexHighlight = list(map(itemgetter('id'), categoriesObject[indexCategory][indexBook]['highlights'])).index(str(id)) 522 | for t in range(len(listCategories[i][k]['highlights'][n]['tags'])): 523 | tag = str(listCategories[i][k]['highlights'][n]['tags'][t]) 524 | positionTag = listCategories[i][k]['highlights'][n]['tags'].index(tag) # Should be the same as 't' 525 | if listCategories[i][k]['highlights'][n]['tags'][t].startswith('#'): 526 | pass 527 | 
else: 528 | categoriesObject[indexCategory][indexBook]['highlights'][indexHighlight]['tags'][positionTag] = '#' + \ 529 | categoriesObject[indexCategory][indexBook]['highlights'][indexHighlight]['tags'][positionTag] 530 | # listCategories[i][k]['highlights'][n]['tags'][t] = '#' + listCategories[i][k]['highlights'][n]['tags'][t] 531 | tagsWithNoHashtag += 1 532 | message = str(tagsWithNoHashtag) + ' tags updated with hashtags' 533 | print(message) 534 | 535 | # Set boolean value to determine if tags should be fetched (default = True) 536 | # If any of the optional input variables in readwiseMetadata are blank or missing, set boolean to False 537 | fetchTagsBoolean = True 538 | 539 | def fetchTagsTrueOrFalse(fetchTagsBoolean, inputVariable): 540 | if fetchTagsBoolean is False: 541 | return False 542 | elif inputVariable == "" or inputVariable is None: 543 | return False 544 | else: 545 | return True 546 | 547 | ################################################ 548 | ### Load CSV export into dataframe and lists ### 549 | ################################################ 550 | 551 | def latest_download_file(): 552 | path = sourceDirectory 553 | files = sorted(os.listdir(path), key=lambda f: os.path.getmtime(os.path.join(path, f))) # list the download directory itself (not the cwd) and resolve modification times against full paths 554 | newest = files[-1] 555 | return newest 556 | 557 | def download_wait(): 558 | seconds = 0 559 | dl_wait = True 560 | while dl_wait and seconds < 20: 561 | time.sleep(1) 562 | dl_wait = False 563 | for fname in os.listdir(sourceDirectory): 564 | if fname.endswith('.crdownload'): 565 | dl_wait = True 566 | seconds += 1 567 | newest_file = latest_download_file() 568 | return newest_file 569 | 570 | ####### V2.0 ####### 571 | 572 | # Use Selenium to export CSV extract of highlight data, and save in sourceDirectory 573 | def downloadCsvExport(latestDownloadedFileName): # with_ublock=False, chromedriverDirectory=None 574 | if fetchTagsBoolean is False: 575 | return 576 | else: 577 | # Open new Chrome window via Selenium 578 | print('Opening new Chrome browser 
window...') 579 | options = webdriver.ChromeOptions() 580 | options.add_argument("--headless") 581 | options.add_argument("window-size=1920,1080") 582 | options.add_argument("--log-level=3") # to stop logging 583 | options.add_argument("--silent") 584 | options.add_argument("--disable-logging") 585 | options.add_argument("--disable-blink-features=AutomationControlled") 586 | options.add_experimental_option('prefs', { 587 | # "download.default_directory": downloadsDirectory, # Set own Download path 588 | "download.prompt_for_download": False, # Do not ask for download at runtime 589 | "download.directory_upgrade": True, # Also needed to suppress download prompt 590 | "w3c": False, # allows selenium to accept cookies with a non-int64 'expiry' value 591 | }) 592 | options.add_experimental_option("excludeSwitches", ["enable-logging", "enable-automation"]) # standalone experimental option, not a pref; removes the 'DevTools listening' log message and helps prevent Cloudflare from detecting ChromeDriver as a bot (duplicate dict keys would silently override each other) 593 | options.add_experimental_option("useAutomationExtension", False) 594 | 595 | driver = webdriver.Chrome( 596 | executable_path=chromedriverDirectory, 597 | options=options, 598 | ) 599 | driver.command_executor._commands["send_command"] = ("POST", '/session/$sessionId/chromium/send_command') 600 | params = {'behavior': 'allow', 'downloadPath': sourceDirectory} 601 | driver.execute_cdp_cmd('Page.setDownloadBehavior', params) 602 | driver.get('https://readwise.io/accounts/login') 603 | print('Logging into readwise using credentials provided in readwiseMetadata') 604 | # Input email as username from readwiseMetadata 605 | WebDriverWait(driver, 10).until( 606 | EC.presence_of_element_located((By.XPATH, "//*[@id='id_login']"))) 607 | username = driver.find_element_by_xpath("//*[@id='id_login']") 608 | username.clear() 609 | username.send_keys(email) # from 'readwiseMetadata' 610 | # Input password from readwiseMetadata 611 | WebDriverWait(driver, 10).until( 612 | EC.presence_of_element_located((By.XPATH, "//*[@id='id_password']"))) 613 | password = 
driver.find_element_by_xpath("//*[@id='id_password']") 614 | password.clear() 615 | password.send_keys(pwd) # from 'readwiseMetadata' 616 | # Click login button 617 | WebDriverWait(driver, 10).until( 618 | EC.presence_of_element_located((By.XPATH, "/html/body/div[1]/div/div/div/div/div/div/form/div[3]/button"))) 619 | driver.find_element_by_xpath("/html/body/div[1]/div/div/div/div/div/div/form/div[3]/button").click() 620 | print('Log-in successful! Redirecting to export page...') 621 | driver.get('https://readwise.io/export') 622 | # Click export CSV button 623 | WebDriverWait(driver, 10).until( 624 | EC.presence_of_element_located((By.XPATH, "//*[@id='MiscApp']/div/div[3]/div/div[1]/div/div[2]/div/button"))) 625 | driver.find_element_by_xpath("//*[@id='MiscApp']/div/div[3]/div/div[1]/div/div[2]/div/button").click() 626 | print('Redirect successful! Waiting for CSV export...') 627 | dlFilename = download_wait() 628 | # rename the downloaded file 629 | shutil.move(dlFilename, os.path.join(sourceDirectory, latestDownloadedFileName)) 630 | message = str(latestDownloadedFileName) + ' successfully added to ' + str(sourceDirectory) 631 | logDateTimeOutput(message) 632 | print(message) 633 | print('Closing Chrome browser window...') 634 | driver.quit() 635 | 636 | # Clean-up list values 637 | def cleanUpListValues(listFromCsv, replacementCharacter): 638 | for i in range(len(listFromCsv)): 639 | if(str(listFromCsv[i]) == "nan"): 640 | listFromCsv[i] = str(replacementCharacter) 641 | else: 642 | listFromCsv[i] = unidecode(str(listFromCsv[i])) 643 | 644 | # Make book titles valid filenames via Django 645 | def convertTitleToValidFilename(listToConvert): 646 | for i in range(len(listToConvert)): 647 | listToConvert[i] = slugify(listToConvert[i]) 648 | # listToConvert[i] = get_valid_filename_django(listToConvert[i]) 649 | 650 | # Convert all book titles to lowercase 651 | def toLowercase(listToConvert): 652 | for i in range(len(listToConvert)): 653 | listToConvert[i] = 
listToConvert[i].lower() 654 | 655 | # Replace empty CSV cells of 'Tags' with "" 656 | def replaceEmptyTagCells(list_Tags): 657 | for i in range(len(list_Tags)): 658 | if(str(list_Tags[i]) == "nan"): 659 | list_Tags[i] = "" 660 | else: 661 | list_Tags[i] = list_Tags[i].replace(',', ' ') 662 | 663 | # Normalise date strings e.g. 2020-01-01T12:59:59Z >> 2020-01-01 12:59:59 664 | def dateStringNormaliser(dateString): 665 | for i in range(len(dateString)): 666 | dateString[i] = dateString[i].replace('T', ' ')[0 : 19] 667 | 668 | # Create empty lists to fill data from CSV 669 | list_Highlight = [] 670 | list_BookTitle = [] 671 | list_BookAuthor = [] 672 | list_AmazonBookId = [] 673 | list_Note = [] 674 | list_Color = [] 675 | list_Tags = [] 676 | list_LocationType = [] 677 | list_Location = [] 678 | list_HighlightedAt = [] 679 | 680 | # Create additional lists to supplement ones provided in the CSV export 681 | list_ReadwiseBookId = [] # 'Readwise Book ID' 682 | list_Source = [] # 'Source' # e.g. 
Articles 683 | list_Url = [] # 'Url' 684 | list_NumberOfHighlights = [] # 'Number of Highlights' 685 | list_UpdatedAt = [] # 'Updated at' 686 | list_HighlightId = [] # 'Highlight ID' 687 | 688 | # Fill newly-created lists with empty values to aid with index matching 689 | def fillListWithEmptyCharacters(listToGetRangeFrom, listToFill): 690 | for i in range(len(listToGetRangeFrom)): 691 | listToFill.append("") 692 | 693 | # Create lists to add extracted highlight data from API calls 694 | # Then we can compare these lists to to those from the CSV export to retrieve highlight id's, book id's, and highlight tags 695 | list_extractedHighlightId = [] 696 | list_extractedHighlightText = [] 697 | list_extractedHighlightTags = [] 698 | list_extractedHighlightBookId = [] 699 | list_extractedHighlightBookTitle = [] 700 | list_extractedHighlightBookAuthor = [] 701 | list_extractedHighlightLocation = [] 702 | list_extractedHighlightedAt = [] 703 | 704 | # Create lists to collect fallouts e.g. no highlight id retrieved from highlight text, highlight text has duplicate values 705 | list_noMatchingHighlightIdFromText = [] 706 | list_noMatchingBookIdFromTitle = [] 707 | list_duplicateHighlightTextValues = [] 708 | 709 | # Fill empty lists with values from highlights list of dictionaries 710 | def fillListsWithHighlightData(listToFill): 711 | for j in range(len(listToFill)): 712 | for k, v in iter(listToFill[j].items()): 713 | if k == 'text': 714 | list_extractedHighlightText[j] = str(v) 715 | if k == 'id': 716 | list_extractedHighlightId[j] = str(v) 717 | if k == 'location': 718 | list_extractedHighlightLocation[j] = str(v) 719 | if k == 'highlighted_at': 720 | list_extractedHighlightedAt[j] = str(v) 721 | if k == 'book_id': 722 | list_extractedHighlightBookId[j] = str(v) 723 | 724 | # Clean-up extracted list values 725 | def cleanUpExtractedListValues(listFromJson): 726 | for i in range(len(listFromJson)): 727 | listFromJson[i] = unidecode(str(listFromJson[i])) 728 | 729 | # Mark 
duplicate values e.g. AirrQuotes 730 | def checkForDuplicates(listToGetRangeFrom, listToCheckDuplicateValues): 731 | for i in range(len(listToGetRangeFrom)): 732 | if listToCheckDuplicateValues.count(listToCheckDuplicateValues[i]) > 1: 733 | list_duplicateHighlightTextValues[i] = 'Duplicate value' 734 | 735 | # Fetch highlight id, book id, and tags from 'highlight text' or 'highlighted at' (if there are duplicates) 736 | def fetchTagsFromCsvData(list_Highlight, list_BookTitle, list_BookAuthor, list_AmazonBookId, list_Note, list_Color, list_Tags, list_LocationType, list_Location, list_HighlightedAt, \ 737 | list_ReadwiseBookId, list_Source, list_Url, list_NumberOfHighlights, list_UpdatedAt, list_HighlightId, list_extractedHighlightTags, list_extractedHighlightText, list_extractedHighlightId, \ 738 | list_extractedHighlightLocation, list_extractedHighlightedAt, list_extractedHighlightBookId, list_noMatchingHighlightIdFromText, list_duplicateHighlightTextValues): 739 | textMatch = 0 740 | noMatch = 0 741 | tagsFromTextMatch = 0 742 | totalNumberOfTags = sum(1 for x in list_Tags if x != '') 743 | for i in range(len(list_extractedHighlightText)): 744 | try: 745 | if list_duplicateHighlightTextValues[i] == 'Duplicate value': 746 | if list_extractedHighlightedAt[i] in list_HighlightedAt: 747 | index1 = list_HighlightedAt.index(list_extractedHighlightedAt[i]) 748 | list_HighlightId[index1] = str(list_extractedHighlightId[i]) 749 | list_ReadwiseBookId[index1] = str(list_extractedHighlightBookId[i]) 750 | list_duplicateHighlightTextValues[i] = "" 751 | if(str(list_Tags[index1]) == ""): 752 | list_extractedHighlightTags[i] = "" 753 | else: 754 | list_extractedHighlightTags[i] = str(list_Tags[index1]) 755 | tagsFromTextMatch += 1 756 | textMatch += 1 757 | print(str(textMatch) + '/' + str(len(list_extractedHighlightText)) + ' highlights matched with ' \ 758 | + str(tagsFromTextMatch) + '/' + str(totalNumberOfTags) + ' tags') 759 | else: 760 | noMatch += 1 761 | message = 
str(list_extractedHighlightId[i]) + ' from ' + str(list_extractedHighlightBookId[i]) + ' not matched as it is a duplicate' 762 | print(message) 763 | pass 764 | else: 765 | if list_extractedHighlightText[i] in list_Highlight: 766 | index2 = list_Highlight.index(list_extractedHighlightText[i]) 767 | list_HighlightId[index2] = str(list_extractedHighlightId[i]) 768 | list_ReadwiseBookId[index2] = str(list_extractedHighlightBookId[i]) 769 | if(str(list_Tags[index2]) == ""): 770 | list_extractedHighlightTags[i] = "" 771 | else: 772 | list_extractedHighlightTags[i] = str(list_Tags[index2]) 773 | tagsFromTextMatch += 1 774 | textMatch += 1 775 | print(str(textMatch) + '/' + str(len(list_extractedHighlightText)) + ' highlights matched with ' \ 776 | + str(tagsFromTextMatch) + '/' + str(totalNumberOfTags) + ' tags') 777 | else: 778 | try: 779 | list_noMatchingHighlightIdFromText[i] = 'No highlight text match' 780 | except IndexError: 781 | return 782 | except IndexError: 783 | return 784 | message = str(textMatch) + '/' + str(len(list_extractedHighlightText)) + ' highlights matched with ' \ 785 | + str(tagsFromTextMatch) + '/' + str(totalNumberOfTags) + ' tags' 786 | logDateTimeOutput(message) 787 | 788 | def appendTagsFromCsvToCategoriesObject(list_highlights, list_ExtractedTags): 789 | tagsFromCsvCounter = 0 790 | totalNumberOfTags = sum(1 for x in list_ExtractedTags if x != '') 791 | for i in range(len(list_highlights)): # key = 'book_id' 792 | listCategories = [item for category in categoriesObject for item in category] 793 | key = str(list_highlights[i]['book_id']) 794 | id = str(list_highlights[i]['id']) 795 | index = list(map(itemgetter('book_id'), listCategories)).index(key) 796 | source = listCategories[index]['source'] # Get the 'category' of the corresponding 'book_id' from the grouped highlights 797 | indexCategory = categoriesObjectNames.index(source) # Identify which position the 'category' corresponds to within the list of category objects 798 | indexBook = 
list(map(itemgetter('book_id'), categoriesObject[indexCategory])).index(str(key)) # Identify which position the 'book_id' corresponds to within the category object 799 | bookLastUpdated = categoriesObject[indexCategory][indexBook]['updated'] 800 | indexHighlight = list(map(itemgetter('id'), categoriesObject[indexCategory][indexBook]['highlights'])).index(id) 801 | # highlights = categoriesObject[indexCategory][indexBook]['highlights'] 802 | book_id = categoriesObject[indexCategory][indexBook]['book_id'] 803 | bookReviewUrl = 'https://readwise.io/bookreview/' + book_id 804 | indexTags = list_extractedHighlightId.index(id) 805 | if str(list_ExtractedTags[indexTags]) == '' or str(list_ExtractedTags[indexTags]) == 'nan': 806 | categoriesObject[indexCategory][indexBook]['highlights'][indexHighlight]['tags'] = [] 807 | else: 808 | tagsArray = str(list_ExtractedTags[indexTags]).split() 809 | categoriesObject[indexCategory][indexBook]['highlights'][indexHighlight]['tags'] = tagsArray 810 | categoriesObject[indexCategory][indexBook]['highlights'][indexHighlight]['updated'] = bookLastUpdated 811 | tagsFromCsvCounter += 1 812 | print(str(tagsFromCsvCounter) + '/' + str(totalNumberOfTags) + ' tags added or updated from the CSV export') 813 | message = str(tagsFromCsvCounter) + '/' + str(totalNumberOfTags) + ' tags added or updated from the CSV export' 814 | logDateTimeOutput(message) 815 | 816 | def runFetchCsvData(): 817 | readwiseCsvExportFileName = 'readwise-data.csv' 818 | downloadCsvExport(readwiseCsvExportFileName) 819 | readwiseCsvExportPath = os.path.join(sourceDirectory, readwiseCsvExportFileName) 820 | df = pd.read_csv(readwiseCsvExportPath) 821 | # Insert complete path to the CSV file and optional variables 822 | # https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html 823 | df = df.sort_values(by=['Highlighted at'], ascending=True) # sort_values returns a new DataFrame, so assign it back 824 | # Insert the name of the column as a string in brackets 825 | list_Highlight = list(df['Highlight']) 826 | 
list_BookTitle = list(df['Book Title']) 827 | list_BookAuthor = list(df['Book Author']) 828 | list_AmazonBookId = list(df['Amazon Book ID']) 829 | list_Note = list(df['Note']) 830 | list_Color = list(df['Color']) 831 | list_Tags = list(df['Tags']) 832 | list_LocationType = list(df['Location Type']) 833 | list_Location = list(df['Location']) 834 | list_HighlightedAt = list(df['Highlighted at']) 835 | cleanUpListValues(list_Highlight, " ") 836 | cleanUpListValues(list_BookAuthor, " ") 837 | cleanUpListValues(list_Note, " ") 838 | cleanUpListValues(list_Location, "0") 839 | convertTitleToValidFilename(list_BookTitle) 840 | toLowercase(list_BookTitle) 841 | replaceEmptyTagCells(list_Tags) 842 | dateStringNormaliser(list_HighlightedAt) 843 | fillListWithEmptyCharacters(list_HighlightedAt, list_ReadwiseBookId) 844 | fillListWithEmptyCharacters(list_HighlightedAt, list_Source) 845 | fillListWithEmptyCharacters(list_HighlightedAt, list_Url) 846 | fillListWithEmptyCharacters(list_HighlightedAt, list_NumberOfHighlights) 847 | fillListWithEmptyCharacters(list_HighlightedAt, list_UpdatedAt) 848 | fillListWithEmptyCharacters(list_HighlightedAt, list_HighlightId) 849 | return list_Highlight, list_BookTitle, list_BookAuthor, list_AmazonBookId, list_Note, list_Color, list_Tags, list_LocationType, list_Location, list_HighlightedAt, \ 850 | list_ReadwiseBookId, list_Source, list_Url, list_NumberOfHighlights, list_UpdatedAt, list_HighlightId 851 | 852 | def runExtractDataFromApi(list_Highlight, list_BookTitle, list_BookAuthor, list_AmazonBookId, list_Note, list_Color, list_Tags, list_LocationType, list_Location, list_HighlightedAt, \ 853 | list_ReadwiseBookId, list_Source, list_Url, list_NumberOfHighlights, list_UpdatedAt, list_HighlightId): 854 | allHighlightsToFetchTagsForSortByDate = sorted(allHighlightsToFetchTagsFor, key = itemgetter('highlighted_at')) 855 | fillListWithEmptyCharacters(allHighlightsToFetchTagsForSortByDate, list_extractedHighlightTags) 856 | 
fillListWithEmptyCharacters(allHighlightsToFetchTagsForSortByDate, list_extractedHighlightText) 857 | fillListWithEmptyCharacters(allHighlightsToFetchTagsForSortByDate, list_extractedHighlightId) 858 | fillListWithEmptyCharacters(allHighlightsToFetchTagsForSortByDate, list_extractedHighlightLocation) 859 | fillListWithEmptyCharacters(allHighlightsToFetchTagsForSortByDate, list_extractedHighlightedAt) 860 | fillListWithEmptyCharacters(allHighlightsToFetchTagsForSortByDate, list_extractedHighlightBookId) 861 | fillListWithEmptyCharacters(allHighlightsToFetchTagsForSortByDate, list_noMatchingHighlightIdFromText) 862 | fillListWithEmptyCharacters(allHighlightsToFetchTagsForSortByDate, list_duplicateHighlightTextValues) 863 | fillListsWithHighlightData(allHighlightsToFetchTagsForSortByDate) 864 | cleanUpExtractedListValues(list_extractedHighlightText) 865 | cleanUpExtractedListValues(list_extractedHighlightId) 866 | cleanUpExtractedListValues(list_extractedHighlightLocation) 867 | cleanUpExtractedListValues(list_extractedHighlightedAt) 868 | cleanUpExtractedListValues(list_extractedHighlightBookId) 869 | dateStringNormaliser(list_extractedHighlightedAt) 870 | checkForDuplicates(list_extractedHighlightText, list_extractedHighlightText) 871 | return allHighlightsToFetchTagsForSortByDate, list_extractedHighlightTags, list_extractedHighlightText, list_extractedHighlightId, list_extractedHighlightLocation, \ 872 | list_extractedHighlightedAt, list_extractedHighlightBookId, list_noMatchingHighlightIdFromText, list_duplicateHighlightTextValues 873 | 874 | def runFetchTagsFromCsvData(list_Highlight, list_BookTitle, list_BookAuthor, list_AmazonBookId, list_Note, list_Color, list_Tags, list_LocationType, list_Location, list_HighlightedAt, \ 875 | list_ReadwiseBookId, list_Source, list_Url, list_NumberOfHighlights, list_UpdatedAt, list_HighlightId, list_extractedHighlightTags, list_extractedHighlightText, list_extractedHighlightId, \ 876 | list_extractedHighlightLocation, 
list_extractedHighlightedAt, list_extractedHighlightBookId, list_noMatchingHighlightIdFromText, list_duplicateHighlightTextValues): 877 | fetchTagsFromCsvData(list_Highlight, list_BookTitle, list_BookAuthor, list_AmazonBookId, list_Note, list_Color, list_Tags, list_LocationType, list_Location, list_HighlightedAt, \ 878 | list_ReadwiseBookId, list_Source, list_Url, list_NumberOfHighlights, list_UpdatedAt, list_HighlightId, list_extractedHighlightTags, list_extractedHighlightText, list_extractedHighlightId, \ 879 | list_extractedHighlightLocation, list_extractedHighlightedAt, list_extractedHighlightBookId, list_noMatchingHighlightIdFromText, list_duplicateHighlightTextValues) 880 | appendTagsFromCsvToCategoriesObject(allHighlightsToFetchTagsFor, list_extractedHighlightTags) 881 | 882 | ############################################################### 883 | ### Create markdown notes with updated books and highlights ### 884 | ############################################################### 885 | 886 | # Create function for generating new markdown notes 887 | # Change working directory to desired path set in the readwiseMetadata file 888 | # Append book metadata at the top of the note e.g. title, author, source, readwise url 889 | # Append all highlights separated by "---" beneath the book metadata 890 | 891 | def createMarkdownNote(listOfBookIdsToUpdateMarkdownNotes): 892 | # for x in listOfBookIdsToUpdateMarkdownNotes: 893 | # print("listOfBookIdsToUpdateMarkdownNotes " + str(x)) 894 | booksWithNoHighlights = 0 895 | booksWithHeadings = 0 896 | if os.path.exists(targetDirectory): 897 | os.chdir(targetDirectory) 898 | else: 899 | print('Error! The target directory does not exist or is incorrect') 900 | # Match the 'book_id' to the correct category dictionary e.g. 
books, articles 901 | # Retrieve 'book_id' metadata from the dictionary 902 | listCategories = list(categoriesObject) 903 | # listCategories = [item for category in categoriesObject for item in category] 904 | listOfBookIdsToUpdateMarkdownNotes.sort() 905 | listOfBookIdsToUpdateMarkdownNotes = list(listOfBookIdsToUpdateMarkdownNotes for listOfBookIdsToUpdateMarkdownNotes,_ in groupby(listOfBookIdsToUpdateMarkdownNotes)) 906 | if len(listOfBookIdsToUpdateMarkdownNotes) != 0: 907 | for bookData in range(len(listOfBookIdsToUpdateMarkdownNotes)): # type(listOfBookIdsToUpdateMarkdownNotes[bookData]) = list 908 | key = str(listOfBookIdsToUpdateMarkdownNotes[bookData][0]) 909 | source = str(listOfBookIdsToUpdateMarkdownNotes[bookData][1]) 910 | indexCategory = categoriesObjectNames.index(source) # Identify which position the 'category' corresponds to within the list of category objects 911 | indexBook = list(map(itemgetter('book_id'), categoriesObject[indexCategory])).index(str(key)) # Identify which position the 'book_id' corresponds to within the category object 912 | yamlData = [] 913 | titleBlock = [] 914 | commentData = [] 915 | yamlData.append("---" + "\n") 916 | # Add title to yamlData and titleBlock 917 | title = unidecode(categoriesObject[indexCategory][indexBook]['title']).replace('"', '\'') 918 | yamlData.append("Title: " + "\"" + str(title) + "\"" + "\n") 919 | titleBlock.append("# " + str(title) + "\n") 920 | if(str(categoriesObject[indexCategory][indexBook]['author']) == "None"): 921 | author = " " 922 | yamlData.append("Author: " + str(author) + "\n") 923 | else: 924 | author = unidecode(categoriesObject[indexCategory][indexBook]['author']).replace('"', '\'') 925 | yamlData.append("Author: " + "\"" + str(author) + "\"" + "\n") 926 | source = categoriesObject[indexCategory][indexBook]['source'] 927 | yamlData.append("Tags: " + "[" + "readwise2directory" + ", TVZ, " + str(source) + "]" + "\n") 928 | num_highlights = 
categoriesObject[indexCategory][indexBook]['num_highlights'] 929 | yamlData.append("Highlights: " + str(num_highlights) + "\n") 930 | lastUpdated = datetime.datetime.strptime(categoriesObject[indexCategory][indexBook]['updated'][0:10], '%Y-%m-%d').strftime("%Y-%m-%d") 931 | yamlData.append("Updated: " + "[[" + str(lastUpdated) + "]]" + "\n") 932 | # Add readwise url to yamlData and titleBlock 933 | url = str(categoriesObject[indexCategory][indexBook]['url']) 934 | yamlData.append("Readwise URL: " + str(url) + "\n") 935 | titleBlock.append("[Readwise URL](" + str(url) + ")") 936 | book_id = str(categoriesObject[indexCategory][indexBook]['book_id']) 937 | yamlData.append("Readwise ID: " + str(book_id) + "\n") 938 | # Add source URL (if exists) to yamlData and titleBlock 939 | try: 940 | source_url = str(categoriesObject[indexCategory][indexBook]['source_url']) 941 | if source_url.lower() == "none" or source_url.lower() == "null" or source_url == "": 942 | print('no source URL found') 943 | #continue # Commented this out because otherwise, books don't get markdown notes generated from them. 
944 | else: 945 | yamlData.append("Source URL: " + str(source_url) + "\n") 946 | titleBlock.append(" | " + "[Source URL](" + str(source_url) + ")"+ "\n\n") 947 | except KeyError: # a missing 'source_url' dict key raises KeyError, not NameError 948 | pass # fall through so the closing '---' below is still appended 949 | yamlData.append("---" + "\n\n") 950 | titleBlock.append("---" + "\n") 951 | 952 | # Add comment with tags 953 | commentData.append("%%\n") 954 | commentData.append("Last Updated: [[" + str(lastUpdated) + "]]\n") 955 | commentData.append("%%" + "\n") 956 | # Add cover image URL if exists 957 | try: 958 | cover_image_url = str(categoriesObject[indexCategory][indexBook]['cover_image_url']) 959 | titleBlock.append("![](" + cover_image_url + ")" + "\n\n") 960 | titleBlock.append("---" + "\n") 961 | except KeyError: # books without a cover image should still get a note 962 | pass 963 | #fileName = slugify(title) 964 | fileName = title # Removed slugify to preserve case and spaces 965 | # fileName = get_valid_filename_django(title) 966 | yamlData = "".join(yamlData) 967 | commentData = "".join(commentData) 968 | titleBlock = "".join(titleBlock) 969 | # Ignore books with no highlights 970 | if str(num_highlights) == '0': 971 | booksWithNoHighlights += 1 972 | pass 973 | else: 974 | # Change directory according to source 975 | if str(source) == 'tweets': 976 | sourceOutputDir = 'Tweet' 977 | if str(source) == 'articles': 978 | sourceOutputDir = 'Article' 979 | if str(source) == 'books': 980 | sourceOutputDir = 'Book' 981 | if str(source) == 'podcasts': 982 | sourceOutputDir = 'Podcast' 983 | if str(source) == 'supplementals': 984 | sourceOutputDir = 'Supplemental' 985 | os.chdir(os.path.join(targetDirectory, sourceOutputDir)) 986 | with open(fileName + ".md", 'w') as newFile: # Warning: this will overwrite all content within the readwise note. 
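For orientation, the front matter assembled into `yamlData` above renders as a YAML block at the top of each note. A self-contained sketch with hypothetical book metadata (not taken from a real Readwise record):

```python
# Sketch of the YAML front matter built by createMarkdownNote,
# using hypothetical title/author/ID values.
yamlData = []
yamlData.append("---" + "\n")
yamlData.append("Title: " + "\"" + "Example Book" + "\"" + "\n")
yamlData.append("Author: " + "\"" + "Jane Doe" + "\"" + "\n")
yamlData.append("Tags: [readwise2directory, TVZ, books]" + "\n")
yamlData.append("Highlights: 2" + "\n")
yamlData.append("Readwise ID: 12345" + "\n")
yamlData.append("---" + "\n\n")
print("".join(yamlData))
```

The quoted title and author keep YAML valid when the metadata itself contains colons or quotes, which is why the script replaces `"` with `'` before wrapping.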
987 | print(yamlData, file=newFile) 988 | print(commentData, file=newFile) 989 | print(titleBlock, file=newFile) 990 | # Append highlights to the file beneath the 'book_id' metadata 991 | for n in range(len(categoriesObject[indexCategory][indexBook]['highlights'])): 992 | highlightData = [] 993 | id = str(categoriesObject[indexCategory][indexBook]['highlights'][n]['id']) 994 | note = unidecode(categoriesObject[indexCategory][indexBook]['highlights'][n]['note']) 995 | location = str(categoriesObject[indexCategory][indexBook]['highlights'][n]['location']) 996 | location_type = categoriesObject[indexCategory][indexBook]['highlights'][n]['location_type'] 997 | tagsArray = categoriesObject[indexCategory][indexBook]['highlights'][n]['tags'] 998 | text = unidecode(categoriesObject[indexCategory][indexBook]['highlights'][n]['text']) 999 | if "__" in text: 1000 | text = text.replace("__", "==") 1001 | # Add # for h1-h5 headings 1002 | listOfHeadings = ['#h1', '#h2', '#h3', '#h4', '#h5'] 1003 | if any(item in tagsArray for item in listOfHeadings): 1004 | if any('#h1' in s for s in tagsArray): 1005 | highlightData.append("## " + text + "\n" + " ^" + id + "\n\n") 1006 | booksWithHeadings += 1 1007 | elif any('#h2' in s for s in tagsArray): 1008 | highlightData.append("### " + text + "\n" + " ^" + id + "\n\n") 1009 | booksWithHeadings += 1 1010 | elif any('#h3' in s for s in tagsArray): 1011 | highlightData.append("#### " + text + "\n" + " ^" + id + "\n\n") 1012 | booksWithHeadings += 1 1013 | elif any('#h4' in s for s in tagsArray): 1014 | highlightData.append("##### " + text + "\n" + " ^" + id + "\n\n") 1015 | booksWithHeadings += 1 1016 | elif any('#h5' in s for s in tagsArray): 1017 | highlightData.append("###### " + text + "\n" + " ^" + id + "\n\n") 1018 | booksWithHeadings += 1 1019 | else: 1020 | # Pre-pend a "> " character to any text with line breaks 1021 | # Or pre-pend a "> \" if line is empty 1022 | # This is to fix the issue where the block-reference doesn't 
pick-up parent items 1023 | if "\n" in text: 1024 | textNew = [] 1025 | textSplit = text.split("\n") # type(highlight['text']) = 'list' 1026 | for s in range(len(textSplit)): 1027 | if textSplit[s] == '': 1028 | x = ("> \\" + textSplit[s]) 1029 | else: 1030 | x = ("> " + textSplit[s]) 1031 | textNew.append(x) 1032 | textNew = "\n".join(textNew) 1033 | highlightData.append(textNew + "\n\n" + "^" + id + "\n\n") 1034 | else: 1035 | highlightData.append(text + " ^" + id + "\n\n") 1036 | if note == [] or note == "": 1037 | pass 1038 | else: 1039 | highlightData.append("**Note:** " + str(note) + "\n") 1040 | if tagsArray == [] or tagsArray == "": 1041 | pass 1042 | else: 1043 | tags = " ".join(str(v) for v in tagsArray) 1044 | highlightData.append("**Tags:** " + str(tags) + "\n") 1045 | if str(categoriesObject[indexCategory][indexBook]['highlights'][n]['url']) == "None": 1046 | pass 1047 | else: 1048 | url = str(categoriesObject[indexCategory][indexBook]['highlights'][n]['url']) 1049 | highlightData.append("**References:** " + str(url) + "\n") 1050 | if source == "podcasts" and str(url) != "None": 1051 | # Append 'embed/' after the 'airr.io/' string and before the '/quote/' string 1052 | airrQuoteMatchingPattern = 'airr.io/' 1053 | airrQuoteEmbedText = 'embed/' 1054 | if airrQuoteMatchingPattern in url: # Check if url is an AirrQuote 1055 | i = url.find(airrQuoteMatchingPattern) # Find index of matching pattern 1056 | podcastUrl = url[:i + len(airrQuoteMatchingPattern)] + airrQuoteEmbedText + url[i + len(airrQuoteMatchingPattern):] 1057 | else: 1058 | podcastUrl = url 1059 | iFrameWithPodcastUrl = '' 1060 | highlightData.append(iFrameWithPodcastUrl + "\n") 1061 | """ 1062 | highlighted_at = datetime.datetime.strptime(categoriesObject[indexCategory][indexBook]['highlights'][n]['highlighted_at'][0:10], '%Y-%m-%d').strftime("%y%m%d %A") # Trim the UTC date field and re-format 1063 | updated =
datetime.datetime.strptime(categoriesObject[indexCategory][indexBook]['highlights'][n]['updated'][0:10], '%Y-%m-%d').strftime("%y%m%d %A") # Trim the UTC date field and re-format 1064 | if highlighted_at == updated: 1065 | date = updated 1066 | highlightData.append("**Date:** " + "[[" + str(date) + "]]" + "\n") 1067 | else: 1068 | date = highlighted_at 1069 | highlightData.append("**Date:** " + "[[" + str(date) + "]]" + "\n") 1070 | """ 1071 | highlightData.append("\n" + "---" + "\n") 1072 | highlightData = "".join(highlightData) 1073 | print(highlightData, file=newFile) 1074 | print(' - "' + str(title) + '"') 1075 | os.chdir(sourceDirectory) # Revert to original directory with script 1076 | if str(booksWithHeadings) == '0': 1077 | pass 1078 | else: 1079 | print(str(booksWithHeadings) + ' highlights converted into headings') 1080 | if str(booksWithNoHighlights) == '0': 1081 | pass 1082 | else: 1083 | print(str(booksWithNoHighlights) + ' books ignored as they contained no highlights') 1084 | differenceMarkdownNoteAmount = newMarkdownNoteAmount - originalMarkdownNoteAmount 1085 | message = str(differenceMarkdownNoteAmount) + ' new markdown notes created and ' + str(len(listOfBookIdsToUpdateMarkdownNotes) - differenceMarkdownNoteAmount) + ' markdown notes updated' 1086 | # message = str(len(listOfBookIdsToUpdateMarkdownNotes)) + ' new markdown notes created' 1087 | logDateTimeOutput(message) 1088 | print(message) 1089 | 1090 | ########################################################## 1091 | ### Calculate the number of new markdown notes created ### 1092 | ########################################################## 1093 | 1094 | def numberOfMarkdownNotes(): 1095 | counter = 0 1096 | listCategories = list(categoriesObject) 1097 | for i in range(len(listCategories)): 1098 | counter += len(listCategories[i]) 1099 | return counter 1100 | 1101 | ####################################################### 1102 | ### Import variables from file in another directory ### 1103 | 
####################################################### 1104 | 1105 | # Import all variables from readwiseMetadata file 1106 | print('Importing variables from readwiseMetadata...') 1107 | from readwiseMetadata import token, targetDirectory, dateFrom, email, pwd, chromedriverDirectory, highlightLimitToFetchTags 1108 | # from readwiseMetadata import * 1109 | 1110 | # Check dateFrom variable 1111 | print('Checking if a valid dateFrom variable is defined in readwiseMetadata...') 1112 | dateFrom = convertDateFromToUtcFormat(dateFrom) 1113 | 1114 | # Check targetDirectory variable is valid 1115 | print('Checking if a valid targetDirectory variable is defined in readwiseMetadata...') 1116 | insertPath(targetDirectory) 1117 | 1118 | abspath = os.path.realpath(__file__) # Create absolute path for this file 1119 | 1120 | # Create sourceDirectory variable from absolute path for this file 1121 | print('Creating sourceDirectory variable from absolute path for this file...') 1122 | sourceDirectory = os.path.dirname(abspath) # Create variable defining the directory name 1123 | # sourceDirectory = os.getcwd() 1124 | print(str(sourceDirectory) + ' directory variable defined') 1125 | 1126 | # Function to check if any of the optional input variables in readwiseMetadata are blank or missing 1127 | # If blank or missing, set boolean value to False and no tags will be fetched 1128 | fetchTagsBoolean = fetchTagsTrueOrFalse(fetchTagsBoolean, email) 1129 | fetchTagsBoolean = fetchTagsTrueOrFalse(fetchTagsBoolean, pwd) 1130 | fetchTagsBoolean = fetchTagsTrueOrFalse(fetchTagsBoolean, chromedriverDirectory) 1131 | fetchTagsBoolean = fetchTagsTrueOrFalse(fetchTagsBoolean, highlightLimitToFetchTags) 1132 | 1133 | ###################################### 1134 | ### Load book data from JSON files ### 1135 | ###################################### 1136 | 1137 | articles = {} 1138 | books = {} 1139 | podcasts = {} 1140 | supplementals = {} 1141 | tweets = {} 1142 | 1143 | categoriesObject = [articles, 
books, podcasts, supplementals, tweets] # type(categoriesObject[0]) = 'dictionary' 1144 | 1145 | categoriesObjectNames = ["articles", "books", "podcasts", "supplementals", "tweets"] # type(categoriesObjectNames[0]) = 'string' 1146 | 1147 | # Load existing readwise data from JSON files into categoriesObject 1148 | print('Loading data from JSON files in readwiseCategories to categoriesObject...') 1149 | loadBookDataFromJsonToObject() 1150 | 1151 | originalMarkdownNoteAmount = numberOfMarkdownNotes() # Sum the original number of books in each dictionary 1152 | 1153 | ################## 1154 | ### Books LIST ### 1155 | ################## 1156 | 1157 | # Readwise REST API information = 'https://readwise.io/api_deets' 1158 | # Readwise endpoint = 'https://readwise.io/api/v2/books/' 1159 | 1160 | booksListQueryString = { 1161 | "page_size": 1000, # 1000 items per page - maximum 1162 | "page": 1, # Page 1 >> build for loop to cycle through pages and stop when complete 1163 | "updated__gt": dateFrom, # if no date provided, it will default to dateLastScriptRun 1164 | } 1165 | 1166 | # Trigger GET request with booksListQueryString 1167 | print('Fetching book data from readwise...') 1168 | booksList = requests.get( 1169 | url="https://readwise.io/api/v2/books/", # endpoint provided by https://readwise.io/api_deets 1170 | headers={"Authorization": "Token " + token}, # token imported from readwiseMetadata file 1171 | params=booksListQueryString # query string object 1172 | ) 1173 | print("Here's the response: " + str(booksList.content)) 1174 | 1175 | # Convert response into JSON object 1176 | try: 1177 | print('Converting readwise book data returned into JSON...') 1178 | booksListJson = booksList.json() # type(booksListJson) >> 'dict' https://docs.python.org/3/tutorial/datastructures.html#dictionaries 1179 | except ValueError: 1180 | message = 'Response content from booksList request is not valid JSON' 1181 | logDateTimeOutput(message) 1182 | print(message) # Originally from
https://github.com/psf/requests/issues/4908#issuecomment-627486125 1183 | # JSONDecodeError: Expecting value: line 1 column 1 (char 0) specifically happens with an empty string (i.e. empty response content) 1184 | 1185 | try: 1186 | # Create new object of booksListJson['results'] 1187 | booksListResults = booksListJson['results'] # type(booksListResults) = 'list' 1188 | except NameError: 1189 | message = 'Cannot extract results from empty JSON for booksList request' 1190 | logDateTimeOutput(message) 1191 | print(message) 1192 | 1193 | # Loop through pagination using 'next' property from GET response 1194 | try: 1195 | additionalLoopCounter = 0 1196 | while booksListJson['next']: 1197 | additionalLoopCounter += 1 1198 | print('Fetching additional book data from readwise... (page ' + str(additionalLoopCounter) + ')') 1199 | booksList = requests.get( 1200 | url=booksListJson['next'], # keep same query parameters from booksListQueryString object 1201 | headers={"Authorization": "Token " + token}, # token imported from readwiseMetadata file 1202 | ) 1203 | try: 1204 | print('Converting additional readwise book data returned into JSON... (page ' + str(additionalLoopCounter) + ')') 1205 | booksListJson = booksList.json() # type(booksListJson) = 'dictionary' 1206 | except ValueError: 1207 | message = 'Response content from additional booksList request is not valid JSON' 1208 | logDateTimeOutput(message) 1209 | print(message) # Originally from https://github.com/psf/requests/issues/4908#issuecomment-627486125 1210 | # JSONDecodeError: Expecting value: line 1 column 1 (char 0) specifically happens with an empty string (i.e.
empty response content) 1211 | break 1212 | try: 1213 | # Extend booksListResults with booksListJson['results'] 1214 | booksListResults.extend(booksListJson['results']) # type(booksListResults) = 'list' 1215 | except NameError: 1216 | message = 'Cannot extract results from empty JSON for additional booksList request' 1217 | logDateTimeOutput(message) 1218 | print(message) 1219 | break 1220 | except NameError: 1221 | message = 'Cannot loop through pagination from empty response' 1222 | logDateTimeOutput(message) 1223 | print(message) 1224 | 1225 | # Sort booksListResults data by 'category' key 1226 | print('Sorting readwise book data by category...') 1227 | booksListResultsSort = sorted(booksListResults, key = itemgetter('category')) # e.g. 'category' = 'books' 1228 | 1229 | # Group booksListResults data by 'category' key 1230 | print('Grouping readwise book data by category...') 1231 | booksListResultsGroup = groupby(booksListResultsSort, key = itemgetter('category')) 1232 | 1233 | # Append new books to categoriesObject, or update existing book data 1234 | print('Appending readwise book data returned to categoriesObject...') 1235 | appendBookDataToObject() 1236 | 1237 | ####################### 1238 | ### Highlights LIST ### 1239 | ####################### 1240 | 1241 | # Readwise REST API information = 'https://readwise.io/api_deets' 1242 | # Readwise endpoint = 'https://readwise.io/api/v2/highlights/' 1243 | 1244 | # Create highlightsList query string: 1245 | highlightsListQueryString = { 1246 | "page_size": 1000, # 1000 items per page - maximum 1247 | "page": 1, # Page 1 >> build for loop to cycle through pages and stop when complete 1248 | "highlighted_at__gt": dateFrom, 1249 | } 1250 | 1251 | # Trigger GET request with highlightsListQueryString 1252 | print('Fetching highlight data from readwise...') 1253 | highlightsList = requests.get( 1254 | url="https://readwise.io/api/v2/highlights/", 1255 | headers={"Authorization": "Token " + token}, # token imported from
readwiseMetadata file 1256 | params=highlightsListQueryString # query string object 1257 | ) 1258 | 1259 | # Convert response into JSON object 1260 | try: 1261 | print('Converting readwise highlight data returned into JSON...') 1262 | highlightsListJson = highlightsList.json() # type(highlightsListJson) = 'dictionary' 1263 | except ValueError: 1264 | message = 'Response content is not valid JSON' 1265 | logDateTimeOutput(message) 1266 | print(message) # Originally from https://github.com/psf/requests/issues/4908#issuecomment-627486125 1267 | # JSONDecodeError: Expecting value: line 1 column 1 (char 0) specifically happens with an empty string (i.e. empty response content) 1268 | 1269 | try: 1270 | # Create list from highlightsListJson['results'] 1271 | highlightsListResults = highlightsListJson['results'] # type(highlightsListResults) = 'list' 1272 | except NameError: 1273 | message = 'Cannot extract results from empty JSON' 1274 | logDateTimeOutput(message) 1275 | print(message) 1276 | 1277 | try: 1278 | # Loop through pagination using 'next' property from GET response 1279 | additionalLoopCounter = 0 1280 | while highlightsListJson['next']: 1281 | additionalLoopCounter += 1 1282 | print('Fetching additional highlight data from readwise... (page ' + str(additionalLoopCounter) + ')') 1283 | highlightsList = requests.get( 1284 | url=highlightsListJson['next'], # keep same query parameters from highlightsListQueryString object 1285 | headers={"Authorization": "Token " + token}, # token imported from readwiseMetadata file 1286 | ) 1287 | # Convert response into JSON object 1288 | try: 1289 | print('Converting additional readwise highlight data returned into JSON...
(page ' + str(additionalLoopCounter) + ')') 1290 | highlightsListJson = highlightsList.json() # type(highlightsListJson) = 'dictionary' 1291 | except ValueError: 1292 | message = 'Response content is not valid JSON' 1293 | logDateTimeOutput(message) 1294 | print(message) # Originally from https://github.com/psf/requests/issues/4908#issuecomment-627486125 1295 | # JSONDecodeError: Expecting value: line 1 column 1 (char 0) specifically happens with an empty string (i.e. empty response content) 1296 | break 1297 | try: 1298 | # Extend highlightsListResults with highlightsListJson['results'] 1299 | highlightsListResults.extend(highlightsListJson['results']) # type(highlightsListResults) = 'list' 1300 | except NameError: 1301 | message = 'Cannot extract results from empty JSON' 1302 | logDateTimeOutput(message) 1303 | print(message) 1304 | break 1305 | except NameError: 1306 | message = 'Cannot loop through pagination from empty response' 1307 | logDateTimeOutput(message) 1308 | print(message) 1309 | 1310 | # Replace None values in "location" and "location_type" fields, as these would otherwise block highlight data sorting and grouping 1311 | replaceNoneInListOfDict(highlightsListResults) 1312 | 1313 | # Sort highlightsListResults data by 'book_id' key and 'location' 1314 | print('Sorting readwise highlight data by book_id and location...') 1315 | highlightsListResultsSort = sorted(highlightsListResults, key = itemgetter('book_id', 'location')) 1316 | 1317 | # Group highlightsListResultsSort data by 'book_id' key 1318 | print('Grouping readwise highlight data by book_id...') 1319 | highlightsListResultsGroup = groupby(highlightsListResultsSort, key = itemgetter('book_id')) 1320 | 1321 | listOfBookIdsToUpdateMarkdownNotes = [] # Append 'book ids' to loop through when creating new or updating existing markdown notes 1322 | 1323 | # Append new highlights to categoriesObject, or update existing highlight data 1324 | print('Appending readwise highlight data returned to categoriesObject...') 1325 | 
appendHighlightDataToObject() 1326 | 1327 | allHighlightsToFetchTagsFor = [] # Append values from 'highlightsListResultsSort' and 'missingHighlightsListResultsSort' into this list 1328 | missingHighlightsListToFetchTagsFor = [] # Append values from 'missingHighlightsListResultsSort' into this list 1329 | allHighlightsToFetchTagsForSortByDate = [] 1330 | 1331 | def appendHighlightsToListForFetchingTags(originalList, highlightsListToAppend): 1332 | # allHighlightsToFetchTagsFor = [allHighlightsToFetchTagsFor.append(highlightsListToAppend) 1333 | for i in range(len(highlightsListToAppend)): 1334 | originalList.append(highlightsListToAppend[i]) 1335 | 1336 | appendHighlightsToListForFetchingTags(allHighlightsToFetchTagsFor, highlightsListResultsSort) 1337 | 1338 | print('Appending updated highlight data to categoriesObject...') 1339 | appendUpdatedHighlightsToObject() 1340 | 1341 | ######################################################### 1342 | ### Fetch tags individually or in bulk via CSV export ### 1343 | ######################################################### 1344 | 1345 | # appendTagsToHighlightObject(highlightsListResultsSort) 1346 | 1347 | # If num of highlights in 'highlightsListResultsSort' is greater than limit specified in 'highlightLimitToFetchTags', fetch tags via CSV export 1348 | # Otherwise web scrape tags individually via Selenium 1349 | def fetchTagsIndividuallyOrInBulk(): 1350 | if fetchTagsBoolean is True: 1351 | try: 1352 | if len(allHighlightsToFetchTagsFor) > highlightLimitToFetchTags: 1353 | message = 'Fetching tags for ' + str(len(allHighlightsToFetchTagsFor)) + ' highlights in bulk via CSV export...' 
1354 | logDateTimeOutput(message) 1355 | print(message) 1356 | list_Highlight, list_BookTitle, list_BookAuthor, list_AmazonBookId, list_Note, list_Color, list_Tags, list_LocationType, list_Location, \ 1357 | list_HighlightedAt, list_ReadwiseBookId, list_Source, list_Url, list_NumberOfHighlights, list_UpdatedAt, list_HighlightId = runFetchCsvData() 1358 | # runFetchCsvData() 1359 | allHighlightsToFetchTagsForSortByDate, list_extractedHighlightTags, list_extractedHighlightText, list_extractedHighlightId, list_extractedHighlightLocation, \ 1360 | list_extractedHighlightedAt, list_extractedHighlightBookId, list_noMatchingHighlightIdFromText, list_duplicateHighlightTextValues = \ 1361 | runExtractDataFromApi(list_Highlight, list_BookTitle, list_BookAuthor, list_AmazonBookId, list_Note, list_Color, list_Tags, list_LocationType, \ 1362 | list_Location, list_HighlightedAt, list_ReadwiseBookId, list_Source, list_Url, list_NumberOfHighlights, list_UpdatedAt, list_HighlightId) 1363 | # runFetchTagsFromCsvData() 1364 | runFetchTagsFromCsvData(list_Highlight, list_BookTitle, list_BookAuthor, list_AmazonBookId, list_Note, list_Color, list_Tags, list_LocationType, \ 1365 | list_Location, list_HighlightedAt, list_ReadwiseBookId, list_Source, list_Url, list_NumberOfHighlights, list_UpdatedAt, list_HighlightId, \ 1366 | list_extractedHighlightTags, list_extractedHighlightText, list_extractedHighlightId, list_extractedHighlightLocation, list_extractedHighlightedAt, \ 1367 | list_extractedHighlightBookId, list_noMatchingHighlightIdFromText, list_duplicateHighlightTextValues) 1368 | elif len(allHighlightsToFetchTagsFor) <= highlightLimitToFetchTags: 1369 | message = 'Fetching tags for ' + str(len(allHighlightsToFetchTagsFor)) + ' highlights individually...' 
1370 | logDateTimeOutput(message) 1371 | print(message) 1372 | appendTagsToHighlightObject(highlightsListResultsSort) 1373 | appendTagsToHighlightObject(missingHighlightsListToFetchTagsFor) 1374 | else: 1375 | message = 'Error trying to determine whether to fetch tags individually or in bulk' 1376 | logDateTimeOutput(message) 1377 | print(message) 1378 | except (OSError, ValueError): 1379 | return 1380 | else: 1381 | return 1382 | 1383 | if fetchTagsBoolean is True: 1384 | fetchTagsIndividuallyOrInBulk() # Function to determine whether to fetch tags individually or in bulk 1385 | removeHighlightsWithDiscardTag() # Function to remove highlights from categoriesObject which contain 'discard' tag 1386 | appendHashtagToTags() # Function to append a hashtag to the start of every tag (if they are missing) 1387 | else: 1388 | message = 'No tags fetched as one of the input variables required in readwiseMetadata is blank or invalid' 1389 | logDateTimeOutput(message) 1390 | print(message) 1391 | 1392 | # Export books with updated highlights to JSON files 1393 | appendBookAndHighlightObjectToJson() 1394 | 1395 | ############################ 1396 | ### Create markdown note ### 1397 | ############################ 1398 | 1399 | newMarkdownNoteAmount = numberOfMarkdownNotes() # Sum the new number of books in each dictionary 1400 | 1401 | print('Creating or updating markdown notes...') 1402 | 1403 | createMarkdownNote(listOfBookIdsToUpdateMarkdownNotes) 1404 | 1405 | ############################################### 1406 | ### Print script completion time to console ### 1407 | ############################################### 1408 | 1409 | os.chdir(sourceDirectory) 1410 | 1411 | message = 'Script complete' 1412 | logDateTimeOutput(message) 1413 | print(message) 1414 | -------------------------------------------------------------------------------- /readwise-GET_install.py: -------------------------------------------------------------------------------- 1 | ############################ 
2 | ### Install dependencies ### 3 | ############################ 4 | 5 | # Instructions for installing Python modules here https://docs.python.org/3/installing/index.html 6 | # These are shell commands to run in a terminal, not Python statements, so do not execute this file directly. If using Mac, use python3.9 -m pip install; if using Windows, use py -3.9 -m pip install 7 | 8 | # Mac 9 | 10 | # !/usr/bin/env python3.9 11 | 12 | python3.9 -m pip install requests 13 | python3.9 -m pip install Django 14 | python3.9 -m pip install Unidecode 15 | python3.9 -m pip install pathvalidate 16 | python3.9 -m pip install pandas 17 | python3.9 -m pip install chromedriver 18 | python3.9 -m pip install selenium 19 | 20 | """ 21 | # Windows 22 | 23 | # !/usr/bin/env py 24 | 25 | py -3.9 -m pip install requests 26 | py -3.9 -m pip install Django 27 | py -3.9 -m pip install Unidecode 28 | py -3.9 -m pip install pathvalidate 29 | py -3.9 -m pip install pandas 30 | py -3.9 -m pip install chromedriver 31 | py -3.9 -m pip install selenium 32 | """ 33 | -------------------------------------------------------------------------------- /readwiseMetadata.py.default: -------------------------------------------------------------------------------- 1 | ############################# 2 | ### Readwise Access Token ### 3 | ############################# 4 | 5 | token = "" # ENTER YOUR TOKEN HERE 6 | # Retrieve from https://readwise.io/access_token 7 | # e.g. "abc123dEf45Gh6" 8 | 9 | ########################################################################### 10 | ### Specify target directory for new markdown notes i.e. Obsidian vault ### 11 | ########################################################################### 12 | 13 | targetDirectory = "" # ENTER VALID DIRECTORY PATH HERE 14 | # e.g. "/Users/johnsmith/Dropbox/Obsidian/Vault" on Mac or "\\Users\\johnsmith\\Dropbox\\Obsidian\\Vault" on Windows 15 | 16 | ################################################## 17 | ### Specify query string parameters (optional) ### 18 | ################################################## 19 | 20 | dateFrom = "" # "YYYY-MM-DD" format only.
Get highlights AFTER this date only. 21 | # If set to "" or None, the script will default to 'last successful script run' date from readwiseGET.log (if exists), or it will fetch all readwise resources 22 | # e.g. "2020-01-01" 23 | 24 | ######################################### 25 | ### Data for fetching tags (optional) ### 26 | ######################################### 27 | 28 | # Readwise API endpoints seem to exclude tags, so I've added functionality to fetch tags from new or updated highlights. 29 | # Note: this uses Selenium to web scrape data from your readwise profile. Please use with caution! 30 | # If any of these variables are set to "" or None, no tags will be fetched. 31 | 32 | email = "" # ENTER YOUR EMAIL HERE 33 | # e.g. "johnsmith@gmail.com" 34 | 35 | pwd = "" # ENTER YOUR PASSWORD HERE 36 | # e.g. "J0HNSM1TH_312" 37 | 38 | chromedriverDirectory = "" # ENTER VALID PATH TO CHROMEDRIVER 39 | # e.g. "/Users/johnsmith/Downloads/chromedriver" on Mac or "\\Users\\johnsmith\\Downloads\\chromedriver.exe" on Windows 40 | # Read more here https://chromedriver.chromium.org/ 41 | 42 | highlightLimitToFetchTags = 10 # ENTER NUMBER HERE 43 | # Specify an integer limit (I recommend 10 for speed) to determine whether to fetch tags individually or in bulk via CSV export 44 | # If the number of highlights returned is <= this limit, fetch tags individually; if greater, fetch tags in bulk via a CSV export 45 | --------------------------------------------------------------------------------
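Two of the per-highlight text transformations that `readwise-GET.py` performs inline — prefixing multi-line highlight text with `> ` so Obsidian block references pick up every line, and inserting `embed/` into AirrQuote URLs for podcast embeds — can be sketched as standalone helpers. The function names below are illustrative only; the script itself does this logic inline rather than through named functions.

```python
AIRR_PATTERN = 'airr.io/'
AIRR_EMBED = 'embed/'

def blockquote(text: str) -> str:
    """Prefix every line with '> ' (or '> \\' for empty lines) so that an
    Obsidian block reference covers the whole multi-line highlight."""
    lines = []
    for line in text.split("\n"):
        # Empty lines get a trailing backslash so the quote block is not broken
        lines.append("> \\" if line == "" else "> " + line)
    return "\n".join(lines)

def airr_embed_url(url: str) -> str:
    """Insert 'embed/' directly after 'airr.io/' so the AirrQuote URL points
    at the embeddable player; non-Airr URLs pass through unchanged."""
    i = url.find(AIRR_PATTERN)
    if i == -1:
        return url
    cut = i + len(AIRR_PATTERN)
    return url[:cut] + AIRR_EMBED + url[cut:]
```

For example, `airr_embed_url("https://www.airr.io/quote/abc123")` yields `"https://www.airr.io/embed/quote/abc123"`, which the script then wraps in an iframe snippet appended beneath the highlight.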