├── LICENSE
├── README.md
├── notes.md
├── notes2bear
├── notes2html
└── notes2quiver


/LICENSE:
--------------------------------------------------------------------------------
 1 | This is free and unencumbered software released into the public domain.
 2 | 
 3 | Anyone is free to copy, modify, publish, use, compile, sell, or
 4 | distribute this software, either in source code form or as a compiled
 5 | binary, for any purpose, commercial or non-commercial, and by any
 6 | means.
 7 | 
 8 | In jurisdictions that recognize copyright laws, the author or authors
 9 | of this software dedicate any and all copyright interest in the
10 | software to the public domain. We make this dedication for the benefit
11 | of the public at large and to the detriment of our heirs and
12 | successors. We intend this dedication to be an overt act of
13 | relinquishment in perpetuity of all present and future rights to this
14 | software under copyright law.
15 | 
16 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
17 | EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
18 | MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
19 | IN NO EVENT SHALL THE AUTHORS BE LIABLE FOR ANY CLAIM, DAMAGES OR
20 | OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
21 | ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
22 | OTHER DEALINGS IN THE SOFTWARE.
23 | 
24 | For more information, please refer to <http://unlicense.org>
25 | 


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
 1 | # Apple Notes Export Tools
 2 | 
 3 | This repository includes a few Python export tools for Apple's "Notes.app".  The scripts require Python 3, but I tried to make them self-contained and concise, with no additional dependencies. I may revisit this and break the common code into a library in the future.
 4 | 
 5 | This repository is in the public domain.
 6 | 
 7 | ## `notes.md`
 8 | 
 9 | Description of how notes data is stored.
10 | 
11 | ## `notes2bear`
12 | 
13 | Writes a `notes.bearbk` file in Bear's backup format (a zip file with their own markup flavor).  It doesn't handle tables because Bear doesn't do tables yet. This shaves about 100 lines from the script.
14 | 
15 | ## `notes2html` 
16 | 
17 | `notes2html` takes a destination directory and writes a tree of html files and images: one html file per note, with any associated media in the media directory. If you specify `--title`, the files will be named with the title guessed by Apple (with / replaced by _).  If you specify `--svg`, any drawings will be rendered as inline SVG; otherwise, the fallback jpg files provided by Apple will be used.
18 | 
19 | Usage is:
20 | 
21 | ```
22 | notes2html [--svg] [--title] dest
23 | ```
24 | 
25 | 
26 | 
27 | 
28 | 
29 | 
30 | 
31 | 
32 | 
33 | 
34 | 


--------------------------------------------------------------------------------
/notes.md:
--------------------------------------------------------------------------------
  1 | # Notes on Notes.app
  2 | For future reference and to aid anyone else who might want to extract data from the "Notes" app. I compiled this out of curiosity and a desire to back up my notes content.
  3 | 
  4 | This document only covers the current format of notes, as synced with iCloud. The application also supports IMAP synced notes with reduced functionality. They are stored in a separate database (and on the IMAP server). You're on your own there, but it's pretty much just MIME and HTML.
  5 | 
  6 | ## OSX Files
  7 | 
  8 | The notes are stored in `~/Library/Group Containers/group.com.apple.notes` in a sqlite database named `NoteStore.sqlite`. The database contains sync state from iCloud. It's a CoreData store, but I approached it from plain sqlite.
  9 | 
 10 | We're interested in the `ZICCLOUDSYNCINGOBJECT` table and the `ZICNOTEDATA` table. The former contains the sync state of each iCloud object - notes, attachments, and folders, and the latter contains the note data. Everything has a UUID identifier `ZIDENTIFIER`. The `ZICCLOUDSYNCINGOBJECT` table column `ZNOTEDATA` points to the `Z_PK` column in `ZICNOTEDATA`.
 11 | The meat of the data is in zlib compressed protobuf blobs. For documents, it's `ZDATA` in `ZICNOTEDATA` and for tables/drawings, it's `ZMERGEABLEDATA` in `ZICCLOUDSYNCINGOBJECT`.
 12 | 
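A minimal sketch of reading those blobs with plain sqlite, assuming the table and column names described above. The helper name `iter_note_blobs` is made up for illustration. The `wbits` value of 47 asks zlib to auto-detect a zlib or gzip header, which is what `notes2bear` does.

```python
import sqlite3
import zlib

def iter_note_blobs(db):
    """Yield (note identifier, decompressed protobuf bytes) pairs.

    `db` is an open sqlite3 connection to NoteStore.sqlite (hypothetical helper).
    """
    query = (
        "select o.zidentifier, d.zdata "
        "from zicnotedata d "
        "join ziccloudsyncingobject o on o.znotedata = d.z_pk"
    )
    for identifier, blob in db.execute(query):
        if blob:  # skip rows that carry no data
            # 47 = 32 + 15: accept either a zlib or a gzip header
            yield identifier, zlib.decompress(blob, 47)
```

The result is still protobuf; it needs the `Document` wrapper described under "Protobuf Data" to be unpacked before the note text is visible.
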
 13 | Apple uses a [CvRDT](https://en.wikipedia.org/wiki/Conflict-free_replicated_data_type) for syncing (evidenced by various strings appearing in the data and analysis of what happens as you edit). I suspect the one used for documents, which they call "topotext", is not actually a CvRDT (they seem to order conflicts on a first-to-sync basis rather than by vector clock), but I could be mistaken. It doesn't matter since everything goes through iCloud, which serializes the merges.
 14 | 
 15 | ## Protobuf Data
 16 | 
 17 | **Document Wrapper**
 18 | Everything is wrapped with a versioned document object:
 19 | 
 20 |     message Document {
 21 |         repeated Version version = 2;
 22 |     }
 23 |     message Version {
 24 |         optional bytes data = 3;
 25 |     }
 26 | 
 27 | There are additional fields that aren't relevant to us, and I've never seen more than one `version`. The content of the `data` field is also protobuf but varies depending on whether we're looking at a note, table, or drawing.
 28 | 
 29 | **Notes**
 30 | The protobuf data for a note is a `String` as described below. Assume every field is optional; I’ve elided the CRDT stuff.  (Repeated field 3 of String is a sequence clock, length, attribute clock, tombstone, and children.  It forms a DAG. For chunks of length >1 the clock is implicitly incremented for each character.)
 31 | 
 32 | 
 33 |     message String {
 34 |         string string = 2;
 35 |         // these are in order, disjoint, and their length sums to the length of string
 36 |         repeated AttributeRun attributeRun = 5;
 37 |     }
 38 |     message AttributeRun {
 39 |         uint32 length = 1;
 40 |         ParagraphStyle paragraphStyle = 2;
 41 |         Font font = 3;    
 42 |         uint32 fontHints = 5; // 1:bold, 2:italic, 3:bold italic
 43 |         uint32 underline = 6;
 44 |         uint32 strikethrough = 7;
 45 |         int32 superscript = 8; // sign indicates super/subscript
 46 |         string link = 9;
 47 |         Color color = 10;
 48 |         AttachmentInfo attachmentInfo = 12;
 49 |     }
 50 |     message ParagraphStyle {
 51 |         // 0:title, 1:heading, 4:monospace, 100:dotitem, 101:dashitem, 102:numitem, 
 52 |         // 103:todoitem
 53 |         uint32 style = 1;
 54 |         uint32 alignment = 2; // 0:left, 1:center, 2:right, 3:justified
 55 |         int32 indent = 4;
 56 |         Todo todo = 5;
 57 |     }
 58 |     message Font {
 59 |         string name = 1;
 60 |         float pointSize = 2;
 61 |         uint32 fontHints = 3;
 62 |     }
 63 |     message AttachmentInfo {
 64 |         string attachmentIdentifier = 1;
 65 |         string typeUTI = 2;
 66 |     }
 67 |     message Todo {
 68 |         bytes todoUUID = 1;
 69 |         bool done = 2;
 70 |     }
 71 |     message Color {
 72 |         float red = 1;
 73 |         float green = 2;
 74 |         float blue = 3;
 75 |         float alpha = 4;
 76 |     }
 77 | 
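A small sketch of how the attribute runs line up with the text, assuming the note has already been parsed into a dict with `string` and `attributeRun` keys (as the scripts in this repository do) and that `length` counts Python string characters, which is how the scripts treat it. `split_runs` is a name made up for this example.

```python
def split_runs(note):
    """Yield (text fragment, run) pairs by walking the runs in order."""
    pos = 0
    for run in note.get("attributeRun", []):
        end = pos + run["length"]
        # Runs are disjoint and ordered, so each one covers string[pos:end].
        yield note["string"][pos:end], run
        pos = end
```

Because the run lengths sum to the length of the string, the fragments concatenated in order reproduce the whole text.
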
 78 | **Drawings**
 79 | This info isn’t strictly necessary.  For each drawing, you’ll find a rendering in `FallbackImages/UUID.jpg`. I was curious whether I could recover / backup the original vector data, so I came up with the following. (The root object here is `Drawing`.)
 80 | 
 81 | 
 82 |     message Drawing {
 83 |         int64 serializationVersion = 1;
 84 |         repeated bytes replicaUUIDs = 2;
 85 |         repeated StrokeID versionVector = 3;
 86 |         repeated Ink inks = 4;
 87 |         repeated Stroke strokes = 5;
 88 |         int64 orientation = 6;
 89 |         StrokeID orientationVersion = 7;
 90 |         Rectangle bounds = 8;
 91 |         bytes uuid = 9;
 92 |     }
 93 |     message Color {
 94 |         float red = 1;
 95 |         float green = 2;
 96 |         float blue = 3;
 97 |         float alpha = 4;
 98 |     }
 99 |     message Rectangle {
100 |         float height = 4;
101 |         float originX = 1;
102 |         float originY = 2;
103 |         float width = 3;
104 |     }
105 |     message Transform {
106 |         float a = 1;
107 |         float b = 2;
108 |         float c = 3;
109 |         float d = 4;
110 |         float tx = 5;
111 |         float ty = 6;
112 |     }
113 |     message Ink {
114 |         Color color = 1;
115 |         string identifier = 2;
116 |         int64 version = 3;
117 |     }
118 |     message Stroke {
119 |         int64 inkIndex = 3;
120 |         int64 pointsCount = 4;
121 |         bytes points = 5;
122 |         Rectangle bounds = 6;
123 |         bool hidden = 9;
124 |         double timestamp = 11;
125 |         bool createdWithFinger = 12;
126 |         Transform transform = 10;
127 |     }
128 | 
129 | The byte array `points` is a compactly encoded list of points. It is a sequence of this struct:
130 | 
131 | 
132 |     struct PKCompressedStrokePoint {
133 |         float timestamp;            // timestamp (delta from something, probably previous)
134 |         float xpos;
135 |         float ypos;
136 |         unsigned short radius;      // radius*10
137 |         unsigned short aspectRatio; // aspectRatio*1000
138 |         unsigned short edgeWidth;   // edgeWidth*10
139 |         unsigned short force;       // force*1000
140 |         unsigned short azimuth;     // azimuth*10430.2191955274
141 |         unsigned char altitude;     // altitude*162.338041953733
142 |         unsigned char opacity;      // opacity*255
143 |     };
144 | 
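A sketch of unpacking `points` in Python, assuming the struct is little-endian and packed without padding (24 bytes per point, the same `<3f5H2B` layout that `notes2html` iterates over). The `StrokePoint` tuple and `decode_points` are names invented for this example; the scale factors come from the comments in the struct above.

```python
import struct
from collections import namedtuple

POINT = struct.Struct("<3f5H2B")  # 3 floats, 5 unsigned shorts, 2 unsigned chars

StrokePoint = namedtuple(
    "StrokePoint",
    "timestamp x y radius aspect_ratio edge_width force azimuth altitude opacity",
)

def decode_points(blob):
    """Decode a Stroke.points byte string into a list of StrokePoint tuples."""
    points = []
    for t, x, y, radius, aspect, edge, force, azimuth, altitude, opacity in POINT.iter_unpack(blob):
        points.append(StrokePoint(
            t, x, y,
            radius / 10,                  # stored as radius*10
            aspect / 1000,                # stored as aspectRatio*1000
            edge / 10,                    # stored as edgeWidth*10
            force / 1000,                 # stored as force*1000
            azimuth / 10430.2191955274,   # stored as azimuth*10430.2191955274
            altitude / 162.338041953733,  # stored as altitude*162.338041953733
            opacity / 255,                # stored as opacity*255
        ))
    return points
```

`iter_unpack` raises `struct.error` if the blob length is not a multiple of 24, which is a cheap sanity check on the assumed layout.
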
145 | The `inkIndex` field points into the array `inks`. The `identifier` in an `Ink` includes stuff like `com.apple.ink.marker`. I kinda fudged this in my svg generation - apple’s rendering code is much more sophisticated, taking azimuth/altitude into account.  My code works well enough for pen, but falls short on marker.  You may be better off with the jpeg.
146 | 
147 | **Tables**
148 | This one is complicated, and I think a bit of explanation is in order about why it’s structured this way. They are trying to model a table with multiple people editing at the same time. Editing the contents of a cell is essentially a solved problem (it’s complicated, but it’s solved above - each cell is its own "document" - a `String` object from above).
149 | 
150 | But in addition to this, people are adding, removing, and reordering columns.  You want to ensure that if two people move a column or one person adds a column and another adds a row, things end up in a sane state, no matter which order you see the operations.
151 | 
152 | To do this, we consider the rows to be an ordered set of uuids. (And the same for the columns.) Then you have a map of column uuid → row uuid → String object.
153 | 
154 | The data itself is a pile of CRDTs encoded with something like NSKeyedArchiver, but built on top of protobuf for these CRDTs. The root object contains a few tables and a list of objects (like NSKeyedArchiver), which are referenced by index via a variant type that Apple calls `ObjectID`. (I managed to figure this out by generically decoding the protobuf data and looking at it, but later found they ship an older revision of the proto files to their web app.)
155 | 
156 | This is the variant type used below:
157 | 
158 | 
159 |     message ObjectID {
160 |         uint64 unsignedIntegerValue = 2;
161 |         string stringValue = 4;
162 |         uint32 objectIndex = 6;
163 |     }
164 | 
165 | The `objectIndex` is an index into the list of `object` in `Document`.
166 | 
167 | The root `Document` is:
168 | 
169 | 
170 |     message Document {   
171 |         repeated DocObject object = 3;
172 |         repeated string keyItem = 4;
173 |         repeated string typeItem = 5;
174 |         repeated bytes uuidItem = 6;
175 |     }
176 |     
177 |     message DocObject {
178 |         RegisterLatest registerLatest = 1;
179 |         Dictionary dictionary = 6;
180 |         String string = 10;  // this is our fancy String above
181 |         CustomObject custom = 13;
182 |         OrderedSet orderedSet = 16;    
183 |     }
184 |     
185 | 
186 | The first object in the `object` field is the root object. A `CustomObject` is essentially a key/value map with a type.  The keys are indexed from `keyItem` and type is from `typeItem`.
187 | 
188 | 
189 |     message CustomObject {
190 |         int32 type = 1; // index into "typeItem" of Document
191 |         message MapEntry {
192 |             required int32 key = 1; // index into "keyItem" of Document
193 |             required ObjectID value = 2;
194 |         }
195 |         repeated MapEntry mapEntry = 3;
196 |     }
197 | 
198 | For a UUID, the type is `com.apple.CRDT.NSUUID` and there is a `UUIDIndex` field whose value is the index of the UUID in `uuidItem` of the `Document`.
199 | 
200 | A `RegisterLatest` is just a CRDT for a value.  There is a clock, not shown here, which helps with merging conflicts.  It’s last write wins. It only appears to be used to point at an NSString custom object holding “CRTableColumnDirectionLeftToRight”.  I’m ignoring this at the moment.
201 | 
202 | 
203 |     message RegisterLatest {
204 |         ObjectID contents = 2;
205 |     }
206 | 
207 | A `Dictionary` object holds an `ObjectID` for both key and value. In practice the keys point to a UUID `CustomObject` and the value is either a UUID or another dictionary. (This is a last-write-wins CRDT; I’m leaving out the clock values.)
208 | 
209 | 
210 |     message Dictionary {
211 |         message Element {
212 |            ObjectID key = 1;
213 |            ObjectID value = 2;
214 |         }
215 |         repeated Element element = 1;
216 |     }
217 | 
218 | Which leaves us with `OrderedSet`.  An ordered set leverages `String` to provide a vector of UUIDs via `TTArray`.  Pairs of string position to UUID are stored in `attachments` of `TTArray`.  Surrounding that is `Array` which has a CRDT Dictionary to map the uuid of the TTArray to the content in that position of the array (which happens to also be a UUID in this case).  And surrounding that is `OrderedSet` which contains another `Dictionary` of UUID to UUID, but the keys and values are the same (uuids from the TTArray space). This seems to be used to filter out deleted items in the case where you simultaneously move and delete a column. (The move does delete + add and the delete does a delete, so a copy remains in the Array.)
219 | 
220 | Two conflicting moves will also create duplicates in the array. Apple appears to handle this by ignoring all but the first instance. (And cleaning it up on the next move of that column.)
221 | 
222 | 
223 |     message OrderedSet {
224 |         Array ordering = 1;
225 |         Dictionary elements = 2; // set of elements that haven't been deleted
226 |     }
227 |     message Array {
228 |         TTArray array = 1; // TTArray
229 |         Dictionary contents = 2; // map of TTArray uuid to content uuid
230 |     }
231 |     message TTArray {
232 |         String contents = 1; // we don't actually reference this.
233 |         repeated ArrayAttachment attachments = 2; // list of (position -> uuid)
234 |     }
235 |     message ArrayAttachment {
236 |         int64 index = 1;
237 |         bytes uuid = 2;
238 |     }
239 | 
240 | 
241 | **Decoding Tables**
242 | 
243 | Ok, so to decode a table, the root object will be a CustomObject with the fields:
244 | 
245 | | Field       | Value                                                       |
246 | | ----------- | ----------------------------------------------------------- |
247 | | crRows      | OrderedSet for row uuids                                    |
248 | | crColumns   | OrderedSet for column uuids                                 |
249 | | cellColumns | Dictionary of column uuid → Dictionary of row uuid → String |
250 | 
251 | The `value`s of these fields are referenced by `objectIndex`.  Both `crRows` and `crColumns` are a `CustomObject` of type `OrderedSet`.
252 | 
253 | To get the list of uuids for these ordered sets, take each `ordering.array.attachments.uuid`, filter out values that don’t appear as keys in `elements`, and look up each of the resulting values in the dictionary `ordering.contents`. If the same content uuid shows up more than once, keep only the first instance.
254 | 
255 | Then iterate through the column uuids and look them up in `cellColumns`. The result, for each column, will be a dictionary.  In this dictionary, look up each row uuid.  The result, if present, will be a String object (or rather an `ObjectID` pointing to a String object).  This is the content of the table cell.
256 | 
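The lookup steps above can be sketched as follows, assuming every `ObjectID` has already been resolved so that `elements`, `contents`, and `cell_columns` are plain Python dicts keyed by uuid bytes and `attachments` is a list of dicts with a `uuid` key. The function names are made up for illustration; `process_archive` in `notes2html` does the real work.

```python
def ordered_uuids(attachments, elements, contents):
    """Resolve an OrderedSet to its list of content uuids, in order."""
    result = []
    for attachment in attachments:
        key = attachment["uuid"]
        if key not in elements:   # deleted entries are absent from `elements`
            continue
        value = contents[key]
        if value not in result:   # conflicting moves leave duplicates; keep the first
            result.append(value)
    return result

def table_cells(rows, columns, cell_columns):
    """Return the table as a list of rows; a missing cell is None."""
    return [
        [cell_columns.get(col, {}).get(row) for col in columns]
        for row in rows
    ]
```

Calling `ordered_uuids` once for `crRows` and once for `crColumns`, then passing both results to `table_cells`, gives the cells in display order.
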
257 | 
258 | 
259 | ## TODO
260 | 
261 | I still need to write up the attachment, folder, and encryption stuff.
262 | 
263 | 
264 | 


--------------------------------------------------------------------------------
/notes2bear:
--------------------------------------------------------------------------------
  1 | #!/usr/bin/env python3
  2 | import zlib, os, sqlite3, re, zipfile, json
  3 | from struct import unpack_from
  4 | from datetime import datetime
  5 | 
  6 | # Simple script to export Notes.app to Bear.app backup format
  7 | # I have code to decode the tables and drawings (to svg), but not necessary for Bear
  8 | 
  9 | def uvarint(data,pos):
 10 |     x = s = 0
 11 |     while True:
 12 |         b = data[pos]
 13 |         pos += 1
 14 |         x = x | ((b&0x7f)<<s)
 15 |         if b < 0x80: return x,pos
 16 |         s += 7
 17 | 
 18 | def readbytes(data,pos):
 19 |     l,pos = uvarint(data,pos)
 20 |     return data[pos:pos+l], pos+l
 21 | 
 22 | def readstruct(fmt,l):
 23 |     return lambda data,pos: (unpack_from(fmt,data,pos)[0],pos+l)
 24 | 
 25 | readers = [ uvarint, readstruct('<d',8), readbytes, None, None, readstruct('<f',4) ]
 26 | 
 27 | def parse(data, schema):
 28 |     "parses a protobuf"
 29 |     obj = {}
 30 |     pos = 0
 31 |     while pos < len(data):
 32 |         val,pos = uvarint(data,pos)
 33 |         typ = val & 7
 34 |         key = val >> 3
 35 |         val, pos = readers[typ](data,pos)
 36 |         if key not in schema: 
 37 |             continue
 38 |         name, repeated, typ = schema[key]
 39 |         if isinstance(typ, dict):
 40 |             val = parse(val, typ)
 41 |         if typ == 'string':
 42 |             val = val.decode('utf8')
 43 |         if repeated:
 44 |             val = obj.get(name,[]) + [val]
 45 |         obj[name] = val
 46 |     return obj
 47 | 
 48 | def translate(data, media):
 49 |     styles = {0: '# ', 1: '## ',100: '* ', 101: '* ', 102: '1. ', 103: '- '}
 50 |     rval = []
 51 |     refs = []
 52 |     txt = data['string']
 53 |     pos = 0
 54 |     acc = None
 55 |     pre = False
 56 |     for run in data['attributeRun']:
 57 |         l = run['length']
 58 |         for frag in re.findall(r'\n|[^\n]+',txt[pos:pos+l]):
 59 |             if acc is None: # start paragraph
 60 |                 pstyle = run.get('paragraphStyle',{}).get('style')
 61 |                 indent = run.get('paragraphStyle',{}).get('indent',0)
 62 |                 acc = "  "*indent+styles.get(pstyle,"")
 63 |                 if pstyle == 103 and run['paragraphStyle'].get('todo',{}).get('done'): # todo/done may be absent
 64 |                     acc = "  "*indent+"+ "
 65 |                 if pstyle == 4:
 66 |                     if not pre: 
 67 |                         rval.append("```")
 68 |                 elif pre:
 69 |                     rval.append("```")
 70 |                 pre = pstyle == 4
 71 |             if frag == '\n': # end paragraph
 72 |                 rval.append(acc)
 73 |                 acc = None
 74 |             else: # accumulate and handle inline styles - although bear doesn't seem to support nested ones. 
 75 |                 link = run.get('link')
 76 |                 info = run.get('attachmentInfo')
 77 |                 style = run.get('fontHints',0) + 4*run.get('underline',0) + 8*run.get('strikethrough',0)
 78 |                 if style & 1: frag = f'*{frag}*'
 79 |                 if style & 2: frag = f'/{frag}/'
 80 |                 if style & 4: frag = f'_{frag}_'
 81 |                 if style & 8: frag = f'~{frag}~'
 82 |                 if link: frag = f'[{frag}]({link})'
 83 |                 if info:
 84 |                     id = info.get('attachmentIdentifier')
 85 |                     fn = media.get(id)
 86 |                     if fn:
 87 |                         _,e = os.path.splitext(fn)
 88 |                         acc += f'[assets/{id}{e}]'
 89 |                         refs.append(id)
 90 |                     else:
 91 |                         acc += f"ATTACH {info}"
 92 |                 else:
 93 |                     acc += frag
 94 |         pos += l
 95 |     if acc: rval.append(acc)
 96 |     rval = '\n'.join(rval)+"\n"
 97 |     return rval,refs
 98 | 
 99 | # The schema subset needed for bear export
100 | docschema = { 
101 |     2: [ "version", 1, { 
102 |         3: [ "data", 0, {
103 |             2: [ "string", 0, "string"],
104 |             5: [ "attributeRun", 1, {
105 |                 1: ["length",0,0],
106 |                 2: ["paragraphStyle", 0, {
107 |                     1: ["style", 0,0],
108 |                     4: ["indent",0,0],
109 |                     5: ["todo",0,{ 
110 |                         1: ["todoUUID", 0, "bytes"],
111 |                         2: ["done",0,0]
112 |                     }]
113 |                 }],
114 |                 5: ["fontHints",0,0],
115 |                 6: ["underline",0,0],
116 |                 7: ["strikethrough",0,0],
117 |                 9: [ "link", 0, "string" ],
118 |                 12: [ "attachmentInfo", 0, {
119 |                     1: [ "attachmentIdentifier", 0, "string"],
120 |                     2: [ "typeUTI", 0, "string"]
121 |                 }]
122 |             }]
123 |         }]
124 |     }]
125 | }
126 | 
127 | if __name__ == '__main__':
128 |     root = os.path.expanduser("~/Library/Group Containers/group.com.apple.notes")
129 |     db = sqlite3.Connection(os.path.join(root,'NoteStore.sqlite'))
130 |     media = {} # there is some indirection for attachments
131 |     for a,b,fn in db.execute('select a.zidentifier, b.zidentifier, b.zfilename from ziccloudsyncingobject a left join ziccloudsyncingobject b on a.zmedia = b.z_pk'):
132 |         if fn:
133 |             full = os.path.join(root,'Media',b,fn)
134 |         else:
135 |             full = os.path.join(root,'FallbackImages',a+".jpg")
136 |         if os.path.exists(full):
137 |             media[a] = full
138 | 
139 |     count = 0
140 |     with zipfile.ZipFile('notes.bearbk','w') as zip:
141 |         for id, title, data, cdate, mdate in db.execute('select o.zidentifier, ztitle1, zdata, o.zcreationdate1, o.zmodificationdate1 from zicnotedata join ziccloudsyncingobject o on znotedata = zicnotedata.z_pk where zicnotedata.zcryptotag is null'):
142 |             ze = f'notes.bearbk/{id}.textbundle'
143 |             if data:
144 |                 pb = zlib.decompress(data, 47)
145 |                 doc = parse(pb, docschema)['version'][0]['data']
146 |                 if not doc['string']: continue # some are blank
147 |                 text,refs = translate(doc, media)
148 |                 info = {
149 |                     "type":"public.plain-text", 
150 |                     "creatorIdentifier": "net.shinyfrog.bear",
151 |                     "net.shinyfrog.bear": { 
152 |                         "modificationDate": datetime.fromtimestamp(int(mdate)+978307200).isoformat(),
153 |                         "creationDate": datetime.fromtimestamp(int(cdate)+978307200).isoformat()
154 |                     },
155 |                     "version":2
156 |                 }
157 |                 zip.writestr(f'{ze}/info.json',json.dumps(info))
158 |                 zip.writestr(f'{ze}/text.txt',text)
159 |                 for ref in refs:
160 |                     _,e = os.path.splitext(media[ref])
161 |                     fn = f'{ze}/assets/{ref}{e}'
162 |                     zip.write(media[ref],fn)
163 |                 count += 1
164 |     print(f"wrote {count} notes to notes.bearbk")
165 | 
166 | 


--------------------------------------------------------------------------------
/notes2html:
--------------------------------------------------------------------------------
  1 | #!/usr/bin/env python3
  2 | import os, sqlite3, json, struct, re, zipfile, sys
  3 | from zlib import decompress
  4 | import xml.etree.ElementTree as ET
  5 | 
  6 | # HTML construction utils
  7 | 
  8 | def append(rval,a):
  9 |     "append a to rval and return a"
 10 |     if isinstance(a,str):
 11 |         i = len(rval)-1
 12 |         if i<0:
 13 |             rval.text = (rval.text or "")+a
 14 |         else:
 15 |             rval[i].tail = (rval[i].tail or "")+a
 16 |     elif isinstance(a,ET.Element):
 17 |         rval.append(a)
 18 |     elif isinstance(a,dict):
 19 |         rval.attrib.update(a)
 20 |     else:
 21 |         raise Exception(f"unhandled type {type(a)}")
 22 |     return a
 23 | 
 24 | def E(tag,*args):
 25 |     tag,*cc = tag.split('.')
 26 |     rval = ET.Element(tag)
 27 |     tail = None
 28 |     if cc: rval.set('class',' '.join(cc))
 29 |     for a in args:
 30 |         append(rval,a)
 31 |     return rval
 32 | 
 33 | # protobuf parser
 34 | 
 35 | def uvarint(data,pos):
 36 |     x = s = 0
 37 |     while True:
 38 |         b = data[pos]
 39 |         pos += 1
 40 |         x = x | ((b&0x7f)<<s)
 41 |         if b < 0x80: return x,pos
 42 |         s += 7
 43 | 
 44 | def readbytes(data,pos):
 45 |     l,pos = uvarint(data,pos)
 46 |     return data[pos:pos+l], pos+l
 47 | 
 48 | def readstruct(fmt,l):
 49 |     return lambda data,pos: (struct.unpack_from(fmt,data,pos)[0],pos+l)
 50 | 
 51 | readers = [ uvarint, readstruct('<d',8), readbytes, None, None, readstruct('<f',4) ]
 52 | 
 53 | def parse(data, schema):
 54 |     "parses a protobuf"
 55 |     obj = {}
 56 |     pos = 0
 57 |     while pos < len(data):
 58 |         val,pos = uvarint(data,pos)
 59 |         typ = val & 7
 60 |         key = val >> 3
 61 |         val, pos = readers[typ](data,pos)
 62 |         if key not in schema: 
 63 |             continue
 64 |         name, repeated, typ = schema[key]
 65 |         if isinstance(typ, dict):
 66 |             val = parse(val, typ)
 67 |         if typ == 'string':
 68 |             val = val.decode('utf8')
 69 |         if repeated:
 70 |             val = obj.get(name,[]) + [val]
 71 |         obj[name] = val
 72 |     return obj
 73 | 
 74 | def svg(drawing):
 75 |     "Convert note drawing to SVG"
 76 |     width = drawing['bounds']['width']
 77 |     height = drawing['bounds']['height']
 78 |     rval = E('svg',{'width':str(width),'height':str(height)})
 79 |     inks = drawing.get('inks')
 80 |     for stroke in drawing.get('strokes',[]):
 81 |         if stroke.get('hidden'):
 82 |             continue
 83 |         if 'points' in stroke:
 84 |             swidth=1
 85 |             ink = inks[stroke['inkIndex']]
 86 |             c = ink['color']
 87 |             red = int(c['red']*255)
 88 |             green = int(c['green']*255)
 89 |             blue = int(c['blue']*255)
 90 |             alpha = c['alpha']
 91 |             if ink['identifier'] == 'com.apple.ink.marker':
 92 |                 swidth = 15
 93 |                 alpha = 0.5
 94 | 
 95 |             color = f'rgba({red},{green},{blue},{alpha})'
 96 |             path = ''
 97 |             for _,x,y,*rest in struct.iter_unpack('<3f5H2B',stroke['points']):
 98 |                 path += f"L{x:.2f} {y:.2f}"
 99 |             path = "M"+path[1:]
100 |             
101 |             rval.append(E('path',{'d':path,'stroke':color,'stroke-width':str(swidth),'stroke-linecap':'round','fill':'none'}))
102 |             if 'transform' in stroke:
103 |                 rval[-1].set('transform',"matrix({a} {b} {c} {d} {tx:.2f} {ty:.2f})".format(**stroke['transform']))
104 |     return rval
105 | 
106 | def render_html(note,attachments={}):
107 |     "Convert note attributed string to HTML"
108 |     if note is None:
109 |         return ""
110 |     # TODO
111 |     # - attachments
112 |     styles = {0:'h1',1:'h2',4:'pre',100:'li',101:'li',102:'li',103:'li'}
113 |     rval = E('div')
114 |     txt = note['string']
115 |     pos = 0
116 |     par = None
117 |     for run in note.get('attributeRun',[]):
118 |         l = run['length']
119 |         for frag in re.findall(r'\n|[^\n]+',txt[pos:pos+l]):
120 |             if par is None: # start paragraph
121 |                 pstyle = run.get('paragraphStyle',{}).get('style',-1)
122 |                 indent = run.get('paragraphStyle',{}).get('indent',0)
123 |                 if pstyle >= 100: # this mess handles merging bulleted lists
124 |                     tag = ['ul','ul','ol','ul'][pstyle - 100]
125 |                     par = rval
126 |                     while indent > 0 and len(par): # stop if there is no child to descend into
127 |                         last = par[-1]
128 |                         if last.tag != tag:
129 |                             break
130 |                         par = last
131 |                         indent -= 1
132 |                     while indent >= 0:
133 |                         par = append(par,E(tag))
134 |                         indent -= 1
135 |                     par = append(par,E('li'))
136 |                 elif pstyle == 4 and len(rval) and rval[-1].tag == 'pre':
137 |                     par = rval[-1]
138 |                     append(par,"\n")
139 |                 else:
140 |                     par = append(rval,E(styles.get(pstyle,'p')))
141 |                 if pstyle == 103:
142 |                     par.append(E('input',{"type":"checkbox"}))
143 |                     if run.get('paragraphStyle',{}).get('todo',{}).get('done'):
144 |                         par[0].set('checked','')
145 |             if frag == '\n':
146 |                 par = None
147 |             else:
148 |                 link = run.get('link')
149 |                 info = run.get('attachmentInfo')
150 |                 style = run.get('fontHints',0) + 4*run.get('underline',0) + 8*run.get('strikethrough',0)
151 |                 if style & 1: frag = E('b',frag)
152 |                 if style & 2: frag = E('em',frag)
153 |                 if style & 4: frag = E('u',frag)
154 |                 if style & 8: frag = E('strike',frag)
155 |                 if info:
156 |                     attach = attachments.get(info.get('attachmentIdentifier'))
157 |                     if attach and attach.get('html'):
158 |                         frag = attach.get('html')
159 |                 append(par,frag)
160 |         pos += l
161 |     return rval
162 | 
163 | def process_archive(table):
164 |     "Decode a 'CRArchive'"
165 |     objects = []
166 | 
167 |     def dodict(v):
168 |         rval = {}
169 |         for e in v.get('element',[]):
170 |             rval[coerce(e['key'])] = coerce(e['value'])
171 |         return rval
172 | 
173 |     def coerce(o):
174 |         [(k,v)]= o.items()
175 |         if 'custom' == k:
176 |             rval = dict((table['keyItem'][e['key']],coerce(e['value'])) for e in v['mapEntry'])
177 |             typ = table['typeItem'][v['type']]
178 |             if typ == 'com.apple.CRDT.NSUUID':
179 |                 return table['uuidItem'][rval['UUIDIndex']]
180 |             if typ == 'com.apple.CRDT.NSString':
181 |                 return rval['self']
182 |             return rval
183 |         if k == 'objectIndex':
184 |             return coerce(table['object'][v])
185 |         if k == 'registerLatest':
186 |             return coerce(v['contents'])
187 |         if k == 'orderedSet':
188 |             elements = dodict(v['elements'])
189 |             contents = dodict(v['ordering']['contents'])
190 |             rval = []
191 |             for a in v['ordering']['array']['attachments']:
192 |                 value = contents[a['uuid']]
193 |                 if value not in rval and a['uuid'] in elements:
194 |                     rval.append(value)
195 |             return rval
196 |         if k == 'dictionary':
197 |             return dodict(v)
198 |         if k in ('stringValue','unsignedIntegerValue','string'):
199 |             return v
200 |         raise Exception(f"unhandled type {k}")
201 | 
202 |     return coerce(table['object'][0])
203 | 
204 | def render_table(table):
205 |     "Render a table to html"
206 |     table = process_archive(table)
207 |     rval = E('table')
208 |     for row in table['crRows']:
209 |         tr = E('tr')
210 |         rval.append(tr)
211 |         for col in table['crColumns']:
212 |             cell = table.get('cellColumns').get(col,{}).get(row)
213 |             td = E('td',render_html(cell))
214 |             tr.append(td)
215 |     return rval
216 | 
217 | s_string = {
218 |     2: [ "string", 0, "string"],
219 |     5: [ "attributeRun", 1, {
220 |         1: ["length",0,0],
221 |         2: ["paragraphStyle", 0, {
222 |             1: ["style", 0,0],
223 |             4: ["indent",0,0],
224 |             5: ["todo",0,{ 
225 |                 1: ["todoUUID", 0, "bytes"],
226 |                 2: ["done",0,0]
227 |             }]
228 |         }],
229 |         5: ["fontHints",0,0],
230 |         6: ["underline",0,0],
231 |         7: ["strikethrough",0,0],
232 |         9: ["link",0,"string"],
233 |         12: [ "attachmentInfo", 0, {
234 |             1: [ "attachmentIdentifier", 0, "string"],
235 |             2: [ "typeUTI", 0, "string"]
236 |         }]
237 |     }]
238 | }
239 | 
240 | s_doc = { 2: ["version", 1, { 3: ["data", 0, s_string ]}]}
241 | 
242 | s_drawing = { 2: ["version", 1, { 3: ["data", 0, {
243 |             4: ["inks",1, {
244 |                 1:["color",0,{1:["red",0,0],2:["green",0,0],3:["blue",0,0],4:["alpha",0,0]}],
245 |                 2:["identifier",0,"string"]
246 |             }],
247 |             5: ["strokes",1, {
248 |                 3:["inkIndex",0,0],
249 |                 5:["points",0,"bytes"],
250 |                 9:["hidden",0,0],
251 |                 10: ["transform",0,{1:["a",0,0],2:["b",0,0],3:["c",0,0],4:["d",0,0],5:["tx",0,0],6:["ty",0,0]}]
252 |             }],
253 |             8: ["bounds", 0, {1:["originX",0,0],2:["originY",0,0],3:["width",0,0],4:["height",0,0]}]
254 |         }]
255 |     }
256 | ]}
257 | 
258 | # this essentially is a variant type
259 | s_oid = { 2:["unsignedIntegerValue",0,0], 4:["stringValue",0,'string'], 6:["objectIndex",0,0] }
260 | s_dictionary = {1:["element",1,{ 1:["key",0,s_oid], 2:["value",0,s_oid]}]}
261 | s_table = { 2: ["version", 1, { 3: ["data", 0, {
262 |     3: ["object",1,{
263 |         1:["registerLatest",0,{2:["contents",0,s_oid]}],
264 |         6:["dictionary",0,s_dictionary],
265 |         10:["string",0,s_string],
266 |         13:["custom",0,{
267 |             1:["type",0,0],
268 |             3:["mapEntry",1,{
269 |                 1:["key",0,0],
270 |                 2:["value",0,s_oid]
271 |             }]
272 |         }],
273 |         16:["orderedSet",0,{
274 |             1: ["ordering",0, {
275 |                 1:["array",0,{
276 |                     1:["contents",0,s_string],
277 |                     2:["attachments",1,{1:["index",0,0],2:["uuid",0,0]}]
278 |                 }],
279 |                 2:["contents",0,s_dictionary]
280 |             }],
281 |             2: ["elements",0,s_dictionary]
282 |         }]
283 |     }],
284 |     4:["keyItem",1,"string"],
285 |     5:["typeItem",1,"string"],
286 |     6:["uuidItem",1,"bytes"]
287 | }]}]}
288 | 
289 | def write(data,*path):
290 |     path = os.path.join(*path)
291 |     os.makedirs(os.path.dirname(path),exist_ok=True)
292 |     open(path,'wb').write(data)
293 | 
294 | if __name__ == '__main__':
295 |     css = '''
296 | .underline { text-decoration: underline; }
297 | .strikethrough { text-decoration: line-through; }
298 | .todo { list-style-type: none; margin-left: -20px; }
299 | .dashitem { list-style-type: none; }
300 | .dashitem:before { content: "-"; text-indent: -5px }
301 | '''
302 | 
303 |     def help():
304 |         print(f'Usage:\n')
305 |         print(f'   {sys.argv[0]} [--svg] [--title] dest')
306 |         print(f'   --svg    Use svg for drawings')
307 |         print(f'   --title  Use title for filenames')
308 |         print(f'   dest     destination directory')
309 |         print()
310 |         sys.exit(1)
311 | 
312 |     dest = None
313 |     use_svg = False    
314 |     use_title = False
315 |     for x in sys.argv[1:]:
316 |         if x == '--svg': use_svg = True
317 |         elif x == '--title': use_title = True
318 |         elif x.startswith('--'):
319 |             help()
320 |         else:
321 |             dest = x
322 | 
323 |     if not dest:
324 |         help()
325 | 
326 | 
327 |     root = os.path.expanduser("~/Library/Group Containers/group.com.apple.notes")
328 |     dbpath = os.path.join(root,'NoteStore.sqlite')
329 |     write(open(dbpath,'rb').read(),dest,'NoteStore.sqlite')
330 |     db = sqlite3.Connection(dbpath)
331 | 
332 |     # process attachments first
333 |     attachments = {}
334 |     mquery = '''select a.zidentifier, a.zmergeabledata, a.ztypeuti, b.zidentifier, b.zfilename, a.zurlstring,a.ztitle
335 |         from ziccloudsyncingobject a left join ziccloudsyncingobject b on a.zmedia = b.z_pk
336 |         where a.zcryptotag is null and a.ztypeuti is not null'''
337 |     for id, data, typ, id2, fname, url,title in db.execute(mquery):
338 |         if typ == 'com.apple.drawing' and data and use_svg:
339 |             doc = parse(decompress(data,47),s_drawing)
340 |             attachments[id] = {'html': svg(doc['version'][0]['data'])}
341 |         elif typ == 'com.apple.notes.table' and data:
342 |             doc = parse(decompress(data,47),s_table)
343 |             attachments[id] = {'html': render_table(doc['version'][0]['data']) }
344 |         elif typ == 'public.url':
345 |             # there is a preview image somewhere too, but not sure I care
346 |             attachments[id] = {'html': E('a',{'href':url},title or url)}
347 |         elif fname:
348 |             fn = os.path.join('Media',id2,fname)
349 |             if typ in ['public.tiff','public.jpeg','public.png']:
350 |                 attachments[id] = {'html': E('img',{'src':fn})}
351 |             else:
352 |                 attachments[id] = {'html': E('a',{'href':fn},fname)}
353 |             src = os.path.join(root,fn)
354 |             if os.path.exists(src):
355 |                 write(open(src,'rb').read(),dest,fn)
356 |         else:
357 |             fn = os.path.join('FallbackImages',id+'.jpg')
358 |             src = os.path.join(root,fn)
359 |             if os.path.exists(src):
360 |                 attachments[id] = {'html': E('img',{'src':fn})}
361 |                 write(open(src,'rb').read(),dest,fn)
362 |             
363 |     nquery = '''select a.zidentifier, a.ztitle1, n.zdata from zicnotedata n join ziccloudsyncingobject a on a.znotedata = n.z_pk 
364 |         where n.zcryptotag is null and zdata is not null'''
365 | 
366 |     seen = set()
367 |     count = 0
368 |     for id,title,data in db.execute(nquery):
369 |         pb = decompress(data,47)
370 |         doc = parse(pb,s_doc)['version'][0]['data']
371 |         section = render_html(doc,attachments)
372 |         section.tag = 'section'
373 |         hdoc = E('html',E('head',E('style',css)),E('body',section))
374 |         fn = id
375 |         if use_title and title:
376 |             tmp = title.replace(':','_').replace('/','_')[:64]
377 |             fn = tmp
378 |             ix = 1
379 |             while fn in seen:
380 |                 fn = tmp + '_' + str(ix)
381 |                 ix += 1
382 |             seen.add(fn)
383 |         html = ET.tostring(hdoc,method='html')
384 |         try:
385 |             write(html,dest,f'{fn}.html')
386 |         except:
387 |             print(f'write to {fn}.html failed, trying {id}.html')
388 |             write(html,dest,f'{id}.html')
389 |         count += 1
390 | 
391 |     print(f"wrote {count} documents to {dest}")
392 | 


--------------------------------------------------------------------------------
/notes2quiver:
--------------------------------------------------------------------------------
  1 | #!/usr/bin/env python3
  2 | import os, sqlite3, json, struct, re, zipfile, sys
  3 | from zlib import decompress
  4 | import xml.etree.ElementTree as ET
  5 | 
  6 | # protobuf parser
  7 | 
  8 | def uvarint(data,pos):
  9 |     x = s = 0
 10 |     while True:
 11 |         b = data[pos]
 12 |         pos += 1
 13 |         x = x | ((b&0x7f)<<s)
 14 |         if b < 0x80: return x,pos
 15 |         s += 7
 16 | 
 17 | def readbytes(data,pos):
 18 |     l,pos = uvarint(data,pos)
 19 |     return data[pos:pos+l], pos+l
 20 | 
 21 | def readstruct(fmt,l):
 22 |     return lambda data,pos: (struct.unpack_from(fmt,data,pos)[0],pos+l)
 23 | 
 24 | readers = [ uvarint, readstruct('<d',8), readbytes, None, None, readstruct('<f',4) ]
 25 | 
 26 | def parse(data, schema):
 27 |     "parses a protobuf"
 28 |     obj = {}
 29 |     pos = 0
 30 |     while pos < len(data):
 31 |         val,pos = uvarint(data,pos)
 32 |         typ = val & 7
 33 |         key = val >> 3
 34 |         val, pos = readers[typ](data,pos)
 35 |         if key not in schema: 
 36 |             continue
 37 |         name, repeated, typ = schema[key]
 38 |         if isinstance(typ, dict):
 39 |             val = parse(val, typ)
 40 |         if typ == 'string':
 41 |             val = val.decode('utf8')
 42 |         if repeated:
 43 |             val = obj.get(name,[]) + [val]
 44 |         obj[name] = val
 45 |     return obj
 46 | 
 47 | # HTML construction utils
 48 | 
 49 | def append(rval,a):
 50 |     "append a to rval and return a"
 51 |     if isinstance(a,str):
 52 |         i = len(rval)-1
 53 |         if i<0:
 54 |             rval.text = (rval.text or "")+a
 55 |         else:
 56 |             rval[i].tail = (rval[i].tail or "")+a
 57 |     elif isinstance(a,ET.Element):
 58 |         rval.append(a)
 59 |     elif isinstance(a,dict):
 60 |         rval.attrib.update(a)
 61 |     else:
 62 |         raise Exception(f"unhandled type {type(a)}")
 63 |     return a
 64 | 
 65 | def E(tag,*args,**attrs):
 66 |     tag,*cc = tag.split('.')
 67 |     rval = ET.Element(tag)
 68 |     tail = None
 69 |     if cc: rval.set('class',' '.join(cc))
 70 |     if attrs:
 71 |         append(rval,attrs)
 72 |     for a in args:
 73 |         append(rval,a)
 74 |     return rval
 75 | 
 76 | # Util for processing CRArchive
 77 | 
 78 | def process_archive(table):
 79 |     "Decode a 'CRArchive' (for tables)"
 80 |     objects = []
 81 | 
 82 |     def dodict(v):
 83 |         return {coerce(e['key']):coerce(e['value']) for e in v.get('element',[])}
 84 | 
 85 |     def coerce(o):
 86 |         [(k,v)] = o.items()
 87 |         if 'custom' == k:
 88 |             rval = dict((table['keyItem'][e['key']],coerce(e['value'])) for e in v['mapEntry'])
 89 |             typ = table['typeItem'][v['type']]
 90 |             if typ == 'com.apple.CRDT.NSUUID':
 91 |                 return table['uuidItem'][rval['UUIDIndex']]
 92 |             if typ == 'com.apple.CRDT.NSString':
 93 |                 return rval['self']
 94 |             return rval
 95 |         if k == 'objectIndex':
 96 |             return coerce(table['object'][v])
 97 |         if k == 'registerLatest':
 98 |             return coerce(v['contents'])
 99 |         if k == 'orderedSet':
100 |             elements = dodict(v['elements'])
101 |             contents = dodict(v['ordering']['contents'])
102 |             rval = []
103 |             for a in v['ordering']['array']['attachments']:
104 |                 value = contents[a['uuid']]
105 |                 if value not in rval and a['uuid'] in elements:
106 |                     rval.append(value)
107 |             return rval
108 |         if k == 'dictionary':
109 |             return dodict(v)
110 |         if k in ('stringValue','unsignedIntegerValue','string'):
111 |             return v
112 |         raise Exception(f"unhandled type {k}")
113 | 
114 |     return coerce(table['object'][0])
115 | 
116 | # HTML
117 | 
118 | def render_html(note,get_attach=lambda x:None):
119 |     "Convert note attributed string to HTML"
120 |     if note is None:
121 |         return ""
122 |     # TODO
123 |     # - attachments
124 |     styles = {0:'h1',1:'h2',4:'pre',100:'li',101:'li',102:'li',103:'li'}
125 |     rval = E('div')
126 |     txt = note['string']
127 |     pos = 0
128 |     par = None
129 |     for run in note.get('attributeRun',[]):
130 |         l = run['length']
131 |         for frag in re.findall(r'\n|[^\n]+',txt[pos:pos+l]):
132 |             if par is None: # start paragraph
133 |                 pstyle = run.get('paragraphStyle',{}).get('style',-1)
134 |                 indent = run.get('paragraphStyle',{}).get('indent',0)
135 |                 if pstyle >= 100: # this mess handles merging todo lists
136 |                     tag = ['ul','ul','ol','ul'][pstyle - 100]
137 |                     par = rval
138 |                     while indent > 0:
139 |                         last = par[-1] if len(par) else None
140 |                         if last is None or last.tag != tag:
141 |                             break
142 |                         par = last
143 |                         indent -= 1
144 |                     while indent >= 0:
145 |                         par = append(par,E(tag))
146 |                         indent -= 1
147 |                     par = append(par,E('li'))
148 |                 elif pstyle == 4 and len(rval) and rval[-1].tag == 'pre':
149 |                     par = rval[-1]
150 |                     append(par,"\n")
151 |                 else:
152 |                     par = append(rval,E(styles.get(pstyle,'p')))
153 |                 if pstyle == 103:
154 |                     par.append(E('input',{"type":"checkbox"}))
155 |                     if run.get('paragraphStyle',{}).get('todo',{}).get('done'):
156 |                         par[0].set('checked','')
157 |             if frag == '\n':
158 |                 par = None
159 |             else:
160 |                 link = run.get('link')
161 |                 info = run.get('attachmentInfo')
162 |                 style = run.get('fontHints',0) + 4*run.get('underline',0) + 8*run.get('strikethrough',0)
163 |                 if style & 1: frag = E('b',frag)
164 |                 if style & 2: frag = E('em',frag)
165 |                 if style & 4: frag = E('u',frag)
166 |                 if style & 8: frag = E('strike',frag)
167 |                 if info:
168 |                     attach = get_attach(info.get('attachmentIdentifier'))
169 |                     if attach is not None:
170 |                         frag = attach
171 |                 if link:
172 |                     frag = E('a',frag,href=link)
173 | 
174 |                 append(par,frag)
175 |         pos += l
176 |     return rval
177 | 
178 | def render_table_html(table):
179 |     "Render a table to html"
180 |     table = process_archive(table)
181 |     rval = E('table')
182 |     for row in table['crRows']:
183 |         tr = E('tr')
184 |         rval.append(tr)
185 |         for col in table['crColumns']:
186 |             cell = table.get('cellColumns').get(col,{}).get(row)
187 |             td = E('td',render_html(cell))
188 |             tr.append(td)
189 |     return rval
190 | 
191 | 
192 | # protobuf schema
193 | 
194 | s_string = {
195 |     2: [ "string", 0, "string"],
196 |     5: [ "attributeRun", 1, {
197 |         1: ["length",0,0],
198 |         2: ["paragraphStyle", 0, {
199 |             1: ["style", 0,0],
200 |             4: ["indent",0,0],
201 |             5: ["todo",0,{ 
202 |                 1: ["todoUUID", 0, "bytes"],
203 |                 2: ["done",0,0]
204 |             }]
205 |         }],
206 |         5: ["fontHints",0,0],
207 |         6: ["underline",0,0],
208 |         7: ["strikethrough",0,0],
209 |         9: ["link",0,"string"],
210 |         12: [ "attachmentInfo", 0, {
211 |             1: [ "attachmentIdentifier", 0, "string"],
212 |             2: [ "typeUTI", 0, "string"]
213 |         }]
214 |     }]
215 | }
216 | 
217 | s_doc = { 2: ["version", 1, { 3: ["data", 0, s_string ]}]}
218 | 
219 | # this essentially is a variant type
220 | s_oid = { 2:["unsignedIntegerValue",0,0], 4:["stringValue",0,'string'], 6:["objectIndex",0,0] }
221 | s_dictionary = {1:["element",1,{ 1:["key",0,s_oid], 2:["value",0,s_oid]}]}
222 | s_table = { 2: ["version", 1, { 3: ["data", 0, {
223 |     3: ["object",1,{
224 |         1:["registerLatest",0,{2:["contents",0,s_oid]}],
225 |         6:["dictionary",0,s_dictionary],
226 |         10:["string",0,s_string],
227 |         13:["custom",0,{
228 |             1:["type",0,0],
229 |             3:["mapEntry",1,{
230 |                 1:["key",0,0],
231 |                 2:["value",0,s_oid]
232 |             }]
233 |         }],
234 |         16:["orderedSet",0,{
235 |             1: ["ordering",0, {
236 |                 1:["array",0,{
237 |                     1:["contents",0,s_string],
238 |                     2:["attachments",1,{1:["index",0,0],2:["uuid",0,0]}]
239 |                 }],
240 |                 2:["contents",0,s_dictionary]
241 |             }],
242 |             2: ["elements",0,s_dictionary]
243 |         }]
244 |     }],
245 |     4:["keyItem",1,"string"],
246 |     5:["typeItem",1,"string"],
247 |     6:["uuidItem",1,"bytes"]
248 | }]}]}
249 | 
250 | 
251 | if __name__ == '__main__':
252 |     import uuid
253 |     from hashlib import md5
254 | 
255 |     def write(data,*path):
256 |         path = os.path.join(*path)
257 |         os.makedirs(os.path.dirname(path),exist_ok=True)
258 |         open(path,'wb').write(data)
259 | 
260 |     def writej(data,*path):
261 |         write(json.dumps(data,indent=True).encode('utf8'),*path)
262 | 
263 |     dest = 'notes.qvnotebook'
264 |     if len(sys.argv)>1:
265 |         dest = sys.argv[1]
266 | 
267 |     root = os.path.expanduser("~/Library/Group Containers/group.com.apple.notes")
268 |     dbpath = os.path.join(root,'NoteStore.sqlite')
269 |     db = sqlite3.Connection(dbpath)
270 | 
271 |     fn = os.path.join(dest,'meta.json')
272 |     if not os.path.exists(fn):
273 |         import uuid
274 |         writej({'name': 'Notes', 'uuid': str(uuid.uuid4())},fn)
275 | 
276 |     # process attachments first
277 |     attachments = {}
278 |     mquery = '''select a.zmergeabledata, a.ztypeuti, b.zidentifier, b.zfilename, a.zurlstring,a.ztitle
279 |                   from ziccloudsyncingobject a left join ziccloudsyncingobject b on a.zmedia = b.z_pk
280 |                  where a.zcryptotag is null and a.ztypeuti is not null and a.zidentifier = ?'''
281 | 
282 |     nquery = '''select a.zidentifier, a.ztitle1, a.zcreationdate1, a.zmodificationdate1, n.zdata 
283 |                   from zicnotedata n 
284 |                   join ziccloudsyncingobject a on a.znotedata = n.z_pk 
285 |                  where n.zcryptotag is null and zdata is not null and zmarkedfordeletion is not 1'''
286 | 
287 |     # For each note
288 |     for id,title,create,modify,data in db.execute(nquery):
289 |         dn = id+'.qvnote'
290 |         
291 |         def get_attach(id):
292 |             "Find attachment via db / filesystem, copy into note and return html to reference it"
293 |             row = db.execute(mquery,(id,)).fetchone()
294 |             if not row:
295 |                 print("Missed attachment",id)
296 |                 return ""
297 | 
298 |             data, typ, id2, fname, url, title = row
299 |             if typ == 'com.apple.notes.table' and data:
300 |                 doc = parse(decompress(data,47),s_table)
301 |                 return render_table_html(doc['version'][0]['data'])
302 |             elif typ == 'public.url':
303 |                 # there is a preview image somewhere too, but not sure I care
304 |                 return E("a",title or url,href=url)
305 |             elif fname:
306 |                 fn = os.path.join('Media',id2,fname)
307 |             else:
308 |                 fn = os.path.join('FallbackImages',id+'.jpg')
309 | 
310 |             src = os.path.join(root,fn)
311 |             if os.path.exists(src):
312 |                 data = open(src,'rb').read()
313 |                 hc = md5(data).hexdigest().upper()
314 |                 _,ext = os.path.splitext(src)
315 |                 fn2 = hc+ext
316 |                 write(data, dest, dn, 'resources', fn2)
317 |                 if ext in ['.jpg','.jpeg','.png','.tiff']:
318 |                     return E('img', src=f'quiver-image-url/{fn2}',alt=fn)
319 |                 else:
320 |                     return E('a',fn,href=f'quiver-file-url/{fn2}')
321 |             print("fail",id,typ)
322 |             return E('span')
323 | 
324 |         pb = decompress(data,47)
325 |         doc = parse(pb,s_doc)['version'][0]['data']
326 |         section = render_html(doc,get_attach)
327 |         section = ET.tostring(section,method="html").decode('utf8')
328 | 
329 |         unix_ts = 0
330 |         content = {'title': title, 'cells': [{ 'type': 'text', 'data': section }]}
331 |         meta = {'uuid':id, 'created_at': int(create)+978307200, 'tags': [], 'title': title, 'updated_at': int(modify)+978307200}
332 |         writej(content, dest, dn, 'content.json')
333 |         writej(meta, dest, dn, 'meta.json')
334 | 
335 |     print(f"wrote files to {dest}")
336 | 


--------------------------------------------------------------------------------