├── .gitignore
├── README.md
├── bin
    ├── ingest
    ├── publish-albums
    └── publish-posts
├── conf.py.sample
├── ditchbook
    ├── __init__.py
    ├── albums.py
    ├── ingest.py
    ├── micropub.py
    ├── posts.py
    └── util.py
└── setup.py

/.gitignore:
--------------------------------------------------------------------------------
*.pyc
__pycache__
*.egg-info
conf.py
export
mf2
venv
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
Ditchbook: Facebook to Micropub Toolkit
=======================================

Ditchbook is a toolkit for taking a high-fidelity Facebook JSON export, and
migrating selected content to your [Micropub](https://indieweb.org/micropub)
compatible website, including [Micro.blog](https://micro.blog) websites. It's a
great way to own your own data, and free yourself from Facebook.

Usage
-----

### Installation

First, you'll need to clone or download this project. Ditchbook requires Python
3.6 or greater to run. I recommend installing inside of a `virtualenv`:

```sh
$ git clone git@github.com:cleverdevil/ditchbook.git
$ cd ditchbook
$ virtualenv -p python3.6 venv
$ . venv/bin/activate
$ python setup.py develop
```

### Create a Facebook JSON Export

Once you've got a working installation, you'll need to create a Facebook export
[here](https://www.facebook.com/settings?tab=your_facebook_information). Select
`JSON` for the type of export and use high resolution photos. Generating the
export can take a few hours. Once it's ready, download the ZIP file, and
uncompress it into a directory. Let's assume you've placed it in a directory
called "export" contained within your `ditchbook` directory.
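
Before going further, it can help to confirm that the unzipped export has the
layout the tooling expects. The sketch below is not part of ditchbook:
`missing_export_paths` and `EXPECTED` are hypothetical names, and the two paths
are the ones `bin/ingest` in this repository opens, so it assumes Facebook's
export layout still matches them.

```python
import os

# Paths that bin/ingest reads, relative to the export directory.
# (Assumption: Facebook's export layout still matches what bin/ingest expects.)
EXPECTED = [
    os.path.join('posts', 'your_posts.json'),
    os.path.join('photos_and_videos', 'album'),
]


def missing_export_paths(export_dir='export'):
    """Return the expected paths that are absent from export_dir."""
    return [
        rel for rel in EXPECTED
        if not os.path.exists(os.path.join(export_dir, rel))
    ]


if __name__ == '__main__':
    missing = missing_export_paths()
    if missing:
        print('Export looks incomplete; missing:', ', '.join(missing))
    else:
        print('Export layout looks OK.')
```

If anything is reported missing, the ZIP was probably extracted one directory
too deep, or the export was not generated as `JSON`.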

### Ingest and Convert Facebook Data

Next, you'll want to run the `ingest` script, which will try to read in the
data from your Facebook export, and then output it as standard
[microformats2](http://indieweb.org/microformats2) JSON data.

```sh
$ bin/ingest export
```

If all went well, you'll have a directory named `mf2` containing your converted
data. Huzzah!

### Configure Micropub

Next, copy `conf.py.sample` to `conf.py` and make the appropriate edits. You'll
need to provide your Micropub endpoint, Micropub media endpoint, Micropub token,
and the destination.

In addition, you can configure a mapping of names to hyperlinks for when you've
got "mentions" of people in your content.

### Publish: Albums

Now, it's time to publish your photo albums.

```sh
$ bin/publish-albums
```

The script will loop through the albums contained in your export, give you some
basic information about each one, and give you the choice of migrating an album
to your website, or not.

I'd recommend against uploading albums with many hundreds (or thousands) of
photos. Facebook creates an album called "Mobile Uploads" that tends to contain
every single photo ever uploaded with your iOS or Android device, and you're
better off not migrating that album. The photos themselves will be migrated as
"posts" in a future step, if you choose.

### Publish: Posts

Finally, you can publish other "posts" from Facebook, which includes notes,
status updates, and photos. Because much of this content isn't particularly
well-suited for migration, ditchbook ignores many types of content, and focuses
on the types of data you'd like on your website. That means link sharing,
events, and other related data won't be migrated.

At this time, ditchbook doesn't support migrating videos.
Maybe I'll get around
to it in the future. We'll see.

```sh
$ bin/publish-posts
```

This script will run in a similar fashion to the `publish-albums` script, but
will automatically publish posts that have at least one photo. For everything
else, it will ask you for confirmation.

Future
------

Ideally, I'd like to make this process easier and more seamless for end users.
Feel free to use the code to do that! I have no intention of using this code for
any commercial purpose, and am instead primarily motivated to help people free
themselves from Facebook, and control their own information.

Enjoy!

License
-------
This code is licensed under the MIT license.

Copyright 2018, Jonathan LaCour

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
--------------------------------------------------------------------------------
/bin/ingest:
--------------------------------------------------------------------------------
#!/usr/bin/env python

from ditchbook.ingest import process_album, process_post

import json
import os
import sys


if __name__ == '__main__':
    # get path to Facebook export
    if len(sys.argv) < 2:
        print("Usage: ")
        print(" $ bin/ingest <path-to-export>")
        sys.exit(1)
    path = sys.argv[1]

    # create export directories
    os.makedirs('mf2/albums', exist_ok=True)

    # first, handle posts
    with open('%s/posts/your_posts.json' % path, 'r') as posts_file:
        posts = json.loads(posts_file.read())
        for post in posts.get('status_updates', []):
            mf2 = process_post(post)
            if not mf2:
                continue

            with open('mf2/%d.json' % post['timestamp'], 'w') as output:
                output.write(json.dumps(mf2, indent=2))

    # then, handle albums
    for album in os.listdir('%s/photos_and_videos/album' % path):
        with open(os.path.join('%s/photos_and_videos/album' % path, album), 'r') as album_file:
            album_data = json.loads(album_file.read())
            mf2 = process_album(album_data)
            if not mf2:
                continue

            with open('mf2/albums/%d.json' % album_data['last_modified_timestamp'], 'w') as output:
                output.write(json.dumps(mf2, indent=2))
--------------------------------------------------------------------------------
/bin/publish-albums:
--------------------------------------------------------------------------------
#!/usr/bin/env python

from ditchbook.albums import process_album

import sys
import os
import json


def main():
    for filename in os.listdir('mf2/albums'):
        album = json.loads(open(os.path.join('mf2/albums', filename)).read())

        print('=' * 80)
        print('Album Name:', album['properties'].get('name', ['No Name'])[0])
        print(' ->',
              'Published', album['properties'].get('published', [''])[0])
        print(' ->', len(album.get('children', [])), 'photos')

        answer = input('Migrate album [Y, n]? ')
        if answer in ('Y', 'y'):
            process_album(album)


if __name__ == '__main__':
    main()
--------------------------------------------------------------------------------
/bin/publish-posts:
--------------------------------------------------------------------------------
#!/usr/bin/env python

from ditchbook.posts import process_post

import os
import json


def main():
    for filename in os.listdir('mf2'):
        if not filename.endswith('.json'):
            continue

        post = json.loads(open(os.path.join('mf2', filename)).read())

        print('=' * 80)
        print('Post Title:', post['properties'].get('name', ['No Name'])[0])
        print(' ->', 'Published', post['properties'].get('published', [''])[0])
        print(' ->', len(post['properties'].get('photo', [])), 'photos.')
        print(' ->', post['properties'].get('content', [''])[0])

        if len(post['properties'].get('photo', [])):
            print(' <-- AUTOMATICALLY POSTING: PHOTO PRESENT -->')
            process_post(post)
        else:
            answer = input('Migrate post [Y, n]? ')
            if answer in ('Y', 'y'):
                process_post(post)


if __name__ == '__main__':
    main()
--------------------------------------------------------------------------------
/conf.py.sample:
--------------------------------------------------------------------------------
from datetime import datetime
from pytz import timezone


TOKEN = 'INSERT-TOKEN-HERE'
MP_ENDPOINT = 'https://micro.blog/micropub'
MP_MEDIA_ENDPOINT = 'https://micro.blog/micropub/media'
MP_DESTINATION = 'http://your-username.micro.blog'

MENTION_MAP = {
    'Jonathan LaCour': 'Jonathan LaCour'
}

# If you want to map particular date ranges to a specific time zone, go ahead
TIMEZONES = [
    (datetime(1900, 1, 1), datetime(2200, 1, 1), timezone('US/Eastern')),
]
--------------------------------------------------------------------------------
/ditchbook/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/cleverdevil/ditchbook/471a6804d805e86ddcca2de78fb1aa4dc0389697/ditchbook/__init__.py
--------------------------------------------------------------------------------
/ditchbook/albums.py:
--------------------------------------------------------------------------------
from ditchbook import micropub
from ditchbook.util import replace_mentions


photo_tmpl = '''
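<div>
<img src="%(photo_uri)s">
%(caption)s
</div>
'''

# NOTE: the HTML tags in photo_tmpl and caption_tmpl are assumed markup; only
# the %(photo_uri)s and %(caption)s placeholders are known from process_album().
caption_tmpl = '''
<p>
%(caption)s
</p>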
'''


def process_album(album):
    mf2 = album.copy()
    mf2['properties']['photo'] = []
    content_parts = []

    # add the album caption
    album_caption = mf2['properties'].get('content', [None])[0]
    if album_caption:
        album_caption = replace_mentions(album_caption)
        content_parts.append(caption_tmpl % {'caption': album_caption})

    # iterate through the children and populate the body
    for child in mf2.get('children', []):
        uploaded_photo = micropub.upload(child['properties']['photo'][0])

        ns = {'photo_uri': uploaded_photo, 'caption': ''}
        caption = child['properties'].get('content', [None])[0]
        if caption:
            caption = replace_mentions(caption)
            ns['caption'] = caption_tmpl % {'caption': caption}

        content_parts.append(
            photo_tmpl % ns
        )

    # drop the children; pop() avoids a KeyError when the album has none
    mf2.pop('children', None)

    mf2['properties']['content'] = [ {'html': ''.join(content_parts)} ]

    # publish it
    micropub.publish(mf2)
--------------------------------------------------------------------------------
/ditchbook/ingest.py:
--------------------------------------------------------------------------------
from ditchbook import micropub

from datetime import datetime
from pytz import timezone, utc

import json
import re
import os
import conf


FILTERS = [
    re.compile(".* wrote on .*'s timeline."),
    re.compile(".* shared .*'s .*."),
    re.compile(".* shared (a|an) (group|quote|memory|event|link|post|Page)."),
    re.compile(".* shared (his|her) (post|video|album)."),
    re.compile(".* posted in .*."),
    re.compile(".* was (in|at) .*."),
    re.compile(".* was live in."),
    re.compile(".* updated the (group|event) photo in .*."),
    re.compile(".* was looking for recommendations.")
]

NOTE_EXPR = re.compile(".* published a note.")
PHOTO_EXPR = re.compile(".* (added|posted) .* (photo|photos).")
VIDEO_EXPR = re.compile(".* added .* new (video|videos).")


def apply_timezone(dt, default=utc):
    for start, stop, zone in conf.TIMEZONES:
        if start <= dt <= stop:
            return zone.localize(dt).astimezone(utc)

    return default.localize(dt).astimezone(utc)


def process_post(post):
    for expr in FILTERS:
        if expr.match(post.get('title', '')):
            return

    # get localized datetime and then convert to UTC
    dt = datetime.fromtimestamp(post['timestamp'])
    dt = apply_timezone(dt)

    # create MF2 container
    mf2 = {
        'type': ['h-entry'],
        'properties': {
            'published': [dt.isoformat(sep=' ')]
        }
    }

    # if the post doesn't have a 'title', I can't classify it
    # so skip it
    if not post.get('title'):
        return

    # check to see if this is a "note"
    if NOTE_EXPR.match(post['title']):
        note = post['attachments'][0]['data'][0]['note']
        mf2['properties']['content'] = [note['text']]
        mf2['properties']['name'] = [note['title']]

    # handle photos
    # NOTE: media paths are hard-coded relative to ./export
    elif PHOTO_EXPR.match(post['title']):
        if 'data' in post:
            mf2['properties']['content'] = [post['data'][0]['post']]

        mf2['properties']['photo'] = []
        for attachment in post['attachments']:
            for media in attachment['data']:
                try:
                    mf2['properties']['photo'].append(
                        'export/%s' % media['media']['uri']
                    )
                except KeyError:
                    print('-' * 80)
                    print('Unexpected missing key in media ->')
                    print(media)
                    print('-' * 80)
                    return

    # handle videos
    elif VIDEO_EXPR.match(post['title']):
        if 'data' in post:
            mf2['properties']['content'] = [post['data'][0]['post']]

        if 'attachments' not in post:
            return

        mf2['properties']['video'] = []
        for attachment in post['attachments']:
            for media in attachment['data']:
                mf2['properties']['video'].append(
                    'export/%s' % media['media']['uri']
                )

    # handle standard status updates
    elif 'data' in post:
        try:
            mf2['properties']['content'] = [post['data'][0]['post']]
        except KeyError:
            print('-' * 80)
            print('Unexpected missing key "post" ->')
            print(post['data'])
            print('-' * 80)
            return

    else:
        return

    return mf2


def process_album(album):
    # get localized datetime and then convert to UTC
    dt = datetime.fromtimestamp(album['last_modified_timestamp'])
    dt = apply_timezone(dt)

    # create basic MF2 JSON structure
    mf2 = {
        'type': ['h-entry'],
        'properties': {
            'name': [album['name']],
            'published': [dt.isoformat(sep=' ')]
        }
    }

    # add "featured" photo, if present
    if 'cover_photo' in album:
        mf2['properties']['featured'] = [
            'export/' + album['cover_photo']['uri']
        ]

    # if the album has a caption, set it
    if 'description' in album:
        mf2['properties']['content'] = [
            album['description']
        ]

    # append the children to parent
    mf2['children'] = []
    for photo in album['photos']:
        child = {
            'type': ['h-entry'],
            'properties': {
                'photo': [
                    'export/' + photo['uri']
                ]
            }
        }

        # add photo caption, if available
        if 'description' in photo:
            child['properties']['content'] = [photo['description']]

        mf2['children'].append(child)

    return mf2
--------------------------------------------------------------------------------
/ditchbook/micropub.py:
--------------------------------------------------------------------------------
import os
import requests
import sys
import conf
import json


def generate_headers():
    headers = {
        'Authorization': 'Bearer %s' % conf.TOKEN
    }
    if hasattr(conf, 'MP_DESTINATION'):
        headers['mp-destination'] = conf.MP_DESTINATION
    return headers


def upload(file_path):
    print('Attempting to upload:', file_path)

    # open the file in a context manager so the handle is always closed
    with open(file_path, 'rb') as photo:
        files = {'file': ('image.jpg', photo, 'image/jpeg')}
        response = requests.post(
            conf.MP_MEDIA_ENDPOINT,
            headers=generate_headers(),
            files=files
        )

    # Micropub servers may answer 201 Created or 202 Accepted
    if response.status_code in (201, 202):
        print(' Uploaded -> ', response.headers['Location'])
        return response.headers['Location']
    else:
        print(' Failed to upload! Status code', response.status_code)
        sys.exit(1)


def publish(mf2):
    print('Publishing MF2:')
    print(json.dumps(mf2, indent=2))

    response = requests.post(
        conf.MP_ENDPOINT,
        json=mf2,
        headers=generate_headers()
    )

    # Micropub servers may answer 201 Created or 202 Accepted
    if response.status_code in (201, 202):
        print(' Published -> ', response.headers['Location'])
        return response.headers['Location']
    else:
        print(' Failed to publish! Status code', response.status_code)
        sys.exit(1)
--------------------------------------------------------------------------------
/ditchbook/posts.py:
--------------------------------------------------------------------------------
from ditchbook import micropub
from ditchbook.util import replace_mentions


def process_post(post):
    mf2 = post.copy()

    # skip videos for now
    if mf2['properties'].get('video'):
        return

    # upload photos, if any
    photos = []
    for photo in post['properties'].get('photo', []):
        photos.append(micropub.upload(photo))
    if len(photos):
        mf2['properties']['photo'] = photos

    # prepare content
    if len(mf2['properties'].get('content', [])):
        mf2['properties']['content'] = [
            {'html': replace_mentions(mf2['properties']['content'][0])}
        ]

    # publish it
    micropub.publish(mf2)
--------------------------------------------------------------------------------
/ditchbook/util.py:
--------------------------------------------------------------------------------
import re
import conf


# raw string so the regex escapes are not read as string escapes;
# non-greedy group so two mentions in one post are matched separately
MENTION_EXPR = re.compile(r"@\[\d+:\d+:(.*?)\]")

def replace_mentions(content):
    def f(match):
        name = match.groups()[0]
        return conf.MENTION_MAP.get(name, name)
    return re.sub(MENTION_EXPR, f, content)
--------------------------------------------------------------------------------
/setup.py:
--------------------------------------------------------------------------------
# -*- coding: utf-8 -*-
from setuptools import setup, find_packages

setup(
    name='ditchbook',
    version='0.1.0',
    description='Migrate a Facebook JSON export to your Micropub website.',
    author='Jonathan LaCour',
    author_email='jonathan@cleverdevil.org',
    install_requires=[
        "requests",
        "pytz",
    ],
    zip_safe=False,
    include_package_data=True,
    packages=find_packages(exclude=['ez_setup']),
)
--------------------------------------------------------------------------------