├── .gitignore
├── README.md
├── TwitterTool.py
├── credentials.txt
└── requirements.txt

/.gitignore:
--------------------------------------------------------------------------------
venv/
*.sw[a-z]
*.pyc

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# Archive-Tweets

This is a command-line tool, written in Python 2.7, for archiving or deleting all tweets
that you have posted, or that you have liked. Optionally, it can also archive all attached
media (images, etc.).

## Use

Windows: set up a UNIX-compatible interface, like Cygwin. Then follow
the Linux/OSX instructions below.

Linux/OSX: Open your terminal, clone this repo, and run `pip install -r requirements.txt`.
This software will need to authenticate with Twitter, so you'll need to
[create an app on Twitter](https://apps.twitter.com/) and insert the credentials you obtain
into the `credentials.txt` file.

Then you can run the script. I suggest using `-W ignore` to suppress SSL warnings.
There are several flags:

- `--posted` to indicate that you want to select the tweets you have personally made/retweeted
- `--liked` to indicate that you want to select the tweets you have liked
- `--archive` to indicate that you want to download the selected tweets
- `--delete` to indicate that you want to delete the selected tweets (or un-like them, in the case of `--liked`)
- `--media` to indicate that, if you're archiving tweets, you also want to save their media attachments (images, etc.)

You may select only one of `--posted` and `--liked` at a time, not both. The `--media`
flag requires the `--archive` flag: you can only archive the media attachments if you're already
archiving the tweets in the first place.

Some examples of correct use:

`$ python -W ignore TwitterTool.py --archive --media --liked`
`$ python -W ignore TwitterTool.py --archive --delete --posted`

The software then creates a folder, `Archive-Liked-Tweets` or `Archive-Personal-Tweets`,
depending on whether you selected `--liked` or `--posted`, respectively, and within that,
creates a new folder for every tweet, named after the timestamp of the tweet's
publication and the tweet's unique identifier. Within every tweet's folder is the pretty-printed `.json`
object representing the tweet, as well as any attached media files, if the option to download them was selected.

## Rate Limits

Since this software relies on the Twitter API, Twitter's rate limits apply:

- For personal tweets, you can make
[900 requests every 15 minutes](https://dev.twitter.com/rest/reference/get/statuses/user_timeline),
each returning up to 200 tweets.

- For liked tweets, you can make
[75 requests every 15 minutes](https://dev.twitter.com/rest/reference/get/favorites/list),
each returning up to 200 tweets.

With these generous limits in place, you should find it possible to handle your entire
timeline rather swiftly. Should you hit a rate limit, the app will simply sleep until
the fifteen-minute period is over.
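
If you want to check how many calls remain in the current window before starting a run,
you can do so from the REPL. This is a minimal sketch using the same `CheckRateLimit` call
and attributes that `TwitterTool.py` itself uses; swap in the `user_timeline` URL to check
the limit for personal tweets:

```
>>> from TwitterTool import *
>>> api = credentials_and_authenticate()
>>> limit = api.CheckRateLimit("https://api.twitter.com/1.1/favorites/list.json")
>>> print limit.remaining, limit.reset
```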

## Display

Display of tweets has not been a priority in development so far. I believe there are other
Open-Source projects that have done [a reasonable job](https://github.com/amwhalen/archive-my-tweets) at this, which you can
adapt straightforwardly. (I'd be happy to accept a PR that generates pages rendering the archived tweets. It is my plan to do this eventually.)

To search all the tweets in a directory for some text, `cd` into the relevant directory, and then use:
`grep -rnw . -e "<search text>"`
e.g. `grep -rnw ./Archive-Liked-Tweets -e "rice pudding"`

## Notes

- The tool relies on the [Python-Twitter](https://github.com/bear/python-twitter) library,
which provides a helpful wrapper around Twitter's API. Note that Twitter changes its API
from time to time (on the scale of months or years), so the objects returned by the API
functions (`api.GetUserTimeline`, `api.GetFavorites`, etc.) may not always be what this tool expects.

To catch such errors, run this tool from the REPL, make a few calls to the API,
and double-check the JSON objects you get against the [Twitter API documentation](https://dev.twitter.com/rest/reference).
The documentation for every endpoint has examples of the returned objects.

Example of REPL use:
```
>>> from TwitterTool import *
>>> api = credentials_and_authenticate()
>>> api.GetStatus(824666495305162752).__dict__['_json']
>>> api.GetUserTimeline(count=1)
```

- One frequent Twitter pattern is the posting of threads or tweetstorms, which this software
currently does not handle automatically. You'd have to `like` all the tweets in a tweetstorm
to archive all of them. It would be more convenient if you could just `like` the first one
and have the software grab the rest for you. This is a bit of an inconvenience, and it's
currently an open issue.

--------------------------------------------------------------------------------
/TwitterTool.py:
--------------------------------------------------------------------------------
#!/usr/bin/env python2.7
"""
A small utility to save or delete all of your personally-posted or liked tweets.
John Loeber | contact@johnloeber.com | January 13, 2017 | Python 2.7.6
"""

import dateutil.parser
import twitter
import json
import os
import sys
import logging
import argparse
import ConfigParser
import urllib

from time import time, sleep
from math import ceil

# to enable saving logs, consider e.g. logging.basicConfig(filename="twitter-tool-x.log")
# where x should be a unique identifier for this log (e.g. timestamp).
logging.basicConfig(stream=sys.stdout, level=logging.INFO)

# the twitter api session is set in main(); declared here so that it exists as a module-level global
api = None

def download_media(folder_path, media_url, fallback_filename):
    """
    Download a media file contained in a tweet.
    - folder_path: string, the name of the folder in which to save the file
    - media_url: string, url to the media file
    - fallback_filename: string, file name to use when one cannot be derived from the URL
    """
    logging.info("Preparing to download media: " + media_url)
    if "/media/" in media_url:
        # I am not entirely sure if all media_urls contain "/media/", hence this conditional
        media_suffix = media_url[media_url.index("/media/")+7:]
    else:
        extension = "." + media_url.split(".")[-1]
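        # no "/media/" segment in the URL: fall back to the caller-supplied file name, keeping the URL's extension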
        media_suffix = fallback_filename + extension

    file_path = folder_path + "/" + media_suffix
    # often, the media item in 'entities' is also in 'extended_entities'. don't download twice.
    if media_suffix not in os.listdir(folder_path):
        logging.info("Downloading media: " + media_url)
        urllib.urlretrieve(media_url, file_path)
    else:
        logging.info("Skipped duplicate media download: " + media_url)

def archive_single_tweet(tweet, archive_name, id_str, media):
    """
    Archives a single tweet.
    - tweet: twitter.Status, representing the tweet
    - archive_name: string, the folder in which this tweet is to have its archive sub-folder
    - id_str: string, the tweet's unique identifier
    - media: boolean, save attached media if True.
    """
    tweet_as_dict = tweet.AsDict()
    logging.info("Archiving tweet id: " + id_str)
    created_at = dateutil.parser.parse(tweet_as_dict['created_at'])
    folder_name = created_at.strftime("%Y-%m-%d-%H:%M:%S") + "-" + id_str
    folder_path = archive_name + "/" + folder_name

    if os.path.exists(folder_path):
        logging.info("Trying to archive tweet: " + folder_name + "\n\tArchive folder already exists. Proceeding anyway.")
    else:
        os.makedirs(folder_path)

    file_name = "tweet-" + id_str
    file_path = folder_path + "/" + file_name + ".json"
    tweet_as_json = tweet.__dict__['_json']

    with open(file_path, "w") as f:
        json.dump(tweet_as_json, f, indent=4, sort_keys=True, separators=(',', ':'))

    if media:
        # handle media attachments
        if 'media' in tweet_as_json['entities']:
            tweet_entities_media = tweet_as_json['entities']['media']
            for media_index, media_item in enumerate(tweet_entities_media):
                fallback_file_name = "media_" + str(media_index)
                download_media(folder_path, media_item['media_url'], fallback_file_name)

        if 'extended_entities' in tweet_as_json:
            if 'media' in tweet_as_json['extended_entities']:
                tweet_ee_media = tweet_as_json['extended_entities']['media']
                for media_index, media_item in enumerate(tweet_ee_media):
                    fallback_file_name = "extended_media_" + str(media_index)
                    download_media(folder_path, media_item['media_url'], fallback_file_name)

def handle_single_liked_tweet(tweet, archive, delete, media):
    """
    archives or deletes a single liked tweet.
    - tweet: twitter.Status, representing the tweet
    - archive: boolean, saving the tweet if True
    - delete: boolean, un-liking the tweet if True
    - media: boolean, saving the tweet's media if True (and if archive is True)
    """
    id_str = tweet.__dict__['id_str']
    logging.info("Handling tweet id: " + id_str)

    if archive:
        archive_name = "Archive-Liked-Tweets"
        archive_single_tweet(tweet, archive_name, id_str, media)

    if delete:
        logging.info("Un-liking tweet: " + id_str)
        api.DestroyFavorite(status_id=tweet.__dict__['id'])

def handle_single_personal_tweet(tweet, archive, delete, media):
    """
    archives or deletes a single personal tweet.
    - tweet: twitter.Status, representing the tweet
    - archive: boolean, saving the tweet if True
    - delete: boolean, deleting the tweet if True
    - media: boolean, saving the tweet's media if True (and if archive is True)
    """
    id_str = tweet.__dict__['id_str']
    logging.info("Handling tweet id: " + id_str)

    if archive:
        archive_name = "Archive-Personal-Tweets"
        archive_single_tweet(tweet, archive_name, id_str, media)

    if delete:
        logging.info("Deleting tweet: " + id_str)
        api.DestroyStatus(status_id=tweet.__dict__['id'])

def handle_liked_tweets(archive, delete, media):
    """
    archives or deletes as many liked tweets as possible. (see README for limits.)
    - archive: boolean, saving the tweets if True
    - delete: boolean, un-liking the tweets if True
    - media: boolean, saving each tweet's media if True (and if archive is True)
    """
    if not os.path.exists("Archive-Liked-Tweets"):
        os.makedirs("Archive-Liked-Tweets")

    liked_ratelimit = api.CheckRateLimit("https://api.twitter.com/1.1/favorites/list.json")
    remaining = liked_ratelimit.remaining
    reset_timestamp = liked_ratelimit.reset
    logging.info("Rate Limit Status: " + str(remaining) + " calls to `favorites` remaining in this 15-minute time period.")

    if remaining > 0:
        logging.info("Retrieving a new batch of favorites!")
        favorites = api.GetFavorites(count=200)
        for favorite in favorites:
            handle_single_liked_tweet(favorite, archive, delete, media)
        if len(favorites) == 0:
            logging.info("There are no more liked tweets to handle!")
        else:
            handle_liked_tweets(archive, delete, media)
    else:
        logging.info("Rate limit has been hit! Sleeping until rate limit resets.")
        # sleep until the reset timestamp (a point in the future) has passed
        seconds_until_reset = int(ceil(reset_timestamp - time()))
        sleep(seconds_until_reset)
        handle_liked_tweets(archive, delete, media)


def handle_personal_tweets(archive, delete, media):
    """
    archives or deletes as many personal tweets as possible. (see README for limits.)
    - archive: boolean, saving the tweets if True
    - delete: boolean, deleting the tweets if True
    - media: boolean, saving each tweet's media if True (and if archive is True)
    """
    if not os.path.exists("Archive-Personal-Tweets"):
        os.makedirs("Archive-Personal-Tweets")

    usertimeline_ratelimit = api.CheckRateLimit("https://api.twitter.com/1.1/statuses/user_timeline.json")
    remaining = usertimeline_ratelimit.remaining
    reset_timestamp = usertimeline_ratelimit.reset
    logging.info("Rate Limit Status: " + str(remaining) + " calls to `user_timeline` remaining in this 15-minute time period.")

    if remaining > 0:
        logging.info("Retrieving a new batch of personal tweets!")
        tweets = api.GetUserTimeline(count=200)
        for tweet in tweets:
            handle_single_personal_tweet(tweet, archive, delete, media)
        if len(tweets) == 0:
            logging.info("There are no more personal tweets to handle!")
        else:
            handle_personal_tweets(archive, delete, media)
    else:
        logging.info("Rate limit has been hit! Sleeping until rate limit resets.")
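        # reset_timestamp is the epoch time at which the 15-minute window resets, so wait (reset - now) seconds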
        seconds_until_reset = int(ceil(reset_timestamp - time()))
        sleep(seconds_until_reset)
        handle_personal_tweets(archive, delete, media)

def arguments_and_confirm():
    """
    handle the user's command-line arguments, ensure input is valid,
    confirm the user's intention.
    """
    parser = argparse.ArgumentParser(description='See README for help with running this program.')
    group = parser.add_mutually_exclusive_group()
    group.add_argument("--liked", help="use this flag to handle liked/favorited tweets.",
                       action="store_true", default=False)
    group.add_argument("--posted", help="use this flag to handle tweets that you have authored (retweets included).",
                       action="store_true", default=False)
    parser.add_argument("--archive", help="use this flag to archive (save) tweets.",
                        action="store_true", default=False)
    parser.add_argument("--delete", help="use this flag to delete/un-like tweets.",
                        action="store_true", default=False)
    parser.add_argument("--media", help="use this flag to save media files attached to tweets, if archiving.",
                        action="store_true", default=False)

    args = parser.parse_args()

    if not (args.posted or args.liked):
        raise ValueError("You must supply either the --posted or --liked flag to specify whether "
                         "you want to handle the tweets that you made/retweeted, or the tweets you liked. "
                         "\nPlease see README for instructions.")

    elif (not args.archive) and args.media:
        raise ValueError("You have selected not to archive, but to save media. This is impossible. "
                         "You can only save media if you're archiving.\nPlease see README for instructions.")

    elif not (args.archive or args.delete):
        raise ValueError("You must supply at least one of the --archive or --delete flags, to "
                         "specify what you want to do with the selected tweets. "
                         "\nPlease see README for instructions.")

    else:
        option_string = "You have selected: "
        if args.archive:
            option_string += "to ARCHIVE "
            if args.media:
                option_string += "(and save media files) "
            if args.delete:
                if args.posted:
                    option_string += "and DELETE "
                else:
                    option_string += "and UN-LIKE "
        else:
            if args.posted:
                option_string += "to DELETE "
            else:
                option_string += "to UN-LIKE "

        if args.posted:
            liked_or_personal = 'personal'
            option_string += "ALL tweets you have POSTED (including retweets)."
        else:
            liked_or_personal = 'liked'
            option_string += "ALL tweets you have LIKED."

        print option_string

        while True:
            # loop in case the user does not confirm correctly.
            confirm = raw_input("Please confirm. Yes/No\n").lower()
            if len(confirm) >= 1:
                if confirm[0] == 'y':
                    return liked_or_personal, args.archive, args.delete, args.media
                elif confirm[0] == 'n':
                    sys.exit(0)

def credentials_and_authenticate():
    """
    parse credentials from credentials.txt and authenticate with Twitter.
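    The credentials file is an INI-style config (readable by ConfigParser) with a single
    [TWITTER-TOOL] section holding consumer_key, consumer_secret, access_token_key,
    and access_token_secret; see the credentials.txt template in this repository.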
    """
    config = ConfigParser.ConfigParser()
    config.read('credentials.txt')
    consumer_key = config.get('TWITTER-TOOL', 'consumer_key')
    consumer_secret = config.get('TWITTER-TOOL', 'consumer_secret')
    access_token_key = config.get('TWITTER-TOOL', 'access_token_key')
    access_token_secret = config.get('TWITTER-TOOL', 'access_token_secret')

    global api
    api = twitter.Api(consumer_key=consumer_key,
                      consumer_secret=consumer_secret,
                      access_token_key=access_token_key,
                      access_token_secret=access_token_secret)

    # returning api instead of None so that it's possible to import and use this from the REPL.
    return api

def main():
    tweet_type, archive, delete, media = arguments_and_confirm()
    credentials_and_authenticate()

    if tweet_type == "personal":
        handle_personal_tweets(archive, delete, media)
    else:
        handle_liked_tweets(archive, delete, media)

if __name__ == '__main__':
    main()

--------------------------------------------------------------------------------
/credentials.txt:
--------------------------------------------------------------------------------
[TWITTER-TOOL]
consumer_key = WWWWWWWW
consumer_secret = XXXXXXXXXXX
access_token_key = YYYYYYYYYYYYY
access_token_secret = ZZZZZZZZZZZZZ

--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
appdirs==1.4.0
future==0.16.0
oauthlib==2.0.1
packaging==16.8
pyparsing==2.1.10
python-dateutil==2.6.0
python-twitter==3.2.1
requests==2.13.0
requests-oauthlib==0.7.0
six==1.10.0
--------------------------------------------------------------------------------
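
For reference, a typical setup-and-run flow under Python 2.7, assuming `virtualenv` is available
(a sketch; the `venv/` directory name matches the entry already in `.gitignore`):

```
$ virtualenv -p python2.7 venv
$ source venv/bin/activate
$ pip install -r requirements.txt
$ python -W ignore TwitterTool.py --archive --media --posted
```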