├── .gitignore
├── README.md
├── TwitterTool.py
├── credentials.txt
└── requirements.txt

/.gitignore:
--------------------------------------------------------------------------------
venv/
*.sw[a-z]
*.pyc

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# Archive-Tweets

This is a command-line tool, written in Python 2.7, for archiving or deleting all tweets
that you have posted, or that you have liked. Optionally, it can also archive all attached
media (images, etc.).

## Use

Windows: set up a UNIX-compatible interface, like Cygwin. Then follow
the Linux/OSX instructions below.

Linux/OSX: Open your terminal, clone this repo, and run `pip install -r requirements.txt`.
This software will need to authenticate with Twitter, so you'll need to
[create an app on Twitter](https://apps.twitter.com/) and insert the credentials you obtain
into the `credentials.txt` file.

Then you can run the script. I suggest using `-W ignore` to suppress SSL warnings.
There are several flags:

- `--posted` to indicate that you want to select the tweets you have personally made/retweeted
- `--liked` to indicate that you want to select the tweets you have liked
- `--archive` to indicate that you want to download the selected tweets
- `--delete` to indicate that you want to delete the selected tweets (or un-like them, in the case of `--liked`)
- `--media` to indicate that, if you're archiving tweets, you also want to save their media attachments (images, etc.)

You may select only one of `--posted` and `--liked` at a time, not both. The `--media`
flag requires the `--archive` flag: you can only archive the media attachments if you're already
archiving the tweets in the first place.

Some examples of correct use:

`$ python -W ignore TwitterTool.py --archive --media --liked`
`$ python -W ignore TwitterTool.py --archive --delete --posted`

The software then creates a folder, `Archive-Liked-Tweets` or `Archive-Personal-Tweets`,
depending on whether you selected `--liked` or `--posted`, respectively, and within that,
creates a new folder for every tweet, named after the timestamp of the tweet's
publication and the tweet's unique identifier. Within every tweet's folder is the pretty-printed `.json`
object representing the tweet, as well as any attached media files, if the option to download them was selected.

## Rate Limits

Since this software relies on the Twitter API, Twitter's rate limits apply:

- For personal tweets, you can make
[900 requests every 15 minutes](https://dev.twitter.com/rest/reference/get/statuses/user_timeline),
each returning up to 200 tweets.

- For liked tweets, you can make
[75 requests every 15 minutes](https://dev.twitter.com/rest/reference/get/favorites/list),
each returning up to 200 tweets.

With these generous limits in place, you should find it possible to handle your entire
timeline rather swiftly. Should you hit a rate limit, the app will simply sleep until
the fifteen-minute period is over.
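
If you want to check how many calls remain in the current window before starting a run,
you can do so from the REPL. This is a minimal sketch using the same `CheckRateLimit` call
and attributes that `TwitterTool.py` itself uses; swap in the `user_timeline` URL to check
the limit for personal tweets:

```
>>> from TwitterTool import *
>>> api = credentials_and_authenticate()
>>> limit = api.CheckRateLimit("https://api.twitter.com/1.1/favorites/list.json")
>>> print limit.remaining, limit.reset
```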

## Display

Display of tweets has not been a priority in development so far. I believe there are other
Open-Source projects that have done [a reasonable job](https://github.com/amwhalen/archive-my-tweets) at this, which you can
adapt straightforwardly. (I'd be happy to accept a PR that generates pages rendering the archived tweets. It is my plan to do this eventually.)

To search all the tweets in a directory for some text, `cd` into the relevant directory, and then use:
`grep -rnw . -e "<search text>"`
e.g. `grep -rnw ./Archive-Liked-Tweets -e "rice pudding"`

## Notes

- The tool relies on the [Python-Twitter](https://github.com/bear/python-twitter) library,
which provides a helpful wrapper around Twitter's API. Note that Twitter changes its API
from time to time (on the scale of months or years), so the objects returned by the API
functions (`api.GetUserTimeline`, `api.GetFavorites`, etc.) may not always be what this tool expects.

To catch such errors, run this tool from the REPL, make a few calls to the API,
and double-check the JSON objects you get against the [Twitter API documentation](https://dev.twitter.com/rest/reference).
The documentation for every endpoint has examples of the returned objects.

Example of REPL use:
```
>>> from TwitterTool import *
>>> api = credentials_and_authenticate()
>>> api.GetStatus(824666495305162752).__dict__['_json']
>>> api.GetUserTimeline(count=1)
```

- One frequent Twitter pattern is the posting of threads or tweetstorms, which this software
currently does not handle automatically. You'd have to `like` all the tweets in a tweetstorm
to archive all of them. It would be more convenient if you could just `like` the first one
and have the software grab the rest for you. This is a bit of an inconvenience, and it's
currently an open issue.

--------------------------------------------------------------------------------
/TwitterTool.py:
--------------------------------------------------------------------------------
#!/usr/bin/env python2.7
"""
A small utility to save or delete all of your personally-posted or liked tweets.
John Loeber | contact@johnloeber.com | January 13, 2017 | Python 2.7.6
"""

import dateutil.parser
import twitter
import json
import os
import sys
import logging
import argparse
import ConfigParser
import urllib

from time import time, sleep
from math import ceil

# to enable saving logs, consider e.g. logging.basicConfig(filename="twitter-tool-x.log")
# where x should be a unique identifier for this log (e.g. timestamp).
logging.basicConfig(stream=sys.stdout, level=logging.INFO)

# the twitter api session is set in main(); declared here so that it exists as a module-level global
api = None

def download_media(folder_path, media_url, fallback_filename):
    """
    Download a media file contained in a tweet.
    - folder_path: string, the name of the folder in which to save the file
    - media_url: string, url to the media file
    - fallback_filename: string, file name to use when one cannot be derived from the URL
    """
    logging.info("Preparing to download media: " + media_url)
    if "/media/" in media_url:
        # I am not entirely sure if all media_urls contain "/media/", hence this conditional
        media_suffix = media_url[media_url.index("/media/")+7:]
    else:
        extension = "." + media_url.split(".")[-1]
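        # no "/media/" segment in the URL: fall back to the caller-supplied file name, keeping the URL's extension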
        media_suffix = fallback_filename + extension

    file_path = folder_path + "/" + media_suffix
    # often, the media item in 'entities' is also in 'extended_entities'. don't download twice.
    if media_suffix not in os.listdir(folder_path):
        logging.info("Downloading media: " + media_url)
        urllib.urlretrieve(media_url, file_path)
    else:
        logging.info("Skipped duplicate media download: " + media_url)

def archive_single_tweet(tweet, archive_name, id_str, media):
    """
    Archives a single tweet.
    - tweet: twitter.Status, representing the tweet
    - archive_name: string, the folder in which this tweet is to have its archive sub-folder
    - id_str: string, the tweet's unique identifier
    - media: boolean, save attached media if True.
    """
    tweet_as_dict = tweet.AsDict()
    logging.info("Archiving tweet id: " + id_str)
    created_at = dateutil.parser.parse(tweet_as_dict['created_at'])
    folder_name = created_at.strftime("%Y-%m-%d-%H:%M:%S") + "-" + id_str
    folder_path = archive_name + "/" + folder_name

    if os.path.exists(folder_path):
        logging.info("Trying to archive tweet: " + folder_name + "\n\tArchive folder already exists. Proceeding anyway.")
    else:
        os.makedirs(folder_path)

    file_name = "tweet-" + id_str
    file_path = folder_path + "/" + file_name + ".json"
    tweet_as_json = tweet.__dict__['_json']

    with open(file_path, "w") as f:
        json.dump(tweet_as_json, f, indent=4, sort_keys=True, separators=(',', ':'))

    if media:
        # handle media attachments
        if 'media' in tweet_as_json['entities']:
            tweet_entities_media = tweet_as_json['entities']['media']
            for media_index, media_item in enumerate(tweet_entities_media):
                fallback_file_name = "media_" + str(media_index)
                download_media(folder_path, media_item['media_url'], fallback_file_name)

        if 'extended_entities' in tweet_as_json:
            if 'media' in tweet_as_json['extended_entities']:
                tweet_ee_media = tweet_as_json['extended_entities']['media']
                for media_index, media_item in enumerate(tweet_ee_media):
                    fallback_file_name = "extended_media_" + str(media_index)
                    download_media(folder_path, media_item['media_url'], fallback_file_name)

def handle_single_liked_tweet(tweet, archive, delete, media):
    """
    archives or deletes a single liked tweet.
    - tweet: twitter.Status, representing the tweet
    - archive: boolean, saving the tweet if True
    - delete: boolean, un-liking the tweet if True
    - media: boolean, saving the tweet's media if True (and if archive is True)
    """
    id_str = tweet.__dict__['id_str']
    logging.info("Handling tweet id: " + id_str)

    if archive:
        archive_name = "Archive-Liked-Tweets"
        archive_single_tweet(tweet, archive_name, id_str, media)

    if delete:
        logging.info("Un-liking tweet: " + id_str)
        api.DestroyFavorite(status_id=tweet.__dict__['id'])

def handle_single_personal_tweet(tweet, archive, delete, media):
    """
    archives or deletes a single personal tweet.
    - tweet: twitter.Status, representing the tweet
    - archive: boolean, saving the tweet if True
    - delete: boolean, deleting the tweet if True
    - media: boolean, saving the tweet's media if True (and if archive is True)
    """
    id_str = tweet.__dict__['id_str']
    logging.info("Handling tweet id: " + id_str)

    if archive:
        archive_name = "Archive-Personal-Tweets"
        archive_single_tweet(tweet, archive_name, id_str, media)

    if delete:
        logging.info("Deleting tweet: " + id_str)
        api.DestroyStatus(status_id=tweet.__dict__['id'])

def handle_liked_tweets(archive, delete, media):
    """
    archives or deletes as many liked tweets as possible. (see README for limits.)
    - archive: boolean, saving the tweets if True
    - delete: boolean, un-liking the tweets if True
    - media: boolean, saving each tweet's media if True (and if archive is True)
    """
    if not os.path.exists("Archive-Liked-Tweets"):
        os.makedirs("Archive-Liked-Tweets")

    liked_ratelimit = api.CheckRateLimit("https://api.twitter.com/1.1/favorites/list.json")
    remaining = liked_ratelimit.remaining
    reset_timestamp = liked_ratelimit.reset
    logging.info("Rate Limit Status: " + str(remaining) + " calls to `favorites` remaining in this 15-minute time period.")

    if remaining > 0:
        logging.info("Retrieving a new batch of favorites!")
        favorites = api.GetFavorites(count=200)
        for favorite in favorites:
            handle_single_liked_tweet(favorite, archive, delete, media)
        if len(favorites) == 0:
            logging.info("There are no more liked tweets to handle!")
        else:
            handle_liked_tweets(archive, delete, media)
    else:
        logging.info("Rate limit has been hit! Sleeping until rate limit resets.")
        # sleep until the reset timestamp (a point in the future) has passed
        seconds_until_reset = int(ceil(reset_timestamp - time()))
        sleep(seconds_until_reset)
        handle_liked_tweets(archive, delete, media)


def handle_personal_tweets(archive, delete, media):
    """
    archives or deletes as many personal tweets as possible. (see README for limits.)
    - archive: boolean, saving the tweets if True
    - delete: boolean, deleting the tweets if True
    - media: boolean, saving each tweet's media if True (and if archive is True)
    """
    if not os.path.exists("Archive-Personal-Tweets"):
        os.makedirs("Archive-Personal-Tweets")

    usertimeline_ratelimit = api.CheckRateLimit("https://api.twitter.com/1.1/statuses/user_timeline.json")
    remaining = usertimeline_ratelimit.remaining
    reset_timestamp = usertimeline_ratelimit.reset
    logging.info("Rate Limit Status: " + str(remaining) + " calls to `user_timeline` remaining in this 15-minute time period.")

    if remaining > 0:
        logging.info("Retrieving a new batch of personal tweets!")
        tweets = api.GetUserTimeline(count=200)
        for tweet in tweets:
            handle_single_personal_tweet(tweet, archive, delete, media)
        if len(tweets) == 0:
            logging.info("There are no more personal tweets to handle!")
        else:
            handle_personal_tweets(archive, delete, media)
    else:
        logging.info("Rate limit has been hit! Sleeping until rate limit resets.")
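        # reset_timestamp is the epoch time at which the 15-minute window resets, so wait (reset - now) seconds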
        seconds_until_reset = int(ceil(reset_timestamp - time()))
        sleep(seconds_until_reset)
        handle_personal_tweets(archive, delete, media)

def arguments_and_confirm():
    """
    handle the user's command-line arguments, ensure input is valid,
    confirm the user's intention.
    """
    parser = argparse.ArgumentParser(description='See README for help with running this program.')
    group = parser.add_mutually_exclusive_group()
    group.add_argument("--liked", help="use this flag to handle liked/favorited tweets.",
                       action="store_true", default=False)
    group.add_argument("--posted", help="use this flag to handle tweets that you have authored (retweets included).",
                       action="store_true", default=False)
    parser.add_argument("--archive", help="use this flag to archive (save) tweets.",
                        action="store_true", default=False)
    parser.add_argument("--delete", help="use this flag to delete/un-like tweets.",
                        action="store_true", default=False)
    parser.add_argument("--media", help="use this flag to save media files attached to tweets, if archiving.",
                        action="store_true", default=False)

    args = parser.parse_args()

    if not (args.posted or args.liked):
        raise ValueError("You must supply either the --posted or --liked flag to specify whether "
                         "you want to handle the tweets that you made/retweeted, or the tweets you liked. "
                         "\nPlease see README for instructions.")

    elif (not args.archive) and args.media:
        raise ValueError("You have selected not to archive, but to save media. This is impossible. "
                         "You can only save media if you're archiving.\nPlease see README for instructions.")

    elif not (args.archive or args.delete):
        raise ValueError("You must supply at least one of the --archive or --delete flags, to "
                         "specify what you want to do with the selected tweets. "
                         "\nPlease see README for instructions.")

    else:
        option_string = "You have selected: "
        if args.archive:
            option_string += "to ARCHIVE "
            if args.media:
                option_string += "(and save media files) "
            if args.delete:
                if args.posted:
                    option_string += "and DELETE "
                else:
                    option_string += "and UN-LIKE "
        else:
            if args.posted:
                option_string += "to DELETE "
            else:
                option_string += "to UN-LIKE "

        if args.posted:
            liked_or_personal = 'personal'
            option_string += "ALL tweets you have POSTED (including retweets)."
        else:
            liked_or_personal = 'liked'
            option_string += "ALL tweets you have LIKED."

        print option_string

        while True:
            # loop in case the user does not confirm correctly.
            confirm = raw_input("Please confirm. Yes/No\n").lower()
            if len(confirm) >= 1:
                if confirm[0] == 'y':
                    return liked_or_personal, args.archive, args.delete, args.media
                elif confirm[0] == 'n':
                    sys.exit(0)

def credentials_and_authenticate():
    """
    parse credentials from credentials.txt and authenticate with Twitter.
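    The credentials file is an INI-style config (readable by ConfigParser) with a single
    [TWITTER-TOOL] section holding consumer_key, consumer_secret, access_token_key,
    and access_token_secret; see the credentials.txt template in this repository.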
    """
    config = ConfigParser.ConfigParser()
    config.read('credentials.txt')
    consumer_key = config.get('TWITTER-TOOL', 'consumer_key')
    consumer_secret = config.get('TWITTER-TOOL', 'consumer_secret')
    access_token_key = config.get('TWITTER-TOOL', 'access_token_key')
    access_token_secret = config.get('TWITTER-TOOL', 'access_token_secret')

    global api
    api = twitter.Api(consumer_key=consumer_key,
                      consumer_secret=consumer_secret,
                      access_token_key=access_token_key,
                      access_token_secret=access_token_secret)

    # returning api instead of None so that it's possible to import and use this from the REPL.
    return api

def main():
    tweet_type, archive, delete, media = arguments_and_confirm()
    credentials_and_authenticate()

    if tweet_type == "personal":
        handle_personal_tweets(archive, delete, media)
    else:
        handle_liked_tweets(archive, delete, media)

if __name__ == '__main__':
    main()

--------------------------------------------------------------------------------
/credentials.txt:
--------------------------------------------------------------------------------
[TWITTER-TOOL]
consumer_key = WWWWWWWW
consumer_secret = XXXXXXXXXXX
access_token_key = YYYYYYYYYYYYY
access_token_secret = ZZZZZZZZZZZZZ

--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
appdirs==1.4.0
future==0.16.0
oauthlib==2.0.1
packaging==16.8
pyparsing==2.1.10
python-dateutil==2.6.0
python-twitter==3.2.1
requests==2.13.0
requests-oauthlib==0.7.0
six==1.10.0
--------------------------------------------------------------------------------
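
For reference, a typical setup-and-run flow under Python 2.7, assuming `virtualenv` is available
(a sketch; the `venv/` directory name matches the entry already in `.gitignore`):

```
$ virtualenv -p python2.7 venv
$ source venv/bin/activate
$ pip install -r requirements.txt
$ python -W ignore TwitterTool.py --archive --media --posted
```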