├── requirements.txt
├── post.sh
├── README.md
└── laser.py

/requirements.txt:
--------------------------------------------------------------------------------
requests>=2.20.0
twitter==1.17.1
--------------------------------------------------------------------------------
/post.sh:
--------------------------------------------------------------------------------
#!/bin/bash
# Have your bot post.
# It can get stuck in random loops, so remember to use timeout!
set -e

cd "$(dirname "$0")"
export LC_ALL=ja_JP.UTF-8
./laser.py &>> log
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# showmemore

> **Note:** This is not maintained. It was a fun weekend project a few years
> ago, and [@showmepixels](https://twitter.com/showmepixels) is still running,
> but I don't work to improve it and it's only a matter of time before Twitter
> or Tumblr change their APIs so it doesn't work. The code is left up for the
> curious.

**ShowMeMore** is an automated researcher. Given a list of tags to start with,
it goes hunting for images, and over time grows its model in response to
reactions, slowly reaching out to find things you weren't aware you already
liked.

It started as an attempt to recreate
[@Archillect](https://twitter.com/archillect); you can read more about how it
came to be and the motivations behind it
[here](https://www.dampfkraft.com/by-id/2931e31b.html#The-Laser-Syriacum).

Currently, it's configured to be run as a Twitter bot pulling images from
Tumblr and Flickr.

## Setting up Your Own

Currently setting up your own bot involves a lot of manual steps; I cannot
pretend it is anything but tedious. Before getting started you should be
familiar with concepts like API keys and have at least a little experience in
the Unix shell. (If someone wants to automate more of this it would be much
appreciated.)

While the technical steps don't take a lot of time, you'll need to spend a
while evaluating early posts to nudge your bot in the right direction, which
might take a few days.

### Create a bot account

Create a Twitter account for the bot to use. I would be flattered if you called
it `showme`, but anything will do.

Have the account follow your main Twitter account; you'll send DMs to control
it after it's running.

It's fine to make the account private; in fact, I would recommend keeping it
private until you make sure the posts are what you're going for.

You should tweet once via the web client so Twitter knows a human is fiddling
with the account.

### Get API Keys

You need to use at least one of Flickr or Tumblr as a source for posts.

You can register a Tumblr API key
[here](https://www.tumblr.com/docs/en/api/v2). They will give you two API keys;
the one you care about is your "OAuth Consumer Key". We'll use it later, so
just copy it somewhere for now.

You can register a Flickr API key
[here](https://www.flickr.com/services/api/misc.api_keys.html). Flickr will
also give you two keys; you'll need the longer one.

You also need a Twitter app, which you can register
[here](https://apps.twitter.com/). The app will need to request permission to
read and send DMs, which is not the default setting, so be sure to specify
that.

### Set up the bot's environment

You'll need to clone the repository and perform a few more steps. You need
**Python 3** installed. The only library dependencies are
[requests](http://docs.python-requests.org/en/master/) and
[twitter](https://github.com/sixohsix/twitter).

    git clone git@github.com:polm/showmemore.git
    cd showmemore
    pip install -r requirements.txt
    echo mybot > name # use your bot's Twitter handle instead of "mybot"
    mkdir out # downloaded images will be saved here

You'll need to stash your Twitter App credentials in the directory for the bot
to find. `twitter-creds.json` should be a JSON file with `key` and `secret`
fields, as indicated by your Twitter app info.
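
For example, a minimal `twitter-creds.json` looks like this (the values are
placeholders; use the consumer key and secret from your Twitter app's settings
page):

    {
      "key": "YOUR-CONSUMER-KEY",
      "secret": "YOUR-CONSUMER-SECRET"
    }
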
At this point, have a browser open and logged in to Twitter as your bot. We're
going to connect the application to the account. Run the script like this:

    ./laser.py initdb

This will do a couple of things. First it may open a web browser, possibly in
your terminal; kill that and a Twitter URL will be displayed. Open that while
logged in as the bot account, authorize the application, and paste the numeric
code you get into your terminal. After that it will initialize the database,
and you're almost ready to go.

### Initial Settings

Using your normal Twitter account - the one the bot is following - it's time to
give the bot some information to get started with. The bot recognizes several
commands:

- **key**: Set an API key for a source service. Currently, valid service names are just `flickr` and `tumblr`. Example: `key flickr [your api key]`
- **seed**: Assign points to a tag, used to get the bot started. Example: `seed some cool tag`
- **ignore**: Don't use the given tag to pick posts. (Posts with the tag won't be banned, but it'll never be used as a starting point to look for candidates.) Example: `ignore some bad tag`
- **ban**: Ban posts from a Tumblr blog. The "blog name" is usually the `example` in `example.tumblr.com`. Alternatively, to ban a Flickr user, use the format `flickr:[user-id]`, where the `user-id` is the automatically assigned Flickr user ID (looks like `12345678@N00`, not a normal username). Posts from a banned blog will never be selected.

The `seed`, `ignore`, and `ban` keywords can be prefixed with `un` to undo
them. Don't quote arguments to commands; it'll just confuse the bot. For Tumblr
tags, use spaces or dashes the same way Tumblr blogs do - the API treats them
the same, but it's better to use the more common form for seeding or ignoring
to avoid confusing the bot.

To get started you should set at least one API key and at least five seeds. I'd
also go ahead and ignore a few overly general tags, like `art`, `gif`,
`tumblr`, and `ifttt`.
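
For example, a first setup session might be a series of DMs like the following
(the key and tags here are made up; substitute your own):

    key tumblr 0123456789abcdef
    seed brutalist architecture
    seed macro photography
    seed fog
    seed neon signs
    seed tide pools
    ignore art
    ignore gif

Note that commands are only processed when the script runs, so you won't get
the bot's confirmation replies until the next run.
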
### Set up cron

We're almost done. The script needs to be run repeatedly in order to post.
Here's an example cron entry to run it every ten minutes:

    */10 * * * * /home/you/code/showmemore/post.sh

The `post.sh` script just handles logging and encoding. Try running it manually
once to be sure everything works. The bot should reply to each DM you've sent
to let you know it was processed. If it posts successfully, it will add a line
to the `log` file in its script directory consisting of JSON that describes the
post. One field in this JSON to look out for is `origin` - while every aspect
of a candidate is judged, the `origin` is the aspect that was used to find the
candidate in the first place. Checking the origin is a good way to understand
what's going on when the bot surprises you.
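
For illustration, a log line looks roughly like this (all values here are
invented; `id` is `"null"` because the line is written just before the tweet
actually goes out):

    {"aspects": ["author:someblog", "tag:fog", "tag:landscape"], "id": "null", "imageurl": "https://68.media.tumblr.com/abc123/tumblr_xyz_1280.jpg", "origin": "tag:fog", "score": 5731, "source": "https://someblog.tumblr.com/post/123456789"}
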
### Training the bot

At this point the bot should be running successfully, but it doesn't have much
of a model to go on. My suggestion would be to let the bot post ten or twenty
items, then favorite the ones that match your image of what the bot should
post. This will give it more potential sources to draw from when posting. After
that, watch it for a day or two, favoriting good posts and building up your
ignore list, to guide it to what you want it to be.

If you don't like cluttering up your favorites list, use the bot to favorite
its own posts. You can also reply to the bot's posts from any account it
follows, and every emoji star (⭐) in the reply counts as an extra like (so a
reply of `⭐⭐⭐` is worth three); you'll know it worked if the post is
automatically favorited.

Good luck, and have fun!

## Algorithm Overview

The algorithm for selecting posts is simpler than you might think. It doesn't
use anything you'd describe as artificial intelligence, and the actual visual
properties of candidate images are never considered. Its lack of sophistication
means it falls down spectacularly sometimes, but it does have some advantages -
calculations are fast and don't require large banks of data.

Overall, the algorithm is a bit like PageRank working on photo metadata.

"Aspects" are binary, present-or-not properties of a post - tags and authors
(or at least source blogs) are the most important aspects, but liking or
reblogging users on Tumblr and photo groups on Flickr are also aspects. Aspects
have a type, such as `tag`, `author`, `reblog`, `liked`, or `flickr-pool`, and
a value that identifies the relevant resource.

When a tweet by the bot is liked, that's treated as a vote for every aspect of
the source post. (RTs are just treated as multiple likes for scoring purposes.)
Votes are used to calculate two kinds of scores: all-time historical score and
per-post score. So if an aspect has been present in 10 posts that have together
gotten 500 likes, it might have an all-time score of 500, but a per-post score
of just 50.

In actuality, it's a little more complicated than that - points a tweet gets
are divided equally among all aspects on the tweet, so a tweet whose source has
many tags will give fewer points to each.
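
Here's a minimal sketch of that bookkeeping in Python (simplified from
`load_aspects` in `laser.py`; the weights match the constants there, but the
example history is invented):

    from collections import Counter

    FAV_VAL = 500   # points per favorite (fav_val in laser.py)
    RT_VAL = 1500   # points per retweet (rt_val in laser.py)

    def score_aspects(posts):
        """posts: one (aspects, faves, rts) tuple per past tweet."""
        alltime = Counter()      # total points each aspect has ever earned
        post_counts = Counter()  # how many posts each aspect appeared on
        for aspects, faves, rts in posts:
            # a tweet's points are split evenly among its aspects
            points = (FAV_VAL * faves + RT_VAL * rts) // len(aspects)
            for aspect in aspects:
                alltime[aspect] += points
                post_counts[aspect] += 1
        # per-post score divides by post count, floored at 10 so a single
        # lucky post can't dominate (the max(10, ...) in load_aspects)
        perpost = Counter({a: alltime[a] // max(10, post_counts[a])
                           for a in alltime})
        return alltime, perpost

    # hypothetical history: two tweets that share the "fog" tag
    alltime, perpost = score_aspects([
        (['tag:fog', 'author:someblog'], 4, 1),
        (['tag:fog', 'tag:macro'], 2, 0),
    ])
    print(alltime['tag:fog'], perpost['tag:fog'])  # 2250 225
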
Anyway, regarding how points are used, the algorithm has three phases:

1. Candidate Gathering
2. Culling
3. Post Selection

In **Candidate Gathering**, **per-post** scores are used to make a series of
weighted random picks of aspects. These aspects are used to query source APIs
and get posts to look at.

In **Culling**, items that have been posted before or are from banned blogs are
removed from the candidate list. This is the simplest phase, but it's important
for keeping the bot from going in circles. Deciding whether two posts are
duplicates is also a bit subtle; the current design errs on the side of
occasionally posting duplicates while keeping the implementation simple.

If no candidates are left at the end of culling, the algorithm returns to the
Candidate Gathering phase. Otherwise, it proceeds to Post Selection.

In **Post Selection**, candidates are scored on all their aspects using the
**all-time** aspect scores. Then a post is picked with probability proportional
to its score and made into a tweet, and the aspects attached to the post are
recorded in the application's database.

These three steps are repeated every time the script is run.
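
Put together, one run behaves roughly like this sketch (not the real code:
`gather` stands in for the API queries, and the `url`, `blog`, and `aspects`
fields are simplified versions of what `laser.py` actually tracks):

    import random
    from collections import Counter

    def weighted_pick(weights):
        """Weighted random choice over a dict of nonnegative weights
        (cf. counter_choice in laser.py)."""
        keys = list(weights)
        return random.choices(keys, weights=[weights[k] for k in keys], k=1)[0]

    def run_once(perpost, alltime, gather, already_posted, banned):
        """One cycle: gather, cull, select; retry if culling empties the pool."""
        while True:
            aspect = weighted_pick(perpost)          # 1. candidate gathering
            candidates = [c for c in gather(aspect)  # 2. culling
                          if c['url'] not in already_posted
                          and c['blog'] not in banned]
            if not candidates:
                continue  # nothing left, go back to gathering
            # 3. post selection: score candidates by all-time aspect scores,
            # then pick one with probability proportional to its score
            scores = Counter({c['url']: 1 + sum(alltime[a] for a in c['aspects'])
                              for c in candidates})
            return weighted_pick(scores)
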
## Next Steps

See the issues page for small things.

A bigger change that would be nice would be generalizing the program to work on
webpages rather than Tumblr and Flickr API items, as a kind of guided PageRank.

## License

Kopyleft, All Rites Reversed, do as you please. WTFPL if you prefer.

-POLM
--------------------------------------------------------------------------------
/laser.py:
--------------------------------------------------------------------------------
#!/usr/bin/env python3

import os, sys, json
from random import choice, randint
from collections import Counter, defaultdict
import shutil # for downloading images

import requests
from twitter import *
import sqlite3

#### command modes
doinitdb = (len(sys.argv) > 1 and sys.argv[1] == 'initdb')
justranking = (len(sys.argv) > 1 and sys.argv[1] == 'justranking')
nopost = (len(sys.argv) > 1 and sys.argv[1] == 'nopost')

#### parameters

seed_val = 10000 # starting score for seeded tags
rt_val = 1500    # points per retweet
fav_val = 500    # points per favorite
threshold = 200  # aspects scoring below this are dropped from the model

# API keys
KEYS = {}

def bloop(ss):
    # print debugging info
    print('-----| ' + ss + ' |-----')

def uniq(ll):
    return list(set(ll))

# This is where this script is
path = os.path.dirname(os.path.realpath(sys.argv[0]))
bot_name = open('name').read().strip()
# put your app key and secret in this file
creds = json.loads(open(path + "/twitter-creds.json").read())
CONSUMER_KEY = creds['key']
CONSUMER_SECRET = creds['secret']
MY_TWITTER_CREDS = path + '/' + bot_name + '.auth'
if not os.path.exists(MY_TWITTER_CREDS):
    oauth_dance(bot_name, CONSUMER_KEY, CONSUMER_SECRET, MY_TWITTER_CREDS)

oauth_token, oauth_secret = read_token_file(MY_TWITTER_CREDS)

twitter = Twitter(auth=OAuth(oauth_token, oauth_secret, CONSUMER_KEY, CONSUMER_SECRET))

def init_db():
    conn = sqlite3.connect('showme.db')
    conn.execute("""create table source (
        source text primary key not null,
        imgurl text)""")

    conn.execute("""create table tweet (
        tweetid text primary key not null,
        source text,
        faves int,
        rts int)""")

    conn.execute("""create table source_aspect (
        source text not null,
        aspect text not null,
        primary key (source, aspect))""")

    conn.execute("""create table key (
        service text primary key not null,
        key text not null)""")

    conn.execute("""create table seed (
        name text primary key not null)""")

    conn.execute("""create table ignore (
        name text primary key not null)""")

    conn.execute("""create table ban (
        name text primary key not null)""")

    # used to keep track of read messages
    conn.execute("""create table dm (
        id text primary key not null)""")

    conn.execute("""create table reply (
        id text primary key not null,
        tweet text not null,
        stars int)""")

    # add indexes so things happen with reasonable speed
    conn.execute("create index tweet_source_index on tweet(source)")
    conn.execute("create index source_aspect_source_index on source_aspect(source)")

    conn.commit()
    conn.close()

def save_source(conn, post):
    conn.execute("""insert or ignore into source (source, imgurl)
        values (?, ?)""", (post['source'], post['imageurl']))

    for aspect in post['aspects']:
        # these are not checked for changes, so just insert if necessary
        conn.execute("""insert or ignore into source_aspect (source, aspect)
            values (?, ?)""", (post['source'], aspect))

def save_tweet(conn, post):
    # note this does an update first, then an insert that ignores failure;
    # this way an existing row is updated, and a missing one is created
    conn.execute("""update tweet set faves = ?, rts = ? where tweetid = ?""",
        (post['faves'], post['rts'], post['id']))
    conn.execute("""insert or ignore into tweet (faves, rts, source, tweetid)
        values (?, ?, ?, ?)""",
        (post['faves'], post['rts'], post['source'], post['id']))
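
# Aside: to poke at the bot's state by hand, the standard library sqlite3
# module is enough. A sketch (run from the repo directory; assumes the
# schema created by init_db above):
#
#   import sqlite3
#   conn = sqlite3.connect('showme.db')
#   print(conn.execute("select name from seed").fetchall())
#   print(conn.execute("select service from key").fetchall())
#   conn.close()
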

def load_aspects():
    model = {
        'scores': defaultdict(Counter),
        'postcounts': defaultdict(Counter),
        'perpost': Counter()}
    timeline = twitter.statuses.user_timeline(screen_name=bot_name, count=200)

    conn = sqlite3.connect('showme.db')
    tweetmap = {}
    for tweet in timeline:
        tweetmap[tweet['id_str']] = {
            'rts': int(tweet['retweet_count']),
            'faves': int(tweet['favorite_count'])}
        save_tweet(conn, {'id': tweet['id_str'], 'rts': tweet['retweet_count'], 'faves': tweet['favorite_count'], 'source': ''})
    conn.commit()
    conn.close()

    ids = [] # the source urls, used to prevent duplicates
    post_counts = Counter() # how many posts is each aspect used on?

    conn = sqlite3.connect('showme.db')
    seeds = [s[0] for s in conn.execute("select name from seed").fetchall()]
    for seed in seeds:
        model['scores']['tag'][seed] = seed_val
    ignorelist = frozenset(t[0] for t in conn.execute("select name from ignore").fetchall())

    tweetbonus = Counter()
    for tweetid, stars in conn.execute("select tweet, stars from reply"):
        tweetbonus[tweetid] += stars

    cur = conn.cursor()
    for source in cur.execute("""select source from source""").fetchall():
        source = source[0]
        ids.append(source)

        tweet = cur.execute("select faves, rts, tweetid from tweet where source = ?", (source,)).fetchone()

        post_aspects = list(cur.execute("select aspect from source_aspect where source = ?", (source,)))
        aspect_count = max(1, len(post_aspects))

        for aspect in post_aspects:
            aspect = aspect[0]
            post_counts[aspect] += 1
            field, _, val = aspect.partition(':')
            model['postcounts'][field][val] += 1
            if tweet:
                # stars from replies count as extra favorites
                fav_count = tweet[0] + tweetbonus[tweet[2]]
                # a tweet's points are split evenly among its aspects
                model['scores'][field][val] += int(((fav_val * fav_count) + (rt_val * tweet[1])) / aspect_count)

    for field in model['scores']:
        for val in list(model['scores'][field].keys()):
            aspect = field + ':' + val
            # The minimum post count is effectively ten to avoid over-valuing aspects
            # Note that the real post count can be 0, as for seeds
            base = max(10, post_counts[aspect])
            model['perpost'][aspect] = int(model['scores'][field][val] / base)
            if model['scores'][field][val] < threshold:
                # don't consider aspects with less than some number of likes
                del model['scores'][field][val]
                del model['perpost'][aspect]
                continue
            if field == 'liked' or field == 'reblog':
                # these fields are not used to select things
                del model['perpost'][aspect]
                continue
            if val in ignorelist:
                # ignored things are bad pickers
                del model['perpost'][aspect]

    conn.commit()
    conn.close()

    if justranking:
        # TODO redo this
        sys.exit(0)

    return (ids, model)

def counter_choice(aspects):
    # pick a weighted random aspect, used for searches
    total = sum(aspects.values())
    if total <= 0:
        # no weights to go on, so pick uniformly
        return choice(list(aspects)) if aspects else None
    # randint is inclusive on both ends, so subtract one to stay in range
    pick = randint(0, total - 1)
    for aspect, count in aspects.items():
        pick -= count
        if pick < 0:
            return aspect

def pick_by_score(results, model):
    if not results: return None # in case it's empty
    scoremap = Counter({p['post_url']: 0 for p in results})
    tag_scores = model['scores']['tag']
    liked_scores = model['scores']['liked']

    # get avg for post
    for result in results:
        url = result['post_url']
        if 'score' in result:
            scoremap[url] += result['score']
        for tag in result['tags']:
            scoremap[url] += tag_scores[tag]

            # This is effectively a large penalty to all unknown tags
            if tag in tag_scores:
                scoremap[url] += 10000

        # bonus points for trusted likers
        if 'liked_by' in result:
            for liker in result['liked_by']:
                scoremap[url] += liked_scores[liker]

        scoremap[url] = int(scoremap[url] / (len(result['tags']) + 1))

    candidates = Counter()
    for key, val in scoremap.most_common():
        candidates[key] = val
        if nopost:
            match = None
            for res in results:
                if res['post_url'] == key:
                    match = res
            print(str(val) + '\t' + str(key) + '\t' + match['origin'])
    if nopost:
        print(len(scoremap))
        sys.exit(0)

    picked = counter_choice(candidates)
    for res in results:
        if res['post_url'] == picked:
            res['score'] = scoremap[picked]
            return res
    return None

def flickr_get_tag(tag):
    if 'flickr' not in KEYS: return []
    res = requests.post('https://api.flickr.com/services/rest/', {
        'nojsoncallback': 1,
        'method': 'flickr.photos.search',
        'api_key': KEYS['flickr'],
        'tags': tag,
        'extras': 'owner_name,tags,views,count_faves',
        'license': '1,2,4,5,7,8',
        'format': 'json'}).json()
    # seems auth randomly fails sometimes
    if 'photos' not in res:
        return []
    out = res['photos']['photo']
    out = [o for o in out if int(o['count_faves']) > 50]

    # take only the top portion
    out.sort(key=lambda o: int(o['count_faves']), reverse=True)
    cutoff = max(10, int(len(out) / 4))
    out = out[:cutoff]

    for o in out:
        o['score'] = 5000
        o['post_url'] = 'https://www.flickr.com/{}/{}'.format(o['owner'], o['id'])
        o['tags'] = o['tags'].split(' ')
        o['type'] = 'photo'
        o['blog_name'] = 'flickr:' + o['owner']
        o['origin'] = 'tag:' + tag
        if tag not in o['tags']:
            o['tags'].append(tag)
        o['flickr'] = True
    return out

def tumblr_get_tag(tag):
    if 'tumblr' not in KEYS: return []
    baseurl = 'https://api.tumblr.com/v2/tagged?feature_type=everything&reblog_info=true&notes_info=true&filter=text&tag=' + tag + '&api_key=' + KEYS['tumblr']
    out = requests.get(baseurl).json()['response']
    if out:
        out += requests.get(baseurl + '&before=' + str(out[-1]['timestamp'])).json()['response']
    out = [o for o in out if o['note_count'] > 20]

    # take only the top portion
    out.sort(key=lambda o: o['note_count'], reverse=True)
    cutoff = max(10, int(len(out) / 4))
    out = out[:cutoff]

    for c in out:
        c['score'] = 5000
        if tag not in c['tags']:
            c['tags'].append(tag)

        # some blogs use their name as a tag on everything, stop that
        if c['blog_name'] in c['tags']:
            c['tags'].remove(c['blog_name'])
        # if this is a reblog, the original author is an author too
        if 'reblog' in c:
            c['original_author'] = [r['blog']['name'] for r in c['trail']]
        # save reblog/like info to use in ranking
        if 'notes' in c:
            c['reblog_sources'] = [r['blog_name'] for r in c['notes'] if r['type'] == 'reblog']
            c['liked_by'] = [l['blog_name'] for l in c['notes'] if l['type'] == 'like']
        c['origin'] = 'tag:' + tag
    return out

def flickr_get_author(author):
    if 'flickr' not in KEYS: return []
    parts = author.split(':')
    if len(parts) < 2 or parts[0] != 'flickr': return []

    res = requests.post('https://api.flickr.com/services/rest/', {
        'nojsoncallback': 1,
        'method': 'flickr.photos.search',
        'api_key': KEYS['flickr'],
        'user_id': parts[1],
        'extras': 'owner_name,tags',
        'license': '1,2,4,5,7,8',
        'format': 'json'}).json()
    if 'photos' not in res:
        return []
    out = res['photos']['photo']

    for o in out:
        o['post_url'] = 'https://www.flickr.com/{}/{}'.format(o['owner'], o['id'])
        o['tags'] = o['tags'].split(' ')
        o['type'] = 'photo'
        o['score'] = 5000
        o['note_count'] = 1000 # flickr photos are pretty good
        o['blog_name'] = 'flickr:' + o['owner']
        o['origin'] = o['blog_name']
        o['flickr'] = True
    return out

def tumblr_get_author(author):
    if author.split(':')[0] == 'flickr': return [] # we can't handle this
    if 'tumblr' not in KEYS: return []
    baseurl = 'https://api.tumblr.com/v2/blog/' + author + '/posts/photo?filter=text&reblog_info=true&notes_info=true&api_key=' + KEYS['tumblr']
    candidates = requests.get(baseurl).json()['response']['posts']
    candidates += requests.get(baseurl + '&offset=20').json()['response']['posts']

    for c in candidates:
        c['origin'] = 'author:' + author
        c['score'] = 5000
        if 'reblog' in c:
            c['original_author'] = [r['blog']['name'] for r in c['trail']]
        # likes are considered partial authors worth exploring
        if 'notes' in c:
            reblogs = [r['blog_name'] for r in c['notes'] if r['type'] == 'reblog']
            c['liked_by'] = [l['blog_name'] for l in c['notes'] if l['type'] == 'like']
            c['reblog_sources'] = uniq(c.get('reblog_sources', []) + reblogs)

    return candidates

def flickr_get_pool(poolid):
    if 'flickr' not in KEYS: return []
    res = requests.post('https://api.flickr.com/services/rest/', {
        'nojsoncallback': 1,
        'method': 'flickr.photos.search',
        'api_key': KEYS['flickr'],
        'group_id': poolid,
        'extras': 'owner_name,tags',
        'license': '1,2,4,5,7,8',
        'format': 'json'}).json()

    # seems auth randomly fails sometimes
    if 'photos' not in res:
        return []

    out = res['photos']['photo']

    for o in out:
        o['post_url'] = 'https://www.flickr.com/{}/{}'.format(o['owner'], o['id'])
        o['tags'] = o['tags'].split(' ')
        o['type'] = 'photo'
        o['score'] = 5000
        o['note_count'] = 1000 # flickr photos are pretty good
        o['blog_name'] = 'flickr-pool:' + poolid
        o['origin'] = o['blog_name']
        o['flickr'] = True

    return out

def gather_candidates(aspects):
    candidates = []
    aspects_c = Counter(aspects) # make a copy to mutate
    for ii in range(0, min(10, len(aspects_c))):
        aspect = counter_choice(aspects_c)
        del aspects_c[aspect] # this way we can't pick the same thing twice
        if nopost: print(aspect)
        form, _, val = aspect.partition(':')
        if form == 'tag':
            try:
                candidates += tumblr_get_tag(val)
            except Exception:
                pass # API trouble; just skip this source
            candidates += flickr_get_tag(val)
        elif form == 'author':
            try:
                candidates += tumblr_get_author(val)
            except Exception:
                pass # API trouble; just skip this source
            candidates += flickr_get_author(val)
        elif form == 'flickr-pool':
            candidates += flickr_get_pool(val)
    return candidates

def remove_duplicates(candidates, ids):
    # The source URL is used when the image comes from an external source
    # Hopefully making use of it will help prevent duplicates
    for r in candidates:
        r['orig_url'] = r['post_url']

    # filter to only photo posts
    candidates = [r for r in candidates if r['type'] == 'photo']
    # no duplicates
    candidates = [r for r in candidates if r['post_url'] not in ids]
    candidates = [r for r in candidates if r['orig_url'] not in ids]
    return candidates

def remove_banned(candidates):
    # no banned blogs
    conn = sqlite3.connect('showme.db')
    banned = [t[0] for t in conn.execute("select name from ban").fetchall()]
    conn.close()
    candidates = [r for r in candidates if r['blog_name'] not in banned]
    return candidates

def choose_post(ids, model):
    # pick random search result, avoiding duplicates
    candidates = gather_candidates(model['perpost'])
    candidates = remove_duplicates(candidates, ids)
    candidates = remove_banned(candidates)

    # If we have nothing, break out and try again
    if not candidates:
        return False

    return pick_by_score(candidates, model)

def make_post(source):
    post = {
        'source': source['post_url'],
        'score': source['score'],
        'id': 'null',
        'origin': source['origin']}
    post['aspects'] = [('tag:' + t.lower()) for t in source['tags']] + ['author:' + source['blog_name']]

    if 'flickr' in source:
        # This is very irritating.
        # Flickr supports originals too large for Twitter to accept the upload.
        # They offer other sizes, but for old images sometimes there's not actually an image.
        sizes = requests.post('https://api.flickr.com/services/rest/', {
            'nojsoncallback': 1,
            'method': 'flickr.photos.getSizes',
            'api_key': KEYS['flickr'],
            'photo_id': source['id'],
            'format': 'json'}).json()['sizes']['size']
        size = sizes[-1]
        if int(size['width']) > 2000 or int(size['height']) > 2000:
            size = sizes[-2]
        post['imageurl'] = size['source']
        imageurl = post['imageurl']
        # get pools as possible sources for future posts
        res = requests.post('https://api.flickr.com/services/rest/', {
            'nojsoncallback': 1,
            'method': 'flickr.photos.getAllContexts',
            'api_key': KEYS['flickr'],
            'photo_id': source['id'],
            'format': 'json'}).json()

        if 'pool' in res:
            for group in res['pool']:
                post['aspects'].append('flickr-pool:' + group['id'])

    if 'flickr' not in source:
        # at the moment, this implies Tumblr
        # Should be cleaned up and made generic...
        if 'reblog_sources' in source:
            for rs in source['reblog_sources']:
                post['aspects'].append('reblog:' + rs)
        if 'liked_by' in source:
            for ll in source['liked_by']:
                post['aspects'].append('liked:' + ll)
        if 'original_author' in source:
            for ll in source['original_author']:
                post['aspects'].append('author:' + ll)

        imageurl = source['photos'][0]['original_size']['url']
        post['imageurl'] = imageurl
        # kept for the log; the tweet itself only contains the URL
        post['text'] = source['caption'][:100]
        if len(source['caption']) > 100:
            post['text'] += '...'

    image_fname = imageurl.split('/')[-1]
    fname = path + '/out/' + image_fname
    response = requests.get(imageurl, stream=True)
    with open(fname, 'wb') as out_file:
        shutil.copyfileobj(response.raw, out_file)

    # This is for Tumblr - internationalized post URLs are confusing and get
    # partially treated as tweet text, which attracts spam bots looking for
    # keywords, so trim the URL to its first few path segments
    status = '/'.join(post['source'].split('/')[0:5])
    print(json.dumps(post, ensure_ascii=False, sort_keys=True))
    filedata = None
    with open(fname, 'rb') as imagefile:
        filedata = imagefile.read()

    if nopost: sys.exit(0) # just wanted to see what would have been posted

    params = {"media[]": filedata, "status": status}
    resp = twitter.statuses.update_with_media(**params)

    post['id'] = resp['id_str']

    # initial values
    post['faves'] = 0
    post['rts'] = 0

    conn = sqlite3.connect('showme.db')
    save_source(conn, post)
    save_tweet(conn, post)
    conn.commit()
    conn.close()

def process_commands():
    """Read in direct messages and act on them if necessary."""
    # strategy:
    # - get new dms
    # - save dms into db
    # - get unprocessed dms from db, ordered by time, and apply
    # commands: key (flickr/tumblr), seed (tag/author), ignore (tag)
    # should return seed and ignore list
    messages = twitter.direct_messages()
    conn = sqlite3.connect('showme.db')
    read = [m[0] for m in conn.execute("select id from dm").fetchall()]
    for message in messages:
        # we don't see our own replies here, so only check to see if we're following them
        if not message['sender']['following']: continue
        if message['id_str'] in read: continue # read already

        words = message['text'].split(' ')
        term = ' '.join(words[1:])

        if words[0] == 'key':
            if len(words) != 3:
                twitter.direct_messages.new(text="format is wrong; use: key <service> <key>", user_id=message['sender']['id'])
            else:
                conn.execute("insert or replace into key (service, key) values (?,?)", (words[1], words[2]))
                twitter.direct_messages.new(text="ok", user_id=message['sender']['id'])
        elif words[0] == 'seed':
            conn.execute("insert or replace into seed (name) values (?)", (term, ))
            twitter.direct_messages.new(text="ok, seeded '{}'".format(term), user_id=message['sender']['id'])
        elif words[0] == 'unseed':
            conn.execute("delete from seed where name = ?", (term, ))
            twitter.direct_messages.new(text="ok, unseeded '{}'".format(term), user_id=message['sender']['id'])
        elif words[0] == 'ignore':
            conn.execute("insert or replace into ignore (name) values (?)", (term, ))
            twitter.direct_messages.new(text="ok, ignored '{}'".format(term), user_id=message['sender']['id'])
        elif words[0] == 'unignore':
            conn.execute("delete from ignore where name = ?", (term, ))
            twitter.direct_messages.new(text="ok, unignored '{}'".format(term), user_id=message['sender']['id'])
        elif words[0] == 'ban':
            conn.execute("insert or replace into ban (name) values (?)", (term, ))
            twitter.direct_messages.new(text="ok, banned '{}'".format(term), user_id=message['sender']['id'])
        elif words[0] == 'unban':
            conn.execute("delete from ban where name = ?", (term, ))
            twitter.direct_messages.new(text="ok, unbanned '{}'".format(term), user_id=message['sender']['id'])
        else:
            twitter.direct_messages.new(text="I don't understand. Valid commands: key, seed, ignore, ban", user_id=message['sender']['id'])
        conn.execute("insert or replace into dm (id) values (?)", (message['id_str'], ))
    conn.commit()
    conn.close()

def load_keys():
    conn = sqlite3.connect('showme.db')
    for service, key in conn.execute("select service, key from key").fetchall():
        KEYS[service] = key
    conn.close()
    if not ('flickr' in KEYS or 'tumblr' in KEYS):
        print("No API keys, giving up")
        sys.exit(1)

def process_replies():
    """Read replies and record emoji-star boosts."""
    # replies can use emoji. If it's from a user we follow, they can boost posts this way.
    replies = twitter.statuses.mentions_timeline()
    conn = sqlite3.connect('showme.db')
    read = [m[0] for m in conn.execute("select id from reply").fetchall()]
    for reply in replies:
        if reply['id_str'] in read: continue # already done
        if not reply['user']['following']: continue
        if not reply['in_reply_to_status_id_str']: continue # we're only interested in replies to specific tweets

        bonus = 0
        for cc in reply['text']:
            # this is an emoji star; should probably add other characters
            if cc in '⭐🌟': bonus += 1

        conn.execute("insert or replace into reply (id, tweet, stars) values (?,?,?)",
            (reply['id_str'], reply['in_reply_to_status_id_str'], bonus))
        conn.commit()
        # favorite it to let them know we saw it
        twitter.favorites.create(_id=reply['id_str'])
    conn.close()

def main():
    if doinitdb:
        # this needs to be done once the first time it's run
        # alternate strategy: do automatically if db file doesn't exist
        init_db()
        sys.exit(0)

    process_commands() # commands via dm from the operator
    process_replies() # replies to add extra points
    load_keys() # get API keys so we can fetch posts

    ids, model = load_aspects() # load tag/author/etc data from db

    source = False
    while not source:
        source = choose_post(ids, model) # the main selection part

    make_post(source) # handles twitter post & saving

if __name__ == "__main__":
    main()
--------------------------------------------------------------------------------