├── LICENSE
├── Procfile
├── README.md
├── ignoredSubs.py
├── requirements.txt
├── runtime.txt
└── xpostsearch.py

/LICENSE:
--------------------------------------------------------------------------------
1 | The MIT License (MIT)
2 | 
3 | Copyright (c) 2015 Jonathan (papernotes)
4 | 
5 | Permission is hereby granted, free of charge, to any person obtaining a copy
6 | of this software and associated documentation files (the "Software"), to deal
7 | in the Software without restriction, including without limitation the rights
8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 | 
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 | 
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 | 
23 | 
--------------------------------------------------------------------------------
/Procfile:
--------------------------------------------------------------------------------
1 | # Procfile
2 | worker: python xpostsearch.py
3 | 
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Reddit Original Post Searcher Bot
2 | [OriginalPostSearcher](https://www.reddit.com/user/OriginalPostSearcher/)
3 | A Reddit bot that aims to comment with a link to the original submission of an x-post.
4 | 
5 | Made as a practice bot in Python for [Reddit](http://www.reddit.com/) using [PRAW](https://praw.readthedocs.org/en/v3.1.0/). I wanted to build something fun and learn a bit more about Python and databases.
6 | Thanks to [stackoverflow](http://stackoverflow.com/), [/r/learnpython](http://www.reddit.com/r/learnpython), and [/r/python](http://www.reddit.com/r/python).
7 | 
8 | ~About 7k Karma in 1 Month :D
9 | 
10 | The format for a response is:
11 | ```
12 | X-Post referenced from /r/subreddit by /u/user
13 | *submission title with link*
14 | ```
15 | # FAQ
16 | **What's the purpose of this bot?**
17 | The bot is meant to be a convenience and sourcing tool for the x-posts that occur on Reddit every day. With the bot's comments, the post referenced by an x-post and its author are credited. It also saves a few clicks, especially for mobile users.
18 | 
19 | **Isn't this useless if there's the "other discussions" tab?**
20 | I actually did not realize there was an "other discussions" tab until it was pointed out while this bot was commenting. At that point, I thought the tab made this bot redundant. However, the bot received a lot of positive reception (see below), and it still provides a quick link for mobile users. With that, I decided to continue developing the bot, saving Redditors a few clicks and providing proper credit.
21 | 
22 | **How does it work?**
23 | Generally speaking, the bot looks for any submission that might be an x-post, tries to find the original submission in the subreddit mentioned in the title (e.g. *Title of post! (xpost from /r/specific_subreddit)*), and then comments on the x-post with the original submission's title, its author, its subreddit, and a link to it. The bot is hosted on Heroku.
24 | 
25 | **How does the bot search for the original post?**
26 | Funnily enough, I ended up using the "other discussions" tab after all, along with several other tiers of searching through PRAW.
27 | First, the bot checks the "other discussions" tab for submissions that share the x-post's link; if that fails, it checks the x-post author's previous submissions. If that fails as well, the bot looks up the subreddit referenced in the x-post's title and searches its *New* and *Hot* listings. By that point, the bot has hopefully found the original post and can comment (see the sketch below).
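A minimal sketch of these search tiers, using the same PRAW 3 calls that `xpostsearch.py` uses (the `find_original` helper and the `reddit` instance here are illustrative, not part of the bot):
```
import praw

# Sketch only -- the real logic lives in SearchBot.search_for_post.
reddit = praw.Reddit(user_agent="OriginalPostSearcher example")

def find_original(xpost, original_sub_name, limit=150):
    """Return the first submission that looks like the original, or None."""
    def is_original(candidate):
        return (candidate.url == xpost.url and
                candidate.subreddit.display_name.lower() == original_sub_name and
                not candidate.over_18 and
                xpost.permalink not in candidate.permalink)

    listings = [
        xpost.get_duplicates(limit=limit),                                  # "other discussions" tab
        reddit.get_redditor(xpost.author.name).get_submitted(limit=limit),  # author's previous posts
        reddit.get_subreddit(original_sub_name).get_new(limit=limit),       # referenced subreddit, New
        reddit.get_subreddit(original_sub_name).get_hot(limit=limit),       # referenced subreddit, Hot
    ]
    for listing in listings:
        for candidate in listing:
            if is_original(candidate):
                return candidate
    return None
```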
28 | 
29 | **What if the x-post author provided the source in the comments already?**
30 | The bot has a check just for that! It only goes through the search process if it doesn't find a possible source in the comments, and it checks the comments once more right before posting, in case the author has added the source in the meantime.
31 | 
32 | **What about brigading for subreddits that require "np" links?**
33 | There is a list of subreddits for which the bot provides "np" links - [nopart.py](https://github.com/papernotes/Reddit-OriginalPostSearcher/blob/master/nopart.py) (as of version 2.0.0, the links in the bot's comments point to np.reddit.com).
34 | There is also a list of subreddits that the bot will not comment in - [ignoredSubs.py](https://github.com/papernotes/Reddit-OriginalPostSearcher/blob/master/ignoredSubs.py)
35 | 
36 | **Can I provide a suggestion?**
37 | Of course! PM [/u/OriginalPostSearcher](https://www.reddit.com/message/compose/?to=OriginalPostSearcher), and I'll be happy to look into it.
38 | 
39 | # Reception
40 | ### Positive
41 | *This is my new favourite bot!* - [Kvothealar](https://www.reddit.com/r/shittyrobots/comments/3ful9e/the_tiniest_firefighter_xpost_rgifs/cts9azi?context=3)
42 | *I like this bot. Whoever came up with it did good.* - [CaseH1984](https://www.reddit.com/r/retrogaming/comments/3fsrds/xpost_from_rgaming_guy_3d_prints_a_tiny_nes_case/ctrlm2c?context=3)
43 | *Keep up the good work* - [Xfactor5492](https://www.reddit.com/r/CrappyDesign/comments/3ebgyz/girlfriend_wasnt_sure_why_i_laughed_at_her_water/ctdcc4j?context=3)
44 | *THE SECOND COMING OF REPOST STATISTICS?! THE PRODIGAL SON HAS RETURNED!* - [StaticDraco](https://www.reddit.com/r/funny/comments/3fn5ds/cop_frees_baby_skunk_from_yogurt_container_xpost/ctql2ey?context=3)
45 | *At least the bot gave me credit for my photo :)* - [FoodandFrenchies](https://www.reddit.com/r/burgers/comments/3fo5jb/bacon_avocado_bison_cheeseburger_on_a_homemade/ctqege2?context=3)
46 | *I fucking love the bots in this sub. There's a sentence I never thought I'd see myself type.* - [Takatalvi_Ignatio](https://www.reddit.com/r/deathgrips/comments/3flirt/ive_been_playing_waaaay_too_much_poe_for_this_to/ctq3pru?context=3)
47 | *Thank you bot. 
What would we do without you.* - [LordZikarno](https://www.reddit.com/r/ElderScrolls/comments/3fmo4f/interesting_reference_ive_found_in_skyrim_xpost/ctpz6a0?context=3) 48 | *I'd recommend not banning that bot, he's quite useful.* - [Chastlily](https://www.reddit.com/r/fireemblem/comments/3fimvt/hey_guys_i_recently_made_a_fire_emblem_radiant/ctoy96s?context=3) 49 | *This bot is awesome* - [Ninja_Fox_](https://www.reddit.com/r/linuxmasterrace/comments/3flqtj/wipes_windows_in_seconds_xpost_from_rfunny/ctpsbns?context=3) 50 | *I appreciate that.* - [FaceReaityBot](https://www.reddit.com/r/wethebest/comments/3flnm9/go_buy_your_whole_family_something_nice_xpost/ctpqhag?context=3) 51 | *thank mr bot for good xposts and lack of calcium* - [Poyoarya](https://www.reddit.com/r/shittyreactiongifs/comments/3fkgwt/mfw_i_realize_i_forgot_my_skeleton_at_home_xpost/ctpgl58?context=3) 52 | *Thank you for tagging me botfriend <3* - [throwcap](https://www.reddit.com/r/shockwaveporn/comments/3fj9wd/missile_hitting_its_target_xpost_from_rvideos_10s/ctp3xbi?context=3) 53 | *Ooh, that's a cool bot.* - [Non-Alignment](https://www.reddit.com/r/fireemblem/comments/3fimvt/hey_guys_i_recently_made_a_fire_emblem_radiant/ctoxpqs?context=3) 54 | *Wow, what a useful bot I never knew existed :O* - [mnmnmnmn1](https://www.reddit.com/r/TheBluePill/comments/3fc9qs/gaylubeoil_gets_into_a_dickwaving_contest_with/ctnpdvd?context=3) 55 | *Ahh you beat me to it. Excellent bot, this.* - [critically_damped](https://www.reddit.com/r/LaserCats/comments/3f91ql/allweather_lasercat_xpost_from_rcatloaf/ctmfwrr?context=3) 56 | *I think this bot addresses every possible problem with reposting.* - [winter_mutant](https://www.reddit.com/r/ContagiousLaughter/comments/3f1mn6/okay_google_whats_a_blumpkin_xpost_from/ctky10v?context=3) 57 | *Well that is a nifty bot.* - [davidverner](https://www.reddit.com/r/AmIFreeToGo/comments/3f1r2u/crosspost_from_rroadcam_driver_smashes_into_cars/ctkqbb1?context=3) 58 | *Hey, I kind of like this. Kudos bot.* - [NewJerseyFreakshow](https://www.reddit.com/r/TopMindsOfReddit/comments/3eq9a4/top_mind_mod_of_coontown_ueugenenix_gets_demodded/cthd8qy?context=3) 59 | *What a lovely bot.* - [kevik72](https://www.reddit.com/r/funny/comments/3efuvp/trick_friends_into_thinking_you_have_your_shit/cteiy59?context=3) 60 | *See, the bot knows how to crosspost. Why can't we all?* - [Duke_Wintermaul](https://www.reddit.com/r/Nerf/comments/3dsi5g/finally_xpost_from_rgifs/ct8t6bx?context=3) 61 | *Woah, that's super helpful. I've never seen the x-post bot work like this.* - [jimmycthatsme](https://www.reddit.com/r/woodworking/comments/3e7vja/my_buddy_alan_is_a_woodworker_was_told_his_work/ctcb83q?context=3) 62 | 63 | ### Not so positive 64 | *I hate this bot.* - [MightyDebo](https://www.reddit.com/r/ElderScrolls/comments/3fmo4f/interesting_reference_ive_found_in_skyrim_xpost/ctqigu7?context=3) 65 | *Please die mr bot* - [Lurkerphile](https://www.reddit.com/r/skyrim/comments/3fdt5d/i_guess_nazeem_wasnt_as_important_as_he_thought/ctnwrh5?context=3) 66 | *Is this bot really necessary? 
Can't people just click "other discussions" to see this?* - [send-me-to-hell](https://www.reddit.com/r/linux/comments/3f2cix/continual_testing_of_mainline_linux_kernels_xpost/ctkozdh?context=3) 67 | *Yeah, we know how to use "other discussions" tab.* - [nakilon](https://www.reddit.com/r/MyPeopleNeedMe/comments/3er1yu/battlefield_4_impressive_helicopter_physics_xpost/cthknhm?context=3) 68 | 69 | And many "you have been banned from posting to /r/______" 70 | 71 | 72 | **Favorite Thread** - [Googling Recursion](https://www.reddit.com/r/nevertellmetheodds/comments/3f8kt3/xpost_rnevertellmetheodds_this_truck_drifting_on/ctmc72u?context=3) 73 | 74 | # Milestones 75 | - 30k Karma (2/12/2016) 76 | - 40k Karma (5/12/2016) 77 | 78 | # TODO 79 | - Do something to deal with unwanted comments (Completed 7/15/2015) 80 | - Continue to update/optimize bot 81 | - Continue to update the list of ignored and no-participation subreddits 82 | - Code refactoring could probably be done (It's quite messy at the moment) (OOP Design Completed 10/15/2015) 83 | 84 | # Updates 85 | ``` 86 | 1.0.1 (7/15/2015) - Fixed commenting bug that involved the wrong links and added ability to delete unwanted comments 87 | 1.0.2 (7/16/2015) - Fixed a string checking bug for utf-8 and added logging/print statements 88 | 1.0.3 Updated the order of finding the original post, check for content first 89 | 1.0.4 (7/17/2015) - Changed the return value of one of the variables, added more logging, updated ignoredSubs list/function names 90 | 1.0.5 (7/18/2015) - Added source check, renamed old user_agent from old files, and updated ignoredSubs list 91 | 1.0.6 Updated source checking, updated user_agent, removed searchedPosts.txt, and updated ignoredSubs list 92 | 1.0.7 (7/19/2015) - Added check if getting subreddit failed, changed comment style, and updated ignoredSubs list 93 | 1.0.8 (7/20/2015) - Changed commenting style/words, updated ignoredSubs list 94 | 1.0.9 - Code cleanup, updated ignoredSubs list 95 | 1.1.0 (7/21/2015) - Added new function to find the original post faster (doesn't cover self-posts), updated ignoredSubs list 96 | 1.1.1 - Cleaned up code, updated ignoredSubs list 97 | 1.1.2 (7/22/2015) - Added original poster's username for commenting 98 | 1.1.3 (7/25/2015) - Fixed checking original post bug that involved "in" phrase, updated commenting to emphasize convenience for 99 | mobile users, and updated ignoredSubs list 100 | 1.1.4 (7/29/2015) - Added ability to create no participation links for certain subreddits 101 | 1.1.5 (7/30/2015) - Added ability to look through poster's previous posts to save time searching 102 | 1.1.6 (8/05/2015) - Changed commenting to include link to Github repo 103 | 1.1.7 (8/13/2015) - Added ability to search specifically for xposts to save time going through non-xposts 104 | 1.1.8 (8/30/2015) - Added check to make sure content is as work safe as possible 105 | 1.1.9 (9/18/2015) - Added secondary check to make sure no source comments are found after setting up comment 106 | 1.2.0 (10/15/2015) - Major code refactoring. Bot is now object-oriented. Also slightly faster than previous versions. 107 | 1.2.1 (11/12/2015) - Added check to see make sure link isn't from the same subreddit. 
Small fix for finding original subreddit name 108 | 1.2.2 (3/4/2016) - Fixed bug in not deleting negative comments regularly 109 | 1.2.3 (6/4/2016) - Added check for when the original submission's title references the xpost's title 110 | 2.0.0 (10/28/2016) - Adopted SemVer and change links to other subreddits to be np links 111 | ``` 112 | -------------------------------------------------------------------------------- /ignoredSubs.py: -------------------------------------------------------------------------------- 1 | # A list of subreddits to not bother searching in/bothering/banned from 2 | 3 | ignore_list = ["anime", "asianamerican", "askhistorians", "askscience", "aww", "chicagosuburbs", 4 | "cosplay", "cumberbitches", "d3gf", "deer", "depression", "depthhub", 5 | "drinkingdollars", "forwardsfromgrandma", "geckos", "giraffes", 6 | "grindsmygears", "misc", "mixedbreeds", "news", "newtotf2", "omaha", "petstacking", 7 | "pigs", "politicaldiscussion", "politics", "programmingcirclejerk", "raerthdev", 8 | "rants", "runningcirclejerk", "salvia", "science", "seiko", "shoplifting", 9 | "sketches","suicidewatch", "talesfromtechsupport","torrent","torrents","trackers", 10 | "tr4shbros", "unitedkingdom", "askreddit", "benfrick", "futurology", 11 | "graphic_design", "historicalwhatif", "lolgrindr", "malifaux", "nfl", 12 | "toonami", "ps2ceres","duelingcorner", "gadgets", "personalfinance", "woahdude", 13 | "wheredidthesodago", "gentlemanboners", "cats", "business", "holdmybeer", "beer", 14 | "pcgaming", "motorcycles", "xboxone", "mma", "productivity", "parenting", "horror" 15 | "enhancement", "biology", "apphookup", "sanfrancisco", "singularity", "transhuman", 16 | "trucks", "evangelion", "listentous", "multihub", "woahpoon", "careerguidance", "ebola", 17 | "keming", "aliens", "horses", "internationalpolitics", "libraries", "amazon", "harley", 18 | "graphicnovels", "dualsport", "milwaukee", "blackops3", "soundsvintage", "fullmoviesonline", 19 | "calamariraceteam", "listentoobscure", "blackberry", "supermoto", "european", "diesel", 20 | "batesmotel", "modeltrains", "bikebuilders", "dailyherald", "shittydiy", "askmeanything", 21 | "troutfishing", "hackedgadgets", "musicguides", "fayetteville", "englishlearning", 22 | "mentalfloss", "deathrowdiesel", "listentocurated", "southbend", "subredditreports", 23 | "gasmonkey", "bitcoinbeg", "listentousagain", "listentonew", "420", "gwar", "weaponsystems", 24 | "rants", "tforcenetwork", "summit", "breckenridge", "hogfornoobs", "artreddits", "tcgcollecting", 25 | "battletops", "sliders", "caddenmorandiary", "radd_it", "decaf", "woahgifs", "radditplaylists", 26 | "subofrome", "rainbow", "botwatchman", "fuckharley", "etfs", "neogaf", "svara", "msawareness", 27 | "metrosexual", "replygore", "usfreepress", "critterart", "3dchalk", "futurologymoderators", 28 | "truckmemes", "cssstyle", "happynews", "biochemistry2", "parables", "osiris", "truedisability", 29 | "futurologyappeals", "radditfaq", "weedtrees", "trglodyte", "happyworldnews", "stevediary", 30 | "blacklistpics", "serendipity", "pcmasterrace", "minecraft", "photoshopbattles", "interestingasfuck", 31 | "peoplebeingjerks", "animalsbeingjerks", "gaming", "awwnime", "k_on", "blackfellas", 32 | "games", "dvdcollection", "animefigures", "modernmagic", "australia", "calligraphy", 33 | "blackladies", "firefighting", "womenwithwatches", "kotakuinaction", "mechanicalkeyboards", 34 | "android", "wtf", "cringe", "dbz", "skeptic", "knives", "pics", "pic", "unexpected", "adviceanimals", 35 | "twoxchromosomes", 
"lewronggeneration", "nascar", "badphilosophy", "gifs", 36 | "philosophy", "censorship", "conservative", "fatlogic", "historyofideas", "earthprobes", 37 | "fantasyfootball", "redditarmie", "mistyfront", "reclaimedbynature", "blackpeopletwitter", "gamephysics", 38 | "programmerhumor", "technology", "worldnews", "youtubehaiku", "movies", "shittyprogramming", "sports", 39 | "music", "books", "history", "food", "television", "art", "diy", "warhammer", "flicks", "moescape", "asatru", 40 | "cringepics", "getmotivated", "conspiratard", "mindcrack", "soccer", "troy", "femradebates", "pussypassdenied", 41 | "lotr", "delaware", "foreveralone", "trollxchromosomes", "netsec", "tiara", "horror", "funny", "gif", 42 | "nationalpark", "celebs", "prettygirls", "surpriseddogs", "bellathorne", "delawarepolitics", 43 | "gunsarecool", "stardustcrusaders", "creepypms", "bears", "space", "denmark", "ireland", "damnthatsinteresting", 44 | "thriftstorehauls", "metalgearsolidv_pc", "etimusic", "rage", "osha", "zettairyouiki", "gameofthrones", "sinotibetan", 45 | "hopheadsde", "thailand", "romania", "photoshopfail", "cynicalbrit", "serverporn", "whatcouldgowrong", 46 | "kanmusu", "hardware", "sociology", "conspiracy", "perfecttiming", "largeimages", "republican", "dnb", 47 | "eatcheapandhealthy", "food", "foodporn", "agarioball", "codzombies", "listentothis", "europe", "bouldering", 48 | "euromusic", "barca", "destinythegame", "polandball", "stateball", "planetball", "polandballart", "panda", 49 | "fpvracing", "denvernuggets", "freethought", "aquariums", "vive", "specart", "animation", "comics", 50 | "steroids", "arkansas", "slammedtrucks", "oc_cars", "happygifs", "nba", "pizza", "abcdesis", "linguistics", 51 | "photography", "syriancivilwar", "progolf", "altbriggs", "headpats", "beautifulfemales", "bakchodi", 52 | "wec", "blancpain", "consoleproletariat", "blackandgold", "imaginarymonsters", "imaginarylandscapes", 53 | "imaginarycharacters", "adorableart", "bengals", "soma", "frontpage", "whatisthis", "lawschool", 54 | "shitamericanssay", "cbc_radio", "colorado", "redditdads", "apotheoun", "hyomin", "newzealand", "publicfreakout", 55 | "beachdogs", "catholicism", "chemistry", "montreal", "longbeards", "fishing", "slowcooking", "everythingscience", 56 | "retrobattlestations", "electricians", "crossdressing", "starwars", "iamverysmart", "cowboys", "userexperience", 57 | "japanesewatches", "opiates", "2007scape", "korean", "gamedeals", "blind", "blackpeoplegifs", "animalsbeingbros", 58 | "analogygifs", "camping", "cityporn", "bestofreports", "memes", "internetisbeautiful", "highqualitygifs", 59 | "humansbeingbros", "rage", "reactiongifs", "hifw", "nottheonion", "reversegif", "thingsthatblowup", "youdontsurf", 60 | "wastedgifs", "genderqueer", "imagesofnewyork", "imagesofcanada", "imagesofusa", "imagesofalabama", "miiverseinaction", 61 | "tattoos", "seahawks", "conspiracymemes", "justneckbeardthings", "ar15", "trance", "5555555", "commentsgetdrawn", 62 | "unitedstatesofamerica", "workbenches", "philippines", "tng", "leagueofmemes", "atari", "kiddet", "hillaryclinton", 63 | "boston", "coffee", "weekendgunnit", "cardinals", "torontobluejays", "military", "goldreplies", "shinjimin", 64 | "losangeleskings", "writermotivation", "kerbalplanes", "sto", "latino", "southflorida", "mylittleandysonic1", "zika", 65 | "babyrooms", "nononono", "overlanding", "ak47", "callofduty", "bbcradiodrama", "the_crew", "shittydarksouls", "atheism", 66 | "badtattoos", "army", "quityourbullshit", "mycology", "brokengifs", "catslaps", "albany", 
"canadaguns", "makeupaddiction", 67 | "freeeuropenews", "naruto", "magnificentmemes", "onepunchman", "emcomm", "belgium", "sysadmin", "codeperformance", 68 | "watches", "stpetersburgfl", "imagesofwisconsin", "hockeycards", "imagesofengland", "koreangirlstop", "infinitewarfare", 69 | "indianfood", "mariners", "britpics", "redditgetsdrawn", "fantasy", "gascar", "labsafety", "rpg", "taliyahmains", 70 | "woodworking", "iosgaming", "doctorwhumour", "imagesofcalifornia", "imagesofflorida", "imagesofmaine", "imagesofmichigan", 71 | "imagesoforegon", "imagesofthe1910s", "imagesofvirginia", "imagesofvermont", "imagesofthe2010s", "texas", "misanthropy", 72 | "hipaa", "canadapolitics", "unixporn", "pureasoiaf", "joerogaine", "subredditsimulator", "vintageelectronics", "neutralnews", 73 | "california", "fastfood", "mcdonalds", "obama", "catholicpolitics", "funfacts", "liberal", "enoughsandersspam", "irelandtelevision", 74 | "sailormoon", "functionalprint", "humor", "music", "asoiaf", "mame", "paradoxplaza", "straya", "berserk", "printsf", "lunathedog", 75 | "mariomaker", "mariomakerlevels", "learnjapanese", "iceland", "spotted", "tools", "whatisthisthing", "battlebots", "caferacers", 76 | "mkd", "onionhate", "elitedangerous", "thelastofus", "strangerthings", "darkfuturology", "sacramento", "thathappened", 77 | "asmr", "codmodernwarfare", "laclippers", "bayarea", "moonmoon"] 78 | -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | praw >= 3.3.0 2 | psycopg2 >= 2.6.1 3 | SQLAlchemy >= 1.0.6 -------------------------------------------------------------------------------- /runtime.txt: -------------------------------------------------------------------------------- 1 | python-2.7.9 2 | -------------------------------------------------------------------------------- /xpostsearch.py: -------------------------------------------------------------------------------- 1 | """ OriginalPostSearcher bot """ 2 | import herokuDB 3 | import ignoredSubs 4 | import praw 5 | import time 6 | from sqlalchemy import create_engine 7 | from sqlalchemy import text 8 | 9 | REDDIT_CLIENT = praw.Reddit(user_agent="OriginalPostSearcher 2.0.0") 10 | REDDIT_CLIENT.login(disable_warning=True) 11 | 12 | # a list of words that might be an "xpost" 13 | 14 | X_POST_DICTIONARY = set() 15 | X_POST_DICTIONARY.update(["xpost", "x-post", "crosspost","cross-post", 16 | "xposted", "crossposted", "x-posted"]) 17 | 18 | 19 | # list of words to check for so we don't post if source is already there 20 | ORIGINAL_COMMENTS = set() 21 | ORIGINAL_COMMENTS.update(['source', 'original', 'original post', 'sauce', 'link', 22 | 'x-post', 'xpost', 'x-post', 'crosspost', 'cross post', 23 | 'cross-post', 'referenced', 'credit', 'credited', 'other', 24 | 'post']) 25 | 26 | # create the ENGINE for the database 27 | ENGINE = create_engine(herokuDB.url) 28 | 29 | # don't bother these subs 30 | IGNORED_SUBS = set() 31 | IGNORED_SUBS.update(ignoredSubs.ignore_list) 32 | 33 | 34 | class SearchBot(object): 35 | def __init__(self): 36 | self.xpost_dict = X_POST_DICTIONARY 37 | self.ignored_subs = IGNORED_SUBS 38 | 39 | # cache for database 40 | self.cache = set() 41 | self.temp_cache = set() 42 | self.xpost_submissions = set() 43 | 44 | # fields for the xposted submission 45 | self.xpost_url = None # link shared in the submission 46 | self.xpost_permalink = None 47 | self.xpost_title = None 48 | self.xpost_author = None 49 | 
self.xpost_sub = None # subreddit object of xpost 50 | self.xpost_sub_title = None # the string of the subreddit 51 | 52 | # fields for the original subreddit 53 | self.original_sub = None # subreddit object 54 | self.original_sub_title = None # title of the subreddit 55 | self.original_title = None 56 | self.original_permalink = None 57 | self.original_author = None 58 | 59 | 60 | # -------- Main Bot Methods --------- # 61 | 62 | 63 | def create_comment(self, submission): 64 | print "Making comment\n" 65 | if not self.original_author: 66 | self.original_author = "a [deleted] user" 67 | else: 68 | self.original_author = "/u/" + str(self.original_author) 69 | 70 | # make links np links 71 | original_link_list = self.original_link.split("https://www.") 72 | self.original_link = "http://np." + original_link_list[1] 73 | 74 | # create the string to comment with 75 | comment_string = ("X-Post referenced from [/r/" + 76 | self.original_sub_title + "](http://np.reddit.com/r/" + 77 | self.original_sub_title + ") by " + self.original_author + 78 | " \n[" + self.original_title.encode('utf-8') + 79 | "](" + self.original_link.encode('utf-8') + 80 | ")\n***** \n \n^^I ^^am ^^a ^^bot. ^^I" + 81 | " ^^delete ^^my ^^negative ^^comments. ^^[Contact]" + 82 | "(https://www.reddit.com/message/" + 83 | "compose/?to=OriginalPostSearcher)" + 84 | " ^^| ^^[Code](https://github.com/" + 85 | "papernotes/Reddit-OriginalPostSearcher)" + 86 | " ^^| ^^[FAQ](https://github.com/papernotes/" + 87 | "Reddit-OriginalPostSearcher#faq)") 88 | print comment_string 89 | 90 | # double check 91 | if self.has_source(submission): 92 | print "Source found" 93 | else: 94 | submission.add_comment(comment_string) 95 | print "\nCommented!" 96 | 97 | 98 | def delete_negative(self): 99 | print "Checking previous comments for deletion" 100 | user = REDDIT_CLIENT.get_redditor('OriginalPostSearcher') 101 | submitted = user.get_comments(limit=200) 102 | for item in submitted: 103 | if int(item.score) < -1: 104 | print("\nDeleted negative comment\n " + str(item)) 105 | item.delete() 106 | 107 | 108 | def get_original_sub(self): 109 | try: 110 | self.xpost_title = self.xpost_title.split() 111 | except: 112 | print "Failed split" 113 | pass 114 | self.original_sub_title = None 115 | return 116 | try: 117 | for word in self.xpost_title: 118 | if '/r/' in word: 119 | # split from /r/ 120 | word = word.split('/r/')[1] 121 | word = word.split(')')[0] # try for parentheses first 122 | word = word.split(']')[0] # try for brackets 123 | print("/r/ word = " + word.encode('utf-8')) 124 | self.original_sub_title = word 125 | break 126 | # split for "r/" only format 127 | elif 'r/' in word: 128 | word = word.split('r/')[1] 129 | word = word.split(')')[0] # try for parentheses first 130 | word = word.split(']')[0] # try for brackets 131 | print("r/ word = " + word.encode('utf-8')) 132 | self.original_sub_title = word 133 | break 134 | else: 135 | self.original_sub_title = None 136 | except: 137 | print("Could not get original subreddit") 138 | self.original_sub_title = None 139 | 140 | 141 | def reset_fields(self): 142 | self.original_sub_title = None 143 | self.original_found = False 144 | 145 | 146 | def search_for_post(self, submission, lim): 147 | duplicates = submission.get_duplicates(limit=lim) 148 | 149 | print "Searching Dupes" 150 | for submission in duplicates: 151 | if self.is_original(submission): 152 | self.original_permalink = submission.permalink 153 | return True 154 | 155 | poster_name = self.xpost_author.encode('utf-8') 156 | poster = 
REDDIT_CLIENT.get_redditor(poster_name) 157 | user_submissions = poster.get_submitted(limit=lim) 158 | 159 | print "Searching User" 160 | for submission in user_submissions: 161 | if self.is_original(submission): 162 | self.original_permalink = submission.permalink 163 | return True 164 | 165 | # in case the subreddit doesn't exist 166 | try: 167 | self.original_sub = REDDIT_CLIENT.get_subreddit(self.original_sub_title) 168 | 169 | print "Searching New" 170 | for submission in self.original_sub.get_new(limit=lim): 171 | if self.is_original(submission): 172 | self.original_permalink = submission.permalink 173 | return True 174 | 175 | print "Searching Hot" 176 | for submission in self.original_sub.get_hot(limit=lim): 177 | if self.is_original(submission): 178 | self.original_permalink = submission.permalink 179 | return True 180 | except: 181 | pass 182 | return False 183 | 184 | print "--------------Failed all searches" 185 | return False 186 | 187 | 188 | def set_original_fields(self, submission): 189 | try: 190 | self.original_title = submission.title.encode('utf-8') 191 | self.original_link = submission.permalink 192 | self.original_author = submission.author 193 | self.original_found = True 194 | except: 195 | pass 196 | 197 | 198 | def set_xpost_fields(self, submission): 199 | try: 200 | self.xpost_url = submission.url.encode('utf-8') 201 | self.xpost_permalink = submission.permalink 202 | self.xpost_author = submission.author.name 203 | self.xpost_title = submission.title.lower().encode('utf-8') 204 | self.xpost_sub = submission.subreddit 205 | self.xpost_sub_title = str(submission.subreddit.display_name.lower()) 206 | except: 207 | pass 208 | 209 | 210 | def set_xpost_submissions(self, search_terms, client): 211 | """ 212 | Searches for the most recent xposts and sets it 213 | """ 214 | print "Finding xposts" 215 | for entry in search_terms: 216 | for title in client.search(entry, sort="new"): 217 | self.xpost_submissions.add(title) 218 | 219 | 220 | def get_xpost_title(self, title): 221 | # format TITLE(xpost) 222 | if (len(title) == title.find(')') + 1): 223 | return title.split('(')[0] 224 | # format TITLE[xpost] 225 | elif (len(title) == title.find(']') + 1): 226 | return title.split('[')[0] 227 | # format (xpost)TITLE 228 | elif (title.find('(') == 0): 229 | return title.split(')')[1] 230 | # format [xpost]TITLE 231 | elif (title.find('[') == 0): 232 | return title.split('[')[1] 233 | # weird format, return false 234 | else: 235 | print ("Couldn't get title correctly") 236 | return None 237 | 238 | 239 | # -------- Boolean Methods --------- # 240 | 241 | 242 | def has_source(self, submission): 243 | for comment in submission.comments: 244 | try: 245 | if (any(string in str(comment.body).lower() 246 | for string in ORIGINAL_COMMENTS)): 247 | print("Source in comments found: ") 248 | print(" " + str(comment.body) + "\n") 249 | return True 250 | except: 251 | pass 252 | 253 | print "No 'source' comments found" 254 | return False 255 | 256 | 257 | def is_ignored_or_nsfw(self, submission): 258 | return not (submission.subreddit.display_name.lower() in self.ignored_subs or 259 | submission.over_18 is True) 260 | 261 | 262 | def is_original(self, submission): 263 | try: 264 | if (self.xpost_url == str(submission.url).encode('utf-8') and 265 | submission.subreddit.display_name.lower().encode('utf-8') == self.original_sub_title and 266 | submission.over_18 is False and 267 | not self.xpost_permalink in submission.permalink): 268 | self.set_original_fields(submission) 269 | return True 
270 | return False 271 | except: 272 | pass 273 | return False 274 | 275 | 276 | def is_same_ref(self): 277 | """ 278 | If the original submission's title is an x-post referencing 279 | the xpost sub, then return True 280 | """ 281 | if self.xpost_sub_title in self.original_title: 282 | print "True Ref" 283 | return True 284 | print "False Ref" 285 | return False 286 | 287 | 288 | def is_xpost(self, submission): 289 | submission_title = submission.title.lower() 290 | try: 291 | submission_title = submission_title.encode('utf-8') 292 | except: 293 | pass 294 | return False 295 | return any(string in submission_title for string in self.xpost_dict) 296 | 297 | 298 | # -------- Database --------- # 299 | 300 | 301 | def clear_database(self): 302 | num_rows = ENGINE.execute("select * from searched_posts") 303 | 304 | if num_rows.rowcount > 1000: 305 | ENGINE.execute("delete from searched_posts") 306 | print "Cleared database" 307 | if len(self.cache) > 1000: 308 | self.cache = self.cache[int(len(self.cache))/2:] 309 | print "Halved cache" 310 | 311 | 312 | def id_added(self, sub_id): 313 | id_added_text = text("select * from searched_posts where post_id = :postID") 314 | return ENGINE.execute(id_added_text, postID=sub_id).rowcount != 0 315 | 316 | 317 | def setup_database_cache(self): 318 | result = ENGINE.execute("select * from searched_posts") 319 | 320 | for row in result: 321 | self.temp_cache.add(str(row[0])) 322 | 323 | for value in self.temp_cache: 324 | if value not in self.cache: 325 | self.cache.add(str(value)) 326 | 327 | 328 | def write_to_file(self, sub_id): 329 | """ 330 | Saves the submission we just searched 331 | """ 332 | if not self.id_added(sub_id): 333 | temp_text = text('insert into searched_posts (post_id) values(:postID)') 334 | ENGINE.execute(temp_text, postID=sub_id) 335 | 336 | 337 | 338 | if __name__ == '__main__': 339 | bot = SearchBot() 340 | print "Created bot" 341 | 342 | while True: 343 | bot.set_xpost_submissions(X_POST_DICTIONARY, REDDIT_CLIENT) 344 | bot.setup_database_cache() 345 | 346 | for submission in bot.xpost_submissions: 347 | # NSFW content or ignored subreddit 348 | if not bot.is_ignored_or_nsfw(submission) and submission.id not in bot.cache: 349 | bot.write_to_file(submission.id) 350 | bot.reset_fields() 351 | continue 352 | 353 | if bot.is_xpost(submission) and submission.id not in bot.cache: 354 | 355 | bot.set_xpost_fields(submission) 356 | 357 | try: 358 | if "reddit" in bot.xpost_url.encode('utf-8'): 359 | print "Post links to Reddit" 360 | bot.write_to_file(submission.id) 361 | bot.reset_fields() 362 | continue 363 | except: 364 | bot.write_to_file(submission.id) 365 | bot.reset_fields() 366 | continue 367 | 368 | print("\nXPost found!") 369 | print("subreddit = " + bot.xpost_sub_title) 370 | print("post title = " + bot.xpost_title) 371 | print("xpost_url = " + bot.xpost_url) 372 | print("xpost_permalink = " + bot.xpost_permalink.encode('utf-8')) 373 | 374 | bot.write_to_file(submission.id) 375 | bot.get_original_sub() 376 | 377 | if (bot.original_sub_title == None or 378 | bot.original_sub_title == bot.xpost_sub.display_name.lower().encode('utf-8')): 379 | print "Failed original subreddit or same subreddit" 380 | bot.reset_fields() 381 | else: 382 | if not bot.has_source(submission) and bot.search_for_post(submission, 150) and not bot.is_same_ref(): 383 | try: 384 | bot.create_comment(submission) 385 | bot.write_to_file(submission.id) 386 | bot.reset_fields() 387 | except: 388 | print "Failed to comment" 389 | 
bot.write_to_file(submission.id) 390 | bot.reset_fields() 391 | else: 392 | print "Failed to find source" 393 | bot.write_to_file(submission.id) 394 | bot.reset_fields() 395 | # the submission is not an xpost or submission id is in cache already 396 | else: 397 | bot.reset_fields() 398 | 399 | bot.delete_negative() 400 | bot.temp_cache.clear() 401 | bot.xpost_submissions.clear() 402 | 403 | print "\nSleeping\n" 404 | time.sleep(10) 405 | if len(bot.cache) > 1000: 406 | bot.clear_database() 407 | --------------------------------------------------------------------------------