├── LICENSE
├── Procfile
├── README.md
├── ignoredSubs.py
├── requirements.txt
├── runtime.txt
└── xpostsearch.py

/LICENSE:
--------------------------------------------------------------------------------
1 | The MIT License (MIT)
2 | 
3 | Copyright (c) 2015 Jonathan (papernotes)
4 | 
5 | Permission is hereby granted, free of charge, to any person obtaining a copy
6 | of this software and associated documentation files (the "Software"), to deal
7 | in the Software without restriction, including without limitation the rights
8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 | 
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 | 
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 | 
23 | 
--------------------------------------------------------------------------------
/Procfile:
--------------------------------------------------------------------------------
1 | # Procfile
2 | worker: python xpostsearch.py
3 | 
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Reddit Original Post Searcher Bot
2 | [OriginalPostSearcher](https://www.reddit.com/user/OriginalPostSearcher/)
3 | A Reddit bot that aims to comment with a link to the original submission of an x-post.
4 | 
5 | Made as a practice bot in Python for [Reddit](http://www.reddit.com/) using [PRAW](https://praw.readthedocs.org/en/v3.1.0/). I wanted to build something fun and learn a bit more about Python and databases.
6 | Thanks to [stackoverflow](http://stackoverflow.com/), [/r/learnpython](http://www.reddit.com/r/learnpython), and [/r/python](http://www.reddit.com/r/python).
7 | 
8 | ~About 7k Karma in 1 Month :D
9 | 
10 | The format for a response is:
11 | ```
12 | X-Post referenced from /r/subreddit by /u/user
13 | *submission title with link*
14 | ```
15 | # FAQ
16 | **What's the purpose of this bot?**
17 | The bot is meant to be a convenience and sourcing tool for the x-posts that occur on Reddit every day. With the bot's comments, the post referenced by an x-post and its author are credited. It also saves a few clicks, especially for mobile users.
18 | 
19 | **Isn't this useless if there's the "other discussions" tab?**
20 | I actually did not realize there was an "other discussions" tab until it was pointed out while this bot was commenting. At that point, I thought the tab made this bot redundant. However, the bot received a lot of positive reception (see below), and it still provides a quick link for mobile users. With that, I decided to continue developing the bot, saving Redditors a few clicks and providing proper credit.
21 | 
22 | **How does it work?**
23 | Generally speaking, the bot looks for any submission that might be an x-post, tries to find the original submission in the subreddit mentioned in the title (e.g. *Title of post! (xpost from /r/specific_subreddit)*), and then comments on the x-post with the original submission's title, its author, its subreddit, and a link to it. The bot is hosted on Heroku.
24 | 
25 | **How does the bot search for the original post?**
26 | Funnily enough, I ended up using the "other discussions" tab after all, along with several other tiers of searching through PRAW.
27 | First, the bot checks the "other discussions" tab for submissions that share the x-post's link; if that fails, it checks the x-post author's previous submissions. If that fails as well, the bot looks up the subreddit referenced in the x-post's title and searches its *New* and *Hot* listings. By that point, the bot has hopefully found the original post and can comment (see the sketch below).
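A minimal sketch of these search tiers, using the same PRAW 3 calls that `xpostsearch.py` uses (the `find_original` helper and the `reddit` instance here are illustrative, not part of the bot):
```
import praw

# Sketch only -- the real logic lives in SearchBot.search_for_post.
reddit = praw.Reddit(user_agent="OriginalPostSearcher example")

def find_original(xpost, original_sub_name, limit=150):
    """Return the first submission that looks like the original, or None."""
    def is_original(candidate):
        return (candidate.url == xpost.url and
                candidate.subreddit.display_name.lower() == original_sub_name and
                not candidate.over_18 and
                xpost.permalink not in candidate.permalink)

    listings = [
        xpost.get_duplicates(limit=limit),                                  # "other discussions" tab
        reddit.get_redditor(xpost.author.name).get_submitted(limit=limit),  # author's previous posts
        reddit.get_subreddit(original_sub_name).get_new(limit=limit),       # referenced subreddit, New
        reddit.get_subreddit(original_sub_name).get_hot(limit=limit),       # referenced subreddit, Hot
    ]
    for listing in listings:
        for candidate in listing:
            if is_original(candidate):
                return candidate
    return None
```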
28 | 
29 | **What if the x-post author provided the source in the comments already?**
30 | The bot has a check just for that! It only goes through the search process if it doesn't find a possible source in the comments, and it checks the comments once more right before posting, in case the author has added the source in the meantime.
31 | 
32 | **What about brigading for subreddits that require "np" links?**
33 | There is a list of subreddits for which the bot provides "np" links - [nopart.py](https://github.com/papernotes/Reddit-OriginalPostSearcher/blob/master/nopart.py) (as of version 2.0.0, the links in the bot's comments point to np.reddit.com).
34 | There is also a list of subreddits that the bot will not comment in - [ignoredSubs.py](https://github.com/papernotes/Reddit-OriginalPostSearcher/blob/master/ignoredSubs.py)
35 | 
36 | **Can I provide a suggestion?**
37 | Of course! PM [/u/OriginalPostSearcher](https://www.reddit.com/message/compose/?to=OriginalPostSearcher), and I'll be happy to look into it.
38 | 
39 | # Reception
40 | ### Positive
41 | *This is my new favourite bot!* - [Kvothealar](https://www.reddit.com/r/shittyrobots/comments/3ful9e/the_tiniest_firefighter_xpost_rgifs/cts9azi?context=3)
42 | *I like this bot. Whoever came up with it did good.* - [CaseH1984](https://www.reddit.com/r/retrogaming/comments/3fsrds/xpost_from_rgaming_guy_3d_prints_a_tiny_nes_case/ctrlm2c?context=3)
43 | *Keep up the good work* - [Xfactor5492](https://www.reddit.com/r/CrappyDesign/comments/3ebgyz/girlfriend_wasnt_sure_why_i_laughed_at_her_water/ctdcc4j?context=3)
44 | *THE SECOND COMING OF REPOST STATISTICS?! THE PRODIGAL SON HAS RETURNED!* - [StaticDraco](https://www.reddit.com/r/funny/comments/3fn5ds/cop_frees_baby_skunk_from_yogurt_container_xpost/ctql2ey?context=3)
45 | *At least the bot gave me credit for my photo :)* - [FoodandFrenchies](https://www.reddit.com/r/burgers/comments/3fo5jb/bacon_avocado_bison_cheeseburger_on_a_homemade/ctqege2?context=3)
46 | *I fucking love the bots in this sub. There's a sentence I never thought I'd see myself type.* - [Takatalvi_Ignatio](https://www.reddit.com/r/deathgrips/comments/3flirt/ive_been_playing_waaaay_too_much_poe_for_this_to/ctq3pru?context=3)
47 | *Thank you bot. 
What would we do without you.* - [LordZikarno](https://www.reddit.com/r/ElderScrolls/comments/3fmo4f/interesting_reference_ive_found_in_skyrim_xpost/ctpz6a0?context=3) 48 | *I'd recommend not banning that bot, he's quite useful.* - [Chastlily](https://www.reddit.com/r/fireemblem/comments/3fimvt/hey_guys_i_recently_made_a_fire_emblem_radiant/ctoy96s?context=3) 49 | *This bot is awesome* - [Ninja_Fox_](https://www.reddit.com/r/linuxmasterrace/comments/3flqtj/wipes_windows_in_seconds_xpost_from_rfunny/ctpsbns?context=3) 50 | *I appreciate that.* - [FaceReaityBot](https://www.reddit.com/r/wethebest/comments/3flnm9/go_buy_your_whole_family_something_nice_xpost/ctpqhag?context=3) 51 | *thank mr bot for good xposts and lack of calcium* - [Poyoarya](https://www.reddit.com/r/shittyreactiongifs/comments/3fkgwt/mfw_i_realize_i_forgot_my_skeleton_at_home_xpost/ctpgl58?context=3) 52 | *Thank you for tagging me botfriend <3* - [throwcap](https://www.reddit.com/r/shockwaveporn/comments/3fj9wd/missile_hitting_its_target_xpost_from_rvideos_10s/ctp3xbi?context=3) 53 | *Ooh, that's a cool bot.* - [Non-Alignment](https://www.reddit.com/r/fireemblem/comments/3fimvt/hey_guys_i_recently_made_a_fire_emblem_radiant/ctoxpqs?context=3) 54 | *Wow, what a useful bot I never knew existed :O* - [mnmnmnmn1](https://www.reddit.com/r/TheBluePill/comments/3fc9qs/gaylubeoil_gets_into_a_dickwaving_contest_with/ctnpdvd?context=3) 55 | *Ahh you beat me to it. Excellent bot, this.* - [critically_damped](https://www.reddit.com/r/LaserCats/comments/3f91ql/allweather_lasercat_xpost_from_rcatloaf/ctmfwrr?context=3) 56 | *I think this bot addresses every possible problem with reposting.* - [winter_mutant](https://www.reddit.com/r/ContagiousLaughter/comments/3f1mn6/okay_google_whats_a_blumpkin_xpost_from/ctky10v?context=3) 57 | *Well that is a nifty bot.* - [davidverner](https://www.reddit.com/r/AmIFreeToGo/comments/3f1r2u/crosspost_from_rroadcam_driver_smashes_into_cars/ctkqbb1?context=3) 58 | *Hey, I kind of like this. Kudos bot.* - [NewJerseyFreakshow](https://www.reddit.com/r/TopMindsOfReddit/comments/3eq9a4/top_mind_mod_of_coontown_ueugenenix_gets_demodded/cthd8qy?context=3) 59 | *What a lovely bot.* - [kevik72](https://www.reddit.com/r/funny/comments/3efuvp/trick_friends_into_thinking_you_have_your_shit/cteiy59?context=3) 60 | *See, the bot knows how to crosspost. Why can't we all?* - [Duke_Wintermaul](https://www.reddit.com/r/Nerf/comments/3dsi5g/finally_xpost_from_rgifs/ct8t6bx?context=3) 61 | *Woah, that's super helpful. I've never seen the x-post bot work like this.* - [jimmycthatsme](https://www.reddit.com/r/woodworking/comments/3e7vja/my_buddy_alan_is_a_woodworker_was_told_his_work/ctcb83q?context=3) 62 | 63 | ### Not so positive 64 | *I hate this bot.* - [MightyDebo](https://www.reddit.com/r/ElderScrolls/comments/3fmo4f/interesting_reference_ive_found_in_skyrim_xpost/ctqigu7?context=3) 65 | *Please die mr bot* - [Lurkerphile](https://www.reddit.com/r/skyrim/comments/3fdt5d/i_guess_nazeem_wasnt_as_important_as_he_thought/ctnwrh5?context=3) 66 | *Is this bot really necessary? 
Can't people just click "other discussions" to see this?* - [send-me-to-hell](https://www.reddit.com/r/linux/comments/3f2cix/continual_testing_of_mainline_linux_kernels_xpost/ctkozdh?context=3) 67 | *Yeah, we know how to use "other discussions" tab.* - [nakilon](https://www.reddit.com/r/MyPeopleNeedMe/comments/3er1yu/battlefield_4_impressive_helicopter_physics_xpost/cthknhm?context=3) 68 | 69 | And many "you have been banned from posting to /r/______" 70 | 71 | 72 | **Favorite Thread** - [Googling Recursion](https://www.reddit.com/r/nevertellmetheodds/comments/3f8kt3/xpost_rnevertellmetheodds_this_truck_drifting_on/ctmc72u?context=3) 73 | 74 | # Milestones 75 | - 30k Karma (2/12/2016) 76 | - 40k Karma (5/12/2016) 77 | 78 | # TODO 79 | - Do something to deal with unwanted comments (Completed 7/15/2015) 80 | - Continue to update/optimize bot 81 | - Continue to update the list of ignored and no-participation subreddits 82 | - Code refactoring could probably be done (It's quite messy at the moment) (OOP Design Completed 10/15/2015) 83 | 84 | # Updates 85 | ``` 86 | 1.0.1 (7/15/2015) - Fixed commenting bug that involved the wrong links and added ability to delete unwanted comments 87 | 1.0.2 (7/16/2015) - Fixed a string checking bug for utf-8 and added logging/print statements 88 | 1.0.3 Updated the order of finding the original post, check for content first 89 | 1.0.4 (7/17/2015) - Changed the return value of one of the variables, added more logging, updated ignoredSubs list/function names 90 | 1.0.5 (7/18/2015) - Added source check, renamed old user_agent from old files, and updated ignoredSubs list 91 | 1.0.6 Updated source checking, updated user_agent, removed searchedPosts.txt, and updated ignoredSubs list 92 | 1.0.7 (7/19/2015) - Added check if getting subreddit failed, changed comment style, and updated ignoredSubs list 93 | 1.0.8 (7/20/2015) - Changed commenting style/words, updated ignoredSubs list 94 | 1.0.9 - Code cleanup, updated ignoredSubs list 95 | 1.1.0 (7/21/2015) - Added new function to find the original post faster (doesn't cover self-posts), updated ignoredSubs list 96 | 1.1.1 - Cleaned up code, updated ignoredSubs list 97 | 1.1.2 (7/22/2015) - Added original poster's username for commenting 98 | 1.1.3 (7/25/2015) - Fixed checking original post bug that involved "in" phrase, updated commenting to emphasize convenience for 99 | mobile users, and updated ignoredSubs list 100 | 1.1.4 (7/29/2015) - Added ability to create no participation links for certain subreddits 101 | 1.1.5 (7/30/2015) - Added ability to look through poster's previous posts to save time searching 102 | 1.1.6 (8/05/2015) - Changed commenting to include link to Github repo 103 | 1.1.7 (8/13/2015) - Added ability to search specifically for xposts to save time going through non-xposts 104 | 1.1.8 (8/30/2015) - Added check to make sure content is as work safe as possible 105 | 1.1.9 (9/18/2015) - Added secondary check to make sure no source comments are found after setting up comment 106 | 1.2.0 (10/15/2015) - Major code refactoring. Bot is now object-oriented. Also slightly faster than previous versions. 107 | 1.2.1 (11/12/2015) - Added check to see make sure link isn't from the same subreddit. 
Small fix for finding original subreddit name 108 | 1.2.2 (3/4/2016) - Fixed bug in not deleting negative comments regularly 109 | 1.2.3 (6/4/2016) - Added check for when the original submission's title references the xpost's title 110 | 2.0.0 (10/28/2016) - Adopted SemVer and change links to other subreddits to be np links 111 | ``` 112 | -------------------------------------------------------------------------------- /ignoredSubs.py: -------------------------------------------------------------------------------- 1 | # A list of subreddits to not bother searching in/bothering/banned from 2 | 3 | ignore_list = ["anime", "asianamerican", "askhistorians", "askscience", "aww", "chicagosuburbs", 4 | "cosplay", "cumberbitches", "d3gf", "deer", "depression", "depthhub", 5 | "drinkingdollars", "forwardsfromgrandma", "geckos", "giraffes", 6 | "grindsmygears", "misc", "mixedbreeds", "news", "newtotf2", "omaha", "petstacking", 7 | "pigs", "politicaldiscussion", "politics", "programmingcirclejerk", "raerthdev", 8 | "rants", "runningcirclejerk", "salvia", "science", "seiko", "shoplifting", 9 | "sketches","suicidewatch", "talesfromtechsupport","torrent","torrents","trackers", 10 | "tr4shbros", "unitedkingdom", "askreddit", "benfrick", "futurology", 11 | "graphic_design", "historicalwhatif", "lolgrindr", "malifaux", "nfl", 12 | "toonami", "ps2ceres","duelingcorner", "gadgets", "personalfinance", "woahdude", 13 | "wheredidthesodago", "gentlemanboners", "cats", "business", "holdmybeer", "beer", 14 | "pcgaming", "motorcycles", "xboxone", "mma", "productivity", "parenting", "horror" 15 | "enhancement", "biology", "apphookup", "sanfrancisco", "singularity", "transhuman", 16 | "trucks", "evangelion", "listentous", "multihub", "woahpoon", "careerguidance", "ebola", 17 | "keming", "aliens", "horses", "internationalpolitics", "libraries", "amazon", "harley", 18 | "graphicnovels", "dualsport", "milwaukee", "blackops3", "soundsvintage", "fullmoviesonline", 19 | "calamariraceteam", "listentoobscure", "blackberry", "supermoto", "european", "diesel", 20 | "batesmotel", "modeltrains", "bikebuilders", "dailyherald", "shittydiy", "askmeanything", 21 | "troutfishing", "hackedgadgets", "musicguides", "fayetteville", "englishlearning", 22 | "mentalfloss", "deathrowdiesel", "listentocurated", "southbend", "subredditreports", 23 | "gasmonkey", "bitcoinbeg", "listentousagain", "listentonew", "420", "gwar", "weaponsystems", 24 | "rants", "tforcenetwork", "summit", "breckenridge", "hogfornoobs", "artreddits", "tcgcollecting", 25 | "battletops", "sliders", "caddenmorandiary", "radd_it", "decaf", "woahgifs", "radditplaylists", 26 | "subofrome", "rainbow", "botwatchman", "fuckharley", "etfs", "neogaf", "svara", "msawareness", 27 | "metrosexual", "replygore", "usfreepress", "critterart", "3dchalk", "futurologymoderators", 28 | "truckmemes", "cssstyle", "happynews", "biochemistry2", "parables", "osiris", "truedisability", 29 | "futurologyappeals", "radditfaq", "weedtrees", "trglodyte", "happyworldnews", "stevediary", 30 | "blacklistpics", "serendipity", "pcmasterrace", "minecraft", "photoshopbattles", "interestingasfuck", 31 | "peoplebeingjerks", "animalsbeingjerks", "gaming", "awwnime", "k_on", "blackfellas", 32 | "games", "dvdcollection", "animefigures", "modernmagic", "australia", "calligraphy", 33 | "blackladies", "firefighting", "womenwithwatches", "kotakuinaction", "mechanicalkeyboards", 34 | "android", "wtf", "cringe", "dbz", "skeptic", "knives", "pics", "pic", "unexpected", "adviceanimals", 35 | "twoxchromosomes", 
"lewronggeneration", "nascar", "badphilosophy", "gifs", 36 | "philosophy", "censorship", "conservative", "fatlogic", "historyofideas", "earthprobes", 37 | "fantasyfootball", "redditarmie", "mistyfront", "reclaimedbynature", "blackpeopletwitter", "gamephysics", 38 | "programmerhumor", "technology", "worldnews", "youtubehaiku", "movies", "shittyprogramming", "sports", 39 | "music", "books", "history", "food", "television", "art", "diy", "warhammer", "flicks", "moescape", "asatru", 40 | "cringepics", "getmotivated", "conspiratard", "mindcrack", "soccer", "troy", "femradebates", "pussypassdenied", 41 | "lotr", "delaware", "foreveralone", "trollxchromosomes", "netsec", "tiara", "horror", "funny", "gif", 42 | "nationalpark", "celebs", "prettygirls", "surpriseddogs", "bellathorne", "delawarepolitics", 43 | "gunsarecool", "stardustcrusaders", "creepypms", "bears", "space", "denmark", "ireland", "damnthatsinteresting", 44 | "thriftstorehauls", "metalgearsolidv_pc", "etimusic", "rage", "osha", "zettairyouiki", "gameofthrones", "sinotibetan", 45 | "hopheadsde", "thailand", "romania", "photoshopfail", "cynicalbrit", "serverporn", "whatcouldgowrong", 46 | "kanmusu", "hardware", "sociology", "conspiracy", "perfecttiming", "largeimages", "republican", "dnb", 47 | "eatcheapandhealthy", "food", "foodporn", "agarioball", "codzombies", "listentothis", "europe", "bouldering", 48 | "euromusic", "barca", "destinythegame", "polandball", "stateball", "planetball", "polandballart", "panda", 49 | "fpvracing", "denvernuggets", "freethought", "aquariums", "vive", "specart", "animation", "comics", 50 | "steroids", "arkansas", "slammedtrucks", "oc_cars", "happygifs", "nba", "pizza", "abcdesis", "linguistics", 51 | "photography", "syriancivilwar", "progolf", "altbriggs", "headpats", "beautifulfemales", "bakchodi", 52 | "wec", "blancpain", "consoleproletariat", "blackandgold", "imaginarymonsters", "imaginarylandscapes", 53 | "imaginarycharacters", "adorableart", "bengals", "soma", "frontpage", "whatisthis", "lawschool", 54 | "shitamericanssay", "cbc_radio", "colorado", "redditdads", "apotheoun", "hyomin", "newzealand", "publicfreakout", 55 | "beachdogs", "catholicism", "chemistry", "montreal", "longbeards", "fishing", "slowcooking", "everythingscience", 56 | "retrobattlestations", "electricians", "crossdressing", "starwars", "iamverysmart", "cowboys", "userexperience", 57 | "japanesewatches", "opiates", "2007scape", "korean", "gamedeals", "blind", "blackpeoplegifs", "animalsbeingbros", 58 | "analogygifs", "camping", "cityporn", "bestofreports", "memes", "internetisbeautiful", "highqualitygifs", 59 | "humansbeingbros", "rage", "reactiongifs", "hifw", "nottheonion", "reversegif", "thingsthatblowup", "youdontsurf", 60 | "wastedgifs", "genderqueer", "imagesofnewyork", "imagesofcanada", "imagesofusa", "imagesofalabama", "miiverseinaction", 61 | "tattoos", "seahawks", "conspiracymemes", "justneckbeardthings", "ar15", "trance", "5555555", "commentsgetdrawn", 62 | "unitedstatesofamerica", "workbenches", "philippines", "tng", "leagueofmemes", "atari", "kiddet", "hillaryclinton", 63 | "boston", "coffee", "weekendgunnit", "cardinals", "torontobluejays", "military", "goldreplies", "shinjimin", 64 | "losangeleskings", "writermotivation", "kerbalplanes", "sto", "latino", "southflorida", "mylittleandysonic1", "zika", 65 | "babyrooms", "nononono", "overlanding", "ak47", "callofduty", "bbcradiodrama", "the_crew", "shittydarksouls", "atheism", 66 | "badtattoos", "army", "quityourbullshit", "mycology", "brokengifs", "catslaps", "albany", 
"canadaguns", "makeupaddiction", 67 | "freeeuropenews", "naruto", "magnificentmemes", "onepunchman", "emcomm", "belgium", "sysadmin", "codeperformance", 68 | "watches", "stpetersburgfl", "imagesofwisconsin", "hockeycards", "imagesofengland", "koreangirlstop", "infinitewarfare", 69 | "indianfood", "mariners", "britpics", "redditgetsdrawn", "fantasy", "gascar", "labsafety", "rpg", "taliyahmains", 70 | "woodworking", "iosgaming", "doctorwhumour", "imagesofcalifornia", "imagesofflorida", "imagesofmaine", "imagesofmichigan", 71 | "imagesoforegon", "imagesofthe1910s", "imagesofvirginia", "imagesofvermont", "imagesofthe2010s", "texas", "misanthropy", 72 | "hipaa", "canadapolitics", "unixporn", "pureasoiaf", "joerogaine", "subredditsimulator", "vintageelectronics", "neutralnews", 73 | "california", "fastfood", "mcdonalds", "obama", "catholicpolitics", "funfacts", "liberal", "enoughsandersspam", "irelandtelevision", 74 | "sailormoon", "functionalprint", "humor", "music", "asoiaf", "mame", "paradoxplaza", "straya", "berserk", "printsf", "lunathedog", 75 | "mariomaker", "mariomakerlevels", "learnjapanese", "iceland", "spotted", "tools", "whatisthisthing", "battlebots", "caferacers", 76 | "mkd", "onionhate", "elitedangerous", "thelastofus", "strangerthings", "darkfuturology", "sacramento", "thathappened", 77 | "asmr", "codmodernwarfare", "laclippers", "bayarea", "moonmoon"] 78 | -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | praw >= 3.3.0 2 | psycopg2 >= 2.6.1 3 | SQLAlchemy >= 1.0.6 -------------------------------------------------------------------------------- /runtime.txt: -------------------------------------------------------------------------------- 1 | python-2.7.9 2 | -------------------------------------------------------------------------------- /xpostsearch.py: -------------------------------------------------------------------------------- 1 | """ OriginalPostSearcher bot """ 2 | import herokuDB 3 | import ignoredSubs 4 | import praw 5 | import time 6 | from sqlalchemy import create_engine 7 | from sqlalchemy import text 8 | 9 | REDDIT_CLIENT = praw.Reddit(user_agent="OriginalPostSearcher 2.0.0") 10 | REDDIT_CLIENT.login(disable_warning=True) 11 | 12 | # a list of words that might be an "xpost" 13 | 14 | X_POST_DICTIONARY = set() 15 | X_POST_DICTIONARY.update(["xpost", "x-post", "crosspost","cross-post", 16 | "xposted", "crossposted", "x-posted"]) 17 | 18 | 19 | # list of words to check for so we don't post if source is already there 20 | ORIGINAL_COMMENTS = set() 21 | ORIGINAL_COMMENTS.update(['source', 'original', 'original post', 'sauce', 'link', 22 | 'x-post', 'xpost', 'x-post', 'crosspost', 'cross post', 23 | 'cross-post', 'referenced', 'credit', 'credited', 'other', 24 | 'post']) 25 | 26 | # create the ENGINE for the database 27 | ENGINE = create_engine(herokuDB.url) 28 | 29 | # don't bother these subs 30 | IGNORED_SUBS = set() 31 | IGNORED_SUBS.update(ignoredSubs.ignore_list) 32 | 33 | 34 | class SearchBot(object): 35 | def __init__(self): 36 | self.xpost_dict = X_POST_DICTIONARY 37 | self.ignored_subs = IGNORED_SUBS 38 | 39 | # cache for database 40 | self.cache = set() 41 | self.temp_cache = set() 42 | self.xpost_submissions = set() 43 | 44 | # fields for the xposted submission 45 | self.xpost_url = None # link shared in the submission 46 | self.xpost_permalink = None 47 | self.xpost_title = None 48 | self.xpost_author = None 49 | 
self.xpost_sub = None # subreddit object of xpost 50 | self.xpost_sub_title = None # the string of the subreddit 51 | 52 | # fields for the original subreddit 53 | self.original_sub = None # subreddit object 54 | self.original_sub_title = None # title of the subreddit 55 | self.original_title = None 56 | self.original_permalink = None 57 | self.original_author = None 58 | 59 | 60 | # -------- Main Bot Methods --------- # 61 | 62 | 63 | def create_comment(self, submission): 64 | print "Making comment\n" 65 | if not self.original_author: 66 | self.original_author = "a [deleted] user" 67 | else: 68 | self.original_author = "/u/" + str(self.original_author) 69 | 70 | # make links np links 71 | original_link_list = self.original_link.split("https://www.") 72 | self.original_link = "http://np." + original_link_list[1] 73 | 74 | # create the string to comment with 75 | comment_string = ("X-Post referenced from [/r/" + 76 | self.original_sub_title + "](http://np.reddit.com/r/" + 77 | self.original_sub_title + ") by " + self.original_author + 78 | " \n[" + self.original_title.encode('utf-8') + 79 | "](" + self.original_link.encode('utf-8') + 80 | ")\n***** \n \n^^I ^^am ^^a ^^bot. ^^I" + 81 | " ^^delete ^^my ^^negative ^^comments. ^^[Contact]" + 82 | "(https://www.reddit.com/message/" + 83 | "compose/?to=OriginalPostSearcher)" + 84 | " ^^| ^^[Code](https://github.com/" + 85 | "papernotes/Reddit-OriginalPostSearcher)" + 86 | " ^^| ^^[FAQ](https://github.com/papernotes/" + 87 | "Reddit-OriginalPostSearcher#faq)") 88 | print comment_string 89 | 90 | # double check 91 | if self.has_source(submission): 92 | print "Source found" 93 | else: 94 | submission.add_comment(comment_string) 95 | print "\nCommented!" 96 | 97 | 98 | def delete_negative(self): 99 | print "Checking previous comments for deletion" 100 | user = REDDIT_CLIENT.get_redditor('OriginalPostSearcher') 101 | submitted = user.get_comments(limit=200) 102 | for item in submitted: 103 | if int(item.score) < -1: 104 | print("\nDeleted negative comment\n " + str(item)) 105 | item.delete() 106 | 107 | 108 | def get_original_sub(self): 109 | try: 110 | self.xpost_title = self.xpost_title.split() 111 | except: 112 | print "Failed split" 113 | pass 114 | self.original_sub_title = None 115 | return 116 | try: 117 | for word in self.xpost_title: 118 | if '/r/' in word: 119 | # split from /r/ 120 | word = word.split('/r/')[1] 121 | word = word.split(')')[0] # try for parentheses first 122 | word = word.split(']')[0] # try for brackets 123 | print("/r/ word = " + word.encode('utf-8')) 124 | self.original_sub_title = word 125 | break 126 | # split for "r/" only format 127 | elif 'r/' in word: 128 | word = word.split('r/')[1] 129 | word = word.split(')')[0] # try for parentheses first 130 | word = word.split(']')[0] # try for brackets 131 | print("r/ word = " + word.encode('utf-8')) 132 | self.original_sub_title = word 133 | break 134 | else: 135 | self.original_sub_title = None 136 | except: 137 | print("Could not get original subreddit") 138 | self.original_sub_title = None 139 | 140 | 141 | def reset_fields(self): 142 | self.original_sub_title = None 143 | self.original_found = False 144 | 145 | 146 | def search_for_post(self, submission, lim): 147 | duplicates = submission.get_duplicates(limit=lim) 148 | 149 | print "Searching Dupes" 150 | for submission in duplicates: 151 | if self.is_original(submission): 152 | self.original_permalink = submission.permalink 153 | return True 154 | 155 | poster_name = self.xpost_author.encode('utf-8') 156 | poster = 
REDDIT_CLIENT.get_redditor(poster_name) 157 | user_submissions = poster.get_submitted(limit=lim) 158 | 159 | print "Searching User" 160 | for submission in user_submissions: 161 | if self.is_original(submission): 162 | self.original_permalink = submission.permalink 163 | return True 164 | 165 | # in case the subreddit doesn't exist 166 | try: 167 | self.original_sub = REDDIT_CLIENT.get_subreddit(self.original_sub_title) 168 | 169 | print "Searching New" 170 | for submission in self.original_sub.get_new(limit=lim): 171 | if self.is_original(submission): 172 | self.original_permalink = submission.permalink 173 | return True 174 | 175 | print "Searching Hot" 176 | for submission in self.original_sub.get_hot(limit=lim): 177 | if self.is_original(submission): 178 | self.original_permalink = submission.permalink 179 | return True 180 | except: 181 | pass 182 | return False 183 | 184 | print "--------------Failed all searches" 185 | return False 186 | 187 | 188 | def set_original_fields(self, submission): 189 | try: 190 | self.original_title = submission.title.encode('utf-8') 191 | self.original_link = submission.permalink 192 | self.original_author = submission.author 193 | self.original_found = True 194 | except: 195 | pass 196 | 197 | 198 | def set_xpost_fields(self, submission): 199 | try: 200 | self.xpost_url = submission.url.encode('utf-8') 201 | self.xpost_permalink = submission.permalink 202 | self.xpost_author = submission.author.name 203 | self.xpost_title = submission.title.lower().encode('utf-8') 204 | self.xpost_sub = submission.subreddit 205 | self.xpost_sub_title = str(submission.subreddit.display_name.lower()) 206 | except: 207 | pass 208 | 209 | 210 | def set_xpost_submissions(self, search_terms, client): 211 | """ 212 | Searches for the most recent xposts and sets it 213 | """ 214 | print "Finding xposts" 215 | for entry in search_terms: 216 | for title in client.search(entry, sort="new"): 217 | self.xpost_submissions.add(title) 218 | 219 | 220 | def get_xpost_title(self, title): 221 | # format TITLE(xpost) 222 | if (len(title) == title.find(')') + 1): 223 | return title.split('(')[0] 224 | # format TITLE[xpost] 225 | elif (len(title) == title.find(']') + 1): 226 | return title.split('[')[0] 227 | # format (xpost)TITLE 228 | elif (title.find('(') == 0): 229 | return title.split(')')[1] 230 | # format [xpost]TITLE 231 | elif (title.find('[') == 0): 232 | return title.split('[')[1] 233 | # weird format, return false 234 | else: 235 | print ("Couldn't get title correctly") 236 | return None 237 | 238 | 239 | # -------- Boolean Methods --------- # 240 | 241 | 242 | def has_source(self, submission): 243 | for comment in submission.comments: 244 | try: 245 | if (any(string in str(comment.body).lower() 246 | for string in ORIGINAL_COMMENTS)): 247 | print("Source in comments found: ") 248 | print(" " + str(comment.body) + "\n") 249 | return True 250 | except: 251 | pass 252 | 253 | print "No 'source' comments found" 254 | return False 255 | 256 | 257 | def is_ignored_or_nsfw(self, submission): 258 | return not (submission.subreddit.display_name.lower() in self.ignored_subs or 259 | submission.over_18 is True) 260 | 261 | 262 | def is_original(self, submission): 263 | try: 264 | if (self.xpost_url == str(submission.url).encode('utf-8') and 265 | submission.subreddit.display_name.lower().encode('utf-8') == self.original_sub_title and 266 | submission.over_18 is False and 267 | not self.xpost_permalink in submission.permalink): 268 | self.set_original_fields(submission) 269 | return True 
270 | return False 271 | except: 272 | pass 273 | return False 274 | 275 | 276 | def is_same_ref(self): 277 | """ 278 | If the original submission's title is an x-post referencing 279 | the xpost sub, then return True 280 | """ 281 | if self.xpost_sub_title in self.original_title: 282 | print "True Ref" 283 | return True 284 | print "False Ref" 285 | return False 286 | 287 | 288 | def is_xpost(self, submission): 289 | submission_title = submission.title.lower() 290 | try: 291 | submission_title = submission_title.encode('utf-8') 292 | except: 293 | pass 294 | return False 295 | return any(string in submission_title for string in self.xpost_dict) 296 | 297 | 298 | # -------- Database --------- # 299 | 300 | 301 | def clear_database(self): 302 | num_rows = ENGINE.execute("select * from searched_posts") 303 | 304 | if num_rows.rowcount > 1000: 305 | ENGINE.execute("delete from searched_posts") 306 | print "Cleared database" 307 | if len(self.cache) > 1000: 308 | self.cache = self.cache[int(len(self.cache))/2:] 309 | print "Halved cache" 310 | 311 | 312 | def id_added(self, sub_id): 313 | id_added_text = text("select * from searched_posts where post_id = :postID") 314 | return ENGINE.execute(id_added_text, postID=sub_id).rowcount != 0 315 | 316 | 317 | def setup_database_cache(self): 318 | result = ENGINE.execute("select * from searched_posts") 319 | 320 | for row in result: 321 | self.temp_cache.add(str(row[0])) 322 | 323 | for value in self.temp_cache: 324 | if value not in self.cache: 325 | self.cache.add(str(value)) 326 | 327 | 328 | def write_to_file(self, sub_id): 329 | """ 330 | Saves the submission we just searched 331 | """ 332 | if not self.id_added(sub_id): 333 | temp_text = text('insert into searched_posts (post_id) values(:postID)') 334 | ENGINE.execute(temp_text, postID=sub_id) 335 | 336 | 337 | 338 | if __name__ == '__main__': 339 | bot = SearchBot() 340 | print "Created bot" 341 | 342 | while True: 343 | bot.set_xpost_submissions(X_POST_DICTIONARY, REDDIT_CLIENT) 344 | bot.setup_database_cache() 345 | 346 | for submission in bot.xpost_submissions: 347 | # NSFW content or ignored subreddit 348 | if not bot.is_ignored_or_nsfw(submission) and submission.id not in bot.cache: 349 | bot.write_to_file(submission.id) 350 | bot.reset_fields() 351 | continue 352 | 353 | if bot.is_xpost(submission) and submission.id not in bot.cache: 354 | 355 | bot.set_xpost_fields(submission) 356 | 357 | try: 358 | if "reddit" in bot.xpost_url.encode('utf-8'): 359 | print "Post links to Reddit" 360 | bot.write_to_file(submission.id) 361 | bot.reset_fields() 362 | continue 363 | except: 364 | bot.write_to_file(submission.id) 365 | bot.reset_fields() 366 | continue 367 | 368 | print("\nXPost found!") 369 | print("subreddit = " + bot.xpost_sub_title) 370 | print("post title = " + bot.xpost_title) 371 | print("xpost_url = " + bot.xpost_url) 372 | print("xpost_permalink = " + bot.xpost_permalink.encode('utf-8')) 373 | 374 | bot.write_to_file(submission.id) 375 | bot.get_original_sub() 376 | 377 | if (bot.original_sub_title == None or 378 | bot.original_sub_title == bot.xpost_sub.display_name.lower().encode('utf-8')): 379 | print "Failed original subreddit or same subreddit" 380 | bot.reset_fields() 381 | else: 382 | if not bot.has_source(submission) and bot.search_for_post(submission, 150) and not bot.is_same_ref(): 383 | try: 384 | bot.create_comment(submission) 385 | bot.write_to_file(submission.id) 386 | bot.reset_fields() 387 | except: 388 | print "Failed to comment" 389 | 
bot.write_to_file(submission.id) 390 | bot.reset_fields() 391 | else: 392 | print "Failed to find source" 393 | bot.write_to_file(submission.id) 394 | bot.reset_fields() 395 | # the submission is not an xpost or submission id is in cache already 396 | else: 397 | bot.reset_fields() 398 | 399 | bot.delete_negative() 400 | bot.temp_cache.clear() 401 | bot.xpost_submissions.clear() 402 | 403 | print "\nSleeping\n" 404 | time.sleep(10) 405 | if len(bot.cache) > 1000: 406 | bot.clear_database() 407 | --------------------------------------------------------------------------------