├── .github
    ├── FUNDING.yml
    └── ISSUE_TEMPLATE
    │   ├── bug_report.md
    │   └── feature_request.md
├── .gitignore
├── CODE_OF_CONDUCT.md
├── LICENSE
├── README.md
├── imgur_scraper
    ├── __init__.py
    ├── imgur_scraper.py
    └── utils.py
├── requirements.txt
└── setup.py


/.github/FUNDING.yml:
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
/.github/ISSUE_TEMPLATE/bug_report.md:
--------------------------------------------------------------------------------
---
name: Bug report
about: Create a report to help us improve
title: ''
labels: ''
assignees: ''

---

**Describe the bug**
A clear and concise description of what the bug is.

**To Reproduce**
Steps to reproduce the behavior:
1. Go to '...'
2. Click on '....'
3. Scroll down to '....'
4. See error

**Expected behavior**
A clear and concise description of what you expected to happen.

**Screenshots**
If applicable, add screenshots to help explain your problem.

**Desktop (please complete the following information):**
 - OS: [e.g. iOS]
 - Browser [e.g. chrome, safari]
 - Version [e.g. 22]

**Smartphone (please complete the following information):**
 - Device: [e.g. iPhone6]
 - OS: [e.g. iOS8.1]
 - Browser [e.g. stock browser, safari]
 - Version [e.g. 22]

**Additional context**
Add any other context about the problem here.

--------------------------------------------------------------------------------
/.github/ISSUE_TEMPLATE/feature_request.md:
--------------------------------------------------------------------------------
---
name: Feature request
about: Suggest an idea for this project
title: ''
labels: ''
assignees: ''

---

**Is your feature request related to a problem? Please describe.**
A clear and concise description of what the problem is. Ex. I'm always frustrated when [...]

**Describe the solution you'd like**
A clear and concise description of what you want to happen.

**Describe alternatives you've considered**
A clear and concise description of any alternative solutions or features you've considered.

**Additional context**
Add any other context or screenshots about the feature request here.

--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
env
__pycache__
.idea/
build/
dist/
imgur_scraper.egg-info/
--------------------------------------------------------------------------------
/CODE_OF_CONDUCT.md:
--------------------------------------------------------------------------------
# Contributor Covenant Code of Conduct

## Our Pledge

In the interest of fostering an open and welcoming environment, we as
contributors and maintainers pledge to making participation in our project and
our community a harassment-free experience for everyone, regardless of age, body
size, disability, ethnicity, sex characteristics, gender identity and expression,
level of experience, education, socio-economic status, nationality, personal
appearance, race, religion, or sexual identity and orientation.

## Our Standards

Examples of behavior that contributes to creating a positive environment
include:

* Using welcoming and inclusive language
* Being respectful of differing viewpoints and experiences
* Gracefully accepting constructive criticism
* Focusing on what is best for the community
* Showing empathy towards other community members

Examples of unacceptable behavior by participants include:

* The use of sexualized language or imagery and unwelcome sexual attention or
  advances
* Trolling, insulting/derogatory comments, and personal or political attacks
* Public or private harassment
* Publishing others' private information, such as a physical or electronic
  address, without explicit permission
* Other conduct which could reasonably be considered inappropriate in a
  professional setting

## Our Responsibilities

Project maintainers are responsible for clarifying the standards of acceptable
behavior and are expected to take appropriate and fair corrective action in
response to any instances of unacceptable behavior.

Project maintainers have the right and responsibility to remove, edit, or
reject comments, commits, code, wiki edits, issues, and other contributions
that are not aligned to this Code of Conduct, or to ban temporarily or
permanently any contributor for other behaviors that they deem inappropriate,
threatening, offensive, or harmful.

## Scope

This Code of Conduct applies both within project spaces and in public spaces
when an individual is representing the project or its community. Examples of
representing a project or community include using an official project e-mail
address, posting via an official social media account, or acting as an appointed
representative at an online or offline event.
Representation of a project may be further defined and clarified by project maintainers.

## Enforcement

Instances of abusive, harassing, or otherwise unacceptable behavior may be
reported by contacting the project team at saadman@outlook.com. All
complaints will be reviewed and investigated and will result in a response that
is deemed necessary and appropriate to the circumstances. The project team is
obligated to maintain confidentiality with regard to the reporter of an incident.
Further details of specific enforcement policies may be posted separately.

Project maintainers who do not follow or enforce the Code of Conduct in good
faith may face temporary or permanent repercussions as determined by other
members of the project's leadership.

## Attribution

This Code of Conduct is adapted from the [Contributor Covenant][homepage], version 1.4,
available at https://www.contributor-covenant.org/version/1/4/code-of-conduct.html

[homepage]: https://www.contributor-covenant.org

For answers to common questions about this code of conduct, see
https://www.contributor-covenant.org/faq

--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
MIT License

Copyright (c) 2018 saadmanrafat

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# Imgur Scraper
![PyPI - Python Version](https://img.shields.io/pypi/pyversions/imgur-scraper) [![Downloads](https://pepy.tech/badge/imgur-scraper)](https://pepy.tech/project/imgur-scraper) ![PyPI - License](https://img.shields.io/pypi/l/imgur-scraper)

Retrieve years of imgur.com's data. No authentication required. Implemented using their frontend API.

# Usage
![imgur-scraper usage demo](https://i.imgur.com/JsLWD8e.gif)

# Features

Returns close to 500 data points for each date.

```python
{
    'title': 'I said no, my fiancé said yes. Meet Zeta',
    'url': 'https://imgur.com/gallery/H5Xw4dh',
    'points': '5,996',
    'tags': 'aww,kitten,kitty',
    'type': 'image',
    'views': '4,363',
    'date': '2015-05-06'
}
```
### New Features
With the `--details` flag, each post also includes:

- `username`
- `comment_count`
- `downs`
- `ups`
- `points`
- `score`
- `timestamp`
- `views`
- `favorite_count`
- `hot_datetime`
- `nsfw`
- `platform`
- `virality`

More attributes will be added soon. Suggestions and [feature requests](https://github.com/saadmanrafat/imgur-scraper/issues) are welcome.
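Counts such as `points` and `views` come back as display strings with thousands separators, and `tags` is a single comma-separated string. If you need numbers and lists for analysis, a minimal post-processing sketch could look like this. `normalize_post` is a hypothetical helper, not part of imgur-scraper, and it assumes the string formats shown in the example above:

```python
def normalize_post(post: dict) -> dict:
    """Return a copy of `post` with integer counts and a list of tags.

    Hypothetical helper: assumes `points` and `views` are strings such as
    '5,996' (commas as thousands separators) and `tags` is comma-separated.
    """
    normalized = dict(post)
    for key in ("points", "views"):
        if key in normalized:
            # '5,996' -> 5996
            normalized[key] = int(str(normalized[key]).replace(",", ""))
    if "tags" in normalized:
        # 'aww,kitten,kitty' -> ['aww', 'kitten', 'kitty']; '' -> []
        normalized["tags"] = [t for t in normalized["tags"].split(",") if t]
    return normalized


post = {
    "title": "I said no, my fiancé said yes. Meet Zeta",
    "url": "https://imgur.com/gallery/H5Xw4dh",
    "points": "5,996",
    "tags": "aww,kitten,kitty",
    "type": "image",
    "views": "4,363",
    "date": "2015-05-06",
}
clean = normalize_post(post)
print(clean["points"], clean["views"], clean["tags"])
# 5996 4363 ['aww', 'kitten', 'kitty']
```

The original dictionary is left untouched, so the raw strings remain available if you want to write them out unchanged.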

# Installation
```
$ pip install imgur-scraper
```

# Support The Project

```markdown
bitcoin: bc1q44nlg0rvp2w4vf50cf40kgg2cvtgyhz7mlvhm0cnlqjg7cd5dh9szsaw8p
```


--------------------------------------------------------------------------------
/imgur_scraper/__init__.py:
--------------------------------------------------------------------------------
from .imgur_scraper import main

--------------------------------------------------------------------------------
/imgur_scraper/imgur_scraper.py:
--------------------------------------------------------------------------------
import argparse
import csv
import datetime
import json
import os
import re
from typing import Iterator

from requests_html import HTMLSession

from .utils import Convert


def get_more_details_of_post(post_url: str) -> dict:
    """
    :param post_url: the url of an imgur post
    :return: details such as virality score and username, as a dict
             (empty if they could not be extracted)
    """
    details = {}
    try:
        request = HTMLSession().get(post_url)

        # Sometimes the page is not served completely, so retry once.
        if len(request.html.find('script')) <= 18:
            request = HTMLSession().get(post_url)

        # Give up if the script tag at index 18 is still missing.
        if len(request.html.find('script')) <= 18:
            return details

        regex = r'item: ({.+} )'  # regex to isolate the `item` dict.
        # The script tag at index 18 holds the `item` dict; this was tested on more than 1500 links.
        matched = re.search(regex, request.html.find('script')[18].text).group(1)
        item = json.loads(matched)

        details['username'] = item['account_url']
        details['comment_count'] = item['comment_count']
        details['downs'] = item['downs']
        details['ups'] = item['ups']
        details['points'] = item['points']
        details['score'] = item['score']
        details['timestamp'] = item['timestamp']
        details['views'] = item['views']
        details['favorite_count'] = item['favorite_count']
        details['hot_datetime'] = item['hot_datetime']
        details['nsfw'] = item['nsfw']
        details['platform'] = 'Not Detected' if item['platform'] is None else item['platform']
        details['virality'] = item['virality']
    except Exception as e:
        print(e)

    return details


def get_viral_posts_from(start_date: str, end_date: str, provide_details: bool) -> Iterator[dict]:
    """
    :param start_date: date in string
    :param end_date: date in string
    :param provide_details: boolean value to get more details of a post
    :return: generator yielding Imgur's viral posts of the specified period as dicts
    """
    convert = Convert(start_date, end_date)
    start, end = convert.to_days_ago()
    # Walk from the oldest day (start) to the most recent one (end).
    # day_count is the offset in days from start_date, so it must not be
    # reset on every iteration.
    for day_count, days_ago in enumerate(reversed(range(end, start + 1))):
        counter = 0
        r = HTMLSession().get(
            f"https://imgur.com/gallery/hot/viral/page/{days_ago}/hit?scrolled&set={counter}"
        )
        if r.html.find(".images-header-main"):
            print(
                "Grabbing "
                + " ".join(r.html.find(".images-header-main")[0].full_text.split())
            )
        # Request the next set of posts until the page reports there are no more.
        while not r.html.find("#nomore"):
            for entries in r.html.find(".post"):
                post_url = f"https://imgur.com{entries.find('.image-list-link')[0].attrs['href']}"
                post_info = entries.find(".post-info")[0].full_text.strip().split()
                less_details = {
                    "title": entries.find(".hover > p")[0].full_text,
                    "url": post_url,
                    "points": entries.find(".point-info-points > span")[0].full_text,
                    "tags": entries.find(".point-info")[0].attrs["data-gallery-tags"].rstrip(),
                    "type": post_info[0],
                    "views": post_info[2],
                    "date": convert.from_days_ago(day_count),
                }
                if provide_details:
                    more_details = get_more_details_of_post(post_url)
                    yield dict(more_details, **less_details)
                else:
                    yield less_details
            counter += 1
            r = HTMLSession().get(
                f"https://imgur.com/gallery/hot/viral/page/{days_ago}/hit?scrolled&set={counter}"
            )


def main():
    parser = argparse.ArgumentParser(
        prog="imgur-scraper",
        usage="\n$ imgur-scraper [COMMAND]",
        description="Retrieve Imgur's Viral Posts",
    )
    parser._optionals.title = "COMMAND"
    parser.add_argument("--version", action="version",
                        version="%(prog)s 0.1.14")
    parser.add_argument(
        "--date",
        action="store",
        dest="date",
        type=str,
        metavar="",
        help="date format YYYY-MM-DD (required)",
        required=True,
    )
    parser.add_argument(
        "--end_date",
        action="store",
        dest="end_date",
        type=str,
        metavar="",
        help="date format YYYY-MM-DD (optional, defaults to today in UTC)",
        default=str(datetime.datetime.utcnow()),
    )
    parser.add_argument(
        "--csv",
        action="store_true",
        dest="to_csv",
        help="flag to save the data in a csv file (defaults to False)",
    )
    parser.add_argument(
        "--path",
        action="store",
        default=".",
        metavar="",
        type=str,
        dest="path_to_save",
        help="path to save the csv file in "
             "(defaults to the current working directory)",
    )
    parser.add_argument(
        "--details",
        action="store_true",
        dest="provide_details",
        help="flag to fetch extra fields for each post (username, comment count, "
             "ups, downs, virality, etc.); slower, since each post's page is "
             "requested separately",
    )
    results = parser.parse_args()
    start_date = results.date
    end_date = results.end_date.split(" ")[0]
    path = results.path_to_save
    to_csv = results.to_csv
    provide_details = results.provide_details

    if to_csv:
        try:
            file_name = os.path.join(
                path, f"{start_date}_to_{end_date}_imgur_data.csv")
            with open(file_name, "x", newline="", encoding="utf-8") as csvfile:
                fieldnames = ["title", "url", "points",
                              "tags", "type", "views", "date"]
                if provide_details:
                    # Extra columns from get_more_details_of_post; `points` and
                    # `views` are already listed above, so they are not repeated.
                    fieldnames += ["username", "comment_count", "downs", "ups",
                                   "score", "timestamp", "favorite_count",
                                   "hot_datetime", "nsfw", "platform", "virality"]
                writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
                writer.writeheader()
                writer.writerows(get_viral_posts_from(
                    start_date, end_date, provide_details))
            print(f"CSV saved in {os.path.abspath(file_name)}")
        except FileExistsError as f:
            print(f)
        except ValueError as v:
            print(v)
    else:
        for post in get_viral_posts_from(start_date, end_date, provide_details):
            print(post)


if __name__ == "__main__":
    main()

--------------------------------------------------------------------------------
/imgur_scraper/utils.py:
--------------------------------------------------------------------------------
from datetime import datetime, date, timedelta

date_format = "%Y-%m-%d"


class Convert:
    """Converts the start and end dates (YYYY-MM-DD strings) into the number
    of days before the current UTC time, and maps day offsets back to dates.

    :param start_date: start date as a YYYY-MM-DD string
    :param end_date: end date as a YYYY-MM-DD string
    """

    def __init__(self, start_date: str, end_date: str):
        self.start_date = start_date
        self.end_date = end_date

    def _user_given_time(self):
        return (
            datetime.strptime(self.start_date, date_format),
            datetime.strptime(self.end_date, date_format),
        )

    def to_days_ago(self):
        """Return (start, end) as whole days before the current UTC time."""
        start_time, end_time = self._user_given_time()
        time_now = datetime.utcnow()
        # Dates in the future cannot be scraped.
        if time_now < start_time or time_now < end_time:
            raise ValueError("Invalid Date")
        start_days_ago = (time_now - start_time).days
        end_days_ago = (time_now - end_time).days
        # The start date must not be later than the end date.
        if start_days_ago < end_days_ago:
            raise ValueError("Invalid Date Range")
        return start_days_ago, end_days_ago

    def from_days_ago(self, days_ago: int):
        """Return the date `days_ago` days after start_date, as YYYY-MM-DD.

        Despite its name, the argument is an offset counted forward from
        start_date, not a number of days before today.
        """
        year, month, day = map(int, self.start_date.split("-"))
        date_ref = date(year, month, day) + timedelta(days=days_ago)
        return date_ref.strftime(date_format)

--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
requests==2.22.0
requests-html==0.10.0

--------------------------------------------------------------------------------
/setup.py:
--------------------------------------------------------------------------------
import os

from setuptools import setup


NAME = "imgur_scraper"
DESCRIPTION = "Scrape years of Imgur's data without any authentication."
URL = "https://github.com/saadmanrafat/imgur-scraper"
EMAIL = "saadmanhere@gmail.com"
AUTHOR = "Saadman Rafat"
REQUIRES_PYTHON = ">=3.6.0"
VERSION = "2.6.3"
REQUIRED = ["requests-html", "requests"]

this_directory = os.path.abspath(os.path.dirname(__file__))
with open(os.path.join(this_directory, "README.md"), encoding="utf-8") as f:
    long_description = f.read()

setup(
    name=NAME,
    long_description=long_description,
    long_description_content_type="text/markdown",
    version=VERSION,
    description=DESCRIPTION,
    author=AUTHOR,
    author_email=EMAIL,
    python_requires=REQUIRES_PYTHON,
    url=URL,
    packages=["imgur_scraper"],
    entry_points={
        "console_scripts": ["imgur-scraper=imgur_scraper.imgur_scraper:main"]
    },
    install_requires=REQUIRED,
    include_package_data=True,
    license="MIT",
    classifiers=[
        "License :: OSI Approved :: MIT License",
        "Programming Language :: Python",
        "Programming Language :: Python :: 3",
        "Programming Language :: Python :: 3.6",
        "Programming Language :: Python :: 3.7",
        "Programming Language :: Python :: 3.8",
        "Programming Language :: Python :: Implementation :: CPython",
        "Programming Language :: Python :: Implementation :: PyPy"
    ],
)

--------------------------------------------------------------------------------
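`Convert` turns the requested dates into the page numbers that `get_viral_posts_from` iterates over (`/gallery/hot/viral/page/{days_ago}/hit`), walking from the oldest day forward and mapping each step back to a calendar date. A sketch of that arithmetic with a fixed clock so the result is reproducible; `days_ago`, `date_from_offset`, and the injected `now` are illustrative names, not the package's API:

```python
from datetime import datetime, timedelta

DATE_FORMAT = "%Y-%m-%d"


def days_ago(date_string: str, now: datetime) -> int:
    """Whole days between a YYYY-MM-DD date (at midnight) and `now`."""
    return (now - datetime.strptime(date_string, DATE_FORMAT)).days


def date_from_offset(start_date: str, offset: int) -> str:
    """The date `offset` days after `start_date`, formatted as YYYY-MM-DD."""
    start = datetime.strptime(start_date, DATE_FORMAT).date()
    return (start + timedelta(days=offset)).strftime(DATE_FORMAT)


# Fixed "current" time so the example is reproducible (an assumption;
# the package itself uses datetime.utcnow()).
now = datetime(2020, 1, 10, 15, 30)
start, end = days_ago("2020-01-05", now), days_ago("2020-01-07", now)

# Oldest page first; the enumerate index is the offset from the start date.
pages = [(page, date_from_offset("2020-01-05", i))
         for i, page in enumerate(reversed(range(end, start + 1)))]
print(pages)
# [(5, '2020-01-05'), (4, '2020-01-06'), (3, '2020-01-07')]
```

Counting the offset with `enumerate` keeps it in step with the page loop, so every post is tagged with the date of the page it came from.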
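For each day, `get_viral_posts_from` requests `...hit?scrolled&set={counter}` with an increasing `counter` until the response contains a `#nomore` element. The control flow can be isolated from HTTP with an injected fetcher. `paginate` and `fetch_set` are hypothetical names, and returning `None` stands in for the `#nomore` page:

```python
from typing import Callable, Iterator, List, Optional


def paginate(fetch_set: Callable[[int], Optional[List[str]]]) -> Iterator[str]:
    """Yield posts set by set; stop when fetch_set returns None ('#nomore')."""
    counter = 0
    page = fetch_set(counter)
    while page is not None:
        yield from page
        counter += 1
        page = fetch_set(counter)


# Fake server: two sets of posts, then "no more" (dict.get returns None).
fake_sets = {0: ["post-a", "post-b"], 1: ["post-c"]}
print(list(paginate(fake_sets.get)))
# ['post-a', 'post-b', 'post-c']
```

Because the function is a generator, callers such as a CSV writer can consume posts as they arrive instead of waiting for the whole date range.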
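`get_more_details_of_post` pulls its extra fields out of a JSON object that the post page embeds in a script tag, located with the regex `item: ({.+} )`. The parsing step can be exercised without the network. The sample string below is invented for illustration, and the real Imgur markup may differ or change; `extract_item` is a hypothetical helper:

```python
import json
import re

# Same pattern as get_more_details_of_post: capture the object after "item: ".
ITEM_PATTERN = re.compile(r"item: ({.+} )")


def extract_item(script_text: str) -> dict:
    """Parse the `item` object embedded in a script tag's text, or return {}."""
    match = ITEM_PATTERN.search(script_text)
    if match is None:
        return {}
    try:
        # json.loads tolerates the trailing space captured by the pattern.
        return json.loads(match.group(1))
    except json.JSONDecodeError:
        return {}


# Invented sample text; not captured from a real Imgur page.
sample = 'var config = { item: {"account_url": "someone", "virality": 42, "platform": null} };'
item = extract_item(sample)
print(item["account_url"], item["platform"])
# someone None
```

JSON `null` becomes Python `None`, which is why the scraper maps a missing `platform` to `'Not Detected'`.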
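In `main`, the post dictionaries are streamed into `csv.DictWriter`. Two of its behaviors matter here: every key in a row must appear in `fieldnames` (by default it raises `ValueError` for extra keys), and missing keys are filled with `restval`, which covers posts whose details request failed and returned an empty dict. A small sketch against an in-memory buffer; `write_posts`, `BASIC_FIELDS`, and `DETAIL_FIELDS` are hypothetical names mirroring the keys the scraper produces:

```python
import csv
import io

BASIC_FIELDS = ["title", "url", "points", "tags", "type", "views", "date"]
DETAIL_FIELDS = ["username", "comment_count", "downs", "ups", "score", "timestamp",
                 "favorite_count", "hot_datetime", "nsfw", "platform", "virality"]


def write_posts(posts, out, provide_details: bool) -> None:
    """Write post dicts as CSV; missing keys become empty cells."""
    fieldnames = BASIC_FIELDS + (DETAIL_FIELDS if provide_details else [])
    writer = csv.DictWriter(out, fieldnames=fieldnames, restval="")
    writer.writeheader()
    writer.writerows(posts)


buffer = io.StringIO()
posts = [
    {"title": "a", "url": "u1", "points": "10", "tags": "x", "type": "image",
     "views": "5", "date": "2020-01-05", "username": "someone", "virality": 1},
    # Details could not be fetched for this post, so only basic fields exist.
    {"title": "b", "url": "u2", "points": "3", "tags": "y", "type": "image",
     "views": "2", "date": "2020-01-05"},
]
write_posts(posts, buffer, provide_details=True)
rows = buffer.getvalue().splitlines()
print(rows[2])
```

The second row ends in eleven empty cells rather than raising, so one failed details request does not abort the whole export.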