├── .github
    ├── FUNDING.yml
    └── ISSUE_TEMPLATE
    │   ├── bug_report.md
    │   └── feature_request.md
├── .gitignore
├── CODE_OF_CONDUCT.md
├── LICENSE
├── README.md
├── imgur_scraper
    ├── __init__.py
    ├── imgur_scraper.py
    └── utils.py
├── requirements.txt
└── setup.py


/.github/FUNDING.yml:
--------------------------------------------------------------------------------
1 | 
2 | 


--------------------------------------------------------------------------------
/.github/ISSUE_TEMPLATE/bug_report.md:
--------------------------------------------------------------------------------
 1 | ---
 2 | name: Bug report
 3 | about: Create a report to help us improve
 4 | title: ''
 5 | labels: ''
 6 | assignees: ''
 7 | 
 8 | ---
 9 | 
10 | **Describe the bug**
11 | A clear and concise description of what the bug is.
12 | 
13 | **To Reproduce**
14 | Steps to reproduce the behavior:
15 | 1. Go to '...'
16 | 2. Click on '....'
17 | 3. Scroll down to '....'
18 | 4. See error
19 | 
20 | **Expected behavior**
21 | A clear and concise description of what you expected to happen.
22 | 
23 | **Screenshots**
24 | If applicable, add screenshots to help explain your problem.
25 | 
26 | **Desktop (please complete the following information):**
27 |  - OS: [e.g. iOS]
28 |  - Browser [e.g. chrome, safari]
29 |  - Version [e.g. 22]
30 | 
31 | **Smartphone (please complete the following information):**
32 |  - Device: [e.g. iPhone6]
33 |  - OS: [e.g. iOS8.1]
34 |  - Browser [e.g. stock browser, safari]
35 |  - Version [e.g. 22]
36 | 
37 | **Additional context**
38 | Add any other context about the problem here.
39 | 


--------------------------------------------------------------------------------
/.github/ISSUE_TEMPLATE/feature_request.md:
--------------------------------------------------------------------------------
 1 | ---
 2 | name: Feature request
 3 | about: Suggest an idea for this project
 4 | title: ''
 5 | labels: ''
 6 | assignees: ''
 7 | 
 8 | ---
 9 | 
10 | **Is your feature request related to a problem? Please describe.**
11 | A clear and concise description of what the problem is. Ex. I'm always frustrated when [...]
12 | 
13 | **Describe the solution you'd like**
14 | A clear and concise description of what you want to happen.
15 | 
16 | **Describe alternatives you've considered**
17 | A clear and concise description of any alternative solutions or features you've considered.
18 | 
19 | **Additional context**
20 | Add any other context or screenshots about the feature request here.
21 | 


--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
1 | env
2 | __pycache__
3 | .idea/
4 | build/
5 | dist/
6 | imgur_scraper.egg-info/


--------------------------------------------------------------------------------
/CODE_OF_CONDUCT.md:
--------------------------------------------------------------------------------
 1 | # Contributor Covenant Code of Conduct
 2 | 
 3 | ## Our Pledge
 4 | 
 5 | In the interest of fostering an open and welcoming environment, we as
 6 | contributors and maintainers pledge to making participation in our project and
 7 | our community a harassment-free experience for everyone, regardless of age, body
 8 | size, disability, ethnicity, sex characteristics, gender identity and expression,
 9 | level of experience, education, socio-economic status, nationality, personal
10 | appearance, race, religion, or sexual identity and orientation.
11 | 
12 | ## Our Standards
13 | 
14 | Examples of behavior that contributes to creating a positive environment
15 | include:
16 | 
17 | * Using welcoming and inclusive language
18 | * Being respectful of differing viewpoints and experiences
19 | * Gracefully accepting constructive criticism
20 | * Focusing on what is best for the community
21 | * Showing empathy towards other community members
22 | 
23 | Examples of unacceptable behavior by participants include:
24 | 
25 | * The use of sexualized language or imagery and unwelcome sexual attention or
26 |  advances
27 | * Trolling, insulting/derogatory comments, and personal or political attacks
28 | * Public or private harassment
29 | * Publishing others' private information, such as a physical or electronic
30 |  address, without explicit permission
31 | * Other conduct which could reasonably be considered inappropriate in a
32 |  professional setting
33 | 
34 | ## Our Responsibilities
35 | 
36 | Project maintainers are responsible for clarifying the standards of acceptable
37 | behavior and are expected to take appropriate and fair corrective action in
38 | response to any instances of unacceptable behavior.
39 | 
40 | Project maintainers have the right and responsibility to remove, edit, or
41 | reject comments, commits, code, wiki edits, issues, and other contributions
42 | that are not aligned to this Code of Conduct, or to ban temporarily or
43 | permanently any contributor for other behaviors that they deem inappropriate,
44 | threatening, offensive, or harmful.
45 | 
46 | ## Scope
47 | 
48 | This Code of Conduct applies both within project spaces and in public spaces
49 | when an individual is representing the project or its community. Examples of
50 | representing a project or community include using an official project e-mail
51 | address, posting via an official social media account, or acting as an appointed
52 | representative at an online or offline event. Representation of a project may be
53 | further defined and clarified by project maintainers.
54 | 
55 | ## Enforcement
56 | 
57 | Instances of abusive, harassing, or otherwise unacceptable behavior may be
58 | reported by contacting the project team at saadman@outlook.com. All
59 | complaints will be reviewed and investigated and will result in a response that
60 | is deemed necessary and appropriate to the circumstances. The project team is
61 | obligated to maintain confidentiality with regard to the reporter of an incident.
62 | Further details of specific enforcement policies may be posted separately.
63 | 
64 | Project maintainers who do not follow or enforce the Code of Conduct in good
65 | faith may face temporary or permanent repercussions as determined by other
66 | members of the project's leadership.
67 | 
68 | ## Attribution
69 | 
70 | This Code of Conduct is adapted from the [Contributor Covenant][homepage], version 1.4,
71 | available at https://www.contributor-covenant.org/version/1/4/code-of-conduct.html
72 | 
73 | [homepage]: https://www.contributor-covenant.org
74 | 
75 | For answers to common questions about this code of conduct, see
76 | https://www.contributor-covenant.org/faq
77 | 


--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
 1 | MIT License
 2 | 
 3 | Copyright (c) 2018 saadmanrafat
 4 | 
 5 | Permission is hereby granted, free of charge, to any person obtaining a copy
 6 | of this software and associated documentation files (the "Software"), to deal
 7 | in the Software without restriction, including without limitation the rights
 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
 9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 | 
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 | 
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 | 


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
 1 | # Imgur Scraper
 2 | ![PyPI - Python Version](https://img.shields.io/pypi/pyversions/imgur-scraper) [![Downloads](https://pepy.tech/badge/imgur-scraper)](https://pepy.tech/project/imgur-scraper) ![PyPI - License](https://img.shields.io/pypi/l/imgur-scraper)
 3 | 
 4 | Retrieve years of imgur.com's data. No authentication required. Implemented using their frontend API.
 5 | 
 6 | # Usage
 7 | ![alt text](https://i.imgur.com/JsLWD8e.gif)
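
For readers who cannot view the GIF, here is a minimal sketch of how a command line could be assembled from the flags defined in the argument parser in `imgur_scraper/imgur_scraper.py` (`--date`, `--end_date`, `--csv`, `--path`, `--details`). The helper `build_command` is hypothetical and not part of the package; it only builds the argument list and does not run anything.

```python
from typing import List, Optional


def build_command(date: str, end_date: Optional[str] = None, csv: bool = False,
                  path: Optional[str] = None, details: bool = False) -> List[str]:
    """Assemble an imgur-scraper invocation (hypothetical helper)."""
    command = ["imgur-scraper", "--date", date]  # --date is the only required flag
    if end_date:
        command += ["--end_date", end_date]  # omitted: the parser uses today's UTC date
    if csv:
        command.append("--csv")  # save to a CSV file instead of printing each post
    if path:
        command += ["--path", path]  # directory for the CSV file; the parser defaults to "."
    if details:
        command.append("--details")  # also fetch username, virality, etc. for each post
    return command


command = build_command("2019-01-01", end_date="2019-01-03", csv=True, details=True)
print(" ".join(command))
# imgur-scraper --date 2019-01-01 --end_date 2019-01-03 --csv --details

# To actually run it (requires the installed package and network access):
# import subprocess; subprocess.run(command, check=True)
```

With `--csv`, the parser code names the output file `{date}_to_{end_date}_imgur_data.csv` inside the directory given by `--path`.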
 8 | 
 9 | # Features
10 | 
11 | Returns close to 500 data points for each date.
12 | 
13 | ```javascript
14 | {
15 |   'title': 'I said no, my fiancé said yes. Meet Zeta', 
16 |   'url': 'https://imgur.com/gallery/H5Xw4dh', 
17 |   'points': '5,996', 
18 |   'tags': 'aww,kitten,kitty', 
19 |   'type': 'image', 
 20 |   'views': '4,363',
21 |   'date': '2015-05-06'
22 | }
23 | ```
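
The numeric fields in the record above are strings with thousands separators, and the tags are a single comma-separated string. A minimal sketch of converting such a record into typed values follows, assuming every record looks like the sample; `normalize_post` is a hypothetical helper, not something the package provides.

```python
from datetime import datetime


def normalize_post(post: dict) -> dict:
    """Return a copy of a scraped post with typed fields (hypothetical helper)."""
    clean = dict(post)
    # "5,996" -> 5996: drop the thousands separators before converting.
    for key in ("points", "views"):
        clean[key] = int(post[key].replace(",", ""))
    # "aww,kitten,kitty" -> ["aww", "kitten", "kitty"]; an empty string gives [].
    clean["tags"] = [tag for tag in post["tags"].split(",") if tag]
    # Same YYYY-MM-DD format the scraper uses for dates.
    clean["date"] = datetime.strptime(post["date"], "%Y-%m-%d").date()
    return clean


sample = {
    "title": "I said no, my fiancé said yes. Meet Zeta",
    "url": "https://imgur.com/gallery/H5Xw4dh",
    "points": "5,996",
    "tags": "aww,kitten,kitty",
    "type": "image",
    "views": "4,363",
    "date": "2015-05-06",
}
post = normalize_post(sample)
print(post["points"], post["views"], post["tags"], post["date"])
# 5996 4363 ['aww', 'kitten', 'kitty'] 2015-05-06
```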
24 | ### New Features
25 | ```
26 | - Username
27 | - Comment_Count
28 | - Downs 
29 | - Ups 
30 | - Points
31 | - Score
32 | - Timestamp
33 | - Views
34 | - Favorite_Count
35 | - Hot_datetime
36 | - NSFW
37 | - Platform
38 | - Virality
39 | ```
 40 | More attributes will be added soon. Suggestions and [feature requests](https://github.com/saadmanrafat/imgur-scraper/issues) are welcome.
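
When `--details` is set, the scraper merges the attributes listed above into the basic record with `dict(more_details, **less_details)`, so the values from the gallery listing win for the keys both share (`points` and `views`). The sketch below mirrors that merge and derives a CSV header without repeated columns. `merge_post` and `csv_fieldnames` are hypothetical names used for illustration only.

```python
BASE_FIELDS = ["title", "url", "points", "tags", "type", "views", "date"]
DETAIL_FIELDS = [
    "username", "comment_count", "downs", "ups", "points", "score", "timestamp",
    "views", "favorite_count", "hot_datetime", "nsfw", "platform", "virality",
]


def merge_post(less_details: dict, more_details: dict) -> dict:
    """Combine both records; listing values override detail values on shared keys."""
    return {**more_details, **less_details}


def csv_fieldnames(provide_details: bool) -> list:
    """Column names for the CSV file, with each name appearing once."""
    names = list(BASE_FIELDS)
    if provide_details:
        names += [name for name in DETAIL_FIELDS if name not in BASE_FIELDS]
    return names


# Made-up values, only to show which side wins on a shared key.
listing = {"title": "example", "points": "5,996", "views": "4,363"}
details = {"username": "someone", "points": 5996, "views": 4363}
merged = merge_post(listing, details)
print(merged["points"], merged["username"])
print(len(csv_fieldnames(True)))
```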
41 | 
42 | # Installation
43 | ```
44 | $ pip install imgur-scraper
45 | ```
46 | 
47 | # Support The Project
48 | 
49 | ```markdown
50 | bitcoin: bc1q44nlg0rvp2w4vf50cf40kgg2cvtgyhz7mlvhm0cnlqjg7cd5dh9szsaw8p
51 | ```
52 | 
53 | 


--------------------------------------------------------------------------------
/imgur_scraper/__init__.py:
--------------------------------------------------------------------------------
1 | from .imgur_scraper import main
2 | 


--------------------------------------------------------------------------------
/imgur_scraper/imgur_scraper.py:
--------------------------------------------------------------------------------
  1 | import argparse
  2 | import csv
  3 | import datetime
  4 | import json
  5 | import os
  6 | import re
  7 | 
  8 | from requests_html import HTMLSession
  9 | 
 10 | from .utils import Convert
 11 | 
 12 | 
 13 | def get_more_details_of_post(post_url: str) -> dict:
 14 |     """
 15 |     :param post_url: the url of an imgur post
 16 |     :return: Details like Virality-score, username etc in JSON format
 17 |     """
 18 |     details = {}
 19 |     try:
 20 |         request = HTMLSession().get(post_url)
 21 | 
 22 |         # Sometimes the request comes back incomplete, so retry once.
 23 |         if len(request.html.find('script')) <= 18:
 24 |             request = HTMLSession().get(post_url)
 25 |         # Give up if the script tag that holds `item` is still missing.
 26 |         if len(request.html.find('script')) <= 18:
 27 |             return details
 28 | 
 29 |         regex = 'item: ({.+} )'  # regex to isolate the `item` dict.
 30 |         # The script tag at index 18 has the `item` dict. This is tested on more than 1500 links.
 31 |         matched = re.search(regex, request.html.find(
 32 |             'script')[18].text).group(0)
 33 |         item = json.loads(matched[5:])
 34 | 
 35 |         details['username'] = item['account_url']
 36 |         details['comment_count'] = item['comment_count']
 37 |         details['downs'] = item['downs']
 38 |         details['ups'] = item['ups']
 39 |         details['points'] = item['points']
 40 |         details['score'] = item['score']
 41 |         details['timestamp'] = item['timestamp']
 42 |         details['views'] = item['views']
 43 |         details['favorite_count'] = item['favorite_count']
 44 |         details['hot_datetime'] = item['hot_datetime']
 45 |         details['nsfw'] = item['nsfw']
 46 |         details['platform'] = 'Not Detected' if item['platform'] is None else item['platform']
 47 |         details['virality'] = item['virality']
 48 |     except Exception as e:
 49 |         print(e)
 50 | 
 51 |     return details
 52 | 
 53 | 
 54 | def get_viral_posts_from(start_date: str, end_date: str, provide_details: bool):
 55 |     """
 56 |     :param start_date: start date as a YYYY-MM-DD string
 57 |     :param end_date: end date as a YYYY-MM-DD string
 58 |     :param provide_details: boolean value to get more details of a post
 59 |     :return: generator yielding one dict per viral Imgur post in the specified period
 60 |     """
 61 |     convert = Convert(start_date, end_date)
 62 |     start, end = convert.to_days_ago()
 63 |     day_count = 0  # days elapsed since start_date; must persist across loop iterations
 64 |     for days_ago in reversed(range(end, start + 1)):
 65 |         counter = 0
 66 |         r = HTMLSession().get(
 67 |             f"https://imgur.com/gallery/hot/viral/page/{days_ago}/hit?scrolled&set={counter}"
 68 |         )
 69 |         if r.html.find(".images-header-main"):
 70 |             print(
 71 |                 "Grabbing "
 72 |                 + " ".join(r.html.find(".images-header-main")
 73 |                            [0].full_text.split())
 74 |             )
 75 |         while not r.html.find("#nomore"):
 76 |             for entries in r.html.find(".post"):
 77 |                 less_details = {
 78 |                     "title": entries.find(".hover > p")[0].full_text,
 79 |                     "url": f"https://imgur.com{entries.find('.image-list-link')[0].attrs['href']}",
 80 |                     "points": entries.find(".point-info-points > span")[0].full_text,
 81 |                     "tags": entries.find(".point-info")[0].attrs["data-gallery-tags"].rstrip(),
 82 |                     "type": entries.find(".post-info")[0].full_text.strip().split()[0],
 83 |                     "views": entries.find(".post-info")[0].full_text.strip().split()[2],
 84 |                     "date": convert.from_days_ago(day_count),
 85 |                 }
 86 |                 if provide_details:
 87 |                     more_details = get_more_details_of_post(
 88 |                         f"https://imgur.com{entries.find('.image-list-link')[0].attrs['href']}")
 89 |                     yield dict(more_details, **less_details)
 90 |                 else:
 91 |                     yield less_details
 92 |             counter += 1
 93 |             r = HTMLSession().get(
 94 |                 f"https://imgur.com/gallery/hot/viral/page/{days_ago}/hit?scrolled&set={counter}"
 95 |             )
 96 |         day_count += 1
 97 | 
 98 | 
 99 | def main():
100 |     parser = argparse.ArgumentParser(
101 |         prog="imgur-scraper",
102 |         usage="\n$ imgur-scraper [COMMAND]",
103 |         description="Retrieve Imgur's Viral Posts",
104 |     )
105 |     parser._optionals.title = "COMMAND"
106 |     parser.add_argument("--version", action="version",
107 |                         version="%(prog)s 0.1.14")
108 |     parser.add_argument(
109 |         "--date",
110 |         action="store",
111 |         dest="date",
112 |         type=str,
113 |         metavar="",
114 |         help="date format YYYY-MM-DD (required)",
115 |         required=True,
116 |     )
117 |     parser.add_argument(
118 |         "--end_date",
119 |         action="store",
120 |         dest="end_date",
121 |         type=str,
122 |         metavar="",
123 |         help="date format YYYY-MM-DD (optional)",
124 |         default=str(datetime.datetime.utcnow()),
125 |     )
126 |     parser.add_argument(
127 |         "--csv",
128 |         action="store_true",
129 |         dest="to_csv",
130 |         help="flag to save the data in a csv file (defaults to False)",
131 |     )
132 |     parser.add_argument(
133 |         "--path",
134 |         action="store",
135 |         default=".",
136 |         metavar="",
137 |         type=str,
138 |         dest="path_to_save",
139 |         help="path to save the csv file in \
140 |             (defaults to the current working directory)",
141 |     )
142 |     parser.add_argument(
143 |         "--details",
144 |         action="store_true",
145 |         dest="provide_details",
146 |         help="flag to get more details about a post, like: username, virality etc"
147 |         # TODO: describe better
148 |     )
149 |     results = parser.parse_args()
150 |     start_date = results.date
151 |     end_date = results.end_date.split(" ")[0]
152 |     path = results.path_to_save
153 |     to_csv = results.to_csv
154 |     provide_details = results.provide_details
155 | 
156 |     if to_csv:
157 |         try:
158 |             file_name = os.path.join(
159 |                 path, f"{start_date}_to_{end_date}_imgur_data.csv")
160 |             with open(file_name, "x", newline="", encoding="utf-8") as csvfile:
161 |                 if provide_details:
162 |                     fieldnames = ["title", "url", "points", "tags", "type", "views", "date", "username", "comment_count", "downs",
163 |                                   "ups", "score", "timestamp", "favorite_count", "hot_datetime", "nsfw", "platform", "virality"]
164 |                 else:
165 |                     fieldnames = ["title", "url", "points",
166 |                                   "tags", "type", "views", "date"]
167 |                 writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
168 |                 writer.writeheader()
169 |                 writer.writerows(get_viral_posts_from(
170 |                     start_date, end_date, provide_details))
171 |             print(f"CSV saved in {os.path.abspath(file_name)}")
172 |         except FileExistsError as f:
173 |             print(f)
174 |         except ValueError as v:
175 |             print(v)
176 |     else:
177 |         for post in get_viral_posts_from(start_date, end_date, provide_details):
178 |             print(post)
179 | 
180 | 
181 | if __name__ == "__main__":
182 |     main()
183 | 


--------------------------------------------------------------------------------
/imgur_scraper/utils.py:
--------------------------------------------------------------------------------
 1 | from datetime import datetime, date, timedelta
 2 | 
 3 | date_format = "%Y-%m-%d"
 4 | 
 5 | 
 6 | class Convert:
 7 |     """Subtracts the given time from the current UTC time
 8 |     and returns the number of days.
 9 | 
 10 |     :param start_date: start date as a YYYY-MM-DD string
 11 |     :param end_date: end date as a YYYY-MM-DD string
12 |     """
13 | 
14 |     def __init__(self, start_date: str, end_date: str):
15 |         self.start_date = start_date
16 |         self.end_date = end_date
17 | 
18 |     def _user_given_time(self):
19 |         return (
20 |             datetime.strptime(self.start_date, date_format),
21 |             datetime.strptime(self.end_date, date_format),
22 |         )
23 | 
24 |     def to_days_ago(self):
25 |         start_time, end_time = self._user_given_time()
26 |         time_now = datetime.utcnow()
27 |         if time_now < start_time or time_now < end_time:
28 |             raise ValueError("Invalid Date")
29 |         start_time = (datetime.utcnow() - start_time).days
30 |         end_time = (datetime.utcnow() - end_time).days
31 |         if start_time < end_time:
32 |             raise ValueError("Invalid Date Range")
33 |         return start_time, end_time
34 | 
35 |     def from_days_ago(self, days_ago: int):
 36 |         date_components = [int(part) for part in self.start_date.split("-")]
37 |         date_ref = date(
38 |             date_components[0], date_components[1], date_components[2]
39 |         ) + timedelta(days=float(days_ago))
40 |         return date_ref.strftime(date_format)
41 | 


--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
1 | requests==2.22.0
2 | requests-html==0.10.0
3 | 


--------------------------------------------------------------------------------
/setup.py:
--------------------------------------------------------------------------------
 1 | import os
 2 | 
 3 | from setuptools import setup
 4 | 
 5 | 
 6 | NAME = "imgur_scraper"
 7 | DESCRIPTION = "Scrape years of Imgur's data without any authentication."
 8 | URL = "https://github.com/saadmanrafat/imgur-scraper"
 9 | EMAIL = "saadmanhere@gmail.com"
10 | AUTHOR = "Saadman Rafat"
11 | REQUIRES_PYTHON = ">=3.6.0"
12 | VERSION = "2.6.3"
13 | REQUIRED = ["requests-html", "requests"]
14 | 
15 | this_directory = os.path.abspath(os.path.dirname(__file__))
16 | with open(os.path.join(this_directory, "README.md"), encoding="utf-8") as f:
17 |     long_description = f.read()
18 | 
19 | setup(
20 |     name=NAME,
21 |     long_description=long_description,
22 |     long_description_content_type="text/markdown",
23 |     version=VERSION,
24 |     description=DESCRIPTION,
25 |     author=AUTHOR,
26 |     author_email=EMAIL,
27 |     python_requires=REQUIRES_PYTHON,
28 |     url=URL,
29 |     packages=["imgur_scraper"],
30 |     entry_points={
31 |         "console_scripts": ["imgur-scraper=imgur_scraper.imgur_scraper:main"]
32 |     },
33 |     install_requires=REQUIRED,
34 |     include_package_data=True,
35 |     license="MIT",
36 |     classifiers=[
37 |         "License :: OSI Approved :: MIT License",
38 |         "Programming Language :: Python",
39 |         "Programming Language :: Python :: 3",
40 |         "Programming Language :: Python :: 3.6",
41 |         "Programming Language :: Python :: 3.7",
42 |         "Programming Language :: Python :: 3.8",
43 |         "Programming Language :: Python :: Implementation :: CPython",
44 |         "Programming Language :: Python :: Implementation :: PyPy"
45 |     ],
46 | )
47 | 


--------------------------------------------------------------------------------