├── .github
│   └── ISSUE_TEMPLATE
│       ├── bug_report.md
│       └── feature_request.md
├── .gitignore
├── README.md
├── api.md
├── kemono-dl.py
├── requirements.txt
└── src
    ├── __init__.py
    ├── args.py
    ├── helper.py
    ├── logger.py
    ├── main.py
    └── version.py

/.github/ISSUE_TEMPLATE/bug_report.md:
--------------------------------------------------------------------------------
1 | ---
2 | name: Bug report
3 | about: Create a bug report
4 | labels: bug
5 | 
6 | ---
7 | 
8 | 
9 | 
10 | 
11 | 
12 | 
13 | ### Version
14 | 
15 | Version:
16 | 
17 | ### Your Command
18 | 
19 | ```bash
20 | 
21 | Please replace with the command used.
22 | 
23 | ```
24 | 
25 | ### Description of bug
26 | 
27 | 
28 | 
29 | ### How To Reproduce
30 | 
31 | 
32 | 
33 | ### Error messages and tracebacks
34 | 
35 | ```python
36 | 
37 | Please replace with errors or tracebacks.
38 | 
39 | ```
40 | 
41 | ### Additional comments
42 | 
43 | 
44 | 
45 | 
46 | 
--------------------------------------------------------------------------------
/.github/ISSUE_TEMPLATE/feature_request.md:
--------------------------------------------------------------------------------
1 | ---
2 | name: Feature Request
3 | about: Suggest a feature
4 | labels: "feature request"
5 | 
6 | ---
7 | ### Description
8 | 
9 | 
10 | 
11 | ### Service, User ID, Post ID
12 | 
13 | 
14 | - Site:
15 | - Service:
16 | - User ID:
17 | - Post ID:
18 | 
19 | ### Additional comments
20 | 
21 | 
--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
1 | venv/
2 | Downloads/
3 | yt_dlp_temp/
4 | *cookies.txt
5 | *.pyc
6 | *.log
7 | *.bat
8 | links.txt
9 | test.py
10 | archive.txt
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # kemono-dl
2 | A downloader tool for kemono.party and coomer.party.
3 | 
4 | ## How to use
5 | 1. Install Python 3. (Disable the path length limit during install.)
6 | 2. Download the source code for the [latest release](https://github.com/AplhaSlayer1964/kemono-dl/releases/latest) and extract it.
7 | 3. Install the requirements with `pip install -r requirements.txt`
8 |    - If the command doesn't run, try adding `python -m`, `python3 -m`, or `py -m` to the front.
9 | 4. Get a cookie.txt file from kemono.party/coomer.party
10 |    - You can get a cookie text file on Chrome with [this extension](https://chrome.google.com/webstore/detail/get-cookiestxt/bgaddhkoddajcdgocldbbfleckgcbcid?hl=en).
11 |    - A cookie.txt file is required to use the downloader!
12 | 5. Run `python kemono-dl.py --cookies "cookie.txt" --links https://kemono.party/SERVICE/user/USERID`
13 |    - If the script doesn't run, try replacing `python` with `python3` or `py`.
14 | 
15 | # Command Line Options
16 | 
17 | ## Required!
18 | 
19 | `--cookies FILE`
20 | Takes a cookie file or a comma-separated list of cookie files. Used to get around the DDoS protection. Your cookie file must have been exported while logged in to use the favorite options.
21 | 
22 | ## What posts to download
23 | 
24 | `--links LINKS`
25 | Takes a URL or a comma-separated list of URLs.
26 | `--from-file FILE`
27 | Reads in a file with URLs separated by new lines. Lines starting with # will not be read in.
28 | `--kemono-fav-users SERVICE`
29 | Downloads favorite users from kemono.party of the specified type or types separated by a comma. Types include: all, patreon, fanbox, gumroad, subscribestar, dlsite, fantia. Your cookie file must have been exported while logged in for this to work.
30 | `--coomer-fav-users SERVICE`
31 | Downloads favorite users from coomer.party of the specified type or types separated by a comma. Types include: all, onlyfans. Your cookie file must have been exported while logged in for this to work.
32 | `--kemono-fav-posts`
33 | Downloads favorite posts from kemono.party. Your cookie file must have been exported while logged in for this to work.
34 | `--coomer-fav-posts`
35 | Downloads favorite posts from coomer.party. Your cookie file must have been exported while logged in for this to work.
36 | 
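For example, a typical command combining these selection options might look like the following (the cookie file name, user ID, and post ID are placeholders):

```bash
python kemono-dl.py --cookies "cookie.txt" \
  --links "https://kemono.party/patreon/user/12345,https://kemono.party/patreon/user/12345/post/67890" \
  --kemono-fav-posts
```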
37 | ## What files to download
38 | 
39 | `--inline`
40 | Download the inline images from the post content.
41 | `--content`
42 | Write the post content to an HTML file. The HTML file includes comments if `--comments` is passed.
43 | `--comments`
44 | Write the post comments to an HTML file.
45 | `--json`
46 | Write the post json to a file.
47 | `--extract-links`
48 | Write extracted links from post content to a text file.
49 | `--dms`
50 | Write user DMs to an HTML file. Only works when a user URL is passed.
51 | `--icon`
52 | Download the user's profile icon. Only works when a user URL is passed.
53 | `--banner`
54 | Download the user's profile banner. Only works when a user URL is passed.
55 | `--yt-dlp` (UNDER CONSTRUCTION)
56 | Try to download the post embed with yt-dlp.
57 | `--skip-attachments`
58 | Do not download post attachments.
59 | `--overwrite`
60 | Overwrite any previously created files.
61 | 
62 | ## Output
63 | 
64 | `--dirname-pattern PATTERN`
65 | Set the file path pattern for where files are downloaded. See [Output Patterns](https://github.com/AplhaSlayer1964/kemono-dl#output-patterns=) for more detail.
66 | `--filename-pattern PATTERN`
67 | Set the file name pattern for attachments. See [Output Patterns](https://github.com/AplhaSlayer1964/kemono-dl#output-patterns=) for more detail.
68 | `--inline-filename-pattern PATTERN`
69 | Set the file name pattern for inline images. See [Output Patterns](https://github.com/AplhaSlayer1964/kemono-dl#output-patterns=) for more detail.
70 | `--other-filename-pattern PATTERN`
71 | Set the file name pattern for post content, extracted links, and json. See [Output Patterns](https://github.com/AplhaSlayer1964/kemono-dl#output-patterns=) for more detail.
72 | `--user-filename-pattern PATTERN`
73 | Set the file name pattern for icon, banner, and dms. See [Output Patterns](https://github.com/AplhaSlayer1964/kemono-dl#output-patterns=) for more detail.
74 | `--date-strf-pattern PATTERN`
75 | Set the date strf pattern variable. See [Output Patterns](https://github.com/AplhaSlayer1964/kemono-dl#output-patterns=) for more detail.
76 | `--restrict-names`
77 | Restrict all file and folder names to the ASCII character set.
78 | 
79 | ## Download Filters
80 | 
81 | `--archive FILE`
82 | Only download posts that are not recorded in the archive file.
83 | `--date YYYYMMDD`
84 | Only download posts published on this date.
85 | `--datebefore YYYYMMDD`
86 | Only download posts published before this date.
87 | `--dateafter YYYYMMDD`
88 | Only download posts published after this date.
89 | `--user-updated-datebefore YYYYMMDD`
90 | Only download user posts if the user was updated before this date.
91 | `--user-updated-dateafter YYYYMMDD`
92 | Only download user posts if the user was updated after this date.
93 | `--min-filesize SIZE`
94 | Only download attachments or inline images larger than this file size. (ex #gb | #mb | #kb | #b)
95 | `--max-filesize SIZE`
96 | Only download attachments or inline images smaller than this file size. (ex #gb | #mb | #kb | #b)
97 | `--only-filetypes EXT`
98 | Only download attachments or inline images with the given file type(s). Takes a file extension or a comma-separated list of file extensions. (ex mp4,jpg,gif,zip)
99 | `--skip-filetypes EXT`
100 | Only download attachments or inline images without the given file type(s). Takes a file extension or a comma-separated list of file extensions. (ex mp4,jpg,gif,zip)
101 | 
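For instance, a command combining several of these filters might look like this (the URL, date, size, and extensions are illustrative):

```bash
python kemono-dl.py --cookies "cookie.txt" \
  --links https://kemono.party/patreon/user/12345 \
  --dateafter 20220101 --max-filesize 100mb --only-filetypes mp4,jpg,png
```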
102 | ## Other
103 | 
104 | `--help`
105 | Print all available options and exit.
106 | `--version`
107 | Print the version and exit.
108 | `--verbose`
109 | Display debug information and copy output to a file.
110 | `--quiet`
111 | Suppress printing except for warnings, errors, and exceptions.
112 | `--simulate`
113 | Simulate the given command and do not write to disk.
114 | `--no-part-files`
115 | Do not save attachments or inline images as .part files while downloading. Partially downloaded files will not be resumed if the program stops.
116 | `--yt-dlp-args ARGS` (UNDER CONSTRUCTION)
117 | The arguments yt-dlp will use when downloading. Formatted as a Python dictionary object.
118 | `--post-timeout SEC`
119 | The time in seconds to wait between downloading posts. (default: 0)
120 | `--retry COUNT`
121 | The number of times to retry / resume downloading a file. (default: 5)
122 | `--ratelimit-sleep SEC`
123 | The time in seconds to wait after being rate limited. (default: 120)
124 | 
125 | # Notes
126 | - Expected link formats:
127 |   - `https://{site}.party/{service}/user/{user_id}`
128 |   - `https://{site}.party/{service}/user/{user_id}/post/{post_id}`
129 | - By default files are saved as .part files until completed.
130 | - I assume the .party site has the correct hash for attachments. In rare cases this may not be true.
131 |   - If the server hash is incorrect, the file will remain a .part file.
132 |   - You can remove the .part from the file name and check whether it downloaded correctly.
133 |   - If it is correct but the downloader said the hash was wrong, please report it in the [pinned issue]() so I can report it to the .party site.
134 | - Some files do not have the file size in the response header and will not be downloaded when using `--min-filesize` or `--max-filesize`.
135 |   - `.pdf` is a known file type that will never return a file size from response headers.
136 | - Gumroad posts do not provide a published date, so `--date`, `--datebefore`, and `--dateafter` will always skip Gumroad posts.
137 | - Files will not be overwritten by default.
138 | - Inline images' default names are the file hash.
139 | - To get `--yt-dlp` to work, please follow its installation [guide](https://github.com/yt-dlp/yt-dlp#installation=).
140 | - For `--yt-dlp-args ARGS`, refer to the available [options](https://github.com/yt-dlp/yt-dlp/blob/master/yt_dlp/YoutubeDL.py#L181).
141 | 
142 | # Output Patterns
143 | 
144 | ## Variables
145 | 
146 | The pattern options allow you to modify the file path and file name using variables from the post. `--dirname-pattern` is the base file path for all post files.
147 | All file name patterns are appended to the end of the `--dirname-pattern`. File name patterns may also contain subfolder paths specific to that type of file, as with the default pattern for `--inline-filename-pattern`.
148 | 
149 | All variables referring to dates are controlled by `--date-strf-pattern`. Standard Python datetime strftime() format codes can be found [here](https://docs.python.org/3/library/datetime.html#strftime-and-strptime-format-codes).
150 | 
151 | ### All Options
152 | - `{site}`
153 |   The .party site the post is hosted on. (i.e. kemono.party or coomer.party)
154 | - `{service}`
155 |   The service of the post.
156 | - `{user_id}`
157 |   The user id of the poster.
158 | - `{username}`
159 |   The user name of the poster.
160 | - `{id}`
161 |   The post id.
162 | - `{title}`
163 |   The post title.
164 | - `{published}`
165 |   The published date of the post.
166 | - `{added}`
167 |   The date the post was added to the .party site.
168 | - `{updated}`
169 |   The date the post was last updated on the .party site.
170 | - `{user_updated}`
171 |   The date the user was last updated on the .party site.
172 | 
173 | ### Only file names
174 | - `{ext}`
175 |   The file extension.
176 | - `{filename}`
177 |   The original file name.
178 | - `{index}`
179 |   The file's index order. Only for `--filename-pattern` and `--inline-filename-pattern`.
180 | - `{hash}`
181 |   The hash of the file. Only for `--filename-pattern` and `--inline-filename-pattern`.
182 | 
183 | 
184 | ## Default Patterns
185 | `--dirname-pattern`
186 | ```python
187 | "Downloads\{service}\{username} [{user_id}]"
188 | ```
189 | `--filename-pattern`
190 | ```python
191 | "[{published}] [{id}] {title}\{index}_{filename}.{ext}"
192 | ```
193 | `--inline-filename-pattern`
194 | ```python
195 | "[{published}] [{id}] {title}\inline\{index}_{filename}.{ext}"
196 | ```
197 | `--other-filename-pattern`
198 | ```python
199 | "[{published}] [{id}] {title}\[{id}]_{filename}.{ext}"
200 | ```
201 | `--user-filename-pattern`
202 | ```python
203 | "[{user_id}]_{filename}.{ext}"
204 | ```
205 | `--date-strf-pattern`
206 | ```python
207 | "%Y%m%d"
208 | ```
209 | 
210 | ## Examples
211 | TODO
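As one illustration (the service, creator name, IDs, date, and file name below are made up), the default `--dirname-pattern` joined with the default `--filename-pattern` would place an attachment at a path like:

```python
"Downloads\patreon\SomeArtist [12345]\[20220410] [6789012] Some Post Title\1_artwork.png"
```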
212 | 
--------------------------------------------------------------------------------
/api.md:
--------------------------------------------------------------------------------
1 | ## USERS
2 | ```python
3 | # api call: /api/{service}/user/{user_id}?o={chunk}
4 | # chunk starts at 0 and increments by 25
5 | # returns a list of post data (see POSTS)
6 | ```
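A minimal sketch of paging through this endpoint with `requests` (the service and user ID are placeholders, and the session is assumed to carry the logged-in cookies described in the README):

```python
import requests

def get_all_posts(session: requests.Session, service: str, user_id: str) -> list:
    posts, chunk = [], 0
    while True:
        url = f"https://kemono.party/api/{service}/user/{user_id}?o={chunk}"
        page = session.get(url, timeout=300).json()
        posts.extend(page)
        if len(page) < 25:  # a short (or empty) chunk means the last page
            return posts
        chunk += 25
```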
7 | ## POSTS
8 | ```python
9 | # api call: /api/{service}/user/{user_id}/post/{post_id}
10 | # returns a dictionary of the post data
11 | 
12 | post # dict
13 |     ['title'] # str
14 |     ['added'] # str, datetime object
15 |     ['edited'] # str, datetime object
16 |     ['id'] # str
17 |     ['user'] # str
18 |     ['published'] # str, datetime object
19 |     ['attachments'] # list of dict
20 |         ['name'] # str
21 |         ['path'] # str
22 |     ['file'] # dict
23 |         ['name'] # str
24 |         ['path'] # str
25 |     ['content'] # str, html
26 |     ['shared_file'] # bool
27 |     ['embed'] # dict
28 |         ['description'] # str
29 |         ['subject'] # str
30 |         ['url'] # str
31 | ```
32 | ## DISCORD CHANNELS
33 | ```python
34 | # api call: /api/discord/channels/lookup?q={server_id}
35 | # returns a list of dictionaries containing channel names and ids
36 | 
37 | channel # dict
38 |     ['id'] # str
39 |     ['name'] # str
40 | ```
41 | ## DISCORD CHANNEL POSTS
42 | ```python
43 | # api call: /api/discord/channel/{channel_id}?skip={skip}
44 | # skip starts at 0 and increments by 10
45 | # returns a list of dictionaries containing each post's data
46 | 
47 | post # dict
48 |     ['added'] # str, datetime object
49 |     ['attachments'] # list of dict
50 |         ['isImage'] # str
51 |         ['name'] # str
52 |         ['path'] # str
53 |     ['author'] # dict
54 |         ['avatar'] # str
55 |         ['discriminator'] # str
56 |         ['id'] # str
57 |         ['public_flags'] # int
58 |         ['username'] # str
59 |     ['channel'] # str
60 |     ['content'] # str, html
61 |     ['edited'] # ???
62 |     ['embeds'] # list of dict
63 |         ['description'] # str
64 |         ['thumbnail'] # dict
65 |             ['height'] # int
66 |             ['proxy_url'] # str
67 |             ['url'] # str
68 |             ['width'] # int
69 |         ['title'] # str
70 |         ['type'] # str
71 |         ['url'] # str
72 |     ['id'] # str
73 |     ['mentions'] # list of dict
74 |         ['avatar'] # str
75 |         ['discriminator'] # str
76 |         ['id'] # str
77 |         ['public_flags'] # int
78 |         ['username'] # str
79 |     ['published'] # str, datetime object
80 |     ['server'] # str
81 | ```
82 | ## CREATORS
83 | ```python
84 | # api call: /api/creators
85 | # returns a list of dictionaries of user data
86 | 
87 | creator # dict
88 |     ['id'] # str
89 |     ['indexed'] # str
90 |     ['name'] # str
91 |     ['service'] # str
92 |     ['updated'] # str
93 | ```
94 | ## FAVORITES
95 | ```python
96 | # api call: /api/favorites?type={type}
97 | # type can be post or artist
98 | # (artist) returns a list of dictionaries with user data
99 | 
100 | favorite_user # dict
101 |     ['faved_seq'] # int
102 |     ['id'] # str
103 |     ['indexed'] # str, datetime object
104 |     ['name'] # str
105 |     ['service'] # str
106 |     ['updated'] # str, datetime object
107 | 
108 | # (post) returns a list of dictionaries with post data
109 | 
110 | favorite_post # dict, same as post
111 |     ['faved_seq'] # int
112 | ```
113 | 
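A sketch of reading the favorites list (assumes a `cookie.txt` exported while logged in, as described in the README; the field names follow the listing above):

```python
import requests
from http.cookiejar import MozillaCookieJar

cookies = MozillaCookieJar()
cookies.load("cookie.txt")  # must have been exported while logged in

response = requests.get("https://kemono.party/api/favorites?type=artist",
                        cookies=cookies, timeout=300)
for artist in response.json():
    print(artist["service"], artist["id"], artist["name"])
```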
--------------------------------------------------------------------------------
/kemono-dl.py:
--------------------------------------------------------------------------------
1 | from src.main import main
2 | 
3 | if __name__ == '__main__':
4 |     main()
--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
1 | beautifulsoup4==4.11.1
2 | Pillow==9.1.0
3 | requests==2.27.1
4 | yt_dlp==2022.4.8
--------------------------------------------------------------------------------
/src/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AlphaSlayer1964/kemono-dl/5bfab2ee925c3dcd092bbe4ad0532fe504b34cc7/src/__init__.py
--------------------------------------------------------------------------------
/src/args.py:
--------------------------------------------------------------------------------
1 | import os
2 | import datetime
3 | import re
4 | import argparse
5 | from http.cookiejar import MozillaCookieJar, LoadError
6 | 
7 | from .version import __version__
8 | 
9 | def get_args():
10 | 
11 |     ap = argparse.ArgumentParser()
12 | 
13 |     ap.add_argument("--cookies",
14 |                     metavar="FILE", type=str, default=None, required=True,
15 |                     help="Takes a cookie file or a comma-separated list of cookie files. Used to get around the DDoS protection. Your cookie file must have been exported while logged in to use the favorite options.")
16 | 
17 | 
18 | 
19 |     ap.add_argument("--links",
20 |                     metavar="LINKS", type=str, default=None,
21 |                     help="Takes a URL or a comma-separated list of URLs.")
22 | 
23 |     ap.add_argument("--from-file",
24 |                     metavar="FILE", type=str, default=None,
25 |                     help="Reads in a file with URLs separated by new lines. Lines starting with # will not be read in.")
26 | 
27 |     ap.add_argument("--kemono-fav-users",
28 |                     metavar="SERVICE", type=str, default=None,
29 |                     help="Downloads favorite users from kemono.party of the specified type or types separated by a comma. Types include: all, patreon, fanbox, gumroad, subscribestar, dlsite, fantia. Your cookie file must have been exported while logged in for this to work.")
30 | 
31 |     ap.add_argument("--coomer-fav-users",
32 |                     metavar="SERVICE", type=str, default=None,
33 |                     help="Downloads favorite users from coomer.party of the specified type or types separated by a comma. Types include: all, onlyfans. Your cookie file must have been exported while logged in for this to work.")
34 | 
35 |     ap.add_argument("--kemono-fav-posts",
36 |                     action='store_true', default=False,
37 |                     help="Downloads favorite posts from kemono.party. Your cookie file must have been exported while logged in for this to work.")
38 | 
39 |     ap.add_argument("--coomer-fav-posts",
40 |                     action='store_true', default=False,
41 |                     help="Downloads favorite posts from coomer.party. Your cookie file must have been exported while logged in for this to work.")
42 | 
43 | 
44 | 
45 |     ap.add_argument("--inline",
46 |                     action='store_true', default=False,
47 |                     help="Download the inline images from the post content.")
48 | 
49 |     ap.add_argument("--content",
50 |                     action='store_true', default=False,
51 |                     help="Write the post content to an HTML file. The HTML file includes comments if `--comments` is passed.")
52 | 
53 |     ap.add_argument("--comments",
54 |                     action='store_true', default=False,
55 |                     help="Write the post comments to an HTML file.")
56 | 
57 |     ap.add_argument("--json",
58 |                     action='store_true', default=False,
59 |                     help="Write the post json to a file.")
60 | 
61 |     ap.add_argument("--extract-links",
62 |                     action='store_true', default=False,
63 |                     help="Write extracted links from post content to a text file.")
64 | 
65 |     ap.add_argument("--dms",
66 |                     action='store_true', default=False,
67 |                     help="Write user DMs to an HTML file. Only works when a user URL is passed.")
68 | 
69 |     ap.add_argument("--icon",
70 |                     action='store_true', default=False,
71 |                     help="Download the user's profile icon. Only works when a user URL is passed.")
72 | 
73 |     ap.add_argument("--banner",
74 |                     action='store_true', default=False,
75 |                     help="Download the user's profile banner. Only works when a user URL is passed.")
76 | 
77 |     ap.add_argument("--yt-dlp",
78 |                     action='store_true', default=False,
79 |                     help="Try to download the post embed with yt-dlp.")
80 | 
81 |     ap.add_argument("--skip-attachments",
82 |                     action='store_true', default=False,
83 |                     help="Do not download post attachments.")
84 | 
85 |     ap.add_argument("--overwrite",
86 |                     action='store_true', default=False,
87 |                     help="Overwrite any previously created files.")
88 | 
89 | 
90 | 
91 |     ap.add_argument("--dirname-pattern",
92 |                     metavar="DIRNAME_PATTERN", type=str, default='Downloads\{service}\{username} [{user_id}]',
93 |                     help="Set the file path pattern for where files are downloaded. See Output Patterns for more detail.")
94 | 
95 |     ap.add_argument("--filename-pattern",
96 |                     metavar="FILENAME_PATTERN", type=str, default='[{published}] [{id}] {title}\{index}_{filename}.{ext}',
97 |                     help="Set the file name pattern for attachments. See Output Patterns for more detail.")
98 | 
99 |     ap.add_argument("--inline-filename-pattern",
100 |                     metavar="INLINE_FILENAME_PATTERN", type=str, default='[{published}] [{id}] {title}\inline\{index}_{filename}.{ext}',
101 |                     help="Set the file name pattern for inline images. See Output Patterns for more detail.")
102 | 
103 |     ap.add_argument("--other-filename-pattern",
104 |                     metavar="OTHER_FILENAME_PATTERN", type=str, default='[{published}] [{id}] {title}\[{id}]_{filename}.{ext}',
105 |                     help="Set the file name pattern for post content, extracted links, and json. See Output Patterns for more detail.")
See Output Patterns for more detail.") 106 | 107 | ap.add_argument("--user-filename-pattern", 108 | metavar="USER_FILENAME_PATTERN", type=str, default='[{user_id}]_{filename}.{ext}', 109 | help="Set the file name pattern for icon, banner and dms. See Output Patterns for more detail.") 110 | 111 | ap.add_argument("--date-strf-pattern", 112 | metavar="DATE_STRF_PATTERN", type=str, default='%Y%m%d', 113 | help="Set the date strf pattern variable. See Output Patterns for more detail.") 114 | 115 | ap.add_argument("--restrict-names", 116 | action='store_true', default=False, 117 | help='Set all file and folder names to be limited to only the ascii character set.') 118 | 119 | 120 | 121 | ap.add_argument("--archive", 122 | metavar="FILE", type=str, default=None, 123 | help="Only download posts that are not recorded in the archive file.") 124 | 125 | ap.add_argument("--date", 126 | metavar="YYYYMMDD", type=str, default=None, 127 | help="Only download posts published from this date.") 128 | 129 | ap.add_argument("--datebefore", 130 | metavar="YYYYMMDD", type=str, default=None, 131 | help="Only download posts published before this date.") 132 | 133 | ap.add_argument("--dateafter", 134 | metavar="YYYYMMDD", type=str, default=None, 135 | help="Only download posts published after this date.") 136 | 137 | ap.add_argument("--user-updated-datebefore", 138 | metavar="YYYYMMDD", type=str, default=None, 139 | help="Only download user posts if the user was updated before this date.") 140 | 141 | ap.add_argument("--user-updated-dateafter", 142 | metavar="YYYYMMDD", type=str, default=None, 143 | help="Only download user posts if the user was updated after this date.") 144 | 145 | ap.add_argument("--min-filesize", 146 | metavar="SIZE", type=str, default=None, 147 | help="Only download attachments or inline images with greater than this file size. (ex #gb | #mb | #kb | #b)") 148 | 149 | ap.add_argument("--max-filesize", 150 | metavar="SIZE", type=str, default=None, 151 | help="Only download attachments or inline images with less than this file size. (ex #gb | #mb | #kb | #b)") 152 | 153 | ap.add_argument("--only-filetypes", 154 | metavar="EXT", type=str, default=[], 155 | help="Only download attachments or inline images with the given file type(s). Takes a file extensions or list of file extensions separated by a comma. (ex mp4,jpg,gif,zip)") 156 | 157 | ap.add_argument("--skip-filetypes", 158 | metavar="EXT", type=str, default=[], 159 | help="Only download attachments or inline images without the given file type(s). Takes a file extensions or list of file extensions separated by a comma. (ex mp4,jpg,gif,zip)") 160 | 161 | 162 | 163 | ap.add_argument("--version", 164 | action='version', version=str(__version__), 165 | help="Print the version and exit.") 166 | 167 | ap.add_argument("--verbose", 168 | action='store_true', default=False, 169 | help="Display debug information and copies output to a file.") 170 | 171 | ap.add_argument("--quiet", 172 | action='store_true', default=False, 173 | help="Suppress printing except for warnings, errors, and exceptions.") 174 | 175 | ap.add_argument("--simulate", 176 | action='store_true', default=False, 177 | help="Simulate the given command and do not write to disk.") 178 | 179 | ap.add_argument("--no-part-files", 180 | action='store_true', default=False, 181 | help="Do not save attachments or inline images as .part files while downloading. Files partially downloaded will not be resumed if program stops. 
") 182 | 183 | ap.add_argument("--yt-dlp-args", 184 | metavar="YT_DLP_ARGS", type=str, default=None, 185 | help="The args yt-dlp will use to download with. Formatted as a python dictionary object. ") 186 | 187 | ap.add_argument("--post-timeout", 188 | metavar="SEC", type=int, default=0, 189 | help="The time in seconds to wait between downloading posts. (default: 0)") 190 | 191 | ap.add_argument("--retry", 192 | metavar="COUNT", type=int, default=5, 193 | help="The amount of times to retry / resume downloading a file. (default: 5)") 194 | 195 | ap.add_argument("--ratelimit-sleep", 196 | metavar="SEC", type=int, default=120, 197 | help="The time in seconds to wait after being ratelimited (default: 120)") 198 | 199 | ap.add_argument("--user-agent", 200 | metavar="UA", type=str, default='Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36', 201 | help="Set a custom user agent") 202 | 203 | args = vars(ap.parse_args()) 204 | 205 | # takes a comma seperated lost of cookie files and loads them into a cookie jar 206 | if args['cookies']: 207 | cookie_files = [s.strip() for s in args["cookies"].split(",")] 208 | args['cookies'] = MozillaCookieJar() 209 | loaded = 0 210 | for cookie_file in cookie_files: 211 | try: 212 | args['cookies'].load(cookie_file) 213 | loaded += 1 214 | except LoadError: 215 | print(F"Unable to load cookie {cookie_file}") 216 | except FileNotFoundError: 217 | print(F"Unable to find cookie {cookie_file}") 218 | if loaded == 0: 219 | print("No cookies loaded | exiting"), exit() 220 | 221 | # takes a comma seperated string of links and converts them to a list 222 | if args['links']: 223 | args['links'] = [s.strip().split('?')[0] for s in args["links"].split(",")] 224 | else: 225 | args['links'] = [] 226 | 227 | # takes a file and converts it to a list 228 | if args['from_file']: 229 | if not os.path.exists(args['from_file']): 230 | print(f"--from-file {args['from_file']} does not exist") 231 | with open(args['from_file'],'r') as f: 232 | # lines starting with '#' are ignored 233 | args['from_file'] = [line.rstrip().split('?')[0] for line in f if line[0] != '#' and line.strip() != ''] 234 | else: 235 | args['from_file'] = [] 236 | 237 | if args['archive']: 238 | # the archive file doesn't need to exist but the directory does 239 | if not os.path.isdir(os.path.dirname(os.path.abspath(args['archive']))): 240 | print(f"--archive {args['archive']} directory does not exist"), quit() 241 | 242 | if args['only_filetypes'] and args['skip_filetypes']: 243 | print('--only-filetypes and --skip-filetypes can not be given together'), quit() 244 | # takes a comma seperated string of extentions and converts them to a list 245 | if args['only_filetypes']: 246 | args['only_filetypes'] = [s.strip().lower() for s in args["only_filetypes"].split(",")] 247 | # takes a comma seperated string of extentions and converts them to a list 248 | if args['skip_filetypes']: 249 | args['skip_filetypes'] = [s.strip().lower() for s in args["skip_filetypes"].split(",")] 250 | 251 | def check_date(args, key): 252 | try: 253 | args[key] = datetime.datetime.strptime(args[key], r'%Y%m%d') 254 | except: 255 | print(f"--{key} {args[key]} is an invalid date | correct format: YYYYMMDD"), exit() 256 | 257 | if args['date']: 258 | check_date(args, 'date') 259 | if args['datebefore']: 260 | check_date(args, 'datebefore') 261 | if args['dateafter']: 262 | check_date(args, 'dateafter') 263 | if args['user_updated_datebefore']: 264 | check_date(args, 
265 |     if args['user_updated_dateafter']:
266 |         check_date(args, 'user_updated_dateafter')
267 | 
268 |     def check_size(args, key):
269 |         found = re.search(r'([0-9]+)(gb|mb|kb|b)', args[key].lower())  # converts a size string to bytes
270 |         if found:
271 |             if found.group(2) == 'b':
272 |                 args[key] = int(found.group(1))
273 |             elif found.group(2) == 'kb':
274 |                 args[key] = int(found.group(1)) * 10**3
275 |             elif found.group(2) == 'mb':
276 |                 args[key] = int(found.group(1)) * 10**6
277 |             elif found.group(2) == 'gb':
278 |                 args[key] = int(found.group(1)) * 10**9
279 |             return
280 |         print(f"--{key} {args[key]} is an invalid size | correct format: ex 1b 1kb 1mb 1gb"), quit()
281 | 
282 |     if args['max_filesize']:
283 |         check_size(args, 'max_filesize')
284 |     if args['min_filesize']:
285 |         check_size(args, 'min_filesize')
286 | 
287 |     if args['kemono_fav_users']:
288 |         temp = []
289 |         for s in args["kemono_fav_users"].split(","):
290 |             if s.strip().lower() in {'all', 'patreon', 'fanbox', 'gumroad', 'subscribestar', 'dlsite', 'fantia'}:
291 |                 temp.append(s.strip().lower())
292 |             else:
293 |                 print(f"--kemono-fav-users {s.strip()} is not a valid option")
294 |         if len(temp) == 0:
295 |             print(f"--kemono-fav-users no valid options were passed")
296 |         args['kemono_fav_users'] = temp
297 | 
298 |     if args['coomer_fav_users']:
299 |         temp = []
300 |         for s in args["coomer_fav_users"].split(","):
301 |             if s.strip().lower() in {'all', 'onlyfans'}:
302 |                 temp.append(s.strip().lower())
303 |             else:
304 |                 print(f"--coomer-fav-users {s.strip()} is not a valid option")
305 |         if len(temp) == 0:
306 |             print(f"--coomer-fav-users no valid options were passed")
307 |         args['coomer_fav_users'] = temp
308 | 
309 |     return args
--------------------------------------------------------------------------------
/src/helper.py:
--------------------------------------------------------------------------------
1 | import re
2 | import hashlib
3 | import os
4 | import time
5 | 
6 | def parse_url(url):
7 |     # parse urls
8 |     downloadable = re.search(r'^https://(kemono\.party|coomer\.party)/([^/]+)/user/([^/]+)($|/post/([^/]+)$)',url)
9 |     if not downloadable:
10 |         return None
11 |     return downloadable.group(1)
12 | 
13 | # create path from template pattern
14 | def compile_post_path(post_variables, template, ascii):
15 |     drive, tail = os.path.splitdrive(template)
16 |     tail = tail[1:] if tail[0] in {'/','\\'} else tail
17 |     tail_split = re.split(r'\\|/', tail)
18 |     cleaned_path = drive + os.path.sep if drive else ''
19 |     for folder in tail_split:
20 |         if ascii:
21 |             cleaned_path = os.path.join(cleaned_path, restrict_ascii(clean_folder_name(folder.format(**post_variables))))
22 |         else:
23 |             cleaned_path = os.path.join(cleaned_path, clean_folder_name(folder.format(**post_variables)))
24 |     return cleaned_path
25 | 
26 | # create file path from template pattern
27 | def compile_file_path(post_path, post_variables, file_variables, template, ascii):
28 |     file_split = re.split(r'\\|/', template)
29 |     if len(file_split) > 1:
30 |         for folder in file_split[:-1]:
31 |             if ascii:
32 |                 post_path = os.path.join(post_path, restrict_ascii(clean_folder_name(folder.format(**file_variables, **post_variables))))
33 |             else:
34 |                 post_path = os.path.join(post_path, clean_folder_name(folder.format(**file_variables, **post_variables)))
35 |     if ascii:
36 |         cleaned_file = restrict_ascii(clean_file_name(file_split[-1].format(**file_variables, **post_variables)))
37 |     else:
38 |         cleaned_file = clean_file_name(file_split[-1].format(**file_variables, **post_variables))
39 |     return 
os.path.join(post_path, cleaned_file) 40 | 41 | # get file hash 42 | def get_file_hash(file:str): 43 | sha256_hash = hashlib.sha256() 44 | with open(file,"rb") as f: 45 | for byte_block in iter(lambda: f.read(4096),b""): 46 | sha256_hash.update(byte_block) 47 | return sha256_hash.hexdigest().lower() 48 | 49 | # clean folder name for windows 50 | def clean_folder_name(folder_name:str): 51 | if not folder_name.rstrip(): 52 | folder_name = '_' 53 | return re.sub(r'[\x00-\x1f\\/:\"*?<>\|]|\.$','_',folder_name.rstrip())[:248] 54 | 55 | # clean file name for windows 56 | def clean_file_name(file_name:str): 57 | if not file_name: 58 | file_name = '_' 59 | file_name = re.sub(r'[\x00-\x1f\\/:\"*?<>\|]','_', file_name) 60 | file_name, file_extension = os.path.splitext(file_name) 61 | return file_name[:255-len(file_extension)-5] + file_extension 62 | 63 | def restrict_ascii(string:str): 64 | return re.sub(r'[^\x21-\x7f]','_',string) 65 | 66 | def check_date(post_date, date, datebefore, dateafter): 67 | if date: 68 | if date == post_date: 69 | return False 70 | if datebefore and dateafter: 71 | if dateafter <= post_date <= datebefore: 72 | return False 73 | elif datebefore: 74 | if datebefore >= post_date: 75 | return False 76 | elif dateafter: 77 | if dateafter <= post_date: 78 | return False 79 | return True 80 | 81 | # prints download bar 82 | def print_download_bar(total:int, downloaded:int, resumed:int, start): 83 | time_diff = time.time() - start 84 | if time_diff == 0.0: 85 | time_diff = 0.000001 86 | done = 50 87 | 88 | rate = (downloaded-resumed)/time_diff 89 | 90 | eta = time.strftime("%H:%M:%S", time.gmtime((total-downloaded) / rate)) 91 | 92 | if rate/2**10 < 100: 93 | rate = (round(rate/2**10, 1), 'KB') 94 | elif rate/2**20 < 100: 95 | rate = (round(rate/2**20, 1), 'MB') 96 | else: 97 | rate = (round(rate/2**30, 1), 'GB') 98 | 99 | if total: 100 | done = int(50*downloaded/total) 101 | if total/2**10 < 100: 102 | total = (round(total/2**10, 1), 'KB') 103 | downloaded = round(downloaded/2**10,1) 104 | elif total/2**20 < 100: 105 | total = (round(total/2**20, 1), 'MB') 106 | downloaded = round(downloaded/2**20,1) 107 | else: 108 | total = (round(total/2**30, 1), 'GB') 109 | downloaded = round(downloaded/2**30,1) 110 | else: 111 | if downloaded/2**10 < 100: 112 | total = ('???', 'KB') 113 | downloaded = round(downloaded/2**10,1) 114 | elif downloaded/2**20 < 100: 115 | total = ('???', 'MB') 116 | downloaded = round(downloaded/2**20,1) 117 | else: 118 | total = ('???', 'GB') 119 | downloaded = round(downloaded/2**30,1) 120 | 121 | bar_fill = '='*done 122 | bar_empty = ' '*(50-done) 123 | overlap_buffer = ' '*15 124 | print(f'[{bar_fill}{bar_empty}] {downloaded}/{total[0]} {total[1]} at {rate[0]} {rate[1]}/s ETA {eta}{overlap_buffer}', end='\r') 125 | 126 | # redo this 127 | # def check_version(): 128 | # try: 129 | # current_version = datetime.datetime.strptime(__version__, r'%Y.%m.%d') 130 | # except: 131 | # current_version = datetime.datetime.strptime(__version__, r'%Y.%m.%d.%H') 132 | # github_api_url = 'https://api.github.com/repos/AplhaSlayer1964/kemono-dl/releases/latest' 133 | # try: 134 | # latest_tag = requests.get(url=github_api_url, timeout=300).json()['tag_name'] 135 | # except: 136 | # logger.error("Failed to check latest version of kemono-dl") 137 | # return 138 | # try: 139 | # latest_version = datetime.datetime.strptime(latest_tag, r'%Y.%m.%d') 140 | # except: 141 | # latest_version = datetime.datetime.strptime(latest_tag, r'%Y.%m.%d.%H') 142 | # if current_version < 
latest_version: 143 | # logger.debug(f"Using kemono-dl {__version__} while latest release is kemono-dl {latest_tag}") 144 | # logger.warning(f"A newer version of kemono-dl is available. Please update to the latest release at https://github.com/AplhaSlayer1964/kemono-dl/releases/latest") -------------------------------------------------------------------------------- /src/logger.py: -------------------------------------------------------------------------------- 1 | import logging 2 | 3 | from .args import get_args 4 | 5 | args = get_args() 6 | 7 | if args['verbose']: 8 | # clear log file 9 | file = open('debug.log','w') 10 | file.close() 11 | 12 | logging.getLogger("requests").setLevel(logging.WARNING) 13 | logging.getLogger("urllib3").setLevel(logging.WARNING) 14 | 15 | logger = logging.getLogger('kemono-dl') 16 | 17 | logger.setLevel(logging.INFO) 18 | if args['quiet']: 19 | logger.setLevel(logging.WARNING) 20 | if args['verbose']: 21 | logger.setLevel(logging.DEBUG) 22 | 23 | file_format = logging.Formatter('%(asctime)s:%(levelname)s:%(message)s') 24 | stream_format = logging.Formatter('%(levelname)s:%(message)s') 25 | 26 | file_handler = logging.FileHandler('debug.log', encoding="utf-16") 27 | file_handler.setFormatter(file_format) 28 | 29 | stream_handler = logging.StreamHandler() 30 | stream_handler.setFormatter(stream_format) 31 | 32 | if args['verbose']: 33 | logger.addHandler(file_handler) 34 | logger.addHandler(stream_handler) -------------------------------------------------------------------------------- /src/main.py: -------------------------------------------------------------------------------- 1 | import requests 2 | from requests.adapters import HTTPAdapter, Retry 3 | import re 4 | import os 5 | from bs4 import BeautifulSoup 6 | import time 7 | import datetime 8 | from PIL import Image 9 | from io import BytesIO 10 | import json 11 | 12 | from .args import get_args 13 | from .logger import logger 14 | from .version import __version__ 15 | from .helper import get_file_hash, print_download_bar, check_date, parse_url, compile_post_path, compile_file_path 16 | from .my_yt_dlp import my_yt_dlp 17 | 18 | class downloader: 19 | 20 | def __init__(self, args): 21 | self.input_urls = args['links'] + args['from_file'] 22 | # list of completed posts from current session 23 | self.comp_posts = [] 24 | # list of creators info 25 | self.creators = [] 26 | 27 | # requests variables 28 | self.headers = {'User-Agent': args['user_agent']} 29 | self.cookies = args['cookies'] 30 | self.timeout = 300 31 | 32 | # file/folder naming 33 | self.download_path_template = args['dirname_pattern'] 34 | self.filename_template = args['filename_pattern'] 35 | self.inline_filename_template = args['inline_filename_pattern'] 36 | self.other_filename_template = args['other_filename_pattern'] 37 | self.user_filename_template = args['user_filename_pattern'] 38 | self.date_strf_pattern = args['date_strf_pattern'] 39 | self.yt_dlp_args = args['yt_dlp_args'] 40 | self.restrict_ascii = args['restrict_names'] 41 | 42 | self.archive_file = args['archive'] 43 | self.archive_list = [] 44 | self.post_errors = 0 45 | 46 | # controls what to download/save 47 | self.attachments = not args['skip_attachments'] 48 | self.inline = args['inline'] 49 | self.content = args['content'] 50 | self.extract_links = args['extract_links'] 51 | self.comments = args['comments'] 52 | self.json = args['json'] 53 | self.yt_dlp = args['yt_dlp'] 54 | self.k_fav_posts = args['kemono_fav_posts'] 55 | self.c_fav_posts = args['coomer_fav_posts'] 56 
| self.k_fav_users = args['kemono_fav_users']
57 |         self.c_fav_users = args['coomer_fav_users']
58 |         self.icon_banner = []
59 |         if args['icon']:
60 |             self.icon_banner.append('icon')
61 |         if args['banner']:
62 |             self.icon_banner.append('banner')
63 |         self.dms = args['dms']
64 | 
65 |         # controls files to ignore
66 |         self.overwrite = args['overwrite']
67 |         self.only_ext = args['only_filetypes']
68 |         self.not_ext = args['skip_filetypes']
69 |         self.max_size = args['max_filesize']
70 |         self.min_size = args['min_filesize']
71 | 
72 |         # controls posts to ignore
73 |         self.date = args['date']
74 |         self.datebefore = args['datebefore']
75 |         self.dateafter = args['dateafter']
76 |         self.user_up_datebefore = args['user_updated_datebefore']
77 |         self.user_up_dateafter = args['user_updated_dateafter']
78 | 
79 |         # other
80 |         self.retry = args['retry']
81 |         self.no_part = args['no_part_files']
82 |         self.ratelimit_sleep = args['ratelimit_sleep']
83 |         self.post_timeout = args['post_timeout']
84 |         self.simulate = args['simulate']
85 | 
86 |         self.session = requests.Session()
87 |         retries = Retry(
88 |             total=self.retry,
89 |             backoff_factor=0.1,
90 |             status_forcelist=[ 500, 502, 503, 504 ]
91 |         )
92 |         self.session.mount('https://', HTTPAdapter(max_retries=retries))
93 |         self.session.mount('http://', HTTPAdapter(max_retries=retries))
94 | 
95 |         self.start_download()
96 | 
97 |     def get_creators(self, domain:str):
98 |         # get site creators
99 |         creators_api = f"https://{domain}/api/creators/"
100 |         logger.debug(f"Getting creator json from {creators_api}")
101 |         return self.session.get(url=creators_api, cookies=self.cookies, headers=self.headers, timeout=self.timeout).json()
102 | 
103 |     def get_user(self, user_id:str, service:str):
104 |         for creator in self.creators:
105 |             if creator['id'] == user_id and creator['service'] == service:
106 |                 return creator
107 |         return None
108 | 
109 |     def get_favorites(self, domain:str, fav_type:str, services:list = None):
110 |         fav_api = f'https://{domain}/api/favorites?type={fav_type}'
111 |         logger.debug(f"Getting favorite json from {fav_api}")
112 |         response = self.session.get(url=fav_api, headers=self.headers, cookies=self.cookies, timeout=self.timeout)
113 |         if response.status_code == 401:
114 |             logger.error(f"{response.status_code} {response.reason} | Bad cookie file")
115 |             return
116 |         if not response.ok:
117 |             logger.error(f"{response.status_code} {response.reason}")
118 |             return
119 |         for favorite in response.json():
120 |             if fav_type == 'post':
121 |                 self.get_post(f"https://{domain}/{favorite['service']}/user/{favorite['user']}/post/{favorite['id']}")
122 |             if fav_type == 'artist':
123 |                 if not (favorite['service'] in services or 'all' in services):
124 |                     logger.info(f"Skipping user {favorite['name']} | Service {favorite['service']} was not requested")
125 |                     continue
126 |                 self.get_post(f"https://{domain}/{favorite['service']}/user/{favorite['id']}")
127 | 
128 |     def get_post(self, url:str):
129 |         found = re.search(r'(https://(kemono\.party|coomer\.party)/)(([^/]+)/user/([^/]+)($|/post/[^/]+))', url)
130 |         if not found:
131 |             logger.error(f"Unable to find url parameters for {url}")
132 |             return
133 |         api = f"{found.group(1)}api/{found.group(3)}"
134 |         site = found.group(2)
135 |         service = found.group(4)
136 |         user_id = found.group(5)
137 |         is_post = found.group(6)
138 |         user = self.get_user(user_id, service)
139 |         if not user:
140 |             logger.error(f"Unable to find user info in creators list | {service} | {user_id}")
141 |             return
142 |         if not is_post:
143 |             if self.skip_user(user):
144 |                 return
145 |         logger.info(f"Downloading posts from {site} | {service} | {user['name']} | {user['id']}")
146 |         chunk = 0
147 |         first = True
148 |         while True:
149 |             if is_post:
150 |                 logger.debug(f"Requesting post json from: {api}")
151 |                 json = self.session.get(url=api, cookies=self.cookies, headers=self.headers, timeout=self.timeout).json()
152 |             else:
153 |                 logger.debug(f"Requesting user json from: {api}?o={chunk}")
154 |                 json = self.session.get(url=f"{api}?o={chunk}", cookies=self.cookies, headers=self.headers, timeout=self.timeout).json()
155 |             if not json:
156 |                 if is_post:
157 |                     logger.error(f"Unable to find post json for {api}")
158 |                 elif chunk == 0:
159 |                     logger.error(f"Unable to find user json for {api}?o={chunk}")
160 |                 return # completed
161 |             for post in json:
162 |                 post = self.clean_post(post, user, site)
163 |                 # only download once
164 |                 if not is_post and first:
165 |                     self.download_icon_banner(post, self.icon_banner)
166 |                     if self.dms:
167 |                         self.write_dms(post)
168 |                     first = False
169 |                 if self.skip_post(post):
170 |                     continue
171 |                 try:
172 |                     self.download_post(post)
173 |                     if self.post_timeout:
174 |                         logger.info(f"Sleeping for {self.post_timeout} seconds.")
175 |                         time.sleep(self.post_timeout)
176 |                 except:
177 |                     logger.exception("Unable to download post | service:{service} user_id:{user_id} post_id:{id}".format(**post['post_variables']))
178 |                 self.comp_posts.append("https://{site}/{service}/user/{user_id}/post/{id}".format(**post['post_variables']))
179 |             if len(json) < 25:
180 |                 return # completed
181 |             chunk += 25
182 | 
183 | 
184 |     def download_icon_banner(self, post:dict, img_types:list):
185 |         for img_type in img_types:
186 |             if post['post_variables']['service'] in {'dlsite'}:
187 |                 logger.warning(f"Profile {img_type}s are not supported for {post['post_variables']['service']} users")
188 |                 return
189 |             if post['post_variables']['service'] in {'gumroad'} and img_type == 'banner':
190 |                 logger.warning(f"Profile {img_type}s are not supported for {post['post_variables']['service']} users")
191 |                 return
192 |             image_url = "https://{site}/{img_type}s/{service}/{user_id}".format(img_type=img_type, **post['post_variables'])
193 |             response = self.session.get(url=image_url,headers=self.headers, cookies=self.cookies, timeout=self.timeout)
194 |             try:
195 |                 image = Image.open(BytesIO(response.content))
196 |                 file_variables = {
197 |                     'filename':img_type,
198 |                     'ext':image.format.lower()
199 |                 }
200 |                 file_path = compile_file_path(post['post_path'], post['post_variables'], file_variables, self.user_filename_template, self.restrict_ascii)
201 |                 if os.path.exists(file_path):
202 |                     logger.info(f"Skipping: {os.path.split(file_path)[1]} | File already exists")
203 |                     return
204 |                 logger.info(f"Downloading: {os.path.split(file_path)[1]}")
205 |                 logger.debug(f"Downloading to: {file_path}")
206 |                 if not self.simulate:
207 |                     if not os.path.exists(os.path.split(file_path)[0]):
208 |                         os.makedirs(os.path.split(file_path)[0])
209 |                     image.save(file_path, format=image.format)
210 |             except:
211 |                 logger.error(f"Unable to download profile {img_type} for {post['post_variables']['username']}")
212 | 
213 |     def write_dms(self, post:dict):
214 |         # no api method to get dms so scraping them from html (not future proof)
215 |         post_url = "https://{site}/{service}/user/{user_id}/dms".format(**post['post_variables'])
216 |         response = self.session.get(url=post_url, allow_redirects=True, headers=self.headers, cookies=self.cookies, timeout=self.timeout)
217 |         page_soup = BeautifulSoup(response.text, 
'html.parser') 218 | if page_soup.find("div", {"class": "no-results"}): 219 | logger.info("No DMs found for https://{site}/{service}/user/{user_id}".format(**post['post_variables'])) 220 | return 221 | dms_soup = page_soup.find("div", {"class": "card-list__items"}) 222 | file_variables = { 223 | 'filename':'direct messages', 224 | 'ext':'html' 225 | } 226 | file_path = compile_file_path(post['post_path'], post['post_variables'], file_variables, self.user_filename_template, self.restrict_ascii) 227 | self.write_to_file(file_path, dms_soup.prettify()) 228 | 229 | def get_inline_images(self, post, content_soup): 230 | # only get images that are hosted by the .party site 231 | inline_images = [inline_image for inline_image in content_soup.find_all("img") if inline_image['src'][0] == '/'] 232 | for index, inline_image in enumerate(inline_images): 233 | file = {} 234 | filename, file_extension = os.path.splitext(inline_image['src'].rsplit('/')[-1]) 235 | m = re.search(r'[a-zA-Z0-9]{64}', inline_image['src']) 236 | file_hash = m.group(0) if m else None 237 | file['file_variables'] = { 238 | 'filename': filename, 239 | 'ext': file_extension[1:], 240 | 'url': f"https://{post['post_variables']['site']}/data{inline_image['src']}", 241 | 'hash': file_hash, 242 | 'index': f"{index + 1}".zfill(len(str(len(inline_images)))) 243 | } 244 | file['file_path'] = compile_file_path(post['post_path'], post['post_variables'], file['file_variables'], self.inline_filename_template, self.restrict_ascii) 245 | # set local image location in html 246 | inline_image['src'] = file['file_path'] 247 | post['inline_images'].append(file) 248 | return content_soup 249 | 250 | def compile_content_links(self, post, content_soup, embed_links): 251 | href_links = content_soup.find_all(href=True) 252 | post['links']['text'] = embed_links 253 | for href_link in href_links: 254 | post['links']['text'] += f"{href_link['href']}\n" 255 | post['links']['file_variables'] = { 256 | 'filename':'links', 257 | 'ext':'txt' 258 | } 259 | post['links']['file_path'] = compile_file_path(post['post_path'], post['post_variables'], post['links']['file_variables'], self.other_filename_template, self.restrict_ascii) 260 | 261 | def get_comments(self, post_variables:dict): 262 | try: 263 | # no api method to get comments so using from html (not future proof) 264 | post_url = "https://{site}/{service}/user/{user_id}/post/{id}".format(**post_variables) 265 | response = self.session.get(url=post_url, allow_redirects=True, headers=self.headers, cookies=self.cookies, timeout=self.timeout) 266 | page_soup = BeautifulSoup(response.text, 'html.parser') 267 | comment_soup = page_soup.find("div", {"class": "post__comments"}) 268 | no_comments = re.search('([^ ]+ does not support comment scraping yet\.|No comments found for this post\.)',comment_soup.text) 269 | if no_comments: 270 | logger.debug(no_comments.group(1).strip()) 271 | return '' 272 | return comment_soup.prettify() 273 | except: 274 | self.post_errors += 1 275 | logger.exception("Failed to get post comments") 276 | 277 | def compile_post_content(self, post, content_soup, comment_soup, embed): 278 | post['content']['text'] = f"{content_soup}\n{embed}\n{comment_soup}" 279 | post['content']['file_variables'] = { 280 | 'filename':'content', 281 | 'ext':'html' 282 | } 283 | post['content']['file_path'] = compile_file_path(post['post_path'], post['post_variables'], post['content']['file_variables'], self.other_filename_template, self.restrict_ascii) 284 | 285 | def clean_post(self, post:dict, user:dict, 
domain:str): 286 | new_post = {} 287 | # set post variables 288 | new_post['post_variables'] = {} 289 | new_post['post_variables']['title'] = post['title'] 290 | new_post['post_variables']['id'] = post['id'] 291 | new_post['post_variables']['user_id'] = post['user'] 292 | new_post['post_variables']['username'] = user['name'] 293 | new_post['post_variables']['site'] = domain 294 | new_post['post_variables']['service'] = post['service'] 295 | new_post['post_variables']['added'] = datetime.datetime.strptime(post['added'], r'%a, %d %b %Y %H:%M:%S %Z').strftime(self.date_strf_pattern) if post['added'] else None 296 | new_post['post_variables']['updated'] = datetime.datetime.strptime(post['edited'], r'%a, %d %b %Y %H:%M:%S %Z').strftime(self.date_strf_pattern) if post['edited'] else None 297 | new_post['post_variables']['user_updated'] = datetime.datetime.strptime(user['updated'], r'%a, %d %b %Y %H:%M:%S %Z').strftime(self.date_strf_pattern) if user['updated'] else None 298 | new_post['post_variables']['published'] = datetime.datetime.strptime(post['published'], r'%a, %d %b %Y %H:%M:%S %Z').strftime(self.date_strf_pattern) if post['published'] else None 299 | 300 | new_post['post_path'] = compile_post_path(new_post['post_variables'], self.download_path_template, self.restrict_ascii) 301 | 302 | new_post['attachments'] = [] 303 | if self.attachments: 304 | # add post file to front of attachments list if it doesn't already exist 305 | if post['file'] and not post['file'] in post['attachments']: 306 | post['attachments'].insert(0, post['file']) 307 | # loop over attachments and set file variables 308 | for index, attachment in enumerate(post['attachments']): 309 | file = {} 310 | filename, file_extension = os.path.splitext(attachment['name']) 311 | m = re.search(r'[a-zA-Z0-9]{64}', attachment['path']) 312 | file_hash = m.group(0) if m else None 313 | file['file_variables'] = { 314 | 'filename': filename, 315 | 'ext': file_extension[1:], 316 | 'url': f"https://{domain}/data{attachment['path']}?f={attachment['name']}", 317 | 'hash': file_hash, 318 | 'index': f"{index + 1}".zfill(len(str(len(post['attachments'])))) 319 | } 320 | file['file_path'] = compile_file_path(new_post['post_path'], new_post['post_variables'], file['file_variables'], self.filename_template, self.restrict_ascii) 321 | new_post['attachments'].append(file) 322 | 323 | new_post['inline_images'] = [] 324 | content_soup = BeautifulSoup(post['content'], 'html.parser') 325 | if self.inline: 326 | content_soup = self.get_inline_images(new_post, content_soup) 327 | 328 | comment_soup = self.get_comments(new_post['post_variables']) if self.comments else '' 329 | 330 | new_post['content'] = {'text':None,'file_variables':None, 'file_path':None} 331 | embed = "{subject}\n{url}\n{description}".format(**post['embed']) if post['embed'] else '' 332 | if (self.content or self.comments) and (content_soup or comment_soup or embed): 333 | self.compile_post_content(new_post, content_soup.prettify(), comment_soup, embed) 334 | 335 | new_post['links'] = {'text':None,'file_variables':None, 'file_path':None} 336 | embed_url = "{url}\n".format(**post['embed']) if post['embed'] else '' 337 | if self.extract_links: 338 | self.compile_content_links(new_post, content_soup, embed_url) 339 | 340 | return new_post 341 | 342 | def download_post(self, post:dict): 343 | # might look buggy if title has new lines in it 344 | logger.info("Downloading Post | {title}".format(**post['post_variables'])) 345 | logger.debug("Post URL: 
https://{site}/{service}/user/{user_id}/post/{id}".format(**post['post_variables']))
346 |         self.download_attachments(post)
347 |         self.download_inline(post)
348 |         self.write_content(post)
349 |         self.write_links(post)
350 |         if self.json:
351 |             self.write_json(post)
352 |         self.download_yt_dlp(post)
353 |         self.write_archive(post)
354 |         self.post_errors = 0
355 | 
356 |     def download_attachments(self, post:dict):
357 |         # download the post attachments
358 |         for file in post['attachments']:
359 |             try:
360 |                 self.download_file(file, retry=self.retry)
361 |             except:
362 |                 self.post_errors += 1
363 |                 logger.exception(f"Failed to download: {file['file_path']}")
364 | 
365 |     def download_inline(self, post:dict):
366 |         # download the post inline files
367 |         for file in post['inline_images']:
368 |             try:
369 |                 self.download_file(file, retry=self.retry)
370 |             except:
371 |                 self.post_errors += 1
372 |                 logger.exception(f"Failed to download: {file['file_path']}")
373 | 
374 |     def write_content(self, post:dict):
375 |         # write post content
376 |         if post['content']['text']:
377 |             try:
378 |                 self.write_to_file(post['content']['file_path'], post['content']['text'])
379 |             except:
380 |                 self.post_errors += 1
381 |                 logger.exception(f"Failed to save content")
382 | 
383 |     def write_links(self, post:dict):
384 |         # write post content links
385 |         if post['links']['text']:
386 |             try:
387 |                 self.write_to_file(post['links']['file_path'], post['links']['text'])
388 |             except:
389 |                 self.post_errors += 1
390 |                 logger.exception(f"Failed to save content links")
391 | 
392 |     def write_json(self, post:dict):
393 |         try:
394 |             # add this to clean post function
395 |             file_variables = {
396 |                 'filename':'json',
397 |                 'ext':'json'
398 |             }
399 |             file_path = compile_file_path(post['post_path'], post['post_variables'], file_variables, self.other_filename_template, self.restrict_ascii)
400 |             self.write_to_file(file_path, post)
401 |         except:
402 |             self.post_errors += 1
403 |             logger.exception(f"Failed to save json")
404 | 
405 |     def write_to_file(self, file_path, file_content):
406 |         # check if file exists and if should overwrite
407 |         if os.path.exists(file_path) and not self.overwrite:
408 |             logger.info(f"Skipping: {os.path.split(file_path)[1]} | File already exists")
409 |             return
410 |         logger.info(f"Writing: {os.path.split(file_path)[1]}")
411 |         logger.debug(f"Writing to: {file_path}")
412 |         if not self.simulate:
413 |             # create folder path if it doesn't exist
414 |             if not os.path.exists(os.path.split(file_path)[0]):
415 |                 os.makedirs(os.path.split(file_path)[0])
416 |             # write to file
417 |             if isinstance(file_content, dict):
418 |                 with open(file_path,'w') as f:
419 |                     json.dump(file_content, f, indent=4, sort_keys=True)
420 |             else:
421 |                 with open(file_path,'wb') as f:
422 |                     f.write(file_content.encode("utf-16"))
423 | 
424 |     def download_file(self, file:dict, retry:int):
425 |         # download a file
426 |         if self.skip_file(file):
427 |             return
428 | 
429 |         part_file = f"{file['file_path']}.part" if not self.no_part else file['file_path']
430 | 
431 |         logger.info(f"Downloading: {os.path.split(file['file_path'])[1]}")
432 |         logger.debug(f"Downloading from: {file['file_variables']['url']}")
433 |         logger.debug(f"Downloading to: {part_file}")
434 | 
435 |         # try to resume part file
436 |         resume_size = 0
437 |         if os.path.exists(part_file) and not self.overwrite:
438 |             resume_size = os.path.getsize(part_file)
439 |             logger.info(f"Trying to resume partial download | Resume size: {resume_size} bytes")
440 | 
441 |         try:
442 |             response = 
self.session.get(url=file['file_variables']['url'], stream=True, headers={**self.headers,'Range':f"bytes={resume_size}-"}, cookies=self.cookies, timeout=self.timeout)
443 |         except:
444 |             logger.exception(f"Failed to get response: {file['file_variables']['url']} | Retrying")
445 |             if retry > 0:
446 |                 self.download_file(file, retry=retry-1)
447 |                 return
448 |             logger.error(f"Failed to get response: {file['file_variables']['url']} | All retries failed")
449 |             self.post_errors += 1
450 |             return
451 | 
452 |         # response status code checking
453 |         if response.status_code == 404:
454 |             logger.error(f"Failed to download: {os.path.split(file['file_path'])[1]} | 404 Not Found")
455 |             self.post_errors += 1
456 |             return
457 | 
458 |         if response.status_code == 403:
459 |             logger.error(f"Failed to download: {os.path.split(file['file_path'])[1]} | 403 Forbidden")
460 |             self.post_errors += 1
461 |             return
462 | 
463 |         if response.status_code == 416:
464 |             logger.warning(f"Failed to download: {os.path.split(file['file_path'])[1]} | 416 Range Not Satisfiable | Assuming broken server hash value")
465 |             content_length = int(self.session.get(url=file['file_variables']['url'], stream=True, headers=self.headers, cookies=self.cookies, timeout=self.timeout).headers.get('content-length', 0))
466 |             if content_length == resume_size:
467 |                 logger.debug("Correct amount of bytes downloaded | Assuming download completed successfully")
468 |                 if self.overwrite:
469 |                     os.replace(part_file, file['file_path'])
470 |                 else:
471 |                     os.rename(part_file, file['file_path'])
472 |                 return
473 |             logger.error("Incorrect amount of bytes downloaded | Something went so wrong I have no idea what happened | Removing file")
474 |             os.remove(part_file)
475 |             self.post_errors += 1
476 |             return
477 | 
478 |         if response.status_code == 429:
479 |             logger.warning(f"Failed to download: {os.path.split(file['file_path'])[1]} | 429 Too Many Requests | Sleeping for {self.ratelimit_sleep} seconds")
480 |             time.sleep(self.ratelimit_sleep)
481 |             if retry > 0:
482 |                 self.download_file(file, retry=retry-1)
483 |                 return
484 |             logger.error(f"Failed to download: {os.path.split(file['file_path'])[1]} | 429 Too Many Requests | All retries failed")
485 |             self.post_errors += 1
486 |             return
487 |         if not response.ok:
488 |             logger.error(f"Failed to download: {os.path.split(file['file_path'])[1]} | {response.status_code} {response.reason}")
489 |             self.post_errors += 1
490 |             return
491 | 
492 |         total = int(response.headers.get('content-length', 0))
493 |         if total:
494 |             total += resume_size
495 | 
496 |         if not self.simulate:
497 |             if not os.path.exists(os.path.split(file['file_path'])[0]):
498 |                 os.makedirs(os.path.split(file['file_path'])[0])
499 |             with open(part_file, 'ab') as f:
500 |                 start = time.time()
501 |                 downloaded = resume_size
502 |                 for chunk in response.iter_content(chunk_size=1024*1024):
503 |                     downloaded += len(chunk)
504 |                     f.write(chunk)
505 |                     print_download_bar(total, downloaded, resume_size, start)
506 |                 print()
507 | 
508 |             # verify download
509 |             local_hash = get_file_hash(part_file)
510 |             logger.debug(f"Local file hash: {local_hash}")
511 |             logger.debug(f"Server file hash: {file['file_variables']['hash']}")
512 |             if local_hash != file['file_variables']['hash']:
513 |                 logger.warning(f"File hash did not match server! | Retrying")
514 |                 if retry > 0:
515 |                     self.download_file(file, retry=retry-1)
516 |                     return
517 |                 logger.error(f"File hash did not match server! | All retries failed")
| All retries failed") 518 | self.post_errors += 1 519 | return 520 | # remove .part from file name 521 | if self.overwrite: 522 | os.replace(part_file, file['file_path']) 523 | else: 524 | os.rename(part_file, file['file_path']) 525 | 526 | def download_yt_dlp(self, post:dict): 527 | # download from video streaming site 528 | # if self.yt_dlp and post['embed']: 529 | pass 530 | # my_yt_dlp(post['embed']['url'], post['post_path'], self.yt_dlp_args) 531 | 532 | def load_archive(self): 533 | # load archived posts 534 | if self.archive_file and os.path.exists(self.archive_file): 535 | with open(self.archive_file,'r') as f: 536 | self.archive_list = f.read().splitlines() 537 | 538 | def write_archive(self, post:dict): 539 | if self.archive_file and self.post_errors == 0 and not self.simulate: 540 | with open(self.archive_file,'a') as f: 541 | f.write("https://{site}/{service}/user/{user_id}/post/{id}".format(**post['post_variables']) + '\n') 542 | 543 | def skip_user(self, user:dict): 544 | # check last update date 545 | if self.user_up_datebefore or self.user_up_dateafter: 546 | if check_date(datetime.datetime.strptime(user['updated'], r'%a, %d %b %Y %H:%M:%S %Z'), None, self.user_up_datebefore, self.user_up_dateafter): 547 | logger.info("Skipping user | user updated date not in range") 548 | return True 549 | return False 550 | 551 | def skip_post(self, post:dict): 552 | # check if the post should be downloaded 553 | if self.archive_file: 554 | if "https://{site}/{service}/user/{user_id}/post/{id}".format(**post['post_variables']) in self.archive_list: 555 | logger.info("Skipping post | post already archived") 556 | return True 557 | 558 | if self.date or self.datebefore or self.dateafter: 559 | if not post['post_variables']['published']: 560 | logger.info("Skipping post | post published date not in range") 561 | return True 562 | elif check_date(datetime.datetime.strptime(post['post_variables']['published'], self.date_strf_pattern), self.date, self.datebefore, self.dateafter): 563 | logger.info("Skipping post | post published date not in range") 564 | return True 565 | 566 | if "https://{site}/{service}/user/{user_id}/post/{id}".format(**post['post_variables']) in self.comp_posts: 567 | logger.info("Skipping post | post was already downloaded this session") 568 | return True 569 | 570 | return False 571 | 572 | def skip_file(self, file:dict): 573 | # check if file exists 574 | if not self.overwrite: 575 | if os.path.exists(file['file_path']): 576 | logger.info(f"Skipping: {os.path.split(file['file_path'])[1]} | File already exists") 577 | return True 578 | 579 | # check file name extention 580 | if self.only_ext: 581 | if not file['file_variables']['ext'].lower() in self.only_ext: 582 | logger.info(f"Skipping: {os.path.split(file['file_path'])[1]} | File extention {file['file_variables']['ext']} not found in include list {self.only_ext}") 583 | return True 584 | if self.not_ext: 585 | if file['file_variables']['ext'].lower() in self.not_ext: 586 | logger.info(f"Skipping: {os.path.split(file['file_path'])[1]} | File extention {file['file_variables']['ext']} found in exclude list {self.not_ext}") 587 | return True 588 | 589 | # check file size 590 | if self.min_size or self.max_size: 591 | file_size = requests.get(file['file_variables']['url'], cookies=self.cookies, stream=True).headers.get('content-length', 0) 592 | if int(file_size) == 0: 593 | logger.info(f"Skipping: {os.path.split(file['file_path'])[1]} | File size not included in file header") 594 | return True 595 | if self.min_size and 
    def start_download(self):
        # start the download process
        self.load_archive()

        urls = []
        domains = []

        for url in self.input_urls:
            domain = parse_url(url)
            if not domain:
                logger.warning(f"URL is not downloadable | {url}")
                continue
            urls.append(url)
            if domain not in domains:
                domains.append(domain)

        if self.k_fav_posts or self.k_fav_users:
            if 'kemono.party' not in domains:
                domains.append('kemono.party')
        if self.c_fav_posts or self.c_fav_users:
            if 'coomer.party' not in domains:
                domains.append('coomer.party')

        for domain in domains:
            try:
                self.creators += self.get_creators(domain)
            except Exception:
                logger.exception(f"Unable to get list of creators from {domain}")
        if not self.creators:
            logger.error("No creator information was retrieved | Exiting")
            exit()

        if self.k_fav_posts:
            try:
                self.get_favorites('kemono.party', 'post', retry=self.retry)
            except Exception:
                logger.exception("Unable to get favorite posts from kemono.party")
        if self.c_fav_posts:
            try:
                self.get_favorites('coomer.party', 'post', retry=self.retry)
            except Exception:
                logger.exception("Unable to get favorite posts from coomer.party")
        if self.k_fav_users:
            try:
                self.get_favorites('kemono.party', 'artist', self.k_fav_users)
            except Exception:
                logger.exception("Unable to get favorite users from kemono.party")
        if self.c_fav_users:
            try:
                self.get_favorites('coomer.party', 'artist', self.c_fav_users)
            except Exception:
                logger.exception("Unable to get favorite users from coomer.party")

        for url in urls:
            try:
                self.get_post(url)
            except Exception:
                logger.exception(f"Unable to get posts for {url}")

def main():
    downloader(get_args())
--------------------------------------------------------------------------------
/src/my_yt_dlp.py:
--------------------------------------------------------------------------------
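# Thin wrapper around yt-dlp used for post embeds (the --yt-dlp option is
# still under construction): downloads a single url into file_path and cleans
# up the yt_dlp_temp working folder afterwards.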
Could not download URL {url}") 24 | return -------------------------------------------------------------------------------- /src/version.py: -------------------------------------------------------------------------------- 1 | __version__ = '2022.04.28' 2 | --------------------------------------------------------------------------------