├── LICENSE ├── README.md ├── confusion_matrix.png ├── environment.yml ├── raw_data ├── drawings │ └── urls_drawings.txt ├── hentai │ └── urls_hentai.txt ├── neutral │ └── urls_neutral.txt ├── porn │ └── urls_porn.txt └── sexy │ └── urls_sexy.txt └── scripts ├── 1_get_urls.sh ├── 2_download_from_urls.sh ├── 3_optional_download_drawings.sh ├── 4_optional_download_neutral.sh ├── 5_create_train.sh ├── 6_create_test.sh ├── download_nsfw_urls.py ├── rip.properties ├── ripme.jar └── source_urls ├── drawings.txt ├── hentai.txt ├── neutral.txt ├── porn.txt └── sexy.txt /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2019 Alexander Kim 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 22 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # NSFW Data Scrapper 2 | 3 | ## Description 4 | 5 | This is a set of scripts that automates the collection of _tens of thousands_ of images for the following (loosely defined) categories, to be used later for training an image classifier: 6 | - `porn` - pornography images 7 | - `hentai` - hentai images, including pornographic drawings 8 | - `sexy` - sexually explicit images, but not pornography. Think nude photos, Playboy shots, bikinis, beach volleyball, etc. 9 | - `neutral` - safe-for-work neutral images of everyday things and people 10 | - `drawings` - safe-for-work drawings (including anime) 11 | 12 | **Note**: the scripts have only been tested on the Ubuntu 16.04 Linux distribution. 13 | 14 | Here is what each script (located in the `scripts` directory) does: 15 | - `1_get_urls.sh` - iterates through the text files under `scripts/source_urls`, downloading image URLs for each of the 5 categories above. The [Ripme](https://github.com/RipMeApp/ripme) application performs all the heavy lifting. The source URLs are mostly links to various subreddits, but could be any website that Ripme supports. 16 | *Note*: I already ran this script for you, and its outputs are located in the `raw_data` directory.
There is no need to rerun it unless you edit the files under `scripts/source_urls`. 17 | - `2_download_from_urls.sh` - downloads the actual images for the URLs found in the text files in the `raw_data` directory 18 | - `3_optional_download_drawings.sh` - (optional) script that downloads SFW anime images from the [Danbooru2018](https://www.gwern.net/Danbooru2018) database 19 | - `4_optional_download_neutral.sh` - (optional) script that downloads SFW neutral images from the [Caltech256](http://www.vision.caltech.edu/Image_Datasets/Caltech256/) dataset 20 | - `5_create_train.sh` - creates the `data/train` directory and copies all `*.jpg` and `*.jpeg` files into it from `raw_data`. Also removes corrupted images 21 | - `6_create_test.sh` - creates the `data/test` directory and moves `N=2000` random files per class from `data/train` to `data/test` (change this number inside the script if you need a different train/test split). Alternatively, you can run it multiple times; each run moves another `N` images per class from `data/train` to `data/test`. 22 | 23 | ## Prerequisites 24 | - Python 3 environment: `conda env create -f environment.yml` 25 | - Java runtime environment: 26 | - Ubuntu Linux: `sudo apt-get install default-jre` 27 | - Linux command-line tools: `wget`, `convert` (from the `imagemagick` suite), `rsync`, `shuf` 28 | 29 | ## How to run 30 | Change the working directory to `scripts` and execute each script in the sequence indicated by the number in its file name, e.g.: 31 | ```bash 32 | $ bash 1_get_urls.sh # has already been run 33 | $ find ../raw_data -name "urls_*.txt" -exec sh -c "echo Number of URLs in {}: ; cat {} | wc -l" \; 34 | Number of URLs in ../raw_data/drawings/urls_drawings.txt: 35 | 25732 36 | Number of URLs in ../raw_data/hentai/urls_hentai.txt: 37 | 45228 38 | Number of URLs in ../raw_data/neutral/urls_neutral.txt: 39 | 20960 40 | Number of URLs in ../raw_data/sexy/urls_sexy.txt: 41 | 19554 42 | Number of URLs in ../raw_data/porn/urls_porn.txt: 43 | 116521 44 | $ bash 2_download_from_urls.sh 45 | $ bash 3_optional_download_drawings.sh # optional 46 | $ bash 4_optional_download_neutral.sh # optional 47 | $ bash 5_create_train.sh 48 | $ bash 6_create_test.sh 49 | $ cd ../data 50 | $ ls train 51 | drawings hentai neutral porn sexy 52 | $ ls test 53 | drawings hentai neutral porn sexy 54 | ``` 55 | 56 | I was able to train a CNN classifier to 91% accuracy with the following confusion matrix: 57 | ![alt text](confusion_matrix.png) 58 | 59 | As expected, `drawings` (anime) and `hentai` are confused with each other more frequently than with other classes. 60 | 61 | The same goes for the `porn` and `sexy` categories.
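The repository stops at building the `data/train` and `data/test` folders; the classifier training itself is up to you. Because the scripts produce a standard folder-per-class layout, most frameworks can consume it directly. Below is a minimal, hypothetical loading sketch assuming PyTorch and `torchvision` are installed (they are *not* part of `environment.yml`):

```python
# Minimal sketch: load the data/train and data/test folders produced by the scripts above.
# Assumes `pip install torch torchvision` - these packages are NOT in environment.yml.
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),  # resize to a common CNN input size
    transforms.ToTensor(),
])

# Each subdirectory (drawings, hentai, neutral, porn, sexy) becomes one class label.
train_set = datasets.ImageFolder("../data/train", transform=preprocess)
test_set = datasets.ImageFolder("../data/test", transform=preprocess)

train_loader = DataLoader(train_set, batch_size=32, shuffle=True, num_workers=4)
test_loader = DataLoader(test_set, batch_size=32, shuffle=False, num_workers=4)

print(train_set.classes)  # ['drawings', 'hentai', 'neutral', 'porn', 'sexy']
```

Any other framework that understands a folder-per-class layout (e.g. Keras' `flow_from_directory`) works just as well; the CNN behind the 91% figure above is not included in this repository.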
62 | -------------------------------------------------------------------------------- /confusion_matrix.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/EBazarov/nsfw_data_scrapper/15a903bcf7ad32af5d949f77edba928ed9df96d6/confusion_matrix.png -------------------------------------------------------------------------------- /environment.yml: -------------------------------------------------------------------------------- 1 | name: nsfw_data_scrapper 2 | channels: 3 | - anaconda 4 | - defaults 5 | - conda-forge 6 | dependencies: 7 | - asn1crypto=0.24.0=py37_0 8 | - beautifulsoup4=4.6.3=py37_0 9 | - ca-certificates=2018.03.07=0 10 | - certifi=2018.10.15=py37_0 11 | - cffi=1.11.5=py37he75722e_1 12 | - chardet=3.0.4=py37_1 13 | - cryptography=2.4.1=py37h1ba5d50_0 14 | - icu=58.2=h211956c_0 15 | - idna=2.7=py37_0 16 | - libedit=3.1.20170329=h6b74fdf_2 17 | - libffi=3.2.1=h4deb6c0_3 18 | - libgcc-ng=8.2.0=hdf63c60_1 19 | - libstdcxx-ng=8.2.0=hdf63c60_1 20 | - libxml2=2.9.8=h26e45fe_1 21 | - libxslt=1.1.32=h1312cb7_0 22 | - lxml=4.2.5=py37hefd8a0e_0 23 | - ncurses=6.1=he6710b0_1 24 | - openssl=1.1.1=h7b6447c_0 25 | - pip=18.1=py37_0 26 | - pycparser=2.19=py37_0 27 | - pyopenssl=18.0.0=py37_0 28 | - pysocks=1.6.8=py37_0 29 | - python=3.7.1=h0371630_3 30 | - readline=7.0=h7b6447c_5 31 | - requests=2.20.1=py37_0 32 | - setuptools=40.6.2=py37_0 33 | - six=1.11.0=py37_1 34 | - sqlite=3.25.3=h7b6447c_0 35 | - tk=8.6.8=hbc83047_0 36 | - urllib3=1.23=py37_0 37 | - wheel=0.32.3=py37_0 38 | - xz=5.2.4=h14c3975_4 39 | - zlib=1.2.11=h7b6447c_3 40 | -------------------------------------------------------------------------------- /scripts/1_get_urls.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | scripts_dir="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )" 4 | base_dir="$(dirname "$scripts_dir")" 5 | raw_data_dir="$base_dir/raw_data" 6 | 7 | declare -a class_names=( 8 | "neutral" 9 | "drawings" 10 | "sexy" 11 | "porn" 12 | "hentai" 13 | ) 14 | 15 | for cname in "${class_names[@]}" 16 | do 17 | echo "Getting images for class: $cname" 18 | urls_file="$raw_data_dir/$cname/urls_$cname.txt" 19 | while read url 20 | do 21 | if [[ ! 
"$url" =~ ^"#" ]] 22 | then 23 | echo "$url" 24 | java -jar "$scripts_dir/ripme.jar" --skip404 --ripsdirectory "$raw_data_dir/$cname" --url "$url" 25 | fi 26 | done < "$scripts_dir/source_urls/$cname.txt" 27 | done 28 | 29 | 30 | declare -a ph_porn_keywords=( 31 | "blowjob" 32 | "sex" 33 | "gangbang" 34 | "fingering" 35 | "asian" 36 | "bukkake" 37 | "teen" 38 | "cumshot" 39 | "milf" 40 | "pussy" 41 | "creampie" 42 | ) 43 | 44 | for keyword in "${ph_porn_keywords[@]}" 45 | do 46 | urls_file="$raw_data_dir/porn/urls.txt" 47 | python download_nsfw_urls.py -k "$keyword" -o "$urls_file" 48 | done 49 | 50 | declare -a ph_hentai_keywords=( 51 | "hentai" 52 | "manga" 53 | "cartoon" 54 | ) 55 | 56 | for keyword in "${ph_hentai_keywords[@]}" 57 | do 58 | urls_file="$raw_data_dir/hentai/urls.txt" 59 | python download_nsfw_urls.py -k "$keyword" -o "$urls_file" 60 | done 61 | 62 | for cname in "${class_names[@]}" 63 | do 64 | urls_file="$raw_data_dir/$cname/urls_$cname.txt" 65 | tmpfile=$(mktemp) 66 | find "$raw_data_dir/$cname" -type f -name "urls.txt" -exec cat {} + >> "$tmpfile" 67 | grep -e ".jpeg" -e ".jpg" "$tmpfile" > "$urls_file" 68 | sort -u -o "$urls_file" "$urls_file" 69 | rm "$tmpfile" 70 | done -------------------------------------------------------------------------------- /scripts/2_download_from_urls.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | scripts_dir="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )" 4 | base_dir="$(dirname "$scripts_dir")" 5 | raw_data_dir="$base_dir/raw_data" 6 | 7 | declare -a class_names=( 8 | "neutral" 9 | "drawings" 10 | "sexy" 11 | "porn" 12 | "hentai" 13 | ) 14 | 15 | for cname in "${class_names[@]}" 16 | do 17 | urls_file="$raw_data_dir/$cname/urls_$cname.txt" 18 | images_dir="$raw_data_dir/$cname/IMAGES" 19 | mkdir -p "$images_dir" 20 | echo "Class: $cname. Total # of urls: $(cat $urls_file | wc -l)" 21 | echo "Downloading images..." 
22 | wget -nc -q -i "$urls_file" -P "$images_dir" 23 | done 24 | -------------------------------------------------------------------------------- /scripts/3_optional_download_drawings.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | scripts_dir="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )" 4 | base_dir="$(dirname "$scripts_dir")" 5 | raw_data_dir="$base_dir/raw_data" 6 | drawings_dir="$raw_data_dir/drawings" 7 | download_dir="$drawings_dir/anime_dataset" 8 | mkdir -p "$download_dir" 9 | 10 | 11 | n_batches=4 12 | # since the numbering starts at 0, actual number of batches will be `n_batches + 1` 13 | # each batch contains roughly 2200 images 14 | for batch_num in $(seq -f "%04g" 0 "$n_batches") 15 | do 16 | rsync --recursive --times "rsync://78.46.86.149:873/danbooru2018/512px/$batch_num" "$download_dir" 17 | done 18 | 19 | # for whatever reason, most images contain black borders 20 | # need to remove them 21 | for subdir_name in $(ls "$download_dir") 22 | do 23 | find "$download_dir/$subdir_name" -name "*.jpg" -print0 | 24 | while IFS= read -r -d '' img 25 | do 26 | convert "$img" -bordercolor black -border 1x1 -fuzz 20% -trim "$img" 27 | done 28 | done -------------------------------------------------------------------------------- /scripts/4_optional_download_neutral.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | scripts_dir="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )" 4 | base_dir="$(dirname "$scripts_dir")" 5 | raw_data_dir="$base_dir/raw_data" 6 | mkdir -p "$raw_data_dir/neutral" 7 | 8 | wget http://www.vision.caltech.edu/Image_Datasets/Caltech256/256_ObjectCategories.tar -P "$raw_data_dir/neutral" 9 | tar -xf "$raw_data_dir/neutral/256_ObjectCategories.tar" -C "$raw_data_dir/neutral" 10 | -------------------------------------------------------------------------------- /scripts/5_create_train.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | scripts_dir="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )" 4 | base_dir="$(dirname "$scripts_dir")" 5 | raw_data_dir="$base_dir/raw_data" 6 | data_dir="$base_dir/data" 7 | 8 | declare -a class_names=( 9 | "neutral" 10 | "drawings" 11 | "sexy" 12 | "porn" 13 | "hentai" 14 | ) 15 | 16 | train_dir="$data_dir/train" 17 | mkdir -p "$train_dir" 18 | 19 | echo "Copying image to the training folder" 20 | for cname in "${class_names[@]}" 21 | do 22 | raw_data_class_dir="$raw_data_dir/$cname" 23 | if [[ -d "$raw_data_class_dir" ]] 24 | then 25 | mkdir -p "$train_dir/$cname" 26 | find "$raw_data_class_dir" -type f \( -name '*.jpg' -o -name '*.jpeg' \) -print0 | 27 | while IFS= read -r -d '' jpg_f 28 | do 29 | cp "$jpg_f" "$train_dir/$cname/$(uuidgen).jpg" 30 | done 31 | fi 32 | done 33 | 34 | echo "Removing corrupted images" 35 | find "$train_dir" -type f -name '*.jpg' -print0 | 36 | while IFS= read -r -d '' jpg_f 37 | do 38 | is_corrupted="$(convert "$jpg_f" /dev/null &> /dev/null; echo $?)" 39 | if [[ "$is_corrupted" -eq "1" ]] 40 | then 41 | echo "removing: $jpg_f" 42 | rm "$jpg_f" 43 | fi 44 | done 45 | 46 | echo "Number of files per class:" 47 | for subdir in $(ls "$train_dir") 48 | do 49 | echo "$subdir": "$(find "$train_dir/$subdir" -type f | wc -l)" 50 | done -------------------------------------------------------------------------------- /scripts/6_create_test.sh: 
-------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | scripts_dir="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )" 4 | base_dir="$(dirname "$scripts_dir")" 5 | data_dir="$base_dir/data" 6 | 7 | N=2000 8 | 9 | declare -a class_names=( 10 | "neutral" 11 | "drawings" 12 | "sexy" 13 | "porn" 14 | "hentai" 15 | ) 16 | 17 | 18 | train_dir="$data_dir/train" 19 | test_dir="$data_dir/test" 20 | mkdir -p "$test_dir" 21 | 22 | for cname in "${class_names[@]}" 23 | do 24 | test_dir_class="$test_dir/$cname" 25 | mkdir -p "$test_dir_class" 26 | train_dir_class="$train_dir/$cname" 27 | ls "$train_dir_class" | shuf -n $N | xargs -I{} mv "$train_dir_class/{}" "$test_dir_class" 28 | done 29 | 30 | echo "Number of files per class (train):" 31 | for subdir in $(ls "$train_dir") 32 | do 33 | echo "$subdir": "$(find "$train_dir/$subdir" -type f | wc -l)" 34 | done 35 | 36 | echo "Number of files per class (test):" 37 | for subdir in $(ls "$test_dir") 38 | do 39 | echo "$subdir": "$(find "$test_dir/$subdir" -type f | wc -l)" 40 | done -------------------------------------------------------------------------------- /scripts/download_nsfw_urls.py: -------------------------------------------------------------------------------- 1 | import requests 2 | from bs4 import BeautifulSoup 3 | 4 | BASE_URL = 'https://www.pornhub.com' 5 | 6 | 7 | def get_albums_urls_from_keyword(keyword, max_page_num=30): 8 | album_urls = [] 9 | for page_num in range(1, max_page_num + 1): 10 | search_url = f'{BASE_URL}/albums?search={keyword}&page={page_num}' 11 | html = requests.get(search_url).text 12 | bs_data = BeautifulSoup(html, "lxml") 13 | album_divs = bs_data.find_all("div", {"class": "photoAlbumListBlock"}) 14 | 15 | for div in album_divs: 16 | try: 17 | album_url = div.find_all("a", href=True)[0].attrs["href"] 18 | album_urls.append(f'{BASE_URL}{album_url}') 19 | except Exception as e: 20 | print(e) 21 | return album_urls 22 | 23 | 24 | def get_preview_urls_from_album_url(album_url): 25 | html = requests.get(album_url).text 26 | bs_data = BeautifulSoup(html, "lxml") 27 | preview_divs = bs_data.find_all("div", {"class": "photoAlbumListBlock"}) 28 | preview_urls = [] 29 | for preview_div in preview_divs: 30 | short_preview_url = preview_div.find_all("a", href=True)[0].attrs["href"] 31 | preview_urls.append(f'{BASE_URL}{short_preview_url}') 32 | return preview_urls 33 | 34 | 35 | def get_image_url_from_preview_url(preview_url): 36 | html = requests.get(preview_url).text 37 | bs_data = BeautifulSoup(html, "lxml") 38 | try: 39 | center_img_div = bs_data.find_all("div", {"class": "centerImage"})[1] 40 | except Exception as e: 41 | print(f"Couldn't get image url from preview url {preview_url}:\n{str(e)}") 42 | return None 43 | image_url = center_img_div.find_all("a", href=True)[-1].find_all("img")[0]['src'] 44 | return image_url 45 | 46 | 47 | def print_image_urls_from_keyword(keyword, out_file, jpg_only=True, max_page_num=30): 48 | album_urls = get_albums_urls_from_keyword(keyword, max_page_num) 49 | if len(album_urls) > 0: 50 | for album_url in album_urls: 51 | preview_urls = get_preview_urls_from_album_url(album_url) 52 | if len(preview_urls) > 0: 53 | for preview_url in preview_urls: 54 | if len(preview_urls) > 0: 55 | image_url = get_image_url_from_preview_url(preview_url) 56 | if image_url is not None: 57 | if (jpg_only and image_url.endswith('.jpg')) or (not jpg_only): 58 | with open(out_file, "a") as f: 59 | f.write(image_url + "\n") 60 | 61 | 62 | if __name__ 
== "__main__": 63 | from argparse import ArgumentParser 64 | 65 | parser = ArgumentParser() 66 | parser.add_argument("-k", "--keyword", dest="keyword", help="search keyword", required=True) 67 | parser.add_argument("-o", "--out_file", dest="out_file", help="output filepath", required=True) 68 | parser.add_argument("-j", "--jpg_only", dest="jpg_only", help="download jpg only", default=True) 69 | parser.add_argument("-p", "--page_num", dest="page_num", help="maximum number of page to parse", default=30) 70 | args = parser.parse_args() 71 | print_image_urls_from_keyword(keyword=args.keyword, out_file=args.out_file, 72 | jpg_only=bool(args.jpg_only), max_page_num=int(args.page_num)) 73 | -------------------------------------------------------------------------------- /scripts/rip.properties: -------------------------------------------------------------------------------- 1 | # Download threads to use per ripper 2 | threads.size = 5 3 | 4 | # Overwrite existing files 5 | file.overwrite = false 6 | 7 | # Number of retries on failed downloads 8 | download.retries = 1 9 | 10 | # File download timeout (in milliseconds) 11 | download.timeout = 60000 12 | 13 | # Page download timeout (in milliseconds) 14 | page.timeout = 5000 15 | 16 | # Maximum size of downloaded files in bytes (required) 17 | download.max_size = 104857600 18 | 19 | # Don't retry on 404 errors 20 | error.skip404 = true 21 | 22 | # API creds 23 | twitter.auth = VW9Ybjdjb1pkd2J0U3kwTUh2VXVnOm9GTzVQVzNqM29LQU1xVGhnS3pFZzhKbGVqbXU0c2lHQ3JrUFNNZm8= 24 | tumblr.auth = JFNLu3CbINQjRdUvZibXW9VpSEVYYtiPJ86o8YmvgLZIoKyuNX 25 | gw.api = gonewild 26 | 27 | twitter.max_requests = 10 28 | 29 | clipboard.autorip = false 30 | 31 | download.save_order = false 32 | remember.url_history = false 33 | window.position = false 34 | descriptions.save = false 35 | auto.update = false 36 | log.level = Log level: Error 37 | play.sound = false 38 | download.show_popup = false 39 | log.save = false 40 | urls_only.save = true 41 | album_titles.save = false 42 | prefer.mp4 = false 43 | errors.skip404 = true 44 | rips.directory = /home/al_kim/Documents/MyGithub/nsfw_data_scrapper/raw_data/neutral 45 | download.history = https://www.reddit.com/r/mildlypenis/top/?t=all,https://www.reddit.com/r/mildlyvagina/top/?t=all,https://www.reddit.com/r/Damnthatsinteresting/top/?t=all,https://www.reddit.com/r/tattoos/top/?t=all,https://www.reddit.com/r/progresspics/top/?t=all,https://www.reddit.com/r/photoshopbattles/top/?t=all,https://www.reddit.com/r/aww/top/?t=all,https://www.reddit.com/r/funny/top/?t=all,https://www.reddit.com/r/pics/top/?t=all,https://www.reddit.com/r/photographs/top/?t=all,https://www.reddit.com/r/photography/top/?t=all,https://www.reddit.com/r/EarthPorn/top/?t=all,https://www.reddit.com/r/HistoryPorn/top/?t=all 46 | -------------------------------------------------------------------------------- /scripts/ripme.jar: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/EBazarov/nsfw_data_scrapper/15a903bcf7ad32af5d949f77edba928ed9df96d6/scripts/ripme.jar -------------------------------------------------------------------------------- /scripts/source_urls/drawings.txt: -------------------------------------------------------------------------------- 1 | https://www.reddit.com/r/AnimeCalendar/top/?t=all 2 | https://www.reddit.com/r/Melanime/top/?t=all 3 | https://www.reddit.com/r/anime/top/?t=all 4 | https://www.reddit.com/r/Boruto/top/?t=all 5 | https://www.reddit.com/r/overlord/top/?t=all 6 | 
https://www.reddit.com/r/streetmoe/top/?t=all 7 | https://www.reddit.com/r/animeponytails/top/?t=all 8 | https://www.reddit.com/r/awoonime/top/?t=all 9 | https://www.reddit.com/r/awwnime/top/?t=all 10 | https://www.reddit.com/r/bishounen/top/?t=all 11 | https://www.reddit.com/r/cutelittlefangs/top/?t=all 12 | https://www.reddit.com/r/cuteanimeboys/top/?t=all 13 | https://www.reddit.com/r/endcard/top/?t=all 14 | https://www.reddit.com/r/gunime/top/?t=all 15 | https://www.reddit.com/r/Moescape/top/?t=all 16 | https://www.reddit.com/r/headpats/top/?t=all 17 | https://www.reddit.com/r/imouto/top/?t=all 18 | https://www.reddit.com/r/kyoaniyuri/top/?t=all 19 | https://www.reddit.com/r/Patchuu/top/?t=all 20 | https://www.reddit.com/r/Pixiv/top/?t=all 21 | https://www.reddit.com/r/pokemoe/top/?t=all 22 | https://www.reddit.com/r/Pouts/top/?t=all 23 | https://www.reddit.com/r/SleepingAnimeGirls/top/?t=all 24 | https://www.reddit.com/r/Tsunderes/top/?t=all 25 | https://www.reddit.com/r/twintails/top/?t=all 26 | https://www.reddit.com/r/WholesomeYuri/top/?t=all 27 | https://www.reddit.com/r/zettairyouiki/top/?t=all -------------------------------------------------------------------------------- /scripts/source_urls/hentai.txt: -------------------------------------------------------------------------------- 1 | https://www.reddit.com/r/hentai/top/?t=all 2 | https://www.reddit.com/r/HentaiHumiliation/top/?t=all 3 | https://www.reddit.com/r/HentaiPics/top/?t=all 4 | https://www.reddit.com/r/HentaiLovers/top/?t=all 5 | https://www.reddit.com/r/Hentai4Everyone/top/?t=all 6 | https://www.reddit.com/r/ecchi/top/?t=all 7 | https://www.reddit.com/r/MonsterGirl/top/?t=all 8 | https://www.reddit.com/r/sukebei/top/?t=all 9 | https://www.reddit.com/r/yaoi/top/?t=all -------------------------------------------------------------------------------- /scripts/source_urls/neutral.txt: -------------------------------------------------------------------------------- 1 | https://www.reddit.com/r/mildlypenis/top/?t=all 2 | https://www.reddit.com/r/mildlyvagina/top/?t=all 3 | https://www.reddit.com/r/Damnthatsinteresting/top/?t=all 4 | https://www.reddit.com/r/tattoos/top/?t=all 5 | https://www.reddit.com/r/progresspics/top/?t=all 6 | https://www.reddit.com/r/photoshopbattles/top/?t=all 7 | https://www.reddit.com/r/aww/top/?t=all 8 | https://www.reddit.com/r/funny/top/?t=all 9 | https://www.reddit.com/r/pics/top/?t=all 10 | https://www.reddit.com/r/photographs/top/?t=all 11 | https://www.reddit.com/r/photography/top/?t=all 12 | https://www.reddit.com/r/EarthPorn/top/?t=all 13 | https://www.reddit.com/r/HistoryPorn/top/?t=all 14 | https://www.reddit.com/r/Rateme/top/?t=all 15 | https://www.reddit.com/r/roastme/top/?t=all 16 | https://www.reddit.com/r/wtfstockphotos/top/?t=all 17 | https://www.reddit.com/r/mildlyinteresting/top/?t=all 18 | https://www.reddit.com/r/interestingasfuck/top/?t=all 19 | https://www.reddit.com/r/wholesomememes/top/?t=all 20 | https://www.reddit.com/r/memes/top/?t=all 21 | https://www.reddit.com/r/FoodPorn/top/?t=all 22 | https://www.reddit.com/r/desert/top/?t=all 23 | https://www.reddit.com/r/desertporn/top/?t=all 24 | https://www.reddit.com/r/ImaginaryDeserts/top/?t=all 25 | https://www.reddit.com/r/mildlyboobs/top/?t=all 26 | -------------------------------------------------------------------------------- /scripts/source_urls/porn.txt: -------------------------------------------------------------------------------- 1 | https://www.reddit.com/r/gangbang/top/?t=all 2 | 
https://www.reddit.com/r/Hegoesdown/top/?t=all 3 | https://www.reddit.com/r/Pussylicking/top/?t=all 4 | https://www.reddit.com/r/anal/top/?t=all 5 | https://www.reddit.com/r/pornpics/top/?t=all 6 | https://www.reddit.com/r/blowjob/top/?t=all 7 | https://www.reddit.com/r/blowbang/top/?t=all 8 | https://www.reddit.com/r/Triplepenetration/top/?t=all -------------------------------------------------------------------------------- /scripts/source_urls/sexy.txt: -------------------------------------------------------------------------------- 1 | https://www.reddit.com/r/celebritylegs/top/?t=all 2 | https://www.reddit.com/r/Models/top/?t=all 3 | https://www.reddit.com/r/VSModels/top/?t=all 4 | https://www.reddit.com/r/goddesses/top/?t=all 5 | https://www.reddit.com/r/tightdresses/top/?t=all 6 | https://www.reddit.com/r/girlsinyogapants/top/?t=all 7 | https://www.reddit.com/r/burstingout/top/?t=all 8 | https://www.reddit.com/r/randomsexiness/top/?t=all 9 | https://www.reddit.com/r/BustyPetite/top/?t=all 10 | https://www.reddit.com/r/SexyTummies/top/?t=all 11 | https://www.reddit.com/r/VolleyballGirls/top/?t=all 12 | https://www.reddit.com/r/RealGirls/top/?t=all 13 | https://www.reddit.com/r/sexygirls/top/?t=all 14 | https://www.reddit.com/r/stripgirls/top/?t=all 15 | https://www.reddit.com/r/HotGirlsNSFW/top/?t=all 16 | https://www.reddit.com/r/OnePieceSuits/top/?t=all 17 | https://www.reddit.com/r/swimsuit/top/?t=all 18 | https://www.reddit.com/r/nsfwswimsuit/top/?t=all 19 | https://www.reddit.com/r/leotards/top/?t=all 20 | https://www.reddit.com/r/swimsuits/top/?t=all 21 | https://www.reddit.com/r/bikinis/top/?t=all 22 | https://www.reddit.com/r/CrochetBikinis/top/?t=all 23 | https://www.reddit.com/r/MicroBikini/top/?t=all 24 | https://www.reddit.com/r/asiansinswimsuits/top/?t=all 25 | --------------------------------------------------------------------------------