├── LICENSE ├── README.md ├── confusion_matrix.png ├── environment.yml ├── raw_data ├── drawings │ └── urls_drawings.txt ├── hentai │ └── urls_hentai.txt ├── neutral │ └── urls_neutral.txt ├── porn │ └── urls_porn.txt └── sexy │ └── urls_sexy.txt └── scripts ├── 1_get_urls.sh ├── 2_download_from_urls.sh ├── 3_optional_download_drawings.sh ├── 4_optional_download_neutral.sh ├── 5_create_train.sh ├── 6_create_test.sh ├── download_nsfw_urls.py ├── rip.properties ├── ripme.jar └── source_urls ├── drawings.txt ├── hentai.txt ├── neutral.txt ├── porn.txt └── sexy.txt /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2019 Alexander Kim 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 22 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # NSFW Data Scrapper 2 | 3 | ## Description 4 | 5 | This is a set of scripts that automates the collection of _tens of thousands_ of images for the following (loosely defined) categories, to be used later for training an image classifier: 6 | - `porn` - pornography images 7 | - `hentai` - hentai images, including pornographic drawings 8 | - `sexy` - sexually explicit images, but not pornography. Think nude photos, Playboy shots, bikinis, beach volleyball, etc. 9 | - `neutral` - safe-for-work neutral images of everyday things and people 10 | - `drawings` - safe-for-work drawings (including anime) 11 | 12 | **Note**: the scripts have only been tested on the Ubuntu 16.04 Linux distribution. 13 | 14 | Here is what each script (located in the `scripts` directory) does: 15 | - `1_get_urls.sh` - iterates through the text files under `scripts/source_urls`, downloading image URLs for each of the 5 categories above. The [Ripme](https://github.com/RipMeApp/ripme) application performs all the heavy lifting. The source URLs are mostly links to various subreddits, but could be any website that Ripme supports. 16 | *Note*: I already ran this script for you, and its outputs are located in the `raw_data` directory.
There is no need to rerun it unless you edit the files under `scripts/source_urls`. 17 | - `2_download_from_urls.sh` - downloads the actual images for the URLs found in the text files in the `raw_data` directory 18 | - `3_optional_download_drawings.sh` - (optional) script that downloads SFW anime images from the [Danbooru2018](https://www.gwern.net/Danbooru2018) database 19 | - `4_optional_download_neutral.sh` - (optional) script that downloads SFW neutral images from the [Caltech256](http://www.vision.caltech.edu/Image_Datasets/Caltech256/) dataset 20 | - `5_create_train.sh` - creates the `data/train` directory and copies all `*.jpg` and `*.jpeg` files into it from `raw_data`. Also removes corrupted images 21 | - `6_create_test.sh` - creates the `data/test` directory and moves `N=2000` random files per class from `data/train` to `data/test` (change this number inside the script if you need a different train/test split). Alternatively, you can run it multiple times; each run moves another `N` images per class from `data/train` to `data/test`. 22 | 23 | ## Prerequisites 24 | - Python 3 environment: `conda env create -f environment.yml` 25 | - Java runtime environment: 26 | - Ubuntu Linux: `sudo apt-get install default-jre` 27 | - Linux command-line tools: `wget`, `convert` (from the `imagemagick` suite), `rsync`, `shuf` 28 | 29 | ## How to run 30 | Change the working directory to `scripts` and execute each script in the sequence indicated by the number in its file name, e.g.: 31 | ```bash 32 | $ bash 1_get_urls.sh # has already been run 33 | $ find ../raw_data -name "urls_*.txt" -exec sh -c "echo Number of URLs in {}: ; cat {} | wc -l" \; 34 | Number of URLs in ../raw_data/drawings/urls_drawings.txt: 35 | 25732 36 | Number of URLs in ../raw_data/hentai/urls_hentai.txt: 37 | 45228 38 | Number of URLs in ../raw_data/neutral/urls_neutral.txt: 39 | 20960 40 | Number of URLs in ../raw_data/sexy/urls_sexy.txt: 41 | 19554 42 | Number of URLs in ../raw_data/porn/urls_porn.txt: 43 | 116521 44 | $ bash 2_download_from_urls.sh 45 | $ bash 3_optional_download_drawings.sh # optional 46 | $ bash 4_optional_download_neutral.sh # optional 47 | $ bash 5_create_train.sh 48 | $ bash 6_create_test.sh 49 | $ cd ../data 50 | $ ls train 51 | drawings hentai neutral porn sexy 52 | $ ls test 53 | drawings hentai neutral porn sexy 54 | ``` 55 | 56 | I was able to train a CNN classifier to 91% accuracy with the following confusion matrix: 57 | ![alt text](confusion_matrix.png) 58 | 59 | As expected, `drawings` (anime) and `hentai` are confused with each other more frequently than with other classes. 60 | 61 | The same goes for the `porn` and `sexy` categories.
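The repository stops at building the `data/train` and `data/test` folders; the classifier training itself is up to you. Because the scripts produce a standard folder-per-class layout, most frameworks can consume it directly. Below is a minimal, hypothetical loading sketch assuming PyTorch and `torchvision` are installed (they are *not* part of `environment.yml`):

```python
# Minimal sketch: load the data/train and data/test folders produced by the scripts above.
# Assumes `pip install torch torchvision` - these packages are NOT in environment.yml.
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),  # resize to a common CNN input size
    transforms.ToTensor(),
])

# Each subdirectory (drawings, hentai, neutral, porn, sexy) becomes one class label.
train_set = datasets.ImageFolder("../data/train", transform=preprocess)
test_set = datasets.ImageFolder("../data/test", transform=preprocess)

train_loader = DataLoader(train_set, batch_size=32, shuffle=True, num_workers=4)
test_loader = DataLoader(test_set, batch_size=32, shuffle=False, num_workers=4)

print(train_set.classes)  # ['drawings', 'hentai', 'neutral', 'porn', 'sexy']
```

Any other framework that understands a folder-per-class layout (e.g. Keras' `flow_from_directory`) works just as well; the CNN behind the 91% figure above is not included in this repository.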
62 | -------------------------------------------------------------------------------- /confusion_matrix.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/EBazarov/nsfw_data_scrapper/15a903bcf7ad32af5d949f77edba928ed9df96d6/confusion_matrix.png -------------------------------------------------------------------------------- /environment.yml: -------------------------------------------------------------------------------- 1 | name: nsfw_data_scrapper 2 | channels: 3 | - anaconda 4 | - defaults 5 | - conda-forge 6 | dependencies: 7 | - asn1crypto=0.24.0=py37_0 8 | - beautifulsoup4=4.6.3=py37_0 9 | - ca-certificates=2018.03.07=0 10 | - certifi=2018.10.15=py37_0 11 | - cffi=1.11.5=py37he75722e_1 12 | - chardet=3.0.4=py37_1 13 | - cryptography=2.4.1=py37h1ba5d50_0 14 | - icu=58.2=h211956c_0 15 | - idna=2.7=py37_0 16 | - libedit=3.1.20170329=h6b74fdf_2 17 | - libffi=3.2.1=h4deb6c0_3 18 | - libgcc-ng=8.2.0=hdf63c60_1 19 | - libstdcxx-ng=8.2.0=hdf63c60_1 20 | - libxml2=2.9.8=h26e45fe_1 21 | - libxslt=1.1.32=h1312cb7_0 22 | - lxml=4.2.5=py37hefd8a0e_0 23 | - ncurses=6.1=he6710b0_1 24 | - openssl=1.1.1=h7b6447c_0 25 | - pip=18.1=py37_0 26 | - pycparser=2.19=py37_0 27 | - pyopenssl=18.0.0=py37_0 28 | - pysocks=1.6.8=py37_0 29 | - python=3.7.1=h0371630_3 30 | - readline=7.0=h7b6447c_5 31 | - requests=2.20.1=py37_0 32 | - setuptools=40.6.2=py37_0 33 | - six=1.11.0=py37_1 34 | - sqlite=3.25.3=h7b6447c_0 35 | - tk=8.6.8=hbc83047_0 36 | - urllib3=1.23=py37_0 37 | - wheel=0.32.3=py37_0 38 | - xz=5.2.4=h14c3975_4 39 | - zlib=1.2.11=h7b6447c_3 40 | -------------------------------------------------------------------------------- /scripts/1_get_urls.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | scripts_dir="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )" 4 | base_dir="$(dirname "$scripts_dir")" 5 | raw_data_dir="$base_dir/raw_data" 6 | 7 | declare -a class_names=( 8 | "neutral" 9 | "drawings" 10 | "sexy" 11 | "porn" 12 | "hentai" 13 | ) 14 | 15 | for cname in "${class_names[@]}" 16 | do 17 | echo "Getting images for class: $cname" 18 | urls_file="$raw_data_dir/$cname/urls_$cname.txt" 19 | while read url 20 | do 21 | if [[ ! 
"$url" =~ ^"#" ]] 22 | then 23 | echo "$url" 24 | java -jar "$scripts_dir/ripme.jar" --skip404 --ripsdirectory "$raw_data_dir/$cname" --url "$url" 25 | fi 26 | done < "$scripts_dir/source_urls/$cname.txt" 27 | done 28 | 29 | 30 | declare -a ph_porn_keywords=( 31 | "blowjob" 32 | "sex" 33 | "gangbang" 34 | "fingering" 35 | "asian" 36 | "bukkake" 37 | "teen" 38 | "cumshot" 39 | "milf" 40 | "pussy" 41 | "creampie" 42 | ) 43 | 44 | for keyword in "${ph_porn_keywords[@]}" 45 | do 46 | urls_file="$raw_data_dir/porn/urls.txt" 47 | python download_nsfw_urls.py -k "$keyword" -o "$urls_file" 48 | done 49 | 50 | declare -a ph_hentai_keywords=( 51 | "hentai" 52 | "manga" 53 | "cartoon" 54 | ) 55 | 56 | for keyword in "${ph_hentai_keywords[@]}" 57 | do 58 | urls_file="$raw_data_dir/hentai/urls.txt" 59 | python download_nsfw_urls.py -k "$keyword" -o "$urls_file" 60 | done 61 | 62 | for cname in "${class_names[@]}" 63 | do 64 | urls_file="$raw_data_dir/$cname/urls_$cname.txt" 65 | tmpfile=$(mktemp) 66 | find "$raw_data_dir/$cname" -type f -name "urls.txt" -exec cat {} + >> "$tmpfile" 67 | grep -e ".jpeg" -e ".jpg" "$tmpfile" > "$urls_file" 68 | sort -u -o "$urls_file" "$urls_file" 69 | rm "$tmpfile" 70 | done -------------------------------------------------------------------------------- /scripts/2_download_from_urls.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | scripts_dir="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )" 4 | base_dir="$(dirname "$scripts_dir")" 5 | raw_data_dir="$base_dir/raw_data" 6 | 7 | declare -a class_names=( 8 | "neutral" 9 | "drawings" 10 | "sexy" 11 | "porn" 12 | "hentai" 13 | ) 14 | 15 | for cname in "${class_names[@]}" 16 | do 17 | urls_file="$raw_data_dir/$cname/urls_$cname.txt" 18 | images_dir="$raw_data_dir/$cname/IMAGES" 19 | mkdir -p "$images_dir" 20 | echo "Class: $cname. Total # of urls: $(cat $urls_file | wc -l)" 21 | echo "Downloading images..." 
22 | wget -nc -q -i "$urls_file" -P "$images_dir" 23 | done 24 | -------------------------------------------------------------------------------- /scripts/3_optional_download_drawings.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | scripts_dir="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )" 4 | base_dir="$(dirname "$scripts_dir")" 5 | raw_data_dir="$base_dir/raw_data" 6 | drawings_dir="$raw_data_dir/drawings" 7 | download_dir="$drawings_dir/anime_dataset" 8 | mkdir -p "$download_dir" 9 | 10 | 11 | n_batches=4 12 | # since the numbering starts at 0, actual number of batches will be `n_batches + 1` 13 | # each batch contains roughly 2200 images 14 | for batch_num in $(seq -f "%04g" 0 "$n_batches") 15 | do 16 | rsync --recursive --times "rsync://78.46.86.149:873/danbooru2018/512px/$batch_num" "$download_dir" 17 | done 18 | 19 | # for whatever reason, most images contain black borders 20 | # need to remove them 21 | for subdir_name in $(ls "$download_dir") 22 | do 23 | find "$download_dir/$subdir_name" -name "*.jpg" -print0 | 24 | while IFS= read -r -d '' img 25 | do 26 | convert "$img" -bordercolor black -border 1x1 -fuzz 20% -trim "$img" 27 | done 28 | done -------------------------------------------------------------------------------- /scripts/4_optional_download_neutral.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | scripts_dir="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )" 4 | base_dir="$(dirname "$scripts_dir")" 5 | raw_data_dir="$base_dir/raw_data" 6 | mkdir -p "$raw_data_dir/neutral" 7 | 8 | wget http://www.vision.caltech.edu/Image_Datasets/Caltech256/256_ObjectCategories.tar -P "$raw_data_dir/neutral" 9 | tar -xf "$raw_data_dir/neutral/256_ObjectCategories.tar" -C "$raw_data_dir/neutral" 10 | -------------------------------------------------------------------------------- /scripts/5_create_train.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | scripts_dir="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )" 4 | base_dir="$(dirname "$scripts_dir")" 5 | raw_data_dir="$base_dir/raw_data" 6 | data_dir="$base_dir/data" 7 | 8 | declare -a class_names=( 9 | "neutral" 10 | "drawings" 11 | "sexy" 12 | "porn" 13 | "hentai" 14 | ) 15 | 16 | train_dir="$data_dir/train" 17 | mkdir -p "$train_dir" 18 | 19 | echo "Copying image to the training folder" 20 | for cname in "${class_names[@]}" 21 | do 22 | raw_data_class_dir="$raw_data_dir/$cname" 23 | if [[ -d "$raw_data_class_dir" ]] 24 | then 25 | mkdir -p "$train_dir/$cname" 26 | find "$raw_data_class_dir" -type f \( -name '*.jpg' -o -name '*.jpeg' \) -print0 | 27 | while IFS= read -r -d '' jpg_f 28 | do 29 | cp "$jpg_f" "$train_dir/$cname/$(uuidgen).jpg" 30 | done 31 | fi 32 | done 33 | 34 | echo "Removing corrupted images" 35 | find "$train_dir" -type f -name '*.jpg' -print0 | 36 | while IFS= read -r -d '' jpg_f 37 | do 38 | is_corrupted="$(convert "$jpg_f" /dev/null &> /dev/null; echo $?)" 39 | if [[ "$is_corrupted" -eq "1" ]] 40 | then 41 | echo "removing: $jpg_f" 42 | rm "$jpg_f" 43 | fi 44 | done 45 | 46 | echo "Number of files per class:" 47 | for subdir in $(ls "$train_dir") 48 | do 49 | echo "$subdir": "$(find "$train_dir/$subdir" -type f | wc -l)" 50 | done -------------------------------------------------------------------------------- /scripts/6_create_test.sh: 
-------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | scripts_dir="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )" 4 | base_dir="$(dirname "$scripts_dir")" 5 | data_dir="$base_dir/data" 6 | 7 | N=2000 8 | 9 | declare -a class_names=( 10 | "neutral" 11 | "drawings" 12 | "sexy" 13 | "porn" 14 | "hentai" 15 | ) 16 | 17 | 18 | train_dir="$data_dir/train" 19 | test_dir="$data_dir/test" 20 | mkdir -p "$test_dir" 21 | 22 | for cname in "${class_names[@]}" 23 | do 24 | test_dir_class="$test_dir/$cname" 25 | mkdir -p "$test_dir_class" 26 | train_dir_class="$train_dir/$cname" 27 | ls "$train_dir_class" | shuf -n $N | xargs -I{} mv "$train_dir_class/{}" "$test_dir_class" 28 | done 29 | 30 | echo "Number of files per class (train):" 31 | for subdir in $(ls "$train_dir") 32 | do 33 | echo "$subdir": "$(find "$train_dir/$subdir" -type f | wc -l)" 34 | done 35 | 36 | echo "Number of files per class (test):" 37 | for subdir in $(ls "$test_dir") 38 | do 39 | echo "$subdir": "$(find "$test_dir/$subdir" -type f | wc -l)" 40 | done -------------------------------------------------------------------------------- /scripts/download_nsfw_urls.py: -------------------------------------------------------------------------------- 1 | import requests 2 | from bs4 import BeautifulSoup 3 | 4 | BASE_URL = 'https://www.pornhub.com' 5 | 6 | 7 | def get_albums_urls_from_keyword(keyword, max_page_num=30): 8 | album_urls = [] 9 | for page_num in range(1, max_page_num + 1): 10 | search_url = f'{BASE_URL}/albums?search={keyword}&page={page_num}' 11 | html = requests.get(search_url).text 12 | bs_data = BeautifulSoup(html, "lxml") 13 | album_divs = bs_data.find_all("div", {"class": "photoAlbumListBlock"}) 14 | 15 | for div in album_divs: 16 | try: 17 | album_url = div.find_all("a", href=True)[0].attrs["href"] 18 | album_urls.append(f'{BASE_URL}{album_url}') 19 | except Exception as e: 20 | print(e) 21 | return album_urls 22 | 23 | 24 | def get_preview_urls_from_album_url(album_url): 25 | html = requests.get(album_url).text 26 | bs_data = BeautifulSoup(html, "lxml") 27 | preview_divs = bs_data.find_all("div", {"class": "photoAlbumListBlock"}) 28 | preview_urls = [] 29 | for preview_div in preview_divs: 30 | short_preview_url = preview_div.find_all("a", href=True)[0].attrs["href"] 31 | preview_urls.append(f'{BASE_URL}{short_preview_url}') 32 | return preview_urls 33 | 34 | 35 | def get_image_url_from_preview_url(preview_url): 36 | html = requests.get(preview_url).text 37 | bs_data = BeautifulSoup(html, "lxml") 38 | try: 39 | center_img_div = bs_data.find_all("div", {"class": "centerImage"})[1] 40 | except Exception as e: 41 | print(f"Couldn't get image url from preview url {preview_url}:\n{str(e)}") 42 | return None 43 | image_url = center_img_div.find_all("a", href=True)[-1].find_all("img")[0]['src'] 44 | return image_url 45 | 46 | 47 | def print_image_urls_from_keyword(keyword, out_file, jpg_only=True, max_page_num=30): 48 | album_urls = get_albums_urls_from_keyword(keyword, max_page_num) 49 | if len(album_urls) > 0: 50 | for album_url in album_urls: 51 | preview_urls = get_preview_urls_from_album_url(album_url) 52 | if len(preview_urls) > 0: 53 | for preview_url in preview_urls: 54 | if len(preview_urls) > 0: 55 | image_url = get_image_url_from_preview_url(preview_url) 56 | if image_url is not None: 57 | if (jpg_only and image_url.endswith('.jpg')) or (not jpg_only): 58 | with open(out_file, "a") as f: 59 | f.write(image_url + "\n") 60 | 61 | 62 | if __name__ 
== "__main__": 63 | from argparse import ArgumentParser 64 | 65 | parser = ArgumentParser() 66 | parser.add_argument("-k", "--keyword", dest="keyword", help="search keyword", required=True) 67 | parser.add_argument("-o", "--out_file", dest="out_file", help="output filepath", required=True) 68 | parser.add_argument("-j", "--jpg_only", dest="jpg_only", help="download jpg only", default=True) 69 | parser.add_argument("-p", "--page_num", dest="page_num", help="maximum number of page to parse", default=30) 70 | args = parser.parse_args() 71 | print_image_urls_from_keyword(keyword=args.keyword, out_file=args.out_file, 72 | jpg_only=bool(args.jpg_only), max_page_num=int(args.page_num)) 73 | -------------------------------------------------------------------------------- /scripts/rip.properties: -------------------------------------------------------------------------------- 1 | # Download threads to use per ripper 2 | threads.size = 5 3 | 4 | # Overwrite existing files 5 | file.overwrite = false 6 | 7 | # Number of retries on failed downloads 8 | download.retries = 1 9 | 10 | # File download timeout (in milliseconds) 11 | download.timeout = 60000 12 | 13 | # Page download timeout (in milliseconds) 14 | page.timeout = 5000 15 | 16 | # Maximum size of downloaded files in bytes (required) 17 | download.max_size = 104857600 18 | 19 | # Don't retry on 404 errors 20 | error.skip404 = true 21 | 22 | # API creds 23 | twitter.auth = VW9Ybjdjb1pkd2J0U3kwTUh2VXVnOm9GTzVQVzNqM29LQU1xVGhnS3pFZzhKbGVqbXU0c2lHQ3JrUFNNZm8= 24 | tumblr.auth = JFNLu3CbINQjRdUvZibXW9VpSEVYYtiPJ86o8YmvgLZIoKyuNX 25 | gw.api = gonewild 26 | 27 | twitter.max_requests = 10 28 | 29 | clipboard.autorip = false 30 | 31 | download.save_order = false 32 | remember.url_history = false 33 | window.position = false 34 | descriptions.save = false 35 | auto.update = false 36 | log.level = Log level: Error 37 | play.sound = false 38 | download.show_popup = false 39 | log.save = false 40 | urls_only.save = true 41 | album_titles.save = false 42 | prefer.mp4 = false 43 | errors.skip404 = true 44 | rips.directory = /home/al_kim/Documents/MyGithub/nsfw_data_scrapper/raw_data/neutral 45 | download.history = https://www.reddit.com/r/mildlypenis/top/?t=all,https://www.reddit.com/r/mildlyvagina/top/?t=all,https://www.reddit.com/r/Damnthatsinteresting/top/?t=all,https://www.reddit.com/r/tattoos/top/?t=all,https://www.reddit.com/r/progresspics/top/?t=all,https://www.reddit.com/r/photoshopbattles/top/?t=all,https://www.reddit.com/r/aww/top/?t=all,https://www.reddit.com/r/funny/top/?t=all,https://www.reddit.com/r/pics/top/?t=all,https://www.reddit.com/r/photographs/top/?t=all,https://www.reddit.com/r/photography/top/?t=all,https://www.reddit.com/r/EarthPorn/top/?t=all,https://www.reddit.com/r/HistoryPorn/top/?t=all 46 | -------------------------------------------------------------------------------- /scripts/ripme.jar: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/EBazarov/nsfw_data_scrapper/15a903bcf7ad32af5d949f77edba928ed9df96d6/scripts/ripme.jar -------------------------------------------------------------------------------- /scripts/source_urls/drawings.txt: -------------------------------------------------------------------------------- 1 | https://www.reddit.com/r/AnimeCalendar/top/?t=all 2 | https://www.reddit.com/r/Melanime/top/?t=all 3 | https://www.reddit.com/r/anime/top/?t=all 4 | https://www.reddit.com/r/Boruto/top/?t=all 5 | https://www.reddit.com/r/overlord/top/?t=all 6 | 
https://www.reddit.com/r/streetmoe/top/?t=all 7 | https://www.reddit.com/r/animeponytails/top/?t=all 8 | https://www.reddit.com/r/awoonime/top/?t=all 9 | https://www.reddit.com/r/awwnime/top/?t=all 10 | https://www.reddit.com/r/bishounen/top/?t=all 11 | https://www.reddit.com/r/cutelittlefangs/top/?t=all 12 | https://www.reddit.com/r/cuteanimeboys/top/?t=all 13 | https://www.reddit.com/r/endcard/top/?t=all 14 | https://www.reddit.com/r/gunime/top/?t=all 15 | https://www.reddit.com/r/Moescape/top/?t=all 16 | https://www.reddit.com/r/headpats/top/?t=all 17 | https://www.reddit.com/r/imouto/top/?t=all 18 | https://www.reddit.com/r/kyoaniyuri/top/?t=all 19 | https://www.reddit.com/r/Patchuu/top/?t=all 20 | https://www.reddit.com/r/Pixiv/top/?t=all 21 | https://www.reddit.com/r/pokemoe/top/?t=all 22 | https://www.reddit.com/r/Pouts/top/?t=all 23 | https://www.reddit.com/r/SleepingAnimeGirls/top/?t=all 24 | https://www.reddit.com/r/Tsunderes/top/?t=all 25 | https://www.reddit.com/r/twintails/top/?t=all 26 | https://www.reddit.com/r/WholesomeYuri/top/?t=all 27 | https://www.reddit.com/r/zettairyouiki/top/?t=all -------------------------------------------------------------------------------- /scripts/source_urls/hentai.txt: -------------------------------------------------------------------------------- 1 | https://www.reddit.com/r/hentai/top/?t=all 2 | https://www.reddit.com/r/HentaiHumiliation/top/?t=all 3 | https://www.reddit.com/r/HentaiPics/top/?t=all 4 | https://www.reddit.com/r/HentaiLovers/top/?t=all 5 | https://www.reddit.com/r/Hentai4Everyone/top/?t=all 6 | https://www.reddit.com/r/ecchi/top/?t=all 7 | https://www.reddit.com/r/MonsterGirl/top/?t=all 8 | https://www.reddit.com/r/sukebei/top/?t=all 9 | https://www.reddit.com/r/yaoi/top/?t=all -------------------------------------------------------------------------------- /scripts/source_urls/neutral.txt: -------------------------------------------------------------------------------- 1 | https://www.reddit.com/r/mildlypenis/top/?t=all 2 | https://www.reddit.com/r/mildlyvagina/top/?t=all 3 | https://www.reddit.com/r/Damnthatsinteresting/top/?t=all 4 | https://www.reddit.com/r/tattoos/top/?t=all 5 | https://www.reddit.com/r/progresspics/top/?t=all 6 | https://www.reddit.com/r/photoshopbattles/top/?t=all 7 | https://www.reddit.com/r/aww/top/?t=all 8 | https://www.reddit.com/r/funny/top/?t=all 9 | https://www.reddit.com/r/pics/top/?t=all 10 | https://www.reddit.com/r/photographs/top/?t=all 11 | https://www.reddit.com/r/photography/top/?t=all 12 | https://www.reddit.com/r/EarthPorn/top/?t=all 13 | https://www.reddit.com/r/HistoryPorn/top/?t=all 14 | https://www.reddit.com/r/Rateme/top/?t=all 15 | https://www.reddit.com/r/roastme/top/?t=all 16 | https://www.reddit.com/r/wtfstockphotos/top/?t=all 17 | https://www.reddit.com/r/mildlyinteresting/top/?t=all 18 | https://www.reddit.com/r/interestingasfuck/top/?t=all 19 | https://www.reddit.com/r/wholesomememes/top/?t=all 20 | https://www.reddit.com/r/memes/top/?t=all 21 | https://www.reddit.com/r/FoodPorn/top/?t=all 22 | https://www.reddit.com/r/desert/top/?t=all 23 | https://www.reddit.com/r/desertporn/top/?t=all 24 | https://www.reddit.com/r/ImaginaryDeserts/top/?t=all 25 | https://www.reddit.com/r/mildlyboobs/top/?t=all 26 | -------------------------------------------------------------------------------- /scripts/source_urls/porn.txt: -------------------------------------------------------------------------------- 1 | https://www.reddit.com/r/gangbang/top/?t=all 2 | 
https://www.reddit.com/r/Hegoesdown/top/?t=all 3 | https://www.reddit.com/r/Pussylicking/top/?t=all 4 | https://www.reddit.com/r/anal/top/?t=all 5 | https://www.reddit.com/r/pornpics/top/?t=all 6 | https://www.reddit.com/r/blowjob/top/?t=all 7 | https://www.reddit.com/r/blowbang/top/?t=all 8 | https://www.reddit.com/r/Triplepenetration/top/?t=all -------------------------------------------------------------------------------- /scripts/source_urls/sexy.txt: -------------------------------------------------------------------------------- 1 | https://www.reddit.com/r/celebritylegs/top/?t=all 2 | https://www.reddit.com/r/Models/top/?t=all 3 | https://www.reddit.com/r/VSModels/top/?t=all 4 | https://www.reddit.com/r/goddesses/top/?t=all 5 | https://www.reddit.com/r/tightdresses/top/?t=all 6 | https://www.reddit.com/r/girlsinyogapants/top/?t=all 7 | https://www.reddit.com/r/burstingout/top/?t=all 8 | https://www.reddit.com/r/randomsexiness/top/?t=all 9 | https://www.reddit.com/r/BustyPetite/top/?t=all 10 | https://www.reddit.com/r/SexyTummies/top/?t=all 11 | https://www.reddit.com/r/VolleyballGirls/top/?t=all 12 | https://www.reddit.com/r/RealGirls/top/?t=all 13 | https://www.reddit.com/r/sexygirls/top/?t=all 14 | https://www.reddit.com/r/stripgirls/top/?t=all 15 | https://www.reddit.com/r/HotGirlsNSFW/top/?t=all 16 | https://www.reddit.com/r/OnePieceSuits/top/?t=all 17 | https://www.reddit.com/r/swimsuit/top/?t=all 18 | https://www.reddit.com/r/nsfwswimsuit/top/?t=all 19 | https://www.reddit.com/r/leotards/top/?t=all 20 | https://www.reddit.com/r/swimsuits/top/?t=all 21 | https://www.reddit.com/r/bikinis/top/?t=all 22 | https://www.reddit.com/r/CrochetBikinis/top/?t=all 23 | https://www.reddit.com/r/MicroBikini/top/?t=all 24 | https://www.reddit.com/r/asiansinswimsuits/top/?t=all 25 | --------------------------------------------------------------------------------