├── .gitignore
├── media
│   ├── header_im.jpg
│   ├── header_im2.jpg
│   ├── face_size_chart.png
│   ├── license_pie_chart.png
│   ├── category_pie_chart.png
│   └── cateogry_pie_chart.png
├── search_username.py
├── FDF256.md
├── download_fdf256.py
├── download.py
└── README.md

/.gitignore:
--------------------------------------------------------------------------------
1 | data/

--------------------------------------------------------------------------------
/media/header_im.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/hukkelas/FDF/HEAD/media/header_im.jpg

--------------------------------------------------------------------------------
/media/header_im2.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/hukkelas/FDF/HEAD/media/header_im2.jpg

--------------------------------------------------------------------------------
/media/face_size_chart.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/hukkelas/FDF/HEAD/media/face_size_chart.png

--------------------------------------------------------------------------------
/media/license_pie_chart.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/hukkelas/FDF/HEAD/media/license_pie_chart.png

--------------------------------------------------------------------------------
/media/category_pie_chart.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/hukkelas/FDF/HEAD/media/category_pie_chart.png

--------------------------------------------------------------------------------
/media/cateogry_pie_chart.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/hukkelas/FDF/HEAD/media/cateogry_pie_chart.png
--------------------------------------------------------------------------------
/search_username.py:
--------------------------------------------------------------------------------
1 | import click
2 | import json
3 | 
4 | @click.command()
5 | @click.argument("filepath", type=click.Path(exists=True, file_okay=True, dir_okay=False))
6 | @click.option("--user_nsid", required=True)
7 | def main(filepath, user_nsid):
8 |     with open(filepath, "r") as fp:
9 |         data = json.load(fp)
10 |     for key, item in data.items():
11 |         if item["user_nsid"] == user_nsid:
12 |             print(f"Image {key} includes your username")
13 | 
14 | if __name__ == "__main__":
15 |     main()
16 | 

--------------------------------------------------------------------------------
/FDF256.md:
--------------------------------------------------------------------------------
1 | # FDF256
2 | 
3 | The Flickr Diverse Faces 256 (FDF256) dataset is a derivative of 248,564 images from the YFCC100M dataset, following the dataset generation procedure of the original FDF dataset.
4 | The training set consists of 241,982 images and the validation set of 6,533 images, where each face is up/downsampled to $256 \times 256$.
5 | We filter out all faces where the original resolution is smaller than $64 \times 64$.
6 | Each face is annotated with keypoints from a pre-trained Keypoint R-CNN R50-FPN from torchvision, and the bounding box is from the official implementation of DSFD.
7 | 
8 | 
9 | ## Licenses
10 | The images are collected from the YFCC-100M dataset and each image in our dataset is free to use for **academic** or **open source** projects. For each face, the corresponding original license is given in the metadata.
11 | Some of the images require giving proper credit to the original author, as well as indicating any changes that were made to the images. The original author is given in the metadata.
12 | 
13 | ## Download
14 | 
15 | 1. First, install dependencies:
16 | 
17 |    ```bash
18 |    pip install wget tqdm click
19 |    ```
20 | 
21 | 2. To download the dataset to a target directory (expects Python 3.6+), run:
22 | 
23 |    ```
24 |    python download_fdf256.py data/fdf256
25 |    ```
26 | 
27 | ## Citation
28 | If you find this dataset useful, please cite:
29 | ```
30 | @inproceedings{hukkelas23DP2,
31 |   author={Hukkelås, Håkon and Lindseth, Frank},
32 |   booktitle={2023 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)},
33 |   title={DeepPrivacy2: Towards Realistic Full-Body Anonymization},
34 |   year={2023},
35 |   volume={},
36 |   number={},
37 |   pages={1329-1338},
38 |   doi={10.1109/WACV56688.2023.00138}}
39 | ```
40 | 
41 | ## Privacy
42 | FDF256 consists of photos that are published for free use and redistribution by the respective authors.
43 | 
44 | To find out if your photo is included in FDF256, we have included a script to search the dataset with your Flickr user NSID.
45 | 
46 | To check if your user is included, run the script:
47 | ```
48 | python3 search_username.py data/fdf256/train/fdf_metainfo.json --user_nsid=FLICKR_USER_NSID
49 | ```
50 | The script will print all images that were published with the given user NSID.
51 | 
52 | To get your photo removed from FDF256:
53 | 
54 | 1. Go to Flickr and do one of the following:
55 |    - Tag the photo with `no_cv` to indicate that you do not wish it to be used for computer vision research.
56 |    - Change the license of the photo to `None` (All rights reserved) or any Creative Commons license with `NoDerivs` to indicate that you do not want it to be redistributed.
57 |    - Make the photo private, i.e., only visible to you and your friends/family.
58 |    - Get the photo removed from Flickr altogether.
59 | 2. Contact [hakon.hukkelas@ntnu.no](mailto:hakon.hukkelas@ntnu.no). Include your Flickr user NSID in the email.
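
Since some of the licenses above require attribution, it can be handy to build a credit line from the metadata. Below is a minimal sketch; it assumes the FDF256 metainfo uses the same `author`, `photo_title`, `photo_url`, and `license` keys as the FDF metainfo documented in the README, and the sample entry is made up for illustration:

```python
def face_credit(metainfo, key):
    """Build an attribution string for one face entry of the metainfo dict."""
    item = metainfo[key]
    return (f'"{item["photo_title"]}" by {item["author"]} '
            f'({item["photo_url"]}), license: {item["license"]}')

# Made-up sample entry mirroring the documented metainfo layout (not real data).
sample = {
    "0": {
        "author": "flickr_username",
        "photo_title": "original_photo_name",
        "photo_url": "http://www.flickr.com/photos/.../",
        "license": "Attribution-NonCommercial License",
    }
}
print(face_credit(sample, "0"))
```

In practice you would load the dict from `data/fdf256/train/fdf_metainfo.json` with `json.load` instead of using the inline sample.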
60 | 
--------------------------------------------------------------------------------
/download_fdf256.py:
--------------------------------------------------------------------------------
1 | import wget
2 | import os
3 | import zipfile
4 | import click
5 | from pathlib import Path
6 | from hashlib import md5
7 | md5sums = {
8 |     "cc-by-2": "e45e313358a5912927ed3a8aa620b3b1",
9 |     "cc-by-nc-2": "12c531a59a47783bca53d69b04653805",
10 |     "cc-by-sa-2": "2cd40e77def0e14148530d7f250a199e",
11 |     "cc-by-nc-sa-2": "afad28fdb033ae57ce5e5e2d95a6be18",
12 | }
13 | 
14 | image_urls = {
15 |     "cc-by-nc-sa-2": "https://api.loke.aws.unit.no/dlr-gui-backend-resources-content/v2/contents/links/33bb6132-a30e-4169-a09e-48b94cd5e09010fd8f59-1db9-4192-96f0-83d18adf50b4ee3ab25c-b8f8-4c40-a8ba-eaa64d830412",
16 |     "cc-by-2": "https://api.loke.aws.unit.no/dlr-gui-backend-resources-content/v2/contents/links/cb545564-120f-4f35-8b68-63e59e4fd273b1c36452-21e7-4976-85dd-a86c0738ebc256264f20-f969-4a82-ae1c-f6345d8e8d1f",
17 |     "cc-by-sa-2": "https://api.loke.aws.unit.no/dlr-gui-backend-resources-content/v2/contents/links/4e5c27bd-f5fd-4dd3-bf2b-4434a8952df0ff4d11f8-e993-4517-9378-b35d89e7882ecae76ce7-88a0-417b-b5d8-e030863e97f6",
18 |     "cc-by-nc-2": "https://api.loke.aws.unit.no/dlr-gui-backend-resources-content/v2/contents/links/da46d666-4378-4e75-9182-e683ebe08f2e9203750f-7ed8-42f7-8bce-90a7dfddd3764401df36-a3c3-4d39-ae8e-0cb4397e5c74",
19 | }
20 | 
21 | fdf_metainfo_url = "https://api.loke.aws.unit.no/dlr-gui-backend-resources-content/v2/contents/links/b704049a-d465-4a07-9cb3-ca270ffab80292e4d5ac-6172-4d37-bf63-4438f61f8aa0e1f6483d-5d45-40b5-b356-10b71fc00e89"
22 | fdf_metainfo_md5sum = "b790269bd64e9a6c1b1b032a9ff60410"
23 | 
24 | def extract_zip(zip_path: Path, target_path: Path):
25 |     print(f"Extracting contents of {zip_path} to {target_path}")
26 |     with zipfile.ZipFile(zip_path, "r") as fp:
27 |         fp.extractall(str(target_path))
28 | 
29 | 
30 | def is_valid(filepath: Path, md5sum):
31 |     with open(filepath, "rb") as fp:
32 |         cur_md5sum = md5(fp.read()).hexdigest()
33 |     print(cur_md5sum, md5sum)
34 |     return cur_md5sum == md5sum
35 | 
36 | 
37 | def download(url, target_path: Path, md5sum):
38 |     target_path.parent.mkdir(exist_ok=True, parents=True)
39 |     if target_path.is_file():
40 |         if is_valid(target_path, md5sum):
41 |             print(f"File already downloaded: {target_path}. Skipping download")
42 |             return
43 |         print("Downloaded file is not correct. Deleting old.", target_path)
44 |         target_path.unlink()
45 |     print("Downloading:", url)
46 |     wget.download(url, str(target_path))
47 |     assert is_valid(target_path, md5sum), "Downloaded file is not correct."
48 | 
49 | 
50 | @click.command()
51 | @click.argument("target_path")
52 | def main(target_path):
53 |     target_path = Path(target_path)
54 |     download(fdf_metainfo_url, target_path.joinpath("metainfo.zip"), fdf_metainfo_md5sum)
55 | 
56 |     extract_zip(target_path.joinpath("metainfo.zip"), target_path)
57 | 
58 |     for image_license, image_url in image_urls.items():
59 |         print("Downloading images with license:", image_license)
60 |         download(image_url, target_path.joinpath(image_license + ".zip"), md5sums[image_license])
61 |         extract_zip(target_path.joinpath(image_license + ".zip"), target_path)
62 | 
63 | 
64 | if __name__ == "__main__":
65 |     main()

--------------------------------------------------------------------------------
/download.py:
--------------------------------------------------------------------------------
1 | import wget
2 | import argparse
3 | import tqdm
4 | import os
5 | import zipfile
6 | 
7 | def extract_metainfo(zip_path, target_dir):
8 |     with zipfile.ZipFile(zip_path, "r") as fp:
9 |         for fileinfo in fp.infolist():
10 |             if fileinfo.is_dir():
11 |                 continue
12 |             orig_filename = fileinfo.filename
13 |             # Zip member names always use forward slashes, regardless of OS.
14 |             target_filename = orig_filename.replace("metainfo/", "")
15 |             target_path = os.path.join(target_dir, target_filename)
16 |             dirname = os.path.dirname(target_path)
17 |             if dirname != "":
18 |                 os.makedirs(dirname, exist_ok=True)
19 |             fileinfo.filename = os.path.basename(target_path)
20 |             fp.extract(orig_filename, os.path.dirname(target_path))
21 | 
22 | 
23 | def extract_images(zip_path, orig_folder_name, target_dir):
24 |     with zipfile.ZipFile(zip_path, "r") as fp:
25 |         for fileinfo in tqdm.tqdm(fp.infolist(), desc=f"Extracting: {orig_folder_name}"):
26 |             if fileinfo.is_dir():
27 |                 continue
28 |             orig_filename = fileinfo.filename
29 |             target_filename = orig_filename.replace(orig_folder_name, "images")
30 |             target_path = os.path.join(target_dir, target_filename)
31 |             dirname = os.path.dirname(target_path)
32 |             if dirname != "":
33 |                 os.makedirs(dirname, exist_ok=True)
34 |             fileinfo.filename = os.path.basename(target_path)
35 |             fp.extract(orig_filename, os.path.dirname(target_path))
36 | 
37 | 
38 | def download(url, target_path):
39 |     if os.path.isfile(target_path):
40 |         print(f"File already downloaded: {target_path}. Skipping download")
41 |         return
42 |     print("Downloading:", url)
43 |     wget.download(url, target_path)
44 | 
45 | 
46 | parser = argparse.ArgumentParser()
47 | parser.add_argument("--target_directory", default=os.path.join("data", "fdf"))
48 | parser.add_argument("--download_images", default=False, action="store_true")
49 | args = parser.parse_args()
50 | 
51 | os.makedirs(args.target_directory, exist_ok=True)
52 | 
53 | 
54 | fdf_metainfo_url = "https://api.loke.aws.unit.no/dlr-gui-backend-resources-content/v2/contents/links/87c06e58-a6cc-4299-81b6-c36f2bed6a0ce5810e37-59d6-4d8f-9e86-fdafe7b58c86106c2d7d-91e8-4c80-986a-0ccdbe02ddb0"
55 | 
56 | print("Downloading metainfo")
57 | metainfo_path = os.path.join(args.target_directory, "metainfo.zip")
58 | download(fdf_metainfo_url, metainfo_path)
59 | 
60 | extract_metainfo(metainfo_path, args.target_directory)
61 | 
62 | if not args.download_images:
63 |     exit(0)
64 | 
65 | image_urls = {
66 |     "cc-by-2": "https://api.loke.aws.unit.no/dlr-gui-backend-resources-content/v2/contents/links/30d325f8-f726-4974-96d5-5cb351f58db378d1ec02-3261-492d-a77d-194efc8e32d6becdc34b-0f1f-45ec-9a6a-dc2bff37f3d8",
67 |     "cc-by-nc-2": "https://api.loke.aws.unit.no/dlr-gui-backend-resources-content/v2/contents/links/e0dd287a-9a55-4082-a100-842279450bd9aa116eea-73fd-42e3-8b6a-6e3bb3e5629b765d7093-c784-4c69-90b2-694adf76c992",
68 |     "cc-by-nc-sa-2": "https://api.loke.aws.unit.no/dlr-gui-backend-resources-content/v2/contents/links/cc32f149-d109-4e1e-ae6d-aa92dc10148e56a00d43-8d11-4ce4-b5f9-5ac4419bc86b2b3cfe29-74dd-4ef1-893f-f87b41170b12",
69 |     "cc-by-sa-2": "https://api.loke.aws.unit.no/dlr-gui-backend-resources-content/v2/contents/links/21aeaf4d-c6e9-4dfe-86ce-2203601623bfa9028100-9e89-49eb-8426-99eccd5ea7ac06082e81-0c2e-45b4-827d-a4abae7a9e78"
70 | }
71 | 
72 | for image_license, image_url in image_urls.items():
73 |     print("Downloading images with license:", image_license)
74 |     filename = f"{image_license}.zip"
75 |     target_path = os.path.join(args.target_directory, filename)
76 |     download(image_url, target_path)
77 |     extract_images(target_path, image_license, args.target_directory)
78 | 

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Flickr Diverse Faces - FDF
2 | Flickr Diverse Faces (FDF) is a dataset with **1.5M faces** "in the wild".
3 | FDF has a large diversity in terms of facial pose, age, ethnicity, occluding objects, facial painting, and image background.
4 | The dataset is designed for generative models for face anonymization, and it was released with the paper *"DeepPrivacy: A Generative Adversarial Network for Face Anonymization"*.
5 | 
6 | 
7 | ![](media/header_im.jpg)
8 | 
9 | The dataset was crawled from the website Flickr ([YFCC-100M dataset](http://projects.dfki.uni-kl.de/yfcc100m/)) and automatically annotated.
10 | Each face is annotated with **7 facial landmarks** (left/right ear, left/right eye, left/right shoulder, and nose), and a **bounding box** of the face. [Our paper]() goes into more detail about the automatic annotation.
11 | 
12 | 
13 | 
14 | ## Licenses
15 | The images are collected from the YFCC-100M dataset and each image in our dataset is free to use for **academic** or **open source** projects.
16 | For each face, the corresponding original license is given in the metadata. Some of the images require giving proper credit to the original author, as well as indicating any changes that were made to the images. The original author is given in the metadata.
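
Since usage terms vary per face, a quick way to see what applies to your subset is to tally the `license` field across the metainfo entries. A minimal sketch follows; the key names come from the metainfo layout documented in this README, and the sample entries are made up for illustration:

```python
from collections import Counter

def count_licenses(metainfo):
    """Tally how many faces fall under each original Flickr license."""
    return Counter(item["license"] for item in metainfo.values())

# Made-up sample entries in the documented metainfo layout (not real data).
sample = {
    "0": {"license": "Attribution-NonCommercial License"},
    "1": {"license": "Attribution License"},
    "2": {"license": "Attribution-NonCommercial License"},
}
print(count_licenses(sample).most_common())
# [('Attribution-NonCommercial License', 2), ('Attribution License', 1)]
```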
17 | 
18 | The dataset contains images with the following licenses:
19 | - [CC BY-NC-SA 2.0](https://creativecommons.org/licenses/by-nc-sa/2.0/): 623,598 Images (23.4 GB)
20 | - [CC BY-SA 2.0](https://creativecommons.org/licenses/by-sa/2.0/): 199,502 Images (7.4 GB)
21 | - [CC BY 2.0](https://creativecommons.org/licenses/by/2.0/): 352,961 Images (13.1 GB)
22 | - [CC BY-NC 2.0](https://creativecommons.org/licenses/by-nc/2.0/): 295,192 Images (10.9 GB)
23 | 
24 | The FDF metadata is under the [Apache 2.0 License](https://www.apache.org/licenses/LICENSE-2.0).
25 | 
26 | ## Citation
27 | If you find this code or dataset useful, please cite the following:
28 | ```
29 | @InProceedings{10.1007/978-3-030-33720-9_44,
30 | author="Hukkel{\aa}s, H{\aa}kon
31 | and Mester, Rudolf
32 | and Lindseth, Frank",
33 | title="DeepPrivacy: A Generative Adversarial Network for Face Anonymization",
34 | booktitle="Advances in Visual Computing",
35 | year="2019",
36 | publisher="Springer International Publishing",
37 | pages="565--578",
38 | isbn="978-3-030-33720-9"
39 | }
40 | ```
41 | 
42 | ## Download
43 | 
44 | 1. First, install dependencies:
45 | 
46 |    ```bash
47 |    pip install wget tqdm
48 |    ```
49 | 
50 | 2. To download the metadata (expects Python 3.6+), run:
51 | 
52 |    ```
53 |    python download.py --target_directory data/fdf
54 |    ```
55 | 
56 | 3. To also download the images:
57 |    ```
58 |    python download.py --target_directory data/fdf --download_images
59 |    ```
60 | 
61 | 
62 | ## Metainfo
63 | Each face in the dataset comes with the following metainfo:
64 | 
65 | ```
66 | {
67 |     "0": { // FDF image index
68 |         "author": "flickr_username",
69 |         "bounding_box": [], // List with 4 elements [xmin, ymin, xmax, ymax] indicating the bounding box of the face in the FDF image. In range 0-1.
70 |         "category": "validation", // validation or training set
71 |         "date_crawled": "2019-3-6",
72 |         "date_taken": "2010-01-16 21:47:59.0",
73 |         "date_uploaded": "2010-01-16",
74 |         "landmark": [], // List with shape (7,2). Each row is (x0, y0) indicating the position of the landmark. Landmark order: [nose, r_eye, l_eye, r_ear, l_ear, r_shoulder, l_shoulder]. In range 0-1.
75 |         "license": "Attribution-NonCommercial License",
76 |         "license_url": "http://creativecommons.org/licenses/by-nc/2.0/",
77 |         "original_bounding_box": [], // List with 4 elements [xmin, ymin, xmax, ymax] indicating the bounding box of the face in the original image from Flickr.
78 |         "original_landmark": [], // Landmark from the original image from Flickr. List with shape (7,2). Each row is (x0, y0) indicating the position of the landmark. Landmark order: [nose, r_eye, l_eye, r_ear, l_ear, r_shoulder, l_shoulder].
79 |         "photo_title": "original_photo_name", // Flickr photo title
80 |         "photo_url": "http://www.flickr.com/photos/.../", // Original image URL
81 |         "yfcc100m_line_idx": "0" // The line index from the YFCC-100M dataset
82 |     },
83 |     ...
84 | }
85 | ```
86 | 
87 | ## Statistics
88 | ### Distribution of image licenses
89 | 
90 | ![](media/license_pie_chart.png)
91 | 
92 | ### Training vs Validation Percentage
93 | There are 50,000 validation images and 1,421,253 training images.
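
The `bounding_box` and `landmark` values in the metainfo are normalized to the range 0-1, so converting them back to pixel coordinates is a matter of scaling by the image width and height. A minimal sketch (the function name and the sample values are illustrative, not part of the dataset tooling):

```python
def denormalize_box(box, width, height):
    """Scale a normalized [xmin, ymin, xmax, ymax] box (range 0-1) to pixel coordinates."""
    xmin, ymin, xmax, ymax = box
    return [xmin * width, ymin * height, xmax * width, ymax * height]

# Example: a normalized box on a hypothetical 128x128 FDF image.
print(denormalize_box([0.25, 0.25, 0.75, 0.75], 128, 128))  # [32.0, 32.0, 96.0, 96.0]
```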
94 | 
95 | ![](media/category_pie_chart.png)
96 | 
97 | ### Original Face size
98 | The chart below shows the distribution of the minimum resolution of each face in its original image:
99 | 
100 | ![](media/face_size_chart.png)
101 | 
--------------------------------------------------------------------------------