├── .gitignore
├── media
│   ├── header_im.jpg
│   ├── header_im2.jpg
│   ├── face_size_chart.png
│   ├── license_pie_chart.png
│   ├── category_pie_chart.png
│   └── cateogry_pie_chart.png
├── search_username.py
├── FDF256.md
├── download_fdf256.py
├── download.py
└── README.md

/.gitignore:
--------------------------------------------------------------------------------
1 | data/

--------------------------------------------------------------------------------
/media/header_im.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/hukkelas/FDF/HEAD/media/header_im.jpg

--------------------------------------------------------------------------------
/media/header_im2.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/hukkelas/FDF/HEAD/media/header_im2.jpg

--------------------------------------------------------------------------------
/media/face_size_chart.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/hukkelas/FDF/HEAD/media/face_size_chart.png

--------------------------------------------------------------------------------
/media/license_pie_chart.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/hukkelas/FDF/HEAD/media/license_pie_chart.png

--------------------------------------------------------------------------------
/media/category_pie_chart.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/hukkelas/FDF/HEAD/media/category_pie_chart.png

--------------------------------------------------------------------------------
/media/cateogry_pie_chart.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/hukkelas/FDF/HEAD/media/cateogry_pie_chart.png
--------------------------------------------------------------------------------
/search_username.py:
--------------------------------------------------------------------------------
1 | import click
2 | import json
3 | 
4 | @click.command()
5 | @click.argument("filepath", type=click.Path(exists=True, file_okay=True, dir_okay=False))
6 | @click.option("--user_nsid", required=True)
7 | def main(filepath, user_nsid):
8 |     with open(filepath, "r") as fp:
9 |         data = json.load(fp)
10 |     for key, item in data.items():
11 |         if item["user_nsid"] == user_nsid:
12 |             print(f"Image {key} includes your username")
13 | 
14 | if __name__ == "__main__":
15 |     main()
16 | 

--------------------------------------------------------------------------------
/FDF256.md:
--------------------------------------------------------------------------------
1 | # FDF256
2 | 
3 | The Flickr Diverse Faces 256 (FDF256) dataset is a derivative of 248,564 images from the YFCC100M dataset, following the dataset generation procedure of the original FDF dataset.
4 | The training set consists of 241,982 images and the validation set of 6,533 images, where each face is up/downsampled to $256 \times 256$.
5 | We filter out all faces where the original resolution is smaller than $64 \times 64$.
6 | Each face is annotated with keypoints from a pre-trained Keypoint R-CNN R50-FPN from torchvision, and the bounding box is from the official implementation of DSFD.
7 | 
8 | 
9 | ## Licenses
10 | The images are collected from the YFCC-100M dataset and each image in our dataset is free to use for **academic** or **open source** projects. For each face, the corresponding original license is given in the metadata.
11 | Some of the images require giving proper credit to the original author, as well as indicating any changes that were made to the images. The original author is given in the metadata.
12 | 
13 | ## Download
14 | 
15 | 1. First, install dependencies:
16 | 
17 |    ```bash
18 |    pip install wget tqdm click
19 |    ```
20 | 
21 | 2. To download the dataset to a target directory (expects Python 3.6+), run:
22 | 
23 |    ```
24 |    python download_fdf256.py data/fdf256
25 |    ```
26 | 
27 | ## Citation
28 | If you find this dataset useful, please cite:
29 | ```
30 | @inproceedings{hukkelas23DP2,
31 |   author={Hukkelås, Håkon and Lindseth, Frank},
32 |   booktitle={2023 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)},
33 |   title={DeepPrivacy2: Towards Realistic Full-Body Anonymization},
34 |   year={2023},
35 |   volume={},
36 |   number={},
37 |   pages={1329-1338},
38 |   doi={10.1109/WACV56688.2023.00138}}
39 | ```
40 | 
41 | ## Privacy
42 | FDF256 consists of photos that are published for free use and redistribution by the respective authors.
43 | 
44 | To find out if your photo is included in FDF256, we have included a script to search the dataset with your Flickr user NSID.
45 | 
46 | To check if your user is included, run the script:
47 | ```
48 | python3 search_username.py data/fdf256/train/fdf_metainfo.json --user_nsid=FLICKR_USER_NSID
49 | ```
50 | The script will print all images that were published with the given user NSID.
51 | 
52 | To get your photo removed from FDF256:
53 | 
54 | 1. Go to Flickr and do one of the following:
55 |    - Tag the photo with `no_cv` to indicate that you do not wish it to be used for computer vision research.
56 |    - Change the license of the photo to `None` (All rights reserved) or any Creative Commons license with `NoDerivs` to indicate that you do not want it to be redistributed.
57 |    - Make the photo private, i.e., only visible to you and your friends/family.
58 |    - Get the photo removed from Flickr altogether.
59 | 2. Contact [hakon.hukkelas@ntnu.no](mailto:hakon.hukkelas@ntnu.no). Include your Flickr user NSID in the email.
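
Since some of the licenses above require attribution, it can be handy to build a credit line from the metadata. Below is a minimal sketch; it assumes the FDF256 metainfo uses the same `author`, `photo_title`, `photo_url`, and `license` keys as the FDF metainfo documented in the README, and the sample entry is made up for illustration:

```python
def face_credit(metainfo, key):
    """Build an attribution string for one face entry of the metainfo dict."""
    item = metainfo[key]
    return (f'"{item["photo_title"]}" by {item["author"]} '
            f'({item["photo_url"]}), license: {item["license"]}')

# Made-up sample entry mirroring the documented metainfo layout (not real data).
sample = {
    "0": {
        "author": "flickr_username",
        "photo_title": "original_photo_name",
        "photo_url": "http://www.flickr.com/photos/.../",
        "license": "Attribution-NonCommercial License",
    }
}
print(face_credit(sample, "0"))
```

In practice you would load the dict from `data/fdf256/train/fdf_metainfo.json` with `json.load` instead of using the inline sample.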
60 | 
--------------------------------------------------------------------------------
/download_fdf256.py:
--------------------------------------------------------------------------------
1 | import wget
2 | import os
3 | import zipfile
4 | import click
5 | from pathlib import Path
6 | from hashlib import md5
7 | md5sums = {
8 |     "cc-by-2": "e45e313358a5912927ed3a8aa620b3b1",
9 |     "cc-by-nc-2": "12c531a59a47783bca53d69b04653805",
10 |     "cc-by-sa-2": "2cd40e77def0e14148530d7f250a199e",
11 |     "cc-by-nc-sa-2": "afad28fdb033ae57ce5e5e2d95a6be18",
12 | }
13 | 
14 | image_urls = {
15 |     "cc-by-nc-sa-2": "https://api.loke.aws.unit.no/dlr-gui-backend-resources-content/v2/contents/links/33bb6132-a30e-4169-a09e-48b94cd5e09010fd8f59-1db9-4192-96f0-83d18adf50b4ee3ab25c-b8f8-4c40-a8ba-eaa64d830412",
16 |     "cc-by-2": "https://api.loke.aws.unit.no/dlr-gui-backend-resources-content/v2/contents/links/cb545564-120f-4f35-8b68-63e59e4fd273b1c36452-21e7-4976-85dd-a86c0738ebc256264f20-f969-4a82-ae1c-f6345d8e8d1f",
17 |     "cc-by-sa-2": "https://api.loke.aws.unit.no/dlr-gui-backend-resources-content/v2/contents/links/4e5c27bd-f5fd-4dd3-bf2b-4434a8952df0ff4d11f8-e993-4517-9378-b35d89e7882ecae76ce7-88a0-417b-b5d8-e030863e97f6",
18 |     "cc-by-nc-2": "https://api.loke.aws.unit.no/dlr-gui-backend-resources-content/v2/contents/links/da46d666-4378-4e75-9182-e683ebe08f2e9203750f-7ed8-42f7-8bce-90a7dfddd3764401df36-a3c3-4d39-ae8e-0cb4397e5c74",
19 | }
20 | 
21 | fdf_metainfo_url = "https://api.loke.aws.unit.no/dlr-gui-backend-resources-content/v2/contents/links/b704049a-d465-4a07-9cb3-ca270ffab80292e4d5ac-6172-4d37-bf63-4438f61f8aa0e1f6483d-5d45-40b5-b356-10b71fc00e89"
22 | fdf_metainfo_md5sum = "b790269bd64e9a6c1b1b032a9ff60410"
23 | 
24 | def extract_zip(zip_path: Path, target_path: Path):
25 |     print(f"Extracting contents of {zip_path} to {target_path}")
26 |     with zipfile.ZipFile(zip_path, "r") as fp:
27 |         fp.extractall(str(target_path))
28 | 
29 | 
30 | def is_valid(filepath: Path, md5sum):
31 |     with open(filepath, "rb") as fp:
32 |         cur_md5sum = md5(fp.read()).hexdigest()
33 |     print(cur_md5sum, md5sum)
34 |     return cur_md5sum == md5sum
35 | 
36 | 
37 | def download(url, target_path: Path, md5sum):
38 |     target_path.parent.mkdir(exist_ok=True, parents=True)
39 |     if target_path.is_file():
40 |         if is_valid(target_path, md5sum):
41 |             print(f"File already downloaded: {target_path}. Skipping download")
42 |             return
43 |         print("Downloaded file is not correct. Deleting old.", target_path)
44 |         target_path.unlink()
45 |     print("Downloading:", url)
46 |     wget.download(url, str(target_path))
47 |     assert is_valid(target_path, md5sum), "Downloaded file is not correct."
48 | 
49 | 
50 | @click.command()
51 | @click.argument("target_path")
52 | def main(target_path):
53 |     target_path = Path(target_path)
54 |     download(fdf_metainfo_url, target_path.joinpath("metainfo.zip"), fdf_metainfo_md5sum)
55 | 
56 |     extract_zip(target_path.joinpath("metainfo.zip"), target_path)
57 | 
58 |     for image_license, image_url in image_urls.items():
59 |         print("Downloading images with license:", image_license)
60 |         download(image_url, target_path.joinpath(image_license + ".zip"), md5sums[image_license])
61 |         extract_zip(target_path.joinpath(image_license + ".zip"), target_path)
62 | 
63 | 
64 | if __name__ == "__main__":
65 |     main()

--------------------------------------------------------------------------------
/download.py:
--------------------------------------------------------------------------------
1 | import wget
2 | import argparse
3 | import tqdm
4 | import os
5 | import zipfile
6 | 
7 | def extract_metainfo(zip_path, target_dir):
8 |     with zipfile.ZipFile(zip_path, "r") as fp:
9 |         for fileinfo in fp.infolist():
10 |             if fileinfo.is_dir():
11 |                 continue
12 |             orig_filename = fileinfo.filename
13 |             # Zip member names always use forward slashes, regardless of OS.
14 |             target_filename = orig_filename.replace("metainfo/", "")
15 |             target_path = os.path.join(target_dir, target_filename)
16 |             dirname = os.path.dirname(target_path)
17 |             if dirname != "":
18 |                 os.makedirs(dirname, exist_ok=True)
19 |             fileinfo.filename = os.path.basename(target_path)
20 |             fp.extract(orig_filename, os.path.dirname(target_path))
21 | 
22 | 
23 | def extract_images(zip_path, orig_folder_name, target_dir):
24 |     with zipfile.ZipFile(zip_path, "r") as fp:
25 |         for fileinfo in tqdm.tqdm(fp.infolist(), desc=f"Extracting: {orig_folder_name}"):
26 |             if fileinfo.is_dir():
27 |                 continue
28 |             orig_filename = fileinfo.filename
29 |             target_filename = orig_filename.replace(orig_folder_name, "images")
30 |             target_path = os.path.join(target_dir, target_filename)
31 |             dirname = os.path.dirname(target_path)
32 |             if dirname != "":
33 |                 os.makedirs(dirname, exist_ok=True)
34 |             fileinfo.filename = os.path.basename(target_path)
35 |             fp.extract(orig_filename, os.path.dirname(target_path))
36 | 
37 | 
38 | def download(url, target_path):
39 |     if os.path.isfile(target_path):
40 |         print(f"File already downloaded: {target_path}. Skipping download")
41 |         return
42 |     print("Downloading:", url)
43 |     wget.download(url, target_path)
44 | 
45 | 
46 | parser = argparse.ArgumentParser()
47 | parser.add_argument("--target_directory", default=os.path.join("data", "fdf"))
48 | parser.add_argument("--download_images", default=False, action="store_true")
49 | args = parser.parse_args()
50 | 
51 | os.makedirs(args.target_directory, exist_ok=True)
52 | 
53 | 
54 | fdf_metainfo_url = "https://api.loke.aws.unit.no/dlr-gui-backend-resources-content/v2/contents/links/87c06e58-a6cc-4299-81b6-c36f2bed6a0ce5810e37-59d6-4d8f-9e86-fdafe7b58c86106c2d7d-91e8-4c80-986a-0ccdbe02ddb0"
55 | 
56 | print("Downloading metainfo")
57 | metainfo_path = os.path.join(args.target_directory, "metainfo.zip")
58 | download(fdf_metainfo_url, metainfo_path)
59 | 
60 | extract_metainfo(metainfo_path, args.target_directory)
61 | 
62 | if not args.download_images:
63 |     exit(0)
64 | 
65 | image_urls = {
66 |     "cc-by-2": "https://api.loke.aws.unit.no/dlr-gui-backend-resources-content/v2/contents/links/30d325f8-f726-4974-96d5-5cb351f58db378d1ec02-3261-492d-a77d-194efc8e32d6becdc34b-0f1f-45ec-9a6a-dc2bff37f3d8",
67 |     "cc-by-nc-2": "https://api.loke.aws.unit.no/dlr-gui-backend-resources-content/v2/contents/links/e0dd287a-9a55-4082-a100-842279450bd9aa116eea-73fd-42e3-8b6a-6e3bb3e5629b765d7093-c784-4c69-90b2-694adf76c992",
68 |     "cc-by-nc-sa-2": "https://api.loke.aws.unit.no/dlr-gui-backend-resources-content/v2/contents/links/cc32f149-d109-4e1e-ae6d-aa92dc10148e56a00d43-8d11-4ce4-b5f9-5ac4419bc86b2b3cfe29-74dd-4ef1-893f-f87b41170b12",
69 |     "cc-by-sa-2": "https://api.loke.aws.unit.no/dlr-gui-backend-resources-content/v2/contents/links/21aeaf4d-c6e9-4dfe-86ce-2203601623bfa9028100-9e89-49eb-8426-99eccd5ea7ac06082e81-0c2e-45b4-827d-a4abae7a9e78"
70 | }
71 | 
72 | for image_license, image_url in image_urls.items():
73 |     print("Downloading images with license:", image_license)
74 |     filename = f"{image_license}.zip"
75 |     target_path = os.path.join(args.target_directory, filename)
76 |     download(image_url, target_path)
77 |     extract_images(target_path, image_license, args.target_directory)
78 | 

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Flickr Diverse Faces - FDF
2 | Flickr Diverse Faces (FDF) is a dataset with **1.5M faces** "in the wild".
3 | FDF has a large diversity in terms of facial pose, age, ethnicity, occluding objects, facial painting, and image background.
4 | The dataset is designed for generative models for face anonymization, and it was released with the paper *"DeepPrivacy: A Generative Adversarial Network for Face Anonymization"*.
5 | 
6 | 
7 | ![](media/header_im.jpg)
8 | 
9 | The dataset was crawled from the website Flickr ([YFCC-100M dataset](http://projects.dfki.uni-kl.de/yfcc100m/)) and automatically annotated.
10 | Each face is annotated with **7 facial landmarks** (left/right ear, left/right eye, left/right shoulder, and nose), and a **bounding box** of the face. [Our paper]() goes into more detail about the automatic annotation.
11 | 
12 | 
13 | 
14 | ## Licenses
15 | The images are collected from the YFCC-100M dataset and each image in our dataset is free to use for **academic** or **open source** projects.
16 | For each face, the corresponding original license is given in the metadata. Some of the images require giving proper credit to the original author, as well as indicating any changes that were made to the images. The original author is given in the metadata.
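
Since usage terms vary per face, a quick way to see what applies to your subset is to tally the `license` field across the metainfo entries. A minimal sketch follows; the key names come from the metainfo layout documented in this README, and the sample entries are made up for illustration:

```python
from collections import Counter

def count_licenses(metainfo):
    """Tally how many faces fall under each original Flickr license."""
    return Counter(item["license"] for item in metainfo.values())

# Made-up sample entries in the documented metainfo layout (not real data).
sample = {
    "0": {"license": "Attribution-NonCommercial License"},
    "1": {"license": "Attribution License"},
    "2": {"license": "Attribution-NonCommercial License"},
}
print(count_licenses(sample).most_common())
# [('Attribution-NonCommercial License', 2), ('Attribution License', 1)]
```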
17 | 
18 | The dataset contains images with the following licenses:
19 | - [CC BY-NC-SA 2.0](https://creativecommons.org/licenses/by-nc-sa/2.0/): 623,598 Images (23.4 GB)
20 | - [CC BY-SA 2.0](https://creativecommons.org/licenses/by-sa/2.0/): 199,502 Images (7.4 GB)
21 | - [CC BY 2.0](https://creativecommons.org/licenses/by/2.0/): 352,961 Images (13.1 GB)
22 | - [CC BY-NC 2.0](https://creativecommons.org/licenses/by-nc/2.0/): 295,192 Images (10.9 GB)
23 | 
24 | The FDF metadata is under the [Apache 2.0 License](https://www.apache.org/licenses/LICENSE-2.0).
25 | 
26 | ## Citation
27 | If you find this code or dataset useful, please cite the following:
28 | ```
29 | @InProceedings{10.1007/978-3-030-33720-9_44,
30 | author="Hukkel{\aa}s, H{\aa}kon
31 | and Mester, Rudolf
32 | and Lindseth, Frank",
33 | title="DeepPrivacy: A Generative Adversarial Network for Face Anonymization",
34 | booktitle="Advances in Visual Computing",
35 | year="2019",
36 | publisher="Springer International Publishing",
37 | pages="565--578",
38 | isbn="978-3-030-33720-9"
39 | }
40 | ```
41 | 
42 | ## Download
43 | 
44 | 1. First, install dependencies:
45 | 
46 |    ```bash
47 |    pip install wget tqdm
48 |    ```
49 | 
50 | 2. To download the metadata (expects Python 3.6+), run:
51 | 
52 |    ```
53 |    python download.py --target_directory data/fdf
54 |    ```
55 | 
56 | 3. To also download the images:
57 |    ```
58 |    python download.py --target_directory data/fdf --download_images
59 |    ```
60 | 
61 | 
62 | ## Metainfo
63 | Each face in the dataset comes with the following metainfo:
64 | 
65 | ```
66 | {
67 |     "0": { // FDF image index
68 |         "author": "flickr_username",
69 |         "bounding_box": [], // List with 4 elements [xmin, ymin, xmax, ymax] indicating the bounding box of the face in the FDF image. In range 0-1.
70 |         "category": "validation", // validation or training set
71 |         "date_crawled": "2019-3-6",
72 |         "date_taken": "2010-01-16 21:47:59.0",
73 |         "date_uploaded": "2010-01-16",
74 |         "landmark": [], // List with shape (7,2). Each row is (x0, y0) indicating the position of the landmark. Landmark order: [nose, r_eye, l_eye, r_ear, l_ear, r_shoulder, l_shoulder]. In range 0-1.
75 |         "license": "Attribution-NonCommercial License",
76 |         "license_url": "http://creativecommons.org/licenses/by-nc/2.0/",
77 |         "original_bounding_box": [], // List with 4 elements [xmin, ymin, xmax, ymax] indicating the bounding box of the face in the original image from Flickr.
78 |         "original_landmark": [], // Landmark from the original image from Flickr. List with shape (7,2). Each row is (x0, y0) indicating the position of the landmark. Landmark order: [nose, r_eye, l_eye, r_ear, l_ear, r_shoulder, l_shoulder].
79 |         "photo_title": "original_photo_name", // Flickr photo title
80 |         "photo_url": "http://www.flickr.com/photos/.../", // Original image URL
81 |         "yfcc100m_line_idx": "0" // The line index from the YFCC-100M dataset
82 |     },
83 |     ...
84 | }
85 | ```
86 | 
87 | ## Statistics
88 | ### Distribution of image licenses
89 | 
90 | ![](media/license_pie_chart.png)
91 | 
92 | ### Training vs Validation Percentage
93 | There are 50,000 validation images and 1,421,253 training images.
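
The `bounding_box` and `landmark` values in the metainfo are normalized to the range 0-1, so converting them back to pixel coordinates is a matter of scaling by the image width and height. A minimal sketch (the function name and the sample values are illustrative, not part of the dataset tooling):

```python
def denormalize_box(box, width, height):
    """Scale a normalized [xmin, ymin, xmax, ymax] box (range 0-1) to pixel coordinates."""
    xmin, ymin, xmax, ymax = box
    return [xmin * width, ymin * height, xmax * width, ymax * height]

# Example: a normalized box on a hypothetical 128x128 FDF image.
print(denormalize_box([0.25, 0.25, 0.75, 0.75], 128, 128))  # [32.0, 32.0, 96.0, 96.0]
```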
94 | 
95 | ![](media/category_pie_chart.png)
96 | 
97 | ### Original Face size
98 | The chart below shows the distribution of the minimum resolution of each face in its original image:
99 | 
100 | ![](media/face_size_chart.png)
101 | 
--------------------------------------------------------------------------------