├── .gitignore ├── ADD_NOISE_INSTRUCTIONS.md ├── EVALUATION_INSTRUCTIONS.md ├── LICENSE ├── README.md ├── TRAINING_INSTRUCTIONS.md ├── datasets ├── __init__.py ├── beyond_gauss.py ├── build_darktable_data.py ├── build_imagenet_data.py ├── build_pixel_isp_data.py ├── classes.py ├── dataset_factory.py ├── dataset_utils.py ├── imagenet.py ├── imagenet_2012_bounding_boxes.csv ├── imagenet_labels.txt ├── imagenet_lsvrc_2015_synsets.txt ├── imagenet_metadata.txt ├── imnet_reg.py ├── labels.txt ├── number_synsets.txt ├── raw.py ├── raw_metadata.txt ├── synset_labels.txt └── training_synsets.txt ├── deployment ├── __init__.py ├── model_deploy.py └── model_deploy_test.py ├── environment.yml ├── loss_functions ├── __init__.py └── loss_factory.py ├── nets ├── __init__.py ├── inception.py ├── inception_utils.py ├── isp.py ├── mobilenet_isp.py ├── mobilenet_v1.py ├── nets_factory.py ├── nets_factory_test.py └── unet.py ├── preprocessing ├── __init__.py ├── inception_preprocessing.py ├── isp_pretrain_preprocessing.py ├── joint_isp_preprocessing.py ├── no_preprocessing.py ├── preprocessing_factory.py ├── sensor_model.py └── writeout_preprocessing.py ├── run_test_captured_images.sh ├── run_test_synthetic_images.sh ├── run_train_joint_models.sh ├── simulate_raw_images.py ├── teaser ├── architecture_2.jpg └── teaser_v4.png ├── test_captured_images.py ├── test_synthetic_images.py └── train_image_classifier.py /.gitignore: -------------------------------------------------------------------------------- 1 | *__pycache__* 2 | *.idea 3 | *$py.class 4 | *.egg-info 5 | -------------------------------------------------------------------------------- /ADD_NOISE_INSTRUCTIONS.md: -------------------------------------------------------------------------------- 1 | # Simulating noisy raw images from ImageNet 2 | In order to evaluate and train new ISP or perception 3 | models on noisy images, we provide the noisy images 4 | that we used for evaluating the hardware ISP of the Movidius Myriad 2 5 | evaluation board: [Noisy-ImageNet](https://drive.google.com/drive/folders/1f9B319TDtFpZSi7HEXnrPa31rtPm54iH?usp=sharing). 6 | 7 | We also provide the code to simulate noisy raw images from 8 | the ImageNet dataset, using the image formation model 9 | described in the manuscript. 10 | 11 | In order to introduce `2to20lux` noise to the ImageNet dataset, run 12 | 13 | ``` 14 | python simulate_raw_images.py --ll_low=0.001 --ll_high=0.010 \ 15 | --input_dir=$IMAGENET_DIR --output_dir=$OUT_DIR 16 | ``` 17 | where `$IMAGENET_DIR` is the ImageNet directory (training or evaluation set), 18 | `$OUT_DIR` is the directory where the noisy images are written to, and 19 | `ll_low` and `ll_high` are the lowest and highest light levels, respectively. 20 | To generate images with other noise profiles, adapt `ll_low` and 21 | `ll_high` accordingly (see more examples in the `run_train_joint_models.sh` script). 22 | -------------------------------------------------------------------------------- /EVALUATION_INSTRUCTIONS.md: -------------------------------------------------------------------------------- 1 | # Evaluating pre-trained models 2 | In order to reproduce the results presented in the 3 | paper, first download the [pre-trained models](https://drive.google.com/file/d/1kBTRAS2W5Ayf2DOxKIgIBmPv5OHaMbCD/view?usp=sharing). 4 | 5 | ## Evaluate our joint models on real data 6 | Download and extract the real captured (low-light) images [dataset](https://drive.google.com/file/d/1fj2u8t_wVdNVUmcjyeK8VuqDfTAd7RJA/view?usp=sharing).
7 | 8 | To run our `2to200lux` joint model over the captured data (Table 2 of the paper), 9 | run 10 | ``` 11 | python test_captured_images.py --device=1 --dataset_dir=$DATASET_DIR --dataset_name=imagenet \ 12 | --checkpoint_path=$CHECKPOINTS/joint128/2to200lux/model.ckpt-232721 \ 13 | --model_name=mobilenet_isp --noise_channel=True --use_anscombe=True \ 14 | --isp_model_name=isp --eval_image_size=224 --sensor=Pixel --eval_dir $OUT_DIR 15 | ``` 16 | where `--device` is the GPU on which the model will run, 17 | and `$DATASET_DIR` and `$CHECKPOINTS` should be set to the downloaded dataset 18 | and checkpoint directories, respectively. `$OUT_DIR` can be set to an 19 | arbitrary output directory path. See `run_test_captured_images.sh` for 20 | additional parameters to evaluate baseline models. 21 | 22 | ## Evaluate our joint models on synthetic data 23 | Download the [ImageNet][in] (validation) dataset. 24 | To evaluate our joint model over noisy images with a `6lux` noise profile, run 25 | ``` 26 | python test_synthetic_images.py --device=1 --checkpoint_path=$CHECKPOINTS/joint128/6lux/model.ckpt-222267 \ 27 | --dataset_dir=$IMAGENET_DATASET_DIR --dataset_name=imagenet --mode=6lux \ 28 | --model_name=mobilenet_isp --eval_dir=$OUT_DIR 29 | ``` 30 | 31 | where `$IMAGENET_DATASET_DIR` is the path to the ImageNet (validation) dataset, 32 | `$CHECKPOINTS` is set to the downloaded checkpoints directory, 33 | `--device` is the GPU on which the model will run, 34 | and `$OUT_DIR` is an arbitrary directory where the results are written to. 35 | 36 | For both the synthetic and captured image evaluation scripts, generated results include 37 | the noisy input raw images, anscombe network output images, 38 | and lists of correctly and wrongly classified images. 39 | To run the trained models over different noise profiles, 40 | modify the checkpoint paths and `--mode` parameter (3lux, 6lux, 2to20lux, or 2to200lux) 41 | accordingly. See `run_test_synthetic_images.sh` for the specific parameters for each noise profile. 42 | 43 | 44 | 45 | 46 | 49 | 50 | 67 | 68 | [in]: http://image-net.org/index -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2021 princeton-computational-imaging 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE.
22 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Dirty Pixels: Towards End-to-End Image Processing and Perception 2 | This repository contains the code for the paper 3 | 4 | **[Dirty Pixels: Towards End-to-End Image Processing and Perception][1]** 5 | [Steven Diamond][sd], [Vincent Sitzmann][vs], [Frank Julca-Aguilar][fj], [Stephen Boyd][sb], [Gordon Wetzstein][gw], [Felix Heide][fh] 6 | Transactions on Graphics, 2021 | To be presented at SIGGRAPH, 2021 7 | 8 |
9 | 10 |
11 | 12 |
13 | 14 |
15 | 16 | 17 |
18 | 19 | ## Installation 20 | Clone this repository: 21 | ``` 22 | git clone git@github.com:princeton-computational-imaging/DirtyPixels.git 23 | ``` 24 | 25 | The project was developed using Python 3.6, TensorFlow (v1.12), and Slim. 26 | We provide an environment file to install all dependencies (creating an environment called dirtypix): 27 | 28 | ``` 29 | conda env create -f environment.yml 30 | conda activate dirtypix 31 | ``` 32 | 33 | 34 | 35 | ## Running Experiments 36 | We provide code, data, and trained models to reproduce the main results presented in the paper, as well as instructions on how to use this project for further research: 37 | - [EVALUATION_INSTRUCTIONS.md](EVALUATION_INSTRUCTIONS.md) provides instructions 38 | on how to evaluate our proposed models and reproduce the results of the paper. 39 | - [TRAINING_INSTRUCTIONS.md](TRAINING_INSTRUCTIONS.md) gives instructions on how to train new models following our proposed approach. 40 | - [ADD_NOISE_INSTRUCTIONS.md](ADD_NOISE_INSTRUCTIONS.md) explains how to simulate 41 | noisy raw images following the image formation model defined in the 42 | manuscript. 43 | 44 | ## Citation 45 | If you find our work useful in your research, please cite: 46 | 47 | ``` 48 | @article{steven:dirtypixels2021, 49 | title={Dirty Pixels: Towards End-to-End Image Processing and Perception}, 50 | author={Diamond, Steven and Sitzmann, Vincent and Julca-Aguilar, Frank and Boyd, Stephen and Wetzstein, Gordon and Heide, Felix}, 51 | journal={ACM Transactions on Graphics (SIGGRAPH)}, 52 | year={2021}, 53 | publisher={ACM} 54 | } 55 | ``` 56 | 57 | ## License 58 | 59 | This project is released under the [MIT License](LICENSE). 60 | 61 | 62 | [1]: https://arxiv.org/abs/1701.06487 63 | [sd]: https://stevendiamond.me 64 | [vs]: https://vsitzmann.github.io 65 | [fj]: https://github.com/fjulca-aguilar 66 | [sb]: https://web.stanford.edu/~boyd/ 67 | [gw]: https://stanford.edu/~gordonwz/ 68 | [fh]: https://www.cs.princeton.edu/~fheide/ 69 | 70 | -------------------------------------------------------------------------------- /TRAINING_INSTRUCTIONS.md: -------------------------------------------------------------------------------- 1 | 2 | ## Training new models over noisy RAW data 3 | Download the [ImageNet][in] (training) dataset. 4 | As described in the supplemental document, 5 | our joint models were trained in two stages. 6 | In the first stage, we train the Anscombe 7 | and MobileNet components separately on ImageNet. 8 | In this stage, we use the L1 norm to train the 9 | Anscombe networks. In the second stage, 10 | the joint (MobileNet + Anscombe) model is trained using only 11 | the high-level (classification) 12 | loss and the checkpoints obtained in the first stage. 13 | To facilitate training new models, we provide the 14 | checkpoints obtained from the first stage. The checkpoints 15 | can be downloaded following the instructions in 16 | [EVALUATION_INSTRUCTIONS.md](EVALUATION_INSTRUCTIONS.md).
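For intuition, the Anscombe transform referred to above is a variance-stabilizing transform: it maps (approximately Poisson-distributed) photon counts to values with roughly unit-variance Gaussian noise, which is what makes the first-stage L1 training of the denoising component well-behaved. The snippet below is only a minimal NumPy sketch of the textbook transform and its algebraic inverse; the generalized variant actually learned/used by the networks in this repository may differ, so treat it as a reference rather than the project's implementation.

```
import numpy as np

def anscombe(x):
    # Textbook Anscombe transform: Poisson(x) -> approximately unit-variance Gaussian.
    return 2.0 * np.sqrt(x + 3.0 / 8.0)

def inverse_anscombe(y):
    # Simple algebraic inverse; unbiased inverses are usually preferred in practice.
    return (y / 2.0) ** 2 - 3.0 / 8.0
```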
17 | 18 | ## Generating TFRecords for training 19 | In order to generate TFRecord files for training, 20 | run the `build_imagenet_data.py` script in the `datasets` 21 | folder: 22 | 23 | ``` 24 | cd datasets 25 | python build_imagenet_data.py --train_directory=$IMAGENET_TRAIN_DIR \ 26 | --output_directory=$OUT_DIR \ 27 | --num_threads 8 28 | ``` 29 | where `$IMAGENET_TRAIN_DIR` is the path to the ImageNet training dataset, 30 | `$OUT_DIR` is the path to the directory where the TFRecord files will 31 | be exported, and `--num_threads` defines the number of threads used to 32 | preprocess the images. 33 | 34 | 35 | 36 | ## Training command example 37 | In order to train our proposed joint architecture 38 | on a `6lux` noise profile, run: 39 | 40 | ``` 41 | python train_image_classifier.py --train_dir=$TRAIN_DIR \ 42 | --dataset_dir=$IMAGENET_TFRECORDS --ll_low=0.003 \ 43 | --ll_high=0.003 --batch_size=256 --model_name=mobilenet_isp --num_readers=8 \ 44 | --num_preprocessing_threads=8 --isp_checkpoint_path=$CHECKPOINTS/multires128/6lux/model.ckpt-27000 \ 45 | --checkpoint_path=$CHECKPOINTS/mobilenet_v1_128/mobilenet_v1_1.0_128.ckpt --noise_channel=True \ 46 | --use_anscombe=True --num_clones=2 --isp_model_name=isp --num_iters=1 --device=0,1 \ 47 | --learning_rate=0.00045 --num_epochs_per_decay=2 --train_image_size=128 48 | ``` 49 | where `$IMAGENET_TFRECORDS` is set to the directory with the ImageNet TFRecords, and `$CHECKPOINTS` is set to the downloaded checkpoints directory. The parameters `--checkpoint_path` 50 | and `--isp_checkpoint_path` are set to the checkpoints obtained in the first training stage. 51 | For training over other noise profiles, see 52 | `run_train_joint_models.sh`. For more details about the specific training parameters, 53 | see the main manuscript and supplemental document. To visualise the training 54 | progress, run `tensorboard --logdir=$TRAIN_DIR`. 55 | 56 | [in]: http://image-net.org/index 57 | 58 | -------------------------------------------------------------------------------- /datasets/__init__.py: -------------------------------------------------------------------------------- 1 | 2 | -------------------------------------------------------------------------------- /datasets/beyond_gauss.py: -------------------------------------------------------------------------------- 1 | # Copyright 2016 The TensorFlow Authors. All Rights Reserved. 2 | # 3 | # Licensed under the Apache License, Version 2.0 (the "License"); 4 | # you may not use this file except in compliance with the License. 5 | # You may obtain a copy of the License at 6 | # 7 | # http://www.apache.org/licenses/LICENSE-2.0 8 | # 9 | # Unless required by applicable law or agreed to in writing, software 10 | # distributed under the License is distributed on an "AS IS" BASIS, 11 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 12 | # See the License for the specific language governing permissions and 13 | # limitations under the License. 14 | # ============================================================================== 15 | """Provides data for the Cifar10 dataset.
16 | 17 | The dataset scripts used to create the dataset can be found at: 18 | tensorflow/models/slim/data/create_cifar10_dataset.py 19 | """ 20 | 21 | from __future__ import absolute_import 22 | from __future__ import division 23 | from __future__ import print_function 24 | 25 | import os 26 | import tensorflow as tf 27 | 28 | from datasets import dataset_utils 29 | 30 | slim = tf.contrib.slim 31 | 32 | _FILE_PATTERN = 'beyond_gauss_%s.tfrecord' 33 | 34 | # for bw patches 35 | # SPLITS_TO_SIZES = {'train': 212096, 'test': 10000} 36 | # for color patches 37 | SPLITS_TO_SIZES = {'train': 252416, 'test': 10000} 38 | _ITEMS_TO_DESCRIPTIONS = { 39 | 'input': 'The input image for the model', 40 | 'ground_truth': 'The ground truth image to regress on.', 41 | } 42 | 43 | 44 | def get_split(split_name, dataset_dir, file_pattern=None, reader=None): 45 | """Gets a dataset tuple with instructions for reading cifar10. 46 | 47 | Args: 48 | split_name: A train/test split name. 49 | dataset_dir: The base directory of the dataset sources. 50 | file_pattern: The file pattern to use when matching the dataset sources. 51 | It is assumed that the pattern contains a '%s' string so that the split 52 | name can be inserted. 53 | reader: The TensorFlow reader type. 54 | 55 | Returns: 56 | A `Dataset` namedtuple. 57 | 58 | Raises: 59 | ValueError: if `split_name` is not a valid train/test split. 60 | """ 61 | if split_name not in SPLITS_TO_SIZES: 62 | raise ValueError('split name %s was not recognized.' % split_name) 63 | 64 | if not file_pattern: 65 | file_pattern = _FILE_PATTERN 66 | file_pattern = os.path.join(dataset_dir, file_pattern % split_name) 67 | 68 | # Allowing None in the signature so that dataset_factory can use the default. 69 | if not reader: 70 | reader = tf.TFRecordReader 71 | 72 | keys_to_features = { 73 | 'input_img/encoded': tf.FixedLenFeature((), tf.string, default_value=''), 74 | 'gt_img/encoded': tf.FixedLenFeature((), tf.string, default_value=''), 75 | 'image/format': tf.FixedLenFeature((), tf.string, default_value='png'), 76 | } 77 | 78 | items_to_handlers = { 79 | 'input': slim.tfexample_decoder.Image('input_img/encoded', format_key='image/format'), 80 | 'ground_truth': slim.tfexample_decoder.Image('gt_img/encoded', format_key='image/format'), 81 | } 82 | 83 | decoder = slim.tfexample_decoder.TFExampleDecoder( 84 | keys_to_features, items_to_handlers) 85 | 86 | return slim.dataset.Dataset( 87 | data_sources=file_pattern, 88 | reader=reader, 89 | decoder=decoder, 90 | num_samples=SPLITS_TO_SIZES[split_name], 91 | items_to_descriptions=_ITEMS_TO_DESCRIPTIONS) 92 | -------------------------------------------------------------------------------- /datasets/classes.py: -------------------------------------------------------------------------------- 1 | import os 2 | from glob import glob 3 | from distutils.dir_util import copy_tree 4 | 5 | base = '/media/data/dirty_pix_v3/validation_RAW/' 6 | root = base + 'RAW_human_ISO8000_EXP10000' 7 | target = base + 'RAW_synset_ISO8000_EXP10000' 8 | 9 | human_labels = glob(os.path.join(root,'*/')) 10 | human_labels = [label.split('/')[-2] for label in human_labels] 11 | print(human_labels) 12 | 13 | human_to_synset = {} 14 | with open('raw_metadata.txt', 'r') as synset_human_file: 15 | for line in synset_human_file: 16 | synset = line[:9] 17 | human = line[9:].strip().lower() 18 | for label in human_labels: 19 | for match in human.split(','): 20 | if label.strip() == match.strip().lower(): 21 | human_to_synset[label] = synset 22 | 23 | 
print(human_to_synset) 24 | missing = False 25 | for h in human_labels: 26 | if h not in human_to_synset: 27 | print(h) 28 | missing = True 29 | if missing: 30 | print("Missing synsets!") 31 | else: 32 | print("All synsets mapped!") 33 | 34 | #print len(human_labels) 35 | #print len(human_to_synset) 36 | 37 | all_dirs = glob(os.path.join(root,'*/')) 38 | for subdir in all_dirs: 39 | no_imgs = len(glob(os.path.join(subdir, '*.dng'))) 40 | if not no_imgs: 41 | print(subdir + " is empty") 42 | continue 43 | 44 | subdir = subdir[len(root)+1:-1] 45 | print(subdir) 46 | 47 | if subdir not in human_to_synset: 48 | print("Skipping %s"%subdir) 49 | continue 50 | 51 | print("Copying %d files from class %s"%(no_imgs, subdir)) 52 | 53 | synset = human_to_synset[subdir] 54 | new_dir = os.path.join(target, synset) 55 | old_dir = os.path.join(root,subdir) 56 | copy_tree(old_dir, new_dir) 57 | -------------------------------------------------------------------------------- /datasets/dataset_factory.py: -------------------------------------------------------------------------------- 1 | # Copyright 2016 The TensorFlow Authors. All Rights Reserved. 2 | # 3 | # Licensed under the Apache License, Version 2.0 (the "License"); 4 | # you may not use this file except in compliance with the License. 5 | # You may obtain a copy of the License at 6 | # 7 | # http://www.apache.org/licenses/LICENSE-2.0 8 | # 9 | # Unless required by applicable law or agreed to in writing, software 10 | # distributed under the License is distributed on an "AS IS" BASIS, 11 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 12 | # See the License for the specific language governing permissions and 13 | # limitations under the License. 14 | # ============================================================================== 15 | """A factory-pattern class which returns classification image/label pairs.""" 16 | 17 | from __future__ import absolute_import 18 | from __future__ import division 19 | from __future__ import print_function 20 | 21 | from datasets import imagenet 22 | from datasets import imnet_reg 23 | from datasets import beyond_gauss 24 | from datasets import raw 25 | 26 | datasets_map = { 27 | 'imnet_reg': imnet_reg, 28 | 'imagenet': imagenet, 29 | 'beyond_gauss': beyond_gauss, 30 | 'raw': raw, 31 | } 32 | 33 | 34 | def get_dataset(name, split_name, dataset_dir, file_pattern=None, reader=None): 35 | """Given a dataset name and a split_name returns a Dataset. 36 | 37 | Args: 38 | name: String, the name of the dataset. 39 | split_name: A train/test split name. 40 | dataset_dir: The directory where the dataset files are stored. 41 | file_pattern: The file pattern to use for matching the dataset source files. 42 | reader: The subclass of tf.ReaderBase. If left as `None`, then the default 43 | reader defined by each dataset is used. 44 | 45 | Returns: 46 | A `Dataset` class. 47 | 48 | Raises: 49 | ValueError: If the dataset `name` is unknown. 50 | """ 51 | if name not in datasets_map: 52 | raise ValueError('Name of dataset unknown %s' % name) 53 | return datasets_map[name].get_split( 54 | split_name, 55 | dataset_dir, 56 | file_pattern, 57 | reader) 58 | -------------------------------------------------------------------------------- /datasets/dataset_utils.py: -------------------------------------------------------------------------------- 1 | # Copyright 2016 The TensorFlow Authors. All Rights Reserved.
2 | # 3 | # Licensed under the Apache License, Version 2.0 (the "License"); 4 | # you may not use this file except in compliance with the License. 5 | # You may obtain a copy of the License at 6 | # 7 | # http://www.apache.org/licenses/LICENSE-2.0 8 | # 9 | # Unless required by applicable law or agreed to in writing, software 10 | # distributed under the License is distributed on an "AS IS" BASIS, 11 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 12 | # See the License for the specific language governing permissions and 13 | # limitations under the License. 14 | # ============================================================================== 15 | """Contains utilities for downloading and converting datasets.""" 16 | from __future__ import absolute_import 17 | from __future__ import division 18 | from __future__ import print_function 19 | 20 | import os 21 | import sys 22 | import tarfile 23 | 24 | from six.moves import urllib 25 | import tensorflow as tf 26 | 27 | LABELS_FILENAME = 'labels.txt' 28 | 29 | 30 | def int64_feature(values): 31 | """Returns a TF-Feature of int64s. 32 | 33 | Args: 34 | values: A scalar or list of values. 35 | 36 | Returns: 37 | a TF-Feature. 38 | """ 39 | if not isinstance(values, (tuple, list)): 40 | values = [values] 41 | return tf.train.Feature(int64_list=tf.train.Int64List(value=values)) 42 | 43 | 44 | def bytes_feature(values): 45 | """Returns a TF-Feature of bytes. 46 | 47 | Args: 48 | values: A string. 49 | 50 | Returns: 51 | a TF-Feature. 52 | """ 53 | return tf.train.Feature(bytes_list=tf.train.BytesList(value=[values])) 54 | 55 | 56 | def image_to_tfexample(image_data, image_format, height, width, class_id): 57 | return tf.train.Example(features=tf.train.Features(feature={ 58 | 'image/encoded': bytes_feature(image_data), 59 | 'image/format': bytes_feature(image_format), 60 | 'image/class/label': int64_feature(class_id), 61 | 'image/height': int64_feature(height), 62 | 'image/width': int64_feature(width), 63 | })) 64 | 65 | 66 | def image_to_tfexample_for_regression(input_img, gt_img, image_format, height, width): 67 | return tf.train.Example(features=tf.train.Features(feature={ 68 | 'input_img/encoded': bytes_feature(input_img), 69 | 'gt_img/encoded': bytes_feature(gt_img), 70 | 'imgs/format': bytes_feature(image_format), 71 | 'imgs/height': int64_feature(height), 72 | 'imgs/width': int64_feature(width), 73 | })) 74 | 75 | 76 | def download_and_uncompress_tarball(tarball_url, dataset_dir): 77 | """Downloads the `tarball_url` and uncompresses it locally. 78 | 79 | Args: 80 | tarball_url: The URL of a tarball file. 81 | dataset_dir: The directory where the temporary files are stored. 82 | """ 83 | filename = tarball_url.split('/')[-1] 84 | filepath = os.path.join(dataset_dir, filename) 85 | 86 | def _progress(count, block_size, total_size): 87 | sys.stdout.write('\r>> Downloading %s %.1f%%' % ( 88 | filename, float(count * block_size) / float(total_size) * 100.0)) 89 | sys.stdout.flush() 90 | filepath, _ = urllib.request.urlretrieve(tarball_url, filepath, _progress) 91 | print() 92 | statinfo = os.stat(filepath) 93 | print('Successfully downloaded', filename, statinfo.st_size, 'bytes.') 94 | tarfile.open(filepath, 'r:gz').extractall(dataset_dir) 95 | 96 | 97 | def write_label_file(labels_to_class_names, dataset_dir, 98 | filename=LABELS_FILENAME): 99 | """Writes a file with the list of class names. 100 | 101 | Args: 102 | labels_to_class_names: A map of (integer) labels to class names. 
103 | dataset_dir: The directory in which the labels file should be written. 104 | filename: The filename where the class names are written. 105 | """ 106 | labels_filename = os.path.join(dataset_dir, filename) 107 | with tf.gfile.Open(labels_filename, 'w') as f: 108 | for label in labels_to_class_names: 109 | class_name = labels_to_class_names[label] 110 | f.write('%d:%s\n' % (label, class_name)) 111 | 112 | 113 | def has_labels(dataset_dir, filename=LABELS_FILENAME): 114 | """Specifies whether or not the dataset directory contains a label map file. 115 | 116 | Args: 117 | dataset_dir: The directory in which the labels file is found. 118 | filename: The filename where the class names are written. 119 | 120 | Returns: 121 | `True` if the labels file exists and `False` otherwise. 122 | """ 123 | return tf.gfile.Exists(os.path.join(dataset_dir, filename)) 124 | 125 | 126 | def read_label_file(dataset_dir, filename=LABELS_FILENAME): 127 | """Reads the labels file and returns a mapping from ID to class name. 128 | 129 | Args: 130 | dataset_dir: The directory in which the labels file is found. 131 | filename: The filename where the class names are written. 132 | 133 | Returns: 134 | A map from a label (integer) to class name. 135 | """ 136 | labels_filename = os.path.join(dataset_dir, filename) 137 | with tf.gfile.Open(labels_filename, 'r') as f: 138 | lines = f.read() #f.read().decode() 139 | lines = lines.split('\n') 140 | print('lines ', lines) 141 | lines = filter(None, lines) 142 | 143 | labels_to_class_names = {} 144 | for line in lines: 145 | index = line.index(':') 146 | labels_to_class_names[int(line[:index])] = line[index+1:] 147 | return labels_to_class_names 148 | -------------------------------------------------------------------------------- /datasets/imagenet.py: -------------------------------------------------------------------------------- 1 | # Copyright 2016 The TensorFlow Authors. All Rights Reserved. 2 | # 3 | # Licensed under the Apache License, Version 2.0 (the "License"); 4 | # you may not use this file except in compliance with the License. 5 | # You may obtain a copy of the License at 6 | # 7 | # http://www.apache.org/licenses/LICENSE-2.0 8 | # 9 | # Unless required by applicable law or agreed to in writing, software 10 | # distributed under the License is distributed on an "AS IS" BASIS, 11 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 12 | # See the License for the specific language governing permissions and 13 | # limitations under the License. 14 | # ============================================================================== 15 | """Provides data for the ImageNet ILSVRC 2012 Dataset plus some bounding boxes. 16 | 17 | Some images have one or more bounding boxes associated with the label of the 18 | image. See details here: http://image-net.org/download-bboxes 19 | 20 | ImageNet is based upon WordNet 3.0. To uniquely identify a synset, we use 21 | "WordNet ID" (wnid), which is a concatenation of POS ( i.e. part of speech ) 22 | and SYNSET OFFSET of WordNet. For more information, please refer to the 23 | WordNet documentation[http://wordnet.princeton.edu/wordnet/documentation/]. 24 | 25 | "There are bounding boxes for over 3000 popular synsets available. 26 | For each synset, there are on average 150 images with bounding boxes." 27 | 28 | WARNING: Don't use for object detection, in this case all the bounding boxes 29 | of the image belong to just one class. 
30 | """ 31 | from __future__ import absolute_import 32 | from __future__ import division 33 | from __future__ import print_function 34 | 35 | import os 36 | from six.moves import urllib 37 | import tensorflow as tf 38 | 39 | from datasets import dataset_utils 40 | 41 | slim = tf.contrib.slim 42 | 43 | # TODO(nsilberman): Add tfrecord file type once the script is updated. 44 | _FILE_PATTERN = '%s-*' 45 | 46 | _SPLITS_TO_SIZES = { 47 | 'train': 600000,#1281167, 48 | 'validation': 50000, 49 | } 50 | 51 | _ITEMS_TO_DESCRIPTIONS = { 52 | 'image': 'A color image of varying height and width.', 53 | 'label': 'The label id of the image, integer between 0 and 999', 54 | 'label_text': 'The text of the label.', 55 | 'object/bbox': 'A list of bounding boxes.', 56 | 'object/label': 'A list of labels, one per each object.', 57 | } 58 | 59 | _NUM_CLASSES = 1001 60 | 61 | 62 | def create_readable_names_for_imagenet_labels(): 63 | """Create a dict mapping label id to human readable string. 64 | 65 | Returns: 66 | labels_to_names: dictionary where keys are integers from to 1000 67 | and values are human-readable names. 68 | 69 | We retrieve a synset file, which contains a list of valid synset labels used 70 | by ILSVRC competition. There is one synset one per line, eg. 71 | # n01440764 72 | # n01443537 73 | We also retrieve a synset_to_human_file, which contains a mapping from synsets 74 | to human-readable names for every synset in Imagenet. These are stored in a 75 | tsv format, as follows: 76 | # n02119247 black fox 77 | # n02119359 silver fox 78 | We assign each synset (in alphabetical order) an integer, starting from 1 79 | (since 0 is reserved for the background class). 80 | 81 | Code is based on 82 | https://github.com/tensorflow/models/blob/master/inception/inception/data/build_imagenet_data.py#L463 83 | """ 84 | 85 | # pylint: disable=g-line-too-long 86 | base_url = 'https://raw.githubusercontent.com/tensorflow/models/master/inception/inception/data/' 87 | synset_url = '{}/imagenet_lsvrc_2015_synsets.txt'.format(base_url) 88 | synset_to_human_url = '{}/imagenet_metadata.txt'.format(base_url) 89 | 90 | #filename, _ = urllib.request.urlretrieve(synset_url) 91 | filename = './datasets/imagenet_lsvrc_2015_synsets.txt' 92 | synset_list = [s.strip() for s in open(filename).readlines()] 93 | num_synsets_in_ilsvrc = len(synset_list) 94 | assert num_synsets_in_ilsvrc == 1000 95 | 96 | #filename, _ = urllib.request.urlretrieve(synset_to_human_url) 97 | filename = './datasets/imagenet_metadata.txt' 98 | synset_to_human_list = open(filename).readlines() 99 | num_synsets_in_all_imagenet = len(synset_to_human_list) 100 | assert num_synsets_in_all_imagenet == 21842 101 | 102 | synset_to_human = {} 103 | for s in synset_to_human_list: 104 | parts = s.strip().split('\t') 105 | assert len(parts) == 2 106 | synset = parts[0] 107 | human = parts[1] 108 | synset_to_human[synset] = human 109 | 110 | label_index = 1 111 | labels_to_names = {0: 'background'} 112 | for synset in synset_list: 113 | name = synset_to_human[synset] 114 | labels_to_names[label_index] = name 115 | label_index += 1 116 | 117 | return labels_to_names 118 | 119 | 120 | def get_split(split_name, dataset_dir, file_pattern=None, reader=None): 121 | """Gets a dataset tuple with instructions for reading ImageNet. 122 | 123 | Args: 124 | split_name: A train/test split name. 125 | dataset_dir: The base directory of the dataset sources. 126 | file_pattern: The file pattern to use when matching the dataset sources. 
127 | It is assumed that the pattern contains a '%s' string so that the split 128 | name can be inserted. 129 | reader: The TensorFlow reader type. 130 | 131 | Returns: 132 | A `Dataset` namedtuple. 133 | 134 | Raises: 135 | ValueError: if `split_name` is not a valid train/test split. 136 | """ 137 | if split_name not in _SPLITS_TO_SIZES: 138 | raise ValueError('split name %s was not recognized.' % split_name) 139 | 140 | if not file_pattern: 141 | file_pattern = _FILE_PATTERN 142 | file_pattern = os.path.join(dataset_dir, file_pattern % split_name) 143 | 144 | # Allowing None in the signature so that dataset_factory can use the default. 145 | if reader is None: 146 | reader = tf.TFRecordReader 147 | 148 | keys_to_features = { 149 | 'image/encoded': tf.FixedLenFeature( 150 | (), tf.string, default_value=''), 151 | 'image/format': tf.FixedLenFeature( 152 | (), tf.string, default_value='jpeg'), 153 | 'image/class/label': tf.FixedLenFeature( 154 | [], dtype=tf.int64, default_value=-1), 155 | 'image/class/text': tf.FixedLenFeature( 156 | [], dtype=tf.string, default_value=''), 157 | 'image/object/bbox/xmin': tf.VarLenFeature( 158 | dtype=tf.float32), 159 | 'image/object/bbox/ymin': tf.VarLenFeature( 160 | dtype=tf.float32), 161 | 'image/object/bbox/xmax': tf.VarLenFeature( 162 | dtype=tf.float32), 163 | 'image/object/bbox/ymax': tf.VarLenFeature( 164 | dtype=tf.float32), 165 | 'image/object/class/label': tf.VarLenFeature( 166 | dtype=tf.int64), 167 | } 168 | 169 | items_to_handlers = { 170 | 'image': slim.tfexample_decoder.Image('image/encoded', 'image/format'), 171 | 'label': slim.tfexample_decoder.Tensor('image/class/label'), 172 | 'label_text': slim.tfexample_decoder.Tensor('image/class/text'), 173 | 'object/bbox': slim.tfexample_decoder.BoundingBox( 174 | ['ymin', 'xmin', 'ymax', 'xmax'], 'image/object/bbox/'), 175 | 'object/label': slim.tfexample_decoder.Tensor('image/object/class/label'), 176 | } 177 | 178 | decoder = slim.tfexample_decoder.TFExampleDecoder( 179 | keys_to_features, items_to_handlers) 180 | 181 | labels_to_names = None 182 | if dataset_utils.has_labels(dataset_dir): 183 | labels_to_names = dataset_utils.read_label_file(dataset_dir) 184 | else: 185 | labels_to_names = create_readable_names_for_imagenet_labels() 186 | dataset_utils.write_label_file(labels_to_names, dataset_dir) 187 | 188 | return slim.dataset.Dataset( 189 | data_sources=file_pattern, 190 | reader=reader, 191 | decoder=decoder, 192 | num_samples=_SPLITS_TO_SIZES[split_name], 193 | items_to_descriptions=_ITEMS_TO_DESCRIPTIONS, 194 | num_classes=_NUM_CLASSES, 195 | labels_to_names=labels_to_names) 196 | -------------------------------------------------------------------------------- /datasets/imagenet_lsvrc_2015_synsets.txt: -------------------------------------------------------------------------------- 1 | n01440764 2 | n01443537 3 | n01484850 4 | n01491361 5 | n01494475 6 | n01496331 7 | n01498041 8 | n01514668 9 | n01514859 10 | n01518878 11 | n01530575 12 | n01531178 13 | n01532829 14 | n01534433 15 | n01537544 16 | n01558993 17 | n01560419 18 | n01580077 19 | n01582220 20 | n01592084 21 | n01601694 22 | n01608432 23 | n01614925 24 | n01616318 25 | n01622779 26 | n01629819 27 | n01630670 28 | n01631663 29 | n01632458 30 | n01632777 31 | n01641577 32 | n01644373 33 | n01644900 34 | n01664065 35 | n01665541 36 | n01667114 37 | n01667778 38 | n01669191 39 | n01675722 40 | n01677366 41 | n01682714 42 | n01685808 43 | n01687978 44 | n01688243 45 | n01689811 46 | n01692333 47 | n01693334 48 | n01694178 49 | 
n01695060 50 | n01697457 51 | n01698640 52 | n01704323 53 | n01728572 54 | n01728920 55 | n01729322 56 | n01729977 57 | n01734418 58 | n01735189 59 | n01737021 60 | n01739381 61 | n01740131 62 | n01742172 63 | n01744401 64 | n01748264 65 | n01749939 66 | n01751748 67 | n01753488 68 | n01755581 69 | n01756291 70 | n01768244 71 | n01770081 72 | n01770393 73 | n01773157 74 | n01773549 75 | n01773797 76 | n01774384 77 | n01774750 78 | n01775062 79 | n01776313 80 | n01784675 81 | n01795545 82 | n01796340 83 | n01797886 84 | n01798484 85 | n01806143 86 | n01806567 87 | n01807496 88 | n01817953 89 | n01818515 90 | n01819313 91 | n01820546 92 | n01824575 93 | n01828970 94 | n01829413 95 | n01833805 96 | n01843065 97 | n01843383 98 | n01847000 99 | n01855032 100 | n01855672 101 | n01860187 102 | n01871265 103 | n01872401 104 | n01873310 105 | n01877812 106 | n01882714 107 | n01883070 108 | n01910747 109 | n01914609 110 | n01917289 111 | n01924916 112 | n01930112 113 | n01943899 114 | n01944390 115 | n01945685 116 | n01950731 117 | n01955084 118 | n01968897 119 | n01978287 120 | n01978455 121 | n01980166 122 | n01981276 123 | n01983481 124 | n01984695 125 | n01985128 126 | n01986214 127 | n01990800 128 | n02002556 129 | n02002724 130 | n02006656 131 | n02007558 132 | n02009229 133 | n02009912 134 | n02011460 135 | n02012849 136 | n02013706 137 | n02017213 138 | n02018207 139 | n02018795 140 | n02025239 141 | n02027492 142 | n02028035 143 | n02033041 144 | n02037110 145 | n02051845 146 | n02056570 147 | n02058221 148 | n02066245 149 | n02071294 150 | n02074367 151 | n02077923 152 | n02085620 153 | n02085782 154 | n02085936 155 | n02086079 156 | n02086240 157 | n02086646 158 | n02086910 159 | n02087046 160 | n02087394 161 | n02088094 162 | n02088238 163 | n02088364 164 | n02088466 165 | n02088632 166 | n02089078 167 | n02089867 168 | n02089973 169 | n02090379 170 | n02090622 171 | n02090721 172 | n02091032 173 | n02091134 174 | n02091244 175 | n02091467 176 | n02091635 177 | n02091831 178 | n02092002 179 | n02092339 180 | n02093256 181 | n02093428 182 | n02093647 183 | n02093754 184 | n02093859 185 | n02093991 186 | n02094114 187 | n02094258 188 | n02094433 189 | n02095314 190 | n02095570 191 | n02095889 192 | n02096051 193 | n02096177 194 | n02096294 195 | n02096437 196 | n02096585 197 | n02097047 198 | n02097130 199 | n02097209 200 | n02097298 201 | n02097474 202 | n02097658 203 | n02098105 204 | n02098286 205 | n02098413 206 | n02099267 207 | n02099429 208 | n02099601 209 | n02099712 210 | n02099849 211 | n02100236 212 | n02100583 213 | n02100735 214 | n02100877 215 | n02101006 216 | n02101388 217 | n02101556 218 | n02102040 219 | n02102177 220 | n02102318 221 | n02102480 222 | n02102973 223 | n02104029 224 | n02104365 225 | n02105056 226 | n02105162 227 | n02105251 228 | n02105412 229 | n02105505 230 | n02105641 231 | n02105855 232 | n02106030 233 | n02106166 234 | n02106382 235 | n02106550 236 | n02106662 237 | n02107142 238 | n02107312 239 | n02107574 240 | n02107683 241 | n02107908 242 | n02108000 243 | n02108089 244 | n02108422 245 | n02108551 246 | n02108915 247 | n02109047 248 | n02109525 249 | n02109961 250 | n02110063 251 | n02110185 252 | n02110341 253 | n02110627 254 | n02110806 255 | n02110958 256 | n02111129 257 | n02111277 258 | n02111500 259 | n02111889 260 | n02112018 261 | n02112137 262 | n02112350 263 | n02112706 264 | n02113023 265 | n02113186 266 | n02113624 267 | n02113712 268 | n02113799 269 | n02113978 270 | n02114367 271 | n02114548 272 | n02114712 273 | n02114855 274 | 
n02115641 275 | n02115913 276 | n02116738 277 | n02117135 278 | n02119022 279 | n02119789 280 | n02120079 281 | n02120505 282 | n02123045 283 | n02123159 284 | n02123394 285 | n02123597 286 | n02124075 287 | n02125311 288 | n02127052 289 | n02128385 290 | n02128757 291 | n02128925 292 | n02129165 293 | n02129604 294 | n02130308 295 | n02132136 296 | n02133161 297 | n02134084 298 | n02134418 299 | n02137549 300 | n02138441 301 | n02165105 302 | n02165456 303 | n02167151 304 | n02168699 305 | n02169497 306 | n02172182 307 | n02174001 308 | n02177972 309 | n02190166 310 | n02206856 311 | n02219486 312 | n02226429 313 | n02229544 314 | n02231487 315 | n02233338 316 | n02236044 317 | n02256656 318 | n02259212 319 | n02264363 320 | n02268443 321 | n02268853 322 | n02276258 323 | n02277742 324 | n02279972 325 | n02280649 326 | n02281406 327 | n02281787 328 | n02317335 329 | n02319095 330 | n02321529 331 | n02325366 332 | n02326432 333 | n02328150 334 | n02342885 335 | n02346627 336 | n02356798 337 | n02361337 338 | n02363005 339 | n02364673 340 | n02389026 341 | n02391049 342 | n02395406 343 | n02396427 344 | n02397096 345 | n02398521 346 | n02403003 347 | n02408429 348 | n02410509 349 | n02412080 350 | n02415577 351 | n02417914 352 | n02422106 353 | n02422699 354 | n02423022 355 | n02437312 356 | n02437616 357 | n02441942 358 | n02442845 359 | n02443114 360 | n02443484 361 | n02444819 362 | n02445715 363 | n02447366 364 | n02454379 365 | n02457408 366 | n02480495 367 | n02480855 368 | n02481823 369 | n02483362 370 | n02483708 371 | n02484975 372 | n02486261 373 | n02486410 374 | n02487347 375 | n02488291 376 | n02488702 377 | n02489166 378 | n02490219 379 | n02492035 380 | n02492660 381 | n02493509 382 | n02493793 383 | n02494079 384 | n02497673 385 | n02500267 386 | n02504013 387 | n02504458 388 | n02509815 389 | n02510455 390 | n02514041 391 | n02526121 392 | n02536864 393 | n02606052 394 | n02607072 395 | n02640242 396 | n02641379 397 | n02643566 398 | n02655020 399 | n02666196 400 | n02667093 401 | n02669723 402 | n02672831 403 | n02676566 404 | n02687172 405 | n02690373 406 | n02692877 407 | n02699494 408 | n02701002 409 | n02704792 410 | n02708093 411 | n02727426 412 | n02730930 413 | n02747177 414 | n02749479 415 | n02769748 416 | n02776631 417 | n02777292 418 | n02782093 419 | n02783161 420 | n02786058 421 | n02787622 422 | n02788148 423 | n02790996 424 | n02791124 425 | n02791270 426 | n02793495 427 | n02794156 428 | n02795169 429 | n02797295 430 | n02799071 431 | n02802426 432 | n02804414 433 | n02804610 434 | n02807133 435 | n02808304 436 | n02808440 437 | n02814533 438 | n02814860 439 | n02815834 440 | n02817516 441 | n02823428 442 | n02823750 443 | n02825657 444 | n02834397 445 | n02835271 446 | n02837789 447 | n02840245 448 | n02841315 449 | n02843684 450 | n02859443 451 | n02860847 452 | n02865351 453 | n02869837 454 | n02870880 455 | n02871525 456 | n02877765 457 | n02879718 458 | n02883205 459 | n02892201 460 | n02892767 461 | n02894605 462 | n02895154 463 | n02906734 464 | n02909870 465 | n02910353 466 | n02916936 467 | n02917067 468 | n02927161 469 | n02930766 470 | n02939185 471 | n02948072 472 | n02950826 473 | n02951358 474 | n02951585 475 | n02963159 476 | n02965783 477 | n02966193 478 | n02966687 479 | n02971356 480 | n02974003 481 | n02977058 482 | n02978881 483 | n02979186 484 | n02980441 485 | n02981792 486 | n02988304 487 | n02992211 488 | n02992529 489 | n02999410 490 | n03000134 491 | n03000247 492 | n03000684 493 | n03014705 494 | n03016953 495 | n03017168 496 | 
n03018349 497 | n03026506 498 | n03028079 499 | n03032252 500 | n03041632 501 | n03042490 502 | n03045698 503 | n03047690 504 | n03062245 505 | n03063599 506 | n03063689 507 | n03065424 508 | n03075370 509 | n03085013 510 | n03089624 511 | n03095699 512 | n03100240 513 | n03109150 514 | n03110669 515 | n03124043 516 | n03124170 517 | n03125729 518 | n03126707 519 | n03127747 520 | n03127925 521 | n03131574 522 | n03133878 523 | n03134739 524 | n03141823 525 | n03146219 526 | n03160309 527 | n03179701 528 | n03180011 529 | n03187595 530 | n03188531 531 | n03196217 532 | n03197337 533 | n03201208 534 | n03207743 535 | n03207941 536 | n03208938 537 | n03216828 538 | n03218198 539 | n03220513 540 | n03223299 541 | n03240683 542 | n03249569 543 | n03250847 544 | n03255030 545 | n03259280 546 | n03271574 547 | n03272010 548 | n03272562 549 | n03290653 550 | n03291819 551 | n03297495 552 | n03314780 553 | n03325584 554 | n03337140 555 | n03344393 556 | n03345487 557 | n03347037 558 | n03355925 559 | n03372029 560 | n03376595 561 | n03379051 562 | n03384352 563 | n03388043 564 | n03388183 565 | n03388549 566 | n03393912 567 | n03394916 568 | n03400231 569 | n03404251 570 | n03417042 571 | n03424325 572 | n03425413 573 | n03443371 574 | n03444034 575 | n03445777 576 | n03445924 577 | n03447447 578 | n03447721 579 | n03450230 580 | n03452741 581 | n03457902 582 | n03459775 583 | n03461385 584 | n03467068 585 | n03476684 586 | n03476991 587 | n03478589 588 | n03481172 589 | n03482405 590 | n03483316 591 | n03485407 592 | n03485794 593 | n03492542 594 | n03494278 595 | n03495258 596 | n03496892 597 | n03498962 598 | n03527444 599 | n03529860 600 | n03530642 601 | n03532672 602 | n03534580 603 | n03535780 604 | n03538406 605 | n03544143 606 | n03584254 607 | n03584829 608 | n03590841 609 | n03594734 610 | n03594945 611 | n03595614 612 | n03598930 613 | n03599486 614 | n03602883 615 | n03617480 616 | n03623198 617 | n03627232 618 | n03630383 619 | n03633091 620 | n03637318 621 | n03642806 622 | n03649909 623 | n03657121 624 | n03658185 625 | n03661043 626 | n03662601 627 | n03666591 628 | n03670208 629 | n03673027 630 | n03676483 631 | n03680355 632 | n03690938 633 | n03691459 634 | n03692522 635 | n03697007 636 | n03706229 637 | n03709823 638 | n03710193 639 | n03710637 640 | n03710721 641 | n03717622 642 | n03720891 643 | n03721384 644 | n03724870 645 | n03729826 646 | n03733131 647 | n03733281 648 | n03733805 649 | n03742115 650 | n03743016 651 | n03759954 652 | n03761084 653 | n03763968 654 | n03764736 655 | n03769881 656 | n03770439 657 | n03770679 658 | n03773504 659 | n03775071 660 | n03775546 661 | n03776460 662 | n03777568 663 | n03777754 664 | n03781244 665 | n03782006 666 | n03785016 667 | n03786901 668 | n03787032 669 | n03788195 670 | n03788365 671 | n03791053 672 | n03792782 673 | n03792972 674 | n03793489 675 | n03794056 676 | n03796401 677 | n03803284 678 | n03804744 679 | n03814639 680 | n03814906 681 | n03825788 682 | n03832673 683 | n03837869 684 | n03838899 685 | n03840681 686 | n03841143 687 | n03843555 688 | n03854065 689 | n03857828 690 | n03866082 691 | n03868242 692 | n03868863 693 | n03871628 694 | n03873416 695 | n03874293 696 | n03874599 697 | n03876231 698 | n03877472 699 | n03877845 700 | n03884397 701 | n03887697 702 | n03888257 703 | n03888605 704 | n03891251 705 | n03891332 706 | n03895866 707 | n03899768 708 | n03902125 709 | n03903868 710 | n03908618 711 | n03908714 712 | n03916031 713 | n03920288 714 | n03924679 715 | n03929660 716 | n03929855 717 | n03930313 718 | 
n03930630 719 | n03933933 720 | n03935335 721 | n03937543 722 | n03938244 723 | n03942813 724 | n03944341 725 | n03947888 726 | n03950228 727 | n03954731 728 | n03956157 729 | n03958227 730 | n03961711 731 | n03967562 732 | n03970156 733 | n03976467 734 | n03976657 735 | n03977966 736 | n03980874 737 | n03982430 738 | n03983396 739 | n03991062 740 | n03992509 741 | n03995372 742 | n03998194 743 | n04004767 744 | n04005630 745 | n04008634 746 | n04009552 747 | n04019541 748 | n04023962 749 | n04026417 750 | n04033901 751 | n04033995 752 | n04037443 753 | n04039381 754 | n04040759 755 | n04041544 756 | n04044716 757 | n04049303 758 | n04065272 759 | n04067472 760 | n04069434 761 | n04070727 762 | n04074963 763 | n04081281 764 | n04086273 765 | n04090263 766 | n04099969 767 | n04111531 768 | n04116512 769 | n04118538 770 | n04118776 771 | n04120489 772 | n04125021 773 | n04127249 774 | n04131690 775 | n04133789 776 | n04136333 777 | n04141076 778 | n04141327 779 | n04141975 780 | n04146614 781 | n04147183 782 | n04149813 783 | n04152593 784 | n04153751 785 | n04154565 786 | n04162706 787 | n04179913 788 | n04192698 789 | n04200800 790 | n04201297 791 | n04204238 792 | n04204347 793 | n04208210 794 | n04209133 795 | n04209239 796 | n04228054 797 | n04229816 798 | n04235860 799 | n04238763 800 | n04239074 801 | n04243546 802 | n04251144 803 | n04252077 804 | n04252225 805 | n04254120 806 | n04254680 807 | n04254777 808 | n04258138 809 | n04259630 810 | n04263257 811 | n04264628 812 | n04265275 813 | n04266014 814 | n04270147 815 | n04273569 816 | n04275548 817 | n04277352 818 | n04285008 819 | n04286575 820 | n04296562 821 | n04310018 822 | n04311004 823 | n04311174 824 | n04317175 825 | n04325704 826 | n04326547 827 | n04328186 828 | n04330267 829 | n04332243 830 | n04335435 831 | n04336792 832 | n04344873 833 | n04346328 834 | n04347754 835 | n04350905 836 | n04355338 837 | n04355933 838 | n04356056 839 | n04357314 840 | n04366367 841 | n04367480 842 | n04370456 843 | n04371430 844 | n04371774 845 | n04372370 846 | n04376876 847 | n04380533 848 | n04389033 849 | n04392985 850 | n04398044 851 | n04399382 852 | n04404412 853 | n04409515 854 | n04417672 855 | n04418357 856 | n04423845 857 | n04428191 858 | n04429376 859 | n04435653 860 | n04442312 861 | n04443257 862 | n04447861 863 | n04456115 864 | n04458633 865 | n04461696 866 | n04462240 867 | n04465501 868 | n04467665 869 | n04476259 870 | n04479046 871 | n04482393 872 | n04483307 873 | n04485082 874 | n04486054 875 | n04487081 876 | n04487394 877 | n04493381 878 | n04501370 879 | n04505470 880 | n04507155 881 | n04509417 882 | n04515003 883 | n04517823 884 | n04522168 885 | n04523525 886 | n04525038 887 | n04525305 888 | n04532106 889 | n04532670 890 | n04536866 891 | n04540053 892 | n04542943 893 | n04548280 894 | n04548362 895 | n04550184 896 | n04552348 897 | n04553703 898 | n04554684 899 | n04557648 900 | n04560804 901 | n04562935 902 | n04579145 903 | n04579432 904 | n04584207 905 | n04589890 906 | n04590129 907 | n04591157 908 | n04591713 909 | n04592741 910 | n04596742 911 | n04597913 912 | n04599235 913 | n04604644 914 | n04606251 915 | n04612504 916 | n04613696 917 | n06359193 918 | n06596364 919 | n06785654 920 | n06794110 921 | n06874185 922 | n07248320 923 | n07565083 924 | n07579787 925 | n07583066 926 | n07584110 927 | n07590611 928 | n07613480 929 | n07614500 930 | n07615774 931 | n07684084 932 | n07693725 933 | n07695742 934 | n07697313 935 | n07697537 936 | n07711569 937 | n07714571 938 | n07714990 939 | n07715103 940 | 
n07716358 941 | n07716906 942 | n07717410 943 | n07717556 944 | n07718472 945 | n07718747 946 | n07720875 947 | n07730033 948 | n07734744 949 | n07742313 950 | n07745940 951 | n07747607 952 | n07749582 953 | n07753113 954 | n07753275 955 | n07753592 956 | n07754684 957 | n07760859 958 | n07768694 959 | n07802026 960 | n07831146 961 | n07836838 962 | n07860988 963 | n07871810 964 | n07873807 965 | n07875152 966 | n07880968 967 | n07892512 968 | n07920052 969 | n07930864 970 | n07932039 971 | n09193705 972 | n09229709 973 | n09246464 974 | n09256479 975 | n09288635 976 | n09332890 977 | n09399592 978 | n09421951 979 | n09428293 980 | n09468604 981 | n09472597 982 | n09835506 983 | n10148035 984 | n10565667 985 | n11879895 986 | n11939491 987 | n12057211 988 | n12144580 989 | n12267677 990 | n12620546 991 | n12768682 992 | n12985857 993 | n12998815 994 | n13037406 995 | n13040303 996 | n13044778 997 | n13052670 998 | n13054560 999 | n13133613 1000 | n15075141 1001 | -------------------------------------------------------------------------------- /datasets/imnet_reg.py: -------------------------------------------------------------------------------- 1 | # Copyright 2016 The TensorFlow Authors. All Rights Reserved. 2 | # 3 | # Licensed under the Apache License, Version 2.0 (the "License"); 4 | # you may not use this file except in compliance with the License. 5 | # You may obtain a copy of the License at 6 | # 7 | # http://www.apache.org/licenses/LICENSE-2.0 8 | # 9 | # Unless required by applicable law or agreed to in writing, software 10 | # distributed under the License is distributed on an "AS IS" BASIS, 11 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 12 | # See the License for the specific language governing permissions and 13 | # limitations under the License. 14 | # ============================================================================== 15 | """Provides data for the ImageNet ILSVRC 2012 Dataset plus some bounding boxes. 16 | 17 | Some images have one or more bounding boxes associated with the label of the 18 | image. See details here: http://image-net.org/download-bboxes 19 | 20 | ImageNet is based upon WordNet 3.0. To uniquely identify a synset, we use 21 | "WordNet ID" (wnid), which is a concatenation of POS ( i.e. part of speech ) 22 | and SYNSET OFFSET of WordNet. For more information, please refer to the 23 | WordNet documentation[http://wordnet.princeton.edu/wordnet/documentation/]. 24 | 25 | "There are bounding boxes for over 3000 popular synsets available. 26 | For each synset, there are on average 150 images with bounding boxes." 27 | 28 | WARNING: Don't use for object detection, in this case all the bounding boxes 29 | of the image belong to just one class. 30 | """ 31 | from __future__ import absolute_import 32 | from __future__ import division 33 | from __future__ import print_function 34 | 35 | import os 36 | from six.moves import urllib 37 | import tensorflow as tf 38 | 39 | from datasets import dataset_utils 40 | 41 | slim = tf.contrib.slim 42 | 43 | # TODO(nsilberman): Add tfrecord file type once the script is updated. 
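# Note: in addition to the standard ImageNet fields, this provider also decodes paired
# 'input' and 'ground_truth' images (see items_to_handlers below), so a split can be read
# as input/target pairs for image-to-image regression training; the '%s-*' pattern simply
# globs every TFRecord shard whose filename starts with the split name.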
44 | _FILE_PATTERN = '%s-*' 45 | 46 | _SPLITS_TO_SIZES = { 47 | 'train': 1156644, 48 | 'validation': 50000, 49 | } 50 | 51 | _ITEMS_TO_DESCRIPTIONS = { 52 | 'input_img': 'A color image of varying height and width.', 53 | 'gt_img': 'A color image of varying height and width.', 54 | 'label': 'The label id of the image, integer between 0 and 999', 55 | 'label_text': 'The text of the label.', 56 | 'object/bbox': 'A list of bounding boxes.', 57 | 'object/label': 'A list of labels, one per each object.', 58 | } 59 | 60 | _NUM_CLASSES = 1001 61 | 62 | 63 | def create_readable_names_for_imagenet_labels(): 64 | """Create a dict mapping label id to human readable string. 65 | 66 | Returns: 67 | labels_to_names: dictionary where keys are integers from to 1000 68 | and values are human-readable names. 69 | 70 | We retrieve a synset file, which contains a list of valid synset labels used 71 | by ILSVRC competition. There is one synset one per line, eg. 72 | # n01440764 73 | # n01443537 74 | We also retrieve a synset_to_human_file, which contains a mapping from synsets 75 | to human-readable names for every synset in Imagenet. These are stored in a 76 | tsv format, as follows: 77 | # n02119247 black fox 78 | # n02119359 silver fox 79 | We assign each synset (in alphabetical order) an integer, starting from 1 80 | (since 0 is reserved for the background class). 81 | 82 | Code is based on 83 | https://github.com/tensorflow/models/blob/master/inception/inception/data/convert_reg_imgnet.py#L463 84 | """ 85 | 86 | # pylint: disable=g-line-too-long 87 | base_url = 'https://raw.githubusercontent.com/tensorflow/models/master/inception/inception/data/' 88 | synset_url = '{}/imagenet_lsvrc_2015_synsets.txt'.format(base_url) 89 | synset_to_human_url = '{}/imagenet_metadata.txt'.format(base_url) 90 | 91 | #filename, _ = urllib.request.urlretrieve(synset_url) 92 | filename = 'imagenet_lsvrc_2015_synsets.txt' 93 | synset_list = [s.strip() for s in open(filename).readlines()] 94 | num_synsets_in_ilsvrc = len(synset_list) 95 | assert num_synsets_in_ilsvrc == 1000 96 | 97 | #filename, _ = urllib.request.urlretrieve(synset_to_human_url) 98 | filename = 'imagenet_metadata.txt' 99 | synset_to_human_list = open(filename).readlines() 100 | num_synsets_in_all_imagenet = len(synset_to_human_list) 101 | assert num_synsets_in_all_imagenet == 21842 102 | 103 | synset_to_human = {} 104 | for s in synset_to_human_list: 105 | parts = s.strip().split('\t') 106 | assert len(parts) == 2 107 | synset = parts[0] 108 | human = parts[1] 109 | synset_to_human[synset] = human 110 | 111 | label_index = 1 112 | labels_to_names = {0: 'background'} 113 | for synset in synset_list: 114 | name = synset_to_human[synset] 115 | labels_to_names[label_index] = name 116 | label_index += 1 117 | 118 | return labels_to_names 119 | 120 | 121 | def get_split(split_name, dataset_dir, file_pattern=None, reader=None): 122 | """Gets a dataset tuple with instructions for reading ImageNet. 123 | 124 | Args: 125 | split_name: A train/test split name. 126 | dataset_dir: The base directory of the dataset sources. 127 | file_pattern: The file pattern to use when matching the dataset sources. 128 | It is assumed that the pattern contains a '%s' string so that the split 129 | name can be inserted. 130 | reader: The TensorFlow reader type. 131 | 132 | Returns: 133 | A `Dataset` namedtuple. 134 | 135 | Raises: 136 | ValueError: if `split_name` is not a valid train/test split. 
137 | """ 138 | if split_name not in _SPLITS_TO_SIZES: 139 | raise ValueError('split name %s was not recognized.' % split_name) 140 | 141 | if not file_pattern: 142 | file_pattern = _FILE_PATTERN 143 | file_pattern = os.path.join(dataset_dir, file_pattern % split_name) 144 | 145 | # Allowing None in the signature so that dataset_factory can use the default. 146 | if reader is None: 147 | reader = tf.TFRecordReader 148 | 149 | keys_to_features = { 150 | 'input_img/encoded': tf.FixedLenFeature( 151 | (), tf.string, default_value=''), 152 | 'gt_img/encoded': tf.FixedLenFeature( 153 | (), tf.string, default_value=''), 154 | 'image/format': tf.FixedLenFeature( 155 | (), tf.string, default_value='jpeg'), 156 | 'image/class/label': tf.FixedLenFeature( 157 | [], dtype=tf.int64, default_value=-1), 158 | 'image/class/text': tf.FixedLenFeature( 159 | [], dtype=tf.string, default_value=''), 160 | 'image/object/bbox/xmin': tf.VarLenFeature( 161 | dtype=tf.float32), 162 | 'image/object/bbox/ymin': tf.VarLenFeature( 163 | dtype=tf.float32), 164 | 'image/object/bbox/xmax': tf.VarLenFeature( 165 | dtype=tf.float32), 166 | 'image/object/bbox/ymax': tf.VarLenFeature( 167 | dtype=tf.float32), 168 | 'image/object/class/label': tf.VarLenFeature( 169 | dtype=tf.int64), 170 | } 171 | 172 | items_to_handlers = { 173 | 'input': slim.tfexample_decoder.Image('input_img/encoded', 'image/format'), 174 | 'ground_truth': slim.tfexample_decoder.Image('gt_img/encoded', 'image/format'), 175 | 'label': slim.tfexample_decoder.Tensor('image/class/label'), 176 | 'label_text': slim.tfexample_decoder.Tensor('image/class/text'), 177 | 'object/bbox': slim.tfexample_decoder.BoundingBox( 178 | ['ymin', 'xmin', 'ymax', 'xmax'], 'image/object/bbox/'), 179 | 'object/label': slim.tfexample_decoder.Tensor('image/object/class/label'), 180 | } 181 | 182 | decoder = slim.tfexample_decoder.TFExampleDecoder( 183 | keys_to_features, items_to_handlers) 184 | 185 | return slim.dataset.Dataset( 186 | data_sources=file_pattern, 187 | reader=reader, 188 | decoder=decoder, 189 | num_samples=_SPLITS_TO_SIZES[split_name], 190 | items_to_descriptions=_ITEMS_TO_DESCRIPTIONS) 191 | -------------------------------------------------------------------------------- /datasets/raw.py: -------------------------------------------------------------------------------- 1 | """ 2 | data for the ImageNet ILSVRC 2012 Dataset plus some bounding boxes. 3 | 4 | Some images have one or more bounding boxes associated with the label of the 5 | image. See details here: http://image-net.org/download-bboxes 6 | 7 | ImageNet is based upon WordNet 3.0. To uniquely identify a synset, we use 8 | "WordNet ID" (wnid), which is a concatenation of POS ( i.e. part of speech ) 9 | and SYNSET OFFSET of WordNet. For more information, please refer to the 10 | WordNet documentation[http://wordnet.princeton.edu/wordnet/documentation/]. 11 | 12 | "There are bounding boxes for over 3000 popular synsets available. 13 | For each synset, there are on average 150 images with bounding boxes." 14 | 15 | WARNING: Don't use for object detection, in this case all the bounding boxes 16 | of the image belong to just one class. 17 | """ 18 | from __future__ import absolute_import 19 | from __future__ import division 20 | from __future__ import print_function 21 | 22 | import os 23 | from six.moves import urllib 24 | import tensorflow as tf 25 | 26 | from datasets import dataset_utils 27 | 28 | slim = tf.contrib.slim 29 | 30 | # TODO(nsilberman): Add tfrecord file type once the script is updated. 
31 | _FILE_PATTERN = '%s-*' 32 | 33 | _SPLITS_TO_SIZES = { 34 | 'train': 1281167, 35 | 'validation': 1103, # low 844, medium 1103 36 | } 37 | 38 | _ITEMS_TO_DESCRIPTIONS = { 39 | 'image': 'A color image of varying height and width.', 40 | 'label': 'The label id of the image, integer between 0 and 999', 41 | 'label_text': 'The text of the label.', 42 | 'object/bbox': 'A list of bounding boxes.', 43 | 'object/label': 'A list of labels, one per each object.', 44 | } 45 | 46 | _NUM_CLASSES = 1001 47 | 48 | 49 | def create_readable_names_for_imagenet_labels(): 50 | """Create a dict mapping label id to human readable string. 51 | 52 | Returns: 53 | labels_to_names: dictionary where keys are integers from to 1000 54 | and values are human-readable names. 55 | 56 | We retrieve a synset file, which contains a list of valid synset labels used 57 | by ILSVRC competition. There is one synset one per line, eg. 58 | # n01440764 59 | # n01443537 60 | We also retrieve a synset_to_human_file, which contains a mapping from synsets 61 | to human-readable names for every synset in Imagenet. These are stored in a 62 | tsv format, as follows: 63 | # n02119247 black fox 64 | # n02119359 silver fox 65 | We assign each synset (in alphabetical order) an integer, starting from 1 66 | (since 0 is reserved for the background class). 67 | 68 | Code is based on 69 | https://github.com/tensorflow/models/blob/master/inception/inception/data/build_imagenet_data.py#L463 70 | """ 71 | 72 | # pylint: disable=g-line-too-long 73 | base_url = 'https://raw.githubusercontent.com/tensorflow/models/master/inception/inception/data/' 74 | synset_url = '{}/imagenet_lsvrc_2015_synsets.txt'.format(base_url) 75 | synset_to_human_url = '{}/imagenet_metadata.txt'.format(base_url) 76 | 77 | filename, _ = urllib.request.urlretrieve(synset_url) 78 | synset_list = [s.strip() for s in open(filename).readlines()] 79 | num_synsets_in_ilsvrc = len(synset_list) 80 | assert num_synsets_in_ilsvrc == 1000 81 | 82 | filename, _ = urllib.request.urlretrieve(synset_to_human_url) 83 | synset_to_human_list = open(filename).readlines() 84 | num_synsets_in_all_imagenet = len(synset_to_human_list) 85 | assert num_synsets_in_all_imagenet == 21842 86 | 87 | synset_to_human = {} 88 | for s in synset_to_human_list: 89 | parts = s.strip().split('\t') 90 | assert len(parts) == 2 91 | synset = parts[0] 92 | human = parts[1] 93 | synset_to_human[synset] = human 94 | 95 | label_index = 1 96 | labels_to_names = {0: 'background'} 97 | for synset in synset_list: 98 | name = synset_to_human[synset] 99 | labels_to_names[label_index] = name 100 | label_index += 1 101 | 102 | return labels_to_names 103 | 104 | 105 | def get_split(split_name, dataset_dir, file_pattern=None, reader=None): 106 | """Gets a dataset tuple with instructions for reading ImageNet. 107 | 108 | Args: 109 | split_name: A train/test split name. 110 | dataset_dir: The base directory of the dataset sources. 111 | file_pattern: The file pattern to use when matching the dataset sources. 112 | It is assumed that the pattern contains a '%s' string so that the split 113 | name can be inserted. 114 | reader: The TensorFlow reader type. 115 | 116 | Returns: 117 | A `Dataset` namedtuple. 118 | 119 | Raises: 120 | ValueError: if `split_name` is not a valid train/test split. 121 | """ 122 | if split_name not in _SPLITS_TO_SIZES: 123 | raise ValueError('split name %s was not recognized.' 
% split_name) 124 | 125 | if not file_pattern: 126 | file_pattern = _FILE_PATTERN 127 | file_pattern = os.path.join(dataset_dir, file_pattern % split_name) 128 | 129 | # Allowing None in the signature so that dataset_factory can use the default. 130 | if reader is None: 131 | reader = tf.TFRecordReader 132 | 133 | keys_to_features = { 134 | 'image/encoded': tf.FixedLenFeature( 135 | (), tf.string, default_value=''), 136 | 'image/format': tf.FixedLenFeature( 137 | (), tf.string, default_value='jpeg'), 138 | 'image/class/label': tf.FixedLenFeature( 139 | [], dtype=tf.int64, default_value=-1), 140 | 'image/class/text': tf.FixedLenFeature( 141 | [], dtype=tf.string, default_value=''), 142 | 'image/filename': tf.FixedLenFeature( 143 | [], dtype=tf.string, default_value=''), 144 | 'image/object/bbox/xmin': tf.VarLenFeature( 145 | dtype=tf.float32), 146 | 'image/object/bbox/ymin': tf.VarLenFeature( 147 | dtype=tf.float32), 148 | 'image/object/bbox/xmax': tf.VarLenFeature( 149 | dtype=tf.float32), 150 | 'image/object/bbox/ymax': tf.VarLenFeature( 151 | dtype=tf.float32), 152 | 'image/object/class/label': tf.VarLenFeature( 153 | dtype=tf.int64), 154 | } 155 | 156 | items_to_handlers = { 157 | 'image': slim.tfexample_decoder.Image('image/encoded', 'image/format'), 158 | 'label': slim.tfexample_decoder.Tensor('image/class/label'), 159 | 'filename': slim.tfexample_decoder.Tensor('image/filename'), 160 | 'label_text': slim.tfexample_decoder.Tensor('image/class/text'), 161 | 'object/bbox': slim.tfexample_decoder.BoundingBox( 162 | ['ymin', 'xmin', 'ymax', 'xmax'], 'image/object/bbox/'), 163 | 'object/label': slim.tfexample_decoder.Tensor('image/object/class/label'), 164 | } 165 | 166 | decoder = slim.tfexample_decoder.TFExampleDecoder( 167 | keys_to_features, items_to_handlers) 168 | 169 | labels_to_names = None 170 | if dataset_utils.has_labels(dataset_dir): 171 | labels_to_names = dataset_utils.read_label_file(dataset_dir) 172 | else: 173 | labels_to_names = create_readable_names_for_imagenet_labels() 174 | dataset_utils.write_label_file(labels_to_names, dataset_dir) 175 | 176 | return slim.dataset.Dataset( 177 | data_sources=file_pattern, 178 | reader=reader, 179 | decoder=decoder, 180 | num_samples=_SPLITS_TO_SIZES[split_name], 181 | items_to_descriptions=_ITEMS_TO_DESCRIPTIONS, 182 | num_classes=_NUM_CLASSES, 183 | labels_to_names=labels_to_names) 184 | 185 | -------------------------------------------------------------------------------- /datasets/raw_metadata.txt: -------------------------------------------------------------------------------- 1 | n04116512 rubber eraser, rubber, pencil eraser 2 | n03995372 power drill 3 | n03983396 pop bottle, soda bottle 4 | n03291819 envelope 5 | n03063599 coffee mug 6 | n03891251 park bench 7 | n07753592 banana 8 | n02870880 bookcase 9 | n02965783 car mirror 10 | n02823428 beer bottle 11 | n02974003 car wheel 12 | n04254777 sock, socks 13 | n03085013 computer keyboard, keypad 14 | n03793489 mouse, computer mouse 15 | n02783161 ballpoint, ballpoint pen, ballpen, Biro 16 | n04485082 tripod 17 | n02877765 bottlecap 18 | n03792782 mountain bike, all-terrain bike, off-roader 19 | n03782006 monitor 20 | n04131690 saltshaker, salt shaker 21 | n03761084 microwave, microwave oven 22 | n04557648 water bottle 23 | n03208938 disk brake, disc brake, disk brakes 24 | n04507155 umbrella 25 | n02786058 Band Aid 26 | n04153751 screw 27 | n04548362 wallet, billfold, notecase, pocketbook 28 | n04254120 soap dispenser 29 | n04356056 sunglasses, dark glasses, shades 30 | 
n04548280 wall clock 31 | n04447861 toilet seat 32 | n03958227 plastic bag 33 | n03717622 manhole cover 34 | n03481172 hammer 35 | n15075141 toilet tissue, toilet paper, bathroom tissue 36 | n04004767 printer 37 | n03924679 photocopier 38 | n03657121 lens cap, lens cover 39 | n04118776 rule, ruler 40 | n04009552 projector 41 | n03857828 oscilloscope, scope, cathode-ray oscilloscope, CRO 42 | n03492542 hard disc, hard disk, fixed disk 43 | n03388183 fountain pen 44 | n02840245 binder, ring-binder 45 | n02769748 backpack, back pack, knapsack, packsack, rucksack, haversack 46 | n03832673 notebook, notebook computer 47 | n03297495 espresso maker 48 | n02782093 balloon 49 | n03887697 paper towel 50 | n04069434 reflex camera 51 | n03180011 desktop computer 52 | n03179701 desk 53 | n02992529 cellular telephone, cellular phone, cellphone, cell, mobile phone 54 | n03637318 lampshade, lamp shade 55 | n03929660 pick, plectrum, plectron 56 | n03445777 golf ball 57 | n03666591 lighter, light, igniter, ignitor 58 | n04591713 wine bottle 59 | n02747177 trash can 60 | n04409515 tennis ball 61 | n03223299 doormat 62 | n04554684 washing machine 63 | n04557648 water bottle 64 | n04553703 washbasin 65 | 66 | -------------------------------------------------------------------------------- /datasets/synset_labels.txt: -------------------------------------------------------------------------------- 1 | n02823428:441 2 | n04507155:880 3 | n04485082:873 4 | n04557648:899 5 | n03782006:665 6 | n02769748:415 7 | n03291819:550 8 | n03793489:674 9 | n03085013:509 10 | n02840245:447 11 | n02974003:480 12 | n03887697:701 13 | n03761084:652 14 | n04153751:784 15 | n02870880:454 16 | n03924679:714 17 | n03983396:738 18 | n15075141:1000 19 | n03717622:641 20 | n03208938:536 21 | n04254120:805 22 | n03063599:505 23 | n03637318:620 24 | n03891251:704 25 | n03832673:682 26 | n04356056:838 27 | n04118776:770 28 | n04009552:746 29 | n02965783:476 30 | n04004767:743 31 | n03180011:528 32 | n07753592:955 33 | n04548280:893 34 | n04447861:862 35 | n04548362:894 36 | n03995372:741 37 | n03857828:689 38 | n02783161:419 39 | n03388183:564 40 | n03297495:551 41 | n03085013:509 42 | n03657121:623 43 | n07753592:955 44 | n04548280:893 45 | n04254777:807 46 | n02783161:419 47 | n03887697:701 48 | n02823428:441 49 | n04447861:862 50 | n03782006:665 51 | n03063599:505 52 | n15075141:1000 53 | n03891251:704 54 | n04153751:784 55 | n03857828:689 56 | n03291819:550 57 | n02786058:420 58 | n04548362:894 59 | n04118776:770 60 | n02974003:480 61 | n03761084:652 62 | n04485082:873 63 | n04254120:805 64 | n03924679:714 65 | n03637318:620 66 | n02769748:415 67 | n02870880:454 68 | n03793489:674 69 | n03995372:741 70 | n03717622:641 71 | n02965783:476 72 | n03887697:701 73 | n03717622:641 74 | n03063599:505 75 | n04507155:880 76 | n04447861:862 77 | n07753592:955 78 | n04153751:784 79 | n15075141:1000 80 | n03793489:674 81 | n03782006:665 82 | n03291819:550 83 | n02870880:454 84 | n04254120:805 85 | n04118776:770 86 | n03657121:623 87 | n03208938:536 88 | n03983396:738 89 | n02783161:419 90 | n03085013:509 91 | n04548280:893 92 | n03297495:551 93 | n04485082:873 94 | n04004767:743 95 | n03857828:689 96 | n02974003:480 97 | n04548362:894 98 | n03761084:652 99 | n02769748:415 100 | n03891251:704 101 | n04254777:807 102 | n03924679:714 103 | n03995372:741 104 | n02823428:441 105 | n03832673:682 106 | n02786058:420 107 | n04557648:899 108 | n02965783:476 109 | n02840245:447 110 | n04009552:746 111 | n04356056:838 112 | n03637318:620 113 | n03180011:528 
114 | n03388183:564 115 | n03929660:715 116 | n04591713:908 117 | n03445777:575 118 | n03666591:627 119 | n02747177:413 120 | n04409515:853 121 | n03223299:540 122 | n04554684:898 123 | n04557648:899 124 | n04553703:897 125 | n04116512:768 126 | -------------------------------------------------------------------------------- /datasets/training_synsets.txt: -------------------------------------------------------------------------------- 1 | n01440764 2 | n01443537 3 | n01484850 4 | n01491361 5 | n01494475 6 | n01496331 7 | n01498041 8 | n01514668 9 | n01514859 10 | n01518878 11 | n01530575 12 | n01531178 13 | n01532829 14 | n01534433 15 | n01537544 16 | n01558993 17 | n01560419 18 | n01580077 19 | n01582220 20 | n01592084 21 | n01601694 22 | n01608432 23 | n01614925 24 | n01616318 25 | n01622779 26 | n01629819 27 | n01630670 28 | n01631663 29 | n01632458 30 | n01632777 31 | n01641577 32 | n01644373 33 | n01644900 34 | n01664065 35 | n01665541 36 | n01667114 37 | n01667778 38 | n01669191 39 | n01675722 40 | n01677366 41 | n01682714 42 | n01685808 43 | n01687978 44 | n01688243 45 | n01689811 46 | n01692333 47 | n01693334 48 | n01694178 49 | n01695060 50 | n01697457 51 | n01698640 52 | n01704323 53 | n01728572 54 | n01728920 55 | n01729322 56 | n01729977 57 | n01734418 58 | n01735189 59 | n01737021 60 | n01739381 61 | n01740131 62 | n01742172 63 | n01744401 64 | n01748264 65 | n01749939 66 | n01751748 67 | n01753488 68 | n01755581 69 | n01756291 70 | n01768244 71 | n01770081 72 | n01770393 73 | n01773157 74 | n01773549 75 | n01773797 76 | n01774384 77 | n01774750 78 | n01775062 79 | n01776313 80 | n01784675 81 | n01795545 82 | n01796340 83 | n01797886 84 | n01798484 85 | n01806143 86 | n01806567 87 | n01807496 88 | n01817953 89 | n01818515 90 | n01819313 91 | n01820546 92 | n01824575 93 | n01828970 94 | n01829413 95 | n01833805 96 | n01843065 97 | n01843383 98 | n01847000 99 | n01855032 100 | n01855672 101 | n01860187 102 | n01871265 103 | n01872772 104 | n01873310 105 | n01877812 106 | n01882714 107 | n01883070 108 | n01910747 109 | n01914609 110 | n01917289 111 | n01924916 112 | n01930112 113 | n01943899 114 | n01944390 115 | n07922607 116 | n01950731 117 | n01955084 118 | n01968897 119 | n01978287 120 | n01978455 121 | n01980166 122 | n01981276 123 | n01983481 124 | n01984695 125 | n01985128 126 | n01986214 127 | n01990800 128 | n02002556 129 | n02002724 130 | n02006656 131 | n02007558 132 | n02009229 133 | n02009912 134 | n02011460 135 | n02012849 136 | n02013706 137 | n02017213 138 | n02018207 139 | n02018795 140 | n02025239 141 | n02027492 142 | n02028035 143 | n02033041 144 | n02037110 145 | n02051845 146 | n02056570 147 | n02058221 148 | n02066245 149 | n02071294 150 | n02074367 151 | n02077923 152 | n02085620 153 | n02085782 154 | n02085936 155 | n02086079 156 | n02086240 157 | n02086646 158 | n02086910 159 | n02087046 160 | n02087394 161 | n02088094 162 | n02088238 163 | n02088364 164 | n02088466 165 | n02088632 166 | n02089078 167 | n02089867 168 | n02089973 169 | n02090379 170 | n02090622 171 | n02090721 172 | n02091032 173 | n02091134 174 | n02091244 175 | n02091467 176 | n02091635 177 | n02091831 178 | n02092002 179 | n02092339 180 | n02093256 181 | n02093428 182 | n02093647 183 | n02093754 184 | n02093859 185 | n02093991 186 | n02094114 187 | n02094258 188 | n02094433 189 | n02095314 190 | n02095570 191 | n02095889 192 | n02096051 193 | n02096177 194 | n02096294 195 | n02096437 196 | n02096585 197 | n02097047 198 | n02097130 199 | n02097209 200 | n02097298 201 | n02097474 202 | 
n02097658 203 | n02098105 204 | n02098286 205 | n02098413 206 | n02099267 207 | n02099429 208 | n02099601 209 | n02099712 210 | n02099849 211 | n02100236 212 | n02100583 213 | n02100735 214 | n02100877 215 | n02101006 216 | n02101388 217 | n02101556 218 | n02102040 219 | n02102177 220 | n02102318 221 | n02102480 222 | n02102973 223 | n02104029 224 | n02104365 225 | n02105056 226 | n02105162 227 | n02105251 228 | n02105412 229 | n02105505 230 | n02105641 231 | n02105855 232 | n02106030 233 | n02106166 234 | n02106382 235 | n02106550 236 | n02106662 237 | n02107142 238 | n02107312 239 | n02107574 240 | n02107683 241 | n02107908 242 | n02108000 243 | n02108089 244 | n02108422 245 | n02108551 246 | n02108915 247 | n02109047 248 | n02109525 249 | n02109961 250 | n02110063 251 | n02110185 252 | n02110341 253 | n02110627 254 | n02110806 255 | n02110958 256 | n02111129 257 | n02111277 258 | n02111500 259 | n02111889 260 | n02112018 261 | n02112137 262 | n02112350 263 | n02112706 264 | n02113023 265 | n02113186 266 | n02113624 267 | n02113712 268 | n02113799 269 | n02113978 270 | n02114367 271 | n02114548 272 | n02114712 273 | n02114855 274 | n02115641 275 | n02115913 276 | n02116738 277 | n02117135 278 | n02119022 279 | n02119789 280 | n02120079 281 | n02120505 282 | n02123045 283 | n02123159 284 | n02123394 285 | n02123597 286 | n02124075 287 | n02125311 288 | n02127052 289 | n02128385 290 | n02128757 291 | n02128925 292 | n02129165 293 | n02129604 294 | n02130308 295 | n02132136 296 | n02133161 297 | n02134084 298 | n02134418 299 | n02137549 300 | n02138441 301 | n02165105 302 | n02165456 303 | n02167151 304 | n02168699 305 | n02169497 306 | n02172182 307 | n02174001 308 | n02177972 309 | n03373237 310 | n02206856 311 | n02219486 312 | n02226429 313 | n02229544 314 | n02231487 315 | n02233338 316 | n02236044 317 | n02256656 318 | n02259212 319 | n02264363 320 | n02268443 321 | n02268853 322 | n02276258 323 | n02277742 324 | n02279972 325 | n02280649 326 | n02281406 327 | n02281787 328 | n02317335 329 | n02319095 330 | n02321529 331 | n02325366 332 | n02326432 333 | n02328150 334 | n02342885 335 | n02346627 336 | n02356798 337 | n02361337 338 | n02818254 339 | n02364673 340 | n02389026 341 | n02391049 342 | n02395406 343 | n02396427 344 | n02397096 345 | n02398521 346 | n02403003 347 | n02408429 348 | n02410509 349 | n02412080 350 | n02415577 351 | n02417914 352 | n02422106 353 | n02422699 354 | n02423022 355 | n02437312 356 | n02437616 357 | n02441942 358 | n02442845 359 | n02443114 360 | n02443484 361 | n02444819 362 | n02445715 363 | n02447366 364 | n02454379 365 | n02457408 366 | n02480495 367 | n02480855 368 | n02481823 369 | n02483362 370 | n02483708 371 | n02484975 372 | n02486261 373 | n02486410 374 | n02487347 375 | n02488291 376 | n02488702 377 | n02489166 378 | n02490219 379 | n02492035 380 | n02492660 381 | n02493509 382 | n02493793 383 | n02494079 384 | n02497673 385 | n02500267 386 | n02504013 387 | n02504458 388 | n02509815 389 | n02510455 390 | n02514041 391 | n02526121 392 | n02536864 393 | n02606052 394 | n02607072 395 | n02640242 396 | n02641379 397 | n02643566 398 | n02655020 399 | n02666196 400 | n02667093 401 | n02669723 402 | n02672831 403 | n02676566 404 | n02687172 405 | n02690373 406 | n02692877 407 | n02699494 408 | n02701002 409 | n02704792 410 | n02708093 411 | n02727426 412 | n08496334 413 | n02747177 414 | n02749479 415 | n02769748 416 | n02776631 417 | n02777292 418 | n02782093 419 | n02783161 420 | n02786058 421 | n02787622 422 | n02788148 423 | n02790996 424 | 
n02791124 425 | n02791270 426 | n02793495 427 | n02794156 428 | n02795169 429 | n02797295 430 | n02799071 431 | n02802426 432 | n02804515 433 | n02804610 434 | n02807133 435 | n02808304 436 | n02808440 437 | n02814533 438 | n02814860 439 | n02815834 440 | n02817516 441 | n02823428 442 | n02823750 443 | n02825657 444 | n02834397 445 | n02835271 446 | n02837789 447 | n02840245 448 | n02841315 449 | n02843684 450 | n02859443 451 | n02860847 452 | n02865351 453 | n02869837 454 | n02870880 455 | n02871525 456 | n02877765 457 | n02879718 458 | n02883205 459 | n02892201 460 | n02892767 461 | n02894605 462 | n02895154 463 | n02906734 464 | n02909870 465 | n02910353 466 | n02916936 467 | n02917067 468 | n02927161 469 | n02930766 470 | n02939185 471 | n02948072 472 | n02950826 473 | n02951358 474 | n02951585 475 | n02963159 476 | n02965783 477 | n02966193 478 | n02966687 479 | n02971356 480 | n02974003 481 | n02977058 482 | n02978881 483 | n02979186 484 | n02980441 485 | n02981792 486 | n02988304 487 | n02992211 488 | n02992529 489 | n02999936 490 | n03000134 491 | n03000247 492 | n03000684 493 | n03014705 494 | n03016953 495 | n03017168 496 | n03018349 497 | n03026506 498 | n03028079 499 | n03032252 500 | n03041632 501 | n03042490 502 | n03045698 503 | n03047690 504 | n03062245 505 | n03063599 506 | n03063689 507 | n03065424 508 | n03075370 509 | n03085013 510 | n03089624 511 | n03095699 512 | n03100240 513 | n03109150 514 | n03110669 515 | n03124043 516 | n03124170 517 | n03125729 518 | n03126707 519 | n03127747 520 | n03127925 521 | n03131574 522 | n03133878 523 | n03134739 524 | n03141823 525 | n03146219 526 | n03160309 527 | n03179701 528 | n03180011 529 | n03187595 530 | n03188531 531 | n03196217 532 | n03197337 533 | n03201208 534 | n03207743 535 | n03207941 536 | n03208938 537 | n03216828 538 | n03218198 539 | n03220513 540 | n03223299 541 | n03240683 542 | n03249569 543 | n03250847 544 | n03255030 545 | n03259401 546 | n03271574 547 | n03272010 548 | n03272562 549 | n03290653 550 | n13869788 551 | n03297495 552 | n03314780 553 | n03325584 554 | n03337140 555 | n03344393 556 | n03345487 557 | n03347037 558 | n03355925 559 | n03372029 560 | n03376595 561 | n03379051 562 | n03384352 563 | n03388043 564 | n03388183 565 | n03388549 566 | n03393912 567 | n03394916 568 | n03400231 569 | n03404251 570 | n03417042 571 | n03424325 572 | n03425413 573 | n03443371 574 | n03444034 575 | n03445777 576 | n03445924 577 | n03447447 578 | n03447721 579 | n03450230 580 | n03452741 581 | n03457902 582 | n03459775 583 | n03461385 584 | n03467068 585 | n03476684 586 | n03476991 587 | n03478589 588 | n03482001 589 | n03482405 590 | n03483316 591 | n03485407 592 | n03485794 593 | n03492542 594 | n03494278 595 | n03495570 596 | n03496892 597 | n03498962 598 | n03527565 599 | n03529860 600 | n09218315 601 | n03532672 602 | n03534580 603 | n03535780 604 | n03538406 605 | n03544143 606 | n03584254 607 | n03584829 608 | n03590841 609 | n03594734 610 | n03594945 611 | n03595614 612 | n03598930 613 | n03599486 614 | n03602883 615 | n03617480 616 | n03623198 617 | n03627232 618 | n03630383 619 | n03633091 620 | n03637318 621 | n03642806 622 | n03649909 623 | n03657121 624 | n03658185 625 | n07977870 626 | n03662601 627 | n03666591 628 | n03670208 629 | n03673027 630 | n03676483 631 | n03680355 632 | n03690938 633 | n03691459 634 | n03692522 635 | n03697007 636 | n03706229 637 | n03709823 638 | n03710193 639 | n03710637 640 | n03710721 641 | n03717622 642 | n03720891 643 | n03721384 644 | n03725035 645 | n03729826 646 | 
n03733131 647 | n03733281 648 | n03733805 649 | n03742115 650 | n03743016 651 | n03759954 652 | n03761084 653 | n03763968 654 | n03764736 655 | n03769881 656 | n03770439 657 | n03770679 658 | n03773504 659 | n03775071 660 | n03775546 661 | n03776460 662 | n03777568 663 | n03777754 664 | n03781244 665 | n03782006 666 | n03785016 667 | n03786901 668 | n03787032 669 | n03788195 670 | n03788365 671 | n03791053 672 | n03792782 673 | n03792972 674 | n03793489 675 | n03794056 676 | n03796401 677 | n03803284 678 | n03804744 679 | n03814639 680 | n03814906 681 | n03825788 682 | n03832673 683 | n03837869 684 | n03838899 685 | n03840681 686 | n03841143 687 | n03843555 688 | n03854065 689 | n03857828 690 | n03866082 691 | n03868242 692 | n03868863 693 | n03871628 694 | n03873416 695 | n03874293 696 | n03874599 697 | n03876231 698 | n03877472 699 | n03878211 700 | n03884397 701 | n03887697 702 | n03888257 703 | n03888605 704 | n03891251 705 | n03891332 706 | n03895866 707 | n03899768 708 | n03902125 709 | n03903868 710 | n03908618 711 | n03908714 712 | n03916031 713 | n03920288 714 | n03924679 715 | n03929660 716 | n03929855 717 | n03930313 718 | n03930630 719 | n03934042 720 | n03935335 721 | n03937543 722 | n03938244 723 | n03942813 724 | n03944341 725 | n03947888 726 | n03950228 727 | n03954731 728 | n03956157 729 | n03958227 730 | n03961711 731 | n03967562 732 | n03970156 733 | n03976467 734 | n03977158 735 | n03977966 736 | n03980874 737 | n03982430 738 | n03983396 739 | n03991062 740 | n03992509 741 | n03995372 742 | n03998194 743 | n04004767 744 | n04005630 745 | n04008634 746 | n04009801 747 | n04019541 748 | n04023962 749 | n04026417 750 | n04033901 751 | n04033995 752 | n04037443 753 | n04039381 754 | n09403211 755 | n04041544 756 | n04044716 757 | n04049303 758 | n04065272 759 | n04067658 760 | n04069434 761 | n04070727 762 | n04074963 763 | n04081281 764 | n04086273 765 | n04090263 766 | n04099969 767 | n04111531 768 | n04116512 769 | n04118538 770 | n04118776 771 | n04120489 772 | n04125116 773 | n04127249 774 | n04131690 775 | n04133789 776 | n04136333 777 | n04141076 778 | n04141327 779 | n04141975 780 | n04146614 781 | n04147291 782 | n04149813 783 | n04152593 784 | n04154340 785 | n07917272 786 | n04162706 787 | n04179913 788 | n04192698 789 | n04200800 790 | n04201297 791 | n04204238 792 | n04204347 793 | n04208427 794 | n04209133 795 | n04209239 796 | n04228054 797 | n04229816 798 | n04235860 799 | n04238763 800 | n04239074 801 | n04243546 802 | n04251144 803 | n04252077 804 | n04252225 805 | n04254120 806 | n04254680 807 | n04254777 808 | n04258138 809 | n04259630 810 | n04263257 811 | n04264628 812 | n04265275 813 | n04266014 814 | n04270147 815 | n04273569 816 | n04275548 817 | n04277669 818 | n04285008 819 | n04286575 820 | n04296562 821 | n04310018 822 | n04311004 823 | n04311174 824 | n04317175 825 | n04325704 826 | n04326547 827 | n04328186 828 | n04330267 829 | n04332243 830 | n04335435 831 | n04337157 832 | n04344873 833 | n04346328 834 | n04347754 835 | n04350905 836 | n04355338 837 | n04355933 838 | n04356056 839 | n04357314 840 | n04366367 841 | n04367480 842 | n04370456 843 | n04371430 844 | n04371774 845 | n04372370 846 | n04376876 847 | n04380533 848 | n04389033 849 | n04392985 850 | n04398044 851 | n04399382 852 | n04404412 853 | n04409515 854 | n04417672 855 | n04418357 856 | n04423845 857 | n04428191 858 | n04429376 859 | n04435653 860 | n04442312 861 | n04443257 862 | n04447861 863 | n04456115 864 | n04458633 865 | n04461696 866 | n04462240 867 | n04465666 868 | 
n04467665 869 | n04476259 870 | n04479046 871 | n04482393 872 | n04483307 873 | n04485082 874 | n04486054 875 | n04487081 876 | n04487394 877 | n04493381 878 | n04501370 879 | n04505470 880 | n04507155 881 | n04509417 882 | n04515003 883 | n04517823 884 | n04522168 885 | n04523525 886 | n04525038 887 | n04525305 888 | n04532106 889 | n04532670 890 | n04536866 891 | n04540053 892 | n04542943 893 | n04548280 894 | n04548362 895 | n04550184 896 | n04552348 897 | n04553703 898 | n04554684 899 | n04557648 900 | n04560804 901 | n04562935 902 | n04579145 903 | n04579667 904 | n04584207 905 | n04589890 906 | n04590129 907 | n04591157 908 | n04591713 909 | n04592741 910 | n04596742 911 | n04597913 912 | n04599235 913 | n04604644 914 | n04606251 915 | n04612504 916 | n04613696 917 | n06359193 918 | n06596364 919 | n06785654 920 | n06794110 921 | n06874185 922 | n07248320 923 | n07565083 924 | n07579787 925 | n07583066 926 | n07584110 927 | n07590611 928 | n07613480 929 | n07614500 930 | n07615774 931 | n07684084 932 | n07693725 933 | n07695742 934 | n07697313 935 | n07697537 936 | n07711569 937 | n07714571 938 | n07714990 939 | n07715103 940 | n12159804 941 | n12160303 942 | n12160857 943 | n07717556 944 | n07718472 945 | n07718747 946 | n07720875 947 | n07730033 948 | n13001041 949 | n07742313 950 | n07745940 951 | n07747607 952 | n07749582 953 | n07753113 954 | n07753275 955 | n07753592 956 | n07754684 957 | n07760859 958 | n07768694 959 | n07802026 960 | n07831146 961 | n07836838 962 | n07860988 963 | n07871810 964 | n07873807 965 | n07875152 966 | n07880968 967 | n07892512 968 | n07920052 969 | n07930864 970 | n07932039 971 | n09193705 972 | n09229709 973 | n09246464 974 | n09256479 975 | n09288635 976 | n09332890 977 | n09399592 978 | n09421951 979 | n09428293 980 | n09468604 981 | n09472597 982 | n09835506 983 | n10148035 984 | n10565667 985 | n11879895 986 | n11939491 987 | n12057211 988 | n12144580 989 | n12267677 990 | n12620546 991 | n12768682 992 | n12985857 993 | n12998815 994 | n13037406 995 | n13040303 996 | n13044778 997 | n13052670 998 | n13054560 999 | n13133613 1000 | n15075141 1001 | -------------------------------------------------------------------------------- /deployment/__init__.py: -------------------------------------------------------------------------------- 1 | 2 | -------------------------------------------------------------------------------- /environment.yml: -------------------------------------------------------------------------------- 1 | name: dirtypix 2 | channels: 3 | - defaults 4 | dependencies: 5 | - _libgcc_mutex=0.1=main 6 | - _tflow_select=2.1.0=gpu 7 | - absl-py=0.12.0=py36h06a4308_0 8 | - astor=0.8.1=py36h06a4308_0 9 | - blas=1.0=mkl 10 | - blosc=1.21.0=h8c45485_0 11 | - brotli=1.0.9=he6710b0_2 12 | - bzip2=1.0.8=h7b6447c_0 13 | - c-ares=1.17.1=h27cfd23_0 14 | - ca-certificates=2021.4.13=h06a4308_1 15 | - certifi=2020.12.5=py36h06a4308_0 16 | - charls=2.1.0=he6710b0_2 17 | - cloudpickle=1.6.0=py_0 18 | - coverage=5.5=py36h27cfd23_2 19 | - cudatoolkit=9.2=0 20 | - cudnn=7.6.5=cuda9.2_0 21 | - cupti=9.2.148=0 22 | - cycler=0.10.0=py36_0 23 | - cython=0.29.23=py36h2531618_0 24 | - cytoolz=0.11.0=py36h7b6447c_0 25 | - dask-core=2021.3.0=pyhd3eb1b0_0 26 | - decorator=5.0.6=pyhd3eb1b0_0 27 | - freetype=2.10.4=h5ab3b9f_0 28 | - gast=0.4.0=py_0 29 | - giflib=5.1.4=h14c3975_1 30 | - grpcio=1.36.1=py36h2157cd5_1 31 | - h5py=2.10.0=py36hd6299e0_1 32 | - hdf5=1.10.6=hb1b8bf9_0 33 | - imagecodecs=2020.5.30=py36hfa7d478_2 34 | - imageio=2.9.0=pyhd3eb1b0_0 35 | - 
importlib-metadata=3.10.0=py36h06a4308_0 36 | - intel-openmp=2021.2.0=h06a4308_610 37 | - jpeg=9b=h024ee3a_2 38 | - jxrlib=1.1=h7b6447c_2 39 | - keras-applications=1.0.8=py_1 40 | - keras-preprocessing=1.1.2=pyhd3eb1b0_0 41 | - kiwisolver=1.3.1=py36h2531618_0 42 | - lcms2=2.12=h3be6417_0 43 | - ld_impl_linux-64=2.33.1=h53a641e_7 44 | - libaec=1.0.4=he6710b0_1 45 | - libffi=3.3=he6710b0_2 46 | - libgcc-ng=9.1.0=hdf63c60_0 47 | - libgfortran-ng=7.3.0=hdf63c60_0 48 | - libpng=1.6.37=hbc83047_0 49 | - libprotobuf=3.14.0=h8c45485_0 50 | - libstdcxx-ng=9.1.0=hdf63c60_0 51 | - libtiff=4.1.0=h2733197_1 52 | - libwebp=1.0.1=h8e7db2f_0 53 | - libzopfli=1.0.3=he6710b0_0 54 | - lz4-c=1.9.3=h2531618_0 55 | - markdown=3.3.4=py36h06a4308_0 56 | - matplotlib-base=3.3.4=py36h62a2d02_0 57 | - mkl=2020.2=256 58 | - mkl-service=2.3.0=py36he8ac12f_0 59 | - mkl_fft=1.3.0=py36h54f3939_0 60 | - mkl_random=1.1.1=py36h0573a6f_0 61 | - ncurses=6.2=he6710b0_1 62 | - networkx=2.5=py_0 63 | - numpy=1.19.2=py36h54aff64_0 64 | - numpy-base=1.19.2=py36hfa32c7d_0 65 | - olefile=0.46=py_0 66 | - openjpeg=2.3.0=h05c96fa_1 67 | - openssl=1.1.1k=h27cfd23_0 68 | - pillow=8.2.0=py36he98fc37_0 69 | - pip=21.0.1=py36h06a4308_0 70 | - protobuf=3.14.0=py36h2531618_1 71 | - pyparsing=2.4.7=pyhd3eb1b0_0 72 | - python=3.6.13=hdb3f193_0 73 | - python-dateutil=2.8.1=pyhd3eb1b0_0 74 | - pywavelets=1.1.1=py36h7b6447c_2 75 | - pyyaml=5.4.1=py36h27cfd23_1 76 | - readline=8.1=h27cfd23_0 77 | - scikit-image=0.17.2=py36hdf5156a_0 78 | - scipy=1.5.2=py36h0b6359f_0 79 | - setuptools=52.0.0=py36h06a4308_0 80 | - six=1.15.0=pyhd3eb1b0_0 81 | - snappy=1.1.8=he6710b0_0 82 | - sqlite=3.35.4=hdfb4753_0 83 | - tensorboard=1.12.2=py36he6710b0_0 84 | - tensorflow=1.12.0=gpu_py36he74679b_0 85 | - tensorflow-base=1.12.0=gpu_py36had579c0_0 86 | - tensorflow-gpu=1.12.0=h0d30ee6_0 87 | - termcolor=1.1.0=py36h06a4308_1 88 | - tifffile=2021.3.31=pyhd3eb1b0_1 89 | - tk=8.6.10=hbc83047_0 90 | - toolz=0.11.1=pyhd3eb1b0_0 91 | - tornado=6.1=py36h27cfd23_0 92 | - typing_extensions=3.7.4.3=pyha847dfd_0 93 | - werkzeug=1.0.1=pyhd3eb1b0_0 94 | - wheel=0.36.2=pyhd3eb1b0_0 95 | - xz=5.2.5=h7b6447c_0 96 | - yaml=0.2.5=h7b6447c_0 97 | - zipp=3.4.1=pyhd3eb1b0_0 98 | - zlib=1.2.11=h7b6447c_3 99 | - zstd=1.4.5=h9ceee32_0 100 | prefix: /home/frank.julca-aguilar/anaconda3/envs/dirtypix 101 | -------------------------------------------------------------------------------- /loss_functions/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/princeton-computational-imaging/DirtyPixels/6c82b124c9e32bbf5fa7d6adf8db8103132e4e5e/loss_functions/__init__.py -------------------------------------------------------------------------------- /loss_functions/loss_factory.py: -------------------------------------------------------------------------------- 1 | # Copyright 2016 The TensorFlow Authors. All Rights Reserved. 2 | # 3 | # Licensed under the Apache License, Version 2.0 (the "License"); 4 | # you may not use this file except in compliance with the License. 5 | # You may obtain a copy of the License at 6 | # 7 | # http://www.apache.org/licenses/LICENSE-2.0 8 | # 9 | # Unless required by applicable law or agreed to in writing, software 10 | # distributed under the License is distributed on an "AS IS" BASIS, 11 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 12 | # See the License for the specific language governing permissions and 13 | # limitations under the License. 
14 | # ============================================================================== 15 | """Contains a factory for building various models.""" 16 | 17 | from __future__ import absolute_import 18 | from __future__ import division 19 | from __future__ import print_function 20 | 21 | import tensorflow as tf 22 | 23 | from preprocessing import cifarnet_preprocessing 24 | from preprocessing import inception_preprocessing 25 | from preprocessing import lenet_preprocessing 26 | from preprocessing import vgg_preprocessing 27 | 28 | slim = tf.contrib.slim 29 | 30 | 31 | 32 | def get_loss(name): 33 | """Returns loss_fn(outputs, ground_truths, **kwargs), where "outputs" are the model outputs. 34 | 35 | Args: 36 | name: The name of the loss function. 37 | 38 | Returns: 39 | loss_fn: A function that computes the loss between the inputs and the ground_truths 40 | 41 | Raises: 42 | ValueError: If Preprocessing `name` is not recognized. 43 | """ 44 | loss_fn_map = { 45 | 'mean_squared_error':slim.losses.mean_squared_error, 46 | 'absolute_difference':slim.losses.absolute_difference 47 | } 48 | 49 | if name not in loss_fn_map: 50 | raise ValueError('Loss function name [%s] was not recognized' % name) 51 | 52 | def loss_fn(outputs, ground_truths, **kwargs): 53 | return loss_fn_map[name]( 54 | outputs, ground_truths, **kwargs) 55 | 56 | return loss_fn 57 | -------------------------------------------------------------------------------- /nets/__init__.py: -------------------------------------------------------------------------------- 1 | 2 | -------------------------------------------------------------------------------- /nets/inception.py: -------------------------------------------------------------------------------- 1 | # Copyright 2016 The TensorFlow Authors. All Rights Reserved. 2 | # 3 | # Licensed under the Apache License, Version 2.0 (the "License"); 4 | # you may not use this file except in compliance with the License. 5 | # You may obtain a copy of the License at 6 | # 7 | # http://www.apache.org/licenses/LICENSE-2.0 8 | # 9 | # Unless required by applicable law or agreed to in writing, software 10 | # distributed under the License is distributed on an "AS IS" BASIS, 11 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 12 | # See the License for the specific language governing permissions and 13 | # limitations under the License. 
14 | # ============================================================================== 15 | """Brings all inception models under one namespace.""" 16 | 17 | from __future__ import absolute_import 18 | from __future__ import division 19 | from __future__ import print_function 20 | 21 | # pylint: disable=unused-import 22 | from nets.inception_resnet_v2 import inception_resnet_v2 23 | from nets.inception_resnet_v2 import inception_resnet_v2_arg_scope 24 | from nets.inception_v1 import inception_v1 25 | from nets.inception_v1 import inception_v1_arg_scope 26 | from nets.inception_v1 import inception_v1_base 27 | from nets.inception_v2 import inception_v2 28 | from nets.inception_v2 import inception_v2_arg_scope 29 | from nets.inception_v2 import inception_v2_base 30 | from nets.inception_v3 import inception_v3 31 | from nets.inception_v3 import inception_v3_arg_scope 32 | from nets.inception_v3 import inception_v3_base 33 | from nets.inception_v4 import inception_v4 34 | from nets.inception_v4 import inception_v4_arg_scope 35 | from nets.inception_v4 import inception_v4_base 36 | # pylint: enable=unused-import 37 | -------------------------------------------------------------------------------- /nets/inception_utils.py: -------------------------------------------------------------------------------- 1 | # Copyright 2016 The TensorFlow Authors. All Rights Reserved. 2 | # 3 | # Licensed under the Apache License, Version 2.0 (the "License"); 4 | # you may not use this file except in compliance with the License. 5 | # You may obtain a copy of the License at 6 | # 7 | # http://www.apache.org/licenses/LICENSE-2.0 8 | # 9 | # Unless required by applicable law or agreed to in writing, software 10 | # distributed under the License is distributed on an "AS IS" BASIS, 11 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 12 | # See the License for the specific language governing permissions and 13 | # limitations under the License. 14 | # ============================================================================== 15 | """Contains common code shared by all inception models. 16 | 17 | Usage of arg scope: 18 | with slim.arg_scope(inception_arg_scope()): 19 | logits, end_points = inception.inception_v3(images, num_classes, 20 | is_training=is_training) 21 | 22 | """ 23 | from __future__ import absolute_import 24 | from __future__ import division 25 | from __future__ import print_function 26 | 27 | import tensorflow as tf 28 | 29 | slim = tf.contrib.slim 30 | 31 | 32 | def inception_arg_scope(weight_decay=0.00004, 33 | use_batch_norm=True, 34 | batch_norm_decay=0.9997, 35 | batch_norm_epsilon=0.001): 36 | """Defines the default arg scope for inception models. 37 | 38 | Args: 39 | weight_decay: The weight decay to use for regularizing the model. 40 | use_batch_norm: "If `True`, batch_norm is applied after each convolution. 41 | batch_norm_decay: Decay for batch norm moving average. 42 | batch_norm_epsilon: Small float added to variance to avoid dividing by zero 43 | in batch norm. 44 | 45 | Returns: 46 | An `arg_scope` to use for the inception models. 47 | """ 48 | print("weight decay = ", weight_decay) 49 | batch_norm_params = { 50 | # Decay for the moving averages. 51 | 'decay': batch_norm_decay, 52 | # epsilon to prevent 0s in variance. 53 | 'epsilon': batch_norm_epsilon, 54 | # collection containing update_ops. 
55 | 'updates_collections': tf.GraphKeys.UPDATE_OPS, 56 | } 57 | if use_batch_norm: 58 | normalizer_fn = slim.batch_norm 59 | normalizer_params = batch_norm_params 60 | else: 61 | normalizer_fn = None 62 | normalizer_params = {} 63 | # Set weight_decay for weights in Conv and FC layers. 64 | with slim.arg_scope([slim.conv2d, slim.fully_connected], 65 | weights_regularizer=slim.l2_regularizer(weight_decay)): 66 | with slim.arg_scope( 67 | [slim.conv2d], 68 | weights_initializer=slim.variance_scaling_initializer(), 69 | activation_fn=tf.nn.relu, 70 | normalizer_fn=normalizer_fn, 71 | normalizer_params=normalizer_params) as sc: 72 | return sc 73 | -------------------------------------------------------------------------------- /nets/isp.py: -------------------------------------------------------------------------------- 1 | # Copyright 2016 The TensorFlow Authors. All Rights Reserved. 2 | # 3 | # Licensed under the Apache License, Version 2.0 (the "License"); 4 | # you may not use this file except in compliance with the License. 5 | # You may obtain a copy of the License at 6 | # 7 | # http://www.apache.org/licenses/LICENSE-2.0 8 | # 9 | # Unless required by applicable law or agreed to in writing, software 10 | # distributed under the License is distributed on an "AS IS" BASIS, 11 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 12 | # See the License for the specific language governing permissions and 13 | # limitations under the License. 14 | # ============================================================================== 15 | """Contains the definition of the Inception V4 architecture. 16 | 17 | As described in http://arxiv.org/abs/1602.07261. 18 | 19 | Inception-v4, Inception-ResNet and the Impact of Residual Connections 20 | on Learning 21 | Christian Szegedy, Sergey Ioffe, Vincent Vanhoucke, Alex Alemi 22 | """ 23 | from __future__ import absolute_import 24 | from __future__ import division 25 | from __future__ import print_function 26 | 27 | import tensorflow as tf 28 | 29 | from nets import inception_utils 30 | 31 | slim = tf.contrib.slim 32 | 33 | def isp_arg_scope(weight_decay=0.00004, 34 | use_batch_norm=True, 35 | batch_norm_decay=0.95, 36 | batch_norm_epsilon=0.0001): 37 | """Defines the default arg scope for inception models. 38 | 39 | Args: 40 | weight_decay: The weight decay to use for regularizing the model. 41 | use_batch_norm: "If `True`, batch_norm is applied after each convolution. 42 | batch_norm_decay: Decay for batch norm moving average. 43 | batch_norm_epsilon: Small float added to variance to avoid dividing by zero 44 | in batch norm. 45 | 46 | Returns: 47 | An `arg_scope` to use for the inception models. 48 | """ 49 | print("weight decay = ", weight_decay) 50 | print("batch norm decay = ", batch_norm_decay) 51 | batch_norm_params = { 52 | # Decay for the moving averages. 53 | 'decay': batch_norm_decay, 54 | # epsilon to prevent 0s in variance. 55 | 'epsilon': batch_norm_epsilon, 56 | # collection containing update_ops. 57 | 'updates_collections': tf.GraphKeys.UPDATE_OPS, 58 | 'center': True, 59 | 'scale': False, 60 | } 61 | if use_batch_norm: 62 | normalizer_fn = slim.batch_norm 63 | normalizer_params = batch_norm_params 64 | else: 65 | normalizer_fn = None 66 | normalizer_params = {} 67 | # Set weight_decay for weights in Conv and FC layers. 
68 | with slim.arg_scope([slim.conv2d, slim.fully_connected], 69 | weights_regularizer=slim.l2_regularizer(weight_decay)): 70 | with slim.arg_scope( 71 | [slim.conv2d], 72 | weights_initializer=slim.variance_scaling_initializer(), 73 | activation_fn=tf.nn.relu, 74 | normalizer_fn=normalizer_fn, 75 | normalizer_params=normalizer_params) as sc: 76 | return sc 77 | 78 | def anscombe(data, sigma, alpha, scale=255.0, is_real_data=False): 79 | """Transform N(mu,sigma^2) + \alpha Pois(y) into N(0,scale^2) noise.""" 80 | if is_real_data: 81 | z = data/alpha[:,None,None,:] 82 | sigma_hat = sigma/alpha 83 | sqrt_term = z + 3./8. + tf.square(sigma_hat)[:,None,None,:] 84 | else: 85 | z = data/alpha[:,None,None,None] 86 | sigma_hat = sigma/alpha 87 | sqrt_term = z + 3./8. + tf.square(sigma_hat)[:,None,None,None] 88 | 89 | sqrt_term = tf.maximum(sqrt_term, 0.0) 90 | 91 | return 2*tf.sqrt(sqrt_term) 92 | 93 | 94 | def inv_anscombe(data, sigma, alpha, scale=1.0, unbiased=False, is_real_data=False): 95 | """Invert anscombe transform.""" 96 | sigma_hat = sigma/alpha 97 | if is_real_data: 98 | z = .25* tf.square(data) - 1./8 - tf.square(sigma_hat)[:,None,None,:] 99 | if unbiased: 100 | z = z + .25*tf.sqrt(3./2)*data**-1 - 11./8.*data**-2 + 5./8.*tf.sqrt(3./2)*data**-3 101 | result = z*alpha[:,None,None,:] 102 | else: 103 | z = .25* tf.square(data) - 1./8 - tf.square(sigma_hat)[:,None,None,None] 104 | #data = tf.Print(data, ["data", tf.reduce_max(data), tf.reduce_min(data)]) 105 | 106 | #z = tf.maximum(z, 0) 107 | if unbiased: 108 | z = z + .25*tf.sqrt(3./2)*data**-1 - 11./8.*data**-2 + 5./8.*tf.sqrt(3./2)*data**-3 109 | result = z*alpha[:,None,None,None] 110 | return result 111 | #return tf.clip_by_value(result, 0.0, scale) 112 | 113 | def prox_grad_isp(inputs, 114 | alpha, 115 | sigma, 116 | bayer_mask, 117 | num_iters=4, 118 | num_channels=3, 119 | num_layers=5, 120 | kernel=None, 121 | num_classes=1001, 122 | is_training=True, 123 | scale=1.0, 124 | use_anscombe=True, 125 | noise_channel=True, 126 | use_chen_unet=False, 127 | is_real_data=True): 128 | 129 | end_points = {} 130 | end_points['inputs'] = inputs 131 | if use_anscombe and alpha is not None: 132 | print(("USING THE ANCOMB TRANSFORM with scale %f" % scale) + "!"*10) 133 | true_img = anscombe(inputs, alpha=alpha, sigma=sigma, scale=scale, is_real_data=is_real_data) 134 | min_offset = tf.reduce_min(true_img, [1,2,3], keep_dims=True) 135 | max_scale = tf.reduce_max(true_img, [1,2,3], keep_dims=True) 136 | noise_scale = scale/(max_scale - min_offset) 137 | true_img = (true_img - min_offset)*noise_scale 138 | noise_ch = noise_scale 139 | end_points['post_anscombe'] = true_img 140 | else: 141 | true_img = inputs 142 | noise_ch = sigma[:,None,None,None] 143 | 144 | if not noise_channel: 145 | noise_ch = None 146 | else: 147 | print(("USING NOISE CHANNEL")) 148 | dims = [d.value for d in inputs.get_shape()] 149 | noise_ch = tf.tile(noise_ch, [1, dims[1], dims[2], 1]) 150 | 151 | if use_chen_unet: 152 | print('USING UNET AS ISP (NON-PROX GRAD)') 153 | from nets import unet 154 | ans_x_out = unet.unet(true_img) 155 | end_points = {} 156 | 157 | else: 158 | ans_x_out, end_points = prox_grad(true_img, bayer_mask, end_points, num_layers=num_layers, 159 | num_iters=num_iters, noise_channel=noise_ch, is_training=is_training) 160 | # ans_x_out, end_points = prox_grad(true_img, bayer_mask, end_points, num_layers=num_layers, 161 | # num_iters=num_iters, noise_channel=noise_ch, is_training=is_training) 162 | 163 | if use_anscombe and alpha is not None: 164 | 
end_points['pre_inv_anscombe'] = ans_x_out 165 | ans_x_out = ans_x_out/noise_scale + min_offset 166 | ans_x_out = inv_anscombe(ans_x_out, alpha=alpha, sigma=sigma, scale=scale, is_real_data=is_real_data) 167 | end_points['outputs'] = ans_x_out 168 | return ans_x_out, end_points 169 | 170 | 171 | def prox_grad(inputs, bayer_mask, end_points, num_layers=5, num_iters=4, lambda_init=1.0, 172 | is_training=True, scope='gauss_den', noise_channel=None): 173 | flat_inputs = tf.reduce_sum(inputs, 3, keep_dims=True) 174 | with tf.variable_scope(scope, 'gauss_den', [inputs]) as sc: 175 | xk = inputs 176 | lam = slim.variable(name='lambda', shape=[], initializer=tf.constant_initializer(lambda_init)) 177 | end_points['lambda'] = lam 178 | beta_init = 1.0 179 | for t in range(num_iters): 180 | with tf.variable_scope('iter_%i'% t): 181 | with slim.arg_scope([slim.batch_norm, slim.dropout], 182 | is_training=is_training): 183 | # Collect outputs for conv2d, fully_connected and max_pool2d. 184 | beta_init *= 2.0 # Continuation scheme as proposed in http://www.caam.rice.edu/~yzhang/reports/tr0710_rev.pdf, algorithm 2 185 | beta = slim.variable(name='beta', shape=[], initializer=tf.constant_initializer(beta_init)) 186 | end_points['beta%s'%t] = beta 187 | with tf.variable_scope('prior_grad') as prior_scope: 188 | #curr_z = cnn_proximal(xk, num_layers, 3, noise_channel, width=12, rate=1) 189 | if noise_channel is None: 190 | concat_xk = xk 191 | else: 192 | concat_xk = tf.concat([xk, noise_channel], 3) 193 | curr_z = unet_res(concat_xk, 0, 'unet') 194 | #end_points['prior_grad_%i' % t] = curr_z 195 | tmp = xk - curr_z 196 | xk = (lam*bayer_mask*inputs + beta*tmp)/(lam*bayer_mask + beta) 197 | #end_points['iter_%i' % t] = xk 198 | 199 | return xk, end_points 200 | 201 | def unet_res(inputs, depth, scope, max_depth=2): 202 | # U-NET operating at a given resolution. 203 | shape = [d.value for d in inputs.get_shape()] 204 | print(depth, shape) 205 | ch = max(shape[3]*2, 8) 206 | with tf.variable_scope('depth_%s' % depth, values=[inputs]) as scope: 207 | if depth == 0: 208 | outputs = slim.conv2d(inputs, ch, [3, 3], rate=2, scope='conv_in', normalizer_fn=None) 209 | else: 210 | outputs = slim.conv2d(inputs, ch, [3, 3], scope='conv_in') 211 | outputs = slim.conv2d(outputs, ch, [3, 3], scope='conv_1') 212 | downsamp = slim.avg_pool2d(outputs, [2, 2]) 213 | if depth < max_depth: 214 | lower = unet_res(downsamp, depth+1, scope, max_depth) 215 | outputs = tf.concat([outputs, lower], 3) 216 | with tf.variable_scope('depth_%s' % depth, values=[outputs]) as scope: 217 | outputs = slim.conv2d(outputs, ch, [3, 3], scope='conv_2') 218 | if depth > 0: 219 | outputs = slim.conv2d(outputs, ch, [3, 3], scope='out_conv') 220 | outputs = slim.conv2d_transpose(outputs, ch//2, [2,2], stride=2, scope='up_conv', 221 | activation_fn=None, normalizer_fn=None) 222 | else: 223 | outputs = slim.conv2d(outputs, 3, [3, 3], scope='out_conv', 224 | activation_fn=None, normalizer_fn=None) 225 | return outputs 226 | -------------------------------------------------------------------------------- /nets/nets_factory.py: -------------------------------------------------------------------------------- 1 | # Copyright 2016 The TensorFlow Authors. All Rights Reserved. 2 | # 3 | # Licensed under the Apache License, Version 2.0 (the "License"); 4 | # you may not use this file except in compliance with the License. 
5 | # You may obtain a copy of the License at 6 | # 7 | # http://www.apache.org/licenses/LICENSE-2.0 8 | # 9 | # Unless required by applicable law or agreed to in writing, software 10 | # distributed under the License is distributed on an "AS IS" BASIS, 11 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 12 | # See the License for the specific language governing permissions and 13 | # limitations under the License. 14 | # ============================================================================== 15 | """Contains a factory for building various models.""" 16 | 17 | from __future__ import absolute_import 18 | from __future__ import division 19 | from __future__ import print_function 20 | import functools 21 | 22 | import tensorflow as tf 23 | 24 | from nets import isp 25 | from nets import mobilenet_v1 26 | from nets import mobilenet_isp 27 | 28 | slim = tf.contrib.slim 29 | 30 | networks_map = {'isp': isp.prox_grad_isp, 31 | 'mobilenet_isp': mobilenet_isp.mobilenet_v1, 32 | 'mobilenet_v1': mobilenet_v1.mobilenet_v1, 33 | 'deeper_mobilenet_v1': mobilenet_v1.deeper_mobile_net_v1, 34 | } 35 | 36 | arg_scopes_map = {'isp': isp.isp_arg_scope, 37 | 'mobilenet_isp': mobilenet_isp.mobilenet_v1_arg_scope, 38 | 'mobilenet_v1': mobilenet_v1.mobilenet_v1_arg_scope, 39 | 'deeper_mobilenet_v1': mobilenet_v1.mobilenet_v1_arg_scope, 40 | } 41 | 42 | 43 | def get_network_fn(name, num_classes, weight_decay, batch_norm_decay, is_training): 44 | """Returns a network_fn such as `logits, end_points = network_fn(images)`. 45 | 46 | Args: 47 | name: The name of the network. 48 | num_classes: The number of classes to use for classification. 49 | weight_decay: The l2 coefficient for the model weights. 50 | is_training: `True` if the model is being used for training and `False` 51 | otherwise. 52 | 53 | Returns: 54 | network_fn: A function that applies the model to a batch of images. It has 55 | the following signature: 56 | logits, end_points = network_fn(images) 57 | Raises: 58 | ValueError: If network `name` is not recognized. 59 | """ 60 | if name not in networks_map: 61 | raise ValueError('Name of network unknown %s' % name) 62 | 63 | func = networks_map[name] 64 | 65 | @functools.wraps(func) 66 | def network_fn(images, **kwargs): 67 | arg_scope = arg_scopes_map[name](weight_decay=weight_decay) 68 | 69 | with slim.arg_scope(arg_scope): 70 | return func(images, is_training=is_training, **kwargs) 71 | 72 | if hasattr(func, 'default_image_size'): 73 | network_fn.default_image_size = func.default_image_size 74 | 75 | return network_fn 76 | -------------------------------------------------------------------------------- /nets/nets_factory_test.py: -------------------------------------------------------------------------------- 1 | # Copyright 2016 Google Inc. All Rights Reserved. 2 | # 3 | # Licensed under the Apache License, Version 2.0 (the "License"); 4 | # you may not use this file except in compliance with the License. 5 | # You may obtain a copy of the License at 6 | # 7 | # http://www.apache.org/licenses/LICENSE-2.0 8 | # 9 | # Unless required by applicable law or agreed to in writing, software 10 | # distributed under the License is distributed on an "AS IS" BASIS, 11 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 12 | # See the License for the specific language governing permissions and 13 | # limitations under the License. 
14 | # ============================================================================== 15 | 16 | """Tests for slim.inception.""" 17 | 18 | from __future__ import absolute_import 19 | from __future__ import division 20 | from __future__ import print_function 21 | 22 | import tensorflow as tf 23 | 24 | from nets import nets_factory 25 | 26 | slim = tf.contrib.slim 27 | 28 | 29 | class NetworksTest(tf.test.TestCase): 30 | 31 | def testGetNetworkFn(self): 32 | batch_size = 5 33 | num_classes = 1000 34 | for net in nets_factory.networks_map: 35 | with self.test_session(): 36 | net_fn = nets_factory.get_network_fn(net, num_classes) 37 | # Most networks use 224 as their default_image_size 38 | image_size = getattr(net_fn, 'default_image_size', 224) 39 | inputs = tf.random_uniform((batch_size, image_size, image_size, 3)) 40 | logits, end_points = net_fn(inputs) 41 | self.assertTrue(isinstance(logits, tf.Tensor)) 42 | self.assertTrue(isinstance(end_points, dict)) 43 | self.assertEqual(logits.get_shape().as_list()[0], batch_size) 44 | self.assertEqual(logits.get_shape().as_list()[-1], num_classes) 45 | 46 | def testGetNetworkFnArgScope(self): 47 | batch_size = 5 48 | num_classes = 10 49 | net = 'cifarnet' 50 | with self.test_session(use_gpu=True): 51 | net_fn = nets_factory.get_network_fn(net, num_classes) 52 | image_size = getattr(net_fn, 'default_image_size', 224) 53 | with slim.arg_scope([slim.model_variable, slim.variable], 54 | device='/CPU:0'): 55 | inputs = tf.random_uniform((batch_size, image_size, image_size, 3)) 56 | net_fn(inputs) 57 | weights = tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES, 'CifarNet/conv1')[0] 58 | self.assertDeviceEqual('/CPU:0', weights.device) 59 | 60 | if __name__ == '__main__': 61 | tf.test.main() 62 | -------------------------------------------------------------------------------- /nets/unet.py: -------------------------------------------------------------------------------- 1 | # Tensorflow mandates these. 
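# --- Note (added, illustrative): this module defines the encoder/decoder that
# nets/isp.py falls back to when use_chen_unet is enabled; unet() maps a noisy
# image batch to a 3-channel restored image at the input resolution. A minimal,
# hypothetical call sketch (the placeholder shape is an example only):
#
#   images = tf.placeholder(tf.float32, [None, 224, 224, 3])
#   restored = unet(images)  # -> [None, 224, 224, 3]
# ---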
2 | from __future__ import absolute_import 3 | from __future__ import division 4 | from __future__ import print_function 5 | 6 | from collections import namedtuple 7 | import functools 8 | 9 | import tensorflow as tf 10 | 11 | slim = tf.contrib.slim 12 | 13 | def lrelu(x): 14 | return tf.maximum(x * 0.2, x) 15 | 16 | def upsample_and_concat(x1, x2, output_channels, in_channels): 17 | pool_size = 2 18 | deconv_filter = tf.Variable(tf.truncated_normal([pool_size, pool_size, output_channels, in_channels], stddev=0.02)) 19 | deconv = tf.nn.conv2d_transpose(x1, deconv_filter, tf.shape(x2), strides=[1, pool_size, pool_size, 1]) 20 | 21 | deconv_output = tf.concat([deconv, x2], 3) 22 | deconv_output.set_shape([None, None, None, output_channels * 2]) 23 | 24 | return deconv_output 25 | 26 | 27 | def unet(input, scope=None): 28 | with tf.variable_scope(scope, 'gauss_den_chen_unet', [input]) as sc: 29 | conv1 = slim.conv2d(input, 8, [3, 3], rate=1, activation_fn=lrelu, scope='g_conv1_1') 30 | conv1 = slim.conv2d(conv1, 8, [3, 3], rate=1, activation_fn=lrelu, scope='g_conv1_2') 31 | pool1 = slim.max_pool2d(conv1, [2, 2], padding='SAME') 32 | 33 | conv2 = slim.conv2d(pool1, 16, [3, 3], rate=1, activation_fn=lrelu, scope='g_conv2_1') 34 | conv2 = slim.conv2d(conv2, 16, [3, 3], rate=1, activation_fn=lrelu, scope='g_conv2_2') 35 | pool2 = slim.max_pool2d(conv2, [2, 2], padding='SAME') 36 | 37 | conv3 = slim.conv2d(pool2, 32, [3, 3], rate=1, activation_fn=lrelu, scope='g_conv3_1') 38 | conv3 = slim.conv2d(conv3, 32, [3, 3], rate=1, activation_fn=lrelu, scope='g_conv3_2') 39 | pool3 = slim.max_pool2d(conv3, [2, 2], padding='SAME') 40 | 41 | conv4 = slim.conv2d(pool3, 64, [3, 3], rate=1, activation_fn=lrelu, scope='g_conv4_1') 42 | conv4 = slim.conv2d(conv4, 64, [3, 3], rate=1, activation_fn=lrelu, scope='g_conv4_2') 43 | pool4 = slim.max_pool2d(conv4, [2, 2], padding='SAME') 44 | 45 | conv5 = slim.conv2d(pool4, 128, [3, 3], rate=1, activation_fn=lrelu, scope='g_conv5_1') 46 | conv5 = slim.conv2d(conv5, 128, [3, 3], rate=1, activation_fn=lrelu, scope='g_conv5_2') 47 | 48 | up6 = upsample_and_concat(conv5, conv4, 64, 128) 49 | conv6 = slim.conv2d(up6, 64, [3, 3], rate=1, activation_fn=lrelu, scope='g_conv6_1') 50 | conv6 = slim.conv2d(conv6, 64, [3, 3], rate=1, activation_fn=lrelu, scope='g_conv6_2') 51 | 52 | up7 = upsample_and_concat(conv6, conv3, 32, 64) 53 | conv7 = slim.conv2d(up7, 32, [3, 3], rate=1, activation_fn=lrelu, scope='g_conv7_1') 54 | conv7 = slim.conv2d(conv7, 32, [3, 3], rate=1, activation_fn=lrelu, scope='g_conv7_2') 55 | 56 | up8 = upsample_and_concat(conv7, conv2, 16, 32) 57 | conv8 = slim.conv2d(up8, 16, [3, 3], rate=1, activation_fn=lrelu, scope='g_conv8_1') 58 | conv8 = slim.conv2d(conv8, 16, [3, 3], rate=1, activation_fn=lrelu, scope='g_conv8_2') 59 | 60 | up9 = upsample_and_concat(conv8, conv1, 8, 16) 61 | conv9 = slim.conv2d(up9, 8, [3, 3], rate=1, activation_fn=lrelu, scope='g_conv9_1') 62 | conv9 = slim.conv2d(conv9, 8, [3, 3], rate=1, activation_fn=lrelu, scope='g_conv9_2') 63 | 64 | # conv10 = slim.conv2d(conv9, 12, [1, 1], rate=1, activation_fn=None, scope='g_conv10') 65 | # out = tf.depth_to_space(conv10, 2) 66 | out = slim.conv2d(conv9, 3, [1, 1], rate=1, activation_fn=None, scope='g_conv10') 67 | return out 68 | 69 | # def unet(input, scope=None): 70 | # with tf.variable_scope(scope, 'gauss_den_chen_unet', [input]) as sc: 71 | # conv1 = slim.conv2d(input, 8, [3, 3], rate=1, activation_fn=lrelu, scope='g_conv1_1') 72 | # conv1 = slim.conv2d(conv1, 8, [3, 3], rate=1, 
activation_fn=lrelu, scope='g_conv1_2') 73 | # pool1 = slim.max_pool2d(conv1, [2, 2], padding='SAME') 74 | 75 | # conv2 = slim.conv2d(pool1, 16, [3, 3], rate=1, activation_fn=lrelu, scope='g_conv2_1') 76 | # conv2 = slim.conv2d(conv2, 16, [3, 3], rate=1, activation_fn=lrelu, scope='g_conv2_2') 77 | # pool2 = slim.max_pool2d(conv2, [2, 2], padding='SAME') 78 | 79 | # conv3 = slim.conv2d(pool2, 32, [3, 3], rate=1, activation_fn=lrelu, scope='g_conv3_1') 80 | # conv3 = slim.conv2d(conv3, 32, [3, 3], rate=1, activation_fn=lrelu, scope='g_conv3_2') 81 | # pool3 = slim.max_pool2d(conv3, [2, 2], padding='SAME') 82 | 83 | # conv4 = slim.conv2d(pool3, 64, [3, 3], rate=1, activation_fn=lrelu, scope='g_conv4_1') 84 | # conv4 = slim.conv2d(conv4, 64, [3, 3], rate=1, activation_fn=lrelu, scope='g_conv4_2') 85 | # pool4 = slim.max_pool2d(conv4, [2, 2], padding='SAME') 86 | 87 | # conv5 = slim.conv2d(pool4, 64, [3, 3], rate=1, activation_fn=lrelu, scope='g_conv5_1') 88 | # conv5 = slim.conv2d(conv5, 64, [3, 3], rate=1, activation_fn=lrelu, scope='g_conv5_2') 89 | 90 | # up6 = upsample_and_concat(conv5, conv4, 32, 64) 91 | # conv6 = slim.conv2d(up6, 64, [3, 3], rate=1, activation_fn=lrelu, scope='g_conv6_1') 92 | # conv6 = slim.conv2d(conv6, 64, [3, 3], rate=1, activation_fn=lrelu, scope='g_conv6_2') 93 | 94 | # up7 = upsample_and_concat(conv6, conv3, 32, 64) 95 | # conv7 = slim.conv2d(up7, 32, [3, 3], rate=1, activation_fn=lrelu, scope='g_conv7_1') 96 | # conv7 = slim.conv2d(conv7, 32, [3, 3], rate=1, activation_fn=lrelu, scope='g_conv7_2') 97 | 98 | # up8 = upsample_and_concat(conv7, conv2, 16, 32) 99 | # conv8 = slim.conv2d(up8, 16, [3, 3], rate=1, activation_fn=lrelu, scope='g_conv8_1') 100 | # conv8 = slim.conv2d(conv8, 16, [3, 3], rate=1, activation_fn=lrelu, scope='g_conv8_2') 101 | 102 | # up9 = upsample_and_concat(conv8, conv1, 8, 16) 103 | # conv9 = slim.conv2d(up9, 8, [3, 3], rate=1, activation_fn=lrelu, scope='g_conv9_1') 104 | # conv9 = slim.conv2d(conv9, 8, [3, 3], rate=1, activation_fn=lrelu, scope='g_conv9_2') 105 | 106 | # # conv10 = slim.conv2d(conv9, 12, [1, 1], rate=1, activation_fn=None, scope='g_conv10') 107 | # # out = tf.depth_to_space(conv10, 2) 108 | # out = slim.conv2d(conv9, 3, [1, 1], rate=1, activation_fn=None, scope='g_conv10') 109 | # return out -------------------------------------------------------------------------------- /preprocessing/__init__.py: -------------------------------------------------------------------------------- 1 | 2 | -------------------------------------------------------------------------------- /preprocessing/inception_preprocessing.py: -------------------------------------------------------------------------------- 1 | # Copyright 2016 The TensorFlow Authors. All Rights Reserved. 2 | # 3 | # Licensed under the Apache License, Version 2.0 (the "License"); 4 | # you may not use this file except in compliance with the License. 5 | # You may obtain a copy of the License at 6 | # 7 | # http://www.apache.org/licenses/LICENSE-2.0 8 | # 9 | # Unless required by applicable law or agreed to in writing, software 10 | # distributed under the License is distributed on an "AS IS" BASIS, 11 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 12 | # See the License for the specific language governing permissions and 13 | # limitations under the License. 
14 | # ============================================================================== 15 | """Provides utilities to preprocess images for the Inception networks.""" 16 | 17 | from __future__ import absolute_import 18 | from __future__ import division 19 | from __future__ import print_function 20 | 21 | from preprocessing import sensor_model 22 | 23 | import tensorflow as tf 24 | import numpy as np 25 | 26 | from tensorflow.python.ops import control_flow_ops 27 | 28 | 29 | def apply_with_random_selector(x, func, num_cases): 30 | """Computes func(x, sel), with sel sampled from [0...num_cases-1]. 31 | 32 | Args: 33 | x: input Tensor. 34 | func: Python function to apply. 35 | num_cases: Python int32, number of cases to sample sel from. 36 | 37 | Returns: 38 | The result of func(x, sel), where func receives the value of the 39 | selector as a python integer, but sel is sampled dynamically. 40 | """ 41 | sel = tf.random_uniform([], maxval=num_cases, dtype=tf.int32) 42 | # Pass the real x only to one of the func calls. 43 | return control_flow_ops.merge([ 44 | func(control_flow_ops.switch(x, tf.equal(sel, case))[1], case) 45 | for case in range(num_cases)])[0] 46 | 47 | 48 | def distort_color(image, color_ordering=0, fast_mode=True, scope=None): 49 | """Distort the color of a Tensor image. 50 | 51 | Each color distortion is non-commutative and thus ordering of the color ops 52 | matters. Ideally we would randomly permute the ordering of the color ops. 53 | Rather then adding that level of complication, we select a distinct ordering 54 | of color ops for each preprocessing thread. 55 | 56 | Args: 57 | image: 3-D Tensor containing single image in [0, 1]. 58 | color_ordering: Python int, a type of distortion (valid values: 0-3). 59 | fast_mode: Avoids slower ops (random_hue and random_contrast) 60 | scope: Optional scope for name_scope. 61 | Returns: 62 | 3-D Tensor color-distorted image on range [0, 1] 63 | Raises: 64 | ValueError: if color_ordering not in [0, 3] 65 | """ 66 | with tf.name_scope(scope, 'distort_color', [image]): 67 | if fast_mode: 68 | if color_ordering == 0: 69 | image = tf.image.random_brightness(image, max_delta=32. / 255.) 70 | image = tf.image.random_saturation(image, lower=0.5, upper=1.5) 71 | else: 72 | image = tf.image.random_saturation(image, lower=0.5, upper=1.5) 73 | image = tf.image.random_brightness(image, max_delta=32. / 255.) 74 | else: 75 | if color_ordering == 0: 76 | image = tf.image.random_brightness(image, max_delta=32. / 255.) 77 | image = tf.image.random_saturation(image, lower=0.5, upper=1.5) 78 | image = tf.image.random_hue(image, max_delta=0.2) 79 | image = tf.image.random_contrast(image, lower=0.5, upper=1.5) 80 | elif color_ordering == 1: 81 | image = tf.image.random_saturation(image, lower=0.5, upper=1.5) 82 | image = tf.image.random_brightness(image, max_delta=32. / 255.) 83 | image = tf.image.random_contrast(image, lower=0.5, upper=1.5) 84 | image = tf.image.random_hue(image, max_delta=0.2) 85 | elif color_ordering == 2: 86 | image = tf.image.random_contrast(image, lower=0.5, upper=1.5) 87 | image = tf.image.random_hue(image, max_delta=0.2) 88 | image = tf.image.random_brightness(image, max_delta=32. / 255.) 
89 | image = tf.image.random_saturation(image, lower=0.5, upper=1.5) 90 | elif color_ordering == 3: 91 | image = tf.image.random_hue(image, max_delta=0.2) 92 | image = tf.image.random_saturation(image, lower=0.5, upper=1.5) 93 | image = tf.image.random_contrast(image, lower=0.5, upper=1.5) 94 | image = tf.image.random_brightness(image, max_delta=32. / 255.) 95 | else: 96 | raise ValueError('color_ordering must be in [0, 3]') 97 | 98 | # The random_* ops do not necessarily clamp. 99 | return tf.clip_by_value(image, 0.0, 1.0) 100 | 101 | 102 | def distorted_bounding_box_crop(image, 103 | bbox, 104 | min_object_covered=0.1, 105 | aspect_ratio_range=(0.75, 1.33), 106 | area_range=(0.05, 1.0), 107 | max_attempts=100, 108 | scope=None): 109 | """Generates cropped_image using a one of the bboxes randomly distorted. 110 | 111 | See `tf.image.sample_distorted_bounding_box` for more documentation. 112 | 113 | Args: 114 | image: 3-D Tensor of image (it will be converted to floats in [0, 1]). 115 | bbox: 3-D float Tensor of bounding boxes arranged [1, num_boxes, coords] 116 | where each coordinate is [0, 1) and the coordinates are arranged 117 | as [ymin, xmin, ymax, xmax]. If num_boxes is 0 then it would use the whole 118 | image. 119 | min_object_covered: An optional `float`. Defaults to `0.1`. The cropped 120 | area of the image must contain at least this fraction of any bounding box 121 | supplied. 122 | aspect_ratio_range: An optional list of `floats`. The cropped area of the 123 | image must have an aspect ratio = width / height within this range. 124 | area_range: An optional list of `floats`. The cropped area of the image 125 | must contain a fraction of the supplied image within in this range. 126 | max_attempts: An optional `int`. Number of attempts at generating a cropped 127 | region of the image of the specified constraints. After `max_attempts` 128 | failures, return the entire image. 129 | scope: Optional scope for name_scope. 130 | Returns: 131 | A tuple, a 3-D Tensor cropped_image and the distorted bbox 132 | """ 133 | with tf.name_scope(scope, 'distorted_bounding_box_crop', [image, bbox]): 134 | # Each bounding box has shape [1, num_boxes, box coords] and 135 | # the coordinates are ordered [ymin, xmin, ymax, xmax]. 136 | 137 | # A large fraction of image datasets contain a human-annotated bounding 138 | # box delineating the region of the image containing the object of interest. 139 | # We choose to create a new bounding box for the object which is a randomly 140 | # distorted version of the human-annotated bounding box that obeys an 141 | # allowed range of aspect ratios, sizes and overlap with the human-annotated 142 | # bounding box. If no box is supplied, then we assume the bounding box is 143 | # the entire image. 144 | sample_distorted_bounding_box = tf.image.sample_distorted_bounding_box( 145 | tf.shape(image), 146 | bounding_boxes=bbox, 147 | min_object_covered=min_object_covered, 148 | aspect_ratio_range=aspect_ratio_range, 149 | area_range=area_range, 150 | max_attempts=max_attempts, 151 | use_image_if_no_bounding_boxes=True) 152 | bbox_begin, bbox_size, distort_bbox = sample_distorted_bounding_box 153 | 154 | # Crop the image to the specified bounding box. 155 | cropped_image = tf.slice(image, bbox_begin, bbox_size) 156 | return cropped_image, distort_bbox 157 | 158 | 159 | def preprocess_for_train(image, height, width, bbox, 160 | fast_mode=True, 161 | light_level=None, 162 | scope=None): 163 | """Distort one image for training a netwo. 
164 | 165 | Distorting images provides a useful technique for augmenting the data 166 | set during training in order to make the network invariant to aspects 167 | of the image that do not effect the label. 168 | 169 | Additionally it would create image_summaries to display the different 170 | transformations applied to the image. 171 | 172 | Args: 173 | image: 3-D Tensor of image. If dtype is tf.float32 then the range should be 174 | [0, 1], otherwise it would converted to tf.float32 assuming that the range 175 | is [0, MAX], where MAX is largest positive representable number for 176 | int(8/16/32) data type (see `tf.image.convert_image_dtype` for details). 177 | height: integer 178 | width: integer 179 | bbox: 3-D float Tensor of bounding boxes arranged [1, num_boxes, coords] 180 | where each coordinate is [0, 1) and the coordinates are arranged 181 | as [ymin, xmin, ymax, xmax]. 182 | fast_mode: Optional boolean, if True avoids slower transformations (i.e. 183 | bi-cubic resizing, random_hue or random_contrast). 184 | scope: Optional scope for name_scope. 185 | Returns: 186 | 3-D float Tensor of distorted image used for training with range [-1, 1]. 187 | """ 188 | with tf.name_scope(scope, 'distort_image', [image, height, width, bbox]): 189 | if bbox is None: 190 | bbox = tf.constant([0.0, 0.0, 1.0, 1.0], 191 | dtype=tf.float32, 192 | shape=[1, 1, 4]) 193 | if image.dtype != tf.float32: 194 | image = tf.image.convert_image_dtype(image, dtype=tf.float32) 195 | 196 | # Each bounding box has shape [1, num_boxes, box coords] and 197 | # the coordinates are ordered [ymin, xmin, ymax, xmax]. 198 | image_with_box = tf.image.draw_bounding_boxes(tf.expand_dims(image, 0), 199 | bbox) 200 | tf.summary.image('image_with_bounding_boxes', image_with_box) 201 | 202 | distorted_image, distorted_bbox = distorted_bounding_box_crop(image, bbox) 203 | 204 | # Restore the shape since the dynamic slice based upon the bbox_size loses 205 | # the third dimension. 206 | distorted_image.set_shape([None, None, 3]) 207 | image_with_distorted_box = tf.image.draw_bounding_boxes( 208 | tf.expand_dims(image, 0), distorted_bbox) 209 | tf.summary.image('images_with_distorted_bounding_box', 210 | image_with_distorted_box) 211 | 212 | # Use nearest neighbor subsampling. 213 | print("USING NEAREST NEIGHBOR SUBSAMPLING") 214 | distorted_image = tf.image.resize_images(distorted_image, [height, width], 215 | method=tf.image.ResizeMethod.NEAREST_NEIGHBOR) 216 | 217 | tf.summary.image('cropped_resized_image', 218 | tf.expand_dims(distorted_image, 0)) 219 | 220 | # Add noise - this is only relevant when training the model from scratch. For training with an ISP, there's a small subset of images that are already noised up. 221 | #distorted_image, a, gauss_std = sensor_model.sensor_noise_rand_light_level(distorted_image, light_level) 222 | #tf.summary.image('noisy_image', tf.expand_dims(distorted_image,0)) 223 | #bayer_mask = sensor_model.get_bayer_mask(height, width) 224 | #tf.summary.image('bayer_mask', tf.expand_dims(bayer_mask*255, 0)) 225 | #distorted_image = distorted_image*bayer_mask 226 | 227 | # Randomly flip the image horizontally. 
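# --- Editor's note (illustrative sketch, not part of the original file) -----
# The block commented out above is where simulated sensor noise and the Bayer
# mask would be applied when training the classifier directly on raw-like
# data. A minimal, hedged sketch of re-enabling it, assuming `light_level` is
# an (ll_low, ll_high) pair and that the single image is expanded to a batch
# of one so it matches the 4-D batch input used by
# sensor_model.sensor_noise_rand_light_level further below in this repo:
#
#   batched = tf.expand_dims(distorted_image, 0)
#   noisy, a, gauss_std = sensor_model.sensor_noise_rand_light_level(
#       batched, light_level)
#   noisy = tf.squeeze(noisy, [0])
#   bayer_mask = sensor_model.get_bayer_mask(height, width)
#   distorted_image = noisy * bayer_mask
#
# The horizontal flip below is unchanged.
# ----------------------------------------------------------------------------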
228 | distorted_image = tf.image.random_flip_left_right(distorted_image) 229 | 230 | tf.summary.image('final_distorted_image', 231 | tf.expand_dims(distorted_image, 0)) 232 | distorted_image = tf.subtract(distorted_image, 0.5) 233 | distorted_image = tf.multiply(distorted_image, 2.0) 234 | return distorted_image 235 | 236 | 237 | def preprocess_for_eval(image, height, width, light_level=None, 238 | central_fraction=0.875, scope=None): 239 | """Prepare one image for evaluation. 240 | 241 | If height and width are specified it would output an image with that size by 242 | applying resize_bilinear. 243 | 244 | If central_fraction is specified it would cropt the central fraction of the 245 | input image. 246 | 247 | Args: 248 | image: 3-D Tensor of image. If dtype is tf.float32 then the range should be 249 | [0, 1], otherwise it would converted to tf.float32 assuming that the range 250 | is [0, MAX], where MAX is largest positive representable number for 251 | int(8/16/32) data type (see `tf.image.convert_image_dtype` for details) 252 | height: integer 253 | width: integer 254 | central_fraction: Optional Float, fraction of the image to crop. 255 | scope: Optional scope for name_scope. 256 | Returns: 257 | 3-D float Tensor of prepared image. 258 | """ 259 | with tf.name_scope(scope, 'eval_image', [image, height, width]): 260 | if image.dtype != tf.float32: 261 | image = tf.image.convert_image_dtype(image, dtype=tf.float32) 262 | 263 | # Crop the central region of the image with an area containing 87.5% of 264 | # the original image. 265 | if central_fraction: 266 | image = tf.image.central_crop(image, central_fraction=central_fraction) 267 | 268 | #image = tf.py_func(sensor_model.sensor_model, [image], tf.float32, stateful=True) 269 | if height and width: 270 | # Resize the image to the specified height and width. 271 | image = tf.expand_dims(image, 0) 272 | image = tf.image.resize_images(image, [height, width], method=tf.image.ResizeMethod.NEAREST_NEIGHBOR) 273 | 274 | image = tf.squeeze(image, [0]) 275 | 276 | # Add noise (only for our ISP) 277 | #image, a, gauss_std = sensor_model.sensor_noise_rand_light_level(image, light_level) 278 | #image = image*sensor_model.get_bayer_mask(height, width) 279 | 280 | image = tf.subtract(image, 0.5) 281 | image = tf.multiply(image, 2.0) 282 | image.set_shape([height, width, 3]) 283 | return image 284 | 285 | 286 | def preprocess_image(image, ground_truth, height, width, 287 | is_training=False, 288 | bbox=None, 289 | fast_mode=True, 290 | light_level=None): 291 | """Pre-process one image for training or evaluation. 292 | 293 | Args: 294 | image: 3-D Tensor [height, width, channels] with the image. 295 | height: integer, image expected height. 296 | width: integer, image expected width. 297 | is_training: Boolean. If true it would transform an image for train, 298 | otherwise it would transform it for evaluation. 299 | bbox: 3-D float Tensor of bounding boxes arranged [1, num_boxes, coords] 300 | where each coordinate is [0, 1) and the coordinates are arranged as 301 | [ymin, xmin, ymax, xmax]. 302 | fast_mode: Optional boolean, if True avoids slower transformations. 
303 | 304 | Returns: 305 | 3-D float Tensor containing an appropriately scaled image 306 | 307 | Raises: 308 | ValueError: if user does not provide bounding box 309 | """ 310 | if is_training: 311 | return preprocess_for_train(image, height, width, bbox, fast_mode, light_level) 312 | else: 313 | return preprocess_for_eval(image, height, width, light_level) 314 | -------------------------------------------------------------------------------- /preprocessing/joint_isp_preprocessing.py: -------------------------------------------------------------------------------- 1 | # Copyright 2016 The TensorFlow Authors. All Rights Reserved. 2 | # 3 | # Licensed under the Apache License, Version 2.0 (the "License"); 4 | # you may not use this file except in compliance with the License. 5 | # You may obtain a copy of the License at 6 | # 7 | # http://www.apache.org/licenses/LICENSE-2.0 8 | # 9 | # Unless required by applicable law or agreed to in writing, software 10 | # distributed under the License is distributed on an "AS IS" BASIS, 11 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 12 | # See the License for the specific language governing permissions and 13 | # limitations under the License. 14 | # ============================================================================== 15 | """Provides utilities to preprocess images for the Inception networks.""" 16 | 17 | from __future__ import absolute_import 18 | from __future__ import division 19 | from __future__ import print_function 20 | 21 | from preprocessing import sensor_model 22 | 23 | import tensorflow as tf 24 | import numpy as np 25 | 26 | from tensorflow.python.ops import control_flow_ops 27 | 28 | 29 | def apply_with_random_selector(x, func, num_cases): 30 | """Computes func(x, sel), with sel sampled from [0...num_cases-1]. 31 | 32 | Args: 33 | x: input Tensor. 34 | func: Python function to apply. 35 | num_cases: Python int32, number of cases to sample sel from. 36 | 37 | Returns: 38 | The result of func(x, sel), where func receives the value of the 39 | selector as a python integer, but sel is sampled dynamically. 40 | """ 41 | sel = tf.random_uniform([], maxval=num_cases, dtype=tf.int32) 42 | # Pass the real x only to one of the func calls. 43 | return control_flow_ops.merge([ 44 | func(control_flow_ops.switch(x, tf.equal(sel, case))[1], case) 45 | for case in range(num_cases)])[0] 46 | 47 | 48 | def distort_color(image, color_ordering=0, fast_mode=True, scope=None): 49 | """Distort the color of a Tensor image. 50 | 51 | Each color distortion is non-commutative and thus ordering of the color ops 52 | matters. Ideally we would randomly permute the ordering of the color ops. 53 | Rather then adding that level of complication, we select a distinct ordering 54 | of color ops for each preprocessing thread. 55 | 56 | Args: 57 | image: 3-D Tensor containing single image in [0, 1]. 58 | color_ordering: Python int, a type of distortion (valid values: 0-3). 59 | fast_mode: Avoids slower ops (random_hue and random_contrast) 60 | scope: Optional scope for name_scope. 61 | Returns: 62 | 3-D Tensor color-distorted image on range [0, 1] 63 | Raises: 64 | ValueError: if color_ordering not in [0, 3] 65 | """ 66 | with tf.name_scope(scope, 'distort_color', [image]): 67 | if fast_mode: 68 | if color_ordering == 0: 69 | image = tf.image.random_brightness(image, max_delta=32. / 255.) 
70 | image = tf.image.random_saturation(image, lower=0.5, upper=1.5) 71 | else: 72 | image = tf.image.random_saturation(image, lower=0.5, upper=1.5) 73 | image = tf.image.random_brightness(image, max_delta=32. / 255.) 74 | else: 75 | if color_ordering == 0: 76 | image = tf.image.random_brightness(image, max_delta=32. / 255.) 77 | image = tf.image.random_saturation(image, lower=0.5, upper=1.5) 78 | image = tf.image.random_hue(image, max_delta=0.2) 79 | image = tf.image.random_contrast(image, lower=0.5, upper=1.5) 80 | elif color_ordering == 1: 81 | image = tf.image.random_saturation(image, lower=0.5, upper=1.5) 82 | image = tf.image.random_brightness(image, max_delta=32. / 255.) 83 | image = tf.image.random_contrast(image, lower=0.5, upper=1.5) 84 | image = tf.image.random_hue(image, max_delta=0.2) 85 | elif color_ordering == 2: 86 | image = tf.image.random_contrast(image, lower=0.5, upper=1.5) 87 | image = tf.image.random_hue(image, max_delta=0.2) 88 | image = tf.image.random_brightness(image, max_delta=32. / 255.) 89 | image = tf.image.random_saturation(image, lower=0.5, upper=1.5) 90 | elif color_ordering == 3: 91 | image = tf.image.random_hue(image, max_delta=0.2) 92 | image = tf.image.random_saturation(image, lower=0.5, upper=1.5) 93 | image = tf.image.random_contrast(image, lower=0.5, upper=1.5) 94 | image = tf.image.random_brightness(image, max_delta=32. / 255.) 95 | else: 96 | raise ValueError('color_ordering must be in [0, 3]') 97 | 98 | # The random_* ops do not necessarily clamp. 99 | return tf.clip_by_value(image, 0.0, 1.0) 100 | 101 | 102 | def distorted_bounding_box_crop(image, 103 | bbox, 104 | min_object_covered=0.1, 105 | aspect_ratio_range=(0.75, 1.33), 106 | area_range=(0.05, 1.0), 107 | max_attempts=100, 108 | scope=None): 109 | """Generates cropped_image using a one of the bboxes randomly distorted. 110 | 111 | See `tf.image.sample_distorted_bounding_box` for more documentation. 112 | 113 | Args: 114 | image: 3-D Tensor of image (it will be converted to floats in [0, 1]). 115 | bbox: 3-D float Tensor of bounding boxes arranged [1, num_boxes, coords] 116 | where each coordinate is [0, 1) and the coordinates are arranged 117 | as [ymin, xmin, ymax, xmax]. If num_boxes is 0 then it would use the whole 118 | image. 119 | min_object_covered: An optional `float`. Defaults to `0.1`. The cropped 120 | area of the image must contain at least this fraction of any bounding box 121 | supplied. 122 | aspect_ratio_range: An optional list of `floats`. The cropped area of the 123 | image must have an aspect ratio = width / height within this range. 124 | area_range: An optional list of `floats`. The cropped area of the image 125 | must contain a fraction of the supplied image within in this range. 126 | max_attempts: An optional `int`. Number of attempts at generating a cropped 127 | region of the image of the specified constraints. After `max_attempts` 128 | failures, return the entire image. 129 | scope: Optional scope for name_scope. 130 | Returns: 131 | A tuple, a 3-D Tensor cropped_image and the distorted bbox 132 | """ 133 | with tf.name_scope(scope, 'distorted_bounding_box_crop', [image, bbox]): 134 | # Each bounding box has shape [1, num_boxes, box coords] and 135 | # the coordinates are ordered [ymin, xmin, ymax, xmax]. 136 | 137 | # A large fraction of image datasets contain a human-annotated bounding 138 | # box delineating the region of the image containing the object of interest. 
139 | # We choose to create a new bounding box for the object which is a randomly 140 | # distorted version of the human-annotated bounding box that obeys an 141 | # allowed range of aspect ratios, sizes and overlap with the human-annotated 142 | # bounding box. If no box is supplied, then we assume the bounding box is 143 | # the entire image. 144 | sample_distorted_bounding_box = tf.image.sample_distorted_bounding_box( 145 | tf.shape(image), 146 | bounding_boxes=bbox, 147 | min_object_covered=min_object_covered, 148 | aspect_ratio_range=aspect_ratio_range, 149 | area_range=area_range, 150 | max_attempts=max_attempts, 151 | use_image_if_no_bounding_boxes=True) 152 | bbox_begin, bbox_size, distort_bbox = sample_distorted_bounding_box 153 | 154 | # Crop the image to the specified bounding box. 155 | cropped_image = tf.slice(image, bbox_begin, bbox_size) 156 | return cropped_image, distort_bbox 157 | 158 | 159 | def preprocess_for_train(image, height, width, bbox, 160 | fast_mode=True, 161 | light_level=None, 162 | scope=None): 163 | """Distort one image for training a netwo. 164 | 165 | Distorting images provides a useful technique for augmenting the data 166 | set during training in order to make the network invariant to aspects 167 | of the image that do not effect the label. 168 | 169 | Additionally it would create image_summaries to display the different 170 | transformations applied to the image. 171 | 172 | Args: 173 | image: 3-D Tensor of image. If dtype is tf.float32 then the range should be 174 | [0, 1], otherwise it would converted to tf.float32 assuming that the range 175 | is [0, MAX], where MAX is largest positive representable number for 176 | int(8/16/32) data type (see `tf.image.convert_image_dtype` for details). 177 | height: integer 178 | width: integer 179 | bbox: 3-D float Tensor of bounding boxes arranged [1, num_boxes, coords] 180 | where each coordinate is [0, 1) and the coordinates are arranged 181 | as [ymin, xmin, ymax, xmax]. 182 | fast_mode: Optional boolean, if True avoids slower transformations (i.e. 183 | bi-cubic resizing, random_hue or random_contrast). 184 | scope: Optional scope for name_scope. 185 | Returns: 186 | 3-D float Tensor of distorted image used for training with range [-1, 1]. 187 | """ 188 | with tf.name_scope(scope, 'distort_image', [image, height, width, bbox]): 189 | 190 | 191 | if bbox is None: 192 | bbox = tf.constant([0.0, 0.0, 1.0, 1.0], 193 | dtype=tf.float32, 194 | shape=[1, 1, 4]) 195 | if image.dtype != tf.float32: 196 | image = tf.image.convert_image_dtype(image, dtype=tf.float32) 197 | 198 | # Each bounding box has shape [1, num_boxes, box coords] and 199 | # the coordinates are ordered [ymin, xmin, ymax, xmax]. 200 | image_with_box = tf.image.draw_bounding_boxes(tf.expand_dims(image, 0), 201 | bbox) 202 | tf.summary.image('image_with_bounding_boxes', image_with_box) 203 | 204 | distorted_image, distorted_bbox = distorted_bounding_box_crop(image, bbox) 205 | # Restore the shape since the dynamic slice based upon the bbox_size loses 206 | # the third dimension. 207 | distorted_image.set_shape([None, None, 3]) 208 | image_with_distorted_box = tf.image.draw_bounding_boxes( 209 | tf.expand_dims(image, 0), distorted_bbox) 210 | tf.summary.image('images_with_distorted_bounding_box', 211 | image_with_distorted_box) 212 | 213 | # This resizing operation may distort the images because the aspect 214 | # ratio is not respected. We select a resize method in a round robin 215 | # fashion based on the thread number. 
216 | # Note that ResizeMethod contains 4 enumerated resizing methods. 217 | 218 | 219 | # We select only 1 case for fast_mode bilinear. 220 | #num_resize_cases = 1 221 | #distorted_image = apply_with_random_selector( 222 | # distorted_image, 223 | # lambda x, method: tf.image.resize_images(x, [height, width], method=method), 224 | # num_cases=num_resize_cases) 225 | 226 | # Use nearest neighbor subsampling. 227 | print("USING NEAREST NEIGHBOR SUBSAMPLING") 228 | distorted_image = tf.image.resize_images(distorted_image, [height, width], 229 | method=tf.image.ResizeMethod.NEAREST_NEIGHBOR) 230 | 231 | tf.summary.image('cropped_resized_image', 232 | tf.expand_dims(distorted_image, 0)) 233 | 234 | # Randomly flip the image horizontally. 235 | distorted_image = tf.image.random_flip_left_right(distorted_image) 236 | 237 | tf.summary.image('final_distorted_image', 238 | tf.expand_dims(distorted_image, 0)) 239 | return distorted_image 240 | 241 | 242 | def preprocess_for_eval(image, height, width, light_level=None, 243 | central_fraction=0.875, scope=None, sensor='Nexus_6P_rear'): 244 | """Prepare one image for evaluation. 245 | 246 | If height and width are specified it would output an image with that size by 247 | applying resize_bilinear. 248 | 249 | If central_fraction is specified it would cropt the central fraction of the 250 | input image. 251 | 252 | Args: 253 | image: 3-D Tensor of image. If dtype is tf.float32 then the range should be 254 | [0, 1], otherwise it would converted to tf.float32 assuming that the range 255 | is [0, MAX], where MAX is largest positive representable number for 256 | int(8/16/32) data type (see `tf.image.convert_image_dtype` for details) 257 | height: integer 258 | width: integer 259 | central_fraction: Optional Float, fraction of the image to crop. 260 | scope: Optional scope for name_scope. 261 | Returns: 262 | 3-D float Tensor of prepared image. 263 | """ 264 | with tf.name_scope(scope, 'eval_image', [image, height, width]): 265 | if image.dtype != tf.float32: 266 | image = tf.image.convert_image_dtype(image, dtype=tf.float32) 267 | 268 | # Crop the central region of the image with an area containing 87.5% of 269 | # the original image. 270 | if central_fraction: 271 | image = tf.image.central_crop(image, central_fraction=central_fraction) 272 | 273 | #image = tf.py_func(sensor_model.sensor_model, [image], tf.float32, stateful=True) 274 | if height and width: 275 | # Resize the image to the specified height and width. 276 | image = tf.expand_dims(image, 0) 277 | image = tf.image.resize_images(image, [height, width], method=tf.image.ResizeMethod.NEAREST_NEIGHBOR) 278 | 279 | image = tf.squeeze(image, [0]) 280 | 281 | 282 | B = image[::2,::2,2] 283 | R = image[1::2,1::2,0] 284 | G1 = image[1::2,::2,1] 285 | G2 = image[::2,1::2,1] 286 | stacked = tf.stack([R, B, G1, G2], axis=2) 287 | mean = tf.reduce_mean(stacked) 288 | std = tf.py_func(noise_est, [stacked], tf.float32) 289 | light_level = sensor_model.std2ll(std, mean=mean, sensor=sensor) 290 | light_level.set_shape([]) 291 | image.set_shape([height, width, 3]) 292 | return image, light_level 293 | 294 | def noise_est(img): 295 | stds = sensor_model.estimate_std(img) 296 | return np.float32(np.mean(stds)) 297 | 298 | def preprocess_image(image, ground_truth, height, width, 299 | is_training=False, 300 | bbox=None, 301 | fast_mode=True, 302 | light_level=None, 303 | sensor='Nexus_6P_rear'): 304 | """Pre-process one image for training or evaluation. 
305 | 306 | Args: 307 | image: 3-D Tensor [height, width, channels] with the image. 308 | height: integer, image expected height. 309 | width: integer, image expected width. 310 | is_training: Boolean. If true it would transform an image for train, 311 | otherwise it would transform it for evaluation. 312 | bbox: 3-D float Tensor of bounding boxes arranged [1, num_boxes, coords] 313 | where each coordinate is [0, 1) and the coordinates are arranged as 314 | [ymin, xmin, ymax, xmax]. 315 | fast_mode: Optional boolean, if True avoids slower transformations. 316 | 317 | Returns: 318 | 3-D float Tensor containing an appropriately scaled image 319 | 320 | Raises: 321 | ValueError: if user does not provide bounding box 322 | """ 323 | if is_training: 324 | return preprocess_for_train(image, height, width, bbox, fast_mode, light_level) 325 | else: 326 | return preprocess_for_eval(image, height, width, light_level, sensor=sensor) 327 | -------------------------------------------------------------------------------- /preprocessing/no_preprocessing.py: -------------------------------------------------------------------------------- 1 | # Copyright 2016 The TensorFlow Authors. All Rights Reserved. 2 | # 3 | # Licensed under the Apache License, Version 2.0 (the "License"); 4 | # you may not use this file except in compliance with the License. 5 | # You may obtain a copy of the License at 6 | # 7 | # http://www.apache.org/licenses/LICENSE-2.0 8 | # 9 | # Unless required by applicable law or agreed to in writing, software 10 | # distributed under the License is distributed on an "AS IS" BASIS, 11 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 12 | # See the License for the specific language governing permissions and 13 | # limitations under the License. 14 | # ============================================================================== 15 | """Provides utilities to preprocess images for the Inception networks.""" 16 | 17 | from __future__ import absolute_import 18 | from __future__ import division 19 | from __future__ import print_function 20 | 21 | from preprocessing import sensor_model 22 | 23 | import tensorflow as tf 24 | import numpy as np 25 | 26 | from tensorflow.python.ops import control_flow_ops 27 | 28 | 29 | def apply_with_random_selector(x, func, num_cases): 30 | """Computes func(x, sel), with sel sampled from [0...num_cases-1]. 31 | 32 | Args: 33 | x: input Tensor. 34 | func: Python function to apply. 35 | num_cases: Python int32, number of cases to sample sel from. 36 | 37 | Returns: 38 | The result of func(x, sel), where func receives the value of the 39 | selector as a python integer, but sel is sampled dynamically. 40 | """ 41 | sel = tf.random_uniform([], maxval=num_cases, dtype=tf.int32) 42 | # Pass the real x only to one of the func calls. 43 | return control_flow_ops.merge([ 44 | func(control_flow_ops.switch(x, tf.equal(sel, case))[1], case) 45 | for case in range(num_cases)])[0] 46 | 47 | 48 | def distort_color(image, color_ordering=0, fast_mode=True, scope=None): 49 | """Distort the color of a Tensor image. 50 | 51 | Each color distortion is non-commutative and thus ordering of the color ops 52 | matters. Ideally we would randomly permute the ordering of the color ops. 53 | Rather then adding that level of complication, we select a distinct ordering 54 | of color ops for each preprocessing thread. 55 | 56 | Args: 57 | image: 3-D Tensor containing single image in [0, 1]. 58 | color_ordering: Python int, a type of distortion (valid values: 0-3). 
59 | fast_mode: Avoids slower ops (random_hue and random_contrast) 60 | scope: Optional scope for name_scope. 61 | Returns: 62 | 3-D Tensor color-distorted image on range [0, 1] 63 | Raises: 64 | ValueError: if color_ordering not in [0, 3] 65 | """ 66 | with tf.name_scope(scope, 'distort_color', [image]): 67 | if fast_mode: 68 | if color_ordering == 0: 69 | image = tf.image.random_brightness(image, max_delta=32. / 255.) 70 | image = tf.image.random_saturation(image, lower=0.5, upper=1.5) 71 | else: 72 | image = tf.image.random_saturation(image, lower=0.5, upper=1.5) 73 | image = tf.image.random_brightness(image, max_delta=32. / 255.) 74 | else: 75 | if color_ordering == 0: 76 | image = tf.image.random_brightness(image, max_delta=32. / 255.) 77 | image = tf.image.random_saturation(image, lower=0.5, upper=1.5) 78 | image = tf.image.random_hue(image, max_delta=0.2) 79 | image = tf.image.random_contrast(image, lower=0.5, upper=1.5) 80 | elif color_ordering == 1: 81 | image = tf.image.random_saturation(image, lower=0.5, upper=1.5) 82 | image = tf.image.random_brightness(image, max_delta=32. / 255.) 83 | image = tf.image.random_contrast(image, lower=0.5, upper=1.5) 84 | image = tf.image.random_hue(image, max_delta=0.2) 85 | elif color_ordering == 2: 86 | image = tf.image.random_contrast(image, lower=0.5, upper=1.5) 87 | image = tf.image.random_hue(image, max_delta=0.2) 88 | image = tf.image.random_brightness(image, max_delta=32. / 255.) 89 | image = tf.image.random_saturation(image, lower=0.5, upper=1.5) 90 | elif color_ordering == 3: 91 | image = tf.image.random_hue(image, max_delta=0.2) 92 | image = tf.image.random_saturation(image, lower=0.5, upper=1.5) 93 | image = tf.image.random_contrast(image, lower=0.5, upper=1.5) 94 | image = tf.image.random_brightness(image, max_delta=32. / 255.) 95 | else: 96 | raise ValueError('color_ordering must be in [0, 3]') 97 | 98 | # The random_* ops do not necessarily clamp. 99 | return tf.clip_by_value(image, 0.0, 1.0) 100 | 101 | 102 | def distorted_bounding_box_crop(image, 103 | bbox, 104 | min_object_covered=0.1, 105 | aspect_ratio_range=(0.75, 1.33), 106 | area_range=(0.05, 1.0), 107 | max_attempts=100, 108 | scope=None): 109 | """Generates cropped_image using a one of the bboxes randomly distorted. 110 | 111 | See `tf.image.sample_distorted_bounding_box` for more documentation. 112 | 113 | Args: 114 | image: 3-D Tensor of image (it will be converted to floats in [0, 1]). 115 | bbox: 3-D float Tensor of bounding boxes arranged [1, num_boxes, coords] 116 | where each coordinate is [0, 1) and the coordinates are arranged 117 | as [ymin, xmin, ymax, xmax]. If num_boxes is 0 then it would use the whole 118 | image. 119 | min_object_covered: An optional `float`. Defaults to `0.1`. The cropped 120 | area of the image must contain at least this fraction of any bounding box 121 | supplied. 122 | aspect_ratio_range: An optional list of `floats`. The cropped area of the 123 | image must have an aspect ratio = width / height within this range. 124 | area_range: An optional list of `floats`. The cropped area of the image 125 | must contain a fraction of the supplied image within in this range. 126 | max_attempts: An optional `int`. Number of attempts at generating a cropped 127 | region of the image of the specified constraints. After `max_attempts` 128 | failures, return the entire image. 129 | scope: Optional scope for name_scope. 
130 | Returns: 131 | A tuple, a 3-D Tensor cropped_image and the distorted bbox 132 | """ 133 | with tf.name_scope(scope, 'distorted_bounding_box_crop', [image, bbox]): 134 | # Each bounding box has shape [1, num_boxes, box coords] and 135 | # the coordinates are ordered [ymin, xmin, ymax, xmax]. 136 | 137 | # A large fraction of image datasets contain a human-annotated bounding 138 | # box delineating the region of the image containing the object of interest. 139 | # We choose to create a new bounding box for the object which is a randomly 140 | # distorted version of the human-annotated bounding box that obeys an 141 | # allowed range of aspect ratios, sizes and overlap with the human-annotated 142 | # bounding box. If no box is supplied, then we assume the bounding box is 143 | # the entire image. 144 | sample_distorted_bounding_box = tf.image.sample_distorted_bounding_box( 145 | tf.shape(image), 146 | bounding_boxes=bbox, 147 | min_object_covered=min_object_covered, 148 | aspect_ratio_range=aspect_ratio_range, 149 | area_range=area_range, 150 | max_attempts=max_attempts, 151 | use_image_if_no_bounding_boxes=True) 152 | bbox_begin, bbox_size, distort_bbox = sample_distorted_bounding_box 153 | 154 | # Crop the image to the specified bounding box. 155 | cropped_image = tf.slice(image, bbox_begin, bbox_size) 156 | return cropped_image, distort_bbox 157 | 158 | 159 | def preprocess_for_train(image, height, width, bbox, 160 | fast_mode=True, 161 | light_level=None, 162 | scope=None): 163 | """Distort one image for training a netwo. 164 | 165 | Distorting images provides a useful technique for augmenting the data 166 | set during training in order to make the network invariant to aspects 167 | of the image that do not effect the label. 168 | 169 | Additionally it would create image_summaries to display the different 170 | transformations applied to the image. 171 | 172 | Args: 173 | image: 3-D Tensor of image. If dtype is tf.float32 then the range should be 174 | [0, 1], otherwise it would converted to tf.float32 assuming that the range 175 | is [0, MAX], where MAX is largest positive representable number for 176 | int(8/16/32) data type (see `tf.image.convert_image_dtype` for details). 177 | height: integer 178 | width: integer 179 | bbox: 3-D float Tensor of bounding boxes arranged [1, num_boxes, coords] 180 | where each coordinate is [0, 1) and the coordinates are arranged 181 | as [ymin, xmin, ymax, xmax]. 182 | fast_mode: Optional boolean, if True avoids slower transformations (i.e. 183 | bi-cubic resizing, random_hue or random_contrast). 184 | scope: Optional scope for name_scope. 185 | Returns: 186 | 3-D float Tensor of distorted image used for training with range [-1, 1]. 187 | """ 188 | with tf.name_scope(scope, 'distort_image', [image, height, width, bbox]): 189 | 190 | 191 | if image.dtype != tf.float32: 192 | image = tf.image.convert_image_dtype(image, dtype=tf.float32) 193 | 194 | # Randomly flip the image horizontally. 195 | distorted_image = tf.image.random_flip_left_right(image) 196 | 197 | tf.summary.image('final_distorted_image', 198 | tf.expand_dims(distorted_image, 0)) 199 | distorted_image.set_shape([height, width, 3]) 200 | return 2*(distorted_image - 0.5) 201 | 202 | 203 | def preprocess_for_eval(image, height, width, light_level=None, 204 | central_fraction=0.875, scope=None): 205 | """Prepare one image for evaluation. 206 | 207 | If height and width are specified it would output an image with that size by 208 | applying resize_bilinear. 
209 | 210 | If central_fraction is specified it would cropt the central fraction of the 211 | input image. 212 | 213 | Args: 214 | image: 3-D Tensor of image. If dtype is tf.float32 then the range should be 215 | [0, 1], otherwise it would converted to tf.float32 assuming that the range 216 | is [0, MAX], where MAX is largest positive representable number for 217 | int(8/16/32) data type (see `tf.image.convert_image_dtype` for details) 218 | height: integer 219 | width: integer 220 | central_fraction: Optional Float, fraction of the image to crop. 221 | scope: Optional scope for name_scope. 222 | Returns: 223 | 3-D float Tensor of prepared image. 224 | """ 225 | with tf.name_scope(scope, 'eval_image', [image, height, width]): 226 | if image.dtype != tf.float32: 227 | image = tf.image.convert_image_dtype(image, dtype=tf.float32) 228 | image.set_shape([height, width, 3]) 229 | 230 | # Break into colors and then upsample. 231 | #B = tf.image.resize_images(image[::2,::2,2:3], [height, width]) 232 | #R = tf.image.resize_images(image[1::2,1::2,0:1], [height, width]) 233 | #G1 = tf.image.resize_images(image[1::2,::2,1:2], [height, width]) 234 | #G2 = tf.image.resize_images(image[::2,1::2,1:2], [height, width]) 235 | #image = tf.concat([R,(G1+G2)/2,B], axis=2) 236 | 237 | #image = tf.Print(image, [tf.reduce_min(image), tf.reduce_max(image)]) 238 | return 2*(image - 0.5) 239 | 240 | def noise_est(img): 241 | stds = sensor_model.estimate_std(img) 242 | return np.float32(np.mean(stds)) 243 | 244 | def preprocess_image(image, ground_truth, height, width, 245 | is_training=False, 246 | bbox=None, 247 | fast_mode=True, 248 | light_level=None, sensor=None): 249 | """Pre-process one image for training or evaluation. 250 | 251 | Args: 252 | image: 3-D Tensor [height, width, channels] with the image. 253 | height: integer, image expected height. 254 | width: integer, image expected width. 255 | is_training: Boolean. If true it would transform an image for train, 256 | otherwise it would transform it for evaluation. 257 | bbox: 3-D float Tensor of bounding boxes arranged [1, num_boxes, coords] 258 | where each coordinate is [0, 1) and the coordinates are arranged as 259 | [ymin, xmin, ymax, xmax]. 260 | fast_mode: Optional boolean, if True avoids slower transformations. 261 | 262 | Returns: 263 | 3-D float Tensor containing an appropriately scaled image 264 | 265 | Raises: 266 | ValueError: if user does not provide bounding box 267 | """ 268 | if is_training: 269 | return preprocess_for_train(image, height, width, bbox, fast_mode, light_level) 270 | else: 271 | return preprocess_for_eval(image, height, width, light_level) 272 | -------------------------------------------------------------------------------- /preprocessing/preprocessing_factory.py: -------------------------------------------------------------------------------- 1 | # Copyright 2016 The TensorFlow Authors. All Rights Reserved. 2 | # 3 | # Licensed under the Apache License, Version 2.0 (the "License"); 4 | # you may not use this file except in compliance with the License. 5 | # You may obtain a copy of the License at 6 | # 7 | # http://www.apache.org/licenses/LICENSE-2.0 8 | # 9 | # Unless required by applicable law or agreed to in writing, software 10 | # distributed under the License is distributed on an "AS IS" BASIS, 11 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 12 | # See the License for the specific language governing permissions and 13 | # limitations under the License. 
14 | # ============================================================================== 15 | """Contains a factory for building various models.""" 16 | 17 | from __future__ import absolute_import 18 | from __future__ import division 19 | from __future__ import print_function 20 | 21 | import tensorflow as tf 22 | 23 | from preprocessing import inception_preprocessing 24 | from preprocessing import isp_pretrain_preprocessing 25 | from preprocessing import joint_isp_preprocessing 26 | from preprocessing import writeout_preprocessing 27 | from preprocessing import no_preprocessing 28 | 29 | slim = tf.contrib.slim 30 | 31 | 32 | def get_preprocessing(name, is_training): 33 | """Returns preprocessing_fn(image, height, width, **kwargs). 34 | 35 | Args: 36 | name: The name of the preprocessing function. 37 | is_training: `True` if the model is being used for training and `False` 38 | otherwise. 39 | 40 | Returns: 41 | preprocessing_fn: A function that preprocessing a single image (pre-batch). 42 | It has the following signature: 43 | image = preprocessing_fn(image, output_height, output_width, ...). 44 | 45 | Raises: 46 | ValueError: If Preprocessing `name` is not recognized. 47 | """ 48 | preprocessing_fn_map = { 49 | 'isp': isp_pretrain_preprocessing, 50 | 'mobilenet_v1': inception_preprocessing, 51 | 'mobilenet_isp': joint_isp_preprocessing, 52 | 'resnet_isp': isp_pretrain_preprocessing, 53 | 'gharbi_isp': isp_pretrain_preprocessing, 54 | 'writeout': writeout_preprocessing, 55 | 'none': no_preprocessing, 56 | 'deeper_mobilenet_v1': inception_preprocessing, 57 | } 58 | 59 | if name not in preprocessing_fn_map: 60 | raise ValueError('Preprocessing name [%s] was not recognized' % name) 61 | 62 | def preprocessing_fn(image, ground_truth, output_height, output_width, **kwargs): 63 | return preprocessing_fn_map[name].preprocess_image( 64 | image, ground_truth, output_height, output_width, is_training=is_training, **kwargs) 65 | 66 | return preprocessing_fn 67 | -------------------------------------------------------------------------------- /preprocessing/sensor_model.py: -------------------------------------------------------------------------------- 1 | import tensorflow as tf 2 | import numpy as np 3 | import scipy.io as sio 4 | from scipy.stats import poisson 5 | 6 | ############################################################################### 7 | # Sensor model 8 | ############################################################################### 9 | 10 | # Sensors calibrated for ISO100 (format is Poissonian scale, Gaussian std) 11 | # Iso 100, 200, 400, 800, 1600, 3200 12 | sensors = {'Nexus_6P_rear': [0.00018724, 0.0004733], 13 | 'Nexus_6P_front': [0.00015, 0.0003875], 14 | 'SEMCO': [0.000388, 0.0025], 15 | 'OV2740': [0.000088021, 0.00022673], 16 | #'GAUSSIAN': [0,0.005], 17 | 'GAUSS': [0,1], 18 | 'POISSON': [1,0], 19 | 'Pixel': [0.0153, 0.0328], #[0.00019856, 0.0017], 20 | 'Pixel3x3': [2.2682e-4, 0.0017], 21 | 'Pixel5x5': [1.2361e-4, 0.0043], 22 | 'Pixel7x7': [7.3344e-05, 0.0077], 23 | } 24 | 25 | sensorpositions = {'center': 0.5, 26 | 'offaxis': 0.9, 27 | 'periphery': 1.0} 28 | 29 | light_levels = 3 * np.array([2 ** i for i in range(6)]) / 2000.0 30 | 31 | def std2ll(std, mean=0.5, sensor='Nexus_6P_rear'): 32 | #light_level = sensors[sensor][1]/std 33 | #print('Sensor', sensor) 34 | alpha, beta = sensors[sensor] 35 | alpha_mean = alpha*mean 36 | num = np.sqrt(alpha_mean**2 + 4*beta**2*std**2) - alpha_mean 37 | light_level = (2*beta**2)/num 38 | return light_level 39 | 40 | def 
get_bayer_mask(height, width): 41 | # Mask based on Bayer pattern. (assume RGB order of colors) 42 | # B G 43 | # G R 44 | bayer_mask = np.zeros([height, width, 3]) 45 | bayer_mask[1::2,1::2,0:1] = 1 # R 46 | bayer_mask[1::2,::2,1:2] = 1 # G 47 | bayer_mask[::2,1::2,1:2] = 1 # G 48 | bayer_mask[::2,::2,2:3] = 1 # B 49 | return bayer_mask 50 | 51 | def optics_model(psfs, sensorpos='center', visualize=True ): 52 | #Expects calibrated PSFs (in matlab format) as input 53 | 54 | #Compute positions on grid 55 | psf_shape = np.array(psfs.shape) 56 | selected_pos = (psf_shape*sensorpositions[sensorpos]).astype(int) 57 | 58 | #Extract the position 59 | psf_sel = psfs[selected_pos[0] - 1,selected_pos[1] - 1]['PSF'][0,0] 60 | psf_sel = np.maximum(psf_sel, 0.0) 61 | 62 | #Normalize 63 | for ch in range(psf_sel.shape[2]): 64 | psf_sel[:,:,ch] = psf_sel[:,:,ch]/np.sum(psf_sel[:,:,ch]) 65 | 66 | return psf_sel 67 | 68 | def psf_iterator(): 69 | sensor_positions = ['center', 'offaxis', 'periphery'] 70 | psfs = sio.loadmat('PSFs/bloc_256_Nexus_defective.mat')['bloc'] 71 | for sensor_position in sensor_positions: 72 | psf_kernel = np.asfortranarray(optics_model(psfs, sensorpos=sensor_position, visualize=False).astype(np.float32)) 73 | yield sensor_position, psf_kernel 74 | 75 | def load_psfs(): 76 | sensor_positions = ['center', 'offaxis', 'periphery'] 77 | psfs = sio.loadmat('PSFs/bloc_256_Nexus_defective.mat')['bloc'] 78 | 79 | kernels = [] 80 | for sensor_position in sensor_positions: 81 | kernel = np.asfortranarray(optics_model(psfs, sensorpos=sensor_position, visualize=False).astype(np.float32)) 82 | for channel in xrange(3): 83 | kernel[:,:,channel] /= np.sum(kernel[:,:,channel]) 84 | kernels.append(kernel) 85 | 86 | return kernels 87 | 88 | def get_noise_params(iso, sensor): 89 | sensor = 'Nexus_6P_rear' 90 | poisson = sensors[sensor][0] 91 | sigma = sensors[sensor][1] 92 | 93 | a = poisson * iso / 100.0 #Poisson scale 94 | b = (sigma * iso / 100.0)**2 95 | 96 | return a, np.sqrt(b) 97 | 98 | def sensor_model(y): 99 | # Invalid sensor 100 | iso = 1.0 / 0.0015 * 100 101 | sensor='Nexus_6P_rear' 102 | 103 | poisson = sensors[sensor][0] 104 | sigma = sensors[sensor][1] 105 | 106 | #Output stats 107 | #print( 'Sensor {0} ISO {1} Poisson {2} Gaussian {3}'.format(sensor, iso, poisson, sigma) ) 108 | 109 | # Assume linear ISO model 110 | a = poisson * iso / 100.0 #Poisson scale 111 | b = (sigma * iso / 100.0)**2 112 | 113 | #Return Poissonian-Gaussian response 114 | #noisy_img = poisson_gaussian_np(y, a, b, True, True) 115 | noisy_img = poisson_gaussian_np(y, a, b, True, True) 116 | return noisy_img.astype(np.float32) 117 | 118 | def sensor_noise_rand_sigma(img_batch, sigma_range, scale=1.0, sensor='Nexus_6P_rear'): 119 | # Define in terms of Gaussian noise after Anscombe. 120 | batch_size = img_batch.get_shape()[0].value 121 | poisson = sensors[sensor][0] 122 | gauss = sensors[sensor][1] 123 | sigma = tf.random_uniform([batch_size], sigma_range[0], sigma_range[1])*scale/255.0 124 | if poisson == 0: 125 | noisy_batch = img_batch + sigma[:,None,None,None] * tf.random_normal(shape=img_batch.get_shape(), dtype=tf.float32) 126 | noisy_batch = tf.clip_by_value(noisy_batch, 0.0, scale) 127 | return noisy_batch, None, sigma 128 | sigma_hat = gauss/poisson 129 | offset = 2*tf.sqrt(3./8. + sigma_hat**2) 130 | tmp = (1./sigma + offset)**2/4 - 3./8. - sigma_hat**2 131 | light_level = poisson*tmp 132 | iso = 1.0 / light_level * 100. 
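# --- Editor's note (derivation added for clarity; not in the original) ------
# The lines above invert the generalized Anscombe transform to map the target
# noise level `sigma` to a light level. With Poisson scale `poisson`, Gaussian
# std `gauss`, and sigma_hat = gauss/poisson, a stabilized pixel is roughly
#     f(z) = 2*sqrt(z + 3/8 + sigma_hat**2),   z = image/a in photon units,
# and has approximately unit standard deviation. The stabilized signal range is
#     upper - lower,  with upper = 2*sqrt(light_level/poisson + 3/8 + sigma_hat**2)
#                     and  lower = offset = 2*sqrt(3/8 + sigma_hat**2)
# (the same `upper`/`lower` computed a few lines below). Requiring
#     upper - lower = 1/sigma,
# i.e. unit post-Anscombe noise equals a fraction `sigma` of the usable range,
# and solving for the light level gives
#     light_level = poisson * ((1/sigma + offset)**2 / 4 - 3/8 - sigma_hat**2),
# which is exactly `poisson * tmp` above. `iso = 100 / light_level` then
# follows the linear ISO model used throughout this file.
# ----------------------------------------------------------------------------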
133 | #iso = tf.Print(iso, [light_level]) 134 | 135 | # Assume linear ISO model 136 | a = poisson * iso / 100.0 * scale #Poisson scale 137 | gauss_var = tf.square(gauss * iso / 100.0) * scale**2 138 | 139 | upper = 2*tf.sqrt(light_level/poisson + 3./8. + sigma_hat**2) 140 | lower = 2*tf.sqrt(3./8. + sigma_hat**2) 141 | tf.summary.scalar('noise_level', 255./(upper - lower)[0]) 142 | tf.summary.scalar('iso', tf.reduce_mean(iso)) 143 | tf.summary.scalar('light_level', tf.reduce_mean(light_level)) 144 | tf.summary.scalar('a', tf.reduce_mean(a)/scale) 145 | tf.summary.scalar('gauss_variance', tf.reduce_mean(gauss_var)/scale**2) 146 | 147 | # a = tf.Print(a, [255./(upper - lower)]) 148 | print("Simulating sensor {0}.".format(sensor)) 149 | 150 | noisy_batch = poisson_gauss_tf(img_batch, a, gauss_var, clip=(0.,scale)) 151 | # Return Poissonian-Gaussian response 152 | return noisy_batch, a, tf.sqrt(gauss_var) 153 | 154 | def get_coeffs(light_levels, sensor='Nexus_6P_rear'): 155 | #print('Sensor', sensor) 156 | poisson = sensors[sensor][0] 157 | gauss = sensors[sensor][1] 158 | iso = 1.0 / light_levels * 100. 159 | a = poisson * iso / 100.0 #Poisson scale 160 | b = (gauss * iso / 100.0) 161 | return a, b 162 | 163 | def sensor_noise_rand_light_level(img_batch, ll_range, scale=1.0, sensor='Nexus_6P_rear'): 164 | print("Sensor = %s, scale = %s" % (sensor, scale)) 165 | batch_size = img_batch.get_shape()[0].value 166 | poisson = sensors[sensor][0] 167 | gauss = sensors[sensor][1] 168 | 169 | # Sample uniformly in logspace. 170 | # low ll * exp(u), u ~ [0, log(high ll/low ll)] 171 | ll_ratio = ll_range[1]/ll_range[0] 172 | ll_factor = tf.random_uniform([batch_size], minval=0, maxval=tf.log(ll_ratio), dtype=tf.float32) 173 | light_level = ll_range[0]*tf.exp(ll_factor) 174 | iso = 1.0 / light_level * 100. 175 | 176 | # Assume linear ISO model 177 | a = poisson * iso / 100.0 * scale #Poisson scale 178 | 179 | gauss_var = tf.square(gauss * iso / 100.0) * scale**2 180 | if poisson == 0: 181 | noisy_batch = img_batch + tf.sqrt(gauss_var[:,None,None,None]) * tf.random_normal(shape=img_batch.get_shape(), dtype=tf.float32) 182 | noisy_batch = tf.clip_by_value(noisy_batch, 0.0, scale) 183 | return noisy_batch, np.zeros(batch_size), tf.sqrt(gauss_var) 184 | 185 | tf.summary.scalar('iso', tf.reduce_mean(iso)) 186 | tf.summary.scalar('light_level', tf.reduce_mean(light_level)) 187 | tf.summary.scalar('a', tf.reduce_mean(a)/scale) 188 | tf.summary.scalar('gauss_variance', tf.reduce_mean(gauss_var)/scale**2) 189 | 190 | print("Simulating sensor {0}.".format(sensor)) 191 | 192 | noisy_batch = poisson_gauss_tf(img_batch, a, gauss_var, clip=(0.,scale)) 193 | sigma_hat = gauss/poisson 194 | 195 | return noisy_batch, a, tf.sqrt(gauss_var) 196 | 197 | def poisson_gauss_tf(img_batch, a, gauss_var, clip=(0.,1.)): 198 | # Apply poissonian-gaussian noise model following A.Foi et al. 199 | # Foi, A., "Practical denoising of clipped or overexposed noisy images", 200 | # Proc. 16th European Signal Process. Conf., EUSIPCO 2008, Lausanne, Switzerland, August 2008. 
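# --- Editor's note (summary comment, not part of the original file) ---------
# Per batch element b, the code below draws
#     out[b] = a[b] * Poisson(img[b] / a[b]) + N(0, gauss_var[b]),
# broadcasting a and gauss_var over the spatial and channel dimensions, and
# then clips the result to `clip` (by default [0, 1]). Before clipping, the
# simulated measurement is unbiased with signal-dependent variance:
#     E[out] = img,    Var[out] = a * img + gauss_var.
# ----------------------------------------------------------------------------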
201 | batch_shape = tf.shape(img_batch) 202 | 203 | a_p = a[:,None,None,None] 204 | out = tf.random_poisson(shape=[], lam=tf.maximum(img_batch/a_p, 0.0), dtype=tf.float32) * a_p 205 | #out = tf.Print(out, [tf.reduce_max(out), tf.reduce_min(out)]) 206 | gauss_var = tf.maximum(gauss_var, 0.0) 207 | 208 | gauss_noise = tf.sqrt(gauss_var[:,None,None,None]) * tf.random_normal(shape=batch_shape, dtype=tf.float32) #Gaussian component 209 | 210 | out += gauss_noise 211 | 212 | # Clipping 213 | if clip is not None: 214 | out = tf.clip_by_value(out, clip[0], clip[1]) 215 | 216 | # Return the simulated image 217 | return out 218 | 219 | 220 | def poisson_gaussian_np(y, a, b, clip_below=True, clip_above=True): 221 | # Apply poissonian-gaussian noise model following A.Foi et al. 222 | # Foi, A., "Practical denoising of clipped or overexposed noisy images", 223 | # Proc. 16th European Signal Process. Conf., EUSIPCO 2008, Lausanne, Switzerland, August 2008. 224 | 225 | # Check method 226 | if(a==0): # no Poissonian component 227 | z = y 228 | else: # Poissonian component 229 | z = np.random.poisson( np.maximum(y/a,0.0) )*a; 230 | 231 | if(b<0): 232 | raise warnings.warn('The Gaussian noise parameter b has to be non-negative (setting b=0)') 233 | b = 0.0 234 | 235 | z = z + np.sqrt(b) * np.random.randn(*y.shape) #Gaussian component 236 | 237 | # Clipping 238 | if(clip_above): 239 | z = np.minimum(z, 1.0); 240 | 241 | if(clip_below): 242 | z = np.maximum(z, 0.0); 243 | 244 | # Return the simulated image 245 | return z 246 | 247 | # Currently only implements one method 248 | NoiseEstMethod = {'daub_reflect': 0, 'daub_replicate': 1} 249 | 250 | 251 | def estimate_std(z, method='daub_reflect'): 252 | import cv2 253 | # Estimates noise standard deviation assuming additive gaussian noise 254 | 255 | # Check method 256 | if (method not in NoiseEstMethod.values()) and (method in NoiseEstMethod.keys()): 257 | method = NoiseEstMethod[method] 258 | else: 259 | raise Exception("Invalid noise estimation method.") 260 | 261 | # Check shape 262 | if len(z.shape) == 2: 263 | z = z[..., np.newaxis] 264 | elif len(z.shape) != 3: 265 | raise Exception("Supports only up to 3D images.") 266 | 267 | # Run on multichannel image 268 | channels = z.shape[2] 269 | dev = np.zeros(channels) 270 | 271 | # Iterate over channels 272 | for ch in range(channels): 273 | 274 | # Daubechies denoising method 275 | if method == NoiseEstMethod['daub_reflect'] or method == NoiseEstMethod['daub_replicate']: 276 | daub6kern = np.array([0.03522629188571, 0.08544127388203, -0.13501102001025, 277 | -0.45987750211849, 0.80689150931109, -0.33267055295008], 278 | dtype=np.float32, order='F') 279 | 280 | if method == NoiseEstMethod['daub_reflect']: 281 | wav_det = cv2.sepFilter2D(z, -1, daub6kern, daub6kern, 282 | borderType=cv2.BORDER_REFLECT_101) 283 | else: 284 | wav_det = cv2.sepFilter2D(z, -1, daub6kern, daub6kern, 285 | borderType=cv2.BORDER_REPLICATE) 286 | 287 | dev[ch] = np.median(np.absolute(wav_det)) / 0.6745 288 | 289 | # Return standard deviation 290 | return dev 291 | -------------------------------------------------------------------------------- /preprocessing/writeout_preprocessing.py: -------------------------------------------------------------------------------- 1 | # Copyright 2016 The TensorFlow Authors. All Rights Reserved. 2 | # 3 | # Licensed under the Apache License, Version 2.0 (the "License"); 4 | # you may not use this file except in compliance with the License. 
5 | # You may obtain a copy of the License at 6 | # 7 | # http://www.apache.org/licenses/LICENSE-2.0 8 | # 9 | # Unless required by applicable law or agreed to in writing, software 10 | # distributed under the License is distributed on an "AS IS" BASIS, 11 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 12 | # See the License for the specific language governing permissions and 13 | # limitations under the License. 14 | # ============================================================================== 15 | """Provides utilities to preprocess images for the Inception networks.""" 16 | 17 | from __future__ import absolute_import 18 | from __future__ import division 19 | from __future__ import print_function 20 | 21 | from preprocessing import sensor_model 22 | 23 | import tensorflow as tf 24 | import numpy as np 25 | 26 | from tensorflow.python.ops import control_flow_ops 27 | 28 | 29 | def apply_with_random_selector(x, func, num_cases): 30 | """Computes func(x, sel), with sel sampled from [0...num_cases-1]. 31 | 32 | Args: 33 | x: input Tensor. 34 | func: Python function to apply. 35 | num_cases: Python int32, number of cases to sample sel from. 36 | 37 | Returns: 38 | The result of func(x, sel), where func receives the value of the 39 | selector as a python integer, but sel is sampled dynamically. 40 | """ 41 | sel = tf.random_uniform([], maxval=num_cases, dtype=tf.int32) 42 | # Pass the real x only to one of the func calls. 43 | return control_flow_ops.merge([ 44 | func(control_flow_ops.switch(x, tf.equal(sel, case))[1], case) 45 | for case in range(num_cases)])[0] 46 | 47 | 48 | def distort_color(image, color_ordering=0, fast_mode=True, scope=None): 49 | """Distort the color of a Tensor image. 50 | 51 | Each color distortion is non-commutative and thus ordering of the color ops 52 | matters. Ideally we would randomly permute the ordering of the color ops. 53 | Rather then adding that level of complication, we select a distinct ordering 54 | of color ops for each preprocessing thread. 55 | 56 | Args: 57 | image: 3-D Tensor containing single image in [0, 1]. 58 | color_ordering: Python int, a type of distortion (valid values: 0-3). 59 | fast_mode: Avoids slower ops (random_hue and random_contrast) 60 | scope: Optional scope for name_scope. 61 | Returns: 62 | 3-D Tensor color-distorted image on range [0, 1] 63 | Raises: 64 | ValueError: if color_ordering not in [0, 3] 65 | """ 66 | with tf.name_scope(scope, 'distort_color', [image]): 67 | if fast_mode: 68 | if color_ordering == 0: 69 | image = tf.image.random_brightness(image, max_delta=32. / 255.) 70 | image = tf.image.random_saturation(image, lower=0.5, upper=1.5) 71 | else: 72 | image = tf.image.random_saturation(image, lower=0.5, upper=1.5) 73 | image = tf.image.random_brightness(image, max_delta=32. / 255.) 74 | else: 75 | if color_ordering == 0: 76 | image = tf.image.random_brightness(image, max_delta=32. / 255.) 77 | image = tf.image.random_saturation(image, lower=0.5, upper=1.5) 78 | image = tf.image.random_hue(image, max_delta=0.2) 79 | image = tf.image.random_contrast(image, lower=0.5, upper=1.5) 80 | elif color_ordering == 1: 81 | image = tf.image.random_saturation(image, lower=0.5, upper=1.5) 82 | image = tf.image.random_brightness(image, max_delta=32. / 255.) 
83 | image = tf.image.random_contrast(image, lower=0.5, upper=1.5) 84 | image = tf.image.random_hue(image, max_delta=0.2) 85 | elif color_ordering == 2: 86 | image = tf.image.random_contrast(image, lower=0.5, upper=1.5) 87 | image = tf.image.random_hue(image, max_delta=0.2) 88 | image = tf.image.random_brightness(image, max_delta=32. / 255.) 89 | image = tf.image.random_saturation(image, lower=0.5, upper=1.5) 90 | elif color_ordering == 3: 91 | image = tf.image.random_hue(image, max_delta=0.2) 92 | image = tf.image.random_saturation(image, lower=0.5, upper=1.5) 93 | image = tf.image.random_contrast(image, lower=0.5, upper=1.5) 94 | image = tf.image.random_brightness(image, max_delta=32. / 255.) 95 | else: 96 | raise ValueError('color_ordering must be in [0, 3]') 97 | 98 | # The random_* ops do not necessarily clamp. 99 | return tf.clip_by_value(image, 0.0, 1.0) 100 | 101 | 102 | def distorted_bounding_box_crop(image, 103 | bbox, 104 | min_object_covered=0.1, 105 | aspect_ratio_range=(0.75, 1.33), 106 | area_range=(0.05, 1.0), 107 | max_attempts=100, 108 | scope=None): 109 | """Generates cropped_image using a one of the bboxes randomly distorted. 110 | 111 | See `tf.image.sample_distorted_bounding_box` for more documentation. 112 | 113 | Args: 114 | image: 3-D Tensor of image (it will be converted to floats in [0, 1]). 115 | bbox: 3-D float Tensor of bounding boxes arranged [1, num_boxes, coords] 116 | where each coordinate is [0, 1) and the coordinates are arranged 117 | as [ymin, xmin, ymax, xmax]. If num_boxes is 0 then it would use the whole 118 | image. 119 | min_object_covered: An optional `float`. Defaults to `0.1`. The cropped 120 | area of the image must contain at least this fraction of any bounding box 121 | supplied. 122 | aspect_ratio_range: An optional list of `floats`. The cropped area of the 123 | image must have an aspect ratio = width / height within this range. 124 | area_range: An optional list of `floats`. The cropped area of the image 125 | must contain a fraction of the supplied image within in this range. 126 | max_attempts: An optional `int`. Number of attempts at generating a cropped 127 | region of the image of the specified constraints. After `max_attempts` 128 | failures, return the entire image. 129 | scope: Optional scope for name_scope. 130 | Returns: 131 | A tuple, a 3-D Tensor cropped_image and the distorted bbox 132 | """ 133 | with tf.name_scope(scope, 'distorted_bounding_box_crop', [image, bbox]): 134 | # Each bounding box has shape [1, num_boxes, box coords] and 135 | # the coordinates are ordered [ymin, xmin, ymax, xmax]. 136 | 137 | # A large fraction of image datasets contain a human-annotated bounding 138 | # box delineating the region of the image containing the object of interest. 139 | # We choose to create a new bounding box for the object which is a randomly 140 | # distorted version of the human-annotated bounding box that obeys an 141 | # allowed range of aspect ratios, sizes and overlap with the human-annotated 142 | # bounding box. If no box is supplied, then we assume the bounding box is 143 | # the entire image. 
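# Note on the call below: tf.image.sample_distorted_bounding_box returns a
# (begin, size, bboxes) triple; `begin` and `size` are fed directly to
# tf.slice to extract the crop, while `bboxes` holds the sampled box in
# normalized coordinates so it can be visualized with
# tf.image.draw_bounding_boxes.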
144 | sample_distorted_bounding_box = tf.image.sample_distorted_bounding_box( 145 | tf.shape(image), 146 | bounding_boxes=bbox, 147 | min_object_covered=min_object_covered, 148 | aspect_ratio_range=aspect_ratio_range, 149 | area_range=area_range, 150 | max_attempts=max_attempts, 151 | use_image_if_no_bounding_boxes=True) 152 | bbox_begin, bbox_size, distort_bbox = sample_distorted_bounding_box 153 | 154 | # Crop the image to the specified bounding box. 155 | cropped_image = tf.slice(image, bbox_begin, bbox_size) 156 | return cropped_image, distort_bbox 157 | 158 | 159 | def preprocess_for_train(image, height, width, bbox, 160 | fast_mode=True, 161 | light_level=None, 162 | scope=None): 163 | """Distort one image for training a network. 164 | 165 | Distorting images provides a useful technique for augmenting the data 166 | set during training in order to make the network invariant to aspects 167 | of the image that do not affect the label. 168 | 169 | Additionally, it creates image summaries to display the different 170 | transformations applied to the image. 171 | 172 | Args: 173 | image: 3-D Tensor of image. If dtype is tf.float32 then the range should be 174 | [0, 1], otherwise it would be converted to tf.float32 assuming that the range 175 | is [0, MAX], where MAX is the largest positive representable number for 176 | int(8/16/32) data type (see `tf.image.convert_image_dtype` for details). 177 | height: integer 178 | width: integer 179 | bbox: 3-D float Tensor of bounding boxes arranged [1, num_boxes, coords] 180 | where each coordinate is [0, 1) and the coordinates are arranged 181 | as [ymin, xmin, ymax, xmax]. 182 | fast_mode: Optional boolean, if True avoids slower transformations (i.e. 183 | bi-cubic resizing, random_hue or random_contrast). 184 | scope: Optional scope for name_scope. 185 | Returns: 186 | 3-D float Tensor of distorted image used for training with range [0, 1]. 187 | """ 188 | with tf.name_scope(scope, 'distort_image', [image, height, width, bbox]): 189 | 190 | 191 | if bbox is None: 192 | bbox = tf.constant([0.0, 0.0, 1.0, 1.0], 193 | dtype=tf.float32, 194 | shape=[1, 1, 4]) 195 | if image.dtype != tf.float32: 196 | image = tf.image.convert_image_dtype(image, dtype=tf.float32) 197 | 198 | # Each bounding box has shape [1, num_boxes, box coords] and 199 | # the coordinates are ordered [ymin, xmin, ymax, xmax]. 200 | image_with_box = tf.image.draw_bounding_boxes(tf.expand_dims(image, 0), 201 | bbox) 202 | tf.summary.image('image_with_bounding_boxes', image_with_box) 203 | 204 | distorted_image, distorted_bbox = distorted_bounding_box_crop(image, bbox) 205 | # Restore the shape since the dynamic slice based upon the bbox_size loses 206 | # the third dimension. 207 | distorted_image.set_shape([None, None, 3]) 208 | image_with_distorted_box = tf.image.draw_bounding_boxes( 209 | tf.expand_dims(image, 0), distorted_bbox) 210 | tf.summary.image('images_with_distorted_bounding_box', 211 | image_with_distorted_box) 212 | 213 | # This resizing operation may distort the images because the aspect 214 | # ratio is not respected. We select a resize method in a round robin 215 | # fashion based on the thread number. 216 | # Note that ResizeMethod contains 4 enumerated resizing methods. 217 | 218 | 219 | # We select only 1 case for fast_mode bilinear.
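# The round-robin resize method selection below is kept for reference but
# disabled; nearest-neighbor resizing is used instead, presumably so that
# pixel values are subsampled rather than interpolated before the raw/Bayer
# sensor simulation is applied downstream.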
220 | #num_resize_cases = 1 221 | #distorted_image = apply_with_random_selector( 222 | # distorted_image, 223 | # lambda x, method: tf.image.resize_images(x, [height, width], method=method), 224 | # num_cases=num_resize_cases) 225 | 226 | # Use nearest neighbor subsampling. 227 | print("USING NEAREST NEIGHBOR SUBSAMPLING") 228 | distorted_image = tf.image.resize_images(distorted_image, [height, width], 229 | method=tf.image.ResizeMethod.NEAREST_NEIGHBOR) 230 | 231 | tf.summary.image('cropped_resized_image', 232 | tf.expand_dims(distorted_image, 0)) 233 | 234 | # Randomly flip the image horizontally. 235 | distorted_image = tf.image.random_flip_left_right(distorted_image) 236 | 237 | tf.summary.image('final_distorted_image', 238 | tf.expand_dims(distorted_image, 0)) 239 | return distorted_image 240 | 241 | 242 | def preprocess_for_eval(image, height, width, light_level=None, 243 | central_fraction=0.875, scope=None): 244 | """Prepare one image for evaluation. 245 | 246 | If height and width are specified it would output an image with that size by 247 | resizing (nearest-neighbor subsampling in this implementation). 248 | 249 | If central_fraction is specified it would crop the central fraction of the 250 | input image. 251 | 252 | Args: 253 | image: 3-D Tensor of image. If dtype is tf.float32 then the range should be 254 | [0, 1], otherwise it would be converted to tf.float32 assuming that the range 255 | is [0, MAX], where MAX is the largest positive representable number for 256 | int(8/16/32) data type (see `tf.image.convert_image_dtype` for details) 257 | height: integer 258 | width: integer 259 | central_fraction: Optional Float, fraction of the image to crop. 260 | scope: Optional scope for name_scope. 261 | Returns: 262 | 3-D float Tensor of prepared image. 263 | """ 264 | with tf.name_scope(scope, 'eval_image', [image, height, width]): 265 | if image.dtype != tf.float32: 266 | image = tf.image.convert_image_dtype(image, dtype=tf.float32) 267 | 268 | # Crop the central region of the image with an area containing 87.5% of 269 | # the original image. 270 | if central_fraction: 271 | image = tf.image.central_crop(image, central_fraction=central_fraction) 272 | 273 | #image = tf.py_func(sensor_model.sensor_model, [image], tf.float32, stateful=True) 274 | if height and width: 275 | # Resize the image to the specified height and width. 276 | image = tf.expand_dims(image, 0) 277 | image = tf.image.resize_images(image, [height, width], method=tf.image.ResizeMethod.NEAREST_NEIGHBOR) 278 | 279 | image = tf.squeeze(image, [0]) 280 | 281 | image.set_shape([height, width, 3]) 282 | return image 283 | 284 | def preprocess_image(image, ground_truth, height, width, 285 | is_training=False, 286 | bbox=None, 287 | fast_mode=True, 288 | light_level=None): 289 | """Pre-process one image for training or evaluation. 290 | 291 | Args: 292 | image: 3-D Tensor [height, width, channels] with the image. 293 | height: integer, image expected height. 294 | width: integer, image expected width. 295 | is_training: Boolean. If true it would transform an image for training, 296 | otherwise it would transform it for evaluation. 297 | bbox: 3-D float Tensor of bounding boxes arranged [1, num_boxes, coords] 298 | where each coordinate is [0, 1) and the coordinates are arranged as 299 | [ymin, xmin, ymax, xmax]. 300 | fast_mode: Optional boolean, if True avoids slower transformations.
301 | 302 | Returns: 303 | 3-D float Tensor containing an appropriately scaled image 304 | 305 | Raises: 306 | ValueError: if user does not provide bounding box 307 | """ 308 | if is_training: 309 | return preprocess_for_train(image, height, width, bbox, fast_mode, light_level) 310 | else: 311 | return preprocess_for_eval(image, height, width, light_level) 312 | -------------------------------------------------------------------------------- /run_test_captured_images.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | # Set the checkpoint and dataset paths 4 | checkpoints=/path/to/checkpoints 5 | dataset_dir=/path/to/dataset/RAW_synset_ISO8000_EXP10000/ 6 | 7 | # Change --eval_dir paramater if needed 8 | # Proposed Joint Architecture 9 | python test_captured_images.py --device=1 --dataset_dir=$dataset_dir --dataset_name=imagenet \ 10 | --checkpoint_path=$checkpoints/joint128/2to200lux/model.ckpt-232721 \ 11 | --model_name=mobilenet_isp --noise_channel=True --use_anscombe=True \ 12 | --isp_model_name=isp --eval_image_size=224 --sensor=Pixel --eval_dir joint_real_2to200lux 13 | 14 | # Proposed Joint Architecture (no Anscombe layers) 15 | python test_captured_images.py --device=1 --dataset_dir=$dataset_dir --dataset_name=imagenet \ 16 | --checkpoint_path=$checkpoints/joint128/2to200lux_no_ansc/model.ckpt-215307 \ 17 | --model_name=mobilenet_isp --noise_channel=False --use_anscombe=False \ 18 | --isp_model_name=isp --eval_image_size=224 --sensor=Pixel --eval_dir joint_no_anscombe_real_2to200lux 19 | 20 | # # From Scratch MobileNet-v1 21 | python test_captured_images.py --device=1 --dataset_dir=$dataset_dir --dataset_name=imagenet \ 22 | --checkpoint_path=$checkpoints/mobilenet_v1_128/2to200lux/model.ckpt-325357 \ 23 | --model_name=mobilenet_v1 --eval_image_size=224 --preprocessing_name=mobilenet_isp \ 24 | --eval_dir mobilenet_v1_real_2to200lux -------------------------------------------------------------------------------- /run_test_synthetic_images.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | checkpoints_dir=/path/to/checkpoints 4 | dataset_dir=/path/to/imagenet_validation 5 | eval_dir=/path/to/output_dir 6 | 7 | noise=3lux 8 | python test_synthetic_images.py --device=1 \ 9 | --checkpoint_path=$checkpoints_dir'/joint128/'$noise'/model.ckpt-216759' \ 10 | --dataset_dir=$dataset_dir --dataset_name=imagenet --mode=$noise \ 11 | --model_name=mobilenet_isp --eval_dir=$eval_dir/$noise 12 | 13 | noise=6lux 14 | python test_synthetic_images.py --device=1 \ 15 | --checkpoint_path=$checkpoints_dir'/joint128/'$noise'/model.ckpt-222267' \ 16 | --dataset_dir=$dataset_dir --dataset_name=imagenet --mode=$noise \ 17 | --model_name=mobilenet_isp --eval_dir=$eval_dir/$noise 18 | 19 | noise=2to20lux 20 | python test_synthetic_images.py --device=1 \ 21 | --checkpoint_path=$checkpoints_dir'/joint128/'$noise'/model.ckpt-232718' \ 22 | --dataset_dir=$dataset_dir --dataset_name=imagenet --mode=$noise \ 23 | --model_name=mobilenet_isp --eval_dir=$eval_dir/$noise 24 | 25 | noise=2to200lux 26 | python test_synthetic_images.py --device=1 \ 27 | --checkpoint_path=$checkpoints_dir'/joint128/'$noise'/model.ckpt-232721' \ 28 | --dataset_dir=$dataset_dir --dataset_name=imagenet --mode=$noise \ 29 | --model_name=mobilenet_isp --eval_dir=$eval_dir/$noise 30 | 31 | -------------------------------------------------------------------------------- /run_train_joint_models.sh: 
-------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | TRAIN_DIR=/path/to/train_dir 4 | IMAGENET_TFRECORDS=/path/to/imagenetTFRecords 5 | CHECKPOINTS=/path/to/checkpoints 6 | 7 | # Train with 3lux noisy images 8 | # Set number of clones and device according to machine resources 9 | python train_image_classifier.py --train_dir=$TRAIN_DIR/3lux \ 10 | --dataset_dir=$IMAGENET_TFRECORDS --ll_low=0.0015 \ 11 | --ll_high=0.0015 --batch_size=256 --model_name=mobilenet_isp --num_readers=8 \ 12 | --num_preprocessing_threads=8 --isp_checkpoint_path=$CHECKPOINTS/multires128/6lux/model.ckpt-27000 \ 13 | --checkpoint_path=$CHECKPOINTS/mobilenet_v1_128/mobilenet_v1_1.0_128.ckpt --noise_channel=True \ 14 | --use_anscombe=True --num_clones=4 --isp_model_name=isp --num_iters=1 --device=0,1,2,3 \ 15 | --learning_rate=0.00045 --num_epochs_per_decay=2 --train_image_size=128 16 | 17 | 18 | # Train with 6lux noisy images 19 | python train_image_classifier.py --train_dir=$TRAIN_DIR/6lux \ 20 | --dataset_dir=$IMAGENET_TFRECORDS --ll_low=0.003 \ 21 | --ll_high=0.003 --batch_size=256 --model_name=mobilenet_isp --num_readers=8 \ 22 | --num_preprocessing_threads=8 --isp_checkpoint_path=$CHECKPOINTS/multires128/6lux/model.ckpt-27000 \ 23 | --checkpoint_path=$CHECKPOINTS/mobilenet_v1_128/mobilenet_v1_1.0_128.ckpt --noise_channel=True \ 24 | --use_anscombe=True --num_clones=4 --isp_model_name=isp --num_iters=1 --device=0,1,3,4 \ 25 | --learning_rate=0.00045 --num_epochs_per_decay=2 --train_image_size=128 26 | 27 | 28 | # Train with 2to20lux noisy images 29 | python train_image_classifier.py --train_dir=$TRAIN_DIR/2to20lux \ 30 | --dataset_dir=$IMAGENET_TFRECORDS --ll_low=0.001 \ 31 | --ll_high=0.010 --batch_size=256 --model_name=mobilenet_isp --num_readers=8 \ 32 | --num_preprocessing_threads=8 --isp_checkpoint_path=$CHECKPOINTS/multires128/6lux/model.ckpt-27000 \ 33 | --checkpoint_path=$CHECKPOINTS/mobilenet_v1_128/mobilenet_v1_1.0_128.ckpt --noise_channel=True \ 34 | --use_anscombe=True --num_clones=4 --isp_model_name=isp --num_iters=1 --device=0,1,2,3 \ 35 | --learning_rate=0.00045 --num_epochs_per_decay=2 --train_image_size=128 36 | 37 | # Train with 2to200lux noisy images 38 | python train_image_classifier.py --train_dir=$TRAIN_DIR/2to200lux \ 39 | --dataset_dir=$IMAGENET_TFRECORDS --ll_low=0.001 \ 40 | --ll_high=0.100 --batch_size=256 --model_name=mobilenet_isp --num_readers=8 \ 41 | --num_preprocessing_threads=8 --isp_checkpoint_path=$CHECKPOINTS/multires128/6lux/model.ckpt-27000 \ 42 | --checkpoint_path=$CHECKPOINTS/mobilenet_v1_128/mobilenet_v1_1.0_128.ckpt --noise_channel=True \ 43 | --use_anscombe=True --num_clones=4 --isp_model_name=isp --num_iters=1 --device=0,1,3,4 \ 44 | --learning_rate=0.00045 --num_epochs_per_decay=2 --train_image_size=128 -------------------------------------------------------------------------------- /simulate_raw_images.py: -------------------------------------------------------------------------------- 1 | """Script for adding noise to ImageNet-like dataset.""" 2 | 3 | from __future__ import absolute_import 4 | from __future__ import division 5 | from __future__ import print_function 6 | 7 | import math 8 | import tensorflow as tf 9 | import os 10 | import cv2 11 | from datasets import dataset_factory, build_imagenet_data 12 | import numpy as np 13 | from preprocessing import preprocessing_factory, sensor_model 14 | from pprint import pprint 15 | from glob import glob 16 | 17 | 18 | tf.app.flags.DEFINE_float( 19 | 'll_low', None, 20 | 
'Lowest light level.') 21 | 22 | tf.app.flags.DEFINE_float( 23 | 'll_high', None, 24 | 'Highest light level.') 25 | 26 | tf.app.flags.DEFINE_string( 27 | 'sensor', 'Nexus_6P_rear', 'The sensor.') 28 | 29 | tf.app.flags.DEFINE_string( 30 | 'output_dir', None, 'Directory where the results are saved to.') 31 | 32 | tf.app.flags.DEFINE_string( 33 | 'input_dir', None, 'The directory where the dataset files are stored.') 34 | 35 | tf.app.flags.DEFINE_string( 36 | 'preprocessing_name', 'mobilenet_v1', 'The name of the preprocessing to use. If left as `None`, then the model_name flag is used.') 37 | 38 | tf.app.flags.DEFINE_integer( 39 | 'eval_image_size', 128, 'Eval image size') 40 | 41 | FLAGS = tf.app.flags.FLAGS 42 | 43 | def main(_): 44 | if not FLAGS.input_dir: 45 | raise ValueError('You must supply the input directory with --input_dir') 46 | if not FLAGS.output_dir: 47 | raise ValueError('You must supply the dataset directory with --output_dir') 48 | 49 | tf.logging.set_verbosity(tf.logging.INFO) 50 | with tf.Graph().as_default(): 51 | 52 | # Preprocess the images so that they all have the same size 53 | preprocessing_name = FLAGS.preprocessing_name or FLAGS.model_name 54 | image_preprocessing_fn = preprocessing_factory.get_preprocessing( 55 | preprocessing_name, 56 | is_training=False) 57 | 58 | eval_image_size = FLAGS.eval_image_size 59 | orig_image = tf.placeholder(tf.uint8, shape=(None, None, 3)) 60 | image = image_preprocessing_fn(orig_image, orig_image, eval_image_size, eval_image_size) 61 | images = tf.expand_dims(image, 0) 62 | 63 | # Add noise. 64 | noisy_batch, alpha, sigma = sensor_model.sensor_noise_rand_light_level(images, 65 | [FLAGS.ll_low, FLAGS.ll_high], 66 | scale=1.0, sensor=FLAGS.sensor) 67 | 68 | bayer_mask = sensor_model.get_bayer_mask(eval_image_size, eval_image_size) 69 | inputs = noisy_batch*bayer_mask 70 | 71 | if not os.path.isdir(FLAGS.output_dir): 72 | os.mkdir(FLAGS.output_dir) 73 | 74 | with tf.Session() as sess: 75 | count = 0 76 | synsets = [path for path in os.listdir(FLAGS.input_dir) if not '.' in path] 77 | 78 | for synset in synsets: 79 | path = os.path.join(FLAGS.input_dir, synset) 80 | image_names = os.listdir(path) 81 | print("Found %d images in %s"%(len(image_names), synset)) 82 | 83 | synset_path = os.path.join(FLAGS.output_dir, synset) 84 | if not os.path.isdir(synset_path): 85 | os.mkdir(synset_path) 86 | 87 | for imagename in image_names: 88 | output_imgfn = os.path.join(FLAGS.output_dir, synset, imagename.split('.')[0]+'.png') 89 | if os.path.isfile(output_imgfn): 90 | continue 91 | loaded_image = cv2.imread(os.path.join(path, imagename)) 92 | 93 | # BGR to RGB 94 | loaded_image = loaded_image[..., ::-1] 95 | images, alpha_val, sigma_val = sess.run( 96 | [inputs, alpha, sigma], 97 | feed_dict={orig_image:loaded_image}) 98 | img = (255.0*images[0,:,:,:]).astype(np.uint8) 99 | 100 | # RGB to BGR 101 | img = img[..., ::-1] 102 | 103 | if count % 1000 == 0: 104 | print("%d processed images." 
% (count)) 105 | cv2.imwrite(output_imgfn, img) 106 | count += 1 107 | 108 | print('Total images processed:', count) 109 | 110 | 111 | if __name__ == '__main__': 112 | tf.app.run() 113 | -------------------------------------------------------------------------------- /teaser/architecture_2.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/princeton-computational-imaging/DirtyPixels/6c82b124c9e32bbf5fa7d6adf8db8103132e4e5e/teaser/architecture_2.jpg -------------------------------------------------------------------------------- /teaser/teaser_v4.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/princeton-computational-imaging/DirtyPixels/6c82b124c9e32bbf5fa7d6adf8db8103132e4e5e/teaser/teaser_v4.png -------------------------------------------------------------------------------- /test_captured_images.py: -------------------------------------------------------------------------------- 1 | """Generic evaluation script that evaluates a model using the Dirty-Pixels captured dataset.""" 2 | 3 | from __future__ import absolute_import 4 | from __future__ import division 5 | from __future__ import print_function 6 | 7 | import math 8 | import skimage.measure 9 | import scipy.ndimage.filters 10 | import tensorflow as tf 11 | import scipy.io 12 | import os 13 | import cv2 14 | from datasets import dataset_factory, build_imagenet_data 15 | import numpy as np 16 | from nets import nets_factory 17 | from preprocessing import preprocessing_factory, sensor_model 18 | import matplotlib.pyplot as plt 19 | from pprint import pprint 20 | from nets.isp import anscombe 21 | import rawpy 22 | import pyexifinfo 23 | 24 | slim = tf.contrib.slim 25 | 26 | tf.app.flags.DEFINE_string( 27 | 'device', '0', 'GPU device to use.') 28 | 29 | tf.app.flags.DEFINE_string( 30 | 'sensor', 'Nexus_6P_rear', 'The sensor.') 31 | 32 | tf.app.flags.DEFINE_string( 33 | 'isp_model_name', None, 'The name of the ISP architecture to train.') 34 | 35 | tf.app.flags.DEFINE_boolean('use_anscombe', True, 36 | 'Use Anscombe transform.') 37 | 38 | tf.app.flags.DEFINE_boolean('noise_channel', True, 39 | 'Use noise channel.') 40 | 41 | tf.app.flags.DEFINE_integer( 42 | 'num_iters', 1, 43 | 'Number of iterations for the unrolled Proximal Gradient Network.') 44 | 45 | tf.app.flags.DEFINE_integer( 46 | 'num_layers', 17, 'Number of layers to be used in the HQS ISP prior -- DEPRECATED') 47 | 48 | tf.app.flags.DEFINE_string( 49 | 'checkpoint_path', '/tmp/tfmodel/', 50 | 'The directory where the model was written to or an absolute path to a ' 51 | 'checkpoint file.') 52 | 53 | tf.app.flags.DEFINE_string( 54 | 'eval_dir', '/tmp/tfmodel/', 'Directory where the results are saved to.') 55 | 56 | tf.app.flags.DEFINE_string( 57 | 'dataset_dir', None, 'The directory where the dataset files are stored.') 58 | 59 | tf.app.flags.DEFINE_string( 60 | 'model_name', 'inception_v3', 'The name of the architecture to evaluate.') 61 | 62 | tf.app.flags.DEFINE_string( 63 | 'preprocessing_name', None, 'The name of the preprocessing to use. 
If left ' 64 | 'as `None`, then the model_name flag is used.') 65 | 66 | tf.app.flags.DEFINE_integer( 67 | 'eval_image_size', None, 'Eval image size') 68 | 69 | 70 | FLAGS = tf.app.flags.FLAGS 71 | 72 | 73 | def crop_and_subsample(img, target_size, average=None): 74 | factor = int(np.floor(min(img.shape) / target_size)) 75 | ch = (img.shape[0] - factor * target_size) / 2 76 | cw = (img.shape[1] - factor * target_size) / 2 77 | cropped = img[int(np.floor(ch)):-int(np.ceil(ch)), 78 | int(np.floor(cw)):-int(np.ceil(cw))] 79 | if average is not None: 80 | cropped = scipy.ndimage.filters.convolve(cropped, np.ones((average, average))) 81 | return cropped[::factor, ::factor] 82 | 83 | 84 | def main(_): 85 | if not FLAGS.dataset_dir: 86 | raise ValueError('You must supply the dataset directory with --dataset_dir') 87 | 88 | os.environ['CUDA_VISIBLE_DEVICES'] = FLAGS.device 89 | 90 | tf.logging.set_verbosity(tf.logging.INFO) 91 | with tf.Graph().as_default(): 92 | 93 | #################### 94 | # Select the model # 95 | #################### 96 | num_classes = 1001 97 | network_fn = nets_factory.get_network_fn( 98 | FLAGS.model_name, 99 | num_classes, 100 | weight_decay=0.0, 101 | batch_norm_decay=0.95, 102 | is_training=False) 103 | 104 | ##################################### 105 | # Select the preprocessing function # 106 | ##################################### 107 | preprocessing_name = FLAGS.preprocessing_name or FLAGS.model_name 108 | image_preprocessing_fn = preprocessing_factory.get_preprocessing( 109 | preprocessing_name, 110 | is_training=False) 111 | 112 | eval_image_size = FLAGS.eval_image_size or network_fn.default_image_size 113 | 114 | orig_image = tf.placeholder(tf.float32, shape=(eval_image_size, eval_image_size, 3)) 115 | alpha = tf.placeholder(tf.float32, shape=[1, 3]) 116 | sigma = tf.placeholder(tf.float32, shape=[1, 3]) 117 | bayer_mask = sensor_model.get_bayer_mask(eval_image_size, eval_image_size) 118 | # image = image_preprocessing_fn(orig_image, orig_image, eval_image_size, eval_image_size, sensor=FLAGS.sensor) 119 | image = orig_image * bayer_mask 120 | # alpha, sigma = sensor_model.get_coeffs(light_level[None], sensor=FLAGS.sensor) 121 | # Scale to [-1, 1] 122 | if FLAGS.isp_model_name is None: 123 | image = 2 * (image - 0.5) 124 | 125 | images = tf.expand_dims(image, 0) 126 | 127 | #################### 128 | # Define the model # 129 | #################### 130 | inputs = images 131 | 132 | network_ops = network_fn(images=inputs, alpha=alpha, sigma=sigma, 133 | bayer_mask=bayer_mask, use_anscombe=FLAGS.use_anscombe, 134 | noise_channel=FLAGS.noise_channel, 135 | num_classes=num_classes, 136 | num_iters=FLAGS.num_iters, num_layers=FLAGS.num_layers, 137 | isp_model_name=FLAGS.isp_model_name, is_real_data=True) 138 | logits, end_points = network_ops[:2] 139 | 140 | variables_to_restore = slim.get_variables_to_restore() 141 | saver = tf.train.Saver() 142 | 143 | if tf.gfile.IsDirectory(FLAGS.checkpoint_path): 144 | checkpoint_path = tf.train.latest_checkpoint(FLAGS.checkpoint_path) 145 | else: 146 | checkpoint_path = FLAGS.checkpoint_path 147 | 148 | synset2label = {} 149 | with open("datasets/synset_labels.txt", "r") as f: 150 | for line in f: 151 | synset, label = line.split(':') 152 | synset2label[synset] = int(label) 153 | 154 | if not os.path.isdir(FLAGS.eval_dir): 155 | os.mkdir(FLAGS.eval_dir) 156 | 157 | with tf.Session() as sess: 158 | saver.restore(sess, FLAGS.checkpoint_path) 159 | synsets = os.listdir(FLAGS.dataset_dir) 160 | number_to_human = {int(i[0]):i[1] for i 
in np.genfromtxt('datasets/imagenet_labels.txt', delimiter=':', dtype=np.string_)} 161 | 162 | # estimated alpha and gama 163 | alpha_val = 0.0153 164 | sigma_val = 0.0328 165 | count = 0 166 | top1 = 0 167 | top5 = 0 168 | correct_paths = [] 169 | wrong_paths = [] 170 | for synset in synsets: 171 | if synset == 'labels.txt': 172 | continue 173 | synset_top5 = 0 174 | path = os.path.join(FLAGS.dataset_dir, synset) 175 | image_names = [name for name in sorted(os.listdir(path)) if '.dng' in name] 176 | for imagename in image_names: 177 | try: 178 | loaded_image = rawpy.imread(os.path.join(path, imagename)) 179 | info = pyexifinfo.get_json(os.path.join(path, imagename))[0] 180 | black_level = float(info['EXIF:BlackLevel'].split(' ')[0]) 181 | awb = [float(x) for x in info['EXIF:AsShotNeutral'].split(' ')] 182 | raw_img = (loaded_image.raw_image_visible - black_level) / 1023. 183 | except Exception as e: 184 | print(synset, imagename, e) 185 | continue 186 | 187 | B = raw_img[::2, ::2] / awb[2] 188 | R = raw_img[1::2, 1::2] / awb[0] 189 | G1 = raw_img[1::2, ::2] / awb[1] 190 | G2 = raw_img[::2, 1::2] / awb[1] 191 | B, R, G1, G2 = (crop_and_subsample(img, eval_image_size // 2) 192 | for img in [B, R, G1, G2]) 193 | scale_factor = 1.0 / np.percentile(np.stack([B, R, G1, G2], axis=2), 98) 194 | 195 | mosaiced = np.zeros((224, 224, 3)) 196 | mosaiced[::2, ::2, 2] = B 197 | mosaiced[1::2, 1::2, 0] = R 198 | mosaiced[1::2, ::2, 1] = G1 199 | mosaiced[::2, 1::2, 1] = G2 200 | 201 | img_scaled = mosaiced * scale_factor 202 | input_img = np.clip(img_scaled, 0, 1) 203 | scaling = (scale_factor / np.array(awb))[None, :] 204 | logits_vals, clean_image = sess.run( 205 | [logits[0, :], end_points.get('mobilenet_input', alpha)], 206 | feed_dict={orig_image: input_img, 207 | alpha: alpha_val * scaling, 208 | sigma: sigma_val * scaling}) 209 | correct = synset2label[synset] 210 | predictions = np.argsort(-logits_vals) 211 | rank = np.nonzero(predictions == correct)[0] 212 | clean_image = clean_image.squeeze() 213 | 214 | if count % 100 == 0: 215 | print("%d images out of 1000" % (count)) 216 | 217 | trgt_path = os.path.join(FLAGS.eval_dir, 'clean', synset) 218 | raw_path = os.path.join(FLAGS.eval_dir, 'raw', synset) 219 | 220 | if not os.path.exists(raw_path): 221 | os.makedirs(raw_path) 222 | 223 | if not os.path.exists(trgt_path): 224 | os.makedirs(trgt_path) 225 | cv2.imwrite(os.path.join(raw_path, imagename[:-4]+'.png'), (input_img*255).astype(np.uint8)) 226 | if FLAGS.isp_model_name == 'isp': 227 | trgt_path = os.path.join(trgt_path, imagename[:-4]+'.png') 228 | plt.imsave(trgt_path, clean_image) 229 | 230 | if rank == 0: 231 | correct_paths.append("%s \"%s\" \"%s\""%(os.path.join(trgt_path, imagename[:-4]+'.png'), number_to_human[correct], number_to_human[predictions[0]])) 232 | top1 += 1.0 233 | else: 234 | wrong_paths.append("%s \"%s\" \"%s\""%(os.path.join(trgt_path, imagename[:-4]+'.png'), number_to_human[correct], number_to_human[predictions[0]])) 235 | 236 | if rank <= 5: 237 | top5 += 1.0 238 | synset_top5 += 1.0 239 | count += 1 240 | 241 | print("Synset %s, Top 5 %f" % (synset, synset_top5 / len(image_names))) 242 | 243 | print("Top-1 %f, Top-5 %f" % (top1 / count, top5 / count)) 244 | 245 | with open(os.path.join(FLAGS.eval_dir, 'correct.txt'), 'w') as f: 246 | for item in correct_paths: 247 | f.write("%s\n" % item) 248 | with open(os.path.join(FLAGS.eval_dir, 'wrong.txt'), 'w') as f: 249 | for item in wrong_paths: 250 | f.write("%s\n" % item) 251 | 252 | 253 | if __name__ == '__main__': 254 | 
tf.app.run() 255 | 256 | 257 | 258 | -------------------------------------------------------------------------------- /test_synthetic_images.py: -------------------------------------------------------------------------------- 1 | # Copyright 2016 The TensorFlow Authors. All Rights Reserved. 2 | # 3 | # Licensed under the Apache License, Version 2.0 (the "License"); 4 | # you may not use this file except in compliance with the License. 5 | # You may obtain a copy of the License at 6 | # 7 | # http://www.apache.org/licenses/LICENSE-2.0 8 | # 9 | # Unless required by applicable law or agreed to in writing, software 10 | # distributed under the License is distributed on an "AS IS" BASIS, 11 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 12 | # See the License for the specific language governing permissions and 13 | # limitations under the License. 14 | # ============================================================================== 15 | 16 | 17 | # This file evaluates a trained network on a test dataset and saves the filenames of images 18 | # that were correctly / falsely classified into a text file, so that the images that different 19 | # classifiers got right / wrong can be compared. 20 | 21 | """Generic evaluation script that evaluates a model using a given dataset. 22 | Noise is introduced before images are input to the classifier, and it is defined by the 23 | mode parameter, and the Camera Image Formation 24 | model defined in the Dirty-Pixels manuscript. 25 | """ 26 | 27 | from __future__ import absolute_import 28 | from __future__ import division 29 | from __future__ import print_function 30 | 31 | import math 32 | import tensorflow as tf 33 | import os 34 | from glob import glob 35 | 36 | import cv2 37 | 38 | from preprocessing import preprocessing_factory, sensor_model 39 | from datasets import dataset_factory 40 | from nets import nets_factory 41 | import numpy as np 42 | 43 | slim = tf.contrib.slim 44 | 45 | tf.app.flags.DEFINE_string( 46 | 'device', '', 'The address of the TensorFlow master to use.') 47 | 48 | tf.app.flags.DEFINE_string( 49 | 'mode', '3lux', 'Noise profile: 3lux, 6lux, 2to20lux, or 2to200lux.') 50 | 51 | 52 | tf.app.flags.DEFINE_string( 53 | 'checkpoint_path', '/tmp/tfmodel/', 54 | 'The directory where the model was written to or an absolute path to a ' 55 | 'checkpoint file.') 56 | 57 | tf.app.flags.DEFINE_string( 58 | 'dataset_name', 'imagenet', 'The name of the dataset to load.') 59 | 60 | tf.app.flags.DEFINE_string( 61 | 'dataset_dir', None, 'The directory where the dataset files are stored.') 62 | 63 | tf.app.flags.DEFINE_string( 64 | 'model_name', None, 'The name of the architecture to evaluate.') 65 | 66 | tf.app.flags.DEFINE_string('eval_dir', 'output_synthetic_images', 'Output directory') 67 | 68 | FLAGS = tf.app.flags.FLAGS 69 | 70 | 71 | def imnet_generator(root_directory): 72 | # list all directories 73 | dirs = sorted(glob(os.path.join(root_directory, "*/"))) 74 | print("#### num dirs", len(dirs)) 75 | 76 | # Build the label lookup table 77 | synset_to_label = {synset.decode('utf-8'):i+1 for i, synset in enumerate(np.genfromtxt('datasets/imagenet_lsvrc_2015_synsets.txt', dtype=np.string_))} 78 | # print(synset_to_label.items()) 79 | 80 | # loop through directories and glob all images 81 | for idx, dir in enumerate(dirs): 82 | # Glob all image files in this directory 83 | img_files = glob(os.path.join(dir, '*.png')) 84 | img_files += glob(os.path.join(dir, '*.jpg')) 85 | img_files += glob(os.path.join(dir, '*.jpeg')) 86 
| img_files += glob(os.path.join(dir, '*.JPEG')) 87 | 88 | for img_file in img_files: 89 | yield img_file, synset_to_label[os.path.basename(os.path.normpath(dir))], os.path.basename(os.path.normpath(dir)) 90 | 91 | def parse_img(img_path): 92 | rgb_string = tf.read_file(img_path) 93 | rgb_decoded = tf.image.decode_jpeg(rgb_string) # uint8 94 | rgb_decoded = tf.cast(rgb_decoded, tf.float32) 95 | rgb_decoded /= 255. 96 | return rgb_decoded 97 | 98 | def main(_): 99 | if not FLAGS.dataset_dir: 100 | raise ValueError('You must supply the dataset directory with --dataset_dir') 101 | 102 | os.environ['CUDA_VISIBLE_DEVICES'] = FLAGS.device 103 | eval_dir = FLAGS.eval_dir 104 | 105 | tf.logging.set_verbosity(tf.logging.INFO) 106 | with tf.Graph().as_default(): 107 | tf_global_step = slim.get_or_create_global_step() 108 | 109 | num_classes = 1001 110 | eval_image_size = 128 111 | 112 | image_path_graph = tf.placeholder(tf.string) 113 | label_graph = tf.placeholder(tf.int32) 114 | 115 | image = parse_img(image_path_graph) 116 | 117 | image = tf.image.central_crop(image, central_fraction=0.875) 118 | 119 | image = tf.expand_dims(image, 0) 120 | image = tf.image.resize_images(image, [eval_image_size, eval_image_size], method=tf.image.ResizeMethod.NEAREST_NEIGHBOR) 121 | image = tf.squeeze(image, [0]) 122 | 123 | #################### 124 | # Select the model # 125 | #################### 126 | network_fn = nets_factory.get_network_fn( 127 | FLAGS.model_name, 128 | num_classes=num_classes, 129 | batch_norm_decay=0.9, 130 | weight_decay=0.0, 131 | is_training=False) 132 | 133 | image.set_shape([128,128,3]) 134 | image = tf.expand_dims(image, 0) 135 | 136 | if FLAGS.mode == '2to20lux': 137 | ll_low = 0.001 138 | ll_high = 0.01 139 | elif FLAGS.mode == '2to200lux': 140 | ll_low = 0.001 141 | ll_high = 0.1 142 | elif FLAGS.mode == '3lux': 143 | ll_low = 0.0015 144 | ll_high = 0.0015 145 | elif FLAGS.mode == '6lux': 146 | ll_low = 0.003 147 | ll_high = 0.003 148 | 149 | noisy_batch, alpha, sigma = \ 150 | sensor_model.sensor_noise_rand_light_level(image, [ll_low, ll_high], scale=1.0, sensor='Nexus_6P_rear') 151 | bayer_mask = sensor_model.get_bayer_mask(128, 128) 152 | 153 | raw_image_graph = noisy_batch * bayer_mask 154 | 155 | #################### 156 | # Define the model # 157 | #################### 158 | logits, end_points, cleaned_image_graph = network_fn(images=raw_image_graph, alpha=alpha, sigma=sigma, 159 | bayer_mask=bayer_mask, use_anscombe=True, 160 | noise_channel=True, 161 | num_classes=num_classes, 162 | num_iters=1, num_layers=17, 163 | isp_model_name='isp') 164 | 165 | predictions = tf.argmax(logits, 1) 166 | 167 | if tf.gfile.IsDirectory(FLAGS.checkpoint_path): 168 | print('###### Loading last checkpoint of directory', FLAGS.checkpoint_path) 169 | checkpoint_path = tf.train.latest_checkpoint(FLAGS.checkpoint_path) 170 | else: 171 | print('###### Loading checkpoint', FLAGS.checkpoint_path) 172 | checkpoint_path = FLAGS.checkpoint_path 173 | 174 | 175 | tf.logging.info('Evaluating %s' % FLAGS.checkpoint_path) 176 | 177 | correct_paths = [] 178 | wrong_paths = [] 179 | 180 | # Restore variables from checkpoint 181 | variables_to_restore = slim.get_variables_to_restore() # slim.get_model_variables() 182 | saver = tf.train.Saver(variables_to_restore) 183 | 184 | number_to_human = {int(i[0]):i[1] for i in np.genfromtxt('datasets/imagenet_labels.txt', delimiter=':', dtype=np.string_)} 185 | 186 | eval_dir= FLAGS.eval_dir 187 | os.makedirs(eval_dir, exist_ok=True) 188 | 189 | with tf.Session() as 
sess: 190 | sess.run(tf.global_variables_initializer()) 191 | saver.restore(sess, checkpoint_path) 192 | 193 | count = 0 194 | for img_file, label, synset in imnet_generator(FLAGS.dataset_dir): 195 | preds_value, cleaned_image, raw_image = sess.run([predictions, cleaned_image_graph, raw_image_graph], 196 | feed_dict={image_path_graph:img_file, label_graph:label}) 197 | 198 | cleaned_image = np.clip(cleaned_image, 0.0, 1.0).squeeze()[:,:,::-1] 199 | raw_image = raw_image.squeeze()[:,:,::-1] 200 | img_filename = os.path.basename(os.path.normpath(img_file)) 201 | 202 | our_path = os.path.join(eval_dir, 'anscombe_output', FLAGS.mode, synset) 203 | raw_path = os.path.join(eval_dir, 'raw', FLAGS.mode, synset) 204 | 205 | if not os.path.exists(our_path): 206 | os.makedirs(our_path) 207 | if not os.path.exists(raw_path): 208 | os.makedirs(raw_path) 209 | 210 | if count % 10000 == 0: 211 | print('num. processed ', count) 212 | print('num. correct paths', len(correct_paths)) 213 | count += 1 214 | img_filename = os.path.splitext(img_filename)[0] + '.png' 215 | 216 | cv2.imwrite(os.path.join(our_path, img_filename), (cleaned_image*255).astype(np.uint8)) 217 | cv2.imwrite(os.path.join(raw_path, img_filename), (raw_image*255).astype(np.uint8)) 218 | 219 | if preds_value.squeeze() == label: 220 | correct_paths.append("%s \"%s\" \"%s\""%(os.path.join(our_path, img_filename), number_to_human[label], number_to_human[preds_value[0]])) 221 | else: 222 | wrong_paths.append("%s \"%s\" \"%s\""%(os.path.join(our_path, img_filename), number_to_human[label], number_to_human[preds_value[0]])) 223 | 224 | print('Top-1 accuracy', float(len(correct_paths))/float(len(wrong_paths)+len(correct_paths))) 225 | correct_paths_fn = os.path.join(eval_dir, FLAGS.mode + '_correct.txt') 226 | with open(correct_paths_fn, 'w') as f: 227 | for item in correct_paths: 228 | f.write("%s\n" % item) 229 | wrong_paths_fn = os.path.join(eval_dir, FLAGS.mode + '_wrong.txt') 230 | with open(wrong_paths_fn, 'w') as f: 231 | for item in wrong_paths: 232 | f.write("%s\n" % item) 233 | 234 | if __name__ == '__main__': 235 | tf.app.run() 236 | --------------------------------------------------------------------------------
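For reference, the noise-profile names accepted by `test_synthetic_images.py` (`--mode`) correspond to the light-level ranges passed to `train_image_classifier.py` and `simulate_raw_images.py` via `--ll_low`/`--ll_high` in the scripts above. A minimal sketch of that mapping follows; the `NOISE_PROFILES` dict and `light_level_range` helper are convenience names for illustration, not part of the repository:

```
# Mapping between noise-profile names and (ll_low, ll_high) light levels,
# as hard-coded in test_synthetic_images.py and run_train_joint_models.sh.
NOISE_PROFILES = {
    '3lux':      (0.0015, 0.0015),
    '6lux':      (0.003,  0.003),
    '2to20lux':  (0.001,  0.010),
    '2to200lux': (0.001,  0.100),
}

def light_level_range(mode):
    """Return (ll_low, ll_high) for a noise-profile name, e.g. '6lux'."""
    if mode not in NOISE_PROFILES:
        raise ValueError('Unknown noise profile: %s' % mode)
    return NOISE_PROFILES[mode]

# Example: build the flags for a 2to20lux run of simulate_raw_images.py.
ll_low, ll_high = light_level_range('2to20lux')
print('--ll_low=%g --ll_high=%g' % (ll_low, ll_high))
```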
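The captured-image evaluation script above unpacks the raw Bayer planes (after black-level subtraction and white balance) and re-packs them into a 3-channel mosaic in which every pixel keeps only its own Bayer color. A small sketch of that packing step, mirroring the index pattern in `test_captured_images.py`; the `pack_bayer_planes` name is hypothetical, and the layout is presumed to match `sensor_model.get_bayer_mask`:

```
import numpy as np

def pack_bayer_planes(R, G1, G2, B, size=224):
    """Hypothetical helper: pack half-resolution Bayer planes into a
    size x size x 3 mosaic, as done in test_captured_images.py."""
    mosaic = np.zeros((size, size, 3), dtype=np.float32)
    mosaic[::2, ::2, 2] = B      # blue sites
    mosaic[1::2, 1::2, 0] = R    # red sites
    mosaic[1::2, ::2, 1] = G1    # first green sites
    mosaic[::2, 1::2, 1] = G2    # second green sites
    return mosaic

# Example with random half-resolution planes (R, G1, G2, B):
planes = [np.random.rand(112, 112).astype(np.float32) for _ in range(4)]
mosaic = pack_bayer_planes(*planes)
```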