├── LICENSE
├── README.md
├── doc
│   └── results_legend.png
├── predict.py
├── predict_template.py
├── segmentation
│   ├── bee_dataset.py
│   ├── dataset.py
│   ├── model.py
│   ├── results_analysis.py
│   ├── results_visualization.py
│   ├── training_config.py
│   └── unet.py
├── train.py
└── train_template.py

/LICENSE:
--------------------------------------------------------------------------------
1 | MIT License
2 |
3 | Copyright (c) 2019 Okinawa Institute of Science & Technology
4 |
5 | Permission is hereby granted, free of charge, to any person obtaining a copy
6 | of this software and associated documentation files (the "Software"), to deal
7 | in the Software without restriction, including without limitation the rights
8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 |
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 |
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | Segmentation-based bee detection
2 | ================================
3 |
4 | This repository contains code and instructions for a segmentation-based object detection method, initially developed for a beehive environment and described here: [Towards Dense Object Tracking in a 2D Honeybee Hive](http://openaccess.thecvf.com/content_cvpr_2018/html/Bozek_Towards_Dense_Object_CVPR_2018_paper.html).
5 |
6 | The repository is prepared for the beehive dataset available on [this webpage](https://groups.oist.jp/bptu/honeybee-tracking-dataset). The model is a simplified version of the one used in the paper: it processes single frames only, uses no Gaussian weights, and works at a smaller resolution.
7 |
8 |
9 | ## Requirements
10 | - Python 3.5+
11 | - [TensorFlow](https://www.tensorflow.org/) (tested on 1.12)
12 | - [OpenCV](https://opencv.org/)
13 | - [Numpy](http://www.numpy.org/)
14 |
15 | ## Run training and prediction for the beehive dataset
16 | Download one of the two available bee datasets, for example the 30 fps one:
17 | 30 fps: [images](https://beepositions.unit.oist.jp/frame_imgs_30fps.tgz), [annotations](https://beepositions.unit.oist.jp/frame_annotations_30fps.tgz)
18 | 70 fps: [images](https://beepositions.unit.oist.jp/frame_imgs_70fps.tgz), [annotations](https://beepositions.unit.oist.jp/frame_annotations_70fps.tgz)
19 |
20 | Create an empty folder and uncompress the two archives into it; the resulting file structure should look like:
21 | ```
22 | dataset
23 | +-- frames
24 | |   +-- *.png
25 | +-- frames_txt
26 | |   +-- *.txt
27 | ```
28 |
29 | You can train the model by passing the path to your dataset root folder as an argument:
30 |
31 | `python3 train.py path_to_dataset`
32 |
33 | A subset of the images is set aside for testing (parameter `--validation_num_files`).
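The split is recorded in a `training_config.json` file at the dataset root and reused on subsequent runs. A minimal sketch of its layout (the frame names here are placeholders):
```
{
    "test": ["frame_000", "frame_001"],
    "train": ["frame_002", "frame_003"]
}
```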
During training, you should see the loss value decrease over time.
34 |
35 | After training, you can predict the results on the test set by running:
36 |
37 | `python3 predict.py path_to_dataset`
38 |
39 | The results will be stored in a new `predict_results` folder. For each input frame, three images are generated:
40 | ![alt text](doc/results_legend.png)
41 |
42 | An aggregate file of error metrics over all test images is also created: `average_error_metrics.txt`.
43 |
44 |
45 |
46 | ## To adapt to another dataset
47 |
48 | To predict on a new dataset, you can use the script `predict_template.py`, replacing the function `predict_data_generator` so that it loads your own data.
49 |
50 | To train on a new dataset, you can use the script `train_template.py`, replacing the functions `train_data_generator` and `eval_data_generator` (each yields pairs of 2D images: image_data, label_data); a sketch of such generators appears at the end of this document.
51 |
52 | The labels for the beehive dataset were created using a custom [tool](https://github.com/oist/DenseObjectAnnotation).
53 |
54 | ## Contact
55 |
56 | If you have any questions about this project, please contact:
57 | laetitia.hebert at oist.jp
58 | kasia.bozek at oist.jp
59 |
--------------------------------------------------------------------------------
/doc/results_legend.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/oist/DenseObjectDetection/004ebf3d76bd66fcaa7f13ce3acafbf336927ed5/doc/results_legend.png
--------------------------------------------------------------------------------
/predict.py:
--------------------------------------------------------------------------------
1 | import argparse
2 | import logging
3 | import os
4 | import shutil
5 | from functools import partial
6 |
7 | import numpy as np
8 | import cv2
9 | import tensorflow as tf
10 |
11 | from segmentation import model, dataset, bee_dataset, training_config
12 | from segmentation.results_analysis import find_positions, compute_error_metrics
13 | from segmentation.results_visualization import plot_positions, plot_segmentation_map, plot_TP_FN_FP
14 |
15 | logging.basicConfig()
16 | logger = logging.getLogger(__name__)
17 | logger.setLevel(logging.DEBUG)
18 | tf.logging.set_verbosity(tf.logging.INFO)
19 |
20 |
21 | def save_error_file(point_d, axis_d, tps, fps, fns, correct_type, out_file):
22 |     total = len(point_d) + fns
23 |     tps_mn = float(tps) / total
24 |     fns_mn = float(fns) / total
25 |     fps_mn = float(fps) / total
26 |     correct_type_pr = float(correct_type) / tps
27 |
28 |     with open(out_file, "w") as f:
29 |         f.write("position error (pixels): mean: {:.2f} ({:.2f}) median: {:.2f}\n".format(np.mean(point_d),
30 |                                                                                          np.std(point_d),
31 |                                                                                          np.median(point_d)))
32 |         f.write("correct class: {:.2f}%\n".format(correct_type_pr * 100))
33 |         axis_d = np.rad2deg(axis_d)
34 |         f.write("axis error (degrees): mean: {:.2f} ({:.2f}) median: {:.2f}\n".format(np.mean(axis_d),
35 |                                                                                       np.std(axis_d),
36 |                                                                                       np.median(axis_d)))
37 |         f.write("True Positives: {:.2f}%\n".format(tps_mn * 100))
38 |         f.write("False Negatives: {:.2f}%\n".format(fns_mn * 100))
39 |         f.write("False Positives: {:.2f}%\n".format(fps_mn * 100))
40 |
41 |
42 | if __name__ == '__main__':
43 |
44 |     parser = argparse.ArgumentParser()
45 |     parser.add_argument('dataset_root_dir', type=str, help="Path to the dataset root folder (containing frames and frames_txt)")
46 |     parser.add_argument('--checkpoint_dir', default='checkpoints', help="Path to trained model folder")
47 |     parser.add_argument('--results_folder', default='predict_results', help="Output folder")
48 |
49 |     # model parameters, should be the same as for training
50 |     parser.add_argument('--num_classes', type=int, default=3, help="How many outputs of the model")
51 |     parser.add_argument('--data_format', type=str, default='channels_last', choices={'channels_last', 'channels_first'})
52 |
53 |     # evaluation metrics
54 |     parser.add_argument('--min_distance_px', type=int, default=20,
55 |                         help="Minimum distance in pixels between prediction and label objects"
56 |                              " to be considered a true positive result."
57 |                              " Use same coordinate system as the predicted image.")
58 |     parser.add_argument('--min_blob_size_px', type=int, default=20,
59 |                         help="Blobs with bounding box sides smaller than min_blob_size_px are discarded."
60 |                              " Use same coordinate system as the predicted image.")
61 |     parser.add_argument('--max_blob_size_px', type=int, default=200,
62 |                         help="Blobs with bounding box sides larger than max_blob_size_px are discarded."
63 |                              " Use same coordinate system as the predicted image.")
64 |
65 |     args = parser.parse_args()
66 |     logger.info('Predicting with settings: {}'.format(vars(args)))
67 |
68 |     output_path = os.path.join(os.getcwd(), args.results_folder)
69 |     if os.path.isdir(output_path):
70 |         shutil.rmtree(output_path)
71 |     os.mkdir(output_path)
72 |
73 |     images_root_dir = os.path.join(args.dataset_root_dir, 'frames')
74 |     labels_root_dir = os.path.join(args.dataset_root_dir, 'frames_txt')
75 |     if not (os.path.exists(images_root_dir) and os.path.exists(labels_root_dir)):
76 |         raise FileNotFoundError('Expected frames and frames_txt folders under {}'.format(args.dataset_root_dir))
77 |
78 |     config = training_config.get(args.dataset_root_dir)
79 |     if config is not None:
80 |         to_predict_filenames = config['test']
81 |         logger.info('Predicting only test images from training_config file.')
82 |     else:
83 |         logger.info(
84 |             "Couldn't find a training_config file, so predicting all images in folder {}".format(images_root_dir))
85 |         to_predict_filenames = [os.path.splitext(x)[0] for x in os.listdir(images_root_dir)]
86 |
87 |     labels = [bee_dataset.read_label_file_globalcoords(os.path.join(labels_root_dir, name + '.txt'))
88 |               for name in to_predict_filenames]
89 |     regions_of_interest = [l[1] for l in labels]
90 |
91 |     estimator = tf.estimator.Estimator(model_fn=partial(model.build_model,
92 |                                                         num_classes=args.num_classes,
93 |                                                         data_format=args.data_format,
94 |                                                         bg_fg_weight=None), model_dir=args.checkpoint_dir)
95 |
96 |     predictions = estimator.predict(input_fn=partial(dataset.make_dataset,
97 |                                                      data_generator=partial(bee_dataset.generate_predict,
98 |                                                                             images_root_dir=images_root_dir,
99 |                                                                             filenames=to_predict_filenames,
100 |                                                                            regions_of_interest=regions_of_interest),
101 |                                                     data_format=args.data_format,
102 |                                                     batch_size=1,
103 |                                                     mode=tf.estimator.ModeKeys.PREDICT))
104 |
105 |     drawing_functions = bee_dataset.get_object_drawing_functions()
106 |
107 |     TP_count, FP_count, FN_count, correct_type_count = 0, 0, 0, 0
108 |     all_pixel_dist, all_axis_diff = [], []
109 |
110 |     for name, prediction, label in zip(to_predict_filenames, predictions, labels):
111 |         logger.info('processing {}'.format(name))
112 |
113 |         input_image = prediction['input_data']
114 |         pred_image = prediction['prediction']
115 |
116 |         channels_axis = 0 if args.data_format == 'channels_first' else -1
117 |         amax = np.argmax(pred_image, axis=channels_axis)
118 |
119 |         input_image = np.uint8(np.squeeze(input_image) * 255)
120 |         input_image = cv2.cvtColor(input_image, cv2.COLOR_GRAY2BGR)
121 |
122 |         plot_segmentation_map(input_image, amax,
123 |                               os.path.join(output_path, "{}_seg_map.png".format(name)), num_classes=args.num_classes)
124 |
125 |         predictions_pos = find_positions(amax, args.min_blob_size_px, args.max_blob_size_px)
126 |         if len(predictions_pos) == 0:
127 |             logger.info("Blob analysis failed to find objects.")
128 |             continue
129 |
130 |         np.savetxt(os.path.join(output_path, "{}_predictions.csv".format(name)), predictions_pos, fmt="%i,%i,%i,%.4f")
131 |         np.savetxt(os.path.join(output_path, "{}_labels.csv".format(name)), label[0], fmt="%i,%i,%i,%.4f")
132 |
133 |         plot_positions(input_image, [label[0], predictions_pos], [(0, 250, 255), (0, 0, 255)],
134 |                        os.path.join(output_path, "{}_mixed.png".format(name)),
135 |                        drawing_params=drawing_functions)
136 |
137 |         pixel_dist, axis_diff, correct_type, TP_results, FN_results, FP_results \
138 |             = compute_error_metrics(np.array(label[0]), np.array(predictions_pos), dist_min=args.min_distance_px)
139 |
140 |         TP_count += len(TP_results)
141 |         FN_count += len(FN_results)
142 |         FP_count += len(FP_results)
143 |         correct_type_count += correct_type
144 |         all_pixel_dist += pixel_dist
145 |         all_axis_diff += axis_diff
146 |
147 |         plot_TP_FN_FP(input_image, TP_results, FN_results, FP_results,
148 |                       os.path.join(output_path, "{}_detail.png".format(name)), drawing_functions)
149 |
150 |     save_error_file(all_pixel_dist, all_axis_diff, TP_count, FP_count, FN_count, correct_type_count,
151 |                     os.path.join(output_path, "average_error_metrics.txt"))
152 |
--------------------------------------------------------------------------------
/predict_template.py:
--------------------------------------------------------------------------------
1 | import argparse
2 | import logging
3 | import os
4 | import shutil
5 | from functools import partial
6 |
7 | import numpy as np
8 | import cv2
9 | import tensorflow as tf
10 |
11 | from segmentation import model, dataset, bee_dataset
12 | from segmentation.results_analysis import find_positions
13 | from segmentation.results_visualization import plot_positions, plot_segmentation_map
14 |
15 | logging.basicConfig()
16 | logger = logging.getLogger(__name__)
17 | logger.setLevel(logging.DEBUG)
18 | tf.logging.set_verbosity(tf.logging.INFO)
19 |
20 |
21 | def predict_data_generator():
22 |     # Load your 2D image data here, as uint8 (values between 0 and 255)
23 |     # example:
24 |     # for my_image_path in my_images_paths:
25 |     #     yield cv2.imread(my_image_path, cv2.IMREAD_GRAYSCALE)
26 |     raise NotImplementedError()
27 |
28 |
29 | if __name__ == '__main__':
30 |
31 |     parser = argparse.ArgumentParser()
32 |     parser.add_argument('--checkpoint_dir', default='checkpoints', help="Path to trained model folder")
33 |     parser.add_argument('--results_folder', default='predict_results', help="Output folder")
34 |
35 |     # model parameters, should be the same as for training
36 |     parser.add_argument('--num_classes', type=int, default=3, help="How many outputs of the model")
37 |     parser.add_argument('--data_format', type=str, default='channels_last', choices={'channels_last', 'channels_first'})
38 |
39 |     # criteria to accept a blob as an object
40 |     parser.add_argument('--min_blob_size_px', type=int, default=20,
41 |                         help="Blobs with bounding box sides smaller than min_blob_size_px are discarded."
42 |                              " Use same coordinate system as the predicted image.")
43 |     parser.add_argument('--max_blob_size_px', type=int, default=200,
44 |                         help="Blobs with bounding box sides larger than max_blob_size_px are discarded. "
45 | "Use same coordinate system as the predicted image.") 46 | 47 | args = parser.parse_args() 48 | logger.info('Predicting with settings: {}'.format(vars(args))) 49 | 50 | output_path = os.path.join(os.getcwd(), args.results_folder) 51 | if os.path.isdir(output_path): 52 | shutil.rmtree(output_path) 53 | os.mkdir(output_path) 54 | 55 | estimator = tf.estimator.Estimator(model_fn=partial(model.build_model, 56 | num_classes=args.num_classes, 57 | data_format=args.data_format, 58 | bg_fg_weight=None), model_dir=args.checkpoint_dir) 59 | 60 | predictions = estimator.predict(input_fn=partial(dataset.make_dataset, 61 | data_generator=predict_data_generator, 62 | data_format=args.data_format, 63 | batch_size=1, 64 | mode=tf.estimator.ModeKeys.PREDICT)) 65 | 66 | drawing_functions = bee_dataset.get_object_drawing_functions() 67 | 68 | for index, prediction in enumerate(predictions): 69 | 70 | input_image = prediction['input_data'] 71 | pred_image = prediction['prediction'] 72 | 73 | channels_axis = 0 if args.data_format == 'channels_first' else -1 74 | amax = np.argmax(pred_image, axis=channels_axis) 75 | 76 | input_image = np.uint8(np.squeeze(input_image) * 255) 77 | input_image = cv2.cvtColor(input_image, cv2.COLOR_GRAY2BGR) 78 | 79 | plot_segmentation_map(input_image, amax, 80 | os.path.join(output_path, "{}_seg_map.png".format(index)), num_classes=args.num_classes) 81 | 82 | predictions_pos = find_positions(amax, args.min_blob_size_px, args.max_blob_size_px) 83 | if len(predictions_pos) == 0: 84 | logger.info("Blob analysis failed to find objects.") 85 | continue 86 | 87 | np.savetxt(os.path.join(output_path, "{}_predictions.csv".format(index)), predictions_pos, fmt="%i,%i,%i,%.4f") 88 | 89 | plot_positions(input_image, [predictions_pos], [(0, 250, 255)], 90 | os.path.join(output_path, "{}_positions.png".format(index)), 91 | drawing_params=drawing_functions) 92 | -------------------------------------------------------------------------------- /segmentation/bee_dataset.py: -------------------------------------------------------------------------------- 1 | import csv 2 | import logging 3 | import os 4 | 5 | import cv2 6 | import numpy as np 7 | 8 | logging.basicConfig() 9 | logger = logging.getLogger(__name__) 10 | logger.setLevel(logging.DEBUG) 11 | 12 | 13 | # raw image properties 14 | SUB_IMAGE_SIZE = (512, 512) 15 | BEE_OBJECT_SIZES = {1: (20, 35), # bee class is labeled 1 16 | 2: (20, 20)} # butt class is labeled 2 17 | # pre processing params 18 | SCALE_FACTOR = 2 # downscale images, labels by half 19 | 20 | 21 | def get_object_drawing_functions(): 22 | return {1: draw_bee_body, 23 | 2: draw_bee_butt} 24 | 25 | 26 | def draw_bee_butt(out_image, x, y, a, color): 27 | 28 | r = 30 // SCALE_FACTOR 29 | cv2.circle(out_image, (int(x), int(y)), r, color, thickness=2) 30 | draw_center(out_image, x, y, color) 31 | 32 | 33 | def draw_center(out_image, x, y, color): 34 | 35 | d = 4 // SCALE_FACTOR 36 | cv2.rectangle(out_image, (int(x) - d, int(y) - d), (int(x) + d, int(y) + d), color, thickness=-1) 37 | 38 | 39 | def draw_bee_body(out_image, x, y, a, color): 40 | 41 | d = 60. 
/ SCALE_FACTOR 42 | dx = np.sin(a) * d 43 | dy = np.cos(a) * d 44 | x1, y1, x2, y2 = int(x - dx), int(y + dy), int(x + dx), int(y - dy) 45 | cv2.line(out_image, (x1, y1), (x2, y2), color, thickness=2) 46 | draw_center(out_image, x, y, color) 47 | 48 | 49 | def generate_training(frames_root_dir, labels_root_dir, filenames): 50 | 51 | for name in filenames: 52 | 53 | label_filepath = os.path.join(labels_root_dir, name + '.txt') 54 | image_filepath = os.path.join(frames_root_dir, name + '.png') 55 | 56 | if not os.path.exists(label_filepath) or not os.path.exists(image_filepath): 57 | logger.info('Skipping {}.'.format(name)) 58 | continue 59 | 60 | image = cv2.imread(image_filepath, cv2.IMREAD_GRAYSCALE) 61 | 62 | frame_label = read_label_file(label_filepath) 63 | 64 | all_unique_offsets = np.unique([[x[0], x[1]] for x in frame_label], axis=0) 65 | 66 | sub_label_size = (SUB_IMAGE_SIZE[0] // SCALE_FACTOR, 67 | SUB_IMAGE_SIZE[1] // SCALE_FACTOR) 68 | 69 | for offset_x, offset_y in all_unique_offsets: 70 | 71 | label_image = np.zeros(sub_label_size, dtype=np.uint8) 72 | 73 | sub_labels = [x for x in frame_label if x[0] == offset_x and x[1] == offset_y] 74 | 75 | for _, _, x, y, bee_type, angle in sub_labels: 76 | bee_object_size = BEE_OBJECT_SIZES[bee_type] 77 | 78 | x = x // SCALE_FACTOR 79 | y = y // SCALE_FACTOR 80 | r1 = bee_object_size[0] // SCALE_FACTOR 81 | r2 = bee_object_size[1] // SCALE_FACTOR 82 | 83 | ellipse_around_point(label_image, y, x, angle, r1=r1, r2=r2, value=bee_type) 84 | 85 | sub_image = image[offset_y:offset_y + SUB_IMAGE_SIZE[0], 86 | offset_x:offset_x + SUB_IMAGE_SIZE[1]] 87 | 88 | fx, fy = (1 / float(SCALE_FACTOR),) * 2 89 | sub_image = cv2.resize(sub_image, None, fx=fx, fy=fy, interpolation=cv2.INTER_LINEAR) 90 | 91 | yield sub_image, label_image 92 | 93 | 94 | def generate_predict(images_root_dir, filenames, regions_of_interest): 95 | 96 | for name, roi in zip(filenames, regions_of_interest): 97 | image_filepath = os.path.join(images_root_dir, name + '.png') 98 | image = cv2.imread(image_filepath, cv2.IMREAD_GRAYSCALE) 99 | image = image[roi[2]:roi[3], roi[0]:roi[1]] 100 | fx, fy = (1 / float(SCALE_FACTOR),) * 2 101 | image = cv2.resize(image, None, fx=fx, fy=fy, interpolation=cv2.INTER_LINEAR) 102 | yield image 103 | 104 | 105 | def ellipse_around_point(image, xc, yc, angle, r1, r2, value): 106 | 107 | image_size = image.shape 108 | 109 | ind0 = np.arange(-xc, image_size[0] - xc)[:, np.newaxis] * np.ones((1, image_size[1])) 110 | ind1 = np.arange(-yc, image_size[1] - yc)[np.newaxis, :] * np.ones((image_size[0], 1)) 111 | ind = np.concatenate([ind0[np.newaxis], ind1[np.newaxis]], axis=0) 112 | 113 | sin_a = np.sin(angle) 114 | cos_a = np.cos(angle) 115 | 116 | image[((ind[0, :, :] * sin_a + ind[1, :, :] * cos_a) ** 2 / r1 ** 2 + ( 117 | ind[1, :, :] * sin_a - ind[0, :, :] * cos_a) ** 2 / r2 ** 2) <= 1] = value 118 | 119 | return image 120 | 121 | 122 | def read_label_file(label_filename): 123 | with open(label_filename, 'r') as csvfile: 124 | csv_reader = csv.reader(csvfile, delimiter='\t') 125 | 126 | def parse_row(row): 127 | offset_x, offset_y = int(row[0]), int(row[1]) 128 | bee_type = int(row[2]) 129 | x, y = int(row[3]), int(row[4]) 130 | angle = float(row[5]) 131 | 132 | return offset_x, offset_y, x, y, bee_type, angle 133 | 134 | return list(map(parse_row, csv_reader)) 135 | 136 | 137 | def read_label_file_globalcoords(label_filename): 138 | 139 | rows = read_label_file(label_filename) 140 | 141 | unique_offsets = np.unique([[x[0], x[1]] for x in rows], 
axis=0) 142 | 143 | roi = [np.min(unique_offsets[:, 0]), np.max(unique_offsets[:, 0]) + SUB_IMAGE_SIZE[0], 144 | np.min(unique_offsets[:, 1]), np.max(unique_offsets[:, 1]) + SUB_IMAGE_SIZE[1]] 145 | 146 | labels_global_coordinates = [[offset_x + x - roi[0], offset_y + y - roi[2], bee_type, angle] 147 | for offset_x, offset_y, x, y, bee_type, angle in rows] 148 | 149 | labels_global_coordinates = [[x // SCALE_FACTOR, y // SCALE_FACTOR, bee_type, angle] 150 | for x, y, bee_type, angle in labels_global_coordinates] 151 | 152 | return labels_global_coordinates, roi 153 | -------------------------------------------------------------------------------- /segmentation/dataset.py: -------------------------------------------------------------------------------- 1 | from functools import partial 2 | 3 | import tensorflow as tf 4 | from tensorflow.python.data import Dataset 5 | 6 | 7 | def make_dataset(data_generator, data_format, batch_size, mode): 8 | first = next(data_generator()) 9 | 10 | if mode == tf.estimator.ModeKeys.PREDICT: 11 | types = tf.uint8 12 | shapes = first.shape 13 | else: 14 | types = (tf.uint8, tf.uint8) 15 | shapes = (first[0].shape, first[1].shape) 16 | 17 | dataset = Dataset.from_generator(data_generator, types, shapes) 18 | 19 | if mode == tf.estimator.ModeKeys.TRAIN: 20 | dataset = dataset.shuffle(buffer_size=10) 21 | dataset = dataset.repeat() 22 | 23 | process_data_fn = process_image if mode == tf.estimator.ModeKeys.PREDICT else process_image_label 24 | dataset = dataset.map(partial(process_data_fn, data_format=data_format), 25 | num_parallel_calls=64) 26 | 27 | dataset = dataset.batch(batch_size) 28 | 29 | if mode == tf.estimator.ModeKeys.TRAIN: 30 | dataset = dataset.map(partial(augment_data)) 31 | 32 | dataset = dataset.prefetch(batch_size) 33 | 34 | return dataset 35 | 36 | 37 | def process_image(image_data, data_format): 38 | 39 | image_data = tf.cast(image_data, dtype=tf.float32) / 255. 
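    # Scale uint8 pixels to floats in [0, 1]; the next line adds the explicit channel axis the network expects: (H, W, 1) for channels_last, (1, H, W) for channels_first.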
40 | image_data = image_data[:, :, tf.newaxis] if data_format == 'channels_last' else image_data[tf.newaxis, :, :] 41 | return image_data 42 | 43 | 44 | def process_image_label(image_data, label_data, data_format): 45 | 46 | image_data = process_image(image_data, data_format) 47 | 48 | label_data = tf.cast(label_data, dtype=tf.float32) 49 | label_data = label_data[:, :, tf.newaxis] if data_format == 'channels_last' else label_data[tf.newaxis, :, :] 50 | return image_data, label_data 51 | 52 | 53 | def augment_data(image_data, label_data): 54 | # random flip horizontally 55 | flip_cond = tf.less(tf.random_uniform([], 0, 1.0), .5) 56 | image_data, label_data = tf.cond(flip_cond, 57 | lambda: ( 58 | tf.image.flip_left_right(image_data), tf.image.flip_left_right(label_data)), 59 | lambda: (image_data, label_data)) 60 | 61 | # random rotation 90 62 | k = tf.random_uniform((), minval=0, maxval=4, dtype=tf.int32) 63 | image_data = tf.image.rot90(image_data, k=k) 64 | label_data = tf.image.rot90(label_data, k=k) 65 | 66 | return image_data, label_data 67 | -------------------------------------------------------------------------------- /segmentation/model.py: -------------------------------------------------------------------------------- 1 | import tensorflow as tf 2 | 3 | from segmentation.unet import create_unet 4 | 5 | 6 | def build_model(features, labels, num_classes, data_format, bg_fg_weight, mode): 7 | # The U-net network generates an output from the images data, network_output is the last layer of the network 8 | network_output = create_unet(features, num_classes, data_format) 9 | 10 | # Prediction step : return the network output, plus the input data to compare 11 | if mode == tf.estimator.ModeKeys.PREDICT: 12 | predictions = {'prediction': network_output, 'input_data': features} 13 | return tf.estimator.EstimatorSpec(mode=mode, predictions=predictions) 14 | 15 | # Calculate the loss function by comparing network output and labels 16 | loss = add_loss(network_output, labels, data_format, num_classes=num_classes, weight=bg_fg_weight) 17 | 18 | # Evaluation step : output only the loss 19 | if mode == tf.estimator.ModeKeys.EVAL: 20 | return tf.estimator.EstimatorSpec(mode=mode, loss=loss) 21 | 22 | # Training step : minimize the loss using an optimizer, outputs the loss too 23 | if mode == tf.estimator.ModeKeys.TRAIN: 24 | optimizer = tf.train.AdamOptimizer() 25 | train_op = optimizer.minimize(loss, global_step=tf.train.get_global_step()) 26 | return tf.estimator.EstimatorSpec(mode=mode, loss=loss, train_op=train_op) 27 | 28 | 29 | def add_loss(logits, labels, data_format, num_classes, weight): 30 | with tf.name_scope('loss'): 31 | # Convert the labels to separate binary channels : one for each class 32 | channels_axis = 1 if data_format == 'channels_first' else -1 33 | labels = tf.squeeze(labels, axis=channels_axis) 34 | oh_labels = tf.one_hot(indices=tf.cast(labels, tf.int32), depth=num_classes, name="one_hot", axis=channels_axis) 35 | 36 | # Compare the network output with the labels, which will produce the most probable class for each pixel 37 | loss_map = tf.nn.softmax_cross_entropy_with_logits_v2(logits=logits, 38 | labels=tf.stop_gradient(oh_labels), 39 | dim=channels_axis) 40 | 41 | # Weigh the loss map 42 | weight_map = tf.where(tf.equal(labels, 0), 43 | tf.fill(tf.shape(labels), 1 - weight), 44 | tf.fill(tf.shape(labels), weight)) 45 | weighted_loss = tf.multiply(loss_map, weight_map) 46 | 47 | # Average of the loss for the full batch 48 | loss = tf.reduce_mean(weighted_loss, 
name="weighted_loss")
49 |         tf.losses.add_loss(loss)
50 |         return loss
51 |
--------------------------------------------------------------------------------
/segmentation/results_analysis.py:
--------------------------------------------------------------------------------
1 | import cv2
2 | import numpy as np
3 |
4 |
5 | def _find_main_axis(regions, region_index):
6 |     xs, ys = np.where(regions == region_index)
7 |     m = np.concatenate([-ys[:, np.newaxis], xs[:, np.newaxis]], axis=1)
8 |
9 |     _, _, v = np.linalg.svd(m - np.mean(m, axis=0), full_matrices=False)
10 |
11 |     return np.arctan2(v[0, 0], v[0, 1])
12 |
13 |
14 | def find_positions(pred, min_blob_size, max_blob_size):
15 |     num_regions, regions, stats, centroids = cv2.connectedComponentsWithStats(pred.astype(np.uint8))
16 |
17 |     result = []
18 |     for region_index in np.arange(1, num_regions):
19 |         region_stats = stats[region_index]
20 |
21 |         # discard blobs that are too big or too small
22 |         region_width, region_height = region_stats[2], region_stats[3]
23 |         if region_width < min_blob_size or region_width > max_blob_size \
24 |                 or region_height < min_blob_size or region_height > max_blob_size:
25 |             continue
26 |
27 |         # get blob properties: main axis and centroid
28 |         ax = _find_main_axis(regions, region_index)
29 |         x, y = centroids[region_index][0], centroids[region_index][1]
30 |
31 |         # find the object type (most frequent class in the region)
32 |         unique_values, count = np.unique(pred[regions == region_index], return_counts=True)
33 |         typ = unique_values[np.argmax(count)]
34 |
35 |         result.append([x, y, typ, ax])
36 |
37 |     return result
38 |
39 |
40 | def _calculate_points_dist(preds, labels):
41 |     # Euclidean distance between each prediction and each label
42 |     res = np.zeros((len(preds), len(labels)))
43 |
44 |     for ip in range(len(preds)):
45 |         for il in range(len(labels)):
46 |             pred = preds[ip]
47 |             label = labels[il]
48 |             res[ip, il] = np.sqrt((pred[0] - label[0]) ** 2 + (pred[1] - label[1]) ** 2)
49 |
50 |     return res
51 |
52 |
53 | def _axis_difference(a1, a2):
54 |     # map both angles into the range [0, pi/2] before comparing with an absolute difference
55 |
56 |     a1 = np.mod(a1, np.pi)
57 |     a2 = np.mod(a2, np.pi)
58 |
59 |     a1 = np.pi - a1 if a1 > np.pi / 2 else a1
60 |     a2 = np.pi - a2 if a2 > np.pi / 2 else a2
61 |
62 |     return np.abs(a2 - a1)
63 |
64 |
65 | def compute_error_metrics(labels, preds, dist_min):
66 |     TP_results, correct_type, pixel_dist, axis_diff = [], 0, [], []
67 |
68 |     dist_matrix = _calculate_points_dist(preds, labels)
69 |
70 |     # greedily match the closest prediction-label pairs until none are within dist_min
71 |     while dist_matrix.shape[0] > 0 and dist_matrix.shape[1] > 0 and np.min(dist_matrix) < dist_min:
72 |
73 |         ip, il = np.argwhere(dist_matrix == np.min(dist_matrix))[0]
74 |
75 |         xp, yp, tp, ap = tuple(preds[ip])
76 |         xl, yl, tl, al = tuple(labels[il])
77 |
78 |         TP_results.append(preds[ip])
79 |         pixel_dist.append(dist_matrix[ip, il])
80 |
81 |         # only calculate the axis difference for class 1 (bee body)
82 |         if tl == 1 and tp == 1:
83 |             axis_diff.append(_axis_difference(ap, al))
84 |
85 |         correct_type += int(tp == tl)
86 |
87 |         dist_matrix = np.delete(dist_matrix, ip, 0)
88 |         dist_matrix = np.delete(dist_matrix, il, 1)
89 |         preds = np.delete(preds, ip, axis=0)
90 |         labels = np.delete(labels, il, axis=0)
91 |
92 |     FN_results = labels  # unmatched labels are false negatives
93 |     FP_results = preds  # unmatched predictions are false positives
94 |
95 |     return pixel_dist, axis_diff, correct_type, TP_results, FN_results, FP_results
96 |
--------------------------------------------------------------------------------
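For reference, a minimal sketch of how the two helpers above fit together on a synthetic segmentation map (the array contents and thresholds below are illustrative only, not taken from the repository):
```
import numpy as np

from segmentation.results_analysis import find_positions, compute_error_metrics

# A synthetic 100x100 class map with a single ~30x30 blob of class 1.
pred_map = np.zeros((100, 100), dtype=np.uint8)
pred_map[30:60, 40:70] = 1

# Each detection is [x, y, class, main_axis_angle].
detections = find_positions(pred_map, min_blob_size=20, max_blob_size=200)

# Ground truth in the same [x, y, class, angle] layout.
labels = np.array([[54.0, 44.0, 1.0, 0.0]])

pixel_dist, axis_diff, correct_type, TP, FN, FP = compute_error_metrics(
    labels, np.array(detections), dist_min=20)
print(len(TP), len(FN), len(FP))  # expected: 1 0 0
```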
/segmentation/results_visualization.py: -------------------------------------------------------------------------------- 1 | import colorsys 2 | 3 | import numpy as np 4 | import cv2 5 | 6 | 7 | def plot_TP_FN_FP(input_image, tps, fns, fps, out_file, drawing_params): 8 | 9 | symbols_image = np.zeros_like(input_image) 10 | 11 | for x, y, typ, a in tps: 12 | drawing_params[typ](symbols_image, x, y, a, (0, 255, 0)) 13 | for x, y, typ, a in fns: 14 | drawing_params[typ](symbols_image, x, y, a, (0, 0, 255)) 15 | for x, y, typ, a in fps: 16 | drawing_params[typ](symbols_image, x, y, a, (0, 255, 255)) 17 | 18 | res = cv2.addWeighted(input_image, 1, symbols_image, 0.5, 0) 19 | cv2.imwrite(out_file, res) 20 | 21 | 22 | def plot_positions(input_image, positions, colors, out_file, drawing_params): 23 | 24 | symbols_image = np.zeros_like(input_image) 25 | 26 | for pos, color in zip(positions, colors): 27 | for x, y, typ, a in pos: 28 | drawing_params[typ](symbols_image, x, y, a, color) 29 | 30 | res = cv2.addWeighted(input_image, 1, symbols_image, 0.5, 0) 31 | cv2.imwrite(out_file, res) 32 | 33 | 34 | def plot_segmentation_map(input_image, prediction, out_file, num_classes): 35 | 36 | label_im = np.zeros_like(input_image) 37 | 38 | prediction = prediction.astype(np.float32) / (num_classes - 1) 39 | 40 | colors = {k: tuple([int(c * 255) for c in reversed(colorsys.hsv_to_rgb(k, 1, 1))]) 41 | for k in np.unique(prediction)} 42 | 43 | values_x, values_y = np.where(prediction > 0) 44 | for x, y in zip(values_x, values_y): 45 | label_im[x, y] = colors[prediction[x, y]] 46 | 47 | res = cv2.addWeighted(input_image, 1, label_im, 0.4, 0) 48 | cv2.imwrite(out_file, res) 49 | -------------------------------------------------------------------------------- /segmentation/training_config.py: -------------------------------------------------------------------------------- 1 | import json 2 | import logging 3 | import os 4 | import random 5 | 6 | import numpy as np 7 | 8 | logging.basicConfig() 9 | logger = logging.getLogger(__name__) 10 | logger.setLevel(logging.DEBUG) 11 | 12 | CONFIG_NAME = 'training_config.json' 13 | 14 | 15 | def get(dataset_root_dir): 16 | config_filepath = os.path.join(dataset_root_dir, CONFIG_NAME) 17 | 18 | if not os.path.exists(config_filepath): 19 | return None 20 | 21 | with open(config_filepath, 'r') as f: 22 | try: 23 | training_config = json.load(f) 24 | logger.info('Loading training config from {}'.format(config_filepath)) 25 | return training_config 26 | except IOError: 27 | return None 28 | 29 | 30 | def create(dataset_root_dir, test_num_files): 31 | logger.info('Creating new training config') 32 | 33 | all_files = [] 34 | for _, _, files in os.walk(dataset_root_dir): 35 | for name in files: 36 | all_files.append(os.path.splitext(name)[0]) 37 | all_files = np.unique(all_files) 38 | 39 | random.shuffle(all_files) 40 | train_names = all_files[test_num_files:] 41 | test_names = all_files[:test_num_files] 42 | with open(os.path.join(dataset_root_dir, CONFIG_NAME), 'w') as outfile: 43 | json.dump({'test': test_names.tolist(), 'train': train_names.tolist()}, outfile, indent=4) 44 | 45 | return get(dataset_root_dir) 46 | -------------------------------------------------------------------------------- /segmentation/unet.py: -------------------------------------------------------------------------------- 1 | import tensorflow as tf 2 | 3 | 4 | def _create_conv_relu(inputs, data_format, name, filters, strides=1, kernel_size=3, padding='same'): 5 | return tf.layers.conv2d(inputs=inputs, 
filters=filters, strides=strides, 6 | kernel_size=kernel_size, padding=padding, data_format=data_format, 7 | name='{}_conv'.format(name), activation=tf.nn.relu) 8 | 9 | 10 | def _create_pool(data, data_format, name, pool_size=2, strides=2): 11 | return tf.layers.max_pooling2d(inputs=data, pool_size=pool_size, strides=strides, 12 | padding='same', name=name, data_format=data_format) 13 | 14 | 15 | def _contracting_path(data, data_format, num_layers, num_filters): 16 | interim = [] 17 | 18 | dim_out = num_filters 19 | for i in range(num_layers): 20 | name = 'c_{}'.format(i) 21 | conv1 = _create_conv_relu(data, data_format, '{}_1'.format(name), dim_out) 22 | conv2 = _create_conv_relu(conv1, data_format, '{}_2'.format(name), dim_out) 23 | pool = _create_pool(conv2, data_format, name) 24 | data = pool 25 | 26 | dim_out *= 2 27 | interim.append(conv2) 28 | 29 | return interim, data 30 | 31 | 32 | def _expansive_path(data, data_format, interim, num_layers, dim_in): 33 | dim_out = int(dim_in / 2) 34 | for i in range(num_layers): 35 | name = "e_{}".format(i) 36 | upconv = tf.layers.conv2d_transpose(data, filters=dim_out, kernel_size=2, strides=2, 37 | name='{}_upconv'.format(name), data_format=data_format, ) 38 | 39 | channels_axis = 1 if data_format == 'channels_first' else -1 40 | concat = tf.concat([interim[len(interim) - i - 1], upconv], axis=channels_axis) 41 | conv1 = _create_conv_relu(concat, data_format, '{}_1'.format(name), dim_out) 42 | conv2 = _create_conv_relu(conv1, data_format, '{}_2'.format(name), dim_out) 43 | data = conv2 44 | dim_out = int(dim_out / 2) 45 | return data 46 | 47 | 48 | def create_unet(data, num_classes, data_format, num_layers=3, num_filters=32): 49 | 50 | (interim, contracting_data) = _contracting_path(data, data_format, num_layers, num_filters) 51 | 52 | middle_dim = num_filters * (2 ** num_layers) 53 | middle_conv_1 = _create_conv_relu(contracting_data, data_format, 'm_1', middle_dim) 54 | middle_conv_2 = _create_conv_relu(middle_conv_1, data_format, 'm_2', middle_dim) 55 | middle_end = middle_conv_2 56 | 57 | expansive_path = _expansive_path(middle_end, data_format, interim, num_layers, middle_dim) 58 | 59 | conv_last = _create_conv_relu(expansive_path, data_format, 'final', num_classes) 60 | return conv_last 61 | -------------------------------------------------------------------------------- /train.py: -------------------------------------------------------------------------------- 1 | import argparse 2 | import logging 3 | import os 4 | from functools import partial 5 | 6 | import tensorflow as tf 7 | 8 | from segmentation import dataset, bee_dataset, training_config 9 | from segmentation.model import build_model 10 | 11 | logging.basicConfig() 12 | logger = logging.getLogger(__name__) 13 | logger.setLevel(logging.DEBUG) 14 | tf.logging.set_verbosity(tf.logging.INFO) 15 | 16 | if __name__ == '__main__': 17 | parser = argparse.ArgumentParser() 18 | parser.add_argument('dataset_root_dir', type=str, 19 | help="path of root folder containing frames and frames_txt folders") 20 | 21 | # model parameters 22 | parser.add_argument('--num_classes', type=int, default=3, help="How many outputs of the model") 23 | parser.add_argument('--data_format', type=str, default='channels_last', choices={'channels_last', 'channels_first'}) 24 | 25 | # training parameters 26 | parser.add_argument('--bg_fg_weight', type=float, default=0.9, 27 | help="How much to weight the foreground objects against the background during training.") 28 | parser.add_argument('--validation_num_files', 
type=int, default=10,
29 |                         help="How many image files are used for validation (chosen randomly).")
30 |     parser.add_argument('--batch_size', type=int, default=8,
31 |                         help="Batch size for training")
32 |     parser.add_argument('--num_steps', type=int, default=5000, help="Number of training steps")
33 |     parser.add_argument('--checkpoint_dir', type=str, default='checkpoints', help="Save model to this path.")
34 |     args = parser.parse_args()
35 |
36 |     logger.info('Training network with settings: {}'.format(vars(args)))
37 |
38 |     images_root_dir = os.path.join(args.dataset_root_dir, 'frames')
39 |     labels_root_dir = os.path.join(args.dataset_root_dir, 'frames_txt')
40 |     if not (os.path.exists(images_root_dir) and os.path.exists(labels_root_dir)):
41 |         raise FileNotFoundError('Expected frames and frames_txt folders under {}'.format(args.dataset_root_dir))
42 |
43 |     config = training_config.get(args.dataset_root_dir)
44 |     if config is None:
45 |         config = training_config.create(args.dataset_root_dir, args.validation_num_files)
46 |
47 |     estimator = tf.estimator.Estimator(model_fn=partial(build_model,
48 |                                                         num_classes=args.num_classes,
49 |                                                         data_format=args.data_format,
50 |                                                         bg_fg_weight=args.bg_fg_weight),
51 |                                        model_dir=args.checkpoint_dir,
52 |                                        config=tf.estimator.RunConfig(save_checkpoints_steps=100,
53 |                                                                      save_summary_steps=100))
54 |
55 |     train_spec = tf.estimator.TrainSpec(input_fn=partial(dataset.make_dataset,
56 |                                                          data_generator=partial(bee_dataset.generate_training,
57 |                                                                                 frames_root_dir=images_root_dir,
58 |                                                                                 labels_root_dir=labels_root_dir,
59 |                                                                                 filenames=config['train']),
60 |                                                          data_format=args.data_format,
61 |                                                          batch_size=args.batch_size,
62 |                                                          mode=tf.estimator.ModeKeys.TRAIN), max_steps=args.num_steps)
63 |
64 |     eval_spec = tf.estimator.EvalSpec(input_fn=partial(dataset.make_dataset,
65 |                                                        data_generator=partial(bee_dataset.generate_training,
66 |                                                                               frames_root_dir=images_root_dir,
67 |                                                                               labels_root_dir=labels_root_dir,
68 |                                                                               filenames=config['test']),
69 |                                                        data_format=args.data_format,
70 |                                                        batch_size=args.batch_size,
71 |                                                        mode=tf.estimator.ModeKeys.EVAL), steps=None)
72 |
73 |     tf.estimator.train_and_evaluate(estimator, train_spec, eval_spec)
74 |
--------------------------------------------------------------------------------
/train_template.py:
--------------------------------------------------------------------------------
1 | import argparse
2 | import logging
3 | from functools import partial
4 |
5 | import tensorflow as tf
6 |
7 | from segmentation import dataset
8 | from segmentation.model import build_model
9 |
10 | logging.basicConfig()
11 | logger = logging.getLogger(__name__)
12 | logger.setLevel(logging.DEBUG)
13 | tf.logging.set_verbosity(tf.logging.INFO)
14 |
15 |
16 | def train_data_generator():
17 |     # Generate your own training data here: pairs of 2D images in uint8 format
18 |     # example:
19 |     # while True:
20 |     #     data = cv2.imread(my_train_data_image_path, cv2.IMREAD_GRAYSCALE)
21 |     #     label = cv2.imread(my_train_label_image_path, cv2.IMREAD_GRAYSCALE)
22 |     #     yield data, label
23 |     raise NotImplementedError()
24 |
25 |
26 | def eval_data_generator():
27 |     # Same as train_data_generator but with evaluation data; should not loop if using steps=None in the EvalSpec
28 |     raise NotImplementedError()
29 |
30 |
31 | if __name__ == '__main__':
32 |     parser = argparse.ArgumentParser()
33 |     # model parameters
34 |     parser.add_argument('--num_classes', type=int, default=3, help="How many outputs of the model")
35 |     parser.add_argument('--data_format', type=str, default='channels_last', choices={'channels_last', 'channels_first'})
36 |
37 |     # training parameters
38 | 
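    # Foreground objects cover far fewer pixels than background; bg_fg_weight re-weights the loss towards foreground (see add_loss in segmentation/model.py).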
parser.add_argument('--bg_fg_weight', type=float, default=0.9, 39 | help="How much to weight the foreground objects against the background during training.") 40 | parser.add_argument('--batch_size', type=int, default=8, 41 | help="Batch size for training") 42 | parser.add_argument('--num_steps', type=int, default=5000, help="Number of training steps") 43 | parser.add_argument('--checkpoint_dir', type=str, default='checkpoints', help="Save model to this path.") 44 | args = parser.parse_args() 45 | 46 | logger.info('Training network with settings: {}'.format(vars(args))) 47 | 48 | estimator = tf.estimator.Estimator(model_fn=partial(build_model, 49 | num_classes=args.num_classes, 50 | data_format=args.data_format, 51 | bg_fg_weight=args.bg_fg_weight), 52 | model_dir=args.checkpoint_dir, 53 | config=tf.estimator.RunConfig(save_checkpoints_steps=100, 54 | save_summary_steps=100)) 55 | 56 | train_spec = tf.estimator.TrainSpec(input_fn=partial(dataset.make_dataset, 57 | data_generator=train_data_generator, 58 | data_format=args.data_format, 59 | batch_size=args.batch_size, 60 | mode=tf.estimator.ModeKeys.TRAIN), max_steps=args.num_steps) 61 | 62 | eval_spec = tf.estimator.EvalSpec(input_fn=partial(dataset.make_dataset, 63 | data_generator=eval_data_generator, 64 | data_format=args.data_format, 65 | batch_size=args.batch_size, 66 | mode=tf.estimator.ModeKeys.EVAL), steps=None) 67 | 68 | tf.estimator.train_and_evaluate(estimator, train_spec, eval_spec) 69 | --------------------------------------------------------------------------------
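As referenced in the README, a minimal sketch of how the `train_template.py` generators could be filled in. The folder layout, file naming and 10-file validation split below are assumptions for illustration, not part of the repository; all images must share a single size (the input pipeline fixes tensor shapes from the first yielded pair), and label maps are uint8 images holding one class index per pixel.
```
import glob
import os

import cv2

# Hypothetical layout: my_dataset/images/*.png with a label map of the same
# name under my_dataset/labels/.
IMAGE_PATHS = sorted(glob.glob('my_dataset/images/*.png'))
EVAL_PATHS, TRAIN_PATHS = IMAGE_PATHS[:10], IMAGE_PATHS[10:]


def _load_pair(image_path):
    label_path = os.path.join('my_dataset', 'labels', os.path.basename(image_path))
    image = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    label = cv2.imread(label_path, cv2.IMREAD_GRAYSCALE)
    return image, label


def train_data_generator():
    # Endless stream; the training dataset is shuffled and repeated downstream.
    while True:
        for image_path in TRAIN_PATHS:
            yield _load_pair(image_path)


def eval_data_generator():
    # One finite pass, as expected with steps=None in the EvalSpec.
    for image_path in EVAL_PATHS:
        yield _load_pair(image_path)
```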