├── .gitignore ├── README.md ├── __init__.py ├── apply_net.py ├── apply_net_single.py ├── configs ├── exp01.yaml ├── exp02.yaml ├── exp03.yaml ├── exp04.yaml ├── exp05.yaml └── exp06.yaml ├── evaluate_net.py ├── imaterialist ├── __init__.py ├── config.py ├── data │ ├── __init__.py │ ├── dataset_mapper.py │ ├── datasets │ │ ├── __init__.py │ │ ├── coco.py │ │ ├── make_dataset.py │ │ ├── rle_utils.py │ │ ├── rle_utils_old.py │ │ └── test_rle.py │ └── structures.py ├── evaluator.py ├── modeling │ ├── __init__.py │ ├── attributes_rcnn.py │ └── roi_heads │ │ ├── __init__.py │ │ ├── attributes_head.py │ │ └── roi_heads.py └── submission_utils │ ├── resize_longest_edge.py │ └── test_csv_write.py ├── notebooks ├── 01-EDA.ipynb ├── 02-rle_encoder_decoder.ipynb ├── 03-Create-dataset.ipynb ├── 04-Inference.ipynb ├── 05-Training_and_inference_experiments.ipynb ├── 06-Attribute-inference.ipynb ├── 07-Results.ipynb └── InteractiveLabelExplorer.ipynb ├── requirements.txt └── train_net.py /.gitignore: -------------------------------------------------------------------------------- 1 | # Byte-compiled / optimized / DLL files 2 | __pycache__/ 3 | *.py[cod] 4 | 5 | # C extensions 6 | *.so 7 | 8 | # Distribution / packaging 9 | .Python 10 | env/ 11 | build/ 12 | develop-eggs/ 13 | dist/ 14 | downloads/ 15 | eggs/ 16 | .eggs/ 17 | lib/ 18 | lib64/ 19 | parts/ 20 | sdist/ 21 | var/ 22 | *.egg-info/ 23 | .installed.cfg 24 | *.egg 25 | 26 | # PyInstaller 27 | # Usually these files are written by a python script from a template 28 | # before PyInstaller builds the exe, so as to inject date/other infos into it. 29 | *.manifest 30 | *.spec 31 | results_bengali/ 32 | # Installer logs 33 | pip-log.txt 34 | pip-delete-this-directory.txt 35 | 36 | # Unit test / coverage reports 37 | htmlcov/ 38 | .tox/ 39 | .coverage 40 | .coverage.* 41 | .cache 42 | nosetests.xml 43 | coverage.xml 44 | *.cover 45 | 46 | # Translations 47 | *.mo 48 | *.pot 49 | 50 | # Django stuff: 51 | *.log 52 | 53 | # Sphinx documentation 54 | docs/_build/ 55 | 56 | # PyBuilder 57 | target/ 58 | 59 | # DotEnv configuration 60 | .env_bengali 61 | 62 | # Database 63 | *.db 64 | *.rdb 65 | 66 | # Pycharm 67 | .idea 68 | 69 | # VS Code 70 | .vscode/ 71 | 72 | # Spyder 73 | .spyproject/ 74 | 75 | # Jupyter NB Checkpoints 76 | .ipynb_checkpoints/ 77 | */.ipynb_checkpoints/ 78 | 79 | # exclude data from source control by default 80 | /home 81 | /data 82 | /data_* 83 | /results_* 84 | /notebooks_* 85 | /output 86 | /iMaterialist2020/configs/* 87 | 88 | # Mac OS-specific storage files 89 | .DS_Store 90 | 91 | # vim 92 | *.swp 93 | *.swo 94 | 95 | # Mypy cache 96 | .mypy_cache/ 97 | 98 | .idea/ 99 | __pycache__ 100 | configs/eai_server_paths.yaml 101 | 102 | # inbox folder for experiment runs should stay local 103 | configs/inbox 104 | 105 | 106 | configs/Archive/ 107 | 108 | models/mobilenet_v2-b0353104.pth 109 | 110 | /depends/depends.zip 111 | /depends/dill.pkl 112 | /depends/wheelhouse 113 | submission.csv 114 | /depends/wheelhouse/ 115 | .env* 116 | .flake8 117 | =2.0.1 118 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # iMaterialist 2020 Kaggle Competition in Detectron2 2 | 3 | In this competition we are tasked to do instance segmentation as well as attribute localization (recognize one or multiple attributes for the instances) on a fashion and apparel dataset. 
[Here is the link to the competition](https://www.kaggle.com/c/imaterialist-fashion-2020-fgvc7/overview). 4 | 5 |

6 | 7 | ## Model and Training 8 | 9 | To solve the challenging problems entailed in this task we use and extend Detectron2’s MaskRCNN architecture and added a new attribute head as shown in orange below. 10 | 11 |
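The head itself is implemented in `imaterialist/modeling/roi_heads/attributes_head.py` (see also `roi_heads.py` and `attributes_rcnn.py`). As a rough illustration of the idea only (names and signatures below are ours, not the repo's API), it boils down to a small multi-label classifier over the 295 attribute slots, applied to the pooled per-instance box features and trained with binary cross-entropy against the multi-hot ground-truth attributes:

```python
# Illustrative sketch, not the actual implementation.
import torch.nn as nn
import torch.nn.functional as F


class AttributeHead(nn.Module):
    """Multi-label attribute branch over pooled per-instance box features."""

    def __init__(self, in_features: int, num_attributes: int = 295):
        super().__init__()
        self.predictor = nn.Linear(in_features, num_attributes)

    def forward(self, box_features, gt_attributes=None):
        logits = self.predictor(box_features)  # (num_instances, num_attributes)
        if self.training:
            # Attributes are multi-hot vectors, so a per-attribute binary
            # cross-entropy is the natural loss.
            return {"loss_attributes": F.binary_cross_entropy_with_logits(logits, gt_attributes)}
        # At inference the per-attribute scores are thresholded downstream (e.g. > 0.5).
        return logits.sigmoid()
```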

12 | 13 | - In prior steps in the MaskRCNN architecture we leverage a ResNet-50 with a feature pyramid network (FPN) as backbone. 14 | - The input image is resized to 1300 of the longer edge to feed the network. 15 | - Random horizontal flipping was applied during the training. 16 | - The model was trained on top of pre-trained COCO dataset weights for 300,000 iterations. 17 | 18 | ## Kaggle Submission 19 | 20 | The submission to Kaggle required specific encoding (run length encoding - RLE) for all the predicted masks in order to reduce the size of the submitted file. This posed a number of challenges since RLE is not standardized amongst COCO, Detectron2 and Kaggle. Also, Kaggle required that each pixel of the masks do not overlap, so mask refining was required. 21 | 22 | ## Evaluation 23 | 24 | Submissions are evaluated on the mean average precision at two different thresholds. 25 | 26 | 1. IoU: intersection over union (IoU) thresholds. The IoU of a proposed set of object pixels and a set of true object pixels is calculated as: 27 | 28 |
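`IoU(A, B) = |A ∩ B| / |A ∪ B|`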

29 | 30 | 2. F1: f1 score between a set of predicted attributes and a set of true attributes of one segmentation mask 31 | 32 | The metric sweeps over a range of IoU thresholds and F1 thresholds, at each point calculating an average precision value. The threshold values range from 0.5 to 0.95 with a step size of 0.05: (0.5, 0.55, 0.6, 0.65, 0.7, 0.75, 0.8, 0.85, 0.9, 0.95). In other words, at an IoU threshold of 0.5 and an F1 threshold of 0.5, a predicted object is considered a "hit" if it satisfies the following conditions: 33 | 34 | 1. Its intersection over union with a ground truth object is greater than 0.5 35 | 2. If the ground truth object has attributes, the f1 scores of predicted attributes and ground-truth attributes is greater than 0.5. 36 | At each threshold pair, t=(ti, tf), a precision value is calculated based on the number of true positives (TP), false negatives (FN), and false positives (FP) resulting from comparing the predicted object to all ground truth objects: 37 | 38 |
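`Precision(t) = TP(t) / (TP(t) + FP(t) + FN(t))`

In code, the per-prediction "hit" test described above is roughly the following (illustrative sketch, not the competition's scoring code):

```python
def is_hit(iou: float, attr_f1: float, gt_has_attributes: bool, t_iou: float, t_f1: float) -> bool:
    # Condition 1: the mask IoU must clear the IoU threshold.
    if iou <= t_iou:
        return False
    # Condition 2 only applies when the ground-truth instance has attributes.
    return (not gt_has_attributes) or attr_f1 > t_f1
```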

39 | 40 | ## Category and Attributes Analysis 41 | 42 | There are 46 apparel categories and 294 attributes presented in the Fashionpedia dataset. On average, each image was annotated with 7.3 instances, 5.4 categories, and 16.7 attributes. Of all the masks with categories and attributes, each mask has 3.7 attributes on average (max 14 attributes). 43 | 44 | ## Docker 45 | 46 | A Docker image is available at https://hub.docker.com/r/cvnnig/detectron2. 47 | 48 | ## WIP 49 | 50 | This repo is still being cleaned and organized. 51 | 52 | ## Authors 53 | 54 | Julien Beaulieu, Yang Ding 55 | -------------------------------------------------------------------------------- /__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Julienbeaulieu/iMaterialist2020-Image-Segmentation-on-Detectron2/8d96069bb021dd374fb8de310bb454d351971974/__init__.py -------------------------------------------------------------------------------- /apply_net.py: -------------------------------------------------------------------------------- 1 | 2 | """ 3 | Run model inference on a set of image. 4 | 5 | Files will be generated one by one to save memory 6 | 7 | Predicted masks are encoded into RLE format to save memory as well. 8 | 9 | Predictions can be visualized and will also be saved to csv. 10 | 11 | """ 12 | 13 | import logging 14 | import csv 15 | import torch 16 | import os 17 | import pickle 18 | import pandas as pd 19 | import numpy as np 20 | from datetime import datetime 21 | from environs import Env 22 | from pathlib import Path 23 | from matplotlib.pyplot import imsave 24 | from typing import Any, Dict, List 25 | 26 | from detectron2.data.detection_utils import read_image 27 | from detectron2.engine import default_argument_parser, launch 28 | from detectron2.structures.instances import Instances 29 | from detectron2.utils.visualizer import ColorMode, Visualizer 30 | 31 | from imaterialist.data.datasets.coco import register_datadict, MetadataCatalog 32 | from imaterialist.config import setup_prediction 33 | from imaterialist.evaluator import iMatPredictor 34 | from imaterialist.data.datasets.rle_utils_old import mask_to_KaggleRLE 35 | from imaterialist.data.datasets.rle_utils import mask_to_KaggleRLE_downscale 36 | 37 | 38 | LOGGER_NAME = "apply_net" 39 | logger = logging.getLogger(LOGGER_NAME) 40 | 41 | env = Env() 42 | env.read_env() 43 | 44 | path_data_interim = Path(env("path_interim")) 45 | path_test_data = Path(env("path_test")) 46 | path_output = Path(env("path_output")) 47 | 48 | class FileGen: 49 | ''' 50 | Class that lazily builds a list of file_paths from a directory. 51 | This is done by returning a generator using a generator expression. 52 | Helps not run into memory issues 53 | ''' 54 | 55 | def __init__(self, file_path): 56 | self.file_path = file_path 57 | 58 | def __iter__(self): 59 | return (os.path.join(self.file_path, fname) 60 | for fname in os.listdir(self.file_path) 61 | if os.path.isfile(os.path.join(self.file_path, fname))) 62 | 63 | 64 | def execute_on_outputs(entry: Dict[str, Any], outputs: Instances) -> List[dict]: 65 | """ 66 | Parse instance from prediction to return a dict of the easier to read attributes. 
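Each dict in the returned list describes one predicted instance and carries the keys ImageId, EncodedPixels (space-separated Kaggle RLE), ClassId and AttributesIds (comma-separated, sorted attribute ids).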
67 | :param entry: 68 | :param outputs: 69 | :return: 70 | """ 71 | 72 | image_fpath = entry["file_name"] 73 | logger.info(f"Processing {image_fpath}") 74 | 75 | # Get predicted classes from outputs 76 | pred_classes = np.array(outputs.pred_classes.cpu().tolist()) 77 | 78 | # Get attribute scores from outputs 79 | attr_scores = np.array(outputs.attr_scores.cpu()) 80 | 81 | # Keep only attributes with a score > 0.5 82 | attr_filter = attr_scores > 0.5 83 | 84 | # Get the index of attributes where the score is > 0. Each item in the list 85 | # corresponds to the predicted attributes for one instance 86 | # attr_filtered = [np.array(torch.where(attr_filter[i])[0].to("cpu")) for i in range(len(attr_filter))] 87 | attr_filtered = [np.where(attr_filter[i])[0] for i in range(len(attr_filter))] 88 | 89 | 90 | # Get masks from outputs 91 | has_mask = outputs.has("pred_masks") 92 | if has_mask: 93 | # use RLE to encode the masks, because they are too large and takes memory 94 | # since this evaluator stores outputs of the entire dataset 95 | 96 | # Old Non-Union RLE 97 | # rles = [ 98 | # mask_to_KaggleRLE(mask) for mask in outputs.pred_masks.cpu() 99 | # ] 100 | 101 | # New Union RLE 102 | rles = refine_masks(outputs.pred_masks.cpu()) 103 | 104 | 105 | ############################## 106 | # Uncomment following code to encode the masks to compressed RLE 107 | # instead of uncompressed RLE like above: 108 | ############################## 109 | 110 | # use RLE to encode the masks, because they are too large and takes memory 111 | # since this evaluator stores outputs of the entire dataset 112 | # rles = [ 113 | # mask_util.encode(np.array(mask[:, :, None], order="F", dtype="uint8"))[0] 114 | # for mask in outputs.pred_masks.cpu() 115 | # ] 116 | # for rle in rles: 117 | # # "counts" is an array encoded by mask_util as a byte-stream. Python3's 118 | # # json writer which always produces strings cannot serialize a bytestream 119 | # # unless you decode it. Thankfully, utf-8 works out (which is also what 120 | # # the pycocotools/_mask.pyx does). 121 | # rle["counts"] = rle["counts"].decode("utf-8") 122 | results = [] 123 | 124 | # Cycle each instance of an image 125 | for k in range(len(outputs)): 126 | # Attribute 294 is the category for empty attributes so we make the tensor empty 127 | # if it contains 294 128 | if 294 in attr_filtered[k]: 129 | # Must be sent to CPU 130 | attr_filtered[k] = np.array(torch.tensor([], device='cuda:0').cpu()) 131 | 132 | 133 | # per Kaggle requirement. 134 | attributes_sorted = get_attribute_ids(list(attr_filtered[k])) 135 | # Get image ID from full path string 136 | image_id = Path(image_fpath).stem 137 | class_id = str(pred_classes[k]) 138 | if has_mask: 139 | result = {"ImageId": image_id, 140 | "EncodedPixels": rles[k], 141 | "ClassId": class_id, 142 | "AttributesIds": attributes_sorted, # attribute IDs must be comma separated and sorted. 143 | } 144 | # Encoded Pixels MUST be SPACE separated 145 | else: 146 | result = {"ImageId": image_id, 147 | "ClassId": str(pred_classes[k]), 148 | "AttributesIds": attributes_sorted, # attribute IDs must be comma separated and sorted. 149 | } 150 | results.append(result) 151 | return results 152 | 153 | def get_attribute_ids(att_ids: List[int]): 154 | """ 155 | Get concatenated AttributesIds 156 | Args: 157 | att_ids: [int], list of apparel attributes 158 | Returns: 159 | att_ids: string, e.g. 
"2,10,55,91" 160 | """ 161 | # Source: https://www.kaggle.com/c/imaterialist-fashion-2020-fgvc7/overview/evaluation 162 | att_ids.sort() # need to be sorted before concatenation 163 | return ','.join([str(a) for a in att_ids]) 164 | 165 | def export_results(result: Dict[str, Any]): 166 | """ 167 | Take the results and write them out into CSV and PKL for upload and future review 168 | :param result: 169 | :return: 170 | """ 171 | # Get and create the folder path name to ensure the subsequent write operation will be successful. 172 | out_fname = result["out_fname"] 173 | out_dir = os.path.dirname(out_fname) 174 | if len(out_dir) > 0 and not os.path.exists(out_dir): 175 | os.makedirs(out_dir) 176 | 177 | # Write out pickle 178 | path_pickle = path_output / f"{out_fname}.pkl" 179 | pickle_write(result, path_pickle) 180 | 181 | # Write out CSV 182 | path_csv = path_output / f"{out_fname}.csv" 183 | # Filter out the blank rows where no encoded pixel is given but class prediction is given... 184 | filter_csv_write(result["results"], path_csv) 185 | 186 | 187 | def pickle_write(result, path_pickle): 188 | with open(path_pickle, "wb") as pickle_file: 189 | pickle.dump(result["results"], pickle_file) 190 | logger.info(f"Output saved to {path_pickle}") 191 | 192 | 193 | def main(args, visualize=True): 194 | # datadic_train = pd.read_feather(path_data_interim / 'imaterialist_train_multihot_n=4000.feather') 195 | # datadic_val = pd.read_feather(path_data_interim / 'imterailist_val_multihot_n=1000.feather') 196 | 197 | # register_datadict(datadic_train, "sample_fashion_train") 198 | # register_datadict(datadic_val, "sample_fashion_test") 199 | 200 | # This small set of data just to provide label./home/nasty/imaterialis 201 | datadic_test = pd.read_feather(path_data_interim / 'imateralist_val_multihot_n=1000.feather') 202 | register_datadict(datadic_test, "sample_fashion_test") 203 | fashion_metadata = get_fashion_metadata() 204 | 205 | # This update the prediction weight and output path automatically. 206 | cfg = setup_prediction(args) 207 | 208 | # cfg must have (cfg.DATASETS.TEST[0]) 209 | 210 | # Generate the predictor 211 | predictor = iMatPredictor(cfg) 212 | 213 | # Create a list of image files 214 | # Loop through all data and generate. 
215 | file_list = FileGen(path_test_data) 216 | 217 | # Dictionary where we'll append the results of all images and instances 218 | all_results = {"results": [], "out_fname": 'result_file'} 219 | 220 | for file_name in file_list: 221 | img = read_image(file_name, format="BGR") # predictor takes BGR format 222 | with torch.no_grad(): 223 | 224 | if visualize: 225 | #========================================== 226 | # Call the visualizer, label and save data 227 | # ========================================= 228 | 229 | #show_predicted_image(file_name, predictor, fashion_metadata) 230 | 231 | v = Visualizer(img[:, :, ::-1], 232 | metadata=fashion_metadata, 233 | scale=0.8, 234 | instance_mode=ColorMode.IMAGE_BW # remove the colors of unsegmented pixels 235 | ) 236 | 237 | # Get outputs of model in eval mode 238 | outputs = predictor(img)["instances"] 239 | v = v.draw_instance_predictions(outputs.to("cpu")) 240 | time_stamp = datetime.now().isoformat().replace(":", "") 241 | name = Path(file_name).stem 242 | imsave(f"{path_output}/{name}_{time_stamp}.png", v.get_image()[:, :, ::-1]) 243 | 244 | # Get results for image 245 | result = execute_on_outputs({"file_name": file_name, "image": img}, outputs) 246 | 247 | all_results["results"].append(result) 248 | 249 | # Dump all results to output path 250 | # Pkl and CSV 251 | export_results(all_results) 252 | print("Example prediction result: ") 253 | print(all_results["results"][0]) # verification 254 | 255 | 256 | def filter_csv_write(list_list_dict: List[List[dict]], path_csv): 257 | """ 258 | Write the list of csv predictions into CSV but omit the rows where EncodedPixels are empty 259 | :param list_dict: 260 | :param path_csv: 261 | :return: 262 | """ 263 | # Flatten the two list. 264 | # Feturn item if they the encoded pixel is not flat. 265 | flat_list = [] 266 | # Iterate through image list. 267 | for sublist in list_list_dict: 268 | # Iterate through mask list 269 | for item in sublist: 270 | # If the EncodedPixel is empty, skip. 271 | # if item["EncodedPixels"] == "": 272 | # continue 273 | # else: 274 | flat_list.append(item) 275 | 276 | # With blanks. 277 | # flat_list = [item for sublist in list_list_dict for item in sublist] 278 | 279 | # Source: https://stackoverflow.com/questions/3086973/how-do-i-convert-this-list-of-dictionaries-to-a-csv-file 280 | keys = flat_list[0].keys() 281 | with open(path_csv, 'w') as output_file: 282 | # quote char prevent dict_writer to quote string that contain separtor: , 283 | # The attributes are separated by COMMA, and must be quoted, by using space, 284 | dict_writer = csv.DictWriter(output_file, keys) 285 | dict_writer.writeheader() 286 | dict_writer.writerows(flat_list) 287 | 288 | 289 | def csv_write(list_list_dict: List[List[dict]], path_csv): 290 | """ 291 | Write the list of csv predictions into CSV. 292 | :param list_dict: 293 | :param path_csv: 294 | :return: 295 | """ 296 | # Flatten the two list. 
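# (outer list: one entry per image; inner list: one dict per predicted instance)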
297 | flat_list = [item for sublist in list_list_dict for item in sublist] 298 | 299 | 300 | # Source: https://stackoverflow.com/questions/3086973/how-do-i-convert-this-list-of-dictionaries-to-a-csv-file 301 | keys = flat_list[0].keys() 302 | with open(path_csv, 'w') as output_file: 303 | # quote char prevent dict_writer to quote string that contain separtor: , 304 | # The attributes are separated by COMMA, and must be quoted, by using space, 305 | dict_writer = csv.DictWriter(output_file, keys) 306 | dict_writer.writeheader() 307 | dict_writer.writerows(flat_list) 308 | 309 | #path_submission = Path(path_csv).parent / "submission.csv" 310 | #reader = csv.reader(open(path_csv, "r"), skipinitialspace=True) 311 | #writer = csv.writer(open(path_submission, "w"), quoting=csv.QUOTE_NONE) 312 | #writer.writerows(reader) 313 | 314 | 315 | def get_fashion_metadata(): 316 | # datadic_val = pd.read_feather(path_data_interim / 'imaterailist_test_multihot_n=100.feather') 317 | # register_datadict(datadic_val, "sample_fashion_test") 318 | fashion_metadata = MetadataCatalog.get("sample_fashion_test") 319 | return fashion_metadata 320 | 321 | 322 | if __name__ == '__main__': 323 | args = default_argument_parser().parse_args() 324 | args.eval_only = True 325 | args.config_file = "/home/nasty/imaterialist2020/iMaterialist2020/configs/exp05.yaml" 326 | print("Command Line Args:", args) 327 | launch( 328 | main, 329 | args.num_gpus, 330 | num_machines=args.num_machines, 331 | machine_rank=args.machine_rank, 332 | dist_url=args.dist_url, 333 | args=(args,), 334 | ) 335 | -------------------------------------------------------------------------------- /apply_net_single.py: -------------------------------------------------------------------------------- 1 | """ 2 | Runs inference on one image 3 | """ 4 | 5 | import cv2 6 | import pickle 7 | import os 8 | from pathlib import Path 9 | from environs import Env 10 | from datetime import datetime 11 | from matplotlib.pyplot import imshow, imsave 12 | 13 | from apply_net import get_fashion_metadata 14 | from imaterialist.data.datasets.coco import register_datadict 15 | from detectron2.utils.visualizer import Visualizer, ColorMode 16 | 17 | from imaterialist.config import initialize_imaterialist_config, update_weights_outpath 18 | from imaterialist.evaluator import iMatPredictor 19 | 20 | env = Env() 21 | env.read_env() 22 | 23 | path_output = Path(env("path_output_images")) 24 | path_data_interim = Path(env("path_interim")) 25 | path_images_local = Path(env("path_images_local")) 26 | 27 | def predicted_image_datadict(datadic_test, predictor, fashion_metadata): 28 | """ 29 | Show 3 predicted images from the Fashion Dict (make sure it is the test set!) 30 | :param dict_test: 31 | :return: 32 | """ 33 | import random 34 | from datetime import datetime 35 | from imaterialist.data.datasets.make_dataset import load_category_attributes 36 | seed = random.randint(0, 99999999) 37 | 38 | # Randomly Grab 9 samples, iterate through rows of them, convert to list of tuple. 
: 39 | list_tuple = list(datadic_test.sample(n=50, random_state=seed).iterrows()) 40 | _, list_datadic = zip(*list_tuple) 41 | 42 | for i, d in enumerate(list_datadic): 43 | time_stamp = datetime.now().isoformat().replace(":", "") 44 | 45 | im = cv2.imread(d["ImageId"]) 46 | im = cv2.cvtColor(im, cv2.COLOR_BGR2RGB) 47 | 48 | # Run through predictor 49 | outputs = predictor(im) 50 | 51 | # Visualize 52 | v = Visualizer(im[:, :, ::-1], 53 | metadata=fashion_metadata, 54 | scale=0.8, 55 | instance_mode=ColorMode.IMAGE_BW # remove the colors of unsegmented pixels 56 | ) 57 | # Bring the data back to CPU before passing to Numpy to draw 58 | v = v.draw_instance_predictions(outputs["instances"].to("cpu")) 59 | imshow(v.get_image()[:, :, ::-1]) 60 | 61 | imsave(f"{path_output}/{time_stamp}.png", v.get_image()[:, :, ::-1]) 62 | 63 | 64 | def predicted_image_show(path_image_file, predictor, fashion_metadata): 65 | """ 66 | Visualize and save an image predicted using the given predictor and labelled with the fashion data. 67 | 68 | :param dict_test: 69 | :return: 70 | """ 71 | time_stamp = datetime.now().isoformat().replace(":", "") 72 | 73 | for path in os.listdir(path_image_file): 74 | 75 | full_path = os.path.join(path_image_file, path) 76 | print(full_path) 77 | 78 | 79 | im = cv2.imread(full_path) 80 | im = cv2.cvtColor(im, cv2.COLOR_BGR2RGB) 81 | 82 | # Run through predictor 83 | outputs = predictor(im) 84 | 85 | # Visualize 86 | v = Visualizer(im[:, :, ::-1], 87 | metadata=fashion_metadata, 88 | scale=0.8, 89 | instance_mode=ColorMode.IMAGE_BW # remove the colors of unsegmented pixels 90 | ) 91 | # Bring the data back to CPU before passing to Numpy to draw 92 | v = v.draw_instance_predictions(outputs["instances"].to("cpu")) 93 | imshow(v.get_image()[:, :, ::-1]) 94 | 95 | imsave(f"{path_output}/{path}.png", v.get_image()[:, :, ::-1]) 96 | 97 | 98 | def load_model_predict_image(path_image=path_images_local): 99 | # cfg = setup(args) 100 | cfg = initialize_imaterialist_config() 101 | 102 | # Merge from TRAINED config file. 103 | cfg.merge_from_file("/home/julien/data-science/kaggle/imaterialist/configs/exp06.yaml") 104 | update_weights_outpath(cfg, "/home/julien/data-science/kaggle/imaterialist/output/exp03/model_0109999.pth") 105 | 106 | # Set max input size 107 | cfg.INPUT.MAX_SIZE_TEST = 1024 108 | 109 | # Generate Predictor 110 | predictor = iMatPredictor(cfg) 111 | 112 | datadict_val = pickle.load(open(path_data_interim / 'imaterialist_test_multihot_n=100.p', 'rb')) 113 | register_datadict(datadict_val, "sample_fashion_test") 114 | 115 | # This small set of data just to provide label. 116 | fashion_metadata = get_fashion_metadata() 117 | 118 | # Call the visualizer. 
119 | predicted_image_show(path_image, predictor, fashion_metadata) 120 | 121 | if __name__ == '__main__': 122 | load_model_predict_image() 123 | 124 | -------------------------------------------------------------------------------- /configs/exp01.yaml: -------------------------------------------------------------------------------- 1 | DATALOADER: 2 | NUM_WORKERS: 1 3 | MODEL: 4 | ROI_HEADS: 5 | NAME: "StandardROIHeads" 6 | SCORE_THRESH_TEST: 0.3 7 | DATASETS: 8 | TRAIN: ("sample_fashion_train", ) 9 | TEST: ("sample_fashion_test",) 10 | SOLVER: 11 | MAX_ITER: 1000 12 | OUTPUT_DIR: '/home/julien/data-science/kaggle/imaterialist/output/exp01' 13 | 14 | 15 | -------------------------------------------------------------------------------- /configs/exp02.yaml: -------------------------------------------------------------------------------- 1 | 2 | MODEL: 3 | ROI_HEADS: 4 | NAME: "StandardROIHeads" 5 | SCORE_THRESH_TEST: 0.5 6 | DATASETS: 7 | TRAIN: ("sample_fashion_train", ) 8 | TEST: ("sample_fashion_test",) 9 | SOLVER: 10 | MAX_ITER: 50000 11 | IMS_PER_BATCH: 4 12 | OUTPUT_DIR: '/home/julien/data-science/kaggle/imaterialist/output/exp02' -------------------------------------------------------------------------------- /configs/exp03.yaml: -------------------------------------------------------------------------------- 1 | 2 | MODEL: 3 | ROI_BOX_HEAD: 4 | NUM_FC: 2 5 | ROI_HEADS: 6 | NAME: "StandardROIHeads" 7 | SCORE_THRESH_TEST: 0.5 8 | DATASETS: 9 | TRAIN: ("sample_fashion_train", ) 10 | TEST: ("sample_fashion_test",) 11 | SOLVER: 12 | MAX_ITER: 120000 13 | IMS_PER_BATCH: 8 14 | BASE_LR: 0.0004 15 | TEST: 16 | DETECTIONS_PER_IMAGE: 30 17 | OUTPUT_DIR: '/home/julien/data-science/kaggle/imaterialist/output/exp03' 18 | 19 | -------------------------------------------------------------------------------- /configs/exp04.yaml: -------------------------------------------------------------------------------- 1 | DATALOADER: 2 | NUM_WORKERS: 1 3 | MODEL: 4 | ROI_BOX_HEAD: 5 | NUM_FC: 2 6 | ROI_HEADS: 7 | NAME: "StandardROIHeads" 8 | SCORE_THRESH_TEST: 0.5 9 | DATASETS: 10 | TRAIN: ("sample_fashion_train", ) 11 | TEST: ("sample_fashion_test",) 12 | SOLVER: 13 | MAX_ITER: 120000 14 | IMS_PER_BATCH: 2 15 | BASE_LR: 0.0004 16 | OUTPUT_DIR: '/home/julien/data-science/kaggle/imaterialist/output/exp04' 17 | -------------------------------------------------------------------------------- /configs/exp05.yaml: -------------------------------------------------------------------------------- 1 | 2 | INPUT: 3 | MAX_SIZE_TEST: 1024 4 | MODEL: 5 | ROI_BOX_HEAD: 6 | NUM_FC: 2 7 | ROI_HEADS: 8 | NAME: "StandardROIHeads" 9 | SCORE_THRESH_TEST: 0.5 10 | DATASETS: 11 | TRAIN: ("sample_fashion_train", ) 12 | TEST: ("sample_fashion_test",) 13 | SOLVER: 14 | MAX_ITER: 120000 15 | IMS_PER_BATCH: 8 16 | BASE_LR: 0.0004 17 | OUTPUT_DIR: '/home/julien/data-science/kaggle/imaterialist/output/exp05' 18 | -------------------------------------------------------------------------------- /configs/exp06.yaml: -------------------------------------------------------------------------------- 1 | 2 | INPUT: 3 | MAX_SIZE_TEST: 1024 4 | MODEL: 5 | ROI_BOX_HEAD: 6 | NUM_FC: 2 7 | ROI_HEADS: 8 | NAME: "StandardROIHeads" 9 | SCORE_THRESH_TEST: 0.5 10 | DATASETS: 11 | TRAIN: ("sample_fashion_train", ) 12 | TEST: ("sample_fashion_test",) 13 | SOLVER: 14 | MAX_ITER: 500 15 | IMS_PER_BATCH: 4 16 | BASE_LR: 0.004 17 | OUTPUT_DIR: '/home/julien/data-science/kaggle/imaterialist/output/images' 18 | 
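# Note: each configs/exp*.yaml only lists overrides. At run time it is merged on
# top of the base config built by add_imaterialist_config() in
# imaterialist/config.py, which starts from the COCO mask_rcnn_R_50_FPN_3x
# model-zoo config and sets ROI_HEADS.NUM_CLASSES=46 and ROI_HEADS.NUM_ATTRIBUTES=295.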
-------------------------------------------------------------------------------- /evaluate_net.py: -------------------------------------------------------------------------------- 1 | """ 2 | Run the coco evaluator on "sample_fashion_test" to evaluate performance using 3 | AP metric 4 | 5 | TODO: add command line interface for all dataset and model weight inputs instead of 6 | having them hard coded. 7 | """ 8 | 9 | 10 | from pathlib import Path 11 | 12 | from detectron2.engine import DefaultTrainer 13 | from detectron2.config import get_cfg 14 | from detectron2.evaluation import COCOEvaluator, inference_on_dataset 15 | from detectron2.data import build_detection_test_loader 16 | 17 | from iMaterialist2020.imaterialist.config import add_imaterialist_config 18 | from iMaterialist2020.imaterialist.data.datasets.coco import register_datadict 19 | from iMaterialist2020.imaterialist.data.dataset_mapper import iMatDatasetMapper 20 | from environs import Env 21 | 22 | env = Env() 23 | env.read_env() 24 | 25 | # Get training dataframe 26 | path_data = Path(env("path_raw")) 27 | path_image = path_data / "train/" 28 | path_output = Path(env("path_output")) 29 | path_eval = Path(env("path_eval")) 30 | path_data_interim = Path(env("path_interim")) 31 | path_model = Path(env("path_model")) 32 | 33 | if __name__=="__main__": 34 | # load dataframe 35 | # fixme: this number needs to update or dynamic 36 | datadic_train = pd.read_feather(path_data_interim / 'imaterialist_train_multihot_n=266721.feather') 37 | datadic_test = pd.read_feather(path_data_interim / 'imaterailist_test_multihot_n=66680.feather') 38 | 39 | register_datadict(datadic_train, "sample_fashion_train") 40 | register_datadict(datadic_test, "sample_fashion_test") 41 | 42 | # cfg = setup(args) 43 | cfg = get_cfg() 44 | 45 | # Add Solver etc. 46 | add_imaterialist_config(cfg) 47 | 48 | # Merge from config file. 49 | config_file = "/home/dyt811/Git/cvnnig/iMaterialist2020/configs/config.yaml" 50 | cfg.merge_from_file(config_file) 51 | 52 | # Load the final weight. 
53 | cfg.MODEL.WEIGHTS = str(path_model / "model_0109999.pth") 54 | cfg.OUTPUT_DIR = str(path_output) 55 | 56 | trainer = DefaultTrainer(cfg) 57 | 58 | # load weights 59 | trainer.resume_or_load(resume=False) 60 | 61 | # Evaluate performance using AP metric implemented in COCO API 62 | evaluator = COCOEvaluator("sample_fashion_test", cfg, False, output_dir=str(path_output)) 63 | val_loader = build_detection_test_loader(cfg, "sample_fashion_test", mapper=iMatDatasetMapper(cfg)) 64 | inference_on_dataset(trainer.model, val_loader, evaluator) -------------------------------------------------------------------------------- /imaterialist/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Julienbeaulieu/iMaterialist2020-Image-Segmentation-on-Detectron2/8d96069bb021dd374fb8de310bb454d351971974/imaterialist/__init__.py -------------------------------------------------------------------------------- /imaterialist/config.py: -------------------------------------------------------------------------------- 1 | from detectron2.config import CfgNode as CN 2 | from detectron2 import model_zoo 3 | from detectron2.config import get_cfg 4 | from pathlib import Path 5 | from environs import Env 6 | from detectron2.engine import default_setup 7 | import detectron2.utils.comm as comm 8 | from detectron2.utils.logger import setup_logger 9 | import os 10 | 11 | env = Env() 12 | env.read_env() 13 | 14 | def add_imaterialist_config(cfg: CN): 15 | """ 16 | Add config for imaterialist2 head 17 | """ 18 | 19 | _C = cfg 20 | 21 | _C.merge_from_file(model_zoo.get_config_file( 22 | "COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml")) 23 | _C.MODEL.WEIGHTS = model_zoo.get_checkpoint_url( 24 | "COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml") 25 | 26 | ##### Input ##### 27 | # Set a smaller image size than default to avoid memory problems 28 | 29 | # Size of the smallest side of the image during training 30 | # _C.INPUT.MIN_SIZE_TRAIN = (400,) 31 | # # Maximum size of the side of the image during training 32 | # _C.INPUT.MAX_SIZE_TRAIN = 600 33 | 34 | # # Size of the smallest side of the image during testing. Set to zero to disable resize in testing. 35 | # _C.INPUT.MIN_SIZE_TEST = 400 36 | # # Maximum size of the side of the image during testing 37 | # _C.INPUT.MAX_SIZE_TEST = 600 38 | 39 | _C.SOLVER.IMS_PER_BATCH = 2 40 | _C.SOLVER.BASE_LR = 0.0004 41 | _C.SOLVER.MAX_ITER = 50000 42 | _C.MODEL.ROI_HEADS.BATCH_SIZE_PER_IMAGE = 512 # default: 512 43 | _C.MODEL.ROI_HEADS.NUM_CLASSES = 46 # 46 classes in iMaterialist 44 | _C.MODEL.ROI_HEADS.NUM_ATTRIBUTES = 295 45 | # this should ALWAYS be left at 1 because it will double or more memory usage if higher. 46 | _C.DATALOADER.NUM_WORKERS = 1 47 | 48 | def initialize_imaterialist_config(): 49 | """ 50 | Cannot directly merge until intialize the imaterialist config properly in the first place. 51 | :return: 52 | """ 53 | cfg = get_cfg() 54 | add_imaterialist_config(cfg) 55 | return cfg 56 | 57 | def setup_prediction(args): 58 | """ 59 | Setup up the cfg per the prediction requirement. 60 | Will use weight specified in the environmental variable. 
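The weights path is read from the `path_trained_weights` environment variable and the output directory from `path_output` (see update_weights_outpath below).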
61 | :param args: 62 | :return: 63 | """ 64 | cfg = initialize_imaterialist_config() 65 | 66 | # Merge from pretrained or opts 67 | cfg.merge_from_file(args.config_file) 68 | cfg.merge_from_list(args.opts) 69 | 70 | cfg = update_weights_outpath(cfg, env("path_trained_weights")) 71 | # cfg must have (cfg.DATASETS.TEST[0]) 72 | 73 | # Set max input size 74 | #cfg.INPUT.MAX_SIZE_TEST = 1024 75 | 76 | cfg.freeze() 77 | default_setup(cfg, args) 78 | # Setup logger for "imaterialist" module 79 | setup_logger(output=cfg.OUTPUT_DIR, distributed_rank=comm.get_rank(), name="imaterialist") 80 | return cfg 81 | 82 | def update_weights_outpath(cfg, weights_path): 83 | """ 84 | Update these two attributes using environmental variable because the CFG past along was hard coded. 85 | :param cfg: 86 | :param weights_path: 87 | :return: 88 | """ 89 | # Add the trained weights 90 | cfg.MODEL.WEIGHTS = weights_path 91 | cfg.OUTPUT_DIR = env("path_output") 92 | 93 | return cfg 94 | -------------------------------------------------------------------------------- /imaterialist/data/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Julienbeaulieu/iMaterialist2020-Image-Segmentation-on-Detectron2/8d96069bb021dd374fb8de310bb454d351971974/imaterialist/data/__init__.py -------------------------------------------------------------------------------- /imaterialist/data/dataset_mapper.py: -------------------------------------------------------------------------------- 1 | import copy 2 | import torch 3 | import numpy as np 4 | from fvcore.common.file_io import PathManager 5 | 6 | from detectron2.data import DatasetMapper 7 | from detectron2.data import transforms as T 8 | from detectron2.data import detection_utils as utils 9 | 10 | from .structures import Attributes 11 | 12 | class iMatDatasetMapper(DatasetMapper): 13 | """ 14 | A callable which takes a dataset dict in Detectron2 Dataset format, 15 | and maps it into a format used by the model. 16 | 17 | This is a customized version of the default DatasetMapper where we add attributes 18 | to the Instances class. 19 | 20 | The callable currently does the following: 21 | 22 | 1. Read the image from "file_name" 23 | 2. Applies cropping/geometric transforms to the image and annotations 24 | 3. Prepare data and annotations to Tensor and :class:`Instances` (including 25 | attributes) 26 | """ 27 | def __call__(self, dataset_dict): 28 | """ 29 | Args: 30 | dataset_dict (dict): Metadata of one image, in Detectron2 Dataset format. 31 | 32 | Returns: 33 | dict: a format that builtin models in detectron2 accept 34 | """ 35 | dataset_dict = copy.deepcopy(dataset_dict) # it will be modified by code below 36 | # USER: Write your own image loading if it's not from a file 37 | image = utils.read_image(dataset_dict["file_name"], format=self.img_format) 38 | utils.check_image_size(dataset_dict, image) 39 | 40 | if "annotations" not in dataset_dict: 41 | image, transforms = T.apply_transform_gens( 42 | ([self.crop_gen] if self.crop_gen else []) + self.tfm_gens, image 43 | ) 44 | else: 45 | # Crop around an instance if there are instances in the image. 
46 | # USER: Remove if you don't use cropping 47 | if self.crop_gen: 48 | crop_tfm = utils.gen_crop_transform_with_instance( 49 | self.crop_gen.get_crop_size(image.shape[:2]), 50 | image.shape[:2], 51 | np.random.choice(dataset_dict["annotations"]), 52 | ) 53 | image = crop_tfm.apply_image(image) 54 | image, transforms = T.apply_transform_gens(self.tfm_gens, image) 55 | if self.crop_gen: 56 | transforms = crop_tfm + transforms 57 | 58 | image_shape = image.shape[:2] # h, w 59 | 60 | # Pytorch's dataloader is efficient on torch.Tensor due to shared-memory, 61 | # but not efficient on large generic data structures due to the use of pickle & mp.Queue. 62 | # Therefore it's important to use torch.Tensor. 63 | dataset_dict["image"] = torch.as_tensor(np.ascontiguousarray(image.transpose(2, 0, 1))) 64 | 65 | # USER: Remove if you don't use pre-computed proposals. 66 | if self.load_proposals: 67 | utils.transform_proposals( 68 | dataset_dict, image_shape, transforms, self.min_box_side_len, self.proposal_topk 69 | ) 70 | 71 | if not self.is_train: 72 | # USER: Modify this if you want to keep them for some reason. 73 | dataset_dict.pop("annotations", None) 74 | dataset_dict.pop("sem_seg_file_name", None) 75 | return dataset_dict 76 | 77 | if "annotations" in dataset_dict: 78 | # USER: Modify this if you want to keep them for some reason. 79 | for anno in dataset_dict["annotations"]: 80 | if not self.mask_on: 81 | anno.pop("segmentation", None) 82 | if not self.keypoint_on: 83 | anno.pop("keypoints", None) 84 | 85 | # USER: Implement additional transformations if you have other types of data 86 | annos = [ 87 | utils.transform_instance_annotations( 88 | obj, transforms, image_shape, keypoint_hflip_indices=self.keypoint_hflip_indices 89 | ) 90 | for obj in dataset_dict.pop("annotations") 91 | if obj.get("iscrowd", 0) == 0 92 | ] 93 | instances = utils.annotations_to_instances( 94 | annos, image_shape, mask_format=self.mask_format 95 | ) 96 | # Create a tight bounding box from masks, useful when image is cropped 97 | if self.crop_gen and instances.has("gt_masks"): 98 | instances.gt_boxes = instances.gt_masks.get_bounding_boxes() 99 | 100 | ################################# 101 | # Custom attributes section 102 | ################################# 103 | 104 | # Get attributes from annos 105 | if len(annos) and 'attributes' in annos[0]: 106 | 107 | # get a list of list of attributes 108 | gt_attributes = [x['attributes'] for x in annos] 109 | gt_attributes = torch.tensor(gt_attributes, dtype=torch.float32) 110 | # Put attributes in Attributes class holder and add them to instances 111 | 112 | # Using Attributes(gt_attributes) needs more work - currently fails 113 | instances.gt_attributes = gt_attributes 114 | 115 | # End attributes section 116 | 117 | dataset_dict["instances"] = utils.filter_empty_instances(instances) 118 | return dataset_dict -------------------------------------------------------------------------------- /imaterialist/data/datasets/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Julienbeaulieu/iMaterialist2020-Image-Segmentation-on-Detectron2/8d96069bb021dd374fb8de310bb454d351971974/imaterialist/data/datasets/__init__.py -------------------------------------------------------------------------------- /imaterialist/data/datasets/coco.py: -------------------------------------------------------------------------------- 1 | import cv2 2 | import random 3 | import numpy as np 4 | 5 | from detectron2.structures 
import BoxMode 6 | from detectron2.data import DatasetCatalog, MetadataCatalog 7 | from imaterialist.data.datasets.make_dataset import load_dataset_into_dataframes 8 | from imaterialist.data.datasets.rle_utils import rle_decode_string 9 | 10 | 11 | 12 | # https://detectron2.readthedocs.io/tutorials/datasets.html 13 | # https://colab.research.google.com/drive/16jcaJoc6bCFAQ96jDe2HwtXj7BMD_-m5 14 | def convert_to_datadict(df_input): 15 | """ 16 | :param df_input: 17 | :return: 18 | """ 19 | dataset_dicts = [] 20 | 21 | # Find the unique list of imageId, we will build the 22 | list_unique_ImageIds = df_input['ImageId'].unique().tolist() 23 | for idx, filename in enumerate(list_unique_ImageIds): 24 | 25 | record = {} 26 | 27 | # Convert to int otherwise evaluation will throw an error 28 | record['height'] = int(df_input[df_input['ImageId'] == filename]['Height'].values[0]) 29 | record['width'] = int(df_input[df_input['ImageId'] == filename]['Width'].values[0]) 30 | 31 | record['file_name'] = filename 32 | record['image_id'] = idx 33 | 34 | objs = [] 35 | for index, row in df_input[(df_input['ImageId'] == filename)].iterrows(): 36 | 37 | # Get binary mask 38 | mask = rle_decode_string(row['EncodedPixels'], row['Height'], row['Width']) 39 | 40 | # opencv 4.2+ 41 | # Transform the mask from binary to polygon format 42 | contours, hierarchy = cv2.findContours((mask).astype(np.uint8), cv2.RETR_TREE, 43 | cv2.CHAIN_APPROX_SIMPLE) 44 | 45 | # opencv 3.2 46 | # mask_new, contours, hierarchy = cv2.findContours((mask).astype(np.uint8), cv2.RETR_TREE, 47 | # cv2.CHAIN_APPROX_SIMPLE) 48 | 49 | segmentation = [] 50 | 51 | for contour in contours: 52 | contour = contour.flatten().tolist() 53 | # segmentation.append(contour) 54 | if len(contour) > 4: 55 | segmentation.append(contour) 56 | 57 | # Data for each mask 58 | obj = { 59 | 'bbox': [row['x0'], row['y0'], row['x1'], row['y1']], 60 | 'bbox_mode': BoxMode.XYXY_ABS, 61 | 'category_id': row['ClassId'], 62 | 'attributes': row['AttributesIds'], # New key: attributes 63 | 'segmentation': segmentation, 64 | } 65 | objs.append(obj) 66 | record['annotations'] = objs 67 | dataset_dicts.append(record) 68 | return dataset_dicts 69 | 70 | def register_datadict(datadict_input, label_dataset:str = "sample_fashion_train"): 71 | """ 72 | Register the data type with the Catalog function from Detectron2 code base. 
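:param datadict_input: DataFrame with one row per mask (ImageId, Height, Width, EncodedPixels, ClassId, AttributesIds plus x0/y0/x1/y1 box columns), as produced by make_dataset.create_datadict.
:param label_dataset: name under which the dataset is registered in DatasetCatalog/MetadataCatalog.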
73 | fixme: currently hard coded as sample_fashion_train sample_fashion_test 74 | """ 75 | _, _, df_categories = load_dataset_into_dataframes() 76 | # Register the train and test and set metadata 77 | 78 | DatasetCatalog.register(label_dataset, lambda d=datadict_input: convert_to_datadict(d)) 79 | MetadataCatalog.get(label_dataset).set(thing_classes=list(df_categories.name)) -------------------------------------------------------------------------------- /imaterialist/data/datasets/make_dataset.py: -------------------------------------------------------------------------------- 1 | import json 2 | import logging 3 | import pickle 4 | import numpy as np 5 | import pandas as pd 6 | from pathlib import Path 7 | from sklearn import preprocessing 8 | 9 | from imaterialist.data.datasets.rle_utils import rle_decode_string, rle2bbox 10 | 11 | from environs import Env 12 | 13 | env = Env() 14 | env.read_env() 15 | 16 | # Get training dataframe 17 | path_data = Path(env("path_raw")) 18 | path_image = path_data / "train/" 19 | path_data_interim = Path(env("path_interim")) 20 | 21 | def load_category_attributes(path_data: Path = path_data): 22 | # Get label descriptions 23 | with open(path_data / 'label_descriptions.json', 'r') as file: 24 | label_desc = json.load(file) 25 | 26 | df_categories = pd.DataFrame(label_desc['categories']) 27 | df_attributes = pd.DataFrame(label_desc['attributes']) 28 | 29 | return df_attributes, df_categories 30 | 31 | 32 | def load_dataset_into_dataframes(path_data: Path = path_data, n_cases: int = 0): 33 | """ 34 | Get all the CSV from the competition into dataframes. 35 | """ 36 | 37 | path_label = path_data / 'train.csv' 38 | df = pd.read_csv(path_label) 39 | 40 | # Just getting a smaller df to make the rest run faster 41 | if n_cases == 0: 42 | df = df.copy() 43 | elif n_cases != 0: 44 | df = df[:n_cases].copy() 45 | 46 | # Get label descriptions 47 | with open(path_data/'label_descriptions.json', 'r') as file: 48 | label_desc = json.load(file) 49 | 50 | df_categories = pd.DataFrame(label_desc['categories']) 51 | df_attributes = pd.DataFrame(label_desc['attributes']) 52 | 53 | return df, df_attributes, df_categories 54 | 55 | 56 | def attr_str_to_list(df, df_attributes): 57 | ''' 58 | Function that transforms DataFrame AttributeIds which are of type string into a 59 | list of integers. 
Strings must be converted because they cannot be transformed into Tensors 60 | ''' 61 | lb = preprocessing.LabelBinarizer() 62 | 63 | attribute_list = df_attributes.id.unique() 64 | attribute_list = np.sort(np.insert(attribute_list, 1, 999)) 65 | 66 | lb.fit(attribute_list) 67 | 68 | 69 | # cycle through all the non NaN rows - NaN causes an error 70 | for index, row in df.iterrows(): 71 | 72 | # Treating str differently than int 73 | if isinstance(row['AttributesIds'], str): 74 | 75 | # Convert each row's string into a list of strings 76 | df['AttributesIds'][index] = row['AttributesIds'].split(',') 77 | 78 | # Convert each string in the list to int 79 | df['AttributesIds'][index] = [int(x) for x in df['AttributesIds'][index]] 80 | 81 | # If int - make it a list of length 1 82 | if isinstance(row['AttributesIds'], int): 83 | df['AttributesIds'][index] = [999] 84 | 85 | df['AttributesIds'][index] = lb.transform(df['AttributesIds'][index]).sum(axis=0) 86 | 87 | 88 | def create_datadict(df_labels_masks, df_attributes): 89 | """ 90 | Creates the data dictionary necessary for Detectron2 which incorporated the additional following information: 91 | ImageId 92 | x0 93 | y0 94 | x1 95 | y1 96 | """ 97 | 98 | # Get image file path required for dict and add it to our data frame 99 | 100 | # Get only the first 50K labels, out of 333K labels. 101 | datedic_labels_masks = df_labels_masks.copy() # df sample 102 | 103 | # Append ImageId information. 104 | datedic_labels_masks['ImageId'] = str(path_image) + "/" + datedic_labels_masks['ImageId'] + ".jpg" 105 | 106 | # Get bboxes for each mask with our helper function 107 | bboxes = [rle2bbox(c.EncodedPixels, (c.Height, c.Width)) for n, c in datedic_labels_masks.iterrows()] 108 | 109 | # Turn list into array for proper indexing 110 | bboxes_array = np.array(bboxes) 111 | 112 | # Add each x, y coordinate as a column 113 | datedic_labels_masks['x0'], datedic_labels_masks['y0'], datedic_labels_masks['x1'], datedic_labels_masks['y1'] = bboxes_array[:, 0], bboxes_array[:, 1], bboxes_array[:,2], bboxes_array[:, 3] 114 | 115 | datedic_labels_masks = datedic_labels_masks.astype({"x0": int, "y0": int, "x1":int, 'y1':int}) 116 | 117 | #Replace NaNs from AttributeIds by 999 118 | datedic_labels_masks = datedic_labels_masks.fillna(999) 119 | 120 | # Turn attributes from string to list of ints with padding 121 | attr_str_to_list(datedic_labels_masks, df_attributes) 122 | 123 | return datedic_labels_masks 124 | 125 | def main(n_sample_size: int = 0, train_test_split: float = 0.8): 126 | """ 127 | Runs data processing scripts to turn raw train.csv dataframe from (../raw) into 128 | cleaned dataframe ready to be used by our dataset_dict. 129 | :param n_sample_size: 130 | :return: 131 | """ 132 | logger = logging.getLogger(__name__) 133 | logger.info('making final data set from raw data') 134 | 135 | data_full, df_attributes, _ = load_dataset_into_dataframes(n_cases=500) 136 | datadic_full = create_datadict(data_full, df_attributes) 137 | 138 | # if n_sample_size not specified, use entire data set. 139 | # If too large, use entire data set. 
140 | if n_sample_size == 0 or int(n_sample_size) > datadic_full.shape[0]: 141 | n_sample_size = len(datadic_full) 142 | 143 | 144 | # Arbitrary split in training / testing dataframes 145 | n_train: int = round(n_sample_size * train_test_split) 146 | n_test: int = n_sample_size - n_train 147 | datadic_train = datadic_full[:n_train].copy() 148 | datadic_val = datadic_full[-n_test:].copy() 149 | 150 | pickle.dump(datadic_train, open(path_data_interim / f'imaterialist_train_multihot_n={n_train}.p', "wb")) 151 | pickle.dump(datadic_val, open(path_data_interim / f'imaterialist_test_multihot_n={n_test}.p', "wb")) 152 | 153 | # # Saving to feather format - faster than pickle 154 | # datadic_train.reset_index().to_feather(path_data_interim / f'imaterialist_train_multihot_n={n_train}.feather') 155 | # datadic_val.reset_index().to_feather(path_data_interim / f'imaterailist_test_multihot_n={n_test}.feather') 156 | 157 | if __name__ == '__main__': 158 | log_fmt = '%(asctime)s - %(name)s - %(levelname)s - %(message)s' 159 | logging.basicConfig(level=logging.INFO, format=log_fmt) 160 | 161 | # Change size to create larger datasets 162 | main(500) 163 | -------------------------------------------------------------------------------- /imaterialist/data/datasets/rle_utils.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | from PIL import Image 3 | import math 4 | 5 | # Rle helper functions 6 | 7 | def rle_decode_string(rle, h, w): 8 | ''' 9 | rle: run-length encoded image mask, as string 10 | h: heigh of image on which RLE was produced 11 | w: width of image on which RLE was produced 12 | returns a binary mask with the same shape 13 | ''' 14 | mask = np.full(h * w, 0, dtype=np.uint8) 15 | annotation = [int(x) for x in rle.split(' ')] 16 | for i, start_pixel in enumerate(annotation[::2]): 17 | mask[start_pixel: start_pixel + annotation[2 * i + 1]] = 1 18 | mask = mask.reshape((h, w), order='F') 19 | 20 | return mask 21 | 22 | 23 | def mask_to_KaggleRLE_old(img): 24 | ''' 25 | Source: https://www.kaggle.com/lifa08/run-length-encode-and-decode 26 | img: numpy array, 1 - mask, 0 - background 27 | Returns run length as string formated 28 | ''' 29 | pixels = img.flatten() 30 | pixels = np.concatenate([[0], pixels, [0]]) 31 | runs = np.where(pixels[1:] != pixels[:-1])[0] + 1 32 | runs[1::2] -= runs[::2] 33 | return ' '.join(str(x) for x in runs) 34 | 35 | def mask_to_KaggleRLENew(img): 36 | pixels = img.T.flatten() 37 | # We need to allow for cases where there is a '1' at either end of the sequence. 38 | # We do this by padding with a zero at each end when needed. 
39 | use_padding = False 40 | if pixels[0] or pixels[-1]: 41 | use_padding = True 42 | pixel_padded = np.zeros([len(pixels) + 2], dtype=pixels.dtype) 43 | pixel_padded[1:-1] = pixels 44 | pixels = pixel_padded 45 | rle = np.where(pixels[1:] != pixels[:-1])[0] + 2 46 | if use_padding: 47 | rle = rle - 1 48 | rle[1::2] = rle[1::2] - rle[:-1:2] 49 | return ' '.join(str(x) for x in rle) 50 | 51 | 52 | 53 | def mask_to_KaggleRLE_downscale(img, max_size=1024): 54 | ''' 55 | Adaptive funciton to first DOWNSCALE the image before running RLE 56 | # Source: https://stackoverflow.com/a/28453021 57 | img: numpy array, 1 - mask, 0 - background 58 | Returns run length as string formated 59 | ''' 60 | # img is a tensor 61 | img_array = np.array(img) 62 | pil_image = Image.fromarray(img_array) 63 | width_current = pil_image.size[0] 64 | height_current = pil_image.size[1] 65 | 66 | longest_edge = max(width_current, height_current) 67 | 68 | # Always rescale 69 | if (width_current > height_current): 70 | new_width = max_size 71 | scaled_height = max_size / float(width_current) * height_current 72 | # Must floor the mask 73 | new_height = int(math.floor(scaled_height)) 74 | else: 75 | scale_width = max_size / float(height_current) * width_current 76 | # Must floor the mask 77 | new_width = int(math.floor(scale_width)) 78 | new_height = max_size 79 | 80 | pil_image = pil_image.resize((new_width, new_height), Image.NEAREST) 81 | 82 | image_array = np.array(pil_image) 83 | return mask_to_KaggleRLENew(image_array) 84 | 85 | 86 | def rle_encode(mask): 87 | pixels = mask.T.flatten() 88 | # We need to allow for cases where there is a '1' at either end of the sequence. 89 | # We do this by padding with a zero at each end when needed. 90 | use_padding = False 91 | if pixels[0] or pixels[-1]: 92 | use_padding = True 93 | pixel_padded = np.zeros([len(pixels) + 2], dtype=pixels.dtype) 94 | pixel_padded[1:-1] = pixels 95 | pixels = pixel_padded 96 | rle = np.where(pixels[1:] != pixels[:-1])[0] + 1 97 | if use_padding: 98 | rle = rle - 1 99 | rle[1::2] = rle[1::2] - rle[:-1:2] 100 | return rle 101 | 102 | 103 | def rle2bbox(rle, shape): 104 | ''' 105 | Get a bbox from a mask which is required for Detectron 2 dataset 106 | rle: run-length encoded image mask, as string 107 | shape: (height, width) of image on which RLE was produced 108 | Returns (x0, y0, x1, y1) tuple describing the bounding box of the rle mask 109 | 110 | Note on image vs np.array dimensions: 111 | 112 | np.array implies the `[y, x]` indexing order in terms of image dimensions, 113 | so the variable on `shape[0]` is `y`, and the variable on the `shape[1]` is `x`, 114 | hence the result would be correct (x0,y0,x1,y1) in terms of image dimensions 115 | for RLE-encoded indices of np.array (which are produced by widely used kernels 116 | and are used in most kaggle competitions datasets) 117 | ''' 118 | 119 | a = np.fromiter(rle.split(), dtype=np.uint) 120 | a = a.reshape((-1, 2)) # an array of (start, length) pairs 121 | a[:, 0] -= 1 # `start` is 1-indexed 122 | 123 | y0 = a[:, 0] % shape[0] 124 | y1 = y0 + a[:, 1] 125 | if np.any(y1 > shape[0]): 126 | # got `y` overrun, meaning that there are a pixels in mask on 0 and shape[0] position 127 | y0 = 0 128 | y1 = shape[0] 129 | else: 130 | y0 = np.min(y0) 131 | y1 = np.max(y1) 132 | 133 | x0 = a[:, 0] // shape[0] 134 | x1 = (a[:, 0] + a[:, 1]) // shape[0] 135 | x0 = np.min(x0) 136 | x1 = np.max(x1) 137 | 138 | if x1 > shape[1]: 139 | # just went out of the image dimensions 140 | raise ValueError("invalid RLE or 
image dimensions: x1=%d > shape[1]=%d" % ( 141 | x1, shape[1] 142 | )) 143 | 144 | return x0, y0, x1, y1 145 | 146 | def rle_decode_new(rle_str, mask_shape): 147 | # This is the reverse decoding mechanism 148 | # Source: https://www.kaggle.com/stainsby/fast-tested-rle-and-input-routines 149 | s = rle_str.split() 150 | starts, lengths = [np.asarray(x, dtype=int) for x in (s[0:][::2], s[1:][::2])] 151 | starts -= 1 152 | ends = starts + lengths 153 | mask = np.zeros(np.prod(mask_shape), dtype=np.uint8) 154 | for lo, hi in zip(starts, ends): 155 | mask[lo:hi] = 1 156 | return mask.reshape(mask_shape[::-1]).T 157 | 158 | def refine_masks(masks): 159 | """ 160 | Take a series of masks 161 | sort them from small to large (yet preserve order information) 162 | Add each one to union, keep track of already counted pixel coordinate positions to make the masks "uniquefy" 163 | Convert each "uniquefied" mask to RLE 164 | return them in the order originally found 165 | 166 | :param masks: 167 | :param n_labels: 168 | :return: 169 | """ 170 | # Source: https://github.com/abhishekkrthakur/imat-fashion/blob/master/predict.py 171 | n_labels = len(masks) 172 | 173 | # Early return if nothing. 174 | if n_labels == 0: 175 | # Return empty string 176 | return [] 177 | 178 | masks = np.array(masks) 179 | 180 | # Compute the areas of each mask 181 | #mask_areas = np.sum(masks.reshape(-1, masks.shape[0]), axis=1) 182 | masks_areas = [np.sum(np.sum(mask)) for mask in masks] 183 | 184 | # Preserve the original order? 185 | masks_areas_ordered = list(enumerate(masks_areas)) 186 | masks_areas_ordered_sorted = sorted(masks_areas_ordered, key=lambda a: a[1]) 187 | 188 | # One reference mask is created to be incrementally populated 189 | # This has same as number of labels. 190 | 191 | # Generate a blank union mask, which all labels will be iteratively updating to ensure no overlap pixels. 192 | union_mask = np.zeros(masks[0].shape, dtype=bool) 193 | 194 | # Refined masks 195 | uniquefied_mask = [] 196 | 197 | # Iterate from the smallest, so smallest ones are preserved 198 | # Second parameter is area, useless. 199 | for mask_index, _ in masks_areas_ordered_sorted: 200 | # Current Mask: 201 | mask_current = masks[mask_index, :, :] 202 | 203 | # unionized version fo the current mask: Default to false/0 if not defined. 204 | # not logical_not, it turns all False (default) to True. True&True = True 205 | # All true for the first iteration only. 206 | union_mask_inverted = np.logical_not(union_mask) 207 | 208 | uniquefied_mask_current = np.logical_and( 209 | mask_current, 210 | union_mask_inverted # not logical_not, it turns all False (default) to True. True&True = True 211 | ) 212 | 213 | uniquefied_mask.append((mask_index, uniquefied_mask_current)) 214 | 215 | # update the union mask to include the latest calculation 216 | union_mask = np.logical_or(mask_current, union_mask) 217 | 218 | # sort this by original index. 
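# so the refined RLEs are returned in the same order as the input masks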
219 | uniquefied_mask.sort(key=lambda a: a[0]) 220 | 221 | refined_rle = [] 222 | 223 | # Iterate through masks last axis 224 | for mask_index, uniquefied_mask_current in uniquefied_mask: 225 | 226 | # Change this line to determine whether to use downscaled KaggleRLE or regular KaggleRLE conversion process 227 | rle = mask_to_KaggleRLE_downscale(uniquefied_mask_current) 228 | #rle = mask_to_KaggleRLE_old(uniquefied_mask_current) 229 | #rle = mask_to_KaggleRLE_downscale(uniquefied_mask_current) 230 | refined_rle.append(rle) 231 | # Sanity check on uniquefying reduction 232 | # print(f"Original: {masks_areas[mask_index]}, {np.sum(np.sum(uniquefied_mask_current))}") 233 | 234 | return refined_rle 235 | -------------------------------------------------------------------------------- /imaterialist/data/datasets/rle_utils_old.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | from pycocotools.mask import encode, toBbox 3 | from typing import List 4 | from itertools import groupby 5 | 6 | 7 | def mask_to_uncompressed_CocoRLE(binary_mask): 8 | """ 9 | Source: https://stackoverflow.com/questions/49494337/encode-numpy-array-using-uncompressed-rle-for-coco-dataset 10 | """ 11 | rle = {'counts': [], 'size': list(binary_mask.shape)} 12 | counts = rle.get('counts') 13 | for i, (value, elements) in enumerate(groupby(binary_mask.ravel(order='F'))): 14 | if i == 0 and value == 1: 15 | counts.append(0) 16 | counts.append(len(list(elements))) 17 | return rle 18 | 19 | def mask_to_KaggleRLE(img): 20 | ''' 21 | Source: https://www.kaggle.com/lifa08/run-length-encode-and-decode 22 | img: numpy array, 1 - mask, 0 - background 23 | Returns run length as string formated 24 | ''' 25 | pixels = img.flatten() 26 | pixels = np.concatenate([[0], pixels, [0]]) 27 | runs = np.where(pixels[1:] != pixels[:-1])[0] + 1 28 | runs[1::2] -= runs[::2] 29 | return ' '.join(str(x) for x in runs) 30 | 31 | 32 | def KaggleRLE_to_mask1(mask_rle, shape): 33 | ''' 34 | # Source: https://www.kaggle.com/lifa08/run-length-encode-and-decode 35 | mask_rle: run-length as string formated (start length) 36 | shape: (height,width) of array to return, Height Width Format 37 | Returns numpy array, 1 - mask, 0 - background 38 | 39 | (in fortran format?) 40 | 41 | ''' 42 | s = mask_rle.split() 43 | starts, lengths = [np.asarray(x, dtype=int) for x in (s[0:][::2], s[1:][::2])] 44 | starts -= 1 45 | ends = starts + lengths 46 | img = np.zeros(shape[0] * shape[1], dtype=np.uint8) 47 | 48 | for lo, hi in zip(starts, ends): 49 | img[lo:hi] = 1 50 | return img.reshape(shape) 51 | 52 | 53 | def KaggleRLE_to_CocoRLE(KaggleRLE: str, h: int, w: int) -> List[dict]: 54 | """ 55 | This wrapper function converts kaggle KaggleRLE to binary, then convert that binary to COCORLE. 56 | :param KaggleRLE: 57 | :param h: 58 | :param w: 59 | :return: 60 | """ 61 | # Conver to binary using tried and true masks. 62 | mask = KaggleRLE_to_mask(KaggleRLE, h, w) 63 | 64 | # using PyCoCoAPI to convert binary mask to CocoRLE format, which are re a LIST of dict of run-length encoding of binary masks. 65 | CocoRLE = encode(np.asfortranarray(mask)) 66 | 67 | return CocoRLE 68 | 69 | def KaggleRLE_to_CocoBoundBoxes(KaggleRLE: str, h: int, w: int) -> List[int]: 70 | 71 | # Generate the KaggleRLE in coco format 72 | CocoRLE = KaggleRLE_to_CocoRLE(KaggleRLE, h, w) 73 | 74 | # Generate the BBS using the CocoAPI. 
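# (pycocotools' toBbox returns the box in COCO [x, y, width, height] format)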
75 | CocoBBS = toBbox(CocoRLE) 76 | 77 | return CocoBBS 78 | 79 | def KaggleRLE_to_mask(rle, h, w): 80 | ''' 81 | rle: run-length encoded image mask, as string 82 | h: heigh of image on which KaggleRLE was produced 83 | w: width of image on which KaggleRLE was produced 84 | 85 | returns a binary mask with the same shape 86 | 87 | ''' 88 | mask = np.full(h * w, 0, dtype=np.uint8) 89 | annotation = [int(x) for x in rle.split(' ')] 90 | for i, start_pixel in enumerate(annotation[::2]): 91 | mask[start_pixel: start_pixel + annotation[2 * i + 1]] = 1 92 | mask = mask.reshape((h, w), order='F') 93 | 94 | return mask 95 | 96 | 97 | def KaggleRLE_to_bbox(rle, shape): 98 | ''' 99 | Get a bbox from a mask which is required for Detectron 2 dataset 100 | rle: run-length encoded image mask, as string 101 | shape: (height, width) of image on which KaggleRLE was produced 102 | Returns (x0, y0, x1, y1) tuple describing the bounding box of the rle mask 103 | 104 | Note on image vs np.array dimensions: 105 | 106 | np.array implies the `[y, x]` indexing order in terms of image dimensions, 107 | so the variable on `shape[0]` is `y`, and the variable on the `shape[1]` is `x`, 108 | hence the result would be correct (x0,y0,x1,y1) in terms of image dimensions 109 | for KaggleRLE-encoded indices of np.array (which are produced by widely used kernels 110 | and are used in most kaggle competitions datasets) 111 | ''' 112 | 113 | a = np.fromiter(rle.split(), dtype=np.uint) 114 | a = a.reshape((-1, 2)) # an array of (start, length) pairs 115 | a[:, 0] -= 1 # `start` is 1-indexed 116 | 117 | y0 = a[:, 0] % shape[0] 118 | y1 = y0 + a[:, 1] 119 | if np.any(y1 > shape[0]): 120 | # got `y` overrun, meaning that there are a pixels in mask on 0 and shape[0] position 121 | y0 = 0 122 | y1 = shape[0] 123 | else: 124 | y0 = np.min(y0) 125 | y1 = np.max(y1) 126 | 127 | x0 = a[:, 0] // shape[0] 128 | x1 = (a[:, 0] + a[:, 1]) // shape[0] 129 | x0 = np.min(x0) 130 | x1 = np.max(x1) 131 | 132 | if x1 > shape[1]: 133 | # just went out of the image dimensions 134 | raise ValueError("invalid KaggleRLE or image dimensions: x1=%d > shape[1]=%d" % ( 135 | x1, shape[1] 136 | )) 137 | 138 | return x0, y0, x1, y1 -------------------------------------------------------------------------------- /imaterialist/data/datasets/test_rle.py: -------------------------------------------------------------------------------- 1 | from pycocotools.mask import decode, frPyObjects 2 | import pytest 3 | import numpy as np 4 | import pickle 5 | from pycocotools import mask 6 | import pytest 7 | from rle_utils_old import KaggleRLE_to_CocoRLE, KaggleRLE_to_mask, KaggleRLE_to_mask1, mask_to_uncompressed_CocoRLE, mask_to_KaggleRLE,KaggleRLE_to_CocoBoundBoxes, KaggleRLE_to_bbox 8 | from rle_utils import refine_masks, rle_decode_new 9 | from PIL import Image 10 | def test_pycocoapiRLE(): 11 | 12 | # Example prediction data. 13 | data1 = "PUi>9Td0j0\\O 1024 or raw_image.size[1] > 1024: 123 | pass 124 | total_pixel = raw_image.size[0] * raw_image.size[1] 125 | total_index = (int(RLElist[-2])+int(RLElist[-1])) 126 | outofbound = total_index - total_pixel 127 | if outofbound > 0: 128 | shithitfans = outofbound 129 | elif outofbound == 0 or total_index >= 1024*1024: 130 | shithitfans = 1 131 | else: 132 | shithitfans = 0 133 | 134 | #if shithitfans != 0: 135 | result.append(f"For image {name} sized at {raw_image.size}, RLE = total: {RLElist[-2:]}, {total_pixel} vs {total_index}. 
Out of bound {outofbound} {'FuckedUp ' * shithitfans} ") 136 | 137 | with open("/home/dyt811/Desktop/NewRLE_sizecheck.csv", mode='w', newline="") as f: 138 | writer = csv.writer(f) 139 | for row in result: 140 | writer.writerow(row) 141 | 142 | -------------------------------------------------------------------------------- /imaterialist/data/structures.py: -------------------------------------------------------------------------------- 1 | import torch 2 | from typing import Iterator, List, Tuple, Union 3 | import torch.nn.functional as F 4 | 5 | # Base Attribute holder 6 | class Attributes: 7 | """ 8 | This structure stores a list of attributes as a Nx13 torch.Tensor. 9 | It behaves like a Tensor 10 | (support indexing, `to(device)`, `.device`, and iteration over all attributes) 11 | """ 12 | 13 | AttributeSizeType = Union[List[int], Tuple[int, int]] 14 | 15 | def __init__(self, tensor: torch.Tensor): 16 | """ 17 | Args: 18 | tensor (Tensor[float]): a Nx14 matrix. Each row is [attribute_1, attribute_2, ...]. 19 | """ 20 | device = tensor.device if isinstance(tensor, torch.Tensor) else torch.device("cpu") 21 | tensor = torch.as_tensor(tensor, dtype=torch.int64, device=device) 22 | if tensor.numel() == 0: 23 | # Use reshape, so we don't end up creating a new tensor that does not depend on 24 | # the inputs (and consequently confuses jit) 25 | tensor = tensor.reshape((0, 295)).to(dtype=torch.int64, device=device) 26 | assert tensor.dim() == 2 and tensor.size(-1) == 295, tensor.size() 27 | 28 | self.tensor = tensor 29 | 30 | 31 | def __getitem__(self, item: Union[int, slice, torch.BoolTensor]) -> "Boxes": 32 | """ 33 | Returns: 34 | Attributes: Create a new :class:`Attributes` by indexing. 35 | The following usage are allowed: 36 | 1. `new_attributes = attributes[3]`: return a `Attributes` which contains only one Attribute. 37 | 2. `new_attributes = attributes[2:10]`: return a slice of attributes. 38 | 3. `new_attributes = attributes[vector]`, where vector is a torch.BoolTensor 39 | with `length = len(attributes)`. Nonzero elements in the vector will be selected. 40 | Note that the returned Attributes might share storage with this Attributes, 41 | subject to Pytorch's indexing semantics. 42 | """ 43 | if isinstance(item, int): 44 | return Attributes(self.tensor[item].view(1, -1)) 45 | b = self.tensor[item] 46 | assert b.dim() == 2, "Indexing on Attributes with {} failed to return a matrix!".format(item) 47 | return Attributes(b) 48 | 49 | def __len__(self) -> int: 50 | return self.tensor.shape[0] 51 | 52 | def to(self, device: str) -> "Attributes": 53 | return Attributes(self.tensor.to(device)) 54 | 55 | def nonempty(self, threshold: float = 0.0) -> torch.Tensor: 56 | """ 57 | Find attributes that are non-empty. 58 | An attribute is considered empty if its first attribute in the list is 999. 59 | Returns: 60 | Tensor: 61 | a binary vector which represents whether each attribute is empty 62 | (False) or non-empty (True). 
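        Example (editor's sketch with a hypothetical 2 x 295 attribute tensor, where
        999 in column 0 marks a padding row):
            attrs = Attributes(torch.tensor([[999] + [0] * 294,
                                             [1]   + [0] * 294]))
            attrs.nonempty()   # -> tensor([False,  True])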
63 | """ 64 | attributes = self.tensor 65 | first_attr = attributes[:, 0] 66 | keep = (first_attr != 999) 67 | return keep 68 | 69 | def __repr__(self) -> str: 70 | return "Attributes(" + str(self.tensor) + ")" 71 | 72 | 73 | def remove_padding(self, attribute): 74 | pass 75 | 76 | @classmethod 77 | def cat(cls, attributes_list: List["Attributes"]) -> "Attributes": 78 | """ 79 | Concatenates a list of Attributes into a single Attributes 80 | Arguments: 81 | Attributes_list (list[Attributes]) 82 | Returns: 83 | Attributes: the concatenated Attributes 84 | """ 85 | assert isinstance(attributes_list, (list, tuple)) 86 | if len(attributes_list) == 0: 87 | return cls(torch.empty(0)) 88 | assert all(isinstance(attribute, Attributes) for attribute in attributes_list) 89 | 90 | # use torch.cat (v.s. layers.cat) so the returned boxes never share storage with input 91 | cat_attributes = cls(torch.cat([b.tensor for b in attributes_list], dim=0)) 92 | return cat_attributes 93 | 94 | def size(self): 95 | 'required in order to pass loss function assertions' 96 | return (len(self), 295) 97 | 98 | def numel(self): 99 | 'required in order to pass loss function assertions' 100 | return len(self) 101 | 102 | @property 103 | def device(self) -> torch.device: 104 | return self.tensor.device 105 | 106 | def __iter__(self) -> Iterator[torch.Tensor]: 107 | """ 108 | Yield attributes as a Tensor of shape (14,) at a time. 109 | """ 110 | yield from self.tensor -------------------------------------------------------------------------------- /imaterialist/evaluator.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import detectron2.data.transforms as T 3 | from detectron2.checkpoint import DetectionCheckpointer 4 | 5 | from imaterialist.data.datasets.coco import MetadataCatalog 6 | from imaterialist.modeling import build_model 7 | 8 | class iMatPredictor: 9 | """ 10 | A specailized predictor with attributes added! 11 | # Note the reference to the imaterialst MetaDataCatalog and modeling specifically! 12 | """ 13 | def __init__(self, cfg): 14 | self.cfg = cfg.clone() # cfg can be modified by model 15 | self.model = build_model(self.cfg) 16 | self.model.eval() 17 | self.metadata = MetadataCatalog.get(cfg.DATASETS.TEST[0]) 18 | 19 | checkpointer = DetectionCheckpointer(self.model) 20 | checkpointer.load(cfg.MODEL.WEIGHTS) 21 | 22 | self.transform_gen = T.ResizeShortestEdge( 23 | [cfg.INPUT.MIN_SIZE_TEST, cfg.INPUT.MIN_SIZE_TEST], cfg.INPUT.MAX_SIZE_TEST 24 | ) 25 | 26 | self.input_format = cfg.INPUT.FORMAT 27 | assert self.input_format in ["RGB", "BGR"], self.input_format 28 | 29 | def __call__(self, original_image): 30 | """ 31 | Args: 32 | original_image (np.ndarray): an image of shape (H, W, C) (in BGR order). 33 | Returns: 34 | predictions (dict): 35 | the output of the model for one image only. 36 | See :doc:`/tutorials/models` for details about the format. 37 | """ 38 | with torch.no_grad(): # https://github.com/sphinx-doc/sphinx/issues/4258 39 | # Apply pre-processing to image. 
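            # Editor's note: typical end-to-end usage of this predictor (a sketch;
            # get_cfg, add_imaterialist_config and cv2 are assumed to be imported,
            # and the config/weight paths below are placeholders, not repo files):
            #     cfg = get_cfg(); add_imaterialist_config(cfg)
            #     cfg.merge_from_file("configs/exp01.yaml")
            #     cfg.MODEL.WEIGHTS = "/path/to/model_final.pth"
            #     predictor = iMatPredictor(cfg)
            #     outputs = predictor(cv2.imread("image.jpg"))    # BGR ndarray in
            #     instances = outputs["instances"]                # carries attr_scores too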
40 | if self.input_format == "RGB": 41 | # whether the model expects BGR inputs or RGB 42 | original_image = original_image[:, :, ::-1] 43 | height, width = original_image.shape[:2] 44 | image = self.transform_gen.get_transform(original_image).apply_image(original_image) 45 | image = torch.as_tensor(image.astype("float32").transpose(2, 0, 1)) 46 | 47 | inputs = {"image": image, "height": height, "width": width} 48 | predictions = self.model([inputs])[0] 49 | return predictions -------------------------------------------------------------------------------- /imaterialist/modeling/__init__.py: -------------------------------------------------------------------------------- 1 | from .attributes_rcnn import build_model -------------------------------------------------------------------------------- /imaterialist/modeling/attributes_rcnn.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved 2 | import logging 3 | import numpy as np 4 | import torch 5 | from torch import nn 6 | 7 | from .roi_heads.roi_heads import build_roi_heads 8 | 9 | from detectron2.structures import ImageList 10 | from detectron2.utils.events import get_event_storage 11 | from detectron2.utils.logger import log_first_n 12 | from detectron2.utils.registry import Registry 13 | from detectron2.modeling.backbone import build_backbone 14 | from detectron2.modeling.postprocessing import detector_postprocess 15 | from detectron2.modeling.proposal_generator import build_proposal_generator 16 | from detectron2.modeling.meta_arch.build import META_ARCH_REGISTRY 17 | 18 | 19 | META_ARCH_REGISTRY = Registry("META_ARCH") # noqa F401 isort:skip 20 | META_ARCH_REGISTRY.__doc__ = """ 21 | Registry for meta-architectures, i.e. the whole model. 22 | 23 | The registered object will be called with `obj(cfg)` 24 | and expected to return a `nn.Module` object. 25 | """ 26 | 27 | __all__ = ["GeneralizedRCNN", "ProposalNetwork", "build_model"] 28 | 29 | 30 | @META_ARCH_REGISTRY.register() 31 | class GeneralizedRCNN(nn.Module): 32 | """ 33 | Generalized R-CNN. Any models that contains the following three components: 34 | 1. Per-image feature extraction (aka backbone) 35 | 2. Region proposal generation 36 | 3. Per-region feature extraction and prediction 37 | """ 38 | 39 | def __init__(self, cfg): 40 | super().__init__() 41 | 42 | self.backbone = build_backbone(cfg) 43 | self.proposal_generator = build_proposal_generator(cfg, self.backbone.output_shape()) 44 | self.roi_heads = build_roi_heads(cfg, self.backbone.output_shape()) 45 | self.vis_period = cfg.VIS_PERIOD 46 | self.input_format = cfg.INPUT.FORMAT 47 | 48 | assert len(cfg.MODEL.PIXEL_MEAN) == len(cfg.MODEL.PIXEL_STD) 49 | self.register_buffer("pixel_mean", torch.Tensor(cfg.MODEL.PIXEL_MEAN).view(-1, 1, 1)) 50 | self.register_buffer("pixel_std", torch.Tensor(cfg.MODEL.PIXEL_STD).view(-1, 1, 1)) 51 | 52 | @property 53 | def device(self): 54 | return self.pixel_mean.device 55 | 56 | def visualize_training(self, batched_inputs, proposals): 57 | """ 58 | A function used to visualize images and proposals. It shows ground truth 59 | bounding boxes on the original image and up to 20 predicted object 60 | proposals on the original image. Users can implement different 61 | visualization functions for different models. 62 | 63 | Args: 64 | batched_inputs (list): a list that contains input to the model. 65 | proposals (list): a list that contains predicted proposals. 
Both 66 | batched_inputs and proposals should have the same length. 67 | """ 68 | from detectron2.utils.visualizer import Visualizer 69 | 70 | storage = get_event_storage() 71 | max_vis_prop = 20 72 | 73 | for input, prop in zip(batched_inputs, proposals): 74 | img = input["image"].cpu().numpy() 75 | assert img.shape[0] == 3, "Images should have 3 channels." 76 | if self.input_format == "BGR": 77 | img = img[::-1, :, :] 78 | img = img.transpose(1, 2, 0) 79 | v_gt = Visualizer(img, None) 80 | v_gt = v_gt.overlay_instances(boxes=input["instances"].gt_boxes) 81 | anno_img = v_gt.get_image() 82 | box_size = min(len(prop.proposal_boxes), max_vis_prop) 83 | v_pred = Visualizer(img, None) 84 | v_pred = v_pred.overlay_instances( 85 | boxes=prop.proposal_boxes[0:box_size].tensor.cpu().numpy() 86 | ) 87 | prop_img = v_pred.get_image() 88 | vis_img = np.concatenate((anno_img, prop_img), axis=1) 89 | vis_img = vis_img.transpose(2, 0, 1) 90 | vis_name = "Left: GT bounding boxes; Right: Predicted proposals" 91 | storage.put_image(vis_name, vis_img) 92 | break # only visualize one image in a batch 93 | 94 | def forward(self, batched_inputs): 95 | """ 96 | Args: 97 | batched_inputs: a list, batched outputs of :class:`DatasetMapper` . 98 | Each item in the list contains the inputs for one image. 99 | For now, each item in the list is a dict that contains: 100 | 101 | * image: Tensor, image in (C, H, W) format. 102 | * instances (optional): groundtruth :class:`Instances` 103 | * proposals (optional): :class:`Instances`, precomputed proposals. 104 | 105 | Other information that's included in the original dicts, such as: 106 | 107 | * "height", "width" (int): the output resolution of the model, used in inference. 108 | See :meth:`postprocess` for details. 109 | 110 | Returns: 111 | list[dict]: 112 | Each dict is the output for one input image. 113 | The dict contains one key "instances" whose value is a :class:`Instances`. 114 | The :class:`Instances` object has the following keys: 115 | "pred_boxes", "pred_classes", "scores", "pred_masks", "pred_keypoints" 116 | """ 117 | if not self.training: 118 | return self.inference(batched_inputs) 119 | 120 | images = self.preprocess_image(batched_inputs) 121 | if "instances" in batched_inputs[0]: 122 | gt_instances = [x["instances"].to(self.device) for x in batched_inputs] 123 | elif "targets" in batched_inputs[0]: 124 | log_first_n( 125 | logging.WARN, "'targets' in the model inputs is now renamed to 'instances'!", n=10 126 | ) 127 | gt_instances = [x["targets"].to(self.device) for x in batched_inputs] 128 | else: 129 | gt_instances = None 130 | 131 | features = self.backbone(images.tensor) 132 | 133 | if self.proposal_generator: 134 | proposals, proposal_losses = self.proposal_generator(images, features, gt_instances) 135 | else: 136 | assert "proposals" in batched_inputs[0] 137 | proposals = [x["proposals"].to(self.device) for x in batched_inputs] 138 | proposal_losses = {} 139 | 140 | _, detector_losses = self.roi_heads(images, features, proposals, gt_instances) 141 | if self.vis_period > 0: 142 | storage = get_event_storage() 143 | if storage.iter % self.vis_period == 0: 144 | self.visualize_training(batched_inputs, proposals) 145 | 146 | losses = {} 147 | losses.update(detector_losses) 148 | losses.update(proposal_losses) 149 | return losses 150 | 151 | def inference(self, batched_inputs, detected_instances=None, do_postprocess=True): 152 | """ 153 | Run inference on the given inputs. 
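        Minimal input sketch (editor's addition; tensor values are hypothetical):
            inputs  = [{"image": torch.rand(3, 800, 600), "height": 800, "width": 600}]
            results = model.inference(inputs)   # -> [{"instances": Instances(...)}]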
154 | 155 | Args: 156 | batched_inputs (list[dict]): same as in :meth:`forward` 157 | detected_instances (None or list[Instances]): if not None, it 158 | contains an `Instances` object per image. The `Instances` 159 | object contains "pred_boxes" and "pred_classes" which are 160 | known boxes in the image. 161 | The inference will then skip the detection of bounding boxes, 162 | and only predict other per-ROI outputs. 163 | do_postprocess (bool): whether to apply post-processing on the outputs. 164 | 165 | Returns: 166 | same as in :meth:`forward`. 167 | """ 168 | assert not self.training 169 | 170 | images = self.preprocess_image(batched_inputs) 171 | features = self.backbone(images.tensor) 172 | 173 | if detected_instances is None: 174 | if self.proposal_generator: 175 | proposals, _ = self.proposal_generator(images, features, None) 176 | else: 177 | assert "proposals" in batched_inputs[0] 178 | proposals = [x["proposals"].to(self.device) for x in batched_inputs] 179 | 180 | results, _ = self.roi_heads(images, features, proposals, None) 181 | else: 182 | detected_instances = [x.to(self.device) for x in detected_instances] 183 | results = self.roi_heads.forward_with_given_boxes(features, detected_instances) 184 | 185 | if do_postprocess: 186 | return GeneralizedRCNN._postprocess(results, batched_inputs, images.image_sizes) 187 | else: 188 | return results 189 | 190 | def preprocess_image(self, batched_inputs): 191 | """ 192 | Normalize, pad and batch the input images. 193 | """ 194 | images = [x["image"].to(self.device) for x in batched_inputs] 195 | images = [(x - self.pixel_mean) / self.pixel_std for x in images] 196 | images = ImageList.from_tensors(images, self.backbone.size_divisibility) 197 | return images 198 | 199 | @staticmethod 200 | def _postprocess(instances, batched_inputs, image_sizes): 201 | """ 202 | Rescale the output instances to the target size. 203 | """ 204 | # note: private function; subject to changes 205 | processed_results = [] 206 | for results_per_image, input_per_image, image_size in zip( 207 | instances, batched_inputs, image_sizes 208 | ): 209 | height = input_per_image.get("height", image_size[0]) 210 | width = input_per_image.get("width", image_size[1]) 211 | r = detector_postprocess(results_per_image, height, width) 212 | processed_results.append({"instances": r}) 213 | return processed_results 214 | 215 | 216 | @META_ARCH_REGISTRY.register() 217 | class ProposalNetwork(nn.Module): 218 | """ 219 | A meta architecture that only predicts object proposals. 220 | """ 221 | 222 | def __init__(self, cfg): 223 | super().__init__() 224 | self.backbone = build_backbone(cfg) 225 | self.proposal_generator = build_proposal_generator(cfg, self.backbone.output_shape()) 226 | 227 | self.register_buffer("pixel_mean", torch.Tensor(cfg.MODEL.PIXEL_MEAN).view(-1, 1, 1)) 228 | self.register_buffer("pixel_std", torch.Tensor(cfg.MODEL.PIXEL_STD).view(-1, 1, 1)) 229 | 230 | @property 231 | def device(self): 232 | return self.pixel_mean.device 233 | 234 | def forward(self, batched_inputs): 235 | """ 236 | Args: 237 | Same as in :class:`GeneralizedRCNN.forward` 238 | 239 | Returns: 240 | list[dict]: 241 | Each dict is the output for one input image. 242 | The dict contains one key "proposals" whose value is a 243 | :class:`Instances` with keys "proposal_boxes" and "objectness_logits". 
244 | """ 245 | images = [x["image"].to(self.device) for x in batched_inputs] 246 | images = [(x - self.pixel_mean) / self.pixel_std for x in images] 247 | images = ImageList.from_tensors(images, self.backbone.size_divisibility) 248 | features = self.backbone(images.tensor) 249 | 250 | if "instances" in batched_inputs[0]: 251 | gt_instances = [x["instances"].to(self.device) for x in batched_inputs] 252 | elif "targets" in batched_inputs[0]: 253 | log_first_n( 254 | logging.WARN, "'targets' in the model inputs is now renamed to 'instances'!", n=10 255 | ) 256 | gt_instances = [x["targets"].to(self.device) for x in batched_inputs] 257 | else: 258 | gt_instances = None 259 | proposals, proposal_losses = self.proposal_generator(images, features, gt_instances) 260 | # In training, the proposals are not useful at all but we generate them anyway. 261 | # This makes RPN-only models about 5% slower. 262 | if self.training: 263 | return proposal_losses 264 | 265 | processed_results = [] 266 | for results_per_image, input_per_image, image_size in zip( 267 | proposals, batched_inputs, images.image_sizes 268 | ): 269 | height = input_per_image.get("height", image_size[0]) 270 | width = input_per_image.get("width", image_size[1]) 271 | r = detector_postprocess(results_per_image, height, width) 272 | processed_results.append({"proposals": r}) 273 | return processed_results 274 | 275 | def build_model(cfg): 276 | """ 277 | Build the whole model architecture, defined by ``cfg.MODEL.META_ARCHITECTURE``. 278 | Note that it does not load any weights from ``cfg``. 279 | """ 280 | meta_arch = cfg.MODEL.META_ARCHITECTURE 281 | model = META_ARCH_REGISTRY.get(meta_arch)(cfg) 282 | model.to(torch.device(cfg.MODEL.DEVICE)) 283 | return model 284 | -------------------------------------------------------------------------------- /imaterialist/modeling/roi_heads/__init__.py: -------------------------------------------------------------------------------- 1 | from .roi_heads import ( 2 | ROI_HEADS_REGISTRY, 3 | build_roi_heads, 4 | StandardROIHeads, 5 | ) 6 | 7 | -------------------------------------------------------------------------------- /imaterialist/modeling/roi_heads/attributes_head.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved 2 | import logging 3 | import torch 4 | from torch import nn 5 | from torch.nn import functional as F 6 | 7 | from detectron2.config import configurable 8 | from detectron2.layers import Linear, ShapeSpec, batched_nms, cat 9 | from detectron2.modeling.box_regression import Box2BoxTransform 10 | from detectron2.structures import Boxes, Instances 11 | from detectron2.utils.events import get_event_storage 12 | from detectron2.modeling.roi_heads.fast_rcnn import FastRCNNOutputLayers, FastRCNNOutputs 13 | from detectron2.modeling.box_regression import Box2BoxTransform 14 | 15 | __all__ = ["fast_rcnn_inference", "AttributesFastRCNNOutputLayers"] 16 | 17 | 18 | logger = logging.getLogger(__name__) 19 | 20 | """ 21 | Shape shorthand in this module: 22 | 23 | N: number of images in the minibatch 24 | R: number of ROIs, combined over all images, in the minibatch 25 | Ri: number of ROIs in image i 26 | K: number of foreground classes. E.g.,there are 80 foreground classes in COCO. 27 | 28 | Naming convention: 29 | 30 | deltas: refers to the 4-d (dx, dy, dw, dh) deltas that parameterize the box2box 31 | transform (see :class:`box_regression.Box2BoxTransform`). 
32 | 33 | pred_class_logits: predicted class scores in [-inf, +inf]; use 34 | softmax(pred_class_logits) to estimate P(class). 35 | 36 | gt_classes: ground-truth classification labels in [0, K], where [0, K) represent 37 | foreground object classes and K represents the background class. 38 | 39 | pred_proposal_deltas: predicted box2box transform deltas for transforming proposals 40 | to detection box predictions. 41 | 42 | gt_proposal_deltas: ground-truth box2box transform deltas 43 | """ 44 | 45 | 46 | def fast_rcnn_inference(boxes, scores, attr_scores, image_shapes, score_thresh, nms_thresh, topk_per_image): 47 | """ 48 | Call `fast_rcnn_inference_single_image` for all images. 49 | 50 | Args: 51 | boxes (list[Tensor]): A list of Tensors of predicted class-specific or class-agnostic 52 | boxes for each image. Element i has shape (Ri, K * 4) if doing 53 | class-specific regression, or (Ri, 4) if doing class-agnostic 54 | regression, where Ri is the number of predicted objects for image i. 55 | This is compatible with the output of :meth:`FastRCNNOutputLayers.predict_boxes`. 56 | scores (list[Tensor]): A list of Tensors of predicted class scores for each image. 57 | Element i has shape (Ri, K + 1), where Ri is the number of predicted objects 58 | for image i. Compatible with the output of :meth:`FastRCNNOutputLayers.predict_probs`. 59 | 60 | New: 61 | attributes (list[Tensor]): A list of Tensors of predicted attributes for each images. 62 | Element i has shape (Ri, K * 14). 63 | 64 | image_shapes (list[tuple]): A list of (width, height) tuples for each image in the batch. 65 | score_thresh (float): Only return detections with a confidence score exceeding this 66 | threshold. 67 | nms_thresh (float): The threshold to use for box non-maximum suppression. Value in [0, 1]. 68 | topk_per_image (int): The number of top scoring detections to return. Set < 0 to return 69 | all detections. 70 | 71 | Returns: 72 | instances: (list[Instances]): A list of N instances, one for each image in the batch, 73 | that stores the topk most confidence detections. 74 | kept_indices: (list[Tensor]): A list of 1D tensor of length of N, each element indicates 75 | the corresponding boxes/scores index in [0, Ri) from the input, for image i. 76 | """ 77 | result_per_image = [ 78 | fast_rcnn_inference_single_image( 79 | boxes_per_image, 80 | scores_per_image, 81 | attributes_per_image, 82 | image_shape, 83 | score_thresh, 84 | nms_thresh, 85 | topk_per_image 86 | ) 87 | for scores_per_image, boxes_per_image, attributes_per_image, image_shape in zip(scores, boxes, attr_scores, image_shapes) 88 | ] 89 | return [x[0] for x in result_per_image], [x[1] for x in result_per_image] 90 | 91 | 92 | def fast_rcnn_inference_single_image( 93 | boxes, scores, attr_scores, image_shape, score_thresh, nms_thresh, topk_per_image): 94 | """ 95 | Single-image inference. Return bounding-box detection results by thresholding 96 | on scores and applying non-maximum suppression (NMS). 97 | 98 | Args: 99 | Same as `fast_rcnn_inference`, but with boxes, scores, and image shapes 100 | per image. 101 | 102 | Returns: 103 | Same as `fast_rcnn_inference`, but for only one image. 
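        Shape sketch (editor's addition; 295 attribute columns are assumed here,
        matching the Attributes structure used elsewhere in this repo):
            boxes:       (Ri, K * 4) or (Ri, 4)
            scores:      (Ri, K + 1)
            attr_scores: (Ri, 295)
        and the returned Instances carries pred_boxes, scores, attr_scores, pred_classes.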
104 | """ 105 | # Make sure boxes and scores don't contain infinite or Nan 106 | valid_mask = torch.isfinite(boxes).all(dim=1) & torch.isfinite(scores).all(dim=1) \ 107 | & torch.isfinite(attr_scores).all(dim=1) 108 | 109 | # Get scores from finite boxes and scores 110 | if not valid_mask.all(): 111 | boxes = boxes[valid_mask] 112 | scores = scores[valid_mask] 113 | attr_scores = attr_scores[valid_mask] 114 | 115 | scores = scores[:, :-1] # Remove background class? 116 | num_bbox_reg_classes = boxes.shape[1] // 4 117 | # Convert to Boxes to use the `clip` function ... 118 | boxes = Boxes(boxes.reshape(-1, 4)) 119 | boxes.clip(image_shape) 120 | boxes = boxes.tensor.view(-1, num_bbox_reg_classes, 4) # R x C x 4 121 | 122 | # If using Attributes class: 123 | # attributes = Attributes(attributes.reshape(-1, 295)) 124 | # attributes = attributes.tensor.view(-1, num_bbox_reg_classes, 295) 125 | 126 | # Filter results based on detection scores 127 | filter_mask = scores > score_thresh # R x K 128 | # R' x 2. First column contains indices of the R predictions; 129 | # Second column contains indices of classes. 130 | filter_inds = filter_mask.nonzero() 131 | 132 | if num_bbox_reg_classes == 1: 133 | boxes = boxes[filter_inds[:, 0], 0] 134 | else: 135 | boxes = boxes[filter_mask] 136 | scores = scores[filter_mask] 137 | 138 | # Apply per-class NMS 139 | keep = batched_nms(boxes, scores, filter_inds[:, 1], nms_thresh) 140 | if topk_per_image >= 0: 141 | keep = keep[:topk_per_image] 142 | boxes, scores, attr_scores, filter_inds, = boxes[keep], scores[keep], attr_scores[keep], filter_inds[keep] 143 | 144 | result = Instances(image_shape) 145 | result.pred_boxes = Boxes(boxes) 146 | result.scores = scores 147 | result.attr_scores = attr_scores 148 | result.pred_classes = filter_inds[:, 1] 149 | return result, filter_inds[:, 0] 150 | 151 | 152 | class AttributesFastRCNNOutputs(FastRCNNOutputs): 153 | """ 154 | A class that stores information about outputs of a Fast R-CNN head. 155 | It provides methods that are used to decode the outputs of a Fast R-CNN head. 156 | """ 157 | 158 | def __init__( 159 | self, 160 | box2box_transform, 161 | pred_class_logits, 162 | pred_attributes, 163 | pred_proposal_deltas, 164 | proposals, 165 | smooth_l1_beta=0, 166 | ): 167 | """ 168 | Args: 169 | box2box_transform (Box2BoxTransform/Box2BoxTransformRotated): 170 | box2box transform instance for proposal-to-detection transformations. 171 | pred_class_logits (Tensor): A tensor of shape (R, K + 1) storing the predicted class 172 | logits for all R predicted object instances. 173 | Each row corresponds to a predicted object instance. 174 | pred_proposal_deltas (Tensor): A tensor of shape (R, K * B) or (R, B) for 175 | class-specific or class-agnostic regression. It stores the predicted deltas that 176 | transform proposals into final box detections. 177 | B is the box dimension (4 or 5). 178 | When B is 4, each row is [dx, dy, dw, dh (, ....)]. 179 | When B is 5, each row is [dx, dy, dw, dh, da (, ....)]. 180 | proposals (list[Instances]): A list of N Instances, where Instances i stores the 181 | proposals for image i, in the field "proposal_boxes". 182 | When training, each Instances must have ground-truth labels 183 | stored in the field "gt_classes" and "gt_boxes". 184 | The total number of all instances must be equal to R. 185 | smooth_l1_beta (float): The transition point between L1 and L2 loss in 186 | the smooth L1 loss function. When set to 0, the loss becomes L1. 
When 187 | set to +inf, the loss becomes constant 0. 188 | """ 189 | self.box2box_transform = box2box_transform 190 | self.num_preds_per_image = [len(p) for p in proposals] 191 | self.pred_class_logits = pred_class_logits 192 | self.pred_attributes = pred_attributes # attribute predictions 193 | self.pred_proposal_deltas = pred_proposal_deltas 194 | self.smooth_l1_beta = smooth_l1_beta 195 | self.image_shapes = [x.image_size for x in proposals] 196 | 197 | if len(proposals): 198 | box_type = type(proposals[0].proposal_boxes) 199 | 200 | # Used if we take the Attributes class 201 | attribute_type = type(proposals[0].gt_attributes) 202 | 203 | # cat(..., dim=0) concatenates over all images in the batch 204 | self.proposals = box_type.cat([p.proposal_boxes for p in proposals]) 205 | assert ( 206 | not self.proposals.tensor.requires_grad 207 | ), "Proposals should not require gradients!" 208 | 209 | # The following fields should exist only when training. 210 | if proposals[0].has("gt_boxes"): 211 | self.gt_boxes = box_type.cat([p.gt_boxes for p in proposals]) 212 | assert proposals[0].has("gt_classes") 213 | self.gt_classes = cat([p.gt_classes for p in proposals], dim=0) 214 | self.gt_attributes = cat([p.gt_attributes for p in proposals], dim=0) 215 | 216 | # use this line if using Attributes class 217 | #self.gt_attributes = attribute_type.cat([p.gt_attributes for p in proposals]) 218 | else: 219 | self.proposals = Boxes(torch.zeros(0, 4, device=self.pred_proposal_deltas.device)) 220 | self._no_instances = len(proposals) == 0 # no instances found 221 | 222 | def _log_accuracy(self): 223 | """ 224 | Log the accuracy metrics to EventStorage. 225 | """ 226 | num_instances = self.gt_classes.numel() 227 | pred_classes = self.pred_class_logits.argmax(dim=1) 228 | bg_class_ind = self.pred_class_logits.shape[1] - 1 229 | 230 | fg_inds = (self.gt_classes >= 0) & (self.gt_classes < bg_class_ind) 231 | num_fg = fg_inds.nonzero().numel() 232 | fg_gt_classes = self.gt_classes[fg_inds] 233 | fg_pred_classes = pred_classes[fg_inds] 234 | 235 | num_false_negative = (fg_pred_classes == bg_class_ind).nonzero().numel() 236 | num_accurate = (pred_classes == self.gt_classes).nonzero().numel() 237 | fg_num_accurate = (fg_pred_classes == fg_gt_classes).nonzero().numel() 238 | 239 | storage = get_event_storage() 240 | if num_instances > 0: 241 | storage.put_scalar("fast_rcnn/cls_accuracy", num_accurate / num_instances) 242 | if num_fg > 0: 243 | storage.put_scalar("fast_rcnn/fg_cls_accuracy", fg_num_accurate / num_fg) 244 | storage.put_scalar("fast_rcnn/false_negative", num_false_negative / num_fg) 245 | 246 | def binary_cross_entropy_loss(self): 247 | """ 248 | Compute the binary cross entropy loss for attribute classification. 249 | 250 | Returns: 251 | scalar Tensor 252 | """ 253 | if self._no_instances: 254 | return 0.0 255 | else: 256 | return F.binary_cross_entropy_with_logits( 257 | self.pred_attributes, 258 | self.gt_attributes, 259 | reduction="mean") 260 | 261 | def losses(self): 262 | """ 263 | Compute the default losses for box head in Fast(er) R-CNN, 264 | with softmax cross entropy loss and smooth L1 loss. 265 | 266 | Returns: 267 | A dict of losses (scalar tensors) containing keys "loss_cls" and "loss_box_reg". 
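            The attribute branch adds a third key, "loss_attr" (see the return value below).

        Example of the attribute term in isolation (editor's sketch with hypothetical
        2 x 295 logits/targets; all-zero logits give log(2) per element):
            logits  = torch.zeros(2, 295)
            targets = torch.zeros(2, 295); targets[0, 3] = 1.0
            F.binary_cross_entropy_with_logits(logits, targets, reduction="mean")
            # -> tensor(0.6931)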
268 | """ 269 | return { 270 | "loss_cls": self.softmax_cross_entropy_loss(), 271 | "loss_box_reg": self.smooth_l1_loss(), 272 | "loss_attr": self.binary_cross_entropy_loss() 273 | } 274 | 275 | class AttributesFastRCNNOutputLayers(FastRCNNOutputLayers): 276 | """ 277 | Two linear layers for predicting Fast R-CNN outputs: 278 | (1) proposal-to-detection box regression deltas 279 | (2) classification scores 280 | (3) attribute scores 281 | """ 282 | 283 | @configurable 284 | def __init__( 285 | self, 286 | input_shape, 287 | *, 288 | box2box_transform, 289 | num_classes, 290 | num_attributes, 291 | cls_agnostic_bbox_reg=False, 292 | smooth_l1_beta=0.0, 293 | test_score_thresh=0.0, 294 | test_nms_thresh=0.5, 295 | test_topk_per_image=100, 296 | ): 297 | """ 298 | NOTE: this interface is experimental. 299 | 300 | Args: 301 | input_shape (ShapeSpec): shape of the input feature to this module 302 | box2box_transform (Box2BoxTransform or Box2BoxTransformRotated): 303 | num_classes (int): number of foreground classes 304 | cls_agnostic_bbox_reg (bool): whether to use class agnostic for bbox regression 305 | smooth_l1_beta (float): transition point from L1 to L2 loss. 306 | test_score_thresh (float): threshold to filter predictions results. 307 | test_nms_thresh (float): NMS threshold for prediction results. 308 | test_topk_per_image (int): number of top predictions to produce per image. 309 | """ 310 | super().__init__(input_shape, box2box_transform=box2box_transform, num_classes=num_classes) 311 | if isinstance(input_shape, int): # some backward compatbility 312 | input_shape = ShapeSpec(channels=input_shape) 313 | input_size = input_shape.channels * (input_shape.width or 1) * (input_shape.height or 1) 314 | # The prediction layer for num_classes foreground classes and one background class 315 | # (hence + 1) 316 | self.cls_score = Linear(input_size, num_classes + 1) 317 | 318 | # Add attribute branch 319 | self.attr_scores = Linear(input_size, num_attributes) 320 | 321 | num_bbox_reg_classes = 1 if cls_agnostic_bbox_reg else num_classes 322 | box_dim = len(box2box_transform.weights) 323 | self.bbox_pred = Linear(input_size, num_bbox_reg_classes * box_dim) 324 | 325 | nn.init.normal_(self.cls_score.weight, std=0.01) 326 | nn.init.normal_(self.attr_scores.weight, std=0.01) 327 | nn.init.normal_(self.bbox_pred.weight, std=0.001) 328 | for l in [self.cls_score, self.attr_scores, self.bbox_pred]: 329 | nn.init.constant_(l.bias, 0) 330 | 331 | self.box2box_transform = box2box_transform 332 | self.smooth_l1_beta = smooth_l1_beta 333 | self.test_score_thresh = test_score_thresh 334 | self.test_nms_thresh = test_nms_thresh 335 | self.test_topk_per_image = test_topk_per_image 336 | 337 | @classmethod 338 | def from_config(cls, cfg, input_shape): 339 | return { 340 | "input_shape": input_shape, 341 | "box2box_transform": Box2BoxTransform(weights=cfg.MODEL.ROI_BOX_HEAD.BBOX_REG_WEIGHTS), 342 | # fmt: off 343 | "num_classes" : cfg.MODEL.ROI_HEADS.NUM_CLASSES, 344 | "num_attributes" : cfg.MODEL.ROI_HEADS.NUM_ATTRIBUTES, 345 | "cls_agnostic_bbox_reg" : cfg.MODEL.ROI_BOX_HEAD.CLS_AGNOSTIC_BBOX_REG, 346 | "smooth_l1_beta" : cfg.MODEL.ROI_BOX_HEAD.SMOOTH_L1_BETA, 347 | "test_score_thresh" : cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST, 348 | "test_nms_thresh" : cfg.MODEL.ROI_HEADS.NMS_THRESH_TEST, 349 | "test_topk_per_image" : cfg.TEST.DETECTIONS_PER_IMAGE 350 | # fmt: on 351 | } 352 | 353 | def forward(self, x): 354 | """ 355 | Returns: 356 | Tensor: Nx(K+1) scores for each box 357 | Tensor: Nx4 or Nx(Kx4) bounding 
box regression deltas. 358 | """ 359 | if x.dim() > 2: 360 | x = torch.flatten(x, start_dim=1) 361 | scores = self.cls_score(x) 362 | attr_scores = self.attr_scores(x) 363 | proposal_deltas = self.bbox_pred(x) 364 | return scores, attr_scores, proposal_deltas 365 | 366 | # TODO: move the implementation to this class. 367 | def losses(self, predictions, proposals): 368 | """ 369 | Args: 370 | predictions: return values of :meth:`forward()`. 371 | proposals (list[Instances]): proposals that match the features 372 | that were used to compute predictions. 373 | """ 374 | scores, attr_scores, proposal_deltas = predictions 375 | return AttributesFastRCNNOutputs( 376 | self.box2box_transform, scores, attr_scores, proposal_deltas, proposals, self.smooth_l1_beta 377 | ).losses() 378 | 379 | def inference(self, predictions, proposals): 380 | """ 381 | Returns: 382 | list[Instances]: same as `fast_rcnn_inference`. 383 | list[Tensor]: same as `fast_rcnn_inference`. 384 | """ 385 | boxes = self.predict_boxes(predictions, proposals) 386 | scores = self.predict_probs(predictions, proposals) 387 | attr_scores = self.predict_attribute_probs(predictions, proposals) 388 | image_shapes = [x.image_size for x in proposals] 389 | return fast_rcnn_inference( 390 | boxes, 391 | scores, 392 | attr_scores, 393 | image_shapes, 394 | self.test_score_thresh, 395 | self.test_nms_thresh, 396 | self.test_topk_per_image, 397 | ) 398 | 399 | def predict_boxes_for_gt_classes(self, predictions, proposals): 400 | """ 401 | Returns: 402 | list[Tensor]: A list of Tensors of predicted boxes for GT classes in case of 403 | class-specific box head. Element i of the list has shape (Ri, B), where Ri is 404 | the number of predicted objects for image i and B is the box dimension (4 or 5) 405 | """ 406 | if not len(proposals): 407 | return [] 408 | scores, _, proposal_deltas = predictions 409 | proposal_boxes = [p.proposal_boxes for p in proposals] 410 | proposal_boxes = proposal_boxes[0].cat(proposal_boxes).tensor 411 | N, B = proposal_boxes.shape 412 | predict_boxes = apply_deltas_broadcast( 413 | self.box2box_transform, proposal_deltas, proposal_boxes 414 | ) # Nx(KxB) 415 | 416 | K = predict_boxes.shape[1] // B 417 | if K > 1: 418 | gt_classes = torch.cat([p.gt_classes for p in proposals], dim=0) 419 | # Some proposals are ignored or have a background class. Their gt_classes 420 | # cannot be used as index. 421 | gt_classes = gt_classes.clamp_(0, K - 1) 422 | 423 | predict_boxes = predict_boxes.view(N, K, B)[ 424 | torch.arange(N, dtype=torch.long, device=predict_boxes.device), gt_classes 425 | ] 426 | num_prop_per_image = [len(p) for p in proposals] 427 | return predict_boxes.split(num_prop_per_image) 428 | 429 | def predict_boxes(self, predictions, proposals): 430 | """ 431 | Returns: 432 | list[Tensor]: A list of Tensors of predicted class-specific or class-agnostic boxes 433 | for each image. 
Element i has shape (Ri, K * B) or (Ri, B), where Ri is 434 | the number of predicted objects for image i and B is the box dimension (4 or 5) 435 | """ 436 | if not len(proposals): 437 | return [] 438 | _, _, proposal_deltas = predictions 439 | num_prop_per_image = [len(p) for p in proposals] 440 | proposal_boxes = [p.proposal_boxes for p in proposals] 441 | proposal_boxes = proposal_boxes[0].cat(proposal_boxes).tensor 442 | predict_boxes = self.box2box_transform.apply_deltas( 443 | proposal_deltas, proposal_boxes 444 | ) # Nx(KxB) 445 | return predict_boxes.split(num_prop_per_image) 446 | 447 | def predict_probs(self, predictions, proposals): 448 | """ 449 | Returns: 450 | list[Tensor]: A list of Tensors of predicted class probabilities for each image. 451 | Element i has shape (Ri, K + 1), where Ri is the number of predicted objects 452 | for image i. 453 | """ 454 | scores, _, _ = predictions 455 | num_inst_per_image = [len(p) for p in proposals] 456 | probs = F.softmax(scores, dim=-1) 457 | return probs.split(num_inst_per_image, dim=0) 458 | 459 | def predict_attribute_probs(self, predictions, proposals): 460 | """ 461 | Returns: 462 | list[Tensor]: A list of Tensors of predicted class probabilities for each image. 463 | Element i has shape (Ri, K + 1), where Ri is the number of predicted objects 464 | for image i. 465 | """ 466 | _, attr_scores, _ = predictions 467 | num_inst_per_image = [len(p) for p in proposals] 468 | probs = torch.sigmoid(attr_scores) 469 | return probs.split(num_inst_per_image, dim=0) 470 | -------------------------------------------------------------------------------- /imaterialist/modeling/roi_heads/roi_heads.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved 2 | import logging 3 | from typing import Dict 4 | 5 | from detectron2.layers import ShapeSpec 6 | from detectron2.utils.registry import Registry 7 | from detectron2.modeling.roi_heads.roi_heads import StandardROIHeads 8 | from detectron2.modeling.poolers import ROIPooler 9 | from detectron2.modeling.roi_heads.box_head import build_box_head 10 | 11 | from .attributes_head import AttributesFastRCNNOutputLayers 12 | 13 | ROI_HEADS_REGISTRY = Registry("ROI_HEADS") 14 | ROI_HEADS_REGISTRY.__doc__ = """ 15 | Registry for ROI heads in a generalized R-CNN model. 16 | ROIHeads take feature maps and region proposals, and 17 | perform per-region computation. 18 | 19 | The registered object will be called with `obj(cfg, input_shape)`. 20 | The call is expected to return an :class:`ROIHeads`. 21 | """ 22 | 23 | logger = logging.getLogger(__name__) 24 | 25 | 26 | def build_roi_heads(cfg, input_shape): 27 | """ 28 | Build ROIHeads defined by `cfg.MODEL.ROI_HEADS.NAME`. 29 | """ 30 | name = cfg.MODEL.ROI_HEADS.NAME 31 | return ROI_HEADS_REGISTRY.get(name)(cfg, input_shape) 32 | 33 | 34 | @ROI_HEADS_REGISTRY.register() 35 | class StandardROIHeads(StandardROIHeads): 36 | """ 37 | It's "standard" in a sense that there is no ROI transform sharing 38 | or feature sharing between tasks. 39 | The cropped rois go to separate branches (boxes and masks) directly. 40 | This way, it is easier to make separate abstractions for different branches. 41 | 42 | This class is used by most models, such as FPN and C5. 43 | To implement more models, you can subclass it and implement a different 44 | :meth:`forward()` or a head. 
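    This variant differs from the upstream class mainly in :meth:`_init_box_head`,
    which swaps in :class:`AttributesFastRCNNOutputLayers` as the box predictor.

    Usage sketch (editor's addition; ``cfg`` is assumed to carry the extra
    ``MODEL.ROI_HEADS.NUM_ATTRIBUTES`` key consumed by the attribute head, and
    ``backbone`` is a detectron2 backbone):
        cfg.MODEL.ROI_HEADS.NAME = "StandardROIHeads"
        roi_heads = build_roi_heads(cfg, backbone.output_shape())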
45 | """ 46 | 47 | def __init__(self, cfg, input_shape: Dict[str, ShapeSpec]): 48 | super().__init__(cfg, input_shape) 49 | self.in_features = cfg.MODEL.ROI_HEADS.IN_FEATURES 50 | 51 | self._init_box_head(cfg, input_shape) 52 | 53 | @classmethod 54 | def _init_box_head(cls, cfg, input_shape): 55 | # fmt: off 56 | in_features = cfg.MODEL.ROI_HEADS.IN_FEATURES 57 | pooler_resolution = cfg.MODEL.ROI_BOX_HEAD.POOLER_RESOLUTION 58 | pooler_scales = tuple(1.0 / input_shape[k].stride for k in in_features) 59 | sampling_ratio = cfg.MODEL.ROI_BOX_HEAD.POOLER_SAMPLING_RATIO 60 | pooler_type = cfg.MODEL.ROI_BOX_HEAD.POOLER_TYPE 61 | # fmt: on 62 | 63 | # If StandardROIHeads is applied on multiple feature maps (as in FPN), 64 | # then we share the same predictors and therefore the channel counts must be the same 65 | in_channels = [input_shape[f].channels for f in in_features] 66 | # Check all channel counts are equal 67 | assert len(set(in_channels)) == 1, in_channels 68 | in_channels = in_channels[0] 69 | 70 | box_pooler = ROIPooler( 71 | output_size=pooler_resolution, 72 | scales=pooler_scales, 73 | sampling_ratio=sampling_ratio, 74 | pooler_type=pooler_type, 75 | ) 76 | # Here we split "box head" and "box predictor", which is mainly due to historical reasons. 77 | # They are used together so the "box predictor" layers should be part of the "box head". 78 | # New subclasses of ROIHeads do not need "box predictor"s. 79 | box_head = build_box_head( 80 | cfg, ShapeSpec(channels=in_channels, height=pooler_resolution, width=pooler_resolution) 81 | ) 82 | box_predictor = AttributesFastRCNNOutputLayers(cfg, box_head.output_shape) 83 | return { 84 | "box_in_features": in_features, 85 | "box_pooler": box_pooler, 86 | "box_head": box_head, 87 | "box_predictor": box_predictor, 88 | } 89 | -------------------------------------------------------------------------------- /imaterialist/submission_utils/resize_longest_edge.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | from PIL import Image 3 | # Rle helper functions 4 | from matplotlib.pyplot import imsave 5 | from environs import Env 6 | from pathlib import Path 7 | from tqdm import tqdm 8 | import os 9 | import glob 10 | import math 11 | env = Env() 12 | env.read_env() 13 | 14 | path_data_interim = Path(env("path_interim")) 15 | path_test_data = Path(env("path_test")) 16 | path_output = Path(env("path_output")) 17 | 18 | def downscale_folder(path_test_data="/home/dyt811/Git/cvnnig/data_imaterialist2020/raw/test"): 19 | """ 20 | Take the entire test data set, try to downscale the long edge to 1024. 
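    For example (editor's note), a 3000 x 2000 (W x H) image becomes 1024 x 682,
    since floor(2000 * 1024 / 3000) = 682; portrait images are rescaled symmetrically.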
21 | :param path_test_data: 22 | :return: 23 | """ 24 | list_files = glob.glob(f"{path_test_data}/*.jpg") 25 | for file in tqdm(list_files): 26 | downscale_image(file, path_data_interim) 27 | 28 | def downscale_image(img, path_data_interim = path_data_interim, max_size=1024): 29 | ''' 30 | Adaptive funciton to first DOWNSCALE the image before running RLE 31 | # Source: https://stackoverflow.com/a/28453021 32 | img: numpy array, 1 - mask, 0 - background 33 | Returns run length as string formated 34 | ''' 35 | #img is a tensor 36 | img_data = Image.open(img) 37 | img_array = np.array(img_data) 38 | pil_image = Image.fromarray(img_array) 39 | width_current = pil_image.size[0] 40 | height_current = pil_image.size[1] 41 | 42 | longest_edge = max(width_current, height_current) 43 | 44 | # Longest edge MUST be 1024, even if smaller or larger images 45 | if (width_current > height_current): 46 | new_width = max_size 47 | scaled_height = max_size / float(width_current) * height_current 48 | new_height = int(math.floor(scaled_height)) 49 | else: 50 | scale_width = max_size / float(height_current) * width_current 51 | new_width = int(math.floor(scale_width)) 52 | new_height = max_size 53 | # Always resizing. 54 | pil_image = pil_image.resize((new_width, new_height), Image.NEAREST) 55 | 56 | image_array = np.array(pil_image) 57 | 58 | imsave(f"{path_data_interim}/resized_test/{Path(img).stem}.jpg", image_array) 59 | 60 | if __name__ =="__main__": 61 | downscale_folder() -------------------------------------------------------------------------------- /imaterialist/submission_utils/test_csv_write.py: -------------------------------------------------------------------------------- 1 | import csv 2 | from typing import List 3 | import pickle 4 | 5 | def filter_csv_write(list_list_dict: List[List[dict]], path_csv): 6 | """ 7 | Write the list of csv predictions into CSV. 8 | :param list_dict: 9 | :param path_csv: 10 | :return: 11 | """ 12 | # Flatten the two list. 13 | # Feturn item if they the encoded pixel is not flat. 14 | flat_list = [] 15 | # Iterate through image list. 16 | for sublist in list_list_dict: 17 | # Iterate through mask list 18 | for item in sublist: 19 | # If the EncodedPixel is empty, skip. 20 | if item["EncodedPixels"] == "": 21 | continue 22 | else: 23 | flat_list.append(item) 24 | 25 | # With blanks. 
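    # Editor's note: the expected input is a per-image list of per-mask dicts; a
    # hypothetical two-image example (field names mirror the columns used elsewhere
    # in this repo):
    #     preds = [
    #         [{"ImageId": "abc", "EncodedPixels": "1 3 10 2", "ClassId": 0, "AttributesIds": ""}],
    #         [{"ImageId": "def", "EncodedPixels": "",         "ClassId": 5, "AttributesIds": "12 33"}],
    #     ]
    #     filter_csv_write(preds, "submission.csv")   # the empty-mask row for "def" is skipped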
26 | # flat_list = [item for sublist in list_list_dict for item in sublist] 27 | 28 | # Source: https://stackoverflow.com/questions/3086973/how-do-i-convert-this-list-of-dictionaries-to-a-csv-file 29 | keys = flat_list[0].keys() 30 | with open(path_csv, 'w') as output_file: 31 | # quote char prevent dict_writer to quote string that contain separtor: , 32 | # The attributes are separated by COMMA, and must be quoted, by using space, 33 | dict_writer = csv.DictWriter(output_file, keys) 34 | dict_writer.writeheader() 35 | dict_writer.writerows(flat_list) 36 | 37 | def /test_filter_csvwrite(): 38 | # Load masks 39 | data = pickle.load(open("/home/dyt811/Git/cvnnig/data_imaterialist2020/2020-05-25T014759_NSM0.75Prediction/result_file.pkl", 'rb')) 40 | filter_csv_write(data, "2020-05-26T005749_csvBlank.csv") -------------------------------------------------------------------------------- /notebooks/03-Create-dataset.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "code", 5 | "execution_count": 1, 6 | "metadata": {}, 7 | "outputs": [], 8 | "source": [ 9 | "import json\n", 10 | "import logging\n", 11 | "import pickle\n", 12 | "import numpy as np\n", 13 | "import pandas as pd\n", 14 | "from pathlib import Path\n", 15 | "from sklearn import preprocessing\n", 16 | "import sys\n", 17 | "from environs import Env\n", 18 | "import torch\n", 19 | "import detectron2" 20 | ] 21 | }, 22 | { 23 | "cell_type": "code", 24 | "execution_count": 2, 25 | "metadata": {}, 26 | "outputs": [ 27 | { 28 | "data": { 29 | "text/plain": [ 30 | "'0.1.3'" 31 | ] 32 | }, 33 | "execution_count": 2, 34 | "metadata": {}, 35 | "output_type": "execute_result" 36 | } 37 | ], 38 | "source": [ 39 | "detectron2.__version__" 40 | ] 41 | }, 42 | { 43 | "cell_type": "code", 44 | "execution_count": 2, 45 | "metadata": {}, 46 | "outputs": [], 47 | "source": [ 48 | "sys.path.append('../')\n", 49 | "\n", 50 | "env = Env()\n", 51 | "env.read_env()\n", 52 | "\n", 53 | "# Get training dataframe\n", 54 | "path_data = Path(env(\"path_raw\"))\n", 55 | "path_image = path_data / \"train/\"\n", 56 | "path_data_interim = Path(env(\"path_interim\"))" 57 | ] 58 | }, 59 | { 60 | "cell_type": "code", 61 | "execution_count": 3, 62 | "metadata": {}, 63 | "outputs": [], 64 | "source": [ 65 | "from imaterialist.data.datasets.make_dataset import load_dataset_into_dataframes, create_datadict, attr_str_to_list " 66 | ] 67 | }, 68 | { 69 | "cell_type": "code", 70 | "execution_count": 4, 71 | "metadata": {}, 72 | "outputs": [ 73 | { 74 | "name": "stderr", 75 | "output_type": "stream", 76 | "text": [ 77 | "../imaterialist/data/datasets/make_dataset.py:76: SettingWithCopyWarning: \n", 78 | "A value is trying to be set on a copy of a slice from a DataFrame\n", 79 | "\n", 80 | "See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n", 81 | " df['AttributesIds'][index] = row['AttributesIds'].split(',')\n", 82 | "../imaterialist/data/datasets/make_dataset.py:79: SettingWithCopyWarning: \n", 83 | "A value is trying to be set on a copy of a slice from a DataFrame\n", 84 | "\n", 85 | "See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n", 86 | " df['AttributesIds'][index] = [int(x) for x in df['AttributesIds'][index]]\n", 87 | "../imaterialist/data/datasets/make_dataset.py:85: SettingWithCopyWarning: \n", 88 | "A value is 
trying to be set on a copy of a slice from a DataFrame\n", 89 | "\n", 90 | "See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n", 91 | " df['AttributesIds'][index] = lb.transform(df['AttributesIds'][index]).sum(axis=0)\n", 92 | "../imaterialist/data/datasets/make_dataset.py:83: SettingWithCopyWarning: \n", 93 | "A value is trying to be set on a copy of a slice from a DataFrame\n", 94 | "\n", 95 | "See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n", 96 | " df['AttributesIds'][index] = [999]\n" 97 | ] 98 | } 99 | ], 100 | "source": [ 101 | "data_full, df_attributes, _ = load_dataset_into_dataframes(n_cases=500)\n", 102 | "datadic_full = create_datadict(data_full, df_attributes)" 103 | ] 104 | }, 105 | { 106 | "cell_type": "code", 107 | "execution_count": 5, 108 | "metadata": {}, 109 | "outputs": [], 110 | "source": [ 111 | "n_train = 400\n", 112 | "n_test = 100\n", 113 | "\n", 114 | "datadic_train = datadic_full[:n_train].copy()\n", 115 | "datadic_val = datadic_full[-n_test:].copy()" 116 | ] 117 | }, 118 | { 119 | "cell_type": "code", 120 | "execution_count": 6, 121 | "metadata": {}, 122 | "outputs": [ 123 | { 124 | "data": { 125 | "text/html": [ 126 | "
"[HTML rendering of datadic_train.sample(10) was stripped of its tags during extraction and is omitted here; the same rows appear in the text/plain output below]\n
" 290 | ], 291 | "text/plain": [ 292 | " ImageId \\\n", 293 | "322 /home/julien/data-science/kaggle/imaterialist/... \n", 294 | "307 /home/julien/data-science/kaggle/imaterialist/... \n", 295 | "136 /home/julien/data-science/kaggle/imaterialist/... \n", 296 | "370 /home/julien/data-science/kaggle/imaterialist/... \n", 297 | "288 /home/julien/data-science/kaggle/imaterialist/... \n", 298 | "167 /home/julien/data-science/kaggle/imaterialist/... \n", 299 | "53 /home/julien/data-science/kaggle/imaterialist/... \n", 300 | "75 /home/julien/data-science/kaggle/imaterialist/... \n", 301 | "350 /home/julien/data-science/kaggle/imaterialist/... \n", 302 | "265 /home/julien/data-science/kaggle/imaterialist/... \n", 303 | "\n", 304 | " EncodedPixels Height Width \\\n", 305 | "322 1131858 13 1131917 36 1133630 42 1133718 40 11... 1800 1200 \n", 306 | "307 212864 7 213881 21 214899 31 215920 35 216943 ... 1024 683 \n", 307 | "136 179500 6 180515 16 181534 22 182557 25 183579 ... 1024 680 \n", 308 | "370 5669654 1 5673613 1 5677571 3 5681530 4 568548... 3960 2640 \n", 309 | "288 896338 1 897937 4 899537 7 901137 9 902736 13 ... 1600 1067 \n", 310 | "167 247017 4 248399 11 249780 16 251162 18 252543 ... 1383 900 \n", 311 | "53 701292 5 703597 15 705903 25 708209 34 710515 ... 2310 1536 \n", 312 | "75 1590945 1 1593942 4 1596939 7 1599938 8 160293... 3000 2000 \n", 313 | "350 243646 2 244218 6 244714 16 244743 3 244760 5 ... 576 1024 \n", 314 | "265 1660869 5 1663151 15 1665436 21 1667723 22 167... 2287 1522 \n", 315 | "\n", 316 | " ClassId AttributesIds x0 y0 \\\n", 317 | "322 23 [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... 628 1418 \n", 318 | "307 23 [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... 207 878 \n", 319 | "136 10 [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... 175 163 \n", 320 | "370 31 [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... 1431 1546 \n", 321 | "288 33 [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... 560 282 \n", 322 | "167 4 [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... 178 337 \n", 323 | "53 6 [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... 303 1181 \n", 324 | "75 37 [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... 530 929 \n", 325 | "350 24 [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... 422 217 \n", 326 | "265 33 [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... 
726 499 \n", 327 | "\n", 328 | " x1 y1 \n", 329 | "322 692 1583 \n", 330 | "307 257 961 \n", 331 | "136 530 956 \n", 332 | "370 2057 2923 \n", 333 | "288 631 349 \n", 334 | "167 682 984 \n", 335 | "53 824 2136 \n", 336 | "75 741 1003 \n", 337 | "350 456 575 \n", 338 | "265 887 571 " 339 | ] 340 | }, 341 | "execution_count": 6, 342 | "metadata": {}, 343 | "output_type": "execute_result" 344 | } 345 | ], 346 | "source": [ 347 | "datadic_train.sample(10)" 348 | ] 349 | }, 350 | { 351 | "cell_type": "code", 352 | "execution_count": 7, 353 | "metadata": {}, 354 | "outputs": [ 355 | { 356 | "name": "stdout", 357 | "output_type": "stream", 358 | "text": [ 359 | "\n", 360 | "RangeIndex: 400 entries, 0 to 399\n", 361 | "Data columns (total 10 columns):\n", 362 | " # Column Non-Null Count Dtype \n", 363 | "--- ------ -------------- ----- \n", 364 | " 0 ImageId 400 non-null object\n", 365 | " 1 EncodedPixels 400 non-null object\n", 366 | " 2 Height 400 non-null int64 \n", 367 | " 3 Width 400 non-null int64 \n", 368 | " 4 ClassId 400 non-null int64 \n", 369 | " 5 AttributesIds 400 non-null object\n", 370 | " 6 x0 400 non-null int64 \n", 371 | " 7 y0 400 non-null int64 \n", 372 | " 8 x1 400 non-null int64 \n", 373 | " 9 y1 400 non-null int64 \n", 374 | "dtypes: int64(7), object(3)\n", 375 | "memory usage: 31.4+ KB\n" 376 | ] 377 | } 378 | ], 379 | "source": [ 380 | "\n", 381 | "datadic_train.info()" 382 | ] 383 | }, 384 | { 385 | "cell_type": "code", 386 | "execution_count": 20, 387 | "metadata": {}, 388 | "outputs": [], 389 | "source": [ 390 | "import pickle" 391 | ] 392 | }, 393 | { 394 | "cell_type": "code", 395 | "execution_count": 10, 396 | "metadata": {}, 397 | "outputs": [], 398 | "source": [ 399 | "datadict_train = pickle.load(open(path_data_interim / 'imaterialist_train_multihot_n=400.p', 'rb'))" 400 | ] 401 | }, 402 | { 403 | "cell_type": "code", 404 | "execution_count": null, 405 | "metadata": {}, 406 | "outputs": [], 407 | "source": [] 408 | } 409 | ], 410 | "metadata": { 411 | "kernelspec": { 412 | "display_name": "Python (imaterialist)", 413 | "language": "python", 414 | "name": "imaterialist" 415 | }, 416 | "language_info": { 417 | "codemirror_mode": { 418 | "name": "ipython", 419 | "version": 3 420 | }, 421 | "file_extension": ".py", 422 | "mimetype": "text/x-python", 423 | "name": "python", 424 | "nbconvert_exporter": "python", 425 | "pygments_lexer": "ipython3", 426 | "version": "3.8.3" 427 | } 428 | }, 429 | "nbformat": 4, 430 | "nbformat_minor": 4 431 | } 432 | -------------------------------------------------------------------------------- /notebooks/06-Attribute-inference.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "code", 5 | "execution_count": 1, 6 | "metadata": {}, 7 | "outputs": [], 8 | "source": [ 9 | "%reload_ext autoreload\n", 10 | "%autoreload 2\n", 11 | "%matplotlib inline" 12 | ] 13 | }, 14 | { 15 | "cell_type": "code", 16 | "execution_count": 2, 17 | "metadata": {}, 18 | "outputs": [], 19 | "source": [ 20 | "import numpy as np \n", 21 | "import pandas as pd \n", 22 | "import collections\n", 23 | "import torch\n", 24 | "import feather\n", 25 | "import json\n", 26 | "import os\n", 27 | "import cv2\n", 28 | "import random\n", 29 | "import gc\n", 30 | "import pycocotools\n", 31 | "\n", 32 | "from tqdm import tqdm\n", 33 | "import matplotlib.pyplot as plt\n", 34 | "import PIL\n", 35 | "from PIL import Image, ImageFile\n", 36 | "from torch.utils.data import Dataset, DataLoader\n", 37 | 
"\n", 38 | "from pathlib import Path\n", 39 | "from environs import Env\n", 40 | "\n", 41 | "from detectron2 import model_zoo\n", 42 | "from detectron2.structures import BoxMode\n", 43 | "from detectron2.engine import DefaultPredictor, default_argument_parser, default_setup\n", 44 | "from detectron2.config import get_cfg\n", 45 | "from detectron2.utils.visualizer import Visualizer\n", 46 | "from detectron2.data import MetadataCatalog\n", 47 | "from detectron2.utils.logger import setup_logger\n", 48 | "\n", 49 | "import sys\n", 50 | "sys.path.append('../')\n", 51 | "\n", 52 | "from iMaterialist2020.imaterialist.data.datasets.coco import register_datadict\n", 53 | "from iMaterialist2020.imaterialist.config import add_imaterialist_config\n", 54 | "from iMaterialist2020.imaterialist.modeling import build_model\n", 55 | "\n", 56 | "env = Env()\n", 57 | "env.read_env()" 58 | ] 59 | }, 60 | { 61 | "cell_type": "code", 62 | "execution_count": 3, 63 | "metadata": {}, 64 | "outputs": [], 65 | "source": [ 66 | "# Get training dataframe\n", 67 | "data_dir = Path(env('path_raw'))\n", 68 | "image_dir = Path(env('path_images'))\n", 69 | "df = pd.read_csv(data_dir/'train.csv')\n", 70 | "\n", 71 | "# Load modified df for Detectron2 dataset dict \n", 72 | "df_detectron = pd.read_feather('../data/interim/imaterialist_train_multihot_n=4000.feather') \n", 73 | "\n", 74 | "# Get label descriptions\n", 75 | "with open(data_dir/'label_descriptions.json', 'r') as file:\n", 76 | " label_desc = json.load(file)\n", 77 | "df_categories = pd.DataFrame(label_desc['categories'])\n", 78 | "df_attributes = pd.DataFrame(label_desc['attributes'])" 79 | ] 80 | }, 81 | { 82 | "cell_type": "code", 83 | "execution_count": 4, 84 | "metadata": {}, 85 | "outputs": [ 86 | { 87 | "output_type": "execute_result", 88 | "data": { 89 | "text/plain": "'/home/nasty/imaterialist2020/data/raw/train/00000663ed1ff0c4e0132b9b9ac53f6e.jpg'" 90 | }, 91 | "metadata": {}, 92 | "execution_count": 4 93 | } 94 | ], 95 | "source": [ 96 | "\n", 97 | "df_detectron.ImageId[0]" 98 | ] 99 | }, 100 | { 101 | "cell_type": "code", 102 | "execution_count": 5, 103 | "metadata": {}, 104 | "outputs": [ 105 | { 106 | "output_type": "error", 107 | "ename": "TypeError", 108 | "evalue": "Image data of dtype \u001b[0m in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mplt\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mimshow\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mdf_detectron\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mImageId\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;36m0\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", 113 | "\u001b[0;32m~/anaconda3/envs/imaterialist/lib/python3.8/site-packages/matplotlib/pyplot.py\u001b[0m in \u001b[0;36mimshow\u001b[0;34m(X, cmap, norm, aspect, interpolation, alpha, vmin, vmax, origin, extent, shape, filternorm, filterrad, imlim, resample, url, data, **kwargs)\u001b[0m\n\u001b[1;32m 2676\u001b[0m \u001b[0mfilterrad\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;36m4.0\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mimlim\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mcbook\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mdeprecation\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_deprecated_parameter\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 2677\u001b[0m resample=None, url=None, *, data=None, **kwargs):\n\u001b[0;32m-> 2678\u001b[0;31m __ret = gca().imshow(\n\u001b[0m\u001b[1;32m 2679\u001b[0m \u001b[0mX\u001b[0m\u001b[0;34m,\u001b[0m 
\u001b[0mcmap\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mcmap\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mnorm\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mnorm\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0maspect\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0maspect\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 2680\u001b[0m \u001b[0minterpolation\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0minterpolation\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0malpha\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0malpha\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mvmin\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mvmin\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", 114 | "\u001b[0;32m~/anaconda3/envs/imaterialist/lib/python3.8/site-packages/matplotlib/__init__.py\u001b[0m in \u001b[0;36minner\u001b[0;34m(ax, data, *args, **kwargs)\u001b[0m\n\u001b[1;32m 1597\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0minner\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0max\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m*\u001b[0m\u001b[0margs\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mdata\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;32mNone\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m**\u001b[0m\u001b[0mkwargs\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1598\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mdata\u001b[0m \u001b[0;32mis\u001b[0m \u001b[0;32mNone\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m-> 1599\u001b[0;31m \u001b[0;32mreturn\u001b[0m \u001b[0mfunc\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0max\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m*\u001b[0m\u001b[0mmap\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0msanitize_sequence\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0margs\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m**\u001b[0m\u001b[0mkwargs\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 1600\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1601\u001b[0m \u001b[0mbound\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mnew_sig\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mbind\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0max\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m*\u001b[0m\u001b[0margs\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m**\u001b[0m\u001b[0mkwargs\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", 115 | "\u001b[0;32m~/anaconda3/envs/imaterialist/lib/python3.8/site-packages/matplotlib/cbook/deprecation.py\u001b[0m in \u001b[0;36mwrapper\u001b[0;34m(*args, **kwargs)\u001b[0m\n\u001b[1;32m 367\u001b[0m \u001b[0;34mf\"%(removal)s. 
If any parameter follows {name!r}, they \"\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 368\u001b[0m f\"should be pass as keyword, not positionally.\")\n\u001b[0;32m--> 369\u001b[0;31m \u001b[0;32mreturn\u001b[0m \u001b[0mfunc\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m*\u001b[0m\u001b[0margs\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m**\u001b[0m\u001b[0mkwargs\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 370\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 371\u001b[0m \u001b[0;32mreturn\u001b[0m \u001b[0mwrapper\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", 116 | "\u001b[0;32m~/anaconda3/envs/imaterialist/lib/python3.8/site-packages/matplotlib/cbook/deprecation.py\u001b[0m in \u001b[0;36mwrapper\u001b[0;34m(*args, **kwargs)\u001b[0m\n\u001b[1;32m 367\u001b[0m \u001b[0;34mf\"%(removal)s. If any parameter follows {name!r}, they \"\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 368\u001b[0m f\"should be pass as keyword, not positionally.\")\n\u001b[0;32m--> 369\u001b[0;31m \u001b[0;32mreturn\u001b[0m \u001b[0mfunc\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m*\u001b[0m\u001b[0margs\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m**\u001b[0m\u001b[0mkwargs\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 370\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 371\u001b[0m \u001b[0;32mreturn\u001b[0m \u001b[0mwrapper\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", 117 | "\u001b[0;32m~/anaconda3/envs/imaterialist/lib/python3.8/site-packages/matplotlib/axes/_axes.py\u001b[0m in \u001b[0;36mimshow\u001b[0;34m(self, X, cmap, norm, aspect, interpolation, alpha, vmin, vmax, origin, extent, shape, filternorm, filterrad, imlim, resample, url, **kwargs)\u001b[0m\n\u001b[1;32m 5677\u001b[0m resample=resample, **kwargs)\n\u001b[1;32m 5678\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m-> 5679\u001b[0;31m \u001b[0mim\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mset_data\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mX\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 5680\u001b[0m \u001b[0mim\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mset_alpha\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0malpha\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 5681\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mim\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mget_clip_path\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;32mis\u001b[0m \u001b[0;32mNone\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", 118 | "\u001b[0;32m~/anaconda3/envs/imaterialist/lib/python3.8/site-packages/matplotlib/image.py\u001b[0m in \u001b[0;36mset_data\u001b[0;34m(self, A)\u001b[0m\n\u001b[1;32m 682\u001b[0m if (self._A.dtype != np.uint8 and\n\u001b[1;32m 683\u001b[0m not np.can_cast(self._A.dtype, float, \"same_kind\")):\n\u001b[0;32m--> 684\u001b[0;31m raise TypeError(\"Image data of dtype {} cannot be converted to \"\n\u001b[0m\u001b[1;32m 685\u001b[0m \"float\".format(self._A.dtype))\n\u001b[1;32m 686\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n", 119 | "\u001b[0;31mTypeError\u001b[0m: Image data of dtype .container { width:100% !important; }" 16 | ], 17 | "text/plain": [ 18 | "" 19 | ] 20 | }, 21 | "metadata": {}, 22 | "output_type": "display_data" 23 | } 24 | ], 25 | "source": [ 26 | "import pickle\n", 27 | "import os\n", 
28 | "from pathlib import Path\n", 29 | "from PIL import Image as PILImage\n", 30 | "from IPython.display import Image \n", 31 | "# Notebook widget for interactive exploration\n", 32 | "import ipywidgets as widgets\n", 33 | "from ipywidgets import interact, interact_manual\n", 34 | "import matplotlib.pyplot as plt\n", 35 | "from matplotlib.pyplot import imshow\n", 36 | "import cv2 as cv\n", 37 | "from IPython.core.display import display, HTML\n", 38 | "display(HTML(\"\"))\n", 39 | "\n", 40 | "from PythonUtils.rle_decoding import RLE_decoding\n", 41 | "\n", 42 | "import numpy as np\n", 43 | "import pandas as pd\n", 44 | "from dotenv import load_dotenv, find_dotenv\n", 45 | "from src.data.csv_label_read import pandaread_image_labels\n", 46 | "from dotenv import load_dotenv, find_dotenv\n", 47 | "load_dotenv(find_dotenv())\n", 48 | "PATH_DATA_RAW = os.getenv(\"PATH_DATA_RAW\")\n" 49 | ] 50 | }, 51 | { 52 | "cell_type": "code", 53 | "execution_count": 3, 54 | "metadata": { 55 | "pycharm": { 56 | "is_executing": true 57 | } 58 | }, 59 | "outputs": [], 60 | "source": [ 61 | "dataframe = pandaread_image_labels()" 62 | ] 63 | }, 64 | { 65 | "cell_type": "code", 66 | "execution_count": 4, 67 | "metadata": { 68 | "pycharm": { 69 | "is_executing": true 70 | } 71 | }, 72 | "outputs": [], 73 | "source": [ 74 | "# Interactively Explorer the DataFrame\n", 75 | "import dtale\n", 76 | "d = dtale.show(dataframe)\n", 77 | "d.open_browser()" 78 | ] 79 | }, 80 | { 81 | "cell_type": "code", 82 | "execution_count": 5, 83 | "metadata": { 84 | "pycharm": { 85 | "is_executing": true 86 | }, 87 | "scrolled": false 88 | }, 89 | "outputs": [ 90 | { 91 | "data": { 92 | "application/vnd.jupyter.widget-view+json": { 93 | "model_id": "74d20e20c9d34601bcc05a3a43793a43", 94 | "version_major": 2, 95 | "version_minor": 0 96 | }, 97 | "text/plain": [ 98 | "interactive(children=(IntSlider(value=166700, description='index_label', max=333400), Output()), _dom_classes=…" 99 | ] 100 | }, 101 | "metadata": {}, 102 | "output_type": "display_data" 103 | } 104 | ], 105 | "source": [ 106 | "@interact\n", 107 | "def show_count(index_label=(0, len(dataframe)-1)):\n", 108 | " df_label = dataframe.loc[index_label, :] \n", 109 | " print(df_label.ClassId)\n", 110 | " #print(df_label.EncodedPixels)\n", 111 | " \n", 112 | " (order, length) = RLE_decoding.parse_order_length_string(df_label.EncodedPixels)\n", 113 | " \n", 114 | " test = RLE_decoding(order, length, x_max=df_label.Width, y_max=df_label.Height, y_encoded_first=False)\n", 115 | " test.decode()\n", 116 | " \n", 117 | " path_original = Path(PATH_DATA_RAW) / f\"train/{df_label.ImageId}.jpg\"\n", 118 | " \n", 119 | " image_original = PILImage.open(path_original)\n", 120 | " \n", 121 | " scale = 4\n", 122 | " \n", 123 | " \n", 124 | " original_resized = image_original.resize((image_original.size[0]//scale,image_original.size[1]//scale ))\n", 125 | " \n", 126 | " from matplotlib import rcParams\n", 127 | "\n", 128 | " # figure size in inches optional\n", 129 | " rcParams['figure.figsize'] = 11 ,8\n", 130 | " \n", 131 | " mask = test.get_mask()\n", 132 | " mask_resized = mask.resize((df_label.Width//scale, df_label.Height//scale))\n", 133 | " \n", 134 | " # display images\n", 135 | " fig, ax = plt.subplots(1,2)\n", 136 | "\n", 137 | " ax[0].imshow(original_resized, cmap='gray');\n", 138 | " ax[1].imshow(mask_resized, cmap='gray');\n", 139 | " fig.set_size_inches(20,20)" 140 | ] 141 | }, 142 | { 143 | "cell_type": "code", 144 | "execution_count": null, 145 | "metadata": { 146 | "pycharm": { 
147 | "is_executing": true 148 | } 149 | }, 150 | "outputs": [], 151 | "source": [ 152 | "image_original.size" 153 | ] 154 | }, 155 | { 156 | "cell_type": "code", 157 | "execution_count": null, 158 | "metadata": { 159 | "pycharm": { 160 | "is_executing": true 161 | } 162 | }, 163 | "outputs": [], 164 | "source": [] 165 | } 166 | ], 167 | "metadata": { 168 | "kernelspec": { 169 | "display_name": "Python 3", 170 | "language": "python", 171 | "name": "python3" 172 | }, 173 | "language_info": { 174 | "codemirror_mode": { 175 | "name": "ipython", 176 | "version": 3 177 | }, 178 | "file_extension": ".py", 179 | "mimetype": "text/x-python", 180 | "name": "python", 181 | "nbconvert_exporter": "python", 182 | "pygments_lexer": "ipython3", 183 | "version": "3.7.6" 184 | } 185 | }, 186 | "nbformat": 4, 187 | "nbformat_minor": 2 188 | } 189 | -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | absl-py==0.9.0 2 | astroid==2.4.0 3 | attrs==19.3.0 4 | backcall==0.1.0 5 | bleach==3.1.4 6 | cachetools==4.1.0 7 | certifi==2020.4.5.1 8 | chardet==3.0.4 9 | cloudpickle==1.4.1 10 | cycler==0.10.0 11 | Cython==0.29.17 12 | decorator==4.4.2 13 | defusedxml==0.6.0 14 | detectron2==0.1.2+cu101 15 | entrypoints==0.3 16 | environs==7.4.0 17 | feather-format==0.4.1 18 | flake8==3.7.9 19 | future==0.18.2 20 | fvcore==0.1.dev200506 21 | google-auth==1.14.2 22 | google-auth-oauthlib==0.4.1 23 | grpcio==1.28.1 24 | idna==2.9 25 | importlib-metadata==1.5.0 26 | ipykernel==5.1.4 27 | ipython==7.13.0 28 | ipython-genutils==0.2.0 29 | isort==4.3.21 30 | jedi==0.17.0 31 | Jinja2==2.11.2 32 | joblib==0.14.1 33 | jsonschema==3.2.0 34 | jupyter-client==6.1.3 35 | jupyter-core==4.6.3 36 | kaggle==1.5.6 37 | kiwisolver==1.2.0 38 | lazy-object-proxy==1.4.3 39 | Markdown==3.2.2 40 | MarkupSafe==1.1.1 41 | marshmallow==3.6.0 42 | matplotlib==3.1.3 43 | mccabe==0.6.1 44 | mistune==0.8.4 45 | mkl-fft==1.0.15 46 | mkl-random==1.1.0 47 | mkl-service==2.3.0 48 | mock==4.0.2 49 | nbconvert==5.6.1 50 | nbformat==5.0.6 51 | notebook==6.0.3 52 | numpy==1.18.1 53 | oauthlib==3.1.0 54 | olefile==0.46 55 | opencv-python==4.2.0.34 56 | pandas==1.0.3 57 | pandocfilters==1.4.2 58 | parso==0.7.0 59 | pexpect==4.8.0 60 | pickleshare==0.7.5 61 | Pillow==7.1.2 62 | portalocker==1.7.0 63 | prometheus-client==0.7.1 64 | prompt-toolkit==3.0.4 65 | protobuf==3.11.3 66 | ptyprocess==0.6.0 67 | pyarrow==0.17.0 68 | pyasn1==0.4.8 69 | pyasn1-modules==0.2.8 70 | pycocotools==2.0 71 | pycodestyle==2.5.0 72 | pydot==1.4.1 73 | pyflakes==2.1.1 74 | Pygments==2.6.1 75 | pylint==2.5.0 76 | pyparsing==2.4.7 77 | pyrsistent==0.16.0 78 | python-dateutil==2.8.1 79 | python-dotenv==0.13.0 80 | python-slugify==4.0.0 81 | pytz==2020.1 82 | PyYAML==5.3.1 83 | pyzmq==18.1.1 84 | requests==2.23.0 85 | requests-oauthlib==1.3.0 86 | rsa==4.0 87 | scikit-learn==0.23.0 88 | scipy==1.4.1 89 | seaborn==0.10.1 90 | Send2Trash==1.5.0 91 | sip==4.19.13 92 | six==1.14.0 93 | tabulate==0.8.7 94 | tensorboard==2.2.1 95 | tensorboard-plugin-wit==1.6.0.post3 96 | termcolor==1.1.0 97 | terminado==0.8.3 98 | testpath==0.4.4 99 | text-unidecode==1.3 100 | threadpoolctl==2.0.0 101 | toml==0.10.0 102 | torch==1.5.0+cu101 103 | torchvision==0.6.0+cu101 104 | tornado==6.0.4 105 | tqdm==4.46.0 106 | traitlets==4.3.3 107 | urllib3==1.24.3 108 | wcwidth==0.1.9 109 | webencodings==0.5.1 110 | Werkzeug==1.0.1 111 | wrapt==1.11.2 112 | yacs==0.1.7 113 | zipp==3.1.0 114 | 
-------------------------------------------------------------------------------- /train_net.py: -------------------------------------------------------------------------------- 1 | """ 2 | iMaterialist 2020 training script. 3 | 4 | This script runs a trainer where we pass in custom dataset mapper which contains all the 5 | attributes of each instance. 6 | 7 | We register the data dictionnaries, load the configs, and run the trainer 8 | """ 9 | 10 | 11 | import pandas as pd 12 | import logging 13 | from environs import Env 14 | from pathlib import Path 15 | import pickle 16 | 17 | import detectron2.utils.comm as comm 18 | from detectron2 import model_zoo 19 | from detectron2.config import get_cfg 20 | from detectron2.engine import DefaultTrainer, default_argument_parser, default_setup, launch 21 | from detectron2.data import build_detection_train_loader, build_detection_test_loader 22 | from detectron2.utils.logger import setup_logger 23 | 24 | from imaterialist.data.dataset_mapper import iMatDatasetMapper 25 | from imaterialist.config import add_imaterialist_config 26 | from imaterialist.data.datasets.coco import register_datadict 27 | from imaterialist.modeling import build_model 28 | 29 | from imaterialist.modeling import roi_heads 30 | 31 | # Get environment variables 32 | env = Env() 33 | env.read_env() 34 | 35 | # Set path to the data 36 | path_data_interim = Path(env("path_interim")) 37 | 38 | class FashionTrainer(DefaultTrainer): 39 | 'A customized version of DefaultTrainer. We add a custom mapping to the dataloader' 40 | 41 | @classmethod 42 | def build_train_loader(cls, cfg): 43 | return build_detection_train_loader(cfg, mapper=iMatDatasetMapper(cfg)) 44 | 45 | @classmethod 46 | def build_test_loader(cls, cfg, dataset_name): 47 | return build_detection_test_loader(cfg, dataset_name, mapper=iMatDatasetMapper(cfg)) 48 | 49 | @classmethod 50 | def build_model(cls, cfg): 51 | """ 52 | Returns: 53 | torch.nn.Module: 54 | 55 | It now calls :func:`detectron2.modeling.build_model`. 
56 | """ 57 | model = build_model(cfg) 58 | logger = logging.getLogger(__name__) 59 | logger.info("Model:\n{}".format(model)) 60 | return model 61 | 62 | def setup(args): 63 | """ 64 | Setup all the custom and default configs before training 65 | """ 66 | cfg = get_cfg() 67 | add_imaterialist_config(cfg) 68 | cfg.merge_from_file(model_zoo.get_config_file("COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml")) 69 | cfg.merge_from_file(args.config_file) 70 | 71 | cfg.merge_from_list(args.opts) 72 | cfg.freeze() 73 | default_setup(cfg, args) 74 | # Setup logger for "imaterialist" module 75 | setup_logger(output=cfg.OUTPUT_DIR, distributed_rank=comm.get_rank(), name="imaterialist") 76 | return cfg 77 | 78 | def main(args): 79 | """ 80 | load dataframes 81 | register detectron2 datadictionnaries 82 | setup config 83 | initialize the trainer 84 | run trainer to train the model 85 | """ 86 | # load dataframe 87 | # fixme: this number needs to update or dynamic 88 | # datadic_train = pd.read_feather(path_data_interim / 'imaterialist_train_multihot_n=400.feather') 89 | # datadic_val = pd.read_feather(path_data_interim / 'imaterailist_test_multihot_n=100.feather') 90 | 91 | datadict_train = pickle.load(open(path_data_interim / 'imaterialist_train_multihot_n=400.p', 'rb')) 92 | datadict_val = pickle.load(open(path_data_interim / 'imaterialist_test_multihot_n=100.p', 'rb')) 93 | 94 | register_datadict(datadict_train, "sample_fashion_train") 95 | register_datadict(datadict_val, "sample_fashion_test") 96 | 97 | cfg = setup(args) 98 | 99 | trainer = FashionTrainer(cfg) 100 | trainer.resume_or_load(resume=args.resume) 101 | return trainer.train() 102 | 103 | if __name__ == '__main__': 104 | args = default_argument_parser().parse_args() 105 | args.config_file = "/home/julien/data-science/kaggle/imaterialist/configs/exp06.yaml" 106 | print("Command Line Args:", args) 107 | launch( 108 | main, 109 | args.num_gpus, 110 | num_machines=args.num_machines, 111 | machine_rank=args.machine_rank, 112 | dist_url=args.dist_url, 113 | args=(args,), 114 | ) --------------------------------------------------------------------------------