├── README.md
├── costum_imagenet.py
├── examples
│   ├── test1.png
│   └── test2.png
├── get_indices.py
├── read_xml.py
└── test_foreground.py

/README.md:
--------------------------------------------------------------------------------
# ImageNet 1K Bounding Boxes
For some experiments, you might want to feed a model only the `background` of ImageNet images versus only the `foreground`. This repository contains the code to extract the bounding-box metadata, clean up the downloaded archives, and adapt the ImageNet loader so that it serves only the images that have box annotations.

# How to use:
```python
import torch
import torchvision
import torchvision.transforms as trans

from costum_imagenet import BackgroundForegroundImageNet

tr = trans.Compose([trans.Resize(224), trans.CenterCrop(224), trans.ToTensor(), ])
dataset = BackgroundForegroundImageNet(root='./data/imagenet/train', download=True, transform=tr)
x, b, f, y = dataset[0]
torchvision.utils.save_image(torch.stack([x, b, f]), 'test1.png')
```
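
Each item is a 4-tuple `(image, background, foreground, label)`, so the default collate function batches all four. A minimal batching sketch (standard PyTorch, not part of this repo's API), assuming the `dataset` from the snippet above:
```python
from torch.utils.data import DataLoader

# Each sample is (image, background, foreground, label), so the default
# collate yields three image batches plus a label batch.
loader = DataLoader(dataset, batch_size=32, shuffle=True, num_workers=4)
for x, b, f, y in loader:
    print(x.shape, b.shape, f.shape, y.shape)  # [32, 3, 224, 224] x3, then [32]
    break
```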

# Example:
![Test1](/examples/test1.png)
![Test2](/examples/test2.png)

If you set `download=True`, the bounding boxes and the indices of the ImageNet train split that have box annotations are downloaded automatically. If you would rather build the bounding-box files from scratch, follow these steps:

# Rebuilding from scratch
Download the annotation archive from [here](https://image-net.org/data/bboxes_annotations.tar.gz):
```bash
wget "https://image-net.org/data/bboxes_annotations.tar.gz"
```

Extract the archive:
```bash
tar -xvf bboxes_annotations.tar.gz
```

Extract every per-class archive inside it:
```bash
cd bboxes_annotations
for f in *.tar.gz ; do tar -xvf "${f}" ; done
```

Convert the XML annotations into a single PyTorch dictionary (`boxes.pt`):
```bash
python read_xml.py
```

Clean up the roughly 50 GB of extracted files:
```bash
rm *.tar.gz
rm -rf n*/
```

Compute the indices of the train-split images that have bounding boxes:
```bash
python get_indices.py
```

Then simply pass the paths of `boxes.pt` and `indices.pt` to the `BackgroundForegroundImageNet` constructor:
```python
dataset = BackgroundForegroundImageNet(root='.', download=False, boxes='boxes.pt', indices='indices.pt')
```
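
For reference, this is the layout the two files end up with, as built by `read_xml.py` and `get_indices.py` (the synset id, filename, and coordinate values below are illustrative):
```python
import torch

boxes = torch.load('boxes.pt')      # {synset_id: {filename: [[box, box, ...]]}}
indices = torch.load('indices.pt')  # plain list of ints into ImageFolder.samples

# e.g. boxes['n01440764']['n01440764_10026.JPEG'][0] -> [['60', '340', '24', '300']]
# Each box is [xmin, xmax, ymin, ymax], kept as strings straight from the XML;
# the extra [0] is there because each annotation file contributes one group of boxes.
```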
--------------------------------------------------------------------------------
/costum_imagenet.py:
--------------------------------------------------------------------------------
from torch.utils import model_zoo
from torchvision.datasets import ImageFolder
import torch
from torchvision.transforms import ToTensor, ToPILImage


class BackgroundForegroundImageNet(ImageFolder):
    boxes_url = 'https://github.com/AminJun/ImageNet1KBoundingBoxes/releases/download/files/boxes.pt'
    indices_url = 'https://github.com/AminJun/ImageNet1KBoundingBoxes/releases/download/files/indices.pt'

    def __init__(self, root: str = '.', download: bool = True, boxes: str = None, indices: str = None,
                 *args, **kwargs):
        assert download or (boxes is not None and indices is not None)
        if download:
            self.boxes = model_zoo.load_url(self.boxes_url, map_location='cpu')
            self.b_indices = model_zoo.load_url(self.indices_url, map_location='cpu')
        else:
            self.boxes = torch.load(boxes)
            self.b_indices = torch.load(indices)

        # boxes.pt maps synset_id -> {filename: box groups}; flatten it into a
        # single filename -> box groups dict so __getitem__ can look up by filename.
        merged = {}
        for k, v in self.boxes.items():
            merged.update(v)
        self.boxes = merged

        self.pre_transform = ToTensor()
        self.back_transform = ToPILImage()
        print('loading imagenet folders')
        super(BackgroundForegroundImageNet, self).__init__(root, *args, **kwargs)

    def __len__(self):
        # Only the samples that have box annotations are exposed.
        return len(self.b_indices)

    def __getitem__(self, item):
        real_i = self.b_indices[item]
        path, target = self.samples[real_i]
        sample = self.pre_transform(self.loader(path))
        # Background: the image with every annotated box zeroed out.
        # Foreground: the complement, so background + foreground == sample.
        background = sample.clone().detach()
        for box in self.boxes[path.split('/')[-1]][0]:
            x1, x2, y1, y2 = box  # [xmin, xmax, ymin, ymax], stored as strings
            background[:, int(y1):int(y2), int(x1):int(x2)] = 0
        foreground = (sample - background).detach().clone()

        # Convert back to PIL so user transforms (Resize, CenterCrop, ...) apply
        # uniformly to all three images.
        sample, background, foreground = self.back_transform(sample), self.back_transform(
            background), self.back_transform(foreground)

        if self.transform is not None:
            sample, background, foreground = self.transform(sample), self.transform(background), self.transform(
                foreground)
        if self.target_transform is not None:
            target = self.target_transform(target)

        return sample, background, foreground, target
--------------------------------------------------------------------------------
/examples/test1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AminJun/ImageNet1KBoundingBoxes/c5958eaeb4553a6eb154c8c031f2b9df0f494790/examples/test1.png
--------------------------------------------------------------------------------
/examples/test2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AminJun/ImageNet1KBoundingBoxes/c5958eaeb4553a6eb154c8c031f2b9df0f494790/examples/test2.png
--------------------------------------------------------------------------------
/get_indices.py:
--------------------------------------------------------------------------------
from torchvision.datasets import ImageFolder
import torch


def main():
    dataset = ImageFolder(root='./data/imagenet/train', transform=None)
    # boxes.pt is keyed by synset id; flatten into one filename -> boxes dict.
    boxes = torch.load('boxes.pt')
    merged = {}
    for k, v in boxes.items():
        merged.update(v)

    # Keep only the indices of samples whose filename has a box annotation.
    indices = [i for i, (name, _) in enumerate(dataset.samples) if name.split('/')[-1] in merged]
    torch.save(indices, 'indices.pt')


if __name__ == '__main__':
    main()
--------------------------------------------------------------------------------
/read_xml.py:
--------------------------------------------------------------------------------
import torch
from tqdm import tqdm
import xmltodict
import os


def translate_obj(cls: str, obj) -> list:
    # xmltodict yields a list when an annotation holds several objects and a
    # single dict when it holds exactly one.
    if isinstance(obj, list):
        return [[v['bndbox']['xmin'], v['bndbox']['xmax'], v['bndbox']['ymin'], v['bndbox']['ymax']] for v in obj if
                v['name'] == cls]
    if obj['name'] == cls:
        v = obj
        return [[v['bndbox']['xmin'], v['bndbox']['xmax'], v['bndbox']['ymin'], v['bndbox']['ymax']]]
    return []  # a single object of a different class contributes no boxes


def objects(expected_class: str, path: str) -> list:
    with open(path, 'r') as f:
        data = f.read()
    xml = xmltodict.parse(data)
    return [translate_obj(expected_class, v) for k, v in xml['annotation'].items() if k == 'object']


def get_path(xml_path: str) -> str:
    # 'n01440764_10026.xml' -> 'n01440764_10026.JPEG'
    return xml_path[:-4] + '.JPEG'


def translate_folder(xml_folder: str, root: str) -> dict:
    parent = os.path.join(root, xml_folder)
    return {f'{get_path(path)}': objects(xml_folder, os.path.join(parent, path)) for path in os.listdir(parent)}


def translate_dataset(root: str, classes: list) -> dict:
    return {f'{dr}': translate_folder(dr, root) for dr in tqdm(os.listdir(root)) if
            os.path.isdir(os.path.join(root, dr)) and dr in classes}


def main():
    # im1knames.txt lists the 1K ImageNet synset ids, one per line.
    with open('im1knames.txt', 'r') as f:
        im1k_classes = f.read().split('\n')
    # Adjust this path to wherever you extracted bboxes_annotations.tar.gz.
    dataset = translate_dataset('/cmlscratch/aminjun/Datasets/ImageNetBoxes/Annotation/', im1k_classes)
    torch.save(dataset, 'boxes.pt')


if __name__ == '__main__':
    main()
--------------------------------------------------------------------------------
/test_foreground.py:
--------------------------------------------------------------------------------
import torchvision.utils
import torch
import torchvision.transforms as trans

from costum_imagenet import BackgroundForegroundImageNet


def test1():
    # Download boxes.pt / indices.pt from the GitHub release.
    tr = trans.Compose([trans.Resize(224), trans.CenterCrop(224), trans.ToTensor(), ])
    dataset = BackgroundForegroundImageNet(root='./data/imagenet/train', download=True, transform=tr)
    x, b, f, y = dataset[0]
    torchvision.utils.save_image(torch.stack([x, b, f]), 'test1.png')


def test2():
    # Use locally built boxes.pt / indices.pt instead of downloading.
    tr = trans.Compose([trans.Resize(224), trans.CenterCrop(224), trans.ToTensor(), ])
    dataset = BackgroundForegroundImageNet(root='./data/imagenet/train', download=False, boxes='boxes.pt',
                                           indices='indices.pt', transform=tr)
    x, b, f, y = dataset[1]
    torchvision.utils.save_image(torch.stack([x, b, f]), 'test2.png')


if __name__ == '__main__':
    test1()
    test2()
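
# --- Illustrative addition, not part of the original repo: since __getitem__
# defines foreground = sample - background, the two should sum back to the
# image, up to rounding in the tensor -> PIL -> tensor round-trip. Run it
# manually, e.g. `python -c "import test_foreground; test_foreground.test3()"`.
def test3():
    tr = trans.Compose([trans.Resize(224), trans.CenterCrop(224), trans.ToTensor(), ])
    dataset = BackgroundForegroundImageNet(root='./data/imagenet/train', download=True, transform=tr)
    x, b, f, _ = dataset[0]
    assert torch.allclose(x, b + f, atol=2 / 255)
--------------------------------------------------------------------------------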