├── .gitignore
├── CONTRIBUTING.md
├── LICENSE.txt
├── README.md
├── config.yml
├── environment.yml
└── scripts
    ├── codraw_dataset_generation
    │   ├── codraw_add_data_to_raw.py
    │   ├── codraw_object_detection.py
    │   └── codraw_raw_to_hdf5.py
    ├── download_data.sh
    ├── iclevr_dataset_generation
    │   ├── iclevr_add_data_to_raw.py
    │   ├── iclevr_object_detection.py
    │   └── iclevr_raw_to_hdf5.py
    └── joint_codraw_iclevr
        └── generate_glove_file.py

/.gitignore:
--------------------------------------------------------------------------------
1 | *.h5
2 | *.rar
3 | *.swp
4 | *.zip
5 | 
6 | GeNeVA-v1/
7 | 
8 | data/
9 | raw-data/
10 | 
--------------------------------------------------------------------------------
/CONTRIBUTING.md:
--------------------------------------------------------------------------------
1 | # Contributing
2 | 
3 | This project welcomes contributions and suggestions. Most contributions require you to
4 | agree to a Contributor License Agreement (CLA) declaring that you have the right to,
5 | and actually do, grant us the rights to use your contribution. For details, visit
6 | https://cla.microsoft.com.
7 | 
8 | When you submit a pull request, a CLA-bot will automatically determine whether you need
9 | to provide a CLA and decorate the PR appropriately (e.g., label, comment). Simply follow the
10 | instructions provided by the bot. You will only need to do this once across all repositories using our CLA.
11 | 
12 | This project has adopted the [Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct/).
13 | For more information see the [Code of Conduct FAQ](https://opensource.microsoft.com/codeofconduct/faq/)
14 | or contact [opencode@microsoft.com](mailto:opencode@microsoft.com) with any additional questions or comments.
15 | 
--------------------------------------------------------------------------------
/LICENSE.txt:
--------------------------------------------------------------------------------
1 | Generative Neural Visual Artist (GeNeVA) - Datasets - Generation Code
2 | Copyright (c) Microsoft Corporation. All rights reserved.
3 | 
4 | MIT License
5 | 
6 | Permission is hereby granted, free of charge, to any person obtaining a copy
7 | of this software and associated documentation files (the "Software"), to deal
8 | in the Software without restriction, including without limitation the rights
9 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
10 | copies of the Software, and to permit persons to whom the Software is
11 | furnished to do so, subject to the following conditions:
12 | 
13 | The above copyright notice and this permission notice shall be included in all
14 | copies or substantial portions of the Software.
15 | 
16 | THE SOFTWARE IS PROVIDED *AS IS*, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
17 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
18 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
19 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
20 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
21 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
22 | SOFTWARE.
23 | 
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Generative Neural Visual Artist (GeNeVA) - Datasets - Generation Code
2 | 
3 | Scripts to generate the `CoDraw` and `i-CLEVR` datasets used for the `GeNeVA` task proposed in [Tell, Draw, and Repeat: Generating and modifying images based on continual linguistic instruction](https://arxiv.org/abs/1811.09845).
4 | 
5 | ## Setup ##
6 | 
7 | ### 1. Install Miniconda
8 | 
9 |     wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh
10 |     bash Miniconda3-latest-Linux-x86_64.sh
11 |     rm Miniconda3-latest-Linux-x86_64.sh
12 | 
13 | You will now have to restart your shell for the path changes to take effect.
14 | 
15 | ### 2. Clone the repository
16 | 
17 |     git clone git@github.com:Maluuba/GeNeVA_datasets.git  # use https://github.com/Maluuba/GeNeVA_datasets.git for HTTPS
18 |     cd GeNeVA_datasets
19 | 
20 | ### 3. Create a conda environment for this repository
21 | 
22 |     conda env create -f environment.yml
23 | 
24 | ### 4. Activate the environment
25 | 
26 |     source activate geneva
27 | 
28 | ### 5. Download external data files
29 | 
30 |     ./scripts/download_data.sh
31 | 
32 | ### 6. Download GeNeVA data files to the repository
33 | 
34 | Download the [GeNeVA zip file](https://www.microsoft.com/en-us/research/project/generative-neural-visual-artist-geneva/) and extract it as specified below:
35 | - `GeNeVA-v1.zip`
36 | ```
37 | unzip GeNeVA-v1.zip
38 | ```
39 | Please review the LICENSE for the GeNeVA zip file in the extracted `GeNeVA-v1` folder.
40 | - `data.rar`: pre-generated data files for both datasets
41 | ```
42 | rar x GeNeVA-v1/data.rar ./  # `sudo apt-get install rar` if rar is not installed
43 | ```
44 | - `CoDraw_images.rar`: CoDraw images for each scene's JSON
45 | ```
46 | rar x GeNeVA-v1/CoDraw_images.rar raw-data/CoDraw
47 | ```
48 | - `i-CLEVR.rar`: i-CLEVR scene images, scene JSONs, and the background image
49 | ```
50 | rar x GeNeVA-v1/i-CLEVR.rar raw-data/
51 | ```
52 | 
53 | ### 7. Generate dataset HDF5 files
54 | 
55 | - Vocabulary
56 | ```
57 | python scripts/joint_codraw_iclevr/generate_glove_file.py
58 | ```
59 | - CoDraw
60 | ```
61 | python scripts/codraw_dataset_generation/codraw_add_data_to_raw.py
62 | python scripts/codraw_dataset_generation/codraw_raw_to_hdf5.py  # dataset for GeNeVA-GAN
63 | python scripts/codraw_dataset_generation/codraw_object_detection.py  # dataset for Object Detector & Localizer
64 | ```
65 | - i-CLEVR
66 | ```
67 | python scripts/iclevr_dataset_generation/iclevr_add_data_to_raw.py
68 | python scripts/iclevr_dataset_generation/iclevr_raw_to_hdf5.py  # dataset for GeNeVA-GAN
69 | python scripts/iclevr_dataset_generation/iclevr_object_detection.py  # dataset for Object Detector & Localizer
70 | ```
71 | 
72 | ### 8. (Optional) Delete the downloaded data
73 | 
74 |     rm -rf raw-data/
75 |     rm -rf GeNeVA-v1/
76 |     rm GeNeVA-v1.zip
77 | 
78 | ## Reference ##
79 | If you use this code or the GeNeVA datasets as part of any published research, please cite the following paper:
80 | 
81 | Alaaeldin El-Nouby, Shikhar Sharma, Hannes Schulz, Devon Hjelm, Layla El Asri, Samira Ebrahimi Kahou, Yoshua Bengio, and Graham W. Taylor.
82 | **"Tell, Draw, and Repeat: Generating and modifying images based on continual linguistic instruction"**
83 | *arXiv preprint arXiv:1811.09845* (2018).
84 | 85 | ```bibtex 86 | @article{elnouby2018tell_draw_repeat, 87 | author = {El{-}Nouby, Alaaeldin and Sharma, Shikhar and Schulz, Hannes and Hjelm, Devon and El Asri, Layla and Ebrahimi Kahou, Samira and Bengio, Yoshua and Taylor, Graham W.}, 88 | title = {Tell, Draw, and Repeat: Generating and modifying images based on continual linguistic instruction}, 89 | journal = {CoRR}, 90 | volume = {abs/1811.09845}, 91 | year = {2018}, 92 | url = {http://arxiv.org/abs/1811.09845}, 93 | archivePrefix = {arXiv}, 94 | eprint = {1811.09845} 95 | } 96 | ``` 97 | 98 | ## Microsoft Open Source Code of Conduct ## 99 | This project has adopted the [Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct/). 100 | For more information see the [Code of Conduct FAQ](https://opensource.microsoft.com/codeofconduct/faq/) 101 | or contact [opencode@microsoft.com](mailto:opencode@microsoft.com) with any additional questions or comments. 102 | 103 | ## License ## 104 | See [LICENSE.txt](LICENSE.txt). 105 | -------------------------------------------------------------------------------- /config.yml: -------------------------------------------------------------------------------- 1 | codraw_background: 'raw-data/CoDraw/background.png' 2 | codraw_images: 'raw-data/CoDraw/images/' 3 | codraw_objects_source: 'raw-data/CoDraw/10K_instance_occurence_58_names.txt' 4 | codraw_scenes: 'raw-data/CoDraw/output/' 5 | 6 | codraw_extracted_coordinates: 'data/CoDraw/extracted_coords.txt' 7 | codraw_hdf5_folder: 'data/CoDraw/' 8 | codraw_objects: 'data/CoDraw/objects.txt' 9 | codraw_png_to_object: 'data/CoDraw/png_to_object.txt' 10 | codraw_spell_check: 'data/CoDraw/bing_mappings.pkl' 11 | codraw_vocab: 'data/CoDraw/vocab.txt' 12 | 13 | iclevr_background: 'raw-data/i-CLEVR/background.png' 14 | iclevr_data_source: 'raw-data/i-CLEVR/' 15 | 16 | iclevr_hdf5_folder: 'data/iCLEVR/' 17 | iclevr_objects: 'data/iCLEVR/objects.txt' 18 | iclevr_vocab: 'data/iCLEVR/vocab.txt' 19 | 20 | glove_source: 'raw-data/GloVe/glove.840B.300d.txt' 21 | 22 | glove_output: 'data/CoDraw_iCLEVR/glove_codraw_iclevr.txt' 23 | -------------------------------------------------------------------------------- /environment.yml: -------------------------------------------------------------------------------- 1 | name: geneva 2 | channels: 3 | - defaults 4 | dependencies: 5 | - h5py=2.8.0=py36h989c5e5_3 6 | - nltk=3.4=py36_1 7 | - numpy=1.16.2=py36h7e9f1db_0 8 | - opencv=3.4.2=py36h6fd60c2_1 9 | - python=3.6.8=h0371630_0 10 | - pyyaml=5.1=py36h7b6447c_0 11 | - tqdm=4.31.1=py36_1 12 | -------------------------------------------------------------------------------- /scripts/codraw_dataset_generation/codraw_add_data_to_raw.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Microsoft Corporation. 2 | # Licensed under the MIT license. 
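# Format notes, inferred from the parsing logic in this script rather than
# from an official CoDraw spec: each turn's `abs_d` string is comma-separated,
# with element 0 holding the object count followed by 8 fields per object;
# field 0 is the clip-art PNG filename and fields 4-6 are the x, y, z
# coordinates, while the remaining fields (presumably clip-art type indices
# and a flip flag) are unused here. Each line written to the
# `codraw_extracted_coordinates` file then looks like
#   Scene<scene_id>_<turn_id>\t<present,x,y,z> <present,x,y,z> ...
# with one comma-separated 4-tuple per known object name; objects absent from
# the turn keep present=0 and coordinates -1.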
3 | """ 4 | Script to extract object names and coordinate information from the CoDraw data 5 | """ 6 | from glob import glob 7 | import json 8 | 9 | import numpy as np 10 | from tqdm import tqdm 11 | import yaml 12 | 13 | 14 | with open('config.yml', 'r') as f: 15 | keys = yaml.load(f, Loader=yaml.FullLoader) 16 | 17 | 18 | def extract_object_names(): 19 | # save names of all the objects in the dataset 20 | with open(keys['codraw_objects_source'], 'r') as f: 21 | names = f.readlines() 22 | names = [name.strip().split('\t')[0].lower() for name in names] 23 | with open(keys['codraw_objects'], 'w') as f: 24 | for name in names: 25 | f.write(name+'\n') 26 | 27 | 28 | def extract_objects(): 29 | json_path = keys['codraw_scenes'] 30 | extracted_objects_file = open(keys['codraw_extracted_coordinates'], 'w') 31 | 32 | # load object names 33 | with open(keys['codraw_objects'], 'r') as f: 34 | OBJECTS = [l.split()[0] for l in f] 35 | # mapping from filenames of individual object images to the object names 36 | PNG_MAPPING = {} 37 | with open(keys['codraw_png_to_object'], 'r') as f: 38 | for l in f: 39 | splits = l.split('\t') 40 | PNG_MAPPING[splits[0]] = splits[1].strip() 41 | 42 | # loop through scene jsons and extract object information 43 | for scene_json in tqdm(sorted(glob('{}/*.json'.format(json_path)))): 44 | with open(scene_json, 'r') as f: 45 | scene = json.load(f) 46 | scene_id = scene['image_id'] 47 | 48 | turn_id = -1 49 | for dialog in scene['dialog']: 50 | if len(dialog['abs_d']) < 2: 51 | continue 52 | 53 | turn_id += 1 54 | bow = np.zeros((len(OBJECTS)), dtype=int) 55 | x_coords = np.ones((len(OBJECTS)), dtype=int) * -1 56 | y_coords = np.ones((len(OBJECTS)), dtype=int) * -1 57 | z_coords = np.ones((len(OBJECTS)), dtype=int) * -1 58 | 59 | dialog = dialog['abs_d'].split(',') 60 | 61 | idx = 1 62 | for _ in range(int(dialog[0])): 63 | png_filename = dialog[idx][:-4] 64 | x = float(dialog[idx + 4]) 65 | y = float(dialog[idx + 5]) 66 | z = float(dialog[idx + 6]) 67 | 68 | if abs(x) > 1000 or abs(y) > 1000 or abs(z) > 1000: 69 | idx = idx + 8 70 | continue 71 | 72 | index = OBJECTS.index(PNG_MAPPING[png_filename]) 73 | bow[index] = 1 74 | x_coords[index] = x 75 | y_coords[index] = y 76 | z_coords[index] = z 77 | 78 | idx = idx + 8 79 | 80 | meta_data = [] 81 | for e in range(len(OBJECTS)): 82 | object_meta = str.join(',', 83 | [str(bow[e]), 84 | str(x_coords[e]), 85 | str(y_coords[e]), 86 | str(z_coords[e])]) 87 | meta_data.append(object_meta) 88 | 89 | extracted_objects_file.write('Scene{}_{}\t{}\n'.format(scene_id, 90 | turn_id, 91 | str.join(' ', meta_data))) 92 | 93 | 94 | if __name__ == '__main__': 95 | extract_object_names() 96 | extract_objects() 97 | -------------------------------------------------------------------------------- /scripts/codraw_dataset_generation/codraw_object_detection.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Microsoft Corporation. 2 | # Licensed under the MIT license. 
3 | """ 4 | Script to parse and read CoDraw data and save it in HDF5 format for Object Detector & Localizer 5 | """ 6 | from glob import glob 7 | import json 8 | import os 9 | 10 | import cv2 11 | import h5py 12 | import numpy as np 13 | from tqdm import tqdm 14 | import yaml 15 | 16 | 17 | with open('config.yml', 'r') as f: 18 | keys = yaml.load(f, Loader=yaml.FullLoader) 19 | 20 | 21 | def create_object_detection_dataset(): 22 | # load required keys 23 | scenes_path = keys['codraw_scenes'] 24 | images_path = keys['codraw_images'] 25 | background_img = cv2.imread(keys['codraw_background']) 26 | h5_path = keys['codraw_hdf5_folder'] 27 | codraw_extracted_coords = keys['codraw_extracted_coordinates'] 28 | 29 | # set height, width, scaling parameters 30 | h, w, _ = background_img.shape 31 | scale_x = 128. / w 32 | scale_y = 128. / h 33 | scaling_ratio = np.array([scale_x, scale_y, 1]) 34 | 35 | # create hdf5 files for train, val, test 36 | h5_train = h5py.File(os.path.join(h5_path, 'codraw_obj_train.h5'), 'w') 37 | h5_val = h5py.File(os.path.join(h5_path, 'codraw_obj_val.h5'), 'w') 38 | h5_test = h5py.File(os.path.join(h5_path, 'codraw_obj_test.h5'), 'w') 39 | 40 | # set objects and bow (bag of words) dicts for each image 41 | bow_dim = 0 42 | GT_BOW = {} 43 | GT_OBJECTS = {} 44 | with open(codraw_extracted_coords, 'r') as f: 45 | for line in f: 46 | splits = line.split('\t') 47 | image = splits[0] 48 | split_coords = lambda x: [int(c) for c in x.split(',')] 49 | bow = np.array([split_coords(b) for b in splits[1].split()]) 50 | bow_dim = len(bow) 51 | GT_BOW[image] = bow[:, 0] 52 | scaling = scaling_ratio * np.expand_dims(bow[:, 0], axis=1).repeat(3, 1) 53 | GT_OBJECTS[image] = (bow[:, 1:] * scaling).astype(int) 54 | 55 | # start saving data into hdf5; loop over all scenes 56 | c_train = -1 57 | c_val = -1 58 | c_test = -1 59 | for scene_file in tqdm(sorted(glob('{}/*json'.format(scenes_path)))): 60 | # identify if scene belongs to train / val / test 61 | split = scene_file.split('/')[-1].split('_')[0] 62 | 63 | with open(scene_file, 'r') as f: 64 | scene = json.load(f) 65 | scene_id = scene['image_id'] 66 | 67 | # loop over turns in a single scene 68 | idx = 0 69 | for i in range(len(scene['dialog'])): 70 | turn = scene['dialog'][i] 71 | 72 | bow = GT_BOW['Scene{}_{}'.format(scene_id, idx)] 73 | coords = GT_OBJECTS['Scene{}_{}'.format(scene_id, idx)] 74 | 75 | # if there is no image for current turn: merge with next turn 76 | if turn['abs_d'] == '': 77 | continue 78 | 79 | image = cv2.imread(os.path.join(images_path, 'Scene{}_{}.png'.format(scene_id, idx))) 80 | image = cv2.resize(image, (128, 128)) 81 | idx += 1 82 | 83 | if split == 'train': 84 | c_train += 1 85 | ex = h5_train.create_group(str(c_train)) 86 | elif split == 'val': 87 | c_val += 1 88 | ex = h5_val.create_group(str(c_val)) 89 | elif split == 'test': 90 | c_test += 1 91 | ex = h5_test.create_group(str(c_test)) 92 | 93 | ex.create_dataset('image', data=image) 94 | ex.create_dataset('objects', data=np.array(bow)) 95 | ex.create_dataset('coords', data=np.array(coords)) 96 | ex.create_dataset('scene_id', data=scene_id) 97 | 98 | 99 | if __name__ == '__main__': 100 | create_object_detection_dataset() 101 | -------------------------------------------------------------------------------- /scripts/codraw_dataset_generation/codraw_raw_to_hdf5.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Microsoft Corporation. 2 | # Licensed under the MIT license. 
3 | """ 4 | Script to parse and read raw CoDraw data and save it in HDF5 format for GeNeVA-GAN 5 | """ 6 | from glob import glob 7 | import json 8 | import os 9 | import pickle 10 | import string 11 | 12 | import cv2 13 | import h5py 14 | import nltk 15 | import numpy as np 16 | from tqdm import tqdm 17 | import yaml 18 | 19 | 20 | with open('config.yml', 'r') as f: 21 | keys = yaml.load(f, Loader=yaml.FullLoader) 22 | 23 | 24 | def replace_at_offset(msg, tok, offset, tok_replace): 25 | before = msg[:offset] 26 | after = msg[offset:] 27 | after = after.replace(tok, tok_replace, 1) 28 | return before + after 29 | 30 | 31 | def create_h5(): 32 | # load required keys 33 | scenes_path = keys['codraw_scenes'] 34 | images_path = keys['codraw_images'] 35 | background_img = cv2.imread(keys['codraw_background']) 36 | h5_path = keys['codraw_hdf5_folder'] 37 | spell_check = keys['codraw_spell_check'] 38 | codraw_extracted_coords = keys['codraw_extracted_coordinates'] 39 | 40 | # set height, width, scaling parameters 41 | h, w, _ = background_img.shape 42 | scale_x = 128. / w 43 | scale_y = 128. / h 44 | scaling_ratio = np.array([scale_x, scale_y, 1]) 45 | background_img = cv2.resize(background_img, (128, 128)) 46 | 47 | # load spelling corrections - obtained via Bing Spell Check API 48 | with open(spell_check, 'rb') as f: 49 | spell_check = pickle.load(f) 50 | 51 | # create hdf5 files for train, val, test 52 | h5_train = h5py.File(os.path.join(h5_path, 'codraw_train.h5'), 'w') 53 | h5_val = h5py.File(os.path.join(h5_path, 'codraw_val.h5'), 'w') 54 | h5_test = h5py.File(os.path.join(h5_path, 'codraw_test.h5'), 'w') 55 | h5_train.create_dataset('background', data=background_img) 56 | h5_val.create_dataset('background', data=background_img) 57 | h5_test.create_dataset('background', data=background_img) 58 | 59 | # set objects and bow (bag of words) dicts for each image 60 | bow_dim = 0 61 | GT_BOW = {} 62 | GT_OBJECTS = {} 63 | with open(codraw_extracted_coords, 'r') as f: 64 | for line in f: 65 | splits = line.split('\t') 66 | image = splits[0] 67 | split_coords = lambda x: [int(c) for c in x.split(',')] 68 | bow = np.array([split_coords(b) for b in splits[1].split()]) 69 | bow_dim = len(bow) 70 | GT_BOW[image] = bow[:, 0] 71 | scaling = scaling_ratio * np.expand_dims(bow[:, 0], axis=1).repeat(3, 1) 72 | GT_OBJECTS[image] = (bow[:, 1:] * scaling).astype(int) 73 | 74 | # mark purely chitchat turns to be removed 75 | chitchat = ['hi', 'done', 'ok', 'alright', 'okay', 'thanks', 'bye', 'hello'] 76 | 77 | # start saving data into hdf5; loop over all scenes 78 | c_train = 0 79 | c_val = 0 80 | c_test = 0 81 | for scene_file in tqdm(sorted(glob('{}/*json'.format(scenes_path)))): 82 | # identify if scene belongs to train / val / test 83 | split = scene_file.split('/')[-1].split('_')[0] 84 | 85 | images = [] 86 | utterences = [] 87 | objects = [] 88 | coordinates = [] 89 | 90 | with open(scene_file, 'r') as f: 91 | scene = json.load(f) 92 | scene_id = scene['image_id'] 93 | 94 | # loop over turns in a single scene 95 | idx = 0 96 | prev_bow = np.zeros((bow_dim)) 97 | description = [] 98 | for i in range(len(scene['dialog'])): 99 | bow = GT_BOW['Scene{}_{}'.format(scene_id, idx)] 100 | # new objects added in this turn 101 | hamming_distance = np.sum(bow - prev_bow) 102 | turn = scene['dialog'][i] 103 | # lowercase all messages 104 | teller = str.lower(turn['msg_t']) 105 | drawer = str.lower(turn['msg_d']) 106 | # clear chitchat turns 107 | if teller in chitchat: 108 | teller = '' 109 | if drawer in chitchat: 110 
|                 drawer = ''
111 | 
112 |             # replace with spelling suggestions returned by Bing Spell Check API
113 |             if teller in spell_check and len(spell_check[teller]['flaggedTokens']) != 0:
114 |                 for flagged_token in spell_check[teller]['flaggedTokens']:
115 |                     tok = flagged_token['token']
116 |                     tok_offset = flagged_token['offset']
117 |                     assert len(flagged_token['suggestions']) == 1
118 |                     tok_replace = flagged_token['suggestions'][0]['suggestion']
119 |                     teller = replace_at_offset(teller, tok, tok_offset, tok_replace)
120 |             if drawer in spell_check and len(spell_check[drawer]['flaggedTokens']) != 0:
121 |                 for flagged_token in spell_check[drawer]['flaggedTokens']:
122 |                     tok = flagged_token['token']
123 |                     tok_offset = flagged_token['offset']
124 |                     assert len(flagged_token['suggestions']) == 1
125 |                     tok_replace = flagged_token['suggestions'][0]['suggestion']
126 |                     drawer = replace_at_offset(drawer, tok, tok_offset, tok_replace)
127 | 
128 |             # add delimiting tokens: <teller>, <drawer>
129 |             if teller != '':
130 |                 description += ['<teller>'] + nltk.word_tokenize(teller)
131 |             if drawer != '':
132 |                 description += ['<drawer>'] + nltk.word_tokenize(drawer)
133 | 
134 |             description = [w for w in description if w not in chitchat]
135 |             description = [w for w in description if w not in string.punctuation]
136 | 
137 |             bow = GT_BOW['Scene{}_{}'.format(scene_id, idx)]
138 |             coords = GT_OBJECTS['Scene{}_{}'.format(scene_id, idx)]
139 | 
140 |             # if there is no image for current turn: merge with next turn
141 |             if turn['abs_d'] == '':
142 |                 continue
143 | 
144 |             # if no new object is added in image for current turn: merge with next turn
145 |             if hamming_distance < 1:
146 |                 prev_bow = bow
147 |                 idx += 1
148 |                 continue
149 | 
150 |             # queue image, instruction, objects bow, object coordinates for saving
151 |             if len(description) > 0:
152 |                 image = cv2.imread(os.path.join(images_path, 'Scene{}_{}.png'.format(scene_id, idx)))
153 |                 image = cv2.resize(image, (128, 128))
154 | 
155 |                 images.append(image)
156 |                 utterences.append(str.join(' ', description))
157 |                 objects.append(bow)
158 |                 coordinates.append(coords)
159 | 
160 |                 description = []
161 |             idx += 1
162 |             prev_bow = bow
163 | 
164 |         # add current scene's data to hdf5
165 |         if len(images) > 0:
166 |             if split == 'train':
167 |                 scene = h5_train.create_group(str(c_train))
168 |                 c_train += 1
169 |             elif split == 'val':
170 |                 scene = h5_val.create_group(str(c_val))
171 |                 c_val += 1
172 |             elif split == 'test':
173 |                 scene = h5_test.create_group(str(c_test))
174 |                 c_test += 1
175 | 
176 |             scene.create_dataset('images', data=images)
177 |             dt = h5py.special_dtype(vlen=str)
178 |             scene.create_dataset('utterences', data=np.string_(utterences), dtype=dt)  # spelling kept to match the pre-generated files
179 |             scene.create_dataset('objects', data=np.array(objects))
180 |             scene.create_dataset('coords', data=np.array(coordinates))
181 |             scene.create_dataset('scene_id', data=scene_id)
182 |         else:
183 |             print(scene_id)
184 | 
185 | 
186 | if __name__ == '__main__':
187 |     create_h5()
188 | 
--------------------------------------------------------------------------------
/scripts/download_data.sh:
--------------------------------------------------------------------------------
1 | #!/bin/bash
2 | 
3 | set -e
4 | 
5 | function download () {
6 |     URL=$1
7 |     TGTDIR=.
8 |     if [ -n "$2" ]; then
9 |         TGTDIR=$2
10 |         mkdir -p $TGTDIR
11 |     fi
12 |     echo "Downloading ${URL} to ${TGTDIR}"
13 |     wget $URL -P $TGTDIR
14 | }
15 | 
16 | # GloVe data
17 | if [ ! 
-f raw-data/GloVe/glove.840B.300d.txt ] 18 | then 19 | download http://nlp.stanford.edu/data/glove.840B.300d.zip 20 | mkdir --parents raw-data/GloVe 21 | unzip glove.840B.300d.zip glove.840B.300d.txt -d raw-data/GloVe 22 | rm -f glove.840B.300d.zip 23 | fi 24 | 25 | # get CoDraw GitHub repository 26 | if [ ! -d raw-data/CoDraw/asset ] 27 | then 28 | git clone https://github.com/facebookresearch/CoDraw.git raw-data/CoDraw 29 | fi 30 | 31 | # get CoDraw individual json files 32 | if [ ! -d raw-data/CoDraw/output ] 33 | then 34 | cd raw-data/CoDraw 35 | if [ ! -f dataset/CoDraw_1_0.json ] 36 | then 37 | mkdir --parents dataset 38 | wget -O dataset/CoDraw_1_0.json https://github.com/facebookresearch/CoDraw/releases/download/v1.0/CoDraw_1_0.json 39 | fi 40 | python script/preprocess.py dataset/CoDraw_1_0.json 41 | rm dataset/CoDraw_1_0.json 42 | cd ../../ 43 | fi 44 | 45 | # get CoDraw background image and object names 46 | if [ ! -f raw-data/CoDraw/background.png ] 47 | then 48 | download https://vision.ece.vt.edu/clipart/dataset/AbstractScenes_v1.1.zip 49 | unzip -j AbstractScenes_v1.1.zip AbstractScenes_v1.1/Pngs/background.png -d raw-data/CoDraw 50 | unzip -j AbstractScenes_v1.1.zip AbstractScenes_v1.1/VisualFeatures/10K_instance_occurence_58_names.txt -d raw-data/CoDraw 51 | rm AbstractScenes_v1.1.zip 52 | fi 53 | -------------------------------------------------------------------------------- /scripts/iclevr_dataset_generation/iclevr_add_data_to_raw.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Microsoft Corporation. 2 | # Licensed under the MIT license. 3 | """ 4 | Script to create a list of all objects in the i-CLEVR data 5 | """ 6 | import itertools 7 | import yaml 8 | 9 | 10 | with open('config.yml', 'r') as f: 11 | keys = yaml.load(f, Loader=yaml.FullLoader) 12 | 13 | 14 | COLORS = ['gray', 'red', 'blue', 'green', 'brown', 'purple', 'cyan', 'yellow'] 15 | SHAPES = ['cube', 'sphere', 'cylinder'] 16 | 17 | 18 | def create_vocab(): 19 | obj_list = list(itertools.product(SHAPES, COLORS)) 20 | obj_list = [' '.join(x) for x in obj_list] 21 | 22 | with open(keys['iclevr_objects'], 'w') as f: 23 | for item in obj_list: 24 | f.write("%s\n" % item) 25 | 26 | 27 | if __name__ == '__main__': 28 | create_vocab() 29 | -------------------------------------------------------------------------------- /scripts/iclevr_dataset_generation/iclevr_object_detection.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Microsoft Corporation. 2 | # Licensed under the MIT license. 
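# Resulting HDF5 layout, as written by create_h5() below: clevr_obj_*.h5
# hold one numbered group per instruction turn with 'image' (128x128x3),
# 'objects' (a cumulative 24-dim indicator over the 3 shapes x 8 colors
# listed by iclevr_add_data_to_raw.py), 'coords' (24x3 pixel coordinates
# rescaled from the 320x240 renders to 128x128) and 'scene_id'.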
3 | """ 4 | Script to parse and read i-CLEVR data and save it in HDF5 format for Object Detector & Localizer 5 | """ 6 | from glob import glob 7 | import json 8 | import os 9 | 10 | import cv2 11 | import h5py 12 | import numpy as np 13 | from tqdm import tqdm 14 | import yaml 15 | 16 | 17 | with open('config.yml', 'r') as f: 18 | keys = yaml.load(f, Loader=yaml.FullLoader) 19 | 20 | 21 | def create_h5(): 22 | # load required keys 23 | data_path = keys['iclevr_data_source'] 24 | output_path = keys['iclevr_hdf5_folder'] 25 | OBJECTS = keys['iclevr_objects'] 26 | with open(OBJECTS, 'r') as f: 27 | OBJECTS = f.readlines() 28 | OBJECTS = [tuple(x.strip().split()) for x in OBJECTS] 29 | background_path = keys['iclevr_background'] 30 | 31 | # create hdf5 files for train, val, test 32 | train_h5 = h5py.File(os.path.join(output_path, 'clevr_obj_train.h5'), 'w') 33 | val_h5 = h5py.File(os.path.join(output_path, 'clevr_obj_val.h5'), 'w') 34 | test_h5 = h5py.File(os.path.join(output_path, 'clevr_obj_test.h5'), 'w') 35 | 36 | json_path = os.path.join(data_path, 'scenes') 37 | images_path = os.path.join(data_path, 'images') 38 | 39 | background_image = cv2.imread(background_path) 40 | 41 | entites = json.dumps(['{} {}'.format(e[0], e[1]) for e in OBJECTS]) 42 | 43 | # start saving data into hdf5; loop over all scenes 44 | c_train = -1 45 | c_val = -1 46 | c_test = -1 47 | for scene in tqdm(glob(json_path + '/*.json')): 48 | filename = os.path.basename(scene) 49 | with open(scene, 'r') as f: 50 | scene = json.load(f) 51 | 52 | # identify if scene belongs to train / val / test 53 | split = filename.split('_')[1] 54 | scene_id = filename.split('_')[2][:-5] 55 | 56 | # add images 57 | images_files = sorted(glob(os.path.join(images_path, 'CLEVR_{}_{}_*'.format(split, scene_id)))) 58 | images = [] 59 | for t, image_file in enumerate(images_files): 60 | image = cv2.imread(image_file) 61 | image = cv2.resize(image, (128, 128)) 62 | images.append(image) 63 | 64 | # add objects and object coordinates 65 | agg_object = np.zeros(24) 66 | objects = np.zeros((5, 24)) 67 | agg_object_coords = np.zeros((24, 3)) 68 | object_coords = np.zeros((5, 24, 3)) 69 | for t, obj in enumerate(scene['objects']): 70 | color = obj['color'] 71 | shape = obj['shape'] 72 | index = OBJECTS.index((shape, color)) 73 | agg_object[index] = 1 74 | objects[t] = agg_object 75 | agg_object_coords[index] = [obj['pixel_coords'][0]/320.*128, obj['pixel_coords'][1]/240.*128, obj['pixel_coords'][2]] 76 | object_coords[t] = agg_object_coords 77 | 78 | for t, obj in enumerate(scene['objects']): 79 | if split == 'train': 80 | c_train += 1 81 | sample = train_h5.create_group(str(c_train)) 82 | elif split == 'val': 83 | c_val += 1 84 | sample = val_h5.create_group(str(c_val)) 85 | else: 86 | c_test += 1 87 | sample = test_h5.create_group(str(c_test)) 88 | 89 | sample.create_dataset('scene_id', data=scene_id) 90 | sample.create_dataset('image', data=np.array(images)[t]) 91 | sample.create_dataset('objects', data=objects[t]) 92 | sample.create_dataset('coords', data=np.array(object_coords)[t]) 93 | 94 | 95 | if __name__ == '__main__': 96 | create_h5() 97 | -------------------------------------------------------------------------------- /scripts/iclevr_dataset_generation/iclevr_raw_to_hdf5.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Microsoft Corporation. 2 | # Licensed under the MIT license. 
3 | """ 4 | Script to parse and read raw i-CLEVR data and save it in HDF5 format for GeNeVA-GAN 5 | """ 6 | from glob import glob 7 | import json 8 | import os 9 | 10 | import cv2 11 | import h5py 12 | import numpy as np 13 | from tqdm import tqdm 14 | import yaml 15 | 16 | 17 | with open('config.yml', 'r') as f: 18 | keys = yaml.load(f, Loader=yaml.FullLoader) 19 | 20 | 21 | def create_h5(): 22 | # load required keys 23 | data_path = keys['iclevr_data_source'] 24 | output_path = keys['iclevr_hdf5_folder'] 25 | OBJECTS = keys['iclevr_objects'] 26 | with open(OBJECTS, 'r') as f: 27 | OBJECTS = f.readlines() 28 | OBJECTS = [tuple(x.strip().split()) for x in OBJECTS] 29 | background_path = keys['iclevr_background'] 30 | 31 | # create hdf5 files for train, val, test 32 | train_h5 = h5py.File(os.path.join(output_path, 'clevr_train.h5'), 'w') 33 | val_h5 = h5py.File(os.path.join(output_path, 'clevr_val.h5'), 'w') 34 | test_h5 = h5py.File(os.path.join(output_path, 'clevr_test.h5'), 'w') 35 | 36 | json_path = os.path.join(data_path, 'scenes/') 37 | images_path = os.path.join(data_path, 'images/') 38 | text_path = os.path.join(data_path, 'text/') 39 | 40 | # add background image to hdf5 41 | background_image = cv2.imread(background_path) 42 | train_h5.create_dataset('background', data=background_image) 43 | val_h5.create_dataset('background', data=background_image) 44 | test_h5.create_dataset('background', data=background_image) 45 | 46 | # add object properties to hdf5 47 | entites = json.dumps(['{} {}'.format(e[0], e[1]) for e in OBJECTS]) 48 | train_h5.create_dataset('entities', data=entites) 49 | val_h5.create_dataset('entities', data=entites) 50 | test_h5.create_dataset('entities', data=entites) 51 | 52 | # start saving data into hdf5; loop over all scenes 53 | for scene in tqdm(glob(json_path + '/*.json')): 54 | filename = os.path.basename(scene) 55 | with open(scene, 'r') as f: 56 | scene = json.load(f) 57 | 58 | # identify if scene belongs to train / val / test 59 | split = filename.split('_')[1] 60 | scene_id = filename.split('_')[2][:-5] 61 | 62 | # add text 63 | text_file = os.path.join(text_path, 'CLEVR_{}_{}.txt'.format(split, scene_id)) 64 | with open(text_file, 'r') as f: 65 | text = [line.strip() for line in f] 66 | 67 | # add images 68 | images_files = sorted(glob(os.path.join(images_path, 'CLEVR_{}_{}_*'.format(split, scene_id)))) 69 | images = [] 70 | for t, image_file in enumerate(images_files): 71 | image = cv2.imread(image_file) 72 | image = cv2.resize(image, (128, 128)) 73 | images.append(image) 74 | 75 | # add objects and object coordinates 76 | agg_object = np.zeros(24) 77 | objects = np.zeros((5, 24)) 78 | agg_object_coords = np.zeros((24, 3)) 79 | object_coords = np.zeros((5, 24, 3)) 80 | for t, obj in enumerate(scene['objects']): 81 | color = obj['color'] 82 | shape = obj['shape'] 83 | index = OBJECTS.index((shape, color)) 84 | agg_object[index] = 1 85 | objects[t] = agg_object 86 | agg_object_coords[index] = [obj['pixel_coords'][0]/320.*128, obj['pixel_coords'][1]/240.*128, obj['pixel_coords'][2]] 87 | object_coords[t] = agg_object_coords 88 | 89 | if split == 'train': 90 | sample = train_h5.create_group(scene_id) 91 | elif split == 'val': 92 | sample = val_h5.create_group(scene_id) 93 | else: 94 | sample = test_h5.create_group(scene_id) 95 | 96 | sample.create_dataset('scene_id', data=scene_id) 97 | sample.create_dataset('images', data=np.array(images)) 98 | sample.create_dataset('text', data=json.dumps(text)) 99 | sample.create_dataset('objects', data=objects) 100 | 
        sample.create_dataset('coords', data=np.array(object_coords))
101 | 
102 | 
103 | if __name__ == '__main__':
104 |     create_h5()
105 | 
--------------------------------------------------------------------------------
/scripts/joint_codraw_iclevr/generate_glove_file.py:
--------------------------------------------------------------------------------
1 | # Copyright (c) Microsoft Corporation.
2 | # Licensed under the MIT license.
3 | """
4 | Script to generate the GloVe embedding file for the CoDraw and i-CLEVR dataset vocabularies
5 | """
6 | from tqdm import tqdm
7 | import yaml
8 | 
9 | 
10 | with open('config.yml', 'r') as f:
11 |     keys = yaml.load(f, Loader=yaml.FullLoader)
12 | 
13 | 
14 | def generate_glove_file():
15 |     codraw_vocab = keys['codraw_vocab']
16 |     clevr_vocab = keys['iclevr_vocab']
17 |     output_file = keys['glove_output']
18 |     original_glove = keys['glove_source']
19 | 
20 |     # read CoDraw vocabulary
21 |     with open(codraw_vocab, 'r') as f:
22 |         codraw_vocab = f.readlines()
23 |     codraw_vocab = [x.strip().rsplit(' ', 1)[0] for x in codraw_vocab]
24 | 
25 |     # read i-CLEVR vocabulary
26 |     with open(clevr_vocab, 'r') as f:
27 |         clevr_vocab = f.readlines()
28 |     clevr_vocab = [x.strip().rsplit(' ', 1)[0] for x in clevr_vocab]
29 | 
30 |     # combine vocabularies and add special tokens for CoDraw Drawer and Teller
31 |     codraw_vocab += clevr_vocab + ['<teller>', '<drawer>']
32 |     codraw_vocab = list(set(codraw_vocab))
33 |     codraw_vocab.sort()
34 | 
35 |     print('Loading GloVe file. This might take a few minutes.')
36 |     with open(original_glove, 'r') as f:
37 |         original_glove = f.readlines()
38 |     tok_glove_pairs = [x.strip().split(' ', 1) for x in original_glove]
39 | 
40 |     # extract GloVe vectors for vocabulary tokens
41 |     for token, glove_emb in tqdm(tok_glove_pairs):
42 |         if token == 'unk':
43 |             unk_embedding = glove_emb
44 |         try:
45 |             token_idx = codraw_vocab.index(token)
46 |         except ValueError:
47 |             continue
48 |         else:
49 |             codraw_vocab[token_idx] = ' '.join([token, glove_emb])
50 | 
51 |     # set Drawer and Teller token vectors; assign 'unk' GloVe embedding to unknown words
52 |     unk_count = 0
53 |     for itidx, item in enumerate(codraw_vocab):
54 |         if len(item.split(' ')) == 1:
55 |             if item == '<teller>':
56 |                 codraw_vocab[itidx] = ' '.join(['<teller>', ('0.1 ' * 150 + '0.0 ' * 150)[:-1]])
57 |             elif item == '<drawer>':
58 |                 codraw_vocab[itidx] = ' '.join(['<drawer>', ('0.0 ' * 150 + '0.1 ' * 150)[:-1]])
59 |             else:
60 |                 unk_count += 1
61 |                 codraw_vocab[itidx] = ' '.join([item, unk_embedding])
62 | 
63 |     # write GloVe vector file for the CoDraw and i-CLEVR datasets combined
64 |     with open(output_file, 'w') as f:
65 |         for item in codraw_vocab:
66 |             f.write('%s\n' % item)
67 | 
68 |     print('Total words in vocab: {}\n`unk` embedding words: {}'.format(len(codraw_vocab), unk_count))
69 | 
70 | 
71 | if __name__ == '__main__':
72 |     generate_glove_file()
73 | 
--------------------------------------------------------------------------------
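
For reference, a minimal sketch of consuming the combined embedding file that `generate_glove_file.py` writes. The path and the 300-dimensional vectors follow the repository defaults (`glove_output` in `config.yml` and the `glove.840B.300d` source); treat it as an illustration rather than part of the repository:

```python
import numpy as np

# Each line of glove_codraw_iclevr.txt is '<token> <300 floats>', including
# the special <teller> and <drawer> tokens added by generate_glove_file.py.
embeddings = {}
with open('data/CoDraw_iCLEVR/glove_codraw_iclevr.txt', 'r') as f:
    for line in f:
        token, vector = line.rstrip('\n').split(' ', 1)
        embeddings[token] = np.array(vector.split(), dtype=np.float32)

assert embeddings['<teller>'].shape == (300,)
```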