├── .gitignore
├── CONTRIBUTING.md
├── LICENSE.txt
├── README.md
├── config.yml
├── environment.yml
└── scripts
    ├── codraw_dataset_generation
    │   ├── codraw_add_data_to_raw.py
    │   ├── codraw_object_detection.py
    │   └── codraw_raw_to_hdf5.py
    ├── download_data.sh
    ├── iclevr_dataset_generation
    │   ├── iclevr_add_data_to_raw.py
    │   ├── iclevr_object_detection.py
    │   └── iclevr_raw_to_hdf5.py
    └── joint_codraw_iclevr
        └── generate_glove_file.py

/.gitignore:
--------------------------------------------------------------------------------
1 | *.h5
2 | *.rar
3 | *.swp
4 | *.zip
5 | 
6 | GeNeVA-v1/
7 | 
8 | data/
9 | raw-data/
10 | 
--------------------------------------------------------------------------------
/CONTRIBUTING.md:
--------------------------------------------------------------------------------
1 | # Contributing
2 | 
3 | This project welcomes contributions and suggestions. Most contributions require you to
4 | agree to a Contributor License Agreement (CLA) declaring that you have the right to,
5 | and actually do, grant us the rights to use your contribution. For details, visit
6 | https://cla.microsoft.com.
7 | 
8 | When you submit a pull request, a CLA-bot will automatically determine whether you need
9 | to provide a CLA and decorate the PR appropriately (e.g., label, comment). Simply follow the
10 | instructions provided by the bot. You will only need to do this once across all repositories using our CLA.
11 | 
12 | This project has adopted the [Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct/).
13 | For more information see the [Code of Conduct FAQ](https://opensource.microsoft.com/codeofconduct/faq/)
14 | or contact [opencode@microsoft.com](mailto:opencode@microsoft.com) with any additional questions or comments.
15 | 
--------------------------------------------------------------------------------
/LICENSE.txt:
--------------------------------------------------------------------------------
1 | Generative Neural Visual Artist (GeNeVA) - Datasets - Generation Code
2 | Copyright (c) Microsoft Corporation. All rights reserved.
3 | 
4 | MIT License
5 | 
6 | Permission is hereby granted, free of charge, to any person obtaining a copy
7 | of this software and associated documentation files (the "Software"), to deal
8 | in the Software without restriction, including without limitation the rights
9 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
10 | copies of the Software, and to permit persons to whom the Software is
11 | furnished to do so, subject to the following conditions:
12 | 
13 | The above copyright notice and this permission notice shall be included in all
14 | copies or substantial portions of the Software.
15 | 
16 | THE SOFTWARE IS PROVIDED *AS IS*, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
17 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
18 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
19 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
20 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
21 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
22 | SOFTWARE.
23 | 
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Generative Neural Visual Artist (GeNeVA) - Datasets - Generation Code
2 | 
3 | Scripts to generate the `CoDraw` and `i-CLEVR` datasets used for the `GeNeVA` task proposed in [Tell, Draw, and Repeat: Generating and modifying images based on continual linguistic instruction](https://arxiv.org/abs/1811.09845).
4 | 
5 | ## Setup ##
6 | 
7 | ### 1. Install Miniconda
8 | 
9 |     wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh
10 |     bash Miniconda3-latest-Linux-x86_64.sh
11 |     rm Miniconda3-latest-Linux-x86_64.sh
12 | 
13 | You will now have to restart your shell for the path changes to take effect.
14 | 
15 | ### 2. Clone the repository
16 | 
17 |     git clone git@github.com:Maluuba/GeNeVA_datasets.git  # use https://github.com/Maluuba/GeNeVA_datasets.git for HTTPS
18 |     cd GeNeVA_datasets
19 | 
20 | ### 3. Create a conda environment for this repository
21 | 
22 |     conda env create -f environment.yml
23 | 
24 | ### 4. Activate the environment
25 | 
26 |     source activate geneva
27 | 
28 | ### 5. Download external data files
29 | 
30 |     ./scripts/download_data.sh
31 | 
32 | ### 6. Download GeNeVA data files to the repository
33 | 
34 | Download the [GeNeVA zip file](https://www.microsoft.com/en-us/research/project/generative-neural-visual-artist-geneva/) and extract it as specified below:
35 | - `GeNeVA-v1.zip`
36 | ```
37 | unzip GeNeVA-v1.zip
38 | ```
39 | Please review the LICENSE for the GeNeVA zip file in the extracted `GeNeVA-v1` folder.
40 | - `data.rar`: pre-generated data files for both datasets
41 | ```
42 | rar x GeNeVA-v1/data.rar ./  # `sudo apt-get install rar` if rar is not installed
43 | ```
44 | - `CoDraw_images.rar`: CoDraw images for each scene's JSON
45 | ```
46 | rar x GeNeVA-v1/CoDraw_images.rar raw-data/CoDraw
47 | ```
48 | - `i-CLEVR.rar`: i-CLEVR scene images, scene JSONs, and the background image
49 | ```
50 | rar x GeNeVA-v1/i-CLEVR.rar raw-data/
51 | ```
52 | 
53 | ### 7. Generate dataset HDF5 files
54 | 
55 | - Vocabulary
56 | ```
57 | python scripts/joint_codraw_iclevr/generate_glove_file.py
58 | ```
59 | - CoDraw
60 | ```
61 | python scripts/codraw_dataset_generation/codraw_add_data_to_raw.py
62 | python scripts/codraw_dataset_generation/codraw_raw_to_hdf5.py  # dataset for GeNeVA-GAN
63 | python scripts/codraw_dataset_generation/codraw_object_detection.py  # dataset for Object Detector & Localizer
64 | ```
65 | - i-CLEVR
66 | ```
67 | python scripts/iclevr_dataset_generation/iclevr_add_data_to_raw.py
68 | python scripts/iclevr_dataset_generation/iclevr_raw_to_hdf5.py  # dataset for GeNeVA-GAN
69 | python scripts/iclevr_dataset_generation/iclevr_object_detection.py  # dataset for Object Detector & Localizer
70 | ```
71 | 
72 | ### 8. (Optional) Delete the downloaded data
73 | 
74 |     rm -rf raw-data/
75 |     rm -rf GeNeVA-v1/
76 |     rm GeNeVA-v1.zip
77 | 
78 | ## Reference ##
79 | If you use this code or the GeNeVA datasets as part of any published research, please cite the following paper:
80 | 
81 | Alaaeldin El-Nouby, Shikhar Sharma, Hannes Schulz, Devon Hjelm, Layla El Asri, Samira Ebrahimi Kahou, Yoshua Bengio, and Graham W. Taylor.
82 | **"Tell, Draw, and Repeat: Generating and modifying images based on continual linguistic instruction"**
83 | *arXiv preprint arXiv:1811.09845* (2018).
84 | 85 | ```bibtex 86 | @article{elnouby2018tell_draw_repeat, 87 | author = {El{-}Nouby, Alaaeldin and Sharma, Shikhar and Schulz, Hannes and Hjelm, Devon and El Asri, Layla and Ebrahimi Kahou, Samira and Bengio, Yoshua and Taylor, Graham W.}, 88 | title = {Tell, Draw, and Repeat: Generating and modifying images based on continual linguistic instruction}, 89 | journal = {CoRR}, 90 | volume = {abs/1811.09845}, 91 | year = {2018}, 92 | url = {http://arxiv.org/abs/1811.09845}, 93 | archivePrefix = {arXiv}, 94 | eprint = {1811.09845} 95 | } 96 | ``` 97 | 98 | ## Microsoft Open Source Code of Conduct ## 99 | This project has adopted the [Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct/). 100 | For more information see the [Code of Conduct FAQ](https://opensource.microsoft.com/codeofconduct/faq/) 101 | or contact [opencode@microsoft.com](mailto:opencode@microsoft.com) with any additional questions or comments. 102 | 103 | ## License ## 104 | See [LICENSE.txt](LICENSE.txt). 105 | -------------------------------------------------------------------------------- /config.yml: -------------------------------------------------------------------------------- 1 | codraw_background: 'raw-data/CoDraw/background.png' 2 | codraw_images: 'raw-data/CoDraw/images/' 3 | codraw_objects_source: 'raw-data/CoDraw/10K_instance_occurence_58_names.txt' 4 | codraw_scenes: 'raw-data/CoDraw/output/' 5 | 6 | codraw_extracted_coordinates: 'data/CoDraw/extracted_coords.txt' 7 | codraw_hdf5_folder: 'data/CoDraw/' 8 | codraw_objects: 'data/CoDraw/objects.txt' 9 | codraw_png_to_object: 'data/CoDraw/png_to_object.txt' 10 | codraw_spell_check: 'data/CoDraw/bing_mappings.pkl' 11 | codraw_vocab: 'data/CoDraw/vocab.txt' 12 | 13 | iclevr_background: 'raw-data/i-CLEVR/background.png' 14 | iclevr_data_source: 'raw-data/i-CLEVR/' 15 | 16 | iclevr_hdf5_folder: 'data/iCLEVR/' 17 | iclevr_objects: 'data/iCLEVR/objects.txt' 18 | iclevr_vocab: 'data/iCLEVR/vocab.txt' 19 | 20 | glove_source: 'raw-data/GloVe/glove.840B.300d.txt' 21 | 22 | glove_output: 'data/CoDraw_iCLEVR/glove_codraw_iclevr.txt' 23 | -------------------------------------------------------------------------------- /environment.yml: -------------------------------------------------------------------------------- 1 | name: geneva 2 | channels: 3 | - defaults 4 | dependencies: 5 | - h5py=2.8.0=py36h989c5e5_3 6 | - nltk=3.4=py36_1 7 | - numpy=1.16.2=py36h7e9f1db_0 8 | - opencv=3.4.2=py36h6fd60c2_1 9 | - python=3.6.8=h0371630_0 10 | - pyyaml=5.1=py36h7b6447c_0 11 | - tqdm=4.31.1=py36_1 12 | -------------------------------------------------------------------------------- /scripts/codraw_dataset_generation/codraw_add_data_to_raw.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Microsoft Corporation. 2 | # Licensed under the MIT license. 
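# Format notes, inferred from the parsing logic in this script rather than
# from an official CoDraw spec: each turn's `abs_d` string is comma-separated,
# with element 0 holding the object count followed by 8 fields per object;
# field 0 is the clip-art PNG filename and fields 4-6 are the x, y, z
# coordinates, while the remaining fields (presumably clip-art type indices
# and a flip flag) are unused here. Each line written to the
# `codraw_extracted_coordinates` file then looks like
#   Scene<scene_id>_<turn_id>\t<present,x,y,z> <present,x,y,z> ...
# with one comma-separated 4-tuple per known object name; objects absent from
# the turn keep present=0 and coordinates -1.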
3 | """ 4 | Script to extract object names and coordinate information from the CoDraw data 5 | """ 6 | from glob import glob 7 | import json 8 | 9 | import numpy as np 10 | from tqdm import tqdm 11 | import yaml 12 | 13 | 14 | with open('config.yml', 'r') as f: 15 | keys = yaml.load(f, Loader=yaml.FullLoader) 16 | 17 | 18 | def extract_object_names(): 19 | # save names of all the objects in the dataset 20 | with open(keys['codraw_objects_source'], 'r') as f: 21 | names = f.readlines() 22 | names = [name.strip().split('\t')[0].lower() for name in names] 23 | with open(keys['codraw_objects'], 'w') as f: 24 | for name in names: 25 | f.write(name+'\n') 26 | 27 | 28 | def extract_objects(): 29 | json_path = keys['codraw_scenes'] 30 | extracted_objects_file = open(keys['codraw_extracted_coordinates'], 'w') 31 | 32 | # load object names 33 | with open(keys['codraw_objects'], 'r') as f: 34 | OBJECTS = [l.split()[0] for l in f] 35 | # mapping from filenames of individual object images to the object names 36 | PNG_MAPPING = {} 37 | with open(keys['codraw_png_to_object'], 'r') as f: 38 | for l in f: 39 | splits = l.split('\t') 40 | PNG_MAPPING[splits[0]] = splits[1].strip() 41 | 42 | # loop through scene jsons and extract object information 43 | for scene_json in tqdm(sorted(glob('{}/*.json'.format(json_path)))): 44 | with open(scene_json, 'r') as f: 45 | scene = json.load(f) 46 | scene_id = scene['image_id'] 47 | 48 | turn_id = -1 49 | for dialog in scene['dialog']: 50 | if len(dialog['abs_d']) < 2: 51 | continue 52 | 53 | turn_id += 1 54 | bow = np.zeros((len(OBJECTS)), dtype=int) 55 | x_coords = np.ones((len(OBJECTS)), dtype=int) * -1 56 | y_coords = np.ones((len(OBJECTS)), dtype=int) * -1 57 | z_coords = np.ones((len(OBJECTS)), dtype=int) * -1 58 | 59 | dialog = dialog['abs_d'].split(',') 60 | 61 | idx = 1 62 | for _ in range(int(dialog[0])): 63 | png_filename = dialog[idx][:-4] 64 | x = float(dialog[idx + 4]) 65 | y = float(dialog[idx + 5]) 66 | z = float(dialog[idx + 6]) 67 | 68 | if abs(x) > 1000 or abs(y) > 1000 or abs(z) > 1000: 69 | idx = idx + 8 70 | continue 71 | 72 | index = OBJECTS.index(PNG_MAPPING[png_filename]) 73 | bow[index] = 1 74 | x_coords[index] = x 75 | y_coords[index] = y 76 | z_coords[index] = z 77 | 78 | idx = idx + 8 79 | 80 | meta_data = [] 81 | for e in range(len(OBJECTS)): 82 | object_meta = str.join(',', 83 | [str(bow[e]), 84 | str(x_coords[e]), 85 | str(y_coords[e]), 86 | str(z_coords[e])]) 87 | meta_data.append(object_meta) 88 | 89 | extracted_objects_file.write('Scene{}_{}\t{}\n'.format(scene_id, 90 | turn_id, 91 | str.join(' ', meta_data))) 92 | 93 | 94 | if __name__ == '__main__': 95 | extract_object_names() 96 | extract_objects() 97 | -------------------------------------------------------------------------------- /scripts/codraw_dataset_generation/codraw_object_detection.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Microsoft Corporation. 2 | # Licensed under the MIT license. 
3 | """ 4 | Script to parse and read CoDraw data and save it in HDF5 format for Object Detector & Localizer 5 | """ 6 | from glob import glob 7 | import json 8 | import os 9 | 10 | import cv2 11 | import h5py 12 | import numpy as np 13 | from tqdm import tqdm 14 | import yaml 15 | 16 | 17 | with open('config.yml', 'r') as f: 18 | keys = yaml.load(f, Loader=yaml.FullLoader) 19 | 20 | 21 | def create_object_detection_dataset(): 22 | # load required keys 23 | scenes_path = keys['codraw_scenes'] 24 | images_path = keys['codraw_images'] 25 | background_img = cv2.imread(keys['codraw_background']) 26 | h5_path = keys['codraw_hdf5_folder'] 27 | codraw_extracted_coords = keys['codraw_extracted_coordinates'] 28 | 29 | # set height, width, scaling parameters 30 | h, w, _ = background_img.shape 31 | scale_x = 128. / w 32 | scale_y = 128. / h 33 | scaling_ratio = np.array([scale_x, scale_y, 1]) 34 | 35 | # create hdf5 files for train, val, test 36 | h5_train = h5py.File(os.path.join(h5_path, 'codraw_obj_train.h5'), 'w') 37 | h5_val = h5py.File(os.path.join(h5_path, 'codraw_obj_val.h5'), 'w') 38 | h5_test = h5py.File(os.path.join(h5_path, 'codraw_obj_test.h5'), 'w') 39 | 40 | # set objects and bow (bag of words) dicts for each image 41 | bow_dim = 0 42 | GT_BOW = {} 43 | GT_OBJECTS = {} 44 | with open(codraw_extracted_coords, 'r') as f: 45 | for line in f: 46 | splits = line.split('\t') 47 | image = splits[0] 48 | split_coords = lambda x: [int(c) for c in x.split(',')] 49 | bow = np.array([split_coords(b) for b in splits[1].split()]) 50 | bow_dim = len(bow) 51 | GT_BOW[image] = bow[:, 0] 52 | scaling = scaling_ratio * np.expand_dims(bow[:, 0], axis=1).repeat(3, 1) 53 | GT_OBJECTS[image] = (bow[:, 1:] * scaling).astype(int) 54 | 55 | # start saving data into hdf5; loop over all scenes 56 | c_train = -1 57 | c_val = -1 58 | c_test = -1 59 | for scene_file in tqdm(sorted(glob('{}/*json'.format(scenes_path)))): 60 | # identify if scene belongs to train / val / test 61 | split = scene_file.split('/')[-1].split('_')[0] 62 | 63 | with open(scene_file, 'r') as f: 64 | scene = json.load(f) 65 | scene_id = scene['image_id'] 66 | 67 | # loop over turns in a single scene 68 | idx = 0 69 | for i in range(len(scene['dialog'])): 70 | turn = scene['dialog'][i] 71 | 72 | bow = GT_BOW['Scene{}_{}'.format(scene_id, idx)] 73 | coords = GT_OBJECTS['Scene{}_{}'.format(scene_id, idx)] 74 | 75 | # if there is no image for current turn: merge with next turn 76 | if turn['abs_d'] == '': 77 | continue 78 | 79 | image = cv2.imread(os.path.join(images_path, 'Scene{}_{}.png'.format(scene_id, idx))) 80 | image = cv2.resize(image, (128, 128)) 81 | idx += 1 82 | 83 | if split == 'train': 84 | c_train += 1 85 | ex = h5_train.create_group(str(c_train)) 86 | elif split == 'val': 87 | c_val += 1 88 | ex = h5_val.create_group(str(c_val)) 89 | elif split == 'test': 90 | c_test += 1 91 | ex = h5_test.create_group(str(c_test)) 92 | 93 | ex.create_dataset('image', data=image) 94 | ex.create_dataset('objects', data=np.array(bow)) 95 | ex.create_dataset('coords', data=np.array(coords)) 96 | ex.create_dataset('scene_id', data=scene_id) 97 | 98 | 99 | if __name__ == '__main__': 100 | create_object_detection_dataset() 101 | -------------------------------------------------------------------------------- /scripts/codraw_dataset_generation/codraw_raw_to_hdf5.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Microsoft Corporation. 2 | # Licensed under the MIT license. 
3 | """ 4 | Script to parse and read raw CoDraw data and save it in HDF5 format for GeNeVA-GAN 5 | """ 6 | from glob import glob 7 | import json 8 | import os 9 | import pickle 10 | import string 11 | 12 | import cv2 13 | import h5py 14 | import nltk 15 | import numpy as np 16 | from tqdm import tqdm 17 | import yaml 18 | 19 | 20 | with open('config.yml', 'r') as f: 21 | keys = yaml.load(f, Loader=yaml.FullLoader) 22 | 23 | 24 | def replace_at_offset(msg, tok, offset, tok_replace): 25 | before = msg[:offset] 26 | after = msg[offset:] 27 | after = after.replace(tok, tok_replace, 1) 28 | return before + after 29 | 30 | 31 | def create_h5(): 32 | # load required keys 33 | scenes_path = keys['codraw_scenes'] 34 | images_path = keys['codraw_images'] 35 | background_img = cv2.imread(keys['codraw_background']) 36 | h5_path = keys['codraw_hdf5_folder'] 37 | spell_check = keys['codraw_spell_check'] 38 | codraw_extracted_coords = keys['codraw_extracted_coordinates'] 39 | 40 | # set height, width, scaling parameters 41 | h, w, _ = background_img.shape 42 | scale_x = 128. / w 43 | scale_y = 128. / h 44 | scaling_ratio = np.array([scale_x, scale_y, 1]) 45 | background_img = cv2.resize(background_img, (128, 128)) 46 | 47 | # load spelling corrections - obtained via Bing Spell Check API 48 | with open(spell_check, 'rb') as f: 49 | spell_check = pickle.load(f) 50 | 51 | # create hdf5 files for train, val, test 52 | h5_train = h5py.File(os.path.join(h5_path, 'codraw_train.h5'), 'w') 53 | h5_val = h5py.File(os.path.join(h5_path, 'codraw_val.h5'), 'w') 54 | h5_test = h5py.File(os.path.join(h5_path, 'codraw_test.h5'), 'w') 55 | h5_train.create_dataset('background', data=background_img) 56 | h5_val.create_dataset('background', data=background_img) 57 | h5_test.create_dataset('background', data=background_img) 58 | 59 | # set objects and bow (bag of words) dicts for each image 60 | bow_dim = 0 61 | GT_BOW = {} 62 | GT_OBJECTS = {} 63 | with open(codraw_extracted_coords, 'r') as f: 64 | for line in f: 65 | splits = line.split('\t') 66 | image = splits[0] 67 | split_coords = lambda x: [int(c) for c in x.split(',')] 68 | bow = np.array([split_coords(b) for b in splits[1].split()]) 69 | bow_dim = len(bow) 70 | GT_BOW[image] = bow[:, 0] 71 | scaling = scaling_ratio * np.expand_dims(bow[:, 0], axis=1).repeat(3, 1) 72 | GT_OBJECTS[image] = (bow[:, 1:] * scaling).astype(int) 73 | 74 | # mark purely chitchat turns to be removed 75 | chitchat = ['hi', 'done', 'ok', 'alright', 'okay', 'thanks', 'bye', 'hello'] 76 | 77 | # start saving data into hdf5; loop over all scenes 78 | c_train = 0 79 | c_val = 0 80 | c_test = 0 81 | for scene_file in tqdm(sorted(glob('{}/*json'.format(scenes_path)))): 82 | # identify if scene belongs to train / val / test 83 | split = scene_file.split('/')[-1].split('_')[0] 84 | 85 | images = [] 86 | utterences = [] 87 | objects = [] 88 | coordinates = [] 89 | 90 | with open(scene_file, 'r') as f: 91 | scene = json.load(f) 92 | scene_id = scene['image_id'] 93 | 94 | # loop over turns in a single scene 95 | idx = 0 96 | prev_bow = np.zeros((bow_dim)) 97 | description = [] 98 | for i in range(len(scene['dialog'])): 99 | bow = GT_BOW['Scene{}_{}'.format(scene_id, idx)] 100 | # new objects added in this turn 101 | hamming_distance = np.sum(bow - prev_bow) 102 | turn = scene['dialog'][i] 103 | # lowercase all messages 104 | teller = str.lower(turn['msg_t']) 105 | drawer = str.lower(turn['msg_d']) 106 | # clear chitchat turns 107 | if teller in chitchat: 108 | teller = '' 109 | if drawer in chitchat: 110 
|                 drawer = ''
111 | 
112 |             # replace with spelling suggestions returned by Bing Spell Check API
113 |             if teller in spell_check and len(spell_check[teller]['flaggedTokens']) != 0:
114 |                 for flagged_token in spell_check[teller]['flaggedTokens']:
115 |                     tok = flagged_token['token']
116 |                     tok_offset = flagged_token['offset']
117 |                     assert len(flagged_token['suggestions']) == 1
118 |                     tok_replace = flagged_token['suggestions'][0]['suggestion']
119 |                     teller = replace_at_offset(teller, tok, tok_offset, tok_replace)
120 |             if drawer in spell_check and len(spell_check[drawer]['flaggedTokens']) != 0:
121 |                 for flagged_token in spell_check[drawer]['flaggedTokens']:
122 |                     tok = flagged_token['token']
123 |                     tok_offset = flagged_token['offset']
124 |                     assert len(flagged_token['suggestions']) == 1
125 |                     tok_replace = flagged_token['suggestions'][0]['suggestion']
126 |                     drawer = replace_at_offset(drawer, tok, tok_offset, tok_replace)
127 | 
128 |             # add delimiting tokens: <teller>, <drawer>
129 |             if teller != '':
130 |                 description += ['<teller>'] + nltk.word_tokenize(teller)
131 |             if drawer != '':
132 |                 description += ['<drawer>'] + nltk.word_tokenize(drawer)
133 | 
134 |             description = [w for w in description if w not in chitchat]
135 |             description = [w for w in description if w not in string.punctuation]
136 | 
137 |             bow = GT_BOW['Scene{}_{}'.format(scene_id, idx)]
138 |             coords = GT_OBJECTS['Scene{}_{}'.format(scene_id, idx)]
139 | 
140 |             # if there is no image for current turn: merge with next turn
141 |             if turn['abs_d'] == '':
142 |                 continue
143 | 
144 |             # if no new object is added in image for current turn: merge with next turn
145 |             if hamming_distance < 1:
146 |                 prev_bow = bow
147 |                 idx += 1
148 |                 continue
149 | 
150 |             # queue image, instruction, objects bow, object coordinates for saving
151 |             if len(description) > 0:
152 |                 image = cv2.imread(os.path.join(images_path, 'Scene{}_{}.png'.format(scene_id, idx)))
153 |                 image = cv2.resize(image, (128, 128))
154 | 
155 |                 images.append(image)
156 |                 utterences.append(str.join(' ', description))
157 |                 objects.append(bow)
158 |                 coordinates.append(coords)
159 | 
160 |                 description = []
161 |             idx += 1
162 |             prev_bow = bow
163 | 
164 |         # add current scene's data to hdf5
165 |         if len(images) > 0:
166 |             if split == 'train':
167 |                 scene = h5_train.create_group(str(c_train))
168 |                 c_train += 1
169 |             elif split == 'val':
170 |                 scene = h5_val.create_group(str(c_val))
171 |                 c_val += 1
172 |             elif split == 'test':
173 |                 scene = h5_test.create_group(str(c_test))
174 |                 c_test += 1
175 | 
176 |             scene.create_dataset('images', data=images)
177 |             dt = h5py.special_dtype(vlen=str)
178 |             scene.create_dataset('utterences', data=np.string_(utterences), dtype=dt)  # spelling kept to match the pre-generated files
179 |             scene.create_dataset('objects', data=np.array(objects))
180 |             scene.create_dataset('coords', data=np.array(coordinates))
181 |             scene.create_dataset('scene_id', data=scene_id)
182 |         else:
183 |             print(scene_id)
184 | 
185 | 
186 | if __name__ == '__main__':
187 |     create_h5()
188 | 
--------------------------------------------------------------------------------
/scripts/download_data.sh:
--------------------------------------------------------------------------------
1 | #!/bin/bash
2 | 
3 | set -e
4 | 
5 | function download () {
6 |     URL=$1
7 |     TGTDIR=.
8 |     if [ -n "$2" ]; then
9 |         TGTDIR=$2
10 |         mkdir -p $TGTDIR
11 |     fi
12 |     echo "Downloading ${URL} to ${TGTDIR}"
13 |     wget $URL -P $TGTDIR
14 | }
15 | 
16 | # GloVe data
17 | if [ ! 
-f raw-data/GloVe/glove.840B.300d.txt ] 18 | then 19 | download http://nlp.stanford.edu/data/glove.840B.300d.zip 20 | mkdir --parents raw-data/GloVe 21 | unzip glove.840B.300d.zip glove.840B.300d.txt -d raw-data/GloVe 22 | rm -f glove.840B.300d.zip 23 | fi 24 | 25 | # get CoDraw GitHub repository 26 | if [ ! -d raw-data/CoDraw/asset ] 27 | then 28 | git clone https://github.com/facebookresearch/CoDraw.git raw-data/CoDraw 29 | fi 30 | 31 | # get CoDraw individual json files 32 | if [ ! -d raw-data/CoDraw/output ] 33 | then 34 | cd raw-data/CoDraw 35 | if [ ! -f dataset/CoDraw_1_0.json ] 36 | then 37 | mkdir --parents dataset 38 | wget -O dataset/CoDraw_1_0.json https://github.com/facebookresearch/CoDraw/releases/download/v1.0/CoDraw_1_0.json 39 | fi 40 | python script/preprocess.py dataset/CoDraw_1_0.json 41 | rm dataset/CoDraw_1_0.json 42 | cd ../../ 43 | fi 44 | 45 | # get CoDraw background image and object names 46 | if [ ! -f raw-data/CoDraw/background.png ] 47 | then 48 | download https://vision.ece.vt.edu/clipart/dataset/AbstractScenes_v1.1.zip 49 | unzip -j AbstractScenes_v1.1.zip AbstractScenes_v1.1/Pngs/background.png -d raw-data/CoDraw 50 | unzip -j AbstractScenes_v1.1.zip AbstractScenes_v1.1/VisualFeatures/10K_instance_occurence_58_names.txt -d raw-data/CoDraw 51 | rm AbstractScenes_v1.1.zip 52 | fi 53 | -------------------------------------------------------------------------------- /scripts/iclevr_dataset_generation/iclevr_add_data_to_raw.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Microsoft Corporation. 2 | # Licensed under the MIT license. 3 | """ 4 | Script to create a list of all objects in the i-CLEVR data 5 | """ 6 | import itertools 7 | import yaml 8 | 9 | 10 | with open('config.yml', 'r') as f: 11 | keys = yaml.load(f, Loader=yaml.FullLoader) 12 | 13 | 14 | COLORS = ['gray', 'red', 'blue', 'green', 'brown', 'purple', 'cyan', 'yellow'] 15 | SHAPES = ['cube', 'sphere', 'cylinder'] 16 | 17 | 18 | def create_vocab(): 19 | obj_list = list(itertools.product(SHAPES, COLORS)) 20 | obj_list = [' '.join(x) for x in obj_list] 21 | 22 | with open(keys['iclevr_objects'], 'w') as f: 23 | for item in obj_list: 24 | f.write("%s\n" % item) 25 | 26 | 27 | if __name__ == '__main__': 28 | create_vocab() 29 | -------------------------------------------------------------------------------- /scripts/iclevr_dataset_generation/iclevr_object_detection.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Microsoft Corporation. 2 | # Licensed under the MIT license. 
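# Resulting HDF5 layout, as written by create_h5() below: clevr_obj_*.h5
# hold one numbered group per instruction turn with 'image' (128x128x3),
# 'objects' (a cumulative 24-dim indicator over the 3 shapes x 8 colors
# listed by iclevr_add_data_to_raw.py), 'coords' (24x3 pixel coordinates
# rescaled from the 320x240 renders to 128x128) and 'scene_id'.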
3 | """ 4 | Script to parse and read i-CLEVR data and save it in HDF5 format for Object Detector & Localizer 5 | """ 6 | from glob import glob 7 | import json 8 | import os 9 | 10 | import cv2 11 | import h5py 12 | import numpy as np 13 | from tqdm import tqdm 14 | import yaml 15 | 16 | 17 | with open('config.yml', 'r') as f: 18 | keys = yaml.load(f, Loader=yaml.FullLoader) 19 | 20 | 21 | def create_h5(): 22 | # load required keys 23 | data_path = keys['iclevr_data_source'] 24 | output_path = keys['iclevr_hdf5_folder'] 25 | OBJECTS = keys['iclevr_objects'] 26 | with open(OBJECTS, 'r') as f: 27 | OBJECTS = f.readlines() 28 | OBJECTS = [tuple(x.strip().split()) for x in OBJECTS] 29 | background_path = keys['iclevr_background'] 30 | 31 | # create hdf5 files for train, val, test 32 | train_h5 = h5py.File(os.path.join(output_path, 'clevr_obj_train.h5'), 'w') 33 | val_h5 = h5py.File(os.path.join(output_path, 'clevr_obj_val.h5'), 'w') 34 | test_h5 = h5py.File(os.path.join(output_path, 'clevr_obj_test.h5'), 'w') 35 | 36 | json_path = os.path.join(data_path, 'scenes') 37 | images_path = os.path.join(data_path, 'images') 38 | 39 | background_image = cv2.imread(background_path) 40 | 41 | entites = json.dumps(['{} {}'.format(e[0], e[1]) for e in OBJECTS]) 42 | 43 | # start saving data into hdf5; loop over all scenes 44 | c_train = -1 45 | c_val = -1 46 | c_test = -1 47 | for scene in tqdm(glob(json_path + '/*.json')): 48 | filename = os.path.basename(scene) 49 | with open(scene, 'r') as f: 50 | scene = json.load(f) 51 | 52 | # identify if scene belongs to train / val / test 53 | split = filename.split('_')[1] 54 | scene_id = filename.split('_')[2][:-5] 55 | 56 | # add images 57 | images_files = sorted(glob(os.path.join(images_path, 'CLEVR_{}_{}_*'.format(split, scene_id)))) 58 | images = [] 59 | for t, image_file in enumerate(images_files): 60 | image = cv2.imread(image_file) 61 | image = cv2.resize(image, (128, 128)) 62 | images.append(image) 63 | 64 | # add objects and object coordinates 65 | agg_object = np.zeros(24) 66 | objects = np.zeros((5, 24)) 67 | agg_object_coords = np.zeros((24, 3)) 68 | object_coords = np.zeros((5, 24, 3)) 69 | for t, obj in enumerate(scene['objects']): 70 | color = obj['color'] 71 | shape = obj['shape'] 72 | index = OBJECTS.index((shape, color)) 73 | agg_object[index] = 1 74 | objects[t] = agg_object 75 | agg_object_coords[index] = [obj['pixel_coords'][0]/320.*128, obj['pixel_coords'][1]/240.*128, obj['pixel_coords'][2]] 76 | object_coords[t] = agg_object_coords 77 | 78 | for t, obj in enumerate(scene['objects']): 79 | if split == 'train': 80 | c_train += 1 81 | sample = train_h5.create_group(str(c_train)) 82 | elif split == 'val': 83 | c_val += 1 84 | sample = val_h5.create_group(str(c_val)) 85 | else: 86 | c_test += 1 87 | sample = test_h5.create_group(str(c_test)) 88 | 89 | sample.create_dataset('scene_id', data=scene_id) 90 | sample.create_dataset('image', data=np.array(images)[t]) 91 | sample.create_dataset('objects', data=objects[t]) 92 | sample.create_dataset('coords', data=np.array(object_coords)[t]) 93 | 94 | 95 | if __name__ == '__main__': 96 | create_h5() 97 | -------------------------------------------------------------------------------- /scripts/iclevr_dataset_generation/iclevr_raw_to_hdf5.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Microsoft Corporation. 2 | # Licensed under the MIT license. 
3 | """ 4 | Script to parse and read raw i-CLEVR data and save it in HDF5 format for GeNeVA-GAN 5 | """ 6 | from glob import glob 7 | import json 8 | import os 9 | 10 | import cv2 11 | import h5py 12 | import numpy as np 13 | from tqdm import tqdm 14 | import yaml 15 | 16 | 17 | with open('config.yml', 'r') as f: 18 | keys = yaml.load(f, Loader=yaml.FullLoader) 19 | 20 | 21 | def create_h5(): 22 | # load required keys 23 | data_path = keys['iclevr_data_source'] 24 | output_path = keys['iclevr_hdf5_folder'] 25 | OBJECTS = keys['iclevr_objects'] 26 | with open(OBJECTS, 'r') as f: 27 | OBJECTS = f.readlines() 28 | OBJECTS = [tuple(x.strip().split()) for x in OBJECTS] 29 | background_path = keys['iclevr_background'] 30 | 31 | # create hdf5 files for train, val, test 32 | train_h5 = h5py.File(os.path.join(output_path, 'clevr_train.h5'), 'w') 33 | val_h5 = h5py.File(os.path.join(output_path, 'clevr_val.h5'), 'w') 34 | test_h5 = h5py.File(os.path.join(output_path, 'clevr_test.h5'), 'w') 35 | 36 | json_path = os.path.join(data_path, 'scenes/') 37 | images_path = os.path.join(data_path, 'images/') 38 | text_path = os.path.join(data_path, 'text/') 39 | 40 | # add background image to hdf5 41 | background_image = cv2.imread(background_path) 42 | train_h5.create_dataset('background', data=background_image) 43 | val_h5.create_dataset('background', data=background_image) 44 | test_h5.create_dataset('background', data=background_image) 45 | 46 | # add object properties to hdf5 47 | entites = json.dumps(['{} {}'.format(e[0], e[1]) for e in OBJECTS]) 48 | train_h5.create_dataset('entities', data=entites) 49 | val_h5.create_dataset('entities', data=entites) 50 | test_h5.create_dataset('entities', data=entites) 51 | 52 | # start saving data into hdf5; loop over all scenes 53 | for scene in tqdm(glob(json_path + '/*.json')): 54 | filename = os.path.basename(scene) 55 | with open(scene, 'r') as f: 56 | scene = json.load(f) 57 | 58 | # identify if scene belongs to train / val / test 59 | split = filename.split('_')[1] 60 | scene_id = filename.split('_')[2][:-5] 61 | 62 | # add text 63 | text_file = os.path.join(text_path, 'CLEVR_{}_{}.txt'.format(split, scene_id)) 64 | with open(text_file, 'r') as f: 65 | text = [line.strip() for line in f] 66 | 67 | # add images 68 | images_files = sorted(glob(os.path.join(images_path, 'CLEVR_{}_{}_*'.format(split, scene_id)))) 69 | images = [] 70 | for t, image_file in enumerate(images_files): 71 | image = cv2.imread(image_file) 72 | image = cv2.resize(image, (128, 128)) 73 | images.append(image) 74 | 75 | # add objects and object coordinates 76 | agg_object = np.zeros(24) 77 | objects = np.zeros((5, 24)) 78 | agg_object_coords = np.zeros((24, 3)) 79 | object_coords = np.zeros((5, 24, 3)) 80 | for t, obj in enumerate(scene['objects']): 81 | color = obj['color'] 82 | shape = obj['shape'] 83 | index = OBJECTS.index((shape, color)) 84 | agg_object[index] = 1 85 | objects[t] = agg_object 86 | agg_object_coords[index] = [obj['pixel_coords'][0]/320.*128, obj['pixel_coords'][1]/240.*128, obj['pixel_coords'][2]] 87 | object_coords[t] = agg_object_coords 88 | 89 | if split == 'train': 90 | sample = train_h5.create_group(scene_id) 91 | elif split == 'val': 92 | sample = val_h5.create_group(scene_id) 93 | else: 94 | sample = test_h5.create_group(scene_id) 95 | 96 | sample.create_dataset('scene_id', data=scene_id) 97 | sample.create_dataset('images', data=np.array(images)) 98 | sample.create_dataset('text', data=json.dumps(text)) 99 | sample.create_dataset('objects', data=objects) 100 | 
        sample.create_dataset('coords', data=np.array(object_coords))
101 | 
102 | 
103 | if __name__ == '__main__':
104 |     create_h5()
105 | 
--------------------------------------------------------------------------------
/scripts/joint_codraw_iclevr/generate_glove_file.py:
--------------------------------------------------------------------------------
1 | # Copyright (c) Microsoft Corporation.
2 | # Licensed under the MIT license.
3 | """
4 | Script to generate the GloVe embedding file for the CoDraw and i-CLEVR dataset vocabularies
5 | """
6 | from tqdm import tqdm
7 | import yaml
8 | 
9 | 
10 | with open('config.yml', 'r') as f:
11 |     keys = yaml.load(f, Loader=yaml.FullLoader)
12 | 
13 | 
14 | def generate_glove_file():
15 |     codraw_vocab = keys['codraw_vocab']
16 |     clevr_vocab = keys['iclevr_vocab']
17 |     output_file = keys['glove_output']
18 |     original_glove = keys['glove_source']
19 | 
20 |     # read CoDraw vocabulary
21 |     with open(codraw_vocab, 'r') as f:
22 |         codraw_vocab = f.readlines()
23 |     codraw_vocab = [x.strip().rsplit(' ', 1)[0] for x in codraw_vocab]
24 | 
25 |     # read i-CLEVR vocabulary
26 |     with open(clevr_vocab, 'r') as f:
27 |         clevr_vocab = f.readlines()
28 |     clevr_vocab = [x.strip().rsplit(' ', 1)[0] for x in clevr_vocab]
29 | 
30 |     # combine vocabularies and add special tokens for CoDraw Drawer and Teller
31 |     codraw_vocab += clevr_vocab + ['<teller>', '<drawer>']
32 |     codraw_vocab = list(set(codraw_vocab))
33 |     codraw_vocab.sort()
34 | 
35 |     print('Loading GloVe file. This might take a few minutes.')
36 |     with open(original_glove, 'r') as f:
37 |         original_glove = f.readlines()
38 |     tok_glove_pairs = [x.strip().split(' ', 1) for x in original_glove]
39 | 
40 |     # extract GloVe vectors for vocabulary tokens
41 |     for token, glove_emb in tqdm(tok_glove_pairs):
42 |         if token == 'unk':
43 |             unk_embedding = glove_emb
44 |         try:
45 |             token_idx = codraw_vocab.index(token)
46 |         except ValueError:
47 |             continue
48 |         else:
49 |             codraw_vocab[token_idx] = ' '.join([token, glove_emb])
50 | 
51 |     # set Drawer and Teller token vectors; assign 'unk' GloVe embedding to unknown words
52 |     unk_count = 0
53 |     for itidx, item in enumerate(codraw_vocab):
54 |         if len(item.split(' ')) == 1:
55 |             if item == '<teller>':
56 |                 codraw_vocab[itidx] = ' '.join(['<teller>', ('0.1 ' * 150 + '0.0 ' * 150)[:-1]])
57 |             elif item == '<drawer>':
58 |                 codraw_vocab[itidx] = ' '.join(['<drawer>', ('0.0 ' * 150 + '0.1 ' * 150)[:-1]])
59 |             else:
60 |                 unk_count += 1
61 |                 codraw_vocab[itidx] = ' '.join([item, unk_embedding])
62 | 
63 |     # write GloVe vector file for the CoDraw and i-CLEVR datasets combined
64 |     with open(output_file, 'w') as f:
65 |         for item in codraw_vocab:
66 |             f.write('%s\n' % item)
67 | 
68 |     print('Total words in vocab: {}\n`unk` embedding words: {}'.format(len(codraw_vocab), unk_count))
69 | 
70 | 
71 | if __name__ == '__main__':
72 |     generate_glove_file()
73 | 
--------------------------------------------------------------------------------
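
For reference, a minimal sketch of consuming the combined embedding file that `generate_glove_file.py` writes. The path and the 300-dimensional vectors follow the repository defaults (`glove_output` in `config.yml` and the `glove.840B.300d` source); treat it as an illustration rather than part of the repository:

```python
import numpy as np

# Each line of glove_codraw_iclevr.txt is '<token> <300 floats>', including
# the special <teller> and <drawer> tokens added by generate_glove_file.py.
embeddings = {}
with open('data/CoDraw_iCLEVR/glove_codraw_iclevr.txt', 'r') as f:
    for line in f:
        token, vector = line.rstrip('\n').split(' ', 1)
        embeddings[token] = np.array(vector.split(), dtype=np.float32)

assert embeddings['<teller>'].shape == (300,)
```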