├── .gitignore ├── LICENSE ├── README.md ├── data ├── generate_pascal3d_csv.py ├── getPascalTrainVal.m └── get_csv.sh ├── datasets ├── __init__.py ├── pascal3d.py ├── pascal3d_kp.py └── vp_util.py ├── model_weights └── get_weights.sh ├── models ├── __init__.py ├── clickhere_cnn.py └── render4cnn.py ├── train.py └── util ├── Paths.py ├── __init__.py ├── load_datasets.py ├── metrics.py ├── torch_utils.py └── vp_loss.py /.gitignore: -------------------------------------------------------------------------------- 1 | experiments 2 | set_env.sh 3 | trash 4 | cluster* 5 | *.npy 6 | *.pth 7 | *.pkl 8 | old_stuff 9 | test.py 10 | data/*.csv 11 | data/*.txt 12 | 13 | # Generated Ignores 14 | # Byte-compiled / optimized / DLL files 15 | __pycache__/ 16 | *.py[cod] 17 | *$py.class 18 | 19 | # C extensions 20 | *.so 21 | 22 | # Distribution / packaging 23 | .Python 24 | env/ 25 | build/ 26 | develop-eggs/ 27 | dist/ 28 | downloads/ 29 | eggs/ 30 | .eggs/ 31 | lib/ 32 | lib64/ 33 | parts/ 34 | sdist/ 35 | var/ 36 | wheels/ 37 | *.egg-info/ 38 | .installed.cfg 39 | *.egg 40 | 41 | # PyInstaller 42 | # Usually these files are written by a python script from a template 43 | # before PyInstaller builds the exe, so as to inject date/other infos into it. 44 | *.manifest 45 | *.spec 46 | 47 | # Installer logs 48 | pip-log.txt 49 | pip-delete-this-directory.txt 50 | 51 | # Unit test / coverage reports 52 | htmlcov/ 53 | .tox/ 54 | .coverage 55 | .coverage.* 56 | .cache 57 | nosetests.xml 58 | coverage.xml 59 | *.cover 60 | .hypothesis/ 61 | 62 | # Translations 63 | *.mo 64 | *.pot 65 | 66 | # Django stuff: 67 | *.log 68 | local_settings.py 69 | 70 | # Flask stuff: 71 | instance/ 72 | .webassets-cache 73 | 74 | # Scrapy stuff: 75 | .scrapy 76 | 77 | # Sphinx documentation 78 | docs/_build/ 79 | 80 | # PyBuilder 81 | target/ 82 | 83 | # Jupyter Notebook 84 | .ipynb_checkpoints 85 | 86 | # pyenv 87 | .python-version 88 | 89 | # celery beat schedule file 90 | celerybeat-schedule 91 | 92 | # SageMath parsed files 93 | *.sage.py 94 | 95 | # dotenv 96 | .env 97 | 98 | # virtualenv 99 | .venv 100 | venv/ 101 | ENV/ 102 | 103 | # Spyder project settings 104 | .spyderproject 105 | .spyproject 106 | 107 | # Rope project settings 108 | .ropeproject 109 | 110 | # mkdocs documentation 111 | /site 112 | 113 | # mypy 114 | .mypy_cache/ 115 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2017 Mohamed El Banani 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # pytorch-clickhere-cnn
2 |
3 |
4 | ## Introduction
5 |
6 | This is a [PyTorch](http://pytorch.org) implementation of [Clickhere CNN](https://github.com/rszeto/click-here-cnn)
7 | and [Render for CNN](https://github.com/shapenet/RenderForCNN).
8 |
9 | We currently provide the model, converted weights, dataset classes, and training/evaluation scripts.
10 | This implementation also includes the Geometric Structure Aware loss function first introduced in Render For CNN.
11 |
12 |
13 | If you have any questions, please email me at mbanani@umich.edu.
14 |
15 | ## Getting Started
16 |
17 | Add the repository to your python path:
18 |
19 |     export PYTHONPATH=$PYTHONPATH:$(pwd)
20 |
21 |
22 | Please be aware that this code makes use of the following packages:
23 | - Python 3.6
24 | - PyTorch 1.1 and Torch Vision 0.2.2
25 | - scipy
26 | - pandas
27 |
28 | ### Generating the data
29 | Download the [Pascal 3D+ dataset](http://cvgl.stanford.edu/projects/pascal3d.html) (Release 1.1).
30 | Set the path for the Pascal3D directory in util/Paths.py. Finally, run the following commands from the repository's root directory:
31 |
32 |     cd data/
33 |     python generate_pascal3d_csv.py
34 |
35 | Please note that this will generate the csv files for 3 variants of the dataset: Pascal 3D+ (full), Pascal 3D+ (easy), and Pascal 3D-Vehicles (with keypoints). These datasets are needed to obtain the different sets of results reported below.
36 | Alternatively, you can download the csv files directly by running `data/get_csv.sh`.
37 |
38 | ### Pre-trained Model Weights
39 |
40 | We have converted the Render For CNN and Click-Here CNN model weights from the respective Caffe models.
41 | The converted models are available for download by running the script `model_weights/get_weights.sh`.
42 | The converted Render For CNN model achieves performance comparable to its Caffe counterpart;
43 | however, a larger error is observed for the converted Click-Here CNN model.
44 | Updated results are coming soon.
45 |
46 | ### Running Inference
47 |
48 | After downloading Pascal 3D+ and the pretrained weights, generating the CSV files, and setting the appropriate paths as mentioned above,
49 | you can run inference on the Pascal 3D+ dataset by running one of the following commands (depending on the model):
50 |
51 |     python train.py --model chcnn --dataset pascalVehKP
52 |     python train.py --model r4cnn --dataset pascalEasy
53 |     python train.py --model r4cnn --dataset pascal
54 |
55 |
56 | #### Results
57 |
58 | To be updated soon!
59 |
60 | The original Render For CNN paper reported results on the 'easy' subset of Pascal 3D+, which removes all truncated and occluded instances from the dataset. Click-Here CNN, in contrast, reports results on an augmented version of the dataset in which each image-keypoint pair corresponds to an instance, so multiple instances may belong to the same object in an image. Below are the results obtained from each of the runs above.
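For reference, the accuracy and median error reported below follow the standard viewpoint metrics: a prediction is counted as correct if the geodesic distance between the predicted and ground-truth rotations is under pi/6, and the median error is the median of those distances in degrees. The following minimal sketch shows how these numbers are derived; the `geodesic_dist` helper mirrors `compute_angle_dists` in `util/metrics.py`, while the names `predR`, `gtR`, and `dists` are illustrative placeholders:

    import numpy as np
    from scipy import linalg as linAlg

    def geodesic_dist(predR, gtR):
        # Distance between two 3x3 rotation matrices, in radians
        # (same expression as compute_angle_dists in util/metrics.py).
        return linAlg.norm(linAlg.logm(np.dot(predR.T, gtR)), 2) / np.sqrt(2)

    # Given per-instance distances `dists` (in radians) for one object class:
    # acc_pi6 = np.mean(np.array(dists) < np.pi / 6.)   # accuracy at pi/6
    # med_err = np.median(dists) * 180. / np.pi         # median error, in degrees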
61 |
62 | ### Render For CNN paper results
63 |
64 | We evaluate the converted model on the Pascal3D-easy subset used in the original Render For CNN paper,
65 | as well as on the full Pascal 3D+ dataset.
66 | It is worth noting that the converted model actually exceeds the performance reported in Render For CNN.
67 |
68 | #### Accuracy
69 | | dataset   | plane | bike  | boat  | bottle| bus   | car   | chair |d.table| mbike | sofa  | train |  tv   | mean  |
70 | |:---------:|:-----:|:-----:|:-----:|:-----:|:-----:|:-----:|:-----:|:-----:|:-----:|:-----:|:-----:|:-----:|:-----:|
71 | | Full      | 76.26 | 69.58 | 59.03 | 87.74 | 84.32 | 69.97 | 74.2  | 66.79 | 77.29 | 82.37 | 75.48 | 81.93 | 75.41 |
72 | | Easy      | 80.37 | 85.59 | 62.93 | 95.60 | 94.14 | 84.08 | 82.76 | 80.95 | 85.30 | 84.61 | 84.08 | 93.26 | 84.47 |
73 | | Reported  | 74    | 83    | 52    | 91    | 91    | 88    | 86    | 73    | 78    | 90    | 86    | 92    | 82    |
74 |
75 | #### Median Error
76 | | dataset   | plane | bike  | boat  | bottle| bus   | car   | chair |d.table| mbike | sofa  | train |  tv   | mean  |
77 | |:---------:|:-----:|:-----:|:-----:|:-----:|:-----:|:-----:|:-----:|:-----:|:-----:|:-----:|:-----:|:-----:|:-----:|
78 | | Full      | 11.52 | 15.33 | 19.33 | 8.51  | 5.54  | 9.39  | 13.83 | 12.87 | 14.90 | 13.03 | 8.96  | 13.72 | 12.24 |
79 | | Easy      | 10.32 | 11.66 | 17.74 | 6.66  | 4.52  | 6.65  | 11.21 | 9.75  | 13.11 | 9.76  | 5.52  | 11.93 | 9.90  |
80 | | Reported  | 15.4  | 14.8  | 25.6  | 9.3   | 3.6   | 6.0   | 9.7   | 10.8  | 16.7  | 9.5   | 6.1   | 12.6  | 11.7  |
81 |
82 |
83 |
84 | ### Pascal3D - Vehicles with Keypoints
85 |
86 | We evaluated the converted Render For CNN and Click-Here CNN models on Pascal3D-Vehicles.
87 | It should be noted that the results for Click-Here CNN are lower than those achieved by running the author-provided Caffe code.
88 | It seems that there is something incorrect with the current reimplementation and/or weight conversion.
89 | We are working on fixing this problem.
90 |
91 | #### Accuracy
92 | |                           | bus   | car   | m.bike | mean  |
93 | |:-------------------------:|:-----:|:-----:|:------:|:-----:|
94 | | Render For CNN            | 89.26 | 74.36 | 81.93  | 81.85 |
95 | | Click-Here CNN            | 86.91 | 83.25 | 73.83  | 81.33 |
96 | | Click-Here CNN (reported) | 96.8  | 90.2  | 85.2   | 90.7  |
97 |
98 | #### Median Error
99 | |                           | bus   | car   | m.bike | mean  |
100 | |:-------------------------:|:-----:|:-----:|:------:|:-----:|
101 | | Render For CNN            | 5.16  | 8.53  | 13.46  | 9.05  |
102 | | Click-Here CNN            | 4.01  | 8.18  | 19.71  | 10.63 |
103 | | Click-Here CNN (reported) | 2.63  | 4.98  | 11.4   | 6.35  |
104 |
105 |
106 | ### Pascal3D - Vehicles with Keypoints (Fine-tuned Models)
107 |
108 | We fine-tuned both models on the Pascal 3D+ (Vehicles with Keypoints) dataset.
109 | Since we suspect that the problem with the replication of the Click-Here CNN model
110 | lies in the attention section, we conducted an experiment where we fine-tuned only
111 | those weights. As reported below, fine-tuning just the attention weights achieves the best performance (see the sketch after this paragraph).
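Concretely, "fine-tuning just the attention weights" means passing only the keypoint-stream and prediction-head parameters to the optimizer, so the convolutional image stream stays fixed. Here is a minimal sketch of that setup, following the `--just_attention` option in `train.py`; the submodule names are those defined in `models/clickhere_cnn.py`, and `model` is assumed to be an already-constructed `clickhere_cnn` instance:

    import torch

    # Keypoint-attention branch plus the fusion and prediction heads only.
    params = list(model.kp_map.parameters())  + list(model.kp_class.parameters()) \
           + list(model.kp_fuse.parameters()) + list(model.fusion.parameters())   \
           + list(model.azim.parameters())    + list(model.elev.parameters())     \
           + list(model.tilt.parameters())

    # Layers omitted from `params` receive no updates from this optimizer,
    # which leaves the Render For CNN image stream frozen.
    optimizer = torch.optim.Adam(params, lr=3e-4)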
112 | #### Accuracy
113 | |                               | bus   | car   | m.bike | mean  |
114 | |:-----------------------------:|:-----:|:-----:|:------:|:-----:|
115 | | Render For CNN FT             | 93.55 | 83.98 | 87.30  | 88.28 |
116 | | Click-Here CNN FT             | 92.97 | 89.84 | 81.25  | 88.02 |
117 | | Click-Here CNN FT-Attention   | 94.48 | 90.77 | 84.91  | 90.05 |
118 | | Click-Here CNN (reported)     | 96.8  | 90.2  | 85.2   | 90.7  |
119 |
120 |
121 | #### Median Error
122 |
123 | |                               | bus   | car   | m.bike | mean  |
124 | |:-----------------------------:|:-----:|:-----:|:------:|:-----:|
125 | | Render For CNN FT             | 3.04  | 5.83  | 11.95  | 6.94  |
126 | | Click-Here CNN FT             | 2.93  | 5.14  | 13.42  | 7.16  |
127 | | Click-Here CNN FT-Attention   | 2.88  | 5.24  | 12.10  | 6.74  |
128 | | Click-Here CNN (reported)     | 2.63  | 4.98  | 11.4   | 6.35  |
129 |
130 |
131 | ## Training the model
132 |
133 | To train a model, run `python train.py` with the parameter flags defined in `train.py` (see the example invocation at the end of this README).
134 |
135 | ## Citation
136 |
137 | This is an implementation of [Clickhere CNN](https://github.com/rszeto/click-here-cnn) and [Render For CNN](https://github.com/shapenet/RenderForCNN), so please cite the respective papers if you use this code in any published work.
138 |
139 | ## Acknowledgements
140 |
141 | We would like to thank Ryan Szeto, Hao Su, and Charles R. Qi for providing their code, and
142 | for their assistance with questions regarding reimplementing their work. We would also
143 | like to acknowledge [Kenta Iwasaki](https://discuss.pytorch.org/u/dranithix/summary) for
144 | his advice with the loss function implementation and [Qi Fan](https://github.com/fanq15) for releasing
145 | [caffe_to_torch_to_pytorch](https://github.com/fanq15/caffe_to_torch_to_pytorch).
146 |
147 | This work has been partially supported by DARPA W32P4Q-15-C-0070 (subcontract from SoarTech) and funds from the University of Michigan Mobility Transformation Center.
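For reference, a representative training invocation looks like the following; the flag names and default values come from the argparse block at the bottom of `train.py`, and this particular combination is only an illustrative configuration rather than a prescribed one:

    python train.py --model chcnn --dataset pascalVehKP --batch_size 64 --lr 3e-4 --num_epochs 100 --eval_epoch 5

Adding `--just_attention` restricts optimization to the keypoint-attention and prediction layers, corresponding to the FT-Attention rows reported above.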
148 | -------------------------------------------------------------------------------- /data/generate_pascal3d_csv.py: -------------------------------------------------------------------------------- 1 | import os 2 | import numpy as np 3 | import scipy.io as spio 4 | 5 | from IPython import embed 6 | 7 | from util import Paths 8 | 9 | INFO_FILE_HEADER = 'imgPath,bboxTLX,bboxTLY,bboxBRX,bboxBRY,imgKeyptX,imgKeyptY,keyptClass,objClass,azimuthClass,elevationClass,rotationClass\n' 10 | 11 | synset_name_pairs = [ ('02691156', 'aeroplane'), 12 | ('02834778', 'bicycle'), 13 | ('02858304', 'boat'), 14 | ('02876657', 'bottle'), 15 | ('02924116', 'bus'), 16 | ('02958343', 'car'), 17 | ('03001627', 'chair'), 18 | ('04379243', 'diningtable'), 19 | ('03790512', 'motorbike'), 20 | ('04256520', 'sofa'), 21 | ('04468005', 'train'), 22 | ('03211117', 'tvmonitor')] 23 | 24 | KEYPOINT_TYPES = { 25 | 'aeroplane' : ['right_wing', 'tail', 'rudder_upper', 'noselanding', 26 | 'left_wing', 'rudder_lower', 'right_elevator', 'left_elevator'], 27 | 'bicycle' : ['left_front_wheel', 'left_back_wheel', 'seat_back', 28 | 'right_front_wheel', 'left_pedal_center', 'head_center', 29 | 'left_handle', 'right_pedal_center', 'right_handle', 30 | 'right_back_wheel', 'seat_front'], 31 | 'boat' : ['head', 'head_left', 'head_down', 'head_right', 32 | 'tail', 'tail_left', 'tail_right'], 33 | 'bottle' : ['body', 'body_right', 'body_left', 'bottom_right', 34 | 'bottom', 'mouth', 'bottom_left'], 35 | 'bus' : ['body_back_left_lower', 'body_back_left_upper', 'body_back_right_lower', 36 | 'body_back_right_upper', 'body_front_left_upper', 'body_front_right_upper', 37 | 'body_front_left_lower', 'body_front_right_lower', 'left_back_wheel', 38 | 'left_front_wheel', 'right_back_wheel', 'right_front_wheel'], 39 | 'car' : ['left_front_wheel', 'left_back_wheel', 'right_front_wheel', 40 | 'right_back_wheel', 'upper_left_windshield', 'upper_right_windshield', 41 | 'upper_left_rearwindow', 'upper_right_rearwindow', 'left_front_light', 42 | 'right_front_light', 'left_back_trunk', 'right_back_trunk'], 43 | 'chair' : ['seat_upper_right', 'leg_upper_left', 'seat_lower_left', 'leg_upper_right', 44 | 'back_upper_left', 'leg_lower_left', 'seat_upper_left', 'leg_lower_right', 45 | 'seat_lower_right', 'back_upper_right'], 46 | 'diningtable' : ['top_upper_right', 'top_right', 'top_left', 'leg_upper_left', 47 | 'top_lower_left', 'top_lower_right', 'top_down', 'leg_upper_right', 48 | 'top_upper_left', 'leg_lower_left', 'leg_lower_right', 'top_up'], 49 | 'motorbike' : ['back_seat', 'front_seat', 'head_center', 'headlight_center', 50 | 'left_back_wheel', 'left_front_wheel', 'left_handle_center', 51 | 'right_back_wheel', 'right_front_wheel', 'right_handle_center'], 52 | 'sofa' : ['front_bottom_right', 'top_right_corner', 'top_left_corner', 53 | 'front_bottom_left', 'seat_bottom_right', 'seat_bottom_left', 54 | 'right_bottom_back', 'seat_top_right', 'left_bottom_back', 'seat_top_left'], 55 | 'train' : ['mid2_left_top', 'head_left_top', 'head_right_bottom', 'mid2_right_bottom', 56 | 'mid1_left_bottom', 'mid1_right_top', 'tail_right_bottom', 'tail_right_top', 'tail_left_top', 57 | 'head_right_top', 'head_left_bottom', 'mid2_left_bottom', 'tail_left_bottom', 'head_top', 58 | 'mid1_left_top', 'mid2_right_top', 'mid1_right_bottom'], 59 | 'tvmonitor' : ['front_bottom_right', 'back_top_left', 'front_top_left', 'front_bottom_left', 60 | 'back_bottom_left', 'front_top_right', 'back_top_right', 'back_bottom_right'] 61 | } 62 | 63 | 64 | SYNSET_CLASSIDX_MAP = {} 65 | for i 
in range(len(synset_name_pairs)):
66 |     synset, _ = synset_name_pairs[i]
67 |     SYNSET_CLASSIDX_MAP[synset] = i
68 |
69 | KEYPOINT_CLASSES = []
70 | for synset, class_name in synset_name_pairs:
71 |     keypoint_names = KEYPOINT_TYPES[class_name]
72 |     for keypoint_name in keypoint_names:
73 |         KEYPOINT_CLASSES.append(class_name + '_' + keypoint_name)
74 |
75 | KEYPOINTCLASS_INDEX_MAP = {}
76 | for i in range(len(KEYPOINT_CLASSES)):
77 |     KEYPOINTCLASS_INDEX_MAP[KEYPOINT_CLASSES[i]] = i
78 |
79 | DATASET_SOURCES = ['pascal', 'imagenet']
80 | PASCAL3D_ROOT = Paths.pascal3d_root
81 | ANNOTATIONS_ROOT = os.path.join(PASCAL3D_ROOT, 'Annotations')
82 | IMAGES_ROOT = os.path.join(PASCAL3D_ROOT, 'Images')
83 |
84 |
85 | """
86 | Create pascal image-keypoint dataset for all classes.
87 | Code adapted from (https://github.com/rszeto/click-here-cnn)
88 | """
89 | def create_pascal_image_kp_csvs(vehicles = False):
90 |     # Generate train and test lists and store in file
91 |     BASE_DIR = os.path.dirname(os.path.abspath(__file__))
92 |
93 |     if not (os.path.exists('trainImgIds.txt') and os.path.exists('valImgIds.txt')):
94 |         matlab_cmd = 'addpath(\'%s\'); getPascalTrainVal' % BASE_DIR
95 |         print('Running MATLAB command: %s' % (matlab_cmd))
96 |         os.system('matlab -nodisplay -r "try %s; catch; end; quit;"' % matlab_cmd)
97 |
98 |     # Get training and test image IDs (text mode and dtype=str; dtype='string' is Python-2-only)
99 |     with open('trainImgIds.txt', 'r') as trainIdsFile:
100 |         trainIds = np.loadtxt(trainIdsFile, dtype=str)
101 |     with open('valImgIds.txt', 'r') as testIdsFile:
102 |         testIds = np.loadtxt(testIdsFile, dtype=str)
103 |
104 |     data_dir = os.path.join(os.path.dirname(BASE_DIR), 'data')
105 |     if not os.path.exists(data_dir):
106 |         os.makedirs(data_dir)
107 |
108 |
109 |     if vehicles:
110 |         train_csv = os.path.join(data_dir, 'veh_pascal3d_kp_train.csv')
111 |         valid_csv = os.path.join(data_dir, 'veh_pascal3d_kp_valid.csv')
112 |
113 |
114 |         synset_name_pairs = [ ('02924116', 'bus'),
115 |                               ('02958343', 'car'),
116 |                               ('03790512', 'motorbike')]
117 |
118 |     else:
119 |         train_csv = os.path.join(data_dir, 'pascal3d_kp_train.csv')
120 |         valid_csv = os.path.join(data_dir, 'pascal3d_kp_valid.csv')
121 |         synset_name_pairs = [ ('02691156', 'aeroplane'),
122 |                               ('02834778', 'bicycle'),
123 |                               ('02858304', 'boat'),
124 |                               ('02876657', 'bottle'),
125 |                               ('02924116', 'bus'),
126 |                               ('02958343', 'car'),
127 |                               ('03001627', 'chair'),
128 |                               ('04379243', 'diningtable'),
129 |                               ('03790512', 'motorbike'),
130 |                               ('04256520', 'sofa'),
131 |                               ('04468005', 'train'),
132 |                               ('03211117', 'tvmonitor')]
133 |
134 |     # Rebuild the class and keypoint maps locally for the selected subset
135 |     SYNSET_CLASSIDX_MAP = {}
136 |     for i in range(len(synset_name_pairs)):
137 |         synset, _ = synset_name_pairs[i]
138 |         SYNSET_CLASSIDX_MAP[synset] = i
139 |
140 |     KEYPOINT_CLASSES = []
141 |     for synset, class_name in synset_name_pairs:
142 |         keypoint_names = KEYPOINT_TYPES[class_name]
143 |         for keypoint_name in keypoint_names:
144 |             KEYPOINT_CLASSES.append(class_name + '_' + keypoint_name)
145 |
146 |     KEYPOINTCLASS_INDEX_MAP = {}
147 |     for i in range(len(KEYPOINT_CLASSES)):
148 |         KEYPOINTCLASS_INDEX_MAP[KEYPOINT_CLASSES[i]] = i
149 |
150 |     info_file_train = open(train_csv, 'w')
151 |     info_file_train.write(INFO_FILE_HEADER)
152 |     info_file_test = open(valid_csv, 'w')
153 |     info_file_test.write(INFO_FILE_HEADER)
154 |
155 |     for synset, class_name in synset_name_pairs:
156 |         print("Generating data for %s " % (class_name))
157 |         all_zeros = 0
158 |         counter = 0
159 |         counter_kp = 0
160 |
161 |         object_class = 
SYNSET_CLASSIDX_MAP[synset]
162 |         for dataset_source in DATASET_SOURCES:
163 |             class_source_id = '%s_%s' % (class_name, dataset_source)
164 |             for anno_file in sorted(os.listdir(os.path.join(ANNOTATIONS_ROOT, class_source_id))):
165 |                 anno_file_id = os.path.splitext(os.path.basename(anno_file))[0]
166 |                 if anno_file_id in trainIds:
167 |                     anno_file_set = 'train'
168 |                 elif anno_file_id in testIds:
169 |                     anno_file_set = 'test'
170 |                 else:
171 |                     continue
172 |
173 |                 anno = loadmat(os.path.join(ANNOTATIONS_ROOT, class_source_id, anno_file))['record']
174 |                 rel_image_path = os.path.join('Images', class_source_id, anno['filename'])
175 |
176 |                 # Make objs an array regardless of how many objects there are
177 |                 objs = np.array([anno['objects']]) if isinstance(anno['objects'], dict) else anno['objects']
178 |                 for obj_i, obj in enumerate(objs):
179 |                     # Only deal with objects in current class
180 |                     if obj['class'] == class_name:
181 |                         # Get crop using bounding box from annotation
182 |                         # Note: Annotations are in MATLAB coordinates (1-indexed), inclusive
183 |                         # Convert to 0-indexed numpy array
184 |                         bbox = np.array(obj['bbox']) - 1
185 |
186 |                         # Get visible and in-frame keypoints
187 |                         keypoints = obj['anchors']
188 |                         try:
189 |                             assert set(KEYPOINT_TYPES[class_name]) == set(keypoints.keys())
190 |                         except AssertionError:
191 |                             print("Assertion failed for keypoint types")
192 |                             embed()
193 |
194 |                         viewpoint = obj['viewpoint']
195 |                         # Skip erroneous annotations (all-zero viewpoints)
196 |                         if(viewpoint['azimuth'] == viewpoint['theta'] == viewpoint['elevation'] == 0.0):
197 |                             all_zeros += 1
198 |                         else:
199 |                             counter += 1
200 |                             azimuth = np.mod(np.round(viewpoint['azimuth']), 360)
201 |                             elevation = np.mod(np.round(viewpoint['elevation']), 360)
202 |                             tilt = np.mod(np.round(viewpoint['theta']), 360)
203 |
204 |                             for keypoint_name in KEYPOINT_TYPES[class_name]:
205 |                                 # Get 0-indexed keypoint location
206 |                                 keypoint_loc_full = keypoints[keypoint_name]['location'] - 1
207 |                                 if keypoint_loc_full.size > 0 and insideBox(keypoint_loc_full, bbox):
208 |                                     counter_kp += 1
209 |                                     # Add info for current keypoint
210 |                                     keypoint_class = KEYPOINTCLASS_INDEX_MAP[class_name + '_' + keypoint_name]
211 |                                     if vehicles:
212 |                                         # Map vehicle-subset indices back to the 12-class indices (bus=4, car=5, motorbike=8)
213 |                                         if object_class == 0:
214 |                                             final_label = ( 4, azimuth, elevation, tilt)
215 |                                         elif object_class == 1:
216 |                                             final_label = ( 5, azimuth, elevation, tilt)
217 |                                         elif object_class == 2:
218 |                                             final_label = ( 8, azimuth, elevation, tilt)
219 |                                         else:
220 |                                             raise ValueError("Error: Object classes do not match expected values!")
221 |                                     else:
222 |                                         final_label = ( object_class, azimuth, elevation, tilt)
223 |
224 |                                     keypoint_str = keypointInfo2Str(rel_image_path, bbox, keypoint_loc_full, keypoint_class, final_label)
225 |                                     if anno_file_set == 'train':
226 |                                         info_file_train.write(keypoint_str)
227 |                                     else:
228 |                                         info_file_test.write(keypoint_str)
229 |         print("%s : %d images, %d image-kp pairs, %d omitted " % (class_name, counter, counter_kp, all_zeros))
230 |
231 |     info_file_train.close()
232 |     info_file_test.close()
233 |
234 | """
235 | Create pascal image dataset (no keypoints) for all classes.
236 | Code adapted from (https://github.com/rszeto/click-here-cnn)
237 | """
238 | def create_pascal_image_csvs(easy = False):
239 |     # Generate train and test lists and store in file
240 |     BASE_DIR = os.path.dirname(os.path.abspath(__file__))
241 |
242 |     if not (os.path.exists('trainImgIds.txt') and os.path.exists('valImgIds.txt')):
243 |         matlab_cmd = 'addpath(\'%s\'); getPascalTrainVal' % BASE_DIR
244 |         print('Running MATLAB command: %s' % (matlab_cmd))
245 |         os.system('matlab -nodisplay -r "try %s; catch; end; quit;"' % matlab_cmd)
246 |
247 |     # Get training and test image IDs (text mode and dtype=str; dtype='string' is Python-2-only)
248 |     with open('trainImgIds.txt', 'r') as trainIdsFile:
249 |         trainIds = np.loadtxt(trainIdsFile, dtype=str)
250 |     with open('valImgIds.txt', 'r') as testIdsFile:
251 |         testIds = np.loadtxt(testIdsFile, dtype=str)
252 |
253 |     data_dir = os.path.join(os.path.dirname(BASE_DIR), 'data')
254 |     if not os.path.exists(data_dir):
255 |         os.makedirs(data_dir)
256 |
257 |     if easy:
258 |         train_csv = os.path.join(data_dir, 'pascal3d_train_easy.csv')
259 |         valid_csv = os.path.join(data_dir, 'pascal3d_valid_easy.csv')
260 |     else:
261 |         train_csv = os.path.join(data_dir, 'pascal3d_train.csv')
262 |         valid_csv = os.path.join(data_dir, 'pascal3d_valid.csv')
263 |
264 |
265 |     info_file_train = open(train_csv, 'w')
266 |     info_file_train.write(INFO_FILE_HEADER)
267 |     info_file_test = open(valid_csv, 'w')
268 |     info_file_test.write(INFO_FILE_HEADER)
269 |
270 |     for synset, class_name in synset_name_pairs:
271 |         print("Generating data for %s " % (class_name))
272 |         all_zeros = 0
273 |         hard_images = 0
274 |         counter = 0
275 |         object_class = SYNSET_CLASSIDX_MAP[synset]
276 |         for dataset_source in DATASET_SOURCES:
277 |             class_source_id = '%s_%s' % (class_name, dataset_source)
278 |             for anno_file in sorted(os.listdir(os.path.join(ANNOTATIONS_ROOT, class_source_id))):
279 |                 anno_file_id = os.path.splitext(os.path.basename(anno_file))[0]
280 |                 if anno_file_id in trainIds:
281 |                     anno_file_set = 'train'
282 |                 elif anno_file_id in testIds:
283 |                     anno_file_set = 'test'
284 |                 else:
285 |                     continue
286 |
287 |                 anno = loadmat(os.path.join(ANNOTATIONS_ROOT, class_source_id, anno_file))['record']
288 |                 rel_image_path = os.path.join('Images', class_source_id, anno['filename'])
289 |
290 |                 # Make objs an array regardless of how many objects there are
291 |                 objs = np.array([anno['objects']]) if isinstance(anno['objects'], dict) else anno['objects']
292 |                 for obj_i, obj in enumerate(objs):
293 |                     # Only deal with objects in current class
294 |                     if obj['class'] == class_name:
295 |                         # Get crop using bounding box from annotation
296 |                         # Note: Annotations are in MATLAB coordinates (1-indexed), inclusive
297 |                         # Convert to 0-indexed numpy array
298 |                         bbox = np.array(obj['bbox']) - 1
299 |
300 |                         viewpoint = obj['viewpoint']
301 |                         # Skip erroneous annotations (all-zero viewpoints)
302 |                         if(viewpoint['azimuth'] == viewpoint['theta'] == viewpoint['elevation'] == 0.0):
303 |                             all_zeros += 1
304 |                         elif (easy and (obj['difficult'] == 1 or obj['truncated'] == 1 or obj['occluded'] == 1 )):
305 |                             hard_images += 1
306 |                         else:
307 |                             counter += 1
308 |                             azimuth = np.mod(np.round(viewpoint['azimuth']), 360)
309 |                             elevation = np.mod(np.round(viewpoint['elevation']), 360)
310 |                             tilt = np.mod(np.round(viewpoint['theta']), 360)
311 |
312 |                             final_label = ( object_class, azimuth, elevation, tilt)
313 |                             viewpoint_str = viewpointInfo2Str(rel_image_path, bbox, final_label)
314 |                             if anno_file_set == 'train':
315 |                                 info_file_train.write(viewpoint_str)
316 |                             else:
317 |                                 info_file_test.write(viewpoint_str)
318 |         print("%s : %d images, omitted: all_zeros - %d, difficult - %d " % (class_name, counter, all_zeros, hard_images))
319 |
320 |     info_file_train.close()
321 |     info_file_test.close()
322 |
323 | ######### Importing .mat files ###############################################
324 | ######### Reference: http://stackoverflow.com/a/8832212 ######################
325 |
326 | def loadmat(filename):
327 |     '''
328 |     Drop-in replacement for spio.loadmat that properly recovers nested Python
329 |     dictionaries from .mat files by converting any remaining mat-objects
330 |     (via _check_keys).
331 |     '''
332 |     data = spio.loadmat(filename, struct_as_record=False, squeeze_me=True)
333 |     return _check_keys(data)
334 |
335 | def _check_keys(d):
336 |     '''
337 |     Checks if entries in the dictionary are mat-objects. If so, _todict is
338 |     called to change them to nested dictionaries.
339 |     '''
340 |     for key in d:
341 |         if isinstance(d[key], spio.matlab.mio5_params.mat_struct):
342 |             d[key] = _todict(d[key])
343 |     return d
344 |
345 | def _todict(matobj):
346 |     '''
347 |     A recursive function which constructs nested dictionaries from mat-objects.
348 |     '''
349 |     out = {}
350 |     for strg in matobj._fieldnames:
351 |         elem = matobj.__dict__[strg]
352 |         if isinstance(elem, spio.matlab.mio5_params.mat_struct):
353 |             out[strg] = _todict(elem)
354 |         # Handle case where elem is an array of mat_structs
355 |         elif isinstance(elem, np.ndarray) and len(elem) > 0 and \
356 |                 isinstance(elem[0], spio.matlab.mio5_params.mat_struct):
357 |             out[strg] = np.array([_todict(subelem) for subelem in elem])
358 |         else:
359 |             out[strg] = elem
360 |     return out
361 |
362 | def insideBox(point, box):
363 |     return point[0] >= box[0] and point[0] <= box[2] \
364 |        and point[1] >= box[1] and point[1] <= box[3]
365 |
366 | def keypointInfo2Str(fullImagePath, bbox, keyptLoc, keyptClass, viewptLabel):
367 |     return '%s,%d,%d,%d,%d,%f,%f,%d,%d,%d,%d,%d\n' % (
368 |         fullImagePath,
369 |         bbox[0], bbox[1], bbox[2], bbox[3],
370 |         keyptLoc[0], keyptLoc[1],
371 |         keyptClass,
372 |         viewptLabel[0], viewptLabel[1], viewptLabel[2], viewptLabel[3]
373 |     )
374 |
375 | def viewpointInfo2Str(fullImagePath, bbox, viewptLabel):
376 |     return '%s,%d,%d,%d,%d,%d,%d,%d,%d\n' % (
377 |         fullImagePath,
378 |         bbox[0], bbox[1], bbox[2], bbox[3],
379 |         viewptLabel[0], viewptLabel[1], viewptLabel[2], viewptLabel[3]
380 |     )
381 |
382 | if __name__ == '__main__':
383 |     # Generate the three CSV variants described in the README
384 |     create_pascal_image_kp_csvs(vehicles = True)   # Pascal 3D-Vehicles (with keypoints)
385 |     create_pascal_image_csvs()                     # Pascal 3D+ (full)
386 |     create_pascal_image_csvs(easy = True)          # Pascal 3D+ (easy)
387 |
--------------------------------------------------------------------------------
/data/getPascalTrainVal.m:
--------------------------------------------------------------------------------
1 | PASCAL3D_ROOT = '/z/home/mbanani/datasets/pascal3d';
2 | addpath(fullfile(PASCAL3D_ROOT, 'PASCAL', 'VOCdevkit', 'VOCcode'));
3 |
4 | % Run VOC code to extract image IDs
5 | VOCinit;
6 | trainImgIds = textread(sprintf(VOCopts.imgsetpath, 'train'), '%s');
7 | valImgIds = textread(sprintf(VOCopts.imgsetpath, 'val'), '%s');
8 |
9 | % Save IDs to file
10 | trainIdsFile = fopen('trainImgIds.txt', 'w');
11 | for i=1:numel(trainImgIds)
12 |     fprintf(trainIdsFile, '%s\n', trainImgIds{i});
13 | end
14 | valIdsFile = fopen('valImgIds.txt', 'w');
15 | for
i=1:numel(valImgIds) 16 | fprintf(valIdsFile, '%s\n', valImgIds{i}); 17 | end 18 | -------------------------------------------------------------------------------- /data/get_csv.sh: -------------------------------------------------------------------------------- 1 | wget http://www-personal.umich.edu/~mbanani/clickhere/csv/pascal3d_kp_train.csv 2 | wget http://www-personal.umich.edu/~mbanani/clickhere/csv/pascal3d_kp_valid.csv 3 | wget http://www-personal.umich.edu/~mbanani/clickhere/csv/pascal3d_train.csv 4 | wget http://www-personal.umich.edu/~mbanani/clickhere/csv/pascal3d_train_easy.csv 5 | wget http://www-personal.umich.edu/~mbanani/clickhere/csv/pascal3d_valid.csv 6 | wget http://www-personal.umich.edu/~mbanani/clickhere/csv/pascal3d_valid_easy.csv 7 | wget http://www-personal.umich.edu/~mbanani/clickhere/csv/veh_pascal3d_kp_train.csv 8 | wget http://www-personal.umich.edu/~mbanani/clickhere/csv/veh_pascal3d_kp_valid.csv 9 | -------------------------------------------------------------------------------- /datasets/__init__.py: -------------------------------------------------------------------------------- 1 | from .pascal3d import * 2 | from .pascal3d_kp import * 3 | -------------------------------------------------------------------------------- /datasets/pascal3d.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import numpy as np 3 | import time 4 | import pandas 5 | import os 6 | 7 | import numpy as np 8 | import torch.utils.data as data 9 | 10 | from PIL import Image 11 | from .vp_util import label_to_probs 12 | from torchvision import transforms 13 | import copy 14 | import random 15 | 16 | from IPython import embed 17 | 18 | class pascal3d(data.Dataset): 19 | """ 20 | Construct a Pascal Dataset. 21 | Inputs: 22 | csv_path path containing instance data 23 | augment boolean for flipping images 24 | """ 25 | def __init__(self, csv_path, dataset_root = None, im_size = 227, transform = None, just_easy = False, num_classes = 12): 26 | 27 | start_time = time.time() 28 | 29 | # Load instance data from csv-file 30 | im_paths, bbox, obj_cls, vp_labels = self.csv_to_instances(csv_path) 31 | print("csv file length: ", len(im_paths)) 32 | 33 | # dataset parameters 34 | self.root = dataset_root 35 | self.loader = self.pil_loader 36 | self.im_paths = im_paths 37 | self.bbox = bbox 38 | self.obj_cls = obj_cls 39 | self.vp_labels = vp_labels 40 | self.flip = [False] * len(im_paths) 41 | 42 | self.im_size = im_size 43 | self.num_classes = num_classes 44 | self.num_instances = len(self.im_paths) 45 | assert transform != None 46 | self.transform = transform 47 | 48 | # Set weights for loss 49 | class_hist = np.histogram(obj_cls, list(range(0, self.num_classes+1)))[0] 50 | mean_class_size = np.mean(class_hist) 51 | self.loss_weights = mean_class_size / class_hist 52 | 53 | # Print out dataset stats 54 | print("Dataset loaded in ", time.time() - start_time, " secs.") 55 | print("Dataset size: ", self.num_instances) 56 | 57 | def __getitem__(self, index): 58 | """ 59 | Args: 60 | index (int): Index 61 | Returns: 62 | tuple: (image, target) where target is class_index of the target class. 
63 | """ 64 | 65 | # Load and transform image 66 | if self.root == None: 67 | im_path = self.im_paths[index] 68 | else: 69 | im_path = os.path.join(self.root, self.im_paths[index]) 70 | 71 | bbox = self.bbox[index] 72 | obj_cls = self.obj_cls[index] 73 | view = self.vp_labels[index] 74 | flip = self.flip[index] 75 | 76 | # Transform labels 77 | azim, elev, tilt = (view + 360.) % 360. 78 | 79 | # Load and transform image 80 | img = self.loader(im_path, bbox = bbox, flip = flip) 81 | if self.transform is not None: 82 | img = self.transform(img) 83 | 84 | 85 | # construct unique key for statistics -- only need to generate imid and year 86 | _bb = str(bbox[0]) + '-' + str(bbox[1]) + '-' + str(bbox[2]) + '-' + str(bbox[3]) 87 | key_uid = self.im_paths[index] + '_' + _bb + '_objc' + str(obj_cls) + '_kpc' + str(0) 88 | 89 | return img, azim, elev, tilt, obj_cls, -1, -1, key_uid 90 | 91 | def __len__(self): 92 | return self.num_instances 93 | 94 | """ 95 | Loads images and applies the following transformations 96 | 1. convert all images to RGB 97 | 2. crop images using bbox (if provided) 98 | 3. resize using LANCZOS to rescale_size 99 | 4. convert from RGB to BGR 100 | 5. (? not done now) convert from HWC to CHW 101 | 6. (optional) flip image 102 | 103 | TODO: once this works, convert to a relative path, which will matter for 104 | synthetic data dataset class size. 105 | """ 106 | def pil_loader(self, path, bbox = None ,flip = False): 107 | # open path as file to avoid ResourceWarning 108 | # link: (https://github.com/python-pillow/Pillow/issues/835) 109 | with open(path, 'rb') as f: 110 | with Image.open(f) as img: 111 | img = img.convert('RGB') 112 | 113 | # Convert to BGR from RGB 114 | r, g, b = img.split() 115 | img = Image.merge("RGB", (b, g, r)) 116 | 117 | img = img.crop(box=bbox) 118 | 119 | # verify that imresize uses LANCZOS 120 | img = img.resize( (self.im_size, self.im_size), Image.LANCZOS) 121 | 122 | # flip image 123 | if flip: 124 | img = img.transpose(Image.FLIP_LEFT_RIGHT) 125 | 126 | return img 127 | 128 | def csv_to_instances(self, csv_path): 129 | df = pandas.read_csv(csv_path, sep=',') 130 | data = df.values 131 | 132 | data_split = np.split(data, [0, 1, 5, 6, 9], axis=1) 133 | del(data_split[0]) 134 | 135 | image_paths = np.squeeze(data_split[0]).tolist() 136 | bboxes = data_split[1].tolist() 137 | obj_class = np.squeeze(data_split[2]).tolist() 138 | viewpoints = np.array(data_split[3].tolist()) 139 | 140 | return image_paths, bboxes, obj_class, viewpoints 141 | 142 | def augment(self): 143 | self.im_paths = self.im_paths + self.im_paths 144 | self.bbox = self.bbox + self.bbox 145 | self.obj_cls = self.obj_cls + self.obj_cls 146 | self.vp_labels = self.vp_labels + self.vp_labels 147 | self.flip = self.flip + [True] * self.num_instances 148 | assert len(self.flip) == len(self.im_paths) 149 | self.num_instances = len(self.im_paths) 150 | print("Augmented dataset. 
New size: ", self.num_instances) 151 | 152 | def generate_validation(self, ratio = 0.1): 153 | assert ratio > (2.*self.num_classes/float(self.num_instances)) and ratio < 0.5 154 | 155 | random.seed(a = 2741998) 156 | 157 | valid_class = copy.deepcopy(self) 158 | 159 | valid_size = int(ratio * self.num_instances) 160 | train_size = self.num_instances - valid_size 161 | train_instances = list(range(0, self.num_instances)) 162 | valid_instances = random.sample(train_instances, valid_size) 163 | train_instances = [x for x in train_instances if x not in valid_instances] 164 | 165 | assert train_size == len(train_instances) and valid_size == len(valid_instances) 166 | 167 | valid_class.im_paths = [ self.im_paths[i] for i in sorted(valid_instances) ] 168 | valid_class.bbox = [ self.bbox[i] for i in sorted(valid_instances) ] 169 | valid_class.obj_cls = [ self.obj_cls[i] for i in sorted(valid_instances) ] 170 | valid_class.vp_labels = [ self.vp_labels[i] for i in sorted(valid_instances) ] 171 | valid_class.flip = [ self.flip[i] for i in sorted(valid_instances) ] 172 | valid_class.num_instances = valid_size 173 | 174 | self.im_paths = [ self.im_paths[i] for i in sorted(train_instances) ] 175 | self.bbox = [ self.bbox[i] for i in sorted(train_instances) ] 176 | self.obj_cls = [ self.obj_cls[i] for i in sorted(train_instances) ] 177 | self.vp_labels = [ self.vp_labels[i] for i in sorted(train_instances) ] 178 | self.flip = [ self.flip[i] for i in sorted(train_instances) ] 179 | self.num_instances = train_size 180 | 181 | return valid_class 182 | -------------------------------------------------------------------------------- /datasets/pascal3d_kp.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import time 3 | import copy 4 | import random 5 | import pandas 6 | import os 7 | 8 | import numpy as np 9 | 10 | from PIL import Image 11 | from .vp_util import label_to_probs 12 | from torchvision import transforms 13 | from IPython import embed 14 | 15 | 16 | class pascal3d_kp(torch.utils.data.Dataset): 17 | 18 | """ 19 | Construct a Pascal Dataset. 20 | Inputs: 21 | csv_path path containing instance data 22 | augment boolean for flipping images 23 | """ 24 | def __init__(self, csv_path, dataset_root = None, im_size = 227, transform = None, map_size = 46, num_classes = 12, flip = False): 25 | 26 | assert transform != None 27 | 28 | start_time = time.time() 29 | 30 | # Load instance data from csv-file 31 | im_paths, bbox, kp_loc, kp_cls, obj_cls, vp_labels = self.csv_to_instances(csv_path) 32 | csv_length = len(im_paths) 33 | 34 | # dataset parameters 35 | self.root = dataset_root 36 | self.loader = self.pil_loader 37 | self.im_paths = im_paths 38 | self.bbox = bbox 39 | self.kp_loc = kp_loc 40 | self.kp_cls = kp_cls 41 | self.obj_cls = obj_cls 42 | self.vp_labels = vp_labels 43 | self.img_size = im_size 44 | self.map_size = map_size 45 | self.num_classes = num_classes 46 | self.num_instances = len(self.im_paths) 47 | self.transform = transform 48 | 49 | # Print out dataset stats 50 | print("================================") 51 | print("Pascal3D (w/ Keypoints) Stats: ") 52 | print("CSV file length : ", len(im_paths)) 53 | print("Dataset size : ", self.num_instances) 54 | print("Loading time (s) : ", time.time() - start_time) 55 | 56 | 57 | """ 58 | __getitem__ method: 59 | Args: 60 | index (int): Index 61 | Returns: 62 | tuple: (image, target) where target is class_index of the target class. 
63 |     """
64 |     def __getitem__(self, index):
65 |
66 |         # Load and transform image
67 |         if self.root == None:
68 |             im_path = self.im_paths[index]
69 |         else:
70 |             im_path = os.path.join(self.root, self.im_paths[index])
71 |
72 |         bbox = list(self.bbox[index])
73 |         kp_loc = list(self.kp_loc[index])
74 |         kp_cls = self.kp_cls[index]
75 |         obj_cls = self.obj_cls[index]
76 |
77 |         view = self.vp_labels[index]
78 |
79 |         # Transform labels
80 |         azim, elev, tilt = (view + 360.) % 360.
81 |
82 |         # Load and transform image
83 |         img, kp_loc = self.loader(im_path, bbox, kp_loc)
84 |         img = self.transform(img)
85 |
86 |         # Generate keypoint map image, and kp class vector
87 |         kpc_vec = np.zeros( (34) )
88 |         kpc_vec[kp_cls] = 1
89 |         kp_class = torch.from_numpy(kpc_vec).float()
90 |
91 |         kpm_map = self.generate_kp_map_chebyshev(kp_loc)
92 |         kp_map = torch.from_numpy(kpm_map).float()
93 |
94 |         # construct unique key for statistics -- only need to generate imid and year
95 |         _bb = str(bbox[0]) + '-' + str(bbox[1]) + '-' + str(bbox[2]) + '-' + str(bbox[3])
96 |         key_uid = self.im_paths[index] + '_' + _bb + '_objc' + str(obj_cls) + '_kpc' + str(kp_cls)
97 |
98 |         return img, azim, elev, tilt, obj_cls, kp_map, kp_class, key_uid
99 |
100 |     """
101 |     Returns the length of the dataset
102 |     """
103 |     def __len__(self):
104 |         return self.num_instances
105 |
106 |     """
107 |     Image loader
108 |     Inputs:
109 |         path      absolute image path
110 |         bbox      4-element tuple (x_min, y_min, x_max, y_max)
111 |         kp_loc    2-element tuple (x_loc, y_loc)
112 |
113 |     """
114 |     def pil_loader(self, path, bbox, kp_loc):
115 |         # open path as file to avoid ResourceWarning (https://github.com/python-pillow/Pillow/issues/835)
116 |         with open(path, 'rb') as f:
117 |             with Image.open(f) as img:
118 |                 # Calculate keypoint position relative to the bounding box
119 |                 kp_loc[0] = float(kp_loc[0]-bbox[0])/float(bbox[2]-bbox[0])
120 |                 kp_loc[1] = float(kp_loc[1]-bbox[1])/float(bbox[3]-bbox[1])
121 |
122 |                 # Convert to RGB, crop, and resize
123 |                 img = img.convert('RGB')
124 |
125 |                 # Convert to BGR from RGB (the converted Caffe weights expect BGR input)
126 |                 r, g, b = img.split()
127 |                 img = Image.merge("RGB", (b, g, r))
128 |
129 |                 img = img.crop(box=bbox)
130 |                 img = img.resize( (self.img_size, self.img_size), Image.LANCZOS)
131 |
132 |         return img, kp_loc
133 |
134 |     """
135 |     Convert CSV file to instances
136 |     """
137 |     def csv_to_instances(self, csv_path):
138 |         # imgPath,bboxTLX,bboxTLY,bboxBRX,bboxBRY,imgKeyptX,imgKeyptY,keyptClass,objClass,azimuthClass,elevationClass,rotationClass
139 |         # /z/.../datasets/pascal3d/Images/bus_pascal/2008_000032.jpg,5,117,488,273,9.186347,158.402214,1,4,1756,1799,1443
140 |
141 |         df = pandas.read_csv(csv_path, sep=',')
142 |         data = df.values
143 |
144 |         data_split = np.split(data, [0, 1, 5, 7, 8, 9, 12], axis=1)
145 |         del(data_split[0])
146 |
147 |         image_paths = np.squeeze(data_split[0]).tolist()
148 |
149 |         # if self.root != None:
150 |         #     image_paths = [path.split('pascal3d/')[1] for path in image_paths]
151 |
152 |         bboxes = data_split[1].tolist()
153 |         kp_loc = data_split[2].tolist()
154 |         kp_class = np.squeeze(data_split[3]).tolist()
155 |         obj_class = np.squeeze(data_split[4]).tolist()
156 |         viewpoints = np.array(data_split[5].tolist())
157 |
158 |         return image_paths, bboxes, kp_loc, kp_class, obj_class, viewpoints
159 |
160 |
161 |     """
162 |     Generate a Chebyshev-distance-based map given a keypoint location
163 |     """
164 |     def generate_kp_map_chebyshev(self, kp):
165 |
166 |         assert kp[0] >= 0. and kp[0] <= 1., kp
167 |         assert kp[1] >= 0. 
and kp[1] <= 1., kp 168 | kp_map = np.ndarray( (self.map_size, self.map_size) ) 169 | 170 | 171 | kp[0] = kp[0] * self.map_size 172 | kp[1] = kp[1] * self.map_size 173 | 174 | for i in range(0, self.map_size): 175 | for j in range(0, self.map_size): 176 | kp_map[i,j] = max( np.abs(i - kp[0]), np.abs(j - kp[1])) 177 | 178 | # Normalize by dividing by the maximum possible value, which is self.IMG_SIZE -1 179 | kp_map = kp_map / (1. * self.map_size) 180 | # kp_map = -2. * (kp_map - 0.5) 181 | 182 | return kp_map 183 | 184 | -------------------------------------------------------------------------------- /datasets/vp_util.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import os 3 | import sys 4 | 5 | def label_to_probs(view_angles, flip): 6 | # extract angles 7 | azim = view_angles[0] % 360 8 | elev = view_angles[1] % 360 9 | tilt = view_angles[2] % 360 10 | 11 | if flip: 12 | azim = (360-azim) % 360 13 | tilt = (-1 *tilt) % 360 14 | 15 | -------------------------------------------------------------------------------- /model_weights/get_weights.sh: -------------------------------------------------------------------------------- 1 | wget http://www-personal.umich.edu/~mbanani/clickhere/weights/r4cnn.pkl 2 | wget http://www-personal.umich.edu/~mbanani/clickhere/weights/ch_cnn.npy 3 | -------------------------------------------------------------------------------- /models/__init__.py: -------------------------------------------------------------------------------- 1 | from .render4cnn import * 2 | from .clickhere_cnn import * 3 | -------------------------------------------------------------------------------- /models/clickhere_cnn.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import torch.nn as nn 3 | import torch.nn.functional as F 4 | import numpy as np 5 | from IPython import embed 6 | 7 | class clickhere_cnn(nn.Module): 8 | def __init__(self, renderCNN, weights_path = None, num_classes = 12): 9 | super(clickhere_cnn, self).__init__() 10 | 11 | # Image Stream 12 | self.conv4 = renderCNN.conv4 13 | self.conv5 = renderCNN.conv5 14 | 15 | self.infer = nn.Sequential( 16 | nn.Linear(9216,4096), 17 | nn.ReLU(), 18 | nn.Dropout(0.5), 19 | nn.Linear(4096,4096), 20 | nn.ReLU(), 21 | nn.Dropout(0.5)) 22 | 23 | #Keypoint Stream 24 | self.kp_map = nn.Linear(2116,2116) 25 | self.kp_class = nn.Linear(34,34) 26 | self.kp_fuse = nn.Linear(2150,169) 27 | self.pool_map = nn.MaxPool2d( (5,5), (5,5), (1,1), ceil_mode=True) 28 | 29 | # Fused layer 30 | self.fusion = nn.Sequential(nn.Linear(4096 + 384, 4096), nn.ReLU(), nn.Dropout(0.5)) 31 | 32 | # Prediction layers 33 | self.azim = nn.Linear(4096, 12 * 360) 34 | self.elev = nn.Linear(4096, 12 * 360) 35 | self.tilt = nn.Linear(4096, 12 * 360) 36 | 37 | if weights_path is not None: 38 | self.init_weights(weights_path) 39 | 40 | 41 | def init_weights(self, weights_path): 42 | npy_dict = np.load(weights_path, allow_pickle = True, encoding = 'latin1').item() 43 | 44 | state_dict = npy_dict 45 | # Convert parameters to torch tensors 46 | for key in list(npy_dict.keys()): 47 | state_dict[key]['weight'] = torch.from_numpy(npy_dict[key]['weight']) 48 | state_dict[key]['bias'] = torch.from_numpy(npy_dict[key]['bias']) 49 | 50 | self.conv4[0].weight.data.copy_(state_dict['conv1']['weight']) 51 | self.conv4[0].bias.data.copy_(state_dict['conv1']['bias']) 52 | self.conv4[4].weight.data.copy_(state_dict['conv2']['weight']) 53 | 
self.conv4[4].bias.data.copy_(state_dict['conv2']['bias']) 54 | self.conv4[8].weight.data.copy_(state_dict['conv3']['weight']) 55 | self.conv4[8].bias.data.copy_(state_dict['conv3']['bias']) 56 | self.conv4[10].weight.data.copy_(state_dict['conv4']['weight']) 57 | self.conv4[10].bias.data.copy_(state_dict['conv4']['bias']) 58 | self.conv5[0].weight.data.copy_(state_dict['conv5']['weight']) 59 | self.conv5[0].bias.data.copy_(state_dict['conv5']['bias']) 60 | 61 | self.infer[0].weight.data.copy_(state_dict['fc6']['weight']) 62 | self.infer[0].bias.data.copy_(state_dict['fc6']['bias']) 63 | self.infer[3].weight.data.copy_(state_dict['fc7']['weight']) 64 | self.infer[3].bias.data.copy_(state_dict['fc7']['bias']) 65 | self.fusion[0].weight.data.copy_(state_dict['fc8']['weight']) 66 | self.fusion[0].bias.data.copy_(state_dict['fc8']['bias']) 67 | 68 | self.kp_map.weight.data.copy_(state_dict['fc-keypoint-map']['weight']) 69 | self.kp_map.bias.data.copy_(state_dict['fc-keypoint-map']['bias']) 70 | self.kp_class.weight.data.copy_(state_dict['fc-keypoint-class']['weight']) 71 | self.kp_class.bias.data.copy_(state_dict['fc-keypoint-class']['bias']) 72 | self.kp_fuse.weight.data.copy_(state_dict['fc-keypoint-concat']['weight']) 73 | self.kp_fuse.bias.data.copy_(state_dict['fc-keypoint-concat']['bias']) 74 | 75 | self.azim.weight.data.copy_( state_dict['pred_azimuth' ]['weight'] ) 76 | self.elev.weight.data.copy_( state_dict['pred_elevation']['weight'] ) 77 | self.tilt.weight.data.copy_( state_dict['pred_tilt' ]['weight'] ) 78 | 79 | self.azim.bias.data.copy_( state_dict['pred_azimuth' ]['bias'] ) 80 | self.elev.bias.data.copy_( state_dict['pred_elevation']['bias'] ) 81 | self.tilt.bias.data.copy_( state_dict['pred_tilt' ]['bias'] ) 82 | 83 | 84 | def forward(self, images, kp_map, kp_cls, obj_class): 85 | # Image Stream 86 | conv4 = self.conv4(images) 87 | im_stream = self.conv5(conv4) 88 | im_stream = im_stream.view(im_stream.size(0), -1) 89 | im_stream = self.infer(im_stream) 90 | 91 | # Keypoint Stream 92 | kp_map = kp_map.view(kp_map.size(0), -1) 93 | kp_map = self.kp_map(kp_map) 94 | kp_cls = self.kp_class(kp_cls) 95 | 96 | # Concatenate the two keypoint feature vectors 97 | kp_stream = torch.cat([kp_map, kp_cls], dim = 1) 98 | 99 | # Softmax followed by reshaping into a 13x13 100 | # Conv4 as shape batch * 384 * 13 * 13 101 | kp_stream = F.softmax(self.kp_fuse(kp_stream), dim=1) 102 | kp_stream = kp_stream.view(kp_stream.size(0) ,1, 13, 13) 103 | 104 | # Attention -> Elt. wise product, then summation over x and y dims 105 | kp_stream = kp_stream * conv4 # CHECK IF THIS DOES WHAT I THINK IT DOES!! 
TODO
106 |         kp_stream = kp_stream.sum(3).sum(2)
107 |
108 |         # Concatenate fc7 and attended features
109 |         fused_embed = torch.cat([im_stream, kp_stream], dim = 1)
110 |         fused_embed = self.fusion(fused_embed)
111 |
112 |         # Final inference: per-class predictions, masked by the object class
113 |         azim = self.azim(fused_embed)
114 |         azim = azim.view(-1, 12, 360)
115 |         azim = azim[torch.arange(fused_embed.shape[0]), obj_class, :]
116 |         elev = self.elev(fused_embed)
117 |         elev = elev.view(-1, 12, 360)
118 |         elev = elev[torch.arange(fused_embed.shape[0]), obj_class, :]
119 |         tilt = self.tilt(fused_embed)
120 |         tilt = tilt.view(-1, 12, 360)
121 |         tilt = tilt[torch.arange(fused_embed.shape[0]), obj_class, :]
122 |
123 |         # Return (azim, elev, tilt) to match render4cnn and the unpacking in train.py
124 |         return azim, elev, tilt
125 |
--------------------------------------------------------------------------------
/models/render4cnn.py:
--------------------------------------------------------------------------------
1 | import torch
2 | import torch.nn as nn
3 | from IPython import embed
4 |
5 | class render4cnn(nn.Module):
6 |     def __init__(self, weights_path = None):
7 |         super(render4cnn, self).__init__()
8 |
9 |         # define model
10 |         self.conv4 = nn.Sequential(
11 |             nn.Conv2d(3, 96, (11, 11), (4,4)),
12 |             nn.ReLU(),
13 |             nn.MaxPool2d( (3,3), (2,2), (0,0), ceil_mode=True),
14 |             nn.LocalResponseNorm(5, alpha=0.0001, beta=0.75, k=1.),
15 |             nn.Conv2d(96, 256, (5, 5), (1,1), (2,2), 1,2),
16 |             nn.ReLU(),
17 |             nn.MaxPool2d( (3,3), (2,2), (0,0), ceil_mode=True),
18 |             nn.LocalResponseNorm(5, alpha=0.0001, beta=0.75, k=1.),
19 |             nn.Conv2d(256,384,(3, 3),(1, 1),(1, 1)),
20 |             nn.ReLU(),
21 |             nn.Conv2d(384,384,(3, 3),(1, 1),(1, 1),1,2),
22 |             nn.ReLU(),
23 |         )
24 |
25 |         self.conv5 = nn.Sequential(
26 |             nn.Conv2d(384,256,(3, 3),(1, 1),(1, 1),1,2),
27 |             nn.ReLU(),
28 |             nn.MaxPool2d((3, 3),(2, 2),(0, 0),ceil_mode=True),
29 |         )
30 |
31 |         self.infer = nn.Sequential(
32 |             nn.Linear(9216,4096),
33 |             nn.ReLU(),
34 |             nn.Dropout(0.5),
35 |             nn.Linear(4096,4096),
36 |             nn.ReLU(),
37 |             nn.Dropout(0.5),
38 |         )
39 |
40 |         self.azim = nn.Linear(4096, 12 * 360)
41 |         self.elev = nn.Linear(4096, 12 * 360)
42 |         self.tilt = nn.Linear(4096, 12 * 360)
43 |
44 |         if weights_path is not None:
45 |             self._initialize_weights(weights_path)
46 |
47 |     # weight initialization from the converted checkpoint
48 |     def _initialize_weights(self, weights_path):
49 |         state_dict = torch.load(weights_path)['model_state_dict']
50 |
51 |         layers = [0, 4, 8, 10]
52 |         for l in layers:
53 |             self.conv4[l].weight.data.copy_( state_dict['conv4.'+str(l) + '.weight'])
54 |             self.conv4[l].bias.data.copy_(   state_dict['conv4.'+str(l) + '.bias'])
55 |
56 |         self.conv5[0].weight.data.copy_( state_dict['conv5.0.weight'])
57 |         self.conv5[0].bias.data.copy_(   state_dict['conv5.0.bias'])
58 |
59 |         self.infer[0].weight.data.copy_( state_dict['infer.0.weight'])
60 |         self.infer[0].bias.data.copy_(   state_dict['infer.0.bias'])
61 |         self.infer[3].weight.data.copy_( state_dict['infer.3.weight'])
62 |         self.infer[3].bias.data.copy_(   state_dict['infer.3.bias'])
63 |
64 |         self.azim.weight.data.copy_( state_dict['azim.0.weight'])
65 |         self.azim.bias.data.copy_(   state_dict['azim.0.bias'])
66 |         self.elev.weight.data.copy_( state_dict['elev.0.weight'])
67 |         self.elev.bias.data.copy_(   state_dict['elev.0.bias'])
68 |         self.tilt.weight.data.copy_( state_dict['tilt.0.weight'])
69 |         self.tilt.bias.data.copy_(   state_dict['tilt.0.bias'])
70 |
71 |
72 |     def forward(self, x, obj_class):
73 |         # generate output
74 |         x = self.conv4(x)
75 |         x = self.conv5(x)
76 |         x = x.view(x.shape[0], -1)
77 |         x = self.infer(x)
78 |
79 |         # mask on class
80 |         azim = self.azim(x)
81 |         azim = azim.view(-1, 12, 360)
82 |         azim = azim[torch.arange(x.shape[0]), obj_class, :]
83 |         elev = self.elev(x)
84 |         elev = elev.view(-1, 12, 360)
85 |         elev = elev[torch.arange(x.shape[0]), obj_class, :]
86 |         tilt = self.tilt(x)
87 |         tilt = tilt.view(-1, 12, 360)
88 |         tilt = tilt[torch.arange(x.shape[0]), obj_class, :]
89 |
90 |         return azim, elev, tilt
91 |
92 |
93 |
--------------------------------------------------------------------------------
/train.py:
--------------------------------------------------------------------------------
1 | import argparse, os, sys, shutil, time
2 |
3 | import numpy as np
4 | from IPython import embed
5 |
6 | import torch
7 |
8 | from util import SoftmaxVPLoss, Paths, get_data_loaders, kp_dict
9 | from models import clickhere_cnn, render4cnn
10 | from util.torch_utils import save_checkpoint
11 |
12 | def main(args):
13 |     initialization_time = time.time()
14 |
15 |
16 |     print("############# Read in Database ##############")
17 |     train_loader, valid_loader = get_data_loaders( dataset = args.dataset,
18 |                                                    batch_size = args.batch_size,
19 |                                                    num_workers = args.num_workers,
20 |                                                    model = args.model)
21 |
22 |     print("############# Initiate Model ##############")
23 |     if args.model == 'chcnn':
24 |         assert Paths.clickhere_weights != None, "Error: Set clickhere weights path in util/Paths.py."
25 |         model = clickhere_cnn(render4cnn(), weights_path = Paths.clickhere_weights)
26 |         args.no_keypoint = False
27 |     elif args.model == 'r4cnn':
28 |         assert Paths.render4cnn_weights != None, "Error: Set render4cnn weights path in util/Paths.py."
29 |         model = render4cnn(weights_path = Paths.render4cnn_weights)
30 |         args.no_keypoint = True
31 |     else:
32 |         assert False, "Error: unknown model choice."
33 |
34 |     # Loss functions
35 |     criterion = SoftmaxVPLoss()
36 |
37 |     # Parameters to train (attention-only fine-tuning trains just the keypoint and prediction layers)
38 |     if args.just_attention and (not args.no_keypoint):
39 |         params = list(model.kp_map.parameters()) + list(model.kp_class.parameters())
40 |         params = params + list(model.kp_fuse.parameters()) + list(model.fusion.parameters())
41 |         params = params + list(model.azim.parameters()) + list(model.elev.parameters())
42 |         params = params + list(model.tilt.parameters())
43 |     else:
44 |         params = list(model.parameters())
45 |
46 |     # Optimizer
47 |     optimizer = torch.optim.Adam(params, lr = args.lr)
48 |
49 |     # train/evaluate on GPU
50 |     model.cuda()
51 |
52 |     print("Time to initialize: ", time.time() - initialization_time)
53 |     print("############# Start Training ##############")
54 |     total_step = len(train_loader)
55 |
56 |     for epoch in range(0, args.num_epochs):
57 |
58 |         if epoch % args.eval_epoch == 0:
59 |             eval_step( model = model,
60 |                        data_loader = valid_loader,
61 |                        criterion = criterion,
62 |                        step = epoch * total_step,
63 |                        datasplit = "valid")
64 |
65 |         train_step( model = model,
66 |                     train_loader = train_loader,
67 |                     criterion = criterion,
68 |                     optimizer = optimizer,
69 |                     epoch = epoch,
70 |                     step = epoch * total_step)
71 |
72 |
73 | def train_step(model, train_loader, criterion, optimizer, epoch, step):
74 |     model.train()
75 |     total_step = len(train_loader)
76 |     loss_sum = 0. 
77 | 78 | for i, (images, azim_label, elev_label, tilt_label, obj_class, kp_map, kp_class, key_uid) in enumerate(train_loader): 79 | 80 | # Set mini-batch dataset 81 | images = images.cuda() 82 | azim_label = azim_label.cuda() 83 | elev_label = elev_label.cuda() 84 | tilt_label = tilt_label.cuda() 85 | obj_class = obj_class.cuda() 86 | 87 | # Forward, Backward and Optimize 88 | model.zero_grad() 89 | 90 | if args.no_keypoint: 91 | azim, elev, tilt = model(images, obj_class) 92 | else: 93 | kp_map = kp_map.cuda() 94 | kp_class = kp_class.cuda() 95 | azim, elev, tilt = model(images, kp_map, kp_class, obj_class) 96 | 97 | loss_a = criterion(azim, azim_label) 98 | loss_e = criterion(elev, elev_label) 99 | loss_t = criterion(tilt, tilt_label) 100 | loss = loss_a + loss_e + loss_t 101 | 102 | loss.backward() 103 | optimizer.step() 104 | 105 | loss_sum += loss.item() 106 | # Print log info 107 | if i % args.log_rate == 0 and i > 0: 108 | print("Epoch [%d/%d] Step [%d/%d]: Training Loss = %2.5f" %( epoch, args.num_epochs, i, total_step, loss_sum / (i + 1))) 109 | 110 | 111 | def eval_step( model, data_loader, criterion, step, datasplit): 112 | model.eval() 113 | 114 | total_step = len(data_loader) 115 | epoch_loss_a = 0. 116 | epoch_loss_e = 0. 117 | epoch_loss_t = 0. 118 | epoch_loss = 0. 119 | results_dict = kp_dict() 120 | 121 | for i, (images, azim_label, elev_label, tilt_label, obj_class, kp_map, kp_class, key_uid) in enumerate(data_loader): 122 | 123 | if i % args.log_rate == 0: 124 | print("Evaluation of %s [%d/%d] " % (datasplit, i, total_step)) 125 | 126 | # Set mini-batch dataset 127 | images = images.cuda() 128 | azim_label = azim_label.cuda() 129 | elev_label = elev_label.cuda() 130 | tilt_label = tilt_label.cuda() 131 | obj_class = obj_class.cuda() 132 | 133 | if args.no_keypoint: 134 | azim, elev, tilt = model(images, obj_class) 135 | else: 136 | kp_map = kp_map.cuda() 137 | kp_class = kp_class.cuda() 138 | azim, elev, tilt = model(images, kp_map, kp_class, obj_class) 139 | 140 | # embed() 141 | epoch_loss_a += criterion(azim, azim_label).item() 142 | epoch_loss_e += criterion(elev, elev_label).item() 143 | epoch_loss_t += criterion(tilt, tilt_label).item() 144 | 145 | results_dict.update_dict( key_uid, 146 | [azim.data.cpu().numpy(), elev.data.cpu().numpy(), tilt.data.cpu().numpy()], 147 | [azim_label.data.cpu().numpy(), elev_label.data.cpu().numpy(), tilt_label.data.cpu().numpy()]) 148 | 149 | 150 | type_accuracy, type_total, type_geo_dist = results_dict.metrics() 151 | 152 | geo_dist_median = [np.median(type_dist) * 180. / np.pi for type_dist in type_geo_dist if type_dist != [] ] 153 | type_accuracy = [ type_accuracy[i] * 100. for i in range(0, len(type_accuracy)) if type_total[i] > 0] 154 | w_acc = np.mean(type_accuracy) 155 | 156 | print("++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++") 157 | print("Type Acc_pi/6 : ", type_accuracy, " -> ", w_acc, " %") 158 | print("Type Median : ", [ int(1000 * a_type_med) / 1000. 
159 |     print("Type Loss     : ", [epoch_loss_a / total_step, epoch_loss_e / total_step, epoch_loss_t / total_step], " -> ", (epoch_loss_a + epoch_loss_e + epoch_loss_t) / total_step)
160 |     print("++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++")
161 | 
162 | 
163 | 
164 | if __name__ == '__main__':
165 | 
166 |     parser = argparse.ArgumentParser()
167 | 
168 |     # logging parameters
169 |     parser.add_argument('--eval_epoch',  type=int, default=5)
170 |     parser.add_argument('--log_rate',    type=int, default=10)
171 |     parser.add_argument('--num_workers', type=int, default=7)
172 | 
173 |     # training parameters
174 |     parser.add_argument('--num_epochs', type=int,   default=100)
175 |     parser.add_argument('--batch_size', type=int,   default=64)
176 |     parser.add_argument('--lr',         type=float, default=3e-4)
177 |     parser.add_argument('--optimizer',  type=str,   default='adam')   # currently unused; Adam is always used
178 | 
179 |     # experiment details
180 |     parser.add_argument('--dataset',         type=str, default='pascal')
181 |     parser.add_argument('--model',           type=str, default='chcnn', choices=['chcnn', 'r4cnn'])
182 |     parser.add_argument('--experiment_name', type=str, default='Test')
183 |     parser.add_argument('--just_attention',  action="store_true", default=False)
184 | 
185 | 
186 |     args = parser.parse_args()
187 |     main(args)
188 | 
--------------------------------------------------------------------------------
/util/Paths.py:
--------------------------------------------------------------------------------
1 | import os
2 | 
3 | # Set this to the root of your local Pascal 3D+ (release 1.1) directory.
4 | pascal3d_root = '/home/mbanani/datasets/pascal3d'
5 | 
6 | root_dir = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
7 | render4cnn_weights    = os.path.join(root_dir, 'model_weights/r4cnn.pkl')
8 | ft_render4cnn_weights = os.path.join(root_dir, 'model_weights/ryan_render.npy')
9 | clickhere_weights     = os.path.join(root_dir, 'model_weights/ch_cnn.npy')
10 | 
--------------------------------------------------------------------------------
/util/__init__.py:
--------------------------------------------------------------------------------
1 | from .vp_loss import SoftmaxVPLoss
2 | from .metrics import kp_dict
3 | from .load_datasets import *
4 | from . import Paths
5 | 
--------------------------------------------------------------------------------
/util/load_datasets.py:
--------------------------------------------------------------------------------
1 | import os
2 | import torch
3 | import numpy as np
4 | import torchvision.transforms as transforms
5 | from datasets import pascal3d, pascal3d_kp
6 | 
7 | 
8 | from .Paths import *
9 | 
10 | root_dir     = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
11 | dataset_root = pascal3d_root
12 | 
13 | 
14 | def get_data_loaders(dataset, batch_size, num_workers, model, num_classes = 12):
15 | 
16 |     image_size = 227
17 |     # ToTensor() scales pixels to [0, 1]; the two Normalize calls rescale them
18 |     # back to [0, 255] and subtract the Caffe-style mean pixel values used by
19 |     # the original Render For CNN models.
20 |     train_transform = transforms.Compose([transforms.ToTensor(),
21 |                                           transforms.Normalize(mean=(0., 0., 0.),
22 |                                                                std=(1./255., 1./255., 1./255.)),
23 |                                           transforms.Normalize(mean=(104., 116.668, 122.678),
24 |                                                                std=(1., 1., 1.))
25 |                                           ])
26 | 
27 |     test_transform = transforms.Compose([transforms.ToTensor(),
28 |                                          transforms.Normalize(mean=(0., 0., 0.),
29 |                                                               std=(1./255., 1./255., 1./255.)),
30 |                                          transforms.Normalize(mean=(104., 116.668, 122.678),
31 |                                                               std=(1., 1., 1.))
32 |                                          ])
33 | 
34 | 
35 |     # # The New transform for ImageNet Stuff
36 |     # new_transform = transforms.Compose([
37 |     #             transforms.ToTensor(),
38 |     #             transforms.Normalize(mean=(0.485, 0.456, 0.406),
39 |     #                                  std=(0.229, 0.224, 0.225))])
40 | 
41 | 
42 |     if dataset == "pascal":
43 |         csv_train = os.path.join(root_dir, 'data/pascal3d_train.csv')
44 |         csv_test  = os.path.join(root_dir, 'data/pascal3d_valid.csv')
45 | 
46 |         train_set = pascal3d(csv_train, dataset_root = dataset_root, transform = train_transform, im_size = image_size)
47 |         test_set  = pascal3d(csv_test,  dataset_root = dataset_root, transform = test_transform,  im_size = image_size)
48 |     elif dataset == "pascalEasy":
49 |         csv_train = os.path.join(root_dir, 'data/pascal3d_train_easy.csv')
50 |         csv_test  = os.path.join(root_dir, 'data/pascal3d_valid_easy.csv')
51 | 
52 |         train_set = pascal3d( csv_train, dataset_root = dataset_root,
53 |                               transform = train_transform, im_size = image_size)
54 |         test_set  = pascal3d( csv_test, dataset_root = dataset_root,
55 |                               transform = test_transform, im_size = image_size)
56 | 
57 |     elif dataset == "pascalVehKP":
58 |         csv_train = os.path.join(root_dir, 'data/veh_pascal3d_kp_train.csv')
59 |         csv_test  = os.path.join(root_dir, 'data/veh_pascal3d_kp_valid.csv')
60 | 
61 |         train_set = pascal3d_kp(csv_train,
62 |                                 dataset_root = dataset_root,
63 |                                 transform    = train_transform,
64 |                                 im_size      = image_size,
65 |                                 num_classes  = num_classes)
66 | 
67 |         test_set = pascal3d_kp(csv_test,
68 |                                dataset_root = dataset_root,
69 |                                transform    = test_transform,
70 |                                im_size      = image_size,
71 |                                num_classes  = num_classes)
72 | 
73 |     elif dataset == "pascalKP":
74 |         csv_train = os.path.join(root_dir, 'data/pascal3d_kp_train.csv')
75 |         csv_test  = os.path.join(root_dir, 'data/pascal3d_kp_valid.csv')
76 | 
77 |         train_set = pascal3d_kp(csv_train, dataset_root = dataset_root, transform = train_transform, im_size = image_size)
78 |         test_set  = pascal3d_kp(csv_test,  dataset_root = dataset_root, transform = test_transform,  im_size = image_size)
79 |     else:
80 |         raise ValueError("Error in load_datasets: Dataset name not defined.")
81 | 
82 | 
83 | 
84 |     # Generate data loaders
85 |     train_loader = torch.utils.data.DataLoader( dataset     = train_set,
86 |                                                 batch_size  = batch_size,
87 |                                                 shuffle     = True,
88 |                                                 num_workers = num_workers,
89 |                                                 drop_last   = True)
90 | 
91 |     test_loader = torch.utils.data.DataLoader( dataset     = test_set,
92 |                                                batch_size  = batch_size,
93 |                                                shuffle     = False,
94 |                                                num_workers = num_workers,
95 |                                                drop_last   = False)
96 | 
97 |     return train_loader, test_loader
98 | 
--------------------------------------------------------------------------------
/util/metrics.py:
--------------------------------------------------------------------------------
1 | import numpy as np
2 | 
3 | from scipy import linalg as linAlg
4 | 
5 | 
6 | def compute_angle_dists(preds, labels):
7 |     # Get rotation matrices from prediction and ground truth angles
8 |     predR = angle2dcm(preds[0],  preds[1],  preds[2])
9 |     gtR   = angle2dcm(labels[0], labels[1], labels[2])
10 | 
11 |     # Geodesic distance on SO(3): the Frobenius norm of log(R1^T R2) equals
12 |     # sqrt(2) times the rotation angle (radians) between the two orientations.
13 |     return linAlg.norm(linAlg.logm(np.dot(predR.T, gtR)), 'fro') / np.sqrt(2)
14 | 
15 | def angle2dcm(xRot, yRot, zRot, deg_type='deg'):
16 |     if deg_type == 'deg':
17 |         xRot = xRot * np.pi / 180.0
18 |         yRot = yRot * np.pi / 180.0
19 |         zRot = zRot * np.pi / 180.0
20 | 
21 |     xMat = np.array([
22 |         [np.cos(xRot),  np.sin(xRot), 0],
23 |         [-np.sin(xRot), np.cos(xRot), 0],
24 |         [0, 0, 1]
25 |     ])
26 | 
27 |     yMat = np.array([
28 |         [np.cos(yRot), 0, -np.sin(yRot)],
29 |         [0, 1, 0],
30 |         [np.sin(yRot), 0, np.cos(yRot)]
31 |     ])
32 | 
33 |     zMat = np.array([
34 |         [1, 0, 0],
35 |         [0, np.cos(zRot),  np.sin(zRot)],
36 |         [0, -np.sin(zRot), np.cos(zRot)]
37 |     ])
38 | 
39 |     return np.dot(zMat, np.dot(yMat, xMat))
40 | 
41 | 
42 | class kp_dict(object):
43 | 
44 |     def __init__(self, num_classes = 12):
45 |         self.keypoint_dict = dict()
46 |         self.num_classes   = num_classes
47 |         self.class_ranges  = list(range(0, 360 * (self.num_classes + 1), 360))
48 |         self.threshold     = np.pi / 6.
49 | 
50 |     def update_dict(self, unique_id, predictions, labels):
51 |         """
52 |         Updates the keypoint dictionary.
53 |         params: unique_id    unique id of each instance (NAME_objc#_kpc#)
54 |                 predictions  predicted score vectors over the 360 bins of each angle
55 |                 labels       ground-truth angles (degrees) for azim, elev, and tilt
56 |         """
57 |         if type(predictions) == int:
58 |             predictions = [predictions]
59 |             labels      = [labels]
60 | 
61 |         for i in range(0, len(unique_id)):
62 |             image     = unique_id[i].split('_objc')[0]
63 |             obj_class = int(unique_id[i].split('_objc')[1].split('_kpc')[0])
64 |             kp_class  = int(unique_id[i].split('_objc')[1].split('_kpc')[1])
65 | 
66 |             pred_probs  = (predictions[0][i], predictions[1][i], predictions[2][i])
67 |             label_probs = (labels[0][i], labels[1][i], labels[2][i])
68 | 
69 |             if image in list(self.keypoint_dict.keys()):
70 |                 self.keypoint_dict[image][kp_class] = pred_probs
71 |             else:
72 |                 self.keypoint_dict[image] = {'class' : obj_class, 'label' : label_probs, kp_class : pred_probs}
73 | 
74 | 
75 |     def calculate_geo_performance(self):
76 |         for image in list(self.keypoint_dict.keys()):
77 |             curr_label = self.keypoint_dict[image]['label']
78 |             self.keypoint_dict[image]['geo_dist'] = dict()
79 |             self.keypoint_dict[image]['correct']  = dict()
80 |             for kp in list(self.keypoint_dict[image].keys()):
81 |                 # integer keys are keypoint classes; string keys ('class',
82 |                 # 'label', 'geo_dist', 'correct') are instance metadata
83 |                 if type(kp) != str:
84 |                     curr_pred = [np.argmax(self.keypoint_dict[image][kp][0]),
85 |                                  np.argmax(self.keypoint_dict[image][kp][1]),
86 |                                  np.argmax(self.keypoint_dict[image][kp][2])]
87 |                     self.keypoint_dict[image]['geo_dist'][kp] = compute_angle_dists(curr_pred, curr_label)
88 |                     self.keypoint_dict[image]['correct'][kp]  = 1 if (self.keypoint_dict[image]['geo_dist'][kp] <= self.threshold) else 0
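# [Editor's sketch, not part of the original file] A quick sanity check of the
# geodesic metric above, reusing compute_angle_dists from this module. Rotating
# the first angle by 90 degrees while holding the others fixed should give a
# distance of pi/2 radians:
import numpy as np

d_same = compute_angle_dists([30, 10, 5], [30, 10, 5])     # ~0.0
d_90   = compute_angle_dists([120, 10, 5], [30, 10, 5])    # ~pi/2 ~ 1.5708
assert np.isclose(d_90, np.pi / 2)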
89 | 
90 |     def metrics(self, unique = False):
91 |         self.calculate_geo_performance()
92 | 
93 |         type_geo_dist = [[] for x in range(0, self.num_classes)]
94 |         type_correct  = np.zeros(self.num_classes, dtype=np.float32)
95 |         type_total    = np.zeros(self.num_classes, dtype=np.float32)
96 | 
97 |         for image in list(self.keypoint_dict.keys()):
98 |             object_type  = self.keypoint_dict[image]['class']
99 |             curr_correct = 0.
100 |             curr_total   = 0.
101 |             curr_geodist = []
102 |             for kp in list(self.keypoint_dict[image]['correct'].keys()):
103 |                 curr_correct += self.keypoint_dict[image]['correct'][kp]
104 |                 curr_total   += 1.
105 |                 curr_geodist.append(self.keypoint_dict[image]['geo_dist'][kp])
106 | 
107 |             if unique:
108 |                 # collapse all keypoint predictions of an instance into one estimate
109 |                 curr_correct = curr_correct / curr_total
110 |                 curr_total   = 1.
111 |                 curr_geodist = [np.median(curr_geodist)]
112 | 
113 |             type_correct[object_type] += curr_correct
114 |             type_total[object_type]   += curr_total
115 |             for dist in curr_geodist:
116 |                 type_geo_dist[object_type].append(dist)
117 | 
118 |         type_accuracy = np.zeros(self.num_classes, dtype=np.float16)
119 |         for i in range(0, self.num_classes):
120 |             if type_total[i] > 0:
121 |                 type_accuracy[i] = float(type_correct[i]) / type_total[i]
122 | 
123 |         self.calculate_performance_baselines()
124 |         return type_accuracy, type_total, type_geo_dist
125 | 
126 | 
127 |     def calculate_performance_baselines(self):
128 | 
129 |         mean_baseline  = [[] for x in range(0, self.num_classes)]
130 |         total_baseline = [[] for x in range(0, self.num_classes)]
131 | 
132 |         # iterate over instances
133 |         for image in list(self.keypoint_dict.keys()):
134 |             obj_cls = self.keypoint_dict[image]['class']
135 | 
136 |             perf = [self.keypoint_dict[image]['geo_dist'][kp] for kp in list(self.keypoint_dict[image]['geo_dist'].keys())]
137 | 
138 |             # Append baselines
139 |             mean_baseline[obj_cls].append(np.mean(perf))
140 |             for p in perf:
141 |                 total_baseline[obj_cls].append(p)
142 | 
143 |         accuracy_mean  = np.around([100. * np.mean([num < self.threshold for num in mean_baseline[i]])  for i in range(0, self.num_classes)], decimals = 2)
144 |         accuracy_total = np.around([100. * np.mean([num < self.threshold for num in total_baseline[i]]) for i in range(0, self.num_classes)], decimals = 2)
145 | 
146 |         medError_mean  = np.around([(180. / np.pi) * np.median(mean_baseline[i])  for i in range(0, self.num_classes)], decimals = 2)
147 |         medError_total = np.around([(180. / np.pi) * np.median(total_baseline[i]) for i in range(0, self.num_classes)], decimals = 2)
148 | 
149 |         if np.isnan(accuracy_mean[0]):
150 |             # the keypoint datasets only cover the vehicle classes
151 |             # (4 = bus, 5 = car, 8 = motorbike in the 12-class ordering)
152 |             accuracy_mean  = accuracy_mean[[4, 5, 8]]
153 |             accuracy_total = accuracy_total[[4, 5, 8]]
154 |             medError_mean  = medError_mean[[4, 5, 8]]
155 |             medError_total = medError_total[[4, 5, 8]]
156 | 
157 |         print("--------------------------------------------")
158 |         print("Accuracy")
159 |         print("mean  : ", accuracy_mean,  " -- mean : ", np.round(np.mean(accuracy_mean),  decimals = 2))
160 |         print("total : ", accuracy_total, " -- mean : ", np.round(np.mean(accuracy_total), decimals = 2))
161 |         print("")
162 |         print("Median Error")
163 |         print("mean  : ", medError_mean,  " -- mean : ", np.round(np.mean(medError_mean),  decimals = 2))
164 |         print("total : ", medError_total, " -- mean : ", np.round(np.mean(medError_total), decimals = 2))
165 |         print("--------------------------------------------")
166 | 
--------------------------------------------------------------------------------
/util/torch_utils.py:
--------------------------------------------------------------------------------
1 | import torch
2 | import os
3 | import shutil
4 | 
5 | def save_checkpoint(model, optimizer, curr_epoch, curr_step, args, curr_loss, curr_acc, filename):
6 |     """
7 |     Saves a checkpoint and updates the best loss and best weighted accuracy
8 |     """
9 |     is_best_loss = curr_loss < args.best_loss
10 |     is_best_acc  = curr_acc  > args.best_acc
11 | 
12 |     args.best_acc  = max(args.best_acc,  curr_acc)
13 |     args.best_loss = min(args.best_loss, curr_loss)
14 | 
15 |     state = { 'epoch'      : curr_epoch,
16 |               'step'       : curr_step,
17 |               'args'       : args,
18 |               'state_dict' : model.state_dict(),
19 |               'val_loss'   : args.best_loss,
20 |               'val_acc'    : args.best_acc,
21 |               'optimizer'  : optimizer.state_dict(),
22 |             }
23 |     path = os.path.join(args.experiment_path, filename)
24 |     torch.save(state, path)
25 |     if is_best_loss:
26 |         shutil.copyfile(path, os.path.join(args.experiment_path, 'model_best_loss.pkl'))
27 |     if is_best_acc:
28 |         shutil.copyfile(path, os.path.join(args.experiment_path, 'model_best_acc.pkl'))
29 | 
30 |     return args
31 | 
32 | def accuracy(output, target, topk=(1,)):
33 |     """Computes the precision@k for the specified values of k.
34 |     Adapted from the PyTorch ImageNet example."""
35 |     maxk = max(topk)
36 |     batch_size = target.size(0)
37 | 
38 |     _, pred = output.topk(maxk, 1, True, True)
39 |     pred = pred.t()
40 |     correct = pred.eq(target.view(1, -1).expand_as(pred))
41 | 
42 |     res = []
43 |     for k in topk:
44 |         correct_k = correct[:k].contiguous().view(-1).float().sum(0, keepdim=True)
45 |         res.append(correct_k.mul_(100.0 / batch_size))
46 |     return res
47 | 
--------------------------------------------------------------------------------
/util/vp_loss.py:
--------------------------------------------------------------------------------
1 | """
2 | Multi-class Geometric Viewpoint Aware Loss
3 | 
4 | A PyTorch implementation of the geometry-aware softmax viewpoint loss described in
5 | Render For CNN (link: https://arxiv.org/pdf/1505.05641.pdf)
6 | Caffe implementation:
7 | https://github.com/charlesq34/caffe-render-for-cnn/blob/view_prediction/
8 | """
9 | import torch
10 | 
11 | from torch import nn
12 | 
13 | import torch.nn.functional as F
14 | import numpy as np
15 | 
16 | class SoftmaxVPLoss(nn.Module):
17 |     # Loss parameters taken directly from the Render For CNN paper:
18 |     #   azim_band_width = 7   # 15 in paper
19 |     #   elev_band_width = 2   #  5 in paper
20 |     #   tilt_band_width = 2   #  5 in paper
21 |     #   azim_sigma = 5
22 |     #   elev_sigma = 3
23 |     #   tilt_sigma = 3
24 |     def __init__(self, kernel_size = 7, sigma = 25):
25 |         super(SoftmaxVPLoss, self).__init__()
26 | 
27 |         self.filter      = self.viewloss_filter(kernel_size, sigma)
28 |         self.kernel_size = kernel_size
29 | 
30 |     def viewloss_filter(self, size, sigma):
31 |         vec  = np.linspace(-1 * size, size, 1 + 2 * size, dtype=np.float64)
32 |         prob = np.exp(-1 * abs(vec) / sigma)
33 |         # the filter is deliberately left unnormalized; normalizing it would
34 |         # rescale the loss with kernel size
35 |         prob = torch.FloatTensor(prob)[None, None, :]   # 1 x 1 x (1 + 2*size)
36 |         return prob
37 | 
38 | 
39 |     def forward(self, preds, labels, size_average=True):
40 |         """
41 |         :param preds:  Angle predictions (batch_size, 360)
42 |         :param labels: Angle labels (batch_size,), integer bins in [0, 360)
43 |         :return: Loss, a variable on which a backward pass may be performed.
44 |         Applies softmax over the preds, then the geometry-aware loss.
45 |         """
46 |         assert len(labels.shape) == 1
47 |         batch_size = labels.shape[0]
48 | 
49 |         # Construct one-hot labels -- dimension has to be (batch x 1 x 360).
50 |         # Creating a new tensor here does not measurably slow things down:
51 |         # 10^4 scatter iterations on a batch of 256 took ~0.5 sec. Scatter is
52 |         # ~25% faster for small batches (32), indexing ~25% faster for large
53 |         # ones (>1024); the two break even around batch size 256.
54 |         labels = labels.long().cpu()   # index tensors must live on the CPU with labels_oh
55 |         labels_oh = torch.zeros(batch_size, 360)
56 |         labels_oh[torch.arange(batch_size), labels] = 1.
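# [Editor's sketch, not part of the original file] The circular pad + conv1d
# below smears each one-hot label into an exponentially decaying band of
# neighboring bins, with wrap-around at the 0/360 boundary. A minimal
# standalone illustration with a small hypothetical kernel and 8 bins:
import torch
import torch.nn.functional as F

k = 2
filt   = torch.exp(-torch.linspace(-k, k, 2 * k + 1).abs() / 5.0)[None, None, :]
onehot = torch.zeros(1, 8)
onehot[0, 0] = 1.                                                     # label at bin 0
padded = torch.cat((onehot[:, -k:], onehot, onehot[:, :k]), dim=1)    # circular pad
smooth = F.conv1d(padded[:, None, :], filt).squeeze(1)                # shape (1, 8)
# the mass wraps around the boundary: bins 0..2 and 6..7 are now nonzero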
57 | 
58 |         # Pad circularly (so the band wraps around the 0/360 boundary), then
59 |         # convolve with the exponential filter
60 |         labels_oh = torch.cat((labels_oh[:, -self.kernel_size:],
61 |                                labels_oh,
62 |                                labels_oh[:, :self.kernel_size]),
63 |                               dim = 1)
64 | 
65 |         labels_oh = F.conv1d(labels_oh[:, None, :], self.filter)
66 | 
67 |         # move the smoothed labels to the GPU alongside the predictions
68 |         labels_oh = labels_oh.squeeze(1).cuda()
69 | 
70 |         # cross-entropy against the smoothed target, summed over bins as in the paper
71 |         loss = (-1 * labels_oh * preds.log_softmax(1)).sum(1)
72 | 
73 |         if size_average:
74 |             loss = loss.mean()
75 |         else:
76 |             loss = loss.sum()
77 | 
78 |         return loss
79 | 
--------------------------------------------------------------------------------
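As a quick end-to-end check, the loss can also be exercised standalone. A minimal sketch, assuming a CUDA device is available (the loss moves its smoothed targets onto the GPU internally) and that the repository root is on PYTHONPATH:

    import torch
    from util.vp_loss import SoftmaxVPLoss

    criterion = SoftmaxVPLoss()
    preds  = torch.randn(32, 360, requires_grad=True, device='cuda')  # logits over 360 bins
    labels = torch.randint(0, 360, (32,))                             # ground-truth bins
    loss = criterion(preds, labels)
    loss.backward()
    print(loss.item())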