├── .gitignore ├── LICENSE ├── README.md ├── data ├── generate_pascal3d_csv.py ├── getPascalTrainVal.m └── get_csv.sh ├── datasets ├── __init__.py ├── pascal3d.py ├── pascal3d_kp.py └── vp_util.py ├── model_weights └── get_weights.sh ├── models ├── __init__.py ├── clickhere_cnn.py └── render4cnn.py ├── train.py └── util ├── Paths.py ├── __init__.py ├── load_datasets.py ├── metrics.py ├── torch_utils.py └── vp_loss.py /.gitignore: -------------------------------------------------------------------------------- 1 | experiments 2 | set_env.sh 3 | trash 4 | cluster* 5 | *.npy 6 | *.pth 7 | *.pkl 8 | old_stuff 9 | test.py 10 | data/*.csv 11 | data/*.txt 12 | 13 | # Generated Ignores 14 | # Byte-compiled / optimized / DLL files 15 | __pycache__/ 16 | *.py[cod] 17 | *$py.class 18 | 19 | # C extensions 20 | *.so 21 | 22 | # Distribution / packaging 23 | .Python 24 | env/ 25 | build/ 26 | develop-eggs/ 27 | dist/ 28 | downloads/ 29 | eggs/ 30 | .eggs/ 31 | lib/ 32 | lib64/ 33 | parts/ 34 | sdist/ 35 | var/ 36 | wheels/ 37 | *.egg-info/ 38 | .installed.cfg 39 | *.egg 40 | 41 | # PyInstaller 42 | # Usually these files are written by a python script from a template 43 | # before PyInstaller builds the exe, so as to inject date/other infos into it. 44 | *.manifest 45 | *.spec 46 | 47 | # Installer logs 48 | pip-log.txt 49 | pip-delete-this-directory.txt 50 | 51 | # Unit test / coverage reports 52 | htmlcov/ 53 | .tox/ 54 | .coverage 55 | .coverage.* 56 | .cache 57 | nosetests.xml 58 | coverage.xml 59 | *.cover 60 | .hypothesis/ 61 | 62 | # Translations 63 | *.mo 64 | *.pot 65 | 66 | # Django stuff: 67 | *.log 68 | local_settings.py 69 | 70 | # Flask stuff: 71 | instance/ 72 | .webassets-cache 73 | 74 | # Scrapy stuff: 75 | .scrapy 76 | 77 | # Sphinx documentation 78 | docs/_build/ 79 | 80 | # PyBuilder 81 | target/ 82 | 83 | # Jupyter Notebook 84 | .ipynb_checkpoints 85 | 86 | # pyenv 87 | .python-version 88 | 89 | # celery beat schedule file 90 | celerybeat-schedule 91 | 92 | # SageMath parsed files 93 | *.sage.py 94 | 95 | # dotenv 96 | .env 97 | 98 | # virtualenv 99 | .venv 100 | venv/ 101 | ENV/ 102 | 103 | # Spyder project settings 104 | .spyderproject 105 | .spyproject 106 | 107 | # Rope project settings 108 | .ropeproject 109 | 110 | # mkdocs documentation 111 | /site 112 | 113 | # mypy 114 | .mypy_cache/ 115 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2017 Mohamed El Banani 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 22 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # pytorch-clickhere-cnn 2 | 3 | 4 | ## Introduction 5 | 6 | This is a [PyTorch](http://pytorch.org) implementation of [Clickhere CNN](https://github.com/rszeto/click-here-cnn) 7 | and [Render for CNN](https://github.com/shapenet/RenderForCNN). 8 | 9 | We currently provide the model, converted weights, dataset classes, and training/evaluation scripts. 10 | This implementation also includes an implementation of the Geometric Structure Aware loss function first mentioned in Render For CNN. 11 | 12 | 13 | If you have any questions, please email me at mbanani@umich.edu. 14 | 15 | ## Getting Started 16 | 17 | Add the repository to your python path: 18 | 19 | export PYTHONPATH=$PYTHONPATH:$(pwd) 20 | 21 | 22 | Please be aware that I make use of the following packages: 23 | - Python 3.6 24 | - PyTorch 1.1 and Torch Vision 0.2.2 25 | - scipy 26 | - pandas 27 | 28 | ### Generating the data 29 | Download the [Pascal 3D+ dataset](http://cvgl.stanford.edu/projects/pascal3d.html) (Release 1.1). 30 | Set the path for the Pascal3D directory in util/Paths.py. Finally, run the following command from the repository's root directory. 31 | 32 | cd data/ 33 | python generate_pascal3d_csv.py 34 | 35 | Please note that this will generate the csv files for 3 variants of the dataset: Pascal 3D+ (full), Pascal 3D+ (easy), and Pascal 3D-Vehicles (with keypoints). Those datasets are needed to obtain different sets of results. 36 | Alternatively, you could directly download the csv files by using `data/get_csv.sh` 37 | 38 | ### Pre-trained Model Weights 39 | 40 | We have converted the RenderForCNN and Clickhere model weights from the respective caffe models. 41 | The converted models are available for download by running the script in `model_weights/get_weights.sh` 42 | The converted models achieve comparable performance to the Caffe for the Render For CNN 43 | model, however, there is a larger error observed for Clickhere CNN. 44 | Updated results coming soon. 45 | 46 | ### Running Inference 47 | 48 | After downloading Pascal 3D+ and the pretrained weights, generating the CSV files, and setting the appropriate paths as mentioned above, 49 | you can run inference on the Pascal 3D+ dataset by running one of the following commands (depending on the model): 50 | 51 | python train.py --model chcnn --dataset pascalVehKP 52 | python train.py --model r4cnn --dataset pascalEasy 53 | python train.py --model r4cnn --dataset pascal 54 | 55 | 56 | #### Results 57 | 58 | To be updated soon! 59 | 60 | The original Render For CNN paper reported the results on the 'easy' subset of Pascal3D, which removes any truncated and occluded images from the datasets. While Click-Here CNN reports results on an augmented version of the dataset where multiple instances may belong to the same object in the image as each image-keypoint pair corresponds to an instance. Below are the results Obtained from each of the runs above. 61 | 62 | ### Render For CNN paper results: 63 | 64 | We evaluate the converted model on Pascal3D-easy as reported in the original Render For CNN paper, 65 | as well as the full Pascal 3D dataset. 66 | It is worth nothing that the converted model actually exceeds the performance reported in Render For CNN. 67 | 68 | # #### Accuracy 69 | ataset | plane | bike | boat | bottle| bus | car | chair |d.table| mbike | sofa | train | tv | mean | 70 | |:---------:|:-----:|:-----:|:-----:|:-----:|:-----:|:-----:|:-----:|:-----:|:-----:|:-----:|:-----:|:-----:|:-----:| 71 | | Full | 76.26 | 69.58 | 59.03 | 87.74 | 84.32 | 69.97 | 74.2 | 66.79 | 77.29 | 82.37 | 75.48 | 81.93 | 75.41 | 72 | | Easy | 80.37 | 85.59 | 62.93 | 95.60 | 94.14 | 84.08 | 82.76 | 80.95 | 85.30 | 84.61 | 84.08 | 93.26 | 84.47 | 73 | | Reported | 74 | 83 | 52 | 91 | 91 | 88 | 86 | 73 | 78 | 90 | 86 | 92 | 82 | 74 | 75 | #### Median Error 76 | |dataset | plane | bike | boat | bottle| bus | car | chair |d.table| mbike | sofa | train | tv | mean | 77 | |:---------:|:-----:|:-----:|:-----:|:-----:|:-----:|:-----:|:-----:|:-----:|:-----:|:-----:|:-----:|:-----:|:-----:| 78 | |Full | 11.52 | 15.33 | 19.33 | 8.51 | 5.54 | 9.39 | 13.83 | 12.87 | 14.90 | 13.03 | 8.96 | 13.72 | 12.24 | 79 | |Easy | 10.32 | 11.66 | 17.74 | 6.66 | 4.52 | 6.65 | 11.21 | 9.75 | 13.11 | 9.76 | 5.52 | 11.93 | 9.90 | 80 | |Reported | 15.4 | 14.8 | 25.6 | 9.3 | 3.6 | 6.0 | 9.7 | 10.8 | 16.7 | 9.5 | 6.1 | 12.6 | 11.7 | 81 | 82 | 83 | 84 | ### Pascal3D - Vehicles with Keypoints 85 | 86 | We evaluated the converted Render For CNN and Click-Here CNN models on Pascal3D-Vehicle. 87 | It should be noted that the results for Click-Here are lower than those achieved by running the author provided Caffe code. 88 | It seems that there is something incorrect with the current reimplementation and/or weight conversion. 89 | We are working on fixing this problem. 90 | 91 | #### Accuracy 92 | | | bus | car | m.bike | mean | 93 | |:-------------------------:|:-----:|:-----:|:------:|:-----:| 94 | | Render For CNN | 89.26 | 74.36 | 81.93 | 81.85 | 95 | | Click-Here CNN | 86.91 | 83.25 | 73.83 | 81.33 | 96 | | Click-Here CNN (reported) | 96.8 | 90.2 | 85.2 | 90.7 | 97 | 98 | #### Median Error 99 | | | bus | car | m.bike | mean | 100 | |:-------------------------:|:-----:|:-----:|:------:|:-----:| 101 | | Render For CNN | 5.16 | 8.53 | 13.46 | 9.05 | 102 | | Click-Here CNN | 4.01 | 8.18 | 19.71 | 10.63 | 103 | | Click-Here CNN (reported) | 2.63 | 4.98 | 11.4 | 6.35 | 104 | 105 | 106 | ### Pascal3D - Vehicles with Keypoints (Fine-tuned Models) 107 | 108 | We fine-tuned both models on the Pascal 3D+ (Vehicles with Keypoints) dataset. 109 | Since we suspect that the problem with the replication of the Click-Here CNN model 110 | is in the attention section, we conducted an experiment where we only fine-tuned 111 | those weights. As reported below, fine-tuning just the attention model achieves the best performance. 112 | 113 | | | bus | car | m.bike | mean | 114 | |:-----------------------------:|:-----:|:-----:|:------:|:-----:| 115 | | Render For CNN FT | 93.55 | 83.98 | 87.30 | 88.28 | 116 | | Click-Here CNN FT | 92.97 | 89.84 | 81.25 | 88.02 | 117 | | Click-Here CNN FT-Attention | 94.48 | 90.77 | 84.91 | 90.05 | 118 | | Click-Here CNN (reported) | 96.8 | 90.2 | 85.2 | 90.7 | 119 | 120 | 121 | #### Median Error 122 | 123 | | | bus | car | m.bike | mean | 124 | |:-----------------------------:|:-----:|:-----:|:------:|:-----:| 125 | | Render For CNN FT | 3.04 | 5.83 | 11.95 | 6.94 | 126 | | Click-Here CNN FT | 2.93 | 5.14 | 13.42 | 7.16 | 127 | | Click-Here CNN FT-Attention | 2.88 | 5.24 | 12.10 | 6.74 | 128 | | Click-Here CNN (reported) | 2.63 | 4.98 | 11.4 | 6.35 | 129 | 130 | 131 | ## Training the model 132 | 133 | To train the model, simply run `python train.py` with parameter flags as indicated in train.py. 134 | 135 | ## Citation 136 | 137 | This is an implementation of [Clickhere CNN](https://github.come/rszeto/click-here-cnn) and [Render For CNN](https://github.com/shapenet/RenderForCNN), so please cite the respective papers if you use this code in any published work. 138 | 139 | ## Acknowledgements 140 | 141 | We would like to thank Ryan Szeto, Hao Su, and Charles R. Qi for providing their code, and 142 | for their assistance with questions regarding reimplementing their work. We would also 143 | like to acknowledge [Kenta Iwasaki](https://discuss.pytorch.org/u/dranithix/summary) for 144 | his advice with loss function implementation and [Qi Fan](https://github.com/fanq15) for releasing 145 | [caffe_to_torch_to_pytorch](https://github.com/fanq15/caffe_to_torch_to_pytorch). 146 | 147 | This work has been partially supported by DARPA W32P4Q-15-C-0070 (subcontract from SoarTech) and funds from the University of Michigan Mobility Transformation Center. 148 | -------------------------------------------------------------------------------- /data/generate_pascal3d_csv.py: -------------------------------------------------------------------------------- 1 | import os 2 | import numpy as np 3 | import scipy.io as spio 4 | 5 | from IPython import embed 6 | 7 | from util import Paths 8 | 9 | INFO_FILE_HEADER = 'imgPath,bboxTLX,bboxTLY,bboxBRX,bboxBRY,imgKeyptX,imgKeyptY,keyptClass,objClass,azimuthClass,elevationClass,rotationClass\n' 10 | 11 | synset_name_pairs = [ ('02691156', 'aeroplane'), 12 | ('02834778', 'bicycle'), 13 | ('02858304', 'boat'), 14 | ('02876657', 'bottle'), 15 | ('02924116', 'bus'), 16 | ('02958343', 'car'), 17 | ('03001627', 'chair'), 18 | ('04379243', 'diningtable'), 19 | ('03790512', 'motorbike'), 20 | ('04256520', 'sofa'), 21 | ('04468005', 'train'), 22 | ('03211117', 'tvmonitor')] 23 | 24 | KEYPOINT_TYPES = { 25 | 'aeroplane' : ['right_wing', 'tail', 'rudder_upper', 'noselanding', 26 | 'left_wing', 'rudder_lower', 'right_elevator', 'left_elevator'], 27 | 'bicycle' : ['left_front_wheel', 'left_back_wheel', 'seat_back', 28 | 'right_front_wheel', 'left_pedal_center', 'head_center', 29 | 'left_handle', 'right_pedal_center', 'right_handle', 30 | 'right_back_wheel', 'seat_front'], 31 | 'boat' : ['head', 'head_left', 'head_down', 'head_right', 32 | 'tail', 'tail_left', 'tail_right'], 33 | 'bottle' : ['body', 'body_right', 'body_left', 'bottom_right', 34 | 'bottom', 'mouth', 'bottom_left'], 35 | 'bus' : ['body_back_left_lower', 'body_back_left_upper', 'body_back_right_lower', 36 | 'body_back_right_upper', 'body_front_left_upper', 'body_front_right_upper', 37 | 'body_front_left_lower', 'body_front_right_lower', 'left_back_wheel', 38 | 'left_front_wheel', 'right_back_wheel', 'right_front_wheel'], 39 | 'car' : ['left_front_wheel', 'left_back_wheel', 'right_front_wheel', 40 | 'right_back_wheel', 'upper_left_windshield', 'upper_right_windshield', 41 | 'upper_left_rearwindow', 'upper_right_rearwindow', 'left_front_light', 42 | 'right_front_light', 'left_back_trunk', 'right_back_trunk'], 43 | 'chair' : ['seat_upper_right', 'leg_upper_left', 'seat_lower_left', 'leg_upper_right', 44 | 'back_upper_left', 'leg_lower_left', 'seat_upper_left', 'leg_lower_right', 45 | 'seat_lower_right', 'back_upper_right'], 46 | 'diningtable' : ['top_upper_right', 'top_right', 'top_left', 'leg_upper_left', 47 | 'top_lower_left', 'top_lower_right', 'top_down', 'leg_upper_right', 48 | 'top_upper_left', 'leg_lower_left', 'leg_lower_right', 'top_up'], 49 | 'motorbike' : ['back_seat', 'front_seat', 'head_center', 'headlight_center', 50 | 'left_back_wheel', 'left_front_wheel', 'left_handle_center', 51 | 'right_back_wheel', 'right_front_wheel', 'right_handle_center'], 52 | 'sofa' : ['front_bottom_right', 'top_right_corner', 'top_left_corner', 53 | 'front_bottom_left', 'seat_bottom_right', 'seat_bottom_left', 54 | 'right_bottom_back', 'seat_top_right', 'left_bottom_back', 'seat_top_left'], 55 | 'train' : ['mid2_left_top', 'head_left_top', 'head_right_bottom', 'mid2_right_bottom', 56 | 'mid1_left_bottom', 'mid1_right_top', 'tail_right_bottom', 'tail_right_top', 'tail_left_top', 57 | 'head_right_top', 'head_left_bottom', 'mid2_left_bottom', 'tail_left_bottom', 'head_top', 58 | 'mid1_left_top', 'mid2_right_top', 'mid1_right_bottom'], 59 | 'tvmonitor' : ['front_bottom_right', 'back_top_left', 'front_top_left', 'front_bottom_left', 60 | 'back_bottom_left', 'front_top_right', 'back_top_right', 'back_bottom_right'] 61 | } 62 | 63 | 64 | SYNSET_CLASSIDX_MAP = {} 65 | for i in range(len(synset_name_pairs)): 66 | synset, _ = synset_name_pairs[i] 67 | SYNSET_CLASSIDX_MAP[synset] = i 68 | 69 | KEYPOINT_CLASSES = [] 70 | for synset, class_name in synset_name_pairs: 71 | keypoint_names = KEYPOINT_TYPES[class_name] 72 | for keypoint_name in keypoint_names: 73 | KEYPOINT_CLASSES.append(class_name + '_' + keypoint_name) 74 | 75 | KEYPOINTCLASS_INDEX_MAP = {} 76 | for i in range(len(KEYPOINT_CLASSES)): 77 | KEYPOINTCLASS_INDEX_MAP[KEYPOINT_CLASSES[i]] = i 78 | 79 | DATASET_SOURCES = ['pascal', 'imagenet'] 80 | PASCAL3D_ROOT = Paths.pascal3d_root 81 | ANNOTATIONS_ROOT = os.path.join(PASCAL3D_ROOT, 'Annotations') 82 | IMAGES_ROOT = os.path.join(PASCAL3D_ROOT, 'Images') 83 | 84 | 85 | """ 86 | Create pascal image kp dataset for all classes. 87 | Code adapted from (https://github.com/rszeto/click-here-cnn) 88 | """ 89 | def create_pascal_image_kp_csvs(vehicles = False): 90 | # Generate train and test lists and store in file 91 | BASE_DIR = os.path.dirname(os.path.abspath(__file__)) 92 | 93 | if not (os.path.exists('trainImgIds.txt') and os.path.exists('valImgIds.txt')): 94 | matlab_cmd = 'addpath(\'%s\'); getPascalTrainVal' % BASE_DIR 95 | print('Generating MATLAB command: %s' % (matlab_cmd)) 96 | os.system('matlab -nodisplay -r "try %s; catch; end; quit;"' % matlab_cmd) 97 | # os.system('matlab -nodisplay -r "%s; quit;"' % matlab_cmd) 98 | # Get training and test image IDs 99 | with open('trainImgIds.txt', 'rb') as trainIdsFile: 100 | trainIds = np.loadtxt(trainIdsFile, dtype='string') 101 | with open('valImgIds.txt', 'rb') as testIdsFile: 102 | testIds = np.loadtxt(testIdsFile, dtype='string') 103 | 104 | data_dir = os.path.join(os.path.dirname(BASE_DIR), 'data') 105 | if not os.path.exists(data_dir): 106 | os.makedirs(data_dir) 107 | 108 | 109 | if vehicles: 110 | train_csv = os.path.join(data_dir, 'veh_pascal3d_kp_train.csv') 111 | valid_csv = os.path.join(data_dir, 'veh_pascal3d_kp_valid.csv') 112 | 113 | 114 | synset_name_pairs = [ ('02924116', 'bus'), 115 | ('02958343', 'car'), 116 | ('03790512', 'motorbike')] 117 | 118 | else: 119 | train_csv = os.path.join(data_dir, 'pascal3d_kp_train.csv') 120 | valid_csv = os.path.join(data_dir, 'pascal3d_kp_valid.csv') 121 | synset_name_pairs = [ ('02691156', 'aeroplane'), 122 | ('02834778', 'bicycle'), 123 | ('02858304', 'boat'), 124 | ('02876657', 'bottle'), 125 | ('02924116', 'bus'), 126 | ('02958343', 'car'), 127 | ('03001627', 'chair'), 128 | ('04379243', 'diningtable'), 129 | ('03790512', 'motorbike'), 130 | ('04256520', 'sofa'), 131 | ('04468005', 'train'), 132 | ('03211117', 'tvmonitor')] 133 | 134 | SYNSET_CLASSIDX_MAP = {} 135 | for i in range(len(synset_name_pairs)): 136 | synset, _ = synset_name_pairs[i] 137 | SYNSET_CLASSIDX_MAP[synset] = i 138 | 139 | KEYPOINT_CLASSES = [] 140 | for synset, class_name in synset_name_pairs: 141 | keypoint_names = KEYPOINT_TYPES[class_name] 142 | for keypoint_name in keypoint_names: 143 | KEYPOINT_CLASSES.append(class_name + '_' + keypoint_name) 144 | 145 | KEYPOINTCLASS_INDEX_MAP = {} 146 | for i in range(len(KEYPOINT_CLASSES)): 147 | KEYPOINTCLASS_INDEX_MAP[KEYPOINT_CLASSES[i]] = i 148 | 149 | info_file_train = open(train_csv, 'w') 150 | info_file_train.write(INFO_FILE_HEADER) 151 | info_file_test = open(valid_csv, 'w') 152 | info_file_test.write(INFO_FILE_HEADER) 153 | 154 | for synset, class_name in synset_name_pairs: 155 | print("Generating data for %s " % (class_name)) 156 | all_zeros = 0 157 | counter = 0 158 | counter_kp = 0 159 | 160 | object_class = SYNSET_CLASSIDX_MAP[synset] 161 | for dataset_source in DATASET_SOURCES: 162 | class_source_id = '%s_%s' % (class_name, dataset_source) 163 | for anno_file in sorted(os.listdir(os.path.join(ANNOTATIONS_ROOT, class_source_id))): 164 | anno_file_id = os.path.splitext(os.path.basename(anno_file))[0] 165 | if anno_file_id in trainIds: 166 | anno_file_set = 'train' 167 | elif anno_file_id in testIds: 168 | anno_file_set = 'test' 169 | else: 170 | continue 171 | 172 | anno = loadmat(os.path.join(ANNOTATIONS_ROOT, class_source_id, anno_file))['record'] 173 | rel_image_path = os.path.join('Images', class_source_id, anno['filename']) 174 | 175 | 176 | # Make objs an array regardless of how many objects there are 177 | objs = np.array([anno['objects']]) if isinstance(anno['objects'], dict) else anno['objects'] 178 | for obj_i, obj in enumerate(objs): 179 | # Only deal with objects in current class 180 | if obj['class'] == class_name: 181 | # Get crop using bounding box from annotation 182 | # Note: Annotations are in MATLAB coordinates (1-indexed), inclusive 183 | # Convert to 0-indexed numpy array 184 | bbox = np.array(obj['bbox']) - 1 185 | 186 | # Get visible and in-frame keypoints 187 | keypoints = obj['anchors'] 188 | try: 189 | assert set(KEYPOINT_TYPES[class_name]) == set(keypoints.keys()) 190 | except: 191 | print("Assertion failed for keypoint types") 192 | embed() 193 | 194 | viewpoint = obj['viewpoint'] 195 | # Remove erronous KPs 196 | if(viewpoint['azimuth'] == viewpoint['theta'] == viewpoint['elevation'] == 0.0): 197 | all_zeros += 1 198 | else: 199 | counter += 1 200 | azimuth = np.mod(np.round(viewpoint['azimuth']), 360) 201 | elevation = np.mod(np.round(viewpoint['elevation']), 360) 202 | tilt = np.mod(np.round(viewpoint['theta']), 360) 203 | 204 | for keypoint_name in KEYPOINT_TYPES[class_name]: 205 | # Get 0-indexed keypoint location 206 | keypoint_loc_full = keypoints[keypoint_name]['location'] - 1 207 | if keypoint_loc_full.size > 0 and insideBox(keypoint_loc_full, bbox): 208 | counter_kp += 1 209 | # Add info for current keypoint 210 | keypoint_class = KEYPOINTCLASS_INDEX_MAP[class_name + '_' + keypoint_name] 211 | if vehicles: 212 | if object_class == 0: 213 | final_label = ( 4, azimuth, elevation, tilt) 214 | elif object_class == 1: 215 | final_label = ( 5, azimuth, elevation, tilt) 216 | elif object_class == 2: 217 | final_label = ( 8, azimuth, elevation, tilt) 218 | else: 219 | print("Error: Object classes do not match expected values!") 220 | 221 | keypoint_str = keypointInfo2Str(rel_image_path, bbox, keypoint_loc_full, keypoint_class, final_label) 222 | if anno_file_set == 'train': 223 | info_file_train.write(keypoint_str) 224 | else: 225 | info_file_test.write(keypoint_str) 226 | print("%s : %d images %d image-kp pairs, %d ommited " % (class_name, counter, counter_kp, all_zeros)) 227 | 228 | info_file_train.close() 229 | info_file_test.close() 230 | 231 | 232 | """ 233 | Create pascal image kp dataset for all classes. 234 | Code adapted from (https://github.com/rszeto/click-here-cnn) 235 | """ 236 | def create_pascal_image_csvs(easy = False): 237 | # Generate train and test lists and store in file 238 | BASE_DIR = os.path.dirname(os.path.abspath(__file__)) 239 | 240 | if not (os.path.exists('trainImgIds.txt') and os.path.exists('valImgIds.txt')): 241 | matlab_cmd = 'addpath(\'%s\'); getPascalTrainVal' % BASE_DIR 242 | print('Generating MATLAB command: %s' % (matlab_cmd)) 243 | os.system('matlab -nodisplay -r "try %s; catch; end; quit;"' % matlab_cmd) 244 | # os.system('matlab -nodisplay -r "%s; quit;"' % matlab_cmd) 245 | # Get training and test image IDs 246 | with open('trainImgIds.txt', 'rb') as trainIdsFile: 247 | trainIds = np.loadtxt(trainIdsFile, dtype='string') 248 | with open('valImgIds.txt', 'rb') as testIdsFile: 249 | testIds = np.loadtxt(testIdsFile, dtype='string') 250 | 251 | data_dir = os.path.join(os.path.dirname(BASE_DIR), 'data') 252 | if not os.path.exists(data_dir): 253 | os.makedirs(data_dir) 254 | 255 | if easy: 256 | train_csv = os.path.join(data_dir, 'pascal3d_train_easy.csv') 257 | valid_csv = os.path.join(data_dir, 'pascal3d_valid_easy.csv') 258 | else: 259 | train_csv = os.path.join(data_dir, 'pascal3d_train.csv') 260 | valid_csv = os.path.join(data_dir, 'pascal3d_valid.csv') 261 | 262 | 263 | info_file_train = open(train_csv, 'w') 264 | info_file_train.write(INFO_FILE_HEADER) 265 | info_file_test = open(valid_csv, 'w') 266 | info_file_test.write(INFO_FILE_HEADER) 267 | 268 | for synset, class_name in synset_name_pairs: 269 | print("Generating data for %s " % (class_name)) 270 | all_zeros = 0 271 | hard_images = 0 272 | counter = 0 273 | object_class = SYNSET_CLASSIDX_MAP[synset] 274 | for dataset_source in DATASET_SOURCES: 275 | class_source_id = '%s_%s' % (class_name, dataset_source) 276 | for anno_file in sorted(os.listdir(os.path.join(ANNOTATIONS_ROOT, class_source_id))): 277 | anno_file_id = os.path.splitext(os.path.basename(anno_file))[0] 278 | if anno_file_id in trainIds: 279 | anno_file_set = 'train' 280 | elif anno_file_id in testIds: 281 | anno_file_set = 'test' 282 | else: 283 | continue 284 | 285 | anno = loadmat(os.path.join(ANNOTATIONS_ROOT, class_source_id, anno_file))['record'] 286 | rel_image_path = os.path.join('Images', class_source_id, anno['filename']) 287 | 288 | # Make objs an array regardless of how many objects there are 289 | objs = np.array([anno['objects']]) if isinstance(anno['objects'], dict) else anno['objects'] 290 | for obj_i, obj in enumerate(objs): 291 | # Only deal with objects in current class 292 | if obj['class'] == class_name: 293 | # Get crop using bounding box from annotation 294 | # Note: Annotations are in MATLAB coordinates (1-indexed), inclusive 295 | # Convert to 0-indexed numpy array 296 | bbox = np.array(obj['bbox']) - 1 297 | 298 | viewpoint = obj['viewpoint'] 299 | # Remove erronous KPs 300 | if(viewpoint['azimuth'] == viewpoint['theta'] == viewpoint['elevation'] == 0.0): 301 | all_zeros += 1 302 | elif (easy and (obj['difficult'] == 1 or obj['truncated'] == 1 or obj['occluded'] == 1 )): 303 | hard_images += 1 304 | else: 305 | counter += 1 306 | azimuth = np.mod(np.round(viewpoint['azimuth']), 360) 307 | elevation = np.mod(np.round(viewpoint['elevation']), 360) 308 | tilt = np.mod(np.round(viewpoint['theta']), 360) 309 | 310 | final_label = ( object_class, azimuth, elevation, tilt) 311 | viewpoint_str = viewpointInfo2Str(rel_image_path, bbox, final_label) 312 | if anno_file_set == 'train': 313 | info_file_train.write(viewpoint_str) 314 | else: 315 | info_file_test.write(viewpoint_str) 316 | print("%s : %d images, ommitted: all_zeros - %d, difficult - %d " % (class_name, counter, all_zeros,hard_images)) 317 | 318 | info_file_train.close() 319 | info_file_test.close() 320 | 321 | ######### Importing .mat files ############################################### 322 | ######### Reference: http://stackoverflow.com/a/8832212 ###################### 323 | 324 | def loadmat(filename): 325 | ''' 326 | this function should be called instead of direct spio.loadmat 327 | as it cures the problem of not properly recovering python dictionaries 328 | from mat files. It calls the function check keys to cure all entries 329 | which are still mat-objects 330 | ''' 331 | data = spio.loadmat(filename, struct_as_record=False, squeeze_me=True) 332 | return _check_keys(data) 333 | 334 | def _check_keys(dict): 335 | ''' 336 | checks if entries in dictionary are mat-objects. If yes 337 | todict is called to change them to nested dictionaries 338 | ''' 339 | for key in dict: 340 | if isinstance(dict[key], spio.matlab.mio5_params.mat_struct): 341 | dict[key] = _todict(dict[key]) 342 | return dict 343 | 344 | def _todict(matobj): 345 | ''' 346 | A recursive function which constructs from matobjects nested dictionaries 347 | ''' 348 | dict = {} 349 | for strg in matobj._fieldnames: 350 | elem = matobj.__dict__[strg] 351 | if isinstance(elem, spio.matlab.mio5_params.mat_struct): 352 | dict[strg] = _todict(elem) 353 | # Handle case where elem is an array of mat_structs 354 | elif isinstance(elem, np.ndarray) and len(elem) > 0 and \ 355 | isinstance(elem[0], spio.matlab.mio5_params.mat_struct): 356 | dict[strg] = np.array([_todict(subelem) for subelem in elem]) 357 | else: 358 | dict[strg] = elem 359 | return dict 360 | 361 | def insideBox(point, box): 362 | return point[0] >= box[0] and point[0] <= box[2] \ 363 | and point[1] >= box[1] and point[1] <= box[3] 364 | 365 | def keypointInfo2Str(fullImagePath, bbox, keyptLoc, keyptClass, viewptLabel): 366 | return '%s,%d,%d,%d,%d,%f,%f,%d,%d,%d,%d,%d\n' % ( 367 | fullImagePath, 368 | bbox[0], bbox[1], bbox[2], bbox[3], 369 | keyptLoc[0], keyptLoc[1], 370 | keyptClass, 371 | viewptLabel[0], viewptLabel[1], viewptLabel[2], viewptLabel[3] 372 | ) 373 | 374 | def viewpointInfo2Str(fullImagePath, bbox, viewptLabel): 375 | return '%s,%d,%d,%d,%d,%d,%d,%d,%d\n' % ( 376 | fullImagePath, 377 | bbox[0], bbox[1], bbox[2], bbox[3], 378 | viewptLabel[0], viewptLabel[1], viewptLabel[2], viewptLabel[3] 379 | ) 380 | 381 | if __name__ == '__main__': 382 | # create_pascal_image_kp_csvs() 383 | create_pascal_image_kp_csvs(vehicles = True) # Just vehicles 384 | # create_pascal_image_csvs() 385 | # create_pascal_image_csvs(easy = True) # Easy subset 386 | -------------------------------------------------------------------------------- /data/getPascalTrainVal.m: -------------------------------------------------------------------------------- 1 | PASCAL3D_ROOT = '/z/home/mbanani/datasets/pascal3d'; 2 | addpath(fullfile(PASCAL3D_ROOT, 'PASCAL', 'VOCdevkit', 'VOCcode')); 3 | 4 | % Run VOC code to extract image IDs 5 | VOCinit; 6 | trainImgIds = textread(sprintf(VOCopts.imgsetpath, 'train'), '%s'); 7 | valImgIds = textread(sprintf(VOCopts.imgsetpath, 'val'), '%s'); 8 | 9 | % Save IDs to file 10 | trainIdsFile = fopen('trainImgIds.txt', 'w'); 11 | for i=1:numel(trainImgIds) 12 | fprintf(trainIdsFile, '%s\n', trainImgIds{i}); 13 | end 14 | valIdsFile = fopen('valImgIds.txt', 'w'); 15 | for i=1:numel(valImgIds) 16 | fprintf(valIdsFile, '%s\n', valImgIds{i}); 17 | end 18 | -------------------------------------------------------------------------------- /data/get_csv.sh: -------------------------------------------------------------------------------- 1 | wget http://www-personal.umich.edu/~mbanani/clickhere/csv/pascal3d_kp_train.csv 2 | wget http://www-personal.umich.edu/~mbanani/clickhere/csv/pascal3d_kp_valid.csv 3 | wget http://www-personal.umich.edu/~mbanani/clickhere/csv/pascal3d_train.csv 4 | wget http://www-personal.umich.edu/~mbanani/clickhere/csv/pascal3d_train_easy.csv 5 | wget http://www-personal.umich.edu/~mbanani/clickhere/csv/pascal3d_valid.csv 6 | wget http://www-personal.umich.edu/~mbanani/clickhere/csv/pascal3d_valid_easy.csv 7 | wget http://www-personal.umich.edu/~mbanani/clickhere/csv/veh_pascal3d_kp_train.csv 8 | wget http://www-personal.umich.edu/~mbanani/clickhere/csv/veh_pascal3d_kp_valid.csv 9 | -------------------------------------------------------------------------------- /datasets/__init__.py: -------------------------------------------------------------------------------- 1 | from .pascal3d import * 2 | from .pascal3d_kp import * 3 | -------------------------------------------------------------------------------- /datasets/pascal3d.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import numpy as np 3 | import time 4 | import pandas 5 | import os 6 | 7 | import numpy as np 8 | import torch.utils.data as data 9 | 10 | from PIL import Image 11 | from .vp_util import label_to_probs 12 | from torchvision import transforms 13 | import copy 14 | import random 15 | 16 | from IPython import embed 17 | 18 | class pascal3d(data.Dataset): 19 | """ 20 | Construct a Pascal Dataset. 21 | Inputs: 22 | csv_path path containing instance data 23 | augment boolean for flipping images 24 | """ 25 | def __init__(self, csv_path, dataset_root = None, im_size = 227, transform = None, just_easy = False, num_classes = 12): 26 | 27 | start_time = time.time() 28 | 29 | # Load instance data from csv-file 30 | im_paths, bbox, obj_cls, vp_labels = self.csv_to_instances(csv_path) 31 | print("csv file length: ", len(im_paths)) 32 | 33 | # dataset parameters 34 | self.root = dataset_root 35 | self.loader = self.pil_loader 36 | self.im_paths = im_paths 37 | self.bbox = bbox 38 | self.obj_cls = obj_cls 39 | self.vp_labels = vp_labels 40 | self.flip = [False] * len(im_paths) 41 | 42 | self.im_size = im_size 43 | self.num_classes = num_classes 44 | self.num_instances = len(self.im_paths) 45 | assert transform != None 46 | self.transform = transform 47 | 48 | # Set weights for loss 49 | class_hist = np.histogram(obj_cls, list(range(0, self.num_classes+1)))[0] 50 | mean_class_size = np.mean(class_hist) 51 | self.loss_weights = mean_class_size / class_hist 52 | 53 | # Print out dataset stats 54 | print("Dataset loaded in ", time.time() - start_time, " secs.") 55 | print("Dataset size: ", self.num_instances) 56 | 57 | def __getitem__(self, index): 58 | """ 59 | Args: 60 | index (int): Index 61 | Returns: 62 | tuple: (image, target) where target is class_index of the target class. 63 | """ 64 | 65 | # Load and transform image 66 | if self.root == None: 67 | im_path = self.im_paths[index] 68 | else: 69 | im_path = os.path.join(self.root, self.im_paths[index]) 70 | 71 | bbox = self.bbox[index] 72 | obj_cls = self.obj_cls[index] 73 | view = self.vp_labels[index] 74 | flip = self.flip[index] 75 | 76 | # Transform labels 77 | azim, elev, tilt = (view + 360.) % 360. 78 | 79 | # Load and transform image 80 | img = self.loader(im_path, bbox = bbox, flip = flip) 81 | if self.transform is not None: 82 | img = self.transform(img) 83 | 84 | 85 | # construct unique key for statistics -- only need to generate imid and year 86 | _bb = str(bbox[0]) + '-' + str(bbox[1]) + '-' + str(bbox[2]) + '-' + str(bbox[3]) 87 | key_uid = self.im_paths[index] + '_' + _bb + '_objc' + str(obj_cls) + '_kpc' + str(0) 88 | 89 | return img, azim, elev, tilt, obj_cls, -1, -1, key_uid 90 | 91 | def __len__(self): 92 | return self.num_instances 93 | 94 | """ 95 | Loads images and applies the following transformations 96 | 1. convert all images to RGB 97 | 2. crop images using bbox (if provided) 98 | 3. resize using LANCZOS to rescale_size 99 | 4. convert from RGB to BGR 100 | 5. (? not done now) convert from HWC to CHW 101 | 6. (optional) flip image 102 | 103 | TODO: once this works, convert to a relative path, which will matter for 104 | synthetic data dataset class size. 105 | """ 106 | def pil_loader(self, path, bbox = None ,flip = False): 107 | # open path as file to avoid ResourceWarning 108 | # link: (https://github.com/python-pillow/Pillow/issues/835) 109 | with open(path, 'rb') as f: 110 | with Image.open(f) as img: 111 | img = img.convert('RGB') 112 | 113 | # Convert to BGR from RGB 114 | r, g, b = img.split() 115 | img = Image.merge("RGB", (b, g, r)) 116 | 117 | img = img.crop(box=bbox) 118 | 119 | # verify that imresize uses LANCZOS 120 | img = img.resize( (self.im_size, self.im_size), Image.LANCZOS) 121 | 122 | # flip image 123 | if flip: 124 | img = img.transpose(Image.FLIP_LEFT_RIGHT) 125 | 126 | return img 127 | 128 | def csv_to_instances(self, csv_path): 129 | df = pandas.read_csv(csv_path, sep=',') 130 | data = df.values 131 | 132 | data_split = np.split(data, [0, 1, 5, 6, 9], axis=1) 133 | del(data_split[0]) 134 | 135 | image_paths = np.squeeze(data_split[0]).tolist() 136 | bboxes = data_split[1].tolist() 137 | obj_class = np.squeeze(data_split[2]).tolist() 138 | viewpoints = np.array(data_split[3].tolist()) 139 | 140 | return image_paths, bboxes, obj_class, viewpoints 141 | 142 | def augment(self): 143 | self.im_paths = self.im_paths + self.im_paths 144 | self.bbox = self.bbox + self.bbox 145 | self.obj_cls = self.obj_cls + self.obj_cls 146 | self.vp_labels = self.vp_labels + self.vp_labels 147 | self.flip = self.flip + [True] * self.num_instances 148 | assert len(self.flip) == len(self.im_paths) 149 | self.num_instances = len(self.im_paths) 150 | print("Augmented dataset. New size: ", self.num_instances) 151 | 152 | def generate_validation(self, ratio = 0.1): 153 | assert ratio > (2.*self.num_classes/float(self.num_instances)) and ratio < 0.5 154 | 155 | random.seed(a = 2741998) 156 | 157 | valid_class = copy.deepcopy(self) 158 | 159 | valid_size = int(ratio * self.num_instances) 160 | train_size = self.num_instances - valid_size 161 | train_instances = list(range(0, self.num_instances)) 162 | valid_instances = random.sample(train_instances, valid_size) 163 | train_instances = [x for x in train_instances if x not in valid_instances] 164 | 165 | assert train_size == len(train_instances) and valid_size == len(valid_instances) 166 | 167 | valid_class.im_paths = [ self.im_paths[i] for i in sorted(valid_instances) ] 168 | valid_class.bbox = [ self.bbox[i] for i in sorted(valid_instances) ] 169 | valid_class.obj_cls = [ self.obj_cls[i] for i in sorted(valid_instances) ] 170 | valid_class.vp_labels = [ self.vp_labels[i] for i in sorted(valid_instances) ] 171 | valid_class.flip = [ self.flip[i] for i in sorted(valid_instances) ] 172 | valid_class.num_instances = valid_size 173 | 174 | self.im_paths = [ self.im_paths[i] for i in sorted(train_instances) ] 175 | self.bbox = [ self.bbox[i] for i in sorted(train_instances) ] 176 | self.obj_cls = [ self.obj_cls[i] for i in sorted(train_instances) ] 177 | self.vp_labels = [ self.vp_labels[i] for i in sorted(train_instances) ] 178 | self.flip = [ self.flip[i] for i in sorted(train_instances) ] 179 | self.num_instances = train_size 180 | 181 | return valid_class 182 | -------------------------------------------------------------------------------- /datasets/pascal3d_kp.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import time 3 | import copy 4 | import random 5 | import pandas 6 | import os 7 | 8 | import numpy as np 9 | 10 | from PIL import Image 11 | from .vp_util import label_to_probs 12 | from torchvision import transforms 13 | from IPython import embed 14 | 15 | 16 | class pascal3d_kp(torch.utils.data.Dataset): 17 | 18 | """ 19 | Construct a Pascal Dataset. 20 | Inputs: 21 | csv_path path containing instance data 22 | augment boolean for flipping images 23 | """ 24 | def __init__(self, csv_path, dataset_root = None, im_size = 227, transform = None, map_size = 46, num_classes = 12, flip = False): 25 | 26 | assert transform != None 27 | 28 | start_time = time.time() 29 | 30 | # Load instance data from csv-file 31 | im_paths, bbox, kp_loc, kp_cls, obj_cls, vp_labels = self.csv_to_instances(csv_path) 32 | csv_length = len(im_paths) 33 | 34 | # dataset parameters 35 | self.root = dataset_root 36 | self.loader = self.pil_loader 37 | self.im_paths = im_paths 38 | self.bbox = bbox 39 | self.kp_loc = kp_loc 40 | self.kp_cls = kp_cls 41 | self.obj_cls = obj_cls 42 | self.vp_labels = vp_labels 43 | self.img_size = im_size 44 | self.map_size = map_size 45 | self.num_classes = num_classes 46 | self.num_instances = len(self.im_paths) 47 | self.transform = transform 48 | 49 | # Print out dataset stats 50 | print("================================") 51 | print("Pascal3D (w/ Keypoints) Stats: ") 52 | print("CSV file length : ", len(im_paths)) 53 | print("Dataset size : ", self.num_instances) 54 | print("Loading time (s) : ", time.time() - start_time) 55 | 56 | 57 | """ 58 | __getitem__ method: 59 | Args: 60 | index (int): Index 61 | Returns: 62 | tuple: (image, target) where target is class_index of the target class. 63 | """ 64 | def __getitem__(self, index): 65 | 66 | # Load and transform image 67 | if self.root == None: 68 | c = self.im_paths[index] 69 | else: 70 | im_path = os.path.join(self.root, self.im_paths[index]) 71 | 72 | bbox = list(self.bbox[index]) 73 | kp_loc = list(self.kp_loc[index]) 74 | kp_cls = self.kp_cls[index] 75 | obj_cls = self.obj_cls[index] 76 | 77 | view = self.vp_labels[index] 78 | 79 | # Transform labels 80 | azim, elev, tilt = (view + 360.) % 360. 81 | 82 | # Load and transform image 83 | img, kp_loc = self.loader(im_path, bbox, kp_loc) 84 | img = self.transform(img) 85 | 86 | # Generate keypoint map image, and kp class vector 87 | kpc_vec = np.zeros( (34) ) 88 | kpc_vec[kp_cls] = 1 89 | kp_class = torch.from_numpy(kpc_vec).float() 90 | 91 | kpm_map = self.generate_kp_map_chebyshev(kp_loc) 92 | kp_map = torch.from_numpy(kpm_map).float() 93 | 94 | # construct unique key for statistics -- only need to generate imid and year 95 | _bb = str(bbox[0]) + '-' + str(bbox[1]) + '-' + str(bbox[2]) + '-' + str(bbox[3]) 96 | key_uid = self.im_paths[index] + '_' + _bb + '_objc' + str(obj_cls) + '_kpc' + str(kp_cls) 97 | 98 | return img, azim, elev, tilt, obj_cls, kp_map, kp_class, key_uid 99 | 100 | """ 101 | Retuns the Length of the dataset 102 | """ 103 | def __len__(self): 104 | return self.num_instances 105 | 106 | """ 107 | Image loader 108 | Inputs: 109 | path absolute image path 110 | bbox 4-element tuple (x_min, y_min, x_max, y_max) 111 | flip boolean for flipping image horizontally 112 | kp_loc 2-element tuple (x_loc, y_loc) 113 | """ 114 | def pil_loader(self, path, bbox, kp_loc): 115 | # open path as file to avoid ResourceWarning (https://github.com/python-pillow/Pillow/issues/835) 116 | with open(path, 'rb') as f: 117 | with Image.open(f) as img: 118 | # Calculate relative kp_loc position 119 | kp_loc[0] = float(kp_loc[0]-bbox[0])/float(bbox[2]-bbox[0]) 120 | kp_loc[1] = float(kp_loc[1]-bbox[1])/float(bbox[3]-bbox[1]) 121 | 122 | # Convert to RGB, crop, and resize 123 | img = img.convert('RGB') 124 | 125 | # Convert to BGR from RGB 126 | r, g, b = img.split() 127 | img = Image.merge("RGB", (b, g, r)) 128 | 129 | img = img.crop(box=bbox) 130 | img = img.resize( (self.img_size, self.img_size), Image.LANCZOS) 131 | 132 | return img, kp_loc 133 | 134 | """ 135 | Convert CSV file to instances 136 | """ 137 | def csv_to_instances(self, csv_path): 138 | # imgPath,bboxTLX,bboxTLY,bboxBRX,bboxBRY,imgKeyptX,imgKeyptY,keyptClass,objClass,azimuthClass,elevationClass,rotationClass 139 | # /z/.../datasets/pascal3d/Images/bus_pascal/2008_000032.jpg,5,117,488,273,9.186347,158.402214,1,4,1756,1799,1443 140 | 141 | df = pandas.read_csv(csv_path, sep=',') 142 | data = df.values 143 | 144 | data_split = np.split(data, [0, 1, 5, 7, 8, 9, 12], axis=1) 145 | del(data_split[0]) 146 | 147 | image_paths = np.squeeze(data_split[0]).tolist() 148 | 149 | # if self.root != None: 150 | # image_paths = [path.split('pascal3d/')[1] for path in image_paths] 151 | 152 | bboxes = data_split[1].tolist() 153 | kp_loc = data_split[2].tolist() 154 | kp_class = np.squeeze(data_split[3]).tolist() 155 | obj_class = np.squeeze(data_split[4]).tolist() 156 | viewpoints = np.array(data_split[5].tolist()) 157 | 158 | return image_paths, bboxes, kp_loc, kp_class, obj_class, viewpoints 159 | 160 | 161 | """ 162 | Generate Chbyshev-based map given a keypoint location 163 | """ 164 | def generate_kp_map_chebyshev(self, kp): 165 | 166 | assert kp[0] >= 0. and kp[0] <= 1., kp 167 | assert kp[1] >= 0. and kp[1] <= 1., kp 168 | kp_map = np.ndarray( (self.map_size, self.map_size) ) 169 | 170 | 171 | kp[0] = kp[0] * self.map_size 172 | kp[1] = kp[1] * self.map_size 173 | 174 | for i in range(0, self.map_size): 175 | for j in range(0, self.map_size): 176 | kp_map[i,j] = max( np.abs(i - kp[0]), np.abs(j - kp[1])) 177 | 178 | # Normalize by dividing by the maximum possible value, which is self.IMG_SIZE -1 179 | kp_map = kp_map / (1. * self.map_size) 180 | # kp_map = -2. * (kp_map - 0.5) 181 | 182 | return kp_map 183 | 184 | -------------------------------------------------------------------------------- /datasets/vp_util.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import os 3 | import sys 4 | 5 | def label_to_probs(view_angles, flip): 6 | # extract angles 7 | azim = view_angles[0] % 360 8 | elev = view_angles[1] % 360 9 | tilt = view_angles[2] % 360 10 | 11 | if flip: 12 | azim = (360-azim) % 360 13 | tilt = (-1 *tilt) % 360 14 | 15 | -------------------------------------------------------------------------------- /model_weights/get_weights.sh: -------------------------------------------------------------------------------- 1 | wget http://www-personal.umich.edu/~mbanani/clickhere/weights/r4cnn.pkl 2 | wget http://www-personal.umich.edu/~mbanani/clickhere/weights/ch_cnn.npy 3 | -------------------------------------------------------------------------------- /models/__init__.py: -------------------------------------------------------------------------------- 1 | from .render4cnn import * 2 | from .clickhere_cnn import * 3 | -------------------------------------------------------------------------------- /models/clickhere_cnn.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import torch.nn as nn 3 | import torch.nn.functional as F 4 | import numpy as np 5 | from IPython import embed 6 | 7 | class clickhere_cnn(nn.Module): 8 | def __init__(self, renderCNN, weights_path = None, num_classes = 12): 9 | super(clickhere_cnn, self).__init__() 10 | 11 | # Image Stream 12 | self.conv4 = renderCNN.conv4 13 | self.conv5 = renderCNN.conv5 14 | 15 | self.infer = nn.Sequential( 16 | nn.Linear(9216,4096), 17 | nn.ReLU(), 18 | nn.Dropout(0.5), 19 | nn.Linear(4096,4096), 20 | nn.ReLU(), 21 | nn.Dropout(0.5)) 22 | 23 | #Keypoint Stream 24 | self.kp_map = nn.Linear(2116,2116) 25 | self.kp_class = nn.Linear(34,34) 26 | self.kp_fuse = nn.Linear(2150,169) 27 | self.pool_map = nn.MaxPool2d( (5,5), (5,5), (1,1), ceil_mode=True) 28 | 29 | # Fused layer 30 | self.fusion = nn.Sequential(nn.Linear(4096 + 384, 4096), nn.ReLU(), nn.Dropout(0.5)) 31 | 32 | # Prediction layers 33 | self.azim = nn.Linear(4096, 12 * 360) 34 | self.elev = nn.Linear(4096, 12 * 360) 35 | self.tilt = nn.Linear(4096, 12 * 360) 36 | 37 | if weights_path is not None: 38 | self.init_weights(weights_path) 39 | 40 | 41 | def init_weights(self, weights_path): 42 | npy_dict = np.load(weights_path, allow_pickle = True, encoding = 'latin1').item() 43 | 44 | state_dict = npy_dict 45 | # Convert parameters to torch tensors 46 | for key in list(npy_dict.keys()): 47 | state_dict[key]['weight'] = torch.from_numpy(npy_dict[key]['weight']) 48 | state_dict[key]['bias'] = torch.from_numpy(npy_dict[key]['bias']) 49 | 50 | self.conv4[0].weight.data.copy_(state_dict['conv1']['weight']) 51 | self.conv4[0].bias.data.copy_(state_dict['conv1']['bias']) 52 | self.conv4[4].weight.data.copy_(state_dict['conv2']['weight']) 53 | self.conv4[4].bias.data.copy_(state_dict['conv2']['bias']) 54 | self.conv4[8].weight.data.copy_(state_dict['conv3']['weight']) 55 | self.conv4[8].bias.data.copy_(state_dict['conv3']['bias']) 56 | self.conv4[10].weight.data.copy_(state_dict['conv4']['weight']) 57 | self.conv4[10].bias.data.copy_(state_dict['conv4']['bias']) 58 | self.conv5[0].weight.data.copy_(state_dict['conv5']['weight']) 59 | self.conv5[0].bias.data.copy_(state_dict['conv5']['bias']) 60 | 61 | self.infer[0].weight.data.copy_(state_dict['fc6']['weight']) 62 | self.infer[0].bias.data.copy_(state_dict['fc6']['bias']) 63 | self.infer[3].weight.data.copy_(state_dict['fc7']['weight']) 64 | self.infer[3].bias.data.copy_(state_dict['fc7']['bias']) 65 | self.fusion[0].weight.data.copy_(state_dict['fc8']['weight']) 66 | self.fusion[0].bias.data.copy_(state_dict['fc8']['bias']) 67 | 68 | self.kp_map.weight.data.copy_(state_dict['fc-keypoint-map']['weight']) 69 | self.kp_map.bias.data.copy_(state_dict['fc-keypoint-map']['bias']) 70 | self.kp_class.weight.data.copy_(state_dict['fc-keypoint-class']['weight']) 71 | self.kp_class.bias.data.copy_(state_dict['fc-keypoint-class']['bias']) 72 | self.kp_fuse.weight.data.copy_(state_dict['fc-keypoint-concat']['weight']) 73 | self.kp_fuse.bias.data.copy_(state_dict['fc-keypoint-concat']['bias']) 74 | 75 | self.azim.weight.data.copy_( state_dict['pred_azimuth' ]['weight'] ) 76 | self.elev.weight.data.copy_( state_dict['pred_elevation']['weight'] ) 77 | self.tilt.weight.data.copy_( state_dict['pred_tilt' ]['weight'] ) 78 | 79 | self.azim.bias.data.copy_( state_dict['pred_azimuth' ]['bias'] ) 80 | self.elev.bias.data.copy_( state_dict['pred_elevation']['bias'] ) 81 | self.tilt.bias.data.copy_( state_dict['pred_tilt' ]['bias'] ) 82 | 83 | 84 | def forward(self, images, kp_map, kp_cls, obj_class): 85 | # Image Stream 86 | conv4 = self.conv4(images) 87 | im_stream = self.conv5(conv4) 88 | im_stream = im_stream.view(im_stream.size(0), -1) 89 | im_stream = self.infer(im_stream) 90 | 91 | # Keypoint Stream 92 | kp_map = kp_map.view(kp_map.size(0), -1) 93 | kp_map = self.kp_map(kp_map) 94 | kp_cls = self.kp_class(kp_cls) 95 | 96 | # Concatenate the two keypoint feature vectors 97 | kp_stream = torch.cat([kp_map, kp_cls], dim = 1) 98 | 99 | # Softmax followed by reshaping into a 13x13 100 | # Conv4 as shape batch * 384 * 13 * 13 101 | kp_stream = F.softmax(self.kp_fuse(kp_stream), dim=1) 102 | kp_stream = kp_stream.view(kp_stream.size(0) ,1, 13, 13) 103 | 104 | # Attention -> Elt. wise product, then summation over x and y dims 105 | kp_stream = kp_stream * conv4 # CHECK IF THIS DOES WHAT I THINK IT DOES!! TODO 106 | kp_stream = kp_stream.sum(3).sum(2) 107 | 108 | # Concatenate fc7 and attended features 109 | fused_embed = torch.cat([im_stream, kp_stream], dim = 1) 110 | fused_embed = self.fusion(fused_embed) 111 | 112 | # Final inference 113 | azim = self.azim(fused_embed) 114 | elev = self.elev(fused_embed) 115 | tilt = self.tilt(fused_embed) 116 | 117 | # mask on class 118 | azim = self.azim(fused_embed) 119 | azim = azim.view(-1, 12, 360) 120 | azim = azim[torch.arange(fused_embed.shape[0]), obj_class, :] 121 | elev = self.elev(fused_embed) 122 | elev = elev.view(-1, 12, 360) 123 | elev = elev[torch.arange(fused_embed.shape[0]), obj_class, :] 124 | tilt = self.tilt(fused_embed) 125 | tilt = tilt.view(-1, 12, 360) 126 | tilt = tilt[torch.arange(fused_embed.shape[0]), obj_class, :] 127 | 128 | return azim, tilt, elev 129 | -------------------------------------------------------------------------------- /models/render4cnn.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import torch.nn as nn 3 | from IPython import embed 4 | 5 | class render4cnn(nn.Module): 6 | def __init__(self, weights_path = None): 7 | super(render4cnn, self).__init__() 8 | 9 | # define model 10 | self.conv4 = nn.Sequential( 11 | nn.Conv2d(3, 96, (11, 11), (4,4)), 12 | nn.ReLU(), 13 | nn.MaxPool2d( (3,3), (2,2), (0,0), ceil_mode=True), 14 | nn.LocalResponseNorm(5, alpha=0.0001, beta=0.75, k=1.), 15 | nn.Conv2d(96, 256, (5, 5), (1,1), (2,2), 1,2), 16 | nn.ReLU(), 17 | nn.MaxPool2d( (3,3), (2,2), (0,0), ceil_mode=True), 18 | nn.LocalResponseNorm(5, alpha=0.0001, beta=0.75, k=1.), 19 | nn.Conv2d(256,384,(3, 3),(1, 1),(1, 1)), 20 | nn.ReLU(), 21 | nn.Conv2d(384,384,(3, 3),(1, 1),(1, 1),1,2), 22 | nn.ReLU(), 23 | ) 24 | 25 | self.conv5 = nn.Sequential( 26 | nn.Conv2d(384,256,(3, 3),(1, 1),(1, 1),1,2), 27 | nn.ReLU(), 28 | nn.MaxPool2d((3, 3),(2, 2),(0, 0),ceil_mode=True), 29 | ) 30 | 31 | self.infer = nn.Sequential( 32 | nn.Linear(9216,4096), 33 | nn.ReLU(), 34 | nn.Dropout(0.5), 35 | nn.Linear(4096,4096), 36 | nn.ReLU(), 37 | nn.Dropout(0.5), 38 | ) 39 | 40 | self.azim = nn.Linear(4096, 12 * 360) 41 | self.elev = nn.Linear(4096, 12 * 360) 42 | self.tilt = nn.Linear(4096, 12 * 360) 43 | 44 | if weights_path is not None: 45 | self._initialize_weights(weights_path) 46 | 47 | # weight initialization from torchvision/models/vgg.py 48 | def _initialize_weights(self, weights_path): 49 | state_dict = torch.load(weights_path)['model_state_dict'] 50 | 51 | layers = [0, 4, 8, 10] 52 | for l in layers: 53 | self.conv4[l].weight.data.copy_( state_dict['conv4.'+str(l) + '.weight']) 54 | self.conv4[l].bias.data.copy_( state_dict['conv4.'+str(l) + '.bias']) 55 | 56 | self.conv5[0].weight.data.copy_( state_dict['conv5.0.weight']) 57 | self.conv5[0].bias.data.copy_( state_dict['conv5.0.bias']) 58 | 59 | self.infer[0].weight.data.copy_( state_dict['infer.0.weight']) 60 | self.infer[0].bias.data.copy_( state_dict['infer.0.bias']) 61 | self.infer[3].weight.data.copy_( state_dict['infer.3.weight']) 62 | self.infer[3].bias.data.copy_( state_dict['infer.3.bias']) 63 | 64 | self.azim.weight.data.copy_( state_dict['azim.0.weight']) 65 | self.azim.bias.data.copy_( state_dict['azim.0.bias']) 66 | self.elev.weight.data.copy_( state_dict['elev.0.weight']) 67 | self.elev.bias.data.copy_( state_dict['elev.0.bias']) 68 | self.tilt.weight.data.copy_( state_dict['tilt.0.weight']) 69 | self.tilt.bias.data.copy_( state_dict['tilt.0.bias']) 70 | 71 | 72 | def forward(self, x, obj_class): 73 | # generate output 74 | x = self.conv4(x) 75 | x = self.conv5(x) 76 | x = x.view(x.shape[0], -1) 77 | x = self.infer(x) 78 | 79 | # mask on class 80 | azim = self.azim(x) 81 | azim = azim.view(-1, 12,360) 82 | azim = azim[torch.arange(x.shape[0]), obj_class, :] 83 | elev = self.elev(x) 84 | elev = elev.view(-1, 12,360) 85 | elev = elev[torch.arange(x.shape[0]), obj_class, :] 86 | tilt = self.tilt(x) 87 | tilt = tilt.view(-1, 12,360) 88 | tilt = tilt[torch.arange(x.shape[0]), obj_class, :] 89 | 90 | return azim, elev, tilt 91 | 92 | 93 | -------------------------------------------------------------------------------- /train.py: -------------------------------------------------------------------------------- 1 | import argparse, os, sys, shutil, time 2 | 3 | import numpy as np 4 | from IPython import embed 5 | 6 | import torch 7 | 8 | from util import SoftmaxVPLoss, Paths, get_data_loaders, kp_dict 9 | from models import clickhere_cnn, render4cnn 10 | from util.torch_utils import save_checkpoint 11 | 12 | def main(args): 13 | initialization_time = time.time() 14 | 15 | 16 | print("############# Read in Database ##############") 17 | train_loader, valid_loader = get_data_loaders( dataset = args.dataset, 18 | batch_size = args.batch_size, 19 | num_workers = args.num_workers, 20 | model = args.model) 21 | 22 | print("############# Initiate Model ##############") 23 | if args.model == 'chcnn': 24 | assert Paths.render4cnn_weights != None, "Error: Set render4cnn weights path in util/Paths.py." 25 | model = clickhere_cnn(render4cnn(), weights_path = Paths.clickhere_weights) 26 | args.no_keypoint = False 27 | elif args.model == 'r4cnn': 28 | assert Paths.render4cnn_weights != None, "Error: Set render4cnn weights path in util/Paths.py." 29 | model = render4cnn(weights_path = Paths.render4cnn_weights) 30 | args.no_keypoint = True 31 | else: 32 | assert False, "Error: unknown model choice." 33 | 34 | # Loss functions 35 | criterion = SoftmaxVPLoss() 36 | 37 | # Parameters to train 38 | if args.just_attention and (not args.no_keypoint): 39 | params = list(model.map_linear.parameters()) +list(model.cls_linear.parameters()) 40 | params = params + list(model.kp_softmax.parameters()) +list(model.fusion.parameters()) 41 | params = params + list(model.azim.parameters()) + list(model.elev.parameters()) 42 | params = params + list(model.tilt.parameters()) 43 | else: 44 | params = list(model.parameters()) 45 | 46 | # Optimizer 47 | optimizer = torch.optim.Adam(params, lr = args.lr) 48 | 49 | # train/evaluate on GPU 50 | model.cuda() 51 | 52 | print("Time to initialize take: ", time.time() - initialization_time) 53 | print("############# Start Training ##############") 54 | total_step = len(train_loader) 55 | 56 | for epoch in range(0, args.num_epochs): 57 | 58 | if epoch % args.eval_epoch == 0: 59 | eval_step( model = model, 60 | data_loader = valid_loader, 61 | criterion = criterion, 62 | step = epoch * total_step, 63 | datasplit = "valid") 64 | 65 | train_step( model = model, 66 | train_loader = train_loader, 67 | criterion = criterion, 68 | optimizer = optimizer, 69 | epoch = epoch, 70 | step = epoch * total_step) 71 | 72 | 73 | def train_step(model, train_loader, criterion, optimizer, epoch, step): 74 | model.train() 75 | total_step = len(train_loader) 76 | loss_sum = 0. 77 | 78 | for i, (images, azim_label, elev_label, tilt_label, obj_class, kp_map, kp_class, key_uid) in enumerate(train_loader): 79 | 80 | # Set mini-batch dataset 81 | images = images.cuda() 82 | azim_label = azim_label.cuda() 83 | elev_label = elev_label.cuda() 84 | tilt_label = tilt_label.cuda() 85 | obj_class = obj_class.cuda() 86 | 87 | # Forward, Backward and Optimize 88 | model.zero_grad() 89 | 90 | if args.no_keypoint: 91 | azim, elev, tilt = model(images, obj_class) 92 | else: 93 | kp_map = kp_map.cuda() 94 | kp_class = kp_class.cuda() 95 | azim, elev, tilt = model(images, kp_map, kp_class, obj_class) 96 | 97 | loss_a = criterion(azim, azim_label) 98 | loss_e = criterion(elev, elev_label) 99 | loss_t = criterion(tilt, tilt_label) 100 | loss = loss_a + loss_e + loss_t 101 | 102 | loss.backward() 103 | optimizer.step() 104 | 105 | loss_sum += loss.item() 106 | # Print log info 107 | if i % args.log_rate == 0 and i > 0: 108 | print("Epoch [%d/%d] Step [%d/%d]: Training Loss = %2.5f" %( epoch, args.num_epochs, i, total_step, loss_sum / (i + 1))) 109 | 110 | 111 | def eval_step( model, data_loader, criterion, step, datasplit): 112 | model.eval() 113 | 114 | total_step = len(data_loader) 115 | epoch_loss_a = 0. 116 | epoch_loss_e = 0. 117 | epoch_loss_t = 0. 118 | epoch_loss = 0. 119 | results_dict = kp_dict() 120 | 121 | for i, (images, azim_label, elev_label, tilt_label, obj_class, kp_map, kp_class, key_uid) in enumerate(data_loader): 122 | 123 | if i % args.log_rate == 0: 124 | print("Evaluation of %s [%d/%d] " % (datasplit, i, total_step)) 125 | 126 | # Set mini-batch dataset 127 | images = images.cuda() 128 | azim_label = azim_label.cuda() 129 | elev_label = elev_label.cuda() 130 | tilt_label = tilt_label.cuda() 131 | obj_class = obj_class.cuda() 132 | 133 | if args.no_keypoint: 134 | azim, elev, tilt = model(images, obj_class) 135 | else: 136 | kp_map = kp_map.cuda() 137 | kp_class = kp_class.cuda() 138 | azim, elev, tilt = model(images, kp_map, kp_class, obj_class) 139 | 140 | # embed() 141 | epoch_loss_a += criterion(azim, azim_label).item() 142 | epoch_loss_e += criterion(elev, elev_label).item() 143 | epoch_loss_t += criterion(tilt, tilt_label).item() 144 | 145 | results_dict.update_dict( key_uid, 146 | [azim.data.cpu().numpy(), elev.data.cpu().numpy(), tilt.data.cpu().numpy()], 147 | [azim_label.data.cpu().numpy(), elev_label.data.cpu().numpy(), tilt_label.data.cpu().numpy()]) 148 | 149 | 150 | type_accuracy, type_total, type_geo_dist = results_dict.metrics() 151 | 152 | geo_dist_median = [np.median(type_dist) * 180. / np.pi for type_dist in type_geo_dist if type_dist != [] ] 153 | type_accuracy = [ type_accuracy[i] * 100. for i in range(0, len(type_accuracy)) if type_total[i] > 0] 154 | w_acc = np.mean(type_accuracy) 155 | 156 | print("++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++") 157 | print("Type Acc_pi/6 : ", type_accuracy, " -> ", w_acc, " %") 158 | print("Type Median : ", [ int(1000 * a_type_med) / 1000. for a_type_med in geo_dist_median ], " -> ", int(1000 * np.mean(geo_dist_median)) / 1000., " degrees") 159 | print("Type Loss : ", [epoch_loss_a/total_step, epoch_loss_e/total_step, epoch_loss_t/total_step], " -> ", (epoch_loss_a + epoch_loss_e + epoch_loss_t ) / total_step) 160 | print("++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++") 161 | 162 | 163 | 164 | if __name__ == '__main__': 165 | 166 | parser = argparse.ArgumentParser() 167 | 168 | # logging parameters 169 | parser.add_argument('--eval_epoch', type=int , default=5) 170 | parser.add_argument('--log_rate', type=int, default=10) 171 | parser.add_argument('--num_workers', type=int, default=7) 172 | 173 | # training parameters 174 | parser.add_argument('--num_epochs', type=int, default=100) 175 | parser.add_argument('--batch_size', type=int, default=64) 176 | parser.add_argument('--lr', type=float, default=3e-4) 177 | parser.add_argument('--optimizer', type=str,default='sgd') 178 | 179 | # experiment details 180 | parser.add_argument('--dataset', type=str, default='pascal') 181 | parser.add_argument('--model', type=str, default='pretrained_clickhere') 182 | parser.add_argument('--experiment_name', type=str, default= 'Test') 183 | parser.add_argument('--just_attention', action="store_true",default=False) 184 | 185 | 186 | args = parser.parse_args() 187 | main(args) 188 | -------------------------------------------------------------------------------- /util/Paths.py: -------------------------------------------------------------------------------- 1 | import os 2 | 3 | pascal3d_root = '/home/mbanani/datasets/pascal3d' 4 | 5 | root_dir = os.path.dirname(os.path.dirname(os.path.abspath(__file__))) 6 | render4cnn_weights = os.path.join(root_dir, 'model_weights/r4cnn.pkl') 7 | ft_render4cnn_weights = os.path.join(root_dir, 'model_weights/ryan_render.npy') 8 | clickhere_weights = os.path.join(root_dir, 'model_weights/ch_cnn.npy') 9 | -------------------------------------------------------------------------------- /util/__init__.py: -------------------------------------------------------------------------------- 1 | from .vp_loss import SoftmaxVPLoss 2 | from .metrics import kp_dict 3 | from .load_datasets import * 4 | from . import Paths 5 | -------------------------------------------------------------------------------- /util/load_datasets.py: -------------------------------------------------------------------------------- 1 | import os,sys, math 2 | import torch 3 | import numpy as np 4 | import torchvision.transforms as transforms 5 | from datasets import pascal3d, pascal3d_kp 6 | 7 | from IPython import embed 8 | from .Paths import * 9 | 10 | root_dir = os.path.dirname(os.path.dirname(os.path.abspath(__file__))) 11 | dataset_root = pascal3d_root 12 | 13 | 14 | def get_data_loaders(dataset, batch_size, num_workers, model, num_classes = 12): 15 | 16 | image_size = 227 17 | train_transform = transforms.Compose([transforms.ToTensor(), 18 | transforms.Normalize( mean=(0., 0., 0.), 19 | std=(1./255., 1./255., 1./255.) 20 | ), 21 | transforms.Normalize( mean=(104, 116.668, 122.678), 22 | std=(1., 1., 1.) 23 | ) 24 | ]) 25 | 26 | test_transform = transforms.Compose([transforms.ToTensor(), 27 | transforms.Normalize( mean=(0., 0., 0.), 28 | std=(1./255., 1./255., 1./255.) 29 | ), 30 | transforms.Normalize( mean=(104, 116.668, 122.678), 31 | std=(1., 1., 1.) 32 | ) 33 | ]) 34 | 35 | # # The New transform for ImageNet Stuff 36 | # new_transform = transforms.Compose([ 37 | # transforms.ToTensor(), 38 | # transforms.Normalize(mean=(0.485, 0.456, 0.406), 39 | # std=(0.229, 0.224, 0.225))]) 40 | 41 | 42 | if dataset == "pascal": 43 | csv_train = os.path.join(root_dir, 'data/pascal3d_train.csv') 44 | csv_test = os.path.join(root_dir, 'data/pascal3d_valid.csv') 45 | 46 | train_set = pascal3d(csv_train, dataset_root= dataset_root, transform = train_transform, im_size = image_size) 47 | test_set = pascal3d(csv_test, dataset_root= dataset_root, transform = test_transform, im_size = image_size) 48 | elif dataset == "pascalEasy": 49 | csv_train = os.path.join(root_dir, 'data/pascal3d_train_easy.csv') 50 | csv_test = os.path.join(root_dir, 'data/pascal3d_valid_easy.csv') 51 | 52 | train_set = pascal3d( csv_train, dataset_root= dataset_root, 53 | transform = train_transform, im_size = image_size) 54 | test_set = pascal3d( csv_test, dataset_root= dataset_root, 55 | transform = test_transform, im_size = image_size) 56 | 57 | elif dataset == "pascalVehKP": 58 | csv_train = os.path.join(root_dir, 'data/veh_pascal3d_kp_train.csv') 59 | csv_test = os.path.join(root_dir, 'data/veh_pascal3d_kp_valid.csv') 60 | 61 | train_set = pascal3d_kp(csv_train, 62 | dataset_root= dataset_root, 63 | transform = train_transform, 64 | im_size = image_size, 65 | num_classes = num_classes) 66 | 67 | test_set = pascal3d_kp(csv_test, 68 | dataset_root= dataset_root, 69 | transform = test_transform, 70 | im_size = image_size, 71 | num_classes = num_classes) 72 | 73 | elif dataset == "pascalKP": 74 | csv_train = os.path.join(root_dir, 'data/pascal3d_kp_train.csv') 75 | csv_test = os.path.join(root_dir, 'data/pascal3d_kp_valid.csv') 76 | 77 | train_set = pascal3d_kp(csv_train, dataset_root= dataset_root, transform = train_transform, im_size = image_size) 78 | test_set = pascal3d_kp(csv_test, dataset_root= dataset_root, transform = test_transform, im_size = image_size) 79 | else: 80 | print("Error in load_datasets: Dataset name not defined.") 81 | 82 | 83 | 84 | # Generate data loaders 85 | train_loader = torch.utils.data.DataLoader( dataset = train_set, 86 | batch_size =batch_size, 87 | shuffle = True , 88 | num_workers =num_workers, 89 | drop_last = True) 90 | 91 | test_loader = torch.utils.data.DataLoader( dataset=test_set, 92 | batch_size=batch_size, 93 | shuffle = False, 94 | num_workers=num_workers, 95 | drop_last = False) 96 | 97 | return train_loader, test_loader 98 | 99 | -------------------------------------------------------------------------------- /util/metrics.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import scipy.misc 3 | 4 | from scipy import linalg as linAlg 5 | from IPython import embed 6 | 7 | def compute_angle_dists(preds, labels): 8 | # Get rotation matrices from prediction and ground truth angles 9 | predR = angle2dcm(preds[0], preds[1], preds[2]) 10 | gtR = angle2dcm(labels[0], labels[1], labels[2]) 11 | 12 | # Get geodesic distance 13 | return linAlg.norm(linAlg.logm(np.dot(predR.T, gtR)), 2) / np.sqrt(2) 14 | 15 | def angle2dcm(xRot, yRot, zRot, deg_type='deg'): 16 | if deg_type == 'deg': 17 | xRot = xRot * np.pi / 180.0 18 | yRot = yRot * np.pi / 180.0 19 | zRot = zRot * np.pi / 180.0 20 | 21 | xMat = np.array([ 22 | [np.cos(xRot), np.sin(xRot), 0], 23 | [-np.sin(xRot), np.cos(xRot), 0], 24 | [0, 0, 1] 25 | ]) 26 | 27 | yMat = np.array([ 28 | [np.cos(yRot), 0, -np.sin(yRot)], 29 | [0, 1, 0], 30 | [np.sin(yRot), 0, np.cos(yRot)] 31 | ]) 32 | 33 | zMat = np.array([ 34 | [1, 0, 0], 35 | [0, np.cos(zRot), np.sin(zRot)], 36 | [0, -np.sin(zRot), np.cos(zRot)] 37 | ]) 38 | 39 | return np.dot(zMat, np.dot(yMat, xMat)) 40 | 41 | 42 | class kp_dict(object): 43 | 44 | def __init__(self, num_classes = 12): 45 | self.keypoint_dict = dict() 46 | self.num_classes = num_classes 47 | self.class_ranges = list(range(0, 360*(self.num_classes + 1), 360)) 48 | self.threshold = np.pi / 6. 49 | 50 | """ 51 | Updates the keypoint dictionary 52 | params: unique_id unique id of each instance (NAME_objc#_kpc#) 53 | predictions the predictions for each vector 54 | """ 55 | def update_dict(self, unique_id, predictions, labels): 56 | """Log a scalar variable.""" 57 | if type(predictions) == int: 58 | predictions = [predictions] 59 | labels = [labels] 60 | 61 | for i in range(0, len(unique_id)): 62 | image = unique_id[i].split('_objc')[0] 63 | obj_class = int(unique_id[i].split('_objc')[1].split('_kpc')[0]) 64 | kp_class = int(unique_id[i].split('_objc')[1].split('_kpc')[1]) 65 | 66 | start_index = self.class_ranges[obj_class] 67 | end_index = self.class_ranges[obj_class + 1] 68 | 69 | 70 | pred_probs = ( predictions[0][i], predictions[1][i], predictions[2][i]) 71 | 72 | label_probs = ( labels[0][i], labels[1][i], labels[2][i]) 73 | 74 | 75 | if image in list(self.keypoint_dict.keys()): 76 | self.keypoint_dict[image][kp_class] = pred_probs 77 | else: 78 | self.keypoint_dict[image] = {'class' : obj_class, 'label' : label_probs, kp_class : pred_probs} 79 | 80 | 81 | def calculate_geo_performance(self): 82 | for image in list(self.keypoint_dict.keys()): 83 | curr_label = self.keypoint_dict[image]['label'] 84 | self.keypoint_dict[image]['geo_dist'] = dict() 85 | self.keypoint_dict[image]['correct'] = dict() 86 | for kp in list(self.keypoint_dict[image].keys()): 87 | if type(kp) != str : 88 | curr_pred = [ np.argmax(self.keypoint_dict[image][kp][0]), 89 | np.argmax(self.keypoint_dict[image][kp][1]), 90 | np.argmax(self.keypoint_dict[image][kp][2])] 91 | self.keypoint_dict[image]['geo_dist'][kp] = compute_angle_dists(curr_pred, curr_label) 92 | self.keypoint_dict[image]['correct'][kp] = 1 if (self.keypoint_dict[image]['geo_dist'][kp] <= self.threshold) else 0 93 | 94 | def metrics(self, unique = False): 95 | self.calculate_geo_performance() 96 | 97 | type_geo_dist = [ [] for x in range(0, self.num_classes)] 98 | type_correct = np.zeros(self.num_classes, dtype=np.float32) 99 | type_total = np.zeros(self.num_classes, dtype=np.float32) 100 | 101 | for image in list(self.keypoint_dict.keys()): 102 | object_type = self.keypoint_dict[image]['class'] 103 | curr_correct = 0. 104 | curr_total = 0. 105 | curr_geodist = [] 106 | for kp in list(self.keypoint_dict[image]['correct'].keys()): 107 | curr_correct += self.keypoint_dict[image]['correct'][kp] 108 | curr_total += 1. 109 | curr_geodist.append(self.keypoint_dict[image]['geo_dist'][kp]) 110 | 111 | if unique: 112 | curr_correct = curr_correct / curr_total 113 | curr_total = 1. 114 | curr_geodist = [np.median(curr_geodist)] 115 | 116 | 117 | 118 | type_correct[object_type] += curr_correct 119 | type_total[object_type] += curr_total 120 | for dist in curr_geodist: 121 | type_geo_dist[object_type].append(dist) 122 | 123 | type_accuracy = np.zeros(self.num_classes, dtype=np.float16) 124 | for i in range(0, self.num_classes): 125 | if type_total[i] > 0: 126 | type_accuracy[i] = float(type_correct[i]) / type_total[i] 127 | 128 | self.calculate_performance_baselines() 129 | return type_accuracy, type_total, type_geo_dist 130 | 131 | 132 | def calculate_performance_baselines(self, mode = 'real'): 133 | 134 | mean_baseline = [ [] for x in range(0, self.num_classes)] 135 | total_baseline = [ [] for x in range(0, self.num_classes)] 136 | 137 | #iterate over batch 138 | for image in list(self.keypoint_dict.keys()): 139 | obj_cls = self.keypoint_dict[image]['class'] 140 | 141 | perf = [self.keypoint_dict[image]['geo_dist'][kp] for kp in list(self.keypoint_dict[image]['geo_dist'].keys())] 142 | 143 | # Append baselines 144 | mean_baseline[obj_cls ].append(np.mean(perf)) 145 | for p in perf: 146 | total_baseline[obj_cls ].append(p ) 147 | 148 | 149 | # embed() 150 | accuracy_mean = np.around([ 100. * np.mean([ num < self.threshold for num in mean_baseline[i] ]) for i in range(0, self.num_classes) ], decimals = 2) 151 | accuracy_total = np.around([ 100. * np.mean([ num < self.threshold for num in total_baseline[i] ]) for i in range(0, self.num_classes) ], decimals = 2) 152 | 153 | medError_mean = np.around([ (180. / np.pi ) * np.median(mean_baseline[i] ) for i in range(0, self.num_classes) ], decimals = 2) 154 | medError_total = np.around([ (180. / np.pi ) * np.median(total_baseline[i] ) for i in range(0, self.num_classes) ], decimals = 2) 155 | 156 | 157 | if np.isnan(accuracy_mean[0]): 158 | accuracy_mean = accuracy_mean[[4,5,8]] 159 | accuracy_total = accuracy_total[[4,5,8]] 160 | medError_mean = medError_mean[[4,5,8]] 161 | medError_total = medError_total[[4,5,8]] 162 | 163 | print("--------------------------------------------") 164 | print("Accuracy ") 165 | print("mean : ", accuracy_mean , " -- mean : ", np.round(np.mean(accuracy_mean ), decimals = 2)) 166 | print("total : ", accuracy_total , " -- mean : ", np.round(np.mean(accuracy_total ), decimals = 2)) 167 | print("") 168 | print("Median Error ") 169 | print("mean : ", medError_mean , " -- mean : ", np.round(np.mean(medError_mean ), decimals = 2)) 170 | print("total : ", medError_total , " -- mean : ", np.round(np.mean(medError_total ), decimals = 2)) 171 | print("--------------------------------------------") 172 | -------------------------------------------------------------------------------- /util/torch_utils.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import os 3 | import shutil 4 | 5 | def save_checkpoint(model, optimizer, curr_epoch, curr_step, args, curr_loss, curr_acc, filename): 6 | """ 7 | Saves a checkpoint and updates the best loss and best weighted accuracy 8 | """ 9 | is_best_loss = curr_loss < args.best_loss 10 | is_best_acc = curr_acc > args.best_acc 11 | 12 | args.best_acc = max(args.best_acc, curr_acc) 13 | args.best_loss = min(args.best_loss, curr_loss) 14 | 15 | state = { 'epoch':curr_epoch, 16 | 'step': curr_step, 17 | 'args': args, 18 | 'state_dict': model.state_dict(), 19 | 'val_loss': args.best_loss, 20 | 'val_acc': args.best_acc, 21 | 'optimizer' : optimizer.state_dict(), 22 | } 23 | path = os.path.join(args.experiment_path, filename) 24 | torch.save(state, path) 25 | if is_best_loss: 26 | shutil.copyfile(path, os.path.join(args.experiment_path, 'model_best_loss.pkl')) 27 | if is_best_acc: 28 | shutil.copyfile(path, os.path.join(args.experiment_path, 'model_best_acc.pkl')) 29 | 30 | return args 31 | 32 | def accuracy(output, target, topk=(1,)): 33 | """ From The PyTorch ImageNet example """ 34 | """Computes the precision@k for the specified values of k""" 35 | maxk = max(topk) 36 | batch_size = target.size(0) 37 | 38 | _, pred = output.topk(maxk, 1, True, True) 39 | pred = pred.t() 40 | correct = pred.eq(target.view(1, -1).expand_as(pred)) 41 | 42 | res = [] 43 | for k in topk: 44 | correct_k = correct[:k].view(-1).float().sum(0, keepdim=True) 45 | res.append(correct_k.mul_(100.0 / batch_size)) 46 | return res 47 | -------------------------------------------------------------------------------- /util/vp_loss.py: -------------------------------------------------------------------------------- 1 | """ 2 | Multi-class Geometric Viewpoint Aware Loss 3 | A PyTorch implmentation of the geometric-aware softmax view loss as described in 4 | RenderForCNN (link: https://arxiv.org/pdf/1505.05641.pdf) 5 | Caffe implmentation: 6 | https://github.com/charlesq34/caffe-render-for-cnn/blob/view_prediction/ 7 | """ 8 | import torch 9 | 10 | from torch import nn 11 | from IPython import embed 12 | 13 | import torch.nn.functional as F 14 | import numpy as np 15 | 16 | class SoftmaxVPLoss(nn.Module): 17 | # Loss parameters taken directly from Render4CNN paper 18 | # azim_band_width = 7 # 15 in paper 19 | # elev_band_width = 2 # 5 in paper 20 | # tilt_band_width = 2 # 5 in paper 21 | 22 | # azim_sigma = 5 23 | # elev_sigma = 3 24 | # tilt_sigma = 3 25 | def __init__(self, kernel_size = 7, sigma=25): 26 | super(SoftmaxVPLoss, self).__init__() 27 | 28 | self.filter = self.viewloss_filter(kernel_size, sigma) 29 | self.kernel_size = kernel_size 30 | 31 | def viewloss_filter(self, size, sigma): 32 | vec = np.linspace(-1*size, size, 1 + 2*size, dtype=np.float) 33 | prob = np.exp(-1 * abs(vec) / sigma) 34 | # # normalize filter because otherwise loss will scale with kernel size 35 | # prob = prob / np.sum(prob) 36 | prob = torch.FloatTensor(prob)[None, None, :] # 1 x 1 x (1 + 2*size) 37 | return prob 38 | 39 | 40 | def forward(self, preds, labels, size_average=True): 41 | """ 42 | :param preds: Angle predictions (batch_size, 360 x num_classes) 43 | :param targets: Angle labels (batch_size, 360 x num_classes) 44 | :return: Loss. Loss is a variable which may have a backward pass performed. 45 | Apply softmax over the preds, and then apply geometrics loss 46 | """ 47 | # Set absolute minimum for numerical stability (assuming float16 - 6x10^-5) 48 | assert len(labels.shape) == 1 49 | batch_size = labels.shape[0] 50 | 51 | # Construct onehot labels -- dimension has to be (batch x 1 x dimension) 52 | """ 53 | I thought that creation of a new tensor might slow down calculations 54 | but it doesn't seem to slow things; 55 | 10^4 iterations of scatter to batch of 256 took 0.5 sec 56 | 57 | speed test: 58 | scatter is ~25% faster for small batches (32), 59 | indexing is ~25% faster for larger batches (>1024) 60 | both are the same around 256 batch size 61 | """ 62 | labels = labels.long() 63 | labels_oh = torch.zeros(batch_size, 360) 64 | labels_oh[torch.arange(batch_size), labels] = 1. 65 | 66 | x = labels_oh.cuda() 67 | 68 | # Concat one hot vector and convolve 69 | labels_oh = torch.cat( ( labels_oh[:, -self.kernel_size:], 70 | labels_oh, 71 | labels_oh[:, :self.kernel_size]), 72 | dim = 1) 73 | 74 | labels_oh = F.conv1d(labels_oh[:, None, :], self.filter) 75 | 76 | # convert labels to CUDA 77 | labels_oh = labels_oh.squeeze(1).cuda() 78 | 79 | # calculate loss -- sum from paper 80 | loss = (-1 * labels_oh * preds.log_softmax(1)).sum(1) 81 | #loss = loss / (1 + 2 * self.kernel_size) 82 | 83 | # loss = F.mse_loss(preds.softmax(1), labels_oh, reduction = 'sum') 84 | 85 | if size_average: 86 | loss = loss.mean() 87 | else: 88 | loss = loss.sum() 89 | 90 | return loss 91 | --------------------------------------------------------------------------------