├── 015994080.jpg
├── example_result.jpg
├── LICENSE
├── README.MD
├── visu.py
├── postprocess.py
├── dope.py
├── NOTICE
├── visu3d.py
└── model.py

/015994080.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/naver/dope/HEAD/015994080.jpg
--------------------------------------------------------------------------------
/example_result.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/naver/dope/HEAD/example_result.jpg
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | DOPE, Copyright (c) 2020 Naver Corporation, is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 license.
2 | 
3 | A summary of the CC BY-NC-SA 4.0 license is located here:
4 | https://creativecommons.org/licenses/by-nc-sa/4.0/
5 | 
6 | The CC BY-NC-SA 4.0 license is located here:
7 | https://creativecommons.org/licenses/by-nc-sa/4.0/legalcode
8 | 
9 | 
--------------------------------------------------------------------------------
/README.MD:
--------------------------------------------------------------------------------
1 | # DOPE: Distillation Of Part Experts for whole-body 3D pose estimation in the wild
2 | 
3 | This repository contains the code for running our DOPE model.
4 | We only provide code for testing, not for training.
5 | If you use our code, please cite our [ECCV'20 paper](https://www.ecva.net/papers/eccv_2020/papers_ECCV/papers/123710375.pdf):
6 | 
7 | ```bibtex
8 | @inproceedings{dope,
9 | title={{DOPE: Distillation Of Part Experts for whole-body 3D pose estimation in the wild}},
10 | author={Weinzaepfel, Philippe and Br\'egier, Romain and Combaluzier, Hadrien and Leroy, Vincent and Rogez, Gr\'egory},
11 | booktitle={{ECCV}},
12 | year={2020}
13 | }
14 | ```
15 | 
16 | ## License
17 | 
18 | DOPE is distributed under the CC BY-NC-SA 4.0 License. See [LICENSE](LICENSE) for more information.
19 | 
20 | ## Getting started
21 | 
22 | Our Python 3 code requires the following packages:
23 | * pytorch
24 | * torchvision
25 | * opencv (for drawing the results)
26 | * numpy/scipy
27 | 
28 | Our code has been tested on Linux, with pytorch 1.5 and torchvision 0.6.
29 | We do not provide support for installation.
30 | 
31 | #### Download the models
32 | 
33 | First create a folder `models/` in which you should place the downloaded pretrained models.
34 | The list of models includes:
35 | * [DOPE_v1_0_0](http://download.europe.naverlabs.com/ComputerVision/DOPE_models/DOPE_v1_0_0.pth.tgz), as used in our ECCV'20 paper
36 | * [DOPErealtime_v1_0_0](http://download.europe.naverlabs.com/ComputerVision/DOPE_models/DOPErealtime_v1_0_0.pth.tgz), its real-time version
37 | 
38 | #### Post-processing with a modified version of LCR-Net++
39 | 
40 | Our post-processing relies on a modified version of the pose proposals integration proposed in the [LCR-Net++ code](https://thoth.inrialpes.fr/src/LCR-Net/).
41 | To get this code, clone our modified LCR-Net++ repository from within the DOPE folder:
42 | ```
43 | git clone https://github.com/naver/lcrnet-v2-improved-ppi.git
44 | ```
45 | 
46 | Alternatively, you can use a more naive post-processing based on non-maximum suppression by adding `--postprocess nms` to the command lines below, which results in a slight decrease in performance.
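If you want to sanity-check this setup before running anything, here is a minimal sketch (not part of the repository; the model name is only an example) that verifies a downloaded checkpoint is present in `models/` and that the cloned `lcrnet-v2-improved-ppi` folder is importable, mirroring the checks that `dope.py` performs at run time:

```python
import os.path as osp
import sys

modelname = 'DOPE_v1_0_0'  # example name; use the model you actually downloaded

# dope.py looks for the checkpoint at models/<modelname>.pth.tgz
ckpt_fname = osp.join('models', modelname + '.pth.tgz')
print(ckpt_fname, 'found' if osp.isfile(ckpt_fname) else 'MISSING -- download it first')

# the default 'ppi' post-processing needs the cloned lcrnet-v2-improved-ppi folder
sys.path.append('lcrnet-v2-improved-ppi')
try:
    from lcr_net_ppi_improved import LCRNet_PPI_improved  # noqa: F401
    print('ppi post-processing available')
except ModuleNotFoundError:
    print('lcrnet-v2-improved-ppi not found; use --postprocess nms instead')
```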
47 | 
48 | 
49 | ## Using the code
50 | 
51 | To use our code on an image, run the following command:
52 | 
53 | ```
54 | python dope.py --model <modelname> --image <imagename>
55 | ```
56 | 
57 | with
58 | * `<modelname>`: name of the model to use (e.g. DOPE_v1_0_0)
59 | * `<imagename>`: name of the image to test
60 | 
61 | For instance, you can run
62 | ```
63 | python dope.py --model DOPErealtime_v1_0_0 --image 015994080.jpg
64 | ```
65 | 
66 | The command will create an image `<imagename>_<modelname>.jpg` that shows the 2D poses output by our DOPE model.
67 | 
68 | We also provide code to visualize the results both in 2D and in 3D: just add the argument `--visu3d` to the previous command line.
69 | The 3D visualization uses OpenGL and the visvis Python package; we do not provide support for installation or OpenGL issues.
70 | Running the command with the `--visu3d` option should create another file with the 3D visualization, named `<imagename>_<modelname>_visu3d.jpg`.
71 | Note that DOPE predicts 3D poses for each body part in its own relative coordinate system, e.g. centered on the body center for bodies.
72 | For better visualization, we approximate 3D scene coordinates by finding the offsets that minimize the reprojection error under a scaled orthographic projection model.
73 | 
74 | Here is one example of a resulting image:
75 | ![example result](example_result.jpg)
76 | 
77 | Our real-time models use half-precision computation. In case your device cannot handle it, please uncomment the line `#ckpt['half'] = False` in `dope.py`.
78 | 
79 | 
80 | 
--------------------------------------------------------------------------------
/visu.py:
--------------------------------------------------------------------------------
1 | # Copyright 2020-present NAVER Corp.
2 | # CC BY-NC-SA 4.0
3 | # Available only for non-commercial use
4 | 
5 | import numpy as np
6 | import cv2
7 | 
8 | def _get_bones_and_colors(J, ignore_neck=False): # colors in BGR
9 | """
10 | param J: number of joints -- used to deduce the body part considered.
11 | param ignore_neck: if True, the neck bone of won't be returned in case of a body (J==13) 12 | """ 13 | if J==13: # full body (similar to LCR-Net) 14 | lbones = [(9,11),(7,9),(1,3),(3,5)] 15 | if ignore_neck: 16 | rbones = [(0,2),(2,4),(8,10),(6,8)] + [(4,5),(10,11)] + [([4,5],[10,11])] 17 | else: 18 | rbones = [(0,2),(2,4),(8,10),(6,8)] + [(4,5),(10,11)] + [([4,5],[10,11]),(12,[10,11])] 19 | bonecolors = [ [0,255,0] ] * len(lbones) + [ [255,0,0] ] * len(rbones) 20 | pltcolors = [ 'g-' ] * len(lbones) + [ 'b-' ] * len(rbones) 21 | bones = lbones + rbones 22 | elif J==21: # hand (format similar to HO3D dataset) 23 | bones = [ [(0,n+1),(n+1,3*n+6),(3*n+6,3*n+7),(3*n+7,3*n+8)] for n in range(5)] 24 | bones = sum(bones,[]) 25 | bonecolors = [(255,0,255)]*4 + [(255,0,0)]*4 + [(0,255,0)]*4 + [(0,255,255)]*4 + [(0,0,255)] *4 26 | pltcolors = ['m']*4 + ['b']*4 + ['g']*4 + ['y']*4 + ['r']*4 27 | elif J==84: # face (ibug format) 28 | bones = [ (n,n+1) for n in range(83) if n not in [32,37,42,46,51,57,63,75]] + [(52,57),(58,63),(64,75),(76,83)] 29 | # 32 x contour + 4 x r-sourcil + 4 x l-sourcil + 7 x nose + 5 x l-eye + 5 x r-eye +20 x lip + l-eye + r-eye + lip + lip 30 | bonecolors = 32 * [(255,0,0)] + 4*[(255,0,0)] + 4*[(255,255,0)] + 7*[(255,0,255)] + 5*[(0,255,255)] + 5*[(0,255,0)] + 18*[(0,0,255)] + [(0,255,255),(0,255,0),(0,0,255),(0,0,255)] 31 | pltcolors = 32 * ['b'] + 4*['b'] + 4*['c'] + 7*['m'] + 5*['y'] + 5*['g'] + 18*['r'] + ['y','g','r','r'] 32 | else: 33 | raise NotImplementedError('unknown bones/colors for J='+str(J)) 34 | return bones, bonecolors, pltcolors 35 | 36 | def _get_xy(pose2d, i): 37 | if isinstance(i,int): 38 | return pose2d[i,:] 39 | else: 40 | return np.mean(pose2d[i,:], axis=0) 41 | 42 | def _get_xy_tupleint(pose2d, i): 43 | return tuple(map(int,_get_xy(pose2d, i))) 44 | 45 | def _get_xyz(pose3d, i): 46 | if isinstance(i,int): 47 | return pose3d[i,:] 48 | else: 49 | return np.mean(pose3d[i,:], axis=0) 50 | 51 | def visualize_bodyhandface2d(im, dict_poses2d, dict_scores=None, lw=2, max_padding=100, bgr=True): 52 | """ 53 | bgr: whether input/output is bgr or rgb 54 | 55 | dict_poses2d: some key/value among {'body': body_pose2d, 'hand': hand_pose2d, 'face': face_pose2d} 56 | """ 57 | if all(v.size==0 for v in dict_poses2d.values()): return im 58 | 59 | h,w = im.shape[:2] 60 | bones = {} 61 | bonecolors = {} 62 | for k,v in dict_poses2d.items(): 63 | bones[k], bonecolors[k], _ = _get_bones_and_colors(v.shape[1]) 64 | 65 | # pad if necessary (if some joints are outside image boundaries) 66 | pad_top, pad_bot, pad_lft, pad_rgt = 0, 0, 0, 0 67 | for poses2d in dict_poses2d.values(): 68 | if poses2d.size==0: continue 69 | xmin, ymin = np.min(poses2d.reshape(-1,2), axis=0) 70 | xmax, ymax = np.max(poses2d.reshape(-1,2), axis=0) 71 | pad_top = max(pad_top, min(max_padding, max(0, int(-ymin-5)))) 72 | pad_bot = max(pad_bot, min(max_padding, max(0, int(ymax+5-h)))) 73 | pad_lft = max(pad_lft, min(max_padding, max(0, int(-xmin-5)))) 74 | pad_rgt = max(pad_rgt, min(max_padding, max(0, int(xmax+5-w)))) 75 | 76 | imout = cv2.copyMakeBorder(im, top=pad_top, bottom=pad_bot, left=pad_lft, right=pad_rgt, borderType=cv2.BORDER_CONSTANT, value=[0,0,0] ) 77 | if not bgr: imout = np.ascontiguousarray(imout[:,:,::-1]) 78 | outposes2d = {} 79 | for part,poses2d in dict_poses2d.items(): 80 | outposes2d[part] = poses2d.copy() 81 | outposes2d[part][:,:,0] += pad_lft 82 | outposes2d[part][:,:,1] += pad_top 83 | 84 | # for each part 85 | for part, poses2d in outposes2d.items(): 86 | 87 | # draw 
each detection 88 | for ipose in range(poses2d.shape[0]): # bones 89 | pose2d = poses2d[ipose,...] 90 | 91 | # draw poses 92 | for ii, (i,j) in enumerate(bones[part]): 93 | p1 = _get_xy_tupleint(pose2d, i) 94 | p2 = _get_xy_tupleint(pose2d, j) 95 | cv2.line(imout, p1, p2, bonecolors[part][ii], thickness=lw*2) 96 | for j in range(pose2d.shape[0]): 97 | p = _get_xy_tupleint(pose2d, j) 98 | cv2.circle(imout, p, (2 if part!='face' else 1)*lw, (0,0,255), thickness=-1) 99 | 100 | # draw scores 101 | if dict_scores is not None: cv2.putText(imout, '{:.2f}'.format(dict_scores[part][ipose]), (int(pose2d[12,0]-10),int(pose2d[12,1]-10)), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0,0,0) ) 102 | 103 | if not bgr: imout = imout[:,:,::-1] 104 | 105 | return imout 106 | -------------------------------------------------------------------------------- /postprocess.py: -------------------------------------------------------------------------------- 1 | # Copyright 2020-present NAVER Corp. 2 | # CC BY-NC-SA 4.0 3 | # Available only for non-commercial use 4 | 5 | import numpy as np 6 | import torch 7 | from torchvision.ops import nms 8 | 9 | def _boxes_from_poses(poses, margin=0.1): # pytorch version 10 | x1y1,_ = torch.min(poses, dim=1) # N x 2 11 | x2y2,_ = torch.max(poses, dim=1) # N x 2 12 | coords = torch.cat( (x1y1,x2y2), dim=1) 13 | sizes = x2y2-x1y1 14 | coords[:,0:2] -= margin * sizes 15 | coords[:,2:4] += margin * sizes 16 | return coords 17 | 18 | def DOPE_NMS(scores, boxes, pose2d, pose3d, min_score=0.5, iou_threshold=0.1): 19 | if scores.numel()==0: 20 | return torch.LongTensor([]), torch.LongTensor([]) 21 | maxscores, bestcls = torch.max(scores[:,1:], dim=1) 22 | valid_indices = torch.nonzero(maxscores>=min_score) 23 | if valid_indices.numel()==0: 24 | return torch.LongTensor([]), torch.LongTensor([]) 25 | else: 26 | valid_indices = valid_indices[:,0] 27 | 28 | boxes = _boxes_from_poses(pose2d[valid_indices,bestcls[valid_indices],:,:], margin=0.1) 29 | indices = valid_indices[ nms(boxes, maxscores[valid_indices,...], iou_threshold) ] 30 | bestcls = bestcls[indices] 31 | 32 | return {'score': scores[indices, bestcls+1], 'pose2d': pose2d[indices, bestcls, :, :], 'pose3d': pose3d[indices, bestcls, :, :]}, indices, bestcls 33 | 34 | 35 | 36 | 37 | 38 | def _get_bbox_from_points(points2d, margin): 39 | """ 40 | Compute a bounding box around 2D keypoints, with a margin. 
41 | margin: the margin is relative to the size of the tight bounding box 42 | 43 | """ 44 | assert (len(points2d.shape)==2 and points2d.shape[1] == 2) 45 | mini = np.min(points2d, axis=0) 46 | maxi = np.max(points2d, axis=0) 47 | size = maxi-mini 48 | lower = mini - margin*size 49 | upper = maxi + margin*size 50 | box = np.concatenate((lower, upper)).astype(np.float32) 51 | return box 52 | 53 | def assign_hands_to_body(body_poses, hand_poses, hand_isright, margin=1): 54 | if body_poses.size==0: return [] 55 | if hand_poses.size==0: return [(-1,-1) for i in range(body_poses.shape[0])] 56 | from scipy.spatial.distance import cdist 57 | body_rwrist = body_poses[:,6,:] 58 | body_lwrist = body_poses[:,7,:] 59 | hand_wrist = hand_poses[:,0,:] 60 | hand_boxes = np.concatenate([_get_bbox_from_points(hand_poses[i,:,:], margin=0.1)[None,:] for i in range(hand_poses.shape[0])], axis=0) 61 | hand_size = np.max(hand_boxes[:,2:4]-hand_boxes[:,0:2], axis=1) 62 | # associate body and hand if the distance hand-body and body-hand is the smallest one and is this distance is smaller than 3*hand_size 63 | wrists_from_body = [(-1,-1) for i in range(body_poses.shape[0])] # pair of (left_hand_id, right_hand_id) 64 | dist_lwrist = cdist(body_lwrist,hand_wrist) 65 | dist_rwrist = cdist(body_rwrist,hand_wrist) 66 | for i in range(body_poses.shape[0]): 67 | lwrist = -1 68 | rwrist = -1 69 | if hand_wrist.size>0: 70 | best_lwrist = np.argmin(dist_lwrist[i,:]) 71 | if np.argmin(dist_lwrist[:,best_lwrist])==i and dist_lwrist[i,best_lwrist] <= margin * hand_size[best_lwrist]: 72 | lwrist = best_lwrist 73 | best_rwrist = np.argmin(dist_rwrist[i,:]) 74 | if np.argmin(dist_rwrist[:,best_rwrist])==i and dist_rwrist[i,best_rwrist] <= margin * hand_size[best_rwrist]: 75 | rwrist = best_rwrist 76 | wrists_from_body[i] = (lwrist,rwrist) 77 | return wrists_from_body # pair of (left_hand_id, right_hand_id) for each body pose (-1 means no association) 78 | 79 | def assign_head_to_body(body_poses, head_poses): 80 | if body_poses.size==0: return [] 81 | if head_poses.size==0: return [-1 for i in range(body_poses.shape[0])] 82 | head_boxes = np.concatenate([_get_bbox_from_points(head_poses[i,:,:], margin=0.1)[None,:] for i in range(head_poses.shape[0])], axis=0) 83 | body_heads = body_poses[:,12,:] 84 | bodyhead_in_headboxes = np.empty( (body_poses.shape[0], head_boxes.shape[0]), dtype=np.bool) 85 | for i in range(body_poses.shape[0]): 86 | bodyhead = body_heads[i,:] 87 | bodyhead_in_headboxes[i,:] = (bodyhead[0]>=head_boxes[:,0]) * (bodyhead[0]<=head_boxes[:,2]) * (bodyhead[1]>=head_boxes[:,1]) * (bodyhead[1]<=head_boxes[:,3]) 88 | head_for_body = [] 89 | for i in range(body_poses.shape[0]): 90 | if np.sum(bodyhead_in_headboxes[i,:])==1: 91 | j = np.where(bodyhead_in_headboxes[i,:])[0][0] 92 | if np.sum(bodyhead_in_headboxes[:,j])==1: 93 | head_for_body.append(j) 94 | else: 95 | head_for_body.append(-1) 96 | else: 97 | head_for_body.append(-1) 98 | return head_for_body 99 | 100 | 101 | 102 | 103 | def assign_hands_and_head_to_body(detections): 104 | det_poses2d = {part: np.stack([d['pose2d'] for d in part_detections], axis=0) if len(part_detections)>0 else np.empty( (0,0,2), dtype=np.float32) for part, part_detections in detections.items()} 105 | hand_isright = np.array([ d['hand_isright'] for d in detections['hand']]) 106 | body_with_wrists = assign_hands_to_body(det_poses2d['body'], det_poses2d['hand'], hand_isright, margin=1) 107 | BODY_RIGHT_WRIST_KPT_ID = 6 108 | BODY_LEFT_WRIST_KPT_ID = 7 109 | for i,(lwrist,rwrist) in 
enumerate(body_with_wrists): 110 | if lwrist != -1: detections['body'][i]['pose2d'][BODY_LEFT_WRIST_KPT_ID,:] = detections['hand'][lwrist]['pose2d'][0,:] 111 | if rwrist != -1: detections['body'][i]['pose2d'][BODY_RIGHT_WRIST_KPT_ID,:] = detections['hand'][rwrist]['pose2d'][0,:] 112 | body_with_head = assign_head_to_body(det_poses2d['body'], det_poses2d['face']) 113 | return detections, body_with_wrists, body_with_head 114 | 115 | -------------------------------------------------------------------------------- /dope.py: -------------------------------------------------------------------------------- 1 | # Copyright 2020-present NAVER Corp. 2 | # CC BY-NC-SA 4.0 3 | # Available only for non-commercial use 4 | 5 | import sys, os 6 | import argparse 7 | import os.path as osp 8 | from PIL import Image 9 | import cv2 10 | import numpy as np 11 | 12 | import torch 13 | from torchvision.transforms import ToTensor 14 | 15 | _thisdir = osp.realpath(osp.dirname(__file__)) 16 | 17 | from model import dope_resnet50, num_joints 18 | import postprocess 19 | 20 | import visu 21 | 22 | def dope(imagename, modelname, postprocessing='ppi'): 23 | if postprocessing=='ppi': 24 | sys.path.append( _thisdir+'/lcrnet-v2-improved-ppi/') 25 | try: 26 | from lcr_net_ppi_improved import LCRNet_PPI_improved 27 | except ModuleNotFoundError: 28 | raise Exception('To use the pose proposals integration (ppi) as postprocessing, please follow the readme instruction by cloning our modified version of LCRNet_v2.0 here. Alternatively, you can use --postprocess nms without any installation, with a slight decrease of performance.') 29 | 30 | 31 | device = 'cuda:0' if torch.cuda.is_available() else 'cpu' 32 | 33 | # load model 34 | ckpt_fname = osp.join(_thisdir, 'models', modelname+'.pth.tgz') 35 | if not os.path.isfile(ckpt_fname): 36 | raise Exception('{:s} does not exist, please download the model first and place it in the models/ folder'.format(ckpt_fname)) 37 | print('Loading model', modelname) 38 | ckpt = torch.load(ckpt_fname, map_location=device) 39 | #ckpt['half'] = False # uncomment this line in case your device cannot handle half computation 40 | ckpt['dope_kwargs']['rpn_post_nms_top_n_test'] = 1000 41 | model = dope_resnet50(**ckpt['dope_kwargs']) 42 | if ckpt['half']: model = model.half() 43 | model = model.eval() 44 | model.load_state_dict(ckpt['state_dict']) 45 | model = model.to(device) 46 | 47 | # load the image 48 | print('Loading image', imagename) 49 | image = Image.open(imagename) 50 | imlist = [ToTensor()(image).to(device)] 51 | if ckpt['half']: imlist = [im.half() for im in imlist] 52 | resolution = imlist[0].size()[-2:] 53 | 54 | # forward pass of the dope network 55 | print('Running DOPE') 56 | with torch.no_grad(): 57 | results = model(imlist, None)[0] 58 | 59 | # postprocess results (pose proposals integration, wrists/head assignment) 60 | print('Postprocessing') 61 | assert postprocessing in ['nms','ppi'] 62 | parts = ['body','hand','face'] 63 | if postprocessing=='ppi': 64 | res = {k: v.float().data.cpu().numpy() for k,v in results.items()} 65 | detections = {} 66 | for part in parts: 67 | detections[part] = LCRNet_PPI_improved(res[part+'_scores'], res['boxes'], res[part+'_pose2d'], res[part+'_pose3d'], resolution, **ckpt[part+'_ppi_kwargs']) 68 | else: # nms 69 | detections = {} 70 | for part in parts: 71 | dets, indices, bestcls = postprocess.DOPE_NMS(results[part+'_scores'], results['boxes'], results[part+'_pose2d'], results[part+'_pose3d'], min_score=0.3) 72 | dets = {k: 
v.float().data.cpu().numpy() for k,v in dets.items()}
73 | detections[part] = [{'score': dets['score'][i], 'pose2d': dets['pose2d'][i,...], 'pose3d': dets['pose3d'][i,...]} for i in range(dets['score'].size)]
74 | if part=='hand':
75 | for i in range(len(detections[part])):
76 | detections[part][i]['hand_isright'] = bool(bestcls[i] < (results['hand_scores'].shape[1]-1)//2) # assumption: the original test was lost in extraction; right-hand anchor classes are taken to be the first half of the hand classes
77 | 
78 | # assignment of hands and head to body
79 | detections, body_with_wrists, body_with_head = postprocess.assign_hands_and_head_to_body(detections)
80 | 
81 | # display results in 2D
82 | print('Displaying results')
83 | det_poses2d = {part: np.stack([d['pose2d'] for d in part_detections], axis=0) if len(part_detections)>0 else np.empty( (0,num_joints[part],2), dtype=np.float32) for part, part_detections in detections.items()}
84 | scores = {part: [d['score'] for d in part_detections] for part,part_detections in detections.items()}
85 | imout = visu.visualize_bodyhandface2d(np.asarray(image)[:,:,::-1],
86 | det_poses2d,
87 | dict_scores=scores,
88 | )
89 | outfile = imagename+'_{:s}.jpg'.format(modelname)
90 | cv2.imwrite(outfile, imout)
91 | print('\t', outfile)
92 | 
93 | # display results in 3D
94 | if args.do_visu3d:
95 | print('Displaying results in 3D')
96 | import visu3d
97 | viewer3d = visu3d.Viewer3d()
98 | img3d, img2d = viewer3d.plot3d(image,
99 | bodies={'pose3d': np.stack([d['pose3d'] for d in detections['body']]), 'pose2d': np.stack([d['pose2d'] for d in detections['body']])},
100 | hands={'pose3d': np.stack([d['pose3d'] for d in detections['hand']]), 'pose2d': np.stack([d['pose2d'] for d in detections['hand']])},
101 | faces={'pose3d': np.stack([d['pose3d'] for d in detections['face']]), 'pose2d': np.stack([d['pose2d'] for d in detections['face']])},
102 | body_with_wrists=body_with_wrists,
103 | body_with_head=body_with_head,
104 | interactive=False)
105 | outfile3d = imagename+'_{:s}_visu3d.jpg'.format(modelname)
106 | cv2.imwrite(outfile3d, img3d[:,:,::-1])
107 | print('\t', outfile3d)
108 | 
109 | 
110 | 
111 | 
112 | 
113 | 
114 | 
115 | if __name__=="__main__":
116 | parser = argparse.ArgumentParser(description='running DOPE on an image: python dope.py --model <modelname> --image <imagename>')
117 | parser.add_argument('--model', required=True, type=str, help='name of the model to use (e.g. DOPE_v1_0_0)')
118 | parser.add_argument('--image', required=True, type=str, help='path to the image')
119 | parser.add_argument('--postprocess', default='ppi', choices=['ppi','nms'], help='postprocessing method')
120 | parser.add_argument('--visu3d', dest='do_visu3d', default=False, action='store_true')
121 | args = parser.parse_args()
122 | dope(args.image, args.model, postprocessing=args.postprocess)
123 | 
--------------------------------------------------------------------------------
/NOTICE:
--------------------------------------------------------------------------------
1 | dope
2 | Copyright 2019-present NAVER Corp.
3 | 
4 | This project contains subcomponents with separate copyright notices and license terms.
5 | Your use of the source code for these subcomponents is subject to the terms and conditions of the following licenses.
6 | 7 | ===== 8 | 9 | pytorch/pytorch 10 | https://github.com/pytorch/pytorch 11 | 12 | 13 | From PyTorch: 14 | 15 | Copyright (c) 2016- Facebook, Inc (Adam Paszke) 16 | Copyright (c) 2014- Facebook, Inc (Soumith Chintala) 17 | Copyright (c) 2011-2014 Idiap Research Institute (Ronan Collobert) 18 | Copyright (c) 2012-2014 Deepmind Technologies (Koray Kavukcuoglu) 19 | Copyright (c) 2011-2012 NEC Laboratories America (Koray Kavukcuoglu) 20 | Copyright (c) 2011-2013 NYU (Clement Farabet) 21 | Copyright (c) 2006-2010 NEC Laboratories America (Ronan Collobert, Leon Bottou, Iain Melvin, Jason Weston) 22 | Copyright (c) 2006 Idiap Research Institute (Samy Bengio) 23 | Copyright (c) 2001-2004 Idiap Research Institute (Ronan Collobert, Samy Bengio, Johnny Mariethoz) 24 | 25 | From Caffe2: 26 | 27 | Copyright (c) 2016-present, Facebook Inc. All rights reserved. 28 | 29 | All contributions by Facebook: 30 | Copyright (c) 2016 Facebook Inc. 31 | 32 | All contributions by Google: 33 | Copyright (c) 2015 Google Inc. 34 | All rights reserved. 35 | 36 | All contributions by Yangqing Jia: 37 | Copyright (c) 2015 Yangqing Jia 38 | All rights reserved. 39 | 40 | All contributions from Caffe: 41 | Copyright(c) 2013, 2014, 2015, the respective contributors 42 | All rights reserved. 43 | 44 | All other contributions: 45 | Copyright(c) 2015, 2016 the respective contributors 46 | All rights reserved. 47 | 48 | Caffe2 uses a copyright model similar to Caffe: each contributor holds 49 | copyright over their contributions to Caffe2. The project versioning records 50 | all such contribution and copyright details. If a contributor wants to further 51 | mark their specific copyright on a particular contribution, they should 52 | indicate their copyright solely in the commit message of the change when it is 53 | committed. 54 | 55 | All rights reserved. 56 | 57 | Redistribution and use in source and binary forms, with or without 58 | modification, are permitted provided that the following conditions are met: 59 | 60 | 1. Redistributions of source code must retain the above copyright 61 | notice, this list of conditions and the following disclaimer. 62 | 63 | 2. Redistributions in binary form must reproduce the above copyright 64 | notice, this list of conditions and the following disclaimer in the 65 | documentation and/or other materials provided with the distribution. 66 | 67 | 3. Neither the names of Facebook, Deepmind Technologies, NYU, NEC Laboratories America 68 | and IDIAP Research Institute nor the names of its contributors may be 69 | used to endorse or promote products derived from this software without 70 | specific prior written permission. 71 | 72 | THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" 73 | AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE 74 | IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE 75 | ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE 76 | LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR 77 | CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF 78 | SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS 79 | INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN 80 | CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) 81 | ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE 82 | POSSIBILITY OF SUCH DAMAGE. 
83 | 84 | ===== 85 | 86 | pytorch/vision 87 | https://github.com/pytorch/vision 88 | 89 | 90 | BSD 3-Clause License 91 | 92 | Copyright (c) Soumith Chintala 2016, 93 | All rights reserved. 94 | 95 | Redistribution and use in source and binary forms, with or without 96 | modification, are permitted provided that the following conditions are met: 97 | 98 | * Redistributions of source code must retain the above copyright notice, this 99 | list of conditions and the following disclaimer. 100 | 101 | * Redistributions in binary form must reproduce the above copyright notice, 102 | this list of conditions and the following disclaimer in the documentation 103 | and/or other materials provided with the distribution. 104 | 105 | * Neither the name of the copyright holder nor the names of its 106 | contributors may be used to endorse or promote products derived from 107 | this software without specific prior written permission. 108 | 109 | THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" 110 | AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE 111 | IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 112 | DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE 113 | FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL 114 | DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR 115 | SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER 116 | CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, 117 | OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE 118 | OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 119 | 120 | ==== 121 | 122 | almarklein/visvis 123 | https://github.com/almarklein/visvis 124 | 125 | 126 | Visvis is subject to the (new) BSD license: 127 | 128 | Copyright (c) 2012-2017, Visvis development team 129 | 130 | Redistribution and use in source and binary forms, with or without 131 | modification, are permitted provided that the following conditions are met: 132 | * Redistributions of source code must retain the above copyright 133 | notice, this list of conditions and the following disclaimer. 134 | * Redistributions in binary form must reproduce the above copyright 135 | notice, this list of conditions and the following disclaimer in the 136 | documentation and/or other materials provided with the distribution. 137 | * The names of its contributors may not be used to endorse or promote 138 | products derived from this software without specific prior written 139 | permission. 140 | 141 | THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" 142 | AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE 143 | IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE 144 | ARE DISCLAIMED. IN NO EVENT SHALL ALMAR KLEIN BE LIABLE FOR ANY 145 | DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES 146 | (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; 147 | LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND 148 | ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT 149 | (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS 150 | SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
151 | 152 | Visvis contributers: 153 | 154 | * Almar Klein 155 | * Robert Schroll 156 | * Keith Smith 157 | * Rob Reilink 158 | -------------------------------------------------------------------------------- /visu3d.py: -------------------------------------------------------------------------------- 1 | # Copyright 2020-present NAVER Corp. 2 | # CC BY-NC-SA 4.0 3 | # Available only for non-commercial use 4 | 5 | 6 | import numpy as np 7 | import visvis as vv 8 | import visu 9 | import scipy.optimize 10 | import copy 11 | import PIL 12 | 13 | def scale_orthographic(points3d, points2d): 14 | """ 15 | Return a scaled set of 3D points, offseted in XY direction so as to minimize the distance to the 2D points 16 | """ 17 | residuals_func1 = lambda x : ((x[None,:2] + points3d[:,:2]) - points2d).flatten() 18 | res1 = scipy.optimize.least_squares(residuals_func1, x0=[0,0], method='lm') 19 | residuals_func2 = lambda x : (np.exp(x[2]) * (x[None,:2] + points3d[:,:2]) - points2d).flatten() 20 | res2 = scipy.optimize.least_squares(residuals_func2, x0=np.concatenate((res1['x'],[0])), method='lm') 21 | x=res2['x'] 22 | output3d = points3d.copy() 23 | output3d[:,:2]+=x[None,:2] 24 | output3d*=np.exp(x[2]) 25 | return output3d 26 | 27 | 28 | 29 | class Viewer3d: 30 | def __init__(self, display2d=True, camera_zoom=None, camera_location=None): 31 | self.app = vv.use('qt5') 32 | self.figsize = (1280+20,720+20) 33 | self.display2d = display2d 34 | self.camera_zoom = camera_zoom 35 | self.camera_location = camera_location 36 | 37 | 38 | def plot3d(self, img, 39 | bodies={'pose3d':np.empty((0,13,3)), 'pose2d':np.empty((0,13,2))}, 40 | hands={'pose3d':np.empty((0,21,3)), 'pose2d':np.empty((0,21,2))}, 41 | faces={'pose3d':np.empty((0,84,3)), 'pose2d':np.empty((0,84,2))}, 42 | body_with_wrists=[], 43 | body_with_head=[], 44 | interactive=False): 45 | """ 46 | :param img: a HxWx3 numpy array 47 | :param bodies: dictionnaroes with 'pose3d' (resp 'pose2d') with the body 3D (resp 2D) pose 48 | :param faces: same with face pose 49 | :param hands: same with hand pose 50 | :param body_with_wrists: list with for each body, a tuple (left_hand_id, right_hand_id) of the index of the hand detection attached to this body detection (-1 if none) for left and right hands 51 | :parma body_with_head: list with for each body, the index of the face detection attached to this body detection (-1 if none) 52 | :param interactive: whether to open the viewer in an interactive manner or not 53 | """ 54 | 55 | # body pose do not use the same coordinate systems 56 | bodies['pose3d'][:,:,0] *= -1 57 | bodies['pose3d'][:,:,1] *= -1 58 | 59 | # Compute 3D scaled representation of each part, stored in "points3d" 60 | hands, bodies, faces = [copy.copy(s) for s in (hands, bodies, faces)] 61 | parts = (hands, bodies, faces) 62 | for part in parts: 63 | part['points3d'] = np.zeros_like(part['pose3d']) 64 | for part_idx in range(len(part['pose3d'])): 65 | points3d = scale_orthographic(part['pose3d'][part_idx], part['pose2d'][part_idx]) 66 | part['points3d'][part_idx] = points3d 67 | 68 | # Various display tricks to make the 3D visualization of full-body nice 69 | # (1) for faces, add a Z offset to faces to align them with the body 70 | for body_id, face_id in enumerate(body_with_head): 71 | if face_id !=-1: 72 | z_offset = bodies['points3d'][body_id,12,2] - np.mean(faces['points3d'][face_id,:,2]) 73 | faces['points3d'][face_id,:,2] += z_offset 74 | # (2) for hands, add a 3D offset to put them at the wrist location 75 | for body_id, (lwrist_id, 
rwrist_id) in enumerate(body_with_wrists): 76 | if lwrist_id != -1: 77 | hands['points3d'][lwrist_id,:,:] = bodies['points3d'][body_id,7,:] - hands['points3d'][lwrist_id,0,:] 78 | if rwrist_id != -1: 79 | hands['points3d'][rwrist_id,:,:] = bodies['points3d'][body_id,6,:] - hands['points3d'][rwrist_id,0,:] 80 | 81 | img = np.asarray(img) 82 | height, width = img.shape[:2] 83 | 84 | 85 | fig=vv.figure(1) 86 | fig.Clear() 87 | 88 | fig._SetPosition(0,0,self.figsize[0], self.figsize[1]) 89 | if not interactive: 90 | fig._enableUserInteraction=False 91 | 92 | axes = vv.gca() 93 | # Hide axis 94 | axes.axis.visible = False 95 | 96 | scaling_factor = 1.0/height 97 | 98 | # Camera interaction is not intuitive along z axis 99 | # We reference every object to a parent frame that is rotated to circumvent the issue 100 | ref_frame = vv.Wobject(axes) 101 | ref_frame.transformations.append(vv.Transform_Rotate(-90, 1,0,0)) 102 | ref_frame.transformations.append(vv.Transform_Translate(-0.5*width*scaling_factor, -0.5, 0)) 103 | 104 | # Draw image 105 | if self.display2d: 106 | # Display pose in 2D 107 | img = visu.visualize_bodyhandface2d(img, 108 | dict_poses2d={'body': bodies['pose2d'], 109 | 'hand': hands['pose2d'], 110 | 'face': faces['pose2d']}, 111 | lw=2, max_padding=0, bgr=False) 112 | 113 | XX, YY = np.meshgrid([0,width*scaling_factor],[0, 1]) 114 | img_z_offset = 0.5 115 | ZZ = img_z_offset * np.ones(XX.shape) 116 | # Draw image 117 | embedded_img = vv.surf(XX, YY, ZZ, img) 118 | embedded_img.parent = ref_frame 119 | embedded_img.ambientAndDiffuse=1.0 120 | 121 | # Draw a grid on the bottom floor to get a sense of depth 122 | XX, ZZ = np.meshgrid(np.linspace(0, width*scaling_factor, 10), img_z_offset - np.linspace(0, width*scaling_factor, 10)) 123 | YY = np.ones_like(XX) 124 | grid3d = vv.surf(XX, YY, ZZ) 125 | grid3d.parent = ref_frame 126 | grid3d.edgeColor=(0.1,0.1,0.1,1.0) 127 | grid3d.edgeShading='plain' 128 | grid3d.faceShading=None 129 | 130 | 131 | # Draw pose 132 | for part in parts: 133 | 134 | for part_idx in range(len(part['points3d'])): 135 | points3d = part['points3d'][part_idx]*scaling_factor 136 | # Draw bones 137 | J = len(points3d) 138 | is_body = (J==13) 139 | ignore_neck = False if not is_body else body_with_head[part_idx]!=-1 140 | bones, bonecolors, pltcolors = visu._get_bones_and_colors(J, ignore_neck=ignore_neck) 141 | for (kpt_id1, kpt_id2), color in zip(bones, bonecolors): 142 | color = color[2], color[1], color[0] # BGR vs RGB 143 | p1 = visu._get_xyz(points3d, kpt_id1) 144 | p2 = visu._get_xyz(points3d, kpt_id2) 145 | pointset=vv.Pointset(3) 146 | pointset.append(p1) 147 | pointset.append(p2) 148 | 149 | # Draw bones as solid capsules 150 | bone_radius = 0.005 151 | line = vv.solidLine(pointset, radius=bone_radius) 152 | line.faceColor = color 153 | line.ambientAndDiffuse=1.0 154 | 155 | line.parent = ref_frame 156 | 157 | # Draw keypoints, except for faces 158 | if J != 84: 159 | keypoints_to_plot = points3d 160 | if ignore_neck: 161 | # for a nicer display, ignore head keypoint 162 | keypoints_to_plot=keypoints_to_plot[:12,:] 163 | # Use solid spheres 164 | for i in range(len(keypoints_to_plot)): 165 | kpt_wobject = vv.solidSphere(translation=keypoints_to_plot[i,:].tolist(), scaling=1.5*bone_radius) 166 | kpt_wobject.faceColor = (255,0,0) 167 | kpt_wobject.ambientAndDiffuse=1.0 168 | kpt_wobject.parent = ref_frame 169 | 170 | # Use just an ambient lighting 171 | axes.light0.ambient=0.8 172 | axes.light0.diffuse=0.2 173 | axes.light0.specular=0.0 174 | 175 | cam = 
vv.cameras.ThreeDCamera() 176 | axes.camera=cam 177 | #z axis 178 | cam.azimuth=-45 179 | cam.elevation=20 180 | cam.roll=0 181 | # Orthographic camera 182 | cam.fov=0 183 | if self.camera_zoom is None: 184 | cam.zoom *= 1.3 # Zoom a bit more 185 | else: 186 | cam.zoom = self.camera_zoom 187 | if self.camera_location is not None: 188 | cam.loc = self.camera_location 189 | cam.SetView() 190 | 191 | 192 | if interactive: 193 | self.app.Run() 194 | else: 195 | fig._widget.update() 196 | self.app.ProcessEvents() 197 | 198 | img3d = vv.getframe(vv.gcf()) 199 | img3d = np.clip(img3d * 255, 0, 255).astype(np.uint8) 200 | # Crop gray borders 201 | img3d = img3d[10:-10, 10:-10,:] 202 | 203 | return img3d, img 204 | 205 | 206 | -------------------------------------------------------------------------------- /model.py: -------------------------------------------------------------------------------- 1 | # Copyright 2020-present NAVER Corp. 2 | # CC BY-NC-SA 4.0 3 | # Available only for non-commercial use 4 | 5 | from collections import OrderedDict 6 | 7 | import torch 8 | from torch import nn 9 | import torch.nn.functional as F 10 | 11 | from torchvision.ops import misc as misc_nn_ops 12 | from torchvision.ops import MultiScaleRoIAlign 13 | 14 | from torchvision.models import resnet 15 | from torchvision.models.detection.rpn import AnchorGenerator, RPNHead, RegionProposalNetwork 16 | from torchvision.models.detection.roi_heads import RoIHeads 17 | from torchvision.models.detection.generalized_rcnn import GeneralizedRCNN 18 | from torchvision.models.detection.transform import GeneralizedRCNNTransform, resize_keypoints, resize_boxes 19 | 20 | parts = ['body','hand','face'] 21 | num_joints = {'body': 13, 'hand': 21, 'face': 84} 22 | 23 | class Dope_Transform(GeneralizedRCNNTransform): 24 | 25 | def __init__(self, min_size, max_size, image_mean, image_std): 26 | super(self.__class__, self).__init__(min_size, max_size, image_mean, image_std) 27 | 28 | def postprocess(self, result, image_shapes, original_image_sizes): 29 | if self.training: 30 | return result 31 | for i, (pred, im_s, o_im_s) in enumerate(zip(result, image_shapes, original_image_sizes)): 32 | boxes = pred["boxes"] 33 | boxes = resize_boxes(boxes, im_s, o_im_s) 34 | result[i]["boxes"] = boxes 35 | for k in ['pose2d', 'body_pose2d', 'hand_pose2d', 'face_pose2d']: 36 | if k in pred and pred[k] is not None: 37 | pose2d = pred[k] 38 | pose2d = resize_keypoints(pose2d, im_s, o_im_s) 39 | result[i][k] = pose2d 40 | return result 41 | 42 | class Dope_RCNN(GeneralizedRCNN): 43 | 44 | def __init__(self, backbone, 45 | dope_roi_pool, dope_head, dope_predictor, 46 | # transform parameters 47 | min_size=800, max_size=1333, 48 | image_mean=None, image_std=None, 49 | # RPN parameters 50 | rpn_anchor_generator=None, rpn_head=None, 51 | rpn_pre_nms_top_n_train=2000, rpn_pre_nms_top_n_test=1000, 52 | rpn_post_nms_top_n_train=2000, rpn_post_nms_top_n_test=1000, 53 | rpn_nms_thresh=0.7, 54 | rpn_fg_iou_thresh=0.7, rpn_bg_iou_thresh=0.3, 55 | rpn_batch_size_per_image=256, rpn_positive_fraction=0.5, 56 | # others 57 | num_anchor_poses = {'body': 20, 'hand': 10, 'face': 10}, 58 | pose2d_reg_weights = {part: 5.0 for part in parts}, 59 | pose3d_reg_weights = {part: 5.0 for part in parts}, 60 | ): 61 | 62 | if not hasattr(backbone, "out_channels"): 63 | raise ValueError( 64 | "backbone should contain an attribute out_channels " 65 | "specifying the number of output channels (assumed to be the " 66 | "same for all the levels)") 67 | 68 | assert 
isinstance(rpn_anchor_generator, (AnchorGenerator, type(None))) 69 | assert isinstance(dope_roi_pool, (MultiScaleRoIAlign, type(None))) 70 | 71 | out_channels = backbone.out_channels 72 | 73 | if rpn_anchor_generator is None: 74 | anchor_sizes = ((32,), (64,), (128,), (256,), (512,)) 75 | aspect_ratios = ((0.5, 1.0, 2.0),) * len(anchor_sizes) 76 | rpn_anchor_generator = AnchorGenerator( 77 | anchor_sizes, aspect_ratios 78 | ) 79 | if rpn_head is None: 80 | rpn_head = RPNHead( 81 | out_channels, rpn_anchor_generator.num_anchors_per_location()[0] 82 | ) 83 | 84 | rpn_pre_nms_top_n = dict(training=rpn_pre_nms_top_n_train, testing=rpn_pre_nms_top_n_test) 85 | rpn_post_nms_top_n = dict(training=rpn_post_nms_top_n_train, testing=rpn_post_nms_top_n_test) 86 | 87 | rpn = RegionProposalNetwork( 88 | rpn_anchor_generator, rpn_head, 89 | rpn_fg_iou_thresh, rpn_bg_iou_thresh, 90 | rpn_batch_size_per_image, rpn_positive_fraction, 91 | rpn_pre_nms_top_n, rpn_post_nms_top_n, rpn_nms_thresh) 92 | 93 | dope_heads = Dope_RoIHeads(dope_roi_pool, dope_head, dope_predictor, num_anchor_poses, pose2d_reg_weights=pose2d_reg_weights, pose3d_reg_weights=pose3d_reg_weights) 94 | 95 | if image_mean is None: 96 | image_mean = [0.485, 0.456, 0.406] 97 | if image_std is None: 98 | image_std = [0.229, 0.224, 0.225] 99 | transform = Dope_Transform(min_size, max_size, image_mean, image_std) 100 | 101 | super(Dope_RCNN, self).__init__(backbone, rpn, dope_heads, transform) 102 | 103 | 104 | 105 | class Dope_Predictor(nn.Module): 106 | 107 | def __init__(self, in_channels, dict_num_classes, dict_num_posereg): 108 | super(self.__class__, self).__init__() 109 | self.body_cls_score = nn.Linear(in_channels, dict_num_classes['body']) 110 | self.body_pose_pred = nn.Linear(in_channels, dict_num_posereg['body']) 111 | self.hand_cls_score = nn.Linear(in_channels, dict_num_classes['hand']) 112 | self.hand_pose_pred = nn.Linear(in_channels, dict_num_posereg['hand']) 113 | self.face_cls_score = nn.Linear(in_channels, dict_num_classes['face']) 114 | self.face_pose_pred = nn.Linear(in_channels, dict_num_posereg['face']) 115 | 116 | 117 | 118 | def forward(self, x): 119 | if x.dim() == 4: 120 | assert list(x.shape[2:]) == [1, 1] 121 | x = x.flatten(start_dim=1) 122 | scores = {} 123 | pose_deltas = {} 124 | scores['body'] = self.body_cls_score(x) 125 | pose_deltas['body'] = self.body_pose_pred(x) 126 | scores['hand'] = self.hand_cls_score(x) 127 | pose_deltas['hand'] = self.hand_pose_pred(x) 128 | scores['face'] = self.face_cls_score(x) 129 | pose_deltas['face'] = self.face_pose_pred(x) 130 | return scores, pose_deltas 131 | 132 | 133 | 134 | 135 | class Dope_RoIHeads(RoIHeads): 136 | 137 | def __init__(self, 138 | dope_roi_pool, 139 | dope_head, 140 | dope_predictor, 141 | num_anchor_poses, 142 | pose2d_reg_weights, 143 | pose3d_reg_weights): 144 | 145 | fg_iou_thresh=0.5 146 | bg_iou_thresh=0.5 147 | batch_size_per_image=512 148 | positive_fraction=0.25 149 | bbox_reg_weights = [0.0]*4 150 | score_thresh = 0.0 151 | nms_thresh = 1.0 152 | detections_per_img = 99999999 153 | super(self.__class__, self).__init__(None, None, None, fg_iou_thresh, bg_iou_thresh, batch_size_per_image, positive_fraction, bbox_reg_weights,score_thresh,nms_thresh,detections_per_img,mask_roi_pool=None,mask_head=None,mask_predictor=None,keypoint_roi_pool=None,keypoint_head=None,keypoint_predictor=None) 154 | for k in parts: 155 | self.register_buffer(k+'_anchor_poses', torch.empty( (num_anchor_poses[k], num_joints[k], 5) )) 156 | self.dope_roi_pool = dope_roi_pool 
157 | self.dope_head = dope_head 158 | self.dope_predictor = dope_predictor 159 | self.J = num_joints 160 | self.pose2d_reg_weights = pose2d_reg_weights 161 | self.pose3d_reg_weights = pose3d_reg_weights 162 | 163 | def forward(self, features, proposals, image_shapes, targets=None): 164 | """ 165 | Arguments: 166 | features (List[torch.Tensor]) 167 | proposals (List[torch.Tensor[N, 4]]) 168 | image_shapes (List[Tuple[H, W]]) 169 | targets (List[Dict]) 170 | """ 171 | 172 | # roi_pool 173 | if features['0'].dtype==torch.float16: # UGLY: dope_roi_pool is not yet compatible with half 174 | features = {'0': features['0'].float()} 175 | if proposals[0].dtype==torch.float16: 176 | hproposals = [p.float() for p in proposals] 177 | else: 178 | hproposals = proposals 179 | dope_features = self.dope_roi_pool(features, hproposals, image_shapes) 180 | dope_features = dope_features.half() 181 | else: 182 | dope_features = self.dope_roi_pool(features, proposals, image_shapes) 183 | 184 | # head 185 | dope_features = self.dope_head(dope_features) 186 | 187 | # predictor 188 | class_logits, dope_regression = self.dope_predictor(dope_features) 189 | 190 | # process results 191 | result = [] 192 | losses = {} 193 | if self.training: 194 | raise NotImplementedError 195 | else: 196 | boxes, scores, poses2d, poses3d = self.postprocess_dope(class_logits, dope_regression, proposals, image_shapes) 197 | num_images = len(boxes) 198 | for i in range(num_images): 199 | res = {'boxes': boxes[i]} 200 | for k in parts: 201 | res[k+'_scores'] = scores[k][i] 202 | res[k+'_pose2d'] = poses2d[k][i] 203 | res[k+'_pose3d'] = poses3d[k][i] 204 | result.append(res) 205 | 206 | return result, losses 207 | 208 | def postprocess_dope(self, class_logits, dope_regression, proposals, image_shapes): 209 | boxes_per_image = [len(boxes_in_image) for boxes_in_image in proposals] 210 | num_images = len(proposals) 211 | pred_scores = {} 212 | all_poses_2d = {} 213 | all_poses_3d = {} 214 | for k in parts: 215 | # anchor poses 216 | anchor_poses = getattr(self, k+'_anchor_poses') 217 | nboxes, num_classes = class_logits[k].size() 218 | # scores 219 | sc = F.softmax(class_logits[k], -1) 220 | pred_scores[k] = sc.split(boxes_per_image, 0) 221 | # poses 222 | all_poses_2d[k] = [] 223 | all_poses_3d[k] = [] 224 | dope_regression[k] = dope_regression[k].view(nboxes, num_classes-1, self.J[k] * 5 ) 225 | dope_regression_per_image = dope_regression[k].split(boxes_per_image, 0) 226 | for img_id in range(num_images): 227 | dope_reg = dope_regression_per_image[img_id] 228 | boxes = proposals[img_id] 229 | # 2d 230 | offset = boxes[:,0:2] 231 | scale = boxes[:,2:4]-boxes[:,0:2] 232 | box_resized_anchors = offset[:,None,None,:] + anchor_poses[None,:,:,:2] * scale[:,None,None,:] 233 | dope_reg_2d = dope_reg[:,:,:2*self.J[k]].reshape(boxes.size(0),num_classes-1,self.J[k],2) / self.pose2d_reg_weights[k] 234 | pose2d = box_resized_anchors + dope_reg_2d * scale[:,None,None,:] 235 | all_poses_2d[k].append(pose2d) 236 | # 3d 237 | anchor3d = anchor_poses[None,:,:,-3:] 238 | dope_reg_3d = dope_reg[:,:,-3*self.J[k]:].reshape(boxes.size(0),num_classes-1,self.J[k],3) / self.pose3d_reg_weights[k] 239 | pose3d = anchor3d + dope_reg_3d 240 | all_poses_3d[k].append(pose3d) 241 | return proposals, pred_scores, all_poses_2d, all_poses_3d 242 | 243 | 244 | 245 | 246 | def dope_resnet50(**dope_kwargs): 247 | 248 | backbone_name = 'resnet50' 249 | from torchvision.ops import misc as misc_nn_ops 250 | class FrozenBatchNorm2dWithHalf(misc_nn_ops.FrozenBatchNorm2d): 251 | 
def forward(self, x): 252 | if x.dtype==torch.float16: # UGLY: seems that it does not work with half otherwise, so let's just use the standard bn function or half 253 | return F.batch_norm(x, self.running_mean, self.running_var, self.weight, self.bias, training=False) 254 | else: 255 | return super(self.__class__, self).forward(x) 256 | 257 | backbone = resnet.__dict__[backbone_name](pretrained=False, norm_layer=FrozenBatchNorm2dWithHalf) 258 | # build the main blocks 259 | class ResNetBody(nn.Module): 260 | def __init__(self, backbone): 261 | super(self.__class__, self).__init__() 262 | self.resnet_backbone = backbone 263 | self.out_channels = 1024 264 | def forward(self, x): 265 | x = self.resnet_backbone.conv1(x) 266 | x = self.resnet_backbone.bn1(x) 267 | x = self.resnet_backbone.relu(x) 268 | x = self.resnet_backbone.maxpool(x) 269 | x = self.resnet_backbone.layer1(x) 270 | x = self.resnet_backbone.layer2(x) 271 | x = self.resnet_backbone.layer3(x) 272 | return x 273 | resnet_body = ResNetBody(backbone) 274 | # build the anchor generator and pooler 275 | anchor_generator = AnchorGenerator(sizes=((32, 64, 128, 256, 512),), aspect_ratios=((0.5, 1.0, 2.0),)) 276 | roi_pooler = MultiScaleRoIAlign(featmap_names=['0'], output_size=7, sampling_ratio=2) 277 | # build the head and predictor 278 | class ResNetHead(nn.Module): 279 | def __init__(self, backbone): 280 | super(self.__class__, self).__init__() 281 | self.resnet_backbone = backbone 282 | def forward(self, x): 283 | x = self.resnet_backbone.layer4(x) 284 | x = self.resnet_backbone.avgpool(x) 285 | x = torch.flatten(x, 1) 286 | return x 287 | resnet_head = ResNetHead(backbone) 288 | 289 | # predictor 290 | num_anchor_poses = dope_kwargs['num_anchor_poses'] 291 | num_classes = {k: v+1 for k,v in num_anchor_poses.items()} 292 | num_posereg = {k: num_anchor_poses[k] * num_joints[k] * 5 for k in num_joints.keys()} 293 | predictor = Dope_Predictor(2048, num_classes, num_posereg) 294 | 295 | # full model 296 | model = Dope_RCNN(resnet_body, roi_pooler, resnet_head, predictor, rpn_anchor_generator=anchor_generator, **dope_kwargs) 297 | 298 | return model 299 | --------------------------------------------------------------------------------
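As a closing illustration of how the pieces above fit together, here is a minimal sketch (assuming a checkpoint has been downloaded to `models/`; the model and image names are only examples) that builds `dope_resnet50` from the `dope_kwargs` stored in the checkpoint and runs one forward pass, mirroring the inference path of `dope.py`:

```python
import torch
from PIL import Image
from torchvision.transforms import ToTensor

from model import dope_resnet50

device = 'cuda:0' if torch.cuda.is_available() else 'cpu'

# the checkpoint stores the keyword arguments used to build the network
ckpt = torch.load('models/DOPE_v1_0_0.pth.tgz', map_location=device)  # example model name
ckpt['dope_kwargs']['rpn_post_nms_top_n_test'] = 1000  # as set in dope.py
model = dope_resnet50(**ckpt['dope_kwargs'])
if ckpt['half']:
    model = model.half()
model.load_state_dict(ckpt['state_dict'])
model = model.eval().to(device)

# one image -> a dict with boxes and per-part scores / 2D poses / 3D poses
img = ToTensor()(Image.open('015994080.jpg')).to(device)
if ckpt['half']:
    img = img.half()
with torch.no_grad():
    results = model([img], None)[0]
print(results['boxes'].shape, results['body_pose2d'].shape, results['hand_pose3d'].shape)
```

The raw detections would then go through the post-processing (`ppi` or `nms`) and the 2D/3D visualization exactly as in `dope.py`.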