├── 015994080.jpg
├── example_result.jpg
├── LICENSE
├── README.MD
├── visu.py
├── postprocess.py
├── dope.py
├── NOTICE
├── visu3d.py
└── model.py

/015994080.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/naver/dope/HEAD/015994080.jpg
--------------------------------------------------------------------------------
/example_result.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/naver/dope/HEAD/example_result.jpg
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | DOPE, Copyright (c) 2020 Naver Corporation, is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 license.
2 | 
3 | A summary of the CC BY-NC-SA 4.0 license is located here:
4 | https://creativecommons.org/licenses/by-nc-sa/4.0/
5 | 
6 | The CC BY-NC-SA 4.0 license is located here:
7 | https://creativecommons.org/licenses/by-nc-sa/4.0/legalcode
8 | 
9 | 
--------------------------------------------------------------------------------
/README.MD:
--------------------------------------------------------------------------------
1 | # DOPE: Distillation Of Part Experts for whole-body 3D pose estimation in the wild
2 | 
3 | This repository contains the code for running our DOPE model.
4 | We only provide code for testing, not for training.
5 | If you use our code, please cite our [ECCV'20 paper](https://www.ecva.net/papers/eccv_2020/papers_ECCV/papers/123710375.pdf):
6 | 
7 | ```bibtex
8 | @inproceedings{dope,
9 | title={{DOPE: Distillation Of Part Experts for whole-body 3D pose estimation in the wild}},
10 | author={Weinzaepfel, Philippe and Br\'egier, Romain and Combaluzier, Hadrien and Leroy, Vincent and Rogez, Gr\'egory},
11 | booktitle={{ECCV}},
12 | year={2020}
13 | }
14 | ```
15 | 
16 | ## License
17 | 
18 | DOPE is distributed under the CC BY-NC-SA 4.0 License. See [LICENSE](LICENSE) for more information.
19 | 
20 | ## Getting started
21 | 
22 | Our Python 3 code requires the following packages:
23 | * pytorch
24 | * torchvision
25 | * opencv (for drawing the results)
26 | * numpy/scipy
27 | 
28 | Our code has been tested on Linux, with pytorch 1.5 and torchvision 0.6.
29 | We do not provide support for installation.
30 | 
31 | #### Download the models
32 | 
33 | First create a folder `models/` in which you should place the downloaded pretrained models.
34 | The list of models includes:
35 | * [DOPE_v1_0_0](http://download.europe.naverlabs.com/ComputerVision/DOPE_models/DOPE_v1_0_0.pth.tgz), as used in our ECCV'20 paper
36 | * [DOPErealtime_v1_0_0](http://download.europe.naverlabs.com/ComputerVision/DOPE_models/DOPErealtime_v1_0_0.pth.tgz), its real-time version
37 | 
38 | #### Post-processing with a modified version of LCR-Net++
39 | 
40 | Our post-processing relies on a modified version of the pose proposals integration proposed in the [LCR-Net++ code](https://thoth.inrialpes.fr/src/LCR-Net/).
41 | To get this code, clone our modified LCR-Net++ repository from within the DOPE folder:
42 | ```
43 | git clone https://github.com/naver/lcrnet-v2-improved-ppi.git
44 | ```
45 | 
46 | Alternatively, you can use a more naive post-processing based on non-maximum suppression by adding `--postprocess nms` to the command lines below, which results in a slight decrease in performance.
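If you want to sanity-check this setup before running anything, here is a minimal sketch (not part of the repository; the model name is only an example) that verifies a downloaded checkpoint is present in `models/` and that the cloned `lcrnet-v2-improved-ppi` folder is importable, mirroring the checks that `dope.py` performs at run time:

```python
import os.path as osp
import sys

modelname = 'DOPE_v1_0_0'  # example name; use the model you actually downloaded

# dope.py looks for the checkpoint at models/<modelname>.pth.tgz
ckpt_fname = osp.join('models', modelname + '.pth.tgz')
print(ckpt_fname, 'found' if osp.isfile(ckpt_fname) else 'MISSING -- download it first')

# the default 'ppi' post-processing needs the cloned lcrnet-v2-improved-ppi folder
sys.path.append('lcrnet-v2-improved-ppi')
try:
    from lcr_net_ppi_improved import LCRNet_PPI_improved  # noqa: F401
    print('ppi post-processing available')
except ModuleNotFoundError:
    print('lcrnet-v2-improved-ppi not found; use --postprocess nms instead')
```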
47 | 
48 | 
49 | ## Using the code
50 | 
51 | To use our code on an image, run the following command:
52 | 
53 | ```
54 | python dope.py --model <modelname> --image <imagename>
55 | ```
56 | 
57 | with
58 | * `<modelname>`: name of the model to use (e.g. DOPE_v1_0_0)
59 | * `<imagename>`: name of the image to test
60 | 
61 | For instance, you can run
62 | ```
63 | python dope.py --model DOPErealtime_v1_0_0 --image 015994080.jpg
64 | ```
65 | 
66 | The command will create an image `<imagename>_<modelname>.jpg` that shows the 2D poses output by our DOPE model.
67 | 
68 | We also provide code to visualize the results both in 2D and in 3D: just add the argument `--visu3d` to the previous command line.
69 | The 3D visualization uses OpenGL and the visvis Python package; we do not provide support for installation or OpenGL issues.
70 | Running the command with the `--visu3d` option should create another file with the 3D visualization, named `<imagename>_<modelname>_visu3d.jpg`.
71 | Note that DOPE predicts 3D poses for each body part in its own relative coordinate system, e.g. centered on the body center for bodies.
72 | For better visualization, we approximate 3D scene coordinates by finding the offsets that minimize the reprojection error under a scaled orthographic projection model.
73 | 
74 | Here is one example of a resulting image:
75 | ![example result](example_result.jpg)
76 | 
77 | Our real-time models use half-precision computation. In case your device cannot handle it, please uncomment the line `#ckpt['half'] = False` in `dope.py`.
78 | 
79 | 
80 | 
--------------------------------------------------------------------------------
/visu.py:
--------------------------------------------------------------------------------
1 | # Copyright 2020-present NAVER Corp.
2 | # CC BY-NC-SA 4.0
3 | # Available only for non-commercial use
4 | 
5 | import numpy as np
6 | import cv2
7 | 
8 | def _get_bones_and_colors(J, ignore_neck=False): # colors in BGR
9 | """
10 | param J: number of joints -- used to deduce the body part considered.
11 | param ignore_neck: if True, the neck bone of won't be returned in case of a body (J==13) 12 | """ 13 | if J==13: # full body (similar to LCR-Net) 14 | lbones = [(9,11),(7,9),(1,3),(3,5)] 15 | if ignore_neck: 16 | rbones = [(0,2),(2,4),(8,10),(6,8)] + [(4,5),(10,11)] + [([4,5],[10,11])] 17 | else: 18 | rbones = [(0,2),(2,4),(8,10),(6,8)] + [(4,5),(10,11)] + [([4,5],[10,11]),(12,[10,11])] 19 | bonecolors = [ [0,255,0] ] * len(lbones) + [ [255,0,0] ] * len(rbones) 20 | pltcolors = [ 'g-' ] * len(lbones) + [ 'b-' ] * len(rbones) 21 | bones = lbones + rbones 22 | elif J==21: # hand (format similar to HO3D dataset) 23 | bones = [ [(0,n+1),(n+1,3*n+6),(3*n+6,3*n+7),(3*n+7,3*n+8)] for n in range(5)] 24 | bones = sum(bones,[]) 25 | bonecolors = [(255,0,255)]*4 + [(255,0,0)]*4 + [(0,255,0)]*4 + [(0,255,255)]*4 + [(0,0,255)] *4 26 | pltcolors = ['m']*4 + ['b']*4 + ['g']*4 + ['y']*4 + ['r']*4 27 | elif J==84: # face (ibug format) 28 | bones = [ (n,n+1) for n in range(83) if n not in [32,37,42,46,51,57,63,75]] + [(52,57),(58,63),(64,75),(76,83)] 29 | # 32 x contour + 4 x r-sourcil + 4 x l-sourcil + 7 x nose + 5 x l-eye + 5 x r-eye +20 x lip + l-eye + r-eye + lip + lip 30 | bonecolors = 32 * [(255,0,0)] + 4*[(255,0,0)] + 4*[(255,255,0)] + 7*[(255,0,255)] + 5*[(0,255,255)] + 5*[(0,255,0)] + 18*[(0,0,255)] + [(0,255,255),(0,255,0),(0,0,255),(0,0,255)] 31 | pltcolors = 32 * ['b'] + 4*['b'] + 4*['c'] + 7*['m'] + 5*['y'] + 5*['g'] + 18*['r'] + ['y','g','r','r'] 32 | else: 33 | raise NotImplementedError('unknown bones/colors for J='+str(J)) 34 | return bones, bonecolors, pltcolors 35 | 36 | def _get_xy(pose2d, i): 37 | if isinstance(i,int): 38 | return pose2d[i,:] 39 | else: 40 | return np.mean(pose2d[i,:], axis=0) 41 | 42 | def _get_xy_tupleint(pose2d, i): 43 | return tuple(map(int,_get_xy(pose2d, i))) 44 | 45 | def _get_xyz(pose3d, i): 46 | if isinstance(i,int): 47 | return pose3d[i,:] 48 | else: 49 | return np.mean(pose3d[i,:], axis=0) 50 | 51 | def visualize_bodyhandface2d(im, dict_poses2d, dict_scores=None, lw=2, max_padding=100, bgr=True): 52 | """ 53 | bgr: whether input/output is bgr or rgb 54 | 55 | dict_poses2d: some key/value among {'body': body_pose2d, 'hand': hand_pose2d, 'face': face_pose2d} 56 | """ 57 | if all(v.size==0 for v in dict_poses2d.values()): return im 58 | 59 | h,w = im.shape[:2] 60 | bones = {} 61 | bonecolors = {} 62 | for k,v in dict_poses2d.items(): 63 | bones[k], bonecolors[k], _ = _get_bones_and_colors(v.shape[1]) 64 | 65 | # pad if necessary (if some joints are outside image boundaries) 66 | pad_top, pad_bot, pad_lft, pad_rgt = 0, 0, 0, 0 67 | for poses2d in dict_poses2d.values(): 68 | if poses2d.size==0: continue 69 | xmin, ymin = np.min(poses2d.reshape(-1,2), axis=0) 70 | xmax, ymax = np.max(poses2d.reshape(-1,2), axis=0) 71 | pad_top = max(pad_top, min(max_padding, max(0, int(-ymin-5)))) 72 | pad_bot = max(pad_bot, min(max_padding, max(0, int(ymax+5-h)))) 73 | pad_lft = max(pad_lft, min(max_padding, max(0, int(-xmin-5)))) 74 | pad_rgt = max(pad_rgt, min(max_padding, max(0, int(xmax+5-w)))) 75 | 76 | imout = cv2.copyMakeBorder(im, top=pad_top, bottom=pad_bot, left=pad_lft, right=pad_rgt, borderType=cv2.BORDER_CONSTANT, value=[0,0,0] ) 77 | if not bgr: imout = np.ascontiguousarray(imout[:,:,::-1]) 78 | outposes2d = {} 79 | for part,poses2d in dict_poses2d.items(): 80 | outposes2d[part] = poses2d.copy() 81 | outposes2d[part][:,:,0] += pad_lft 82 | outposes2d[part][:,:,1] += pad_top 83 | 84 | # for each part 85 | for part, poses2d in outposes2d.items(): 86 | 87 | # draw 
each detection 88 | for ipose in range(poses2d.shape[0]): # bones 89 | pose2d = poses2d[ipose,...] 90 | 91 | # draw poses 92 | for ii, (i,j) in enumerate(bones[part]): 93 | p1 = _get_xy_tupleint(pose2d, i) 94 | p2 = _get_xy_tupleint(pose2d, j) 95 | cv2.line(imout, p1, p2, bonecolors[part][ii], thickness=lw*2) 96 | for j in range(pose2d.shape[0]): 97 | p = _get_xy_tupleint(pose2d, j) 98 | cv2.circle(imout, p, (2 if part!='face' else 1)*lw, (0,0,255), thickness=-1) 99 | 100 | # draw scores 101 | if dict_scores is not None: cv2.putText(imout, '{:.2f}'.format(dict_scores[part][ipose]), (int(pose2d[12,0]-10),int(pose2d[12,1]-10)), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0,0,0) ) 102 | 103 | if not bgr: imout = imout[:,:,::-1] 104 | 105 | return imout 106 | -------------------------------------------------------------------------------- /postprocess.py: -------------------------------------------------------------------------------- 1 | # Copyright 2020-present NAVER Corp. 2 | # CC BY-NC-SA 4.0 3 | # Available only for non-commercial use 4 | 5 | import numpy as np 6 | import torch 7 | from torchvision.ops import nms 8 | 9 | def _boxes_from_poses(poses, margin=0.1): # pytorch version 10 | x1y1,_ = torch.min(poses, dim=1) # N x 2 11 | x2y2,_ = torch.max(poses, dim=1) # N x 2 12 | coords = torch.cat( (x1y1,x2y2), dim=1) 13 | sizes = x2y2-x1y1 14 | coords[:,0:2] -= margin * sizes 15 | coords[:,2:4] += margin * sizes 16 | return coords 17 | 18 | def DOPE_NMS(scores, boxes, pose2d, pose3d, min_score=0.5, iou_threshold=0.1): 19 | if scores.numel()==0: 20 | return torch.LongTensor([]), torch.LongTensor([]) 21 | maxscores, bestcls = torch.max(scores[:,1:], dim=1) 22 | valid_indices = torch.nonzero(maxscores>=min_score) 23 | if valid_indices.numel()==0: 24 | return torch.LongTensor([]), torch.LongTensor([]) 25 | else: 26 | valid_indices = valid_indices[:,0] 27 | 28 | boxes = _boxes_from_poses(pose2d[valid_indices,bestcls[valid_indices],:,:], margin=0.1) 29 | indices = valid_indices[ nms(boxes, maxscores[valid_indices,...], iou_threshold) ] 30 | bestcls = bestcls[indices] 31 | 32 | return {'score': scores[indices, bestcls+1], 'pose2d': pose2d[indices, bestcls, :, :], 'pose3d': pose3d[indices, bestcls, :, :]}, indices, bestcls 33 | 34 | 35 | 36 | 37 | 38 | def _get_bbox_from_points(points2d, margin): 39 | """ 40 | Compute a bounding box around 2D keypoints, with a margin. 
41 | margin: the margin is relative to the size of the tight bounding box 42 | 43 | """ 44 | assert (len(points2d.shape)==2 and points2d.shape[1] == 2) 45 | mini = np.min(points2d, axis=0) 46 | maxi = np.max(points2d, axis=0) 47 | size = maxi-mini 48 | lower = mini - margin*size 49 | upper = maxi + margin*size 50 | box = np.concatenate((lower, upper)).astype(np.float32) 51 | return box 52 | 53 | def assign_hands_to_body(body_poses, hand_poses, hand_isright, margin=1): 54 | if body_poses.size==0: return [] 55 | if hand_poses.size==0: return [(-1,-1) for i in range(body_poses.shape[0])] 56 | from scipy.spatial.distance import cdist 57 | body_rwrist = body_poses[:,6,:] 58 | body_lwrist = body_poses[:,7,:] 59 | hand_wrist = hand_poses[:,0,:] 60 | hand_boxes = np.concatenate([_get_bbox_from_points(hand_poses[i,:,:], margin=0.1)[None,:] for i in range(hand_poses.shape[0])], axis=0) 61 | hand_size = np.max(hand_boxes[:,2:4]-hand_boxes[:,0:2], axis=1) 62 | # associate body and hand if the distance hand-body and body-hand is the smallest one and is this distance is smaller than 3*hand_size 63 | wrists_from_body = [(-1,-1) for i in range(body_poses.shape[0])] # pair of (left_hand_id, right_hand_id) 64 | dist_lwrist = cdist(body_lwrist,hand_wrist) 65 | dist_rwrist = cdist(body_rwrist,hand_wrist) 66 | for i in range(body_poses.shape[0]): 67 | lwrist = -1 68 | rwrist = -1 69 | if hand_wrist.size>0: 70 | best_lwrist = np.argmin(dist_lwrist[i,:]) 71 | if np.argmin(dist_lwrist[:,best_lwrist])==i and dist_lwrist[i,best_lwrist] <= margin * hand_size[best_lwrist]: 72 | lwrist = best_lwrist 73 | best_rwrist = np.argmin(dist_rwrist[i,:]) 74 | if np.argmin(dist_rwrist[:,best_rwrist])==i and dist_rwrist[i,best_rwrist] <= margin * hand_size[best_rwrist]: 75 | rwrist = best_rwrist 76 | wrists_from_body[i] = (lwrist,rwrist) 77 | return wrists_from_body # pair of (left_hand_id, right_hand_id) for each body pose (-1 means no association) 78 | 79 | def assign_head_to_body(body_poses, head_poses): 80 | if body_poses.size==0: return [] 81 | if head_poses.size==0: return [-1 for i in range(body_poses.shape[0])] 82 | head_boxes = np.concatenate([_get_bbox_from_points(head_poses[i,:,:], margin=0.1)[None,:] for i in range(head_poses.shape[0])], axis=0) 83 | body_heads = body_poses[:,12,:] 84 | bodyhead_in_headboxes = np.empty( (body_poses.shape[0], head_boxes.shape[0]), dtype=np.bool) 85 | for i in range(body_poses.shape[0]): 86 | bodyhead = body_heads[i,:] 87 | bodyhead_in_headboxes[i,:] = (bodyhead[0]>=head_boxes[:,0]) * (bodyhead[0]<=head_boxes[:,2]) * (bodyhead[1]>=head_boxes[:,1]) * (bodyhead[1]<=head_boxes[:,3]) 88 | head_for_body = [] 89 | for i in range(body_poses.shape[0]): 90 | if np.sum(bodyhead_in_headboxes[i,:])==1: 91 | j = np.where(bodyhead_in_headboxes[i,:])[0][0] 92 | if np.sum(bodyhead_in_headboxes[:,j])==1: 93 | head_for_body.append(j) 94 | else: 95 | head_for_body.append(-1) 96 | else: 97 | head_for_body.append(-1) 98 | return head_for_body 99 | 100 | 101 | 102 | 103 | def assign_hands_and_head_to_body(detections): 104 | det_poses2d = {part: np.stack([d['pose2d'] for d in part_detections], axis=0) if len(part_detections)>0 else np.empty( (0,0,2), dtype=np.float32) for part, part_detections in detections.items()} 105 | hand_isright = np.array([ d['hand_isright'] for d in detections['hand']]) 106 | body_with_wrists = assign_hands_to_body(det_poses2d['body'], det_poses2d['hand'], hand_isright, margin=1) 107 | BODY_RIGHT_WRIST_KPT_ID = 6 108 | BODY_LEFT_WRIST_KPT_ID = 7 109 | for i,(lwrist,rwrist) in 
enumerate(body_with_wrists): 110 | if lwrist != -1: detections['body'][i]['pose2d'][BODY_LEFT_WRIST_KPT_ID,:] = detections['hand'][lwrist]['pose2d'][0,:] 111 | if rwrist != -1: detections['body'][i]['pose2d'][BODY_RIGHT_WRIST_KPT_ID,:] = detections['hand'][rwrist]['pose2d'][0,:] 112 | body_with_head = assign_head_to_body(det_poses2d['body'], det_poses2d['face']) 113 | return detections, body_with_wrists, body_with_head 114 | 115 | -------------------------------------------------------------------------------- /dope.py: -------------------------------------------------------------------------------- 1 | # Copyright 2020-present NAVER Corp. 2 | # CC BY-NC-SA 4.0 3 | # Available only for non-commercial use 4 | 5 | import sys, os 6 | import argparse 7 | import os.path as osp 8 | from PIL import Image 9 | import cv2 10 | import numpy as np 11 | 12 | import torch 13 | from torchvision.transforms import ToTensor 14 | 15 | _thisdir = osp.realpath(osp.dirname(__file__)) 16 | 17 | from model import dope_resnet50, num_joints 18 | import postprocess 19 | 20 | import visu 21 | 22 | def dope(imagename, modelname, postprocessing='ppi'): 23 | if postprocessing=='ppi': 24 | sys.path.append( _thisdir+'/lcrnet-v2-improved-ppi/') 25 | try: 26 | from lcr_net_ppi_improved import LCRNet_PPI_improved 27 | except ModuleNotFoundError: 28 | raise Exception('To use the pose proposals integration (ppi) as postprocessing, please follow the readme instruction by cloning our modified version of LCRNet_v2.0 here. Alternatively, you can use --postprocess nms without any installation, with a slight decrease of performance.') 29 | 30 | 31 | device = 'cuda:0' if torch.cuda.is_available() else 'cpu' 32 | 33 | # load model 34 | ckpt_fname = osp.join(_thisdir, 'models', modelname+'.pth.tgz') 35 | if not os.path.isfile(ckpt_fname): 36 | raise Exception('{:s} does not exist, please download the model first and place it in the models/ folder'.format(ckpt_fname)) 37 | print('Loading model', modelname) 38 | ckpt = torch.load(ckpt_fname, map_location=device) 39 | #ckpt['half'] = False # uncomment this line in case your device cannot handle half computation 40 | ckpt['dope_kwargs']['rpn_post_nms_top_n_test'] = 1000 41 | model = dope_resnet50(**ckpt['dope_kwargs']) 42 | if ckpt['half']: model = model.half() 43 | model = model.eval() 44 | model.load_state_dict(ckpt['state_dict']) 45 | model = model.to(device) 46 | 47 | # load the image 48 | print('Loading image', imagename) 49 | image = Image.open(imagename) 50 | imlist = [ToTensor()(image).to(device)] 51 | if ckpt['half']: imlist = [im.half() for im in imlist] 52 | resolution = imlist[0].size()[-2:] 53 | 54 | # forward pass of the dope network 55 | print('Running DOPE') 56 | with torch.no_grad(): 57 | results = model(imlist, None)[0] 58 | 59 | # postprocess results (pose proposals integration, wrists/head assignment) 60 | print('Postprocessing') 61 | assert postprocessing in ['nms','ppi'] 62 | parts = ['body','hand','face'] 63 | if postprocessing=='ppi': 64 | res = {k: v.float().data.cpu().numpy() for k,v in results.items()} 65 | detections = {} 66 | for part in parts: 67 | detections[part] = LCRNet_PPI_improved(res[part+'_scores'], res['boxes'], res[part+'_pose2d'], res[part+'_pose3d'], resolution, **ckpt[part+'_ppi_kwargs']) 68 | else: # nms 69 | detections = {} 70 | for part in parts: 71 | dets, indices, bestcls = postprocess.DOPE_NMS(results[part+'_scores'], results['boxes'], results[part+'_pose2d'], results[part+'_pose3d'], min_score=0.3) 72 | dets = {k: 
v.float().data.cpu().numpy() for k,v in dets.items()}
73 | detections[part] = [{'score': dets['score'][i], 'pose2d': dets['pose2d'][i,...], 'pose3d': dets['pose3d'][i,...]} for i in range(dets['score'].size)]
74 | if part=='hand':
75 | for i in range(len(detections[part])):
76 | detections[part][i]['hand_isright'] = bool(bestcls[i] < (results['hand_scores'].shape[1]-1)//2) # assumption: the original test was lost in extraction; right-hand anchor classes are taken to be the first half of the hand classes
77 | 
78 | # assignment of hands and head to body
79 | detections, body_with_wrists, body_with_head = postprocess.assign_hands_and_head_to_body(detections)
80 | 
81 | # display results in 2D
82 | print('Displaying results')
83 | det_poses2d = {part: np.stack([d['pose2d'] for d in part_detections], axis=0) if len(part_detections)>0 else np.empty( (0,num_joints[part],2), dtype=np.float32) for part, part_detections in detections.items()}
84 | scores = {part: [d['score'] for d in part_detections] for part,part_detections in detections.items()}
85 | imout = visu.visualize_bodyhandface2d(np.asarray(image)[:,:,::-1],
86 | det_poses2d,
87 | dict_scores=scores,
88 | )
89 | outfile = imagename+'_{:s}.jpg'.format(modelname)
90 | cv2.imwrite(outfile, imout)
91 | print('\t', outfile)
92 | 
93 | # display results in 3D
94 | if args.do_visu3d:
95 | print('Displaying results in 3D')
96 | import visu3d
97 | viewer3d = visu3d.Viewer3d()
98 | img3d, img2d = viewer3d.plot3d(image,
99 | bodies={'pose3d': np.stack([d['pose3d'] for d in detections['body']]), 'pose2d': np.stack([d['pose2d'] for d in detections['body']])},
100 | hands={'pose3d': np.stack([d['pose3d'] for d in detections['hand']]), 'pose2d': np.stack([d['pose2d'] for d in detections['hand']])},
101 | faces={'pose3d': np.stack([d['pose3d'] for d in detections['face']]), 'pose2d': np.stack([d['pose2d'] for d in detections['face']])},
102 | body_with_wrists=body_with_wrists,
103 | body_with_head=body_with_head,
104 | interactive=False)
105 | outfile3d = imagename+'_{:s}_visu3d.jpg'.format(modelname)
106 | cv2.imwrite(outfile3d, img3d[:,:,::-1])
107 | print('\t', outfile3d)
108 | 
109 | 
110 | 
111 | 
112 | 
113 | 
114 | 
115 | if __name__=="__main__":
116 | parser = argparse.ArgumentParser(description='running DOPE on an image: python dope.py --model <modelname> --image <imagename>')
117 | parser.add_argument('--model', required=True, type=str, help='name of the model to use (e.g. DOPE_v1_0_0)')
118 | parser.add_argument('--image', required=True, type=str, help='path to the image')
119 | parser.add_argument('--postprocess', default='ppi', choices=['ppi','nms'], help='postprocessing method')
120 | parser.add_argument('--visu3d', dest='do_visu3d', default=False, action='store_true')
121 | args = parser.parse_args()
122 | dope(args.image, args.model, postprocessing=args.postprocess)
123 | 
--------------------------------------------------------------------------------
/NOTICE:
--------------------------------------------------------------------------------
1 | dope
2 | Copyright 2019-present NAVER Corp.
3 | 
4 | This project contains subcomponents with separate copyright notices and license terms.
5 | Your use of the source code for these subcomponents is subject to the terms and conditions of the following licenses.
6 | 7 | ===== 8 | 9 | pytorch/pytorch 10 | https://github.com/pytorch/pytorch 11 | 12 | 13 | From PyTorch: 14 | 15 | Copyright (c) 2016- Facebook, Inc (Adam Paszke) 16 | Copyright (c) 2014- Facebook, Inc (Soumith Chintala) 17 | Copyright (c) 2011-2014 Idiap Research Institute (Ronan Collobert) 18 | Copyright (c) 2012-2014 Deepmind Technologies (Koray Kavukcuoglu) 19 | Copyright (c) 2011-2012 NEC Laboratories America (Koray Kavukcuoglu) 20 | Copyright (c) 2011-2013 NYU (Clement Farabet) 21 | Copyright (c) 2006-2010 NEC Laboratories America (Ronan Collobert, Leon Bottou, Iain Melvin, Jason Weston) 22 | Copyright (c) 2006 Idiap Research Institute (Samy Bengio) 23 | Copyright (c) 2001-2004 Idiap Research Institute (Ronan Collobert, Samy Bengio, Johnny Mariethoz) 24 | 25 | From Caffe2: 26 | 27 | Copyright (c) 2016-present, Facebook Inc. All rights reserved. 28 | 29 | All contributions by Facebook: 30 | Copyright (c) 2016 Facebook Inc. 31 | 32 | All contributions by Google: 33 | Copyright (c) 2015 Google Inc. 34 | All rights reserved. 35 | 36 | All contributions by Yangqing Jia: 37 | Copyright (c) 2015 Yangqing Jia 38 | All rights reserved. 39 | 40 | All contributions from Caffe: 41 | Copyright(c) 2013, 2014, 2015, the respective contributors 42 | All rights reserved. 43 | 44 | All other contributions: 45 | Copyright(c) 2015, 2016 the respective contributors 46 | All rights reserved. 47 | 48 | Caffe2 uses a copyright model similar to Caffe: each contributor holds 49 | copyright over their contributions to Caffe2. The project versioning records 50 | all such contribution and copyright details. If a contributor wants to further 51 | mark their specific copyright on a particular contribution, they should 52 | indicate their copyright solely in the commit message of the change when it is 53 | committed. 54 | 55 | All rights reserved. 56 | 57 | Redistribution and use in source and binary forms, with or without 58 | modification, are permitted provided that the following conditions are met: 59 | 60 | 1. Redistributions of source code must retain the above copyright 61 | notice, this list of conditions and the following disclaimer. 62 | 63 | 2. Redistributions in binary form must reproduce the above copyright 64 | notice, this list of conditions and the following disclaimer in the 65 | documentation and/or other materials provided with the distribution. 66 | 67 | 3. Neither the names of Facebook, Deepmind Technologies, NYU, NEC Laboratories America 68 | and IDIAP Research Institute nor the names of its contributors may be 69 | used to endorse or promote products derived from this software without 70 | specific prior written permission. 71 | 72 | THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" 73 | AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE 74 | IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE 75 | ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE 76 | LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR 77 | CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF 78 | SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS 79 | INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN 80 | CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) 81 | ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE 82 | POSSIBILITY OF SUCH DAMAGE. 
83 | 84 | ===== 85 | 86 | pytorch/vision 87 | https://github.com/pytorch/vision 88 | 89 | 90 | BSD 3-Clause License 91 | 92 | Copyright (c) Soumith Chintala 2016, 93 | All rights reserved. 94 | 95 | Redistribution and use in source and binary forms, with or without 96 | modification, are permitted provided that the following conditions are met: 97 | 98 | * Redistributions of source code must retain the above copyright notice, this 99 | list of conditions and the following disclaimer. 100 | 101 | * Redistributions in binary form must reproduce the above copyright notice, 102 | this list of conditions and the following disclaimer in the documentation 103 | and/or other materials provided with the distribution. 104 | 105 | * Neither the name of the copyright holder nor the names of its 106 | contributors may be used to endorse or promote products derived from 107 | this software without specific prior written permission. 108 | 109 | THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" 110 | AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE 111 | IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 112 | DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE 113 | FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL 114 | DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR 115 | SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER 116 | CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, 117 | OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE 118 | OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 119 | 120 | ==== 121 | 122 | almarklein/visvis 123 | https://github.com/almarklein/visvis 124 | 125 | 126 | Visvis is subject to the (new) BSD license: 127 | 128 | Copyright (c) 2012-2017, Visvis development team 129 | 130 | Redistribution and use in source and binary forms, with or without 131 | modification, are permitted provided that the following conditions are met: 132 | * Redistributions of source code must retain the above copyright 133 | notice, this list of conditions and the following disclaimer. 134 | * Redistributions in binary form must reproduce the above copyright 135 | notice, this list of conditions and the following disclaimer in the 136 | documentation and/or other materials provided with the distribution. 137 | * The names of its contributors may not be used to endorse or promote 138 | products derived from this software without specific prior written 139 | permission. 140 | 141 | THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" 142 | AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE 143 | IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE 144 | ARE DISCLAIMED. IN NO EVENT SHALL ALMAR KLEIN BE LIABLE FOR ANY 145 | DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES 146 | (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; 147 | LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND 148 | ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT 149 | (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS 150 | SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
151 | 152 | Visvis contributers: 153 | 154 | * Almar Klein 155 | * Robert Schroll 156 | * Keith Smith 157 | * Rob Reilink 158 | -------------------------------------------------------------------------------- /visu3d.py: -------------------------------------------------------------------------------- 1 | # Copyright 2020-present NAVER Corp. 2 | # CC BY-NC-SA 4.0 3 | # Available only for non-commercial use 4 | 5 | 6 | import numpy as np 7 | import visvis as vv 8 | import visu 9 | import scipy.optimize 10 | import copy 11 | import PIL 12 | 13 | def scale_orthographic(points3d, points2d): 14 | """ 15 | Return a scaled set of 3D points, offseted in XY direction so as to minimize the distance to the 2D points 16 | """ 17 | residuals_func1 = lambda x : ((x[None,:2] + points3d[:,:2]) - points2d).flatten() 18 | res1 = scipy.optimize.least_squares(residuals_func1, x0=[0,0], method='lm') 19 | residuals_func2 = lambda x : (np.exp(x[2]) * (x[None,:2] + points3d[:,:2]) - points2d).flatten() 20 | res2 = scipy.optimize.least_squares(residuals_func2, x0=np.concatenate((res1['x'],[0])), method='lm') 21 | x=res2['x'] 22 | output3d = points3d.copy() 23 | output3d[:,:2]+=x[None,:2] 24 | output3d*=np.exp(x[2]) 25 | return output3d 26 | 27 | 28 | 29 | class Viewer3d: 30 | def __init__(self, display2d=True, camera_zoom=None, camera_location=None): 31 | self.app = vv.use('qt5') 32 | self.figsize = (1280+20,720+20) 33 | self.display2d = display2d 34 | self.camera_zoom = camera_zoom 35 | self.camera_location = camera_location 36 | 37 | 38 | def plot3d(self, img, 39 | bodies={'pose3d':np.empty((0,13,3)), 'pose2d':np.empty((0,13,2))}, 40 | hands={'pose3d':np.empty((0,21,3)), 'pose2d':np.empty((0,21,2))}, 41 | faces={'pose3d':np.empty((0,84,3)), 'pose2d':np.empty((0,84,2))}, 42 | body_with_wrists=[], 43 | body_with_head=[], 44 | interactive=False): 45 | """ 46 | :param img: a HxWx3 numpy array 47 | :param bodies: dictionnaroes with 'pose3d' (resp 'pose2d') with the body 3D (resp 2D) pose 48 | :param faces: same with face pose 49 | :param hands: same with hand pose 50 | :param body_with_wrists: list with for each body, a tuple (left_hand_id, right_hand_id) of the index of the hand detection attached to this body detection (-1 if none) for left and right hands 51 | :parma body_with_head: list with for each body, the index of the face detection attached to this body detection (-1 if none) 52 | :param interactive: whether to open the viewer in an interactive manner or not 53 | """ 54 | 55 | # body pose do not use the same coordinate systems 56 | bodies['pose3d'][:,:,0] *= -1 57 | bodies['pose3d'][:,:,1] *= -1 58 | 59 | # Compute 3D scaled representation of each part, stored in "points3d" 60 | hands, bodies, faces = [copy.copy(s) for s in (hands, bodies, faces)] 61 | parts = (hands, bodies, faces) 62 | for part in parts: 63 | part['points3d'] = np.zeros_like(part['pose3d']) 64 | for part_idx in range(len(part['pose3d'])): 65 | points3d = scale_orthographic(part['pose3d'][part_idx], part['pose2d'][part_idx]) 66 | part['points3d'][part_idx] = points3d 67 | 68 | # Various display tricks to make the 3D visualization of full-body nice 69 | # (1) for faces, add a Z offset to faces to align them with the body 70 | for body_id, face_id in enumerate(body_with_head): 71 | if face_id !=-1: 72 | z_offset = bodies['points3d'][body_id,12,2] - np.mean(faces['points3d'][face_id,:,2]) 73 | faces['points3d'][face_id,:,2] += z_offset 74 | # (2) for hands, add a 3D offset to put them at the wrist location 75 | for body_id, (lwrist_id, 
rwrist_id) in enumerate(body_with_wrists): 76 | if lwrist_id != -1: 77 | hands['points3d'][lwrist_id,:,:] = bodies['points3d'][body_id,7,:] - hands['points3d'][lwrist_id,0,:] 78 | if rwrist_id != -1: 79 | hands['points3d'][rwrist_id,:,:] = bodies['points3d'][body_id,6,:] - hands['points3d'][rwrist_id,0,:] 80 | 81 | img = np.asarray(img) 82 | height, width = img.shape[:2] 83 | 84 | 85 | fig=vv.figure(1) 86 | fig.Clear() 87 | 88 | fig._SetPosition(0,0,self.figsize[0], self.figsize[1]) 89 | if not interactive: 90 | fig._enableUserInteraction=False 91 | 92 | axes = vv.gca() 93 | # Hide axis 94 | axes.axis.visible = False 95 | 96 | scaling_factor = 1.0/height 97 | 98 | # Camera interaction is not intuitive along z axis 99 | # We reference every object to a parent frame that is rotated to circumvent the issue 100 | ref_frame = vv.Wobject(axes) 101 | ref_frame.transformations.append(vv.Transform_Rotate(-90, 1,0,0)) 102 | ref_frame.transformations.append(vv.Transform_Translate(-0.5*width*scaling_factor, -0.5, 0)) 103 | 104 | # Draw image 105 | if self.display2d: 106 | # Display pose in 2D 107 | img = visu.visualize_bodyhandface2d(img, 108 | dict_poses2d={'body': bodies['pose2d'], 109 | 'hand': hands['pose2d'], 110 | 'face': faces['pose2d']}, 111 | lw=2, max_padding=0, bgr=False) 112 | 113 | XX, YY = np.meshgrid([0,width*scaling_factor],[0, 1]) 114 | img_z_offset = 0.5 115 | ZZ = img_z_offset * np.ones(XX.shape) 116 | # Draw image 117 | embedded_img = vv.surf(XX, YY, ZZ, img) 118 | embedded_img.parent = ref_frame 119 | embedded_img.ambientAndDiffuse=1.0 120 | 121 | # Draw a grid on the bottom floor to get a sense of depth 122 | XX, ZZ = np.meshgrid(np.linspace(0, width*scaling_factor, 10), img_z_offset - np.linspace(0, width*scaling_factor, 10)) 123 | YY = np.ones_like(XX) 124 | grid3d = vv.surf(XX, YY, ZZ) 125 | grid3d.parent = ref_frame 126 | grid3d.edgeColor=(0.1,0.1,0.1,1.0) 127 | grid3d.edgeShading='plain' 128 | grid3d.faceShading=None 129 | 130 | 131 | # Draw pose 132 | for part in parts: 133 | 134 | for part_idx in range(len(part['points3d'])): 135 | points3d = part['points3d'][part_idx]*scaling_factor 136 | # Draw bones 137 | J = len(points3d) 138 | is_body = (J==13) 139 | ignore_neck = False if not is_body else body_with_head[part_idx]!=-1 140 | bones, bonecolors, pltcolors = visu._get_bones_and_colors(J, ignore_neck=ignore_neck) 141 | for (kpt_id1, kpt_id2), color in zip(bones, bonecolors): 142 | color = color[2], color[1], color[0] # BGR vs RGB 143 | p1 = visu._get_xyz(points3d, kpt_id1) 144 | p2 = visu._get_xyz(points3d, kpt_id2) 145 | pointset=vv.Pointset(3) 146 | pointset.append(p1) 147 | pointset.append(p2) 148 | 149 | # Draw bones as solid capsules 150 | bone_radius = 0.005 151 | line = vv.solidLine(pointset, radius=bone_radius) 152 | line.faceColor = color 153 | line.ambientAndDiffuse=1.0 154 | 155 | line.parent = ref_frame 156 | 157 | # Draw keypoints, except for faces 158 | if J != 84: 159 | keypoints_to_plot = points3d 160 | if ignore_neck: 161 | # for a nicer display, ignore head keypoint 162 | keypoints_to_plot=keypoints_to_plot[:12,:] 163 | # Use solid spheres 164 | for i in range(len(keypoints_to_plot)): 165 | kpt_wobject = vv.solidSphere(translation=keypoints_to_plot[i,:].tolist(), scaling=1.5*bone_radius) 166 | kpt_wobject.faceColor = (255,0,0) 167 | kpt_wobject.ambientAndDiffuse=1.0 168 | kpt_wobject.parent = ref_frame 169 | 170 | # Use just an ambient lighting 171 | axes.light0.ambient=0.8 172 | axes.light0.diffuse=0.2 173 | axes.light0.specular=0.0 174 | 175 | cam = 
vv.cameras.ThreeDCamera() 176 | axes.camera=cam 177 | #z axis 178 | cam.azimuth=-45 179 | cam.elevation=20 180 | cam.roll=0 181 | # Orthographic camera 182 | cam.fov=0 183 | if self.camera_zoom is None: 184 | cam.zoom *= 1.3 # Zoom a bit more 185 | else: 186 | cam.zoom = self.camera_zoom 187 | if self.camera_location is not None: 188 | cam.loc = self.camera_location 189 | cam.SetView() 190 | 191 | 192 | if interactive: 193 | self.app.Run() 194 | else: 195 | fig._widget.update() 196 | self.app.ProcessEvents() 197 | 198 | img3d = vv.getframe(vv.gcf()) 199 | img3d = np.clip(img3d * 255, 0, 255).astype(np.uint8) 200 | # Crop gray borders 201 | img3d = img3d[10:-10, 10:-10,:] 202 | 203 | return img3d, img 204 | 205 | 206 | -------------------------------------------------------------------------------- /model.py: -------------------------------------------------------------------------------- 1 | # Copyright 2020-present NAVER Corp. 2 | # CC BY-NC-SA 4.0 3 | # Available only for non-commercial use 4 | 5 | from collections import OrderedDict 6 | 7 | import torch 8 | from torch import nn 9 | import torch.nn.functional as F 10 | 11 | from torchvision.ops import misc as misc_nn_ops 12 | from torchvision.ops import MultiScaleRoIAlign 13 | 14 | from torchvision.models import resnet 15 | from torchvision.models.detection.rpn import AnchorGenerator, RPNHead, RegionProposalNetwork 16 | from torchvision.models.detection.roi_heads import RoIHeads 17 | from torchvision.models.detection.generalized_rcnn import GeneralizedRCNN 18 | from torchvision.models.detection.transform import GeneralizedRCNNTransform, resize_keypoints, resize_boxes 19 | 20 | parts = ['body','hand','face'] 21 | num_joints = {'body': 13, 'hand': 21, 'face': 84} 22 | 23 | class Dope_Transform(GeneralizedRCNNTransform): 24 | 25 | def __init__(self, min_size, max_size, image_mean, image_std): 26 | super(self.__class__, self).__init__(min_size, max_size, image_mean, image_std) 27 | 28 | def postprocess(self, result, image_shapes, original_image_sizes): 29 | if self.training: 30 | return result 31 | for i, (pred, im_s, o_im_s) in enumerate(zip(result, image_shapes, original_image_sizes)): 32 | boxes = pred["boxes"] 33 | boxes = resize_boxes(boxes, im_s, o_im_s) 34 | result[i]["boxes"] = boxes 35 | for k in ['pose2d', 'body_pose2d', 'hand_pose2d', 'face_pose2d']: 36 | if k in pred and pred[k] is not None: 37 | pose2d = pred[k] 38 | pose2d = resize_keypoints(pose2d, im_s, o_im_s) 39 | result[i][k] = pose2d 40 | return result 41 | 42 | class Dope_RCNN(GeneralizedRCNN): 43 | 44 | def __init__(self, backbone, 45 | dope_roi_pool, dope_head, dope_predictor, 46 | # transform parameters 47 | min_size=800, max_size=1333, 48 | image_mean=None, image_std=None, 49 | # RPN parameters 50 | rpn_anchor_generator=None, rpn_head=None, 51 | rpn_pre_nms_top_n_train=2000, rpn_pre_nms_top_n_test=1000, 52 | rpn_post_nms_top_n_train=2000, rpn_post_nms_top_n_test=1000, 53 | rpn_nms_thresh=0.7, 54 | rpn_fg_iou_thresh=0.7, rpn_bg_iou_thresh=0.3, 55 | rpn_batch_size_per_image=256, rpn_positive_fraction=0.5, 56 | # others 57 | num_anchor_poses = {'body': 20, 'hand': 10, 'face': 10}, 58 | pose2d_reg_weights = {part: 5.0 for part in parts}, 59 | pose3d_reg_weights = {part: 5.0 for part in parts}, 60 | ): 61 | 62 | if not hasattr(backbone, "out_channels"): 63 | raise ValueError( 64 | "backbone should contain an attribute out_channels " 65 | "specifying the number of output channels (assumed to be the " 66 | "same for all the levels)") 67 | 68 | assert 
isinstance(rpn_anchor_generator, (AnchorGenerator, type(None))) 69 | assert isinstance(dope_roi_pool, (MultiScaleRoIAlign, type(None))) 70 | 71 | out_channels = backbone.out_channels 72 | 73 | if rpn_anchor_generator is None: 74 | anchor_sizes = ((32,), (64,), (128,), (256,), (512,)) 75 | aspect_ratios = ((0.5, 1.0, 2.0),) * len(anchor_sizes) 76 | rpn_anchor_generator = AnchorGenerator( 77 | anchor_sizes, aspect_ratios 78 | ) 79 | if rpn_head is None: 80 | rpn_head = RPNHead( 81 | out_channels, rpn_anchor_generator.num_anchors_per_location()[0] 82 | ) 83 | 84 | rpn_pre_nms_top_n = dict(training=rpn_pre_nms_top_n_train, testing=rpn_pre_nms_top_n_test) 85 | rpn_post_nms_top_n = dict(training=rpn_post_nms_top_n_train, testing=rpn_post_nms_top_n_test) 86 | 87 | rpn = RegionProposalNetwork( 88 | rpn_anchor_generator, rpn_head, 89 | rpn_fg_iou_thresh, rpn_bg_iou_thresh, 90 | rpn_batch_size_per_image, rpn_positive_fraction, 91 | rpn_pre_nms_top_n, rpn_post_nms_top_n, rpn_nms_thresh) 92 | 93 | dope_heads = Dope_RoIHeads(dope_roi_pool, dope_head, dope_predictor, num_anchor_poses, pose2d_reg_weights=pose2d_reg_weights, pose3d_reg_weights=pose3d_reg_weights) 94 | 95 | if image_mean is None: 96 | image_mean = [0.485, 0.456, 0.406] 97 | if image_std is None: 98 | image_std = [0.229, 0.224, 0.225] 99 | transform = Dope_Transform(min_size, max_size, image_mean, image_std) 100 | 101 | super(Dope_RCNN, self).__init__(backbone, rpn, dope_heads, transform) 102 | 103 | 104 | 105 | class Dope_Predictor(nn.Module): 106 | 107 | def __init__(self, in_channels, dict_num_classes, dict_num_posereg): 108 | super(self.__class__, self).__init__() 109 | self.body_cls_score = nn.Linear(in_channels, dict_num_classes['body']) 110 | self.body_pose_pred = nn.Linear(in_channels, dict_num_posereg['body']) 111 | self.hand_cls_score = nn.Linear(in_channels, dict_num_classes['hand']) 112 | self.hand_pose_pred = nn.Linear(in_channels, dict_num_posereg['hand']) 113 | self.face_cls_score = nn.Linear(in_channels, dict_num_classes['face']) 114 | self.face_pose_pred = nn.Linear(in_channels, dict_num_posereg['face']) 115 | 116 | 117 | 118 | def forward(self, x): 119 | if x.dim() == 4: 120 | assert list(x.shape[2:]) == [1, 1] 121 | x = x.flatten(start_dim=1) 122 | scores = {} 123 | pose_deltas = {} 124 | scores['body'] = self.body_cls_score(x) 125 | pose_deltas['body'] = self.body_pose_pred(x) 126 | scores['hand'] = self.hand_cls_score(x) 127 | pose_deltas['hand'] = self.hand_pose_pred(x) 128 | scores['face'] = self.face_cls_score(x) 129 | pose_deltas['face'] = self.face_pose_pred(x) 130 | return scores, pose_deltas 131 | 132 | 133 | 134 | 135 | class Dope_RoIHeads(RoIHeads): 136 | 137 | def __init__(self, 138 | dope_roi_pool, 139 | dope_head, 140 | dope_predictor, 141 | num_anchor_poses, 142 | pose2d_reg_weights, 143 | pose3d_reg_weights): 144 | 145 | fg_iou_thresh=0.5 146 | bg_iou_thresh=0.5 147 | batch_size_per_image=512 148 | positive_fraction=0.25 149 | bbox_reg_weights = [0.0]*4 150 | score_thresh = 0.0 151 | nms_thresh = 1.0 152 | detections_per_img = 99999999 153 | super(self.__class__, self).__init__(None, None, None, fg_iou_thresh, bg_iou_thresh, batch_size_per_image, positive_fraction, bbox_reg_weights,score_thresh,nms_thresh,detections_per_img,mask_roi_pool=None,mask_head=None,mask_predictor=None,keypoint_roi_pool=None,keypoint_head=None,keypoint_predictor=None) 154 | for k in parts: 155 | self.register_buffer(k+'_anchor_poses', torch.empty( (num_anchor_poses[k], num_joints[k], 5) )) 156 | self.dope_roi_pool = dope_roi_pool 
157 | self.dope_head = dope_head 158 | self.dope_predictor = dope_predictor 159 | self.J = num_joints 160 | self.pose2d_reg_weights = pose2d_reg_weights 161 | self.pose3d_reg_weights = pose3d_reg_weights 162 | 163 | def forward(self, features, proposals, image_shapes, targets=None): 164 | """ 165 | Arguments: 166 | features (List[torch.Tensor]) 167 | proposals (List[torch.Tensor[N, 4]]) 168 | image_shapes (List[Tuple[H, W]]) 169 | targets (List[Dict]) 170 | """ 171 | 172 | # roi_pool 173 | if features['0'].dtype==torch.float16: # UGLY: dope_roi_pool is not yet compatible with half 174 | features = {'0': features['0'].float()} 175 | if proposals[0].dtype==torch.float16: 176 | hproposals = [p.float() for p in proposals] 177 | else: 178 | hproposals = proposals 179 | dope_features = self.dope_roi_pool(features, hproposals, image_shapes) 180 | dope_features = dope_features.half() 181 | else: 182 | dope_features = self.dope_roi_pool(features, proposals, image_shapes) 183 | 184 | # head 185 | dope_features = self.dope_head(dope_features) 186 | 187 | # predictor 188 | class_logits, dope_regression = self.dope_predictor(dope_features) 189 | 190 | # process results 191 | result = [] 192 | losses = {} 193 | if self.training: 194 | raise NotImplementedError 195 | else: 196 | boxes, scores, poses2d, poses3d = self.postprocess_dope(class_logits, dope_regression, proposals, image_shapes) 197 | num_images = len(boxes) 198 | for i in range(num_images): 199 | res = {'boxes': boxes[i]} 200 | for k in parts: 201 | res[k+'_scores'] = scores[k][i] 202 | res[k+'_pose2d'] = poses2d[k][i] 203 | res[k+'_pose3d'] = poses3d[k][i] 204 | result.append(res) 205 | 206 | return result, losses 207 | 208 | def postprocess_dope(self, class_logits, dope_regression, proposals, image_shapes): 209 | boxes_per_image = [len(boxes_in_image) for boxes_in_image in proposals] 210 | num_images = len(proposals) 211 | pred_scores = {} 212 | all_poses_2d = {} 213 | all_poses_3d = {} 214 | for k in parts: 215 | # anchor poses 216 | anchor_poses = getattr(self, k+'_anchor_poses') 217 | nboxes, num_classes = class_logits[k].size() 218 | # scores 219 | sc = F.softmax(class_logits[k], -1) 220 | pred_scores[k] = sc.split(boxes_per_image, 0) 221 | # poses 222 | all_poses_2d[k] = [] 223 | all_poses_3d[k] = [] 224 | dope_regression[k] = dope_regression[k].view(nboxes, num_classes-1, self.J[k] * 5 ) 225 | dope_regression_per_image = dope_regression[k].split(boxes_per_image, 0) 226 | for img_id in range(num_images): 227 | dope_reg = dope_regression_per_image[img_id] 228 | boxes = proposals[img_id] 229 | # 2d 230 | offset = boxes[:,0:2] 231 | scale = boxes[:,2:4]-boxes[:,0:2] 232 | box_resized_anchors = offset[:,None,None,:] + anchor_poses[None,:,:,:2] * scale[:,None,None,:] 233 | dope_reg_2d = dope_reg[:,:,:2*self.J[k]].reshape(boxes.size(0),num_classes-1,self.J[k],2) / self.pose2d_reg_weights[k] 234 | pose2d = box_resized_anchors + dope_reg_2d * scale[:,None,None,:] 235 | all_poses_2d[k].append(pose2d) 236 | # 3d 237 | anchor3d = anchor_poses[None,:,:,-3:] 238 | dope_reg_3d = dope_reg[:,:,-3*self.J[k]:].reshape(boxes.size(0),num_classes-1,self.J[k],3) / self.pose3d_reg_weights[k] 239 | pose3d = anchor3d + dope_reg_3d 240 | all_poses_3d[k].append(pose3d) 241 | return proposals, pred_scores, all_poses_2d, all_poses_3d 242 | 243 | 244 | 245 | 246 | def dope_resnet50(**dope_kwargs): 247 | 248 | backbone_name = 'resnet50' 249 | from torchvision.ops import misc as misc_nn_ops 250 | class FrozenBatchNorm2dWithHalf(misc_nn_ops.FrozenBatchNorm2d): 251 | 
def forward(self, x): 252 | if x.dtype==torch.float16: # UGLY: seems that it does not work with half otherwise, so let's just use the standard bn function or half 253 | return F.batch_norm(x, self.running_mean, self.running_var, self.weight, self.bias, training=False) 254 | else: 255 | return super(self.__class__, self).forward(x) 256 | 257 | backbone = resnet.__dict__[backbone_name](pretrained=False, norm_layer=FrozenBatchNorm2dWithHalf) 258 | # build the main blocks 259 | class ResNetBody(nn.Module): 260 | def __init__(self, backbone): 261 | super(self.__class__, self).__init__() 262 | self.resnet_backbone = backbone 263 | self.out_channels = 1024 264 | def forward(self, x): 265 | x = self.resnet_backbone.conv1(x) 266 | x = self.resnet_backbone.bn1(x) 267 | x = self.resnet_backbone.relu(x) 268 | x = self.resnet_backbone.maxpool(x) 269 | x = self.resnet_backbone.layer1(x) 270 | x = self.resnet_backbone.layer2(x) 271 | x = self.resnet_backbone.layer3(x) 272 | return x 273 | resnet_body = ResNetBody(backbone) 274 | # build the anchor generator and pooler 275 | anchor_generator = AnchorGenerator(sizes=((32, 64, 128, 256, 512),), aspect_ratios=((0.5, 1.0, 2.0),)) 276 | roi_pooler = MultiScaleRoIAlign(featmap_names=['0'], output_size=7, sampling_ratio=2) 277 | # build the head and predictor 278 | class ResNetHead(nn.Module): 279 | def __init__(self, backbone): 280 | super(self.__class__, self).__init__() 281 | self.resnet_backbone = backbone 282 | def forward(self, x): 283 | x = self.resnet_backbone.layer4(x) 284 | x = self.resnet_backbone.avgpool(x) 285 | x = torch.flatten(x, 1) 286 | return x 287 | resnet_head = ResNetHead(backbone) 288 | 289 | # predictor 290 | num_anchor_poses = dope_kwargs['num_anchor_poses'] 291 | num_classes = {k: v+1 for k,v in num_anchor_poses.items()} 292 | num_posereg = {k: num_anchor_poses[k] * num_joints[k] * 5 for k in num_joints.keys()} 293 | predictor = Dope_Predictor(2048, num_classes, num_posereg) 294 | 295 | # full model 296 | model = Dope_RCNN(resnet_body, roi_pooler, resnet_head, predictor, rpn_anchor_generator=anchor_generator, **dope_kwargs) 297 | 298 | return model 299 | --------------------------------------------------------------------------------
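As a closing illustration of how the pieces above fit together, here is a minimal sketch (assuming a checkpoint has been downloaded to `models/`; the model and image names are only examples) that builds `dope_resnet50` from the `dope_kwargs` stored in the checkpoint and runs one forward pass, mirroring the inference path of `dope.py`:

```python
import torch
from PIL import Image
from torchvision.transforms import ToTensor

from model import dope_resnet50

device = 'cuda:0' if torch.cuda.is_available() else 'cpu'

# the checkpoint stores the keyword arguments used to build the network
ckpt = torch.load('models/DOPE_v1_0_0.pth.tgz', map_location=device)  # example model name
ckpt['dope_kwargs']['rpn_post_nms_top_n_test'] = 1000  # as set in dope.py
model = dope_resnet50(**ckpt['dope_kwargs'])
if ckpt['half']:
    model = model.half()
model.load_state_dict(ckpt['state_dict'])
model = model.eval().to(device)

# one image -> a dict with boxes and per-part scores / 2D poses / 3D poses
img = ToTensor()(Image.open('015994080.jpg')).to(device)
if ckpt['half']:
    img = img.half()
with torch.no_grad():
    results = model([img], None)[0]
print(results['boxes'].shape, results['body_pose2d'].shape, results['hand_pose3d'].shape)
```

The raw detections would then go through the post-processing (`ppi` or `nms`) and the 2D/3D visualization exactly as in `dope.py`.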