├── .gitignore
├── LICENSE
├── README.md
├── data
│   └── placeholder.txt
├── images
│   ├── blouse.jpg
│   ├── dress.jpg
│   ├── outwear.jpg
│   ├── skirt.jpg
│   └── trousers.jpg
├── src
│   ├── data_gen
│   │   ├── data_generator.py
│   │   ├── data_process.py
│   │   ├── dataset.py
│   │   ├── kpAnno.py
│   │   ├── ohem.py
│   │   └── utils.py
│   ├── eval
│   │   ├── eval_callback.py
│   │   ├── evaluation.py
│   │   └── post_process.py
│   ├── top
│   │   ├── demo.py
│   │   ├── test.py
│   │   └── train.py
│   └── unet
│       ├── fashion_net.py
│       ├── refinenet.py
│       ├── refinenet_mask_v3.py
│       └── resnet101.py
├── submission
│   └── placeholder.txt
└── trained_models
    └── placeholder.txt

/.gitignore:
--------------------------------------------------------------------------------
 1 | .idea
 2 | *.pyc
 3 | *.pkl
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
 1 | MIT License
 2 | 
 3 | Copyright (c) 2018 VictorLi
 4 | 
 5 | Permission is hereby granted, free of charge, to any person obtaining a copy
 6 | of this software and associated documentation files (the "Software"), to deal
 7 | in the Software without restriction, including without limitation the rights
 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
 9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 | 
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 | 
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
 1 | # AiFashion
 2 | 
 3 | - Author: VictorLi, yuanyuan.li85@gmail.com
 4 | - Code for the FashionAI Global Challenge—Key Points Detection of Apparel
 5 | [2018 TianChi](https://tianchi.aliyun.com/competition/introduction.htm?spm=5176.100068.5678.1.4ccc289bCzDJXu&raceId=231648&_lang=en_US)
 6 | - Ranked 45/2322 in the 1st round of the competition, score 0.61
 7 | - Ranked 46 in the 2nd round, score 0.477
 8 | 
 9 | ## Images with detected keypoints
10 | ### Dress
11 | ![Dress](./images/dress.jpg)
12 | ### Blouse
13 | ![Blouse](./images/blouse.jpg)
14 | ### Outwear
15 | ![Outwear](./images/outwear.jpg)
16 | ### Skirt
17 | ![Skirt](./images/skirt.jpg)
18 | ### Trousers
19 | ![Trousers](./images/trousers.jpg)
20 | 
21 | 
22 | ## Basic idea
23 | - The key idea comes from the paper [Cascaded Pyramid Network for Multi-Person Pose Estimation](https://arxiv.org/abs/1711.07319): a two-stage network made of a global net and a refine net, both U-Net-like, trained to predict heatmaps of the clothing keypoints. The backbone network is ResNet-101.
24 | - To overcome the negative impact of mixing categories, an `input_mask` is introduced to zero out invalid keypoints. For example, skirt has 4 valid keypoints: `waistband_left`, `waistband_right`, `hemline_left` and `hemline_right`. In `input_mask`, only those 4 valid channels are set to 1.0, while the other 20 channels are set to 0.0.
25 | - Online hard example mining: at the last stage of the refine net, only the top-k channel losses are counted, and the easy channels (small loss) are ignored.
26 | 
27 | ## Dependency
28 | - Keras 2.0
29 | - TensorFlow
30 | - OpenCV/NumPy/Pandas
31 | - Pretrained model weights for ResNet-101
32 | 
33 | ## Folder Structure
34 | - `data`: folder to store training and testing images and annotations
35 | - `trained_models`: folder to store trained models and logs
36 | - `submission`: folder to store generated submissions for evaluation.
37 | - `src`: folder for all source code.
38 |   - `src/data_gen`: code for the data generator, including data augmentation and pre-processing
39 |   - `src/eval`: code for evaluation, including inference and post-processing.
40 |   - `src/unet`: code for the CNN model definition, including train, fine-tune, loss and optimizer definitions.
41 |   - `src/top`: top-level code for train, test and demo.
42 | 
43 | ## How to train network
44 | - Download the dataset from the competition webpage and put it under `data`.
45 |   `data/train`: data used for training. `data/test`: data used for testing.
46 | - Download the [resnet101](https://gist.github.com/flyyufelix/65018873f8cb2bbe95f429c474aa1294) model and save it as `data/resnet101_weights_tf.h5`.
47 |   Note: all the models here use the channels_last dim order.
48 | - Train the all-in-one network from scratch
49 | ```
50 | python train.py --category all --epochs 30 --network v11 --batchSize 3 --gpuID 2
51 | ```
52 | - The trained model and log will be put under `trained_models/all/xxxx`, e.g. `trained_models/all/2018_05_23_15_18_07/`
53 | - The evaluation will run after each epoch and details are saved to `val.log`
54 | - Resume training from a specific model.
55 | ```
56 | python train.py --gpuID 2 --category all --epochs 30 --network v11 --batchSize 3 --resume True --resumeModel /path/to/model/start/with --initEpoch 6
57 | ```
58 | 
59 | ## How to test and generate submission
60 | - Run test and generate the submission.
61 |   The command below searches `val.log` under `modelpath` for the best-scoring model and uses it to generate the submission:
62 | ```
63 | python test.py --gpuID 2 --modelpath ../../trained_models/all/xxx --outpath ../../submission/2018_04_19/ --augment True
64 | ```
65 |   The submission will be saved as `submission.csv`
66 | 
67 | ## How to run demo
68 | - Download the pre-trained weights from [BaiduDisk](https://pan.baidu.com/s/1t7fB5wnRfW1Vny0gw7xUDQ) (password `1ae2`) or [GoogleDrive](https://drive.google.com/open?id=1VY-AO2F1XMQLBjEZjy6CrOSIPWWaHUGr)
69 | - Save it somewhere, e.g. `trained_models/all/fashion_ai_keypoint_weights_epoch28.hdf5`
70 | - Or use your own trained model.
71 | - Run the demo; the garment with keypoints marked will be displayed.
72 | ``` 73 | python demo.py --gpuID 2 --modelfile ../../trained_models/all/fashion_ai_keypoint_weights_epoch28.hdf5 74 | ``` 75 | 76 | ## Reference 77 | - Resnet 101 Keras : https://github.com/statech/resnet 78 | -------------------------------------------------------------------------------- /data/placeholder.txt: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/yuanyuanli85/FashionAI_KeyPoint_Detection_Challenge_Keras/0b3bd8cdee32e05619300e5466578644974279df/data/placeholder.txt -------------------------------------------------------------------------------- /images/blouse.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/yuanyuanli85/FashionAI_KeyPoint_Detection_Challenge_Keras/0b3bd8cdee32e05619300e5466578644974279df/images/blouse.jpg -------------------------------------------------------------------------------- /images/dress.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/yuanyuanli85/FashionAI_KeyPoint_Detection_Challenge_Keras/0b3bd8cdee32e05619300e5466578644974279df/images/dress.jpg -------------------------------------------------------------------------------- /images/outwear.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/yuanyuanli85/FashionAI_KeyPoint_Detection_Challenge_Keras/0b3bd8cdee32e05619300e5466578644974279df/images/outwear.jpg -------------------------------------------------------------------------------- /images/skirt.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/yuanyuanli85/FashionAI_KeyPoint_Detection_Challenge_Keras/0b3bd8cdee32e05619300e5466578644974279df/images/skirt.jpg -------------------------------------------------------------------------------- /images/trousers.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/yuanyuanli85/FashionAI_KeyPoint_Detection_Challenge_Keras/0b3bd8cdee32e05619300e5466578644974279df/images/trousers.jpg -------------------------------------------------------------------------------- /src/data_gen/data_generator.py: -------------------------------------------------------------------------------- 1 | 2 | import os 3 | import cv2 4 | import pandas as pd 5 | import numpy as np 6 | import random 7 | 8 | from kpAnno import KpAnno 9 | from dataset import getKpNum, getKpKeys, getFlipMapID, generate_input_mask 10 | from utils import make_gaussian, load_annotation_from_df 11 | from data_process import pad_image, resize_image, normalize_image, rotate_image, \ 12 | rotate_image_float, rotate_mask, crop_image 13 | from ohem import generate_topk_mask_ohem 14 | 15 | class DataGenerator(object): 16 | 17 | def __init__(self, category, annfile): 18 | self.category = category 19 | self.annfile = annfile 20 | self._initialize() 21 | 22 | def get_dim_order(self): 23 | # default tensorflow dim order 24 | return "channels_last" 25 | 26 | def get_dataset_size(self): 27 | return len(self.annDataFrame) 28 | 29 | def generator_with_mask_ohem(self, graph, kerasModel, batchSize=16, inputSize=(512, 512), flipFlag=False, cropFlag=False, 30 | shuffle=True, rotateFlag=True, nStackNum=1): 31 | 32 | ''' 33 | Input: batch_size * Height (512) * Width (512) * Channel (3) 34 | Input: batch_size * 256 * 256 * Channel (N+1). Mask for each category. 
1.0 for valid parts in category. 0.0 for invalid parts 35 | Output: batch_size * Height/2 (256) * Width/2 (256) * Channel (N+1) 36 | ''' 37 | xdf = self.annDataFrame 38 | 39 | targetHeight, targetWidth = inputSize 40 | 41 | # train_input: npfloat, height, width, channels 42 | # train_gthmap: npfloat, N heatmap + 1 background heatmap, 43 | train_input = np.zeros((batchSize, targetHeight, targetWidth, 3), dtype=np.float) 44 | train_mask = np.zeros((batchSize, targetHeight / 2, targetWidth / 2, getKpNum(self.category) ), dtype=np.float) 45 | train_gthmap = np.zeros((batchSize, targetHeight / 2, targetWidth / 2, getKpNum(self.category) ), dtype=np.float) 46 | train_ohem_mask = np.zeros((batchSize, targetHeight / 2, targetWidth / 2, getKpNum(self.category) ), dtype=np.float) 47 | train_ohem_gthmap = np.zeros((batchSize, targetHeight / 2, targetWidth / 2, getKpNum(self.category) ), dtype=np.float) 48 | 49 | ## generator need to be infinite loop 50 | while 1: 51 | # random shuffle at first 52 | if shuffle: 53 | xdf = xdf.sample(frac=1) 54 | count = 0 55 | for _index, _row in xdf.iterrows(): 56 | xindex = count % batchSize 57 | xinput, xhmap = self._prcoess_img(_row, inputSize, rotateFlag, flipFlag, cropFlag, nobgFlag=True) 58 | xmask = generate_input_mask(_row['image_category'], 59 | (targetHeight, targetWidth, getKpNum(self.category))) 60 | 61 | xohem_mask, xohem_gthmap = generate_topk_mask_ohem([xinput, xmask], xhmap, kerasModel, graph, 62 | 8, _row['image_category'], dynamicFlag=False) 63 | 64 | train_input[xindex, :, :, :] = xinput 65 | train_mask[xindex, :, :, :] = xmask 66 | train_gthmap[xindex, :, :, :] = xhmap 67 | train_ohem_mask[xindex, :, :, :] = xohem_mask 68 | train_ohem_gthmap[xindex, :, :, :] = xohem_gthmap 69 | 70 | # if refinenet enable, refinenet has two outputs, globalnet and refinenet 71 | if xindex == 0 and count != 0: 72 | gthamplst = list() 73 | for i in range(nStackNum): 74 | gthamplst.append(train_gthmap) 75 | 76 | # last stack will use ohem gthmap 77 | gthamplst.append(train_ohem_gthmap) 78 | 79 | yield [train_input, train_mask, train_ohem_mask], gthamplst 80 | 81 | count += 1 82 | 83 | def _initialize(self): 84 | self._load_anno() 85 | 86 | def _load_anno(self): 87 | ''' 88 | Load annotations from train.csv 89 | ''' 90 | # Todo: check if category legal 91 | self.train_img_path = "../../data/train" 92 | 93 | # read into dataframe 94 | xpd = pd.read_csv(self.annfile) 95 | xpd = load_annotation_from_df(xpd, self.category) 96 | self.annDataFrame = xpd 97 | 98 | def _prcoess_img(self, dfrow, inputSize, rotateFlag, flipFlag, cropFlag, nobgFlag): 99 | 100 | mlist = dfrow[getKpKeys(self.category)] 101 | imgName, kpStr = mlist[0], mlist[1:] 102 | 103 | # read kp annotation from csv file 104 | kpAnnlst = list() 105 | for _kpstr in kpStr: 106 | _kpAn = KpAnno.readFromStr(_kpstr) 107 | kpAnnlst.append(_kpAn) 108 | 109 | assert (len(kpAnnlst) == getKpNum(self.category)), str(len(kpAnnlst))+" is not the same as "+str(getKpNum(self.category)) 110 | 111 | 112 | xcvmat = cv2.imread(os.path.join(self.train_img_path, imgName)) 113 | if xcvmat is None: 114 | return None, None 115 | 116 | #flip as first operation. 
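        # Augmentation order below: flip -> pad to square -> build ground-truth
        # heatmaps -> rotate image and heatmaps together -> resize to the network
        # input/output size. Flipping must come first so that left/right keypoint
        # channels can be swapped (via getFlipMapID in flip_annlst) while the
        # annotations are still in image coordinates.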
117 | # flip image 118 | if random.choice([0, 1]) and flipFlag: 119 | xcvmat, kpAnnlst = self.flip_image(xcvmat, kpAnnlst) 120 | 121 | #if cropFlag: 122 | # xcvmat, kpAnnlst = crop_image(xcvmat, kpAnnlst, 0.8, 0.95) 123 | 124 | # pad image to 512x512 125 | paddedImg, kpAnnlst = pad_image(xcvmat, kpAnnlst, inputSize[0], inputSize[1]) 126 | 127 | assert (len(kpAnnlst) == getKpNum(self.category)), str(len(kpAnnlst)) + " is not the same as " + str( 128 | getKpNum(self.category)) 129 | 130 | # output ground truth heatmap is 256x256 131 | trainGtHmap = self.__generate_hmap(paddedImg, kpAnnlst) 132 | 133 | if random.choice([0,1]) and rotateFlag: 134 | rAngle = np.random.randint(-1*40, 40) 135 | rotatedImage, _ = rotate_image(paddedImg, list(), rAngle) 136 | rotatedGtHmap = rotate_mask(trainGtHmap, rAngle) 137 | else: 138 | rotatedImage = paddedImg 139 | rotatedGtHmap = trainGtHmap 140 | 141 | # resize image 142 | resizedImg = cv2.resize(rotatedImage, inputSize) 143 | resizedGtHmap = cv2.resize(rotatedGtHmap, (inputSize[0]//2, inputSize[1]//2)) 144 | 145 | return normalize_image(resizedImg), resizedGtHmap 146 | 147 | 148 | def __generate_hmap(self, cvmat, kpAnnolst): 149 | # kpnum + background 150 | gthmp = np.zeros((cvmat.shape[0], cvmat.shape[1], getKpNum(self.category)), dtype=np.float) 151 | 152 | for i, _kpAnn in enumerate(kpAnnolst): 153 | if _kpAnn.visibility == -1: 154 | continue 155 | 156 | radius = 100 157 | gaussMask = make_gaussian(radius, radius, 20, None) 158 | 159 | # avoid out of boundary 160 | top_x, top_y = max(0, _kpAnn.x - radius/2), max(0, _kpAnn.y - radius/2) 161 | bottom_x, bottom_y = min(cvmat.shape[1], _kpAnn.x + radius/2), min(cvmat.shape[0], _kpAnn.y + radius/2) 162 | 163 | top_x_offset = top_x - (_kpAnn.x - radius/2) 164 | top_y_offset = top_y - (_kpAnn.y - radius/2) 165 | 166 | gthmp[ top_y:bottom_y, top_x:bottom_x, i] = gaussMask[top_y_offset:top_y_offset + bottom_y-top_y, 167 | top_x_offset:top_x_offset + bottom_x-top_x] 168 | 169 | return gthmp 170 | 171 | def flip_image(self, orgimg, orgKpAnolst): 172 | flipImg = cv2.flip(orgimg, flipCode=1) 173 | flipannlst = self.flip_annlst(orgKpAnolst, orgimg.shape) 174 | return flipImg, flipannlst 175 | 176 | 177 | def flip_annlst(self, kpannlst, imgshape): 178 | height, width, channels = imgshape 179 | 180 | # flip first 181 | flipAnnlst = list() 182 | for _kp in kpannlst: 183 | flip_x = width - _kp.x 184 | flipAnnlst.append(KpAnno(flip_x, _kp.y, _kp.visibility)) 185 | 186 | # exchange location of flip keypoints, left->right 187 | outAnnlst = flipAnnlst[:] 188 | for i, _kp in enumerate(flipAnnlst): 189 | mapId = getFlipMapID('all', i) 190 | outAnnlst[mapId] = _kp 191 | 192 | return outAnnlst 193 | 194 | 195 | 196 | 197 | -------------------------------------------------------------------------------- /src/data_gen/data_process.py: -------------------------------------------------------------------------------- 1 | import pandas as pd 2 | import numpy as np 3 | import cv2 4 | import os 5 | from kpAnno import KpAnno 6 | 7 | def normalize_image(cvmat): 8 | assert (cvmat.dtype == np.uint8) , " only support normalize np.uint8 to float -0.5 ~ 0.5'" 9 | cvmat = cvmat.astype(np.float) 10 | cvmat = (cvmat - 128.0) / 256.0 11 | return cvmat 12 | 13 | def resize_image(cvmat, targetWidth, targetHeight): 14 | 15 | assert (cvmat.dtype == np.uint8) , " only support normalize np.uint8 in resize_image'" 16 | 17 | # get scale 18 | srcHeight, srcWidth, channles = cvmat.shape 19 | minScale = min( targetHeight*1.0/srcHeight, 
targetWidth*1.0/srcWidth) 20 | 21 | # resize 22 | resizedMat = cv2.resize(cvmat, None, fx=minScale, fy=minScale) 23 | reHeight, reWidth, channles = resizedMat.shape 24 | 25 | # pad to targetWidth or targetHeight 26 | outmat = np.zeros((targetHeight, targetWidth, 3), dtype=cvmat.dtype) + 128 27 | 28 | if targetHeight == reHeight and targetWidth == reWidth: 29 | outmat = resizedMat 30 | elif targetWidth != reWidth and targetHeight == reHeight: 31 | # add pad to width 32 | outmat[:, 0:reWidth, :] = resizedMat 33 | elif targetHeight != reHeight and targetWidth == reWidth: 34 | # add padding to height 35 | outmat[0:reHeight, :, :] = resizedMat 36 | else: 37 | assert(0), "after resize either width or height same as target width or target height" 38 | return (outmat, minScale) 39 | 40 | def pad_image(cvmat, kpAnno, targetWidth, targetHeight): 41 | ''' 42 | 43 | :param cvmat: input mat 44 | :param targetWidth: width to pad 45 | :param targetHeight: height to pad 46 | :return: 47 | ''' 48 | assert (cvmat.dtype == np.uint8) , " only support normalize np.uint8 in pad_image'" + str(cvmat.dtype) 49 | 50 | srcHeight, srcWidth, channles = cvmat.shape 51 | outmat = np.zeros((targetHeight, targetWidth, 3), dtype=cvmat.dtype) + 128 52 | 53 | if targetHeight == srcHeight and targetWidth == srcWidth: 54 | outmat = cvmat 55 | outkpAnno = kpAnno 56 | elif targetWidth != srcWidth and targetHeight == srcHeight: 57 | # add pad to width 58 | outmat[:, 0:srcWidth, :] = cvmat 59 | outkpAnno = kpAnno 60 | elif targetHeight != srcHeight and targetWidth == srcWidth: 61 | # add padding to height 62 | outmat[0:srcHeight, :, :] = cvmat 63 | outkpAnno = kpAnno 64 | else: 65 | # resize at first, then pad 66 | outmat, scale = resize_image(cvmat, targetWidth, targetHeight) 67 | outkpAnno = list() 68 | for _kpAnno in kpAnno: 69 | _nkp = KpAnno.applyScale(_kpAnno, scale) 70 | outkpAnno.append(_nkp) 71 | return (outmat, outkpAnno) 72 | 73 | 74 | def pad_image_inference(cvmat, targetWidth, targetHeight): 75 | ''' 76 | 77 | :param cvmat: input mat 78 | :param targetWidth: width to pad 79 | :param targetHeight: height to pad 80 | :return: 81 | ''' 82 | assert (cvmat.dtype == np.uint8), " only support normalize np.uint8 in pad_image'" + str(cvmat.dtype) 83 | 84 | srcHeight, srcWidth, channles = cvmat.shape 85 | outmat = np.zeros((targetHeight, targetWidth, 3), dtype=cvmat.dtype) + 128 86 | 87 | if targetHeight == srcHeight and targetWidth == srcWidth: 88 | outmat = cvmat 89 | scale = 1.0 90 | elif targetWidth > srcWidth and targetHeight == srcHeight: 91 | # add pad to width 92 | outmat[:, 0:srcWidth, :] = cvmat 93 | scale = 1.0 94 | elif targetHeight > srcHeight and targetWidth == srcWidth: 95 | # add padding to height 96 | outmat[0:srcHeight, :, :] = cvmat 97 | scale = 1.0 98 | else: 99 | # resize at first, then pad 100 | outmat, scale = resize_image(cvmat, targetWidth, targetHeight) 101 | 102 | return (outmat, scale) 103 | 104 | def rotate_image(cvmat, kpAnnLst, rotateAngle): 105 | 106 | assert (cvmat.dtype == np.uint8) , " only support normalize np.uint8 in rotate_image'" 107 | 108 | ##Make sure cvmat is square? 
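    # The output canvas is enlarged so the rotated image is not clipped: for a
    # rotation by angle theta, the bounding box is w*|cos|+h*|sin| wide and
    # w*|sin|+h*|cos| high, and the translation terms added to rotateMatrix
    # recenter the image on that canvas. newH/newW below are swapped relative to
    # these formulas, which is harmless only because the inputs here are square
    # (padded 512x512) images, hence the comment above.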
109 | height, width, channel = cvmat.shape 110 | 111 | center = ( width//2, height//2) 112 | rotateMatrix = cv2.getRotationMatrix2D(center, rotateAngle, 1.0) 113 | 114 | cos, sin = np.abs(rotateMatrix[0,0]), np.abs(rotateMatrix[0, 1]) 115 | newH = int((height*sin)+(width*cos)) 116 | newW = int((height*cos)+(width*sin)) 117 | 118 | rotateMatrix[0,2] += (newW/2) - center[0] #x 119 | rotateMatrix[1,2] += (newH/2) - center[1] #y 120 | 121 | # rotate image 122 | outMat = cv2.warpAffine(cvmat, rotateMatrix, (newH, newW), borderValue=(128, 128, 128)) 123 | 124 | # rotate annotations 125 | nKpLst = list() 126 | for _kp in kpAnnLst: 127 | _newkp = KpAnno.applyRotate(_kp, rotateMatrix) 128 | nKpLst.append(_newkp) 129 | 130 | return (outMat, nKpLst) 131 | 132 | 133 | def rotate_image_with_invrmat(cvmat, rotateAngle): 134 | 135 | assert (cvmat.dtype == np.uint8) , " only support normalize np.uint in rotate_image_with_invrmat'" 136 | 137 | ##Make sure cvmat is square? 138 | height, width, channel = cvmat.shape 139 | 140 | center = ( width//2, height//2) 141 | rotateMatrix = cv2.getRotationMatrix2D(center, rotateAngle, 1.0) 142 | 143 | cos, sin = np.abs(rotateMatrix[0,0]), np.abs(rotateMatrix[0, 1]) 144 | newH = int((height*sin)+(width*cos)) 145 | newW = int((height*cos)+(width*sin)) 146 | 147 | rotateMatrix[0,2] += (newW/2) - center[0] #x 148 | rotateMatrix[1,2] += (newH/2) - center[1] #y 149 | 150 | # rotate image 151 | outMat = cv2.warpAffine(cvmat, rotateMatrix, (newH, newW), borderValue=(128, 128, 128)) 152 | 153 | # generate inv rotate matrix 154 | invRotateMatrix = cv2.invertAffineTransform(rotateMatrix) 155 | 156 | return (outMat, invRotateMatrix, (width, height)) 157 | 158 | def rotate_mask(mask, rotateAngle): 159 | 160 | outmask = rotate_image_float(mask, rotateAngle) 161 | 162 | return outmask 163 | 164 | def rotate_image_float(cvmat, rotateAngle, borderValue=(0.0, 0.0, 0.0)): 165 | 166 | assert (cvmat.dtype == np.float) , " only support normalize np.float in rotate_image_float'" 167 | 168 | ##Make sure cvmat is square? 
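    # Same enlarged-canvas rotation (and the same square-input assumption) as
    # rotate_image above, but for float arrays such as heatmaps and masks.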
169 | height, width, channels = cvmat.shape 170 | 171 | center = ( width//2, height//2) 172 | rotateMatrix = cv2.getRotationMatrix2D(center, rotateAngle, 1.0) 173 | 174 | cos, sin = np.abs(rotateMatrix[0,0]), np.abs(rotateMatrix[0, 1]) 175 | newH = int((height*sin)+(width*cos)) 176 | newW = int((height*cos)+(width*sin)) 177 | 178 | rotateMatrix[0,2] += (newW/2) - center[0] #x 179 | rotateMatrix[1,2] += (newH/2) - center[1] #y 180 | 181 | # rotate image 182 | outMat = cv2.warpAffine(cvmat, rotateMatrix, (newH, newW), borderValue=borderValue) 183 | 184 | return outMat 185 | 186 | 187 | def crop_image(cvmat, kpAnnLst, lowLimitRatio, upLimitRatio): 188 | import random 189 | 190 | assert(lowLimitRatio < 1.0), 'lowLimitRatio should be less than 1.0' 191 | assert(upLimitRatio < 1.0), 'upLimitRatio should be less than 1.0' 192 | 193 | height, width, channels = cvmat.shape 194 | 195 | cropHeight = random.randrange(int(lowLimitRatio*height), int(upLimitRatio*height)) 196 | cropWidth = random.randrange(int(lowLimitRatio*width), int(upLimitRatio*width)) 197 | 198 | top_x = random.randrange(0, width - cropWidth) 199 | top_y = random.randrange(0, height - cropHeight) 200 | 201 | # apply offset for keypoints 202 | nKpLst = list() 203 | for _kp in kpAnnLst: 204 | if _kp.visibility == -1: 205 | _newkp = _kp 206 | else: 207 | _newkp = KpAnno.applyOffset(_kp, (top_x, top_y)) 208 | if _newkp.x <=0 or _newkp.y <=0: 209 | # negative location, return original image 210 | return cvmat, kpAnnLst 211 | if _newkp.x >= cropWidth or _newkp.y >= cropHeight: 212 | # keypoints are cropped out 213 | return cvmat, kpAnnLst 214 | nKpLst.append(_newkp) 215 | 216 | return cvmat[top_y:top_y+cropHeight, top_x:top_x+cropWidth], nKpLst 217 | 218 | if __name__ == "__main__": 219 | pass -------------------------------------------------------------------------------- /src/data_gen/dataset.py: -------------------------------------------------------------------------------- 1 | 2 | 3 | def getKpNum(category): 4 | # remove one column 'image_id' 5 | return len(getKpKeys(category)) - 1 6 | 7 | TROUSERS_PART_KYES=['waistband_left', 'waistband_right', 'crotch', 'bottom_left_in', 'bottom_left_out', 'bottom_right_in', 'bottom_right_out'] 8 | TROUSERS_PART_FLIP_KYES=['waistband_right', 'waistband_left', 'crotch', 'bottom_right_in', 'bottom_right_out', 'bottom_left_in', 'bottom_left_out'] 9 | 10 | SKIRT_PART_KEYS=['waistband_left', 'waistband_right', 'hemline_left', 'hemline_right'] 11 | SKIRT_PART_FLIP_KEYS=['waistband_right', 'waistband_left', 'hemline_right', 'hemline_left'] 12 | 13 | 14 | DRESS_PART_KEYS= ['neckline_left', 'neckline_right', 'shoulder_left', 'shoulder_right', 'center_front', 15 | 'armpit_left', 'armpit_right', 'waistline_left', 'waistline_right', 'cuff_left_in', 16 | 'cuff_left_out', 'cuff_right_in', 'cuff_right_out', 'hemline_left', 'hemline_right'] 17 | DRESS_PART_FLIP_KEYS=['neckline_right', 'neckline_left', 'shoulder_right', 'shoulder_left', 'center_front', 18 | 'armpit_right', 'armpit_left', 'waistline_right', 'waistline_left', 'cuff_right_in', 19 | 'cuff_right_out', 'cuff_left_in', 'cuff_left_out', 'hemline_right', 'hemline_left'] 20 | 21 | BLOUSE_PART_KEYS=['neckline_left', 'neckline_right', 'shoulder_left', 'shoulder_right', 22 | 'center_front', 'armpit_left', 'armpit_right', 'top_hem_left', 'top_hem_right', 23 | 'cuff_left_in', 'cuff_left_out', 'cuff_right_in', 'cuff_right_out'] 24 | 25 | BLOUSE_PART_FLIP_KEYS=['neckline_right', 'neckline_left', 'shoulder_right', 'shoulder_left', 26 | 'center_front', 
'armpit_right', 'armpit_left', 'top_hem_right', 'top_hem_left', 27 | 'cuff_right_in', 'cuff_right_out', 'cuff_left_in', 'cuff_left_out'] 28 | 29 | OUTWEAR_PART_KEYS=['neckline_left', 'neckline_right', 'shoulder_left', 'shoulder_right', 30 | 'armpit_left', 'armpit_right', 'waistline_left', 'waistline_right', 'cuff_left_in', 31 | 'cuff_left_out', 'cuff_right_in', 'cuff_right_out', 'top_hem_left', 'top_hem_right'] 32 | 33 | OUTWEAR_PART_FLIP_KEYS = ['neckline_right', 'neckline_left', 'shoulder_right', 'shoulder_left', 34 | 'armpit_right', 'armpit_left', 'waistline_right', 'waistline_left', 'cuff_right_in', 35 | 'cuff_right_out', 'cuff_left_in', 'cuff_left_out', 'top_hem_right', 'top_hem_left'] 36 | 37 | ALL_PART_KEYS = ['neckline_left', 'neckline_right', 'center_front', 'shoulder_left', 'shoulder_right', 38 | 'armpit_left', 'armpit_right', 'waistline_left', 'waistline_right', 'cuff_left_in', 'cuff_left_out', 39 | 'cuff_right_in', 'cuff_right_out', 'top_hem_left', 'top_hem_right', 'waistband_left', 'waistband_right', 40 | 'hemline_left', 'hemline_right', 'crotch', 'bottom_left_in', 'bottom_left_out', 41 | 'bottom_right_in', 'bottom_right_out'] 42 | 43 | ALL_PART_FLIP_KEYS = [ 'neckline_right', 'neckline_left', 'center_front', 'shoulder_right', 'shoulder_left', 44 | 'armpit_right', 'armpit_left', 'waistline_right', 'waistline_left', 'cuff_right_in', 'cuff_right_out', 45 | 'cuff_left_in', 'cuff_left_out', 'top_hem_right', 'top_hem_left', 'waistband_right','waistband_left', 46 | 'hemline_right', 'hemline_left', 'crotch', 'bottom_right_in', 'bottom_right_out', 47 | 'bottom_left_in', 'bottom_left_out'] 48 | 49 | def getFlipKeys(category): 50 | if category == 'skirt': 51 | keys, mapkeys = SKIRT_PART_KEYS, SKIRT_PART_FLIP_KEYS 52 | elif category == 'dress': 53 | keys, mapkeys = DRESS_PART_KEYS, DRESS_PART_FLIP_KEYS 54 | elif category == 'trousers': 55 | keys, mapkeys = TROUSERS_PART_KYES, TROUSERS_PART_FLIP_KYES 56 | elif category == 'blouse': 57 | keys, mapkeys = BLOUSE_PART_KEYS, BLOUSE_PART_FLIP_KEYS 58 | elif category == 'outwear': 59 | keys, mapkeys = OUTWEAR_PART_KEYS, OUTWEAR_PART_FLIP_KEYS 60 | elif category == 'all': 61 | keys, mapkeys = ALL_PART_KEYS, ALL_PART_FLIP_KEYS 62 | else: 63 | assert (0), category + " not supported" 64 | 65 | xdict = dict() 66 | for i in range(len(keys)): 67 | xdict[keys[i]] = mapkeys[i] 68 | return keys, xdict 69 | 70 | def getFlipMapID(category, partid): 71 | keys, mapDict = getFlipKeys(category) 72 | mapKey = mapDict[keys[partid]] 73 | mapID = keys.index(mapKey) 74 | return mapID 75 | 76 | def getKpKeys(category): 77 | ''' 78 | 79 | :param category: 80 | :return: get the keypoint keys in annotation csv 81 | ''' 82 | SKIRT_KP_KEYS = ['image_id', 'waistband_left', 'waistband_right', 'hemline_left', 'hemline_right'] 83 | DRESS_KP_KEYS = ['image_id', 'neckline_left', 'neckline_right', 'shoulder_left', 'shoulder_right', 'center_front', 84 | 'armpit_left', 'armpit_right' , 'waistline_left' , 'waistline_right', 'cuff_left_in', 85 | 'cuff_left_out', 'cuff_right_in', 'cuff_right_out', 'hemline_left', 'hemline_right'] 86 | TROUSERS_KP_KEYS=['image_id', 'waistband_left', 'waistband_right', 'crotch', 'bottom_left_in', 87 | 'bottom_left_out', 'bottom_right_in', 'bottom_right_out'] 88 | BLOUSE_KP_KEYS = [ 'image_id', 'neckline_left', 'neckline_right', 'shoulder_left', 'shoulder_right', 89 | 'center_front', 'armpit_left', 'armpit_right', 'top_hem_left', 'top_hem_right', 90 | 'cuff_left_in', 'cuff_left_out', 'cuff_right_in', 'cuff_right_out'] 91 | OUTWEAR_KP_KEYS= ['image_id', 
'neckline_left', 'neckline_right', 'shoulder_left', 'shoulder_right', 92 | 'armpit_left', 'armpit_right', 'waistline_left', 'waistline_right', 'cuff_left_in', 93 | 'cuff_left_out', 'cuff_right_in', 'cuff_right_out', 'top_hem_left', 'top_hem_right'] 94 | 95 | ALL_KP_KESY = ['image_id','neckline_left', 'neckline_right', 'center_front', 'shoulder_left', 'shoulder_right', 96 | 'armpit_left', 'armpit_right', 'waistline_left', 'waistline_right', 'cuff_left_in', 'cuff_left_out', 'cuff_right_in', 97 | 'cuff_right_out', 'top_hem_left', 'top_hem_right', 'waistband_left', 'waistband_right', 'hemline_left', 'hemline_right' , 98 | 'crotch', 'bottom_left_in' , 'bottom_left_out', 'bottom_right_in' ,'bottom_right_out'] 99 | 100 | if category == 'skirt': 101 | return SKIRT_KP_KEYS 102 | elif category == 'dress': 103 | return DRESS_KP_KEYS 104 | elif category == 'trousers': 105 | return TROUSERS_KP_KEYS 106 | elif category == 'blouse': 107 | return BLOUSE_KP_KEYS 108 | elif category == 'outwear': 109 | return OUTWEAR_KP_KEYS 110 | elif category == 'all': 111 | return ALL_KP_KESY 112 | else: 113 | assert(0), category + ' not supported' 114 | 115 | 116 | def fill_dataframe(kplst, category, dfrow): 117 | keys = getKpKeys(category)[1:] 118 | 119 | # fill category 120 | dfrow['image_category'] = category 121 | 122 | assert (len(keys) == len(kplst)), str(len(kplst)) + ' must be the same as ' + str(len(keys)) 123 | for i, _key in enumerate(keys): 124 | kpann = kplst[i] 125 | outstr = str(int(kpann.x))+"_"+str(int(kpann.y))+"_"+str(1) 126 | dfrow[_key] = outstr 127 | 128 | 129 | def get_kp_index_from_allkeys(kpname): 130 | ALL_KP_KEYS = ['neckline_left', 'neckline_right', 'center_front', 'shoulder_left', 'shoulder_right', 131 | 'armpit_left', 'armpit_right', 'waistline_left', 'waistline_right', 'cuff_left_in', 'cuff_left_out', 132 | 'cuff_right_in', 'cuff_right_out', 'top_hem_left', 'top_hem_right', 'waistband_left', 'waistband_right', 133 | 'hemline_left', 'hemline_right', 'crotch', 'bottom_left_in', 'bottom_left_out', 'bottom_right_in', 'bottom_right_out'] 134 | 135 | return ALL_KP_KEYS.index(kpname) 136 | 137 | 138 | def generate_input_mask(image_category, shape, nobgFlag=True): 139 | import numpy as np 140 | # 0.0 for invalid key points for each category 141 | # 1.0 for valid key points for each category 142 | h, w, c = shape 143 | mask = np.zeros((h // 2, w // 2, c), dtype=np.float) 144 | 145 | for key in getKpKeys(image_category)[1:]: 146 | index = get_kp_index_from_allkeys(key) 147 | mask[:, :, index] = 1.0 148 | 149 | # for last channel, background 150 | if nobgFlag: mask[:, :, -1] = 0.0 151 | else: mask[:, :, -1] = 1.0 152 | 153 | return mask -------------------------------------------------------------------------------- /src/data_gen/kpAnno.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | 3 | 4 | class KpAnno(object): 5 | ''' 6 | Convert string to x, y, visibility 7 | ''' 8 | def __init__(self, x, y, visibility): 9 | self.x = int(x) 10 | self.y = int(y) 11 | self.visibility = visibility 12 | 13 | @classmethod 14 | def readFromStr(cls, xstr): 15 | xarray = xstr.split('_') 16 | x = int(xarray[0]) 17 | y = int(xarray[1]) 18 | visibility = int(xarray[2]) 19 | return cls(x,y, visibility) 20 | 21 | @classmethod 22 | def applyScale(cls, kpAnno, scale): 23 | x = int(kpAnno.x*scale) 24 | y = int(kpAnno.y*scale) 25 | v = kpAnno.visibility 26 | return cls(x, y, v) 27 | 28 | @classmethod 29 | def applyRotate(cls, kpAnno, rotateMatrix): 30 | vector = 
[kpAnno.x, kpAnno.y, 1] 31 | rotatedV = np.dot(rotateMatrix, vector) 32 | return cls( int(rotatedV[0]), int(rotatedV[1]), kpAnno.visibility) 33 | 34 | @classmethod 35 | def applyOffset(cls, kpAnno, offset): 36 | x = kpAnno.x - offset[0] 37 | y = kpAnno.y - offset[1] 38 | v = kpAnno.visibility 39 | return cls(x, y, v) 40 | 41 | @staticmethod 42 | def calcDistance(kpA, kpB): 43 | distance = (kpA.x - kpB.x)**2 + (kpA.y - kpB.y)**2 44 | return np.sqrt(distance) 45 | -------------------------------------------------------------------------------- /src/data_gen/ohem.py: -------------------------------------------------------------------------------- 1 | 2 | import sys 3 | sys.path.insert(0, "../unet/") 4 | 5 | from keras.models import * 6 | from keras.layers import * 7 | from utils import np_euclidean_l2 8 | from dataset import getKpNum 9 | 10 | def generate_topk_mask_ohem(input_data, gthmap, keras_model, graph, topK, image_category, dynamicFlag=False): 11 | ''' 12 | :param input_data: input 13 | :param gthmap: ground truth 14 | :param keras_model: keras model 15 | :param graph: tf grpah to WA thread issue 16 | :param topK: number of kp selected 17 | :return: 18 | ''' 19 | 20 | # do inference, and calculate loss of each channel 21 | mimg, mmask = input_data 22 | ximg = mimg[np.newaxis,:,:,:] 23 | xmask = mmask[np.newaxis,:,:,:] 24 | 25 | if len(keras_model.input_layers) == 3: 26 | # use original mask as ohem_mask 27 | inputs = [ximg, xmask, xmask] 28 | else: 29 | inputs = [ximg, xmask] 30 | 31 | with graph.as_default(): 32 | keras_output = keras_model.predict(inputs) 33 | 34 | # heatmap of last stage 35 | outhmap = keras_output[-1] 36 | 37 | channel_num = gthmap.shape[-1] 38 | 39 | # calculate loss 40 | mloss = list() 41 | for i in range(channel_num): 42 | _dtmap = outhmap[0, :, :, i] 43 | _gtmap = gthmap[:, :, i] 44 | loss = np_euclidean_l2(_dtmap, _gtmap) 45 | mloss.append(loss) 46 | 47 | # refill input_mask, set topk as 1.0 and fill 0.0 for rest 48 | # fixme: topk may different b/w category 49 | if dynamicFlag: 50 | topK = getKpNum(image_category)//2 51 | 52 | ohem_mask = adjsut_mask(mloss, mmask, topK) 53 | 54 | ohem_gthmap = ohem_mask * gthmap 55 | 56 | return ohem_mask, ohem_gthmap 57 | 58 | def adjsut_mask(loss, input_mask, topk): 59 | # pick topk loss from losses 60 | # fill topk with 1.0 and fill the rest as 0.0 61 | assert (len(loss) == input_mask.shape[-1]), \ 62 | "shape should be same" + str(len(loss)) + " vs " + str(input_mask.shape) 63 | 64 | outmask = np.zeros(input_mask.shape, dtype=np.float) 65 | 66 | topk_index = sorted(range(len(loss)), key=lambda i:loss[i])[-topk:] 67 | 68 | for i in range(len(loss)): 69 | if i in topk_index: 70 | outmask[:,:,i] = 1.0 71 | 72 | return outmask 73 | -------------------------------------------------------------------------------- /src/data_gen/utils.py: -------------------------------------------------------------------------------- 1 | 2 | import numpy as np 3 | import pandas as pd 4 | import os 5 | 6 | def make_gaussian(width, height, sigma=3, center=None): 7 | ''' 8 | generate 2d guassion heatmap 9 | :return: 10 | ''' 11 | 12 | x = np.arange(0, width, 1, float) 13 | y = np.arange(0, height, 1, float)[:, np.newaxis] 14 | 15 | if center is None: 16 | x0 = width // 2 17 | y0 = height // 2 18 | else: 19 | x0 = center[0] 20 | y0 = center[1] 21 | 22 | return np.exp( -4*np.log(2)*((x-x0)**2 + (y-y0)**2)/sigma**2) 23 | 24 | 25 | def split_csv_train_val(allcsv, traincsv, valcsv, ratio=0.8): 26 | xdf = pd.read_csv(allcsv) 27 | # random shuffle 28 | 
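    # (pandas: sample(frac=1) returns all rows in random order, i.e. a full shuffle)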
    xdf = xdf.sample(frac=1)
29 | 
30 |     # random sampling
31 |     msk = np.random.rand(len(xdf)) < ratio
32 |     trainDf = xdf[msk]
33 |     valDf = xdf[~msk]
34 |     print "total", len(xdf), "split into train ", len(trainDf), ' val', len(valDf)
35 | 
36 |     # save to file
37 |     trainDf.to_csv(traincsv, index=False)
38 |     valDf.to_csv(valcsv, index=False)
39 | 
40 | 
41 | def np_euclidean_l2(x, y):
42 |     assert (x.shape == y.shape), "shape mismatched " + str(x.shape) + " : " + str(y.shape)
43 |     loss = np.sum((x - y)**2)
44 |     loss = np.sqrt(loss)
45 |     return loss
46 | 
47 | 
48 | def load_annotation_from_df(df, category):
49 |     if category == 'all':
50 |         return df
51 |     else:
52 |         return df[df['image_category'] == category]
53 | 
54 | 
--------------------------------------------------------------------------------
/src/eval/eval_callback.py:
--------------------------------------------------------------------------------
 1 | 
 2 | import keras
 3 | import os
 4 | import datetime
 5 | from evaluation import Evaluation
 6 | from time import time
 7 | class NormalizedErrorCallBack(keras.callbacks.Callback):
 8 | 
 9 |     def __init__(self, foldpath, category, multiOut=False, resumeFolder=None):
10 |         self.parentFoldPath = foldpath
11 |         self.category = category
12 | 
13 |         if resumeFolder is None:
14 |             self.foldPath = os.path.join(self.parentFoldPath, self.category, datetime.datetime.now().strftime('%Y_%m_%d_%H_%M_%S'))
15 |             if not os.path.exists(self.foldPath):
16 |                 os.makedirs(self.foldPath)
17 |         else:
18 |             self.foldPath = resumeFolder
19 | 
20 |         self.valLog = os.path.join(self.foldPath, 'val.log')
21 |         self.multiOut = multiOut
22 | 
23 |     def get_folder_path(self):
24 |         return self.foldPath
25 | 
26 |     def on_epoch_end(self, epoch, logs=None):
27 |         modelName = os.path.join(self.foldPath, self.category+"_weights_"+str(epoch)+".hdf5")
28 |         keras.models.save_model(self.model, modelName)
29 |         print "Saving model to ", modelName
30 | 
31 |         print "Running evaluation ........."
32 | 
33 |         xEval = Evaluation(self.category, None)
34 |         xEval.init_from_model(self.model)
35 | 
36 |         start = time()
37 |         neScore, categoryDict = xEval.eval(self.multiOut, details=True)
38 |         end = time()
39 |         print "Evaluation Done", str(neScore), " cost ", end - start, " seconds!"
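        # categoryDict maps each image_category to its list of per-keypoint
        # normalized errors; the mean of each list is the per-category score
        # printed and logged below.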
40 | 41 | for key in categoryDict.keys(): 42 | scores = categoryDict[key] 43 | print key, ' score ', sum(scores)/len(scores) 44 | 45 | with open(self.valLog , 'a+') as xfile: 46 | xfile.write(modelName + ", Socre "+ str(neScore)+"\n") 47 | for key in categoryDict.keys(): 48 | scores = categoryDict[key] 49 | xfile.write(key + ": " + str(sum(scores)/len(scores)) + "\n") 50 | 51 | xfile.close() -------------------------------------------------------------------------------- /src/eval/evaluation.py: -------------------------------------------------------------------------------- 1 | 2 | import sys 3 | sys.path.insert(0, "../data_gen/") 4 | sys.path.insert(0, "../unet/") 5 | 6 | import pandas as pd 7 | from dataset import getKpKeys, getKpNum, getFlipMapID, get_kp_index_from_allkeys, generate_input_mask 8 | from kpAnno import KpAnno 9 | from post_process import post_process_heatmap 10 | from keras.models import load_model 11 | import os 12 | from refinenet_mask_v3 import euclidean_loss 13 | import numpy as np 14 | import cv2 15 | from resnet101 import Scale 16 | from utils import load_annotation_from_df 17 | from collections import defaultdict 18 | import copy 19 | from data_process import pad_image_inference 20 | 21 | class Evaluation(object): 22 | def __init__(self, category, modelFile): 23 | self.category = category 24 | self.train_img_path = "../../data/train" 25 | if modelFile is not None: 26 | self._initialize(modelFile) 27 | 28 | def init_from_model(self, model): 29 | self._load_anno() 30 | self.net = model 31 | 32 | def eval(self, multiOut=False, details=False, flip=True): 33 | xdf = self.annDataFrame 34 | scores = list() 35 | xdict = dict() 36 | xcategoryDict = defaultdict(list) 37 | for _index, _row in xdf.iterrows(): 38 | imgId = _row['image_id'] 39 | category = _row['image_category'] 40 | imgFile = os.path.join(self.train_img_path, imgId) 41 | gtKpAnno = self._get_groundtruth_kpAnno(_row) 42 | if flip: 43 | predKpAnno = self.predict_kp_with_flip(imgFile, category) 44 | else: 45 | predKpAnno = self.predict_kp(imgFile, category, multiOut) 46 | neScore = Evaluation.calc_ne_score(category, predKpAnno, gtKpAnno) 47 | scores.extend(neScore) 48 | if details: 49 | xcategoryDict[category].extend(neScore) 50 | if details: 51 | return sum(scores)/len(scores), xcategoryDict 52 | else: 53 | return sum(scores)/len(scores) 54 | 55 | def _initialize(self, modelFile): 56 | self._load_anno() 57 | self._initialize_network(modelFile) 58 | 59 | def _initialize_network(self, modelFile): 60 | self.net = load_model(modelFile, custom_objects={'euclidean_loss': euclidean_loss, 'Scale': Scale}) 61 | 62 | def _load_anno(self): 63 | ''' 64 | Load annotations from train.csv 65 | ''' 66 | self.annfile = os.path.join("../../data/train/Annotations", "val_split.csv") 67 | 68 | # read into dataframe 69 | xpd = pd.read_csv(self.annfile) 70 | xpd = load_annotation_from_df(xpd, self.category) 71 | self.annDataFrame = xpd 72 | 73 | 74 | def _get_groundtruth_kpAnno(self, dfrow): 75 | mlist = dfrow[getKpKeys(self.category)] 76 | imgName, kpStr = mlist[0], mlist[1:] 77 | # read kp annotation from csv file 78 | kpAnnlst = [KpAnno.readFromStr(_kpstr) for _kpstr in kpStr] 79 | return kpAnnlst 80 | 81 | def _net_inference_with_mask(self, imgFile, imgCategory): 82 | import cv2 83 | from data_process import normalize_image, pad_image_inference 84 | assert (len(self.net.input_layers) > 1), "input layer need to more than 1" 85 | 86 | # load image and preprocess 87 | img = cv2.imread(imgFile) 88 | 89 | img, scale = 
pad_image_inference(img, 512, 512) 90 | img = normalize_image(img) 91 | input_img = img[np.newaxis, :, :, :] 92 | 93 | input_mask = generate_input_mask(imgCategory, (512, 512, getKpNum(self.category)) ) 94 | input_mask = input_mask[np.newaxis, :, :, :] 95 | 96 | # inference 97 | heatmap = self.net.predict([input_img, input_mask, input_mask]) 98 | 99 | return (heatmap, scale) 100 | 101 | def _heatmap_sum(self, heatmaplst): 102 | outheatmap = np.copy(heatmaplst[0]) 103 | for i in range(1, len(heatmaplst), 1): 104 | outheatmap += heatmaplst[i] 105 | return outheatmap 106 | 107 | def predict_kp(self, imgFile, imgCategory, multiOutput=False): 108 | 109 | xnetout, scale = self._net_inference_with_mask(imgFile, imgCategory) 110 | 111 | if multiOutput: 112 | #todo: fixme, it is tricky that the previous stage has beeter performance than last stage's output. 113 | #todo: here, we are using multiple stage's output sum. 114 | heatmap = self._heatmap_sum(xnetout) 115 | else: 116 | heatmap = xnetout 117 | 118 | detectedKps = post_process_heatmap(heatmap, kpConfidenceTh=0.2) 119 | 120 | # scale to padded resolution 256X256 -> 512X512 121 | scaleTo512 = 2.0 122 | 123 | # apply scale to original resolution 124 | detectedKps = [KpAnno(_kp.x*scaleTo512/scale , _kp.y*scaleTo512/scale, _kp.visibility) for _kp in detectedKps] 125 | 126 | return detectedKps 127 | 128 | 129 | def predict_kp_with_flip(self, imgFile, imgCategory): 130 | # inference with flip and original image 131 | heatmap, scale = self._net_inference_flip(imgFile, imgCategory) 132 | 133 | detectedKps = post_process_heatmap(heatmap, kpConfidenceTh=0.2) 134 | 135 | # scale to padded resolution 256X256 -> 512X512 136 | scaleTo512 = 2.0 137 | 138 | # apply scale to original resolution 139 | detectedKps = [KpAnno(_kp.x * scaleTo512 / scale, _kp.y * scaleTo512 / scale, _kp.visibility) for _kp in 140 | detectedKps] 141 | 142 | return detectedKps 143 | 144 | def _net_inference_flip(self, imgFile, imgCategory): 145 | import cv2 146 | from data_process import normalize_image, pad_image_inference 147 | assert (len(self.net.input_layers) > 1), "input layer need to more than 1" 148 | 149 | batch_size =2 150 | 151 | input_img = np.zeros(shape=(batch_size, 512, 512, 3), dtype=np.float) 152 | input_mask = np.zeros(shape=(batch_size, 256, 256, getKpNum(self.category)), dtype=np.float) 153 | 154 | # load image and preprocess 155 | orgimage = cv2.imread(imgFile) 156 | 157 | padimg, scale = pad_image_inference(orgimage, 512, 512) 158 | flipimg = cv2.flip(padimg, flipCode=1) 159 | 160 | input_img[0,:,:,:] = normalize_image(padimg) 161 | input_img[1,:,:,:] = normalize_image(flipimg) 162 | 163 | mask = generate_input_mask(imgCategory, (512, 512, getKpNum(self.category))) 164 | input_mask[0,:,:,:] = mask 165 | input_mask[1,:,:,:] = mask 166 | 167 | # inference 168 | if len(self.net.input_layers) == 2: 169 | heatmap = self.net.predict([input_img, input_mask]) 170 | elif len(self.net.input_layers) == 3: 171 | heatmap = self.net.predict([input_img, input_mask, input_mask]) 172 | else: 173 | assert (0), str(len(self.net.input_layers)) + " should be 2 or 3 " 174 | 175 | # sum heatmap 176 | avgheatmap = self._heatmap_sum(heatmap) 177 | 178 | orgheatmap = avgheatmap[0,:,:,:] 179 | 180 | # convert to same sequency with original heatmap 181 | flipheatmap = avgheatmap[1,:,:,:] 182 | flipheatmap = self._flip_out_heatmap(flipheatmap) 183 | 184 | # average original and flip heatmap 185 | outheatmap = flipheatmap + orgheatmap 186 | outheatmap = outheatmap[np.newaxis, :, :, :] 187 | 
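        # Note: the original and flipped heatmaps are summed rather than averaged;
        # peak locations stay meaningful, but the confidence values are inflated
        # relative to the 0.2 threshold applied later in post_process_heatmap.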
188 | return (outheatmap, scale) 189 | 190 | def predict_kp_with_rotate(self, imgFile, imgCategory): 191 | # inference with rotated image 192 | rotateheatmap = self._net_inference_rotate(imgFile, imgCategory) 193 | rotateheatmap = rotateheatmap[np.newaxis, :, :, :] 194 | 195 | # original image and flip image 196 | orgflipmap, scale = self._net_inference_flip(imgFile, imgCategory) 197 | mflipmap = cv2.resize(orgflipmap[0,:,:,:], None, fx=2.0/scale, fy=2.0/scale) 198 | 199 | # add mflipmap and rotateheatmap 200 | avgheatmap = mflipmap[np.newaxis, :, :, :] 201 | 202 | b, h, w , c = rotateheatmap.shape 203 | avgheatmap[:, 0:h, 0:w,:] += rotateheatmap 204 | 205 | # generate key point locations 206 | detectedKps = post_process_heatmap(avgheatmap, kpConfidenceTh=0.2) 207 | 208 | return detectedKps 209 | 210 | def _net_inference_rotate(self, imgFile, imgCategory): 211 | from data_process import normalize_image, pad_image_inference, rotate_image_with_invrmat 212 | 213 | # load image and preprocess 214 | orgimage = cv2.imread(imgFile) 215 | 216 | anglelst = [-20, -10, 10, 20] 217 | 218 | input_img = np.zeros(shape=(len(anglelst), 512, 512, 3), dtype=np.float) 219 | input_mask = np.zeros(shape=(len(anglelst), 256, 256, getKpNum(self.category)), dtype=np.float) 220 | 221 | mlist = list() 222 | for i, angle in enumerate(anglelst): 223 | rotateimg, invRotMatrix, orgImgSize = rotate_image_with_invrmat(orgimage, angle) 224 | padimg, scale = pad_image_inference(rotateimg, 512, 512) 225 | _img = normalize_image(padimg) 226 | input_img[i, :, :, :] = _img 227 | mlist.append((scale, invRotMatrix)) 228 | 229 | mask = generate_input_mask(imgCategory, (512, 512, getKpNum(self.category))) 230 | for i, angle in enumerate(anglelst): 231 | input_mask[i, :,:,:] = mask 232 | 233 | # inference 234 | heatmap = self.net.predict([input_img, input_mask, input_mask]) 235 | heatmap = self._heatmap_sum(heatmap) 236 | 237 | # rotate back to original resolution 238 | sumheatmap = np.zeros(shape=(orgimage.shape[0], orgimage.shape[1], getKpNum(self.category)), dtype=np.float) 239 | for i, item in enumerate(mlist): 240 | _heatmap = heatmap[i, :, :, :] 241 | _scale, _invRotMatrix = item 242 | _heatmap = cv2.resize(_heatmap, None, fx=2.0 / _scale, fy=2.0 / _scale) 243 | _invheatmap = cv2.warpAffine(_heatmap, _invRotMatrix, (orgimage.shape[1], orgimage.shape[0])) 244 | sumheatmap += _invheatmap 245 | 246 | return sumheatmap 247 | 248 | def _flip_out_heatmap(self, flipout): 249 | outmap = np.zeros(flipout.shape, dtype=np.float) 250 | for i in range(flipout.shape[-1]): 251 | flipid = getFlipMapID(self.category, i) 252 | mask = np.copy(flipout[:, :, i]) 253 | outmap[:, :, flipid] = cv2.flip(mask, flipCode=1) 254 | return outmap 255 | 256 | 257 | @staticmethod 258 | def get_normized_distance(category, gtKp): 259 | ''' 260 | 261 | :param category: 262 | :param gtKp: 263 | :return: if ground truth's two points do not exist, return a big number 1e6 264 | ''' 265 | 266 | if category in ['skirt' ,'trousers']: 267 | ##waistband left and right 268 | waistband_left_index = get_kp_index_from_allkeys('waistband_left') 269 | waistband_right_index = get_kp_index_from_allkeys('waistband_right') 270 | 271 | if gtKp[waistband_left_index].visibility != -1 and gtKp[waistband_right_index].visibility != -1: 272 | distance = KpAnno.calcDistance(gtKp[waistband_left_index], gtKp[waistband_right_index]) 273 | else: 274 | distance = 1e6 275 | return distance 276 | elif category in ['blouse', 'dress', 'outwear']: 277 | armpit_left_index = 
get_kp_index_from_allkeys('armpit_left') 278 | armpit_right_index = get_kp_index_from_allkeys('armpit_right') 279 | ##armpit_left armpit_right' 280 | if gtKp[armpit_left_index].visibility != -1 and gtKp[armpit_right_index].visibility != -1: 281 | distance = KpAnno.calcDistance(gtKp[armpit_left_index], gtKp[armpit_right_index]) 282 | else: 283 | distance = 1e6 284 | return distance 285 | else: 286 | assert (0), category + " not implemented in _get_normized_distance" 287 | 288 | 289 | @staticmethod 290 | def calc_ne_score(category, dtKp, gtKp): 291 | 292 | assert (len(dtKp) == len(gtKp)), "predicted keypoint number should be the same as ground truth keypoints" + \ 293 | str(dtKp) + " vs " + str(gtKp) 294 | 295 | # calculate normalized error as score 296 | normalizedDistance = Evaluation.get_normized_distance(category, gtKp) 297 | 298 | mlist = list() 299 | for i in range(len(gtKp)): 300 | if gtKp[i].visibility == 1: 301 | dk = KpAnno.calcDistance(dtKp[i], gtKp[i]) 302 | mlist.append( dk/normalizedDistance) 303 | 304 | return mlist 305 | -------------------------------------------------------------------------------- /src/eval/post_process.py: -------------------------------------------------------------------------------- 1 | import cv2 2 | import numpy as np 3 | from scipy.ndimage import gaussian_filter, maximum_filter 4 | from keras.layers import * 5 | from kpAnno import KpAnno 6 | 7 | def post_process_heatmap(heatMap, kpConfidenceTh=0.2): 8 | kplst = list() 9 | for i in range(heatMap.shape[-1]): 10 | # ignore last channel, background channel 11 | _map = heatMap[0, :, :, i] 12 | _map = gaussian_filter(_map, sigma=0.5) 13 | _nmsPeaks = non_max_supression(_map, windowSize=3, threshold=1e-6) 14 | 15 | y, x = np.where(_nmsPeaks == _nmsPeaks.max()) 16 | confidence = np.amax(_nmsPeaks) 17 | if confidence > kpConfidenceTh: 18 | kplst.append(KpAnno(x[0], y[0], 1)) 19 | else: 20 | kplst.append(KpAnno(x[0], y[0], -1)) 21 | return kplst 22 | 23 | def non_max_supression(plain, windowSize=3, threshold=1e-6): 24 | # clear value less than threshold 25 | under_th_indices = plain < threshold 26 | plain[under_th_indices] = 0 27 | return plain* (plain == maximum_filter(plain, footprint=np.ones((windowSize, windowSize)))) 28 | -------------------------------------------------------------------------------- /src/top/demo.py: -------------------------------------------------------------------------------- 1 | import sys 2 | sys.path.insert(0, "../data_gen/") 3 | sys.path.insert(0, "../eval/") 4 | sys.path.insert(0, "../unet/") 5 | 6 | import argparse 7 | import os 8 | import pandas as pd 9 | import cv2 10 | from evaluation import Evaluation 11 | from dataset import getKpKeys, get_kp_index_from_allkeys 12 | 13 | def visualize_keypoint(imageName, category, dtkp): 14 | cvmat = cv2.imread(imageName) 15 | for key in getKpKeys(category)[1:]: 16 | index = get_kp_index_from_allkeys(key) 17 | _kp = dtkp[index] 18 | cv2.circle(cvmat, center=(_kp.x, _kp.y), radius=7, color=(1.0, 0.0, 0.0), thickness=2) 19 | cv2.imshow('demo', cvmat) 20 | cv2.waitKey() 21 | 22 | def demo(modelfile): 23 | 24 | # load network 25 | xEval = Evaluation('all', modelfile) 26 | 27 | # load images and run prediction 28 | testfile = os.path.join("../../data/test/", 'test.csv') 29 | xdf = pd.read_csv(testfile) 30 | xdf = xdf.sample(frac=1.0) 31 | 32 | for _index, _row in xdf.iterrows(): 33 | _image_id = _row['image_id'] 34 | _category = _row['image_category'] 35 | imageName = os.path.join("../../data/test", _image_id) 36 | print _image_id, _category 
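        # predict_kp_with_rotate ensembles heatmaps from the original image, its
        # horizontal flip, and four rotated copies (+/-10 and +/-20 degrees; see
        # evaluation.py), so demo inference is slower but typically more accurate
        # than plain predict_kp.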
37 | dtkp = xEval.predict_kp_with_rotate(imageName, _category) 38 | visualize_keypoint(imageName, _category, dtkp) 39 | 40 | 41 | if __name__ == "__main__": 42 | parser = argparse.ArgumentParser() 43 | parser.add_argument("--gpuID", default=0, type=int, help='gpu id') 44 | parser.add_argument("--modelfile", help="file of model") 45 | 46 | args = parser.parse_args() 47 | 48 | print args 49 | 50 | os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID" 51 | os.environ["CUDA_VISIBLE_DEVICES"] = str(args.gpuID) 52 | 53 | demo(args.modelfile) -------------------------------------------------------------------------------- /src/top/test.py: -------------------------------------------------------------------------------- 1 | import sys 2 | sys.path.insert(0, "../data_gen/") 3 | sys.path.insert(0, "../eval/") 4 | sys.path.insert(0, "../unet/") 5 | 6 | import argparse 7 | import os 8 | from fashion_net import FashionNet 9 | from dataset import getKpNum, getKpKeys 10 | import pandas as pd 11 | from evaluation import Evaluation 12 | import pickle 13 | import numpy as np 14 | 15 | 16 | def get_best_single_model(valfile): 17 | ''' 18 | :param valfile: the log file with validation score for each snapshot 19 | :return: model file and score 20 | ''' 21 | 22 | def get_key(item): 23 | return item[1] 24 | 25 | with open(valfile) as xval: 26 | lines = xval.readlines() 27 | 28 | xlist = list() 29 | for linenum, xline in enumerate(lines): 30 | if 'hdf5' in xline and 'Socre' in xline: 31 | modelname = xline.strip().split(',')[0] 32 | overallscore = xline.strip().split(',')[1] 33 | xlist.append((modelname, overallscore)) 34 | 35 | bestmodel = sorted(xlist, key=get_key)[0] 36 | 37 | return bestmodel 38 | 39 | 40 | def fill_dataframe(kplst, keys, dfrow, image_category): 41 | # fill category 42 | 43 | dfrow['image_category'] = image_category 44 | 45 | assert (len(keys) == len(kplst)), str(len(kplst)) + ' must be the same as ' + str(len(keys)) 46 | for i, _key in enumerate(keys): 47 | kpann = kplst[i] 48 | outstr = str(int(kpann.x))+"_"+str(int(kpann.y))+"_"+str(1) 49 | dfrow[_key] = outstr 50 | 51 | def get_kp_from_dict(mdict, image_category, image_id): 52 | if image_category in mdict.keys(): 53 | xdict = mdict[image_category] 54 | else: 55 | xdict = mdict['all'] 56 | return xdict[image_id] 57 | 58 | def submission(pklpath): 59 | xdf = pd.read_csv("../../data/train/Annotations/train.csv") 60 | trainKeys = xdf.keys() 61 | 62 | testdf = pd.read_csv("../../data/test/test.csv") 63 | print len(testdf), " samples in test.csv" 64 | 65 | mdict = dict() 66 | for xfile in os.listdir(pklpath): 67 | if xfile.endswith('.pkl'): 68 | category = xfile.strip().split('.')[0] 69 | pkl = open(os.path.join(pklpath, xfile)) 70 | mdict[category] = pickle.load(pkl) 71 | 72 | print testdf.keys() 73 | print mdict.keys() 74 | 75 | submissionDf = pd.DataFrame(columns=trainKeys, index=np.arange(testdf.shape[0])) 76 | submissionDf = submissionDf.fillna(value='-1_-1_-1') 77 | submissionDf['image_id'] = testdf['image_id'] 78 | submissionDf['image_category'] = testdf['image_category'] 79 | 80 | for _index, _row in submissionDf.iterrows(): 81 | image_id = _row['image_id'] 82 | image_category = _row['image_category'] 83 | kplst = get_kp_from_dict(mdict, image_category, image_id) 84 | fill_dataframe(kplst, getKpKeys('all')[1:], _row, image_category) 85 | 86 | 87 | print len(submissionDf), "save to ", os.path.join(pklpath, 'submission.csv') 88 | submissionDf.to_csv( os.path.join(pklpath, 'submission.csv'), index=False ) 89 | 90 | 91 | def 
load_image_names(annfile, category): 92 | # read into dataframe 93 | xdf = pd.read_csv(annfile) 94 | xdf = xdf[xdf['image_category'] == category] 95 | return xdf 96 | 97 | def main_test(savepath, modelpath, augmentFlag): 98 | 99 | valfile = os.path.join(modelpath, 'val.log') 100 | bestmodels = get_best_single_model(valfile) 101 | 102 | print bestmodels, augmentFlag 103 | 104 | xEval = Evaluation('all', bestmodels[0]) 105 | 106 | # load images and run prediction 107 | testfile = os.path.join("../../data/test/", 'test.csv') 108 | 109 | for category in ['skirt', 'blouse', 'trousers', 'outwear', 'dress']: 110 | xdict = dict() 111 | xdf = load_image_names(testfile, category) 112 | print len(xdf), " images to process ", category 113 | 114 | count = 0 115 | for _index, _row in xdf.iterrows(): 116 | count += 1 117 | if count%1000 == 0: 118 | print count, "images have been processed" 119 | 120 | _image_id = _row['image_id'] 121 | imageName = os.path.join("../../data/test", _image_id) 122 | if augmentFlag: 123 | dtkp = xEval.predict_kp_with_rotate(imageName, _row['image_category']) 124 | else: 125 | dtkp = xEval.predict_kp(imageName, _row['image_category'], multiOutput=True) 126 | xdict[_image_id] = dtkp 127 | 128 | savefile = os.path.join(savepath, category+'.pkl') 129 | with open(savefile, 'wb') as xfile: 130 | pickle.dump(xdict, xfile) 131 | 132 | print "prediction save to ", savefile 133 | 134 | 135 | if __name__ == "__main__": 136 | parser = argparse.ArgumentParser() 137 | parser.add_argument("--gpuID", default=0, type=int, help='gpu id') 138 | parser.add_argument("--modelpath", help="path of trained model") 139 | parser.add_argument("--outpath", help="path to save predicted keypoints") 140 | parser.add_argument("--augment", default=False, type=bool, help="augment or not") 141 | 142 | args = parser.parse_args() 143 | 144 | print args 145 | 146 | os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID" 147 | os.environ["CUDA_VISIBLE_DEVICES"] = str(args.gpuID) 148 | 149 | main_test(args.outpath, args.modelpath, args.augment) 150 | submission(args.outpath) -------------------------------------------------------------------------------- /src/top/train.py: -------------------------------------------------------------------------------- 1 | import sys 2 | sys.path.insert(0, "../data_gen/") 3 | sys.path.insert(0, "../unet/") 4 | 5 | import argparse 6 | import os 7 | from fashion_net import FashionNet 8 | from dataset import getKpNum 9 | import tensorflow as tf 10 | from keras import backend as k 11 | 12 | if __name__ == "__main__": 13 | parser = argparse.ArgumentParser() 14 | parser.add_argument("--gpuID", default=0, type=int, help='gpu id') 15 | parser.add_argument("--category", help="specify cloth category") 16 | parser.add_argument("--network", help="specify network arch'") 17 | parser.add_argument("--batchSize", default=8, type=int, help='batch size for training') 18 | parser.add_argument("--epochs", default=20, type=int, help="number of traning epochs") 19 | parser.add_argument("--resume", default=False, type=bool, help="resume training or not") 20 | parser.add_argument("--lrdecay", default=False, type=bool, help="lr decay or not") 21 | parser.add_argument("--resumeModel", help="start point to retrain") 22 | parser.add_argument("--initEpoch", type=int, help="epoch to resume") 23 | 24 | 25 | args = parser.parse_args() 26 | 27 | os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID" 28 | os.environ["CUDA_VISIBLE_DEVICES"] = str(args.gpuID) 29 | 30 | 31 | # TensorFlow wizardry 32 | config = tf.ConfigProto() 33 | 34 
| # Don't pre-allocate memory; allocate as-needed 35 | config.gpu_options.allow_growth = True 36 | 37 | # Allow the process to use up to the full GPU memory if needed (fraction = 1.0) 38 | config.gpu_options.per_process_gpu_memory_fraction = 1.0 39 | 40 | # Create a session with the above options specified. 41 | k.tensorflow_backend.set_session(tf.Session(config=config)) 42 | 43 | if not args.resume: 44 | xnet = FashionNet(512, 512, getKpNum(args.category)) 45 | xnet.build_model(modelName=args.network, show=True) 46 | xnet.train(args.category, epochs=args.epochs, batchSize=args.batchSize, lrschedule=args.lrdecay) 47 | else: 48 | xnet = FashionNet(512, 512, getKpNum(args.category)) 49 | xnet.resume_train(args.category, args.resumeModel, args.network, args.initEpoch, 50 | epochs=args.epochs, batchSize=args.batchSize) -------------------------------------------------------------------------------- /src/unet/fashion_net.py: -------------------------------------------------------------------------------- 1 | 2 | import sys 3 | sys.path.insert(0, "../data_gen/") 4 | sys.path.insert(0, "../eval/") 5 | 6 | from data_generator import DataGenerator 7 | from keras.callbacks import ModelCheckpoint, CSVLogger 8 | from keras.models import load_model 9 | from data_process import pad_image, normalize_image 10 | import os 11 | import cv2 12 | import numpy as np 13 | import datetime 14 | from eval_callback import NormalizedErrorCallBack 15 | from refinenet_mask_v3 import Res101RefineNetMaskV3, euclidean_loss 16 | from resnet101 import Scale 17 | import tensorflow as tf 18 | 19 | class FashionNet(object): 20 | 21 | def __init__(self, inputHeight, inputWidth, nClasses): 22 | self.inputWidth = inputWidth 23 | self.inputHeight = inputHeight 24 | self.nClass = nClasses 25 | 26 | def build_model(self, modelName='v2', show=False): 27 | self.modelName = modelName # kept for checkpoint/log naming; the architecture itself is fixed below 28 | self.model = Res101RefineNetMaskV3(self.nClass, self.inputHeight, self.inputWidth, nStackNum=2) 29 | self.nStackNum = 2 30 | 31 | # show model summary and layer names 32 | if show: 33 | self.model.summary() 34 | for layer in self.model.layers: 35 | print layer.name, layer.trainable 36 | 37 | def train(self, category, batchSize=8, epochs=20, lrschedule=False): 38 | trainDt = DataGenerator(category, os.path.join("../../data/train/Annotations", "train_split.csv")) 39 | trainGen = trainDt.generator_with_mask_ohem(graph=tf.get_default_graph(), kerasModel=self.model, 40 | batchSize=batchSize, inputSize=(self.inputHeight, self.inputWidth), 41 | nStackNum=self.nStackNum, flipFlag=False, cropFlag=False) 42 | 43 | normalizedErrorCallBack = NormalizedErrorCallBack("../../trained_models/", category, True) 44 | 45 | csvlogger = CSVLogger(os.path.join(normalizedErrorCallBack.get_folder_path(), 46 | "csv_train_"+self.modelName+"_"+str(datetime.datetime.now().strftime('%H:%M'))+".csv")) 47 | 48 | xcallbacks = [normalizedErrorCallBack, csvlogger] 49 | 50 | self.model.fit_generator(generator=trainGen, steps_per_epoch=trainDt.get_dataset_size()//batchSize, 51 | epochs=epochs, callbacks=xcallbacks) 52 | 53 | def load_model(self, netWeightFile): 54 | self.model = load_model(netWeightFile, custom_objects={'euclidean_loss': euclidean_loss, 'Scale': Scale}) 55 | 56 | def resume_train(self, category, pretrainModel, modelName, initEpoch, batchSize=8, epochs=20): 57 | self.modelName = modelName 58 | self.load_model(pretrainModel) 59 | refineNetflag = True 60 | self.nStackNum = 2 61 | 62 | modelPath = os.path.dirname(pretrainModel) 63 | 64 | trainDt = DataGenerator(category, 
os.path.join("../../data/train/Annotations", "train_split.csv")) 65 | trainGen = trainDt.generator_with_mask_ohem(graph=tf.get_default_graph(), kerasModel=self.model, 66 | batchSize=batchSize, inputSize=(self.inputHeight, self.inputWidth), 67 | nStackNum=self.nStackNum, flipFlag=False, cropFlag=False) 68 | 69 | 70 | normalizedErrorCallBack = NormalizedErrorCallBack("../../trained_models/", category, refineNetflag, resumeFolder=modelPath) 71 | 72 | csvlogger = CSVLogger(os.path.join(normalizedErrorCallBack.get_folder_path(), 73 | "csv_train_" + self.modelName + "_" + str( 74 | datetime.datetime.now().strftime('%H:%M')) + ".csv")) 75 | 76 | self.model.fit_generator(initial_epoch=initEpoch, generator=trainGen, steps_per_epoch=trainDt.get_dataset_size() // batchSize, 77 | epochs=epochs, callbacks=[normalizedErrorCallBack, csvlogger]) 78 | 79 | 80 | def predict_image(self, imgfile): 81 | # load image and preprocess 82 | img = cv2.imread(imgfile) 83 | img, _ = pad_image(img, list(), 512, 512) 84 | img = normalize_image(img) 85 | input = img[np.newaxis,:,:,:] 86 | # inference 87 | heatmap = self.model.predict(input) 88 | return heatmap 89 | 90 | 91 | def predict(self, input): 92 | # inference 93 | heatmap = self.model.predict(input) 94 | return heatmap -------------------------------------------------------------------------------- /src/unet/refinenet.py: -------------------------------------------------------------------------------- 1 | from keras.models import * 2 | from keras.layers import * 3 | from keras.optimizers import Adam, SGD 4 | from keras import backend as K 5 | from keras.applications.resnet50 import ResNet50 6 | 7 | IMAGE_ORDERING = 'channels_last' 8 | 9 | def Res101RefineNetDilated(n_classes, inputHeight, inputWidth): 10 | model = build_network_resnet101(inputHeight, inputWidth, n_classes, dilated=True) 11 | return model 12 | 13 | def Res101RefineNetStacked(n_classes, inputHeight, inputWidth, nStackNum): 14 | model = build_network_resnet101_stack(inputHeight, inputWidth, n_classes, nStackNum) 15 | return model 16 | 17 | def euclidean_loss(x, y): 18 | return K.sqrt(K.sum(K.square(x - y))) 19 | 20 | 21 | def create_global_net(lowlevelFeatures, n_classes): 22 | lf2x, lf4x, lf8x, lf16x = lowlevelFeatures 23 | 24 | o = lf16x 25 | 26 | o = (Conv2D(256, (3, 3), activation='relu', padding='same', name='up16x_conv', data_format=IMAGE_ORDERING))(o) 27 | o = (BatchNormalization())(o) 28 | 29 | o = (Conv2DTranspose(256, kernel_size=(3, 3), strides=(2, 2), name='upsample_16x', activation='relu', padding='same', 30 | data_format=IMAGE_ORDERING))(o) 31 | o = (concatenate([o, lf8x], axis=-1)) 32 | o = (Conv2D(128, (3, 3), activation='relu', padding='same', name='up8x_conv', data_format=IMAGE_ORDERING))(o) 33 | o = (BatchNormalization())(o) 34 | fup8x = o 35 | 36 | o = (Conv2DTranspose(128, kernel_size=(3, 3), strides=(2, 2), name='upsample_8x', padding='same', activation='relu', 37 | data_format=IMAGE_ORDERING))(o) 38 | o = (concatenate([o, lf4x], axis=-1)) 39 | o = (Conv2D(64, (3, 3), activation='relu', padding='same', name='up4x_conv', data_format=IMAGE_ORDERING))(o) 40 | o = (BatchNormalization())(o) 41 | fup4x = o 42 | 43 | o = (Conv2DTranspose(64, kernel_size=(3, 3), strides=(2, 2), name='upsample_4x', padding='same', activation='relu', 44 | data_format=IMAGE_ORDERING))(o) 45 | o = (concatenate([o, lf2x], axis=-1)) 46 | o = (Conv2D(64, (3, 3), activation='relu', padding='same', name='up2x_conv', data_format=IMAGE_ORDERING))(o) 47 | o = (BatchNormalization())(o) 48 | fup2x = o 49 | 50 | 
out2x = Conv2D(n_classes, (1, 1), activation='linear', padding='same', name='out2x', data_format=IMAGE_ORDERING)(fup2x) 51 | out4x = Conv2D(n_classes, (1, 1), activation='linear', padding='same', name='out4x', data_format=IMAGE_ORDERING)(fup4x) 52 | out8x = Conv2D(n_classes, (1, 1), activation='linear', padding='same', name='out8x', data_format=IMAGE_ORDERING)(fup8x) 53 | 54 | x4x = UpSampling2D((2, 2), data_format=IMAGE_ORDERING)(out8x) 55 | eadd4x = Add(name='global4x')([x4x, out4x]) 56 | 57 | x2x = UpSampling2D((2, 2), data_format=IMAGE_ORDERING)(eadd4x) 58 | eadd2x = Add(name='global2x')([x2x, out2x]) 59 | 60 | return (fup8x, eadd4x, eadd2x) 61 | 62 | def create_refine_net(inputFeatures, n_classes): 63 | f8x, f4x, f2x = inputFeatures 64 | 65 | # 2 Conv2DTranspose f8x -> fup8x 66 | fup8x = (Conv2DTranspose(128, kernel_size=(3, 3), strides=(2, 2), name='refine8x_deconv_1', padding='same', activation='relu', 67 | data_format=IMAGE_ORDERING))(f8x) 68 | fup8x = (BatchNormalization())(fup8x) 69 | 70 | fup8x = (Conv2DTranspose(128, kernel_size=(3, 3), strides=(2, 2), name='refine8x_deconv_2', padding='same', activation='relu', 71 | data_format=IMAGE_ORDERING))(fup8x) 72 | fup8x = (BatchNormalization())(fup8x) 73 | 74 | # 1 Conv2DTranspose f4x -> fup4x 75 | fup4x = (Conv2DTranspose(128, kernel_size=(3, 3), strides=(2, 2), name='refine4x_deconv', padding='same', activation='relu', 76 | data_format=IMAGE_ORDERING))(f4x) 77 | 78 | fup4x = (BatchNormalization())(fup4x) 79 | 80 | # 1 conv f2x -> fup2x 81 | fup2x = (Conv2D(128, (3, 3), activation='relu', padding='same', name='refine2x_conv', data_format=IMAGE_ORDERING))(f2x) 82 | fup2x = (BatchNormalization())(fup2x) 83 | 84 | # concat f2x, fup8x, fup4x 85 | fconcat = (concatenate([fup8x, fup4x, fup2x], axis=-1, name='refine_concat')) 86 | 87 | # 1x1 to map to required feature map 88 | out2x = Conv2D(n_classes, (1, 1), activation='linear', padding='same', name='refine2x', data_format=IMAGE_ORDERING)(fconcat) 89 | 90 | return out2x 91 | 92 | 93 | def create_refine_net_bottleneck(inputFeatures, n_classes): 94 | f8x, f4x, f2x = inputFeatures 95 | 96 | # 2 Conv2DTranspose f8x -> fup8x 97 | fup8x = (Conv2D(256, kernel_size=(1, 1), name='refine8x_1', padding='same', activation='relu', data_format=IMAGE_ORDERING))(f8x) 98 | fup8x = (BatchNormalization())(fup8x) 99 | 100 | fup8x = (Conv2D(128, kernel_size=(1, 1), name='refine8x_2', padding='same', activation='relu', data_format=IMAGE_ORDERING))(fup8x) 101 | fup8x = (BatchNormalization())(fup8x) 102 | 103 | fup8x = UpSampling2D((4, 4), data_format=IMAGE_ORDERING)(fup8x) 104 | 105 | 106 | # 1 Conv2DTranspose f4x -> fup4x 107 | fup4x = (Conv2D(128, kernel_size=(1, 1), name='refine4x', padding='same', activation='relu', data_format=IMAGE_ORDERING))(f4x) 108 | fup4x = (BatchNormalization())(fup4x) 109 | fup4x = UpSampling2D((2, 2), data_format=IMAGE_ORDERING)(fup4x) 110 | 111 | 112 | # 1 conv f2x -> fup2x 113 | fup2x = (Conv2D(128, (1, 1), activation='relu', padding='same', name='refine2x_conv', data_format=IMAGE_ORDERING))(f2x) 114 | fup2x = (BatchNormalization())(fup2x) 115 | 116 | # concat f2x, fup8x, fup4x 117 | fconcat = (concatenate([fup8x, fup4x, fup2x], axis=-1, name='refine_concat')) 118 | 119 | # 1x1 to map to required feature map 120 | out2x = Conv2D(n_classes, (1, 1), activation='linear', padding='same', name='refine2x', data_format=IMAGE_ORDERING)(fconcat) 121 | 122 | return out2x 123 | 124 | 125 | def create_stack_refinenet(inputFeatures, n_classes, layerName): 126 | f8x, f4x, f2x = inputFeatures 
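# Design note: each stacked refine module squeezes the incoming 8x, 4x and 2x feature maps with 1x1 convolutions, upsamples the two coarser branches back to 2x resolution (by x4 and x2 respectively), concatenates all three, and maps the result to n_classes heatmaps; the pre-upsampling tensors out8x and out4x are returned alongside out2x so the caller can feed them into the next module, which is how build_network_resnet101_stack chains nStack of them.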
127 | 128 | # 2 Conv2DTranspose f8x -> fup8x 129 | fup8x = (Conv2D(256, kernel_size=(1, 1), name=layerName+'_refine8x_1', padding='same', activation='relu'))(f8x) 130 | fup8x = (BatchNormalization())(fup8x) 131 | 132 | fup8x = (Conv2D(128, kernel_size=(1, 1), name=layerName+'refine8x_2', padding='same', activation='relu'))(fup8x) 133 | fup8x = (BatchNormalization())(fup8x) 134 | 135 | out8x = fup8x 136 | fup8x = UpSampling2D((4, 4), data_format=IMAGE_ORDERING)(fup8x) 137 | 138 | # 1 Conv2DTranspose f4x -> fup4x 139 | fup4x = (Conv2D(128, kernel_size=(1, 1), name=layerName+'refine4x', padding='same', activation='relu'))(f4x) 140 | fup4x = (BatchNormalization())(fup4x) 141 | out4x = fup4x 142 | fup4x = UpSampling2D((2, 2), data_format=IMAGE_ORDERING)(fup4x) 143 | 144 | # 1 conv f2x -> fup2x 145 | fup2x = (Conv2D(128, (1, 1), activation='relu', padding='same', name=layerName+'refine2x_conv'))(f2x) 146 | fup2x = (BatchNormalization())(fup2x) 147 | 148 | # concat f2x, fup8x, fup4x 149 | fconcat = (concatenate([fup8x, fup4x, fup2x], axis=-1, name=layerName+'refine_concat')) 150 | 151 | # 1x1 to map to required feature map 152 | out2x = Conv2D(n_classes, (1, 1), activation='linear', padding='same', name=layerName+'refine2x')(fconcat) 153 | 154 | return out8x, out4x, out2x 155 | 156 | 157 | def create_global_net_dilated(lowlevelFeatures, n_classes): 158 | lf2x, lf4x, lf8x, lf16x = lowlevelFeatures 159 | 160 | o = lf16x 161 | 162 | o = (Conv2D(256, (3, 3), dilation_rate=(2, 2), activation='relu', padding='same', name='up16x_conv', data_format=IMAGE_ORDERING))(o) 163 | o = (BatchNormalization())(o) 164 | 165 | o = (Conv2DTranspose(256, kernel_size=(3, 3), strides=(2, 2), name='upsample_16x', activation='relu', padding='same', 166 | data_format=IMAGE_ORDERING))(o) 167 | o = (concatenate([o, lf8x], axis=-1)) 168 | o = (Conv2D(128, (3, 3), dilation_rate=(2, 2), activation='relu', padding='same', name='up8x_conv', data_format=IMAGE_ORDERING))(o) 169 | o = (BatchNormalization())(o) 170 | fup8x = o 171 | 172 | o = (Conv2DTranspose(128, kernel_size=(3, 3), strides=(2, 2), name='upsample_8x', padding='same', activation='relu', 173 | data_format=IMAGE_ORDERING))(o) 174 | o = (concatenate([o, lf4x], axis=-1)) 175 | o = (Conv2D(64, (3, 3), dilation_rate=(2, 2), activation='relu', padding='same', name='up4x_conv', data_format=IMAGE_ORDERING))(o) 176 | o = (BatchNormalization())(o) 177 | fup4x = o 178 | 179 | o = (Conv2DTranspose(64, kernel_size=(3, 3), strides=(2, 2), name='upsample_4x', padding='same', activation='relu', 180 | data_format=IMAGE_ORDERING))(o) 181 | o = (concatenate([o, lf2x], axis=-1)) 182 | o = (Conv2D(64, (3, 3), dilation_rate=(2, 2), activation='relu', padding='same', name='up2x_conv', data_format=IMAGE_ORDERING))(o) 183 | o = (BatchNormalization())(o) 184 | fup2x = o 185 | 186 | out2x = Conv2D(n_classes, (1, 1), activation='linear', padding='same', name='out2x', data_format=IMAGE_ORDERING)(fup2x) 187 | out4x = Conv2D(n_classes, (1, 1), activation='linear', padding='same', name='out4x', data_format=IMAGE_ORDERING)(fup4x) 188 | out8x = Conv2D(n_classes, (1, 1), activation='linear', padding='same', name='out8x', data_format=IMAGE_ORDERING)(fup8x) 189 | 190 | x4x = UpSampling2D((2, 2), data_format=IMAGE_ORDERING)(out8x) 191 | eadd4x = Add(name='global4x')([x4x, out4x]) 192 | 193 | x2x = UpSampling2D((2, 2), data_format=IMAGE_ORDERING)(eadd4x) 194 | eadd2x = Add(name='global2x')([x2x, out2x]) 195 | 196 | return (fup8x, eadd4x, eadd2x) 197 | 198 | 199 | def build_network_resnet101(inputHeight, 
inputWidth, n_classes, frozenlayers=True, dilated=False): # frozenlayers is currently unused 200 | input, lf2x, lf4x, lf8x, lf16x = load_backbone_res101net(inputHeight, inputWidth) 201 | 202 | # global net 8x, 4x, and 2x 203 | if dilated: 204 | g8x, g4x, g2x = create_global_net_dilated((lf2x, lf4x, lf8x, lf16x), n_classes) 205 | else: 206 | g8x, g4x, g2x = create_global_net((lf2x, lf4x, lf8x, lf16x), n_classes) 207 | 208 | # refine net, only 2x as output 209 | refine2x = create_refine_net_bottleneck((g8x, g4x, g2x), n_classes) 210 | 211 | model = Model(inputs=input, outputs=[g2x, refine2x]) 212 | 213 | adam = Adam(lr=1e-4) 214 | model.compile(optimizer=adam, loss=euclidean_loss, metrics=["accuracy"]) 215 | 216 | return model 217 | 218 | 219 | def build_network_resnet101_stack(inputHeight, inputWidth, n_classes, nStack): 220 | # backbone network 221 | input, lf2x, lf4x, lf8x, lf16x = load_backbone_res101net(inputHeight, inputWidth) 222 | 223 | # global net 224 | g8x, g4x, g2x = create_global_net_dilated((lf2x, lf4x, lf8x, lf16x), n_classes) 225 | 226 | s8x, s4x, s2x = g8x, g4x, g2x 227 | 228 | outputs = [g2x] 229 | for i in range(nStack): 230 | s8x, s4x, s2x = create_stack_refinenet((s8x, s4x, s2x), n_classes, 'stack_'+str(i)) 231 | outputs.append(s2x) 232 | 233 | model = Model(inputs=input, outputs=outputs) 234 | 235 | adam = Adam(lr=1e-4) 236 | model.compile(optimizer=adam, loss=euclidean_loss, metrics=["accuracy"]) 237 | return model 238 | 239 | 240 | def load_backbone_res101net(inputHeight, inputWidth): 241 | from resnet101 import ResNet101 242 | xresnet = ResNet101(weights='imagenet', include_top=False, input_shape=(inputHeight, inputWidth, 3)) 243 | 244 | xresnet.load_weights("../../data/resnet101_weights_tf.h5", by_name=True) 245 | 246 | lf16x = xresnet.get_layer('res4b22_relu').output 247 | lf8x = xresnet.get_layer('res3b2_relu').output 248 | lf4x = xresnet.get_layer('res2c_relu').output 249 | lf2x = xresnet.get_layer('conv1_relu').output 250 | 251 | # pad lf4x (127x127 for a 512 input) by one pixel on the bottom/right so it aligns with the 128x128 upsampled features 252 | lf4xp = ZeroPadding2D(padding=((0, 1), (0, 1)))(lf4x) 253 | 254 | return (xresnet.input, lf2x, lf4xp, lf8x, lf16x) -------------------------------------------------------------------------------- /src/unet/refinenet_mask_v3.py: -------------------------------------------------------------------------------- 1 | 2 | from refinenet import load_backbone_res101net, create_global_net_dilated, create_stack_refinenet 3 | from keras.models import * 4 | from keras.layers import * 5 | from keras.optimizers import Adam, SGD 6 | from keras import backend as K 7 | import keras 8 | 9 | def Res101RefineNetMaskV3(n_classes, inputHeight, inputWidth, nStackNum): 10 | model = build_resnet101_stack_mask_v3(inputHeight, inputWidth, n_classes, nStackNum) 11 | return model 12 | 13 | def euclidean_loss(x, y): 14 | return K.sqrt(K.sum(K.square(x - y))) # square root of the summed squared error over the whole batch 15 | 16 | def apply_mask_to_output(output, mask): # zero out heatmap channels of keypoints that are invalid for the category 17 | output_with_mask = keras.layers.multiply([output, mask]) 18 | return output_with_mask 19 | 20 | def build_resnet101_stack_mask_v3(inputHeight, inputWidth, n_classes, nStack): 21 | 22 | input_mask = Input(shape=(inputHeight//2, inputWidth//2, n_classes), name='mask') # heatmaps are predicted at half the input resolution 23 | input_ohem_mask = Input(shape=(inputHeight//2, inputWidth//2, n_classes), name='ohem_mask') 24 | 25 | # backbone network 26 | input_image, lf2x, lf4x, lf8x, lf16x = load_backbone_res101net(inputHeight, inputWidth) 27 | 28 | # global net 29 | g8x, g4x, g2x = create_global_net_dilated((lf2x, lf4x, lf8x, lf16x), n_classes) 30 | 31 | s8x, s4x, s2x = g8x, g4x, g2x 32 | 33 | 
g2x_mask = apply_mask_to_output(g2x, input_mask) 34 | 35 | outputs = [g2x_mask] 36 | for i in range(nStack): 37 | s8x, s4x, s2x = create_stack_refinenet((s8x, s4x, s2x), n_classes, 'stack_'+str(i)) 38 | if i == (nStack-1): # last stack: apply the OHEM mask so only the hard (top-loss) keypoints contribute 39 | s2x_mask = apply_mask_to_output(s2x, input_ohem_mask) 40 | else: 41 | s2x_mask = apply_mask_to_output(s2x, input_mask) 42 | outputs.append(s2x_mask) 43 | 44 | model = Model(inputs=[input_image, input_mask, input_ohem_mask], outputs=outputs) 45 | 46 | adam = Adam(lr=1e-4) 47 | model.compile(optimizer=adam, loss=euclidean_loss, metrics=["accuracy"]) 48 | return model -------------------------------------------------------------------------------- /src/unet/resnet101.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | """ResNet-101 model for Keras. 3 | 4 | # Reference: 5 | 6 | - [Deep Residual Learning for Image Recognition](https://arxiv.org/abs/1512.03385) 7 | 8 | Slightly modified from Felix Yu's (https://github.com/flyyufelix) implementation 9 | of ResNet-101, to keep the API consistent with the pre-trained models in 10 | `keras.applications`. The original implementation can be found here: 11 | https://gist.github.com/flyyufelix/65018873f8cb2bbe95f429c474aa1294#file-resnet-101_keras-py 12 | 13 | Implementation is based on Keras 2.0 14 | """ 15 | from keras.layers import ( 16 | Input, Dense, Conv2D, MaxPooling2D, AveragePooling2D, ZeroPadding2D, 17 | Flatten, Activation, GlobalAveragePooling2D, GlobalMaxPooling2D, add) 18 | from keras.layers.normalization import BatchNormalization 19 | from keras.models import Model 20 | from keras import initializers 21 | from keras.engine import Layer, InputSpec 22 | from keras.engine.topology import get_source_inputs 23 | from keras import backend as K 24 | from keras.applications.imagenet_utils import _obtain_input_shape 25 | from keras.utils.data_utils import get_file 26 | 27 | import warnings 28 | import sys 29 | sys.setrecursionlimit(3000) 30 | 31 | 32 | WEIGHTS_PATH_TH = 'https://dl.dropboxusercontent.com/s/rrp56zm347fbrdn/resnet101_weights_th.h5?dl=0' 33 | WEIGHTS_PATH_TF = 'https://dl.dropboxusercontent.com/s/a21lyqwgf88nz9b/resnet101_weights_tf.h5?dl=0' 34 | MD5_HASH_TH = '3d2e9a49d05192ce6e22200324b7defe' 35 | MD5_HASH_TF = '867a922efc475e9966d0f3f7b884dc15' 36 | 37 | 38 | class Scale(Layer): 39 | '''Learns a set of weights and biases used for scaling the input data. 40 | The output is simply an element-wise multiplication of the input 41 | with a learned scale, plus a learned offset: 42 | 43 | out = in * gamma + beta, 44 | 45 | where 'gamma' and 'beta' are the learned weights and biases. 46 | 47 | # Arguments 48 | axis: integer, axis along which to normalize in mode 0. For instance, 49 | if your input tensor has shape (samples, channels, rows, cols), 50 | set axis to 1 to normalize per feature map (channels axis). 51 | momentum: not used in the computation; kept (and serialized in 52 | get_config) only for compatibility with the original 53 | implementation. 54 | weights: Initialization weights. 55 | List of 2 Numpy arrays, with shapes: 56 | `[(input_shape,), (input_shape,)]` 57 | beta_init: name of initialization function for shift parameter 58 | (see [initializers](../initializers.md)), or alternatively, 59 | Theano/TensorFlow function to use for weights initialization. 60 | This parameter is only relevant if you don't pass a `weights` 61 | argument. 
62 | gamma_init: name of initialization function for scale parameter (see 63 | [initializers](../initializers.md)), or alternatively, 64 | Theano/TensorFlow function to use for weights initialization. 65 | This parameter is only relevant if you don't pass a `weights` 66 | argument. 67 | 68 | Note: this layer is applied right after each BatchNormalization in the 69 | ResNet blocks below, mirroring the separate BatchNorm + Scale pair of 70 | the original Caffe ResNet-101 release. 71 | 72 | ''' 73 | def __init__(self, 74 | weights=None, 75 | axis=-1, 76 | momentum=0.9, 77 | beta_init='zero', 78 | gamma_init='one', 79 | **kwargs): 80 | self.momentum = momentum 81 | self.axis = axis 82 | self.beta_init = initializers.get(beta_init) 83 | self.gamma_init = initializers.get(gamma_init) 84 | self.initial_weights = weights 85 | super(Scale, self).__init__(**kwargs) 86 | 87 | def build(self, input_shape): 88 | self.input_spec = [InputSpec(shape=input_shape)] 89 | shape = (int(input_shape[self.axis]),) 90 | 91 | self.gamma = K.variable( 92 | self.gamma_init(shape), 93 | name='{}_gamma'.format(self.name)) 94 | self.beta = K.variable( 95 | self.beta_init(shape), 96 | name='{}_beta'.format(self.name)) 97 | self.trainable_weights = [self.gamma, self.beta] 98 | 99 | if self.initial_weights is not None: 100 | self.set_weights(self.initial_weights) 101 | del self.initial_weights 102 | 103 | def call(self, x, mask=None): 104 | input_shape = self.input_spec[0].shape 105 | broadcast_shape = [1] * len(input_shape) 106 | broadcast_shape[self.axis] = input_shape[self.axis] 107 | 108 | out = K.reshape( 109 | self.gamma, 110 | broadcast_shape) * x + K.reshape(self.beta, broadcast_shape) 111 | return out 112 | 113 | def get_config(self): 114 | config = {"momentum": self.momentum, "axis": self.axis} 115 | base_config = super(Scale, self).get_config() 116 | return dict(list(base_config.items()) + list(config.items())) 117 | 118 | 119 | def identity_block(input_tensor, kernel_size, filters, stage, block): 120 | '''The identity_block is the block that has no conv layer at shortcut 121 | # Arguments 122 | input_tensor: input tensor 123 | kernel_size: default 3, the kernel size of the middle conv layer at the 124 | main path 125 | filters: list of integers, the nb_filters of the 3 conv layers at the main path 126 | stage: integer, current stage label, used for generating layer names 127 | block: 'a','b'..., current block label, used for generating layer names 128 | ''' 129 | eps = 1.1e-5 130 | if K.image_data_format() == 'channels_last': 131 | bn_axis = 3 132 | else: 133 | bn_axis = 1 134 | nb_filter1, nb_filter2, nb_filter3 = filters 135 | conv_name_base = 'res' + str(stage) + block + '_branch' 136 | bn_name_base = 'bn' + str(stage) + block + '_branch' 137 | scale_name_base = 'scale' + str(stage) + block + '_branch' 138 | 139 | x = Conv2D(nb_filter1, (1, 1), name=conv_name_base + '2a', 140 | use_bias=False)(input_tensor) 141 | x = BatchNormalization(epsilon=eps, axis=bn_axis, 142 | name=bn_name_base + '2a')(x) 143 | x = Scale(axis=bn_axis, name=scale_name_base + '2a')(x) 144 | x = Activation('relu', name=conv_name_base + '2a_relu')(x) 145 | 146 | x = ZeroPadding2D((1, 1), name=conv_name_base + '2b_zeropadding')(x) 147 | x = Conv2D(nb_filter2, (kernel_size, kernel_size), 148 | name=conv_name_base + '2b', use_bias=False)(x) 149 | x = BatchNormalization(epsilon=eps, axis=bn_axis, 150 | name=bn_name_base + '2b')(x) 151 | x = Scale(axis=bn_axis, name=scale_name_base 
+ '2b')(x) 152 | x = Activation('relu', name=conv_name_base + '2b_relu')(x) 153 | 154 | x = Conv2D(nb_filter3, (1, 1), name=conv_name_base + '2c', 155 | use_bias=False)(x) 156 | x = BatchNormalization(epsilon=eps, axis=bn_axis, 157 | name=bn_name_base + '2c')(x) 158 | x = Scale(axis=bn_axis, name=scale_name_base + '2c')(x) 159 | 160 | x = add([x, input_tensor], name='res' + str(stage) + block) 161 | x = Activation('relu', name='res' + str(stage) + block + '_relu')(x) 162 | return x 163 | 164 | 165 | def conv_block(input_tensor, 166 | kernel_size, 167 | filters, 168 | stage, 169 | block, 170 | strides=(2, 2)): 171 | '''conv_block is the block that has a conv layer at shortcut 172 | # Arguments 173 | input_tensor: input tensor 174 | kernel_size: default 3, the kernel size of the middle conv layer at the 175 | main path 176 | filters: list of integers, the nb_filters of the 3 conv layers at the main path 177 | stage: integer, current stage label, used for generating layer names 178 | block: 'a','b'..., current block label, used for generating layer names 179 | Note that from stage 3 on, the first conv layer of the main path uses 180 | strides=(2,2), and the shortcut does as well 181 | ''' 182 | eps = 1.1e-5 183 | if K.image_data_format() == 'channels_last': 184 | bn_axis = 3 185 | else: 186 | bn_axis = 1 187 | nb_filter1, nb_filter2, nb_filter3 = filters 188 | conv_name_base = 'res' + str(stage) + block + '_branch' 189 | bn_name_base = 'bn' + str(stage) + block + '_branch' 190 | scale_name_base = 'scale' + str(stage) + block + '_branch' 191 | 192 | x = Conv2D(nb_filter1, (1, 1), strides=strides, 193 | name=conv_name_base + '2a', use_bias=False)(input_tensor) 194 | x = BatchNormalization(epsilon=eps, axis=bn_axis, 195 | name=bn_name_base + '2a')(x) 196 | x = Scale(axis=bn_axis, name=scale_name_base + '2a')(x) 197 | x = Activation('relu', name=conv_name_base + '2a_relu')(x) 198 | 199 | x = ZeroPadding2D((1, 1), name=conv_name_base + '2b_zeropadding')(x) 200 | x = Conv2D(nb_filter2, (kernel_size, kernel_size), 201 | name=conv_name_base + '2b', use_bias=False)(x) 202 | x = BatchNormalization(epsilon=eps, axis=bn_axis, 203 | name=bn_name_base + '2b')(x) 204 | x = Scale(axis=bn_axis, name=scale_name_base + '2b')(x) 205 | x = Activation('relu', name=conv_name_base + '2b_relu')(x) 206 | 207 | x = Conv2D(nb_filter3, (1, 1), 208 | name=conv_name_base + '2c', use_bias=False)(x) 209 | x = BatchNormalization(epsilon=eps, axis=bn_axis, 210 | name=bn_name_base + '2c')(x) 211 | x = Scale(axis=bn_axis, name=scale_name_base + '2c')(x) 212 | 213 | shortcut = Conv2D(nb_filter3, (1, 1), strides=strides, 214 | name=conv_name_base + '1', use_bias=False)(input_tensor) 215 | shortcut = BatchNormalization(epsilon=eps, axis=bn_axis, 216 | name=bn_name_base + '1')(shortcut) 217 | shortcut = Scale(axis=bn_axis, name=scale_name_base + '1')(shortcut) 218 | 219 | x = add([x, shortcut], name='res' + str(stage) + block) 220 | x = Activation('relu', name='res' + str(stage) + block + '_relu')(x) 221 | return x 222 | 223 | 224 | def ResNet101(include_top=True, 225 | weights='imagenet', 226 | input_tensor=None, 227 | input_shape=None, 228 | pooling=None, 229 | classes=1000): 230 | """Instantiates the ResNet-101 architecture. 231 | 232 | Optionally loads weights pre-trained on ImageNet. Note that when using 233 | TensorFlow, for best performance you should set 234 | `image_data_format='channels_last'` in your Keras config at 235 | ~/.keras/keras.json. 
236 | 237 | The model and the weights are compatible with both TensorFlow and Theano. 238 | The data format convention used by the model is the one specified in your 239 | Keras config file. 240 | 241 | Parameters 242 | ---------- 243 | include_top: whether to include the fully-connected layer at the top of 244 | the network. 245 | weights: one of `None` (random initialization) or 'imagenet' 246 | (pre-training on ImageNet). 247 | input_tensor: optional Keras tensor (i.e. output of `layers.Input()`) 248 | to use as image input for the model. 249 | input_shape: optional shape tuple, only to be specified if 250 | `include_top` is False (otherwise the input shape has to be 251 | `(224, 224, 3)` (with `channels_last` data format) or 252 | `(3, 224, 224)` (with `channels_first` data format)). It should have 253 | exactly 3 input channels, and width and height should be no 254 | smaller than 197. 255 | E.g. `(200, 200, 3)` would be one valid value. 256 | pooling: Optional pooling mode for feature extraction when 257 | `include_top` is `False`. 258 | - `None` means that the output of the model will be the 4D tensor 259 | output of the last convolutional layer. 260 | - `avg` means that global average pooling will be applied to the 261 | output of the last convolutional layer, and thus the output of 262 | the model will be a 2D tensor. 263 | - `max` means that global max pooling will be applied. 264 | classes: optional number of classes to classify images into, only to be 265 | specified if `include_top` is True, and if no `weights` argument is 266 | specified. 267 | 268 | Returns 269 | ------- 270 | A Keras model instance. 271 | 272 | Raises 273 | ------ 274 | ValueError: in case of invalid argument for `weights`, or invalid input 275 | shape. 276 | """ 277 | if weights not in {'imagenet', None}: 278 | raise ValueError('The `weights` argument should be either ' 279 | '`None` (random initialization) or `imagenet` ' 280 | '(pre-training on ImageNet).') 281 | 282 | if weights == 'imagenet' and include_top and classes != 1000: 283 | raise ValueError('If using `weights` as imagenet with `include_top`' 284 | ' as true, `classes` should be 1000') 285 | 286 | # Determine proper input shape 287 | input_shape = _obtain_input_shape(input_shape, 288 | default_size=224, 289 | min_size=197, 290 | data_format=K.image_data_format(), 291 | require_flatten=include_top, 292 | weights=weights) 293 | 294 | if input_tensor is None: 295 | img_input = Input(shape=input_shape, name='data') 296 | else: 297 | if not K.is_keras_tensor(input_tensor): 298 | img_input = Input( 299 | tensor=input_tensor, shape=input_shape, name='data') 300 | else: 301 | img_input = input_tensor 302 | if K.image_data_format() == 'channels_last': 303 | bn_axis = 3 304 | else: 305 | bn_axis = 1 306 | eps = 1.1e-5 307 | 308 | x = ZeroPadding2D((3, 3), name='conv1_zeropadding')(img_input) 309 | x = Conv2D(64, (7, 7), strides=(2, 2), name='conv1', use_bias=False)(x) 310 | x = BatchNormalization(epsilon=eps, axis=bn_axis, name='bn_conv1')(x) 311 | x = Scale(axis=bn_axis, name='scale_conv1')(x) 312 | x = Activation('relu', name='conv1_relu')(x) 313 | x = MaxPooling2D((3, 3), strides=(2, 2), name='pool1')(x) 314 | 315 | x = conv_block(x, 3, [64, 64, 256], stage=2, block='a', strides=(1, 1)) 316 | x = identity_block(x, 3, [64, 64, 256], stage=2, block='b') 317 | x = identity_block(x, 3, [64, 64, 256], stage=2, block='c') 318 | 319 | x = conv_block(x, 3, [128, 128, 512], stage=3, block='a') 320 | for i in range(1, 3): 321 | x = identity_block(x, 3, [128, 
128, 512], stage=3, block='b' + str(i)) 322 | 323 | x = conv_block(x, 3, [256, 256, 1024], stage=4, block='a') 324 | for i in range(1, 23): 325 | x = identity_block(x, 3, [256, 256, 1024], stage=4, block='b' + str(i)) 326 | 327 | x = conv_block(x, 3, [512, 512, 2048], stage=5, block='a') 328 | x = identity_block(x, 3, [512, 512, 2048], stage=5, block='b') 329 | x = identity_block(x, 3, [512, 512, 2048], stage=5, block='c') 330 | 331 | x = AveragePooling2D((7, 7), name='avg_pool')(x) 332 | 333 | if include_top: 334 | x = Flatten()(x) 335 | x = Dense(classes, activation='softmax', name='mmfc1000')(x) 336 | else: 337 | if pooling == 'avg': 338 | x = GlobalAveragePooling2D()(x) 339 | elif pooling == 'max': 340 | x = GlobalMaxPooling2D()(x) 341 | 342 | # Ensure that the model takes into account 343 | # any potential predecessors of `input_tensor`. 344 | if input_tensor is not None: 345 | inputs = get_source_inputs(input_tensor) 346 | else: 347 | inputs = img_input 348 | # Create model. 349 | model = Model(inputs, x, name='resnet101') 350 | 351 | ''' 352 | # load weights 353 | if weights == 'imagenet': 354 | filename = 'resnet101_weights_{}.h5'.format(K.image_dim_ordering()) 355 | if K.backend() == 'theano': 356 | path = WEIGHTS_PATH_TH 357 | md5_hash = MD5_HASH_TH 358 | else: 359 | path = WEIGHTS_PATH_TF 360 | md5_hash = MD5_HASH_TF 361 | weights_path = get_file( 362 | fname=filename, 363 | origin=path, 364 | cache_subdir='models', 365 | md5_hash=md5_hash, 366 | hash_algorithm='md5') 367 | model.load_weights(weights_path, by_name=True) 368 | 369 | if K.image_data_format() == 'channels_first' and K.backend() == 'tensorflow': 370 | warnings.warn('You are using the TensorFlow backend, yet you ' 371 | 'are using the Theano ' 372 | 'image data format convention ' 373 | '(`image_data_format="channels_first"`). ' 374 | 'For best performance, set ' 375 | '`image_data_format="channels_last"` in ' 376 | 'your Keras config ' 377 | 'at ~/.keras/keras.json.') 378 | ''' 379 | return model 380 | -------------------------------------------------------------------------------- /submission/placeholder.txt: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/yuanyuanli85/FashionAI_KeyPoint_Detection_Challenge_Keras/0b3bd8cdee32e05619300e5466578644974279df/submission/placeholder.txt -------------------------------------------------------------------------------- /trained_models/placeholder.txt: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/yuanyuanli85/FashionAI_KeyPoint_Detection_Challenge_Keras/0b3bd8cdee32e05619300e5466578644974279df/trained_models/placeholder.txt --------------------------------------------------------------------------------