├── README.md
├── data_model
    └── ReadMe.txt
├── raw_data
    ├── README.md
    ├── create_h5_dataset.py
    ├── get_data_txt.py
    └── image_process.py
├── train
    ├── predict_resvgg.py
    ├── read_data.py
    ├── resvgg_model.py
    └── train_resvgg.py
└── yolo
    ├── ReadMe.txt
    ├── demo.c
    ├── demo.h
    ├── image.c
    └── image.h

/README.md:
--------------------------------------------------------------------------------
# pig_face
This repository holds the code for a competition.

If you have any questions about the description below, please contact us.
Email: xuxcong@gmail.com , jiexin_zheng@qq.com

## 1. Environment

Ubuntu 16.04, Python 2.7.12, CUDA 8.0, cuDNN 6.0, TensorFlow 1.3.0

GPU: 4 * TITAN XP


## 2. Cropping the pigs out of the videos

(1) To keep background data from influencing the model, we use the yolo-9000 algorithm to extract the pig from every frame of each video. The code comes from https://github.com/philipperemy/yolo-9000.
We modified that code: unpack the yolo archive and overwrite the files of the same name under darknet/src.

(2) We observed that yolo-9000 does not always assign the pig to the hog class, but nearly all boxes are centered on the pig in the video. So when picking boxes we do not restrict the output to hog-class boxes; we use the confidence as the criterion instead.

(3) We keep every window whose confidence is greater than 0.1.

(4) Each video yields a little over ten thousand ROI images. We sort them by size, take roughly the first 4000, and remove images of unrelated objects and images with heavy background interference (for example, a box that misses the pig or covers only a tiny part of it). The remaining images form the training and validation sets.

(5) This leaves 94677 images in total.


## 3. Preprocessing and dataset generation

(1) Run raw_data/image_process.py to pad the images from the previous step to squares, so that later resize operations do not distort them.

(2) Run raw_data/get_data_txt.py to split the data into training and validation sets and spread each over 50 txt files, so that the large dataset can be read piece by piece later.

(3) Run raw_data/create_h5_dataset.py to convert the data to h5 files. After this step there are 50 .h5 files holding the training set and 50 .h5 files holding the validation set.

## 4. Model

(1) The model is an improved version of the fine-grained recognition model bilinear CNN. Reference source code: https://github.com/abhaydoke09/Bilinear-CNN-TensorFlow
Reference paper: vis-www.cs.umass.edu/bcnn/docs/bcnn_iccv15.pdf
Bilinear CNN is an end-to-end network model that achieved the best classification accuracy among weakly supervised fine-grained classification models on the CUB200-2011 dataset.

(2) Bilinear CNN takes the outer product of the outputs of the last convolutional layer (in the implementation this is computed as an inner product over the spatial positions) in order to fuse different features.

(3) Inspired by the ResNet structure, our team improved the bilinear CNN algorithm: the output of the last convolutional layer is also multiplied with the outputs of earlier convolutional layers, which fuses features from different levels. The resulting vectors are then combined with the original bilinear vector. We added the inner products of conv4_1 and conv5_1 with conv5_3 (only these two layers, because their number of filters matches that of conv5_3, so after pooling the inner product can be taken directly without extra convolution kernels).
Our idea is that different convolutional layers attend to different features and have different receptive field sizes (that is, they sit at higher or lower levels). When recognizing similar-looking images, looking at each feature in isolation is not enough; the spatial relationships between features also have to be considered.

(4) Load the pretrained VGG model, train the fully connected layer first, then train the whole network. The pretrained weights can be downloaded from https://www.cs.toronto.edu/~frossard/post/vgg16/

(5) Real-time data augmentation is applied during training: rotation, random contrast, random brightness, and random crop. During training the dropout probability of the fully connected layer is 0.5.


## 5. Structure

(1) train/read_data.py reads the data and loads the large dataset in several passes.

(2) train/resvgg_model.py defines the network structure and the methods that load saved weights.

(3) train/train_resvgg.py defines the training procedure.

(4) train/predict_resvgg.py outputs the predictions.

## 6. Loading the pretrained model and fine-tuning

(1) When constructing the resvgg model, set finetune=False so that only the final fully connected layer is trained, and call load_initial_weights(sess) to load the pretrained VGG convolutional parameters.

(2) Training settings: optimizer = tf.train.MomentumOptimizer(learning_rate=0.2, momentum=0.5).minimize(loss), trained for 50 epochs.

(3) Save the best model found during this stage.

## 7. Training the whole network

(1) When constructing the resvgg model, set finetune=True. Call load_own_weight(sess , model_path) to load the model from the previous step.

(2) Training settings: optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.001).minimize(loss), trained for 200 epochs.

(3) Save the best model found during this stage.


## 8. Later adjustments

In practice, only the first run trains for the full 200 epochs on all the data. Once a saved model exists, later parameter tuning continues training on only about 1/4 of the data.

## 9. Prediction

(1) Run predict_resvgg.py to produce the predictions.

--------------------------------------------------------------------------------
/data_model/ReadMe.txt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/xuxcong/pig-face-recognition/540cdd6026ec2be6250f2677c036bd57c2251e67/data_model/ReadMe.txt
--------------------------------------------------------------------------------
/raw_data/README.md:
--------------------------------------------------------------------------------
# pig_face
This repository holds the code for a competition.

image_process.py pads the raw images to squares.

get_data_txt.py saves the paths of the images that will be used, in preparation for building the dataset in the next step.

create_h5_dataset.py converts the data into HDF5 dataset files.
--------------------------------------------------------------------------------
/raw_data/create_h5_dataset.py:
--------------------------------------------------------------------------------
from tflearn.data_utils import build_hdf5_image_dataset
import h5py


path = '/home/smie/zhengjx/Res_Bilinear_cnns/raw_data/txt/'
filenum = 50
# As written this builds the training shards; the validation_data*.txt
# shards can be converted the same way by changing filename.
filename = 'train_data'
files = []
result = []
for i in range(0, filenum):
    files.append(path + filename + str(i) + '.txt')
    result.append(filename + str(i) + '.h5')
    # One HDF5 file per txt shard; images are stored at 488x488.
    build_hdf5_image_dataset(files[i], image_shape=(488, 488), mode='file', output_path=result[i], categorical_labels=True, normalize=False)
    print('Finish dataset ' + result[i])
--------------------------------------------------------------------------------
/raw_data/get_data_txt.py:
--------------------------------------------------------------------------------
import os


if __name__ == '__main__':

    work_file = os.getcwd()

    data_directory = '/home/smie/zhengjx'
    filenum = 50
    train_file = []
    validation_file = []
    # One train list and one validation list per shard.
    for i in range(0, filenum):
        train = open('txt/train_data' + str(i) + '.txt', 'w')
        validation = open('txt/validation_data' + str(i) + '.txt', 'w')
        train_file.append(train)
        validation_file.append(validation)

    data_path = os.path.join(data_directory, 'ROI')
    train_num = 0
    validation_num = 0
    all_num = 0
    # ROI/<label>/ holds the images of one pig; labels run from 1 to 30.
    for i in range(1, 31):
        nowfile = os.path.join(data_path, str(i))
        files = os.listdir(nowfile)
        for filename in files:
            # Rename to a running index; prefix the label until the name is free.
            newname = str(all_num) + '.jpg'
            while os.path.exists(os.path.join(nowfile, newname)):
                newname = str(i) + newname
            os.rename(os.path.join(nowfile, filename), os.path.join(nowfile, newname))
            filepath = os.path.join(nowfile, newname)
            label = str(i)
            all_num = all_num + 1
            # Every fifth image goes to validation; shards are filled round-robin.
            if all_num % 5 == 0:
                validation_file[validation_num % filenum].write(filepath + ' ' + label + '\n')
                validation_num = validation_num + 1
            else:
                train_file[train_num % filenum].write(filepath + ' ' + label + '\n')
                train_num = train_num + 1

    # Close the shard files so that everything is flushed to disk.
    for f in train_file + validation_file:
        f.close()

    '''
    test_file = open("testresult.txt",'w')
    data_path = os.path.join(work_file,'testresult')
    for i in range(1,31):
        nowfile = os.path.join(data_path, str(i))
        files = os.listdir(nowfile)
        for filename in files:
            filepath = os.path.join(nowfile,filename)
            label = str(i)
            test_file.write(filepath + ' ' + label + '\n')
    '''
    '''
    test_file = open("anstest_data.txt",'w')
    data_path = os.path.join(work_file,'process_test')
    for i in ['test_A']:
        nowfile = os.path.join(data_path, str(i))
        files = os.listdir(nowfile)
        for filename in files:
            filepath = os.path.join(nowfile,filename)
            label = str(1)
            test_file.write(filepath + ' ' + label + '\n')
    '''
--------------------------------------------------------------------------------
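The train/validation split performed by raw_data/get_data_txt.py can be sketched as a pure function, without the renaming and file writing. This is a minimal illustration rather than the repository's code: `split_into_shards`, `num_shards` and `val_every` are hypothetical names chosen here, and the item list in the demo is made up.

```python
def split_into_shards(items, num_shards=50, val_every=5):
    """Send every `val_every`-th item to validation and fill shards round-robin.

    Hypothetical helper mirroring the counters used in get_data_txt.py.
    Returns (train_shards, val_shards), each a list of `num_shards` lists.
    """
    train_shards = [[] for _ in range(num_shards)]
    val_shards = [[] for _ in range(num_shards)]
    train_num = 0
    val_num = 0
    # The running count starts at 1, as in the script, where the counter
    # is incremented before the modulo test.
    for count, item in enumerate(items, start=1):
        if count % val_every == 0:
            val_shards[val_num % num_shards].append(item)
            val_num += 1
        else:
            train_shards[train_num % num_shards].append(item)
            train_num += 1
    return train_shards, val_shards


if __name__ == '__main__':
    # Made-up (path, label) pairs standing in for the ROI images.
    items = [('img%d.jpg' % n, 1 + n % 30) for n in range(100)]
    train_shards, val_shards = split_into_shards(items, num_shards=4)
    # prints "80 20"
    print(sum(len(s) for s in train_shards), sum(len(s) for s in val_shards))
```

With the defaults this gives an 80/20 split, and each of the 50 shards receives an interleaved sample of all 30 labels, which is what lets the reader in train/read_data.py load one shard at a time.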
/raw_data/image_process.py:
--------------------------------------------------------------------------------
import os
import shutil
import time

from PIL import Image


# Pad an image to a square, then resize it to 512x512.
def process_img(raw_path, result_path):
    img4 = Image.open(raw_path)
    longer_side = max(img4.size)
    # Integer division keeps the crop box on whole pixels.
    horizontal_padding = (longer_side - img4.size[0]) // 2
    vertical_padding = (longer_side - img4.size[1]) // 2
    # A crop box larger than the image pads the area outside the image.
    img5 = img4.crop(
        (
            -horizontal_padding,
            -vertical_padding,
            img4.size[0] + horizontal_padding,
            img4.size[1] + vertical_padding
        )
    )
    img4.close()
    img5 = img5.resize((512, 512))
    img5.save(result_path)


if __name__ == '__main__':
    # This script pads the images to squares and resizes them to a fixed size.
    '''
    train_path = "/home/smie/zhengjx/face_recognize/raw_train/"
    for i in range(1,31):
        data_path = train_path + str(i)
        result_file = data_path + '/'
        files = os.listdir(data_path)
        for img_name in files:
            img_path = data_path + '/' + img_name
            result_img_path = result_file + '/' + img_name
            process_img(img_path, result_img_path)

    '''

    test_path = "/home/smie/zhengjx/face_recognize/test_B/"
    result_path = 'process_testB/'
    for i in ['test_B']:
        data_path = test_path
        result_file = result_path
        # Start from an empty output directory.
        if os.path.exists(result_file):
            shutil.rmtree(result_file)
            time.sleep(1)
        os.mkdir(result_file)
        files = os.listdir(data_path)
        for img_name in files:
            img_path = data_path + '/' + img_name
            result_img_path = result_file + '/' + img_name
            process_img(img_path, result_img_path)

--------------------------------------------------------------------------------
/train/predict_resvgg.py:
--------------------------------------------------------------------------------
from __future__ import print_function

import csv

import numpy as np
import tensorflow as tf

from resvgg_model import *
from read_data import *


def softmax(x):
    # Subtract the maximum for numerical stability.
    x = x - np.max(x)
    exp_x = np.exp(x)
    softmax_x = exp_x / np.sum(exp_x)
    return softmax_x


if __name__ == '__main__':

    dataset_path = '/home/smie/zhengjx/Res_Bilinear_cnns/data_model/'
    model_path = '/home/smie/zhengjx/Res_Bilinear_cnns/data_model/'
    save_model_best = model_path + 'res_fine_last_layers_epoch_best.npz'
    save_model_last = model_path + 'res_fine_last_layers_epoch_last.npz'
    test_data_file = [dataset_path + 'testB.h5']
    val_data_file = [dataset_path + 'new_val_448.h5']
    num_class = 30
    test_batch_size = 1
    val_batch_size = 1
    # The test set must keep its order so rows can be matched to image names.
    test_reader = data_reader(test_data_file, num_class, test_batch_size, shuffle=False)
    val_reader = data_reader(val_data_file, num_class, val_batch_size)

    model_path = model_path + 'res_fine_last_layers_epoch_best.npz'

    sess = tf.Session()  # Start session to create the graph
    keep_prob = tf.placeholder(tf.float32)
    imgs = tf.placeholder(tf.float32, [None, 448, 448, 3])
    target = tf.placeholder("float", [None, 30])

    # resvgg = resvgg(imgs, 'vgg16_weights.npz', sess, finetune = False)
    resvgg = resvgg(imgs, keep_prob, 'vgg16_weights.npz', sess, res=False)

    print('VGG network created')

    # Defining other ops using Tensorflow
    loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=resvgg.fc3l, labels=target))

    #optimizer = tf.train.MomentumOptimizer(learning_rate=0.0005, momentum=0.4).minimize(loss)
    optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.005).minimize(loss)
    check_op = tf.add_check_numerics_ops()

    correct_prediction = tf.equal(tf.argmax(resvgg.fc3l, 1), tf.argmax(target, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

    num_correct_preds = tf.reduce_sum(tf.cast(correct_prediction, tf.float32))

    sess.run(tf.global_variables_initializer())
    #resvgg.load_initial_weights(sess)
    resvgg.load_own_weight(sess, model_path)

    # Use the validation loss to make sure that we have loaded the right model
    correct_val_count = 0
    val_loss = 0.0
    while val_reader.have_next():
        batch_val_x, batch_val_y = val_reader.next_batch()
        val_loss += sess.run(loss, feed_dict={imgs: batch_val_x, target: batch_val_y, keep_prob: 1.0})
        pred = sess.run(num_correct_preds, feed_dict={imgs: batch_val_x, target: batch_val_y, keep_prob: 1.0})
        correct_val_count += pred
    val_loss = val_loss / (1.0 * val_reader.total_datanum)
    print("##############################")
    print("Validation Loss -->", val_loss)
    print("correct_val_count, total_val_count", correct_val_count, val_reader.total_datanum)
    print("Validation Data Accuracy -->", 100.0 * correct_val_count / (1.0 * val_reader.total_datanum))
    print("##############################")

    # Parameters for the test set: image names in the same order as testB.h5
    target_path = '/home/smie/zhengjx/Res_Bilinear_cnns/train_test/testB.txt'
    images = []
    with open(target_path, 'r') as f:
        for l in f.readlines():
            l = l.strip('\n').split()
            name = l[0].split('/')[-1].split('.')[0]
            images.append(name)
    # 'wb' is the mode the Python 2 csv module expects.
    csvfile = open('b_cnn_' + 'test' + '.csv', 'wb')
    writer = csv.writer(csvfile)
    i = 0
    while test_reader.have_next():
        batch_test_x, batch_val_y = test_reader.next_batch()
        result = sess.run([resvgg.fc3l], feed_dict={imgs: batch_test_x, keep_prob: 1.0})
        result = softmax(result)
        if i % 100 == 0:
            print(i)
        for j in range(0, 30):
            # Scale each probability by 0.96 and add 0.001333 (about 0.04 / 30),
            # so that no class is written with probability zero.
            writer.writerow([images[i], j + 1, max(round(result[0][0][j], 7) - 0.000001, 0.0) * 0.96 + 0.001333])
        i = i + 1
    csvfile.close()
--------------------------------------------------------------------------------
/train/read_data.py:
--------------------------------------------------------------------------------
from __future__ import print_function
import numpy as np
import tflearn
import os
from tflearn.data_utils import shuffle
import pickle
import h5py
import math
import random
import time
from PIL import Image,ImageDraw,ImageFilter,ImageEnhance
import csv
from keras.preprocessing import image
import PIL


# Rotate the image by a random angle between -90 and 90 degrees.
def rotate(x, row_axis=0, col_axis=1, channel_axis=2, fill_mode='nearest', cval=0.):
    rotate_limit = (-90, 90)
    theta = np.pi / 180 * np.random.uniform(rotate_limit[0], rotate_limit[1])
    rotation_matrix = np.array([[np.cos(theta), -np.sin(theta), 0], [np.sin(theta), np.cos(theta), 0], [0, 0, 1]])
    h, w = x.shape[row_axis], x.shape[col_axis]
    transform_matrix = image.transform_matrix_offset_center(rotation_matrix, h, w)
    x = image.apply_transform(x, transform_matrix, channel_axis, fill_mode, cval)
    return x


# Randomly change brightness, color, contrast and sharpness.
# The delta argument is accepted but not used.
def random_brightness(img, delta):
    img = PIL.Image.fromarray(np.uint8(img))
    enh_bri = ImageEnhance.Brightness(img)
    brightness = np.random.randint(8, 14) / 10.0
    image_brightened = enh_bri.enhance(brightness)

    # color
    enh_col = ImageEnhance.Color(image_brightened)
    color = np.random.randint(8, 14) / 10.0
    image_colored = enh_col.enhance(color)

    enh_con = ImageEnhance.Contrast(image_colored)
    contrast = np.random.randint(8, 14) / 10.0
    image_contrasted = enh_con.enhance(contrast)

    enh_sha = ImageEnhance.Sharpness(image_contrasted)
    sharpness = np.random.randint(8, 15) / 10.0
    image_sharped = enh_sha.enhance(sharpness)

    return np.asarray(image_sharped)


# Augment every image in the batch, then take a random 448x448 crop
# from the 488x488 input.
def self_random_crop(image_batch):
    result = []
    for n in range(image_batch.shape[0]):
        newimg = random_brightness(image_batch[n], 0.8)
        newimg = rotate(newimg)
        start_x = random.randint(0, 39)
        start_y = random.randint(0, 39)
        newimg = newimg[start_y:start_y + 448, start_x:start_x + 448, :]
        result.append(newimg)
    return result


# Drop column 0 of the one-hot labels: the labels start at 1, so class 0 is never used.
def move_zero_label(y, len1, len2):
    y_result = []
    for i in range(len1):
        ytem = y[i]
        y_result.append(ytem[1:len2])
    return np.array(y_result)


# Iterates over a list of .h5 files, holding one file in memory at a time.
class data_reader:
    def __init__(self, datasets, numclass, batchsize, shuffle=True):
        self.shuffle = shuffle
        self.num_class = numclass
        self.dataset = datasets
        self.file_num = len(datasets)
        self.now_read_file_pos = 0
        self.batch_size = batchsize
        self.data = None
        self.X_data = None
        self.Y_data = None
        self.datanum = 0
        self.batch_num = 0
        self.tem_batch_pos = 0
        self.total_datanum = 0
        self.nextfile()

    # Restart from the first file.
    def new_iterator(self):
        self.now_read_file_pos = 0
        self.total_datanum = 0
        self.nextfile()

    # Open the next file; returns False when all files have been read.
    def nextfile(self):
        if self.now_read_file_pos + 1 <= self.file_num:
            self.data = h5py.File(self.dataset[self.now_read_file_pos], 'r')
            self.X_data = self.data['X']
            self.Y_data = self.data['Y']
            self.datanum = self.X_data.shape[0]
            self.total_datanum = self.total_datanum + self.datanum
            self.Y_data = move_zero_label(self.Y_data, self.Y_data.shape[0], self.Y_data.shape[1])
            if self.shuffle:
                self.X_data, self.Y_data = shuffle(self.X_data, self.Y_data)
            # A trailing partial batch is dropped.
            self.batch_num = int(self.datanum / self.batch_size)
            self.tem_batch_pos = 0
            print('Read data file: ' + self.dataset[self.now_read_file_pos])
            self.now_read_file_pos = self.now_read_file_pos + 1
            return True
        else:
            return False

    # Return the next batch; process=True applies the training augmentation.
    def next_batch(self, process=False):
        batch_xs = self.X_data[self.tem_batch_pos * self.batch_size: (self.tem_batch_pos + 1) * self.batch_size]
        batch_ys = self.Y_data[self.tem_batch_pos * self.batch_size: (self.tem_batch_pos + 1) * self.batch_size]
        if process:
            batch_xs = self_random_crop(batch_xs)
        self.tem_batch_pos = self.tem_batch_pos + 1
        return batch_xs, batch_ys

    # True while a batch is left, moving on to the next file when needed.
    def have_next(self):
        if self.tem_batch_pos < self.batch_num:
            return True
        if not self.nextfile():
            return False
        return self.have_next()

--------------------------------------------------------------------------------
/train/resvgg_model.py:
--------------------------------------------------------------------------------
from __future__ import print_function
import tensorflow as tf
import numpy as np
#from scipy.misc import imread, imresize
import tflearn
from tflearn.data_preprocessing import ImagePreprocessing
from tflearn.data_augmentation import ImageAugmentation
import os
from tflearn.data_utils import shuffle
import pickle
from tflearn.data_utils import image_preloader
import h5py
import math
import random
import time
import csv


class resvgg:
    def __init__(self, imgs, keep_prob, weights=None, sess=None, res=False, finetune=True):
        self.finetune = finetune  # False: the conv layers are frozen and only the last fc layer is trained
        self.keep_prob = keep_prob
        self.imgs = imgs
        self.res = res  # use ResNet-style shortcuts inside the conv blocks
        self.last_layer_parameters = []  # Parameters in this list will be optimized when only last layer is being trained
        self.parameters = []  # Parameters in this list will be optimized when whole BCNN network is finetuned
        self.convlayers()  # Create Convolutional layers
        self.fc_layers()  # Create Fully connected layer
        self.weight_file = weights

    def conv_layer(self, name, inputs, in_channels, out_channels, shortcut=None):
        # 3x3 convolution + bias (+ optional shortcut) + ReLU.
        # The variables are named "W" and "b" inside the scope `name`.
        with tf.variable_scope(name):
            weights = tf.get_variable("W", [3, 3, in_channels, out_channels], initializer=tf.contrib.layers.xavier_initializer(), trainable=self.finetune)
            biases = tf.get_variable("b", [out_channels], initializer=tf.constant_initializer(0.1), trainable=self.finetune)
            conv = tf.nn.conv2d(inputs, weights, strides=[1, 1, 1, 1], padding='SAME')
            combine = conv + biases
            if shortcut is not None:
                combine = combine + shortcut
            # The order of this list defines the layout of the saved .npz file.
            self.parameters += [weights, biases]
            return tf.nn.relu(combine), weights, biases

    def max_pool(self, inputs, name):
        return tf.nn.max_pool(inputs,
                              ksize=[1, 2, 2, 1],
                              strides=[1, 2, 2, 1],
                              padding='SAME',
                              name=name)

    def convlayers(self):

        # zero-mean input
        with tf.name_scope('preprocess') as scope:
            mean = tf.constant([123.68, 116.779, 103.939], dtype=tf.float32, shape=[1, 1, 1, 3], name='img_mean')
            images = self.imgs - mean
        print('Adding Data Augmentation')

        # The last layer of each block takes the first layer of the block as shortcut when res is set.
        # block 1
        self.conv1_1, _, _ = self.conv_layer("conv1_1", images, 3, 64)
        self.conv1_2, _, _ = self.conv_layer("conv1_2", self.conv1_1, 64, 64, self.conv1_1 if self.res else None)
        self.pool1 = self.max_pool(self.conv1_2, 'pool1')

        # block 2
        self.conv2_1, _, _ = self.conv_layer("conv2_1", self.pool1, 64, 128)
        self.conv2_2, _, _ = self.conv_layer("conv2_2", self.conv2_1, 128, 128, self.conv2_1 if self.res else None)
        self.pool2 = self.max_pool(self.conv2_2, 'pool2')

        # block 3
        self.conv3_1, _, _ = self.conv_layer("conv3_1", self.pool2, 128, 256)
        self.conv3_2, _, _ = self.conv_layer("conv3_2", self.conv3_1, 256, 256)
        self.conv3_3, _, _ = self.conv_layer("conv3_3", self.conv3_2, 256, 256, self.conv3_1 if self.res else None)
        self.pool3 = self.max_pool(self.conv3_3, 'pool3')

        # block 4
        self.conv4_1, _, _ = self.conv_layer("conv4_1", self.pool3, 256, 512)
        self.conv4_2, _, _ = self.conv_layer("conv4_2", self.conv4_1, 512, 512)
        self.conv4_3, _, _ = self.conv_layer("conv4_3", self.conv4_2, 512, 512, self.conv4_1 if self.res else None)
        self.pool4 = self.max_pool(self.conv4_3, 'pool4')

        # block 5
        self.conv5_1, _, _ = self.conv_layer("conv5_1", self.pool4, 512, 512)
        self.conv5_2, _, _ = self.conv_layer("conv5_2", self.conv5_1, 512, 512)
        # NOTE: unlike the other blocks, conv5_3 adds conv5_1 regardless of self.res.
        # This matches the original code and is kept so that saved models behave the same.
        self.conv5_3, weights, biases = self.conv_layer("conv5_3", self.conv5_2, 512, 512, self.conv5_1)
        self.special_parameters = [weights, biases]

        # Bilinear features: conv5_3 with itself, with conv5_1, and with pooled conv4_1
        self.z_l2 = self.get_bilinear_fc(self.conv5_3, self.conv5_3)
        self.z_l3 = self.get_bilinear_fc(self.conv5_3, self.conv5_1)
        pool4_1 = self.max_pool(self.conv4_1, 'pool4')
        self.z_l4 = self.get_bilinear_fc(self.conv5_3, pool4_1)

        self.final_z = tf.concat([self.z_l2, self.z_l3, self.z_l4], 1)
        print(self.final_z.get_shape())

    def get_bilinear_fc(self, conv1, conv2):
        # Flatten the 28x28 feature maps to 784 positions: [batch, 512, 784]
        conv1 = tf.transpose(conv1, perm=[0, 3, 1, 2])
        conv1 = tf.reshape(conv1, [-1, 512, 784])
        conv2 = tf.transpose(conv2, perm=[0, 3, 1, 2])
        conv2 = tf.reshape(conv2, [-1, 512, 784])
        conv2 = tf.transpose(conv2, perm=[0, 2, 1])
        # Channel-by-channel inner products over all positions: [batch, 512, 512]
        phi_I = tf.matmul(conv1, conv2)
        phi_I = tf.reshape(phi_I, [-1, 512 * 512])
        phi_I = tf.divide(phi_I, 784.0)
        # Signed square root followed by L2 normalization
        y_ssqrt = tf.multiply(tf.sign(phi_I), tf.sqrt(tf.abs(phi_I) + 1e-12))
        z = tf.nn.l2_normalize(y_ssqrt, dim=1)
        print('Shape of z', z.get_shape())
        return z

    def fc_layers(self):

        with tf.variable_scope('fc-new') as scope:
            # 786432 = 3 * 512 * 512, the three bilinear vectors concatenated
            fc3w = tf.get_variable('W', [786432, 30], initializer=tf.contrib.layers.xavier_initializer(), trainable=True)
            #fc3b = tf.Variable(tf.constant(1.0, shape=[100], dtype=tf.float32), name='biases', trainable=True)
            fc3b = tf.get_variable("b", [30], initializer=tf.constant_initializer(0.1), trainable=True)
            fc = tf.nn.bias_add(tf.matmul(self.final_z, fc3w), fc3b)
            self.fc3l = tf.nn.dropout(fc, self.keep_prob)
            self.last_layer_parameters += [fc3w, fc3b]
            self.parameters += [fc3w, fc3b]

    # Load the pretrained VGG16 conv weights from self.weight_file.
    def load_initial_weights(self, session):
        weights_dict = np.load(self.weight_file, encoding='bytes')
        vgg_layers = ['conv1_1','conv1_2','conv2_1','conv2_2','conv3_1','conv3_2','conv3_3','conv4_1','conv4_2','conv4_3','conv5_1','conv5_2','conv5_3']

        for op_name in vgg_layers:
            with tf.variable_scope(op_name, reuse=True):

                # Assign the stored biases and weights to their corresponding tf variables
                # Biases
                var = tf.get_variable('b', trainable=True)
                print('Adding weights to', var.name)
                session.run(var.assign(weights_dict[op_name + '_b']))

                # Weights
                var = tf.get_variable('W', trainable=True)
                print('Adding weights to', var.name)
                session.run(var.assign(weights_dict[op_name + '_W']))

    # Load a model saved by train_resvgg.py: a flat list [W, b, W, b, ...]
    # in the order of self.parameters, stored under the key 'arr_0'.
    def load_own_weight(self, session, filename):
        i = 0
        weights_dict = np.load(filename, encoding='bytes')
        vgg_layers = ['conv1_1','conv1_2','conv2_1','conv2_2','conv3_1','conv3_2','conv3_3','conv4_1','conv4_2','conv4_3','conv5_1','conv5_2','conv5_3']

        for op_name in vgg_layers:
            with tf.variable_scope(op_name, reuse=True):
                # Weights
                var = tf.get_variable('W', trainable=True)
                print('Adding weights to', var.name)
                session.run(var.assign(weights_dict['arr_0'][i]))

                var = tf.get_variable('b', trainable=True)
                print('Adding weights to', var.name)
                session.run(var.assign(weights_dict['arr_0'][i + 1]))
                i = i + 2

        with tf.variable_scope('fc-new', reuse=True):
            # Load the fc-layer weights, which follow the conv layers in the saved list.
            print('Last layer weights: last_layers_epoch_best.npz')
            var = tf.get_variable('W', trainable=True)
            print('Adding weights to', var.name)
            session.run(var.assign(weights_dict['arr_0'][i]))
            var = tf.get_variable('b', trainable=True)
            print('Adding weights to', var.name)
            session.run(var.assign(weights_dict['arr_0'][i + 1]))
            i = i + 2
--------------------------------------------------------------------------------
/train/train_resvgg.py:
--------------------------------------------------------------------------------
from __future__ import print_function

from resvgg_model import *
from read_data import *


if __name__ == '__main__':
    dataset_path = '/home/smie/zhengjx/Res_Bilinear_cnns/data_model/'
    model_path = '/home/smie/zhengjx/Res_Bilinear_cnns/data_model/'
    save_model_best = model_path + 'res_fine_last_layers_epoch_best.npz'
    save_model_last = model_path + 'res_fine_last_layers_epoch_last.npz'
    train_data_file = []
    val_data_file = []
    # define the data to use
    for i in range(0, 50):
        train_data_file.append(dataset_path + 'train_data' + str(i) + '.h5')
        val_data_file.append(dataset_path + 'validation_data' + str(i) + '.h5')
    num_class = 30
    train_batch_size = 8
    val_batch_size = 8
    train_reader = data_reader(train_data_file, num_class, train_batch_size)
    val_reader = data_reader(val_data_file, num_class, val_batch_size)

    model_path = model_path + 'res_fine_last_layers_epoch_best.npz'

    sess = tf.Session()  # Start session to create training graph
    keep_prob = tf.placeholder(tf.float32)
    imgs = tf.placeholder(tf.float32, [None, 448, 448, 3])
    target = tf.placeholder("float", [None, 30])

    # resvgg = resvgg(imgs, 'vgg16_weights.npz', sess, finetune = False) # fine tuning
    resvgg = resvgg(imgs, keep_prob, 'vgg16_weights.npz', sess)

    print('Res Bilinear cnn network created')

    # Defining other ops using Tensorflow
    loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=resvgg.fc3l, labels=target))

    #optimizer = tf.train.MomentumOptimizer(learning_rate=0.2, momentum=0.4).minimize(loss) # for fine tuning
    optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.001).minimize(loss)  # for normal training
    check_op = tf.add_check_numerics_ops()

    correct_prediction = tf.equal(tf.argmax(resvgg.fc3l, 1), tf.argmax(target, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

    num_correct_preds = tf.reduce_sum(tf.cast(correct_prediction, tf.float32))

    sess.run(tf.global_variables_initializer())
    #resvgg.load_initial_weights(sess)
    resvgg.load_own_weight(sess, model_path)  # load the model trained before

    # Validate once before training to get the baseline loss.
    # Note: val_loss sums per-batch mean losses but divides by the number of samples,
    # so it is only meaningful for comparing one evaluation with another.
    correct_val_count = 0
    val_loss = 0.0
    while val_reader.have_next():
        batch_val_x, batch_val_y = val_reader.next_batch()
        val_loss += sess.run(loss, feed_dict={imgs: batch_val_x, target: batch_val_y, keep_prob: 1.0})
        pred = sess.run(num_correct_preds, feed_dict={imgs: batch_val_x, target: batch_val_y, keep_prob: 1.0})
        correct_val_count += pred
    val_loss = val_loss / (1.0 * val_reader.total_datanum)
    print("##############################")
    print("Validation Loss -->", val_loss)
    print("correct_val_count, total_val_count", correct_val_count, val_reader.total_datanum)
    print("Validation Data Accuracy -->", 100.0 * correct_val_count / (1.0 * val_reader.total_datanum))
    print("##############################")

    print('Starting training')
    best_validation_loss = val_loss
    for epoch in range(50):
        train_reader.new_iterator()
        ave_cost = 0
        num = 100
        i = 0
        while train_reader.have_next():
            i = i + 1
            batch_xs, batch_ys = train_reader.next_batch(process=True)
            start = time.time()
            sess.run([optimizer, check_op], feed_dict={imgs: batch_xs, target: batch_ys, keep_prob: 0.5})
            cost = sess.run(loss, feed_dict={imgs: batch_xs, target: batch_ys, keep_prob: 1.0})
            ave_cost = ave_cost + cost
            # Report the average loss every `num` steps.
            if i % num == 0:
                ave_cost = 1.0 * ave_cost / num
                print("Epoch:", '%03d' % (epoch + 1), "Step:", '%03d' % i, "Loss:", str(ave_cost))
                ave_cost = 0

        # Validate after every epoch.
        correct_val_count = 0
        val_loss = 0.0
        val_reader.new_iterator()
        while val_reader.have_next():
            batch_val_x, batch_val_y = val_reader.next_batch()
            val_loss += sess.run(loss, feed_dict={imgs: batch_val_x, target: batch_val_y, keep_prob: 1.0})
            pred = sess.run(num_correct_preds, feed_dict={imgs: batch_val_x, target: batch_val_y, keep_prob: 1.0})
            correct_val_count += pred
        val_loss = val_loss / (1.0 * val_reader.total_datanum)
        print("##############################")
        print("Validation Loss -->", val_loss)
        print("correct_val_count, total_val_count", correct_val_count, val_reader.total_datanum)
        print("Validation Data Accuracy -->", 100.0 * correct_val_count / (1.0 * val_reader.total_datanum))
        print("##############################")
        # save the best model
        if val_loss < best_validation_loss:
            best_validation_loss = val_loss
            # Despite the name, this list holds all parameters of the network.
            last_layer_weights = []
            for v in resvgg.parameters:
                last_layer_weights.append(sess.run(v))
            np.savez(save_model_best, last_layer_weights)
            print('save the model!')

    # Save the final model after the last epoch.
    last_layer_weights = []
    for v in resvgg.parameters:
        print(v)
        last_layer_weights.append(sess.run(v))
    np.savez(save_model_last, last_layer_weights)
    print('save the model!')
--------------------------------------------------------------------------------
/yolo/ReadMe.txt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/xuxcong/pig-face-recognition/540cdd6026ec2be6250f2677c036bd57c2251e67/yolo/ReadMe.txt
--------------------------------------------------------------------------------
/yolo/demo.c:
--------------------------------------------------------------------------------
#include "network.h"
#include "detection_layer.h"
#include "region_layer.h"
#include "cost_layer.h"
#include "utils.h"
#include "parser.h"
#include "box.h"
#include "image.h"
#include "demo.h"
#include

#define DEMO 1

#ifdef OPENCV

static char **demo_names;
static image **demo_alphabet;
static int demo_classes;

static float **probs;
static box *boxes;
static network *net;
static image buff [3];
static image buff_letter[3];
static int buff_index = 0;
static CvCapture * cap;
static IplImage * ipl;
static float fps = 0;
static float demo_thresh = 0;
static float demo_hier = .5;
static int running = 0;

static int demo_frame = 3;
static int demo_detections = 0;
static float **predictions;
static int demo_index = 0;
static int demo_done = 0;
static float *avg;
double demo_time;

void *detect_in_thread(void *ptr)
{
    running = 1;
    float nms = .4;

    layer l
= net->layers[net->n-1]; 47 | float *X = buff_letter[(buff_index+2)%3].data; 48 | float *prediction = network_predict(net, X); 49 | 50 | memcpy(predictions[demo_index], prediction, l.outputs*sizeof(float)); 51 | mean_arrays(predictions, demo_frame, l.outputs, avg); 52 | l.output = avg; 53 | if(l.type == DETECTION){ 54 | get_detection_boxes(l, 1, 1, demo_thresh, probs, boxes, 0); 55 | } else if (l.type == REGION){ 56 | get_region_boxes(l, buff[0].w, buff[0].h, net->w, net->h, demo_thresh, probs, boxes, 0, 0, 0, demo_hier, 1); 57 | } else { 58 | error("Last layer must produce detections\n"); 59 | } 60 | if (nms > 0) do_nms_obj(boxes, probs, l.w*l.h*l.n, l.classes, nms); 61 | 62 | //printf("\033[2J"); //clear the screen 63 | //printf("\033[1;1H"); 64 | //printf("\nFPS:%.1f\n",fps); 65 | //printf("Objects:\n\n"); 66 | image display = buff[(buff_index+2) % 3]; 67 | draw_detections(display, demo_detections, demo_thresh, boxes, probs, 0, demo_names, demo_alphabet, demo_classes); //draw and save the ROIs; implemented in image.c 68 | 69 | demo_index = (demo_index + 1)%demo_frame; 70 | running = 0; 71 | return 0; 72 | } 73 | 74 | void *fetch_in_thread(void *ptr) 75 | { 76 | int status = fill_image_from_stream(cap, buff[buff_index]); 77 | letterbox_image_into(buff[buff_index], net->w, net->h, buff_letter[buff_index]); 78 | if(status == 0) demo_done = 1; 79 | return 0; 80 | } 81 | 82 | void *display_in_thread(void *ptr) 83 | { 84 | show_image_cv(buff[(buff_index + 1)%3], "Demo", ipl); 85 | int c = cvWaitKey(1); 86 | if (c != -1) c = c%256; 87 | if (c == 27) { 88 | demo_done = 1; 89 | return 0; 90 | } else if (c == 82) { 91 | demo_thresh += .02; 92 | } else if (c == 84) { 93 | demo_thresh -= .02; 94 | if(demo_thresh <= .02) demo_thresh = .02; 95 | } else if (c == 83) { 96 | demo_hier += .02; 97 | } else if (c == 81) { 98 | demo_hier -= .02; 99 | if(demo_hier <= .0) demo_hier = .0; 100 | } 101 | return 0; 102 | } 103 | 104 | void *display_loop(void *ptr) 105 | { 106 | while(1){ 107 | display_in_thread(0); 108 | } 109 | } 
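The frame averaging in `detect_in_thread` (copy the newest prediction into slot `demo_index`, average all slots, advance the index modulo `demo_frame`) can be sketched in isolation. This is a minimal illustration, not darknet code: `mean_rows` and `push_prediction` are hypothetical names, and `mean_rows` only assumes that `mean_arrays` computes an element-wise mean.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: element-wise mean of n rows, each of length els.
   Assumed to match how mean_arrays() is used in detect_in_thread. */
void mean_rows(float **rows, int n, int els, float *avg)
{
    int i, j;
    assert(rows && avg && n > 0 && els > 0);
    memset(avg, 0, els * sizeof(float));
    for (j = 0; j < n; ++j)
        for (i = 0; i < els; ++i)
            avg[i] += rows[j][i];
    for (i = 0; i < els; ++i)
        avg[i] /= n;
}

/* Hypothetical helper: store the newest prediction in slot *index of the
   ring, refresh the average over all slots, then advance the index so the
   oldest slot is overwritten next (it wraps to 0 after n-1). */
void push_prediction(float **ring, int n, int els, int *index,
                     const float *prediction, float *avg)
{
    assert(index && *index >= 0 && *index < n);
    memcpy(ring[*index], prediction, els * sizeof(float));
    mean_rows(ring, n, els, avg);
    *index = (*index + 1) % n;
}
```

Because the slots start zeroed, the first `n - 1` averages are biased toward zero. The demo accepts this warm-up effect as well, since `predictions` is allocated with `calloc`.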
110 | 111 | void *detect_loop(void *ptr) 112 | { 113 | while(1){ 114 | detect_in_thread(0); 115 | } 116 | } 117 | 118 | void demo(char *cfgfile, char *weightfile, float thresh, int cam_index, const char *filename, char **names, int classes, int delay, char *prefix, int avg_frames, float hier, int w, int h, int frames, int fullscreen) 119 | { 120 | demo_frame = avg_frames; 121 | predictions = calloc(demo_frame, sizeof(float*)); 122 | image **alphabet = load_alphabet(); //??? 123 | demo_names = names; 124 | demo_alphabet = alphabet; 125 | demo_classes = classes; 126 | demo_thresh = thresh; 127 | demo_hier = hier; 128 | printf("Demo\n"); 129 | net = load_network(cfgfile, weightfile, 0); 130 | set_batch_network(net, 1); 131 | pthread_t detect_thread; 132 | pthread_t fetch_thread; 133 | 134 | srand(2222222); 135 | 136 | if(filename){ 137 | printf("video file: %s\n", filename); 138 | cap = cvCaptureFromFile(filename); 139 | }else{ 140 | cap = cvCaptureFromCAM(cam_index); 141 | 142 | if(w){ 143 | cvSetCaptureProperty(cap, CV_CAP_PROP_FRAME_WIDTH, w); 144 | } 145 | if(h){ 146 | cvSetCaptureProperty(cap, CV_CAP_PROP_FRAME_HEIGHT, h); 147 | } 148 | if(frames){ 149 | cvSetCaptureProperty(cap, CV_CAP_PROP_FPS, frames); 150 | } 151 | } 152 | 153 | if(!cap) error("Couldn't connect to webcam.\n"); 154 | 155 | layer l = net->layers[net->n-1]; 156 | demo_detections = l.n*l.w*l.h; 157 | int j; 158 | 159 | avg = (float *) calloc(l.outputs, sizeof(float)); 160 | for(j = 0; j < demo_frame; ++j) predictions[j] = (float *) calloc(l.outputs, sizeof(float)); 161 | 162 | boxes = (box *)calloc(l.w*l.h*l.n, sizeof(box)); 163 | probs = (float **)calloc(l.w*l.h*l.n, sizeof(float *)); 164 | for(j = 0; j < l.w*l.h*l.n; ++j) probs[j] = (float *)calloc(l.classes+1, sizeof(float)); 165 | 166 | buff[0] = get_image_from_stream(cap); 167 | buff[1] = copy_image(buff[0]); 168 | buff[2] = copy_image(buff[0]); 169 | buff_letter[0] = letterbox_image(buff[0], net->w, net->h); 170 | buff_letter[1] = 
letterbox_image(buff[0], net->w, net->h); 171 | buff_letter[2] = letterbox_image(buff[0], net->w, net->h); 172 | ipl = cvCreateImage(cvSize(buff[0].w,buff[0].h), IPL_DEPTH_8U, buff[0].c); 173 | 174 | int count = 0; 175 | if(!prefix){ 176 | cvNamedWindow("Demo", CV_WINDOW_NORMAL); 177 | if(fullscreen){ 178 | cvSetWindowProperty("Demo", CV_WND_PROP_FULLSCREEN, CV_WINDOW_FULLSCREEN); 179 | } else { 180 | cvMoveWindow("Demo", 0, 0); 181 | cvResizeWindow("Demo", 1352, 1013); 182 | } 183 | } 184 | 185 | demo_time = what_time_is_it_now(); 186 | 187 | while(!demo_done){ 188 | printf("frame %d:\n",count+1); //??? count+1 189 | buff_index = (buff_index + 1) %3; 190 | if(pthread_create(&fetch_thread, 0, fetch_in_thread, 0)) error("Thread creation failed"); 191 | if(pthread_create(&detect_thread, 0, detect_in_thread, 0)) error("Thread creation failed"); 192 | if(!prefix){ 193 | fps = 1./(what_time_is_it_now() - demo_time); 194 | demo_time = what_time_is_it_now(); 195 | display_in_thread(0); 196 | }else{ 197 | char name[256]; 198 | sprintf(name, "%s_%04d", prefix, count); 199 | save_image(buff[(buff_index + 1)%3], name); //write the frame to disk 200 | } 201 | pthread_join(fetch_thread, 0); 202 | pthread_join(detect_thread, 0); 203 | ++count; 204 | } 205 | } 206 | 207 | void demo_compare(char *cfg1, char *weight1, char *cfg2, char *weight2, float thresh, int cam_index, const char *filename, char **names, int classes, int delay, char *prefix, int avg_frames, float hier, int w, int h, int frames, int fullscreen) 208 | { 209 | demo_frame = avg_frames; 210 | predictions = calloc(demo_frame, sizeof(float*)); 211 | image **alphabet = load_alphabet(); 212 | demo_names = names; 213 | demo_alphabet = alphabet; 214 | demo_classes = classes; 215 | demo_thresh = thresh; 216 | demo_hier = hier; 217 | printf("Demo\n"); 218 | net = load_network(cfg1, weight1, 0); 219 | set_batch_network(net, 1); 220 | pthread_t detect_thread; 221 | pthread_t fetch_thread; 222 | 223 | srand(2222222); 224 | 225 | if(filename){ 
226 | printf("video file: %s\n", filename); 227 | cap = cvCaptureFromFile(filename); 228 | }else{ 229 | cap = cvCaptureFromCAM(cam_index); 230 | 231 | if(w){ 232 | cvSetCaptureProperty(cap, CV_CAP_PROP_FRAME_WIDTH, w); 233 | } 234 | if(h){ 235 | cvSetCaptureProperty(cap, CV_CAP_PROP_FRAME_HEIGHT, h); 236 | } 237 | if(frames){ 238 | cvSetCaptureProperty(cap, CV_CAP_PROP_FPS, frames); 239 | } 240 | } 241 | 242 | if(!cap) error("Couldn't connect to webcam.\n"); 243 | 244 | layer l = net->layers[net->n-1]; 245 | demo_detections = l.n*l.w*l.h; 246 | int j; 247 | 248 | avg = (float *) calloc(l.outputs, sizeof(float)); 249 | for(j = 0; j < demo_frame; ++j) predictions[j] = (float *) calloc(l.outputs, sizeof(float)); 250 | 251 | boxes = (box *)calloc(l.w*l.h*l.n, sizeof(box)); 252 | probs = (float **)calloc(l.w*l.h*l.n, sizeof(float *)); 253 | for(j = 0; j < l.w*l.h*l.n; ++j) probs[j] = (float *)calloc(l.classes+1, sizeof(float)); 254 | 255 | buff[0] = get_image_from_stream(cap); 256 | buff[1] = copy_image(buff[0]); 257 | buff[2] = copy_image(buff[0]); 258 | buff_letter[0] = letterbox_image(buff[0], net->w, net->h); 259 | buff_letter[1] = letterbox_image(buff[0], net->w, net->h); 260 | buff_letter[2] = letterbox_image(buff[0], net->w, net->h); 261 | ipl = cvCreateImage(cvSize(buff[0].w,buff[0].h), IPL_DEPTH_8U, buff[0].c); 262 | 263 | int count = 0; 264 | if(!prefix){ 265 | cvNamedWindow("Demo", CV_WINDOW_NORMAL); 266 | if(fullscreen){ 267 | cvSetWindowProperty("Demo", CV_WND_PROP_FULLSCREEN, CV_WINDOW_FULLSCREEN); 268 | } else { 269 | cvMoveWindow("Demo", 0, 0); 270 | cvResizeWindow("Demo", 1352, 1013); 271 | } 272 | } 273 | 274 | demo_time = what_time_is_it_now(); 275 | 276 | while(!demo_done){ 277 | buff_index = (buff_index + 1) %3; 278 | if(pthread_create(&fetch_thread, 0, fetch_in_thread, 0)) error("Thread creation failed"); 279 | if(pthread_create(&detect_thread, 0, detect_in_thread, 0)) error("Thread creation failed"); 280 | if(!prefix){ 281 | fps = 
1./(what_time_is_it_now() - demo_time); 282 | demo_time = what_time_is_it_now(); 283 | display_in_thread(0); 284 | }else{ 285 | char name[256]; 286 | sprintf(name, "%s_%04d", prefix, count); 287 | save_image(buff[(buff_index + 1)%3], name); 288 | } 289 | pthread_join(fetch_thread, 0); 290 | pthread_join(detect_thread, 0); 291 | ++count; 292 | } 293 | } 294 | #else 295 | void demo(char *cfgfile, char *weightfile, float thresh, int cam_index, const char *filename, char **names, int classes, int delay, char *prefix, int avg, float hier, int w, int h, int frames, int fullscreen) 296 | { 297 | fprintf(stderr, "Demo needs OpenCV for webcam images.\n"); 298 | } 299 | #endif 300 | 301 | -------------------------------------------------------------------------------- /yolo/demo.h: -------------------------------------------------------------------------------- 1 | #ifndef DEMO_H 2 | #define DEMO_H 3 | 4 | #include "image.h" 5 | 6 | #endif 7 | -------------------------------------------------------------------------------- /yolo/image.c: -------------------------------------------------------------------------------- 1 | #include "image.h" 2 | #include "utils.h" 3 | #include "blas.h" 4 | #include "cuda.h" 5 | #include 6 | #include 7 | 8 | #define STB_IMAGE_IMPLEMENTATION 9 | #include "stb_image.h" 10 | #define STB_IMAGE_WRITE_IMPLEMENTATION 11 | #include "stb_image_write.h" 12 | 13 | #include 14 | #include 15 | #include 16 | 17 | 18 | int windows = 0; 19 | 20 | int count=0; 21 | 22 | float colors[6][3] = { {1,0,1}, {0,0,1},{0,1,1},{0,1,0},{1,1,0},{1,0,0} }; 23 | 24 | float get_color(int c, int x, int max) 25 | { 26 | float ratio = ((float)x/max)*5; 27 | int i = floor(ratio); 28 | int j = ceil(ratio); 29 | ratio -= i; 30 | float r = (1-ratio) * colors[i][c] + ratio*colors[j][c]; 31 | //printf("%f\n", r); 32 | return r; 33 | } 34 | 35 | image mask_to_rgb(image mask) 36 | { 37 | int n = mask.c; 38 | image im = make_image(mask.w, mask.h, 3); 39 | int i, j; 40 | for(j = 0; j < n; 
++j){ 41 | int offset = j*123457 % n; 42 | float red = get_color(2,offset,n); 43 | float green = get_color(1,offset,n); 44 | float blue = get_color(0,offset,n); 45 | for(i = 0; i < im.w*im.h; ++i){ 46 | im.data[i + 0*im.w*im.h] += mask.data[j*im.h*im.w + i]*red; 47 | im.data[i + 1*im.w*im.h] += mask.data[j*im.h*im.w + i]*green; 48 | im.data[i + 2*im.w*im.h] += mask.data[j*im.h*im.w + i]*blue; 49 | } 50 | } 51 | return im; 52 | } 53 | 54 | static float get_pixel(image m, int x, int y, int c) 55 | { 56 | assert(x < m.w && y < m.h && c < m.c); 57 | return m.data[c*m.h*m.w + y*m.w + x]; 58 | } 59 | static float get_pixel_extend(image m, int x, int y, int c) 60 | { 61 | if(x < 0 || x >= m.w || y < 0 || y >= m.h) return 0; 62 | /* 63 | if(x < 0) x = 0; 64 | if(x >= m.w) x = m.w-1; 65 | if(y < 0) y = 0; 66 | if(y >= m.h) y = m.h-1; 67 | */ 68 | if(c < 0 || c >= m.c) return 0; 69 | return get_pixel(m, x, y, c); 70 | } 71 | static void set_pixel(image m, int x, int y, int c, float val) 72 | { 73 | if (x < 0 || y < 0 || c < 0 || x >= m.w || y >= m.h || c >= m.c) return; 74 | assert(x < m.w && y < m.h && c < m.c); 75 | m.data[c*m.h*m.w + y*m.w + x] = val; 76 | } 77 | static void add_pixel(image m, int x, int y, int c, float val) 78 | { 79 | assert(x < m.w && y < m.h && c < m.c); 80 | m.data[c*m.h*m.w + y*m.w + x] += val; 81 | } 82 | 83 | static float bilinear_interpolate(image im, float x, float y, int c) 84 | { 85 | int ix = (int) floorf(x); 86 | int iy = (int) floorf(y); 87 | 88 | float dx = x - ix; 89 | float dy = y - iy; 90 | 91 | float val = (1-dy) * (1-dx) * get_pixel_extend(im, ix, iy, c) + 92 | dy * (1-dx) * get_pixel_extend(im, ix, iy+1, c) + 93 | (1-dy) * dx * get_pixel_extend(im, ix+1, iy, c) + 94 | dy * dx * get_pixel_extend(im, ix+1, iy+1, c); 95 | return val; 96 | } 97 | 98 | 99 | void composite_image(image source, image dest, int dx, int dy) 100 | { 101 | int x,y,k; 102 | for(k = 0; k < source.c; ++k){ 103 | for(y = 0; y < source.h; ++y){ 104 | for(x = 0; x < 
source.w; ++x){ 105 | float val = get_pixel(source, x, y, k); 106 | float val2 = get_pixel_extend(dest, dx+x, dy+y, k); 107 | set_pixel(dest, dx+x, dy+y, k, val * val2); 108 | } 109 | } 110 | } 111 | } 112 | 113 | image border_image(image a, int border) 114 | { 115 | image b = make_image(a.w + 2*border, a.h + 2*border, a.c); 116 | int x,y,k; 117 | for(k = 0; k < b.c; ++k){ 118 | for(y = 0; y < b.h; ++y){ 119 | for(x = 0; x < b.w; ++x){ 120 | float val = get_pixel_extend(a, x - border, y - border, k); 121 | if(x - border < 0 || x - border >= a.w || y - border < 0 || y - border >= a.h) val = 1; 122 | set_pixel(b, x, y, k, val); 123 | } 124 | } 125 | } 126 | return b; 127 | } 128 | 129 | image tile_images(image a, image b, int dx) 130 | { 131 | if(a.w == 0) return copy_image(b); 132 | image c = make_image(a.w + b.w + dx, (a.h > b.h) ? a.h : b.h, (a.c > b.c) ? a.c : b.c); 133 | fill_cpu(c.w*c.h*c.c, 1, c.data, 1); 134 | embed_image(a, c, 0, 0); 135 | composite_image(b, c, a.w + dx, 0); 136 | return c; 137 | } 138 | 139 | image get_label(image **characters, char *string, int size) 140 | { 141 | if(size > 7) size = 7; 142 | image label = make_empty_image(0,0,0); 143 | while(*string){ 144 | image l = characters[size][(int)*string]; 145 | image n = tile_images(label, l, -size - 1 + (size+1)/2); 146 | free_image(label); 147 | label = n; 148 | ++string; 149 | } 150 | image b = border_image(label, label.h*.25); 151 | free_image(label); 152 | return b; 153 | } 154 | 155 | void draw_label(image a, int r, int c, image label, const float *rgb) 156 | { 157 | int w = label.w; 158 | int h = label.h; 159 | if (r - h >= 0) r = r - h; 160 | 161 | int i, j, k; 162 | for(j = 0; j < h && j + r < a.h; ++j){ 163 | for(i = 0; i < w && i + c < a.w; ++i){ 164 | for(k = 0; k < label.c; ++k){ 165 | float val = get_pixel(label, i, j, k); 166 | set_pixel(a, i+c, j+r, k, rgb[k] * val); 167 | } 168 | } 169 | } 170 | } 171 | 172 | void draw_box(image a, int x1, int y1, int x2, int y2, float r, float 
g, float b) 173 | { 174 | //normalize_image(a); 175 | int i; 176 | if(x1 < 0) x1 = 0; 177 | if(x1 >= a.w) x1 = a.w-1; 178 | if(x2 < 0) x2 = 0; 179 | if(x2 >= a.w) x2 = a.w-1; 180 | 181 | if(y1 < 0) y1 = 0; 182 | if(y1 >= a.h) y1 = a.h-1; 183 | if(y2 < 0) y2 = 0; 184 | if(y2 >= a.h) y2 = a.h-1; 185 | 186 | for(i = x1; i <= x2; ++i){ 187 | a.data[i + y1*a.w + 0*a.w*a.h] = r; 188 | a.data[i + y2*a.w + 0*a.w*a.h] = r; 189 | 190 | a.data[i + y1*a.w + 1*a.w*a.h] = g; 191 | a.data[i + y2*a.w + 1*a.w*a.h] = g; 192 | 193 | a.data[i + y1*a.w + 2*a.w*a.h] = b; 194 | a.data[i + y2*a.w + 2*a.w*a.h] = b; 195 | } 196 | for(i = y1; i <= y2; ++i){ 197 | a.data[x1 + i*a.w + 0*a.w*a.h] = r; 198 | a.data[x2 + i*a.w + 0*a.w*a.h] = r; 199 | 200 | a.data[x1 + i*a.w + 1*a.w*a.h] = g; 201 | a.data[x2 + i*a.w + 1*a.w*a.h] = g; 202 | 203 | a.data[x1 + i*a.w + 2*a.w*a.h] = b; 204 | a.data[x2 + i*a.w + 2*a.w*a.h] = b; 205 | } 206 | } 207 | 208 | //printf("left:%d top:%d right:%d bot:%d width:%d\n",left,top,right,bot,width); 209 | void draw_box_width(image a, int x1, int y1, int x2, int y2, int w, float r, float g, float b) 210 | { 211 | int i; 212 | for(i = 0; i < w; ++i){ 213 | draw_box(a, x1+i, y1+i, x2-i, y2-i, r, g, b); 214 | } 215 | } 216 | 217 | void draw_bbox(image a, box bbox, int w, float r, float g, float b) 218 | { 219 | int left = (bbox.x-bbox.w/2)*a.w; 220 | int right = (bbox.x+bbox.w/2)*a.w; 221 | int top = (bbox.y-bbox.h/2)*a.h; 222 | int bot = (bbox.y+bbox.h/2)*a.h; 223 | 224 | int i; 225 | for(i = 0; i < w; ++i){ 226 | draw_box(a, left+i, top+i, right-i, bot-i, r, g, b); 227 | } 228 | } 229 | 230 | image **load_alphabet() 231 | { 232 | int i, j; 233 | const int nsize = 8; 234 | image **alphabets = calloc(nsize, sizeof(image)); 235 | for(j = 0; j < nsize; ++j){ 236 | alphabets[j] = calloc(128, sizeof(image)); 237 | for(i = 32; i < 127; ++i){ 238 | char buff[256]; 239 | sprintf(buff, "data/labels/%d_%d.png", i, j); 240 | alphabets[j][i] = load_image_color(buff, 0, 0); 241 | } 
242 | } 243 | return alphabets; 244 | } 245 | 246 | void draw_detections(image im, int num, float thresh, box *boxes, float **probs, float **masks, char **names, image **alphabet, int classes) //called from demo.c; draws the boxes and saves each ROI 247 | { 248 | count++; 249 | char image_name[256]; 250 | sprintf(image_name, "%s_%04d.jpg", "output", count); 251 | //printf("%s\n",image_name); 252 | char image_roi_name[256]; 253 | int roi_num=0; //one frame may contain several ROIs 254 | sprintf(image_roi_name, "ROI/%04d_roi_%03d.jpg", count, roi_num); 255 | 256 | 257 | int i,j; 258 | for(i = 0; i < num; ++i){ //iterate over all detected objects 259 | char labelstr[4096] = {0}; 260 | int class = -1; 261 | for(j = 0; j < classes; ++j){ //iterate over all classes 262 | if (probs[i][j] > thresh){ 263 | if (class < 0) { 264 | strcat(labelstr, names[j]); 265 | class = j; 266 | } else { 267 | strcat(labelstr, ", "); 268 | strcat(labelstr, names[j]); 269 | } 270 | //printf("%s\n","hehe"); 271 | printf(" %s: %.0f%%\n", names[j], probs[i][j]*100); 272 | } 273 | } 274 | if(class >= 0){ 275 | int width = im.h * .006; 276 | 277 | /* 278 | if(0){ 279 | width = pow(prob, 1./2.)*10+1; 280 | alphabet = 0; 281 | } 282 | */ 283 | 284 | //printf("%d %s: %.0f%%\n", i, names[class], prob*100); 285 | int offset = class*123457 % classes; 286 | float red = get_color(2,offset,classes); 287 | float green = get_color(1,offset,classes); 288 | float blue = get_color(0,offset,classes); 289 | float rgb[3]; 290 | 291 | //width = prob*20+2; 292 | 293 | rgb[0] = red; 294 | rgb[1] = green; 295 | rgb[2] = blue; 296 | box b = boxes[i]; 297 | 298 | int left = (b.x-b.w/2.)*im.w; 299 | int right = (b.x+b.w/2.)*im.w; 300 | int top = (b.y-b.h/2.)*im.h; 301 | int bot = (b.y+b.h/2.)*im.h; 302 | 303 | if(left < 0) left = 0; 304 | if(right > im.w-1) right = im.w-1; 305 | if(top < 0) top = 0; 306 | if(bot > im.h-1) bot = im.h-1; 307 | 308 | draw_box_width(im, left, top, right, bot, width, red, green, blue); //draw the ROI box 309 | printf(" left:%d top:%d right:%d bot:%d\n",left,top,right,bot); //print the top-left and bottom-right corners 310 | ///* 
311 | //load the source frame; flag -1 loads it unchanged (alternatives: CV_LOAD_IMAGE_COLOR, CV_LOAD_IMAGE_GRAYSCALE) 312 | IplImage *pSrc = cvLoadImage(image_name, -1); 313 | //printf(" loading %s\n",image_name); 314 | if(!pSrc) { 315 | printf(" %s load failed!\n",image_name); 316 | return ; 317 | } 318 | CvSize size= cvSize(right-left,bot-top);//size of the region 319 | cvSetImageROI(pSrc,cvRect(left,top,size.width, size.height));//set the ROI on the source image: left edge, top edge, width, height 320 | IplImage* pDest = cvCreateImage(size,pSrc->depth,pSrc->nChannels);//create the destination image 321 | cvCopy(pSrc,pDest,0); //copy the ROI pixels 322 | cvResetImageROI(pSrc);//clear the ROI once the source image is no longer needed 323 | cvSaveImage(image_roi_name,pDest,0); cvReleaseImage(&pDest); cvReleaseImage(&pSrc); //save the crop, then release both images 324 | printf(" saved %s\n",image_roi_name); 325 | roi_num++; 326 | sprintf(image_roi_name, "ROI/%04d_roi_%03d.jpg", count, roi_num); 327 | //*/ 328 | 329 | if (alphabet) { //draw the label 330 | ///* 331 | image label = get_label(alphabet, labelstr, (im.h*.03)/10); 332 | //printf("label:%s\n",label); //null 333 | draw_label(im, top + width, left, label, rgb); //label 334 | free_image(label); 335 | //*/ 336 | } 337 | if (masks){ 338 | image mask = float_to_image(14, 14, 1, masks[i]); 339 | image resized_mask = resize_image(mask, b.w*im.w, b.h*im.h); 340 | image tmask = threshold_image(resized_mask, .5); 341 | embed_image(tmask, im, left, top); 342 | free_image(mask); 343 | free_image(resized_mask); 344 | free_image(tmask); 345 | } 346 | } 347 | } 348 | } 349 | 350 | void transpose_image(image im) 351 | { 352 | assert(im.w == im.h); 353 | int n, m; 354 | int c; 355 | for(c = 0; c < im.c; ++c){ 356 | for(n = 0; n < im.w-1; ++n){ 357 | for(m = n + 1; m < im.w; ++m){ 358 | float swap = im.data[m + im.w*(n + im.h*c)]; 359 | im.data[m + im.w*(n + im.h*c)] = im.data[n + im.w*(m + im.h*c)]; 360 | im.data[n + im.w*(m + im.h*c)] = swap; 361 | } 362 | } 363 | } 364 | } 365 | 366 | void rotate_image_cw(image im, int times) 367 | { 368 | assert(im.w == im.h); 369 | times = (times + 400) % 4; 370 | int i, x, y, c; 371 | int n = im.w; 372 | for(i = 0; i < times; ++i){ 373 | for(c = 0; c < im.c; ++c){ 374 | 
for(x = 0; x < n/2; ++x){ 375 | for(y = 0; y < (n-1)/2 + 1; ++y){ 376 | float temp = im.data[y + im.w*(x + im.h*c)]; 377 | im.data[y + im.w*(x + im.h*c)] = im.data[n-1-x + im.w*(y + im.h*c)]; 378 | im.data[n-1-x + im.w*(y + im.h*c)] = im.data[n-1-y + im.w*(n-1-x + im.h*c)]; 379 | im.data[n-1-y + im.w*(n-1-x + im.h*c)] = im.data[x + im.w*(n-1-y + im.h*c)]; 380 | im.data[x + im.w*(n-1-y + im.h*c)] = temp; 381 | } 382 | } 383 | } 384 | } 385 | } 386 | 387 | void flip_image(image a) 388 | { 389 | int i,j,k; 390 | for(k = 0; k < a.c; ++k){ 391 | for(i = 0; i < a.h; ++i){ 392 | for(j = 0; j < a.w/2; ++j){ 393 | int index = j + a.w*(i + a.h*(k)); 394 | int flip = (a.w - j - 1) + a.w*(i + a.h*(k)); 395 | float swap = a.data[flip]; 396 | a.data[flip] = a.data[index]; 397 | a.data[index] = swap; 398 | } 399 | } 400 | } 401 | } 402 | 403 | image image_distance(image a, image b) 404 | { 405 | int i,j; 406 | image dist = make_image(a.w, a.h, 1); 407 | for(i = 0; i < a.c; ++i){ 408 | for(j = 0; j < a.h*a.w; ++j){ 409 | dist.data[j] += pow(a.data[i*a.h*a.w+j]-b.data[i*a.h*a.w+j],2); 410 | } 411 | } 412 | for(j = 0; j < a.h*a.w; ++j){ 413 | dist.data[j] = sqrt(dist.data[j]); 414 | } 415 | return dist; 416 | } 417 | 418 | void ghost_image(image source, image dest, int dx, int dy) 419 | { 420 | int x,y,k; 421 | float max_dist = sqrt((-source.w/2. + .5)*(-source.w/2. + .5)); 422 | for(k = 0; k < source.c; ++k){ 423 | for(y = 0; y < source.h; ++y){ 424 | for(x = 0; x < source.w; ++x){ 425 | float dist = sqrt((x - source.w/2. + .5)*(x - source.w/2. + .5) + (y - source.h/2. + .5)*(y - source.h/2. 
+ .5)); 426 | float alpha = (1 - dist/max_dist); 427 | if(alpha < 0) alpha = 0; 428 | float v1 = get_pixel(source, x,y,k); 429 | float v2 = get_pixel(dest, dx+x,dy+y,k); 430 | float val = alpha*v1 + (1-alpha)*v2; 431 | set_pixel(dest, dx+x, dy+y, k, val); 432 | } 433 | } 434 | } 435 | } 436 | 437 | void embed_image(image source, image dest, int dx, int dy) 438 | { 439 | int x,y,k; 440 | for(k = 0; k < source.c; ++k){ 441 | for(y = 0; y < source.h; ++y){ 442 | for(x = 0; x < source.w; ++x){ 443 | float val = get_pixel(source, x,y,k); 444 | set_pixel(dest, dx+x, dy+y, k, val); 445 | } 446 | } 447 | } 448 | } 449 | 450 | image collapse_image_layers(image source, int border) 451 | { 452 | int h = source.h; 453 | h = (h+border)*source.c - border; 454 | image dest = make_image(source.w, h, 1); 455 | int i; 456 | for(i = 0; i < source.c; ++i){ 457 | image layer = get_image_layer(source, i); 458 | int h_offset = i*(source.h+border); 459 | embed_image(layer, dest, 0, h_offset); 460 | free_image(layer); 461 | } 462 | return dest; 463 | } 464 | 465 | void constrain_image(image im) 466 | { 467 | int i; 468 | for(i = 0; i < im.w*im.h*im.c; ++i){ 469 | if(im.data[i] < 0) im.data[i] = 0; 470 | if(im.data[i] > 1) im.data[i] = 1; 471 | } 472 | } 473 | 474 | void normalize_image(image p) 475 | { 476 | int i; 477 | float min = 9999999; 478 | float max = -999999; 479 | 480 | for(i = 0; i < p.h*p.w*p.c; ++i){ 481 | float v = p.data[i]; 482 | if(v < min) min = v; 483 | if(v > max) max = v; 484 | } 485 | if(max - min < .000000001){ 486 | min = 0; 487 | max = 1; 488 | } 489 | for(i = 0; i < p.c*p.w*p.h; ++i){ 490 | p.data[i] = (p.data[i] - min)/(max-min); 491 | } 492 | } 493 | 494 | void normalize_image2(image p) 495 | { 496 | float *min = calloc(p.c, sizeof(float)); 497 | float *max = calloc(p.c, sizeof(float)); 498 | int i,j; 499 | for(i = 0; i < p.c; ++i) min[i] = max[i] = p.data[i*p.h*p.w]; 500 | 501 | for(j = 0; j < p.c; ++j){ 502 | for(i = 0; i < p.h*p.w; ++i){ 503 | float v = 
p.data[i+j*p.h*p.w]; 504 | if(v < min[j]) min[j] = v; 505 | if(v > max[j]) max[j] = v; 506 | } 507 | } 508 | for(i = 0; i < p.c; ++i){ 509 | if(max[i] - min[i] < .000000001){ 510 | min[i] = 0; 511 | max[i] = 1; 512 | } 513 | } 514 | for(j = 0; j < p.c; ++j){ 515 | for(i = 0; i < p.w*p.h; ++i){ 516 | p.data[i+j*p.h*p.w] = (p.data[i+j*p.h*p.w] - min[j])/(max[j]-min[j]); 517 | } 518 | } 519 | free(min); 520 | free(max); 521 | } 522 | 523 | void copy_image_into(image src, image dest) 524 | { 525 | memcpy(dest.data, src.data, src.h*src.w*src.c*sizeof(float)); 526 | } 527 | 528 | image copy_image(image p) 529 | { 530 | image copy = p; 531 | copy.data = calloc(p.h*p.w*p.c, sizeof(float)); 532 | memcpy(copy.data, p.data, p.h*p.w*p.c*sizeof(float)); 533 | return copy; 534 | } 535 | 536 | void rgbgr_image(image im) 537 | { 538 | int i; 539 | for(i = 0; i < im.w*im.h; ++i){ 540 | float swap = im.data[i]; 541 | im.data[i] = im.data[i+im.w*im.h*2]; 542 | im.data[i+im.w*im.h*2] = swap; 543 | } 544 | } 545 | 546 | #ifdef OPENCV 547 | void show_image_cv(image p, const char *name, IplImage *disp) 548 | { 549 | int x,y,k; 550 | if(p.c == 3) rgbgr_image(p); 551 | //normalize_image(copy); 552 | 553 | char buff[256]; 554 | //sprintf(buff, "%s (%d)", name, windows); 555 | sprintf(buff, "%s", name); 556 | 557 | int step = disp->widthStep; 558 | cvNamedWindow(buff, CV_WINDOW_NORMAL); 559 | //cvMoveWindow(buff, 100*(windows%10) + 200*(windows/10), 100*(windows%10)); 560 | ++windows; 561 | for(y = 0; y < p.h; ++y){ 562 | for(x = 0; x < p.w; ++x){ 563 | for(k= 0; k < p.c; ++k){ 564 | disp->imageData[y*step + x*p.c + k] = (unsigned char)(get_pixel(p,x,y,k)*255); 565 | } 566 | } 567 | } 568 | if(0){ 569 | int w = 448; 570 | int h = w*p.h/p.w; 571 | if(h > 1000){ 572 | h = 1000; 573 | w = h*p.w/p.h; 574 | } 575 | IplImage *buffer = disp; 576 | disp = cvCreateImage(cvSize(w, h), buffer->depth, buffer->nChannels); 577 | cvResize(buffer, disp, CV_INTER_LINEAR); 578 | cvReleaseImage(&buffer); 579 | 
} 580 | cvShowImage(buff, disp); 581 | } 582 | #endif 583 | 584 | void show_image(image p, const char *name) 585 | { 586 | #ifdef OPENCV 587 | IplImage *disp = cvCreateImage(cvSize(p.w,p.h), IPL_DEPTH_8U, p.c); 588 | image copy = copy_image(p); 589 | constrain_image(copy); 590 | show_image_cv(copy, name, disp); 591 | free_image(copy); 592 | cvReleaseImage(&disp); 593 | #else 594 | fprintf(stderr, "Not compiled with OpenCV, saving to %s.png instead\n", name); 595 | save_image(p, name); 596 | #endif 597 | } 598 | 599 | #ifdef OPENCV 600 | 601 | void ipl_into_image(IplImage* src, image im) 602 | { 603 | unsigned char *data = (unsigned char *)src->imageData; 604 | int h = src->height; 605 | int w = src->width; 606 | int c = src->nChannels; 607 | int step = src->widthStep; 608 | int i, j, k; 609 | 610 | for(i = 0; i < h; ++i){ 611 | for(k= 0; k < c; ++k){ 612 | for(j = 0; j < w; ++j){ 613 | im.data[k*w*h + i*w + j] = data[i*step + j*c + k]/255.; 614 | } 615 | } 616 | } 617 | } 618 | 619 | image ipl_to_image(IplImage* src) 620 | { 621 | int h = src->height; 622 | int w = src->width; 623 | int c = src->nChannels; 624 | image out = make_image(w, h, c); 625 | ipl_into_image(src, out); 626 | return out; 627 | } 628 | 629 | image load_image_cv(char *filename, int channels) 630 | { 631 | IplImage* src = 0; 632 | int flag = -1; 633 | if (channels == 0) flag = -1; 634 | else if (channels == 1) flag = 0; 635 | else if (channels == 3) flag = 1; 636 | else { 637 | fprintf(stderr, "OpenCV can't force load with %d channels\n", channels); 638 | } 639 | 640 | if( (src = cvLoadImage(filename, flag)) == 0 ) 641 | { 642 | fprintf(stderr, "Cannot load image \"%s\"\n", filename); 643 | char buff[256]; 644 | sprintf(buff, "echo %s >> bad.list", filename); 645 | system(buff); 646 | return make_image(10,10,3); 647 | //exit(0); 648 | } 649 | image out = ipl_to_image(src); 650 | cvReleaseImage(&src); 651 | rgbgr_image(out); 652 | return out; 653 | } 654 | 655 | void flush_stream_buffer(CvCapture 
*cap, int n) 656 | { 657 | int i; 658 | for(i = 0; i < n; ++i) { 659 | cvQueryFrame(cap); 660 | } 661 | } 662 | 663 | image get_image_from_stream(CvCapture *cap) 664 | { 665 | IplImage* src = cvQueryFrame(cap); 666 | if (!src) return make_empty_image(0,0,0); 667 | image im = ipl_to_image(src); 668 | rgbgr_image(im); 669 | return im; 670 | } 671 | 672 | int fill_image_from_stream(CvCapture *cap, image im) 673 | { 674 | IplImage* src = cvQueryFrame(cap); 675 | if (!src) return 0; 676 | ipl_into_image(src, im); 677 | rgbgr_image(im); 678 | return 1; 679 | } 680 | 681 | void save_image_jpg(image p, const char *name) 682 | { 683 | image copy = copy_image(p); 684 | if(p.c == 3) rgbgr_image(copy); 685 | int x,y,k; 686 | 687 | char buff[256]; 688 | sprintf(buff, "%s.jpg", name); 689 | 690 | IplImage *disp = cvCreateImage(cvSize(p.w,p.h), IPL_DEPTH_8U, p.c); 691 | int step = disp->widthStep; 692 | for(y = 0; y < p.h; ++y){ 693 | for(x = 0; x < p.w; ++x){ 694 | for(k= 0; k < p.c; ++k){ 695 | disp->imageData[y*step + x*p.c + k] = (unsigned char)(get_pixel(copy,x,y,k)*255); 696 | } 697 | } 698 | } 699 | cvSaveImage(buff, disp,0); 700 | cvReleaseImage(&disp); 701 | free_image(copy); 702 | } 703 | #endif 704 | 705 | void save_image_png(image im, const char *name) 706 | { 707 | char buff[256]; 708 | //sprintf(buff, "%s (%d)", name, windows); 709 | sprintf(buff, "%s.png", name); 710 | unsigned char *data = calloc(im.w*im.h*im.c, sizeof(char)); 711 | int i,k; 712 | for(k = 0; k < im.c; ++k){ 713 | for(i = 0; i < im.w*im.h; ++i){ 714 | data[i*im.c+k] = (unsigned char) (255*im.data[i + k*im.w*im.h]); 715 | } 716 | } 717 | int success = stbi_write_png(buff, im.w, im.h, im.c, data, im.w*im.c); 718 | free(data); 719 | if(!success) fprintf(stderr, "Failed to write image %s\n", buff); 720 | } 721 | 722 | void save_image(image im, const char *name) 723 | { 724 | #ifdef OPENCV 725 | save_image_jpg(im, name); 726 | #else 727 | save_image_png(im, name); 728 | #endif 729 | } 730 | 731 | 732 | 
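Both `save_image_png` and `save_image_jpg` convert darknet's planar float layout (all of channel 0, then channel 1, and so on) into the interleaved 8-bit layout that image writers expect. The index mapping can be sketched on its own. This is an illustration under stated assumptions: `planar_to_interleaved` is a hypothetical name, and the clamp to [0, 1] is an addition that the functions above do not perform.

```c
#include <assert.h>

/* Hypothetical helper: convert a planar float image (index i + k*w*h for
   pixel i, channel k) into interleaved bytes (index i*c + k).
   Values are clamped to [0, 1] before scaling; the clamp is an assumption
   added here to keep the cast to unsigned char in range. */
void planar_to_interleaved(const float *src, int w, int h, int c,
                           unsigned char *dst)
{
    int i, k;
    assert(src && dst && w > 0 && h > 0 && c > 0);
    for (k = 0; k < c; ++k) {
        for (i = 0; i < w * h; ++i) {
            float v = src[i + k * w * h];
            if (v < 0.f) v = 0.f;
            if (v > 1.f) v = 1.f;
            dst[i * c + k] = (unsigned char)(255.f * v);
        }
    }
}
```

Calling `constrain_image` on a copy before saving has the same effect as the clamp, at the cost of an extra pass over the data.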
void show_image_layers(image p, char *name)
{
    int i;
    char buff[256];
    for(i = 0; i < p.c; ++i){
        // snprintf bounds the write so a long name cannot overflow buff.
        snprintf(buff, sizeof(buff), "%s - Layer %d", name, i);
        image layer = get_image_layer(p, i);
        show_image(layer, buff);
        free_image(layer);
    }
}

void show_image_collapsed(image p, char *name)
{
    image c = collapse_image_layers(p, 1);
    show_image(c, name);
    free_image(c);
}

image make_empty_image(int w, int h, int c)
{
    image out;
    out.data = 0;
    out.h = h;
    out.w = w;
    out.c = c;
    return out;
}

image make_image(int w, int h, int c)
{
    image out = make_empty_image(w,h,c);
    out.data = calloc(h*w*c, sizeof(float));
    return out;
}

image make_random_image(int w, int h, int c)
{
    image out = make_empty_image(w,h,c);
    out.data = calloc(h*w*c, sizeof(float));
    int i;
    for(i = 0; i < w*h*c; ++i){
        out.data[i] = (rand_normal() * .25) + .5;
    }
    return out;
}

image float_to_image(int w, int h, int c, float *data)
{
    image out = make_empty_image(w,h,c);
    out.data = data;
    return out;
}

void place_image(image im, int w, int h, int dx, int dy, image canvas)
{
    int x, y, c;
    for(c = 0; c < im.c; ++c){
        for(y = 0; y < h; ++y){
            for(x = 0; x < w; ++x){
                // rx and ry are truncated to int, so the interpolation
                // below samples whole-pixel source positions.
                int rx = ((float)x / w) * im.w;
                int ry = ((float)y / h) * im.h;
                float val = bilinear_interpolate(im, rx, ry, c);
                set_pixel(canvas, x + dx, y + dy, c, val);
            }
        }
    }
}

image center_crop_image(image im, int w, int h)
{
    int m = (im.w < im.h) ?
        im.w : im.h;
    image c = crop_image(im, (im.w - m) / 2, (im.h - m)/2, m, m);
    image r = resize_image(c, w, h);
    free_image(c);
    return r;
}

image rotate_crop_image(image im, float rad, float s, int w, int h, float dx, float dy, float aspect)
{
    int x, y, c;
    float cx = im.w/2.;
    float cy = im.h/2.;
    image rot = make_image(w, h, im.c);
    for(c = 0; c < im.c; ++c){
        for(y = 0; y < h; ++y){
            for(x = 0; x < w; ++x){
                float rx = cos(rad)*((x - w/2.)/s*aspect + dx/s*aspect) - sin(rad)*((y - h/2.)/s + dy/s) + cx;
                float ry = sin(rad)*((x - w/2.)/s*aspect + dx/s*aspect) + cos(rad)*((y - h/2.)/s + dy/s) + cy;
                float val = bilinear_interpolate(im, rx, ry, c);
                set_pixel(rot, x, y, c, val);
            }
        }
    }
    return rot;
}

image rotate_image(image im, float rad)
{
    int x, y, c;
    float cx = im.w/2.;
    float cy = im.h/2.;
    image rot = make_image(im.w, im.h, im.c);
    for(c = 0; c < im.c; ++c){
        for(y = 0; y < im.h; ++y){
            for(x = 0; x < im.w; ++x){
                float rx = cos(rad)*(x-cx) - sin(rad)*(y-cy) + cx;
                float ry = sin(rad)*(x-cx) + cos(rad)*(y-cy) + cy;
                float val = bilinear_interpolate(im, rx, ry, c);
                set_pixel(rot, x, y, c, val);
            }
        }
    }
    return rot;
}

void fill_image(image m, float s)
{
    int i;
    for(i = 0; i < m.h*m.w*m.c; ++i) m.data[i] = s;
}

void translate_image(image m, float s)
{
    int i;
    for(i = 0; i < m.h*m.w*m.c; ++i) m.data[i] += s;
}

void scale_image(image m, float s)
{
    int i;
    for(i = 0; i < m.h*m.w*m.c; ++i) m.data[i] *= s;
}

image crop_image(image im, int dx, int dy, int w, int h)
{
    image cropped = make_image(w, h, im.c);
    int i, j, k;
    for(k = 0; k < im.c; ++k){
        for(j = 0; j < h; ++j){
            for(i = 0; i < w; ++i){
                int r = j + dy;
                int c = i + dx;
                float val = 0;
                r = constrain_int(r, 0, im.h-1);
                c = constrain_int(c, 0, im.w-1);
                val = get_pixel(im, c, r, k);
                set_pixel(cropped, i, j, k, val);
            }
        }
    }
    return cropped;
}

int best_3d_shift_r(image a, image b, int min, int max)
{
    if(min == max) return min;
    int mid = floor((min + max) / 2.);
    image c1 = crop_image(b, 0, mid, b.w, b.h);
    image c2 = crop_image(b, 0, mid+1, b.w, b.h);
    float d1 = dist_array(c1.data, a.data, a.w*a.h*a.c, 10);
    float d2 = dist_array(c2.data, a.data, a.w*a.h*a.c, 10);
    free_image(c1);
    free_image(c2);
    if(d1 < d2) return best_3d_shift_r(a, b, min, mid);
    else return best_3d_shift_r(a, b, mid+1, max);
}

int best_3d_shift(image a, image b, int min, int max)
{
    int i;
    int best = 0;
    float best_distance = FLT_MAX;
    for(i = min; i <= max; i += 2){
        image c = crop_image(b, 0, i, b.w, b.h);
        float d = dist_array(c.data, a.data, a.w*a.h*a.c, 100);
        if(d < best_distance){
            best_distance = d;
            best = i;
        }
        printf("%d %f\n", i, d);
        free_image(c);
    }
    return best;
}

void composite_3d(char *f1, char *f2, char *out, int delta)
{
    if(!out) out = "out";
    image a = load_image(f1, 0,0,0);
    image b = load_image(f2, 0,0,0);
    int shift = best_3d_shift_r(a, b, -a.h/100, a.h/100);

    image c1 = crop_image(b, 10, shift, b.w, b.h);
    float d1 = dist_array(c1.data, a.data, a.w*a.h*a.c, 100);
    image c2 = crop_image(b, -10, shift, b.w, b.h);
    float d2 = dist_array(c2.data, a.data, a.w*a.h*a.c, 100);
    // c1 and c2 are only needed for the two distances above.
    free_image(c1);
    free_image(c2);

    if(d2 < d1 && 0){
        image swap = a;
        a = b;
        b = swap;
        shift = -shift;
        printf("swapped, %d\n", shift);
    }
    else{
        printf("%d\n", shift);
    }

    image c = crop_image(b, delta,
                         shift, a.w, a.h);
    int i;
    for(i = 0; i < c.w*c.h; ++i){
        c.data[i] = a.data[i];
    }
#ifdef OPENCV
    save_image_jpg(c, out);
#else
    save_image(c, out);
#endif
    // Release the images loaded and cropped in this function.
    free_image(a);
    free_image(b);
    free_image(c);
}

void letterbox_image_into(image im, int w, int h, image boxed)
{
    int new_w = im.w;
    int new_h = im.h;
    if (((float)w/im.w) < ((float)h/im.h)) {
        new_w = w;
        new_h = (im.h * w)/im.w;
    } else {
        new_h = h;
        new_w = (im.w * h)/im.h;
    }
    image resized = resize_image(im, new_w, new_h);
    embed_image(resized, boxed, (w-new_w)/2, (h-new_h)/2);
    free_image(resized);
}

image letterbox_image(image im, int w, int h)
{
    int new_w = im.w;
    int new_h = im.h;
    if (((float)w/im.w) < ((float)h/im.h)) {
        new_w = w;
        new_h = (im.h * w)/im.w;
    } else {
        new_h = h;
        new_w = (im.w * h)/im.h;
    }
    image resized = resize_image(im, new_w, new_h);
    image boxed = make_image(w, h, im.c);
    fill_image(boxed, .5);
    //int i;
    //for(i = 0; i < boxed.w*boxed.h*boxed.c; ++i) boxed.data[i] = 0;
    embed_image(resized, boxed, (w-new_w)/2, (h-new_h)/2);
    free_image(resized);
    return boxed;
}

// Note: returns im itself (not a copy) when no resize is needed.
image resize_max(image im, int max)
{
    int w = im.w;
    int h = im.h;
    if(w > h){
        h = (h * max) / w;
        w = max;
    } else {
        w = (w * max) / h;
        h = max;
    }
    if(w == im.w && h == im.h) return im;
    image resized = resize_image(im, w, h);
    return resized;
}

// Note: returns im itself (not a copy) when no resize is needed.
image resize_min(image im, int min)
{
    int w = im.w;
    int h = im.h;
    if(w < h){
        h = (h * min) / w;
        w = min;
    } else {
        w = (w * min) / h;
        h = min;
    }
    if(w == im.w && h == im.h) return im;
    image resized = resize_image(im, w, h);
    return resized;
}

image
random_crop_image(image im, int w, int h)
{
    int dx = rand_int(0, im.w - w);
    int dy = rand_int(0, im.h - h);
    image crop = crop_image(im, dx, dy, w, h);
    return crop;
}

augment_args random_augment_args(image im, float angle, float aspect, int low, int high, int w, int h)
{
    augment_args a = {0};
    aspect = rand_scale(aspect);
    int r = rand_int(low, high);
    int min = (im.h < im.w*aspect) ? im.h : im.w*aspect;
    float scale = (float)r / min;

    float rad = rand_uniform(-angle, angle) * TWO_PI / 360.;

    float dx = (im.w*scale/aspect - w) / 2.;
    // Vertical slack is measured against the target height h
    // (the original subtracted w, which only matches when w == h).
    float dy = (im.h*scale - h) / 2.;
    //if(dx < 0) dx = 0;
    //if(dy < 0) dy = 0;
    dx = rand_uniform(-dx, dx);
    dy = rand_uniform(-dy, dy);

    a.rad = rad;
    a.scale = scale;
    a.w = w;
    a.h = h;
    a.dx = dx;
    a.dy = dy;
    a.aspect = aspect;
    return a;
}

image random_augment_image(image im, float angle, float aspect, int low, int high, int w, int h)
{
    augment_args a = random_augment_args(im, angle, aspect, low, high, w, h);
    image crop = rotate_crop_image(im, a.rad, a.scale, a.w, a.h, a.dx, a.dy, a.aspect);
    return crop;
}

float three_way_max(float a, float b, float c)
{
    return (a > b) ? ( (a > c) ? a : c) : ( (b > c) ? b : c) ;
}

float three_way_min(float a, float b, float c)
{
    return (a < b) ? ( (a < c) ? a : c) : ( (b < c) ?
        b : c) ;
}

void yuv_to_rgb(image im)
{
    assert(im.c == 3);
    int i, j;
    float r, g, b;
    float y, u, v;
    for(j = 0; j < im.h; ++j){
        for(i = 0; i < im.w; ++i){
            y = get_pixel(im, i , j, 0);
            u = get_pixel(im, i , j, 1);
            v = get_pixel(im, i , j, 2);

            r = y + 1.13983*v;
            g = y + -.39465*u + -.58060*v;
            b = y + 2.03211*u;

            set_pixel(im, i, j, 0, r);
            set_pixel(im, i, j, 1, g);
            set_pixel(im, i, j, 2, b);
        }
    }
}

void rgb_to_yuv(image im)
{
    assert(im.c == 3);
    int i, j;
    float r, g, b;
    float y, u, v;
    for(j = 0; j < im.h; ++j){
        for(i = 0; i < im.w; ++i){
            r = get_pixel(im, i , j, 0);
            g = get_pixel(im, i , j, 1);
            b = get_pixel(im, i , j, 2);

            y = .299*r + .587*g + .114*b;
            u = -.14713*r + -.28886*g + .436*b;
            v = .615*r + -.51499*g + -.10001*b;

            set_pixel(im, i, j, 0, y);
            set_pixel(im, i, j, 1, u);
            set_pixel(im, i, j, 2, v);
        }
    }
}

// http://www.cs.rit.edu/~ncs/color/t_convert.html
void rgb_to_hsv(image im)
{
    assert(im.c == 3);
    int i, j;
    float r, g, b;
    float h, s, v;
    for(j = 0; j < im.h; ++j){
        for(i = 0; i < im.w; ++i){
            r = get_pixel(im, i , j, 0);
            g = get_pixel(im, i , j, 1);
            b = get_pixel(im, i , j, 2);
            float max = three_way_max(r,g,b);
            float min = three_way_min(r,g,b);
            float delta = max - min;
            v = max;
            if(max == 0 || delta == 0){
                // Black or gray pixel: hue is undefined, and dividing by
                // delta == 0 would store NaN in the hue channel.
                s = 0;
                h = 0;
            }else{
                s = delta/max;
                if(r == max){
                    h = (g - b) / delta;
                } else if (g == max) {
                    h = 2 + (b - r) / delta;
                } else {
                    h = 4 + (r - g) / delta;
                }
                if (h < 0) h += 6;
                h = h/6.;
            }
            set_pixel(im, i, j, 0, h);
            set_pixel(im, i, j, 1, s);
            set_pixel(im, i, j, 2, v);
        }
    }
}

void hsv_to_rgb(image im)
{
    assert(im.c == 3);
    int i, j;
    float r, g, b;
    float h, s, v;
    float f, p, q, t;
    for(j = 0; j < im.h; ++j){
        for(i = 0; i < im.w; ++i){
            h = 6 * get_pixel(im, i , j, 0);
            s = get_pixel(im, i , j, 1);
            v = get_pixel(im, i , j, 2);
            if (s == 0) {
                r = g = b = v;
            } else {
                int index = floor(h);
                f = h - index;
                p = v*(1-s);
                q = v*(1-s*f);
                t = v*(1-s*(1-f));
                if(index == 0){
                    r = v; g = t; b = p;
                } else if(index == 1){
                    r = q; g = v; b = p;
                } else if(index == 2){
                    r = p; g = v; b = t;
                } else if(index == 3){
                    r = p; g = q; b = v;
                } else if(index == 4){
                    r = t; g = p; b = v;
                } else {
                    r = v; g = p; b = q;
                }
            }
            set_pixel(im, i, j, 0, r);
            set_pixel(im, i, j, 1, g);
            set_pixel(im, i, j, 2, b);
        }
    }
}

void grayscale_image_3c(image im)
{
    assert(im.c == 3);
    int i, j, k;
    float scale[] = {0.299, 0.587, 0.114};
    for(j = 0; j < im.h; ++j){
        for(i = 0; i < im.w; ++i){
            float val = 0;
            for(k = 0; k < 3; ++k){
                val += scale[k]*get_pixel(im, i, j, k);
            }
            im.data[0*im.h*im.w + im.w*j + i] = val;
            im.data[1*im.h*im.w + im.w*j + i] = val;
            im.data[2*im.h*im.w + im.w*j + i] = val;
        }
    }
}

image grayscale_image(image im)
{
    assert(im.c == 3);
    int i, j, k;
    image gray = make_image(im.w, im.h, 1);
    float scale[] = {0.299, 0.587, 0.114};
    for(k = 0; k < im.c; ++k){
        for(j = 0; j < im.h; ++j){
            for(i = 0; i < im.w; ++i){
                gray.data[i+im.w*j] += scale[k]*get_pixel(im, i, j, k);
            }
        }
    }
    return gray;
}

image threshold_image(image im, float thresh)
{
    int i;
    image t = make_image(im.w, im.h, im.c);
    for(i = 0; i < im.w*im.h*im.c; ++i){
        t.data[i] = im.data[i]>thresh ? 1 : 0;
    }
    return t;
}

image blend_image(image fore, image back, float alpha)
{
    assert(fore.w == back.w && fore.h == back.h && fore.c == back.c);
    image blend = make_image(fore.w, fore.h, fore.c);
    int i, j, k;
    for(k = 0; k < fore.c; ++k){
        for(j = 0; j < fore.h; ++j){
            for(i = 0; i < fore.w; ++i){
                float val = alpha * get_pixel(fore, i, j, k) +
                    (1 - alpha)* get_pixel(back, i, j, k);
                set_pixel(blend, i, j, k, val);
            }
        }
    }
    return blend;
}

void scale_image_channel(image im, int c, float v)
{
    int i, j;
    for(j = 0; j < im.h; ++j){
        for(i = 0; i < im.w; ++i){
            float pix = get_pixel(im, i, j, c);
            pix = pix*v;
            set_pixel(im, i, j, c, pix);
        }
    }
}

void translate_image_channel(image im, int c, float v)
{
    int i, j;
    for(j = 0; j < im.h; ++j){
        for(i = 0; i < im.w; ++i){
            float pix = get_pixel(im, i, j, c);
            pix = pix+v;
            set_pixel(im, i, j, c, pix);
        }
    }
}

image binarize_image(image im)
{
    image c = copy_image(im);
    int i;
    for(i = 0; i < im.w * im.h * im.c; ++i){
        if(c.data[i] > .5) c.data[i] = 1;
        else c.data[i] = 0;
    }
    return c;
}

void saturate_image(image im, float sat)
{
    rgb_to_hsv(im);
    scale_image_channel(im, 1, sat);
    hsv_to_rgb(im);
    constrain_image(im);
}

void hue_image(image im, float hue)
{
    rgb_to_hsv(im);
    int i;
    for(i = 0; i < im.w*im.h; ++i){
        im.data[i] = im.data[i] + hue;
        if (im.data[i] > 1)
            im.data[i] -= 1;
        if (im.data[i] < 0) im.data[i] += 1;
    }
    hsv_to_rgb(im);
    constrain_image(im);
}

void exposure_image(image im, float sat)
{
    rgb_to_hsv(im);
    scale_image_channel(im, 2, sat);
    hsv_to_rgb(im);
    constrain_image(im);
}

void distort_image(image im, float hue, float sat, float val)
{
    rgb_to_hsv(im);
    scale_image_channel(im, 1, sat);
    scale_image_channel(im, 2, val);
    int i;
    for(i = 0; i < im.w*im.h; ++i){
        im.data[i] = im.data[i] + hue;
        if (im.data[i] > 1) im.data[i] -= 1;
        if (im.data[i] < 0) im.data[i] += 1;
    }
    hsv_to_rgb(im);
    constrain_image(im);
}

void random_distort_image(image im, float hue, float saturation, float exposure)
{
    float dhue = rand_uniform(-hue, hue);
    float dsat = rand_scale(saturation);
    float dexp = rand_scale(exposure);
    distort_image(im, dhue, dsat, dexp);
}

void saturate_exposure_image(image im, float sat, float exposure)
{
    rgb_to_hsv(im);
    scale_image_channel(im, 1, sat);
    scale_image_channel(im, 2, exposure);
    hsv_to_rgb(im);
    constrain_image(im);
}

image resize_image(image im, int w, int h)
{
    image resized = make_image(w, h, im.c);
    image part = make_image(w, im.h, im.c);
    int r, c, k;
    float w_scale = (float)(im.w - 1) / (w - 1);
    float h_scale = (float)(im.h - 1) / (h - 1);
    for(k = 0; k < im.c; ++k){
        for(r = 0; r < im.h; ++r){
            for(c = 0; c < w; ++c){
                float val = 0;
                if(c == w-1 || im.w == 1){
                    val = get_pixel(im, im.w-1, r, k);
                } else {
                    float sx = c*w_scale;
                    int ix = (int) sx;
                    float dx = sx - ix;
                    val = (1 - dx) * get_pixel(im, ix, r, k) + dx * get_pixel(im, ix+1, r, k);
                }
                set_pixel(part, c, r, k,
                          val);
            }
        }
    }
    for(k = 0; k < im.c; ++k){
        for(r = 0; r < h; ++r){
            float sy = r*h_scale;
            int iy = (int) sy;
            float dy = sy - iy;
            for(c = 0; c < w; ++c){
                float val = (1-dy) * get_pixel(part, c, iy, k);
                set_pixel(resized, c, r, k, val);
            }
            if(r == h-1 || im.h == 1) continue;
            for(c = 0; c < w; ++c){
                float val = dy * get_pixel(part, c, iy+1, k);
                add_pixel(resized, c, r, k, val);
            }
        }
    }

    free_image(part);
    return resized;
}


void test_resize(char *filename)
{
    image im = load_image(filename, 0,0, 3);
    float mag = mag_array(im.data, im.w*im.h*im.c);
    printf("L2 Norm: %f\n", mag);
    image gray = grayscale_image(im);

    image c1 = copy_image(im);
    image c2 = copy_image(im);
    image c3 = copy_image(im);
    image c4 = copy_image(im);
    distort_image(c1, .1, 1.5, 1.5);
    distort_image(c2, -.1, .66666, .66666);
    distort_image(c3, .1, 1.5, .66666);
    distort_image(c4, .1, .66666, 1.5);


    show_image(im, "Original");
    show_image(gray, "Gray");
    show_image(c1, "C1");
    show_image(c2, "C2");
    show_image(c3, "C3");
    show_image(c4, "C4");
#ifdef OPENCV
    while(1){
        image aug = random_augment_image(im, 0, .75, 320, 448, 320, 320);
        show_image(aug, "aug");
        free_image(aug);


        float exposure = 1.15;
        float saturation = 1.15;
        float hue = .05;

        image c = copy_image(im);

        float dexp = rand_scale(exposure);
        float dsat = rand_scale(saturation);
        float dhue = rand_uniform(-hue, hue);

        distort_image(c, dhue, dsat, dexp);
        show_image(c, "rand");
        printf("%f %f %f\n", dhue, dsat, dexp);
        free_image(c);
        cvWaitKey(0);
    }
#endif
}

image load_image_stb(char *filename, int channels)
{
    int w, h, c;
    unsigned char *data = stbi_load(filename, &w, &h, &c, channels);
    if (!data) {
        fprintf(stderr, "Cannot load image \"%s\"\nSTB Reason: %s\n", filename, stbi_failure_reason());
        exit(0);
    }
    if(channels) c = channels;
    int i,j,k;
    image im = make_image(w, h, c);
    for(k = 0; k < c; ++k){
        for(j = 0; j < h; ++j){
            for(i = 0; i < w; ++i){
                int dst_index = i + w*j + w*h*k;
                int src_index = k + c*i + c*w*j;
                im.data[dst_index] = (float)data[src_index]/255.;
            }
        }
    }
    free(data);
    return im;
}

image load_image(char *filename, int w, int h, int c)
{
#ifdef OPENCV
    image out = load_image_cv(filename, c);
#else
    image out = load_image_stb(filename, c);
#endif

    if((h && w) && (h != out.h || w != out.w)){
        image resized = resize_image(out, w, h);
        free_image(out);
        out = resized;
    }
    return out;
}

image load_image_color(char *filename, int w, int h)
{
    return load_image(filename, w, h, 3);
}

image get_image_layer(image m, int l)
{
    image out = make_image(m.w, m.h, 1);
    int i;
    for(i = 0; i < m.h*m.w; ++i){
        out.data[i] = m.data[i+l*m.h*m.w];
    }
    return out;
}
void print_image(image m)
{
    int i, j, k;
    for(i =0 ; i < m.c; ++i){
        for(j =0 ; j < m.h; ++j){
            for(k = 0; k < m.w; ++k){
                printf("%.2lf, ", m.data[i*m.h*m.w + j*m.w + k]);
                if(k > 30) break;
            }
            printf("\n");
            if(j > 30) break;
        }
        printf("\n");
    }
    printf("\n");
}

image collapse_images_vert(image *ims, int n)
{
    int color = 1;
    int border = 1;
    int h,w,c;
    w = ims[0].w;
    h =
        (ims[0].h + border) * n - border;
    c = ims[0].c;
    if(c != 3 || !color){
        w = (w+border)*c - border;
        c = 1;
    }

    image filters = make_image(w, h, c);
    int i,j;
    for(i = 0; i < n; ++i){
        int h_offset = i*(ims[0].h+border);
        image copy = copy_image(ims[i]);
        //normalize_image(copy);
        if(c == 3 && color){
            embed_image(copy, filters, 0, h_offset);
        }
        else{
            for(j = 0; j < copy.c; ++j){
                int w_offset = j*(ims[0].w+border);
                image layer = get_image_layer(copy, j);
                embed_image(layer, filters, w_offset, h_offset);
                free_image(layer);
            }
        }
        free_image(copy);
    }
    return filters;
}

image collapse_images_horz(image *ims, int n)
{
    int color = 1;
    int border = 1;
    int h,w,c;
    int size = ims[0].h;
    h = size;
    w = (ims[0].w + border) * n - border;
    c = ims[0].c;
    if(c != 3 || !color){
        h = (h+border)*c - border;
        c = 1;
    }

    image filters = make_image(w, h, c);
    int i,j;
    for(i = 0; i < n; ++i){
        int w_offset = i*(size+border);
        image copy = copy_image(ims[i]);
        //normalize_image(copy);
        if(c == 3 && color){
            embed_image(copy, filters, w_offset, 0);
        }
        else{
            for(j = 0; j < copy.c; ++j){
                int h_offset = j*(size+border);
                image layer = get_image_layer(copy, j);
                embed_image(layer, filters, w_offset, h_offset);
                free_image(layer);
            }
        }
        free_image(copy);
    }
    return filters;
}

void show_image_normalized(image im, const char *name)
{
    image c = copy_image(im);
    normalize_image(c);
    show_image(c, name);
    free_image(c);
}

void show_images(image *ims, int n, char *window)
{
    image m = collapse_images_vert(ims, n);
    /*
       int w = 448;
       int h = ((float)m.h/m.w) * 448;
       if(h > 896){
       h = 896;
       w = ((float)m.w/m.h) * 896;
       }
       image sized = resize_image(m, w, h);
     */
    normalize_image(m);
    save_image(m, window);
    show_image(m, window);
    free_image(m);
}

void free_image(image m)
{
    if(m.data){
        free(m.data);
    }
}
--------------------------------------------------------------------------------
/yolo/image.h:
--------------------------------------------------------------------------------
#ifndef IMAGE_H
#define IMAGE_H

#include
#include
#include
#include
#include
#include "box.h"
#include "darknet.h"

#ifndef __cplusplus
#ifdef OPENCV
int fill_image_from_stream(CvCapture *cap, image im);
image ipl_to_image(IplImage* src);
void ipl_into_image(IplImage* src, image im);
void flush_stream_buffer(CvCapture *cap, int n);
void show_image_cv(image p, const char *name, IplImage *disp);
#endif
#endif

float get_color(int c, int x, int max);
void draw_box(image a, int x1, int y1, int x2, int y2, float r, float g, float b);
void draw_bbox(image a, box bbox, int w, float r, float g, float b);
void draw_label(image a, int r, int c, image label, const float *rgb);
void write_label(image a, int r, int c, image *characters, char *string, float *rgb);
image image_distance(image a, image b);
void scale_image(image m, float s);
image rotate_crop_image(image im, float rad, float s, int w, int h, float dx, float dy, float aspect);
image center_crop_image(image im, int w, int h);
image random_crop_image(image im, int w, int h);
image random_augment_image(image im, float angle, float aspect, int low, int high, int w, int h);
augment_args random_augment_args(image im, float angle, float aspect, int low, int high, int w, int h);
void
    letterbox_image_into(image im, int w, int h, image boxed);
image resize_max(image im, int max);
void translate_image(image m, float s);
void embed_image(image source, image dest, int dx, int dy);
void place_image(image im, int w, int h, int dx, int dy, image canvas);
void saturate_image(image im, float sat);
void exposure_image(image im, float sat);
void distort_image(image im, float hue, float sat, float val);
void saturate_exposure_image(image im, float sat, float exposure);
void rgb_to_hsv(image im);
void hsv_to_rgb(image im);
void yuv_to_rgb(image im);
void rgb_to_yuv(image im);


image collapse_image_layers(image source, int border);
image collapse_images_horz(image *ims, int n);
image collapse_images_vert(image *ims, int n);

void show_image_normalized(image im, const char *name);
void show_images(image *ims, int n, char *window);
void show_image_layers(image p, char *name);
void show_image_collapsed(image p, char *name);

void print_image(image m);

image make_empty_image(int w, int h, int c);
void copy_image_into(image src, image dest);

image get_image_layer(image m, int l);

#endif

--------------------------------------------------------------------------------
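
The aspect-preserving fit performed by `letterbox_image` and `letterbox_image_into` in yolo/image.c can be looked at separately from the pixel work. The following is a minimal sketch of that arithmetic only, assuming positive dimensions. The `letterbox_fit` struct and the `compute_letterbox_fit` function are hypothetical names introduced here for illustration and are not part of darknet or of this repository.

```c
#include <assert.h>

/* Hypothetical helper type (not part of darknet): the size of the
 * resized image and the offset at which it is embedded in the
 * w x h canvas. */
typedef struct {
    int new_w, new_h;
    int dx, dy;
} letterbox_fit;

/* Mirrors the arithmetic in letterbox_image(): scale by the smaller of
 * w/src_w and h/src_h so the whole source fits, then center it. */
letterbox_fit compute_letterbox_fit(int src_w, int src_h, int w, int h)
{
    letterbox_fit f;
    assert(src_w > 0 && src_h > 0);
    if (((float)w / src_w) < ((float)h / src_h)) {
        /* Width is the limiting dimension. */
        f.new_w = w;
        f.new_h = (src_h * w) / src_w;
    } else {
        /* Height is the limiting dimension. */
        f.new_h = h;
        f.new_w = (src_w * h) / src_h;
    }
    f.dx = (w - f.new_w) / 2;
    f.dy = (h - f.new_h) / 2;
    return f;
}
```

For a 640x480 frame letterboxed into a 416x416 canvas, `compute_letterbox_fit(640, 480, 416, 416)` gives a 416x312 resize embedded at offset (0, 52). In `letterbox_image` the rest of the canvas keeps the 0.5 gray written by `fill_image(boxed, .5)`.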