├── README.md
├── data_model
    └── ReadMe.txt
├── raw_data
    ├── README.md
    ├── create_h5_dataset.py
    ├── get_data_txt.py
    └── image_process.py
├── train
    ├── predict_resvgg.py
    ├── read_data.py
    ├── resvgg_model.py
    └── train_resvgg.py
└── yolo
    ├── ReadMe.txt
    ├── demo.c
    ├── demo.h
    ├── image.c
    └── image.h


/README.md:
--------------------------------------------------------------------------------
 1 | # pig_face
 2 | This repository is used to save the code for a competition
 3 | 
 4 | If you have any questions about the description below, please contact us.
 5 | Email: xuxcong@gmail.com , jiexin_zheng@qq.com
 6 | 
 7 | ## 1. Environment
 8 | 
 9 | Ubuntu 16.04  python 2.7.12  cuda8.0  cudnn6.0  tensorflow 1.3.0
10 | 
11 | GPU 4*TITAN XP
12 | 
13 | 
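14 | ## 2. Extracting the pigs from the videos
15 | 
16 | (1) To keep background data from affecting the model, we use the yolo-9000 algorithm to extract the pig from every frame of each video. The code comes from https://github.com/philipperemy/yolo-9000.
17 | We modified that code. To apply our changes, unpack the yolo archive and overwrite the files of the same name under darknet/src.
18 | 
19 | (2) On inspection, yolo-9000 does not always assign the pigs to the hog class, but nearly every box is centered on the pig in the video. So when selecting boxes we do not restrict the output images to hog-class boxes; we use the confidence score as the criterion instead.
20 | 
21 | (3) We keep every window whose confidence is greater than 0.1.
22 | 
23 | (4) Each video yields a little over ten thousand ROI images. We sort them by size, take roughly the first 4000, and remove images of unrelated objects and images with heavy background interference (for example, boxes that miss the pig or cover only a very small part of it). The remaining images form the training and validation sets.
24 | 
25 | (5) This leaves 94677 images in total.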
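
Steps (2) to (4) can be sketched as follows. This is a minimal illustration, not the modified darknet code: the `Box` record, the `select_rois` helper, and the reading of "size" as box area sorted largest-first are assumptions made for the example. Only the 0.1 confidence threshold and the cap of roughly 4000 images come from the description above.

```python
from collections import namedtuple

# Hypothetical detection record: class label, confidence, box width/height in pixels.
Box = namedtuple("Box", ["label", "confidence", "width", "height"])

def select_rois(boxes, min_confidence=0.1, keep=4000):
    """Keep boxes above the confidence threshold, ignoring the class label,
    then return the `keep` largest by area (assumed meaning of "size")."""
    confident = [b for b in boxes if b.confidence > min_confidence]
    confident.sort(key=lambda b: b.width * b.height, reverse=True)
    return confident[:keep]

boxes = [
    Box("hog", 0.62, 300, 200),
    Box("dog", 0.35, 400, 260),   # wrong class, but may still frame the pig
    Box("hog", 0.05, 500, 400),   # dropped: confidence too low
]
selected = select_rois(boxes, keep=2)
```

The manual cleanup described in step (4), removing unrelated objects and badly framed boxes, is not something this sketch attempts.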
26 | 
27 | 
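28 | ## 3. Preprocessing and dataset generation
29 | 
30 | (1) Run raw_data/image_process.py to pad the images from the previous step into squares, so that later resize operations do not distort them.
31 | 
32 | (2) Run raw_data/get_data_txt.py to split the data into training and validation sets and spread each set across 50 txt list files, so that the large dataset can later be read in stages.
33 | 
34 | (3) Run raw_data/create_h5_dataset.py to convert the data to h5 files. This step produces 50 .h5 files holding the training set and 50 .h5 files holding the validation set.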
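
The split in step (2) can be sketched as a pure function. This mirrors the logic of raw_data/get_data_txt.py (every fifth image goes to validation, and each set is dealt round-robin over the shards), but the `shard_paths` name and the in-memory lists are assumptions for illustration; the real script writes txt files and renames images on disk.

```python
def shard_paths(samples, num_shards=50, val_every=5):
    """Split (path, label) pairs into train/validation shards.

    Every `val_every`-th sample goes to validation; each split is then
    distributed round-robin over `num_shards` lists.
    """
    train = [[] for _ in range(num_shards)]
    val = [[] for _ in range(num_shards)]
    n_train = n_val = 0
    for count, sample in enumerate(samples, start=1):
        if count % val_every == 0:
            val[n_val % num_shards].append(sample)
            n_val += 1
        else:
            train[n_train % num_shards].append(sample)
            n_train += 1
    return train, val

# Toy input: 20 hypothetical image paths with labels in 1..30.
samples = [("img%d.jpg" % i, 1 + i % 30) for i in range(20)]
train, val = shard_paths(samples, num_shards=2)
```

Dealing samples round-robin keeps every shard close to the same size and class mix, which matters when the shards are later loaded one at a time.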
35 | 
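36 | ## 4. Model
37 | 
38 | (1) The model is an improvement on bilinear CNN, a fine-grained recognition model. The reference source code is https://github.com/abhaydoke09/Bilinear-CNN-TensorFlow
39 | Reference paper: vis-www.cs.umass.edu/bcnn/docs/bcnn_iccv15.pdf
40 | Bilinear CNN is an end-to-end network. On the CUB200-2011 dataset it achieved the best classification accuracy among weakly supervised fine-grained classification models.
41 | 
42 | (2) Bilinear CNN takes the outer product of the output of the last convolutional layer with itself (in practice computed as an inner product, i.e. a matrix multiplication) in order to fuse different features.
43 | 
44 | (3) Inspired by the resnet structure, our team improved the bilinear CNN algorithm: the output of the last convolutional layer is also combined by inner product with the outputs of earlier convolutional layers, which fuses features from different levels. The resulting vectors are then merged with the original bilinear vector. We added the inner products of conv4_1 and conv5_1 with conv5_3 (only these two layers, because their filter counts match that of conv5_3, so after pooling the inner product can be taken without any extra convolution kernels).
45 | Our idea is that different convolutional layers attend to different features and have receptive fields of different sizes (that is, they sit at higher or lower levels). When recognizing similar images, considering features in isolation is not enough; the spatial relationships between them must be considered as well.
46 | 
47 | (4) Load the pretrained vgg model, train the fully connected layer first, and then train the whole network. The pretrained weights can be downloaded from https://www.cs.toronto.edu/~frossard/post/vgg16/
48 | 
49 | (5) Real-time data augmentation is applied during training: rotation, random contrast changes, random brightness changes, and random crop. During training the dropout probability of the fully connected layer is 0.5.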
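
The feature fusion in (2) and (3) can be sketched in NumPy. The arithmetic follows `get_bilinear_fc` in train/resvgg_model.py (matrix product over spatial positions, division by the number of positions, signed square root, L2 normalization), but the `bilinear_vector` helper and the tiny 4x4x8 toy feature maps are assumptions for illustration; the real feature maps are 28x28x512.

```python
import numpy as np

def bilinear_vector(feat_a, feat_b):
    """Bilinear pooling of two feature maps of shape (H, W, C).

    Sums the outer product of the two channel vectors over all spatial
    positions (one matrix product), averages, then applies a signed
    square root and L2 normalization.
    """
    h, w, c = feat_a.shape
    a = feat_a.reshape(h * w, c)           # (positions, channels)
    b = feat_b.reshape(h * w, c)
    phi = a.T.dot(b) / float(h * w)        # (C, C): averaged outer products
    phi = phi.reshape(-1)
    phi = np.sign(phi) * np.sqrt(np.abs(phi) + 1e-12)
    return phi / np.linalg.norm(phi)

rng = np.random.RandomState(0)
# Toy stand-ins for conv5_3, conv5_1 and the pooled conv4_1 output.
conv5_3, conv5_1, pool4_1 = (rng.rand(4, 4, 8) for _ in range(3))

final_z = np.concatenate([
    bilinear_vector(conv5_3, conv5_3),     # original bilinear vector
    bilinear_vector(conv5_3, conv5_1),     # cross-layer terms
    bilinear_vector(conv5_3, pool4_1),
])
```

With 512 channels each term has 512*512 entries, so the concatenated vector has 3*512*512 = 786432 entries, which matches the input size of the final fully connected layer in resvgg_model.py.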
50 | 
51 | 
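52 | ## 4. Structure
53 | 
54 | (1) train/read_data.py reads the data and loads the large dataset in stages.
55 | 
56 | (2) train/resvgg_model.py defines the network structure and the methods for loading saved weights.
57 | 
58 | (3) train/train_resvgg.py defines the training procedure.
59 | 
60 | (4) train/predict_resvgg.py outputs the prediction results.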
61 | 
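62 | ## 5. Loading the pretrained model and fine-tuning
63 | 
64 | (1) When constructing the resvgg model, set finetune=False so that only the final fully connected layer is trained, and call load_initial_weights(sess) to load the convolutional-layer parameters of the pretrained vgg.
65 | 
66 | (2) Training setup: optimizer = tf.train.MomentumOptimizer(learning_rate=0.2, momentum=0.5).minimize(loss), trained for 50 rounds.
67 | 
68 | (3) Save the best model obtained during training.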
69 | 
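70 | ## 6. Training the whole network
71 | 
72 | (1) When constructing the resvgg model, set finetune=True. Call load_own_weight(sess , model_path) to load the model obtained in the previous step.
73 | 
74 | (2) Training setup: optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.001).minimize(loss), trained for 200 rounds.
75 | 
76 | (3) Save the best model obtained during training.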
77 | 
78 | 
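79 | ## 7. Later adjustments
80 | 
81 | In practice, only the first run trains on all the data for the full 200 rounds. Once that model has been saved, later tuning continues training on only about 1/4 of the data.
82 | 
83 | ## 8. Prediction
84 | 
85 | (1) Run predict_resvgg.py to produce the predictions.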
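
Before writing each row of the CSV, train/predict_resvgg.py applies a softmax to the logits and then shrinks and offsets every probability. The sketch below reproduces that arithmetic in NumPy. The `smooth` helper name and the toy logits are assumptions for illustration, and the source does not state why the adjustment is made; a plausible reading is that it keeps any class from receiving a probability of exactly zero.

```python
import numpy as np

def softmax(logits):
    shifted = logits - np.max(logits)      # subtract the max for numerical stability
    exp = np.exp(shifted)
    return exp / np.sum(exp)

def smooth(probs, scale=0.96, floor=0.001333):
    """Shrink each probability and add a small floor, as predict_resvgg.py does."""
    clipped = np.maximum(np.round(probs, 7) - 0.000001, 0.0)
    return clipped * scale + floor

logits = np.zeros(30)
logits[3] = 20.0                           # a very confident toy prediction
row = smooth(softmax(logits))
```

Since 0.96 + 30 * 0.001333 is approximately 1, the adjusted row still sums to roughly 1 while every entry stays at or above the floor.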
86 | 
87 | 


--------------------------------------------------------------------------------
/data_model/ReadMe.txt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/xuxcong/pig-face-recognition/540cdd6026ec2be6250f2677c036bd57c2251e67/data_model/ReadMe.txt


--------------------------------------------------------------------------------
/raw_data/README.md:
--------------------------------------------------------------------------------
1 | # pig_face
2 | This repository is used to save the code for a competition
3 | 
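4 | image_process.py pads the original images into squares
5 | 
6 | get_data_txt.py saves the paths of the images that will be used, in preparation for building the database in the next step
7 | 
8 | create_h5_dataset.py converts the data into database form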


--------------------------------------------------------------------------------
/raw_data/create_h5_dataset.py:
--------------------------------------------------------------------------------
 1 | from tflearn.data_utils import build_hdf5_image_dataset
 2 | import h5py
 3 | 
 4 | 
 5 | path = '/home/smie/zhengjx/Res_Bilinear_cnns/raw_data/txt/'
 6 | filenum = 50
 7 | filename = 'train_data'  # only the training lists are converted in this run
 8 | files = []
 9 | result = []
10 | for i in range(0, filenum):
11 |     files.append(path + filename + str(i) + '.txt')
12 |     result.append(filename + str(i) + '.h5')
13 |     build_hdf5_image_dataset(files[i], image_shape=(488, 488), mode='file', output_path=result[i], categorical_labels=True, normalize=False)
14 |     print('Finish dataset ' + result[i])
15 | 


--------------------------------------------------------------------------------
/raw_data/get_data_txt.py:
--------------------------------------------------------------------------------
 1 | import sys
 2 | from PIL import Image,ImageDraw,ImageFilter,ImageEnhance
 3 | import numpy as np
 4 | import os
 5 | import matplotlib.pyplot as plt
 6 | import time
 7 | import shutil
 8 | 
 9 | 
10 | if __name__ == '__main__':
11 |     work_file = os.getcwd()
12 |     data_directory = '/home/smie/zhengjx'
13 |     filenum = 50
14 |     # One list file per shard; paths are distributed round-robin below.
15 |     train_file = []
16 |     validation_file = []
17 |     for i in range(filenum):
18 |         train_file.append(open('txt/train_data' + str(i) + '.txt', 'w'))
19 |         validation_file.append(open('txt/validation_data' + str(i) + '.txt', 'w'))
20 | 
21 |     data_path = os.path.join(data_directory, 'ROI')
22 |     train_num = 0
23 |     validation_num = 0
24 |     all_num = 0
25 |     for i in range(1, 31):  # class folders are named 1..30
26 |         nowfile = os.path.join(data_path, str(i))
27 |         files = os.listdir(nowfile)
28 |         for filename in files:
29 |             # Rename to a running index; prepend the class id until the name is unused.
30 |             newname = str(all_num) + '.jpg'
31 |             while os.path.exists(os.path.join(nowfile, newname)):
32 |                 newname = str(i) + newname
33 |             os.rename(os.path.join(nowfile, filename), os.path.join(nowfile, newname))
34 |             filepath = os.path.join(nowfile, newname)
35 |             label = str(i)
36 |             all_num = all_num + 1
37 |             if all_num % 5 == 0:  # every fifth image goes to validation
38 |                 validation_file[validation_num % filenum].write(filepath + ' ' + label + '\n')
39 |                 validation_num = validation_num + 1
40 |             else:
41 |                 train_file[train_num % filenum].write(filepath + ' ' + label + '\n')
42 |                 train_num = train_num + 1
43 |     # Close the list files so every line is flushed to disk.
44 |     for f in train_file + validation_file:
45 |         f.close()
46 |     '''
47 |     test_file = open("testresult.txt",'w')
48 |     data_path = os.path.join(work_file,'testresult')
49 |     for i in range(1,31):
50 |         nowfile = os.path.join(data_path, str(i))
51 |         files = os.listdir(nowfile)
52 |         for filename in files:
53 |             filepath = os.path.join(nowfile,filename)
54 |             label = str(i)
55 |             test_file.write(filepath + ' ' + label + '\n')
56 |     '''
57 |     '''
58 |     test_file = open("anstest_data.txt",'w')
59 |     data_path = os.path.join(work_file,'process_test')
60 |     for i in ['test_A']:
61 |         nowfile = os.path.join(data_path, str(i))
62 |         files = os.listdir(nowfile)
63 |         for filename in files:
64 |             filepath = os.path.join(nowfile,filename)
65 |             label = str(1)
66 |             test_file.write(filepath + ' ' + label + '\n')
67 |     '''
68 | 


--------------------------------------------------------------------------------
/raw_data/image_process.py:
--------------------------------------------------------------------------------
 1 | import sys
 2 | from PIL import Image,ImageDraw,ImageFilter,ImageEnhance
 3 | import numpy as np
 4 | import os
 5 | import matplotlib.pyplot as plt
 6 | import time
 7 | import shutil
 8 | 
 9 | 
10 | 
11 | # Pad an image to a square (centered), then resize it to 512x512
12 | def process_img(raw_path, result_path):
13 |   img4 = Image.open(raw_path)
14 |   longer_side = max(img4.size)
15 |   horizontal_padding = (longer_side - img4.size[0]) // 2
16 |   vertical_padding = (longer_side - img4.size[1]) // 2
17 |   img5 = img4.crop(
18 |       (
19 |           -horizontal_padding,
20 |           -vertical_padding,
21 |           img4.size[0] + horizontal_padding,
22 |           img4.size[1] + vertical_padding
23 |       )
24 |   )
25 |   img4.close();
26 |   img5 = img5.resize((512,512))
27 |   img5.save(result_path)
28 | 
29 | 
30 |   
31 | 
32 | 
33 | 
34 | if __name__ == '__main__':
35 |     # This script pads each image to a square and resizes it to a fixed size
36 |     '''
37 |     train_path = "/home/smie/zhengjx/face_recognize/raw_train/"
38 |     for i in range(1,31):
39 |       data_path = train_path + str(i);
40 |       result_file = data_path + '/';
41 |       files = os.listdir(data_path)
42 |       for img_name in files:
43 |           img_path = data_path + '/' + img_name;
44 |           result_img_path = result_file + '/' + img_name;
45 |           process_img(img_path, result_img_path)
46 | 
47 |     '''
48 | 
49 |     test_path = "/home/smie/zhengjx/face_recognize/test_B/"
50 |     result_path = 'process_testB/'
51 |     for i in ['test_B']:
52 |         data_path = test_path;
53 |         result_file = result_path;
54 |         if(os.path.exists(result_file) == True):        
55 |             shutil.rmtree(result_file);
56 |             time.sleep(1)
57 |         os.mkdir(result_file);
58 |         files = os.listdir(data_path)
59 |         for img_name in files:
60 |               img_path = data_path + '/' + img_name;
61 |               result_img_path = result_file + '/' + img_name;
62 |               process_img(img_path, result_img_path);
63 | 
64 | 


--------------------------------------------------------------------------------
/train/predict_resvgg.py:
--------------------------------------------------------------------------------
 1 | from resvgg_model import *;
 2 | from read_data import *;
 3 | 
 4 | def softmax(x):
 5 |     x = x - np.max(x)
 6 |     exp_x = np.exp(x)
 7 |     softmax_x = exp_x / np.sum(exp_x)
 8 |     return softmax_x
 9 | 
10 | if __name__ == '__main__':
11 | 
12 |     dataset_path = '/home/smie/zhengjx/Res_Bilinear_cnns/data_model/';
13 |     model_path = '/home/smie/zhengjx/Res_Bilinear_cnns/data_model/';
14 |     save_model_best = model_path + 'res_fine_last_layers_epoch_best.npz';
15 |     save_model_last = model_path + 'res_fine_last_layers_epoch_last.npz';
16 |     test_data_file = [dataset_path + 'testB.h5'];
17 |     val_data_file = [dataset_path + 'new_val_448.h5'];
18 |     num_class = 30;
19 |     test_batch_size = 1;
20 |     val_batch_size = 1;
21 |     test_reader = data_reader(test_data_file, num_class, test_batch_size, shuffle = False);
22 |     val_reader = data_reader(val_data_file, num_class, val_batch_size)
23 | 
24 |     model_path =  model_path + 'res_fine_last_layers_epoch_best.npz';
25 |     
26 |     sess = tf.Session()     ## Start session to create training graph
27 |     keep_prob = tf.placeholder(tf.float32);
28 |     imgs = tf.placeholder(tf.float32, [None, 448, 448, 3])
29 |     target = tf.placeholder("float", [None, 30])
30 | 
31 |     # resvgg = resvgg(imgs, 'vgg16_weights.npz', sess, finetune = False)
32 |     resvgg = resvgg(imgs, keep_prob, 'vgg16_weights.npz', sess, res = False)
33 |     
34 |     print('VGG network created')
35 |     
36 |     # Defining other ops using Tensorflow
37 |     loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=resvgg.fc3l, labels=target))
38 | 
39 |     #optimizer = tf.train.MomentumOptimizer(learning_rate=0.0005, momentum=0.4).minimize(loss)
40 |     optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.005).minimize(loss)
41 |     check_op = tf.add_check_numerics_ops()
42 | 
43 | 
44 |     correct_prediction = tf.equal(tf.argmax(resvgg.fc3l,1), tf.argmax(target,1))
45 |     accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
46 | 
47 |     num_correct_preds = tf.reduce_sum(tf.cast(correct_prediction, tf.float32))
48 | 
49 |     sess.run(tf.global_variables_initializer())
50 |     #resvgg.load_initial_weights(sess)
51 |     resvgg.load_own_weight(sess , model_path);
52 | 
53 |     
54 |     # Use the validation loss to make sure that we have loaded the right model
55 |     correct_val_count = 0
56 |     val_loss = 0.0
57 |     while(val_reader.have_next()):
58 |         batch_val_x, batch_val_y = val_reader.next_batch();
59 |         batch_loss, pred = sess.run([loss, num_correct_preds], feed_dict={imgs: batch_val_x, target: batch_val_y, keep_prob: 1.0})
60 |         val_loss += batch_loss
61 |         correct_val_count += pred
62 |     val_loss = val_loss/(1.0*val_reader.total_datanum);
63 |     print("##############################")
64 |     print("Validation Loss -->", val_loss)
65 |     print("correct_val_count, total_val_count", correct_val_count, val_reader.total_datanum)
66 |     print("Validation Data Accuracy -->", 100.0*correct_val_count/(1.0*val_reader.total_datanum))
67 |     print("##############################")
68 | 
69 |     #parameter for test
70 |     target_path = '/home/smie/zhengjx/Res_Bilinear_cnns/train_test/testB.txt';
71 |     images = []
72 |     with open(target_path, 'r') as f:   
73 |         for l in f.readlines():
74 |             l = l.strip('\n').split()
75 |             name = l[0].split('/')[-1].split('.')[0];
76 |             images.append(name)
77 |     csvfile = file('b_cnn_' + 'test' +'.csv', 'wb')
78 |     writer = csv.writer(csvfile)
79 |     i = 0;
80 |     while(test_reader.have_next()):
81 |         batch_test_x, batch_val_y = test_reader.next_batch();
82 |         result = sess.run([resvgg.fc3l], feed_dict={imgs: batch_test_x, keep_prob: 1.0});      
83 |         result = softmax(result);
84 |         if(i % 100 == 0):
85 |             print(i)
86 |         for j in range(0,30):
87 |             writer.writerow([images[i], j + 1, max(round(result[0][0][j],7) - 0.000001, 0.0) * 0.96 + 0.001333 ])
88 |         i = i + 1;
89 |     csvfile.close();


--------------------------------------------------------------------------------
/train/read_data.py:
--------------------------------------------------------------------------------
  1 | from __future__ import print_function
  2 | import numpy as np
  3 | import tflearn
  4 | import os
  5 | from tflearn.data_utils import shuffle
  6 | import pickle 
  7 | import h5py
  8 | import math
  9 | import random
 10 | import time
 11 | from PIL import Image,ImageDraw,ImageFilter,ImageEnhance
 12 | import csv
 13 | from keras.preprocessing import image
 14 | import PIL
 15 | #rotate the image 
 16 | def rotate(x, row_axis=0, col_axis=1, channel_axis=2, fill_mode='nearest', cval=0.):
 17 |     rotate_limit=(-90, 90)
 18 |     theta = np.pi / 180 * np.random.uniform(rotate_limit[0], rotate_limit[1])
 19 |     rotation_matrix = np.array([[np.cos(theta), -np.sin(theta), 0],[np.sin(theta), np.cos(theta), 0],[0, 0, 1]])
 20 |     h, w = x.shape[row_axis], x.shape[col_axis]
 21 |     transform_matrix = image.transform_matrix_offset_center(rotation_matrix, h, w)
 22 |     x = image.apply_transform(x, transform_matrix, channel_axis, fill_mode, cval)
 23 |     return x
 24 | 
 25 | # change brightness  Color  contrast  sharpness
 26 | def random_brightness(img, delta):
 27 |     img = PIL.Image.fromarray(np.uint8(img))
 28 |     enh_bri = ImageEnhance.Brightness(img)  
 29 |     brightness = np.random.randint(8,14) / 10.0;
 30 |     image_brightened = enh_bri.enhance(brightness);
 31 | 
 32 |     #color
 33 |     enh_col = ImageEnhance.Color(image_brightened)  
 34 |     color = np.random.randint(8,14) / 10.0;
 35 |     image_colored = enh_col.enhance(color)  
 36 | 
 37 |     enh_con = ImageEnhance.Contrast(image_colored)  
 38 |     contrast =np.random.randint(8,14) / 10.0;
 39 |     image_contrasted = enh_con.enhance(contrast) 
 40 | 
 41 |     enh_sha = ImageEnhance.Sharpness(image_contrasted)  
 42 |     sharpness = np.random.randint(8,15) / 10.0;
 43 |     image_sharped = enh_sha.enhance(sharpness) 
 44 | 
 45 |     return np.asarray(image_sharped)
 46 | 
 47 | 
 48 | def self_random_crop(image_batch):
 49 |     result = []
 50 |     for n in range(image_batch.shape[0]):
 51 |         newimg = random_brightness(image_batch[n], 0.8);
 52 |         newimg = rotate(newimg);
 53 |         start_x = random.randint(0,39)
 54 |         start_y = random.randint(0,39)
 55 |         newimg = newimg[start_y:start_y+448,start_x:start_x+448,:];
 56 |         result.append(newimg)
 57 |     return result
 58 | 
 59 | 
 60 | def move_zero_label(y, len1, len2):
 61 |     y_result = [];
 62 |     for i in range(len1):
 63 |         ytem = y[i];
 64 |         y_result.append(ytem[1:len2]);
 65 |     return np.array(y_result);
 66 | 
 67 | class data_reader:
 68 |     def __init__(self, datasets, numclass, batchsize, shuffle = True):
 69 |         self.shuffle = shuffle;
 70 |         self.num_class = numclass;
 71 |         self.dataset = datasets;
 72 |         self.file_num = len(datasets);
 73 |         self.now_read_file_pos = 0;
 74 |         self.batch_size = batchsize;
 75 |         self.data = None;
 76 |         self.X_data = None;
 77 |         self.Y_data = None;
 78 |         self.datanum = 0;
 79 |         self.batch_num = 0;
 80 |         self.tem_batch_pos = 0;
 81 |         self.total_datanum = 0;
 82 |         self.nextfile();
 83 | 
 84 |     def new_iterator(self):
 85 |         self.now_read_file_pos = 0;
 86 |         self.total_datanum = 0;
 87 |         self.nextfile();
 88 | 
 89 |     def nextfile(self):
 90 |         if(self.now_read_file_pos + 1 <= self.file_num):
 91 |             self.data = h5py.File(self.dataset[self.now_read_file_pos], 'r')
 92 |             self.X_data = self.data['X'];
 93 |             self.Y_data = self.data['Y'];
 94 |             self.datanum = self.X_data.shape[0];
 95 |             self.total_datanum = self.total_datanum + self.datanum;
 96 |             self.Y_data = move_zero_label(self.Y_data, self.Y_data.shape[0], self.Y_data.shape[1]);
 97 |             if(self.shuffle):
 98 |                 self.X_data, self.Y_data = shuffle(self.X_data, self.Y_data)
 99 |             self.batch_num = int(self.datanum / self.batch_size);
100 |             self.tem_batch_pos = 0;
101 |             print('Read data file: ' + self.dataset[self.now_read_file_pos]);
102 |             self.now_read_file_pos = self.now_read_file_pos + 1;
103 |             return True;
104 |         else:
105 |             return False;
106 | 
107 | 
108 |     def next_batch(self, process = False):
109 |         batch_xs = self.X_data[self.tem_batch_pos * self.batch_size: (self.tem_batch_pos + 1) * self.batch_size]
110 |         batch_ys = self.Y_data[self.tem_batch_pos * self.batch_size: (self.tem_batch_pos + 1) * self.batch_size]
111 |         if(process):
112 |             batch_xs = self_random_crop(batch_xs);
113 |         self.tem_batch_pos = self.tem_batch_pos + 1;
114 |         return batch_xs , batch_ys 
115 | 
116 | 
117 |     def have_next(self):
118 |         if(self.tem_batch_pos < self.batch_num):
119 |             return True;
120 |         if(self.tem_batch_pos >= self.batch_num):
121 |             if(self.nextfile() == False):
122 |                 return False;
123 |             else:
124 |                 return self.have_next(); 
125 | 
126 | 


--------------------------------------------------------------------------------
/train/resvgg_model.py:
--------------------------------------------------------------------------------
  1 | from __future__ import print_function
  2 | import tensorflow as tf
  3 | import numpy as np
  4 | #from scipy.misc import imread, imresize
  5 | import tflearn
  6 | from tflearn.data_preprocessing import ImagePreprocessing
  7 | from tflearn.data_augmentation import ImageAugmentation
  8 | import os
  9 | from tflearn.data_utils import shuffle
 10 | import pickle 
 11 | from tflearn.data_utils import image_preloader
 12 | import h5py
 13 | import math
 14 | import random
 15 | import time
 16 | import csv
 17 | 
 18 | class resvgg:
 19 |     def __init__(self, imgs,keep_prob, weights=None, sess=None, res = False, finetune = True):
 20 |         self.finetune = finetune;           ## False: train only the last fc layer; True: train the whole network
 21 |         self.keep_prob = keep_prob;
 22 |         self.imgs = imgs
 23 |         self.res = res;                     ## use the resnet structure
 24 |         self.last_layer_parameters = []     ## Parameters in this list will be optimized when only last layer is being trained 
 25 |         self.parameters = []                ## Parameters in this list will be optimized when whole BCNN network is finetuned
 26 |         self.convlayers()                   ## Create Convolutional layers
 27 |         self.fc_layers()                    ## Create Fully connected layer
 28 |         self.weight_file = weights    
 29 |             
 30 | 
 31 | 
 32 |     def convlayers(self):
 33 |         
 34 |         # zero-mean input
 35 |         with tf.name_scope('preprocess') as scope:
 36 |             mean = tf.constant([123.68, 116.779, 103.939], dtype=tf.float32, shape=[1, 1, 1, 3], name='img_mean')
 37 |             images = self.imgs-mean
 38 |             print('Adding Data Augmentation')
 39 | 
 40 | 
 41 |         # conv1_1
 42 |         with tf.variable_scope("conv1_1"):
 43 |             weights = tf.get_variable("W", [3,3,3,64], initializer=tf.contrib.layers.xavier_initializer(), trainable=self.finetune)
 44 |              # Create variable named "biases".
 45 |             biases = tf.get_variable("b", [64], initializer=tf.constant_initializer(0.1), trainable=self.finetune)
 46 |             conv = tf.nn.conv2d(images, weights, strides=[1, 1, 1, 1], padding='SAME')
 47 |             self.conv1_1 = tf.nn.relu(conv + biases)
 48 |             self.parameters += [weights, biases]
 49 | 
 50 | 
 51 |         # conv1_2
 52 |         with tf.variable_scope("conv1_2"):
 53 |             weights = tf.get_variable("W", [3,3,64,64], initializer=tf.contrib.layers.xavier_initializer(), trainable=self.finetune)
 54 |              # Create variable named "biases".
 55 |             biases = tf.get_variable("b", [64], initializer=tf.constant_initializer(0.1), trainable=self.finetune)
 56 |             conv = tf.nn.conv2d(self.conv1_1, weights, strides=[1, 1, 1, 1], padding='SAME')
 57 |             combine = conv + biases;
 58 |             if(self.res):
 59 |                 combine = combine + self.conv1_1;
 60 |             self.conv1_2 = tf.nn.relu( combine )
 61 |             self.parameters += [weights, biases]
 62 | 
 63 |         # pool1
 64 |         self.pool1 = tf.nn.max_pool(self.conv1_2,
 65 |                                ksize=[1, 2, 2, 1],
 66 |                                strides=[1, 2, 2, 1],
 67 |                                padding='SAME',
 68 |                                name='pool1')
 69 | 
 70 |         # conv2_1
 71 |         with tf.variable_scope("conv2_1"):
 72 |             weights = tf.get_variable("W", [3,3,64,128], initializer=tf.contrib.layers.xavier_initializer(), trainable=self.finetune)
 73 |              # Create variable named "biases".
 74 |             biases = tf.get_variable("b", [128], initializer=tf.constant_initializer(0.1), trainable=self.finetune)
 75 |             conv = tf.nn.conv2d(self.pool1, weights, strides=[1, 1, 1, 1], padding='SAME')
 76 |             self.conv2_1 = tf.nn.relu(conv + biases)
 77 |             self.parameters += [weights, biases]
 78 | 
 79 | 
 80 | 
 81 |         # conv2_2
 82 |         with tf.variable_scope("conv2_2"):
 83 |             weights = tf.get_variable("W", [3,3,128,128], initializer=tf.contrib.layers.xavier_initializer(), trainable=self.finetune)
 84 |              # Create variable named "biases".
 85 |             biases = tf.get_variable("b", [128], initializer=tf.constant_initializer(0.1), trainable=self.finetune)
 86 |             conv = tf.nn.conv2d(self.conv2_1, weights, strides=[1, 1, 1, 1], padding='SAME')
 87 |             combine = conv + biases;
 88 |             if(self.res):
 89 |                 combine = combine + self.conv2_1;
 90 |             self.conv2_2 = tf.nn.relu(combine)
 91 |             self.parameters += [weights, biases]
 92 | 
 93 | 
 94 |         # pool2
 95 |         self.pool2 = tf.nn.max_pool(self.conv2_2,
 96 |                                ksize=[1, 2, 2, 1],
 97 |                                strides=[1, 2, 2, 1],
 98 |                                padding='SAME',
 99 |                                name='pool2')
100 | 
101 |         # conv3_1
102 |         with tf.variable_scope("conv3_1"):
103 |             weights = tf.get_variable("W", [3,3,128,256], initializer=tf.contrib.layers.xavier_initializer(), trainable=self.finetune)
104 |              # Create variable named "biases".
105 |             biases = tf.get_variable("b", [256], initializer=tf.constant_initializer(0.1), trainable=self.finetune)
106 |             conv = tf.nn.conv2d(self.pool2, weights, strides=[1, 1, 1, 1], padding='SAME')
107 |             self.conv3_1 = tf.nn.relu(conv + biases)
108 |             self.parameters += [weights, biases]
109 | 
110 | 
111 |         # conv3_2
112 |         with tf.variable_scope("conv3_2"):
113 |             weights = tf.get_variable("W", [3,3,256,256], initializer=tf.contrib.layers.xavier_initializer(), trainable=self.finetune)
114 |              # Create variable named "biases".
115 |             biases = tf.get_variable("b", [256], initializer=tf.constant_initializer(0.1), trainable=self.finetune)
116 |             conv = tf.nn.conv2d(self.conv3_1, weights, strides=[1, 1, 1, 1], padding='SAME')
117 |             self.conv3_2 = tf.nn.relu(conv + biases)
118 |             self.parameters += [weights, biases]
119 | 
120 |         # conv3_3
121 |         with tf.variable_scope("conv3_3"):
122 |             weights = tf.get_variable("W", [3,3,256,256], initializer=tf.contrib.layers.xavier_initializer(), trainable=self.finetune)
123 |              # Create variable named "biases".
124 |             biases = tf.get_variable("b", [256], initializer=tf.constant_initializer(0.1), trainable=self.finetune)
125 |             conv = tf.nn.conv2d(self.conv3_2, weights, strides=[1, 1, 1, 1], padding='SAME')
126 |             combine = conv + biases;
127 |             if(self.res):
128 |                 combine = combine + self.conv3_1;
129 |             self.conv3_3 = tf.nn.relu(combine)
130 |             self.parameters += [weights, biases]
131 | 
132 | 
133 |         # pool3
134 |         self.pool3 = tf.nn.max_pool(self.conv3_3,
135 |                                ksize=[1, 2, 2, 1],
136 |                                strides=[1, 2, 2, 1],
137 |                                padding='SAME',
138 |                                name='pool3')
139 | 
140 |         # conv4_1
141 |         with tf.variable_scope("conv4_1"):
142 |             weights = tf.get_variable("W", [3,3,256,512], initializer=tf.contrib.layers.xavier_initializer(), trainable=self.finetune)
143 |              # Create variable named "biases".
144 |             biases = tf.get_variable("b", [512], initializer=tf.constant_initializer(0.1), trainable=self.finetune)
145 |             conv = tf.nn.conv2d(self.pool3, weights, strides=[1, 1, 1, 1], padding='SAME')
146 |             self.conv4_1 = tf.nn.relu(conv + biases)
147 |             self.parameters += [weights, biases]
148 | 
149 | 
150 |         # conv4_2
151 |         with tf.variable_scope("conv4_2"):
152 |             weights = tf.get_variable("W", [3,3,512,512], initializer=tf.contrib.layers.xavier_initializer(), trainable=self.finetune)
153 |              # Create variable named "biases".
154 |             biases = tf.get_variable("b", [512], initializer=tf.constant_initializer(0.1), trainable=self.finetune)
155 |             conv = tf.nn.conv2d(self.conv4_1, weights, strides=[1, 1, 1, 1], padding='SAME')
156 |             self.conv4_2 = tf.nn.relu(conv + biases)
157 |             self.parameters += [weights, biases]
158 | 
159 | 
160 |         # conv4_3
161 |         with tf.variable_scope("conv4_3"):
162 |             weights = tf.get_variable("W", [3,3,512,512], initializer=tf.contrib.layers.xavier_initializer(), trainable=self.finetune)
163 |              # Create variable named "biases".
164 |             biases = tf.get_variable("b", [512], initializer=tf.constant_initializer(0.1), trainable=self.finetune)
165 |             conv = tf.nn.conv2d(self.conv4_2, weights, strides=[1, 1, 1, 1], padding='SAME')
166 |             combine = conv + biases;
167 |             if(self.res):
168 |                 combine = combine + self.conv4_1;
169 |             self.conv4_3 = tf.nn.relu(combine)
170 |             self.parameters += [weights, biases]
171 | 
172 |         # pool4
173 |         self.pool4 = tf.nn.max_pool(self.conv4_3,
174 |                                ksize=[1, 2, 2, 1],
175 |                                strides=[1, 2, 2, 1],
176 |                                padding='SAME',
177 |                                name='pool4')
178 | 
179 |         # conv5_1
180 |         with tf.variable_scope("conv5_1"):
181 |             weights = tf.get_variable("W", [3,3,512,512], initializer=tf.contrib.layers.xavier_initializer(), trainable=self.finetune)
182 |              # Create variable named "biases".
183 |             biases = tf.get_variable("b", [512], initializer=tf.constant_initializer(0.1), trainable=self.finetune)
184 |             conv = tf.nn.conv2d(self.pool4, weights, strides=[1, 1, 1, 1], padding='SAME')
185 |             self.conv5_1 = tf.nn.relu(conv + biases)
186 |             self.parameters += [weights, biases]
187 | 
188 | 
189 |         # conv5_2
190 |         with tf.variable_scope("conv5_2"):
191 |             weights = tf.get_variable("W", [3,3,512,512], initializer=tf.contrib.layers.xavier_initializer(), trainable=self.finetune)
192 |              # Create variable named "biases".
193 |             biases = tf.get_variable("b", [512], initializer=tf.constant_initializer(0.1), trainable=self.finetune)
194 |             conv = tf.nn.conv2d(self.conv5_1, weights, strides=[1, 1, 1, 1], padding='SAME')
195 |             self.conv5_2 = tf.nn.relu(conv + biases)
196 |             self.parameters += [weights, biases]
197 |             
198 | 
199 |         # conv5_3
200 |         with tf.variable_scope("conv5_3"):
201 |             weights = tf.get_variable("W", [3,3,512,512], initializer=tf.contrib.layers.xavier_initializer(), trainable=self.finetune)
202 |              # Create variable named "biases".
203 |             biases = tf.get_variable("b", [512], initializer=tf.constant_initializer(0.1), trainable=self.finetune)
204 |             conv = tf.nn.conv2d(self.conv5_2, weights, strides=[1, 1, 1, 1], padding='SAME')
205 |             combine = conv + biases;
206 |             if(self.res):
207 |                 combine = combine + self.conv5_1;
208 |             self.conv5_3 = tf.nn.relu(conv + biases + self.conv5_1)  # NOTE: the conv5_1 skip is added even when self.res is False; `combine` above is unused
209 |             self.parameters += [weights, biases]
210 |             self.special_parameters = [weights,biases]
211 | 
212 | 
213 |         self.z_l2 = self.get_bilinear_fc(self.conv5_3, self.conv5_3)
214 |         # print('conv5_3  ',self.conv5_3.get_shape())
215 |         # print('self.conv5_1  ',self.conv5_1.get_shape())
216 |         self.z_l3 = self.get_bilinear_fc(self.conv5_3, self.conv5_1)
217 |         # print('self.conv5_1  ',self.conv5_1.get_shape())
218 |         pool4_1 = tf.nn.max_pool(self.conv4_1,
219 |                                ksize=[1, 2, 2, 1],
220 |                                strides=[1, 2, 2, 1],
221 |                                padding='SAME',
222 |                                name='pool4_1')
223 |         self.z_l4 = self.get_bilinear_fc(self.conv5_3, pool4_1)
224 | 
225 |         self.final_z = tf.concat([self.z_l2 ,self.z_l3, self.z_l4],1)
226 |         print(self.final_z.get_shape())
227 | 
228 |     def get_bilinear_fc(self,conv1, conv2):
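229 |         conv1 = tf.transpose(conv1, perm=[0,3,1,2])       # NHWC -> NCHW
230 |         conv1 = tf.reshape(conv1,[-1,512,784])            # flatten the 28x28 locations: [batch, 512, 784]
231 |         conv2 = tf.transpose(conv2, perm=[0,3,1,2])       # NHWC -> NCHW
232 |         conv2 = tf.reshape(conv2,[-1,512,784])            # [batch, 512, 784]
233 |         conv2 = tf.transpose(conv2, perm=[0,2,1])         # [batch, 784, 512]
234 |         phi_I = tf.matmul(conv1, conv2)                   # channel-pair products summed over locations: [batch, 512, 512]
235 |         phi_I = tf.reshape(phi_I,[-1,512*512])            # one flat vector per image
236 |         phi_I = tf.divide(phi_I,784.0)                    # average over the 784 locations
237 |         y_ssqrt = tf.multiply(tf.sign(phi_I),tf.sqrt(tf.abs(phi_I)+1e-12))  # signed square root
238 |         z = tf.nn.l2_normalize(y_ssqrt, dim=1)            # L2-normalize each vector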
239 |         print('Shape of z', z.get_shape())
240 |         return z
241 | 
242 | 
243 | 
244 |     def fc_layers(self):
245 | 
246 | 
247 |         with tf.variable_scope('fc-new') as scope:
248 |             fc3w = tf.get_variable('W', [786432, 30], initializer=tf.contrib.layers.xavier_initializer(), trainable=True)
249 |             #fc3b = tf.Variable(tf.constant(1.0, shape=[100], dtype=tf.float32), name='biases', trainable=True)
250 |             fc3b = tf.get_variable("b", [30], initializer=tf.constant_initializer(0.1), trainable=True)
251 |             fc = tf.nn.bias_add(tf.matmul(self.final_z, fc3w), fc3b)
252 |             self.fc3l = tf.nn.dropout(fc, self.keep_prob)
253 |             self.last_layer_parameters += [fc3w, fc3b]
254 |             self.parameters += [fc3w, fc3b]
255 | 
256 |     def load_initial_weights(self, session):
257 |         weights_dict = np.load(self.weight_file, encoding = 'bytes')
258 |         vgg_layers = ['conv1_1','conv1_2','conv2_1','conv2_2','conv3_1','conv3_2','conv3_3','conv4_1','conv4_2','conv4_3','conv5_1','conv5_2','conv5_3']
259 |         
260 |         for op_name in vgg_layers:
261 |             with tf.variable_scope(op_name, reuse = True):
262 |                 
263 |               # Assign the pre-trained biases and weights to their corresponding tf variables
264 |               # Biases
265 | 
266 |               var = tf.get_variable('b', trainable = True)
267 |               print('Adding biases to',var.name)
268 |               session.run(var.assign(weights_dict[op_name+'_b']))
269 | 
270 |               # Weights
271 |               var = tf.get_variable('W', trainable = True)
272 |               print('Adding weights to',var.name)
273 |               session.run(var.assign(weights_dict[op_name+'_W']))
274 | 
275 | 
276 | 
277 | 
278 |     def load_own_weight(self,session, filename):
279 |         i = 0;
280 |         weights_dict = np.load(filename, encoding = 'bytes')
281 |         '''Loop over all layer names stored in the weights dict
282 |            Load only conv-layers. Skip fc-layers in VGG16'''
283 |         vgg_layers = ['conv1_1','conv1_2','conv2_1','conv2_2','conv3_1','conv3_2','conv3_3','conv4_1','conv4_2','conv4_3','conv5_1','conv5_2','conv5_3']
284 |         
285 |         for op_name in vgg_layers:
286 |             with tf.variable_scope(op_name, reuse = True):
287 |               # Weights
288 |               var = tf.get_variable('W', trainable = True)
289 |               print('Adding weights to',var.name)
290 |               session.run(var.assign(weights_dict['arr_0' ][i]))
291 | 
292 |               var = tf.get_variable('b', trainable = True)
293 |               print('Adding weights to',var.name)
294 |               session.run(var.assign(weights_dict['arr_0' ][i+1]))
295 |               i = i + 2;
296 | 
297 | 
298 | 
299 |         with tf.variable_scope('fc-new', reuse = True):
300 |             '''
301 |             Load fc-layer weights trained in the first step. 
302 |             Use file .py to train last layer
303 |             '''
304 |             print('Last layer weights:', filename)
305 |             var = tf.get_variable('W', trainable = True)
306 |             print('Adding weights to',var.name)
307 |             session.run(var.assign(weights_dict['arr_0' ][i]))
308 |             var = tf.get_variable('b', trainable = True)
309 |             print('Adding weights to',var.name)
310 |             session.run(var.assign(weights_dict['arr_0'][i+1]))
311 |             i = i + 2;
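
The bilinear pooling performed by `get_bilinear_fc` can be illustrated outside TensorFlow. The following is a minimal NumPy sketch for a single image, not the repository's implementation: the function name `bilinear_vector` and the small feature-map sizes in the usage lines are assumptions chosen for illustration. It follows the same steps as the method above: sum channel-pair products over all spatial locations, average, take the signed square root, then L2-normalize.

```python
import numpy as np


def bilinear_vector(feat_a, feat_b, eps=1e-12):
    """Bilinear-pool two (H, W, C) feature maps that share the same H and W.

    Hypothetical helper for illustration; mirrors the steps of get_bilinear_fc
    for one image instead of a batch.
    """
    h, w, c_a = feat_a.shape
    c_b = feat_b.shape[2]
    a = feat_a.reshape(h * w, c_a)            # one row per spatial location
    b = feat_b.reshape(h * w, c_b)
    phi = a.T.dot(b) / float(h * w)           # (C_a, C_b), averaged over locations
    phi = phi.reshape(-1)                     # flatten to a single vector
    phi = np.sign(phi) * np.sqrt(np.abs(phi) + eps)    # signed square root
    norm = np.sqrt(max(np.sum(phi ** 2), eps))         # guard against a zero vector
    return phi / norm


# Toy-sized maps; the model itself uses 28x28x512 maps.
rng = np.random.RandomState(0)
x = rng.randn(4, 4, 3)
y = rng.randn(4, 4, 5)
z = bilinear_vector(x, y)
print(z.shape)
```

With two 28x28x512 inputs the result has 512*512 elements; concatenating three such vectors gives the 786432-wide input expected by the `fc-new` layer.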


--------------------------------------------------------------------------------
/train/train_resvgg.py:
--------------------------------------------------------------------------------
  1 | from resvgg_model import *;
  2 | from read_data import *;
  3 | import time  # explicit import for the time.time() call in the training loop
  4 | 
  5 | if __name__ == '__main__':
  6 |     dataset_path = '/home/smie/zhengjx/Res_Bilinear_cnns/data_model/';
  7 |     model_path = '/home/smie/zhengjx/Res_Bilinear_cnns/data_model/';
  8 |     save_model_best = model_path + 'res_fine_last_layers_epoch_best.npz';
  9 |     save_model_last = model_path + 'res_fine_last_layers_epoch_last.npz';
 10 |     train_data_file = [];
 11 |     val_data_file = [];
 12 |     #define the data to use
 13 |     for i in range(0,50):
 14 |         train_data_file.append(dataset_path + 'train_data' + str(i) + '.h5');
 15 |         val_data_file.append(dataset_path + 'validation_data' + str(i) + '.h5');
 16 |     num_class = 30;
 17 |     train_batch_size = 8;
 18 |     val_batch_size = 8;
 19 |     train_reader = data_reader(train_data_file, num_class, train_batch_size);
 20 |     val_reader = data_reader(val_data_file, num_class, val_batch_size)
 21 | 
 22 |     model_path =  model_path + 'res_fine_last_layers_epoch_best.npz';
 23 |     
 24 |     sess = tf.Session()     ## Start session to create training graph
 25 |     keep_prob = tf.placeholder(tf.float32);
 26 |     imgs = tf.placeholder(tf.float32, [None, 448, 448, 3])
 27 |     target = tf.placeholder(tf.float32, [None, num_class])
 28 | 
 29 |     # resvgg = resvgg(imgs, 'vgg16_weights.npz', sess, finetune = False)   # fine tuning
 30 |     resvgg = resvgg(imgs, keep_prob, 'vgg16_weights.npz', sess)
 31 |     
 32 |     print('Res Bilinear cnn network created')
 33 |     
 34 |     # Defining other ops using Tensorflow
 35 |     loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=resvgg.fc3l, labels=target))
 36 | 
 37 |     #optimizer = tf.train.MomentumOptimizer(learning_rate=0.2, momentum=0.4).minimize(loss)  # for fine tuning
 38 |     optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.001).minimize(loss)        # for normal training
 39 |     check_op = tf.add_check_numerics_ops()
 40 | 
 41 | 
 42 |     correct_prediction = tf.equal(tf.argmax(resvgg.fc3l,1), tf.argmax(target,1))
 43 |     accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
 44 | 
 45 |     num_correct_preds = tf.reduce_sum(tf.cast(correct_prediction, tf.float32))
 46 | 
 47 |     sess.run(tf.global_variables_initializer())
 48 |     #resvgg.load_initial_weights(sess)
 49 |     resvgg.load_own_weight(sess , model_path);   # load the model trained before 
 50 | 
 51 |     
 52 |         
 53 | 
 54 | 
 55 |     correct_val_count = 0
 56 |     val_loss = 0.0
 57 |     while(val_reader.have_next()):
 58 |         batch_val_x, batch_val_y = val_reader.next_batch();
 59 |         batch_loss, pred = sess.run([loss, num_correct_preds], feed_dict={imgs: batch_val_x, target: batch_val_y, keep_prob:1.0})
 60 |         val_loss += batch_loss
 61 |         correct_val_count += pred
 62 |     val_loss = val_loss/(1.0*val_reader.total_datanum);
 63 |     print("##############################")
 64 |     print("Validation Loss -->", val_loss)
 65 |     print("correct_val_count, total_val_count", correct_val_count, val_reader.total_datanum)
 66 |     print("Validation Data Accuracy -->", 100.0*correct_val_count/(1.0*val_reader.total_datanum))
 67 |     print("##############################")
 68 | 
 69 | 
 70 | 
 71 |     print('Starting training')
 72 |     best_validation_lost = val_loss;
 73 |     for epoch in range(50):
 74 |         train_reader.new_iterator();
 75 |         ave_cost = 0;
 76 |         num = 100;
 77 |         i = 0;
 78 |         while(train_reader.have_next()):
 79 |             i = i + 1;
 80 |             batch_xs, batch_ys = train_reader.next_batch(process = True);
 81 |             start = time.time()
 82 |             sess.run([optimizer,check_op], feed_dict={imgs: batch_xs, target: batch_ys, keep_prob:0.5})
 83 |             cost = sess.run(loss, feed_dict={imgs: batch_xs, target: batch_ys, keep_prob:1.0})
 84 |             ave_cost = ave_cost + cost;
 85 |             if i % num == 0:
 86 |                 ave_cost = 1.0 * ave_cost / num;
 87 |                 print("Epoch:", '%03d' % (epoch+1), "Step:", '%03d' % i,"Loss:", str(ave_cost))
 88 |                 ave_cost = 0;
 89 | 
 90 | 
 91 |         correct_val_count = 0
 92 |         val_loss = 0.0;
 93 |         val_reader.new_iterator();
 94 |         while(val_reader.have_next()):
 95 |             batch_val_x, batch_val_y = val_reader.next_batch();
 96 |             batch_loss, pred = sess.run([loss, num_correct_preds], feed_dict={imgs: batch_val_x, target: batch_val_y, keep_prob:1.0})
 97 |             val_loss += batch_loss
 98 |             correct_val_count += pred
 99 |         val_loss = val_loss/(1.0*val_reader.total_datanum);
100 |         print("##############################")
101 |         print("Validation Loss -->", val_loss)
102 |         print("correct_val_count, total_val_count", correct_val_count, val_reader.total_datanum)
103 |         print("Validation Data Accuracy -->", 100.0*correct_val_count/(1.0*val_reader.total_datanum))
104 |         print("##############################")
105 |         #save the best model
106 |         if(val_loss < best_validation_lost):
107 |             best_validation_lost = val_loss;
108 |             last_layer_weights = []
109 |             for v in resvgg.parameters:
110 |                 last_layer_weights.append(sess.run(v))
111 |             np.savez(save_model_best,last_layer_weights)
112 |             print('save the model!')
113 | 
114 | 
115 |     last_layer_weights = []
116 |     for v in resvgg.parameters:
117 |         print(v)
118 |         last_layer_weights.append(sess.run(v))
119 |     np.savez(save_model_last,last_layer_weights)
120 |     print('save the model!')
121 | 
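
The training script saves every parameter with `np.savez(path, list_of_arrays)`, and `load_own_weight` reads them back as `weights_dict['arr_0'][i]`. This works because `np.savez` stores positional arguments under the keys `arr_0`, `arr_1`, and so on. The sketch below shows that round trip under the assumption of a recent NumPy release, where a list of differently shaped arrays may need to be packed into an explicit object array before saving and where `np.load` needs `allow_pickle=True` to read it. The helper name `pack_parameters` and the toy shapes are hypothetical.

```python
import io

import numpy as np


def pack_parameters(values):
    """Pack arrays of different shapes into one 1-D object array (hypothetical helper)."""
    packed = np.empty(len(values), dtype=object)
    for k, value in enumerate(values):
        packed[k] = value
    return packed


# Stand-ins for one conv layer's weights and biases (toy shapes, not the real ones).
params = [
    np.zeros((3, 3, 2, 4), dtype=np.float32),
    np.full((4,), 0.1, dtype=np.float32),
]

buf = io.BytesIO()                      # in-memory file, so nothing is written to disk
np.savez(buf, pack_parameters(params))  # the single positional argument becomes 'arr_0'
buf.seek(0)

restored = np.load(buf, allow_pickle=True)['arr_0']
print(restored[0].shape, restored[1].shape)
```

Entries are read back in the order they were saved, which is why the loader walks through `arr_0` with a counter that advances by two per layer (weights, then biases).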


--------------------------------------------------------------------------------
/yolo/ReadMe.txt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/xuxcong/pig-face-recognition/540cdd6026ec2be6250f2677c036bd57c2251e67/yolo/ReadMe.txt


--------------------------------------------------------------------------------
/yolo/demo.c:
--------------------------------------------------------------------------------
  1 | #include "network.h"
  2 | #include "detection_layer.h"
  3 | #include "region_layer.h"
  4 | #include "cost_layer.h"
  5 | #include "utils.h"
  6 | #include "parser.h"
  7 | #include "box.h"
  8 | #include "image.h"
  9 | #include "demo.h"
 10 | #include <sys/time.h>
 11 | 
 12 | #define DEMO 1
 13 | 
 14 | #ifdef OPENCV
 15 | 
 16 | static char **demo_names;
 17 | static image **demo_alphabet;
 18 | static int demo_classes;
 19 | 
 20 | static float **probs;
 21 | static box *boxes;
 22 | static network *net;
 23 | static image buff [3];
 24 | static image buff_letter[3];
 25 | static int buff_index = 0;
 26 | static CvCapture * cap;
 27 | static IplImage  * ipl;
 28 | static float fps = 0;
 29 | static float demo_thresh = 0;
 30 | static float demo_hier = .5;
 31 | static int running = 0;
 32 | 
 33 | static int demo_frame = 3;
 34 | static int demo_detections = 0;
 35 | static float **predictions;
 36 | static int demo_index = 0;
 37 | static int demo_done = 0;
 38 | static float *avg;
 39 | double demo_time;
 40 | 
 41 | void *detect_in_thread(void *ptr)
 42 | {
 43 |     running = 1;
 44 |     float nms = .4;
 45 | 
 46 |     layer l = net->layers[net->n-1];
 47 |     float *X = buff_letter[(buff_index+2)%3].data;
 48 |     float *prediction = network_predict(net, X);
 49 | 
 50 |     memcpy(predictions[demo_index], prediction, l.outputs*sizeof(float));
 51 |     mean_arrays(predictions, demo_frame, l.outputs, avg);
 52 |     l.output = avg;
 53 |     if(l.type == DETECTION){
 54 |         get_detection_boxes(l, 1, 1, demo_thresh, probs, boxes, 0);
 55 |     } else if (l.type == REGION){
 56 |         get_region_boxes(l, buff[0].w, buff[0].h, net->w, net->h, demo_thresh, probs, boxes, 0, 0, 0, demo_hier, 1);
 57 |     } else {
 58 |         error("Last layer must produce detections\n");
 59 |     }
 60 |     if (nms > 0) do_nms_obj(boxes, probs, l.w*l.h*l.n, l.classes, nms);
 61 | 
 62 |     //printf("\033[2J");  // clear the screen
 63 |     //printf("\033[1;1H");
 64 |     //printf("\nFPS:%.1f\n",fps);
 65 |     //printf("Objects:\n\n");
 66 |     image display = buff[(buff_index+2) % 3];
 67 |     draw_detections(display, demo_detections, demo_thresh, boxes, probs, 0, demo_names, demo_alphabet, demo_classes);  // draw the ROIs; defined in image.c
 68 | 
 69 |     demo_index = (demo_index + 1)%demo_frame;
 70 |     running = 0;
 71 |     return 0;
 72 | }
 73 | 
 74 | void *fetch_in_thread(void *ptr)
 75 | {
 76 |     int status = fill_image_from_stream(cap, buff[buff_index]);
 77 |     letterbox_image_into(buff[buff_index], net->w, net->h, buff_letter[buff_index]);
 78 |     if(status == 0) demo_done = 1;
 79 |     return 0;
 80 | }
 81 | 
 82 | void *display_in_thread(void *ptr)
 83 | {
 84 |     show_image_cv(buff[(buff_index + 1)%3], "Demo", ipl);
 85 |     int c = cvWaitKey(1);
 86 |     if (c != -1) c = c%256;
 87 |     if (c == 27) {
 88 |         demo_done = 1;
 89 |         return 0;
 90 |     } else if (c == 82) {
 91 |         demo_thresh += .02;
 92 |     } else if (c == 84) {
 93 |         demo_thresh -= .02;
 94 |         if(demo_thresh <= .02) demo_thresh = .02;
 95 |     } else if (c == 83) {
 96 |         demo_hier += .02;
 97 |     } else if (c == 81) {
 98 |         demo_hier -= .02;
 99 |         if(demo_hier <= .0) demo_hier = .0;
100 |     }
101 |     return 0;
102 | }
103 | 
104 | void *display_loop(void *ptr)
105 | {
106 |     while(1){
107 |         display_in_thread(0);
108 |     }
109 | }
110 | 
111 | void *detect_loop(void *ptr)
112 | {
113 |     while(1){
114 |         detect_in_thread(0);
115 |     }
116 | }
117 | 
118 | void demo(char *cfgfile, char *weightfile, float thresh, int cam_index, const char *filename, char **names, int classes, int delay, char *prefix, int avg_frames, float hier, int w, int h, int frames, int fullscreen)
119 | {
120 |     demo_frame = avg_frames;
121 |     predictions = calloc(demo_frame, sizeof(float*));
122 |     image **alphabet = load_alphabet();  //???
123 |     demo_names = names;
124 |     demo_alphabet = alphabet;
125 |     demo_classes = classes;
126 |     demo_thresh = thresh;
127 |     demo_hier = hier;
128 |     printf("Demo\n");
129 |     net = load_network(cfgfile, weightfile, 0);
130 |     set_batch_network(net, 1);
131 |     pthread_t detect_thread;
132 |     pthread_t fetch_thread;
133 | 
134 |     srand(2222222);
135 | 
136 |     if(filename){
137 |         printf("video file: %s\n", filename);
138 |         cap = cvCaptureFromFile(filename);
139 |     }else{
140 |         cap = cvCaptureFromCAM(cam_index);
141 | 
142 |         if(w){
143 |             cvSetCaptureProperty(cap, CV_CAP_PROP_FRAME_WIDTH, w);
144 |         }
145 |         if(h){
146 |             cvSetCaptureProperty(cap, CV_CAP_PROP_FRAME_HEIGHT, h);
147 |         }
148 |         if(frames){
149 |             cvSetCaptureProperty(cap, CV_CAP_PROP_FPS, frames);
150 |         }
151 |     }
152 | 
153 |     if(!cap) error("Couldn't connect to webcam.\n");
154 | 
155 |     layer l = net->layers[net->n-1];
156 |     demo_detections = l.n*l.w*l.h;
157 |     int j;
158 | 
159 |     avg = (float *) calloc(l.outputs, sizeof(float));
160 |     for(j = 0; j < demo_frame; ++j) predictions[j] = (float *) calloc(l.outputs, sizeof(float));
161 | 
162 |     boxes = (box *)calloc(l.w*l.h*l.n, sizeof(box));
163 |     probs = (float **)calloc(l.w*l.h*l.n, sizeof(float *));
164 |     for(j = 0; j < l.w*l.h*l.n; ++j) probs[j] = (float *)calloc(l.classes+1, sizeof(float));
165 | 
166 |     buff[0] = get_image_from_stream(cap);
167 |     buff[1] = copy_image(buff[0]);
168 |     buff[2] = copy_image(buff[0]);
169 |     buff_letter[0] = letterbox_image(buff[0], net->w, net->h);
170 |     buff_letter[1] = letterbox_image(buff[0], net->w, net->h);
171 |     buff_letter[2] = letterbox_image(buff[0], net->w, net->h);
172 |     ipl = cvCreateImage(cvSize(buff[0].w,buff[0].h), IPL_DEPTH_8U, buff[0].c);
173 | 
174 |     int count = 0;
175 |     if(!prefix){
176 |         cvNamedWindow("Demo", CV_WINDOW_NORMAL); 
177 |         if(fullscreen){
178 |             cvSetWindowProperty("Demo", CV_WND_PROP_FULLSCREEN, CV_WINDOW_FULLSCREEN);
179 |         } else {
180 |             cvMoveWindow("Demo", 0, 0);
181 |             cvResizeWindow("Demo", 1352, 1013);
182 |         }
183 |     }
184 | 
185 |     demo_time = what_time_is_it_now();
186 | 
187 |     while(!demo_done){
188 |         printf("frame %d:\n",count+1);  //??? count+1
189 |         buff_index = (buff_index + 1) %3;
190 |         if(pthread_create(&fetch_thread, 0, fetch_in_thread, 0)) error("Thread creation failed");
191 |         if(pthread_create(&detect_thread, 0, detect_in_thread, 0)) error("Thread creation failed");
192 |         if(!prefix){
193 |             fps = 1./(what_time_is_it_now() - demo_time);
194 |             demo_time = what_time_is_it_now();
195 |             display_in_thread(0);
196 |         }else{
197 |             char name[256];
198 |             sprintf(name, "%s_%04d", prefix, count);
199 |             save_image(buff[(buff_index + 1)%3], name);  // write the image to disk
200 |         }
201 |         pthread_join(fetch_thread, 0);
202 |         pthread_join(detect_thread, 0);
203 |         ++count;
204 |     }
205 | }
206 | 
207 | void demo_compare(char *cfg1, char *weight1, char *cfg2, char *weight2, float thresh, int cam_index, const char *filename, char **names, int classes, int delay, char *prefix, int avg_frames, float hier, int w, int h, int frames, int fullscreen)
208 | {
209 |     demo_frame = avg_frames;
210 |     predictions = calloc(demo_frame, sizeof(float*));
211 |     image **alphabet = load_alphabet();
212 |     demo_names = names;
213 |     demo_alphabet = alphabet;
214 |     demo_classes = classes;
215 |     demo_thresh = thresh;
216 |     demo_hier = hier;
217 |     printf("Demo\n");
218 |     net = load_network(cfg1, weight1, 0);
219 |     set_batch_network(net, 1);
220 |     pthread_t detect_thread;
221 |     pthread_t fetch_thread;
222 | 
223 |     srand(2222222);
224 | 
225 |     if(filename){
226 |         printf("video file: %s\n", filename);
227 |         cap = cvCaptureFromFile(filename);
228 |     }else{
229 |         cap = cvCaptureFromCAM(cam_index);
230 | 
231 |         if(w){
232 |             cvSetCaptureProperty(cap, CV_CAP_PROP_FRAME_WIDTH, w);
233 |         }
234 |         if(h){
235 |             cvSetCaptureProperty(cap, CV_CAP_PROP_FRAME_HEIGHT, h);
236 |         }
237 |         if(frames){
238 |             cvSetCaptureProperty(cap, CV_CAP_PROP_FPS, frames);
239 |         }
240 |     }
241 | 
242 |     if(!cap) error("Couldn't connect to webcam.\n");
243 | 
244 |     layer l = net->layers[net->n-1];
245 |     demo_detections = l.n*l.w*l.h;
246 |     int j;
247 | 
248 |     avg = (float *) calloc(l.outputs, sizeof(float));
249 |     for(j = 0; j < demo_frame; ++j) predictions[j] = (float *) calloc(l.outputs, sizeof(float));
250 | 
251 |     boxes = (box *)calloc(l.w*l.h*l.n, sizeof(box));
252 |     probs = (float **)calloc(l.w*l.h*l.n, sizeof(float *));
253 |     for(j = 0; j < l.w*l.h*l.n; ++j) probs[j] = (float *)calloc(l.classes+1, sizeof(float));
254 | 
255 |     buff[0] = get_image_from_stream(cap);
256 |     buff[1] = copy_image(buff[0]);
257 |     buff[2] = copy_image(buff[0]);
258 |     buff_letter[0] = letterbox_image(buff[0], net->w, net->h);
259 |     buff_letter[1] = letterbox_image(buff[0], net->w, net->h);
260 |     buff_letter[2] = letterbox_image(buff[0], net->w, net->h);
261 |     ipl = cvCreateImage(cvSize(buff[0].w,buff[0].h), IPL_DEPTH_8U, buff[0].c);
262 | 
263 |     int count = 0;
264 |     if(!prefix){
265 |         cvNamedWindow("Demo", CV_WINDOW_NORMAL); 
266 |         if(fullscreen){
267 |             cvSetWindowProperty("Demo", CV_WND_PROP_FULLSCREEN, CV_WINDOW_FULLSCREEN);
268 |         } else {
269 |             cvMoveWindow("Demo", 0, 0);
270 |             cvResizeWindow("Demo", 1352, 1013);
271 |         }
272 |     }
273 | 
274 |     demo_time = what_time_is_it_now();
275 | 
276 |     while(!demo_done){
277 |         buff_index = (buff_index + 1) %3;
278 |         if(pthread_create(&fetch_thread, 0, fetch_in_thread, 0)) error("Thread creation failed");
279 |         if(pthread_create(&detect_thread, 0, detect_in_thread, 0)) error("Thread creation failed");
280 |         if(!prefix){
281 |             fps = 1./(what_time_is_it_now() - demo_time);
282 |             demo_time = what_time_is_it_now();
283 |             display_in_thread(0);
284 |         }else{
285 |             char name[256];
286 |             sprintf(name, "%s_%04d", prefix, count);
287 |             save_image(buff[(buff_index + 1)%3], name);
288 |         }
289 |         pthread_join(fetch_thread, 0);
290 |         pthread_join(detect_thread, 0);
291 |         ++count;
292 |     }
293 | }
294 | #else
295 | void demo(char *cfgfile, char *weightfile, float thresh, int cam_index, const char *filename, char **names, int classes, int delay, char *prefix, int avg, float hier, int w, int h, int frames, int fullscreen)
296 | {
297 |     fprintf(stderr, "Demo needs OpenCV for webcam images.\n");
298 | }
299 | #endif
300 | 
301 | 
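
The main loop in `demo()` rotates three frame buffers so that fetching, detection and display/saving never touch the same slot in one iteration. The following is a small Python sketch of that index arithmetic only, written as an illustration rather than a port of the C code; the names `buffer_roles` and `simulate` are hypothetical.

```python
def buffer_roles(buff_index, n=3):
    """Return which buffer slot each stage uses for a given buff_index."""
    return {
        "fetch": buff_index % n,          # fetch_in_thread fills buff[buff_index]
        "display": (buff_index + 1) % n,  # shown or saved: buff[(buff_index + 1) % 3]
        "detect": (buff_index + 2) % n,   # detect_in_thread reads buff[(buff_index + 2) % 3]
    }


def simulate(steps):
    """Replay the index updates of the main loop for a number of iterations."""
    buff_index = 0
    schedule = []
    for _ in range(steps):
        buff_index = (buff_index + 1) % 3   # same update as at the top of the loop
        schedule.append(buffer_roles(buff_index))
    return schedule


for step, roles in enumerate(simulate(4)):
    print(step, roles)
```

Read this way, a frame fetched in one iteration is processed by the detector in the next iteration and displayed or saved in the one after that, so the output lags the input by two frames.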


--------------------------------------------------------------------------------
/yolo/demo.h:
--------------------------------------------------------------------------------
1 | #ifndef DEMO_H
2 | #define DEMO_H
3 | 
4 | #include "image.h"
5 | 
6 | #endif
7 | 


--------------------------------------------------------------------------------
/yolo/image.c:
--------------------------------------------------------------------------------
   1 | #include "image.h"
   2 | #include "utils.h"
   3 | #include "blas.h"
   4 | #include "cuda.h"
   5 | #include <stdio.h>
   6 | #include <math.h>
   7 | 
   8 | #define STB_IMAGE_IMPLEMENTATION
   9 | #include "stb_image.h"
  10 | #define STB_IMAGE_WRITE_IMPLEMENTATION
  11 | #include "stb_image_write.h"
  12 | 
  13 | #include <cv.h>  
  14 | #include <cxcore.h>  
  15 | #include <highgui.h>  
  16 | 
  17 | 
  18 | int windows = 0;
  19 | 
  20 | int count=0; 
  21 | 
  22 | float colors[6][3] = { {1,0,1}, {0,0,1},{0,1,1},{0,1,0},{1,1,0},{1,0,0} };
  23 | 
  24 | float get_color(int c, int x, int max)
  25 | {
  26 |     float ratio = ((float)x/max)*5;
  27 |     int i = floor(ratio);
  28 |     int j = ceil(ratio);
  29 |     ratio -= i;
  30 |     float r = (1-ratio) * colors[i][c] + ratio*colors[j][c];
  31 |     //printf("%f\n", r);
  32 |     return r;
  33 | }
  34 | 
  35 | image mask_to_rgb(image mask)
  36 | {
  37 |     int n = mask.c;
  38 |     image im = make_image(mask.w, mask.h, 3);
  39 |     int i, j;
  40 |     for(j = 0; j < n; ++j){
  41 |         int offset = j*123457 % n;
  42 |         float red = get_color(2,offset,n);
  43 |         float green = get_color(1,offset,n);
  44 |         float blue = get_color(0,offset,n);
  45 |         for(i = 0; i < im.w*im.h; ++i){
  46 |             im.data[i + 0*im.w*im.h] += mask.data[j*im.h*im.w + i]*red;
  47 |             im.data[i + 1*im.w*im.h] += mask.data[j*im.h*im.w + i]*green;
  48 |             im.data[i + 2*im.w*im.h] += mask.data[j*im.h*im.w + i]*blue;
  49 |         }
  50 |     }
  51 |     return im;
  52 | }
  53 | 
  54 | static float get_pixel(image m, int x, int y, int c)
  55 | {
  56 |     assert(x < m.w && y < m.h && c < m.c);
  57 |     return m.data[c*m.h*m.w + y*m.w + x];
  58 | }
  59 | static float get_pixel_extend(image m, int x, int y, int c)
  60 | {
  61 |     if(x < 0 || x >= m.w || y < 0 || y >= m.h) return 0;
  62 |     /*
  63 |     if(x < 0) x = 0;
  64 |     if(x >= m.w) x = m.w-1;
  65 |     if(y < 0) y = 0;
  66 |     if(y >= m.h) y = m.h-1;
  67 |     */
  68 |     if(c < 0 || c >= m.c) return 0;
  69 |     return get_pixel(m, x, y, c);
  70 | }
  71 | static void set_pixel(image m, int x, int y, int c, float val)
  72 | {
  73 |     if (x < 0 || y < 0 || c < 0 || x >= m.w || y >= m.h || c >= m.c) return;
  74 |     assert(x < m.w && y < m.h && c < m.c);
  75 |     m.data[c*m.h*m.w + y*m.w + x] = val;
  76 | }
  77 | static void add_pixel(image m, int x, int y, int c, float val)
  78 | {
  79 |     assert(x < m.w && y < m.h && c < m.c);
  80 |     m.data[c*m.h*m.w + y*m.w + x] += val;
  81 | }
  82 | 
  83 | static float bilinear_interpolate(image im, float x, float y, int c)
  84 | {
  85 |     int ix = (int) floorf(x);
  86 |     int iy = (int) floorf(y);
  87 | 
  88 |     float dx = x - ix;
  89 |     float dy = y - iy;
  90 | 
  91 |     float val = (1-dy) * (1-dx) * get_pixel_extend(im, ix, iy, c) + 
  92 |         dy     * (1-dx) * get_pixel_extend(im, ix, iy+1, c) + 
  93 |         (1-dy) *   dx   * get_pixel_extend(im, ix+1, iy, c) +
  94 |         dy     *   dx   * get_pixel_extend(im, ix+1, iy+1, c);
  95 |     return val;
  96 | }
  97 | 
  98 | 
  99 | void composite_image(image source, image dest, int dx, int dy)
 100 | {
 101 |     int x,y,k;
 102 |     for(k = 0; k < source.c; ++k){
 103 |         for(y = 0; y < source.h; ++y){
 104 |             for(x = 0; x < source.w; ++x){
 105 |                 float val = get_pixel(source, x, y, k);
 106 |                 float val2 = get_pixel_extend(dest, dx+x, dy+y, k);
 107 |                 set_pixel(dest, dx+x, dy+y, k, val * val2);
 108 |             }
 109 |         }
 110 |     }
 111 | }
 112 | 
 113 | image border_image(image a, int border)
 114 | {
 115 |     image b = make_image(a.w + 2*border, a.h + 2*border, a.c);
 116 |     int x,y,k;
 117 |     for(k = 0; k < b.c; ++k){
 118 |         for(y = 0; y < b.h; ++y){
 119 |             for(x = 0; x < b.w; ++x){
 120 |                 float val = get_pixel_extend(a, x - border, y - border, k);
 121 |                 if(x - border < 0 || x - border >= a.w || y - border < 0 || y - border >= a.h) val = 1;
 122 |                 set_pixel(b, x, y, k, val);
 123 |             }
 124 |         }
 125 |     }
 126 |     return b;
 127 | }
 128 | 
 129 | image tile_images(image a, image b, int dx)
 130 | {
 131 |     if(a.w == 0) return copy_image(b);
 132 |     image c = make_image(a.w + b.w + dx, (a.h > b.h) ? a.h : b.h, (a.c > b.c) ? a.c : b.c);
 133 |     fill_cpu(c.w*c.h*c.c, 1, c.data, 1);
 134 |     embed_image(a, c, 0, 0); 
 135 |     composite_image(b, c, a.w + dx, 0);
 136 |     return c;
 137 | }
 138 | 
 139 | image get_label(image **characters, char *string, int size)
 140 | {
 141 |     if(size > 7) size = 7;
 142 |     image label = make_empty_image(0,0,0);
 143 |     while(*string){
 144 |         image l = characters[size][(int)*string];
 145 |         image n = tile_images(label, l, -size - 1 + (size+1)/2);
 146 |         free_image(label);
 147 |         label = n;
 148 |         ++string;
 149 |     }
 150 |     image b = border_image(label, label.h*.25);
 151 |     free_image(label);
 152 |     return b;
 153 | }
 154 | 
 155 | void draw_label(image a, int r, int c, image label, const float *rgb)
 156 | {
 157 |     int w = label.w;
 158 |     int h = label.h;
 159 |     if (r - h >= 0) r = r - h;
 160 | 
 161 |     int i, j, k;
 162 |     for(j = 0; j < h && j + r < a.h; ++j){
 163 |         for(i = 0; i < w && i + c < a.w; ++i){
 164 |             for(k = 0; k < label.c; ++k){
 165 |                 float val = get_pixel(label, i, j, k);
 166 |                 set_pixel(a, i+c, j+r, k, rgb[k] * val);
 167 |             }
 168 |         }
 169 |     }
 170 | }
 171 | 
 172 | void draw_box(image a, int x1, int y1, int x2, int y2, float r, float g, float b)
 173 | {
 174 |     //normalize_image(a);
 175 |     int i;
 176 |     if(x1 < 0) x1 = 0;
 177 |     if(x1 >= a.w) x1 = a.w-1;
 178 |     if(x2 < 0) x2 = 0;
 179 |     if(x2 >= a.w) x2 = a.w-1;
 180 | 
 181 |     if(y1 < 0) y1 = 0;
 182 |     if(y1 >= a.h) y1 = a.h-1;
 183 |     if(y2 < 0) y2 = 0;
 184 |     if(y2 >= a.h) y2 = a.h-1;
 185 | 
 186 |     for(i = x1; i <= x2; ++i){
 187 |         a.data[i + y1*a.w + 0*a.w*a.h] = r;
 188 |         a.data[i + y2*a.w + 0*a.w*a.h] = r;
 189 | 
 190 |         a.data[i + y1*a.w + 1*a.w*a.h] = g;
 191 |         a.data[i + y2*a.w + 1*a.w*a.h] = g;
 192 | 
 193 |         a.data[i + y1*a.w + 2*a.w*a.h] = b;
 194 |         a.data[i + y2*a.w + 2*a.w*a.h] = b;
 195 |     }
 196 |     for(i = y1; i <= y2; ++i){
 197 |         a.data[x1 + i*a.w + 0*a.w*a.h] = r;
 198 |         a.data[x2 + i*a.w + 0*a.w*a.h] = r;
 199 | 
 200 |         a.data[x1 + i*a.w + 1*a.w*a.h] = g;
 201 |         a.data[x2 + i*a.w + 1*a.w*a.h] = g;
 202 | 
 203 |         a.data[x1 + i*a.w + 2*a.w*a.h] = b;
 204 |         a.data[x2 + i*a.w + 2*a.w*a.h] = b;
 205 |     }
 206 | }
 207 | 
 208 |                    //printf("left:%d top:%d right:%d bot:%d width:%d\n",left,top,right,bot,width);
 209 | void draw_box_width(image a, int x1, int y1, int x2, int y2, int w, float r, float g, float b)  
 210 | {
 211 |     int i;
 212 |     for(i = 0; i < w; ++i){
 213 |         draw_box(a, x1+i, y1+i, x2-i, y2-i, r, g, b);
 214 |     }
 215 | }
 216 | 
 217 | void draw_bbox(image a, box bbox, int w, float r, float g, float b)
 218 | {
 219 |     int left  = (bbox.x-bbox.w/2)*a.w;
 220 |     int right = (bbox.x+bbox.w/2)*a.w;
 221 |     int top   = (bbox.y-bbox.h/2)*a.h;
 222 |     int bot   = (bbox.y+bbox.h/2)*a.h;
 223 | 
 224 |     int i;
 225 |     for(i = 0; i < w; ++i){
 226 |         draw_box(a, left+i, top+i, right-i, bot-i, r, g, b);
 227 |     }
 228 | }
 229 | 
 230 | image **load_alphabet()
 231 | {
 232 |     int i, j;
 233 |     const int nsize = 8;
 234 |     image **alphabets = calloc(nsize, sizeof(image));
 235 |     for(j = 0; j < nsize; ++j){
 236 |         alphabets[j] = calloc(128, sizeof(image));
 237 |         for(i = 32; i < 127; ++i){
 238 |             char buff[256];
 239 |             sprintf(buff, "data/labels/%d_%d.png", i, j);
 240 |             alphabets[j][i] = load_image_color(buff, 0, 0);
 241 |         }
 242 |     }
 243 |     return alphabets;
 244 | }
 245 | 
 246 | void draw_detections(image im, int num, float thresh, box *boxes, float **probs, float **masks, char **names, image **alphabet, int classes)  // called from demo.c; draws the ROIs
 247 | {
 248 |     count++;
 249 |     char image_name[256];
 250 |     sprintf(image_name, "%s_%04d.jpg", "output", count);
 251 |     //printf("%s\n",image_name);
 252 |     char image_roi_name[256];
 253 |     int roi_num=0;  // a single frame may contain several ROIs
 254 |     sprintf(image_roi_name, "ROI/%04d_roi_%03d.jpg", count, roi_num);  
 255 |     
 256 | 
 257 |     int i,j;
 258 |     for(i = 0; i < num; ++i){  // iterate over all candidate detections
 259 |         char labelstr[4096] = {0};
 260 |         int class = -1;
 261 |         for(j = 0; j < classes; ++j){  // iterate over all classes
 262 |             if (probs[i][j] > thresh){
 263 |                 if (class < 0) {
 264 |                     strcat(labelstr, names[j]);
 265 |                     class = j;
 266 |                 } else {
 267 |                     strcat(labelstr, ", ");
 268 |                     strcat(labelstr, names[j]);
 269 |                 }
 270 |                 //printf("%s\n","hehe");
 271 |                 printf(" %s: %.0f%%\n", names[j], probs[i][j]*100);
 272 |             }
 273 |         }
 274 |         if(class >= 0){
 275 |             int width = im.h * .006;
 276 | 
 277 |             /*
 278 |                if(0){
 279 |                width = pow(prob, 1./2.)*10+1;
 280 |                alphabet = 0;
 281 |                }
 282 |              */
 283 | 
 284 |             //printf("%d %s: %.0f%%\n", i, names[class], prob*100);
 285 |             int offset = class*123457 % classes;
 286 |             float red = get_color(2,offset,classes);
 287 |             float green = get_color(1,offset,classes);
 288 |             float blue = get_color(0,offset,classes);
 289 |             float rgb[3];
 290 | 
 291 |             //width = prob*20+2;
 292 | 
 293 |             rgb[0] = red;
 294 |             rgb[1] = green;
 295 |             rgb[2] = blue;
 296 |             box b = boxes[i];
 297 | 
 298 |             int left  = (b.x-b.w/2.)*im.w;
 299 |             int right = (b.x+b.w/2.)*im.w;
 300 |             int top   = (b.y-b.h/2.)*im.h;
 301 |             int bot   = (b.y+b.h/2.)*im.h;
 302 | 
 303 |             if(left < 0) left = 0;
 304 |             if(right > im.w-1) right = im.w-1;
 305 |             if(top < 0) top = 0;
 306 |             if(bot > im.h-1) bot = im.h-1;
 307 | 
 308 |             draw_box_width(im, left, top, right, bot, width, red, green, blue);  // draw the ROI rectangle
 309 |             printf("  left:%d top:%d right:%d bot:%d\n",left,top,right,bot);  // report the top-left and bottom-right corners
 310 | ///*
 311 |             // load the source frame unchanged (-1); pass CV_LOAD_IMAGE_COLOR or CV_LOAD_IMAGE_GRAYSCALE to force a format
 312 |             IplImage *pSrc = cvLoadImage(image_name, -1);
 313 |             //printf("  loading %s\n",image_name);
 314 |             if(!pSrc) {
 315 |                 printf("  %s load failed!\n",image_name); 
 316 |                 return ;
 317 |             }
 318 |             CvSize size= cvSize(right-left,bot-top);  // size of the region
 319 |             cvSetImageROI(pSrc,cvRect(left,top,size.width, size.height));  // set the source ROI: left edge, top edge, width, height
 320 |             IplImage* pDest = cvCreateImage(size,pSrc->depth,pSrc->nChannels);  // create the destination image
 321 |             cvCopy(pSrc,pDest,0);  // copy the ROI pixels
 322 |             cvResetImageROI(pSrc);  // clear the ROI once the source is no longer needed
 323 |             cvSaveImage(image_roi_name,pDest,0);  // save the destination image
 324 |             printf("  saved %s\n",image_roi_name);
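 325 |             cvReleaseImage(&pDest); cvReleaseImage(&pSrc);  // release both images so they do not leak on every detection
 326 |             sprintf(image_roi_name, "ROI/%04d_roi_%03d.jpg", count, ++roi_num);  // file name for the next ROI in this frame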
 327 | //*/
 328 | 
 329 |             if (alphabet) {  // draw the label
 330 | ///*
 331 |                 image label = get_label(alphabet, labelstr, (im.h*.03)/10);
 332 |                 //printf("label:%s\n",label);  //null
 333 |                 draw_label(im, top + width, left, label, rgb);  // label text
 334 |                 free_image(label);
 335 | //*/
 336 |             }
 337 |             if (masks){
 338 |                 image mask = float_to_image(14, 14, 1, masks[i]);
 339 |                 image resized_mask = resize_image(mask, b.w*im.w, b.h*im.h);
 340 |                 image tmask = threshold_image(resized_mask, .5);
 341 |                 embed_image(tmask, im, left, top);
 342 |                 free_image(mask);
 343 |                 free_image(resized_mask);
 344 |                 free_image(tmask);
 345 |             }
 346 |         }
 347 |     }
 348 | }
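
The coordinate handling in draw_detections can be looked at in isolation. The following is a minimal sketch, not code from this repository: `norm_box`, `pixel_rect` and `box_to_pixels` are hypothetical names, and it assumes (as the function above does) that a box stores its centre and size as fractions of the image dimensions.

```c
#include <assert.h>

/* Hypothetical stand-in for darknet's box: centre (x, y) and size (w, h),
   all expressed as fractions of the image dimensions. */
typedef struct { float x, y, w, h; } norm_box;

/* Hypothetical pixel rectangle holding the four clamped edges. */
typedef struct { int left, top, right, bot; } pixel_rect;

/* Convert a normalized box to pixel edges and clamp them to the image,
   mirroring the left/right/top/bot arithmetic used when cropping ROIs. */
pixel_rect box_to_pixels(norm_box b, int img_w, int img_h)
{
    pixel_rect r;
    assert(img_w > 0 && img_h > 0);

    r.left  = (int)((b.x - b.w / 2.f) * img_w);
    r.right = (int)((b.x + b.w / 2.f) * img_w);
    r.top   = (int)((b.y - b.h / 2.f) * img_h);
    r.bot   = (int)((b.y + b.h / 2.f) * img_h);

    /* Keep every edge inside [0, size - 1]. */
    if (r.left < 0) r.left = 0;
    if (r.right > img_w - 1) r.right = img_w - 1;
    if (r.top < 0) r.top = 0;
    if (r.bot > img_h - 1) r.bot = img_h - 1;
    return r;
}
```

For a box centred in a 100x200 image and covering half of each dimension, this yields the edges left 25, top 50, right 75, bot 150. Boxes that extend past the image are cut back to the last valid pixel.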
 349 | 
 350 | void transpose_image(image im)
 351 | {
 352 |     assert(im.w == im.h);
 353 |     int n, m;
 354 |     int c;
 355 |     for(c = 0; c < im.c; ++c){
 356 |         for(n = 0; n < im.w-1; ++n){
 357 |             for(m = n + 1; m < im.w; ++m){
 358 |                 float swap = im.data[m + im.w*(n + im.h*c)];
 359 |                 im.data[m + im.w*(n + im.h*c)] = im.data[n + im.w*(m + im.h*c)];
 360 |                 im.data[n + im.w*(m + im.h*c)] = swap;
 361 |             }
 362 |         }
 363 |     }
 364 | }
 365 | 
 366 | void rotate_image_cw(image im, int times)
 367 | {
 368 |     assert(im.w == im.h);
 369 |     times = (times + 400) % 4;
 370 |     int i, x, y, c;
 371 |     int n = im.w;
 372 |     for(i = 0; i < times; ++i){
 373 |         for(c = 0; c < im.c; ++c){
 374 |             for(x = 0; x < n/2; ++x){
 375 |                 for(y = 0; y < (n-1)/2 + 1; ++y){
 376 |                     float temp = im.data[y + im.w*(x + im.h*c)];
 377 |                     im.data[y + im.w*(x + im.h*c)] = im.data[n-1-x + im.w*(y + im.h*c)];
 378 |                     im.data[n-1-x + im.w*(y + im.h*c)] = im.data[n-1-y + im.w*(n-1-x + im.h*c)];
 379 |                     im.data[n-1-y + im.w*(n-1-x + im.h*c)] = im.data[x + im.w*(n-1-y + im.h*c)];
 380 |                     im.data[x + im.w*(n-1-y + im.h*c)] = temp;
 381 |                 }
 382 |             }
 383 |         }
 384 |     }
 385 | }
 386 | 
 387 | void flip_image(image a)
 388 | {
 389 |     int i,j,k;
 390 |     for(k = 0; k < a.c; ++k){
 391 |         for(i = 0; i < a.h; ++i){
 392 |             for(j = 0; j < a.w/2; ++j){
 393 |                 int index = j + a.w*(i + a.h*(k));
 394 |                 int flip = (a.w - j - 1) + a.w*(i + a.h*(k));
 395 |                 float swap = a.data[flip];
 396 |                 a.data[flip] = a.data[index];
 397 |                 a.data[index] = swap;
 398 |             }
 399 |         }
 400 |     }
 401 | }
 402 | 
 403 | image image_distance(image a, image b)
 404 | {
 405 |     int i,j;
 406 |     image dist = make_image(a.w, a.h, 1);
 407 |     for(i = 0; i < a.c; ++i){
 408 |         for(j = 0; j < a.h*a.w; ++j){
 409 |             dist.data[j] += pow(a.data[i*a.h*a.w+j]-b.data[i*a.h*a.w+j],2);
 410 |         }
 411 |     }
 412 |     for(j = 0; j < a.h*a.w; ++j){
 413 |         dist.data[j] = sqrt(dist.data[j]);
 414 |     }
 415 |     return dist;
 416 | }
 417 | 
 418 | void ghost_image(image source, image dest, int dx, int dy)
 419 | {
 420 |     int x,y,k;
 421 |     float max_dist = sqrt((-source.w/2. + .5)*(-source.w/2. + .5));
 422 |     for(k = 0; k < source.c; ++k){
 423 |         for(y = 0; y < source.h; ++y){
 424 |             for(x = 0; x < source.w; ++x){
 425 |                 float dist = sqrt((x - source.w/2. + .5)*(x - source.w/2. + .5) + (y - source.h/2. + .5)*(y - source.h/2. + .5));
 426 |                 float alpha = (1 - dist/max_dist);
 427 |                 if(alpha < 0) alpha = 0;
 428 |                 float v1 = get_pixel(source, x,y,k);
 429 |                 float v2 = get_pixel(dest, dx+x,dy+y,k);
 430 |                 float val = alpha*v1 + (1-alpha)*v2;
 431 |                 set_pixel(dest, dx+x, dy+y, k, val);
 432 |             }
 433 |         }
 434 |     }
 435 | }
 436 | 
 437 | void embed_image(image source, image dest, int dx, int dy)
 438 | {
 439 |     int x,y,k;
 440 |     for(k = 0; k < source.c; ++k){
 441 |         for(y = 0; y < source.h; ++y){
 442 |             for(x = 0; x < source.w; ++x){
 443 |                 float val = get_pixel(source, x,y,k);
 444 |                 set_pixel(dest, dx+x, dy+y, k, val);
 445 |             }
 446 |         }
 447 |     }
 448 | }
 449 | 
 450 | image collapse_image_layers(image source, int border)
 451 | {
 452 |     int h = source.h;
 453 |     h = (h+border)*source.c - border;
 454 |     image dest = make_image(source.w, h, 1);
 455 |     int i;
 456 |     for(i = 0; i < source.c; ++i){
 457 |         image layer = get_image_layer(source, i);
 458 |         int h_offset = i*(source.h+border);
 459 |         embed_image(layer, dest, 0, h_offset);
 460 |         free_image(layer);
 461 |     }
 462 |     return dest;
 463 | }
 464 | 
 465 | void constrain_image(image im)
 466 | {
 467 |     int i;
 468 |     for(i = 0; i < im.w*im.h*im.c; ++i){
 469 |         if(im.data[i] < 0) im.data[i] = 0;
 470 |         if(im.data[i] > 1) im.data[i] = 1;
 471 |     }
 472 | }
 473 | 
 474 | void normalize_image(image p)
 475 | {
 476 |     int i;
 477 |     float min = 9999999;
 478 |     float max = -999999;
 479 | 
 480 |     for(i = 0; i < p.h*p.w*p.c; ++i){
 481 |         float v = p.data[i];
 482 |         if(v < min) min = v;
 483 |         if(v > max) max = v;
 484 |     }
 485 |     if(max - min < .000000001){
 486 |         min = 0;
 487 |         max = 1;
 488 |     }
 489 |     for(i = 0; i < p.c*p.w*p.h; ++i){
 490 |         p.data[i] = (p.data[i] - min)/(max-min);
 491 |     }
 492 | }
 493 | 
 494 | void normalize_image2(image p)
 495 | {
 496 |     float *min = calloc(p.c, sizeof(float));
 497 |     float *max = calloc(p.c, sizeof(float));
 498 |     int i,j;
 499 |     for(i = 0; i < p.c; ++i) min[i] = max[i] = p.data[i*p.h*p.w];
 500 | 
 501 |     for(j = 0; j < p.c; ++j){
 502 |         for(i = 0; i < p.h*p.w; ++i){
 503 |             float v = p.data[i+j*p.h*p.w];
 504 |             if(v < min[j]) min[j] = v;
 505 |             if(v > max[j]) max[j] = v;
 506 |         }
 507 |     }
 508 |     for(i = 0; i < p.c; ++i){
 509 |         if(max[i] - min[i] < .000000001){
 510 |             min[i] = 0;
 511 |             max[i] = 1;
 512 |         }
 513 |     }
 514 |     for(j = 0; j < p.c; ++j){
 515 |         for(i = 0; i < p.w*p.h; ++i){
 516 |             p.data[i+j*p.h*p.w] = (p.data[i+j*p.h*p.w] - min[j])/(max[j]-min[j]);
 517 |         }
 518 |     }
 519 |     free(min);
 520 |     free(max);
 521 | }
 522 | 
 523 | void copy_image_into(image src, image dest)
 524 | {
 525 |     memcpy(dest.data, src.data, src.h*src.w*src.c*sizeof(float));
 526 | }
 527 | 
 528 | image copy_image(image p)
 529 | {
 530 |     image copy = p;
 531 |     copy.data = calloc(p.h*p.w*p.c, sizeof(float));
 532 |     memcpy(copy.data, p.data, p.h*p.w*p.c*sizeof(float));
 533 |     return copy;
 534 | }
 535 | 
 536 | void rgbgr_image(image im)
 537 | {
 538 |     int i;
 539 |     for(i = 0; i < im.w*im.h; ++i){
 540 |         float swap = im.data[i];
 541 |         im.data[i] = im.data[i+im.w*im.h*2];
 542 |         im.data[i+im.w*im.h*2] = swap;
 543 |     }
 544 | }
 545 | 
 546 | #ifdef OPENCV
 547 | void show_image_cv(image p, const char *name, IplImage *disp)
 548 | {
 549 |     int x,y,k;
 550 |     if(p.c == 3) rgbgr_image(p);
 551 |     //normalize_image(copy);
 552 | 
 553 |     char buff[256];
 554 |     //sprintf(buff, "%s (%d)", name, windows);
 555 |     sprintf(buff, "%s", name);
 556 | 
 557 |     int step = disp->widthStep;
 558 |     cvNamedWindow(buff, CV_WINDOW_NORMAL); 
 559 |     //cvMoveWindow(buff, 100*(windows%10) + 200*(windows/10), 100*(windows%10));
 560 |     ++windows;
 561 |     for(y = 0; y < p.h; ++y){
 562 |         for(x = 0; x < p.w; ++x){
 563 |             for(k= 0; k < p.c; ++k){
 564 |                 disp->imageData[y*step + x*p.c + k] = (unsigned char)(get_pixel(p,x,y,k)*255);
 565 |             }
 566 |         }
 567 |     }
 568 |     if(0){
 569 |         int w = 448;
 570 |         int h = w*p.h/p.w;
 571 |         if(h > 1000){
 572 |             h = 1000;
 573 |             w = h*p.w/p.h;
 574 |         }
 575 |         IplImage *buffer = disp;
 576 |         disp = cvCreateImage(cvSize(w, h), buffer->depth, buffer->nChannels);
 577 |         cvResize(buffer, disp, CV_INTER_LINEAR);
 578 |         cvReleaseImage(&buffer);
 579 |     }
 580 |     cvShowImage(buff, disp);
 581 | }
 582 | #endif
 583 | 
 584 | void show_image(image p, const char *name)
 585 | {
 586 | #ifdef OPENCV
 587 |     IplImage *disp = cvCreateImage(cvSize(p.w,p.h), IPL_DEPTH_8U, p.c);
 588 |     image copy = copy_image(p);
 589 |     constrain_image(copy);
 590 |     show_image_cv(copy, name, disp);
 591 |     free_image(copy);
 592 |     cvReleaseImage(&disp);
 593 | #else
 594 |     fprintf(stderr, "Not compiled with OpenCV, saving to %s.png instead\n", name);
 595 |     save_image(p, name);
 596 | #endif
 597 | }
 598 | 
 599 | #ifdef OPENCV
 600 | 
 601 | void ipl_into_image(IplImage* src, image im)
 602 | {
 603 |     unsigned char *data = (unsigned char *)src->imageData;
 604 |     int h = src->height;
 605 |     int w = src->width;
 606 |     int c = src->nChannels;
 607 |     int step = src->widthStep;
 608 |     int i, j, k;
 609 | 
 610 |     for(i = 0; i < h; ++i){
 611 |         for(k= 0; k < c; ++k){
 612 |             for(j = 0; j < w; ++j){
 613 |                 im.data[k*w*h + i*w + j] = data[i*step + j*c + k]/255.;
 614 |             }
 615 |         }
 616 |     }
 617 | }
 618 | 
 619 | image ipl_to_image(IplImage* src)
 620 | {
 621 |     int h = src->height;
 622 |     int w = src->width;
 623 |     int c = src->nChannels;
 624 |     image out = make_image(w, h, c);
 625 |     ipl_into_image(src, out);
 626 |     return out;
 627 | }
 628 | 
 629 | image load_image_cv(char *filename, int channels)
 630 | {
 631 |     IplImage* src = 0;
 632 |     int flag = -1;
 633 |     if (channels == 0) flag = -1;
 634 |     else if (channels == 1) flag = 0;
 635 |     else if (channels == 3) flag = 1;
 636 |     else {
 637 |         fprintf(stderr, "OpenCV can't force load with %d channels\n", channels);
 638 |     }
 639 | 
 640 |     if( (src = cvLoadImage(filename, flag)) == 0 )
 641 |     {
 642 |         fprintf(stderr, "Cannot load image \"%s\"\n", filename);
 643 |         char buff[256];
 644 |         snprintf(buff, sizeof(buff), "echo %s >> bad.list", filename);
 645 |         system(buff);
 646 |         return make_image(10,10,3);
 647 |         //exit(0);
 648 |     }
 649 |     image out = ipl_to_image(src);
 650 |     cvReleaseImage(&src);
 651 |     rgbgr_image(out);
 652 |     return out;
 653 | }
 654 | 
 655 | void flush_stream_buffer(CvCapture *cap, int n)
 656 | {
 657 |     int i;
 658 |     for(i = 0; i < n; ++i) {
 659 |         cvQueryFrame(cap);
 660 |     }
 661 | }
 662 | 
 663 | image get_image_from_stream(CvCapture *cap)
 664 | {
 665 |     IplImage* src = cvQueryFrame(cap);
 666 |     if (!src) return make_empty_image(0,0,0);
 667 |     image im = ipl_to_image(src);
 668 |     rgbgr_image(im);
 669 |     return im;
 670 | }
 671 | 
 672 | int fill_image_from_stream(CvCapture *cap, image im)
 673 | {
 674 |     IplImage* src = cvQueryFrame(cap);
 675 |     if (!src) return 0;
 676 |     ipl_into_image(src, im);
 677 |     rgbgr_image(im);
 678 |     return 1;
 679 | }
 680 | 
 681 | void save_image_jpg(image p, const char *name)
 682 | {
 683 |     image copy = copy_image(p);
 684 |     if(p.c == 3) rgbgr_image(copy);
 685 |     int x,y,k;
 686 | 
 687 |     char buff[256];
 688 |     sprintf(buff, "%s.jpg", name);
 689 | 
 690 |     IplImage *disp = cvCreateImage(cvSize(p.w,p.h), IPL_DEPTH_8U, p.c);
 691 |     int step = disp->widthStep;
 692 |     for(y = 0; y < p.h; ++y){
 693 |         for(x = 0; x < p.w; ++x){
 694 |             for(k= 0; k < p.c; ++k){
 695 |                 disp->imageData[y*step + x*p.c + k] = (unsigned char)(get_pixel(copy,x,y,k)*255);
 696 |             }
 697 |         }
 698 |     }
 699 |     cvSaveImage(buff, disp,0);
 700 |     cvReleaseImage(&disp);
 701 |     free_image(copy);
 702 | }
 703 | #endif
 704 | 
 705 | void save_image_png(image im, const char *name)
 706 | {
 707 |     char buff[256];
 708 |     //sprintf(buff, "%s (%d)", name, windows);
 709 |     sprintf(buff, "%s.png", name);
 710 |     unsigned char *data = calloc(im.w*im.h*im.c, sizeof(char));
 711 |     int i,k;
 712 |     for(k = 0; k < im.c; ++k){
 713 |         for(i = 0; i < im.w*im.h; ++i){
 714 |             data[i*im.c+k] = (unsigned char) (255*im.data[i + k*im.w*im.h]);
 715 |         }
 716 |     }
 717 |     int success = stbi_write_png(buff, im.w, im.h, im.c, data, im.w*im.c);
 718 |     free(data);
 719 |     if(!success) fprintf(stderr, "Failed to write image %s\n", buff);
 720 | }
 721 | 
 722 | void save_image(image im, const char *name)
 723 | {
 724 | #ifdef OPENCV
 725 |     save_image_jpg(im, name);
 726 | #else
 727 |     save_image_png(im, name);
 728 | #endif
 729 | }
 730 | 
 731 | 
 732 | void show_image_layers(image p, char *name)
 733 | {
 734 |     int i;
 735 |     char buff[256];
 736 |     for(i = 0; i < p.c; ++i){
 737 |         sprintf(buff, "%s - Layer %d", name, i);
 738 |         image layer = get_image_layer(p, i);
 739 |         show_image(layer, buff);
 740 |         free_image(layer);
 741 |     }
 742 | }
 743 | 
 744 | void show_image_collapsed(image p, char *name)
 745 | {
 746 |     image c = collapse_image_layers(p, 1);
 747 |     show_image(c, name);
 748 |     free_image(c);
 749 | }
 750 | 
 751 | image make_empty_image(int w, int h, int c)
 752 | {
 753 |     image out;
 754 |     out.data = 0;
 755 |     out.h = h;
 756 |     out.w = w;
 757 |     out.c = c;
 758 |     return out;
 759 | }
 760 | 
 761 | image make_image(int w, int h, int c)
 762 | {
 763 |     image out = make_empty_image(w,h,c);
 764 |     out.data = calloc(h*w*c, sizeof(float));
 765 |     return out;
 766 | }
 767 | 
 768 | image make_random_image(int w, int h, int c)
 769 | {
 770 |     image out = make_empty_image(w,h,c);
 771 |     out.data = calloc(h*w*c, sizeof(float));
 772 |     int i;
 773 |     for(i = 0; i < w*h*c; ++i){
 774 |         out.data[i] = (rand_normal() * .25) + .5;
 775 |     }
 776 |     return out;
 777 | }
 778 | 
 779 | image float_to_image(int w, int h, int c, float *data)
 780 | {
 781 |     image out = make_empty_image(w,h,c);
 782 |     out.data = data;
 783 |     return out;
 784 | }
 785 | 
 786 | void place_image(image im, int w, int h, int dx, int dy, image canvas)
 787 | {
 788 |     int x, y, c;
 789 |     for(c = 0; c < im.c; ++c){
 790 |         for(y = 0; y < h; ++y){
 791 |             for(x = 0; x < w; ++x){
 792 |                 int rx = ((float)x / w) * im.w;
 793 |                 int ry = ((float)y / h) * im.h;
 794 |                 float val = bilinear_interpolate(im, rx, ry, c);
 795 |                 set_pixel(canvas, x + dx, y + dy, c, val);
 796 |             }
 797 |         }
 798 |     }
 799 | }
 800 | 
 801 | image center_crop_image(image im, int w, int h)
 802 | {
 803 |     int m = (im.w < im.h) ? im.w : im.h;   
 804 |     image c = crop_image(im, (im.w - m) / 2, (im.h - m)/2, m, m);
 805 |     image r = resize_image(c, w, h);
 806 |     free_image(c);
 807 |     return r;
 808 | }
 809 | 
 810 | image rotate_crop_image(image im, float rad, float s, int w, int h, float dx, float dy, float aspect)
 811 | {
 812 |     int x, y, c;
 813 |     float cx = im.w/2.;
 814 |     float cy = im.h/2.;
 815 |     image rot = make_image(w, h, im.c);
 816 |     for(c = 0; c < im.c; ++c){
 817 |         for(y = 0; y < h; ++y){
 818 |             for(x = 0; x < w; ++x){
 819 |                 float rx = cos(rad)*((x - w/2.)/s*aspect + dx/s*aspect) - sin(rad)*((y - h/2.)/s + dy/s) + cx;
 820 |                 float ry = sin(rad)*((x - w/2.)/s*aspect + dx/s*aspect) + cos(rad)*((y - h/2.)/s + dy/s) + cy;
 821 |                 float val = bilinear_interpolate(im, rx, ry, c);
 822 |                 set_pixel(rot, x, y, c, val);
 823 |             }
 824 |         }
 825 |     }
 826 |     return rot;
 827 | }
 828 | 
 829 | image rotate_image(image im, float rad)
 830 | {
 831 |     int x, y, c;
 832 |     float cx = im.w/2.;
 833 |     float cy = im.h/2.;
 834 |     image rot = make_image(im.w, im.h, im.c);
 835 |     for(c = 0; c < im.c; ++c){
 836 |         for(y = 0; y < im.h; ++y){
 837 |             for(x = 0; x < im.w; ++x){
 838 |                 float rx = cos(rad)*(x-cx) - sin(rad)*(y-cy) + cx;
 839 |                 float ry = sin(rad)*(x-cx) + cos(rad)*(y-cy) + cy;
 840 |                 float val = bilinear_interpolate(im, rx, ry, c);
 841 |                 set_pixel(rot, x, y, c, val);
 842 |             }
 843 |         }
 844 |     }
 845 |     return rot;
 846 | }
 847 | 
 848 | void fill_image(image m, float s)
 849 | {
 850 |     int i;
 851 |     for(i = 0; i < m.h*m.w*m.c; ++i) m.data[i] = s;
 852 | }
 853 | 
 854 | void translate_image(image m, float s)
 855 | {
 856 |     int i;
 857 |     for(i = 0; i < m.h*m.w*m.c; ++i) m.data[i] += s;
 858 | }
 859 | 
 860 | void scale_image(image m, float s)
 861 | {
 862 |     int i;
 863 |     for(i = 0; i < m.h*m.w*m.c; ++i) m.data[i] *= s;
 864 | }
 865 | 
 866 | image crop_image(image im, int dx, int dy, int w, int h)
 867 | {
 868 |     image cropped = make_image(w, h, im.c);
 869 |     int i, j, k;
 870 |     for(k = 0; k < im.c; ++k){
 871 |         for(j = 0; j < h; ++j){
 872 |             for(i = 0; i < w; ++i){
 873 |                 int r = j + dy;
 874 |                 int c = i + dx;
 875 |                 float val = 0;
 876 |                 r = constrain_int(r, 0, im.h-1);
 877 |                 c = constrain_int(c, 0, im.w-1);
 878 |                 val = get_pixel(im, c, r, k);
 879 |                 set_pixel(cropped, i, j, k, val);
 880 |             }
 881 |         }
 882 |     }
 883 |     return cropped;
 884 | }
 885 | 
 886 | int best_3d_shift_r(image a, image b, int min, int max)
 887 | {
 888 |     if(min == max) return min;
 889 |     int mid = floor((min + max) / 2.);
 890 |     image c1 = crop_image(b, 0, mid, b.w, b.h);
 891 |     image c2 = crop_image(b, 0, mid+1, b.w, b.h);
 892 |     float d1 = dist_array(c1.data, a.data, a.w*a.h*a.c, 10);
 893 |     float d2 = dist_array(c2.data, a.data, a.w*a.h*a.c, 10);
 894 |     free_image(c1);
 895 |     free_image(c2);
 896 |     if(d1 < d2) return best_3d_shift_r(a, b, min, mid);
 897 |     else return best_3d_shift_r(a, b, mid+1, max);
 898 | }
 899 | 
 900 | int best_3d_shift(image a, image b, int min, int max)
 901 | {
 902 |     int i;
 903 |     int best = 0;
 904 |     float best_distance = FLT_MAX;
 905 |     for(i = min; i <= max; i += 2){
 906 |         image c = crop_image(b, 0, i, b.w, b.h);
 907 |         float d = dist_array(c.data, a.data, a.w*a.h*a.c, 100);
 908 |         if(d < best_distance){
 909 |             best_distance = d;
 910 |             best = i;
 911 |         }
 912 |         printf("%d %f\n", i, d);
 913 |         free_image(c);
 914 |     }
 915 |     return best;
 916 | }
 917 | 
 918 | void composite_3d(char *f1, char *f2, char *out, int delta)
 919 | {
 920 |     if(!out) out = "out";
 921 |     image a = load_image(f1, 0,0,0);
 922 |     image b = load_image(f2, 0,0,0);
 923 |     int shift = best_3d_shift_r(a, b, -a.h/100, a.h/100);
 924 | 
 925 |     image c1 = crop_image(b, 10, shift, b.w, b.h);
 926 |     float d1 = dist_array(c1.data, a.data, a.w*a.h*a.c, 100);
 927 |     image c2 = crop_image(b, -10, shift, b.w, b.h);
 928 |     float d2 = dist_array(c2.data, a.data, a.w*a.h*a.c, 100);
 929 | 
 930 |     if(d2 < d1 && 0){
 931 |         image swap = a;
 932 |         a = b;
 933 |         b = swap;
 934 |         shift = -shift;
 935 |         printf("swapped, %d\n", shift);
 936 |     }
 937 |     else{
 938 |         printf("%d\n", shift);
 939 |     }
 940 | 
 941 |     image c = crop_image(b, delta, shift, a.w, a.h);
 942 |     int i;
 943 |     for(i = 0; i < c.w*c.h; ++i){
 944 |         c.data[i] = a.data[i];
 945 |     }
 946 | #ifdef OPENCV
 947 |     save_image_jpg(c, out);
 948 | #else
 949 |     save_image(c, out);
 950 | #endif
 951 | }
 952 | 
 953 | void letterbox_image_into(image im, int w, int h, image boxed)
 954 | {
 955 |     int new_w = im.w;
 956 |     int new_h = im.h;
 957 |     if (((float)w/im.w) < ((float)h/im.h)) {
 958 |         new_w = w;
 959 |         new_h = (im.h * w)/im.w;
 960 |     } else {
 961 |         new_h = h;
 962 |         new_w = (im.w * h)/im.h;
 963 |     }
 964 |     image resized = resize_image(im, new_w, new_h);
 965 |     embed_image(resized, boxed, (w-new_w)/2, (h-new_h)/2); 
 966 |     free_image(resized);
 967 | }
 968 | 
 969 | image letterbox_image(image im, int w, int h)
 970 | {
 971 |     int new_w = im.w;
 972 |     int new_h = im.h;
 973 |     if (((float)w/im.w) < ((float)h/im.h)) {
 974 |         new_w = w;
 975 |         new_h = (im.h * w)/im.w;
 976 |     } else {
 977 |         new_h = h;
 978 |         new_w = (im.w * h)/im.h;
 979 |     }
 980 |     image resized = resize_image(im, new_w, new_h);
 981 |     image boxed = make_image(w, h, im.c);
 982 |     fill_image(boxed, .5);
 983 |     //int i;
 984 |     //for(i = 0; i < boxed.w*boxed.h*boxed.c; ++i) boxed.data[i] = 0;
 985 |     embed_image(resized, boxed, (w-new_w)/2, (h-new_h)/2); 
 986 |     free_image(resized);
 987 |     return boxed;
 988 | }
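
The size and offset arithmetic shared by letterbox_image_into and letterbox_image can be checked on its own. This is only a sketch under stated assumptions: `letterbox_fit` and `fit_letterbox` are hypothetical names introduced here, and the function reproduces just the integer arithmetic, not the resize or the embedding.

```c
#include <assert.h>

/* Hypothetical result type: resized dimensions plus the offsets at which
   the resized image would be embedded in the w x h canvas. */
typedef struct { int new_w, new_h, dx, dy; } letterbox_fit;

/* Fit an im_w x im_h image inside a w x h canvas while keeping its
   aspect ratio, then centre it. */
letterbox_fit fit_letterbox(int im_w, int im_h, int w, int h)
{
    letterbox_fit f;
    assert(im_w > 0 && im_h > 0 && w > 0 && h > 0);

    if ((float)w / im_w < (float)h / im_h) {
        /* Width is the limiting dimension. */
        f.new_w = w;
        f.new_h = (im_h * w) / im_w;
    } else {
        /* Height is the limiting dimension. */
        f.new_h = h;
        f.new_w = (im_w * h) / im_h;
    }
    f.dx = (w - f.new_w) / 2;
    f.dy = (h - f.new_h) / 2;
    return f;
}
```

A 640x480 frame placed on a 416x416 canvas comes out as 416x312 with a vertical offset of 52 and no horizontal offset. The remaining bands are what letterbox_image fills with 0.5.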
 989 | 
 990 | image resize_max(image im, int max)
 991 | {
 992 |     int w = im.w;
 993 |     int h = im.h;
 994 |     if(w > h){
 995 |         h = (h * max) / w;
 996 |         w = max;
 997 |     } else {
 998 |         w = (w * max) / h;
 999 |         h = max;
1000 |     }
1001 |     if(w == im.w && h == im.h) return im;
1002 |     image resized = resize_image(im, w, h);
1003 |     return resized;
1004 | }
1005 | 
1006 | image resize_min(image im, int min)
1007 | {
1008 |     int w = im.w;
1009 |     int h = im.h;
1010 |     if(w < h){
1011 |         h = (h * min) / w;
1012 |         w = min;
1013 |     } else {
1014 |         w = (w * min) / h;
1015 |         h = min;
1016 |     }
1017 |     if(w == im.w && h == im.h) return im;
1018 |     image resized = resize_image(im, w, h);
1019 |     return resized;
1020 | }
1021 | 
1022 | image random_crop_image(image im, int w, int h)
1023 | {
1024 |     int dx = rand_int(0, im.w - w);
1025 |     int dy = rand_int(0, im.h - h);
1026 |     image crop = crop_image(im, dx, dy, w, h);
1027 |     return crop;
1028 | }
1029 | 
1030 | augment_args random_augment_args(image im, float angle, float aspect, int low, int high, int w, int h)
1031 | {
1032 |     augment_args a = {0};
1033 |     aspect = rand_scale(aspect);
1034 |     int r = rand_int(low, high);
1035 |     int min = (im.h < im.w*aspect) ? im.h : im.w*aspect;
1036 |     float scale = (float)r / min;
1037 | 
1038 |     float rad = rand_uniform(-angle, angle) * TWO_PI / 360.;
1039 | 
1040 |     float dx = (im.w*scale/aspect - w) / 2.;
1041 |     float dy = (im.h*scale - h) / 2.;
1042 |     //if(dx < 0) dx = 0;
1043 |     //if(dy < 0) dy = 0;
1044 |     dx = rand_uniform(-dx, dx);
1045 |     dy = rand_uniform(-dy, dy);
1046 | 
1047 |     a.rad = rad;
1048 |     a.scale = scale;
1049 |     a.w = w;
1050 |     a.h = h;
1051 |     a.dx = dx;
1052 |     a.dy = dy;
1053 |     a.aspect = aspect;
1054 |     return a;
1055 | }
1056 | 
1057 | image random_augment_image(image im, float angle, float aspect, int low, int high, int w, int h)
1058 | {
1059 |     augment_args a = random_augment_args(im, angle, aspect, low, high, w, h);
1060 |     image crop = rotate_crop_image(im, a.rad, a.scale, a.w, a.h, a.dx, a.dy, a.aspect);
1061 |     return crop;
1062 | }
1063 | 
1064 | float three_way_max(float a, float b, float c)
1065 | {
1066 |     return (a > b) ? ( (a > c) ? a : c) : ( (b > c) ? b : c) ;
1067 | }
1068 | 
1069 | float three_way_min(float a, float b, float c)
1070 | {
1071 |     return (a < b) ? ( (a < c) ? a : c) : ( (b < c) ? b : c) ;
1072 | }
1073 | 
1074 | void yuv_to_rgb(image im)
1075 | {
1076 |     assert(im.c == 3);
1077 |     int i, j;
1078 |     float r, g, b;
1079 |     float y, u, v;
1080 |     for(j = 0; j < im.h; ++j){
1081 |         for(i = 0; i < im.w; ++i){
1082 |             y = get_pixel(im, i , j, 0);
1083 |             u = get_pixel(im, i , j, 1);
1084 |             v = get_pixel(im, i , j, 2);
1085 | 
1086 |             r = y + 1.13983*v;
1087 |             g = y + -.39465*u + -.58060*v;
1088 |             b = y + 2.03211*u;
1089 | 
1090 |             set_pixel(im, i, j, 0, r);
1091 |             set_pixel(im, i, j, 1, g);
1092 |             set_pixel(im, i, j, 2, b);
1093 |         }
1094 |     }
1095 | }
1096 | 
1097 | void rgb_to_yuv(image im)
1098 | {
1099 |     assert(im.c == 3);
1100 |     int i, j;
1101 |     float r, g, b;
1102 |     float y, u, v;
1103 |     for(j = 0; j < im.h; ++j){
1104 |         for(i = 0; i < im.w; ++i){
1105 |             r = get_pixel(im, i , j, 0);
1106 |             g = get_pixel(im, i , j, 1);
1107 |             b = get_pixel(im, i , j, 2);
1108 | 
1109 |             y = .299*r + .587*g + .114*b;
1110 |             u = -.14713*r + -.28886*g + .436*b;
1111 |             v = .615*r + -.51499*g + -.10001*b;
1112 | 
1113 |             set_pixel(im, i, j, 0, y);
1114 |             set_pixel(im, i, j, 1, u);
1115 |             set_pixel(im, i, j, 2, v);
1116 |         }
1117 |     }
1118 | }
1119 | 
1120 | // http://www.cs.rit.edu/~ncs/color/t_convert.html
1121 | void rgb_to_hsv(image im)
1122 | {
1123 |     assert(im.c == 3);
1124 |     int i, j;
1125 |     float r, g, b;
1126 |     float h, s, v;
1127 |     for(j = 0; j < im.h; ++j){
1128 |         for(i = 0; i < im.w; ++i){
1129 |             r = get_pixel(im, i , j, 0);
1130 |             g = get_pixel(im, i , j, 1);
1131 |             b = get_pixel(im, i , j, 2);
1132 |             float max = three_way_max(r,g,b);
1133 |             float min = three_way_min(r,g,b);
1134 |             float delta = max - min;
1135 |             v = max;
1136 |             if(max == 0 || delta == 0){  // black or gray pixel: hue is undefined, so avoid dividing by zero
1137 |                 s = 0;
1138 |                 h = 0;
1139 |             }else{
1140 |                 s = delta/max;
1141 |                 if(r == max){
1142 |                     h = (g - b) / delta;
1143 |                 } else if (g == max) {
1144 |                     h = 2 + (b - r) / delta;
1145 |                 } else {
1146 |                     h = 4 + (r - g) / delta;
1147 |                 }
1148 |                 if (h < 0) h += 6;
1149 |                 h = h/6.;
1150 |             }
1151 |             set_pixel(im, i, j, 0, h);
1152 |             set_pixel(im, i, j, 1, s);
1153 |             set_pixel(im, i, j, 2, v);
1154 |         }
1155 |     }
1156 | }
1157 | 
1158 | void hsv_to_rgb(image im)
1159 | {
1160 |     assert(im.c == 3);
1161 |     int i, j;
1162 |     float r, g, b;
1163 |     float h, s, v;
1164 |     float f, p, q, t;
1165 |     for(j = 0; j < im.h; ++j){
1166 |         for(i = 0; i < im.w; ++i){
1167 |             h = 6 * get_pixel(im, i , j, 0);
1168 |             s = get_pixel(im, i , j, 1);
1169 |             v = get_pixel(im, i , j, 2);
1170 |             if (s == 0) {
1171 |                 r = g = b = v;
1172 |             } else {
1173 |                 int index = floor(h);
1174 |                 f = h - index;
1175 |                 p = v*(1-s);
1176 |                 q = v*(1-s*f);
1177 |                 t = v*(1-s*(1-f));
1178 |                 if(index == 0){
1179 |                     r = v; g = t; b = p;
1180 |                 } else if(index == 1){
1181 |                     r = q; g = v; b = p;
1182 |                 } else if(index == 2){
1183 |                     r = p; g = v; b = t;
1184 |                 } else if(index == 3){
1185 |                     r = p; g = q; b = v;
1186 |                 } else if(index == 4){
1187 |                     r = t; g = p; b = v;
1188 |                 } else {
1189 |                     r = v; g = p; b = q;
1190 |                 }
1191 |             }
1192 |             set_pixel(im, i, j, 0, r);
1193 |             set_pixel(im, i, j, 1, g);
1194 |             set_pixel(im, i, j, 2, b);
1195 |         }
1196 |     }
1197 | }
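
The two conversions above can be exercised on a single pixel to confirm that they invert each other. The sketch below is an illustration rather than part of darknet: `hsv_px`, `rgb_px`, `rgb_to_hsv_px` and `hsv_to_rgb_px` are hypothetical names, inputs are assumed to lie in [0, 1], and the hue computation is skipped whenever `delta` is zero so that gray pixels never divide by zero.

```c
#include <assert.h>

typedef struct { float h, s, v; } hsv_px;   /* hypothetical: hue, saturation, value in [0, 1] */
typedef struct { float r, g, b; } rgb_px;   /* hypothetical: red, green, blue in [0, 1] */

static float max3(float a, float b, float c)
{
    return (a > b) ? ((a > c) ? a : c) : ((b > c) ? b : c);
}

static float min3(float a, float b, float c)
{
    return (a < b) ? ((a < c) ? a : c) : ((b < c) ? b : c);
}

hsv_px rgb_to_hsv_px(float r, float g, float b)
{
    float max = max3(r, g, b);
    float delta = max - min3(r, g, b);
    hsv_px out = {0.f, 0.f, max};
    if (delta > 0.f) {              /* implies max > 0 for inputs in [0, 1] */
        float h;
        out.s = delta / max;
        if (r == max)      h = (g - b) / delta;
        else if (g == max) h = 2.f + (b - r) / delta;
        else               h = 4.f + (r - g) / delta;
        if (h < 0.f) h += 6.f;
        out.h = h / 6.f;            /* scale the six sectors to [0, 1) */
    }
    return out;
}

rgb_px hsv_to_rgb_px(float h, float s, float v)
{
    rgb_px out = {v, v, v};         /* zero saturation means gray */
    assert(h >= 0.f && h < 1.f);
    if (s == 0.f) return out;

    h *= 6.f;
    int index = (int)h;             /* sector 0..5, valid because h >= 0 */
    float f = h - index;
    float p = v * (1.f - s);
    float q = v * (1.f - s * f);
    float t = v * (1.f - s * (1.f - f));
    switch (index) {
        case 0:  out.r = v; out.g = t; out.b = p; break;
        case 1:  out.r = q; out.g = v; out.b = p; break;
        case 2:  out.r = p; out.g = v; out.b = t; break;
        case 3:  out.r = p; out.g = q; out.b = v; break;
        case 4:  out.r = t; out.g = p; out.b = v; break;
        default: out.r = v; out.g = p; out.b = q; break;
    }
    return out;
}
```

Converting a colour to HSV and back should reproduce the original channels up to floating-point rounding, so comparisons are best made with a small tolerance rather than exact equality.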
1198 | 
1199 | void grayscale_image_3c(image im)
1200 | {
1201 |     assert(im.c == 3);
1202 |     int i, j, k;
1203 |     float scale[] = {0.299, 0.587, 0.114};
1204 |     for(j = 0; j < im.h; ++j){
1205 |         for(i = 0; i < im.w; ++i){
1206 |             float val = 0;
1207 |             for(k = 0; k < 3; ++k){
1208 |                 val += scale[k]*get_pixel(im, i, j, k);
1209 |             }
1210 |             im.data[0*im.h*im.w + im.w*j + i] = val;
1211 |             im.data[1*im.h*im.w + im.w*j + i] = val;
1212 |             im.data[2*im.h*im.w + im.w*j + i] = val;
1213 |         }
1214 |     }
1215 | }
1216 | 
1217 | image grayscale_image(image im)
1218 | {
1219 |     assert(im.c == 3);
1220 |     int i, j, k;
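1221 |     image gray = make_image(im.w, im.h, 1); // the += accumulation below assumes make_image zero-initialises the buffer
1222 |     float scale[] = {0.299, 0.587, 0.114}; // ITU-R BT.601 luma weights for R, G, B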
1223 |     for(k = 0; k < im.c; ++k){
1224 |         for(j = 0; j < im.h; ++j){
1225 |             for(i = 0; i < im.w; ++i){
1226 |                 gray.data[i+im.w*j] += scale[k]*get_pixel(im, i, j, k);
1227 |             }
1228 |         }
1229 |     }
1230 |     return gray;
1231 | }
1232 | 
1233 | image threshold_image(image im, float thresh)
1234 | {
1235 |     int i;
1236 |     image t = make_image(im.w, im.h, im.c);
1237 |     for(i = 0; i < im.w*im.h*im.c; ++i){
1238 |         t.data[i] = im.data[i]>thresh ? 1 : 0;
1239 |     }
1240 |     return t;
1241 | }
1242 | 
1243 | image blend_image(image fore, image back, float alpha)
1244 | {
1245 |     assert(fore.w == back.w && fore.h == back.h && fore.c == back.c);
1246 |     image blend = make_image(fore.w, fore.h, fore.c);
1247 |     int i, j, k;
1248 |     for(k = 0; k < fore.c; ++k){
1249 |         for(j = 0; j < fore.h; ++j){
1250 |             for(i = 0; i < fore.w; ++i){
1251 |                 float val = alpha * get_pixel(fore, i, j, k) + 
1252 |                     (1 - alpha)* get_pixel(back, i, j, k);
1253 |                 set_pixel(blend, i, j, k, val);
1254 |             }
1255 |         }
1256 |     }
1257 |     return blend;
1258 | }
1259 | 
1260 | void scale_image_channel(image im, int c, float v)
1261 | {
1262 |     int i, j;
1263 |     for(j = 0; j < im.h; ++j){
1264 |         for(i = 0; i < im.w; ++i){
1265 |             float pix = get_pixel(im, i, j, c);
1266 |             pix = pix*v;
1267 |             set_pixel(im, i, j, c, pix);
1268 |         }
1269 |     }
1270 | }
1271 | 
1272 | void translate_image_channel(image im, int c, float v)
1273 | {
1274 |     int i, j;
1275 |     for(j = 0; j < im.h; ++j){
1276 |         for(i = 0; i < im.w; ++i){
1277 |             float pix = get_pixel(im, i, j, c);
1278 |             pix = pix+v;
1279 |             set_pixel(im, i, j, c, pix);
1280 |         }
1281 |     }
1282 | }
1283 | 
1284 | image binarize_image(image im)
1285 | {
1286 |     image c = copy_image(im);
1287 |     int i;
1288 |     for(i = 0; i < im.w * im.h * im.c; ++i){
1289 |         if(c.data[i] > .5) c.data[i] = 1;
1290 |         else c.data[i] = 0;
1291 |     }
1292 |     return c;
1293 | }
1294 | 
1295 | void saturate_image(image im, float sat)
1296 | {
1297 |     rgb_to_hsv(im);
1298 |     scale_image_channel(im, 1, sat);
1299 |     hsv_to_rgb(im);
1300 |     constrain_image(im);
1301 | }
1302 | 
1303 | void hue_image(image im, float hue)
1304 | {
1305 |     rgb_to_hsv(im);
1306 |     int i;
1307 |     for(i = 0; i < im.w*im.h; ++i){
1308 |         im.data[i] = im.data[i] + hue;
1309 |         if (im.data[i] > 1) im.data[i] -= 1;
1310 |         if (im.data[i] < 0) im.data[i] += 1;
1311 |     }
1312 |     hsv_to_rgb(im);
1313 |     constrain_image(im);
1314 | }
1315 | 
1316 | void exposure_image(image im, float exposure)
1317 | {
1318 |     rgb_to_hsv(im);
1319 |     scale_image_channel(im, 2, exposure);
1320 |     hsv_to_rgb(im);
1321 |     constrain_image(im);
1322 | }
1323 | 
1324 | void distort_image(image im, float hue, float sat, float val)
1325 | {
1326 |     rgb_to_hsv(im);
1327 |     scale_image_channel(im, 1, sat);
1328 |     scale_image_channel(im, 2, val);
1329 |     int i;
1330 |     for(i = 0; i < im.w*im.h; ++i){
1331 |         im.data[i] = im.data[i] + hue;
1332 |         if (im.data[i] > 1) im.data[i] -= 1;
1333 |         if (im.data[i] < 0) im.data[i] += 1;
1334 |     }
1335 |     hsv_to_rgb(im);
1336 |     constrain_image(im);
1337 | }
1338 | 
1339 | void random_distort_image(image im, float hue, float saturation, float exposure)
1340 | {
1341 |     float dhue = rand_uniform(-hue, hue);
1342 |     float dsat = rand_scale(saturation);
1343 |     float dexp = rand_scale(exposure);
1344 |     distort_image(im, dhue, dsat, dexp);
1345 | }
1346 | 
1347 | void saturate_exposure_image(image im, float sat, float exposure)
1348 | {
1349 |     rgb_to_hsv(im);
1350 |     scale_image_channel(im, 1, sat);
1351 |     scale_image_channel(im, 2, exposure);
1352 |     hsv_to_rgb(im);
1353 |     constrain_image(im);
1354 | }
1355 | 
1356 | image resize_image(image im, int w, int h)
1357 | {
1358 |     image resized = make_image(w, h, im.c);   
1359 |     image part = make_image(w, im.h, im.c);
1360 |     int r, c, k;
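1361 |     float w_scale = (w > 1) ? (float)(im.w - 1) / (w - 1) : 0; // avoid dividing by zero when the output is 1 pixel wide
1362 |     float h_scale = (h > 1) ? (float)(im.h - 1) / (h - 1) : 0; // avoid dividing by zero when the output is 1 pixel tall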
1363 |     for(k = 0; k < im.c; ++k){
1364 |         for(r = 0; r < im.h; ++r){
1365 |             for(c = 0; c < w; ++c){
1366 |                 float val = 0;
1367 |                 if(c == w-1 || im.w == 1){
1368 |                     val = get_pixel(im, im.w-1, r, k);
1369 |                 } else {
1370 |                     float sx = c*w_scale;
1371 |                     int ix = (int) sx;
1372 |                     float dx = sx - ix;
1373 |                     val = (1 - dx) * get_pixel(im, ix, r, k) + dx * get_pixel(im, ix+1, r, k);
1374 |                 }
1375 |                 set_pixel(part, c, r, k, val);
1376 |             }
1377 |         }
1378 |     }
1379 |     for(k = 0; k < im.c; ++k){
1380 |         for(r = 0; r < h; ++r){
1381 |             float sy = r*h_scale;
1382 |             int iy = (int) sy;
1383 |             float dy = sy - iy;
1384 |             for(c = 0; c < w; ++c){
1385 |                 float val = (1-dy) * get_pixel(part, c, iy, k);
1386 |                 set_pixel(resized, c, r, k, val);
1387 |             }
1388 |             if(r == h-1 || im.h == 1) continue;
1389 |             for(c = 0; c < w; ++c){
1390 |                 float val = dy * get_pixel(part, c, iy+1, k);
1391 |                 add_pixel(resized, c, r, k, val);
1392 |             }
1393 |         }
1394 |     }
1395 | 
1396 |     free_image(part);
1397 |     return resized;
1398 | }
1399 | 
1400 | 
1401 | void test_resize(char *filename)
1402 | {
1403 |     image im = load_image(filename, 0,0, 3);
1404 |     float mag = mag_array(im.data, im.w*im.h*im.c);
1405 |     printf("L2 Norm: %f\n", mag);
1406 |     image gray = grayscale_image(im);
1407 | 
1408 |     image c1 = copy_image(im);
1409 |     image c2 = copy_image(im);
1410 |     image c3 = copy_image(im);
1411 |     image c4 = copy_image(im);
1412 |     distort_image(c1, .1, 1.5, 1.5);
1413 |     distort_image(c2, -.1, .66666, .66666);
1414 |     distort_image(c3, .1, 1.5, .66666);
1415 |     distort_image(c4, .1, .66666, 1.5);
1416 | 
1417 | 
1418 |     show_image(im,   "Original");
1419 |     show_image(gray, "Gray");
1420 |     show_image(c1, "C1");
1421 |     show_image(c2, "C2");
1422 |     show_image(c3, "C3");
1423 |     show_image(c4, "C4");
1424 | #ifdef OPENCV
1425 |     while(1){
1426 |         image aug = random_augment_image(im, 0, .75, 320, 448, 320, 320);
1427 |         show_image(aug, "aug");
1428 |         free_image(aug);
1429 | 
1430 | 
1431 |         float exposure = 1.15;
1432 |         float saturation = 1.15;
1433 |         float hue = .05;
1434 | 
1435 |         image c = copy_image(im);
1436 | 
1437 |         float dexp = rand_scale(exposure);
1438 |         float dsat = rand_scale(saturation);
1439 |         float dhue = rand_uniform(-hue, hue);
1440 | 
1441 |         distort_image(c, dhue, dsat, dexp);
1442 |         show_image(c, "rand");
1443 |         printf("%f %f %f\n", dhue, dsat, dexp);
1444 |         free_image(c);
1445 |         cvWaitKey(0);
1446 |     }
1447 | #endif
1448 | }
1449 | 
1450 | 
1451 | image load_image_stb(char *filename, int channels)
1452 | {
1453 |     int w, h, c;
1454 |     unsigned char *data = stbi_load(filename, &w, &h, &c, channels);
1455 |     if (!data) {
1456 |         fprintf(stderr, "Cannot load image \"%s\"\nSTB Reason: %s\n", filename, stbi_failure_reason());
1457 |         exit(1); // report the load failure with a nonzero exit status
1458 |     }
1459 |     if(channels) c = channels;
1460 |     int i,j,k;
1461 |     image im = make_image(w, h, c);
1462 |     for(k = 0; k < c; ++k){
1463 |         for(j = 0; j < h; ++j){
1464 |             for(i = 0; i < w; ++i){
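1465 |                 int dst_index = i + w*j + w*h*k; // planar layout: channel, then row, then column
1466 |                 int src_index = k + c*i + c*w*j; // stb returns interleaved pixels: row, then column, then channel
1467 |                 im.data[dst_index] = (float)data[src_index]/255.; // map bytes 0..255 to floats 0..1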
1468 |             }
1469 |         }
1470 |     }
1471 |     free(data);
1472 |     return im;
1473 | }
1474 | 
1475 | image load_image(char *filename, int w, int h, int c)
1476 | {
1477 | #ifdef OPENCV
1478 |     image out = load_image_cv(filename, c);
1479 | #else
1480 |     image out = load_image_stb(filename, c);
1481 | #endif
1482 | 
1483 |     if((h && w) && (h != out.h || w != out.w)){
1484 |         image resized = resize_image(out, w, h);
1485 |         free_image(out);
1486 |         out = resized;
1487 |     }
1488 |     return out;
1489 | }
1490 | 
1491 | image load_image_color(char *filename, int w, int h)
1492 | {
1493 |     return load_image(filename, w, h, 3);
1494 | }
1495 | 
1496 | image get_image_layer(image m, int l)
1497 | {
1498 |     image out = make_image(m.w, m.h, 1);
1499 |     int i;
1500 |     for(i = 0; i < m.h*m.w; ++i){
1501 |         out.data[i] = m.data[i+l*m.h*m.w];
1502 |     }
1503 |     return out;
1504 | }
1505 | void print_image(image m)
1506 | {
1507 |     int i, j, k;
1508 |     for(i =0 ; i < m.c; ++i){
1509 |         for(j =0 ; j < m.h; ++j){
1510 |             for(k = 0; k < m.w; ++k){
1511 |                 printf("%.2lf, ", m.data[i*m.h*m.w + j*m.w + k]);
1512 |                 if(k > 30) break;
1513 |             }
1514 |             printf("\n");
1515 |             if(j > 30) break;
1516 |         }
1517 |         printf("\n");
1518 |     }
1519 |     printf("\n");
1520 | }
1521 | 
1522 | image collapse_images_vert(image *ims, int n)
1523 | {
1524 |     int color = 1;
1525 |     int border = 1;
1526 |     int h,w,c;
1527 |     w = ims[0].w;
1528 |     h = (ims[0].h + border) * n - border;
1529 |     c = ims[0].c;
1530 |     if(c != 3 || !color){
1531 |         w = (w+border)*c - border;
1532 |         c = 1;
1533 |     }
1534 | 
1535 |     image filters = make_image(w, h, c);
1536 |     int i,j;
1537 |     for(i = 0; i < n; ++i){
1538 |         int h_offset = i*(ims[0].h+border);
1539 |         image copy = copy_image(ims[i]);
1540 |         //normalize_image(copy);
1541 |         if(c == 3 && color){
1542 |             embed_image(copy, filters, 0, h_offset);
1543 |         }
1544 |         else{
1545 |             for(j = 0; j < copy.c; ++j){
1546 |                 int w_offset = j*(ims[0].w+border);
1547 |                 image layer = get_image_layer(copy, j);
1548 |                 embed_image(layer, filters, w_offset, h_offset);
1549 |                 free_image(layer);
1550 |             }
1551 |         }
1552 |         free_image(copy);
1553 |     }
1554 |     return filters;
1555 | } 
1556 | 
1557 | image collapse_images_horz(image *ims, int n)
1558 | {
1559 |     int color = 1;
1560 |     int border = 1;
1561 |     int h,w,c;
1562 |     int size = ims[0].h;
1563 |     h = size;
1564 |     w = (ims[0].w + border) * n - border;
1565 |     c = ims[0].c;
1566 |     if(c != 3 || !color){
1567 |         h = (h+border)*c - border;
1568 |         c = 1;
1569 |     }
1570 | 
1571 |     image filters = make_image(w, h, c);
1572 |     int i,j;
1573 |     for(i = 0; i < n; ++i){
1574 |         int w_offset = i*(ims[0].w+border); // step by image width; size holds the height and is only correct for square images
1575 |         image copy = copy_image(ims[i]);
1576 |         //normalize_image(copy);
1577 |         if(c == 3 && color){
1578 |             embed_image(copy, filters, w_offset, 0);
1579 |         }
1580 |         else{
1581 |             for(j = 0; j < copy.c; ++j){
1582 |                 int h_offset = j*(size+border);
1583 |                 image layer = get_image_layer(copy, j);
1584 |                 embed_image(layer, filters, w_offset, h_offset);
1585 |                 free_image(layer);
1586 |             }
1587 |         }
1588 |         free_image(copy);
1589 |     }
1590 |     return filters;
1591 | } 
1592 | 
1593 | void show_image_normalized(image im, const char *name)
1594 | {
1595 |     image c = copy_image(im);
1596 |     normalize_image(c);
1597 |     show_image(c, name);
1598 |     free_image(c);
1599 | }
1600 | 
1601 | void show_images(image *ims, int n, char *window)
1602 | {
1603 |     image m = collapse_images_vert(ims, n);
1604 |     /*
1605 |        int w = 448;
1606 |        int h = ((float)m.h/m.w) * 448;
1607 |        if(h > 896){
1608 |        h = 896;
1609 |        w = ((float)m.w/m.h) * 896;
1610 |        }
1611 |        image sized = resize_image(m, w, h);
1612 |      */
1613 |     normalize_image(m);
1614 |     save_image(m, window);
1615 |     show_image(m, window);
1616 |     free_image(m);
1617 | }
1618 | 
1619 | void free_image(image m)
1620 | {
1621 |     if(m.data){
1622 |         free(m.data);
1623 |     }
1624 | }
1625 | 


--------------------------------------------------------------------------------
/yolo/image.h:
--------------------------------------------------------------------------------
 1 | #ifndef IMAGE_H
 2 | #define IMAGE_H
 3 | 
 4 | #include <stdlib.h>
 5 | #include <stdio.h>
 6 | #include <float.h>
 7 | #include <string.h>
 8 | #include <math.h>
 9 | #include "box.h"
10 | #include "darknet.h"
11 | 
12 | #ifndef __cplusplus
13 | #ifdef OPENCV
14 | int fill_image_from_stream(CvCapture *cap, image im);
15 | image ipl_to_image(IplImage* src);
16 | void ipl_into_image(IplImage* src, image im);
17 | void flush_stream_buffer(CvCapture *cap, int n);
18 | void show_image_cv(image p, const char *name, IplImage *disp);
19 | #endif
20 | #endif
21 | 
22 | float get_color(int c, int x, int max);
23 | void draw_box(image a, int x1, int y1, int x2, int y2, float r, float g, float b);
24 | void draw_bbox(image a, box bbox, int w, float r, float g, float b);
25 | void draw_label(image a, int r, int c, image label, const float *rgb);
26 | void write_label(image a, int r, int c, image *characters, char *string, float *rgb);
27 | image image_distance(image a, image b);
28 | void scale_image(image m, float s);
29 | image rotate_crop_image(image im, float rad, float s, int w, int h, float dx, float dy, float aspect);
30 | image center_crop_image(image im, int w, int h);
31 | image random_crop_image(image im, int w, int h);
32 | image random_augment_image(image im, float angle, float aspect, int low, int high, int w, int h);
33 | augment_args random_augment_args(image im, float angle, float aspect, int low, int high, int w, int h);
34 | void letterbox_image_into(image im, int w, int h, image boxed);
35 | image resize_max(image im, int max);
36 | void translate_image(image m, float s);
37 | void embed_image(image source, image dest, int dx, int dy);
38 | void place_image(image im, int w, int h, int dx, int dy, image canvas);
39 | void saturate_image(image im, float sat);
40 | void exposure_image(image im, float sat);
41 | void distort_image(image im, float hue, float sat, float val);
42 | void saturate_exposure_image(image im, float sat, float exposure);
43 | void rgb_to_hsv(image im);
44 | void hsv_to_rgb(image im);
45 | void yuv_to_rgb(image im);
46 | void rgb_to_yuv(image im);
47 | 
48 | 
49 | image collapse_image_layers(image source, int border);
50 | image collapse_images_horz(image *ims, int n);
51 | image collapse_images_vert(image *ims, int n);
52 | 
53 | void show_image_normalized(image im, const char *name);
54 | void show_images(image *ims, int n, char *window);
55 | void show_image_layers(image p, char *name);
56 | void show_image_collapsed(image p, char *name);
57 | 
58 | void print_image(image m);
59 | 
60 | image make_empty_image(int w, int h, int c);
61 | void copy_image_into(image src, image dest);
62 | 
63 | image get_image_layer(image m, int l);
64 | 
65 | #endif
66 | 
67 | 


--------------------------------------------------------------------------------