├── README.md
├── code
│   ├── divede_trainAndVal.py
│   ├── feature_extract.py
│   └── trainMultiFeature.py
├── flower1.jpg
├── flower2.jpg
├── flower3.jpg
├── flower4.jpg
├── flower5.jpg
├── readme.txt
├── result.PNG
└── result1.PNG

/README.md:
--------------------------------------------------------------------------------
# A simple transfer-learning baseline

## Environment
* OS: Windows
* Editor: Spyder
* Framework: Keras

## Overall approach
* Use pretrained models such as InceptionV3 to extract features and labels from the training and validation sets and save them to HDF5 files. Several models can be used and their features saved separately. Once the features are extracted, build a two-layer neural network as the classifier and train it. Training is very fast: the whole pipeline takes about 5 minutes and yields a classifier with roughly 94% accuracy.
* Replace the resize method built into Keras with an antialiased one; the default method produces ringing artifacts.

## Experiment
* Images of 5 different kinds of flowers, 3600 in total, stored in 5 class folders. 10% of each class is held out as the validation set, and features are extracted with InceptionV3, VGG16 and ResNet50.
* ![flower1](flower1.jpg)
* ![flower2](flower2.jpg)
* ![flower3](flower3.jpg)
* ![flower4](flower4.jpg)
* ![flower5](flower5.jpg)

## Steps
* Dataset: [link](http://download.tensorflow.org/example_images/flower_photos.tgz)
* divede_trainAndVal.py splits the original folder into train and val folders.
* InceptionV3, VGG16 and ResNet50 extract the features; the core code is:

```python
basic_model = inception_v3.InceptionV3(include_top=False, weights='imagenet')
feature = GlobalAveragePooling2D()(basic_model.output)
model = Model(inputs=basic_model.input, outputs=feature)
f = model.predict(data)  # f is the extracted feature
```

* `data` is produced with ImageDataGenerator() and gen.flow_from_directory. The resize method that ships with Keras is poor and introduces ringing artifacts, so switch to interpolation='antialias'. This requires a small patch to keras_preprocessing: in "C:\Users\tunan\AppData\Local\Continuum\anaconda3\envs\tensorflow\lib\site-packages\keras_preprocessing\image.py", add `'antialias':pil_image.ANTIALIAS` to the interpolation dictionary around line 34.

```python
gen = ImageDataGenerator()
train_gen = gen.flow_from_directory(r"data\valAndTrain\train",
                                    shuffle=False, batch_size=16,
                                    class_mode='categorical',
                                    target_size=input_size,
                                    interpolation='antialias',
                                    )
data, label = train_gen.next()  # fetch one batch of data and labels
```

* Store the extracted features with h5py. Compared with dumping an np.array to disk directly, the advantages are that the file does not have to be written in one go (writing batch by batch is faster) and that datasets are stored under named keys, like a dictionary, which makes them easy to read back.

```python
dt = h5py.special_dtype(vlen=str)
with h5py.File("data\\feature", "w") as h:
    h.create_dataset("train", data=train_feature)
    h.create_dataset("val", data=val_feature)
    h.create_dataset("test", data=test_feature)
    h.create_dataset("train_labels", data=train_labels)
    h.create_dataset("val_labels", data=val_labels)
    ds = h.create_dataset("file_name", test_name.shape, dtype=dt)
    ds[:] = test_name
```

* This block stores train\_feature, val\_feature and the other variables in an h5 file under dictionary-like keys.
  Writing:
  1. h = h5py.File("name", "w")
  2. h.create\_dataset("train", data=train\_feature)
  3. To store strings: dt = h5py.special\_dtype(vlen=str)  # define the dtype
     ds = h.create\_dataset('file_name', test_name.shape, dtype=dt)  # create the dataset with its shape and dtype
     ds[:] = test_name  # assign the values
* Reading:
  1. h = h5py.File('name', 'r')
  2. train\_feature = np.array(h['train'])

## Results
* Training is very fast: with 2000 images (400\*400) the whole training finishes in about 5 minutes, whereas full transfer learning with data augmentation and fine-tuning needs about **10 hours** to reach a final result. This makes the method a good baseline for quickly checking whether transfer learning is effective at all.
* Extracting features for 3300 images takes about 5 minutes, and training on the extracted features is almost instantaneous. Results: 95.07% accuracy on the training set and 93.22% on the validation set. (Note: occasionally training fails to converge at all, so run it a couple of times and compare.)
![result](result.PNG)
* Adding one more Dense(512) layer improves the result: 96.67% on the training set and 94.21% on the validation set, so the extra layer gives a clear improvement. A sketch of this classifier head follows at the end of this README.
![result](result1.PNG)
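A minimal sketch of this two-layer classifier head is shown below, assuming the three feature files written by code/feature_extract.py already exist under data/. It mirrors code/trainMultiFeature.py, except that a softmax output is used here instead of the script's sigmoid, as the more conventional choice for 5-class single-label data; treat it as a sketch rather than the exact training script.

```python
import h5py
import numpy as np
from keras.layers import Input, Dense, Dropout
from keras.models import Model

# Read back the features saved by feature_extract.py and concatenate them per image.
x_train, x_val = [], []
for name in ['inception_v3', 'vgg16', 'resnet50']:
    with h5py.File("data\\multi_gap_%s.h5" % name, 'r') as h:
        x_train.append(np.array(h['train']))
        x_val.append(np.array(h['val']))
        y_train = np.array(h['train_labels'])  # identical across the three files (shuffle=False)
        y_val = np.array(h['val_labels'])
x_train = np.concatenate(x_train, axis=1)
x_val = np.concatenate(x_val, axis=1)

# Two-layer head: Dense(512) + Dropout, then a 5-way output.
inputs = Input((x_train.shape[1],))
x = Dense(512, activation='relu')(inputs)
x = Dropout(0.5)(x)
outputs = Dense(5, activation='softmax')(x)
model = Model(inputs=inputs, outputs=outputs)
model.compile(optimizer='Adam', loss='categorical_crossentropy', metrics=['accuracy'])
model.fit(x_train, y_train, validation_data=(x_val, y_val), batch_size=128, epochs=30)
```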
--------------------------------------------------------------------------------
/code/divede_trainAndVal.py:
--------------------------------------------------------------------------------
# -*- coding: utf-8 -*-
"""
Created on Fri Jul 20 16:20:14 2018

@author: tunan
"""

#from reshapeAndLogist import build_resize,read_img,build_test_resize
import os
import numpy as np
import shutil

def read_img_path(file_path):
    # return the full paths of all images in file_path
    path = os.listdir(file_path)
    path = [os.path.join(file_path, p) for p in path]
    return path

def build_remove(source, target):
    # copy every image in source into the target directory, creating it if needed
    if not os.path.exists(target):
        os.mkdir(target)
        print("build target directory")
    for image in source:
        shutil.copy(image, target + '\\' + image.split("\\")[-1])

if __name__ == '__main__':
    os.chdir("..")
    # split the images into train and val
    SOURCE_PATH = 'data\\flower_photos'
    resizeTrainAndTest = "data\\valAndTrain"
    if not os.path.exists(resizeTrainAndTest):
        os.mkdir(resizeTrainAndTest)
        print("build resizeTrainAndTest directory")
    TRAIN_PATH = os.path.join(resizeTrainAndTest, "train")
    VAL_PATH = os.path.join(resizeTrainAndTest, "val")
    # create the train and val directories
    if not os.path.exists(TRAIN_PATH):
        os.mkdir(TRAIN_PATH)
        print("build TRAIN_PATH directory")
    if not os.path.exists(VAL_PATH):
        os.mkdir(VAL_PATH)
        print("build VAL_PATH directory")
    for cur_path in os.listdir(SOURCE_PATH):
        cur = np.asarray(read_img_path(os.path.join(SOURCE_PATH, cur_path)))
        np.random.shuffle(cur)
        train = cur[:int(len(cur) * 0.9)]
        val = cur[int(len(cur) * 0.9):]
        # copy the files into the class folders under train and val
        build_remove(train, os.path.join(resizeTrainAndTest, 'train', cur_path))
        build_remove(val, os.path.join(resizeTrainAndTest, 'val', cur_path))
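
# Added note (not part of the original script): run from the code/ directory, the loop above
# produces the layout that feature_extract.py's flow_from_directory calls expect, i.e.
#   data/valAndTrain/train/<class_name>/  (~90% of each class)
#   data/valAndTrain/val/<class_name>/    (~10% of each class)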
--------------------------------------------------------------------------------
/code/feature_extract.py:
--------------------------------------------------------------------------------
# -*- coding: utf-8 -*-
"""
Created on Fri Jul 27 20:06:50 2018

@author: tunan
"""

'''
My own write-up of transfer learning, plus the HOG value of each image (not used here).
Use model fusion (features from several pretrained models); data is fed with
data_gen.flow_from_directory.
Storing strings with h5py:
1. define the dtype: dt = h5py.special_dtype(vlen=str)
2. convert the list of strings into a string array
3. create the h5 file and the dataset: ds = h.create_dataset('a', a.shape, dtype=dt)
   (note the second argument is simply a.shape), then write the data with ds[:] = a
4. read it back with np.array(h['a'])
'''
from keras.layers import GlobalAveragePooling2D
from keras.applications import vgg16, inception_v3, resnet50
from keras.preprocessing.image import ImageDataGenerator
from keras.models import Model
import numpy as np
import h5py
from tqdm import tqdm
import os, cv2

def get_test(Models, lamda_func):
    # extract the features and file names of the test set
    path = r"C:\Users\tunan\Desktop\xuelang\resize_test"
    basic_model = Models(include_top=False, weights='imagenet')
    feature = GlobalAveragePooling2D()(basic_model.output)
    model = Model(inputs=basic_model.input, outputs=feature)
    file_name = os.listdir(path)
    path_name = [os.path.join(path, x) for x in file_name]
    test_data = []
    for p in tqdm(path_name):
        data = cv2.imread(p)
        data = np.expand_dims(data, axis=0)
        data = lamda_func(data)
        feature = model.predict(data)
        test_data.append(feature)
    return np.concatenate(test_data, axis=0), np.array(file_name)

def white_gap(input_size, Models, lamda_func, name):
    # global-average-pool the output of a pretrained model and save the features to an h5 file
    basic_model = Models(include_top=False, weights='imagenet')
    feature = GlobalAveragePooling2D()(basic_model.output)
    model = Model(inputs=basic_model.input, outputs=feature)
    gen = ImageDataGenerator()
    # interpolation='antialias' requires the keras_preprocessing patch described in the README
    train_gen = gen.flow_from_directory(r"data\valAndTrain\train",
                                        shuffle=False, batch_size=16,
                                        class_mode='categorical',
                                        target_size=input_size,
                                        interpolation='antialias',
                                        )
    val_gen = gen.flow_from_directory(r"data\valAndTrain\val",
                                      shuffle=False, batch_size=16,
                                      class_mode='categorical',
                                      target_size=input_size,
                                      interpolation='antialias',
                                      )
    # the number of batches is how many times the generator has to be called;
    # train_gen.n is the total number of samples
    ntrain_batch = int(np.ceil(train_gen.n / 16.))
    train_feature = []
    train_labels = []
    # pull the features out batch by batch
    for e in tqdm(range(ntrain_batch)):
        (data, label) = train_gen.next()
        train_labels.append(label)
        data = lamda_func(data)
        feature = model.predict(data)
        # append the extracted features to train_feature
        train_feature.append(feature)
    train_feature = np.concatenate(train_feature, axis=0)
    train_labels = np.concatenate(train_labels)
    # nval_batch is the number of batches over the validation set
    nval_batch = int(np.ceil(val_gen.n / 16.))
    val_feature = []
    val_labels = []
    for e in tqdm(range(nval_batch)):
        (data, label) = val_gen.next()
        val_labels.append(label)
        data = lamda_func(data)
        feature = model.predict(data)
        val_feature.append(feature)
    val_feature = np.concatenate(val_feature, axis=0)
    val_labels = np.concatenate(val_labels)
    # extract the test features
#    test_feature,test_name = get_test(Models,lamda_func)
    '''
    store the arrays in an h5 file
    '''
#    dt = h5py.special_dtype(vlen = str)
    with h5py.File("data\\multi_gap_%s.h5" % (name), "w") as h:
        h.create_dataset("train", data=train_feature)
        h.create_dataset("val", data=val_feature)
#        h.create_dataset('test',data = test_feature)
        h.create_dataset("train_labels", data=train_labels)
        h.create_dataset("val_labels", data=val_labels)
#        ds = h.create_dataset('file_name',test_name.shape,dtype = dt)
#        ds[:] = test_name

# extract features with VGG16, InceptionV3 and ResNet50
os.chdir("..")
white_gap((600, 600), vgg16.VGG16, vgg16.preprocess_input, 'vgg16')
white_gap((600, 600), inception_v3.InceptionV3, inception_v3.preprocess_input, 'inception_v3')
white_gap((600, 600), resnet50.ResNet50, resnet50.preprocess_input, 'resnet50')
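
# Added note (not part of the original script): a quick, hypothetical check of one of the saved
# feature files before training. VGG16's global-average-pooled features are 512-dimensional,
# while InceptionV3's and ResNet50's are 2048-dimensional.
# with h5py.File("data\\multi_gap_vgg16.h5", "r") as h:
#     print(list(h.keys()))                 # ['train', 'train_labels', 'val', 'val_labels']
#     print(h["train"].shape, h["val"].shape)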
--------------------------------------------------------------------------------
/code/trainMultiFeature.py:
--------------------------------------------------------------------------------
# -*- coding: utf-8 -*-
"""
Created on Mon Jul 30 16:39:10 2018

@author: tunan
"""

import h5py
import numpy as np
from sklearn.utils import shuffle
from keras.layers import Input, Dense, Dropout
from keras.models import Model
from sklearn.metrics import roc_auc_score
#from train_with_feature import build_test
import os

os.chdir("..")
# load the features extracted by feature_extract.py and concatenate them per image
x_train, x_val, x_test = [], [], []
model_list = ['inception_v3', 'vgg16', 'resnet50']
for modelName in model_list:
    with h5py.File("data\\multi_gap_%s.h5" % (modelName), 'r') as h:
        x_train.append(np.array(h["train"]))
        y_train = np.array(h['train_labels'])
        x_val.append(np.array(h['val']))
        y_val = np.array(h['val_labels'])
# =============================================================================
#         x_test.append(np.array(h['test']))
#         Lname = np.array(h['file_name'])
# =============================================================================
x_train = np.concatenate(x_train, axis=1)
x_val = np.concatenate(x_val, axis=1)
#x_test = np.concatenate(x_test,axis = 1)
# shuffle the extracted features (sklearn's shuffle returns copies, so reassign)
x_train, y_train = shuffle(x_train, y_train)
x_val, y_val = shuffle(x_val, y_val)
# build the graph with the functional API
inputs = Input((x_train.shape[1],))
x = Dense(512, activation='relu',
          kernel_initializer='TruncatedNormal')(inputs)
x = Dropout(0.5)(x)
outputs = Dense(5, activation='sigmoid',
                kernel_initializer='TruncatedNormal')(x)
model = Model(inputs=inputs, outputs=outputs)
model.compile(optimizer='Adam', loss='categorical_crossentropy', metrics=['accuracy'])
# start training
model.fit(x=x_train, y=y_train, validation_data=(x_val, y_val), verbose=2,
          batch_size=128, epochs=30)
# compute the final model's AUC on the validation set
ypre_val = model.predict(x_val)
print('auc is :', roc_auc_score(y_val, ypre_val))
# export the prediction file
#ypre_test = model.predict(x_test)
#result = 1-ypre_test
#build_test(result.reshape(-1,).tolist(),Lname.tolist(),'result_multi_hog2.csv')
--------------------------------------------------------------------------------
/flower1.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/SummerRaining/kera_translation_learning/3cf11b3743b114a4cbe40d6d1239ac9593dda998/flower1.jpg
--------------------------------------------------------------------------------
/flower2.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/SummerRaining/kera_translation_learning/3cf11b3743b114a4cbe40d6d1239ac9593dda998/flower2.jpg
--------------------------------------------------------------------------------
/flower3.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/SummerRaining/kera_translation_learning/3cf11b3743b114a4cbe40d6d1239ac9593dda998/flower3.jpg
--------------------------------------------------------------------------------
/flower4.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/SummerRaining/kera_translation_learning/3cf11b3743b114a4cbe40d6d1239ac9593dda998/flower4.jpg
--------------------------------------------------------------------------------
/flower5.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/SummerRaining/kera_translation_learning/3cf11b3743b114a4cbe40d6d1239ac9593dda998/flower5.jpg
--------------------------------------------------------------------------------
/readme.txt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/SummerRaining/kera_translation_learning/3cf11b3743b114a4cbe40d6d1239ac9593dda998/readme.txt
--------------------------------------------------------------------------------
/result.PNG:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/SummerRaining/kera_translation_learning/3cf11b3743b114a4cbe40d6d1239ac9593dda998/result.PNG
--------------------------------------------------------------------------------
/result1.PNG:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/SummerRaining/kera_translation_learning/3cf11b3743b114a4cbe40d6d1239ac9593dda998/result1.PNG
--------------------------------------------------------------------------------